SPSSX Discussion

Clustering problem with 1 and 0s

OZGU GILL

Clustering problem with 1 and 0s

Hi there,

I have data for 75 cases and 19 variables. All variable values are 1 or 0, meaning either that the information is present or not. In addition, these 19 variables come from 3 main variables: type has 9 variables, options has 4, and provisions has 6. We would like to identify the similarities among the cases. Say cases 1, 5, 9, 11, 43, 50, and 60 show the same characteristics. What would be the best way to cluster them? Thanks in advance.
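The simplest reading of "the same characteristics" is an exact match on all 19 indicators. Below is a minimal Python sketch of that check, assuming each case is stored as a list of 19 zeros and ones ordered type (9), options (4), provisions (6). The names `GROUPS`, `split_profile` and `identical_profiles` are hypothetical, for illustration only. Exact matching will not find cases that are merely similar, which is what a clustering procedure is for.

```python
from collections import defaultdict

# Hypothetical layout mirroring the post: 9 "type", 4 "options" and
# 6 "provisions" indicators, 19 binary columns in total.
GROUPS = {"type": 9, "options": 4, "provisions": 6}
N_VARS = sum(GROUPS.values())  # 19


def split_profile(row):
    """Split one 19-element 0/1 row into its three named blocks."""
    blocks, start = {}, 0
    for name, width in GROUPS.items():
        blocks[name] = tuple(row[start:start + width])
        start += width
    return blocks


def identical_profiles(cases):
    """Map each distinct 0/1 profile to the case ids that share it.

    `cases` is a dict {case_id: list_of_19_zeros_and_ones}.
    Only profiles shared by two or more cases are returned.
    """
    by_profile = defaultdict(list)
    for case_id, row in cases.items():
        if len(row) != N_VARS or any(v not in (0, 1) for v in row):
            raise ValueError(f"case {case_id}: expected {N_VARS} values of 0/1")
        by_profile[tuple(row)].append(case_id)
    return {p: ids for p, ids in by_profile.items() if len(ids) > 1}
```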

Ozgu


Dan Zetu

Re: Clustering problem with 1 and 0s

You can use hierarchical clustering with the Jaccard coefficient as the measure
of similarity. I am fairly sure SPSS has this capability built in.
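The Jaccard coefficient counts only shared 1s: a / (a + b + c), where a is the number of items both cases have and b and c are the items only one of them has; joint absences (0-0) are ignored. The Python sketch below implements that coefficient and a naive agglomerative clustering on the distance 1 - Jaccard. It illustrates the technique only and is not SPSS's CLUSTER implementation; the choice of average linkage, the treatment of two all-zero rows as identical, and the function names are assumptions.

```python
from itertools import combinations


def jaccard_similarity(x, y):
    """Jaccard coefficient for two 0/1 vectors: shared 1s / positions where either is 1.

    Joint absences are ignored. Two all-zero rows are treated as identical
    (similarity 1.0), which is an assumption of this sketch.
    """
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return 1.0 if either == 0 else both / either


def average_linkage(rows, n_clusters):
    """Agglomerative clustering on Jaccard distance with average linkage.

    Starts with every row in its own cluster and repeatedly merges the two
    clusters with the smallest mean pairwise distance until `n_clusters`
    remain. Returns a sorted list of clusters, each a sorted list of row indices.
    """
    n = len(rows)
    dist = {}
    for i, j in combinations(range(n), 2):
        dist[(i, j)] = 1.0 - jaccard_similarity(rows[i], rows[j])

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            total = sum(d(i, j) for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)
```

With 75 cases this brute-force loop is still fast enough; in practice the built-in SPSS procedure would be used instead.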

Dan

>From: OZGU GILL <[hidden email]>
>Reply-To: OZGU GILL <[hidden email]>
>To: [hidden email]
>Subject: Clustering problem with 1 and 0s
>Date: Mon, 26 Mar 2007 09:13:11 -0700

Art Kendall-2

Re: Clustering problem with 1 and 0s

In reply to this post by OZGU GILL

Are the zero/one variables dummies derived from the 3 main variables? That is,
is one and only one flagged in each set?
Or are they 3 sets of "check all that apply" items?

In either case, take a look at TWOSTEP, which handles categorical
variables.
If only one in a set can be a 1, just use the three categorical
variables as input.
Otherwise, use the 19 dichotomies as input.
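Art's rule of thumb can be expressed as a data check: a block of dummies is one-of-k if every case has exactly one 1 in it. Below is a Python sketch under that assumption; the column layout, block names and functions are hypothetical. It only prepares the inputs; the clustering itself would still be done in TWOSTEP.

```python
def is_single_choice(rows, start, width):
    """True if every case has exactly one 1 within columns [start, start + width)."""
    return all(sum(row[start:start + width]) == 1 for row in rows)


def collapse_block(rows, start, width):
    """Turn a one-of-k dummy block into a single category code 1..width per case."""
    if not is_single_choice(rows, start, width):
        raise ValueError("block is not one-of-k; keep the individual dichotomies")
    return [list(row[start:start + width]).index(1) + 1 for row in rows]


def build_inputs(rows, blocks):
    """Apply the rule of thumb to each named block.

    `blocks` maps a block name to (start, width). One-of-k blocks become one
    categorical column; "check all that apply" blocks keep their 0/1 columns.
    Returns {column_name: list_of_values}.
    """
    columns = {}
    for name, (start, width) in blocks.items():
        if is_single_choice(rows, start, width):
            columns[name] = collapse_block(rows, start, width)
        else:
            for k in range(width):
                columns[f"{name}_{k + 1}"] = [row[start + k] for row in rows]
    return columns
```

A data check alone cannot prove that a block is single-choice: a "check all that apply" question could happen to have exactly one tick per case in this sample, so the questionnaire design should decide.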

Art Kendall
Social Research Consultants


paulandpen

Re: Clustering problem with 1 and 0s

In reply to this post by OZGU GILL

I would recommend preprocessing the data into a matrix. You could generate a matrix of tetrachoric correlations, which are appropriate for binary data. There is a macro to do this on someone's web page (a simple Google search should find it for you).
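A full tetrachoric correlation is a maximum-likelihood estimate, which is what a dedicated macro would compute. For a quick look, the classic cosine-pi approximation r ≈ cos(π / (1 + √(ad / bc))) can be computed from the 2×2 table of two items, where a counts cases with both items present, d cases with both absent, and b, c the discordant cases. The Python sketch below uses that approximation, not the exact estimate; adding 0.5 to every cell to avoid division by zero is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cos_pi_tetrachoric(x, y, correction=0.5):
    """Approximate tetrachoric correlation of two 0/1 vectors (cosine-pi formula).

    r ~= cos(pi / (1 + sqrt(a*d / (b*c)))), with a = both 1, d = both 0 and
    b, c the discordant cells. `correction` is added to every cell so empty
    cells do not cause division by zero (an assumption, not part of the formula).
    """
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi == 1 and yi == 1:
            a += 1
        elif xi == 1 and yi == 0:
            b += 1
        elif xi == 0 and yi == 1:
            c += 1
        else:
            d += 1
    a, b, c, d = (v + correction for v in (a, b, c, d))
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


def tetrachoric_matrix(columns):
    """Symmetric matrix of approximate tetrachoric correlations between columns."""
    k = len(columns)
    r = [[1.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        r[i][j] = r[j][i] = cos_pi_tetrachoric(columns[i], columns[j])
    return r
```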

Then I would factor-analyse the correlation matrix to see whether there is extensive overlap among some of the items. The idea is to identify and eliminate multicollinearity before the segmentation/cluster analysis, because heavily correlated items can unduly drive a cluster solution.
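Factor analysis is the fuller approach described here; as a lighter first screen, one can simply list item pairs whose correlation is very high. Below is a Python sketch, assuming `r` is a square correlation matrix (list of lists) in the same order as `names`. The function name is hypothetical and the 0.8 cut-off is an arbitrary illustration, not a standard. This screen does not replace factor analysis: it only flags pairs, not larger groups of overlapping items.

```python
from itertools import combinations


def redundant_pairs(r, names, threshold=0.8):
    """List item pairs whose |correlation| meets or exceeds `threshold`.

    Returns (name_i, name_j, r_ij) tuples, strongest associations first.
    """
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if abs(r[i][j]) >= threshold:
            flagged.append((names[i], names[j], r[i][j]))
    return sorted(flagged, key=lambda t: -abs(t[2]))
```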

You can then use the correlation matrix as input to any of the three cluster algorithms and compare the solutions. My gut instinct, given the sample size, is to use hierarchical clustering, but I remember reading some peer-reviewed literature saying that TWOSTEP is good at handling binary data. Another advantage of TWOSTEP is that it ranks variables by their effect on the cluster solution, so you can see whether items from each of your three broad constructs (latent variables, etc.) are driving the solution, or whether it is dominated by one latent variable. I have actually just talked myself into TWOSTEP.
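TWOSTEP reports its own measure of each variable's importance to the solution. A rough stand-in that can be computed by hand is a Pearson chi-square statistic of each 0/1 item against the cluster labels: items with large values separate the clusters most. The Python sketch below is that stand-in, not SPSS's exact computation; the function names and the `columns`/`labels` layout are hypothetical.

```python
def chi_square_2xk(item, labels):
    """Pearson chi-square for a 0/1 item cross-tabulated with cluster labels."""
    n = len(item)
    ones_total = sum(item)
    zeros_total = n - ones_total
    stat = 0.0
    for c in sorted(set(labels)):
        members = [v for v, lab in zip(item, labels) if lab == c]
        size = len(members)
        ones = sum(members)
        # Compare observed 1s and 0s in this cluster with their expected counts.
        for observed, row_total in ((ones, ones_total), (size - ones, zeros_total)):
            expected = row_total * size / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def rank_items(columns, labels):
    """Rank items by chi-square association with cluster membership, largest first."""
    scores = {name: chi_square_2xk(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Comparing the scores within each of the three sets (type, options, provisions) gives a rough sense of whether one construct dominates the solution.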

HTH
Paul
