Cluster analysis for binary data

Cluster analysis for binary data

Kuramura
Dear All,

I am trying to run a cluster analysis on 305 cases with 44 variables. All 44
variables are binary (coded 1 or 0). Could you please suggest which
cluster analysis method would be suitable for such data?

Thank you.

Kuramura

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Cluster analysis for binary data

Rajeshms
Hi,

IBM SPSS offers TwoStep, K-Means and hierarchical cluster analysis.

For your data, TwoStep or hierarchical cluster analysis would be appropriate.
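
To make the hierarchical option concrete, here is a minimal sketch in plain Python, not SPSS output. It assumes Jaccard distance between cases and average linkage between clusters; both are illustrative choices rather than a recommendation, and the five 6-item cases are made up.

```python
def jaccard_distance(x, y):
    """1 minus (items both cases endorse) / (items at least one case endorses)."""
    both = sum(1 for xi, yi in zip(x, y) if xi and yi)
    either = sum(1 for xi, yi in zip(x, y) if xi or yi)
    return 1 - both / either if either else 0.0


def average_linkage(cases, n_clusters):
    """Agglomerate cases until n_clusters remain; returns lists of case indices."""
    clusters = [[i] for i in range(len(cases))]

    def dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs of cases.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard_distance(cases[i], cases[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, a, b = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical data: five cases, six binary items each.
cases = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
result = average_linkage(cases, n_clusters=2)
print(result)
```

With these made-up cases, the ones that mostly endorse the first three items end up in one cluster and the ones that endorse the last items in the other. A different distance measure or linkage rule could give a different solution.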




--
Rajesh M S




Re: Cluster analysis for binary data

David Marso
Administrator
In reply to this post by Kuramura
Note that SPSS CLUSTER provides a HUGE number of distance measures (26 of which appear in the dropdown as appropriate for binary data) and seven different clustering methods. It is pretty much impossible to recommend anything given only the information that the variables are nominal.
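
To show why the choice of measure matters, here is a small sketch in plain Python (not SPSS syntax) that computes three well-known binary similarity coefficients for the same pair of cases: simple matching, Jaccard and Dice. The two 8-item cases are made up, and the function names are my own.

```python
def binary_counts(x, y):
    """Return the 2x2 counts (a, b, c, d) for two 0/1 vectors.

    a: both 1, b: x=1 and y=0, c: x=0 and y=1, d: both 0.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def simple_matching(x, y):
    # Joint absences (d) count as agreement.
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    # Joint absences are ignored.
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def dice(x, y):
    # Joint absences are ignored; joint presences get double weight.
    a, b, c, _ = binary_counts(x, y)
    return 2 * a / (2 * a + b + c) if (2 * a + b + c) else 0.0


# Hypothetical cases in which most items are not endorsed by either case.
case_1 = [1, 0, 0, 0, 0, 0, 1, 0]
case_2 = [1, 0, 0, 0, 0, 0, 0, 1]

print(simple_matching(case_1, case_2))  # 0.75
print(jaccard(case_1, case_2))          # 0.333...
print(dice(case_1, case_2))             # 0.5
```

The same pair of cases looks quite similar or quite dissimilar depending on whether shared zeros are treated as agreement, which is a substantive question about what a 0 means in the data.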

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Cluster analysis for binary data

Rich Ulrich
I have always seen more benefit, for my data, in using factor
analysis instead of cluster analysis.  Dichotomous items raise
some problems for factoring which do not disappear for clusters.

In particular, how extreme an item's proportion is sets a limit on
how large its correlation with another item can be.  Similar limits
or problems exist for other distance measures.
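
The limit on the correlation can be worked out directly. This is a minimal sketch in plain Python, assuming the correlation in question is the phi coefficient (Pearson r between two 0/1 items); the function name and the example proportions are made up. The maximum is reached when every case that endorses the rarer item also endorses the more common one.

```python
import math


def max_phi(p1, p2):
    """Largest attainable phi between two 0/1 items with means p1 and p2."""
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must be strictly between 0 and 1")
    lo, hi = sorted((p1, p2))
    # With the joint proportion at its ceiling min(p1, p2), phi simplifies to
    # sqrt(lo * (1 - hi) / (hi * (1 - lo))).
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))


print(max_phi(0.5, 0.5))  # 1.0
print(max_phi(0.1, 0.5))  # about 0.33
```

So an item endorsed by 10% of cases cannot correlate above about 0.33 with an item endorsed by 50%, however closely related the two are in substance.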

Because of that, if you do a factor analysis with 44 correlated
0/1 variables, the factors will (tend to) break out according to
the item means.  I have had data where I said, "That's okay. I will
use a factor analysis with 44 variables and derive 15 to 20 factors
with 2 or 3 items each;  score up the 15-20 factors as simple totals
for the items; and carry out a new factor analysis on the 15-20
totals in order to obtain definitions for 4 or 5 new totals."

Then the 5 new scores would be my covariates.  If I were going
to do a cluster analysis, I would take those steps so that I could
use those reduced scores for the clustering.

--
Rich Ulrich


Re: Cluster analysis for binary data

Rich Ulrich
Hector,
First - Principal Components is worth looking at for the first step, since
one purpose is to include all the items.  PCA generally produces more
factors than PFA, and it is more likely to include all the items that have
extreme proportions (and therefore, generally smaller covariances).

As to scoring factors:  I'm not sure that I follow what you are suggesting
to weight, but here is my reaction. I have generally created scores as the
simple sum or average of items, to take advantage of "length of scale" for
creating a robust score.  A scale with 10 items, but with three of them weighted
heavily, will have the generally lower reliability that you would expect of a
3- or 4-item scale.  An equally weighted scale with 10 items is expected to be more reliable.
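
The "length of scale" effect can be illustrated with the Spearman-Brown formula. This is a sketch in plain Python under simplifying assumptions that are mine, not from the data discussed here: equally weighted items, all with the same average inter-item correlation, set to a made-up 0.2.

```python
def spearman_brown(mean_inter_item_r, n_items):
    """Projected reliability of an equally weighted scale of n_items items."""
    r = mean_inter_item_r
    return n_items * r / (1 + (n_items - 1) * r)


r = 0.2  # hypothetical average inter-item correlation
print(round(spearman_brown(r, 3), 3))   # 0.429
print(round(spearman_brown(r, 10), 3))  # 0.714
```

Under these assumptions a 10-item scale is noticeably more reliable than a 3-item one, which is the advantage that heavy weighting of a few items tends to give away.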

On the other hand, that rule is not hard-and-fast.

A binary item that is rarely endorsed will have low variance, and perhaps
should count for more.  So that is one exception.  The other major
exception is mainly for something like an overall Total composite,
when the selection of items seems unbalanced relative to the meaning that we
want to derive from the scale.  An example:  If there turn out to be three
sub-scales, with 20 items, 6 items, and 6 items, I might argue to create
the Total as the average of the 3 sub-scale average-item-scores, rather
than use the average of the 32 items.
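
The difference between the two Totals is easy to see with a made-up respondent. This is a sketch in plain Python; the 20/6/6 split comes from the example above, while the responses and function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def total_item_average(subscales):
    """Average over all items, so longer sub-scales dominate the Total."""
    all_items = [item for scale in subscales for item in scale]
    return mean(all_items)


def total_subscale_average(subscales):
    """Average of the sub-scale averages, so each sub-scale counts equally."""
    return mean([mean(scale) for scale in subscales])


# Hypothetical respondent: endorses every item on the 20-item sub-scale
# and no items on the two 6-item sub-scales.
respondent = [[1] * 20, [0] * 6, [0] * 6]

print(total_item_average(respondent))      # 0.625
print(total_subscale_average(respondent))  # 0.333...
```

With the 32-item average this respondent scores well above the midpoint purely on the strength of the long sub-scale; averaging the three sub-scale scores gives each sub-scale an equal say.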

--
Rich Ulrich


From: [hidden email]
To: [hidden email]; [hidden email]
Subject: RE: Cluster analysis for binary data
Date: Fri, 24 Feb 2012 00:43:59 -0300

Rich,

I’ve done on occasion something similar (factor analysis of binary data, then adding the factor scores), but with a twist: I weighted the factor scores according to the contribution of each factor to total explained variance (in my case 100% of the variance was “explained” because I used Principal Components, but this is not the point here). Thus a minor factor explaining, say, 3% of the variance would receive less weight than the first factor which explains perhaps 40%. What do you think of such an approach?

 

Hector

[snip, previous]

Re: Cluster analysis for binary data

news
In reply to this post by Kuramura
Kuramura,

You need to look for an appropriate dissimilarity coefficient. Jochen
Bacher published a 196-page set of lecture notes on cluster analysis from
the 2002 ZA Spring Seminar at Cologne University, which also explains the
pros and cons of the different dissimilarity coefficients. The text can be
downloaded legitimately at
http://www.clusteranalyse.net/sonstiges/zaspringseminar2002/lecturenotes.pdf

HTH
Dr Frank Thomas
FTR Internet Research
Rosny-sous-Bois
France

