SPSSX Discussion - Re: Cluster analysis for binary data

Re: Cluster analysis for binary data

Posted by Rich Ulrich on
URL: http://spssx-discussion.165.s1.nabble.com/Cluster-analysis-for-binary-data-tp5508286p5511121.html

I have always seen more benefit, for my data, in using factor
analysis instead of cluster analysis. Dichotomous items raise
some problems for factoring that do not disappear for clustering.

In particular: how extreme an item's proportion is sets the limit
on how large its correlation can be with an item that has a
different proportion. Similar limits or problems exist for other
distance measures.
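
To make that limit concrete, here is a minimal sketch in Python (not SPSS
syntax) of the largest Pearson (phi) correlation two 0/1 items can reach,
given only their proportions of 1s. The function name phi_max and the
example proportions are made up for illustration. The bound is reached when
every case scoring 1 on the rarer item also scores 1 on the more common one.

```python
import math


def phi_max(p1, p2):
    """Largest possible phi (Pearson r) between two 0/1 items.

    p1 and p2 are the item means (proportion of 1s), each strictly
    between 0 and 1. Hypothetical helper, for illustration only.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    lo, hi = sorted((p1, p2))
    # Maximum overlap: all 1s on the rarer item are also 1s on the other.
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))


if __name__ == "__main__":
    # Illustrative proportions, not taken from any real data set.
    for pair in [(0.5, 0.5), (0.1, 0.5), (0.1, 0.9)]:
        print(pair, round(phi_max(*pair), 3))
```

Two items with equal proportions can correlate at 1.0, but an item endorsed
by 10% of cases and one endorsed by 50% cannot correlate above about 0.33,
however strongly they are related.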

Because of that, if you do a factor analysis with 44 correlated
0/1 variables, the factors will (tend to) break out according to
the item means. I have had data where I said, "That's okay. I will
use a factor analysis with 44 variables and derive 15 to 20 factors
with 2 or 3 items each; score up the 15-20 factors as simple totals
of the items; and carry out a new factor analysis on the 15-20
totals in order to obtain definitions for 4 or 5 new totals."

Then the 4 or 5 new scores would be my covariates. If I were going
to do a cluster analysis, I would take those same steps so that I
could use the reduced scores for the clustering.
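
The "score up as simple totals" step can be sketched as below, in Python
rather than SPSS syntax. This covers only the scoring; the factor analyses
that decide which items belong together are not shown. The function name
score_totals, the item groupings, and the tiny data set are all
hypothetical.

```python
def score_totals(cases, groups):
    """Sum each case's 0/1 items within each group of item indices.

    cases  : list of rows, each a list of 0/1 item responses
    groups : list of lists of column indices, one list per factor
             (hypothetical groupings; in practice they would come
             from the first-stage factor analysis)
    Returns one row per case with one simple total per group.
    """
    return [[sum(row[i] for i in idx) for idx in groups] for row in cases]


if __name__ == "__main__":
    # Made-up example: 3 cases, 6 binary items, 3 groups of 2 items each.
    cases = [
        [1, 0, 1, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [1, 1, 0, 0, 0, 1],
    ]
    groups = [[0, 1], [2, 3], [4, 5]]
    for totals in score_totals(cases, groups):
        print(totals)
```

The totals take values from 0 up to the number of items in the group, so
they are less extreme than single 0/1 items. The same function could be
applied again to the second-stage groupings to get the final 4 or 5 scores.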

--
Rich Ulrich

> Date: Thu, 23 Feb 2012 10:51:08 -0800
> From: [hidden email]
> Subject: Re: Cluster analysis for binary data
> To: [hidden email]
>
> Note that SPSS CLUSTER provides a HUGE number of distance measures (26 of
> which appear in the dropdown as appropriate for binary data) and seven
> different clustering methods. Pretty much impossible to recommend anything
> with simply the information that the variables are nominal.
>
>
> Kuramura wrote
> >
> > Dear All,
> >
> > I am trying to do a cluster analysis for 305 cases with 44 variables.
> > All 44 variables are nominal data (1 or 0). Would you please suggest
> > which cluster analysis method would be suitable for such data?
> >

[snip]