Cluster Analysis - K means or hierarchical?


Shailaja Rego
I need help with my research. I have 24 variables and 290 valid cases for cluster analysis. When I perform hierarchical clustering (Ward's method), it gives me three almost perfect clusters (85, 90 and 115 cases), and the ANOVA null hypothesis of equal cluster means is rejected for each of the 24 variables. I have also performed discriminant analysis, which gave me 91.5% accuracy for the original grouped cases and 90.5% for the cross-validated (leave-one-out) cases.

My problem is that when I try to run a k-means cluster analysis for 3 clusters, it does not give me proper results (286 cases in one cluster and 2 in each of the other two).

Even a two-cluster k-means solution is not proper (2 cases in one cluster and all the others in the other).

Can anyone tell me what's the problem?

I am doing this for my PhD thesis, and my guide insists on using k-means clustering, saying it is more robust. Is this true? Is it not possible for me to use hierarchical cluster analysis?

Please help me.

Thanks in advance

Shailaja
Re: Cluster Analysis - K means or hierarchical?

Jon K Peck
Try using the cluster centers from the hierarchical solution as the k-means starting values. K-means is sensitive to its starting values. Reordering the cases randomly may also help.
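For illustration only, here is a rough sketch of that idea in Python with NumPy and SciPy rather than SPSS syntax. Everything in it is an assumption made for the example: the function names `ward_seeded_kmeans` and `make_demo_data` are hypothetical, and the data are synthetic, well-separated blobs that only borrow the 85/90/115 sizes and 24 variables from the question. It cuts a Ward tree at k clusters, averages the cases in each cluster to get seed centers, and passes those seeds to k-means as its starting values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2


def ward_seeded_kmeans(X, k):
    """Run k-means starting from the centroids of a Ward hierarchical solution.

    Returns (ward_labels, kmeans_labels, kmeans_centers).
    """
    X = np.asarray(X, dtype=float)

    # Step 1: Ward hierarchical clustering, cut so that at most k clusters remain.
    tree = linkage(X, method="ward")
    ward_labels = fcluster(tree, t=k, criterion="maxclust")  # labels start at 1

    # Step 2: the mean of each Ward cluster becomes one k-means seed.
    seeds = np.vstack(
        [X[ward_labels == c].mean(axis=0) for c in np.unique(ward_labels)]
    )

    # Step 3: k-means with explicit starting centers (minit="matrix" tells
    # kmeans2 to treat the second argument as the initial centroids).
    centers, kmeans_labels = kmeans2(X, seeds, minit="matrix")
    return ward_labels, kmeans_labels, centers


def make_demo_data(seed=0):
    """Synthetic stand-in data: three well-separated groups on 24 variables."""
    rng = np.random.default_rng(seed)
    sizes = (85, 90, 115)
    offsets = (0.0, 10.0, 20.0)
    return np.vstack(
        [rng.normal(loc=m, scale=1.0, size=(n, 24)) for n, m in zip(sizes, offsets)]
    )


X = make_demo_data()
ward_labels, kmeans_labels, centers = ward_seeded_kmeans(X, k=3)
print("Ward cluster sizes:   ", np.bincount(ward_labels)[1:])
print("k-means cluster sizes:", np.bincount(kmeans_labels))
```

On real data the two solutions need not agree this cleanly, so it is worth cross-tabulating the Ward labels against the k-means labels to see how many cases move.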

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
new phone: 720-342-5621




From:        Shailaja Rego <[hidden email]>
To:        [hidden email]
Date:        06/20/2012 07:36 AM
Subject:        [SPSSX-L] Cluster Analysis - K means or hierarchical?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




