|
I am running a K-means cluster analysis with 12 clusters, and I keep getting different clusters. For example, yesterday Cluster 2 had two cases in it. Today Cluster 2 has one case in it. Any ideas about why this is the case?
Thanks in advance for any advice that anyone can offer.
Best,
Courtney Cronley, Ph.D.
Postdoctoral Associate
Center of Alcohol Studies
Rutgers University
[hidden email]
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
K-means is very dependent on the order of the cases. One should expect
slightly different results when a few cases are added or omitted, or when
the sort order of the cases changes.
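As a rough illustration of this order dependence, here is a minimal Python sketch that reruns K-means on randomly reordered copies of the same cases and measures how well the resulting memberships agree. It is not SPSS's algorithm: the function names are hypothetical, and seeding with the first k cases is an assumption used only as a simple stand-in for any order-sensitive choice of starting centers.

```python
import random
from itertools import combinations


def kmeans_first_k(cases, k, max_iter=100):
    """Plain k-means seeded with the first k cases (an illustrative assumption)."""
    centers = [list(c) for c in cases[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every case to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((x - m) ** 2 for x, m in zip(case, centers[j])))
            for case in cases
        ]
        if new_labels == labels:  # memberships stopped changing
            break
        labels = new_labels
        # Move each center to the mean of its members; empty clusters keep their center.
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pair_agreement(labels_a, labels_b):
    """Rand index: share of case pairs that both solutions treat the same way.

    Cluster numbers themselves are ignored, so relabelled clusters still agree.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    same = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return same / len(pairs)


def stability_check(cases, k, n_orders=5, seed=0):
    """Compare the original-order solution with solutions from shuffled orders."""
    rng = random.Random(seed)
    baseline = kmeans_first_k(cases, k)
    scores = []
    for _ in range(n_orders):
        order = list(range(len(cases)))
        rng.shuffle(order)
        shuffled_labels = kmeans_first_k([cases[i] for i in order], k)
        # Put the labels back into the original case order before comparing.
        restored = [None] * len(cases)
        for position, case_id in enumerate(order):
            restored[case_id] = shuffled_labels[position]
        scores.append(pair_agreement(baseline, restored))
    return scores
```

Scores of 1.0 mean a reordered run reproduced the same grouping of cases; scores noticeably below 1.0 suggest the solution depends on case order. Comparing pairs of cases rather than cluster numbers matters because the same group can come out as "Cluster 2" in one run and "Cluster 7" in another.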
That is why I have long recommended that users sort the cases into a few random orders and check the consistency of the assigned cluster memberships.

Are you getting different results with identical input?

Also, TWOSTEP is a step up from the older K-means approach.

What is the nature of your data? What kinds of variables do you have? How independent of one another are the variables? What kind of entity is a case?

(Replying to Courtney M. Cronley's post of 5/17/2010, 10:15 AM.)
Art Kendall
Social Research Consultants |
|
Another approach is to use both hierarchical clustering and
K-means (a tandem approach). Run a hierarchical analysis (Ward's method is a
common choice) and use the resulting group centroids as the starting seeds for
K-means.
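The tandem approach described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Ward step is a naive implementation suited to small data sets, and in practice one would rely on a statistics package rather than hand-written code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_centroids(cases, k):
    """Naive Ward agglomeration down to k clusters; returns their centroids.

    At each step, merge the two clusters whose union adds the least to the
    within-cluster sum of squares: |A||B| / (|A| + |B|) * ||mean_A - mean_B||^2.
    """
    clusters = [[list(c)] for c in cases]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(centroid(a), centroid(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans_seeded(cases, seeds, max_iter=100):
    """K-means that starts from the supplied seed centers."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(case, centers[j]))
            for case in cases
        ]
        if new_labels == labels:  # memberships stopped changing
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    return labels, centers


def tandem_cluster(cases, k):
    """Hierarchical (Ward) seeds followed by K-means refinement."""
    return kmeans_seeded(cases, ward_centroids(cases, k))
```

Because the seeds come from the hierarchical solution rather than from whichever cases happen to be listed first, the starting centers depend far less on sort order (apart from how ties are broken). The naive Ward loop is slow for large files, which is one reason the hierarchical step is often run on a sample.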
|
|
I would add to the fine advice so far that you can increase the maximum number of iterations in K-means (say, from the default of 10 to 20). This can help reduce the sort-order dependence by giving the procedure more room to converge.
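To show what the iteration limit does, here is a small Python sketch of K-means that reports whether it stopped because memberships settled or because it ran out of iterations. The function name is hypothetical, the default of 10 simply mirrors the figure mentioned above, and stopping when no case changes cluster is an assumed, simplified convergence rule rather than the one SPSS applies.

```python
def kmeans_capped(cases, seeds, max_iter=10):
    """K-means from given seeds; returns (labels, iterations_used, converged)."""
    centers = [list(s) for s in seeds]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assign each case to the nearest current center.
        new_labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((x - m) ** 2 for x, m in zip(case, centers[j])),
            )
            for case in cases
        ]
        if new_labels == labels:
            # No case changed cluster: the solution has settled.
            return labels, iteration, True
        labels = new_labels
        # Recompute each center as the mean of its current members.
        for j in range(len(centers)):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # The limit was reached before memberships were confirmed stable.
    return labels, max_iter, False
```

If the third returned value is False, the run was cut off and raising the limit is worth trying. A run that does converge can still depend on its starting centers, so this complements the random-reordering check rather than replacing it.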
Best,
Keith
www.keithmccormick.com

On Mon, May 17, 2010 at 12:34 PM, john wurst <[hidden email]> wrote:
|
|
In reply to this post by john wurst
Art
On 5/17/2010 12:34 PM, john wurst wrote:
Art Kendall
Social Research Consultants |
