K-Means Cluster Analysis


K-Means Cluster Analysis

Courtney M. Cronley
I am running a K-means cluster analysis with 12 clusters, and I keep getting different clusters. For example, yesterday Cluster 2 had two cases in it. Today Cluster 2 has one case in it. Any ideas about why this is the case?

Thanks in advance for any advice that anyone can offer.

Best,

Courtney Cronley, Ph.D.
Postdoctoral Associate
Center of Alcohol Studies
Rutgers University
[hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: K-Means Cluster Analysis

Art Kendall
K-means is very dependent on the order of the cases. Expect slightly different results when a few cases are added or omitted, or when the sort order of the cases changes.
That is why I have long recommended that users sort the cases into a few random orders and check the consistency of the assigned cluster memberships.
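
The consistency check described above can be sketched outside SPSS. This is a minimal illustration in plain Python, not SPSS's procedure: the `kmeans`, `rand_index`, and `stability` helpers and the toy `data` are hypothetical, and seeding with the first k cases is an assumption chosen only to make the order dependence visible. Agreement between two runs is measured with the Rand index, which ignores how the clusters happen to be numbered.

```python
import random

def kmeans(points, k, max_iter=10):
    """Plain k-means seeded with the first k cases (an assumption made to
    expose order dependence; not SPSS's exact seeding rule)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each case to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def rand_index(a, b):
    """Share of case pairs on which two partitions agree (same vs. different cluster)."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def stability(points, k, n_orders=5, seed=0):
    """Re-cluster under several random case orders and compare with the original order."""
    rng = random.Random(seed)
    base = kmeans(points, k)
    scores = []
    for _ in range(n_orders):
        order = list(range(len(points)))
        rng.shuffle(order)
        shuffled_labels = kmeans([points[i] for i in order], k)
        # Put the labels back into the original case order before comparing.
        restored = [None] * len(points)
        for pos, i in enumerate(order):
            restored[i] = shuffled_labels[pos]
        scores.append(rand_index(base, restored))
    return scores

# Hypothetical toy data: three loose groups of three cases each.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
        (5.0, 5.1), (5.2, 4.9), (4.8, 5.0),
        (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
scores = stability(data, k=3)
print(scores)
```

Scores near 1.0 across the shuffles suggest the memberships are stable; noticeably lower scores suggest the solution depends on case order.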


 Are you getting different results with identical input?

Also TWOSTEP is a step up from the older k-means approach.

What is the nature of your data? What kinds of variables? How independent are the variables of one another? What kind of entity is a case?

Art Kendall
Social Research Consultants


Re: K-Means Cluster Analysis

john wurst
Another approach is to use both hierarchical and K-means clustering (the tandem approach). Run a hierarchical analysis (Ward's is a common method) and use the group centroids as the starting seeds for K-means.
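
The tandem approach above can be sketched in a few lines. This is a rough plain-Python illustration under stated assumptions, not SPSS syntax: `ward_clusters` is a naive implementation of Ward's merge criterion suitable only for small data, and the helper names and toy `data` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(members) for col in zip(*members)]

def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion: repeatedly merge
    the pair of clusters whose union least increases the within-cluster
    sum of squares, until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward's merge cost: |A||B| / (|A|+|B|) * ||mean_A - mean_B||^2
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans_from_seeds(points, seeds, max_iter=10):
    """K-means that starts from supplied seed centers instead of picking its own."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers

# Hypothetical toy data: three loose groups of three cases each.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
        (5.0, 5.1), (5.2, 4.9), (4.8, 5.0),
        (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]

# Step 1: hierarchical (Ward's) solution; step 2: its centroids seed K-means.
seeds = [centroid(c) for c in ward_clusters(data, 3)]
labels, centers = kmeans_from_seeds(data, seeds)
print(labels)
```

Because the seeds come from the hierarchical solution rather than from whichever cases happen to be read first, the K-means step should be less sensitive to the order of the cases.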

Re: K-Means Cluster Analysis

Keith McCormick
I would add to the fine advice so far that you can increase the maximum number of iterations in K-means (maybe from the default of 10 to 20). This should help reduce the dependence on sort order.
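
To illustrate what the iteration limit does, here is a small one-dimensional sketch in plain Python. It is not SPSS's implementation; `kmeans_1d`, the toy `values`, and the starting `seeds` are hypothetical. The function reports whether the centers stopped moving before the cap was reached.

```python
def kmeans_1d(values, centers, max_iter):
    """One-dimensional k-means; returns (centers, iterations_used, converged)."""
    centers = list(centers)
    for it in range(1, max_iter + 1):
        # Assign each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Recompute centers; an empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            return centers, it, True   # centers stopped moving
        centers = new_centers
    return centers, max_iter, False    # hit the cap before settling

values = [1, 2, 3, 10, 11, 12, 20, 21, 22]
seeds = [1, 10, 11]  # deliberately poor starting centers

capped = kmeans_1d(values, seeds, max_iter=1)
full = kmeans_1d(values, seeds, max_iter=20)
print(capped)
print(full)
```

A run that stops at the cap returns whatever centers it had reached at that point, so two case orders can end at different intermediate states. A higher cap lets each run settle, although it cannot by itself repair a poor set of starting centers.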

Best,

Keith
www.keithmccormick.com


Re: K-Means Cluster Analysis

Art Kendall
In reply to this post by john wurst
That is very close to what the 2 steps are in TWOSTEP.

Art

Art Kendall
Social Research Consultants