SPSSX Discussion

K-Means Cluster Analysis

Courtney M. Cronley

K-Means Cluster Analysis

I am running a K-means cluster analysis with 12 clusters, and I keep getting different clusters. For example, yesterday Cluster 2 had two cases in it. Today Cluster 2 has one case in it. Any ideas about why this is the case?

Thanks in advance for any advice that anyone can offer.

Best,

Courtney Cronley, Ph.D.
Postdoctoral Associate
Center of Alcohol Studies
Rutgers University
[hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: K-Means Cluster Analysis

K-means is very dependent on the order of the cases. Expect slightly different results when a few cases are added or omitted, or when the sort order of the cases changes.
That is why I have long recommended that users sort the cases into a few random orders and check the consistency of the assigned cluster memberships.
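
The check described above can be sketched in Python as follows. This is a minimal illustration, not SPSS's QUICK CLUSTER procedure: the helper names (`kmeans`, `rand_index`, `order_stability`) are hypothetical, and seeding from the first k cases in file order is a simplifying assumption chosen to make the order dependence visible.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=50):
    """Plain k-means; ASSUMPTION: the first k cases in file order are the seeds."""
    centers = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # memberships stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Share of case pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def order_stability(points, k, n_orders=5, seed=0):
    """Re-cluster under random case orders and compare each run with the original order."""
    rng = random.Random(seed)
    baseline = kmeans(points, k)
    scores = []
    for _ in range(n_orders):
        order = list(range(len(points)))
        rng.shuffle(order)
        shuffled_labels = kmeans([points[i] for i in order], k)
        # Map memberships back to the original case positions before comparing.
        labels = [None] * len(points)
        for pos, i in enumerate(order):
            labels[i] = shuffled_labels[pos]
        scores.append(rand_index(baseline, labels))
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up data: 60 cases on two variables, for illustration only.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    print(order_stability(data, k=4))
```

Scores near 1.0 mean the memberships are essentially the same in every order; noticeably lower scores suggest the solution depends on how the file happens to be sorted. The Rand index is used here because cluster numbers are arbitrary labels, so "Cluster 2" in one run need not be "Cluster 2" in the next.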

Are you getting different results with identical input?

Also, TWOSTEP is a step up from the older k-means approach.

What is the nature of your data? What kinds of variables do you have? How independent are the variables of one another? What kind of entity is a case?

Art Kendall
Social Research Consultants

On 5/17/2010 10:15 AM, Courtney M. Cronley wrote:

I am running a K-means cluster analysis with 12 clusters, and I keep getting different clusters. For example, yesterday Cluster 2 had two cases in it. Today Cluster 2 has one case in it. Any ideas about why this is the case?

Thanks in advance for any advice that anyone can offer.

Best,

Courtney Cronley, Ph.D.
Postdoctoral Associate
Center of Alcohol Studies
Rutgers University
[hidden email]

john wurst

Re: K-Means Cluster Analysis

Another approach is to use both hierarchical and K-means clustering (a tandem approach). Run a hierarchical analysis (Ward's method is a common choice) and use the resulting group centroids as the starting seeds for K-means.
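
A rough Python sketch of this tandem idea, for illustration only: `ward_centroids` is a hypothetical helper that runs a naive agglomerative clustering with Ward's merge criterion, and its k centroids are then passed to a plain k-means as seeds. It is not what any SPSS procedure does internally, and the naive merge loop is only practical for small files.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members):
    """Mean of a list of cases, variable by variable."""
    return [sum(col) / len(members) for col in zip(*members)]


def ward_centroids(points, k):
    """Naive agglomerative clustering with Ward's criterion; returns k centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward: increase in within-cluster sum of squares if a and b merge.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans(points, seeds, max_iter=50):
    """Plain k-means started from the supplied seed centers."""
    centers = [list(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # memberships stopped changing
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    return labels, centers


if __name__ == "__main__":
    # Made-up cases, for illustration only.
    cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = ward_centroids(cases, k=2)
    labels, centers = kmeans(cases, seeds)
    print(labels, centers)
```

Because the seeds come from the hierarchical solution rather than from whichever cases appear first in the file, the k-means step should be less sensitive to sort order, although ties in the merge step can still be broken by case order.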

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Art Kendall
Sent: Monday, May 17, 2010 11:54 AM
To: [hidden email]
Subject: Re: K-Means Cluster Analysis

K-means is very dependent on the order of the cases. Expect slightly different results when a few cases are added or omitted, or when the sort order of the cases changes.
That is why I have long recommended that users sort the cases into a few random orders and check the consistency of the assigned cluster memberships.

Are you getting different results with identical input?

Also, TWOSTEP is a step up from the older k-means approach.

What is the nature of your data? What kinds of variables do you have? How independent are the variables of one another? What kind of entity is a case?

Art Kendall
Social Research Consultants

Keith McCormick

Re: K-Means Cluster Analysis

I would add to the fine advice so far that you can increase the maximum number of iterations in K-means (perhaps from the default of 10 to 20). This gives the solution more room to converge, which helps reduce the dependence on sort order.
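
To see why the iteration limit matters, here is a small Python sketch (not SPSS code) of a k-means that reports whether the memberships stopped changing before the limit was reached. The function name, its return values, and the use of the first k cases as seeds are all assumptions made for the illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=10):
    """Return (labels, iterations_used, converged).

    ASSUMPTION: the first k cases in file order are used as seeds.
    'converged' is True only if an iteration left every membership unchanged.
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            return labels, iteration, True
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # The limit was reached while memberships were still changing.
    return labels, max_iter, False


if __name__ == "__main__":
    # Made-up cases, for illustration only.
    cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for limit in (1, 10, 20):
        labels, used, converged = kmeans(cases, k=2, max_iter=limit)
        print(limit, used, converged, labels)
```

If a run stops at the limit without converging, the reported clusters are an intermediate state that still reflects the starting seeds, so a different case order can leave it somewhere else. Raising the limit until the run converges removes that one source of instability, though a converged solution can still differ between orders.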

Best,

Keith
www.keithmccormick.com

On Mon, May 17, 2010 at 12:34 PM, john wurst <[hidden email]> wrote:

Another approach is to use both hierarchical and K-means clustering (a tandem approach). Run a hierarchical analysis (Ward's method is a common choice) and use the resulting group centroids as the starting seeds for K-means.

Art Kendall

Re: K-Means Cluster Analysis

In reply to this post by john wurst

That is very close to what the 2 steps are in TWOSTEP.

Art
