SPSSX Discussion - Re: Validity of K Means Cluster

Posted by Hector Maletta on Aug 08, 2011; 1:05pm
URL: http://spssx-discussion.165.s1.nabble.com/Interrater-reliability-tp4663573p4677650.html

Clusters, obtained by k-means or otherwise, are not meant to be “reliable” in themselves. In the case of k-means, as Art Kendall observes, reordering the cases may alter the results, because the initial cases are taken as initial cluster centers, and the other cases are added sequentially to the various clusters. A practical way to implement Art’s suggestion would be to save the cluster allocation from the first run, sort the cases in a different order, re-run the procedure, save the allocation again, and then compare the two results. Ideally the two allocations should agree almost perfectly; since cluster numbers are arbitrary labels that may be permuted between runs, a cross-tabulation shows the agreement more clearly than a correlation coefficient.
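As a rough illustration outside SPSS, the reorder-and-compare check could be sketched in Python. The k-means here seeds its centers from the first k cases, which is a simplifying assumption and not QUICK CLUSTER's actual initialization; the case ids and values are made up, and `kmeans`, `crosstab` and `same_partition` are hypothetical helper names.

```python
import random
from collections import Counter

def nearest(p, centers):
    # Index of the center closest to p (squared Euclidean distance).
    return min(range(len(centers)),
               key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))

def kmeans(points, k, max_iter=10):
    # Assumption for illustration: the first k cases seed the centers,
    # so the result can depend on case order.
    centers = [list(p) for p in points[:k]]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return [nearest(p, centers) for p in points]

def crosstab(labels_a, labels_b):
    # Count of cases for every (label in run A, label in run B) pair.
    return Counter(zip(labels_a, labels_b))

def same_partition(labels_a, labels_b):
    # True when the two runs group the cases identically,
    # even if the cluster numbers are permuted.
    pairs = set(zip(labels_a, labels_b))
    return len(pairs) == len(set(labels_a)) == len(set(labels_b))

# Hypothetical data: case id -> two clustering variables.
cases = {"a": (1.0, 1.0), "b": (1.2, 0.8), "c": (0.9, 1.1),
         "d": (5.0, 5.1), "e": (5.2, 4.9), "f": (4.8, 5.0),
         "g": (9.0, 1.0), "h": (9.1, 1.2), "i": (8.9, 0.9)}

ids = list(cases)
run1 = dict(zip(ids, kmeans([cases[i] for i in ids], 3)))

shuffled = ids[:]
random.Random(1).shuffle(shuffled)  # fixed seed so the reordering is repeatable
run2 = dict(zip(shuffled, kmeans([cases[i] for i in shuffled], 3)))

# Compare the two saved allocations case by case, matched on case id.
a = [run1[i] for i in ids]
b = [run2[i] for i in ids]
for (la, lb), n in sorted(crosstab(a, b).items()):
    print(f"run1={la} run2={lb} cases={n}")
print("same grouping:", same_partition(a, b))
```

A stable solution shows one non-empty cell per row and per column of the cross-tabulation; cases scattered across several cells mark the part of the solution that depends on case order.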

Another possible meaning of “reliable” could be the explanatory power of the clusters with respect to some criterion variable. Suppose you cluster cases by location, occupation and education, and use income as a criterion. A good cluster solution should minimize intra-cluster variance and maximize inter-cluster variance in the criterion; you may therefore apply ANOVA, with income as the dependent variable and cluster membership as the factor, to each of the two solutions and watch for possible differences between them.
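To make the criterion-variable idea concrete, here is a minimal Python sketch of a one-way ANOVA computed by hand. The income figures and the two solutions are invented for illustration, and `one_way_anova` is a hypothetical helper, not an SPSS or library function. It returns the F ratio and eta squared (the share of the criterion's variance that lies between clusters).

```python
def one_way_anova(groups):
    """groups: dict mapping cluster label -> list of criterion values.

    Returns (F, eta_squared). Assumes at least two clusters and some
    within-cluster variation, otherwise F is undefined.
    """
    all_values = [v for vals in groups.values() for v in vals]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    ss_between = 0.0  # variation of cluster means around the grand mean
    ss_within = 0.0   # variation of cases around their own cluster mean
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, eta_sq

# Hypothetical income (criterion) values grouped by saved cluster membership,
# once for the original case order and once for the re-sorted run.
solution_a = {1: [21, 23, 22, 24], 2: [35, 37, 36], 3: [50, 52, 49]}
solution_b = {1: [21, 23, 35, 24], 2: [22, 37, 36], 3: [50, 52, 49]}

for name, groups in (("A", solution_a), ("B", solution_b)):
    f_stat, eta_sq = one_way_anova(groups)
    print(f"solution {name}: F = {f_stat:.2f}, eta squared = {eta_sq:.3f}")
```

The solution with the larger eta squared separates the criterion better; whether the difference between the two runs matters is a judgment call, not something the F ratio settles by itself.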

A further possibility is to vary the number of clusters: you requested three clusters in your syntax, but you may try four and see which solution is better suited to your purposes.
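One way to put numbers on the three-versus-four comparison, sketched in Python under the assumption that the cluster memberships from both runs have been saved: compute the total within-cluster sum of squares for each solution. The rows and memberships below are invented, and `within_ss` is a hypothetical helper.

```python
def within_ss(rows, labels):
    """Total squared deviation of each case from its own cluster centroid."""
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum((x - m) ** 2 for row in members for x, m in zip(row, centroid))
    return total

# Hypothetical cases (two clustering variables) and saved memberships
# from a 3-cluster and a 4-cluster run on the same cases.
rows = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9),
        (9.0, 1.0), (9.1, 1.2), (7.0, 3.0), (7.2, 3.1)]
k3 = [1, 1, 2, 2, 3, 3, 3, 3]
k4 = [1, 1, 2, 2, 3, 3, 4, 4]

wss3 = within_ss(rows, k3)
wss4 = within_ss(rows, k4)
print(f"within-cluster SS: k=3 -> {wss3:.2f}, k=4 -> {wss4:.2f}")
print(f"relative reduction from adding a cluster: {(wss3 - wss4) / wss3:.1%}")
```

At its optimum the within-cluster sum of squares never rises when a cluster is added, so the size of the drop, together with whether the extra cluster is interpretable, is what informs the choice.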

Clustering is not an “analytical” procedure but a “heuristic” one. Its analytical significance or usefulness should be judged by external criteria, or by how stable the solution is under variation in the clustering parameters or the ordering of cases.

Hector

From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Art Kendall
Sent: Monday, August 08, 2011 09:00
To: [hidden email]
Subject: Re: Validity of K Means Cluster

What does the word "reliable" mean to you?
How many cases do you have?
What is the nature of your data?
Are you sure that some other PROXIMITY measure might not be preferable?

I notice you did not save the cluster assignments. Since k-means is very dependent on case order, it is good practice to try several runs with the cases in random order.

Art Kendall
Social Research Consultants

On 8/5/2011 10:33 AM, Jeanne Eidex wrote:

Hi Everyone,

This might be an overly simple question, but the output of this simple clustering syntax doesn’t offer much information for judging how reliable these clusters are. Any suggestions?

QUICK CLUSTER q21 q22 q23 q24 q25 q26 q27 q28 q29 q30 q31 q32 q33 q34
  /MISSING=PAIRWISE
  /CRITERIA=CLUSTER(3) MXITER(10) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /PRINT INITIAL.

Thanks,

Jeanne
