We want to run a K-means cluster analysis and need to specify the number of clusters as input.
We first ran a hierarchical cluster analysis to determine the number of clusters. We copied the agglomeration schedule from the SPSS hierarchical cluster output into Excel and drew a scree plot, hoping to see a clear gap, but none is obvious. In this situation, what can one do to determine the number of clusters? In SAS, one can use statistics such as the pseudo F, t^2, and CCC, as well as the semipartial R-squared, to help judge the number of clusters, but I did not see such options in SPSS.
Thanks and have a good weekend!
Rongjin Guan
Rutgers SSW
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD
There are dozens of clustering criteria, both internal and
external. To begin with, read the Wikipedia article on cluster analysis
for an overview of this topic.
On the page http://www.spsstools.net/en/KO-spssmacros you'll find a collection of some of the most recommended internal clustering criteria; just use those macros. One of them, the Silhouette statistic, is also available as an extension command (by Jon Peck) which, if I recall correctly, has been added to the latest Statistics release.
The approach you describe (take the agglomeration schedule or the dendrogram and visually look for a "gap") is just one clustering criterion, and not a very good one. One reason is that you cannot use it to compare and choose between different agglomeration methods. Please read some warnings regarding the comparison of hierarchical cluster results: http://stats.stackexchange.com/a/63549/3277.
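To make the silhouette idea concrete, here is a minimal sketch in plain Python of how one might scan several values of k and pick the one with the highest mean silhouette. It is not the code of the SPSS macros or of the extension command. The function names, the synthetic three-group data, and the farthest-point initialisation of k-means are all assumptions made for illustration.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: centers are initialised by a farthest-point rule so the
    sketch is deterministic. Real software may initialise differently.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data (an assumption): three well-separated 2-D groups.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in true_centers
    for _ in range(20)
]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("best k:", best_k)
```

Unlike the visual "gap" in an agglomeration schedule, the same score can be computed for partitions produced by any method, which is what makes it usable for comparing solutions.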
Some people like to do this by using Ward's method with squared Euclidean distance in hierarchical clustering and then using the cluster centers from that solution as the starting point for k-means.
The STATS_CLUS_SIL extension command can be helpful in evaluating the clusters from any method. It is installed with the Python Essentials in recent releases, but if you don't already have it, you can get it from the SPSS Community site (old or new) via the Utilities menu. It appears on the Analyze > Classify > Cluster Silhouettes menu.
On Fri, Jan 8, 2016 at 3:48 PM, Kirill Orlov <[hidden email]> wrote: