
Re: Fitindices to determine optimal clustersolution

Posted by Jon K Peck on Apr 29, 2015; 5:16pm
URL: http://spssx-discussion.165.s1.nabble.com/Fitindices-to-determine-optimal-clustersolution-tp5729419p5729438.html

One tool that might give you added insight into your clustering solutions is cluster silhouettes.  These show the distribution of silhouette values for each cluster.  They can be produced by the STATS CLUS SIL extension command (Analyze > Classify > Cluster Silhouettes).  If you don't already have it installed and are running V22 or later, you can install it from the Utilities menu.  For older versions you would need to get it from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) in the Extension Commands collection.  It requires the Python Essentials, which are integrated into the Statistics install as of V22.
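
For anyone curious what a silhouette value measures, here is a minimal pure-Python sketch of the standard definition: for each case, s = (b - a) / max(a, b), where a is the mean distance to the other cases in its own cluster and b is the smallest mean distance to the cases of any other cluster. This is not the code behind STATS CLUS SIL. The function name `silhouette_values`, the use of Euclidean distance, and the convention of scoring singleton clusters as 0 are assumptions made for illustration.

```python
import math  # math.dist needs Python 3.8 or later


def silhouette_values(points, labels):
    """Return one silhouette value per case, using Euclidean distance.

    Illustrative sketch only; assumes at least two distinct cluster labels.
    """
    # Group case indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouettes need at least two clusters")

    values = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            values.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the case's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Tiny made-up data set: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1])
# Same data, but the second case is deliberately put in the wrong cluster.
bad = silhouette_values(points, [0, 1, 1, 1])
```

Values near 1 indicate a case that sits comfortably in its cluster, values near 0 a borderline case, and negative values a case that is closer on average to another cluster. Comparing the per-cluster distributions of these values across the candidate numbers of clusters is the kind of check the silhouette plots support.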


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        MaaikeSmits <[hidden email]>
To:        [hidden email]
Date:        04/29/2015 11:06 AM
Subject:        Re: [SPSSX-L] Fitindices to determine optimal clustersolution
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Hello,

Thank you for taking an interest in my question. I will try to provide the additional information you asked for.

From a total of 225 cases, 187 were included in the cluster analysis (20 cases were lost as a result of missing data on one or more of the 10 input variables and another 18 were excluded because they proved to be extreme outliers on one or more of the input variables).

I started with a hierarchical cluster analysis on these 187 cases, and the cluster means that resulted from this procedure were used as non-random starting points in the k-means cluster analysis, which was also done on these same 187 cases. So, I did not select subsamples for either the hierarchical or the k-means procedure, but ran both on the whole sample.
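
The two-stage procedure described above can be sketched in a few lines of pure Python: take the cluster memberships from the hierarchical step, compute each cluster's mean, and hand those means to k-means as fixed starting centers. This is only an illustration of the idea, not the SPSS implementation. The helper names (`mean_point`, `centers_from_labels`, `kmeans_seeded`), the Euclidean distance, and the toy data are all assumptions.

```python
import math  # math.dist needs Python 3.8 or later


def mean_point(rows):
    """Column-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def centers_from_labels(points, labels):
    """Cluster means from an existing partition (e.g. a hierarchical solution)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return [mean_point(groups[lab]) for lab in sorted(groups)]


def kmeans_seeded(points, init_centers, max_iter=100):
    """Plain k-means iterations started from the given (non-random) centers."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest current center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: recompute each center as the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = mean_point(members)
    return labels, centers


# Made-up standardized scores and a slightly imperfect hierarchical partition.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
hier_labels = [0, 0, 0, 1]
start = centers_from_labels(points, hier_labels)
final_labels, final_centers = kmeans_seeded(points, start)
```

In this toy example the k-means step moves the third case into the second cluster, which is the usual motivation for the refinement: the hierarchical solution supplies sensible starting points, and k-means is then free to reassign cases that the hierarchical merges placed poorly.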

The 10 (standardized) dimensional scores that were used as input variables for the cluster analysis were fairly unrelated: most correlations were below .1, with a few of .3 or .4.

I hope I have given you the relevant answers to be able to provide some guidance on my question. Of course I will be happy to provide more detailed information if necessary.

Kind Regards
Maaike





2015-04-29 17:08 GMT+02:00 Art Kendall [via SPSSX Discussion] <[hidden email]>:
How many cases do you have in the whole data set?

How were the cases selected?

Are your variables reasonably uncorrelated?

Am I reading correctly that you used the cluster profiles from the Ward method to start the k-means?

How many samples from the whole set of cases did you use for the Ward method?

How large were those samples?

Art Kendall
Social Research Consultants





===================== To manage your subscription to SPSSX-L, send a message to
LISTSERV@... (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD