I have several questions re clustering.
1. I was using the two-step cluster procedure to explore a dataset consisting of assessment scores for 490 people on 26 items. At home I have an old v.11 of the software and at work a more recent v.19. After trying to replicate at work the results of runs I did at home, I noticed that the solutions I get are somewhat different. The number of clusters that the algorithm finds in the dataset is the same (2), but the membership of cases changes. Does someone have an idea why this could be? And which version is better to use for clustering with two-step?

2. Two-step cluster in version 19 produces this incredibly funky output file, which alas is totally impenetrable and does not contain all the information I am used to seeing. I checked what kind of output the K-means procedure would produce in this version. It turned out to be your normal output that you can manipulate and transfer to other file types. What gives? And how can I get back to the old, boring but useful, output format when running the two-step in v.19?

3. Finally, as I mentioned, I am working with a dataset consisting of 490 cases that have scores on 26 variables. The data are not normally distributed, on some variables more than on others. The scores vary from 0 to 4. There are some missing values, but relatively few (about 20 in the whole set). I used the two-step cluster procedure to explore the groupings that might exist in the set. Once I know what goes on in the data, would it make sense to switch to another procedure (like K-means or Hierarchical) to produce membership values for each case?

Thanks!
Oksana

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Can you send the syntax you're running to the list?

Alex
In reply to this post by Oksana Starchenko
Jon Peck (no "h")
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: Oksana Starchenko <[hidden email]>
To: [hidden email]
Date: 11/01/2011 05:51 PM
Subject: [SPSSX-L] Cluster analysis options in SPSS v.19
Sent by: "SPSSX(r) Discussion" <[hidden email]>

I have several questions re clustering.

1. I was using two-step cluster procedure to explore a dataset consisting of assessment scores for 490 people on 26 items. At home I have an old v.11 software and at work a more recent v.19. After trying to replicate at work results of runs I did at home, I noticed that the solutions I get are somewhat different. The number of clusters that the algorithm finds in the dataset is the same (2), but the membership of cases changes. Does someone have an idea why this could be? And which version is better to use for clustering with two-step?

>>> The TWOSTEP code has been tweaked over the years, so it is not surprising that there would be small differences. I would use the most recent version.

2. Two-step cluster in version 19 produces this incredibly funky output file, which alas is totally impenetrable and does not contain all the information I am used to seeing. I checked what kind of output the K-means procedure would produce in this version. It turned out to be your normal output that you can manipulate and transfer to other file types. What gives? And how can I get back to the old, boring but useful, output format when running the two-step in v.19?

>>> The Model Viewer output is designed to be more interactive and make it easier to explore the clustering results. However, you can still get the traditional output. You have to paste and edit the syntax. Add /print ic summary count. You can also get the Model Viewer output, or you can suppress it with /viewmodel display=no.

>>> You might also find two clustering-related extension commands useful. STATS CLUS SIL calculates and plots silhouette statistics for clusters. It offers several metrics, none of which is exactly the same as twostep (Gower comes closest). This can help in assessing the cluster solution. STATS SUBGROUP PLOTS provides graphs of the clustering or other variables for each cluster or other subgroup. These commands can be obtained from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) in the Downloads for SPSS Statistics link. They require the Python Essentials, also available from the same site.

3. Finally, as I mentioned I am working with a dataset consisting of 490 cases that have scores on 26 variables. The data is not normally distributed, on some variables more than on others. The scores vary from 0 to 4. There are some missing values, but relatively few (about 20 in the whole set). I used two-step cluster procedure to explore the groupings that might exist in the set. After I know what goes on in the data, would it make sense to switch to another procedure (like K-means, or Hierarchical) to produce membership values for each case?

>>> The assumptions behind each of these clustering methods are quite different. It might be interesting to see how consistent these methods are with each other. TWOSTEP is the most scalable method, but your dataset is not large enough for this to matter much.

HTH,
Jon Peck

Thanks!
Oksana
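
One way to check how consistent two solutions are with each other (v.11 versus v.19, or TWOSTEP versus K-means) is to cross-tabulate the saved membership values and compute an agreement index that ignores how the clusters happen to be numbered. The sketch below is a minimal illustration in plain Python, not SPSS syntax and not something TWOSTEP itself reports. It uses the adjusted Rand index as one possible agreement measure; the function names and the eight-case membership lists are hypothetical.

```python
from collections import Counter
from math import comb


def crosstab(a, b):
    """Count cases for every (cluster in a, cluster in b) pair."""
    if len(a) != len(b):
        raise ValueError("both solutions must cover the same cases")
    return Counter(zip(a, b))


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two partitions.

    The value is 1.0 when the partitions are identical up to a
    renumbering of the clusters, and near 0 for chance-level agreement.
    """
    table = crosstab(a, b)
    total_pairs = comb(len(a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cases: nothing to disagree about

    # Pairs of cases placed together in both, in a only, in b only.
    together_both = sum(comb(c, 2) for c in table.values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both solutions
    return (together_both - expected) / (maximum - expected)


# Hypothetical memberships for the same eight cases from two runs.
# Cluster numbers are arbitrary: "1" in one run may be "2" in the other.
v11 = [1, 1, 1, 1, 2, 2, 2, 2]
v19 = [2, 2, 2, 1, 1, 1, 1, 1]

for (old, new), count in sorted(crosstab(v11, v19).items()):
    print(f"v.11 cluster {old} / v.19 cluster {new}: {count} cases")
print("adjusted Rand index:", round(adjusted_rand_index(v11, v19), 3))
```

In practice the two lists would be the membership variables saved from each run and matched by case ID. The cross-tabulation shows which cases moved, and the index summarizes the overall agreement in a single number.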