Cluster analysis options in SPSS v.19

Cluster analysis options in SPSS v.19

Oksana Starchenko
I have several questions re clustering.

1. I was using two-step cluster procedure to explore a dataset consisting
of assessment scores for 490 people on 26 items. At home I have an old
v.11 software and at work a more recent v.19. After trying to replicate at
work results of runs I did at home, I noticed that the solutions I get are
somewhat different. The number of clusters that the algorithm finds in the
dataset is the same (2), but the membership of cases changes.

Does someone have an idea why this could be?  And which version is better
to use for clustering with two-step?

2. Two-step cluster in version 19 produces this incredibly funky output
file, which alas is totally impenetrable and does not contain all the
information I am used to seeing. I checked what kind of output the K-means
procedure would produce in this version. It turned out to be your normal
output that you can manipulate and transfer to other file types.

What gives? And how can I get back to the old, boring but useful, output
format when running the two-step in v.19?

3. Finally, as I mentioned I am working with a dataset consisting of 490
cases that have scores on 26 variables. The data is not normally
distributed, on some variables more than on others. The scores vary from 0
to 4. There are some missing values, but relatively few (about 20 in the
whole set).  I used two-step cluster procedure to explore the groupings
that might exist in the set.

After I know what goes on in the data, would it make sense to switch to
another procedure (like K-means, or Hierarchical) to produce membership
values for each case?

Thanks!
Oksana

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Cluster analysis options in SPSS v.19

Alex Reutter
Can you send the syntax you're running to the list?

Alex

Re: Cluster analysis options in SPSS v.19

Jon K Peck
In reply to this post by Oksana Starchenko

Jon Peck (no "h")
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Oksana Starchenko <[hidden email]>
To:        [hidden email]
Date:        11/01/2011 05:51 PM
Subject:        [SPSSX-L] Cluster analysis options in SPSS v.19
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




I have several questions re clustering.

1. I was using two-step cluster procedure to explore a dataset consisting
of assessment scores for 490 people on 26 items. At home I have an old
v.11 software and at work a more recent v.19. After trying to replicate at
work results of runs I did at home, I noticed that the solutions I get are
somewhat different. The number of clusters that the algorithm finds in the
dataset is the same (2), but the membership of cases changes.

Does someone have an idea why this could be?  And which version is better
to use for clustering with two-step?

>>> The TWOSTEP code has been tweaked over the years, so it is not surprising that there would be small differences.  I would use the most recent version.

2. Two-step cluster in version 19 produces this incredibly funky output
file, which alas is totally impenetrable and does not contain all the
information I am used to seeing. I checked what kind of output the K-means
procedure would produce in this version. It turned out to be your normal
output that you can manipulate and transfer to other file types.

What gives? And how can I get back to the old, boring but useful, output
format when running the two-step in v.19?

>>>The Model Viewer output is designed to be more interactive and to make it easier to explore the clustering results.  However, you can still get the traditional output.  Paste the syntax from the dialog and edit it to add the subcommand /PRINT IC SUMMARY COUNT.  You can keep the Model Viewer output alongside the traditional tables, or suppress it with /VIEWMODEL DISPLAY=NO.
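
For illustration, the edited syntax might look roughly like the sketch below. The variable names item1 TO item26 and the saved variable name ts_cluster are placeholders, not taken from the original data, and the remaining subcommands are only an approximation of what the dialog typically pastes; your pasted syntax may differ.

```spss
* Sketch only: item1 TO item26 and ts_cluster are hypothetical names.
* /PRINT requests the traditional tables; /VIEWMODEL DISPLAY=NO hides the Model Viewer.
TWOSTEP CLUSTER
  /CONTINUOUS VARIABLES=item1 TO item26
  /DISTANCE LIKELIHOOD
  /NUMCLUSTERS AUTO 15 BIC
  /PRINT IC SUMMARY COUNT
  /VIEWMODEL DISPLAY=NO
  /SAVE VARIABLE=ts_cluster.
```

Setting DISPLAY=YES instead should give both the Model Viewer object and the traditional tables in the same run.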

>>>You might also find two clustering-related extension commands useful.
STATS CLUS SIL calculates and plots silhouette statistics for clusters.  It offers several metrics, none of which is exactly the same as twostep (Gower comes closest).  This can help in assessing the cluster solution.

STATS SUBGROUP PLOTS provides graphs of the clustering or other variables for each cluster or other subgroup.

These commands can be obtained from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) via the Downloads for SPSS Statistics link.
They require the Python Essentials, which are available from the same site.

3. Finally, as I mentioned I am working with a dataset consisting of 490
cases that have scores on 26 variables. The data is not normally
distributed, on some variables more than on others. The scores vary from 0
to 4. There are some missing values, but relatively few (about 20 in the
whole set).  I used two-step cluster procedure to explore the groupings
that might exist in the set.

After I know what goes on in the data, would it make sense to switch to
another procedure (like K-means, or Hierarchical) to produce membership
values for each case?


>>>The assumptions behind each of these clustering methods are quite different.  It might be interesting to see how consistent these methods are with each other.  TWOSTEP is the most scalable method, but your dataset is not large enough for this to matter much.
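
One possible way to check that consistency is to save the membership variable from each procedure and cross-tabulate them. This is a sketch under assumptions: the variable names item1 TO item26, ts_cluster, and km_cluster are hypothetical, and the two-cluster setting simply mirrors the solution described in the question.

```spss
* Sketch only: all variable names here are hypothetical placeholders.
* Save TWOSTEP membership with the number of clusters fixed at 2.
TWOSTEP CLUSTER
  /CONTINUOUS VARIABLES=item1 TO item26
  /NUMCLUSTERS FIXED=2
  /SAVE VARIABLE=ts_cluster.

* Save K-means membership for the same two-cluster solution.
QUICK CLUSTER item1 TO item26
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(2) MXITER(10) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /SAVE CLUSTER(km_cluster).

* Cross-tabulate the two memberships to see how far they agree.
CROSSTABS
  /TABLES=ts_cluster BY km_cluster
  /CELLS=COUNT.
```

Cluster numbers are arbitrary labels, so agreement shows up as most cases falling into one cell per row of the table, not necessarily on the diagonal. Cases with missing values are left unassigned and will not appear in the table.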

HTH,
Jon Peck

Thanks!
Oksana
