SPSSX Discussion

Cluster Analysis SPSS vs SAS


Kevan Edwards (MDH)

Cluster Analysis SPSS vs SAS

I am working on a cluster analysis project, attempting to perform the same analysis in both SAS and SPSS, and am getting very different results. I am not an experienced SAS user and would like some help from someone who is familiar with both SPSS and SAS.

The code for the SPSS routine is:

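GET FILE='DATA.sav'.
* Restrict the next procedure only to cases with 40 <= RATE1A < 98.
TEMPORARY.
SELECT IF RATE1A GE 40 AND RATE1A LT 98.
* K-means with 9 clusters from fixed starting centres, updating centres as cases are assigned.
QUICK CLUSTER RATE1A
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(9) MXITER(50) CONVERGE(0)
  /METHOD=KMEANS(UPDATE)
  /INITIAL (40 47.25 54.5 61.75 69 76.25 83.5 90.76 98)
  /PRINT ANOVA
  /SAVE=CLUSTER (C1_RT1A).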
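The /INITIAL list reads like nine equally spaced starting centres covering the SELECT IF range. Assuming equal spacing was the intent (an assumption; the post does not say so), a quick Python check of the arithmetic gives a step of 7.25 and an eighth centre of 90.75 rather than the 90.76 typed above. The difference is negligible for clustering, but it is worth confirming that the SAS SEEDS data set holds exactly the same values if the two runs are to be compared.

```python
# Nine equally spaced starting centres from 40 to 98 inclusive.
# Assumption: equal spacing is what the /INITIAL list was meant to encode.
low, high, k = 40.0, 98.0, 9
step = (high - low) / (k - 1)  # (98 - 40) / 8
centres = [low + i * step for i in range(k)]
print(step, centres)
```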

The code for SAS is:

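/* SEEDS supplies the initial centres; DRIFT updates a seed each time an observation is assigned. */
PROC FASTCLUS DATA=SMALL OUT=OUTF SEED=SEEDS DRIFT MEAN=AVE MAXC=9 MAXITER=50;
  VAR &RATE;
RUN;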

We thought these two approaches would produce very similar results. While SPSS is pretty straightforward in how it assigns values to cluster groups, our understanding is that the FASTCLUS procedure operates in four steps:

1. Observations called cluster seeds are selected. [In our run, the seed values are specified in the SEEDS data set.]

2. With the DRIFT option, temporary clusters are formed by assigning each observation to the cluster with the nearest seed. Each time an observation is assigned, the cluster seed is updated as the current mean of the cluster. This method is sometimes called incremental, on-line, or adaptive training.

3. If the maximum number of iterations is greater than zero, clusters are formed by assigning each observation to the nearest seed. After all observations are assigned, the cluster seeds are replaced by either the cluster means or other location estimates (cluster centers) appropriate to the LEAST= option. If you do not specify the LEAST= option, PROC FASTCLUS uses least squares.

4. Final clusters are formed by assigning each observation to the nearest seed.
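The four steps above can be sketched in Python for a single variable. This is a hypothetical, minimal illustration of the procedure as described, not SAS's implementation: the function names are made up, the DRIFT pass assumes each seed is replaced by the mean of the observations assigned to it so far (the seed itself is not counted), and step 3 stops when the means stop changing rather than using FASTCLUS's own convergence criterion.

```python
def nearest(x, seeds):
    """Index of the seed closest to x (ties go to the lower index)."""
    return min(range(len(seeds)), key=lambda j: abs(x - seeds[j]))


def fastclus_like(data, seeds, maxiter=50):
    """Hypothetical 1-D sketch of the four FASTCLUS steps described above."""
    # Step 1: start from user-supplied seeds (like SEED=SEEDS).
    seeds = list(seeds)

    # Step 2 (DRIFT): one pass; each seed becomes the running mean of the
    # observations assigned to it so far (assumption: the seed is not counted).
    counts = [0] * len(seeds)
    for x in data:
        j = nearest(x, seeds)
        counts[j] += 1
        seeds[j] += (x - seeds[j]) / counts[j]

    # Step 3: batch iterations; assign everything, then replace each seed by
    # its cluster mean (least squares). Empty clusters keep their old seed.
    for _ in range(maxiter):
        labels = [nearest(x, seeds) for x in data]
        new_seeds = []
        for j, old in enumerate(seeds):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new_seeds.append(sum(members) / len(members) if members else old)
        if new_seeds == seeds:  # simplified stopping rule, not CONVERGE=
            break
        seeds = new_seeds

    # Step 4: final assignment of every observation to its nearest seed.
    return [nearest(x, seeds) for x in data], seeds


labels, centres = fastclus_like([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
print(labels, centres)
```

With data 1, 2, 3, 10, 11, 12 and seeds 0 and 5, the DRIFT pass already moves the seeds to 2 and 11, and the batch iterations leave them there.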

Any help in understanding why we get such dramatically different results between the SPSS and SAS analyses in the cluster assignment would be appreciated.

Kevan

----------------------------

Kevan Edwards M.A., Ph.D.

Health Services Research Director
HEP / DHP / Minnesota Department of Health
PO Box 64975, St. Paul, MN 55164-0975
phone 651-201-3551 fax 651-201-5179
http://www.health.state.mn.us/healtheconomics

David Marso

Re: Cluster Analysis SPSS vs SAS


1. QUICK CLUSTER is *HIGHLY* dependent upon data file order!!!
2. You are comparing Apples with Oranges?
3. You should probably be using TWOSTEP CLUSTER instead?
4. You don't indicate how you have determined that the solutions are different (however I do believe they are).
5. You could try running this twice:
COMPUTE Blah=UNIFORM(1).
SORT CASES BY Blah.
QUICK CLUSTER .......etc etc...
-------------
COMPUTE Blah=UNIFORM(1).
SORT CASES BY Blah.
QUICK CLUSTER .......etc etc...
and end up with *RADICALLY* different cluster assignments from QC.
HTH, David
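Point 5 can be illustrated with a tiny example. The sketch below is a hypothetical one-variable version of incremental ("running mean") k-means, the kind of updating that KMEANS(UPDATE) in SPSS and DRIFT in SAS perform; it is not either program's code, and the function name, data, and seeds are made up. Feeding the same three cases in two different orders gives two different partitions.

```python
def incremental_kmeans(data, seeds):
    """One pass of incremental ("running mean") k-means for one variable.

    Hypothetical sketch: each case goes to the nearest centre, and that centre
    is immediately replaced by the mean of the cases assigned to it so far.
    """
    centres = list(seeds)
    counts = [0] * len(centres)
    for x in data:
        j = min(range(len(centres)), key=lambda k: abs(x - centres[k]))
        counts[j] += 1
        centres[j] += (x - centres[j]) / counts[j]  # running-mean update
    # Final assignment of every case to its nearest updated centre,
    # keyed by value so the two orderings can be compared directly.
    labels = {x: min(range(len(centres)), key=lambda k: abs(x - centres[k]))
              for x in data}
    return labels, centres


seeds = [0.0, 10.0]
print(incremental_kmeans([2.0, 4.0, 6.0], seeds))
print(incremental_kmeans([6.0, 4.0, 2.0], seeds))
```

With seeds 0 and 10, the order 2, 4, 6 puts all three cases in the first cluster, while the order 6, 4, 2 separates 2 from {4, 6}.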
--

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Art Kendall

Re: Cluster Analysis SPSS vs SAS

So true of any implementation of k-means.
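Case-order dependence comes from incremental updating, but even batch k-means, which assigns all cases before recomputing the means and so ignores case order, depends on the starting seeds. A minimal Python sketch with made-up data (three groups, two clusters) shows two seed sets converging to different partitions; the function names are hypothetical.

```python
def assign(data, centres):
    """Nearest-centre label for each case (ties go to the lower index)."""
    return [min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            for x in data]


def lloyd(data, seeds, maxiter=50):
    """Batch (Lloyd) k-means for one variable: assign all, then re-average."""
    centres = list(seeds)
    for _ in range(maxiter):
        labels = assign(data, centres)
        # Recompute each centre as the mean of its members; keep empty ones.
        new_centres = []
        for j, old in enumerate(centres):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else old)
        if new_centres == centres:
            break
        centres = new_centres
    return assign(data, centres), centres


data = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
print(lloyd(data, [0.0, 10.0]))
print(lloyd(data, [10.0, 20.0]))
```

Seeds 0 and 10 end with {0, 1} versus {10, 11, 20, 21}; seeds 10 and 20 end with {0, 1, 10, 11} versus {20, 21}.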

Art Kendall
Social Research Consultants

On 4/11/2012 3:31 PM, David Marso wrote:

1.  QUICK CLUSTER is *HIGHLY* dependent upon data file order!!!

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

