I am working on a cluster analysis project and am attempting to perform the same analysis in both SAS and SPSS, but I am getting very different results. I am not an experienced SAS user and would like some help from someone who is familiar with both SPSS and SAS.

Code for the SPSS routine is:
GET FILE='DATA.sav'.
TEMPORARY.
SELECT IF RATE1A GE 40 AND RATE1A LT 98.
QUICK CLUSTER RATE1A
/MISSING=LISTWISE
/CRITERIA=CLUSTER(9) MXITER(50) CONVERGE(0)
/METHOD=KMEANS(UPDATE)
/INITIAL (40 47.25 54.5 61.75 69 76.25 83.5 90.76 98)
/PRINT ANOVA
/SAVE=CLUSTER (C1_RT1A).

Code for SAS is:
PROC FASTCLUS DATA=SMALL OUT=OUTF SEED=SEEDS DRIFT MEAN=AVE MAXC=9 MAXITER=50;
VAR &RATE;
RUN;

We thought these two approaches would produce very similar results. While SPSS is pretty straightforward in how it assigns values to cluster groups, our understanding is that the FASTCLUS procedure operates in four steps:
1. Observations called cluster seeds are selected. [The values are specified in the SEEDS file.]
2. With the DRIFT option, temporary clusters are formed by assigning each observation to the cluster with the nearest seed. Each time an observation is assigned, the cluster seed is updated as the current mean of the cluster. This method is sometimes called incremental, on-line, or adaptive training.
3. If the maximum number of iterations is greater than zero, clusters are formed by assigning each observation to the nearest seed. After all observations are assigned, the cluster seeds are replaced by either the cluster means or other location estimates (cluster centers) appropriate to the LEAST= option. If you do not specify the LEAST= option, PROC FASTCLUS uses least squares.
4. Final clusters are formed by assigning each observation to the nearest seed.
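
The four steps above can be sketched for a single variable in a few lines of Python. This is only an illustration of the description in this post, not the actual PROC FASTCLUS implementation. The function names are hypothetical, and three details are assumptions of the sketch: the starting seed is not counted as an observation in the running mean, an empty cluster keeps its old seed, and iteration stops when the seeds no longer change at all.

```python
def nearest(x, seeds):
    """Index of the seed closest to x (ties go to the lowest index)."""
    return min(range(len(seeds)), key=lambda k: abs(x - seeds[k]))


def drift_pass(data, seeds):
    """Step 2: assign observations in file order, updating the winning
    seed to the running mean of its cluster after every assignment."""
    seeds = list(seeds)
    counts = [0] * len(seeds)
    sums = [0.0] * len(seeds)
    for x in data:
        k = nearest(x, seeds)
        counts[k] += 1
        sums[k] += x
        seeds[k] = sums[k] / counts[k]  # assumption: start seed not in the mean
    return seeds


def batch_pass(data, seeds):
    """Step 3 (one iteration): assign every observation first,
    then replace each seed by the mean of its cluster."""
    members = [[] for _ in seeds]
    for x in data:
        members[nearest(x, seeds)].append(x)
    # Assumption: an empty cluster keeps its previous seed.
    return [sum(m) / len(m) if m else s for m, s in zip(members, seeds)]


def fastclus_sketch(data, seeds, drift=True, maxiter=50):
    seeds = [float(s) for s in seeds]        # step 1: seeds supplied by caller
    if drift:
        seeds = drift_pass(data, seeds)      # step 2: optional drift pass
    for _ in range(maxiter):                 # step 3: batch iterations
        new_seeds = batch_pass(data, seeds)
        if new_seeds == seeds:               # assumption: stop on zero change
            break
        seeds = new_seeds
    labels = [nearest(x, seeds) for x in data]  # step 4: final assignment
    return seeds, labels


final_seeds, labels = fastclus_sketch([1, 2, 3, 10, 11, 12], [0, 13])
print(final_seeds, labels)  # [2.0, 11.0] [0, 0, 0, 1, 1, 1]
```

Only the drift pass depends on the order of the observations. The batch iterations in step 3 give the same result for any ordering of the same data and seeds.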
Any help in understanding why we get such dramatically different results between the SPSS and SAS analyses in the cluster assignment would be appreciated.

Kevan
----------------------------
Kevan Edwards M.A., Ph.D.
Health Services Research Director
|
1. QUICK CLUSTER is *HIGHLY* dependent upon data file order!!!
2. You are comparing apples with oranges?
3. You should probably be using TWOSTEP CLUSTER instead?
4. You don't indicate how you have determined that the solutions are different (however, I do believe they are).
5. You could try:
COMPUTE Blah=UNIFORM(1).
SORT CASES BY Blah.
QUICK CLUSTER .......etc etc...
-------------
COMPUTE Blah=UNIFORM(1).
SORT CASES BY Blah.
QUICK CLUSTER .......etc etc...
and end up with *RADICALLY* different cluster assignment from QC.
HTH, David
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
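
The file-order point in the reply above can be illustrated with a toy example in Python. This is a sketch of a generic one-pass k-means with running-mean updates, not the actual QUICK CLUSTER algorithm. The function name `online_kmeans` and the data are hypothetical, and the sketch assumes the starting center is not counted in the running mean and that all values are distinct.

```python
def online_kmeans(values, centers):
    """One pass of k-means with running-mean updates, in the order given.

    Returns a dict mapping each value to the cluster it joined when it
    was read, plus the final centers. Assumes distinct values.
    """
    centers = [float(c) for c in centers]
    counts = [0] * len(centers)
    sums = [0.0] * len(centers)
    assigned = {}
    for x in values:
        # Nearest current center; ties go to the lowest index.
        k = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        counts[k] += 1
        sums[k] += x
        centers[k] = sums[k] / counts[k]  # center moves after every case
        assigned[x] = k
    return assigned, centers


data = [1, 2, 3, 10, 11, 12]
start = [0, 20]

asc_labels, asc_centers = online_kmeans(data, start)
desc_labels, desc_centers = online_kmeans(list(reversed(data)), start)

print(asc_centers)   # [6.5, 20.0]  every case ends up in cluster 0
print(desc_centers)  # [2.0, 11.0]  two well-separated clusters
print(asc_labels == desc_labels)  # False
```

The same six values and the same two starting centers give different assignments depending only on the order in which the cases are read, which is why sorting the file by a random variable before clustering can change the result.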
So true with any implementation of k-means.
On 4/11/2012 3:31 PM, David Marso wrote:
1. QUICK CLUSTER is *HIGHLY* dependent upon data file order!!!
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |