Re: Fitindices to determine optimal clustersolution
Posted by MaaikeSmits on Nov 13, 2015; 2:38pm
URL: http://spssx-discussion.165.s1.nabble.com/Fitindices-to-determine-optimal-clustersolution-tp5729419p5730940.html
Hi Jon Peck,
Some time ago you referred me to the silhouettes procedure as a way to evaluate and compare several cluster solutions that I obtained from a k-means cluster analysis. I performed the cluster analysis on z-scores (10 variables). In the silhouettes procedure I think I should use the z-scores as well (alongside the cluster membership variable), is that right? So option 1 instead of option 2?

QCL_1 is the cluster membership variable that is saved by the k-means procedure. The variables rBPS to rSZOID are the ones I ran the k-means analyses on (as z-scores in option 1, as the original non-standardized scores in option 2). I do not understand the mathematics of silhouettes well enough to see why the mean silhouette score differs between standardized and non-standardized scores, but which one should I use?

I run the syntax below for each of the cluster membership variables, and then the solution with the highest overall mean silhouette score would indicate the best fit to the data, right?
Option 1
STATS CLUS SIL CLUSTER=QCL_1 VARIABLES=ZrBPSdim ZrTHEAdim ZrNARCdim ZrANTdimA ZrAFHdim ZrONTdim
ZrOBSdim ZrPARAdim ZrSTYPdim ZrSZOIDdim
NEXTBEST=clusdrienextbest SILHOUETTE=clusdriesilval DISSIMILARITY=EUCLID MINKOWSKIPOWER=2
/OPTIONS MISSING=RESCALE RENUMBERORDINAL=NO
/OUTPUT HISTOGRAM=YES ORIENTATION=HORIZONTAL THREEDBAR=YES THREEDCOUNTS=YES.
Option 2
STATS CLUS SIL CLUSTER=QCL_3 VARIABLES=rBPSdim rTHEAdim rNARCdim rANTdimA rAFHdim rONTdim
rOBSdim rPARAdim rSTYPdim rSZOIDdim
NEXTBEST=clusdrienextbest SILHOUETTE=clusdriesilval DISSIMILARITY=EUCLID MINKOWSKIPOWER=2
/OPTIONS MISSING=RESCALE RENUMBERORDINAL=NO
/OUTPUT HISTOGRAM=YES ORIENTATION=HORIZONTAL THREEDBAR=YES THREEDCOUNTS=YES.
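
To make the standardized versus non-standardized question concrete, here is a minimal Python sketch of the usual silhouette definition, s(i) = (b - a) / max(a, b), where a is the mean distance from a case to the other cases in its own cluster and b is the smallest mean distance to the cases of any other cluster. This is not the code behind STATS CLUS SIL; the function names and the six-case toy data are made up for illustration, and the sketch assumes Euclidean distance and at least two cases per cluster. Because Euclidean distance depends on the scale of each variable, the same cluster membership can give a different mean silhouette on raw scores than on z-scores.

```python
import math
from statistics import mean, pstdev


def euclid(p, q):
    """Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_silhouette(points, labels):
    """Mean silhouette width; assumes every cluster has at least 2 cases."""
    widths = []
    for i, p in enumerate(points):
        # Collect distances from case i to every other case, per cluster.
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(euclid(p, q))
        a = mean(dists[labels[i]])  # cohesion: own cluster
        b = min(mean(d) for lab, d in dists.items() if lab != labels[i])
        widths.append((b - a) / max(a, b))
    return mean(widths)


def zscores(points):
    """Standardize each variable (column) to mean 0 and SD 1."""
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(p, mus, sds)) for p in points]


# Hypothetical toy data: variable 1 separates the clusters but has a small
# range; variable 2 is unrelated to the clusters but has a large range.
raw = [(1.0, 100.0), (1.2, 300.0), (0.9, 200.0),
       (3.0, 150.0), (3.1, 250.0), (2.9, 350.0)]
labels = [1, 1, 1, 2, 2, 2]

sil_raw = mean_silhouette(raw, labels)
sil_z = mean_silhouette(zscores(raw), labels)
print("raw scores:", round(sil_raw, 3))
print("z-scores:  ", round(sil_z, 3))
```

In this toy example the raw-score silhouette is dominated by the large-range variable, so it comes out lower than the z-score silhouette for the same cluster membership. If the clusters were formed on z-scores, computing the silhouettes on the same z-scores keeps the distances consistent with the ones k-means used.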