
Re: Fitindices to determine optimal clustersolution

Posted by Jon K Peck on Nov 14, 2015; 12:47am
URL: http://spssx-discussion.165.s1.nabble.com/Fitindices-to-determine-optimal-clustersolution-tp5729419p5730948.html

You should be using the variables the way they were used when you did the clustering, so if you clustered using standardized variables, use those same variables for the silhouette plots.
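To see why the scale of the variables matters, here is a minimal Python sketch. It is not SPSS syntax and not how STATS CLUS SIL is implemented. The data are made up: a hypothetical variable `income` measured in the hundreds and a hypothetical variable `score` between 0 and 1. The helper names `zscore` and `euclid` are illustrative too. On raw scores the Euclidean distance is dominated by the large-scale variable; on z-scores both variables contribute equally.

```python
import math
import statistics

def zscore(column):
    """Standardize a list of numbers to mean 0 and sample SD 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]

def euclid(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical data: four cases, two variables on very different scales.
income = [100.0, 100.0, 300.0, 300.0]
score = [0.1, 0.9, 0.1, 0.9]

raw = list(zip(income, score))
std = list(zip(zscore(income), zscore(score)))

# Cases 0 and 1 differ only on score; cases 0 and 2 differ only on income.
raw_ratio = euclid(raw[0], raw[2]) / euclid(raw[0], raw[1])
std_ratio = euclid(std[0], std[2]) / euclid(std[0], std[1])

print(raw_ratio)  # far above 1: income swamps score on the raw scale
print(std_ratio)  # about 1: both differences count the same after z-scoring
```

Because silhouettes are built from these distances, computing them on a different scale than the one used for clustering evaluates a different geometry than the one k-means actually partitioned.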

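The silhouette plots show how comfortable, if you will, the points are in their assigned clusters. Silhouette values near 1 are good, values near 0 mean a point sits between two clusters, and negative values suggest it may belong in a neighboring cluster.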

This link gives you a concise description without too much math.
https://en.wikipedia.org/wiki/Silhouette_(clustering)
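For readers who prefer code to formulas, the definition on that page can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the STATS CLUS SIL implementation: the function name `silhouettes`, the toy points, and the two labelings are all made up, and the distance is plain Euclidean. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to the members of any other cluster, and the silhouette is (b - a) / max(a, b).

```python
import math

def euclid(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def silhouettes(points, labels):
    """Return one silhouette value per point; needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    result = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            result.append(0.0)  # singleton cluster: silhouette is set to 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclid(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster ("next best")
        b = min(
            sum(euclid(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result

# Hypothetical toy data: two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouettes(points, [1, 1, 2, 2])  # labels follow the two pairs
bad = silhouettes(points, [1, 2, 1, 2])   # labels cut across the pairs

mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(mean_good, mean_bad)
```

With these toy points the sensible labeling gives a mean silhouette of about 0.9, while the labeling that cuts across the pairs gives a negative mean. That is the sense in which a higher mean silhouette points to the better-fitting solution when the same variables and distance are used for every solution being compared.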

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        MaaikeSmits <[hidden email]>
To:        [hidden email]
Date:        11/13/2015 07:38 AM
Subject:        Re: [SPSSX-L] Fitindices to determine optimal clustersolution
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Hi Jon Peck,

Some time ago you referred me to the silhouettes procedure as a way to
evaluate and compare several cluster solutions that I got from a k-means
cluster analysis. I performed the cluster analysis on z-scores (10
variables). In the silhouettes procedure I think I should use the z-scores
as well (next to the cluster membership variable), is that right? So
option 1 instead of 2? QCL_1 is the cluster membership variable that is
saved from the k-means procedure. The rBPS to rSZOID variables are the
variables I performed the k-means analyses on (in option 1 as z-scores, in
option 2 as original non-standardized scores). I do not understand enough of
the mathematical computation of silhouettes to understand the
difference between the silhouette mean scores that arises if I use standardized
versus non-standardized scores, but which one should I use? I run the
syntax below for each cluster membership variable, and then the one with the
highest overall mean silhouette score would indicate the best fit to the data, right?

Option 1
STATS CLUS SIL CLUSTER=QCL_1 VARIABLES=ZrBPSdim ZrTHEAdim ZrNARCdim
ZrANTdimA ZrAFHdim ZrONTdim
   ZrOBSdim ZrPARAdim ZrSTYPdim ZrSZOIDdim
NEXTBEST=clusdrienextbest SILHOUETTE=clusdriesilval DISSIMILARITY=EUCLID
MINKOWSKIPOWER=2
/OPTIONS MISSING=RESCALE RENUMBERORDINAL=NO
/OUTPUT HISTOGRAM=YES ORIENTATION=HORIZONTAL THREEDBAR=YES THREEDCOUNTS=YES.

Option 2
STATS CLUS SIL CLUSTER=QCL_3 VARIABLES=rBPSdim rTHEAdim rNARCdim rANTdimA
rAFHdim rONTdim
   rOBSdim rPARAdim rSTYPdim rSZOIDdim
NEXTBEST=clusdrienextbest SILHOUETTE=clusdriesilval DISSIMILARITY=EUCLID
MINKOWSKIPOWER=2
/OPTIONS MISSING=RESCALE RENUMBERORDINAL=NO
/OUTPUT HISTOGRAM=YES ORIENTATION=HORIZONTAL THREEDBAR=YES THREEDCOUNTS=YES.



--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Fitindices-to-determine-optimal-clustersolution-tp5729419p5730940.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


