Re: Fitindices to determine optimal clustersolution
Posted by Jon K Peck on Nov 14, 2015; 12:47am
URL: http://spssx-discussion.165.s1.nabble.com/Fitindices-to-determine-optimal-clustersolution-tp5729419p5730948.html
You should use the variables in the same form as when you did the
clustering, so if you clustered on standardized variables, use those same
standardized variables for the silhouette plots (Option 1 in your case).

The silhouette plots show how comfortable, if you will, the points are in
their assigned clusters. Silhouette values near 1 are good; values near 0
mean a point sits between two clusters, and negative values suggest it may
belong in a neighboring cluster.

This link gives a concise description without too much math:
https://en.wikipedia.org/wiki/Silhouette_(clustering)
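
As a rough illustration of why the scaling matters, here is a minimal Python sketch of the textbook silhouette definition from the page linked above, using Euclidean distance. It is not the code behind STATS CLUS SIL; the function names and the six-case toy data set are made up for the example. The second variable has a much larger scale than the first but does not separate the two groups, so it dominates the raw distances.

```python
import math

def euclid(p, q):
    """Euclidean distance between two cases."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_values(points, labels):
    """One silhouette value per case; assumes at least two clusters."""
    values = []
    for i, p in enumerate(points):
        dist_sum, count = {}, {}
        for j, q in enumerate(points):
            if i == j:
                continue
            c = labels[j]
            dist_sum[c] = dist_sum.get(c, 0.0) + euclid(p, q)
            count[c] = count.get(c, 0) + 1
        own = labels[i]
        if own not in count:
            values.append(0.0)  # singleton cluster: silhouette is 0 by convention
            continue
        a = dist_sum[own] / count[own]  # mean distance within own cluster
        b = min(dist_sum[c] / count[c] for c in count if c != own)  # nearest other cluster
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

def zscore(points):
    """Standardize each variable using the sample standard deviation (n - 1)."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, sds)) for p in points]

# Hypothetical data: variable 1 separates the groups, variable 2 is large-scale noise.
raw = [(1.0, 100.0), (1.2, 300.0), (0.9, 200.0),
       (3.0, 150.0), (3.1, 250.0), (2.9, 350.0)]
labels = [1, 1, 1, 2, 2, 2]

sil_raw = silhouette_values(raw, labels)
sil_z = silhouette_values(zscore(raw), labels)
mean_raw = sum(sil_raw) / len(sil_raw)
mean_z = sum(sil_z) / len(sil_z)
print("mean silhouette, raw scores:", round(mean_raw, 3))
print("mean silhouette, z-scores:  ", round(mean_z, 3))
```

With the same cluster assignment, the mean silhouette differs between the two runs only because the distances differ, which is why the silhouettes should be computed on the same scale that the clustering itself used.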
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
From: MaaikeSmits <[hidden email]>
To: [hidden email]
Date: 11/13/2015 07:38 AM
Subject: Re: [SPSSX-L] Fitindices to determine optimal clustersolution
Sent by: "SPSSX(r) Discussion" <[hidden email]>
Hi Jon Peck,
Some time ago you referred me to the silhouettes procedure as a way to
evaluate and compare several cluster solutions that I got from a k-means
cluster analysis. I performed the cluster analysis on z-scores (10
variables). In the silhouettes procedure I think I should use the z-scores
as well (next to the cluster membership variable), is that right? So
Option 1 instead of Option 2? QCL_1 is the cluster membership variable that
is saved from the k-means procedure. The rBPS to rSZOID variables are the
variables I performed the k-means analyses on (in Option 1 as z-scores, in
Option 2 as the original non-standardized scores). I do not understand
enough of the mathematical computation of silhouettes to understand the
difference in mean silhouette scores that arises if I use standardized
versus non-standardized scores, so which one should I use? I run the
syntax below for all cluster solutions, and then the one with the highest
overall mean silhouette score would indicate the best fit to the data, right?
Option 1
STATS CLUS SIL CLUSTER=QCL_1 VARIABLES=ZrBPSdim ZrTHEAdim ZrNARCdim
ZrANTdimA ZrAFHdim ZrONTdim
ZrOBSdim ZrPARAdim ZrSTYPdim ZrSZOIDdim
NEXTBEST=clusdrienextbest SILHOUETTE=clusdriesilval DISSIMILARITY=EUCLID
MINKOWSKIPOWER=2
/OPTIONS MISSING=RESCALE RENUMBERORDINAL=NO
/OUTPUT HISTOGRAM=YES ORIENTATION=HORIZONTAL THREEDBAR=YES THREEDCOUNTS=YES.
Option 2
STATS CLUS SIL CLUSTER=QCL_3 VARIABLES=rBPSdim rTHEAdim rNARCdim rANTdimA
rAFHdim rONTdim
rOBSdim rPARAdim rSTYPdim rSZOIDdim
NEXTBEST=clusdrienextbest SILHOUETTE=clusdriesilval DISSIMILARITY=EUCLID
MINKOWSKIPOWER=2
/OPTIONS MISSING=RESCALE RENUMBERORDINAL=NO
/OUTPUT HISTOGRAM=YES ORIENTATION=HORIZONTAL THREEDBAR=YES THREEDCOUNTS=YES.
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD