Posted by Art Kendall on Apr 30, 2015; 1:21pm
URL: http://spssx-discussion.165.s1.nabble.com/Fitindices-to-determine-optimal-clustersolution-tp5729419p5729444.html
Jon's suggestion is right on.
*****
Of course, a lot depends on what you are going to use the cluster membership variable for, what the nature of a case is, and the meaning of the variables in the raw profiles.
*****
However, it has been my practice since the mid-70's to use several clustering algorithms and use "core clusters" based on consensus of several techniques.
Clustering techniques are exploratory/heuristic techniques. The various distance measure * agglomeration algorithm combinations grab different aspects of potential clusters.
(1) retain the cluster memberships from several techniques, distance measures, and levels of agglomeration, interpreting each so that it makes some sense.
(2) crosstab the memberships and find bunches of cases that are put together by several runs.
Call those bunches "core clusters".
(3) use the classification phase of discriminant to see how well the core clusters are separated.
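For anyone who wants to see step (2) outside SPSS, here is a minimal Python sketch, not SPSS syntax. It assumes the strictest form of consensus: a bunch counts as a core cluster only when its cases share the same label in every run. The function name `core_clusters`, the `min_size` cutoff, and the three membership lists are hypothetical and made up for illustration.

```python
from collections import defaultdict


def core_clusters(runs, min_size=2):
    """Group cases that are put together by every run.

    runs: list of equal-length label lists, one per clustering run.
    Returns a list of case-index lists, largest bunch first.
    """
    n_cases = len(runs[0])
    if any(len(r) != n_cases for r in runs):
        raise ValueError("every run must label the same cases")

    cells = defaultdict(list)
    for case in range(n_cases):
        # The tuple of labels is the case's cell in the multi-way crosstab.
        cells[tuple(r[case] for r in runs)].append(case)

    # Cells with too few cases are left unassigned rather than forced in.
    cores = [members for members in cells.values() if len(members) >= min_size]
    return sorted(cores, key=len, reverse=True)


# Hypothetical memberships for 8 cases from three runs (labels are arbitrary).
ward = [1, 1, 1, 2, 2, 2, 3, 3]
avg_link = [1, 1, 2, 2, 2, 2, 3, 1]
kmeans = [2, 2, 2, 1, 1, 1, 3, 3]

cores = core_clusters([ward, avg_link, kmeans])
print(cores)
```

Cluster labels do not have to match across runs, since only co-membership matters. A looser rule (for example, together in most runs rather than all) would need a pairwise co-occurrence count instead of exact cells.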
Now that STATS CLUS SIL is available, use that to look at the core clusters.
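To illustrate what a silhouette check looks at, here is a rough Python sketch of the standard silhouette width, (b - a) / max(a, b), where a is a case's mean distance to its own cluster and b is its mean distance to the nearest other cluster. This is an assumption about the general measure only; it is not the extension command's code, it uses Euclidean distance, and the points and labels are made up.

```python
import math


def silhouette_widths(points, labels):
    """Return one silhouette width per case, using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the case's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Hypothetical profiles: two tight, well-separated bunches.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = [1, 1, 2, 2]
widths = silhouette_widths(points, labels)
print(widths)
```

Widths near 1 mean a case sits well inside its cluster, widths near 0 mean it is on a boundary, and negative widths suggest it may belong elsewhere.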
I have not tried this, but it would be interesting to see what happens when you run TWOSTEP on the set of cluster membership variables.
-----
For the cases with suspicious values, do they make sense as valid measures?
I am leery of trimming variables in general and especially so when looking for profiles. In clustering, extremes might be particularly meaningful; e.g., looking at counties in 5 western US states, Los Angeles remained a singleton through everything, and that makes tremendous sense.
----
Have you examined why some of the variables in the raw profiles are missing? Since your data set is small, you might consider whether you can make reasonable substitutions for the missing values, e.g., use variables not in the profile to do MVA, or substitute zero, or use means of valid scale items rather than totals if you have summative scores, or ...
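As an illustration of the "means of valid scale items" option, here is a small Python sketch. It assumes missing items are coded as `None` and that a score is only worth keeping when at least `min_valid` items were answered; the function name and the example responses are hypothetical.

```python
def mean_of_valid(items, min_valid):
    """Mean of the non-missing items, or None if too few are valid."""
    valid = [x for x in items if x is not None]
    if len(valid) < min_valid:
        return None  # too little information to score this case
    return sum(valid) / len(valid)


# Hypothetical 4-item scale, requiring at least 3 valid items.
score_a = mean_of_valid([4, None, 5, 3], min_valid=3)
score_b = mean_of_valid([None, None, 2, 5], min_valid=3)
print(score_a, score_b)
```

Unlike a total, the mean stays on the original item metric when an item is missing, so a case with one skipped item is not pushed toward the low end of the profile.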
Art Kendall
Social Research Consultants