Re: Fitindices to determine optimal clustersolution

Posted by MaaikeSmits on May 01, 2015; 11:59am
URL: http://spssx-discussion.165.s1.nabble.com/Fitindices-to-determine-optimal-clustersolution-tp5729419p5729459.html

@ Jon Peck and Art Kendall

I have now used silhouettes to look at the cluster solutions. I am not sure whether the use of Euclidean distance is justified, because I used squared Euclidean distance as the measure in the original clustering procedure. I also cannot find information on how to decide the value of the Minkowski power, so I set it to 2 as a default. However, when I continue under these two assumptions (Minkowski power of 2 and Euclidean distance), I find the overall average S for all four cluster solutions to be rather low: s(2) = .117, s(3) = .090, s(4) = .058, s(5) = .125. Maybe it is a good sign that none of the clusters shows a negative mean value of S (in any of the cluster solutions); however, every cluster contains cases with negative s values. Is there an absolute way to interpret the S values, or can they only be interpreted relative to each other?
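
For reference, here is a minimal sketch of how a per-case silhouette value is usually defined: s(i) = (b - a) / max(a, b), where a is the mean distance from case i to the other cases in its own cluster and b is the smallest mean distance to the cases of any other cluster. The values lie between -1 and 1, and a negative s(i) means the case is on average closer to another cluster than to its own. The function names `minkowski` and `silhouette_values` are made up for this illustration and are not the SPSS procedure; the sketch assumes at least two clusters and gives a case that is alone in its cluster the value 0. A Minkowski power of 2 is the same as ordinary Euclidean distance, so those two settings describe one choice rather than two.

```python
def minkowski(a, b, p=2):
    """Minkowski distance between two cases; p=2 is Euclidean, p=1 is city-block."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def silhouette_values(points, labels, p=2):
    """Return the silhouette value s(i) for every case (illustrative helper)."""
    clusters = set(labels)
    values = []
    for i, (pt, lab) in enumerate(zip(points, labels)):
        # Mean distance from case i to the cases of each cluster, excluding itself.
        mean_dist = {}
        for c in clusters:
            d = [minkowski(pt, q, p)
                 for j, (q, other) in enumerate(zip(points, labels))
                 if other == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        if lab not in mean_dist:
            # Case is alone in its cluster: assumed convention, s(i) = 0.
            values.append(0.0)
            continue
        a = mean_dist[lab]                                      # within-cluster
        b = min(v for c, v in mean_dist.items() if c != lab)    # nearest other cluster
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels, p=2):
    """Overall average S for one cluster solution."""
    vals = silhouette_values(points, labels, p)
    return sum(vals) / len(vals)
```

Because the averages are taken over raw distances, switching to squared Euclidean distance would change the s values, so silhouettes computed under different distance measures are not directly comparable.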


Regarding the suggestions of Art Kendall:
Thank you for your advice on the stepwise approach of running the clustering through various procedures and then cross-tabulating the results to decide which cases belong to core clusters. I will try to work out which distance measures, apart from the one I already used, would be suitable for my data. Can you refer me to an article in which the stepwise procedure you described above is used and outlined?
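
To make sure I understand the cross-tabulation step, here is a rough sketch of one possible reading of it. It is an assumption on my part and not necessarily the procedure Art Kendall has in mind. Two sets of cluster memberships for the same cases are cross-tabulated, and a case is flagged as "core" when its cell holds a large share of both its row cluster and its column cluster. The helper names and the `min_share` threshold of 0.75 are hypothetical.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Count cases for every (cluster in solution A, cluster in solution B) cell."""
    return Counter(zip(labels_a, labels_b))


def core_cases(labels_a, labels_b, min_share=0.75):
    """Flag cases whose cell dominates both its row and its column.

    min_share is an arbitrary illustrative threshold, not a published rule.
    """
    table = crosstab(labels_a, labels_b)
    row_total = Counter(labels_a)
    col_total = Counter(labels_b)
    core_cells = {
        (a, b)
        for (a, b), n in table.items()
        if n / row_total[a] >= min_share and n / col_total[b] >= min_share
    }
    return [(a, b) in core_cells for a, b in zip(labels_a, labels_b)]


# Example: memberships from two clustering runs on the same ten cases.
run_a = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
run_b = ["x", "x", "x", "x", "y", "y", "y", "y", "y", "y"]
flags = core_cases(run_a, run_b)
```

In this example the fifth case is the only one on which the two runs disagree, so it is the only case not flagged as core.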

We did give a lot of consideration to the handling of missing data and outliers. The silhouette steps refer to the handling of missing data, but in my procedure all cases with missing values on any of the input variables are automatically excluded. Are there clustering procedures in which this is not the case (without manually correcting for missing values or imputing them)?
When I do not exclude the extreme outliers, clusters are formed with only 1 or 2 cases, which makes the interpretation quite hard. That is why we chose to exclude extreme outliers (but keep potential or probable outliers).