Achieving robust solution in two-step cluster analysis


Thomas M. Guterbock
Hello:
  I'm brand new to this list, but have been using SPSS in its various forms
for some 35 years.
  I'm engaged with colleagues in a fairly large project that seeks to
'segment' members of the public according to their preferences and
practices in seeking health information.  We have collected survey data
from a sample of 1,200 Virginia adults, and have several hundred variables
in our data set.  Some variables are categorical and some are
interval-level.
  We have set to work using two-step cluster to segment the data, and our
initial work yielded a 7-cluster solution that seemed to make sense in
relation to our theories.  (Autocluster gave us only two clusters, so we
asked for more and walked up to seven clusters before things got weird.)
   But then, we found that any minor change in the list of basis variables,
or in the way we specified these variables, leads the two-step cluster
procedure to yield a very different clustering of the cases.  (We detected
this by simply cross-tabulating the cluster IDs from one solution with
those from the next.)
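
To make the comparison concrete, here is a minimal sketch of that cross-tabulation check, written in Python rather than SPSS syntax. It assumes the cluster IDs from two solutions have been exported as two equal-length lists in the same case order; the function names are hypothetical, and the adjusted Rand index is added here only as one possible single-number summary of the table, not something we have used so far.

```python
from collections import Counter
from math import comb


def crosstab(labels_a, labels_b):
    """Count cases in every (cluster in solution A, cluster in solution B) cell."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must label the same cases")
    return Counter(zip(labels_a, labels_b))


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same cases."""
    table = crosstab(labels_a, labels_b)
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cases: nothing to disagree about

    # Pairs of cases placed together within each cell, row, and column.
    sum_cells = sum(comb(count, 2) for count in table.values())
    sum_rows = sum(comb(count, 2) for count in Counter(labels_a).values())
    sum_cols = sum(comb(count, 2) for count in Counter(labels_b).values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both solutions
    return (sum_cells - expected) / (max_index - expected)
```

The index is 1 when the two solutions group the cases identically (even if the cluster numbers differ) and falls toward 0 when they agree no better than chance, so it could be tracked across specifications instead of reading each cross-tab by eye.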
   We have tried to manipulate the 'outlier handling' function in the
program, but this has not led to a more stable solution under various
fairly similar specifications.
   I have been wondering whether the data set may include some cases that
cluster tightly and others that are not easily classified and have an undue
effect on the cluster outcome.  If so, I'd like such cases to be excluded.  Is
there a way to change the solution specifications so that more cases will
be seen as outliers by the program?  Could that lead to a more stable
result?  Again, just setting the 'outliers' subcommand to a positive value
doesn't seem to hold out many cases (about 30 out of 1,200).
    I also encountered a note somewhere on the internet that suggested the
procedure is sensitive to the order in which the cases are read.  Is that
true?  Should I manipulate the case sort-order and could that be helpful in
getting a more stable result?
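
One way to test for an order effect is sketched below in Python, under stated assumptions. `cluster_fn` is a hypothetical stand-in for whatever procedure produces the cluster IDs (it takes the cases in a given order and returns one label per case, in that same order); it is not an SPSS call. The sketch shuffles the case order with several seeds, maps each solution back to the original case order, and reports how often pairs of solutions agree on whether two cases belong together.

```python
import random
from itertools import combinations


def labels_under_reordering(cases, cluster_fn, seed):
    """Cluster the cases in a shuffled order; return labels in the original order."""
    rng = random.Random(seed)
    order = list(range(len(cases)))
    rng.shuffle(order)
    shuffled_labels = cluster_fn([cases[i] for i in order])

    labels = [None] * len(cases)
    for position, original_index in enumerate(order):
        labels[original_index] = shuffled_labels[position]
    return labels


def pair_agreement(labels_a, labels_b):
    """Share of case pairs that both solutions treat alike (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two cases
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def order_stability(cases, cluster_fn, seeds):
    """Pairwise agreement between solutions obtained under different case orders."""
    solutions = [labels_under_reordering(cases, cluster_fn, s) for s in seeds]
    return [pair_agreement(a, b) for a, b in combinations(solutions, 2)]
```

If every value returned by `order_stability` is 1.0, the case order made no difference for those seeds; values well below 1.0 would suggest that order alone is moving cases between clusters.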
   Any ideas would be most welcome.
   Thanks in advance,
   Tom Guterbock


Thomas M. Guterbock                        Voice: (434)243-5223
Director                         CSR Main Number: (434)243-5222
Center for Survey Research                   FAX: (434)243-5233
University of Virginia     EXPRESS DELIVERY:  2400 Old Ivy Road
P. O. Box 400767                                      Suite 223
Charlottesville, VA 22904-4767        Charlottesville, VA 22903
                e-mail: [hidden email]