SPSSX Discussion

post-classification model for a cluster analysis

Tanya Dockendorf

post-classification model for a cluster analysis

Hi all,
We are looking for a consultant to help us build a post-classification model for a cluster analysis we recently completed. To classify the data we used an ensemble approach (with software from Sawtooth) in which many different solutions are generated first and K-means is then used to "cluster on clusters". I would usually use discriminant analysis to build a post-classification model with a reduced set of variables, so that we can classify future respondents. The problem is that even if I put all the variables used in segmentation into the discriminant analysis, I only get 85% classification accuracy. So I am looking for a consultant with experience in this area, particularly with methods other than discriminant analysis for this type of problem, who would like to take on a project...

Thanks in advance!
Tanya
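The 85% figure is a hit rate: the share of respondents whose predicted segment matches the segment the ensemble solution assigned them. A minimal sketch in Python of computing it together with a confusion matrix, which shows which segments the model mixes up. The labels are made up and the function names are hypothetical. Ideally both should be computed on respondents held out from fitting the model, since in-sample hit rates tend to be optimistic.

```python
from collections import Counter


def hit_rate(actual, predicted):
    """Share of respondents assigned to the segment the cluster analysis gave them."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted, labels):
    """Rows = segment from the cluster solution, columns = predicted segment."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical holdout data: ensemble segment vs. segment predicted by the model.
actual = [1, 1, 2, 2, 3, 3, 3, 1, 2, 3]
predicted = [1, 1, 2, 3, 3, 3, 3, 1, 2, 2]

print(hit_rate(actual, predicted))  # 0.8
for row in confusion_matrix(actual, predicted, labels=[1, 2, 3]):
    print(row)
```

Off-diagonal cells concentrated in one pair of segments usually point to segments that the available variables cannot separate well, rather than to a weakness of the classification method.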

====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Zetu, Dan

Re: post-classification model for a cluster analysis

Tanya,

For starters, 85% is a fabulous level of accuracy for this type of
problem. In practice, if you get 50%, you are happy.

I can think of only one alternative to discriminant analysis in your
case: multinomial logit. However, I don't think the method is the
problem, as discriminant analysis performs reasonably well when you have
the right kind of input data.
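Multinomial logit models the probability of each segment as a softmax of linear scores in the predictors; in SPSS it is available through the NOMREG procedure. The sketch below is not NOMREG: it is a dependency-free Python illustration, assuming numeric predictors, that fits the model by plain gradient descent on the log-likelihood. The function names (`fit_mnl`, `predict`) and the toy data are hypothetical. Unlike the textbook form, it gives every segment its own coefficients instead of fixing a reference category; that changes the parameterization, not the set of probabilities the model can express.

```python
import math


def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, row):
    # Prepend 1.0 so W[k][0] acts as the intercept for segment k.
    xb = [1.0] + list(row)
    return softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])


def predict(W, row):
    # Assign the segment with the highest predicted probability.
    probs = predict_proba(W, row)
    return max(range(len(probs)), key=probs.__getitem__)


def fit_mnl(X, y, n_classes, lr=0.1, epochs=2000):
    """Fit a multinomial logit by batch gradient descent.

    X: list of predictor rows; y: segment indices 0..n_classes-1.
    Returns W, where W[k] = [intercept, b1, ..., bp] for segment k.
    """
    n, p = len(X), len(X[0])
    W = [[0.0] * (p + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (p + 1) for _ in range(n_classes)]
        for row, label in zip(X, y):
            xb = [1.0] + list(row)
            probs = predict_proba(W, row)
            for k in range(n_classes):
                # Gradient of the negative log-likelihood: (p_k - 1[y == k]) * x.
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(p + 1):
                    grad[k][j] += err * xb[j]
        # Step against the average gradient.
        for k in range(n_classes):
            for j in range(p + 1):
                W[k][j] -= lr * grad[k][j] / n
    return W


# Hypothetical toy data: two predictors, three segments.
X = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
     [3.0, 0.0], [3.5, 0.0], [3.0, 0.5],
     [0.0, 3.0], [0.5, 3.0], [0.0, 3.5]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W = fit_mnl(X, y, n_classes=3)
print([predict(W, row) for row in X])
```

Gradient descent was chosen only to keep the example free of dependencies; a production fit would use a dedicated maximum-likelihood routine such as NOMREG.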

Regardless of the method you used for segmentation, a few questions need
to be answered before you can decide what level of accuracy you can live
with. In general, it all centers on what data you will have available
post-segmentation to assign subjects to segments. Only these data should
be used in devising the classification algorithm, regardless of method.

1. What types of base variables did you use in segmentation? Is this a
survey-based segmentation?

2. Are the variables used in the classification algorithm among the
segmentation variables? If so, this will tend to improve classification
accuracy.

3. Are you able, post-segmentation, to gather additional subject-level
data that can be used in classification? For example, can you ask your
subjects a limited number of questions that, in addition to demographics
or other data, can help you discriminate further between them?

As you already know, this is a complex problem, and in my view, while
accuracy is important, the lift you get from the discriminant model is
more important. For example, say you have 5 segments and your
discriminant model achieves 30% accuracy on average. Random assignment
would be right only 20% of the time (1 in 5), so the discriminant model
still gives you a 50% lift in accuracy (30% / 20% = 1.5).
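The lift arithmetic is the model's hit rate divided by the hit rate of random assignment, minus one. A small Python sketch of that calculation follows; the function names are hypothetical. The third function goes beyond the example above and assumes unequal segment sizes: if subjects are assigned at random in proportion to segment shares (often called the proportional chance criterion), the expected hit rate is the sum of the squared shares, which can serve as a baseline in place of 1/n.

```python
def chance_hit_rate(n_segments):
    # Random assignment to one of n equally likely segments is right 1/n of the time.
    return 1.0 / n_segments


def lift(model_hit_rate, n_segments):
    # Relative improvement over random assignment: 0.5 means 50% better than chance.
    return model_hit_rate / chance_hit_rate(n_segments) - 1.0


def proportional_chance(segment_shares):
    # Chance hit rate when subjects are assigned at random in proportion
    # to (possibly unequal) segment sizes: sum of squared shares.
    return sum(s * s for s in segment_shares)


print(round(lift(0.30, 5), 3))                               # 0.5
print(round(proportional_chance([0.4, 0.3, 0.2, 0.1]), 2))  # 0.3
```

With the hypothetical shares above, the chance baseline is 30% rather than 25%, so the same model hit rate corresponds to a smaller lift.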

Finally, have you tested different prior probabilities in the
discriminant model? If you know the "true" sizes of the segments, you can
use them as prior probabilities. If not, setting priors=equal may be a
better choice.
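To see how priors change assignments, here is a minimal Python sketch of a one-predictor linear discriminant rule: each case goes to the group with the largest prior times normal density, using a pooled within-group variance. The data, group names, and function names are hypothetical; SPSS DISCRIMINANT handles many predictors and sets priors through its PRIORS subcommand (EQUAL, SIZE, or a list of values).

```python
import math


def fit_lda_1d(x, groups):
    """Group means and pooled within-group variance for one predictor."""
    means, ss = {}, 0.0
    for g in sorted(set(groups)):
        vals = [xi for xi, gi in zip(x, groups) if gi == g]
        means[g] = sum(vals) / len(vals)
        ss += sum((v - means[g]) ** 2 for v in vals)
    pooled_var = ss / (len(x) - len(means))
    return means, pooled_var


def posteriors(value, means, pooled_var, priors):
    """P(group | value), proportional to prior * normal density.

    The constant 1/sqrt(2*pi*var) is the same for every group
    (shared variance), so it cancels and is omitted.
    """
    weights = {g: priors[g] * math.exp(-(value - m) ** 2 / (2 * pooled_var))
               for g, m in means.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical data: segment B is three times as large as segment A.
x = [1, 2, 3] + [5, 6, 7] * 3
groups = ["A"] * 3 + ["B"] * 9
means, var = fit_lda_1d(x, groups)

case = 3.9  # slightly closer to A's mean (2.0) than to B's mean (6.0)
equal = posteriors(case, means, var, {"A": 0.5, "B": 0.5})
by_size = posteriors(case, means, var, {"A": 0.25, "B": 0.75})
print(max(equal, key=equal.get), max(by_size, key=by_size.get))  # A B
```

The borderline case moves from A to B once the priors reflect B's larger share. Priors that match true segment sizes tend to raise the overall hit rate, at the cost of pulling borderline cases toward the larger segments.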

I am not sure this helps, but I face this issue regularly. Often you have
little choice but to accept a lower accuracy level, which is still better
than nothing.

-------------------------------
Dan Zetu
Analytical Consultant
R. L. Polk & Co.
248-728-7278
[hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Tanya Dockendorf
Sent: Thursday, September 04, 2008 5:42 PM
To: [hidden email]
Subject: post-classification model for a cluster analysis