|
Hi all,
We are looking for a consultant to help us build a post-classification model for a cluster analysis that we have recently done.

To classify the data we used an ensemble approach (with software from Sawtooth), where many different versions of solutions are created first and then K-means is used to "cluster on clusters". I would usually use discriminant analysis to build a post-classification model and reduce the number of variables used, so that we can classify future respondents. The problem is that even if I put all the variables used in the segmentation into the discriminant analysis, I only get 85% classification accuracy.

So, I am looking for a consultant with experience in this area, particularly with methods other than discriminant analysis for this type of problem, who would like to take on a project...

Thanks in advance!
Tanya

====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Tanya,
For starters, 85% is a fabulous level of accuracy for this type of problem. In practice, if you get 50%, you are happy. I can only think of multinomial logit as an alternative method to discriminant analysis in your case. However, I don't think the methodology is the problem, as discriminant analysis performs reasonably well if you have the right kind of data as input.

Regardless of the method you used in segmentation, a few questions need to be answered before you can decide what level of accuracy you can live with. In general, it all centers on what data you will have available post-segmentation to assign subjects to segments. Only these data should be used in devising the classification algorithm, regardless of method.

1. What types of base variables did you use in segmentation? Is this a survey-based segmentation?
2. Are the variables used in the classification algorithm a subset of the segmentation variables? If so, that should improve classification accuracy.
3. After segmentation, can you gather additional subject-level data for use in classification? For example, can you ask your subjects a limited number of questions that, in addition to demographics or other data, help you discriminate further between them?

As you already know, this is a complex problem, and in my view, while accuracy is important, the lift you get from the discriminant model is more important. For example, say you have 5 segments and your discriminant model performs with 30% accuracy on average. Randomly assigning subjects to 5 equally likely segments would be right about 20% of the time, so you still have a 50% lift in accuracy by using the discriminant model.

Finally, have you tested different prior probabilities in the discriminant model? If you know the "true" size of the segments, you can use that as the prior probability. If not, setting priors=equal may be a better choice.

I am not sure this helps, but I am facing this issue on a regular basis.
Many times you have little choice but to accept a lower accuracy level, which is better than nothing at all.

-------------------------------
Dan Zetu
Analytical Consultant
R. L. Polk & Co.

-----Original Message-----
From: SPSSX(r) Discussion On Behalf Of Tanya Dockendorf
Sent: Thursday, September 04, 2008 5:42 PM
Subject: post-classification model for a cluster analysis |
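For readers who want to check the lift arithmetic from the reply above, here is a minimal Python sketch. The function names `chance_accuracy` and `lift` are made up for illustration and are not part of SPSS or the Sawtooth software. The baseline assumes subjects are assigned at random in proportion to segment size, which reduces to 1/k for k equal-sized segments. The unequal segment shares in the last lines are hypothetical numbers, not figures from this thread.

```python
def chance_accuracy(segment_sizes):
    """Expected accuracy of random assignment proportional to segment size.

    With shares p_1..p_k, a random guess is correct with probability
    sum(p_i ** 2). For k equal segments this is simply 1 / k.
    """
    total = sum(segment_sizes)
    shares = [size / total for size in segment_sizes]
    return sum(share * share for share in shares)


def lift(model_accuracy, baseline_accuracy):
    """Relative improvement over the baseline (0.5 means 50% better)."""
    return (model_accuracy - baseline_accuracy) / baseline_accuracy


# Five equal-sized segments, as in the example in the reply.
baseline = chance_accuracy([1, 1, 1, 1, 1])
print(round(baseline, 3))              # 0.2
print(round(lift(0.30, baseline), 2))  # 0.5, i.e. the 50% lift
print(round(lift(0.85, baseline), 2))  # 3.25 for the 85% in the question

# Hypothetical unequal segment shares: the chance baseline rises,
# so the same accuracy corresponds to a smaller lift.
unequal = chance_accuracy([0.40, 0.30, 0.15, 0.10, 0.05])
print(round(unequal, 3))               # 0.285
```

When segment sizes are known and unequal, comparing against this proportional baseline (rather than 1/k) is one reasonable way to judge a classification rate, and it ties in with the choice of priors discussed in the reply.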
