Help With Classification Trees

Cardiff Tyke
Thanks to all that helped with my last problem!

I am currently using SPSS Classification Trees to try to segment a data file into Good and Bad risk (ideally I'd like to segment into 4 categories, but two is proving difficult enough).  I have a list of accounts plus various fields and demographic variables.  However, because of the customer base, the file is heavily weighted towards Bad risk accounts (the split is around 85% Bad to 15% Good).

When I use classification trees, the "best" node (I've selected "Good" accounts as my primary interest) is still heavily weighted towards bad payers (around 70% Bad to 30% Good), and overall SPSS does not predict any account holder as "Good".  I'm tasked with finding the attributes of "Good" risk account holders.
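A little arithmetic shows why no "Good" predictions appear. Assuming equal misclassification costs, a tree labels each terminal node with its most frequent category, so a node that is 70% Bad is labelled Bad even though it holds twice the file's overall share of Good accounts (a ratio often reported as lift). The sketch below is illustrative only: `node_summary` is a hypothetical helper, and the node counts are made up to match the percentages in the post.

```python
def node_summary(n_bad, n_good, overall_good_share):
    """Majority-rule label and lift for one tree node (illustrative sketch)."""
    n = n_bad + n_good
    good_share = n_good / n
    # With equal misclassification costs, the node takes its most frequent category.
    predicted = "Good" if good_share > 0.5 else "Bad"
    # Lift: how much richer in Good cases this node is than the whole file.
    lift = good_share / overall_good_share
    return predicted, good_share, lift


# Percentages from the post: ~15% Good overall, best node ~70% Bad / 30% Good.
# The counts themselves are hypothetical.
predicted, share, lift = node_summary(n_bad=70, n_good=30, overall_good_share=0.15)
print(predicted, round(share, 2), round(lift, 2))  # Bad 0.3 2.0
```

So the "best" node can still be useful for describing Good account holders even though it never receives a Good label.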

When faced with this situation, what should a data analyst do?

1)  Try to use a more balanced sample?  I'm not sure how to select an equal number of cases from each group in SPSS without manually deleting records.
2)  Try to find more variables (I think I already have all the ones that are available).
3)  Try a different approach (I chose Classification Trees because I assumed they would be the easiest starting point and would identify the key variables the quickest).
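Option 1 amounts to random undersampling: keep every case from the smaller group and draw an equal-sized random subset of the larger one, rather than deleting records by hand. The Python below is a language-neutral sketch of that idea, not SPSS syntax; `undersample`, the `risk` field, and the 850/150 file are hypothetical.

```python
import random
from collections import defaultdict


def undersample(records, label_key, seed=0):
    """Randomly keep the same number of records from each class (illustrative sketch)."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    # The smallest class sets the per-class sample size.
    n_keep = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_keep))
    rng.shuffle(balanced)
    return balanced


# Hypothetical file: 850 Bad and 150 Good accounts.
accounts = [{"id": i, "risk": "Bad" if i < 850 else "Good"} for i in range(1000)]
sample = undersample(accounts, "risk")
print(len(sample), sum(r["risk"] == "Good" for r in sample))  # 300 150
```

Note that this discards most of the Bad cases, so any percentages read off a tree grown on the balanced sample no longer reflect the original file.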

As ever, any help is greatly appreciated.

JC.

Re: Help With Classification Trees

Oliver, Richard
Two comments:
 
1. If the 85/15 bad/good split is representative of the population of interest, then it's probably not a good idea to try to create a sample that is "more equal". If, however, you have reason to believe that the "bad" group is over-represented, you could weight the data to produce a more representative distribution.
 
2. Try reducing the minimum parent and child node sizes (the defaults are 100 and 50 respectively), and/or increasing the maximum tree depth (the default is 3 for CHAID, 5 for CRT and QUEST).
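For point 1, the weight for each group is its target share divided by its observed share, so the weighted group totals match the distribution you believe is representative while the total case count stays the same. A minimal Python sketch, assuming a hypothetical target of 70% Bad / 30% Good; the function name and numbers are illustrative, and in SPSS the resulting values would go into a weight variable switched on with `WEIGHT BY`.

```python
from collections import Counter


def representative_weights(labels, target_shares):
    """Per-class case weights that reweight a sample to target shares (illustrative sketch)."""
    counts = Counter(labels)
    n = len(labels)
    # weight = target share / observed share, so weighted class totals hit the target
    return {c: target_shares[c] / (counts[c] / n) for c in counts}


# Hypothetical: sample is 85% Bad / 15% Good, population believed to be 70% / 30%.
labels = ["Bad"] * 850 + ["Good"] * 150
weights = representative_weights(labels, {"Bad": 0.70, "Good": 0.30})
print({c: round(w, 3) for c, w in weights.items()})  # {'Bad': 0.824, 'Good': 2.0}
```

Here 850 Bad cases at about 0.824 each count as 700 and 150 Good cases at 2.0 each count as 300, still 1,000 in total.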
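For point 2, the growth limits act as gatekeepers on every candidate split: a node smaller than the minimum parent size is never split, a split is rejected if any child would fall below the minimum child size, and nothing is split once the maximum depth is reached. A small, Good-rich group can therefore be invisible under the defaults. The sketch below is a simplified illustration using the defaults quoted above (100, 50, and depth 3 for CHAID); `can_split` is hypothetical and is not how SPSS implements its rules internally.

```python
def can_split(parent_n, child_ns, depth, min_parent=100, min_child=50, max_depth=3):
    """Check a candidate split against tree-growing limits (simplified illustration)."""
    if depth >= max_depth:
        return False  # node already sits at the maximum depth (root is depth 0)
    if parent_n < min_parent:
        return False  # node too small to be considered for splitting
    # Every resulting child node must meet the minimum child size.
    return all(n >= min_child for n in child_ns)


# Hypothetical split of a 300-case node that isolates a Good-rich group of 40 cases.
print(can_split(300, [260, 40], depth=2))                # False: 40 < 50
print(can_split(300, [260, 40], depth=2, min_child=25))  # True
```

Lowering the limits lets such groups surface, at the cost of nodes built on fewer cases.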

________________________________

From: SPSSX(r) Discussion on behalf of Cardiff Tyke
Sent: Sun 8/20/2006 11:18 AM
To: [hidden email]
Subject: Help With Classification Trees