Predictive model for rare event

Predictive model for rare event

Paul-2
Hi everyone,

I would like to build a predictive model (logistic regression, decision
tree, or neural network), but the frequency of the event I am predicting is
extremely small, less than 1%.
Here is the frequency distribution of my dependent variable, Churn_ind:
Churn_ind=1  150 (0.75%)
Churn_ind=0  19850 (99.25%)
Questions:
Q1: What is the minimum sample size needed to run a reliable model?
Q2: What model would best fit this type of distribution, where the event is
less than 1% (in this case 150 out of 20000)?
Q3: Is there a minimum sample size (N) for running a decision tree? I tried
to run a decision tree but got no results.
I am turning to the group for ideas. Your assistance is more
than welcome.

Thanks,

Paul

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Predictive model for rare event

Bruce Weaver
Administrator
To avoid over-fitting a logistic regression model, one should have 15-20 events per model parameter (although sometimes that is relaxed to 10 events per parameter).  "Event" = the outcome category with the lower frequency.  For more info, see:

   http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf
   http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist -- see "Overfitting and lack of model validation"
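
As a rough illustration, the rule of thumb above can be applied to the counts in the original post. This is only a sketch in Python (not SPSS syntax); the function name max_parameters is made up for the example, and whether the intercept counts as a parameter is left open here.

```python
def max_parameters(n_events: int, events_per_parameter: int) -> int:
    """Largest number of model parameters the events-per-parameter rule allows."""
    return n_events // events_per_parameter


# Counts taken from the original post.
counts = {"Churn_ind=1": 150, "Churn_ind=0": 19850}

# "Event" = the outcome category with the lower frequency.
n_events = min(counts.values())

# Strict (20), moderate (15), and relaxed (10) versions of the rule of thumb.
for epp in (20, 15, 10):
    limit = max_parameters(n_events, epp)
    print(f"{epp} events per parameter: at most {limit} parameters")
```

With 150 events, this gives roughly 7 to 10 parameters under the 15-20 rule, or 15 if the rule is relaxed to 10 events per parameter. Note that the limit depends on the 150 events, not on the 20000 total cases.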

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).