Hi everyone,
I would like to build a predictive model (logistic regression, decision tree, or neural net), but the event I am predicting is extremely rare, less than 1%. Here is the frequency distribution of my dependent variable, Churn_ind:

Churn_ind=1: 150 (0.75%)
Churn_ind=0: 19850 (99.25%)

Questions:
Q1: What is the minimum sample size needed to run a reliable model?
Q2: What model best fits this type of distribution, where my event occurs in less than 1% of cases (here, 150 out of 20000)?
Q3: Is there a minimum sample size N for running a decision tree? I tried to run a decision tree but got no results.

I am turning to the group here to seek ideas. Your assistance is more than welcome.

Thanks,
Paul
To avoid over-fitting a logistic regression model, one should have 15-20 events per model parameter (although sometimes that is relaxed to 10 events per parameter). "Event" = the outcome category with the lower frequency. For more info, see:
http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf
http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist -- see "Overfitting and lack of model validation"

HTH.
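Applied to the counts in the original question (150 churners out of 20,000), the events-per-parameter rule of thumb turns into a ceiling on how many parameters the model can carry. The Python sketch below is only an illustration: the helper name `max_parameters` is hypothetical, and the rule is a heuristic, not a guarantee against overfitting. Note that every dummy-coded category level and every interaction term counts as a parameter, not just every variable.

```python
def max_parameters(n_events: int, events_per_parameter: float) -> int:
    """Hypothetical helper: largest number of model parameters that
    n_events can support at the given events-per-parameter (EPV) ratio."""
    if events_per_parameter <= 0:
        raise ValueError("events_per_parameter must be positive")
    return int(n_events // events_per_parameter)


# Frequencies of Churn_ind from the question.
counts = {1: 150, 0: 19850}

# "Event" = the outcome category with the lower frequency.
n_events = min(counts.values())

# Stricter (20, 15) and relaxed (10) versions of the rule of thumb.
for epv in (20, 15, 10):
    print(f"{epv} events/parameter -> at most "
          f"{max_parameters(n_events, epv)} parameters")
```

With only 150 events, the total sample of 20,000 matters much less than the small number of churners: the stricter rules allow roughly 7 to 10 parameters, and even the relaxed rule allows about 15.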
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).