Limit on the number of independent variables in binary logistic regression and redundancy among the independent variables

Staffan Lindberg

Dear list!

 

In factor analysis there is a rule of thumb (one of many) that the number of cases should be approximately 4-5 times greater than the number of variables. Is there a corresponding rule for binomial logistic regression? Would it be feasible to have, say, circa 150 independent variables (a mixture of scale, ordinal and nominal ones) with 600 cases? Or are there other considerations that suggest this should not be done? Second, are there any caveats if there are several redundant variables among the independent variables, e.g. age in 1-year classes, 5-year classes and 10-year classes in the same set of independent variables?

 

thankful for any input on this

 

best

 

Staffan Lindberg

Sweden


Re: Limit on the number of independent variables in binary logistic regression and redundancy among the independent variables

Martin Holt
Hi Staffan,
 
If you search the group MedStats using "logistic regression sample size holt" in "Search this Group" at this page (without the quotation marks)
 
 
 
you will find a number of answers to your question. You do not need to sign in: the group is open to anyone to read.
 
I've selected the following. It gives you the "rules" and where they came from, and what you can do if your sample size is too small.
 
Note the original reference of Peduzzi et al. 1996.
 
Best Wishes,
 
Martin Holt

Re: Limit on the number of independent variables in binary logistic regression and redundancy among the independent variables

Bruce Weaver
Administrator
In reply to this post by Staffan Lindberg
Here are some links:

   http://faculty.chass.ncsu.edu/garson/PA765/logistic.htm#assume  -- see item 14
   http://www.medcalc.be/manual/logistic_regression.php  --  see "Sample Size Considerations"
   http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist  -- see "Overfitting and lack of model validation"



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: Limit on the number of independent variables in binary logistic regression and redundancy among the independent variables

SR Millis-3
In reply to this post by Staffan Lindberg
Staffan,
 
The size of the smaller of the two outcome groups helps to determine the number of covariates that you can enter into your model. You will want to have about 10 subjects in that smaller group per covariate/variable.
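
To make the arithmetic concrete, here is a minimal sketch of that rule of thumb in Python. The function name and the outcome splits are hypothetical: the thread does not say how the 600 cases divide between the two outcome groups, so three assumed splits are shown. Note also that a nominal variable with k categories uses k - 1 parameters once it is dummy coded, so the count of parameters can exceed the count of variables.

```python
def max_covariates(n_group_a: int, n_group_b: int, per_covariate: int = 10) -> int:
    """Rule-of-thumb ceiling on covariates: the smaller outcome
    group divided by the required subjects per covariate."""
    smaller_group = min(n_group_a, n_group_b)
    return smaller_group // per_covariate


# Assumed splits of the 600 cases (not given in the original question).
n_total = 600
for n_events in (300, 150, 60):
    limit = max_covariates(n_events, n_total - n_events)
    print(f"{n_events} vs {n_total - n_events}: about {limit} covariates")
```

Even with a perfectly balanced 300/300 split, the rule suggests roughly 30 covariates, far fewer than the 150 proposed.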
 
Harrell, F. E., Jr. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. New York: Springer-Verlag.
 
Scott
~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, CSci
Board Certified in Clinical Neuropsychology, Clinical Psychology, & Rehabilitation Psychology
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682
