I know this has come up but can't find the reference or consensus statement:
There is a "generally accepted" ratio of cases to the number of IVs in a regression; I believe it was something like 30 to 1, but I'm not sure. Does anyone recall it, or can anyone offer insight? Tks, W
Will
Statistical Services
============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============
Hi Will,
You might want to take a look at the article below. Green argues that the general rule of X cases per predictor results in sample sizes that may be unnecessarily large (is that possible?). However, whether you accept his formula or not doesn't really matter, because he gives references for two or three other rules for estimating the sample size, which should help.

Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510.

Best,
Lisa

Lisa T. Stickney
Ph.D. Candidate
The Fox School of Business and Management
Temple University
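For what it's worth, the two simple rules of thumb usually attributed to Green (1991) are N >= 50 + 8m for testing the overall model and N >= 104 + m for testing individual predictors, where m is the number of predictors. A minimal Python sketch of those two rules follows. The function name and dictionary keys are made up for illustration, the rules are quoted from memory and are tied to a medium effect size, so please check them against the article before relying on them.

```python
def green_minimum_n(num_predictors: int) -> dict:
    """Rule-of-thumb minimum sample sizes commonly attributed to Green (1991).

    Assumption: these are the simplified rules for a medium effect size;
    the function name and keys are hypothetical, chosen for this sketch.
    """
    m = num_predictors
    return {
        "overall_model": 50 + 8 * m,         # testing the multiple correlation
        "individual_predictors": 104 + m,    # testing individual coefficients
    }


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, green_minimum_n(m))
```

With 4 IVs the two rules ask for 82 and 108 cases, i.e. roughly 20 to 27 cases per IV, which is in the same neighborhood as the 30-to-1 figure in the original question.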
In reply to this post by zstatman
Stevens' Applied Multivariate Stats book cites a study that recommended 15 subjects per predictor. That recommendation assumes 3-25 predictors, shrinkage of less than 0.05 with high probability (.9), and a population squared multiple correlation of about 0.5. The magnitude of the population correlation affects the recommended number: a higher rho-square leads to smaller recommended numbers, and a lower rho-square leads to higher ones.
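To get a rough feel for how shrinkage behaves at 15 subjects per predictor, here is a small Python sketch using the familiar adjusted R-squared (Wherry) formula. This is only an illustration: the study Stevens cites may well have used a different (cross-validation) shrinkage estimate, and treating .5 as the observed R-squared is my assumption, not something from the book. The function names are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Wherry-type adjusted R-squared for n cases and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def shrinkage(r2: float, n: int, k: int) -> float:
    """Difference between the observed and the adjusted R-squared."""
    return r2 - adjusted_r2(r2, n, k)


if __name__ == "__main__":
    # Assumption: observed R-squared of .5 and exactly 15 cases per predictor.
    for k in (3, 10, 25):
        n = 15 * k
        print(f"k={k:2d}  n={n:3d}  shrinkage={shrinkage(0.5, n, k):.4f}")
```

Under these assumptions the shrinkage stays below .05 across the 3-25 predictor range, and it grows as the assumed R-squared gets smaller, which is the direction the post describes.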
In reply to this post by zstatman
The mathematics of regression defines no limit beyond having as many variables as cases (you can even get around that with Partial Least Squares). Anything beyond that is an arbitrary rule. Satisfactory results will, of course, depend on the variances and covariances of the regressors and the size of the error term.
My 3.14159 cents.
Jon Peck
In reply to this post by zstatman
Such "ratios" are generally not very helpful. The issue is power, and that will depend on the effect size as well as the sample size. So the # of subjects per variable will depend on the R-squared for the full model. The lower it is, the more subjects per variable will be needed.

The effect size in regression, f-squared, is defined as the R-squared change attributable to a variable divided by 1 minus the R-squared for the full model. In the table below, the change in R-squared is set at .075. The R-squareds for the full models range from .10 to .50. This leads to f-squareds that range from small/medium to medium (Cohen, 1988). The n's range from 40 to 120. With 4 IVs, this is 10 to 30 subjects per IV.

As you can see, 10 subjects per IV is not enough for any full-model R-squared in the table. However, an R-squared for the full model of .7 would give an f-squared of .25 and a power of .84. With 20 subjects per IV, a full-model R-squared of .3 is sufficient to give a power > .8, and with 30 per IV, the power is about .87 even if the full-model R-squared is only .10, and > .9 once it reaches .20.

Power analyses for a regression model with four predictors (RSQ_change = .075). Obs 1 is the null case (f_square = 0), where power equals alpha.

Obs  n_total  alpha  u   df  f_square   lambda    power
  1       40   0.05  1   35   0.00000   0.0000  0.05000
  2       40   0.05  1   35   0.08333   3.0833  0.40057
  3       40   0.05  1   35   0.09375   3.4688  0.44099
  4       40   0.05  1   35   0.10714   3.9643  0.49061
  5       40   0.05  1   35   0.12500   4.6250  0.55230
  6       40   0.05  1   35   0.15000   5.5500  0.62965
  7       80   0.05  1   75   0.08333   6.4167  0.70561
  8       80   0.05  1   75   0.09375   7.2188  0.75562
  9       80   0.05  1   75   0.10714   8.2500  0.80931
 10       80   0.05  1   75   0.12500   9.6250  0.86488
 11       80   0.05  1   75   0.15000  11.5500  0.91845
 12      120   0.05  1  115   0.08333   9.7500  0.87210
 13      120   0.05  1  115   0.09375  10.9688  0.90728
 14      120   0.05  1  115   0.10714  12.5357  0.93954
 15      120   0.05  1  115   0.12500  14.6250  0.96654
 16      120   0.05  1  115   0.15000  17.5500  0.98589

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics Director of Research, University of Texas Health Science Center at Houston
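For anyone who wants to try other sample sizes or R-squared values, here is a minimal, standard-library-only Python sketch of the same kind of power calculation. It is not the code that produced the table above. It assumes f-squared = RSQ_change / (1 - full-model R-squared), a noncentrality of lambda = f-squared * (u + df + 1), which matches the lambda column, and power taken from the noncentral F distribution. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    fix = lambda v: tiny if abs(v) < tiny else v
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / fix(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / fix(1.0 + aa * d)
        c = fix(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / fix(1.0 + aa * d)
        c = fix(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_square(rsq_change, rsq_full):
    """Effect size: R-squared change over 1 minus full-model R-squared."""
    return rsq_change / (1.0 - rsq_full)


def power_f_test(n_total, n_predictors, f2, u=1, alpha=0.05):
    """Power of the F test for u numerator df, via the noncentral F."""
    df = n_total - n_predictors - 1
    lam = f2 * (u + df + 1)          # assumed noncentrality (see lead-in)
    # Critical value on the beta scale y = u*F / (u*F + df), by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(u / 2.0, df / 2.0, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    y_crit = 0.5 * (lo + hi)
    # Noncentral F CDF as a Poisson(lambda/2) mixture of central beta CDFs.
    half = lam / 2.0
    weight = math.exp(-half)
    cdf = 0.0
    for j in range(500):
        cdf += weight * reg_inc_beta(u / 2.0 + j, df / 2.0, y_crit)
        weight *= half / (j + 1)
        if j > half and weight < 1e-16:
            break
    return 1.0 - cdf


if __name__ == "__main__":
    for n in (40, 80, 120):
        for rsq_full in (0.10, 0.20, 0.30, 0.40, 0.50):
            f2 = f_square(0.075, rsq_full)
            print(n, rsq_full, round(f2, 5), round(power_f_test(n, 4, f2), 5))
```

If the assumptions hold, the printed values should land close to the power column in the table; treat any remaining difference as a sign that the original program used a slightly different definition.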