Ratio of Cases to Regression Variables

Ratio of Cases to Regression Variables

zstatman
I know this has come up before, but I can't find the reference or a
consensus statement:

There is a "generally accepted" ratio of cases to the number of IVs in a
regression. I believe it was something like 30 to 1, but I'm not sure.

Does anyone recall it, or can anyone offer insight?

Tks,
W
Will
Statistical Services
 
============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============

Re: Ratio of Cases to Regression Variables

lts1
Hi Will,

    You might want to take a look at the article below.  Green argues that
the general rule of X cases per predictor results in sample sizes
that may be unnecessarily large (is that possible?).  However, whether you
accept his formula or not doesn't really matter, because he gives references
for two or three other rules for estimating the sample size, which should
help.
    Green, S. B. (1991). How many subjects does it take to do a regression
analysis. Multivariate Behavioral Research, 26, 499-510.

    Best,
        Lisa

Lisa T. Stickney
Ph.D. Candidate
The Fox School of Business
     and Management
Temple University
[hidden email]


----- Original Message -----
From: "Will Bailey [Statman]" <[hidden email]>
To: <[hidden email]>
Sent: Wednesday, March 07, 2007 11:14 AM
Subject: Ratio of Cases to Regression Variables


>I know this has come up but can't find the reference or consensus
>statement:
>
> There is a "generally accepted" ratio of cases to the number of IVs in a
> regression, I believe it was something like 30 to 1 but not sure.
>
> Anyone recall or can offer insight.
>
> Tks,
> W
>

Re: Ratio of Cases to Regression Variables

Anthony Babinec
In reply to this post by zstatman
Stevens' Applied Multivariate Stats book cites a study
that recommended about 15 subjects per predictor to keep
shrinkage below 0.05 with high probability (.9), given
3-25 predictors and a population squared multiple correlation
of about 0.5. The magnitude of the population correlation affects
the recommended number, with higher rho-square leading to
smaller recommended numbers, and lower rho-square leading
to higher recommended numbers.
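
The dependence on rho-square can be illustrated with a small sketch. The study Stevens cites is not named in the thread, and its shrinkage criterion may well differ, so the code below swaps in Wherry's adjusted R squared formula as a rough stand-in. The function name and the choice of 4 predictors are assumptions made only for illustration.

```python
def wherry_shrinkage(r2, n, k):
    """Drop from sample R^2 to Wherry's adjusted R^2 (n cases, k predictors)."""
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2 - adjusted

k = 4  # hypothetical number of predictors
for r2 in (0.25, 0.50, 0.75):
    for per_predictor in (5, 15, 30):
        n = per_predictor * k
        shrink = wherry_shrinkage(r2, n, k)
        print(f"R2={r2:.2f}  n/k={per_predictor:2d}  shrinkage={shrink:.3f}")
```

Under this stand-in formula the shrinkage equals (1 - R2) * k / (n - k - 1), so it falls both as the cases-per-predictor ratio rises and as R squared rises, which is the direction described above.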


Re: Ratio of Cases to Regression Variables

Peck, Jon
In reply to this post by zstatman
The mathematics of regression sets no limit beyond requiring at least as many cases as variables (and you can even get around that with Partial Least Squares).  Anything beyond that is an arbitrary rule.  Satisfactory results will, of course, depend on the variances and covariances of the regressors and the size of the error term.
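
A minimal sketch of why the count of cases matters at all, using made-up numbers (the data and the names X, y, and fitted are hypothetical, not from this thread): with two cases and three coefficients (intercept plus two predictors), more than one coefficient vector reproduces the data exactly, so ordinary least squares has no unique solution.

```python
# Two cases, three coefficients: intercept, x1, x2 (illustrative data only).
X = [(1.0, 1.0, 2.0),
     (1.0, 2.0, 1.0)]
y = [3.0, 5.0]

def fitted(coefs, rows):
    """Fitted values for a coefficient vector applied to each design row."""
    return [sum(b * x for b, x in zip(coefs, row)) for row in rows]

# Two different coefficient vectors; they differ by (-3, 1, 1),
# a direction the two design rows cannot distinguish.
b_first = (1.0, 2.0, 0.0)
b_second = (-2.0, 3.0, 1.0)

fit_first = fitted(b_first, X)
fit_second = fitted(b_second, X)
print(fit_first, fit_second, y)
```

Both vectors fit the two cases perfectly, so the data alone cannot choose between them. Once there are at least as many cases as coefficients (and the regressors are not collinear), that ambiguity disappears.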

My 3.14159 cents.
Jon Peck


Re: Ratio of Cases to Regression Variables

Swank, Paul R
In reply to this post by zstatman
Such "ratios" are generally not very helpful. The issue is power, and
that will depend on the effect size as well as the sample size. So the
number of subjects per variable will depend on the R squared for the full
model: the lower it is, the more subjects per variable will be needed.
The effect size in regression, f squared, is defined as the change in R
squared attributable to a variable divided by 1 minus the R squared for
the full model. In the table below, the change in R squared is set at
.075. The R squareds for the full models range from .10 to .50. This
leads to f squareds of .083 to .15, which range from small/medium to
medium (Cohen, 1988). The n's range from 40 to 120. With 4 IVs, this is
10 to 30 subjects per IV. As you can see, 10 subjects per IV is not
enough for any full-model R squared in the table. However, an R squared
for the full model of .7 would give an f squared of .25 and a power of
.84. With 20 subjects per IV, a full-model R squared of .3 is sufficient
to give a power > .8, and with 30 per IV, the power is > .85 even if the
full-model R squared is only .10 (and > .9 once it reaches .20).

power analyses for regression model
four predictors - RSQ_change = .075

Obs   n_total   alpha    u     df    f_square     lambda      power

  1        40    0.05    1     35     0.00000     0.0000    0.05000
  2        40    0.05    1     35     0.08333     3.0833    0.40057
  3        40    0.05    1     35     0.09375     3.4688    0.44099
  4        40    0.05    1     35     0.10714     3.9643    0.49061
  5        40    0.05    1     35     0.12500     4.6250    0.55230
  6        40    0.05    1     35     0.15000     5.5500    0.62965
  7        80    0.05    1     75     0.08333     6.4167    0.70561
  8        80    0.05    1     75     0.09375     7.2188    0.75562
  9        80    0.05    1     75     0.10714     8.2500    0.80931
 10        80    0.05    1     75     0.12500     9.6250    0.86488
 11        80    0.05    1     75     0.15000    11.5500    0.91845
 12       120    0.05    1    115     0.08333     9.7500    0.87210
 13       120    0.05    1    115     0.09375    10.9688    0.90728
 14       120    0.05    1    115     0.10714    12.5357    0.93954
 15       120    0.05    1    115     0.12500    14.6250    0.96654
 16       120    0.05    1    115     0.15000    17.5500    0.98589
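
For anyone who wants to try other sample sizes, here is a rough pure-Python sketch of the calculation. It assumes lambda = f_square * (u + df + 1), which reproduces the lambda column above, and takes power as the upper tail of a noncentral F distribution at the alpha-level critical value. The function names are made up for illustration, and this is not the program that produced the table.

```python
import math

def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def regression_power(r2_change, r2_full, n, k, u=1, alpha=0.05):
    """Return (f_square, lambda, power) for a test of u predictors out of k."""
    df = n - k - 1
    f2 = r2_change / (1.0 - r2_full)
    lam = f2 * (u + df + 1)
    # Critical value on the beta scale x = u*F / (u*F + df), found by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(u / 2, df / 2, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    x_crit = 0.5 * (lo + hi)
    # Noncentral F CDF as a Poisson-weighted sum of central beta CDFs.
    weight = math.exp(-lam / 2)
    cdf = 0.0
    for j in range(200):
        cdf += weight * reg_inc_beta(u / 2 + j, df / 2, x_crit)
        weight *= (lam / 2) / (j + 1)
    return f2, lam, 1.0 - cdf

for n in (40, 80, 120):
    for r2_full in (0.10, 0.20, 0.30, 0.40, 0.50):
        f2, lam, power = regression_power(0.075, r2_full, n, k=4)
        print(f"{n:4d}  {r2_full:.2f}  {f2:.5f}  {lam:8.4f}  {power:.5f}")
```

The loop covers the same n and full-model R squared combinations as observations 2 through 16, so the printed values should land close to the f_square, lambda, and power columns of the table.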


Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Director of Research,


University of Texas Health Science Center at Houston
