SPSSX Discussion

Ratio of Cases to Regression Variables

zstatman

Ratio of Cases to Regression Variables

I know this has come up but can't find the reference or consensus statement:

There is a "generally accepted" ratio of cases to the number of IVs in a
regression, I believe it was something like 30 to 1 but not sure.

Anyone recall or can offer insight.

Tks,
W

Will
Statistical Services

============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============

lts1

Re: Ratio of Cases to Regression Variables

Hi Will,

You might want to take a look at the article below. Green argues that
the general rule of X cases per predictor results in sample sizes
that may be unnecessarily large (is that possible?). However, whether you
accept his formula or not doesn't really matter, because he gives references
for two or three other rules for estimating the sample size, which should
help.

Green, S. B. (1991). How many subjects does it take to do a regression
analysis? Multivariate Behavioral Research, 26, 499-510.

Best,
Lisa

Lisa T. Stickney
Ph.D. Candidate
The Fox School of Business
and Management
Temple University
[hidden email]

----- Original Message -----
From: "Will Bailey [Statman]" <[hidden email]>
To: <[hidden email]>
Sent: Wednesday, March 07, 2007 11:14 AM
Subject: Ratio of Cases to Regression Variables

>I know this has come up but can't find the reference or consensus
>statement:
>
> There is a "generally accepted" ratio of cases to the number of IVs in a
> regression, I believe it was something like 30 to 1 but not sure.
>
> Anyone recall or can offer insight.
>
> Tks,
> W
>

Anthony Babinec

Re: Ratio of Cases to Regression Variables

In reply to this post by zstatman

Stevens' Applied Multivariate Stats book cites a study
that recommended 15 subjects per predictor when considering
3-25 predictors, shrinkage of less than 0.05 with high
probability (.9), and a population squared multiple correlation
of about 0.5. The magnitude of population correlation affects
the recommended number, with higher rho-square leading to
smaller recommended numbers, and lower rho-square leading
to higher recommended numbers.
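
To get a feel for how shrinkage depends on the subjects-per-predictor ratio, here is a minimal sketch. It uses the familiar Wherry adjusted R-squared as a stand-in for shrinkage, which is an assumption on my part: the study cited by Stevens used a cross-validation criterion, so these numbers are only illustrative. The function names, the sample R-squared of 0.5, and the choice of 4 predictors are all hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Wherry adjusted R-squared for a model with k predictors and n cases."""
    df_error = n - k - 1
    if df_error <= 0:
        raise ValueError("need more cases than predictors plus the intercept")
    return 1.0 - (1.0 - r2) * (n - 1) / df_error


def shrinkage(r2, n, k):
    """Drop from the sample R-squared to its adjusted value."""
    return r2 - adjusted_r2(r2, n, k)


if __name__ == "__main__":
    k = 4      # assumed number of predictors
    r2 = 0.5   # assumed sample R-squared
    for per_predictor in (5, 10, 15, 30):
        n = per_predictor * k
        print(f"{per_predictor:>2} per predictor (n={n:>3}): "
              f"shrinkage = {shrinkage(r2, n, k):.3f}")
```

Under these assumptions the shrinkage falls below 0.05 somewhere between 10 and 15 cases per predictor, and rerunning with a smaller R-squared pushes the required ratio up, which is the direction described above.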

Peck, Jon

Re: Ratio of Cases to Regression Variables

In reply to this post by zstatman

The mathematics of regression defines no limit beyond needing at least as many cases as estimated coefficients (you can even get around that with Partial Least Squares). Anything beyond that is an arbitrary rule. Satisfactory results will, of course, depend on the variances and covariances of the regressors and the size of the error term.

My 3.14159 cents.
Jon Peck

Swank, Paul R

Re: Ratio of Cases to Regression Variables

In reply to this post by zstatman

Such "ratios" are generally not very helpful. The issue is power, and
that depends on the effect size as well as the sample size. So the number
of subjects per variable will depend on the R-squared for the full
model: the lower it is, the more subjects per variable will be needed.
The effect size in regression, f-squared, is defined as the change in
R-squared attributable to a variable divided by 1 minus the R-squared
for the full model. In the table below, the change in R-squared is set
at .075. The R-squareds for the full models range from .10 to .50 (row 1,
with an f-squared of 0, is the null case, where power equals alpha). This
leads to f-squareds that range from small/medium to medium (Cohen,
1988). The n's range from 40 to 120. With 4 IVs, this is 10 to 30
subjects per IV. As you can see, 10 subjects per IV is not enough for
any of the full-model R-squareds in the table (power tops out at .63).
However, an R-squared for the full model of .7 would give an f-squared
of .25 and a power of .84. With 20 subjects per IV, a full-model
R-squared of .3 is sufficient to give a power > .8, and with 30 per IV,
the power is .87 even if the full-model R-squared is only .10, and > .9
once it reaches .20.

Power analyses for regression model
Four predictors, RSQ_change = .075

Obs  n_total  alpha  u   df  f_square   lambda    power
  1       40   0.05  1   35   0.00000   0.0000  0.05000
  2       40   0.05  1   35   0.08333   3.0833  0.40057
  3       40   0.05  1   35   0.09375   3.4688  0.44099
  4       40   0.05  1   35   0.10714   3.9643  0.49061
  5       40   0.05  1   35   0.12500   4.6250  0.55230
  6       40   0.05  1   35   0.15000   5.5500  0.62965
  7       80   0.05  1   75   0.08333   6.4167  0.70561
  8       80   0.05  1   75   0.09375   7.2188  0.75562
  9       80   0.05  1   75   0.10714   8.2500  0.80931
 10       80   0.05  1   75   0.12500   9.6250  0.86488
 11       80   0.05  1   75   0.15000  11.5500  0.91845
 12      120   0.05  1  115   0.08333   9.7500  0.87210
 13      120   0.05  1  115   0.09375  10.9688  0.90728
 14      120   0.05  1  115   0.10714  12.5357  0.93954
 15      120   0.05  1  115   0.12500  14.6250  0.96654
 16      120   0.05  1  115   0.15000  17.5500  0.98589
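
The columns of the table can be recomputed from the definitions in the post. The sketch below is not the program that produced the table; it is a standard-library Python approximation that assumes f_square = RSQ_change / (1 - full-model R-squared), df = n_total - predictors - 1, and lambda = f_square * (u + df + 1), which is what the tabled lambda values imply. Power comes from the noncentral F distribution, written as a Poisson-weighted sum of incomplete beta functions. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued-fraction recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def power_for_added_predictor(n_total, n_predictors, r2_change, r2_full,
                              alpha=0.05, u=1):
    """Return (f_square, lambda, power) for testing u added predictors."""
    v = n_total - n_predictors - 1           # error df
    f2 = r2_change / (1.0 - r2_full)         # effect size f-squared
    lam = f2 * (u + v + 1)                   # noncentrality parameter
    # Critical value on the beta scale y = u*F / (u*F + v), found by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if reg_inc_beta(u / 2.0, v / 2.0, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    y_crit = (lo + hi) / 2.0
    # Noncentral F CDF as a Poisson(lambda/2) mixture of central beta CDFs.
    # exp(-lam/2) underflows for very large lambda; fine for values like these.
    weight, total, cdf, j = math.exp(-lam / 2.0), 0.0, 0.0, 0
    while total < 1.0 - 1e-12 and j < 1000:
        cdf += weight * reg_inc_beta(u / 2.0 + j, v / 2.0, y_crit)
        total += weight
        j += 1
        weight *= (lam / 2.0) / j
    return f2, lam, 1.0 - cdf


if __name__ == "__main__":
    for n in (40, 80, 120):
        for r2_full in (0.10, 0.20, 0.30, 0.40, 0.50):
            f2, lam, power = power_for_added_predictor(n, 4, 0.075, r2_full)
            print(f"n={n:>3}  f2={f2:.5f}  lambda={lam:8.4f}  power={power:.5f}")
```

The same function can be called with other values of RSQ_change or other numbers of predictors to see how quickly a fixed cases-per-IV rule stops tracking power.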

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Director of Research,

University of Texas Health Science Center at Houston
