Significant regression, but low R

Tom

Hi,

I have a question about interpreting the results of a linear regression:

N = 830
df = 6
y = c + x1 + x2 + x3 + x4 + x5 + x6

corrected (adjusted) R^2 = 0.095

The model and all independent variables (betas) are significant.

How can all the betas be significant while the amount of explained variance is so low? With my limited knowledge, I would have expected a higher R^2.

Thanks for any hints.

Tom


Re: Significant regression, but low R

Maguin, Eugene

Others can give much more precise answers, but suppose each variable contributes equally to the R-square. Then .095/6 is about .016, which corresponds to a beta of about .126. Treat that like a correlation: the SE of a correlation is (roughly) 1/sqrt(N) = 1/sqrt(830) = .035. You and your predictors need to talk!

Gene Maguin
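Gene's back-of-envelope arithmetic can be checked in a few lines of Python (a sketch; the equal-contribution split is his simplifying assumption, not something taken from Tom's actual output):

```python
import math

N = 830      # sample size from Tom's post
R2 = 0.095   # adjusted R^2 from Tom's post
k = 6        # number of predictors

# Assume (as Gene does) that each predictor contributes equally to R^2.
r2_per_var = R2 / k               # about .016 per predictor
beta = math.sqrt(r2_per_var)      # treated as a correlation: about .126
se = 1 / math.sqrt(N)             # rough SE of a correlation: about .035

print(round(r2_per_var, 3), round(beta, 3), round(se, 3))
print(round(beta / se, 1))        # about 3.6, comfortably past the 1.96 cutoff
```

Even under this crude split, each predictor sits well clear of the usual significance threshold, which is exactly Gene's point: at N = 830 the test has no trouble flagging correlations of .126.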

 

 


Re: Significant regression, but low R

Chi Shu

My understanding is that you probably need to examine the endogeneity of the independent variables. You are probably missing some very important independent variables, so the error term has huge variance.

But of course many other problems can cause endogeneity, e.g. autoregression.
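Chi Shu's omitted-variable point is easy to demonstrate with simulated data (a hypothetical sketch in NumPy; the coefficients and seed are made up): when a strong predictor is left out, its contribution lands in the error term, so the included variable stays clearly significant while R^2 stays small.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 830

# Hypothetical data: y is driven by x1 (weakly) and x2 (strongly),
# but the fitted model omits x2.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.3 * x1 + 1.0 * x2 + rng.normal(size=n)

# Least-squares fit of y on x1 alone
X = np.column_stack([np.ones(n), x1])
b, res, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = res[0]
r2 = 1 - rss / np.sum((y - y.mean()) ** 2)

# t statistic for the slope of x1
se_b = np.sqrt(rss / (n - 2)) / np.sqrt(np.sum((x1 - x1.mean()) ** 2))
t = b[1] / se_b
print(round(r2, 3), round(t, 1))  # small R^2, but t well above 2
```

The slope on x1 is recovered near its true value and is highly significant, yet R^2 is only a few percent, because the omitted x2 dominates the residual variance.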

 

 


Re: Significant regression, but low R

Rich Ulrich
In reply to this post by Tom
 Why is R^2 seemingly low while tests are significant? 
 - Because the N is 830, which is large.

"Statistical significance" is a measure that compares to "random",
not to "clinically important" or "meaningful" or, more relevant to your
question, "evident with a tiny N".  That's more or less the meaning of
r, if you want to be casual about it.

On the other hand, a design that collects  N = 830 (instead ofa smaller
number like 50 or 100) is what is necessary when the relations have
a small r.   Presumably, someone thought that effects of the observed
size would be useful and important to measure and test.  

Note that r or R^2 is not, in general, a fine measure of "effect size". Yes, it works when we know what we are expecting, mainly when we expect something large because two things are nearly the same. The reason that epidemiologists often collect Ns of many thousands is that their odds ratios of 2.0 or more for a "big effect" may account for 1% or less of "variance", owing to the rareness of the events being predicted.

--
Rich Ulrich
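Rich's point about N can be made concrete with the 1/sqrt(N) rule of thumb from earlier in the thread (a sketch; the N values other than 830 are just for comparison): the smallest correlation that clears two-sided p < .05 shrinks as N grows, so at N = 830 even r of about .07 (under half a percent of variance) comes out "significant".

```python
import math

# Smallest correlation reaching two-sided p < .05, using SE(r) ~ 1/sqrt(N)
for n in (50, 100, 830, 10000):
    r_crit = 1.96 / math.sqrt(n)
    print(n, round(r_crit, 3), round(r_crit ** 2, 4))
```

Against that benchmark, Tom's R^2 of .095 is roughly twenty times larger than what bare significance requires at his N, so significant tests with modest R^2 are exactly what this design should produce.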



Re: Significant regression, but low R

Ntonghanwah Forcheh
In reply to this post by Chi Shu
This is a standard limitation of using "statistical significance" to draw inference. I usually insist on distinguishing between a statistically significant association/relationship and a useful relation. Your data are an example of a statistically significant but useless relationship.

The reason is the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter. Here b = 0.1: as the model degrees of freedom increase from 10 to a moderate 200, the p-value goes from 0.54128 (no linear association?) to 0.00515 (very strong linear association?), yet the strength of the linear association has not changed from 0.1.

See this example for illustration (b = 0.1, sd(b) = 0.5/sqrt(df)):

  df      t     p-value
  10    0.632   0.54128
  20    0.894   0.38173
  50    1.414   0.16350
 100    2.000   0.04821
 200    2.828   0.00515
 500    4.472   0.00001
1000    6.325   0.00000

A graph of Y versus each X is highly recommended before including each predictor in a model. One should convince oneself that a useful relationship exists between Y and X before using regression to extract the functional form of that relationship. If this were the relationship between starting income and work experience, would I be happy for it to be used to determine my starting salary? (Substitute any two variables that you care about.) If not, then it is not useful to me.

PS: The p-values were generated in Excel with =2*(1-T.DIST(E4,D4,1)), filled down, where column E contains t = 0.1/(0.5/SQRT(D4)) and column D holds the degrees of freedom (D4 = 10 in the first row).

Hope this helps,
Forcheh
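Forcheh's table can be reproduced outside Excel as well; here is a sketch using SciPy's t distribution (assuming SciPy is available), with the same t = 0.1/(0.5/sqrt(df)) construction from his PS:

```python
from scipy import stats

b, s = 0.1, 0.5                      # slope and spread from Forcheh's PS
for df in (10, 20, 50, 100, 200, 500, 1000):
    t = b / (s / df ** 0.5)          # t grows like sqrt(df)
    p = 2 * stats.t.sf(t, df)        # two-sided p-value
    print(df, round(t, 3), round(p, 5))
```

The same slope of 0.1 runs from p of about .54 at df = 10 down past .00001 by df = 500, which is the whole point: the p-value tracks the sample size, not the strength of the relationship.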


--
Professor Ntonghanwah Forcheh
Department of Statistics,
University of Botswana
Private Bag UB00705, Gaborone, Botswana.
Office: +267 355 2696,
 Mobile:  Orange +267 75 26 2963,    Bmobile:  73181378:    Mascom  754 21238
fax: +267 3185099;
Alternative Email: [hidden email]
*@Honesty is a Virtue, Freedom of the Mind is Power.
Motto: Never be afraid to be honest, Never lie to yourself, Trust in the
Truth and you will be forever free.*



Re: Significant regression, but low R

Ntonghanwah Forcheh
"the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter"

should be corrected to

"the linear relationship between the test statistic (t) and the sqrt of the degrees of freedom (sample size minus the number of parameters)."





FW: Significant regression, but low R

Tom
In reply to this post by Tom

Hi Bill,

Yes, the power is very high, and the effect size, Cohen's f² = .112, seems to be "medium".

OK so far: so there is nothing left to conclude except that the independent variables explain the response variable significantly, with a medium effect size, but that there must exist other variables that influence the response variable by a much larger amount.

Is this "conclusion" correct?

Tom

I suspect that it is your relatively large sample size. Have you done a power analysis? I plugged your values into G*Power and found the power to be 1.0 when testing at the .05 level, and likewise at the .01 level.

Bill
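Bill's G*Power result can be cross-checked with the noncentral F distribution (a sketch assuming SciPy is available; note that Cohen's f² computed from the adjusted R² = .095 comes out near .105 rather than Tom's reported .112, presumably because the latter used the unadjusted R²):

```python
from scipy import stats

n, k, r2 = 830, 6, 0.095
f2 = r2 / (1 - r2)        # Cohen's f^2, about .105
lam = f2 * n              # noncentrality parameter, about 87
df1, df2 = k, n - k - 1

for alpha in (0.05, 0.01):
    f_crit = stats.f.ppf(1 - alpha, df1, df2)        # critical F
    power = stats.ncf.sf(f_crit, df1, df2, lam)      # P(reject | effect)
    print(alpha, round(power, 4))                    # essentially 1.0 at both levels
```

With a noncentrality parameter near 87 against a critical F around 2-3, the overall test is virtually certain to reject, which matches the G*Power output: at this N, significance is guaranteed for this effect size and tells us nothing about whether R^2 = .095 is practically useful.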

 

 

 
