Significant regression, but low R

Tom

Hi,

I have a question about interpreting the results of a linear regression:

N = 830
df = 6
y = c + x1 + x2 + x3 + x4 + x5 + x6

corrected (adjusted) R^2 = 0.095

The model and all independent variables (betas) are significant.

How can all the betas be significant while the amount of explained variance is so low? With my limited knowledge, I would have expected a higher R^2.

Thanks for any hints.

Tom


Re: Significant regression, but low R

Maguin, Eugene

Others can give much more precise answers, but suppose each variable contributes equally to the R-square. Then .095/6 is about .016, which corresponds to a beta of about .126. Treat that like a correlation: the SE of a correlation is (roughly) 1/sqrt(N) = 1/sqrt(830) = .035. You and your predictors need to talk!

Gene Maguin
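Gene's back-of-envelope arithmetic can be checked in a few lines of Python (a sketch; the equal-contribution split is his simplifying assumption, not something taken from Tom's actual output):

```python
import math

N = 830      # sample size from Tom's post
R2 = 0.095   # adjusted R^2 from Tom's post
k = 6        # number of predictors

# Assume (as Gene does) that each predictor contributes equally to R^2.
r2_per_var = R2 / k               # about .016 per predictor
beta = math.sqrt(r2_per_var)      # treated as a correlation: about .126
se = 1 / math.sqrt(N)             # rough SE of a correlation: about .035

print(round(r2_per_var, 3), round(beta, 3), round(se, 3))
print(round(beta / se, 1))        # about 3.6, comfortably past the 1.96 cutoff
```

Even under this crude split, each predictor sits well clear of the usual significance threshold, which is exactly Gene's point: at N = 830 the test has no trouble flagging correlations of .126.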

 

 


Re: Significant regression, but low R

Chi Shu

My understanding is that you probably need to examine the endogeneity of the independent variables. You are probably missing some very important independent variables, so the error term has huge variance.

But of course many other problems can cause endogeneity, e.g. autoregression.
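Chi Shu's omitted-variable point is easy to demonstrate with simulated data (a hypothetical sketch in NumPy; the coefficients and seed are made up): when a strong predictor is left out, its contribution lands in the error term, so the included variable stays clearly significant while R^2 stays small.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 830

# Hypothetical data: y is driven by x1 (weakly) and x2 (strongly),
# but the fitted model omits x2.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.3 * x1 + 1.0 * x2 + rng.normal(size=n)

# Least-squares fit of y on x1 alone
X = np.column_stack([np.ones(n), x1])
b, res, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = res[0]
r2 = 1 - rss / np.sum((y - y.mean()) ** 2)

# t statistic for the slope of x1
se_b = np.sqrt(rss / (n - 2)) / np.sqrt(np.sum((x1 - x1.mean()) ** 2))
t = b[1] / se_b
print(round(r2, 3), round(t, 1))  # small R^2, but t well above 2
```

The slope on x1 is recovered near its true value and is highly significant, yet R^2 is only a few percent, because the omitted x2 dominates the residual variance.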

 

 


Re: Significant regression, but low R

Rich Ulrich
In reply to this post by Tom
 Why is R^2 seemingly low while tests are significant? 
 - Because the N is 830, which is large.

"Statistical significance" is a measure that compares to "random",
not to "clinically important" or "meaningful" or, more relevant to your
question, "evident with a tiny N".  That's more or less the meaning of
r, if you want to be casual about it.

On the other hand, a design that collects  N = 830 (instead ofa smaller
number like 50 or 100) is what is necessary when the relations have
a small r.   Presumably, someone thought that effects of the observed
size would be useful and important to measure and test.  

Note that r or R^2 is not, in general, a fine measure of "effect size". Yes, it works when we know what we are expecting, mainly when we expect something large because two things are nearly the same. The reason that epidemiologists often collect Ns of many thousands is that their odds ratios of 2.0 or more for a "big effect" may account for 1% or less of "variance", owing to the rareness of the events being predicted.

--
Rich Ulrich
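Rich's point about N can be made concrete with the 1/sqrt(N) rule of thumb from earlier in the thread (a sketch; the N values other than 830 are just for comparison): the smallest correlation that clears two-sided p < .05 shrinks as N grows, so at N = 830 even r of about .07 (under half a percent of variance) comes out "significant".

```python
import math

# Smallest correlation reaching two-sided p < .05, using SE(r) ~ 1/sqrt(N)
for n in (50, 100, 830, 10000):
    r_crit = 1.96 / math.sqrt(n)
    print(n, round(r_crit, 3), round(r_crit ** 2, 4))
```

Against that benchmark, Tom's R^2 of .095 is roughly twenty times larger than what bare significance requires at his N, so significant tests with modest R^2 are exactly what this design should produce.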



Re: Significant regression, but low R

Ntonghanwah Forcheh
In reply to this post by Chi Shu
This is a standard limitation of using "statistical significance" to draw inference. I usually insist on distinguishing between a statistically significant association/relationship and a useful relation. Your data are an example of a statistically significant but useless relationship.

The reason is the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter. Here b = 0.1: as the model degrees of freedom increase from 10 to a moderate 200, the p-value goes from 0.54128 (no linear association?) to 0.00515 (very strong linear association?), yet the strength of the linear association has not changed from 0.1.

See this example for illustration (b = 0.1, sd(b) = 0.5/sqrt(df)):

  df      t     p-value
  10    0.632   0.54128
  20    0.894   0.38173
  50    1.414   0.16350
 100    2.000   0.04821
 200    2.828   0.00515
 500    4.472   0.00001
1000    6.325   0.00000

A graph of Y versus each X is highly recommended before including each predictor in a model. One should convince oneself that a useful relationship exists between Y and X before using regression to extract the functional form of that relationship. If this were the relationship between starting income and work experience, would I be happy for it to be used to determine my starting salary? (Substitute any two variables that you care about.) If not, then it is not useful to me.

PS: The p-values were generated in Excel with =2*(1-T.DIST(E4,D4,1)), filled down, where column E contains t = 0.1/(0.5/SQRT(D4)) and column D holds the degrees of freedom (D4 = 10 in the first row).

Hope this helps,
Forcheh
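Forcheh's table can be reproduced outside Excel as well; here is a sketch using SciPy's t distribution (assuming SciPy is available), with the same t = 0.1/(0.5/sqrt(df)) construction from his PS:

```python
from scipy import stats

b, s = 0.1, 0.5                      # slope and spread from Forcheh's PS
for df in (10, 20, 50, 100, 200, 500, 1000):
    t = b / (s / df ** 0.5)          # t grows like sqrt(df)
    p = 2 * stats.t.sf(t, df)        # two-sided p-value
    print(df, round(t, 3), round(p, 5))
```

The same slope of 0.1 runs from p of about .54 at df = 10 down past .00001 by df = 500, which is the whole point: the p-value tracks the sample size, not the strength of the relationship.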


--
Professor Ntonghanwah Forcheh
Department of Statistics,
University of Botswana
Private Bag UB00705, Gaborone, Botswana.
Office: +267 355 2696,
 Mobile:  Orange +267 75 26 2963,    Bmobile:  73181378:    Mascom  754 21238
fax: +267 3185099;
Alternative Email: [hidden email]
*@Honesty is a Virtue, Freedom of the Mind is Power.
Motto: Never be afraid to be honest, Never lie to yourself, Trust in the
Truth and you will be forever free.*



Re: Significant regression, but low R

Ntonghanwah Forcheh
"the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter"

should be corrected to

"the linear relationship between the test statistic (t) and the sqrt of the degrees of freedom (sample size minus the number of parameters)."





FW: Significant regression, but low R

Tom
In reply to this post by Tom

Hi Bill,

Yes, the power is very high, and the effect size, Cohen's f² = .112, seems to be "medium".

OK so far: so there is nothing left to conclude except that the independent variables explain the response variable significantly, with a medium effect size, but that there must exist other variables that influence the response variable by a much larger amount.

Is this "conclusion" correct?

Tom

I suspect that it is your relatively large sample size. Have you done a power analysis? I plugged your values into G*Power and found the power to be 1.0 when testing at the .05 level, and likewise at the .01 level.

Bill
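Bill's G*Power result can be cross-checked with the noncentral F distribution (a sketch assuming SciPy is available; note that Cohen's f² computed from the adjusted R² = .095 comes out near .105 rather than Tom's reported .112, presumably because the latter used the unadjusted R²):

```python
from scipy import stats

n, k, r2 = 830, 6, 0.095
f2 = r2 / (1 - r2)        # Cohen's f^2, about .105
lam = f2 * n              # noncentrality parameter, about 87
df1, df2 = k, n - k - 1

for alpha in (0.05, 0.01):
    f_crit = stats.f.ppf(1 - alpha, df1, df2)        # critical F
    power = stats.ncf.sf(f_crit, df1, df2, lam)      # P(reject | effect)
    print(alpha, round(power, 4))                    # essentially 1.0 at both levels
```

With a noncentrality parameter near 87 against a critical F around 2-3, the overall test is virtually certain to reject, which matches the G*Power output: at this N, significance is guaranteed for this effect size and tells us nothing about whether R^2 = .095 is practically useful.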

 

 

 
