Hi, I have a question concerning the interpretation of the result of a linear regression.

N = 830, df = 6
y = c + x1 + x2 + x3 + x4 + x5 + x6
corrected R2 = 0.095

The model and all independent variables (betas) are significant. How come all the betas are significant, but the amount of explained variance is so low? With my limited knowledge I would have expected a higher R2... Thanks for any hints.

Tom
Others can give much more precise answers, but suppose each variable contributes equally to the R-square. Then .095/6 = about .016 per variable, which corresponds to a beta of .126. Treat that like a correlation: the SE of a correlation is (roughly) 1/sqrt(N) = 1/sqrt(830) = .035, so a beta of .126 sits about 3.6 SEs from zero and is easily significant. You and your predictors need to talk!

Gene Maguin
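For anyone who wants to replay that back-of-envelope arithmetic, here is a minimal Python sketch. The equal-contribution assumption is Gene's simplification, not a fact about Tom's data:

```python
# Back-of-envelope check of the numbers above, assuming each of the
# six predictors contributes equally to R-square.
import math

r2, k, n = 0.095, 6, 830

r2_per_predictor = r2 / k            # about .016 per variable
beta = math.sqrt(r2_per_predictor)   # about .126, treated like a correlation
se = 1 / math.sqrt(n)                # rough SE of a correlation, about .035

print(f"per-predictor R2 = {r2_per_predictor:.3f}")
print(f"implied beta     = {beta:.3f}")
print(f"rough SE         = {se:.3f}")
print(f"t ratio          = {beta / se:.2f}")  # about 3.6, clearly significant
```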
My understanding is that you probably need to examine the endogeneity of the independent variables. You are probably missing some very important independent variables, so the error term e has a huge variance. But of course many other problems can cause endogeneity, e.g. autoregression.
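A small simulation illustrates the error-variance half of that remark: if y truly depends on a strong unobserved variable z, regressing y on x alone can still give a significant beta while most of the variance lands in the error term. The variable names and effect sizes below are invented for illustration:

```python
# Sketch: omitted important variable -> huge error variance, low R2,
# yet a "significant" coefficient on the variable we did measure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 830
x = rng.normal(size=n)
z = rng.normal(size=n)                 # important predictor we fail to measure
y = 0.4 * x + 3.0 * z + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.rsquared)                    # small: z's variance sits in the error
print(fit.pvalues[1])                  # yet x's coefficient tests significant
```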
In reply to this post by Tom
Why is R^2 seemingly low while tests are significant?
- Because the N is 830, which is large. "Statistical significance" is a measure that compares to "random", not to "clinically important" or "meaningful" or, more relevant to your question, "evident with a tiny N". That is more or less the meaning of r, if you want to be casual about it. On the other hand, a design that collects N = 830 (instead of a smaller number like 50 or 100) is exactly what is necessary when the relations have a small r. Presumably, someone thought that effects of the observed size would be useful and important to measure and test.

Note that r or R^2 is not, in general, a fine measure of "effect size". Yes, it works when we know what we are expecting, mainly when we expect something large because two things are nearly the same. The reason that epidemiologists often collect Ns of many thousands is that their odds ratios of 2.0 or more for a "big effect" may account for 1% or less of "variance", owing to the rareness of the events being predicted.

-- Rich Ulrich
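Rich's point about N is easy to check numerically: the same modest correlation (r = .126, roughly one predictor's share from Gene's arithmetic above) is nowhere near significant at N = 50 but clearly so at N = 830. A rough sketch:

```python
# Significance of a fixed correlation r at different sample sizes,
# using t = r * sqrt(n - 2) / sqrt(1 - r^2).
from math import sqrt
from scipy.stats import t as tdist

r = 0.126
for n in (50, 100, 830):
    tval = r * sqrt(n - 2) / sqrt(1 - r**2)
    p = 2 * tdist.sf(tval, df=n - 2)
    print(f"N = {n:4d}: t = {tval:.2f}, two-sided p = {p:.4f}")
```

At N = 50 this gives p around .38; at N = 830 it gives p around .0003, with the strength of the relationship unchanged.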
In reply to this post by Chi Shu
This is a standard limitation of using "statistical significance" to draw inferences. I usually insist on distinguishing between a statistically significant association/relationship and a useful one. Your data are an example of a statistically significant but useless relationship. The reason is the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter. Here b = 0.1 with a standard deviation of 0.1. As the model degrees of freedom increase from 10 to a moderate 200, the p-value goes from 0.54128 (no linear association?) to 0.00515 (very strong linear association?), yet the strength of the linear association has not changed from 0.1.
See this example for illustration (b = 0.1, sd(b) = 0.1):

  df        t    p-value
  10    0.632    0.54128
  20    0.894    0.38173
  50    1.414    0.16350
 100    2.000    0.04821
 200    2.828    0.00515
 500    4.472    0.00001
1000    6.325    0.00000
A graph of Y versus each X is highly recommended before including each predictor in a model; a sketch of how to draw such plots follows below. One should be convinced that a useful relationship exists between Y and X before using regression to extract the functional form of that relationship. Ask yourself: if this were the relationship between starting income and work experience (or any two variables that I care about), would I be happy for it to be used to determine my starting salary? If not, then it is not useful to me.
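If the data live in Python rather than SPSS, something like the following draws those plots; the column names are hypothetical, taken from the model formula at the top of the thread:

```python
# Scatter plot of the response against each candidate predictor,
# assuming a pandas DataFrame with columns named y and x1..x6.
import matplotlib.pyplot as plt
import pandas as pd

def plot_y_vs_predictors(df: pd.DataFrame, y: str, xs: list) -> None:
    fig, axes = plt.subplots(2, 3, figsize=(12, 7))
    for ax, x in zip(axes.flat, xs):
        ax.scatter(df[x], df[y], s=8, alpha=0.4)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    fig.tight_layout()
    plt.show()

# plot_y_vs_predictors(df, "y", ["x1", "x2", "x3", "x4", "x5", "x6"])
```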
Hope that this helps.

PS: The first row of the p-values is generated in Excel using the function =2*(1-T.DIST(E4,D4,1)), where column E contains t = 0.1/(0.5/SQRT(D4)) and D4 = 10 is the degrees of freedom.

Forcheh
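The whole table can be reproduced from that same formula in a few lines of Python, which may be easier to follow than the Excel cell references:

```python
# Reproducing the table above:
# t = 0.1 / (0.5 / sqrt(df)), p = 2 * (1 - T.DIST(t, df, TRUE)).
from math import sqrt
from scipy.stats import t as tdist

b, s = 0.1, 0.5
print(f"{'df':>5} {'t':>6} {'p-value':>8}")
for df in (10, 20, 50, 100, 200, 500, 1000):
    tval = b / (s / sqrt(df))
    p = 2 * tdist.sf(tval, df)      # two-sided p-value
    print(f"{df:>5} {tval:6.3f} {p:8.5f}")
```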
Correction: "the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter" should read "the linear relationship between the test statistic (t) and the sqrt of the degrees of freedom (sample size minus number of parameters)".
In reply to this post by Tom
Hi Bill,

Yes, the power is very high, and the effect size, Cohen's f2 = .112, seems to be "medium". OK so far: so there is nothing left to assert except that "the independent variables significantly explain the response variable with a medium effect size", and that there must exist other variables influencing the response variable by a much larger amount. Is this "conclusion" correct?

Tom

I suspect that it is your relatively large sample size... Have you done a power analysis? I plugged your values into G*Power and found the power to be 1.0 when testing at the .05 level, likewise at the .01 level...

Bill
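For readers without G*Power, the same power calculation can be sketched from first principles with scipy. This follows Cohen's convention for the noncentrality of the R2 test (lambda = f2 * N) and should be read as an approximation of what G*Power computes, not its exact algorithm; f2 is taken here from the corrected R2 of .095, which lands close to Tom's .112:

```python
# Power of the overall F test for multiple regression:
# f2 = R2 / (1 - R2), noncentrality lambda = f2 * N,
# power = P(F > Fcrit) under the noncentral F distribution.
from scipy.stats import f as fdist, ncf

n, k, r2, alpha = 830, 6, 0.095, 0.05
f2 = r2 / (1 - r2)                 # about 0.105
df1, df2 = k, n - k - 1
nc = f2 * n                        # noncentrality, about 87

fcrit = fdist.ppf(1 - alpha, df1, df2)
power = ncf.sf(fcrit, df1, df2, nc)
print(f"f2 = {f2:.3f}, power = {power:.6f}")  # effectively 1.0
```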