Hi, I have a question concerning the interpretation of the result of a linear regression.

N = 830, df = 6
y = c + x1 + x2 + x3 + x4 + x5 + x6
corrected R2 = 0.095

The model and all independent variables (betas) are significant. How come all the betas are significant, but the amount of explained variance is so low? With my limited knowledge I would have expected a higher R2... Thanks for any hints.

Tom
Others can give much more precise answers, but suppose each variable contributes equally to the R-square. Then .095/6 = about .016 per variable, which corresponds to a beta of .126. Treat that like a correlation: the SE of a correlation is (roughly) 1/sqrt(N) = 1/sqrt(830) = .035, so a beta of .126 sits about 3.6 SEs from zero and is easily significant. You and your predictors need to talk!

Gene Maguin
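For anyone who wants to replay that back-of-envelope arithmetic, here is a minimal Python sketch. The equal-contribution assumption is Gene's simplification, not a fact about Tom's data:

```python
# Back-of-envelope check of the numbers above, assuming each of the
# six predictors contributes equally to R-square.
import math

r2, k, n = 0.095, 6, 830

r2_per_predictor = r2 / k            # about .016 per variable
beta = math.sqrt(r2_per_predictor)   # about .126, treated like a correlation
se = 1 / math.sqrt(n)                # rough SE of a correlation, about .035

print(f"per-predictor R2 = {r2_per_predictor:.3f}")
print(f"implied beta     = {beta:.3f}")
print(f"rough SE         = {se:.3f}")
print(f"t ratio          = {beta / se:.2f}")  # about 3.6, clearly significant
```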
My understanding is that you probably need to examine the endogeneity of the independent variables. You are probably missing some very important independent variables, so the error term e has a huge variance. But of course many other problems can cause endogeneity, e.g. autoregression.
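A small simulation illustrates the error-variance half of that remark: if y truly depends on a strong unobserved variable z, regressing y on x alone can still give a significant beta while most of the variance lands in the error term. The variable names and effect sizes below are invented for illustration:

```python
# Sketch: omitted important variable -> huge error variance, low R2,
# yet a "significant" coefficient on the variable we did measure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 830
x = rng.normal(size=n)
z = rng.normal(size=n)                 # important predictor we fail to measure
y = 0.4 * x + 3.0 * z + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.rsquared)                    # small: z's variance sits in the error
print(fit.pvalues[1])                  # yet x's coefficient tests significant
```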
In reply to this post by Tom
Why is R^2 seemingly low while tests are significant?
- Because the N is 830, which is large. "Statistical significance" is a measure that compares to "random", not to "clinically important" or "meaningful" or, more relevant to your question, "evident with a tiny N". That is more or less the meaning of r, if you want to be casual about it. On the other hand, a design that collects N = 830 (instead of a smaller number like 50 or 100) is exactly what is necessary when the relations have a small r. Presumably, someone thought that effects of the observed size would be useful and important to measure and test.

Note that r or R^2 is not, in general, a fine measure of "effect size". Yes, it works when we know what we are expecting, mainly when we expect something large because two things are nearly the same. The reason that epidemiologists often collect Ns of many thousands is that their odds ratios of 2.0 or more for a "big effect" may account for 1% or less of "variance", owing to the rareness of the events being predicted.

-- Rich Ulrich
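Rich's point about N is easy to check numerically: the same modest correlation (r = .126, roughly one predictor's share from Gene's arithmetic above) is nowhere near significant at N = 50 but clearly so at N = 830. A rough sketch:

```python
# Significance of a fixed correlation r at different sample sizes,
# using t = r * sqrt(n - 2) / sqrt(1 - r^2).
from math import sqrt
from scipy.stats import t as tdist

r = 0.126
for n in (50, 100, 830):
    tval = r * sqrt(n - 2) / sqrt(1 - r**2)
    p = 2 * tdist.sf(tval, df=n - 2)
    print(f"N = {n:4d}: t = {tval:.2f}, two-sided p = {p:.4f}")
```

At N = 50 this gives p around .38; at N = 830 it gives p around .0003, with the strength of the relationship unchanged.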
In reply to this post by Chi Shu
This is a standard limitation of using "statistical significance" to draw inferences. I usually insist on distinguishing between a statistically significant association/relationship and a useful one. Your data are an example of a statistically significant but useless relationship. The reason is the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter. Here b = 0.1 with a standard deviation of 0.1. As the model degrees of freedom increase from 10 to a moderate 200, the p-value goes from 0.54128 (no linear association?) to 0.00515 (very strong linear association?), yet the strength of the linear association has not changed from 0.1.
See this example for illustration (b = 0.1, sd(b) = 0.1):

  df        t    p-value
  10    0.632    0.54128
  20    0.894    0.38173
  50    1.414    0.16350
 100    2.000    0.04821
 200    2.828    0.00515
 500    4.472    0.00001
1000    6.325    0.00000
A graph of Y versus each X is highly recommended before including each predictor in a model; a sketch of how to draw such plots follows below. One should be convinced that a useful relationship exists between Y and X before using regression to extract the functional form of that relationship. Ask yourself: if this were the relationship between starting income and work experience (or any two variables that I care about), would I be happy for it to be used to determine my starting salary? If not, then it is not useful to me.
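If the data live in Python rather than SPSS, something like the following draws those plots; the column names are hypothetical, taken from the model formula at the top of the thread:

```python
# Scatter plot of the response against each candidate predictor,
# assuming a pandas DataFrame with columns named y and x1..x6.
import matplotlib.pyplot as plt
import pandas as pd

def plot_y_vs_predictors(df: pd.DataFrame, y: str, xs: list) -> None:
    fig, axes = plt.subplots(2, 3, figsize=(12, 7))
    for ax, x in zip(axes.flat, xs):
        ax.scatter(df[x], df[y], s=8, alpha=0.4)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    fig.tight_layout()
    plt.show()

# plot_y_vs_predictors(df, "y", ["x1", "x2", "x3", "x4", "x5", "x6"])
```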
Hope that this helps.

PS: The first row of the p-values is generated in Excel using the function =2*(1-T.DIST(E4,D4,1)), where column E contains t = 0.1/(0.5/SQRT(D4)) and D4 = 10 is the degrees of freedom.

Forcheh
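The whole table can be reproduced from that same formula in a few lines of Python, which may be easier to follow than the Excel cell references:

```python
# Reproducing the table above:
# t = 0.1 / (0.5 / sqrt(df)), p = 2 * (1 - T.DIST(t, df, TRUE)).
from math import sqrt
from scipy.stats import t as tdist

b, s = 0.1, 0.5
print(f"{'df':>5} {'t':>6} {'p-value':>8}")
for df in (10, 20, 50, 100, 200, 500, 1000):
    tval = b / (s / sqrt(df))
    p = 2 * tdist.sf(tval, df)      # two-sided p-value
    print(f"{df:>5} {tval:6.3f} {p:8.5f}")
```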
Correction: "the linear relationship between the test statistic (t) and the sqrt of the standard deviation of the estimated parameter" should read "the linear relationship between the test statistic (t) and the sqrt of the degrees of freedom (sample size minus number of parameters)".
In reply to this post by Tom
Hi Bill,

Yes, the power is very high, and the effect size, Cohen's f2 = .112, seems to be "medium". OK so far: so there is nothing left to assert except that "the independent variables significantly explain the response variable with a medium effect size", and that there must exist other variables influencing the response variable by a much larger amount. Is this "conclusion" correct?

Tom

I suspect that it is your relatively large sample size... Have you done a power analysis? I plugged your values into G*Power and found the power to be 1.0 when testing at the .05 level, likewise at the .01 level...

Bill
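For readers without G*Power, the same power calculation can be sketched from first principles with scipy. This follows Cohen's convention for the noncentrality of the R2 test (lambda = f2 * N) and should be read as an approximation of what G*Power computes, not its exact algorithm; f2 is taken here from the corrected R2 of .095, which lands close to Tom's .112:

```python
# Power of the overall F test for multiple regression:
# f2 = R2 / (1 - R2), noncentrality lambda = f2 * N,
# power = P(F > Fcrit) under the noncentral F distribution.
from scipy.stats import f as fdist, ncf

n, k, r2, alpha = 830, 6, 0.095, 0.05
f2 = r2 / (1 - r2)                 # about 0.105
df1, df2 = k, n - k - 1
nc = f2 * n                        # noncentrality, about 87

fcrit = fdist.ppf(1 - alpha, df1, df2)
power = ncf.sf(fcrit, df1, df2, nc)
print(f"f2 = {f2:.3f}, power = {power:.6f}")  # effectively 1.0
```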