Hi Listers,
We want to predict Y from a few variables, say x1, x2, x3. I am curious to know if there is any rule of thumb on R square for prediction. To my understanding, the prediction is not good if the R square is too small, say 0.1, even if the regression equation and coefficients are significant. Am I correct?
I tried to look up information on the internet but didn't get much. Can anyone share your thoughts or direct me to some references? Thanks very much.
Haijie
You are right: even if statistically significant,
a regression may be a poor predictor if R2 is low. The R2 determination
coefficient (the square of the linear correlation coefficient between observed
and predicted values) indicates the proportion of the observed variance in the
dependent variable that is explained by a linear combination of the predictors.
It may be low. However, if your sample is large enough, it may be significantly
different from zero. When R2 is low, it means the residuals
around the predicted value are on average quite large. You can perhaps predict
the population average Y (for a given combination of predictors) but individual
cases may be quite far from the predicted value. Thus, if you want to predict
the population or average value it may still be right, even if for individual
cases it would produce wild errors. As for significance, you know it depends
not only on the strength of the relationship (i.e. the amount of variance
explained or not) but on the sample size. Almost any regression equation can be
significant if you have 10 million cases. Two final observations:
1. Statistics is about large numbers, not about individuals. Nobody told you that a regression equation would predict an individual value: the individual value is Y(i) = a + sum(b(j)X(ji)) + e(i), and the error component e(i) for an individual is indeterminate. With regression (and ASSUMING that the relationship is linear) you only guarantee that the SUM of squared errors is at its minimum value (that is the least squares principle), but you do not ensure that the sum is small, and even less that the error is small for a particular individual. In general, you'd predict far better for a class of individuals than for an individual (that is, you predict that for a class of individuals with such and such values of the predictors, the predicted, or average, value is Y*). This is a general principle with prediction: it is ALWAYS about classes of events, not individual events.

2. What is then the use of an equation with significant coefficients but a low R2? The use is not so much prediction but explanation. If you have a sample large enough, you may prove that, say, 5% of the variance in Y is explained by these predictors, even if you cannot predict the outcome very well because of so many other factors causing large error in the prediction. This is often encountered not so much with R2 but with differences in R2: in medicine, for instance, you may prove that drinking one more daily cup of coffee increases some predicted outcome, say hypertension, by some small amount, even if you cannot predict blood pressure very well based only on the use of coffee, because so many other factors influence that outcome.

Hector
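A quick simulation sketch of this point, assuming Python with numpy and scipy (the numbers are purely illustrative, not from the thread): with a large sample, a predictor that explains well under 10% of the variance still comes out overwhelmingly "significant", yet individual predictions are barely better than guessing the mean.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 10_000                          # a "large enough" sample
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)    # true R2 is only about 0.3**2 / (0.3**2 + 1), i.e. ~0.08

res = stats.linregress(x, y)
print(f"R2 = {res.rvalue**2:.3f}, p = {res.pvalue:.2e}")   # tiny p-value, small R2

# The conditional mean is estimated well, but individual cases are not:
resid_sd = np.std(y - (res.intercept + res.slope * x), ddof=2)
print(f"residual SD = {resid_sd:.2f} vs SD of y = {np.std(y, ddof=1):.2f}")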
Hector's second observation above touched on something that has long been a pet peeve of mine. Some years ago in psychology, there was a move to encourage (or even require) reporting of effect sizes. In some contexts, I think that makes sense. But in others, it doesn't make very much sense at all IMO. For example, I used to work for an attention researcher. Like most attention researchers, we measured response time (RT) in two or more experimental conditions, and compared mean RT in one condition to mean RT in another condition to test predictions from various theories or models. In this case, R-squared type measures of effect size would almost certainly be ridiculously low for the very reason Hector alluded to: i.e., there are a whole host of things that influence RT on a particular trial, the experimental conditions being only one of them.

Granted, such measures of effect size would be appropriate IF one's goal was to explain as much of the variability in RT as possible. But for those attention researchers, explaining all of the variation in RT was NEVER the goal. They were not interested in RT per se, but were simply using it as a convenient tool to make inferences about attentional processes, and to test predictions from competing models. In any situation where the measurement is very indirect (which is a common occurrence in psychology), it seems to me that R-squared type measures of effect size are not really all that useful, because they are not in concert with the real aims of the research.
So in response to the OP, how large r-squared has to be depends an awful lot on the context. Bruce
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
According to your previous explanation, you
were trying to predict specific cases, no matter whether the “cases”
are individual human beings or individual groups. What is usually done about R2
(proportion of variance explained) to know whether it is significant is to calculate
the F ratio. Usually this ratio is produced as part of the regression procedure.
F is the ratio of explained to unexplained variance, where both variances are calculated by dividing the respective sums of squares by their corresponding degrees of freedom.
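That ratio can also be written directly in terms of R2, the number of predictors k, and the sample size n, since the explained sum of squares is R2 times the total and the residual sum of squares is (1 - R2) times the total. A small sketch in Python (my own illustration, not SPSS output; the values are made up):

from scipy import stats

def overall_f_test(r2, n, k):
    """F test of H0: population R2 = 0, with k predictors and n cases."""
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    p = stats.f.sf(f, k, n - k - 1)    # upper-tail probability
    return f, p

# Illustrative values: three predictors, 300 cases, R2 = 0.10
f, p = overall_f_test(r2=0.10, n=300, k=3)
print(f"F(3, 296) = {f:.2f}, p = {p:.4g}")   # clearly significant despite the low R2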
This tells you only one thing: it tells you
whether you can or can not reject the null hypothesis that the “real”
R2 (in the population) is zero, implying that your observed R2 is just a sampling
fluke. It does not tell you whether the prediction would be accurate, or whether
the linear function is the right kind of relationship between the variables, or
whether the actual value would be close to the predicted value.

Hector

From: Haijie Ding
Hi Hector and Bruce, Thanks a lot for your kind explanation. Actually, what we want to predict is group phenomena (like culture) and our sample is around 300. So, what's the threshold to say the R square is ok for prediction? Is 0.2 enough? If not, how about 0.3?
Bests, Haijie
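As a rough illustration of why a significant F says nothing about how close individual cases will be to the predicted value (a back-of-envelope sketch, not from any regression output): the in-sample residual SD is roughly SD(Y) * sqrt(1 - R2), so even R2 = 0.3 leaves individual errors about 84% as large as simply predicting the mean of Y for everyone.

import math

for r2 in (0.1, 0.2, 0.3, 0.5):
    shrink = math.sqrt(1.0 - r2)
    print(f"R2 = {r2:.1f}: typical individual error is still ~{shrink:.0%} "
          f"of the error from just predicting the mean of Y")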
Not necessarily. It depends on what you are testing: the contribution of a predictor, or the entire equation.
Consider a test that tries to accept or reject the null hypothesis that the additional variance explained by one specific predictor is zero. That additional variance would be zero if the respective regression coefficient is statistically indistinguishable from zero. The test may be done through the equivalent condition that R2 does not change (or shows a non-significant difference) when the equation is estimated with and without that specific predictor. In this case what is tested is the contribution of one predictor to explaining the variability of the DV, expressed as the increase in R2 caused by including that predictor in the equation; regarding R2, what is tested is the difference in R2 between the two situations (with and without the predictor).

The other situation consists of testing the significance of R2 itself (not the significance of a difference in R2). This requires testing the null hypothesis that ALL the predictors together explain nothing, so that the entire R2 is zero.

The two situations are different. It may well be that R2 is significantly higher than zero, and yet one or more specific predictors contribute nothing. The F test may be used for all these tests.

Hector

From: Eins Bernardo
Books on regression analysis explain that the F ratio is used to test the null hypothesis that all the regression coefficients (in the population) are equal to zero. What is the difference between this hypothesis and your hypothesis that "R2 is zero"? Are they telling the same story?
Eins
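Both tests can be carried out with the same F machinery. A small sketch with made-up numbers and helper functions of my own (not from any package), covering the overall test of R2 and the test of the change in R2 when one predictor is added:

from scipy import stats

def overall_f(r2_full, n, k):
    """Test H0: all k predictors together explain nothing (population R2 = 0)."""
    f = (r2_full / k) / ((1 - r2_full) / (n - k - 1))
    return f, stats.f.sf(f, k, n - k - 1)

def r2_change_f(r2_reduced, r2_full, n, k_full, m=1):
    """Test H0: the m predictors added to the reduced model contribute nothing."""
    f = ((r2_full - r2_reduced) / m) / ((1 - r2_full) / (n - k_full - 1))
    return f, stats.f.sf(f, m, n - k_full - 1)

# Illustrative numbers: x1 and x2 alone give R2 = 0.19; adding x3 raises it to 0.20.
n = 300
print(overall_f(r2_full=0.20, n=n, k=3))                          # whole equation: significant
print(r2_change_f(r2_reduced=0.19, r2_full=0.20, n=n, k_full=3))  # x3's own contribution: p > .05 here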
