R Square from Regression for Prediction

R Square from Regression for Prediction

Haijie Ding
Hi Listers,
We want to predict Y from a few variables, say x1, x2, x3. I am curious to know whether there is any rule of thumb on R square for prediction. To my understanding, the prediction is not good if the R square is too small, say 0.1, even if the regression equation and coefficients are significant. Am I correct?
I tried to look up information on the internet but didn't find much. Can anyone share your thoughts or direct me to some references?

Thanks very much.

Haijie

Re: R Square from Regression for Prediction

Hector Maletta

You are right: even if statistically significant, a regression may be a poor predictor if R2 is low. The R2 coefficient of determination (the square of the linear correlation coefficient between observed and predicted values) indicates the proportion of the observed variance in the dependent variable that is explained by a linear combination of the predictors. It may be low and yet, if your sample is large enough, still be significantly different from zero.

When R2 is low, the residuals around the predicted values are on average quite large. You can perhaps predict the population average Y (for a given combination of predictors), but individual cases may be quite far from the predicted value. Thus, if you want to predict the population or average value, the prediction may still be adequate, even though for individual cases it would produce wild errors.

As for significance, you know it depends not only on the strength of the relationship (i.e. the amount of variance explained) but also on the sample size. Almost any regression equation can be significant if you have 10 million cases.
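To see this concretely, here is a minimal simulation sketch (in Python with numpy and statsmodels rather than SPSS syntax; all numbers are invented for illustration). It builds a real but very weak relationship, and the overall F test comes out highly significant even though R2 is near 0.01:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100_000                        # a large sample

x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)   # real but very weak effect: R2 near 0.01

fit = sm.OLS(y, sm.add_constant(x)).fit()

print(f"R2 = {fit.rsquared:.4f}")          # roughly 0.01: almost nothing explained
print(f"F p-value = {fit.f_pvalue:.1e}")   # yet far below any conventional alpha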

Two final observations:

  1. Statistics is about large numbers, not about individuals. Nobody told you that a regression equation would predict an individual value: the individual value is Y(i) = a + sum(b(j)X(ji)) + e(i), and the error component e(i) for an individual is indeterminate. With regression (and ASSUMING that the relationship is linear) you only guarantee that the SUM of squared errors is at its minimum value (that is the least squares principle); you do not ensure that the sum is small, and even less that the error is small for a particular individual. In general, you'd predict far better for a class of individuals than for one individual (that is, you predict that for a class of individuals with such and such values of the predictors, the predicted (or average) value is Y*). This is a general principle of prediction: it is ALWAYS about classes of events, not individual events (see the sketch below).
  2. What, then, is the use of an equation with significant coefficients but a low R2? The use is not so much prediction as explanation. If you have a large enough sample, you may show that, say, 5% of the variance in Y is explained by these predictors, even if you cannot predict the outcome very well because so many other factors cause large errors in the prediction. This is often encountered not so much with R2 itself as with differences in R2: in medicine, for instance, you may show that drinking one more daily cup of coffee increases some predicted outcome, say hypertension, by some small amount, even if you cannot predict blood pressure very well based only on coffee consumption, because so many other factors influence that outcome.
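A short sketch of point 1 (again Python/statsmodels with invented data, only to illustrate the idea): for the same fitted equation, the confidence interval for the class average is narrow, while the prediction interval for a single individual stays wide.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)           # weak but real relationship

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Predict at x = 1: the exog row is [intercept, x].
pred = fit.get_prediction([[1.0, 1.0]]).summary_frame(alpha=0.05)

# Narrow interval for the class (the conditional mean)...
print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])
# ...wide interval for a single individual drawn from that class.
print(pred[["obs_ci_lower", "obs_ci_upper"]])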

Hector

Re: R Square from Regression for Prediction

Bruce Weaver
Hector's second observation above touched on something that has long been a pet peeve of mine. Some years ago in psychology, there was a move to encourage (or even require) reporting of effect sizes. In some contexts, I think that makes sense. But in others, it doesn't make much sense at all, IMO. For example, I used to work for an attention researcher. Like most attention researchers, we measured response time (RT) in two or more experimental conditions and compared mean RT in one condition to mean RT in another to test predictions from various theories or models. In this case, R-squared-type measures of effect size would almost certainly be ridiculously low, for the very reason Hector alluded to: there are a whole host of things that influence RT on a particular trial, the experimental conditions being only one of them. Granted, such measures of effect size would be appropriate IF one's goal were to explain as much of the variability in RT as possible. But for those attention researchers, explaining all of the variation in RT was NEVER the goal. They were not interested in RT per se; they were simply using it as a convenient tool to make inferences about attentional processes and to test predictions from competing models. In any situation where the measurement is very indirect (a common occurrence in psychology), it seems to me that R-squared-type measures of effect size are not really all that useful, because they are not in concert with the real aims of the research.

So in response to the OP, how large r-squared has to be depends an awful lot on the context.

Bruce

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: R Square from Regression for Prediction

Hector Maletta

According to your previous explanation, you were trying to predict specific cases, no matter whether the “cases” are individual human beings or individual groups. The usual way to know whether R2 (the proportion of variance explained) is significant is to calculate the F ratio. This ratio is ordinarily produced as part of the regression procedure. F is the ratio of explained to unexplained variance, where both variances are calculated by dividing the respective sums of squares by their corresponding degrees of freedom.

This tells you only one thing: whether you can or cannot reject the null hypothesis that the “real” R2 (in the population) is zero, i.e. that your observed R2 is just a sampling fluke. It does not tell you whether the prediction would be accurate, or whether a linear function is the right kind of relationship between the variables, or whether an actual value would be close to the predicted value.
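In terms of R2, that ratio can be written F = (R2/k) / ((1 - R2)/(n - k - 1)), with k predictors and n cases. A small sketch of the arithmetic (Python/scipy; R2 = 0.1 and n = 300 are taken from this thread, the rest is illustrative):

from scipy import stats

# Overall F in terms of R2:  F = (R2/k) / ((1 - R2)/(n - k - 1))
r2, k, n = 0.10, 3, 300          # three predictors, n about 300, R2 = 0.1

f = (r2 / k) / ((1 - r2) / (n - k - 1))
p = stats.f.sf(f, k, n - k - 1)  # upper-tail p-value of the F distribution

print(f"F({k}, {n - k - 1}) = {f:.2f}, p = {p:.1e}")   # about 11.0, p well below .001

Even R2 = 0.1 is clearly significant at that sample size, which is exactly why significance alone says nothing about predictive accuracy.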

Hector

 


From: Haijie Ding [mailto:[hidden email]]
Sent: 05 August 2009 20:46
To: Bruce Weaver; Hector Maletta
Cc: [hidden email]
Subject: Re: R Square from Regression for Prediction

 

Hi Hector and Bruce,

Thanks a lot for your kind explanation.

Actually, what we want to predict is a group phenomenon (like culture) and our sample is around 300. So, what's the threshold to say the R square is OK for prediction? Is 0.2 enough? If not, how about 0.3?

 

Best,

Haijie

 


Re: R Square from Regression for Prediction

E. Bernardo
Books on regression analysis explain that the F ratio is used to test the null hypothesis that all the regression coefficients (in the population) are equal to zero. What is the difference between that hypothesis and your hypothesis, Hector, that "R2 is zero"? Are they telling the same story?
 
Eins



Re: R Square from Regression for Prediction

Hector Maletta
Not necessarily. It depends on what you are testing: the contribution of one predictor, or the entire equation.
Consider a test that tries to accept or reject the null hypothesis that the additional variance explained by one specific predictor is zero. That additional variance would be zero if the respective regression coefficient is (statistically indistinguishable from) zero. The same hypothesis may be tested through the equivalent condition that R2 does not change (or shows a nonsignificant difference) when the equation is estimated with and without that predictor.
In these cases what is tested is the contribution of one predictor to explaining the variability of the DV, expressed as the increase in R2 caused by including that predictor in the equation. Regarding R2, what is tested is the difference in R2 under two situations (with and without a specific predictor).
The other situation consists of testing the significance of R2 itself (not the significance of a difference in R2). This requires testing the null hypothesis that ALL the predictors together explain nothing, so that the entire R2 is zero.
The two situations are different. It may well be that R2 is significantly higher than zero while one or more specific predictors contribute nothing. The F test may be used for all of these tests; see the sketch below.
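A minimal sketch of both situations (Python/statsmodels with simulated data in which one of three predictors truly contributes nothing; an illustration, not SPSS output):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)   # x3 truly contributes nothing

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, x3]))).fit()
reduced = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# (a) Is the entire R2 zero?  Overall F test of the full equation.
print(f"overall F p-value = {full.f_pvalue:.1e}")              # significant

# (b) Does dropping x3 change R2?  Partial (nested-model) F test.
f, p, df_diff = full.compare_f_test(reduced)
print(f"R2 change = {full.rsquared - reduced.rsquared:.4f}, "
      f"partial F = {f:.2f}, p = {p:.3f}")                     # nonsignificant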
Hector
