SPSSX Discussion

Standard error of predictor and R-squared


nina

Standard error of predictor and R-squared

Dear all,

While I fear that my question is not specifically related to SPSS, I hope you can still help me with the following problem:
In a comment on a recent analysis based on OLS regression, a reviewer mentions that "a smaller standard error typically results in a higher amount of explained variance (R-squared)." Is that correct? Isn't it the regression weight itself (the slope coefficient), rather than its standard error, that is used for calculating R-squared?

Thanks for your comments!
Nina

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Anthony Babinec

Re: Standard error of predictor and R-squared

Nina,
The context of the reviewer comment is unclear. However, I
have a guess. Take a look at the Model Summary table in the
regression output. You should find the R, R Square, Adjusted R Square,
and Std. Error of the Estimate. Could the comment be about this
last item? In a given setting, a higher R Square will lead to a lower
Std. Error of the Estimate.
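A minimal Python sketch of Tony's point, assuming that "a given setting" means the same total sum of squares and the same residual degrees of freedom. Under that assumption the Std. Error of the Estimate is sqrt(SS_total * (1 - R^2) / df_residual), so it can only fall as R Square rises. The SS_total = 500 and df = 98 values are hypothetical.

```python
import math

def std_error_of_estimate(ss_total, r_squared, df_resid):
    """Std. Error of the Estimate = sqrt(SS_residual / df_residual),
    where SS_residual = SS_total * (1 - R^2)."""
    ss_resid = ss_total * (1.0 - r_squared)
    return math.sqrt(ss_resid / df_resid)

# Hypothetical setting: SS_total = 500, 98 residual df (N = 100, one predictor).
for rsq in (0.2, 0.4, 0.6, 0.8):
    see = std_error_of_estimate(500, rsq, 98)
    print(f"R Square = {rsq:.1f} -> Std. Error of the Estimate = {see:.4f}")
```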

Tony Babinec
[hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Nina Lasek
Sent: Thursday, March 12, 2015 7:52 AM
To: [hidden email]
Subject: Standard error of predictor and R-squared


Bruce Weaver

Re: Standard error of predictor and R-squared


It's not entirely clear to me which SE the reviewer is talking about. But we could just plug in the root mean square error (RMSE) to illustrate, because the SEs of the coefficients are related to it. Whether Rsq changes or not depends on what is driving the change in RMSE: a redistribution of the total SS, a change in the sample size, or some combination of the two. Here is a simple example.

DATA LIST list / Example(F1) SSreg dfreg SSres dfres (4F5.0).
BEGIN DATA
1 200 1 300 98
2 400 1 100 98
3 200 1 300 998
END DATA.

COMPUTE RMSE = SQRT(SSres/dfres).
COMPUTE Rsq = SSreg / SUM(SSreg, SSres).
FORMATS RMSE Rsq (F8.4).
LIST.

OUTPUT:
Example SSreg dfreg SSres dfres RMSE Rsq

1 200 1 300 98 1.7496 .4000
2 400 1 100 98 1.0102 .8000
3 200 1 300 998 .5483 .4000

In all 3 examples, SS_Total = 500. Examples 2 and 3 both have a lower RMSE than example 1. In example 2, RMSE is lower because SS_residual dropped from 300 to 100, and SS_regression increased from 200 to 400. Because Rsq = SS_reg / SS_Total, Rsq increased (from .40 to .80).

In example 3, on the other hand, all of the SS values remained the same, but N was increased from 100 to 1000 (dfres from 98 to 998). Therefore, RMSE is a lot lower, but Rsq is unchanged (versus example 1).

The reviewer's comment indicates that they are thinking of the example 1 vs example 2 situation as more 'typical'.
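For readers without SPSS at hand, the same arithmetic can be reproduced in a minimal Python sketch. The three rows are copied from the DATA LIST above; recovering N as dfreg + dfres + 1 assumes the model includes an intercept.

```python
import math

# Rows from the SPSS example: (Example, SSreg, dfreg, SSres, dfres)
rows = [
    (1, 200, 1, 300, 98),
    (2, 400, 1, 100, 98),
    (3, 200, 1, 300, 998),
]

results = {}
for example, ss_reg, df_reg, ss_res, df_res in rows:
    rmse = math.sqrt(ss_res / df_res)   # same as COMPUTE RMSE
    rsq = ss_reg / (ss_reg + ss_res)    # SS_reg / SS_Total, same as COMPUTE Rsq
    n = df_reg + df_res + 1             # sample size, assuming an intercept
    results[example] = (rmse, rsq)
    print(f"Example {example}: N = {n:4d}  RMSE = {rmse:.4f}  Rsq = {rsq:.4f}")
```

Example 3 shows RMSE shrinking with N while Rsq stays put, which is why a smaller SE alone says nothing definite about R-squared.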

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Mike

Re: Standard error of predictor and R-squared

I could be wrong, but let's assume that the reviewer is talking
about the standard error of a regression coefficient. The
mistake the reviewer makes is in implying that the standard
error of a coefficient directly affects the value of the squared
multiple correlation (i.e., R^2) for the regression equation.
Large standard errors for the coefficients are a problem
when they are caused by a high level of collinearity among
the predictors. Quoting from the Wikipedia entry on
multicollinearity:

|One of the features of multicollinearity is that the standard
|errors of the affected coefficients tend to be large. In that
|case, the test of the hypothesis that the coefficient is equal
|to zero may lead to a failure to reject a false null hypothesis
|of no effect of the explanator, a type II error.
|
|A principal danger of such data redundancy is that of
|overfitting in regression analysis models. The best regression
|models are those in which the predictor variables each
|correlate highly with the dependent (outcome) variable
|but correlate at most only minimally with each other. Such
|a model is often called "low noise" and will be statistically
|robust (that is, it will predict reliably across numerous samples
|of variable sets drawn from the same statistical population).
|
|So long as the underlying specification is correct, multicollinearity
|does not actually bias results; it just produces large standard
|errors in the related independent variables. More importantly,
|the usual use of regression is to take coefficients from the model
|and then apply them to other data. If the pattern of multicollinearity
|in the new data differs from that in the data that was fitted, such
|extrapolation may introduce large errors in the predictions.[6]
http://en.wikipedia.org/wiki/Multicollinearity

In the general multiple regression case, the standard error of
a coefficient [SE(bi)] is given by the following equation:

SE(bi) = sqrt[MSresidual/{sum of squares for Xi * (1 - Ri^2)}]
where Ri^2 is the squared multiple correlation of the predictor Xi with
the other Xs in the equation (page 126 in Edwards, 1984,
Introduction to Linear Regression and Correlation).
As Ri^2 approaches 1.00, the denominator gets smaller
and the standard error gets larger. This can lead to odd
results, such as a significant R^2 for the regression even though
none of the coefficients is significant. See the following
article by Cramer for related problems:

Cramer, E. M. (1972). Significance tests and tests of models
in multiple regression. The American Statistician, 26(4), 26-30.
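The SE(bi) formula has two separate ingredients: MSresidual, which is tied to the model's fit, and 1 - Ri^2, which reflects only the overlap of Xi with the other predictors. A minimal Python sketch with made-up inputs holds MSresidual and the sum of squares for Xi fixed, so the fit of the model as a whole does not change, and varies only Ri^2; the SE grows by a factor of sqrt(VIF), where VIF = 1 / (1 - Ri^2).

```python
import math

def se_coefficient(ms_residual, ss_xi, r2_i):
    """SE(b_i) = sqrt(MS_residual / (SS_Xi * (1 - R_i^2))), where R_i^2 is the
    squared multiple correlation of X_i with the other predictors."""
    return math.sqrt(ms_residual / (ss_xi * (1.0 - r2_i)))

# Hypothetical inputs: MS_residual and SS_Xi are held fixed, so only the
# collinearity of X_i with the other predictors changes across the loop.
ms_res, ss_xi = 2.0, 50.0
for r2_i in (0.0, 0.5, 0.9, 0.99):
    vif = 1.0 / (1.0 - r2_i)  # Variance Inflation Factor
    se = se_coefficient(ms_res, ss_xi, r2_i)
    print(f"R_i^2 = {r2_i:.2f}  VIF = {vif:6.1f}  SE(b_i) = {se:.4f}")
```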

Maybe the reviewer was confused because large standard
errors are bad, but not in terms of R^2. Reviewers, being reviewers,
sometimes feel the need to say something even if they are
confused about what they are saying. ;-)

-Mike Palij
New York University
[hidden email]

----- Original Message -----
From: "Bruce Weaver" <[hidden email]>
To: <[hidden email]>
Sent: Thursday, March 12, 2015 2:05 PM
Subject: Re: Standard error of predictor and R-squared


Rich Ulrich

Re: Standard error of predictor and R-squared

Mike Palij gives a full answer, with plenty of detail. I can't resist adding my
own shorter version of the same explanation, with a side mention of "suppressors"
in case that is at the root of your reviewer's problem.

The reviewer's comment is improved if it says, "a smaller standard error typically
results *from* [not *in*] a higher amount of explained variance...."

It is helpful to think of the 'baseline value' of the s.e. of the partial regression coefficient
as a rescaling of the error of the univariate correlation between predictor and outcome.
Then consider what makes this s.e. smaller or larger: smaller, when other variables
contribute to the overall R-squared; and larger, when the (appropriately named)
Variance Inflation Factor indicates that this 'partial regression coefficient' has a less
precise contribution by itself because of correlated predictor variables.

If correlated predictors act only redundantly, their regression coefficients will simply
be made smaller. However, "suppressor variables" are an odd and confusing case, in which
two correlated variables show up with opposite signs as predictors in the equation, and
each has a rather large s.e.

When a "standardized beta" is greater than the univariate correlation, and especially
when it is greater than 1.0, this is a sign that "suppression" exists. I could easily spot a
"beta > 1.0" (or one bigger than usual) even when handed someone else's data, where I
did not know the expected direction of prediction (i.e., the signs). Suppression indicates
that the best equation wants to make use of some computed difference of two variables.
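A minimal Python sketch of how suppression shows up, using the standard two-predictor formulas for standardized betas computed from the pairwise correlations: beta1 = (r_Y1 - r_Y2*r_12) / (1 - r_12^2), and symmetrically for beta2. Both sets of correlations below are hypothetical (each forms a valid correlation matrix). In the first, X2 is unrelated to Y yet earns a negative weight and pushes beta1 above r_Y1; in the second, beta1 exceeds 1.0.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression weights for a two-predictor model,
    computed from the three pairwise correlations."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

def r_squared(r_y1, r_y2, r_12):
    """R^2 = beta1 * r_Y1 + beta2 * r_Y2 for two predictors."""
    b1, b2 = standardized_betas(r_y1, r_y2, r_12)
    return b1 * r_y1 + b2 * r_y2

# Hypothetical correlations (r_Y1, r_Y2, r_12), for illustration only.
cases = {
    "classic suppressor (X2 unrelated to Y)": (0.3, 0.0, 0.7),
    "beta > 1.0": (0.5, 0.1, 0.9),
}
for label, (r_y1, r_y2, r_12) in cases.items():
    b1, b2 = standardized_betas(r_y1, r_y2, r_12)
    print(f"{label}: r_Y1 = {r_y1}, beta1 = {b1:.3f}, beta2 = {b2:.3f}, "
          f"R^2 = {r_squared(r_y1, r_y2, r_12):.3f}")
```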

By the way, WHAT TO DO for suppression....
Logically, you are on stronger ground for a robust model if you can pull out the variables
involved and use a priori knowledge to compute a ratio, the log of that ratio, or some
justifiable weighted difference of two (or more) variables, and then use that composite in
place of the variables that confound each other.

--
Rich Ulrich

> Date: Thu, 12 Mar 2015 08:51:49 -0400

> From: [hidden email]
> Subject: Standard error of predictor and R-squared
> To: [hidden email]

Ryan

Re: Standard error of predictor and R-squared

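For two predictors, the standard error of the coefficient for the first predictor can be defined as:

sqrt((1 - R^2_Y12) / ((1 - R^2_12) * (N - K - 1))) * (Sy / Sx1)

where R^2_Y12 is the R-squared of the full model, R^2_12 is the squared correlation between the two predictors, N is the sample size, K = 2 is the number of predictors, and Sy and Sx1 are the standard deviations of Y and X1.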
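A minimal Python sketch checking that Ryan's two-predictor formula agrees with Mike's general form sqrt[MSresidual / {SS_X1 * (1 - R_1^2)}] once both are rebuilt from the same summary statistics (using SS = (N - 1) * SD^2, and noting that with two predictors R_1^2 is just the squared correlation R^2_12). The summary statistics are hypothetical. The last line also shows the direction Rich describes: with N and collinearity fixed, a higher model R-squared goes with a smaller SE, not the other way round.

```python
import math

def se_b1_two_predictors(r2_y12, r2_12, n, s_y, s_x1, k=2):
    """Ryan's form: sqrt((1 - R^2_Y12) / ((1 - R^2_12) * (N - K - 1))) * (Sy / Sx1)."""
    return math.sqrt((1 - r2_y12) / ((1 - r2_12) * (n - k - 1))) * (s_y / s_x1)

def se_b1_general(r2_y12, r2_12, n, s_y, s_x1, k=2):
    """General form sqrt(MS_residual / (SS_X1 * (1 - R_1^2))), rebuilt from
    the same summary statistics via SS = (N - 1) * SD^2."""
    ss_y = (n - 1) * s_y ** 2
    ms_residual = ss_y * (1 - r2_y12) / (n - k - 1)
    ss_x1 = (n - 1) * s_x1 ** 2
    return math.sqrt(ms_residual / (ss_x1 * (1 - r2_12)))

# Hypothetical summary statistics.
args = dict(r2_y12=0.40, r2_12=0.25, n=100, s_y=10.0, s_x1=2.0)
print(se_b1_two_predictors(**args), se_b1_general(**args))

# Raising the model R^2 (other inputs fixed) shrinks SE(b1).
print(se_b1_two_predictors(0.60, 0.25, 100, 10.0, 2.0))
```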

You asked: "Isn't it the regression weight itself (the slope coefficient) which is used for calculating R square rather than its standard error???"

For two predictors, R-squared ("R^2_Y12") can be computed from the standardized coefficients and the zero-order correlations of each predictor with Y:

Zbeta1*r_Y1 + Zbeta2*r_Y2
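A minimal Python check of this identity on a small made-up dataset: R-squared computed as Zbeta1*r_Y1 + Zbeta2*r_Y2 matches 1 - SS_residual/SS_total from a direct least-squares fit. The data and the helper names (mean, corr) are illustrative only, and the fit solves the 2x2 normal equations on centered data rather than calling any statistics library.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

# Made-up data for illustration only.
y = [3.0, 5.0, 4.0, 7.0, 8.0, 6.0, 9.0, 10.0]
x1 = [1.0, 2.0, 2.0, 4.0, 5.0, 3.0, 6.0, 6.0]
x2 = [2.0, 1.0, 3.0, 2.0, 4.0, 5.0, 3.0, 6.0]

# R^2 from standardized betas and zero-order correlations
r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
zbeta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
zbeta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
r2_from_betas = zbeta1 * r_y1 + zbeta2 * r_y2

# R^2 from a direct OLS fit: solve the 2x2 normal equations on centered data
yc = [v - mean(y) for v in y]
c1 = [v - mean(x1) for v in x1]
c2 = [v - mean(x2) for v in x2]
s11 = sum(a * a for a in c1)
s22 = sum(b * b for b in c2)
s12 = sum(a * b for a, b in zip(c1, c2))
s1y = sum(a * v for a, v in zip(c1, yc))
s2y = sum(b * v for b, v in zip(c2, yc))
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
ss_res = sum((v - b1 * a - b2 * b) ** 2 for v, a, b in zip(yc, c1, c2))
ss_tot = sum(v * v for v in yc)
r2_direct = 1 - ss_res / ss_tot

print(f"R^2 from Zbeta*r: {r2_from_betas:.6f}")
print(f"R^2 from OLS fit: {r2_direct:.6f}")
```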

Ryan

On Thu, Mar 12, 2015 at 8:51 AM, Nina Lasek <[hidden email]> wrote:
