|
Dear Co-listers:
I have recently encountered the following question: I got a pretty good R^2 estimate, 0.85, using the Linear Regression procedure in SPSS. (Not all sample points have all of the dependent and independent variables, so I used the pairwise option.) But when I plotted the predicted values against the actual values of my dependent variable in Excel and CurveExpert, I can only get an R^2 of around 0.50. I am not sure what is causing this discrepancy: is it due to the computation in SPSS, or to the fact that it is computed pairwise?

The other question I have is: what can one say about the result when one uses a linear regression model without including the constant? The R^2 is higher, but isn't that biased? Can one still use it as a validation method?

Thank you so much for your help!

Joanne

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD
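One useful fact for diagnosing this: on a single complete-case sample, the R^2 from an OLS fit *with* a constant term is exactly the squared correlation between the predicted and actual values, so a predicted-vs-actual scatter plot should agree with the model R^2 unless the two computations are using different cases. A quick check of the identity (a Python/NumPy sketch on made-up data, not the poster's SPSS workflow):

```python
import numpy as np

# Simulated complete-case data (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

# OLS fit including a constant term.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta

# Model R^2 (1 - SSE/SST) vs. squared correlation of predicted and actual,
# which is what a trendline on an Excel-style scatter plot reports.
r2_model = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
r2_plot = np.corrcoef(pred, y)[0, 1] ** 2
```

The two numbers match to machine precision on complete data, which points to the differing case selections (pairwise in SPSS, complete cases in the plot) as the source of the 0.85 vs. 0.50 gap.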
|
Try your SPSS analysis again using listwise deletion of missing data. I'd guess you'll get the same results as Excel, which AFAIK doesn't have an algorithm for pairwise deletion. When you do not include the constant, you are testing an entirely different model--one in which the intercept is assumed to be 0. Is that what you want?
|
I think that when you exclude the intercept, it implies that the regression line runs through the origin, which is rarely advisable.
Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
1789 W. Jefferson Street, Phoenix, AZ 85032
Tel: (602) 542-5639
E-mail: [hidden email]
|
In reply to this post by Joanne Tsai
1. Yes, if I use listwise deletion, the R^2 is similar between Excel and SPSS. But which R^2 is more reliable? I have 0.85 for pairwise and 0.65 for listwise. I'd love to show the higher R^2, but I would not want to draw a wrong conclusion based on it. Is there any other tool in which I can plot the graph and get a similar 0.85? Is there anywhere I can find more information on the algorithm for pairwise deletion?

2. When I run the linear regression including the constant, the p-value on the constant is 0.91, so I would think it is not significant. Is it fair to remove the constant based only on that p-value?

Thank you so much for your pointers!
|
If you use pairwise deletion, you can't be sure of the statistical properties of your regression estimates. Pairwise deletion is rarely appropriate. In fact, with pairwise deletion you can't even be sure that the covariance matrix is positive definite. Stick with listwise deletion.
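The positive-definiteness point is easy to demonstrate: because each pairwise correlation is computed on a different subset of cases, the assembled matrix can be one that no single data set could produce. A small sketch (Python/pandas, invented data; `DataFrame.corr` uses pairwise-complete observations by default):

```python
import numpy as np
import pandas as pd

# Three variables, each pair observed on a *different* subset of cases.
nan = np.nan
df = pd.DataFrame({
    "a": [1, 2, 3, 4,  nan, nan, nan, nan,  1, 2, 3, 4],
    "b": [1, 2, 3, 4,  1, 2, 3, 4,          nan, nan, nan, nan],
    "c": [nan, nan, nan, nan,  1, 2, 3, 4,  4, 3, 2, 1],
})

# Pairwise-complete correlations give r(a,b)=+1 and r(b,c)=+1 but
# r(a,c)=-1 -- a combination impossible in any complete data set.
R = df.corr().to_numpy()

# A valid correlation matrix must be positive semidefinite; this one
# has a negative eigenvalue, so it is not.
min_eig = np.linalg.eigvalsh(R).min()
```

Feeding such a matrix into a regression routine is what produces the unreliable estimates the message warns about.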
As for the constant term, think of the model you are testing. Omitting the constant term is perfectly appropriate if your model implies that the regression line should go through the origin and you are confident of linearity. In most cases, though, you should just keep the constant term and not test it for significance. Forcing the regression line through the origin does produce an R^2 that isn't really comparable to the usual one.

HTH,
Jon Peck
|
In reply to this post by Joanne Tsai
Thank you for the answer.
Is there any way I can find out why the coefficient estimates from the two different methods are similar, but the R^2 values are not? (I will be throwing out 25% of the data if I use listwise deletion.) I am assuming the model should go through the origin, so the second question is fully answered. Thank you.
|
Regarding the R^2: when there is a constant term in the regression, the residuals have mean zero, so the sums of squares in the numerator and denominator correspond to the usual correlation-based definition. If there is no constant term, the residual mean is not zero, and the sums of squares in both the numerator and the denominator pick up a contribution from the mean square, so the ratio of explained to total sum of squares will be closer to one.
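The difference can be made concrete: with a constant, R^2 compares the residual sum of squares to the total sum of squares *about the mean of y*; without one, most packages compare it to the uncentered sum of squares (sum of y^2), which is much larger whenever the mean of y is far from zero. A sketch with invented data (Python/NumPy):

```python
import numpy as np

# Data with a large mean and a modest slope (invented for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 50)
y = 20 + 0.5 * x + rng.normal(0, 1, 50)

# With a constant: R^2 uses the total SS about the mean of y.
X1 = np.column_stack([np.ones_like(x), x])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid1 = y - X1 @ b1
r2_with = 1 - (resid1 ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Without a constant: packages typically use the *uncentered* total SS
# (sum of y^2), so this "R^2" is inflated and not comparable.
b0, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
resid0 = y - x * b0[0]
r2_without = 1 - (resid0 ** 2).sum() / (y ** 2).sum()
```

Even though the no-intercept line fits these points visibly worse, its reported R^2 comes out higher, which is exactly why the two numbers should not be compared.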
Now, here is the quiz for today: construct an ordinary least squares linear regression example where ALL of the residuals are positive.

Regards,
Jon Peck
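For readers who want to check their answer to the quiz: the zero-mean-residuals property comes from the constant term. In a no-intercept fit the residuals are only constrained to be orthogonal to x, so if the x values are symmetric about zero and every y is positive, every residual can be positive. One possible construction (Python/NumPy sketch, invented data):

```python
import numpy as np

# No-intercept OLS: slope = sum(x*y) / sum(x*x).
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 1.0, 1.0])

slope = (x * y).sum() / (x * x).sum()  # symmetric x, constant y -> slope is 0
residuals = y - slope * x              # fitted line is y = 0, so every residual is +1
```

With a constant term this could never happen, since the residuals would be forced to sum to zero.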
|
In reply to this post by Joanne Tsai
Hi, Jon
Sorry I didn't make my question clear. I meant to ask: by trying both listwise and pairwise, I observed that the two sets of estimated coefficients are similar, though the R^2 seemed to be a lot better with pairwise. I am very curious about the reason behind it. Can I get better coefficients by using pairwise, since it doesn't throw out any data? And how is R^2 computed when using pairwise -- why is it so much higher than the R^2 computed listwise?
|
There are two issues here. First, you are using different samples when you go from listwise to pairwise deletion. There could be population characteristics that differ, especially if values are not missing at random. Imagine, for example, a situation where men rarely answer some question while women usually answer. Then the gender proportion in the pairwise sample will be very different from that in the listwise one, and if males and females differ in the regression response, the results will be quite different in the two samples.
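The gender example can be simulated to show how large the effect can get. In the sketch below (Python/NumPy, with invented parameters), women answer 90% of the time and men 10%, and the x-y slope is much stronger for women; the correlation in the answered (complete-case) subsample then comes out far higher than in the full population:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
female = rng.random(n) < 0.5
x = rng.normal(0.0, 1.0, n)

# The x-y relation differs sharply by group (slopes 2.0 vs. 0.2).
y = np.where(female, 2.0 * x, 0.2 * x) + rng.normal(0.0, 1.0, n)

# Not missing at random: women answer 90% of the time, men only 10%,
# so the answered subsample is dominated by the high-slope group.
answered = np.where(female, rng.random(n) < 0.9, rng.random(n) < 0.1)

r_full = np.corrcoef(x, y)[0, 1]                          # whole population
r_answered = np.corrcoef(x[answered], y[answered])[0, 1]  # complete cases only
```

Neither correlation is "wrong" as a description of its own sample; they simply describe different populations, which is why the pairwise and listwise R^2 values in the thread need not agree.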
Second, the residual means are doubtless different. Do Descriptives on them. You will see how the contribution to the R^2 from the residual means differ. You might also look at regression diagnostics. HTH, Jon Peck -----Original Message----- From: Joanne Tsai [mailto:[hidden email]] Sent: Tuesday, February 05, 2008 11:54 AM To: Peck, Jon; [hidden email] Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS Hi, Jon Sorry I didn't make my question clear. I meant to ask, by trying both listwise and pairwise, I observed that both sets of estimated coefficients are similar though R^2 seemed to perform a lot better with the pairwise. I am very curious of the reason behind it. Can I get better coefficients by using pairwise since it doesn't throw out any data? And how is R^2 computed by using pairwise, why is it a lot better than the R^2 done listwise? -----Original Message----- From: Peck, Jon [mailto:[hidden email]] Sent: Tuesday, February 05, 2008 1:47 PM To: Joanne Tsai; [hidden email] Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS Regarding the R^2, when there is a constant term in the regression, the residuals have mean zero, so the sums of squares in the numerator and denominator match up with correlation coefficients. If there is no constant term, the residual mean is not zero, so the sums of squares in both numerator and denominator have a contribution from the mean square, so the explained/total sum of squares will be closer to one. Now, here is the quiz for today: construct an ordinary least squares linear regression example where ALL of the residuals are positive. Regards, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Joanne Tsai Sent: Tuesday, February 05, 2008 11:30 AM To: [hidden email] Subject: Re: [SPSSX-L] R^2 computation in SPSS Thank you for the answer. Is there anyway I can find out why the coeffecient estimates using two different methods are similar, but R^2 is not. 
(I will be throwing out 25% of data if using listwise) I am assuming the model should go through the origin, so the second question is fully answered. Thank you. -----Original Message----- From: Peck, Jon [mailto:[hidden email]] Sent: Tuesday, February 05, 2008 11:17 AM To: Joanne Tsai; [hidden email] Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS If you use pairwise deletion, you can't be sure of the statistical properties of your regression estimates. Pairwise deletion is rarely appropriate. In fact, with pairwise deletion you can't even be sure that the covariance matrix is positive definite. Stick with listwise deletion. As for the constant term, think of the model you are testing. Omitting the constant term is perfectly appropriate if your model implies that the regression line should go through the origin and you are confident of linearity. In most cases, though, you should just keep the constant term and not test it for significance. Forcing the regression line through the origin does produce an R^2 that isn't really comparable to the usual one. HTH, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Joanne Tsai Sent: Tuesday, February 05, 2008 9:01 AM To: [hidden email] Subject: Re: [SPSSX-L] R^2 computation in SPSS 1. Yes, if I do use the listwise, R^2 is similar between Excel and SPSS. But which R^2 is more reliable? I have 0.85 for pairwise, and 0.65 for listwise. I'd love to show the higher R^2, but would not want to draw a wrong conclusion based on it. Or is there any other tool that I can plot the graph and get the similar 0.85? IS there anywhere I can find more information in terms of the algorithm for pairwise? 2. When I run the linear regression including the constant, the p-value on the constant is 0.91, so I would think it's not significant. Can I remove the constant just based on the P-value I got, is it fair? Thank you so much for your pointers! 
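Jon Peck's point about the no-constant R^2, and his quiz, can be illustrated numerically. The sketch below uses Python/NumPy on made-up data (in SPSS this corresponds to REGRESSION with the /ORIGIN subcommand, but the arithmetic is the same): without an intercept the residual mean is not zero, and the reported R^2 compares the residual sum of squares to the uncentered total Sum(y^2), so it is typically much closer to one and not comparable to the usual R^2. The last few lines give one possible answer to the quiz.

```python
import numpy as np

# Made-up data: y is linear in x with a large positive offset.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=50)
y = 5.0 + 0.5 * x + rng.normal(0, 0.5, size=50)

# With a constant term, OLS residuals sum to zero by construction.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid_with = y - X @ b
print(abs(resid_with.mean()) < 1e-8)   # True

# Without a constant term, the residual mean is not zero, and the
# reported R^2 uses the UNCENTERED total sum of squares Sum(y^2),
# so it looks much better even though the fit is worse.
b0 = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
resid_without = y - b0 * x
r2_uncentered = 1 - np.sum(resid_without ** 2) / np.sum(y ** 2)
print(resid_without.mean())            # clearly nonzero
print(r2_uncentered)                   # close to 1 despite the worse fit

# One answer to the quiz: through the origin, residuals are only
# orthogonal to x, not mean-zero, so with a negative x value ALL
# residuals can be positive.  x = (-1, 2), y = (1, 3): slope is 1,
# residuals are (2, 1).
xq, yq = np.array([-1.0, 2.0]), np.array([1.0, 3.0])
bq = (xq @ yq) / (xq @ xq)             # OLS slope through the origin
print(bq, yq - bq * xq)                # 1.0 [2. 1.]
```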
At 01:30 PM 2/5/2008, Joanne Tsai wrote:
>Is there any way I can find out why the coefficient estimates using
>two different methods are similar, but R^2 is not. (I will be
>throwing out 25% of data if using listwise)

This looks like a case for missing-value imputation. It's done with command MVA in SPSS -- in the add-on module Missing Values Analysis, i.e. significantly more money. See the end of this posting for other resources mentioned on the List.

Echoing Jon Peck: I think that all missing-value imputation postulates that values are missing at random. If omissions are disproportionately from one group, or disproportionately the larger observed values, imputed values will be misleading. (In the latter case, so will regression results with listwise deletion.)

>I am assuming the model should go through the origin, so the second
>question is fully answered. Thank you.

Fine, then, if you have really strong theoretical reasons backing this. State them very clearly in your write-up, or statistical reviewers will raise the same objections as respondents here have. Two caveats always apply:
a) R^2 does not have its usual meaning.
b) If your independent variable has a low coefficient of variation (i.e., mean large compared with the SD), your model may essentially devote itself to explaining the mean value.

...........................
Appendix: postings from the List on missing-values imputation:

>Date: Fri, 17 Jun 2005 10:26:16 -0500
>From: Anthony Babinec <[hidden email]>
>Subject: Re: missing value imputation
>To: [hidden email]
>
>>Could you suggest me a good book (preferred online) about missing
>>value imputation?
>
>There's a readable PDF file at
>
>www.princeton.edu/~slynch/missingdata.pdf
>
>Standard book treatments include Paul Allison's in the Sage green
>paperback series, and Little and Rubin's book, now in its second edition.

Here's one mention of a free resource for missing-values imputation.
I can't say, of my own knowledge, whether it's good, or even whether it's still available:

>Date: Tue, 4 May 2004 18:07:24 -0600
>From: Jeff <[hidden email]>
>Subject: Re: Data Imputation with NORM
>To: [hidden email]
>
>At 07:12 AM 5/3/2004, [Mark] wrote:
>
>>I have had very good luck with NORM and SPSS. I think John Graham
>>has written some SPSS macros for combining data augmentation step results.
>
>John's macros should be at the link below. ...haven't used them
>myself even though I've taken a class from John, but they should work
>just fine. From what I understand, the following cite may also be a
>step-by-step easy reference for using NORM and SPSS, although I
>haven't read the article myself yet.
>
>http://mcgee.hhdev.psu.edu/missing/sep15/index.html

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
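To give a flavor of what the imputation tools mentioned above do, here is a minimal single-regression-imputation sketch, in Python/NumPy rather than SPSS, on fabricated missing-at-random data. Real multiple imputation (MVA, NORM) draws several plausible values per hole and propagates the extra uncertainty into the standard errors; filling each hole with one predicted value, as below, understates that uncertainty and is shown only to convey the idea.

```python
import numpy as np

# Fabricated MAR data: y depends linearly on x; 25% of y is knocked out
# at random, mirroring the ~25% of cases lost to listwise deletion.
rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, size=n)
miss = rng.random(n) < 0.25
y_obs = np.where(miss, np.nan, y)

# Single regression imputation: fit y ~ x on the complete cases, then
# fill each hole with its predicted value.
ok = ~miss
slope, intercept = np.polyfit(x[ok], y_obs[ok], 1)
y_imp = np.where(miss, intercept + slope * x, y_obs)

print(np.isnan(y_imp).any())     # False: no holes remain
print(slope)                     # close to the true slope of 1.5
```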
