|
Dear Co-listers:
I have recently encountered the following question: I got a pretty good R^2 estimate, 0.85, using the Linear Regression procedure in SPSS. (Not all sample points have all of the dependent and independent variables, so I used the pairwise option.) But when I plotted the predicted values against the actual values of my dependent variable in Excel and CurveExpert, I can only get an R^2 of around 0.50. I am not sure what is causing this discrepancy: is it due to the computation in SPSS, or to the fact that it is computed pairwise?

The other question I have is: what can one say about the result when one uses a linear regression model without including the constant? The R^2 is higher, but isn't that biased? Can one still use it as a validation method?

Thank you so much for your help!

Joanne

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD
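One useful fact for diagnosing this: on a single complete-case sample, the R^2 from an OLS fit *with* a constant term is exactly the squared correlation between the predicted and actual values, so a predicted-vs-actual scatter plot should agree with the model R^2 unless the two computations are using different cases. A quick check of the identity (a Python/NumPy sketch on made-up data, not the poster's SPSS workflow):

```python
import numpy as np

# Simulated complete-case data (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

# OLS fit including a constant term.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta

# Model R^2 (1 - SSE/SST) vs. squared correlation of predicted and actual,
# which is what a trendline on an Excel-style scatter plot reports.
r2_model = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
r2_plot = np.corrcoef(pred, y)[0, 1] ** 2
```

The two numbers match to machine precision on complete data, which points to the differing case selections (pairwise in SPSS, complete cases in the plot) as the source of the 0.85 vs. 0.50 gap.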
|
Try your SPSS analysis again using listwise deletion of missing data. I'd guess you'll get the same results as Excel, which AFAIK doesn't have an algorithm for pairwise deletion. When you do not include the constant, you are testing an entirely different model--one in which the intercept is assumed to be 0. Is that what you want?
|
I think that when you exclude the intercept, it implies that the regression line runs through the origin, which is rarely advisable.
Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
1789 W. Jefferson Street, Phoenix, AZ 85032
Tel: (602) 542-5639
E-mail: [hidden email]
|
In reply to this post by Joanne Tsai
1. Yes, if I use listwise deletion, the R^2 is similar between Excel and SPSS. But which R^2 is more reliable? I have 0.85 for pairwise and 0.65 for listwise. I'd love to show the higher R^2, but I would not want to draw a wrong conclusion based on it. Is there any other tool in which I can plot the graph and get a similar 0.85? Is there anywhere I can find more information on the algorithm for pairwise deletion?

2. When I run the linear regression including the constant, the p-value on the constant is 0.91, so I would think it is not significant. Is it fair to remove the constant based only on that p-value?

Thank you so much for your pointers!
|
If you use pairwise deletion, you can't be sure of the statistical properties of your regression estimates. Pairwise deletion is rarely appropriate. In fact, with pairwise deletion you can't even be sure that the covariance matrix is positive definite. Stick with listwise deletion.
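The positive-definiteness point is easy to demonstrate: because each pairwise correlation is computed on a different subset of cases, the assembled matrix can be one that no single data set could produce. A small sketch (Python/pandas, invented data; `DataFrame.corr` uses pairwise-complete observations by default):

```python
import numpy as np
import pandas as pd

# Three variables, each pair observed on a *different* subset of cases.
nan = np.nan
df = pd.DataFrame({
    "a": [1, 2, 3, 4,  nan, nan, nan, nan,  1, 2, 3, 4],
    "b": [1, 2, 3, 4,  1, 2, 3, 4,          nan, nan, nan, nan],
    "c": [nan, nan, nan, nan,  1, 2, 3, 4,  4, 3, 2, 1],
})

# Pairwise-complete correlations give r(a,b)=+1 and r(b,c)=+1 but
# r(a,c)=-1 -- a combination impossible in any complete data set.
R = df.corr().to_numpy()

# A valid correlation matrix must be positive semidefinite; this one
# has a negative eigenvalue, so it is not.
min_eig = np.linalg.eigvalsh(R).min()
```

Feeding such a matrix into a regression routine is what produces the unreliable estimates the message warns about.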
As for the constant term, think of the model you are testing. Omitting the constant term is perfectly appropriate if your model implies that the regression line should go through the origin and you are confident of linearity. In most cases, though, you should just keep the constant term and not test it for significance. Forcing the regression line through the origin does produce an R^2 that isn't really comparable to the usual one.

HTH,
Jon Peck
|
In reply to this post by Joanne Tsai
Thank you for the answer.
Is there any way I can find out why the coefficient estimates from the two different methods are similar, but the R^2 values are not? (I will be throwing out 25% of the data if I use listwise deletion.) I am assuming the model should go through the origin, so the second question is fully answered. Thank you.
|
Regarding the R^2: when there is a constant term in the regression, the residuals have mean zero, so the sums of squares in the numerator and denominator correspond to the usual correlation-based definition. If there is no constant term, the residual mean is not zero, and the sums of squares in both the numerator and the denominator pick up a contribution from the mean square, so the ratio of explained to total sum of squares will be closer to one.
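The difference can be made concrete: with a constant, R^2 compares the residual sum of squares to the total sum of squares *about the mean of y*; without one, most packages compare it to the uncentered sum of squares (sum of y^2), which is much larger whenever the mean of y is far from zero. A sketch with invented data (Python/NumPy):

```python
import numpy as np

# Data with a large mean and a modest slope (invented for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 50)
y = 20 + 0.5 * x + rng.normal(0, 1, 50)

# With a constant: R^2 uses the total SS about the mean of y.
X1 = np.column_stack([np.ones_like(x), x])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid1 = y - X1 @ b1
r2_with = 1 - (resid1 ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Without a constant: packages typically use the *uncentered* total SS
# (sum of y^2), so this "R^2" is inflated and not comparable.
b0, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
resid0 = y - x * b0[0]
r2_without = 1 - (resid0 ** 2).sum() / (y ** 2).sum()
```

Even though the no-intercept line fits these points visibly worse, its reported R^2 comes out higher, which is exactly why the two numbers should not be compared.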
Now, here is the quiz for today: construct an ordinary least squares linear regression example where ALL of the residuals are positive.

Regards,
Jon Peck
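For readers who want to check their answer to the quiz: the zero-mean-residuals property comes from the constant term. In a no-intercept fit the residuals are only constrained to be orthogonal to x, so if the x values are symmetric about zero and every y is positive, every residual can be positive. One possible construction (Python/NumPy sketch, invented data):

```python
import numpy as np

# No-intercept OLS: slope = sum(x*y) / sum(x*x).
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 1.0, 1.0])

slope = (x * y).sum() / (x * x).sum()  # symmetric x, constant y -> slope is 0
residuals = y - slope * x              # fitted line is y = 0, so every residual is +1
```

With a constant term this could never happen, since the residuals would be forced to sum to zero.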
|
In reply to this post by Joanne Tsai
Hi, Jon
Sorry I didn't make my question clear. I meant to ask: by trying both listwise and pairwise, I observed that the two sets of estimated coefficients are similar, though the R^2 seemed to be a lot better with pairwise. I am very curious about the reason behind it. Can I get better coefficients by using pairwise, since it doesn't throw out any data? And how is R^2 computed when using pairwise -- why is it so much higher than the R^2 computed listwise?
|
There are two issues here. First, you are using different samples when you go from listwise to pairwise deletion. There could be population characteristics that differ, especially if values are not missing at random. Imagine, for example, a situation where men rarely answer some question while women usually answer. Then the gender proportion in the pairwise sample will be very different from that in the listwise one, and if males and females differ in the regression response, the results will be quite different in the two samples.
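The gender example can be simulated to show how large the effect can get. In the sketch below (Python/NumPy, with invented parameters), women answer 90% of the time and men 10%, and the x-y slope is much stronger for women; the correlation in the answered (complete-case) subsample then comes out far higher than in the full population:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
female = rng.random(n) < 0.5
x = rng.normal(0.0, 1.0, n)

# The x-y relation differs sharply by group (slopes 2.0 vs. 0.2).
y = np.where(female, 2.0 * x, 0.2 * x) + rng.normal(0.0, 1.0, n)

# Not missing at random: women answer 90% of the time, men only 10%,
# so the answered subsample is dominated by the high-slope group.
answered = np.where(female, rng.random(n) < 0.9, rng.random(n) < 0.1)

r_full = np.corrcoef(x, y)[0, 1]                          # whole population
r_answered = np.corrcoef(x[answered], y[answered])[0, 1]  # complete cases only
```

Neither correlation is "wrong" as a description of its own sample; they simply describe different populations, which is why the pairwise and listwise R^2 values in the thread need not agree.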
Second, the residual means are doubtless different. Do Descriptives on them. You will see how the contribution to the R^2 from the residual means differ. You might also look at regression diagnostics. HTH, Jon Peck -----Original Message----- From: Joanne Tsai [mailto:[hidden email]] Sent: Tuesday, February 05, 2008 11:54 AM To: Peck, Jon; [hidden email] Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS Hi, Jon Sorry I didn't make my question clear. I meant to ask, by trying both listwise and pairwise, I observed that both sets of estimated coefficients are similar though R^2 seemed to perform a lot better with the pairwise. I am very curious of the reason behind it. Can I get better coefficients by using pairwise since it doesn't throw out any data? And how is R^2 computed by using pairwise, why is it a lot better than the R^2 done listwise? -----Original Message----- From: Peck, Jon [mailto:[hidden email]] Sent: Tuesday, February 05, 2008 1:47 PM To: Joanne Tsai; [hidden email] Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS Regarding the R^2, when there is a constant term in the regression, the residuals have mean zero, so the sums of squares in the numerator and denominator match up with correlation coefficients. If there is no constant term, the residual mean is not zero, so the sums of squares in both numerator and denominator have a contribution from the mean square, so the explained/total sum of squares will be closer to one. Now, here is the quiz for today: construct an ordinary least squares linear regression example where ALL of the residuals are positive. Regards, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Joanne Tsai Sent: Tuesday, February 05, 2008 11:30 AM To: [hidden email] Subject: Re: [SPSSX-L] R^2 computation in SPSS Thank you for the answer. Is there anyway I can find out why the coeffecient estimates using two different methods are similar, but R^2 is not. 
(I will be throwing out 25% of data if using listwise) I am assuming the model should go through the origin, so the second question is fully answered. Thank you. -----Original Message----- From: Peck, Jon [mailto:[hidden email]] Sent: Tuesday, February 05, 2008 11:17 AM To: Joanne Tsai; [hidden email] Subject: RE: Re: [SPSSX-L] R^2 computation in SPSS If you use pairwise deletion, you can't be sure of the statistical properties of your regression estimates. Pairwise deletion is rarely appropriate. In fact, with pairwise deletion you can't even be sure that the covariance matrix is positive definite. Stick with listwise deletion. As for the constant term, think of the model you are testing. Omitting the constant term is perfectly appropriate if your model implies that the regression line should go through the origin and you are confident of linearity. In most cases, though, you should just keep the constant term and not test it for significance. Forcing the regression line through the origin does produce an R^2 that isn't really comparable to the usual one. HTH, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Joanne Tsai Sent: Tuesday, February 05, 2008 9:01 AM To: [hidden email] Subject: Re: [SPSSX-L] R^2 computation in SPSS 1. Yes, if I do use the listwise, R^2 is similar between Excel and SPSS. But which R^2 is more reliable? I have 0.85 for pairwise, and 0.65 for listwise. I'd love to show the higher R^2, but would not want to draw a wrong conclusion based on it. Or is there any other tool that I can plot the graph and get the similar 0.85? IS there anywhere I can find more information in terms of the algorithm for pairwise? 2. When I run the linear regression including the constant, the p-value on the constant is 0.91, so I would think it's not significant. Can I remove the constant just based on the P-value I got, is it fair? Thank you so much for your pointers! 
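Jon Peck's point about the no-constant R^2, and his quiz, can be illustrated numerically. The sketch below uses Python/NumPy on made-up data (in SPSS this corresponds to REGRESSION with the /ORIGIN subcommand, but the arithmetic is the same): without an intercept the residual mean is not zero, and the reported R^2 compares the residual sum of squares to the uncentered total Sum(y^2), so it is typically much closer to one and not comparable to the usual R^2. The last few lines give one possible answer to the quiz.

```python
import numpy as np

# Made-up data: y is linear in x with a large positive offset.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=50)
y = 5.0 + 0.5 * x + rng.normal(0, 0.5, size=50)

# With a constant term, OLS residuals sum to zero by construction.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid_with = y - X @ b
print(abs(resid_with.mean()) < 1e-8)   # True

# Without a constant term, the residual mean is not zero, and the
# reported R^2 uses the UNCENTERED total sum of squares Sum(y^2),
# so it looks much better even though the fit is worse.
b0 = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
resid_without = y - b0 * x
r2_uncentered = 1 - np.sum(resid_without ** 2) / np.sum(y ** 2)
print(resid_without.mean())            # clearly nonzero
print(r2_uncentered)                   # close to 1 despite the worse fit

# One answer to the quiz: through the origin, residuals are only
# orthogonal to x, not mean-zero, so with a negative x value ALL
# residuals can be positive.  x = (-1, 2), y = (1, 3): slope is 1,
# residuals are (2, 1).
xq, yq = np.array([-1.0, 2.0]), np.array([1.0, 3.0])
bq = (xq @ yq) / (xq @ xq)             # OLS slope through the origin
print(bq, yq - bq * xq)                # 1.0 [2. 1.]
```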
At 01:30 PM 2/5/2008, Joanne Tsai wrote:
>Is there any way I can find out why the coefficient estimates using
>two different methods are similar, but R^2 is not. (I will be
>throwing out 25% of data if using listwise)

This looks like a case for missing-value imputation. It's done with command MVA in SPSS -- in the add-on module Missing Values Analysis, i.e. significantly more money. See the end of this posting for other resources mentioned on the List.

Echoing Jon Peck: I think that all missing-value imputation postulates that values are missing at random. If omissions are disproportionately from one group, or disproportionately the larger observed values, imputed values will be misleading. (In the latter case, so will regression results with listwise deletion.)

>I am assuming the model should go through the origin, so the second
>question is fully answered. Thank you.

Fine, then, if you have really strong theoretical reasons backing this. State them very clearly in your write-up, or statistical reviewers will raise the same objections as respondents here have. Two caveats always apply:
a) R^2 does not have its usual meaning.
b) If your independent variable has a low coefficient of variation (i.e., mean large compared with the SD), your model may essentially devote itself to explaining the mean value.

...........................
Appendix: postings from the List on missing-values imputation:

>Date: Fri, 17 Jun 2005 10:26:16 -0500
>From: Anthony Babinec <[hidden email]>
>Subject: Re: missing value imputation
>To: [hidden email]
>
>>Could you suggest me a good book (preferred online) about missing
>>value imputation?
>
>There's a readable PDF file at
>
>www.princeton.edu/~slynch/missingdata.pdf
>
>Standard book treatments include Paul Allison's in the Sage green
>paperback series, and Little and Rubin's book, now in its second edition.

Here's one mention of a free resource for missing-values imputation.
I can't say, of my own knowledge, whether it's good, or even whether it's still available:

>Date: Tue, 4 May 2004 18:07:24 -0600
>From: Jeff <[hidden email]>
>Subject: Re: Data Imputation with NORM
>To: [hidden email]
>
>At 07:12 AM 5/3/2004, [Mark] wrote:
>
>>I have had very good luck with NORM and SPSS. I think John Graham
>>has written some SPSS macros for combining data augmentation step results.
>
>John's macros should be at the link below. ...haven't used them
>myself even though I've taken a class from John, but they should work
>just fine. From what I understand, the following cite may also be a
>step-by-step easy reference for using NORM and SPSS, although I
>haven't read the article myself yet.
>
>http://mcgee.hhdev.psu.edu/missing/sep15/index.html

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
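To give a flavor of what the imputation tools mentioned above do, here is a minimal single-regression-imputation sketch, in Python/NumPy rather than SPSS, on fabricated missing-at-random data. Real multiple imputation (MVA, NORM) draws several plausible values per hole and propagates the extra uncertainty into the standard errors; filling each hole with one predicted value, as below, understates that uncertainty and is shown only to convey the idea.

```python
import numpy as np

# Fabricated MAR data: y depends linearly on x; 25% of y is knocked out
# at random, mirroring the ~25% of cases lost to listwise deletion.
rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, size=n)
miss = rng.random(n) < 0.25
y_obs = np.where(miss, np.nan, y)

# Single regression imputation: fit y ~ x on the complete cases, then
# fill each hole with its predicted value.
ok = ~miss
slope, intercept = np.polyfit(x[ok], y_obs[ok], 1)
y_imp = np.where(miss, intercept + slope * x, y_obs)

print(np.isnan(y_imp).any())     # False: no holes remain
print(slope)                     # close to the true slope of 1.5
```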
