SPSSX Discussion - Re: R and R square over .90? for regression with imputed dataset

Posted by Rich Ulrich on Dec 19, 2012; 7:25am
URL: http://spssx-discussion.165.s1.nabble.com/R-and-R-square-over-90-for-regression-with-imputed-dataset-tp5717033p5717045.html

Okay. Not only am I no expert in imputation, but I managed to avoid it forever.
My data seldom had much missing (and not on the outcome), and I got by with
relabeling (like "Yes/not-yes" instead of Yes/No) or adding a Missing category.

What I wrote is the obvious starting point -- you can't use the outcome's
information to fix, deterministically, the value you set a predictor to. You
*might* use some probabilistic approach that avoids creating a relationship...
if you have so much missing data to account for that you have to do this to
salvage an analysis.

Your results imply that you did the former - incorporating information - and
not the latter.
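
To make the former/latter distinction concrete, here is a rough simulation
sketch in Python with NumPy. It is not what SPSS does internally; the data,
the effect size (0.7), the 40% missing rate and the function names are all
made up for illustration. It fills in a missing predictor from the outcome
two ways: deterministically (the predicted value itself) and
probabilistically (the predicted value plus a random residual).

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-predictor regression."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def compare_imputations(n=5000, missing_frac=0.4, seed=0):
    rng = np.random.default_rng(seed)

    # Hypothetical data: one predictor x, one outcome y, true R^2 about 0.49.
    x = rng.normal(size=n)
    y = 0.7 * x + rng.normal(scale=np.sqrt(1 - 0.7 ** 2), size=n)

    # Knock out predictor values completely at random.
    missing = rng.random(n) < missing_frac
    observed = ~missing

    # Imputation model fitted on complete cases: x regressed on the outcome y.
    slope, intercept = np.polyfit(y[observed], x[observed], 1)
    predicted = intercept + slope * y
    resid_sd = np.std(x[observed] - predicted[observed], ddof=2)

    # Deterministic: every missing x is set to its predicted value.
    x_det = np.where(missing, predicted, x)

    # Stochastic: the same prediction plus a random residual draw.
    noise = rng.normal(scale=resid_sd, size=n)
    x_sto = np.where(missing, predicted + noise, x)

    return {
        "complete_cases": r_squared(x[observed], y[observed]),
        "deterministic": r_squared(x_det, y),
        "stochastic": r_squared(x_sto, y),
    }


results = compare_imputations()
for name, value in results.items():
    print(f"{name:>15}: R^2 = {value:.3f}")
```

Under these assumptions the deterministic fill should push R^2 noticeably
above the complete-case value, because the imputed cases fall exactly on a
line through the outcome, while the stochastic fill should stay close to it.
With many predictors and more missing data the inflation could be larger.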

Frank Harrell is reliable. I googled for him on the subject and came up with this
comment by someone else --
http://lists.utsouthwestern.edu/pipermail/impute/2001-February/000104.html
- which FH agrees with, in the next post in the thread.

I also noticed a comment worrying about Missing at Random, but your change
in results seems too drastic for that to matter.

--
Rich Ulrich

Date: Wed, 19 Dec 2012 06:37:58 +0000
From: [hidden email]
Subject: Re: R and R square over .90? for regression with imputed dataset
To: [hidden email]

Hi Rich,

Could you please expand on your comment? I have always been under the impression that one should include the outcome in an imputation model to ensure that all relevant relationships are accounted for, and that excluding the outcome could/would introduce bias (see, for example [1]). Are you aware of scenarios in which that isn’t the case?

I know that there is debate about whether to include cases that have had the outcome imputed in the final analysis (multiple imputation then deletion, [2]), but that appears to be a separate issue to what you describe.

Thanks,

Kylie.

[1] Moons KGM, Donders RART, Stijnen T, Harrell FE. (2006) Using the outcome for imputation of missing predictor values was preferred. Journal of Clinical Epidemiology 59: 1092-1101.

[2] von Hippel PT. (2007) Regression with missing Ys: An improved strategy for analysing multiply imputed data. Sociological Methodology 37(1): 83-117.
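
The concern raised above, that leaving the outcome out of the imputation
model can bias the analysis, can be sketched with a small simulation. This is
only an illustration under made-up assumptions (normal data, a true slope of
0.7, 40% of the predictor missing completely at random, a single stochastic
imputation rather than a full multiple imputation); the helper names are
hypothetical and it is not the procedure used in [1].

```python
import numpy as np


def ols(design, target):
    """Least-squares coefficients for target ~ design (intercept included)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def stochastic_impute(x, missing, predictors, rng):
    """Fill missing x with a regression prediction plus a random residual."""
    design = np.column_stack([np.ones(len(x))] + predictors)
    obs = ~missing
    coef = ols(design[obs], x[obs])
    fitted = design @ coef
    resid_sd = np.std(x[obs] - fitted[obs], ddof=design.shape[1])
    draws = fitted + rng.normal(scale=resid_sd, size=len(x))
    return np.where(missing, draws, x)


def slope_of_y_on_x(x, y):
    """Slope from the simple regression of y on x."""
    return float(np.polyfit(x, y, 1)[0])


def compare_models(n=5000, missing_frac=0.4, seed=1):
    rng = np.random.default_rng(seed)

    # Hypothetical data: covariate z, predictor x, outcome y (true slope 0.7).
    z = rng.normal(size=n)
    x = 0.5 * z + rng.normal(size=n)
    y = 0.7 * x + rng.normal(size=n)
    missing = rng.random(n) < missing_frac

    # Imputation model without the outcome versus with the outcome.
    x_without_y = stochastic_impute(x, missing, [z], rng)
    x_with_y = stochastic_impute(x, missing, [z, y], rng)

    return {
        "outcome_excluded": slope_of_y_on_x(x_without_y, y),
        "outcome_included": slope_of_y_on_x(x_with_y, y),
    }


slopes = compare_models()
for name, value in slopes.items():
    print(f"{name:>17}: slope = {value:.3f}")
```

In this setup the slope estimated after imputing without the outcome should
be pulled toward zero, while including the outcome (with a random residual
added) should recover a slope near 0.7.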

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rich Ulrich
Sent: Wednesday, 19 December 2012 4:38 PM
To: [hidden email]
Subject: Re: R and R square over .90? for regression with imputed dataset

It is a no-no to use your criterion variable for imputing values for
your predictors, probably because doing so could account for this sort
of result.

--
Rich Ulrich

> Date: Tue, 18 Dec 2012 19:51:51 -0800
> From: [hidden email]
> Subject: R and R square over .90? for regression with imputed dataset
> To: [hidden email]
>
> Hello,
>
> I was running regressions with the imputed variables, and I got R and R2
> over .90. Is this ever possible? The highest R2 I had with original dataset
> was over .50.
>
> I used SPSS Multiple Imputation and Missing Value Analysis functions by
> following all steps suggested by the IBM SPSS guide. The original dataset
> had a significant amount of missing values (about 10-40%). Some variables
> imputed are at the scale level as well as the individual scale items. I
> spoke to my advisor and she, too, is skeptical about this result.
>
> Any suggestions would be appreciated. Thanks much.
>
> ...