Re: R and R square over .90? for regression with imputed dataset
Posted by tonishi@iupui.edu on Dec 19, 2012; 6:50pm
URL: http://spssx-discussion.165.s1.nabble.com/R-and-R-square-over-90-for-regression-with-imputed-dataset-tp5717033p5717054.html
Hi Rich and Art,
Thanks much for your responses. I actually found that there were two different datasets produced by the one MI run. One lists the variables generated by all of the imputations -- the original data plus imputations 1~5. Since I have about 150 cases, that dataset has 150 x 5 imputed cases plus the 150 original cases (900 rows in total).
The other dataset has only 150 cases.
I thought I should use the second dataset, and got an R over .90. But I went through some archived messages on this list and found that it wasn't the correct one to use for regression. I then used the variables from the 5th imputation and got an R of about .5, which seems right.
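(For my own notes, and in case anyone searching the archive hits the same question: my understanding is that with multiple imputation the regression is supposed to be run on each of the five imputed datasets and the five sets of estimates combined with Rubin's rules, rather than taken from a single imputation. Below is a minimal sketch of that pooling step in Python -- the file name, the outcome y, and the predictors x1 and x2 are hypothetical; Imputation_ is the indicator variable SPSS adds to the stacked dataset, with 0 marking the original cases.)

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical stacked MI file: Imputation_ = 0 is the original data,
# 1..5 are the imputed copies (the 900-row dataset described above).
stacked = pd.read_csv("imputed_stacked.csv")

coefs, variances = [], []
for i in range(1, 6):                        # analyze imputations 1..5, not the original rows
    d = stacked[stacked["Imputation_"] == i]
    X = sm.add_constant(d[["x1", "x2"]])     # hypothetical predictors
    fit = sm.OLS(d["y"], X).fit()            # hypothetical outcome
    coefs.append(fit.params.values)
    variances.append(fit.bse.values ** 2)

coefs = np.array(coefs)
variances = np.array(variances)

# Rubin's rules: pool the point estimates and combine within- and
# between-imputation variance into one standard error per coefficient.
m = coefs.shape[0]
pooled_b = coefs.mean(axis=0)                # average of the 5 coefficient vectors
W = variances.mean(axis=0)                   # within-imputation variance
B = coefs.var(axis=0, ddof=1)                # between-imputation variance
pooled_se = np.sqrt(W + (1 + 1 / m) * B)     # total variance -> standard error

print(pd.DataFrame({"b": pooled_b, "se": pooled_se}, index=fit.params.index))

Averaging the coefficients and adding the between-imputation variance is what keeps the standard errors honest about the extra uncertainty coming from the missing data.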
But I would like to hear more of your thoughts on missing values (or on avoiding them) for my future survey studies --
Rich, it is great to hear that you could manage to limit missing values in your datasets. My study needs to use several scales that are common in my field (management), and my target organizations are notorious for not participating in surveys (venture capital funds, foundations, etc.). Almost all prior studies' response rates were around 20%. I was able to get a response rate of over 50%, but I am still struggling with many unanswered questions -- I used both a paper and an online questionnaire, following Dillman's recommendations, so I couldn't "force" respondents to answer every question.
Is there any way that I can still limit missing values?
Art, I think the reason some participants didn't answer certain questions is that they thought the answer was "no." For instance, for the question about whether they participate in certain network organizations, some didn't circle a number. My assumption, from looking at those organizations' profiles, is that they are not affiliated with such network organizations. But there is no way to prove that all participants had the same intention.
I used scales that are commonly used in my field. The reason I needed to impute the scales, rather than the individual items, is that SPSS didn't allow me to run MI on the individual items because there are too many of them (even after I changed their level of measurement to scale). Also, those scales have good alphas. For the scales whose alphas are not over .70, I kept the individual items. Those decisions were based on suggestions in the prior literature, such as Rubin, Little, Allison, and Graham, and on some other empirical studies using imputed values.
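(In case it clarifies the .70 cutoff I mentioned: Cronbach's alpha for a scale is just k/(k-1) * (1 - sum of item variances / variance of the scale total). A small sketch, assuming a hypothetical respondents-by-items array of item scores:)

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the respondents' scale totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)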
Should I do anything else?