SPSSX Discussion - Re: Multiple imputation question

Re: Multiple imputation question

Posted by Joost van Ginkel on
URL: http://spssx-discussion.165.s1.nabble.com/Multiple-imputation-question-tp5740383p5740393.html

Hello Jeff,

I think that multiple imputation with the number of imputations set at M = 1 is better than stochastic regression imputation. However, I would not recommend single imputation with any imputation procedure, so the question of which of the two single-imputation procedures is better shouldn’t even be asked, if you ask me. It’s like asking which is better: performing three independent t-tests to compare the means at three different time points, or performing an independent ANOVA. The latter is less wrong than the former, but actually you should do neither.

Best,

Joost

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff A
Sent: Wednesday, April 7, 2021 11:49 AM
To: [hidden email]
Subject: Re: Multiple imputation question

Hi Joost,

Yes, I understood that the stochastic regression imputation method is intended to produce a single imputed dataset and I’ve read what you said in your 2020 paper about MI. The only thing I’m slightly uncertain about is the practical difference between a single imputed dataset using stochastic regression imputation (that appears to originate with Little & Schenker 1995 and Van Buuren, 2012 according to your paper) and what one would get if they used the SPSS MI procedure and set the number of imputations to 1. In re-reading what I wrote, I can see that I wasn’t clear. I realize that neither procedure is ideal and neither incorporates the uncertainty in the imputation process that is intended to be addressed by the proper use of MI. I’m just trying to get a bit of a better understanding. I’m assuming that if you compared these two less-than-ideal methods for producing a single dataset with imputed values substituted for the missing ones, the stochastic regression imputation procedure you mentioned would somehow be better than the SPSS MI procedure run only once?

Keep in mind that I’m one of those “applied researchers” that you speak about in your article and although I have a reasonable background in applied stat, my copy of Little and Rubin sits on my bookshelf collecting dust since it’s a bit over my head.

Jeff

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Ginkel, J.R. van
Sent: Wednesday, April 7, 2021 7:07 PM
To: [hidden email]
Subject: Re: Multiple imputation question

Dear Jeff,

See below.

From: [hidden email] <[hidden email]>
Sent: Wednesday, April 7, 2021 10:20 AM
To: Ginkel, J.R. van <[hidden email]>; [hidden email]
Subject: RE: Multiple imputation question

Ironically,

It was one of your papers from which I got the term, “stochastic regression imputation,” (van Ginkel et al, 2020 in J. Personality Assessment). I hadn’t heard of that term before I just read that paper.

I think you’re mixing up two things: the stochastic regression imputation I talked about in my 2020 paper was a method for single imputation, which you could say was the predecessor of fully conditional specification using regression. The regression method that I’m talking about is the one described on p. 4 of that paper in the Multiple Imputation Explained section.

I caught most of what you said and understand that R is much more sophisticated than most other statistical packages (I think it would take me a bit of time to fully digest your response), but am still curious if SPSS can be set to produce the type of singly-imputed dataset that you described in that 2020 paper via its MI procedure even if this is not ideal in practice. I can imagine that it wouldn’t be too difficult to create a macro to do such a thing, but I’m wondering whether it’s built-in?

It is possible to produce a single imputed dataset with the MI procedure in SPSS by setting the number of imputations to 1, but that is not equivalent to stochastic regression imputation (the latter doesn’t use fully conditional specification as an estimation method).
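
For concreteness, the kind of stochastic regression imputation discussed in this thread (predict each missing value from the observed data, then add a random error term) can be sketched as below. This is only an illustration in Python under simplifying assumptions: one fully observed predictor `x`, one incomplete variable `y`, normal errors, and a function name (`stochastic_regression_impute`) made up for this example. It is not the fully conditional specification routine that SPSS runs when the number of imputations is set to 1.

```python
import random
import statistics


def stochastic_regression_impute(x, y, rng):
    """Fill None entries of y from a regression on x plus random noise."""
    # Fit y = a + b*x by least squares on the complete cases only.
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in obs]
    ys = [yi for _, yi in obs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sum((xi - mx) * (yi - my) for xi, yi in obs) / sxx
    a = my - b * mx

    # Residual standard deviation, using n - 2 degrees of freedom.
    resid = [yi - (a + b * xi) for xi, yi in obs]
    sigma = (sum(r * r for r in resid) / (len(obs) - 2)) ** 0.5

    # Each missing value gets its prediction plus one random normal draw.
    # The coefficients a, b and sigma are held fixed at their estimates.
    return [
        yi if yi is not None else a + b * xi + rng.gauss(0.0, sigma)
        for xi, yi in zip(x, y)
    ]


rng = random.Random(2021)  # fixed seed so the draw is reproducible
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]
completed = stochastic_regression_impute(x, y, rng)
```

The result is one completed dataset. Analysing it as if it were fully observed ignores the uncertainty about the imputed values, which is the concern raised about any single imputation in this thread.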

Regardless of the paper, I can easily see that as being helpful in certain situations where you want to explore a number of different model specifications before settling on one that you’ll use in a final model.

Thanks in advance and for your former response.

Jeff

From: Ginkel, J.R. van <[hidden email]>
Sent: Wednesday, April 7, 2021 5:05 PM
To: '[hidden email]' <[hidden email]>; [hidden email]
Subject: RE: Multiple imputation question

Dear Jeff,

See my answers below.

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff A
Sent: Wednesday, April 7, 2021 5:38 AM
To: [hidden email]
Subject: Multiple imputation question

I’m currently reviewing an article for a pretty decent journal in the psychological literature where the authors have said that they’ve used spss and also have used multiple imputation.

They are clearly either making a mistake (at worst) or just not describing things well (at best).

They say, “In order to analyze a complete data set multiple imputation (MI) was used to input the missing values.”

They state no other real details of the purported MI procedure they went through and are mixing up the definitions of MCAR, MAR, and NMAR (but only slightly – I’ve seen much worse).

I’m trying to understand what they may have done since otherwise, this is a very good paper, but I haven’t used spss’s implementation of multiple imputation (but have seen and assisted a colleague who was confused) so it’s difficult for me to figure out where they may have made an error.

SPSS performs multiple imputation and creates a new data file in which all imputed versions of the incomplete dataset are appended one after another, with the original dataset on top, all identified by an indicator variable “Imputation_”.
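
The stacked layout described above can be pictured with a small Python sketch. The variable name `Imputation_` comes from the description above; coding the original data as 0 and the imputed copies as 1 to M is an assumption of this sketch, and `stack_imputations` and `fill_from_observed` are hypothetical helpers, with the latter a deliberately crude placeholder rather than a real imputation model.

```python
import random


def stack_imputations(rows, impute_row, m, rng):
    """Return the original rows followed by m imputed copies, tagged by Imputation_."""
    # Original (incomplete) data goes on top, tagged 0 (assumed coding).
    stacked = [dict(row, Imputation_=0) for row in rows]
    # Each imputed version is appended below, tagged 1..m.
    for k in range(1, m + 1):
        for row in rows:
            stacked.append(dict(impute_row(row, rng), Imputation_=k))
    return stacked


rows = [
    {"id": 1, "score": 10.0},
    {"id": 2, "score": None},   # missing value
    {"id": 3, "score": 14.0},
]
observed = [r["score"] for r in rows if r["score"] is not None]


def fill_from_observed(row, rng):
    # Placeholder imputation: copy a randomly chosen observed score.
    if row["score"] is None:
        return dict(row, score=rng.choice(observed))
    return dict(row)


stacked = stack_imputations(rows, fill_from_observed, m=5, rng=random.Random(1))
```

An analysis is then run separately within each value of `Imputation_` greater than 0, and the per-dataset results are combined afterwards.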

I’ve only used older and less-user-friendly MI software (e.g., Norm) in the past (before MI became available in spss) and not seen in detail what spss can currently do with MI.

This may sound a bit offensive to the makers of SPSS, but after doing multiple imputation in SPSS a few times I stopped using it and switched to the mice procedure in R. The basic procedure in SPSS is fully conditional specification, just like in R, but it lacks all the flexibility that R has. In R you can specify a separate imputation model for each variable, which doesn’t necessarily need to include all variables entered, whereas in SPSS each variable is predicted by all other variables entered in the MI procedure. When you have entered many variables this will inevitably lead to overfitted imputation models, causing the imputed values to become near random (I have seen it happen in scatter plots). Additionally, R can use predictive mean matching (PMM) for some numerical variables and linear regression imputation for other numerical variables. In SPSS you can either use PMM for all numerical variables or regression for all numerical variables, but not one method for one set of variables and the other method for the other set of variables. These are just a few examples, but SPSS lacks many more (in my opinion, essential) features that mice has. To make matters worse, SPSS hardly has any diagnostic tools to determine whether the imputation process went right, while mice has several features for that. In short, as an expert on multiple imputation I wouldn’t recommend the MI procedure in SPSS, unless the dataset has relatively few variables and the missing-data problem is relatively simple. Since you cannot tell this from the information that the authors gave in their paper, my comment as a reviewer would be that the authors should switch to mice in R.

Is there some way that SPSS will kick out a single set of regression coefficients that a user might interpret in the wrong way as coming from a single dataset? Can spss produce a single data set that might be described as what some call, “stochastic regression imputation,” which is where missing values are predicted by non-missing values within the data, but a random error term is then subsequently applied? In other words, can the MI procedure in SPSS be set to produce only a single dataset?

SPSS does have a possibility of combining the results of several imputed datasets into one result using Rubin’s combination rules. When SPSS recognizes a dataset as a multiple-imputation dataset, it gives the user a warning that the split file option must be switched on first, before carrying out any analysis (with Imputation_ as a split variable). Next, SPSS automatically does the combining, which is a really nice feature because you don’t need to do the combining yourself. What I usually do is impute the data in R first, save the result to a dataset in SPSS format, and next do the analyses in SPSS. However, SPSS does not pool the results of all statistical analyses. For example, it doesn’t pool the F-tests of ANOVA, R^2 in regression, or the results of PCA. Usually I use my own SPSS macros for that, which are freely available on my personal page. In some of my papers (Van Ginkel & Kroonenberg, 2014; Van Ginkel, 2019; Van Wingerde & Van Ginkel, 2021) I also refer to these macros. Most of these pooling procedures can also be done in R with the relevant packages by the way.
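
For readers who want to see what the combining step amounts to for a single coefficient, Rubin’s rules can be sketched in a few lines of Python. The pooled estimate is the mean of the M per-dataset estimates, and the total variance is T = W + (1 + 1/M) B, where W is the mean within-imputation variance and B is the between-imputation variance of the estimates. The function name `pool_rubin` and the numbers below are made up for illustration; this is not SPSS’s code and it does not cover the F-tests, R^2, or PCA results mentioned above.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across M imputed datasets with Rubin's rules.

    estimates: the M point estimates (e.g. a regression coefficient)
    variances: the M squared standard errors of those estimates
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m != len(variances):
        raise ValueError("need one variance per estimate")
    if m < 2:
        # With a single imputation there is no between-imputation
        # variance to estimate, so the rules cannot be applied.
        raise ValueError("Rubin's rules need at least two imputations")

    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance W
    b = statistics.variance(estimates)    # between-imputation variance B (divisor M - 1)
    t = w + (1 + 1 / m) * b               # total variance T
    return q_bar, t ** 0.5


# Illustrative (made-up) coefficient estimates from M = 5 imputed datasets.
estimates = [0.50, 0.55, 0.45, 0.52, 0.48]
variances = [0.010, 0.010, 0.010, 0.010, 0.010]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
```

The check for fewer than two imputations reflects the point made elsewhere in this thread: with a single imputed dataset the between-imputation component B cannot be estimated, so the extra uncertainty due to the missing data goes unaccounted for.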

Long story short: I wouldn’t worry about the authors reporting a set of regression coefficients as coming from a single dataset. What I would worry about more, is the whole imputation process that preceded the analysis.

Best regards,

Joost van Ginkel

Thanks in advance,

Jeff
