Multiple imputation question


Multiple imputation question

Jeff A

 

I’m currently reviewing an article for a pretty decent journal in the psychological literature in which the authors say that they used SPSS and multiple imputation.

 

They are clearly either making a mistake (at worst) or just not describing things well (at best).

 

They say, “In order to analyze a complete data set multiple imputation (MI) was used to input the missing values.”

 

They give no other real details of the MI procedure they purportedly went through, and they mix up the definitions of MCAR, MAR, and NMAR (but only slightly – I’ve seen much worse).

 

I’m trying to understand what they may have done, since this is otherwise a very good paper, but I haven’t used SPSS’s implementation of multiple imputation (though I have seen it and assisted a confused colleague), so it’s difficult for me to figure out where they may have made an error.

 

I’ve only used older and less user-friendly MI software (e.g., NORM) in the past, before MI became available in SPSS, and I haven’t seen in detail what SPSS can currently do with MI. Is there some way that SPSS will kick out a single set of regression coefficients that a user might wrongly interpret as coming from a single dataset? Can SPSS produce a single dataset that might be described as what some call “stochastic regression imputation,” where missing values are predicted from the non-missing values in the data and a random error term is then added? In other words, can the MI procedure in SPSS be set to produce only a single dataset?

 

Thanks in advance,

 

Jeff

 

 

 


Re: Multiple imputation question

Joost van Ginkel

Dear Jeff,

 

See my answers below.

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff A
Sent: Wednesday, April 7, 2021 5:38 AM
To: [hidden email]
Subject: Multiple imputation question

 

 

I’m currently reviewing an article for a pretty decent journal in the psychological literature in which the authors say that they used SPSS and multiple imputation.

 

They are clearly either making a mistake (at worst) or just not describing things well (at best).

 

They say, “In order to analyze a complete data set multiple imputation (MI) was used to input the missing values.”

 

They give no other real details of the MI procedure they purportedly went through, and they mix up the definitions of MCAR, MAR, and NMAR (but only slightly – I’ve seen much worse).

 

I’m trying to understand what they may have done, since this is otherwise a very good paper, but I haven’t used SPSS’s implementation of multiple imputation (though I have seen it and assisted a confused colleague), so it’s difficult for me to figure out where they may have made an error.

 

SPSS performs multiple imputation and creates a new data file in which all imputed versions of the incomplete dataset are appended one after another, with the original dataset on top, all indicated by an indicator variable, “Imputation_”.

 

I’ve only used older and less user-friendly MI software (e.g., NORM) in the past, before MI became available in SPSS, and I haven’t seen in detail what SPSS can currently do with MI.

 

This may sound a bit offensive to the makers of SPSS, but after doing multiple imputation in SPSS a few times I stopped using it and switched to the mice procedure in R. The basic procedure in SPSS is fully conditional specification, just like in R, but it lacks the flexibility that R has. In R you can specify a separate imputation model for each variable, which doesn’t necessarily need to include all variables entered, whereas in SPSS each variable is predicted by all other variables entered in the MI procedure. When you have entered many variables this will inevitably lead to overfitted imputation models, causing the imputed values to become nearly random (I have seen it happen in scatter plots). Additionally, R can use predictive mean matching (PMM) for some numerical variables and linear regression imputation for other numerical variables. In SPSS you can either use PMM for all numerical variables or regression for all numerical variables, but not one method for one set of variables and the other method for the other set. These are just a few examples; SPSS lacks many more (in my opinion, essential) features that mice has. To make matters worse, SPSS hardly has any diagnostic tools to determine whether the imputation process went right, while mice has several features for that. In short, as an expert on multiple imputation I wouldn’t recommend the MI procedure in SPSS unless the dataset has relatively few variables and the missing-data problem is relatively simple. Since you cannot tell this from the information the authors gave in their paper, my comment as a reviewer would be that the authors should switch to mice in R.
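To make this concrete, here is a minimal sketch of that per-variable control in R with mice; the data frame dat and the variables x1, x2 and x3 are hypothetical, and the settings are purely illustrative.

library(mice)

# One imputation method per variable: PMM for x1, Bayesian linear regression
# for x2, mice's defaults for everything else.
meth <- make.method(dat)
meth["x1"] <- "pmm"
meth["x2"] <- "norm"

# The predictor matrix controls which variables enter which imputation model;
# here x3 is dropped from the model that imputes x1.
pred <- make.predictorMatrix(dat)
pred["x1", "x3"] <- 0

imp <- mice(dat, m = 5, method = meth, predictorMatrix = pred, seed = 123)

# Two of the diagnostics mentioned above:
plot(imp)         # trace plots of chain means/SDs over iterations (convergence)
densityplot(imp)  # observed versus imputed distributions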

 

Is there some way that SPSS will kick out a single set of regression coefficients that a user might wrongly interpret as coming from a single dataset? Can SPSS produce a single dataset that might be described as what some call “stochastic regression imputation,” where missing values are predicted from the non-missing values in the data and a random error term is then added? In other words, can the MI procedure in SPSS be set to produce only a single dataset?

 

SPSS does offer the possibility of combining the results of several imputed datasets into one result using Rubin’s combination rules. When SPSS recognizes a dataset as a multiple-imputation dataset, it warns the user that the split-file option must be switched on first, before carrying out any analysis (with Imputation_ as the split variable). Next, SPSS automatically does the combining, which is a really nice feature because you don’t need to do the combining yourself. What I usually do is impute the data in R first, save the result to a dataset in SPSS format, and then do the analyses in SPSS. However, SPSS does not pool the results of all statistical analyses. For example, it doesn’t pool the F-tests of ANOVA, R^2 in regression, or the results of PCA. I usually use my own SPSS macros for that, which are freely available on my personal page. In some of my papers (Van Ginkel & Kroonenberg, 2014; Van Ginkel, 2019; Van Wingerde & Van Ginkel, 2021) I also refer to these macros. Most of these pooling procedures can also be done in R with the relevant packages, by the way.
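And the analysis-plus-pooling step on the R side, continuing the hypothetical mice sketch above (imp, y, x1 and x2 are made-up names, not anything from the paper under review):

# Fit the same regression in every imputed dataset, then combine the results
# with Rubin's rules.
fit <- with(imp, lm(y ~ x1 + x2))
summary(pool(fit))   # pooled coefficients, standard errors, df and p-values

# R^2 is one of the quantities SPSS does not pool; mice provides:
pool.r.squared(fit)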

Long story short: I wouldn’t worry about the authors reporting a set of regression coefficients as coming from a single dataset. What I would worry about more is the whole imputation process that preceded the analysis.

 

Best regards,

 

Joost van Ginkel

 

Thanks in advance,

 

Jeff

 

 

 


Re: Multiple imputation question

Jeff A

 

Ironically,

 

It was one of your papers from which I got the term “stochastic regression imputation” (van Ginkel et al., 2020, in J. Personality Assessment). I hadn’t heard the term before I read that paper.

 

I caught most of what you said and understand that R is much more sophisticated than most other statistical packages (I think it will take me a bit of time to fully digest your response), but I’m still curious whether SPSS’s MI procedure can be set to produce the type of singly imputed dataset you describe in that 2020 paper, even if this is not ideal in practice. I can imagine that it wouldn’t be too difficult to create a macro to do such a thing, but I’m wondering whether it’s built in.

 

Regardless of the paper, I can easily see that being helpful in situations where you want to explore a number of different model specifications before settling on the one you’ll use in a final model.

 

Thanks in advance, and thanks for your earlier response.

 

Jeff

 

 

 

 


Re: Multiple imputation question

Kirill Orlov
In reply to this post by Joost van Ginkel
//Can SPSS produce a single dataset that might be described as what some call “stochastic regression imputation,” where missing values are predicted from the non-missing values in the data and a random error term is then added? In other words, can the MI procedure in SPSS be set to produce only a single dataset?//

By the definition of multiple imputation, no single dataset (with or without the “stochastic” errors added) can correspond to the pooled statistics and their standard errors (regression coefficients, etc.) produced by MI. If it were possible, no one would bother generating multiple imputed datasets.
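For concreteness, these are the standard Rubin’s rules being referred to (nothing SPSS-specific). With m imputed datasets, per-dataset estimates \hat{Q}_i and within-imputation variances U_i, the pooled estimate and its total variance are

\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr)B, \quad \text{where } \bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i \text{ and } B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i - \bar{Q}\bigr)^2.

The between-imputation variance B only exists when m > 1, which is exactly why no single completed dataset can reproduce the pooled standard errors.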


Re: Multiple imputation question

Jeff A
In reply to this post by Jeff A
I get the general idea, but thinking about this further, I assume that one could simply select a single one of the multiple datasets produced by the MI procedure. Depending on the nature of the random error that’s added, I assume that any one of these imputed datasets would be better than alternatives like listwise deletion, in that it retains statistical power and does not produce biased variances and covariances, but it would result in p-values and confidence intervals that are biased downward. Clearly, the correct use of MI is the ideal.
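As a side note for anyone doing this in R rather than SPSS, pulling one completed dataset out of an MI run is a one-liner in mice (imp is a hypothetical mids object from an earlier mice() call):

library(mice)

single <- complete(imp, action = 1)  # the first completed (imputed) dataset

# Analysing `single` on its own is single imputation: the point estimates are
# reasonable under MAR, but, as noted above, the p-values and confidence
# intervals will be too optimistic because imputation uncertainty is ignored.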

Jeff




Re: Multiple imputation question

Joost van Ginkel
In reply to this post by Jeff A

Dear Jeff,

 

See below.

 

From: [hidden email] <[hidden email]>
Sent: Wednesday, April 7, 2021 10:20 AM
To: Ginkel, J.R. van <[hidden email]>; [hidden email]
Subject: RE: Multiple imputation question

 

 

Ironically,

 

It was one of your papers from which I got the term “stochastic regression imputation” (van Ginkel et al., 2020, in J. Personality Assessment). I hadn’t heard the term before I read that paper.

 

I think you’re mixing up two things: the stochastic regression imputation I talked about in my 2020 paper is a method for single imputation, which you could say was the predecessor of fully conditional specification using regression. The regression method I’m talking about here is the one described on p. 4 of that paper, in the Multiple Imputation Explained section.

 

I caught most of what you said and understand that R is much more sophisticated than most other statistical packages (I think it will take me a bit of time to fully digest your response), but I’m still curious whether SPSS’s MI procedure can be set to produce the type of singly imputed dataset you describe in that 2020 paper, even if this is not ideal in practice. I can imagine that it wouldn’t be too difficult to create a macro to do such a thing, but I’m wondering whether it’s built in.

 

It is possible to produce a single imputed dataset with the MI procedure in SPSS by setting the number of imputations to 1, but that is not equivalent to stochastic regression imputation (the latter doesn’t use fully conditional specification as an estimation method).
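A loose analogy in R terms rather than SPSS (so only a sketch, with a hypothetical numeric data frame dat): mice’s method "norm.nob" adds a random residual to the regression prediction but treats the regression parameters as known, which is essentially classical stochastic regression imputation, while "norm" also draws the parameters, as a proper imputation model does. The mice run still cycles over the variables with FCS, so this illustrates only part of the difference described here.

library(mice)

# Stochastic-regression-style imputation: prediction plus a random residual only.
sri <- mice(dat, m = 1, method = "norm.nob", seed = 1)

# One draw from a proper imputation model: parameter uncertainty is included too.
mi1 <- mice(dat, m = 1, method = "norm", seed = 1)

# Either way, m = 1 yields a single completed dataset, so there is no
# between-imputation variance and Rubin's rules cannot be applied.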

 

 


Re: Multiple imputation question

Kirill Orlov
Jeff, there was a question on Cross Validated, https://stats.stackexchange.com/q/460238/3277, which you might find interesting.


Re: Multiple imputation question

Jeff A
In reply to this post by Joost van Ginkel

 

Hi Joost,

 

Yes, I understood that the stochastic regression imputation method is intended to produce a single imputed dataset, and I’ve read what you said in your 2020 paper about MI. The only thing I’m slightly uncertain about is the practical difference between a single imputed dataset produced by stochastic regression imputation (which, according to your paper, appears to originate with Little & Schenker, 1995, and Van Buuren, 2012) and what one would get by running the SPSS MI procedure with the number of imputations set to 1. In re-reading what I wrote, I can see that I wasn’t clear. I realize that neither procedure is ideal and that neither incorporates the uncertainty in the imputation process that proper use of MI is intended to address. I’m just trying to get a somewhat better understanding. I’m assuming that if you compared these two less-than-ideal methods for producing a single dataset with imputed values substituted for the missing ones, the stochastic regression imputation procedure you mentioned would somehow be better than the SPSS MI procedure run only once?

 

Keep in mind that I’m one of those “applied researchers” you speak about in your article, and although I have a reasonable background in applied statistics, my copy of Little and Rubin sits on my bookshelf collecting dust, since it’s a bit over my head.

 

Jeff

 

 

 

 


Re: Multiple imputation question

Jeff A
In reply to this post by Kirill Orlov
Thanks Kirill,

Although some of the fine points of MI are over my head, I’m a bit beyond the clear misunderstandings of the poster in the question you pointed to. It’s interesting, however, how many applied researchers do not understand that MI produces multiple datasets, which require a separate analysis of each set and the combination of the results using Rubin’s rules. I had to explain this to a colleague a few years back, since she too was trying to figure out where she could see the “single pooled dataset.” My question is more of a hypothetical about how one could obtain the best singly imputed dataset, and how that best single one would differ from what you get by running the SPSS MI procedure with the number of imputations set to 1.

Jeff



Re: Multiple imputation question

Joost van Ginkel
In reply to this post by Jeff A

Hello Jeff,

 

I think that multiple imputation with the number of imputations set to M = 1 is better than stochastic regression imputation. However, I would not recommend single imputation with any imputation procedure, so the question of which of the two single-imputation procedures is better shouldn’t even be asked, if you ask me. It’s like asking which is better: performing three independent t-tests to compare the means at three different time points, or performing an ANOVA. The latter is less wrong than the former, but actually you should do neither.

 

Best,

 

Joost

 


Re: Multiple imputation question

Jeff A

 

Hi Joost,

 

Yes, I get the idea. I guess I just like hypotheticals.

 

…but aren’t there some procedures where you can’t combine the separate results from the analyses of the multiple datasets using Rubin’s rules or the equivalent? E.g., can you use MI with factor analysis?

 

Jeff

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Ginkel, J.R. van
Sent: Wednesday, April 7, 2021 8:17 PM
To: [hidden email]
Subject: Re: Multiple imputation question

 

Hello Jeff,

 

I think that Multiple imputation with the number of imputations set at M  = 1 is better than stochastic regression imputation. However, I would not recommend single imputation with any imputation procedure so the question which of the two single-imputation procedures shouldn’t even be asked if you ask me. It’s like asking what is better: performing three independent t-tests for comparing the means at three different time points or performing and independent ANOVA. The latter is less wrong than the former, but actually you should do neither.

 

Best,

 

Joost

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff A
Sent: Wednesday, April 7, 2021 11:49 AM
To: [hidden email]
Subject: Re: Multiple imputation question

 

 

Hi Joost,

 

Yes, I understood that the stochastic regression imputation method is intended to produce a single imputed dataset and I’ve read what you said in your 2020 paper about MI. The only thing I’m slightly uncertain about is the practical difference between a single imputed dataset using stochastic regression imputation (that appears to originate with Little & Schenker 1995 and Van Buuren, 2012 according to your paper) and what one would get if they used the SPSS MI procedure and set the number of imputations to 1. In re-reading what I wrote, I can see that I wasn’t clear. I realize that neither procedure is ideal and neither incorporate the uncertainty in the imputation process that is intended to be addressed by the proper use of MI. I’m just trying to get a bit of a better understanding. I’m assuming that if you compared these two less than ideal methods for producing a single dataset with imputed values substituted for the missing ones, that the stochastic regression imputation procedure you mentioned would somehow be better than the SPSS MI procedure that was done only once?

 

Keep in mind that I’m one of those “applied researchers” that you speak about in your article and although I have a reasonable background in applied stat, my copy of Little and Rubin sits on my bookshelf collecting dust since it’s a bit over my head.

 

Jeff

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Ginkel, J.R. van
Sent: Wednesday, April 7, 2021 7:07 PM
To: [hidden email]
Subject: Re: Multiple imputation question

 

Dear Jeff,

 

See below.

 

From: [hidden email] <[hidden email]>
Sent: Wednesday, April 7, 2021 10:20 AM
To: Ginkel, J.R. van <[hidden email]>; [hidden email]
Subject: RE: Multiple imputation question

 

 

Ironically,

 

It was one of your papers from which I got the term, “stochastic regression imputation,” (van Ginkel et al, 2020 in J. Personality Assessment). I hadn’t heard of that term before I just read that paper.

 

I think you’re mixing up two things: the stochastic regression imputation I talked about in my 2020 paper was a method for single imputation, which you could say was the predecessor of fully conditional specification using regression. The regression method that I’m talking about is the one described on p. 4 of that paper in the Multiple Imputation Explained section.

 

I caught most of what you said and understand that R is much more sophisticated than most other statistical packages (I think it would take me a bit of time to fully digest your response), but am still curious if SPSS can be set to produce the type of singly-imputed dataset as you described above in that 2020 paper via its MI procedure even if this is not ideal in practice. I can imagine that it wouldn’t be too difficult to create a macro to so such a thing, but I’m wondering whether it’s built-in?

 

It is possible to produce a single imputed dataset with the MI procedure in SPSS by setting the number of imputations to 1, but that is not equivalent to stochastic regression imputation (the latter doesn’t use fully conditional specification as an estimation method).

 

 

Regardless of the paper, I can easily see that as being helpful in certain situations where you want to explore a number of different model specifications before settling on one that you’ll use in a final model.

 

Thanks in advance and for your former response.

 

Jeff

 

 

 

 

From: Ginkel, J.R. van <[hidden email]>
Sent: Wednesday, April 7, 2021 5:05 PM
To: '[hidden email]' <[hidden email]>; [hidden email]
Subject: RE: Multiple imputation question

 

Dear Jeff,

 

See my answers below.

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff A
Sent: Wednesday, April 7, 2021 5:38 AM
To: [hidden email]
Subject: Multiple imputation question

 

 

I’m currently reviewing an article for a pretty decent journal in the psychological literature where the authors have said that they’ve used spss and also have used multiple imputation.

 

They are clearly either making a mistake (at worst) or just not describing things well (at best).

 

They say, “In order to analyze a complete data set multiple imputation (MI) was used to input the missing values.”

 

They state no other real details of the purported MI procedure they went through and are mixing up the definitions of MCAR, MAR, and NMAR (but only slightly – I’ve seen much worse).

 

I’m trying to understand what they may have done since otherwise, this is a very good paper, but I haven’t used spss’s implementation of multiple imputation (but have seen and assisted a colleague who was confused) so it’s difficult for me to figure out where they may have made an error.

 

SPSS performs multiple imputation, creates a new data file in which all imputed versions of the incomplete dataset are appended after another with the original dataset on top, all indicated by an indicator variable “Imputation_”.

 

I’ve only used older and less-user-friendly MI software (e.g., Norm) in the past (before MI became available in spss) and not seen in detail what spss can currently do with MI.

 

This may sound a bit offending to the makers of SPSS, but after doing multiple imputation in SPSS a few times I stopped using it and switched to the mice procedure in R. The basic procedure in SPSS is fully conditional specification, just like in R, but it lacks all flexibility that R has. In R you can specify a separate imputation model for each variable, which doesn’t necessarily need to include all variables entered, whereas in SPSS each variable is predicted by all other variables entered in the MI procedure. When you have entered many variables this will inevitably lead to overfitted imputation models, causing the imputed values to become near random (I have seen it happen in scatter plots). Additionally, R can use predictive mean matching (PMM) for some numerical variables and linear regression imputation for other numerical variables. In SPSS you can either use PMM for all numerical variables or regression for al numerical variables, but not one method for one set of variables and the other method for the other set of variables. These are just a few examples, but SPSS lacks many more (to my opinion, essential) features that mice has. To make matters worse, SPSS hardly has any diagnostic tools to determine whether the imputation process went right while mice has several features for that. In short, as an expert on multiple imputation I wouldn’t recommend the MI procedure in SPSS, unless the dataset has relatively few variables and the missing-data problem is relatively simple. Since you cannot tell this from the information that the authors gave in their paper, my comment as a reviewer would be that the authors should switch to mice in R.

 

Is there someway that SPSS will kick out a single set of regression coefficients that a user might interpret in the wrong way as coming from a single dataset? Can spss produce a single data set that might be described as what some call, “stochastic regression imputation,” which is where missing values are predicted by non-missing values within the data, but an random error term is then subsequently applied? In other words, can the MI procedure in SPSS be set to produce only a single dataset?

 

SPSS does have a possibility of combining the results of several imputed datasets into one result using Rubin’s combination rules. When SPSS recognizes a dataset as a multiple-imputation dataset, it gives the user a warning that the split file option must be switched on first, before carrying out any analysis (with Imputation_ as a split variable). Next, SPSS automatically does the combining, which is a really nice feature because you don’t need to do the combining yourself. What I usually do is impute the data in R first, save the result to a dataset in SPSS format, and next do the analyses in SPSS. However, SPSS does not pool the results of all statistical analyses. For example, it doesn’t pool the F-tests of ANOVA, R^2 in regression, or the results of PCA. Usually I use my own SPSS macros for that, which are freely available on my personal page. In some of my papers (Van Ginkel & Kroonenberg, 2014; Van Ginkel, 2019; Van Wingerde & Van Ginkel, 2021) I also refer to these macros. Most of these pooling procedures can also be done in R with the relevant packages by the way.

Long story short: I wouldn’t worry about the authors reporting a set of regression coefficients as coming from a single dataset. What I would worry about more is the whole imputation process that preceded the analysis.

 

Best regards,

 

Joost van Ginkel

 


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Re: Multiple imputation question

Joost van Ginkel

Hello Jeff,

 

Yes, I think you can use MI with factor analysis. I haven’t done a lot of factor analysis in my career, but as I remember it, the factor loadings are tested for significance using either a t or a z test. You can simply apply Rubin’s rules for single-parameter estimates there. However, I’m not sure how to combine the overall chi-square test. If the chi-square test is based on a likelihood-ratio test (which I am not sure of), it should be possible to combine the results of this test, because there are combination rules for LR tests as well (although the application is far from trivial).
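For a single loading this comes down to something like the sketch below, where q holds the M estimates of one loading and u the corresponding squared standard errors (nothing here is specific to any particular factor analysis routine):

# Rubin's rules for one scalar parameter
pool_scalar <- function(q, u) {
  m    <- length(q)
  qbar <- mean(q)                  # pooled estimate
  ubar <- mean(u)                  # average within-imputation variance
  b    <- var(q)                   # between-imputation variance
  t    <- ubar + (1 + 1/m) * b     # total variance
  df   <- (m - 1) * (1 + ubar / ((1 + 1/m) * b))^2
  c(est = qbar, se = sqrt(t), t.value = qbar / sqrt(t), df = df)
}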

 

Best,

 

Joost

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff A
Sent: Wednesday, April 7, 2021 12:34 PM
To: [hidden email]
Subject: Re: Multiple imputation question

 

 

Hi Joost,

 

Yes, I get the idea. I guess I just like hypotheticals.

 

…but are there not some procedures where you can’t combine the separate results from multiple analyses on multiple datasets using Rubin’s rules or the equivalent? E.g., can you use MI with factor analysis?

 

Jeff

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Ginkel, J.R. van
Sent: Wednesday, April 7, 2021 8:17 PM
To: [hidden email]
Subject: Re: Multiple imputation question

 

Hello Jeff,

 

I think that multiple imputation with the number of imputations set at M = 1 is better than stochastic regression imputation. However, I would not recommend single imputation with any imputation procedure, so the question of which of the two single-imputation procedures is better shouldn’t even be asked, if you ask me. It’s like asking which is better: performing three independent t-tests for comparing the means at three different time points, or performing an independent ANOVA. The latter is less wrong than the former, but actually you should do neither.

 

Best,

 

Joost

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Jeff A
Sent: Wednesday, April 7, 2021 11:49 AM
To: [hidden email]
Subject: Re: Multiple imputation question

 

 

Hi Joost,

 

Yes, I understood that the stochastic regression imputation method is intended to produce a single imputed dataset, and I’ve read what you said in your 2020 paper about MI. The only thing I’m slightly uncertain about is the practical difference between a single imputed dataset using stochastic regression imputation (which appears to originate with Little & Schenker, 1995, and Van Buuren, 2012, according to your paper) and what one would get if they used the SPSS MI procedure and set the number of imputations to 1. In re-reading what I wrote, I can see that I wasn’t clear. I realize that neither procedure is ideal and neither incorporates the uncertainty in the imputation process that is intended to be addressed by the proper use of MI. I’m just trying to get a bit of a better understanding. I’m assuming that if you compared these two less-than-ideal methods for producing a single dataset with imputed values substituted for the missing ones, the stochastic regression imputation procedure you mentioned would somehow be better than the SPSS MI procedure done only once?

 

Keep in mind that I’m one of those “applied researchers” that you speak about in your article and although I have a reasonable background in applied stat, my copy of Little and Rubin sits on my bookshelf collecting dust since it’s a bit over my head.

 

Jeff

 

 

 

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Ginkel, J.R. van
Sent: Wednesday, April 7, 2021 7:07 PM
To: [hidden email]
Subject: Re: Multiple imputation question

 

Dear Jeff,

 

See below.

 

From: [hidden email] <[hidden email]>
Sent: Wednesday, April 7, 2021 10:20 AM
To: Ginkel, J.R. van <[hidden email]>; [hidden email]
Subject: RE: Multiple imputation question

 

 

Ironically,

 

It was one of your papers from which I got the term “stochastic regression imputation” (van Ginkel et al., 2020, in J. Personality Assessment). I hadn’t heard of that term before reading that paper.

 

I think you’re mixing up two things: the stochastic regression imputation I talked about in my 2020 paper was a method for single imputation, which you could say was the predecessor of fully conditional specification using regression. The regression method that I’m talking about is the one described on p. 4 of that paper in the Multiple Imputation Explained section.

 

I caught most of what you said and understand that R is much more sophisticated than most other statistical packages (I think it would take me a bit of time to fully digest your response), but I’m still curious whether SPSS can be set to produce, via its MI procedure, the type of singly imputed dataset you described in that 2020 paper, even if this is not ideal in practice. I can imagine that it wouldn’t be too difficult to create a macro to do such a thing, but I’m wondering whether it’s built in?

 

It is possible to produce a single imputed dataset with the MI procedure in SPSS by setting the number of imputations to 1, but that is not equivalent to stochastic regression imputation (the latter doesn’t use fully conditional specification as an estimation method).

 

 

Regardless of the paper, I can easily see that as being helpful in certain situations where you want to explore a number of different model specifications before settling on one that you’ll use in a final model.

 

Thanks in advance, and for your earlier response.

 

Jeff

 

 

 

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Multiple imputation question

Kirill Orlov
In reply to this post by Jeff A
E.g., can you use MI with factor analysis?

If CORRELATIONS supports MI pooling (I don't remember whether it does), you can get the pooled estimated matrix and input it to FACTOR via syntax to obtain loadings (though not factor scores). Also worth remarking: the EM method of the MVA (Missing Value Analysis) procedure produces correct, unbiased estimates of the correlations too. So, if you don't insist on having the complete imputed *dataset* and can make do with just a correlation matrix, those are two ways to go.
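If you would rather stay in R, the same idea looks roughly like this (just a sketch on the nhanes example data from mice; one factor only because there are so few variables):

library(mice)

imp  <- mice(nhanes, m = 20, seed = 1, printFlag = FALSE)
Rbar <- Reduce(`+`, lapply(1:imp$m, function(i) cor(complete(imp, i)))) / imp$m

fa <- factanal(covmat = Rbar, factors = 1, n.obs = nrow(nhanes))
fa$loadings      # loadings from the averaged correlation matrix; no valid tests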

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Multiple imputation question

Jeff A
In reply to this post by Joost van Ginkel

 

I think you just answered your own question about why some of us might want, on occasion, a single dataset that might not be ideal but might be sufficient, especially when you’re working in the social sciences, where there is generally so much error in the original measurements (which are often about latent constructs such as attitudes). If you’re unsure about how to do some of this, what do you think us mere mortals are faced with when the data analysis is only a small portion of what we’re doing in any given study? 😊 More seriously, thanks for the assistance and explanation. I think your 2020 piece is now among my top two favorites for easy-to-follow MI discussions (along with the Schafer and Graham 2002 article you cited in that paper). I had both of those folks as instructors many years ago, but I think I’ve forgotten much of what they taught me.

 

Jeff

 

 

 

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Multiple imputation question

Joost van Ginkel
In reply to this post by Kirill Orlov

But if you enter only a pooled correlation matrix as the input for the factor analysis, then you wipe away all the differences in correlations between the imputed datasets, and those differences are then not incorporated in the statistical tests. It will give you unbiased factor loadings, but not unbiased statistical tests.
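To see what gets thrown away, you can look at how much a single correlation varies over the imputed datasets (again just a sketch on the nhanes example data):

library(mice)

imp <- mice(nhanes, m = 20, seed = 1, printFlag = FALSE)
r_bmi_chl <- sapply(1:imp$m, function(i) cor(complete(imp, i))["bmi", "chl"])

mean(r_bmi_chl)   # roughly what the averaged matrix carries forward
sd(r_bmi_chl)     # between-imputation spread that the averaged matrix ignores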

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Multiple imputation question

Joost van Ginkel
In reply to this post by Jeff A

Haha, what you’re describing is kind of cheating. ;) Anyway, you’re very welcome. I’m happy to hear that my paper is in your top list of easy-to-follow MI discussions! Sounds like we succeeded in accomplishing our goal!

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Multiple imputation question

Kirill Orlov
In reply to this post by Joost van Ginkel
Sure. In practice, exploratory FA is only seldom accompanied by statistical tests.

On 07.04.2021 14:30, Ginkel, J.R. van wrote:

But if you enter only a pooled correlation matrix as the input for the factor analysis, then you wipe away all the differences in correlations between the imputed datasets, which is then not incorporated in the statistical tests. It will give you unbiased factor loadings but not unbiased statistical tests.



===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Multiple imputation question

Joost van Ginkel

In the case of exploratory factor analysis without any statistical tests, you are right. As a matter of fact, Van Ginkel and Kroonenberg (2014) studied this option (averaging the correlation matrix) in the context of pooling the results of PCA, together with two other methods (averaging component loadings and Generalized Procrustes Analysis).
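A bare-bones version of the loading-averaging option could look like the sketch below (this is not our actual code, it skips the Generalized Procrustes step, and it only aligns component signs before averaging):

library(mice)

imp <- mice(nhanes, m = 10, seed = 1, printFlag = FALSE)

loadings_one <- function(i) {
  p <- prcomp(complete(imp, i), scale. = TRUE)
  p$rotation[, 1:2]          # first two components (eigenvectors; loadings up to scaling)
}
loads <- lapply(1:imp$m, loadings_one)

# Align component signs with the first solution before averaging
ref <- loads[[1]]
aligned <- lapply(loads, function(L) {
  s <- sign(colSums(L * ref)); s[s == 0] <- 1
  sweep(L, 2, s, `*`)
})
pooled_loadings <- Reduce(`+`, aligned) / length(aligned)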

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Multiple imputation question

Kirill Orlov
In reply to this post by Jeff A
Jeff, although this thread is not discussing hot-deck imputation, I want to remind you of that option too.
The hot-deck method is not unpopular in the social sciences, especially in surveys.
SPSS does not have a built-in command for hot-deck imputation, but macros for it exist.
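The principle is simple enough to sketch in a few lines of base R (the variable names are made up for illustration; this is not any particular macro):

# A very simple random hot-deck: each missing value gets a donor value drawn
# from observed cases in the same imputation class (here: an age group).
set.seed(1)

hotdeck_impute <- function(y, class) {
  out <- y
  for (g in unique(class)) {
    in_class <- class == g
    donors   <- y[in_class & !is.na(y)]
    gaps     <- which(in_class & is.na(y))
    if (length(donors) > 0 && length(gaps) > 0)
      out[gaps] <- donors[sample.int(length(donors), length(gaps), replace = TRUE)]
  }
  out
}

dat <- data.frame(age_group = rep(c("young", "old"), each = 50),
                  income    = round(c(rnorm(50, 30, 5), rnorm(50, 50, 5)), 1))
dat$income[sample(nrow(dat), 20)] <- NA
dat$income_imp <- hotdeck_impute(dat$income, dat$age_group)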

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Multiple imputation question

Kirill Orlov
In reply to this post by Joost van Ginkel
Oh, bravo!
Are your paper and the other papers mentioned here available for free somewhere? Do you have a site where one can download them?


On 07.04.2021 15:03, Ginkel, J.R. van wrote:

In case of Exploratory Factor Analysis without any statistical tests you are right. As a matter of fact, Van Ginkel and Kroonenberg (2014) studied this option (averaging a correlation matrix) in the context of pooling the results of PCA, together with two other methods (averaging component loadings and Generalized Procrustes Analysis).

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD