SPSSX Discussion

How to analyse Multiple Imputation data with SPSS?

13 messages

cmll

How to analyse Multiple Imputation data with SPSS?

Hello!

My question is not about how to create multiple imputation data with SPSS, but how to analyze it.

I have to work on a dataset treated with the Multiple Imputation method to handle missing data.

In fact, I have 5 different variants of the same dataset, with missing data replaced by probable values.

How can I analyze those datasets? Should I merge them?

Thank you for your time.

Best,
cmll.

Joost van Ginkel

Re: How to analyse Multiple Imputation data with SPSS?

SPSS merges the results of the 5 datasets for you (not the data itself). All you have to do is split the file with Imputation_ as the split variable. If you carry out an analysis after the split file, you'll see the separate results of the 5 imputed datasets, and at the bottom the pooled results in the output. Note, however, that SPSS doesn't do this for every analysis. For example, SPSS doesn't provide pooled results for PCA and ANOVA. I have written two papers on these subjects but unfortunately they are still in press. They are expected to appear this year.

Best,

Joost

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/How-to-analyse-Multiple-Imputation-data-with-SPSS-tp5723862.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Bruce Weaver

Re: How to analyse Multiple Imputation data with SPSS?

Administrator

In reply to this post by cmll

From the FM entry for MULTIPLE IMPUTATION:

The output dataset contains the original (nonmissing) data and data for one or more imputations. Each imputation includes all of the observed data and imputed data values. The original and imputed data are stacked in the output dataset. A special variable, Imputation_, identifies whether a case represents original data (Imputation_=0) or imputed data (Imputation_=1…m).

So, there is no need to merge datasets. Just analyze them as you normally would (with a procedure that supports pooling--see the list at the link below by clicking on Procedures that Support Pooling), and with the file SPLIT BY Imputation_.

http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fmi_analysis.htm

Here's an example from Dave Howell's site:

http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/MissingDataSPSS.html

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

cmll

Re: How to analyse Multiple Imputation data with SPSS?

Thank you for your answer!

The problem is that I did not run the MI myself, and I have 5 different data files. Your tutorials are much more focused on running MI within SPSS and then doing the analysis.

But in my case the data files are not produced by SPSS. My problem is that I have 5 different files. How do I use them? Should I merge them?

I'm sorry for my poor English, but I hope you get my problem.

I did some searching in the list archive, but it seems nobody had the same problem. Or they were just smarter than me in finding the solution.

Thank you for your kind answer.

best,
cmll.

cmll

Re: How to analyse Multiple Imputation data with SPSS?

Just to clarify: I have 5 different files that are completely identical except for the predicted values that replace the missing data.

best,
martino.

Bruce Weaver

Re: How to analyse Multiple Imputation data with SPSS?

Administrator

In reply to this post by cmll

Thanks for clarifying. Do you also have the original data file with missing data points? Assuming you do, you want to stack the 6 files something like this:

ADD FILES
file = "C:\MyFolder\OriginalFile.sav" /
file = "C:\MyFolder\Imp1.sav" / IN=file1 /
file = "C:\MyFolder\Imp2.sav" / IN=file2 /
file = "C:\MyFolder\Imp3.sav" / IN=file3 /
file = "C:\MyFolder\Imp4.sav" / IN=file4 /
file = "C:\MyFolder\Imp5.sav" / IN=file5 .
EXECUTE.
DATASET NAME ImputedData.

Then you need to compute the special Imputation_ variable.

COMPUTE Imputation_ = 0.
FORMATS Imputation_ (f5.0).
DO REPEAT f = file1 to file5 / i = 1 to 5.
- IF f Imputation_ = i.
END REPEAT.
FREQUENCIES Imputation_ .

Finally, split the file by Imputation_, and perform your analysis.
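
As a minimal sketch of that last step, assuming the stacked dataset built above is the active dataset: the variable names outcome, pred1 and pred2 are hypothetical placeholders, and linear regression is used only as an example of an analysis. Check the list of procedures that support pooling for your own version before relying on the pooled rows.

```spss
* Cases must be sorted by the split variable before splitting.
SORT CASES BY Imputation_.
SPLIT FILE LAYERED BY Imputation_.

* Hypothetical analysis with placeholder variable names.
REGRESSION
  /DEPENDENT outcome
  /METHOD = ENTER pred1 pred2.

* Turn splitting off again when finished.
SPLIT FILE OFF.
```

With the split in place, the output should list results for the original data (Imputation_ = 0), for each of the 5 imputations, and, for procedures that support it, a pooled row.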

HTH.

news

Re: How to analyse Multiple Imputation data with SPSS?

What do you mean by

- IF f Imputation_ = i.

TIA
F. Thomas

Bruce Weaver

Re: How to analyse Multiple Imputation data with SPSS?

Administrator

I could have written:

- IF (f EQ 1) Imputation_ = i.

But f is a 1/0 indicator variable, so writing "IF f" accomplishes the same thing with fewer keystrokes. One way to look at it is that 1/0 variables behave like Boolean variables in languages that have them: 1 = True, 0 = False.

In this example, the logic might be clearer if you substitute in the original 5 variable names and values of i. I.e., the DO REPEAT is really doing this:

IF file1 Imputation_ = 1.
IF file2 Imputation_ = 2.
IF file3 Imputation_ = 3.
IF file4 Imputation_ = 4.
IF file5 Imputation_ = 5.

HTH.

David Marso

Re: How to analyse Multiple Imputation data with SPSS?

Administrator

For fewer keystrokes?

COMPUTE imputation_ = SUM(file1,file2*2,file3*3,file4*4,file5*5).
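
The one-liner works because each IN= flag created by ADD FILES is 1 for cases read from that file and 0 otherwise. At most one term in the sum is nonzero, and cases from the original file (all flags 0) end up with Imputation_ = 0. A quick check of the result, sketched under the assumption that the flags file1 to file5 from the ADD FILES step are still in the dataset and that every file holds the same cases:

```spss
COMPUTE Imputation_ = SUM(file1, file2*2, file3*3, file4*4, file5*5).
FORMATS Imputation_ (f5.0).
* Each value from 0 to 5 should appear with the same number of cases.
FREQUENCIES Imputation_.
```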

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Bruce Weaver

Re: How to analyse Multiple Imputation data with SPSS?

Administrator

Right. I knew there was a more efficient way, but it wasn't coming to me immediately.

cmll

Re: How to analyse Multiple Imputation data with SPSS?

Thank you Bruce for your accurate answer; and thank you David for your code, it's quite smart :)

see you,
cmll.

cmll

Re: How to analyse Multiple Imputation data with SPSS?

In reply to this post by Bruce Weaver

Thank you Bruce for your answer, your syntax worked flawlessly.

Sorry to come back to this question,
but what if I don't have the original file (without multiple imputations)?

How can I compare the five different datasets and produce another one without the multiple imputations?

Obviously the 5 datasets are identical, but the variable names change in every dataset:
in the first dataset it's abc1 and in the second dataset it's abc2.

I tried to use compare (http://pic.dhe.ibm.com/infocenter/spssstat/v21r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fidh_compare_to.htm) but it didn't work out for me.

Does anyone know how to do this?

Bruce Weaver

Re: How to analyse Multiple Imputation data with SPSS?

Administrator

Ultimately, you'll want a stacked dataset with the same variable names (i.e., abc for all datasets, not abc1, abc2, etc). The following assumes you are starting with such a dataset, and that variable Imputation_ gives the imputation number. It uses the SD function to determine if there is any variation in values across the 5 MI datasets. Where the SD > 0, it assigns a value that is later defined as missing.

* Simulate a stacked dataset containing the 5 sets of MI data.
* Notice that the variable names are the same for each dataset,
* and variable Imputation_ gives the imputation number.

DATA LIST list / ID (f2.0) v1 to v3 Imputation_ (4f5.0).
BEGIN DATA
1 2 4 6 1
2 3 4 5 1
1 2 5 6 2
2 3 4 5 2
1 2 6 6 3
2 3 4 5 3
1 2 2 6 4
2 3 4 5 4
1 2 4 6 5
2 3 4 5 5
END DATA.

SORT CASES by ID Imputation_.
DO REPEAT v = v1 to v3.
- IF Imputation_ EQ 5 and SD(v,lag(v),lag(v,2),lag(v,3),lag(v,4)) GT 0 v = 999.
END REPEAT.
MISSING VALUES v1 to v3(999).
VALUE LABELS v1 to v3 999 'Missing in original dataset'.
RECODE Imputation_(5=0).
EXECUTE.
SELECT IF Imputation_ EQ 0.
LIST. /* Use EXECUTE rather than LIST for your large dataset.

Output:

ID v1 v2 v3 Imputation_

1 2 999 6 0
2 3 4 5 0

Number of cases read: 2 Number of cases listed: 2

This is the original dataset without the imputed data points. Now use ADD FILES to combine it with your stacked file of 5 MI datasets.
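
The steps above assume the variable names already match across the files. If they differ by file as described in the question (abc1 in the first file, abc2 in the second), one option is to rename while stacking, using the RENAME subcommand of ADD FILES. This is a sketch only: the file paths are the placeholders used earlier in the thread, the names abc3 to abc5 are assumed by extending the pattern in the question, and an ID variable is assumed to exist in every file.

```spss
* RENAME applies to the file named on the FILE subcommand just before it.
ADD FILES
  /FILE = "C:\MyFolder\Imp1.sav" /RENAME = (abc1 = abc) /IN = file1
  /FILE = "C:\MyFolder\Imp2.sav" /RENAME = (abc2 = abc) /IN = file2
  /FILE = "C:\MyFolder\Imp3.sav" /RENAME = (abc3 = abc) /IN = file3
  /FILE = "C:\MyFolder\Imp4.sav" /RENAME = (abc4 = abc) /IN = file4
  /FILE = "C:\MyFolder\Imp5.sav" /RENAME = (abc5 = abc) /IN = file5.

* Build the imputation number from the 1/0 file flags.
COMPUTE Imputation_ = SUM(file1, file2*2, file3*3, file4*4, file5*5).
FORMATS Imputation_ (f5.0).
SORT CASES BY ID Imputation_.
EXECUTE.
```

With several variables per file, the rename list simply grows, for example /RENAME = (abc1 def1 = abc def), where def1 is again a hypothetical name.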

HTH.
