Dear All: I just ran a dependent t-test on a multiple imputation data set and obtained some strange results (so I think). The original data results for the dependent t-test have a p value of .264 (n=369). The multiple imputation results for the p value are:

1. .009 (n=496)
2. .009 (n=490)
3. .005 (n=496)
4. .013 (n=494)
5. .014 (n=494)
Pooled: .010 (n=11066)

Is this reasonable? Can it be attributed to the increase in sample size? But why would the sample size (in each of the five imputed data sets) increase so much from the original data set? What am I missing? Thoughts?

Thanks,
Martin F. Sherman, Ph.D.
Professor of Psychology
Director of Master's Education: Thesis Track
Department of Psychology, 222 B Beatty Hall
4501 North Charles Street, Baltimore, MD 21210
410-617-2417 tel / 410-617-5341 fax
You show five results. Can we assume only five imputations? If so, why is the pooled N = 11066? Why aren't the imputation Ns the same? How similar were the means, SDs, and the correlation for each imputation to each other and to the original dataset? This set of results seems odd to me and would encourage me to dig into the similarity of the values for the dependent t-test components.

Gene Maguin
Gene: Yes, I noticed that too. Let me contact my graduate student (who ran the MI) and see what is going on. Those Ns should be the same. The means and SDs for the MI data sets are very similar to each other but quite different from the original data set. The SEs get smaller for the MI results, given the increase in sample size.
In reply to this post by msherman
I have no experience doing imputation, but I have spent much time considering the "paired t test" model with extra data at Pre and/or Post.

If it is Pre-Post, there is often an important reason for absence at Post: failure or dropout. Look into this by comparing the paired Pre cases to the unpaired ones, since Missing-not-at-random probably undermines imputation. Do the same for Post, if there are enough cases. If it is some other pairing, like Left-Right, the relative number of missing cases may deserve comment. If unequal, did you expect that?

Consider what you have for testing your hypothesis without imputation:
- a paired t-test where data are complete: means, SDs, and correlation;
- an unpaired t-test between the other scores: means and SDs.

Either these are consistent (means and tests), or they are not. If not, ask why not.

Now, if you want to look further into the impact of imputation, you have three groups of cases that you might compare on the variables used in the imputation: Missing at Pre, Missing at Post, None Missing. If these groups differ on the imputing variables, then imputation will tend to create differences in the paired t, to whatever extent the imputation is stronger than mean-replacement.

--
Rich Ulrich
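Rich's first check — comparing the cases paired at both occasions to those missing at Post — could be sketched in SPSS syntax roughly as follows. This is a minimal sketch, not from the thread; PRE and POST are hypothetical names for the two paired variables.

```spss
* Flag cases that are missing at Post, then compare their Pre scores
* to the completers' Pre scores with an independent-groups t-test.
* PRE and POST are placeholder variable names.
COMPUTE post_missing = MISSING(post).
EXECUTE.
T-TEST GROUPS=post_missing(0 1)
  /VARIABLES=pre.
```

A clear difference here would be a warning sign that the missingness is not random, which is exactly the situation Rich notes can undermine imputation.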
In reply to this post by msherman
I suspect that what happened is that the multiply imputed dataset was analyzed with the splits turned off, so it appeared that you had five times as much data as you actually have. The MI procedure produces one big dataset split by the imputation number (with a variable named Imputation_ that defines the split). Split file by that variable must be turned on — which it is automatically after the imputation — in order to get any valid results.
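Restoring the split that Jon describes could be sketched in SPSS syntax as below. This is a minimal sketch, assuming the stacked MI dataset is active; PRE and POST are hypothetical names for the paired variables.

```spss
* Turn split-file processing by imputation back on,
* then rerun the paired t-test so each imputation is
* analyzed separately and the results can be pooled.
* PRE and POST are placeholder variable names.
SORT CASES BY Imputation_.
SPLIT FILE LAYERED BY Imputation_.
T-TEST PAIRS=pre WITH post (PAIRED).
```

With the split off, the t-test runs once on all five stacked imputations as if they were one sample, which is why the apparent n balloons and the p value shrinks.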
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

=====================
To manage your subscription to SPSSX-L, send a message to LISTSERV@... (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.