Multiple Imputation results versus original data results


msherman


Dear All: I just ran a dependent t-test on a multiple imputation data set and obtained some strange results (so I think).

The original data results for the dependent t-test have a p value of .264 (n=369). The multiple imputation results for the p value are:

1. .009 (n=496)

2. .009 (n=490)

3. .005 (n=496)

4. .013 (n=494)

5. .014 (n=494)

Pooled  .010 (n=11066)

 

Is this reasonable? Can it be attributed to the increase in the sample size? But why would the sample size in each of the five imputed data sets increase so much from the original data set? What am I missing? Thoughts? Thanks,

 

Martin F. Sherman, Ph.D.

Professor of Psychology

Director of Master’s Education: Thesis Track

 

Department of Psychology

222 B Beatty Hall

4501 North Charles Street

Baltimore, MD 21210

 

[hidden email]

410-617-2417 tel

410-617-5341 fax

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.

Re: Multiple Imputation results versus original data results

Maguin, Eugene

You show five results. Can we assume only five imputations? If so, why is the pooled N=11066? Why aren’t the imputation Ns the same? How similar were the means, SDs and the correlation for each imputation to each other and to the original dataset? This set of results seems odd to me and would encourage me to dig into the similarity of the values for the dependent t-test components.

Gene Maguin

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Martin Sherman
Sent: Tuesday, July 28, 2015 8:57 AM
To: [hidden email]
Subject: Multiple Imputation results versus original data results

 



Re: Multiple Imputation results versus original data results

msherman

Gene: Yes, I noticed that too. Let me contact my graduate student (who ran the MI) and see what is going on. Those n’s should be the same. The means and SDs for the MI data sets are very similar to each other but quite different from those of the original data set. The SEs get smaller for the MI results given the increase in sample size.
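On the pooled SE: the usual way to pool results across imputations is Rubin's rules, where the pooled estimate is the average of the per-imputation estimates and the pooled variance is the average within-imputation variance plus an inflated between-imputation variance. A minimal sketch follows, assuming made-up estimates and SEs (not the numbers from this thread); the function name `pool_rubin` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates and SEs with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total)

# Made-up mean differences and SEs from three imputations (illustration only).
estimates = [0.58, 0.33, 0.83]
std_errors = [0.47, 0.51, 0.46]

pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

Note that nothing here pools the Ns. Under these rules the pooled SE cannot fall below the square root of the average within-imputation variance, so a pooled SE that shrinks well below the per-imputation SEs would be a sign that something other than this kind of pooling was done.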

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Maguin, Eugene
Sent: Tuesday, July 28, 2015 9:10 AM
To: [hidden email]
Subject: Re: Multiple Imputation results versus original data results

 


Re: Multiple Imputation results versus original data results

Rich Ulrich
In reply to this post by msherman
I have no experience doing imputation, but I have spent much time considering
the "paired t test" model with extra data at Pre and/or Post.   If it is Pre-Post,
there is often an important reason for absence at Post, failure or dropout. 
- Look into this by comparing the paired Pre cases to the unpaired ones,
since Missing-not-at-random probably undermines imputation.   Do similarly
for Post, if there are enough cases. 

If it is some other pairing, like Left-Right, the relative number of Missing may
deserve comment.  If unequal, did you expect that?

Consider what you have for testing your hypothesis Without imputation: 
  - paired t-test where data are complete: Means, SDs and correlation.
  - unpaired t-test between the other scores:  Means, SDs.

Either these are consistent (means and tests), or they are not.  If not,  ask Why not?

Now, if you want to look further into the impact of imputation, you have three
groups of cases that you might compare on the variables used in the imputation:
Missing at Pre, Missing at Post, None Missing.   - If these groups differ on the
imputing variables, then imputation will tend to create differences in the
paired t, to whatever extent the imputation is stronger than mean replacement.
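
The three-group comparison above can be sketched roughly as follows. This is only an illustration on made-up (pre, post) records, with `None` standing in for a missing score; the function name `split_by_missingness` and the group labels are hypothetical.

```python
from statistics import mean

# Made-up (pre, post) records; None marks a missing score (illustration only).
records = [
    (10.0, 12.0), (11.0, 11.5), (9.0, None), (None, 13.0),
    (12.0, 12.5), (8.0, None), (10.5, 11.0),
]

def split_by_missingness(rows):
    """Sort cases into None Missing, Missing at Pre, and Missing at Post."""
    groups = {"none_missing": [], "missing_pre": [], "missing_post": []}
    for pre, post in rows:
        if pre is not None and post is not None:
            groups["none_missing"].append((pre, post))
        elif pre is None and post is not None:
            groups["missing_pre"].append(post)   # only the Post score exists
        elif pre is not None and post is None:
            groups["missing_post"].append(pre)   # only the Pre score exists
        # cases missing both scores carry no information and are skipped
    return groups

groups = split_by_missingness(records)

# Compare Pre scores of the paired cases with Pre scores of the unpaired ones.
paired_pre = [pre for pre, _ in groups["none_missing"]]
unpaired_pre = groups["missing_post"]
pre_mean_paired = mean(paired_pre)
pre_mean_unpaired = mean(unpaired_pre)
print(f"Pre mean, paired cases:   {pre_mean_paired:.3f} (n={len(paired_pre)})")
print(f"Pre mean, unpaired cases: {pre_mean_unpaired:.3f} (n={len(unpaired_pre)})")
```

A clear gap between the two Pre means in real data would be the kind of evidence of non-random missingness described above; the same comparison can be repeated for Post and for the variables used in the imputation.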

--
Rich Ulrich






Re: Multiple Imputation results versus original data results

Jon K Peck
In reply to this post by msherman
I suspect that what happened is that the multiply imputed dataset was analyzed with the splits turned off, so it appeared that you had five times as much data as you actually have.  The MI procedure produces one big dataset split by the imputation number (a variable named Imputation_ defines the split).  Splitting by that variable is turned on automatically after the imputation, and it must stay on in order to get any valid results.
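
To illustrate the effect outside SPSS, here is a minimal Python sketch on made-up paired differences (not the data from this thread). It assumes a stacked file holding three imputations of six cases each, and compares running the paired test once per imputation with running it on the whole stack as if the split were off. The names `stacked` and `paired_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up paired differences (post - pre) for 6 cases, stored once per
# imputation, the way a stacked MI file repeats every case (illustration only).
stacked = {
    1: [1.0, -0.5, 2.0, 0.5, -1.0, 1.5],
    2: [1.0, -0.5, 2.0, 0.0, -1.5, 1.0],
    3: [1.0, -0.5, 2.0, 1.0, -0.5, 2.0],
}

def paired_t(diffs):
    """Return (n, mean difference, standard error, t) for paired differences."""
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)
    return n, mean(diffs), se, mean(diffs) / se

# Split on: one test per imputation, each using the true number of cases.
per_imputation = {k: paired_t(d) for k, d in stacked.items()}

# Split off: every copy of every case is counted as if it were a new case.
all_rows = [d for diffs in stacked.values() for d in diffs]
ignoring_split = paired_t(all_rows)

for k, (n, md, se, t) in per_imputation.items():
    print(f"imputation {k}: n={n:2d} mean={md:.3f} se={se:.3f} t={t:.2f}")
n, md, se, t = ignoring_split
print(f"split off:    n={n:2d} mean={md:.3f} se={se:.3f} t={t:.2f}")
```

With the split ignored, n is the number of rows in the stack rather than the number of cases, so the standard error shrinks and the test looks far more significant than the data justify.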


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Martin Sherman <[hidden email]>
To:        [hidden email]
Date:        07/28/2015 07:58 AM
Subject:        Re: [SPSSX-L] Multiple Imputation results versus original data results
Sent by:        "SPSSX(r) Discussion" <[hidden email]>



