SPSSX Discussion

unexpected t-test df value following multiple imputation with PASW 18

Psych

Hi all,

I have run three paired-samples t-tests across five imputations (plus the original data set) following MCMC multiple imputation in PASW 18. The degrees of freedom for the original data set and for every imputed data set look normal, i.e., df = n - 1, as one would expect.

However, the PASW output returns massive df values for the pooled t-tests, e.g., "t(237501) = 1.68, p = .05," whilst the df value in the original and imputed data sets is only 4. I understand that the larger df value comes from combining 6 data sets, but I have not seen anything like this before.

I presume that PASW calculates the t-statistic using the pooled df value, so when writing my results section I gather that I should report the pooled df value rather than the original data set's df value. However, I'm concerned that (1) it doesn't accurately inform the reader about the parameters of the analysis, (2) I may have done something wrong in my analysis, and (3) readers won't understand why the df value is so enormous.

I have scoured a great deal of literature and sought advice from my faculty, but as yet I haven't been able to find any guidance around paired t-tests within MI data sets in SPSS/PASW. Can anyone shed some light on the size of pooled df values and how best to report them?

Joost van Ginkel

Re: unexpected t-test df value following multiple imputation with PASW 18

It is true that the number of degrees of freedom can become extremely large for the pooled result in multiple imputation. The reason is that PASW uses an approximation based on the assumption that the complete-data statistic (the one you would compute if no data were missing) is a z-test rather than a t-test. A z-test can be thought of as a t-test with an infinite number of degrees of freedom, so even a very large pooled df is still smaller than the (infinite) df assumed for the complete-data case. That reduction reflects the extra uncertainty caused by the missing data.
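
To make the size of these numbers concrete, here is a minimal Python sketch of the classic large-sample pooled df, (m - 1)(1 + 1/r)^2, which is the kind of approximation described above. It assumes this is the formula behind the PASW output; the function name `rubin_df` and all input values are made up for illustration and are not taken from the poster's data.

```python
import math
from statistics import mean, variance


def rubin_df(estimates, variances):
    """Classic large-sample pooled df: (m - 1) * (1 + 1/r)**2."""
    m = len(estimates)
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    if b == 0:
        return math.inf            # identical estimates: no missing-data penalty at all
    r = (1 + 1 / m) * b / u_bar    # relative increase in variance due to missing data
    return (m - 1) * (1 + 1 / r) ** 2


# Hypothetical mean differences and squared standard errors from m = 5 imputations.
similar = rubin_df([2.00, 2.01, 1.99, 2.00, 2.02], [1.0] * 5)
spread = rubin_df([1.0, 2.0, 3.0, 2.0, 1.5], [0.1] * 5)
print(f"imputations agree closely: df = {similar:.0f}")
print(f"imputations disagree:      df = {spread:.1f}")
```

Nothing in the formula refers to the sample size, which is why the result can dwarf n - 1: when the imputed data sets give nearly identical estimates, r is close to zero and the df heads towards infinity.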

However, since the complete-data statistic here is not a z-test but a t-test, the number of degrees of freedom that you found is too large. Barnard & Rubin proposed an alternative, adjusted number of degrees of freedom that is always smaller than the number of degrees of freedom of the complete-data case. The reference for this paper is:

Barnard, J. & Rubin, D.B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
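
For readers who want to see what the adjustment does, the following Python sketch implements the Barnard & Rubin formula as it is usually stated: the large-sample df is combined with an "observed-data" df that depends on the complete-data df. The function name `barnard_rubin_df` and the input values are hypothetical, and this is not the code of PASW or of any existing macro.

```python
from statistics import mean, variance


def barnard_rubin_df(estimates, variances, df_complete):
    """Small-sample pooled df; df_complete is the df with no missing data (n - 1 here)."""
    m = len(estimates)
    u_bar = mean(variances)                  # average within-imputation variance
    b = variance(estimates)                  # between-imputation variance
    total = u_bar + (1 + 1 / m) * b          # total variance of the pooled estimate
    lam = (1 + 1 / m) * b / total            # share of the variance due to missing data
    # Observed-data df: the complete-data df shrunk by the missing information.
    df_obs = (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lam)
    if lam == 0:
        return df_obs                        # large-sample df is infinite in this case
    df_old = (m - 1) / lam ** 2              # classic large-sample df
    return df_old * df_obs / (df_old + df_obs)


# Hypothetical results from m = 5 imputations of a sample with n = 5 pairs (df = 4).
adjusted = barnard_rubin_df([2.00, 2.01, 1.99, 2.00, 2.02], [1.0] * 5, df_complete=4)
print(f"adjusted df = {adjusted:.2f}")
```

Because the observed-data df is always below the complete-data df, the adjusted value can never exceed n - 1, however large the classic pooled df becomes.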

Unfortunately, this adjusted number of degrees of freedom is not implemented in PASW (yet), but I have written a macro for pooling the results in multiple imputation that has an option for using it. You can download it from:

http://www.socialsciences.leiden.edu/educationandchildstudies/childandfamilystudies/organisation/staffcfs/van-ginkel.html

(file mi2.zip) Good luck!

Best regards,

Joost van Ginkel

Psych

Re: unexpected t-test df value following multiple imputation with PASW 18

In reply to this post by Joost van Ginkel

Many thanks Dr van Ginkel, I'll try running your macro tomorrow. I had read the Barnard & Rubin article before but your explanation made things much clearer. This is enormously helpful!

-Andrew

Bruce Weaver

Re: unexpected t-test df value following multiple imputation with PASW 18

In reply to this post by Joost van Ginkel

I had never tried multiple imputation in the context of a t-test (paired or unpaired), so this made me curious. Here's an example for anyone else who wants to investigate further.

GET FILE='C:\Program Files\SPSSInc\PASWStatistics18\Samples\English\dietstudy.sav'.
dataset name original.

* Create some missing values on a couple of variables.

if any(patid,2,7,11) wgt0 = 999.
if any(patid,5,6,15) wgt4 = 999.
missing values wgt0 wgt4 (999).

descriptives wgt0 to wgt4.

*Impute Missing Data Values.
DATASET DECLARE mi_data.
MULTIPLE IMPUTATION wgt0 to wgt4
/IMPUTE METHOD=AUTO NIMPUTATIONS=5 MAXPCTMISSING=NONE
/MISSINGSUMMARIES NONE
/IMPUTATIONSUMMARIES MODELS
/OUTFILE IMPUTATIONS=mi_data .

dataset activate mi_data.
T-TEST PAIRS=wgt0 WITH wgt4 (PAIRED)
/CRITERIA=CI(.9500)
/MISSING=ANALYSIS.

The interesting thing is that if you run this code more than once, you'll (probably) get a different number of df for the pooled test each time. I didn't expect that. I guess I'll have to take a look at that Barnard & Rubin article. Thanks for the reference, Joost.
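
One plausible reason for the changing df is that the imputed values are random draws, so the between-imputation variance, which is estimated from only five imputations, changes from run to run, and the pooled df is very sensitive to it. The toy Python simulation below illustrates that idea only. It does not reproduce the PASW imputation algorithm: the "imputed" estimates are simply a fixed value plus random noise, and `pooled_df_for_seed` is a made-up name.

```python
import random
from statistics import mean, variance


def pooled_df_for_seed(seed, m=5):
    """Classic pooled df for m noisy estimates generated from a given seed."""
    rng = random.Random(seed)
    # Stand-in for imputation: the same estimate plus a little random noise each time.
    estimates = [rng.gauss(2.0, 0.05) for _ in range(m)]
    variances = [1.0] * m                        # fixed within-imputation variance
    b = variance(estimates)                      # between-imputation variance
    r = (1 + 1 / m) * b / mean(variances)        # relative increase in variance
    return (m - 1) * (1 + 1 / r) ** 2


for seed in (1, 2, 3):
    print(seed, round(pooled_df_for_seed(seed)))
```

Rerunning with the same seed gives the same df, which suggests that fixing the random number seed before imputing should make a run reproducible.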

Cheers,
Bruce

P.S. I thought I'd be clever and run it as a repeated measures ANOVA (GLM Repeated Measures). But wouldn't you know it, you get no pooled results that way.
