Re: Multiple Imputation

Posted by Bruce Weaver on
URL: http://spssx-discussion.165.s1.nabble.com/Multiple-Imputation-tp4994372p5722178.html

I am resurrecting this old thread because this same question (i.e., how to handle missing data in the context of exploratory factor analysis) has just come up for a couple of my colleagues.  I had forgotten about this thread, but found it when I started digging into the topic again.  Here's something else interesting I found:

  http://www.stats.ox.ac.uk/~snijders/Graham2009.pdf

Here are some of the most relevant bits (with emphasis added).

"The EM covariance matrix is also an excellent basis for exploratory factor analysis with missing data."

"Although direct analysis of the EM covariance matrix can be useful, a more widely useful EM tool is to impute a single data set from EM parameters (with random error). This procedure has been described in detail in Graham et al. (2003). This single imputed data set is known to yield good parameter estimates, close to the population average. But more importantly, because it is a complete data set, it may be read in using virtually any software, including SPSS. Once read into the software, coefficient alpha and exploratory factor analyses may be carried out in the usual way. One caution is that this data set should not be used for hypothesis testing. Standard errors based on this data set, say from a multiple regression analysis, will be too small, sometimes to a substantial extent. Hypothesis testing should be carried out with MI or one of the FIML procedures. Note that the procedure in SPSS for writing out a single imputed data set based on the EM algorithm is not recommended unless random error residuals are added after the fact to each imputed value; the current implementation of SPSS, up to version 16 at least, writes data out without adding error (e.g., see von Hippel 2004). This is known to produce important biases in the data set (Graham et al. 1996)."

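To make the point about random error concrete, here is a minimal Python sketch. It is an illustration only, not code from Graham (2009) or from SPSS. It assumes a two-variable case in which y is missing and x is observed, and the means, variances and covariance are made-up stand-ins for EM estimates. The function name impute_from_em is hypothetical. Imputing the conditional mean alone puts every imputed value on the regression line, so the imputed values have too little variance; adding a normal residual with the conditional variance restores it.

```python
import random
import statistics

def impute_from_em(x_values, mean_x, mean_y, var_x, var_y, cov_xy, rng=None):
    """Impute y for each observed x from EM-style parameter estimates.

    With rng=None only the conditional mean is returned (no error added).
    With an rng, a normal residual with the conditional variance is added.
    """
    slope = cov_xy / var_x
    resid_sd = (var_y - cov_xy ** 2 / var_x) ** 0.5
    imputed = []
    for x in x_values:
        y_hat = mean_y + slope * (x - mean_x)
        if rng is not None:
            y_hat += rng.gauss(0.0, resid_sd)
        imputed.append(y_hat)
    return imputed

rng = random.Random(2009)
# Hypothetical parameters standing in for EM estimates (assumption).
mean_x, mean_y, var_x, var_y, cov_xy = 0.0, 0.0, 1.0, 1.0, 0.5
x_obs = [rng.gauss(mean_x, var_x ** 0.5) for _ in range(5000)]

deterministic = impute_from_em(x_obs, mean_x, mean_y, var_x, var_y, cov_xy)
stochastic = impute_from_em(x_obs, mean_x, mean_y, var_x, var_y, cov_xy, rng)

# Variance of the imputed values without and with added residual error.
var_det = statistics.pvariance(deterministic)
var_sto = statistics.pvariance(stochastic)
print(var_det, var_sto)
```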
This leads me to the following questions:

1. My v21 CSR manual does not say anything about the addition of random error when MVA - EM is used to write out an imputed dataset.  So I assume nothing has changed since v16. Can anyone confirm this?  (Jon?)

2. Does anyone know if Stata's implementation of EM adds random error?  (Marta, I hope you're reading this!)  I ask, because my digging also turned up this solution to the problem:

   http://www.ats.ucla.edu/stat/stata/faq/factor_missing.htm

Thanks,
Bruce



Alex Reutter wrote
This is an interesting idea.  I think I would be concerned about using the
pooled correlations in much the same way I'd be concerned about using a
single imputation method.  Even if the pooled estimates provide a superior
point estimate of the correlations, we would then be using them as point
estimate inputs to the factor analysis and lose information about the
variability in the correlation estimates.  Still, it's better than
nothing.

I just checked whether the pooled estimates are saved with MATRIX OUT;
they are not, so one would need to use OMS to collect the correlations
from the output table for a sizeable correlation matrix.
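
As an illustration of what pooling the correlations could look like, here is a minimal Python sketch that averages each correlation across imputations on the Fisher z scale. This is one common approach and an assumption on my part; nothing in this thread says that SPSS pools correlations this way. The function name pool_correlations and the three small matrices are made up for the example. Note that a matrix pooled element by element is not guaranteed to be positive definite.

```python
import math

def pool_correlations(matrices):
    """Average correlation matrices element-wise on the Fisher z scale."""
    m = len(matrices)
    p = len(matrices[0])
    pooled = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                continue  # the diagonal stays at 1
            mean_z = sum(math.atanh(mat[i][j]) for mat in matrices) / m
            pooled[i][j] = math.tanh(mean_z)
    return pooled

# Made-up correlation matrices standing in for three imputed data sets.
imputed_corrs = [
    [[1.0, 0.50], [0.50, 1.0]],
    [[1.0, 0.55], [0.55, 1.0]],
    [[1.0, 0.45], [0.45, 1.0]],
]
pooled = pool_correlations(imputed_corrs)
print(pooled)
```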

Alex




From:   Bruce Weaver <[hidden email]>
To:     [hidden email]
Date:   11/15/2011 09:12 AM
Subject:        Re: Multiple Imputation
Sent by:        "SPSSX(r) Discussion" <[hidden email]>



I see that "Bivariate Correlations" is in the list of supported procedures.
I assume that means that a correlation matrix is generated for each of the
multiply imputed data sets, and that those estimates are pooled.  So
couldn't one use that pooled correlation matrix as input for the
exploratory factor analysis?
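
To show that a factor extraction needs only the correlation matrix and not the raw data, here is a minimal Python sketch. It uses principal component extraction via power iteration, which is a simpler stand-in for principal axis or maximum likelihood factoring and is not what any particular SPSS option does. The function name first_factor_loadings and the equicorrelated example matrix are made up for the illustration.

```python
def first_factor_loadings(corr, iterations=500):
    """Return the largest eigenvalue and first-component loadings.

    Uses power iteration on the correlation matrix; each loading is the
    eigenvector element scaled by the square root of the eigenvalue.
    """
    p = len(corr)
    vec = [1.0] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in product) ** 0.5
        vec = [v / norm for v in product]
        eigenvalue = norm  # converges to the largest eigenvalue
    return eigenvalue, [eigenvalue ** 0.5 * v for v in vec]

# Hypothetical pooled correlation matrix (all correlations 0.5).
pooled_corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
eigenvalue, loadings = first_factor_loadings(pooled_corr)
print(eigenvalue, loadings)
```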

Cheers,
Bruce



Alex Reutter wrote:
>
> Hi Alex,
>
> 1. There are 2 different imputation methods: Fully conditional
> specification (which uses MCMC) and Monotone.  See
> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/topic/com.ibm.spss.statistics.help/idh_idd_mi_method.htm
> for details.
>
> 2. I'm afraid Factor analysis does not currently support pooling of
> multiple imputation data: See
> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/topic/com.ibm.spss.statistics.help/mi_analysis.htm
> for a list of procedures that do.  Procedures that do support pooling
> automatically generate pooled output when run on multiply imputed
> datasets.
>
> Alex
>


-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).