Re: Multiple Imputation

Posted by Bruce Weaver on
URL: http://spssx-discussion.165.s1.nabble.com/Multiple-Imputation-tp4994372p5722186.html

Some follow-up:  I've had an e-mail from John Graham in which he assures me that "the EM estimates (means, standard deviations, correlations, covariances, etc.) in SPSS are fine", and that the problem is restricted to the imputed dataset.  So for exploratory factor analysis (EFA), it would appear that the simplest approach is to obtain EM estimates of the correlations (or covariances) via the MVA procedure, and use them as input for FACTOR.  (A bit of data management is required to get them into the right matrix file format, but that shouldn't be too difficult.)
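
To illustrate the idea of factoring a correlation matrix directly (rather than raw data), here is a minimal Python sketch. It is not SPSS syntax and not the FACTOR procedure: the matrix R below is made up for illustration (in practice it would hold the EM correlations from MVA), and the helper name pc_loadings is hypothetical. It extracts unrotated principal-component loadings only.

```python
import numpy as np

def pc_loadings(R, n_factors):
    """Unrotated principal-component loadings from a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1][:n_factors]     # largest eigenvalues first
    kept_vals = eigvals[order]
    loadings = eigvecs[:, order] * np.sqrt(kept_vals) # scale unit eigenvectors
    return loadings, kept_vals

# Hypothetical 4-variable correlation matrix standing in for the EM estimates.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

loadings, kept_vals = pc_loadings(R, n_factors=2)
communalities = (loadings ** 2).sum(axis=1)           # variance explained per variable
print(np.round(loadings, 3))
print(np.round(communalities, 3))
```

The point is only that the analysis needs nothing beyond the matrix itself, which is why EM estimates of the correlations are sufficient input for an EFA.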

Graham did not know whether Stata adds random error correctly when writing out the imputed data (using EM), but suggested this method for checking:

1. Run EM and get the variances and covariances.
2. Write out the imputed data set, and use it to compute variances & covariances.
3. Compare.

"If the random error is properly added to the imputed values, then variances based on the [imputed] data set will be very similar to the variances you see from the EM estimates."  If the random error is not added properly, the variances computed from the imputed dataset are expected to be lower.
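
The logic of that check can be sketched with a small simulation in Python. Everything here is an assumption made for illustration: the data are simulated, only one variable (y) has missing values, and a complete-case regression of y on x stands in for the EM step (it is not the SPSS or Stata algorithm). The sketch fills in the missing values twice, once with the predicted values only and once with random residual error added, and compares the resulting variances with the model-based estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data: x is fully observed, y is missing for about 40% of cases.
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)
missing = rng.random(n) < 0.4
obs = ~missing

# Stand-in for EM: regression of y on x in the complete cases.
slope, intercept = np.polyfit(x[obs], y[obs], 1)
resid_var = (y[obs] - (intercept + slope * x[obs])).var()

# Model-based variance of y (step 1 of the check).
em_var_y = slope ** 2 * x.var() + resid_var

# Step 2: write out "imputed" data, without and with random error.
pred = intercept + slope * x[missing]
y_no_error = y.copy()
y_no_error[missing] = pred
y_with_error = y.copy()
y_with_error[missing] = pred + rng.normal(scale=np.sqrt(resid_var),
                                          size=missing.sum())

# Step 3: compare.
var_no_error = y_no_error.var()
var_with_error = y_with_error.var()
print(f"model-based variance:         {em_var_y:.3f}")
print(f"imputed, no error added:      {var_no_error:.3f}")
print(f"imputed, random error added:  {var_with_error:.3f}")
```

In this sketch the variance from the dataset imputed without error comes out noticeably below the model-based estimate, while the version with random error added is close to it, which is the pattern Graham describes.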

HTH.


Bruce Weaver wrote
I am resurrecting this old thread because this same question (i.e., how to handle missing data in the context of exploratory factor analysis) has just come up for a couple of my colleagues.  I had forgotten about this thread, but found it when I started digging into the topic again.  But here's something else interesting I found:

  http://www.stats.ox.ac.uk/~snijders/Graham2009.pdf

Here are some of the most relevant bits (with emphasis added).

"The EM covariance matrix is also an excellent basis for exploratory factor analysis with missing data."

"Although direct analysis of the EM covariance matrix can be useful, a more widely useful EM tool is to impute a single data set from EM parameters (with random error). This procedure has been described in detail in Graham et al. (2003). This single imputed data set is known to yield good parameter estimates, close to the population average. But more importantly, because it is a complete data set, it may be read in using virtually any software, including SPSS. Once read into the software, coefficient alpha and exploratory factor analyses may be carried out in the usual way. One caution is that this data set should not be used for hypothesis testing. Standard errors based on this data set, say from a multiple regression analysis, will be too small, sometimes to a substantial extent. Hypothesis testing should be carried out with MI or one of the FIML procedures. Note that the procedure in SPSS for writing out a single imputed data set based on the EM algorithm is not recommended unless random error residuals are added after the fact to each imputed value; the current implementation of SPSS, up to version 16 at least, writes data out without adding error (e.g., see von Hippel 2004). This is known to produce important biases in the data set (Graham et al. 1996)."

This leads me to the following questions:

1. My v21 CSR manual does not say anything about the addition of random error when MVA - EM is used to write out an imputed dataset.  So I assume nothing has changed since v16. Can anyone confirm this?  (Jon?)

2. Does anyone know if Stata's implementation of EM adds random error?  (Marta, I hope you're reading this!)  I ask, because my digging also turned up this solution to the problem:

   http://www.ats.ucla.edu/stat/stata/faq/factor_missing.htm

Thanks,
Bruce



Alex Reutter wrote
This is an interesting idea.  I think I would be concerned about using the
pooled correlations in much the same way I'd be concerned about using a
single imputation method.  Even if the pooled estimates provide a superior
point estimate of the correlations, we would then be using them as point
estimate inputs to the factor analysis and lose information about the
variability in the correlation estimates.  Still, it's better than
nothing.
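
[Editorial illustration] The information that is lost can be made concrete with a short Python sketch. It pools one correlation across imputations using Fisher's z and Rubin's rules, which is one common approach and not necessarily the rule SPSS applies; the correlations, the sample size, and the function name pool_correlation are all hypothetical.

```python
import numpy as np

def pool_correlation(rs, n):
    """Pool one correlation across m imputed datasets via Fisher's z."""
    z = np.arctanh(np.asarray(rs, dtype=float))  # Fisher z transform
    m = len(z)
    within = 1.0 / (n - 3)                       # sampling variance of z
    between = z.var(ddof=1)                      # variance across imputations
    total = within + (1 + 1 / m) * between       # Rubin's total variance
    return np.tanh(z.mean()), within, between, total

# Hypothetical correlations for one pair of variables in 5 imputed datasets.
rs = [0.42, 0.47, 0.39, 0.45, 0.44]
pooled_r, within, between, total = pool_correlation(rs, n=200)
print(f"pooled r = {pooled_r:.3f}")
print(f"within = {within:.5f}, between = {between:.5f}, total = {total:.5f}")
```

Passing only pooled_r on to the factor analysis discards the between-imputation component, so the extra uncertainty due to missing data never reaches the factor solution.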

I just checked whether the pooled estimates are saved with MATRIX OUT;
they are not, so one would need to use OMS to collect the correlations
from the output table for a sizeable correlation matrix.

Alex




From:   Bruce Weaver <[hidden email]>
To:     [hidden email]
Date:   11/15/2011 09:12 AM
Subject:        Re: Multiple Imputation
Sent by:        "SPSSX(r) Discussion" <[hidden email]>



I see that "Bivariate Correlations" is in the list of supported procedures.
I assume that means that a correlation matrix is generated for each of the
multiply imputed data sets, and that those estimates are pooled.  So
couldn't one use that pooled correlation matrix as input for the exploratory
factor analysis?

Cheers,
Bruce



Alex Reutter wrote:
>
> Hi Alex,
>
> 1. There are 2 different imputation methods: Fully conditional
> specification (which uses MCMC) and Monotone.  See
> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/topic/com.ibm.spss.statistics.help/idh_idd_mi_method.htm
> for details.
>
> 2. I'm afraid Factor analysis does not currently support pooling of
> multiple imputation data: See
> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/topic/com.ibm.spss.statistics.help/mi_analysis.htm
> for a list of procedures that do.  Procedures that do support pooling
> automatically generate pooled output when run on multiply imputed
> datasets.
>
> Alex
>


-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).