Re: Multiple Imputation

Posted by Art Kendall on
URL: http://spssx-discussion.165.s1.nabble.com/Multiple-Imputation-tp4994372p5723121.html

comments interspersed below.
Art Kendall
Social Research Consultants
On 11/17/2013 5:45 AM, therp [via SPSSX Discussion] wrote:
Thank you again for your comments and advice!

To make sure I understood you correctly:

IV - questionnaire
Since I'm using an established questionnaire as my IV, I don't have to impute missing values and can just score the scale with mean.n. As far as I know this procedure requires that at least 2/3, or better 3/4, of the items I use for the score are complete, which is not the case in my questionnaire, i.e. one scale has 11 items and I have missings on 4 of them (so only 7 of 11, or 63.6%, are complete). Can I still use the mean.n function or do I have to drop this scale?
You could just use mean.7, which returns the mean of the valid items for any case that has at least 7 valid items. That assumes each missing item is equal to the mean of the valid items in the case. You could also check how many cases you would lose if you use mean.8 or mean.9.
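To make the mean.n idea concrete, here is a minimal sketch in Python rather than SPSS syntax. The helper name mean_n and the four cases are hypothetical and only illustrate the logic: average the valid items, but only for cases that have at least the required number of valid items, then count how many cases each threshold would drop.

```python
import numpy as np

def mean_n(items, min_valid):
    """Row mean of the non-missing items; NaN if a case has fewer than min_valid valid items.

    Illustrative helper only (hypothetical name), mimicking the idea behind MEAN.n.
    """
    items = np.asarray(items, dtype=float)
    valid = ~np.isnan(items)
    n_valid = valid.sum(axis=1)
    totals = np.where(valid, items, 0.0).sum(axis=1)
    means = np.full(items.shape[0], np.nan)
    ok = (n_valid >= min_valid) & (n_valid > 0)
    means[ok] = totals[ok] / n_valid[ok]
    return means

nan = np.nan
# Hypothetical data: 4 cases on an 11-item scale; NaN marks a missing item.
scale = np.array([
    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 3],                   # 11 valid items
    [1, 2, 3, 4, 5, 1, 2, nan, nan, nan, nan],           # 7 valid items
    [2, 2, 2, 2, 2, 2, 2, 2, nan, nan, nan],             # 8 valid items
    [1, 2, 3, nan, nan, nan, nan, nan, nan, nan, nan],   # 3 valid items
])

# Number of cases that end up without a score under each threshold.
cases_lost = {k: int(np.isnan(mean_n(scale, k)).sum()) for k in (7, 8, 9)}
print(cases_lost)
```

Raising the threshold trades fewer assumed item values for more lost cases, which is the comparison suggested above.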

Aside: One lesson you should learn from this thesis is the importance of good data gathering (test administration). That greatly reduces the amount of missing data.




Can you advise me on literature for that procedure (since the analysis is for my thesis and I have to justify my procedure)? Also, are you implying that I don't have to check MCAR or MAR for that questionnaire?
Indeed, without imputation, I could replicate the factor structure of that questionnaire.
Just use that as the justification. I do not know of an article that suggests mean substitution for items in a scale.
Perhaps someone else has a citation for this age-old practice.

DVs - behavioral measures
I was a little confused by Rich's comment that I don't mention categorical items. Most of my DV items have a response format, i.e. "not prejudiced behavior" vs. " prejudiced behavior". Doesn't that make them categorical?
Yes and no. They can be considered categorical, but they can also be considered interval level: with only two values there is a single interval, and it is perfectly equal to itself. Do FREQUENCIES on some of them. Look at the percentages and the means.
Would you consider a spelling test that used 1 for right and 0 for wrong and summed the item scores an invalid test? Why would that be different?
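A small worked example of why the percentages and the means tell the same story for a dichotomous item, sketched in Python with made-up responses (1 = prejudiced behavior, 0 = not prejudiced behavior is an assumed coding):

```python
import numpy as np

# Hypothetical 0/1 responses of 8 participants on one dichotomous item.
item = np.array([1, 0, 0, 1, 1, 0, 1, 1])

# Share of participants coded 1, as FREQUENCIES would report it (as a proportion).
proportion_ones = np.count_nonzero(item == 1) / item.size

# Arithmetic mean of the same 0/1 codes.
item_mean = item.mean()

print(proportion_ones, item_mean)
```

With 0/1 coding the item mean is simply the proportion of 1s, so summing or averaging such items works the same way as scoring a spelling test.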

Another lesson to take away from this thesis exercise: use as fine-grained a response scale as is practical under the circumstances. An extent scale that had more possible values on the response scale would restrict the variance less. It seems that the construct "prejudice" is a continuous variable. Why else would you use a summative scale? A dichotomy is the coarsest possible operationalization of a continuous construct.


By z-transform I meant Fisher's z-transformation (my supervisor suggested that) because I will have to build scales, and 39 are categorical, one is answered on a 7-point Likert scale, one is the number of leaflets participants take with them (interval). I understand that I don't have to use Z-transformation for correlational analyses and factor analysis, right?
Are you putting those items into the same scale? Are you getting meaningful scoring keys from the factor analysis that includes items with very different response scales? If so, yes, you would z transform the items before summing them. If there is not a mix of response scales, then there is no need to transform them.
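As a rough illustration of z transforming items before combining them, here is a sketch in Python. It standardizes each item to mean 0 and SD 1 (ordinary z-scores, not Fisher's r-to-z transformation). The three columns and their values are hypothetical stand-ins for a dichotomous item, a 7-point item, and a leaflet count.

```python
import numpy as np

def z_standardize(items):
    """Standardize each column to mean 0 and sample SD 1, ignoring missing values (NaN)."""
    items = np.asarray(items, dtype=float)
    col_mean = np.nanmean(items, axis=0)
    col_sd = np.nanstd(items, axis=0, ddof=1)
    return (items - col_mean) / col_sd

# Hypothetical data, one row per participant.
# Columns: dichotomous item (0/1), 7-point item (1-7), number of leaflets taken.
data = np.array([
    [0, 1, 0],
    [1, 7, 10],
    [0, 3, 2],
    [1, 5, 4],
])

z = z_standardize(data)

# Composite score: mean of the standardized items, so no single response
# format dominates the score just because its raw numbers are larger.
composite = z.mean(axis=1)
print(composite)
```

Without the transformation, the leaflet count would carry most of the variance of a raw sum simply because its range is widest.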
So your advice, Art, is that I check the factor structure with CFA with listwise deletion and mean imputation and compare them. But before using the summative score or listwise deletion, don't I have to check if the data is MCAR or MAR?
If you want to also try multiple imputation, only use contributors from items that are on the same scoring key.
None of your data seems to be categorical. Before you create the imputed values, use the mean.n function to get scores. Then, on the imputed items, use the mean function without the .n. Scatter plot the scores with the missing assumed to be at the mean of the other items vs. those from the multiple imputation. How do they look?
Subtract the scores from mean.n from the scores based on the imputed items. What are the mean, minimum, and maximum differences?
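The difference check can be sketched as follows in Python. Both score vectors are invented placeholders; in practice they would be the mean.n scores and the scores computed from the imputed items for the same cases, in the same order.

```python
import numpy as np

# Hypothetical scale scores for the same 5 cases (placeholder values only).
score_mean_n = np.array([2.50, 3.10, 4.00, 1.75, 3.60])    # from the mean.n approach
score_imputed = np.array([2.40, 3.25, 4.00, 1.90, 3.50])   # from the imputed items

# Case-by-case difference between the two scoring approaches.
diff = score_imputed - score_mean_n

summary = {
    "mean": float(diff.mean()),
    "min": float(diff.min()),
    "max": float(diff.max()),
}
print(summary)
```

Differences that are small relative to the scale's range suggest the two approaches will lead to similar scores.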

Use both sets of scores in your actual analysis model. How do the substantive conclusions compare?

If you have CFA available that is fine. Do that with listwise deletion, imputed values, and mean substitution from items in the same scale.

Most people do not have CFA available.  So just do EFA with both options on each set of items.
Do parallel analysis with both sets of data. Plot all 6 sets of eigenvalues. How do they look?
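For readers who have not run a parallel analysis, here is a minimal sketch in Python of one common variant: compare the eigenvalues of the observed correlation matrix with the 95th percentile of eigenvalues from random normal data of the same size. The function names, the number of simulations, the percentile, and the simulated one-factor data are all assumptions for illustration. The input must have no missing values, so it would be run separately on each complete version of the items (for example, listwise-deleted or imputed).

```python
import numpy as np

def correlation_eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Return observed eigenvalues, random-data thresholds, and the number of factors retained."""
    rng = np.random.default_rng(seed)
    n_cases, n_items = data.shape
    observed = correlation_eigenvalues(data)
    simulated = np.empty((n_sims, n_items))
    for i in range(n_sims):
        random_data = rng.standard_normal((n_cases, n_items))
        simulated[i] = correlation_eigenvalues(random_data)
    threshold = np.percentile(simulated, percentile, axis=0)
    # Retain the leading factors whose observed eigenvalue exceeds the random threshold.
    exceeds = observed > threshold
    n_retain = n_items if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain

# Hypothetical complete data: 300 cases, 6 items driven by one common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((300, 1))
items = 0.7 * factor + 0.5 * rng.standard_normal((300, 6))

observed, threshold, n_retain = parallel_analysis(items)
print(n_retain)
```

Plotting `observed` and `threshold` against factor number for each version of the data gives the sets of eigenvalue curves to compare.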
How do the scoring keys compare across the three sets?
How do the scoring keys you find compare to the scoring keys used by the original research when there is some earlier research?





You
I understand from the literature that every method of imputation or deletion of cases assumes that data is MCAR/MAR.

Thank you so much for your help!!



