Multiple Imputation


Multiple Imputation

Pia
I have some questions about selecting appropriate variables to include in the Multiple Imputation model (SPSS 19). I have a large dataset with more than 1000 cases and around 3000 variables. I now want to impute missing values for 8 variables (5-40% missing values). I couldn't find much literature about which and how many variables to select, but what I found was:
a) you should include as many variables as possible in the model,
b) include variables that are correlated with the imputed variable,
c) include variables that are associated with the missingness of the imputed variable and
d) variables that will be used in the analysis later.
If I follow this advice I will have to include almost all variables, which is not possible. And theoretically it doesn't make sense to me to include all variables to predict different variables in the dataset.
Thanks for any helpful advice.
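
One way to keep the predictor set manageable is to screen candidates per imputed variable using criteria (b) and (c). The following is only a minimal sketch in plain Python, not an SPSS procedure: the function names, the 0.3 correlation cut-off, the cap of 25 predictors, and the toy data are all assumptions made for illustration. Missing values are assumed to be coded as None.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson r over the pairs where both values are observed (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return 0.0  # too few complete pairs to say anything
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) * (x - mx) for x, _ in pairs)
    syy = sum((y - my) * (y - my) for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / sqrt(sxx * syy)


def select_predictors(data, target, candidates, min_r=0.3, max_predictors=25):
    """Keep candidates related to the target's values or to its missingness.

    data maps a variable name to a list of values, one per case.
    min_r and max_predictors are arbitrary illustration values.
    """
    # 1.0 where the target is missing, 0.0 where it is observed.
    missing = [1.0 if v is None else 0.0 for v in data[target]]
    scored = []
    for name in candidates:
        r_value = abs(pearson(data[target], data[name]))  # criterion (b)
        r_missing = abs(pearson(missing, data[name]))     # criterion (c)
        score = max(r_value, r_missing)
        if score >= min_r:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:max_predictors]]


# Made-up toy data: y is the variable to impute.
toy = {
    "y": [1, 2, None, 4, 5, None],
    "x1": [1, 2, 3, 4, 5, 6],   # tracks the observed values of y
    "x2": [0, 0, 1, 0, 0, 1],   # tracks whether y is missing
    "x3": [1, 2, 1, 2, 1, 2],   # related to neither
}
print(select_predictors(toy, "y", ["x1", "x2", "x3"]))
```

Variables needed for the later analysis, criterion (d), would be added to the screened list by hand, since they belong in the model regardless of their correlations.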

Re: Multiple Imputation

Art Kendall
What to do has a lot to do with the substantive nature of your research.

Why are those values missing? Do they have distinct missing value codes?
What role were they intended for in the analysis?

What questions are you using the data to answer?

Are the variables with missing values items in scales?

Are the variables that have missing values some form of repeated measure, like scale items taken at different times, points along a spectrum, etc.?

Which variables are plausibly related to missingness? 

Is the missingness correlated across the 8?

Why do you have so many variables?

What role were those 8 variables intended to have in the analysis?

How did you get the set of cases?

How did you choose the variables that you measured?

Art Kendall
Social Research Consultants


On 5/13/2011 6:55 AM, Pia wrote:

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-Imputation-tp4392805p4392805.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Multiple Imputation

Pia
It is a longitudinal (5 time points) quasi-experimental research project testing and evaluating an intervention. The questionnaire was quite large, containing many different standardized psychological scales (e.g. Depression, Grief, Social Support) but also non-standardized questions. Our questions: do the interventions show any effect on these measures? We also want more in-depth modeling of risk and protective factors for our specific sample. We used different missing codes (e.g. not applicable, interviewer omission). The missing values we want to impute were mainly omitted by the interviewer for different reasons. All of them are scale items at different time points (not all of them show more than 5% missings at all time points). The analyses we are going to do are MANCOVA, repeated measures, and SEM. For the 8 different variables there are, of course, different correlating variables/predictors in the dataset. I don't think the missingness is correlated across the 8 variables (different reasons for missingness depending on the variable).
Hope that helps!

Re: Multiple Imputation

Art Kendall
The variables that are most likely to be well correlated with the variable that has missing values are the other items in that scale at that time.
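
If the variable names encode scale, time point, and item number, the same-scale, same-time items can be picked out mechanically. This is a rough sketch only: it assumes a naming pattern of the form scale, then t or T, then the time point, an underscore, and the item number (as in grieft2_17), and the function names and example variable list are hypothetical.

```python
import re

# Assumed naming pattern: <scale><t|T><time>_<item>, e.g. "grieft2_17".
NAME_PATTERN = re.compile(r"^([A-Za-z]+?)[tT](\d+)_(\d+)$")


def parse_name(name):
    """Split a variable name into (scale, time, item), or None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    scale, time, item = match.groups()
    return scale.lower(), int(time), int(item)


def same_scale_items(target, all_names):
    """Return the other items of the target's scale at the target's time point."""
    parsed = parse_name(target)
    if parsed is None:
        return []
    scale, time, _ = parsed
    result = []
    for name in all_names:
        if name == target:
            continue
        other = parse_name(name)
        if other is not None and other[0] == scale and other[1] == time:
            result.append(name)
    return result


# Hypothetical variable list for illustration.
names = ["grieft2_16", "grieft2_17", "grieft2_18", "grieft3_17", "depressionT3_22"]
print(same_scale_items("grieft2_17", names))
```

Names that do not match the assumed pattern are simply skipped, so the list would need checking against the real codebook.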

I don't understand "not all of them show more than 5% missings at all time points."

Are the scale norms given to you as means or sums?

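If you do something like this:
* Count, for each case, how many of the listed items are missing.
count k_missing = grieft2_17, depressionT3_22 ....(missing).
* Tabulate the number of cases at each missing-item count.
frequencies vars = k_missing.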

what do you get?
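
For anyone checking the same thing outside SPSS, the per-case count and its frequency table can be sketched in plain Python. This is an illustration under assumptions: each case is a dict, a missing value is coded as None (SPSS user-missing codes would first have to be mapped to None), and the function names are made up.

```python
from collections import Counter


def count_missing(case, items):
    """Number of the listed items that are missing (None or absent) for one case."""
    return sum(1 for name in items if case.get(name) is None)


def missing_frequencies(cases, items):
    """Map each missing-item count to the number of cases that have it."""
    return Counter(count_missing(case, items) for case in cases)


# Made-up cases for illustration.
cases = [
    {"grieft2_17": 3, "depressionT3_22": None},
    {"grieft2_17": None, "depressionT3_22": None},
    {"grieft2_17": 2, "depressionT3_22": 4},
]
for k, n_cases in sorted(missing_frequencies(cases, ["grieft2_17", "depressionT3_22"]).items()):
    print(k, n_cases)
```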


Would you please post a tiny table with 8 lines, one for each item, and
4 columns: scale_name, time, #items_asked, #cases_missing?

Art Kendall
Social Research Consultants

On 5/13/2011 8:29 AM, Pia wrote:

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-Imputation-tp4392805p4392971.html

Re: Multiple Imputation

Pia
scale_name      #items  % missing values
Prosocial       5       17% in only 1 item of T1
Grief           23      20% in all items of T1
Coping          32      5% in only 1 item of T1
PTSD            9       7-8% in all items of T1
Daily Problems  14      20-30% in only 1 item of T3, T4 and T5
Support         5       7-20% missing in T1, T2, T3, T4, T5
Assets          9       6, 10, 13 and 54% missing in 4 items of T1
 
You see, I have missing items at all 5 time points.
"Not all of them show more than 5% missings at all time points" means that some items have too many missing values at only one time point, not at all 5 time points (see above). We compute scale sums and scale means, but will most likely analyze scale means.