Multiple Imputation

Pia
I have some questions about selecting appropriate variables to include in the Multiple Imputation model (SPSS 19). I have a large dataset with more than 1000 cases and around 3000 variables. I now want to impute missing values for 8 variables (5-40% missing values). I couldn't find much literature about which and how many variables to select, but what I found was:
a) you should include as many variables as possible in the model,
b) include variables that are correlated with the imputed variable,
c) include variables that are associated with the missingness of the imputed variable and
d) variables that will be used in the analysis later.
If I follow all of this advice, I will have to include almost all of the variables, which is not possible. And theoretically, it doesn't make sense to me to include all variables to predict different variables in the dataset.
Thanks for any helpful advice.

Re: Multiple Imputation

Bruce Weaver
Administrator
Re point a): you should not include so many variables that you over-fit the imputation model. A nice general reference on over-fitting is Mike Babyak's article, available here:

  http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf
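
One way to reconcile points b) to d) with the over-fitting concern is to screen the candidate variables separately for each imputed variable. Below is a minimal sketch in Python (not SPSS syntax). The function name `select_predictors`, the 0.3 correlation cutoff and the cap of 25 predictors are assumptions for illustration only, not recommendations from SPSS or from the article above. It assumes numeric variables, with missing values coded as None.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if undefined."""
    n = len(xs)
    if n < 3:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_predictors(data, target, analysis_vars=(),
                      min_abs_corr=0.3, max_predictors=25):
    """Pick predictors for imputing `target`.

    data: dict mapping variable name -> list of numbers (None = missing).
    The cutoff and the cap are illustrative assumptions, not fixed rules.
    """
    y = data[target]
    missing = [1.0 if v is None else 0.0 for v in y]
    scores = {}
    for name, x in data.items():
        if name == target or name in analysis_vars:
            continue
        # Point b): correlation with the target, using cases where both are observed.
        pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
        r_value = pearson([a for a, _ in pairs], [b for _, b in pairs])
        # Point c): correlation with the target's missingness indicator.
        obs = [(a, m) for a, m in zip(x, missing) if a is not None]
        r_miss = pearson([a for a, _ in obs], [m for _, m in obs])
        score = max(abs(r_value), abs(r_miss))
        if score >= min_abs_corr:
            scores[name] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Point d): variables from the planned analysis are always kept.
    forced = [v for v in analysis_vars if v in data and v != target]
    return forced + ranked[:max(0, max_predictors - len(forced))]
```

Calling `select_predictors(data, "y", analysis_vars=("outcome",))` once per imputed variable gives each of the 8 variables its own, much shorter predictor list. The cap is what limits over-fitting here, so it should be chosen with the number of complete cases in mind.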

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/