I have 9 variables that require imputation (between 7% and 24% missing), so I am going with the MI procedure. They are all dichotomous, so the Missing Values module is using logistic regression to estimate the values. Two questions:

1. I was asked to produce diagnostics of my results, but it appears the only diagnostic for MI is trace plots, and they apply only to quantitative variables (i.e. multiple regression solutions). Is there anything similar for imputed dichotomous variables?

2. I thought it made logical sense to pick the 3 or 4 best predictors of each variable requiring imputation, run the procedure separately for each one (I cautiously elected 30 imputations per variable), and then merge these imputed variables, along with the variables that are not imputed, into one analytical file, so as to use pooled results for logistic regression. Is that methodologically sound?

Thank you for any guidance!
Gary
|
By "MI procedure" do you mean MULTIPLE IMPUTATION? You would want to include all 9 variables if possible in the imputation process at one time. Also you will want to include any other informative variables in the process, even if they have no missing data.
Trace plots (the ones I am familiar with) plot the changes in coefficient estimates, or other statistics, over iterations. (This SAS documentation suggests the model log-likelihood: https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_mi_sect031.htm) So trace plots are applicable to both continuous and dichotomous variables. Does SPSS even give these for any variables, though? The OUTFILE subcommand provides "FCSITERATIONS", but it appears those are the predicted means and variances, not the model coefficients. (You could make a trace plot of those, but it will not be very interesting.) Trace plots aren't the only diagnostics, though. For example, you can simply compare the marginal proportions for the complete cases vs. the imputed cases. Say variable X is 20% "A" and 80% "B" among the complete cases, and 25% "A" and 75% "B" in the imputed sets. From this you could say they are pretty similar, but "A" cases may be underreported in the complete set. These marginal checks are more important for continuous variables, to make sure you did not impute values way outside the reasonable range. |
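The marginal check described above is easy to script outside SPSS. A minimal sketch (the data are invented to match the 20%/80% vs. 25%/75% example in the post):

```python
import pandas as pd

# Hypothetical data: 10 observed cases and 8 imputed cases of a
# dichotomous variable X (counts invented for illustration only).
observed = pd.Series(["A"] * 2 + ["B"] * 8)
imputed  = pd.Series(["A"] * 2 + ["B"] * 6)

# Marginal proportions in each group; large divergences between the
# two flag suspicious imputations.
obs_prop = observed.value_counts(normalize=True)
imp_prop = imputed.value_counts(normalize=True)

print(obs_prop.to_dict())  # {'B': 0.8, 'A': 0.2}
print(imp_prop.to_dict())  # {'B': 0.75, 'A': 0.25}
```

In SPSS-exported multiply imputed data, the same comparison can be made by splitting on the `Imputation_` variable (0 = original data, 1..m = imputed sets).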
Thank you, Andy W, for addressing my questions. Being the novice I am with Multiple Imputation, I still have some questions.
Regarding your first point of doing the MI all at one time if possible: which of the following would be preferable in a situation like this, which I'm simplifying a little for illustrative purposes? We are considering 20 dichotomous variables to predict a dichotomous poor outcome (very low birth weight). The first through fourth variables require imputation, but different sets of 3 predictors from the remaining 16 variables best correlate with each of these four variables. Should I run four separate MI procedures with the different sets of 3 predictors and then merge the imputed data sets, or run one MI procedure identifying four dependent variables and twelve predictors all in one shot in the SPSS MI dialog box?

I've seen that many journals desire the inclusion of diagnostic tests, and trace plots are certainly a viewer-friendly way of assuring the reader that the imputations are legitimate. Is the SAS link implying that with dichotomous variables I "can" evaluate how valid the pooled estimates are, based on the chosen number of iterations and imputations, using SAS, but that this is something I cannot do in SPSS?
Yes, regarding your last point: thankfully I don't have that outlier problem with marginals, since I am focusing on proportions, and the MI output all looks reasonable. The proportions for the imputed cases look similar to the complete cases when one would expect them to, and likewise diverge a bit when one would expect that. For example, when father's age is missing, fathers tend to be younger, and we usually have mother's age non-missing to serve as a predictor, so the dichotomous variable depicting a young father has a higher proportion among the imputed cases.

Best,
Gary
|
For SPSS you would probably want to include all variables in the imputation process. Some software lets you do equation-by-equation models, so if you were, say, imputing X1, X2 and X3, you could have different models predicting each:

X1 ~ (A, B, C)
X2 ~ (X1, A, B)
X3 ~ (X2, C)

So X1 is a function of A, B and C, X2 is a function of X1, A and B, etc. But SPSS pretty much forces everything to always be included, so it would be:

X1 ~ (X2, X3, A, B, C)
X2 ~ (X1, X3, A, B, C)
X3 ~ (X1, X2, A, B, C)

This is not a big deal in most situations, though. It is more important to have a consistent set of models, so exclusions like that are pretty rare. Even if you know, say, that C is not relevant to X2, including it to predict X2 should not make a big difference.

I can't make sense of your question about trace plots. "Diagnostic tests" is a pretty broad term. Trace plots can show convergence (or lack of autocorrelation in the MCMC draws). Examining the subsequent distributions, like I mentioned, accomplishes something different: it shows whether the posterior draws for the imputations are on their face reasonable. The former (trace plots) does not guarantee the latter. |
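For readers outside SPSS, the fully conditional "everything predicts everything" scheme can be sketched in a few lines. This is a bare-bones illustration with invented variables, not SPSS's actual algorithm: a proper MI run would also draw the regression coefficients from their posterior on each cycle rather than reuse the fitted point estimates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# A, B, C fully observed; X1 and X2 are dichotomous with missing values.
A = rng.integers(0, 2, n)
B = rng.integers(0, 2, n)
C = rng.integers(0, 2, n)
X1 = (A + B + rng.normal(0, 1, n) > 1).astype(float)
X2 = (X1 + C + rng.normal(0, 1, n) > 1).astype(float)
X1[rng.random(n) < 0.2] = np.nan   # ~20% missing
X2[rng.random(n) < 0.1] = np.nan   # ~10% missing

data = {"X1": X1, "X2": X2}
miss = {k: np.isnan(v) for k, v in data.items()}

# Initialize missing cells with random draws from the observed values.
for k, v in data.items():
    v[miss[k]] = rng.choice(v[~miss[k]], miss[k].sum())

# Chained equations: each incomplete variable is regressed on ALL other
# variables (the SPSS-style specification), then its missing cells are
# redrawn from the predicted probabilities. Cycle until stable.
for _ in range(10):
    for k in data:
        X = np.column_stack([A, B, C] + [data[j] for j in data if j != k])
        model = LogisticRegression().fit(X[~miss[k]], data[k][~miss[k]])
        p = model.predict_proba(X[miss[k]])[:, 1]
        data[k][miss[k]] = (rng.random(miss[k].sum()) < p).astype(float)

print({k: round(v.mean(), 2) for k, v in data.items()})
```

Repeating the whole procedure m times (with fresh random draws) yields the m imputed data sets that are later pooled.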
That’s good to know about usually not needing parsimonious sets of predictors. However, since I already got around the inability to do equation-by-equation models in one go-round by creating separate MI datasets based on specific equations for each variable requiring imputation, and then merging the imputed variables from each file together, I’ll keep that file, make another one including all variables in one go-round with the same number of imputations (30), and see if the pooled frequencies differ. I believe your hunch is that they will not.

Regarding the lack of convergence diagnostics produced by SPSS for its logistic regression solution: do you think convergence is something I should be concerned about? The distributional results all seem logical to me, but could there have been convergence problems, which a trace plot would reveal, that would make these distributional outcomes invalid?

Thanking you,
-Gary
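As background on the "pooled" figures discussed here: SPSS combines per-imputation results with Rubin's rules, averaging the m point estimates and combining within- and between-imputation variance. A sketch with invented numbers (m = 5 imputations of one logistic-regression coefficient):

```python
import numpy as np

# Hypothetical coefficient and standard error from each of m = 5
# imputed data sets (numbers invented for illustration).
est = np.array([0.52, 0.48, 0.55, 0.50, 0.45])
se  = np.array([0.10, 0.11, 0.10, 0.12, 0.10])

m = len(est)
pooled = est.mean()            # pooled point estimate
W = (se ** 2).mean()           # within-imputation variance
B = est.var(ddof=1)            # between-imputation variance
T = W + (1 + 1 / m) * B        # total variance (Rubin's rules)
pooled_se = np.sqrt(T)

print(round(pooled, 3))        # 0.5
print(round(pooled_se, 3))     # 0.114
```

The between-imputation term is why more imputations (like the 30 used here) tighten the pooled standard error when missingness is substantial.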
|
Andy, you were right on the mark: the 7 dependent variables that required imputation produced practically identical pooled results when I incorporated every predictor variable in one run as when I split them up.

Reading a little on the convergence issue, an article by Paul Allison explains that most convergence problems for logistic regression occur due to quasi-complete separation: when, due to a small sample and/or rare occurrences (an extreme split on the dependent variable), some level of a predictor variable has all or none of the outcomes, preventing an ML estimate. This is something I can simply discover with bivariate 2x2 cross-tabulations and adjust accordingly. But I'm still wondering whether I need to look at the log-likelihood over each iteration, as you mention SAS provides. What's your take, and is there anything else I can look at empirically with available SPSS output to rule out problematic imputations?

Thank you,
-Gary
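The 2x2 check described above is straightforward to script: a zero cell in the predictor-by-outcome crosstab flags possible quasi-complete separation. A sketch with invented data:

```python
import pandas as pd

# Hypothetical dichotomous predictor and outcome; every case with
# predictor == 1 has outcome == 1, so one crosstab cell is zero.
df = pd.DataFrame({
    "predictor": [0, 0, 0, 1, 1, 1, 1, 1],
    "outcome":   [0, 1, 0, 1, 1, 1, 1, 1],
})

tab = pd.crosstab(df["predictor"], df["outcome"])
print(tab)

# Any zero cell means a predictor level with all (or none) of the
# outcomes, i.e. potential quasi-complete separation.
separated = bool((tab == 0).any(axis=None))
print(separated)   # True
```

Looping this over every predictor/imputand pair before running MI would surface the cases Allison warns about.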
|