Hi all,
I want to use the RMV (Replace Missing Values) procedure to get rid of missing values. The syntax guide isn't very clear about this, but I was wondering whether this procedure depends on the sort order of the data. As in: SORT BY independent var, RMV dependent var, where iv and dv have a meaningful relationship.

How else could the missing values be replaced with the mean/median of the surrounding value(s)? In particular, I would like to replace missing values of pregnancy duration based on birth weight. Which of the approaches below would be best, if this can be said at all without touching the data? I am choosing the median because there may be quite a lot of variation (e.g., full-term stillborns may have a very low birth weight), and in the 'split file' case the group size may be quite small.

sort cases by birthweight.
rmv /amenorroe_estim =median(amenorroe all).

or alternatively:

rmv /amenorroe_estim =median(amenorroe, 10).

or alternatively:

sort cases by birthweight_cat birthweight.
temporary.
split file by birthweight_category.
rmv /amenorroe_estim =median(amenorroe all).

Thanks!

Albert-Jan

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
I use the RMV procedure quite regularly to impute missing values, and it works even better in combination with the SPLIT FILE command. It is not necessary to sort cases by the variable being imputed.
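On the sort-order question itself: as far as I understand the RMV documentation, the whole-series variants (SMEAN, or MEDIAN with ALL) give the same fill value whatever the row order, whereas a numeric span such as median(amenorroe, 10) takes the neighbouring valid values in file order, so there the sort does matter. The following is only a rough Python sketch of that difference, not SPSS itself. The function names and the data are made up for illustration, and the real procedure's handling of edge cases (too few valid neighbours, for instance) is not reproduced.

```python
from statistics import median


def median_of_neighbours(values, span):
    """Fill each None with the median of up to `span` valid values
    before and after it, taken in list (file) order.
    Hypothetical helper; edge-case rules of the real RMV are ignored."""
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        before = [x for x in values[:i] if x is not None][-span:]
        after = [x for x in values[i + 1:] if x is not None][:span]
        neighbours = before + after
        filled.append(median(neighbours) if neighbours else None)
    return filled


def median_of_series(values):
    """Fill each None with the median of all valid values (order-independent)."""
    overall = median([x for x in values if x is not None])
    return [overall if v is None else v for v in values]


# Made-up (birthweight in grams, pregnancy duration in weeks) pairs.
rows = [(3400, 40), (900, 27), (3300, None), (1000, 28), (3500, 41), (3200, 39)]

unsorted_durations = [duration for _, duration in rows]
sorted_durations = [duration for _, duration in sorted(rows, key=lambda r: r[0])]

# Span-based fill: the result changes with the row order.
span_unsorted = median_of_neighbours(unsorted_durations, span=1)
span_sorted = median_of_neighbours(sorted_durations, span=1)

# Whole-series fill: the result is the same in either order.
all_unsorted = median_of_series(unsorted_durations)
all_sorted = median_of_series(sorted_durations)
```

In this toy data the span-based fill for the missing duration differs between the unsorted file and the file sorted by birth weight, while the whole-series fill is identical in both.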
I would suggest not using the RMV procedure on its own, as in your first few examples, since, depending on the proportion of missing values, it would alter your population distribution (assuming the values are missing at random and are all assigned the overall mean or median). Instead, I encourage my students to find a set of factors that are correlated with the variable and to create a matrix of known group values from which to assign the missing values; you do something similar in the last of your examples.

For example, about 8 to 10% of the high school students admitted to our university have no ACT scores. We know there is a strong correlation between ACT scores and HS percentile, race, and admission type (admit, probation, deny). So we create deciles of HS percentile, a binary indicator for minority status, and the 3-level admit type, and use SPLIT FILE to assign the missing values:

SORT CASES BY hspctile minority admitype.
SPLIT FILE BY hspctile minority admitype.
RMV act_imp=SMEAN(act).
SPLIT FILE OFF.

By partitioning the missing values into the known-value matrix, our imputed ACT scores have the same distribution as the non-imputed ACT scores. But again, there is no need to sort on the ACT values.

David
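For readers who want to see the logic of the split-file approach outside SPSS, here is a minimal Python sketch of group-wise mean imputation. It is an illustration under stated assumptions, not the syntax above: the helper name `impute_group_mean`, the field names, and the six student records are all invented for the example.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean(rows, group_keys, target):
    """Return copies of `rows` with an extra '<target>_imp' field in which
    a missing (None) target is replaced by the mean of the valid values
    in the same group. Hypothetical helper for illustration only."""
    # Collect the valid target values per combination of grouping factors.
    valid_by_group = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            key = tuple(row[k] for k in group_keys)
            valid_by_group[key].append(row[target])

    result = []
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        value = row[target]
        # Groups without any valid value are left missing.
        if value is None and key in valid_by_group:
            value = mean(valid_by_group[key])
        result.append({**row, target + "_imp": value})
    return result


# Invented records: HS percentile decile, minority flag, admit type, ACT score.
students = [
    {"hs_decile": 9, "minority": 0, "admitype": "admit", "act": 28},
    {"hs_decile": 9, "minority": 0, "admitype": "admit", "act": 30},
    {"hs_decile": 9, "minority": 0, "admitype": "admit", "act": None},
    {"hs_decile": 3, "minority": 1, "admitype": "probation", "act": 18},
    {"hs_decile": 3, "minority": 1, "admitype": "probation", "act": None},
    {"hs_decile": 3, "minority": 1, "admitype": "probation", "act": 20},
]

imputed = impute_group_mean(students, ["hs_decile", "minority", "admitype"], "act")
```

Each missing score is filled from its own cell of the factor matrix only, and the order of the scores within a cell plays no role in the result.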
