Dear List,
I am running a multiple imputation on lots of questionnaire items, and I'm trying to figure out a way to run the analysis without imputing missing values for participants who missed out, say, all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI, because there are also randomly missing values across these alcohol items that do need imputing.

At the moment all missing values are identified as system missing in the data file. I thought there might be a way to get SPSS to run the MI only on certain types of missing values if I coded the ones I want ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated.

Kind regards,
Kathryn |
Administrator
|
In reply to this post by Kathryn Gardner
"The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure..."
Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!

"there must be a simpler way." Yes! It is called Syntax!
---
Assuming something like alcohol measure = alc01 to alc05. Imputed values imp01 to imp05.

DO IF NVALID(alc01 TO alc05)=0.
DO REPEAT imp=imp01 TO imp05.
COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as missing */.
END REPEAT.
END IF.
---
OTOH: I don't have this module so not certain what the data come back with?
---
I suspect you actually end up with the raw non-imputed data at the top and several imputed data sets below? Hopefully with some sort of consistent ID variable (ID) and some sort of imputation flag impflag (0 raw, 1 imputed)?

COMPUTE @ordered@=$CASENUM.
SORT CASES BY ID impflag.
COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imp=alc01 TO alc05.
+ COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as missing */.
+ END REPEAT.
END IF.
SORT CASES BY @ordered@.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Kathryn Gardner
I would like to hear from other list members,
but imputing summative scale items via MI is often unnecessary.
You use the term "items", which often means the variables are meant to be used as part of a score, so I am responding in that context.

Are you planning to distribute a public-use data set that includes items, or one that includes only scale scores, or are you only working on your own data set for your own use?

What is the goal of your project? Finding totals, means, percents for a population or for subpopulations? Or are you intending to compare and contrast groups? Or are you mainly interested in the relations of variables? Developing scales?

What is the response scale on the alcohol items? Are they intended to be repeated measures of a construct, where the total or mean is used in analysis as the measure of that construct?

{I would like to hear from others on this.} If the score is to be used in analysis and the mean is the summative score, just use it. If the score is to be used as a total, e.g., for comparison to published norms, then a) do what the original authors of the scale did, or b) compute adjscore = sum of valid items * (# of items in scale / # of items with valid values). {End of the part I would like to hear from other list members about.}
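Option b) could be sketched in SPSS syntax roughly as follows. The names alc01 TO alc05 are borrowed from earlier in the thread and are assumed to be contiguous in the file; the minimum of 3 valid items is an arbitrary illustrative threshold, not something specified in this post.

```spss
* Hypothetical sketch: prorated total for a 5-item scale.
* NVALID counts the non-missing items and SUM adds only the valid ones.
* The .3 suffix makes SUM return missing unless at least 3 items are valid.
COMPUTE adjscore = SUM.3(alc01 TO alc05) * 5 / NVALID(alc01 TO alc05).
EXECUTE.
```

This is algebraically the same as 5 * MEAN.3(alc01 TO alc05), so the prorated total and the item mean carry the same information.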
Art Kendall
Social Research Consultants |
In reply to this post by Kathryn Gardner
Kathryn,

You don't say what those five alcohol items are, but I picture items about quantity, frequency, and problems. Why wouldn't you code a value of 0=never, 0=none, 0=etc. for those items, given a lead-in item response of 'No use'? Unless you are prepared to assume that the alcohol lead-in question has a non-1.00 reliability, the response to those questions is 0.

Gene Maguin |
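A minimal sketch of this recoding, run before the imputation, might look like the following. It assumes a lead-in variable named drinks_alcohol coded 1=Yes, 0=No (a hypothetical name and coding) and that the items are stored contiguously as cageaid_2 TO cageaid_5, the names that appear later in the thread.

```spss
* Hypothetical sketch: give non-drinkers a 0 on every skipped alcohol item.
* Only system-missing values are replaced, so answers that were given stay as they are.
DO IF (drinks_alcohol = 0).
RECODE cageaid_2 TO cageaid_5 (SYSMIS = 0).
END IF.
EXECUTE.
```

Items left blank by participants who do drink stay system-missing and so remain available for imputation.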
In reply to this post by Art Kendall
Dear Art,
I wanted to impute the missing values at the item level because I thought this was more sensitive, and I can then use all of the data in reliability analyses. Once imputed, I'll be summing the items to create scale scores representing various constructs (e.g., alcohol use, personality, emotion regulation) that will be used in the main analyses of 2 papers I am publishing (SEM for one paper and latent profile analysis for the other).

I thought using the mean as the summative score to deal with missing data was just as bad as using mean substitution? Why is imputing summative scale items via MI often unnecessary? I couldn't find anything on the debate as to whether multiple imputation should be used for scale items vs. computed subscale scores, etc.

Kathryn
|
In reply to this post by David Marso
Thanks, David. I'm a bit of a novice when it comes to syntax; I can only do basic stuff and am not clear about all of the commands. You are correct that MI produces 1 data file with a variable called Imputation_ coded 0 (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second set of code to run. I changed the syntax so that impflag = imputation_ and $casenum = ID, and also changed the names of my alcohol items. It partly runs and adds the variables @nukeme@ and @ordered@ to the data file. In the @nukeme@ column it has coded as 1 any case with missing data on the alcohol items, and has also coded as 1 those cases with the same ID number whose data on CAGEAID_2 to CAGEAID_5 have been imputed. I thought it would change the latter to system-missing values, though?

COMPUTE @ordered@= ID.
SORT CASES BY ID imputation_.
COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE imputation_=$SYSMIS .
+ END REPEAT.
END IF.
SORT CASES BY @ordered@.
|
Administrator
|
Kathryn,
Looks like the line

+ DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE imputation_=$SYSMIS .

may be the culprit. It should be 2 lines (sans >). I also modified the logical flag for (imputation_).
HTH, David
---
* Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to original order) .
COMPUTE @ordered@= $CASENUM.
SORT CASES BY ID imputation_.
COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
* Probably need following change.
IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
* Following line was previously munged into DO REPEAT line * Possible received error message? *.
+ COMPUTE imputedvar=$SYSMIS .
+ END REPEAT.
END IF.
* Restore data to order of imputed data sets.
SORT CASES BY @ordered@.
|
Administrator
|
Kathryn, in your first post, you said, "I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol." This makes me wonder if there is another dichotomous variable in your file (Drinks_Alcohol - Y/N) that can be used instead of NVALID(cageaid_2 TO cageaid_5)= 0. Assuming there is such a variable, and it's coded 1=Y, 0=N, David's line that computes @nukeme@ could be changed to:

COMPUTE @nukeme@= NOT drinks_alcohol.

I think this is preferable, because for whatever reason (e.g., data entry error), someone might have zeros for cageaid_2 to cageaid_5, despite having a YES for drinks_alcohol. It's also simpler code.

HTH.
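Before relying on either flag, it may be worth checking how the lead-in answer lines up with the number of items actually answered in the original data. A rough sketch, assuming the lead-in variable is named drinks_alcohol (hypothetical, coded 1=Y, 0=N) and the imputation indicator is Imputation_ with 0 marking the original data; n_answered is an illustrative helper variable:

```spss
* Hypothetical sketch: count the alcohol items each case answered.
COMPUTE n_answered = NVALID(cageaid_2 TO cageaid_5).
* Restrict the table to the original (non-imputed) rows only.
TEMPORARY.
SELECT IF (Imputation_ = 0).
CROSSTABS /TABLES = drinks_alcohol BY n_answered.
```

Cells where drinks_alcohol is 0 but n_answered is above 0, or the reverse, are the cases on which the two flags would disagree.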
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by David Marso
I tried the new syntax and it seems to do something similar and produces the error messages below.
Just to clarify, should the syntax put system missing values in cageaid_2 to cageaid_5 when all of these have been imputed? Also, not sure if it makes a difference that some imputed values are negative (see 5 rows of imputed data for the same participant below).

-1 -1  1 -1
-1  0 -1  0
-1 -1 -1  0
 1  0  1  0
-1 -1  0  0

52 COMPUTE @ordered@= $CASENUM.
53 SORT CASES BY ID imputation_.
54 COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5) = 0.
55 IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
56 DO IF @nukeme@.
57 + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
58 + COMPUTE imputatedvar=$SYSMIS .
59 + END REPEAT.
60 END IF.
61 SORT CASES BY @ordered@.

56 + LOOP has no effect on this command.
56 + The first word in the line is not recognized as an SPSS Statistics command.
57 + LOOP has no effect on this command.
57 + The first word in the line is not recognized as an SPSS Statistics command.
58 + LOOP has no effect on this command.
58 + The first word in the line is not recognized as an SPSS Statistics command.
|
Administrator
|
Kathryn,
What is happening in lines 1-51 of the preceding, prior to running the posted syntax?

All my code does is check to see if the non-imputed data are *ALL* missing for the specified variables (cageaid_2 TO cageaid_5). If so, it creates a flag @nukeme@. It then checks within the same ID and drags the flag into the imputed data sets. It should then clobber the specified variables (set them to $SYSMIS). It makes *ABSOLUTELY* no difference what the imputed values are. The only thing that matters is that the non-imputed values are *ALL* missing.

My question: WHAT IS THE CONTEXT for "56 + LOOP has no effect on this command."? I.e., WHAT IS GOING ON PRIOR TO RUNNING my posted syntax?
David
---
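The same three steps (flag in the original data, carry the flag to the imputed copies, blank the items) can also be sketched without sorting or LAG, using AGGREGATE to spread the flag across all rows sharing an ID. This is an alternative under stated assumptions, not the code posted above: it assumes the indicator Imputation_ is 0 for the original data, that ID is unique per participant, and that rawskip and skipped are unused variable names. In either version, the stand-in name on DO REPEAT has to be the same name used inside the loop; in the logged run above, the stand-in is imputedvar but the COMPUTE targets imputatedvar, which would presumably create a new variable rather than change the cageaid items.

```spss
* Hypothetical sketch: flag original rows where every alcohol item is missing.
COMPUTE rawskip = (Imputation_ = 0 AND NVALID(cageaid_2 TO cageaid_5) = 0).
* Copy the flag to every row with the same ID, imputed copies included.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /skipped = MAX(rawskip).
* Blank the items in the imputed copies of flagged participants.
DO REPEAT item = cageaid_2 TO cageaid_5.
IF (skipped = 1 AND Imputation_ GE 1) item = $SYSMIS.
END REPEAT.
EXECUTE.
```

Because the flag is attached by ID rather than by position, the file order is left as it was and no restoring sort is needed.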
|
Hi David,
Lines 1-51 were empty (I had some syntax there but deleted it). I see what you mean now about how the syntax works. I think I can use this now actually, recoding all cageaid items to system missing IF @nukeme@ = 1. Many thanks for this, it's a great help!
Kathryn |
In reply to this post by Bruce Weaver
Hi Bruce,
You are correct that I do have a question that asks whether they drink alcohol or not (YES/NO). Some participants answered no but then went on to answer the cageaid questions, while others said yes and then scored 0 on the cageaid questions. I think I can re-work David's code here. I also have other questionnaires where participants have missed out a set of 10 questions because they ask about their father, and these were skipped if the participant was not raised by their father (I didn't include an initial YES/NO question there, so I can easily adapt David's code). Thank you for your suggestion.
Kathryn
> Date: Fri, 16 Dec 2011 05:32:31 -0800
> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> Kathryn, in your first post, you said, "I'm trying to figure out a way to run
> the analysis, without imputing missing values for those participants who
> have missed out say all 5 items on an alcohol questionnaire because they
> were told to skip it if they do not drink alcohol." This makes me wonder if
> there is another dichotomous variable in your file (Drinks_Alcohol - Y/N)
> that can be used instead of NVALID(cageaid_2 TO cageaid_5)= 0. Assuming
> there is such a variable, and it's coded 1=Y, 0=N, David's line that
> computes @nukeme@ could be changed to:
>
> COMPUTE @nukeme@= NOT drinks_alcohol.
>
> I think this is preferable, because for whatever reason (e.g., data entry
> error), someone might have zeros for cageaid_2 to cageaid_5, despite having
> a YES for drinks_alcohol. It's also simpler code.
>
> HTH.
|
In reply to this post by Kathryn Gardner
recoding all cageaid items to user missing value -5 IF @nukeme@ = 1.
Try to avoid having sysmis on the right hand side of an assignment. The values are missing because you said they should be. They are not missing due to SPSS being unable to follow your instruction.
Having item values missing
-1 because they are not applicable
-2 the respondent drinks but did not answer any of those questions
-3 answered but does not drink
-4 answered this one but had to impute more than other 2 items in this scale
-5 nuked
-6 ...
vs
$sysmis SPSS could not obey your instructions for reading or transforming
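A minimal sketch of how the recode above could be done in syntax. It assumes the MI output file has an identifier named ID and the Imputation_ variable (0 = raw, 1 to 5 = imputed), that cageaid_2 TO cageaid_5 are adjacent in the file, and that -5 is not a valid item score. The names allmiss0 and nukeme are made up for illustration, and AGGREGATE is swapped in for the LAG approach discussed earlier in the thread, so no sorting is needed.

```spss
* Flag raw records (Imputation_ = 0) where every cageaid item is missing.
COMPUTE allmiss0 = (Imputation_ = 0 AND NVALID(cageaid_2 TO cageaid_5) = 0).
* Spread that flag to every record with the same ID, including the imputed copies.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /nukeme = MAX(allmiss0).
* Overwrite raw and imputed values with a user-missing code instead of $SYSMIS.
DO IF (nukeme = 1).
RECODE cageaid_2 TO cageaid_5 (ELSE = -5).
END IF.
EXECUTE.
MISSING VALUES cageaid_2 TO cageaid_5 (-5).
ADD VALUE LABELS cageaid_2 TO cageaid_5 -5 'Skipped whole scale, imputation removed'.
```

If a screening question such as drinks_alcohol exists, the COMPUTE line could test that variable on the raw record instead of NVALID; the rest stays the same.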
Art Kendall
Social Research Consultants |
Administrator
|
Certainly sage advice:
Notice in my original post:
+ COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as missing */.
I did *NOT* wish to be bothered with discerning or assuming what might constitute valid/missing values for the resulting variables. Sometimes within the context of a newsgroup posting I will follow the path of least resistance even though I would scarcely utilize such in my own production code.
Recall in the original query: "At the moment all missing values are identified as system missing in the data file."
...
Perhaps *NOT* best practices applied here, OTOH: people pay me to apply them.
Free advice is cheap and sometimes worth what one pays for it (probably *A LOT* more but YMMV ;-) .
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Kathryn Gardner
It is particularly in RELIABILITY that I wonder about imputing values for items from outside the set of variables that are repeated measures of a construct.
Items in a summative scale often have a restricted range: 0 to 9, 1 to 5, -3 to 3, etc. A goal in scale development is to have a measurement that has convergent validity with the set of items and divergent validity for constructs other than the one you are trying to measure. One wants to work with the common variance. It is likely that variables intended to measure other constructs would relate to the unique variance within an item.
Well developed scales have question stems that are balanced for direction, i.e., have opposite signs on the factor they are assigned to. Would you impute items the way they were entered, or after they were reflected to be unidirectional, as you need to do before you run RELIABILITY?
Mean substitution is potentially problematic when a mean is across cases. I don't know about when the mean is within cases.
Of course a lot depends on where that piece of research lies in the stream of research in that area, the goals of the particular piece of research, and where you are in the use of the data.
With regard to the alcohol scale, when you think about an item, would zero be a reasonable value in the order of the responses to that item? It is hard to say much more without understanding the constructs you are trying to measure and their role in your theorizing.
Also, before doing any imputing, do you remove items from further consideration when many of the values are missing for reasons other than non-applicability? Do you drop cases that have a substantial amount of missing data? Or that have pattern responding? By pattern responding I mean all true, all false, alternating true and false, 1 2 3 4 5, etc., which show that respondents were responding only to the request to give an answer and not to the semantic content of the question. Do you drop items from scales when their inclusion lowers the internal consistency of the summative score (total or mean)?
Of course it should be a quick project to get means, SDs and correlations several ways once you have finished cleaning the data: with listwise deletion, with pairwise deletion, and with imputed values for missing data. Do the values differ in meaningful ways? If you do factor analyses and plot the eigenvalues from each and from parallel analyses of each, is there much difference? In the long run, how much of your data is imputed?
hth
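One way the comparison above might be set up, as a rough sketch. It assumes the single file produced by MI, with its Imputation_ variable (0 = raw data), and uses cageaid_2 TO cageaid_5 only as stand-ins for whichever items are of interest.

```spss
SPLIT FILE OFF.
* Raw data only, listwise deletion.
TEMPORARY.
SELECT IF (Imputation_ = 0).
CORRELATIONS
  /VARIABLES=cageaid_2 TO cageaid_5
  /STATISTICS=DESCRIPTIVES
  /MISSING=LISTWISE.
* Raw data only, pairwise deletion.
TEMPORARY.
SELECT IF (Imputation_ = 0).
CORRELATIONS
  /VARIABLES=cageaid_2 TO cageaid_5
  /STATISTICS=DESCRIPTIVES
  /MISSING=PAIRWISE.
* All records, one set of output per value of Imputation_ (0 is the raw data again).
SORT CASES BY Imputation_.
SPLIT FILE LAYERED BY Imputation_.
CORRELATIONS
  /VARIABLES=cageaid_2 TO cageaid_5
  /STATISTICS=DESCRIPTIVES
  /MISSING=PAIRWISE.
SPLIT FILE OFF.
```

Putting the three sets of means, SDs and correlations side by side gives a quick read on whether the choice of missing-data handling changes anything that matters.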
Art Kendall
Social Research Consultants |
Dear Art,
Thank you for your useful comments. Before estimating missing data I usually remove anyone who has missed out entire questionnaires (other than for non-applicability reasons), and also check for any patterns in the missing data. I also check that the amount of missing data to be imputed is not large. Up until now I have used person mean substitution at the item level, which has shown promising results, but MI seems to be the gold standard, and now that it is available in SPSS I assumed this was the way forward, even for scale items. It seems you are suggesting not, but I wondered if you knew of any references that discuss this issue?
I have noticed that there is an option to apply constraints when running MI, so that one can specify the range one would like the score to fall in, e.g., 1-5. I was intending to do this for each item so that the imputation does not produce any implausible values, e.g., -1. I couldn't find any discussion of this issue, but it seems like the only logical way to avoid implausible values.
I have dropped items from scales where internal consistency is lowered, but only if this is substantially so.
Kathryn
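For what it is worth, the range restriction described above corresponds to the CONSTRAINTS subcommand when MI is run from syntax. A hedged sketch: item1 TO item5 and the dataset name imputed are placeholders, the 1 to 5 range is only the example mentioned above, and the MIN, MAX and RND settings apply to variables that are treated as scale.

```spss
* Declare a dataset to receive the raw data plus the imputed copies.
DATASET DECLARE imputed.
MULTIPLE IMPUTATION item1 TO item5
  /IMPUTE METHOD=AUTO NIMPUTATIONS=5
  /CONSTRAINTS item1 TO item5 (MIN=1 MAX=5 RND=1)
  /OUTFILE IMPUTATIONS=imputed.
```

RND=1 rounds imputed values to whole numbers; it can be left out if fractional values are acceptable for the analysis.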
|
In reply to this post by David Marso
I never leave things as system missing. I always use user missing values, but in the context of these posts it was an easy way to communicate what I needed, before I then recode to user missing values as required.
Kathryn |
In reply to this post by Kathryn Gardner
My experience with handling missing data has led me to distrust, and largely avoid, the SPSS Multiple Imputation module. Because of the way it draws its values and develops its estimate of the distribution, my feeling is that it should only be used in situations that can truly be called MAR (missing at random).

Research into the least biased way of estimating missing values in survey research has actually shown that when the questions are based on finite scales, other methods may be more appropriate. Hot decking is one of the best approaches for this, and some versions can even take multiple time points into account. Hot decking is also the method used by many of the very large-scale survey groups, such as the Census.

It's my understanding that hot decking works best when the sample size is fairly substantial. Note that hot decking is not supported natively in SPSS, but numerous macros exist for it, and they are fairly easy to use.
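
For anyone unfamiliar with the idea, here is a minimal sketch of a random within-cell hot deck, written in plain Python rather than as an SPSS macro. It is only an illustration under stated assumptions: the function name `hot_deck_impute`, the variable names (`sex`, `alc01`), and the toy data are all hypothetical, and the hot-deck variants used by large survey organisations are considerably more elaborate. Each missing value is filled with an observed value drawn at random from a donor in the same cell; cases with no donor are left missing.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, target, cell_keys, seed=0):
    """Return copies of records with missing `target` values (None)
    filled from a randomly chosen donor in the same cell.

    A cell is defined by the values of the variables in `cell_keys`.
    Records whose cell has no observed donor are left missing.
    """
    rng = random.Random(seed)

    # Collect the observed (donor) values for each cell.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            cell = tuple(rec[k] for k in cell_keys)
            donors[cell].append(rec[target])

    # Fill each missing value from a random donor in the same cell.
    completed = []
    for rec in records:
        new_rec = dict(rec)  # copy, so the input data are not modified
        if new_rec[target] is None:
            pool = donors.get(tuple(rec[k] for k in cell_keys))
            if pool:
                new_rec[target] = rng.choice(pool)
        completed.append(new_rec)
    return completed


# Hypothetical toy data: one alcohol item, cells defined by sex.
survey = [
    {"id": 1, "sex": "F", "alc01": 3},
    {"id": 2, "sex": "F", "alc01": None},
    {"id": 3, "sex": "M", "alc01": 1},
    {"id": 4, "sex": "M", "alc01": None},
    {"id": 5, "sex": "X", "alc01": None},  # no donor in this cell
]
completed = hot_deck_impute(survey, "alc01", ["sex"])
```

With so few donors the draw is trivial, which is the point about needing a fairly substantial sample: the larger the donor pool in each cell, the more the imputed values reflect the observed distribution. Participants who legitimately skipped the whole questionnaire would need to be filtered out before a step like this, since the sketch treats every `None` as a value to be imputed.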
Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois