SPSSX Discussion

Multiple imputation for different types of missing values

Classic

List

Threaded

19 messages Options

Kathryn Gardner

Multiple imputation for different types of missing values

Dear List,
I am running a multiple imputation on lots of questionnaire items and I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI because there are also randomly missing values across these alcohol items that do need imputing. At the moment all missing values are identified as system missing in the data file, and I thought there might be a way to get SPSS to only run the MI on certain types of missing values if I coded the ones I want to be ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated.

Kind regards,
Kathryn

David Marso

Re: Multiple imputation for different types of missing values

Administrator

"The only solution I could come up with was running the MI, then manually
scanning thousands of rows of data and deleting the imputed values on the
alcohol measure..."
Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!
"there must be a simpler way." Yes! It is called Syntax!
---
Assuming something like alcohol measure = alc01 to alc05. Imputed values
imp01 to imp05.
DO IF NVALID(alc01 TO alc05)=0.
DO REPEAT imp=imp01 TO imp05.
COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
missing */.
END REPEAT.
END IF.
---
OTOH: I don't have this module so not certain what the data come back with?
---
I suspect you actually end up with the raw non-imputed data at the top and
several imputed data sets below? Hopefully with some sort of consistent ID
variable (ID)
and some sort of imputation flag impflag (0 raw, 1 imputed)?.
COMPUTE @ordered@=$CASENUM.
SORT CASES BY ID impflag.
COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imp=imp01 TO imp05.
+ COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
missing */.
+ END REPEAT.
END IF.
SORT CASES BY @ordered@.

Kathryn Gardner wrote

>
> Dear List,I am running a multiple imputation on lots of questionnaire
> items and I'm trying to figure out a way to run the analysis, without
> imputing missing values for those participants who have missed out say all
> 5 items on an alcohol questionnaire because they were told to skip it if
> they do not drink alcohol. I don't want to exclude the alcohol measure
> entirely from the MI because there are also randomly missing values across
> these alcohol items that do need imputing. At the moment all missing
> values are identified as system missing in the data file, and I thought
> there might be a way to get SPSS to only run the MI on certain types of
> missing values if I coded the ones I want to be ignored as user missing,
> but this doesn't seem possible. The only solution I could come up with was
> running the MI, then manually scanning thousands of rows of data and
> deleting the imputed values on the alcohol measure for the participants
> who skipped the entire questionnaire. As you can imagine, this is taking
> hours! There must be a simpler way. Any advice greatly appreciated. Kind
> regards,Kathryn
>

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077083.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

David Marso

Re: Multiple imputation for different types of missing values

Administrator

In reply to this post by Kathryn Gardner

"The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure..."
Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!
"there must be a simpler way." Yes! It is called Syntax!
---
Assuming something like alcohol measure = alc01 to alc05. Imputed values imp01 to imp05.
DO IF NVALID(alc01 TO alc05)=0.
DO REPEAT imp=imp01 TO imp05.
COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as missing */.
END REPEAT.
END IF.
---
OTOH: I don't have this module so not certain what the data come back with?
---
I suspect you actually end up with the raw non-imputed data at the top and several imputed data sets below? Hopefully with some sort of consistent ID variable (ID)
and some sort of imputation flag impflag (0 raw, 1 imputed)?.
COMPUTE @ordered@=$CASENUM.
SORT CASES BY ID impflag.
COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imp=alc01 TO alc05.
+ COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as missing */.
+ END REPEAT.
END IF.
SORT CASES BY @ordered@.

Kathryn Gardner wrote

Dear List,I am running a multiple imputation on lots of questionnaire items and I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI because there are also randomly missing values across these alcohol items that do need imputing. At the moment all missing values are identified as system missing in the data file, and I thought there might be a way to get SPSS to only run the MI on certain types of missing values if I coded the ones I want to be ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated. Kind regards,Kathryn

Art Kendall

Re: Multiple imputation for different types of missing values

In reply to this post by Kathryn Gardner

I would like to hear from other list members, but imputing summative scale items via MI is often unnecessary.
You use the term items which often means the variables are meant to be used as part of a score so I am responding in that context.

Are you planning to distribute a public use data set that includes items, that includes only scales scores, or are you only working on you own data set for your use?

What is the goal of your project? finding totals, means, percents for a pop or for subpops? Or are you intending to compare and contrast groups? Or mainly interest in the relations of variables? Developing scales?

What is the response scale on the alcohol items? Are they intended to be repeated measures of a construct where the total or mean is used in analysis as the measure of a construct?

{would like to hear from other on this}
If score is to be used in analysis and the mean is the summative score, just use it.

If the score is to be used as a total e.g., for comparison to published norms, then a) do what the original authors of the scale did or b)
compute adjscore = sum valid items * (# of items in scale/# of items with valid values).
{end of part I would like to hear from other list members about.

Art Kendall
Social Research Consultants

On 12/15/2011 4:15 AM, Kathryn Gardner wrote:

Dear List,
I am running a multiple imputation on lots of questionnaire items and I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI because there are also randomly missing values across these alcohol items that do need imputing. At the moment all missing values are identified as system missing in the data file, and I thought there might be a way to get SPSS to only run the MI on certain types of missing values if I coded the ones I want to be ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated.

Kind regards,
Kathryn

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Art Kendall
Social Research Consultants

Maguin, Eugene

Re: Multiple imputation for different types of missing values

In reply to this post by Kathryn Gardner

Kathryn,

You don’t say what those five alcohol items are but I picture items about quantity and frequency and problems. Why wouldn’t you code a value of 0=never, 0=none, 0=etc for those items given a lead-in item response of ‘No use’. Unless you are prepared to assume that the alcohol lead-in question has a non-1.00 reliability, the response to those questions is 0.

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Kathryn Gardner
Sent: Thursday, December 15, 2011 4:16 AM
To: [hidden email]
Subject: Multiple imputation for different types of missing values

Kathryn Gardner

Re: Multiple imputation for different types of missing values

In reply to this post by Art Kendall

Dear Art,
I wanted to impute the missing values at the item level as I thought this was more sensitive and I can then use all data in reliability analyses. Once imputed, I'll be summing the items to create scale scores to represent various constructs (e.g., alcohol use, personality, emotion regulation) that will be used in in the main analyses 2 papers I am publishing (SEM for one paper and latent profile analysis for another paper). I thought using the mean as the summative score to deal with missing data is equally as bad as using mean substitution? Why is imputing summative scale items via MI is often unnecessary? I couldn't find anything on the debate as to whether multiple imputation should be used for scale items vs. computed subscale scores etc.
Kathryn

Date: Thu, 15 Dec 2011 07:11:01 -0500
From: [hidden email]
To: [hidden email]
CC: [hidden email]
Subject: Re: [SPSSX-L] Multiple imputation for different types of missing values

I would like to hear from other list members, but imputing summative scale items via MI is often unnecessary.
You use the term items which often means the variables are meant to be used as part of a score so I am responding in that context.

Are you planning to distribute a public use data set that includes items, that includes only scales scores, or are you only working on you own data set for your use?

What is the goal of your project? finding totals, means, percents for a pop or for subpops? Or are you intending to compare and contrast groups? Or mainly interest in the relations of variables? Developing scales?

What is the response scale on the alcohol items? Are they intended to be repeated measures of a construct where the total or mean is used in analysis as the measure of a construct?

{would like to hear from other on this}
If score is to be used in analysis and the mean is the summative score, just use it.

If the score is to be used as a total e.g., for comparison to published norms, then a) do what the original authors of the scale did or b)
compute adjscore = sum valid items * (# of items in scale/# of items with valid values).
{end of part I would like to hear from other list members about.

Art Kendall
Social Research Consultants

On 12/15/2011 4:15 AM, Kathryn Gardner wrote:

Dear List,
I am running a multiple imputation on lots of questionnaire items and I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI because there are also randomly missing values across these alcohol items that do need imputing. At the moment all missing values are identified as system missing in the data file, and I thought there might be a way to get SPSS to only run the MI on certain types of missing values if I coded the ones I want to be ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated.

Kind regards,
Kathryn

Kathryn Gardner

Re: Multiple imputation for different types of missing values

In reply to this post by David Marso

thanks David. I'm a bit of a novice when it comes to syntax and can only do basic stuff and am not clear about all of the commands. You are correct in that MI produces 1 data file with a variable called Imputation_ coded 0 (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second set of code to run. I changed the syntax so that impfag = imputation_, $casenum to ID, and also changed the name of my alcohol items. It partly runs and adds the variables @nukeme@ and @ordered@ to the data file, and then in the @nukeme@ column it's coded any case with missing data on the alcohol items as 1, and also coded as 1 those cases with the same ID number but whose data on CAGEAID_2 to CAGEAID_5 has been imputed. I thought it would change the latter to system missing values though?

COMPUTE @ordered@= ID.
SORT CASES BY ID imputation_.
COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE imputation_=$SYSMIS .
+ END REPEAT.
END IF.
SORT CASES BY @ordered@.

> Date: Thu, 15 Dec 2011 02:29:17 -0800

> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> "The only solution I could come up with was running the MI, then manually
> scanning thousands of rows of data and deleting the imputed values on the
> alcohol measure..."
> Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!
> "there must be a simpler way." Yes! It is called Syntax!
> ---
> Assuming something like alcohol measure = alc01 to alc05. Imputed values
> imp01 to imp05.
> DO IF NVALID(alc01 TO alc05)=0.
> DO REPEAT imp=imp01 TO imp05.
> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> missing */.
> END REPEAT.
> END IF.
> ---
> OTOH: I don't have this module so not certain what the data come back with?
> ---
> I suspect you actually end up with the raw non-imputed data at the top and
> several imputed data sets below? Hopefully with some sort of consistent ID
> variable (ID)
> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> COMPUTE @ordered@=$CASENUM.
> SORT CASES BY ID impflag.
> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> DO IF @nukeme@.
> + DO REPEAT imp=alc01 TO alc05.
> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> missing */.
> + END REPEAT.
> END IF.
> SORT CASES BY @ordered@.
>
>
>
>
>
> Kathryn Gardner wrote
> >
> > Dear List,I am running a multiple imputation on lots of questionnaire
> > items and I'm trying to figure out a way to run the analysis, without
> > imputing missing values for those participants who have missed out say all
> > 5 items on an alcohol questionnaire because they were told to skip it if
> > they do not drink alcohol. I don't want to exclude the alcohol measure
> > entirely from the MI because there are also randomly missing values across
> > these alcohol items that do need imputing. At the moment all missing
> > values are identified as system missing in the data file, and I thought
> > there might be a way to get SPSS to only run the MI on certain types of
> > missing values if I coded the ones I want to be ignored as user missing,
> > but this doesn't seem possible. The only solution I could come up with was
> > running the MI, then manually scanning thousands of rows of data and
> > deleting the imputed values on the alcohol measure for the participants
> > who skipped the entire questionnaire. As you can imagine, this is taking
> > hours! There must be a simpler way. Any advice greatly appreciated. Kind
> > regards,Kathryn
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

David Marso

Re: Multiple imputation for different types of missing values

Administrator

Kathryn ,
Looks like the line
+ DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE imputation_=$SYSMIS .
may be the culprit. Should be 2 lines (sans >). Also modified logical flag for (imputation_).
HTH, David
---
* Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to original order) .
COMPUTE @ordered@= $CASENUM.
SORT CASES BY ID imputation_.
COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
* Probably need following change
IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
* Following line was previously munged into DO REPEAT line * Possible received error mesage? *.
+ COMPUTE imputatedvar=$SYSMIS .
+ END REPEAT.
END IF.
* Restore data to order of imputed data sets.
SORT CASES BY @ordered@.

Kathryn Gardner wrote

thanks David. I'm a bit of a novice when it comes to syntax and can only do basic stuff and am not clear about all of the commands. You are correct in that MI produces 1 data file with a variable called Imputation_ coded 0 (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second set of code to run. I changed the syntax so that impfag = imputation_, $casenum to ID, and also changed the name of my alcohol items. It partly runs and adds the variables @nukeme@ and @ordered@ to the data file, and then in the @nukeme@ column it's coded any case with missing data on the alcohol items as 1, and also coded as 1 those cases with the same ID number but whose data on CAGEAID_2 to CAGEAID_5 has been imputed. I thought it would change the latter to system missing values though?

COMPUTE @ordered@= ID.
SORT CASES BY ID imputation_.
COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE imputation_=$SYSMIS .
+ END REPEAT.
END IF.
SORT CASES BY @ordered@.

> Date: Thu, 15 Dec 2011 02:29:17 -0800
> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> "The only solution I could come up with was running the MI, then manually
> scanning thousands of rows of data and deleting the imputed values on the
> alcohol measure..."
> Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!
> "there must be a simpler way." Yes! It is called Syntax!
> ---
> Assuming something like alcohol measure = alc01 to alc05. Imputed values
> imp01 to imp05.
> DO IF NVALID(alc01 TO alc05)=0.
> DO REPEAT imp=imp01 TO imp05.
> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> missing */.
> END REPEAT.
> END IF.
> ---
> OTOH: I don't have this module so not certain what the data come back with?
> ---
> I suspect you actually end up with the raw non-imputed data at the top and
> several imputed data sets below? Hopefully with some sort of consistent ID
> variable (ID)
> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> COMPUTE @ordered@=$CASENUM.
> SORT CASES BY ID impflag.
> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> DO IF @nukeme@.
> + DO REPEAT imp=alc01 TO alc05.
> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> missing */.
> + END REPEAT.
> END IF.
> SORT CASES BY @ordered@.
>
>
>
>
>
> Kathryn Gardner wrote
> >
> > Dear List,I am running a multiple imputation on lots of questionnaire
> > items and I'm trying to figure out a way to run the analysis, without
> > imputing missing values for those participants who have missed out say all
> > 5 items on an alcohol questionnaire because they were told to skip it if
> > they do not drink alcohol. I don't want to exclude the alcohol measure
> > entirely from the MI because there are also randomly missing values across
> > these alcohol items that do need imputing. At the moment all missing
> > values are identified as system missing in the data file, and I thought
> > there might be a way to get SPSS to only run the MI on certain types of
> > missing values if I coded the ones I want to be ignored as user missing,
> > but this doesn't seem possible. The only solution I could come up with was
> > running the MI, then manually scanning thousands of rows of data and
> > deleting the imputed values on the alcohol measure for the participants
> > who skipped the entire questionnaire. As you can imagine, this is taking
> > hours! There must be a simpler way. Any advice greatly appreciated. Kind
> > regards,Kathryn
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Bruce Weaver

Re: Multiple imputation for different types of missing values

Administrator

Kathryn, in your first post, you said, "I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol." This makes me wonder if there is another dichotomous variable in your file (Drinks_Alcohol - Y/N) that can be used instead of NVALID(cageaid_2 TO cageaid_5)= 0. Assuming there is such a variable, and it's coded 1=Y, 0=N, David's line that computes @nukeme@ could be changed to:

COMPUTE @nukeme@= NOT drinks_alcohol.

I think this is preferable, because for whatever reason (e.g., data entry error), someone might have zeros for cageaid_2 to cageaid_5, despite having a YES for drinks_alcohol. It's also simpler code.

HTH.

David Marso wrote

Kathryn ,
Looks like the line
+ DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE imputation_=$SYSMIS .
may be the culprit. Should be 2 lines (sans >). Also modified logical flag for (imputation_).
HTH, David
---
* Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to original order) .
COMPUTE @ordered@= $CASENUM.
SORT CASES BY ID imputation_.
COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
* Probably need following change
IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
* Following line was previously munged into DO REPEAT line * Possible received error mesage? *.
+ COMPUTE imputatedvar=$SYSMIS .
+ END REPEAT.
END IF.
* Restore data to order of imputed data sets.
SORT CASES BY @ordered@.

Kathryn Gardner wrote

thanks David. I'm a bit of a novice when it comes to syntax and can only do basic stuff and am not clear about all of the commands. You are correct in that MI produces 1 data file with a variable called Imputation_ coded 0 (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second set of code to run. I changed the syntax so that impfag = imputation_, $casenum to ID, and also changed the name of my alcohol items. It partly runs and adds the variables @nukeme@ and @ordered@ to the data file, and then in the @nukeme@ column it's coded any case with missing data on the alcohol items as 1, and also coded as 1 those cases with the same ID number but whose data on CAGEAID_2 to CAGEAID_5 has been imputed. I thought it would change the latter to system missing values though?

COMPUTE @ordered@= ID.
SORT CASES BY ID imputation_.
COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
DO IF @nukeme@.
+ DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE imputation_=$SYSMIS .
+ END REPEAT.
END IF.
SORT CASES BY @ordered@.

> Date: Thu, 15 Dec 2011 02:29:17 -0800
> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> "The only solution I could come up with was running the MI, then manually
> scanning thousands of rows of data and deleting the imputed values on the
> alcohol measure..."
> Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!
> "there must be a simpler way." Yes! It is called Syntax!
> ---
> Assuming something like alcohol measure = alc01 to alc05. Imputed values
> imp01 to imp05.
> DO IF NVALID(alc01 TO alc05)=0.
> DO REPEAT imp=imp01 TO imp05.
> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> missing */.
> END REPEAT.
> END IF.
> ---
> OTOH: I don't have this module so not certain what the data come back with?
> ---
> I suspect you actually end up with the raw non-imputed data at the top and
> several imputed data sets below? Hopefully with some sort of consistent ID
> variable (ID)
> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> COMPUTE @ordered@=$CASENUM.
> SORT CASES BY ID impflag.
> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> DO IF @nukeme@.
> + DO REPEAT imp=alc01 TO alc05.
> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> missing */.
> + END REPEAT.
> END IF.
> SORT CASES BY @ordered@.
>
>
>
>
>
> Kathryn Gardner wrote
> >
> > Dear List,I am running a multiple imputation on lots of questionnaire
> > items and I'm trying to figure out a way to run the analysis, without
> > imputing missing values for those participants who have missed out say all
> > 5 items on an alcohol questionnaire because they were told to skip it if
> > they do not drink alcohol. I don't want to exclude the alcohol measure
> > entirely from the MI because there are also randomly missing values across
> > these alcohol items that do need imputing. At the moment all missing
> > values are identified as system missing in the data file, and I thought
> > there might be a way to get SPSS to only run the MI on certain types of
> > missing values if I coded the ones I want to be ignored as user missing,
> > but this doesn't seem possible. The only solution I could come up with was
> > running the MI, then manually scanning thousands of rows of data and
> > deleting the imputed values on the alcohol measure for the participants
> > who skipped the entire questionnaire. As you can imagine, this is taking
> > hours! There must be a simpler way. Any advice greatly appreciated. Kind
> > regards,Kathryn
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Kathryn Gardner

Re: Multiple imputation for different types of missing values

In reply to this post by David Marso

I tried the new syntax and it seems to do something similar and produces the error messages below.
Just to clarify, should the syntax put system missing values in cageaid_2 to cageaid_5 when all of these have been imputed? Also, not sure if it makes a difference that some imputed values are negative (see 5 rows of imputed data for the same participant below).

-1    -1    1    -1
-1    0    -1    0
-1    -1    -1    0
1    0    1    0
-1    -1    0    0

52 COMPUTE @ordered@= $CASENUM.
53    SORT CASES BY ID imputation_.
54   COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5) = 0.
55    IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
56    DO IF @nukeme@.
57   + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
58   + COMPUTE imputatedvar=$SYSMIS .
59   + END REPEAT.
60    END IF.
61    SORT CASES BY @ordered@.

56 + LOOP has no effect on this command.
56 + The first word in the line is not recognized as an SPSS Statistics command.
57 + LOOP has no effect on this command.
57 + The first word in the line is not recognized as an SPSS Statistics command.
58 + LOOP has no effect on this command.
58 + The first word in the line is not recognized as an SPSS Statistics command.

> Date: Fri, 16 Dec 2011 05:15:37 -0800

> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> Kathryn ,
> Looks like the line
> + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> imputation_=$SYSMIS .
> may be the culprit. Should be 2 lines (sans >). Also modified logical flag
> for (imputation_).
> HTH, David
> ---
> * Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to
> original order) .
> COMPUTE @ordered@= $CASENUM.
> SORT CASES BY ID imputation_.
> COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> * Probably need following change
> IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
> DO IF @nukeme@.
> + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> * Following line was previously munged into DO REPEAT line * Possible
> received error mesage? *.
> + COMPUTE imputatedvar=$SYSMIS .
> + END REPEAT.
> END IF.
> * Restore data to order of imputed data sets.
> SORT CASES BY @ordered@.
>
>
> Kathryn Gardner wrote
> >
> > thanks David. I'm a bit of a novice when it comes to syntax and can only
> > do basic stuff and am not clear about all of the commands. You are correct
> > in that MI produces 1 data file with a variable called Imputation_ coded 0
> > (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second
> > set of code to run. I changed the syntax so that impfag = imputation_,
> > $casenum to ID, and also changed the name of my alcohol items. It partly
> > runs and adds the variables @nukeme@ and @ordered@ to the data file, and
> > then in the @nukeme@ column it's coded any case with missing data on the
> > alcohol items as 1, and also coded as 1 those cases with the same ID
> > number but whose data on CAGEAID_2 to CAGEAID_5 has been imputed. I
> > thought it would change the latter to system missing values though?
> >
> > COMPUTE @ordered@= ID.
> > SORT CASES BY ID imputation_.
> > COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> > IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
> > DO IF @nukeme@.
> > + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> > imputation_=$SYSMIS .
> > + END REPEAT.
> > END IF.
> > SORT CASES BY @ordered@.
> >
> >
> >> Date: Thu, 15 Dec 2011 02:29:17 -0800
> >> From: david.marso@
> >> Subject: Re: Multiple imputation for different types of missing values
> >> To: SPSSX-L@.UGA
> >>
> >> "The only solution I could come up with was running the MI, then manually
> >> scanning thousands of rows of data and deleting the imputed values on the
> >> alcohol measure..."
> >> Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!
> >> "there must be a simpler way." Yes! It is called Syntax!
> >> ---
> >> Assuming something like alcohol measure = alc01 to alc05. Imputed values
> >> imp01 to imp05.
> >> DO IF NVALID(alc01 TO alc05)=0.
> >> DO REPEAT imp=imp01 TO imp05.
> >> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> >> missing */.
> >> END REPEAT.
> >> END IF.
> >> ---
> >> OTOH: I don't have this module so not certain what the data come back
> >> with?
> >> ---
> >> I suspect you actually end up with the raw non-imputed data at the top
> >> and
> >> several imputed data sets below? Hopefully with some sort of consistent
> >> ID
> >> variable (ID)
> >> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> >> COMPUTE @ordered@=$CASENUM.
> >> SORT CASES BY ID impflag.
> >> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> >> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> >> DO IF @nukeme@.
> >> + DO REPEAT imp=alc01 TO alc05.
> >> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared later
> >> as
> >> missing */.
> >> + END REPEAT.
> >> END IF.
> >> SORT CASES BY @ordered@.
> >>
> >>
> >>
> >>
> >>
> >> Kathryn Gardner wrote
> >> >
> >> > Dear List,I am running a multiple imputation on lots of questionnaire
> >> > items and I'm trying to figure out a way to run the analysis, without
> >> > imputing missing values for those participants who have missed out say
> >> all
> >> > 5 items on an alcohol questionnaire because they were told to skip it
> >> if
> >> > they do not drink alcohol. I don't want to exclude the alcohol measure
> >> > entirely from the MI because there are also randomly missing values
> >> across
> >> > these alcohol items that do need imputing. At the moment all missing
> >> > values are identified as system missing in the data file, and I thought
> >> > there might be a way to get SPSS to only run the MI on certain types of
> >> > missing values if I coded the ones I want to be ignored as user
> >> missing,
> >> > but this doesn't seem possible. The only solution I could come up with
> >> was
> >> > running the MI, then manually scanning thousands of rows of data and
> >> > deleting the imputed values on the alcohol measure for the participants
> >> > who skipped the entire questionnaire. As you can imagine, this is
> >> taking
> >> > hours! There must be a simpler way. Any advice greatly appreciated.
> >> Kind
> >> > regards,Kathryn
> >> >
> >>
> >>
> >> --
> >> View this message in context:
> >> http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> >> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >>
> >> =====================
> >> To manage your subscription to SPSSX-L, send a message to
> >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> >> command. To leave the list, send the command
> >> SIGNOFF SPSSX-L
> >> For a list of commands to manage subscriptions, send the command
> >> INFO REFCARD
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080244.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

David Marso

Re: Multiple imputation for different types of missing values

Administrator

Kathryn ,
What is happening in lines 1-51 of the preceding prior to running the posted syntax?
All my code does is check to see if the non-imputed data are *ALL* missing for the specified variables
(cageaid_2 TO cageaid_5). If so it creates a flag @nukeme@. It then checks within the same ID and drags the flag into the imputed data sets. It should then clobber the specified variables for the specified variables (set $SYSMIS). It makes *ABSOLUTELY* no difference what the imputed values are. Only thing that matters is that the non-imputed are *ALL* missing. My question WHAT IS THE CONTEXT for:
"56 + LOOP has no effect on this command. "... i.e. WHAT IS GOING ON PRIOR TO RUNNING my posted syntax?
David

---

Kathryn Gardner wrote

I tried the new syntax and it seems to do something similar and produces the error messages below.
Just to clarify, should the syntax put system missing values in cageaid_2 to cageaid_5 when all of these have been imputed? Also, not sure if it makes a difference that some imputed values are negative (see 5 rows of imputed data for the same participant below).

-1 -1 1 -1
-1 0 -1 0
-1 -1 -1 0
1 0 1 0
-1 -1 0 0

52 COMPUTE @ordered@= $CASENUM.
53 SORT CASES BY ID imputation_.
54 COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5) = 0.
55 IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
56 DO IF @nukeme@.
57 + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
58 + COMPUTE imputatedvar=$SYSMIS .
59 + END REPEAT.
60 END IF.
61 SORT CASES BY @ordered@.

56 + LOOP has no effect on this command.
56 + The first word in the line is not recognized as an SPSS Statistics command.
57 + LOOP has no effect on this command.
57 + The first word in the line is not recognized as an SPSS Statistics command.
58 + LOOP has no effect on this command.
58 + The first word in the line is not recognized as an SPSS Statistics command.

> Date: Fri, 16 Dec 2011 05:15:37 -0800
> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> Kathryn ,
> Looks like the line
> + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> imputation_=$SYSMIS .
> may be the culprit. Should be 2 lines (sans >). Also modified logical flag
> for (imputation_).
> HTH, David
> ---
> * Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to
> original order) .
> COMPUTE @ordered@= $CASENUM.
> SORT CASES BY ID imputation_.
> COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> * Probably need following change
> IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
> DO IF @nukeme@.
> + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> * Following line was previously munged into DO REPEAT line * Possible
> received error mesage? *.
> + COMPUTE imputatedvar=$SYSMIS .
> + END REPEAT.
> END IF.
> * Restore data to order of imputed data sets.
> SORT CASES BY @ordered@.
>
>
> Kathryn Gardner wrote
> >
> > thanks David. I'm a bit of a novice when it comes to syntax and can only
> > do basic stuff and am not clear about all of the commands. You are correct
> > in that MI produces 1 data file with a variable called Imputation_ coded 0
> > (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second
> > set of code to run. I changed the syntax so that impfag = imputation_,
> > $casenum to ID, and also changed the name of my alcohol items. It partly
> > runs and adds the variables @nukeme@ and @ordered@ to the data file, and
> > then in the @nukeme@ column it's coded any case with missing data on the
> > alcohol items as 1, and also coded as 1 those cases with the same ID
> > number but whose data on CAGEAID_2 to CAGEAID_5 has been imputed. I
> > thought it would change the latter to system missing values though?
> >
> > COMPUTE @ordered@= ID.
> > SORT CASES BY ID imputation_.
> > COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> > IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
> > DO IF @nukeme@.
> > + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> > imputation_=$SYSMIS .
> > + END REPEAT.
> > END IF.
> > SORT CASES BY @ordered@.
> >
> >
> >> Date: Thu, 15 Dec 2011 02:29:17 -0800
> >> From: david.marso@
> >> Subject: Re: Multiple imputation for different types of missing values
> >> To: SPSSX-L@.UGA
> >>
> >> "The only solution I could come up with was running the MI, then manually
> >> scanning thousands of rows of data and deleting the imputed values on the
> >> alcohol measure..."
> >> Anytime you begin to manually scan thousands of rows... *STOP*! RETHINK!!
> >> "there must be a simpler way." Yes! It is called Syntax!
> >> ---
> >> Assuming something like alcohol measure = alc01 to alc05. Imputed values
> >> imp01 to imp05.
> >> DO IF NVALID(alc01 TO alc05)=0.
> >> DO REPEAT imp=imp01 TO imp05.
> >> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> >> missing */.
> >> END REPEAT.
> >> END IF.
> >> ---
> >> OTOH: I don't have this module so not certain what the data come back
> >> with?
> >> ---
> >> I suspect you actually end up with the raw non-imputed data at the top
> >> and
> >> several imputed data sets below? Hopefully with some sort of consistent
> >> ID
> >> variable (ID)
> >> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> >> COMPUTE @ordered@=$CASENUM.
> >> SORT CASES BY ID impflag.
> >> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> >> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> >> DO IF @nukeme@.
> >> + DO REPEAT imp=alc01 TO alc05.
> >> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared later
> >> as
> >> missing */.
> >> + END REPEAT.
> >> END IF.
> >> SORT CASES BY @ordered@.
> >>
> >>
> >>
> >>
> >>
> >> Kathryn Gardner wrote
> >> >
> >> > Dear List,I am running a multiple imputation on lots of questionnaire
> >> > items and I'm trying to figure out a way to run the analysis, without
> >> > imputing missing values for those participants who have missed out say
> >> all
> >> > 5 items on an alcohol questionnaire because they were told to skip it
> >> if
> >> > they do not drink alcohol. I don't want to exclude the alcohol measure
> >> > entirely from the MI because there are also randomly missing values
> >> across
> >> > these alcohol items that do need imputing. At the moment all missing
> >> > values are identified as system missing in the data file, and I thought
> >> > there might be a way to get SPSS to only run the MI on certain types of
> >> > missing values if I coded the ones I want to be ignored as user
> >> missing,
> >> > but this doesn't seem possible. The only solution I could come up with
> >> was
> >> > running the MI, then manually scanning thousands of rows of data and
> >> > deleting the imputed values on the alcohol measure for the participants
> >> > who skipped the entire questionnaire. As you can imagine, this is
> >> taking
> >> > hours! There must be a simpler way. Any advice greatly appreciated.
> >> Kind
> >> > regards,Kathryn
> >> >
> >>
> >>
> >> --
> >> View this message in context:
> >> http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> >> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >>
> >> =====================
> >> To manage your subscription to SPSSX-L, send a message to
> >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> >> command. To leave the list, send the command
> >> SIGNOFF SPSSX-L
> >> For a list of commands to manage subscriptions, send the command
> >> INFO REFCARD
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080244.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Kathryn Gardner

Re: Multiple imputation for different types of missing values

Hi David,
Lines 1-51 were empty (I had some syntax there but deleted it).

I see what you mean now about how the syntax works. I think i can use this now actually, recoding all cageaid items to system missing IF @nukeme@ = 1. Many thanks for this it's a great help!
Kathryn

> Date: Fri, 16 Dec 2011 07:04:42 -0800

> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> Kathryn ,
> What is happening in lines 1-51 of the preceding prior to running the posted
> syntax?
> All my code does is check to see if the non-imputed data are *ALL* missing
> for the specified variables
> (cageaid_2 TO cageaid_5). If so it creates a flag @nukeme@. It then checks
> within the same ID and drags the flag into the imputed data sets. It should
> then clobber the specified variables for the specified variables (set
> $SYSMIS). It makes *ABSOLUTELY* no difference what the imputed values are.
> Only thing that matters is that the non-imputed are *ALL* missing. My
> question WHAT IS THE CONTEXT for:
> "56 + LOOP has no effect on this command. "... i.e. WHAT IS GOING ON PRIOR
> TO RUNNING my posted syntax?
> David
>
> ---
>
> Kathryn Gardner wrote
> >
> > I tried the new syntax and it seems to do something similar and produces
> > the error messages below.
> > Just to clarify, should the syntax put system missing values in cageaid_2
> > to cageaid_5 when all of these have been imputed? Also, not sure if it
> > makes a difference that some imputed values are negative (see 5 rows of
> > imputed data for the same participant below).
> >
> > -1 -1 1 -1
> > -1 0 -1 0
> > -1 -1 -1 0
> > 1 0 1 0
> > -1 -1 0 0
> >
> > 52 COMPUTE @ordered@= $CASENUM.
> > 53 SORT CASES BY ID imputation_.
> > 54 COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5) = 0.
> > 55 IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
> > 56 DO IF @nukeme@.
> > 57 + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> > 58 + COMPUTE imputatedvar=$SYSMIS .
> > 59 + END REPEAT.
> > 60 END IF.
> > 61 SORT CASES BY @ordered@.
> >
> > 56 + LOOP has no effect on this command.
> > 56 + The first word in the line is not recognized as an SPSS Statistics
> > command.
> > 57 + LOOP has no effect on this command.
> > 57 + The first word in the line is not recognized as an SPSS Statistics
> > command.
> > 58 + LOOP has no effect on this command.
> > 58 + The first word in the line is not recognized as an SPSS Statistics
> > command.
> >
> >
> >
> >
> >
> >> Date: Fri, 16 Dec 2011 05:15:37 -0800
> >> From: david.marso@
> >> Subject: Re: Multiple imputation for different types of missing values
> >> To: SPSSX-L@.UGA
> >>
> >> Kathryn ,
> >> Looks like the line
> >> + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> >> imputation_=$SYSMIS .
> >> may be the culprit. Should be 2 lines (sans >). Also modified logical
> >> flag
> >> for (imputation_).
> >> HTH, David
> >> ---
> >> * Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to
> >> original order) .
> >> COMPUTE @ordered@= $CASENUM.
> >> SORT CASES BY ID imputation_.
> >> COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> >> * Probably need following change
> >> IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
> >> DO IF @nukeme@.
> >> + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> >> * Following line was previously munged into DO REPEAT line * Possible
> >> received error mesage? *.
> >> + COMPUTE imputatedvar=$SYSMIS .
> >> + END REPEAT.
> >> END IF.
> >> * Restore data to order of imputed data sets.
> >> SORT CASES BY @ordered@.
> >>
> >>
> >> Kathryn Gardner wrote
> >> >
> >> > thanks David. I'm a bit of a novice when it comes to syntax and can
> >> only
> >> > do basic stuff and am not clear about all of the commands. You are
> >> correct
> >> > in that MI produces 1 data file with a variable called Imputation_
> >> coded 0
> >> > (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second
> >> > set of code to run. I changed the syntax so that impfag = imputation_,
> >> > $casenum to ID, and also changed the name of my alcohol items. It
> >> partly
> >> > runs and adds the variables @nukeme@ and @ordered@ to the data file,
> >> and
> >> > then in the @nukeme@ column it's coded any case with missing data on
> >> the
> >> > alcohol items as 1, and also coded as 1 those cases with the same ID
> >> > number but whose data on CAGEAID_2 to CAGEAID_5 has been imputed. I
> >> > thought it would change the latter to system missing values though?
> >> >
> >> > COMPUTE @ordered@= ID.
> >> > SORT CASES BY ID imputation_.
> >> > COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> >> > IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
> >> > DO IF @nukeme@.
> >> > + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> >> > imputation_=$SYSMIS .
> >> > + END REPEAT.
> >> > END IF.
> >> > SORT CASES BY @ordered@.
> >> >
> >> >
> >> >> Date: Thu, 15 Dec 2011 02:29:17 -0800
> >> >> From: david.marso@
> >> >> Subject: Re: Multiple imputation for different types of missing values
> >> >> To: SPSSX-L@.UGA
> >> >>
> >> >> "The only solution I could come up with was running the MI, then
> >> manually
> >> >> scanning thousands of rows of data and deleting the imputed values on
> >> the
> >> >> alcohol measure..."
> >> >> Anytime you begin to manually scan thousands of rows... *STOP*!
> >> RETHINK!!
> >> >> "there must be a simpler way." Yes! It is called Syntax!
> >> >> ---
> >> >> Assuming something like alcohol measure = alc01 to alc05. Imputed
> >> values
> >> >> imp01 to imp05.
> >> >> DO IF NVALID(alc01 TO alc05)=0.
> >> >> DO REPEAT imp=imp01 TO imp05.
> >> >> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> >> >> missing */.
> >> >> END REPEAT.
> >> >> END IF.
> >> >> ---
> >> >> OTOH: I don't have this module so not certain what the data come back
> >> >> with?
> >> >> ---
> >> >> I suspect you actually end up with the raw non-imputed data at the top
> >> >> and
> >> >> several imputed data sets below? Hopefully with some sort of
> >> consistent
> >> >> ID
> >> >> variable (ID)
> >> >> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> >> >> COMPUTE @ordered@=$CASENUM.
> >> >> SORT CASES BY ID impflag.
> >> >> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> >> >> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> >> >> DO IF @nukeme@.
> >> >> + DO REPEAT imp=alc01 TO alc05.
> >> >> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared
> >> later
> >> >> as
> >> >> missing */.
> >> >> + END REPEAT.
> >> >> END IF.
> >> >> SORT CASES BY @ordered@.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Kathryn Gardner wrote
> >> >> >
> >> >> > Dear List,I am running a multiple imputation on lots of
> >> questionnaire
> >> >> > items and I'm trying to figure out a way to run the analysis,
> >> without
> >> >> > imputing missing values for those participants who have missed out
> >> say
> >> >> all
> >> >> > 5 items on an alcohol questionnaire because they were told to skip
> >> it
> >> >> if
> >> >> > they do not drink alcohol. I don't want to exclude the alcohol
> >> measure
> >> >> > entirely from the MI because there are also randomly missing values
> >> >> across
> >> >> > these alcohol items that do need imputing. At the moment all missing
> >> >> > values are identified as system missing in the data file, and I
> >> thought
> >> >> > there might be a way to get SPSS to only run the MI on certain types
> >> of
> >> >> > missing values if I coded the ones I want to be ignored as user
> >> >> missing,
> >> >> > but this doesn't seem possible. The only solution I could come up
> >> with
> >> >> was
> >> >> > running the MI, then manually scanning thousands of rows of data and
> >> >> > deleting the imputed values on the alcohol measure for the
> >> participants
> >> >> > who skipped the entire questionnaire. As you can imagine, this is
> >> >> taking
> >> >> > hours! There must be a simpler way. Any advice greatly appreciated.
> >> >> Kind
> >> >> > regards,Kathryn
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >>
> >> http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> >> >> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >> >>
> >> >> =====================
> >> >> To manage your subscription to SPSSX-L, send a message to
> >> >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> >> >> command. To leave the list, send the command
> >> >> SIGNOFF SPSSX-L
> >> >> For a list of commands to manage subscriptions, send the command
> >> >> INFO REFCARD
> >> >
> >>
> >>
> >> --
> >> View this message in context:
> >> http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080244.html
> >> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >>
> >> =====================
> >> To manage your subscription to SPSSX-L, send a message to
> >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> >> command. To leave the list, send the command
> >> SIGNOFF SPSSX-L
> >> For a list of commands to manage subscriptions, send the command
> >> INFO REFCARD
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080477.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Kathryn Gardner

Re: Multiple imputation for different types of missing values

In reply to this post by Bruce Weaver

Hi Bruce,
You are correct in that I do have a question that asks whether they drink alcohol or not (YES/NO), and some participants have actually answered no but then proceeded to answer the cageaid questions, but others have said yes and then scored 0 on the cageaid questions. I think I can re-work David's code here, but I do have other questionnaires where participants have missed out a set of 10 questions because they ask about their father and these were skipped if they were not raise by their father (I didn't include an initial Yes/NO question here though so can easily adapt David's code).

Thank you for your suggestion.
Kathryn

> Date: Fri, 16 Dec 2011 05:32:31 -0800

> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> Kathryn, in your first post, you said, "I'm trying to figure out a way to run
> the analysis, without imputing missing values for those participants who
> have missed out say all 5 items on an alcohol questionnaire because they
> were told to skip it if they do not drink alcohol." This makes me wonder if
> there is another dichotomous variable in your file (Drinks_Alcohol - Y/N)
> that can be used instead of NVALID(cageaid_2 TO cageaid_5)= 0. Assuming
> there is such a variable, and it's coded 1=Y, 0=N, David's line that
> computes @nukeme@ could be changed to:
>
> COMPUTE @nukeme@= NOT drinks_alcohol.
>
> I think this is preferable, because for whatever reason (e.g., data entry
> error), someone might have zeros for cageaid_2 to cageaid_5, despite having
> a YES for drinks_alcohol. It's also simpler code.
>
> HTH.
>
>
>
> David Marso wrote
> >
> > Kathryn ,
> > Looks like the line
> > + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> > imputation_=$SYSMIS .
> > may be the culprit. Should be 2 lines (sans >). Also modified logical
> > flag for (imputation_).
> > HTH, David
> > ---
> > * Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to
> > original order) .
> > COMPUTE @ordered@= $CASENUM.
> > SORT CASES BY ID imputation_.
> > COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> > * Probably need following change
> > IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
> > DO IF @nukeme@.
> > + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> > * Following line was previously munged into DO REPEAT line * Possible
> > received error mesage? *.
> > + COMPUTE imputatedvar=$SYSMIS .
> > + END REPEAT.
> > END IF.
> > * Restore data to order of imputed data sets.
> > SORT CASES BY @ordered@.
> >
> >
> > Kathryn Gardner wrote
> >>
> >> thanks David. I'm a bit of a novice when it comes to syntax and can only
> >> do basic stuff and am not clear about all of the commands. You are
> >> correct in that MI produces 1 data file with a variable called
> >> Imputation_ coded 0 (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried
> >> to get the second set of code to run. I changed the syntax so that impfag
> >> = imputation_, $casenum to ID, and also changed the name of my alcohol
> >> items. It partly runs and adds the variables @nukeme@ and @ordered@ to
> >> the data file, and then in the @nukeme@ column it's coded any case with
> >> missing data on the alcohol items as 1, and also coded as 1 those cases
> >> with the same ID number but whose data on CAGEAID_2 to CAGEAID_5 has been
> >> imputed. I thought it would change the latter to system missing values
> >> though?
> >>
> >> COMPUTE @ordered@= ID.
> >> SORT CASES BY ID imputation_.
> >> COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> >> IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
> >> DO IF @nukeme@.
> >> + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> >> imputation_=$SYSMIS .
> >> + END REPEAT.
> >> END IF.
> >> SORT CASES BY @ordered@.
> >>
> >>
> >>> Date: Thu, 15 Dec 2011 02:29:17 -0800
> >>> From: david.marso@
> >>> Subject: Re: Multiple imputation for different types of missing values
> >>> To: SPSSX-L@.UGA
> >>>
> >>> "The only solution I could come up with was running the MI, then
> >>> manually
> >>> scanning thousands of rows of data and deleting the imputed values on
> >>> the
> >>> alcohol measure..."
> >>> Anytime you begin to manually scan thousands of rows... *STOP*!
> >>> RETHINK!!
> >>> "there must be a simpler way." Yes! It is called Syntax!
> >>> ---
> >>> Assuming something like alcohol measure = alc01 to alc05. Imputed
> >>> values
> >>> imp01 to imp05.
> >>> DO IF NVALID(alc01 TO alc05)=0.
> >>> DO REPEAT imp=imp01 TO imp05.
> >>> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> >>> missing */.
> >>> END REPEAT.
> >>> END IF.
> >>> ---
> >>> OTOH: I don't have this module so not certain what the data come back
> >>> with?
> >>> ---
> >>> I suspect you actually end up with the raw non-imputed data at the top
> >>> and
> >>> several imputed data sets below? Hopefully with some sort of consistent
> >>> ID
> >>> variable (ID)
> >>> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> >>> COMPUTE @ordered@=$CASENUM.
> >>> SORT CASES BY ID impflag.
> >>> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> >>> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> >>> DO IF @nukeme@.
> >>> + DO REPEAT imp=alc01 TO alc05.
> >>> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared later
> >>> as
> >>> missing */.
> >>> + END REPEAT.
> >>> END IF.
> >>> SORT CASES BY @ordered@.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Kathryn Gardner wrote
> >>> >
> >>> > Dear List,I am running a multiple imputation on lots of questionnaire
> >>> > items and I'm trying to figure out a way to run the analysis, without
> >>> > imputing missing values for those participants who have missed out say
> >>> all
> >>> > 5 items on an alcohol questionnaire because they were told to skip it
> >>> if
> >>> > they do not drink alcohol. I don't want to exclude the alcohol measure
> >>> > entirely from the MI because there are also randomly missing values
> >>> across
> >>> > these alcohol items that do need imputing. At the moment all missing
> >>> > values are identified as system missing in the data file, and I
> >>> thought
> >>> > there might be a way to get SPSS to only run the MI on certain types
> >>> of
> >>> > missing values if I coded the ones I want to be ignored as user
> >>> missing,
> >>> > but this doesn't seem possible. The only solution I could come up with
> >>> was
> >>> > running the MI, then manually scanning thousands of rows of data and
> >>> > deleting the imputed values on the alcohol measure for the
> >>> participants
> >>> > who skipped the entire questionnaire. As you can imagine, this is
> >>> taking
> >>> > hours! There must be a simpler way. Any advice greatly appreciated.
> >>> Kind
> >>> > regards,Kathryn
> >>> >
> >>>
> >>>
> >>> --
> >>> View this message in context:
> >>> http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> >>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >>>
> >>> =====================
> >>> To manage your subscription to SPSSX-L, send a message to
> >>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> >>> command. To leave the list, send the command
> >>> SIGNOFF SPSSX-L
> >>> For a list of commands to manage subscriptions, send the command
> >>> INFO REFCARD
> >>
> >
>
>
> -----
> --
> Bruce Weaver
> [hidden email]
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080275.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Art Kendall

Re: Multiple imputation for different types of missing values

In reply to this post by Kathryn Gardner

recoding all cageaid items to user missing value -5 IF @nukeme@ = 1.
Try to avoid having sysmis on the right hand side of an assignment.
The values are missing because you said they should be. They are not missing due to SPSS being unable to follow your instruction.

Having item values missing
-1 because they are not applicable
-2 the respondent drinks but did not answer any of those questions
-3 answered but does not drink
-4 answered this one but had to impute more than other 2 items in this scale
-5 nuked
-6 ...
vs
$sysmis SPSS could not obey your instructions for reading or transforming

Art Kendall
Social Research Consultants

On 12/16/2011 11:14 AM, Kathryn Gardner wrote:

Hi David,
Lines 1-51 were empty (I had some syntax there but deleted it).

I see what you mean now about how the syntax works. I think i can use this now actually, recoding all cageaid items to system missing IF @nukeme@ = 1. Many thanks for this it's a great help!
Kathryn

> Date: Fri, 16 Dec 2011 07:04:42 -0800
> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> Kathryn ,
> What is happening in lines 1-51 of the preceding prior to running the posted
> syntax?
> All my code does is check to see if the non-imputed data are *ALL* missing
> for the specified variables
> (cageaid_2 TO cageaid_5). If so it creates a flag @nukeme@. It then checks
> within the same ID and drags the flag into the imputed data sets. It should
> then clobber the specified variables for the specified variables (set
> $SYSMIS). It makes *ABSOLUTELY* no difference what the imputed values are.
> Only thing that matters is that the non-imputed are *ALL* missing. My
> question WHAT IS THE CONTEXT for:
> "56 + LOOP has no effect on this command. "... i.e. WHAT IS GOING ON PRIOR
> TO RUNNING my posted syntax?
> David
>
> ---
>
> Kathryn Gardner wrote
> >
> > I tried the new syntax and it seems to do something similar and produces
> > the error messages below.
> > Just to clarify, should the syntax put system missing values in cageaid_2
> > to cageaid_5 when all of these have been imputed? Also, not sure if it
> > makes a difference that some imputed values are negative (see 5 rows of
> > imputed data for the same participant below).
> >
> > -1 -1 1 -1
> > -1 0 -1 0
> > -1 -1 -1 0
> > 1 0 1 0
> > -1 -1 0 0
> >
> > 52 COMPUTE @ordered@= $CASENUM.
> > 53 SORT CASES BY ID imputation_.
> > 54 COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5) = 0.
> > 55 IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
> > 56 DO IF @nukeme@.
> > 57 + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> > 58 + COMPUTE imputatedvar=$SYSMIS .
> > 59 + END REPEAT.
> > 60 END IF.
> > 61 SORT CASES BY @ordered@.
> >
> > 56 + LOOP has no effect on this command.
> > 56 + The first word in the line is not recognized as an SPSS Statistics
> > command.
> > 57 + LOOP has no effect on this command.
> > 57 + The first word in the line is not recognized as an SPSS Statistics
> > command.
> > 58 + LOOP has no effect on this command.
> > 58 + The first word in the line is not recognized as an SPSS Statistics
> > command.
> >
> >
> >
> >
> >
> >> Date: Fri, 16 Dec 2011 05:15:37 -0800
> >> From: david.marso@
> >> Subject: Re: Multiple imputation for different types of missing values
> >> To: [hidden email]
> >>
> >> Kathryn ,
> >> Looks like the line
> >> + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> >> imputation_=$SYSMIS .
> >> may be the culprit. Should be 2 lines (sans >). Also modified logical
> >> flag
> >> for (imputation_).
> >> HTH, David
> >> ---
> >> * Do *NOT* change $CASENUM to ID!!! (This is to enable restoration to
> >> original order) .
> >> COMPUTE @ordered@= $CASENUM.
> >> SORT CASES BY ID imputation_.
> >> COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> >> * Probably need following change
> >> IF ID=LAG(ID) AND ( imputation_ GE 1) @nukeme@=LAG(@nukeme@).
> >> DO IF @nukeme@.
> >> + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> >> * Following line was previously munged into DO REPEAT line * Possible
> >> received error mesage? *.
> >> + COMPUTE imputatedvar=$SYSMIS .
> >> + END REPEAT.
> >> END IF.
> >> * Restore data to order of imputed data sets.
> >> SORT CASES BY @ordered@.
> >>
> >>
> >> Kathryn Gardner wrote
> >> >
> >> > thanks David. I'm a bit of a novice when it comes to syntax and can
> >> only
> >> > do basic stuff and am not clear about all of the commands. You are
> >> correct
> >> > in that MI produces 1 data file with a variable called Imputation_
> >> coded 0
> >> > (raw), 1, 2, 3, 4, 5 for 5 imputed data sets. I tried to get the second
> >> > set of code to run. I changed the syntax so that impfag = imputation_,
> >> > $casenum to ID, and also changed the name of my alcohol items. It
> >> partly
> >> > runs and adds the variables @nukeme@ and @ordered@ to the data file,
> >> and
> >> > then in the @nukeme@ column it's coded any case with missing data on
> >> the
> >> > alcohol items as 1, and also coded as 1 those cases with the same ID
> >> > number but whose data on CAGEAID_2 to CAGEAID_5 has been imputed. I
> >> > thought it would change the latter to system missing values though?
> >> >
> >> > COMPUTE @ordered@= ID.
> >> > SORT CASES BY ID imputation_.
> >> > COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)= 0.
> >> > IF ID=LAG(ID) AND imputation_ @nukeme@=LAG(@nukeme@).
> >> > DO IF @nukeme@.
> >> > + DO REPEAT imputation_=cageaid_2 TO cageaid_5.> + COMPUTE
> >> > imputation_=$SYSMIS .
> >> > + END REPEAT.
> >> > END IF.
> >> > SORT CASES BY @ordered@.
> >> >
> >> >
> >> >> Date: Thu, 15 Dec 2011 02:29:17 -0800
> >> >> From: david.marso@
> >> >> Subject: Re: Multiple imputation for different types of missing values
> >> >> To: [hidden email]
> >> >>
> >> >> "The only solution I could come up with was running the MI, then
> >> manually
> >> >> scanning thousands of rows of data and deleting the imputed values on
> >> the
> >> >> alcohol measure..."
> >> >> Anytime you begin to manually scan thousands of rows... *STOP*!
> >> RETHINK!!
> >> >> "there must be a simpler way." Yes! It is called Syntax!
> >> >> ---
> >> >> Assuming something like alcohol measure = alc01 to alc05. Imputed
> >> values
> >> >> imp01 to imp05.
> >> >> DO IF NVALID(alc01 TO alc05)=0.
> >> >> DO REPEAT imp=imp01 TO imp05.
> >> >> COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> >> >> missing */.
> >> >> END REPEAT.
> >> >> END IF.
> >> >> ---
> >> >> OTOH: I don't have this module so not certain what the data come back
> >> >> with?
> >> >> ---
> >> >> I suspect you actually end up with the raw non-imputed data at the top
> >> >> and
> >> >> several imputed data sets below? Hopefully with some sort of
> >> consistent
> >> >> ID
> >> >> variable (ID)
> >> >> and some sort of imputation flag impflag (0 raw, 1 imputed)?.
> >> >> COMPUTE @ordered@=$CASENUM.
> >> >> SORT CASES BY ID impflag.
> >> >> COMPUTE @nukeme@=NVALID(alc01 TO alc05)=0.
> >> >> IF ID=LAG(ID) AND impflag @nukeme@=LAG(@nukeme@).
> >> >> DO IF @nukeme@.
> >> >> + DO REPEAT imp=alc01 TO alc05.
> >> >> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared
> >> later
> >> >> as
> >> >> missing */.
> >> >> + END REPEAT.
> >> >> END IF.
> >> >> SORT CASES BY @ordered@.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Kathryn Gardner wrote
> >> >> >
> >> >> > Dear List,I am running a multiple imputation on lots of
> >> questionnaire
> >> >> > items and I'm trying to figure out a way to run the analysis,
> >> without
> >> >> > imputing missing values for those participants who have missed out
> >> say
> >> >> all
> >> >> > 5 items on an alcohol questionnaire because they were told to skip
> >> it
> >> >> if
> >> >> > they do not drink alcohol. I don't want to exclude the alcohol
> >> measure
> >> >> > entirely from the MI because there are also randomly missing values
> >> >> across
> >> >> > these alcohol items that do need imputing. At the moment all missing
> >> >> > values are identified as system missing in the data file, and I
> >> thought
> >> >> > there might be a way to get SPSS to only run the MI on certain types
> >> of
> >> >> > missing values if I coded the ones I want to be ignored as user
> >> >> missing,
> >> >> > but this doesn't seem possible. The only solution I could come up
> >> with
> >> >> was
> >> >> > running the MI, then manually scanning thousands of rows of data and
> >> >> > deleting the imputed values on the alcohol measure for the
> >> participants
> >> >> > who skipped the entire questionnaire. As you can imagine, this is
> >> >> taking
> >> >> > hours! There must be a simpler way. Any advice greatly appreciated.
> >> >> Kind
> >> >> > regards,Kathryn
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >>
> >> http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> >> >> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >> >>
> >> >> =====================
> >> >> To manage your subscription to SPSSX-L, send a message to
> >> >> [hidden email] (not to SPSSX-L), with no body text except the
> >> >> command. To leave the list, send the command
> >> >> SIGNOFF SPSSX-L
> >> >> For a list of commands to manage subscriptions, send the command
> >> >> INFO REFCARD
> >> >
> >>
> >>
> >> --
> >> View this message in context:
> >> http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080244.html
> >> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >>
> >> =====================
> >> To manage your subscription to SPSSX-L, send a message to
> >> [hidden email] (not to SPSSX-L), with no body text except the
> >> command. To leave the list, send the command
> >> SIGNOFF SPSSX-L
> >> For a list of commands to manage subscriptions, send the command
> >> INFO REFCARD
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080477.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Art Kendall
Social Research Consultants

David Marso

Re: Multiple imputation for different types of missing values

Administrator

Certainly sage advice:
Notice in my original post:
+ COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as missing */.
I did *NOT* wish to be bothered with discerning or assuming what might constitute valid/missing values for the resulting variables. Sometimes within the context of a newsgroup posting I will follow the path of least resistance even though I would scarcely utilize such in my own production code.
Recall in the original query:
"At the moment all missing values are identified as system missing in the data file.".
...
Perhaps *NOT* best practices applied here, OTOH: people pay me to apply them.
Free advice is cheap and sometime worth what one pays for it (probably *ALOT* more but YMMV ;-) .
--

Art Kendall wrote

recoding all cageaid items to user missing value -5 IF @nukeme@ =
1.
Try to avoid having sysmis on the right hand side of
an assignment.
The values are missing because you said they should be. They are
not missing due to SPSS being unable to follow your instruction.

Having item values missing
-1 because they are not applicable
-2 the respondent drinks but did not answer any of those
questions
-3 answered but does not drink
-4 answered this one but had to impute more than other 2 items in
this scale
-5 nuked
-6 ...
vs
$sysmis SPSS could not obey your instructions for reading or
transforming

Art Kendall
Social Research Consultants

On 12/16/2011 11:14 AM, Kathryn Gardner wrote:

Hi David,
Lines 1-51 were empty (I had some syntax there but deleted it).

I see what you mean now about how the syntax works. I think i
can use this now actually, recoding all cageaid items to system
missing IF @nukeme@ = 1. Many thanks for this it's a great help!
Kathryn

> Date: Fri, 16 Dec 2011 07:04:42 -0800
> From: [hidden email]
> Subject: Re: Multiple imputation for different types of
missing values
> To: [hidden email]
>
> Kathryn ,
> What is happening in lines 1-51 of the preceding prior to
running the posted
> syntax?
> All my code does is check to see if the non-imputed data
are *ALL* missing
> for the specified variables
> (cageaid_2 TO cageaid_5). If so it creates a flag
@nukeme@. It then checks
> within the same ID and drags the flag into the imputed
data sets. It should
> then clobber the specified variables for the specified
variables (set
> $SYSMIS). It makes *ABSOLUTELY* no difference what the
imputed values are.
> Only thing that matters is that the non-imputed are *ALL*
missing. My
> question WHAT IS THE CONTEXT for:
> "56 + LOOP has no effect on this command. "... i.e. WHAT
IS GOING ON PRIOR
> TO RUNNING my posted syntax?
> David
>
> ---
>
> Kathryn Gardner wrote
> >
> > I tried the new syntax and it seems to do something
similar and produces
> > the error messages below.
> > Just to clarify, should the syntax put system
missing values in cageaid_2
> > to cageaid_5 when all of these have been imputed?
Also, not sure if it
> > makes a difference that some imputed values are
negative (see 5 rows of
> > imputed data for the same participant below).
> >
> > -1 -1 1 -1
> > -1 0 -1 0
> > -1 -1 -1 0
> > 1 0 1 0
> > -1 -1 0 0
> >
> > 52 COMPUTE @ordered@= $CASENUM.
> > 53 SORT CASES BY ID imputation_.
> > 54 COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5) =
0.
> > 55 IF ID=LAG(ID) AND ( imputation_ GE 1)
@nukeme@=LAG(@nukeme@).
> > 56 DO IF @nukeme@.
> > 57 + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> > 58 + COMPUTE imputatedvar=$SYSMIS .
> > 59 + END REPEAT.
> > 60 END IF.
> > 61 SORT CASES BY @ordered@.
> >
> > 56 + LOOP has no effect on this command.
> > 56 + The first word in the line is not recognized as
an SPSS Statistics
> > command.
> > 57 + LOOP has no effect on this command.
> > 57 + The first word in the line is not recognized as
an SPSS Statistics
> > command.
> > 58 + LOOP has no effect on this command.
> > 58 + The first word in the line is not recognized as
an SPSS Statistics
> > command.
> >
> >
> >
> >
> >
> >> Date: Fri, 16 Dec 2011 05:15:37 -0800
> >> From: david.marso@
> >> Subject: Re: Multiple imputation for different
types of missing values
> >> To: SPSSX-L@.UGA
> >>
> >> Kathryn ,
> >> Looks like the line
> >> + DO REPEAT imputation_=cageaid_2 TO
cageaid_5.> + COMPUTE
> >> imputation_=$SYSMIS .
> >> may be the culprit. Should be 2 lines (sans
>). Also modified logical
> >> flag
> >> for (imputation_).
> >> HTH, David
> >> ---
> >> * Do *NOT* change $CASENUM to ID!!! (This is to
enable restoration to
> >> original order) .
> >> COMPUTE @ordered@= $CASENUM.
> >> SORT CASES BY ID imputation_.
> >> COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)=
0.
> >> * Probably need following change
> >> IF ID=LAG(ID) AND ( imputation_ GE 1)
@nukeme@=LAG(@nukeme@).
> >> DO IF @nukeme@.
> >> + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> >> * Following line was previously munged into DO
REPEAT line * Possible
> >> received error mesage? *.
> >> + COMPUTE imputatedvar=$SYSMIS .
> >> + END REPEAT.
> >> END IF.
> >> * Restore data to order of imputed data sets.
> >> SORT CASES BY @ordered@.
> >>
> >>
> >> Kathryn Gardner wrote
> >> >
> >> > thanks David. I'm a bit of a novice when it
comes to syntax and can
> >> only
> >> > do basic stuff and am not clear about all
of the commands. You are
> >> correct
> >> > in that MI produces 1 data file with a
variable called Imputation_
> >> coded 0
> >> > (raw), 1, 2, 3, 4, 5 for 5 imputed data
sets. I tried to get the second
> >> > set of code to run. I changed the syntax so
that impfag = imputation_,
> >> > $casenum to ID, and also changed the name
of my alcohol items. It
> >> partly
> >> > runs and adds the variables @nukeme@ and
@ordered@ to the data file,
> >> and
> >> > then in the @nukeme@ column it's coded any
case with missing data on
> >> the
> >> > alcohol items as 1, and also coded as 1
those cases with the same ID
> >> > number but whose data on CAGEAID_2 to
CAGEAID_5 has been imputed. I
> >> > thought it would change the latter to
system missing values though?
> >> >
> >> > COMPUTE @ordered@= ID.
> >> > SORT CASES BY ID imputation_.
> >> > COMPUTE @nukeme@=NVALID(cageaid_2 TO
cageaid_5)= 0.
> >> > IF ID=LAG(ID) AND imputation_
@nukeme@=LAG(@nukeme@).
> >> > DO IF @nukeme@.
> >> > + DO REPEAT imputation_=cageaid_2 TO
cageaid_5.> + COMPUTE
> >> > imputation_=$SYSMIS .
> >> > + END REPEAT.
> >> > END IF.
> >> > SORT CASES BY @ordered@.
> >> >
> >> >
> >> >> Date: Thu, 15 Dec 2011 02:29:17 -0800
> >> >> From: david.marso@
> >> >> Subject: Re: Multiple imputation for
different types of missing values
> >> >> To: SPSSX-L@.UGA
> >> >>
> >> >> "The only solution I could come up with
was running the MI, then
> >> manually
> >> >> scanning thousands of rows of data and
deleting the imputed values on
> >> the
> >> >> alcohol measure..."
> >> >> Anytime you begin to manually scan
thousands of rows... *STOP*!
> >> RETHINK!!
> >> >> "there must be a simpler way." Yes! It
is called Syntax!
> >> >> ---
> >> >> Assuming something like alcohol measure
= alc01 to alc05. Imputed
> >> values
> >> >> imp01 to imp05.
> >> >> DO IF NVALID(alc01 TO alc05)=0.
> >> >> DO REPEAT imp=imp01 TO imp05.
> >> >> COMPUTE imp=$SYSMIS . /* or set to some
value to be declared later as
> >> >> missing */.
> >> >> END REPEAT.
> >> >> END IF.
> >> >> ---
> >> >> OTOH: I don't have this module so not
certain what the data come back
> >> >> with?
> >> >> ---
> >> >> I suspect you actually end up with the
raw non-imputed data at the top
> >> >> and
> >> >> several imputed data sets below?
Hopefully with some sort of
> >> consistent
> >> >> ID
> >> >> variable (ID)
> >> >> and some sort of imputation flag
impflag (0 raw, 1 imputed)?.
> >> >> COMPUTE @ordered@=$CASENUM.
> >> >> SORT CASES BY ID impflag.
> >> >> COMPUTE @nukeme@=NVALID(alc01 TO
alc05)=0.
> >> >> IF ID=LAG(ID) AND impflag
@nukeme@=LAG(@nukeme@).
> >> >> DO IF @nukeme@.
> >> >> + DO REPEAT imp=alc01 TO alc05.
> >> >> + COMPUTE imp=$SYSMIS . /* or set to
some value to be declared
> >> later
> >> >> as
> >> >> missing */.
> >> >> + END REPEAT.
> >> >> END IF.
> >> >> SORT CASES BY @ordered@.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Kathryn Gardner wrote
> >> >> >
> >> >> > Dear List,I am running a multiple
imputation on lots of
> >> questionnaire
> >> >> > items and I'm trying to figure out
a way to run the analysis,
> >> without
> >> >> > imputing missing values for those
participants who have missed out
> >> say
> >> >> all
> >> >> > 5 items on an alcohol
questionnaire because they were told to skip
> >> it
> >> >> if
> >> >> > they do not drink alcohol. I don't
want to exclude the alcohol
> >> measure
> >> >> > entirely from the MI because there
are also randomly missing values
> >> >> across
> >> >> > these alcohol items that do need
imputing. At the moment all missing
> >> >> > values are identified as system
missing in the data file, and I
> >> thought
> >> >> > there might be a way to get SPSS
to only run the MI on certain types
> >> of
> >> >> > missing values if I coded the ones
I want to be ignored as user
> >> >> missing,
> >> >> > but this doesn't seem possible.
The only solution I could come up
> >> with
> >> >> was
> >> >> > running the MI, then manually
scanning thousands of rows of data and
> >> >> > deleting the imputed values on the
alcohol measure for the
> >> participants
> >> >> > who skipped the entire
questionnaire. As you can imagine, this is
> >> >> taking
> >> >> > hours! There must be a simpler
way. Any advice greatly appreciated.
> >> >> Kind
> >> >> > regards,Kathryn
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >>
> >>
http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> >> >> Sent from the SPSSX Discussion mailing
list archive at Nabble.com.
> >> >>
> >> >> =====================
> >> >> To manage your subscription to SPSSX-L,
send a message to
> >> >> LISTSERV@.UGA (not to SPSSX-L), with no
body text except the
> >> >> command. To leave the list, send the
command
> >> >> SIGNOFF SPSSX-L
> >> >> For a list of commands to manage
subscriptions, send the command
> >> >> INFO REFCARD
> >> >
> >>
> >>
> >> --
> >> View this message in context:
> >>
http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080244.html
> >> Sent from the SPSSX Discussion mailing list
archive at Nabble.com.
> >>
> >> =====================
> >> To manage your subscription to SPSSX-L, send a
message to
> >> LISTSERV@.UGA (not to SPSSX-L), with no body
text except the
> >> command. To leave the list, send the command
> >> SIGNOFF SPSSX-L
> >> For a list of commands to manage subscriptions,
send the command
> >> INFO REFCARD
> >
>
>
> --
> View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080477.html
> Sent from the SPSSX Discussion mailing list archive at
Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body
text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the
command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: Multiple imputation for different types of missing values

In reply to this post by Kathryn Gardner

It is particularly in RELIABILITY that I wonder about imputing values for items from outside the set of variables that are repeated measures of a construct.

Items in a summative scale often have a restricted range 0 to 9 1 to 5 -3 to 3, etc.

A goal in scale development is to have a measurement that has convergent validity with the set of items and divergent validity for constructs other than the one you are trying to measure. One wants to work with the common variance

It is likely that variables intended to measure other constructs would relate to the unique variance within an item
Well developed scales have question stems that are balanced for direction, i.e., have opposite signs on the factor they are assigned to,
Would you impute items the way they were entered of after they were reflected to be unidirectional like you need to do before you run RELIABILITY?

Mean substitution is potentially problematic when a mean is across cases. I don't know about when the mean is within cases.
__
Of course a lot depends on where that piece of research lies in the stream of research in that area, the goals of the particular piece of research, and where you are in the use of the data.

With regard to the alcohol scale when you think about an item would zero be a reasonable value in the order of the responses to that item? It is hard to say much more without understanding the constructs you are trying to measure and their role in your theorizing.

Also before doing any imputing do you remove items from further consideration when many of the values are missing for reasons otehr than non-applicability? Do you drop cases that have substantial amount of missing data? Or that have pattern responding?
by pattern responding I mean all true, all false, alternating true and false, 1 2 3 4 5, etc. that show respondents were responding only to the request to give an answer but are not responding to the semantic content of the question?

Do you drop items from scales when their inclusion lowers the internal consistency of the summative score(total or mean)?

Of course it should be a quick project to get means, SD's and correlations several ways once you have finished cleaning the data..
With list wise deletion.
With pair wise deletion.
With imputed values for missing data.

Do the values differ in meaningful ways?
If you do factor analyses and plot the eigenvalues from each and from parallel analyses of each? Is there much difference?

In the long run how much of your data is imputed?

hth

Art Kendall
Social Research Consultants

On 12/16/2011 4:35 AM, Kathryn Gardner wrote:

Dear Art,
I wanted to impute the missing values at the item level as I thought this was more sensitive and I can then use all data in reliability analyses. Once imputed, I'll be summing the items to create scale scores to represent various constructs (e.g., alcohol use, personality, emotion regulation) that will be used in in the main analyses 2 papers I am publishing (SEM for one paper and latent profile analysis for another paper). I thought using the mean as the summative score to deal with missing data is equally as bad as using mean substitution? Why is imputing summative scale items via MI is often unnecessary? I couldn't find anything on the debate as to whether multiple imputation should be used for scale items vs. computed subscale scores etc.
Kathryn

Date: Thu, 15 Dec 2011 07:11:01 -0500
From: [hidden email]
To: [hidden email]
CC: [hidden email]
Subject: Re: [SPSSX-L] Multiple imputation for different types of missing values

I would like to hear from other list members, but imputing summative scale items via MI is often unnecessary.
You use the term items which often means the variables are meant to be used as part of a score so I am responding in that context.

Are you planning to distribute a public use data set that includes items, that includes only scales scores, or are you only working on you own data set for your use?

What is the goal of your project? finding totals, means, percents for a pop or for subpops? Or are you intending to compare and contrast groups? Or mainly interest in the relations of variables? Developing scales?

What is the response scale on the alcohol items? Are they intended to be repeated measures of a construct where the total or mean is used in analysis as the measure of a construct?

{would like to hear from other on this}
If score is to be used in analysis and the mean is the summative score, just use it.

If the score is to be used as a total e.g., for comparison to published norms, then a) do what the original authors of the scale did or b)
compute adjscore = sum valid items * (# of items in scale/# of items with valid values).
{end of part I would like to hear from other list members about.

Art Kendall
Social Research Consultants

On 12/15/2011 4:15 AM, Kathryn Gardner wrote:

Dear List,
I am running a multiple imputation on lots of questionnaire items and I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI because there are also randomly missing values across these alcohol items that do need imputing. At the moment all missing values are identified as system missing in the data file, and I thought there might be a way to get SPSS to only run the MI on certain types of missing values if I coded the ones I want to be ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated.

Kind regards,
Kathryn

Art Kendall
Social Research Consultants

Kathryn Gardner

Re: Multiple imputation for different types of missing values

Dear Art,
Thank you for your useful comments. Before estimating missing data I usually remove anyone who has missed out entire questionnaires (other than for non-applicability reasons), and also check for any patterns in the missing data. I also check that the amount of missing data to be imputed is not a large amount. Up until now I have used person mean substitution at the item level, which has shown promising results, but MI seems to be the gold standard, and now that it is available in SPSS I assumed this was the way forward, even for scale items. It seems you are suggesting not, but I wondered if you knew of any references that discuss this issue?

I have noticed that there is an option to apply constraints when running MI, so that one can specify the range one would like the score to fall in e.g., 1-5. I was intending to do this for each item so that the imputation does not impute any implausible values e.g., -1. I couldn't find any discussion of this issue, but it seems like the only logical way to avoid implausible values.

I have dropped items from scales where internal consistency is lowered, but only if this is substantially so.

Kathryn

Date: Fri, 16 Dec 2011 13:37:21 -0500
From: [hidden email]
To: [hidden email]
CC: [hidden email]
Subject: Re: [SPSSX-L] Multiple imputation for different types of missing values

It is particularly in RELIABILITY that I wonder about imputing values for items from outside the set of variables that are repeated measures of a construct.

Items in a summative scale often have a restricted range 0 to 9 1 to 5 -3 to 3, etc.

A goal in scale development is to have a measurement that has convergent validity with the set of items and divergent validity for constructs other than the one you are trying to measure. One wants to work with the common variance

It is likely that variables intended to measure other constructs would relate to the unique variance within an item
Well developed scales have question stems that are balanced for direction, i.e., have opposite signs on the factor they are assigned to,
Would you impute items the way they were entered of after they were reflected to be unidirectional like you need to do before you run RELIABILITY?

Mean substitution is potentially problematic when a mean is across cases. I don't know about when the mean is within cases.
__
Of course a lot depends on where that piece of research lies in the stream of research in that area, the goals of the particular piece of research, and where you are in the use of the data.

With regard to the alcohol scale when you think about an item would zero be a reasonable value in the order of the responses to that item? It is hard to say much more without understanding the constructs you are trying to measure and their role in your theorizing.

Also before doing any imputing do you remove items from further consideration when many of the values are missing for reasons otehr than non-applicability? Do you drop cases that have substantial amount of missing data? Or that have pattern responding?
by pattern responding I mean all true, all false, alternating true and false, 1 2 3 4 5, etc. that show respondents were responding only to the request to give an answer but are not responding to the semantic content of the question?

Do you drop items from scales when their inclusion lowers the internal consistency of the summative score(total or mean)?

Of course it should be a quick project to get means, SD's and correlations several ways once you have finished cleaning the data..
With list wise deletion.
With pair wise deletion.
With imputed values for missing data.

Do the values differ in meaningful ways?
If you do factor analyses and plot the eigenvalues from each and from parallel analyses of each? Is there much difference?

In the long run how much of your data is imputed?

hth

Art Kendall
Social Research Consultants

On 12/16/2011 4:35 AM, Kathryn Gardner wrote:

Dear Art,
I wanted to impute the missing values at the item level as I thought this was more sensitive and I can then use all data in reliability analyses. Once imputed, I'll be summing the items to create scale scores to represent various constructs (e.g., alcohol use, personality, emotion regulation) that will be used in in the main analyses 2 papers I am publishing (SEM for one paper and latent profile analysis for another paper). I thought using the mean as the summative score to deal with missing data is equally as bad as using mean substitution? Why is imputing summative scale items via MI is often unnecessary? I couldn't find anything on the debate as to whether multiple imputation should be used for scale items vs. computed subscale scores etc.
Kathryn

Date: Thu, 15 Dec 2011 07:11:01 -0500
From: [hidden email]
To: [hidden email]
CC: [hidden email]
Subject: Re: [SPSSX-L] Multiple imputation for different types of missing values

I would like to hear from other list members, but imputing summative scale items via MI is often unnecessary.
You use the term items which often means the variables are meant to be used as part of a score so I am responding in that context.

Are you planning to distribute a public use data set that includes items, that includes only scales scores, or are you only working on you own data set for your use?

What is the goal of your project? finding totals, means, percents for a pop or for subpops? Or are you intending to compare and contrast groups? Or mainly interest in the relations of variables? Developing scales?

What is the response scale on the alcohol items? Are they intended to be repeated measures of a construct where the total or mean is used in analysis as the measure of a construct?

{would like to hear from other on this}
If score is to be used in analysis and the mean is the summative score, just use it.

If the score is to be used as a total e.g., for comparison to published norms, then a) do what the original authors of the scale did or b)
compute adjscore = sum valid items * (# of items in scale/# of items with valid values).
{end of part I would like to hear from other list members about.

Art Kendall
Social Research Consultants

On 12/15/2011 4:15 AM, Kathryn Gardner wrote:

Dear List,
I am running a multiple imputation on lots of questionnaire items and I'm trying to figure out a way to run the analysis, without imputing missing values for those participants who have missed out say all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI because there are also randomly missing values across these alcohol items that do need imputing. At the moment all missing values are identified as system missing in the data file, and I thought there might be a way to get SPSS to only run the MI on certain types of missing values if I coded the ones I want to be ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated.

Kind regards,
Kathryn

Kathryn Gardner

Re: Multiple imputation for different types of missing values

In reply to this post by David Marso

I never leave things as system missing. I always use use missing values, but in the context of these posts it was an easy way to communicate what I needed, before I then recode to user missing values as required.
Kathryn

> Date: Fri, 16 Dec 2011 10:26:18 -0800

> From: [hidden email]
> Subject: Re: Multiple imputation for different types of missing values
> To: [hidden email]
>
> Certainly sage advice:
> Notice in my original post:
> + COMPUTE imp=$SYSMIS . /* or set to some value to be declared later as
> missing */.
> I did *NOT* wish to be bothered with discerning or assuming what might
> constitute valid/missing values for the resulting variables. Sometimes
> within the context of a newsgroup posting I will follow the path of least
> resistance even though I would scarcely utilize such in my own production
> code.
> Recall in the original query:
> "At the moment all missing values are identified as system missing in the
> data file.".
> ...
> Perhaps *NOT* best practices applied here, OTOH: people pay me to apply
> them.
> Free advice is cheap and sometime worth what one pays for it (probably
> *ALOT* more but YMMV ;-) .
> --
>
>
> Art Kendall wrote
> >
> > recoding all cageaid items to user  missing value -5 IF @nukeme@ =
> > 1.
> > Try to avoid having sysmis on the right hand side of
> > an assignment.
> > The values are missing because you said they should be.  They
> > are
> > not missing due to SPSS being unable to follow your instruction.
> >
> > Having item values missing 
> > -1 because they are not applicable
> > -2   the respondent drinks but did not answer any of those
> > questions
> > -3 answered but does not drink
> > -4 answered this one but had to impute more than other 2 items in
> > this scale
> > -5 nuked
> > -6 ...
> > vs
> > $sysmis SPSS could not obey your instructions for reading or
> > transforming
> >
> > Art Kendall
> > Social Research Consultants
> >
> > On 12/16/2011 11:14 AM, Kathryn Gardner wrote:
> >
> >
> >
> > Hi David,
> > Lines 1-51 were empty (I had some syntax there but deleted it).
> >
> > I see what you mean now about how the syntax works. I think i
> > can use this now actually, recoding all cageaid items to system
> > missing IF @nukeme@ = 1. Many thanks for this it's a great help!
> > Kathryn
> >
> >
> >
> > > Date: Fri, 16 Dec 2011 07:04:42 -0800
> > > From: david.marso@
> > > Subject: Re: Multiple imputation for different types of
> > missing values
> > > To: SPSSX-L@.UGA
> > >
> > > Kathryn ,
> > > What is happening in lines 1-51 of the preceding prior to
> > running the posted
> > > syntax?
> > > All my code does is check to see if the non-imputed data
> > are *ALL* missing
> > > for the specified variables
> > > (cageaid_2 TO cageaid_5). If so it creates a flag
> > @nukeme@. It then checks
> > > within the same ID and drags the flag into the imputed
> > data sets. It should
> > > then clobber the specified variables for the specified
> > variables (set
> > > $SYSMIS). It makes *ABSOLUTELY* no difference what the
> > imputed values are.
> > > Only thing that matters is that the non-imputed are *ALL*
> > missing. My
> > > question WHAT IS THE CONTEXT for:
> > > "56 + LOOP has no effect on this command. "... i.e. WHAT
> > IS GOING ON PRIOR
> > > TO RUNNING my posted syntax?
> > > David
> > >
> > > ---
> > >
> > > Kathryn Gardner wrote
> > > >
> > > > I tried the new syntax and it seems to do something
> > similar and produces
> > > > the error messages below.
> > > > Just to clarify, should the syntax put system
> > missing values in cageaid_2
> > > > to cageaid_5 when all of these have been imputed?
> > Also, not sure if it
> > > > makes a difference that some imputed values are
> > negative (see 5 rows of
> > > > imputed data for the same participant below).
> > > >
> > > > -1 -1 1 -1
> > > > -1 0 -1 0
> > > > -1 -1 -1 0
> > > > 1 0 1 0
> > > > -1 -1 0 0
> > > >
> > > > 52 COMPUTE @ordered@= $CASENUM.
> > > > 53 SORT CASES BY ID imputation_.
> > > > 54 COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5) =
> > 0.
> > > > 55 IF ID=LAG(ID) AND ( imputation_ GE 1)
> > @nukeme@=LAG(@nukeme@).
> > > > 56 DO IF @nukeme@.
> > > > 57 + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> > > > 58 + COMPUTE imputatedvar=$SYSMIS .
> > > > 59 + END REPEAT.
> > > > 60 END IF.
> > > > 61 SORT CASES BY @ordered@.
> > > >
> > > > 56 + LOOP has no effect on this command.
> > > > 56 + The first word in the line is not recognized as
> > an SPSS Statistics
> > > > command.
> > > > 57 + LOOP has no effect on this command.
> > > > 57 + The first word in the line is not recognized as
> > an SPSS Statistics
> > > > command.
> > > > 58 + LOOP has no effect on this command.
> > > > 58 + The first word in the line is not recognized as
> > an SPSS Statistics
> > > > command.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >> Date: Fri, 16 Dec 2011 05:15:37 -0800
> > > >> From: david.marso@
> > > >> Subject: Re: Multiple imputation for different
> > types of missing values
> > > >> To: SPSSX-L@.UGA
> > > >>
> > > >> Kathryn ,
> > > >> Looks like the line
> > > >> + DO REPEAT imputation_=cageaid_2 TO
> > cageaid_5.> + COMPUTE
> > > >> imputation_=$SYSMIS .
> > > >> may be the culprit. Should be 2 lines (sans
> > >). Also modified logical
> > > >> flag
> > > >> for (imputation_).
> > > >> HTH, David
> > > >> ---
> > > >> * Do *NOT* change $CASENUM to ID!!! (This is to
> > enable restoration to
> > > >> original order) .
> > > >> COMPUTE @ordered@= $CASENUM.
> > > >> SORT CASES BY ID imputation_.
> > > >> COMPUTE @nukeme@=NVALID(cageaid_2 TO cageaid_5)=
> > 0.
> > > >> * Probably need following change
> > > >> IF ID=LAG(ID) AND ( imputation_ GE 1)
> > @nukeme@=LAG(@nukeme@).
> > > >> DO IF @nukeme@.
> > > >> + DO REPEAT imputedvar=cageaid_2 TO cageaid_5.
> > > >> * Following line was previously munged into DO
> > REPEAT line * Possible
> > > >> received error mesage? *.
> > > >> + COMPUTE imputatedvar=$SYSMIS .
> > > >> + END REPEAT.
> > > >> END IF.
> > > >> * Restore data to order of imputed data sets.
> > > >> SORT CASES BY @ordered@.
> > > >>
> > > >>
> > > >> Kathryn Gardner wrote
> > > >> >
> > > >> > thanks David. I'm a bit of a novice when it
> > comes to syntax and can
> > > >> only
> > > >> > do basic stuff and am not clear about all
> > of the commands. You are
> > > >> correct
> > > >> > in that MI produces 1 data file with a
> > variable called Imputation_
> > > >> coded 0
> > > >> > (raw), 1, 2, 3, 4, 5 for 5 imputed data
> > sets. I tried to get the second
> > > >> > set of code to run. I changed the syntax so
> > that impfag = imputation_,
> > > >> > $casenum to ID, and also changed the name
> > of my alcohol items. It
> > > >> partly
> > > >> > runs and adds the variables @nukeme@ and
> > @ordered@ to the data file,
> > > >> and
> > > >> > then in the @nukeme@ column it's coded any
> > case with missing data on
> > > >> the
> > > >> > alcohol items as 1, and also coded as 1
> > those cases with the same ID
> > > >> > number but whose data on CAGEAID_2 to
> > CAGEAID_5 has been imputed. I
> > > >> > thought it would change the latter to
> > system missing values though?
> > > >> >
> > > >> > COMPUTE @ordered@= ID.
> > > >> > SORT CASES BY ID imputation_.
> > > >> > COMPUTE @nukeme@=NVALID(cageaid_2 TO
> > cageaid_5)= 0.
> > > >> > IF ID=LAG(ID) AND imputation_
> > @nukeme@=LAG(@nukeme@).
> > > >> > DO IF @nukeme@.
> > > >> > + DO REPEAT imputation_=cageaid_2 TO
> > cageaid_5.> + COMPUTE
> > > >> > imputation_=$SYSMIS .
> > > >> > + END REPEAT.
> > > >> > END IF.
> > > >> > SORT CASES BY @ordered@.
> > > >> >
> > > >> >
> > > >> >> Date: Thu, 15 Dec 2011 02:29:17 -0800
> > > >> >> From: david.marso@
> > > >> >> Subject: Re: Multiple imputation for
> > different types of missing values
> > > >> >> To: SPSSX-L@.UGA
> > > >> >>
> > > >> >> "The only solution I could come up with
> > was running the MI, then
> > > >> manually
> > > >> >> scanning thousands of rows of data and
> > deleting the imputed values on
> > > >> the
> > > >> >> alcohol measure..."
> > > >> >> Anytime you begin to manually scan
> > thousands of rows... *STOP*!
> > > >> RETHINK!!
> > > >> >> "there must be a simpler way." Yes! It
> > is called Syntax!
> > > >> >> ---
> > > >> >> Assuming something like alcohol measure
> > = alc01 to alc05. Imputed
> > > >> values
> > > >> >> imp01 to imp05.
> > > >> >> DO IF NVALID(alc01 TO alc05)=0.
> > > >> >> DO REPEAT imp=imp01 TO imp05.
> > > >> >> COMPUTE imp=$SYSMIS . /* or set to some
> > value to be declared later as
> > > >> >> missing */.
> > > >> >> END REPEAT.
> > > >> >> END IF.
> > > >> >> ---
> > > >> >> OTOH: I don't have this module so not
> > certain what the data come back
> > > >> >> with?
> > > >> >> ---
> > > >> >> I suspect you actually end up with the
> > raw non-imputed data at the top
> > > >> >> and
> > > >> >> several imputed data sets below?
> > Hopefully with some sort of
> > > >> consistent
> > > >> >> ID
> > > >> >> variable (ID)
> > > >> >> and some sort of imputation flag
> > impflag (0 raw, 1 imputed)?.
> > > >> >> COMPUTE @ordered@=$CASENUM.
> > > >> >> SORT CASES BY ID impflag.
> > > >> >> COMPUTE @nukeme@=NVALID(alc01 TO
> > alc05)=0.
> > > >> >> IF ID=LAG(ID) AND impflag
> > @nukeme@=LAG(@nukeme@).
> > > >> >> DO IF @nukeme@.
> > > >> >> + DO REPEAT imp=alc01 TO alc05.
> > > >> >> + COMPUTE imp=$SYSMIS . /* or set to
> > some value to be declared
> > > >> later
> > > >> >> as
> > > >> >> missing */.
> > > >> >> + END REPEAT.
> > > >> >> END IF.
> > > >> >> SORT CASES BY @ordered@.
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> Kathryn Gardner wrote
> > > >> >> >
> > > >> >> > Dear List,I am running a multiple
> > imputation on lots of
> > > >> questionnaire
> > > >> >> > items and I'm trying to figure out
> > a way to run the analysis,
> > > >> without
> > > >> >> > imputing missing values for those
> > participants who have missed out
> > > >> say
> > > >> >> all
> > > >> >> > 5 items on an alcohol
> > questionnaire because they were told to skip
> > > >> it
> > > >> >> if
> > > >> >> > they do not drink alcohol. I don't
> > want to exclude the alcohol
> > > >> measure
> > > >> >> > entirely from the MI because there
> > are also randomly missing values
> > > >> >> across
> > > >> >> > these alcohol items that do need
> > imputing. At the moment all missing
> > > >> >> > values are identified as system
> > missing in the data file, and I
> > > >> thought
> > > >> >> > there might be a way to get SPSS
> > to only run the MI on certain types
> > > >> of
> > > >> >> > missing values if I coded the ones
> > I want to be ignored as user
> > > >> >> missing,
> > > >> >> > but this doesn't seem possible.
> > The only solution I could come up
> > > >> with
> > > >> >> was
> > > >> >> > running the MI, then manually
> > scanning thousands of rows of data and
> > > >> >> > deleting the imputed values on the
> > alcohol measure for the
> > > >> participants
> > > >> >> > who skipped the entire
> > questionnaire. As you can imagine, this is
> > > >> >> taking
> > > >> >> > hours! There must be a simpler
> > way. Any advice greatly appreciated.
> > > >> >> Kind
> > > >> >> > regards,Kathryn
> > > >> >> >
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> View this message in context:
> > > >> >>
> > > >>
> > http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5077084.html
> > > >> >> Sent from the SPSSX Discussion mailing
> > list archive at Nabble.com.
> > > >> >>
> > > >> >> =====================
> > > >> >> To manage your subscription to SPSSX-L,
> > send a message to
> > > >> >> LISTSERV@.UGA (not to SPSSX-L), with no
> > body text except the
> > > >> >> command. To leave the list, send the
> > command
> > > >> >> SIGNOFF SPSSX-L
> > > >> >> For a list of commands to manage
> > subscriptions, send the command
> > > >> >> INFO REFCARD
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> View this message in context:
> > > >>
> > http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080244.html
> > > >> Sent from the SPSSX Discussion mailing list
> > archive at Nabble.com.
> > > >>
> > > >> =====================
> > > >> To manage your subscription to SPSSX-L, send a
> > message to
> > > >> LISTSERV@.UGA (not to SPSSX-L), with no body
> > text except the
> > > >> command. To leave the list, send the command
> > > >> SIGNOFF SPSSX-L
> > > >> For a list of commands to manage subscriptions,
> > send the command
> > > >> INFO REFCARD
> > > >
> > >
> > >
> > > --
> > > View this message in context:
> > http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5080477.html
> > > Sent from the SPSSX Discussion mailing list archive at
> > Nabble.com.
> > >
> > > =====================
> > > To manage your subscription to SPSSX-L, send a message to
> > > LISTSERV@.UGA (not to SPSSX-L), with no body
> > text except the
> > > command. To leave the list, send the command
> > > SIGNOFF SPSSX-L
> > > For a list of commands to manage subscriptions, send the
> > command
> > > INFO REFCARD
> >
> >
> >
> >
> >
> >
> > =====================
> > To manage your subscription to SPSSX-L, send a message to
> > LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> > command. To leave the list, send the command
> > SIGNOFF SPSSX-L
> > For a list of commands to manage subscriptions, send the command
> > INFO REFCARD
> >
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-imputation-for-different-types-of-missing-values-tp5076972p5081042.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

Poes, Matthew Joseph

Re: Multiple imputation for different types of missing values

In reply to this post by Kathryn Gardner

My experience with the handling of missing data has lead me to distrust and largely not use the SPSS Multiple Imputation module. Because of the way it draws its values and develops its estimate of the distribution, my feeling is that should only be used in situations that can truly be called MAR.

Research into the least biased way of estimating the missing value in survey research has actually shown that when the questions are based on finite scales, other methods may be more appropriate. Hot Decking is one of the best approaches for this, and some versions can even take into account multiple time points. Hot Decking is also the method used by many of the very large scale survey groups such as the Census. It’s my understanding that hot decking works best when the sample size is fairly substantial. Note that Hot Decking is not supported natively in SPSS, but numerous macro’s exist for it, and are fairly easy to use.

Matthew J Poes

Research Data Specialist

Center for Prevention Research and Development

University of Illinois

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Kathryn Gardner
Sent: Monday, December 19, 2011 3:48 AM
To: [hidden email]
Subject: Re: Multiple imputation for different types of missing values

Art Kendall

Social Research Consultants

On 12/16/2011 4:35 AM, Kathryn Gardner wrote: