Imputation of categorical missing values

Imputation of categorical missing values

Blain Waan
Hello,

I have a data set that has some categorical variables (both binary outcome variables and variables having more than two categories) and some continuous variables.

I can use SPSS to impute missing values for continuous variables with the EM algorithm. But how do I impute missing values for both types of categorical variables? Is there a macro for doing that?

Note that I will use the completed data set for a factor analysis, so with multiple imputation it may be a problem to combine the results from each imputed data set. What would you suggest in this context? Someone suggested hot-deck imputation to me instead. But how do I do this in SPSS?

Thank you.

Re: Imputation of categorical missing values

Maguin, Eugene
Blain,

I'm not familiar with how imputation works in SPSS. I assume that people working on imputation have written about the problem of categorical variables, but I'm not familiar with that literature; perhaps others who are will comment and give citations. Doing it is not too hard, but you have to know fairly advanced syntax. Look at Ray Levesque's web site, spsstools.net: at the bottom of the page, under missing data, he has a hot deck routine. It may or may not need to be adapted to your situation.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Blain Waan
Sent: Friday, July 27, 2012 7:39 AM
To: [hidden email]
Subject: Imputation of categorical missing values

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Imputation-of-categorical-missing-values-tp5714496.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Re: Imputation of categorical missing values

Alex Reutter
In reply to this post by Blain Waan
Are you using the EM algorithm through the MVA command?  If so, the command syntax documentation (http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/topic/com.ibm.spss.statistics.help/syn_mva.htm) explains how to specify categorical variables to be imputed; it's a /CATEGORICAL subcommand with the list of categorical variables.

Alex

Re: Imputation of categorical missing values

M.H.
In reply to this post by Blain Waan
I know this thread is about two years old, but I have some experience with PMM
(predictive mean matching), and for those who have both categorical/binary
and continuous data, I would never recommend the multiple regression method.
Instead, go to Multiple Imputation -> Impute Missing Data Values -> Custom
(MCMC) and then select PMM. It works like hot deck imputation: it uses real,
observed values from your data. The difference is that with the regression
method you may get imputed values like 1.35547226 or 2.38446341, even though
SPSS will show them rounded to 1 and 2, respectively, because your categorical
variables are normally formatted without decimal places. But when you plot a
histogram, you see how ugly it looks.
If your variable can only take values such as 0, 1, 2, 3, 4, and 5, PMM is
excellent because, through the matching, it gives you exactly those values.
One caveat for both PMM and hot deck concerns small data sets: it is difficult
to do the matching (or to find a "donor", in the terminology of the literature)
when you have few cases.
See this article for further discussion: Rebecca R. Andridge and Roderick J. A.
Little, "A Review of Hot Deck Imputation for Survey Non-response",
<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3130338/>.
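To make the regression-versus-PMM contrast concrete, here is a minimal Python sketch (not SPSS's own implementation) with one predictor `x` and an ordinal outcome `y`. Regression imputation fills a gap with the fitted value, which is usually fractional; predictive mean matching instead finds the observed cases whose fitted values are closest to the recipient's and copies one of their real `y` values. The function names, the single-predictor least-squares model and the donor pool size `k` are illustrative assumptions; a full multiple-imputation PMM would also draw the regression coefficients from their posterior for each imputation, which this sketch omits.

```python
import random
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def _fit_on_observed(x, y):
    # Fit the model only on cases where y is observed (None = missing).
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_simple_regression([p[0] for p in obs], [p[1] for p in obs])
    return obs, a, b


def regression_impute(x, y):
    """Replace missing y with the regression prediction (often fractional)."""
    _, a, b = _fit_on_observed(x, y)
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


def pmm_impute(x, y, k=3, rng=None):
    """Predictive mean matching: copy an observed y from one of the k
    observed cases whose predicted value is closest to the recipient's."""
    rng = rng or random.Random()
    obs, a, b = _fit_on_observed(x, y)
    donors = [(a + b * xi, yi) for xi, yi in obs]  # (predicted, observed)
    result = []
    for xi, yi in zip(x, y):
        if yi is not None:
            result.append(yi)
            continue
        pred = a + b * xi
        nearest = sorted(donors, key=lambda d: abs(d[0] - pred))[:k]
        result.append(rng.choice(nearest)[1])  # always a value seen in the data
    return result
```

For example, with `x = [1, 2, 3, 4, 5, 6, 2.8, 4.2]` and `y = [0, 1, 1, 2, 3, 3, None, None]`, `regression_impute` fills the two gaps with roughly 1.227 and 2.107, while `pmm_impute` can only return values from {0, 1, 2, 3}. Drawing at random among the `k` nearest donors, rather than always taking the single closest one, keeps some variability in the imputed values.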

While I'm at it, are there any expert users here?

I want to work with hot deck imputation. SPSS doesn't normally have it, but
Teresa Myers, in the article "Goodbye, Listwise Deletion: Presenting Hot Deck
Imputation as an Easy and Effective Tool for Handling Missing Data"
<http://www.afhayes.com/public/hotdeck.pdf>, presents HD imputation, and at
the end of the paper (i.e., in the Appendix) gives the macro that creates
the hot deck command in SPSS.

It says "Execute the command set below in an SPSS syntax window exactly as
is". And yet I can't get it to work (I use SPSS 20). After executing the
syntax, nothing happens. (Note that the syntax spans two pages.)

Any ideas? Thanks.




Re: Imputation of categorical missing values

Anton-24
In reply to this post by Blain Waan
I have successfully used her macro many times. After executing the macro,
are you also running a HOTDECK command? For example:

HOTDECK y = variables with missing data /deck = variables defining the decks .
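The HOTDECK template above names the variables to impute (`y =`) and the variables that define the decks (`/deck =`), so donors are drawn only from cases in the same deck. As an illustration of that technique only (this is not Myers's macro, whose donor-selection rules may differ), a minimal Python sketch of random hot-deck imputation within decks could look like this; `hot_deck`, `records`, `targets` and `deck_vars` are hypothetical names chosen to mirror the macro's arguments.

```python
import random
from collections import defaultdict


def hot_deck(records, targets, deck_vars, rng=None):
    """Random hot-deck imputation within decks (adjustment cells).

    records   : list of dicts, one per case; missing values are None
    targets   : variables to impute (cf. HOTDECK y = ...)
    deck_vars : variables whose combination defines a deck (cf. /deck = ...)
    """
    rng = rng or random.Random()
    out = [dict(r) for r in records]  # work on copies; input is left untouched
    for var in targets:
        # Donor pools: observed values of `var`, grouped by deck.
        pools = defaultdict(list)
        for r in records:
            if r[var] is not None:
                pools[tuple(r[d] for d in deck_vars)].append(r[var])
        for r in out:
            if r[var] is None:
                pool = pools.get(tuple(r[d] for d in deck_vars))
                if pool:  # empty deck: no donor, value stays missing
                    r[var] = rng.choice(pool)
    return out


data = [
    {"sex": "F", "smoker": 1}, {"sex": "F", "smoker": 0},
    {"sex": "F", "smoker": None}, {"sex": "M", "smoker": 1},
    {"sex": "M", "smoker": None}, {"sex": "X", "smoker": None},
]
imputed = hot_deck(data, ["smoker"], ["sex"], rng=random.Random(42))
```

Because every imputed value is copied from a real donor, categorical variables keep their original codes. The last case shows the small-sample caveat raised earlier in the thread: its deck has no observed donor, so it is left missing, and the finer the decks, the more often that happens.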

On Thu, 23 Jan 2014 14:06:06 -0800, M.H. <[hidden email]> wrote:


Re: Imputation of categorical missing values

David Marso
Administrator
In reply to this post by M.H.
Maybe you should read up on how to RUN macros and follow the instructions from resources!

HOTDECK y = variables with missing data /deck = variables defining the decks .

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"