creating dummy variables from multiple categorical variables


mgura
Hi,

I have 90 categorical variables that can each have the same 28 responses
plus the system missing value. I want to code 28 dummy variables
corresponding to the 28 possible responses. I thought that I achieved this
task with do repeat:

Do repeat a=v1 to v90.
if a=1 dv1=1.
if a=2 dv2=1.
.
if a=28 dv28=1.
end repeat.

When I compare the frequencies of the dummy variables to the frequencies
that come out of the multiple response tool, they do not match. Sometimes the
difference is just a few cases, other times a couple of hundred cases. This is
problematic. Any help would be greatly appreciated.
thanks,
Mike Gura

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: creating dummy variables from multiple categorical variables

David Marso
Administrator
Well, the 90th one overwrites any previous!!!
EDIT: Actually, the 2nd one overwrites the first ... the 90th, etc. (I should have been more specific).
Perhaps you should be more attentive to the specifics of your data and the FM!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: creating dummy variables from multiple categorical variables

John F Hall
In reply to this post by mgura

Mike

Not sure if I've got the logic right for what you want, since values 1 thru 28 can appear more than once across v1 to v90, but you can play around with the syntax.  If you tell me more about what your data are like and what you are trying to do, I may be able to give more detailed help.  If you send me your *.sav file off list (in complete confidence) I can play with it and see what I can come up with.

Off the top of my head and untested, what happens if you do something like . . . ?

do repeat x = 1 to 28 / y = dv1 to dv28.
count y = v1 to v90 (x).
end repeat.

*check to see how many 1s thru 28s there are.
freq dv1 to dv28 .

*check  where <max> = max value from freq above.
mult resp groups dva (dv1 to dv28 (1, <max>))
/freq dva.

temp.
recode dv1 to dv28 (1 thru hi = 1).
mult resp groups dvb (dv1 to dv28 (1))
/freq dvb.

John F Hall (Mr)
Email:     [hidden email]
Website: www.surveyresearch.weebly.com

Re: creating dummy variables from multiple categorical variables

Bruce Weaver
Administrator
In reply to this post by David Marso
In other words, you have 90 sets of 28 dummy variables, don't you?  

Q1. What are you planning on doing with all those dummy variables?
Q2. What is the nature of the 90 categorical variables (i.e., purely nominal, or ordered categories, etc)?  

What I'm getting at with Q2 is whether they are close enough to interval scaled to justify treating them as such.

Okay, that's enough interneTelepathy* for this morning.  ;-)
 
* (C) - David Marso


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: creating dummy variables from multiple categorical variables

Art Kendall
In reply to this post by mgura
Why do you have _system_ missing values? Was there a problem reading your data the way that you said it should be read? (I.e., the data was not something that could be read by the specified format.)
Should those be some sort of _user_ missing values such as "not answered"? (I.e., the answer is missing because of a reason known to the user.)

Why do you want to use dummy variables, which usually mean a dichotomy?
Wouldn't you need to sum across the set of 90 sets of dummies to match MULT RESP?

What do you get if you do something like: (untested)
do repeat valcount = valuecount1 to valuecount28/ val = 1 to 28.
count valcount = v1 to v90(val).
end repeat.

If your syntax is a set of statements like
if varvalue eq 1 dummyvar1=1.
You would not get the same as MULT RESP.
If your syntax is a set of statements like
if a eq 1 valuecount = valuecount +1.
Then you should get the same results.
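
The distinction Art draws here (a flag per case versus a running count of responses) can be sketched outside SPSS. The following is only an illustration in Python, not SPSS syntax: the toy data, the 3-value range, and the names `flags` and `counts` are all made up for the example, and it assumes that the multiple response output is tallying responses in the way Art suggests.

```python
# Hypothetical toy data: 3 cases, 4 categorical variables (None = missing).
# The thread itself has 90 variables and 28 possible values.
cases = [
    [1, 1, 2, None],
    [2, None, None, None],
    [1, 3, 3, 3],
]
N_VALUES = 3


def flags(case):
    """0/1 per value: did the value occur anywhere in this case?"""
    return [int(v in case) for v in range(1, N_VALUES + 1)]


def counts(case):
    """How many times each value occurs in this case."""
    return [case.count(v) for v in range(1, N_VALUES + 1)]


# Column totals over all cases.
flag_totals = [sum(col) for col in zip(*(flags(c) for c in cases))]
count_totals = [sum(col) for col in zip(*(counts(c) for c in cases))]

print(flag_totals)   # number of cases that mention each value
print(count_totals)  # number of responses given for each value
```

The two totals agree only for values that never occur more than once within a case, which would be one possible source of a mismatch of "a few cases" for some values and "a couple of hundred" for others.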

YMMV but most users get a firmer grasp on their syntax by using variable names that are closer to the concept.
a ==> varvalue
dv1 ==> dummyvar
but my interneTelepathy says you mean
dv1 ==> valuecount
etc.

As the logic becomes more complex most users benefit by using the conventional logical operator "eq" for a logical operation and reserving "=" for an assignment operation.
Art Kendall
Social Research Consultants
Re: creating dummy variables from multiple categorical variables

David Marso
Administrator
In reply to this post by mgura
Dummy variables typically have a value of 0 or 1.
Since you do not specify WHAT you are after we can only guess and apply ESPss.

Does this suffice?
VECTOR v=v1 TO v90/ result(28).
LOOP #=1 TO 90.
COMPUTE result(v(#))=SUM(result(v(#)),1).
END LOOP.
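
For readers who want to trace what the loop above does to a single case, here is a rough Python rendering of the same idea (not SPSS syntax). The function name `tally` and the sample values are invented for illustration. Skipping missing inputs is an assumption of this sketch; the thread does not say how the SPSS loop behaves when the subscript itself is missing.

```python
def tally(case, n_values=28):
    """Count how often each value 1..n_values occurs in one case."""
    # New vector elements start out missing, represented here as None.
    result = [None] * n_values
    for v in case:
        if v is None:
            continue  # assumption: a missing response is simply skipped
        idx = int(v) - 1  # the response value is used as the subscript
        # Like SUM(result, 1): a missing running total is treated as zero.
        result[idx] = (result[idx] or 0) + 1
    return result


print(tally([3, 3, None, 1], n_values=4))
```

Values that never occur in a case stay missing rather than becoming 0 in this version.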

Re: creating dummy variables from multiple categorical variables

Rich Ulrich
Since I like to see explicit initialization, I would put a line
right after the VECTOR line,
-RECODE result1 to result28(else=0).

David's code gives 28 results that are each a count of how
many times the value (1 to 28) appeared.  If the desired result is simply
a 0/1  indicator, the SUM can be replaced by assigning the value of 1.
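
The two changes described above (explicit initialization to 0, and assigning 1 instead of summing) might look like this when written out as a small Python sketch. It is an illustration of the logic only, not SPSS syntax; the name `indicators` and the sample case are hypothetical, and missing responses are assumed to be skipped.

```python
def indicators(case, n_values=28):
    """0/1 flag for each value 1..n_values: 1 if it occurs in the case."""
    # Explicit initialization, the counterpart of recoding everything to 0.
    result = [0] * n_values
    for v in case:
        if v is not None:  # assumption: missing responses are skipped
            result[int(v) - 1] = 1  # assign 1 rather than adding 1
    return result


print(indicators([3, 3, None, 1], n_values=4))
```

Because every element starts at 0, a value that never occurs ends up as 0 rather than missing, so it is still counted in the frequencies.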

--
Rich Ulrich
