Dummy Coding Variables

Bella Riley
Hi all,

Just got a quick question from an SPSS newbie.

I'm running a regression on vehicle registrations (as a proxy for vehicle demand) and wanted to use the marque as an independent variable. As it's categorical, I read that it is best to use dummy coding to create several dichotomous variables. I hope this is true!

I now have 25 of these dummy-coded variables and am wondering what to do next. Do I include them in the regression model, or do I run an ANOVA with them as the only independent variables and exclude them from the complete model, which contains the other independent variables?

Any pointers would be greatly appreciated!

Best,

Ashley Morris

Re: Dummy Coding Variables

Bruce Weaver
Administrator
Try Googling on <dummy variable regression SPSS>.  You'll find things like this:

  http://www.psychstat.missouristate.edu/multibook/mlt08m.html

I would not recommend the way the author created his dummy variables*, but the rest of the page might be helpful to you.  

* I would have done this instead for the example used on that page:

* Create 0/1 indicator variables for two of the Dept categories.
COMPUTE FamilyS = (Dept EQ 1).
COMPUTE Biology = (Dept EQ 2).
FORMATS FamilyS Biology (f1).
* Check that the indicator variables are correct.
CROSSTABS FamilyS Biology BY Dept.

ALSO note that if you run your model via UNIANOVA (Analyze > GLM > Univariate) rather than REGRESSION, you do not have to compute the dummy variables:  Simply enter the original categorical variable as a fixed factor, and SPSS will compute the dummy variables internally.  
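
To make the comparison concrete, here is a minimal sketch of the two routes side by side. The outcome Y and the covariate X are hypothetical names that do not come from the linked page, and the sketch assumes Dept has exactly three categories coded 1 to 3, so that FamilyS and Biology leave category 3 as the reference.

```spss
* Hypothetical names: Y is the outcome, X is a continuous covariate.
* Assumes Dept has three categories coded 1, 2, 3.

* Route 1: REGRESSION with the hand-made indicator variables.
REGRESSION
  /DEPENDENT Y
  /METHOD=ENTER X FamilyS Biology.

* Route 2: UNIANOVA with Dept as a fixed factor and X as a covariate.
* PRINT=PARAMETER requests the coefficient table.
UNIANOVA Y BY Dept WITH X
  /PRINT=PARAMETER
  /DESIGN=Dept X.
```

UNIANOVA treats the highest-coded category as the reference, so under the assumptions above the parameter estimates from Route 2 should match the coefficients from Route 1.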

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Dummy Coding Variables

Bella Riley
Thanks for your help, Bruce.

As I've already coded the variables (and written about the process), I think I'm going to keep them as they are and use the standard regression function.

Am I correct in thinking that these 25 dummy variables then replace the single categorical variable in the model, and that I can now go ahead and run the regression? (Sorry, this may be a very obvious question.)

Thanks again! :)

Re: Dummy Coding Variables

David Marso
Administrator
Well the 25? 26? level NOMINAL variable has absolutely no business being in the REGRESSION in the first place (so, I will assume you will draw your own conclusions).  You would do yourself a favor by researching the relationship between REGRESSION and ANOVA from a linear models perspective and also write out the regression equation to acquire some sort of epiphany re the meaning/interpretation of the dummy variables.
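
As a sketch of what writing out the equation can look like: take a nominal predictor with k categories and one continuous predictor x. All symbols here are assumptions introduced for illustration, not notation used elsewhere in the thread.

```latex
y_i = \beta_0 + \beta_1 x_i + \sum_{j=1}^{k-1} \gamma_j D_{ij} + \varepsilon_i,
\qquad
D_{ij} =
\begin{cases}
1 & \text{if case } i \text{ is in category } j,\\
0 & \text{otherwise.}
\end{cases}
```

Category k has no indicator and acts as the reference: \beta_0 is the expected outcome for the reference category at x = 0, and each \gamma_j is the expected difference between category j and the reference at the same x. If all k indicators were entered alongside the intercept, they would sum to 1 for every case and be perfectly collinear with the constant, which is why a k-level variable is represented by k - 1 dummy variables.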
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Dummy Coding Variables

Ryan
I did not dare ask what the research question was and how using this variable as an IV would answer it (and what exactly the DV was), so as to avoid going down the "rabbit hole." Are you not following your own advice, David?!?! :-)
 
Ryan


On Fri, Apr 19, 2013 at 6:52 AM, David Marso <[hidden email]> wrote:
Well the 25? 26? level NOMINAL variable has absolutely *no business* being in
the REGRESSION in the first place (so, I will assume you will draw your own
conclusions).  You would do yourself a favor by researching the relationship
between REGRESSION and ANOVA from a linear models perspective and also write
out the regression equation to acquire some sort of epiphany re the
meaning/interpretation of the dummy variables.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Dummy-Coding-Variables-tp5719548p5719554.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Dummy Coding Variables

Bella Riley
In reply to this post by Bella Riley
The DV is vehicle registrations for 2011. The IVs include price, manufacturer as a proxy for brand loyalty (this is the variable I thought needed coding), body type, customer reviews, fuel capacity, fuel type, etc.

So have I essentially gone at this in entirely the wrong way? (I'm an undergrad in Business Economics with only one statistics module under my belt, studied two years ago, so I'm far from comfortable with even the simplest SPSS processes.)

Any guidance would be helpful (please don't say I have to scrap the entire thing :( )

Re: Dummy Coding Variables

David Marso
Administrator
In reply to this post by Ryan
Mea Culpa ;-)
These rabbit holes sometimes manifest their own funky gravity fields.
Now, who wants some tea?

Re: Dummy Coding Variables

Bruce Weaver
Administrator
In reply to this post by Bella Riley
So is Manufacturer the variable with 25 (or 26) levels?  What people have been hinting at is that this is FAR too many levels for you to get anything useful out of your model.  (I also suspect you don't have nearly enough cases in your data set to support a model with that many categories for manufacturer.)  You might consider looking at North American / European / Asian instead.
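
A minimal sketch of that kind of collapsing follows. It assumes Manufacturer is a numeric variable, and the code ranges, the new variable Region, and the indicator names NorthAm and European are all invented for illustration; the real mapping has to come from the data set's own value labels.

```spss
* Hypothetical coding: the ranges below are invented, not taken from the data.
RECODE Manufacturer (1 THRU 8=1) (9 THRU 18=2) (19 THRU 25=3) INTO Region.
VALUE LABELS Region 1 'North American' 2 'European' 3 'Asian'.
FORMATS Region (f1).

* Check that every manufacturer landed in the intended region.
CROSSTABS Manufacturer BY Region.

* Two indicator variables, leaving Asian (Region = 3) as the reference category.
COMPUTE NorthAm = (Region EQ 1).
COMPUTE European = (Region EQ 2).
FORMATS NorthAm European (f1).
```

Any manufacturer code not listed in the RECODE is left system-missing on Region, so the CROSSTABS check also shows whether anything was missed.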

Finally, it also sounds like you need to do a fair bit of background reading, as another poster suggested.

@ David:  I'll take a splash of milk & no sugar please.  ;-)




Re: Dummy Coding Variables

Bella Riley
In reply to this post by Bella Riley
Ah, super!

So if I completely change tack and remove the variable (with justification provided), would it be appropriate to dummy code another variable (body type), which has 6 dummy variables?

Thanks for all your help with this!

Re: Dummy Coding Variables

Maguin, Eugene
JJ,
What is your sample size (maybe I missed this fact)? I think everybody would be saying one thing if you say '100' and another if you say '10000'.
Gene Maguin


Re: Dummy Coding Variables

Bella Riley
In reply to this post by Bella Riley
Gene, my sample size is fairly small...

256.

Re: Dummy Coding Variables

Richard Ristow
In reply to this post by Bella Riley
At 07:17 AM 4/19/2013, JJEcon wrote:

>The DV is vehicle registrations for 2011. The IVs include price,
>manufacturer as a proxy for brand loyalty (this is the variable I
>thought needed coding), body type, customer reviews, fuel capacity,
>fuel type etc.
>
>So have I essentially gone at this in entirely the wrong way?

It sounds to me like you have. You're trying to explain, in your
words, "vehicle registrations (as a proxy for vehicle demand)". In
this case, I don't see how you can have manufacturer as an
independent variable. *Which* car to purchase and register is as much
a part of the outcome as is *whether* to purchase and register a car.

What constitutes a data point in your dataset? A single car purchase?
In that case, the only possible dependent value seems to be 1. Or,
say, a month of registrations?

Regression is not directly about causality. But in setting up a
regression model, it's well to think about what causal relationships
MAY be present, and to design the model to be sensitive to them.

This isn't a problem about coding variables; it's a problem about how
you think of your study. Tell us, or tell yourself, what outcome
you're trying to explain, and what you think affects it.

To start with: It looks like you're trying to get at the decision to
purchase a car. What do you hypothesize goes into that, and how can
you measure it? And, crucially: since you're only looking at
registrations, what handle do you have on decisions *not* to purchase
a car? A decline in total registrations may reflect such decisions,
but it's not clear that you have such a time component, or anything
comparable.
