Hi all,
Just got a quick question from an SPSS newbie. I'm running a regression on vehicle registrations (as a proxy for vehicle demand) and wanted to use the marque as an independent variable. As it's categorical, I read that it was best to dummy code it into several dichotomous variables. I hope this is true! I now have 25 of these dummy-coded variables and am wondering what to do next. Do I include them in the regression model, or do I run an ANOVA with them as the only independent variables and exclude them from the complete model, which contains the other independent variables? Any pointers would be greatly appreciated!
Best,
Ashley Morris
Try Googling on <dummy variable regression SPSS>. You'll find things like this:
http://www.psychstat.missouristate.edu/multibook/mlt08m.html
I would not recommend the way the author created his dummy variables*, but the rest of the page might be helpful to you.
* I would have done this instead for the example used on that page:
COMPUTE FamilyS = (Dept EQ 1).
COMPUTE Biology = (Dept EQ 2).
FORMATS FamilyS Biology(f1).
CROSSTABS FamilyS Biology BY Dept. /* Check that indicator variables are correct.
ALSO note that if you run your model via UNIANOVA (Analyze > GLM > Univariate) rather than REGRESSION, you do not have to compute the dummy variables: Simply enter the original categorical variable as a fixed factor, and SPSS will compute the dummy variables internally.
HTH.
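To illustrate the UNIANOVA route mentioned above, here is a minimal sketch. The variable names Registrations, Marque and Price are hypothetical stand-ins for the poster's data, and Price is included only to show where a continuous covariate would go.

```spss
* Hypothetical variable names: Registrations (DV), Marque (categorical), Price (covariate).
* Marque is entered as a fixed factor after BY, so no dummy variables are computed by hand.
UNIANOVA Registrations BY Marque WITH Price
  /PRINT=PARAMETER
  /DESIGN=Marque Price.
```

The PRINT=PARAMETER subcommand requests the parameter estimates table, where each category of the factor is compared with a reference category (by default the last one).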
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Thanks for your help, Bruce.
As I've already coded the variables (and written about the process), I think I'm going to keep them as they are and use the standard regression function. Am I correct in thinking that these 25 dummy variables then replace the 1 categorical variable in the model and I can now go ahead and run the regression? (Sorry, this may be a very obvious question.)
Thanks again! :)
Well the 25? 26? level NOMINAL variable has absolutely no business being in the REGRESSION in the first place (so, I will assume you will draw your own conclusions). You would do yourself a favor by researching the relationship between REGRESSION and ANOVA from a linear models perspective and also write out the regression equation to acquire some sort of epiphany re the meaning/interpretation of the dummy variables.
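As a sketch of what writing out the regression equation might look like (the symbols here are illustrative and do not come from the thread): suppose the manufacturer variable has k categories, one of which is chosen as the reference, and the other predictors are collected in a vector x.

```latex
% k = number of manufacturer categories; category k is the (arbitrary) reference.
% D_{ij} = 1 if case i belongs to category j, else 0; x_i = the other predictors.
Y_i = \beta_0 + \sum_{j=1}^{k-1} \beta_j D_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + \varepsilon_i

% Interpretation of a dummy coefficient (other predictors held fixed):
\beta_j = E[Y \mid \text{category } j, \mathbf{x}] - E[Y \mid \text{category } k, \mathbf{x}]
```

Only k - 1 dummies enter the model alongside the intercept, because the k-th is perfectly collinear with the others. So 25 dummy variables correspond either to 26 categories with one left out as the reference, or to 25 categories, in which case one dummy has to be dropped.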
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
I did not dare ask what the research question was and how using this variable as an IV would answer it (and what exactly the DV was), so as to avoid going down the "rabbit hole." Are you not following your own advice, David?!?! :-)
Ryan

On Fri, Apr 19, 2013 at 6:52 AM, David Marso <[hidden email]> wrote:
>Well the 25? 26? level NOMINAL variable has absolutely *no business* being in
In reply to this post by Bella Riley
The DV is vehicle registrations for 2011. The IVs include price, manufacturer as a proxy for brand loyalty (this is the variable I thought needed coding), body type, customer reviews, fuel capacity, fuel type etc.
So have I essentially gone at this in entirely the wrong way? (I'm an undergrad in Business Economics with only one statistics module under my belt, studied two years ago, so I'm far from comfortable with even the simplest SPSS processes.) Any guidance would be helpful (please don't say I have to scrap the entire thing :( )
In reply to this post by Ryan
Mea Culpa ;-)
These rabbit holes sometimes manifest their own funky gravity fields. Now, who wants some tea?
In reply to this post by Bella Riley
So is Manufacturer the variable with 25 (or 26) levels? What people have been hinting at is that this is FAR too many levels for you to get anything useful out of your model. (I also suspect you don't have nearly enough cases in your data set to support a model with that many categories for manufacturer.) You might consider looking at North American / European / Asian instead.
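A rough sketch of how such a collapse could be done in syntax, assuming Manufacturer is a numeric variable coded 1 to 25. The groupings, the new variable names and the other variables in the model (Registrations, Price) are invented for illustration; the real mapping depends on how the manufacturers were coded.

```spss
* Hypothetical coding: Manufacturer is numeric with codes 1 to 25 and the groupings are invented.
RECODE Manufacturer (1 THRU 8 = 1) (9 THRU 17 = 2) (18 THRU 25 = 3) INTO Region.
VALUE LABELS Region 1 'North American' 2 'European' 3 'Asian'.
* Two indicators for three regions, leaving North American as the reference category.
COMPUTE European = (Region EQ 2).
COMPUTE Asian = (Region EQ 3).
FORMATS European Asian (f1).
* Check that the indicator variables are correct.
CROSSTABS Region BY European Asian.
REGRESSION
  /DEPENDENT Registrations
  /METHOD=ENTER Price European Asian.
```

With three regions, only two indicators enter the model, and their coefficients are read as differences from the omitted region.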
Finally, it also sounds like you need to do a fair bit of background reading, as another poster suggested. @ David: I'll take a splash of milk & no sugar please. ;-)
--
Bruce Weaver
In reply to this post by Bella Riley
Ah, super!
So if I completely change tack and remove the variable (with justification provided), would it be appropriate to dummy code another variable (body type), which has 6 dummy variables?
Thanks for all your help with this!
JJ,
What is your sample size (maybe I missed this fact)? I think everybody would be saying one thing if you say '100' and another if you say '10000'.
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of JJEcon
Sent: Friday, April 19, 2013 10:26 AM
To: [hidden email]
Subject: Re: Dummy Coding Variables

Ah, super! So if I completely change tack and remove the variable (with justification provided), if I were to dummy code another variable (body type) which has 6 dummy variables, is that appropriate? Thanks for all your help with this!

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Dummy-Coding-Variables-tp5719548p5719562.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
In reply to this post by Bella Riley
Gene, my sample size is fairly small: 256.
In reply to this post by Bella Riley
At 07:17 AM 4/19/2013, JJEcon wrote:
>The DV is vehicle registrations for 2011. The IVs include price,
>manufacturer as a proxy for brand loyalty (this is the variable I
>thought needed coding), body type, customer reviews, fuel capacity,
>fuel type etc.
>
>So have I essentially gone at this in entirely the wrong way?

It sounds to me like you have. You're trying to explain, in your words, "vehicle registrations (as a proxy for vehicle demand)". In this case, I don't see how you can have manufacturer as an independent variable. *Which* car to purchase and register is as much a part of the outcome as is *whether* to purchase and register a car.

What constitutes a data point in your dataset? A single car purchase? In that case, the only possible dependent value seems to be 1. Or, say, a month of registrations?

Regression is not directly about causality. But in setting up a regression model, it's well to think what causal relationships MAY be present, and design the model to be sensitive to them.

This isn't a problem about coding variables; it's a problem about how you think of your study. Tell us, or tell yourself, what outcome you're trying to explain, and what you think affects it.

To start with: It looks like you're trying to get at the decision to purchase a car. What do you hypothesize goes into that, and how can you measure it? And, crucially: since you're only looking at registrations, what handle do you have on decisions *not* to purchase a car? A decline in total registrations may reflect such decisions, but it's not clear that you have such a time component, or anything comparable.