I am running a regression of a continuous variable on a number of independent variables that include two categorical variables (age and income). There are 4 income categories and 9 age categories, which means I need 3 + 8 = 11 dummy variables. I understand that SPSS automatically takes one income category and one age category as the reference categories if I create all 4 + 9 dummies and enter them into the regression with the syntax

/regression /dependent=VarY /method = enter Var 1....Inc1 Inc2 Inc3 inc4 age1 age2..........age9 .

Are these the categories returned in the Excluded Variables table after the Coefficients table? Now, when running this regression for different regions (there are too many regions, so I'm evaluating the regression separately for each region), I sometimes get 2 age categories in the Excluded Variables table. Also, for one region, I get no Excluded Variables table at all. What should I do?
Actually, SPSS does not "automatically take one category as the reference category"; rather, SPSS excludes variables when their tolerance is too low. One dummy is excluded because it can be perfectly predicted from the dummies for the other categories (together with the constant): its tolerance is zero, so it is excluded. (Every category can be perfectly predicted from the dummies for the other categories, so which one is excluded is arbitrary.)
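A minimal sketch of the tolerance idea, in Python with NumPy rather than SPSS. Tolerance is taken here as 1 - R² from regressing one predictor on the other predictors plus a constant. The helper `tolerance` and the eight income codes are hypothetical; the point is only that, with all four income dummies entered, any one of them has tolerance zero (up to floating-point noise), while with one dummy dropped it does not.

```python
import numpy as np

def tolerance(X, j):
    """Hypothetical helper: 1 - R^2 of column j regressed on the other
    columns of X plus a constant."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid                    # unexplained variation
    ss_tot = ((y - y.mean()) ** 2).sum()      # total variation about the mean
    return ss_res / ss_tot                    # equals 1 - R^2

income = np.array([1, 2, 3, 4, 2, 1, 3, 4])  # hypothetical codes
full = np.column_stack([(income == c).astype(float) for c in [1, 2, 3, 4]])

# Inc1 = 1 - Inc2 - Inc3 - Inc4 exactly, so its tolerance is essentially 0.
print(tolerance(full, 0))
# With Inc1 dropped, Inc2 is no longer predictable from Inc3, Inc4 and the constant.
print(round(tolerance(full[:, 1:], 0), 3))   # 0.667
```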
If 2 dummies of the same variable are excluded, then a category of this variable can be perfectly predicted from the other predictors. No dummies excluded seems weird; I would check the data/dummies.

Regards,
Anita van der Kooij
Data Theory Group
Leiden University

From: SPSSX(r) Discussion on behalf of Yusof ahmad
Sent: Thu 31/08/2006 18:29
Subject: spss dummy variable regression--urgent
In reply to this post by Yusof ahmad
SPSS automatically takes one category as the reference category (the last one by default, I seem to recall, but you can change that with syntax) when SPSS itself converts a categorical variable into dummies, e.g. in Cox regression or logistic regression. Not so in Linear Regression, where you must create and enter the dummies yourself, and the choice of reference category is entirely yours.

Hector
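Because the reference category is your choice, it helps to see what the choice changes. The sketch below (Python with NumPy; made-up data, hypothetical helper names `dummies` and `fit`) fits the same model twice, once with income category 1 as reference and once with category 4. The fitted values are identical; only the meaning of the coefficients changes, each one being a difference from whichever category was left out.

```python
import numpy as np

def dummies(codes, cats):
    """Hypothetical helper: one 0/1 column per code in `cats`."""
    return np.column_stack([(codes == c).astype(float) for c in cats])

def fit(y, X):
    """Ordinary least squares with a constant; returns (coefficients, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

income = np.array([1, 2, 3, 4, 2, 1, 3, 4])
y = np.array([10.0, 12.0, 15.0, 20.0, 13.0, 11.0, 16.0, 21.0])  # made-up outcome

coef_first, fitted_first = fit(y, dummies(income, [2, 3, 4]))  # reference: category 1
coef_last, fitted_last = fit(y, dummies(income, [1, 2, 3]))    # reference: category 4

print(np.allclose(fitted_first, fitted_last))  # True: same model, different labels
print(coef_first.round(2))  # constant 10.5, then +2, +5, +10 relative to category 1
print(coef_last.round(2))   # constant 20.5, then -10, -8, -5 relative to category 4
```

With income as the only predictor, the constant is the reference category's mean and each coefficient is that category's mean minus the reference mean, which is why a well-chosen reference makes the output easier to read.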
In reply to this post by Yusof ahmad
At 11:29 AM 8/31/2006, Yusof ahmad wrote:
>I am running a regression of a continuous variable on independent
>variables that include two categorical variables (age and income).
>There are 4 income categories and 9 age categories. This means I need
>3+8=11 dummy vars.

No, as others have suggested, it doesn't; but we'll come to that.

>I understand that SPSS automatically takes one income category and one
>age category as the reference categories if I use 4 +9 dummies and
>enter them into the regression using

No, absolutely not. Some SPSS procedures will take a categorical variable as an independent, and generate the dummy variables themselves. Those procedures will take one category as reference. But in

/method = enter Var 1....Inc1 Inc2 Inc3 inc4 age1 age2..........age9 .

SPSS doesn't know that Inc1 to Inc3 are indicator ('dummy') variables for the categorical variable Income; how could it? So it can't pick a reference category to eliminate.

>When I use the syntax
>
>/regression
>/dependent=VarY
>/method = enter Var 1....Inc1 Inc2 Inc3 inc4 age1 age2..........age9 .
>
>These are returned in an excluded vars table after the coefficients
>table?

They are, but that's not why they are. As Anita van der Kooij wrote, SPSS excludes variables when it determines that some are linear combinations of the others.

Before I go on: it is not good practice to use a set of regressor variables that you know are linearly dependent. Much better to choose the reference categories yourself, and drop the dummies for those categories. Choosing wisely can make your results much easier to interpret.

>Now when running this regression for different regions (too many
>regions so I'm evaluating the regression separately for each
>region)--i sometimes get 2 age categories in the excluded vars table.

The region has nobody in one of the age categories. The dummy for that category therefore has to be excluded, in addition to having to exclude one dummy because of linear dependence.
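A small illustration of the empty-category case, in Python with NumPy (hypothetical ages for one region, not SPSS output): when no case falls in age category 6, its dummy is a column of zeros, so the constant plus all nine age dummies has two more columns than its rank, matching the two exclusions.

```python
import numpy as np

# Hypothetical region in which nobody falls in age category 6.
age = np.array([1, 2, 3, 4, 5, 7, 8, 9, 1, 3, 5, 7])
full = np.column_stack([(age == c).astype(float) for c in range(1, 10)])
X = np.column_stack([np.ones(len(age)), full])  # constant + all 9 age dummies

print(full.sum(axis=0))  # cases per category; the age-6 column sums to zero
rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)  # 10 8
print(X.shape[1] - rank) # 2 columns that cannot be estimated
```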
(I write "the region has nobody in one of the age categories" categorically. There MAY be another way this can happen, but I can't think of one.)

>Also, for one region, I get no excluded vars table..what to do?

Again, what you should do is choose your reference categories, rather than letting SPSS try; REGRESSION was not meant to be used this way. Again, from Anita,

>No dummies excluded seems weird, I would check the data/dummies.

It seems very weird indeed; I can't think of a way it could happen. POSSIBLY it could result from some kind of rounding errors in the calculations, though that doesn't seem very plausible. As Anita says, check your data, carefully.
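Following the advice to check the data and dummies, here is a hedged sketch of one such check in Python with NumPy. It assumes each case should fall into exactly one category of a variable, so each row of that variable's full dummy set should sum to 1. The helper `check_dummy_set` and the example rows are hypothetical. It reports cases whose row does not sum to 1 and categories with no cases. A case coded 0 on every dummy of a set is one thing worth looking for: for such cases the dummies no longer add up to the constant, so the full set is not exactly linearly dependent.

```python
import numpy as np

def check_dummy_set(dummies, names):
    """Hypothetical helper. Returns (cases whose dummies do not sum to 1,
    categories with no cases) for one variable's full set of dummies."""
    dummies = np.asarray(dummies, dtype=float)
    bad_rows = np.flatnonzero(dummies.sum(axis=1) != 1).tolist()
    counts = dummies.sum(axis=0)
    empty = [name for name, n in zip(names, counts) if n == 0]
    return bad_rows, empty

# Hypothetical income dummies for five cases; case index 2 is coded 0 on all four.
inc = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1],
       [1, 0, 0, 0]]
print(check_dummy_set(inc, ["Inc1", "Inc2", "Inc3", "Inc4"]))  # ([2], ['Inc3'])
```

Running the same check separately on each region's data would show which regions have empty categories and which have cases outside every category.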