spss dummy variable regression--urgent


Yusof ahmad
I am running a regression of a continuous variable on a number of
independent variables that include two categorical variables (age and
income). There are 4 income categories and 9 age categories. This means I
need 3+8=11 dummy vars.
I understand that SPSS automatically takes one income category and one age
category as the reference categories if I use 4 +9 dummies and enter them
into the regression using

When I use the syntax

/regression
/dependent=VarY
/method = enter Var 1....Inc1 Inc2 Inc3 inc4 age1 age2..........age9 .

These are returned in an excluded vars table after the coefficients table?

Now when running this regression for different regions (there are too many
regions, so I'm evaluating the regression separately for each region), I
sometimes get 2 age categories in the excluded vars table. Also, for one
region, I get no excluded vars table. What should I do?
Re: spss dummy variable regression--urgent

Kooij, A.J. van der
Actually, SPSS does not "automatically take one category as the reference category"; rather, SPSS excludes variables when their tolerance is too low. One dummy is excluded because it can be perfectly predicted from the dummies for the other categories, so its tolerance is zero and it is excluded. (Every category's dummy can be perfectly predicted from the dummies for the other categories, so which one is excluded is arbitrary.)
If 2 dummies of the same variable are excluded, then a category of this variable
can be perfectly predicted from the other predictors.
Having no dummies excluded seems weird; I would check the data/dummies.
 
Regards,
Anita van der Kooij
Data Theory Group
Leiden University
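
The "perfectly predicted" point above can be illustrated with a small sketch. It is written in Python rather than SPSS syntax, purely as an illustration; the six income values are made-up data, and the dictionary `dummies` stands in for the poster's Inc1 to inc4.

```python
# Hypothetical data: income category (1-4) for six cases.
income = [1, 2, 3, 4, 2, 1]

# One 0/1 indicator per category, standing in for Inc1..inc4.
dummies = {k: [1 if v == k else 0 for v in income] for k in (1, 2, 3, 4)}

# Each case falls in exactly one category, so the four dummies sum to 1,
# which is also the value of the regression constant for every case.
row_sums = [sum(dummies[k][i] for k in dummies) for i in range(len(income))]

# Hence any one dummy is an exact linear function of the constant and the others.
inc4_rebuilt = [1 - dummies[1][i] - dummies[2][i] - dummies[3][i]
                for i in range(len(income))]

print(row_sums)                    # [1, 1, 1, 1, 1, 1]
print(inc4_rebuilt == dummies[4])  # True
```

The same reconstruction works for any of the four dummies, which is why the choice of which one gets excluded is arbitrary.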

Re: spss dummy variable regression--urgent

Hector Maletta
In reply to this post by Yusof ahmad
SPSS automatically takes one category as reference category (the last one by
default, I seem to recall, but you can change that with syntax) when SPSS
itself does the conversion of a category variable into dummies, e.g. in Cox
regression or logistic regression. Not so in Linear Regression, where you
must create and enter the dummies yourself, and the choice of reference
category is entirely yours.
Hector
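
As a rough sketch of "create the dummies yourself and choose the reference category", the following Python (not SPSS syntax) builds one indicator per category except a reference you name. The function `make_dummies` and the sample data are hypothetical, introduced only for illustration.

```python
def make_dummies(values, categories, reference):
    """Return {category: 0/1 list} for every category except the reference."""
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    return {
        c: [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }

# Hypothetical data: income category (1-4) for six cases.
income = [1, 2, 3, 4, 2, 1]

# Category 4 is chosen as the reference, so only 3 dummies are created.
inc = make_dummies(income, categories=[1, 2, 3, 4], reference=4)

print(sorted(inc))  # [1, 2, 3]
print(inc[2])       # [0, 1, 0, 0, 1, 0]
```

Cases in the reference category are zero on all three dummies, so each coefficient is read as a difference from that category.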

Re: spss dummy variable regression--urgent

Richard Ristow
In reply to this post by Yusof ahmad
At 11:29 AM 8/31/2006, Yusof ahmad wrote:
>I am running a regression of a continuous variable on independent
>variables that include two categorical variables (age and income).
>There are 4 income categories and 9 age categories. This means I need
>3+8=11 dummy vars.

No, as others have suggested, it doesn't; but we'll come to that.

>I understand that SPSS automatically takes one income category and one
>age category as the reference categories if I use 4 +9 dummies and
>enter them into the regression using

No, absolutely not. Some SPSS procedures will take a categorical
variable as an independent, and generate the dummy variables
themselves. Those procedures will take one category as reference.

But in

   /method = enter Var 1....Inc1 Inc2 Inc3 inc4 age1 age2..........age9 .

SPSS doesn't know that Inc1 to inc4 are indicator ('dummy') variables
for the categorical variable Income; how could it? So it can't pick a
reference category to eliminate.

>When I use the syntax
>
>/regression
>/dependent=VarY
>/method = enter Var 1....Inc1 Inc2 Inc3 inc4 age1 age2..........age9 .
>
>These are returned in an excluded vars table after the coefficients
>table?

They are, but that's not why they are. As Anita van der Kooij wrote,
SPSS excludes variables when it determines that some are linear
combinations of the others.

Before I go on: it is not good practice to use a set of regressor
variables that you know are linearly dependent. Much better to choose
the reference categories yourself, and drop the dummies for those
categories. Choosing wisely can make your results much easier to
interpret.

>Now when running this regression for different regions (too many
>regions so I'm evaluating the regression separately for each
>region), I sometimes get 2 age categories in the excluded vars table.

The region has nobody in one of the age categories. The dummy for that
category therefore has to be excluded, in addition to having to exclude
one dummy because of linear dependence. (I write "the region has nobody
in one of the age categories" categorically. There MAY be another way
this can happen but I can't think of one.)
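
One way to check the empty-category explanation is to count cases per region and age category before running anything. The sketch below is Python, not SPSS syntax; `empty_categories`, the key names `region` and `age`, and the five sample rows are all hypothetical.

```python
from collections import Counter

def empty_categories(rows, group_key, cat_key, categories):
    """For each group, list the categories that have no cases at all."""
    counts = Counter((r[group_key], r[cat_key]) for r in rows)
    groups = sorted({r[group_key] for r in rows})
    # A Counter returns 0 for combinations that never occur.
    return {g: [c for c in categories if counts[(g, c)] == 0] for g in groups}

# Hypothetical data: region B has nobody in age category 2.
rows = [
    {"region": "A", "age": 1}, {"region": "A", "age": 2}, {"region": "A", "age": 3},
    {"region": "B", "age": 1}, {"region": "B", "age": 3},
]

gaps = empty_categories(rows, "region", "age", categories=[1, 2, 3])
print(gaps)  # {'A': [], 'B': [2]}
```

A region listed with a non-empty result has a dummy that is zero for every case there, so that dummy carries no information in that region's regression.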

>Also, for one region, I get no excluded vars table..what to do?

Again, what you should do is choose your reference categories, rather
than letting SPSS try - REGRESSION was not meant to be used this way.

Again, from Anita,

>No dummies excluded seems weird, I would check the data/dummies.

It seems very weird indeed; I can't think of a way it could happen.
POSSIBLY it could result from some kind of rounding errors in the
calculations, though that doesn't seem very plausible. As Anita says,
check your data - carefully.
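
A possible starting point for that check, sketched in Python rather than SPSS syntax: confirm that every case has exactly one dummy set within each categorical variable. If some cases in a region have none or several set, the dummies no longer sum to the constant, so the exact linear dependence disappears. Whether that is what happened here is only an assumption; the thread does not confirm it. The helper `bad_dummy_rows` and the sample rows are hypothetical, and the dummies are assumed to be coded 0/1 with no missing values.

```python
def bad_dummy_rows(rows, dummy_names):
    """Return indices of rows whose dummies do not sum to exactly 1."""
    return [i for i, r in enumerate(rows)
            if sum(r[name] for name in dummy_names) != 1]

# Hypothetical cases using the income dummy names from the original post.
rows = [
    {"Inc1": 1, "Inc2": 0, "Inc3": 0, "inc4": 0},  # correctly coded
    {"Inc1": 0, "Inc2": 0, "Inc3": 0, "inc4": 0},  # no category set
    {"Inc1": 0, "Inc2": 1, "Inc3": 1, "inc4": 0},  # two categories set
]

print(bad_dummy_rows(rows, ["Inc1", "Inc2", "Inc3", "inc4"]))  # [1, 2]
```

Running the same check on the age dummies, region by region, would show whether the region with no excluded variables contains miscoded cases.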