Centering and excluded variables


Centering and excluded variables

stats123123123
CONTENTS DELETED
The author has deleted this message.

Re: Centering and excluded variables

Bruce Weaver
Administrator

1. You don't have to center any variables.  The model will generate the same fitted values whether you center or not.  Some people get upset about multicollinearity of product terms when variables are not centered, but that is a bit of a straw man, IMO--as noted above, the same fitted values are generated whether you center or not.  The main reason for centering is to make interpretation of coefficients easier.  Centering helps in this regard even for a so-called "main effects only" model.  I.e., if you center the variables on some sensible in-range value (which does not have to be the mean--it could be a value near the minimum, for example), the constant gives you the fitted value of Y when all  centered variables are set to 0--i.e., when the original variables are set to the values used for centering.
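
A minimal sketch of this in SPSS syntax, with hypothetical variable names (y, age, bmi) and arbitrary in-range centering values (40 and 25):

* Center age on 40 and bmi on 25 (hypothetical, in-range values).
COMPUTE age_c = age - 40.
COMPUTE bmi_c = bmi - 25.
EXECUTE.

* Fit the model on the raw and on the centered predictors; the saved
* predictions (PRE_1 and PRE_2) are identical case by case.  Only the
* constant changes: in the second run it is the fitted Y at age = 40, bmi = 25.
REGRESSION
  /STATISTICS COEFF R
  /DEPENDENT y
  /METHOD=ENTER age bmi
  /SAVE PRED.
REGRESSION
  /STATISTICS COEFF R
  /DEPENDENT y
  /METHOD=ENTER age_c bmi_c
  /SAVE PRED.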

Re centering dummy variables (I prefer the term "indicator" variables), it seems to me that centering them makes interpretation more difficult, not easier.  When all indicator variables are set to 0, you are setting that categorical variable to its reference category.

2. It does sound like you are including as many indicators as there are categories.  If k = the number of categories, you need to include k-1 indicators.  The category without an indicator is the reference category for the k-1 t-tests in the table of coefficients.
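
In SPSS syntax, a sketch with a hypothetical 4-category variable (marstat, coded 1-4) might look like this, with category 4 left as the reference:

COMPUTE mar1 = (marstat = 1).
COMPUTE mar2 = (marstat = 2).
COMPUTE mar3 = (marstat = 3).
EXECUTE.
* Enter mar1 to mar3 (not a fourth indicator) in REGRESSION; the coefficients
* are then comparisons with category 4.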

HTH.


Garnik Kazarjan wrote
Dear all,
 
I have two questions:
 
1. Say I am doing a regression with 10 independent variables and 1 dependent variable.
If I want to center the variables (because I have to create interaction terms), do I have to center all IVs or just the ones used in the interactions?
Also, can I center dummy variables? If yes, should I? And if so, should I center on the mean or use some other method?
 
2. When I run the regression in SPSS it sometimes excludes variables and only mentions that they have a tolerance of 0.000. This concerns dummy variables. I have double checked to make sure that not all categories are included as dummy variables, so I am rather lost about what to do. What should I do so that it does not exclude variables?
 
Thanks in advance for any help.
 
Kind Regards,
 
Garnik
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Centering and excluded variables

jmdpulido
In reply to this post by stats123123123
Dear Garnik,

Are you running a linear model, or some other specification of the link function (e.g. logistic)? It is useful to know that in order to answer your question properly.

Centering is a linear transformation of a variable (subtracting the grand mean, i.e. the overall mean of the values in your sample). Therefore, you can center all independent variables, just some, or none, and the fit of the model will be no different.

Centering is useful for interpreting results easily. Because the mean of any centered variable is zero, centering is very useful for multilevel regressions (hierarchical models) and for nonlinear regressions (e.g. logistic), as it lets you see at a glance the effect of a marginal change in one independent variable when all the others are at their average values.

You can center a dummy variable if you wish. Think of a dummy variable that was originally 0-1 (e.g. 0 for male and 1 for female) with a proportion of females p = 0.5: after centering, your new gender variable is -0.5 for males and +0.5 for females. This will make it more difficult to evaluate your regression equation at a glance.

With 0-1 values it is much easier.  For example if you have this linear regression equation:

Income = B(0) + B(1)Gender + B(2) Years in school + B(3) Age + B(4) Gender*Age

where years in school and age are continuous, and gender is a dummy with 0 = male and 1 = female.

I would center only the two continuous IVs (years in school and age).

Therefore, for a male with the average years at school and the average age (so the centered years in school and the centered age are both zero), the predicted income will be Income = B(0), as all other terms are zero.

For a female with the average years at school and the average age, the predicted income will be Income = B(0) + B(1).

Thus, B(1) will be the difference in income between females and males at the average age in your sample.

The estimated increase in Income when a male grows a year older will be B(3).
The estimated increase in Income when a female grows a year older will be B(3) + B(4).

So, B(4) will be the between-gender difference in the increase in income associated with growing a year older.
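
In SPSS syntax, a sketch of this model might look like the following (variable names are hypothetical, and 12.4 and 41.8 stand in for the sample means you would read off DESCRIPTIVES):

DESCRIPTIVES VARIABLES=years_school age.
* Center the two continuous IVs on their sample means (hypothetical values here).
COMPUTE school_c = years_school - 12.4.
COMPUTE age_c = age - 41.8.
* Build the product term from the 0/1 gender variable and centered age.
COMPUTE gend_age = gender * age_c.
EXECUTE.

REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT income
  /METHOD=ENTER gender school_c age_c gend_age.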

If you need more help, do not hesitate to contact me again.

Re: Centering and excluded variables

Rich Ulrich
In reply to this post by stats123123123
I only center the interactions themselves, not the original
variables.  That is, I compute something like
  AgeSex = (age - 30)*(Sex - 0.5).

One important consideration, which Bruce may undervalue, is that
people do look at the univariate statistics, and often care about
the single variables.  The centered interactions will not be confounded
with the main effects, and therefore will not give such misleading impressions.
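
In SPSS syntax, a sketch of that approach (with a hypothetical dependent variable y, and Sex coded 0/1) would be:

COMPUTE AgeSex = (age - 30)*(Sex - 0.5).
EXECUTE.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT y
  /METHOD=ENTER age Sex AgeSex.
* Only the product term is centered; age and Sex enter as they are.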

Of course, you should never construct a model that has an interaction
term without first including both the variables that are interacting...
but that is an almost irresistible temptation for people who (foolishly)
invoke "stepwise regression"  and see that some badly computed
interaction term (which is not centered, and which therefore is
confounded with the main effect)  has entered, and wiped out the
potential contribution of the main effect.  (Another reason not to
ever use stepwise.)

As to (2.) - When you compute your indicator variables, you need
to have k-1 indicators for k categories *that exist*.  If there is
a variable scored 1-4 but there were no cases with value = 3, the
variable is completely scored with only 2 indicators, not 3.
Could that explain it?
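
A quick way to see whether that is happening (hypothetical variable name) is to check which categories actually occur before building the indicators:

FREQUENCIES VARIABLES=catvar.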

--
Rich Ulrich




Re: Centering and excluded variables

Bruce Weaver
Administrator
Rich, I'm actually quite a fan of centering variables so that the constant and other coefficients give something that is easier to interpret.  I often do that even in the absence of interactions or polynomial terms.  (I'll often rescale age in years to age in decades or half-decades, etc.)  The point I was trying to make is that some folks (I think) fail to understand that whether you center or not, you'll get the same fitted values.  So in that sense, it's the same model, and you don't have to center anything.
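
For example (a sketch, with 50 as an arbitrary centering value), rescaling age to centered decades is just:

COMPUTE age_dec = (age - 50)/10.
EXECUTE.
* The age coefficient is then the change in Y per decade, and the constant
* refers to a 50-year-old (when the other predictors are at their centering values).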

Good question about whether one or more of the categories is not represented--that had not occurred to me.



Re: Centering and excluded variables

Ryan
In reply to this post by Rich Ulrich
There is at least one situation in which it is entirely appropriate to
exclude a main effect term despite the presence of the interaction
term. If one wants to fit a multivariate linear mixed model [via the
MIXED procedure], then in general the main effects should be excluded,
with the exception of the response indicator variable. This allows
separate intercepts and slopes to be estimated simultaneously for each
response. Aside from this example, I haven't encountered a situation
in my own work where excluding a main effect term was appropriate.
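
A minimal sketch of that kind of specification (all names hypothetical: responses stacked in long format, a response indicator rindex, one covariate x, subjects identified by id):

MIXED y BY rindex WITH x
  /FIXED = rindex rindex*x | NOINT SSTYPE(3)
  /REPEATED = rindex | SUBJECT(id) COVTYPE(UN)
  /PRINT = SOLUTION.
* NOINT plus the rindex terms give each response its own intercept and slope;
* the REPEATED line lets the responses' residuals covary within a subject.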

Ryan


Re: Centering and excluded variables

Bruce Weaver
Administrator
I can think of one more situation, Ryan, again in the context of multilevel models.  Here's a note I wrote for myself in a syntax file where I was analyzing change over time with MIXED.

* NOTE:  When adding longitudinal variables, we
* only want to examine their impact on the slope.
* If we examine the effect of longitudinal values
* on the INITIAL VALUE, we are in essence asking
* how the future is affecting the past.
* So this means that for longitudinal variables, we
* add only their interactions with TIME, but not
* their main effects.  (This is permissible in the
* multilevel model for change -- may have to find
* the quote from Singer & Willett).

The reference is to the book Applied Longitudinal Data Analysis by Judith Singer & John Willett.  Unfortunately, I did not mark down the page number where they make this comment.
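
A minimal sketch of a model along those lines (hypothetical names: y measured repeatedly, time since baseline, longvar a longitudinal predictor, subjects identified by id):

MIXED y WITH time longvar
  /FIXED = time time*longvar | SSTYPE(3)
  /RANDOM = INTERCEPT time | SUBJECT(id) COVTYPE(UN)
  /PRINT = SOLUTION TESTCOV.
* longvar appears only through its interaction with time, so it affects the
* slope but not the fitted initial value.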

Cheers,
Bruce

