CONTENTS DELETED
The author has deleted this message.
|
Administrator
|
1. You don't have to center any variables. The model will generate the same fitted values whether you center or not. Some people get upset about multicollinearity of product terms when variables are not centered, but that is a bit of a straw man, IMO--as noted above, the same fitted values are generated whether you center or not.

The main reason for centering is to make interpretation of coefficients easier. Centering helps in this regard even for a so-called "main effects only" model. I.e., if you center the variables on some sensible in-range value (which does not have to be the mean--it could be a value near the minimum, for example), the constant gives you the fitted value of Y when all centered variables are set to 0--i.e., when the original variables are set to the values used for centering.

Re centering dummy variables (I prefer the term "indicator" variables), it seems to me that makes interpretation more difficult, not easier. When all indicator variables are set to 0, you are setting that categorical variable to its reference category.

2. It does sound like you are including as many indicators as there are categories. If k = the number of categories, you need to include k-1 indicators. The category without an indicator is the reference category for the k-1 t-tests in the table of coefficients.

HTH.
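A quick way to check the "same fitted values" claim is to fit the same model twice, once with raw age and once with age centered on an in-range value. The sketch below uses made-up data and a small hand-rolled least-squares helper (the names `solve`, `ols`, `fitted` and `design` are mine, not SPSS functions), so treat it as an illustration only.

```python
# Sketch: OLS fitted values do not change when a predictor is centered.
# The data and helper names are invented for illustration.

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def fitted(rows, coefs):
    """Fitted value for each design-matrix row."""
    return [sum(c * x for c, x in zip(coefs, r)) for r in rows]

age = [23, 31, 38, 44, 52, 60, 27, 35]
female = [0, 1, 0, 1, 0, 1, 1, 0]
income = [30.0, 41.0, 47.0, 50.0, 61.0, 58.0, 36.0, 45.0]

def design(age_values):
    # Columns: constant, age, indicator, age-by-indicator product.
    return [[1.0, a, f, a * f] for a, f in zip(age_values, female)]

raw = design(age)
centered = design([a - 40 for a in age])  # center age on 40, an in-range value

b_raw = ols(raw, income)
b_cen = ols(centered, income)

# Largest discrepancy between the two sets of fitted values.
max_diff = max(abs(p - q) for p, q in zip(fitted(raw, b_raw), fitted(centered, b_cen)))
print("largest difference in fitted values:", max_diff)
print("constant, uncentered:", b_raw[0], " centered:", b_cen[0])
```

The two constants differ, but only because they answer different questions: the uncentered constant is the fitted value at age 0 for the reference category, while the centered one is the fitted value at age 40.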
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by stats123123123
Dear Garnik,
Are you running a linear model, or some other specification of the link function (e.g. logistic)? It is useful to know that in order to answer your question properly.

Centering is a linear transformation of a variable (subtracting the grand mean, i.e. the overall mean of the values in your sample). Therefore, you can center all independent variables, just some, or none, and the fit of the model will be no different. Centering is useful because it makes results easier to interpret. As the mean of any centered variable is zero, centering is very useful for multilevel regressions (hierarchical models) and for nonlinear regressions (logistic), as it lets you see at a glance the effect of a marginal change in one independent variable when all the others are at their average values.

You can center a dummy variable if you wish. Think of a dummy variable that was originally 0-1 (e.g. 0 for male and 1 for female), with a proportion of females = p (let's say 50% women, p = 0.5). Your new variable for gender is now -0.5 for male and +0.5 for female. This will make it more difficult to evaluate your regression equation at a glance. With 0-1 values it is much easier.

For example, suppose you have this linear regression equation:

Income = B(0) + B(1)*Gender + B(2)*Years in school + B(3)*Age + B(4)*Gender*Age

where years in school and age are continuous, and gender is a dummy with 0 = male and 1 = female. I will center only the two continuous IVs (years in school and age). Therefore, for a male with the average years in school and the average age (centered years in school will be zero and centered age will be zero), the predicted income will be Income = B(0), as all other terms are zero. For a female with the average years in school and the average age, the predicted income will be Income = B(0) + B(1). Thus, B(1) will be the difference in income between females and males at the average age in your sample. The estimated increment in Income when a male grows a year older will be B(3). The estimated increment in Income when a female grows a year older will be B(3) + B(4). So, B(4) will be the difference between the genders in the increment of income when people grow a year older...

If you need more help, do not hesitate to contact me again. |
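To make the reading of the coefficients concrete, here is a small sketch that plugs hypothetical values into the equation as written above, where B(3) multiplies Age and B(4) multiplies Gender*Age. The numbers and the function name `predicted_income` are invented purely to show the arithmetic; they do not come from any data.

```python
# Hypothetical coefficients, chosen only to make the arithmetic visible.
B0, B1, B2, B3, B4 = 40.0, -3.0, 2.0, 0.5, -0.2

def predicted_income(female, school_c, age_c):
    """Evaluate the equation; female is 0/1, school_c and age_c are mean-centered."""
    return B0 + B1 * female + B2 * school_c + B3 * age_c + B4 * female * age_c

# At the average years in school and average age, both centered values are 0.
male_avg = predicted_income(0, 0, 0)      # equals B0
female_avg = predicted_income(1, 0, 0)    # equals B0 + B1

# Change in predicted income for one extra year of age, by gender.
male_age_slope = predicted_income(0, 0, 1) - predicted_income(0, 0, 0)      # B3
female_age_slope = predicted_income(1, 0, 1) - predicted_income(1, 0, 0)    # B3 + B4

print("female minus male at the averages:", female_avg - male_avg)
print("age slope, male:", male_age_slope, " female:", female_age_slope)
```

With a 0-1 indicator, each of these quantities can be read directly off one or two coefficients, which is the point of leaving the dummy uncentered.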
In reply to this post by stats123123123
I only center the interactions themselves, not the original variables. That is, I compute as something like, AgeSex= (age-30)*(Sex-0.5).

One important consideration, which Bruce may undervalue, is that people do look at the univariate statistics, and often care about the single variables. The centered interactions will not confound the main effects and therefore will not give such misleading impressions.

Of course, you should never construct a model that has an interaction term without first including both the variables that are interacting... but that is an almost irresistible temptation for people who (foolishly) invoke "stepwise regression" and see that some badly computed interaction term (which is not centered, and which therefore is confounded with the main effect) has entered, and wiped out the potential contribution of the main effect. (Another reason not to ever use stepwise.)

As to (2.) - When you compute your indicator variables, you need to have k-1 indicators for k categories *that exist*. If there is a variable scored 1-4 but there were no cases with value= 3, the variable is completely scored with only 2 indicators, not 3. Could that explain it?

--
Rich Ulrich

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD |
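The confounding Rich describes can be seen by correlating the indicator with the product term computed both ways. This is only a sketch on invented data: the `pearson` helper is mine, the centering constants 30 and 0.5 are the ones from his AgeSex example, and how much centering helps will depend on the actual data.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up sample: age in years, sex coded 0/1.
age = [22, 25, 28, 30, 32, 35, 38, 30, 24, 36]
sex = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]

# Product of the raw variables versus product of the centered variables.
raw_product = [a * s for a, s in zip(age, sex)]
centered_product = [(a - 30) * (s - 0.5) for a, s in zip(age, sex)]

r_raw = pearson(sex, raw_product)
r_centered = pearson(sex, centered_product)

print("correlation of sex with raw product:     ", r_raw)
print("correlation of sex with centered product:", r_centered)
```

In this toy sample the raw product is almost a copy of the sex indicator, which is the situation in which a stepwise procedure could let the product term stand in for the main effect.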
Administrator
|
Rich, I'm actually quite a fan of centering variables so that the constant and other coefficients give something that is easier to interpret. I often do that even in the absence of interactions or polynomial terms. (I'll often rescale age in years to age in decades or half-decades, etc.) The point I was trying to make is that some folks (I think) fail to understand that whether you center or not, you'll get the same fitted values. So in that sense, it's the same model, and you don't have to center anything.
Good question about whether one or more of the categories is not represented--that had not occurred to me.
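Rich's point about categories "that exist" can be sketched as follows: build indicators only for the values that actually occur, then drop one as the reference category. The function name `indicator_columns` and the sample scores are hypothetical; this is not how SPSS itself creates the variables.

```python
def indicator_columns(values, reference=None):
    """Return 0/1 indicator columns for the categories that actually occur,
    leaving one category out as the reference."""
    observed = sorted(set(values))
    if reference is None:
        reference = observed[0]  # default: lowest observed value is the reference
    kept = [c for c in observed if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}

# A variable scored 1-4, but no case has the value 3.
scores = [1, 2, 4, 4, 1, 2, 2, 4]
indicators = indicator_columns(scores)

print("categories with an indicator:", sorted(indicators))
for category, column in indicators.items():
    print(category, column)
```

Three categories are observed here, so two indicators fully code the variable; an indicator for the empty category 3 would be a column of zeros and could not be estimated.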
|
In reply to this post by Rich Ulrich
There is at least one situation in which it is entirely appropriate to exclude a main effect term despite the presence of the interaction term. If one wants to fit a multivariate linear mixed model [via the MIXED procedure], then in general main effects should be excluded, with the exception of the response indicator variable. This allows separate intercepts and slopes to be estimated simultaneously for each response. Aside from this example, I haven't encountered a situation in my own work where excluding a main effect term was appropriate.

Ryan

On Thu, Mar 10, 2011 at 5:59 PM, Rich Ulrich <[hidden email]> wrote:
> Of course, you should never construct a model that has an interaction
> term without first including both the variables that are interacting...
> but that is an almost irresistible temptation for people who (foolishly)
> invoke "stepwise regression" and see that some badly computed
> interaction term (which is not centered, and which therefore is
> confounded with the main effect) has entered, and wiped out the
> potential contribution of the main effect. (Another reason not to
> ever use stepwise.)
|
Administrator
|
I can think of one more situation, Ryan, again in the context of multilevel models. Here's a note I wrote for myself in a syntax file where I was analyzing change over time with MIXED.
* NOTE: When adding longitudinal variables, we
* only want to examine their impact on the slope.
* If we examine the effect of longitudinal values
* on the INITIAL VALUE, we are in essence asking
* how the future is affecting the past.
* So this means that for longitudinal variables, we
* add only their interactions with TIME, but not
* their main effects. (This is permissible in the
* multilevel model for change -- may have to find
* the quote from Singer & Willett).

The reference is to the book Applied Longitudinal Data Analysis by Judith Singer & John Willett. Unfortunately, I did not mark down the page number where they make this comment.

Cheers,
Bruce
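One way to write down what the note describes, as a rough sketch only: the symbols below are my own shorthand (i indexes persons, j occasions, X is the longitudinal variable, u are person-level random effects) and are not taken from Singer & Willett's notation.

```latex
% Growth model with the longitudinal variable X entering only through TIME
Y_{ij} = \beta_0 + \beta_1\,\mathrm{TIME}_{ij}
       + \beta_2\,\bigl(X_{ij}\times\mathrm{TIME}_{ij}\bigr)
       + u_{0i} + u_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij}

% At the initial occasion the product term vanishes, so X cannot move the intercept
\mathrm{E}\bigl[Y_{ij}\mid \mathrm{TIME}_{ij}=0\bigr] = \beta_0

% X only shifts the rate of change
\frac{\partial\,\mathrm{E}[Y_{ij}]}{\partial\,\mathrm{TIME}_{ij}} = \beta_1 + \beta_2\,X_{ij}
```

Because there is no separate term for X, its values cannot alter the fitted initial status, which is the "future affecting the past" problem the note wants to avoid.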