Multiple Regression with Continuous and Categorical Variables

Bree Witteveen
Hello all,
I know that to use categorical independent variables in multiple
regression you must create dummy variables. How do you include the dummy
variables in a multiple regression model that also includes several
continuous independent variables? Is it possible to use dummy variables
and continuous variables together in a stepwise regression?
Thanks!

--
Briana H. Witteveen
Doctoral Candidate

University of Alaska Fairbanks
University of Central Florida
Physiological Ecology and Bioenergetics Lab
118 Trident Way
Kodiak, AK 99615
Office: (907) 486-1514
Mobile: (907) 942-2733

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Multiple Regression with Continuous and Categorical Variables

SR Millis-3
Stepwise regression should be avoided:

1. It yields R-squared values that are badly biased to be high.

2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.

3. The method yields confidence intervals for effects and predicted values that are falsely narrow (see Altman and Andersen, 1989, Statistics in Medicine).

4. It yields p-values that do not have the proper meaning, and the proper correction for them is a difficult problem.

5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large; see Tibshirani, 1996).

6. It has severe problems in the presence of collinearity.

7. It is based on methods (e.g., F tests for nested models) that were intended to be used to test prespecified hypotheses.

8. Increasing the sample size doesn't help very much (see Derksen and Keselman, 1992).

9. It allows us to not think about the problem.

10. It uses a lot of paper.
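
Point 1 is easy to see in a small simulation. The sketch below is only an illustration under stated assumptions, not anyone's published method: the outcome and 50 candidate predictors are pure noise, so the true R-squared of every candidate is zero. The first step of forward selection picks the candidate with the largest squared correlation with the outcome, and that value is compared with the R-squared of a predictor fixed in advance. The sample size, number of candidates and seed are arbitrary choices.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation = R-squared of a one-predictor regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(2008)  # arbitrary seed, only for reproducibility
n_cases, n_candidates = 30, 50

# Outcome and candidate predictors are all independent noise:
# the true R-squared of every candidate is zero.
y = [rng.gauss(0, 1) for _ in range(n_cases)]
candidates = [[rng.gauss(0, 1) for _ in range(n_cases)]
              for _ in range(n_candidates)]

r2_all = [r_squared(x, y) for x in candidates]
r2_prespecified = r2_all[0]  # a predictor chosen before looking at the data
r2_selected = max(r2_all)    # what the first step of forward selection picks

print(f"prespecified predictor: R^2 = {r2_prespecified:.3f}")
print(f"best of {n_candidates} noise predictors: R^2 = {r2_selected:.3f}")
```

The selected R-squared can never be smaller than the prespecified one, and with many candidates it is typically far from zero even though nothing is related to the outcome.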

Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682


--- On Wed, 7/23/08, Briana H. Witteveen <[hidden email]> wrote:

> From: Briana H. Witteveen <[hidden email]>
> Subject: Multiple Regression with Continuous and Categorical Variables
> To: [hidden email]
> Date: Wednesday, July 23, 2008, 10:03 PM
> Hello all,
> I know that to use categorical independent variables in multiple
> regression you must create dummy variables. How do you include the dummy
> variables in a multiple regression model that also includes several
> continuous independent variables? Is it possible to use dummy variables
> and continuous in a stepwise regression?
> Thanks!

Re: Multiple Regression with Continuous and Categorical Variables

Marta Garcia-Granero
In reply to this post by Bree Witteveen
Briana H. Witteveen wrote:
> I know that to use categorical independent variables in multiple
> regression you must create dummy variables. How do you include the dummy
> variables in a multiple regression model that also includes several
> continuous independent variables? Is it possible to use dummy variables
> and continuous in a stepwise regression?
>
Briana:

Scott Millis has already given you a decalogue of reasons for avoiding
stepwise regression. I could add one more to his collection: stepwise
regression doesn't properly handle dummy-coded categorical variables. A
categorical variable with k levels is represented by k-1 dummies that
only make sense as a set, but stepwise selection treats each dummy as a
separate variable. The final model might lack one of the dummy variables,
rendering the effect of the categorical variable uninterpretable.
Stepwise regression has been ironically called "unwise" regression
(Leamer, 1985). Avoid it. Period.
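
As for how the dummies go into the model next to the continuous predictors: they are just additional columns, and the whole set enters (or leaves) together. Here is a minimal sketch of the coding step, assuming a hypothetical 3-level variable `region` and a hypothetical continuous predictor `mass_kg`. The helper `dummy_code` is made up for illustration and is not an SPSS or library function.

```python
def dummy_code(values, reference):
    """Return {level: 0/1 column} for every level except the reference."""
    levels = sorted(set(values) - {reference})
    return {lev: [1 if v == lev else 0 for v in values] for lev in levels}


# Hypothetical data: one 3-level categorical and one continuous predictor.
region = ["north", "south", "west", "south", "north", "west"]
mass_kg = [12.1, 15.4, 9.8, 14.0, 11.5, 10.2]

# k = 3 levels -> k - 1 = 2 dummies; "north" is the reference category.
dummies = dummy_code(region, reference="north")

# Design matrix columns: intercept, continuous predictor, then ALL dummies
# of the set. None of them is dropped individually.
columns = {"intercept": [1] * len(region), "mass_kg": mass_kg}
columns.update({f"region_{lev}": col for lev, col in dummies.items()})

design = list(zip(*columns.values()))  # one tuple per case
print(list(columns))
for row in design:
    print(row)
```

Each dummy's slope is then the adjusted difference between that level and the reference level, which is why losing one dummy of the set changes the meaning of the others.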

Take a look at chapter 4 of the book "Applied Logistic Regression" by
Hosmer & Lemeshow (1989). They give excellent guidelines for model
development. Basically:

1) Univariate analysis

2) Select the variables that should be included in the next step:
- Those that showed interesting results in univariate analysis (this
doesn't necessarily mean "significant")
- Those that your experience tells you might play an important role
(as confounders and/or effect modifiers). In epidemiology/medical
research, gender and age are typical variables.

3) Build a model with all the variables you selected in the previous
step. Examine their adjusted effects and carefully remove those that look
unimportant. Check how the removal of each variable changes the slopes
of the rest. An important change (above 10% is a good reference) shows
that the variable you removed plays a role in the model and should stay
in it. If you suspect a variable is involved in interactions (see next
step), it should never be removed (hierarchical rule). The final model
is called the "main effects model".

4) Examine the existence of interactions between variables. Limit the
interaction terms according to these conditions:
- Statistically significant
- Meaningful: if you can't explain the presence of the interaction from
a solid theoretical point of view, then discard it
- Hierarchical rule: if an interaction term is present in a model, then
both main effects should also be present. Stepwise regression tends to
mess with this rule, BTW
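
Two of the checks in steps 3 and 4 are simple enough to sketch in code. This is an illustration only: the slopes below are hypothetical numbers standing in for what you would copy from two regression outputs, the variable names are made up, and writing interactions as "a*b" is just a convention chosen for this example.

```python
def percent_change(full, reduced):
    """Percent change in each slope shared by the full and reduced models."""
    return {name: 100.0 * (reduced[name] - full[name]) / full[name]
            for name in reduced if name in full and full[name] != 0}


def violates_hierarchy(terms):
    """Return interaction terms whose main effects are missing from the model."""
    present = set(terms)
    return [t for t in terms
            if "*" in t and not all(part in present for part in t.split("*"))]


# Hypothetical slopes from a model with and without sex_male.
full = {"age": 0.50, "mass_kg": 1.20, "sex_male": 0.80}
reduced = {"age": 0.58, "mass_kg": 1.22}  # sex_male removed

changes = percent_change(full, reduced)
# Step 3: if any remaining slope moves by more than 10%, put the variable back.
keep_removed_variable = any(abs(c) > 10 for c in changes.values())

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
print("keep sex_male in the model:", keep_removed_variable)

# Step 4: an interaction term requires both of its main effects.
print(violates_hierarchy(["age", "sex_male", "age*sex_male"]))
print(violates_hierarchy(["age", "age*sex_male"]))
```

The 10% threshold is the reference value mentioned in step 3, not a hard rule, so treat the flag as a prompt to look at the model rather than as a decision.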

Your final model should then be validated (using an independent dataset).
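
One simple way to do that validation is to apply the coefficients estimated on the development data, unchanged, to the independent dataset and see how much of the outcome variance they explain there. A minimal sketch, assuming hypothetical coefficients and a tiny made-up holdout sample (all names and numbers are for illustration only):

```python
def predict(coefs, row):
    """Apply fitted coefficients (name -> slope, plus 'intercept') to one case."""
    return coefs["intercept"] + sum(b * row[name] for name, b in coefs.items()
                                    if name != "intercept")


def validation_r2(coefs, rows, outcome):
    """1 - SSE/SST computed on data that were NOT used to fit the model."""
    observed = [row[outcome] for row in rows]
    predicted = [predict(coefs, row) for row in rows]
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical coefficients from the development sample.
coefs = {"intercept": 1.0, "mass_kg": 2.0, "sex_male": 0.5}

# Hypothetical independent (holdout) cases with the observed outcome "y".
holdout = [
    {"mass_kg": 1.0, "sex_male": 0, "y": 3.1},
    {"mass_kg": 2.0, "sex_male": 1, "y": 5.4},
    {"mass_kg": 3.0, "sex_male": 0, "y": 6.8},
    {"mass_kg": 4.0, "sex_male": 1, "y": 9.6},
]

r2_holdout = validation_r2(coefs, holdout, "y")
print(f"R^2 in the independent dataset: {r2_holdout:.3f}")
```

Computed this way, the validation R-squared can be much lower than the one from the development sample, and even negative if the model predicts worse than the holdout mean. A large drop is a sign of overfitting.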

Quoting Campbell (Statistics at Square Two, 2001): 'Do not forget that
models are simply an approximation to reality. "All models are wrong,
but some are useful" '

HTH,
Marta García-Granero



--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/
