Multiple Linear Regression vs a series of simple linear regressions in the presence of multicollinearity

Multiple Linear Regression vs a series of simple linear regressions in the presence of multicollinearity

E. Bernardo
I am conducting a multiple linear regression with 5 predictors; all variables are continuous and n=100.  Before the regression analysis, I first ran a simple correlation analysis and found that all the predictors have positive and significant correlations with the outcome variable.  Some predictors are highly correlated with each other.  Surprisingly, when I ran the multiple linear regression, two of the predictors had negative B coefficients, beta coefficients less than -1.0, VIFs greater than 10, an eigenvalue of zero, and condition indices greater than 30.  These are indications of a multicollinearity problem.
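For anyone who wants to reproduce diagnostics like these outside SPSS, here is a minimal Python sketch. The simulated predictors are hypothetical stand-ins for the data described above, but the VIF and condition-index calculations follow the usual Belsley-style definitions (condition indices computed, as I believe SPSS does, from the unit-scaled cross-products matrix including the constant):

    # Minimal sketch: SPSS-style collinearity diagnostics with numpy.
    # X is a hypothetical (100 x 5) matrix of highly correlated predictors.
    import numpy as np

    rng = np.random.default_rng(42)
    z = rng.normal(size=(100, 1))
    X = z + 0.1 * rng.normal(size=(100, 5))   # five near-duplicate predictors

    # VIF for predictor j: 1 / (1 - R^2) from regressing x_j on the others
    def vif(X, j):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        r2 = 1 - resid.var() / X[:, j].var()
        return 1 / (1 - r2)

    print("VIFs:", [round(vif(X, j), 1) for j in range(X.shape[1])])

    # Eigenvalues and condition indices of the scaled cross-products matrix
    # (columns, including the constant, scaled to unit length)
    Xc = np.column_stack([np.ones(len(X)), X])
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    eig = np.linalg.eigvalsh(Xs.T @ Xs)[::-1]          # descending order
    print("eigenvalues:", np.round(eig, 4))
    print("condition indices:", np.round(np.sqrt(eig[0] / eig), 1))

With predictors this strongly correlated, the sketch produces VIFs far above 10, eigenvalues near zero, and condition indices well above 30 — the same warning signs reported above.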

Is it a reasonable alternative to run simple linear regressions, one predictor at a time, instead of a multiple regression? If this alternative is wrong, what makes it wrong? What information would be lost by running a series of simple regressions rather than a multiple regression?

Thank you.
Eins


Re: Multiple Linear Regression vs a series of simple linear regressions in the presence of multicollinearity

Bruce Weaver
Administrator
eins wrote
Is it a reasonable alternative to run simple linear regressions, one predictor at a time, instead of a multiple regression? [...]
The negative coefficients for a couple of variables suggest that you have one or more "suppressor variables".  If you Google that term, you should find lots of hits, including some notes by textbook author David Howell.
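A tiny simulation (hypothetical, made-up data) shows how a suppressor pattern produces exactly what eins describes: a predictor with a clearly positive simple correlation with the outcome, yet a negative coefficient in the multiple regression:

    # Sketch of a suppressor effect: x2 correlates positively with y,
    # yet its multiple-regression coefficient is genuinely negative.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 100
    z  = rng.normal(size=n)
    x1 = z
    x2 = 0.9 * z + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr(x1,x2) ~ 0.9
    y  = 2.0 * x1 - 1.0 * x2 + 0.5 * rng.normal(size=n)      # true B for x2 = -1

    print("simple corr(x2, y):", round(np.corrcoef(x2, y)[0, 1], 2))  # positive

    A = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    print("multiple-regression B (const, x1, x2):", np.round(b, 2))   # x2 near -1
    # The simple correlation is positive only because x2 rides along with x1;
    # once x1 is held constant, x2's own negative effect shows through.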

Regarding your second question, if you run 5 simple linear regressions, you'll have no control for confounding.  The fact that you were running a multiple regression model in the first place suggests that this is not what you want.  If the excessive multicollinearity is due to one variable, I would try just removing it.
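As a sketch of what "no control for confounding" costs you, here is another small made-up example in which a series of simple regressions gets the story wrong and the multiple regression gets it right:

    # With correlated predictors, each simple regression absorbs its
    # neighbours' effects; only the joint model separates them.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 100
    z  = rng.normal(size=n)
    x1 = z + 0.3 * rng.normal(size=n)
    x2 = z + 0.3 * rng.normal(size=n)              # x1 and x2 share the factor z
    y  = 1.0 * x1 + 0.0 * x2 + rng.normal(size=n)  # only x1 truly matters

    def slope(x, y):  # simple-regression slope of y on x
        return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

    print("simple slope for x1:", round(slope(x1, y), 2))  # close to 1
    print("simple slope for x2:", round(slope(x2, y), 2))  # spuriously positive
    A = np.column_stack([np.ones(n), x1, x2])
    print("multiple B:", np.round(np.linalg.lstsq(A, y, rcond=None)[0], 2))
    # The simple regression on x2 "finds" an effect that really belongs to x1.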

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: Multiple Linear Regression vs a series of simple linear regressions in the presence of multicollinearity

Hector Maletta
In addition to Bruce's comment:
1. In multiple regression, each coefficient tells you by how much the DV
changes for a unit change in one IV, holding the other IVs constant. Since
the IVs are inter-correlated, it is no surprise that once you hold 99 of them
constant, an increase in the 100th may actually decrease the DV.
2. Having N=100 limits the number of IVs you can use. The old rule of thumb
is that you should never attempt anything with fewer than 10 cases per
variable. You are above that threshold (5 predictors with 100 cases = 20
cases per predictor), but even that threshold is far too low: ten (or 20)
cases per variable leave you with large margins of error. Linear regression
assumes that errors are normally distributed, but Monte Carlo sampling
experiments suggest that errors are likely to be non-normally distributed
when the sample size is below 30-50 cases per variable. This would imply
that you cannot use more than 2-3 independent variables with 100 cases. Of
course, the significance of the final results would also depend on the
coefficient of variation of each variable (SD/mean), their inter-correlations
and other things, but those figures suggest you had better get a larger
sample if you are attempting such a regression exercise. (A small simulation
below illustrates the point about margins of error.)
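Here is the Monte Carlo sketch referred to above. The setup is hypothetical (it is not taken from the studies alluded to); it simply shows how the sampling spread of a single coefficient shrinks as cases per predictor grow:

    # Illustrative only: sampling spread of one OLS coefficient as a
    # function of cases per predictor (5 predictors, as in the question).
    import numpy as np

    rng = np.random.default_rng(0)
    k = 5
    for n in (50, 100, 250, 1000):
        bhats = []
        for _ in range(2000):
            X = rng.normal(size=(n, k))
            y = X[:, 0] + rng.normal(size=n)   # true coefficients: (1, 0, 0, 0, 0)
            A = np.column_stack([np.ones(n), X])
            bhats.append(np.linalg.lstsq(A, y, rcond=None)[0][1])
        print(f"n={n:4d} ({n // k:3d} cases/predictor): SD of b1 = {np.std(bhats):.3f}")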

Hector

Re: Multiple Linear Regression vs a series of simple linear regressions in the presence of multicollinearity

Whanger, J. Mr. CTR
Hector,

Is there any chance you have a citation for the Monte Carlo experiments
you mentioned?

Thanks,

Jim

Re: Multiple Linear Regression vs a series of simple linear regressions in the presence of multicollinearity

Hector Maletta
Not off the top of my head: I remember the piece of information but not the
precise source. Unfortunately I am now travelling and have little chance to
look in the books I suspect hold the answer. However, it is fairly common
knowledge that at least 30-50 cases are needed for a normal distribution to
take shape.
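A quick simulation of the usual textbook demonstration (illustrative only; these are not the Monte Carlo experiments being recalled): sample means from a strongly skewed population lose most of their skewness by around n = 30-50.

    # Sampling distribution of the mean from a skewed (exponential) population.
    import numpy as np

    rng = np.random.default_rng(0)

    def skewness(a):
        d = a - a.mean()
        return (d**3).mean() / (d**2).mean() ** 1.5

    for n in (5, 15, 30, 50, 100):
        means = rng.exponential(size=(5000, n)).mean(axis=1)
        print(f"n={n:3d}: skewness of the sample means = {skewness(means):+.2f}")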
Hector
Re: Multiple Linear Regression vs a series of simple linear regressions in the presence of multicollinearity

Bruce Weaver
Administrator
Hector Maletta wrote
[...] it is fairly common knowledge that at least 30-50 cases are needed for a normal distribution to take shape.
Here is some advice from Dave Howell's book, Statistical Methods for Psychology (4th Ed).

   www.angelfire.com/wv/bwhomedir/notes/linreg_rule_of_thumb.txt
