Multicollinearity confusion

8 messages

Multicollinearity confusion

jimjohn
I'm a little confused. Multicollinearity is a problem that can affect our regression results when the independent variables are correlated with each other. But I often see regression models like this:

y = B0 + B1*Factor1 + B2*Factor1^2

Wouldn't Factor1 and Factor1^2 be highly correlated, resulting in a big collinearity problem? Any ideas why it's OK here? Thanks.
Re: Multicollinearity confusion

mpirritano
I believe it is the combination of the linear and squared terms that together give you the curvilinear effect of the variable. You are neither interested in nor able to look at the linear effect alone when the quadratic is in the equation; you can only evaluate the squared effect.
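
As a rough illustration (a minimal sketch in Python with made-up data, not anything from your model): once the squared term is in the equation, the effect of the predictor at any given value depends on both coefficients, since the slope of the fitted curve is B1 + 2*B2*x.

import numpy as np

# Hypothetical curvilinear data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2 + 1.5 * x - 0.12 * x**2 + rng.normal(0, 1, x.size)

# Fit y = b0 + b1*x + b2*x^2 by ordinary least squares
X = np.column_stack([np.ones_like(x), x, x**2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Neither b1 nor b2 is interpretable alone: the slope of the fitted
# curve at any point is b1 + 2*b2*x, i.e. it changes with x.
for xv in (1, 5, 9):
    print(f"slope at x={xv}: {b1 + 2 * b2 * xv:.3f}")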

matt


Matthew Pirritano, Ph.D.
Research Analyst IV
County of Orange
Medical Services Initiative (MSI)
[hidden email]
(714) 834-6011


Re: Multicollinearity confusion

Hector Maletta
In reply to this post by jimjohn
The problem, Jim John, arises not exactly when the independent variables are correlated, but when they are (1) linearly correlated and (2) the correlation is nearly 1. Between a variable and its square there is no exact linear relationship, although over a small range of variation the relationship can be approximately linear. The real problem, to be more precise, is that no independent variable may be a perfect linear function of the other independent variables. Imagine, for instance, having one variable called TODAY, another called DATEOFBIRTH, and a third called AGETODAY: one of them is redundant.
In that hypothetical case the covariance matrix would be singular (i.e., it would have a zero determinant). Since computing the regression coefficients involves dividing by that determinant, it would mean dividing by zero, and no solution would exist. When the determinant is NEARLY zero, say 0.000000001, a small change in any of the variables can cause large changes in the estimated coefficients, leading to unstable solutions.
Moderate (or even relatively high) correlations among independent variables do not have this effect and can be tolerated. The TOLERANCE criterion in the REGRESSION command (available in the STEPWISE method, for instance) is used to decide whether or not to accept a new variable into the equation: it sets the minimum value required for the determinant, below which a new variable is not included because it would cause practical multicollinearity, i.e., a very unstable solution.
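
A rough numerical sketch of that instability (in Python with simulated data, not SPSS syntax; the variables are made up):

import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(0, 1, n)
y = 3 * x1 + rng.normal(0, 1, n)

def fit(x2):
    # OLS of y on a constant, x1 and x2; returns the three coefficients
    X = np.column_stack([np.ones(n), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# x2 is an almost perfect linear function of x1
x2 = x1 + rng.normal(0, 0.001, n)
print(np.linalg.det(np.corrcoef(x1, x2)))   # determinant of the predictor
                                            # correlation matrix: ~0.000001

# Tiny changes in the data give wildly different coefficients for x1 and x2,
# even though the fitted values barely change.
print(fit(x2))
print(fit(x1 + rng.normal(0, 0.001, n)))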
Hope this clarifies the issue.

Hector

Re: Multicollinearity confusion

Swank, Paul R
Actually, the correlation between x and x squared is around .97 for values of x between 0 and 10. This gets worse as you add x cubed and higher powers. That is why we often suggest centering such variables before raising them to powers for use in these analyses.
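
A quick check of both the problem and the fix (a minimal sketch in Python; in SPSS the centering itself is just a couple of COMPUTE statements before running REGRESSION):

import numpy as np

x = np.arange(0, 11, dtype=float)      # values of x between 0 and 10
print(np.corrcoef(x, x**2)[0, 1])      # roughly .96-.97: x and x^2 nearly collinear

xc = x - x.mean()                      # center x before squaring
print(np.corrcoef(xc, xc**2)[0, 1])    # essentially 0 for this symmetric range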

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston



Re: Multicollinearity confusion

Johnny Amora
In reply to this post by mpirritano
Matt,
 
Can you recommend a reference on the interpretation of nonlinear effects, particularly quadratic and cubic?

Thanks.


Re: Multicollinearity confusion

mpirritano
I've found the Sage publication "Interaction Effects in Multiple Regression" (Jaccard & Turrisi, 2003) helpful for understanding how to interpret both linear interactions and interactions involving a variable raised to a power.

A quick perusal of Tabachnick and Fidell points to the following source for more on nonlinear effects in regression: Aiken & West (1991), Multiple Regression: Testing and Interpreting Interactions.

Matt

Matthew Pirritano, Ph.D.
Research Analyst IV
County of Orange
Medical Services Initiative (MSI)
[hidden email]
(714) 834-6011



Re: Multicollinearity confusion

jimjohn
Thanks for all the answers and advice, guys! Just one more question: I have a model with the following three variables: VIRM, VIRM^2, and BARate. On its own, VIRM has an inverse effect on my dependent variable, and so does VIRM^2. But when all these variables are in the model together, VIRM^2 gets a positive coefficient. Is this OK, or does it indicate some kind of problem with the model? Thanks!






Re: Multicollinearity confusion

Hector Maletta
Jim John,
I do not think it represents a problem. The negative effect of VIRM^2 on its own probably just reflects the negative effect of VIRM itself. Once the linear effect of VIRM is taken into account, the quadratic effect turns out to be positive (perhaps because over the relevant range of VIRM the relationship curves upward, or simply because the non-linear component attenuates the linear effect).
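
A made-up illustration of that sign flip (a minimal sketch in Python; the simulated x only stands in for VIRM):

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 500)                       # stands in for VIRM
y = 10 - 3 * x + 0.1 * x**2 + rng.normal(0, 1, 500)

ones = np.ones_like(x)

# Alone, x^2 mostly picks up the overall downward trend: negative coefficient
b_alone = np.linalg.lstsq(np.column_stack([ones, x**2]), y, rcond=None)[0]
print("x^2 alone:", b_alone[1])                   # negative

# With the linear term absorbing that trend, the quadratic term is positive
b_joint = np.linalg.lstsq(np.column_stack([ones, x, x**2]), y, rcond=None)[0]
print("x^2 with x in the model:", b_joint[2])     # close to +0.1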
Hector
