Multicollinearity

Multicollinearity

jimjohn
Many of my independent variables correlate highly with each other. Since I am building a multiple linear regression model with many of these variables, and I am only analyzing the grouped effect all the variables have on the dependent variable (rather than any individual effect one independent variable may have), I don't need to worry about multicollinearity. Can someone please confirm this? From what I recall, multicollinearity is only an issue when you are trying to analyze each independent variable's individual effect on the dependent variable. Thanks.

Re: Multicollinearity

Ornelas, Fermin-2
I suggest you do a search on the issue.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of jimjohn
Sent: Monday, June 30, 2008 8:30 AM
To: [hidden email]
Subject: Multicollinearity

--
View this message in context: http://www.nabble.com/Multicollinearity-tp18197967p18197967.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

NOTICE: This e-mail (and any attachments) may contain PRIVILEGED OR CONFIDENTIAL information and is intended only for the use of the specific individual(s) to whom it is addressed.  It may contain information that is privileged and confidential under state and federal law.  This information may be used or disclosed only in accordance with law, and you may be subject to penalties under law for improper use or further disclosure of the information in this e-mail and its attachments. If you have received this e-mail in error, please immediately notify the person named above by reply e-mail, and then delete the original e-mail.  Thank you.

Reply | Threaded
Open this post in threaded view
|

Re: Multicollinearity

jimjohn
Thanks, Fermin. Are there any specific links you recommend that might give me this answer? It hasn't come up in any of my searches yet.






Re: Multicollinearity

SR Millis-3
Two excellent books on the topic:

Regression Diagnostics by John Fox

Regression Diagnostics by D Belsley, E Kuh, & R Welsch


Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682



Re: Multicollinearity

Ornelas, Fermin-2
In reply to this post by jimjohn
I have written more than once on this issue, so you should be able to find it in the list archive. That said, anyone building a regression model is going to face it. The empirical symptoms of the problem are generally: low t-statistics for the parameter estimates, incorrect signs, and instability of the parameter values and their signs as variables are removed from the model. These conditions render hypothesis testing questionable. The empirical question becomes: what is a reasonable degree of collinearity in the model? To me, if no more than 3 variables (in a model of, say, 10) have variance proportions above .5 in the collinearity diagnostics, and the condition index is less than 30, then the model passes this test. VIF < 10 is also a reasonable criterion. One also has to consider the purpose of the model: if prediction is the primary purpose, then having collinear variables is not likely to hinder the model's predictive capability, but making inferences is a different story.

Extreme cases of collinearity will produce a warning in most packages. SAS, for example, will tell you that parameter estimates could not be computed for variables with a linear dependence, i.e., variables that are perfectly collinear.

Hope this short explanation helps. Most texts have a section devoted to this problem and suggest remedies for it (collect more data, center the data, use ridge regression, etc.).
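For readers who want to check these rules of thumb numerically, both diagnostics can be computed from the predictor matrix alone. The sketch below is in Python with NumPy, which is purely an assumption for illustration; in SPSS itself the same diagnostics come from REGRESSION with /STATISTICS COLLIN TOL.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return per-variable VIFs and the condition index of predictor matrix X."""
    # Standardize columns so differences in scale don't drive the diagnostics.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # VIF_j is the j-th diagonal element of the inverse correlation matrix.
    R = np.corrcoef(Z, rowvar=False)
    vifs = np.diag(np.linalg.inv(R))
    # Condition index: ratio of the largest to the smallest singular value.
    s = np.linalg.svd(Z, compute_uv=False)
    return vifs, s.max() / s.min()

# Hypothetical data: x2 is a noisy copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.03 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs, cond_index = collinearity_diagnostics(np.column_stack([x1, x2, x3]))
# x1 and x2 blow past VIF = 10, and the condition index exceeds 30,
# while x3 stays near VIF = 1.
```

By the cutoffs in the message above, this toy model fails the test, and x1 or x2 would be a candidate to drop or combine.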


Re: Multicollinearity

satyanarayana are
In reply to this post by jimjohn
When there are no linear relationships among the independent variables, each regression coefficient can be interpreted as the change in the dependent variable for a one-unit increase in that independent variable, with all the others held constant. When linear relationships among the independent variables are present, that is the problem of multicollinearity, and the regression results become ambiguous. One way to deal with multicollinearity is to use principal components of the independent variables in the model. When all of the principal components are used, the regression estimates and SEs are identical to those of the least squares technique; the reduction in multicollinearity comes from using a smaller set of principal components, obtained by sequentially dropping the components with near-zero variances. Although dropping principal components introduces some bias, the estimates tend to become more precise and stable. So please try principal components analysis.
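A minimal version of that procedure (principal components regression) might look like the following. Python/NumPy is an assumption here, purely for illustration; in SPSS the component scores could be saved from FACTOR and fed to REGRESSION.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal components regression: regress y on the first k PCs of X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by variance explained;
    # the near-zero-variance (collinear) directions come last.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                        # first k component scores
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # intercept + k slopes
    return coef, Vt[:k]

# Hypothetical near-collinear pair of predictors.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=100)
coef, components = pcr_fit(X, y, k=1)  # drop the near-zero-variance component
```

Keeping all p components reproduces the ordinary least squares fit exactly; the stability gain comes only from dropping the near-zero-variance components, at the cost of some bias, as the message says.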




Re: Multicollinearity

Matthew Reeder
In reply to this post by Ornelas, Fermin-2
Right on. This is a very common topic (and not remotely specific to SPSS). At the very least, there should be a good deal of information freely available on the Internet. There are also countless articles and books on the topic, so there is no lack of material. Best of luck.


Re: Multicollinearity

jimjohn
In reply to this post by Ornelas, Fermin-2
Thanks so much, guys! I will definitely be checking out those books. Just one follow-up: you say (and I did read this too) that if my only goal is prediction, multicollinearity is not likely to cause problems. But when I add a variable that is highly correlated with one or two other variables, my adjusted R^2 increases, yet at the same time I notice big changes in my coefficients (even though the coefficient of the newly added variable is only 0.003). Wouldn't it be risky to make predictions using an equation whose coefficients change in such a fashion? Thanks.



Re: Multicollinearity

Ornelas, Fermin-2
Even when the purpose of the model is prediction, one still needs to be concerned with this problem and should try to minimize it where possible. In my own experience building predictive models for the credit card industry, I found that as long as the collinear relationship between the variables did not change over time, the model was able to predict reasonably well. If you have variables that keep changing sign and whose coefficients are not significant at all, you may want to drop the variable causing the most trouble. There is a side issue to consider: in social research one must often keep a certain variable in order to satisfy a project objective, and that is a judgment call to weigh when deciding to keep a variable you know will cause problems. These are some of the typical caveats of model building; getting familiar with your data and the research issue at hand will help you tackle these modeling problems.
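The point that predictions can survive collinearity even while individual coefficients do not is easy to demonstrate. Below is a minimal sketch in Python with NumPy; the data and variable names are hypothetical, not from the thread.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.02 * rng.normal(size=n)  # near-duplicate of x1
y = 2.0 * x1 + rng.normal(size=n)    # true model uses x1 only

def ols(X, y):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

coef_one = ols(x1[:, None], y)                # y ~ x1
coef_two = ols(np.column_stack([x1, x2]), y)  # y ~ x1 + x2 (collinear)

# The individual slopes in the two-variable fit are poorly identified and can
# swing wildly, but their sum and the fitted values remain stable.
fit_one = coef_one[0] + coef_one[1] * x1
fit_two = coef_two[0] + coef_two[1] * x1 + coef_two[2] * x2
```

This mirrors the experience described above: as long as the collinear relationship holds in the data you score, the predictions are fine even though the individual coefficients are not interpretable.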

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of jimjohn
Sent: Monday, June 30, 2008 12:52 PM
To: [hidden email]
Subject: Re: Multicollinearity

Thanks so much guys! Will definitely be checkiing out those books. Just one
follow up, so you say (and I did read this too) that if my only goal is
prediction, multicollinearity is not likely to cause problems. but when i
add a variable that is highly correlated with one or two other variables, my
Adjusted R^2 increases but at the same time, i notice big changes in my
coefficients (even though the coefficient of my new added variable is only
0.003). wouldn't it be risky to make predictions using an equation who's
coefficients change in such a fashion? thanks.



Ornelas, Fermin-2 wrote:

>
> I myself have written more than once on this issue. You should have been
> able to find it in the list, having said that anyone that is building a
> regression model is going to face this issue. The empirical
> characteristics of this problem generally are: low t-statistics for the
> parameter estimates, incorrect signs, instability of the parameter values
> and their sign direction as variables are removed from the models. These
> conditions will render hypotheses testing questionable. The empirical
> question becomes what would be a reasonable degree of collinearity in the
> model? To me, if my variance proportions from the diagnostics are less
> than .5 for no more than 3 variables say for a model of 10 variables and
> the condition index is less than 30, then the model passes this test. Also
> the VIF < 10 is a reasonable measure. One also has to be concerned with
> the purpose of the model, if prediction is the primary purpose of the
> model then having collinear variable is not likely to hinder!
>   the model's prediction capability, but to make inferences that is a
> different story.
>
> Extreme cases of collinearity will produce a warning error in most
> packages. SAS will tell you that parameters estimates could not be
> provided for the variables having linear dependence, a.k.a. being
> collinear.
>
> Hope this short explanation helps. But most texts have a special section
> for this problem and suggest some solutions to it (collect more data,
> center the data, use ridge regression, etc).
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Monday, June 30, 2008 9:30 AM
> To: Ornelas, Fermin
> Cc: [hidden email]
> Subject: RE: Multicollinearity
>
> Thanks Fermin, any specific links you recommend that might give me
> this answer? it didnt come up in any of my searches yet.
>
>
>
>
>
> Quoting "Ornelas, Fermin" <[hidden email]>:
>
>> I suggest you do a search on the issue.
>>
>> -----Original Message-----
>> From: SPSSX(r) Discussion [mailto:[hidden email]] On
>> Behalf Of jimjohn
>> Sent: Monday, June 30, 2008 8:30 AM
>> To: [hidden email]
>> Subject: Multicollinearity
>>
>> many of my independent variables correlate highly with each other. since
>> i am
>> building a multiple linear regression model with many of these variables
>> and
>> I am only analyzing the grouped effect all the variables have on the
>> dependent variable (instead of any individual effect one independent
>> variable may have), then I don't need to worry about multicollinearity.
>> Can
>> someone plz confirm this? From what I recall, multicollinearity is only
>> an
>> issue when you are trying to analyze each independent variable's
>> individual
>> effect on the dependent variable. thx.
>> --
>> View this message in context:
>> http://www.nabble.com/Multicollinearity-tp18197967p18197967.html
>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>

Re: Multicollinearity

Juanito Talili
Could we not first run an exploratory factor analysis on the p independent variables (X1, X2, ..., Xp), then use the factor solutions as predictors in the regression model?  That is, suppose F1, F2, ..., Fk are the factor solutions of the p independent variables (where k<p); then F1, F2, ..., Fk would be the independent variables in predicting the dependent variable Y. Is this statistically correct?
 
Juanito 
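For what it's worth, the approach can be sketched in a few lines outside SPSS. This is only a minimal illustration, using principal components rather than rotated factor scores, with simulated data and an arbitrary choice of k:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (for illustration only): five collinear predictors
# driven by two underlying dimensions, and an outcome that depends on
# those dimensions.
n, p, k = 200, 5, 2
base = rng.normal(size=(n, 2))
X = base @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = base[:, 0] - base[:, 1] + rng.normal(scale=0.5, size=n)

# Step 1: standardize X and extract the first k principal components.
Z = (X - X.mean(0)) / X.std(0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
F = Z @ Vt[:k].T                      # component scores, n x k

# Step 2: regress y on the component scores.  The scores are mutually
# uncorrelated, so collinearity among the original X's is no longer an
# issue at this stage.
A = np.column_stack([np.ones(n), F])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
r2 = 1 - resid.var() / y.var()
print(round(r2, 3))
```

One caveat: the components are chosen to explain variance in the X's, not their relationship to Y, so a well-predicting component solution is not guaranteed.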
 

--- On Mon, 6/30/08, Ornelas, Fermin <[hidden email]> wrote:

From: Ornelas, Fermin <[hidden email]>
Subject: Re: Multicollinearity
To: [hidden email]
Date: Monday, June 30, 2008, 8:06 PM

Even when the purpose of the model is prediction, one still needs to be concerned
with this problem and should try to minimize it if possible. In my own
experience building predictive models for the credit card industry, I found
that as long as the collinear relationship between the variables did not change
over time, the model was able to predict reasonably well. If you have variables
that keep changing sign and whose coefficients are not significant at all, you
may want to drop the variable that is causing the most trouble. There is a
side issue to consider: in social research one must often keep a certain
variable in order to satisfy a project objective, and that is a judgment call
to weigh when deciding to keep a variable knowing it will cause problems.
These are some of the typical caveats of model building; getting familiar with
your data and the research issue at hand will help you tackle these modeling
problems.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
jimjohn
Sent: Monday, June 30, 2008 12:52 PM
To: [hidden email]
Subject: Re: Multicollinearity

Thanks so much guys! Will definitely be checking out those books. Just one
follow-up: you say (and I did read this too) that if my only goal is
prediction, multicollinearity is not likely to cause problems. But when I
add a variable that is highly correlated with one or two other variables, my
adjusted R^2 increases, yet at the same time I notice big changes in my
coefficients (even though the coefficient of the newly added variable is only
0.003). Wouldn't it be risky to make predictions using an equation whose
coefficients change in such a fashion? Thanks.



Ornelas, Fermin-2 wrote:

>
> I myself have written more than once on this issue. You should have been
> able to find it in the list. Having said that, anyone who is building a
> regression model is going to face this issue. The empirical
> characteristics of this problem generally are: low t-statistics for the
> parameter estimates, incorrect signs, and instability of the parameter
> values and their sign direction as variables are removed from the model.
> These conditions render hypothesis tests questionable. The empirical
> question becomes: what is a reasonable degree of collinearity in the
> model? To me, if the variance proportions from the diagnostics are less
> than .5 for no more than 3 variables, say, in a model of 10 variables, and
> the condition index is less than 30, then the model passes this test.
> VIF < 10 is also a reasonable measure. One also has to consider the
> purpose of the model: if prediction is the primary purpose, having
> collinear variables is not likely to hinder the model's prediction
> capability, but making inferences is a different story.
>
> Extreme cases of collinearity will produce a warning in most packages.
> SAS will tell you that parameter estimates could not be provided for the
> variables having linear dependence, a.k.a. being collinear.
>
> Hope this short explanation helps. Most texts have a special section on
> this problem and suggest some solutions to it (collect more data, center
> the data, use ridge regression, etc.).
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Monday, June 30, 2008 9:30 AM
> To: Ornelas, Fermin
> Cc: [hidden email]
> Subject: RE: Multicollinearity
>
> Thanks Fermin, any specific links you recommend that might give me
> this answer? It didn't come up in any of my searches yet.

Re: Multicollinearity

ViAnn Beadle
How would one use this technique to predict an individual observation?
Scoring a regression equation directly is straightforward; what is required
to do it using the results of the intermediate principal components?
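One way to see it, sketched below in numpy on made-up data (my own illustration of the algebra, not SPSS output): because the component scores are linear in the standardized X's, the component regression can be folded back into a single equation in the original variables, which scores an individual observation directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: fit y on the first k principal components, then fold
# the component regression back into one equation in the original
# variables so a single new case can be scored directly.
n, p, k = 150, 4, 2
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]                        # induce some collinearity
y = X @ np.array([1.0, 0.5, -1.0, 0.0]) + rng.normal(size=n)

mu, sd = X.mean(0), X.std(0)
Z = (X - mu) / sd
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
F = Z @ Vt[:k].T                          # component scores

A = np.column_stack([np.ones(n), F])
g, *_ = np.linalg.lstsq(A, y, rcond=None) # intercept + k component slopes

# Back-transform to coefficients on the original variables.
beta = (Vt[:k].T @ g[1:]) / sd
intercept = g[0] - mu @ beta

x_new = rng.normal(size=p)                # a single new observation
pred_direct = intercept + x_new @ beta
pred_via_pc = g[0] + ((x_new - mu) / sd) @ Vt[:k].T @ g[1:]
print(bool(np.isclose(pred_direct, pred_via_pc)))
```

The two predictions agree because the new case's scores are ((x_new - mu)/sd) V_k, so the component coefficients map back to original-variable coefficients.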

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Juanito Talili
Sent: Monday, June 30, 2008 6:45 PM
To: [hidden email]
Subject: Re: Multicollinearity

Can we not do first exploratory factor analysis in the p independent
variables (X1, X2, ...,Xp), then use the factor solutions as predictors in
the regression model?  That is,  suppose F1, F2, ...Fk are the factor
solutions of the p independent variables (where k<p), then the F1, F2,...Fk
would be the independent variables in predicting the dependent variable Y. Is
this statistically correct?

Juanito



Clementine and distance functions for clustering

Greg James-2
Hello,

I am researching dynamic time warping as a distance measure within k-means
clustering (a substitution for Euclidean distance). I would like to know if
there is any way to insert a custom distance function into Clementine's
K-means modeling node. I am already familiar with the R package "flexclust."
I am hoping there is an expedient way to do this in Clementine.

Thank you,

-Greg James
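For reference, the distance itself is small enough to prototype outside Clementine while waiting for an answer on the modeling node. Below is a plain textbook DTW (squared local cost, no window constraint), unrelated to Clementine or flexclust internals:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D series,
    with a squared-difference local cost (textbook recurrence)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

# Two series identical up to a time shift: Euclidean distance is large,
# DTW distance is small, which is the point of using it for clustering.
t = np.linspace(0, 2 * np.pi, 50)
s1, s2 = np.sin(t), np.sin(t - 0.5)
print(dtw(s1, s2) < np.sqrt(((s1 - s2) ** 2).sum()))
```

Whether this can be pushed into a Clementine K-means node I can't say; the snippet only makes it easy to experiment with DTW-based assignments outside the tool.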


Re: Multicollinearity

jimjohn
In reply to this post by Ornelas, Fermin-2
Thanks so much! I see that SPSS has collinearity diagnostics:
Tolerance and VIF. Can anyone recommend, in general, what values of
tolerance and VIF should indicate a multicollinearity problem?
I am seeing many different answers in different lectures/books:
some say tolerance < .1 or VIF > 10 indicates collinearity;
others say tolerance < .2 and VIF > 4; and I've also seen
tolerance < .4. Any ideas? thx.
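For intuition about what the cutoffs measure: VIF_j is 1/(1 - R^2) from regressing predictor j on all the other predictors, and tolerance is its reciprocal. A from-scratch sketch in numpy with made-up data (one near-duplicate pair, one clean predictor):

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up predictors: x2 is nearly a copy of x1; x3 is independent.
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the rest."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - (y - A @ beta).var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

v = vif(X)
tolerance = 1.0 / v
print(np.round(v, 1))   # x1 and x2 get very large VIFs; x3 stays near 1
```

Whichever cutoff you adopt (4 or 10), the two scales say the same thing, since tolerance is just 1/VIF; the disagreement in the literature is only over where to draw the line.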





Re: Multicollinearity

jimjohn
In reply to this post by jimjohn
Just to follow up on this multicollinearity issue, I'm a little confused. I just noticed a case where the coefficient for one of my IVs was positive when it was supposed to be negatively related to the DV. I understand this to mean there is multicollinearity. However, in this same example my VIFs were 1.236 and 1.413, and my tolerances were .809 and .708. None of these VIF and tolerance levels is anywhere near the cutoffs I should be seeing for collinearity. What can I conclude from this, that there can be cases where VIF and tolerance fail to spot multicollinearity that actually does exist? thanks!
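A small made-up illustration (numpy, not your data) of a related point: an "unexpected" sign does not by itself imply multicollinearity. Here x2 is only modestly correlated with x1 (VIF well under 2), is positively correlated with y on its own, and still gets a negative regression coefficient, simply because its true partial effect is negative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

# Made-up data: x1 and x2 only modestly correlated, and y built so
# that x2's true coefficient is negative while its marginal
# correlation with y is positive.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + 0.5 * rng.normal(size=n)

A = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

r12 = np.corrcoef(x1, x2)[0, 1]
vif = 1.0 / (1.0 - r12 ** 2)

# marginal correlation positive, fitted coefficient negative, VIF small
print(round(np.corrcoef(x2, y)[0, 1], 2),
      round(beta[2], 2), round(vif, 2))
```

So a sign you didn't expect can reflect a genuine partial effect (or an omitted variable) rather than collinearity; the condition-index diagnostics discussed in the replies are the better first check for collinearity itself.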


Re: Multicollinearity

SR Millis-3
In reply to this post by jimjohn
VIF (and tolerance) have limitations: the inability to distinguish among several coexisting near-dependencies and the lack of a meaningful guideline to differentiate high VIF from low.

To diagnose collinearity, it is much better to first use the condition indexes: pick out those that are large, say >20 or >30. For those large condition indexes, see if there are large variance-decomposition proportions (> .50) associated with each high condition index: this identifies those variables that have high collinearity.
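The recipe above (as in Belsley-style diagnostics) can be sketched from scratch: scale the design columns to unit length, take the SVD, form condition indexes from the singular values, and decompose each coefficient's variance across the singular directions. A numpy sketch on made-up data with one deliberate near-dependency:

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up design with one near-dependency: x3 is almost x1 + x2.
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

# Scale columns to unit length, then take the SVD.
Xs = X / np.linalg.norm(X, axis=0)
_, s, Vt = np.linalg.svd(Xs, full_matrices=False)

cond_index = s[0] / s                          # one per singular value
phi = (Vt.T ** 2) / s ** 2                     # rows = coefficients
props = phi / phi.sum(axis=1, keepdims=True)   # variance-decomposition proportions

worst = np.argmax(cond_index)                  # the offending dimension
flagged = props[:, worst] > 0.5                # coefficients tied to it
print(bool(cond_index[worst] > 30), int(flagged.sum()))
```

Here the largest condition index is far above 30, and the coefficients of x1, x2, and x3 all carry variance proportions above .5 on that dimension, which is exactly the pattern described above.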


Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682



Re: Multicollinearity

Johnny Amora
I agree with Dr. Scott's comments.  In addition to the condition index, the eigenvalues are also helpful.  An eigenvalue of 0 indicates perfect collinearity, and eigenvalues close to zero are associated with large condition indexes and large variance proportions.  As Dr. Scott said, this identifies the variables that have high collinearity.
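To connect the two quantities: the collinearity table reports eigenvalues of the scaled cross-product matrix X'X, and each condition index is sqrt(lambda_max / lambda_j), so the eigenvalue route and the singular-value route agree exactly (an eigenvalue of 0 corresponds to an infinite condition index). A quick check in Python/NumPy on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
Xs = X / np.linalg.norm(X, axis=0)       # scale columns to unit length

lam = np.linalg.eigvalsh(Xs.T @ Xs)      # eigenvalues of X'X, ascending
ci_eig = np.sqrt(lam.max() / lam)        # condition indexes via eigenvalues

s = np.linalg.svd(Xs, compute_uv=False)  # singular values, descending
ci_svd = s.max() / s                     # same indexes via the SVD (s**2 = lam)
```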

Johnny T. Amora   Statistician, Center for Learning and Performance Assessment
De La Salle-College of Saint Benilde
Manila, Philippines

--- On Fri, 7/4/08, SR Millis <[hidden email]> wrote:
From: SR Millis <[hidden email]>
Subject: Re: Multicollinearity
To: [hidden email]
Date: Friday, July 4, 2008, 5:48 AM

VIF (and tolerance) have limitations: the inability to distinguish among several
coexisting near-dependencies and the lack of a meaningful guideline to
differentiate high VIF from low.

To diagnose collinearity, it is much better to first use the condition indexes:
pick out those that are large, say >20 or >30. For those large condition
indexes, see if there are large variance-decomposition proportions (> .50)
associated with each high condition index: this identifies those variables that
have high collinearity.


Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682


--- On Wed, 7/2/08, [hidden email] <[hidden email]> wrote:

> From: [hidden email] <[hidden email]>
> Subject: Re: Multicollinearity
> To: [hidden email]
> Date: Wednesday, July 2, 2008, 10:16 AM
> Thanks so much! I see that SPSS has collinearity
> diagnostics:
> Tolerance and VIF. Can anyone recommend generally what
> values of
> tolerance and VIF should indicate there is a
> multicollinearity
> problem. I am seeing many different responses in different
> lectures/books. some say a tolerance < .1 or a VIF >
> 10 indicate
> collinearity. others say tolerance < .2 and VIF > 4).
> and then ive
> also seen tolerance < .4. Any ideas? thx.
>
>


Clementine users? (was: Clementine and distance functions for clustering)

Greg James-2
In reply to this post by ViAnn Beadle
Are there any Clementine users on this list? I have received no responses to
my question posted on 6/30.

I am using the Educational version of Clementine and SPSS provides no
support other than installation. They direct all students to this list.

Thanks,
-Greg


-----Original Message-----
From: Greg James [mailto:[hidden email]]
Sent: Monday, June 30, 2008 10:15 PM
To: '[hidden email]'
Subject: Clementine and distance functions for clustering

Hello,

I am researching dynamic time warping as a distance measure within k-means
clustering (as a substitute for Euclidean distance). I would like to know if
there is any way to insert a custom distance function into Clementine's
K-means modeling node. I am already familiar with the R package "flexclust."
I am hoping there is an expedient way to do this in Clementine.
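I can't speak to Clementine's K-Means node, but for anyone following along, the DTW distance itself is a short dynamic program. A minimal Python sketch (my own illustration, not Clementine or flexclust code), using absolute difference as the local cost:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # D[i, j]: cost of best warping path
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# a shifted step that Euclidean distance penalizes but DTW can warp away
print(dtw_distance([0, 0, 1], [0, 1, 1]))  # → 0.0
```

Note that because DTW is not a metric (it violates the triangle inequality), plugging it into a k-means-style algorithm also raises the question of how to compute centroids, which is part of why flexclust's kcca framework is the usual route in R.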

Thank you,

-Greg James


Re: Clementine users? (was: Clementine and distance functions for clustering)

Mark Palmberg
Have you tried the CLUG-L mailing list?

On Fri, Jul 4, 2008 at 10:53 AM, Greg James <[hidden email]> wrote:

> Are there any Clementine users on this list? I have received no responses
> to
> my question posted on 6/30.
>
> I am using the Educational version of Clementine and SPSS provides no
> support other than installation. They direct all students to this list.
>
> Thanks,
> -Greg
>
>
> -----Original Message-----
> From: Greg James [mailto:[hidden email]]
> Sent: Monday, June 30, 2008 10:15 PM
> To: '[hidden email]'
> Subject: Clementine and distance functions for clustering
>
> Hello,
>
> I am researching dynamic time warping as a distance measure within k-means
> clustering (a substitution for Euclidean distance). I would like to know if
> there is any way to insert a custom distance function into Clementine's
> K-means modeling node. I am already familiar with the R package "flexclust."
> I am hoping there is an expedient way to do this in Clementine.
>
> Thank you,
>
> -Greg James
>
>


Re: Multicollinearity

jimjohn
In reply to this post by SR Millis-3
Thanks Scott, this test does seem to work much better. Can I just ask
how to interpret these variance-decomposition proportions? Just
wondering what it means if an eigenvalue has a high variance
proportion for one variable? Or if a condition index is high. Or, is
there an article or book anyone recommends that would help me
understand this? Thanks!




Quoting SR Millis <[hidden email]>:

> VIF (and tolerance) have limitations: the inability to distinguish
> among several coexisting near-dependencies and the lack of a
> meaningful guideline to differentiate high VIF from low.
>
> To diagnose collinearity, it is much better to first use the
> condition indexes: pick out those that are large, say >20 or >30.
> For those large condition indexes, see if there are large
> variance-decomposition proportions (> .50) associated with each high
>  condition index: this identifies those variables that have high
> collinearity.
>
>
> Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
> Professor & Director of Research
> Dept of Physical Medicine & Rehabilitation
> Wayne State University School of Medicine
> 261 Mack Blvd
> Detroit, MI 48201
> Email:  [hidden email]
> Tel: 313-993-8085
> Fax: 313-966-7682
>
>
> --- On Wed, 7/2/08, [hidden email] <[hidden email]> wrote:
>
>> From: [hidden email] <[hidden email]>
>> Subject: Re: Multicollinearity
>> To: [hidden email]
>> Date: Wednesday, July 2, 2008, 10:16 AM
>> Thanks so much! I see that SPSS has collinearity
>> diagnostics:
>> Tolerance and VIF. Can anyone recommend generally what
>> values of
>> tolerance and VIF should indicate there is a
>> multicollinearity
>> problem. I am seeing many different responses in different
>> lectures/books. some say a tolerance < .1 or a VIF >
>> 10 indicate
>> collinearity. others say tolerance < .2 and VIF > 4).
>> and then ive
>> also seen tolerance < .4. Any ideas? thx.
>>
>>
>


Re: Multicollinearity

SR Millis-3
I'd recommend the book, "Regression Diagnostics," by Belsley et al:

http://www.amazon.com/Regression-Diagnostics-Identifying-Influential-Collinearity/dp/0471691178/ref=pd_bbs_sr_2?ie=UTF8&s=books&qid=1215261260&sr=8-2



Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682


--- On Fri, 7/4/08, [hidden email] <[hidden email]> wrote:

> From: [hidden email] <[hidden email]>
> Subject: Re: Multicollinearity
> To: "SR Millis" <[hidden email]>
> Cc: "SPSS" <[hidden email]>
> Date: Friday, July 4, 2008, 4:33 PM
> Thanks Scott, this test does seem to work much better. Can I
> just ask
> how to interpret these variance-decomposition proportions?
> Just
> wondering what it means if an eigenvalue has a high
> variance
> proportion for one variable? Or if a condition index is
> high. Or, is
> there an article or book anyone recommends that would help
> me
> understand this? Thanks!
>
>
>
>
> Quoting SR Millis <[hidden email]>:
>
> > VIF (and tolerance) have limitations: the inability to
> distinguish
> > among several coexisting near-dependencies and the
> lack of a
> > meaningful guideline to differentiate high VIF from
> low.
> >
> > To diagnose collinearity, it is much better to first
> use the
> > condition indexes: pick out those that are large, say
> >20 or >30.
> > For those large condition indexes, see if there are
> large
> > variance-decomposition proportions (> .50)
> associated with each high
> >  condition index: this identifies those variables that
> have high
> > collinearity.
> >
> >
> > Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
> > Professor & Director of Research
> > Dept of Physical Medicine & Rehabilitation
> > Wayne State University School of Medicine
> > 261 Mack Blvd
> > Detroit, MI 48201
> > Email:  [hidden email]
> > Tel: 313-993-8085
> > Fax: 313-966-7682
> >
> >
> > --- On Wed, 7/2/08, [hidden email]
> <[hidden email]> wrote:
> >
> >> From: [hidden email]
> <[hidden email]>
> >> Subject: Re: Multicollinearity
> >> To: [hidden email]
> >> Date: Wednesday, July 2, 2008, 10:16 AM
> >> Thanks so much! I see that SPSS has collinearity
> >> diagnostics:
> >> Tolerance and VIF. Can anyone recommend generally
> what
> >> values of
> >> tolerance and VIF should indicate there is a
> >> multicollinearity
> >> problem. I am seeing many different responses in
> different
> >> lectures/books. some say a tolerance < .1 or a
> VIF >
> >> 10 indicate
> >> collinearity. others say tolerance < .2 and VIF
> > 4).
> >> and then ive
> >> also seen tolerance < .4. Any ideas? thx.
> >>
> >>
> >
