Multicollinearity in SPSS

Multicollinearity in SPSS

Love-2
Hi guys,

I just found out about this forum today and I am really happy about that. I am writing a PhD thesis and have not gotten much help from my advisor so far.
I have a dataset with categorical variables and want to run a logistic regression. However, I want to check for multicollinearity before I run the logistic regression. A book on SPSS says to run a linear regression, ignore the rest of the output, and focus on the Coefficients table and the columns labelled Collinearity Statistics. My questions are:

The correlation between two variables (father's Spanish origin and mother's Spanish origin) is -0.714. As another warning sign, the Pearson correlation between these variables also suggests multicollinearity (Pearson correlation of 0.808). However, I am not sure whether the tolerance value is small enough to raise concern (0.293), or whether the VIF is higher than the cut-off advisable for a logistic regression (3.416). Some researchers say the cut-off for tolerance is 0.1 or 0.2 and for VIF is 4, but one book says: "Values of VIF exceeding 10 are often regarded as indicating multicollinearity, but in weaker models, which is often the case in logistic regression, values above 2.5 may be a cause for concern."

So, what do you guys think? Should I drop one of the variables? I also have a similar case with mother's race and father's race, only with a little less correlation (0.619).

Another question: although the book said to ignore the other output, I couldn't help noticing that some of the condition index values are very high in the collinearity diagnostics table. Should I care about these values at all, or should I just look at the other statistics, like tolerance, VIF, and the correlation values?

Also, is there any other way to check for multicollinearity in this type of dataset in SPSS?
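In case it helps, the linear-regression check the book describes would look something like this in syntax form (the variable names here are placeholders I made up, not the real ones in my dataset):

    * Linear regression run only for its collinearity output:
    * TOL prints tolerance and VIF, COLLIN prints eigenvalues and condition indices.
    REGRESSION
      /STATISTICS COEFF R ANOVA TOL COLLIN
      /DEPENDENT defect
      /METHOD=ENTER father_hisp mother_hisp.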

Please reply to any of the questions you might know the answer to!

Many thanks, Love.



Re: Multicollinearity in SPSS

zstatman
Yes, you found a very helpful site.

Your question has a lot of implications, so there is not really an easy answer without actually having the data to review with you. If you want, contact me directly.

WMB
Statistical Services

=========================================
mailto:[hidden email]
http://home.earthlink.net/~statmanz
=========================================


Re: Multicollinearity in SPSS

Jason McNellis
In reply to this post by Love-2
It's hard to give you a specific recommendation without knowing more about how the variables in question are coded or what you are trying to model. I would assume country of origin would be coded as a series of dummy variables, but based on your post that doesn't seem to be the case, so I am a little confused. (I would think the same thing about race, unless it is applied at the population level.)

In my experience your values do raise a possible warning flag, but I don't know enough about the problem to offer many suggestions.


Have a great day, Jason





Re: Multicollinearity in SPSS

Love-2
Thank you Jason and Will for your prompt answers.


Sorry, I did not explain my variables well. I am trying to see which factors increase the risk of having a child with birth defects (1 = yes (case), 2 = no (control)). I have lots of independent variables, and the pair I am afraid may be collinear is father's Spanish origin (1 = yes, 0 = no) with mother's Spanish origin (1 = yes, 0 = no).
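For the record, the model I eventually want to fit would be something like the following sketch (variable names are placeholders again):

    * Binary logistic regression of the case/control outcome on the two dummies.
    LOGISTIC REGRESSION VARIABLES defect
      /METHOD=ENTER father_hisp mother_hisp
      /PRINT=CI(95).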

Thank you!

Love.

Re: Multicollinearity in SPSS

zstatman
Conceptually I would agree, but my concern lies more with the fact that these variables are dichotomies, and experience has shown that testing these for multicollinearity is difficult at best.
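That said, one quick check you can do with two dichotomies is to cross-tabulate them and look at the phi coefficient; for two binary variables, phi is equivalent to the Pearson correlation. A sketch, with placeholder variable names:

    * Cross-tabulate the two dummies and request phi.
    CROSSTABS
      /TABLES=father_hisp BY mother_hisp
      /STATISTICS=PHI.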

Can you share a sample of your data so I can do some hands-on checking? If you want to converse directly, do contact me.

W


Re: Multicollinearity in SPSS

Love-2
Thank you so much, Will, for your interest in helping me. I am not sure I can share any part of my dataset. I wish I could, but I signed an IRB agreement and I think I am legally prohibited from doing that. I appreciate your trying to help me. Love.

Re: Multicollinearity in SPSS

Almost Done
Hey, guys! I have a problem that might seem easy to you, but it isn't for me. I'm doing research on creative advertising and have to check, for example, whether divergence (rated on a seven-point Likert scale), relevance (rated the same way), and the interaction between the two, divergence*relevance, have an effect on attention, which the respondents also rated on a seven-point Likert scale. When I run a regression, this is what I get:

                           B         t      sig.
Constant                 0.529     0.649    .518
Divergence               0.666     4.215    .000
Relevance                0.573     2.275    .024
Divergence*Relevance    -0.091    -2.012    .046

This seemed weird to me, because divergence*relevance has a negative influence on the dependent variable attention. How can that be? So I removed the Divergence*Relevance interaction, and this is what I got:

                 B         t      sig.
Constant       1.892     4.113    .000
Divergence     0.398     4.622    .000
Relevance      0.090     1.167    .245

So the Bs and significance levels changed drastically. I've tested for multicollinearity using the VIF. The regression where Divergence was the dependent variable (with Relevance and Divergence*Relevance as the predictors) was the one where the VIF was greater than 5. All the other combinations were fine (the VIF was either 1 or slightly greater than 1).

So my question is: what does that mean, and how do I proceed? What do I have to do, and how should I explain it?
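For reference, the setup I described amounts to something like this (variable names are placeholders):

    * Build the interaction as a raw product term and request tolerance/VIF.
    * Note: a raw product term is usually highly correlated with its components,
    * which by itself inflates the interaction's VIF.
    COMPUTE div_x_rel = divergence * relevance.
    EXECUTE.
    REGRESSION
      /STATISTICS COEFF R ANOVA TOL
      /DEPENDENT attention
      /METHOD=ENTER divergence relevance div_x_rel.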

Re: Multicollinearity in SPSS

David Marso
Administrator
Please begin a new thread rather than replying to a six-year-old one with a subject only tangentially related to your own query; Nabble gets the threading all wrong, and the UGA archive is useless for navigational reasons. What are the actual correlations? Maybe create some plots so you can SEE what is going on. Interactions are sometimes rather difficult to interpret without some visuals.
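For example, something along these lines would show the Divergence-Attention relationship separately for low and high Relevance (just a sketch: the cut-point is arbitrary and the variable names are guesses):

    * Rough split of relevance at the scale midpoint, then a grouped scatterplot.
    RECODE relevance (LO THRU 4=1) (ELSE=2) INTO rel_group.
    EXECUTE.
    GRAPH
      /SCATTERPLOT(BIVAR)=divergence WITH attention BY rel_group.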

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Do not give that which is holy to the dogs, nor cast your pearls before swine, lest they trample them underfoot."
"When the demons took possession of the pigs, did they go leaping off the cliff of blood into the abyss?"

Re: Multicollinearity in SPSS

Almost Done
Hey, man. Sorry for posting in an old thread. If you want, I can send you the data for that one dependent variable and the two independent ones in an Excel file, because it's difficult to retype all the data from the SPSS results tables.

Re: Multicollinearity in SPSS

David Marso
Administrator
You should start a new thread here
http://spssx-discussion.1045642.n5.nabble.com/ and attach the Excel
file to the message.


Re: Multicollinearity in SPSS

Swank, Paul R
In reply to this post by Almost Done
To me the result says that the positive effect of divergence is reduced in the presence of higher relevance and, likewise, the effect of relevance is reduced in the presence of higher divergence. It is sort of the opposite of a synergistic effect.
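To make that concrete with the coefficients from the first table: predicted attention = 0.529 + 0.666*Divergence + 0.573*Relevance - 0.091*Divergence*Relevance, so the slope for Divergence at a given level of Relevance is 0.666 - 0.091*Relevance. At Relevance = 1 that slope is about 0.58; at Relevance = 7 it has shrunk to about 0.03.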

Paul R. Swank, Ph.D.
Professor, Department of Pediatrics
Medical School
Adjunct Professor, Health Promotions and Behavioral Sciences
School of Public Health
University of Texas Health Science Center at Houston


