Hi guys,
I just found out about this forum today and I am really happy about that. I am writing a PhD thesis and have not gotten much help from my advisor so far. I have a dataset with categorical variables on which I want to run a logistic regression. However, I want to check for multicollinearity before I run the logistic regression. A book on SPSS says to run a linear regression, ignore the rest of the output, and focus on the Coefficients table and the columns labelled Collinearity Statistics.

My questions are:

The correlation between two variables (father's Spanish origin and mother's Spanish origin) is -0.714. Another warning sign: the Pearson correlation between these variables also suggests multicollinearity (Pearson correlation of 0.808). However, I am not sure whether the tolerance value (0.293) is small enough to raise concern, or whether the VIF (3.416) is higher than the cut-off point advisable for a logistic regression. Some researchers say the cut-off for the tolerance is 0.1 or 0.2 and for the VIF is 4, but one book says "Values of VIF exceeding 10 are often regarded as indicating multicollinearity, but in weaker models, which is often the case in logistic regression, values above 2.5 may be a cause for concern". So, what do you guys think? Should I drop one of the variables? I also have a similar case with mother's race and father's race, only with a slightly lower correlation (0.619).

Another question: although the book said to ignore the other outputs, I couldn't help noticing that some of the condition index values are very high in the Collinearity Diagnostics table. Should I care at all about these values, or should I just look at the other statistics, like tolerance, VIF and the correlation values?

Also, is there any other way to check for multicollinearity in this type of dataset in SPSS?

Please reply to any question you might know the answer to! Many thanks, Love. |
Yes, you found a very helpful site
Your question has a lot of implications, so there is no easy answer without actually having the data to review with you. If you want, contact me directly.
(In reply to Love's message of 8/1/2006 3:15:23 PM.)
Will
Statistical Services ============ info.statman@earthlink.net http://home.earthlink.net/~z_statman/ ============ |
In reply to this post by Love-2
It's hard to give you a specific recommendation without knowing more about how the variables in question are coded or what you are trying to model. I would assume country of origin would be coded as a series of dummy variables, but based on your post that doesn't seem to be the case, so I am a little confused. (I would think the same thing about race, unless it is applied at the population level.) Based on my experience, your values do raise a possible warning, but I don't know enough about the problem to give many suggestions.
Have a great day,
Jason
(In reply to Love's message of Tuesday, August 01, 2006 2:15 PM, subject: Multicollinearity in SPSS.) |
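For readers following the numbers in the question, here is a minimal Python sketch (not SPSS syntax, and not something anyone in the thread posted) of how tolerance and VIF relate. It assumes only the standard definitions, tolerance = 1 - R² and VIF = 1 / tolerance, and uses the figures and cut-offs quoted above. The function names are made up for illustration.

```python
# Cut-offs quoted in the thread; which one to apply is a judgment call.
VIF_CUTOFFS = {"2.5 (weaker models)": 2.5, "4": 4.0, "10": 10.0}
TOLERANCE_CUTOFFS = {"0.2": 0.2, "0.1": 0.1}


def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of VIF (tolerance = 1 - R^2)."""
    return 1.0 / vif


def vif_from_pairwise_r(r):
    """VIF when a model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)


def exceeded_vif_cutoffs(vif):
    """Labels of the quoted VIF cut-offs that this VIF is above."""
    return [label for label, cutoff in VIF_CUTOFFS.items() if vif > cutoff]


def breached_tolerance_cutoffs(tolerance):
    """Labels of the quoted tolerance cut-offs that this tolerance is below."""
    return [label for label, cutoff in TOLERANCE_CUTOFFS.items() if tolerance < cutoff]


# Figures reported in the question.
reported_vif = 3.416
reported_tolerance = 0.293

implied_tolerance = tolerance_from_vif(reported_vif)
two_predictor_vif = vif_from_pairwise_r(0.808)

print("tolerance implied by VIF:", round(implied_tolerance, 3))
print("VIF cut-offs exceeded:", exceeded_vif_cutoffs(reported_vif))
print("tolerance cut-offs breached:", breached_tolerance_cutoffs(reported_tolerance))
print("VIF from r = 0.808 alone:", round(two_predictor_vif, 3))
```

The reported tolerance and VIF are two views of the same quantity, so they can never disagree about a cut-off that is stated consistently in both forms. The VIF implied by the pairwise correlation alone is somewhat lower than the reported 3.416, which would be consistent with the other predictors in the model also contributing, although that cannot be verified without the data.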
Thank you Jason and Will for your prompt answers.
Sorry, I did not explain my variables well. I am trying to see which factors increase the risk of having a child with birth defects (1 = yes (case), 2 = no (control)). I have lots of independent variables, and the pair I am afraid is multicollinear is father's Spanish origin (1 = yes, 0 = no) and mother's Spanish origin (1 = yes, 0 = no). Thank you! Love. |
Conceptually I would agree, but my concern lies more with the fact that these variables are dichotomies, and experience has shown that testing these for multicollinearity is difficult at best. Can you share a sample of your data so I can do some hands-on work? If you want to converse directly, do contact me. W
(In reply to Love's message of Tuesday, August 01, 2006 7:52 PM.)
Will
Statistical Services ============ info.statman@earthlink.net http://home.earthlink.net/~z_statman/ ============ |
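Since both variables are 0/1 dichotomies, one simple check that needs no regression at all is the 2x2 cross-tabulation. The Pearson correlation between two 0/1 variables equals the phi coefficient computed from the four cell counts. The sketch below is in Python rather than SPSS, and the counts are entirely hypothetical, since the real data could not be shared.

```python
import math


def phi_from_counts(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table.

    n11: both yes, n10: father yes / mother no,
    n01: father no / mother yes, n00: both no.
    """
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    return numerator / denominator


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cell counts, NOT the poster's data.
n11, n10, n01, n00 = 40, 5, 6, 49

# Expand the table into case-level 0/1 variables.
father = [1] * n11 + [1] * n10 + [0] * n01 + [0] * n00
mother = [1] * n11 + [0] * n10 + [1] * n01 + [0] * n00

phi = phi_from_counts(n11, n10, n01, n00)
r = pearson_r(father, mother)
r_recoded = pearson_r(father, [1 - m for m in mother])

print("phi from the table:", round(phi, 4))
print("Pearson r on the 0/1 codes:", round(r, 4))
print("r after reversing mother's coding:", round(r_recoded, 4))
```

Looking at the table directly shows how many cases fall in the two discordant cells, which is what ultimately determines whether the two effects can be separated. Reversing the coding of one variable flips the sign of the correlation but leaves its size unchanged.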
Thank you so much Will for your interest in helping me. I am not sure if I can share any part of my dataset. I wish I could but I signed an IRB and I think I am prohibited by law to do that. But I appreciate your trying to help me. Love.
|
Hey, guys! I have a problem that might seem easy to you, but it isn't for me. I'm doing research on creative advertising and have to check, for example, whether divergence (rated on a seven-point Likert scale), relevance (rated the same way) and the interaction between the two (divergence*relevance) have an effect on attention, which the respondents also rated on a seven-point Likert scale. So when I run a regression, this is what I get:

                        B        t        Sig.
Constant                ,529     ,649     ,518
Divergence              ,666     4,215    ,000
Relevance               ,573     2,275    ,024
Divergence*Relevance    -,091    -2,012   ,046

This seemed weird to me because divergence*relevance has a negative influence on the dependent variable, attention. How can that be? So I removed the Divergence*Relevance interaction, and this is what I got:

                        B        t        Sig.
Constant                1,892    4,113    ,000
Divergence              ,398     4,622    ,000
Relevance               ,090     1,167    ,245

So the B values and the significance changed drastically. I've tested for multicollinearity using the VIF. The combination where the Divergence was the dependent variable (and independent : Divergence and Divergence*Relevance) was the one where VIF was greater than 5. All the other combinations were fine (VIF was either 1 or slightly greater than 1). So my question is: what does that mean, and how do I proceed? What do I have to do? And how should I explain it? |
Administrator
|
Please begin a new thread rather than replying to a 6-year-old one whose subject is only tangentially related to your own query. Nabble gets the threading all wrong, and the UGA archive is useless for navigational reasons. What are the actual correlations? Maybe create some plots so you can SEE what is going on.
Interactions are sometimes rather difficult to interpret without some visuals.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
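The "actual correlations" asked for above are those among the two ratings and their product term. A minimal Python sketch of that check follows. The ratings are made-up 1 to 7 values, not the poster's data, and the variable names are only illustrative.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 7-point ratings from ten respondents.
divergence = [2, 5, 3, 6, 4, 7, 1, 5, 6, 3]
relevance = [4, 3, 5, 6, 2, 5, 3, 4, 6, 2]

# The interaction term is simply the product of the two ratings.
product = [d * r for d, r in zip(divergence, relevance)]

variables = {
    "divergence": divergence,
    "relevance": relevance,
    "divergence*relevance": product,
}

# Pairwise correlations among the three predictors.
correlations = {
    (a, b): pearson_r(variables[a], variables[b])
    for a in variables
    for b in variables
}

for (a, b), value in correlations.items():
    if a < b:  # print each unordered pair once
        print(f"r({a}, {b}) = {value:.3f}")
```

On ratings that are all positive, a product term tends to correlate strongly with the variables it is built from, so a high VIF for a model that includes the interaction is not surprising in itself.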
Hey, man. Sorry for posting in an old thread. If you want, I can just send you an Excel file with the data for that one dependent variable and the two independent ones, because it's difficult to retype all the data from the SPSS results tables.
|
Administrator
|
You should start a new thread here
http://spssx-discussion.1045642.n5.nabble.com/ and attach the Excel file to the message.
(In reply to Almost Done's message of Tue, Aug 7, 2012 at 7:57 AM.)
|
In reply to this post by Almost Done
To me the result says that the positive effect of divergence is reduced in the presence of higher relevance and, likewise, the effect of relevance is reduced in the presence of higher divergence. It is sort of the opposite of a synergistic effect.
Paul R. Swank, Ph.D.
Professor, Department of Pediatrics, Medical School
Adjunct Professor, Health Promotions and Behavioral Sciences, School of Public Health
University of Texas Health Science Center at Houston
(In reply to Almost Done's message of Monday, August 06, 2012 4:24 PM.)
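The interpretation above can be made concrete with the coefficients from the question. In a model with a product term, the slope of one predictor depends on the value of the other. The Python sketch below just does that arithmetic, reading the decimal commas in the posted table as decimal points (so ",666" is 0.666). The function names are illustrative and no new estimates are involved.

```python
# Coefficients copied from the regression with the interaction term.
B_CONSTANT = 0.529
B_DIVERGENCE = 0.666
B_RELEVANCE = 0.573
B_INTERACTION = -0.091


def divergence_slope(relevance):
    """Change in predicted attention per one-point rise in divergence."""
    return B_DIVERGENCE + B_INTERACTION * relevance


def relevance_slope(divergence):
    """Change in predicted attention per one-point rise in relevance."""
    return B_RELEVANCE + B_INTERACTION * divergence


def predicted_attention(divergence, relevance):
    """Fitted value from the model that includes the interaction."""
    return (
        B_CONSTANT
        + B_DIVERGENCE * divergence
        + B_RELEVANCE * relevance
        + B_INTERACTION * divergence * relevance
    )


# Slopes at the low end, midpoint and high end of the 7-point scale.
for level in (1, 4, 7):
    print(
        f"at {level}: divergence slope = {divergence_slope(level):.3f}, "
        f"relevance slope = {relevance_slope(level):.3f}"
    )
```

Two things follow from this arithmetic. The slope of divergence shrinks as relevance rises but stays positive over the whole 1 to 7 range, which matches the "opposite of synergy" reading. Also, the B of 0.666 for Divergence in the interaction model is the slope at Relevance = 0, a value outside the scale, so it is not directly comparable with the 0.398 from the model without the interaction.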
|