Dear fellows, I have run a binary logistic regression in SPSS with 7 IVs (dichotomized). My DV is "dissatisfied with childbirth care", with response options 1 = yes and 0 = no. One of the independent variables is "interpersonal care and communication" (0 = High, 1 = Low). In the binary regression output, all the statistics seem normal except the odds ratio for "interpersonal care and communication". All the other ORs are in the range 0.173 to 3.251, but the OR for interpersonal care and communication is 23, which seems exceptionally large to me (if I'm right in believing so). The dichotomous response for this variable was obtained from its factor score. All the VIF and tolerance values seem fine, indicating that multicollinearity is not a problem. My sample size is 317, with 52 responses in the outcome category that I want to predict (dissatisfaction). Could anybody point out any other problems in the data that might produce such a huge OR (if it is huge at all)?
Furthermore, I would like to know which statistics from the binary regression output I should report when writing up the results. Thanks |
Perhaps you've already checked this, but one line of checking is the impact of missing data. How many records do you lose as each IV is entered, and how does that loss divide between the two DV categories? The other line of checking is to crosstab that variable against the DV to see what the bivariate odds ratio is.
Gene Maguin
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Sidra Sent: Wednesday, August 23, 2017 9:08 AM To: [hidden email] Subject: Big odds ration in binary regression output
-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Big-odds-ration-in-binary-regression-output-tp5734731.html Sent from the SPSSX Discussion mailing list archive at Nabble.com.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by Sidra
With 52 events and 7 variables, you have 7.4 events per variable (EPV). In order to avoid over-fitting a binary logistic regression model, you ought to have approximately 15 EPV. See Mike Babyak's nice article on over-fitting, for example. See also the 2016 article by Greenland et al. on "sparse data bias", as it could also be relevant.
https://www.ncbi.nlm.nih.gov/pubmed/15184705 http://www.bmj.com/content/352/bmj.i1981 HTH.
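The events-per-variable arithmetic above is easy to check. Here is a minimal Python sketch, assuming the counts quoted in the thread (52 events, N = 317, 7 predictors) and treating the 15-EPV target as a rule of thumb rather than a hard limit. The function name is made up for illustration.

```python
def events_per_variable(n_events: int, n_total: int, n_predictors: int) -> float:
    """EPV: size of the smaller outcome group divided by the number of predictors."""
    smaller_group = min(n_events, n_total - n_events)
    return smaller_group / n_predictors


epv = events_per_variable(n_events=52, n_total=317, n_predictors=7)
print(f"EPV = {epv:.1f}")  # 52 / 7, about 7.4

# How many predictors would 52 events support at roughly 15 EPV?
max_predictors = 52 // 15
print(f"Predictors supportable at 15 EPV: {max_predictors}")
```

By this rule of thumb, 52 events would support only about three predictors, which is why over-fitting and sparse-data bias are worth ruling out.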
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Maguin, Eugene
Thanks, Eugene, for the quick response. I have only one missing value in my entire data set, so missing values aren't a problem. I just checked the bivariate odds ratio, which is also fairly large (OR = 19.24).
|
A bivariate OR of 19 does seem huge to me. The phi correlation must also be very large, I'd guess .7 or .8 but maybe higher. That makes me think of a variable definition problem. You mentioned a factor score. Could it be that one or more of the indicators for the IV, I'll call it IC&C, are highly correlated with the DV?
Gene Maguin |
In reply to this post by Sidra
Why did you coarsen your IVs to dichotomies?
Art Kendall
Social Research Consultants |
Kendall, 5 of my IVs were measured at the binary level. However, the two factors that make up the scale measuring experience with childbirth care were dichotomized because my research supervisor told me to do so. I don't really know the rationale behind it.
|
In reply to this post by Maguin, Eugene
Eugene, surprisingly the phi correlation between "interpersonal communication and care" and the DV is not very large. It is -.525, which seems a moderate value to me. I couldn't follow the second part of your comment. Could you make it clearer for me?
|
In reply to this post by Maguin, Eugene
Eugene, is it possible that this huge odds ratio simply reflects "reality"? In the cross-tabulation I looked at the standardized residuals for the four cells, and it seems that each cell contributes significantly to the overall significance of the test. The standardized residuals for this IV are 2.0, 3.2, 4.4, 7.3. The values are above 1.96 (considered the threshold for a significant contribution from an individual cell). I don't have very strong theoretical knowledge of statistics, so I might be wrong in my judgments.
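For readers who want to see where standardized residuals come from, here is a small Python sketch. The cell counts are hypothetical (they are not the poster's data; they were chosen only so that the totals resemble the thread, with 52 dissatisfied respondents out of roughly 316). Each residual is (observed - expected) / sqrt(expected). A 2x2 table has a single degree of freedom, so the four residuals all describe the same one association rather than four separate findings.

```python
import math

# Hypothetical 2x2 counts (NOT the poster's data).
# Rows: DV (dissatisfied, satisfied); columns: IC&C (low, high).
observed = [[40, 12],
            [40, 224]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

std_residuals = []
for i, row in enumerate(observed):
    res_row = []
    for j, count in enumerate(row):
        # Expected count under independence of rows and columns.
        expected = row_totals[i] * col_totals[j] / n
        res_row.append((count - expected) / math.sqrt(expected))
    std_residuals.append(res_row)

for row in std_residuals:
    print([round(r, 2) for r in row])
```

The squared residuals sum to the Pearson chi-squared for the table, so large residuals in every cell are simply another view of one strong association.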
|
' ... huge odds ratio being simply reflective of "reality"?' Yes.
" ... second part of your comment." You put some items together to make a factor, computed factor scores for that factor and then dichotomized the factor scores. That variable has an OR of 19+ with the DV and when that variable goes into the equation, it has an OR of 23+. Your DV is "dissatisfied with childbirth care" and the problem IV is called "interpersonal care and communication". Those two names seem very close to me in that a woman who feels she has received low levels of "interpersonal care and communication" might be/probably would be very "dissatisfied with childbirth care". I'm thinking that the indicators of the IV might correlate pretty highly with the DV because they are also indicators of childbirth care dissatisfaction. Two things then. Consider whether each indicator is also an indicator of dissatisfaction and look at the correlations of the indicators and the DV.
Gene Maguin |
In reply to this post by Sidra
Let's see. Chi-squared equals phi-squared times N. So your -0.525 for phi is worth about 90 as a chi-squared, for N = 317. That seems pretty big to me.
Yeah, your 2x2 table should be extreme. Look at it.
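To make the "chi-squared equals phi-squared times N" relation concrete, here is a hedged Python sketch on a hypothetical 2x2 table (not the poster's data; the counts were picked to give values near those discussed in the thread). It also illustrates that a phi of roughly .5 and an odds ratio near 19 can come from the same table. The sign of phi depends only on how the two variables are coded.

```python
import math

# Hypothetical counts (NOT the poster's data).
a, b = 40, 12    # dissatisfied: IC&C low, IC&C high
c, d = 40, 224   # satisfied:    IC&C low, IC&C high
n = a + b + c + d

odds_ratio = (a * d) / (b * c)
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Pearson chi-squared (no continuity correction) from observed vs expected counts.
# Each tuple is (observed, row total, column total).
cells = [(a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d)]
chi_sq = sum((obs - r * col / n) ** 2 / (r * col / n) for obs, r, col in cells)

print(f"OR = {odds_ratio:.2f}, phi = {phi:.3f}")
print(f"chi-squared = {chi_sq:.2f}, phi^2 * N = {phi ** 2 * n:.2f}")
```

The two chi-squared figures agree because, for a 2x2 table without continuity correction, the identity is exact.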
I just about never bother to look at VIF and tolerance because I try to start out my analyses by knowing the size of the univariate relations between IVs and Dependent, and also among IVs. Avoid redundancy, and check for basic validity among expected relationships.
And, yes, an OR of 19 or 23 seems "too big" for this sort of variable. What I would look into is the possibility of artifact: Is the DV accidentally a component of the predictor (which was a forced dichotomy of a composite score)? It is time to look at the components of the composite against the DV, separately, to see what is responsible for that large effect.
-- Rich Ulrich
|
In reply to this post by Maguin, Eugene
Eugene and Ulrich, thanks for your valuable suggestions. I have looked at the individual items of the problem predictor "Interpersonal communication and care" and their phi correlations with the DV (the individual items were measured at the binary level). The items that pertain to interpersonal care, for instance "were you treated with respect and courtesy?", have high correlations with the DV (.4 to .61), whereas the items of the same factor that pertain to communication, such as "were you given sufficient information regarding care of the newborn?", have moderate correlations with the DV (ranging from .2 to .4). Please note that the DV was measured using a single item worded "All in all, were you satisfied with the services you received during your stay in the hospital?" with response options yes and no. As far as I can tell, there is no duplication of ideas here. But one particular item of the factor, "do you think that the healthcare personnel took care of you and your child?", may have been interpreted in the same sense as the question measuring the DV. This item has a phi correlation of .61 with the DV and also very few responses in one cell. Should I try removing this item, calculating the factor score again, dichotomizing, and looking at the changed odds ratio?
|
I removed the item with the high phi correlation with the DV, computed the factor score again, and ran the binary regression with the new dichotomized variable. But the odds ratio has not dropped; in fact, it has slightly increased. I can't think of any other possible solutions. Should I report the OR as it is?
|
In reply to this post by Sidra
Yes.
I'm a bit confused with your message because you say " It seems that the items which pertain to interpersonal care for instance "were you treated with respect and courtesy?" have high correlations with DV (.4 to .61) ..." and then a bit later you say " But one particular item of the factor "do you think that the healthcare personnel took care of you and your child?" may have been interpreted in the same sense as the question measuring DV. This particular item has a phi correlation of .61 with DV and also very few responses in one cell." Is there just one item with a .61 correlation or two items, both with a .61 correlation? I just saw your most recent message about removing the item with the high correlation. I'm guessing that means there is only one high correlation item. True? There's something going on and I'm not sure what it is. If I were talking with somebody here, where I am, who was having this same problem, I'd be asking to see the prior analyses. I hope that you have somebody there that could go through the complete analysis story with you.
Gene Maguin |
If the base rate for one of the target cells is not low, you might find a very high OR. For instance:
20 10
1 30
The OR for cell (1,1) would be (20/10)/(1/30) = 2/.033 = 60. Not sure that this is the problem. Odds ratios tend to overestimate the relative risk whenever the base rate is not low. MTC.
Martin Sherman
Martin F. Sherman, Ph.D. Professor of Psychology Director of Masters Education: Thesis Track Loyola University Maryland 4501 North Charles Street 222 B Beatty Hall Baltimore, MD 21210 410 617-2417 [hidden email] |
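The gap between the odds ratio and the relative risk in Martin Sherman's 20 / 10 / 1 / 30 example can be worked out directly. This is a minimal Python sketch; reading the rows as two groups and the columns as (event, no event) is an assumption about the layout of the table, and the variable names are illustrative.

```python
# 2x2 table from the post, read as: rows = groups, columns = (event, no event).
# That reading of the layout is an assumption.
group1_event, group1_no_event = 20, 10
group2_event, group2_no_event = 1, 30

# Odds ratio: ratio of the within-group odds of the event.
odds_ratio = (group1_event / group1_no_event) / (group2_event / group2_no_event)

# Relative risk: ratio of the within-group proportions with the event.
risk1 = group1_event / (group1_event + group1_no_event)
risk2 = group2_event / (group2_event + group2_no_event)
relative_risk = risk1 / risk2

print(f"OR = {odds_ratio:.1f}")
print(f"RR = {relative_risk:.1f}")
```

Under that reading, the event occurs in two thirds of the first group, so the odds ratio comes out roughly three times the relative risk.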
In reply to this post by Maguin, Eugene
Yes Eugene, you're right in assuming that only one item has a phi value of .61, but 4 other items related to interpersonal care also have fairly high phi statistics (-.44, -.43, -.56, -.44). Please note that the instrument to measure "childbirth care experience" was developed by me. I used exploratory factor analysis to extract the dimensions. When I look at the factor loading matrix and structure matrix, the values for the items with high phi values are also considerably high, with rotated factor loadings in the range .77-.89. Since the items are highly correlated with the underlying construct they measure, with each other, and with the DV, does this explain the high OR?
|
In reply to this post by Sidra
There is a limit to how big a correlation you can expect between scaled variables. In my experience, two items scored on a Likert-type scale are effectively measuring the same thing when their r is 0.80. The only way it gets higher is by artifact and "shared error". For two dichotomies, I expect the same underlying trait when their r is 0.60. The max is lower when their skews are not synchronized. Dichotomous r's of 0.40 are large.
What is special about the question, "... are you satisfied with the services... ", is that hospital administrators have learned to respect it. It is now /the/ popular measure of outcome. To me, both your highest item and its composite seem to measure that same outcome - as a latent trait. So, yes, you still get a high OR.
And the question I have: What are you trying to learn, or to accomplish?
Given the classical question ("satisfied") and one or two or three alternate versions (item with r=0.61; item with r=0.56; composite score), I think I would want to examine the discordant answers - Why does someone say Yes to "satisfied" while saying No to "took care of you"? (and vice-versa.) That's what occurs to me, but don't know what other sort of data you have available, or what your mandate is for these data.
-- Rich Ulrich
|
In reply to this post by Sidra
Perhaps you should go back to the factor analysis.
How did you decide how many factors to retain? Did you do a parallel analysis? (See the archives for macros to do this.) Did you use principal factors so that you used only the common variance? Did you use varimax rotation to maximize divergent validity? In developing the scoring key, did you reflect items with negative (possibly positive) loadings so that all repeated measures of the construct were pointing in the same direction? Did you drop items that did not load cleanly? I suggest you use unit weights, i.e., simply take the mean of the clean-loading items so the resulting scale is on the same metric as the item response scale. (Following this conventional approach facilitates using the scales in future research by yourself and others. Non-unit weights rarely stand up across studies.) When you redo your analysis, try using the scale scores without coarsening to see the results. Then coarsen the scales to see if the results are robust enough to withstand the coarsening of measurement.
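The unit-weight suggestion can be sketched in a few lines of Python. The item responses below are made up for illustration, and the median split is only an assumption about how the coarsening might be done (the thread does not say which cut point was used).

```python
from statistics import mean, median

# Hypothetical binary item responses (1 = yes, 0 = no); one row per respondent.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

# Unit-weighted scale score: the plain mean of the retained items,
# which stays on the same 0-1 metric as the items themselves.
scale_scores = [mean(items) for items in responses]

# Coarsened version: median split of the scale score (1 = low, 0 = high).
# The median cut point is an assumption made for this example.
cutoff = median(scale_scores)
low_group = [1 if score < cutoff else 0 for score in scale_scores]

print(scale_scores)
print(low_group)
```

Running the regression once with the uncoarsened scale score and once with the split version shows how much of the result depends on the dichotomization.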
Art Kendall
Social Research Consultants |
Kendall, I used software named FACTOR to carry out the EFA. Since all the variables were measured on a binary scale, I learned that this is the software that accommodates EFA for such variables. The rotation method was Promin. I used parallel analysis to determine the number of dimensions, and it suggested two. Next I ran MRFA to extract the dimensions. All these options were set by default and are recommended by the developers of the software. An anomaly I noted only later is that 9 out of 16 variables had communalities equal to 1 (Heywood case). Does this invalidate my entire EFA? I didn't observe any complex loadings. 3 variables didn't load on any factor and were removed from the scale.
|