After reviewing websites, Rich's and Bruce's responses, and regression books, I have come to the conclusion that multicollinearity can be truly identified only in linear regression. For logistic regression it is not very feasible, and I think an approach to detecting multicollinearity in a logistic regression does not provide much more information than adjusting for confounding factors.
So, I think I should concentrate on detecting whether age or socioeconomic status is correlated with BMI or weight gain. Is my sample size adequate for running a binary logistic analysis with the above independent variables?
In reply to this post by dr_msantu
Since you have birthweight as a continuous variable, would the logistic regression be a follow-up merely to see whether the results hold up when you coarsen the DV?

Might it be informative to make the DV (birthweight minus the criterion for normal birthweight)?

On 4/11/2013 6:42 AM, dr_msantu [via SPSSX Discussion] wrote:
> You have correctly identified my problem. I think I should not try to

Art Kendall
Social Research Consultants
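
Art's suggestion can be made concrete with a small sketch. It assumes a low-birthweight cutoff of 2500 g (the conventional definition, not something stated in this thread) and uses made-up birthweights; the names `CRITERION_G`, `centered` and `coarsened` are hypothetical.

```python
# Hypothetical birthweights in grams; these are NOT data from this thread.
birthweights_g = [2450, 2490, 2510, 3300, 1900]

# Assumed criterion: the conventional low-birthweight cutoff of 2500 g.
CRITERION_G = 2500

# Continuous DV: distance from the criterion (negative = below the cutoff).
centered = [w - CRITERION_G for w in birthweights_g]

# Coarsened DV as used in the logistic regression: 1 = LBW, 0 = NBW.
coarsened = [1 if w < CRITERION_G else 0 for w in birthweights_g]

print(centered)
print(coarsened)
```

After coarsening, the 2490 g and 1900 g babies carry the same code, which is the information a dichotomous DV gives up relative to the continuous one.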
In reply to this post by dr_msantu
I hope you find someone locally with whom you can discuss this -- there is a lot to absorb and apply in what has already been posted. Even if you study a couple of textbooks (Frank Harrell on logistic, Jacob Cohen's book on regression), you will need to do several analyses before you get a feel for it. Bouncing ideas and reactions off someone who has performed and presented an analysis may not be *necessary* but it surely should help.

Here are some particular comments and guidelines.

(1) On sample size. "This kind" of analysis is what kind? Using Bruce's citation of 20 cases per d.f. of predictors in the smaller outcome group, you have a maximum of 7 d.f. available for a robust procedure. You have not yet mentioned the split for the criterion, but an equal split is 154/154, allowing almost 8 (154/20 = 7.7).

But you need to be clear about what "kind" this is: You have an outcome; you have two primary predictors that you care about, with (I think) 5 or 6 d.f. (treating categories separately); and two confounders with another 3 or 4 d.f. This adds to 8 or 10, so you are evidently in a shaky position if you want to make sound statements about all the variates.

However -- you don't care so much about the confounders, so that simplifies the problem. You can look at the two primary predictors and make a good statement. You can add the covariates, the potential confounders, and make a further statement with a little less confidence. If the confounders don't make any difference, then you are in good shape because, despite the possibility of over-fitting, "They didn't make any difference."

(2) Is SES correlated with BMI and weight gain? Here are two things everyone should do with every multivariate analysis, and then provide these details with the write-up. (Even if an editor doesn't want them included except as a summary statement, the reviewers should see them.)

a) Look at the univariate relations of predictors with outcome.
b) Look at the univariate relations among the predictors, to become aware of possible confounding.
- Oh, and those crosstabs and correlations also should look reasonable. Consider this as one necessary part of data validation. (A friend was involved with analyzing auto repair data for Consumer Reports, back in the computer-card days. When the first glimpse of comparative data showed that Corvettes did *not* have a high repair cost, they knew they had a data problem. [Turned out, there was no ID that connected card 1 with card 2. Had to be re-punched.])

Among correlations: Something with p < 0.05 is interesting and potentially "hazardous" if it is a confounder. Something with p < 0.20 but not 0.05 is potentially interesting, and it won't be very surprising to see it modify something or be modified. For your data, it is appropriate to pay special attention to the relations of confounders with the primary predictors.

You have some predictors you can look at in a couple of ways, either as categories or as continuous. Here is a rare instance where p-values can actually be useful in the middle of an analysis, in comparing the results for the same variable treated either as categories or as continuous. If the continuous variable has all the more interesting p-values, then you have pretty good justification for using the predictor as continuous instead of its categories. This saves degrees of freedom, which is a concern with the N being what it is.

(2b) I think you report that including Age resulted in the p-value for one coefficient crossing from "NS" to p < 0.05. (That was not very clear.) Did it change from 0.06 to 0.049, or what? Some changes are less notable than others. Yes, it usually means some confounding, but you look at the direct univariate tests when you want to answer that question.
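
The d.f. arithmetic from point (1), and the saving from treating predictors as continuous, can be sketched as follows. This is only an illustration of the 20-cases-per-d.f. guideline cited above: the function name `df_budget` is hypothetical, the equal 154/154 split is the best case rather than the actual (unreported) split, and the "one d.f. per predictor" line assumes all four predictors are entered as continuous.

```python
def df_budget(n_smaller_group, cases_per_df=20):
    """Predictor d.f. supported by the smaller outcome group (whole d.f. only)."""
    return n_smaller_group // cases_per_df

n_total = 308

# Best case: an equal 154/154 split of the outcome (the real split is not reported).
budget = df_budget(n_total // 2)

# d.f. ranges quoted in point (1): (low, high) when categories are coded separately.
primary_df = (5, 6)       # the two primary predictors
confounder_df = (3, 4)    # the two potential confounders
needed = (primary_df[0] + confounder_df[0],
          primary_df[1] + confounder_df[1])

# Assumption: each of the four predictors entered as continuous costs 1 d.f.
needed_continuous = 4

print("budget:", budget)
print("needed with categories:", needed)
print("needed if all continuous:", needed_continuous)
```

With categories the model asks for more d.f. than the guideline supports even in the best case, while the all-continuous version fits inside the budget, which is why the categories-versus-continuous comparison matters at this N.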
I am less sure about what happens in logistic regression, but in Ordinary Least Squares, you *can* add a new variable that is 100% independent of a predictor and still see the p-value change (though not the beta), because the error term has been reduced by the new, *strong* predictor.

Are you considering "experiment-wise" control for multiple tests performed? If you are looking at the p-values separately for every predictor, then you probably should consider these results as exploratory.

--
Rich Ulrich

> Date: Thu, 11 Apr 2013 16:10:54 +0530
> From: [hidden email]
> Subject: Re: Nominal Depndent with Ordinal Independent
> To: [hidden email]
>
> You have correctly identified my problem. I think I should not try to
> find "multicollinearity" in logistic regression, rather I should
> concentrate on "confounding". I am very much thankful to all of you
> for making such thorough discussion regarding "multicollinearity",
> which helped me a lot. I have two more questions:
>
> 1) Is my sample size (308) adequate for this kind of binary
> logistic analysis?
>
> 2) I have run the binary logistic regression with BMI
> category (ordinal), gestational weight gain category (ordinal),
> age (continuous) and socioeconomic status (ordinal) as independent
> variables for my dichotomous dependent variable (LBW vs. NBW).
> I used SPSS 20.
>
> I found that age was not significantly associated (p > 0.05) and the
> change of coefficient is minimal (for underweight BMI it changed to
> 1.182 from 1.181).
>
> But when I run it with socioeconomic status the change of coefficient is
> much larger and it is also statistically significant (p < 0.05).
>
> From these may I conclude that socioeconomic status is correlated with
> BMI and Wgain? If the confounding factor is present, could I make
> an adjustment for this?
>
> On 10/04/2013, Rich Ulrich <[hidden email]> wrote:
> > I think you are concerned more with what is usually discussed as
> > "confounding" rather than as "multicollinearity."
> >
> > Originally - and still, sometimes - multicollinearity refers to
> > what you get, say, when you use 3 dummy variables to code
> > three categories. Or you include a set of items along with their
> > total score. That is, you have redundancy, and the computer
> > algorithms will choke and fail when they try to divide by zero
> > (or, allowing for round-off, too near to zero).
> >
> > The usual extended version of concern with multicollinearity arises
> > when there is near-redundancy, and this results in "variance inflation"
> > for the predictors. That is: if your best predictor is equal parts of A
> > and B, scaled the same, then when A and B are too similar, you will
> > find that (0.9*A + 0.1*B) is practically the same as (0.1*A + 0.9*B) ...
> > Then, even if A and B separately have small CIs on their coefficients,
> > the combined regression will show large CIs on whatever the coefficients
> > come out as. That is the variance inflation. (One simple solution that
> > sometimes works great is to replace predictors [A, B] with the orthogonal
> > pair [(A+B), (A-B)]. YMMV.)
> >
> > If you want to know whether Age and SES change the partial associations
> > seen for BMI and WGain, the simple answer is the direct one: Do it.
> > Run an analysis of outcome for BMI and WGain; run another analysis for
> > BMI and WGain that controls for the demographics. Do the coefficients
> > change?
> >
> > If the demographic variables are not correlated with BMI and WGain, there
> > will be no change. If correlated, there may or may not be change. It is
> > possible for the coefficients to stay the same while the tests become more
> > significant. It is appropriate, in the discussion of results, to comment on
> > whether there was much reason to expect confounding. Is either BMI or
> > WGain associated with age or SES? If they are not associated even a little
> > bit, then there is no confounding possible.
> > But it does not require a huge association for some confounding to
> > show its effect, if that is merely shifting a p-value from 0.051 to 0.049.
> >
> > --
> > Rich Ulrich
> >
> >> Date: Wed, 10 Apr 2013 01:06:21 -0700
> >> From: [hidden email]
> >> Subject: Re: Nominal Depndent with Ordinal Independent
> >> To: [hidden email]
> >>
> >> My full sample size is 308.
> >>
> >> I am still confused regarding the detection of multicollinearity among
> >> the variables of my data set. I could not find the right answer.
> >>
> >> Is there any method by which I can adjust for age or socioeconomic status
> >> (so that I can detect whether BMI or gestational weight gain can affect
> >> pregnancy outcome independent of age and socioeconomic status)?
> >>
> >> ...
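
The variance-inflation point in the quoted 10 April message can be illustrated with a minimal sketch on simulated data. Nothing here comes from the thread's data set: `A` and `B` are hypothetical near-duplicate predictors, the helper names are made up, and the formula VIF = 1 / (1 - r^2) is the standard textbook expression for the two-predictor case, swapped in here as an assumption rather than taken from the posts.

```python
import math
import random

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(x):
    """Rescale to mean 0 and sample SD 1, so A and B are 'scaled the same'."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / sd for a in x]

def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r * r)

# Simulated, hypothetical predictors that share most of their variance.
random.seed(1)
shared = [random.gauss(0, 1) for _ in range(200)]
A = standardize([s + 0.3 * random.gauss(0, 1) for s in shared])
B = standardize([s + 0.3 * random.gauss(0, 1) for s in shared])

r_ab = pearson_r(A, B)

# The suggested replacement pair: sum and difference of A and B.
total = [a + b for a, b in zip(A, B)]
diff = [a - b for a, b in zip(A, B)]
r_sum_diff = pearson_r(total, diff)

print("r(A, B)     =", round(r_ab, 3), " VIF =", round(vif_two_predictors(r_ab), 2))
print("r(A+B, A-B) =", round(r_sum_diff, 6))
```

Because A and B are standardized to the same variance, the covariance of (A+B) and (A-B) is var(A) - var(B) = 0, so the replacement pair is uncorrelated no matter how similar A and B are. If the two predictors are not on a common scale, that cancellation no longer holds, which is one reason for the "YMMV".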