SPSSX Discussion

Nominal Depndent with Ordinal Independent

dr_msantu

Re: Nominal Depndent with Ordinal Independent

After reviewing websites, Rich's and Bruce's responses, and regression textbooks, I have come to the conclusion that multicollinearity can be truly identified only in linear regression. For logistic regression it is not very feasible, and I think an attempt to detect multicollinearity in a logistic regression does not provide much more information than adjusting for confounding factors.

So I think I should concentrate on detecting whether age or socioeconomic status is correlated with BMI or weight gain. Is my sample size adequate for running a binary logistic analysis with the above independent variables?

Art Kendall

Re: Nominal Depndent with Ordinal Independent

In reply to this post by dr_msantu

Since you have birth weight as a continuous variable, would the logistic regression be a follow-up merely to see if the results stood up when you coarsen the dv?
Might it be informative to make the dv (birthweight - criterion for normal birthweight)?

Art Kendall
Social Research Consultants

On 4/11/2013 6:42 AM, dr_msantu [via SPSSX Discussion] wrote:

You have correctly identified my problem. I think I should not try to
find "multicollinearity" in logistic regression, rather I should
concentrate on "confounding". I am very much thankful to all of you
for making such thorough discussion regarding "multicollinearity",
which helped me a lot. I have two more questions:

1) Is my sample size (308) adequate for this kind of binary
logistic analysis?

2) I have run the binary logistic regression with BMI
category (ordinal), gestational weight gain category (ordinal),
age (continuous) and socioeconomic status (ordinal) as independent
variables for my dichotomous dependent variable (LBW vs. NBW).
I used SPSS 20.

I found that age was not significantly associated (p > 0.05) and the
change in coefficient is minimal (for underweight BMI it changed to
1.182 from 1.181).

But when I run it with socioeconomic status, the change in coefficient is
much larger and it is also statistically significant (p < 0.05).

From these, may I conclude that socioeconomic status is correlated with
BMI and Wgain? If a confounding factor is present, how could I make an
adjustment for this?

On 10/04/2013, Rich Ulrich <[hidden email]> wrote:

> I think you are concerned more with what is usually discussed as
> "confounding" rather than as "multicollinearity."
>
> Originally - and still, sometimes - multicollinearity refers to
> what you get, say, when you use 3 dummy variables to code
> three categories. Or you include a set of items along with their
> total score. That is, you have redundancy, and the computer
> algorithms will choke and fail when they try to divide by zero
> (or, allowing for round-off, too near to zero).
>
> The usual extended version of concern with multicollinearity arises
> when there is near-redundancy, and this results in "variance inflation"
> for the predictors. That is: if your best predictor is equal parts of A
> and B, scaled the same, then when A and B are too similar, you will
> find that (0.9*A+ 0.1*B) is practically the same as (0.1*A+ 0.9*B) ...
> Then, even if A and B separately have small CIs on their coefficients,
> the combined regression will show large CIs on whatever the coefficients
> come out as. That is the variance inflation. (One simple solution that
> sometimes works great is to replace predictors [A, B] with the orthogonal
> pair [(A+B), (A-B)]. YMMV.)
>
> If you want to know whether Age and SES change the partial-associations
> seen for BMI and WGain, the simple answer is the direct one: Do it.
> Run an analysis of outcome for BMI and WGain; run another analysis for
> BMI and WGain that controls for the demographics. Do the coefficients
> change?
>
> If the demographic variables are not correlated with BMI and WGain, there
> will be no change. If correlated, there may or may not be change. It is
> possible for the coefficients to stay the same while the tests become more
> significant. It is appropriate, in the discussion of results, to comment on
> whether there was much reason to expect confounding. Is either BMI or
> WGain associated with age or SES? If they are not associated even a little
> bit, then there is no confounding possible. But it does not require a huge
> association for some confounding to show its effect, if that is merely
> shifting a p-value from 0.051 to 0.049.
>
> --
> Rich Ulrich
>
>> Date: Wed, 10 Apr 2013 01:06:21 -0700
>> From: [hidden email]
>> Subject: Re: Nominal Depndent with Ordinal Independent
>> To: [hidden email]
>>
>> My full sample size is 308.
>>
>> I am still confused regarding the detection of multicollinearity among the
>> variables of my data set. I could not find the right answer.
>>
>> Is there any method by which I can adjust for age or socioeconomic status (so
>> that I can detect whether BMI or gestational weight gain can affect
>> pregnancy outcome independent of age and socioeconomic status)?
>>
>>
>> ...
>
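As a rough illustration of the variance-inflation point in the quoted message above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data are simulated (not the poster's data), the helper names are hypothetical, and the two-predictor shortcut VIF = 1 / (1 - r^2) is used in place of a full regression. It shows that two near-duplicate predictors A and B, scaled the same, have a large VIF, while the pair (A+B), (A-B) is uncorrelated.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def standardize(x):
    """Rescale to mean 0 and sample SD 1 ("scaled the same")."""
    n = len(x)
    mean_x = sum(x) / n
    sd = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    return [(a - mean_x) / sd for a in x]

def vif_two_predictors(x, y):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Simulated, hypothetical data: A and B share most of their variation.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(308)]
a = standardize([s + 0.2 * rng.gauss(0, 1) for s in shared])
b = standardize([s + 0.2 * rng.gauss(0, 1) for s in shared])

vif_ab = vif_two_predictors(a, b)

# Replace [A, B] with the pair [(A+B), (A-B)].
total = [u + v for u, v in zip(a, b)]
diff = [u - v for u, v in zip(a, b)]
vif_sum_diff = vif_two_predictors(total, diff)

print(f"VIF for [A, B]:       {vif_ab:.2f}")
print(f"VIF for [A+B, A-B]:   {vif_sum_diff:.2f}")
```

The sum and difference are uncorrelated here only because A and B were given equal variances first; with unequal scaling the pair is no longer orthogonal.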



Rich Ulrich

Re: Nominal Depndent with Ordinal Independent

In reply to this post by dr_msantu

I hope you find someone locally who you can discuss this with --
there is a lot to absorb and apply in what has already been posted.
Even if you study a couple of textbooks (Frank Harrell on logistic,
Jacob Cohen's book on regression), you will need to do several analyses
before you get a feel for it. Bouncing ideas and reactions off someone
who has performed and presented an analysis may not be *necessary*
but it surely should help.

Here are some particular comments and guidelines.

(1) On sample size. "This kind" of analysis is what kind? Using Bruce's
citation of 20 cases per d.f. of predictors in the smaller outcome group,
you have a maximum of 7 d.f. available for a robust procedure. You have
not yet mentioned the split for the criterion, but an equal split is 154/154,
allowing almost 8.

But you need to be clear about what "kind" this is: You have an outcome;
you have two primary predictors that you care about, with (I think) 5 or 6 d.f.
(treating categories separately); and two confounders with another 3 or 4
d.f. This adds to 8 or 10, so you are evidently in a shaky position if you want
to make sound statements about all the variates. However -- you don't care
so much about the confounders, so that simplifies the problem.
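
The d.f. arithmetic above can be written out as a small Python sketch. The 20-cases-per-d.f. figure and the d.f. counts come from this message; the equal 154/154 split is only the best case mentioned above, since the actual split has not been reported, and the function name is hypothetical.

```python
def max_predictor_df(smaller_group_n, cases_per_df=20):
    """Predictor d.f. allowed by the cases-per-d.f. rule of thumb."""
    return smaller_group_n / cases_per_df

total_n = 308
smaller_group_n = total_n // 2  # ASSUMPTION: equal 154/154 split (best case)
budget = max_predictor_df(smaller_group_n)

# Primary predictors alone, then primary predictors plus confounders.
scenarios = {
    "primary only (5 d.f.)": 5,
    "primary only (6 d.f.)": 6,
    "primary + confounders (5 + 3 d.f.)": 5 + 3,
    "primary + confounders (6 + 4 d.f.)": 6 + 4,
}
for label, needed in scenarios.items():
    status = "within" if needed <= budget else "over"
    print(f"{label}: need {needed}, budget {budget:.1f} -> {status} budget")
```

A more unequal outcome split shrinks the smaller group and therefore the budget, so the real figure may be lower than this best case.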

You can look at the two primary predictors and make a good statement.
You can add the covariates, the potential confounders, and make a further
statement with a little less confidence. If the confounders don't make any
difference, then you are in good shape because, despite the possibility of
over-fitting, "They didn't make any difference."

(2) Is SES correlated with BMI and weight gain?

Here are two things everyone should do with every multivariate analysis, and
then provide these details with the write-up. (Even if an editor doesn't want
them included except as a summary statement, the reviewers should see them.)
a) Look at the univariate relations of predictors with outcome.
b) Look at the univariate relations among the predictors, to become aware of
possible confounding.
- Oh, and those crosstabs and correlations also should look reasonable. Consider
this as one necessary part of data validation. (A friend was involved with
analyzing auto repair data for Consumers' Report, back in the computer-card
days. When the first glimpse of comparative data showed that Corvettes did
*not* have a high repair cost, they knew they had a data problem. [Turned out,
there was no ID that connected card 1 with card 2. Had to be re-punched.] )

Among correlations: Something with p < 0.05 is interesting and potentially
"hazardous" if it is a confounder. Something with p < 0.20 but not < 0.05 is
potentially interesting, and it won't be very surprising to see it modify
something or be modified.
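
One way to screen a pair of categorical predictors along these lines is a crosstab with a chi-square test of independence. The Python sketch below is only an illustration: the counts are made up (they merely sum to 308), the function names are hypothetical, and the p-value uses a closed form that holds only for even degrees of freedom, which is enough for a 3 x 3 table (4 d.f.). The 0.05 and 0.20 cut-offs are the ones suggested above.

```python
from math import exp, factorial

def chi_square_independence(table):
    """Pearson chi-square statistic and d.f. for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_value(stat, df):
    """Upper-tail p-value; this closed form is valid for even d.f. only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles even, positive d.f. only")
    half = stat / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

def screening_label(p):
    """Apply the informal 0.05 / 0.20 screening cut-offs."""
    if p < 0.05:
        return "interesting, potentially hazardous as a confounder"
    if p < 0.20:
        return "potentially interesting"
    return "little sign of association"

# HYPOTHETICAL counts: rows = SES (low, middle, high),
# columns = BMI category (underweight, normal, overweight).
ses_by_bmi = [
    [40, 50, 12],
    [30, 60, 14],
    [18, 62, 22],
]
stat, df = chi_square_independence(ses_by_bmi)
p = chi_square_p_value(stat, df)
print(f"chi-square = {stat:.2f}, d.f. = {df}, p = {p:.4f}: {screening_label(p)}")
```

The same table is worth reading by eye as well, since implausible cell counts are a data-validation warning regardless of the p-value.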

For your data, it is appropriate to pay special attention to the relations
of confounders with the primary predictors. You have some predictors you can
look at in a couple of ways, either as categories or as continuous. Here is a
rare instance where p-values can actually be useful in the middle of an
analysis, in comparing the results for the same variable treated as either
categories or as continuous. If the continuous variable has all the more
interesting p values, then you have pretty good justification for using the
predictor as continuous instead of its categories. This saves degrees of
freedom, which is a concern with the N being what it is.

(2b) I think you report that including Age resulted in the p-value for one
coefficient crossing from "NS" to p < 0.05. (That was not very clear.)
Did it change from 0.06 to 0.049, or what? Some changes are less notable
than others. Yes, it usually means some confounding, but you look at the
direct univariate tests when you want to answer that question. I am less
sure about what happens in logistic regression, but in Ordinary Least
Squares, you *can* add a new variable that is 100% independent of
a predictor and still see the p-value change (though not the beta), because
the error term has been reduced by the new, *strong* predictor.
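
The OLS claim can be checked with a small simulation. This Python sketch is an illustration under stated assumptions: the data are simulated, the predictors x1 and x2 are built to be exactly uncorrelated, the helper name is hypothetical, and t statistics stand in for p-values (a larger |t| corresponds to a smaller p). It covers ordinary least squares only and says nothing about logistic regression.

```python
import random
from math import sqrt

def solve_ols(columns, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(y), len(columns)
    xtx = [[sum(ci[r] * cj[r] for r in range(n)) for cj in columns]
           for ci in columns]
    xty = [sum(c[r] * y[r] for r in range(n)) for c in columns]
    # Gauss-Jordan elimination on [X'X | I] to obtain the inverse of X'X.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(xtx)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(beta[j] * columns[j][r] for j in range(k))
             for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se

# Simulated data: x1 and x2 form a balanced design, so they are uncorrelated.
rng = random.Random(1)
x1 = [-1.0, -1.0, 1.0, 1.0] * 10
x2 = [-1.0, 1.0, -1.0, 1.0] * 10
ones = [1.0] * 40
y = [0.5 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # x2 is strong

beta_alone, se_alone = solve_ols([ones, x1], y)
beta_adj, se_adj = solve_ols([ones, x1, x2], y)
t_alone = beta_alone[1] / se_alone[1]
t_adj = beta_adj[1] / se_adj[1]

print(f"x1 alone:    b = {beta_alone[1]:.4f}, t = {t_alone:.2f}")
print(f"x1 with x2:  b = {beta_adj[1]:.4f}, t = {t_adj:.2f}")
```

The coefficient for x1 is the same in both fits, while its standard error shrinks once x2 absorbs part of the residual variance.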

Are you considering "experiment-wise" control for multiple tests performed?
If you are looking at the p-values separately for every predictor, then you
probably should consider these results as exploratory.

--
Rich Ulrich

