unable to test linearity assumption of logistic regression

unable to test linearity assumption of logistic regression

Benoît Depaire
Hello,

I am doing a simple binary logistic regression with the following structure:

logit(Y) = a + bX

X is a categorical variable with 5 categories, which is entered into the
model as indicator variables with the last category as the reference.

To test for linearity, I performed the Box-Tidwell transformation on X (=
X*ln(X) ) and added this variable as a covariate (I actually use the
multinomial logistic regression procedure of SPSS as a binary logistic
regression to obtain more information).

First question: Is it OK to enter this new variable as a covariate (i.e.
non-categorical)? Or should it be entered as a factor (i.e. categorical)?

However, the results are strange. Looking at the likelihood ratio
test, SPSS reports that removing X (or X*ln(X)) does not increase the
degrees of freedom, so no significance is calculated and I cannot check
for linearity.

Second question: How come removing this parameter from the model does
not increase the degrees of freedom?

Third question: How can I test for linearity?

I'm really puzzled here. Also, looking at the parameter estimates, it seems
that no parameters are calculated for two of the five categories of X (they
are set to 0). I understand that the parameter for the fifth category of X
is redundant because it is the reference category. But somehow, the
fourth category becomes redundant as well?

When I enter X as a covariate (= non-categorical variable), none of these
problems occur.

Thanks in advance
Re: unable to test linearity assumption of logistic regression

Marta García-Granero
Hi Benoît,

I might be wrong, but the assumption of linearity of the logit is
important for quantitative/ordinal variables, not for categorical
ones...

HTH

Marta

Re: unable to test linearity assumption of logistic regression

Swank, Paul R
In reply to this post by Benoît Depaire
If X is an ordered, numeric categorical variable, then it might make sense to test for deviations from linearity. In OLS regression, this means comparing the model where X is categorical with one where X is assumed to be linearly related.
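Paul's OLS comparison can be sketched numerically. The following snippet is not part of the original thread: it uses made-up data and plain Python (rather than SPSS) to fit both a straight-line model and a fully categorical model (group means) to an ordered 5-level predictor, then F-tests the extra sum of squares, i.e. the deviation from linearity:

```python
# Sketch with made-up data: compare linear vs. categorical coding of X in OLS.
x = [1]*4 + [2]*4 + [3]*4 + [4]*4 + [5]*4          # ordered category scores
y = [1.0, 1.2, 0.9, 1.1,  2.1, 1.9, 2.0, 2.2,      # roughly linear response
     2.9, 3.1, 3.0, 2.8,  4.1, 3.9, 4.0, 4.2,
     5.0, 5.1, 4.9, 5.2]

n, k = len(x), len(set(x))

# (a) Simple linear regression: y ~ a + b*x.
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
sse_lin = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# (b) Categorical (saturated) model: fitted value is the group mean.
means = {c: sum(yi for xi, yi in zip(x, y) if xi == c) / x.count(c)
         for c in set(x)}
sse_cat = sum((yi - means[xi]) ** 2 for xi, yi in zip(x, y))

# F test for deviation from linearity: k-2 extra parameters over the line.
f = ((sse_lin - sse_cat) / (k - 2)) / (sse_cat / (n - k))
print(f"F({k-2}, {n-k}) = {f:.3f}")  # small F -> no evidence of nonlinearity
```

A small F relative to the F(k-2, n-k) reference distribution means the categorical model explains essentially no more than the straight line.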

Paul

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Director of Research, Center for Improving the Readiness of Children for Learning and Education (C.I.R.C.L.E.)
Medical School
UT Health Science Center at Houston

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marta García-Granero
Sent: Friday, June 23, 2006 11:36 AM
To: [hidden email]
Subject: Re: unable to test linearity assumption of logistic regression

Re: unable to test linearity assumption of logistic regression

Richard Ristow
In reply to this post by Benoît Depaire
At 02:52 AM 6/23/2006, Benoît Depaire wrote:

>I am doing a simple binary logistic regression
>with the following structure:
>
>logit(Y) = a + bX
>
>X is a categorical variable with 5 categories,
>which is entered into the model as indicator
>[variables] with the last category as reference.
>
>To test for linearity, I performed the
>box-tidwell transformation on X (=X*ln(x) ) and
>added this variable as a covariate.

Curious: what does the transformation mean, if X
is a categorical variable? If it's truly
categorical, it would be the same variable after any RECODE, like

RECODE X (1=3) (2=1) (3=5) (5=4) (4=2).

but that would raise Cain with the
transformation, wouldn't it? Or, what if the
categories were A, B, C, D and E?

>The results are strange. When taking a look at
>the Likelihood Ratio Test, SPSS tells us that
>removing X ( or X*ln(X) ) does not increase the
>degrees of freedom, hence no significance is
>calculated and I cannot check for linearity.

Exactly. The transform is totally confounded with
the four categorical indicators -  they're
collinear. (Algebra below.) If this were a
multiple regression, it would fail for
multi-collinearity of the variables.

Collinearity:

The variables are multi-collinear if any one is a
linear combination of the others.

Say your indicator variables for the categories
are X1, X2, X3, X4 and X5; the numeric values you
associate with them are x1, x2, x3, x4, and x5;
and BT is the Box-Tidwell-transformed variable.

Let bt1 = box-tidwell value for x1 = x1*ln(x1), etc.

Then, you have

BT = bt1*X1 + bt2*X2 + bt3*X3 + bt4*X4 + bt5*X5.

Or, with a constant in the model and 5 as the reference category,

BT = bt5 + (bt1-bt5)*X1+(bt2-bt5)*X2+(bt3-bt5)*X3+(bt4-bt5)*X4

Collinearity.
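Richard's algebra can be verified numerically. The following check is an illustration in Python rather than SPSS, assuming category codes 1-5 as in the original post: it builds the design-matrix rows and confirms that BT is exactly the stated linear combination of the constant and the four indicators.

```python
# Numeric check of the collinearity identity:
# BT = bt5 + (bt1-bt5)*X1 + (bt2-bt5)*X2 + (bt3-bt5)*X3 + (bt4-bt5)*X4
import math

scores = [1, 2, 3, 4, 5]                      # assumed numeric category codes
bt = [x * math.log(x) for x in scores]        # Box-Tidwell values bt1..bt5

# One row per category: [constant, X1..X4 indicators (5 = reference), BT].
rows = []
for k in range(5):
    indicators = [1 if k == j else 0 for j in range(4)]   # X1..X4
    rows.append([1] + indicators + [bt[k]])

# Reconstruct BT from the constant and the indicators; it matches exactly,
# so the model matrix is rank-deficient and the LR test gains no df.
for row in rows:
    const, x1, x2, x3, x4, bt_val = row
    predicted = bt[4] * const + sum(
        (bt[j] - bt[4]) * ind for j, ind in enumerate([x1, x2, x3, x4]))
    assert abs(predicted - bt_val) < 1e-12

print("BT is an exact linear combination of the constant and the indicators.")
```

This is why SPSS reports no increase in degrees of freedom when the transformed term is dropped: it carries no information beyond the indicators themselves.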
Re: unable to test linearity assumption of logistic regression

Marta García-Granero
In reply to this post by Swank, Paul R
Hi:

Since Richard Ristow (brilliantly, as always) has already explained the
collinearity problem (I had left that on my list of "TO DOs" for
Sunday), I'm going to focus on how to really test the linearity assumption
in logistic regression for an ORDINAL (categorical but ORDERED)
variable. I'm going to use Shapiro's dataset:

* Data from Shapiro S et al
  "Oral contraceptive use in relation to myocardial infarction"
  Lancet 1979; 1: 743-7.

DATA LIST FREE /agegroup(f8.0) tobacco(F8.0) ocu(F8.0) mi(F8.0) n(F8.0).
BEGIN DATA
1 1 0 0 106 1 1 0 1 1 1 1 1 0 25 1 2 0 0 79 1 2 1 0 25
1 2 1 1 1 1 3 0 0 39 1 3 0 1 1 1 3 1 0 12 1 3 1 1 3
2 1 0 0 175 2 1 1 0 13 2 2 0 0 142 2 2 0 1 5 2 2 1 0 10
2 2 1 1 1 2 3 0 0 73 2 3 0 1 7 2 3 1 0 10 2 3 1 1 8
3 1 0 0 153 3 1 0 1 3 3 1 1 0 8 3 2 0 0 119 3 2 0 1 11
3 2 1 0 11 3 2 1 1 1 3 3 0 0 58 3 3 0 1 19 3 3 1 0 7
3 3 1 1 3 4 1 0 0 165 4 1 0 1 10 4 1 1 0 4 4 1 1 1 1
4 2 0 0 130 4 2 0 1 21 4 2 1 0 4 4 3 0 0 67 4 3 0 1 34
4 3 1 0 1 4 3 1 1 5 5 1 0 0 155 5 1 0 1 20 5 1 1 0 2
5 1 1 1 3 5 2 0 0 96 5 2 0 1 42 5 2 1 0 1 5 3 0 0 50
5 3 0 1 31 5 3 1 0 2 5 3 1 1 3
END DATA.
WEIGHT BY n .
VAR LABEL ocu 'Oral contraceptive use' /mi 'Myocardial infarction'.
VALUE LABEL  agegroup
 1 '25-29 years'
 2 '30-34 years'
 3 '35-39 years'
 4 '40-44 years'
 5 '45-49 years'.
VALUE LABEL tobacco
 1 'Non smoker'
 2  '1-24 cig/day'
 3  ' >=25 cig/day'.
VALUE LABEL ocu 0 'No' 1 'Yes'.
VALUE LABEL mi 0 'Control' 1 'Case'.

LOGISTIC REGRESSION VAR=mi
  /METHOD=BSTEP(LR) tobacco ocu agegroup
  /CONTRAST (ocu)=Indicator(1)
  /CONTRAST (tobacco)=Indicator(1)
  /CONTRAST (agegroup)=Indicator(1)
  /PRINT=CI(95)
  /CRITERIA PIN(1) POUT(1).

Agegroup is an ordered categorical variable. If we are interested in
testing the linearity of the logits (and taking into account that the
difference between the logit and the log(OR), b, is only a constant),
then we can do the following:

We take a look at the results for the variable Agegroup in the
LOGISTIC model we run above:

   b    SE(b)    Wald   DF   Sig    OR    Low CL  Upp CL
              111.413   4   .000
1.027  .482     4.535   1   .033   2.793   1.085   7.187
1.869  .464    16.195   1   .000   6.483   2.609  16.113
2.588  .456    32.184   1   .000  13.297   5.439  32.510
3.273  .455    51.773   1   .000  26.380  10.817  64.330

We could use OMS to extract the bi for the levels of agegroup, but,
to keep everything simple, I'll extract them manually.

DATA LIST LIST/agemidp(F8) b(F8.3).
BEGIN DATA
27 0
32 1.027
37 1.869
42 2.588
47 3.273
END DATA.

Now we plot the bi against Agegroup mid-points:

GRAPH /SCATTERPLOT(BIVAR)=agemidp WITH b.

The relationship is clearly linear, showing that Agegroup fulfills the
linearity assumption.
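The scatterplot check can also be quantified. As an illustration (Python rather than SPSS; the b values are copied from the coefficient table above), fitting a least-squares line to the coefficients against the age-group midpoints gives an R-squared very close to 1:

```python
# Quantifying the linearity check: regress the logistic coefficients b
# on the age-group midpoints (values taken from the table above).
agemidp = [27, 32, 37, 42, 47]
b = [0.0, 1.027, 1.869, 2.588, 3.273]   # reference category has b = 0

n = len(agemidp)
mx = sum(agemidp) / n
my = sum(b) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(agemidp, b))
         / sum((x - mx) ** 2 for x in agemidp))
intercept = my - slope * mx

ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(agemidp, b))
ss_tot = sum((y - my) ** 2 for y in b)
r2 = 1 - ss_res / ss_tot   # close to 1 -> the logits increase linearly

print(f"b ~ {intercept:.3f} + {slope:.3f}*agemidp,  R^2 = {r2:.3f}")
```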

HTH

Marta