Hello,
I am doing a simple binary logistic regression with the following structure:

logit(Y) = a + bX

X is a categorical variable with 5 categories, which is entered into the model as indicator variables with the last category as reference. To test for linearity, I performed the Box-Tidwell transformation on X (= X*ln(X)) and added this variable as a covariate (I actually use the multinomial logistic regression procedure of SPSS as a binary logistic regression to get more information).

First question: Is it OK to enter this new variable as a covariate (= non-categorical)? Or should it be entered as a factor (= categorical)?

However, the results are strange. Looking at the Likelihood Ratio Test, SPSS tells us that removing X (or X*ln(X)) does not increase the degrees of freedom, hence no significance is calculated and I cannot check for linearity.

Second question: How come removing this parameter from the model does not increase the degrees of freedom?

Third question: How can I test for linearity?

I'm really puzzled here. Also, looking at the parameter estimates, it seems that no parameters are calculated for two of the five categories of X (they are set to 0). I understand that the parameter for the fifth category of X is redundant because this is the reference category. But somehow the fourth category also becomes redundant?

When I enter X as a covariate (= non-categorical variable), none of these problems occur.

Thanks in advance
Hi Benoît,
I might be wrong, but the assumption of linearity of the logit is important for quantitative/ordinal variables, not for categorical ones...

HTH
Marta
In reply to this post by Benoît Depaire
If X is an ordered, numeric categorical variable, then it might make sense to test for deviations from linearity. As in OLS regression, this means comparing the model where X is categorical with one where X is assumed to be linearly related.

Paul

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Director of Research, Center for Improving the Readiness of Children for Learning and Education (C.I.R.C.L.E.)
Medical School, UT Health Science Center at Houston
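The comparison Paul describes can be sketched outside SPSS. Below is a minimal Python/NumPy illustration (not part of the thread; the data are simulated, and `fit_logit` is a small helper written just for this sketch): fit the outcome once with X as four indicator variables and once with X as a single linear score, then compare the log-likelihoods on (5-1) - 1 = 3 degrees of freedom.

```python
# Illustrative only: the thread uses SPSS, but the likelihood-ratio
# comparison of "X categorical" vs "X linear" can be sketched in NumPy.
# The data below are simulated; fit_logit is a minimal Newton-Raphson
# logistic regression written for this example.
import numpy as np

rng = np.random.default_rng(42)

def fit_logit(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                    # IRLS weights
        H = X.T @ (X * W[:, None])           # observed information matrix
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return beta, np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Simulated ordered 5-category predictor; the true logit is linear in x
x = rng.integers(1, 6, size=1000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 0.5 * x))))

ones = np.ones(len(x))

# Model A: X categorical -- intercept + 4 indicators (category 5 = reference)
Xa = np.column_stack([ones] + [(x == k).astype(float) for k in range(1, 5)])
_, ll_cat = fit_logit(Xa, y)

# Model B: X linear -- intercept + numeric score
Xb = np.column_stack([ones, x.astype(float)])
_, ll_lin = fit_logit(Xb, y)

# Deviation from linearity: chi-square statistic on (5-1) - 1 = 3 df
lr_stat = 2.0 * (ll_cat - ll_lin)
df = Xa.shape[1] - Xb.shape[1]
print(df, lr_stat >= 0)
```

Because the linear model is nested in the categorical one, the statistic is non-negative; a small value (relative to chi-square on 3 df) means the categorical coding buys nothing over the linear score.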
In reply to this post by Benoît Depaire
At 02:52 AM 6/23/2006, Benoît Depaire wrote:
>I am doing a simple binary logistic regression with the following structure:
>
>logit(Y) = a + bX
>
>X is a categorical variable with 5 categories, which is entered into the
>model as indicator [variables] with the last category as reference.
>
>To test for linearity, I performed the box-tidwell transformation on X
>(= X*ln(X)) and added this variable as a covariate.

Curious: what does the transformation mean, if X is a categorical variable? If it's truly categorical, it would be the same variable after any RECODE, like

RECODE X (1=3) (2=1) (3=5) (5=4) (4=2).

but that would raise Cain with the transformation, wouldn't it? Or, what if the categories were A, B, C, D and E?

>The results are strange. When taking a look at the Likelihood Ratio Test,
>SPSS tells us that removing X (or X*ln(X)) does not increase the degrees
>of freedom, hence no significance is calculated and I cannot check for
>linearity.

Exactly. The transform is totally confounded with the four categorical indicators - they're collinear. (Algebra below.) If this were a multiple regression, it would fail for multicollinearity of the variables.

Collinearity: the variables are multicollinear if any one is a linear combination of the others. Suppose your indicator variables for the categories are X1, X2, X3, X4 and X5; the numeric values you associate with them are x1, x2, x3, x4 and x5; and BT is the box-tidwell transformed variable. Let bt1 = the box-tidwell value for x1 = x1*ln(x1), etc. Then you have

BT = bt1*X1 + bt2*X2 + bt3*X3 + bt4*X4 + bt5*X5.

Or, with a constant in the model and 5 as the reference category,

BT = bt5 + (bt1-bt5)*X1 + (bt2-bt5)*X2 + (bt3-bt5)*X3 + (bt4-bt5)*X4.

Collinearity.
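The algebra above is easy to verify numerically. A short NumPy sketch (illustrative category codes only, not data from the thread) shows that appending the BT column to an intercept plus the four indicators does not raise the rank of the design matrix:

```python
# Numeric check of the collinearity argument: the Box-Tidwell column BT
# is an exact linear combination of the intercept and the four indicator
# columns, so adding it cannot increase the rank of the design matrix.
# Category codes and repetitions below are made up for illustration.
import numpy as np

x_vals = np.array([1, 2, 3, 4, 5], dtype=float)   # numeric codes x1..x5
bt_vals = x_vals * np.log(x_vals)                 # bt_k = x_k * ln(x_k)

# One row per observation; every category appears at least once
cats = np.array([1, 2, 3, 4, 5, 3, 1, 4, 2, 5])
ones = np.ones(len(cats))
dummies = np.column_stack([(cats == k).astype(float) for k in range(1, 5)])
bt = bt_vals[cats - 1]                            # BT value per observation

X_without = np.column_stack([ones, dummies])      # intercept + X1..X4
X_with = np.column_stack([ones, dummies, bt])     # ... plus BT

print(np.linalg.matrix_rank(X_without))           # 5
print(np.linalg.matrix_rank(X_with))              # still 5: BT adds nothing
```

Because BT adds no new column space, dropping it cannot change the likelihood, which is exactly why SPSS reports zero extra degrees of freedom for its removal.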
In reply to this post by Swank, Paul R
Hi:
Since Richard Ristow (brilliantly, as always) has already explained the collinearity problem (I had left that on my list of "TO DOs" for Sunday), I'm going to focus on how to really test the linearity assumption in logistic regression for an ORDINAL (categorical but ORDERED) variable. I'm going to use Shapiro's dataset:

* Data from Shapiro S et al "Oral contraceptive use in relation to
  myocardial infarction" Lancet 1979; 1: 743-7.
DATA LIST FREE /agegroup(F8.0) tobacco(F8.0) ocu(F8.0) mi(F8.0) n(F8.0).
BEGIN DATA
1 1 0 0 106   1 1 0 1 1    1 1 1 0 25
1 2 0 0 79    1 2 1 0 25   1 2 1 1 1
1 3 0 0 39    1 3 0 1 1    1 3 1 0 12   1 3 1 1 3
2 1 0 0 175   2 1 1 0 13
2 2 0 0 142   2 2 0 1 5    2 2 1 0 10   2 2 1 1 1
2 3 0 0 73    2 3 0 1 7    2 3 1 0 10   2 3 1 1 8
3 1 0 0 153   3 1 0 1 3    3 1 1 0 8
3 2 0 0 119   3 2 0 1 11   3 2 1 0 11   3 2 1 1 1
3 3 0 0 58    3 3 0 1 19   3 3 1 0 7    3 3 1 1 3
4 1 0 0 165   4 1 0 1 10   4 1 1 0 4    4 1 1 1 1
4 2 0 0 130   4 2 0 1 21   4 2 1 0 4
4 3 0 0 67    4 3 0 1 34   4 3 1 0 1    4 3 1 1 5
5 1 0 0 155   5 1 0 1 20   5 1 1 0 2    5 1 1 1 3
5 2 0 0 96    5 2 0 1 42   5 2 1 0 1
5 3 0 0 50    5 3 0 1 31   5 3 1 0 2    5 3 1 1 3
END DATA.
WEIGHT BY n.
VAR LABEL ocu 'Oral contraceptive use' /mi 'Myocardial infarction'.
VALUE LABEL agegroup 1 '25-29 years' 2 '30-34 years' 3 '35-39 years'
 4 '40-44 years' 5 '45-49 years'.
VALUE LABEL tobacco 1 'Non smoker' 2 '1-24 cig/day' 3 ' >=25 cig/day'.
VALUE LABEL ocu 0 'No' 1 'Yes'.
VALUE LABEL mi 0 'Control' 1 'Case'.

LOGISTIC REGRESSION VAR=mi
 /METHOD=BSTEP(LR) tobacco ocu agegroup
 /CONTRAST (ocu)=Indicator(1)
 /CONTRAST (tobacco)=Indicator(1)
 /CONTRAST (agegroup)=Indicator(1)
 /PRINT=CI(95)
 /CRITERIA PIN(1) POUT(1).

Agegroup is an ordered categorical variable.
If we are interested in testing the linearity of the logits (taking into account that the difference between the logit and the log(OR), b, is only a constant), then we can do the following. We take a look at the results for the variable Agegroup in the LOGISTIC model we ran above:

               b   SE(b)     Wald  DF   Sig      OR  Low CL  Upp CL
Agegroup                  111.413   4  .000
           1.027    .482    4.535   1  .033   2.793   1.085   7.187
           1.869    .464   16.195   1  .000   6.483   2.609  16.113
           2.588    .456   32.184   1  .000  13.297   5.439  32.510
           3.273    .455   51.773   1  .000  26.380  10.817  64.330

We could use OMS to extract the bi for the levels of Agegroup but, to keep everything simple, I'll extract them manually:

DATA LIST LIST /agemidp(F8) b(F8.3).
BEGIN DATA
27 0
32 1.027
37 1.869
42 2.588
47 3.273
END DATA.

Now we plot the bi against the Agegroup mid-points:

GRAPH /SCATTERPLOT(BIVAR)=agemidp WITH b.

The relationship is clearly linear, showing that Agegroup fulfills the linearity assumption.

HTH
Marta
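As a numeric companion to the scatterplot (a Python sketch, not part of Marta's SPSS session), one can fit a least-squares line to the same five (mid-point, b) pairs and look at the correlation:

```python
# Fit a straight line to the coefficient estimates from the LOGISTIC
# output above, taken against the age-group mid-points, and report the
# slope and the Pearson correlation.
import numpy as np

agemidp = np.array([27.0, 32.0, 37.0, 42.0, 47.0])
b = np.array([0.0, 1.027, 1.869, 2.588, 3.273])

slope, intercept = np.polyfit(agemidp, b, 1)   # least-squares line
r = np.corrcoef(agemidp, b)[0, 1]              # Pearson correlation

print(round(slope, 3), round(r, 3))            # prints: 0.162 0.996
```

The correlation of about 0.996 supports the visual impression of linearity: the logit rises by roughly 0.16 per year of age across the groups.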