The effect of an interaction in logistic regression


The effect of an interaction in logistic regression

Charlotte-9
Dear all,

I am trying to fit a logistic regression model to some data concerning a
screening test.  As a start, I have generated models with just two
independent variables: age and ethnicity.  The outcome variable
is ‘screened’ or ‘not screened’. The ethnicity variable has five
categories.

After fitting the model AGE + ETHNICITY + AGE*ETHNICITY, it seems clear
that there is an interaction between age and ethnicity.  However, my
question concerns a difference in the models when age is used in its
original continuous form and when it is used as a categorical variable
with 4 categories.

If I generate the above model with age as a categorical variable, the
estimated coefficients relating to the ethnicity main effect are all
highly significant (p < 0.0001).  However, if I fit the same model but
with age in its original continuous form, the estimated coefficients
relating to ethnicity suddenly become non-significant (p > 0.4 in each
case) and the Wald statistics shrink dramatically.  This basically happens
when I include the interaction term in the model.  Can anyone help me to
explain this please?  I’m now not sure whether to use age in its original
form or in the categorised form but it seems that there is definitely some
effect to take note of here.

Many thanks,

Charlotte
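
For reference, the model with continuous age described in this post can be written out explicitly. This is only a sketch under assumed notation: the symbols $\beta$, $\gamma_k$, $\delta_k$, the 0/1 indicators $E_k$, and the choice of group 1 as the reference category are illustrative and do not come from the post (the SPSS default reference category may differ).

```latex
\operatorname{logit} P(\text{screened})
  = \beta_0 + \beta_1\,\text{age}
  + \sum_{k=2}^{5} \gamma_k E_k
  + \sum_{k=2}^{5} \delta_k \,(E_k \times \text{age})
```

Under this indicator coding, $\gamma_k$ is the log-odds difference between group $k$ and the reference group at age = 0, and the difference at any other age $a$ is $\gamma_k + \delta_k a$. When age is entered as a categorical variable, the ethnicity coefficients instead compare groups within the reference age band, so the two sets of "main effect" tests are not testing the same quantity.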

Re: The effect of an interaction in logistic regression

Muir Houston
I have had something similar in other data relating to age and gender interactions. Have you tried a quadratic term in the form of age squared, along with an interaction of age squared and ethnicity?
It may also be worth calculating Mallows' Cp for the best subsets of all your interaction terms.
Are you entering ethnicity as a categorical variable or as a series of dummy variables?
If you are entering it as a categorical variable, the result may have to do with the reference category, so it may be worth constructing the dummies yourself.

Muir Houston
BA (Hons.), MPhil, PhD, FHEA
Research Fellow
Institute of Education & CRLL
University of Stirling
FK9 4LA
Tel: 01786-46-7615
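
The suggestion above about building the dummies by hand can be sketched as follows. This is a minimal illustration in Python rather than SPSS syntax; the function name `dummy_code` and the ethnicity labels A to E are hypothetical, since the real category names are not given in the thread.

```python
def dummy_code(values, categories, reference):
    """Return one 0/1 indicator column per non-reference category.

    `categories` fixes the column order; `reference` is the category
    represented by all indicators being zero.
    """
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    kept = [c for c in categories if c != reference]
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        rows.append([1 if v == c else 0 for c in kept])
    return kept, rows


# Hypothetical ethnicity labels; the thread does not name the five groups.
categories = ["A", "B", "C", "D", "E"]
sample = ["A", "C", "E", "B"]

# Same data, two different reference categories.
cols_ref_a, rows_ref_a = dummy_code(sample, categories, reference="A")
cols_ref_e, rows_ref_e = dummy_code(sample, categories, reference="E")

print(cols_ref_a, rows_ref_a)
print(cols_ref_e, rows_ref_e)
```

Changing the reference category changes which comparison each coefficient represents, but not the fitted probabilities, so it is worth knowing which category the software has chosen before reading the individual p-values.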


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Charlotte
Sent: 29 May 2007 14:25
To: [hidden email]
Subject: The effect of an interaction in logistic regression

--
The University of Stirling is a university established in Scotland by
charter at Stirling, FK9 4LA.  Privileged/Confidential Information may
be contained in this message.  If you are not the addressee indicated
in this message (or responsible for delivery of the message to such
person), you may not disclose, copy or deliver this message to anyone
and any action taken or omitted to be taken in reliance on it, is
prohibited and may be unlawful.  In such case, you should destroy this
message and kindly notify the sender by reply email.  Please advise
immediately if you or your employer do not consent to Internet email
for messages of this kind.

Re: The effect of an interaction in logistic regression

Charlotte-9
In reply to this post by Charlotte-9
Hi Muir,

I am just about to try a quadratic term as that's the only other thing I
could think to look at for the moment.  As for ethnicity, I am entering it as
a series of dummy variables - well, this seems to be how SPSS does it
anyway (?).  I guess as well as trying to explain what's going on with
this model, I'm also baffled as to why this is not seen when I use age in
its grouped form.  Does this imply that the grouping is somehow distorting
an important effect?  Overall, I would expect ethnicity to have a big
effect on the outcome, so the more reasonable model 'seems' to be the one
where age is grouped!

Thanks for your help,
Charlotte
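
One possible reading of the puzzle, offered as an assumption to check rather than something established in this thread: in a model with an age-by-ethnicity interaction and indicator-coded ethnicity, each ethnicity coefficient is the group difference at age = 0. If the observed ages lie far from zero, that is an extrapolation and would typically carry a large standard error. The sketch below uses made-up coefficients (`gamma_raw`, `delta`) and a hypothetical centring age of 60 to show how the same fitted model yields a different "main effect" once age is centred.

```python
def group_difference(gamma, delta, age):
    """Log-odds difference between one ethnic group and the reference group
    at a given age, for logit = b0 + b1*age + gamma*E + delta*E*age."""
    return gamma + delta * age


def recentre_main_effect(gamma, delta, centre):
    """Ethnicity coefficient after replacing age with (age - centre).

    The fitted probabilities are unchanged; only the age at which the
    'main effect' is evaluated moves from 0 to `centre`.
    """
    return gamma + delta * centre


# Hypothetical coefficients on the raw age scale (not estimates from the thread).
gamma_raw = 0.30   # group difference at age 0
delta = -0.02      # change in that difference per year of age

gamma_centred = recentre_main_effect(gamma_raw, delta, centre=60)

for age in (0, 50, 60, 70):
    print(age, round(group_difference(gamma_raw, delta, age), 2))
print("main effect with age centred at 60:", round(gamma_centred, 2))
```

If this is what is happening, refitting with age centred near the middle of the observed range should leave the interaction terms and the overall fit unchanged while making the ethnicity coefficients describe a difference at a realistic age.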

On Tue, 29 May 2007 14:41:31 +0100, Muir Houston <[hidden email]>
wrote:


Re: The effect of an interaction in logistic regression

Hector Maletta
In reply to this post by Charlotte-9
Charlotte,
according to your previous messages about your study, your subjects are all in the 50-70 age bracket, and I imagine your 4 age groups might be 5-year intervals like 50-54, 55-59 and so on.
When you use individual years, you are measuring the effect of adding one individual year of age, which is probably a smaller effect than jumping from one 5-year age group to the next. That is possibly the source of the lower significance of your result when age is measured as a continuous variable (or more precisely, as a discrete variable measured in years) as compared with the case in which it is measured as a discrete variable in 5-year intervals.

However, this is only a guess, and not a very likely one in your case, because your dataset is large enough to give significant results for small effects. There might possibly be other problems involved. One of them is non-monotonicity in the increase of the outcome with age: it may be that screening increases from one age group to the next but does not necessarily increase across every one-year interval, making for a less clear relationship between the outcome and age when the latter is measured in individual years (if you measured age in months or days or minutes it would be less significant still).

That is usually the reason why ages are grouped in intervals, even at the risk of losing information and introducing arbitrary cutoff points. To check for these risks, see whether your model works with other age groupings (e.g. in groups of 4 years instead of 5).

Hector
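
The per-year versus per-5-year comparison above can be made concrete with a little arithmetic. This is a sketch with a made-up per-year slope and standard error; the helper `rescale` is hypothetical. It covers only a linear change of units for continuous age: the odds ratio per step grows, but coefficient and standard error scale together, so the Wald z is the same. Grouping age into bands is a different model (a step function of age), which this sketch does not address.

```python
import math


def rescale(coef, se, years):
    """Re-express a per-year log-odds slope as a per-`years` slope.

    Coefficient and standard error are multiplied by the same factor,
    so the Wald z statistic (coef / se) is unchanged.
    """
    return coef * years, se * years


# Hypothetical per-year estimate; not taken from the thread.
coef_1y, se_1y = 0.04, 0.01
coef_5y, se_5y = rescale(coef_1y, se_1y, years=5)

print("odds ratio per 1 year :", round(math.exp(coef_1y), 3))
print("odds ratio per 5 years:", round(math.exp(coef_5y), 3))
print("Wald z (1 year, 5 years):", coef_1y / se_1y, coef_5y / se_5y)
```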



Re: The effect of an interaction in logistic regression

Charlotte-9
In reply to this post by Charlotte-9
Hi again,

I have just fitted the following model: Ethnicity + Age^2 + Ethnicity*Age^2,
and this seems to have worked well at sorting out my initial problem - the
coefficients relating to ethnicity are nearly all significant again.
However, I am now still not sure whether to accept this model or the one
with the categorical variable!  Furthermore, when including a quadratic
term, how does this affect the overall interpretation of the
interactions?  Any further thoughts appreciated.

Many thanks,

Charlotte

On Tue, 29 May 2007 10:01:48 -0400, Lou <[hidden email]> wrote:


Re: The effect of an interaction in logistic regression

statisticsdoc
In reply to this post by Charlotte-9
Charlotte,

Assuming that the dummy coding of ethnicity is OK, here is a conjecture about why the effects of ethnicity might not emerge when age is entered as a continuous variable.  Perhaps age has a more potent effect on the outcome when it is entered as a continuous variable than as a categorical variable?   If so, the residual association between age and ethnicity may no longer be significant.

Age in continuous form may have a more powerful effect if the categorization loses information without providing the benefit of slicing up age at meaningful cutpoints.

Please note that this is merely a conjecture, since I have not seen your data.

HTH,

Steve

www.statisticsdoc.com
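
To make the point about cutpoints and lost information concrete, here is a small sketch of how age might be cut into bands. The 50-70 range and the 5-year width follow the guess made earlier in the thread and are assumptions, as is the helper name `age_band`; the actual cutpoints used in the study are not stated.

```python
def age_band(age, start=50, width=5, n_bands=4):
    """Return the 0-based band index for `age`.

    Bands are [start, start+width), [start+width, start+2*width), ...;
    ages past the last boundary fall into the final band, and ages below
    `start` are rejected.
    """
    if age < start:
        raise ValueError("age below the first band")
    return min((age - start) // width, n_bands - 1)


ages = [50, 54, 55, 59, 60, 64, 65, 70]

# Four 5-year bands versus five 4-year bands over the same ages.
bands_5y = [age_band(a, width=5) for a in ages]
bands_4y = [age_band(a, width=4, n_bands=5) for a in ages]

print(bands_5y)
print(bands_4y)
```

Within a band, people of different ages are treated as identical (50 and 54 share a band under the 5-year scheme but not under the 4-year one), which is the information loss in question. Refitting under more than one banding is one way to see how sensitive the ethnicity coefficients are to the choice of cutpoints.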



Re: The effect of an interaction in logistic regression

Hector Maletta
In reply to this post by Charlotte-9
Charlotte,
the better fit you find with age squared means the effect of age is increasing with age. That the interaction with age squared is more significant than the interaction with age not squared means that the interaction gets stronger as age increases.

Now, the SUBSTANTIVE (as opposed to statistical) interpretation of this finding is something better left to specialists. Just as a layman's guess, I presume reluctance and physical impediments to being screened get stronger as people get older. Why this reinforces the ethnicity effect would depend on which ethnic group is less screened. Just as a guess, it might be that ethnicity operates as a proxy for education, health-care insurance coverage and other variables influencing screening, and perhaps THIS ethnicity effect is stronger among older people of the less/more favored ethnic groups.

Hector
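
The statistical part of this interpretation can be written out for the quadratic model described earlier in the thread (Ethnicity + Age^2 + Ethnicity*Age^2). The notation is assumed for illustration only: $E_k$ are 0/1 ethnicity indicators with group 1 as the reference, and $\beta$, $\gamma_k$, $\delta_k$ are not symbols used by the posters.

```latex
\operatorname{logit} P(\text{screened})
  = \beta_0 + \sum_{k=2}^{5} \gamma_k E_k
  + \beta_2\,\text{age}^2
  + \sum_{k=2}^{5} \delta_k \,(E_k \times \text{age}^2)

\frac{\partial \operatorname{logit} P}{\partial\,\text{age}}
  = 2\,(\beta_2 + \delta_k)\,\text{age}
  \quad \text{for group } k,
\qquad
\text{gap between group } k \text{ and the reference at age } a
  = \gamma_k + \delta_k a^2
```

Under this form the slope in age is proportional to age, which is the sense in which the age effect grows with age, and the gap between ethnic groups changes with the square of age. Because the model as described has no linear age term, the curve is forced to have its turning point at age 0; whether adding the linear term changes the picture is something that would need to be checked on the data.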

----- Original message -----
From: Charlotte <[hidden email]>
Date: Tuesday, May 29, 2007 11:29 am
Subject: Re: The effect of an interaction in logistic regression
