Dear all,
I am trying to fit a logistic regression model to some data concerning a screening test. As a start, I have generated models with just two independent variables: age and ethnicity. The outcome variable is 'screened' or 'not screened'. The ethnicity variable has five categories.

After fitting the model AGE + ETHNICITY + AGE*ETHNICITY, it seems clear that there is an interaction between age and ethnicity. However, my question concerns a difference in the models when age is used in its original continuous form and when it is used as a categorical variable with 4 categories.

If I generate the above model with age as a categorical variable, the estimated coefficients relating to the ethnicity main effect are all highly significant (p < 0.0001). However, if I fit the same model but with age in its original continuous form, the estimated coefficients relating to ethnicity suddenly become not significant (p > 0.4 in each case) and the Wald values completely diminish. This basically happens when I include the interaction term in the model. Can anyone help me to explain this please? I'm now not sure whether to use age in its original form or in the categorised form, but it seems that there is definitely some effect to take note of here.

Many thanks,
Charlotte
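One way to write out the two models described above, as a sketch only. The symbols are assumptions introduced here for illustration, and the sketch assumes indicator (dummy) coding with one reference category for ethnicity and, in the second model, for age group. Nothing here comes from the actual data.

```latex
% Model 1: age continuous.
% D_e = 1 if the subject is in ethnic group e (e = 1..4), else 0.
\begin{align*}
\operatorname{logit}\Pr(\text{screened})
  &= \beta_0 + \beta_A\,\text{age}
   + \sum_{e=1}^{4} \beta_e D_e
   + \sum_{e=1}^{4} \gamma_e D_e\,\text{age} \\
\text{group } e \text{ vs. reference, at age } a
  &:\quad \beta_e + \gamma_e\,a
\end{align*}

% Model 2: age in 4 categories.
% G_k = 1 if the subject is in age group k (k = 1..3), else 0.
\begin{align*}
\operatorname{logit}\Pr(\text{screened})
  &= \alpha_0 + \sum_{k=1}^{3} \alpha_k G_k
   + \sum_{e=1}^{4} \beta'_e D_e
   + \sum_{e=1}^{4}\sum_{k=1}^{3} \gamma'_{ek} D_e G_k \\
\text{group } e \text{ vs. reference, in age group } k
  &:\quad \beta'_e + \gamma'_{ek}
\end{align*}
```

Under this coding, the ethnicity coefficient on its own is the group contrast at age a = 0 in the first model, and the group contrast within the reference age group in the second. The Wald tests on the ethnicity "main effects" in the two models are therefore tests of different quantities, which may be worth keeping in mind when comparing them.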
I have had something similar in other data relating to age and gender interactions. Have you tried a quadratic term in the form of age squared, as well as an interaction of age squared and ethnicity?

It may also be worth calculating Mallows' Cp for the best subsets of all your interaction terms.

Are you entering ethnicity as a categorical variable or as a series of dummy variables? If you are using it as a categorical, the result may be to do with the reference category, so it may be worth constructing the dummies yourself.

Muir Houston
BA (Hons.), MPhil, PhD, FHEA
Research Fellow
Institute of Education & CRLL
University of Stirling
FK9 4LA
Tel: 01786-46-7615

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Charlotte
Sent: 29 May 2007 14:25
To: [hidden email]
Subject: The effect of an interaction in logistic regression

--
The University of Stirling is a university established in Scotland by charter at Stirling, FK9 4LA. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not disclose, copy or deliver this message to anyone and any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind.
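The terms suggested above (hand-built ethnicity dummies, age squared, and their products) could be constructed along these lines. This is a minimal sketch, not the poster's actual setup: the level labels "A" to "E", the choice of "A" as reference category and the function name `design_row` are all hypothetical, since the thread does not name the five ethnic groups.

```python
# Hypothetical labels: the thread does not name the five ethnic groups.
ETHNICITY_LEVELS = ["A", "B", "C", "D", "E"]
REFERENCE = "A"  # assumed reference category


def design_row(age, ethnicity, levels=ETHNICITY_LEVELS, reference=REFERENCE):
    """Build one row of predictors: age^2, ethnicity dummies, dummy * age^2."""
    if ethnicity not in levels:
        raise ValueError(f"unknown ethnicity level: {ethnicity!r}")
    age_sq = age ** 2
    row = {"age_sq": age_sq}
    for level in levels:
        if level == reference:
            continue  # the reference category gets no dummy of its own
        dummy = 1 if ethnicity == level else 0
        row[f"eth_{level}"] = dummy
        # Interaction term: non-zero only for subjects in this group.
        row[f"eth_{level}_x_age_sq"] = dummy * age_sq
    return row


if __name__ == "__main__":
    print(design_row(60, "C"))
```

Building the dummies by hand makes the reference category explicit, so changing it (and seeing how the individual coefficients move) only requires changing `REFERENCE`.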
In reply to this post by Charlotte-9
Hi Muir,
I am just about to try a quadratic term, as that's the only other thing I could think to look at for the moment. As for ethnicity, I am entering it as a series of dummy variables; well, this seems to be how SPSS does it anyway (?). As well as trying to explain what's going on with this model, I'm also baffled as to why this is not seen when I use age in its grouped form. Does this imply that the grouping is somehow distorting an important effect? Overall, I would expect ethnicity to have a big effect on the outcome, so the more reasonable model 'seems' to be the one where age is grouped!

Thanks for your help,
Charlotte

On Tue, 29 May 2007 14:41:31 +0100, Muir Houston <[hidden email]> wrote:
In reply to this post by Charlotte-9
Charlotte,
according to your previous messages about your study, your subjects are all in the 50-70 age bracket, and I imagine your 4 age groups might be 5-year intervals like 50-54, 55-59 and so on. When you use individual years, you are measuring the effect of adding one individual year of age, which is probably a smaller effect than jumping from one 5-year age group to the next. That is possibly the source of the lower significance of your result when age is measured as a continuous variable (or, more precisely, as a discrete variable measured in years) as compared with the case in which it is measured as a discrete variable in 5-year intervals. However, this is only a guess, and not a very likely one in your case, because your dataset is large enough to give significant results for small effects.

There might possibly be other problems involved. One of them is non-monotonicity in the increase of the outcome with age: it may be that screening increases from one age group to the next, but does not necessarily increase over every one-year interval, which makes for a less clear relationship between the outcome and age when the latter is measured in individual years (if you measured age in months or days or minutes it would be less significant still). That is usually the reason why ages are grouped into intervals, even at the risk of losing information and introducing arbitrary cutoff points. To check for these risks, see whether your model works with other age groupings (e.g. in groups of 4 years instead of 5).

Hector

----- Original message -----
From: Charlotte <[hidden email]>
Date: Tuesday, May 29, 2007 10:24 am
Subject: The effect of an interaction in logistic regression
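The regrouping check suggested above could be set up with something like the following. It is a minimal sketch under stated assumptions: ages are whole years from 50 to 69 inclusive (the thread only says "50-70"), and the function names `age_band` and `all_bands` are hypothetical helpers, not part of SPSS or any library.

```python
def age_band(age, start=50, stop=70, width=5):
    """Return the inclusive (low, high) band, in whole years, containing age.

    Assumes integer ages with start <= age < stop; the last band is
    truncated if the width does not divide the range evenly.
    """
    if not start <= age < stop:
        raise ValueError(f"age {age} is outside [{start}, {stop})")
    low = start + ((age - start) // width) * width
    high = min(low + width, stop) - 1
    return (low, high)


def all_bands(start=50, stop=70, width=5):
    """List every band produced for the given range and width."""
    return sorted({age_band(a, start, stop, width) for a in range(start, stop)})


if __name__ == "__main__":
    print(all_bands(width=5))  # four 5-year groups
    print(all_bands(width=4))  # five 4-year groups
```

Refitting the model with the bands from `width=4` (or another width) and comparing the ethnicity coefficients against the 5-year version is one way to see how sensitive the result is to the cutoff points.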
In reply to this post by Charlotte-9
Hi again,
I have just fitted the following model: Ethnicity + Age^2 + Ethnicity*Age^2, and this seems to have worked well at sorting out my initial problem; the coefficients relating to ethnicity are nearly all significant again. However, I am still not sure whether to accept this model or the one with the categorical variable! Furthermore, when including a quadratic term, how does this affect the overall interpretation of the interactions? Any further thoughts appreciated.

Many thanks,
Charlotte

On Tue, 29 May 2007 10:01:48 -0400, Lou <[hidden email]> wrote:
In reply to this post by Charlotte-9
Charlotte,
Assuming that the dummy coding of ethnicity is OK, here is a conjecture about why the effects of ethnicity might not emerge when age is entered as a continuous variable. Perhaps age has a more potent effect on the outcome when it is entered as a continuous variable than as a categorical variable? If so, the residual association between age and ethnicity may no longer be significant. Age in continuous form may have a more powerful effect if the categorization loses information without providing the benefit of slicing up age at meaningful cutpoints. Please note that this is merely a conjecture, since I have not seen your data.

HTH,
Steve
www.statisticsdoc.com

---- Lou <[hidden email]> wrote:

--
For personalized and experienced consulting in statistics and research design, visit www.statisticsdoc.com
In reply to this post by Charlotte-9
Charlotte,
the better fit you find with age squared means the effect of age is increasing with age. That the interaction with age squared is more significant than the interaction with age not squared means that the interaction is stronger as age increases.

Now, the SUBSTANTIVE (as opposed to statistical) interpretation of this finding is something better left to specialists. Just as a layman's guess, I presume that reluctance to be screened, and physical impediments to it, get stronger as people get older. Why this reinforces the ethnicity effect would depend on which ethnic group is less screened. Again just as a guess, it might be that ethnicity operates as a proxy for education, health-care insurance coverage and other variables influencing screening, and perhaps THIS ethnicity effect is stronger among older people of the less/more favored ethnic groups.

Hector

----- Original message -----
From: Charlotte <[hidden email]>
Date: Tuesday, May 29, 2007 11:29 am
Subject: Re: The effect of an interaction in logistic regression
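The statistical reading above can be made concrete by writing out the quadratic model Ethnicity + Age^2 + Ethnicity*Age^2. This is a sketch only: the symbols are assumptions introduced here, with indicator coding and one reference ethnic group assumed, and no values are taken from the data.

```latex
% D_e = 1 if the subject is in ethnic group e (e = 1..4), else 0.
\begin{align*}
\operatorname{logit}\Pr(\text{screened})
  &= \beta_0 + \sum_{e=1}^{4} \beta_e D_e
   + \gamma\,\text{age}^2
   + \sum_{e=1}^{4} \delta_e D_e\,\text{age}^2 \\[4pt]
\frac{\partial}{\partial\,\text{age}}\operatorname{logit}\Pr(\text{screened})
  &= 2\,(\gamma + \delta_e)\,\text{age}
   \qquad \text{(for a subject in group } e\text{)} \\[4pt]
\text{group } e \text{ vs. reference, at age } a
  &:\quad \beta_e + \delta_e\,a^2
\end{align*}
```

The slope in age is proportional to age itself, which is the sense in which the effect of age increases with age on the logit scale. The interaction part of the group contrast is multiplied by a^2, so at age 70 (a^2 = 4900) it is 1.96 times what it is at age 50 (a^2 = 2500). Note that this model, as posted, has no linear age term, so it forces that particular shape rather than estimating it.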
