Logistic regression help

Charlotte-9
Dear list

I am trying to carry out a logistic regression analysis and have a quick
question with regards to the best way to input my independent variables.
I have three input variables: ethnicity (5 groups), age and deprivation
score.  Although age and deprivation score are continuous variables, I
have also been asked to split them into groups (4 for age and 5 for
deprivation) which are pre-determined by previous work on this subject
matter.  The dependent variable is simply whether or not a person took a
particular test.

I have tried generating models both with the age and deprivation variables
as they are and also with the new categorical age and deprivation
variables.  However, when looking at interaction terms, I find that the
interaction between age and deprivation is significant when they are input
as the continuous variables but not significant when I used the
categorical versions.  Why would this happen?  Furthermore, which is the
best way to go?  I have read information on logistic regression until my
head hurts, but still don’t feel completely satisfied as to how I should
determine the best model possible.

Any advice would be appreciated please!

Thanks

Lou
Reply | Threaded
Open this post in threaded view
|

Re: Logistic regression help

statisticsdoc
Keith Starborn
www.statisticsdoc.com

Dear Lou,

Categorizing continuous variables into categorical variables can result in a considerable loss of statistical power because the test for the categorized version of the variable uses more degrees of freedom than the test for the continuous variable.  In addition, categorizing a continuous variable can result in a loss of predictive information.
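
As a rough illustration of the degrees-of-freedom point, here is a minimal Python sketch. It assumes the 4 age groups and 5 deprivation groups mentioned in Lou's post and a single product term for the continuous version; the helper name `interaction_df` is made up for this example.

```python
def interaction_df(levels_a: int, levels_b: int) -> int:
    """Degrees of freedom for the interaction of two categorical predictors."""
    return (levels_a - 1) * (levels_b - 1)

# Continuous age x continuous deprivation: one product term, so 1 df.
continuous_df = 1

# Grouped age (4 levels) x grouped deprivation (5 levels), as in Lou's post.
categorical_df = interaction_df(4, 5)

print(continuous_df, categorical_df)  # 1 12
```

At the 5% level, a chi-square statistic on 1 df has to exceed about 3.84, whereas on 12 df it has to exceed about 21.03, so the same underlying effect spread over 12 df faces a much higher bar.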

HTH,

KS

--
For personalized and experienced consulting in statistics and research design, visit www.statisticsdoc.com

Re: Logistic regression help

Charlotte-9
In reply to this post by Charlotte-9
Dear Keith,

Thanks for your advice which was very helpful.  I feel a bit stuck as to
what to do about this really. My boss (who knows roughly zero about
statistics) is insisting that I categorise these variables since I am
comparing results with a previous report which did the same.  Does it take
meaning away from the analysis if I discuss results obtained using the
original continuous variables and then discuss results separately using
the categorised versions (i.e. generate two separate models)?  Not sure if
this really defies logic too much and how I would justify this in the
final report.  Although I have a lot to learn in this field, the report
that this work is being based on has a lot of dubious findings with
regards to the stats, so I'm very keen to ensure that the one I produce is
accurate!!

Many thanks,

Lou


Re: Logistic regression help

statisticsdoc
In reply to this post by Charlotte-9
Keith Starborn
www.statisticsdoc.com

Lou,

I bet most of the people on this listserv have faced a similar dilemma at some time in their careers.  Which one is best from the point of view of using the data to answer your questions and generate information that you can act on?  Probably, keeping the variables continuous is better from that point of view.

As to the politics of the situation, in your position, I would run the analyses both ways (continuous and categorized) in order to: a.) show that I did the analysis the way I was told to; and b.) show that I found something else that works better.  You know the situation best of all.

HTH,

KS


Re: Logistic regression help

LUCINDA M TEAR
Hi, Lou.  I agree with all that Keith has said.  I might add that the
non-significant interaction using categorical variables could be due either to
the fact that by lumping together the Y responses over a range of X inputs
you created a categorical variable whose variance is large enough that it is
not possible to detect any interaction and/or that the endpoints of the bin
categories you are using occur at points in the data that obscure the
interaction you found using the continuous data.

In some cases, it may actually serve you to have a model without an
interaction effect - it is possible, however, that the confidence intervals
around such a model will be larger than they would be from a model with an
interaction.  On the other hand, using the continuous data apparently
allowed you to detect some underlying "process" (the interaction you found).
If you are trying to understand what creates the patterns you see in your
data, both models give you information about the resolution at which certain
processes are revealed or obscured.  Apparently lumping the way you have
obscures the interaction.  You might want to try binning your x variables
differently than the previous report did, just to see if there is a way to
categorize the x variables such that an interaction is detected.  You could
probably use plots from your continuous model to give you an idea about
where appropriate bin thresholds might lie.  I tend to be one who likes to
use models as a way of revealing the "scale" at which the data should be
approached in order to answer the question at hand.  A different question
about the same data could require a different type of model.  Models also
help you find out if the scale you are looking at is missing information
about some underlying effects that could affect the application of the
results.
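
To make the idea of reading bin thresholds off the continuous model more concrete, here is a minimal Python sketch that tabulates predicted probabilities over a grid of age and deprivation values. Everything numeric in it is an assumption made up for illustration (the coefficients, the centring values of 50 and 30, and the grid ranges); in practice the coefficients would come from the fitted continuous model.

```python
import math

# Hypothetical coefficients on the log-odds scale, chosen only for illustration.
INTERCEPT = -0.5
B_AGE = 0.02          # per year of age, centred at 50 (assumed)
B_DEP = -0.03         # per deprivation point, centred at 30 (assumed)
B_AGE_X_DEP = 0.002   # age-by-deprivation product term

def predicted_probability(age, deprivation):
    """Predicted probability of taking the test under the hypothetical model."""
    a = age - 50
    d = deprivation - 30
    log_odds = INTERCEPT + B_AGE * a + B_DEP * d + B_AGE_X_DEP * a * d
    return 1.0 / (1.0 + math.exp(-log_odds))

ages = range(20, 81, 10)
deprivations = range(0, 61, 10)

# One row per deprivation value, one column per age.
print("dep/age " + " ".join(f"{age:>5d}" for age in ages))
for dep in deprivations:
    row = " ".join(f"{predicted_probability(age, dep):5.2f}" for age in ages)
    print(f"{dep:>7d} {row}")
```

With these made-up coefficients the age trend runs downward at low deprivation and upward at high deprivation. Rows where the direction of the trend changes are candidate places to put a deprivation cutpoint, since a bin that straddles such a change would average the two patterns together.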

Just some thoughts.

Lucinda




Re: Logistic regression help

Charlotte-9
In reply to this post by Charlotte-9
Hi Lucinda,

Thanks very much for your response.  You have certainly helped me to think
more clearly about the issues surrounding this problem and I'll be
re-reading your reply in order to help me fathom out what's going on with
this data.

Thanks again,

Lou


Re: Logistic regression help

Charlotte-9
In reply to this post by Charlotte-9
Hi Keith,

I completely agree with what you have said.  I will run the analyses both
ways and, if nothing else, will hopefully learn some new things for my own
benefit and knowledge even if I'm stuck in a situation of having to do
things a certain way for my current job.  It's just a shame and worrying,
to be honest, that my bosses (and lots of people in general) misuse
statistics in such a fashion, but I don't think that this is a new problem!

Thanks for your help,

Lou


Re: Logistic regression help

Charlotte-9
In reply to this post by Charlotte-9
Hi Lucinda,

Having just re-read your response below, I wondered if you could just
explain a little more what you mean by 'In some cases, it may actually
serve you to have a model without an interaction effect'.  Sorry to be a
pain, I'm just not exactly clear what you mean.  Are you saying that even
if an interaction seems to exist, it may sometimes be better to look at
the model omitting the interaction term?

Thanks for your help

Lou

On Thu, 15 Jun 2006 13:06:37 -0700, LUCINDA M TEAR <[hidden email]>
wrote:

>Hi, Lou.  I agree with all that Keith has said.  I might add that the non
>significant interaction using categorical variables could be due either to
>the fact that by lumping together the Y responses over a range of X inputs
>you created a categorical variable whose variance is large enough that it
is
>not possible to detect any interaction and/or that the endpoints of the
bin
>categories you are using occur at points in the data that obscure the
>interaction you found using the continuous data.
>
>In some cases, it may actually serve you to have a model without an
>interaction effect - it is possible, however, that the confidence
intervals
>around such a model will be larger than they would be from a model with an
>interaction.  On the other hand, using the continuous data apparently
>allowed you to detect some underlying "process" (the interaction you
found).
>If you are trying to understand what creates the patterns you see in your
>data, both models give you information about the resolution at which
certain
>processes are revealed or obscured.  Apparently lumping the way you have
>obscures the interaction.  You might want to try binning your x variables
>differently than the previous report did, just to see if there is a way to
>categorize the x variables such that an interaction is detected.  You
could
>probably use plots from your continuous model to give you an idea about
>where appropriate bins thresholds might lie.  I tend to be one who likes
to

>use models as a way of revealing the "scale" at which the data should be
>approached in order to answer the question at hand.  A different question
>about the same data could require a different type of model.  Models also
>help you find out if the scale you are looking at is missing information
>about some underlying effects that could effect the application of the
>results.
>
>Just some thoughts.
>
>Lucinda
>
>
>
>----- Original Message -----
>From: "Statisticsdoc" <[hidden email]>
>Newsgroups: bit.listserv.spssx-l
>To: <[hidden email]>
>Sent: Thursday, June 15, 2006 12:36 PM
>Subject: Re: Logistic regression help
>
>
>> Keith Starborn
>> www.statisticsdoc.com
>>
>> Lou,
>>
>> I bet most of the people on this listerserv have faced a similar dilemma
>> at some time in their careers.  Which one is best from the point of view
>> of using the data to answer your questions and generate information that
>> you can act on?  Probably, keeping the variables continuous is better
from
>> that point of view.
>>
>> As to the politics of the situation, in your position, I would run the
>> analyses both ways (continuous and categorized) in order to: a.) show
that

>> I did the analysis the way I was told to; and b.) found something else
>> that works better.  You know the situation best of all.
>>
>> HTH,
>>
>> KS
>>
>> ---- Lou <[hidden email]> wrote:
>> > Dear Keith,
>> >
>> > Thanks for your advice which was very helpful.  I feel a bit stuck as
to
>> > know what to do about this really. My boss (who knows rougly zero
about
>> > statistics) is insisting that I categorise these variables since I am
>> > comparing results with a previous report which did the same.  Does it
>> > take
>> > meaning away from the analysis if I discuss results obtained using the
>> > original continuous variables and then discuss results separately
using
>> > the categorised versions (i.e. generate two separate models)?  Not
sure
>> > if
>> > this really defies logic too much and how I would justify this in the
>> > final report.  Although I have a lot to learn in this field, the
report
>> > that this work is being based on has a lot of dubious findings with
>> > regards to the stats, so I'm very keen to ensure that the one I
produce

>> > is
>> > accurate!!
>> >
>> > Many thanks,
>> >
>> > Lou

Re: Logistic regression help

khaaver
Suppose you have data on 100 cases for a continuous variable. These are 100 distinct pieces of information (let's assume there are no ties, for the sake of example). Now you want to shrink these 100 distinct values into 4 or 5 distinct pieces of information. You are definitely depriving the LR model of a lot of statistical power. This is how I explain it to my clients when they insist on categorising a continuous variable, which does not make any logical sense. Hope this is helpful, though a few years late.
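
As a rough illustration of the point above (not an analysis of Lou's data), the sketch below takes 100 distinct values, cuts them into equal-width groups, and checks how much of the original variation the group code still carries. The values, the equal-width cut-points and the function names are all assumptions made up for the example; with real, noisy data and unequal groups the loss can be quite different.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def retained_share(values, n_groups):
    """Squared correlation between the values and their equal-width group index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    # Group index 0..n_groups-1; the maximum value is folded into the last group.
    groups = [min(int((v - lo) / width), n_groups - 1) for v in values]
    return pearson_r(values, groups) ** 2

scores = list(range(100))          # 100 distinct, made-up values with no ties
share_4 = retained_share(scores, 4)
share_2 = retained_share(scores, 2)
print(f"4 groups keep {share_4:.1%} of the variation")
print(f"2 groups keep {share_2:.1%} of the variation")
```

In this idealised case four groups keep roughly 94% of the variation and a two-way split roughly 75%, so how much is lost depends heavily on how many groups are used and where the cuts fall.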

Re: Logistic regression help

Hector Maletta
You can always present the variable values grouped into a few brackets, but
for the sake of analysis (e.g. as a predictor) it is better, as you claim,
to retain the full information.
With only 100 cases, however, one or two outliers may distort the results.
Thus perhaps you should look for that possibility, and then perhaps group
the highest values into one average value, or drop the outliers. But I
wouldn't go farther than that.
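
One reading of the suggestion above, grouping the highest values into one average value, is a form of top-coding. The sketch below shows only that reading; the function name `top_code`, the cutoff and the ages are hypothetical and chosen purely for illustration.

```python
def top_code(values, cutoff):
    """Replace every value above `cutoff` with the mean of those values."""
    high = [v for v in values if v > cutoff]
    if not high:
        return list(values)        # nothing above the cutoff, nothing to change
    high_mean = sum(high) / len(high)
    return [high_mean if v > cutoff else v for v in values]

ages = [23, 31, 35, 42, 47, 51, 58, 88, 96]   # hypothetical data
coded = top_code(ages, cutoff=80)
print(coded)
```

The overall mean is unchanged by this step; only the spread among the highest cases is removed.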
You do not ask anything about logistic regression (the subject of your
message). If the variable involved is intended as a predictor for logistic
regression, I'd just leave it as it is. If it is intended to be a dependent
variable, I'd use ordinary least squares regression, not logistic regression.
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of khaaver
Sent: Sunday, June 12, 2011 06:42
To: [hidden email]
Subject: Re: Logistic regression help

Suppose you have data on 100 cases for a continuous variable. These are 100
distinct pieces of information (let's assume there are no ties, for the sake
of example). Now you want to shrink these 100 distinct values into 4 or 5
distinct pieces of information. You are definitely depriving the LR model of
a lot of statistical power. This is how I explain it to my clients when they
insist on categorising a continuous variable, which does not make any
logical sense. Hope this is helpful, though a few years late.

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Logistic-regression-help-tp1069096p4481329.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Categorisation of continuous data not always a bad thing

Martin Holt
In reply to this post by khaaver
This follows the posting below and presents an alternative point of view that will, I think, surprise most statisticians.

It is conventional wisdom that categorising continuous data loses information, and the usual recommendation is not to do it. The paper below, however, makes it apparent that one should sometimes explore categorisation before modelling the data as continuous: in some cases categorisation can be seen to be the better method.

Welch HG, Schwartz LM, Woloshin S. "The exaggerated relations between diet, body weight and mortality: the case for a categorical data approach." CMAJ, March 29, 2005; 172 (7).

The gist of the paper is that continuous BMI data have an underlying parabolic relation to mortality, and that the usual methods for analysing continuous data fail to fit this, whilst categorisation succeeds. This raises the question, "How should the categorical analysis be done?", and the paper covers this. Having performed the categorical analysis, one can then superimpose the results of the continuous analysis to effectively validate it.
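
A small made-up example of the kind of situation described above: when the outcome follows a U-shape, a single straight-line term can show no relation at all, while group means show the pattern clearly. The numbers below are synthetic (they are not BMI or mortality data from the paper), and the five equal-width groups are an assumption chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = list(range(100))
y = [(v - 49.5) ** 2 for v in x]   # synthetic U-shaped outcome, lowest in the middle

# A straight-line summary of the relation between x and y.
linear_r = pearson_r(x, y)

# Mean outcome within five equal-width groups of x (0-19, 20-39, and so on).
group_means = []
for g in range(5):
    ys_in_group = [yv for xv, yv in zip(x, y) if xv // 20 == g]
    group_means.append(sum(ys_in_group) / len(ys_in_group))

print(f"linear correlation: {linear_r:.3f}")
print("group means:", [round(m, 2) for m in group_means])
```

The linear correlation is essentially zero here even though the outcome depends entirely on x, whereas the group means fall towards the middle group and rise again on the other side.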

Martin Holt
Medical Statistician

--- On Sun, 12/6/11, khaaver <[hidden email]> wrote:

From: khaaver <[hidden email]>
Subject: Re: Logistic regression help
To: [hidden email]
Date: Sunday, 12 June, 2011, 10:41

Suppose you have data on 100 cases for a continuous variable. These are 100
distinct pieces of information (let's assume there are no ties, for the sake
of example). Now you want to shrink these 100 distinct values into 4 or 5
distinct pieces of information. You are definitely depriving the LR model of
a lot of statistical power. This is how I explain it to my clients when they
insist on categorising a continuous variable, which does not make any
logical sense. Hope this is helpful, though a few years late.
