Help with Binary Logistic Regression

Help with Binary Logistic Regression

Cardiff Tyke
All,

I'm a newcomer to SPSS (hence, I don't understand most of the questions posted to this mailing list!) and I would like to ask members for help with a simple problem.

I'm currently running a binary logistic regression procedure to try and predict a simple yes or no response from subjects.  I have roughly 20000 records which each have around 30 variables.  The yes:no split of existing records is currently around 65%:35%.

However, when I run the SPSS regression procedure, the model predicts that all respondents will return a "yes" answer.  I realise that this could be down purely to having a set of unpredictive variables (although I hope not), but I'd like to rule out all potential human error (i.e. mine) before I jump to any conclusions.

Can anyone offer some simple assistance to help me to get to the bottom of this problem?

Thanks,
JC


       
       
               

Re: Help with Binary Logistic Regression

Björn Türoque
You might be having a problem with missing values. Do you have a skip
pattern in your data, i.e. some variables that are not collected for people
who fit a certain set of criteria? Take a look at how your data is entered
and stored; understanding how your data is structured is a great way to
remove human error.

Don
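
To make the missing-values point concrete: a logistic regression routine typically drops every case that has a missing value on any variable in the model (listwise deletion), so a skip pattern can quietly shrink or skew the sample the model is fitted on. Below is a minimal sketch in Python rather than SPSS. The records, the variable names (age, income) and the helper functions are all made up for illustration and are not taken from the original data.

```python
# Hypothetical records; None marks a value skipped by the questionnaire logic.
records = [
    {"id": 1, "response": "yes", "age": 34, "income": 41000},
    {"id": 2, "response": "no",  "age": 51, "income": None},
    {"id": 3, "response": "yes", "age": 29, "income": 38000},
    {"id": 4, "response": "no",  "age": 46, "income": None},
    {"id": 5, "response": "yes", "age": 62, "income": 52000},
]
predictors = ["age", "income"]

def complete_cases(rows, variables):
    """Keep only rows with no missing value on any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

def share_yes(rows):
    """Proportion of rows whose response is 'yes'."""
    return sum(r["response"] == "yes" for r in rows) / len(rows)

usable = complete_cases(records, predictors)
print(len(records), "records,", len(usable), "usable after listwise deletion")
print("yes share before:", share_yes(records), "after:", share_yes(usable))
```

In these made-up rows the two dropped cases are both "no" cases, so the sample that survives is more lopsided than the full file. Comparing the number of cases actually used in the model with the number of records in the file is a quick first check.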


Re: Help with Binary Logistic Regression

Hector Maletta
In reply to this post by Cardiff Tyke
        The prediction is made by SPSS using a cutoff point for the
predicted probability, which by default is 0.50. In other words, cases with
p>0.5 are predicted as Yes, and those with p up to 0.5 as No. Since your
overall proportion of Yes is 0.65, I guess your predictors do not move the
predicted probabilities far from that average, so most cases end up with
probabilities above 0.5 and are therefore predicted to suffer the event.
        You may use syntax to change the cutoff point, if desired (putting
it at 0.65, for instance, or using ROC curves first to find the most
suitable cutoff point). However, Logistic Regression is better used as an
analytical tool and as a predictor for sample or population proportions than
as a predictor for individual cases, since it is based on probability
distributions which leave ample room for random effects. A particular case,
even with a high predicted probability, may well avoid suffering the event
(think of all those heavy smokers who live to 90) while others with low
probability suffer it anyway (lean non-smokers who exercise and still suffer
a heart attack at 50).
        Hector
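
The cutoff rule described above can be sketched in a few lines. This is only an illustration in Python, not what SPSS runs internally: the function name classify and the list of predicted probabilities are invented, and the probabilities are chosen to mimic a weak model whose predictions all sit near the 0.65 base rate.

```python
def classify(probabilities, cutoff=0.5):
    """Label a case 'yes' when its predicted probability exceeds the cutoff."""
    return ["yes" if p > cutoff else "no" for p in probabilities]

# Hypothetical predicted probabilities from a weak model: all hover near the
# 0.65 base rate, so none of them falls to 0.5 or below.
predicted = [0.58, 0.61, 0.63, 0.66, 0.68, 0.71, 0.74]

default_labels = classify(predicted)          # default cutoff of 0.5
shifted_labels = classify(predicted, 0.65)    # cutoff moved to the base rate

print(default_labels.count("yes"), "of", len(predicted), "predicted yes at 0.5")
print(shifted_labels.count("yes"), "of", len(predicted), "predicted yes at 0.65")
```

At the default cutoff every case is labelled "yes" even though the probabilities differ; at 0.65 the same probabilities split into both classes. In SPSS the cutoff is, to my knowledge, set with the CUT keyword on the CRITERIA subcommand of LOGISTIC REGRESSION; please confirm against the Command Syntax Reference for your version.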



Re: Help with Binary Logistic Regression

Cardiff Tyke
In reply to this post by Cardiff Tyke
Thanks for your help.  Out of interest, what would be the best statistical procedure to yield a predictor for individual cases? I thought BLR would be the best for this sort of exercise.



Re: Help with Binary Logistic Regression

Hector Maletta
Logistic Regression is indeed usable as a (probabilistic) predictor for
individual cases, but "probabilistic" is the crucial word. Probability is
governed by the Law of Large Numbers, and anything it says about groups is
subject to large margins of error when applied to specific individual cases.
Thus Winston Churchill, who smoked heavily, led a sedentary life in his
mature years, was seriously overweight, was subject to years of constant
occupational stress and short sleep hours, and drank half a bottle of brandy
every day plus generous doses of other liquors, should have died before
reaching 60 but managed to survive to almost 90 against all odds. The
inventor of aerobic exercise (I don't even remember his name now), on the
other hand, died of a heart attack while exercising in his fifties. These are
two fine examples of probability going awry when applied to individual cases.
Take, however, 1000 people like Churchill or 1000 like the aerobic guy, and
the odds will
not fail. Another example is Albert Einstein: barely passing high school,
was judged not to be university material, and only made it to a vocational
polytechnical school, ending up as a clerk in a patent office; in a
meritocratic system based on SAT and A-level exams he would have been judged
(as he was, on a less scientific basis) as unfit for any kind of academic
career. A predictor equation for adult academic success based on early
scholastic achievements would have predicted a round zero for poor Albert,
again showing the perils of applying probabilistic predictions to
individuals.



Hector
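
The contrast between one individual and a large group can be shown with a small simulation. This is a rough sketch in Python under simple assumptions: every person has the same assumed event probability p, outcomes are independent, and the function name simulate_group and the seed are arbitrary choices made for the example.

```python
import random

def simulate_group(p_event, size, rng):
    """Draw one yes/no outcome per person; return the observed proportion."""
    events = sum(rng.random() < p_event for _ in range(size))
    return events / size

rng = random.Random(2007)   # fixed seed so the run is repeatable
p = 0.9                     # assumed probability of the event for every person

one_person = simulate_group(p, 1, rng)         # always exactly 0.0 or 1.0
large_group = simulate_group(p, 100_000, rng)  # settles close to p

print("single case:", one_person)
print("group of 100000:", round(large_group, 3))
```

For a single case the "proportion" can only be 0 or 1, so even a probability of 0.9 leaves room for the outcome not to occur. For the large group the observed proportion lands very near 0.9, which is the sense in which the odds do not fail for groups.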




Einstein (OT); was, re: Help with Binary Logistic Regression

Richard Ristow
At 01:35 PM 1/23/2007, Hector Maletta wrote:

>Another example is Albert Einstein: barely
>passing high school, was judged not to be
>university material, and only made it to a
>vocational polytechnical school,

Granted on a lot of counts, but his scientific
education was at a higher level than that. The
Eidgenössische Technische Hochschule Zürich,
though the literal English of "Technische
Hochschule" is "technical high school", is a
high-level scientific institution. (In terms of
the United States educational system, "Technische
Hochschule" translates more or less as "institute
of technology", as in "Massachusetts Institute of Technology.")

The ETH had apparently not reached that level
when Einstein was there. (From the Wikipedia
article on the ETH: "In 1909, the course program
of the ETH was restructured to that of a real
university, and the ETH was granted the right to
award doctorates.") Though Einstein did study at
the ETH, his doctorate (per the Wikipedia article
on Einstein) was from the University of Zürich, in 1905.

We now return you to SPSS and statistical matters.

Re: Einstein (OT); was, re: Help with Binary Logistic Regression

Hector Maletta
        I stand corrected. The point was, however, that he was denied access
to the Universität and had to go to the Hochschule, due to negative reports
from his high school teachers. He was also a rather erratic teenager, e.g.
taking a year out of school to wander through the country on a bike, not the
usual mark of an overachiever.
        By the way, I have been informed by a list member that the inventor
of aerobics did not die while exercising: his supposed death is apparently
just another urban legend.
        So much for my examples, of which only Winston Churchill survives.
Fortunately, my point did not depend on those particular (and avowedly
poorly researched) examples.

        Hector



Re: Einstein (OT); was, re: Help with Binary Logistic Regression

David Wasserman
If you were referring to Jim Fixx, whose "Complete Book of Running"
propelled jogging to new heights of popularity, he did die of a heart attack
at 52.  I remember reading the news stories at the time, and I have found no
on-line resources to contradict it.  If you're referring to someone else, I
can't help restore your example.

David Wasserman
Custom Data Analysis and SPSS Programming


Re: Einstein (OT); was, re: Help with Binary Logistic Regression

Hector Maletta
        You're probably right. The urban legend refers to a Dr Cooper, who
introduced aerobic exercise and is still living, not to J. Fixx, who advocated
jogging. I mistakenly wrote about aerobics when I should have written
jogging. My example was right after all.

        Hector


merging data

Stephanie Roahen-Harrison
In reply to this post by Hector Maletta
I have a very simple question.
I want to add data for a subset of cases and a subset of variables to my
master dataset, matched on subject ID (for which there is one row per
subject in each dataset). What is the best way to do this?
Thank you in advance,
Stephanie Harrison


Re: merging data

Maguin, Eugene
Stephanie,

Please do some reading in the syntax reference manual. Look at the following
commands:
MATCH FILES, UPDATE.

Your message is a little unclear as to exactly what you have and what you
want to do.


>>I want to add data for a subset of cases and a subset of variables to my
master dataset

This could be a job for Update if you want to change the values of existing
variables for existing cases in the master dataset. But it could be other
things. Depends on what exactly you mean.

Gene Maguin
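
As a rough picture of one possible reading of the question, here is a sketch in Python of a keyed merge in which a second file supplies new variables and new values for some subjects. It is loosely modelled on an update-style merge, where non-missing incoming values overwrite the master and missing ones do not; it is not a reproduction of SPSS's exact MATCH FILES or UPDATE behaviour. The subject IDs, variable names and the function merge_by_id are all hypothetical.

```python
# Hypothetical master file: one row per subject, keyed by subject ID.
master = {
    101: {"age": 34, "score": 12},
    102: {"age": 51, "score": None},
    103: {"age": 29, "score": 15},
}
# Hypothetical second file: only some subjects, only some variables.
extra = {
    102: {"score": 9, "group": "B"},
    103: {"score": None, "group": "A"},
}

def merge_by_id(master_rows, extra_rows):
    """Return a merged copy; non-missing incoming values overwrite the master."""
    new_vars = {v for row in extra_rows.values() for v in row}
    merged = {}
    for subject_id, row in master_rows.items():
        # Start from the master row and give every case the new variables.
        combined = {**{v: None for v in new_vars}, **row}
        for var, value in extra_rows.get(subject_id, {}).items():
            if value is not None:       # a missing value never overwrites
                combined[var] = value
        merged[subject_id] = combined
    return merged                       # subjects absent from master are ignored

merged = merge_by_id(master, extra)
for subject_id in sorted(merged):
    print(subject_id, merged[subject_id])
```

Whether incoming values should overwrite existing ones, and whether subjects that appear only in the second file should be added as new cases, are exactly the details that decide which command fits, so it is worth settling them before merging.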

merging data

Hal 9000
In reply to this post by Stephanie Roahen-Harrison
Stephanie,

You're in luck - I specialize in 'very simple questions'.

Firstly, Gene's suggestion is good.

Secondly, I think I know what to do, but I don't know how to tell you to do
it because there is nothing here for me to work with. If you can provide a
very simplified example of the data you currently have, and another example
of what you want that data to look like, then you'll be giving people here
something to sink their teeth into.

:),
-Gary