All,
I'm a newcomer to SPSS (hence, I don't understand most of the questions posted to this mailing list!) and I would like to ask members for help with a simple problem.

I'm currently running a binary logistic regression procedure to try to predict a simple yes or no response from subjects. I have roughly 20000 records, each with around 30 variables. The yes:no split of existing records is currently around 65%:35%.

However, when I run the SPSS regression procedure, the model predicts that all respondents will return a "yes" answer. I realise that this could be down purely to having a set of unpredictive variables (although I hope not), but I'd like to rule out all potential human error (i.e. mine) before I jump to any conclusions.

Can anyone offer some simple assistance to help me get to the bottom of this problem?

Thanks,
JC
|
You might be having a problem with missing values. Do you have a skip pattern in your data, i.e. some variables that are not collected for people who fit a certain set of criteria? Take a look at how your data is entered and stored; understanding how your data is structured is a great way to remove human error.

Don

On 1/23/07, Cardiff Tyke <[hidden email]> wrote:
> However, when I run the SPSS regression procedure, the model predicts that
> all respondents will return a "yes" answer.
|
The prediction is made by SPSS using a cutoff point for the predicted probability, which by default is 0.50. In other words, cases with p>0.5 are predicted as Yes, and those up to 0.5 as No. Since your overall p is 0.65, I guess your predictors do not move the predicted probabilities far from that average, so most cases end up with probabilities above 0.5 and are therefore predicted to experience the event. You may use syntax to change the cutoff point, if desired (putting it at 0.65, for instance, or using ROC curves first to find the most suitable cutoff point).

However, Logistic Regression is better used as an analytical tool and as a predictor for sample or population proportions than as a predictor for individual cases, since it is based on probability distributions which leave ample room for random effects. A particular case, even with a high predicted probability, may well avoid suffering the event (think of all those heavy smokers who live to 90), while others with a low probability suffer it anyway (lean, exercising non-smokers suffering a heart attack at 50).

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Cardiff Tyke
Sent: 23 January 2007 08:32
To: [hidden email]
Subject: Help with Binary Logistic Regression

However, when I run the SPSS regression procedure, the model predicts that all respondents will return a "yes" answer.
|
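A note on the cutoff rule Hector describes: the sketch below is Python rather than SPSS, and the probabilities in it are invented for illustration (they are not output from JC's model). It shows how a model whose predicted probabilities all hover around the 0.65 base rate labels every case "yes" at the default cutoff of 0.5, and how raising the cutoff to 0.65 splits the cases. The names classify and predicted are hypothetical.

```python
# Illustrative only: the probabilities below are made up, not SPSS output.

def classify(probabilities, cutoff=0.5):
    """Label a case "yes" when its predicted probability exceeds the cutoff."""
    return ["yes" if p > cutoff else "no" for p in probabilities]


# Hypothetical predicted probabilities from a weak model: every value sits
# close to the overall "yes" rate of 0.65, so none is at or below 0.5.
predicted = [0.58, 0.61, 0.63, 0.65, 0.66, 0.68, 0.70, 0.72]

default_labels = classify(predicted)              # default cutoff of 0.5
raised_labels = classify(predicted, cutoff=0.65)  # cutoff moved to the base rate

print(default_labels.count("yes"), "of", len(predicted), "predicted yes at cutoff 0.5")
print(raised_labels.count("yes"), "of", len(predicted), "predicted yes at cutoff 0.65")
```

In SPSS itself the cutoff is set, as far as I know, with the CUT keyword on the CRITERIA subcommand of LOGISTIC REGRESSION (for example /CRITERIA=CUT(0.65)); check the syntax reference for your version before relying on this.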
Thanks for your help. Out of interest, what would be the best statistical procedure to yield a predictor for individual cases? I thought BLR would be the best for this sort of exercise.
----- Original Message ----
From: Hector Maletta <[hidden email]>
To: [hidden email]
Sent: Tuesday, 23 January, 2007 1:12:09 PM
Subject: Re: Help with Binary Logistic Regression

The prediction is made by SPSS using a cutoff point for the predicted probability, which by default is 0.50.
|
Logistic Regression is indeed usable as a (probabilistic) predictor for individual cases, but "probabilistic" is the crucial word. Probability is governed by the Law of Large Numbers, and anything it says about groups is subject to large margins of error when applied to specific individual cases.

Thus Winston Churchill, who smoked heavily, led a sedentary life in his mature years, was seriously overweight, was subject to years of constant occupational stress and short sleep hours, and drank half a bottle of brandy every day plus generous doses of other liquors, should have died before reaching 60 but managed to survive to almost 90 against all odds, while the inventor of aerobic exercise (I don't even remember his name now) died of a heart attack while exercising in his fifties: two fine examples of probability going awry when applied to individual cases. Take, however, 1000 people like Churchill or 1000 like the aerobic guy, and the odds will not fail.

Another example is Albert Einstein: barely passing high school, was judged not to be university material, and only made it to a vocational polytechnical school, ending up as a clerk in a patent office; in a meritocratic system based on SAT and A-level exams he would have been judged (as he was, on a less scientific basis) unfit for any kind of academic career. A predictor equation for adult academic success based on early scholastic achievements would have predicted a round zero for poor Albert, again showing the perils of applying probabilistic predictions to individuals.

Hector

_____
From: Cardiff Tyke [mailto:[hidden email]]
Sent: 23 January 2007 11:29
To: Hector Maletta; [hidden email]
Subject: Re: Help with Binary Logistic Regression

Thanks for your help. Out of interest, what would be the best statistical procedure to yield a predictor for individual cases?
|
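Hector's contrast between groups and individuals can be sketched with a small simulation. This is Python, not SPSS, and everything in it is assumed for illustration: a group of 10,000 cases that all share a predicted probability of 0.8, and a fixed random seed so the run is repeatable. It compares how well the probability describes the group as a whole with how often a blanket "event" prediction is wrong for individual cases.

```python
import random

random.seed(2007)  # fixed seed so the run is repeatable

P_EVENT = 0.8      # assumed predicted probability shared by every case
N_CASES = 10_000   # assumed group size

# Simulate whether each case actually experiences the event.
outcomes = [random.random() < P_EVENT for _ in range(N_CASES)]

# Group-level prediction: the share of events should be close to P_EVENT.
observed_share = sum(outcomes) / N_CASES

# Individual-level prediction: since p > 0.5, every case is predicted "event".
# That prediction is wrong for each case that escapes the event.
missed_share = 1 - observed_share

print(f"observed share of events: {observed_share:.3f}")
print(f"share of individual predictions that were wrong: {missed_share:.3f}")
```

Under these assumptions the group proportion lands close to 0.8, while roughly one individual prediction in five is wrong, which is the sense in which the odds hold for 1000 people but not for any one of them.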
At 01:35 PM 1/23/2007, Hector Maletta wrote:

>Another example is Albert Einstein: barely
>passing high school, was judged not to be
>university material, and only made it to a
>vocational polytechnical school,

Granted on a lot of counts, but his scientific education was at a higher level than that. The Eidgenössische Technische Hochschule Zürich, though the literal English of "Technische Hochschule" is "technical high school", is a high-level scientific institution. (In terms of the United States educational system, "Technische Hochschule" translates more or less as "institute of technology", as in "Massachusetts Institute of Technology.")

The ETH had apparently not reached that level when Einstein was there. (From the Wikipedia article on the ETH: "In 1909, the course program of the ETH was restructured to that of a real university, and the ETH was granted the right to award doctorates.") Though Einstein did study at the ETH, his doctorate (per the Wikipedia article on Einstein) was from the University of Zürich, in 1905.

We now return you to SPSS and statistical matters.
|
I stand corrected. The point was, however, that he was denied access to the Universität and had to go to the Hochschule, due to negative reports from his high school teachers. He was also a rather erratic teenager, e.g. taking a year out of school to wander on a bike through the country, not the usual mark of an overachiever.

By the way, I had been informed by a list member that the aerobism inventor did not die while exercising: his supposed death is apparently just another urban legend.

So much for my examples, of which only Winston Churchill survives. Fortunately, my point did not depend on those particular (and avowedly poorly researched) examples.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Richard Ristow
Sent: 23 January 2007 21:54
To: [hidden email]
Subject: Einstein (OT); was, re: Help with Binary Logistic Regression

Granted on a lot of counts, but his scientific education was at a higher level than that.
|
If you were referring to Jim Fixx, whose "Complete Book of Running" propelled jogging to new heights of popularity, he did die of a heart attack at 52. I remember reading the news stories at the time, and I have found no on-line resources to contradict it. If you're referring to someone else, I can't help restore your example.

David Wasserman
Custom Data Analysis and SPSS Programming

----- Original Message -----
From: "Hector Maletta" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, January 23, 2007 6:59 PM
Subject: Re: Einstein (OT); was, re: Help with Binary Logistic Regression

> By the way, I had been informed by a list member that the aerobism
> inventor did not die while exercising: his supposed death is apparently
> just another urban legend.
|
You're probably right. The urban legend refers to a Dr Cooper, introducer of aerobic exercise and still living, not to J. Fixx, who advocated jogging. I mistakenly wrote about aerobism when I should have written jogging. My example was right after all.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of David Wasserman
Sent: 24 January 2007 02:16
To: [hidden email]
Subject: Re: Einstein (OT); was, re: Help with Binary Logistic Regression

If you were referring to Jim Fixx, whose "Complete Book of Running" propelled jogging to new heights of popularity, he did die of a heart attack at 52.
|
I have a very simple question.
I want to add data for a subset of cases and a subset of variables to my master dataset, matched on subject ID (for which there is one row per subject in each dataset). What is the best way to do this?

Thank you in advance,
Stephanie Harrison
|
Stephanie,
Please do some reading in the syntax reference manual. Look at the following commands: MATCH FILES and UPDATE. Your message is a little unclear as to exactly what you have and what you want to do.

>>I want to add data for a subset of cases and a subset of variables to my master dataset

This could be a job for UPDATE if you want to change the values of existing variables for existing cases in the master dataset. But it could be other things; it depends on what exactly you mean.

Gene Maguin
|
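The two situations Gene distinguishes (changing values of existing variables versus adding new variables, both matched on subject ID) can be pictured with a small sketch. This is plain Python rather than SPSS syntax, it does not claim to reproduce what MATCH FILES or UPDATE actually do, and the datasets, variable names and the helper add_by_id are all invented for illustration.

```python
# Hypothetical master dataset: one row per subject, keyed by subject ID.
master = {
    101: {"age": 34, "score": 12},
    102: {"age": 51, "score": 9},
    103: {"age": 28, "score": 15},
}

# Hypothetical second dataset: a subset of cases and a subset of variables.
extra = {
    102: {"score": 11, "visit2": 1},  # an existing variable plus a new one
    103: {"visit2": 0},               # a new variable only
}


def add_by_id(master_rows, new_rows):
    """Return a copy of master_rows with values from new_rows merged in by ID.

    Values in new_rows overwrite same-named values in the master. Variables
    that are new are set to None for subjects who have no data for them.
    IDs that appear only in new_rows are ignored.
    """
    new_vars = {name for row in new_rows.values() for name in row}
    merged = {}
    for subject_id, row in master_rows.items():
        combined = dict(row)
        for name in new_vars:
            combined.setdefault(name, None)  # placeholder for missing data
        combined.update(new_rows.get(subject_id, {}))
        merged[subject_id] = combined
    return merged


merged = add_by_id(master, extra)
for subject_id, row in sorted(merged.items()):
    print(subject_id, row)
```

Whether overwriting existing values is wanted, and what should happen to subjects found only in the second dataset, are exactly the details Gene asks Stephanie to spell out; the choices made here are arbitrary.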
Stephanie,
You're in luck - I specialize in 'very simple questions'.

Firstly, Gene's suggestion is good. Secondly, I think I know what to do, but I don't know how to tell you to do it because there is nothing here for me to work with. If you can provide a very simplified example of the data you currently have, and another example of what you want that data to look like, then you'll be giving people here something to sink their teeth into.

:)
-Gary
|