|
Hi all,
I have a quick question about a logit regression. I've built a model that uses the log of the odds (staying in school vs. dropping out) as my dependent variable. It looks like a decent model (good R-squared), but what worries me is that there seems to be a slight pattern in the residuals. For the 12 data points I am using, I get about three residuals with a positive sign, three with a negative sign, then three more with a positive sign, etc. Does anyone know if this is a typical occurrence with a logit model, or if there's a better model I should use to avoid seeing this pattern in the residuals?

Thank you,
Alina Sheyman
|
Alina,
I don't quite understand what your problem really is. Logit (logistic) regression estimates the probability or the odds of an event as a function of one or more predictors, not the actual occurrence of the event in individual cases. As such, it should be read as an indicator of odds or probabilities for populations, not of occurrences for individuals. Nonetheless, it is customarily used to predict individual outcomes by means of a cut-off point, which often leads to confusion and debate (not least about what the cut-off point should be).

As the predicted probability (or log odds) goes up, the actual percentage of people with the outcome is of course expected to go up as well (or down, depending on the sign of the coefficients). Some people will not have the event, i.e. they have a value of zero, which is at or below the predicted probability of the outcome; others will have the event, i.e. a value of one, which is at or above the predicted probability. The individual "residuals" of the logit are in fact the actual outcome for each individual (0 or 1) minus the predicted value (the probability of the event for that individual, as a function of the predictors).

What you are encountering, apparently, is that your cases come in triads: as the log odds (or the probability of the event) go up, you find three cases without the event, then three with it, then another three without it, and so on. There is no reason for that, and it is probably a fluke or some quirk in the data. On the other hand, if that pattern held all along, the odds would not vary as a function of the predictors, since 0s and 1s would alternate in equal numbers (three of each), and the odds curve would be flat (the positives would equal the negatives all along the range of the logit function, except perhaps for the slight imbalance between the first three and the last three if the number of triads is an even number).
Perhaps I am dumber than usual today and am missing something else you are trying to say.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Alina Sheyman
Sent: 06 June 2007 11:11
To: [hidden email]
Subject: Log-it regression
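
As a rough check on whether a sign pattern like the one in the question is more than a fluke, a Wald-Wolfowitz runs test can be applied to the residual signs once the residuals are ordered by the predictor or by the fitted value. The sketch below is not from the thread: the function name `runs_test` and the example sign sequence (which simply mirrors the three-up, three-down pattern described) are assumptions for illustration, and the normal approximation is crude with only 12 points.

```python
import math

def runs_test(signs):
    """Wald-Wolfowitz runs test on a sequence of residual signs.

    `signs` holds positive and negative numbers (zeros are ignored).
    Returns (runs, expected_runs, z, two_sided_p), where z and p come
    from the large-sample normal approximation.
    """
    nonzero = [s for s in signs if s != 0]
    n_pos = sum(1 for s in nonzero if s > 0)
    n_neg = len(nonzero) - n_pos
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sign")

    # A run starts at the first value and at every change of sign.
    runs = 1 + sum(1 for a, b in zip(nonzero, nonzero[1:]) if (a > 0) != (b > 0))

    # Mean and variance of the number of runs under random ordering.
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)) / (n ** 2 * (n - 1.0))
    if variance <= 0:
        raise ValueError("too few observations for the normal approximation")

    z = (runs - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return runs, expected, z, p

# Hypothetical sign sequence mirroring the pattern described in the question.
signs = [+1, +1, +1, -1, -1, -1, +1, +1, +1, -1, -1, -1]
runs, expected, z, p = runs_test(signs)
print(f"runs={runs}, expected={expected:.1f}, z={z:.2f}, p={p:.3f}")
```

For this sequence the test counts 4 runs against an expected 7, a z of about -1.8. That is suggestive of fewer sign changes than random ordering would give, but far from conclusive with so few points.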
|
In reply to this post by Alina Sheyman-3
Sorry, I should have replied to the group.
I've been working with grouped data, too. The problem is that there are often nonlinearity and heteroskedasticity problems. The traditional approach has been to weight the data by the estimated variance and then run iterated regressions, adjusting the weights after each iteration until the estimates converge. Even then, the R-squared from the linear regression on the logit is not a good measure of fit at the proportion level. Generalized linear models with a logit link function are more robust. Starting with v.15, SPSS (advanced regression models) can fit these. If you are interested, I have a working paper, "Unpacking the Bar," that uses this approach on group Bar passage rates. The paper is up on SSRN: <http://ssrn.com/abstract=988429>.

Gary

At 10:24 AM 6/6/2007, you wrote:
>I'm using grouped data -
>Percent remained/percent dropped out
>
>On 6/6/07, Gary Rosin <[hidden email]> wrote:
>>Are you using individual data (with a binary dependent variable)
>>or grouped data (proportion dropping out)?
>>
>>Gary Rosin
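
The reweight-and-iterate idea described above can be sketched for a single predictor as iteratively reweighted least squares on grouped counts, which is a common way of fitting a binomial GLM with a logit link. This is a minimal illustration and not SPSS's implementation; the function names (`expit`, `fit_grouped_logit`, `pearson_residuals`) and the synthetic counts are assumptions made for the example, and it works from counts (number staying, number enrolled) rather than from percentages alone.

```python
import math

def expit(eta):
    """Inverse logit: map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_grouped_logit(x, successes, trials, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x to grouped binomial counts by IRLS."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Weighted sums for the normal equations of the working regression.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            eta = b0 + b1 * xi
            p = expit(eta)
            w = ni * p * (1.0 - p)        # binomial variance, used as the weight
            z = eta + (yi - ni * p) / w   # working response on the logit scale
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares system for the new coefficients.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        change = max(abs(new_b0 - b0), abs(new_b1 - b1))
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1

def pearson_residuals(x, successes, trials, b0, b1):
    """Residuals on the count scale, standardized by the binomial variance."""
    out = []
    for xi, yi, ni in zip(x, successes, trials):
        p = expit(b0 + b1 * xi)
        out.append((yi - ni * p) / math.sqrt(ni * p * (1.0 - p)))
    return out

# Synthetic example: 12 groups of 100, counts generated from known
# coefficients (0.5, 0.2) and rounded. These are NOT data from the thread.
xs = list(range(12))
ns = [100] * 12
ys = [round(n * expit(0.5 + 0.2 * xv)) for xv, n in zip(xs, ns)]
b0, b1 = fit_grouped_logit(xs, ys, ns)
print("coefficients:", b0, b1)
print("residual signs:",
      ["+" if r > 0 else "-" for r in pearson_residuals(xs, ys, ns, b0, b1)])
```

Because the weights come from the binomial variance of each group, this handles the heteroskedasticity directly. A persistent run pattern in the Pearson residuals would still point to nonlinearity in the predictor, which a squared or transformed term might address.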
