Log-it regression

Alina Sheyman-3
Hi all,
 I have a quick question about a logit regression. I've built a model that
uses the log of the odds (probability of staying in school vs. dropping
out) as my dependent variable. It looks like a decent model (good R-squared),
but what worries me is that there seems to be a slight pattern in the
residuals. For the 12 data points I am using, I get about three residuals
with a positive sign, three with a negative sign, then three more with a
positive sign, etc. Does anyone know whether this is typical of a logit
model, or whether there's a better model I should use to avoid seeing this
pattern in the residuals?

thank you,
Alina Sheyman
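
The sign pattern described here can be checked with a runs test on the residual signs. The Python sketch below is not from the thread: it implements the Wald-Wolfowitz runs test (normal approximation), the residual values are hypothetical and chosen only to reproduce the +++/---/+++/--- pattern, and the function names are illustrative.

```python
import math

def sign_runs(residuals):
    """Count runs of same-signed residuals; exact zeros are dropped."""
    signs = [r > 0 for r in residuals if r != 0]
    runs = 1 if signs else 0
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:
            runs += 1  # sign changed, so a new run starts
    return runs, signs.count(True), signs.count(False)

def runs_test(residuals):
    """Wald-Wolfowitz runs test on residual signs (normal approximation).

    Assumes both positive and negative residuals are present.
    """
    runs, n_pos, n_neg = sign_runs(residuals)
    n = n_pos + n_neg
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)) / (n * n * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical residuals with a +++ --- +++ --- sign pattern
resid = [0.2, 0.1, 0.3, -0.2, -0.1, -0.3, 0.1, 0.2, 0.1, -0.1, -0.2, -0.3]
runs, expected, z = runs_test(resid)
print(runs, expected, round(z, 2))  # 4 7.0 -1.82
```

A z of about -1.8 means fewer sign changes than chance would produce. With only 12 points the normal approximation is rough, but clustered residual signs like these often point to curvature or a missing predictor rather than to anything peculiar to the logit transform.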

Re: Log-it regression

Hector Maletta
         Alina,
         I don't quite understand what your problem really is.

         Logit (logistic) regression estimates the probability or the odds
of an event as a function of one or more predictors, not the actual
occurrence of the event in individual cases. As such, it should be used as
an indicator of odds or probabilities for populations, not of occurrences
for individuals. Nonetheless, it is customarily used to predict individual
outcomes by means of some cut-off point, and this often leads to some
confusion and debate (not least about what the cut-off point should be).
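
To illustrate this distinction, here is a minimal Python sketch that turns a linear predictor (the log odds) into a probability and then into a 0/1 prediction with a cut-off. The coefficients and both cut-off values (0.5 and 0.9) are hypothetical; the point is only that the same probability yields different individual predictions depending on where the cut-off is placed.

```python
import math

def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_class(eta, cutoff=0.5):
    """Turn the predicted probability into a 0/1 prediction using a cut-off."""
    return 1 if inv_logit(eta) >= cutoff else 0

# Hypothetical fitted model: log-odds = -2.0 + 0.5 * x
for x in (0, 4, 8):
    eta = -2.0 + 0.5 * x
    print(x, round(inv_logit(eta), 3), predict_class(eta), predict_class(eta, cutoff=0.9))
```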

         As the predicted probability (or log odds) goes up, the actual
percentage of people with the outcome is of course expected to go up as well
(whether that happens as the predictors rise or fall depends on the sign of
the coefficients). Some individuals will not have the event, i.e. they have a
value of zero, which is at or below the predicted or observed probability of
the outcome; others will have it, i.e. a value of one, which is at or above
the predicted or observed probability. The individual "residuals" of the
logit are in fact the actual outcome for each individual (0 or 1) minus the
predicted value (the probability of the event for that individual, as a
function of the predictors).
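
The residual definition above can be written out directly. In this Python sketch (hypothetical data and coefficients, illustrative function names) each residual is the observed 0/1 outcome minus the fitted probability, so cases without the event always get negative residuals and cases with it always get positive ones.

```python
import math

def inv_logit(eta):
    """Log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def response_residuals(y, x, intercept, slope):
    """Observed 0/1 outcome minus fitted probability, one value per case."""
    return [yi - inv_logit(intercept + slope * xi) for yi, xi in zip(y, x)]

# Hypothetical individual-level data and coefficients
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
res = response_residuals(y, x, intercept=-3.5, slope=1.0)
for xi, yi, r in zip(x, y, res):
    print(xi, yi, round(r, 3))
```

For grouped data the analogous quantity is the observed proportion minus the fitted proportion for each group.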

         What you are encountering, apparently, is that your cases come in
triads: as the log odds go up (or the probability of the event goes up), you
find three cases without the event, then three with it, then another three
without it, and so on. There is no particular reason for that, and it is
probably a fluke or some quirk in the data. On the other hand, if that were
the case all along, the odds would not vary as a function of the predictors,
since 0s and 1s would alternate in equal numbers (three of each in turn), and
the odds curve would be flat (since the positives would equal the negatives
all along the range of the logit function, except perhaps for a slight
imbalance between the first three and the last three cases if the number of
triads is even).
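
This argument can be checked numerically under the reading of the data as individual 0/1 outcomes arranged in alternating triads. The Python sketch below fits a logistic regression by Newton-Raphson (a standard maximum-likelihood method, not necessarily what SPSS does internally) to hypothetical data with one case per value of x, starting at x = 1. With an odd number of triads the pattern is symmetric about the middle case, so the fitted slope is zero (a flat curve); with an even number, such as the four triads that 12 cases would give, the imbalance at the ends produces a non-zero slope.

```python
import math

def fit_logit(x, y, iters=25):
    """Maximum-likelihood logistic regression (intercept + slope) via Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        # Gradient (g) and negated Hessian (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] * delta = [g0, g1]
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def triads(k):
    """Hypothetical outcomes in k alternating blocks of three: 0,0,0,1,1,1,0,0,0,..."""
    return [(i // 3) % 2 for i in range(3 * k)]

for k in (4, 5):
    y = triads(k)
    x = list(range(1, len(y) + 1))
    a, b = fit_logit(x, y)
    print(k, "triads: slope =", round(b, 4))
```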

         Perhaps I am dumber than usual today and am missing something else
you are trying to say.

         Hector


         -----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Alina Sheyman
Sent: 06 June 2007 11:11
To: [hidden email]
Subject: Log-it regression


Re: Log-it regression

Gary Rosin
In reply to this post by Alina Sheyman-3
Sorry, I should have replied to the group.

I've been working with grouped data, too. The problem is
that there are often problems with nonlinearity as well as
heteroskedasticity.

The traditional approach has been to weight the data
by the inverse of the estimated variance, then run iterated
regressions, adjusting the weights after each iteration until
the estimates converge. Even then, the R-squared for the linear
regression on the logits is not a good measure of fit at the
proportion level.
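
One way to read this traditional approach, as a Python sketch: regress the observed log-odds on a predictor by weighted least squares, using weights n*p*(1-p), which are the inverse of the approximate variance of a sample log-odds, 1/(n*p*(1-p)). The first pass uses the observed proportions and later passes use the fitted ones, until the coefficients stop changing. The data, group sizes, and function names are hypothetical, and the result need not coincide exactly with a maximum-likelihood fit.

```python
import math

def logit(p):
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def wls_line(x, z, w):
    """Weighted least-squares fit of z = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxz / sxx
    return zbar - b * xbar, b

def iterated_weighted_logit(x, prop, n, tol=1e-10, max_iter=100):
    """Regress observed log-odds on x, reweighting by n*p*(1-p) until stable.

    prop must lie strictly between 0 and 1 (log-odds are undefined at 0 or 1).
    """
    z = [logit(p) for p in prop]                        # response: observed log-odds
    w = [ni * p * (1.0 - p) for ni, p in zip(n, prop)]  # first pass: observed proportions
    a, b = wls_line(x, z, w)
    for _ in range(max_iter):
        fitted = [inv_logit(a + b * xi) for xi in x]
        w = [ni * p * (1.0 - p) for ni, p in zip(n, fitted)]  # weights from current fit
        a_new, b_new = wls_line(x, z, w)
        done = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b

# Hypothetical grouped data: proportion remaining in school and group size
x = [1, 2, 3, 4, 5, 6]
prop = [0.62, 0.70, 0.74, 0.81, 0.83, 0.88]
n = [120, 95, 130, 110, 90, 105]
print(iterated_weighted_logit(x, prop, n))
```

Groups with a proportion of exactly 0 or 1 have no finite log-odds; a common workaround is to add 0.5 to the event and non-event counts before taking the logit, but this sketch simply requires proportions strictly between 0 and 1.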

Generalized Linear Models, with a logit link function, are more
robust. Starting with version 15, SPSS (advanced regression models)
can fit these. If you are interested, I have a working paper,
"Unpacking the Bar", that applies this to grouped Bar passage rates.
The paper is up on SSRN: <http://ssrn.com/abstract=988429>.
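
For comparison, a binomial generalized linear model with a logit link fits the proportions themselves by maximum likelihood, with the binomial variance built into the weights, so no separate variance-weighting step is needed. The Python sketch below uses iteratively reweighted least squares, the textbook fitting algorithm for GLMs; it is not SPSS's implementation, and the data and function names are hypothetical. It also reports the binomial deviance, a common goodness-of-fit measure for such models.

```python
import math

def inv_logit(eta):
    """Log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def glm_binomial_logit(x, prop, n, tol=1e-10, max_iter=50):
    """Binomial GLM with logit link for grouped proportions, fitted by IRLS."""
    a = b = 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi, ni in zip(x, prop, n):
            eta = a + b * xi
            mu = inv_logit(eta)
            var = mu * (1.0 - mu)
            w = ni * var                # binomial working weight
            z = eta + (yi - mu) / var   # working response (finite even if yi is 0 or 1)
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations for the new (a, b)
        det = s_w * s_wxx - s_wx * s_wx
        a_new = (s_wxx * s_wz - s_wx * s_wxz) / det
        b_new = (s_w * s_wxz - s_wx * s_wz) / det
        done = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b

def deviance(x, prop, n, a, b):
    """Binomial deviance of the fit; terms of the form 0*log(0) are taken as 0."""
    dev = 0.0
    for xi, yi, ni in zip(x, prop, n):
        mu = inv_logit(a + b * xi)
        if yi > 0:
            dev += 2.0 * ni * yi * math.log(yi / mu)
        if yi < 1:
            dev += 2.0 * ni * (1.0 - yi) * math.log((1.0 - yi) / (1.0 - mu))
    return dev

# Hypothetical grouped data: proportion remaining in school and group size
x = [1, 2, 3, 4, 5, 6]
prop = [0.62, 0.70, 0.74, 0.81, 0.83, 0.88]
n = [120, 95, 130, 110, 90, 105]
a, b = glm_binomial_logit(x, prop, n)
print(a, b, deviance(x, prop, n, a, b))
```

With reasonably large groups, the deviance can be compared with a chi-square distribution on (number of groups - 2) degrees of freedom as a rough lack-of-fit check.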

Gary

At 10:24 AM 6/6/2007, you wrote:

>I'm using grouped data -
>Percent remained/percent dropped out
>
>On 6/6/07, Gary Rosin <[hidden email]> wrote:
>>Are you using individual data (with a binary dependent variable)
>>or grouped data (proportion dropping out)?
>>
>>Gary Rosin
>>
>>At 09:11 AM 6/6/2007, you wrote: