Hi, I am a complete beginner in statistics. I am working on a research project that applies logistic regression. I have already entered all the data into SPSS and finished the coding. I presume I should start by testing the assumptions?
So, I tested the linearity of the logit. I created the log of each continuous IV and ran it using binary logistic regression, but the output showed this warning:
Warning # 602
>The argument for the natural log function is less than or equal to zero.
>The result has been set to the system-missing value.
The p-values all came out as 0.999 or 1.0. My missing cases were 740 out of a total sample of 800! I think this is a problem of zero cells? How should I deal with it? I certainly cannot delete the cases; there are so many!
My objective is to study the existence of the management committee (SRMC) among public companies.
DV: SRMC (0 or 1)
IV:
1) INDDIR - continuous
2) INDCHAIR - categorical
3) BRDSIZE - continuous
4) DIRSHIP - continuous
5) MEETING - continuous
6) EXPERT - continuous
7) INSTI - continuous
8) DEBT - continuous
9) SIZE - continuous
10) BIG4 - categorical
Many thanks! |
You cannot log-transform values equal to 0 or less. Re this method you're using to test for "linearity of the logit", do you have references to support it? See the suggestion here about how to test it: http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node54.html I.e., go ahead and run the model, then see how well it fits. HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
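The scale of the missing-data problem can be checked directly. LN() returns system-missing for any value at or below zero (this is what Warning # 602 reports), and LOGISTIC REGRESSION excludes a case when any variable in the model is missing, so zeros scattered across several predictors could plausibly account for the 740 excluded cases. A minimal sketch, assuming the variable names listed in the original post are the actual names in the data file; n_nonpos is a hypothetical helper variable:

```spss
* Minimum and maximum of each continuous predictor (names assumed from the post).
DESCRIPTIVES VARIABLES=INDDIR BRDSIZE DIRSHIP MEETING EXPERT INSTI DEBT SIZE
  /STATISTICS=MIN MAX.

* n_nonpos (hypothetical name) counts, per case, how many predictors are <= 0.
COUNT n_nonpos = INDDIR BRDSIZE DIRSHIP MEETING EXPERT INSTI DEBT SIZE (LOWEST THRU 0).
FREQUENCIES VARIABLES=n_nonpos.
```

Every case with n_nonpos above 0 would receive at least one system-missing log term and would therefore drop out of a model that includes those terms.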
One approach would be to compare the fit of the model that assumes linearity (logit(p) = b0 + b1*X1 + b2*X2 + ... + bk*Xk) against the fit of a model that does not make such an assumption (including polynomials of the predictors).
Ryan |
In reply to this post by lcl23
It has to be done after fitting the model. As Ryan has suggested, fit the model assuming linearity. Then fit another model that does not assume linearity, e.g., a model with both the linear and quadratic terms. Does the -2LL value change significantly from one model to the next? If it does, then you have evidence against the linearity assumption.
You have many variables, but let's simplify it to a single continuous predictor variable.
Model 1: logit(p) = b0 + b1*X1
Model 2: logit(p) = b0 + b1*X1 + b2*X1^2
Model 1 constrains the relationship between X1 and logit(p) to be linear. Model 2 allows it to be curvilinear (with one change in direction). If Model 2 fits significantly better than Model 1, then you do not want to force a linear fit. The test for improvement in fit is a chi-square test on the change in -2LL from Model 1 to Model 2, with df = the difference in the number of model parameters (here, df = 1). If this all sounds like Greek to you, you need to do some more background reading. David Garson's StatNotes might be a good place to start. HTH.
--
Bruce Weaver |
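Applied to one of the original poster's predictors, the Model 1 versus Model 2 comparison might look like the sketch below. It assumes SRMC and BRDSIZE are the actual variable names; BRDSIZE_sq is a hypothetical name for the squared term. It follows the single-predictor simplification, so in a real analysis the other predictors would presumably appear in both blocks.

```spss
* BRDSIZE_sq is a hypothetical name for the squared term.
COMPUTE BRDSIZE_sq = BRDSIZE**2.
EXECUTE.

* Block 1 enters the linear term and Block 2 adds the quadratic term.
LOGISTIC REGRESSION VARIABLES SRMC
  /METHOD=ENTER BRDSIZE
  /METHOD=ENTER BRDSIZE_sq.
```

With two METHOD subcommands, the "Omnibus Tests of Model Coefficients" table for Block 2 should report, in its Block row, the chi-square for the change in -2LL between the two models, here with df = 1.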
In reply to this post by lcl23
Another method to test the assumption of linearity in the logit is to use the Box-Tidwell transformation. This involves adding a term of the form X*ln(X) to the equation, which requires X > 0. If the coefficient for this variable is statistically significant, there is evidence of nonlinearity in the relationship between logit(Y) and X.
~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat(ASA)
Professor
Wayne State University School of Medicine
Email: [hidden email]
Tel: 313-993-8085 |
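A Box-Tidwell term can be built with a guard so that non-positive values do not trigger Warning # 602. This is only a sketch for a single predictor, assuming SRMC and BRDSIZE are the actual variable names; BRDSIZE_bt is a hypothetical name.

```spss
* BRDSIZE_bt is a hypothetical name for the Box-Tidwell term.
* The IF guard leaves the term system-missing when BRDSIZE is 0 or less.
IF (BRDSIZE > 0) BRDSIZE_bt = BRDSIZE * LN(BRDSIZE).
EXECUTE.

* A significant coefficient for BRDSIZE_bt is evidence of nonlinearity in the logit.
LOGISTIC REGRESSION VARIABLES SRMC
  /METHOD=ENTER BRDSIZE BRDSIZE_bt.
```

Cases with BRDSIZE at or below zero are still excluded from this model, so for predictors that legitimately take the value zero, the polynomial comparison suggested elsewhere in this thread may be the more practical check.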
In reply to this post by lcl23
Are you familiar with SPSS syntax?
Ryan
On Tue, Jan 11, 2011 at 9:33 AM, lcl23 <[hidden email]> wrote:
|
In reply to this post by lcl23
In this post I provide SPSS code that simulates data that approximate the binary logistic regression equation,
logit(p) = -1.5 + 0.9*X + 0.2*X^2
At the end of the code, note that I fit two logistic regression models via the LOGISTIC REGRESSION procedure. The deviance difference test which Bruce referred to in the previous post is outputted from the LOGISTIC REGRESSION procedure. Results from the deviance difference test are located in the first row of the "Omnibus Tests of Model Coefficients" Table. Clearly the assumption of linearity does not hold, but we already knew that from the simulation code, didn't we? :)
HTH,
Ryan
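* Generate data.
* Note: the archived post is truncated.
* Commands marked "reconstructed" are inferred from the equation and text above.
input program.
loop ID = 1 to 1000.
* Reconstructed: coefficients, linear predictor and probability.
compute b0 = -1.5.
compute b1 = 0.9.
compute b2 = 0.2.
compute x = rv.normal(0,1).
compute eta = b0 + b1*x + b2*x**2.
compute prob = exp(eta) / (1 + exp(eta)).
compute y = rv.bernoulli(prob).
end case.
* Reconstructed: close the loop and the input program.
end loop.
end file.
end input program.
execute.
Delete variables b0 b1 b2 eta prob.
COMPUTE x_squared = x**2.
* Reconstructed: block 1 fits the linear model and block 2 adds the quadratic term.
LOGISTIC REGRESSION VARIABLES y
  /METHOD=ENTER x
  /METHOD=ENTER x_squared.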
|
In reply to this post by SR Millis-3
My favorite method for testing the assumption of linearity in the logit is the one proposed by Hosmer and Lemeshow on page 96 of their 1989 book. You stratify the variable into quartiles and enter it in the logistic model as a categorical variable. If the coefficients (log ORs) across the quartiles increase or decrease in a roughly linear fashion, then the assumption is met. Yes, the other methods are simpler, but I take comfort in actually seeing the linearity.
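The quartile approach can be sketched as follows for a single predictor. The variable names SRMC and BRDSIZE are taken from the original post and assumed to match the data file; BRDSIZE_q is a hypothetical name for the quartile variable.

```spss
* BRDSIZE_q (hypothetical name) holds the quartile group, coded 1 to 4.
RANK VARIABLES=BRDSIZE (A)
  /NTILES(4) INTO BRDSIZE_q
  /PRINT=NO
  /TIES=MEAN.

* Enter the quartile variable as categorical, with the lowest quartile as reference.
LOGISTIC REGRESSION VARIABLES SRMC
  /METHOD=ENTER BRDSIZE_q
  /CONTRAST (BRDSIZE_q)=INDICATOR(1)
  /PRINT=CI(95).
```

The B and Exp(B) values for the second, third and fourth quartiles can then be inspected, or plotted against the quartile midpoints, to see whether they change steadily. For a predictor with many tied values the four groups may turn out unequal in size.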
|
Question from the original poster:
"I thought examining the linearity of logit assumption is one of the requirement for logistic? Actually it should be done before or after the fitting of model? I am too confuse on what to do!"
Notice that all of the methods that have been proposed require you to fit the model first, and then observe how well it fits (sometimes in comparison to an alternative model). HTH.
--
Bruce Weaver |
In reply to this post by SR Millis-3
[Attachment: OUTPUT.PDF (59K)] |
How many cases do you have, and how many fall into each category of the outcome variable? One rule of thumb is that, to avoid over-fitting the model, you should have 15-20 'events' per model parameter, where the number of events is the number of cases in the outcome category with the lower frequency. You have 12 explanatory variables, plus interaction terms, and 25 parameters in total (including the constant). So you would need at least 750 cases, assuming 50% Yes and 50% No on the outcome variable (25 parameters * 15 events * 2 categories). See Mike Babyak's nice readable article for more information on over-fitting: http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf HTH.
--
Bruce Weaver |
How many fall into each category of the outcome (dependent) variable?
--
Bruce Weaver |
The rule of thumb I mentioned earlier says that in order to avoid over-fitting, you should have 15-20 events per model parameter. You have approximately 120 events (i.e., 15% of 797). Therefore, your model should have about 8 parameters at most. The model you posted has 25 parameters. You either need more data, or fewer parameters in your model.
HTH.
--
Bruce Weaver |
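The event count behind this rule of thumb can be read from a frequency table of the outcome, and the arithmetic is short enough to write out. A sketch, assuming SRMC is the actual name of the outcome variable and using the approximate figures quoted in the post above:

```spss
* Count the cases in each outcome category.
* The smaller of the two counts is the number of events.
FREQUENCIES VARIABLES=SRMC.

* Arithmetic with the approximate figures from the post above.
* Events: 0.15 x 797 = 119.55, or about 120.
* At 15 events per parameter: 120 / 15 = 8 parameters at most.
* At 20 events per parameter: 120 / 20 = 6 parameters at most.
```

By the same rule, the 25-parameter model would call for at least 25 x 15 = 375 events, roughly three times the number available.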