problem with casewise list and classification plot in logistic regression

lcl23
I am using binary logistic regression to test this model:

logit(ρ1) = α + β1(INDDIR) + β2(INDCHAIR) + β3(BOARDSIZE) + β4(DIRSHIP) + β5(MEETING) + β6(EXPERT) + β7(INSTI) + β8(DEBT) + β9(LnSIZE) + β10(BIG4)

I have attached the output of the 2nd run. I have some doubts about the casewise list (outliers). The first time I ran the analysis, there were 44 outliers. I deleted all 44 cases and re-ran the analysis. This time, 11 outliers appeared. I deleted all 11 cases and re-ran it. Yet again, 6 more outliers were found!

So, should I keep deleting outliers, or will this be a never-ending story?
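
One possible reason the list keeps refilling can be sketched under an assumption: suppose the casewise list flags cases whose standardized (Pearson) residual is larger than 2 in absolute value. That cutoff is assumed here, not read from the attached output, and the helper name `pearson_residual` is purely illustrative. With a rare outcome, a case in the rare category gets a small fitted probability and therefore a large residual:

```python
import math

def pearson_residual(y: int, p: float) -> float:
    """Pearson residual for one case: (observed - fitted) / binomial SD."""
    return (y - p) / math.sqrt(p * (1.0 - p))

# For an observed event (y = 1) the residual equals sqrt((1 - p) / p),
# which is above 2 whenever the fitted probability p is below 0.2.
for p in (0.03, 0.10, 0.19, 0.25, 0.50):
    r = pearson_residual(1, p)
    status = "flagged" if abs(r) > 2 else "not flagged"
    print(f"fitted p = {p:.2f}  residual = {r:5.2f}  {status}")
```

If that assumption holds, deleting the flagged cases mostly removes cases from the rare category. The outcome then becomes rarer still, the fitted probabilities of the remaining rare cases would tend to drop, and new cases can cross the cutoff on the next run.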

Also, does the classification plot look weird? I am not sure how to interpret it.

Please help. Thanks.

OUTPUT.PDF
Re: problem with casewise list and classification plot in logistic regression

Bruce Weaver
Administrator
You have other more serious problems to deal with here.  The classification table in your output gives this frequency distribution for your outcome variable:

732 -- No Combined RMC
 21 -- Combined RMC

So you are severely over-fitting the model.  According to the rule of thumb given in Frank Harrell's book (Regression Modeling Strategies, Springer), you should have at least 15-20 events per model parameter.  The number of "events" is the count in the outcome variable category with the lower frequency, so you have 21 events, while your model has 11 parameters (10 explanatory variables plus the constant).  Therefore, you can only really include one (single degree of freedom) explanatory variable.  (Bear in mind that the constant counts as a parameter.)
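
The arithmetic behind that rule of thumb can be sketched as follows. The counts are the ones from the classification table above; the function name `max_parameters` is made up for illustration, and 15 and 20 are simply the two ends of the quoted range.

```python
def max_parameters(n_no: int, n_yes: int, events_per_parameter: int) -> int:
    """Largest number of parameters the rule of thumb allows."""
    events = min(n_no, n_yes)  # count in the less frequent outcome category
    return events // events_per_parameter

# Counts from the classification table: 732 "No Combined RMC", 21 "Combined RMC"
for epp in (15, 20):
    print(f"{epp} events per parameter -> at most {max_parameters(732, 21, epp)} parameter(s)")

# The posted model has 10 slopes plus a constant, i.e. 11 parameters
print(f"events per parameter in the posted model: {21 / 11:.2f}")
```

Both ends of the range allow a single parameter, and the posted model sits at roughly 1.9 events per parameter, far below the recommended 15-20.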

For more information on over-fitting, see Mike Babyak's nice readable article.

   http://www.class.uidaho.edu/psy586/Course%20Readings/Babyak_04.pdf

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: problem with casewise list and classification plot in logistic regression

lcl23
I need some time to find a solution to this overfitting problem. By the way, slightly off topic: what is the solution if my Hosmer and Lemeshow goodness-of-fit test is significant (p<0.05)? There is no multicollinearity issue; VIF and tolerance are OK.
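
For context on what a significant result means, here is a minimal sketch of how the Hosmer-Lemeshow statistic is commonly computed: cases are sorted by fitted probability, split into groups, and the observed number of events in each group is compared with the number the model expects. This is not claimed to match SPSS's exact grouping (ties in the fitted probabilities may be handled differently), the function name `hosmer_lemeshow` is illustrative, and the sketch assumes at least as many cases as groups and fitted probabilities strictly between 0 and 1.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom) for a Hosmer-Lemeshow-style test.

    y: observed 0/1 outcomes; p: fitted probabilities from the model.
    """
    pairs = sorted(zip(p, y))  # order cases by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(yi for _, yi in chunk)  # events seen in this group
        expected = sum(pi for pi, _ in chunk)  # events the model predicts
        size = len(chunk)
        # Contributions from the event cell and the non-event cell
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat, groups - 2
```

The statistic is usually referred to a chi-square distribution with `groups - 2` degrees of freedom. A large value, and hence a small p-value, indicates that the fitted probabilities do not match the observed event rates in at least some groups.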

--- In reply to Bruce Weaver [via SPSSX Discussion], Tuesday, 25 January, 2011, 4:58 AM

Re: problem with casewise list and classification plot in logistic regression

lcl23
In reply to this post by Bruce Weaver
Perhaps you will get a clearer picture after looking at the attached output. Basically, it is the same set of independent variables with a different dependent variable (SRMC).

--- Following Chui Ling's message of Tuesday, 25 January, 2011, 11:28 AM



Output - SRMC.spo (110K)