Hi fellow listers,
I am conducting an analysis for a trade association and am trying to identify the events and activities that lead to renewal or rejection of membership in the association (a binary measure), so I am using binary logistic regression. The independent variables are each respondent's record of activity in various association events, and I am trying to build a model that correctly classifies their renewal status.

My question relates to setting an appropriate cutoff value. The default cutoff value is .5, and, as expected, the overall model and the variables identified do not change regardless of this value. With the default cutoff, I correctly classify renewals (94%) but not rejecters (55%). The actual proportion of those not renewing in the data set is relatively low, at 20%, so I am more interested in identifying activities that predict rejection than renewal.

Setting a higher cutoff value (say .7 or .8) produces better correct classification of rejecters, with a slight penalty in the ability to correctly classify renewals. This is also expected, and it seems a reasonable trade-off, since momentum generally leads to a renewal. I could argue that a cutoff value approaching 1.0 is the most useful, since we want to identify activities that lead to rejection, but this seems to penalize the correct classification of renewals a bit too heavily.

Any thoughts or points of view on approaches to setting cutoff values with a skewed distribution like this are appreciated.

Many thanks,
Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
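The trade-off Bob describes can be reproduced with a few lines of code that mimic a classification table. This is a minimal sketch, not SPSS output: the probabilities are invented, and it assumes the saved value is the predicted probability of *renewal* (renewal coded 1), with a case classified as a renewal only when that probability exceeds the cutoff. Under that assumption, raising the cutoff moves borderline cases into the "rejecter" column.

```python
# Hypothetical sketch: the probabilities below are invented, not the thread's data.
# Assumes each value is the saved predicted probability of RENEWAL (renewal coded 1),
# and that a case is classified as a renewal when that probability exceeds the cutoff.

def classification_rates(p_renewal, renewed, cutoff):
    """Return (% of renewals correctly classified, % of rejecters correctly classified)."""
    renew_hits = renew_total = reject_hits = reject_total = 0
    for p, y in zip(p_renewal, renewed):
        predicted_renewal = p > cutoff
        if y == 1:
            renew_total += 1
            renew_hits += int(predicted_renewal)
        else:
            reject_total += 1
            reject_hits += int(not predicted_renewal)
    return 100 * renew_hits / renew_total, 100 * reject_hits / reject_total


# Ten invented members: eight renewals and two rejecters (20% rejecting).
p_renewal = [0.97, 0.95, 0.92, 0.90, 0.88, 0.85, 0.80, 0.72, 0.66, 0.40]
renewed = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

for cutoff in (0.5, 0.75, 0.9):
    renew_pct, reject_pct = classification_rates(p_renewal, renewed, cutoff)
    print(f"cutoff {cutoff:.2f}: renewals {renew_pct:.1f}% correct, "
          f"rejecters {reject_pct:.1f}% correct")
```

With these made-up numbers the rates move from 100/50 at .5 to 87.5/100 at .75 and 37.5/100 at .9: the same pattern as in the post, including the steep penalty on renewals as the cutoff nears 1.0.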
Save the predicted probability from the regression equation and then plot it using an ROC curve. The plot shows the % true positives (sensitivity) on the Y axis against the % false positives (1 minus specificity) on the X axis, with one point for every possible cutoff.
An ROC plot typically gives a curve (see https://andrewpwheeler.wordpress.com/2015/03/09/roc-and-precision-recall-curves-in-spss/), so there is no single obvious cut-off. Where the cut-off should be placed depends on the application, that is, on what counts as a reasonable trade-off between the costs of false positives and false negatives. See https://andrewpwheeler.wordpress.com/2015/05/27/how-wide-to-make-the-net-in-actuarial-tools-false-positives-versus-false-negatives/ for some example discussion.
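To make the ROC idea and the cost trade-off concrete, here is a small sketch in plain Python rather than SPSS. It traces the (false-positive rate, true-positive rate) pairs over every distinct cutoff, computes the area under the curve by the trapezoid rule, and then picks the cutoff that minimizes a weighted count of errors. The scores, the 1:5 cost weights, and all function names are hypothetical; "positive" means a rejecter, and the score is assumed to be the predicted probability of rejection (1 minus the saved probability of renewal).

```python
# Hypothetical sketch with invented scores. "Positive" = rejecter, and the score is the
# predicted probability of rejection (1 minus the saved probability of renewal).

def roc_points(scores, is_positive):
    """(false-positive rate, true-positive rate) for each distinct cutoff,
    flagging a case as positive when its score >= cutoff."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]  # a cutoff above every score flags nobody
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, is_positive) if y and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, is_positive) if not y and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def cheapest_cutoff(scores, is_positive, cost_fp, cost_fn):
    """Return (cutoff, total cost) minimising cost_fp * FP + cost_fn * FN."""
    best = None
    for cutoff in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, is_positive) if not y and s >= cutoff)
        fn = sum(1 for s, y in zip(scores, is_positive) if y and s < cutoff)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (cutoff, cost)
    return best


# Invented data: two rejecters (1) and eight renewers (0).
scores = [0.60, 0.34, 0.40, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.03]
is_rejecter = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

points = roc_points(scores, is_rejecter)
print(f"AUC = {auc(points):.4f}")
# Assumed costs: missing a rejecter is 5 times as costly as flagging a renewer.
print(cheapest_cutoff(scores, is_rejecter, cost_fp=1, cost_fn=5))
```

The AUC summarizes the whole curve, while the cutoff only becomes definite once the costs are written down; changing `cost_fp` and `cost_fn` moves the chosen point along the same curve.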
Andy, much appreciated. Yes, I have run several ROC curves at different cutoff values. They are helpful up to a point; knowing the specific application is probably my best guide.
Many thanks,
Bob Walker
Surveys & Forecasts, LLC
https://www.safllc.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Andy W
Sent: Wednesday, August 23, 2017 9:31 AM
To: [hidden email]
Subject: Re: Cutoff Values in Binary Logistic Regression
You might also want to try other classification methods such as SVM, which is available as the STATS SVM extension command. It allows you to specify a misclassification cost factor and finds the best model taking that into account.
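The cost-factor idea can also be applied to the logistic model itself, without switching methods. The sketch below is not the STATS SVM command; it is the standard expected-cost decision rule for a model whose predicted probabilities are assumed to be well calibrated. If missing a rejecter costs `cost_ratio` times as much as wrongly flagging a renewer, a member should be flagged when P(renewal) falls below cost_ratio / (1 + cost_ratio). The function names are hypothetical.

```python
# Hypothetical sketch (not the STATS SVM command): the standard expected-cost rule for
# a model that outputs calibrated probabilities. cost_ratio = cost of missing a
# rejecter / cost of wrongly flagging a renewer.

def renewal_cutoff(cost_ratio):
    """Cutoff on P(renewal) below which a member should be flagged as a likely rejecter."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # Flag when P(reject) * cost_ratio > P(renew) * 1, i.e. P(renew) < ratio / (1 + ratio).
    return cost_ratio / (1 + cost_ratio)


def implied_cost_ratio(cutoff):
    """Cost ratio implicitly assumed by a given cutoff on P(renewal)."""
    if not 0 < cutoff < 1:
        raise ValueError("cutoff must lie strictly between 0 and 1")
    return cutoff / (1 - cutoff)


for c in (0.5, 0.7, 0.8, 0.95):
    print(f"cutoff {c:.2f} treats a missed rejecter as "
          f"{implied_cost_ratio(c):.1f}x as costly")
```

Read in reverse, the default .5 cutoff treats both errors as equally costly, .7 implies roughly 2.3 to 1 and .8 implies 4 to 1. As the cutoff approaches 1.0 the implied ratio grows without bound, which is one way to see why a cutoff near 1.0 penalizes renewals so heavily.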
What is your purpose? Getting renewals? You might want to send a cheap email reminder to almost everyone, and save the expensive approaches for a smaller, targeted audience.
Unless you have a single well-defined purpose, the ROC curve is the "reduced form" of the data that preserves the information you have on hand.
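One way to act on the tiered suggestion without fixing a probability cutoff in advance is to let the outreach budget set it: rank members by predicted risk and give the expensive contact to as many as the budget allows. This is a minimal sketch; the member IDs, probabilities, and `call_budget` are invented, and for simplicity everyone (rather than "almost everyone") receives the email.

```python
# Hypothetical sketch: member IDs, probabilities and the call budget are invented.
# Everyone gets the cheap email; only the `call_budget` members with the highest
# predicted probability of rejection also get the expensive contact.

def plan_outreach(members, call_budget):
    """members: list of (member_id, p_reject). Returns {member_id: action}."""
    ranked = sorted(members, key=lambda m: m[1], reverse=True)
    called = {member_id for member_id, _ in ranked[:call_budget]}
    return {member_id: ("email + call" if member_id in called else "email")
            for member_id, _ in members}


members = [("A", 0.05), ("B", 0.60), ("C", 0.30), ("D", 0.10)]
print(plan_outreach(members, call_budget=2))
```

Because Python's sort is stable, members tied at the budget boundary are taken in their input order; in practice the ranking only needs to be good near the top, which is the part of the ROC curve that matters for this use.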
-- Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Bob Walker <[hidden email]>
Sent: Wednesday, August 23, 2017 8:30:19 AM
To: [hidden email]
Subject: Cutoff Values in Binary Logistic Regression
Hi Rich,
Thanks for your input. The purpose is to (hopefully) identify the variables most predictive of both renewals and resignations among the association's member companies. The database itself is relatively small: about 200 companies. The variables record the number of people at each company who attended various events over the past 24 months (repeated measures per year; for example, attendance at the annual conference, webinars, etc.). After some experimentation, binary logistic regression with slightly higher cutoff values (.6 or .7) does a good job of classifying these groups, with more than 85% correctly classified. We're also looking to develop models by member type; the regression results there are even stronger because the reasons for renewal are specific to each member type. You, Jon, and Andy all suggested adding the ROC analysis; I will use the AUC values to further help identify the variables the association might focus on first.
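Using AUC to prioritize variables can be done one predictor at a time, via the rank (Mann-Whitney) form of the AUC: the probability that a randomly chosen rejecter attended less than a randomly chosen renewer, with ties counted as one half. This is a sketch with invented variable names and counts. A single-variable AUC ignores the other predictors and the year-by-year repeated measures, so it is a screening summary alongside the multivariable model, not a replacement for it.

```python
# Hypothetical sketch: variable names and attendance counts are invented.
# AUC here = probability that a randomly chosen rejecter attended LESS than a randomly
# chosen renewer (ties count one half); 0.5 means the variable does not separate the groups.

def auc_low_means_reject(values, is_rejecter):
    rejecters = [v for v, y in zip(values, is_rejecter) if y]
    renewers = [v for v, y in zip(values, is_rejecter) if not y]
    wins = sum(1.0 if r < n else 0.5 if r == n else 0.0
               for r in rejecters for n in renewers)
    return wins / (len(rejecters) * len(renewers))


is_rejecter = [1, 1, 0, 0, 0, 0]
attendance = {
    "annual_conference": [0, 1, 1, 2, 3, 2],
    "webinars": [2, 0, 1, 0, 3, 1],
}

ranking = sorted(((name, auc_low_means_reject(vals, is_rejecter))
                  for name, vals in attendance.items()),
                 key=lambda item: item[1], reverse=True)
for name, value in ranking:
    print(f"{name}: AUC {value:.4f}")
```

Attendance counts produce many ties (lots of zeros), which is why the half-credit rule matters here; without it, tied pairs would be scored as failures and every AUC would be biased downward.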