Hello all,
My question regards Scoring Wizard results on a dataset with binary logistic regression. I have a binary logistic model built on a 50% balanced sample of approx. 500 outcomes each of 0 = Sold and 1 = Not Sold, taken from a larger dataset. I did so based on literature supporting this sampling method when predicting rare outcomes.

After building the model I saved it as an XML file under the Save option and applied it through the Scoring Wizard to a unique set of cases from the larger dataset. I saved the predicted probabilities and predicted category. The prediction strength is similar to the balanced sample; however, my predicted probabilities have a minimum value of 50% instead of the expected 0% as seen in the balanced sample model. The larger test data has an outcome distribution of 90% = Not Sold and 10% = Sold, with the goal of predicting Sold.

Can anyone shed light on why the probabilities have been affected?

Sent from my iPhone
--
This email may contain confidential information for the sole use of the intended recipient(s). If you are not an intended recipient, please notify the sender and delete all copies immediately.
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Don't have any technical explanation but some comments nonetheless:
Of course, the predicted probabilities are not bound to the outcome distribution observed in the original data. The minimum of 0.5 in the new data could be due to some bias in the new data that skews it towards a higher likelihood of cases. Perhaps compare the distribution of the model's input variables across both datasets. If the probabilities are bounded below by 0.5, what cutoff point are you using to achieve a 90/10% predicted split? (It can't be the default 0.5 cutoff, or else you'd have 100% predicted positive.)
On Wednesday, 25 February 2015, Peter Spangler <[hidden email]> wrote: Hello all, |
Thank you Jignesh,

The distribution of the inputs was my first exploration. They are similar, as they appear below. As a test I computed the z statistic and the probability, where P(y) = 1 / (1 + exp(-z)), in the Test Data based on the beta coefficients noted below. The probabilities calculated through Transform > Compute Variable gave me very different results than the scoring wizard and were NOT bound at 50%.

COMPUTE z=(6.08668)+(-0.0642*BounceRate)+(.41651*FreeShipping)+(-1.51439*QtyOne)+(.57765*Sale)+(-.33814*log_price).
EXECUTE.
COMPUTE Probability_z=1/(1+EXP(-z)).
EXECUTE.

              Train Data   Test Data   Beta
BounceRate    83.33        90.91       .0642
log_Price     3.53         3.66        -.33814
FreeShipping  66%/34%      61%/39%     .41651
QtyOne        13%/87%      23%/77%     -1.51439
Sale          15%/85%      11%/89%     .57765
Constant                               6.08668

On Wed, Feb 25, 2015 at 1:52 PM, Jignesh Sutar <[hidden email]> wrote: Don't have any technical explanation but some comments nonetheless:
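For readers without SPSS at hand, the manual check described in the message above can be sketched in Python. This is only an illustration, not the Scoring Wizard's own computation. The coefficients are copied from the COMPUTE statement (which uses -0.0642 for BounceRate, whereas the table lists .0642; the sketch follows the COMPUTE statement). The example case and the 0/1 coding of the indicator variables are assumptions made for illustration.

```python
import math

# Coefficients copied from the COMPUTE statement in the post.
COEFFS = {
    "BounceRate": -0.0642,
    "FreeShipping": 0.41651,
    "QtyOne": -1.51439,
    "Sale": 0.57765,
    "log_price": -0.33814,
}
INTERCEPT = 6.08668


def logit_score(case):
    """Linear predictor z = intercept + sum(beta_i * x_i)."""
    return INTERCEPT + sum(COEFFS[name] * case[name] for name in COEFFS)


def probability(z):
    """Logistic transform P(y = 1) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical case: training-sample means for the continuous inputs,
# and an assumed 0/1 coding for the three indicator variables.
example_case = {
    "BounceRate": 83.33,
    "FreeShipping": 1,
    "QtyOne": 0,
    "Sale": 0,
    "log_price": 3.53,
}

z = logit_score(example_case)
p = probability(z)
print(f"z = {z:.4f}, P(y = 1) = {p:.4f}")
```

A probability computed this way ranges over the whole interval from 0 to 1, which matches the observation that the hand-computed values were not bounded at 50%.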
In addition, I am seeing Predicted Probabilities from the Scoring Wizard that are above 50% but with the Predicted Category of 0. I did not alter the Cutoff Point in the Binary Logistic Regression window before saving the model as an XML file.

On Wed, Feb 25, 2015 at 2:39 PM, Peter Spangler <[hidden email]> wrote:
Hi Peter,
In the scoring wizard, are you selecting "Probability of predicted value" or "Probability of selected value"? My guess is that you are selecting the former, but you want the latter.

http://www-01.ibm.com/support/knowledgecenter/SSLVMB_22.0.0/com.ibm.spss.statistics.help/spss/base/idh_scoring_wizard_select_expressions.htm

Alex

From: Peter Spangler <[hidden email]>
Date: 02/25/2015 05:56 PM
Subject: Re: Scoring wizard
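The distinction Alex draws would account for both symptoms reported earlier in the thread. A rough Python sketch follows, assuming that "Probability of predicted value" means the probability of whichever category the model predicted and that the default 0.5 cutoff is in effect. The function name and the scores are hypothetical, chosen only for illustration.

```python
def probability_of_predicted_value(p_target, cutoff=0.5):
    """Return (predicted category, probability of that predicted category).

    p_target is the model's probability for category 1.
    """
    predicted = 1 if p_target >= cutoff else 0
    # Probability attached to the category that was actually predicted.
    prob = p_target if predicted == 1 else 1.0 - p_target
    return predicted, prob


# Hypothetical P(y = 1) values for five cases.
scores = [0.05, 0.30, 0.49, 0.51, 0.90]

for p_target in scores:
    category, p_pred = probability_of_predicted_value(p_target)
    print(f"P(y=1)={p_target:.2f}  predicted={category}  "
          f"P(predicted value)={p_pred:.2f}")
```

Under these assumptions the probability of the predicted value can never drop below 0.5 in a two-category model, and a case predicted as category 0 still carries a value above 50%. Requesting the probability of a selected category instead returns the target-category probability itself, which spans 0 to 1.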
Thanks, Alex! I had saved the syntax and did not make the distinction until reading this example.

http://www-01.ibm.com/support/knowledgecenter/SSLVMB_22.0.0/com.ibm.spss.statistics.tut/spss/tutorials/scoring_applying_model.htm