Scoring wizard

Scoring wizard

pspangler1
Hello all,
My question regards Scoring Wizard results on a dataset with binary logistic regression. I have a binary logistic model built on a 50% balanced sample of approx. 500 outcomes each of 0 = Sold and 1 = Not Sold, taken from a larger dataset. I did so based on literature supporting this sampling method when predicting rare outcomes. After building the model, I saved it as an XML file under the Save option and applied it through the Scoring Wizard to a unique set of cases from the larger dataset.

I saved the predicted probabilities and predicted category. The prediction strength is similar to that of the balanced sample; however, my predicted probabilities have a minimum value of 50% instead of the expected 0% seen in the balanced sample model. The larger test data has an outcome distribution of 90% Not Sold and 10% Sold, with the goal of predicting Sold. Can anyone shed light on why the probabilities have been affected?

Sent from my iPhone
--
This email may contain confidential information for the sole use of the
intended recipient(s). If you are not an intended recipient, please notify
the sender and delete all copies immediately.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Scoring wizard

Jignesh Sutar
I don't have a technical explanation, but some comments nonetheless:

Of course, the predicted probabilities are not bound to any distribution observed in the original data.

The probabilities in the new data being bounded below at 0.5 could be due to some bias in the new data that skews it towards a higher likelihood of cases. Perhaps compare the distributions of the model's input variables across both datasets.

If the probabilities are bounded by a minimum of 0.5, what cut-off point are you using to achieve a 90/10% predicted split? (It can't be the default 0.5 cut-off, or you'd have 100% predicted positive.)

Re: Scoring wizard

pspangler1
Thank you Jignesh,

The distributions of the inputs were the first thing I explored. They are similar, as shown below.

As a test, I computed the linear predictor z and the probability P(y) = 1 / (1 + exp(-z)) in the Test Data, based on the beta coefficients noted below. The probabilities calculated through Transform > Compute Variable were very different from the Scoring Wizard results and were NOT bounded at 50%.

* Linear predictor from the fitted coefficients.
COMPUTE z=(6.08668)+(-0.0642*BounceRate)+(.41651*FreeShipping)+(-1.51439*QtyOne)+(.57765*Sale)+(-.33814*log_price).
* Logistic transform of z into a probability.
COMPUTE Probability_z=1/(1+EXP(-z)).
EXECUTE.
 

               Train Data    Test Data    Beta
BounceRate     83.33         90.91        .0642
log_Price      3.53          3.66         -.33814
FreeShipping   66%/34%       61%/39%      .41651
QtyOne         13%/87%       23%/77%      -1.51439
Sale           15%/85%       11%/89%      .57765
Constant                                  6.08668
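
The manual check can also be reproduced outside SPSS. The following is a minimal Python sketch, not SPSS syntax. It assumes the coefficients exactly as typed in the COMPUTE statement (including the negative sign on BounceRate, which the table lists without a sign). The helper name `manual_probability` and the example case are hypothetical; the case borrows the Test Data averages for BounceRate and log_Price purely for illustration.

```python
import math

# Coefficients copied from the COMPUTE statement in the post.
COEFFICIENTS = {
    "BounceRate": -0.0642,
    "FreeShipping": 0.41651,
    "QtyOne": -1.51439,
    "Sale": 0.57765,
    "log_price": -0.33814,
}
INTERCEPT = 6.08668


def manual_probability(case):
    """Return P(y = 1) for one case via the logistic function 1 / (1 + exp(-z))."""
    z = INTERCEPT + sum(beta * case[name] for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical case for illustration only; it is not a row from the data set.
example_case = {
    "BounceRate": 90.91,
    "FreeShipping": 1,
    "QtyOne": 0,
    "Sale": 0,
    "log_price": 3.66,
}
p = manual_probability(example_case)
print(round(p, 4))
```

Because the logistic function maps any z to a value strictly between 0 and 1, probabilities computed this way are not restricted to 0.5 and above, which is consistent with the Compute Variable results described above.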

Re: Scoring wizard

pspangler1
In addition, I am seeing Predicted Probabilities from the Scoring Wizard that are above 50% but with the Predicted Category of 0.
I did not alter the Cutoff Point in the Binary Logistic Regression window before saving the model as an XML file.

Re: Scoring wizard

Alex Reutter
Hi Peter,

In the scoring wizard, are you selecting "Probability of predicted value" or "Probability of selected value"?  My guess is that you are selecting the former, but you want the latter.
http://www-01.ibm.com/support/knowledgecenter/SSLVMB_22.0.0/com.ibm.spss.statistics.help/spss/base/idh_scoring_wizard_select_expressions.htm
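
A minimal Python sketch of the difference between the two options, assuming a two-category model and the default 0.5 cutoff. The function names are hypothetical and are not SPSS functions. The probability of the predicted value is the probability of whichever category the model picks, so in a binary model it cannot fall below 0.5, while the probability of a selected value covers the full range from 0 to 1.

```python
def predicted_category(p_one, cutoff=0.5):
    """Category chosen by the model: 1 if P(y = 1) reaches the cutoff, else 0."""
    return 1 if p_one >= cutoff else 0


def probability_of_predicted_value(p_one, cutoff=0.5):
    """Probability of whichever category was predicted."""
    return p_one if predicted_category(p_one, cutoff) == 1 else 1.0 - p_one


def probability_of_selected_value(p_one, selected):
    """Probability of a category fixed in advance (0 or 1)."""
    return p_one if selected == 1 else 1.0 - p_one


# Illustrative values of P(y = 1), not taken from the data set.
for p_one in (0.05, 0.30, 0.50, 0.80):
    print(
        p_one,
        predicted_category(p_one),
        probability_of_predicted_value(p_one),
        probability_of_selected_value(p_one, 1),
    )
```

Under these assumptions, a case with P(y = 1) = 0.05 is assigned category 0 with a predicted-value probability of about 0.95, which matches the pattern of probabilities above 50% alongside a predicted category of 0.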

Alex




Re: Scoring wizard

pspangler1
Thanks, Alex! I had saved the syntax and did not make the distinction until reading this example. 

