SPSSX Discussion

Scoring wizard

pspangler1

Scoring wizard

Hello all,
My question concerns Scoring Wizard results for a binary logistic regression model. I built the model on a balanced (50/50) sample of approximately 500 cases of each outcome (0 = Sold, 1 = Not Sold) taken from a larger dataset. I did so based on literature supporting this sampling method when predicting rare outcomes. After building the model, I saved it as an XML file under the Save option and applied it through the Scoring Wizard to a distinct set of cases from the larger dataset.

I saved the predicted probabilities and the predicted category. The prediction strength is similar to that of the balanced sample; however, my predicted probabilities have a minimum value of 50% instead of the expected 0% seen in the balanced-sample model. The larger test data has an outcome distribution of 90% Not Sold and 10% Sold, and the goal is to predict Sold. Can anyone shed light on why the probabilities have been affected?

Sent from my iPhone
--
This email may contain confidential information for the sole use of the
intended recipient(s). If you are not an intended recipient, please notify
the sender and delete all copies immediately.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Jignesh Sutar

Re: Scoring wizard

I don't have a technical explanation, but some comments nonetheless:

Of course, the predicted probabilities are not bound to any distribution observed in the original data.

The probabilities in the new data being bounded below by 0.5 could be due to some bias in the new data that skews it towards a higher likelihood of cases. Perhaps compare the distributions of the model's input variables across both datasets.

If the probabilities are bounded below by 0.5, what cut-off point are you using to achieve a 90/10 predicted split? (It can't be the default 0.5 cut-off, or else 100% of cases would be predicted positive.)

On Wednesday, 25 February 2015, Peter Spangler <[hidden email]> wrote:


pspangler1

Re: Scoring wizard

Thank you, Jignesh.

The distribution of the inputs was the first thing I explored. They are similar, as shown below.

As a test, I computed z (the linear predictor) and the probability P(y) = 1 / (1 + exp(-z)) in the test data, using the beta coefficients listed below. The probabilities calculated through Transform > Compute Variable were very different from the Scoring Wizard results and were NOT bounded at 50%.

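* Linear predictor (logit) from the model coefficients.
COMPUTE z = 6.08668 + (-0.0642*BounceRate) + (0.41651*FreeShipping) + (-1.51439*QtyOne) + (0.57765*Sale) + (-0.33814*log_price).
EXECUTE.
* Logistic transform of z into a probability.
COMPUTE Probability_z = 1/(1 + EXP(-z)).
EXECUTE.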

Variable       Train Data   Test Data   Beta
BounceRate     83.33        90.91       .0642
log_Price      3.53         3.66        -.33814
FreeShipping   66%/34%      61%/39%     .41651
QtyOne         13%/87%      23%/77%     -1.51439
Sale           15%/85%      11%/89%     .57765
Constant                                6.08668
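
For readers who want to reproduce the manual check outside SPSS, here is a minimal Python sketch of the same two COMPUTE steps. It uses the coefficients from the COMPUTE statement; note that the table lists the BounceRate beta as .0642 while the COMPUTE uses -0.0642, and the sketch follows the COMPUTE. The function name and the example case are hypothetical and are not taken from the data in this thread.

```python
import math

# Coefficients copied from the COMPUTE statement in the post.
# Assumption: the BounceRate sign follows the COMPUTE (-0.0642), not the table (.0642).
COEF = {
    "BounceRate": -0.0642,
    "FreeShipping": 0.41651,
    "QtyOne": -1.51439,
    "Sale": 0.57765,
    "log_price": -0.33814,
}
INTERCEPT = 6.08668


def probability_of_one(case):
    """Return P(y = 1) for one case: logistic transform of the linear predictor z."""
    z = INTERCEPT + sum(COEF[name] * case[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical case, for illustration only (not a row from the thread's data).
example = {"BounceRate": 90.0, "FreeShipping": 1, "QtyOne": 0, "Sale": 0, "log_price": 3.5}
p = probability_of_one(example)
print(round(p, 3))
```

Because this is P(y = 1) for every case, regardless of which category ends up being predicted, values below 0.5 are possible, which is consistent with the manual results not being bounded at 50%.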

On Wed, Feb 25, 2015 at 1:52 PM, Jignesh Sutar <[hidden email]> wrote:


pspangler1

Re: Scoring wizard

In addition, I am seeing Predicted Probabilities from the Scoring Wizard that are above 50% but with the Predicted Category of 0.

I did not alter the Cutoff Point in the Binary Logistic Regression window before saving the model as an XML file.

On Wed, Feb 25, 2015 at 2:39 PM, Peter Spangler <[hidden email]> wrote:


Alex Reutter

Re: Scoring wizard

Hi Peter,

In the scoring wizard, are you selecting "Probability of predicted value" or "Probability of selected value"? My guess is that you are selecting the former, but you want the latter.
http://www-01.ibm.com/support/knowledgecenter/SSLVMB_22.0.0/com.ibm.spss.statistics.help/spss/base/idh_scoring_wizard_select_expressions.htm

Alex
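
To illustrate the distinction Alex points to: for a binary target, the probability of the predicted category is the larger of P(y = 1) and P(y = 0), so it cannot fall below 0.5, whereas the probability of a selected category ranges from 0 to 1. The Python sketch below is an illustration under assumptions, not the Scoring Wizard's implementation: the function names are hypothetical, the input values are made up, and it assumes a 0.5 classification cutoff with ties assigned to category 1.

```python
def predicted_category(p_one, cutoff=0.5):
    """Classify a case as 1 when P(y = 1) reaches the cutoff, else 0."""
    return 1 if p_one >= cutoff else 0


def probability_of_predicted(p_one, cutoff=0.5):
    """Probability of whichever category was predicted for the case."""
    return p_one if predicted_category(p_one, cutoff) == 1 else 1.0 - p_one


def probability_of_selected(p_one, selected=1):
    """Probability of one fixed category chosen in advance (0 or 1)."""
    return p_one if selected == 1 else 1.0 - p_one


# Hypothetical P(y = 1) values, for illustration only.
for p_one in (0.05, 0.30, 0.50, 0.80):
    print(
        predicted_category(p_one),
        round(probability_of_predicted(p_one), 2),
        round(probability_of_selected(p_one, selected=1), 2),
    )
```

Under these assumptions, the probability of the predicted category never drops below 0.5, and a case predicted as 0 still carries a probability above 50%. That matches both symptoms reported earlier in the thread: the 50% floor and the high probabilities paired with a Predicted Category of 0.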

From: Peter Spangler <[hidden email]>
To: [hidden email]
Date: 02/25/2015 05:56 PM
Subject: Re: Scoring wizard
Sent by: "SPSSX(r) Discussion" <[hidden email]>


pspangler1

Re: Scoring wizard

Thanks, Alex! I had saved the syntax and did not make the distinction until reading this example.

http://www-01.ibm.com/support/knowledgecenter/SSLVMB_22.0.0/com.ibm.spss.statistics.tut/spss/tutorials/scoring_applying_model.htm


On Feb 25, 2015, at 4:31 PM, Alex Reutter <[hidden email]> wrote:
