Weight of evidence (WOE)


DEBOER
Hi,

I'm trying to validate a scorecard, starting with one variable at a time. My main question: is there an easy way to get the weight of evidence (WOE) in SPSS, or must one use Excel every time?

If too many values exist for a categorical variable then it is not feasible to enter them in the model as a series of indicator variables.
There will be too many of them and consequently coefficients will be poorly estimated.
A good solution is to substitute the categorical variable with the WOE for each value.

Recall:
W(v) is the WOE for value v of variable x.
WOE greater than 0 indicates a greater association with the negative event (y=0).
WOE less than 0 indicates a greater association with the positive event (y=1).

W(v) = log( f(x=v|y=0) / f(x=v|y=1) ).
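
A minimal sketch of the substitution described above, in Python rather than SPSS. It estimates f(x=v|y=0) and f(x=v|y=1) as simple proportions and replaces each value of x with its WOE. The function name `woe_by_value` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def woe_by_value(xs, ys):
    """Map each value v of x to log(f(x=v|y=0) / f(x=v|y=1))."""
    n0 = Counter(x for x, y in zip(xs, ys) if y == 0)  # counts among y=0
    n1 = Counter(x for x, y in zip(xs, ys) if y == 1)  # counts among y=1
    tot0, tot1 = sum(n0.values()), sum(n1.values())
    # Assumption: every value occurs at least once in both outcome groups.
    # A value missing from one group makes the ratio zero or undefined,
    # so real data would need smoothing or merging of sparse values.
    return {v: math.log((n0[v] / tot0) / (n1[v] / tot1)) for v in n0}

# Hypothetical toy data: x is a categorical variable, y is the 0/1 outcome.
xs = ["a", "a", "a", "b", "b", "b", "b", "a"]
ys = [0, 0, 1, 0, 1, 1, 1, 0]

woe = woe_by_value(xs, ys)
x_woe = [woe[x] for x in xs]  # WOE-coded replacement for x
print(woe)
```

In this toy data "a" is three times as common among y=0 as among y=1, so its WOE is log(3), and "b" gets the opposite sign.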

Then, after getting the WOE, I want to calculate the information value.

I always use the formula above and get the results in Excel, but I hope there is an easy way in SPSS.

Thanks in advance



Re: Weight of evidence (WOE)

David Marso
Administrator
Please provide the precise steps you use to calculate this in Excel.
A worked numerical example would be most informative.
If it can be done in Excel (painfully), it can be done in SPSS with ease.
"W(v)=log(f(x=v|y=0)/f(x=v|y=1))."
does *NOT* provide sufficient information to work from.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Weight of evidence (WOE)

DEBOER
In reply to this post by David Marso
age      good      bad     tot       column%   outcome   woe       information value
18-20     150355    4667    155022     4%      3%        -0.4949   0.0119
21-41    2013941   46674   2060615    51%      2.3%      -0.2027   0.0229
41+      1840080   24434   1864514    46%      1.3%       0.3542   0.0484
tot      4004376   75775   4080151   100%      1.9%                0.0834

OK, we have a variable, age. I run a crosstab in SPSS of age by default (0 = good, 1 = bad).
I get outcome from bad/tot.
WOE= Ln(good)-Ln(goodtot(4004376)) -Ln(bad)+(Ln(badtot(75775))
Information value = WOE*((good/goodtot)-(bad/badtot))
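
One possible reading of the two formulas above, with the parentheses balanced and the totals written out. The symbols g_i, b_i, G and B (good and bad counts in age band i, and their column totals) are notation introduced here for illustration, not the poster's.

```latex
\mathrm{WOE}_i = \ln\frac{g_i}{G} - \ln\frac{b_i}{B}
               = \ln\frac{g_i / G}{b_i / B},
\qquad
\mathrm{IV} = \sum_i \left(\frac{g_i}{G} - \frac{b_i}{B}\right)\mathrm{WOE}_i
```

Under this reading, the 18-20 band has g = 150355, G = 4004376, b = 4667 and B = 75775, so WOE = ln(0.03755 / 0.06159), which is approximately -0.4949 and agrees with the table.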



   

Re: Weight of evidence (WOE)

David Marso
Administrator
Not sure this helps considering
WOE= Ln(good)-Ln(goodtot(4004376)) -Ln(bad)+(Ln(badtot(75775))
is *NOT* a properly constructed mathematical expression!
Please fill in the specifics *WITH* the steps involved in calculation of the
value you believe WOE should evaluate to.


Re: Weight of evidence (WOE)

David Marso
Administrator
In reply to this post by DEBOER
See AGGREGATE, CASESTOVARS and COMPUTE.
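
A minimal sketch of the aggregate-then-compute idea, written in Python rather than SPSS syntax because the thread contains no SPSS code to build on. It starts from the good/bad counts per age band posted earlier (roughly what AGGREGATE and CASESTOVARS would produce from case-level data) and applies the posted WOE and information value formulas, read with balanced parentheses. The function name `woe_table` and the dictionary layout are illustrative choices, not anything from SPSS.

```python
import math

# Good (default=0) and bad (default=1) counts per age band,
# copied from the crosstab posted earlier in the thread.
counts = {
    "18-20": {"good": 150355, "bad": 4667},
    "21-41": {"good": 2013941, "bad": 46674},
    "41+": {"good": 1840080, "bad": 24434},
}

def woe_table(counts):
    """Return ({band: (woe, iv_contribution)}, total_information_value)."""
    good_tot = sum(c["good"] for c in counts.values())
    bad_tot = sum(c["bad"] for c in counts.values())
    rows = {}
    for band, c in counts.items():
        good_share = c["good"] / good_tot  # share of all goods in this band
        bad_share = c["bad"] / bad_tot     # share of all bads in this band
        # Assumes both counts are non-zero; a zero count would make the
        # logarithm undefined and would need separate handling.
        woe = math.log(good_share / bad_share)
        rows[band] = (woe, woe * (good_share - bad_share))
    total_iv = sum(iv for _, iv in rows.values())
    return rows, total_iv

rows, total_iv = woe_table(counts)
for band, (woe, iv) in rows.items():
    print(f"{band:6s} woe={woe:8.4f} iv={iv:7.4f}")
print(f"total information value = {total_iv:.4f}")
```

Run as is, this should reproduce the posted WOE values of -0.4949, -0.2027 and 0.3542 and a total information value close to the posted 0.0834. The 41+ contribution comes out nearer 0.0485 than the posted 0.0484, which may just be a rounding or transcription difference.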


Re: Weight of evidence (WOE)

Mike
In reply to this post by David Marso
Weight of Evidence and Information Value are outside of my area
of expertise but, after checking several websites, it appears that
there might be something wrong with the equations provided below.
Here is a link to an example I found that goes into a fair amount
of detail on output generated by SAS; see:

http://www.focusoptimal.com/doc/Information-Value-Calculation-Continuous.pdf

I assume that there is someone on this list that is familiar with the
weight of evidence or a Bayesian who can properly interpret what
is going on -- such as in this example:
http://www.web-e.stat.vt.edu/vining/keying/weight_of_evidence.pdf

-Mike Palij
New York University
[hidden email]


On Thu, Jan 5, 2012 at 11:23 AM, David Marso <[hidden email]> wrote:
Not sure this helps considering
WOE= Ln(good)-Ln(goodtot(4004376)) -Ln(bad)+(Ln(badtot(75775))
is *NOT* a properly constructed mathematical expression!
Please fill in the specifics *WITH* the steps involved in calculation of the
value you believe WOE should evaluate to.




--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Weight-of-evidence-WOE-tp5122524p5123145.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
