CRT Improvement Calculations

Matt Freeman
I'm trying to figure out how the improvement is calculated in a C&RT.  For
the following data set, I'm trying to predict ownership from lot size and
income.

Lot   Income  Ownership
18.8  33      N
20.4  43.2    N
16.4  47.4    N
17.6  49.2    N
14    51      N
22    51      O
20.8  52.8    N
16    59.4    N
18.4  60      O
20.8  61.5    O
14.8  63      N
17.2  64.8    N
21.6  64.8    O
18.4  66      N
20    69      O
19.6  75      N
20    81      O
22.4  82.8    O
17.6  84      N
16.8  85.5    O
23.6  87      O
20.8  93      O
17.6  108     O
19.2  110.1   O

Based on the first split (Income <= 59.7), I first calculated the Gini
impurity, I, for each part of the split as I(left) = 1 - (1/8)^2 - (7/8)^2
= 14/64 and I(right) = 1 - (11/16)^2 - (5/16)^2 = 110/256.  Then I calculated
the weighted average of these as (8/24)I(left) + (16/24)I(right) = 23/64 and
subtracted it from the root node's impurity, 1 - (12/24)^2 - (12/24)^2 = 1/2,
which gives an improvement of 1/2 - 23/64 = 9/64 = 0.140625.  This value is
consistent with that listed by SPSS.
However, at the next split (Lot <= 21.4), I tried to calculate the
improvement in the same fashion for the 100% pure nodes, and I don't come up
with SPSS's 0.073.  Both child nodes have a Gini impurity of 0, so I don't
see how the 0.073 figure is determined.

Can anyone help?

Re: CRT Improvement Calculations

Alex Reutter
Does the algorithms topic on the Gini criterion for C&RT help?
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/topic/com.ibm.spss.statistics.help/alg_tree-cart_split-criteria_categorical_gini.htm

The notation for the C&RT algorithms is explained at:
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/topic/com.ibm.spss.statistics.help/alg_tree-cart_notation.htm

...and there's more on C&RT and the other TREE algorithms if you browse the TOC from there.

Alex
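
For what it's worth, the 0.073 figure can be reproduced if one assumes that the reported improvement is the decrease in Gini impurity within the node being split, multiplied by the fraction of all cases that fall in that node. That weighting is an assumption here, checked only against the two numbers quoted in this thread. The helper names `gini` and `improvement` are made up for illustration and are not SPSS functions. A minimal Python sketch:

```python
from fractions import Fraction


def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of class counts."""
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)


def improvement(parent, left, right, n_total):
    """Impurity decrease for one split, weighted by the parent's share of all cases.

    Assumption: the weight n_parent / n_total is what turns the within-node
    decrease into the figure SPSS reports.
    """
    n_parent = sum(parent)
    within_node = (
        gini(parent)
        - Fraction(sum(left), n_parent) * gini(left)
        - Fraction(sum(right), n_parent) * gini(right)
    )
    return Fraction(n_parent, n_total) * within_node


# Class counts are [N, O], tallied from the 24 cases in the original post.
root_split = improvement(parent=[12, 12], left=[7, 1], right=[5, 11], n_total=24)
lot_split = improvement(parent=[7, 1], left=[7, 0], right=[0, 1], n_total=24)

print(root_split, float(root_split))           # 9/64 0.140625
print(lot_split, round(float(lot_split), 3))   # 7/96 0.073
```

For the root split the node holds all 24 cases, so the weight is 1 and the result is 9/64 = 0.140625. For the Lot <= 21.4 split the parent node holds 8 of the 24 cases with impurity 14/64, both children are pure, and (8/24)(14/64) = 7/96, which rounds to 0.073.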