Decision Trees
LESSON 22
Construction of Decision Trees
Keywords: Entropy, Impurity, Variance, Misclassification
One measure of the impurity Im(q) at a node q is the entropy

Im(q) = -\sum_i P(C_i) \log_2 P(C_i)

where P(C_i) is the fraction of the patterns at the node that belong to class C_i.
Consider two branches of a split, each receiving n patterns of two classes. For branch 1, let the two classes contribute n/2 patterns each, so that for both classes

P(C_i) = \frac{n/2}{n} = 0.5

For branch 2, let all n patterns belong to a single class, so that

P(C_i) = \frac{0}{n} = 0 \quad \text{for one class and} \quad P(C_j) = \frac{n}{n} = 1.0 \quad \text{for the other.}
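Substituting these probabilities into the entropy measure makes the contrast explicit (the arithmetic below is a routine check, using the usual convention 0 \log_2 0 = 0):

Im(branch 1) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1.0
Im(branch 2) = -0 \log_2 0 - 1.0 \log_2 1.0 = 0

Branch 1 is maximally impure, while branch 2 is pure.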
Another impurity measure is the Gini impurity

Im(q) = \sum_{i \neq j} P(C_i) P(C_j) = \frac{1}{2}\Big[1 - \sum_j P^2(C_j)\Big]
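The second equality is a standard identity, spelled out here for completeness. Squaring \sum_j P(C_j) = 1 gives

1 = \Big(\sum_j P(C_j)\Big)^2 = \sum_j P^2(C_j) + 2 \sum_{i < j} P(C_i) P(C_j)

so the sum of P(C_i)P(C_j) over unordered pairs of distinct classes equals \frac{1}{2}\big[1 - \sum_j P^2(C_j)\big].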
If we have 50 examples which split into three classes, with the first class getting 14 patterns, the second class 25 and the third class 11, then we get

P(C_1) = \frac{14}{50} = 0.28, \quad P(C_2) = \frac{25}{50} = 0.5, \quad P(C_3) = \frac{11}{50} = 0.22
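Substituting these probabilities into the two impurity measures gives (a routine check of the arithmetic):

Entropy: Im(q) = -0.28 \log_2 0.28 - 0.5 \log_2 0.5 - 0.22 \log_2 0.22 \approx 1.495
Gini: Im(q) = \frac{1}{2}\big[1 - (0.28^2 + 0.5^2 + 0.22^2)\big] = \frac{1}{2}(1 - 0.3768) = 0.3116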
Now consider a node with n_1 patterns of class 1 and n_2 patterns of class 2. The entropy impurity at the node is

Im(q) = -\frac{n_1}{n_1+n_2} \log_2 \frac{n_1}{n_1+n_2} - \frac{n_2}{n_1+n_2} \log_2 \frac{n_2}{n_1+n_2}

Let a decision rule split the patterns into two branches. Let there be n_{11} patterns of class 1 and n_{12} patterns of class 2 in the left branch, and n_{21} patterns of class 1 and n_{22} patterns of class 2 in the right branch.
The impurity of the left branch is then

Im(left) = -\frac{n_{11}}{n_1} \log_2 \frac{n_{11}}{n_1} - \frac{n_{12}}{n_1} \log_2 \frac{n_{12}}{n_1}

and the impurity of the right branch is

Im(right) = -\frac{n_{21}}{n_2} \log_2 \frac{n_{21}}{n_2} - \frac{n_{22}}{n_2} \log_2 \frac{n_{22}}{n_2}

where n_1 = n_{11} + n_{12} and n_2 = n_{21} + n_{22} now denote the numbers of patterns in the left and the right branch after the split.

As an example, consider the following split of 300 patterns among three classes. Note that a node can also be represented by the numbers of patterns associated with it.

[Figure: A decision node containing Class 1: 100, Class 2: 100, Class 3: 100 splits into three branches with class counts (Class 1: 28, Class 2: 45, Class 3: 53), (Class 1: 60, Class 2: 45, Class 3: 10) and (Class 1: 12, Class 2: 10, Class 3: 37).]

Let us calculate the information gain at this node.
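The information gain of a split is the impurity at the node minus the weighted average of the branch impurities; in symbols (my notation, written for K branches):

Gain = Im(node) - \sum_{k=1}^{K} \frac{n_k}{n} \, Im(branch_k)

where n is the number of patterns at the node and n_k the number sent down branch k.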
The impurity at the node before the split is

Im(node) = -\frac{1}{3} \log_2 \frac{1}{3} - \frac{1}{3} \log_2 \frac{1}{3} - \frac{1}{3} \log_2 \frac{1}{3} = 1.585
After the split, the impurity of the leftmost branch, which receives 126 patterns in total, is

Im(left) = -\frac{28}{126} \log_2 \frac{28}{126} - \frac{45}{126} \log_2 \frac{45}{126} - \frac{53}{126} \log_2 \frac{53}{126} = 1.538
The impurity of the middle branch, with 115 patterns, is

Im(middle) = -\frac{60}{115} \log_2 \frac{60}{115} - \frac{45}{115} \log_2 \frac{45}{115} - \frac{10}{115} \log_2 \frac{10}{115} = 1.326
The impurity of the rightmost branch, with 59 patterns, is

Im(right) = -\frac{12}{59} \log_2 \frac{12}{59} - \frac{10}{59} \log_2 \frac{10}{59} - \frac{37}{59} \log_2 \frac{37}{59} = 1.324
The information gain for this split is therefore

Gain = 1.585 - \Big(\frac{126}{300} \times 1.538 + \frac{115}{300} \times 1.326 + \frac{59}{300} \times 1.324\Big) = 0.17
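A short Python sketch (mine, not part of the lesson) reproduces these numbers; the entropy helper computes Im(q) = -\sum_i P(C_i) \log_2 P(C_i) directly from class counts:

import math

def entropy(counts):
    # Entropy impurity Im(q) = -sum_i P(C_i) log2 P(C_i), from class counts.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

root = [100, 100, 100]                                  # class counts at the node
branches = [[28, 45, 53], [60, 45, 10], [12, 10, 37]]   # class counts per branch

n = sum(root)
im_node = entropy(root)                                 # 1.585
im_split = sum(sum(b) / n * entropy(b) for b in branches)
gain = im_node - im_split                               # about 0.17

print(f"Im(node) = {im_node:.3f}")
for b in branches:
    print(f"Im{b} = {entropy(b):.3f}")                  # 1.538, 1.326, 1.324
print(f"Gain = {gain:.2f}")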