
MODULE 11

Decision Trees
LESSON 22
Construction of Decision Trees
Keywords: Entropy, Impurity, Variance, Misclassification

Construction of Decision Trees


A decision tree is induced from training examples and is therefore an
example of inductive learning or learning from examples.
The training set has a number of labelled patterns of each class.
The decision tree is induced by using the values of each feature for the patterns of the various classes.
At each node, an attribute must be chosen on which to base the decision.
The attribute chosen at each node should be the one that makes the most difference to the classification, that is, the most discriminative attribute.
Whenever a decision is made, the example set is split according to the various outcomes. Consider a two-class example where the node splits the positive and negative examples into two outcomes. If the positive and negative examples are split equally between the two outcomes, it is not a good split, as it leaves each subset with almost the same proportion of positive and negative examples as before. On the other hand, if a split puts all the positive examples on one side and all the negative examples on the other, it is the best split, as it has succeeded in classifying the examples as positive or negative.
If the split results in a certain answer as to the class, then it is a good split.
Whenever an attribute is chosen, the different outcomes represent new decision trees over a subset of the examples, and again the most important attribute is chosen. This is repeated until the classification is complete; a sketch of this recursive procedure is given below.
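The following is a minimal Python sketch of this recursive procedure. The names build_tree and choose_best_attribute and the (feature-dictionary, class-label) example format are ours, used only for illustration; the caller supplies choose_best_attribute, which implements the attribute-selection criterion (for example, the information gain developed later in this lesson).

def build_tree(examples, attributes, choose_best_attribute):
    """Recursively induce a decision tree from (feature-dict, class-label) pairs."""
    labels = [label for _, label in examples]
    # All examples belong to one class: classification is complete, return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test: return the majority class.
    if not attributes:
        return max(set(labels), key=labels.count)
    # Pick the most discriminative attribute (e.g. the one with largest information gain).
    best = choose_best_attribute(examples, attributes)
    remaining = [a for a in attributes if a != best]
    subtrees = {}
    # Each outcome of the chosen attribute starts a new tree on a subset of the examples.
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        subtrees[value] = build_tree(subset, remaining, choose_best_attribute)
    return (best, subtrees)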
To choose the attribute at each node, we need to measure how pure a node is; that is, we measure the impurity of a node.
Measures of Impurity
All the measures are computed from the fraction of patterns of each class going along a branch after a split.

At each node, the measure of impurity and the resulting gain in information are computed for the different attributes, and the attribute with the largest information gain is chosen. The different measures of impurity are given below.
1. Entropy Impurity or Information Impurity
At a node q, the entropy impurity Im(q) is given by

Im(q) = -\sum_i P(C_i) \log_2 P(C_i)

where P(C_i) is the fraction of patterns at node q belonging to category C_i.


As an example, if the n patterns at a node divide equally between two classes, n/2 and n/2, then

P(C_1) = P(C_2) = (n/2)/n = 0.5

Then the entropy impurity will be

Im(q) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1
If all the patterns go along one branch and no patterns along the other, then for branch 1

P(C_i) = n/n = 1.0

and for branch 2

P(C_i) = 0/n = 0

and

Im(q) = -1.0 \log_2 1.0 = 0


If we have 50 examples which split into three classes, with the first class getting 14 patterns, the second class 25 patterns and the third class 11 patterns, then we get

P(C_1) = 14/50 = 0.28

P(C_2) = 25/50 = 0.5

P(C_3) = 11/50 = 0.22

and the entropy impurity will be

Im(q) = -0.28 \log_2 0.28 - 0.5 \log_2 0.5 - 0.22 \log_2 0.22 = 1.49
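These entropy impurity values can be checked with a short Python sketch; the function name entropy_impurity is ours, introduced only for illustration:

import math

def entropy_impurity(fractions):
    # Entropy impurity of a node, given the fraction of patterns in each class.
    # Terms with a zero fraction contribute nothing (0 log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in fractions if p > 0)

print(entropy_impurity([0.5, 0.5]))         # 1.0, the equal two-class split above
print(entropy_impurity([1.0, 0.0]))         # 0.0, a pure node (Python may print -0.0)
print(entropy_impurity([0.28, 0.5, 0.22]))  # roughly 1.49, the 50-example case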
2. Variance Impurity
For a two-category problem, the variance impurity is

Im(q) = P(C_1) P(C_2)

If the patterns split equally into the two classes, then

Im(q) = 0.5 \times 0.5 = 0.25

If all the patterns go along one branch and no patterns along the other, we get

Im(q) = 1.0 \times 0 = 0

If the division is 0.8 to Class 1 and 0.2 to Class 2, we get

Im(q) = 0.8 \times 0.2 = 0.16
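A couple of lines of Python reproduce these three cases (the function name is again ours):

def variance_impurity(p1, p2):
    # Two-category variance impurity: the product of the two class fractions.
    return p1 * p2

print(variance_impurity(0.5, 0.5))  # 0.25, equal split
print(variance_impurity(1.0, 0.0))  # 0.0, pure node
print(variance_impurity(0.8, 0.2))  # 0.16 (up to floating-point rounding)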

In the case of more classes, we have to use the generalization of the above equation. This is called the Gini impurity and is given by

Im(q) = \sum_{i \neq j} P(C_i) P(C_j) = \frac{1}{2} \left[ 1 - \sum_j P^2(C_j) \right]

If we have 50 examples which split into three classes, with the first class getting 14 patterns, the second class 25 patterns and the third class 11 patterns, then we get

P(C_1) = 14/50 = 0.28

P(C_2) = 25/50 = 0.5

P(C_3) = 11/50 = 0.22

Then the Gini impurity is

Im(q) = \frac{1}{2} \left[ 1 - 0.28^2 - 0.5^2 - 0.22^2 \right] = 0.31
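A short Python sketch of the Gini impurity (function name ours) reproduces this value; note that for two classes it reduces to P(C_1)P(C_2), the variance impurity above:

def gini_impurity(fractions):
    # Gini impurity: one half of (1 minus the sum of squared class fractions).
    return 0.5 * (1 - sum(p * p for p in fractions))

print(gini_impurity([0.28, 0.5, 0.22]))  # roughly 0.31, the 50-example case
print(gini_impurity([0.5, 0.5]))         # 0.25, matching the two-class variance impurity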
3. Misclassification Impurity

Im(q) = 1 - \max_i P(C_i)

In the above case, in which

P(C_1) = 14/50 = 0.28

P(C_2) = 25/50 = 0.5

P(C_3) = 11/50 = 0.22

the misclassification impurity will be

Im(q) = 1 - \max(0.28, 0.5, 0.22) = 1 - 0.5 = 0.5
This gives the minimum probability of error in classification.
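In Python this measure is essentially a one-liner (function name ours):

def misclassification_impurity(fractions):
    # Misclassification impurity: 1 minus the largest class fraction.
    return 1 - max(fractions)

print(misclassification_impurity([0.28, 0.5, 0.22]))  # 0.5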
Using the impurity of a split, the attribute to be chosen at a decision
node can be decided.
Finding the Attribute to Split at a Node
At each node, the attribute to be used for splitting is to be determined
so as to get the smallest and the most efficient decision tree.
Consider a two-class problem. Let there be n_1 patterns of Class 1 and n_2 patterns of Class 2 before splitting. If the split at the node puts half the patterns of Class 1 and half the patterns of Class 2 at the left node, and the remaining halves at the right node, the splitting has not succeeded in separating the two classes. However, if all the n_1 patterns of Class 1 are at the left node and all the n_2 patterns of Class 2 are at the right node, this is a very good decision as it separates the two classes.
The split is made according to the entropy or uncertainty remaining.
The entropy or uncertainty of the patterns before splitting is

Im(n) = -\sum_i p(i) \log_2 p(i)

In this case, it will be

Im(n) = -\frac{n_1}{n_1+n_2} \log_2 \left( \frac{n_1}{n_1+n_2} \right) - \frac{n_2}{n_1+n_2} \log_2 \left( \frac{n_2}{n_1+n_2} \right)

Let a decision rule split the patterns into two branches. Let there be n_{11} patterns of Class 1 and n_{12} patterns of Class 2 in the left branch, and n_{21} patterns of Class 1 and n_{22} patterns of Class 2 in the right branch.

Figure 1: The splitting at a decision node. Before the split the node has Class 1: 100, Class 2: 100, Class 3: 100 patterns. The decision splits these into three branches: the left branch with Class 1: 28, Class 2: 45, Class 3: 53; the middle branch with Class 1: 60, Class 2: 45, Class 3: 10; and the right branch with Class 1: 12, Class 2: 10, Class 3: 37.


Then, writing n_L = n_{11} + n_{12} for the number of patterns in the left branch and n_R = n_{21} + n_{22} for the number in the right branch, the impurity of the left branch is

Im(left) = -\frac{n_{11}}{n_L} \log_2 \frac{n_{11}}{n_L} - \frac{n_{12}}{n_L} \log_2 \frac{n_{12}}{n_L}

The impurity of the right branch is

Im(right) = -\frac{n_{21}}{n_R} \log_2 \frac{n_{21}}{n_R} - \frac{n_{22}}{n_R} \log_2 \frac{n_{22}}{n_R}

The drop in impurity, with n = n_1 + n_2 the total number of patterns, is

\Delta Im(n) = Im(n) - \frac{n_L}{n} Im(left) - \frac{n_R}{n} Im(right)

The drop in impurity is also called the gain in information. Therefore

IG = Im(n) - \frac{n_L}{n} Im(left) - \frac{n_R}{n} Im(right)
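This computation can be sketched in Python as the parent impurity minus the weighted impurities of the branches. The function names are ours, the inputs are per-class pattern counts, and entropy impurity is used, though any of the measures above could be substituted; the sketch accepts any number of branches, since the same weighted sum applies.

import math

def entropy_from_counts(counts):
    # Entropy impurity of a node described by per-class pattern counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, branch_counts):
    # Drop in impurity: parent impurity minus the weighted branch impurities.
    n = sum(parent_counts)
    gain = entropy_from_counts(parent_counts)
    for counts in branch_counts:
        gain -= (sum(counts) / n) * entropy_from_counts(counts)
    return gain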
For example, consider Figure 1, which shows a three-class problem split by the decision into three outcomes. The figure gives the number of patterns of each class before splitting at the node and in each branch after the split. Let us calculate the information gain at this node. Note that a node can also be represented by the number of patterns of each class associated with it.
The impurity at the node before the split is

Im(n) = -\frac{1}{3} \log_2 \frac{1}{3} - \frac{1}{3} \log_2 \frac{1}{3} - \frac{1}{3} \log_2 \frac{1}{3} = 1.585
After the split, the impurity of the leftmost branch is

Total patterns in the left branch = 126

Im(left) = -\frac{28}{126} \log_2 \frac{28}{126} - \frac{45}{126} \log_2 \frac{45}{126} - \frac{53}{126} \log_2 \frac{53}{126} = 1.538

The impurity of the middle branch is

Total patterns in the middle branch = 115

Im(middle) = -\frac{60}{115} \log_2 \frac{60}{115} - \frac{45}{115} \log_2 \frac{45}{115} - \frac{10}{115} \log_2 \frac{10}{115} = 1.326

The impurity of the rightmost branch is

Total patterns in the right branch = 59

Im(right) = -\frac{12}{59} \log_2 \frac{12}{59} - \frac{10}{59} \log_2 \frac{10}{59} - \frac{37}{59} \log_2 \frac{37}{59} = 1.324

The information gain for this node is

IG = 1.585 - \frac{126}{300} \times 1.538 - \frac{115}{300} \times 1.326 - \frac{59}{300} \times 1.324 = 0.17
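As a check, the same numbers can be fed to the information_gain sketch given after the IG formula above (this snippet assumes those two functions are available):

# Per-class pattern counts from Figure 1: the parent node and its three branches.
parent = [100, 100, 100]
branches = [[28, 45, 53], [60, 45, 10], [12, 10, 37]]

print(round(information_gain(parent, branches), 2))  # roughly 0.17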
