You are on page 1of 15

Lecture Slides for

ETHEM ALPAYDIN
The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Tree Uses Nodes, and Leaves

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 3
Divide and Conquer
Internal decision nodes
Univariate: Uses a single attribute, xi
Numeric xi : Binary split : xi > wm
Discrete xi : n-way split for n possible values
Multivariate: Uses all attributes, x
Leaves
Classification: Class labels, or proportions
Regression: Numeric; r average, or local fit
Learning is greedy; find the best split recursively (Breiman
et al, 1984; Quinlan, 1986, 1993)

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 4
Classification Trees
(ID3, CART, C4.5)
For node m, Nm instances reach m, Nim belong to Ci
i
N
PC i | x, m pmi m
Nm

Node m is pure if pim is 0 or 1


Measure of impurity is entropy
K
I m pmi log2 pmi
i 1

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 5
Best Split
If node m is pure, generate a leaf and stop, otherwise split and
continue recursively
Impurity after split: Nmj of Nm take branch j. Nimj belong to Ci
i
N
PC i | x, m, j pmj
i
mj
Nmj
n N K
I' m mj
p i
mj log p i
2 mj
j 1 Nm i 1

Find the variable and split that min impurity (among all
variables -- and split positions for numeric variables)

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 6
Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 7
Regression Trees
Error at node m:
1 i f x X m : x reachesnodem
bm x
0 otherwi s e
t m t
t


1 b x r

2
Em t
t
gm
b x
t
r g m bm x t
Nm t m

After splitting:
1 if x X mj : x reachesnode m and branch j
bmj x
0 otherwis e
b x r t t

E 'm
1
j t
t
2
t
t mj

b x
r g mj bmj x gmj t
Nm t mj
Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 8
Model Selection in Trees

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 9
Pruning Trees
Remove subtrees for better generalization (decrease
variance)
Prepruning: Early stopping
Postpruning: Grow the whole tree then prune subtrees
which overfit on the pruning set
Prepruning is faster, postpruning is more accurate
(requires a separate pruning set)

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 10
Rule Extraction from Trees
C4.5Rules
(Quinlan, 1993)

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 11
Learning Rules
Rule induction is similar to tree induction but
tree induction is breadth-first,
rule induction is depth-first; one rule at a time
Rule set contains rules; rules are conjunctions of terms
Rule covers an example if all terms of the rule evaluate to
true for the example
Sequential covering: Generate rules one at a time until all
positive examples are covered
IREP (Frnkrantz and Widmer, 1994), Ripper (Cohen,
1995)

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 12
Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 13
Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 14
Multivariate Trees

Lecture Notes for E Alpaydn 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) 15