
Machine Learning

(Symbolic methods)
Introduction
What is learning?
→ unknown → known
→ learning enables us to perform better next time
→ learning enables us to change our knowledge structure
→ learning is related to adaptation (change for the better)
→ learning as search (learning → problem solving → search)
ID3-type Learning
(Decision Tree Learning)
Motivation

Goal: to search for order,
to search for structures (trees).
The basic idea of ID3…
Introduction to Information Theory (Coding)
Input → Channel → Output
e.g., P1, P2, …, Pn are the n input symbols.
Measure of information (Uncertainty)
p(w) → 0 ⇒ more surprise
→ surprise contains more information
A highly predictable sequence contains little actual information
Example: 11011011011011011011011011 (what's next?)
Example: I didn't win the lottery this week
A completely unpredictable sequence of n bits contains n bits of information
Example: 01000001110110011010010000 (what's next?)
Example: I just won $10 million in the lottery!!!!
→ we get more information from observing a rare event
Let S = {s_1, s_2, …, s_n} be the possible symbols, with probabilities p(s_1), p(s_2), …, p(s_n).

Question: if s_i happens, how much information do we have?

I(s_i) = -log2 p(s_i)

Properties:
→ p(s_i) = 1 ⇒ I(s_i) = -log2 1 = 0 (a certain event carries no information)
→ p(s_i) → 0 ⇒ I(s_i) = -log2 p(s_i) → ∞
→ p(s_i) ≤ p(s_j) ⇒ I(s_i) ≥ I(s_j)
How do we measure the information of the whole distribution? Take the average (expectation) of I(s_i):

H(P) = -Σ_i p(s_i) log2 p(s_i)

Normally, we call H(P) the entropy of the distribution.
Special cases: a multiple-choice question.
a) P = (1/4, 1/4, 1/4, 1/4): H(P) = log2 4 = 2 bits
b) P = (1/2, 0, 0, 1/2): H(P) = log2 2 = 1 bit
c) P = (0, 0, 0, 1): H(P) = log2 1 = 0 bits
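As a quick check, here is a minimal Python sketch (the function name is ours, not from the slides) that reproduces all three cases with base-2 logarithms:

```python
import math

def entropy(dist):
    """Shannon entropy H(P) = -sum(p * log2 p), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(entropy([1/4, 1/4, 1/4, 1/4]))  # a) log2(4) = 2.0 bits
print(entropy([1/2, 0, 0, 1/2]))      # b) log2(2) = 1.0 bit
print(entropy([0, 0, 0, 1]))          # c) log2(1) = 0.0 bits
```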
Observation: learning can be viewed as an entropy-minimization problem.
How? Search (state-space search).
To answer this question, we need to search.
Decision Tree Learning
(Search for a good decision tree)
Task: to learn the two classes + and - based on three attributes, i.e., Height, Hair, and Eyes.
Object | Height | Hair  | Eyes  | Class
O1     | Short  | Blond | Blue  | +
O2     | Short  | Blond | Brown | -
O3     | Tall   | Red   | Blue  | +
O4     | Tall   | Dark  | Blue  | -
O5     | Tall   | Dark  | Blue  | -
O6     | Tall   | Blond | Blue  | +
O7     | Tall   | Dark  | Brown | -
O8     | Short  | Blond | Brown | -
Decision Tree

→ Each internal node is labeled by an attribute
→ Each external node (leaf) is labeled by a set of objects
→ Each branch is labeled by a value of an attribute


[Figure: an example decision tree. Root: Height. Height = Short → {O1, O2, O8} (+ - -), split on Eyes: Blue → {O1} (+), Brown → {O2, O8} (-). Height = Tall → {O3, O4, O5, O6, O7} (+ - - + -), split on Eyes: Brown → {O7} (-), Blue → {O3, O4, O5, O6} (+ - - +), split on Hair: Blond → {O6} (+), Red → {O3} (+), Dark → {O4, O5} (-).]
Rules:
Height = Tall ∧ Eyes = Brown ⇒ Class = -
Height = Tall ∧ Eyes = Blue ∧ Hair = Blond ⇒ Class = +
…
State space
Each decision tree or partial decision tree is a state.
Goal state: a decision tree whose leaves contain objects of the same class.
Each edge is an expansion of one node of a tree.

Q: How do we search this space for a good tree?
Heuristic search (a greedy algorithm).
What do we mean by a good decision tree?
E.g.: it uses fewer attributes; the height of the DT is low.
Learning is entropy reduction.
Can we use this principle?
E.g., the concept space.
Single attribute-value pairs: so far, we have the rules
Hair = Dark ⇒ - {O4, O5, O7}
Hair = Red ⇒ + {O3}
Eyes = Brown ⇒ - {O2, O8}
{O1, O6} cannot be covered by any single attribute-value pair.
Pairs of attribute-values: how many concepts are possible at each level? (Counted in the sketch below.)
Triples of attribute-values: …
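As a rough illustration, the candidate concepts at each level can be counted by enumeration (the encoding below is our own, not from the slides):

```python
from itertools import combinations, product

VALUES = {"Height": ["Short", "Tall"],
          "Hair":   ["Blond", "Red", "Dark"],
          "Eyes":   ["Blue", "Brown"]}

def concepts(level):
    """All conjunctions of `level` attribute=value pairs over distinct attributes."""
    out = []
    for attrs in combinations(VALUES, level):
        for vals in product(*(VALUES[a] for a in attrs)):
            out.append(dict(zip(attrs, vals)))
    return out

print(len(concepts(1)))  # 7 single attribute-value concepts: 2 + 3 + 2
print(len(concepts(2)))  # 16 pairs: 2*3 + 2*2 + 3*2
print(len(concepts(3)))  # 12 triples: 2*3*2
```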

PRISM Algorithm
PRISM generates rules for each class by looking at the training data and adding rules that completely describe all tuples in that class (Witten and Frank, 2000).
PRISM learning
KR learning
We search the concept space level by level (breadth-first search).
Question: which node do we search first?
* Choose the node with the smallest entropy value. (A sketch follows below.)
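A minimal sketch of the PRISM idea as sequential covering (this is our reading of Witten and Frank's description, not their code; helper names and the tuple encoding are ours): for each class, grow a rule by greedily adding the attribute=value test with the highest accuracy on the examples the rule currently covers; once the rule is pure, remove the covered examples and repeat.

```python
ATTRS = ["Height", "Hair", "Eyes"]
DATA = [  # (Height, Hair, Eyes, Class) for objects O1..O8
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]

def prism(data, target):
    """Return rules for `target`; each rule is a list of (attribute_index, value)."""
    rules, remaining = [], list(data)
    while any(r[3] == target for r in remaining):
        rule, covered = [], remaining
        while any(r[3] != target for r in covered):  # grow until the rule is pure
            def accuracy(test):
                i, v = test
                match = [r for r in covered if r[i] == v]
                return sum(r[3] == target for r in match) / len(match)
            candidates = [(i, v) for i in range(len(ATTRS))
                          for v in {r[i] for r in covered} if (i, v) not in rule]
            best = max(candidates, key=accuracy)  # real PRISM breaks ties by coverage
            rule.append(best)
            covered = [r for r in covered if r[best[0]] == best[1]]
        rules.append(rule)
        remaining = [r for r in remaining  # drop examples the new rule covers
                     if not all(r[i] == v for i, v in rule)]
    return rules

for rule in prism(DATA, "+"):
    print(" AND ".join(f"{ATTRS[i]}={v}" for i, v in rule), "=> +")
```

On this table, with ties broken by attribute order, the sketch outputs Hair=Red => +, then Hair=Blond AND Height=Tall => +, then Height=Short AND Eyes=Blue => +.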

Comparison of ID3 and PRISM

ID3                                  | PRISM
each object satisfies only one rule  | any object may satisfy more than one rule
larger (longer) rules                | shorter rules
fewer attributes                     | more attributes
global                               | local
Question: Can we find a minimal set of attributes?
R = {Hair, Eyes}

Reduct, a subset of attributes:
1) Suppose R ⊆ At, a subset of the attribute set At.
2) R is a reduct of At if the following conditions hold:
→ R is sufficient;
→ each attribute a ∈ R is necessary (that is, R - {a} is not sufficient).

R = {Hair, Eyes} is a reduct:
→ it is sufficient;
→ Hair is necessary, and Eyes is necessary.

Question: How do we find a reduct?
Deletion method
(use it when there are many attributes):
{ try to remove one attribute at a time;
  remove it if you can }

E.g.:
Question: which attribute should we try to remove first?
Try to remove the attribute with the highest entropy value!
(A sketch follows below.)
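A sketch of the deletion method in Python (our encoding: "R is sufficient" is taken to mean that no two objects agree on every attribute in R yet carry different class labels; the slides' entropy-ordering heuristic is replaced here by a fixed scan order):

```python
ATTRS = ["Height", "Hair", "Eyes"]
DATA = [  # (Height, Hair, Eyes, Class) for objects O1..O8
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]

def sufficient(attr_idxs, data):
    """No two objects agree on every attribute in attr_idxs but differ in class."""
    seen = {}
    for r in data:
        key = tuple(r[i] for i in attr_idxs)
        if seen.setdefault(key, r[3]) != r[3]:
            return False
    return True

def reduct(data):
    keep = list(range(len(ATTRS)))
    for i in list(keep):                 # try to remove one attribute at a time
        trial = [j for j in keep if j != i]
        if sufficient(trial, data):      # remove it if the rest still suffices
            keep = trial
    return [ATTRS[i] for i in keep]

print(reduct(DATA))  # ['Hair', 'Eyes'] -- matches the reduct R = {Hair, Eyes}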


Question: Does ID3 provide a reduct?
Rule Redundancy (proving a rule redundant)
a1 ∧ a2 ∧ … ∧ an ⇒ +
A family of decision-tree learning systems:
CLS
ID3
C4.5
CART, Assistant
[Slides lost in extraction: they restate the example (objects described by the attributes Height, Hair, and Eyes, plus a Class label) and note that finding a minimal decision tree is an NP-complete problem.]
E.g., S = [3+, 5-]:
E(S) = E([3+, 5-]) = -(3/8) log2(3/8) - (5/8) log2(5/8) ≈ 0.954
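The same number, plus the information gains ID3 compares when choosing the root, in a short sketch (the gain formula Gain(S, A) = E(S) - Σ_v (|S_v|/|S|) · E(S_v) is standard; variable names are ours):

```python
import math
from collections import Counter

ATTRS = ["Height", "Hair", "Eyes"]
DATA = [  # (Height, Hair, Eyes, Class) for objects O1..O8
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]

def entropy(rows):
    """E(S) over the class labels in column 3."""
    counts = Counter(r[3] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def gain(rows, i):
    """Information gain of splitting `rows` on attribute column i."""
    split = {}
    for r in rows:
        split.setdefault(r[i], []).append(r)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in split.values())
    return entropy(rows) - remainder

print(round(entropy(DATA), 3))         # 0.954 = E([3+, 5-])
for i, a in enumerate(ATTRS):
    print(a, round(gain(DATA, i), 3))  # Height 0.003, Hair 0.454, Eyes 0.348
```

Hair has the largest gain, which is why it ends up at (or near) the root.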

The ID3 algorithm:
ID3(Examples, Target_attribute, Attributes)
- Create a Root node for the tree.
- If all Examples are positive, return Root with label = +.
- If all Examples are negative, return Root with label = -.
- If Attributes is empty, return Root with label = the most common value of Target_attribute in Examples.
- Otherwise:
  - A ← the attribute from Attributes that best classifies Examples.
  - Set the decision attribute of Root to A.
  - For each possible value vi of A:
    - Add a branch below Root for the test A = vi.
    - Let Examples_vi be the subset of Examples with value vi for A.
    - If Examples_vi is empty, add a leaf with label = the most common value of Target_attribute in Examples.
    - Otherwise, add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A}).
- Return Root.
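A compact runnable rendering of this pseudocode (our own sketch, not the original course code; it only branches on attribute values that actually occur, so the empty-branch case above never arises here):

```python
import math
from collections import Counter

FIELDS = ["Height", "Hair", "Eyes", "Class"]
DATA = [dict(zip(FIELDS, row)) for row in [
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]]

def entropy(examples):
    counts = Counter(e["Class"] for e in examples)
    return -sum(c / len(examples) * math.log2(c / len(examples))
                for c in counts.values())

def gain(examples, attr):
    subsets = {}
    for e in examples:
        subsets.setdefault(e[attr], []).append(e)
    return entropy(examples) - sum(len(s) / len(examples) * entropy(s)
                                   for s in subsets.values())

def id3(examples, attrs):
    """Return a leaf label, or (attribute, {value: subtree})."""
    labels = [e["Class"] for e in examples]
    if len(set(labels)) == 1:            # all one class -> leaf
        return labels[0]
    if not attrs:                        # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([e for e in examples if e[best] == v], rest)
                   for v in sorted({e[best] for e in examples})})

print(id3(DATA, ["Height", "Hair", "Eyes"]))
# ('Hair', {'Blond': ('Eyes', {'Blue': '+', 'Brown': '-'}), 'Dark': '-', 'Red': '+'})
```

Note that the learned tree uses only Hair and Eyes, i.e., exactly the reduct R = {Hair, Eyes} discussed earlier.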
Issues with ID3 and common refinements:
(1) Overfitting.
Remedies: forward pruning (stop growing the tree early) and backward pruning (grow the full tree, then prune it back), e.g., guided by the MDL principle.
(2) Alternative attribute-selection measures: gain ratio, Gini index, distance measure.
(3) …
Thank You!
