
Machine Learning

(Symbolic methods)
Introduction
What is learning?
→ unknown → known
→ learning enables us to perform better next time
→ learning enables us to change our knowledge structure
→ learning is related to adaptation (change for the better)
→ learning as search (learning → problem solving → search)
ID3-type Learning
(Decision Tree Learning)
Motivation

Goal: to search for order,
to search for structures (trees).
The basic idea of ID3…
Introduction to Information Theory (Coding)
Input → Channel → Output
e.g., P1, P2, …, Pn are the n input symbols.
Measure of information (Uncertainty)
p(w) → 0 ⇒ more surprise
→ surprise contains more information
A highly predictable sequence contains little actual information
Example: 11011011011011011011011011 (what's next?)
Example: I didn't win the lottery this week
A completely unpredictable sequence of n bits contains n bits of information
Example: 01000001110110011010010000 (what's next?)
Example: I just won $10 million in the lottery!!!!
→ we get more information from observing a rare event
Let S = {s_1, s_2, …, s_n} be the possible symbols, with probabilities p(s_1), p(s_2), …, p(s_n).

Question: if s_i happens, how much information do we have?

I(s_i) = -log2 p(s_i)

Properties:
→ p(s_i) = 1 ⇒ I(s_i) = -log2 1 = 0 (a certain event carries no information)
→ p(s_i) → 0 ⇒ I(s_i) = -log2 p(s_i) → ∞
→ p(s_i) ≤ p(s_j) ⇒ I(s_i) ≥ I(s_j)
How do we measure the information of the whole distribution? Take the average (expectation) of I(s_i):

H(P) = -Σ_i p(s_i) log2 p(s_i)

Normally, we call H(P) the entropy of the distribution.
Special cases: a multiple-choice question.
a) P = (1/4, 1/4, 1/4, 1/4): H(P) = log2 4 = 2 bits
b) P = (1/2, 0, 0, 1/2): H(P) = log2 2 = 1 bit
c) P = (0, 0, 0, 1): H(P) = log2 1 = 0 bits
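As a quick check, here is a minimal Python sketch (the function name is ours, not from the slides) that reproduces all three cases with base-2 logarithms:

```python
import math

def entropy(dist):
    """Shannon entropy H(P) = -sum(p * log2 p), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(entropy([1/4, 1/4, 1/4, 1/4]))  # a) log2(4) = 2.0 bits
print(entropy([1/2, 0, 0, 1/2]))      # b) log2(2) = 1.0 bit
print(entropy([0, 0, 0, 1]))          # c) log2(1) = 0.0 bits
```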
Observation: learning can be viewed as an entropy-minimization problem.
How? Search (state-space search).
To answer this question, we need to search.
Decision Tree Learning
(Search for a good decision tree)
Task: to learn the two classes + and - based on three attributes, i.e., Height, Hair, and Eyes.
Object | Height | Hair  | Eyes  | Class
O1     | Short  | Blond | Blue  | +
O2     | Short  | Blond | Brown | -
O3     | Tall   | Red   | Blue  | +
O4     | Tall   | Dark  | Blue  | -
O5     | Tall   | Dark  | Blue  | -
O6     | Tall   | Blond | Blue  | +
O7     | Tall   | Dark  | Brown | -
O8     | Short  | Blond | Brown | -
Decision Tree

→ Each internal node is labeled by an attribute
→ Each external node (leaf) is labeled by a set of objects
→ Each branch is labeled by a value of an attribute


[Figure: an example decision tree. Root: Height. Height = Short → {O1, O2, O8} (+ - -), split on Eyes: Blue → {O1} (+), Brown → {O2, O8} (-). Height = Tall → {O3, O4, O5, O6, O7} (+ - - + -), split on Eyes: Brown → {O7} (-), Blue → {O3, O4, O5, O6} (+ - - +), split on Hair: Blond → {O6} (+), Red → {O3} (+), Dark → {O4, O5} (-).]
Rules:
Height = Tall ∧ Eyes = Brown ⇒ Class = -
Height = Tall ∧ Eyes = Blue ∧ Hair = Blond ⇒ Class = +
…
State space
Each decision tree or partial decision tree is a state.
Goal state: a decision tree whose leaves contain objects of the same class.
Each edge is an expansion of one node of a tree.

Q: How do we search this space for a good tree?
Heuristic search (a greedy algorithm).
What do we mean by a good decision tree?
E.g.: it uses fewer attributes; the height of the DT is low.
Learning is entropy reduction.
Can we use this principle?
E.g., the concept space.
Single attribute-value pairs: so far, we have the rules
Hair = Dark ⇒ - {O4, O5, O7}
Hair = Red ⇒ + {O3}
Eyes = Brown ⇒ - {O2, O8}
{O1, O6} cannot be covered by any single attribute-value pair.
Pairs of attribute-values: how many concepts are possible at each level? (Counted in the sketch below.)
Triples of attribute-values: …
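As a rough illustration, the candidate concepts at each level can be counted by enumeration (the encoding below is our own, not from the slides):

```python
from itertools import combinations, product

VALUES = {"Height": ["Short", "Tall"],
          "Hair":   ["Blond", "Red", "Dark"],
          "Eyes":   ["Blue", "Brown"]}

def concepts(level):
    """All conjunctions of `level` attribute=value pairs over distinct attributes."""
    out = []
    for attrs in combinations(VALUES, level):
        for vals in product(*(VALUES[a] for a in attrs)):
            out.append(dict(zip(attrs, vals)))
    return out

print(len(concepts(1)))  # 7 single attribute-value concepts: 2 + 3 + 2
print(len(concepts(2)))  # 16 pairs: 2*3 + 2*2 + 3*2
print(len(concepts(3)))  # 12 triples: 2*3*2
```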

PRISM Algorithm
PRISM generates rules for each class by looking at the training data and adding rules that completely describe all tuples in that class (Witten and Frank, 2000).
PRISM learning
KR learning
We search the concept space level by level (breadth-first search).
Question: which node do we search first?
* Choose the node with the smallest entropy value. (A sketch follows below.)
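A minimal sketch of the PRISM idea as sequential covering (this is our reading of Witten and Frank's description, not their code; helper names and the tuple encoding are ours): for each class, grow a rule by greedily adding the attribute=value test with the highest accuracy on the examples the rule currently covers; once the rule is pure, remove the covered examples and repeat.

```python
ATTRS = ["Height", "Hair", "Eyes"]
DATA = [  # (Height, Hair, Eyes, Class) for objects O1..O8
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]

def prism(data, target):
    """Return rules for `target`; each rule is a list of (attribute_index, value)."""
    rules, remaining = [], list(data)
    while any(r[3] == target for r in remaining):
        rule, covered = [], remaining
        while any(r[3] != target for r in covered):  # grow until the rule is pure
            def accuracy(test):
                i, v = test
                match = [r for r in covered if r[i] == v]
                return sum(r[3] == target for r in match) / len(match)
            candidates = [(i, v) for i in range(len(ATTRS))
                          for v in {r[i] for r in covered} if (i, v) not in rule]
            best = max(candidates, key=accuracy)  # real PRISM breaks ties by coverage
            rule.append(best)
            covered = [r for r in covered if r[best[0]] == best[1]]
        rules.append(rule)
        remaining = [r for r in remaining  # drop examples the new rule covers
                     if not all(r[i] == v for i, v in rule)]
    return rules

for rule in prism(DATA, "+"):
    print(" AND ".join(f"{ATTRS[i]}={v}" for i, v in rule), "=> +")
```

On this table, with ties broken by attribute order, the sketch outputs Hair=Red => +, then Hair=Blond AND Height=Tall => +, then Height=Short AND Eyes=Blue => +.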

Comparison of ID3 and PRISM

ID3                                  | PRISM
each object satisfies only one rule  | any object may satisfy more than one rule
larger (longer) rules                | shorter rules
fewer attributes                     | more attributes
global                               | local
Question: Can we find a minimal set of attributes?
R = {Hair, Eyes}

Reduct, a subset of attributes:
1) Suppose R ⊆ At, a subset of the attribute set At.
2) R is a reduct of At if the following conditions hold:
→ R is sufficient;
→ each attribute a ∈ R is necessary (that is, R - {a} is not sufficient).

R = {Hair, Eyes} is a reduct:
→ it is sufficient;
→ Hair is necessary, and Eyes is necessary.

Question: How do we find a reduct?
Deletion method
(use it when there are many attributes):
{ try to remove one attribute at a time;
  remove it if you can }

E.g.:
Question: which attribute should we try to remove first?
Try to remove the attribute with the highest entropy value!
(A sketch follows below.)
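A sketch of the deletion method in Python (our encoding: "R is sufficient" is taken to mean that no two objects agree on every attribute in R yet carry different class labels; the slides' entropy-ordering heuristic is replaced here by a fixed scan order):

```python
ATTRS = ["Height", "Hair", "Eyes"]
DATA = [  # (Height, Hair, Eyes, Class) for objects O1..O8
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]

def sufficient(attr_idxs, data):
    """No two objects agree on every attribute in attr_idxs but differ in class."""
    seen = {}
    for r in data:
        key = tuple(r[i] for i in attr_idxs)
        if seen.setdefault(key, r[3]) != r[3]:
            return False
    return True

def reduct(data):
    keep = list(range(len(ATTRS)))
    for i in list(keep):                 # try to remove one attribute at a time
        trial = [j for j in keep if j != i]
        if sufficient(trial, data):      # remove it if the rest still suffices
            keep = trial
    return [ATTRS[i] for i in keep]

print(reduct(DATA))  # ['Hair', 'Eyes'] -- matches the reduct R = {Hair, Eyes}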


Question: Does ID3 provide a reduct?
Rule Redundancy (proving a rule redundant)
a1 ∧ a2 ∧ … ∧ an ⇒ +
A family of decision-tree learning systems:
CLS
ID3
C4.5
CART, Assistant
[Slides lost in extraction: they restate the example (objects described by the attributes Height, Hair, and Eyes, plus a Class label) and note that finding a minimal decision tree is an NP-complete problem.]
E.g., S = [3+, 5-]:
E(S) = E([3+, 5-]) = -(3/8) log2(3/8) - (5/8) log2(5/8) ≈ 0.954
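The same number, plus the information gains ID3 compares when choosing the root, in a short sketch (the gain formula Gain(S, A) = E(S) - Σ_v (|S_v|/|S|) · E(S_v) is standard; variable names are ours):

```python
import math
from collections import Counter

ATTRS = ["Height", "Hair", "Eyes"]
DATA = [  # (Height, Hair, Eyes, Class) for objects O1..O8
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]

def entropy(rows):
    """E(S) over the class labels in column 3."""
    counts = Counter(r[3] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def gain(rows, i):
    """Information gain of splitting `rows` on attribute column i."""
    split = {}
    for r in rows:
        split.setdefault(r[i], []).append(r)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in split.values())
    return entropy(rows) - remainder

print(round(entropy(DATA), 3))         # 0.954 = E([3+, 5-])
for i, a in enumerate(ATTRS):
    print(a, round(gain(DATA, i), 3))  # Height 0.003, Hair 0.454, Eyes 0.348
```

Hair has the largest gain, which is why it ends up at (or near) the root.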

The ID3 algorithm:
ID3(Examples, Target_attribute, Attributes)
- Create a Root node for the tree.
- If all Examples are positive, return Root with label = +.
- If all Examples are negative, return Root with label = -.
- If Attributes is empty, return Root with label = the most common value of Target_attribute in Examples.
- Otherwise:
  - A ← the attribute from Attributes that best classifies Examples.
  - Set the decision attribute of Root to A.
  - For each possible value vi of A:
    - Add a branch below Root for the test A = vi.
    - Let Examples_vi be the subset of Examples with value vi for A.
    - If Examples_vi is empty, add a leaf with label = the most common value of Target_attribute in Examples.
    - Otherwise, add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A}).
- Return Root.
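A compact runnable rendering of this pseudocode (our own sketch, not the original course code; it only branches on attribute values that actually occur, so the empty-branch case above never arises here):

```python
import math
from collections import Counter

FIELDS = ["Height", "Hair", "Eyes", "Class"]
DATA = [dict(zip(FIELDS, row)) for row in [
    ("Short", "Blond", "Blue", "+"), ("Short", "Blond", "Brown", "-"),
    ("Tall", "Red", "Blue", "+"),    ("Tall", "Dark", "Blue", "-"),
    ("Tall", "Dark", "Blue", "-"),   ("Tall", "Blond", "Blue", "+"),
    ("Tall", "Dark", "Brown", "-"),  ("Short", "Blond", "Brown", "-"),
]]

def entropy(examples):
    counts = Counter(e["Class"] for e in examples)
    return -sum(c / len(examples) * math.log2(c / len(examples))
                for c in counts.values())

def gain(examples, attr):
    subsets = {}
    for e in examples:
        subsets.setdefault(e[attr], []).append(e)
    return entropy(examples) - sum(len(s) / len(examples) * entropy(s)
                                   for s in subsets.values())

def id3(examples, attrs):
    """Return a leaf label, or (attribute, {value: subtree})."""
    labels = [e["Class"] for e in examples]
    if len(set(labels)) == 1:            # all one class -> leaf
        return labels[0]
    if not attrs:                        # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([e for e in examples if e[best] == v], rest)
                   for v in sorted({e[best] for e in examples})})

print(id3(DATA, ["Height", "Hair", "Eyes"]))
# ('Hair', {'Blond': ('Eyes', {'Blue': '+', 'Brown': '-'}), 'Dark': '-', 'Red': '+'})
```

Note that the learned tree uses only Hair and Eyes, i.e., exactly the reduct R = {Hair, Eyes} discussed earlier.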
Issues with ID3 and common refinements:
(1) Overfitting.
Remedies: forward pruning (stop growing the tree early) and backward pruning (grow the full tree, then prune it back), e.g., guided by the MDL principle.
(2) Alternative attribute-selection measures: gain ratio, Gini index, distance measure.
(3) …
Thank You!
