
Learning Sets of Rules

[Read Ch. 10]


[Recommended exercises 10.1, 10.2, 10.5, 10.7, 10.8]
• Sequential covering algorithms
• FOIL
• Induction as inverse of deduction
• Inductive Logic Programming

Lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
Learning Disjunctive Sets of Rules

Method 1: Learn decision tree, convert to rules


Method 2: Sequential covering algorithm:
1. Learn one rule with high accuracy, any coverage
2. Remove positive examples covered by this rule
3. Repeat

Sequential Covering Algorithm

Sequential-Covering(Target_attribute, Attributes, Examples, Threshold)
• Learned_rules ← {}
• Rule ← Learn-One-Rule(Target_attribute, Attributes, Examples)
• while Performance(Rule, Examples) > Threshold, do
  – Learned_rules ← Learned_rules + Rule
  – Examples ← Examples − {examples correctly classified by Rule}
  – Rule ← Learn-One-Rule(Target_attribute, Attributes, Examples)
• Learned_rules ← sort Learned_rules according to Performance over Examples
• return Learned_rules
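A minimal Python sketch of this loop; learn_one_rule, performance, and covers are assumed interfaces, not defined on these slides:

def sequential_covering(target, attributes, examples, threshold,
                        learn_one_rule, performance, covers):
    learned_rules = []
    remaining = list(examples)
    rule = learn_one_rule(target, attributes, remaining)
    while performance(rule, remaining) > threshold:
        learned_rules.append(rule)
        # Remove the examples this rule already classifies correctly.
        remaining = [ex for ex in remaining if not covers(rule, ex)]
        rule = learn_one_rule(target, attributes, remaining)
    # Sort the rules by performance over the full training set.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules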

Learn-One-Rule

[Search tree: each step specializes the current rule by adding one precondition; the indented level shows the specializations of IF Humidity=normal]

IF THEN PlayTennis=yes
  IF Wind=weak THEN PlayTennis=yes
  IF Wind=strong THEN PlayTennis=no
  IF Humidity=high THEN PlayTennis=no
  IF Humidity=normal THEN PlayTennis=yes
  ...
    IF Humidity=normal ∧ Wind=weak THEN PlayTennis=yes
    IF Humidity=normal ∧ Wind=strong THEN PlayTennis=yes
    IF Humidity=normal ∧ Outlook=sunny THEN PlayTennis=yes
    IF Humidity=normal ∧ Outlook=rain THEN PlayTennis=yes
Learn-One-Rule
• Pos ← positive Examples
• Neg ← negative Examples
• while Pos, do
  Learn a NewRule
  – NewRule ← most general rule possible
  – NewRuleNeg ← Neg
  – while NewRuleNeg, do
    Add a new literal to specialize NewRule
    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax_{L ∈ Candidate_literals} Performance(SpecializeRule(NewRule, L))
    3. add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
  – Learned_rules ← Learned_rules + NewRule
  – Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned_rules
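A Python sketch of the inner general-to-specific search; here a rule is just a list of precondition literals, and candidate_literals, specialize, performance, and satisfies are assumed helpers:

def learn_one_rule(pos, neg, candidate_literals, specialize,
                   performance, satisfies):
    rule = []                 # empty precondition list = most general rule
    rule_neg = list(neg)
    while rule_neg:
        # Greedily add the literal whose specialization scores best.
        best = max(candidate_literals(rule),
                   key=lambda lit: performance(specialize(rule, lit), pos, neg))
        rule = specialize(rule, best)
        # Keep only the negatives still satisfying the preconditions.
        rule_neg = [ex for ex in rule_neg if satisfies(ex, rule)]
    return rule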

Subtleties: Learn One Rule

1. May use beam search


2. Easily generalizes to multi-valued target
functions
3. Choose evaluation function to guide search:
• Entropy (i.e., information gain)
• Sample accuracy: nc / n
  where nc = number of correct rule predictions, n = all predictions
• m-estimate of accuracy: (nc + m·p) / (n + m)
  where p is the prior probability of the class and m weights the prior
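These three evaluation functions transcribe directly into Python (the count arguments are assumed to be supplied by the caller):

from math import log2

def sample_accuracy(n_c, n):
    # n_c = correct predictions by the rule, n = all its predictions
    return n_c / n

def m_estimate(n_c, n, p, m):
    # p = prior probability of the class, m = weight given to the prior
    return (n_c + m * p) / (n + m)

def entropy(class_counts):
    # Entropy of the examples covered by the rule (lower is better)
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c > 0)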

Variants of Rule Learning Programs

• Sequential or simultaneous covering of data?
• General → specific, or specific → general?
• Generate-and-test, or example-driven?
• Whether and how to post-prune?
• What statistical evaluation function?

Learning First Order Rules

Why do that?
• Can learn sets of rules such as
  Ancestor(x, y) ← Parent(x, y)
  Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
• General purpose programming language Prolog: programs are sets of such rules

First Order Rule for Classifying Web Pages

[Slattery, 1997]
course(A) ←
  has-word(A, instructor),
  NOT has-word(A, good),
  link-from(A, B),
  has-word(B, assign),
  NOT link-from(B, C)
Train: 31/31, Test: 31/34

FOIL(Target_predicate, Predicates, Examples)
• Pos ← positive Examples
• Neg ← negative Examples
• while Pos, do
  Learn a NewRule
  – NewRule ← most general rule possible
  – NewRuleNeg ← Neg
  – while NewRuleNeg, do
    Add a new literal to specialize NewRule
    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax_{L ∈ Candidate_literals} FoilGain(L, NewRule)
    3. add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
  – Learned_rules ← Learned_rules + NewRule
  – Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned_rules

Specializing Rules in FOIL

Learning rule: P(x1, x2, ..., xk) ← L1 ∧ ... ∧ Ln

Candidate specializations add a new literal of form:
• Q(v1, ..., vr), where at least one of the vi in the created literal must already exist as a variable in the rule
• Equal(xj, xk), where xj and xk are variables already present in the rule
• The negation of either of the above forms of literals
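A sketch of this candidate generation in Python, under the assumption that a literal is a (predicate, args, negated) tuple and each predicate is given as a (name, arity) pair:

from itertools import product

def candidate_literals(predicates, rule_vars, max_new_vars=1):
    new_vars = [f"v{i}" for i in range(max_new_vars)]
    variables = list(rule_vars) + new_vars
    candidates = []
    for name, arity in predicates:                 # e.g. ("Father", 2)
        for args in product(variables, repeat=arity):
            if any(a in rule_vars for a in args):  # must reuse a rule variable
                candidates.append((name, args, False))   # Q(v1, ..., vr)
                candidates.append((name, args, True))    # its negation
    for x, y in product(rule_vars, repeat=2):      # Equal(x, y) and negation
        if x != y:
            candidates.append(("Equal", (x, y), False))
            candidates.append(("Equal", (x, y), True))
    return candidates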

Information Gain in FOIL

FoilGain(L, R) ≡ t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )

where
• L is the candidate literal to add to rule R
• p0 = number of positive bindings of R
• n0 = number of negative bindings of R
• p1 = number of positive bindings of R + L
• n1 = number of negative bindings of R + L
• t is the number of positive bindings of R also covered by R + L

Note:
• −log2( p0 / (p0 + n0) ) is the optimal number of bits to indicate the class of a positive binding covered by R
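A direct transcription into Python; computing the binding counts is left to the caller, and the sketch assumes p0 > 0 and p1 > 0 as in the formula above:

from math import log2

def foil_gain(p0, n0, p1, n1, t):
    # t = positive bindings of R still covered after adding L
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))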

Induction as Inverted Deduction

Induction is finding h such that

(∀⟨xi, f(xi)⟩ ∈ D)  B ∧ h ∧ xi ⊢ f(xi)

where
• xi is the ith training instance
• f(xi) is the target function value for xi
• B is other background knowledge

So let's design inductive algorithms by inverting operators for automated deduction!

Induction as Inverted Deduction

"pairs of people ⟨u, v⟩ such that child of u is v"

f(xi): Child(Bob, Sharon)
xi:    Male(Bob), Female(Sharon), Father(Sharon, Bob)
B:     Parent(u, v) ← Father(u, v)

What satisfies (∀⟨xi, f(xi)⟩ ∈ D) B ∧ h ∧ xi ⊢ f(xi)?

h1: Child(u, v) ← Father(v, u)
h2: Child(u, v) ← Parent(v, u)

Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; ... it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction ...

(Jevons 1874)

Induction as Inverted Deduction

We have mechanical deductive operators
F(A, B) = C, where A ∧ B ⊢ C

We need inductive operators
O(B, D) = h, where (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)

Induction as Inverted Deduction

Positives:
• Subsumes earlier idea of finding h that "fits" training data
• Domain theory B helps define the meaning of "fitting" the data:
  B ∧ h ∧ xi ⊢ f(xi)
• Suggests algorithms that search H guided by B

Induction as Inverted Deduction

Negatives:
• Doesn't allow for noisy data. Consider
  (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
• First-order logic gives a huge hypothesis space H
  → overfitting...
  → intractability of calculating all acceptable h's

Deduction: Resolution Rule

P ∨ L
¬L ∨ R
――――――
P ∨ R

1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is
   C = (C1 − {L}) ∪ (C2 − {¬L})
   where ∪ denotes set union and "−" denotes set difference.
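A propositional sketch in Python, representing a clause as a frozenset of string literals with "~" marking negation (a representation chosen here, not given on the slides):

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    # Return every resolvent obtainable from clauses c1 and c2.
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# The PassExam example from the next slide:
c1 = frozenset({"PassExam", "~KnowMaterial"})
c2 = frozenset({"KnowMaterial", "~Study"})
print(resolve(c1, c2))   # [frozenset({'PassExam', '~Study'})]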

Inverting Resolution

[Figure: the resolution step (left) and its inversion (right), over the same three clauses]

C2: KnowMaterial ∨ ¬Study
C1: PassExam ∨ ¬KnowMaterial
C:  PassExam ∨ ¬Study
Inverted Resolution (Propositional)

1. Given initial clauses C1 and C, find a literal L that occurs in clause C1, but not in clause C.
2. Form the second clause C2 by including the following literals:
   C2 = (C − (C1 − {L})) ∪ {¬L}
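With the same clause representation as before, this operator is one line; the sketch reconstructs the C2 of the previous slide from C and C1:

def negate(lit):   # as in the resolution sketch above
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(c, c1, lit):
    # lit must occur in c1 but not in c (step 1 above)
    assert lit in c1 and lit not in c
    return (c - (c1 - {lit})) | {negate(lit)}

c  = frozenset({"PassExam", "~Study"})
c1 = frozenset({"PassExam", "~KnowMaterial"})
print(inverse_resolve(c, c1, "~KnowMaterial"))
# frozenset({'KnowMaterial', '~Study'})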

First order resolution

1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that
   L1θ = ¬L2θ
2. Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is
   C = (C1 − {L1})θ ∪ (C2 − {L2})θ
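A sketch assuming a literal is a (negated, name, args) tuple and a substitution is a dict from variable names to terms; unification itself is not implemented, so θ must be supplied:

def subst(theta, lit):
    neg, name, args = lit
    return (neg, name, tuple(theta.get(a, a) for a in args))

def fo_resolve(c1, l1, c2, l2, theta):
    # Assumes theta already unifies l1 with the negation of l2;
    # a full implementation would compute theta by unification.
    assert subst(theta, l1)[1:] == subst(theta, l2)[1:] and l1[0] != l2[0]
    return ({subst(theta, l) for l in c1 if l != l1}
            | {subst(theta, l) for l in c2 if l != l2})

# The first step of the Cigol example two slides below:
l1 = (False, "Father", ("Tom", "Bob"))
l2 = (True, "Father", ("z", "y"))
c2 = {(False, "GrandChild", ("y", "x")), (True, "Father", ("x", "z")), l2}
print(fo_resolve({l1}, l1, c2, l2, {"y": "Bob", "z": "Tom"}))
# {(False, 'GrandChild', ('Bob', 'x')), (True, 'Father', ('x', 'Tom'))}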

Inverting First order resolution

C2 = (C − (C1 − {L1})θ1) θ2⁻¹ ∪ {¬L1 θ1 θ2⁻¹}

where the unifying substitution is factored as θ = θ1θ2, with θ1 containing the substitutions for variables from C1 and θ2 those for variables from C2.

Cigol

Father(Tom, Bob)        GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y)

        {Bob/y, Tom/z}

Father(Shannon, Tom)    GrandChild(Bob, x) ∨ ¬Father(x, Tom)

        {Shannon/x}

GrandChild(Bob, Shannon)

Progol

Progol: Reduce combinatorial explosion by generating the most specific acceptable h

1. User specifies H by stating predicates, functions, and forms of arguments allowed for each
2. Progol uses a sequential covering algorithm. For each ⟨xi, f(xi)⟩:
   • Find the most specific hypothesis hi such that B ∧ hi ∧ xi ⊢ f(xi)
     (actually, considers only k-step entailment)
3. Conduct general-to-specific search bounded by specific hypothesis hi, choosing the hypothesis with minimum description length

