
Learning Sets of Rules

[Read Ch. 10]


[Recommended exercises 10.1, 10.2, 10.5, 10.7, 10.8]
• Sequential covering algorithms
• FOIL
• Induction as inverse of deduction
• Inductive Logic Programming

Lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997
Learning Disjunctive Sets of Rules

Method 1: Learn decision tree, convert to rules


Method 2: Sequential covering algorithm:
1. Learn one rule with high accuracy, any coverage
2. Remove positive examples covered by this rule
3. Repeat

Sequential Covering Algorithm

Sequential-Covering(Target_attribute, Attributes, Examples, Threshold)
• Learned_rules ← {}
• Rule ← Learn-One-Rule(Target_attribute, Attributes, Examples)
• while Performance(Rule, Examples) > Threshold, do
  – Learned_rules ← Learned_rules + Rule
  – Examples ← Examples − {examples correctly classified by Rule}
  – Rule ← Learn-One-Rule(Target_attribute, Attributes, Examples)
• Learned_rules ← sort Learned_rules according to Performance over Examples
• return Learned_rules
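A minimal Python sketch of this loop; learn_one_rule, performance, and covers are assumed interfaces, not defined on these slides:

def sequential_covering(target, attributes, examples, threshold,
                        learn_one_rule, performance, covers):
    learned_rules = []
    remaining = list(examples)
    rule = learn_one_rule(target, attributes, remaining)
    while performance(rule, remaining) > threshold:
        learned_rules.append(rule)
        # Remove the examples this rule already classifies correctly.
        remaining = [ex for ex in remaining if not covers(rule, ex)]
        rule = learn_one_rule(target, attributes, remaining)
    # Sort the rules by performance over the full training set.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules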

Learn-One-Rule

[Search tree: each step specializes the current rule by adding one precondition; the indented level shows the specializations of IF Humidity=normal]

IF THEN PlayTennis=yes
  IF Wind=weak THEN PlayTennis=yes
  IF Wind=strong THEN PlayTennis=no
  IF Humidity=high THEN PlayTennis=no
  IF Humidity=normal THEN PlayTennis=yes
  ...
    IF Humidity=normal ∧ Wind=weak THEN PlayTennis=yes
    IF Humidity=normal ∧ Wind=strong THEN PlayTennis=yes
    IF Humidity=normal ∧ Outlook=sunny THEN PlayTennis=yes
    IF Humidity=normal ∧ Outlook=rain THEN PlayTennis=yes
Learn-One-Rule
• Pos ← positive Examples
• Neg ← negative Examples
• while Pos, do
  Learn a NewRule
  – NewRule ← most general rule possible
  – NewRuleNeg ← Neg
  – while NewRuleNeg, do
    Add a new literal to specialize NewRule
    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax_{L ∈ Candidate_literals} Performance(SpecializeRule(NewRule, L))
    3. add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
  – Learned_rules ← Learned_rules + NewRule
  – Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned_rules
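A Python sketch of the inner general-to-specific search; here a rule is just a list of precondition literals, and candidate_literals, specialize, performance, and satisfies are assumed helpers:

def learn_one_rule(pos, neg, candidate_literals, specialize,
                   performance, satisfies):
    rule = []                 # empty precondition list = most general rule
    rule_neg = list(neg)
    while rule_neg:
        # Greedily add the literal whose specialization scores best.
        best = max(candidate_literals(rule),
                   key=lambda lit: performance(specialize(rule, lit), pos, neg))
        rule = specialize(rule, best)
        # Keep only the negatives still satisfying the preconditions.
        rule_neg = [ex for ex in rule_neg if satisfies(ex, rule)]
    return rule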

Subtleties: Learn One Rule

1. May use beam search


2. Easily generalizes to multi-valued target
functions
3. Choose evaluation function to guide search:
• Entropy (i.e., information gain)
• Sample accuracy: nc / n
  where nc = number of correct rule predictions, n = all predictions
• m-estimate of accuracy: (nc + m·p) / (n + m)
  where p is the prior probability of the class and m weights the prior
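These three evaluation functions transcribe directly into Python (the count arguments are assumed to be supplied by the caller):

from math import log2

def sample_accuracy(n_c, n):
    # n_c = correct predictions by the rule, n = all its predictions
    return n_c / n

def m_estimate(n_c, n, p, m):
    # p = prior probability of the class, m = weight given to the prior
    return (n_c + m * p) / (n + m)

def entropy(class_counts):
    # Entropy of the examples covered by the rule (lower is better)
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c > 0)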

Variants of Rule Learning Programs

• Sequential or simultaneous covering of data?
• General → specific, or specific → general?
• Generate-and-test, or example-driven?
• Whether and how to post-prune?
• What statistical evaluation function?

Learning First Order Rules

Why do that?
• Can learn sets of rules such as
  Ancestor(x, y) ← Parent(x, y)
  Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
• General purpose programming language Prolog: programs are sets of such rules

First Order Rule for Classifying Web Pages

[Slattery, 1997]
course(A) ←
  has-word(A, instructor),
  NOT has-word(A, good),
  link-from(A, B),
  has-word(B, assign),
  NOT link-from(B, C)
Train: 31/31, Test: 31/34

FOIL(Target_predicate, Predicates, Examples)
• Pos ← positive Examples
• Neg ← negative Examples
• while Pos, do
  Learn a NewRule
  – NewRule ← most general rule possible
  – NewRuleNeg ← Neg
  – while NewRuleNeg, do
    Add a new literal to specialize NewRule
    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax_{L ∈ Candidate_literals} FoilGain(L, NewRule)
    3. add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
  – Learned_rules ← Learned_rules + NewRule
  – Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned_rules

Specializing Rules in FOIL

Learning rule: P(x1, x2, ..., xk) ← L1 ∧ ... ∧ Ln

Candidate specializations add a new literal of form:
• Q(v1, ..., vr), where at least one of the vi in the created literal must already exist as a variable in the rule
• Equal(xj, xk), where xj and xk are variables already present in the rule
• The negation of either of the above forms of literals
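A sketch of this candidate generation in Python, under the assumption that a literal is a (predicate, args, negated) tuple and each predicate is given as a (name, arity) pair:

from itertools import product

def candidate_literals(predicates, rule_vars, max_new_vars=1):
    new_vars = [f"v{i}" for i in range(max_new_vars)]
    variables = list(rule_vars) + new_vars
    candidates = []
    for name, arity in predicates:                 # e.g. ("Father", 2)
        for args in product(variables, repeat=arity):
            if any(a in rule_vars for a in args):  # must reuse a rule variable
                candidates.append((name, args, False))   # Q(v1, ..., vr)
                candidates.append((name, args, True))    # its negation
    for x, y in product(rule_vars, repeat=2):      # Equal(x, y) and negation
        if x != y:
            candidates.append(("Equal", (x, y), False))
            candidates.append(("Equal", (x, y), True))
    return candidates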

Information Gain in FOIL

FoilGain(L, R) ≡ t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )

where
• L is the candidate literal to add to rule R
• p0 = number of positive bindings of R
• n0 = number of negative bindings of R
• p1 = number of positive bindings of R + L
• n1 = number of negative bindings of R + L
• t is the number of positive bindings of R also covered by R + L

Note:
• −log2( p0 / (p0 + n0) ) is the optimal number of bits to indicate the class of a positive binding covered by R
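A direct transcription into Python; computing the binding counts is left to the caller, and the sketch assumes p0 > 0 and p1 > 0 as in the formula above:

from math import log2

def foil_gain(p0, n0, p1, n1, t):
    # t = positive bindings of R still covered after adding L
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))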

Induction as Inverted Deduction

Induction is finding h such that

(∀⟨xi, f(xi)⟩ ∈ D)  B ∧ h ∧ xi ⊢ f(xi)

where
• xi is the ith training instance
• f(xi) is the target function value for xi
• B is other background knowledge

So let's design inductive algorithms by inverting operators for automated deduction!

Induction as Inverted Deduction

"pairs of people ⟨u, v⟩ such that child of u is v"

f(xi): Child(Bob, Sharon)
xi:    Male(Bob), Female(Sharon), Father(Sharon, Bob)
B:     Parent(u, v) ← Father(u, v)

What satisfies (∀⟨xi, f(xi)⟩ ∈ D) B ∧ h ∧ xi ⊢ f(xi)?

h1: Child(u, v) ← Father(v, u)
h2: Child(u, v) ← Parent(v, u)

Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; ... it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction ...

(Jevons 1874)

Induction as Inverted Deduction

We have mechanical deductive operators
F(A, B) = C, where A ∧ B ⊢ C

We need inductive operators
O(B, D) = h, where (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)

Induction as Inverted Deduction

Positives:
• Subsumes earlier idea of finding h that "fits" training data
• Domain theory B helps define the meaning of "fitting" the data:
  B ∧ h ∧ xi ⊢ f(xi)
• Suggests algorithms that search H guided by B

Induction as Inverted Deduction

Negatives:
• Doesn't allow for noisy data. Consider
  (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
• First-order logic gives a huge hypothesis space H
  → overfitting...
  → intractability of calculating all acceptable h's

Deduction: Resolution Rule

P ∨ L
¬L ∨ R
――――――
P ∨ R

1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is
   C = (C1 − {L}) ∪ (C2 − {¬L})
   where ∪ denotes set union and "−" denotes set difference.
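A propositional sketch in Python, representing a clause as a frozenset of string literals with "~" marking negation (a representation chosen here, not given on the slides):

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    # Return every resolvent obtainable from clauses c1 and c2.
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# The PassExam example from the next slide:
c1 = frozenset({"PassExam", "~KnowMaterial"})
c2 = frozenset({"KnowMaterial", "~Study"})
print(resolve(c1, c2))   # [frozenset({'PassExam', '~Study'})]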

Inverting Resolution

[Figure: the resolution step (left) and its inversion (right), over the same three clauses]

C2: KnowMaterial ∨ ¬Study
C1: PassExam ∨ ¬KnowMaterial
C:  PassExam ∨ ¬Study
Inverted Resolution (Propositional)

1. Given initial clauses C1 and C, find a literal L that occurs in clause C1, but not in clause C.
2. Form the second clause C2 by including the following literals:
   C2 = (C − (C1 − {L})) ∪ {¬L}
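With the same clause representation as before, this operator is one line; the sketch reconstructs the C2 of the previous slide from C and C1:

def negate(lit):   # as in the resolution sketch above
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(c, c1, lit):
    # lit must occur in c1 but not in c (step 1 above)
    assert lit in c1 and lit not in c
    return (c - (c1 - {lit})) | {negate(lit)}

c  = frozenset({"PassExam", "~Study"})
c1 = frozenset({"PassExam", "~KnowMaterial"})
print(inverse_resolve(c, c1, "~KnowMaterial"))
# frozenset({'KnowMaterial', '~Study'})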

First order resolution

1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that
   L1θ = ¬L2θ
2. Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is
   C = (C1 − {L1})θ ∪ (C2 − {L2})θ
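A sketch assuming a literal is a (negated, name, args) tuple and a substitution is a dict from variable names to terms; unification itself is not implemented, so θ must be supplied:

def subst(theta, lit):
    neg, name, args = lit
    return (neg, name, tuple(theta.get(a, a) for a in args))

def fo_resolve(c1, l1, c2, l2, theta):
    # Assumes theta already unifies l1 with the negation of l2;
    # a full implementation would compute theta by unification.
    assert subst(theta, l1)[1:] == subst(theta, l2)[1:] and l1[0] != l2[0]
    return ({subst(theta, l) for l in c1 if l != l1}
            | {subst(theta, l) for l in c2 if l != l2})

# The first step of the Cigol example two slides below:
l1 = (False, "Father", ("Tom", "Bob"))
l2 = (True, "Father", ("z", "y"))
c2 = {(False, "GrandChild", ("y", "x")), (True, "Father", ("x", "z")), l2}
print(fo_resolve({l1}, l1, c2, l2, {"y": "Bob", "z": "Tom"}))
# {(False, 'GrandChild', ('Bob', 'x')), (True, 'Father', ('x', 'Tom'))}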

Inverting First order resolution

C2 = (C − (C1 − {L1})θ1) θ2⁻¹ ∪ {¬L1 θ1 θ2⁻¹}

where the unifying substitution is factored as θ = θ1θ2, with θ1 containing the substitutions for variables from C1 and θ2 those for variables from C2.

Cigol

Father(Tom, Bob)        GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y)

        {Bob/y, Tom/z}

Father(Shannon, Tom)    GrandChild(Bob, x) ∨ ¬Father(x, Tom)

        {Shannon/x}

GrandChild(Bob, Shannon)

Progol

Progol: Reduce combinatorial explosion by generating the most specific acceptable h

1. User specifies H by stating predicates, functions, and forms of arguments allowed for each
2. Progol uses a sequential covering algorithm. For each ⟨xi, f(xi)⟩:
   • Find the most specific hypothesis hi such that B ∧ hi ∧ xi ⊢ f(xi)
     (actually, considers only k-step entailment)
3. Conduct general-to-specific search bounded by specific hypothesis hi, choosing the hypothesis with minimum description length

