
Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

PAC-learning
Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599

Copyright © 2001, Andrew W. Moore Nov 30th, 2001

Probably Approximately Correct (PAC) Learning
• Imagine we're doing classification with categorical inputs.
• All inputs and outputs are binary.
• Data is noiseless.
• There's a machine f(x,h) which has H possible settings (a.k.a. hypotheses), called h1, h2, ..., hH.
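As a concrete illustration, here is a minimal Python sketch of such a machine, assuming a simple representation in which each hypothesis is a binary-valued function and the machine is just an indexed list of them (the class name and representation are illustrative assumptions, not from the slides):

```python
# A minimal sketch: a "machine" f(x, h) is a finite indexed family of
# hypotheses, each mapping a binary input vector to 0 or 1.
from typing import Callable, List, Sequence

Hypothesis = Callable[[Sequence[int]], int]

class Machine:
    def __init__(self, hypotheses: List[Hypothesis]):
        self.hypotheses = hypotheses          # h1, h2, ..., hH
        self.H = len(hypotheses)              # number of possible settings

    def f(self, x: Sequence[int], i: int) -> int:
        """Evaluate hypothesis h_i on the binary input x."""
        return self.hypotheses[i](x)

# Example: a tiny machine over 2 binary attributes.
machine = Machine([
    lambda x: 1,                 # always true
    lambda x: x[0],              # X1
    lambda x: x[0] and x[1],     # X1 ^ X2
])
print(machine.H, machine.f((1, 0), 2))   # -> 3 0
```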

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 2

Example of a machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm that contain only logical ands.
• Example hypotheses:
  • X1 ^ X3 ^ X19
  • X3 ^ X18
  • X7
  • X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are 3 attributes, what is the complete set of hypotheses in f?

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 3

Example of a machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm that contain only logical ands.
• Example hypotheses:
  • X1 ^ X3 ^ X19
  • X3 ^ X18
  • X7
  • X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are 3 attributes, what is the complete set of hypotheses in f? (H = 8)

  True      X2          X3          X2 ^ X3
  X1        X1 ^ X2     X1 ^ X3     X1 ^ X2 ^ X3
Copyright © 2001, Andrew W. Moore PAC-learning: Slide 4

And-Positive-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm that contain only logical ands.
• Example hypotheses:
  • X1 ^ X3 ^ X19
  • X3 ^ X18
  • X7
  • X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are m attributes, how many hypotheses in f?

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 5

And-Positive-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm that contain only logical ands.
• Example hypotheses:
  • X1 ^ X3 ^ X19
  • X3 ^ X18
  • X7
  • X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are m attributes, how many hypotheses in f? (H = 2^m)
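The count H = 2^m follows because each attribute is either included in the conjunction or not. A short sketch that enumerates the hypotheses and checks the count, assuming each hypothesis is represented as the tuple of attribute indices it requires:

```python
# Each and-positive-literals hypothesis is a subset of attributes that must
# all be true; the empty subset is the always-true sentence "True".
from itertools import combinations

def and_positive_hypotheses(m):
    """Yield each hypothesis as a tuple of required attribute indices."""
    attrs = range(1, m + 1)
    for k in range(m + 1):
        yield from combinations(attrs, k)

hyps = list(and_positive_hypotheses(3))
print(len(hyps))   # -> 8, i.e. H = 2^3
print(hyps)        # (), (1,), (2,), (3,), (1,2), (1,3), (2,3), (1,2,3)
```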

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 6

And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm or their negations that contain only logical ands.
• Example hypotheses:
  • X1 ^ ~X3 ^ X19
  • X3 ^ ~X18
  • ~X7
  • X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are 2 attributes, what is the complete set of hypotheses in f?

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 7

And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm or their negations that contain only logical ands.
• Example hypotheses:
  • X1 ^ ~X3 ^ X19
  • X3 ^ ~X18
  • ~X7
  • X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are 2 attributes, what is the complete set of hypotheses in f? (H = 9)

  True      X2           ~X2
  X1        X1 ^ X2      X1 ^ ~X2
  ~X1       ~X1 ^ X2     ~X1 ^ ~X2

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 8

And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm or their negations that contain only logical ands.
• Example hypotheses:
  • X1 ^ ~X3 ^ X19
  • X3 ^ ~X18
  • ~X7
  • X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are m attributes, what is the size of the complete set of hypotheses in f?

  True      X2           ~X2
  X1        X1 ^ X2      X1 ^ ~X2
  ~X1       ~X1 ^ X2     ~X1 ^ ~X2

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 9

And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2, ..., Xm or their negations that contain only logical ands.
• Example hypotheses:
  • X1 ^ ~X3 ^ X19
  • X3 ^ ~X18
  • ~X7
  • X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are m attributes, what is the size of the complete set of hypotheses in f? (H = 3^m)

  True      X2           ~X2
  X1        X1 ^ X2      X1 ^ ~X2
  ~X1       ~X1 ^ X2     ~X1 ^ ~X2
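H = 3^m because each attribute independently appears positively, appears negated, or is left out. A small sketch under that assumed encoding ({+1, −1, 0} per attribute), with an evaluator to show how a hypothesis is applied:

```python
# Each and-literals hypothesis picks one of three options per attribute:
# required true (+1), required false (-1), or ignored (0), giving 3^m choices.
from itertools import product

def and_literal_hypotheses(m):
    """Yield each hypothesis as a tuple in {+1, -1, 0}^m."""
    return product((1, -1, 0), repeat=m)

def evaluate(hyp, x):
    """True iff x satisfies every required literal in hyp."""
    return all((x[i] == 1) if lit == 1 else (x[i] == 0)
               for i, lit in enumerate(hyp) if lit != 0)

hyps = list(and_literal_hypotheses(2))
print(len(hyps))                      # -> 9, i.e. H = 3^2
print(evaluate((1, -1), (1, 0)))      # X1 ^ ~X2 on x = (1, 0) -> True
```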

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 10

Lookup Table Machine
• f(x,h) consists of all truth tables mapping combinations of input attributes to true and false.
• Example hypothesis:

  X1 X2 X3 X4 | Y
  0  0  0  0  | 0
  0  0  0  1  | 1
  0  0  1  0  | 1
  0  0  1  1  | 0
  0  1  0  0  | 1
  0  1  0  1  | 0
  0  1  1  0  | 0
  0  1  1  1  | 1
  1  0  0  0  | 0
  1  0  0  1  | 0
  1  0  1  0  | 0
  1  0  1  1  | 1
  1  1  0  0  | 0
  1  1  0  1  | 0
  1  1  1  0  | 0
  1  1  1  1  | 0

• Question: if there are m attributes, what is the size of the complete set of hypotheses in f?
Copyright © 2001, Andrew W. Moore PAC-learning: Slide 11

Lookup Table Machine
• f(x,h) consists of all truth tables mapping combinations of input attributes to true and false.
• Example hypothesis:

  X1 X2 X3 X4 | Y
  0  0  0  0  | 0
  0  0  0  1  | 1
  0  0  1  0  | 1
  0  0  1  1  | 0
  0  1  0  0  | 1
  0  1  0  1  | 0
  0  1  1  0  | 0
  0  1  1  1  | 1
  1  0  0  0  | 0
  1  0  0  1  | 0
  1  0  1  0  | 0
  1  0  1  1  | 1
  1  1  0  0  | 0
  1  1  0  1  | 0
  1  1  1  0  | 0
  1  1  1  1  | 0

• Question: if there are m attributes, what is the size of the complete set of hypotheses in f?  (H = 2^(2^m))
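The count follows because a truth table freely assigns one of two outputs to each of the 2^m input rows. A one-line check for m = 4 (the numeric values are just an illustration):

```python
# A lookup-table hypothesis makes an independent 0/1 choice for each of the
# 2^m rows of the truth table, so H = 2^(2^m).
m = 4
num_rows = 2 ** m        # 16 rows in the truth table
H = 2 ** num_rows        # one free 0/1 choice per row
print(num_rows, H)       # -> 16 65536
```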

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 12

A Game
• We specify f, the machine.
• Nature chooses a hidden random hypothesis h*.
• Nature randomly generates R datapoints.
  • How is a datapoint generated?
    1. A vector of inputs xk = (xk1, xk2, ..., xkm) is drawn from a fixed unknown distribution D.
    2. The corresponding output is yk = f(xk, h*).
• We learn an approximation of h* by choosing some h_est for which the training set error is 0.
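A runnable sketch of this game; the and-positive-literals machine and a uniform input distribution D are assumptions chosen purely for illustration:

```python
# Nature fixes a hidden hypothesis h*, draws R inputs from D, labels them with
# h*, and the learner returns any hypothesis with zero training-set error.
import random
from itertools import combinations

def f(x, h):
    """And-positive-literals machine: h is the tuple of required attribute indices."""
    return int(all(x[i] for i in h))

def play_game(m, R, seed=0):
    random.seed(seed)
    all_h = [h for k in range(m + 1) for h in combinations(range(m), k)]
    h_star = random.choice(all_h)                                       # nature's hidden hypothesis
    draw_x = lambda: tuple(random.randint(0, 1) for _ in range(m))      # xk ~ D (uniform here)
    data = [(x, f(x, h_star)) for x in (draw_x() for _ in range(R))]    # yk = f(xk, h*)
    # The learner: return any hypothesis consistent with the training set.
    h_est = next(h for h in all_h if all(f(x, h) == y for x, y in data))
    return h_star, h_est

print(play_game(m=5, R=30))
```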

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 13

Test Error Rate
• We specify f, the machine.
• Nature chooses a hidden random hypothesis h*.
• Nature randomly generates R datapoints.
  • How is a datapoint generated?
    1. A vector of inputs xk = (xk1, xk2, ..., xkm) is drawn from a fixed unknown distribution D.
    2. The corresponding output is yk = f(xk, h*).
• We learn an approximation of h* by choosing some h_est for which the training set error is 0.
• For each hypothesis h:
  • Say h is Correctly Classified (CCd) if h has zero training set error.
  • Define TESTERR(h)
      = fraction of test points that h will classify incorrectly
      = P(h misclassifies a random test point).
  • Say h is BAD if TESTERR(h) > ε.
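When D is only available through sampling, TESTERR(h) can be estimated by Monte Carlo. A minimal sketch; the function names and the reuse of the and-positive-literals machine are assumptions for illustration:

```python
# Estimate TESTERR(h) as the fraction of freshly drawn points that h labels
# differently from the true hypothesis h*; call h BAD if it exceeds eps.
import random

def estimate_testerr(f, h, h_star, sample_x, n_test=100_000, seed=1):
    random.seed(seed)
    wrong = sum(f(x, h) != f(x, h_star) for x in (sample_x() for _ in range(n_test)))
    return wrong / n_test

# Example with the and-positive-literals machine over m = 5 uniform attributes.
m = 5
f = lambda x, h: int(all(x[i] for i in h))
sample_x = lambda: tuple(random.randint(0, 1) for _ in range(m))
err = estimate_testerr(f, h=(0, 1), h_star=(0,), sample_x=sample_x)
print(err)          # ~0.25 (they disagree exactly when X1=1 and X2=0)
print(err > 0.05)   # h is BAD at eps = 0.05
```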

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 14

Test Error Rate
(The game setup and the definitions of CCd, TESTERR, and BAD are as on the previous slide.)

For any single hypothesis h:

  P(h is CCd | h is bad)
    = P(∀k ∈ Training Set, f(xk, h) = yk | h is bad)
    ≤ (1 − ε)^R

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 15

Test Error Rate
(The game setup and the definitions of CCd, TESTERR, and BAD are as before.)

  P(h is CCd | h is bad)
    = P(∀k ∈ Training Set, f(xk, h) = yk | h is bad)
    ≤ (1 − ε)^R

  P(we learn a bad h)
    ≤ P(the set of CCd h's contains a bad h)
    = P(∃h. h is CCd ^ h is bad)
    = P((h1 is CCd ^ h1 is bad) ∨ (h2 is CCd ^ h2 is bad) ∨ … ∨ (hH is CCd ^ hH is bad))
    ≤ Σ_{i=1..H} P(hi is CCd ^ hi is bad)
    ≤ Σ_{i=1..H} P(hi is CCd | hi is bad)
    = H × P(hi is CCd | hi is bad)
    ≤ H (1 − ε)^R
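A quick numeric sketch of the bound; the values of H, ε, and R below are arbitrary illustrations:

```python
# For a single bad hypothesis (error rate above eps) the chance of surviving
# R independent training points is at most (1 - eps)^R; a union bound over all
# H hypotheses gives H * (1 - eps)^R.
H, eps, R = 2 ** 10, 0.05, 300
p_one_bad_survives = (1 - eps) ** R
p_learn_bad = min(1.0, H * p_one_bad_survives)
print(p_one_bad_survives)   # ~2.1e-07
print(p_learn_bad)          # ~2.1e-04, upper bound on P(we learn a bad h)
```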


Copyright © 2001, Andrew W. Moore PAC-learning: Slide 16

PAC Learning
• Choose R such that with probability less than δ we'll select a bad h_est (i.e. an h_est which makes mistakes more than a fraction ε of the time).
• Probably Approximately Correct.
• As we just saw, this can be achieved by choosing R such that

    P(we learn a bad h) ≤ H (1 − ε)^R ≤ δ

• i.e. R such that

    R ≥ (0.69 / ε) (log2 H + log2(1/δ))
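Since the constant 0.69 is ln 2, the bound is equivalent to R ≥ (1/ε) ln(H/δ). A small sketch of the computation (the function name and example numbers are assumptions):

```python
# Sample size sufficient to guarantee P(we learn a bad h) <= delta.
import math

def pac_sample_size(H, eps, delta):
    """R >= (0.69/eps) * (log2 H + log2(1/delta))."""
    return math.ceil((0.69 / eps) * (math.log2(H) + math.log2(1 / delta)))

# Example: and-literals machine with m = 20 attributes (H = 3^20),
# eps = 0.05, delta = 0.01.
print(pac_sample_size(3 ** 20, eps=0.05, delta=0.01))   # -> 530
```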
Copyright © 2001, Andrew W. Moore PAC-learning: Slide 17

PAC in action

  Machine                  Example hypothesis              H                   R required to PAC-learn
  And-positive-literals    X3 ^ X7 ^ X8                    2^m                 (0.69/ε) (m + log2(1/δ))
  And-literals             X3 ^ ~X7                        3^m                 (0.69/ε) ((log2 3) m + log2(1/δ))
  Lookup table             (a full truth table over        2^(2^m)             (0.69/ε) (2^m + log2(1/δ))
                           X1..X4, as on Slides 11-12)
  And-lits or And-lits     (X1 ^ X5) v (X2 ^ ~X7 ^ X8)     (3^m)^2 = 3^(2m)    (0.69/ε) ((2 log2 3) m + log2(1/δ))

Copyright © 2001, Andrew W. Moore PAC-learning: Slide 18

PAC for decision trees of depth k
• Assume m attributes.
• Hk = number of decision trees of depth k.
• H0 = 2
• Hk+1 = (# choices of root attribute) ×
         (# possible left subtrees) ×
         (# possible right subtrees)
       = m × Hk × Hk
• Write Lk = log2 Hk
  • L0 = 1
  • Lk+1 = log2 m + 2 Lk
  • So Lk = (2^k − 1)(1 + log2 m) + 1
• So to PAC-learn, we need

    R ≥ (0.69 / ε) ((2^k − 1)(1 + log2 m) + 1 + log2(1/δ))
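A sketch of this recursion and the resulting sample size (function names and the example values of m, k, ε, δ are assumptions):

```python
# L0 = 1 and L_{k+1} = log2(m) + 2*L_k, which solves to
# L_k = (2^k - 1)(1 + log2 m) + 1; then plug L_k into the PAC bound.
import math

def log2_H_decision_trees(m, k):
    L = 1.0                          # L0 = log2(H0) = log2(2)
    for _ in range(k):
        L = math.log2(m) + 2 * L     # L_{k+1} = log2 m + 2 L_k
    return L

def pac_R_decision_trees(m, k, eps, delta):
    Lk = log2_H_decision_trees(m, k)
    return math.ceil((0.69 / eps) * (Lk + math.log2(1 / delta)))

m, k = 10, 4
print(log2_H_decision_trees(m, k))                  # recursion
print((2**k - 1) * (1 + math.log2(m)) + 1)          # closed form, same value
print(pac_R_decision_trees(m, k, eps=0.05, delta=0.01))
```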
Copyright © 2001, Andrew W. Moore PAC-learning: Slide 19

What you should know
• Be able to understand every step in the math that gets you to

    P(we learn a bad h) ≤ H (1 − ε)^R

• Understand that you thus need this many records to PAC-learn a machine with H hypotheses:

    R ≥ (0.69 / ε) (log2 H + log2(1/δ))

• Understand examples of deducing H for various machines.
Copyright © 2001, Andrew W. Moore PAC-learning: Slide 20

