Sie sind auf Seite 1von 21

Review of Le

ture 4 • Noisy targets


y = f (x) −→ y ∼ P (y | x)
• Error measures
- User-spe ied e h(x), f (x)

UNKNOWN TARGET DISTRIBUTION
P(y | x)
<+1 you
8
> target function f: X Y plus noise

f
−1 intruder
>
: UNKNOWN
TRAINING EXAMPLES INPUT
( x1 , y1 ), ... , ( xN , yN ) DISTRIBUTION

- In-sample: N
P (x)

1 X
e h(xn), f (xn)

E (h) =
in
N n=1 - (x1, y1), · · · , (xN , yN ) generated by

P (x, y) = P (x)P (y|x)


- Out-of-sample
E (h) = Ex e h(x), f (x) - E (h) is now Ex,y e h(x), y
   
out out
Learning From Data
Yaser S. Abu-Mostafa
California Institute of Te hnology

Le ture 5: Training versus Testing

Sponsored by Calte h's Provost O e, E&AS Division, and IST • Tuesday, April 17, 2012
Outline

• From training to testing

• Illustrative examples

• Key notion: break point

• Puzzle

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 2/20
The nal exam

Testing:

P [ |E − E | > ǫ ] ≤ 2 e
in out
−2ǫ2N

Training:

P [ |E − E | > ǫ ] ≤ 2M e
in out
−2ǫ2N

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 3/20
Where did the M ome from?

The B ad events Bm are

 |E (hm) − E (hm)| > ǫ


in out
B1 B2
The union bound:

P[B1 or B2 or · · · or BM ]

≤P [B 1 ] + P[B2] + · · · + P[BM ] B3
M terms
| {z }
no overlaps:
AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 4/20
Can we improve on M?

up

Yes, bad events are very overlapping!

∆E out : hange in +1 and −1 areas


1
∆E in : hange in labels of data points

+1

|E (h1) − E (h1)| ≈ |E (h2) − E (h2)|


in out in out

down

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 5/20
What an we repla e M with?

Instead of the whole input spa e,

we onsider a nite set of input points,

and ount the number of di hotomies

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 6/20
Di hotomies: mini-hypotheses

A hypothesis h : X → {−1, +1}

A di hotomy h : {x1, x2, · · · , xN } → {−1, +1}

Number of hypotheses |H| an be innite

Number of di hotomies |H(x1, x2, · · · , xN )| is at most 2N

Candidate for repla ing M

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 7/20
The growth fun tion

The growth fun tion ounts the most di hotomies on any N points

mH(N )= max |H(x1, · · · , xN )|


x1,··· ,xN ∈X

The growth fun tion satises:

mH(N ) ≤ 2N

Let's apply the denition.

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 8/20
ements PSfrag repla ements

Applying denition - per eptrons


0 0
0.5
1
0.5
1
mH(N )
1.5 1.5 PSfrag repla ements
2 2
2.5 2.5 0
3 3 0.5
3.5 3.5 1
4 4 1.5
1 1 2
1.2 1.2 2.5
1.4 1.4 3
1.6 1.6 3.5
1.8 1.8 4
2 2 0.5
2.2 2.2 1
2.4 2.4 1.5
2.6 2.6 2
2.8 2.8 2.5
3 3 3

N =3 N =3 N =4

mH(3) = 8 mH(4) = 14

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 9/20
Outline

• From training to testing

• Illustrative examples

• Key notion: break point

• Puzzle

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 10/20
0.04
Example 1: positive rays
0.06

0.08 h(x) = −1 h(x) = +1


a
0.1 x1 x2 x3 ... xN

H is set of h : R → {−1, +1}

h(x) = sign(x − a)

mH(N ) = N + 1
AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 11/20
0.06 Example 2: positive intervals

0.08 h(x) = −1 h(x) = +1 h(x) = −1


0.1 x1 x2 x3 ... xN

H is set of h : R → {−1, +1}

Pla e interval ends in two of N +1 spots

 
N +1
mH(N ) = 2 +1 = 12 N 2 + 12 N + 1

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 12/20
Example 3: onvex sets
2 up +
H is set of h : R → {−1, +1} −
+
h(x) = +1 is onvex +

mH(N ) = 2N −


The N points are `shattered' by onvex sets

+
− +
bottom

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 13/20
The 3 growth fun tions

•H is positive rays:

mH(N ) = N + 1

•H is positive intervals:

mH(N ) = 12 N 2 + 12 N + 1

•H is onvex sets:

mH(N ) = 2N

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 14/20
Ba k to the big pi ture

Remember this inequality?

P [ |E − E | > ǫ ] ≤ 2M e
in out
−2ǫ2N

What happens if mH(N ) repla es M?

mH(N ) polynomial =⇒ Good!

Just prove that mH(N ) is polynomial?

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 15/20
Outline

• From training to testing

• Illustrative examples

• Key notion: break point

• Puzzle

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 16/20
1.5

2
Break point of H
2.5

Denition: 3.5

4
If no data set of size k an be shattered by H,
0.5
then k is a break point for H
1

1.5
mH(k) < 2k
2

2.5
For 2D per eptrons, k=4
3

A bigger data set annot be shattered either

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 17/20
Break point - the 3 examples

• Positive rays mH(N ) = N + 1


break point k= 2 • •

• Positive intervals mH(N ) = 12 N 2 + 12 N + 1


break point k= 3 • • •

• Convex sets mH(N ) = 2N


break point k =`∞'

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 18/20
Main result

No break point =⇒ mH(N ) = 2N

Any break point =⇒ mH(N ) is polynomial in N

AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 19/20
Puzzle

x1 x2 x3
◦ ◦ ◦
◦ ◦ •
◦ • ◦
• ◦ ◦
• ◦ •
AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 5 20/20