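# Review of Lecture 4

• Noisy targets

$$y = f(\mathbf{x}) \;\longrightarrow\; y \sim P(y \mid \mathbf{x})$$

• Error measures

- User-specified $e\big(h(\mathbf{x}), f(\mathbf{x})\big)$

(Figure: the learning diagram, with UNKNOWN TARGET DISTRIBUTION $P(y \mid \mathbf{x})$ (target function $f: \mathcal{X} \to \mathcal{Y}$ plus noise), UNKNOWN INPUT DISTRIBUTION $P(\mathbf{x})$, and TRAINING EXAMPLES $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$; and the fingerprint example $f = +1$ for you, $-1$ for intruder.)

- In-sample:

$$E_{\text{in}}(h) = \frac{1}{N} \sum_{n=1}^{N} e\big(h(\mathbf{x}_n), f(\mathbf{x}_n)\big)$$

- Out-of-sample:

$$E_{\text{out}}(h) = \mathbb{E}_{\mathbf{x}}\big[e\big(h(\mathbf{x}), f(\mathbf{x})\big)\big]$$

- $(\mathbf{x}_1, y_1), \cdots, (\mathbf{x}_N, y_N)$ generated by $P(\mathbf{x}, y) = P(\mathbf{x})P(y \mid \mathbf{x})$

- $E_{\text{out}}(h)$ is now $\mathbb{E}_{\mathbf{x}, y}\big[e\big(h(\mathbf{x}), y\big)\big]$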
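As a minimal sketch of $E_{\text{in}}$ with a noisy target, the Python below assumes a binary error ($e = 1$ if $h(x) \ne y$, else $0$) and a hypothetical one-dimensional target distribution with $P(y = +1 \mid x) = 0.9$ for $x > 0$ and $0.1$ otherwise. The names `noisy_label`, `in_sample_error`, and the hypothesis `h` are illustrative, not from the lecture.

```python
import random


def binary_error(h_x: int, y: int) -> int:
    """Pointwise binary error e(h(x), y): 1 if the labels disagree, else 0."""
    return int(h_x != y)


def noisy_label(x: float, rng: random.Random) -> int:
    """Hypothetical noisy target: y ~ P(y | x), with P(y = +1 | x) = 0.9 if x > 0, else 0.1."""
    p_plus = 0.9 if x > 0 else 0.1
    return +1 if rng.random() < p_plus else -1


def in_sample_error(h, data) -> float:
    """E_in(h) = (1/N) * sum_n e(h(x_n), y_n) over the training examples."""
    return sum(binary_error(h(x), y) for x, y in data) / len(data)


rng = random.Random(0)
# Inputs x_n ~ P(x) (uniform on [-1, 1] here), then labels y_n ~ P(y | x_n).
xs = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
data = [(x, noisy_label(x, rng)) for x in xs]


def h(x: float) -> int:
    """A fixed hypothesis: +1 for positive x, -1 otherwise."""
    return +1 if x > 0 else -1


E_in = in_sample_error(h, data)
print(f"E_in(h) = {E_in:.3f}")  # close to 0.1, the noise level, for this h
```

Under these assumptions even this best possible $h$ has $E_{\text{out}} = 0.1$, because the noise lives in $P(y \mid x)$ itself.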
Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology

## Lecture 5: Training versus Testing

Sponsored by Caltech's Provost Office, E&AS Division, and IST • Tuesday, April 17, 2012
## Outline

• From training to testing

• Illustrative examples

• Key notion: break point

• Puzzle

AM
## The final exam

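Testing:

$$\mathbb{P}\big[\,|E_{\text{in}} - E_{\text{out}}| > \epsilon\,\big] \le 2e^{-2\epsilon^2 N}$$

Training:

$$\mathbb{P}\big[\,|E_{\text{in}} - E_{\text{out}}| > \epsilon\,\big] \le 2Me^{-2\epsilon^2 N}$$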
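The two inequalities differ only by the factor M. As a rough sketch, the Python below evaluates both right-hand sides and, for one fixed hypothesis, estimates the left-hand side by simulation. It assumes a hypothetical hypothesis with binary errors and $E_{\text{out}} = \mu = 0.5$, so $E_{\text{in}}$ is the error fraction over N independent points; the function names are illustrative.

```python
import math
import random


def hoeffding_bound(eps: float, N: int, M: int = 1) -> float:
    """Right-hand side 2 * M * exp(-2 * eps^2 * N); M = 1 gives the testing bound."""
    return 2 * M * math.exp(-2 * eps**2 * N)


def estimate_deviation_prob(mu: float, eps: float, N: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P[|E_in - E_out| > eps] for one fixed hypothesis.

    Each point is misclassified independently with probability mu = E_out,
    so E_in is the fraction of errors among N points.
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        e_in = sum(rng.random() < mu for _ in range(N)) / N
        if abs(e_in - mu) > eps:
            bad += 1
    return bad / trials


eps, N = 0.1, 100
print("testing bound :", hoeffding_bound(eps, N))           # 2 e^{-2}, about 0.271
print("training bound:", hoeffding_bound(eps, N, M=1000))  # about 270.7, vacuous
print("simulated     :", estimate_deviation_prob(0.5, eps, N, trials=2000))
```

With M = 1000 hypotheses the training bound exceeds 1 and says nothing, while the single-hypothesis probability stays well under the testing bound.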

## Where did the M come from?

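The bad events $\mathcal{B}_m$ are

$$|E_{\text{in}}(h_m) - E_{\text{out}}(h_m)| > \epsilon$$

The union bound:

$$\mathbb{P}[\mathcal{B}_1 \text{ or } \mathcal{B}_2 \text{ or } \cdots \text{ or } \mathcal{B}_M] \le \underbrace{\mathbb{P}[\mathcal{B}_1] + \mathbb{P}[\mathcal{B}_2] + \cdots + \mathbb{P}[\mathcal{B}_M]}_{\text{no overlaps: } M \text{ terms}}$$

(Figure: Venn diagram of the events $\mathcal{B}_1$, $\mathcal{B}_2$, $\mathcal{B}_3$.)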
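The union bound is tight only when the bad events do not overlap. The sketch below assumes M hypothetical events driven by one shared random value, so they overlap heavily, and compares a Monte Carlo estimate of $\mathbb{P}[\mathcal{B}_1 \text{ or } \cdots \text{ or } \mathcal{B}_M]$ with the sum of the individual probabilities. The event definition is an illustrative assumption.

```python
import random


def union_vs_sum(M: int, trials: int = 20_000, seed: int = 1):
    """Compare P[B_1 or ... or B_M] with sum_m P[B_m] for heavily overlapping events.

    Hypothetical setup: one shared u ~ Uniform(0, 1) per trial, and event B_m
    occurs when u < 0.05 * (1 + 0.01 * m); the events are nested, so they
    overlap almost completely.
    """
    rng = random.Random(seed)
    thresholds = [0.05 * (1 + 0.01 * m) for m in range(M)]
    any_count = 0
    event_counts = [0] * M
    for _ in range(trials):
        u = rng.random()
        hits = [u < t for t in thresholds]
        any_count += any(hits)
        for m, hit in enumerate(hits):
            event_counts[m] += hit
    p_union = any_count / trials
    sum_of_probs = sum(c / trials for c in event_counts)
    return p_union, sum_of_probs


p_union, sum_of_probs = union_vs_sum(M=10)
print(f"P[union]      ~ {p_union:.3f}")
print(f"sum of P[B_m] ~ {sum_of_probs:.3f}")  # roughly 10 times larger
```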
## Can we improve on M?

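(Figure: a perceptron boundary shifted slightly, giving two nearly identical hypotheses $h_1$ and $h_2$.)

$\Delta E_{\text{out}}$: change in +1 and −1 areas

$\Delta E_{\text{in}}$: change in labels of data points

$$|E_{\text{in}}(h_1) - E_{\text{out}}(h_1)| \approx |E_{\text{in}}(h_2) - E_{\text{out}}(h_2)|$$

so the bad events for $h_1$ and $h_2$ overlap heavily, and the union bound overcounts.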
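To see why $\Delta E_{\text{in}}$ is small for nearby hypotheses, the sketch below compares two hypothetical 2D perceptrons whose weights differ slightly and counts how many of N random data points change label. The weights and the sample are assumptions for illustration.

```python
import random


def perceptron(w):
    """Return h(x) = sign(w0 + w1*x1 + w2*x2), mapping ties to -1."""
    w0, w1, w2 = w
    return lambda x: +1 if w0 + w1 * x[0] + w2 * x[1] > 0 else -1


rng = random.Random(2)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]

h1 = perceptron((0.0, 1.0, 1.00))
h2 = perceptron((0.0, 1.0, 1.05))  # slightly rotated boundary

# Only points in the thin wedge between the two boundaries change label.
changed = sum(h1(x) != h2(x) for x in points)
print(f"{changed} of {len(points)} labels differ")  # a small fraction
```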

## What can we replace M with?

Instead of the whole input space, we consider a finite set of input points, and count the number of dichotomies.

## Dichotomies: mini-hypotheses

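A dichotomy is a hypothesis restricted to the N data points, $h: \{\mathbf{x}_1, \dots, \mathbf{x}_N\} \to \{-1, +1\}$. The number of hypotheses $|\mathcal{H}|$ can be infinite, but the number of dichotomies $|\mathcal{H}(\mathbf{x}_1, \dots, \mathbf{x}_N)|$ is at most $2^N$: a candidate for replacing M.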
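A minimal sketch of the idea, assuming hypotheses are plain Python functions: restricting each hypothesis to the N points gives a tuple of ±1 labels, and the set of distinct tuples is $\mathcal{H}(\mathbf{x}_1, \dots, \mathbf{x}_N)$. The sample hypothesis set (positive rays at a few hypothetical thresholds) is illustrative only.

```python
def dichotomies(hypotheses, points):
    """Return the set of distinct label tuples (h(x_1), ..., h(x_N)) over h in hypotheses."""
    return {tuple(h(x) for x in points) for h in hypotheses}


def positive_ray(a):
    """h(x) = sign(x - a), with x == a mapped to -1."""
    return lambda x: +1 if x > a else -1


points = [0.5, 1.5, 2.5]
# Many hypotheses (one per threshold a), but few distinct dichotomies.
hypotheses = [positive_ray(a / 10) for a in range(-10, 40)]
dichos = dichotomies(hypotheses, points)
print(len(hypotheses), "hypotheses ->", len(dichos), "dichotomies")  # 50 hypotheses -> 4 dichotomies
```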

## The growth function

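The growth function counts the most dichotomies on any N points:

$$m_{\mathcal{H}}(N) = \max_{\mathbf{x}_1, \dots, \mathbf{x}_N \in \mathcal{X}} |\mathcal{H}(\mathbf{x}_1, \dots, \mathbf{x}_N)|$$

Since each of the N points gets one of two labels,

$$m_{\mathcal{H}}(N) \le 2^N$$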

## Let's apply the denition.

## Applying the definition: perceptrons

(Figure: dichotomies of 2D perceptrons in three panels of points: N = 3, N = 3, N = 4.)

$$m_{\mathcal{H}}(3) = 8 \qquad m_{\mathcal{H}}(4) = 14$$
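The counts 8 and 14 can be checked by brute force for particular point sets. The sketch below is an illustration under assumptions rather than a general algorithm: it sweeps a small, hypothetical grid of weight vectors $(w_0, w_1, w_2)$ and collects the distinct dichotomies on three points in general position and on the four corners of a square. This grid happens to be fine enough for these two point sets.

```python
from itertools import product


def perceptron_dichotomies(points, weight_grid):
    """Collect distinct dichotomies sign(w0 + w1*x1 + w2*x2) over a grid of weights.

    weight_grid is (bias values, slope values). Ties (activation exactly 0)
    map to -1, which is still a linearly separable labeling.
    """
    biases, slopes = weight_grid
    found = set()
    for w0, w1, w2 in product(biases, slopes, slopes):
        labels = tuple(+1 if w0 + w1 * x1 + w2 * x2 > 0 else -1 for x1, x2 in points)
        found.add(labels)
    return found


# Hypothetical grid: bias in {-1.5, -1, ..., 1.5}, slopes in {-1, 0, 1}.
grid = ([k / 2 for k in range(-3, 4)], [-1, 0, 1])

three_points = [(0, 0), (1, 0), (0, 1)]
square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

print(len(perceptron_dichotomies(three_points, grid)))  # 8 = 2^3: shattered
print(len(perceptron_dichotomies(square, grid)))        # 14: the two XOR labelings are missing
```

A grid search can only under-count, so it certifies lower bounds. No line produces the two XOR labelings of the square, so 14 is also the most possible for this particular configuration; the slide's $m_{\mathcal{H}}(4) = 14$ is the maximum over all 4-point sets.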

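## Example 1: positive rays

(Figure: points $x_1, x_2, x_3, \dots, x_N$ on the real line, with $h(x) = -1$ to the left of the threshold $a$ and $h(x) = +1$ to the right.)

$\mathcal{H}$ is the set of $h: \mathbb{R} \to \{-1, +1\}$

$$h(x) = \operatorname{sign}(x - a)$$

$$m_{\mathcal{H}}(N) = N + 1$$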
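The count N + 1 comes from where the threshold $a$ falls relative to the sorted points: below all of them, between two neighbors, or above all of them. The sketch below assumes distinct points, tries one threshold per gap (any $a$ in the same gap gives the same labeling), and confirms the count for small N; the helper name is hypothetical.

```python
def positive_ray_dichotomies(points):
    """All dichotomies of h(x) = sign(x - a) on distinct points, one threshold per gap."""
    xs = sorted(points)
    # Candidate thresholds: below the smallest point, midway between neighbors, above the largest.
    thresholds = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(+1 if x > a else -1 for x in xs) for a in thresholds}


for N in range(1, 8):
    points = list(range(N))  # any N distinct reals give the same count
    assert len(positive_ray_dichotomies(points)) == N + 1
print("m_H(N) = N + 1 confirmed for N = 1..7")
```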
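## Example 2: positive intervals

(Figure: points $x_1, x_2, x_3, \dots, x_N$ on the real line, with $h(x) = +1$ inside an interval and $h(x) = -1$ on both sides of it.)

Place the interval ends in two of the N + 1 spots between and around the points; the extra 1 is the all −1 dichotomy, when both ends fall in the same spot:

$$m_{\mathcal{H}}(N) = \binom{N+1}{2} + 1 = \tfrac{1}{2}N^2 + \tfrac{1}{2}N + 1$$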
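The same gap argument can be checked for positive intervals. The sketch below assumes distinct points, enumerates every pair of gap positions for the two ends (including both ends in the same gap), and compares the count with the formula; the helper name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def positive_interval_dichotomies(points):
    """All dichotomies of h(x) = +1 for l < x < r, -1 otherwise, on distinct points."""
    xs = sorted(points)
    # Gap positions: below the smallest point, between neighbors, above the largest.
    gaps = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]
    # Pairs (l, r) with l <= r; l == r gives the all -1 dichotomy.
    return {
        tuple(+1 if l < x < r else -1 for x in xs)
        for l, r in combinations_with_replacement(gaps, 2)
    }


for N in range(1, 8):
    count = len(positive_interval_dichotomies(list(range(N))))
    assert count == comb(N + 1, 2) + 1 == (N * N + N) // 2 + 1
print("m_H(N) = C(N+1, 2) + 1 confirmed for N = 1..7")
```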

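## Example 3: convex sets

$\mathcal{H}$ is the set of $h: \mathbb{R}^2 \to \{-1, +1\}$ such that the region $\{\mathbf{x} : h(\mathbf{x}) = +1\}$ is convex

$$m_{\mathcal{H}}(N) = 2^N$$

The N points are 'shattered' by convex sets: place them on a circle, and for any labeling the convex hull of the + points contains none of the − points.

(Figure: points on a circle labeled + and −, with a convex region enclosing exactly the + points.)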
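One way to check the shattering claim numerically, assuming the N points sit on the unit circle: for any labeling, let $h$ be +1 exactly on the convex hull of the + points. A − point $\mathbf{p}$ is outside that hull because $\mathbf{p} \cdot \mathbf{q} < 1 = \mathbf{p} \cdot \mathbf{p}$ for every other point $\mathbf{q}$ on the circle, so the tangent line at $\mathbf{p}$ separates it. The sketch below checks this certificate for all $2^N$ labelings; the names are illustrative.

```python
import math
from itertools import product


def circle_points(N):
    """N evenly spaced points on the unit circle."""
    return [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)) for k in range(N)]


def hull_excludes_negatives(points, labels, tol=1e-12):
    """True if every -1 point p lies outside the convex hull of the +1 points.

    Certificate: the linear function x -> p . x equals 1 at p (a unit vector)
    but is below 1 on every +1 point, hence below 1 on their whole convex hull.
    """
    positives = [q for q, y in zip(points, labels) if y == +1]
    for p, y in zip(points, labels):
        if y == -1 and any(p[0] * q[0] + p[1] * q[1] >= 1 - tol for q in positives):
            return False
    return True


N = 8
pts = circle_points(N)
realized = sum(hull_excludes_negatives(pts, labels) for labels in product([-1, +1], repeat=N))
print(realized, "of", 2**N, "labelings realized by convex sets")  # 256 of 256
```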

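## The 3 growth functions

• $\mathcal{H}$ is positive rays:

$$m_{\mathcal{H}}(N) = N + 1$$

• $\mathcal{H}$ is positive intervals:

$$m_{\mathcal{H}}(N) = \tfrac{1}{2}N^2 + \tfrac{1}{2}N + 1$$

• $\mathcal{H}$ is convex sets:

$$m_{\mathcal{H}}(N) = 2^N$$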

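## Back to the big picture

Remember this inequality?

$$\mathbb{P}\big[\,|E_{\text{in}} - E_{\text{out}}| > \epsilon\,\big] \le 2Me^{-2\epsilon^2 N}$$

If $m_{\mathcal{H}}(N)$ replaces M and is polynomial in N, the factor $e^{-2\epsilon^2 N}$ eventually dominates it and the right-hand side goes to zero as N grows.

Just prove that $m_{\mathcal{H}}(N)$ is polynomial?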
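A quick numerical look at why a polynomial growth function would help, under the assumption (still to be justified) that $m_{\mathcal{H}}(N)$ can simply replace M: with $\epsilon = 0.1$, $2\,m_{\mathcal{H}}(N)\,e^{-2\epsilon^2 N}$ eventually collapses for the positive-interval growth function but blows up for $2^N$. The sketch works with natural logarithms to avoid floating-point overflow; a negative value means the bound is below 1.

```python
import math


def log_bound(log_m_H, N, eps=0.1):
    """Natural log of 2 * m_H(N) * exp(-2 eps^2 N), computed in log space to avoid overflow."""
    return math.log(2) + log_m_H(N) - 2 * eps**2 * N


def log_m_intervals(N):
    return math.log(N * N / 2 + N / 2 + 1)  # positive intervals: polynomial


def log_m_convex(N):
    return N * math.log(2)  # convex sets: 2^N


for N in (100, 1_000, 10_000):
    print(N, round(log_bound(log_m_intervals, N), 1), round(log_bound(log_m_convex, N), 1))
```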

## Break point of $\mathcal{H}$

Definition:

If no data set of size k can be shattered by $\mathcal{H}$, then k is a break point for $\mathcal{H}$:

$$m_{\mathcal{H}}(k) < 2^k$$

For 2D perceptrons, k = 4.

A bigger data set cannot be shattered either.

## Break point: the 3 examples

• Positive rays: $m_{\mathcal{H}}(N) = N + 1$

break point k = 2

• Positive intervals: $m_{\mathcal{H}}(N) = \tfrac{1}{2}N^2 + \tfrac{1}{2}N + 1$

break point k = 3

• Convex sets: $m_{\mathcal{H}}(N) = 2^N$

break point k = '∞'
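Given a growth function, the smallest break point is the first k with $m_{\mathcal{H}}(k) < 2^k$. The sketch below uses hypothetical names and an assumed search limit; it recovers k = 2 and k = 3 for the first two examples and finds none for convex sets, matching the '∞' entry.

```python
def smallest_break_point(m_H, limit=64):
    """Return the smallest k <= limit with m_H(k) < 2^k, or None if there is none."""
    for k in range(1, limit + 1):
        if m_H(k) < 2**k:
            return k
    return None


growth = {
    "positive rays": lambda N: N + 1,
    "positive intervals": lambda N: (N * N + N) // 2 + 1,
    "convex sets": lambda N: 2**N,
}
for name, m_H in growth.items():
    print(name, smallest_break_point(m_H))
```

The search limit is only a practical cutoff: returning None means no break point up to `limit`, which agrees with '∞' for convex sets because $m_{\mathcal{H}}(N) = 2^N$ for every N.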

## Main result

Any break point $\Longrightarrow$ $m_{\mathcal{H}}(N)$ is polynomial in N

## Puzzle

| $\mathbf{x}_1$ | $\mathbf{x}_2$ | $\mathbf{x}_3$ |
|---|---|---|
| ◦ | ◦ | ◦ |
| ◦ | ◦ | • |
| ◦ | • | ◦ |
| • | ◦ | ◦ |
| • | ◦ | • |
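Assuming the puzzle asks for as many dichotomies as possible on $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ with break point k = 2 (so no two points may be shattered), the sketch below lists which pairs of points a set of rows shatters, writing ◦ as 0 and • as 1. The encoding and the helper name are illustrative.

```python
from itertools import combinations


def shattered_pairs(rows):
    """Return the column pairs (i, j) on which the rows show all four patterns."""
    n = len(rows[0])
    return [
        (i, j)
        for i, j in combinations(range(n), 2)
        if len({(r[i], r[j]) for r in rows}) == 4
    ]


# The puzzle rows, with ◦ -> 0 and • -> 1.
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 0, 1)]
print(shattered_pairs(rows[:4]))  # []: the first four rows shatter no pair
print(shattered_pairs(rows))      # [(0, 2)]: adding the fifth row shatters x1 and x3
```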
