
Review of Lecture 17

Occam's Razor
The simplest model that fits the data is also the most plausible.

complexity of h <-> complexity of H
unlikely event <-> significant if it happens

Sampling bias
[Figure: P(x) for training versus testing]

Data snooping
[Figure: Cumulative Profit % versus Day, with snooping and with no snooping]
Learning From Data


Yaser S. Abu-Mostafa

California Institute of Technology

Lecture 18: Epilogue

Sponsored by Caltech's Provost Office, E&AS Division, and IST

Thursday, May 31, 2012

Outline

The map of machine learning

Bayesian learning

Aggregation methods

Acknowledgments

Creator: Yaser Abu-Mostafa - LFD Lecture 18

It's a jungle out there

semi-supervised learning
stochastic gradient descent
overfitting
Gaussian processes
distribution-free
collaborative filtering
deterministic noise
linear regression
VC dimension
nonlinear transformation
decision trees
data snooping
sampling bias
Q learning
SVM
learning curves
mixture of experts
neural networks
no free lunch
training versus testing
RBF
noisy targets
Bayesian prior
active learning
linear models
bias-variance tradeoff
weak learners
ordinal regression
logistic regression
data contamination
cross validation
ensemble learning
types of learning
exploration versus exploitation
error measures
is learning feasible?
clustering
regularization
kernel methods
hidden Markov models
perceptrons
graphical models
soft-order constraint
weight decay
Occam's razor
Boltzmann machines

The map

THEORY
  VC
  bias-variance
  complexity
  Bayesian

TECHNIQUES
  models: linear, neural networks, SVM, nearest neighbors, RBF, Gaussian processes, SVD, graphical models
  methods: regularization, validation, aggregation, input processing

PARADIGMS
  supervised
  unsupervised
  reinforcement
  active
  online

Outline

The map of machine learning

Bayesian learning

Aggregation methods

Acknowledgments

Probabilistic approach

Extend probabilistic role to all components

$P(\mathcal{D} \mid h = f)$ decides which $h$ (likelihood)

How about $P(h = f \mid \mathcal{D})$?

[Figure: the learning diagram. UNKNOWN TARGET DISTRIBUTION $P(y \mid \mathbf{x})$ (target function $f: \mathcal{X} \to \mathcal{Y}$ plus noise); UNKNOWN INPUT DISTRIBUTION $P(\mathbf{x})$ generating $\mathbf{x}_1, \ldots, \mathbf{x}_N$; DATA SET $\mathcal{D} = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; LEARNING ALGORITHM; HYPOTHESIS SET $\mathcal{H}$; FINAL HYPOTHESIS $g: \mathcal{X} \to \mathcal{Y}$ with $g(\mathbf{x}) \approx f(\mathbf{x})$]

The prior
$P(h = f \mid \mathcal{D})$ requires an additional probability distribution:

$$P(h = f \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid h = f)\, P(h = f)}{P(\mathcal{D})} \;\propto\; P(\mathcal{D} \mid h = f)\, P(h = f)$$

$P(h = f)$ is the prior

$P(h = f \mid \mathcal{D})$ is the posterior

Given the prior, we have the full distribution
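
As a minimal sketch of the rule above, assume a finite hypothesis set so that the normalizing term P(D) is a plain sum. The function name `posterior` and the three hypotheses with their numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite hypothesis set.

    prior[h]      = P(h = f)
    likelihood[h] = P(D | h = f)
    Returns P(h = f | D) for every h.
    """
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D)
    if evidence == 0:
        raise ValueError("the data has zero probability under every hypothesis")
    return {h: u / evidence for h, u in unnormalized.items()}


# Hypothetical numbers, for illustration only.
prior = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihood = {"h1": 0.10, "h2": 0.40, "h3": 0.05}
post = posterior(prior, likelihood)
print(post)
```

Here h1 has the largest prior, but h2 explains the data better and ends up with the largest posterior.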


Example of a prior
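Consider a perceptron: $h$ is determined by $\mathbf{w} = w_0, w_1, \cdots, w_d$

A possible prior on $\mathbf{w}$: Each $w_i$ is independent, uniform over $[-1, 1]$

This determines the prior over $h$: $P(h = f)$

Given $\mathcal{D}$, we can compute $P(\mathcal{D} \mid h = f)$

Putting them together, we get $P(h = f \mid \mathcal{D})$

$$P(h = f \mid \mathcal{D}) \;\propto\; P(h = f)\, P(\mathcal{D} \mid h = f)$$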
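
The recipe on this slide can be sketched numerically. The version below draws weight vectors from the uniform prior and weights each draw by its likelihood. The noise model (each label flipped independently with probability `flip`), the function names, and the toy data are all assumptions for illustration; the slide does not specify them.

```python
import random


def sample_prior(d, rng):
    """Draw w = (w0, ..., wd), each w_i independent and uniform over [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(d + 1)]


def perceptron(w, x):
    """h(x) = sign(w0 + w1*x1 + ... + wd*xd), with sign(0) taken as +1."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s >= 0 else -1


def likelihood(w, data, flip=0.1):
    """P(D | h = f) under an ASSUMED noise model: each label is flipped
    independently with probability `flip`."""
    p = 1.0
    for x, y in data:
        p *= (1.0 - flip) if perceptron(w, x) == y else flip
    return p


def posterior_samples(data, d, n_samples=5000, seed=0):
    """Approximate P(h = f | D) by weighting prior samples by their likelihood."""
    rng = random.Random(seed)
    ws = [sample_prior(d, rng) for _ in range(n_samples)]
    weights = [likelihood(w, data) for w in ws]
    total = sum(weights)
    return ws, [wt / total for wt in weights]


# Hypothetical toy data in d = 2 dimensions: the label is the sign of x1.
data = [((0.5, 0.2), 1), ((0.9, -0.4), 1), ((-0.7, 0.1), -1), ((-0.3, -0.8), -1)]
ws, post = posterior_samples(data, d=2)
best = ws[max(range(len(ws)), key=post.__getitem__)]
print(best)
```

Sampling stands in for the exact integral over w, which is rarely available in closed form.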

A prior is an assumption
Even the most neutral prior:

$x$ is unknown  ≡  $x$ is random
[Figure: uniform $P(x)$ plotted against $x$]

The true equivalent would be:

$x$ is unknown  ≡  $x$ is random
[Figure: $\delta(x - a)$ plotted against $x$]
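
One way to see the difference is to compare what each description commits to. The interval $[-1, 1]$ for the uniform density is an assumption here, borrowed from the perceptron example.

```latex
\begin{align*}
x \sim \text{Uniform}[-1, 1] &: \quad \mathbb{E}[x] = 0, \quad \operatorname{Var}[x] = \tfrac{1}{3} \\
P(x) = \delta(x - a) &: \quad \mathbb{E}[x] = a, \quad \operatorname{Var}[x] = 0
\end{align*}
```

The uniform prior asserts a specific mean and spread for $x$. The delta function only says that $x$ has a fixed value $a$, which remains unknown.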

If we knew the prior


... we could compute $P(h = f \mid \mathcal{D})$ for every $h \in \mathcal{H}$

we can find the most probable $h$ given the data

we can derive $\mathbb{E}(h(\mathbf{x}))$ for every $\mathbf{x}$

we can derive the error bar for every $\mathbf{x}$

we can derive everything in a principled way
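
A minimal sketch of these derived quantities, assuming a finite hypothesis set and a posterior that has already been computed. The hypothesis names and posterior values are hypothetical, and the posterior standard deviation is only one possible reading of "error bar".

```python
import math


def most_probable(posterior):
    """The hypothesis with the largest P(h = f | D)."""
    return max(posterior, key=posterior.get)


def predictive_mean(posterior, hypotheses, x):
    """E(h(x)): posterior-weighted average of h(x) over the hypothesis set."""
    return sum(p * hypotheses[h](x) for h, p in posterior.items())


def error_bar(posterior, hypotheses, x):
    """Posterior standard deviation of h(x), one possible 'error bar' at x."""
    mean = predictive_mean(posterior, hypotheses, x)
    var = sum(p * (hypotheses[h](x) - mean) ** 2 for h, p in posterior.items())
    return math.sqrt(var)


# Hypothetical hypothesis set and posterior, for illustration only.
hypotheses = {
    "constant": lambda x: 1.0,
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
}
posterior = {"constant": 0.2, "linear": 0.5, "quadratic": 0.3}

x = 2.0
best = most_probable(posterior)
mean = predictive_mean(posterior, hypotheses, x)
bar = error_bar(posterior, hypotheses, x)
print(best, mean, bar)
```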


When is Bayesian learning justified?

1. The prior is valid: trumps all other methods

2. The prior is irrelevant: just a computational catalyst


Outline

The map of machine learning

Bayesian learning

Aggregation methods

Acknowledgments

What is aggregation?
Combining different solutions $h_1, h_2, \cdots, h_T$ that were trained on $\mathcal{D}$:

Regression: take an average

Classification: take a vote

a.k.a. ensemble learning and boosting
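
The two combination rules can be sketched as follows, assuming each trained hypothesis is a plain Python callable. The function names and the example hypotheses are hypothetical.

```python
from collections import Counter


def aggregate_regression(hypotheses, x):
    """Regression: average the outputs h_1(x), ..., h_T(x)."""
    outputs = [h(x) for h in hypotheses]
    return sum(outputs) / len(outputs)


def aggregate_classification(hypotheses, x):
    """Classification: return the label that gets the most votes.
    Ties go to the label that was voted for first."""
    votes = Counter(h(x) for h in hypotheses)
    return votes.most_common(1)[0][0]


# Hypothetical trained hypotheses, for illustration only.
regressors = [lambda x: 2 * x, lambda x: 2 * x + 1, lambda x: 2 * x - 4]
classifiers = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 1 else -1,
    lambda x: 1 if x > -1 else -1,
]

avg = aggregate_regression(regressors, 3)        # outputs are 6, 7, 2
vote = aggregate_classification(classifiers, 0.5)  # votes are +1, -1, +1
print(avg, vote)
```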

Different from 2-layer learning

In a 2-layer model, all units learn jointly:
[Figure: training data, Learning Algorithm]

In aggregation, they learn independently then get combined:
[Figure: training data, Learning Algorithm]

Two types of aggregation


1. After the fact: combines existing solutions
   Example. Netflix teams merging ("blending")

2. Before the fact: creates solutions to be combined
   Example. Bagging - resampling $\mathcal{D}$

[Figure: training data, Learning Algorithm]
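
A minimal sketch of bagging, assuming bootstrap resampling (drawing N points from D with replacement) and averaging for regression. The learner `fit_mean` is a deliberately trivial stand-in for any learning algorithm, and all names and data are hypothetical.

```python
import random


def bootstrap(data, rng):
    """Resample D: draw len(D) points from D uniformly, with replacement."""
    return [rng.choice(data) for _ in data]


def fit_mean(data):
    """A deliberately simple learning algorithm (a stand-in for any learner):
    predict the mean of the training targets, ignoring x."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def bagging(data, learn, T, seed=0):
    """Train T hypotheses, each on its own bootstrap resample of D."""
    rng = random.Random(seed)
    return [learn(bootstrap(data, rng)) for _ in range(T)]


def bagged_predict(hypotheses, x):
    """Combine the T hypotheses by averaging their outputs."""
    return sum(h(x) for h in hypotheses) / len(hypotheses)


# Hypothetical data set, for illustration only.
data = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 6.0)]
hypotheses = bagging(data, fit_mean, T=25)
g = bagged_predict(hypotheses, x=1.5)
print(g)
```

Each resample sees a slightly different data set, so the hypotheses differ even though the learning algorithm is the same.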

Decorrelation - boosting

Create $h_1, \cdots, h_t, \cdots$ sequentially: Make $h_t$ decorrelated with previous $h$'s:

[Figure: training data, Learning Algorithm]

Emphasize points in $\mathcal{D}$ that were misclassified

Choose weight of $h_t$ based on $E_{\text{in}}(h_t)$
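
One concrete scheme that follows both steps is AdaBoost, sketched below with one-dimensional decision stumps as the weak learners. The slide does not name a specific algorithm, so the choice of AdaBoost, the stump learner, and the toy data are assumptions. In AdaBoost the weight of each hypothesis comes from its weighted in-sample error, not the plain in-sample error.

```python
import math


def stump(threshold, direction):
    """1-D decision stump: returns direction if x > threshold, else -direction."""
    return lambda x: direction if x > threshold else -direction


def best_stump(xs, ys, weights):
    """Pick the stump with the smallest weighted error on the data."""
    best = None
    for threshold in [min(xs) - 1.0] + sorted(xs):
        for direction in (1, -1):
            h = stump(threshold, direction)
            err = sum(w for x, y, w in zip(xs, ys, weights) if h(x) != y)
            if best is None or err < best[0]:
                best = (err, h)
    return best


def adaboost(xs, ys, T):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha_t, h_t)
    for _ in range(T):
        err, h = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # weight of h_t from its error
        ensemble.append((alpha, h))
        # Emphasize misclassified points, de-emphasize the rest, renormalize.
        weights = [w * math.exp(-alpha * y * h(x)) for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    s = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if s >= 0 else -1


# Hypothetical 1-D data that no single stump can classify correctly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, T=3)
print([predict(ensemble, x) for x in xs])
```

On this data the best single stump still misclassifies two points, while the weighted vote of three stumps classifies all six.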

Blending - after the fact

For regression, $h_1, h_2, \cdots, h_T$:

$$g(\mathbf{x}) = \sum_{t=1}^{T} \alpha_t\, h_t(\mathbf{x})$$

Principled choice of $\alpha_t$'s: minimize the error on an "aggregation data set" (pseudo-inverse)

Some $\alpha_t$'s can come out negative

Most valuable $h_t$ in the blend?

Uncorrelated $h_t$'s help the blend
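
A minimal sketch of choosing the blending coefficients by least squares on an aggregation data set. It solves the normal equations directly, which matches the pseudo-inverse solution when the matrix of hypothesis outputs has full column rank. The helper names, the two hypotheses, and the data are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def blend_weights(hypotheses, agg_data):
    """Least-squares alphas on the aggregation set: alpha = (H^T H)^{-1} H^T y,
    where H[n][t] = h_t(x_n). Assumes H has full column rank."""
    H = [[h(x) for h in hypotheses] for x, _ in agg_data]
    y = [target for _, target in agg_data]
    T = len(hypotheses)
    HtH = [[sum(row[i] * row[j] for row in H) for j in range(T)] for i in range(T)]
    Hty = [sum(row[i] * t for row, t in zip(H, y)) for i in range(T)]
    return solve(HtH, Hty)


def blend(hypotheses, alphas):
    """g(x) = sum over t of alpha_t * h_t(x)."""
    return lambda x: sum(a * h(x) for a, h in zip(alphas, hypotheses))


# Hypothetical existing solutions and aggregation set (targets follow 2x - 1).
hypotheses = [lambda x: x, lambda x: x + 1]
agg_data = [(0, -1.0), (1, 1.0), (2, 3.0), (3, 5.0)]

alphas = blend_weights(hypotheses, agg_data)
g = blend(hypotheses, alphas)
print(alphas, g(10))
```

In this example the second coefficient comes out negative, since 2x - 1 = 3x - (x + 1).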

Outline

The map of machine learning

Bayesian learning

Aggregation methods

Acknowledgments

Course content

Professor Malik Magdon-Ismail, RPI

Professor Hsuan-Tien Lin, NTU

Course staff
Carlos Gonzalez (Head TA)
Ron Appel
Costis Sideris
Doris Xin


Filming, production, and infrastructure

Leslie Maxfield and the AMT staff
Rich Fagen and the IMSS staff


Caltech support

IST - Mathieu Desbrun

E&AS Division - Ares Rosakis and Mani Chandy

Provost's Office - Ed Stolper and Melany Hunt

Many others
Caltech TA's and staff members
Caltech alumni and Alumni Association
Colleagues all over the world


To the fond memory of

Faiza A. Ibrahim