Last Time
- Review of Supervised Learning
- Clustering
- K-means
- Soft K-means
Today
- A brief look at Homework 2
- Gaussian Mixture Models
- Expectation Maximization
The Problem
- You have data that you believe is drawn from n populations.
- You want to identify parameters for each population.
- You don't know anything about the populations a priori, except that you believe they're Gaussian.
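To make the problem concrete, here is a minimal sketch (a toy example of my own, not from the lecture) that draws data from two hidden Gaussian populations; the learning task is then to recover their parameters from x alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hidden "populations" (assumed parameters, for illustration only).
means = [2.0, 10.0]
stds = [2.0, 1.5]
weights = [0.5, 0.5]  # mixing proportions, must sum to 1

# Draw 500 points: pick a population per point, then sample from it.
labels = rng.choice(len(weights), size=500, p=weights)
x = rng.normal(loc=np.take(means, labels), scale=np.take(stds, labels))

# The learning problem: given only x (not labels), recover means/stds/weights.
print(x[:5])
```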
GMM example
[Figure: plot of $f_0(x) = \mathcal{N}(x; 2, 2)$ with weight $0.5$]
Mixture Models
Formally, a Mixture Model is the weighted sum of a number of pdfs, where the weights are determined by a distribution:

$p(x) = \lambda_0 f_0(x) + \lambda_1 f_1(x) + \lambda_2 f_2(x) + \ldots + \lambda_k f_k(x), \qquad \text{where } \sum_{i=0}^{k} \lambda_i = 1$

$p(x) = \sum_{i=0}^{k} \lambda_i f_i(x)$

$p(x) = \sum_{i=0}^{k} \lambda_i \,\mathcal{N}(x \mid \mu_i, \Sigma_i)$
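As a sketch of this definition (the function and parameter names are mine, not an established API), the mixture density can be evaluated directly with scipy's normal pdf:

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, weights, means, stds):
    """p(x) = sum_i lambda_i * N(x | mu_i, sigma_i); weights must sum to 1."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for lam, mu, sigma in zip(weights, means, stds):
        total += lam * norm.pdf(x, loc=mu, scale=sigma)
    return total

# Example: a two-component mixture including the figure's N(x; 2, 2) at weight 0.5.
print(mixture_pdf([0.0, 2.0, 10.0], [0.5, 0.5], [2.0, 10.0], [2.0, 1.5]))
```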
[Figure: GMM example with data labeled "sweating" and "laughing"]
Expectation Maximization
Both the training of GMMs and of Graphical Models with latent variables can be accomplished using Expectation Maximization.
Step 1: Expectation (E-step)
Evaluate the responsibilities of each cluster with the current parameters.

$p(z) = \prod_{k=1}^{K} \pi_k^{z_k} \qquad p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}$

$p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k)$

$\gamma(z_k) = \frac{\pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x \mid \mu_j, \Sigma_j)}$
Maximum likelihood for a GMM: the log likelihood of the data is

$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)$

Setting the derivative with respect to $\mu_k$ to zero:

$\sum_{n=1}^{N} \gamma(z_{nk})\, \Sigma_k^{-1} (x_n - \mu_k) = 0 \quad \Rightarrow \quad \mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}$

Maximizing with respect to the mixing weights gives

$\pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma(z_{nk})$
MLE of a GMM

$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n$

$\pi_k = \frac{N_k}{N}, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})$
EM for GMMs
- Initialize the parameters.
- Evaluate the log likelihood.
EM for GMMs
E-step: Evaluate the Responsibilities

$\gamma(z_{nk}) = \frac{\pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$
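A minimal numpy sketch of this E-step for the 1-D case (names like e_step are my own):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, pis, mus, sigmas):
    """gamma[n, k] = pi_k N(x_n | mu_k, sigma_k) / sum_j pi_j N(x_n | mu_j, sigma_j)."""
    x = np.asarray(x, dtype=float)
    # Unnormalized responsibilities: one column per mixture component.
    weighted = np.column_stack([
        pi * norm.pdf(x, loc=mu, scale=sigma)
        for pi, mu, sigma in zip(pis, mus, sigmas)
    ])
    return weighted / weighted.sum(axis=1, keepdims=True)
```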
EM for GMMs
M-Step: Re-estimate Parameters

$\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n$

$\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T$

$\pi_k^{\text{new}} = \frac{N_k}{N}$
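The matching M-step and outer EM loop, again a 1-D sketch reusing e_step from above; the tolerance and iteration cap are assumed values:

```python
import numpy as np
from scipy.stats import norm

def m_step(x, gamma):
    """Closed-form updates: N_k = sum_n gamma_nk; then mu, sigma, pi per component."""
    x = np.asarray(x, dtype=float)
    n_k = gamma.sum(axis=0)                                  # N_k
    mus = (gamma * x[:, None]).sum(axis=0) / n_k             # mu_k^new
    var = (gamma * (x[:, None] - mus) ** 2).sum(axis=0) / n_k
    sigmas = np.sqrt(var)                                    # sigma_k^new
    pis = n_k / len(x)                                       # pi_k^new
    return pis, mus, sigmas

def log_likelihood(x, pis, mus, sigmas):
    dens = sum(pi * norm.pdf(x, mu, s) for pi, mu, s in zip(pis, mus, sigmas))
    return np.log(dens).sum()

def fit_gmm(x, pis, mus, sigmas, tol=1e-6, max_iter=200):
    ll_old = -np.inf
    for _ in range(max_iter):
        gamma = e_step(x, pis, mus, sigmas)   # E-step from the sketch above
        pis, mus, sigmas = m_step(x, gamma)
        ll = log_likelihood(x, pis, mus, sigmas)
        if ll - ll_old < tol:                 # likelihood has stopped improving
            break
        ll_old = ll
    return pis, mus, sigmas
```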
Visual example of EM
Potential Problems
- Incorrect number of Mixture Components
- Singularities
Singularities
A minority of the data can have a disproportionate effect on the model likelihood. For example:
GMM example
[Figure repeated: plot of $f_0(x) = \mathcal{N}(x; 2, 2)$ with weight $0.5$]
Singularities
When a mixture component collapses on a given point, the mean becomes the point, and the variance goes to zero.
Consider the likelihood function as the covariance goes to zero: the contribution of that component,

$\mathcal{N}(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{1/2}} \frac{1}{\sigma_j},$

grows without bound, so the likelihood approaches infinity.
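A quick numeric illustration (toy values of my choosing): evaluate a component centered exactly on one data point as its standard deviation shrinks, and watch the density blow up:

```python
import numpy as np
from scipy.stats import norm

x_n = 3.7  # any data point; the component mean sits exactly on it
for sigma in [1.0, 0.1, 0.01, 1e-4]:
    # N(x_n | mu=x_n, sigma) = 1 / (sqrt(2*pi) * sigma) -> infinity as sigma -> 0
    print(sigma, norm.pdf(x_n, loc=x_n, scale=sigma))
```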
Relationship to K-means
- K-means makes hard decisions: each data point gets assigned to a single cluster.
- Soft K-means uses responsibilities: the expectation step gives each point a weighted membership in every cluster, reflecting the inter-cluster variability.
- Note: only the means are re-estimated in Soft K-means; the covariance matrices are all tied (see the sketch below).
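One way to see this tie (a sketch under assumptions of my own: equal mixing weights and a shared spherical covariance with stiffness beta = 1/(2*sigma^2)): the GMM responsibilities reduce to a softmax over negative squared distances, which is exactly the Soft K-means assignment:

```python
import numpy as np

def soft_kmeans_responsibilities(x, mus, beta=1.0):
    """GMM E-step with tied spherical covariances (sigma^2 = 1/(2*beta)) and
    equal mixing weights: reduces to softmax of -beta * squared distance."""
    x = np.asarray(x, dtype=float)
    d2 = (x[:, None] - np.asarray(mus)[None, :]) ** 2    # squared distances
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```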
General form of EM
Given a joint distribution over observed and latent variables: $p(X, Z \mid \theta)$
Want to maximize: $p(X \mid \theta)$

1. Initialize parameters $\theta^{\text{old}}$
2. E-Step: Evaluate $p(Z \mid X, \theta^{\text{old}})$
3. M-Step: Re-estimate parameters (based on the expectation of the complete-data log likelihood):
   $\theta^{\text{new}} = \operatorname*{argmax}_{\theta} \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta)$
4. Check for convergence of parameters or likelihood.
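The same four steps as a generic skeleton (the e_step/m_step/log_lik callables and their signatures are assumptions for illustration, not an established API):

```python
import numpy as np

def em(x, theta, e_step, m_step, log_lik, tol=1e-6, max_iter=100):
    """Generic EM: alternate evaluating p(Z|X, theta_old) with maximizing the
    expected complete-data log likelihood, until the likelihood settles."""
    ll_old = -np.inf
    for _ in range(max_iter):
        posterior = e_step(x, theta)      # 2. evaluate p(Z | X, theta_old)
        theta = m_step(x, posterior)      # 3. theta_new = argmax E_Z[ln p(X, Z | theta)]
        ll = log_lik(x, theta)            # 4. convergence check on the likelihood
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return theta
```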
Next Time
- Homework 4 due
- Proof of Expectation Maximization in GMMs
- Generalized EM
- Hidden Markov Models