19: More EM
Machine Learning
April 15, 2010
Last Time
Expectation Maximization
Gaussian Mixture Models
Today
EM Proof
Jensen's Inequality
Can we prove that we're approaching some maximum, even if many exist?
Bound maximization
Since we can't optimize the GMM parameters directly, maybe we can find the maximum of a lower bound.
Technically: optimize a concave lower bound of the initial non-convex function.
$l(\theta^{t+1}) \ge Q^t(\theta^{t+1})$
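To see why maximizing the bound pushes the likelihood uphill, it helps to spell out the standard monotonicity chain (a reconstruction of the usual argument, not verbatim from the slides):

```latex
% EM monotonicity: each iteration cannot decrease the likelihood.
% (1) Q^t lower-bounds l everywhere:     l(\theta) \ge Q^t(\theta)
% (2) the bound is tight at \theta^t:    l(\theta^t) = Q^t(\theta^t)
% (3) \theta^{t+1} maximizes the bound:  Q^t(\theta^{t+1}) \ge Q^t(\theta^t)
\[
  l(\theta^{t+1}) \;\ge\; Q^t(\theta^{t+1}) \;\ge\; Q^t(\theta^t) \;=\; l(\theta^t)
\]
% So the sequence l(\theta^t) is non-decreasing; it can still stall at a
% local maximum, which is all that the question above promises.
```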
EM as bound maximization
Claim: for the GMM likelihood

$l(\theta) = \sum_n \log \sum_z p(x_n, z \mid \theta)$
EM Correctness Proof

$l(\theta) = \sum_n \log \sum_z p(x_n, z \mid \theta)$  (introduce the hidden variable: the mixture assignments in a GMM)

$= \sum_n \log \sum_z p(z \mid x_n, \theta^t)\, \frac{p(x_n, z \mid \theta)}{p(z \mid x_n, \theta^t)}$  (multiply by 1, using a fixed value of $\theta^t$)

$\ge \sum_n \sum_z p(z \mid x_n, \theta^t) \log \frac{p(x_n, z \mid \theta)}{p(z \mid x_n, \theta^t)}$  (Jensen's Inequality, coming soon)

$= \sum_n \sum_z p(z \mid x_n, \theta^t) \log p(x_n, z \mid \theta) - \sum_n \sum_z p(z \mid x_n, \theta^t) \log p(z \mid x_n, \theta^t)$
EM Correctness Proof

Define the bound as the first ($\theta$-dependent) term:

$Q(\theta \mid \theta^t) = \sum_n \sum_z p(z \mid x_n, \theta^t) \log p(x_n, z \mid \theta)$

so that $l(\theta) \ge Q(\theta \mid \theta^t) + \text{const}$, and take

$\theta^{t+1} = \arg\max_\theta Q(\theta \mid \theta^t)$

Jensen's Inequality on the GMM mixture: since $\log$ is concave and the weights $\alpha_i$ form a convex combination,

$\log \sum_i \alpha_i\, p(x \mid \mu_i, \Sigma_i) \ge \sum_i \alpha_i \log p(x \mid \mu_i, \Sigma_i)$

that is, $f\!\left(\sum_i \alpha_i v_i\right) \ge \sum_i \alpha_i f(v_i)$ with $f = \log$ and $v_i = p(x \mid \mu_i, \Sigma_i)$.
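A quick numerical sanity check of this inequality (my own illustration; the weights and values below are arbitrary stand-ins for $\alpha_i$ and $p(x \mid \mu_i, \Sigma_i)$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary convex-combination weights and positive "likelihood" values.
alpha = rng.dirichlet(np.ones(5))      # alpha_i >= 0, sums to 1
vals = rng.uniform(0.01, 1.0, size=5)  # stand-ins for p(x | mu_i, Sigma_i)

lhs = np.log(np.sum(alpha * vals))     # log of the convex combination
rhs = np.sum(alpha * np.log(vals))     # convex combination of the logs

assert lhs >= rhs                      # Jensen's inequality for concave log
print(f"log(sum) = {lhs:.4f} >= sum(log) = {rhs:.4f}")
```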
General form of EM

Given a joint distribution over observed and latent variables, $p(X, Z \mid \theta)$, we want to maximize $p(X \mid \theta)$.

1. Initialize parameters $\theta^{old}$
2. E-Step: Evaluate $p(Z \mid X, \theta^{old})$
3. M-Step: Re-estimate parameters (based on the expectation of the complete-data log likelihood):
   $\theta^{new} = \arg\max_\theta \sum_Z p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta)$
4. Check for convergence of params or likelihood
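A minimal NumPy sketch of these four steps for a one-dimensional Gaussian mixture (an illustration under my own naming, not code from the lecture):

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k=2, iters=100, tol=1e-6):
    rng = np.random.default_rng(0)
    # 1. Initialize parameters theta_old: weights, means, std devs
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.full(k, x.std())
    ll_old = -np.inf
    for _ in range(iters):
        # 2. E-step: evaluate p(Z | X, theta_old), the responsibilities
        dens = w * norm.pdf(x[:, None], loc=mu, scale=sigma)  # (n, k)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # 3. M-step: re-estimate parameters by maximizing the expected
        #    complete-data log likelihood
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = resp.T @ x / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # 4. Check for convergence of the (incomplete-data) log likelihood
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return w, mu, sigma
```

On data drawn from two well-separated Gaussians, this recovers the component weights and means up to a relabeling of the components.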
So far we have only looked at training supervised HMMs. What if you believe the data is sequential, but you can't observe the state?
EM on HMMs

Also known as Baum-Welch. The parameters are re-estimated from expected state counts:

$\pi_i = E\{q_0^i\}$

$a_{ij} = \frac{\sum_{t=0}^{T-1} E\{q_t^i q_{t+1}^j\}}{\sum_{k=0}^{N-1} \sum_{t=0}^{T-1} E\{q_t^i q_{t+1}^k\}} \qquad b_{ij} = \frac{\sum_{t=0}^{T-1} E\{q_t^i x_t^j\}}{\sum_{k=0}^{N-1} \sum_{t=0}^{T-1} E\{q_t^i x_t^k\}}$
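In NumPy these updates are only a few lines, assuming the expectations $E\{q_t^i\}$ and $E\{q_t^i q_{t+1}^j\}$ have already been computed (array names are my own; an E-step that produces them is sketched a few slides below):

```python
import numpy as np

def baum_welch_m_step(gamma, xi, x, n_symbols):
    """M-step re-estimation for a discrete HMM.

    gamma: (T, M) array, gamma[t, i] = E{q_t^i} = p(q_t = i | x_0..x_{T-1})
    xi:    (T-1, M, M) array, xi[t, i, j] = E{q_t^i q_{t+1}^j}
    x:     (T,) array of observed symbol indices
    """
    # Initial-state probabilities: pi_i = E{q_0^i}
    pi = gamma[0]
    # Transitions: numerator sums expected i->j counts over time;
    # the denominator normalizes over all destinations k.
    a = xi.sum(axis=0)
    a /= a.sum(axis=1, keepdims=True)
    # Emissions: expected count of state i emitting symbol j,
    # normalized over all symbols k.
    onehot = np.eye(n_symbols)[x]    # (T, n_symbols), entry x_t^j
    b = gamma.T @ onehot             # (M, n_symbols)
    b /= b.sum(axis=1, keepdims=True)
    return pi, a, b
```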
EM on HMMs

Standard EM Algorithm:
Initialize
E-Step: evaluate expected likelihood
M-Step: re-estimate parameters from expected likelihood
Check for convergence
EM on HMMs

Guess: Initialize parameters, $\theta = [\pi, a, b]^T$

E-Step: Compute $E\{l(\theta)\} = E\{\log p(x, q \mid \theta)\}$

$E\{\log p(x, q \mid \theta)\} = E\left\{\log\left[p(q_0) \prod_{t=1}^{T-1} p(q_t \mid q_{t-1}) \prod_{t=0}^{T-1} p(x_t \mid q_t)\right]\right\}$

$= \sum_{i=0}^{M-1} E\{q_0^i\} \log \pi_i + \sum_{t=1}^{T-1} \sum_{i,j=0}^{M-1} E\{q_t^i q_{t-1}^j\} \log a_{ij} + \sum_{t=0}^{T-1} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} E\{q_t^i x_t^j\} \log b_{ij}$
EM on HMMs

But what are these $E\{\cdot\}$ quantities?

$E\{l(\theta)\} = \sum_{i=0}^{M-1} E\{q_0^i\} \log \pi_i + \sum_{t=1}^{T-1} \sum_{i,j=0}^{M-1} E\{q_t^i q_{t-1}^j\} \log a_{ij} + \sum_{t=0}^{T-1} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} E\{q_t^i x_t^j\} \log b_{ij}$

The expectation of an indicator variable is just a probability:

$E\{x^i\} = \sum_x p(x)\, x^i = \sum_x p(x)\, \delta(x = x^i) = p(x^i)$

so each $E\{\cdot\}$ term above is a posterior probability of a state (or state pair) given the observations.
EM on HMMs

The E-step needs posteriors over the hidden states given the whole observation sequence:

$p(q_0 \mid x_0 \ldots x_n) \qquad p(q_i \mid x_0 \ldots x_n) \qquad p(q_i, q_{i-1} \mid x_0 \ldots x_n)$

[Figure: junction tree over the HMM chain; $p(q_{i+1} \mid x_0 \ldots x_n)$ is computed by passing messages through the clique $(q_{i+1}, x_{i+1})$.]
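On a chain these posteriors are exactly what the forward-backward recursions produce. A compact sketch for a discrete HMM (my own unscaled version, suitable only for short sequences; for long ones the usual fix is to rescale or work in log space):

```python
import numpy as np

def forward_backward(pi, a, b, x):
    """E-step posteriors for a discrete HMM.

    Returns gamma[t, i] = p(q_t = i | x) and
            xi[t, i, j] = p(q_t = i, q_{t+1} = j | x).
    """
    T, M = len(x), len(pi)
    alpha = np.zeros((T, M))   # alpha[t, i] = p(x_0..x_t, q_t = i)
    beta = np.zeros((T, M))    # beta[t, i]  = p(x_{t+1}..x_{T-1} | q_t = i)
    alpha[0] = pi * b[:, x[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ a) * b[:, x[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = a @ (b[:, x[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()         # p(x_0..x_{T-1})
    gamma = alpha * beta / evidence
    # xi[t, i, j] = alpha[t, i] * a[i, j] * b[j, x_{t+1}] * beta[t+1, j] / p(x)
    xi = (alpha[:-1, :, None] * a[None] *
          (b[:, x[1:]].T * beta[1:])[:, None, :]) / evidence
    return gamma, xi
```

Feeding `gamma` and `xi` into the M-step sketch above closes the Baum-Welch loop.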
EM on HMMs

Standard EM Algorithm:
Initialize
E-Step: evaluate expected likelihood (the required posteriors come from the JTA algorithm)
Junction Trees

In general, we have no guarantee that we can isolate a single variable.
Gibbs Sampling: this is helpful if it's easier to sample from a conditional than it is to integrate to get the marginal.
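A toy illustration of that point (my example, not from the lecture): for a bivariate Gaussian both 1-D conditionals are trivial to sample, so alternately drawing from them yields samples from the joint, and hence from either marginal, without any integration.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=10_000, seed=0):
    """Sample a standard bivariate Gaussian with correlation rho
    by alternately drawing from the two 1-D conditionals."""
    rng = np.random.default_rng(seed)
    samples = np.empty((n_samples, 2))
    x, y = 0.0, 0.0
    cond_std = np.sqrt(1.0 - rho ** 2)
    for i in range(n_samples):
        x = rng.normal(rho * y, cond_std)  # p(x | y) = N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, cond_std)  # p(y | x) = N(rho*x, 1 - rho^2)
        samples[i] = x, y
    return samples

# The empirical correlation should approach rho.
print(np.corrcoef(gibbs_bivariate_normal(0.8).T)[0, 1])
```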
Today

EM as bound maximization
EM as a general approach to learning parameters for latent variables
Sampling
Next Time

Model Adaptation
Using labeled and unlabeled data to improve performance.