
Lecture 19: More EM
Machine Learning, April 15, 2010

Last Time
Expectation Maximization; Gaussian Mixture Models

Today
EM Proof
Jensen's Inequality

Clustering sequential data

EM over HMMs
EM in any Graphical Model
Gibbs Sampling

Gaussian Mixture Models

How can we be sure GMM/EM works?

We've already seen that there are multiple clustering solutions for the same data.
This is a non-convex optimization problem.

Can we prove that we're approaching some maximum, even if many exist?

Bound maximization
Since we can't optimize the GMM parameters directly, maybe we can find the maximum of a lower bound. Technically: optimize a concave lower bound of the initial non-convex function.

EM as a bound maximization problem

Need to define a function $Q_t(\theta)$ such that:
$Q_t(\theta) \le \ell(\theta)$ for all $\theta$ (a lower bound on the likelihood)
$Q_t(\theta_t) = \ell(\theta_t)$ at the single point $\theta_t$ (the bound is tight at the current estimate)
$Q_t(\theta)$ is concave

Then maximizing the bound improves the likelihood:
$\ell(\theta_{t+1}) \ge Q_t(\theta_{t+1}) \ge Q_t(\theta_t) = \ell(\theta_t)$

EM as bound maximization

Claim: for the GMM log-likelihood
$\ell(\theta) = \sum_n \log \sum_z p(x_n, z \mid \theta)$
the auxiliary function
$Q(\theta \mid \theta_t) = \sum_n \sum_z \gamma_{zn} \log p(x_n, z \mid \theta)$, where $\gamma_{zn} = p(z \mid x_n, \theta_t)$,
is (up to an additive constant) a concave lower bound on $\ell(\theta)$, and maximizing it over $\theta$ gives exactly the GMM MLE updates.

EM Correctness Proof

Prove that $\ell(\theta) \ge Q(\theta \mid \theta_t)$.

$\ell(\theta) = \sum_n \log p(x_n \mid \theta)$   (likelihood function)
$= \sum_n \log \sum_z p(x_n, z \mid \theta)$   (introduce the hidden variable; the mixture components in a GMM)
$= \sum_n \log \sum_z p(z \mid x_n, \theta_t) \frac{p(x_n, z \mid \theta)}{p(z \mid x_n, \theta_t)}$   (multiply and divide by $p(z \mid x_n, \theta_t)$, a fixed value of $\theta_t$)
$\ge \sum_n \sum_z p(z \mid x_n, \theta_t) \log \frac{p(x_n, z \mid \theta)}{p(z \mid x_n, \theta_t)}$   (Jensen's inequality, coming soon)
$= \sum_n \sum_z p(z \mid x_n, \theta_t) \log p(x_n, z \mid \theta) - \sum_n \sum_z p(z \mid x_n, \theta_t) \log p(z \mid x_n, \theta_t)$

EM Correctness Proof

$\ell(\theta) \ge \sum_n \sum_z p(z \mid x_n, \theta_t) \log p(x_n, z \mid \theta) - \sum_n \sum_z p(z \mid x_n, \theta_t) \log p(z \mid x_n, \theta_t)$

The second term is constant with respect to $\theta$, so

$\theta_{t+1} = \operatorname{argmax}_\theta Q(\theta \mid \theta_t)$
$= \operatorname{argmax}_\theta \sum_n \sum_z p(z \mid x_n, \theta_t) \log p(x_n, z \mid \theta)$
$= \operatorname{argmax}_\theta \sum_n \sum_z \gamma_{zn} \log p(x_n, z \mid \theta)$   (the GMM Maximum Likelihood Estimation problem)
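As a quick numerical check of this bound (not part of the original slides): the sketch below, in Python with numpy/scipy and made-up 1-D data and parameters, evaluates l(θ) and the full lower bound from the proof (including the entropy term) and confirms that the bound equals l at θ = θt and lies below it elsewhere.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])  # toy 1-D data

    def log_lik(x, pi, mu, sigma):
        # l(theta) = sum_n log sum_z pi_z N(x_n | mu_z, sigma_z)
        joint = pi * norm.pdf(x[:, None], mu, sigma)         # p(x_n, z | theta), shape (N, K)
        return np.log(joint.sum(axis=1)).sum()

    def lower_bound(x, theta, theta_t):
        # sum_n sum_z p(z|x_n,theta_t) log [ p(x_n,z|theta) / p(z|x_n,theta_t) ]
        joint_t = theta_t[0] * norm.pdf(x[:, None], theta_t[1], theta_t[2])
        resp = joint_t / joint_t.sum(axis=1, keepdims=True)  # p(z | x_n, theta_t)
        joint = theta[0] * norm.pdf(x[:, None], theta[1], theta[2])
        return (resp * (np.log(joint) - np.log(resp))).sum()

    theta_t = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
    theta   = (np.array([0.4, 0.6]), np.array([-2.0, 3.0]), np.array([1.5, 0.8]))

    print(log_lik(x, *theta_t), lower_bound(x, theta_t, theta_t))  # equal: bound is tight at theta_t
    print(log_lik(x, *theta),   lower_bound(x, theta,   theta_t))  # bound <= l(theta) elsewhere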

The missing link: Jensen's Inequality

If f is concave: $f(E\{x\}) \ge E\{f(x)\}$. An incredibly important tool for dealing with mixture models.

If f(x) = log(x):
$\log \sum_i \pi_i\, p(x \mid \mu_i, \Sigma_i) \ge \sum_i \pi_i \log p(x \mid \mu_i, \Sigma_i)$

Two-point case:
$\log(\lambda x_1 + (1 - \lambda) x_2) \ge \lambda \log(x_1) + (1 - \lambda) \log(x_2)$
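A small numerical illustration in Python (the numbers are arbitrary, not from the lecture): for the concave function log, the log of an average is at least the average of the logs.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.5, 5.0, size=10_000)       # any positive random variable
    w = np.full(x.size, 1.0 / x.size)            # uniform weights (a valid distribution)

    lhs = np.log(np.sum(w * x))                  # f(E{x}) = log of the mean
    rhs = np.sum(w * np.log(x))                  # E{f(x)} = mean of the logs
    print(lhs, rhs, lhs >= rhs)                  # Jensen: log E{x} >= E{log x}

    # Two-point version: log(lam*x1 + (1-lam)*x2) >= lam*log(x1) + (1-lam)*log(x2)
    lam, x1, x2 = 0.3, 2.0, 7.0
    print(np.log(lam * x1 + (1 - lam) * x2) >= lam * np.log(x1) + (1 - lam) * np.log(x2))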

Generalizing EM from GMM


Notice that the EM optimization proof never used the exact form of the GMM, only the introduction of a hidden variable, z. Thus, we can generalize the form of EM to broader types of latent variable models.

General form of EM

Given a joint distribution over observed and latent variables, p(X, Z|θ), we want to maximize p(X|θ).

1. Initialize parameters θ_old
2. E-Step: Evaluate p(Z|X, θ_old)
3. M-Step: Re-estimate the parameters based on the expectation of the complete-data log likelihood:
   $\theta_{new} = \operatorname{argmax}_\theta \sum_Z p(Z \mid X, \theta_{old}) \ln p(X, Z \mid \theta)$
4. Check for convergence of the parameters or the likelihood
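As a structural sketch only (function names, signatures, and the tolerance are illustrative, not from the lecture), the loop above looks like this in Python; the model-specific work lives entirely in e_step and m_step.

    import numpy as np

    def em(x, theta0, e_step, m_step, log_lik, tol=1e-6, max_iter=200):
        """Generic EM loop.
        e_step(x, theta)  -> posterior over latent variables, p(Z | X, theta_old)
        m_step(x, post)   -> argmax_theta sum_Z p(Z | X, theta_old) ln p(X, Z | theta)
        log_lik(x, theta) -> log p(X | theta), used to monitor convergence
        """
        theta, prev_ll = theta0, -np.inf
        for _ in range(max_iter):
            post = e_step(x, theta)            # E-step: evaluate p(Z | X, theta_old)
            theta = m_step(x, post)            # M-step: re-estimate the parameters
            ll = log_lik(x, theta)
            if ll - prev_ll < tol:             # check convergence of the likelihood
                break
            prev_ll = ll
        return theta, ll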

Applying EM to Graphical Models


Now we have a general form for learning parameters for latent variables.
Take a guess
Expectation: evaluate the likelihood
Maximization: re-estimate the parameters
Check for convergence

Clustering over sequential data


Recall HMMs

We only looked at training supervised HMMs. What if you believe the data is sequential, but you can't observe the state?

EM on HMMs
(also known as Baum-Welch)

Recall the HMM parameters, estimated from observed state and emission indicator counts:

$\pi_i = q_0^i$
$a_{ij} = \frac{\sum_{t=0}^{T-2} q_t^i q_{t+1}^j}{\sum_{k=0}^{M-1} \sum_{t=0}^{T-2} q_t^i q_{t+1}^k}$
$b_{ij} = \frac{\sum_{t=0}^{T-1} q_t^i x_t^j}{\sum_{k=0}^{N-1} \sum_{t=0}^{T-1} q_t^i x_t^k}$

Now the training counts are estimated:

$\pi_i = E\{q_0^i\}$
$a_{ij} = \frac{\sum_{t=0}^{T-2} E\{q_t^i q_{t+1}^j\}}{\sum_{k=0}^{M-1} \sum_{t=0}^{T-2} E\{q_t^i q_{t+1}^k\}}$
$b_{ij} = \frac{\sum_{t=0}^{T-1} E\{q_t^i x_t^j\}}{\sum_{k=0}^{N-1} \sum_{t=0}^{T-1} E\{q_t^i x_t^k\}}$
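For comparison, a brief Python sketch of the fully observed (supervised) case, where the counts are taken directly from one-hot indicator arrays; the array names and shapes are my own. Baum-Welch uses the same ratios with the indicators replaced by their expectations.

    import numpy as np

    def supervised_hmm_estimates(q, x):
        """q: (T, M) one-hot state indicators q_t^i; x: (T, N) one-hot emission indicators x_t^j."""
        pi = q[0]                                           # pi_i = q_0^i
        A_counts = np.einsum('ti,tj->ij', q[:-1], q[1:])    # sum_{t=0}^{T-2} q_t^i q_{t+1}^j
        a = A_counts / A_counts.sum(axis=1, keepdims=True)  # divide by the sum over k
        B_counts = np.einsum('ti,tj->ij', q, x)             # sum_{t=0}^{T-1} q_t^i x_t^j
        b = B_counts / B_counts.sum(axis=1, keepdims=True)
        return pi, a, b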

EM on HMMs
Standard EM Algorithm
Initialize
E-Step: evaluate the expected likelihood
M-Step: re-estimate parameters from the expected likelihood
Check for convergence

EM on HMMs

Guess: Initialize parameters θ = [π, a, b]^T
E-Step: Compute E{l(θ)} = E{log p(x, q|θ)}

$E\{\log p(x, q \mid \theta)\} = E\Big\{ \sum_{i=0}^{M-1} q_0^i \log \pi_i + \sum_{t=1}^{T-1} \sum_{i,j=0}^{M-1} q_t^i q_{t-1}^j \log a_{ij} + \sum_{t=0}^{T-1} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} q_t^i x_t^j \log b_{ij} \Big\}$

$= \sum_{i=0}^{M-1} E\{q_0^i\} \log \pi_i + \sum_{t=1}^{T-1} \sum_{i,j=0}^{M-1} E\{q_t^i q_{t-1}^j\} \log a_{ij} + \sum_{t=0}^{T-1} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} E\{q_t^i x_t^j\} \log b_{ij}$

The three terms come from $p(q_0)$, $\prod_t p(q_t \mid q_{t-1})$, and $\prod_t p(x_t \mid q_t)$.

EM on HMMs

But what are these E{·} quantities?

$E\{\log p(x, q \mid \theta)\} = \sum_{i=0}^{M-1} E\{q_0^i\} \log \pi_i + \sum_{t=1}^{T-1} \sum_{i,j=0}^{M-1} E\{q_t^i q_{t-1}^j\} \log a_{ij} + \sum_{t=0}^{T-1} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} E\{q_t^i x_t^j\} \log b_{ij}$

For an indicator variable, the expectation is just a probability:
$E\{x^i\} = \sum_x p(x)\, x^i = \sum_x p(x)\, \delta(x = x_i) = p(x_i)$

so

$E\{q_0^i\} = p(q_0^i \mid x, \theta)$,  $E\{q_t^i q_{t-1}^j\} = p(q_t^i, q_{t-1}^j \mid x, \theta)$,  $E\{q_t^i\} = p(q_t^i \mid x, \theta)$

These can be efficiently calculated from JTA potentials and separators.
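For the HMM chain, the JTA message passing reduces to the classic forward-backward recursions. A hedged numpy sketch (variable names and the scaling scheme are mine, not the lecture's code) that computes gamma[t, i] = E{q_t^i} and xi[t, i, j] = E{q_t^i q_{t+1}^j} given the current parameters:

    import numpy as np

    def forward_backward(x, pi, a, b):
        """x: (T,) observed symbol indices; pi: (M,); a: (M, M) transitions; b: (M, N) emissions."""
        T, M = len(x), len(pi)
        alpha = np.zeros((T, M)); beta = np.zeros((T, M)); c = np.zeros(T)
        alpha[0] = pi * b[:, x[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):                              # scaled forward pass
            alpha[t] = (alpha[t - 1] @ a) * b[:, x[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):                     # scaled backward pass
            beta[t] = (a @ (b[:, x[t + 1]] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                               # E{q_t^i} = p(q_t^i | x, theta)
        xi = (alpha[:-1, :, None] * a[None, :, :] *        # E{q_t^i q_{t+1}^j | x, theta}
              (b[:, x[1:]].T * beta[1:])[:, None, :] / c[1:, None, None])
        return gamma, xi, np.log(c).sum()                  # last value is log p(x | theta)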

EM on HMMs
[Figure: the junction tree for an HMM, with cliques ψ(q_0, x_0), ψ(q_{t-1}, q_t), ψ(q_t, x_t) and separators φ(q_t). The needed marginals p(q_0 | x_0 ... x_n), p(q_t | x_0 ... x_n), and p(q_t, q_{t-1} | x_0 ... x_n) are read directly off these potentials and separators.]

EM on HMMs

Standard EM Algorithm
Initialize
E-Step: evaluate the expected likelihood (via the JTA algorithm)
M-Step: re-estimate parameters from the expected likelihood, using expected values from JTA potentials and separators:

$\pi_i = E\{q_0^i\}$
$a_{ij} = \frac{\sum_{t=0}^{T-2} E\{q_t^i q_{t+1}^j\}}{\sum_{k=0}^{M-1} \sum_{t=0}^{T-2} E\{q_t^i q_{t+1}^k\}}$
$b_{ij} = \frac{\sum_{t=0}^{T-1} E\{q_t^i x_t^j\}}{\sum_{k=0}^{N-1} \sum_{t=0}^{T-1} E\{q_t^i x_t^k\}}$

Check for convergence
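Continuing the forward-backward sketch above (still illustrative naming, not the lecture's code), the M-step plugs the expected counts into the same ratios:

    import numpy as np

    def baum_welch_m_step(x, gamma, xi, n_symbols):
        """x: (T,) symbol indices; gamma: (T, M); xi: (T-1, M, M)."""
        pi = gamma[0]                                  # pi_i = E{q_0^i}
        a = xi.sum(axis=0)                             # sum_t E{q_t^i q_{t+1}^j}
        a /= a.sum(axis=1, keepdims=True)              # / sum_k sum_t E{q_t^i q_{t+1}^k}
        onehot = np.eye(n_symbols)[x]                  # x_t^j as a (T, N) indicator array
        b = gamma.T @ onehot                           # sum_t E{q_t^i} x_t^j
        b /= b.sum(axis=1, keepdims=True)              # / sum_k sum_t E{q_t^i} x_t^k
        return pi, a, b

One full Baum-Welch iteration is then: run forward_backward with the current (pi, a, b), feed the resulting gamma and xi into baum_welch_m_step, and repeat until the returned log-likelihood stops increasing.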

Training latent variables in Graphical Models


Now consider a general Graphical Model with latent variables.

EM on Latent Variable Models


Guess
Easy, just assign random values to parameters

E-Step: Evaluate likelihood.


We can use the JTA to evaluate the likelihood and to compute the marginals of the latent variables (the expected counts).

M-Step: Re-estimate parameters.


This can get trickier.

Maximization Step in Latent Variable Models

Why is this easy in HMMs, but difficult in general latent variable models?

[Figures: the HMM junction tree with cliques ψ(q_{t-1}, q_t) and ψ(q_t, x_t); a graphical model in which a node has many parents; a dense graph.]

Junction Trees
In general, we have no guarantee that we can isolate a single variable; we need to estimate each marginal separately (e.g., in dense graphs).

M-Step in Latent Variable Models

M-Step: Re-estimate Parameters.
Keep k-1 parameters fixed (to the current estimate).
Identify a better guess for the free parameter.


M-Step in Latent Variable Models

M-Step: Re-estimate Parameters.

$x^{(t+1)} \sim p(x \mid y^{(t)})$
$y^{(t+1)} \sim p(y \mid x^{(t+1)})$

Gibbs Sampling. This is helpful if it's easier to sample from a conditional than it is to integrate to get the marginal.
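A minimal runnable sketch of this two-variable sweep (the bivariate Gaussian target is purely illustrative, chosen because both conditionals are known in closed form):

    import numpy as np

    def gibbs_bivariate_normal(rho, n_samples=5000, seed=0):
        """Sample from a standard bivariate normal with correlation rho by alternating
        x ~ p(x | y) and y ~ p(y | x); each conditional is a 1-D Gaussian."""
        rng = np.random.default_rng(seed)
        x, y = 0.0, 0.0
        samples = np.empty((n_samples, 2))
        cond_sd = np.sqrt(1.0 - rho ** 2)            # conditional standard deviation
        for t in range(n_samples):
            x = rng.normal(rho * y, cond_sd)         # x^(t+1) ~ p(x | y^(t))
            y = rng.normal(rho * x, cond_sd)         # y^(t+1) ~ p(y | x^(t+1))
            samples[t] = x, y
        return samples

    s = gibbs_bivariate_normal(rho=0.8)
    print(np.corrcoef(s[1000:].T))                   # after burn-in, correlation is close to 0.8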

EM on Latent Variable Models


Guess
Easy, just assign random values to parameters

E-Step: Evaluate likelihood.


We can use the JTA to evaluate the likelihood and to compute the marginals of the latent variables (the expected counts).

M-Step: Re-estimate parameters.


Either JTA potentials and marginals, OR sampling.

Today
EM as bound maximization
EM as a general approach to learning parameters for latent variables
Sampling

Next Time
Model Adaptation
Using labeled and unlabeled data to improve performance.

Model Adaptation Application


Speaker Recognition
UBM-MAP
