
Lecture 18: Gaussian Mixture Models and Expectation Maximization

Machine Learning, April 13, 2010

Last Time
Review of Supervised Learning
Clustering
K-means
Soft K-means

Today
A brief look at Homework 2
Gaussian Mixture Models
Expectation Maximization

The Problem
You have data that you believe is drawn from n populations.
You want to identify the parameters of each population.
You don't know anything about the populations a priori, except that you believe they're Gaussian.

Gaussian Mixture Models


Rather than identifying clusters by their nearest centroids, fit a set of k Gaussians to the data by maximum likelihood over a mixture model.

GMM example
[Figure: a two-component mixture density with components f_0(x) = \mathcal{N}(x; 2, 2) and f_1(x) = \mathcal{N}(x; 10, 0.5), each with mixing weight 0.5.]

Mixture Models
Formally, a mixture model is a weighted sum of a number of pdfs, where the weights are determined by a distribution \pi:

p(x) = \pi_0 f_0(x) + \pi_1 f_1(x) + \pi_2 f_2(x) + \dots + \pi_k f_k(x), \quad \text{where } \sum_{i=0}^{k} \pi_i = 1

p(x) = \sum_{i=0}^{k} \pi_i f_i(x)
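As a quick illustration of this weighted sum, here is a minimal Python sketch (my own, not from the lecture) that evaluates a two-component mixture density with SciPy. The component parameters mirror the f_0, f_1 example above, with both mixing weights set to 0.5; the scale parameter is interpreted as a standard deviation, which is an assumption.

```python
# Minimal sketch: p(x) = sum_i pi_i f_i(x) for a two-component mixture.
import numpy as np
from scipy.stats import norm

weights = np.array([0.5, 0.5])               # pi_i, must sum to 1
components = [norm(loc=2.0, scale=2.0),      # f0(x) = N(x; 2, 2)
              norm(loc=10.0, scale=0.5)]     # f1(x) = N(x; 10, 0.5)

def mixture_pdf(x):
    """Weighted sum of the component pdfs."""
    return sum(w * f.pdf(x) for w, f in zip(weights, components))

print(mixture_pdf(np.array([2.0, 6.0, 10.0])))
```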

Gaussian Mixture Models


GMM: a weighted sum of a number of Gaussians, where the weights are determined by a distribution \pi:

p(x) = \pi_0 \mathcal{N}(x \mid \mu_0, \Sigma_0) + \pi_1 \mathcal{N}(x \mid \mu_1, \Sigma_1) + \dots + \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k), \quad \text{where } \sum_{i=0}^{k} \pi_i = 1

p(x) = \sum_{i=0}^{k} \pi_i \mathcal{N}(x \mid \mu_i, \Sigma_i)

Graphical Models with unobserved variables


What if you have variables in a graphical model that are never observed?
These are latent variables.

Training latent variable models is an unsupervised learning application.


[Figure: an example graphical model in which latent states ("uncomfortable", "amused") give rise to observations ("sweating", "laughing").]

Latent Variable HMMs


We can cluster sequences using an HMM with unobserved state variables.

We will train latent variable models using Expectation Maximization.

Expectation Maximization
Both the training of GMMs and of graphical models with latent variables can be accomplished using Expectation Maximization.
Step 1: Expectation (E-step)
Evaluate the responsibilities of each cluster with the current parameters.

Step 2: Maximization (M-step)
Re-estimate the parameters using the existing responsibilities.

Similar to k-means training.

Latent Variable Representation

We can represent a GMM using a latent variable z:

p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k) = \sum_{z} p(z)\, p(x \mid z)

p(z) = \prod_{k=1}^{K} \pi_k^{z_k}

p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}
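To make the latent-variable reading concrete, here is a small sketch (parameter values are illustrative, not from the slides) of the two-stage generative process implied by p(x) = \sum_z p(z) p(x|z): draw the indicator z from the mixing distribution \pi, then draw x from the selected Gaussian.

```python
# Sketch of the generative story: z ~ Categorical(pi), x | z ~ N(mu_z, sigma_z^2).
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])        # p(z_k = 1) = pi_k
mus = np.array([2.0, 10.0])      # component means (illustrative)
sigmas = np.array([2.0, 0.5])    # component standard deviations (illustrative)

def sample_gmm(n):
    z = rng.choice(len(pi), size=n, p=pi)   # latent component indicators
    x = rng.normal(mus[z], sigmas[z])       # draws from the chosen components
    return x, z

x, z = sample_gmm(1000)
```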

What does this give us?


TODO: plate notation

GMM data and Latent variables

One last bit


We have representations of the joint p(x, z) and the marginal p(x). The conditional p(z|x) can be derived using Bayes' rule.
This is the responsibility that a mixture component takes for explaining an observation x:

\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x \mid \mu_j, \Sigma_j)}
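The responsibility is just Bayes' rule applied to the mixture, and is easy to compute directly. A minimal sketch using the 1-D example parameters from earlier (the values and names here are illustrative):

```python
# gamma(z_k) = pi_k N(x|mu_k, sigma_k) / sum_j pi_j N(x|mu_j, sigma_j)
import numpy as np
from scipy.stats import norm

pi = np.array([0.5, 0.5])
mus = np.array([2.0, 10.0])
sigmas = np.array([2.0, 0.5])

def responsibilities(x):
    """Posterior p(z_k = 1 | x) for each mixture component k."""
    weighted = pi * norm.pdf(x, loc=mus, scale=sigmas)   # pi_k N(x | mu_k, sigma_k)
    return weighted / weighted.sum()

print(responsibilities(6.0))   # a point between the two means
```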

Maximum Likelihood over a GMM


As usual: identify a likelihood function:

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}

And set partials to zero.

Maximum Likelihood of a GMM


Optimization of the means:

\frac{\partial}{\partial \mu_k} \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \Sigma_k^{-1} (x_n - \mu_k) = 0

\sum_{n=1}^{N} \gamma(z_{nk}) \Sigma_k^{-1} (x_n - \mu_k) = 0

\mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}
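A quick numeric sanity check (with made-up data) that the mean update derived above satisfies the stationarity condition \sum_n \gamma(z_{nk})(x_n - \mu_k) = 0:

```python
# Verify sum_n gamma(z_nk) (x_n - mu_k) = 0 when mu_k is the weighted mean.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)            # some 1-D data points (illustrative)
gamma_k = rng.uniform(size=20)     # responsibilities for one component k

mu_k = (gamma_k * x).sum() / gamma_k.sum()
print((gamma_k * (x - mu_k)).sum())   # ~0 up to floating point error
```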

Maximum Likelihood of a GMM


Optimization of the covariance:

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}

Setting the derivative with respect to \Sigma_k to zero gives

\Sigma_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} \gamma(z_{nk})}

Note the similarity to the regular Gaussian MLE, which lacks the responsibility terms.

Maximum Likelihood of a GMM


Optimization of the mixing terms (maximize with a Lagrange multiplier enforcing \sum_k \pi_k = 1):

\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)

0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda

\pi_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})}{N}

MLE of a GMM
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T

\pi_k = \frac{N_k}{N}, \quad \text{where } N_k = \sum_{n=1}^{N} \gamma(z_{nk})
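The three updates above translate directly into a few lines of NumPy. Here is a sketch of one possible M-step, where X is an (N, D) data matrix and gamma an (N, K) matrix of responsibilities; the names and layout are my own, not fixed by the lecture.

```python
import numpy as np

def m_step(X, gamma):
    """Closed-form GMM parameter updates from responsibilities."""
    N, D = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                       # N_k = sum_n gamma(z_nk)
    mu = (gamma.T @ X) / Nk[:, None]             # mu_k: responsibility-weighted means
    sigma = np.zeros((K, D, D))
    for k in range(K):
        diff = X - mu[k]                         # (x_n - mu_k)
        sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    pi = Nk / N                                  # pi_k = N_k / N
    return pi, mu, sigma
```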

EM for GMMs
Initialize the parameters
Evaluate the log likelihood

Expectation step (E-step): evaluate the responsibilities
Maximization step (M-step): re-estimate the parameters

Evaluate the log likelihood
Check for convergence
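Tying these steps together, here is a sketch of the full EM loop under the same illustrative conventions as the M-step sketch above (X is (N, D); SciPy's multivariate_normal supplies the Gaussian density). It is an outline under those assumptions, not the lecture's reference implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def component_densities(X, mu, sigma):
    # N(x_n | mu_k, Sigma_k) for every point n and component k -> (N, K)
    return np.column_stack([multivariate_normal.pdf(X, mu[k], sigma[k])
                            for k in range(len(mu))])

def e_step(X, pi, mu, sigma):
    weighted = pi * component_densities(X, mu, sigma)   # pi_k N(x_n | mu_k, Sigma_k)
    return weighted / weighted.sum(axis=1, keepdims=True)

def log_likelihood(X, pi, mu, sigma):
    return np.log(component_densities(X, mu, sigma) @ pi).sum()

def em(X, pi, mu, sigma, tol=1e-6, max_iter=200):
    ll_old = log_likelihood(X, pi, mu, sigma)   # evaluate the log likelihood
    for _ in range(max_iter):
        gamma = e_step(X, pi, mu, sigma)        # E-step: responsibilities
        pi, mu, sigma = m_step(X, gamma)        # M-step: re-estimate (see sketch above)
        ll = log_likelihood(X, pi, mu, sigma)
        if abs(ll - ll_old) < tol:              # check for convergence
            break
        ll_old = ll
    return pi, mu, sigma
```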

EM for GMMs
E-step: evaluate the responsibilities

\gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

EM for GMMs
M-step: re-estimate the parameters

\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n

\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T

\pi_k^{\text{new}} = \frac{N_k}{N}

Visual example of EM

Potential Problems
Incorrect number of mixture components

Singularities

Incorrect Number of Gaussians

Singularities
A minority of the data can have a disproportionate effect on the model likelihood. For example:

GMM example
[Figure: the same two-component mixture as before, with f_0(x) = \mathcal{N}(x; 2, 2) and f_1(x) = \mathcal{N}(x; 10, 0.5), each with mixing weight 0.5.]

Singularities
When a mixture component collapses onto a single data point, its mean becomes that point and its variance goes to zero. Consider the likelihood function as the covariance goes to zero: the likelihood approaches infinity.

p(x) = \sum_{i=0}^{k} \pi_i \mathcal{N}(x \mid \mu_i, \Sigma_i), \quad \mathcal{N}(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{1/2}} \frac{1}{\sigma_j}
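A tiny numeric illustration of the collapse (my own example, not from the slides): if a component's mean sits exactly on a data point, the density of that point under the component is 1/(\sqrt{2\pi}\,\sigma_j), which grows without bound as \sigma_j \to 0 and drags the mixture likelihood toward infinity.

```python
# N(x_n | x_n, sigma^2) = 1 / (sqrt(2*pi) * sigma) blows up as sigma -> 0.
import numpy as np

for sigma in [1.0, 0.1, 0.01, 1e-4]:
    density_at_mean = 1.0 / (np.sqrt(2 * np.pi) * sigma)
    print(f"sigma = {sigma:g}  ->  N(x_n | x_n, sigma^2) = {density_at_mean:g}")
```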

Relationship to K-means
K-means makes hard decisions:
Each data point gets assigned to a single cluster.

GMM/EM makes soft decisions:
Each data point yields a posterior p(z|x).

Soft K-means is a special case of EM.

Soft K-means as GMM/EM

Assume equal, shared covariance matrices for every mixture component: \Sigma_k = \epsilon I.

Likelihood:

p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{M/2}} \exp\left\{ -\frac{1}{2\epsilon} \| x - \mu_k \|^2 \right\}

Responsibilities:

\gamma(z_{nk}) = \frac{\pi_k \exp\{ -\| x_n - \mu_k \|^2 / 2\epsilon \}}{\sum_j \pi_j \exp\{ -\| x_n - \mu_j \|^2 / 2\epsilon \}}

As \epsilon approaches zero, the responsibility of the nearest cluster approaches one and all others approach zero, recovering the hard assignments of K-means.
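A short sketch (illustrative means and query point) of these responsibilities with the shared covariance \epsilon I, showing how they harden into K-means assignments as \epsilon shrinks:

```python
import numpy as np

def soft_responsibilities(x, mus, eps, pi=None):
    """gamma_k proportional to pi_k * exp(-||x - mu_k||^2 / (2*eps))."""
    pi = np.full(len(mus), 1.0 / len(mus)) if pi is None else pi
    logits = -np.sum((x - mus) ** 2, axis=1) / (2 * eps)
    w = pi * np.exp(logits - logits.max())     # subtract max for numerical stability
    return w / w.sum()

mus = np.array([[0.0, 0.0], [3.0, 3.0]])       # two cluster means (illustrative)
x = np.array([1.0, 1.0])                       # a point closer to the first mean
for eps in [10.0, 1.0, 0.1, 0.01]:
    print(eps, soft_responsibilities(x, mus, eps))
```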

Soft K-Means as GMM/EM

Overall expected log likelihood as \epsilon approaches zero:

\mathbb{E}_z[\ln p(X, Z \mid \mu, \Sigma, \pi)] \rightarrow -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|^2 + \text{const.}

The expected complete-data log likelihood reduces to (the negative of) the K-means within-cluster distortion.
Note: only the means are re-estimated in soft K-means; the covariance matrices are all tied.

General form of EM
Given a joint distribution over observed and latent variables, p(X, Z \mid \theta), we want to maximize p(X \mid \theta).
1. Initialize the parameters \theta^{\text{old}}
2. E-step: evaluate p(Z \mid X, \theta^{\text{old}})
3. M-step: re-estimate the parameters based on the expectation of the complete-data log likelihood:
\theta^{\text{new}} = \arg\max_{\theta} \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta)
4. Check for convergence of the parameters or the likelihood

Next Time
Homework 4 due
Proof of Expectation Maximization in GMMs
Generalized EM
Hidden Markov Models
