
Lecture 18: Gaussian Mixture Models and Expectation Maximization

Machine Learning, April 13, 2010

Last Time
Review of Supervised Learning
Clustering
K-means
Soft K-means

Today
A brief look at Homework 2
Gaussian Mixture Models
Expectation Maximization

The Problem
You have data that you believe is drawn from n populations.
You want to identify the parameters of each population.
You don't know anything about the populations a priori, except that you believe they're Gaussian.

Gaussian Mixture Models


Rather than identifying clusters by their nearest centroids, fit a set of k Gaussians to the data by maximum likelihood over a mixture model.

GMM example
[Figure: a two-component mixture density with components f_0(x) = \mathcal{N}(x; 2, 2) and f_1(x) = \mathcal{N}(x; 10, 0.5), each with mixing weight 0.5.]

Mixture Models
Formally, a mixture model is a weighted sum of a number of pdfs, where the weights are determined by a distribution \pi:

p(x) = \pi_0 f_0(x) + \pi_1 f_1(x) + \pi_2 f_2(x) + \dots + \pi_k f_k(x), \quad \text{where } \sum_{i=0}^{k} \pi_i = 1

p(x) = \sum_{i=0}^{k} \pi_i f_i(x)
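As a quick illustration of this weighted sum, here is a minimal Python sketch (my own, not from the lecture) that evaluates a two-component mixture density with SciPy. The component parameters mirror the f_0, f_1 example above, with both mixing weights set to 0.5; the scale parameter is interpreted as a standard deviation, which is an assumption.

```python
# Minimal sketch: p(x) = sum_i pi_i f_i(x) for a two-component mixture.
import numpy as np
from scipy.stats import norm

weights = np.array([0.5, 0.5])               # pi_i, must sum to 1
components = [norm(loc=2.0, scale=2.0),      # f0(x) = N(x; 2, 2)
              norm(loc=10.0, scale=0.5)]     # f1(x) = N(x; 10, 0.5)

def mixture_pdf(x):
    """Weighted sum of the component pdfs."""
    return sum(w * f.pdf(x) for w, f in zip(weights, components))

print(mixture_pdf(np.array([2.0, 6.0, 10.0])))
```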

Gaussian Mixture Models


GMM: a weighted sum of a number of Gaussians, where the weights are determined by a distribution \pi:

p(x) = \pi_0 \mathcal{N}(x \mid \mu_0, \Sigma_0) + \pi_1 \mathcal{N}(x \mid \mu_1, \Sigma_1) + \dots + \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k), \quad \text{where } \sum_{i=0}^{k} \pi_i = 1

p(x) = \sum_{i=0}^{k} \pi_i \mathcal{N}(x \mid \mu_i, \Sigma_i)

Graphical Models with unobserved variables


What if you have variables in a graphical model that are never observed?
These are latent variables.

Training latent variable models is an unsupervised learning application.


[Figure: an example graphical model in which latent states ("uncomfortable", "amused") give rise to observations ("sweating", "laughing").]

Latent Variable HMMs


We can cluster sequences using an HMM with unobserved state variables.

We will train latent variable models using Expectation Maximization.

Expectation Maximization
Both the training of GMMs and of graphical models with latent variables can be accomplished using Expectation Maximization.
Step 1: Expectation (E-step)
Evaluate the responsibilities of each cluster with the current parameters.

Step 2: Maximization (M-step)
Re-estimate the parameters using the existing responsibilities.

Similar to k-means training.

Latent Variable Representation

We can represent a GMM using a latent variable z:

p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k) = \sum_{z} p(z)\, p(x \mid z)

p(z) = \prod_{k=1}^{K} \pi_k^{z_k}

p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}
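To make the latent-variable reading concrete, here is a small sketch (parameter values are illustrative, not from the slides) of the two-stage generative process implied by p(x) = \sum_z p(z) p(x|z): draw the indicator z from the mixing distribution \pi, then draw x from the selected Gaussian.

```python
# Sketch of the generative story: z ~ Categorical(pi), x | z ~ N(mu_z, sigma_z^2).
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])        # p(z_k = 1) = pi_k
mus = np.array([2.0, 10.0])      # component means (illustrative)
sigmas = np.array([2.0, 0.5])    # component standard deviations (illustrative)

def sample_gmm(n):
    z = rng.choice(len(pi), size=n, p=pi)   # latent component indicators
    x = rng.normal(mus[z], sigmas[z])       # draws from the chosen components
    return x, z

x, z = sample_gmm(1000)
```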

What does this give us?


TODO: plate notation

GMM data and Latent variables

One last bit


We have representations of the joint p(x, z) and the marginal p(x). The conditional p(z|x) can be derived using Bayes' rule.
This is the responsibility that a mixture component takes for explaining an observation x:

\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x \mid \mu_j, \Sigma_j)}
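The responsibility is just Bayes' rule applied to the mixture, and is easy to compute directly. A minimal sketch using the 1-D example parameters from earlier (the values and names here are illustrative):

```python
# gamma(z_k) = pi_k N(x|mu_k, sigma_k) / sum_j pi_j N(x|mu_j, sigma_j)
import numpy as np
from scipy.stats import norm

pi = np.array([0.5, 0.5])
mus = np.array([2.0, 10.0])
sigmas = np.array([2.0, 0.5])

def responsibilities(x):
    """Posterior p(z_k = 1 | x) for each mixture component k."""
    weighted = pi * norm.pdf(x, loc=mus, scale=sigmas)   # pi_k N(x | mu_k, sigma_k)
    return weighted / weighted.sum()

print(responsibilities(6.0))   # a point between the two means
```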

Maximum Likelihood over a GMM


As usual: identify a likelihood function:

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}

And set partials to zero.

Maximum Likelihood of a GMM


Optimization of the means:

\frac{\partial}{\partial \mu_k} \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \Sigma_k^{-1} (x_n - \mu_k) = 0

\sum_{n=1}^{N} \gamma(z_{nk}) \Sigma_k^{-1} (x_n - \mu_k) = 0

\mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}
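A quick numeric sanity check (with made-up data) that the mean update derived above satisfies the stationarity condition \sum_n \gamma(z_{nk})(x_n - \mu_k) = 0:

```python
# Verify sum_n gamma(z_nk) (x_n - mu_k) = 0 when mu_k is the weighted mean.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)            # some 1-D data points (illustrative)
gamma_k = rng.uniform(size=20)     # responsibilities for one component k

mu_k = (gamma_k * x).sum() / gamma_k.sum()
print((gamma_k * (x - mu_k)).sum())   # ~0 up to floating point error
```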

Maximum Likelihood of a GMM


Optimization of the covariance:

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}

Setting the derivative with respect to \Sigma_k to zero gives

\Sigma_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} \gamma(z_{nk})}

Note the similarity to the regular Gaussian MLE, which lacks the responsibility terms.

Maximum Likelihood of a GMM


Optimization of the mixing terms (maximize with a Lagrange multiplier enforcing \sum_k \pi_k = 1):

\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)

0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda

\pi_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})}{N}

MLE of a GMM
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T

\pi_k = \frac{N_k}{N}, \quad \text{where } N_k = \sum_{n=1}^{N} \gamma(z_{nk})
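The three updates above translate directly into a few lines of NumPy. Here is a sketch of one possible M-step, where X is an (N, D) data matrix and gamma an (N, K) matrix of responsibilities; the names and layout are my own, not fixed by the lecture.

```python
import numpy as np

def m_step(X, gamma):
    """Closed-form GMM parameter updates from responsibilities."""
    N, D = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                       # N_k = sum_n gamma(z_nk)
    mu = (gamma.T @ X) / Nk[:, None]             # mu_k: responsibility-weighted means
    sigma = np.zeros((K, D, D))
    for k in range(K):
        diff = X - mu[k]                         # (x_n - mu_k)
        sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    pi = Nk / N                                  # pi_k = N_k / N
    return pi, mu, sigma
```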

EM for GMMs
Initialize the parameters
Evaluate the log likelihood

Expectation step (E-step): evaluate the responsibilities
Maximization step (M-step): re-estimate the parameters

Evaluate the log likelihood
Check for convergence
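Tying these steps together, here is a sketch of the full EM loop under the same illustrative conventions as the M-step sketch above (X is (N, D); SciPy's multivariate_normal supplies the Gaussian density). It is an outline under those assumptions, not the lecture's reference implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def component_densities(X, mu, sigma):
    # N(x_n | mu_k, Sigma_k) for every point n and component k -> (N, K)
    return np.column_stack([multivariate_normal.pdf(X, mu[k], sigma[k])
                            for k in range(len(mu))])

def e_step(X, pi, mu, sigma):
    weighted = pi * component_densities(X, mu, sigma)   # pi_k N(x_n | mu_k, Sigma_k)
    return weighted / weighted.sum(axis=1, keepdims=True)

def log_likelihood(X, pi, mu, sigma):
    return np.log(component_densities(X, mu, sigma) @ pi).sum()

def em(X, pi, mu, sigma, tol=1e-6, max_iter=200):
    ll_old = log_likelihood(X, pi, mu, sigma)   # evaluate the log likelihood
    for _ in range(max_iter):
        gamma = e_step(X, pi, mu, sigma)        # E-step: responsibilities
        pi, mu, sigma = m_step(X, gamma)        # M-step: re-estimate (see sketch above)
        ll = log_likelihood(X, pi, mu, sigma)
        if abs(ll - ll_old) < tol:              # check for convergence
            break
        ll_old = ll
    return pi, mu, sigma
```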

EM for GMMs
E-step: evaluate the responsibilities

\gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

EM for GMMs
M-step: re-estimate the parameters

\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n

\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T

\pi_k^{\text{new}} = \frac{N_k}{N}

Visual example of EM

Potential Problems
Incorrect number of mixture components

Singularities

Incorrect Number of Gaussians

Singularities
A minority of the data can have a disproportionate effect on the model likelihood. For example:

GMM example
[Figure: the same two-component mixture as before, with f_0(x) = \mathcal{N}(x; 2, 2) and f_1(x) = \mathcal{N}(x; 10, 0.5), each with mixing weight 0.5.]

Singularities
When a mixture component collapses onto a single data point, its mean becomes that point and its variance goes to zero. Consider the likelihood function as the covariance goes to zero: the likelihood approaches infinity.

p(x) = \sum_{i=0}^{k} \pi_i \mathcal{N}(x \mid \mu_i, \Sigma_i), \quad \mathcal{N}(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{1/2}} \frac{1}{\sigma_j}
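A tiny numeric illustration of the collapse (my own example, not from the slides): if a component's mean sits exactly on a data point, the density of that point under the component is 1/(\sqrt{2\pi}\,\sigma_j), which grows without bound as \sigma_j \to 0 and drags the mixture likelihood toward infinity.

```python
# N(x_n | x_n, sigma^2) = 1 / (sqrt(2*pi) * sigma) blows up as sigma -> 0.
import numpy as np

for sigma in [1.0, 0.1, 0.01, 1e-4]:
    density_at_mean = 1.0 / (np.sqrt(2 * np.pi) * sigma)
    print(f"sigma = {sigma:g}  ->  N(x_n | x_n, sigma^2) = {density_at_mean:g}")
```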

Relationship to K-means
K-means makes hard decisions:
Each data point gets assigned to a single cluster.

GMM/EM makes soft decisions:
Each data point yields a posterior p(z|x).

Soft K-means is a special case of EM.

Soft K-means as GMM/EM

Assume equal, shared covariance matrices for every mixture component: \Sigma_k = \epsilon I.

Likelihood:

p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{M/2}} \exp\left\{ -\frac{1}{2\epsilon} \| x - \mu_k \|^2 \right\}

Responsibilities:

\gamma(z_{nk}) = \frac{\pi_k \exp\{ -\| x_n - \mu_k \|^2 / 2\epsilon \}}{\sum_j \pi_j \exp\{ -\| x_n - \mu_j \|^2 / 2\epsilon \}}

As \epsilon approaches zero, the responsibility of the nearest cluster approaches one and all others approach zero, recovering the hard assignments of K-means.
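A short sketch (illustrative means and query point) of these responsibilities with the shared covariance \epsilon I, showing how they harden into K-means assignments as \epsilon shrinks:

```python
import numpy as np

def soft_responsibilities(x, mus, eps, pi=None):
    """gamma_k proportional to pi_k * exp(-||x - mu_k||^2 / (2*eps))."""
    pi = np.full(len(mus), 1.0 / len(mus)) if pi is None else pi
    logits = -np.sum((x - mus) ** 2, axis=1) / (2 * eps)
    w = pi * np.exp(logits - logits.max())     # subtract max for numerical stability
    return w / w.sum()

mus = np.array([[0.0, 0.0], [3.0, 3.0]])       # two cluster means (illustrative)
x = np.array([1.0, 1.0])                       # a point closer to the first mean
for eps in [10.0, 1.0, 0.1, 0.01]:
    print(eps, soft_responsibilities(x, mus, eps))
```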

Soft K-Means as GMM/EM

Overall expected log likelihood as \epsilon approaches zero:

\mathbb{E}_z[\ln p(X, Z \mid \mu, \Sigma, \pi)] \rightarrow -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|^2 + \text{const.}

The expected complete-data log likelihood reduces to (the negative of) the K-means within-cluster distortion.
Note: only the means are re-estimated in soft K-means; the covariance matrices are all tied.

General form of EM
Given a joint distribution over observed and latent variables, p(X, Z \mid \theta), we want to maximize p(X \mid \theta).
1. Initialize the parameters \theta^{\text{old}}
2. E-step: evaluate p(Z \mid X, \theta^{\text{old}})
3. M-step: re-estimate the parameters based on the expectation of the complete-data log likelihood:
\theta^{\text{new}} = \arg\max_{\theta} \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta)
4. Check for convergence of the parameters or the likelihood

Next Time
Homework 4 due
Proof of Expectation Maximization in GMMs
Generalized EM
Hidden Markov Models
