
Unsupervised Learning

Sourangshu Bhattacharya

Clustering
Dataset: $\{x_1, \dots, x_N\}$.
Goal: Partition the data into $K$ clusters.
Criterion:
Intra-cluster distances should be smaller than inter-cluster distances.

Clustering

Many clustering schemes:
K-means, K-medoids, etc.
Hierarchical clustering.
Spectral clustering.

K-means clustering
Each cluster $k$ is represented by a center $\mu_k$.
Let $r_{nk} \in \{0,1\}$ denote whether datapoint $x_n$ is assigned to cluster $k$:
$r_{nk} = 1$ if $x_n$ belongs to cluster $k$,
$r_{nk} = 0$ otherwise.
Distortion measure:
$$ J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2 $$
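As a concrete illustration, the sketch below evaluates the distortion $J$ for a fixed assignment; it is an assumed NumPy example (the arrays `X`, `mu`, `r` and their values are hypothetical, not from the slides).

```python
import numpy as np

# Hypothetical toy data: N = 4 points in 2-D, K = 2 centers.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
mu = np.array([[0.0, 0.1], [5.0, 5.0]])           # cluster centers mu_k
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])    # one-hot assignments r_nk

# Distortion J = sum_n sum_k r_nk * ||x_n - mu_k||^2
sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
J = (r * sq_dist).sum()
print(J)
```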

K-means clustering
We need to find both $r_{nk}$ and $\mu_k$.
Optimize alternately: update $r_{nk}$ and $\mu_k$ in turn, keeping the other fixed.

K-means algorithm
Initialize $\{\mu_1, \dots, \mu_K\}$ randomly.
Iterate till convergence:
E-step: $r_{nk} = 1$ if $k = \arg\min_j \|x_n - \mu_j\|^2$, and $r_{nk} = 0$ otherwise.
M-step:
$$ \mu_k = \frac{\sum_n r_{nk} \, x_n}{\sum_n r_{nk}} $$
Complexity: $O(NK)$ distance computations per iteration.
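A minimal NumPy sketch of these two steps follows; it is an illustrative assumption (the function and variable names such as `kmeans` are not from the slides), not the lecture's reference implementation.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Alternating E/M minimization of the distortion J."""
    rng = np.random.default_rng(seed)
    # Initialize centers as K randomly chosen datapoints.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # E-step: assign each point to its nearest center.
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        assign = sq_dist.argmin(axis=1)
        # M-step: each center becomes the mean of its assigned points.
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                           else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):   # converged: J can no longer decrease
            break
        mu = new_mu
    return mu, assign

# Hypothetical usage on toy data:
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centers, labels = kmeans(X, K=2)
```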

K-means algorithm
It is easy to check that:
E-step: minimizes $J$ w.r.t. $r_{nk}$, keeping $\mu_k$ fixed.
M-step: minimizes $J$ w.r.t. $\mu_k$, keeping $r_{nk}$ fixed.
The iteration will converge:
$J$ decreases in each step by at least a certain minimum amount (there are only finitely many possible assignments).
The minimum possible value of $J$ is zero.
Hence the algorithm converges, to a local minimum of $J$.

K-medoids
Objective function:
$$ \tilde{J} = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, d(x_n, \mu_k) $$
Medoids $\{\mu_1, \dots, \mu_K\}$: cluster centers chosen from among the datapoints.
E-step: $r_{nk} = 1$ if $k = \arg\min_j d(x_n, \mu_j)$, and $r_{nk} = 0$ otherwise.
M-step: Assign $\mu_k$ to the datapoint $x_m$ which minimizes $\sum_{n=1}^{N} r_{nk} \, d(x_n, x_m)$ over all $m$.

K-medoids
Complexity is $O(N^2)$ per iteration.
Convergence: the objective function values converge, but the medoids can oscillate.
Advantage: works on non-vectorial datasets, since only a dissimilarity $d(\cdot,\cdot)$ between points is needed.
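Below is a minimal sketch of this scheme, assuming the data are given as a precomputed dissimilarity matrix `D`; the names (`kmedoids`, etc.) and the usage data are illustrative, not taken from the slides.

```python
import numpy as np

def kmedoids(D, K, n_iters=100, seed=0):
    """Alternating minimization of sum_n r_nk * d(x_n, medoid_k).

    D: (N, N) matrix of pairwise dissimilarities d(x_i, x_j).
    Returns medoid indices and cluster assignments.
    """
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    medoids = rng.choice(N, size=K, replace=False)
    for _ in range(n_iters):
        # E-step: assign each point to its nearest medoid.
        assign = D[:, medoids].argmin(axis=1)
        # M-step: within each cluster, pick the member minimizing the
        # total dissimilarity to the other members of that cluster.
        new_medoids = medoids.copy()
        for k in range(K):
            members = np.where(assign == k)[0]
            if len(members) > 0:
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[costs.argmin()]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    return medoids, assign

# Hypothetical usage with Euclidean dissimilarities on toy data:
X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4.0])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
med_idx, labels = kmedoids(D, K=2)
```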

Mixture of Gaussians
$z \in \{0,1\}^K$: a discrete latent variable, such that $\sum_{k=1}^{K} z_k = 1$ (1-of-$K$ coding).
$z$ selects the cluster (mixture component) from which the data point is generated.
There are $K$ Gaussian distributions:
$$ \mathcal{N}(x \mid \mu_1, \Sigma_1), \dots, \mathcal{N}(x \mid \mu_K, \Sigma_K) $$

Mixture of Gaussians
Given a data point $x$:
$$ p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) $$
Where:
$$ \pi_k = p(z_k = 1) $$

Generative Procedure
Select $z$ from the probability distribution:
$$ p(z) = \prod_{k=1}^{K} \pi_k^{z_k} $$
Hence: $p(z_k = 1) = \pi_k$.
Given $z$, generate $x$ according to the conditional distribution:
$$ p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k) $$
Hence:
$$ p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k} $$

Generative Procedure
Joint distribution:
$$ p(x, z) = p(z) \, p(x \mid z) = \prod_{k=1}^{K} \left( \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \right)^{z_k} $$
Marginal:
$$ p(x) = \sum_{z} p(x, z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) $$
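To make the generative procedure concrete, here is a short sampling sketch; the NumPy usage and the parameter values are assumptions for illustration, not material from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component mixture in 2-D.
pi = np.array([0.3, 0.7])                       # mixing coefficients pi_k
mu = np.array([[0.0, 0.0], [4.0, 4.0]])         # component means mu_k
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])  # component covariances Sigma_k

def sample_gmm(n):
    """Draw n points: first z ~ p(z), then x ~ N(x | mu_z, Sigma_z)."""
    z = rng.choice(len(pi), size=n, p=pi)        # select a mixture component
    x = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])
    return x, z

X, Z = sample_gmm(500)
```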

Posterior distribution
The responsibility of component $k$ for $x$, i.e. $p(z_k = 1 \mid x)$, follows from Bayes' rule:
$$ \gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)} $$
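A small sketch of this computation is given below, assuming SciPy's `multivariate_normal` for the Gaussian density; the mixture parameters are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical mixture parameters (same shapes as in the sampling sketch).
pi = np.array([0.3, 0.7])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])

def responsibilities(x):
    """gamma(z_k) = pi_k N(x | mu_k, Sigma_k) / sum_j pi_j N(x | mu_j, Sigma_j)."""
    dens = np.array([pi[k] * multivariate_normal.pdf(x, mu[k], Sigma[k])
                     for k in range(len(pi))])
    return dens / dens.sum()

print(responsibilities(np.array([1.0, 1.0])))  # posterior over the 2 components
```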

Max-likelihood
Let $X = \{x_1, \dots, x_N\}$.
Likelihood function:
$$ p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) $$
Log likelihood:
$$ \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right) $$
Maximize the log-likelihood w.r.t. $\pi$, $\mu$ and $\Sigma$.
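For illustration, the log-likelihood can be evaluated numerically along these lines; this is a sketch assuming SciPy's `logsumexp` for numerical stability, with hypothetical names and data.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pi, mu, Sigma):
    """ln p(X) = sum_n ln( sum_k pi_k N(x_n | mu_k, Sigma_k) )."""
    K = len(pi)
    # log of pi_k N(x_n | mu_k, Sigma_k), shape (N, K)
    log_terms = np.column_stack([
        np.log(pi[k]) + multivariate_normal.logpdf(X, mu[k], Sigma[k])
        for k in range(K)
    ])
    return logsumexp(log_terms, axis=1).sum()

# Hypothetical usage:
X = np.random.randn(100, 2)
pi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [1.0, 1.0]])
Sigma = np.array([np.eye(2), np.eye(2)])
print(gmm_log_likelihood(X, pi, mu, Sigma))
```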

KKT conditions
Differentiating w.r.t. $\mu_k$ and setting to zero:
$$ 0 = -\sum_{n=1}^{N} \gamma(z_{nk}) \, \Sigma_k^{-1} (x_n - \mu_k) $$
Multiplying by $\Sigma_k$ (assumed non-singular):
$$ \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n $$
Where:
$$ \gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk}) $$
KKT conditions
Similarly, differentiating w.r.t. $\Sigma_k$ and setting to zero:
$$ \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^T $$
Lagrangian w.r.t. $\pi_k$ (enforcing $\sum_k \pi_k = 1$):
$$ \ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right) $$
KKT conditions
Maximizing the Lagrangian w.r.t. $\pi_k$ (setting the derivative to zero):
$$ 0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda $$
Multiplying with $\pi_k$ and adding over $k$: $\lambda = -N$.
Hence:
$$ \pi_k = \frac{N_k}{N} $$
Where:
$$ N_k = \sum_{n=1}^{N} \gamma(z_{nk}) $$

(EM) Algorithm
Initialize $\mu_k$, $\Sigma_k$ and $\pi_k$.
E-step: evaluate the responsibilities with the current parameters:
$$ \gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} $$
M-step: re-estimate the parameters using the current responsibilities:
$$ \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \quad \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T, \quad \pi_k = \frac{N_k}{N} $$
Repeat the above two steps till $\ln p(X \mid \pi, \mu, \Sigma)$ converges.
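A compact NumPy sketch of this EM loop follows; it is an assumed illustration (initialization choices, the small regularizer on the covariances, and names like `gmm_em` are not from the slides).

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_em(X, K, n_iters=100, tol=1e-6, seed=0):
    """EM for a Gaussian mixture: alternate responsibilities and parameter updates."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialization: random means from the data, shared covariance, uniform weights.
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: responsibilities gamma_nk, computed in log space for stability.
        log_r = np.column_stack([
            np.log(pi[k]) + multivariate_normal.logpdf(X, mu[k], Sigma[k])
            for k in range(K)
        ])
        ll = logsumexp(log_r, axis=1).sum()          # current log-likelihood
        gamma = np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
        # M-step: closed-form updates derived from the KKT conditions.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
        if ll - prev_ll < tol:                       # log-likelihood has converged
            break
        prev_ll = ll
    return pi, mu, Sigma, gamma

# Hypothetical usage on toy data:
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4.0])
pi, mu, Sigma, gamma = gmm_em(X, K=2)
```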

General EM Algorithm
Incomplete-data log-likelihood of a latent variable model:
$$ \ln p(X \mid \theta) = \ln \left( \sum_{Z} p(X, Z \mid \theta) \right) $$
Summation inside the logarithm makes direct maximization hard.
The complete-data log-likelihood is more tractable:
$$ \ln p(X, Z \mid \theta) $$
General EM Algorithm
Define the expectation of the complete-data log-likelihood w.r.t. the posterior $p(Z \mid X, \theta^{old})$:
$$ Q(\theta, \theta^{old}) = \sum_{Z} p(Z \mid X, \theta^{old}) \, \ln p(X, Z \mid \theta) $$
Maximize w.r.t. $\theta$:
$$ \theta^{new} = \arg\max_{\theta} Q(\theta, \theta^{old}) $$
General EM algorithm
Initialize $\theta^{old}$.
Iterate till convergence:
E-step: evaluate the posterior $p(Z \mid X, \theta^{old})$.
M-step: $\theta^{new} = \arg\max_{\theta} Q(\theta, \theta^{old})$; then set $\theta^{old} \leftarrow \theta^{new}$.
For Gaussian mixtures:
E-step: evaluate the responsibilities $\gamma(z_{nk})$.
M-step: maximize the expected complete-data log-likelihood
$$ \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left( \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right), $$
which yields the closed-form updates above.

Relation to K-means
Let $\Sigma_k = \epsilon I$.
Hence:
$$ p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{D/2}} \exp\left( -\frac{\|x - \mu_k\|^2}{2\epsilon} \right) $$
Giving:
$$ \gamma(z_{nk}) = \frac{\pi_k \exp\left( -\|x_n - \mu_k\|^2 / 2\epsilon \right)}{\sum_{j} \pi_j \exp\left( -\|x_n - \mu_j\|^2 / 2\epsilon \right)} $$
In the limit $\epsilon \to 0$, the $\gamma(z_{nk})$ for the component with the smallest $\|x_n - \mu_k\|^2$ becomes 1; the rest become 0.

Relation to K-means
Setting $r_{nk} = \gamma(z_{nk})$, when $\epsilon \to 0$:
$$ E_Z\!\left[\ln p(X, Z \mid \mu, \Sigma, \pi)\right] \to -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2 + \text{const} $$
So maximizing the expected complete-data log-likelihood becomes equivalent to minimizing the K-means distortion $J$.
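The sketch below illustrates this limit numerically: as $\epsilon$ shrinks, the responsibilities harden into the 0/1 assignments of K-means. The values and names are hypothetical.

```python
import numpy as np

# One datapoint and two centers (hypothetical values).
x = np.array([1.0, 0.0])
mu = np.array([[0.0, 0.0], [3.0, 0.0]])
pi = np.array([0.5, 0.5])

for eps in [10.0, 1.0, 0.1, 0.01]:
    # gamma_k proportional to pi_k * exp(-||x - mu_k||^2 / (2 eps))
    sq = ((x - mu) ** 2).sum(axis=1)
    w = pi * np.exp(-sq / (2 * eps))
    gamma = w / w.sum()
    print(eps, gamma)   # responsibilities approach [1, 0] as eps -> 0
```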

EM Analysis
For any latent variable model and any distribution $q(Z)$ over the latent variables, the following decomposition holds:
$$ \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + KL(q \,\|\, p) $$
Where:
$$ \mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}, \qquad KL(q \,\|\, p) = -\sum_{Z} q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)} $$

EM Analysis
Since the KL divergence is $\geq 0$:
$$ \mathcal{L}(q, \theta) \leq \ln p(X \mid \theta) $$
The bound is tight when the two distributions are the same:
$$ q(Z) = p(Z \mid X, \theta) $$
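As a numerical check of this decomposition, here is a sketch on a single observation from a two-component mixture; the parameters and the choice of $q$ are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component mixture and a single observation x.
pi = np.array([0.3, 0.7])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])
x = np.array([1.0, 1.5])

# Joint p(x, z_k = 1) = pi_k N(x | mu_k, Sigma_k), and its marginal p(x).
joint = np.array([pi[k] * multivariate_normal.pdf(x, mu[k], Sigma[k]) for k in range(2)])
log_px = np.log(joint.sum())
posterior = joint / joint.sum()

# Any distribution q(z) gives ln p(x) = L(q, theta) + KL(q || posterior).
q = np.array([0.6, 0.4])
L = (q * np.log(joint / q)).sum()            # lower bound L(q, theta)
KL = (q * np.log(q / posterior)).sum()       # KL(q || p(z | x, theta)) >= 0
print(log_px, L + KL)                        # the two agree
```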
