Sourangshu Bhattacharya
Clustering
Dataset: $\{x_1, \ldots, x_N\}$.
Goal: Partition the data into $K$ clusters.
Criterion: Intra-cluster distances should be smaller than inter-cluster distances.
K-means clustering
Each cluster $k$ is represented by a center $\mu_k$.
Let $r_{nk} \in \{0,1\}$ denote whether datapoint $x_n$ is assigned to cluster $k$:
$r_{nk} = 1$ if $x_n$ belongs to cluster $k$, $r_{nk} = 0$ otherwise.
Distortion measure:
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2$$
K-means clustering
We need to find both $r_{nk}$ and $\mu_k$.
Optimize alternately: minimize $J$ over one of them while keeping the other fixed.
K-means algorithm
Initialize $\{\mu_1, \ldots, \mu_K\}$ randomly.
Iterate till convergence:
E-step:
$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|x_n - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}$$
M-step:
$$\mu_k = \frac{\sum_{n=1}^{N} r_{nk} \, x_n}{\sum_{n=1}^{N} r_{nk}}$$
Complexity: $O(NK)$ distance computations per iteration.
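A minimal NumPy sketch of this alternating loop (illustrative only; the function name kmeans and the arguments X, K, n_iter are my own):

    import numpy as np

    def kmeans(X, K, n_iter=100, seed=0):
        # Alternating minimization of the distortion J.
        rng = np.random.default_rng(seed)
        # Initialize centers with K randomly chosen datapoints.
        mu = X[rng.choice(len(X), K, replace=False)].astype(float)
        for _ in range(n_iter):
            # E-step: assign each point to its nearest center.
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # shape (N, K)
            r = d2.argmin(axis=1)
            # M-step: recompute each center as the mean of its assigned points.
            for k in range(K):
                if (r == k).any():
                    mu[k] = X[r == k].mean(axis=0)
        return mu, r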
K-means
[Figure slide: illustration of the K-means iterations; image omitted.]
K-means algorithm
It is easy to check that:
E-step: minimizes $J$ w.r.t. $r_{nk}$, keeping $\mu_k$ fixed.
M-step: minimizes $J$ w.r.t. $\mu_k$, keeping $r_{nk}$ fixed.
K-medoids
Objective function:
$$\tilde{J} = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, d(x_n, \mu_k)$$
Medoids $\{\mu_1, \ldots, \mu_K\}$ are restricted to be datapoints.
E-step:
$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j d(x_n, \mu_j) \\ 0 & \text{otherwise} \end{cases}$$
M-step: set $\mu_k$ to the datapoint in cluster $k$ that minimizes $\sum_{n=1}^{N} r_{nk} \, d(x_n, \mu_k)$ over all candidates in the cluster.
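A sketch of these steps, assuming a precomputed $N \times N$ dissimilarity matrix D (this is also what makes the method usable on non-vectorial data, as noted on the next slide; all names here are mine):

    import numpy as np

    def kmedoids(D, K, n_iter=100, seed=0):
        # K-medoids on a precomputed N x N dissimilarity matrix D.
        rng = np.random.default_rng(seed)
        medoids = rng.choice(len(D), K, replace=False)
        for _ in range(n_iter):
            # E-step: assign each point to its nearest medoid.
            r = D[:, medoids].argmin(axis=1)
            # M-step: within each cluster, pick the member minimizing
            # the total dissimilarity to the other cluster members.
            for k in range(K):
                members = np.where(r == k)[0]
                if len(members):
                    costs = D[np.ix_(members, members)].sum(axis=1)
                    medoids[k] = members[costs.argmin()]
        return medoids, r

The inner M-step loop touches all pairs within a cluster, which is exactly the $O(N_k^2)$ cost stated on the next slide.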
K-medoids
Complexity: the M-step requires $O(N_k^2)$ evaluations of $d(\cdot, \cdot)$ per cluster.
Convergence: the objective values converge, but the medoids can oscillate.
Advantage: works on non-vectorial datasets, since only pairwise dissimilarities are needed.
Mixture of Gaussians
Let $z$ be a discrete latent variable with components $z_k \in \{0,1\}$ such that $\sum_{k} z_k = 1$ (1-of-$K$ coding).
$z$ selects the cluster (mixture component) from which the data point is generated.
There are $K$ Gaussian distributions:
$$\mathcal{N}(x \mid \mu_k, \Sigma_k), \quad k = 1, \ldots, K$$
Mixture of Gaussians
Given a data point $x$:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
where $\pi_k = p(z_k = 1)$.
Generative Procedure
Select $z$ from the probability distribution $p(z)$, with mixing coefficients $\pi_k \geq 0$ and $\sum_{k=1}^{K} \pi_k = 1$.
Hence:
$$p(z) = \prod_{k=1}^{K} \pi_k^{z_k}$$
Given $z$, generate $x$ according to the conditional distribution:
$$p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
Hence:
$$p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}$$
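This two-stage procedure is ancestral sampling: first draw $z$, then draw $x$ conditioned on it. A minimal NumPy sketch (the parameter values below are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    # Illustrative 2-component mixture in 2-D; parameters are made up.
    pi = np.array([0.3, 0.7])
    mu = np.array([[0.0, 0.0], [3.0, 3.0]])
    Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])

    # First z ~ p(z), then x ~ p(x | z).
    z = rng.choice(len(pi), size=500, p=pi)
    X = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])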
Generative Procedure
Joint distribution:
$$p(x, z) = p(z) \, p(x \mid z) = \prod_{k=1}^{K} \left( \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \right)^{z_k}$$
Marginal:
$$p(x) = \sum_{z} p(x, z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
Posterior distribution
The responsibility $\gamma(z_k)$ is the posterior probability of $z_k = 1$ given $x$; by Bayes' rule:
$$\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
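A small sketch of this computation for a whole dataset at once (assumes SciPy's multivariate_normal; the function name responsibilities and array layout are mine):

    import numpy as np
    from scipy.stats import multivariate_normal

    def responsibilities(X, pi, mu, Sigma):
        # gamma[n, k] = p(z_k = 1 | x_n): posterior over components.
        w = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                             for k in range(len(pi))])
        return w / w.sum(axis=1, keepdims=True)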
Max-likelihood
Let $X = \{x_1, \ldots, x_N\}$.
Likelihood function:
$$p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$
Log-likelihood:
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)$$
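A sketch of evaluating this quantity; because the sum over components sits inside the logarithm, it is best computed via log-sum-exp for numerical stability (assumes SciPy; all names are mine):

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def log_likelihood(X, pi, mu, Sigma):
        # ln p(X) = sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)
        log_w = np.column_stack(
            [np.log(pi[k]) + multivariate_normal.logpdf(X, mu[k], Sigma[k])
             for k in range(len(pi))])          # shape (N, K)
        return logsumexp(log_w, axis=1).sum()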
KKT conditions
Differentiating w.r.t. $\mu_k$ and setting to zero:
$$0 = -\sum_{n=1}^{N} \gamma(z_{nk}) \, \Sigma_k^{-1} (x_n - \mu_k)$$
Multiplying by $\Sigma_k$:
$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n$$
where:
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$
KKT conditions
Similarly, differentiating w.r.t. $\Sigma_k$:
$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^T$$
For $\pi_k$, form the Lagrangian with the constraint $\sum_{k=1}^{K} \pi_k = 1$:
$$\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$$
KKT conditions
Maximizing the Lagrangian gives:
$$\pi_k = \frac{N_k}{N}$$
where:
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$
Expectation Maximization (EM) Algorithm
Initialize $\mu_k$, $\Sigma_k$ and $\pi_k$.
E-step: evaluate the responsibilities $\gamma(z_{nk})$ with the current parameters.
M-step: re-estimate $\mu_k$, $\Sigma_k$ and $\pi_k$ using the closed-form updates above.
Iterate E- and M-steps until the log-likelihood converges.
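A sketch of one full EM iteration for the Gaussian mixture, combining the responsibilities with the closed-form M-step updates from the KKT slides (it reuses the responsibilities helper sketched earlier; all names are illustrative):

    import numpy as np

    def em_step(X, pi, mu, Sigma):
        # One EM iteration for a Gaussian mixture (sketch).
        gamma = responsibilities(X, pi, mu, Sigma)   # E-step (defined above)
        Nk = gamma.sum(axis=0)                       # effective counts N_k
        # M-step: closed-form updates derived from the KKT conditions.
        mu = (gamma.T @ X) / Nk[:, None]
        Sigma = np.array([
            ((gamma[:, k, None] * (X - mu[k])).T @ (X - mu[k])) / Nk[k]
            for k in range(len(pi))])
        pi = Nk / len(X)
        return pi, mu, Sigma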
General EM Algorithm
Incomplete-data log-likelihood of a latent variable model:
$$\ln p(X \mid \theta) = \ln \left( \sum_{Z} p(X, Z \mid \theta) \right)$$
The summation inside the logarithm makes direct maximization hard.
The complete-data log-likelihood, $\ln p(X, Z \mid \theta)$, is more tractable.
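For the Gaussian mixture above, for instance, the complete-data log-likelihood follows from the joint distribution:
$$\ln p(X, Z \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left( \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)$$
Here the logarithm acts directly on the Gaussian, so the maximization decouples over components and has a closed-form solution.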
General EM Algorithm
Define the expected complete-data log-likelihood w.r.t. the posterior $p(Z \mid X, \theta^{old})$:
$$Q(\theta, \theta^{old}) = \sum_{Z} p(Z \mid X, \theta^{old}) \, \ln p(X, Z \mid \theta)$$
Maximize w.r.t. $\theta$:
$$\theta^{new} = \arg\max_{\theta} Q(\theta, \theta^{old})$$
General EM algorithm
Initialize $\theta^{old}$.
Iterate till convergence:
E-step: evaluate the posterior $p(Z \mid X, \theta^{old})$ and form $Q(\theta, \theta^{old})$.
M-step: maximize, $\theta^{new} = \arg\max_{\theta} Q(\theta, \theta^{old})$, and set $\theta^{old} \leftarrow \theta^{new}$.
Relation to K-means
Consider a Gaussian mixture with common covariance $\Sigma_k = \epsilon I$.
Hence:
$$p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{D/2}} \exp\left( -\frac{\|x - \mu_k\|^2}{2\epsilon} \right)$$
Giving:
$$\gamma(z_{nk}) = \frac{\pi_k \exp\left( -\|x_n - \mu_k\|^2 / 2\epsilon \right)}{\sum_{j} \pi_j \exp\left( -\|x_n - \mu_j\|^2 / 2\epsilon \right)}$$
Relation to K-means
In the limit $\epsilon \to 0$, $\gamma(z_{nk}) \to r_{nk}$: the responsibility for the nearest center goes to 1 and all others to 0, recovering the hard assignments of K-means, and the expected complete-data log-likelihood reduces (up to constants) to the negative distortion $-\frac{1}{2} J$.
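A quick numeric illustration of this hardening (the distances and mixing weights below are made up):

    import numpy as np

    # Squared distances from one point to two centers.
    d2 = np.array([1.0, 4.0])
    pi = np.array([0.5, 0.5])
    for eps in [10.0, 1.0, 0.1, 0.01]:
        w = pi * np.exp(-d2 / (2 * eps))
        print(eps, w / w.sum())   # gamma hardens towards (1, 0) as eps -> 0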
EM Analysis
For any latent variable model and any distribution $q(Z)$ over the latent variables, the following decomposition holds:
$$\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)$$
where:
$$\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}, \qquad \mathrm{KL}(q \,\|\, p) = -\sum_{Z} q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)}$$
EM Analysis
Since the KL divergence is $\geq 0$:
$$\mathcal{L}(q, \theta) \leq \ln p(X \mid \theta)$$
so $\mathcal{L}(q, \theta)$ is a lower bound on the incomplete-data log-likelihood.
EM Analysis