
Mineração de Dados Aplicada

Clustering

Loïc Cerf
September 1st, 2018
DCC – ICEx – UFMG
Example of an applicative problem

Student profiles
Given the marks students received for different courses, group the students so that two students in the same group received about the same marks for each course and two students in different groups have different profiles.

Outline

1 Definition

2 Classical Algorithms

3 Assessing a Clustering

4 Case study

5 Clustering in KNIME

Definition

An optimization problem

Definition
Partitioning the objects into clusters so that each cluster contains similar objects and objects in different clusters are dissimilar; equivalently, partitioning the objects so that the intra-cluster similarities are maximized and the inter-cluster similarities are minimized.

Input:

     a1    a2    ...  an
o1   d1,1  d1,2  ...  d1,n
o2   d2,1  d2,2  ...  d2,n
...  ...   ...   ...  ...
om   dm,1  dm,2  ...  dm,n

Output:

     a1    a2    ...  an    cluster
o1   d1,1  d1,2  ...  d1,n  c1
o2   d2,1  d2,2  ...  d2,n  c2
...  ...   ...   ...  ...   ...
om   dm,1  dm,2  ...  dm,n  cm

Illustration
Clustering objects, described with two interval-scaled attributes,
using the Euclidean distance.

      x    y
o1    91   70
o2    129  91
o3    359  243
o4    322  254
o5    100  104
o6    464  113
o7    342  297
o8    410  65
o9    334  329
...   ...  ...

Inductive database formalism

Querying patterns:

{X ∈ P | Q(X, D)}
where:
D is the dataset,
P is the pattern space,
Q is an inductive query.

Querying a clustering:

{X ∈ P | Q(X, D)}
where:
D is a set of objects O associated with a similarity measure,
P is {(C1, …, Ck) ∈ (2^O)^k | ∀ℓ ∈ {1, …, k}, Cℓ ≠ ∅; ∀m ≠ ℓ, Cℓ ∩ Cm = ∅; ∪ℓ=1..k Cℓ = O},
Q is a function to optimize: it quantifies how similar the pairs of objects in the same cluster are and/or how dissimilar those in two different clusters are.

Variants exist, e. g., allowing some overlap between the clusters.


Inexactness

Every object influences the clustering, and the number of ways to partition |O| objects into k ∈ ℕ clusters is huge: O(k^|O| / k!).

That is why clustering is usually solved in an approximate way.

Domain decomposition consists of using a cheap clustering method to get coarse clusters that other clustering algorithms can independently process in a second step.

Classical Algorithms

Hierarchical agglomeration: illustration


(Agglomerative) hierarchical clustering of objects, described with
two interval-scaled attributes, using the Euclidean distance.

(Same example objects as in the illustration above.)

Dendrogram

(Figure: the dendrogram of the agglomeration; the vertical axis gives the distance at which the clusters are merged.)

Hierarchical agglomeration: algorithm

A greedy algorithm:
1 Initialize the clusters with the individual objects;
2 At each iteration, agglomerate the two closest clusters;
3 Stop when the desired number of clusters is reached.

No guarantee whatsoever on the optimality of the solution;
Choice of a similarity between clusters;
Quadratic or cubic time complexity in the number of objects;
The process should stop before agglomerating dissimilar clusters (no need to guess the number of clusters beforehand) and the clusters are hierarchically organized;
Outlier detection (cluster containing one single object).

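For concreteness, a minimal sketch of this greedy scheme with SciPy (an assumption of these notes, not part of the original slides; it reuses the toy x/y objects of the illustration):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# The toy objects of the illustration (two interval-scaled attributes).
X = np.array([[91, 70], [129, 91], [359, 243], [322, 254], [100, 104],
              [464, 113], [342, 297], [410, 65], [334, 329]])

# Greedily agglomerate the two closest clusters under the Euclidean distance.
Z = linkage(X, method="complete", metric="euclidean")

# Cut the dendrogram at the desired number of clusters, here k = 3.
labels = fcluster(Z, t=3, criterion="maxclust")

The method argument ("single", "complete", "average", ...) selects one of the linkage criteria detailed next.
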
Linkage criteria

The similarity between two clusters can be defined as:
Complete linkage the worst similarity between any pair of objects taken from the two clusters;
Single linkage the best similarity between any pair of objects taken from the two clusters;
Group average linkage the average similarity over all pairs of objects taken from the two clusters;
to provide:
Complete linkage spherical clusters of approximately equal diameters (clustering computed in O(|O|²) time);
Single linkage “chains” of similar objects (clustering computed in O(|O|²) time);
Group average linkage the most natural linkage (clustering computed in O(|O|³) time).

Divisive hierarchical clustering

All objects are initially in one single cluster. Every cluster is recursively split into two until every object is alone in a cluster.

Considering all possible splits to find the best one takes exponential time. That is why a split is usually found in an approximate way, e. g., using 2-means.

k-means: illustration
3-means clustering of objects, described with two interval-scaled
attributes, using the Euclidean distance.

(Same example objects as in the illustration above.)

k-means: algorithm

Seeking the centers of k clusters by expectation-maximization:
1 Randomly choose k centers µ1, …, µk in the object space;
2 Until convergence or a specified maximal number of iterations:
E Assign each object to the cluster Cℓ with the closest center µℓ;
M Update the center µℓ of each cluster to the mean of the objects assigned to it.

The number of clusters must be guessed beforehand;
Convergence to a local minimum of ∑ℓ=1..k ∑o∈Cℓ ‖o − µℓ‖²;
Spherical clusters of approximately equal diameters;
Sensitive to outliers, which should be removed beforehand (k-medoids uses the Manhattan distance and the median);
Linear time complexity in the number of objects, attributes, clusters and iterations (small in practice).

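A hedged scikit-learn counterpart (an assumption, not the KNIME node; KMeans defaults to the smarter k-means++ initialization rather than the purely random centers of step 1):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[91, 70], [129, 91], [359, 243], [322, 254], [100, 104],
              [464, 113], [342, 297], [410, 65], [334, 329]])

km = KMeans(n_clusters=3, n_init=10, max_iter=300, random_state=0).fit(X)
print(km.labels_)           # cluster of each object
print(km.cluster_centers_)  # the centers µ1, ..., µk
print(km.inertia_)          # the locally minimized sum of squared distances
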
The elbow method

Plot, as a function of k, a measure of the quality of the clustering, e. g., ∑ℓ=1..k ∑o∈Cℓ ‖o − µℓ‖², which k-means locally minimizes. Choose the k just after a large drop.

More principled methods exist, e. g., finding the best trade-off between quality and compression.

No such method is implemented in KNIME. If the time complexity of a hierarchical agglomeration (using a complete linkage for similarly-shaped clusters) is not prohibitive, the number of clusters can be chosen from the dendrogram. Outliers can be identified (and removed) in this way too.

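A minimal elbow-plot sketch under the same assumptions, reusing the X of the k-means sketch above (inertia_ is the within-cluster sum of squares that k-means locally minimizes):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]
plt.plot(ks, inertias, marker="o")
plt.xlabel("k")
plt.ylabel("within-cluster sum of squares")
plt.show()  # choose the k just after the large drop (the "elbow")
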
Tendency to produce equi-sized clusters

(Figure: three panels showing the dataset, the k-means clustering and the EM clustering of the same data.)

EM

The dataset D is seen as a random sample from an |A|-dimensional random variable O (i. e., independent and identically distributed) whose probability density function is given as a mixture model of the k clusters.

EM searches, by expectation-maximization, for a parametrization of the model that locally maximizes the likelihood that D is indeed a random sample of O.

The distribution of a cluster is usually assumed multivariate normal, thus parametrized with a location (the center of the cluster) and a covariance matrix.

EM with |A| = 1 and k = 2: illustration

EM clustering of objects in a one-dimensional space.

(Figures: the dataset, then the fitted mixture of two Gaussians after iterations 1 and 5.)

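A hedged sketch of such an EM clustering with scikit-learn's GaussianMixture (an assumption; as noted later, EM is absent from KNIME, and the two-Gaussian sample below merely mimics the illustration):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# A one-dimensional sample drawn from a mixture of two Gaussians.
X1 = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 2, 300)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X1)
print(gm.means_, gm.covariances_)  # location and covariance of each cluster
print(gm.predict_proba(X1[:3]))    # soft memberships P(cluster | object)

Setting covariance_type="diag" would fix the covariances to 0, as discussed below.
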
k-means specializes EM

k-means specializes EM:
EM the probability of a cluster given an object weights the contribution of the object to the cluster;
k-means that probability is 1 for the cluster with the closest center, 0 for the other clusters.

Like k-means, EM:
requires the number of clusters to be guessed beforehand;
converges to a local optimum of the objective function (the likelihood, i. e., the probability of the data given the mixture model);
is sensitive to outliers, which should be removed beforehand.

k-means vs. EM

k-means produces spherical clusters of approximately equal diameters, whereas EM produces ellipsoidal clusters of any size.

k-means is faster than EM: computing the probabilities of every cluster given an object takes O(k|A|²) time for a Gaussian mixture (O(k|A|) for k-means); the convergence is slower because EM must learn the O(k|A|²) real parameters of a Gaussian mixture (O(k|A|) for k-means).

The covariances can be fixed to 0 (diagonal covariance matrices) so that EM only learns and uses 2k|A| real parameters, hence a reduced running time... and quality.

EM with full vs. diagonal covariance matrices

(Figures: the clusters found with full covariance matrices, then with diagonal ones.)

Fuzzy c-means

Fuzzy c-means is k-means with a fuzzy (rather than crisp) membership of every object to every cluster: every object is associated with k normalized weights. The weights increase with the similarity between the object and the center of the cluster. A hyper-parameter controls how fast they increase. The maximization step becomes the computation of a weighted mean.

Like EM, fuzzy c-means associates every object with degrees of membership to every cluster. Besides that, it has the same advantages and drawbacks as k-means.

EM is not included in KNIME. Fuzzy c-means is.

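A minimal NumPy sketch of those updates (an assumption of these notes, not KNIME's "Fuzzy c-Means" node; m > 1 is the hyper-parameter controlling how fast the weights increase):

import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for _ in range(iterations):
        # Distance of every object to every center (one row per object).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Normalized weights, increasing with the similarity to the center.
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        # Maximization step: every center becomes a weighted mean of the objects.
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, centers
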
Non-convex clusters

(Figures: the same non-convex dataset clustered by kernel 3-means and by plain 3-means.)

Kernel k-means and spectral clustering

Problem
k-means, EM and fuzzy c-means only find convex clusters.

Ideas
Kernel methods Using a nonlinear function, the objects are mapped to a higher-dimensional space where the clusters hopefully become convex;
Spectral methods Mapping the objects to a space whose basis is the eigenvectors of an affinity matrix.

Both families of methods are related.

Kernel k-means

Given, conceptually, a nonlinear mapping Φ of the objects to a higher-dimensional space, kernel k-means is k-means in this space.

Knowing the scalar product κ(o, µℓ) = Φ(o) · Φ(µℓ) is enough to compute ‖Φ(o) − Φ(µℓ)‖², i. e., Φ need not be applied. κ is a continuous, symmetric and positive semi-definite function:
Polynomial kernel κ(o, µℓ) = (o · µℓ + a)^b;
Gaussian kernel κ(o, µℓ) = e^(−‖o − µℓ‖² / (2σ²));
Sigmoid kernel κ(o, µℓ) = tanh(a (o · µℓ) + θ);
...

Normalized cut (a spectral clustering)

Definition
Removing from the (non-negative and symmetric) similarity graph the edges with a small total weight so that k “reasonably large” connected components are obtained.

Formally, approximately compute the partitioning (C1, …, Ck) ∈ (2^O)^k that minimizes ∑ℓ=1..k (∑oi∈Cℓ, oj∈O∖Cℓ s(oi, oj)) / (∑oi∈Cℓ, oj∈O s(oi, oj)).

Method
Extract the k smallest eigenvectors of an affinity matrix, e. g., the normalized Laplacian of the similarity matrix. Cluster (e. g., with k-means) the objects rewritten w.r.t. these k attributes.

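A hedged sketch with scikit-learn's SpectralClustering, which solves a normalized-cut-style relaxation (the RBF affinity and its gamma are assumptions):

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two non-convex "moon" clusters that plain k-means cannot separate.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

sc = SpectralClustering(n_clusters=2, affinity="rbf", gamma=20.0,
                        assign_labels="kmeans", random_state=0)
labels = sc.fit_predict(X)  # eigenvectors of the affinity, then k-means
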
Assignment of new objects to clusters

(Kernel) k-means, EM and fuzzy c-means explicitly model every cluster with a center (and a covariance matrix for EM).

As a consequence, a new object can be assigned to the most probable cluster. In KNIME, “Cluster Assigner” does so.

The completed clustering is not guaranteed to be a local extremum of the objective function.

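With scikit-learn, the analogous completion can reuse the model fitted in the k-means sketch above (a usage note under that assumption, not the KNIME node):

import numpy as np

# km is the KMeans model fitted earlier.
new_objects = np.array([[120, 80], [350, 260]])
print(km.predict(new_objects))  # index of the closest learned center
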
DBSCAN: illustration
DBSCAN clustering of objects, described with two interval-scaled
attributes, using the Euclidean distance.

(Same example objects as in the illustration above.)

DBSCAN: algorithm

A density-based algorithm:
1 At each iteration, choose an unlabeled object;
2 List the sufficiently similar objects;
3 If there are too few of them, label the object as an outlier;
4 Otherwise cluster these objects as well as those listed by the same recursive process applied to the newly clustered objects.

One single user-defined density for all clusters (OPTICS addresses this problem);
Choice of a similarity (shape of the clusters);
O(|O| log |O|) average time complexity using an appropriate index structure (O(|O|²) worst case);
Outlier detection;
Single linkage.

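A hedged scikit-learn sketch (an assumption, not KNIME's DBSCAN node; eps and min_samples together encode the single user-defined density):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[91, 70], [129, 91], [359, 243], [322, 254], [100, 104],
              [464, 113], [342, 297], [410, 65], [334, 329]])

# eps bounds "sufficiently similar"; min_samples is the "too few of them" threshold.
db = DBSCAN(eps=60.0, min_samples=3, metric="euclidean").fit(X)
print(db.labels_)  # -1 marks the outliers
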
Configuration

Configuring data mining algorithms is hard. It often relies on sampling in the hyper-parameter space and keeping the best output. Metaheuristics can be used too.

Nevertheless, understanding the various algorithms and the effect of their hyper-parameters helps.

Assessing a Clustering

An unsupervised task

Clustering is an unsupervised task: it is about discovering a hidden organization of the objects.

As a consequence, there are only intrinsic measures of the quality of a clustering.

Intra- and inter-cluster similarities

Given one cluster, the intra-cluster similarity can be defined as:
the minimal similarity between two objects in the cluster;
the average similarity between two objects in the cluster;
the average similarity to the center of the cluster.

Given two clusters, the inter-cluster similarity can be defined as:
the maximal similarity between objects in the two clusters;
the average similarity between objects in the two clusters;
the similarity between the centers of the two clusters.

Internal evaluation

BetaCV the ratio of the average intra-cluster similarity to the average inter-cluster similarity;
Dunn the ratio of the minimal intra-cluster similarity to the maximal inter-cluster similarity;
Davies-Bouldin the similarity between each cluster and its most similar one (the minimal ratio of the sum of the two intra-cluster similarities to their inter-cluster similarity), averaged over all the clusters;
Silhouette for each object, the difference between the average similarity to the objects in the same cluster and the greatest average similarity to the objects in another cluster, divided by the greater of the two terms.

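Two of these measures are directly available in scikit-learn (a sketch under the same toy-data assumption; the silhouette lies in [-1, 1], the higher the better, while a lower Davies-Bouldin score is better):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

X = np.array([[91, 70], [129, 91], [359, 243], [322, 254], [100, 104],
              [464, 113], [342, 297], [410, 65], [334, 329]])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))
print(davies_bouldin_score(X, labels))
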
Comparing clusterings

A quality measure is not meaningful unless compared to that of another clustering:
of the same dataset, to select the best clustering;
of a randomized version of the dataset, to gauge the tendency of the objects to be clustered.

Randomization of a dataset

Uniform distribution between the extrema of each attribute, or normal distribution parametrized from the dataset:

(Figures: two randomized versions of the dataset.)

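A minimal NumPy sketch of both randomizations, applied attribute per attribute (an assumption of these notes):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[91.0, 70.0], [129.0, 91.0], [359.0, 243.0], [322.0, 254.0]])

# Uniform distribution between the extrema of each attribute.
uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=X.shape)
# Normal distribution parametrized from the dataset.
normal = rng.normal(X.mean(axis=0), X.std(axis=0), size=X.shape)
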
Stability of a clustering

If a clustering method involves randomness (e. g., k-means), the stability of its output over several runs is an indicator of its quality.

The clustering to keep is the one with the best quality.

In KNIME, the k first objects are the initial centers of k-means or fuzzy c-means. The “Shuffle” node can be used upstream for an actual random initialization.

Similarity between two partitions

A correlation between nominal attributes (the two partitions) measures their similarity. The entropy (“Entropy scorer” in KNIME) is such a measure.

The Fowlkes-Mallows index, the Rand index and the adjusted Rand index (all absent from KNIME) are alternatives. They are all based on the number of pairs of objects that are in the same/different cluster(s) in one clustering and in the same/different cluster(s) in the other clustering.

Those same measures can help to interpret a clustering, by correlating it with an external nominal attribute that was not used to cluster.

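A hedged sketch of those pair-counting indices with scikit-learn (absent from KNIME, as noted above):

from sklearn.metrics import adjusted_rand_score, fowlkes_mallows_score

partition_a = [0, 0, 1, 1, 2, 2]
partition_b = [1, 1, 0, 0, 2, 2]  # the same grouping under renamed labels

print(adjusted_rand_score(partition_a, partition_b))    # 1.0: identical partitions
print(fowlkes_mallows_score(partition_a, partition_b))  # 1.0 as well
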
Case study

Clustering in KNIME

Practice

1 After figuring out an appropriate k with a hierarchical clustering, cluster, with k-means, the bears in bears.csv according to their attributes Headlen, Headwth and Chest.
2 Test the stability of k-means' clustering.
3 Does the sex partly explain the clustering? The age?

License

© 2012–2018 Loïc Cerf
These slides are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
