Cluster Validity
• For supervised classification we have a variety of
measures to evaluate how good our model is
Cluster Validation
• Why validation?
– To avoid finding clusters formed by chance
– To compare clustering algorithms
– To choose clustering parameters
• e.g., the number of clusters in the K-means algorithm
[Figure: scatter plots on x and y axes (both from 0 to 1); the recoverable panel labels are DBSCAN and Complete Link.]
– Rate-distortion method
– F-ratio
– Davies-Bouldin index (DBI)
– Bayesian Information Criterion (BIC)
– Silhouette Coefficient
[Figure: MSE versus number of clusters (5 to 25); the knee point lies between 14 and 15 clusters.]
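A knee point like the one in the MSE curve can be located by looking for the sharpest bend in the curve. The following is a minimal sketch, assuming the MSE values have already been computed for consecutive cluster counts. The function name `knee_point`, the second-difference heuristic, and the sample numbers are illustrative assumptions, not the method or data behind the slide.

```python
def knee_point(ks, mse):
    """Return the cluster count at which the MSE curve bends most sharply.

    Heuristic (an assumption): choose the interior point with the largest
    second difference mse[i-1] - 2*mse[i] + mse[i+1].
    """
    if len(ks) != len(mse) or len(ks) < 3:
        raise ValueError("need at least three (k, mse) pairs of equal length")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = mse[i - 1] - 2 * mse[i] + mse[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Made-up MSE values, for illustration only
ks = [12, 13, 14, 15, 16, 17]
mse = [3.0, 2.5, 2.0, 1.0, 0.9, 0.8]
print(knee_point(ks, mse))
```

The second difference is only one of several possible knee criteria; on noisy curves it may pick a spurious bend, so the result is best checked against the plot.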
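– Cohesion is measured by the within-cluster sum of squares
$$\mathrm{SSW} = \sum_{i} \sum_{x \in C_i} (x - m_i)^2$$
– Separation is measured by the between-cluster sum of squares
$$\mathrm{SSB} = \sum_{i} |C_i| \, (m - m_i)^2$$
– Where |C_i| is the size of cluster i, m_i is the mean of cluster i, and m is the overall mean
Total variance of X = SSW + SSB

Example (points 1, 2, 4, 5; overall mean m = 3), writing WSS for SSW and BSS for SSB:
K=1 cluster: WSS = 10, BSS = 0, Total = 10 + 0 = 10
K=2 clusters (means 1.5 and 4.5):
WSS = (1 - 1.5)^2 + (2 - 1.5)^2 + (4 - 4.5)^2 + (5 - 4.5)^2 = 1
BSS = 2 × (3 - 1.5)^2 + 2 × (4.5 - 3)^2 = 9
Total = 1 + 9 = 10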
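The K=1 and K=2 example can be checked numerically. The sketch below assumes one-dimensional points and clusters given as plain lists; the function name `sum_of_squares` is hypothetical.

```python
def sum_of_squares(clusters):
    """Return (SSW, SSB) for 1-D clusters given as lists of numbers."""
    all_points = [x for c in clusters for x in c]
    m = sum(all_points) / len(all_points)  # overall mean
    ssw = 0.0
    ssb = 0.0
    for c in clusters:
        m_i = sum(c) / len(c)                   # mean of this cluster
        ssw += sum((x - m_i) ** 2 for x in c)   # within-cluster part
        ssb += len(c) * (m - m_i) ** 2          # between-cluster part
    return ssw, ssb


print(sum_of_squares([[1, 2, 4, 5]]))    # K=1 -> (10.0, 0.0)
print(sum_of_squares([[1, 2], [4, 5]]))  # K=2 -> (1.0, 9.0)
```

In both cases SSW + SSB equals 10, which illustrates that the total is independent of how the points are partitioned.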
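• F-ratio (WB-index):
$$F = \frac{k \cdot \mathrm{SSW}}{\mathrm{SSB}} = \frac{k \sum_{i=1}^{N} \lVert x_i - c_{p(i)} \rVert^2}{\sum_{j=1}^{k} n_j \lVert c_j - \bar{x} \rVert^2}$$
where SSB equals the total variance of X minus SSW.
[Figure: F-ratio (×10^5) versus number of clusters (25 down to 5), with curves labeled PNN and IS and the minimum marked.]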
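The F-ratio can be computed directly from a partition. This is a minimal sketch, assuming points are tuples of coordinates and each cluster centroid c_j is the mean of its members; the names `f_ratio`, `centroid`, and `sq_dist` are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def f_ratio(clusters):
    """Return k * SSW / SSB for clusters given as lists of tuples."""
    k = len(clusters)
    x_bar = centroid([p for c in clusters for p in c])  # overall mean
    ssw = 0.0
    ssb = 0.0
    for c in clusters:
        c_j = centroid(c)
        ssw += sum(sq_dist(p, c_j) for p in c)
        ssb += len(c) * sq_dist(c_j, x_bar)
    if ssb == 0:
        raise ValueError("SSB is zero (e.g. a single cluster); F is undefined")
    return k * ssw / ssb


# Points 1, 2, 4, 5 split into two clusters: SSW = 1, SSB = 9, k = 2
print(f_ratio([[(1,), (2,)], [(4,), (5,)]]))
```

Lower values indicate compact, well-separated clusters, which is why the plot marks the minimum.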
We need a quantitative method to assess the quality of a clustering...
• Cohesion: measures how closely related the objects in a cluster are
• Separation: measures how distinct or well-separated a cluster is from other clusters

The silhouette value of a point is a measure of how similar the point is to points in its own cluster compared to points in other clusters.
Formal definition:
• a(i) is the average distance of the point i to the other points in its own cluster A
• d(i, C) is the average distance of the point i to the points in another cluster C
• b(i) is the minimal d(i, C) over all clusters other than A
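The quantities a(i) and b(i) can be combined into a per-point silhouette value. The sketch below assumes the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), which does not appear in the extracted text, plus the common convention that a point alone in its cluster gets the value 0. The function name `silhouette_values` and the default 1-D distance are hypothetical choices for illustration.

```python
def silhouette_values(points, labels, dist=lambda p, q: abs(p - q)):
    """Return the silhouette value s(i) for every point.

    Assumes s(i) = (b - a) / max(a, b); singleton clusters get 0.0.
    """
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("silhouette needs at least two clusters")
    values = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster
        by_cluster = {c: [] for c in cluster_ids}
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(dist(p, q))
        own = by_cluster[labels[i]]
        if not own:  # point is alone in its cluster
            values.append(0.0)
            continue
        a = sum(own) / len(own)  # a(i): cohesion
        b = min(sum(d) / len(d)  # b(i): nearest other cluster
                for c, d in by_cluster.items()
                if c != labels[i] and d)
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


vals = silhouette_values([1, 2, 4, 5], [0, 0, 1, 1])
print(vals)
print(sum(vals) / len(vals))  # average silhouette of the clustering
```

Values close to 1 indicate a point that sits well inside its own cluster, while values near 0 or below suggest it lies between clusters or is assigned to the wrong one.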
Soft partitions