
# K-Means Clustering


## Objective Function

[Figure: example with n = 11]

$$
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - m_i \rVert^2
$$
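To make the objective concrete, the sketch below evaluates it for one fixed partition. It is only an illustration: the function name `kmeans_objective` and the sample points are assumptions, and each m_i is taken to be the mean of the points in cluster S_i.

```python
import math


def kmeans_objective(clusters):
    """Within-cluster sum of squared distances for a given partition.

    clusters: list of clusters, each a non-empty list of points (tuples).
    """
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # m_i: the mean (centroid) of the points in cluster S_i
        mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
        # add ||x_j - m_i||^2 for every x_j in S_i
        total += sum(math.dist(p, mean) ** 2 for p in points)
    return total


# Hypothetical partition of three points into k = 2 clusters
partition = [[(1, 1), (3, 1)], [(10, 10)]]
print(kmeans_objective(partition))
```

K-means searches over partitions S for the one that makes this value as small as possible.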

Source: http://en.wikipedia.org/wiki/K-means_clustering, 30 March 2010


Basic idea
• proposed by Hugo Steinhaus in 1956

Standard algorithm
• proposed by Stuart Lloyd in 1957
• as a technique for pulse-code modulation

The term “k-means”
• first used by James MacQueen in 1967


## Standard Algorithm

Initialization:
• choose an initial set of k means $m_1^{(1)}, \dots, m_k^{(1)}$
• (selected by a random or heuristic method)

Assignment:
• assign each object to the cluster with the nearest mean

Update:
• calculate the new means
• -> centroid of the objects in the cluster

Repeat until stable:
• -> no objects move group
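The two steps can also be written out formally. The formulas did not survive on the slide, so the version below is a reconstruction in the usual notation: the iteration index t and the set notation are assumptions, with x_j denoting an object and m_i^{(t)} the mean of cluster i at iteration t.

$$
S_i^{(t)} = \left\{ x_j : \lVert x_j - m_i^{(t)} \rVert^2 \le \lVert x_j - m_{l}^{(t)} \rVert^2 \ \text{for all } l = 1, \dots, k \right\}
$$

$$
m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x_j \in S_i^{(t)}} x_j
$$

The first line is the assignment step (each object joins the cluster whose mean is nearest, with ties broken so that it joins exactly one cluster), and the second is the update step (each mean moves to the centroid of its cluster).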



4. Repeat steps 2 and 3 until the centroids no longer move or the objects no longer move to other groups.
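The loop described above can be sketched in plain Python as follows. This is a minimal illustration, not the code behind the slides: the function name `lloyd_kmeans`, the `max_iter` safety cap, and the rule of keeping the old mean when a group becomes empty are all assumptions.

```python
import math
import random


def lloyd_kmeans(points, initial_means, max_iter=100):
    """Iterate assignment and update from the given initial means.

    points: list of equal-length tuples; initial_means: list of k tuples.
    Returns (labels, means), where labels[j] is the group index of points[j].
    """
    means = list(initial_means)
    k = len(means)
    labels = None
    for _ in range(max_iter):
        # Assignment: each object goes to the nearest mean (Euclidean distance)
        new_labels = [
            min(range(k), key=lambda i: math.dist(p, means[i])) for p in points
        ]
        # Stop condition: no object moved to another group
        if new_labels == labels:
            break
        labels = new_labels
        # Update: each mean moves to the centroid of its group
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: keep the old mean if a group is empty
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, means


# Hypothetical data; the initial means are drawn at random from the data
data = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 9), (9, 10)]
start = random.Random(0).sample(data, 2)
labels, means = lloyd_kmeans(data, start)
print(labels, means)
```

Stopping when the labels stop changing is equivalent to stopping when the centroids stop moving, because unchanged groups produce unchanged centroids.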
## Examples

K-means interactive demo: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html, 30 March 2010

## Examples: Matlab

n = 20, k = 3

Operation flow
1. Select initial centroids (random)
2. Calculate Euclidean distance
3. Assign group (find minimum distance)
4. Calculate position of new centroid
5. Calculate stop condition

[Figure: four scatter plots, both axes from 0 to 100. Panels: initial positions & grouping, 1st step, 2nd step, final step]

Matlab Statistics Toolbox: IDX = kmeans(X, K)
## Summary

K-means clustering
• is a fast and simple algorithm
• for solving clustering problems

But the algorithm
• does not necessarily find the optimal configuration
• because the result depends on the initial means
• which are selected by a random or heuristic method

And so the k-means algorithm
• can be run multiple times with different initial means
• to reduce this effect.
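A small example shows both the problem and the remedy. The sketch below is a hedged illustration: the data (four corners of a 4 x 1 rectangle), the two fixed starting configurations, and the helper name `kmeans` are all assumptions chosen so the outcome is reproducible, whereas in practice the starts would be drawn at random. A compact version of the iteration is included only so that the block runs on its own.

```python
import math


def kmeans(points, means, max_iter=100):
    """Compact k-means iteration from given initial means; returns (labels, cost)."""
    means = list(means)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest mean
        new = [
            min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            for p in points
        ]
        if new == labels:  # stable: no point changed group
            break
        labels = new
        # Move each mean to the centroid of its group
        for i in range(len(means)):
            group = [p for p, lab in zip(points, labels) if lab == i]
            if group:
                means[i] = tuple(sum(c) / len(group) for c in zip(*group))
    # Objective value: sum of squared distances to the assigned means
    cost = sum(math.dist(p, means[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, cost


corners = [(0, 0), (0, 1), (4, 0), (4, 1)]  # k = 2
starts = [
    [(2, 0), (2, 1)],      # leads to a bottom/top split: stable but poor
    [(0, 0.5), (4, 0.5)],  # leads to a left/right split: the better grouping
]
runs = [kmeans(corners, s) for s in starts]
best_labels, best_cost = min(runs, key=lambda r: r[1])
print([cost for _, cost in runs], best_labels)
```

Both runs stop because no point changes group, yet the first ends with a cost of 16 and the second with a cost of 1. Keeping the run with the lowest cost is the simplest way to use multiple runs.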

## References

Joaquin Perez Ortega, Ma. Del Rocio Boone Rojas, and Maria J.
Somodevilla Garcia, “Research issues on K-means Algorithm:
An Experimental Trial Using Matlab”, Proceedings of the 2nd
Workshop on Semantic Web and New Technologies (SemWeb09),
Puebla, Mexico, March 23-24, 2009.