
K-Means Clustering

\[
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - m_i \rVert^{2}
\]
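To make the objective concrete, the following is a minimal sketch in plain Python that evaluates the double sum for a given partition, taking m_i to be the mean of the points in S_i. The function names and the four toy points are assumptions made for illustration; they do not come from the slides.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def objective(clusters):
    """Sum over clusters of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        m = centroid(points)
        total += sum(squared_distance(x, m) for x in points)
    return total


# Hypothetical toy data: two candidate partitions S of the same four points.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(objective(good), objective(bad))  # prints: 4.0 100.0
```

The partition that groups nearby points has the smaller objective value, which is what the arg min selects.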


K-Means Clustering

Basic idea

• proposed by Hugo Steinhaus in 1956

Standard Algorithm

• proposed by Stuart Lloyd in 1957

• for a pulse-code modulation technique

The term “K-means”

• proposed by James MacQueen in 1967


Standard Algorithm

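Initialization:

m_1^(1), ..., m_k^(1): initial means

(selected by a random or heuristic method)

Assignment:

-> each object goes to the cluster with the nearest mean

Update:

-> each mean becomes the centroid of the objects in its cluster

Repeat until:

-> no objects move group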
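The assignment and update steps can be sketched as two small functions and a loop. This is only an illustration in plain Python under stated assumptions: the names `assign`, `update`, and `lloyd` are hypothetical, the initial means are passed in by the caller, and an empty cluster simply keeps its previous mean (the slides do not say how that case is handled).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, means):
    """Assignment step: index of the nearest mean for every point."""
    return [min(range(len(means)), key=lambda i: squared_distance(x, means[i]))
            for x in points]


def update(points, labels, means):
    """Update step: each mean becomes the centroid of its assigned points."""
    new_means = []
    for i, old in enumerate(means):
        members = [x for x, lab in zip(points, labels) if lab == i]
        if not members:
            new_means.append(old)  # assumption: an empty cluster keeps its old mean
            continue
        new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_means


def lloyd(points, means):
    """Alternate the two steps until no object moves group."""
    labels = None
    while True:
        new_labels = assign(points, means)
        if new_labels == labels:
            return labels, means
        labels = new_labels
        means = update(points, labels, means)


# Hypothetical toy data with a deliberately poor choice of initial means.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
labels, means = lloyd(points, [(1.0, 1.0), (2.0, 1.0)])
print(labels)  # prints: [0, 0, 1, 1]
print(means)   # prints: [(1.5, 1.0), (8.5, 8.5)]
```

In this toy run the first pass puts three points in one group; the update step then pulls that mean away, and the second pass moves (2, 1) back to the first group.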


Standard Algorithm

Stop when the objects no longer move to other groups.

Examples

[Figure: positions of the objects]


Examples-Matlab

n=20, k=3

Operation flow

1. Select initial centroid (random)

2. Calculate Euclidean distance

3. Assign group (find minimum distance)

4. Calculate position of new centroid

5. Calculate stop condition

Matlab call: IDX = KMEANS(X, K)

[Figure: four scatter plots of the objects: initial positions, 1st step grouping, 2nd step, final step]
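The operation flow can also be followed step by step outside Matlab. The sketch below is written in Python rather than Matlab and is not the Matlab KMEANS implementation. It makes several assumptions: 20 points drawn uniformly from a 0 to 100 square, a fixed seed, initial centroids taken as 3 randomly chosen data points, and a cap of 100 steps. Comments are numbered to match the flow above.

```python
import math
import random

random.seed(0)  # assumption: fixed seed so the run is repeatable
n, k = 20, 3
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(n)]

# 1. Select initial centroids (random): here, k distinct data points.
centroids = random.sample(points, k)

groups = None
for step in range(1, 101):  # assumption: safety cap on the number of steps
    # 2. Calculate the Euclidean distance from every point to every centroid.
    distances = [[math.dist(p, c) for c in centroids] for p in points]
    # 3. Assign each point to the group with the minimum distance.
    new_groups = [row.index(min(row)) for row in distances]
    # 5. Stop condition: no point changed group since the previous step.
    if new_groups == groups:
        break
    groups = new_groups
    # 4. Calculate the position of each new centroid (mean of its members).
    for g in range(k):
        members = [p for p, lab in zip(points, groups) if lab == g]
        if members:  # an empty group keeps its previous centroid
            centroids[g] = tuple(sum(c) / len(members) for c in zip(*members))

print("steps:", step)
print("group sizes:", [groups.count(g) for g in range(k)])
```

The stop condition is checked right after the assignment, so the final pass confirms that the grouping is stable under the current centroids.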


Summary

K-Means clustering

• is a fast and simple algorithm

• for solving the clustering problem

But the algorithm

• does not necessarily find the optimal configuration

• because the result depends on the initialization

• which is chosen by random or heuristic selection

And so the k-means algorithm

• can be run multiple times

• to reduce this effect.
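One common way to run the algorithm multiple times is to keep the run with the smallest sum of squared distances. The sketch below shows that idea in plain Python. The function names, the default of 10 runs, the seed, and the toy points are all assumptions for illustration, and restarts only make a poor result less likely; they do not guarantee the optimal configuration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_steps=100):
    """One run from a random start; returns (cost, labels, means)."""
    means = rng.sample(points, k)  # random initialization from the data
    labels = None
    for _ in range(max_steps):
        new_labels = [min(range(k), key=lambda i: sq_dist(x, means[i]))
                      for x in points]
        if new_labels == labels:  # no object moved group
            break
        labels = new_labels
        for i in range(k):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    cost = sum(sq_dist(x, means[lab]) for x, lab in zip(points, labels))
    return cost, labels, means


def kmeans_restarts(points, k, runs=10, seed=0):
    """Repeat kmeans_once with different random starts and keep the cheapest run."""
    rng = random.Random(seed)
    results = [kmeans_once(points, k, rng) for _ in range(runs)]
    return min(results, key=lambda r: r[0])


# Hypothetical toy data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 9.0), (5.0, 10.0)]
best_cost, best_labels, best_means = kmeans_restarts(points, k=3)
print(best_cost, best_labels)
```

Comparing runs by the objective value uses the same quantity that k-means minimizes, so no extra quality measure is needed.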



