
10/12/2016

K Means Clustering Algorithm: Explained DnI Institute


DnI Institute
Build Data and Decision Science Experience


K Means Clustering Algorithm: Explained


September 26, 2015 by DnI Institute


Classification problems can be solved with objective segmentation or subjective segmentation.

For a non-technical explanation of when to use a subjective segmentation technique such as K Means
clustering and when to use an objective segmentation method such as a Decision Tree, see
http://dni-institute.in/blogs/segmentation-a-perspective-2/
K Means is one of the most frequently used unsupervised algorithms. K Means Clustering is an
exploratory data analysis technique and a non-hierarchical method of grouping objects together.
Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in
the same group (called a cluster) are more similar (in some sense or another) to each other than
to those in other groups (clusters).
In this blog, we explain the algorithm in simple steps and with an example.
http://dni-institute.in/blogs/k-means-clustering-algorithm-explained/


Business Scenario: We have height and weight information for a set of objects. Using these two
variables, we need to group the objects.

A two-dimensional plot of the data shows two visible clusters/segments, and we want the K Means
algorithm to identify them.
Data Sample

Height  Weight
185     72
170     56
168     60
179     68
182     72
188     77
180     71
180     70
183     84
180     88
180     67
177     76


Step 1: Input
The inputs are the dataset, the clustering variables and the maximum number of clusters (the K in
K Means Clustering).
In this dataset, only two variables, height and weight, are considered for clustering.

Height  Weight
185     72
170     56
168     60
179     68
182     72
188     77
180     71
180     70
183     84
180     88
180     67
177     76

Step 2: Initialize cluster centroids

In this example, the value of K is 2. The cluster centroids are initialized with the first 2
observations.

Initial Centroid

Cluster  Height  Weight
K1       185     72
K2       170     56
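
The initialization in Step 2 can be sketched in a few lines of Python. This is only an illustration: the names `data`, `k` and `centroids` are our own, and seeding with the first K rows simply mirrors the choice made in this example.

```python
# Observations (height, weight) copied from the data sample in this post.
data = [
    (185, 72), (170, 56), (168, 60), (179, 68),
    (182, 72), (188, 77), (180, 71), (180, 70),
    (183, 84), (180, 88), (180, 67), (177, 76),
]
k = 2  # number of clusters used in this example

# Seed each centroid with one of the first k observations, as the post does.
centroids = [tuple(float(v) for v in row) for row in data[:k]]
print(centroids)  # [(185.0, 72.0), (170.0, 56.0)]
```
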

Step 3: Calculate Euclidean Distance

Euclidean distance is one of the distance measures used with the K Means algorithm. The Euclidean
distance between an observation and each of the initial cluster centroids, 1 and 2, is calculated.
Each observation is then assigned to the cluster whose centroid is at the minimum distance.

First two observations

Height  Weight
185     72
170     56

The initial cluster centroids are:

Cluster  Height  Weight
K1       185     72
K2       170     56

The Euclidean distance of each observation from each cluster centroid is calculated.

Observation  Euclidean Distance from Cluster 1           Euclidean Distance from Cluster 2           Assignment
(185, 72)    $\sqrt{(185-185)^2 + (72-72)^2} = 0$        $\sqrt{(185-170)^2 + (72-56)^2} = 21.93$    Cluster 1
(170, 56)    $\sqrt{(170-185)^2 + (56-72)^2} = 21.93$    $\sqrt{(170-170)^2 + (56-56)^2} = 0$        Cluster 2

These two observations are assigned first because their assignment is already known: each one is
the initial centroid of its own cluster. For the same reason, the centroids do not change at this
step.
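
As a rough sketch, the distance calculation in Step 3 could be written as follows. The function name `euclidean` and the variables `k1` and `k2` are hypothetical names chosen for this illustration.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


k1 = (185, 72)  # initial centroid of cluster 1
k2 = (170, 56)  # initial centroid of cluster 2

print(euclidean(k1, k1))            # 0.0
print(round(euclidean(k1, k2), 2))  # 21.93
```

The same function works for more than two clustering variables, because it sums the squared differences over however many coordinates the points have.
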
Step 4: Move on to the next observation and calculate Euclidean Distance

Height  Weight
168     60

Euclidean Distance from Cluster 1           Euclidean Distance from Cluster 2          Assignment
$\sqrt{(168-185)^2 + (60-72)^2} = 20.808$   $\sqrt{(168-170)^2 + (60-56)^2} = 4.472$   Cluster 2
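
The assignment rule, picking the cluster with the smallest distance, might look like this in Python. The helper `nearest_cluster` is a name we made up for this sketch, and `math.dist` assumes Python 3.8 or later.

```python
import math


def nearest_cluster(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


centroids = [(185, 72), (170, 56)]  # centroids of clusters 1 and 2
index, distance = nearest_cluster((168, 60), centroids)

# The index is zero-based, so add 1 to get the cluster number used in this post.
print(index + 1, round(distance, 3))  # 2 4.472
```
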

Since the distance from cluster 2 is the minimum, the observation is assigned to cluster 2. Now
recompute the cluster centroid as the mean Height and mean Weight of the observations in the
cluster. Only cluster 2 gained an observation, so only the centroid of cluster 2 is updated.

Updated cluster centroids

Cluster  Height               Weight
K=1      185                  72
K=2      (170+168)/2 = 169    (56+60)/2 = 58

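
The centroid update is just a per-variable mean over the members of the cluster. A minimal sketch, with `centroid` and `cluster_2` as illustrative names of our own:

```python
def centroid(members):
    """Mean of each coordinate across the observations in one cluster."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


# Cluster 2 after the third observation (168, 60) has been assigned to it.
cluster_2 = [(170, 56), (168, 60)]
print(centroid(cluster_2))  # (169.0, 58.0)
```
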

Step 5: Calculate the Euclidean distance for the next observation, assign the observation based on
the minimum Euclidean distance and update the cluster centroids.

Next observation

Height  Weight
179     68

Euclidean Distance Calculation and Assignment

Euclidean Distance from Cluster 1   Euclidean Distance from Cluster 2   Assignment
7.211103                            14.14214                            Cluster 1

Update Cluster Centroid

Cluster  Height               Weight
K=1      (185+179)/2 = 182    (72+68)/2 = 70
K=2      169                  58

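Continue these steps until all observations are assigned.

Final assignments

Every remaining observation is closest to cluster 1, so cluster 1 ends with ten observations and
cluster 2 with two.

Cluster Centroids

Cluster  Height              Weight
K=1      1814/10 = 181.4     745/10 = 74.5
K=2      169                 58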


This is what was expected initially based on the two-dimensional plot.
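
Putting the steps together, the whole walk-through can be sketched as a single pass over the data. This is an illustration of the procedure described in this post, not a reference implementation: the function name `sequential_k_means` is our own, and `math.dist` assumes Python 3.8 or later. Standard K Means implementations usually repeat the assign-and-update cycle over the full dataset until the assignments stop changing, which this single-pass sketch does not do.

```python
import math


def sequential_k_means(data, k):
    """One pass over data: seed with the first k rows, then assign and update."""
    clusters = [[row] for row in data[:k]]
    centroids = [tuple(float(v) for v in row) for row in data[:k]]
    for row in data[k:]:
        # Assign the observation to the cluster with the closest centroid.
        i = min(range(k), key=lambda j: math.dist(row, centroids[j]))
        clusters[i].append(row)
        # Recompute only the centroid of the cluster that gained a member.
        n = len(clusters[i])
        centroids[i] = tuple(sum(c) / n for c in zip(*clusters[i]))
    return clusters, centroids


# Observations (height, weight) copied from the data sample in this post.
data = [
    (185, 72), (170, 56), (168, 60), (179, 68),
    (182, 72), (188, 77), (180, 71), (180, 70),
    (183, 84), (180, 88), (180, 67), (177, 76),
]

clusters, centroids = sequential_k_means(data, 2)
for number, (members, center) in enumerate(zip(clusters, centroids), start=1):
    print(f"K={number}: centroid={center}, size={len(members)}")
```
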

A few important considerations in K Means


The scale of measurement influences Euclidean distance, so variable standardisation becomes necessary
Depending on expectations, you may require outlier treatment
K Means clustering may be biased by the initial centroids, called cluster seeds
The maximum number of clusters is typically an input and may also affect the clusters that get created
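
To illustrate the first consideration, here is one possible way to standardise the two variables before clustering. The post does not say which standardisation to use, so the z-score (subtract the mean, divide by the standard deviation) is our assumption, as are the names `standardise`, `heights`, `weights` and `scaled`. `statistics.fmean` assumes Python 3.8 or later.

```python
import statistics


def standardise(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / spread for v in values]


# Columns of the data sample in this post.
heights = [185, 170, 168, 179, 182, 188, 180, 180, 183, 180, 180, 177]
weights = [72, 56, 60, 68, 72, 77, 71, 70, 84, 88, 67, 76]

# After scaling, both variables contribute to the distance on a comparable scale.
scaled = list(zip(standardise(heights), standardise(weights)))
print(scaled[0])
```
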
In the next blog, K Means Clustering using R, we focus on creating clusters using R.

4 thoughts on K Means Clustering Algorithm: Explained

Vishal Nigam
September 25, 2016 at 1:57 pm

Excellent Example. No better example found

DnI Institute
September 25, 2016 at 2:21 pm

Thanks Vishal

Nitesh
October 8, 2016 at 5:55 am

Very good..example..
but there is a text mistake in step 4.. euclidean distance from cluster 2

DnI Institute
October 8, 2016 at 6:13 am

Thanks Nitesh.. We have corrected the spelling.
