Cluster Analysis

Prepared by Mr. C Y Nimkar

Usage
• Usually used for grouping customers into clusters that have similar behaviour or attitudes
• Helps the marketer decide on the target audience. With this, the marketer can pursue:
– Product differentiation
– Offer differentiation

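Product differentiation
• Mobile handset: each model is aimed at its own target audience (Model 1 for target audience 1, Model 2 for target audience 2, and so on)

Offer differentiation
• Chocolate: a different offer for each target audience
– Enjoyment: children
– Celebration: youngsters
– Quick lunch: busy persons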

Steps in Cluster analysis

Step 1 – Collect data

• Collect data on a continuous scale (interval or ratio) on:
– customers' needs
– customers' opinions about products
• Usually a rating scale is used

Cluster Analysis for Housing

• The following attributes for selecting a housing location are considered:
– Nearness to school/college
– Nearness to market
– Locality
• Fifteen customers' responses are obtained on a 10-point rating scale, as shown below

Ratings out of 10 (customer space)
Cust No   Nearness to school/college   Nearness to market   Locality
1         6                            5                    6
2         2                            4                    7
3         7                            6                    6
4         3                            8                    5
5         5                            5                    6
6         9                            6                    5
7         4                            4                    7
8         5                            5                    4
9         2                            7                    6
10        4                            5                    5
11        1                            2                    5
12        2                            6                    6
13        7                            4                    8
14        8                            2                    6
15        6                            5                    5

Each customer is a point in space
[Figure: the 15 customers plotted as numbered points in the customer space]
Cluster analysis groups customers based on the distances between them

Step 2 – Apply hierarchical cluster analysis method

• In the hierarchical cluster method:
– Initially, each customer forms his own cluster
– Then, the two clusters that are closest form a new cluster
– This process continues until only one cluster remains

• In this technique, once two customers join a cluster they cannot be separated
• It works in a stepwise fashion to form clusters, hence it is called the ‘hierarchical’ method
• It is also called the agglomerative method, since clusters are formed by combining existing clusters
• We use this technique to find the number of clusters that can be formed
• This technique uses the distance method and the clustering method that we specify
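
The stepwise merging described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the routine of any statistics package: the function names (`squared_euclidean`, `between_groups`, `agglomerate`) are made up for this example, and the choice of squared Euclidean distance with between-groups linkage is an assumption made to mirror the output reported later in these slides. Where two candidate merges tie, the sketch simply takes the first pair it finds, so the order of merges may differ from a package's output.

```python
from itertools import combinations

# Ratings from the housing example:
# (nearness to school/college, nearness to market, locality)
RATINGS = {
    1: (6, 5, 6), 2: (2, 4, 7), 3: (7, 6, 6), 4: (3, 8, 5), 5: (5, 5, 6),
    6: (9, 6, 5), 7: (4, 4, 7), 8: (5, 5, 4), 9: (2, 7, 6), 10: (4, 5, 5),
    11: (1, 2, 5), 12: (2, 6, 6), 13: (7, 4, 8), 14: (8, 2, 6), 15: (6, 5, 5),
}

def squared_euclidean(a, b):
    """Distance method: sum of squared differences over the attributes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def between_groups(c1, c2):
    """Clustering method: average distance over all cross-cluster pairs."""
    total = sum(squared_euclidean(RATINGS[i], RATINGS[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(linkage):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(cust,) for cust in sorted(RATINGS)]  # each customer starts alone
    history = []
    while len(clusters) > 1:
        # Closest pair of clusters; ties go to the first pair found (arbitrary).
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((a, b, linkage(a, b)))
        # Once merged, the two clusters are never separated again.
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return history

for stage, (a, b, distance) in enumerate(agglomerate(between_groups), start=1):
    print(stage, a, b, round(distance, 3))
```

With 15 customers the loop always runs for 14 stages, one merge per stage.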

• The following distance methods are available:
– Squared Euclidean distance method
– Euclidean distance method
– City-block (Manhattan) distance method
– Chebychev distance method

Squared Euclidean/Euclidean distance
Ratings out of 10:
Customer No   Nearness to school/college   Nearness to market   Locality
1             6                            5                    6
2             2                            4                    7
• Squared Euclidean distance between customers 1 and 2
= (6-2)² + (5-4)² + (6-7)² = 16 + 1 + 1 = 18.00
• Euclidean distance between customers 1 and 2
= √[(6-2)² + (5-4)² + (6-7)²] = √18 = 4.2426
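
The two calculations above can be checked with a short Python sketch. The function names are illustrative and not taken from any particular package.

```python
import math

# Ratings from the slide: (nearness to school/college, nearness to market, locality)
customer_1 = (6, 5, 6)
customer_2 = (2, 4, 7)

def squared_euclidean(a, b):
    """Sum of squared differences across all attributes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

print(squared_euclidean(customer_1, customer_2))    # 18
print(round(euclidean(customer_1, customer_2), 4))  # 4.2426
```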

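City Block distance
Ratings out of 10:
Customer No   Nearness to school/college   Nearness to market   Locality
1             6                            5                    6
2             2                            4                    7
• It is the sum of the absolute (positive) differences = |6-2| + |5-4| + |6-7| = 4 + 1 + 1 = 6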

Chebychev distance
Ratings out of 10:
Customer No   Nearness to school/college   Nearness to market   Locality
1             6                            5                    6
2             2                            4                    7
• It is the maximum absolute (positive) difference = Maximum {4, 1, 1} = 4
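
The city-block and Chebychev distances for the same two customers can be sketched the same way. The function names are illustrative only.

```python
# Ratings from the slide: (nearness to school/college, nearness to market, locality)
customer_1 = (6, 5, 6)
customer_2 = (2, 4, 7)

def city_block(a, b):
    """City-block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    """Chebychev distance: largest absolute difference on any one attribute."""
    return max(abs(x - y) for x, y in zip(a, b))

print(city_block(customer_1, customer_2))  # 6
print(chebychev(customer_1, customer_2))   # 4
```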

Method to form clusters
• The following methods are available:
– Single linkage rule (nearest neighbour)
– Complete linkage rule (farthest neighbour)
– Between-groups linkage rule
– Within-groups linkage rule
– Centroid rule
– Ward’s method

Single linkage rule (nearest neighbours)
[Figure: two clusters of customers, 1 to 4 and 5 to 8]
• Distance between clusters is the minimum distance between a customer in one cluster and a customer in the other cluster

Complete linkage rule (farthest neighbours)
[Figure: two clusters of customers, 1 to 4 and 5 to 8]
• Distance between clusters is the maximum distance between a customer in one cluster and a customer in the other cluster

Between-groups linkage
[Figure: two clusters of customers, 1 to 4 and 5 to 8]
• Distance between clusters is the average distance over all pairs of customers, one from each cluster:
1-5, 1-6, 1-7, 1-8
2-5, 2-6, 2-7, 2-8
3-5, 3-6, 3-7, 3-8
4-5, 4-6, 4-7, 4-8
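
The three linkage rules above differ only in how they summarise the cross-cluster pair distances. A minimal sketch follows, assuming squared Euclidean distance and, purely for illustration, two clusters built from the housing ratings: customers 1, 5 and 15 in one and customers 9 and 12 in the other.

```python
# Housing ratings for the customers used in this illustration
RATINGS = {1: (6, 5, 6), 5: (5, 5, 6), 15: (6, 5, 5), 9: (2, 7, 6), 12: (2, 6, 6)}

# ASSUMPTION: these two clusters are chosen only to illustrate the rules
cluster_a = [1, 5, 15]
cluster_b = [9, 12]

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# One distance per pair made of a customer from each cluster (3 x 2 = 6 pairs)
cross = [squared_euclidean(RATINGS[i], RATINGS[j]) for i in cluster_a for j in cluster_b]

single_linkage = min(cross)                 # nearest neighbours
complete_linkage = max(cross)               # farthest neighbours
between_groups = sum(cross) / len(cross)    # average over all cross-cluster pairs

print(single_linkage, complete_linkage, between_groups)  # 10 21 16.5
```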

Within-groups linkage
• The within-groups method considers the distances between all pairs of customers after two clusters are combined.
– For example, suppose there are 3 clusters: I, II and III
[Figure: customers 1 to 8 plotted as points, with cluster labels I, II and III]

• After combining clusters I and II, form all pairs of customers in the combined cluster:
1-2, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8
2-3, 2-4, 2-5, 2-6, 2-7, 2-8
3-4, 3-5, 3-6, 3-7, 3-8
4-5, 4-6, 4-7, 4-8
5-6, 5-7, 5-8
6-7, 6-8
7-8
• Calculate the average distance of the pairs formed

• Do the same calculation after combining clusters II and III, and after combining clusters I and III
• Combine the pair of clusters for which the average distance is the least
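
A small sketch of the within-groups rule, assuming squared Euclidean distance. The three clusters below are hypothetical, built from the housing ratings only to have numbers to work with: I = customers 1, 5, 15; II = customers 9, 12; III = customers 8, 10.

```python
from itertools import combinations

RATINGS = {1: (6, 5, 6), 5: (5, 5, 6), 15: (6, 5, 5),
           9: (2, 7, 6), 12: (2, 6, 6), 8: (5, 5, 4), 10: (4, 5, 5)}

# ASSUMPTION: illustrative clusters, not taken from the slides
CLUSTERS = {"I": [1, 5, 15], "II": [9, 12], "III": [8, 10]}

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_groups(c1, c2):
    """Average distance over ALL pairs of customers in the combined cluster."""
    pairs = list(combinations(c1 + c2, 2))
    return sum(squared_euclidean(RATINGS[i], RATINGS[j]) for i, j in pairs) / len(pairs)

# Try every candidate merge and keep the one with the least average distance
candidates = {
    (a, b): within_groups(CLUSTERS[a], CLUSTERS[b])
    for a, b in combinations(CLUSTERS, 2)
}
best = min(candidates, key=candidates.get)
print(candidates)
print("combine:", best)
```

With these illustrative clusters, I and III are combined, because their merged cluster has the smallest average pair distance.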

Centroid rule
Suppose customers 1, 5 and 15 are in one cluster
Customer No.   Nearness to school/college   Nearness to market   Locality
1              6                            5                    6
5              5                            5                    6
15             6                            5                    5
Centroid       (6+5+6)/3 = 5.67             (5+5+5)/3 = 5.00     (6+6+5)/3 = 5.67
• The centroid of a cluster is a virtual customer with ratings of 5.67, 5.00 and 5.67 on nearness to school/college, nearness to market and locality respectively, i.e. the point (5.67, 5.00, 5.67)
• Distance between two clusters is the distance between their centroids
• Distance is calculated between the centroids of every pair of clusters, and the two clusters whose centroids are closest are combined

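
The centroid calculation can be sketched as below. The second cluster (customers 9 and 12) and the use of squared Euclidean distance between centroids are assumptions made only to illustrate the rule.

```python
RATINGS = {1: (6, 5, 6), 5: (5, 5, 6), 15: (6, 5, 5), 9: (2, 7, 6), 12: (2, 6, 6)}

def centroid(members):
    """Attribute-by-attribute mean rating of the customers in a cluster."""
    columns = zip(*(RATINGS[m] for m in members))
    return tuple(sum(col) / len(members) for col in columns)

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

centroid_1 = centroid([1, 5, 15])   # cluster from the slide
centroid_2 = centroid([9, 12])      # ASSUMPTION: second cluster chosen for illustration

print([round(v, 2) for v in centroid_1])                   # [5.67, 5.0, 5.67]
print([round(v, 2) for v in centroid_2])                   # [2.0, 6.5, 6.0]
print(round(squared_euclidean(centroid_1, centroid_2), 3)) # distance between the clusters
```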
Ward’s method
[Figure: cluster I contains customers 1, 5 and 15; cluster II contains customers 9 and 12]
• Suppose there are 2 clusters, I and II
• Cluster I comprises respondents 1, 5 and 15
• Cluster II comprises respondents 9 and 12
• The distance between each respondent and the cluster centroid is calculated by the squared Euclidean method, and these distances are added
• For example, for cluster I, which contains customers 1, 5 and 15:
Customer No.   Nearness to school/college   Nearness to market   Locality
1              6                            5                    6
5              5                            5                    6
15             6                            5                    5
Centroid       (6+5+6)/3 = 5.67             (5+5+5)/3 = 5.00     (6+6+5)/3 = 5.67
Sum of squared distances
= (6-5.67)² + (5-5.67)² + (6-5.67)² + (5-5)² + (5-5)² + (5-5)² + (6-5.67)² + (6-5.67)² + (5-5.67)²
= 1.3334

For cluster II,
Customer No.   Nearness to school/college   Nearness to market   Locality
9              2                            7                    6
12             2                            6                    6
Centroid       2                            6.5                  6
Sum of squared distances
= (2-2)² + (2-2)² + (7-6.5)² + (6-6.5)² + (6-6)² + (6-6)² = 0.50

• Both clusters are combined
• Calculate the new centroid
Customer No.   Nearness to school/college   Nearness to market   Locality
1              6                            5                    6
5              5                            5                    6
15             6                            5                    5
9              2                            7                    6
12             2                            6                    6
Centroid       4.2                          5.6                  5.8
• The distance of each respondent from the new centroid is calculated by the squared Euclidean method, and these distances are added
Sum of squared distances = 20.8
• The increase in distance after combining the two clusters is calculated
Before combining: cluster I = 1.3334, cluster II = 0.50
After combining: 20.8
Increase in distance after combining the two clusters = 20.8 - (1.3334 + 0.50) = 18.97

• This process is repeated for every pair of clusters
• The two clusters for which the increase in distance is the smallest are combined
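
The Ward's method arithmetic for clusters I and II can be reproduced with the sketch below. The helper names are illustrative. The increase is taken as the summed squared distance of the combined cluster minus the summed squared distances of both separate clusters; exact centroids are used, so cluster I gives 1.3333 rather than the 1.3334 obtained with the rounded centroid 5.67.

```python
RATINGS = {1: (6, 5, 6), 5: (5, 5, 6), 15: (6, 5, 5), 9: (2, 7, 6), 12: (2, 6, 6)}

def centroid(members):
    columns = zip(*(RATINGS[m] for m in members))
    return tuple(sum(col) / len(members) for col in columns)

def summed_squared_distance(members):
    """Sum of squared Euclidean distances from each member to the cluster centroid."""
    c = centroid(members)
    return sum(sum((x - y) ** 2 for x, y in zip(RATINGS[m], c)) for m in members)

cluster_i = [1, 5, 15]
cluster_ii = [9, 12]

before = summed_squared_distance(cluster_i) + summed_squared_distance(cluster_ii)
after = summed_squared_distance(cluster_i + cluster_ii)
increase = after - before

print(round(summed_squared_distance(cluster_i), 4))   # about 1.3333
print(round(summed_squared_distance(cluster_ii), 4))  # 0.5
print(round(after, 4))                                # 20.8
print(round(increase, 2))                             # 18.97
```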

Output of hierarchical method

Squared Euclidean distance method / between-groups linkage method
Agglomeration Schedule
        Cluster Combined                       Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2       Next Stage
1       1           15          1.000          0           0               3
2       9           12          1.000          0           0               7
3       1           5           1.500          1           0               5
4       8           10          2.000          0           0               8
5       1           3           3.333          3           0               8
6       2           7           4.000          0           0               10
7       4           9           4.500          0           2               13
8       1           8           5.250          5           4               11
9       13          14          9.000          0           0               12
10      2           11          13.000         6           0               13
11      1           6           14.667         8           0               12
12      1           13          15.643         11          9               14
13      2           4           18.333         10          7               14
14      1           2           26.574         12          13              0
The coefficient at stage 1 is the squared Euclidean distance between respondents 1 and 15 = 1.000

The coefficients increase suddenly at the last stage (from 18.333 at stage 13 to 26.574 at stage 14), indicating that 2 clusters are possible
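
One simple way to locate the sudden increase is to look for the largest gap between successive coefficients in the agglomeration schedule. The sketch below uses the coefficients reported above. Treating the largest single jump as the cut-off is an assumption of this example, one possible rule of thumb rather than a fixed procedure.

```python
# Coefficients from the agglomeration schedule, stages 1 to 14
coefficients = [1.000, 1.000, 1.500, 2.000, 3.333, 4.000, 4.500,
                5.250, 9.000, 13.000, 14.667, 15.643, 18.333, 26.574]

n_customers = len(coefficients) + 1  # 15 customers give 14 merge stages

# Jump from each stage to the next one
jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]

# Stage (1-based) after which the largest jump occurs
stage_before_jump = max(range(len(jumps)), key=jumps.__getitem__) + 1

# Stopping after that stage leaves this many clusters
clusters_suggested = n_customers - stage_before_jump

print(stage_before_jump, round(max(jumps), 3), clusters_suggested)  # 13 8.241 2
```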

Step 3 – Apply K-Means cluster analysis method

K-means cluster analysis method
• Initially, clusters are formed by assigning customers at random
• Centroids of the clusters are calculated
• Then the distance of each respondent from each centroid is calculated
• If a respondent is closer to the centroid of another cluster, he is transferred to that cluster
• This process continues until no further transfer is possible, which means every respondent is closer to the centroid of his own cluster than to any other centroid (a stable solution)
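
The transfer loop described above can be sketched as follows. This is a plain illustrative version, not the routine of any statistics package. As an assumption, it starts from the two initial cluster centres reported in these slides, (9, 6, 5) and (1, 2, 5), instead of a random assignment, and it does not handle a cluster that becomes empty.

```python
RATINGS = {
    1: (6, 5, 6), 2: (2, 4, 7), 3: (7, 6, 6), 4: (3, 8, 5), 5: (5, 5, 6),
    6: (9, 6, 5), 7: (4, 4, 7), 8: (5, 5, 4), 9: (2, 7, 6), 10: (4, 5, 5),
    11: (1, 2, 5), 12: (2, 6, 6), 13: (7, 4, 8), 14: (8, 2, 6), 15: (6, 5, 5),
}

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    columns = zip(*(RATINGS[m] for m in members))
    return tuple(sum(col) / len(members) for col in columns)

def kmeans(centres):
    """Reassign customers to the nearest centroid until the centroids stop moving."""
    while True:
        clusters = [[] for _ in centres]
        for customer, rating in RATINGS.items():
            # Index of the closest centroid (ties go to the lower-numbered cluster)
            nearest = min(range(len(centres)),
                          key=lambda k: squared_euclidean(rating, centres[k]))
            clusters[nearest].append(customer)
        new_centres = [centroid(members) for members in clusters]
        if new_centres == centres:   # no customer was transferred
            return clusters, centres
        centres = new_centres

clusters, centres = kmeans([(9, 6, 5), (1, 2, 5)])
print(clusters)
print([tuple(round(v) for v in c) for c in centres])  # [(7, 5, 6), (3, 5, 6)]
```

The sketch ends with clusters of 8 and 7 customers, in line with the cluster sizes and rounded final centres reported further on.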

Comparison between hierarchical and K-means methods
• In the hierarchical method, once two clusters join, they are never separated
• In the K-means method, customers that have been grouped together can be separated at a later stage
• Therefore, the K-means method tends to give more homogeneous clusters

Initial Cluster Centers
                             Cluster 1   Cluster 2
Nearness to school/college   9           1
Nearness to market           6           2
Locality                     5           5
This is the starting solution for 2 clusters. The two initial centres coincide with the ratings of customers 6 and 11.

Iteration History (a)
            Change in Cluster Centers
Iteration   1       2
1           2.992   3.219
2           .547    .502
3           .000    .000
a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 3. The minimum distance between initial centers is 8.944.
The iteration history shows that the solution converges at the 3rd iteration

ANOVA
                             Cluster              Error
                             Mean Square   df     Mean Square   df     F        Sig.
Nearness to school/college   61.344        1      1.661         13     36.938   .000
Nearness to market           .576          1      2.797         13     .206     .657
Locality                     .043          1      1.104         13     .039     .847
The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.
Final Cluster Centers
                             Cluster 1   Cluster 2
Nearness to school/college   7           3
Nearness to market           5           5
Locality                     6           6
• The higher the F ratio, the greater the difference in consumer ratings between the clusters
• Such a variable is a segmentation variable (here, nearness to school/college)
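
The F ratios in the ANOVA table can be recomputed from the ratings with the sketch below. The cluster membership used here is an assumption: the slides do not list which customers fall in each cluster, so the membership is the one obtained by running a plain k-means from the reported initial centres, which agrees with the reported cluster sizes (8 and 7) and rounded final centres.

```python
RATINGS = {
    1: (6, 5, 6), 2: (2, 4, 7), 3: (7, 6, 6), 4: (3, 8, 5), 5: (5, 5, 6),
    6: (9, 6, 5), 7: (4, 4, 7), 8: (5, 5, 4), 9: (2, 7, 6), 10: (4, 5, 5),
    11: (1, 2, 5), 12: (2, 6, 6), 13: (7, 4, 8), 14: (8, 2, 6), 15: (6, 5, 5),
}

# ASSUMPTION: membership is not given in the slides (see the note above)
CLUSTERS = {1: [1, 3, 5, 6, 8, 13, 14, 15], 2: [2, 4, 7, 9, 10, 11, 12]}

def f_ratio(attr):
    """Cluster mean square divided by error mean square for one attribute."""
    groups = [[RATINGS[c][attr] for c in members] for members in CLUSTERS.values()]
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    cluster_ms = between / (len(groups) - 1)   # df = 1 for two clusters
    error_ms = within / (n - len(groups))      # df = 13 for 15 customers
    return cluster_ms / error_ms

names = ["Nearness to school/college", "Nearness to market", "Locality"]
for attr, name in enumerate(names):
    # Compare with the F column of the ANOVA table: 36.938, .206, .039
    print(name, round(f_ratio(attr), 3))
```

Only nearness to school/college shows a large F ratio, which is why it stands out as the segmentation variable.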

Number of Cases in each Cluster
Cluster 1   8.000
Cluster 2   7.000
Valid       15.000
Missing     .000
This table shows the cluster sizes
