
Cluster Analysis

Prepared by Mr. C Y Nimkar 1


Steps in Marketing

Segmentation

Targeting

Positioning



Segmentation

CUSTOMER NEEDS

Question: ‘Do you want AC in car?’

• Answered Yes/No → categorical data → chi-square technique
  (tests of independence and homogeneity)
• Answered on a 5-point rating scale → continuous data → cluster analysis technique
Categorical data – Chi-square technique of independence
Can we use age to segment customers into ‘want AC’ and ‘do not want AC’
categories?

• Cross-tabulate age and whether AC is needed
• Use χ2 test for independence
• H0: Age and need of AC are independent
• H1: Age and need of AC are dependent
• Calculate measures of association
• If the two are independent, age is not suitable for segmentation
Categorical data – Chi-square technique of homogeneity

Is there any age-group that is large among customers who want AC in car?

• Select customers who said ‘Yes’ to AC in car
• Prepare a frequency table according to their age category
• Use χ2 test for homogeneity (goodness of fit)
• H0: Frequency distribution based on age category is homogeneous
• H1: Frequency distribution based on age category is not homogeneous
• If the frequency distribution is homogeneous, age is not suitable for segmentation
• If the ‘Asymp. Sig.’ value >= 0.05, the frequency distribution based on age is homogeneous, so age is not suitable for segmentation
• Segmentation helps to decide target
audience for:
– Product differentiation strategy…
– Offer differentiation strategy…



Product differentiation

Mobile handset:
• Model 1 → Target audience 1
• Model 2 → Target audience 2
• Model 3 → Target audience 3
• Model 4 → Target audience 4



Offer differentiation

Chocolate:
• Enjoyment → Children
• Celebration → Youngsters
• Quick lunch → Busy persons


• In both approaches, segmentation is
essential



Example:
Cluster Analysis for Retail Mall



Que.: (SHOW CARD A) This card shows a few
requirements that a customer like you may
be looking for in a retail mall. Please indicate
your need on a scale of 1 to 10, where 1 means
‘not at all needed’ and 10 means ‘very much
needed’.

Need      Rating
Variety   ____
Price     ____


Five customers gave the following ratings:

Cust No   Variety   Price
1         8         3
2         4         9
3         7         6
4         2         8
5         3         7



Customers’ location in 2-dimensional space (customer
space) is as under:



• Each customer is a point in customer space
• Cluster analysis groups customers on ‘distances among
customers’
Distance Methods



• Following distance methods are available:
– Squared Euclidean distance method
– Euclidean distance method
– City-block (Manhattan) distance method
– Chebychev distance method



Squared Euclidean Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

Squared Euclidean distance between customers 1 and 2
= (8-4)² + (3-9)² = 16 + 36 = 52 units



Euclidean Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

Euclidean distance between customers 1 and 2
= √[(8-4)² + (3-9)²] = √52 = 7.21 units



City Block Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

City Block distance between customers 1 and 2
= |8-4| + |3-9| = 4 + 6 = 10 units



Chebychev Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

Chebychev distance between customers 1 and 2
= Max {|8-4|, |3-9|} = Max {4, 6} = 6 units
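The four worked examples above, for customers 1 (8, 3) and 2 (4, 9), can be reproduced with SciPy's distance functions:

```python
# All four distance measures for customers 1 and 2 from the slides.
from scipy.spatial import distance

c1, c2 = (8, 3), (4, 9)

sq_euclidean = distance.sqeuclidean(c1, c2)   # (8-4)^2 + (3-9)^2
euclidean    = distance.euclidean(c1, c2)     # square root of the above
city_block   = distance.cityblock(c1, c2)     # |8-4| + |3-9|
chebychev    = distance.chebyshev(c1, c2)     # max(|8-4|, |3-9|)

print(sq_euclidean, round(euclidean, 2), city_block, chebychev)
# -> 52.0 7.21 10 6
```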



• Squared Euclidean method magnifies distances compared to other methods
• In doing so, it gives higher weightage to larger differences
• It is the preferred method

Ratings on:
                    Var 1   Var 2
Customer 1          2       5
Customer 2          5       9
Absolute distance   3       4     (ratio 1 : 1.33)
Squared distance    9       16    (ratio 1 : 1.78)
• Initially, each customer forms his own cluster. This is called the 0th stage of clustering



• Then distance is calculated between two clusters
by any of the distance methods
• Two clusters that are closest join together and
form a new cluster
• This process continues till all customers are in one
cluster
• This method is called ‘Hierarchical Clustering
Method’



Pairs of     Squared Euclidean
customers    Distance
(1, 2)       52
(1, 3)       10
(1, 4)       61
(1, 5)       41
(2, 3)       18
(2, 4)       5
(2, 5)       5
(3, 4)       29
(3, 5)       17
(4, 5)       2
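This table can be recomputed from the five customers' (Variety, Price) ratings given earlier; a small sketch in Python:

```python
# Pairwise squared Euclidean distances for the five retail-mall customers.
from itertools import combinations

ratings = {1: (8, 3), 2: (4, 9), 3: (7, 6), 4: (2, 8), 5: (3, 7)}

dist = {}
for i, j in combinations(sorted(ratings), 2):
    (a1, b1), (a2, b2) = ratings[i], ratings[j]
    dist[(i, j)] = (a1 - a2) ** 2 + (b1 - b2) ** 2
    print(f"({i}, {j}): {dist[(i, j)]}")

# The closest pair is the first to merge in hierarchical clustering
print(min(dist, key=dist.get))   # -> (4, 5)
```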



• Customers 4 and 5 are closest
• They join together and form a new cluster
• Now, there are 4 clusters: {(1), (2), (3), (4,5)}


• In the next stage, distance is calculated between the four clusters
• How is the distance between clusters (3) and (4,5) calculated?



Clustering Methods



• Following methods are available:
– Single linkage rule (nearest neighbour)
– Complete linkage rule (farthest neighbour)
– Between-groups linkage rule
– Within-groups linkage rule
– Centroid rule
– Ward’s method



Single linkage rule (nearest neighbour)

• It is the minimum distance between customers in the two clusters
• For e.g. distance between clusters {(3), (4,5)}
  = Minimum of {d(3,4), d(3,5)} = Minimum of {29, 17} = 17 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17


Complete linkage rule (farthest neighbour)

• It is the maximum distance between customers in the two clusters
• For e.g. distance between clusters {(3), (4,5)}
  = Maximum of {d(3,4), d(3,5)} = Maximum of {29, 17} = 29 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17



Between-groups linkage

• It is the average distance between all pairs of customers across the two clusters
• For e.g. distance between clusters {(3), (4,5)}
  = (29 + 17)/2 = 23 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17



Within-groups linkage

• First combine the two clusters
• It is the average distance between all pairs of customers in the combined cluster
• For e.g. distance between clusters {(3), (4,5)}
  = (29 + 17 + 2)/3 = 16 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17
(4, 5)       2
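The four linkage rules applied to clusters {(3)} and {(4, 5)} reduce to a few lines; the pair distances are the squared Euclidean values from the table:

```python
# Cluster-to-cluster distance between {3} and {(4, 5)} under each linkage rule.
d34, d35, d45 = 29, 17, 2   # squared Euclidean pair distances from the table

single   = min(d34, d35)          # nearest neighbour
complete = max(d34, d35)          # farthest neighbour
between  = (d34 + d35) / 2        # average over pairs across the two clusters
within   = (d34 + d35 + d45) / 3  # average over all pairs in the merged cluster

print(single, complete, between, within)   # -> 17 29 23.0 16.0
```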
Centroid rule

The centroid of a cluster is a virtual customer with average ratings. For e.g. the centroid of (4,5):

Customer No.   Variety         Price
4              2               8
5              3               7
Centroid       (2+3)/2 = 2.5   (8+7)/2 = 7.5

Distance between clusters is the distance between their centroids
Ward’s method

• Distance between each respondent and the cluster centroid is calculated by the squared Euclidean method and added. For e.g.:
• Distance between cluster (3) and its centroid = 0
• Distance between customer (4) and the centroid + distance between customer (5) and the centroid
  = {(2-2.5)² + (8-7.5)²} + {(3-2.5)² + (7-7.5)²} = 1.0

Customer No.   Variety         Price
4              2               8
5              3               7
Centroid       (2+3)/2 = 2.5   (8+7)/2 = 7.5



• Combine the two clusters (3) and (4,5)
• Find the value of the new centroid

Customer No.   Variety   Price
3              7         6
4              2         8
5              3         7
Centroid       4         7

• Distance between each respondent and the new cluster centroid is calculated by the squared Euclidean method and added:
  = {(7-4)² + (6-7)²} + {(2-4)² + (8-7)²} + {(3-4)² + (7-7)²} = 16.0



Before combining: 0 + 1.0        After combining: 16.0

• Total distance from centroids before combining clusters (3) and (4,5) = 0 + 1.0 = 1.0
• Distance from the new centroid after combining the two clusters = 16.0
• Increase in distance after combination = 16.0 – 1.0 = 15.0
  = Ward’s distance between the two clusters (3) and (4,5)
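Ward's merge cost is the increase in the total within-cluster sum of squared distances to the centroid; note that the 1.0 already inside cluster (4,5) before the merge is part of the baseline. A quick check on clusters (3) and (4,5):

```python
# Ward's criterion for merging clusters {3} and {(4, 5)}: within-cluster
# sum of squared Euclidean distances to the centroid, before vs. after.
def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points)

c3, c4, c5 = (7, 6), (2, 8), (3, 7)

before = sse([c3]) + sse([c4, c5])    # 0 + 1.0
after  = sse([c3, c4, c5])            # 16.0
print(before, after, after - before)  # -> 1.0 16.0 15.0
```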
Example:
Cluster Analysis for Housing



• Following attributes for selecting location for
housing are considered:
– Nearness to school/college
– Nearness to market
– Locality (Surrounding)
• Fifteen customers’ responses are obtained on a 10-point rating scale as shown in next slide



Cust No   Ratings out of 10 on
          Nearness to       Nearness to
          school/college    market        Locality
1         6                 5             6
2         2                 4             7
3         7                 6             6
4         3                 8             5
5         5                 5             6
6         9                 6             5
7         4                 4             7
8         5                 5             4
9         2                 7             6
10        4                 5             5
11        1                 2             5
12        2                 6             6
13        7                 4             8
14        8                 2             6
15        6                 5             5
• Open data file
• Analyze → Classify → Hierarchical Cluster…



• Bring variables for segmentation into the ‘Variable(s)’ box
• Select ‘Statistics…’



Ensure ‘Agglomeration schedule’ is selected



Select ‘Plots…’



Click ‘Dendrogram’



Select ‘Method…’



Select desired clustering method, usually ‘Ward’s method’



Select desired distance measure, usually ‘Squared Euclidean distance’



Click ‘OK’



Output of hierarchical method



Squared Euclidean method/Ward’s Method

Customer 1 and customer 15 formed the first cluster



Distance between customer 1 and its centroid = 0
Distance between customer 15 and its centroid = 0

Cust No.   Sch_Coll   Market   Locality
1          6          5        6
15         6          5        5
Centroid   6          5        5.5

Distance between customer 1 and the new centroid + distance between customer 15 and the new centroid
= {(6-6)² + (5-5)² + (6-5.5)²} + {(6-6)² + (5-5)² + (5-5.5)²} = 0.5
= Ward’s distance between customer 1 and customer 15
Squared Euclidean method/Ward’s Method

Ward’s distance between customer 1 and customer 15



Icicle Plot



Dendrogram is graphical representation of Agglomeration Schedule



Sudden increase is seen in coefficients from 75.722 to 134.267
Stage 13: Cluster 1 (customers within distance <= 75.722) and Cluster 2 (customers within distance <= 75.722) remain separate

Stage 14: Cluster 1 and Cluster 2 join at distance 134.267

• Should we join cluster 1 and cluster 2? Answer is ‘No’
• Therefore, 2 clusters can be formed
Euclidean / Between-groups linkage — Agglomeration Schedule

                                            Stage Cluster First Appears
Stage  Cluster 1  Cluster 2  Coefficients   Cluster 1  Cluster 2   Next Stage
1      1          15         1.000          0          0           3
2      9          12         1.000          0          0           7
3      1          5          1.207          1          0           5
4      8          10         1.414          0          0           8
5      1          3          1.794          3          0           8
6      2          7          2.000          0          0           10
7      4          9          2.091          0          2           11
8      1          8          2.202          5          4           10
9      13         14         3.000          0          0           12
10     1          2          3.454          8          6           11
11     1          4          3.864          10         7           13
12     6          13         4.183          0          9           13
13     1          6          4.928          11         12          14
14     1          11         5.669          13         0           0

Euclidean / Within-groups linkage — Agglomeration Schedule

                                            Stage Cluster First Appears
Stage  Cluster 1  Cluster 2  Coefficients   Cluster 1  Cluster 2   Next Stage
1      1          15         1.000          0          0           3
2      9          12         1.000          0          0           6
3      1          5          1.138          1          0           5
4      8          10         1.414          0          0           7
5      1          3          1.466          3          0           7
6      4          9          1.727          0          2           11
7      1          8          1.855          5          4           9
8      2          7          2.000          0          0           11
9      1          13         2.303          7          0           10
10     1          6          2.671          9          0           12
11     2          4          2.849          8          6           13
12     1          14         2.987          10         0           14
13     2          11         3.425          11         0           14
14     1          2          4.024          12         13          0

Not a good solution



Hierarchical Clustering Method

• In this technique, once two customers join a cluster they cannot be separated
• It works in a stepwise fashion to form clusters, hence called the ‘hierarchical’ method
• It is also called an agglomerative method since clusters are formed by combining existing clusters
• We use this technique to find the number of clusters that can be formed. This number will be denoted by ‘K’
• In our example, K = 2

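The SPSS hierarchical run can be sketched with SciPy's Ward linkage on the 15 housing customers. One caveat: SciPy scales Ward coefficients differently from SPSS, so the tree shape, not the raw coefficient values, is what should match:

```python
# Ward hierarchical clustering on the 15 housing customers, cut at K = 2.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([
    [6, 5, 6], [2, 4, 7], [7, 6, 6], [3, 8, 5], [5, 5, 6],
    [9, 6, 5], [4, 4, 7], [5, 5, 4], [2, 7, 6], [4, 5, 5],
    [1, 2, 5], [2, 6, 6], [7, 4, 8], [8, 2, 6], [6, 5, 5],
])

# One merge per row, analogous to the agglomeration schedule
Z = linkage(X, method="ward")

# Cut the tree at K = 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Customers 1 and 15 should land in one cluster and customers 2 and 11 in the other, consistent with the dendrogram above.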


Apply K-Means cluster analysis method



• Open data file
• Analyze → Classify → K-Means Cluster…



• Bring variables for segmentation into the ‘Variable(s)’ box
• Specify the number of clusters in the ‘Number of Clusters’ box
• Select ‘Save…’



Click ‘Cluster membership’



Select ‘Options…’



Click ‘ANOVA table’



Select ‘OK’



• Initially customers are grouped into clusters at random

[Scatter plot: the 15 customers plotted in customer space with their initial random assignment to Cluster 1 and Cluster 2]


• Centroids of clusters are calculated
Initial Cluster Centers

                             Cluster 1   Cluster 2
Nearness to school/college   9           1
Nearness to market           6           2
Locality                     5           5

[Scatter plot: the 15 customers with the two initial cluster centers marked]


• Distance of each customer from every centroid is calculated
• If a customer is closer to the centroid of the other cluster, his categorisation is changed to that cluster. For e.g.:
• Customer 12’s categorisation will be changed from cluster 2 to cluster 1

[Scatter plot: Cluster 1 and Cluster 2 centroids marked; customer 12 lies closer to the Cluster 1 centroid]



• This process continues till no categorisation needs further change. It means every respondent is closer to the centroid of the cluster to which he belongs (optimal solution)
• The results of this process are shown in next slide



Iteration History(a)

            Change in Cluster Centers
Iteration   1        2
1           2.992    3.219
2           .547     .502
3           .000     .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 3. The minimum distance between initial centers is 8.944.

• Due to recategorisation of one customer, centroid of 1st cluster


is displaced by 2.992 units and of 2nd cluster by 3.219 units
• Optimal solution is achieved at 3rd iteration since there is no
displacement
Iteration History(a)

            Change in Cluster Centers
Iteration   1        2
1           2.992    3.219
2           .547     .502
3           .000     .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 3. The minimum distance between initial centers is 8.944.

• If the last iteration row has at least one non-zero displacement, we need to increase the number of iterations
• It is done as shown in next slide



Select ‘Iterate…’



• By default, number of maximum iterations = 10
• Increase number of iterations



Optimal Solution

ANOVA

                             Cluster                 Error
                             Mean Square   df        Mean Square   df     F        Sig.
Nearness to school/college   61.344        1         1.661         13     36.938   .000
Nearness to market           .576          1         2.797         13     .206     .657
Locality                     .043          1         1.104         13     .039     .847

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Final Cluster Centers

                             Cluster 1   Cluster 2
Nearness to school/college   7           3
Nearness to market           5           5
Locality                     6           6

• The higher the F ratio, the more the difference in consumer ratings
• Such a variable is a segmentation variable

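The F ratio for ‘Nearness to school/college’ can be re-derived with a one-way ANOVA. The cluster memberships below are the ones implied by the final cluster centers, an assumption, since the slides do not list the membership explicitly:

```python
# Re-deriving the ANOVA F ratio for 'Nearness to school/college'
# from the assumed final 2-cluster membership.
from scipy.stats import f_oneway

cluster1 = [6, 7, 5, 9, 5, 7, 8, 6]   # customers 1, 3, 5, 6, 8, 13, 14, 15
cluster2 = [2, 3, 4, 2, 4, 1, 2]      # customers 2, 4, 7, 9, 10, 11, 12

F, p = f_oneway(cluster1, cluster2)
print(round(F, 3), round(p, 5))       # F matches the 36.938 in the table
```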


Number of Cases in each Cluster
Cluster 1 8.000
2 7.000
Valid 15.000
Missing .000

This table shows cluster sizes



Comparison between hierarchical and K-means methods

• In hierarchical method, once two clusters join, they are never separated
• In K-means method, customers that have joined a cluster can be reassigned at a later stage
• Therefore, K-means method gives more homogeneous clusters
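The K-means loop described above (assign each customer to the nearest centroid, recompute centroids, repeat) can be sketched directly. Starting from the SPSS initial centers (customers 6 and 11), it converges in three iterations and reproduces the Iteration History and Final Cluster Centers:

```python
# Minimal K-means (Lloyd's algorithm) on the housing data with K = 2.
# Sketch only: assumes both clusters stay non-empty throughout.
import numpy as np

X = np.array([
    [6, 5, 6], [2, 4, 7], [7, 6, 6], [3, 8, 5], [5, 5, 6],
    [9, 6, 5], [4, 4, 7], [5, 5, 4], [2, 7, 6], [4, 5, 5],
    [1, 2, 5], [2, 6, 6], [7, 4, 8], [8, 2, 6], [6, 5, 5],
], dtype=float)

# Initial centers as in the SPSS output: customers 6 and 11
centers = np.array([[9, 6, 5], [1, 2, 5]], dtype=float)

for it in range(1, 11):                 # SPSS default maximum: 10 iterations
    # squared Euclidean distance of every customer to every center
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)           # nearest-center assignment
    new_centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    shift = np.linalg.norm(new_centers - centers, axis=1)
    print(it, np.round(shift, 3))       # mirrors the Iteration History column
    if np.allclose(new_centers, centers):
        break                           # no center moved: optimal solution
    centers = new_centers

print(np.bincount(labels))              # cluster sizes -> [8 7]
print(np.round(centers))                # compare with Final Cluster Centers
```

The printed shifts (2.992/3.219, then .547/.502, then zero) match the Iteration History table, and the rounded centers match the Final Cluster Centers of (7, 5, 6) and (3, 5, 6).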



How many clusters?

• Are clusters distinct from a marketing point of view?
• Do clusters vary in demographics & psychographics?
• Use discriminant analysis by taking clusters as groups

