
Cluster Analysis

Prepared by Mr. C Y Nimkar 1


Steps in Marketing

Segmentation

Targeting

Positioning



Segmentation

CUSTOMER NEEDS

Question: ‘Do you want AC in car?’

• Answered Yes/No → categorical data → chi-square technique
  (tests of independence and homogeneity)
• Answered on a 5-point rating scale → continuous data → cluster analysis technique
Categorical data – Chi-square technique of independence
Can we use age to segment customers into ‘want AC’ and ‘do not want AC’
categories?

• Cross-tabulate age and whether AC is needed
• Use χ2 test for independence
• H0: Age and need of AC are independent
• H1: Age and need of AC are dependent
• Calculate measures of association
• If the two are independent, age is not suitable for segmentation
Categorical data – Chi-square technique of homogeneity

Is there any age-group that is large among customers who want AC in car?

• Select customers who said ‘Yes’ to AC in car
• Prepare a frequency table according to their age category
• Use χ2 test for homogeneity (goodness of fit)
• H0: Frequency distribution based on age category is homogeneous
• H1: Frequency distribution based on age category is not homogeneous
• If the frequency distribution is homogeneous, age is not suitable for segmentation
• If the ‘Asymp. Sig.’ value >= 0.05, the frequency distribution based on age is homogeneous, so age is not suitable for segmentation
• Segmentation helps to decide target
audience for:
– Product differentiation strategy…
– Offer differentiation strategy…



Product differentiation

Mobile handset:
• Model 1 → Target audience 1
• Model 2 → Target audience 2
• Model 3 → Target audience 3
• Model 4 → Target audience 4



Offer differentiation

Chocolate:
• Enjoyment → Children
• Celebration → Youngsters
• Quick lunch → Busy persons


• In both approaches, segmentation is
essential



Example:
Cluster Analysis for Retail Mall



Que.: (SHOW CARD A) This card shows a few
requirements that a customer like you may
be looking for in a retail mall. Please indicate
your need on a scale of 1 to 10, where 1 means
‘not at all needed’ and 10 means ‘very much
needed’.

Need      Rating
Variety   ____
Price     ____


Five customers gave the following ratings:

Cust No   Variety   Price
1         8         3
2         4         9
3         7         6
4         2         8
5         3         7



Customers’ location in 2-dimensional space (customer
space) is as under:



• Each customer is a point in customer space
• Cluster analysis groups customers on ‘distances among
customers’
Distance Methods



• Following distance methods are available:
– Squared Euclidean distance method
– Euclidean distance method
– City-block (Manhattan) distance method
– Chebychev distance method



Squared Euclidean Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

Squared Euclidean distance between customers 1 and 2
= (8-4)² + (3-9)² = 16 + 36 = 52 units



Euclidean Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

Euclidean distance between customers 1 and 2
= √[(8-4)² + (3-9)²] = √52 = 7.21 units



City Block Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

City Block distance between customers 1 and 2
= |8-4| + |3-9| = 4 + 6 = 10 units



Chebychev Distance

Sr_No   Variety (A)   Price (B)
1       8             3
2       4             9

Chebychev distance between customers 1 and 2
= Max {|8-4|, |3-9|} = Max {4, 6} = 6 units
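The four worked examples above, for customers 1 (8, 3) and 2 (4, 9), can be reproduced with SciPy's distance functions:

```python
# All four distance measures for customers 1 and 2 from the slides.
from scipy.spatial import distance

c1, c2 = (8, 3), (4, 9)

sq_euclidean = distance.sqeuclidean(c1, c2)   # (8-4)^2 + (3-9)^2
euclidean    = distance.euclidean(c1, c2)     # square root of the above
city_block   = distance.cityblock(c1, c2)     # |8-4| + |3-9|
chebychev    = distance.chebyshev(c1, c2)     # max(|8-4|, |3-9|)

print(sq_euclidean, round(euclidean, 2), city_block, chebychev)
# -> 52.0 7.21 10 6
```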



• Squared Euclidean method magnifies distances compared to other methods
• In doing so, it gives higher weightage to larger differences
• It is the preferred method

Ratings on:
                    Var 1   Var 2
Customer 1          2       5
Customer 2          5       9
Absolute distance   3       4     (ratio 1 : 1.33)
Squared distance    9       16    (ratio 1 : 1.78)
• Initially, each customer forms his own cluster. This is called the 0th stage of clustering



• Then distance is calculated between two clusters
by any of the distance methods
• Two clusters that are closest join together and
form a new cluster
• This process continues till all customers are in one
cluster
• This method is called ‘Hierarchical Clustering
Method’



Pairs of     Squared Euclidean
customers    Distance
(1, 2)       52
(1, 3)       10
(1, 4)       61
(1, 5)       41
(2, 3)       18
(2, 4)       5
(2, 5)       5
(3, 4)       29
(3, 5)       17
(4, 5)       2
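This table can be recomputed from the five customers' (Variety, Price) ratings given earlier; a small sketch in Python:

```python
# Pairwise squared Euclidean distances for the five retail-mall customers.
from itertools import combinations

ratings = {1: (8, 3), 2: (4, 9), 3: (7, 6), 4: (2, 8), 5: (3, 7)}

dist = {}
for i, j in combinations(sorted(ratings), 2):
    (a1, b1), (a2, b2) = ratings[i], ratings[j]
    dist[(i, j)] = (a1 - a2) ** 2 + (b1 - b2) ** 2
    print(f"({i}, {j}): {dist[(i, j)]}")

# The closest pair is the first to merge in hierarchical clustering
print(min(dist, key=dist.get))   # -> (4, 5)
```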



• Customers 4 and 5 are closest
• They join together and form a new cluster
• Now, there are 4 clusters: {(1), (2), (3), (4,5)}


• In the next stage, distance is calculated between the four clusters
• How is the distance between clusters (3) and (4,5) calculated?



Clustering Methods



• Following methods are available:
– Single linkage rule (nearest neighbour)
– Complete linkage rule (farthest neighbour)
– Between-groups linkage rule
– Within-groups linkage rule
– Centroid rule
– Ward’s method



Single linkage rule (nearest neighbour)

• It is the minimum distance between customers in the two clusters
• For e.g. distance between clusters {(3), (4,5)}
  = Minimum of {d(3,4), d(3,5)} = Minimum of {29, 17} = 17 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17


Complete linkage rule (farthest neighbour)

• It is the maximum distance between customers in the two clusters
• For e.g. distance between clusters {(3), (4,5)}
  = Maximum of {d(3,4), d(3,5)} = Maximum of {29, 17} = 29 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17



Between-groups linkage

• It is the average distance between all pairs of customers across the two clusters
• For e.g. distance between clusters {(3), (4,5)}
  = (29 + 17)/2 = 23 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17



Within-groups linkage

• First combine the two clusters
• It is the average distance between all pairs of customers in the combined cluster
• For e.g. distance between clusters {(3), (4,5)}
  = (29 + 17 + 2)/3 = 16 units

Pairs of     Squared Euclidean
customers    Distance
(3, 4)       29
(3, 5)       17
(4, 5)       2
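The four linkage rules applied to clusters {(3)} and {(4, 5)} reduce to a few lines; the pair distances are the squared Euclidean values from the table:

```python
# Cluster-to-cluster distance between {3} and {(4, 5)} under each linkage rule.
d34, d35, d45 = 29, 17, 2   # squared Euclidean pair distances from the table

single   = min(d34, d35)          # nearest neighbour
complete = max(d34, d35)          # farthest neighbour
between  = (d34 + d35) / 2        # average over pairs across the two clusters
within   = (d34 + d35 + d45) / 3  # average over all pairs in the merged cluster

print(single, complete, between, within)   # -> 17 29 23.0 16.0
```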
Centroid rule

The centroid of a cluster is a virtual customer with average ratings. For e.g. the centroid of (4,5):

Customer No.   Variety         Price
4              2               8
5              3               7
Centroid       (2+3)/2 = 2.5   (8+7)/2 = 7.5

Distance between clusters is the distance between their centroids
Ward’s method

• Distance between each respondent and the cluster centroid is calculated by the squared Euclidean method and added. For e.g.:
• Distance between cluster (3) and its centroid = 0
• Distance between customer (4) and the centroid + distance between customer (5) and the centroid
  = {(2-2.5)² + (8-7.5)²} + {(3-2.5)² + (7-7.5)²} = 1.0

Customer No.   Variety         Price
4              2               8
5              3               7
Centroid       (2+3)/2 = 2.5   (8+7)/2 = 7.5



• Combine the two clusters (3) and (4,5)
• Find the value of the new centroid

Customer No.   Variety   Price
3              7         6
4              2         8
5              3         7
Centroid       4         7

• Distance between each respondent and the new cluster centroid is calculated by the squared Euclidean method and added:
  = {(7-4)² + (6-7)²} + {(2-4)² + (8-7)²} + {(3-4)² + (7-7)²} = 16.0



Before combining: 0 + 1.0        After combining: 16.0

• Total distance from centroids before combining clusters (3) and (4,5) = 0 + 1.0 = 1.0
• Distance from the new centroid after combining the two clusters = 16.0
• Increase in distance after combination = 16.0 – 1.0 = 15.0
  = Ward’s distance between the two clusters (3) and (4,5)
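Ward's merge cost is the increase in the total within-cluster sum of squared distances to the centroid; note that the 1.0 already inside cluster (4,5) before the merge is part of the baseline. A quick check on clusters (3) and (4,5):

```python
# Ward's criterion for merging clusters {3} and {(4, 5)}: within-cluster
# sum of squared Euclidean distances to the centroid, before vs. after.
def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points)

c3, c4, c5 = (7, 6), (2, 8), (3, 7)

before = sse([c3]) + sse([c4, c5])    # 0 + 1.0
after  = sse([c3, c4, c5])            # 16.0
print(before, after, after - before)  # -> 1.0 16.0 15.0
```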
Example:
Cluster Analysis for Housing



• Following attributes for selecting location for
housing are considered:
– Nearness to school/college
– Nearness to market
– Locality (Surrounding)
• Fifteen customers’ responses are obtained on a 10-point rating scale as shown in next slide



Cust No   Ratings out of 10 on
          Nearness to       Nearness to
          school/college    market        Locality
1         6                 5             6
2         2                 4             7
3         7                 6             6
4         3                 8             5
5         5                 5             6
6         9                 6             5
7         4                 4             7
8         5                 5             4
9         2                 7             6
10        4                 5             5
11        1                 2             5
12        2                 6             6
13        7                 4             8
14        8                 2             6
15        6                 5             5
• Open data file
• Analyze → Classify → Hierarchical Cluster…



• Bring variables for segmentation into the ‘Variable(s)’ box
• Select ‘Statistics…’



Ensure ‘Agglomeration schedule’ is selected



Select ‘Plots…’



Click ‘Dendrogram’



Select ‘Method…’



Select desired clustering method, usually ‘Ward’s method’



Select desired distance measure, usually ‘Squared Euclidean distance’



Click ‘OK’



Output of hierarchical method



Squared Euclidean method/Ward’s Method

Customer 1 and customer 15 formed the first cluster



Distance between customer 1 and its centroid = 0
Distance between customer 15 and its centroid = 0

Cust No.   Sch_Coll   Market   Locality
1          6          5        6
15         6          5        5
Centroid   6          5        5.5

Distance between customer 1 and the new centroid + distance between customer 15 and the new centroid
= {(6-6)² + (5-5)² + (6-5.5)²} + {(6-6)² + (5-5)² + (5-5.5)²} = 0.5
= Ward’s distance between customer 1 and customer 15
Squared Euclidean method/Ward’s Method

Ward’s distance between customer 1 and customer 15



Icicle Plot



Dendrogram is graphical representation of Agglomeration Schedule



Sudden increase is seen in coefficients from 75.722 to 134.267
Stage 13: Cluster 1 (customers within distance <= 75.722) and Cluster 2 (customers within distance <= 75.722) remain separate

Stage 14: Cluster 1 and Cluster 2 join at distance 134.267

• Should we join cluster 1 and cluster 2? Answer is ‘No’
• Therefore, 2 clusters can be formed
Euclidean / Between-groups linkage — Agglomeration Schedule

                                            Stage Cluster First Appears
Stage  Cluster 1  Cluster 2  Coefficients   Cluster 1  Cluster 2   Next Stage
1      1          15         1.000          0          0           3
2      9          12         1.000          0          0           7
3      1          5          1.207          1          0           5
4      8          10         1.414          0          0           8
5      1          3          1.794          3          0           8
6      2          7          2.000          0          0           10
7      4          9          2.091          0          2           11
8      1          8          2.202          5          4           10
9      13         14         3.000          0          0           12
10     1          2          3.454          8          6           11
11     1          4          3.864          10         7           13
12     6          13         4.183          0          9           13
13     1          6          4.928          11         12          14
14     1          11         5.669          13         0           0

Euclidean / Within-groups linkage — Agglomeration Schedule

                                            Stage Cluster First Appears
Stage  Cluster 1  Cluster 2  Coefficients   Cluster 1  Cluster 2   Next Stage
1      1          15         1.000          0          0           3
2      9          12         1.000          0          0           6
3      1          5          1.138          1          0           5
4      8          10         1.414          0          0           7
5      1          3          1.466          3          0           7
6      4          9          1.727          0          2           11
7      1          8          1.855          5          4           9
8      2          7          2.000          0          0           11
9      1          13         2.303          7          0           10
10     1          6          2.671          9          0           12
11     2          4          2.849          8          6           13
12     1          14         2.987          10         0           14
13     2          11         3.425          11         0           14
14     1          2          4.024          12         13          0

Not a good solution



Hierarchical Clustering Method

• In this technique, once two customers join a cluster they cannot be separated
• It works in a stepwise fashion to form clusters, hence called the ‘hierarchical’ method
• It is also called an agglomerative method since clusters are formed by combining existing clusters
• We use this technique to find the number of clusters that can be formed. This number will be denoted by ‘K’
• In our example, K = 2

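The SPSS hierarchical run can be sketched with SciPy's Ward linkage on the 15 housing customers. One caveat: SciPy scales Ward coefficients differently from SPSS, so the tree shape, not the raw coefficient values, is what should match:

```python
# Ward hierarchical clustering on the 15 housing customers, cut at K = 2.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([
    [6, 5, 6], [2, 4, 7], [7, 6, 6], [3, 8, 5], [5, 5, 6],
    [9, 6, 5], [4, 4, 7], [5, 5, 4], [2, 7, 6], [4, 5, 5],
    [1, 2, 5], [2, 6, 6], [7, 4, 8], [8, 2, 6], [6, 5, 5],
])

# One merge per row, analogous to the agglomeration schedule
Z = linkage(X, method="ward")

# Cut the tree at K = 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Customers 1 and 15 should land in one cluster and customers 2 and 11 in the other, consistent with the dendrogram above.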


Apply K-Means cluster analysis method



• Open data file
• Analyze → Classify → K-Means Cluster…



• Bring variables for segmentation into the ‘Variable(s)’ box
• Specify the number of clusters in the ‘Number of Clusters’ box
• Select ‘Save…’



Click ‘Cluster membership’



Select ‘Options…’



Click ‘ANOVA table’



Select ‘OK’



• Initially customers are grouped into clusters at random

[Scatter plot: the 15 customers plotted in customer space with their initial random assignment to Cluster 1 and Cluster 2]


• Centroids of clusters are calculated
Initial Cluster Centers

                             Cluster 1   Cluster 2
Nearness to school/college   9           1
Nearness to market           6           2
Locality                     5           5

[Scatter plot: the 15 customers with the two initial cluster centers marked]


• Distance of each customer from every centroid is calculated
• If a customer is closer to the centroid of the other cluster, his categorisation is changed to that cluster. For e.g.:
• Customer 12’s categorisation will be changed from cluster 2 to cluster 1

[Scatter plot: Cluster 1 and Cluster 2 centroids marked; customer 12 lies closer to the Cluster 1 centroid]



• This process continues till no categorisation needs further change. It means every respondent is closer to the centroid of the cluster to which he belongs (optimal solution)
• The results of this process are shown in next slide



Iteration History(a)

            Change in Cluster Centers
Iteration   1        2
1           2.992    3.219
2           .547     .502
3           .000     .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 3. The minimum distance between initial centers is 8.944.

• Due to recategorisation of one customer, centroid of 1st cluster


is displaced by 2.992 units and of 2nd cluster by 3.219 units
• Optimal solution is achieved at 3rd iteration since there is no
displacement
Iteration History(a)

            Change in Cluster Centers
Iteration   1        2
1           2.992    3.219
2           .547     .502
3           .000     .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 3. The minimum distance between initial centers is 8.944.

• If the last iteration row has at least one non-zero displacement, we need to increase the number of iterations
• It is done as shown in next slide



Select ‘Iterate…’



• By default, number of maximum iterations = 10
• Increase number of iterations



Optimal Solution

ANOVA

                             Cluster                 Error
                             Mean Square   df        Mean Square   df     F        Sig.
Nearness to school/college   61.344        1         1.661         13     36.938   .000
Nearness to market           .576          1         2.797         13     .206     .657
Locality                     .043          1         1.104         13     .039     .847

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Final Cluster Centers

                             Cluster 1   Cluster 2
Nearness to school/college   7           3
Nearness to market           5           5
Locality                     6           6

• The higher the F ratio, the more the difference in consumer ratings
• Such a variable is a segmentation variable

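The F ratio for ‘Nearness to school/college’ can be re-derived with a one-way ANOVA. The cluster memberships below are the ones implied by the final cluster centers, an assumption, since the slides do not list the membership explicitly:

```python
# Re-deriving the ANOVA F ratio for 'Nearness to school/college'
# from the assumed final 2-cluster membership.
from scipy.stats import f_oneway

cluster1 = [6, 7, 5, 9, 5, 7, 8, 6]   # customers 1, 3, 5, 6, 8, 13, 14, 15
cluster2 = [2, 3, 4, 2, 4, 1, 2]      # customers 2, 4, 7, 9, 10, 11, 12

F, p = f_oneway(cluster1, cluster2)
print(round(F, 3), round(p, 5))       # F matches the 36.938 in the table
```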


Number of Cases in each Cluster
Cluster 1 8.000
2 7.000
Valid 15.000
Missing .000

This table shows cluster sizes



Comparison between hierarchical and K-means methods

• In hierarchical method, once two clusters join, they are never separated
• In K-means method, customers that have joined a cluster can be reassigned at a later stage
• Therefore, K-means method gives more homogeneous clusters
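The K-means loop described above (assign each customer to the nearest centroid, recompute centroids, repeat) can be sketched directly. Starting from the SPSS initial centers (customers 6 and 11), it converges in three iterations and reproduces the Iteration History and Final Cluster Centers:

```python
# Minimal K-means (Lloyd's algorithm) on the housing data with K = 2.
# Sketch only: assumes both clusters stay non-empty throughout.
import numpy as np

X = np.array([
    [6, 5, 6], [2, 4, 7], [7, 6, 6], [3, 8, 5], [5, 5, 6],
    [9, 6, 5], [4, 4, 7], [5, 5, 4], [2, 7, 6], [4, 5, 5],
    [1, 2, 5], [2, 6, 6], [7, 4, 8], [8, 2, 6], [6, 5, 5],
], dtype=float)

# Initial centers as in the SPSS output: customers 6 and 11
centers = np.array([[9, 6, 5], [1, 2, 5]], dtype=float)

for it in range(1, 11):                 # SPSS default maximum: 10 iterations
    # squared Euclidean distance of every customer to every center
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)           # nearest-center assignment
    new_centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    shift = np.linalg.norm(new_centers - centers, axis=1)
    print(it, np.round(shift, 3))       # mirrors the Iteration History column
    if np.allclose(new_centers, centers):
        break                           # no center moved: optimal solution
    centers = new_centers

print(np.bincount(labels))              # cluster sizes -> [8 7]
print(np.round(centers))                # compare with Final Cluster Centers
```

The printed shifts (2.992/3.219, then .547/.502, then zero) match the Iteration History table, and the rounded centers match the Final Cluster Centers of (7, 5, 6) and (3, 5, 6).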



How many clusters?

• Are clusters distinct from a marketing point of view?
• Do clusters vary in demographics & psychographics?
• Use discriminant analysis by taking clusters as groups

