
Cluster Analysis

Prepared by Mr. C Y Nimkar 1


Usage

• Usually used for grouping customers into clusters that have similar behaviour / attitude
• Helps the marketer decide on the target audience. With this, the marketer can do:
– Product differentiation…
– Offer differentiation…


Product differentiation

Mobile handset:
– Model 1: Target audience 1
– Model 2: Target audience 2
– Model 3: Target audience 3
– Model 4: Target audience 4


Offer differentiation

Chocolate:
– Enjoyment: Children
– Celebration: Youngsters
– Quick lunch: Busy persons


Steps in Cluster analysis



Step 1 – Collect data



• Collect data on a continuous scale (interval or ratio) on:
– Customers' needs
– Customers' opinions about products
• Usually a rating scale is used


Cluster Analysis for Housing



• The following attributes for selecting a housing location are considered:
– Nearness to school/college
– Nearness to market
– Locality
• Responses from fifteen customers are obtained on a 10-point rating scale, as shown in the table that follows


          Ratings out of 10 on
Cust No   Nearness to school/college   Nearness to market   Locality
1         6                            5                    6
2         2                            4                    7
3         7                            6                    6
4         3                            8                    5
5         5                            5                    6
6         9                            6                    5
7         4                            4                    7
8         5                            5                    4
9         2                            7                    6
10        4                            5                    5
11        1                            2                    5
12        2                            6                    6
13        7                            4                    8
14        8                            2                    6
15        6                            5                    5

[Figure: "Customer Space", a scatter plot in which customers 1 to 15 appear as points]

Each customer is a point in space


[Figure: scatter plot of customers 1 to 15 in the customer space]

Cluster analysis groups customers based on distances between them

Step 2 – Apply hierarchical cluster analysis method



• In the hierarchical cluster method:
– Initially, each customer forms a cluster of his or her own
– Then the two clusters that are closest merge to form a new cluster
– This process continues until all customers are in one cluster




• In this technique, once two customers join a cluster they cannot be separated
• It works in a stepwise fashion to form clusters, hence it is called the 'hierarchical' method
• It is also called the agglomerative method, since clusters are formed by combining existing clusters
• We use this technique to find the number of clusters that can be formed
• This technique uses the distance method and the clustering method that we specify


• The following distance methods are available:
– Squared Euclidean distance method
– Euclidean distance method
– City-block (Manhattan) distance method
– Chebychev distance method


Squared Euclidean/Euclidean distance

              Ratings out of 10 on
Customer No   Nearness to school/college   Nearness to market   Locality
1             6                            5                    6
2             2                            4                    7

• Squared Euclidean distance between customers 1 and 2
  $= (6-2)^2 + (5-4)^2 + (6-7)^2 = 18.00$

• Euclidean distance between customers 1 and 2
  $= \sqrt{(6-2)^2 + (5-4)^2 + (6-7)^2} = 4.2426$
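
The two calculations above can be checked with a few lines of Python. This is only a minimal sketch; the function names `squared_euclidean` and `euclidean` are illustrative choices, not part of the original slides.

```python
import math

# Ratings (school/college, market, locality) of customers 1 and 2
customer_1 = (6, 5, 6)
customer_2 = (2, 4, 7)

def squared_euclidean(a, b):
    """Sum of squared differences over all rated attributes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

print(squared_euclidean(customer_1, customer_2))    # 18
print(round(euclidean(customer_1, customer_2), 4))  # 4.2426
```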



City Block distance

              Ratings out of 10 on
Customer No   Nearness to school/college   Nearness to market   Locality
1             6                            5                    6
2             2                            4                    7

It is the sum of the absolute (positive) differences = |6-2| + |5-4| + |6-7| = 4 + 1 + 1 = 6


Chebychev distance

              Ratings out of 10 on
Customer No   Nearness to school/college   Nearness to market   Locality
1             6                            5                    6
2             2                            4                    7

It is the maximum absolute (positive) difference = Maximum {4, 1, 1} = 4
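
The city-block and Chebychev distances for the same two customers can be sketched in Python as follows. The function names `city_block` and `chebychev` are illustrative and do not come from the slides.

```python
# Ratings (school/college, market, locality) of customers 1 and 2
customer_1 = (6, 5, 6)
customer_2 = (2, 4, 7)

def city_block(a, b):
    """Sum of the absolute differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    """Largest absolute difference on any single attribute."""
    return max(abs(x - y) for x, y in zip(a, b))

print(city_block(customer_1, customer_2))  # 6
print(chebychev(customer_1, customer_2))   # 4
```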



Method to form clusters

• The following methods are available:
– Single linkage rule (nearest neighbour)
– Complete linkage rule (farthest neighbour)
– Between-groups linkage rule
– Within-groups linkage rule
– Centroid rule
– Ward's method


Single linkage rule (nearest neighbours)

[Figure: two clusters of customers, one containing customers 1, 2, 3 and 4 and the other containing customers 5, 6, 7 and 8]

• Distance between clusters is the minimum distance between a customer in one cluster and a customer in the other cluster


Complete linkage rule (farthest neighbours)

[Figure: two clusters of customers, one containing customers 1, 2, 3 and 4 and the other containing customers 5, 6, 7 and 8]

• Distance between clusters is the maximum distance between a customer in one cluster and a customer in the other cluster


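Between-groups linkage

[Figure: two clusters of customers, one containing customers 1, 2, 3 and 4 and the other containing customers 5, 6, 7 and 8]

• It is the average distance between all pairs of customers in the two clusters, taking one customer from each cluster:
1-5,1-6,1-7,1-8
2-5,2-6,2-7,2-8
3-5,3-6,3-7,3-8
4-5,4-6,4-7,4-8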
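
The three linkage rules described so far differ only in how they summarise the cross-cluster distances. The sketch below is an illustration under assumptions: the figures give no coordinates, so it borrows customers 1, 5 and 15 and customers 9 and 12 from the housing data as two example clusters, and it uses squared Euclidean distance. All function names are hypothetical.

```python
from itertools import product

# Ratings (school/college, market, locality) taken from the housing example
RATINGS = {1: (6, 5, 6), 5: (5, 5, 6), 15: (6, 5, 5), 9: (2, 7, 6), 12: (2, 6, 6)}

def sq_dist(a, b):
    """Squared Euclidean distance between two rating tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_distances(cluster_a, cluster_b):
    """Distances for every pair with one customer from each cluster."""
    return [sq_dist(RATINGS[i], RATINGS[j]) for i, j in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(cross_distances(cluster_a, cluster_b))   # nearest neighbours

def complete_linkage(cluster_a, cluster_b):
    return max(cross_distances(cluster_a, cluster_b))   # farthest neighbours

def between_groups_linkage(cluster_a, cluster_b):
    d = cross_distances(cluster_a, cluster_b)
    return sum(d) / len(d)                              # average over all cross pairs

cluster_a, cluster_b = [1, 5, 15], [9, 12]
print(single_linkage(cluster_a, cluster_b))          # 10
print(complete_linkage(cluster_a, cluster_b))        # 21
print(between_groups_linkage(cluster_a, cluster_b))  # 16.5
```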



Within-groups linkage

• The within-groups method considers the distances between all pairs of customers after combining two clusters.
– For example, suppose there are 3 clusters: I, II and III

[Figure: customers 1 to 8 arranged in three clusters labelled I, II and III]


After combining clusters I and II, form all pairs of customers in the combined cluster:
1-2,1-3,1-4,1-5,1-6,1-7,1-8
2-3,2-4,2-5,2-6,2-7,2-8
3-4,3-5,3-6,3-7,3-8
4-5,4-6,4-7,4-8
5-6,5-7,5-8
6-7,6-8
7-8

Calculate the average distance of the pairs formed


• Do the same calculations by combining clusters II and III, and by combining clusters I and III
• Combine the pair of clusters for which this average distance is the smallest
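
A minimal sketch of the within-groups calculation follows. It is an illustration under assumptions: it uses customers 1, 5, 15 and 9, 12 from the housing data as the two clusters being combined, squared Euclidean distance, and a hypothetical function name `within_groups_linkage`. It also confirms that eight customers yield the 28 pairs listed above.

```python
from itertools import combinations

# Ratings (school/college, market, locality) taken from the housing example
RATINGS = {1: (6, 5, 6), 5: (5, 5, 6), 15: (6, 5, 5), 9: (2, 7, 6), 12: (2, 6, 6)}

def sq_dist(a, b):
    """Squared Euclidean distance between two rating tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_groups_linkage(cluster_a, cluster_b):
    """Average distance over ALL pairs in the combined cluster,
    including pairs that were already together before the merge."""
    merged = list(cluster_a) + list(cluster_b)
    pairs = list(combinations(merged, 2))
    return sum(sq_dist(RATINGS[i], RATINGS[j]) for i, j in pairs) / len(pairs)

print(within_groups_linkage([1, 5, 15], [9, 12]))  # 10.4

# Eight customers in a combined cluster give 28 pairs, as in the list above
print(len(list(combinations(range(1, 9), 2))))     # 28
```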



Centroid rule
Suppose customers 1, 5 and 15 are in one cluster

Customer No.   Nearness to school/college   Nearness to market   Locality
1              6                            5                    6
5              5                            5                    6
15             6                            5                    5
Centroid       (6+5+6)/3 = 5.67             (5+5+5)/3 = 5.00     (6+6+5)/3 = 5.67

• The centroid of a cluster is a virtual customer, here with ratings of 5.67, 5.00 and 5.67 on nearness to school/college, nearness to market and locality respectively, i.e. the point (5.67, 5.00, 5.67)
• The distance between two clusters is the distance between their centroids
• The distance is calculated between the centroids of every pair of clusters, and the two clusters whose centroids are closest are combined
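
The centroid calculation can be sketched as below. The first cluster is the one from this slide; the second cluster (customers 9 and 12) is borrowed from the Ward's method example later in the deck purely to have something to measure against, and the use of squared Euclidean distance between centroids is likewise an assumption.

```python
def centroid(points):
    """Attribute-wise mean of the customers in a cluster."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

cluster_i = [(6, 5, 6), (5, 5, 6), (6, 5, 5)]   # customers 1, 5 and 15
cluster_ii = [(2, 7, 6), (2, 6, 6)]             # customers 9 and 12 (assumed second cluster)

centroid_i = centroid(cluster_i)
centroid_ii = centroid(cluster_ii)

print(tuple(round(v, 2) for v in centroid_i))     # (5.67, 5.0, 5.67)
print(centroid_ii)                                # (2.0, 6.5, 6.0)
print(round(sq_dist(centroid_i, centroid_ii), 3)) # 15.806
```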
Ward’s method

[Figure: cluster I containing respondents 1, 5 and 15, and cluster II containing respondents 9 and 12]

• Suppose there are 2 clusters, I and II
• Cluster I comprises respondents 1, 5 and 15
• Cluster II comprises respondents 9 and 12


• The distance between each respondent and the cluster centroid is calculated by the squared Euclidean method, and these distances are added
• For example, for cluster I, which contains customers 1, 5 and 15:

Customer No.   Nearness to school/college   Nearness to market   Locality
1              6                            5                    6
5              5                            5                    6
15             6                            5                    5
Centroid       (6+5+6)/3 = 5.67             (5+5+5)/3 = 5.00     (6+6+5)/3 = 5.67

Sum of squared distances
$= (6-5.67)^2 + (5-5.67)^2 + (6-5.67)^2 + (5-5)^2 + (5-5)^2 + (5-5)^2 + (6-5.67)^2 + (6-5.67)^2 + (5-5.67)^2$
$= 1.3334$


For cluster II, which contains customers 9 and 12:

Customer No.   Nearness to school/college   Nearness to market   Locality
9              2                            7                    6
12             2                            6                    6
Centroid       2                            6.5                  6

Sum of squared distances
$= (2-2)^2 + (2-2)^2 + (7-6.5)^2 + (6-6.5)^2 + (6-6)^2 + (6-6)^2 = 0.50$


• Both clusters are combined
• Calculate the new centroid

Customer No.   Nearness to school/college   Nearness to market   Locality
1              6                            5                    6
5              5                            5                    6
15             6                            5                    5
9              2                            7                    6
12             2                            6                    6
Centroid       4.2                          5.6                  5.8

• The distance of each respondent from the new centroid is calculated by the squared Euclidean method, and these distances are added

Distance = 20.8

• The increase in distance after combining the two clusters is calculated

Before combining: cluster I = 1.3334, cluster II = 0.50
After combining: combined cluster = 20.8

Increase in distance after combining the two clusters = 20.8 - (1.3334 + 0.50) = 18.97

• This process is done for every pair of clusters
• The two clusters for which the increase in distance is the smallest are combined
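
The Ward's method arithmetic for clusters I and II can be sketched as follows, taking the increase as the combined total minus the two separate totals, which is the usual form of Ward's criterion. The helper name `sum_sq_to_centroid` is hypothetical. Because the code keeps the centroid unrounded, cluster I comes out as 1.3333 rather than the 1.3334 obtained with the rounded centroid 5.67.

```python
def centroid(points):
    """Attribute-wise mean of the customers in a cluster."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sum_sq_to_centroid(points):
    """Sum of squared Euclidean distances of each member from the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

cluster_i = [(6, 5, 6), (5, 5, 6), (6, 5, 5)]   # respondents 1, 5 and 15
cluster_ii = [(2, 7, 6), (2, 6, 6)]             # respondents 9 and 12

before_i = sum_sq_to_centroid(cluster_i)
before_ii = sum_sq_to_centroid(cluster_ii)
after = sum_sq_to_centroid(cluster_i + cluster_ii)
increase = after - (before_i + before_ii)

print(round(before_i, 4))   # 1.3333
print(round(before_ii, 4))  # 0.5
print(round(after, 4))      # 20.8
print(round(increase, 4))   # 18.9667
```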



Output of hierarchical method



Squared Euclidean distance method / Between-groups linkage method

Agglomeration Schedule

        Cluster Combined                       Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2        Next Stage
1       1           15          1.000          0           0                3
2       9           12          1.000          0           0                7
3       1           5           1.500          1           0                5
4       8           10          2.000          0           0                8
5       1           3           3.333          3           0                8
6       2           7           4.000          0           0                10
7       4           9           4.500          0           2                13
8       1           8           5.250          5           4                11
9       13          14          9.000          0           0                12
10      2           11          13.000         6           0                13
11      1           6           14.667         8           0                12
12      1           13          15.643         11          9                14
13      2           4           18.333         10          7                14
14      1           2           26.574         12          13               0

Stage 1: squared Euclidean distance between respondents 1 and 15 = 1.000
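
The stepwise merging behind a schedule like this can be sketched in plain Python. This is a simplified illustration, not the routine that produced the output above: it assumes squared Euclidean distance with between-groups (average) linkage and breaks ties by taking the first smallest pair it finds. The order of tied merges can therefore differ from the schedule, and the later stages may differ where ties occur, although the first ten coefficients agree.

```python
from itertools import combinations

# Ratings (school/college, market, locality) of customers 1 to 15
RATINGS = {
    1: (6, 5, 6), 2: (2, 4, 7), 3: (7, 6, 6), 4: (3, 8, 5), 5: (5, 5, 6),
    6: (9, 6, 5), 7: (4, 4, 7), 8: (5, 5, 4), 9: (2, 7, 6), 10: (4, 5, 5),
    11: (1, 2, 5), 12: (2, 6, 6), 13: (7, 4, 8), 14: (8, 2, 6), 15: (6, 5, 5),
}

def sq_dist(a, b):
    """Squared Euclidean distance between two rating tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(ratings, c1, c2):
    """Average distance over all pairs with one customer from each cluster."""
    total = sum(sq_dist(ratings[i], ratings[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(ratings):
    """Merge the two closest clusters repeatedly until one cluster remains."""
    clusters = [[cid] for cid in sorted(ratings)]  # each customer starts alone
    schedule = []
    while len(clusters) > 1:
        # Pair of clusters with the smallest average distance (first one wins ties)
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(ratings, clusters[ij[0]], clusters[ij[1]]),
        )
        coefficient = average_linkage(ratings, clusters[i], clusters[j])
        schedule.append((min(clusters[i]), min(clusters[j]), coefficient))
        clusters[i] = clusters[i] + clusters[j]    # merged cluster keeps position i
        del clusters[j]
    return schedule

schedule = agglomerate(RATINGS)
for stage, (a, b, coefficient) in enumerate(schedule, start=1):
    print(stage, a, b, round(coefficient, 3))
```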


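In the agglomeration schedule, the coefficient rises sharply at the last stage (from 18.333 at stage 13 to 26.574 at stage 14), indicating that a 2-cluster solution is possible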
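
One simple way to locate such a jump is to difference the coefficients and take the largest increase. The sketch below copies the coefficients from the agglomeration schedule; the rule "largest single increase" is an illustrative heuristic, not a procedure stated in the slides.

```python
# Coefficients for stages 1 to 14, copied from the agglomeration schedule
coefficients = [1.000, 1.000, 1.500, 2.000, 3.333, 4.000, 4.500, 5.250,
                9.000, 13.000, 14.667, 15.643, 18.333, 26.574]
n_cases = 15

# jumps[k] is the increase from stage k+1 to stage k+2
jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]
largest = max(range(len(jumps)), key=jumps.__getitem__)

stage_before_jump = largest + 1                   # last stage before the big increase
suggested_clusters = n_cases - stage_before_jump  # clusters remaining after that stage

print(stage_before_jump)   # 13
print(suggested_clusters)  # 2
```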


Step 3 – Apply K-Means cluster analysis method



K-means cluster analysis method

• Initially, clusters are formed by assigning customers at random
• The centroids of the clusters are calculated
• Then the distance of each respondent from each centroid is calculated
• If a respondent is closer to the centroid of another cluster, he or she is transferred to that cluster
• This process continues until no further transfer is possible. This means every respondent is closer to the centroid of his or her own cluster than to any other centroid (the solution is then stable, and is treated as the optimal solution)
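
The reassignment loop described above can be sketched in plain Python. This is an illustration under assumptions: instead of a random start, it begins from the two initial cluster centers reported in the output that follows, (9, 6, 5) and (1, 2, 5), so that the run is repeatable, and it sends ties to the lower-numbered cluster. The function names are hypothetical.

```python
# Ratings (school/college, market, locality) of customers 1 to 15
RATINGS = [
    (6, 5, 6), (2, 4, 7), (7, 6, 6), (3, 8, 5), (5, 5, 6),
    (9, 6, 5), (4, 4, 7), (5, 5, 4), (2, 7, 6), (4, 5, 5),
    (1, 2, 5), (2, 6, 6), (7, 4, 8), (8, 2, 6), (6, 5, 5),
]
INITIAL_CENTERS = [(9, 6, 5), (1, 2, 5)]  # assumed start, taken from the reported output

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Attribute-wise mean of the customers in a cluster."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def nearest(point, centers):
    """Index of the closest center (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

def k_means(points, initial_centers, max_iter=100):
    centers = list(initial_centers)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest(p, centers) for p in points]
        if new_assignment == assignment:
            break  # no respondent changed cluster: the solution is stable
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = centroid(members)
    return assignment, centers

assignment, centers = k_means(RATINGS, INITIAL_CENTERS)
sizes = [assignment.count(k) for k in range(len(centers))]
print(sizes)                                     # [8, 7]
print([[round(v) for v in c] for c in centers])  # [[7, 5, 6], [3, 5, 6]]
```

With this starting point the sketch ends with cluster sizes of 8 and 7 and rounded centers of (7, 5, 6) and (3, 5, 6), which agrees with the final cluster centers and case counts reported further on.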



Comparison between hierarchical and K-means methods

• In the hierarchical method, once two clusters join, they are never separated
• In the K-means method, customers that are placed in the same cluster can be separated at a later stage
• Therefore, the K-means method tends to give more homogeneous clusters


Initial Cluster Centers

                             Cluster
                             1    2
Nearness to school/college   9    1
Nearness to market           6    2
Locality                     5    5

This is the starting solution for 2 clusters. The two initial centers coincide with the ratings of customers 6 and 11.


Iteration History (a)

            Change in Cluster Centers
Iteration   1       2
1           2.992   3.219
2           .547    .502
3           .000    .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 3. The minimum distance between initial centers is 8.944.

The iteration history shows that the optimal solution is reached at the 3rd iteration


ANOVA

                             Cluster              Error
                             Mean Square   df     Mean Square   df     F        Sig.
Nearness to school/college   61.344        1      1.661         13     36.938   .000
Nearness to market           .576          1      2.797         13     .206     .657
Locality                     .043          1      1.104         13     .039     .847

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Final Cluster Centers

                             Cluster
                             1    2
Nearness to school/college   7    3
Nearness to market           5    5
Locality                     6    6

• The higher the F ratio, the greater the difference in consumer ratings between the clusters
• Such a variable is a segmentation variable (here, nearness to school/college)
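
The F ratio for each attribute is the cluster mean square divided by the error mean square. The sketch below is an illustration under an assumption: the deck does not list which customers fall in which cluster, so the memberships used here (cluster 1 = customers 1, 3, 5, 6, 8, 13, 14, 15; cluster 2 = customers 2, 4, 7, 9, 10, 11, 12) are inferred from the reported cluster sizes and final centers. With these memberships the computed values agree with the ANOVA table.

```python
def f_ratio(groups):
    """One-way ANOVA F ratio: between-cluster mean square / within-cluster mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    between_ss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within_ss = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (between_ss / df_between) / (within_ss / df_within)

# Ratings split by the assumed cluster memberships (cluster 1, cluster 2)
school = ([6, 7, 5, 9, 5, 7, 8, 6], [2, 3, 4, 2, 4, 1, 2])
market = ([5, 6, 5, 6, 5, 4, 2, 5], [4, 8, 4, 7, 5, 2, 6])
locality = ([6, 6, 6, 5, 4, 8, 6, 5], [7, 5, 7, 6, 5, 5, 6])

print(round(f_ratio(school), 3))    # 36.938
print(round(f_ratio(market), 3))    # 0.206
print(round(f_ratio(locality), 3))  # 0.039
```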



Number of Cases in each Cluster

Cluster   1   8.000
          2   7.000
Valid         15.000
Missing       .000

This table shows the cluster sizes
