
Cluster Analysis

A cluster is, by definition, a group of similar objects. A general question facing researchers in many areas of inquiry is how to organize observed data into meaningful structures. Cluster analysis is a multivariate data analysis tool that sorts objects into groups so that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise. It is ideally suited to segmentation applications.

Examples
Identifying groups of target customers who are similar in buying habits, demographic characteristics, or psychographics
Identifying brands that are similar to each other and different from others

Cluster Analysis
PURPOSE: To identify natural clusters of objects on the basis of their similarity across a variety of characteristics
INPUT: Any variables on which the similarity among respondents can be measured
KEY OUTPUT: A set of clusters, shown by a dendrogram or icicle plot

An Example of Cluster Analysis: Test Market Selection


Test Market    Income Index    Population Index
 1              1.14            1.72
 2             -1.25           -1.17
 3              1.62            0.89
 4              1.64            1.35
 5              0.55            0.10
 6             -0.94           -1.25
 7              0.89            1.32
 8             -0.87           -0.63
 9             -0.44           -0.07
10              0.08           -0.55
11             -0.18            0.62
12             -1.29           -0.86
13             -1.07           -1.38

Test Market Selection

[Figure: scatter plot of the test markets. Vertical axis: Population, horizontal axis: Income, both running from -2 to 2.]

Test Market Selection: Dendrogram from Cluster Analysis

[Figure: dendrogram. Vertical axis: Distance, from 0.00 to 10.72. Horizontal axis: Observations, in the order 1 7 3 4 2 12 6 13 8 11 5 10 9 14 15.]
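The merging process behind a dendrogram can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of the dendrogram above: it uses only the 13 test markets from the table, and it assumes plain Euclidean distance and average (between-groups) linkage, neither of which the slides specify for this example. The names `markets`, `average_linkage`, and `agglomerate` are made up for the sketch.

```python
import math

# (income index, population index) for test markets 1..13, from the table
markets = {
    1: (1.14, 1.72), 2: (-1.25, -1.17), 3: (1.62, 0.89), 4: (1.64, 1.35),
    5: (0.55, 0.10), 6: (-0.94, -1.25), 7: (0.89, 1.32), 8: (-0.87, -0.63),
    9: (-0.44, -0.07), 10: (0.08, -0.55), 11: (-0.18, 0.62),
    12: (-1.29, -0.86), 13: (-1.07, -1.38),
}

def average_linkage(a, b):
    """Mean Euclidean distance over all pairs with one market in each cluster."""
    total = sum(math.dist(markets[i], markets[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the agglomeration schedule, a list of
    (cluster_a, cluster_b, distance) tuples in the order they were merged.
    """
    clusters = [[m] for m in markets]  # start with every market on its own
    schedule = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        d, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        schedule.append((clusters[i], clusters[j], d))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, schedule

clusters, schedule = agglomerate(3)
for a, b, d in schedule:
    print(f"merge {a} + {b} at distance {d:.3f}")
print("final clusters:", sorted(clusters))
```

Under these assumptions, stopping at three clusters should group the markets as {1, 3, 4, 7}, {2, 6, 8, 12, 13}, and {5, 9, 10, 11}, which agrees with the left-to-right ordering of the first 13 observations in the dendrogram. The distance values will differ from the dendrogram's scale, since the measure and linkage used there are not stated.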

Methods

Clustering methods
    Hierarchical, or linkage, methods
    k-means (non-hierarchical clustering)
Distance (similarity) measures
    Define the distance among clusters
Number of clusters to keep
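For contrast with the hierarchical approach, here is a minimal k-means sketch on the test market data. It is an illustration under stated assumptions: the number of clusters (three) and the starting seeds (markets 1, 2, and 5) are arbitrary choices made for this example, not values given in the slides, and `k_means` is a hypothetical helper name.

```python
import math

# (income index, population index) for test markets 1..13, in table order
points = [
    (1.14, 1.72), (-1.25, -1.17), (1.62, 0.89), (1.64, 1.35), (0.55, 0.10),
    (-0.94, -1.25), (0.89, 1.32), (-0.87, -0.63), (-0.44, -0.07),
    (0.08, -0.55), (-0.18, 0.62), (-1.29, -0.86), (-1.07, -1.38),
]

def k_means(points, seeds, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = list(seeds)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's seed
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Assumed seeds: markets 1, 2 and 5 (an arbitrary choice for illustration)
seeds = [points[0], points[1], points[4]]
labels, centers = k_means(points, seeds)
for market, label in enumerate(labels, start=1):
    print(f"market {market:2d} -> cluster {label}")
print("final centers:", centers)
```

Unlike the hierarchical methods, k-means needs the number of clusters up front, and its result can depend on the seeds, so it is common to try several starting points.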

Cluster Analysis Vs. Discriminant Analysis

If the market segments are unknown, use cluster analysis. Example: how should a research company classify cities into several homogeneous groups for test marketing?
If the market segments are known but a firm needs to classify new customers into those segments, use discriminant analysis. Example: given a customer's demographic information, should she be classified as a catalog user or not?

Groups unknown: cluster analysis
Groups known: discriminant analysis

Typical steps of market segmentation:
1. Cluster analysis to identify segments
2. Discriminant analysis to link the segments to a set of observable characteristics
3. Forecast the group membership of new customers using the results of the discriminant analysis

Statistics and concepts associated with cluster analysis


1. Agglomeration schedule: Gives information on the objects or cases combined at each stage of the hierarchical clustering process.
2. Cluster centroid: The mean values of the variables for all the cases in a cluster.
3. Cluster membership: Indicates the cluster to which each object or case belongs.
4. Cluster centers: The starting points (seeds) in non-hierarchical clustering.
5. Distance between cluster centers: Indicates how separated the individual pairs of clusters are. Clusters that are widely separated are distinct from each other and hence desirable.
6. Dendrogram: A graphical device for displaying clustering results. Vertical lines represent clusters that are joined together.
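Two of these statistics, the cluster centroid and the distance between cluster centers, are simple to compute once cluster membership is known. The sketch below uses the test market data from the earlier table; the membership labels "A", "B", and "C" are a hypothetical assignment chosen for illustration, and `centroids` is a made-up helper name.

```python
import math

# (income index, population index) per test market, from the earlier table
markets = {
    1: (1.14, 1.72), 2: (-1.25, -1.17), 3: (1.62, 0.89), 4: (1.64, 1.35),
    5: (0.55, 0.10), 6: (-0.94, -1.25), 7: (0.89, 1.32), 8: (-0.87, -0.63),
    9: (-0.44, -0.07), 10: (0.08, -0.55), 11: (-0.18, 0.62),
    12: (-1.29, -0.86), 13: (-1.07, -1.38),
}

# Hypothetical cluster membership, assumed here for illustration only
membership = {
    1: "A", 3: "A", 4: "A", 7: "A",
    2: "B", 6: "B", 8: "B", 12: "B", 13: "B",
    5: "C", 9: "C", 10: "C", 11: "C",
}

def centroids(markets, membership):
    """Return, per cluster, the mean of each variable over its members."""
    groups = {}
    for market, label in membership.items():
        groups.setdefault(label, []).append(markets[market])
    return {
        label: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for label, pts in groups.items()
    }

centers = centroids(markets, membership)

# Euclidean distance between each pair of cluster centers
center_distances = {
    (a, b): math.dist(centers[a], centers[b])
    for a in centers for b in centers if a < b
}

for label, center in sorted(centers.items()):
    print(f"cluster {label}: centroid = ({center[0]:.3f}, {center[1]:.3f})")
for (a, b), d in sorted(center_distances.items()):
    print(f"distance between centers {a} and {b}: {d:.3f}")
```

Larger values in `center_distances` correspond to the "widely separated" clusters that item 5 describes as desirable.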

Methods

Distance or similarity measures
1. Euclidean distance
2. Manhattan distance

Clustering procedure
1. Hierarchical clustering
   Methods: agglomerative clustering (linkage methods), divisive clustering
2. Non-hierarchical clustering (k-means clustering)
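The two distance measures differ only in how they combine the per-variable differences: Euclidean distance is the square root of the summed squared differences, and Manhattan distance is the sum of the absolute differences. A minimal sketch, using test markets 1 and 7 from the earlier table as sample points (the function names are illustrative):

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# (income index, population index) for test markets 1 and 7
market_1 = (1.14, 1.72)
market_7 = (0.89, 1.32)

print(f"Euclidean: {euclidean(market_1, market_7):.4f}")  # about 0.4717
print(f"Manhattan: {manhattan(market_1, market_7):.4f}")  # about 0.6500
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, so the two measures can rank pairs differently when the differences are spread unevenly across variables.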

Marketing applications
Market segmentation: helps in formulating marketing programmes
Buyer behaviour: homogeneous groups of buyers are studied
Development of new products: clustering brands or products
Identifying homogeneous test markets: cities are grouped into clusters on the basis of their similarities

Exercise Problems

A major Indian FMCG company wants to map the profile of its target audience in terms of lifestyles, attitudes, and perceptions. A set of 15 statements is prepared to measure the variables of interest. Respondents rate their agreement with each statement (1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree, 5 = strongly disagree). The statement set is given below.

1. I prefer to use email rather than write a letter
2. I feel that quality products are always priced high
3. I think twice before I buy anything
4. Television is a major source of entertainment
5. A car is a necessity rather than a luxury
6. I prefer fast food and ready-to-use products
7. People are more health conscious today
8. Entry of foreign companies has increased the efficiency of Indian companies
9. Women are active participants in purchase decisions
10. I believe politics can play a positive role
11. I enjoy watching movies
12. If I get a chance, I would like to settle abroad
13. I always buy branded products
14. I frequently go out on weekends
15. I prefer to pay by credit card rather than cash

The input data cover 20 respondents (a 20 x 15 matrix) and are in the SPSS data file cluster.sav.

SPSS commands: Analyze / Classify / Hierarchical Cluster / Cluster cases
Method: Between-groups linkage
Measure: Squared Euclidean distance
Statistics: Agglomeration schedule

Output file :
