Unit 6: Data Analysis

Multivariate analysis (MVA) is based on the statistical principle of
multivariate statistics, which involves observation and analysis of more than
one statistical variable at a time. In design and analysis, the technique is
used to perform trade studies across multiple dimensions while taking into
account the effects of all variables on the responses of interest.
Uses for multivariate analysis include:
• Design for capability (also known as capability-based design)
• Inverse design, where any variable can be treated as an independent
variable
• Analysis of alternatives, the selection of concepts to fulfill a customer
need
• Analysis of concepts with respect to changing scenarios
• Identification of critical design drivers and correlations across
hierarchical levels
Multivariate analysis can be complicated by the desire to include physics-
based analysis to calculate the effects of variables for a hierarchical
"system-of-systems." Often, studies that wish to use multivariate analysis
are stalled by the dimensionality of the problem. These concerns are often
eased through the use of surrogate models, highly accurate approximations
of the physics-based code. Since surrogate models take the form of an
equation, they can be evaluated very quickly. This becomes an enabler for
large-scale MVA studies: while a Monte Carlo simulation across the design
space is difficult with physics-based codes, it becomes trivial when
evaluating surrogate models, which often take the form of response surface
equations.
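The point about surrogate models can be illustrated with a minimal Python sketch. It assumes a response surface equation has already been fitted to a physics-based code; the function surrogate, its coefficients, the [-1, 1] variable ranges, and the threshold of 12 are all hypothetical values chosen for illustration, not taken from any real study.

```python
import random

def surrogate(x1, x2):
    """Hypothetical second-order response surface equation in two variables.

    The coefficients are made up for illustration; in practice they would
    be fitted to runs of the physics-based code.
    """
    return (10.0 + 2.0 * x1 - 1.5 * x2
            + 0.5 * x1 * x2 + 0.8 * x1 ** 2 + 0.3 * x2 ** 2)

def monte_carlo(n_samples, seed=0):
    """Sample the (assumed) design space [-1, 1]^2 uniformly and
    evaluate the surrogate at every sample."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_samples):
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        responses.append(surrogate(x1, x2))
    return responses

responses = monte_carlo(100_000)
mean_response = sum(responses) / len(responses)
share_above = sum(r > 12.0 for r in responses) / len(responses)
print(f"mean response: {mean_response:.3f}")
print(f"share of designs with response > 12: {share_above:.3f}")
```

Because each evaluation is a handful of multiplications, a hundred thousand samples take a fraction of a second, which is what makes this kind of sweep practical with a surrogate and impractical with the underlying physics-based code.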
CLUSTER ANALYSIS:
'Cluster analysis' is a class of statistical techniques that can be applied to
data that exhibit “natural” groupings. Cluster analysis sorts through the raw
data and groups them into clusters. A cluster is a group of relatively
homogeneous cases or observations. Objects in a cluster are similar to
each other. They are also dissimilar to objects outside the cluster,
particularly objects in other clusters.
The diagram below illustrates the results of a survey that studied drinkers’
perceptions of spirits (alcohol). Each point represents the results from one
respondent. The research indicates there are four clusters in this market.
Another example is the vacation travel market. Recent research has
identified three clusters or market segments: 1) the demanders, who want
exceptional service and expect to be pampered; 2) the escapists, who want
to get away and just relax; 3) the educationalists, who want to see new
things, go to museums, go on a safari, or experience new cultures.
Cluster analysis, like factor analysis and multidimensional scaling, is an
interdependence technique: it makes no distinction between dependent
and independent variables. The entire set of interdependent relationships is
examined. It is similar to multidimensional scaling in that both examine
inter-object similarity by examining the complete set of interdependent
relationships. The difference is that multidimensional scaling identifies
underlying dimensions, while cluster analysis identifies clusters. Cluster
analysis is the obverse of factor analysis. Whereas factor analysis reduces
the number of variables by grouping them into a smaller set of factors,
cluster analysis reduces the number of observations or cases by grouping
them into a smaller set of clusters.
In marketing, cluster analysis is used for:
• Segmenting the market and determining target markets
• Product positioning and New Product Development
• Selecting test markets (see: experimental techniques)
The basic procedure is:
1. Formulate the problem - select the variables that you wish to apply
the clustering technique to
2. Select a distance measure - various ways of computing distance:
○ Squared Euclidean distance - the sum of the squared differences
in value for each variable (its square root is the ordinary
Euclidean distance)
○ Manhattan distance - the sum of the absolute differences in
value across all variables
○ Chebyshev distance - the maximum absolute difference in
value for any variable
○ Mahalanobis distance - this measure rescales the differences
using the covariances (and hence the correlations) between the
variables, so that correlated variables are not counted twice.
This is an important measure since it is unit invariant (can
literally compare apples to oranges)
3. Select a clustering procedure (see below)
4. Decide on the number of clusters
5. Map and interpret clusters - draw conclusions - illustrative techniques
like perceptual maps, icicle plots, and dendrograms are useful
6. Assess reliability and validity - various methods:
○ repeat analysis but use a different distance measure
○ repeat analysis but use a different clustering technique
○ split the data randomly into two halves and analyze each part
separately
○ repeat analysis several times, deleting one variable each time
○ repeat analysis several times, using a different order each time
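The distance measures named in step 2 can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names and the sample points are assumptions, and the Mahalanobis version is deliberately restricted to two variables so that the inverse of the covariance matrix can be written out by hand. Note that the squared Euclidean distance is the plain sum of squared differences; taking its square root gives the ordinary Euclidean distance.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences over all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

def manhattan(a, b):
    """Sum of absolute differences over all variables."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute difference in any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance for two variables.

    cov is the 2x2 covariance matrix of the variables, assumed known.
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^2 = diff^T * inverse(cov) * diff, with the 2x2 inverse written out
    d2 = (s22 * dx * dx - (s12 + s21) * dx * dy + s11 * dy * dy) / det
    return math.sqrt(d2)

a = (1.0, 2.0, 3.0)
b = (4.0, 6.0, 3.0)
print(squared_euclidean(a, b), euclidean(a, b), manhattan(a, b), chebyshev(a, b))

# Unit invariance: expressing the first variable in units 100 times smaller
# multiplies its values by 100 and its variance by 100**2, and the
# Mahalanobis distance does not change.
d_original = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), ((1.0, 0.0), (0.0, 1.0)))
d_rescaled = mahalanobis_2d((0.0, 0.0), (300.0, 4.0), ((10000.0, 0.0), (0.0, 1.0)))
print(d_original, d_rescaled)
```

The last two lines show why the Mahalanobis distance is described as unit invariant: the Euclidean distance between the rescaled points would be dominated by the first variable, whereas the Mahalanobis distance is the same in both cases.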
Clustering procedures
There are several types of clustering methods:
1) Non-Hierarchical clustering (also called k-means clustering)
○ first determine a cluster center, then group all objects that are
within a certain distance
○ examples:
a) Sequential Threshold method - first determine a cluster
center, then group all objects that are within a
predetermined threshold from the center - one cluster is
created at a time
b) Parallel Threshold method - simultaneously several
cluster centers are determined, then objects that are
within a predetermined threshold from the centers are
grouped
c) Optimizing Partitioning method - first a non-hierarchical
procedure is run, then objects are reassigned so as to
optimize an overall criterion.
2) Hierarchical clustering
○ objects are organized into a hierarchical structure as part of
the procedure
○ examples:
a) Divisive clustering - start by treating all objects as if they
are part of a single large cluster, then divide the cluster
into smaller and smaller clusters
b) Agglomerative clustering - start by treating each object
as a separate cluster, then group them into bigger and
bigger clusters
examples of agglomerative clustering:
c) Centroid methods - the two clusters whose centers are
closest to each other are merged at each step (a
centroid is the mean value for all the objects in the
cluster)
d) Variance methods - clusters are generated that minimize
the within-cluster variance
example of a variance method:
e) Ward’s Procedure - clusters are merged so as to minimize
the increase in the sum of squared Euclidean distances
from the objects to their cluster means
f) Linkage methods - clusters are merged based on the
distances between their objects
examples of linkage methods:
g) Single Linkage method - merge clusters based on the
minimum distance between their objects (also called the
nearest neighbour rule)
h) Complete Linkage method - merge clusters based on
the maximum distance between their objects (also called
the furthest neighbour rule)
i) Average Linkage method - merge clusters based on the
average distance between all pairs of objects (each pair
takes one object from each of the two clusters)
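Agglomerative clustering with the single, complete, and average linkage rules listed above can be sketched as follows. This is a minimal, unoptimized Python illustration: the function names, the use of Euclidean distance, the stopping rule (a fixed number of clusters), and the six sample points are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters, each given as a list of point indices."""
    # Every pair takes one object from each cluster.
    pair_distances = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbour rule
        return min(pair_distances)
    if linkage == "complete":    # furthest neighbour rule
        return max(pair_distances)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, n_clusters, linkage="single"):
    """Start with one cluster per object and repeatedly merge the two
    closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Two well-separated groups of three points each (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
for linkage in ("single", "complete", "average"):
    print(linkage, agglomerate(points, 2, linkage))
```

On data this clearly separated all three rules recover the same two groups. They tend to differ on elongated or noisy data, where single linkage can chain clusters together through a few close neighbours while complete linkage favours compact clusters.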
See also
• marketing
• marketing research
• factor analysis
• multidimensional scaling
• quantitative marketing research
• positioning
• perceptual mapping
