
CHAPTER 14: MULTIVARIATE DATA ANALYSIS: ANALYSIS OF INTERDEPENDENCE
Learning Objectives
After reading this chapter, students should be able to:
• Know the interdependence multivariate techniques that can be used in data analysis
• Conduct an exploratory factor analysis to reduce factors
• Interpret results from an exploratory factor analysis
• Understand how cluster analysis can be used to cluster objects and individuals
• Understand how multidimensional scaling can be used in research

Introduction
As was discussed in Chapter 13, when the research does not distinguish between independent and dependent variables, the interdependence techniques can be used. The techniques in this group are factor analysis, cluster analysis and multidimensional scaling.

Factor Analysis
Factor analysis is typically known as a data reduction technique. This is a
technique that tries to statistically identify a reduced number of factors from
a larger number of items which are typically called the measured variables.
The factors identified are called latent variables as they are not measured
directly. To do this analysis the researcher does not have to distinguish
between dependent and independent variables.

Factor analysis can be divided into two types:

1. Exploratory factor analysis (EFA)
This type of factor analysis is carried out when the researcher has no knowledge, or is not sure, of the number of underlying factors that may be present in the data. This analysis is usually performed using SPSS.
2. Confirmatory factor analysis (CFA)
This type of factor analysis is carried out when the researcher knows the underlying structure of the factors from prior research and would only like to confirm it (the number of factors and the items used to measure them). This analysis is usually performed using structural equation modeling (SEM) software such as LISREL, AMOS or PLS.

Method
The principal focus of a factor analysis is to reduce a larger number of variables to a manageable number of factors in order to simplify the subsequent analysis. The technique relies on the correlations and intercorrelations among the large number of items. Several approaches can be used to reduce the number of items, such as unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, image factoring and principal component analysis, but principal component analysis is the most popularly used.

The principal component method transforms the variables into a smaller number of variables by creating a set of uncorrelated composite variables, or components, to represent the original set. These linear combinations of variables, often called factors, account for the variance in the data as a whole. The first principal component always explains the most variance, followed by the second, the third and so on, until extraction stops based on a criterion that the researcher decides or that is set by default in the software.

There are several questions that the researcher needs to answer before the analysis is started:

1. How many factors should be extracted?
2. What factor loadings will be considered acceptable?
3. What factor rotation technique will be used?

How many factors should be extracted?
Most researchers follow the software default: factors with eigenvalues greater than 1 are extracted. Eigenvalues are a measure of how much variance is explained by each factor. Other criteria are also suggested in the literature, such as a visual inspection of the scree plot, basing the number of factors on theory, and another procedure called parallel analysis. Readers are advised to read recent journal articles for a deeper understanding.
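The chapter's analyses are run in SPSS, but the eigenvalue criterion itself is simple to illustrate. The following sketch (Python with NumPy; the data and function name are hypothetical illustrations, not part of the study) computes the eigenvalues of the item correlation matrix and counts how many exceed 1:

```python
import numpy as np

def kaiser_criterion(data):
    """Return eigenvalues of the correlation matrix (largest first)
    and the number of factors with eigenvalue > 1."""
    corr = np.corrcoef(data, rowvar=False)        # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # descending order
    return eigenvalues, int(np.sum(eigenvalues > 1))

# Hypothetical data: two blocks of three correlated items each
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 2))
items = np.hstack([base[:, [0]] + 0.3 * rng.normal(size=(200, 3)),
                   base[:, [1]] + 0.3 * rng.normal(size=(200, 3))])
eigs, n_factors = kaiser_criterion(items)         # two factors emerge
```

Since a correlation matrix has 1s on its diagonal, the eigenvalues sum to the number of items, which is why an eigenvalue above 1 is read as a factor explaining more than a single item's worth of variance.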

What factor loadings will be considered acceptable?

A factor loading indicates the strength of the correlation between a particular item and a factor. A standardized loading ranges from -1 to +1, with a higher absolute value indicating that the item is more strongly correlated with that factor. To determine which factor an item loads on, we look at cut-off values for the loading versus the cross-loadings. Items are assigned to a particular factor based on a high loading on that factor and low loadings on the other factors. Typically researchers use a cut-off value of 0.5 and above to signify a significant loading, so the cross-loadings (loadings on the other factors) should be < 0.5. If a cross-loading is more than 0.5 then the item cannot be uniquely assigned and may not be used in further analysis.

What factor rotation technique will be used?

Factor rotation techniques are mathematical ways of simplifying the results to make them more easily and clearly interpretable. Sometimes the unrotated solution gives us items that cannot be uniquely or clearly assigned, so rotation is needed to assign the items clearly. Generally there are two groups of rotation techniques, namely orthogonal rotation and oblique rotation. Orthogonal rotation is used when the researcher assumes the factors are not correlated, whereas oblique rotation is used when the researcher thinks the factors may be correlated. The decision is usually based on prior knowledge of the factors or an understanding gained from the literature review.
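For readers curious about what an orthogonal rotation actually does, varimax (the most common choice) can be sketched in a few lines of NumPy. This is a minimal illustration of Kaiser's classic algorithm, not SPSS's implementation, and the loading matrix is hypothetical:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix to maximize the variance
    of the squared loadings within each column (Kaiser's varimax)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt          # orthogonal update from the SVD
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Hypothetical 4-item, 2-factor loading matrix
unrotated = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
rotated = varimax(unrotated)
```

Because the rotation is orthogonal, each item's communality (row sum of squared loadings) is unchanged; only the distribution of variance across the factors is simplified.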
Example
In a study to assess the factors influencing attitude towards knowledge sharing, a researcher has collected the following 13 items (using a Likert scale, 1 = strongly disagree to 5 = strongly agree) in the questionnaire. These items are supposed to measure 3 constructs: anticipated extrinsic rewards (REWARD), anticipated reciprocal relationships (RECIP) and sense of self-worth (SW). He is not sure whether this structure will hold in the data that he has collected.

Item Question in the questionnaire


Reward1 I will consider sharing valuable information if I am well
rewarded
Reward2 I will receive rewards in return for my information sharing
Reward3 I will receive additional points for promotion for my information
sharing
Recip1 My information sharing would strengthen the relationship
between existing members in the organization and myself
Recip2 My information sharing will get me well acquainted with new
members in the organization
Recip3 My information sharing would expand the scope of my
association with other members in the organization
Recip4 My information sharing would draw smooth cooperation from
outstanding members in the future
Recip5 My information sharing would create strong relationships with
members who have common interests in the organization
Sw1 My information sharing would help other members in the
organization solve problems
Sw2 My information sharing would create new business
opportunities for the organization
Sw3 My information sharing would improve work processes in the
organization
Sw4 My information sharing would increase productivity in the
organization
Sw5 My information sharing would help the organization achieve its
performance objectives

A factor analysis can be run to see how many factors can be derived before proceeding further to test the relationships in a model. The process of setting up the analysis in SPSS is illustrated next.
Once the analysis is done we will get a long list of output, which can be interpreted next.

Interpretation

The first table that we will be interested in is the one called "KMO and Bartlett's Test", which measures the extent to which there are sufficient correlations for a factor analysis. The KMO value ranges from 0 to 1, with a value of 0.5 and above deemed acceptable. Bartlett's test tests the hypothesis that there is a lack of sufficient correlation for a factor analysis to be carried out. The KMO value of 0.847 and Bartlett's test, which is significant (p < 0.01), indicate that the data is suitable for a factor analysis.

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy. .847


Bartlett's Test of Sphericity Approx. Chi-Square 2638.607
df 78.000
Sig. .000
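Bartlett's test can be reproduced directly from the correlation matrix using the standard chi-square approximation. The sketch below (Python with NumPy/SciPy; the simulated data are purely illustrative, not the chapter's dataset) mirrors what SPSS reports:

```python
import math
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """Bartlett's test of the hypothesis that the correlation matrix
    is an identity matrix (i.e. no correlations to factor)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * math.log(np.linalg.det(corr))
    df = p * (p - 1) / 2
    p_value = stats.chi2.sf(chi_square, df)
    return chi_square, df, p_value

# Illustrative data: four items sharing one common source
rng = np.random.default_rng(0)
base = rng.normal(size=(150, 1))
data = base + 0.5 * rng.normal(size=(150, 4))
chi_square, df, p_value = bartlett_sphericity(data)   # significant
```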

Once this is cleared, the next issue is to look at the individual measures of sampling adequacy (MSA). These are given in the table labeled anti-image matrices. In the anti-image correlation part of the table, the diagonal shows values with a superscript a; these values should be greater than 0.5. If we have values less than 0.5 then we should consider deleting those items one at a time, starting with the lowest value first. Only once that is done do we move on to the next table, which shows the communalities.
Anti-image Matrices

Reward1 Reward2 Reward3 Recip1 Recip2 Recip3 Recip4 Recip5 Sw1 Sw2 Sw3 Sw4 Sw5

Anti-image Reward1 .806a -.482 -.451 -.023 -.148 .152 .017 .071 -.036 -.024 .089 .026 -.100
Correlation
Reward2 -.482 .749a -.546 -.187 .021 .011 .097 -.049 .152 -.007 -.212 .219 -.085

Reward3 -.451 -.546 .741a .230 .110 -.196 -.105 .000 -.100 .033 .124 -.273 .171

Recip1 -.023 -.187 .230 .831a -.195 -.277 -.136 -.031 .172 -.097 .082 -.158 -.152

Recip2 -.148 .021 .110 -.195 .881a -.155 -.083 -.248 -.163 -.077 .036 -.037 .164

Recip3 .152 .011 -.196 -.277 -.155 .870a -.144 -.179 .008 -.111 .068 -.044 .068

Recip4 .017 .097 -.105 -.136 -.083 -.144 .893a -.356 -.022 -.137 -.089 .135 -.058

Recip5 .071 -.049 .000 -.031 -.248 -.179 -.356 .843a .012 .128 -.109 .017 -.021

Sw1 -.036 .152 -.100 .172 -.163 .008 -.022 .012 .858a -.434 .050 -.235 -.469

Sw2 -.024 -.007 .033 -.097 -.077 -.111 -.137 .128 -.434 .891a -.390 -.046 .125

Sw3 .089 -.212 .124 .082 .036 .068 -.089 -.109 .050 -.390 .867a -.446 -.293

Sw4 .026 .219 -.273 -.158 -.037 -.044 .135 .017 -.235 -.046 -.446 .884a -.162

Sw5 -.100 -.085 .171 -.152 .164 .068 -.058 -.021 -.469 .125 -.293 -.162 .873a

a. Measures of Sampling Adequacy(MSA)
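The diagonal MSA values come from the partial correlations between items. A hedged NumPy sketch of the computation (SPSS does this internally; the function and the simulated data here are illustrative assumptions):

```python
import numpy as np

def msa(data):
    """Per-item measure of sampling adequacy: squared correlations
    relative to squared correlations plus squared partial
    correlations (the superscript-a diagonal in SPSS output)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                 # partial correlations
    np.fill_diagonal(partial, 0.0)
    np.fill_diagonal(corr, 0.0)
    r2 = (corr ** 2).sum(axis=0)
    q2 = (partial ** 2).sum(axis=0)
    return r2 / (r2 + q2)

# Illustrative data: four items driven by one common source
rng = np.random.default_rng(1)
base = rng.normal(size=(300, 1))
values = msa(base + 0.7 * rng.normal(size=(300, 4)))  # all above 0.5
```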


Communality measures the percentage of a variable's variance that is explained by the factors. When the communality is high (more than 0.5) it indicates that the variable has a lot in common with the other variables taken as a group. The communality value is calculated as the sum of squared loadings for the particular variable, taken from the unrotated factor loadings table. The values that we are looking for are also 0.5 and above, with values less than 0.5 being candidates for deletion.

Communalities
Initial Extraction
Reward1 1.000 .979
Reward2 1.000 .981
Reward3 1.000 .982
Recip1 1.000 .521
Recip2 1.000 .577
Recip3 1.000 .629
Recip4 1.000 .625
Recip5 1.000 .620
Sw1 1.000 .865
Sw2 1.000 .826
Sw3 1.000 .865
Sw4 1.000 .851
Sw5 1.000 .826
Extraction Method: Principal Component Analysis.
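The communality arithmetic can be verified by hand. The sketch below uses the three Reward items from the rotated component matrix shown later in this section (an orthogonal rotation leaves the row sums of squared loadings unchanged, so the rotated loadings give the same values as the unrotated ones):

```python
import numpy as np

# Rotated loadings for Reward1-3 on the three components
loadings = np.array([
    [0.098, 0.984, 0.033],   # Reward1
    [0.075, 0.986, 0.050],   # Reward2
    [0.098, 0.985, 0.047],   # Reward3
])
# Communality = sum of squared loadings across the factors
communalities = (loadings ** 2).sum(axis=1)   # ~0.979, 0.981, 0.982
```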

Next we will look at the table labeled "Total Variance Explained" to assess how much of the variance has been explained by the extracted factors and how many factors have been extracted.
Total Variance Explained
Initial Eigenvalues Extraction Sums of Squared Loadings Rotation Sums of Squared Loadings
% of Cumulative % of Cumulative % of Cumulative
Component Total Variance % Total Variance % Total Variance %
1 5.718 43.985 43.985 5.718 43.985 43.985 4.167 32.055 32.055
2 2.805 21.575 65.560 2.805 21.575 65.560 2.999 23.067 55.121
3 1.624 12.494 78.054 1.624 12.494 78.054 2.981 22.933 78.054
4 .652 5.017 83.071
5 .544 4.183 87.253
6 .486 3.741 90.994
7 .412 3.168 94.162
8 .246 1.892 96.054
9 .207 1.593 97.647
10 .167 1.284 98.930
11 .097 .742 99.673
12 .023 .178 99.850
13 .019 .150 100.000
Extraction Method: Principal Component Analysis.
Next, from the table above, we know that based on the eigenvalue-greater-than-1 criterion, 3 factors can be extracted. In the initial eigenvalues column we can see that three eigenvalues, 5.718, 2.805 and 1.624, are more than 1 and the next one, 0.652, is less than one, so the program stops extracting more factors.

The scree plot below can also be used to decide on the number of factors to derive. The scree plot follows the phenomenon of a rock fall: the bigger rocks fall first while the smaller rocks fall later. The bigger rocks are the important factors, while the smaller ones, which explain very small amounts of variance, can be ignored. We can see that based on the eigenvalue criterion only 3 factors will be extracted, but based on the scree plot it may be plausible to go up to 4 factors. Since we initially hypothesized 3 factors, we will stop at the 3-factor solution.
The total variance explained by the 3-factor solution is 78.054%, which can be considered high. Generally we are looking for at least 50% variance explained. Each of the 3 factors explains a portion of the total variance (see the rotation sums of squared loadings), which can be broken down as Reward (23.067%), Reciprocal (22.933%) and Self-worth (32.055%).

Rotated Component Matrixa


Component
1 2 3
Reward1 .098 .984 .033
Reward2 .075 .986 .050
Reward3 .098 .985 .047
Recip1 .253 .007 .676
Recip2 .233 .053 .721
Recip3 .132 .200 .756
Recip4 .331 -.033 .717
Recip5 .120 -.044 .777
Sw1 .902 .033 .226
Sw2 .845 .057 .330
Sw3 .890 .096 .253
Sw4 .880 .148 .234
Sw5 .889 .072 .175

The next big task is to assign items to factors based on an assessment of the loadings and cross-loadings. Items loading high on one factor and low on all other factors can be uniquely assigned to that factor. From the table we can see that the first three items all load high on the second factor, the next five items all load high on the third factor, whereas the last five items load high on the first factor. Thus we can rename the first factor as self-worth, the second factor as reward and the third factor as reciprocal.
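The assignment rule (a loading of at least 0.5 on one factor, cross-loadings below 0.5) can be written as a small function and applied to the rotated loadings from the table above. A Python sketch (the function is illustrative; factor indices are 0-based):

```python
import numpy as np

def assign_items(loadings, item_names, cutoff=0.5):
    """Assign each item to the factor it loads highest on, provided
    that loading >= cutoff and every cross-loading is below cutoff."""
    assignments = {}
    for name, row in zip(item_names, np.abs(loadings)):
        best = int(np.argmax(row))
        if row[best] >= cutoff and np.all(np.delete(row, best) < cutoff):
            assignments[name] = best
    return assignments

items = ["Reward1", "Reward2", "Reward3", "Recip1", "Recip2", "Recip3",
         "Recip4", "Recip5", "Sw1", "Sw2", "Sw3", "Sw4", "Sw5"]
loadings = np.array([
    [0.098, 0.984, 0.033], [0.075, 0.986, 0.050], [0.098, 0.985, 0.047],
    [0.253, 0.007, 0.676], [0.233, 0.053, 0.721], [0.132, 0.200, 0.756],
    [0.331, -0.033, 0.717], [0.120, -0.044, 0.777],
    [0.902, 0.033, 0.226], [0.845, 0.057, 0.330], [0.890, 0.096, 0.253],
    [0.880, 0.148, 0.234], [0.889, 0.072, 0.175],
])
assignments = assign_items(loadings, items)   # all 13 items assigned
```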

The next thing that can be done is to compute the factors that will be used in subsequent analysis. Three ways are suggested in the literature: 1) the surrogate variable, 2) the summated scale and 3) factor scores. Each has its advantages and drawbacks. The most commonly used method for computing factors is the summated scale, which lends itself to generalizability and transferability.
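The summated-scale approach is simply the mean (or sum) of the items assigned to each factor. A minimal sketch with hypothetical responses:

```python
import numpy as np

def summated_scale(responses, item_columns):
    """Average the columns belonging to one factor into one score."""
    return responses[:, item_columns].mean(axis=1)

# Hypothetical Likert responses (rows = respondents, columns = items)
responses = np.array([[4, 5, 4, 2, 3],
                      [2, 1, 2, 5, 4]])
reward_score = summated_scale(responses, [0, 1, 2])  # e.g. Reward1-3
```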

Cluster Analysis
Cluster analysis is a multivariate approach for identifying objects or individuals that are similar to one another based on some criteria or characteristics. The analysis classifies individuals or objects into a small number of mutually exclusive and collectively exhaustive groups based on a set of variables called the cluster variate. The focus of cluster analysis is not to estimate the variate but to compare objects based on the variate; in that sense it is similar to factor analysis, except that factor analysis groups variables whereas cluster analysis groups objects and individuals.

Method
Most cluster analyses use the basic 5 steps described below, although they may differ.
1. Selection of sample to be clustered
2. Definition of variables that will be used to measure the objects or
individuals
3. Computation of similarities among the entities through correlation,
Euclidean distances, and other techniques
4. Selection of mutually exclusive clusters or hierarchically arrange
clusters
5. Cluster comparison and validation.

Many researchers have cautioned that different clustering methods will inevitably produce different solutions. The researcher has to have information about the data, the clustering algorithm and the stopping rules to decide on the clusters, or else it would be an arbitrary exercise in statistics.
A researcher would like to cluster 12 Internet users into several groups based on 2 variables, Attitude (attitude towards the Internet) and Price (price sensitivity). The attitude and price data are presented in the table below.

Respondent Attitude Price


1 20 5
2 22 7
3 20 9
4 21 8
5 15 15
6 15 17
7 13 16
8 16 16
9 5 25
10 8 25
11 4 28
12 7 28

A visual inspection of the 12 respondents based on the 2 variables is shown below. We can see that the 12 respondents can be grouped into 3 distinct groups.
Once the analysis is run the computer will give us a long list of output. One of the tables that we can look at is the one called cluster membership. Here we have asked for a minimum of 2 clusters and a maximum of 4 clusters. From the results we can see which respondents are grouped together in which cluster.

Cluster Membership

Case 4 Clusters 3 Clusters 2 Clusters

1 1 1 1

2 1 1 1

3 1 1 1

4 1 1 1

5 2 2 1

6 2 2 1

7 2 2 1
8 2 2 1

9 3 3 2

10 3 3 2

11 4 3 2

12 4 3 2

We should also look at the dendrogram, which clearly shows that a 3-cluster solution is the best. Cluster 1 (1,2,3,4) can be classified as power users, Cluster 2 (5,6,7,8) as casual users and Cluster 3 (9,10,11,12) as starters.
* * * H I E R A R C H I C A L C L U S T E R A N A L Y S I S * * *

Dendrogram using Average Linkage (Between Groups)

Rescaled Distance Cluster Combine

C A S E 0 5 10 15 20 25
Label Num +---------+---------+---------+---------+---------+

6 ─┐
8 ─┤
5 ─┼─────────────┐
7 ─┘ │
3 ─┐ ├─────────────────────────────────┐
4 ─┤ │ │
2 ─┼─────────────┘ │
1 ─┘ │
11 ─┐ │
12 ─┼───────────────────────────────────────────────┘
9 ─┤
10 ─┘
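The same hierarchical solution can be reproduced outside SPSS with SciPy, using the 12 respondents' data and average linkage between groups as in the dendrogram above (an illustrative sketch, not the chapter's software):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# (Attitude, Price) for respondents 1-12 from the data table
data = np.array([[20, 5], [22, 7], [20, 9], [21, 8],
                 [15, 15], [15, 17], [13, 16], [16, 16],
                 [5, 25], [8, 25], [4, 28], [7, 28]])

merges = linkage(data, method="average", metric="euclidean")
clusters = fcluster(merges, t=3, criterion="maxclust")  # 3-cluster cut
```

Cutting the tree at three clusters recovers the power users (respondents 1-4), casual users (5-8) and starters (9-12).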

To validate we can use a one-way ANOVA to test the 2 variables against the
3 clusters and the results clearly confirm our findings. The power users are
low on price sensitivity and high on attitude whereas the starters are low on
attitude and high on price sensitivity. The casual users are the ones in
between the two extreme clusters.

Power Users Casual Users Starters F-value


Attitude 20.75 14.75 6.00 113.20**
Price 7.25 16.00 26.50 169.33**
**p< 0.01
The chart clearly shows the group differences, with the lines for the two variables going in opposite directions and the casual users' values staying relatively flat in between.
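The validation ANOVA is straightforward to reproduce with SciPy's f_oneway, which recovers the F-values reported in the table:

```python
from scipy import stats

# Attitude and Price scores by cluster: power users, casual, starters
attitude = [[20, 22, 20, 21], [15, 15, 13, 16], [5, 8, 4, 7]]
price = [[5, 7, 9, 8], [15, 17, 16, 16], [25, 25, 28, 28]]

f_att, p_att = stats.f_oneway(*attitude)     # ~113.2, p < 0.01
f_price, p_price = stats.f_oneway(*price)    # ~169.3, p < 0.01
```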

Multidimensional Scaling
Multidimensional scaling refers to a series of techniques that help the researcher identify key dimensions underlying respondents' evaluations of objects and then position those objects in this dimensional space, sometimes called a perceptual map. The analysis works on respondents' judgments of the similarity of the objects, and these similarities are reflected in the relative distances among the objects in the multidimensional space. It is commonly used in marketing studies to identify key dimensions underlying customer evaluations of products, services, or companies. For example, a customer may be asked to rate the similarity of several cars in pairs, which gives us many paired comparisons. A plot is then generated to explain the differences.

Method
When respondents assess objects they may use different types of measures, which can be classified into objective dimensions and subjective dimensions. Objective dimensions are quantifiable (physical or observable) while subjective dimensions are not easily quantifiable (perceptions). The subjective dimensions may or may not be based on the objective dimensions: two objects with the same physical characteristics (objective dimensions) may still be viewed differently by respondents in terms of, say, quality (a perceived dimension). The researcher needs to understand how the objective and subjective dimensions relate to the axes of the multidimensional space used in the perceptual map. A perceptual map can be visually depicted as follows:

[Perceptual map: objects such as B, C, D, F, G and H plotted against Dimension 1 and Dimension 2]

A researcher wanted to assess 6 universities in Malaysia in terms of how similar they are in the perception of the public. To do that the researcher first creates 15 paired comparisons of the universities; if a pair is similar then the value given is small (1) and if they are dissimilar then the value given is bigger (15). The results are tabulated in a matrix as shown below.
UNIVERSITY A B C D E F
A -
B 2 -
C 13 12 -
D 4 6 9 -
E 3 5 10 1 -
F 8 7 11 14 15 -
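The same matrix can also be analyzed with scikit-learn's metric MDS by passing it as a precomputed dissimilarity matrix (an illustrative sketch; the exact coordinates depend on the random start, but the relative distances should mirror the input ranks):

```python
import numpy as np
from sklearn.manifold import MDS

# Full symmetric dissimilarity matrix for universities A-F,
# built from the lower-triangle ranks in the table above
dissim = np.array([
    [0,  2, 13,  4,  3,  8],
    [2,  0, 12,  6,  5,  7],
    [13, 12, 0,  9, 10, 11],
    [4,  6,  9,  0,  1, 14],
    [3,  5, 10,  1,  0, 15],
    [8,  7, 11, 14, 15, 0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)   # one (x, y) point per university
```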

Next we will analyze the matrix in SPSS using Analyze > Scale > Multidimensional Scaling. The next screen shows some options that we need to choose. Once the analysis is executed we will get the output that can be interpreted.
Once we have the solution we need to identify the 2 dimensions of attributes that the respondents used in evaluating similarity and dissimilarity. From experience we may classify the first dimension as Quality and the second dimension as Prestige. Then we can see whether the grouping in the multidimensional space makes sense.

Summary
In this chapter we have looked at the multivariate techniques called interdependence techniques, namely factor analysis, cluster analysis and multidimensional scaling. Exploratory factor analysis is a technique that tries to statistically identify a reduced number of factors from a larger number of items, which are typically called the measured variables. The factors identified are called latent variables as they are not measured directly. The pattern of loadings is used to assign items to particular factors.
particular factors. Next we looked at cluster analysis which is an approach
for identifying objects or individuals that are similar to one another based on
some criteria or characteristics. This analysis will classify individuals or
objects into a small number of mutually exclusive and collectively
exhaustive groups based on a set of variables called cluster variate. Lastly
we looked at the multidimensional scaling technique that helps the
researcher identify key dimensions underlying respondents’ evaluation of
objects and then position these objects in this dimensional space sometimes
called a perceptual map.
Review Questions
1. What is the purpose of doing an exploratory factor analysis?
2. How can we decide on the number of factors to be extracted?
3. There are 2 main rotation techniques in factor analysis; how do we decide which one to choose?
4. Explain what you understand about factor loadings and communality.
5. Explain your understanding of the term percentage variance explained. What is the use of this measure?
6. Discuss the three ways in which factors can be computed after the factor analysis.
7. What is the difference between cluster analysis and factor analysis?
8. Explain how cluster analysis works.
9. Explain the principle on which multidimensional scaling works.
10. The following question is based on the data “Data Knowledge
Sharing”. The task is to do a factor analysis of these 13 items which
measures 3 factors, Attitude (Att), Subjective norm (Sn) and Perceived
behavioral control (Pbc) and the items are as listed below:

Item Question in the questionnaire


Att1 My information sharing with other organizational
members is good
Att2 My information sharing with other organizational
members is harmful
Att3 My information sharing with other organizational
members is an enjoyable experience
Att4 My information sharing with other organizational
members is valuable to me
Att5 My information sharing with other organizational
members is a wise move
Sn1 My higher management thinks I should share information with other members in the organization and factory
Sn2 My manager thinks I should share information with other members in the organization and factory
Sn3 My colleagues think I should share information with other members in the organization and factory
Sn4 My organization expects me to share information
Pbc1 It is always possible for me to share information with
members in my organization
Pbc2 I believe that I have much control in what I have to share
with others in my organization
Pbc3 If I want, I could always share information with members
in my organization / factory
Pbc4 It is mostly up to me whether or not I share information

Conduct a factor analysis using principal component analysis with the varimax rotation technique, with the a priori (theory-based) factor extraction set to 3. Then answer the following questions.

a. Is the value for the KMO acceptable?


b. Is the bartlett’s test significant?
c. Do you need to delete any items?
d. Interpret the communality values and are they acceptable?
e. How many factors can be extracted? Explain how much
variance each factor accounts for.
f. Interpret the rotated factor loadings and assign the items to the
factors.
g. Write a report summarizing your findings and interpretation of
the results.
