
SW388R7

Data Analysis &


Computers II

Slide 1
Principal component analysis

Principal component analysis

Strategy for solving problems

Sample problems

Steps in principal component analysis


Slide 2
Principal components factor analysis
Obtaining a factor solution through principal
components analysis is an iterative process that
usually requires repeating the SPSS factor analysis
procedure a number of times to reach a satisfactory
solution.

We begin by identifying a group of variables whose
variance we believe can be represented more
parsimoniously by a smaller set of factors, or
components. The end result of the principal
components analysis will tell us which variables can
be represented by which components, and which
variables should be retained as individual variables
because the factor solution does not adequately
represent their information.

Slide 3
Strategy for solving problems - 1
A principal component factor analysis requires:
The variables included must be metric level or dichotomous (dummy-coded) nominal level.
The sample size must be greater than 50 (preferably 100).
The ratio of cases to variables must be 5 to 1 or larger.
The correlation matrix for the variables must contain 2 or more correlations of 0.30 or greater.
Variables with measures of sampling adequacy less than 0.50 must be removed.
The overall measure of sampling adequacy must be 0.50 or higher.
The Bartlett test of sphericity must be statistically significant.

The first phase of a principal component analysis is
devoted to verifying that we meet these
requirements. If we do not meet these
requirements, factor analysis is not appropriate.

Slide 4
Strategy for solving problems - 2
The second phase of a principal component factor
analysis focuses on deriving a factor model, or
pattern of relationships between variables and
components, that satisfies the following
requirements:
The derived components explain 50% or more of the variance in each of the variables, i.e. each variable has a communality of 0.50 or higher.
None of the variables has loadings, or correlations, of 0.40 or higher on more than one component, i.e. none has complex structure.
None of the components has only one variable in it.

To meet these requirements, we remove problematic
variables from the analysis and repeat the principal
component analysis.

Slide 5
Strategy for solving problems - 3
If, at the conclusion of this process, we have
components that have more than one variable
loading on them, have components that explain at
least 50% of the variance in the included variables,
and have components that collectively explain more
than 60% of the variance in the set of variables, we
can substitute the components for the variables in
further analyses.
Variables that were removed in the analysis should
be included individually in further analyses.
Substitution of components for individual variables is
accomplished by using only the highest loading
variable, or by combining the variables loading on
each component to create a new variable.

Slide 6
Notes - 1
When evaluating measures of sampling adequacy,
communalities, or factor loadings, we ignore the sign
of the numeric value and base our decision on the
size or magnitude of the value.

The sign of the number indicates the direction of the
relationship.

A loading of -0.732 is just as strong as a loading of
0.732. The minus sign indicates an inverse or
negative relationship; the absence of a sign is meant
to imply a plus sign indicating a direct or positive
relationship.

Slide 7
Notes - 2
If there are two or more components in the
component matrix, the pattern of loadings is based
on the SPSS Rotated Component Matrix. If there is
only one component in the solution, the Rotated
Component Matrix is not computed, and the pattern
of loadings is based on the Component Matrix.

It is possible that the analysis will break down and
we will have too few variables in the analysis to
support the use of principal component analysis.

Slide 8
Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
of a statistic? Answer the question based on the results of a principal component analysis prior
to testing for outliers, split sample validation, and a test of reliability. Assume that there is no
problematic pattern of missing data. Use a level of significance of 0.05.

Based on the results of a principal component analysis of the 8 variables "claims about
environmental threats are exaggerated" [grnexagg], "danger to the environment from modifying
genes in crops" [genegen], "America doing enough to protect environment" [amprogrn], "should
be international agreements for environment problems" [grnintl], "poorer countries should be
expected to do less for the environment" [ldcgrn], "economic progress in America will slow
down without more concern for environment" [econgrn], "likelihood of nuclear power station
damaging environment in next 5 years" [nukeacc], and "respondent's socioeconomic index" [sei],
the information in these variables can be represented with 2 components and 4 individual
variables.

Component 1 includes the variables "danger to the environment from modifying genes in crops"
[genegen] and "likelihood of nuclear power station damaging environment in next 5 years"
[nukeacc]. Component 2 includes the variables "claims about environmental threats are
exaggerated" [grnexagg] and "poorer countries should be expected to do less for the
environment" [ldcgrn]. The variables "respondent's socioeconomic index" [sei], "economic
progress in America will slow down without more concern for environment" [econgrn], "should
be international agreements for environment problems" [grnintl], and "America doing enough to
protect environment" [amprogrn] were not included on the components and are retained as
individual variables.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 9
Computing a principal component analysis
To compute a principal
component analysis in SPSS,
select the Data Reduction |
Factor command from the
Analyze menu.

Slide 10
Add the variables to the analysis
First, move the
variables listed in
the problem to the
Variables list box.
Second, click on the
Descriptives button to
specify statistics to
include in the output.

Slide 11
Complete the descriptives dialog box
First, mark the Univariate descriptives checkbox to get a
tally of valid cases.
Second, keep the Initial solution checkbox to get the
statistics needed to determine the number of factors to extract.
Third, mark the Coefficients checkbox to get a correlation
matrix, one of the outputs needed to assess the
appropriateness of factor analysis for the variables.
Fourth, mark the KMO and Bartlett's test of sphericity
checkbox to get more outputs used to assess the
appropriateness of factor analysis for the variables.
Fifth, mark the Anti-image checkbox to get more outputs
used to assess the appropriateness of factor analysis for
the variables.
Sixth, click on the Continue button.

Slide 12
Select the extraction method
First, click on the
Extraction button to
specify the extraction
method.
The extraction method refers
to the mathematical method
that SPSS uses to compute the
factors or components.

Slide 13
Complete the extraction dialog box
First, retain the default
method Principal components.
Second, click
on the
Continue
button.

Slide 14
Select the rotation method
First, click on the
Rotation button to
specify the rotation
method.
The rotation method refers to
the mathematical method that
SPSS uses to rotate the axes in
geometric space. This makes
it easier to determine which
variables are loaded on which
components.

Slide 15
Complete the rotation dialog box
First, mark the
Varimax method
as the type of
rotation to be used
in the analysis.

Second, click
on the
Continue
button.

Slide 16
Complete the request for the analysis
First, click on the
OK button to
request the output.

Slide 17
Level of measurement requirement
"Respondent's socioeconomic index" [sei] is an interval level
variable, which satisfies the level of measurement requirements for
a principal component analysis.

"Claims about environmental threats are exaggerated" [grnexagg],
"danger to the environment from modifying genes in crops"
[genegen], "America doing enough to protect environment"
[amprogrn], "should be international agreements for environment
problems" [grnintl], "poorer countries should be expected to do
less for the environment" [ldcgrn], "economic progress in America
will slow down without more concern for environment" [econgrn],
and "likelihood of nuclear power station damaging environment in
next 5 years" [nukeacc] are ordinal level variables. If we follow the
convention of treating ordinal level variables as metric variables,
the level of measurement requirement for principal component
analysis is satisfied. Since some data analysts do not agree with
this convention, a note of caution should be included in our
interpretation.

Slide 18
Descriptive Statistics
Variable                                        Mean      Std. Deviation   Analysis N
ENVIRONMENTAL THREATS EXAGGERATED               3.31      .988             67
HOW DANGEROUS MODIFYING GENES IN CROPS          3.09      .949             67
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT    2.40      .605             67
INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS        1.97      .674             67
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   3.73      .898             67
ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT      2.69      .839             67
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       2.49      .894             67
RESPONDENT'S SOCIOECONOMIC INDEX                49.757    18.9651          67
Sample size requirement:
minimum number of cases
The number of valid cases for this
set of variables is 67.

While principal component analysis
can be conducted on a sample that
has fewer than 100 cases, but more
than 50 cases, we should be
cautious about its interpretation.

Slide 19
Sample size requirement:
ratio of cases to variables
The ratio of cases to
variables in a principal
component analysis should
be at least 5 to 1.

With 67 cases and 8 variables,
the ratio of cases to
variables is 8.4 to 1, which
exceeds the requirement
for the ratio of cases to
variables.
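The two sample size checks can be sketched in a few lines of Python. This is only an illustration of the rules of thumb stated in these slides; the function name and its parameter names are hypothetical, not part of SPSS.

```python
def check_sample_size(n_cases, n_variables,
                      min_cases=50, preferred_cases=100, min_ratio=5.0):
    """Apply the sample size rules of thumb used in these slides."""
    ratio = n_cases / n_variables
    return {
        "ratio": ratio,
        # More than 50 cases is the stated minimum.
        "meets_minimum_cases": n_cases > min_cases,
        # Fewer than 100 cases calls for a note of caution.
        "meets_preferred_cases": n_cases >= preferred_cases,
        # At least 5 cases per variable.
        "meets_ratio": ratio >= min_ratio,
    }


result = check_sample_size(n_cases=67, n_variables=8)
print(round(result["ratio"], 1))        # 8.4
print(result["meets_preferred_cases"])  # False
```

With 67 cases the minimum and the ratio requirements are met, but the preferred sample size is not, which is why a caution is attached to the interpretation.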

Slide 20
Correlation Matrix
Correlation                                     grnexagg  genegen   amprogrn  grnintl   ldcgrn    econgrn   nukeacc   sei
ENVIRONMENTAL THREATS EXAGGERATED               1.000     -.240      .394     -.305      .301     -.117     -.126      .069
HOW DANGEROUS MODIFYING GENES IN CROPS          -.240     1.000     -.301      .146     -.149      .340      .394      .208
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT     .394     -.301     1.000     -.305      .258     -.136     -.401      .054
INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS        -.305      .146     -.305     1.000     -.289      .117      .151      .094
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    .301     -.149      .258     -.289     1.000     -.194     -.003      .114
ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT      -.117      .340     -.136      .117     -.194     1.000      .310     -.018
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       -.126      .394     -.401      .151     -.003      .310     1.000      .341
RESPONDENT'S SOCIOECONOMIC INDEX                 .069      .208      .054      .094      .114     -.018      .341     1.000
(Columns are in the same order as the rows and are labeled with the variable names.)
Appropriateness of factor analysis:
Presence of substantial correlations
Principal components analysis
requires that there be some
correlations greater than 0.30
between the variables
included in the analysis.

For this set of variables, there
are 10 correlations in the
matrix greater than 0.30,
satisfying this requirement.
The correlations greater than
0.30 are highlighted in yellow.
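The count of substantial correlations can be checked outside SPSS. The sketch below copies the correlation matrix from the output for this slide and counts each pair of variables once, ignoring the sign as described in the notes; the function name is hypothetical.

```python
# Correlation matrix copied from the SPSS output (same variable order).
R = [
    [1.000, -.240, .394, -.305, .301, -.117, -.126, .069],
    [-.240, 1.000, -.301, .146, -.149, .340, .394, .208],
    [.394, -.301, 1.000, -.305, .258, -.136, -.401, .054],
    [-.305, .146, -.305, 1.000, -.289, .117, .151, .094],
    [.301, -.149, .258, -.289, 1.000, -.194, -.003, .114],
    [-.117, .340, -.136, .117, -.194, 1.000, .310, -.018],
    [-.126, .394, -.401, .151, -.003, .310, 1.000, .341],
    [.069, .208, .054, .094, .114, -.018, .341, 1.000],
]


def count_substantial(matrix, threshold=0.30):
    """Count variable pairs whose correlation exceeds the threshold in magnitude."""
    n = len(matrix)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle only, so each pair counts once
        if abs(matrix[i][j]) > threshold
    )


print(count_substantial(R))  # 10
```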

Slide 21
Appropriateness of factor analysis:
Sampling adequacy of individual variables
Principal component analysis requires
that the Kaiser-Meyer-Olkin Measure of
Sampling Adequacy be greater than 0.50
for each individual variable as well as the
set of variables.

The Measure of Sampling Adequacy
(MSA) is described as marvelous if it is
0.90 or greater, meritorious if it is in the
0.80's, middling if in the 0.70's, mediocre
if in the 0.60's, miserable if in the
0.50's, and unacceptable if below 0.50.
There are two anti-image
matrices: the anti-image
covariance matrix and the
anti-image correlation
matrix. We are interested in
the anti-image correlation
matrix.

Slide 22
Appropriateness of factor analysis:
Sampling adequacy of individual variables
On iteration 1, the MSA for
the variable "respondent's
socioeconomic index" [sei]
was 0.410 which was less
than 0.50, so it was
removed from the analysis.
SPSS places the Measures
of Sampling Adequacy
on the diagonal of the anti-image
correlation matrix,
highlighted in yellow.
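For readers who want to see where these values come from, the following is a minimal sketch of the usual Kaiser-Meyer-Olkin computation, assuming NumPy is available. It uses the standard definition (squared correlations relative to squared correlations plus squared partial correlations), not SPSS's internal code, and the function name `kmo` is hypothetical. Because the input is the correlation matrix rounded to three decimals, the results may differ slightly from the SPSS output.

```python
import numpy as np

# Iteration 1 correlation matrix (8 variables, sei last), copied from the output.
R = np.array([
    [1.000, -.240, .394, -.305, .301, -.117, -.126, .069],
    [-.240, 1.000, -.301, .146, -.149, .340, .394, .208],
    [.394, -.301, 1.000, -.305, .258, -.136, -.401, .054],
    [-.305, .146, -.305, 1.000, -.289, .117, .151, .094],
    [.301, -.149, .258, -.289, 1.000, -.194, -.003, .114],
    [-.117, .340, -.136, .117, -.194, 1.000, .310, -.018],
    [-.126, .394, -.401, .151, -.003, .310, 1.000, .341],
    [.069, .208, .054, .094, .114, -.018, .341, 1.000],
])


def kmo(corr):
    """Return (per-variable MSA, overall MSA) for a correlation matrix."""
    inv = np.linalg.inv(corr)
    d = np.diag(inv)
    # Partial correlations between each pair, controlling for all other variables.
    partial = -inv / np.sqrt(np.outer(d, d))
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.where(off_diagonal, corr ** 2, 0.0)
    p2 = np.where(off_diagonal, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return per_variable, overall


msa, overall_msa = kmo(R)
print(np.round(msa, 3), round(float(overall_msa), 3))
```

The last entry of `msa` corresponds to [sei] and can be compared with the diagonal of the anti-image correlation matrix.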

Slide 23
Excluding a variable from the factor analysis
To remove the variable
"respondent's socioeconomic
index" [sei] from the
analysis, click on the Dialog
Recall tool button to access
the drop down menu.

Slide 24
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.

Slide 25
Removing the variable from the list of variables
First, highlight
the sei variable.
Second, click on the left
arrow button to remove
the variable from the
Variables list box.

Slide 26
Replicating the factor analysis
The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.

To replicate the analysis without
the variable that we just removed,
click on the OK button.

Slide 27
Appropriateness of factor analysis:
Sample adequacy for revised factor analysis
In the factor analysis with the sei
variable removed, we see that the
measures of sampling adequacy
for the remaining variables are all
greater than 0.50.
On iteration 2, the MSA for all of
the individual variables still
included in the analysis was
greater than 0.5, supporting their
retention in the analysis.

Slide 28
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy.         .734
Bartlett's Test of Sphericity     Approx. Chi-Square   69.350
                                  df                       21
                                  Sig.                   .000
Appropriateness of factor analysis:
Sample adequacy for set of variables
In addition, the overall
MSA for the set of variables
included in the analysis
was 0.734, which exceeds
the minimum requirement
of 0.50 for overall MSA.
The seven variables remaining
in the analysis satisfy the
criteria for appropriateness of
factor analysis.


Slide 29
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy.         .642
Bartlett's Test of Sphericity     Approx. Chi-Square   84.006
                                  df                       28
                                  Sig.                   .000
Appropriateness of factor analysis:
Bartlett test of sphericity
Principal component analysis requires
that the probability associated with
Bartlett's Test of Sphericity be less
than the level of significance.

The probability associated with the
Bartlett test is <0.001, which satisfies
this requirement.
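As an illustration of what the Bartlett test computes, here is a minimal sketch assuming NumPy and the textbook formula, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The function name is hypothetical, and because the input is the rounded 8-variable correlation matrix, the statistic may differ slightly from the SPSS value.

```python
import numpy as np

# 8-variable correlation matrix copied from the SPSS output (rounded values).
R = np.array([
    [1.000, -.240, .394, -.305, .301, -.117, -.126, .069],
    [-.240, 1.000, -.301, .146, -.149, .340, .394, .208],
    [.394, -.301, 1.000, -.305, .258, -.136, -.401, .054],
    [-.305, .146, -.305, 1.000, -.289, .117, .151, .094],
    [.301, -.149, .258, -.289, 1.000, -.194, -.003, .114],
    [-.117, .340, -.136, .117, -.194, 1.000, .310, -.018],
    [-.126, .394, -.401, .151, -.003, .310, 1.000, .341],
    [.069, .208, .054, .094, .114, -.018, .341, 1.000],
])


def bartlett_sphericity(corr, n_cases):
    """Return (approximate chi-square, degrees of freedom) for Bartlett's test."""
    p = corr.shape[0]
    _, log_det = np.linalg.slogdet(corr)  # natural log of the determinant
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return float(chi_square), df


chi_square, df = bartlett_sphericity(R, n_cases=67)
print(round(chi_square, 3), df)
```

The probability reported by SPSS is the upper-tail area of the chi-square distribution with `df` degrees of freedom at this statistic.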

The next step is to determine the
number of factors that should be
included in the factor solution.

Slide 30
Number of factors to extract:
Latent root criterion
Total Variance Explained
            Initial Eigenvalues                        Extraction Sums of Squared Loadings        Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %       Total   % of Variance   Cumulative %       Total   % of Variance   Cumulative %
1           2.461   35.164           35.164            2.461   35.164          35.164             1.849   26.408          26.408
2           1.229   17.557           52.721            1.229   17.557          52.721             1.842   26.313          52.721
3            .889   12.695           65.415
4            .722   10.313           75.728
5            .621    8.878           84.606
6            .572    8.174           92.780
7            .505    7.220          100.000
Extraction Method: Principal Component Analysis.
Using the output from iteration 2,
there were 2 eigenvalues greater
than 1.0.

The latent root criterion for
number of factors to derive would
indicate that there were 2
components to be extracted for
these variables.

Slide 31
Number of factors to extract:
Percentage of variance criterion
In contrast, the cumulative
proportion of variance criterion
would require 3 components to
satisfy the criterion of explaining
60% or more of the total variance.

A 3-component solution would
explain 65.415% of the total
variance.
Since the SPSS default is to extract
the number of components indicated
by the latent root criterion, our
initial factor solution was based on
the extraction of 2 components.
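The two criteria can be compared directly from the initial eigenvalues in the Total Variance Explained table for iteration 2. The sketch below is illustrative only; the function names are hypothetical, and it assumes standardized variables, so the total variance equals the number of variables.

```python
# Initial eigenvalues from iteration 2 (7 variables).
eigenvalues = [2.461, 1.229, 0.889, 0.722, 0.621, 0.572, 0.505]


def latent_root_count(values, cutoff=1.0):
    """Number of components with an eigenvalue greater than the cutoff."""
    return sum(1 for v in values if v > cutoff)


def components_for_variance(values, target=0.60):
    """Smallest number of components whose cumulative share reaches the target."""
    total = len(values)  # each standardized variable contributes a variance of 1
    cumulative = 0.0
    for k, v in enumerate(values, start=1):
        cumulative += v
        if cumulative / total >= target:
            return k
    return len(values)


print(latent_root_count(eigenvalues))        # 2
print(components_for_variance(eigenvalues))  # 3
```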

Slide 32
Communalities
Variable                                        Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED               1.000     .526
HOW DANGEROUS MODIFYING GENES IN CROPS          1.000     .582
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT    1.000     .521
INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS        1.000     .491
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   1.000     .494
ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT      1.000     .437
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       1.000     .640
Extraction Method: Principal Component Analysis.
Evaluating communalities
Communalities represent the
proportion of the variance in
the original variables that is
accounted for by the factor
solution.

The factor solution should
explain at least half of each
original variable's variance, so
the communality value for
each variable should be 0.50
or higher.

Slide 33
Communality requiring variable removal

On iteration 2, the
communality for the
variable "economic
progress in America will
slow down without more
concern for environment"
[econgrn] was 0.437
which was less than 0.50.

The variable was removed
and the principal
component analysis was
computed again.
In this iteration, there are actually three
variables that have communalities less
than 0.50. The variable with the
smallest communality is selected for
removal.
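The removal rule for this step (drop only the variable with the smallest communality below 0.50, then rerun) can be sketched as follows. The values are the iteration 2 extraction communalities; the function name is hypothetical.

```python
# Extraction communalities from iteration 2, keyed by variable name.
communalities = {
    "grnexagg": 0.526,
    "genegen": 0.582,
    "amprogrn": 0.521,
    "grnintl": 0.491,
    "ldcgrn": 0.494,
    "econgrn": 0.437,
    "nukeacc": 0.640,
}


def next_variable_to_remove(comm, cutoff=0.50):
    """Return the variable with the smallest communality below the cutoff, if any."""
    low = {name: value for name, value in comm.items() if value < cutoff}
    if not low:
        return None  # every variable meets the requirement
    return min(low, key=low.get)


print(next_variable_to_remove(communalities))  # econgrn
```

Removing one variable at a time matters because the communalities of the remaining variables change when the analysis is recomputed.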

Slide 34
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.

Slide 35
Removing the variable from the list of variables
First, highlight
the econgrn
variable.
Second, click on the left
arrow button to remove
the variable from the
Variables list box.

Slide 36
Replicating the factor analysis
The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.

To replicate the analysis without
the variable that we just removed,
click on the OK button.

Slide 37
Communalities
Variable                                        Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED               1.000     .517
HOW DANGEROUS MODIFYING GENES IN CROPS          1.000     .573
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT    1.000     .576
INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS        1.000     .486
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   1.000     .573
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       1.000     .716
Extraction Method: Principal Component Analysis.
Communality requiring variable removal
On iteration 3, the
communality for the
variable "should be
international agreements
for environment
problems" [grnintl] was
0.486 which was less
than 0.50.

The variable was removed
and the principal
component analysis was
computed again.

Slide 38
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.

Slide 39
Removing the variable from the list of variables
First, highlight
the grnintl
variable.
Second, click on the left
arrow button to remove
the variable from the
Variables list box.

Slide 40
Replicating the factor analysis
The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.

To replicate the analysis without
the variable that we just removed,
click on the OK button.

Slide 41
Communalities
Variable                                        Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED               1.000     .591
HOW DANGEROUS MODIFYING GENES IN CROPS          1.000     .568
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT    1.000     .582
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   1.000     .679
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       1.000     .725
Extraction Method: Principal Component Analysis.
Communality satisfactory for all variables
Once any variables with
communalities less than
0.50 have been removed
from the analysis, the
pattern of factor loadings
should be examined to
identify variables that
have complex structure.

Complex structure occurs when
one variable has high loadings or
correlations (0.40 or greater) on
more than one component. If a
variable has complex structure, it
should be removed from the
analysis.

Variables are only checked for
complex structure if there is more
than one component in the
solution. Variables that load on
only one component are described
as having simple structure.

Slide 42
Rotated Component Matrix (a)
                                                Component
Variable                                        1         2
ENVIRONMENTAL THREATS EXAGGERATED               -.265      .722
HOW DANGEROUS MODIFYING GENES IN CROPS           .732     -.182
AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT    -.581      .495
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    .092      .819
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        .847      .087
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Identifying complex structure
On iteration 4, the variable
"America doing enough to
protect environment"
[amprogrn] was found to
have complex structure.

Specifically, the variable
had a loading of -0.581 on
component 1 and a loading
of 0.495 on component 2.

The variable should be
removed and the principal
component analysis should
be repeated.
If only one component has been
extracted, each variable can only
load on that one factor, so
complex structure is not an issue.
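The check for complex structure can be expressed as a small sketch: a variable is flagged when its loading is 0.40 or higher in magnitude on more than one component. The loadings are the iteration 4 rotated loadings; the function name is hypothetical.

```python
# Rotated loadings from iteration 4: (component 1, component 2).
loadings = {
    "grnexagg": (-0.265, 0.722),
    "genegen": (0.732, -0.182),
    "amprogrn": (-0.581, 0.495),
    "ldcgrn": (0.092, 0.819),
    "nukeacc": (0.847, 0.087),
}


def complex_variables(load, cutoff=0.40):
    """List variables with |loading| >= cutoff on more than one component."""
    return [
        name
        for name, row in load.items()
        if sum(abs(value) >= cutoff for value in row) > 1
    ]


print(complex_variables(loadings))  # ['amprogrn']
```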

Slide 43
Repeating the factor analysis
In the drop down menu,
select Factor Analysis to
reopen the factor analysis
dialog box.

Slide 44
Removing the variable from the list of variables
First, highlight
the amprogrn
variable.
Second, click on the left
arrow button to remove
the variable from the
Variables list box.

Slide 45
Replicating the factor analysis
The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.

To replicate the analysis without
the variable that we just removed,
click on the OK button.

Slide 46
Rotated Component Matrix (a)
                                                Component
Variable                                        1         2
ENVIRONMENTAL THREATS EXAGGERATED               -.207      .756
HOW DANGEROUS MODIFYING GENES IN CROPS           .801     -.229
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    .051      .830
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        .861      .059
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Checking for single-variable components
On iteration 5, none of the
variables demonstrated complex
structure.

It is not necessary to remove
any additional variables because
of complex structure.
After variables have been
removed for low communalities
and complex structure, the
factor solution is examined to
remove any components that
have only a single variable
loading on them.

If a component has only a single
variable loading on it, the
variable should be removed from
the next iteration of the principal
component analysis.
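A minimal sketch of the single-variable check follows. It assumes each variable is assigned to the component on which it has the largest loading in magnitude, which is a simplification made for this illustration; the function name is hypothetical. The loadings are the iteration 5 rotated loadings.

```python
from collections import Counter

# Rotated loadings from iteration 5: (component 1, component 2).
loadings = {
    "grnexagg": (-0.207, 0.756),
    "genegen": (0.801, -0.229),
    "ldcgrn": (0.051, 0.830),
    "nukeacc": (0.861, 0.059),
}


def assign_components(load):
    """Map each variable to the 1-based component with its largest |loading|."""
    return {
        name: max(range(len(row)), key=lambda k: abs(row[k])) + 1
        for name, row in load.items()
    }


assignment = assign_components(loadings)
counts = Counter(assignment.values())
single_variable_components = [c for c, n in counts.items() if n == 1]

print(assignment)
print(single_variable_components)  # []
```

An empty list means every component has at least two variables loading on it, so no further removal is needed for this reason.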

Slide 47
Variable loadings on components
On iteration 5, the 2
components in the
analysis had more than
one variable loading on
each of them.
Communalities
Variable                                        Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED               1.000     .615
HOW DANGEROUS MODIFYING GENES IN CROPS          1.000     .694
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   1.000     .691
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       1.000     .744
Extraction Method: Principal Component Analysis.
The communalities for all of the
variables included on the
components were greater than
0.50 and all variables had
simple structure.

The principal component
analysis has been completed.
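The link between the rotated loadings and the communalities can be checked by hand: with an orthogonal rotation such as varimax, a variable's communality is the sum of its squared loadings across the retained components. The sketch below uses the iteration 5 values; small differences from the SPSS table are expected because the loadings are rounded to three decimals.

```python
# Iteration 5 rotated loadings and the communalities reported by SPSS.
loadings = {
    "grnexagg": (-0.207, 0.756),
    "genegen": (0.801, -0.229),
    "ldcgrn": (0.051, 0.830),
    "nukeacc": (0.861, 0.059),
}
reported = {"grnexagg": 0.615, "genegen": 0.694, "ldcgrn": 0.691, "nukeacc": 0.744}

# Communality = sum of squared loadings over the retained components.
computed = {name: sum(value ** 2 for value in row) for name, row in loadings.items()}

for name in loadings:
    print(name, round(computed[name], 3), reported[name])
```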

Slide 48
Interpreting the principal components
The information in 4 of the
variables can be represented
by 2 components.
Component 1 includes the variables
"danger to the environment from
modifying genes in crops"
[genegen] and "likelihood of nuclear
power station damaging
environment in next 5 years"
[nukeacc]. We can substitute one
component variable for this
combination of variables in further
analyses.

Component 2 includes the variables
"claims about environmental threats
are exaggerated" [grnexagg] and
"poorer countries should be
expected to do less for the
environment" [ldcgrn]. We can
substitute one component variable
for this combination of variables in
further analyses.

Slide 49
Variance explained in individual variables
The components explain at
least 50% of the variance in
each of the variables included
in the final analysis.

Slide 50
Total Variance Explained
            Initial Eigenvalues                        Extraction Sums of Squared Loadings        Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %       Total   % of Variance   Cumulative %       Total   % of Variance   Cumulative %
1           1.626   40.651           40.651            1.626   40.651          40.651             1.428   35.699          35.699
2           1.119   27.968           68.619            1.119   27.968          68.619             1.317   32.921          68.619
3            .694   17.341           85.960
4            .562   14.040          100.000
Extraction Method: Principal Component Analysis.
Total variance explained
The 2 components explain
68.619% of the total
variance in the variables
which are included on the
components.

Slide 51
Answering the problem question
The answer to the original question is true with caution.

Component 1 includes the variables "danger to the environment from
modifying genes in crops" [genegen] and "likelihood of nuclear power
station damaging environment in next 5 years" [nukeacc]. We can
substitute one component variable for this combination of variables in
further analyses.

Component 2 includes the variables "claims about environmental threats
are exaggerated" [grnexagg] and "poorer countries should be expected to
do less for the environment" [ldcgrn]. We can substitute one component
variable for this combination of variables in further analyses.

The components explain at least 50% of the variance in each of the
variables included in the final analysis.

The components explain 68.619% of the total variance in the variables
which are included on the components.

A caution is added to our findings because of the inclusion of ordinal level
variables in the analysis and the limited sample available for analysis.

Slide 52
Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or
an incorrect application of a statistic? Answer the question based on the
results of a principal component analysis prior to testing for outliers, split
sample validation, and a test of reliability. Assume that there is no
problematic pattern of missing data. Use a level of significance of 0.05.

A principal component analysis of the variables "number of hours worked
in the past week" [hrs1], "occupational prestige score" [prestg80], "age"
[age], and "highest academic degree" [degree] does not result in any
usable components that reduce the number of variables needed to
represent the information in the original variables.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 53
Computing a principal component analysis
To compute a principal
component analysis in SPSS,
select the Data Reduction |
Factor command from the
Analyze menu.

Slide 54
Add the variables to the analysis
First, move the
variables listed in
the problem to the
Variables list box.
Second, click on the
Descriptives button to
specify statistics to
include in the output.

Slide 55
Complete the descriptives dialog box
First, mark the Univariate descriptives checkbox to get a
tally of valid cases.
Second, keep the Initial solution checkbox to get the
statistics needed to determine the number of factors to extract.
Third, mark the Coefficients checkbox to get a correlation
matrix, one of the outputs needed to assess the
appropriateness of factor analysis for the variables.
Fourth, mark the KMO and Bartlett's test of sphericity
checkbox to get more outputs used to assess the
appropriateness of factor analysis for the variables.
Fifth, mark the Anti-image checkbox to get more outputs
used to assess the appropriateness of factor analysis for
the variables.
Sixth, click on the Continue button.

Slide 56
Select the extraction method
First, click on the
Extraction button to
specify the extraction
method.

Slide 57
Complete the extraction dialog box
First, retain the default
method Principal components.
Second, click
on the
Continue
button.

Slide 58
Select the rotation method
First, click on the
Rotation button to
specify the rotation
method.

Slide 59
Complete the rotation dialog box
First, mark the
Varimax method
as the type of
rotation to be used
in the analysis.

Second, click
on the
Continue
button.

Slide 60
Complete the request for the analysis
First, click on the
OK button to
request the output.

Slide 61
Level of measurement requirement
"Number of hours worked in the past week" [hrs1],
"occupational prestige score" [prestg80], and "age"
[age] are interval level variables, which satisfy the
level of measurement requirements for a principal
component analysis.

"Highest academic degree" [degree] is an ordinal level
variable. If we follow the convention of treating ordinal
level variables as metric variables, the level of
measurement requirement for principal component
analysis is satisfied. Since some data analysts do not
agree with this convention, a note of caution should be
included in our interpretation.

Slide 62
Descriptive Statistics

Variable                                 Mean    Std. Deviation   Analysis N
NUMBER OF HOURS WORKED LAST WEEK         40.94   12.624           172
RS OCCUPATIONAL PRESTIGE SCORE (1980)    45.37   14.218           172
AGE OF RESPONDENT                        41.81   11.837           172
RS HIGHEST DEGREE                         1.70    1.185           172

Sample size requirement:
minimum number of cases
The number of valid cases for this
set of variables is 172.

The preferred minimum sample size
requirement of 100 valid cases is
satisfied.

Slide 63
Descriptive Statistics

Variable                                 Mean    Std. Deviation   Analysis N
NUMBER OF HOURS WORKED LAST WEEK         40.94   12.624           172
RS OCCUPATIONAL PRESTIGE SCORE (1980)    45.37   14.218           172
AGE OF RESPONDENT                        41.81   11.837           172
RS HIGHEST DEGREE                         1.70    1.185           172

Sample size requirement:
ratio of cases to variables
The ratio of cases to
variables in a principal
component analysis should
be at least 5 to 1.

With 172 valid cases and 4
variables, the ratio of cases
to variables is 43.0 to 1,
which exceeds the
requirement for the ratio of
cases to variables.
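
The two sample size checks above amount to simple arithmetic. The following is a minimal sketch in Python, not part of the SPSS procedure; the constant names are illustrative, and the thresholds are the ones stated in these slides (50 required, 100 preferred, 5 to 1 ratio).

```python
# Values read from the Descriptive Statistics output above.
valid_cases = 172
n_variables = 4

# Thresholds as stated in the slides (names are illustrative).
MIN_CASES = 50         # required minimum number of valid cases
PREFERRED_CASES = 100  # preferred minimum number of valid cases
MIN_RATIO = 5.0        # required cases per variable

ratio = valid_cases / n_variables

print(f"Valid cases: {valid_cases}")
print(f"Required minimum met: {valid_cases >= MIN_CASES}")
print(f"Preferred minimum met: {valid_cases >= PREFERRED_CASES}")
print(f"Ratio of cases to variables: {ratio:.1f} to 1")
print(f"Ratio requirement met: {ratio >= MIN_RATIO}")
```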

Slide 64
Correlation Matrix

Correlation                                       hrs1    prestg80   age     degree
NUMBER OF HOURS WORKED LAST WEEK [hrs1]           1.000   .160       .021    .174
RS OCCUPATIONAL PRESTIGE SCORE (1980) [prestg80]   .160   1.000      .060    .525
AGE OF RESPONDENT [age]                            .021   .060       1.000   .068
RS HIGHEST DEGREE [degree]                         .174   .525       .068    1.000

Appropriateness of factor analysis:
Presence of substantial correlations
Principal components analysis requires
that there be two or more correlations
of 0.30 or greater between the
variables included in the analysis.

For this set of variables, there is only 1
correlation in the matrix of 0.30 or
greater (0.525, between occupational
prestige score and highest academic
degree), which does not satisfy this
requirement.
Since we do not satisfy a
basic requirement for
principal component factor
analysis, the answer to
the question is
inappropriate
application of a statistic.
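
The count of substantial correlations can be checked directly from the matrix. This is a minimal sketch in Python using the values from the Correlation Matrix output above. The function name is hypothetical, and taking the absolute value (so that strong negative correlations would also count) is an assumption, since the slides only show positive correlations.

```python
# Correlation matrix from the SPSS output above, in the order
# hrs1, prestg80, age, degree.
labels = ["hrs1", "prestg80", "age", "degree"]
R = [
    [1.000, 0.160, 0.021, 0.174],
    [0.160, 1.000, 0.060, 0.525],
    [0.021, 0.060, 1.000, 0.068],
    [0.174, 0.525, 0.068, 1.000],
]

THRESHOLD = 0.30  # "0.30 or greater", per the slides
REQUIRED = 2      # "two or more correlations", per the slides


def substantial_pairs(matrix, names, threshold):
    """Return each variable pair (counted once) whose correlation
    meets the threshold. Only the upper triangle is scanned, so the
    diagonal and the mirrored lower triangle are ignored."""
    pairs = []
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            # Assumption: compare the absolute value of the correlation.
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


pairs = substantial_pairs(R, labels, THRESHOLD)
print(pairs)                    # [('prestg80', 'degree', 0.525)]
print(len(pairs) >= REQUIRED)   # False
```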

Slide 65
Steps in principal component analysis - 1
The following is a guide to the decision process for
answering problems about principal component analysis:

Are the variables included in the analysis metric?
  Yes: go to the next question.
  No: Incorrect application of a statistic.

Is the number of valid cases 50 or more?
  Yes: go to the next question.
  No: Incorrect application of a statistic.

Is the ratio of cases to variables at least 5 to 1?
  Yes: continue with the next step.
  No: Incorrect application of a statistic.

Slide 66
Steps in principal component analysis - 2
Are there two or more correlations that are 0.30 or greater?
  Yes: go to the next question.
  No: Incorrect application of a statistic.

Is the measure of sampling adequacy larger than 0.50 for each
variable?
  Yes: go to the next question.
  No: Remove variable with lowest MSA and repeat analysis.

Is the overall measure of sampling adequacy greater than 0.50?
  Yes: continue with the next step.
  No: Incorrect application of a statistic.
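
For readers who want to see where the measures of sampling adequacy come from, the sketch below computes them from a correlation matrix with the standard Kaiser-Meyer-Olkin formula: squared correlations divided by squared correlations plus squared partial (anti-image) correlations. It is an illustration in plain Python, not SPSS's own code. The function names are hypothetical, and the matrix from Problem 2 is reused only as sample input (that problem was stopped before this step).

```python
import math

# Correlation matrix from Problem 2 (hrs1, prestg80, age, degree),
# used here only as sample input.
R = [
    [1.000, 0.160, 0.021, 0.174],
    [0.160, 1.000, 0.060, 0.525],
    [0.021, 0.060, 1.000, 0.068],
    [0.174, 0.525, 0.068, 1.000],
]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sampling_adequacy(corr):
    """Return (per-variable MSA list, overall MSA) for a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    # Partial correlation between i and j, controlling for the other variables.
    partial = [
        [-inv[i][j] / math.sqrt(inv[i][i] * inv[j][j]) for j in range(n)]
        for i in range(n)
    ]
    per_variable = []
    total_r2 = total_p2 = 0.0
    for i in range(n):
        r2 = sum(corr[i][j] ** 2 for j in range(n) if j != i)
        p2 = sum(partial[i][j] ** 2 for j in range(n) if j != i)
        per_variable.append(r2 / (r2 + p2))
        total_r2 += r2
        total_p2 += p2
    return per_variable, total_r2 / (total_r2 + total_p2)


per_variable, overall = sampling_adequacy(R)
for name, msa in zip(["hrs1", "prestg80", "age", "degree"], per_variable):
    print(f"{name}: MSA = {msa:.3f}, at least 0.50: {msa >= 0.50}")
print(f"Overall MSA = {overall:.3f}")
```

In SPSS the per-variable values appear on the diagonal of the anti-image correlation matrix and the overall value appears in the KMO and Bartlett's test table.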

Slide 67
Steps in principal component analysis - 3
Is the probability for the Bartlett test of sphericity less than
the level of significance?
  Yes: go to the next question.
  No: Incorrect application of a statistic.

Is the communality for each variable greater than 0.50?
  Yes: continue with the next step.
  No: Remove variable with lowest communality and repeat analysis.
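
The Bartlett test of sphericity can be sketched from the correlation matrix with the usual formula, chi-square = -((n - 1) - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The code below is a plain Python illustration, not SPSS's implementation. The function names are hypothetical, the Problem 2 matrix and its 172 cases are reused only as sample input, and the p-value helper uses a closed form that is valid only when the degrees of freedom are even (as they are here, with 4 variables).

```python
import math

# Sample input: correlation matrix and valid cases from Problem 2.
R = [
    [1.000, 0.160, 0.021, 0.174],
    [0.160, 1.000, 0.060, 0.525],
    [0.021, 0.060, 1.000, 0.068],
    [0.174, 0.525, 0.068, 1.000],
]
n_cases = 172
ALPHA = 0.05  # level of significance stated in the problem


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form is only valid for positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p = len(R)
chi_square = -((n_cases - 1) - (2 * p + 5) / 6.0) * math.log(determinant(R))
df = p * (p - 1) // 2
p_value = chi2_sf_even_df(chi_square, df)

print(f"Chi-square = {chi_square:.3f}, df = {df}")
print(f"Significant at {ALPHA}: {p_value < ALPHA}")
```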

Slide 68
Steps in principal component analysis - 4
Do all variables show simple structure (only 1 loading > 0.40)?
  Yes: go to the next question.
  No: Remove variable with complex structure and repeat analysis.

Do all of the components have more than one variable loading on
them?
  Yes: go to the next question.
  No: Remove single variable loading on component and repeat
  analysis.

Are the number of components and pattern of loadings correct?
  Yes: continue with the next step.
  No: False.
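
The two structure checks above can be sketched as a small Python function applied to a rotated component matrix. The loadings below are hypothetical values invented for illustration, not output from either problem. The function and variable names are also hypothetical, and comparing absolute values of loadings is an assumption.

```python
def structure_problems(loadings, threshold=0.40):
    """loadings maps each variable name to its list of loadings, one per
    component. Returns (variables with complex structure, components that
    have exactly one variable loading on them)."""
    complex_vars = []
    members = {}
    for var, row in loadings.items():
        # Assumption: a loading counts when its absolute value exceeds 0.40.
        high = [k for k, value in enumerate(row) if abs(value) > threshold]
        if len(high) > 1:
            complex_vars.append(var)
        for k in high:
            members.setdefault(k, []).append(var)
    n_components = len(next(iter(loadings.values())))
    single = {
        k: members[k]
        for k in range(n_components)
        if len(members.get(k, [])) == 1
    }
    return complex_vars, single


# Hypothetical rotated loadings on two components.
loadings = {
    "var_a": [0.82, 0.10],
    "var_b": [0.75, 0.48],   # loads on both components
    "var_c": [0.12, 0.88],
    "var_d": [0.79, -0.05],
}

complex_vars, single = structure_problems(loadings)
print(complex_vars)  # ['var_b']
print(single)        # {}

# After removing var_b, component index 1 is left with one variable.
reduced = {name: row for name, row in loadings.items() if name != "var_b"}
complex_after, single_after = structure_problems(reduced)
print(complex_after)  # []
print(single_after)   # {1: ['var_c']}
```

In practice the analysis is rerun in SPSS after each removal, since the loadings change when a variable is dropped; the second call here only reuses the old loadings to show what the check reports.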

Slide 69
Steps in principal component analysis - 5
Are all of the metric variables included in the final analysis
interval level?
  Yes: go to the next question.
  No: True with caution.

Does the sample size for the final analysis include 100 cases or
more?
  Yes: go to the next question.
  No: True with caution.

Is the cumulative proportion of variance for variables 60% or
higher?
  Yes: True.
  No: True with caution.
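
The final step can be summarized as a function that returns "True" only when none of the three cautions applies. This is a minimal Python sketch of the decision chart above. The function name, argument names, and example inputs are hypothetical.

```python
def final_answer(all_interval, n_cases, cumulative_variance_pct):
    """Apply the three closing checks and return (answer, cautions)."""
    cautions = []
    if not all_interval:
        cautions.append("ordinal variables treated as metric")
    if n_cases < 100:
        cautions.append("fewer than 100 cases in the final analysis")
    if cumulative_variance_pct < 60.0:
        cautions.append("cumulative variance explained below 60%")
    answer = "True with caution" if cautions else "True"
    return answer, cautions


# Hypothetical inputs: an ordinal variable is included, the sample is
# small, and the components explain 68.619% of the variance.
answer, cautions = final_answer(
    all_interval=False, n_cases=90, cumulative_variance_pct=68.619
)
print(answer)    # True with caution
print(cautions)

# With interval-level variables and 100 or more cases, no caution applies.
print(final_answer(True, 172, 68.619)[0])  # True
```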
