
Principal Components Analysis

Reduce survey data into factors that account for maximum variance
Principal Components Analysis (PCA) uses algorithms to "reduce" data into
correlated "factors" that provide a conceptual and mathematical
understanding of the construct of interest.
Going back to the construct specification and the survey items, everything has
been focused on measuring for one construct related to answering the research
question. Under the assumption that researchers are measuring for one
construct, the individual items should correlate in some form or fashion.
These inter-correlations among different sets of survey items (or content
areas) provide a mathematical basis for understanding latent or underlying
relationships that may exist. Principal Components Analysis (PCA) reduces
survey data down into content areas that account for the most variance.
The process of conducting a Principal Components Analysis
A Principal Components Analysis is a three-step process:
1. The inter-correlations amongst the items are calculated yielding a correlation
matrix.
2. The inter-correlated items, or "factors," are extracted from the correlation
matrix to yield "principal components."
3. These "factors" are rotated for purposes of analysis and interpretation.
At this point, the researcher has to make a decision about how to move forward.
Luckily, there are two statistical calculations that help you make this
decision: Eigenvalues and scree plots.
An eigenvalue is essentially a ratio of the shared variance to the unique variance
accounted for in the construct of interest by each "factor" yielded from the
extraction of principal components. An eigenvalue of 1.0 or greater is the
arbitrary criterion accepted in the current literature for deciding if a factor
should be further interpreted. The logic underlying the criterion of 1.0 comes
from the belief that the amount of shared variance explained by a "factor"
should at least be equal to the unique variance the "factor" accounts for in the
overall construct.
Scree plots provide a visual aid in deciding how many "factors" should be
interpreted from the principal components extraction. In a scree plot,

the eigenvalues are plotted against the order of "factors" extracted from the data.
The first "factors" extracted from the principal components analysis have the highest inter-correlations amongst their individual survey items and thus account for the most overall variance in the construct of interest. As further "factors" are extracted, the inter-correlations become weaker and the eigenvalues smaller. On a scree plot, one can see a visually marked drop as the eigenvalues decrease. This "elbow," the factor at which the scree plot shows a sharp reduction in eigenvalue and then levels off, is often used as the criterion for selecting the number of "factors" to interpret.
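Producing a scree plot is straightforward once the eigenvalues are in hand. A minimal matplotlib sketch, reusing the eigenvalues array from the sketch above:

import matplotlib.pyplot as plt

# Plot each eigenvalue against the order in which its factor was extracted.
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.axhline(1.0, linestyle="--", color="gray")  # the eigenvalue-of-1.0 criterion
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()

# Number of factors retained under the eigenvalue-greater-than-1.0 rule:
n_retained = int((eigenvalues > 1.0).sum())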
So, based on the two statistical calculations above, the eigenvalues and scree
plot, make a decision on how many "factors" should be extracted.
These extracted "factors" of inter-correlated items are then "rotated." Rotation is needed because it is common for certain items to be highly inter-correlated with items on several different "factors," which makes the initial extraction hard to interpret. The "rotation" forces each troublesome item onto the "factor" with which it has the strongest association. This mathematical "rotation" increases the interpretability of the extracted "factors," but sacrifices the ability to interpret the amount of shared variance associated with each "factor."
When it comes to interpreting the "factors" themselves, any item that does not at
least have a correlation or "factor loading" of .3 with the "factor" it has loaded
on should be discarded.
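Mechanically, rotation is just an orthogonal (or oblique) transformation applied to the retained loading columns. Below is a common textbook varimax implementation in Python, a sketch rather than SPSS's exact algorithm, reusing loadings and n_retained from the earlier snippets, followed by the .3 suppression rule:

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    # Orthogonal rotation maximizing the varimax criterion (Kaiser, 1958).
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        b = loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(b)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break  # criterion stopped improving
        criterion = s.sum()
    return loadings @ rotation

rotated = varimax(loadings[:, :n_retained])

# Suppress loadings below .3 in absolute value when interpreting factors.
display = np.where(np.abs(rotated) >= 0.3, np.round(rotated, 2), np.nan)
print(display)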
There are a few assumptions that must be met to conduct a Principal
Components Analysis (PCA):
1. There must be a large enough sample size to allow the correlations to
converge into mutually exclusive "factors."
2. Normality and linearity of the items is assumed because correlations
provide the mathematical foundation for factor analysis to extract "factors."
3. The items must be written in a fashion that yields sufficiently high correlations for extraction.
4. Content areas and items must be utilized within some sort of theoretical or
conceptual framework so that correlations can be yielded.

5. The sample must be relatively homogeneous so that the construct can be measured in its relative context in the given population.
The steps for conducting a Principal Components Analysis (PCA) in SPSS
1. The data is entered in a within-subjects fashion.
2. Click Analyze.
3. Drag the cursor over the Dimension Reduction drop-down menu.
4. Click Factor.
5. Click on the first ordinal or continuous variable, observation, or item to
highlight it.
6. Click on the arrow to move the variable into the Variables: box.
7. Repeat Steps 5 and 6 until all of the variables, observations, or items are in
the Variables: box.
8. Click on the Descriptives button.
9. Click on the KMO and Bartlett's test of sphericity box to select it.
10. Click Continue.
11. Click on the Extraction button.
12. Click on the Scree plot box to select it.
13. Click Continue.
14. Click on the Rotation button.
15. Click on the Direct Oblimin choice to select it.
16. Click Continue.
17. Click on Options.
18. In the Coefficient Display Format table, click on the Suppress small
coefficients box to select it.

19. Type .40 into the Absolute value below: box.


20. Click Continue.
21. Click OK.
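For readers working outside SPSS, the same workflow can be approximated in Python with the third-party factor_analyzer package. This is a hedged sketch, not a reproduction of the SPSS output: df is a hypothetical pandas DataFrame with one column per survey item, and factor_analyzer's extraction defaults differ from SPSS's.

import pandas as pd
from factor_analyzer import FactorAnalyzer, calculate_bartlett_sphericity, calculate_kmo

# df is a hypothetical DataFrame with one column per survey item.
chi_square, p_value = calculate_bartlett_sphericity(df)  # step 9: Bartlett's test
kmo_per_item, kmo_overall = calculate_kmo(df)            # step 9: KMO

# n_factors=3 is a hypothetical choice; base it on eigenvalues and the scree plot.
fa = FactorAnalyzer(n_factors=3, rotation="oblimin")     # step 15: Direct Oblimin
fa.fit(df)

eigenvalues, _ = fa.get_eigenvalues()                    # for the scree plot (step 12)
loadings = pd.DataFrame(fa.loadings_, index=df.columns)

# Steps 18-19: suppress coefficients below .40 in absolute value.
print(loadings.where(loadings.abs() >= 0.40))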
The steps for interpreting the SPSS output for PCA
1. Look in the KMO and Bartlett's Test table.
2. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) needs to
be at least .6 with values closer to 1.0 being better.
3. The Sig. row of the Bartlett's Test of Sphericity is the p-value that should be
interpreted.
If the p-value is LESS THAN .05, reject the null hypothesis that this is an
identity matrix. RESEARCHERS WANT TO REJECT THE NULL
HYPOTHESIS.
If the p-value is MORE THAN .05, there is an identity matrix and a principal
components analysis should not be conducted.
4. Scroll down to the Total Variance Explained table. Look under the Initial
Eigenvalues column heading. The Total column contains the eigenvalues; interpret only factors that have an eigenvalue above 1.0. The % of
Variance column shows how much variance within the construct is
accounted for by that factor. The Cumulative % column shows the total
amount of variance accounted for in the construct by factors with eigenvalues
above 1.0. The total number of factors, the amount of variance each factor
accounts for, and the final amount of variance accounted for by all factors with
eigenvalues above 1.0 are important results to report.
5. Scroll down to the Pattern Matrix table. These are your extracted and
rotated factors. Researchers will see which survey items "loaded" on each
factor. The items in the factors constitute the underlying components of the
overall construct.
At this point, researchers discard the items that did not make it through the
iterations of the reliability analysis, and formally structure the newly piloted
survey with the items that loaded on factors with eigenvalues higher than 1.0.
These are the survey items that will be tested within a nomological network

with a new sample to establish validity evidence for the survey instrument and
construct.
http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/multivariate/principal-components-and-factor-analysis/choose-extraction-method/

Annotated SPSS Output


Factor Analysis
This page shows an example of a factor analysis with footnotes explaining the output. The data used in this example were collected by Professor James Sidanius, who has generously shared them with us.
Overview: The "what" and "why" of factor analysis
Factor analysis is a method of data reduction. It does this by seeking underlying
unobservable (latent) variables that are reflected in the observed variables (manifest
variables). There are many different methods that can be used to conduct a factor analysis
(such as principal axis factor, maximum likelihood, generalized least squares, unweighted
least squares). There are also many different types of rotations that can be done after the
initial extraction of factors, including orthogonal rotations, such as varimax and equimax,
which impose the restriction that the factors cannot be correlated, and oblique rotations,
such as promax, which allow the factors to be correlated with one another. You also need to
determine the number of factors that you want to extract. Given the number of factor
analytic techniques and options, it is not surprising that different analysts could reach very
different results analyzing the same data set. However, all analysts are looking for simple
structure. Simple structure is a pattern of results such that each variable loads highly onto one
and only one factor.
Factor analysis is a technique that requires a large sample size. Factor analysis is based on
the correlation matrix of the variables involved, and correlations usually need a large sample
size before they stabilize. Tabachnick and Fidell (2001, page 588) cite Comrey and Lee's
(1992) advise regarding sample size: 50 cases is very poor, 100 is poor, 200 is fair, 300 is
good, 500 is very good, and 1000 or more is excellent. As a rule of thumb, a bare minimum
of 10 observations per variable is necessary to avoid computational difficulties.
For the example below, we are going to do a rather "plain vanilla" factor analysis. We will
use iterated principal axis factor with three factors as our method of extraction, a varimax
rotation, and for comparison, we will also show the promax oblique solution. The
determination of the number of factors to extract should be guided by theory, but also
informed by running the analysis extracting different numbers of factors and seeing which
number of factors yields the most interpretable results.
In this example we have included many options, including the original and reproduced
correlation matrix, the scree plot and the plot of the rotated factors. While you may not wish
to use all of these options, we have included them here to aid in the explanation of the
analysis. We have also created a page of annotated output for a principal components
analysis that parallels this analysis. For general information regarding the similarities and
differences between principal components analysis and factor analysis, see Tabachnick and
Fidell (2001), for example.

factor
 /variables item13 item14 item15 item16 item17 item18 item19 item20
  item21 item22 item23 item24
 /print initial det kmo repr extraction rotation fscore univariate
 /format blank(.30)
 /plot eigen rotation
 /criteria factors(3)
 /extraction paf
 /rotation varimax
 /method = correlation.

The table above is output because we used the univariate option on
the /print subcommand. Please note that the only way to see how many cases were
actually used in the factor analysis is to include the univariate option on
the /print subcommand. The number of cases used in the analysis will be less than the total
number of cases in the data file if there are missing values on any of the variables used in
the factor analysis, because, by default, SPSS does a listwise deletion of incomplete cases.
If the factor analysis is being conducted on the correlations (as opposed to the covariances),
it is not much of a concern that the variables have very different means and/or standard
deviations (which is often the case when variables are measured on different scales).
a. Mean - These are the means of the variables used in the factor analysis.
b. Std. Deviation - These are the standard deviations of the variables used in the factor
analysis.
c. Analysis N - This is the number of cases used in the factor analysis.

The table above is included in the output because we used the det option on
the /print subcommand. All we want to see in this table is that the determinant is not 0. If
the determinant is 0, then there will be computational problems with the factor analysis, and
SPSS may issue a warning message or be unable to complete the factor analysis.

a. Kaiser-Meyer-Olkin Measure of Sampling Adequacy - This measure varies between 0
and 1, and values closer to 1 are better. A value of .6 is a suggested minimum.
b. Bartlett's Test of Sphericity - This tests the null hypothesis that the correlation matrix is
an identity matrix. An identity matrix is a matrix in which all of the diagonal elements are 1 and
all off diagonal elements are 0. You want to reject this null hypothesis.
Taken together, these tests provide a minimum standard which should be passed before a
factor analysis (or a principal components analysis) should be conducted.

a. Communalities - This is the proportion of each variable's variance that can be explained
by the factors (e.g., the underlying latent continua). It is also noted as h2 and can be defined
as the sum of squared factor loadings for the variables.
b. Initial - With principal axis factoring, the initial values on the diagonal of the
correlation matrix are determined by the squared multiple correlation of the variable with the
other variables. For example, if you regressed item 13 on items 14 through 24, the squared
multiple correlation coefficient would be .564.
c. Extraction - The values in this column indicate the proportion of each variable's variance
that can be explained by the retained factors. Variables with high values are well
represented in the common factor space, while variables with low values are not well
represented. (In this example, we don't have any particularly low values.) They are the
reproduced variances from the factors that you have extracted. You can find these values
on the diagonal of the reproduced correlation matrix.

a. Factor - The initial number of factors is the same as the number of variables used in the
factor analysis. However, not all 12 factors will be retained. In this example, only the first
three factors will be retained (as we requested).
b. Initial Eigenvalues - Eigenvalues are the variances of the factors. Because we
conducted our factor analysis on the correlation matrix, the variables are standardized,
which means that each variable has a variance of 1, and the total variance is equal to the
number of variables used in the analysis, in this case, 12.
c. Total - This column contains the eigenvalues. The first factor will always account for the
most variance (and hence have the highest eigenvalue), and the next factor will account for
as much of the left over variance as it can, and so on. Hence, each successive factor will
account for less and less variance.
d. % of Variance - This column contains the percent of total variance accounted for by each
factor.
e. Cumulative % - This column contains the cumulative percentage of variance accounted
for by the current and all preceding factors. For example, the third row shows a value of
68.313. This means that the first three factors together account for 68.313% of the total
variance.
f. Extraction Sums of Squared Loadings - The number of rows in this panel of the table
corresponds to the number of factors retained. In this example, we requested that three
factors be retained, so there are three rows, one for each retained factor. The values in this
panel of the table are calculated in the same way as the values in the left panel, except that
here the values are based on the common variance. The values in this panel of the table will
always be lower than the values in the left panel of the table, because they are based on the
common variance, which is always smaller than the total variance.
g. Rotation Sums of Squared Loadings - The values in this panel of the table represent
the distribution of the variance after the varimax rotation. Varimax rotation tries to maximize
the variance of each of the factors, so the total amount of variance accounted for is
redistributed over the three extracted factors.

The scree plot graphs the eigenvalue against the factor number. You can see these values
in the first two columns of the table immediately above. From the third factor on, you can
see that the line is almost flat, meaning that each successive factor is accounting for smaller
and smaller amounts of the total variance.

b. Factor Matrix - This table contains the unrotated factor loadings, which are the
correlations between the variable and the factor. Because these are correlations, possible
values range from -1 to +1. On the /format subcommand, we used the option blank(.30),
which tells SPSS not to print any of the correlations that are .3 or less. This makes the
output easier to read by removing the clutter of low correlations that are probably not
meaningful anyway.
c. Factor - The columns under this heading are the unrotated factors that have been
extracted. As you can see by the footnote provided by SPSS (a.), three factors were
extracted (the three factors that we requested).

c. Reproduced Correlations - This table contains two tables, the reproduced correlations
in the top part of the table, and the residuals in the bottom part of the table.
d. Reproduced Correlation - The reproduced correlation matrix is the correlation matrix
based on the extracted factors. You want the values in the reproduced matrix to be as close
to the values in the original correlation matrix as possible. This means that the residual
matrix, which contains the differences between the original and the reproduced correlations,
should be close to zero. If the reproduced matrix is very similar to the original correlation matrix, then
you know that the factors that were extracted accounted for a great deal of the variance in
the original correlation matrix, and these few factors do a good job of representing the
original data. The numbers on the diagonal of the reproduced correlation matrix are
presented in the Communalities table in the column labeled Extraction.
e. Residual - As noted in the first footnote provided by SPSS (a.), the values in this part of
the table represent the differences between original correlations (shown in the correlation
table at the beginning of the output) and the reproduced correlations, which are shown in the
top part of this table. For example, the original correlation between item13 and item14 is .661, and the reproduced correlation between these two variables is .646. The residual is .016 = .661 - .646 (with some rounding error).
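The arithmetic behind the reproduced and residual matrices is easy to verify by hand. A minimal sketch, assuming R is the original correlation matrix and L holds the loadings of the retained factors:

import numpy as np

# Reproduced correlations implied by the retained factors; the diagonal
# holds the communalities shown in the Communalities table.
reproduced = L @ L.T
communalities = np.diag(reproduced)

# Residuals: original correlations minus reproduced correlations.
# Small residuals mean the retained factors represent the data well.
residuals = R - reproduced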

b. Rotated Factor Matrix - This table contains the rotated factor loadings (factor pattern
matrix), which represent both how the variables are weighted for each factor and the
correlation between the variables and the factor. Because these are correlations, possible
values range from -1 to +1. On the /format subcommand, we used the option blank(.30),
which tells SPSS not to print any of the correlations that are .3 or less. This makes the
output easier to read by removing the clutter of low correlations that are probably not
meaningful anyway.
For orthogonal rotations, such as varimax, the factor pattern and factor structure matrices
are the same.

c. Factor - The columns under this heading are the rotated factors that have been
extracted. As you can see by the footnote provided by SPSS (a.), three factors were
extracted (the three factors that we requested). These are the factors that analysts are most
interested in and try to name. For example, the first factor might be called "instructor
competence" because items like "instructor well prepare" and "instructor competence" load
highly on it. The second factor might be called "relating to students" because items like
"instructor is sensitive to students" and "instructor allows me to ask questions" load highly on
it. The third factor has to do with comparisons to other instructors and courses.

The table below is from another run of the factor analysis program shown above, except with
a promax rotation. We have included it here to show how different the rotated solutions can
be, and to better illustrate what is meant by simple structure. As you can see with an oblique
rotation, such as a promax rotation, the factors are permitted to be correlated with one
another. With an orthogonal rotation, such as the varimax shown above, the factors are not
permitted to be correlated (they are orthogonal to one another). Oblique rotations, such as
promax, produce both factor pattern and factor structure matrices. For orthogonal rotations,
such as varimax and equimax, the factor structure and the factor pattern matrices are the
same. The factor structure matrix represents the correlations between the variables and the
factors. The factor pattern matrix contains the coefficients for the linear combination of the
variables.

The table below indicates that the rotation done is an oblique rotation. If an orthogonal
rotation had been done (like the varimax rotation shown above), this table would not appear
in the output because the correlations between the factors are set to 0. Here, you can see
that the factors are highly correlated.

The rest of the output shown below is part of the output generated by the SPSS syntax
shown at the beginning of this page.

a. Factor Transformation Matrix - This is the matrix by which you multiply the unrotated
factor matrix to get the rotated factor matrix.

The plot above shows the items (variables) in the rotated factor space. While this picture
may not be particularly helpful, when you get this graph in the SPSS output, you can
interactively rotate it. This may help you to see how the items (variables) are organized in
the common factor space.

a. Factor Score Coefficient Matrix - This is the factor weight matrix and is used to
compute the factor scores.

a. Factor Score Covariance Matrix - Because we used an orthogonal rotation, this should
be a diagonal matrix, meaning that the same number should appear in all three places along
the diagonal. In actuality the factors are uncorrelated; however, because factor scores are
estimated there may be slight correlations among the factor scores.

Factor Analysis & SEM


Conduct and Interpret a Factor Analysis
What is the Factor Analysis?

The Factor Analysis is an explorative analysis. Much like the cluster
analysis grouping similar cases, the factor analysis groups similar
variables into dimensions. This process is also called identifying
latent variables. Since factor analysis is an explorative analysis it
does not distinguish between independent and dependent variables.
Factor Analysis reduces the information in a model by reducing the
dimensions of the observations. This procedure has multiple
purposes. It can be used to simplify the data, for example reducing
the number of variables in predictive regression models. If factor
analysis is used for these purposes, most often factors are rotated
after extraction. Factor analysis has several different rotation
methods, some of which ensure that the factors are orthogonal.
Then the correlation coefficient between two factors is zero, which
eliminates problems of multicollinearity in regression analysis.
Factor analysis is also used in theory testing to verify scale
construction and operationalizations. In such a case, the scale is
specified upfront and we know that a certain subset of the scale
represents an independent dimension within this scale. This form of
factor analysis is most often used in structural equation modeling
and is referred to as Confirmatory Factor Analysis. For example, we
know that the questions pertaining to the big five personality traits
cover all five dimensions (openness, conscientiousness, extraversion, agreeableness, and neuroticism). If we want to build a
regression model that predicts the influence of the personality
dimensions on an outcome variable, for example anxiety in public
places, we would start to model a confirmatory factor analysis of the
twenty questionnaire items that load onto five factors and then
regress onto an outcome variable.
Factor analysis can also be used to construct indices. The most
common way to construct an index is to simply sum up the items in
an index. In some contexts, however, some variables might have a
greater explanatory power than others. Also, sometimes similar
questions correlate so strongly that we can justify dropping one of the
questions completely to shorten questionnaires. In such a case, we
can use factor analysis to identify the weight each variable should
have in the index.
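As a concrete sketch of that weighting idea, the regression method for factor scores derives a weight for every item from the factor solution instead of weighting each item equally. This is one common approach among several, assuming Z (standardized item responses), R (their correlation matrix), and L (the loading matrix) are already computed:

import numpy as np

# Regression-method factor score weights: W = R^(-1) L.
# Column j of W holds the weight of every item for factor j, so the
# factor score is a weighted sum of items instead of a plain sum.
W = np.linalg.solve(R, L)

# Factor scores per respondent; these can serve directly as index values.
scores = Z @ W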
The Factor Analysis in SPSS

The research question we want to answer with our explorative factor
analysis is as follows:
What are the underlying dimensions of our standardized and
aptitude test scores? That is, how do aptitude and standardized
tests form performance dimensions?
The factor analysis can be found in Analyze/Dimension Reduction/Factor

In the dialog box of the factor analysis we start by adding our variables
(the standardized tests math, reading, and writing, as well as the
aptitude tests 1-5) to the list of variables.

In the dialog Descriptives we need to add a few statistics for which
we must verify the assumptions made by the factor analysis. If you
want the Univariate Descriptives that is your choice, but to verify
the assumptions we need the KMO and Bartlett's test of sphericity and the
Anti-Image Correlation matrix.

The dialog box Extraction allows us to specify the extraction method
and the cut-off value for the extraction. Let's start with the easy
one: the cut-off value. Generally, SPSS can extract as many factors
as we have variables. The eigenvalue is calculated for each factor
extracted. If the eigenvalue drops below 1, it means that the factor
explains less variance than a single variable would (all variables
are standardized to have mean = 0 and variance = 1). Thus we
want all factors that explain the model better than a single variable
would.

The more complex bit is the appropriate extraction method. Principal
Components (PCA) is the standard extraction method. It extracts
uncorrelated linear combinations of the variables. The first factor
has maximum variance. The second and all following factors explain
smaller and smaller portions of the variance and are all uncorrelated
with each other. It is very similar to Canonical Correlation Analysis.
Another advantage is that PCA can be used when a correlation
matrix is singular.
The second most common analysis is principal axis factoring, also
called common factor analysis, or principal factor analysis. Although
mathematically very similar to principal components, it is interpreted
differently: principal axis factoring identifies the latent constructs
behind the observations, whereas principal component analysis identifies
similar groups of variables.

Generally speaking, principal component analysis is preferred when
using factor analysis to reduce data, and principal axis factoring
when the interest is in the latent dimensions behind the variables.
In our research question we are interested in the dimensions behind
the variables, and therefore we are going to use Principal Axis Factoring.
The next step is to select a rotation method. After extracting the
factors, SPSS can rotate the factors to better fit the data. The most
commonly used method is Varimax. Varimax is an orthogonal rotation
method (that produces independent factors = no multicollinearity)
that minimizes the number of variables that have high loadings on
each factor. This method simplifies the interpretation of the factors.

A second, frequently used method is Quartimax. Quartimax rotates the
factors in order to minimize the number of factors needed to explain
each variable. This method simplifies the interpretation of the
observed variables.
Another method is Equamax. Equamax is a combination of
the Varimax method, which simplifies the factors, and
the Quartimax method, which simplifies the variables. The number of
variables that load highly on a factor and the number of factors
needed to explain a variable are minimized. We choose Varimax.
In the dialog box Options we can manage how missing values are
treated; it might be appropriate to replace them with the mean,
which does not change the correlation matrix but ensures that we
don't over-penalize missing values. Also, we can specify that in the
output we don't want to include all factor loadings. The factor
loading tables are much easier to interpret when we suppress small
factor loadings. The default value is 0.1 in most fields; it is appropriate
to increase this value to 0.4. The last step would be to save the
results in the Scores dialog. This calculates a value that every
respondent would have scored had they answered the factor's
questions (whatever they might be) instead. Before we save these
results to the data set, we should run the factor analysis first, check
all assumptions, ensure that the results are meaningful and what
we are looking for, and then re-run the analysis and save the factor scores.


Confirmatory factor analysis (CFA)

Confirmatory factor analysis (CFA) is a multivariate statistical
procedure that is used to test how well the measured variables
represent the number of constructs. Confirmatory factor analysis
(CFA) and exploratory factor analysis (EFA) are similar techniques, but in
exploratory factor analysis (EFA), data is simply explored and
provides information about the numbers of factors required to
represent the data. In exploratory factor analysis, all measured
variables are related to every latent variable. But in confirmatory
factor analysis (CFA), researchers can specify the number of factors
required in the data and which measured variable is related to which
latent variable. Confirmatory factor analysis (CFA) is a tool that is
used to confirm or reject the measurement theory.

General Purpose - Procedure


1. Defining individual constructs: First, we have to define the individual
constructs. The first step involves the procedure that defines constructs
theoretically. This involves a pretest to evaluate the construct items, and a
confirmatory test of the measurement model that is conducted using confirmatory
factor analysis (CFA), etc.
2. Developing the overall measurement model theory: In confirmatory factor
analysis (CFA), we should consider the concept of unidimensionality between
construct error variance and within construct error variance. At least four
constructs and three items per construct should be present in the research.
3. Designing a study to produce the empirical results: The measurement model
must be specified. Most commonly, the value of one loading estimate should be
one per construct. Two methods are available for identification; the first is rank
condition, and the second is order condition.
4. Assessing the measurement model validity: Assessing the measurement
model validity occurs when the theoretical measurement model is compared with
the reality model to see how well the data fits. To check the measurement model
validity, the number of indicators helps us. For example, the factor loading of the
latent variable should be greater than 0.7. The chi-square test and other goodness-of-fit
statistics like RMR, GFI, NFI, RMSEA, SIC, BIC, etc., are some key indicators that
help in measuring the model validity.
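None of the sources quoted here provide code, but the general procedure can be sketched in Python with the third-party semopy package. Everything below is a hypothetical illustration: the construct names, the item names x1 to x6, and the data file are assumptions, and a study following step 2 above would use at least four constructs rather than the two shown.

# Hypothetical CFA sketch using the third-party semopy package.
import pandas as pd
import semopy

# Two latent constructs, each measured by three assumed items x1..x6
# ("=~" defines which measured variables load on which latent variable).
description = """
Timeliness =~ x1 + x2 + x3
Accuracy =~ x4 + x5 + x6
"""

df = pd.read_csv("survey_responses.csv")  # hypothetical data file

model = semopy.Model(description)
model.fit(df)

print(model.inspect())           # loadings and factor covariances
print(semopy.calc_stats(model))  # chi-square, RMSEA, GFI, and other fit indices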

Questions a CFA answers


From my 20-question instrument, are the five factors clearly
identifiable constructs as measured by the four questions that
comprise each of them?
Do my four survey questions accurately measure one factor?
Assumptions

The assumptions of a CFA include multivariate normality, a sufficient
sample size (n > 200), the correct a priori model specification, and
data that come from a random sample.
Key Terms:

Theory: A systematic set of causal relationships that provide a comprehensive
explanation of a phenomenon.
Model: A specified set of dependent relationships that can be used to test the
theory.
Path analysis: Used to test structural equations.
Path diagram: Shows the graphical representation of the cause and effect
relationships of the theory.
Endogenous variable: The resulting variables of a causal relationship.
Exogenous variable: The predictor variables.
Confirmatory analysis: Used to test the pre-specified relationship.
Cronbach's alpha: Used to measure the reliability of two or more construct
indicators.
Identification: Used to test whether or not there are a sufficient number of
equations to solve for the unknown coefficients. Identifications are of three types: (1)
under-identified, (2) exactly identified, and (3) over-identified.
Goodness of fit: The degree to which the observed input matrix is predicted by
the estimated model.
Latent variables: Variables that are inferred, not directly observed, from other
variables that are observed.

Confirmatory factor analysis (CFA) and statistical software:

Usually, statistical software like AMOS, LISREL, EQS and SAS are used
for confirmatory factor analysis. In AMOS, visual paths are manually
drawn on the graphic window and analysis is performed. In LISREL,
confirmatory factor analysis can be performed graphically as well as
from the menu. In SAS, confirmatory factor analysis can be
performed by using the programming languages.

Factor analysis

Factor analysis is a technique that is used to reduce a large number
of variables into a fewer number of factors. This technique extracts
maximum common variance from all variables and puts them into a
common score. As an index of all variables, we can use this score
for further analysis. Factor analysis is part of the general linear model
(GLM) and this method also assumes several things: there is a
linear relationship, there is no multicollinearity, it includes relevant
variables in the analysis, and there is a true correlation between
variables and factors. Several methods are available, but principal
component analysis is used most commonly.

Types of factoring:

There are different types of methods used to extract factors from
the data set:
1. Principal component analysis: This is the most common method
used by researchers. PCA starts by extracting the maximum variance
and putting it into the first factor. After that, it removes the
variance explained by the first factor and then starts extracting the
maximum variance for the second factor. This process continues to the
last factor.
2. Common factor analysis: The second most preferred method by
researchers, it extracts the common variance and puts it into
factors. This method does not include the unique variance of the
variables. This method is used in SEM.
3. Image factoring: This method is based on the correlation matrix. OLS
regression is used to predict the factors in image factoring.
4. Maximum likelihood method: This method also works on the correlation
matrix, but it uses the maximum likelihood method to extract factors.
5. Other methods of factor analysis: alpha factoring and weighted least
squares are other, regression-based methods that can be used for factoring.
Factor loading:

Factor loading is basically the correlation coefficient for the variable
and factor. Factor loading shows the variance explained by the
variable on that particular factor. In the SEM approach, as a rule of
thumb, a factor loading of 0.7 or higher indicates that the factor
extracts sufficient variance from that variable.

Eigenvalues:

Eigenvalues are also called characteristic roots. The eigenvalue
shows the variance explained by that particular factor out of the
total variance. From the communality column, we can know how
much variance is explained by the first factor out of the total
variance. For example, if our first factor explains 68% of the variance
out of the total, this means that 32% of the variance will be explained
by the other factors.
Factor score: The factor score is also called the component score.
This score is of all row and columns, which can be used as an index
of all variables and can be used for further analysis. We can
standardize this score by multiplying a common term. With this
factor score, whatever analysis we will do, we will assume that all
variables will behave as factor scores and will move.
Criteria for determining the number of factors:

According to the Kaiser criterion, the eigenvalue is a good criterion
for determining a factor: if the eigenvalue is greater than one, we
should consider it a factor, and if it is less than one, we should not
consider it a factor. According to the variance extraction rule, the
variance extracted should be more than 0.7; if it is less than 0.7,
then we should not consider it a factor.
Rotation method:

The rotation method makes it more reliable to understand the output.
Eigenvalues do not affect the rotation method, but the rotation
method affects the eigenvalues or the percentage of variance
extracted. There are a number of rotation methods available: (1) no
rotation, (2) varimax rotation, (3) quartimax rotation, (4) direct
oblimin rotation, and (5) promax rotation. Each of these can be
easily selected in SPSS, and we can compare the variance explained
by each particular method.

Assumptions:
1. No outliers: Assume that there are no outliers in the data.
2. Adequate sample size: The number of cases must be greater than the number of factors.
3. No perfect multicollinearity: Factor analysis is an interdependency technique.
There should not be perfect multicollinearity between the variables.
4. Homoscedasticity: Since factor analysis is a linear function of measured
variables, it does not require homoscedasticity between the variables.
5. Linearity: Factor analysis is also based on the linearity assumption. Non-linear
variables can also be used; after transformation, however, they change into
linear variables.
6. Interval data: Interval data are assumed.

Key concepts and terms:

Exploratory factor analysis: Assumes that any indicator or variable
may be associated with any factor. This is the most common factor
analysis used by researchers and it is not based on any prior theory.

Confirmatory factor analysis (CFA): Used to determine the factors and
factor loadings of measured variables, and to confirm what is
expected on the basis of pre-established theory. CFA assumes that
each factor is associated with a specified subset of measured
variables. It commonly uses two approaches:

1. The traditional method: The traditional factor method is based on the principal
factor analysis method rather than common factor analysis. The traditional method
allows the researcher to know more about insight factor loading.
2. The SEM approach: CFA is an alternative approach of factor analysis which can
be done in SEM. In SEM, we will remove all straight arrows from the latent
variable, and add only the arrow which has to observe the variable representing
the covariance between every pair of latents. We will also leave the straight
arrows error free and disturbance terms to their respective variables. If the
standardized error term in SEM is less than two in absolute value, then it is
assumed good for that factor, and if it is more than two, it means that there is still
some unexplained variance which can be explained by a factor. Chi-square and a
number of other goodness-of-fit indexes are used to test how well the model fits.


Principal Component Analysis (PCA)

There are two basic approaches to factor analysis: principal component
analysis (PCA) and common factor analysis. Overall, factor analysis
involves techniques to help produce a smaller number of linear
combinations of variables so that the reduced variables account for and
explain most of the variance in the correlation matrix pattern. Principal
component analysis is an approach to factor analysis that considers the
total variance in the data, which is unlike common factor analysis, and
transforms the original variables into a smaller set of linear combinations.
The diagonal of the correlation matrix consists of unities and the full
variance is brought into the factor matrix. The term factor matrix is the
matrix that contains the factor loadings of all the variables on all the
factors extracted. The term factor loading refers to the simple correlations
between the factors and the variables. Principal component analysis is
recommended when the researcher's primary concern is to determine the
minimum number of factors that will account for the maximum variance in
the data in use in the particular multivariate analysis, as in Delphi
studies. While conducting principal component analysis, the researcher
can become well versed with terms such as standard deviations and
eigenvalues. The eigenvalues refer to the total variance explained by
each factor. The standard deviation measures the variability of the data.
The task of principal component analysis is to identify the patterns in the
data and to direct the data by highlighting their similarities and
differences.
Questions Answered:
What survey questions should be grouped together that best measure X,
Y, and Z domains?
Should sections X and Y account for any variance in Z domain?
Assumptions:
Sample size: Ideally, there should be 150+ cases and a ratio of at least
five cases for each variable (Pallant, 2010).
Correlations: There should be some correlation among the variables for
PCA to be appropriate.
Linearity: It is assumed that the relationships between the variables are
linear.
Outliers: PCA is sensitive to outliers; they should be removed.

To conduct this using SPSS, first click Analyze, then select Dimension
Reduction, and then Factor.
Select all required variables and move them into the Variables box.
You can request Descriptives if desired.
Click on the Extraction button and make sure Principal components is
checked under the Method section.

Principal Component Analysis


Explained Visually
By Victor Powell
with text by Lewis Lehe
Principal component analysis (PCA) is a technique used to emphasize variation
and bring out strong patterns in a dataset. It's often used to make data easy to
explore and visualize.
2D example
First, consider a dataset in only two dimensions, like (height, weight). This
dataset can be plotted as points in a plane. But if we want to tease out variation,
PCA finds a new coordinate system in which every point has a new (x,y) value.
The axes don't actually mean anything physical; they're combinations of height
and weight called "principal components" that are chosen to give one axis lots
of variation. Drag the points around in the following visualization to see how the
PC coordinate system adjusts.
[Interactive figure: the original data set plotted on x and y axes, alongside the output from PCA plotted on pc1 and pc2 axes.]

PCA is useful for eliminating dimensions. Below, we've plotted the data along a
pair of lines: one composed of the x-values and another of the y-values.
If we're going to only see the data along one dimension, though, it might be
better to make that dimension the principal component with most variation. We
don't lose much by dropping PC2 since it contributes the least to the variation in
the data set.
[Figure: the same data re-plotted along pc1 and pc2 separately.]
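The 2D example is easy to reproduce numerically. A minimal scikit-learn sketch on synthetic height and weight data (an assumption, since the interactive demo's data is not available here):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
height = rng.normal(170, 10, size=200)               # cm
weight = 0.9 * height - 90 + rng.normal(0, 5, 200)   # correlated with height
X = np.column_stack([height, weight])

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # each point's new (pc1, pc2) coordinates

# Most of the variation lands on PC1, so little is lost by dropping PC2.
print(pca.explained_variance_ratio_)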
3D example
With three dimensions, PCA is more useful, because it's hard to see through a
cloud of data. In the example below, the original data are plotted in 3D, but you
can project the data into 2D through a transformation no different than finding a
camera angle: rotate the axes to find the best angle. To see the "official" PCA
transformation, click the "Show PCA" button. The PCA transformation ensures
that the horizontal axis PC1 has the most variation, the vertical axis PC2 the
second-most, and a third axis PC3 the least. Obviously, PC3 is the one we drop.
Eating in the UK (a 17D example)
Original example from Mark Richardson's class notes Principal Component
Analysis
What if our data have way more than 3-dimensions? Like, 17 dimensions?! In
the table is the average consumption of 17 types of food in grams per person per
week for every country in the UK.
The table shows some interesting variations across different food types, but
overall differences aren't so notable. Let's see if PCA can eliminate dimensions
to emphasize how countries differ.
[Table: average weekly consumption (grams per person) for England, N Ireland, Scotland, and Wales of 17 food types: alcoholic drinks, beverages, carcase meat, cereals, cheese, confectionery, fats and oils, fish, fresh fruit, fresh potatoes, fresh veg, other meat, other veg, processed potatoes, processed veg, soft drinks, and sugars.]
Here's the plot of the data along the first principal component. Already we can
see something is different about Northern Ireland.

[Figure: the four countries plotted along pc1 (roughly -300 to 500); Northern Ireland sits apart from England, Wales, and Scotland.]
Now, looking at the first and second principal components, we see that Northern Ireland is a
major outlier. Once we go back and look at the data in the table, this makes
sense: the Northern Irish eat way more grams of fresh potatoes and way fewer
of fresh fruits, cheese, fish and alcoholic drinks. It's a good sign that the structure
we've visualized reflects a big fact of real-world geography: Northern Ireland is
the only one of the four countries not on the island of Great Britain. (If you're
confused about the differences among England, the UK and Great Britain,
see: this video.)

Principal Component Analysis (PCA) and Common Factor Analysis (CFA) are distinct methods. Often, they produce similar results and
PCA is used as the default extraction method in the SPSS Factor Analysis routines. This undoubtedly results in a lot of confusion about
the distinction between the two.
The bottom line is that these are two different models, conceptually. In PCA, the components are actual orthogonal linear combinations
that maximize the total variance. In FA, the factors are linear combinations that maximize the shared portion of the variance--underlying
"latent constructs". That's why FA is often called "common factor analysis". FA uses a variety of optimization routines and the result,
unlike PCA, depends on the optimization routine used and starting points for those routines. Put simply, there is not a single unique solution.
In R, the factanal() function provides CFA with a maximum likelihood extraction. So, you shouldn't expect it to reproduce an SPSS result
which is based on a PCA extraction. It's simply not the same model or logic. I'm not sure if you would get the same result if you used
SPSS's Maximum Likelihood extraction either as they may not use the same algorithm.
For better or for worse, you can reproduce in R the mixed-up "factor analysis" that SPSS provides as its default: extract the principal components of the correlation matrix and scale them into loadings. That reproduces the SPSS Principal Component "Factor Analysis" result (with the exception of the sign, which is indeterminate). That result could also then be rotated using any of R's available rotation methods.
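The R code referenced above is not reproduced here, but the computation it performs, SPSS's default principal-component "factor" loadings, is short. A sketch of the equivalent in Python (not the answerer's actual code):

import numpy as np

def spss_style_pca_loadings(R_mat, n_keep):
    # Principal-component "factor" loadings from a correlation matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(R_mat)
    order = np.argsort(eigenvalues)[::-1][:n_keep]
    # Loadings are eigenvectors scaled by sqrt(eigenvalue); each column's
    # sign is indeterminate, as the quoted answer notes.
    return eigenvectors[:, order] * np.sqrt(eigenvalues[order])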
Principal component analysis involves extracting linear composites of observed variables.
Factor analysis is based on a formal model predicting observed variables from theoretical latent factors.
In psychology these two techniques are often applied in the construction of multi-scale tests to determine which items load on which
scales. They typically yield similar substantive conclusions (for a discussion see Comrey (1988) Factor-Analytic Methods of Scale
Development in Personality and Clinical Psychology). This helps to explain why some statistics packages seem to bundle them
together. I have also seen situations where "principal component analysis" is incorrectly labelled "factor analysis".
In terms of a simple rule of thumb, I'd suggest that you:
1. Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.
2. Run principal component analysis if you want to simply reduce your correlated observed variables to a smaller set of important independent composite variables.

The difference is that factor analysis allows the noise to have an arbitrary diagonal covariance matrix, while PCA assumes
the noise is spherical. ... The aim of principal component analysis is to explain the variance while factor analysis explains
the covariance between the variables.
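That distinction is visible directly in scikit-learn, where FactorAnalysis estimates a separate noise variance for every variable while PCA does not. A minimal sketch on placeholder data:

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

X = np.random.default_rng(1).normal(size=(300, 5))  # placeholder data

pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2).fit(X)

print(pca.components_)     # orthogonal directions maximizing total variance
print(fa.components_)      # loadings of a latent-variable model
print(fa.noise_variance_)  # per-variable (diagonal) noise; PCA has no counterpart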

What is principal components analysis?


Principal components analysis is a procedure for identifying a smaller number of uncorrelated variables, called
"principal components", from a large set of data. The goal of principal components analysis is to explain the maximum
amount of variance with the fewest number of principal components. Principal components analysis is commonly used
in the social sciences, market research, and other industries that use large data sets.
Principal components analysis is commonly used as one step in a series of analyses. You can use principal components
analysis to reduce the number of variables and avoid multicollinearity, or when you have too many predictors relative
to the number of observations.

Example
A consumer products company wants to analyze customer responses to several characteristics of a new shampoo:
color, smell, texture, cleanliness, shine, volume, amount needed to lather, and price. They perform a principal
components analysis to determine whether they can form a smaller number of uncorrelated variables that are easier to
interpret and analyze. The results identify the following patterns:

Color, smell, and texture form a "Shampoo quality" component.

Cleanliness, shine, and volume form an "Effect on hair" component.

Amount needed to lather and price form a "Value" component.


Factor analysis is a method for explaining the structure of data by explaining the correlations between variables. Factor
analysis summarizes data into a few dimensions by condensing a large number of variables into a smaller set of latent
variables or factors. It is commonly used in the social sciences, market research, and other industries that use large
data sets.
Consider a credit card company that creates a survey to assess customer satisfaction. The survey is designed to
answer questions in three categories: timeliness of service, accuracy of the service, and courteousness of phone
operators. The company can use factor analysis to ensure that the survey items address these three areas before
sending the survey to a large number of customers. If the survey does not adequately measure the three factors, then
the company should reevaluate the questions and retest the survey before sending it to customers.


Perform a factor analysis


To perform factor analysis, you need to decide how many factors to use and determine the loadings that make the most sense for your data.
1. Decide how many factors to use. The choice of the number of factors is often based on the proportion of variance explained by the factors, subject matter knowledge, and reasonableness of the solution.
   a. Try using the principal components extraction method without specifying the number of components.
   b. Examine the proportion of variability explained by different factors and narrow down your choice of how many factors to use. A scree plot can be useful here in visually assessing the importance of factors.
   c. Examine the fits of the different factor analyses. Communality values, the proportion of variability of each variable explained by the factors, can be especially useful in comparing fits. You might decide to add a factor if it contributes to the fit of certain variables.
   d. Try the maximum likelihood estimation method of extraction as well.
2. Evaluate your solution by trying multiple rotations. Johnson and Wichern suggest the varimax rotation. A similar result from different methods can lend credence to the solution you have selected. At this point you might want to interpret the factors using your knowledge of the data. (A sketch of this workflow in R follows below.)
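A minimal R sketch of this workflow, assuming the items are in a data frame dat (a hypothetical name). factanal() reports uniquenesses, so communalities are one minus the uniquenesses:

```r
# Step 1: screen the eigenvalues to narrow down the number of factors.
ev <- eigen(cor(dat))$values
cumsum(ev) / sum(ev)                      # cumulative proportion of variance explained

# Steps 1c and 1d: fit maximum likelihood factor models and compare communalities.
fit2 <- factanal(dat, factors = 2, rotation = "varimax")
fit3 <- factanal(dat, factors = 3, rotation = "varimax")
1 - fit2$uniquenesses                     # communalities under 2 factors
1 - fit3$uniquenesses                     # does the extra factor help certain variables?

# Step 2: try more than one rotation and compare the solutions.
factanal(dat, factors = 2, rotation = "promax")
```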

Get eigenvalues
What is an eigenvalue?
Eigenvalues (also called characteristic values or latent roots) are the variances of the principal components. They are
displayed by default in the Session window output under the table with the factor loadings.
NOTE: Minitab calculates eigenvalues only when you choose principal components as the method of extraction.

Get eigenvalues for principal components by using only a correlation matrix or a covariance matrix

Use Factor Analysis instead of Principal Components Analysis to get the eigenvalues used in Principal Components Analysis. Suppose the covariance matrix is in columns C1-C3:
1. Choose Data > Copy > Columns to Matrix.
2. In Copy from columns, enter C1-C3.
3. For In current worksheet, in matrix, enter M1. Click OK.
4. Choose Stat > Multivariate > Factor Analysis.
5. Click Options.
6. For Matrix to Factor, choose Covariance.
7. For Source of Matrix, choose Use matrix: and enter M1.
8. Click OK in each dialog box.
In the output, the eigenvalues are under Variance (in Factor Analysis, the eigenvalues are the variances of the principal components).

Get eigenvalues for a matrix that was factored in factor analysis

You can use either Factor Analysis or Eigen Analysis to get the eigenvalues.

Store the eigenvalues using Factor Analysis.
a. Choose Stat > Multivariate > Factor Analysis.
b. Click Storage.
c. In the field beside Eigenvalues, enter a column in which to store the eigenvalues. Underneath, enter a matrix in which to store the eigenvectors of the matrix that was factored.
Minitab stores eigenvalues in numerical order from largest to smallest.

Store the eigenvalues using Eigen Analysis. Suppose your variables are in columns C1-C5, and you want to store the eigenvalues in column C6:
a. Choose Stat > Basic Statistics > Correlation.
b. In Variables, enter C1-C5.
c. Select Store matrix (display nothing). Click OK.
d. Choose Calc > Matrices > Eigen Analysis.
e. In Analyze matrix, enter CORR1.
f. In Column of eigenvalues, enter C6. Click OK.
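For comparison, the same eigenvalues can be obtained directly in R (a sketch; dat and covmat are hypothetical names for a data frame and a stored covariance matrix):

```r
# Eigenvalues of the correlation matrix (what Minitab labels "Variance").
eigen(cor(dat))$values

# Or, starting from an existing covariance matrix:
eigen(covmat)$values   # returned from largest to smallest, like Minitab's storage
```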

Perform principal components analysis by using varimax rotation


To perform a principal components analysis using varimax rotation, perform Principal Components, storing the coefficients, and then perform Factor Analysis. (An R analogue is sketched after the steps.)
Suppose you want to calculate 3 components. Your data are in columns C1-C20 and columns C21-C23 are empty.
1. Perform Principal Components.
   a. Select Stat > Multivariate > Principal Components.
   b. In Variables, enter C1-C20.
   c. In Number of components to compute, enter 3.
   d. Click Storage.
   e. In Coefficients, enter C21-C23.
   f. Click OK in each dialog box.
2. Perform Factor Analysis.
   a. Select Stat > Multivariate > Factor Analysis.
   b. Select Varimax.
   c. Click Options.
   d. Select Use loadings and enter C21-C23.
   e. Click OK in each dialog box.
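The equivalent two-step computation in R might look like this (a sketch; dat is a hypothetical data frame holding the 20 variables):

```r
# Step 1: principal components on standardized variables.
pc <- prcomp(dat, scale. = TRUE)

# Scale the first 3 coefficient vectors into loadings before rotating,
# mirroring Minitab's use of the stored coefficients.
load3 <- pc$rotation[, 1:3] %*% diag(pc$sdev[1:3], nrow = 3)

# Step 2: varimax-rotate the retained loadings.
varimax(load3)
```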

Nonuniqueness of coefficients in principal components


The coefficients are unique (except for a change in sign) if the eigenvalues are unique and not zero. If an eigenvalue is
repeated, then the "space spanned" by all the principal component vectors corresponding to the same eigenvalue is
unique, but the individual vectors are not. Therefore, the coefficients that Minitab prints and those in a book or a
different application might not agree, though the eigenvalues (variances) will always be the same.
If the covariance matrix has rank r < p, where p is the number of variables, then there will be p - r eigenvalues equal to
zero. Eigenvectors corresponding to these eigenvalues might not be unique. This can occur if the number of
observations is less than p or if there is multicollinearity.


Should I use a correlation matrix or a covariance matrix?


Use the correlation matrix to calculate the principal components if the variables are measured on different scales and you want to standardize them, or if their variances differ widely. In all other situations you can use either the covariance or the correlation matrix.
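In R, the same choice comes down to a single argument to prcomp() (a sketch; dat is a hypothetical data frame):

```r
prcomp(dat, scale. = TRUE)    # correlation matrix: variables are standardized first
prcomp(dat, scale. = FALSE)   # covariance matrix: raw variances are kept
```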

Ways to determine the number of principal components

You can determine the number of principal components using several approaches. These approaches can detect a
different number of components for the same data. In such cases, choose the most interpretable and logical solution
for your data.
Kaiser method
Retain components with eigenvalues greater than 1.
Scree test
The ideal pattern in a scree plot is a steep curve, followed by a bend and then a flat or horizontal line. Retain
those components or factors in the steep curve before the first point that starts the flat line trend. You might
have difficulty interpreting a scree plot. Use your knowledge of the data and the results from the other
methods of selecting components or factors to help decide the number of important components or factors.
Percentage of variation explained
Retain components that cumulatively explain a certain percentage of variation. The acceptable level of
explained variance depends on how you use Principal Components. For descriptive purposes, you might only
need 80% of the variance explained. However, if you are doing other analyses on these data, you might want
to have at least 90% of the variance explained.
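All three checks are easy to compute directly; here is a sketch in R (dat is a hypothetical data frame of the observed variables):

```r
ev <- eigen(cor(dat))$values

sum(ev > 1)                               # Kaiser method: components with eigenvalue > 1
plot(ev, type = "b",
     xlab = "Component number",
     ylab = "Eigenvalue")                 # scree test: look for the bend in the curve
which(cumsum(ev) / sum(ev) >= 0.80)[1]    # smallest component count explaining >= 80%
```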

Methods for orthogonal rotation


An orthogonal rotation rotates the axes to give you a different perspective. The goal of rotation is to obtain a simpler factor loading pattern that is easier to interpret than the original factor pattern. The communalities are unchanged from the unrotated to the rotated solution.
There are four methods to orthogonally rotate the initial factor loadings found by either principal components or maximum likelihood extraction: equimax, varimax, quartimax, and orthomax. Each method maximizes a rotation criterion containing a parameter, gamma, whose value is determined by the method. A method with a low value of gamma tends to simplify the rows of the loadings; a method with a high value of gamma tends to simplify the columns of the loadings. The following table summarizes the rotation methods.

| Rotation Method | Goal | Gamma |
| --- | --- | --- |
| equimax | To rotate the loadings so that a variable loads high on one factor but low on the others. | number of factors / 2 |
| varimax | To maximize the variance of the squared factor loadings in each factor; that is, to simplify the columns of the factor loading matrix. In each factor the large loadings are increased and the small ones decreased, so that each factor has only a few variables with large loadings. | 1 |
| quartimax | To maximize the variance of the squared factor loadings in each variable; that is, to simplify the rows of the factor loading matrix. In each variable the large loadings are increased and the small ones decreased, so that each variable loads on only a few factors. | 0 |
| orthomax | User determined, based on the specified value of gamma. | 0-1 |
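All four methods are special cases of the orthomax family (a standard result, stated here to make the Gamma column concrete). With loadings $\lambda_{ij}$ for $p$ variables and $m$ factors, the rotation chooses the orthogonal transformation maximizing

$$ Q(\gamma) = \sum_{j=1}^{m}\left[\,\sum_{i=1}^{p}\lambda_{ij}^{4} \;-\; \frac{\gamma}{p}\left(\sum_{i=1}^{p}\lambda_{ij}^{2}\right)^{\!2}\right], $$

so that $\gamma = 0$ gives quartimax, $\gamma = 1$ gives varimax, and $\gamma = m/2$ gives equimax.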

What is a factor score?


Factor scores are estimated values of the factors in factor analysis. Minitab calculates factor scores by multiplying
factor score coefficients and your data after Minitab has centered them by subtracting means. Factor scores are used
to examine the behavior of observations and in other analyses such as regression or MANOVA.
For example, job applicants were measured on 12 different characteristics: academic record, appearance, communication, company fit, experience, job fit, letter of interest, likeability, organization, potential, resume, and self-confidence. You want to perform a factor analysis to determine what factors underlie the data. To calculate a factor score, multiply the coefficients by your data. For example, here is the formula to calculate the factor scores for the first factor:
0.482 Academic record + 0.028 Appearance - 0.004 Communication - 0.081 Company Fit + 0.209 Experience - 0.143 Job Fit - 0.030 Letter - 0.065 Likeability - 0.113 Organization + 0.594 Potential - 0.104 Resume - 0.119 Self-confidence
You must standardize the variables to obtain the correct factor scores.
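A sketch of the same computation in R, assuming coefs is the 12 x k matrix of factor score coefficients and dat holds the 12 measured characteristics (both hypothetical names):

```r
# Center and standardize the variables, then weight them by the coefficients.
z <- scale(dat)            # subtract means and divide by standard deviations
scores <- z %*% coefs      # one column of factor scores per factor
```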

What are factor loadings?


Factor loadings represent how much a factor explains a variable in factor analysis.
For example, a credit card company creates a survey to assess customer satisfaction. The survey is designed to answer
questions in three categories: timeliness of service, accuracy of the service, and courteousness of phone operators. For
each survey question, examine the highest (positive or negative) loadings to determine which factor affects that
question the most. In the following table, questions 1-3 load on factor 1, questions 4-5 load on factor 2, and questions
6-8 load on factor 3.

Variable

Question 1

Question 2

Question 3

Question 4

Question 5

Variable

Question 6

Question 7

Question 8

Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the factor strongly affects the variable. Loadings close to zero indicate that the factor has a weak effect on the variable.
Examine the loading pattern in the Minitab factor analysis output to determine the factor that has the largest effect on each variable. Some variables might have high loadings on multiple factors.

What is a factor coefficient?


Factor coefficients identify the relative weight of each variable in a component in factor analysis. The larger the absolute value of the coefficient, the more important the corresponding variable is in calculating the component. Minitab uses the factor score coefficients to calculate the factor scores.
The matrix of factor score coefficients is
L (L'L)^-1
where L is the matrix of factor loadings.
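As a quick sketch in R (L is a hypothetical matrix of factor loadings):

```r
# Factor score coefficients from the loadings: L (L'L)^-1
coefs <- L %*% solve(t(L) %*% L)
```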


What is Mahalanobis distance?


Mahalanobis distance is the distance between a data point and the centroid (overall mean) of a multivariate space. Use the Mahalanobis distance in principal components analysis to identify outliers. It is a more powerful multivariate method for detecting outliers than examining one variable at a time because it accounts for the different scales of the variables and the correlations between them.

[Scatterplot not reproduced.] The circled data point does not fit the correlation structure of the two variables; examined individually, neither its x-value nor its y-value is unusual. Nevertheless, the Mahalanobis distance for this point is unusually large.
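R has a built-in function for this; a minimal sketch, with dat again a hypothetical data frame of numeric variables:

```r
# Squared Mahalanobis distance of each observation from the centroid.
d2 <- mahalanobis(dat, center = colMeans(dat), cov = cov(dat))

# A common rule of thumb flags observations beyond a chi-square cutoff.
which(d2 > qchisq(0.975, df = ncol(dat)))
```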

In discriminant analysis, Minitab uses the Mahalanobis distance to classify observations into their predicted groups: each observation is assigned to the group whose centroid is the smallest distance away.

Display Mahalanobis distance


You can display the Mahalanobis distance using either Principal Components or Discriminant Analysis.
NOTE: In discriminant analysis, Minitab uses the pooled covariance matrix to calculate the Mahalanobis distance, which accounts for the group that each observation is classified into. Because principal components analysis does not classify observations into groups, it uses the covariance matrix of all the data.

Display the Mahalanobis distance between an observation and the centroid using Principal Components.
a. Choose Stat > Multivariate > Principal Components and click Storage.
b. In Distances, enter the column in which you want to store the distances.
c. Click OK in each dialog box.

Display the Mahalanobis distance between an observation and the group centroid using Discriminant Analysis.
a. Choose Stat > Multivariate > Discriminant Analysis and click Options.
b. Under Display of Results, choose Above plus complete classification summary. Click OK.
In the Summary of Classified Observations table, the Squared Distance is the Mahalanobis distance (D squared) statistic, calculated for each observation from each group centroid.

What is a Heywood case?


A Heywood case occurs in factor analysis when the iterative maximum likelihood estimation method converges to unique (specific) variance values that are less than a fixed lower bound. Minitab sets these unique variances equal to 0 and their corresponding communalities equal to 1. Heywood cases occur frequently when too many factors are extracted or the sample size is too small.

When a Heywood case occurs, Minitab displays * NOTE * Heywood case in the Session window output.

What is a scree plot?


A scree plot displays the eigenvalues associated with a component or factor in descending order versus the number of
the component or factor. You can use scree plots in principal components analysis and factor analysis to visually assess
which components or factors explain most of the variability in the data.

[Scree plot not reproduced.] A factor analysis was conducted on 12 different characteristics of job applicants. The scree plot shows that 5 of those factors explain most of the variability because the line starts to straighten after factor 5. The remaining factors explain a very small proportion of the variability and are likely unimportant.

The ideal pattern in a scree plot is a steep curve, followed by a bend and then a flat or horizontal line. Retain those
components or factors in the steep curve before the first point that starts the flat line trend. You might have difficulty
interpreting a scree plot. Use your knowledge of the data and the results from the other approaches of selecting
components or factors to help decide the number of important components or factors.

Choose an extraction method for factor analysis


Minitab offers two extraction methods for factor analysis: principal components and maximum likelihood. When
performing factor analysis:

If the factors and the errors obtained after fitting the factor model are assumed to follow a normal
distribution, use the maximum likelihood method to obtain maximum likelihood estimates of the factor
loadings.

If the factors and errors obtained after fitting the factor model are not assumed to follow a normal
distribution, use the principal components method.

What are the differences between principal components analysis and factor
analysis?
Principal Components Analysis and Factor Analysis are similar because both procedures are used to simplify the
structure of a set of variables. However, the analyses differ in several important ways:
1. In Minitab, you can only enter raw data when using Principal Components Analysis. However, you can enter raw data, a correlation or covariance matrix, or the loadings from a previous analysis when using Factor Analysis.
2. In Principal Components Analysis, the components are calculated as linear combinations of the original variables. In Factor Analysis, the original variables are defined as linear combinations of the factors.
3. In Principal Components Analysis, the goal is to explain as much of the total variance in the variables as possible. The goal in Factor Analysis is to explain the covariances or correlations between the variables.
4. Use Principal Components Analysis to reduce the data into a smaller number of components. Use Factor Analysis to understand what constructs underlie the data.

The two analyses are often conducted on the same data. For example, you can conduct a principal components
analysis to determine the number of factors to extract in a factor analytic study.
