Reduce survey data into factors that account for maximum variance
Principal Components Analysis (PCA) uses a mathematical algorithm to "reduce" data into
inter-correlated "factors" that provide a conceptual and mathematical
understanding of the construct of interest.
Going back to the construct specification and the survey items, everything has
been focused on measuring one construct related to answering the research
question. Under the assumption that researchers are measuring a single
construct, the individual items should correlate in some form or fashion.
These inter-correlations among different sets of survey items (or content
areas) provide a mathematical basis for understanding latent or underlying
relationships that may exist. Principal Components Analysis (PCA) reduces
survey data down into content areas that account for the most variance.
The process of conducting a Principal Components Analysis
A Principal Components Analysis is a three-step process (a brief R sketch follows the list):
1. The inter-correlations amongst the items are calculated yielding a correlation
matrix.
2. The inter-correlated items, or "factors," are extracted from the correlation
matrix to yield "principal components."
3. These "factors" are rotated for purposes of analysis and interpretation.
At this point, the researcher has to make a decision about how to move forward.
Luckily, there are two statistical calculations that help you make this
decision: eigenvalues and scree plots.
An eigenvalue is essentially a ratio of the shared variance to the unique variance
accounted for in the construct of interest by each "factor" yielded from the
extraction of principal components. An eigenvalue of 1.0 or greater is the
arbitrary criterion accepted in the current literature for deciding if a factor
should be further interpreted. The logic underlying the criterion of 1.0 comes
from the belief that the amount of shared variance explained by a "factor"
should at least be equal to the unique variance the "factor" accounts for in the
overall construct.
Scree plots provide a visual aid in deciding how many "factors" should be
interpreted from the principal components extraction. In a scree plot,
the eigenvalues are plotted against the order of "factors" extracted from the data.
Because the first "factors" extracted from the principal components analysis
often have the highest inter-correlations amongst their individual survey items,
and will thus account for more overall variance in your construct of interest,
they tend to be extracted first. As other "factors" are extracted, the intercorrelations will become weaker and have smaller eigenvalues. One can look at
a scree plot and see a visually significant decrease at one point in time as
eigenvalues decrease. This "elbow" or factor at which the screen plot has a
significant reduction in eigenvalue and then level's off is often considered the
criterion for selecting the number of "factors" to interpret.
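As a rough illustration of both decision aids, the base-R sketch below computes eigenvalues from a correlation matrix, applies the 1.0 criterion, and draws a scree plot. The simulated data and the eight-item design are assumptions for demonstration only:

```r
# Illustrative sketch: eigenvalues, the 1.0 criterion, and a scree plot.
set.seed(1)
survey <- as.data.frame(matrix(rnorm(200 * 8), ncol = 8))

R <- cor(survey)
eigenvalues <- eigen(R)$values

# Eigenvalue-of-1.0 rule: keep factors whose eigenvalue is at least 1.0.
n_keep <- sum(eigenvalues >= 1.0)

# Proportion and cumulative proportion of variance each factor accounts for.
prop_var <- eigenvalues / length(eigenvalues)
cum_var  <- cumsum(prop_var)

# Scree plot: eigenvalues against extraction order; look for the "elbow".
plot(eigenvalues, type = "b", xlab = "Factor number",
     ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)                   # the eigenvalue-1 criterion
```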
So, based on the two statistical calculations above, the eigenvalues and scree
plot, make a decision on how many "factors" should be extracted.
These extracted "factors" of inter-correlated items are "rotated." This
"rotation" occurs because it is prevalent for certain items to be highly intercorrelated with items on several different "factors." This makes it hard for the
initial extraction of factors to be interpreted. The "rotation" forces these
troublesome items onto the "factor" with which it has the most strongest
association with the items of the "factor." This mathematical
"rotation" increases the interpretability of extracted "factors," but cancels
out the ability to interpret the amount of shared variance associated with
the "factor."
When it comes to interpreting the "factors" themselves, any item that does not at
least have a correlation or "factor loading" of .3 with the "factor" it has loaded
on should be discarded.
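A small sketch of that screening rule in R; the loadings matrix below is invented for illustration, and only the .3 threshold comes from the text:

```r
# Flag items whose strongest loading is below the .30 rule of thumb.
# `loadings` is an assumed items-by-factors matrix, made up for illustration.
loadings <- matrix(c(0.82, 0.10,
                     0.75, 0.05,
                     0.20, 0.15,   # this item never reaches .30: discard it
                     0.08, 0.66),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(paste0("item", 1:4), c("F1", "F2")))
max_loading <- apply(abs(loadings), 1, max)
discard <- names(max_loading)[max_loading < 0.30]
print(discard)   # items failing the .3 criterion
```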
There are a few assumptions that must be met to conduct a Principal
Components Analysis (PCA):
1. There must be a large enough sample size to allow the correlations to
converge into mutually exclusive "factors."
2. Normality and linearity of the items is assumed because correlations
provide the mathematical foundation for factor analysis to extract "factors."
3. The items must be written in a fashion such that sufficiently high
correlations can be yielded and extracted.
4. Content areas and items must be utilized within some sort of theoretical or
conceptual framework so that correlations can be yielded.
Finally, the analysis should be replicated with a new sample to establish
validity evidence for the survey instrument and construct.
http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/multivariate/principal-components-and-factor-analysis/choose-extraction-method/
The table above is included in the output because we used the det option on
the /print subcommand. All we want to see in this table is that the determinant is not 0. If
the determinant is 0, then there will be computational problems with the factor analysis, and
SPSS may issue a warning message or be unable to complete the factor analysis.
a. Communalities - This is the proportion of each variable's variance that can be explained
by the factors (e.g., the underlying latent continua). It is also noted as h2 and can be defined
as the sum of squared factor loadings for the variables.
b. Initial - With principal factor axis factoring, the initial values on the diagonal of the
correlation matrix are determined by the squared multiple correlation of the variable with the
other variables. For example, if you regressed items 14 through 24 on item 13, the squared
multiple correlation coefficient would be .564.
c. Extraction - The values in this column indicate the proportion of each variable's variance
that can be explained by the retained factors. Variables with high values are well
represented in the common factor space, while variables with low values are not well
represented. (In this example, we don't have any particularly low values.) They are the
reproduced variances from the factors that you have extracted. You can find these values
on the diagonal of the reproduced correlation matrix.
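As a hedged illustration of how communalities relate to the loadings and the reproduced correlation matrix, consider this R sketch; the loading values and item names are made up:

```r
# Sketch: communalities (h2) as row sums of squared retained loadings.
L <- matrix(c(0.80, 0.21,
              0.74, 0.30,
              0.15, 0.72,
              0.10, 0.68),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("item", 13:16), c("F1", "F2")))
h2 <- rowSums(L^2)                  # extraction communalities
reproduced <- L %*% t(L)            # reproduced correlation matrix
all.equal(unname(h2), unname(diag(reproduced)))   # h2 sits on its diagonal
```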
a. Factor - The initial number of factors is the same as the number of variables used in the
factor analysis. However, not all 12 factors will be retained. In this example, only the first
three factors will be retained (as we requested).
b. Initial Eigenvalues - Eigenvalues are the variances of the factors. Because we
conducted our factor analysis on the correlation matrix, the variables are standardized,
which means that each variable has a variance of 1, and the total variance is equal to the
number of variables used in the analysis, in this case, 12.
c. Total - This column contains the eigenvalues. The first factor will always account for the
most variance (and hence have the highest eigenvalue), and the next factor will account for
as much of the left over variance as it can, and so on. Hence, each successive factor will
account for less and less variance.
d. % of Variance - This column contains the percent of total variance accounted for by each
factor.
e. Cumulative % - This column contains the cumulative percentage of variance accounted
for by the current and all preceding factors. For example, the third row shows a value of
68.313. This means that the first three factors together account for 68.313% of the total
variance.
f. Extraction Sums of Squared Loadings - The number of rows in this panel of the table
correspond to the number of factors retained. In this example, we requested that three
factors be retained, so there are three rows, one for each retained factor. The values in this
panel of the table are calculated in the same way as the values in the left panel, except that
here the values are based on the common variance. The values in this panel of the table will
always be lower than the values in the left panel of the table, because they are based on the
common variance, which is always smaller than the total variance.
g. Rotation Sums of Squared Loadings - The values in this panel of the table represent
the distribution of the variance after the varimax rotation. Varimax rotation tries to maximize
the variance of each of the factors, so the total amount of variance accounted for is
redistributed over the three extracted factors.
The scree plot graphs the eigenvalue against the factor number. You can see these values
in the first two columns of the table immediately above. From the third factor on, you can
see that the line is almost flat, meaning that each successive factor is accounting for smaller
and smaller amounts of the total variance.
b. Factor Matrix - This table contains the unrotated factor loadings, which are the
correlations between the variable and the factor. Because these are correlations, possible
values range from -1 to +1. On the /format subcommand, we used the option blank(.30),
which tells SPSS not to print any of the correlations that are .3 or less. This makes the
output easier to read by removing the clutter of low correlations that are probably not
meaningful anyway.
c. Factor - The columns under this heading are the unrotated factors that have been
extracted. As you can see by the footnote provided by SPSS (a.), three factors were
extracted (the three factors that we requested).
c. Reproduced Correlations - This section of the output contains two tables: the reproduced
correlations in the top part, and the residuals in the bottom part.
d. Reproduced Correlation - The reproduced correlation matrix is the correlation matrix
based on the extracted factors. You want the values in the reproduced matrix to be as close
to the values in the original correlation matrix as possible. This means that the residual
matrix, which contains the differences between the original and the reproduced correlations,
should be close to zero. If the reproduced matrix is very similar to the original correlation matrix, then
you know that the factors that were extracted accounted for a great deal of the variance in
the original correlation matrix, and these few factors do a good job of representing the
original data. The numbers on the diagonal of the reproduced correlation matrix are
presented in the Communalities table in the column labeled Extraction.
e. Residual - As noted in the first footnote provided by SPSS (a.), the values in this part of
the table represent the differences between original correlations (shown in the correlation
table at the beginning of the output) and the reproduced correlations, which are shown in the
top part of this table. For example, the original correlation between item13 and item14 is
.661, and the reproduced correlation between these two variables is .646. The residual is
.016 = .661 - .646 (with some rounding error).
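The same reproduced-versus-residual logic can be sketched in R. Note this uses factanal(), a maximum likelihood extraction rather than SPSS's principal axis factoring, and simulated items, so the numbers are purely illustrative:

```r
# Sketch: reproduced correlations and residuals from a retained factor.
set.seed(7)
f <- rnorm(300)
dat <- data.frame(item13 = f + rnorm(300), item14 = f + rnorm(300),
                  item15 = f + rnorm(300), item16 = f + rnorm(300))
R <- cor(dat)
fit <- factanal(dat, factors = 1, rotation = "none")
L <- loadings(fit)
reproduced <- L %*% t(L)            # correlations implied by the factor
residual_matrix <- R - reproduced   # off-diagonals should be near zero
round(residual_matrix, 3)
```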
b. Rotated Factor Matrix - This table contains the rotated factor loadings (factor pattern
matrix), which represent both how the variables are weighted for each factor and the
correlation between the variables and the factor. Because these are correlations, possible
values range from -1 to +1. On the /format subcommand, we used the option blank(.30),
which tells SPSS not to print any of the correlations that are .3 or less. This makes the
output easier to read by removing the clutter of low correlations that are probably not
meaningful anyway.
For orthogonal rotations, such as varimax, the factor pattern and factor structure matrices
are the same.
c. Factor - The columns under this heading are the rotated factors that have been
extracted. As you can see by the footnote provided by SPSS (a.), three factors were
extracted (the three factors that we requested). These are the factors that analysts are most
interested in and try to name. For example, the first factor might be called "instructor
competence" because items like "instructor well prepare" and "instructor competence" load
highly on it. The second factor might be called "relating to students" because items like
"instructor is sensitive to students" and "instructor allows me to ask questions" load highly on
it. The third factor has to do with comparisons to other instructors and courses.
The table below is from another run of the factor analysis program shown above, except with
a promax rotation. We have included it here to show how different the rotated solutions can
be, and to better illustrate what is meant by simple structure. As you can see with an oblique
rotation, such as a promax rotation, the factors are permitted to be correlated with one
another. With an orthogonal rotation, such as the varimax shown above, the factors are not
permitted to be correlated (they are orthogonal to one another). Oblique rotations, such as
promax, produce both factor pattern and factor structure matrices. For orthogonal rotations,
such as varimax and equimax, the factor structure and the factor pattern matrices are the
same. The factor structure matrix represents the correlations between the variables and the
factors. The factor pattern matrix contains the coefficients for the linear combination of the
variables.
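Under those definitions, the pattern, structure, and factor correlation matrices can be recovered in base R roughly as follows; the simulated data, the two-factor choice, and the use of stats::promax() are illustrative assumptions:

```r
# Sketch: oblique (promax) rotation yields distinct pattern and structure matrices.
set.seed(3)
f1 <- rnorm(300); f2 <- 0.5 * f1 + rnorm(300)   # two correlated latent factors
dat <- data.frame(a = f1 + rnorm(300), b = f1 + rnorm(300),
                  c = f1 + rnorm(300), d = f2 + rnorm(300),
                  e = f2 + rnorm(300), g = f2 + rnorm(300))
fit <- factanal(dat, factors = 2, rotation = "none")
L   <- loadings(fit)

pm  <- promax(L)                                # oblique rotation
pattern_mat <- pm$loadings                      # factor pattern matrix
Phi <- solve(t(pm$rotmat) %*% pm$rotmat)        # factor correlation matrix
structure_mat <- pattern_mat %*% Phi            # factor structure matrix

round(Phi, 2)   # nonzero off-diagonals: promax lets the factors correlate
# For an orthogonal rotation such as varimax, Phi is the identity matrix,
# so the pattern and structure matrices coincide.
```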
The table below indicates that the rotation done is an oblique rotation. If an orthogonal
rotation had been done (like the varimax rotation shown above), this table would not appear
in the output because the correlations between the factors are set to 0. Here, you can see
that the factors are highly correlated.
The rest of the output shown below is part of the output generated by the SPSS syntax
shown at the beginning of this page.
a. Factor Transformation Matrix - This is the matrix by which you multiply the unrotated
factor matrix to get the rotated factor matrix.
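That relationship is easy to verify in R; everything below is simulated, and stats::varimax() stands in for whatever rotation was used in the output:

```r
# Sketch: rotated loadings = unrotated loadings %*% factor transformation matrix.
set.seed(5)
g1 <- rnorm(250); g2 <- rnorm(250)
dat <- data.frame(v1 = g1 + rnorm(250), v2 = g1 + rnorm(250),
                  v3 = g1 + rnorm(250), v4 = g2 + rnorm(250),
                  v5 = g2 + rnorm(250), v6 = g2 + rnorm(250))
fit <- factanal(dat, factors = 2, rotation = "none")
L   <- loadings(fit)
vm  <- varimax(L)
all.equal(unclass(vm$loadings), unclass(L) %*% vm$rotmat,
          check.attributes = FALSE)             # TRUE, up to rounding
```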
The plot above shows the items (variables) in the rotated factor space. While this picture
may not be particularly helpful, when you get this graph in the SPSS output, you can
interactively rotate it. This may help you to see how the items (variables) are organized in
the common factor space.
a. Factor Score Coefficient Matrix - This is the factor weight matrix and is used to
compute the factor scores.
a. Factor Score Covariance Matrix - Because we used an orthogonal rotation, this should
be a diagonal matrix, meaning that the same number should appear in all three places along
the diagonal. In actuality the factors are uncorrelated; however, because factor scores are
estimated there may be slight correlations among the factor scores.
What is Factor Analysis?
In the dialog box of the factor analysis we start by adding our variables
(the standardized tests math, reading, and writing, as well as the
aptitude tests 1-5) to the list of variables.
An eigenvalue of one is the cut-off value. Generally, SPSS can extract as many
factors as we have variables. The eigenvalue is calculated for each factor
extracted. If the eigenvalue drops below 1, it means that the factor explains
less variance than adding a variable would (all variables are standardized to
have mean = 0 and variance = 1). Thus we want all factors that explain the
model better than adding a single variable would.
Confirmatory factor analysis (CFA) is a multivariate statistical
procedure that is used to test how well the measured variables
represent the number of constructs. Confirmatory factor analysis
(CFA) and exploratory factor analysis (EFA) are similar techniques, but in
exploratory factor analysis (EFA), data is simply explored and
provides information about the numbers of factors required to
represent the data. In exploratory factor analysis, all measured
variables are related to every latent variable. But in confirmatory
factor analysis (CFA), researchers can specify the number of factors
required in the data and which measured variable is related to which
latent variable. Confirmatory factor analysis (CFA) is a tool that is
used to confirm or reject the measurement theory.
Usually, statistical software like AMOS , LISREL, EQS and SAS are used
for confirmatory factor analysis. In AMOS, visual paths are manually
drawn on the graphic window and analysis is performed. In LISREL,
confirmatory factor analysis can be performed graphically as well as
from the menu. In SAS, confirmatory factor analysis can be
performed by using its programming language.
Types of factoring:
There are different types of methods used to extract the factor from
the data set:
1. Principal component analysis: This is the most common method
used by researchers. PCA starts extracting the maximum variance
and puts them into the first factor. After that, it removes that
variance explained by the first factors and then starts extracting
maximum variance for the second factor. This process continues until the
last factor.
2. Common factor analysis: The second most preferred method by
researchers, it extracts the common variance and puts them into
factors. This method does not include the unique variance of all
variables. This method is used in SEM.
3. Image factoring: This method is based on the correlation matrix. The OLS
regression method is used to predict the factors in image factoring.
4. Maximum likelihood method: This method also works on the correlation
matrix, but it uses the maximum likelihood method to extract factors.
5. Other methods of factor analysis: alpha factoring and unweighted least
squares. Weighted least squares is another regression-based method that is
used for factoring.
Factor loading:
Factor loading is basically the correlation coefficient for the variable
and factor. Factor loading shows the variance explained by the
variable on that particular factor. In the SEM approach, as a rule of
thumb, 0.7 or higher factor loading represents that the factor
extracts sufficient variance from that variable.
Assumptions:
1. No outliers: Assume that there are no outliers in the data.
2. Adequate sample size: The number of cases must be greater than the number of factors.
3. No perfect multicollinearity: Factor analysis is an interdependency technique.
There should not be perfect multicollinearity between the variables.
4. Homoscedasticity: Since factor analysis is a linear function of measured
variables, it does not require homoscedasticity between the variables.
5. Linearity: Factor analysis is also based on the linearity assumption. Non-linear
variables can also be used; after transformation, however, they become linear
variables.
6. Interval Data: Interval data are assumed.
To conduct this using SPSS, first click Analyze, then select Dimension
Reduction, and then Factor.
Select all required variables and move them into the Variables box.
You can request Descriptives if desired.
Click on the Extraction button and make sure Principal components is
checked under the Method section.
PCA is useful for eliminating dimensions. Below, we've plotted the data along a
pair of lines: one composed of the x-values and another of the y-values.
If we're going to only see the data along one dimension, though, it might be
better to make that dimension the principal component with most variation. We
don't lose much by dropping PC2 since it contributes the least to the variation in
the data set.
[Interactive plot: the same data shown along the original x and y axes and along the rotated pc1 and pc2 axes.]
3D example
With three dimensions, PCA is more useful, because it's hard to see through a
cloud of data. In the example below, the original data are plotted in 3D, but you
can project the data into 2D through a transformation no different than finding a
camera angle: rotate the axes to find the best angle. To see the "official" PCA
transformation, click the "Show PCA" button. The PCA transformation ensures
that the horizontal axis PC1 has the most variation, the vertical axis PC2 the
second-most, and a third axis PC3 the least. Obviously, PC3 is the one we drop.
Eating in the UK (a 17D example)
Original example from Mark Richardson's class notes Principal Component
Analysis
What if our data have way more than 3-dimensions? Like, 17 dimensions?! In
the table is the average consumption of 17 types of food in grams per person per
week for every country in the UK.
The table shows some interesting variations across different food types, but
overall differences aren't so notable. Let's see if PCA can eliminate dimensions
to emphasize how countries differ.
[Table: average weekly consumption (grams per person) of 17 food types - alcoholic drinks, beverages, carcase meat, cereals, cheese, confectionery, fats and oils, fish, fresh fruit, fresh potatoes, fresh vegetables, other meat, other vegetables, processed potatoes, processed vegetables, soft drinks, and sugars - for England, N Ireland, Scotland, and Wales; the values are not reproduced here.]
Here's the plot of the data along the first principal component. Already we can
see something is different about Northern Ireland.
[Plot: the four countries positioned along pc1; N Ireland falls far from England, Wales, and Scotland.]
Now, looking at the first and second principal components, we see that Northern Ireland is a
major outlier. Once we go back and look at the data in the table, this makes
sense: the Northern Irish eat way more grams of fresh potatoes and way fewer
of fresh fruits, cheese, fish and alcoholic drinks. It's a good sign that the structure
we've visualized reflects a big fact of real-world geography: Northern Ireland is
the only one of the four countries not on the island of Great Britain. (If you're
confused about the differences among England, the UK and Great Britain,
see: this video.)
Principal Component Analysis (PCA) and Common Factor Analysis (CFA) are distinct methods. Often, they produce similar results and
PCA is used as the default extraction method in the SPSS Factor Analysis routines. This undoubtedly results in a lot of confusion about
the distinction between the two.
The bottom line is that these are two different models, conceptually. In PCA, the components are actual orthogonal linear combinations
that maximize the total variance. In FA, the factors are linear combinations that maximize the shared portion of the variance--underlying
"latent constructs". That's why FA is often called "common factor analysis". FA uses a variety of optimization routines and the result,
unlike PCA, depends on the optimization routine used and starting points for those routines. Simply put, there is not a single unique solution.
In R, the factanal() function provides CFA with a maximum likelihood extraction. So, you shouldn't expect it to reproduce an SPSS result
which is based on a PCA extraction. It's simply not the same model or logic. I'm not sure if you would get the same result if you used
SPSS's Maximum Likelihood extraction either as they may not use the same algorithm.
For better or for worse in R, you can, however, reproduce the mixed up "factor analysis" that SPSS provides as its default. Here's the
process in R. With this code, I'm able to reproduce the SPSS Principal Component "Factor Analysis" result using this dataset. (With the
exception of the sign, which is indeterminate). That result could also then be rotated using any of R's available rotation methods.
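The code itself did not carry over above, so the following is only a sketch of the usual way to mimic SPSS's principal-component "factor analysis" in base R; the built-in USArrests data and the two-component choice are stand-in assumptions, not the original answer's dataset:

```r
# Sketch: SPSS-style principal-component "factor analysis" in base R.
dat <- USArrests                    # any numeric data frame works here
R   <- cor(dat)
e   <- eigen(R)
k   <- 2                            # number of "factors" to retain (assumed)

# Unrotated loadings: eigenvectors scaled by the square roots of their
# eigenvalues (signs may flip relative to SPSS, as noted above).
pc_loadings <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(pc_loadings) <- colnames(dat)

varimax(pc_loadings)$loadings       # rotate with any of R's rotation methods
```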
Principal component analysis involves extracting linear composites of observed variables.
Factor analysis is based on a formal model predicting observed variables from theoretical latent factors.
In psychology these two techniques are often applied in the construction of multi-scale tests to determine which items load on which
scales. They typically yield similar substantive conclusions (for a discussion see Comrey (1988) Factor-Analytic Methods of Scale
Development in Personality and Clinical Psychology). This helps to explain why some statistics packages seem to bundle them
together. I have also seen situations where "principal component analysis" is incorrectly labelled "factor analysis".
In terms of a simple rule of thumb, I'd suggest that you:
1. Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.
2. Run principal component analysis if you want to simply reduce your correlated observed variables to a smaller set of important independent composite variables.
The difference is that factor analysis allows the noise to have an arbitrary diagonal covariance matrix, while PCA assumes
the noise is spherical. ... The aim of principal component analysis is to explain the variance while factor analysis explains
the covariance between the variables.
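One way to see the two models side by side in R; the simulated data and the one-factor choice are assumptions:

```r
# PCA explains total variance; FA explains the shared (co)variance.
set.seed(11)
f <- rnorm(500)
dat <- data.frame(x1 = f + rnorm(500), x2 = f + rnorm(500),
                  x3 = f + rnorm(500), x4 = f + rnorm(500))

pca <- prcomp(dat, scale. = TRUE)   # components: composites of the observed items
fa  <- factanal(dat, factors = 1)   # factors: latent variables behind the items

summary(pca)          # shares of *total* variance, one per component
fa$uniquenesses       # item-specific variance the factor model sets aside
```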
Example
A consumer products company wants to analyze customer responses to several characteristics of a new shampoo:
color, smell, texture, cleanliness, shine, volume, amount needed to lather, and price. They perform a principal
components analysis to determine whether they can form a smaller number of uncorrelated variables that are easier to
interpret and analyze.
1. Decide how many factors to use. The choice of the number of factors is often based on the proportion of
variance explained by the factors, subject matter knowledge, and reasonableness of the solution.
a. Try using the principal components extraction method without specifying the number of components.
b. Examine the proportion of variability explained by different factors and narrow down your choice of
how many factors to use. A scree plot can be useful here in visually assessing the importance of factors.
c. Examine the fits of the different factor analyses. Communality values, the proportion of variability of
each variable explained by the factors, can be especially useful in comparing fits. You might decide
to add a factor if it contributes to the fit of certain variables.
2. Evaluate your solution by trying multiple rotations. Johnson and Wichern suggest the varimax rotation. A
similar result from different methods can lend credence to the solution you have selected. At this point you
might want to interpret the factors using your knowledge of the data.
Get eigenvalues
What is an eigenvalue?
Eigenvalues (also called characteristic values or latent roots) are the variances of the principal components. They are
displayed by default in the Session window output under the table with the factor loadings.
NOTE
Minitab only calculates eigenvalues when you choose principal components as the method of extraction.
Get eigenvalues for principal components by using only a correlation matrix or a covariance matrix
Click Storage. In the field beside Eigenvalues, enter a column in which to store the eigenvalues.
Underneath, enter a matrix in which to store the eigenvectors of the matrix that was factored.
In the output, the eigenvalues are under Variance (in Factor Analysis, the eigenvalues are the variances of the principal
components).
Rotation methods, their goals, and the gamma value each uses:
equimax (gamma = number of factors / 2): rotates the loadings so that a variable loads high on one factor but low on the others.
varimax (gamma = 1): maximizes the squared factor loadings in each factor; that is, it simplifies the columns
of the factor loading matrix. In each factor the large loadings are increased and the
small ones are decreased so that each factor only has a few variables with large
loadings.
quartimax (gamma = 0): maximizes the variance of the squared factor loadings in each variable; that is, it
simplifies the rows of the factor loading matrix. In each variable the large loadings are
increased and the small ones are decreased so that each variable will only load on a few
factors.
orthomax (gamma between 0 and 1): a general rotation of the same family whose behavior depends on the chosen gamma.
[Table: factor loadings for the variables Question 1 through Question 5 and Question 6 through Question 8; the loading values are not reproduced here.]
Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the factor strongly affects the variable.
Loadings close to zero indicate that the factor has a weak effect on the variable.
Examine the loading pattern in the Minitab factor analysis output to determine the factor that has the largest effect on
each variable. Some variables might have high loadings on multiple factors.
The circled data point on the scatterplot does not fit in with the two variables' correlation structure. However, when examined
individually, neither its x-value nor its y-value is unusual. Nevertheless, the Mahalanobis distance for this point is unusually large.
In discriminant analysis, Minitab uses the Mahalanobis distance to classify the observations into their predicted groups.
The group with the smallest distance is the one Minitab classifies the observation into.
In discriminant analysis, Minitab uses the pooled covariance matrix to calculate the Mahalanobis distance. This
takes into account the group that each observation is classified into. Because principal components analysis does not
classify the observations into groups, it uses the covariance matrix of all the data.
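For reference, base R's mahalanobis() computes the same D-squared statistic from a centroid and a covariance matrix; the USArrests data here is just a stand-in:

```r
# Sketch: squared Mahalanobis distance of every observation from the overall
# centroid, using the covariance matrix of all the data (the PCA case above).
dat <- USArrests
d2 <- mahalanobis(dat, center = colMeans(dat), cov = cov(dat))
head(sort(d2, decreasing = TRUE))   # unusually large values flag outliers
```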
Display the Mahalanobis distance between an observation and the centroid using Principal Components:
a. Choose Stat > Multivariate > Principal Components and click Storage.
b. In Distances, enter the column that you want to store the distances in.
Display the Mahalanobis distance between an observation and the group centroid using Discriminant Analysis:
a. Choose Stat > Multivariate > Discriminant Analysis and click Options.
b. Under Display of Results, choose Above plus complete classification summary. Click OK.
In the Summary of Classified Observations table, the Squared Distance is the Mahalanobis distance (D
squared) statistic, calculated for each observation from each group centroid.
When a Heywood case occurs, Minitab displays * NOTE * Heywood case in the Session window output.
A factor analysis was conducted on 12 different characteristics of job applicants. This scree plot shows that 5 of those factors explain
most of the variability because the line starts to straighten after factor 5. The remaining factors explain a very small proportion of the
variability and are likely unimportant.
The ideal pattern in a scree plot is a steep curve, followed by a bend and then a flat or horizontal line. Retain those
components or factors in the steep curve before the first point that starts the flat line trend. You might have difficulty
interpreting a scree plot. Use your knowledge of the data and the results from the other approaches of selecting
components or factors to help decide the number of important components or factors.
If the factors and the errors obtained after fitting the factor model are assumed to follow a normal
distribution, use the maximum likelihood method to obtain maximum likelihood estimates of the factor
loadings.
If the factors and errors obtained after fitting the factor model are not assumed to follow a normal
distribution, use the principal components method.
What are the differences between principal components analysis and factor
analysis?
Principal Components Analysis and Factor Analysis are similar because both procedures are used to simplify the
structure of a set of variables. However, the analyses differ in several important ways:
1. In Minitab, you can only enter raw data when using Principal Components Analysis. However, you can enter
raw data, a correlation or covariance matrix, or the loadings from a previous analysis when using Factor
Analysis.
2. In Principal Components Analysis, the components are calculated as linear combinations of the original
variables. In Factor Analysis, the original variables are defined as linear combinations of the factors.
3. In Principal Components Analysis, the goal is to explain as much of the total variance in the variables as
possible. The goal in Factor Analysis is to explain the covariances or correlations between the variables.
4. Use Principal Components Analysis to reduce the data into a smaller number of components. Use Factor
Analysis to understand what constructs underlie the data.
The two analyses are often conducted on the same data. For example, you can conduct a principal components
analysis to determine the number of factors to extract in a factor analytic study.
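That two-stage workflow might look roughly like this in R; the simulated data and the use of the eigenvalue-greater-than-1 rule as the selection device are illustrative choices:

```r
# Sketch: PCA to choose the number of factors, then a factor analysis.
set.seed(9)
g1 <- rnorm(300); g2 <- rnorm(300)
dat <- data.frame(a = g1 + rnorm(300), b = g1 + rnorm(300),
                  c = g1 + rnorm(300), d = g2 + rnorm(300),
                  e = g2 + rnorm(300), f = g2 + rnorm(300))

# Stage 1: principal components to decide how many factors to extract.
eigenvalues <- eigen(cor(dat))$values
k <- sum(eigenvalues >= 1)          # eigenvalue-greater-than-1 rule

# Stage 2: fit the factor model with that number of factors.
fit <- factanal(dat, factors = k)
print(loadings(fit), cutoff = 0.3)
```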