Principal Components Analysis

Principal Components Factor Analysis
The purpose of principal components factor analysis is to reduce the number of variables in the analysis by using a surrogate variable or factor to represent a number of variables, while retaining the variance that was present in the original variables. The data analysis indicates the relationship between the original variables and the factors, so that we know how to make the substitutions. Principal components is frequently used to simplify a data set prior to conducting a multiple regression or discriminant analysis. To demonstrate principal components analysis, we will use the sample problem in the text which begins on page 120.
Slide 1
Stage 1: Define the Research Problem

The perceptions of HATCO on seven attributes (Delivery speed, Price level, Price flexibility, Manufacturer image, Service, Salesforce image, and Product quality) are examined to (1) understand if these perceptions can be "grouped" and (2) reduce the seven variables to a smaller number. (Text, page 120)
Slide 2
Stage 2: Designing a Factor Analysis

In this stage, we address sample size and measurement issues. Since missing data has an impact on sample size, we will analyze patterns of missing data in this stage.
Sample size issues

Missing data analysis: There is no missing data in the HATCO data set.
Sample size of 100 or more: There are 100 subjects in the sample. This requirement is met.
Ratio of subjects to variables should be 5 to 1: There are 100 subjects and 7 variables in the analysis, for a ratio of about 14 to 1. This requirement is met.
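The two sample-size rules of thumb above are simple arithmetic. A minimal Python sketch follows; the function name and the idea of passing the thresholds as parameters are illustrative assumptions, not part of the SPSS procedure.

```python
def check_sample_size(n_subjects, n_variables, min_n=100, min_ratio=5.0):
    """Return the subjects-to-variables ratio and whether each rule of thumb holds."""
    ratio = n_subjects / n_variables
    meets_n = n_subjects >= min_n        # "sample size of 100 or more"
    meets_ratio = ratio >= min_ratio     # "ratio of subjects to variables of 5 to 1"
    return ratio, meets_n, meets_ratio

# HATCO example from the text: 100 subjects, 7 variables.
ratio, meets_n, meets_ratio = check_sample_size(100, 7)
print(f"ratio = {ratio:.1f} to 1, n ok = {meets_n}, ratio ok = {meets_ratio}")
```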
Variable selection and measurement issues

Dummy code non-metric variables: All variables in the analysis are metric, so no dummy coding is required.
Parsimonious variable selection: The variables included in the analysis are core elements of the business. There do not appear to be any extraneous variables.
Slide 3
Stage 3: Assumptions of Factor Analysis

In this stage, we run the tests necessary to check the assumptions of the statistical analysis. For factor analysis, we examine the suitability of the data for a factor analysis.
Metric or dummy-coded variables: All variables are metric in this analysis.
Departures from normality, homoscedasticity, and linearity diminish correlations: In the last class, we conducted an exploratory analysis of these variables. We know from this that several variables are not normally distributed. Non-normality will diminish the correlations among the variables, so the relationships between variables might be stronger than what we are able to represent in this analysis.
Multivariate normality is required if a statistical criterion for factor loadings is used: We will use a criterion of 0.40 for identifying substantial loadings on factors, rather than a statistical probability, so this requirement is not binding on this analysis.
Homogeneity of sample: There is nothing in the problem to indicate that subgroups within the sample have different patterns of scores on the variables included in the analysis. In the absence of evidence to the contrary, we will assume that this assumption is met.
Use of factor analysis is justified: The determination that the use of factor analysis is justifiable is obtained from the statistical output that SPSS provides in the request for a factor analysis.
Requesting a Principal Components Factor Analysis
Click on the 'Data Reduction | Factor...' command in the Analyze menu.
Slide 5
Specify the Variables to Include in the Analysis

First, highlight the variables: X1 'Delivery Speed', X2 'Price Level', X3 'Price Flexibility', X4 'Manufacturer Image', X5 'Service' , X6 'Salesforce Image', and X7 'Product Quality'.
Second, click on the move arrow to move the highlighted variables to the 'Variables:' list.
Slide 6
Specify the Descriptive Statistics to include in the Output

First, click on the 'Descriptives...' button.
Second, mark the checkbox for 'Initial solution' in the 'Statistics' panel. Clear all other checkboxes.
Third, mark the checkboxes for 'Coefficients', 'KMO and Bartlett's test of sphericity', and 'Anti-image' on the 'Correlation Matrix' panel. Clear all other checkboxes.
Fourth, click on the 'Continue' button to complete the 'Factor Analysis: Descriptives' dialog box.
Slide 7
Specify the Extraction Method and Number of Factors

First, click on the 'Extraction...' button.
Second, select 'Principal components' from the 'Method' drop down menu.
Third, mark the 'Correlation matrix' option in the 'Analyze' panel.
Fourth, mark the checkboxes for 'Unrotated factor solution' and 'Scree plot' on the 'Display' panel.
Fifth, accept the default values of 'Eigenvalues over: 1' on the Extract panel and the 'Maximum Iterations for convergence: 25'.
Sixth, click on the 'Continue' button to complete the dialog box.
Slide 8
Specify the Rotation Method
First, click on the 'Rotation...' button.
Second, mark the 'Varimax' option on the 'Method' panel.
Third, mark the checkbox for 'Rotated solution' on the 'Display' panel. Clear all other checkboxes.
Fourth, click on the 'Continue' button to complete the dialog box.
Slide 9
Complete the Factor Analysis Request

Click on the OK button to complete the factor analysis request.
Slide 10
Count the Number of Correlations Greater than 0.30

Nine of the 21 correlations in the matrix are larger than 0.30, highlighted in yellow in the Correlation Matrix. We meet this criterion for the suitability of the data for factor analysis.
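The counting rule can be sketched in Python. This is an illustration under two assumptions: correlations are counted by absolute value, and the data are the six-variable full-sample correlation matrix reported near the end of this walkthrough (the analysis after X5 'Service' is removed), because the seven-variable matrix discussed on this slide is not reproduced in the text. The function name is hypothetical.

```python
# Full-sample correlations for X1, X2, X3, X4, X6, X7, copied from the
# correlation matrix reported later in this walkthrough (X5 already removed).
R = [
    [1.000, -0.349, 0.509, 0.050, 0.077, -0.483],
    [-0.349, 1.000, -0.487, 0.272, 0.186, 0.470],
    [0.509, -0.487, 1.000, -0.116, -0.034, -0.448],
    [0.050, 0.272, -0.116, 1.000, 0.788, 0.200],
    [0.077, 0.186, -0.034, 0.788, 1.000, 0.177],
    [-0.483, 0.470, -0.448, 0.200, 0.177, 1.000],
]

def count_large_correlations(matrix, cutoff=0.30):
    """Count off-diagonal correlations whose absolute value exceeds the cutoff."""
    p = len(matrix)
    # Each pair of variables appears once, so only the upper triangle is scanned.
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    large = sum(1 for i, j in pairs if abs(matrix[i][j]) > cutoff)
    return large, len(pairs)

large, total = count_large_correlations(R)
print(f"{large} of {total} correlations exceed 0.30 in absolute value")
```

For this six-variable matrix the count is 7 of 15, a different figure from the 9 of 21 reported above for the seven-variable matrix.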
Slide 11
Measures of Appropriateness of Factor Analysis
Interpretive adjectives for the Kaiser-Meyer-Olkin Measure of Sampling Adequacy are: in the 0.90's as marvelous, in the 0.80's as meritorious, in the 0.70's as middling, in the 0.60's as mediocre, in the 0.50's as miserable, and below 0.50 as unacceptable. The value of the KMO Measure of Sampling Adequacy for this set of variables is .446, falling below the acceptable level. We will examine the anti-image correlation matrix to see if it provides us with any possible remedies.
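As a small illustration, the adjective scale can be written as a lookup function. The function name is hypothetical; the cut points are the ones listed above.

```python
def kmo_label(kmo):
    """Map a KMO measure of sampling adequacy to its interpretive adjective."""
    if kmo >= 0.90:
        return "marvelous"
    if kmo >= 0.80:
        return "meritorious"
    if kmo >= 0.70:
        return "middling"
    if kmo >= 0.60:
        return "mediocre"
    if kmo >= 0.50:
        return "miserable"
    return "unacceptable"

print(kmo_label(0.446))  # unacceptable
```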
Slide 12
Assessing the Sampling Adequacy Problem
The Anti-image Correlation Matrix contains the measures of sampling adequacy for the individual variables on the diagonal of the matrix, highlighted in cyan. The measures for three variables fall below the acceptable level of 0.50: X1 'Delivery Speed' (.344), X2 'Price Level' (.330), and X5 'Service' (.288). The corrective action is to delete the variables one at a time, starting with the one with the smallest value, until the problem is corrected.
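The selection rule for the corrective action can be sketched as follows. The function name is hypothetical, and only the three below-threshold values quoted above are used because the remaining measures are not listed in the text.

```python
def next_variable_to_drop(msa_by_variable, threshold=0.50):
    """Return the variable with the smallest MSA below the threshold, or None."""
    low = {name: msa for name, msa in msa_by_variable.items() if msa < threshold}
    if not low:
        return None  # every variable meets the threshold; nothing to drop
    return min(low, key=low.get)

# Individual measures of sampling adequacy reported in the text.
reported = {"X1 Delivery Speed": 0.344, "X2 Price Level": 0.330, "X5 Service": 0.288}
print(next_variable_to_drop(reported))  # X5 Service
```

Because removing a variable changes every other measure, the function would be applied again to the measures from the re-run analysis rather than to the same dictionary.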
Slide 13
Removing X5 'Service' from the Analysis

First, click on the 'Dialog Recall' tool button.
Second, in the drop down menu of recently used dialogs, highlight the 'Factor Analysis' item.
Third, highlight 'Service (X5)' in the list of 'Variables:'.
Fourth, click on the move arrow to return 'Service (X5)' to the list of available variables.
Fifth, click on the OK button to request the revised analysis.
Slide 14
The Revised Measures of Appropriateness of Factor Analysis

The revised KMO Measure of Sampling Adequacy has a value of 0.665, in the range of acceptable values.
Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an identity matrix, i.e. all diagonal elements are 1 and all off-diagonal elements are 0, implying that all of the variables are uncorrelated. If the Sig. value for this test is less than our alpha level, we reject the null hypothesis that the population matrix is an identity matrix. The Sig. value for this analysis leads us to reject the null hypothesis and conclude that there are correlations in the data set that are appropriate for factor analysis.
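For readers who want to see what the test computes, here is a minimal sketch using the standard formula chi-square = -[(n - 1) - (2p + 5)/6] * ln|R| with p(p - 1)/2 degrees of freedom. It is not SPSS's implementation. The helper names are hypothetical, and the matrix is the six-variable full-sample correlation matrix reported later in this walkthrough, rounded to three decimals, so the result may differ slightly from the SPSS output. Converting the statistic to a Sig. value needs a chi-square distribution function from a statistics library and is left out.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_cases):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test of sphericity."""
    p = len(corr)
    chi_square = -((n_cases - 1) - (2 * p + 5) / 6.0) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df

# Correlations for X1, X2, X3, X4, X6, X7 (full sample of 100 cases).
R = [
    [1.000, -0.349, 0.509, 0.050, 0.077, -0.483],
    [-0.349, 1.000, -0.487, 0.272, 0.186, 0.470],
    [0.509, -0.487, 1.000, -0.116, -0.034, -0.448],
    [0.050, 0.272, -0.116, 1.000, 0.788, 0.200],
    [0.077, 0.186, -0.034, 0.788, 1.000, 0.177],
    [-0.483, 0.470, -0.448, 0.200, 0.177, 1.000],
]
chi_square, df = bartlett_sphericity(R, n_cases=100)
print(f"chi-square = {chi_square:.1f}, df = {df}")
```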
Slide 15
The Revised Anti-image Correlation Matrix
The new anti-image correlation matrix indicates that the sampling adequacy for each variable is above the 0.50 threshold.
Slide 16
Stage 4: Deriving Factors and Assessing Overall Fit - 1

In this stage, several criteria are examined to determine the number of factors that represent the data. If the analysis is designed to identify a factor structure that was obtained in or suggested by previous research, we are employing an a priori criterion, i.e. we specify a specific number of factors. The three other criteria are obtained in the data analysis: the latent root criterion, the percentage of variance criterion, and the Scree test criterion. A final criterion influencing the number of factors, the interpretability criterion, is actually deferred to the next stage. The derived factor structure must make plausible sense in terms of our research; if it does not, we should seek a factor solution with a different number of components.
It is generally recommended that each strategy be used in determining the number of factors in the data set. It may be that multiple criteria suggest the same solution. If different criteria suggest different conclusions, we might want to compare the parameters of our problem, i.e. sample size, correlations, and communalities, to determine which criterion should be given greater weight.
Slide 17

The Latent Root Criterion
One of the most commonly used criteria for determining the number of factors or components to include is the latent root criterion, also known as the eigenvalue-one criterion or the Kaiser criterion. With this approach, you retain and interpret any component that has an eigenvalue greater than 1.0. The rationale for this criterion is straightforward. Each observed variable contributes one unit of variance to the total variance in the data set (the 1.0 on the diagonal of the correlation matrix). Any component that displays an eigenvalue greater than 1.0 is accounting for a greater amount of variance than was contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance and is worthy of being retained. On the other hand, a component with an eigenvalue less than 1.0 is accounting for less variance than had been contributed by one variable. The latent root criterion has been shown to produce the correct number of components when the number of variables included in the analysis is small (10 to 15) or moderate (20 to 30) and the communalities are high (greater than 0.70). Low communalities are those below 0.40 (Stevens, page 366).
Slide 18

Proportion of Variance Accounted For
Another criterion in determining the number of factors to retain involves retaining a component if it accounts for a specified proportion of variance in the data set, e.g. at least 5% or 10%. Alternatively, one can retain enough components to explain some cumulative total percent of variance, usually 70% to 80%. While this strategy has intuitive appeal (our goal is to explain the variance in the data set), there is no agreement about what percentages are appropriate to use, and the strategy is criticized for being subjective and arbitrary. We will employ 70% as the target total percent of variance.
Slide 19

The Scree Test
With a Scree test, the eigenvalues associated with each component are plotted against their ordinal numbers (i.e. first eigenvalue, second eigenvalue, etc.). Generally what happens is that the magnitude of successive eigenvalues drops off sharply (steep descent) and then tends to level off. The recommendation is to retain all the eigenvalues (and corresponding components) before the first one on the line where they start to level off. (Hatcher and Stevens both indicate that the number to accept is the number before the line levels off; Hair et al. say that the number of factors is the point where the line levels off, but also state that this tends to indicate one or two more factors than are indicated by the latent root criterion.) The Scree test has been shown to be accurate in detecting the correct number of factors with a sample size greater than 250 and communalities greater than 0.60.
Slide 20

The Interpretability Criterion
Perhaps the most important criterion in solving the number of components problem is the interpretability criterion: interpreting the substantive meaning of the retained components and verifying that this interpretation makes sense in terms of what is known about the constructs under investigation. The interpretability of a component is improved if it is measured by at least three variables, when all of the variables that load on the component have the same conceptual meaning, when the other components appear to be measuring other constructs, and when the rotated factor pattern shows simple structure, i.e. each variable loads significantly on only one component.
Slide 21
The Latent Root Criterion

In the latent root criterion, we identify the number of eigenvalues that are larger than 1.0. In this table, we have two, so this criterion supports the presence of two components or factors.
Slide 22
Percentage of Variance Criterion
In this criterion, we count the number of components that would be necessary to explain 70% or more of the variance in the original set of variables. In this analysis, we reach the 70% minimum with two components.
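The latent root and percentage of variance criteria are both simple counts over the eigenvalues. The sketch below shows the arithmetic; the function names are hypothetical and the eigenvalues are made-up illustrative values for six standardized variables, not the figures from the SPSS output.

```python
def latent_root_count(eigenvalues, cutoff=1.0):
    """Number of components whose eigenvalue exceeds the cutoff (Kaiser criterion)."""
    return sum(1 for value in eigenvalues if value > cutoff)

def components_for_variance(eigenvalues, target_percent=70.0):
    """Smallest number of components whose cumulative variance reaches the target."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    cumulative = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if 100.0 * cumulative / total >= target_percent:
            return count
    return len(eigenvalues)

# Hypothetical eigenvalues (they sum to 6); NOT taken from the SPSS output.
eigenvalues = [2.5, 1.8, 0.6, 0.5, 0.4, 0.2]
print(latent_root_count(eigenvalues))        # 2
print(components_for_variance(eigenvalues))  # 2
```

With these illustrative values the first two components account for 4.3 of 6 units of variance, about 72%, so both rules point to two components.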
Slide 23
Scree Test Criterion
In my analysis of the scree plot, the eigenvalues level off beginning with the third eigenvalue. The number of components to retain corresponds to the number of eigenvalues before the line levels off. Therefore, we would retain two components, which corresponds to the number determined by the latent root criterion. (NOTE: in applying this test, the text identifies three components using their interpretation of the criteria).
Slide 24
Stage 5: Interpreting the Factors - 1

All three of the criteria for determining the number of components indicate that two components should be retained. If this number matches the number of components derived by SPSS, we can continue with the analysis. If it does not match, we need to re-run the factor analysis, specifying the number of components that we want SPSS to extract.
Once the extraction of factors has been completed satisfactorily, the resulting factor matrix, which shows the relationship of the original variables to the factors, is rotated to make it easier to interpret. The axes are rotated about the origin so that they are located as close to the clusters of related variables as possible. The orthogonal VARIMAX rotation is the one found most commonly in the literature. VARIMAX rotation keeps the axes at right angles to each other so that the factors are not correlated with each other. Oblique rotation permits the factors to be correlated, and is cited as being a more realistic method of analysis. In the analyses that we do for this class we will specify an orthogonal rotation.
The first step in this stage is to determine if any variables should be eliminated from the factor solution. A variable can be eliminated for two reasons. First, if the communality of a variable is low, i.e. less than 0.50, the factors contain less than half of the variance in the original variable, so we might want to exclude that variable from the factor analysis and use it in its original form in subsequent analyses. Second, a variable may have loadings below the criterion level on all factors, i.e. it does not have a strong enough relationship with any factor to be represented by the factor score, so that the variable's information is better represented by the original form of the variable.
Slide 25
Stage 5: Interpreting the Factors - 2

Since factor analysis is based on a pattern of relationships among variables, elimination of one variable will change the pattern of all of the others. If we have variables to eliminate, we should eliminate them one at a time, starting with the variable that has the lowest communality or the weakest pattern of factor loadings. Once we have eliminated variables that do not belong in the factor analysis, we complete the analysis by naming the factors that we obtained. Naming is an important piece of the analysis, because it assures us that the factor solution is conceptually valid.
Slide 26
Analysis of the Communalities

Once the extraction of factors has been completed, we examine the table of 'Communalities' which tells us how much of the variance in each of the original variables is explained by the extracted factors. For example, in the table shown below, 65.8% of the variance in the original X1 'Delivery Speed' variable is explained by the two extracted components. Higher communalities are desirable. If the communality for a variable is less than 50%, it is a candidate for exclusion from the analysis because the factor solution contains less than half of the variance in the original variable, and the explanatory power of that variable might be better represented by the individual variable. The table of Communalities for this analysis shows communalities for all variables above 0.50, so we would not exclude any variables on the basis of low communalities. If we did exclude a variable for a low communality, we should re-run the factor analysis without that variable before proceeding.
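For an orthogonal solution, a variable's communality is the sum of its squared loadings across the retained components. A minimal sketch of that calculation and of the 0.50 screening rule follows; the function name, variable labels, and loadings are hypothetical and do not come from the HATCO output.

```python
def communalities(loadings):
    """Sum the squared loadings across components for each variable."""
    return {name: sum(value ** 2 for value in row) for name, row in loadings.items()}

# Hypothetical loadings on two components, for illustration only.
loadings = {
    "variable A": (0.80, 0.10),
    "variable B": (0.30, 0.50),
}
for name, h2 in communalities(loadings).items():
    flag = "candidate for exclusion" if h2 < 0.50 else "retain"
    print(f"{name}: communality = {h2:.2f} ({flag})")
```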
Analysis of the Factor Loadings - 1

When we are satisfied that the factor solution explains sufficient variance for all of the variables in the analysis, we examine the 'Rotated Factor Matrix' to see if each variable has a substantial loading on one, and only one, factor. The size of the loading termed substantial is a subject about which there are a lot of divergent opinions. We will use a time-honored rule of thumb that a substantial loading is 0.40 or higher. Whichever method is employed to define substantial, the process of analyzing factor loadings is the same. The methodology for analyzing factor loadings is to underline or mark all of the loadings in the rotated factor matrix that are 0.40 or higher. For the rotated factor matrix for this problem, the substantial loadings are highlighted in green.
Slide 28
Analysis of the Factor Loadings - 2

We examine the pattern of loadings for what is called 'simple structure' which means that each variable has a substantial loading on one and only one factor.
In this component matrix, each variable does have one substantial loading on a component. If one or more variables did not have a substantial loading on a factor, we would re-run the factor analysis excluding those variables one at a time, until we have a solution in which all of the variables in the analysis load on at least one factor.
In this component matrix, each of the original variables also has a substantial loading on only one factor. If a variable had a substantial loading on more than one factor, we refer to that variable as "complex", meaning that it has a relationship to two or more of the derived factors. There are a variety of prescriptions for handling complex variables. The simple prescription is to ignore the complexity and treat the variable as belonging to the factor on which it has the highest loading. A second simple solution to complexity is to eliminate the complex variable from the factor analysis. I have seen other instances where authors chose to include it as a variable in multiple factors, or to arbitrarily assign it to a factor for conceptual reasons. Other prescriptions are to try different methods of factor extraction and rotation to see if a more interpretable solution can be found.
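The check for simple structure can be sketched as a classification of each row of the rotated matrix. The function name, labels, and loadings below are hypothetical; the 0.40 cutoff is the rule of thumb adopted in this analysis, applied to the absolute value of each loading.

```python
def classify_loadings(loadings, cutoff=0.40):
    """Label each variable by how many factors it loads on substantially."""
    result = {}
    for name, row in loadings.items():
        # Factor numbers (1-based) where the loading magnitude reaches the cutoff.
        substantial = [i + 1 for i, value in enumerate(row) if abs(value) >= cutoff]
        if len(substantial) == 1:
            result[name] = f"loads on factor {substantial[0]}"
        elif not substantial:
            result[name] = "no substantial loading"
        else:
            result[name] = "complex"
    return result

# Hypothetical rotated loadings on two factors, for illustration only.
rotated = {
    "variable A": (-0.79, 0.05),
    "variable B": (0.12, 0.91),
    "variable C": (0.55, 0.48),
    "variable D": (0.21, -0.30),
}
for name, label in classify_loadings(rotated).items():
    print(f"{name}: {label}")
```

A solution has simple structure when every variable falls in the first category; variables C and D in this made-up example would need one of the remedies described above.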
Slide 29
Naming the Factors

Once we have an interpretable pattern of loadings, we name the factors or components according to their substantive content or core. The factors should have conceptually distinct names and content. Variables with higher loadings on a factor should play a more important role in naming the factor. If the factors are not conceptually distinct and cannot be named satisfactorily, the factor solution may be a mathematical contrivance that has no useful application. The naming should take into account the signs of the factor loadings, i.e. negative signs imply an inverse relationship to the factor. For example, X1 'Delivery Speed', X2 'Price Level', X3 'Price Flexibility', and X7 'Product Quality' load on the first factor. Two of these variables, X1 'Delivery Speed' and X3 'Price Flexibility', have a negative sign, meaning that they vary inversely to the two variables which have a positive loading: X2 'Price Level' and X7 'Product Quality'. The name for this factor, which the authors term 'basic value' on page 126 of the text, attempts to take into account the direction of the relationships among all of these variables. The two variables loading on the second factor, X4 'Manufacturer Image' and X6 'Salesforce Image', both have positive signs and are named 'HATCO image' by the authors on page 127 of the text.
Slide 30
Stage 6: Validation of Factor Analysis

In the validation stage of the factor analysis, we are concerned with the generalizability of the factor model we have derived. We examine two issues: first, is the factor model stable and generalizable, and second, is the factor solution impacted by outliers. The only method for examining the generalizability of the factor model in SPSS is a split-half validation. To identify outliers, we will employ a strategy proposed in the SPSS manual.
1. Split-Half Validation
As in all multivariate methods, the findings in factor analysis are impacted by sample size. The larger the sample size, the greater the opportunity to obtain significant findings that are present only because of the large sample. The strategy for examining the stability of the model is to do a split-half validation to see if the factor structure and the communalities remain the same.
Slide 31
Set the Starting Point for Random Number Generation
First, select the 'Random Number Seed...' command from the 'Transform' menu.
Second, click on the 'Set seed to:' option to access the text box for the seed number.
Third, type '34567' in the 'Set seed to:' text box. (This is the same random number seed specified by the authors on page 705 of the text.)
Fourth, click on the OK button to complete this action.
Slide 32
Compute the Variable to Randomly Split the Sample into Two Halves
First, select the 'Compute...' command from the Transform menu.
Second, create a new variable named 'split' that has the values 1 and 0 to divide the sample into two parts. Type the name 'split' into the 'Target Variable:' text box.
Third, type the formula 'uniform(1) > 0.52' in the 'Numeric Expression:' text box. The uniform function will generate a random number between 0.0 and 1.0 for each case. If the generated random number is greater than 0.52, the numeric expression will result in a 1, since the numeric expression is true. If the generated random number is 0.52 or less, the numeric expression will produce a 0, since its value is false. In many computer programs, true is represented by the number 1 and false is represented by a 0.
Fourth, we click on the OK button to compute the split variable.
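The logic of the split variable can be mimicked outside SPSS. The sketch below uses Python's standard random module as a stand-in for the SPSS uniform function, which is an assumption: the two generators are different, so the same seed value will not reproduce the SPSS assignment of cases to halves.

```python
import random

# Same seed value as the text, but Python's generator is not SPSS's,
# so the resulting 0/1 pattern will differ from the SPSS 'split' variable.
random.seed(34567)

n_cases = 100
# A true comparison becomes 1 and a false comparison becomes 0.
split = [1 if random.random() > 0.52 else 0 for _ in range(n_cases)]

print(f"split=1: {sum(split)} cases, split=0: {split.count(0)} cases")
```

Because each case is assigned independently, the two halves are only approximately equal in size.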
Slide 33
Compute the Factor Analysis for the First Half of the Sample
First, select the 'Data Reduction | Factor...' command from the Analyze menu.
Second, highlight the 'split' variable and click on the move button to put it into the 'Selection Variable:' text box.
Slide 34
Select the First Half of the Sample for Analysis

First, click on the 'Value...' button which was activated when the split variable moved to the 'Selection Variable:' text box.
Second, type the value 0 into the 'Value for Selection Variable:' text box to replace the '?' in the 'split=?' entry in the 'Selection Variable:' text box with 'split=0'.
Third, click on the Continue button to complete the value assignment.
Finally, click on the OK button in the Factor Analysis dialog to compute the factor analysis for the first half of the sample.
Slide 35
Compute the Factor Analysis for the Second Half of the Sample
First, select the 'Data Reduction | Factor...' command from the Analyze menu.
Second, click on the 'Selection Variable:' text box to highlight it.
Third, click on the 'Value...' button which was activated when the 'Selection Variable:' text box was highlighted.
Fourth, type the value 1 into the 'Value for Selection Variable:' text box to replace the '0' in the 'split=0' entry in the 'Selection Variable:' text box with 'split=1'.
Fifth, click on the Continue button to complete the value assignment.
Finally, click on the OK button in the Factor Analysis dialog to compute the factor analysis for the second half of the sample.
Slide 36
Compare the Two Rotated Factor Matrices

The two rotated factor matrices for each half of the sample produce the same pattern of loadings of variables on factors that we obtained for the analysis on the complete sample. This result validates the factor solution obtained. Had we obtained fewer factors or a different pattern of loading, we should adjust our analysis accordingly, or include this information in the discussion of limitations to our study.
Slide 37
Compare the Communalities

While the communalities differ for the two models, in all cases they are above 0.50, indicating that the factor model is explaining more than half of the variance in all of the original variables.
Slide 38
2. Identification of Outliers
SPSS proposes a strategy for identifying outliers that is not found in the text (See: SPSS Base 7.5 Applications Guide, pp. 303-304). SPSS computes the factor scores as standard scores with a mean of 0 and a standard deviation of 1. We can examine the factor scores to see if any are above or below the standard score size associated with extreme cases, i.e. +/-2.5 or +/-3.0. For this analysis, we will need to compute the factor scores, which we have not requested to this point.
Slide 39
Removing the Split Variable from the Analysis
First, we re-open the Factor Analysis dialog box by selecting the 'Data Reduction | Factor...' command from the Analyze menu.
Second, we highlight the 'split=1' selection variable and click on the move arrow to remove it, so that the factor scores are computed using the parameters for the full sample.
Slide 40
Requesting the Factor Scores

First, we click on the 'Scores...' button in the Factor Analysis dialog.
Second, we mark the 'Save as variables' checkbox in the 'Factor Analysis: Factor Scores' dialog.
Third, we accept the default 'Regression' method for computing the factor scores.
Fourth, we click on the 'Continue' button to close the 'Factor Analysis: Factor Scores' dialog and the OK button to request the output.
Slide 41
The Factor Scores in the SPSS Data Editor
SPSS adds variables for the factor scores to the data set.
Slide 42
Use the Explore Procedure to Locate Factor Score Outliers
First, select the 'Descriptive Statistics | Explore' command from the Analyze menu.
Second, move the FAC1_1 'REGR factor score 1 for analysis 1' and FAC2_1 'REGR factor score 2 for analysis 1' variables computed by the Factor Analysis to the 'Dependent List:' list box.
Third, move the ID variable to the 'Label Cases by:' text box so that the case ID will appear in the output listings.
Fourth, mark the 'Statistics' option on the Display panel.
Fifth, click on the 'Statistics...' button to request the listing of outliers.
Slide 43
Specify Outliers as the Desired Statistics
First, we mark the 'Outliers' check box and clear all other check boxes.
Second, we click on the Continue button to complete our selection of statistics.
Third, we click on the OK button to produce the output.
Slide 44
Extreme Values as Outliers

Using a criterion of +/-2.5, we have no outliers on the first factor and two outliers on the second factor, case ID 5 and case ID 42.
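Since the factor scores are standard scores, screening them is a matter of comparing each score's magnitude to the cutoff. A minimal sketch follows; the function name, case IDs, and scores are hypothetical and are not the HATCO factor scores.

```python
def find_outliers(scores_by_id, cutoff=2.5):
    """Return the case IDs whose standardized factor score exceeds +/- cutoff."""
    return sorted(case_id for case_id, score in scores_by_id.items() if abs(score) > cutoff)

# Hypothetical standardized factor scores keyed by case ID, for illustration only.
scores = {1: 0.31, 2: -2.74, 3: 1.12, 4: 2.61, 5: -0.45}
print(find_outliers(scores))               # [2, 4]
print(find_outliers(scores, cutoff=3.0))   # []
```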
Slide 45
Excluding the Outliers from the Factor Analysis
First, select the 'Select Cases...' command from the Data menu.
Second, mark the 'If condition is satisfied' option in the 'Select' panel.
Third, click on the 'If...' button to specify the inclusion condition.
Slide 46
Specify the Criterion for Selecting Cases

First, we type in the criterion specifying that cases will be included if their ID number is not 5 and their ID number is not 42.
Second, click on the Continue button to complete the specification.
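The inclusion condition is a plain filter on the case ID. The sketch below expresses it in Python rather than in SPSS syntax; the placeholder records and the assumption that case IDs run from 1 to 100 are illustrative only.

```python
excluded_ids = {5, 42}  # the two outlying cases identified on the second factor

# Placeholder records; assumes (hypothetically) that case IDs run from 1 to 100.
cases = [{"id": case_id} for case_id in range(1, 101)]

# Keep a case only if its ID is not 5 and not 42.
selected = [case for case in cases if case["id"] not in excluded_ids]
print(len(selected))  # 98
```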
Slide 47
Re-computing the Factor Model

First, select 'Factor Analysis' using the 'Dialog Recall' tool button. Second, since we are not changing the specifications for the Factor Analysis, we click on the OK button to request that it be recomputed.
Slide 48
The Correlation Matrix for the Model Excluding Outliers

The correlation matrix for the full sample is shown in the top half of the window, and the correlation matrix for the sample excluding outliers is shown in the bottom half of the window. Some correlations are stronger without the outliers and others are weaker, but the overall pattern of correlations in the matrix is the same. This output would not support a conclusion that the outliers are having an impact on the factor results.
Correlation Matrix (full sample)
                              X1      X2      X3      X4      X6      X7
X1 Delivery Speed          1.000   -.349    .509    .050    .077   -.483
X2 Price Level             -.349   1.000   -.487    .272    .186    .470
X3 Price Flexibility        .509   -.487   1.000   -.116   -.034   -.448
X4 Manufacturer Image       .050    .272   -.116   1.000    .788    .200
X6 Salesforce Image         .077    .186   -.034    .788   1.000    .177
X7 Product Quality         -.483    .470   -.448    .200    .177   1.000

Correlation Matrix (excluding outliers)
                              X1      X2      X3      X4      X6      X7
X1 Delivery Speed          1.000   -.319    .487   -.039   -.020   -.450
X2 Price Level             -.319   1.000   -.471    .353    .272    .449
X3 Price Flexibility        .487   -.471   1.000   -.186   -.107   -.426
X4 Manufacturer Image      -.039    .353   -.186   1.000    .761    .295
X6 Salesforce Image        -.020    .272   -.107    .761   1.000    .284
X7 Product Quality         -.450    .449   -.426    .295    .284   1.000
Slide 49
The Communalities for the Model Excluding Outliers

The communalities for the full model are shown on the left, with the communalities for the model excluding outliers shown on the right. The overall pattern of communalities is identical for both models.
Slide 50
The Rotated Component Matrix for the Model Excluding Outliers

The Rotated Component Matrix for the full model is shown on the left, with the Rotated Component Matrix for the model excluding outliers shown on the right. The overall pattern of loadings in the Rotated Component Matrix is identical for both models, so we would conclude that the outliers are not impacting our solution. In subsequent analysis using the factors, we can include all cases in the analysis.
Slide 51
Stage 7: Additional Uses of the Factor Analysis Results

We have already computed the scores for the two factors, which we can use in subsequent analyses as a substitute for the six original variables.
Another option for reducing the data set is to select one of the variables on each factor to use as a surrogate for all the variables that loaded on that factor.
A more common method for incorporating the results of the factor analysis is to create summated scale variables. In this method, the variables which load on each factor are simply summed to form the scale score, rather than using the weights or coefficients for each variable that SPSS uses in calculating factor scores.
Summated scales are easier to compute than weighted factor scores and can easily be applied to cases not included in the original factor analysis. When summated scales are used, it is customary to compute Cronbach's Alpha.
Slide 52
Summated Scales and Cronbach's Alpha - 1

(From: Larry Hatcher and Edward J. Stepanski. A Step-by-Step Approach to Using the SAS System for Univariate and Multivariate Statistics.)
Summated or additive scales are formed by summing the scores for a set of variables that load on a factor. If you incorporate summated or additive scales into your research, there is an expectation that you will include efforts to assess the reliability of your measures.
Recall that an underlying construct is a hypothetical variable that you wish to measure, but which cannot be directly measured. The observed variables, on the other hand, consist of measurements that are actually obtained. A reliability coefficient is defined as the percent of variance in an observed variable that is accounted for by the true scores on the underlying construct.
Since it is generally not possible to obtain true scores on the underlying construct, reliability is usually defined in practice in terms of the consistency of the scores that are obtained on the observed variables; an instrument is said to be reliable if it is shown to provide consistent scores upon repeated administration, upon administration in alternate forms, and so forth.
A variety of methods for estimating scale reliability are used in practice. Test-retest reliability is assessed by administering the same instrument to the same sample of subjects at two points in time and computing the correlation between the two sets of scores. However, this can be a time-consuming and expensive procedure, in which you collect additional data that cannot be used in other analyses.
Because of the cost and time involved in test-retest procedures, indices of reliability that require only one administration are often used. The most popular of these are the internal consistency indices of reliability. Briefly, internal consistency is the extent to which the individual items that constitute a test correlate with one another or with the test total. In the social sciences, one of the most widely used indices of internal consistency is coefficient alpha, or Cronbach's alpha.
Summated Scales and Cronbach's Alpha - 2

Coefficient alpha normally takes values from 0 to 1.0, and the general rule of thumb is that it must be above 0.70 to be judged adequate. Coefficient alpha will be high to the extent that many items are included in the scale and the items that constitute the scale are highly correlated with one another. Coefficient alpha requires that we specify the variables that we believe form the scale; the measure then tells us whether we have internal consistency among these items and the summated scale that would be formed from them. In many of the articles we have used this semester, variables have been combined to form scales that were in turn used as independent variables. The statistic most frequently cited in discussing the formation of these scales is Cronbach's alpha, also called coefficient alpha.
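To make the statistic concrete, here is a minimal sketch of coefficient alpha using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the summed scale), where k is the number of items. The function name and the sample scores are illustrative assumptions; this is not a reproduction of the SPSS procedure or of the HATCO data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Compute coefficient alpha.

    rows: list of cases, each a list of item scores of equal length.
    Uses sample variances (n - 1 in the denominator) throughout.
    """
    k = len(rows[0])                      # number of items in the scale
    columns = list(zip(*rows))            # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical two-item scale scored for three cases.
example_rows = [[1, 2], [2, 3], [3, 5]]
alpha = cronbach_alpha(example_rows)
print(round(alpha, 4))
```

The sketch assumes at least two items and a summed scale with non-zero variance; otherwise the formula divides by zero.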
Slide 54
Computing the Reliability Coefficient for the First Factor
First, select 'Scale | Reliability Analysis...' from the Analyze menu.
Second, move the items loading on the first scale, Delivery Speed, Price Level, Price Flexibility, and Product Quality, to the list box of 'Items:'.
Third, select 'Alpha' from the drop down menu of 'Model:' choices.
Fourth, click on the 'Statistics...' button to specify the statistics we want included in the output.
Slide 55
Specifying the Statistics to Include in the Reliability Analysis

First, we mark the check boxes for 'Scale' and 'Scale if item deleted' and clear all other check boxes. If the obtained value of coefficient alpha is below the acceptable criterion, these statistics will suggest a remedy for correcting the problem.
Second, click on the Continue button to close the 'Reliability Analysis: Statistics' dialog box.
Third, click on the OK button to close the 'Reliability Analysis' dialog box.
Slide 56
The Reliability Analysis for the First Summated Scale

The alpha coefficient for the first summated scale is 0.8984, well above the 0.70 criterion. Had we not exceeded the criterion, we would have looked at the column "Alpha if Item Deleted" to see whether we could omit one of the variables from the list and form a reliable scale from the three remaining variables.
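The idea behind an "Alpha if Item Deleted" column can be sketched as follows: recompute coefficient alpha once for each item, leaving that item out. This is an illustrative sketch using the standard alpha formula, not the SPSS implementation, and the function names and scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of cases, each a list of item scores."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def alpha_if_item_deleted(rows):
    """Return one alpha per item, computed with that item removed.

    Requires at least three items so that two remain after deletion.
    """
    k = len(rows[0])
    results = []
    for drop in range(k):
        reduced = [[v for j, v in enumerate(row) if j != drop] for row in rows]
        results.append(cronbach_alpha(reduced))
    return results


# Hypothetical three-item scale; the third item does not track the other two.
example_rows = [[1, 2, 3], [2, 3, 1], [3, 5, 2]]
full_alpha = cronbach_alpha(example_rows)
deleted = alpha_if_item_deleted(example_rows)
print(round(full_alpha, 4), [round(a, 4) for a in deleted])
```

An item whose removal raises alpha noticeably is a candidate for omission from the scale, which is the remedy described above.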
Slide 57
Computing the Reliability Coefficient for the Second Factor
First, select 'Scale | Reliability Analysis...' from the Analyze menu.
Second, remove the variables for the first scale from the 'Items:' list box and move the items for the second scale, Manufacturer Image and Salesforce Image, to the list box of 'Items:'.
Third, all other specifications remain the same, so we click on the OK button to produce the output.
Slide 58
The Reliability Analysis for the Second Summated Scale

The alpha coefficient for the second summated scale is 0.8463, well above the 0.70 criterion.
Slide 59
