Slide 2 Principal components factor analysis Obtaining a factor solution through principal components analysis is an iterative process that usually requires repeating the SPSS factor analysis procedure a number of times to reach a satisfactory solution.
We begin by identifying a group of variables whose variance we believe can be represented more parsimoniously by a smaller set of factors, or components. The end result of the principal components analysis will tell us which variables can be represented by which components, and which variables should be retained as individual variables because the factor solution does not adequately represent their information. SW388R7 Data Analysis & Computers II
Slide 3 Strategy for solving problems - 1 A principal component factor analysis requires: The variables included must be metric level or dichotomous (dummy-coded) nominal level The sample size must be greater than 50 (preferably 100) The ratio of cases to variables must be 5 to 1 or larger The correlation matrix for the variables must contain 2 or more correlations of 0.30 or greater Variables with measures of sampling adequacy less than 0.50 must be removed The overall measure of sampling adequacy is 0.50 or higher The Bartlett test of sphericity is statistically significant.
The first phase of a principal component analysis is devoted to verifying that we meet these requirements. If we do not meet these requirements, factor analysis is not appropriate. SW388R7 Data Analysis & Computers II
Slide 4 Strategy for solving problems - 2 The second phase of a principal component factor analysis focuses on deriving a factor model, or pattern of relationships between variables and components, that satisfies the following requirements: The derived components explain 50% or more of the variance in each of the variables, i.e. have a communality greater than 0.50 None of the variables have loadings, or correlations, of 0.40 or higher for more than one component, i.e. do not have complex structure None of the components has only one variable in it
To meet these requirements, we remove problematic variables from the analysis and repeat the principal component analysis. SW388R7 Data Analysis & Computers II
Slide 5 Strategy for solving problems - 3 If, at the conclusion of this process, we have components that have more than one variable loading on them, have components that explain at least 50% of the variance in the included variables, and have components that collectively explain more than 60% of the variance in the set of variables, we can substitute the components for the variables in further analyses. Variables that were removed in the analysis should be included individually in further analyses. Substitution of components for individual variables is accomplished by using only the highest loading variable, or by combining the variables loading on each component to create a new variable. SW388R7 Data Analysis & Computers II
Slide 6 Notes - 1 When evaluating measures of sampling adequacy, communalities, or factor loadings, we ignore the sign of the numeric value and base our decision on the size or magnitude of the value.
The sign of the number indicates the direction of the relationship.
A loading of -0.732 is just as strong as a loading of 0.732. The minus sign indicates an inverse or negative relationship; the absence of a sign is meant to imply a plus sign indicating a direct or positive relationship. SW388R7 Data Analysis & Computers II
Slide 7 Notes - 2 If there are two or more components in the component matrix, the pattern of loadings is based on the SPSS Rotated Component Matrix. If there is only one component in the solution, the Rotated Component Matrix is not computed, and the pattern of loadings is based on the Component Matrix.
It is possible that the analysis will break down and we will have too few variables in the analysis to support the use of principal component analysis. SW388R7 Data Analysis & Computers II
Slide 8 Problem 1 In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Answer the question based on the results of a principal component analysis prior to testing for outliers, split sample validation, and a test of reliability. Assume that there is no problematic pattern of missing data. Use a level of significance of 0.05.
Based on the results of a principal component analysis of the 8 variables "claims about environmental threats are exaggerated" [grnexagg], "danger to the environment from modifying genes in crops" [genegen], "America doing enough to protect environment" [amprogrn], "should be international agreements for environment problems" [grnintl], "poorer countries should be expected to do less for the environment" [ldcgrn], "economic progress in America will slow down without more concern for environment" [econgrn], "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc], and "respondent's socioeconomic index" [sei], the information in these variables can be represented with 2 components and 4 individual variables.
Component 1 includes the variables "danger to the environment from modifying genes in crops" [genegen] and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc]. Component 2 includes the variables "claims about environmental threats are exaggerated" [grnexagg] and "poorer countries should be expected to do less for the environment" [ldcgrn]. The variables "respondent's socioeconomic index" [sei], "economic progress in America will slow down without more concern for environment" [econgrn], "should be international agreements for environment problems" [grnintl], and "America doing enough to protect environment" [amprogrn] were not included on the components and are retained as individual variables.
1. True 2. True with caution 3. False 4. Inappropriate application of a statistic SW388R7 Data Analysis & Computers II
Slide 9 Computing a principal component analysis To compute a principal component analysis in SPSS, select the Data Reduction | Factor command from the Analyze menu. SW388R7 Data Analysis & Computers II
Slide 10 Add the variables to the analysis First, move the variables listed in the problem to the Variables list box. Second, click on the Descriptives button to specify statistics to include in the output. SW388R7 Data Analysis & Computers II
Slide 11 Compete the descriptives dialog box First, mark the Univariate descriptives checkbox to get a tally of valid cases. Third, mark the Coefficients checkbox to get a correlation matrix, one of the outputs needed to assess the appropriateness of factor analysis for the variables. Second, keep the Initial solution checkbox to get the statistics needed to determine the number of factors to extract. Fourth, mark the KMO and Bartletts test of sphericity checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables. Fifth, mark the Anti-image checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables. Sixth, click on the Continue button. SW388R7 Data Analysis & Computers II
Slide 12 Select the extraction method First, click on the Extraction button to specify statistics to include in the output. The extraction method refers to the mathematical method that SPSS uses to compute the factors or components. SW388R7 Data Analysis & Computers II
Slide 13 Compete the extraction dialog box First, retain the default method Principal components. Second, click on the Continue button. SW388R7 Data Analysis & Computers II
Slide 14 Select the rotation method First, click on the Rotation button to specify statistics to include in the output. The rotation method refers to the mathematical method that SPSS rotate the axes in geometric space. This makes it easier to determine which variables are loaded on which components. SW388R7 Data Analysis & Computers II
Slide 15 Compete the rotation dialog box First, mark the Varimax method as the type of rotation to used in the analysis.
Second, click on the Continue button. SW388R7 Data Analysis & Computers II
Slide 16 Complete the request for the analysis First, click on the OK button to request the output. SW388R7 Data Analysis & Computers II
Slide 17 Level of measurement requirement "Respondent's socioeconomic index" [sei] is an interval level variable, which satisfy the level of measurement requirements for a principal component analysis.
"Claims about environmental threats are exaggerated" [grnexagg], "danger to the environment from modifying genes in crops" [genegen], "America doing enough to protect environment" [amprogrn], "should be international agreements for environment problems" [grnintl], "poorer countries should be expected to do less for the environment" [ldcgrn], "economic progress in America will slow down without more concern for environment" [econgrn], and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for principal component analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation. SW388R7 Data Analysis & Computers II
Slide 18 Descriptive Statistics 3.31 .988 67 3.09 .949 67 2.40 .605 67 1.97 .674 67 3.73 .898 67 2.69 .839 67 2.49 .894 67 49.757 18.9651 67 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS RESPONDENT'S SOCIOECONOMIC INDEX Mean Std. Devi ation Anal ysi s N Sample size requirement: minimum number of cases The number of valid cases for this set of variables is 67.
While principal component analysis can be conducted on a sample that has fewer than 100 cases, but more than 50 cases, we should be cautious about its interpretation. SW388R7 Data Analysis & Computers II
Slide 19 Descriptive Statistics 3.31 .988 67 3.09 .949 67 2.40 .605 67 1.97 .674 67 3.73 .898 67 2.69 .839 67 2.49 .894 67 49.757 18.9651 67 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS RESPONDENT'S SOCIOECONOMIC INDEX Mean Std. Devi ation Anal ysi s N Sample size requirement: ratio of cases to variables The ratio of cases to variables in a principal component analysis should be at least 5 to 1.
With 67 and 8 variables, the ratio of cases to variables is 8.4 to 1, which exceeds the requirement for the ratio of cases to variables. SW388R7 Data Analysis & Computers II
Slide 20 Correlation Matrix 1.000 -.240 .394 -.305 .301 -.117 -.126 .069 -.240 1.000 -.301 .146 -.149 .340 .394 .208 .394 -.301 1.000 -.305 .258 -.136 -.401 .054 -.305 .146 -.305 1.000 -.289 .117 .151 .094 .301 -.149 .258 -.289 1.000 -.194 -.003 .114 -.117 .340 -.136 .117 -.194 1.000 .310 -.018 -.126 .394 -.401 .151 -.003 .310 1.000 .341 .069 .208 .054 .094 .114 -.018 .341 1.000 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS RESPONDENT'S SOCIOECONOMIC INDEX Correl ati on ENVIRON MENTAL THREATS EXAGGER ATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONME NT INTL AGREEMEN TS FOR ENVIRONME NT PROBLEMS POOR COUNTRIES LESS THAN RICH FOR ENVIRONME NT ECONOMIC PROGRESS DEPENDEN T ON ENVIRONME NT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS RESPONDEN T'S SOCIOECON OMIC INDEX Appropriateness of factor analysis: Presence of substantial correlations Principal components analysis requires that there be some correlations greater than 0.30 between the variables included in the analysis.
For this set of variables, there are 10 correlations in the matrix greater than 0.30, satisfying this requirement. The correlations greater than 0.30 are highlighted in yellow. SW388R7 Data Analysis & Computers II
Slide 21 Appropriateness of factor analysis: Sampling adequacy of individual variables Principal component analysis requires that the Kaiser-Meyer-Olkin Measure of Sampling Adequacy be greater than 0.50 for each individual variable as well as the set of variables.
The Measure of Sampling Adequacy (MSA) is described at marvelous if it is 0.90 or greater, meritorious if it is in the 0.80's, middling if in the 0.70's, mediocre if in the in the 0.60's , miserable if in the 0.50's, and unacceptable if below 0.50. There are two anti-image matrices: the anti-image covariance matrix and the anti-image correlation matrix. We are interested in the anti-image correlation matrix. SW388R7 Data Analysis & Computers II
Slide 22 Appropriateness of factor analysis: Sampling adequacy of individual variables On iteration 1, the MSA for the variable "respondent's socioeconomic index" [sei] was 0.410 which was less than 0.50, so it was removed from the analysis. SPSS locates the Measures of Sampling Adequacy are on the diagonal of the anti- image correlation matrix, highlighted in yellow. SW388R7 Data Analysis & Computers II
Slide 23 Excluding a variable from the factor analysis To remove the variable "respondent's socioeconomic index" [sei] from the analysis, click on the Dialog Recall tool button to access the drop down menu. SW388R7 Data Analysis & Computers II
Slide 24 Repeating the factor analysis In the drop down menu, select Factor Analysis to reopen the factor analysis dialog box. SW388R7 Data Analysis & Computers II
Slide 25 Removing the variable from the list of variables First, highlight the sei variable. Second, click on the left arrow button to remove the variable from the Variables list box. SW388R7 Data Analysis & Computers II
Slide 26 Replicating the factor analysis The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.
To replicate the analysis without the variable that we just removed, click on the OK button. SW388R7 Data Analysis & Computers II
Slide 27 Appropriateness of factor analysis: Sample adequacy for revised factor analysis In the factor analysis with the sei variable removed, we see that the measures of sampling adequacy for the remaining variables are all greater than 0.50. On iteration 2, the MSA for all of the individual variables still included in the analysis was greater than 0.5, supporting their retention in the analysis. SW388R7 Data Analysis & Computers II
Slide 28 KMO and Bartlett's Test .734 69.350 21 .000 Kaiser-Meyer-Ol ki n Measure of Sampli ng Adequacy. Approx. Chi-Square df Si g. Bartl ett's Test of Spheri ci ty Appropriateness of factor analysis: Sample adequacy for set of variables In addition, the overall MSA for the set of variables included in the analysis was 0.734, which exceeds the minimum requirement of 0.50 for overall MSA. The seven variables remaining in the analysis satisfy the criteria for appropriateness of factor analysis.
SW388R7 Data Analysis & Computers II
Slide 29 KMO and Bartlett's Test .642 84.006 28 .000 Kaiser-Meyer-Ol ki n Measure of Sampli ng Adequacy. Approx. Chi-Square df Si g. Bartl ett's Test of Spheri ci ty Appropriateness of factor analysis: Bartlett test of sphericity Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity be less than the level of significance.
The probability associated with the Bartlett test is <0.001, which satisfies this requirement.
The next step is to determine the number of factors that should be included in the factor solution. SW388R7 Data Analysis & Computers II
Slide 30 Number of factors to extract: Latent root criterion Total Variance Explained 2.461 35.164 35.164 2.461 35.164 35.164 1.849 26.408 26.408 1.229 17.557 52.721 1.229 17.557 52.721 1.842 26.313 52.721 .889 12.695 65.415 .722 10.313 75.728 .621 8.878 84.606 .572 8.174 92.780 .505 7.220 100.000 Component 1 2 3 4 5 6 7 Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Initi al Ei genval ues Extracti on Sums of Squared Loadi ngs Rotati on Sums of Squared Loadi ngs Extracti on Method: Princi pal Component Anal ysi s. Using the output from iteration 2, there were 2 eigenvalues greater than 1.0.
The latent root criterion for number of factors to derive would indicate that there were 2 components to be extracted for these variables. SW388R7 Data Analysis & Computers II
Slide 31 Number of factors to extract: Percentage of variance criterion Total Variance Explained 2.461 35.164 35.164 2.461 35.164 35.164 1.849 26.408 26.408 1.229 17.557 52.721 1.229 17.557 52.721 1.842 26.313 52.721 .889 12.695 65.415 .722 10.313 75.728 .621 8.878 84.606 .572 8.174 92.780 .505 7.220 100.000 Component 1 2 3 4 5 6 7 Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Initi al Ei genval ues Extracti on Sums of Squared Loadi ngs Rotati on Sums of Squared Loadi ngs Extracti on Method: Princi pal Component Anal ysi s. In contrast, the cumulative proportion of variance criteria would require 3 components to satisfy the criterion of explaining 60% or more of the total variance.
A 3 components solution would explain 65.415% of the total variance. Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution was based on the extraction of 2 components. SW388R7 Data Analysis & Computers II
Slide 32 Communalities 1.000 .526 1.000 .582 1.000 .521 1.000 .491 1.000 .494 1.000 .437 1.000 .640 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS Initi al Extracti on Extracti on Method: Pri nci pal Component Anal ysi s. Evaluating communalities Communalities represent the proportion of the variance in the original variables that is accounted for by the factor solution.
The factor solution should explain at least half of each original variable's variance, so the communality value for each variable should be 0.50 or higher. SW388R7 Data Analysis & Computers II
Slide 33 Communalities 1.000 .526 1.000 .582 1.000 .521 1.000 .491 1.000 .494 1.000 .437 1.000 .640 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT ECONOMIC PROGRESS DEPENDENT ON ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS Initi al Extracti on Extracti on Method: Princi pal Component Anal ysi s. Communality requiring variable removal
On iteration 2, the communality for the variable "economic progress in America will slow down without more concern for environment" [econgrn] was 0.437 which was less than 0.50.
The variable was removed and the principal component analysis was computed again. In this iteration, there are actually three variables that have communalities less than 0.50. The variable with the smallest communality is selected for removal. SW388R7 Data Analysis & Computers II
Slide 34 Repeating the factor analysis In the drop down menu, select Factor Analysis to reopen the factor analysis dialog box. SW388R7 Data Analysis & Computers II
Slide 35 Removing the variable from the list of variables First, highlight the econgrn variable. Second, click on the left arrow button to remove the variable from the Variables list box. SW388R7 Data Analysis & Computers II
Slide 36 Replicating the factor analysis The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.
To replicate the analysis without the variable that we just removed, click on the OK button. SW388R7 Data Analysis & Computers II
Slide 37 Communalities 1.000 .517 1.000 .573 1.000 .576 1.000 .486 1.000 .573 1.000 .716 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT INTL AGREEMENTS FOR ENVIRONMENT PROBLEMS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS Initi al Extracti on Extracti on Method: Princi pal Component Anal ysi s. Communality requiring variable removal On iteration 3, the communality for the variable "should be international agreements for environment problems" [grnintl] was 0.486 which was less than 0.50.
The variable was removed and the principal component analysis was computed again. SW388R7 Data Analysis & Computers II
Slide 38 Repeating the factor analysis In the drop down menu, select Factor Analysis to reopen the factor analysis dialog box. SW388R7 Data Analysis & Computers II
Slide 39 Removing the variable from the list of variables First, highlight the grnintl variable. Second, click on the left arrow button to remove the variable from the Variables list box. SW388R7 Data Analysis & Computers II
Slide 40 Replicating the factor analysis The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.
To replicate the analysis without the variable that we just removed, click on the OK button. SW388R7 Data Analysis & Computers II
Slide 41 Communalities 1.000 .591 1.000 .568 1.000 .582 1.000 .679 1.000 .725 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS Initi al Extracti on Extracti on Method: Princi pal Component Anal ysi s. Communality satisfactory for all variables Complex structure occurs when one variable has high loadings or correlations (0.40 or greater) on more than one component. If a variable has complex structure, it should be removed from the analysis.
Variables are only checked for complex structure if there is more than one component in the solution. Variables that load on only one component are described as having simple structure. Once any variables with communalities less than 0.50 have been removed from the analysis, the pattern of factor loadings should be examined to identify variables that have complex structure. SW388R7 Data Analysis & Computers II
Slide 42 Rotated Component Matrix a -.265 .722 .732 -.182 -.581 .495 .092 .819 .847 .087 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS AMERICAN DOING ENOUGH TO PROTECT ENVIRONMENT POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS 1 2 Component Extracti on Method: Pri nci pal Component Anal ysi s. Rotati on Method: Vari max wi th Kai ser Normal i zati on. Rotati on converged in 3 i terati ons. a. Identifying complex structure On iteration 4, the variable "America doing enough to protect environment" [amprogrn] was found to have complex structure.
Specifically, the variable had a loading of -0.581 on component 1 and a loading of 0.495 on component 2.
The variable should be removed and the principal component analysis should be repeated. If only one component has been extracted, each variable can only load on that one factor, so complex structure is not an issue. SW388R7 Data Analysis & Computers II
Slide 43 Repeating the factor analysis In the drop down menu, select Factor Analysis to reopen the factor analysis dialog box. SW388R7 Data Analysis & Computers II
Slide 44 Removing the variable from the list of variables First, highlight the amprogrn variable. Second, click on the left arrow button to remove the variable from the Variables list box. SW388R7 Data Analysis & Computers II
Slide 45 Replicating the factor analysis The dialog recall command opens the dialog box with all of the settings that we had selected the last time we used factor analysis.
To replicate the analysis without the variable that we just removed, click on the OK button. SW388R7 Data Analysis & Computers II
Slide 46 Rotated Component Matrix a -.207 .756 .801 -.229 .051 .830 .861 .059 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS 1 2 Component Extracti on Method: Pri nci pal Component Anal ysi s. Rotati on Method: Vari max wi th Kai ser Normal i zati on. Rotati on converged in 3 i terati ons. a. Checking for single-variable components On iteration 5, none of the variables demonstrated complex structure.
It is not necessary to remove any additional variables because of complex structure. After variables have been removed for low communalities and complex structure, the factor solution is examined to remove any components that have only a single variable loading on them.
If a component has only a single variable loading on it, the variable should be removed from the next iteration of the principal component analysis. SW388R7 Data Analysis & Computers II
Slide 47 Variable loadings on components Rotated Component Matrix a -.207 .756 .801 -.229 .051 .830 .861 .059 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS 1 2 Component Extracti on Method: Pri nci pal Component Anal ysi s. Rotati on Method: Vari max wi th Kai ser Normal i zati on. Rotati on converged in 3 i terati ons. a. On iteration 5, the 2 components in the analysis had more than one variable loading on each of them. Communalities 1.000 .615 1.000 .694 1.000 .691 1.000 .744 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS Initi al Extracti on Extracti on Method: Principal Component Anal ysi s. The communalities for all of the variables included on the components were greater than 0.50 and all variables had simple structure.
The principal component analysis has been completed. SW388R7 Data Analysis & Computers II
Slide 48 Rotated Component Matrix a -.207 .756 .801 -.229 .051 .830 .861 .059 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS 1 2 Component Extracti on Method: Pri nci pal Component Anal ysi s. Rotati on Method: Vari max wi th Kai ser Normal i zati on. Rotati on converged in 3 i terati ons. a. Interpreting the principal components The information in 4 of the variables can be represented by 2 components. Component 1 includes the variables "danger to the environment from modifying genes in crops" [genegen] and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc]. We can substitute one component variable for this combination of variables in further analyses.
Component 2 includes the variables "claims about environmental threats are exaggerated" [grnexagg] and "poorer countries should be expected to do less for the environment" [ldcgrn]. We can substitute one component variable for this combination of variables in further analyses. SW388R7 Data Analysis & Computers II
Slide 49 Communalities 1.000 .615 1.000 .694 1.000 .691 1.000 .744 ENVIRONMENTAL THREATS EXAGGERATED HOW DANGEROUS MODIFYING GENES IN CROPS POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS Initi al Extracti on Extracti on Method: Principal Component Anal ysi s. Variance explained in individual variables The components explain at least 50% of the variance in each of the variables included in the final analysis. SW388R7 Data Analysis & Computers II
Slide 50 Total Variance Explained 1.626 40.651 40.651 1.626 40.651 40.651 1.428 35.699 35.699 1.119 27.968 68.619 1.119 27.968 68.619 1.317 32.921 68.619 .694 17.341 85.960 .562 14.040 100.000 Component 1 2 3 4 Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Initi al Ei genval ues Extracti on Sums of Squared Loadi ngs Rotati on Sums of Squared Loadi ngs Extracti on Method: Princi pal Component Anal ysi s. Total variance explained The 2 components explain 68.619% of the total variance in the variables which are included on the components. SW388R7 Data Analysis & Computers II
Slide 51 Total Variance Explained 1.626 40.651 40.651 1.626 40.651 40.651 1.428 35.699 35.699 1.119 27.968 68.619 1.119 27.968 68.619 1.317 32.921 68.619 .694 17.341 85.960 .562 14.040 100.000 Component 1 2 3 4 Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Total % of Vari ance Cumul ati ve % Initi al Ei genval ues Extracti on Sums of Squared Loadi ngs Rotati on Sums of Squared Loadi ngs Extracti on Method: Princi pal Component Anal ysi s. Answering the problem question The answer to the original question is true with caution.
Component 1 includes the variables "danger to the environment from modifying genes in crops" [genegen] and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc]. We can substitute one component variable for this combination of variables in further analyses.
Component 2 includes the variables "claims about environmental threats are exaggerated" [grnexagg] and "poorer countries should be expected to do less for the environment" [ldcgrn]. We can substitute one component variable for this combination of variables in further analyses.
The components explain at least 50% of the variance in each of the variables included in the final analysis.
The components explain 68.619% of the total variance in the variables which are included on the components.
A caution is added to our findings because of the inclusion of ordinal level variables in the analysis and the limited sample available for analysis. SW388R7 Data Analysis & Computers II
Slide 52 Problem 2 2. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Answer the question based on the results of a principal component analysis prior to testing for outliers, split sample validation, and a test of reliability. Assume that there is no problematic pattern of missing data. Use a level of significance of 0.05.
A principal component analysis of the variables "number of hours worked in the past week" [hrs1], "occupational prestige score" [prestg80], "age" [age], and "highest academic degree" [degree] does not result in any usable components that reduce the number of variable needed to represent the information in the original variables.
1. True 2. True with caution 3. False 4. Inappropriate application of a statistic SW388R7 Data Analysis & Computers II
Slide 53 Computing a principal component analysis To compute a principal component analysis in SPSS, select the Data Reduction | Factor command from the Analyze menu. SW388R7 Data Analysis & Computers II
Slide 54 Add the variables to the analysis First, move the variables listed in the problem to the Variables list box. Second, click on the Descriptives button to specify statistics to include in the output. SW388R7 Data Analysis & Computers II
Slide 55 Compete the descriptives dialog box First, mark the Univariate descriptives checkbox to get a tally of valid cases. Third, mark the Coefficients checkbox to get a correlation matrix, one of the outputs needed to assess the appropriateness of factor analysis for the variables. Second, keep the Initial solution checkbox to get the statistics needed to determine the number of factors to extract. Fourth, mark the KMO and Bartletts test of sphericity checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables. Fifth, mark the Anti-image checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables. Sixth, click on the Continue button. SW388R7 Data Analysis & Computers II
Slide 56 Select the extraction method First, click on the Extraction button to specify statistics to include in the output. SW388R7 Data Analysis & Computers II
Slide 57 Compete the extraction dialog box First, retain the default method Principal components. Second, click on the Continue button. SW388R7 Data Analysis & Computers II
Slide 58 Select the rotation method First, click on the Rotation button to specify statistics to include in the output. SW388R7 Data Analysis & Computers II
Slide 59 Compete the rotation dialog box First, mark the Varimax method as the type of rotation to used in the analysis.
Second, click on the Continue button. SW388R7 Data Analysis & Computers II
Slide 60 Complete the request for the analysis First, click on the OK button to request the output. SW388R7 Data Analysis & Computers II
Slide 61 Level of measurement requirement "Number of hours worked in the past week" [hrs1], "occupational prestige score" [prestg80], and "age" [age] are interval level variables, which satisfy the level of measurement requirements for a principal component analysis.
"Highest academic degree" [degree] is an ordinal level variable. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for principal component analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation. SW388R7 Data Analysis & Computers II
Slide 62 Descriptive Statistics 40.94 12.624 172 45.37 14.218 172 41.81 11.837 172 1.70 1.185 172 NUMBER OF HOURS WORKED LAST WEEK RS OCCUPATIONAL PRESTIGE SCORE (1980) AGE OF RESPONDENT RS HIGHEST DEGREE Mean Std. Devi ation Anal ysi s N Sample size requirement: minimum number of cases The number of valid cases for this set of variables is 172.
The preferred minimum sample size requirement of 100 valid cases is satisfied. SW388R7 Data Analysis & Computers II
Slide 63 Descriptive Statistics 40.94 12.624 172 45.37 14.218 172 41.81 11.837 172 1.70 1.185 172 NUMBER OF HOURS WORKED LAST WEEK RS OCCUPATIONAL PRESTIGE SCORE (1980) AGE OF RESPONDENT RS HIGHEST DEGREE Mean Std. Devi ation Anal ysi s N Sample size requirement: ratio of cases to variables The ratio of cases to variables in a principal component analysis should be at least 5 to 1.
With 172 and 4 variables, the ratio of cases to variables is 43.0 to 1, which exceeds the requirement for the ratio of cases to variables. SW388R7 Data Analysis & Computers II
Slide 64 Correlation Matrix 1.000 .160 .021 .174 .160 1.000 .060 .525 .021 .060 1.000 .068 .174 .525 .068 1.000 NUMBER OF HOURS WORKED LAST WEEK RS OCCUPATIONAL PRESTIGE SCORE (1980) AGE OF RESPONDENT RS HIGHEST DEGREE Correl ati on NUMBER OF HOURS WORKED LAST WEEK RS OCCUPA TIONAL PRESTIG E SCORE (1980) AGE OF RESPON DENT RS HIGHEST DEGREE Appropriateness of factor analysis: Presence of substantial correlations Principal components analysis requires that there be some correlations greater than 0.30 between the variables included in the analysis.
For this set of variables, there is 1 correlation in the matrix greater than 0.30, which does not satisfy this requirement. Since we do not satisfy a basic requirement for principal component factor analysis, the answer to the question is inappropriate application of a statistic SW388R7 Data Analysis & Computers II
Slide 65 Steps in principal component analysis - 1 The following is a guide to the decision process for answering problems about principal component analysis: Is the number of valid cases 50 or more?
Yes No Incorrect application of a statistic Yes No Are the variables included in the analysis metric?
Is the ratio of cases to variables at least 5 to 1? Yes No Incorrect application of a statistic Incorrect application of a statistic SW388R7 Data Analysis & Computers II
Slide 66 Steps in principal component analysis - 2 Is the measure of sampling adequacy larger than 0.50 for each variable? Yes No Incorrect application of a statistic Yes No Are there two or more correlations that are 0.30 or greater? Overall measure of sampling adequacy greater than 0.50? Yes No Incorrect application of a statistic Remove variable with lowest MSA and repeat analysis SW388R7 Data Analysis & Computers II
Slide 67 Steps in principal component analysis - 3 Incorrect application of a statistic Yes No Probability for Bartlett test of sphericity less than level of significance? Communality for each variable greater than 0.50?
Yes No Remove variable with lowest communality and repeat analysis SW388R7 Data Analysis & Computers II
Slide 68 Steps in principal component analysis - 4 Does all variables show simple structure (only 1 loading > 0.40)? Yes No Do all of the components have more than one variable loading on it? Yes No Remove variable with complex structure and repeat analysis False No Are the number of components and pattern of loadings correct? Remove single variable loading on component and repeat analysis Yes SW388R7 Data Analysis & Computers II
Slide 69 Steps in principal component analysis - 5 Are all of the metric variables included in the final analysis interval level? Yes No Is the sample size for the final analysis include 100 cases or more? Yes No True with caution No Is the cumulative proportion of variance for variables 60% or higher? Yes True with caution True with caution True