Handout: Statistical Analysis using SPSS-Juliana Bahiense
Statistical Analysis Using SPSS: Practical Command Guide
Bahiense, Juliana de Sousa Guimarães. Salvador / BA
julianabahiense@gmail.com

Contents

1. Introduction
2. First Step
3. The Windows
4. Menus
4.1 Data Editor
4.2 Output
5. Data Analysis
6. Bibliography

1. Introduction

The Statistical Package for the Social Sciences (SPSS) for Windows is software for statistical data analysis. It offers a friendly environment of menus and dialog windows that lets you perform complex calculations and presents the results in a simple, self-explanatory way. According to Wikipedia, "SPSS is a scientific software application (computer program); the name is an acronym for Statistical Package for the Social Sciences.
This package supports decision making and includes analytical applications, data mining, text mining, and statistics that turn data into valuable information, helping to lower costs and increase profitability. One of the important uses of this software is market research." The first version dates from 1968, and the latest is SPSS for Windows 16 (2007).

To illustrate, we will use the databases 1991 U.S. General Social Survey.sav and anorectic.sav, which are in the SPSS directory.

To make good use of the routines presented in this handout, previous knowledge of statistical techniques for data mining is necessary.

2. First Step

When you start the program, an opening screen appears. There you can open an existing file (database, syntax, or output), go to the tutorial, or create a new database.

3. The Windows

In SPSS there are seven types of windows:

SPSS Data Editor: allows entry, modification, and visualization of data.
Output - SPSS Viewer: the results window, with tables and graphs.
Syntax - SPSS Syntax Editor: the window where we keep SPSS commands for reuse at another time.
SPSS Pivot Table Object: edits and modifies tables.
SPSS Chart Object: edits and modifies graphs.
Script Editor: creates and modifies scripts to automate tasks.
Text Output Editor: edits text output that is not displayed in pivot tables.

However, we work primarily with the first three, which are presented in this handout. The initial appearance of the editor is shown in the following figures. In Figure 1 we have the Data View (Data Editor), in which columns are variables and rows are cases (or individuals). The cells can contain numeric or alphanumeric values, but cannot contain formulas.
Figure 1 - Data view - database anorectic.sav

In Figure 2 we have the Variable View (Data Editor), where we define the characteristics of the variables:

Name: variable name, maximum 64 characters; uppercase and lowercase letters are treated as equal.
Type: type of variable (numeric, date, currency, alphanumeric (string)).
Width: length of the variable, i.e. the number of digits it has.
Decimals: number of decimal places of the variable.
Label: descriptive label of the variable.
Values: value labels of the variable (e.g., 1 = female and 2 = male).
Missing: indicates the coding of missing values, which are not considered in statistical calculations.
Columns: number of characters that make up the column, i.e. the column width.
Align: alignment of the data.
Measure: selects the measurement scale of the variable (interval / ratio, ordinal, or nominal).

Figure 2 - Variable view - database anorectic.sav

In Figure 3 we have the Output window, which shows all requested output, such as graphs, tables, and statistics. In Figure 4 we show the syntax of the "Frequencies" command, from the Descriptive Statistics topic.

Figure 3 - Output window - database anorectic.sav

Figure 4 - Syntax window - database anorectic.sav

4. Menus

4.1 Data Editor

File - has the functions to create, open, read, print, and save files, show recently used files, stop the processor, and exit the program.

Edit - editing commands: modify, copy, paste, cut, delete, find, and set the output format (defaults).

View - screen appearance: toolbars, fonts, status bar, grid lines, and labels of variables.
Data - inserts a variable or case, defines the data format, sorts the file by the values of a variable, transposes the file (swapping cases and variables in a new file), merges files, creates a new file with aggregated values of the original variables, splits a file according to a qualitative variable, and selects cases that meet a certain condition on the values of a variable.

Transform - changes selected variables: computes new variables from existing ones, generates a random sample, creates a new variable by recoding existing variables, converts a variable into a categorical (qualitative) one, assigns ranks to the values of a variable (optionally according to another), creates lagged time-series variables, replaces missing values, and runs pending transformations.

Analyze - descriptive analyses and statistical procedures: frequency tables, ANOVA, correlation, regression, factor analysis, reliability analysis, multiple response analysis, nonparametric tests, survival analysis, etc.

Graphs - creates bar charts, pie charts, box plots, line charts, histograms, etc.

Utilities - obtains information about variables, changes menus, scripts ...

Window - switches between the different SPSS windows that are open.

Help - help topics, tutorials, SPSS home page.

4.2 Output

The menu bar of the Output window is similar to that of the Data Editor window, plus the items Insert and Format.

5. Data Analysis

In SPSS we can create a new database in the program itself or import one from other software such as Excel, Access, or dBase. Once the database is loaded, SPSS is ready to be used. We start with the simpler procedures of descriptive statistics. For this analysis we will use the database 1991 U.S. General Social Survey.sav.

Frequency Distribution Table

To generate a frequency table, use the following commands in the menu bar of the Data Editor or Output window:

    Analyze > Descriptive Statistics > Frequencies

Or we can use a command in the Syntax window, as follows:

    FREQUENCIES VARIABLES=sex /ORDER=ANALYSIS.

For this example we select the variable "sex" (sex of the respondent), obtaining the following output:

Respondent's Sex
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Male          636        41.9        41.9               41.9
        Female        881        58.1        58.1              100.0
        Total        1517       100.0       100.0

We can format the table, for example the number of decimal places, the % sign, the font, etc. To do this, in the Output window, double-click the table with the left mouse button to open the editing "island", select the data you want to format, and right-click to open the list of menu options. You can also request frequency tables for several variables at once, simply by selecting them in the dialog or adding them to the syntax command:

    FREQUENCIES VARIABLES=sex sibs /ORDER=ANALYSIS.

Also in this item, through the Statistics and Charts buttons, we can request summary statistics and graphs to represent the variables.
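The Statistics and Charts choices of the Frequencies dialog also have syntax equivalents. The following is a minimal sketch, assuming the variable sibs used in the example above; the particular statistics and the histogram are illustrative choices, not ones prescribed by this handout.

```spss
* Frequency table for sibs with summary statistics and a histogram.
* The statistics listed here are an illustrative selection.
FREQUENCIES VARIABLES=sibs
  /STATISTICS=MEAN MEDIAN MODE STDDEV MINIMUM MAXIMUM
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```

For a qualitative variable, /BARCHART or /PIECHART would normally take the place of /HISTOGRAM.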
When we need to describe quantitative variables using summary statistics, we can use the command:

    Analyze > Descriptive Statistics > Descriptives

Or the command:

    Analyze > Descriptive Statistics > Explore

With this item of the Analyze menu we can also obtain summary measures, boxplots, stem-and-leaf plots, and the Kolmogorov-Smirnov and Shapiro-Wilk tests of normality. In these tests the null hypothesis, H0, states that the variable follows a Normal distribution, and the alternative hypothesis, Ha, states that it does not; the decision rule is: if p-value < α, we reject H0. We can also make a visual analysis using the Q-Q and detrended Q-Q plots (normality is indicated when the points lie close to the line in the Q-Q plot and scatter randomly around the horizontal line in the detrended Q-Q plot). To analyze a variable X according to the levels of a factor Y, put X in "Dependent List" and Y in "Factor List".

To analyze a quantitative variable according to a qualitative one, for example to find out whether sex (sex) may explain variations in years of schooling (educ), we can use:

I. Analyze > Descriptive Statistics > Explore
II. Analyze > Reports > Report Summaries in Rows
III. Analyze > Compare Means > Means
IV. Analyze > Compare Means > Independent-Samples T Test
V. Graphs > Boxplot

To apply Student's t test, we must check that the tested variable meets the assumptions of normality and homoscedasticity; the latter can be checked with Levene's test, whose null hypothesis states that there is no difference between the variances. The t test has as its null hypothesis that there is no difference between the means of the variable in the groups defined by the factor. For both tests the decision rule is: if p-value < α, we reject H0.

Cross-tabulation of variables can be done through the command:

    Analyze > Descriptive Statistics > Crosstabs

Then we select the variables that will form the rows and columns. We can add percentages by clicking "Cells" (the Cell Display dialog).
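A cross-tabulation with percentages can also be requested from the Syntax window. This is only a sketch: the variable names sex and race are assumptions about the data file and should be replaced with the row and column variables of interest.

```spss
* Cross-tabulate two categorical variables.
* sex and race are assumed variable names, used only for illustration.
* COUNT, ROW and COLUMN correspond to choices in the Cell Display dialog.
CROSSTABS
  /TABLES=sex BY race
  /CELLS=COUNT ROW COLUMN.
```

Here the first variable forms the rows and the variable after BY forms the columns.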
We can also use one of the table commands, for example:

    Analyze > General Tables > General Tables

Correlation analysis can be used to study how the variables relate to each other. We can obtain the Pearson correlation coefficient and the Spearman correlation coefficient (for variables whose distribution is not Normal):

    Analyze > Correlate > Bivariate

Correlations (Spearman's rho)
                                                                Number of   Highest Year of    Highest Year School
                                                                Children    School Completed   Completed, Father
Number of Children                   Correlation Coefficient    1.000       -.262(**)          -.297(**)
                                     Sig. (2-tailed)            .           .000               .000
                                     N                          1509        1507               1064
Highest Year of School Completed     Correlation Coefficient    -.262(**)   1.000              .450(**)
                                     Sig. (2-tailed)            .000        .                  .000
                                     N                          1507        1510               1065
Highest Year School Completed,       Correlation Coefficient    -.297(**)   .450(**)           1.000
Father                               Sig. (2-tailed)            .000        .000               .
                                     N                          1064        1065               1069
** Correlation is significant at the 0.01 level (2-tailed).

The null hypothesis tested is zero correlation (two-tailed test).

Regression analysis can be used to model a variable as a function of one or more other variables:

    Analyze > Regression > (select the model type)

The following is the output of a linear regression in which the dependent variable is "educ" and the independent variables are "sex", "paeduc", and "maeduc".

Variables Entered/Removed(b)
Model   Variables Entered                              Variables Removed   Method
1       Highest Year School Completed, Mother;         .                   Enter
        Respondent's Sex;
        Highest Year School Completed, Father(a)
a All requested variables entered.
b Dependent Variable: Highest Year of School Completed

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .486(a)   .236       .234                2.448

Coefficient of determination: R2 = 23.6%. This model explains 23.6% of the variation in "educ".
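A regression like the one reported here could be requested from the Syntax window along the following lines. This is a sketch under stated assumptions: the handout does not show the command that was actually run, the name sex for the respondent's sex variable is an assumption about the data file, and the subcommands are a plausible selection rather than the author's exact settings.

```spss
* Linear regression of educ on sex, paeduc and maeduc (Enter method).
* The variable name sex is an assumption about the data file.
* NORMPROB(ZRESID) requests a normal P-P plot of the standardized residuals.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT educ
  /METHOD=ENTER sex paeduc maeduc
  /RESIDUALS NORMPROB(ZRESID).
```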
ANOVA(b)
Model            Sum of Squares   df    Mean Square   F        Sig.
1   Regression   1796.560         3     598.853       99.934   .000(a)
    Residual     5806.745         969   5.993
    Total        7603.305         972
a Predictors: (Constant), Highest Year School Completed, Mother, Respondent's Sex, Highest Year School Completed, Father
b Dependent Variable: Highest Year of School Completed

Since the p-value is reported as .000 (below 0.001), we reject H0, and educ can be modeled by a linear function of the selected predictors.

Coefficients(a)
                                             Unstandardized Coefficients   Standardized Coefficients
Model                                        B        Std. Error           Beta                        t        Sig.
1   (Constant)                               9.902    .384                                             25.782   .000
    Respondent's Sex                         -.380    .160                 -.067                       -2.381   .017
    Highest Year School Completed, Father    .196     .026                 .288                        7.574    .000
    Highest Year School Completed, Mother    .189     .031                 .231                        6.085    .000
a Dependent Variable: Highest Year of School Completed

The equation of the model is:

    educ = 9.902 - 0.380 sex + 0.196 paeduc + 0.189 maeduc

All predictors are statistically significant.

Residuals Statistics(a)
                                     Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value                      9.14      17.22     13.54   1.360            973
Std. Predicted Value                 -3.239    2.707     .000    1.000            973
Standard Error of Predicted Value    .104      .379      .151    .041             973
Adjusted Predicted Value             9.11      17.20     13.54   1.359            973
Residual                             -9.603    8.277     .000    2.444            973
Std. Residual                        -3.923    3.381     .000    .998             973
Stud. Residual                       -3.930    3.399     .000    1.001            973
Deleted Residual                     -9.636    8.365     .000    2.455            973
Stud. Deleted Residual               -3.959    3.418     .000    1.002            973
Mahal. Distance                      .744      22.354    2.997   2.499            973
Cook's Distance                      .000      .045      .001    .003             973
Centered Leverage Value              .001      .023      .003    .003             973
a Dependent Variable: Highest Year of School Completed

Figure - Normal P-P Plot of Regression Standardized Residual (dependent variable: Highest Year of School Completed; horizontal axis: Observed Cum Prob; vertical axis: Expected Cum Prob)

Visual analysis of the residuals is used to assess the quality of the fit. Here it indicates normality of the data ("educ").

Factor analysis has as its main objective to describe the variability of a set of variables in terms of a smaller number of variables (factors) that are related to the original set by a linear model, with as little loss of information as possible. SPSS uses the following command:

    Analyze > Data Reduction > Factor

In this dialog we can request descriptive statistics and correlation coefficients.

Initial solution: presents the communalities, the eigenvalues, and the percentage of variance explained.

KMO and Bartlett's test of sphericity: tests of the validity of applying factor analysis. The null hypothesis of Bartlett's test of sphericity states that there is no correlation between the variables. Interpretation of the KMO value:

    < .50         Unacceptable
    .50 to .60    Poor
    .60 to .70    Fair
    .70 to .80    Average
    .80 to .90    Good
    .90 to 1      Very Good

We then select the factor extraction method.

Correlation matrix: for variables on different scales.
Covariance matrix: for multiple groups with different variances for each variable.

In the same dialog box we can also set the Rotation, which is applied to transform the coefficients of the principal components into a simplified structure. The methods are:

Varimax: a few significant loadings and the others close to zero.
Quartimax: high loadings on a few components and near zero on the others.
Equamax: combination of Varimax and Quartimax.
Direct Oblimin and Promax: non-orthogonal (oblique) methods; there is no assumption of independence of the components.

The method for calculating the factor scores is defined in Scores. In Options we can choose, for example, how missing values will be treated.

SPSS offers several hypothesis tests: parametric tests such as the t test and ANOVA, and nonparametric tests such as the Sign test, McNemar, Wilcoxon, Mann-Whitney, Kruskal-Wallis, Randomness (Runs), Binomial, and Chi-square.

The t test can be done via the command:

    Analyze > Compare Means > Independent-Samples T Test

The groups of the variable are defined in "Define Groups". These values correspond to the codes used in the variable; in this case, for "sex", 1 = male and 2 = female.

The output is shown below:

Group Statistics
                                   Respondent's Sex   N     Mean    Std. Deviation   Std. Error Mean
Highest Year of School Completed   Male               633   13.23   3.143            .125
                                   Female             877   12.63   2.839            .096

Independent Samples Test - Highest Year of School Completed
                                                Equal variances   Equal variances
                                                assumed           not assumed
Levene's Test for Equality of Variances
    F                                           11.226
    Sig.                                        .001
t-test for Equality of Means
    t                                           3.887             3.824
    df                                          1508              1276.454
    Sig. (2-tailed)                             .000              .000
    Mean Difference                             .602              .602
    Std. Error Difference                       .155              .157
    95% Confidence Interval of the Difference
        Lower                                   .298              .293
        Upper                                   .906              .911

Levene's test: test of equality of variances; H0: equal variances. Here Sig. = .001, so at α = 0.05 we reject equal variances and read the "Equal variances not assumed" column.
Sig. (2-tailed): probability of observing a mean difference of this size if H0 is true.
Mean Difference: the average years of schooling of the two samples (male and female) differed by 0.602 years.

ANOVA can be done through the command:

    Analyze > Compare Means > One-Way ANOVA

In Options we obtain summaries of the data. In Post Hoc we obtain the Bonferroni multiple comparison test.
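The same one-way ANOVA, with data summaries and the Bonferroni comparisons, can be written as syntax. The sketch below assumes educ as the dependent variable and a hypothetical grouping variable named race with three or more groups; the handout does not say which factor was used, so substitute your own.

```spss
* One-way ANOVA of educ across the groups of a factor.
* race is an assumed factor name, used only for illustration.
* DESCRIPTIVES gives the data summaries; POSTHOC requests Bonferroni.
ONEWAY educ BY race
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /MISSING ANALYSIS
  /POSTHOC=BONFERRONI ALPHA(0.05).
```

HOMOGENEITY adds Levene's test of equal variances, the same assumption check described above for the t test.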
For nonparametric tests, proceed as follows:

    Analyze > Nonparametric Tests

There we find, in this order: Chi-square, Binomial, Randomness (Runs), Kolmogorov-Smirnov, tests for two independent samples, tests for two related samples, and Kruskal-Wallis and Median (K Independent Samples).

To do Cluster Analysis, use the following command:

    Analyze > Classify > Hierarchical Cluster

To put the variables on the same scale, we standardize them using the transformation method found in the dialog box.

For dendrograms

6. Bibliography

CAZORLA, Irene M. Course packages. UESC. Ilheus. Aug 2003.
FERREIRA, Armando M. SPSS - Instruction Guide. Agrarian School of Castelo Branco. 1999.
PEREIRA, Alexandre. Practical Guide to Using SPSS. Data Analysis for Social Sciences and Psychology. 4th ed. Silabo editions. Lisbon. Mar 2003.
SANTANA, Cora. LISBON, Grace. Basic Guide to SPSS for Windows. CPD / UFBA.
SPSS Inc. Statistical Analysis Using SPSS. Chicago. 2001.
WIKIPEDIA. SPSS. Available at: <http://pt.wikipedia.org/wiki/SPSS>.