Table of Contents
1. Outline: SPSS Workshop 2014
2. What is SPSS?
3. Introducing the SPSS interface
3.1. SPSS Data Editor: Data View
3.2. SPSS Data Editor: Variable View
3.3. SPSS Output window
3.4. SPSS Syntax window
4. Getting familiar with SPSS Menu and Icon
5. Data Import/Export
5.1. Create Data File (Entering Data)
5.2. Opening Data File (Import data)
5.2.1. Opening SPSS data: File > Open > Data… (Select SPSS statistics (*.sav) as File of type:)
5.2.2. Opening Text File: Fixed width
5.2.3. Opening Text File: (Tab) Delimited
5.2.4. Opening EXCEL (or CSV) File
5.2.5. Opening SAS data file
5.3. Export Data File (Save as different type of data)
5.4. Saving Data File with selected variables
6. Manipulating data1 (SPSS Menu: Data)
6.1. Data Menu: Sort Cases…
6.2. Data Menu: Identify Duplicate Cases…
6.3. Data Menu: Merge Files > Add Cases…
6.4. Data Menu: Merge Files > Add Variables…
6.5. Data Menu: Aggregate…
6.6. Data Menu: Restructure…
6.7. Data Menu: Split into Files
6.8. Data Menu: Split Files…
6.9. Data Menu: Select Cases…
6.10. Data Menu: Weight Cases…
7. Manipulating data2 (SPSS Menu: Transform)
7.1. Transform Menu: Compute Variable…
7.2. Transform Menu: Recode into Same Variables…
7.3. Transform Menu: Recode into Different Variables…
7.4. Transform Menu: Automatic Recode…
7.5. Transform Menu: Create Dummy Variables
7.6. Transform Menu: Visual Binning…
7.7. Transform Menu: Rank Cases…
7.8. Transform Menu: Date and time Wizard…
7.9. Transform Menu: Replace missing values…
8. Descriptive statistics
8.1. Descriptive statistics for continuous data (Interval, Ratio)
8.2. Descriptive statistics for categorical data (Nominal, Ordinal)
8.3. Generating graphs (or charts) for continuous data (Interval, Ratio)
8.4. Generating graphs (or charts) for categorical data (Nominal, Ordinal)
8.5. Using Chart Builder
WCHRI, University of Alberta
2 SPSS Workshop 2014 Tutorial
Note that this tutorial was created using IBM SPSS Statistics Version 22.
2. What is SPSS?
• Windows-based program that can be used to perform data entry and analysis and to create tables and graphs.
• Capable of handling large amounts of data and can perform all of the analyses covered in the text and much more.
• Commonly used in the Social Sciences and in the business world.
• SPSS is updated often.
Many of the features of Data View are similar to the features that are found in spreadsheet applications. There are,
however, several important distinctions:
• Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a
questionnaire is a case.
• Columns are variables. Each column represents a variable or characteristic that is being measured. For example, each
item on a questionnaire is a variable.
• Cells contain values. Each cell contains a single value of a variable for a case. The cell is where the case and the variable
intersect. Cells contain only data values. Unlike spreadsheet programs, cells in the Data Editor cannot contain formulas.
• The data file is rectangular. The dimensions of the data file are determined by the number of cases and variables. You
can enter data in any cell. If you enter data in a cell outside the boundaries of the defined data file, the data rectangle
is extended to include any rows and/or columns between that cell and the file boundaries. There are no "empty" cells
within the boundaries of the data file. For numeric variables, blank cells are converted to the system-missing value. For
string variables, a blank is considered a valid value.
Variable View contains descriptions of the attributes of each variable in the data file. In Variable View:
• Rows are variables.
• Columns are variable attributes.
You can add or delete variables and modify attributes of variables, including the following attributes:
• Variable name
• Data type
• Number of digits or characters
• Number of decimal places
• Descriptive variable and value labels
• User-defined missing values
• Column width
• Measurement level
All of these attributes are saved when you save the data file.
In addition to defining variable properties in Variable View, there are two other methods for defining variable properties:
• The Copy Data Properties Wizard provides the ability to use an external IBM® SPSS® Statistics data file or another
dataset that is available in the current session as a template for defining file and variable properties in the active
dataset. You can also use variables in the active dataset as templates for other variables in the active dataset. Copy
Data Properties is available on the Data menu in the Data Editor window. See the topic Copying Data Properties for
more information.
• Define Variable Properties (also available on the Data menu in the Data Editor window) scans your data and lists all
unique data values for any selected variables, identifies unlabeled values, and provides an auto-label feature. This
method is particularly useful for categorical variables that use numeric codes to represent categories--for example, 0 =
Male, 1 = Female. See the topic Defining Variable Properties for more information.
SPSS MENU
• File includes all of the options you typically use in other programs, such as open, save, and exit. Notice that you can
open or create new files of multiple types, as illustrated to the right.
• Edit includes the typical cut, copy, and paste commands, and allows you to specify various options for displaying data
and output.
o Click on Options, and you will see the dialog box to the left. You can use this to format the data, output, charts,
etc. These choices are rather overwhelming, and you can simply take the default options for now. The author
of your text (me) was too dumb to even know these options could easily be set.
• View allows you to select which toolbars you want to show, select font size, add or remove the gridlines that separate
each piece of data, and to select whether or not to display your raw data or the data labels.
• Data allows you to select several options ranging from displaying data that is sorted by a specific variable to selecting
certain cases for subsequent analyses.
• Transform includes several options to change current variables. For example, you can change continuous variables to
categorical variables, change scores into rank scores, add a constant to variables, etc.
• Analyze includes all of the commands to carry out statistical analyses and to calculate descriptive statistics. Much of
this book will focus on using commands located in this menu.
• Graphs includes the commands to create various types of graphs including box plots, histograms, line graphs, and bar
charts.
• Utilities allows you to list file information, which is a list of all variables with their labels, values, locations in
the data file, and types.
• Add-ons are programs that can be added to the base SPSS package. You probably do not have access to any of those.
• Window can be used to select which window you want to view (i.e., Data Editor, Output Viewer, or Syntax). Since we
have a data file and an output file open, let’s try this.
o Select Window/Data Editor. Then select Window/SPSS Viewer.
• Help has many useful options including a link to the SPSS homepage, a statistics coach, and a syntax guide. Using topics,
you can use the index option to type in any key word and get a list of options, or you can view the categories and
subcategories available under contents. This is an excellent tool and can be used to troubleshoot most problems.
SPSS ICON
• The Icons directly under the Menu bar provide shortcuts to many common commands that are available in specific
menus. Take a moment to review these as well.
STATUS Bar
The status bar at the bottom of each IBM® SPSS® Statistics window provides the following information:
• Command status. For each procedure or command that you run, a case counter indicates the number of cases
processed so far. For statistical procedures that require iterative processing, the number of iterations is displayed.
• Filter status. If you have selected a random sample or a subset of cases for analysis, the message Filter on indicates
that some type of case filtering is currently in effect and not all cases in the data file are included in the analysis.
• Weight status. The message Weight on indicates that a weight variable is being used to weight cases for analysis.
• Split File status. The message Split File on indicates that the data file has been split into separate groups for analysis,
based on the values of one or more grouping variables.
5. Data Import/Export
5.1. Create Data File (Entering Data)
Variable names
The following rules apply to variable names:
Note: Letters include any non-punctuation characters used in writing ordinary words in the languages supported in the
platform's character set.
Variable type
Variable Type specifies the data type for each variable. By default, all new variables are assumed to be numeric. You can use
Variable Type to change the data type. The contents of the Variable Type dialog box depend on the selected data type. For
some data types, there are text boxes for width and number of decimals; for other data types, you can simply select a
format from a scrollable list of examples. The available data types are as follows:
• Numeric. A variable whose values are numbers. Values are displayed in standard numeric format. The Data Editor
accepts numeric values in standard format or in scientific notation.
• Comma. A numeric variable whose values are displayed with commas delimiting every three places and displayed with
the period as a decimal delimiter. The Data Editor accepts numeric values for comma variables with or without commas
or in scientific notation. Values cannot contain commas to the right of the decimal indicator.
• Dot. A numeric variable whose values are displayed with periods delimiting every three places and with the comma as
a decimal delimiter. The Data Editor accepts numeric values for dot variables with or without periods or in scientific
notation. Values cannot contain periods to the right of the decimal indicator.
• Scientific notation. A numeric variable whose values are displayed with an embedded E and a signed power-of-10
exponent. The Data Editor accepts numeric values for such variables with or without an exponent. The exponent can be
preceded by E or D with an optional sign or by the sign alone--for example, 123, 1.23E2, 1.23D2, 1.23E+2, and 1.23+2.
• Date. A numeric variable whose values are displayed in one of several calendar-date or clock-time formats. Select a
format from the list. You can enter dates with slashes, hyphens, periods, commas, or blank spaces as delimiters. The
century range for two-digit year values is determined by your Options settings (from the Edit menu, choose Options,
and then click the Data tab).
• Dollar. A numeric variable displayed with a leading dollar sign ($), commas delimiting every three places, and a period
as the decimal delimiter. You can enter data values with or without the leading dollar sign.
• Custom currency. A numeric variable whose values are displayed in one of the custom currency formats that you have
defined on the Currency tab of the Options dialog box. Defined custom currency characters cannot be used in data
entry but are displayed in the Data Editor.
• String. A variable whose values are not numeric and therefore are not used in calculations. The values can contain any
characters up to the defined length. Uppercase and lowercase letters are considered distinct. This type is also known as
an alphanumeric variable.
• Restricted numeric. A variable whose values are restricted to non-negative integers. Values are displayed with leading
zeros padded to the maximum width of the variable. Values can be entered in scientific notation.
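As a rough illustration outside SPSS, the Comma, Dot, and Scientific notation display formats described above can be mimicked in a few lines of Python. This is only a sketch: the helper names and the sample value are assumptions made for illustration, and SPSS applies these formats internally.

```python
def comma_format(value: float, decimals: int = 2) -> str:
    """Commas every three places, period as the decimal delimiter."""
    return f"{value:,.{decimals}f}"


def dot_format(value: float, decimals: int = 2) -> str:
    """Periods every three places, comma as the decimal delimiter."""
    swapped = comma_format(value, decimals)
    # Swap the two delimiters via a temporary placeholder character.
    return swapped.replace(",", "\0").replace(".", ",").replace("\0", ".")


def scientific_format(value: float, decimals: int = 2) -> str:
    """Embedded E followed by a signed power-of-10 exponent."""
    return f"{value:.{decimals}E}"


sample = 1234567.891  # hypothetical value
print(comma_format(sample))       # 1,234,567.89
print(dot_format(sample))         # 1.234.567,89
print(scientific_format(123.0))   # 1.23E+02
```

In every case the stored value is the same number; only the way it is displayed changes.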
Measure of Variables
• A nominal variable has two or more categories, but there is no intrinsic ordering to the categories.
e.g., gender, ethnicity etc.
• An ordinal variable is like a nominal variable but has a clear ordering of the categories; the spacing between the
values may not be the same.
e.g., socio-economic status, severity of disease etc.
• An interval variable is like an ordinal variable, except that the intervals between values are equally spaced.
e.g., height, weight, age etc.
Missing values
• If you do not enter any data in a field, the value is treated as missing and SPSS displays a period in the cell.
• Alternatively, you can define specific values as missing values.
• Entering data in SPSS (variable name, define value labels, and define missing values)
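The difference between a blank cell and a user-defined missing value can be sketched in Python as follows. The code -99, the variable name, and the data are hypothetical; the point is only that both kinds of missing value are left out of a calculation such as the mean.

```python
from statistics import mean

SYSTEM_MISSING = None   # stands in for the period shown in a blank numeric cell
USER_MISSING = {-99}    # hypothetical code declared as a user-defined missing value


def valid_values(values):
    """Keep only values that are neither system-missing nor user-missing."""
    return [v for v in values if v is not SYSTEM_MISSING and v not in USER_MISSING]


ages = [21, None, 35, -99, 28]   # hypothetical data
valid = valid_values(ages)
print(len(valid), f"{mean(valid):.1f}")   # 3 28.0
```

If -99 were not declared as missing, it would be averaged in as a real age and distort the result.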
5.2.1. Opening SPSS data: File > Open>Data… (Select SPSS statistics (*.sav) as File of type :)
5.2.2. Opening Text File: Fixed width
o Raw data
o Open in SPSS
o File > Read Text Data…
o File > Open>Data…
5.2.3. Opening Text File: (Tab) Delimited
o Open in SPSS
o File > Read Text Data…
o File > Open > Data…
5.2.4. Opening EXCEL (or CSV) File
o Open in SPSS
o File > Open > Data…
Select EXCEL as “Files of type:” to open an EXCEL file
Select TEXT as “Files of type:” to open a CSV file
o Or simply drag the EXCEL (or CSV) file onto the SPSS program
5.2.5. Opening SAS data file
o Open in SPSS
o File > Open > Data…
Select SAS as “Files of type:” to open a SAS data file
o Or simply drag the SAS data file onto the SPSS program
o SPSS output
                          Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Duplicate Case            1       3.2             3.2                  3.2
        Primary Case             30      96.8            96.8                100.0
        Total                    31     100.0           100.0
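The logic behind this table can be sketched in Python. The case IDs below are hypothetical (30 distinct IDs plus one repeat, to match the 31 rows in the output), and the sketch arbitrarily treats the first occurrence as the primary case; which case SPSS marks as primary depends on the dialog settings.

```python
from collections import Counter


def flag_duplicates(case_ids):
    """Label the first occurrence of each ID as primary and later ones as duplicate."""
    seen = set()
    flags = []
    for case_id in case_ids:
        flags.append("Duplicate Case" if case_id in seen else "Primary Case")
        seen.add(case_id)
    return flags


# Hypothetical IDs: 30 distinct cases plus one repeated ID, 31 rows in total.
ids = list(range(1, 31)) + [7]
counts = Counter(flag_duplicates(ids))
for label in ("Duplicate Case", "Primary Case"):
    percent = 100 * counts[label] / len(ids)
    print(f"{label}: {counts[label]} ({percent:.1f}%)")
# Duplicate Case: 1 (3.2%)
# Primary Case: 30 (96.8%)
```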
o Both SPSS datasets above have the same variables with the same names, but different cases (one holds the data for
females, the other for males).
o Note that both SPSS datasets above must have a key variable (unique identifier).
o Note that both SPSS datasets above must be sorted by the key variable before merging the files.
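Why the sort matters can be seen in a minimal Python sketch of a match-merge: a single pass that walks both files in step is only correct when both are ordered by the key. The file contents, the key name "id", and the function name are assumptions for illustration, and keys are assumed to be unique within each file.

```python
def match_merge(left, right, key="id"):
    """Merge two lists of dict records that are BOTH sorted by `key`."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] == right[j][key]:
            merged.append({**left[i], **right[j]})   # same case: combine variables
            i += 1
            j += 1
        elif left[i][key] < right[j][key]:
            merged.append(dict(left[i]))             # case only in the left file
            i += 1
        else:
            merged.append(dict(right[j]))            # case only in the right file
            j += 1
    merged.extend(dict(r) for r in left[i:])
    merged.extend(dict(r) for r in right[j:])
    return merged


# Hypothetical files sharing the key variable "id".
demographics = [{"id": 3, "age": 25}, {"id": 1, "age": 19}, {"id": 2, "age": 31}]
measurements = [{"id": 2, "weight": 150}, {"id": 1, "weight": 120}]

# Sort by the key first; the single pass above is only correct on sorted input.
demographics.sort(key=lambda r: r["id"])
measurements.sort(key=lambda r: r["id"])
for row in match_merge(demographics, measurements):
    print(row)
```

Case 3 has no match in the second file, so it keeps only its own variables.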
o Using the “Restructure” menu in SPSS (Steps 1-7), we can convert wide format data into long format (or vice versa).
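What the wide-to-long conversion does to the rows can be sketched in Python. The variable names (score1 to score3, time, score) and the data are hypothetical; the sketch shows only the reshaping idea, not the Restructure wizard itself.

```python
def wide_to_long(rows, id_var, value_vars, index_name="time", value_name="score"):
    """Turn one row per case into one row per case-and-measurement."""
    long_rows = []
    for row in rows:
        for index, var in enumerate(value_vars, start=1):
            long_rows.append(
                {id_var: row[id_var], index_name: index, value_name: row[var]}
            )
    return long_rows


# Hypothetical wide data: one row per case, three repeated measurements.
wide = [
    {"id": 1, "score1": 10, "score2": 12, "score3": 15},
    {"id": 2, "score1": 8, "score2": 9, "score3": 11},
]
long = wide_to_long(wide, "id", ["score1", "score2", "score3"])
print(len(long))   # 6
print(long[0])     # {'id': 1, 'time': 1, 'score': 10}
```

Two cases with three measurements each become six rows, one per case and time point.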
o If you run the steps above, you can see “Split by Gender” in the status bar.
o SPSS output before splitting files by Gender.
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Age 30 12 31 21.60 6.360
Weight 30 100 200 147.13 34.309
Height 30 121 188 144.00 17.388
Valid N (listwise) 30
o To analyze all cases together again, select “Analyze all cases, do not create groups” in the “Split Files…” menu (see Figure 30).
o In Figure 32, the female data were selected, so all of the male data will be excluded from the analysis.
o If you want to use all cases again, select “All cases” in the “Select Cases…” menu in Figure 31.
x * y Crosstabulation
Count
                 y
              1      2    Total
x       1    15     20       35
        2    25     35       60
Total        40     55       95
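Assuming this crosstab comes from the Weight Cases example, the idea can be sketched in Python: the data file holds one row per cell, and a count variable tells the procedure how many cases each row stands for. The four-row layout and the name "count" are assumptions; only the cell counts are taken from the output above.

```python
from collections import defaultdict

# One row per cell of the table; "count" plays the role of the weight variable.
rows = [
    {"x": 1, "y": 1, "count": 15},
    {"x": 1, "y": 2, "count": 20},
    {"x": 2, "y": 1, "count": 25},
    {"x": 2, "y": 2, "count": 35},
]


def weighted_crosstab(records, weight="count"):
    """Sum the weight variable within each (x, y) cell."""
    table = defaultdict(int)
    for r in records:
        table[(r["x"], r["y"])] += r[weight]
    return dict(table)


table = weighted_crosstab(rows)
row_totals = {x: sum(n for (xi, _), n in table.items() if xi == x) for x in (1, 2)}
col_totals = {y: sum(n for (_, yi), n in table.items() if yi == y) for y in (1, 2)}
print(row_totals, col_totals, sum(table.values()))
# {1: 35, 2: 60} {1: 40, 2: 55} 95
```

Without weighting, the same four rows would be counted as four cases rather than 95.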
o From the example dataset, we want to generate the same categorized age variable (<20 years, 20-29 years, >=30 years) as
in Section 7.4.
o Make Cutpoints…
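The three age groups can be expressed as two cutpoints. The Python sketch below is an illustration of the binning rule only (the names are assumptions, and it does not reproduce the Visual Binning dialog); it treats 20 and 30 as the lower bounds of the second and third groups.

```python
from bisect import bisect_right

CUTPOINTS = [20, 30]   # lower bounds of the 2nd and 3rd bins
LABELS = ["< 20 years", "20-29 years", ">= 30 years"]


def age_group(age):
    """Return the label of the bin that contains `age`."""
    return LABELS[bisect_right(CUTPOINTS, age)]


for age in (12, 19, 20, 29, 30, 31):
    print(age, age_group(age))
```

Checking the boundary values 19, 20, 29, and 30 is a quick way to confirm that each cutpoint falls in the intended group.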
o In the example dataset, the “Date” variable is a string variable. Let’s generate a date type variable from the string (or text).
8. Descriptive statistics
• With the dataset specified and labeled, it is ready for analysis.
• The first step before conducting the analysis is to present descriptive statistics for each of the
variables in the study.
• The descriptive statistics presented here are frequency distributions, measures of central tendency, comparisons of
means between groups etc.
o SPSS output:
Descriptive Statistics
                                                                            Skewness                 Kurtosis
                      N    Minimum     Mean    Std. Deviation   Variance    Statistic   Std. Error   Statistic   Std. Error
Age                  30         12    21.60             6.360     40.455         .067         .427      -1.380         .833
Weight               30        100   147.13            34.309   1177.085         .087         .427      -1.414         .833
Height               30        121   144.00            17.388    302.345         .872         .427        .358         .833
Valid N (listwise)   30
Descriptive Statistics
                                                                                     Skewness                 Kurtosis
Gender                         N    Minimum     Mean    Std. Deviation   Variance    Statistic   Std. Error   Statistic   Std. Error
F   Age                       18         12    20.06             5.955     35.467         .527         .536       -.805        1.038
    Weight                    18        100   142.61            36.934   1364.134         .314         .536      -1.596        1.038
    Height                    18        121   143.94            18.479    341.467        1.119         .536        .902        1.038
    Valid N (listwise)        18
M   Age                       12         12    23.92             6.487     42.083        -.688         .637       -.852        1.232
    Weight                    12        111   153.92            30.189    911.356        -.141         .637       -.674        1.232
    Height                    12        122   144.08            16.412    269.356         .435         .637       -.338        1.232
    Valid N (listwise)        12
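The Std. Error columns for skewness and kurtosis depend only on the number of cases, which is why every variable in a group shows the same value. A short Python sketch using the usual large-sample formulas (assumed here, since the tutorial does not state them) reproduces the figures for N = 30, 18, and 12:

```python
from math import sqrt


def se_skewness(n):
    """Standard error of sample skewness for n cases."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample (excess) kurtosis for n cases."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))


for n in (30, 18, 12):
    print(n, round(se_skewness(n), 3), round(se_kurtosis(n), 3))
# 30 0.427 0.833
# 18 0.536 1.038
# 12 0.637 1.232
```

A common rule of thumb is to compare a skewness or kurtosis statistic with about twice its standard error when judging departure from symmetry or normal tails.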
o SPSS output:
(Note that to run the analysis by a group variable, add the group variable to “Factor List:” in the dialog.)
Descriptives
                                                          Statistic   Std. Error
Height   Mean                                                144.00        3.175
         95% Confidence Interval for Mean   Lower Bound      137.51
                                            Upper Bound      150.49
         5% Trimmed Mean                                     142.96
         Median                                              144.00
         Variance                                           302.345
         Std. Deviation                                      17.388
         Minimum                                                121
         Maximum                                                188
         Range                                                   67
         Interquartile Range                                     24
         Skewness                                              .872         .427
         Kurtosis                                              .358         .833
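The standard error of the mean and the 95% confidence interval in this table follow directly from N, the mean, and the standard deviation. The Python sketch below recomputes them; the critical value 2.0452 is the two-sided 95% t value for 29 degrees of freedom, taken from a t table rather than computed.

```python
from math import sqrt

n, mean, sd = 30, 144.00, 17.388   # taken from the Descriptives output
T_CRIT = 2.0452                    # two-sided 95% t critical value for df = 29

se_mean = sd / sqrt(n)
lower = mean - T_CRIT * se_mean
upper = mean + T_CRIT * se_mean
print(round(se_mean, 3), round(lower, 2), round(upper, 2))   # 3.175 137.51 150.49
```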
Tests of Normality
           Kolmogorov-Smirnov(a)             Shapiro-Wilk
           Statistic    df    Sig.           Statistic    df    Sig.
Height          .111    30    .200(*)             .927    30    .041
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
 Frequency    Stem &  Leaf
     6.00       12 .  124579
     8.00       13 .  00013568
     6.00       14 .  355679
     5.00       15 .  02455
     2.00       16 .  03
     1.00       17 .  7
     2.00       18 .  08
 Stem width:  10
 Each leaf:   1 case(s)
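Because each leaf is one case and the stem width is 10, the display can be read back into the original values. The stems 12 to 18 match the Height range of 121 to 188, so the display appears to be for Height; under that assumption, a short Python sketch recovers the 30 values and checks them against the Descriptives output.

```python
from statistics import mean, median, variance

# Stem -> leaves, copied from the display (stem width 10, each leaf one case).
STEM_LEAF = {
    12: "124579",
    13: "00013568",
    14: "355679",
    15: "02455",
    16: "03",
    17: "7",
    18: "08",
}
STEM_WIDTH = 10

values = [
    stem * STEM_WIDTH + int(leaf)
    for stem, leaves in STEM_LEAF.items()
    for leaf in leaves
]
print(len(values), min(values), max(values))   # 30 121 188
print(mean(values), median(values), round(variance(values), 3))
```

The recovered mean (144), median (144), and variance (302.345) agree with the reported statistics, which supports reading the plot as Height.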
Report
Weight
Age (categorical variable) Gender N Mean Std. Deviation Median Minimum Maximum
< 20 years F 10 131.60 36.056 111.00 100 197
M 3 169.00 30.050 167.00 140 200
Total 13 140.23 37.343 140.00 100 200
20-29 years F 6 162.33 35.770 169.00 115 199
M 7 146.29 34.369 161.00 111 198
Total 13 153.69 34.541 161.00 111 199
>= 30 years F 2 138.50 38.891 138.50 111 166
M 2 158.00 2.828 158.00 156 160
Total 4 148.25 25.171 158.00 111 166
Total F 18 142.61 36.934 133.00 100 199
M 12 153.92 30.189 160.50 111 200
Total 30 147.13 34.309 158.00 100 200
o SPSS output:
Statistics
Weight Height
N Valid 30 30
Missing 0 0
Mean 147.13 144.00
Median 158.00 144.00
Percentiles 10 102.80 124.10
25 111.00 130.00
30 111.90 130.30
50 158.00 144.00
70 166.70 151.40
75 169.75 154.25
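The Height percentiles can be reproduced with a weighted-average rule that interpolates at position (n + 1) × p / 100 in the sorted data. The Python sketch below assumes that rule and uses the 30 Height values as read off the stem-and-leaf display earlier in this section (itself an assumption that the display is for Height).

```python
def percentile(sorted_values, p):
    """Weighted-average percentile at position (n + 1) * p / 100."""
    n = len(sorted_values)
    position = (n + 1) * p / 100
    k = int(position)          # whole part: 1-based index of the lower neighbour
    fraction = position - k
    if k < 1:
        return sorted_values[0]
    if k >= n:
        return sorted_values[-1]
    lower, upper = sorted_values[k - 1], sorted_values[k]
    return lower + fraction * (upper - lower)


# Height values read off the stem-and-leaf display (assumed to be Height).
heights = sorted([
    121, 122, 124, 125, 127, 129, 130, 130, 130, 131,
    133, 135, 136, 138, 143, 145, 145, 146, 147, 149,
    150, 152, 154, 155, 155, 160, 163, 177, 180, 188,
])
for p in (10, 25, 30, 50, 70, 75):
    print(p, round(percentile(heights, p), 2))
# 10 124.1
# 25 130.0
# 30 130.3
# 50 144.0
# 70 151.4
# 75 154.25
```

Other software may use a different interpolation rule, so small differences in percentiles between packages are expected.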
o SPSS output:
Gender
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   F               18      60.0            60.0                 60.0
        M               12      40.0            40.0                100.0
        Total           30     100.0           100.0
Ethnicity
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   A               11      36.7            36.7                 36.7
        B                5      16.7            16.7                 53.3
        O                5      16.7            16.7                 70.0
        W                9      30.0            30.0                100.0
        Total           30     100.0           100.0
o SPSS output:
Gender * Ethnicity Crosstabulation
Ethnicity
A B O W Total
Gender F Count 6 3 3 6 18
% within Gender 33.3% 16.7% 16.7% 33.3% 100.0%
% within Ethnicity 54.5% 60.0% 60.0% 66.7% 60.0%
% of Total 20.0% 10.0% 10.0% 20.0% 60.0%
M Count 5 2 2 3 12
% within Gender 41.7% 16.7% 16.7% 25.0% 100.0%
% within Ethnicity 45.5% 40.0% 40.0% 33.3% 40.0%
% of Total 16.7% 6.7% 6.7% 10.0% 40.0%
Total Count 11 5 5 9 30
% within Gender 36.7% 16.7% 16.7% 30.0% 100.0%
% within Ethnicity 100.0% 100.0% 100.0% 100.0% 100.0%
% of Total 36.7% 16.7% 16.7% 30.0% 100.0%
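The three percentages reported for each cell differ only in the denominator: the row total, the column total, or the grand total. A short Python sketch using the counts from the table above makes this explicit (the function name is an assumption for illustration).

```python
# Observed counts from the Gender * Ethnicity table.
counts = {
    "F": {"A": 6, "B": 3, "O": 3, "W": 6},
    "M": {"A": 5, "B": 2, "O": 2, "W": 3},
}

row_totals = {g: sum(cells.values()) for g, cells in counts.items()}
col_totals = {e: sum(counts[g][e] for g in counts) for e in counts["F"]}
grand_total = sum(row_totals.values())


def cell_percentages(gender, ethnicity):
    """Return (% within Gender, % within Ethnicity, % of Total) for one cell."""
    n = counts[gender][ethnicity]
    return (
        round(100 * n / row_totals[gender], 1),
        round(100 * n / col_totals[ethnicity], 1),
        round(100 * n / grand_total, 1),
    )


print(cell_percentages("F", "A"))   # (33.3, 54.5, 20.0)
print(cell_percentages("M", "W"))   # (25.0, 33.3, 10.0)
```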
8.3. Generating graphs (or charts) for continuous data (Interval, Ratio)
- Histogram, Box-plot, Stem-and-Leaf plot
- Error bar chart, Scatter plot etc.
(Example dataset: DataExcel.sav)
8.4. Generating graphs (or charts) for categorical data (Nominal, Ordinal)
- Bar, Pie chart, Line, Area chart etc.
(Example dataset: DataExcel.sav)
• Bar chart: Graphs > Legacy Dialogs > Bar… Analyze > Descriptive Statistics>Explore…
• Example:
o SPSS output:
Group Statistics
GrazeType N Mean Std. Deviation Std. Error Mean
WeightGain continuous 16 75.19 33.812 8.453
controlled 16 83.13 30.535 7.634
o Interpretation: A test statistic for the equality of the group means is reported for both equal and unequal
variances. Both tests (the pooled test, equal variances assumed, and the Satterthwaite test, equal variances
not assumed) indicate a lack of evidence for a significant difference between grazing methods. The
equality of variances test does not indicate a significant difference in the two variances (Levene’s Test). This
test assumes that the observations in both groups are normally distributed.
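Both t statistics can be recomputed from the Group Statistics table alone. The Python sketch below uses the textbook pooled and Satterthwaite formulas (assumed here, not quoted from the tutorial); p-values are omitted because they would need a t distribution function that the standard library does not provide.

```python
from math import sqrt

# Summary statistics from the Group Statistics table.
n1, m1, s1 = 16, 75.19, 33.812   # continuous grazing
n2, m2, s2 = 16, 83.13, 30.535   # controlled grazing

# Pooled test (equal variances assumed).
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Satterthwaite test (equal variances not assumed).
v1, v2 = s1**2 / n1, s2**2 / n2
t_welch = (m1 - m2) / sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled:        t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Satterthwaite: t = {t_welch:.2f}, df = {df_welch:.1f}")
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, which is consistent with both tests leading to the same conclusion here.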
o SPSS output:
N Correlation Sig.
o Interpretation: The variables SBPbefore and SBPafter are the paired variables with a sample size of 12.
The summary statistics of the difference are displayed (mean, standard deviation, and standard
error) along with their confidence limits. The minimum and maximum differences are also displayed.
The test is not significant (t=-1.09, p=0.299), indicating that the stimuli did not significantly affect
systolic blood pressure.
o SPSS output:
Chi-Square Tests
                                   Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square                 20.925(a)    8                    .007
Likelihood Ratio                   25.973       8                    .001
Linear-by-Linear Association        3.229       1                    .072
N of Valid Cases                      762
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.75.
o Interpretation: The SPSS output displays the chi-square statistics. The alternative hypothesis for this analysis
states that eye color is associated with hair color. With p-value=0.007, the alternative hypothesis is supported
o SPSS output:
Chi-Square Tests
                                   Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square                 4.960(a)    1                    .026
Continuity Correction(b)           3.188       1                    .074
Likelihood Ratio                   5.098       1                    .024
Fisher's Exact Test                                                                        .039                   .037
Linear-by-Linear Association       4.744       1                    .029
N of Valid Cases                      23
a. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 3.48.
b. Computed only for a 2x2 table
Risk Estimate
                                                                95% Confidence Interval
                                                      Value       Lower       Upper
Odds Ratio for Heart Disease
(Low Cholesterol Diet / High Cholesterol Diet)        8.250       1.154      59.003
For cohort Exposure = No                              3.900        .989      15.373
For cohort Exposure = Yes                              .473        .214       1.045
N of Valid Cases                                         23
o Interpretation: The SPSS output displays the chi-square statistics. Because the expected counts in some of the
table cells are small, the output gives a warning that the asymptotic chi-square tests might not be appropriate. In
this case, the exact tests are appropriate. The alternative hypothesis for this analysis states that coronary
heart disease is more likely to be associated with a high fat diet, so a one-sided test is desired. Fisher’s exact
right-sided test analyzes whether the probability of heart disease in the high fat group exceeds the probability
of heart disease in the low fat group; because this p-value is small, the alternative hypothesis is supported.
The odds ratio, displayed in the “Risk Estimate” table, provides an estimate of the relative risk when an event is
rare. This estimate indicates that the odds of heart disease are 8.25 times higher in the high fat diet group;
however, the wide confidence limits indicate that this estimate has low precision.
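The odds ratio, its confidence limits, and the one-sided Fisher p-value can be sketched in Python. The 2x2 cell counts are not shown in this section, so the counts below are an assumption: they were chosen because they are consistent with N = 23 and reproduce the reported odds ratio and limits. The Wald interval on the log odds ratio is likewise an assumed method.

```python
from math import comb, exp, log, sqrt

# ASSUMED 2x2 counts (not shown in this section), chosen because they
# reproduce the reported odds ratio, confidence limits and N = 23.
#                 heart disease: no, yes
low_no, low_yes = 6, 4      # low cholesterol diet
high_no, high_yes = 2, 11   # high cholesterol diet

Z_95 = 1.959964             # standard normal quantile for a 95% interval

odds_ratio = (low_no * high_yes) / (low_yes * high_no)
se_log_or = sqrt(1 / low_no + 1 / low_yes + 1 / high_no + 1 / high_yes)
ci_low = exp(log(odds_ratio) - Z_95 * se_log_or)
ci_high = exp(log(odds_ratio) + Z_95 * se_log_or)


def fisher_right_sided(a, b, c, d):
    """P(count in cell d >= observed) with all margins fixed (hypergeometric)."""
    row2, col2, total = c + d, b + d, a + b + c + d
    upper = min(row2, col2)
    tail = sum(
        comb(col2, k) * comb(total - col2, row2 - k) for k in range(d, upper + 1)
    )
    return tail / comb(total, row2)


p_one_sided = fisher_right_sided(low_no, low_yes, high_no, high_yes)
print(f"OR = {odds_ratio:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(f"Fisher exact, one-sided p = {p_one_sided:.3f}")
```

The interval spans roughly 1.2 to 59, which illustrates the low precision mentioned in the interpretation.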
o SPSS output:
Chi-Square Tests
Gender                                Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
female   Pearson Chi-Square           8.310(c)    1                    .004
         Continuity Correction(b)     6.759       1                    .009
         Likelihood Ratio             8.633       1                    .003
         Fisher's Exact Test                                                                  .005                   .004
         N of Valid Cases                52
male     Pearson Chi-Square           1.501(d)    1                    .221
         Continuity Correction(b)      .884       1                    .347
         Likelihood Ratio             1.515       1                    .218
         Fisher's Exact Test                                                                  .264                   .174
         N of Valid Cases                54
Total    Pearson Chi-Square           8.443(a)    1                    .004
         Continuity Correction(b)     7.318       1                    .007
         Likelihood Ratio             8.626       1                    .003
         Fisher's Exact Test                                                                  .005                   .003
         N of Valid Cases               106
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 19.25.
b. Computed only for a 2x2 table
c. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.10.
d. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.15.
Risk Estimate
                                                                   95% Confidence Interval
Gender                                                   Value       Lower       Upper
female   Odds Ratio for Treatment (Active / Placebo)     5.818       1.676      20.203
         For cohort Response = Better                    2.963       1.274       6.891
         For cohort Response = Same                       .509        .310        .836
         N of Valid Cases                                   52
male     Odds Ratio for Treatment (Active / Placebo)     2.036        .648       6.398
         For cohort Response = Better                    1.592        .741       3.418
         For cohort Response = Same                       .782        .526       1.163
         N of Valid Cases                                   54
Total    Odds Ratio for Treatment (Active / Placebo)     3.370       1.462       7.772
         For cohort Response = Better                    2.164       1.237       3.783
         For cohort Response = Same                       .642        .471        .875
         N of Valid Cases                                  106
o SPSS output:
First survey * Second survey Crosstabulation
Second survey
Approve Disapprove Total
First survey Approve Count 794 150 944
% within First survey 84.1% 15.9% 100.0%
Disapprove Count 86 570 656
% within First survey 13.1% 86.9% 100.0%
Total Count 880 720 1600
% within First survey 55.0% 45.0% 100.0%
Symmetric Measures
Value Asymp. Std. Errora Approx. Tb Approx. Sig.
Interval by Interval Pearson's R .702 .018 39.396 .000c
Ordinal by Ordinal Spearman Correlation .702 .018 39.396 .000c
N of Valid Cases 1600
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
o Interpretation: The SPSS output above displays the result of McNemar’s test for matched-pair data. We can see
that there is a large difference between the probabilities of approval of the prime minister’s performance at the
times of the two surveys (p-value < 0.001), i.e., we have strong evidence to support a drop in rating.
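McNemar’s test uses only the two discordant cells of the crosstabulation, that is, the respondents who changed their answer between surveys. The Python sketch below computes the uncorrected chi-square form of the statistic; SPSS may instead report an exact binomial or continuity-corrected version, so treat this as an approximation of the idea rather than a reproduction of the SPSS figure.

```python
from math import erfc, sqrt

# Discordant cells from the First survey * Second survey table.
approve_then_disapprove = 150
disapprove_then_approve = 86

diff = approve_then_disapprove - disapprove_then_approve
n_discordant = approve_then_disapprove + disapprove_then_approve

chi_square = diff**2 / n_discordant     # McNemar statistic, 1 degree of freedom
p_value = erfc(sqrt(chi_square / 2))    # upper tail of chi-square with df = 1

print(round(chi_square, 2), p_value < 0.001)   # 17.36 True

# Marginal approval rates at the two surveys (944 and 880 out of 1600).
print(round(100 * 944 / 1600, 1), round(100 * 880 / 1600, 1))   # 59.0 55.0
```

The approval rate falls from 59.0% to 55.0%, and the discordant counts show more people moving from approve to disapprove than the reverse.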
o SPSS output:
Symmetric Measures
Value Asymp. Std. Errora Approx. Tb Approx. Sig.
Measure of Agreement Kappa .345 .072 5.637 .000
N of Valid Cases 88
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
o Interpretation: From the SPSS output, the estimated Cohen’s Kappa is 0.345, which implies low agreement, although
the test that Kappa equals zero is significant (p-value < 0.001).
Retrospective studies
A retrospective study looks backwards and examines exposures to suspected risk or protection factors in relation
to an outcome that is established at the start of the study. Many valuable case-control studies, such as
Lane-Claypon's 1926 investigation of risk factors for breast cancer, were retrospective investigations. Most sources of
error due to confounding and bias are more common in retrospective studies than in prospective studies. For this
reason, retrospective investigations are often criticised. If the outcome of interest is uncommon, however, the size
of prospective investigation required to estimate relative risk is often too large to be feasible. In retrospective
studies the odds ratio provides an estimate of relative risk. You should take special care to avoid sources of bias
and confounding in retrospective studies.
Prospective investigation is required to make precise estimates of either the incidence of an outcome or the
relative risk of an outcome based on exposure.
Case-Control studies
Case-Control studies are usually but not exclusively retrospective; the opposite is true for cohort studies. The
following notes relate case-control to cohort studies:
• outcome is measured before exposure
• controls are selected on the basis of not having the outcome
• good for rare outcomes
• relatively inexpensive
• smaller numbers required
• quicker to complete
• prone to selection bias
• prone to recall/retrospective bias
• related methods are risk (retrospective), chi-square 2 by 2 test, Fisher's exact test, exact confidence
interval for odds ratio, odds ratio meta-analysis and conditional logistic regression.
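The odds ratio mentioned above can be sketched in a few lines of Python. The 2 by 2 counts are hypothetical, and the interval shown is the approximate (Woolf, log-based) confidence interval rather than the exact interval listed among the related methods; it is only meant to illustrate the calculation.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's log method) for a 2 by 2 table."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical case-control data: 40 of 100 cases and 20 of 100 controls were exposed.
odds_ratio, lower, upper = odds_ratio_with_ci(40, 60, 20, 80)
print(f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes 1 suggests an association between exposure and outcome; with small cell counts an exact method is preferable.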
Cohort studies
Cohort studies are usually but not exclusively prospective; the opposite is true for case-control studies. The
following notes relate cohort to case-control studies:
• outcome is measured after exposure
• yields true incidence rates and relative risks
• may uncover unanticipated associations with outcome
• best for common outcomes
• expensive
• requires large numbers
• takes a long time to complete
• prone to attrition bias (compensate by using person-time methods)
• prone to the bias of change in methods over time
• related methods are risk (prospective), relative risk meta-analysis, risk difference meta-analysis and
proportions
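Because a cohort study follows exposed and unexposed groups forward, incidence (risk) can be estimated directly in each group. The sketch below, in Python and with hypothetical counts, shows the relative risk and risk difference calculations referred to in the list above.

```python
def cohort_risks(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Return (risk in exposed, risk in unexposed, relative risk, risk difference)."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    relative_risk = risk_exposed / risk_unexposed
    risk_difference = risk_exposed - risk_unexposed
    return risk_exposed, risk_unexposed, relative_risk, risk_difference


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the outcome.
risk_exposed, risk_unexposed, relative_risk, risk_difference = cohort_risks(30, 100, 10, 100)
print(f"RR = {relative_risk:.2f}, risk difference = {risk_difference:.2f}")
```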
o SPSS output:
Descriptives
Nitrogen
95% Confidence Interval for Mean
N Mean Std. Deviation Std. Error Lower Bound Upper Bound Minimum Maximum
3DOK1 5 28.8200 5.80017 2.59392 21.6181 36.0219 19.40 33.00
3DOK13 5 13.2600 1.42759 .63844 11.4874 15.0326 11.60 14.40
3DOK4 5 14.6400 4.11619 1.84082 9.5291 19.7509 9.10 19.40
3DOK5 5 23.9800 3.77717 1.68920 19.2900 28.6700 17.70 27.90
3DOK7 5 19.9200 1.13004 .50537 18.5169 21.3231 18.60 21.00
COMPOS 5 18.7000 1.60156 .71624 16.7114 20.6886 16.90 20.80
Total 30 19.8867 6.24217 1.13966 17.5558 22.2175 9.10 33.00
ANOVA
Nitrogen
Sum of Squares df Mean Square F Sig.
Between Groups 847.047 5 169.409 14.371 .000
Within Groups 282.928 24 11.789
Total 1129.975 29
Multiple Comparisons
Dependent Variable: Nitrogen
Tukey HSD
Mean Difference 95% Confidence Interval
(I) Strain (J) Strain (I-J) Std. Error Sig. Lower Bound Upper Bound
3DOK1 3DOK13 15.56000* 2.17151 .000 8.8458 22.2742
3DOK4 14.18000* 2.17151 .000 7.4658 20.8942
3DOK5 4.84000 2.17151 .262 -1.8742 11.5542
3DOK7 8.90000* 2.17151 .005 2.1858 15.6142
COMPOS 10.12000* 2.17151 .001 3.4058 16.8342
3DOK13 3DOK1 -15.56000* 2.17151 .000 -22.2742 -8.8458
3DOK4 -1.38000 2.17151 .987 -8.0942 5.3342
3DOK5 -10.72000* 2.17151 .001 -17.4342 -4.0058
3DOK7 -6.66000 2.17151 .053 -13.3742 .0542
COMPOS -5.44000 2.17151 .162 -12.1542 1.2742
3DOK4 3DOK1 -14.18000* 2.17151 .000 -20.8942 -7.4658
3DOK13 1.38000 2.17151 .987 -5.3342 8.0942
3DOK5 -9.34000* 2.17151 .003 -16.0542 -2.6258
3DOK7 -5.28000 2.17151 .185 -11.9942 1.4342
COMPOS -4.06000 2.17151 .443 -10.7742 2.6542
3DOK5 3DOK1 -4.84000 2.17151 .262 -11.5542 1.8742
3DOK13 10.72000* 2.17151 .001 4.0058 17.4342
3DOK4 9.34000* 2.17151 .003 2.6258 16.0542
3DOK7 4.06000 2.17151 .443 -2.6542 10.7742
COMPOS 5.28000 2.17151 .185 -1.4342 11.9942
3DOK7 3DOK1 -8.90000* 2.17151 .005 -15.6142 -2.1858
3DOK13 6.66000 2.17151 .053 -.0542 13.3742
3DOK4 5.28000 2.17151 .185 -1.4342 11.9942
3DOK5 -4.06000 2.17151 .443 -10.7742 2.6542
COMPOS 1.22000 2.17151 .993 -5.4942 7.9342
COMPOS 3DOK1 -10.12000* 2.17151 .001 -16.8342 -3.4058
3DOK13 5.44000 2.17151 .162 -1.2742 12.1542
3DOK4 4.06000 2.17151 .443 -2.6542 10.7742
3DOK5 -5.28000 2.17151 .185 -11.9942 1.4342
3DOK7 -1.22000 2.17151 .993 -7.9342 5.4942
*. The mean difference is significant at the 0.05 level.
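The entries of the ANOVA table and the common standard error in the Tukey HSD table are linked by simple arithmetic. The Python sketch below re-derives them from the sums of squares printed above; it is a check on how the numbers fit together, not a replacement for the SPSS procedure.

```python
import math

# Values copied from the ANOVA table above.
ss_between, df_between = 847.047, 5
ss_within, df_within = 282.928, 24
n_per_group = 5  # each strain has N = 5 in the Descriptives table

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-group to within-group mean squares.
f_ratio = ms_between / ms_within

# Standard error of a difference between two group means (equal group sizes),
# which is the Std. Error column of the Multiple Comparisons table.
se_difference = math.sqrt(2 * ms_within / n_per_group)

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F = {f_ratio:.3f}, SE of mean difference = {se_difference:.5f}")
```

Since all groups have the same size, every pairwise comparison shares the same standard error, which is why the Std. Error column is constant.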
o SPSS output:
Multiple Comparisons
Dependent Variable: y
Tukey HSD
                      Mean                                   95% Confidence Interval
(I) drug   (J) drug   Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
1          2          .53                3.838        .999   -9.70         10.76
           3          17.32*             4.070        .001   6.47          28.17
           4          12.57*             3.777        .009   2.50          22.63
2          1          -.53               3.838        .999   -10.76        9.70
           3          16.78*             4.070        .001   5.93          27.63
           4          12.03*             3.777        .013   1.97          22.10
3          1          -17.32*            4.070        .001   -28.17        -6.47
           2          -16.78*            4.070        .001   -27.63        -5.93
           4          -4.75              4.013        .640   -15.45        5.95
4          1          -12.57*            3.777        .009   -22.63        -2.50
           2          -12.03*            3.777        .013   -22.10        -1.97
           3          4.75               4.013        .640   -5.95         15.45
Based on observed means.
The error term is Mean Square(Error) = 110.453.
*. The mean difference is significant at the .05 level.
Figure 62 ANCOVA
o SPSS output:
o Exercise (Goat.sav): Experiments were carried out on six commercial goat farms to determine whether the
standard worm drenching program was adequate. Forty goats were used in each experiment. Twenty of these,
chosen completely at random, were drenched according to the standard program, while the remaining twenty
were drenched more frequently. The goats were individually tagged, and weighed at the start and end of the
year-long study. For the first farm in the study the resulting liveweight gains are given along with the initial
liveweights. In each experiment the main interest was in the comparison of the liveweight gains between the
two treatments.
Assumptions of MANOVA
• One of the assumptions of MANOVA is that the response variables come from group populations that are multivariate
normally distributed. This implies that each dependent variable is normally distributed within each group, that any
linear combination of the dependent variables is normally distributed, and that all subsets of the variables are
multivariate normal. With respect to the Type I error rate, MANOVA tends to be robust to minor violations of the
multivariate normality assumption.
• Homogeneity of the population covariance matrices is another assumption. This implies that the
population variances and covariances of all dependent variables must be equal in all groups formed by the
independent variables. (This is not the same as sphericity, which concerns repeated-measures designs.)
• Small samples can have low power, but if the multivariate normality assumption is met, the MANOVA is generally more
powerful than separate univariate tests.
Figure 63 MANOVA
o SPSS output:
Descriptive Statistics
GROUP Mean Std. Deviation N
USEFUL Treatment 18.1182 3.90380 11
Control1 15.5273 2.07562 11
Control2 15.3455 3.13827 11
Total 16.3303 3.29246 33
DIFFICULTY Treatment 6.1909 1.89971 11
Control1 5.5818 2.43426 11
Control2 5.3727 1.75903 11
Total 5.7152 2.01760 33
IMPORTANCE Treatment 8.6818 4.86309 11
Control1 5.1091 2.53119 11
Control2 5.6364 3.54691 11
Total 6.4758 3.98513 33
Multivariate Tests(a)
Effect                           Value    F            Hypothesis df   Error df   Sig.
Intercept   Pillai's Trace       .986     657.857(b)   3.000           28.000     .000
            Wilks' Lambda        .014     657.857(b)   3.000           28.000     .000
            Hotelling's Trace    70.485   657.857(b)   3.000           28.000     .000
            Roy's Largest Root   70.485   657.857(b)   3.000           28.000     .000
GROUP       Pillai's Trace       .477     3.025        6.000           58.000     .012
            Wilks' Lambda        .526     3.538(b)     6.000           56.000     .005
            Hotelling's Trace    .897     4.038        6.000           54.000     .002
            Roy's Largest Root   .892     8.623(c)     3.000           29.000     .000
a. Design: Intercept + GROUP
b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.
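To show how a Wilks' Lambda value maps to the F statistic and degrees of freedom in the table, the Python sketch below applies Rao's F transformation to the GROUP row. The error degrees of freedom (30) are inferred here as N minus the number of groups (33 - 3), and Lambda is entered as rounded in the output, so the F value agrees only to about two decimals. This is a sketch of the standard textbook formula, not necessarily the exact routine SPSS runs.

```python
import math


def wilks_to_f(wilks_lambda, p, df_hypothesis, df_error):
    """Rao's F transformation of Wilks' Lambda.

    p = number of dependent variables.
    Returns (F, numerator df, denominator df).
    """
    q = df_hypothesis
    denom = p ** 2 + q ** 2 - 5
    # By convention t = 1 when the denominator is zero or negative.
    t = math.sqrt((p ** 2 * q ** 2 - 4) / denom) if denom > 0 else 1.0
    w = df_error + q - (p + q + 1) / 2
    df1 = p * q
    df2 = w * t - (p * q - 2) / 2
    root = wilks_lambda ** (1 / t)
    f_value = (1 - root) / root * df2 / df1
    return f_value, df1, df2


# GROUP effect: 3 dependent variables, 3 groups (hypothesis df = 2), N = 33.
f_value, df1, df2 = wilks_to_f(0.526, p=3, df_hypothesis=2, df_error=30)
print(f"F = {f_value:.3f} on {df1} and {df2:.0f} df")
```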
o Exercise (Pottery.sav): This example employs multivariate analysis of variance (MANOVA) to measure
differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data
are from Tubb, Parker, and Nickless (1980), as reported in Hand et al. (1994). For each of 26 samples of
pottery, the percentages of oxides of five metals are measured.
o SPSS output:
Ranks
             GrazeType    N    Mean Rank   Sum of Ranks
WeightGain   continuous   16   15.19       243.00
             controlled   16   17.81       285.00
             Total        32
Test Statistics(a)
                                 WeightGain
Mann-Whitney U                   107.000
Wilcoxon W                       243.000
Z                                -.792
Asymp. Sig. (2-tailed)           .429
Exact Sig. [2*(1-tailed Sig.)]   .445(b)
a. Grouping Variable: GrazeType
b. Not corrected for ties.
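The Mann-Whitney U, Z and asymptotic p-value above follow directly from the rank sums. The Python sketch below reproduces them from the Ranks table using the normal approximation without a correction for ties, which is an assumption of this sketch; SPSS adjusts the variance when ties are present, so small differences are possible in other data sets.

```python
import math

# Values copied from the Ranks table above.
n1, n2 = 16, 16          # continuous, controlled
rank_sum_1 = 243.0       # sum of ranks for the continuous group

# U for group 1, and the complementary U for group 2.
u1 = rank_sum_1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1
u = min(u1, u2)

# Normal approximation (no tie correction).
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mean_u) / sd_u

# Two-sided p-value from the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"U = {u:.0f}, Z = {z:.3f}, p = {p_two_sided:.3f}")
```

With p = .429 there is no evidence of a difference in weight gain between the two grazing types.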
o SPSS output:
Ranks
                                        N      Mean Rank   Sum of Ranks
SBPafter - SBPbefore   Negative Ranks   3(a)   8.17        24.50
                       Positive Ranks   9(b)   5.94        53.50
                       Ties             0(c)
                       Total            12
a. SBPafter < SBPbefore
b. SBPafter > SBPbefore
c. SBPafter = SBPbefore
Test Statistics(a)
                         SBPafter - SBPbefore
Z                        -1.143(b)
Asymp. Sig. (2-tailed)   .253
a. Wilcoxon Signed Ranks Test
b. Based on negative ranks.
o SPSS output:
Ranks
           Strain   N    Mean Rank
Nitrogen   3DOK1    5    26.00
           3DOK13   5    4.60
           3DOK4    5    8.00
           3DOK5    5    22.20
           3DOK7    5    17.60
           COMPOS   5    14.60
           Total    30
Test Statistics(a,b)
              Nitrogen
Chi-Square    21.659
df            5
Asymp. Sig.   .001
a. Kruskal Wallis Test
b. Grouping Variable: Strain
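The Kruskal-Wallis chi-square can be approximated from the mean ranks alone. The Python sketch below computes the H statistic without the correction for tied observations, which is why it gives about 21.64 rather than the 21.659 reported by SPSS; the tie correction cannot be reproduced here because the raw data are not shown.

```python
# Values copied from the Ranks table above.
group_sizes = [5, 5, 5, 5, 5, 5]
mean_ranks = [26.00, 4.60, 8.00, 22.20, 17.60, 14.60]

n_total = sum(group_sizes)

# Rank sum of each group = group size * mean rank.
rank_sums = [n * r for n, r in zip(group_sizes, mean_ranks)]

# H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), uncorrected for ties.
h_statistic = (12 / (n_total * (n_total + 1))
               * sum(r ** 2 / n for r, n in zip(rank_sums, group_sizes))
               - 3 * (n_total + 1))

# H is referred to a chi-square distribution with (number of groups - 1) df.
degrees_of_freedom = len(group_sizes) - 1

print(f"H = {h_statistic:.3f} on {degrees_of_freedom} df")
```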
o SPSS output:
Correlations
                                    cholesterol   age
cholesterol   Pearson Correlation   1             .688**
              Sig. (2-tailed)                     .000
              N                     30            30
age           Pearson Correlation   .688**        1
              Sig. (2-tailed)       .000
              N                     30            30
**. Correlation is significant at the 0.01 level (2-tailed).
Correlations
                                                         cholesterol   age
Spearman's rho   cholesterol   Correlation Coefficient   1.000         .749**
                               Sig. (2-tailed)           .             .000
                               N                         30            30
                 age           Correlation Coefficient   .749**        1.000
                               Sig. (2-tailed)           .000          .
                               N                         30            30
**. Correlation is significant at the 0.01 level (2-tailed).
o SPSS output:
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .725(a)   .526       .491                42.646
a. Predictors: (Constant), State1, age
b. Dependent Variable: cholesterol
ANOVA(a)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   54432.754        2    27216.377     14.965   .000(b)
    Residual     49103.913        27   1818.663
    Total        103536.667       29
a. Dependent Variable: cholesterol
b. Predictors: (Constant), State1, age
Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1   (Constant)   93.141     24.799                                         3.756    .001
    age          2.698      .496               .738                        5.440    .000
    State1       -28.651    16.541             -.235                       -1.732   .095
a. Dependent Variable: cholesterol
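The Model Summary figures can be recovered from the ANOVA table, and the Coefficients table defines a prediction equation. The Python sketch below shows both. It assumes State1 is a 0/1 indicator variable; the output does not say which state is coded 1, so the example prediction uses State1 = 0 and an arbitrary age of 50.

```python
import math

# Values copied from the regression ANOVA table above.
ss_regression, df_regression = 54432.754, 2
ss_residual, df_residual = 49103.913, 27
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# R Square: share of total variation explained by the model.
r_square = ss_regression / ss_total

# Adjusted R Square penalises for the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual

# F statistic and standard error of the estimate.
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)
std_error_estimate = math.sqrt(ss_residual / df_residual)


def predict_cholesterol(age, state1):
    """Fitted value from the unstandardized coefficients (state1 assumed 0 or 1)."""
    return 93.141 + 2.698 * age - 28.651 * state1


print(f"R2 = {r_square:.3f}, adjusted R2 = {adjusted_r_square:.3f}")
print(f"F = {f_ratio:.3f}, SE of estimate = {std_error_estimate:.3f}")
print(f"Predicted cholesterol (age 50, State1 = 0): {predict_cholesterol(50, 0):.1f}")
```

Note that age is significant (p < .001) while State1 is not at the 5% level (p = .095), so the State1 term in the prediction should be read with caution.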
o Exercise (BrainSize.sav): Are the size and weight of your brain indicators of your mental capacity? In this
study by Willerman et al. (1991) the researchers use Magnetic Resonance Imaging (MRI) to determine the
brain size of the subjects. The researchers take into account gender and body size to draw conclusions about
the connection between brain size and intelligence. Willerman et al. (1991) conducted their study at a large
southwestern university. They selected a sample of 40 right-handed Anglo introductory psychology students
who had indicated no history of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease. These
subjects were drawn from a larger pool of introductory psychology students with total Scholastic Aptitude Test
Scores higher than 1350 or lower than 940 who had agreed to satisfy a course requirement by allowing the
administration of four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the
Wechsler (1981) Adult Intelligence Scale-Revised. With prior approval of the University's research review
board, students selected for MRI were required to obtain prorated full-scale IQs of greater than 130 or less
than 103, and were equally divided by sex and IQ classification. The MRI Scans were performed at the same
facility for all 40 subjects. The scans consisted of 18 horizontal MR images. The computer counted all pixels
with non-zero gray scale in each of the 18 images and the total count served as an index for brain size.
Variable Information:
Gender: Male or Female
FSIQ: Full Scale IQ scores based on the four Wechsler (1981) subtests
VIQ: Verbal IQ scores based on the four Wechsler (1981) subtests
PIQ: Performance IQ scores based on the four Wechsler (1981) subtests
Weight: body weight in pounds
Height: height in inches
MRI_Count: total pixel Count from the 18 MRI scans
o SPSS output:
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      458.517(a)          .098                   .138
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.
Classification Table(a)
                                Predicted
                                ADMIT                     Percentage
Observed                        Not admitted   Admitted   Correct
Step 1   ADMIT   Not admitted   254            19         93.0
                 Admitted       97             30         23.6
         Overall Percentage                               71.0
a. The cut value is .500
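The percentages in the Classification Table are simple ratios of the four cell counts. The short Python sketch below recomputes them, which can help when reading such tables: the model classifies non-admitted applicants well but identifies few of the admitted ones.

```python
# Cell counts copied from the Classification Table above (cut value .500).
not_admitted_predicted_not = 254   # observed not admitted, predicted not admitted
not_admitted_predicted_yes = 19    # observed not admitted, predicted admitted
admitted_predicted_not = 97        # observed admitted, predicted not admitted
admitted_predicted_yes = 30        # observed admitted, predicted admitted

total = (not_admitted_predicted_not + not_admitted_predicted_yes
         + admitted_predicted_not + admitted_predicted_yes)

# Percentage correct within each observed category.
pct_correct_not_admitted = 100 * not_admitted_predicted_not / (
    not_admitted_predicted_not + not_admitted_predicted_yes)
pct_correct_admitted = 100 * admitted_predicted_yes / (
    admitted_predicted_not + admitted_predicted_yes)

# Overall percentage correct.
pct_correct_overall = 100 * (not_admitted_predicted_not + admitted_predicted_yes) / total

print(f"Not admitted: {pct_correct_not_admitted:.1f}%")
print(f"Admitted:     {pct_correct_admitted:.1f}%")
print(f"Overall:      {pct_correct_overall:.1f}%")
```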
o Interpretation: From the output, the GRE, GPA, and Rank variables are associated with the response variable
(admitted or not). In logistic regression, Exp(B) is useful for interpretation. For the GPA coefficient,
the odds ratio is computed by raising e to the power of the logistic coefficient: OR = e^b = e^0.804 =
2.235. This means that a one-unit increase in GPA multiplies the odds of admission by 2.235, with the other
predictors held fixed.
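The conversion from a logistic coefficient to an odds ratio is a single exponentiation, sketched below in Python. Using the rounded coefficient 0.804 gives roughly 2.234; the 2.235 quoted above presumably comes from SPSS exponentiating the unrounded coefficient. The half-unit change is an extra illustration, not part of the original output.

```python
import math

b_gpa = 0.804  # logistic regression coefficient for GPA, as quoted in the interpretation

# Odds ratio for a one-unit increase in GPA.
odds_ratio = math.exp(b_gpa)

# Odds ratio for a smaller change, e.g. half a unit of GPA (illustrative).
odds_ratio_half_unit = math.exp(0.5 * b_gpa)

# Percentage change in the odds per one-unit increase.
percent_change_in_odds = 100 * (odds_ratio - 1)

print(f"OR per 1.0 GPA = {odds_ratio:.3f}")
print(f"OR per 0.5 GPA = {odds_ratio_half_unit:.3f}")
print(f"Change in odds per 1.0 GPA = {percent_change_in_odds:.1f}%")
```

An odds ratio describes a multiplicative change in the odds, not in the probability of admission, so it should not be read as "2.235 times the chance".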