
GSO & GS PROGRAMS

CapSU Pontevedra Research & Extension Services


August 2014 Bailan, Pontevedra, Capiz

DATA MANAGEMENT
&

Statistical Analysis
for Social Science Researches

using

IBM-SPSS Statistics
ver. 20

September 8-9, 2014


Capiz State University, Main Campus
Roxas City

Secretariat: Graduate School Office & Graduate School Programs


Capiz State University-Pontevedra Research & Extension Services
Bailan, Pontevedra, Capiz
capsupontgs@gmail.com
Tel.No. (036)-634-0474

TABLE OF CONTENTS
Data Management & Statistical Analysis using IBM-SPSS Statistics
by Maritess D. Villanueva

CapSU Vision, Mission, and Goals
RDE Vision, Mission, and Goals
Rationale
Objectives

Module 1 – Basic Concepts in Statistics
Categories of Statistics
Statistical Software

Module 2 – IBM SPSS: An Introduction
How to Run SPSS
SPSS Interface
SPSS Windows
Data Editor Window
Output Window
Syntax Editor Window

Module 3 – Entering, Saving and Opening SPSS Data
Reading an Excel File
Data Recoding
Exercise 1

Module 4 – Generating Descriptive Statistics
Exercise 2

Module 5 – Generating Frequency Tables and Graphs
Exercise 3

Module 6 – Detecting Data Outliers
Exercise 4

Module 7 – Inferential Statistics: A Review
Steps in Testing a Statistical Hypothesis
One-tailed or Two-tailed?
Parametric or Non-parametric?
Exercise 5

Module 8 – Testing Assumptions
Test for Homogeneity of Variances
Test for Randomness
Test for Normality
Exercise 6

Module 9 – Test on a Single Population
Parametric: Z-test
Nonparametric: Binomial Test
Exercise 7

Module 10 – Case of Two Population Means – Related Samples
Parametric: t-test
Nonparametric: Wilcoxon Signed-Rank Test
Exercise 8


Module 11 – Case of Two Population Means – Independent Samples
Parametric: t-test or Z-test
Nonparametric: Mann-Whitney U Test
Exercise 9

Module 12 – Case of Two or More Population Means – One-way Classification
Parametric: F-test (ANOVA)
Nonparametric: Kruskal-Wallis H Test
Exercise 10

Module 13 – Case of Two or More Population Means – Two-way Classification
Parametric: F-test (ANOVA)
Nonparametric: Friedman’s Test
Exercise 11

Module 14 – Using SPSS to Find Simple Random Samples

Module 15 – Measures of Correlations and Relationships
Parametric: Pearson Product Moment Correlation Coefficient
Nonparametric: Spearman Rank Correlation Coefficient
Chi-square Test (categorical data)
Exercise 12

Module 16 – Simple Linear Regression
Multiple Linear Regression
Exercise 13


CAPIZ STATE UNIVERSITY

VISION

Center of Academic Excellence Delivering Quality Service to All.

MISSION

Capiz State University is committed to provide advanced knowledge and innovation; develop skills, talents and values; undertake relevant research, development and extension services; promote entrepreneurship and environmental consciousness and enhance industry collaboration and linkages with partner agencies.

GOALS

• Globally competitive graduates.
• Institutionalized research culture.
• Responsive and sustainable extension services.
• Maximized profit of viable agro-industrial business ventures.
• Effective and efficient administration.


VMG of RDE

VISION

CapSU as a credible and recognized leader in the pursuit of RDE activities in the Visayas Region.

MISSION

The University, through its RDE activities, shall generate and extend quality technical information, products and services in various disciplines using appropriate approaches for sustained agro-industrial development to improve the quality of life.

GOALS

To actively support a sustainable agro-industrialization and balanced socio-economic growth through technology generation and commercialization, continued capability building, communication advocacy on market-driven innovations, and partnership with key sectors of development.


Training Rationale

Capiz State University at Pontevedra, led by its Graduate School Office and Graduate School Programs and in collaboration with the University’s Extension Services, will adopt initiatives that support faculty research and assist graduate students with their theses and dissertations, as part of its research development. One such initiative that has proven popular, participatory, and efficient is the use of a statistical package, specifically IBM-SPSS Statistics version 20. This updated technology was introduced in June 2014 during the training course on Research Design, Statistical Data Analysis and Interpretation for Researchers in Forestry, Environment and Natural Resources given by PCAARRD-DOST.

In this university, IBM-SPSS Statistics version 20 will be introduced in the different departments and colleges and applied in social science research.

IBM-SPSS version 20 will be used as a tool for data management and data analysis in solving research problems in various fields such as education, economics, management, health, and marketing.

In this connection, a training workshop on Data Management and Statistical Analysis for Social Science Researches using IBM-SPSS Statistics version 20 has been designed for faculty researchers and graduate students. Other interested researchers may also participate in the training.


Training Objectives

The training-workshop is conducted to enable the participants to:
1. be recipients of the technology transfer;

2. appreciate the importance of the IBM-SPSS Statistics ver. 20 computer program and its relevance to the conduct of their research;

3. use the program to analyze their own research data or data related to their field of specialization;

4. apply improved skills in research;

5. for graduate student participants, become more involved in the statistical analysis of their thesis or dissertation; and

6. for faculty participants, identify staff who will be in charge of data banking for future research, which includes the collection, organization, presentation, analysis and interpretation of data for each department of each college.


Module 1
Basic Concepts and Categories of Statistics
& Statistical Packages/Software
Learning Objectives

At the end of this module, the participants should be able to:

• discuss statistical concepts;
• identify different types of variables and classify data according to level of measurement; and
• become familiar with the different statistical packages.

Statistics
• (singular sense) the science that deals with the collection, organization, presentation, analysis, and interpretation of data
• a study of variation
• (plural sense) actual numbers derived from the data
• a collection of facts and figures
• processed data (e.g., population statistics, statistics on births, statistics on enrollment)

Data → facts and figures

Types of data
• Primary data – acquired directly from the source
Ex: data obtained by measuring the weight of 500 one-day-old chicks from Farm XYZ
• Secondary data – data obtained from an existing source rather than collected directly by the researcher
Ex: Phil. rice production (tons/ha) data by province from 1990 to 2014, taken from publications of the Phil. Bureau of Agricultural Statistics

Categories of Statistics
• Descriptive statistics – methods of organizing, summarizing, and presenting data, and of interpreting them.
• Inferential statistics – methods for making generalizations about a larger set of data when only a part of it is examined.

[Figure: Scope of Statistics – Descriptive statistics, Inferential statistics, Probability and Sampling]

Role of Statistics
• A tool for data analysis (e.g., standard drug vs. new drug: which is more effective?)
• Opinion poll surveys (e.g., Do you think the Philippines is ready for ASEAN integration in 2015?)
Some Basic Terms:
• Universe – the set of all entities or individuals under consideration (the subject of the study).

  Two types:
  - Finite – when the elements of the universe can be counted for a given time period
  - Infinite – when the number of elements of the universe is unlimited

• Variable – a characteristic of interest that can be measured or observed on each and every individual of the universe.
  Types of variables:
  - Qualitative
  - Quantitative: discrete or continuous

• Population – the set of all possible values of the variable
• Sample – a subset of the universe or the population
• Distribution – the pattern of variation of a variable

The Variables and Levels of Measurement

The measurement of a variable determines the amount of information that can be processed to answer the research objectives of a study. The scale of measurement of the variable determines the algebraic operations that can be performed and the statistical tools that can be applied to analyze the data. There are four scales or levels of measurement:
Nominal
• data collected are simply labels, names, or categories without any implicit or explicit ordering;
• observations with the same label belong to the same category;
• lowest level of measurement;
• frequencies or counts of observations belonging to the same category can be obtained.

Example 1.
Variable Possible data values
1. Sex Male, Female
2. Hair Color Black, Brown, Reddish Brown…
3. Cellphone network Smart, Talk n Text, Globe, Sun cellular…

Ordinal
• data collected are labels or classes with an implied ordering;
• the difference between two labels cannot be quantified;
• a level of measurement higher than nominal;
• only ordering or ranking can be done on the data.

Example 2.
Variable Possible data values
1. Military rank Sergeant, Lieutenant, Captain, General
2. Job Position President, Vice-President, Manager
3. Sibling Rank 1st, 2nd, 3rd, 4th, 5th, …

Interval
• data collected can be ordered or ranked, added and subtracted, but not multiplied or divided;
• differences between any two data values can be determined;
• the unit of measurement is constant (but arbitrary), and the zero point is arbitrary;
• a level of measurement higher than ordinal.

Example 3.
Variable Possible data values
1. Baking temperature 172 °C to 178 °C
2. Intelligence Quotient (IQ) 80 to 140
3. Grades 1.0, 1.25, 1.5, …

Ratio
• data collected have all the properties of the interval scale and, in addition, can be multiplied and divided;
• has a true zero point;
• is the highest level of measurement.

Example 4.
Variable Possible data values
1. Height 4’ to 7’
2. Width 0” to 5”
3. Weight 20 g to 50 kg
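The practical difference between the interval and ratio scales can be checked numerically. The Python sketch below uses illustrative values that are not from the manual (20 °C and 10 °C, 40 kg and 20 kg), and the helper names `c_to_f` and `kg_to_g` are hypothetical. Converting Celsius to Fahrenheit moves the zero point, so the ratio of two temperatures changes with the unit; converting kilograms to grams only rescales, so the ratio of two weights stays the same.

```python
def c_to_f(celsius):
    """Celsius to Fahrenheit: rescales AND shifts the zero point."""
    return celsius * 9 / 5 + 32


def kg_to_g(kg):
    """Kilograms to grams: only rescales, so zero stays zero."""
    return kg * 1000


# Interval scale (temperature): the ratio depends on the unit chosen.
ratio_celsius = 20 / 10                       # 2.0
ratio_fahrenheit = c_to_f(20) / c_to_f(10)    # 68 / 50 = 1.36

# Differences remain meaningful: 10 Celsius degrees = 18 Fahrenheit degrees.
diff_celsius = 20 - 10
diff_fahrenheit = c_to_f(20) - c_to_f(10)

# Ratio scale (weight): the ratio is the same in any unit.
ratio_kg = 40 / 20                            # 2.0
ratio_g = kg_to_g(40) / kg_to_g(20)           # 2.0

print(f"Temperature ratio: {ratio_celsius} in C vs {ratio_fahrenheit:.2f} in F")
print(f"Weight ratio: {ratio_kg} in kg vs {ratio_g} in g")
```

So saying that 20 °C is "twice as hot" as 10 °C is not meaningful, whereas 40 kg is twice as heavy as 20 kg in any unit.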

Statistical Software
• a specialized computer program used for data management and statistical analysis

Statistical Packages
• CS Pro (Census and Survey Processing System)
• SAS (Statistical Analysis System)
• Stata
• Minitab
• R
• STAR (Statistical Tool for Agricultural Research)
• IRRISTAT
• CROPSTAT
• ITSM 2000
• E-Views
• SPSS (Statistical Package for the Social Sciences)

CS Pro
• a software package for editing, tabulating, and disseminating data from censuses and surveys
• public-domain software
• Advantages:
  - can improve the data management and analysis of large-scale surveys
  - can be downloaded at no cost
  - can run on a computer with very basic specifications

• Disadvantages:
  - too many files are generated
  - only a single user can access and write to a file at any given time
  - modifying an item affects the existing file
SAS
• proprietary software that enables users to carry out data management, statistical analysis, data mining, forecasting, etc.
• popular statistical software in medical research and the pharmaceutical industry
• Advantages:
  - powerful, especially for the analysis of experimental designs and analysis of variance (ANOVA)
  - has a wide range of statistical procedures
• Disadvantages:
  - difficult to learn
  - expensive
  - requires an annual license
• recently launched a free version for professors and students called SAS University Edition (www.sas.com)

Stata
• proprietary software widely used in the fields of economics, sociology, and medicine
• performs data management and transformation, parameter estimation, graphics, computation of statistical measures, and other related mathematical calculations
• when the program is executed, its time series, statistics, and graphics features are loaded

Minitab
• a statistical software package originally intended for teaching statistics
• suitable for moderate-size datasets
• Advantages:
  - easy to learn and easy to use
  - impressive quality of graphs
  - cheaper than SAS and SPSS
  - requires less disk space
• Disadvantages:
  - poor compatibility with other statistical programs
  - less efficient for complex procedures
R
• a free programming language based on the S programming language
• a software environment for statistical computing and graphics
• Advantages:
  - freely available online
  - has powerful and customizable graphics
  - can be integrated with other statistical packages


  - runs on various operating systems such as Windows, Linux, and Mac
• Disadvantages:
  - more difficult to learn than SAS or Stata

ITSM 2000
• permits easy data processing, graphical display, estimation, and diagnostic testing for univariate and multivariate time series models in the time and frequency domains
• provides easy-to-use estimation and forecasting tools for spectral analysis
• in particular, its dynamic graphics allow the user to instantly see the effect of data transformations and model changes on a wide variety of features, such as the sample, residual, and model autocorrelation functions and spectra

E-Views
• offers an extensive array of powerful features for data handling, statistics and econometric analysis, forecasting and simulation, data presentation, and programming

IRRISTAT
• a set of microcomputer programs designed to assist agricultural researchers in developing experimental layouts and in undertaking plot sampling, data collection, data and file management, statistical analysis of data, and presentation of results

STAR
• freeware developed by Biometrics and Breeding Informatics, Plant Breeding, Genetics and Biotechnology Division of the International Rice Research Institute (IRRI)
• a computer program for data management and basic statistical analysis of experimental data

SPSS (Statistical Package for the Social Sciences)
• one of the most widely used programs for statistical analysis in the social sciences
• Advantages:
  - user-friendly interface
  - wide array of statistical procedures
• Disadvantages:
  - expensive
  - license is time-limited
  - graphics are less impressive


Module 2
IBM-SPSS Introduction
Learning Objectives

At the end of this module, the participants should be able to:


• run IBM-SPSS; and
• become familiar with the IBM-SPSS interface and windows.

Quick Facts about SPSS

• It was developed by Norman H. Nie, C. Hadlai “Tex” Hull, and Dale H. Bent in the 1960s.
• In the 1980s, a version of the software was made available for personal computers.
• In 2008, the name SPSS was changed to Predictive Analytics Software (PASW).
• A year later, SPSS was acquired by IBM, and the software was renamed IBM SPSS Statistics.

Statistical analyses and procedures that can be done with SPSS

• calculate descriptive statistics
• compute frequencies
• compare means
• perform tests of association and independence
• create different graphs and charts
• run correlation and regression
• conduct analysis of variance (ANOVA) and many other statistical procedures

How to run SPSS

• Option 1


• Option 2

• Option 3

• Option 4

1. Press the Windows key on your keyboard.

2. The monitor will display different icons.


3. Move the scroll bar to the end.

4. Click the IBM-SPSS Statistics 20 icon displayed on the monitor.

SPSS Interface


SPSS Windows

• SPSS has 3 main windows:

1. Data Editor Window – this is where you enter the data
   - divided into 2 views:
     • Data View
     • Variable View

• Data View – a spreadsheet-like interface where you enter the data. This is the default view when SPSS is opened.


• Variable View – this is where you define your variables

2. Output Window – this is where the results are displayed


3. Syntax Editor Window – used to run and save SPSS commands


Module 3
Entering, Saving and Opening SPSS data
Learning Objectives

At the end of this module, the participants should be able to:


• encode and save information in the SPSS Data Editor;
• import data from an MS Excel file;
• transform/recode data into the same variables;
• compute a new variable from existing data; and
• transform/recode data into different variables.

Entering SPSS data


1. Define the variable names.
Click the Variable View tab at the bottom of the Data Editor window.

Table 1. Socio-Economic Characteristics of 10 Domestic Helpers Interviewed in Quezon City.
DH   Place of Origin   Age   Number of Siblings
1    Barrio            34    5
2    Barrio            16    3
3    Barrio            20    2
4    Town              23    8
5    Town              18    4
6    Barrio            17    4
7    Barrio            37    3
8    Barrio            25    2
9    City              31    4
10   City              42    1
Source: Laboratory Manual in Statistics 1 by Habacon, L.T. et al.

In the first row of the first column (Name), type origin, then press ENTER. In the second row, type age, then press ENTER. In the third row, type num_sib, then press ENTER.
New variables are automatically given the Numeric data type. Because origin holds text (Barrio, Town, City), change its Type to String before entering the data.

Note: A variable name must start with a letter and must not contain spaces.
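The naming rule in the note can be checked mechanically. The Python sketch below is a simplified illustration, not SPSS's own validation: the hypothetical function `is_valid_name` encodes only the two conditions stated in the note (the first character is a letter, and there are no spaces). SPSS also enforces other rules, such as a maximum length and reserved words, which are not modeled here.

```python
def is_valid_name(name: str) -> bool:
    """Simplified check based on the note: the name must start with a
    letter and must not contain spaces (other SPSS rules are ignored)."""
    return name[:1].isalpha() and " " not in name


# The three variable names typed in Variable View
for name in ["origin", "age", "num_sib"]:
    print(name, is_valid_name(name))

# Names that break the rule
print("num sib", is_valid_name("num sib"))   # contains a space
print("2nd_sib", is_valid_name("2nd_sib"))   # starts with a digit
```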

Type – the type of variable

Width – the number of characters or digits you can enter for a particular variable

Decimals – desired number of decimal places


Label – full name of the variable

Values – used to assign labels to the codes of a variable
e.g. 1 – Male, 2 – Female

Missing – lets you specify values to be treated as missing



Columns – sets the display width of the column

Align – the alignment of data in the column

Measure – the level of measurement of the variable (Scale, Ordinal, or Nominal)


2. Enter the corresponding data.

Click the Data View tab.
(Notice that the names entered in the Variable View are now the headings of the first three columns in the Data View.)

Begin entering data in the first row starting at the first column.

Move the cursor to the second row of the first column to add the next
subject’s data.

Saving SPSS data

Click File > Save


Reading/Opening an Excel File


SPSS can read Excel files.
To demonstrate:
• Open the Excel file Exercise 1.xlsx located at Desktop > SPSS Training > Data sets.

• Close the Excel file before opening it in SPSS.

• Click the folder icon to open a data document.


• Another way is to click File > Open > Data.


In the dialog box that appears, check Read variable names from the first row of data, and specify the worksheet (number or name) where you typed your data.

Sample OUTPUT of Reading an Excel file using SPSS

Save your work as Exercise1a.sav.


Data Recoding (Into Same Variable)

Using the previously saved Exercise1a.sav:

1. Recode Place of Origin into the same variable:

Place of Origin: Barrio → 1
                 City → 2
                 Town → 3

Click Transform > Recode into Same Variables


Highlight Place of Origin, move it to the String Variables box, and click Old and New Values.

Type Barrio in Old Value and 1 in New Value, then click Add.
Type City in Old Value and 2 in New Value, then click Add.
Type Town in Old Value and 3 in New Value, then click Add.

Then click Continue.

Click OK.


Note that the entries for variable Place of Origin were replaced by
codes 1, 2 and 3.

To label the codes properly, click Variable View, go to the Values cell of Place of Origin, and
type 1 in Value and Barrio in Label, then click Add;
type 2 in Value and City in Label, then click Add;
type 3 in Value and Town in Label, then click Add.

Click OK.
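The recode and value-label steps above amount to two lookup tables. The Python sketch below is an illustration, not SPSS syntax: it applies the Barrio → 1, City → 2, Town → 3 mapping to the Place of Origin values of Table 1, and the names `RECODE` and `VALUE_LABELS` are hypothetical.

```python
# Old Value -> New Value, as entered in the Old and New Values dialog
RECODE = {"Barrio": 1, "City": 2, "Town": 3}

# Value -> Label, as entered in the Values cell of Variable View
VALUE_LABELS = {code: label for label, code in RECODE.items()}

# Place of Origin for the 10 domestic helpers in Table 1
origin = ["Barrio", "Barrio", "Barrio", "Town", "Town",
          "Barrio", "Barrio", "Barrio", "City", "City"]

# Recode into the same variable: the text values are overwritten by the codes
origin = [RECODE[value] for value in origin]
print(origin)

# Value labels only change how the codes are displayed; the stored codes stay
print([VALUE_LABELS[code] for code in origin])
```

One difference to keep in mind: Recode into Same Variables does not change a variable's type, so in SPSS a string variable such as Place of Origin keeps the codes as the text values 1, 2 and 3, whereas the Python list above holds integers.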


Data Recoding (Into Different Variable)

Using the previously saved Exercise1a.sav:

2. Recode Age into a different variable:

Age: Below 25 years old → 1
     25 – 35 years old → 2
     Above 35 years old → 3

Click Transform > Recode into Different Variables


Highlight Age, move it to the Numeric Variable -> Output Variable box, and type a Name and Label for the output variable. Click Change, then click Old and New Values.


Click Continue.

Click OK.


Note that another column was created for age_recoded.

Do not forget to label the codes properly.
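The age grouping can be sketched the same way. In this Python illustration, the hypothetical helper `age_group` places ages 25 and 35 in group 2, following the "25 – 35 years old" category. The original ages are kept and the codes go into a new list, mirroring Recode into Different Variables.

```python
def age_group(age):
    """Hypothetical helper: below 25 -> 1, 25 to 35 -> 2, above 35 -> 3."""
    if age < 25:
        return 1
    if age <= 35:
        return 2
    return 3


# Ages of the 10 domestic helpers in Table 1
age = [34, 16, 20, 23, 18, 17, 37, 25, 31, 42]

# Recode into a different variable: age is left unchanged, age_recoded is new
age_recoded = [age_group(a) for a in age]
print(age_recoded)
```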

Compute Variable

To create a new variable (annual salary) from an existing variable (monthly salary):

Click Transform > Compute Variable


Type a name for the Target Variable, say Annual_salary, then enter in the Numeric Expression box: 12 * Monthly_Salary

Note that another column was created for Annual_salary.
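Compute Variable evaluates the numeric expression case by case. The Python sketch below performs the same calculation on hypothetical monthly salaries (not data from the manual); the function name `annual_salary` is also hypothetical, and `None` stands in for a missing value.

```python
def annual_salary(monthly_salary):
    """Annual_salary = 12 * Monthly_Salary for one case.
    None stands in for a missing value; for this expression SPSS likewise
    leaves the result system-missing."""
    if monthly_salary is None:
        return None
    return 12 * monthly_salary


# Hypothetical monthly salaries (PhP) for four cases; the last one is missing
monthly = [1500, 2000, 1250, None]
annual = [annual_salary(m) for m in monthly]
print(annual)
```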


Exercise 1. Entering, Saving, Recoding and Computing SPSS Data

Source: Laboratory Manual in Statistics 1 by Habacon, L.T. et al.

Consider the data in Table 1.

A. Recode the following variable into the same variable:
Previous Occupation: None → 0
                     Agriculture → 1
                     Factory Worker → 2
                     Saleslady → 3

Save your work as Exercise1b.sav.

B. Recode the following variables into different variables:
Number of Siblings: less than 5 → 1
                    5 and more → 2
Above 35 years old → 3

Employer’s Monthly HH Income (PhP): 50,000 & below → 1
                                    50,001 – 75,000 → 2

Annual Salary (PhP): 15,000 & below → 1
                     15,001 – 20,000 → 2
                     Above 20,000 → 3

Save your work as Exercise1c.sav.


Module 4
Generating Descriptive Statistics
Learning Objectives

At the end of this module, the participants should be able to:


• generate measures of central tendency (averages);
• produce measures of variability or dispersion; and
• check the distribution of the data according to its skewness and peakedness (kurtosis).

Consider the screening exam scores of 20 freshman applicants each from a Science High School and a Rural High School.

Open the data file Module 4 (screening exam scores), located at Desktop > SPSS Training > Data sets > Module 4 (screening exam scores).


From the menu bar, select Analyze > Descriptive Statistics > Frequencies.


Output


Interpretation

Measures of Central Tendency
Mean – the arithmetic average: the sum of the values divided by the number of values
Median – the middle value of the data set when it is arranged in ascending or descending order (the average of the two middle values when the number of values is even)
Mode – the most frequently occurring value(s) in the data set

Measures of Location
Minimum – the smallest value observed in the data
Maximum – the largest value observed in the data

Measures of Dispersion
Standard deviation – a measure of how widely the data points vary about the mean; it is the square root of the variance
Variance – the average squared difference of the data points from the mean; for sample data, SPSS divides the sum of squared differences by n − 1 rather than n
Range – the simplest measure of variation, computed as the difference between the highest and lowest values of the data set
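As a check on these definitions, the Python sketch below computes the same measures for the ages of the 10 domestic helpers in Table 1 using the standard `statistics` module (Python 3.8 or later for `multimode`). Like SPSS, `statistics.variance` and `statistics.stdev` divide by n − 1.

```python
import math
import statistics

# Ages of the 10 domestic helpers in Table 1
age = [34, 16, 20, 23, 18, 17, 37, 25, 31, 42]

n = len(age)
mean = statistics.mean(age)        # sum of the values / n
median = statistics.median(age)    # n is even: average of the two middle values
minimum, maximum = min(age), max(age)
value_range = maximum - minimum

# Sample variance and standard deviation divide by n - 1
variance = statistics.variance(age)
std_dev = statistics.stdev(age)

# The same variance computed directly from its definition, as a check
squared_deviations = sum((x - mean) ** 2 for x in age)
assert math.isclose(variance, squared_deviations / (n - 1))

# Every age occurs once, so all ten values tie for the mode
modes = statistics.multimode(age)

print(f"N = {n}, mean = {mean:.2f}, median = {median}")
print(f"min = {minimum}, max = {maximum}, range = {value_range}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print(f"number of values tied for the mode: {len(modes)}")
```

Here the mean is 26.3 years, the median 24 years, the range 26 years, the variance about 84.01, and the standard deviation about 9.17. Because every age occurs exactly once, there is no single mode; in that situation SPSS shows the smallest value with the footnote "Multiple modes exist. The smallest value is shown."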


Exercise 2. Generating Descriptive Statistics

Consider the data in Table 1. Socio-Economic Characteristics of 35 Domestic Helpers Interviewed in Quezon City, used in Exercise 1.

TO DO:
Open your recently saved SPSS data: Desktop > SPSS Training > Data
Sets > Exercise1c.sav

Generate the Descriptive Statistics of the data for the variables Age and
Annual Salary (Minimum, Maximum, Range, Mean, Median, Mode,
Variance, Standard Deviation, Skewness and Kurtosis)

Save your work as Exercise2.spv


Module 5
Generating Frequency Tables and Graphs
Learning Objectives

At the end of this module, the participants should be able to:


• become familiar with different methods of data presentation;
• organize data by constructing a frequency distribution table; and
• choose the most appropriate method of data presentation for a given set of data.

Frequency Table – a table that lists the number of occurrences (frequency) of each distinct value or category in the data

Consider your recently saved SPSS data: Desktop > SPSS Training >
Data Sets > Exercise1c.sav

Click Analyze > Descriptive Statistics > Frequencies and move the variable (Place of Origin) into the Variable(s) list.


Expected Output
Place of Origin
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Barrio       18        51.4        51.4              51.4
        Town          7        20.0        20.0              71.4
        City         10        28.6        28.6             100.0
        Total        35       100.0       100.0
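A frequency table is simply a count per category plus running percentages. The Python sketch below rebuilds the Place of Origin table from its counts (18, 7, 10). Because there are no missing values, Valid Percent equals Percent, so only one percentage column is computed; the function name `frequency_table` is illustrative, not part of SPSS.

```python
from collections import Counter

# Raw responses rebuilt from the counts in the Place of Origin table.
origins = ["Barrio"] * 18 + ["Town"] * 7 + ["City"] * 10

def frequency_table(values, order):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0
    for category in order:
        freq = counts[category]
        cumulative += freq
        rows.append((category, freq,
                     round(100 * freq / total, 1),         # percent
                     round(100 * cumulative / total, 1)))  # cumulative percent
    return rows

table = frequency_table(origins, ["Barrio", "Town", "City"])
for row in table:
    print("{:<8}{:>5}{:>8}{:>8}".format(*row))
```

The rows reproduce the Frequency, Percent and Cumulative Percent columns of the SPSS output.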

Sample Interpretation
• About 51.4% of the DH respondents are from a barrio.
• More than half (51.4%) of the DH respondents came from a barrio.
• About five in every 10 DH respondents originated from a barrio.

Generating Graphs
Charts or graphs are visual representations of the data.
Pie Charts

Bar Charts


Consider your recently saved SPSS data: Desktop > SPSS Training >
Data Sets > Exercise1c.sav

Click Graphs > Chart Builder


Move the Place of Origin variable to the x-axis. Click OK to create the
chart.

Output

To create a pie chart, consider the same data on Place of Origin.

• Click Graphs > Legacy Dialogs > Pie


• A dialog box will appear. Click Define.

• Select the variable Place of Origin by moving it into the Define Slices by box, then click OK.

• The pie chart is generated (double-click the chart to open the Chart Editor and enhance it further).
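In a pie chart, each slice's angle is proportional to its frequency: the category's share of the total times 360 degrees. A small Python sketch using the Place of Origin counts:

```python
# Counts taken from the Place of Origin frequency table.
counts = {"Barrio": 18, "Town": 7, "City": 10}
total = sum(counts.values())

# Each slice's angle is its share of the 360 degrees of the circle.
angles = {place: 360 * n / total for place, n in counts.items()}
for place, angle in angles.items():
    print(f"{place:<7} {angle:6.1f} degrees")
```

Town, with 7 of the 35 respondents (20%), gets exactly 72 degrees, and the three slices add up to 360 degrees.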


Exercise 3. Generating Frequency Tables and Graphs

Consider the data in Table 1. Socio-Economic Characteristics of 35 Domestic Helpers Interviewed in Quezon City, on page 34.

TO DO:
Open your recently saved SPSS data: Desktop > SPSS Training > Data
Sets > Exercise1c.sav

Generate frequency tables for Previous Occupation and Number of Siblings (recoded).

Generate a pie chart for Previous Occupation and a bar graph for Number of Siblings (recoded).

Save your work as Exercise3.spv


Module 6
Detecting Data Outliers
Learning Objectives

At the end of this module, the participants should be able to:
• detect data outliers using a histogram; and
• identify data outliers using a box-and-whiskers plot.

Use the SPSS data Module 6 (quiz scores).sav to determine whether there are outliers in the data.

Click Analyze > Descriptive Statistics > Explore

Move the variable(s) to the Dependent List, then click Statistics

In the dialog box that appears, tick “Outliers” and “Percentiles”, then click Continue


Click Plots


Click OK
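A box-and-whiskers plot flags unusual values by how far they lie from the box. The Python sketch below assumes the convention used by SPSS box plots: values more than 1.5 box lengths (IQR) beyond the box are outliers (drawn as circles) and values more than 3 box lengths beyond are extremes (drawn as asterisks). It uses Tukey's hinges for the box edges, so on small samples its fences may differ slightly from SPSS's own percentile definition; the quiz scores are hypothetical.

```python
from statistics import median

def tukey_hinges(data):
    """Medians of the lower and upper halves of the sorted data
    (the middle value is shared by both halves when n is odd)."""
    xs = sorted(data)
    half = (len(xs) + 1) // 2
    return median(xs[:half]), median(xs[len(xs) - half:])

def classify_outliers(data):
    """Split unusual values into outliers (1.5 to 3 box lengths out) and extremes (> 3)."""
    lower, upper = tukey_hinges(data)
    box = upper - lower                          # box length (interquartile range)
    outliers, extremes = [], []
    for x in data:
        if x < lower - 3 * box or x > upper + 3 * box:
            extremes.append(x)                   # asterisk in the assumed convention
        elif x < lower - 1.5 * box or x > upper + 1.5 * box:
            outliers.append(x)                   # circle in the assumed convention
    return outliers, extremes

# Hypothetical quiz scores (not the Module 6 data set).
scores = [12, 14, 15, 15, 16, 17, 18, 24, 40]
outliers, extremes = classify_outliers(scores)
print("Outliers:", outliers, "Extremes:", extremes)
```

Here the hinges are 15 and 18, so 24 falls between the inner and outer fences (an outlier) and 40 lies beyond the outer fence (an extreme).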


Expected Output


Exercise 4. Detecting outliers in the data

TO DO:
Open the SPSS data: Desktop > SPSS Training > Data Sets > Exercise4
(senior citizens).sav

Consider the characteristics of the 24 members of the Batong Malake Senior Citizens Association (BMSCA) who participated in their Lakbay-Aral.

Check whether there are outliers in the variables age and income using a histogram and a box-and-whiskers plot.

Save your work as Exercise4.spv


Module 7
Inferential Statistics: Steps in Testing a Statistical Hypothesis
Learning Objectives

At the end of this module, the participants should be able to:


• formulate the null and alternative hypotheses for a given situation;
• identify TYPE I and TYPE II errors and recognize the consequences of such errors;
• identify one-tailed and two-tailed tests; and
• recognize parametric and nonparametric tests.

REVIEW:
Inferential Statistics – concerned with estimating population parameters from sample statistics and with testing hypotheses about them.
Statistical hypothesis
A conjecture about….
⇒ The value of a parameter of the population or
⇒ The distribution of the population
• Examples of Statistical Hypothesis:
• The mean height of students enrolled in Statistics is
5’2” (H: µ = 5’2”).
• The grain length of a variety of rice (IR-8) is normally
distributed (H: Y~normal)
⇒ Conclusions are stated subject to uncertainty
Null Hypothesis – the conjecture being tested, denoted by Ho.
- Generally, this is a statement of equality, status quo, or no difference.
Alternative Hypothesis – the complementary statement that will be accepted if the null hypothesis is rejected. It is denoted by Ha or H1.

Example: The mean weekly allowance of CapSU students is 500 pesos.

In symbols, Ho: µ = 500 pesos
Ha: µ ≠ 500 pesos OR Ha: µ > 500 pesos OR Ha: µ < 500 pesos

Note: Only one of these three alternatives is specified in a given test, depending on the research question.

Application Problem: Consumer ‘watchdog’ organizations regularly check the mass of items being sold to ensure that advertised data match reality. The 1 kg bags of sugar from the Citizen Kane Sugar Co. are under scrutiny, and we assume that the bags are correctly labeled, i.e., that they contain exactly 1 kg of sugar.
Solution: The mean mass for the population of 1 kg bags can be: ____________, _______________, ______________


If µ represents the average bag mass of the population, then the following
possibilities exist:
Possible value Action
_________________ _____________________________________
_________________ _____________________________________
_________________ _____________________________________

Although there are ___ possibilities for µ, 2 of them amount to the same thing. No action will be taken against the company for _____, since this gives customers a value-for-money deal. No action will be taken for ______, since this is fair dealing. So we combine these into _______. But _________ will produce action!

• The null hypothesis is set up formally:
It is appropriate to assume that this company is meeting its obligations, so the null hypothesis is that there is no disadvantage to the customer.
• The alternative hypothesis is set up formally:
The company is not meeting its obligations and consumers are being disadvantaged.

In summary: Hypothesis Action?


H0: _____________________________________
Ha: _____________________________________
The problem of Citizen Kane Sugar can be generalized by using the symbol µ0 to stand for the hypothesized mean; in the given problem, µ0 = 1 kg. The three forms of hypothesis tests concerning the population mean are:
Form 1 Form 2 Form 3
Null hypothesis H0 : H0: H0:
Alternative hypothesis Ha: Ha: Ha:
Test of a Statistical Hypothesis
• Procedure or rule for deciding whether to reject Ho on the basis of a sample drawn from the population.
Courses of Action or Decision in Hypothesis Testing
1. Reject Ho
2. “Fail to reject” (Accept?) Ho
Consequences of Decision Made in Hypothesis Testing
                          Ho is actually
Decision Made             TRUE                          FALSE
Reject Ho                 Error in Decision (TYPE I)    Correct Decision
Fail to reject Ho         Correct Decision              Error in Decision (TYPE II)
Two types of error: Type I error – rejecting a true Ho
Type II error – failing to reject (accepting) a false Ho
Probability of Committing Errors
1. The probability of committing a Type I error is denoted by α, i.e.,
α = P[Type I error] = P[reject Ho | Ho is true] = level of significance of a statistical test
2. The probability of committing a Type II error is denoted by β, i.e.,
β = P[Type II error] = P[fail to reject Ho | Ho is false]
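The meaning of α as P[Type I error] can be checked by simulation: draw many samples from a population in which Ho is true and count how often a two-tailed Z test at α = 0.05 wrongly rejects it. A minimal Python sketch; the population (mean 500 as in the allowance example, with an assumed known σ = 50), the sample size n = 25 and the random seed are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean

def simulate_type_one_error(alpha=0.05, mu0=500, sigma=50, n=25, trials=10_000, seed=1):
    """Fraction of samples, drawn with Ho true, in which a two-tailed Z test rejects Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # Ho: mu = mu0 is true
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1                                  # a Type I error
    return rejections / trials

rate = simulate_type_one_error()
print(f"Estimated P[Type I error] = {rate:.3f}")
```

The estimated rate should come out close to 0.05; increasing `trials` brings it closer.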


TEST OF STATISTICAL HYPOTHESIS

Steps in Testing a Statistical Hypothesis – Parallel in the Judicial Process

1. State Ho and Ha. – Presumption of innocence / accusation of guilt
2. Identify the test statistic and its distribution when Ho is true. – Type of evidence
3. Specify the level of significance. – Risk of a “guilty” verdict when the accused is innocent
4. State the decision rule. – Whether the evidence is substantial or not
5. Collect the data and perform calculations. – Collect and summarize evidence
6. Make a statistical decision. – Verdict
7. State the conclusion. – Sentence

Test statistic → a statistic that provides a basis for deciding whether to reject Ho in favor of Ha.

Decision Rule → a rule that specifies the values of the test statistic that lead to the rejection of Ho in favor of Ha.

Critical Region → the set of values of the test statistic for which Ho is rejected (also called the rejection region); its total probability when Ho is true is α.

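One-tailed or Two-tailed?

• Two-tailed test (Ha: ≠): the rejection region is split between both tails, each with area α/2; the acceptance region lies in the middle.
• Right-tailed test (Ha: >): the rejection region, with area α, lies in the right tail.
• Left-tailed test (Ha: <): the rejection region, with area α, lies in the left tail.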
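For a Z statistic, the boundaries of these rejection regions are quantiles of the standard normal distribution. A minimal Python sketch, assuming α = 0.05 and using the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

two_tailed = z.inv_cdf(1 - alpha / 2)   # Ha: not equal, reject if |Z| > this value
right_tailed = z.inv_cdf(1 - alpha)     # Ha: greater than, reject if Z > this value
left_tailed = z.inv_cdf(alpha)          # Ha: less than, reject if Z < this value
print(round(two_tailed, 3), round(right_tailed, 3), round(left_tailed, 3))
```

The familiar critical values are ±1.96 for a two-tailed test and 1.645 (or -1.645) for a one-tailed test at the 5% level.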

Parametric or Nonparametric?


Exercise 5. Formulating Hypotheses and Errors in Hypothesis Testing

Name: ___________________________ Score: __________

TO DO:

A. Consider each of the following situations and indicate for each of the four
actions whether it is a CORRECT DECISION, a TYPE I error or a TYPE II
error.

Ho : A training course is effective.


1. Approve an ineffective training course. - ___________________________
2. Disapprove an ineffective training course. - ___________________________
3. Disapprove an effective training course. - ___________________________
4. Approve an effective training course. - ___________________________

Ho : A large manufacturing firm is being charged with discrimination in its hiring practices.
5. The jury gave an innocent verdict to the guilty firm. - ___________________________
6. The jury gave a guilty verdict to a not innocent firm. - ___________________________
7. The jury gave a guilty verdict to an innocent firm. - ___________________________
8. The jury gave a “not guilty” verdict to an innocent firm. - ___________________________

B. For the given problem, formulate an appropriate null (Ho) and an appropriate alternative (Ha) hypothesis. Define any term or symbol that you use. Also, identify the situations in which Type I and Type II errors will be committed.

From past experience, it has been determined that a qualified operator of a certain machine turning out 500 items per day produces 25 or fewer defective items per day. A new operator is being hired to run the same machine, and the hypothesis is made that he is a qualified operator.

Null Hypothesis (H0): Alternative Hypothesis (Ha):

Type I error situation: Type II error situation:


Module 8
Testing Assumptions
Learning Objectives

At the end of this module, the participants should be able to:


• Perform a test for homogeneity of variances
• Test the randomness of data observations
• Analyze whether the data set follows a normal distribution
• Decide whether to use parametric or nonparametric tests

TESTS ON ASSUMPTIONS
Parametric methods give valid results, and are the appropriate tests to employ, only when their underlying assumptions are satisfied. For this reason, a number of methods have been designed to test the assumptions of parametric methods.
Example: Three sections of the same Mathematics course are taught by three instructors. The final exam scores of the students in the three sections are recorded as follows:
Section 1: 95, 32, 47, 75, 83, 84, 73, 68
Section 2: 85, 90, 79, 50, 32, 84, 78, 95, 65, 80
Section 3: 79, 92, 63, 68, 76, 20, 37, 74, 86
Is the distribution of final exam scores the same in the three sections? Test at α = 5%.

Use the SPSS data Module 8 (Math sections).sav


1. Tests on Equality of Variances
• The assumption of homoskedasticity (equality of variances) is used in ANOVA techniques and regression analysis, and it is necessary for some tests to be valid.
• These tests check whether p populations have equal variances, based on the samples obtained from the p populations.
• Bartlett’s test makes use of the χ² distribution; the One-Way ANOVA procedure in SPSS reports Levene’s test, which is used below.
• Equality of variances is one of the many assumptions in the analysis of experimental data.
• If this assumption does not hold, the F-tests in the analysis of variance are not valid.
Test of Hypothesis:
1. Ho: The variances in final exam scores of the 3 sections are equal.
Ha: The variances in final exam scores of the 3 sections are not equal.
2. TEST PROCEDURE: Homogeneity of variance test (Levene’s test)
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:
sig = 0.907
α = 0.05
6. DECISION: Since sig = 0.907 > α = 0.05, we fail to reject Ho.
7. CONCLUSION: At α = 5%, there is no significant difference among the variances in final exam scores of the three sections, so the variances may be considered equal.

PROCEDURE: Analyze > Compare Means > One-Way ANOVA > Options > Homogeneity of variance test
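Levene's test works by running a one-way ANOVA on the absolute deviations of each score from its group mean. The Python sketch below computes that statistic W for the three sections, assuming the mean-centred form of the test. It does not compute sig, since the standard library has no F distribution; compare W with the F critical value for the returned degrees of freedom, or rely on the sig reported by SPSS.

```python
from statistics import fmean

def levene_statistic(*groups):
    """Levene's W from absolute deviations about each group mean, with its (df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = [[abs(x - fmean(g)) for x in g] for g in groups]     # absolute deviations
    z_means = [fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within, (k - 1, n_total - k)

section1 = [95, 32, 47, 75, 83, 84, 73, 68]
section2 = [85, 90, 79, 50, 32, 84, 78, 95, 65, 80]
section3 = [79, 92, 63, 68, 76, 20, 37, 74, 86]
w, df = levene_statistic(section1, section2, section3)
print(f"Levene W = {w:.3f} with df = {df}")
```

Under Ho, W follows an F distribution with (k - 1, N - k) = (2, 24) degrees of freedom for these sections; groups with identical spreads give W = 0.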


2. The Runs Test for Randomness

• Inferential statistics are valid only if random samples are taken from the population(s) of interest, i.e., successive observations must be independent of each other.
• Tests for randomness are usually based on the sequence or order in which the observations were obtained.

Test for Randomness

1. Ho: The sequence of observations is random.
Ha: The sequence of observations is not random.
2. TEST PROCEDURE: Runs test for randomness
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:
sig = 0.969
α = 0.05
6. DECISION: Since sig = 0.969 > α = 0.05, we fail to reject Ho.
7. CONCLUSION: At α = 5%, the sequence of observations may be considered random.

PROCEDURE: Analyze > Nonparametric Tests > Legacy Dialogs > RUNS
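The runs test counts runs: unbroken stretches of consecutive observations on the same side of a cut point (the median by default in the SPSS dialog). Too few or too many runs suggests that the sequence is not random. A minimal Python sketch using the large-sample normal approximation; it assumes values equal to the cut point are grouped with the values above it and applies no continuity correction, so its sig may differ slightly from SPSS's on small samples.

```python
from statistics import NormalDist, median

def runs_test(data, cut_point=None):
    """Runs test about a cut point (default: the median), normal approximation."""
    cut = median(data) if cut_point is None else cut_point
    high = [x >= cut for x in data]              # True: at or above the cut point
    n1 = sum(high)
    n2 = len(high) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("all observations fall on one side of the cut point")
    runs = 1 + sum(a != b for a, b in zip(high, high[1:]))   # each change of side starts a run
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / variance ** 0.5
    sig = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed
    return runs, z, sig

section1 = [95, 32, 47, 75, 83, 84, 73, 68]
print(runs_test(section1))
```

A perfectly alternating sequence such as 1, 9, 1, 9, ... has the maximum number of runs and is flagged as non-random.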

3. The One-Sample Test for Normality

• The Shapiro-Wilk test (for N < 2000) or the Kolmogorov-Smirnov (K-S) test (for N > 2000) is used to determine whether the sample data came from a normal distribution.
• Both tests use the normal distribution as the reference for judging whether the distribution of the sample is normal.

Test of Hypothesis:
1. Ho: The distribution of data is normal.
Ha: The distribution of data is not normal.
2. TEST PROCEDURE: Shapiro-Wilk test for normality
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:
sig = 0.438 (for section 1)
sig = 0.088 (for section 2)
sig = 0.172 (for section 3)
α = 0.05
6. DECISION: Since all three sig values are > α = 0.05, we fail to reject Ho.
7. CONCLUSION: At α = 5%, the final exam scores in each of the three sections may be considered normally distributed.

PROCEDURE: Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests.


When certain assumptions are not satisfied, appropriate transformations or adjustments must be made to the data before parametric methods (e.g., t, Z, or F tests) are employed. Alternatively, the nonparametric counterpart of the appropriate parametric test can be used.

Nonparametric Statistical Tests

• Also called distribution-free statistics.
• No assumptions are made about the precise form of the sampled population.
• Easier to apply.
• Applicable to ranked data.
• Usable when two sets of observations come from different populations.
• Often the only alternative when the sample size is small (n < 25) and the form of the population cannot be assumed.
• Valid at the stated significance level, whatever the shape of the distribution from which the sample was drawn.
• Lower statistical efficiency.

NOTE: Parametric statistical tests (e.g., Z, t, F tests) are more powerful than nonparametric tests when their assumptions are satisfied.


Exercise 6. Testing Assumptions

Name: ___________________________ Score: __________

TO DO:

In the article “Shelf-Space Strategy in Retailing,” published in the Proceedings: Southern Marketing Association (1975), the effect of shelf height on the supermarket sales of canned dog food is investigated. An experiment was conducted at a small supermarket for a period of 8 days on the sales of a single brand of dog food, referred to as Arf dog food, involving three levels of shelf height: knee level, waist level, and eye level. During each day, the shelf height of the canned dog food was randomly changed on three different occasions. The remaining sections of the gondola that housed the given brand were filled with a mixture of dog food brands that were both familiar and unfamiliar to customers in this particular geographic area. Sales, in hundreds of dollars, of Arf dog food per day for the three shelf heights are as follows:
Shelf Height
Knee Level Waist Level Eye Level
77 88 85
82 94 85
86 93 87
78 90 81
81 91 80
86 94 79
77 90 87
81 87 93
Is there a significant difference in the average daily sales of this dog food
based on shelf height? Use a 0.01 level of significance.

• Check the three underlying assumptions (normality, randomness, and equality of variances) of the above problem.
                                                              Yes    No
  • Are the data normally distributed?                        ( )    ( )
  • Are the sample data collected at random?                  ( )    ( )
  • Are the variances in sales for each shelf height equal?   ( )    ( )

• Which family of tests do you think is more appropriate to apply?

  Parametric tests

  Nonparametric tests


Module 9
Test on Single Population
Learning Objectives

At the end of this module, the participants should be able to:


• Decide whether to use a parametric or nonparametric test for a single population
• Perform a test of hypothesis for the mean or median of one population

Parametric statistical test: Z-test or t-test: Case of the Mean (µ) of a Single Population
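The one-sample t statistic is t = (x̄ - µ0)/(s/√n) with n - 1 degrees of freedom. A minimal Python sketch with hypothetical data and a hypothetical µ0; the standard library has no t distribution, so the sig value is left to SPSS or a t table.

```python
from statistics import fmean, stdev

def one_sample_t(data, mu0):
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(data)
    standard_error = stdev(data) / n ** 0.5      # s / sqrt(n)
    return (fmean(data) - mu0) / standard_error, n - 1

# Hypothetical measurements tested against a hypothetical mu0 = 5.
t, df = one_sample_t([4, 6, 8], mu0=5)
print(f"t = {t:.4f}, df = {df}")
```

Here x̄ = 6 and s = 2, so t = 1/(2/√3) ≈ 0.866 with 2 degrees of freedom.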


Nonparametric counterpart: Binomial test (based on the median)

Example:

Six types of dried fishery products were tested for levels of histamine content. The histamine contents (mg/100 g sample) were as follows:

24.09 9.47 5.11 13.14 6.57 10.95

It is claimed that the median level of histamine content among the samples did not exceed the acceptable histamine level of 20 mg/100 g sample. Test the claim at the α = 0.01 level of significance.

Test of Hypothesis:

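1. Ho: The median level of histamine content does not exceed 20 mg/100 g sample.
Ha: The median level of histamine content exceeds 20 mg/100 g sample.
2. TEST PROCEDURE: Binomial test
3. α = 1%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:

PROCEDURE: In the Data Editor, select ANALYZE > NONPARAMETRIC TESTS > LEGACY DIALOGS > BINOMIAL TEST

sig = 0.219/2 = 0.1095
α = 0.01

(Halving the two-tailed sig is valid only when the sample points in the direction of Ha. Here five of the six values are below 20 mg/100 g, so the one-tailed sig in the direction of Ha is even larger, about 0.98, and the decision is unchanged.)

6. DECISION: Since sig = 0.1095 > α = 0.01, we fail to reject Ho.

7. CONCLUSION: At α = 1%, there is no significant evidence that the median level of histamine content exceeds 20 mg/100 g sample.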
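The binomial test here reduces to counting how many of the six values lie above 20 and using exact binomial probabilities with p = 0.5. The Python sketch below gives SPSS's two-tailed sig for the histamine data (0.21875, displayed by SPSS as .219) together with both one-tailed probabilities. It assumes no value equals 20 exactly; such ties would normally be dropped.

```python
from math import comb

def binomial_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

histamine = [24.09, 9.47, 5.11, 13.14, 6.57, 10.95]   # mg/100 g sample
limit = 20

n = len(histamine)
above = sum(x > limit for x in histamine)      # number of values exceeding the limit

p_at_most = binomial_cdf(above, n)             # P(X <= above): direction "median < 20"
p_at_least = 1 - binomial_cdf(above - 1, n)    # P(X >= above): direction of Ha "median > 20"
p_two_tailed = min(1.0, 2 * min(p_at_most, p_at_least))

print(f"two-tailed sig = {p_two_tailed:.5f}")
print(f"P(X <= {above}) = {p_at_most:.5f}")
print(f"P(X >= {above}) = {p_at_least:.5f}")
```

Only one of the six values exceeds 20, so P(X ≤ 1) = 7/64 ≈ 0.109 and P(X ≥ 1) = 63/64 ≈ 0.984.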


Exercise 7. Test on Single Population

Name: ___________________________ Score: __________

Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.

An accountancy firm is investigating the installation of a computer system. On a test run, it obtained the following time savings (in hours) on an audit of a selection of 10 major accounts:
74 12 35 26 34 42 30 45 8 33
At the 1% significance level, will the computer system make significant time
savings?

1) One-tailed or Two-tailed:____________________________________

2) Parametric or Nonparametric:________________________________

STEP BY STEP STATISTICAL HYPOTHESIS TESTING:

a) Ho:

_________________________________________________________

Ha:

_________________________________________________________

b) Test Procedure: ___________________________________________

c) Level of significance: ________________________________

d) Decision Rule: ____________________________________________

e) Computation:

α= _________

sig = _________

f) Decision:_________________________________________________

g) Conclusion: _______________________________________________


Module 10
Test of Hypothesis: Case of Two Population
Means – Related Samples
Learning Objectives

At the end of this module, the participants should be able to:


• Decide whether to use a parametric or nonparametric test on two population means, case of paired or related samples.
• Perform a statistical test of hypothesis on two population means, case of paired or related samples.


Test of Hypothesis:

1. Ho: There is no difference between the scores of a control group and their
matched individuals.
Ha: There is a difference between the scores of a control group and their
matched individuals.
2. TEST PROCEDURE: Wilcoxon Signed-Rank Test
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; Otherwise, fail to reject Ho.
5. Computations:

Ranks
                          N       Mean Rank   Sum of Ranks
y - x   Negative Ranks    6(a)    6.00        36.00
        Positive Ranks    3(b)    3.00         9.00
        Ties              1(c)
        Total            10
a. y < x
b. y > x
c. y = x

Test Statistics(a)
                          y - x
Z                         -1.604(b)
Asymp. Sig. (2-tailed)    .109
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.

sig = 0.109 (two-tailed, since Ha does not specify a direction)
α = 0.05

6. DECISION: Since sig = 0.109 > α = 0.05, we fail to reject Ho.

7. CONCLUSION: At α = 5%, there is no significant difference between the scores of the control group and those of their matched individuals.
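The Wilcoxon signed-rank test ranks the absolute differences, drops zero differences (the Ties row in SPSS), and compares the sum of the positive ranks with its expected value n(n + 1)/4. A Python sketch using the normal approximation with a correction for tied ranks; it is not guaranteed to match SPSS digit for digit, and the paired scores in the usage lines are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank values 1..n (tied values share the average rank); also return tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1    # average of positions i+1 .. j+1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def wilcoxon_signed_rank(x, y):
    """Return T+, T-, Z (based on positive ranks) and two-tailed sig for differences y - x."""
    d = [b - a for a, b in zip(x, y) if b != a]  # zero differences (ties) are dropped
    n = len(d)
    ranks, tie_sizes = average_ranks([abs(v) for v in d])
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    t_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    mean_t = n * (n + 1) / 4
    var_t = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in tie_sizes) / 48
    z = (t_plus - mean_t) / var_t ** 0.5
    sig = 2 * (1 - NormalDist().cdf(abs(z)))
    return t_plus, t_minus, z, sig

control = [12, 15, 11, 18, 14, 16]   # hypothetical scores
matched = [10, 15, 13, 14, 11, 12]   # hypothetical matched scores
print(wilcoxon_signed_rank(control, matched))
```

As a check, T+ and T- always add up to n(n + 1)/2 for the n non-zero differences.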


Exercise 8. Test of Hypothesis: Case of Two Population Means – Related Samples

Name: ___________________________ Score: __________

Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.

It is claimed that a new diet will reduce a person’s weight in a period of two
weeks. The weights of 7 women who followed this diet were recorded before
and after a 2-week period.

Woman             1      2      3      4      5      6      7
Weight before   58.5   60.3   61.7   69.0   64.0   62.6   56.7
Weight after    60.0   54.9   58.1   62.1   58.5   59.9   54.4

Test the claim at the 5% level of significance.

1) One-tailed or Two-tailed:__________________________________________

2) Parametric or Nonparametric:______________________________________

STEP BY STEP STATISTICAL HYPOTHESIS TESTING:

a) Ho: __________________________________________________________

Ha: __________________________________________________________

b) Test Procedure: ________________________________________________

c) Level of significance: ________________________________

d) Decision Rule: __________________________________________________

e) Computation:

α= _________

sig = _________

f) Decision:_______________________________________________________

g) Conclusion: _____________________________________________________


Module 11
Test of Hypothesis: Case of Two Population
Means – Independent Samples
Learning Objectives

At the end of this module, the participants should be able to:


• Decide whether to use a parametric or nonparametric test on two population means, case of independent samples.
• Perform a statistical test of hypothesis on two population means, case of independent samples.
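For two independent samples, the Independent-Samples T Test in SPSS reports two versions of t: one with a pooled variance (equal variances assumed) and one with separate variances (equal variances not assumed, with Welch-Satterthwaite degrees of freedom). A minimal Python sketch of both statistics; the data are hypothetical and sig is left to SPSS or a t table.

```python
from statistics import fmean, variance

def two_sample_t(a, b):
    """Return ((t, df) with pooled variance, (t, df) with separate variances)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)            # sample variances (n - 1)
    diff = fmean(a) - fmean(b)
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    # Equal variances not assumed: separate variances, Welch-Satterthwaite df
    se1, se2 = v1 / n1, v2 / n2
    t_welch = diff / (se1 + se2) ** 0.5
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return (t_pooled, n1 + n2 - 2), (t_welch, df_welch)

pooled, welch = two_sample_t([1, 2, 3], [4, 5, 6])   # hypothetical samples
print("pooled:", pooled, "welch:", welch)
```

With equal sample sizes and equal sample variances, as here, both versions give the same t (about -3.674) and the same 4 degrees of freedom; they differ once the spreads differ.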


Exercise 9. Test of Hypothesis: Case of Two Population Means – Independent Samples

Name: ___________________________ Score: __________

Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.

Production line quantities for two managers in two plants of a large company
are compared. Each data value represents the amount of production during
randomly selected 1-hour periods over a whole week.
Manager A:
15 13 8 16 12 15 12 18 11 12 9 10 7 9
Manager B:
14 15 10 16 11 13 15 12 14 11

Use the 1% level of significance to test the hypothesis that there is no significant difference in the mean production rates.

1) One-tailed or Two-tailed:____________________________________

2) Parametric or Nonparametric:________________________________

STEP BY STEP STATISTICAL HYPOTHESIS TESTING:

a) Ho: _____________________________________________________

Ha: _____________________________________________________

b) Test Procedure: ___________________________________________

c) Level of significance: ________________________________

d) Decision Rule: _____________________________________________

e) Computation:

α= _________

sig = _________

f) Decision:__________________________________________________

g) Conclusion: _______________________________________________


Module 12
Test of Hypothesis: Case of Two or More
Population Means – One-way Classification
Learning Objectives

At the end of this module, the participants should be able to:


• Decide whether to use a parametric or nonparametric test on two or more population means, one-way classification.
• Perform a statistical test of hypothesis on two or more population means, one-way classification.

One-Way ANOVA
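One-way ANOVA compares the variation between the group means with the variation within the groups: F = MS(between)/MS(within), with (k - 1, N - k) degrees of freedom. A minimal Python sketch with hypothetical groups:

```python
from statistics import fmean

def one_way_anova(*groups):
    """Return the one-way ANOVA F ratio and its (df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)

f, df = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])   # hypothetical groups
print(f"F = {f:.2f}, df = {df}")
```

For these groups SS(between) = 54 and SS(within) = 6, giving F = 27/1 = 27 with (2, 6) degrees of freedom.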


Exercise 10. Test of Hypothesis: Case of Two or More Population Means – One-Way Classification

Name: ___________________________ Score: __________

Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.

In order to compare the effectiveness of four methods of teaching young children a computer programming language, independent random samples of size 6 are taken from large groups of children taught by each of the four methods, and their standardized achievement test scores are recorded as follows:
Method Scores
A 75 73 68 72 87 75
B 84 92 84 82 87 85
C 62 65 68 67 67 66
D 74 76 73 72 76 74
At α = 0.01, is there evidence of a difference in scores among the four teaching methods?

1) Response Variable:____________________________________

2) Independent Variable: ______________________________________

3) Parametric or Nonparametric:________________________________

STEP BY STEP STATISTICAL HYPOTHESIS TESTING:

a) Ho: __________________________________________________________

Ha: __________________________________________________________

b) Test Procedure: ________________________________________________

c) Level of significance: ________________________________

d) Decision Rule: __________________________________________________

e) Computation:

α= _________

sig = _________

f) Decision:_______________________________________________________

g) Conclusion: _____________________________________________________


Module 13
Test of Hypothesis: Case of Two or More
Population Means – Two-way Classification
Learning Objectives
At the end of this module, the participants should be able to:
• Decide whether to use a parametric or nonparametric test on two or more population means, two-way classification.
• Perform a statistical test of hypothesis on two or more population means, two-way classification.
Features:
1. It employs one-directional blocking of the experimental units; the experimental units within a block are more or less homogeneous.
2. Each block is a complete replication of the entire set of treatments.
3. The number of experimental units in a block should be equal to the number of treatments, or some multiple of it.
Randomization
1. Group or stratify the experimental units into r blocks, with each block having t (or some multiple of t) experimental units.
2. Allocate the treatments to the experimental units in a block at random, and do this from block to block, independently of the results of randomization in other blocks.
Computation of Sums of Squares
CF = (Y..)²/tr
TSS = ∑∑ (Yij)² – CF
TrSS = ∑ (Yi.)²/r – CF
RSS = ∑ (Y.j)²/t – CF
ESS = TSS – TrSS – RSS

Analysis of Variance Table (each MS = SS/df):
SV          df               SS      MS      Fc
Treatment   t – 1            TrSS    MSTr    MSTr/MSE
Block       r – 1            RSS     MSR     MSR/MSE
Error       (t – 1)(r – 1)   ESS     MSE
Total       tr – 1           TSS

Test of Hypothesis
1. To test for differences among treatment means (effects):
Test statistic: Fc = MSTr/MSE ~ F[t – 1, (t – 1)(r – 1)]
2. To test for differences among block means (effects):
Test statistic: Fc = MSR/MSE ~ F[r – 1, (t – 1)(r – 1)]
EXAMPLE:
Suppose the US Golf Association (USGA) wants to compare the mean distances traveled by four different brands of golf balls when struck with a driver. Ten golfers (the blocks) each used a driver to hit one ball of each brand, in a random sequence. The distance is recorded for each hit, and the results are shown below, organized by brand.
GOLFER (Block) BRAND A BRAND B BRAND C BRAND D Block Total
1 202.4 203.2 223.7 203.6 832.9
2 242.0 248.7 259.8 240.7 991.2
3 220.4 227.3 240.0 207.4 895.1
4 230.0 243.1 247.7 226.9 947.7
5 191.6 211.4 218.7 200.1 821.8
6 247.7 253.0 268.1 244.0 1012.8
7 214.8 214.8 233.9 195.8 859.3
8 245.4 243.6 257.8 227.9 974.7
9 224.0 231.5 238.2 215.7 909.4
10 252.2 255.2 265.4 245.2 1018.0
Treatment Total 2270.5 2331.8 2453.3 2207.3 GT=9262.9
Means 227.05 233.18 245.33 220.73 GM=231.5725
a) Compare the mean distances for the four brands. Use a 5% level of significance.
b) At α= 0.05, are there effects of the different golfers on the mean distance?
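The sums of squares can be checked with a short Python sketch that applies the same formulas (CF, TSS, TrSS, RSS, ESS) to the golf-ball table. It is an illustration, not SPSS output; the function name `rcbd_anova` is hypothetical, and because the Python standard library has no F distribution, the sketch stops at the Fc values, which are then compared with the tabled F value or with the Sig. column that SPSS reports.

```python
# Golf-ball distances from the table above:
# rows = golfers (blocks), columns = brands A, B, C, D (treatments)
distances = [
    [202.4, 203.2, 223.7, 203.6],
    [242.0, 248.7, 259.8, 240.7],
    [220.4, 227.3, 240.0, 207.4],
    [230.0, 243.1, 247.7, 226.9],
    [191.6, 211.4, 218.7, 200.1],
    [247.7, 253.0, 268.1, 244.0],
    [214.8, 214.8, 233.9, 195.8],
    [245.4, 243.6, 257.8, 227.9],
    [224.0, 231.5, 238.2, 215.7],
    [252.2, 255.2, 265.4, 245.2],
]

def rcbd_anova(y):
    """Sums of squares, degrees of freedom and F ratios for a randomized complete block design."""
    r = len(y)       # number of blocks
    t = len(y[0])    # number of treatments
    grand_total = sum(sum(row) for row in y)
    cf = grand_total ** 2 / (t * r)                               # correction factor
    tss = sum(v ** 2 for row in y for v in row) - cf              # total SS
    treatment_totals = [sum(row[i] for row in y) for i in range(t)]
    block_totals = [sum(row) for row in y]
    trss = sum(T ** 2 for T in treatment_totals) / r - cf         # treatment SS
    rss = sum(B ** 2 for B in block_totals) / t - cf              # block SS
    ess = tss - trss - rss                                        # error SS
    df_tr, df_bl, df_err = t - 1, r - 1, (t - 1) * (r - 1)
    mse = ess / df_err
    return {
        "TrSS": trss, "RSS": rss, "ESS": ess, "TSS": tss,
        "df": (df_tr, df_bl, df_err),
        "F_treatment": (trss / df_tr) / mse,   # MSTr / MSE
        "F_block": (rss / df_bl) / mse,        # MSR / MSE
    }

res = rcbd_anova(distances)
print(f"F (brands)  = {res['F_treatment']:.2f}")   # 54.31
print(f"F (golfers) = {res['F_block']:.2f}")       # 66.26
```

Both ratios are far above the usual 5% critical values for their degrees of freedom (3 and 27 for brands, 9 and 27 for golfers).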


Brands (treatments):
6. Decision: Since sig < 0.001 < α = 0.05 (Fc ≈ 54.31 with 3 and 27 df), we reject Ho.
7. Conclusion: At α = 5%, there are significant differences among the mean distances of the four brands.

Golfers (blocks):
6. Decision: Since sig < 0.001 < α = 0.05 (Fc ≈ 66.26 with 9 and 27 df), we reject Ho.
7. Conclusion: At α = 5%, there are significant differences among the block (golfer) means.

Nonparametric Counterpart: Friedman Test

• It is used to analyze k related samples.
• It is the nonparametric alternative to the two-way analysis of variance for a randomized block design, used when the assumption of normality is replaced by the assumption that the distributions are continuous.
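For reference, the large-sample form of the Friedman statistic (without a correction for ties) is shown below, where b is the number of blocks, k is the number of treatments, and R_j is the sum of the within-block ranks of treatment j. Ho is rejected when the statistic exceeds the chi-square critical value with k − 1 degrees of freedom.

$$\chi_r^2 = \frac{12}{b\,k\,(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1), \qquad df = k - 1$$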


Example:
A clothing manufacturer conducted an experiment to study the effect of increases in its employees’ hourly wages on productivity. Four treatments were used, and 12 employees were selected and grouped into three blocks according to the length of time they had been with the company. The employees were observed for 3 weeks, and their productivity was measured as the average number of nondefective garments each produced per hour. The resulting productivity measures appear in the table:
                              TREATMENTS (change in hourly wage)
GROUP (Block)                 No increase   +$0.50   +$1.00   +$1.50
Group 1 (less than 1 year)    2.4           3.0      3.1      3.2
Group 2 (1-5 years)           4.8           6.1      5.9      5.7
Group 3 (over 5 years)        5.1           7.0      7.2      7.3
a) Is there evidence that the mean productivity levels differ among the
four pay programs? Use α=0.01
b) Is there evidence that the mean productivity levels differ among the 3
groups? Use α=0.05
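Both Friedman results below can be reproduced with a standard-library-only Python sketch. The function names are hypothetical; the chi-square tail probability uses the closed forms that hold for integer degrees of freedom, and the statistic has no tie correction, so results may differ from SPSS when a block contains tied values (this table has none). For the block test, the table is transposed so that the pay programs play the role of blocks and the three groups are compared.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for a positive integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-y) * sum_{k < df/2} y^k / k!
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: start from the df = 1 case, erfc(sqrt(y)), then add half-integer gamma terms
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def ranks_within(row):
    """Rank the values in one block from 1 (smallest) upward; ties share the average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1   # average of 1-based positions i..j
        i = j + 1
    return ranks

def friedman(blocks):
    """Friedman statistic, df and p-value; each block lists one value per treatment."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(ranks_within(row)):
            rank_sums[j] += rank
    stat = 12.0 / (b * k * (k + 1)) * sum(R ** 2 for R in rank_sums) - 3 * b * (k + 1)
    return stat, k - 1, chi2_sf(stat, k - 1)

wages = [
    [2.4, 3.0, 3.1, 3.2],   # Group 1 (less than 1 year)
    [4.8, 6.1, 5.9, 5.7],   # Group 2 (1-5 years)
    [5.1, 7.0, 7.2, 7.3],   # Group 3 (over 5 years)
]
print("Treatments:", friedman(wages))                            # compares the 4 pay programs
print("Blocks:    ", friedman([list(col) for col in zip(*wages)]))  # compares the 3 groups
```

The two p-values round to the sig values used in the tests that follow (0.122 and 0.018).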

PROCEDURE: In the Data Editor, select

ANALYZE > NONPARAMETRIC TESTS > LEGACY DIALOGS > K-RELATED SAMPLES > FRIEDMAN

Test of Hypothesis (TREATMENTS):

1. Ho: The mean productivity levels do not differ among the four pay programs.
Ha: The mean productivity levels differ among the four pay programs.
2. TEST PROCEDURE: Friedman Test
3. α = 1%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:

sig = 0.122
α = 0.01

6. DECISION: Since sig = 0.122 > α = 0.01, we fail to reject Ho.

7. CONCLUSION: At α = 1%, the mean productivity levels do not differ significantly among the four pay programs.


Test of Hypothesis (BLOCKS):

1. Ho: The mean productivity levels do not differ among the 3 groups.
Ha: The mean productivity levels differ among the 3 groups.
2. TEST PROCEDURE: Friedman Test
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:

sig = 0.018
α = 0.05

6. DECISION: Since sig = 0.018 < α = 0.05, we reject Ho.

7. CONCLUSION: At α = 5%, the mean productivity levels differ among the 3 groups.


Exercise 11. Test of Hypothesis:
Case of Two or More Population Means – Two-Way Classification

Name: ___________________________ Score: __________

Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.

A food chain sells a particular item at all its stores. Each store carries three brands, two of which are economy brands. The management decides to discontinue selling one of the economy brands. It has decided to look at the turn time of each brand, i.e., the average time between successive purchases of the same brand. Five of the stores in the chain are selected, and an employee in each store reports the turn time (in minutes) for each economy brand.

STORE   BRAND 1   BRAND 2
1       4.1       3.9
2       5.2       5.1
3       5.0       5.0
4       4.9       4.7
5       6.1       5.9

Is there a difference in the mean turn times for the two economy brands? Use α = 0.01.

Is there a difference in the mean turn times for the 5 stores? Use α = 0.05.

1) Treatment:____________________________________

2) Block: ______________________________________

3) Parametric or Nonparametric:________________________________

STEP BY STEP STATISTICAL HYPOTHESIS TESTING:

(For Treatment Means)

a) Ho: __________________________________________________________

Ha: __________________________________________________________

b) Test Procedure: ________________________________________________

c) Level of significance: ________________________________

d) Decision Rule: __________________________________________________


e) Computation:

α= _________

 = _________

f) Decision:_______________________________________________________

g) Conclusion: _____________________________________________________

STEP BY STEP STATISTICAL HYPOTHESIS TESTING:

(For Block Means)

a) Ho: __________________________________________________________

Ha: __________________________________________________________

b) Test Procedure: ________________________________________________

c) Level of significance: ________________________________

d) Decision Rule: __________________________________________________

e) Computation:

α= _________

 = _________

f) Decision:_______________________________________________________

g) Conclusion: _____________________________________________________


Module 14
Using SPSS to find Simple Random Samples
Learning Objectives
At the end of this module, the participants should be able to:
• Draw simple random samples from a sampling frame constructed in an SPSS data file.
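The idea behind a simple random sample (every subset of n units from the frame is equally likely to be drawn) can be sketched in Python as below. The frame of 100 numbered units, the sample size of 10, the seed, and the function name are hypothetical; in the module itself the sample is drawn in SPSS.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement from a sampling frame (a list of unit IDs)."""
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the size of the frame")
    rng = random.Random(seed)            # seeded so the draw can be reproduced
    return sorted(rng.sample(frame, n))  # sorted only to make the list easier to read

frame = list(range(1, 101))   # hypothetical frame: units numbered 1 to 100
print(simple_random_sample(frame, 10, seed=8))
```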


Module 15
Measures of Correlations and Relationships
Learning Objectives
At the end of this module, the participants should be able to:
• compute the correlation coefficient and test its significance;
• compute the rank correlation coefficient and test its significance; and
• perform an appropriate test for categorical data: the chi-square (χ²) test for independence.
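The three statistics named in the objectives can be sketched with the Python standard library as follows. The data are hypothetical; `spearman_rho` is computed as Pearson's r on the ranks (tied values receive average ranks), the t statistic is the usual one for Ho: ρ = 0 with n − 2 degrees of freedom, and the chi-square statistic is the uncorrected Pearson form (no Yates continuity correction). All function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for Ho: rho = 0; compare with the t distribution on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation, computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def chi_square_independence(table):
    """Pearson chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E = (row total)(column total)/n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print("r =", round(r, 4), " t =", round(t_for_r(r, len(x)), 4))
print("rho =", round(spearman_rho(x, y), 4))
print("chi-square, df =", chi_square_independence([[10, 20], [30, 40]]))
```

The chi-square function accepts any r × c table of counts, so the same sketch can be used to check a contingency-table result obtained in SPSS.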


Exercise 12. Test of Hypothesis:
Measures of Relationship

Name: ___________________________ Score: __________

Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.

A random sample of 400 married men, all retired or at least 65 years old, was classified according to educational attainment and number of children.

                          Number of Children
Educational Attainment    0-2    3-5    Over 5
None                      12     22     26
Elementary                14     59     37
High school               20     80     34
College                   26     31     19

Test the hypothesis that the number of children is independent of the level of education attained by the father at α = 0.05.

1) Independent Variable:____________________________________

2) Dependent Variable: ______________________________________

3) Parametric or Nonparametric:________________________________

STEP BY STEP STATISTICAL HYPOTHESIS TESTING:

a) Ho: __________________________________________________________

Ha: __________________________________________________________

b) Test Procedure: ________________________________________________

c) Level of significance: ________________________________

d) Decision Rule: __________________________________________________

e) Computation:

α= _________

 = _________

f) Decision:_______________________________________________________

g) Conclusion: _____________________________________________________


Module 16
Regression Analysis
Learning Objectives
At the end of this module, the participants should be able to:
• formulate a prediction equation and test its significance; and
• perform at least a simple linear regression analysis.

PARAMETRIC REGRESSION ANALYSIS

Regression Analysis is a statistical technique used for determining the probable form of the relationship between variables. The ultimate objective of this method of analysis is usually to predict or estimate the value of one variable corresponding to a given value of another variable.
Recall:
Simple Regression Analysis: a linear relationship involving only one independent variable X, used to predict the dependent variable Y. Objective: to find the possible relationship between two variables X and Y, where X and Y are paired variables.

Two variables X and Y are linearly related if their relationship can be expressed by the simple linear statistical model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where Yi = ith observed value of the random variable Y
Xi = ith observed value of the random variable X
β0 = regression constant; it is the true Y-intercept
β1 = regression coefficient; it measures the true increase in Y per unit increase in X
εi = random error associated with the ith observation

This model is called the SIMPLE LINEAR REGRESSION MODEL (SLRM).

Assumptions Underlying the SLRM:

1. The values of the independent variable X may be either "fixed" or random.
2. The X's are measured without error.
3. The Y-values are statistically independent.
4. For each value of X, there is a subpopulation of Y values that is normally distributed.
5. The variances of the subpopulations of Y are all equal to σ².
6. The means of the subpopulations of Y all lie on the same straight line.

PARAMETERS OF THE MODEL:

β0 = regression constant
β1 = regression coefficient
σ² = common population variance


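Result from Statistical Theory:

Estimators of the parameters based on an SRS of size n:

1. $\hat{\beta}_1 = b_1 = \dfrac{S_{xy}}{S_x^2}$
2. $\hat{\beta}_0 = b_0 = \bar{Y} - b_1 \bar{X}$
3. $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}$

where $S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$ and $S_x^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2$.

Predicting Equation: $\hat{Y}_i = b_0 + b_1 X_i$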
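The three estimators and the predicting equation translate directly into Python. This is a minimal sketch on hypothetical data (not from the module); the function names are illustrative, and SPSS's Linear Regression procedure is what the module itself uses.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates b0, b1 and sigma-hat^2 = SSE / (n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))   # S_xy
    s_xx = sum((a - x_bar) ** 2 for a in x)                          # S_x^2
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))        # sum of squared residuals
    return b0, b1, sse / (n - 2)

def predict(b0, b1, x_new):
    """Predicting equation: Y-hat = b0 + b1 * X."""
    return b0 + b1 * x_new

x = [1, 2, 3, 4, 5]     # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1, s2 = fit_simple_regression(x, y)
print("b0 =", round(b0, 4), " b1 =", round(b1, 4), " sigma-hat^2 =", round(s2, 4))
print("Y-hat at X = 6:", round(predict(b0, b1, 6), 4))
```

Here b1 = 6/10 = 0.6 and b0 = 4 − 0.6(3) = 2.2, matching the hand formulas above.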

Evaluation of the Simple Regression Equation

An overall measure of the adequacy of the equation is provided by the coefficient of determination, denoted by r². It is defined as

$$r^2 = \frac{S_{xy}^2}{S_x^2\,S_y^2} = \frac{b_1 S_{xy}}{S_y^2} = \frac{SSR}{SST}$$

where $S_y^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SST$ and $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$.

r² gives the proportion of the total variation in Y that is accounted for by the independent variable X. It ranges from 0 to 1 (0% to 100%).

The nearer its value is to 1, the better the regression line fits the data.

Note: If the model is not significant, do not use the equation for prediction, because the relationship might not be linear.
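The three expressions for r² are algebraically identical; a quick numerical check on hypothetical data (not from the module) is sketched below. The function name is illustrative.

```python
def r_squared_three_ways(x, y):
    """Return r^2 computed as Sxy^2/(Sx^2 Sy^2), b1*Sxy/Sy^2 and SSR/SST."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)            # S_y^2, which equals SST
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * a - y_bar) ** 2 for a in x)   # variation explained by the line
    return s_xy ** 2 / (s_xx * s_yy), b1 * s_xy / s_yy, ssr / s_yy

print(r_squared_three_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

For these data all three values equal 0.6 (up to floating-point rounding), i.e. the line accounts for about 60% of the variation in Y.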


Exercise 13. Test of Hypothesis:


Regression Analysis

Name: ___________________________ Score: __________

Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.
A young economist wants to verify whether wage is related to the educational background of an individual. He interviewed 20 randomly chosen individuals and obtained the following results:

Observation No.   No. of Years in School   Monthly Wage (P)
1                 0                        300
2                 3                        400
3                 6                        600
4                 10                       800
5                 1                        400
6                 11                       950
7                 11                       950
8                 7                        650
9                 14                       1000
10                2                        450
11                15                       1600
12                10                       900
13                17                       2000
14                8                        700
15                14                       1250
16                17                       2500
17                10                       850
18                13                       1200
19                9                        600
20                14                       1500

a. Identify the independent variable: _________________________
b. Identify the dependent variable: ___________________________
c. Plot a scatter diagram.
d. Find the equation of the regression line and interpret the result.
e. Draw the fitted regression line on the scatter diagram.
f. Compute the coefficient of determination and interpret it.
g. Estimate the monthly wage when the number of years in school is 15.
h. Test the significance of β1 at α = 5%.


IBM-SPSS Statistics version 20 Training Module

Team Leader: MARITESS D. VILLANUEVA, MAT (Mathematics), MS Statistics

Technical Assistant: CLEO S. VILLANUEVA, MIT

Program Assistants: DIEGO MALONES, Ed. D.
FERDINAND D. ANABO, MBA
MICHELLE BACAS, MBA
MALOU BASQUEZ, MAT (Math)
ALVIN TENORIO, MAT (Math)
KRIS D. BALTERO, M. Chem
PET ROANA B. BATACANDOLO
JOHN KENETH ADA
