Basic Statistics (Data Management & Statistical Analysis)

GSO & GS PROGRAMS
CapSU Pontevedra Research & Extension Services

August 2014 Bailan, Pontevedra, Capiz
DATA MANAGEMENT
&
Statistical Analysis
for Social Science Researches
using
IBM-SPSS Statistics
ver. 20
September 8-9, 2014

Capiz State University, Main Campus
Roxas City
Secretariat: Graduate School Office & Graduate School Programs

Capiz State University-Pontevedra Research & Extension Services
Bailan, Pontevedra, Capiz
capsupontgs@gmail.com
Tel.No. (036)-634-0474
TABLE OF CONTENTS
Data Management & Statistical Analysis using IBM-SPSS Statistics
by Maritess D. Villanueva
Content

CapSU Vision, Mission, and Goals
RDE Vision, Mission, and Goals
Rationale
Objectives
Module 1 – Basic Concepts in Statistics
   Categories of Statistics
   Statistical Software
Module 2 – IBM SPSS: An Introduction
   How to Run SPSS
   SPSS Interface
   SPSS Windows
   Data Editor Window
   Output Window
   Syntax Editor Window
Module 3 – Entering, Saving and Opening SPSS Data
   Reading an Excel File
   Data Recoding
   Exercise 1
Module 4 – Generating Descriptive Statistics
   Exercise 2
Module 5 – Generating Frequency Tables and Graphs
   Exercise 3
Module 6 – Detecting Data Outliers
   Exercise 4
Module 7 – Inferential Statistics: A Review
   Steps in Testing a Statistical Hypothesis
   One-tailed or Two-tailed?
   Parametric or Non-parametric?
   Exercise 5
Module 8 – Testing Assumptions
   Test for Homogeneity of Variances
   Test for Randomness
   Test for Normality
   Exercise 6
Module 9 – Test on a Single Population
   Parametric: Z-test
   Nonparametric: Binomial Test
   Exercise 7
Module 10 – Case of Two Population Means – Related Samples
   Parametric: t-test
   Nonparametric: Wilcoxon Signed-Rank Test
   Exercise 8
Module 11 – Case of Two Population Means – Independent Samples
   Parametric: t-test or Z-test
   Nonparametric: Mann-Whitney U Test
   Exercise 9
Module 12 – Case of Two or More Population Means – One-way Classification
   Parametric: F-test (ANOVA)
   Nonparametric: Kruskal-Wallis H Test
   Exercise 10
Module 13 – Case of Two or More Population Means – Two-way Classification
   Parametric: F-test (ANOVA)
   Nonparametric: Friedman’s Test
   Exercise 11
Module 14 – Using SPSS to Find Simple Random Samples
Module 15 – Measures of Correlations and Relationships
   Parametric: Pearson Product Moment Correlation Coefficient
   Nonparametric: Spearman Rank Correlation Coefficient
   Chi-square Test (categorical data)
   Exercise 12
Module 16 – Simple Linear Regression
   Multiple Linear Regression
   Exercise 13
CAPIZ STATE UNIVERSITY
VISION
Center of Academic Excellence Delivering Quality Service to All.
MISSION
Capiz State University is committed to provide advanced knowledge and innovation; develop skills, talents and values; undertake relevant research, development and extension services; promote entrepreneurship and environmental consciousness and enhance industry collaboration and linkages with partner agencies.
GOALS
• Globally competitive graduates.
• Institutionalized research culture.
• Responsive and sustainable extension services.
• Maximized profit of viable agro-industrial business ventures.
• Effective and efficient administration.

VMG of RDE
VISION
CapSU as a credible and recognized leader in the pursuit of RDE activities in the Visayas Region.
MISSION
The University, through its RDE activities, shall generate and extend quality technical information, products and services in various disciplines using appropriate approaches for sustained agro-industrial development to improve the quality of life.
GOALS
To actively support a sustainable agro-industrialization and balanced socio-economic growth through technology generation and commercialization, continued capability building, communication advocacy on market-driven innovations, and partnership with key sectors of development.

Training Rationale
Capiz State University at Pontevedra, through the Graduate School Office and Graduate School Programs and in collaboration with the University’s Extension Services, will adopt, as part of its research development, initiatives supporting faculty research as well as thesis and dissertation assistance for graduate students. One such initiative that has proven to be popular, participatory, and efficient is the use of a statistical package, specifically IBM-SPSS Statistics version 20. This updated technology was introduced in June 2014 during the training course on Research Design, Statistical Data Analysis and Interpretation for Researchers in Forestry, Environment and Natural Resources given by PCAARRD-DOST.

In this university, IBM-SPSS Statistics version 20 will be introduced in different departments and colleges and will be applied in social science research.

IBM-SPSS version 20 will be used as a tool for data management and data analysis in solving research problems in various fields such as education, economics, management, health, marketing and others.

In this connection, a training workshop on Data Management and Statistical Analysis for Social Science Researches using IBM-SPSS Statistics version 20 is designed for faculty researchers and graduate students. Interested research enthusiasts may also participate in the training.

Training Objectives
The training-workshop is conducted to enable the participants to:
1. become recipients of technology transfer;
2. appreciate the importance of the IBM-SPSS Statistics ver. 20 computer program and its relevance to the conduct of their research;
3. use the program to analyze their own research data or data related to their field of specialization;
4. apply improved skills in research;
5. for graduate student participants, become more involved in the statistical analysis of their theses/dissertations; and
6. for faculty participants, identify staff for future research who will be in charge of data banking, which includes the collection, organization, presentation, analysis and interpretation of data for each department of each college.

Module 1
Basic Concepts and Categories of Statistics
& Statistical Packages/Software
Learning Objectives
At the end of this module, the participants should be able to:
discuss statistical concepts;
identify different types of variables and classify data according to level of measurement; and
describe the different statistical packages.
Statistics
(Singular sense) a science which deals with the collection, organization, presentation, analysis, and interpretation of data
a study of variation
(Plural sense) actual numbers derived from the data
a collection of facts and figures
processed data (e.g., population statistics, statistics on births, statistics on enrollment)
Data → facts and figures
Types of data
Primary data – acquired directly from the source
Ex: data obtained by measuring wt. of 500 one-day old chicks from
Farm XYZ
Secondary data – non-primary data, i.e., data obtained from an existing source rather than directly from the source
Ex: Phil. Rice Production (tons/ha) data by province from 1990-2014
taken from publications of the Phil. Bureau of Agricultural Statistics
Categories of Statistics
Descriptive statistics – methods of organizing, summarizing and presenting data, and their interpretation.
Inferential statistics – concerned with making generalizations about a larger
set of data where only a part is examined.
[Figure: Scope of Statistics – Descriptive statistics, Inferential statistics, Probability and Sampling]
Role of Statistics
A tool for data analysis (e.g., standard drug vs. new drug: which is more effective?)
Opinion poll survey (Do you think the Philippines is ready for ASEAN integration in 2015?)
Some Basic Terms:
Universe – set of all entities or individuals under consideration/
subject of the study.
2 types:
Finite – when the elements of the universe can be counted for a
given time period
Infinite – when the number of elements of the universe is
unlimited
Variable – characteristic of interest, measurable or observable on each and every individual of the universe.
Types of variables:
   Qualitative
   Quantitative – Discrete or Continuous
Population – set of all possible values of the variable
Sample – subset of the universe or the population
Distribution – pattern of variation of a variable
The Variables and Levels of Measurement
The measurement of a variable determines the amount of information that can be processed to answer the research objectives of a study. The scale of measurement of the variable determines the algebraic operations that can be performed and the statistical tools that can be applied to analyze the data. There are four scales or levels of measurement:
Nominal
data collected are simply labels or names or categories without any
implicit or explicit ordering of the labels.
observations with the same label belong to the same category
lowest level of measurement
frequencies or counts of observations belonging to the same
category can be obtained.
Example 1.
Variable Possible data values
1. Sex Male, Female
2. Hair Color Black, Brown, Reddish Brown…
3. Cellphone network Smart, Talk n Text, Globe, Sun cellular…
Ordinal
data collected are labels or classes with an implied ordering in
these labels;
the difference between two labels cannot be quantified;
a level of measure higher than nominal;
only ordering or ranking can be done on the data;
Example 2.
1. Military rank Sergeant, Lieutenant, Captain, General
2. Job Position President, Vice-President, Manager
3. Sibling Rank 1st, 2nd, 3rd, 4th, 5th, …
Interval
data collected can be ordered or ranked, added and subtracted, but not multiplied or divided;
differences between any two data values can be determined;
the unit of measurement is constant (but arbitrary), and the zero
point is arbitrary;
a level of measurement higher than ordinal
Example 3.
1. Baking temperature 172°C to 178°C
2. Intelligence Quotient (IQ) 80 to 140
3. Grades 1.0, 1.25, 1.5, …
Ratio
data collected have all the properties of the interval scale and, in addition, can be multiplied and divided;
has a true zero point;
is the highest level of measurement.
Example 4.
1. Height 4’ to 7’
2. Width 0” to 5”
3. Weight 20 g to 50 kg
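To make the link between level of measurement and permissible operations concrete, here is a minimal Python sketch (an illustration only, not an SPSS feature) that returns just the summaries each level allows. The function name `summarize` and the level-to-statistic mapping are assumptions drawn from the descriptions above; ordinal data are assumed to be coded as ranks (e.g., 1st = 1).

```python
import statistics

# Levels from lowest to highest; each level allows everything the lower ones do.
LEVELS = ("nominal", "ordinal", "interval", "ratio")


def summarize(values, level):
    """Return only the summaries that are meaningful for the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    rank = LEVELS.index(level)
    # Counting is valid at every level, so the mode is always allowed.
    summary = {"mode": statistics.mode(values)}
    if rank >= 1:
        # Ordering is allowed; median_low always returns an observed value,
        # so it also suits ranks (ordinal values assumed coded as numbers).
        summary["median"] = statistics.median_low(values)
    if rank >= 2:
        # Constant units make differences, and hence the mean, meaningful.
        summary["mean"] = statistics.mean(values)
    if rank >= 3:
        # A true zero makes ratios meaningful, e.g. the coefficient of variation.
        summary["cv"] = statistics.stdev(values) / statistics.mean(values)
    return summary
```

For example, a list of cellphone networks (nominal) gets only a mode, while weights (ratio) also get a mean and a coefficient of variation, which needs a true zero point.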
Statistical Software
is a specialized computer program used for data management and
statistical analysis
Statistical Packages
CS Pro (Census and Survey Processing System)
SAS (Statistical Analysis System)
Stata
Minitab
R
STAR (Statistical Tool for Agricultural Research)
IRRISTAT
CROPSTAT
ITSM 2000
E-Views
SPSS (Statistical Package for the Social Sciences)
CS Pro
a software package for editing, tabulating, and disseminating data
from censuses and surveys
public-domain software
Advantages:
Can improve the data management and analysis of large scale
surveys
Can be downloaded without any cost (free)
Can run on a computer with very basic specifications
Disadvantages:
Too many files are being generated
Only a single user can access and write to a file at any given
time
Modifying an item affects the existing file
SAS
proprietary software that enables users to implement data management, statistical analysis, data mining, forecasting, etc.
a popular statistical software for medical research and the pharmaceutical industry
Advantages:
Powerful, especially in implementing analysis of experimental designs and analysis of variance (ANOVA)
Has a wide range of statistical procedures
Disadvantages:
Difficult to learn
Expensive
Requires an annual license
Recently launched a free SAS version for professors and students called SAS University Edition (www.sas.com)
Stata
proprietary software widely used in the fields of economics, sociology and medicine.
executes data management and transformation, parameter estimation, graphics, statistical computations and other related mathematical calculations.
its capabilities include time-series analysis, statistics and graphics.
Minitab
a statistical software package originally intended for teaching
statistics.
Suitable for moderate-size datasets
Advantages:
Easy to learn and easy to use
Impressive quality of graphs
Cheaper compared to SAS and SPSS
Requires less disk space
Disadvantages:
Poor compatibility with other statistical programs
Less efficient for complex procedures
R
A free programming language based on the S programming language
A software environment for statistical computing and graphics
Advantages:
Freely available online
Has powerful and customizable graphics
Can be integrated with other statistical packages
Runs on various operating systems such as Windows, Linux and Mac
Disadvantages:
Difficult to learn
More complicated to learn compared to SAS or Stata
ITSM 2000
permits easy execution of data processing, graphical display, estimation, and diagnostic testing for univariate and multivariate time series models in the time and frequency domains
provides easy-to-use estimation and forecasting tools for spectral analysis
particularly, the dynamic graphics allow the user to instantly see the
effect of data transformations and model changes on a wide variety of
features such as the sample, residual, and model autocorrelation
functions and spectra.
E-Views
offers an extensive array of powerful features for data handling,
statistics and econometric analysis, forecasting and simulation, data
presentation, and programming.
IRRISTAT
a set of microcomputer programs designed to assist agricultural
researchers in developing experimental lay-outs and undertaking plot
sampling, data collection, data and file management, statistical
analysis of data and presentation of results
STAR
a freeware developed specifically by Biometrics and Breeding Informatics, Plant Breeding, Genetics and Biotechnology Division, International Rice Research Institute (IRRI)
a computer program for data management and basic statistical
analysis of experimental data.
SPSS (Statistical Package for the Social Sciences)
one of the most widely used programs for statistical analysis in the social sciences
Advantages:
User-friendly interface
Wide array of statistical procedures
Disadvantages:
Expensive
License is time-limited
Graphics are less impressive

Module 2
IBM-SPSS Introduction
Learning Objectives
At the end of this module, the participants should be able to:
run IBM-SPSS; and
become familiar with the IBM-SPSS interface and windows.
Quick Facts about SPSS
It was developed by Norman H. Nie, C. Hadlai “Tex” Hull, and Dale H. Bent in the 1960s.
In the 1980s, a version of the software was released for personal computers.
In 2008, SPSS was renamed Predictive Analytics Software (PASW).
A year later, SPSS Inc. was acquired by IBM, and the software was subsequently renamed IBM SPSS Statistics.
Statistical analyses and procedures we can do with SPSS:
Calculate descriptive statistics
Compute frequencies
Compare means
Perform tests of association and independence
Create different graphs and charts
Run correlation and regression
Conduct analysis of variance (ANOVA) and many other statistical procedures
How to Run SPSS
Option 1
Option 2
Option 3
Option 4
1. Press the Windows key on your keyboard.
2. The screen will display different icons.
3. Move the scroll bar to the end.
4. Click the IBM-SPSS Statistics 20 icon displayed on the screen.
SPSS Interface

SPSS Windows
SPSS has three main windows:
1. Data Editor Window – where you enter the data. It has two views:
Data View – a spreadsheet-like interface where you enter the data. This is the default view when SPSS opens.
Variable View – where you define your variables.
2. Output Window – where the results are displayed.
3. Syntax Editor Window – used to write, run and store SPSS commands.
Module 3
Entering, Saving and Opening SPSS data
Learning Objectives
At the end of this module, the participants should be able to:
encode and save information in the SPSS Data Editor;
read (import) information from an MS Excel file;
transform/recode data into the same variables;
compute another variable from the existing data; and
transform/recode data into different variables.
Entering SPSS data
Step 1 – Define the variable names. Click the Variable View tab at the bottom of the Data Editor window.

Table 1. Socio-Economic Characteristics of 10 Domestic Helpers (DH) Interviewed in Quezon City.

DH   Place of Origin   Age   Number of Siblings
1    Barrio            34    5
2    Barrio            16    3
3    Barrio            20    2
4    Town              23    8
5    Town              18    4
6    Barrio            17    4
7    Barrio            37    3
8    Barrio            25    2
9    City              31    4
10   City              42    1
Source: Laboratory Manual in Statistics 1 by Habacon, L.T. et al.

In the first row of the Name column, type origin and press ENTER. In the second row, type age and press ENTER. In the third row, type num_sib and press ENTER.
New variables are automatically given a Numeric data type. Since origin holds text (Barrio, Town, City), change its Type to String.
Note: A variable name must start with a letter and must not contain spaces.
Type – the type of variable (e.g., Numeric, String)
Width – number of characters or numerical digits you will be able to enter for a particular variable
Decimals – desired number of decimal places
Label – full name or description of the variable
Values – used to assign labels to coded values, e.g. 1 – Male, 2 – Female
Missing – allows you to specify which values are to be treated as missing
Columns – determines the width of the column display
Align – alignment of data in the column
Measure – level of measurement of the data/variable (Scale, Ordinal or Nominal)
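The Variable View columns above can be pictured as the fields of one record per variable. The Python sketch below is a hypothetical model for illustration only (the class `VariableDef` and its defaults are assumptions, not SPSS internals); it checks only the naming rule stated in this manual, while SPSS enforces further rules (e.g., reserved keywords) that are not checked here.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDef:
    """Hypothetical model of one row in SPSS Variable View (simplified)."""
    name: str
    type: str = "Numeric"   # new variables start as Numeric
    width: int = 8          # illustrative default
    decimals: int = 2       # illustrative default
    label: str = ""         # full name of the variable
    values: dict = field(default_factory=dict)  # code -> value label
    missing: tuple = ()     # codes to be treated as missing
    measure: str = "Scale"  # Scale, Ordinal or Nominal (illustrative default)

    def __post_init__(self):
        # Only the manual's rule: start with a letter and contain no spaces.
        if not self.name or not self.name[0].isalpha() or " " in self.name:
            raise ValueError(f"invalid variable name: {self.name!r}")
        if self.measure not in ("Scale", "Ordinal", "Nominal"):
            raise ValueError(f"invalid measure: {self.measure!r}")

    def label_of(self, code):
        """Return the value label for a code, or the code itself if unlabeled."""
        return self.values.get(code, code)

    def is_missing(self, code):
        """True if the code was declared as a missing value."""
        return code in self.missing


# The three Table 1 variables, plus a labeled example like 1 - Male, 2 - Female.
origin = VariableDef("origin", type="String", measure="Nominal")
age = VariableDef("age", decimals=0)
num_sib = VariableDef("num_sib", decimals=0, label="Number of Siblings")
sex = VariableDef("sex", values={1: "Male", 2: "Female"}, missing=(9,), measure="Nominal")
```

A name such as "num sib" (with a space) or "1age" (starting with a digit) is rejected, mirroring the note on variable names.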
Step 2 – Enter the corresponding data. Click the Data View tab.
(Notice that the names entered in the Variable View are now the headings of the first three columns in the Data View.)
Begin entering data in the first row starting at the first column.
Move the cursor to the second row of the first column to add the next
subject’s data.
Saving SPSS data
Click File > Save.

Reading/Opening an Excel File
SPSS is capable of reading Excel files.
To demonstrate, open in SPSS the Excel file Exercise 1.xlsx located at Desktop > SPSS Training > Data sets.
Close the file in Excel before opening it in SPSS.
Click the folder icon to open a data document.
Another way is to click File > Open > Data.

Check Read variable names from the first row of data.
Select the worksheet (number/name) where you typed your data.
Sample output of reading an Excel file using SPSS
Save your work as Exercise1a.sav.

Data Recoding (Into Same Variable)
From the previously saved Exercise1a.sav,
1. Transform Place of Origin into the same variable:
Place of Origin:   Barrio → 1
                   City → 2
                   Town → 3
Click Transform > Recode into Same Variables.

Highlight Place of Origin, move it to the String Variables box and click Old and New Values.
Put Barrio in Old Value and 1 in New Value, then click Add.
Put City in Old Value and 2 in New Value, then click Add.
Put Town in Old Value and 3 in New Value, then click Add.
Then click Continue.
Click OK.

Note that the entries for the variable Place of Origin were replaced by the codes 1, 2 and 3.
To label the codes properly, click Variable View and go to Values for Place of Origin:
type 1 in Value and Barrio in Label, then click Add;
2 in Value and City in Label, then click Add;
3 in Value and Town in Label, then click Add.
Click OK.
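The effect of Recode into Same Variables, followed by value labeling, can be sketched in a few lines of Python using the Place of Origin column of Table 1. This is an illustration outside SPSS; the names `CODES`, `VALUE_LABELS` and `recode_same` are hypothetical. Values with no rule are left as they were, which is assumed here to match how SPSS treats unspecified values in this dialog.

```python
# Place of Origin for the 10 domestic helpers in Table 1.
origin = ["Barrio", "Barrio", "Barrio", "Town", "Town",
          "Barrio", "Barrio", "Barrio", "City", "City"]

# Old value -> new value, as entered in the Old and New Values dialog.
CODES = {"Barrio": 1, "City": 2, "Town": 3}
# Value labels for the Values column in Variable View (code -> label).
VALUE_LABELS = {code: name for name, code in CODES.items()}


def recode_same(values, mapping):
    """Overwrite each value using mapping; values with no rule are unchanged."""
    return [mapping.get(v, v) for v in values]


origin = recode_same(origin, CODES)  # the original labels are replaced
print(origin)                        # codes 1, 2, 3 in place of the names
print([VALUE_LABELS[c] for c in origin[:3]])
```

Because the same list is overwritten, the original text labels survive only through the value labels, which is why labeling the codes matters.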

Data Recoding (Into Different Variable)
From the previously saved Exercise1a.sav,
2. Transform Age into a different variable:
Age:   Below 25 years old → 1
       25 – 35 years old → 2
       Above 35 years old → 3
Click Transform > Recode into Different Variables.

Highlight Age, move it to the Numeric Variable -> Output Variable box, and type an Output Variable Name and Label. Click Change and then Old and New Values.

Click Continue.
Click OK.

Note that another column was created for age_recoded.
Do not forget to label the codes properly.
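Recoding into a different variable keeps the original column and adds a new one. A minimal Python sketch of the age grouping above, using the ages in Table 1 (the function `age_group` is a hypothetical name; ages are assumed to be whole years, with 25 and 35 both falling in group 2 as in "25 – 35 years old"):

```python
# Ages of the 10 domestic helpers in Table 1.
age = [34, 16, 20, 23, 18, 17, 37, 25, 31, 42]


def age_group(a):
    """Below 25 -> 1, 25 to 35 -> 2, above 35 -> 3; a missing age stays missing."""
    if a is None:
        return None
    if a < 25:
        return 1
    if a <= 35:
        return 2
    return 3


# A new variable is created; the original age list is left untouched.
age_recoded = [age_group(a) for a in age]
AGE_LABELS = {1: "Below 25 years old", 2: "25 - 35 years old", 3: "Above 35 years old"}

print(list(zip(age, age_recoded)))
```

Keeping both `age` and `age_recoded` lets the exact ages still be used for descriptive statistics later.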
Compute Variable
To create another variable (annual salary) based on the existing variable monthly salary, click Transform > Compute Variable.

Type a name for the Target Variable, say Annual_salary, then enter in the Numeric Expression box: 12 * Monthly_Salary
Note that another column was created for Annual_salary.
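Compute Variable applies the numeric expression case by case. The Python sketch below uses hypothetical monthly salaries (the manual’s salary data are not reproduced here) and the hypothetical helper `compute_annual`; a missing input gives a missing result, much as SPSS returns system-missing when an operand is missing.

```python
# Hypothetical monthly salaries (PhP); None marks a missing value.
monthly_salary = [1500, 1200, None, 2000]


def compute_annual(monthly):
    """Annual_salary = 12 * Monthly_Salary, computed for each case."""
    return [None if m is None else 12 * m for m in monthly]


annual_salary = compute_annual(monthly_salary)
print(annual_salary)
```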

Exercise 1. Entering, Saving, Recoding and Computing SPSS Data
Consider the data in Table 1.
A. Transform the following variable into the same variable:
Previous Occupation:   None → 0
                       Agriculture → 1
                       Factory Worker → 2
                       Saleslady → 3
Save your work as Exercise1b.sav.
B. Recode the following variables into different variables:
Number of Siblings:   less than 5 → 1
                      5 and more → 2
                      Above 35 years old → 3
Employer’s Monthly HH Income (PhP):   50,000 & below → 1
                                      50,001 – 75,000 → 2
Annual Salary (PhP):   15,000 & below → 1
                       15,001 – 20,000 → 2
                       Above 20,000 → 3
Save your work as Exercise1c.sav.
Module 4
Generating Descriptive Statistics
Learning Objectives
At the end of this module, the participants should be able to:
generate statistics for averages or measures of central tendency;
produce statistics for measures of variability or dispersion; and
check the distribution of the data according to its skewness and peakedness (kurtosis).
Consider the data on the screening exam scores of 20 freshman applicants each from a Science High School and a Rural High School.
Open the data file Module 4 (screening exam scores).
Location of folder: Desktop > SPSS Training > Data sets > Module 4 (screening exam scores)

From the menu bar, select Analyze > Descriptive Statistics > Frequencies.

Output

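Interpretation
Measures of Central Tendency
Mean – the average value (the sum of the values divided by their number)
Median – the middle value of the data set when it is arranged in ascending or descending order (the average of the two middle values when the number of values is even)
Mode – the most frequently occurring value(s) in the data set (when several values tie, SPSS displays the smallest and notes that multiple modes exist)
Measures of Location
Minimum – the smallest value observed in the data
Maximum – the largest value observed in the data
Measures of Dispersion
Standard deviation – the square root of the variance; a measure of how widely the data points spread about the mean value
Variance – the average squared difference of the data points from the mean value (for sample data, SPSS divides the sum of squared differences by n − 1)
Range – the simplest measure of variation, computed as the difference between the highest and lowest values of the data set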
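The measures defined above can be cross-checked by hand with Python’s standard `statistics` module. Since the screening exam file is not reproduced in this manual, the sketch uses the ages in Table 1; the function name `describe` is hypothetical, and the smallest-mode rule is included only to mirror the SPSS behavior noted above.

```python
import statistics

# Ages of the 10 domestic helpers in Table 1.
age = [34, 16, 20, 23, 18, 17, 37, 25, 31, 42]


def describe(data):
    """Descriptive statistics as defined above, using sample (n - 1) formulas."""
    modes = statistics.multimode(data)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": min(modes),                      # smallest value when modes tie
        "minimum": min(data),
        "maximum": max(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),   # divides by n - 1
        "std_deviation": statistics.stdev(data), # square root of the variance
    }


for name, value in describe(age).items():
    print(f"{name:>14}: {value:.2f}")
```

For these ages the mean is 26.3 years, the median is 24 years (the average of 23 and 25), and the range is 26 years; every age occurs once, so the reported mode is the smallest value, 16.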

Exercise 2. Generating Descriptive Statistics
Consider the data in Table 1. Socio-Economic Characteristics of 35 Domestic Helpers Interviewed in Quezon City (see Exercise 1).
TO DO:
Open your recently saved SPSS data: Desktop > SPSS Training > Data Sets > Exercise1c.sav
Generate the descriptive statistics of the data for the variables Age and Annual Salary (Minimum, Maximum, Range, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis).
Save your work as Exercise2.spv.

Module 5
Generating Frequency Tables and Graphs
Learning Objectives
At the end of this module, the participants should be able to:
become familiar with different methods of data presentation;
organize data by constructing a frequency distribution table; and
implement the most appropriate method of data presentation for a given set of data.
Frequency Table – a table that lists the number of occurrences of each value in the data
Consider your recently saved SPSS data: Desktop > SPSS Training >
Data Sets > Exercise1c.sav
Place of Origin)
at the center.

Expected Output

Place of Origin
                Frequency   Percent   Valid Percent   Cumulative Percent
Valid  Barrio          18      51.4            51.4                 51.4
       Town             7      20.0            20.0                 71.4
       City            10      28.6            28.6                100.0
       Total           35     100.0           100.0
Sample Interpretation
About 51.4% of the DH respondents are from a barrio.
More than half (51.4%) of the DH respondents came from a barrio.
About five in every 10 DH respondents originated from a barrio.
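The columns of the frequency table follow simple arithmetic, sketched below in Python. The 35 responses are rebuilt from the counts in the expected output (18 Barrio, 7 Town, 10 City); the function `frequency_table` is a hypothetical helper, and its treatment of missing values (`None`) is an assumption modeled on how SPSS separates Percent from Valid Percent.

```python
def frequency_table(values, categories):
    """Rows of (category, frequency, percent, valid percent, cumulative percent).

    Percent uses all cases; valid percent and cumulative percent exclude
    missing values (None).
    """
    valid = [v for v in values if v is not None]
    rows, running = [], 0
    for cat in categories:
        f = valid.count(cat)
        running += f
        rows.append((cat, f,
                     round(100 * f / len(values), 1),
                     round(100 * f / len(valid), 1),
                     round(100 * running / len(valid), 1)))
    return rows


# Responses rebuilt from the counts in the expected output above.
origin = ["Barrio"] * 18 + ["Town"] * 7 + ["City"] * 10
for row in frequency_table(origin, ["Barrio", "Town", "City"]):
    print("{:<8}{:>5}{:>8}{:>8}{:>8}".format(*row))
```

The cumulative percent is computed from the running count rather than by adding rounded percents, so the last row always reaches 100.0.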
Generating Graphs
Charts or graphs are visual representations of the data, such as:
Pie Charts
Bar Charts
Consider your recently saved SPSS data: Desktop > SPSS Training >
Data Sets > Exercise1c.sav
Click Graphs > Chart Builder

Move the Place of Origin variable to the x-axis. Click OK to create the chart.
Output
To create a pie chart, consider the same data on Place of Origin.
Click Graphs > Legacy Dialogs > Pie.

A dialog box will appear. Click Define.
Select the variable Place of Origin by placing it in the Define Slices by box, then click OK.
Generated Pie Chart (double-click the chart to open the Chart Editor and enhance it further)

Exercise 3. Generating Frequency Tables and Graphs
Consider the data in Table 1. Socio-Economic Characteristics of 35 Domestic Helpers Interviewed in Quezon City (see Exercise 1).
TO DO:
Open your recently saved SPSS data: Desktop > SPSS Training > Data Sets > Exercise1c.sav
Generate frequency tables for Previous Occupation and Number of Siblings (recoded).
Generate a pie chart for Previous Occupation and a bar graph for Number of Siblings (recoded).

Module 6
Detecting Data Outliers
Learning Objectives
At the end of this module, the participants should be able to:
detect data outliers using a histogram; and
identify data outliers using a box-and-whiskers plot.
Use the SPSS data file Module 6 (quiz scores).sav to determine whether there are outliers in the data.
Click Analyze > Descriptive Statistics > Explore.
Move the variable(s) to the Dependent List, then click Statistics.
In the separate dialog box, tick “Outliers” and “Percentiles”, then click Continue.

Click Plots.
Click OK.

Expected Output
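In the boxplots produced by Explore, SPSS marks points more than 1.5 box lengths beyond the edge of the box as outliers (o) and points more than 3 box lengths beyond as extremes (*). The Python sketch below applies that fence rule to hypothetical quiz scores (the Module 6 data are not reproduced here). The function names are hypothetical, and computing the box edges with Tukey’s hinges is an assumption; other quartile definitions can shift the fences slightly.

```python
from statistics import median


def tukey_hinges(data):
    """Medians of the lower and upper halves (the middle value is
    included in both halves when the number of values is odd)."""
    s = sorted(data)
    half = (len(s) + 1) // 2
    return median(s[:half]), median(s[len(s) - half:])


def flag_outliers(data):
    """Label points beyond 1.5 box lengths as 'outlier', beyond 3 as 'extreme'."""
    q1, q3 = tukey_hinges(data)
    box = q3 - q1  # box length (spread of the middle half of the data)
    flagged = []
    for x in sorted(data):
        if x < q1 - 3 * box or x > q3 + 3 * box:
            flagged.append((x, "extreme"))
        elif x < q1 - 1.5 * box or x > q3 + 1.5 * box:
            flagged.append((x, "outlier"))
    return flagged


quiz = [12, 14, 15, 15, 16, 17, 18, 18, 19, 25, 40]  # hypothetical quiz scores
print(flag_outliers(quiz))
```

Here the hinges are 15 and 18.5, so the box length is 3.5: a score of 25 lies beyond the 1.5-box fence (23.75) and is an outlier, while 40 lies beyond the 3-box fence (29) and is an extreme value.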

Exercise 4. Detecting Outliers in the Data
TO DO:
Open the SPSS data: Desktop > SPSS Training > Data Sets > Exercise4 (senior citizens).sav
Consider the characteristics of the 24 members of the Batong Malake Senior Citizens Association (BMSCA) who participated in their Lakbay-Aral.
Test whether there are outliers in the variables age and income using a histogram and a box-and-whiskers plot.

Module 7
Inferential Statistics: Steps in Testing a Statistical Hypothesis
Learning Objectives
At the end of this module, the participants should be able to:
formulate the null and alternative hypotheses for a given situation;
identify TYPE I and TYPE II errors and recognize the consequences of such errors;
identify one-tailed and two-tailed tests; and
recognize parametric and nonparametric tests.
REVIEW:
Inferential Statistics – concerned with estimating population parameters by means of sample statistics.
Statistical hypothesis
A conjecture about….
⇒ The value of a parameter of the population or
⇒ The distribution of the population
• Examples of Statistical Hypothesis:
• The mean height of students enrolled in Statistics is
5’2” (H: µ = 5’2”).
• The grain length of a variety of rice (IR-8) is normally
distributed (H: Y~normal)
⇒ Conclusions are stated subject to uncertainty
Null Hypothesis – the conjecture which is being tested, denoted by Ho.
- Generally, this is a statement of equality or status quo or no
difference.
Alternative Hypothesis – the complementary statement that will be accepted
in the event that the null hypothesis is rejected. It is
denoted by Ha or H1.
Example: The mean weekly allowance of CapSU students is 500 pesos.
In symbols, Ho: µ = 500 pesos
Ha: µ ≠ 500 pesos OR Ha: µ > 500 pesos OR Ha: µ < 500 pesos
Note: Only one of these three alternatives has to be specified.
Application Problem: Various consumer ‘watchdog’ organizations regularly check the mass of items being sold to ensure that advertised data match reality.
The 1 kg bags of sugar from the Citizen Kane Sugar Co. are under scrutiny, and we assume that the bags are correctly labeled, i.e., that they contain exactly 1 kg of sugar.
Solution: The mean mass for the population of 1 kg bags can be: ____________, _______________, ______________

If µ represents the average bag mass of the population, then the following
possibilities exist:
Possible value Action
_________________ _____________________________________
_________________ _____________________________________
_________________ _____________________________________
Although there are ___ possibilities for µ, 2 of them amount to the

same thing. No action will be taken against the company for _____, since
this is giving customers a value for money deal. No action will be taken for
______ since this is a fair dealing. So we combine these into _______. But
_________ will produce action!
The null hypothesis is set up formally:

It is appropriate to assume that this company is meeting its
obligations and so the null hypothesis is that there is no
disadvantage to the customer.
The alternative hypothesis is set up formally:
The company is not meeting its obligations and consumers are
being disadvantaged.
In summary: Hypothesis Action?

H0: _____________________________________
Ha: _____________________________________
The problem of Citizen Kane Sugar can be used to give a generalized picture where we
use the symbol µ0 to stand for the hypothesized mean. In the given problem, it took the
value 1 kg. The three forms of hypothesis test concerning the population mean are
                          Form 1    Form 2    Form 3
Null hypothesis           H0:       H0:       H0:
Alternative hypothesis    Ha:       Ha:       Ha:
Test of a Statistical Hypothesis
Procedure or rule for deciding whether to reject Ho on the basis of a
sample drawn from a population.
Courses of Action or Decision in Hypothesis Testing
1. Reject Ho
2. “Fail to reject” (Accept?) Ho
Consequences of Decisions Made in Hypothesis Testing

                         Ho is actually
Decision Made            TRUE                          FALSE
Reject Ho                Error in Decision (TYPE I)    Correct Decision
Fail to reject Ho        Correct Decision              Error in Decision (TYPE II)
2 Types of error: Type I error – error in rejecting a true Ho
Type II error – error in accepting a false Ho
Probability of Committing Errors
1. The probability of committing a Type I error is denoted by α, i.e.,
   α = P[Type I error] = P[reject Ho | Ho is true]
     = level of significance of a statistical test
2. The probability of committing a Type II error is denoted by β, i.e.,
   β = P[Type II error] = P[accept Ho | Ho is false]
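The statement α = P[reject Ho | Ho is true] can be checked by simulation: generate many samples from a population where Ho really holds, test each one, and count how often Ho is wrongly rejected. This Python sketch assumes a two-tailed z-test with known σ and uses hypothetical values (µ0 = 500, σ = 100, n = 30); the function name is also hypothetical.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(trials=20000, n=30, mu0=500, sigma=100, alpha=0.05, seed=1):
    """Estimate P[reject Ho | Ho is true] for a two-tailed z-test by simulation."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Ho is true: every sample comes from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1                      # each rejection is a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimate should land close to the chosen level of significance, 0.05; lowering `alpha` lowers the Type I error rate, at the cost of a larger β.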

TEST OF STATISTICAL HYPOTHESIS

Steps in Testing a Statistical Hypothesis         Parallelism to the Judicial Process (Analog)
1. State Ho and Ha.                               Presumption of innocence / accusation of guilt
2. Identify the test statistic and its
   distribution when Ho is true.                  Type of evidence
3. Specify the level of significance.            Risk of a “guilty” verdict when innocent
4. State the decision rule.                       Substantial evidence or not
5. Collect the data and perform calculations.     Collect and summarize evidence
6. Make a statistical decision.                   Verdict
7. State the conclusion.                          Sentence
Test statistic → a statistic which provides a basis for determining whether to reject Ho in favor of Ha.
Decision rule → a rule which specifies the region of values of the test statistic that leads to the rejection of Ho in favor of Ha.
Critical region → the region specified by the decision rule, i.e., the set of values of the test statistic for which Ho is rejected in favor of Ha (also called the rejection region).
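The seven steps can be traced through a small worked example. The Python sketch below applies them to the weekly allowance example (Ho: µ = 500 pesos vs. Ha: µ ≠ 500 pesos) using a one-sample z-test. The summary data (n = 36, sample mean = 540 pesos) and the known population standard deviation σ = 100 pesos are hypothetical, and the function name is an assumption.

```python
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test of Ho: mu = mu0 vs Ha: mu != mu0.

    Assumes the population standard deviation sigma is known.
    """
    # Step 2: test statistic, standard normal when Ho is true.
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Steps 3-4: level of significance and decision rule (reject if |z| > z_crit).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 6: statistical decision.
    reject = abs(z) > z_crit
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, reject


# Step 1: Ho: mu = 500 pesos vs Ha: mu != 500 pesos.
# Step 5: hypothetical data: n = 36, sample mean = 540, sigma = 100.
z, z_crit, p, reject = one_sample_z_test(540, 500, 100, 36)
# Step 7: conclusion.
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, p = {p:.4f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

With these numbers z = 2.40, which exceeds the critical value 1.96 at α = 0.05, so Ho is rejected: the data suggest that the mean weekly allowance differs from 500 pesos.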
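One-tailed or Two-tailed?
[Figure: acceptance and rejection regions under the distribution of the test statistic]
Ha: ≠ → two-tailed test; acceptance region in the middle, a rejection region of area α/2 in each tail
Ha: > → one-tailed (right-tailed) test; a rejection region of area α in the upper tail
Ha: < → one-tailed (left-tailed) test; a rejection region of area α in the lower tail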
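The three forms of Ha lead to different critical values. A sketch using the standard-library NormalDist, assuming the test statistic is a z statistic; the function names and the "two-sided"/"greater"/"less" labels are illustrative, not SPSS terms.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_critical_values(alpha, alternative):
    """Critical value(s) of a z-test for the given form of Ha.

    alternative: "two-sided" (Ha: mu != mu0), "greater" (Ha: mu > mu0)
    or "less" (Ha: mu < mu0).
    """
    if alternative == "two-sided":
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-z, z)
    if alternative == "greater":
        return (STD_NORMAL.inv_cdf(1 - alpha),)  # all of alpha in the upper tail
    if alternative == "less":
        return (STD_NORMAL.inv_cdf(alpha),)  # all of alpha in the lower tail
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


def in_rejection_region(z, alpha, alternative):
    """True if the observed statistic z falls in the critical (rejection) region."""
    crit = z_critical_values(alpha, alternative)
    if alternative == "two-sided":
        return z < crit[0] or z > crit[1]
    if alternative == "greater":
        return z > crit[0]
    return z < crit[0]


for form in ("two-sided", "greater", "less"):
    print(form, [round(c, 3) for c in z_critical_values(0.05, form)])
```

For example, z = 1.8 falls in the rejection region of the right-tailed test at α = 0.05 but not in that of the two-tailed test, because the two-tailed test splits α between both tails.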
Parametric or Nonparametric?

Exercise 5. Formulating Hypotheses and Errors in Hypothesis Testing
Name: ___________________________ Score: __________
TO DO:
A. Consider each of the following situations and indicate for each of the four
actions whether it is a CORRECT DECISION, a TYPE I error or a TYPE II
error.
Ho : A training course is effective.

1. Approve an ineffective training course. - ___________________________
2. Disapprove an ineffective training course. - ___________________________
3. Disapprove an effective training course. - ___________________________
4. Approve an effective training course. - ___________________________
A large manufacturing firm is being charged with discrimination in its hiring practices.
Ho : The firm is not guilty (innocent).
5. The jury gave an innocent verdict to the guilty firm. - ___________________________
6. The jury gave a guilty verdict to a guilty firm. - ___________________________
7. The jury gave a guilty verdict to an innocent firm. - ___________________________
8. The jury gave a “not guilty” verdict to an innocent firm. - ___________________________
B. For the given problem, formulate an appropriate null (Ho) and an appropriate alternative (Ha) hypothesis. Define any term or symbol that you use. Also, identify the situations in which Type I and Type II errors would be committed.
From past experience, it has been determined that a qualified operator of a certain machine turning out 500 items per day produces 25 or fewer defective items per day. A new operator is being hired to run the same machine, and the hypothesis is made that he is a qualified operator.
Null Hypothesis (H0): Alternative Hypothesis (Ha):
Type I error situation: Type II error situation:
Module 8
Testing Assumptions
Learning Objectives

Perform a test on homogeneity of variances
Perform a test on the randomness of data observations
Determine whether the data set follows a normal distribution
Decide whether to use parametric or nonparametric tests
TEST ON ASSUMPTIONS
In most situations, satisfying the assumptions of a parametric method ensures the validity of the results and the appropriateness of the test employed. For this reason, a number of methods have been designed to test the assumptions of parametric methods.
Example: Three sections of the same Mathematics course are taught by three instructors. The final exam scores of the students in the three sections are recorded as follows:
Section 1: 95, 32, 47, 75, 83, 84, 73, 68
Section 2: 85, 90, 79, 50, 32, 84, 78, 95, 65, 80
Section 3: 79, 92, 63, 68, 76, 20, 37, 74, 86
Is the distribution of final exam scores the same in the three sections? Test at α = 5%.
Use the SPSS data Module 8 (Math sections).sav

1. Tests on Equality of Variances
The assumption of homoskedasticity (equality of variances) is used in ANOVA techniques and regression analysis.
The assumption of homoskedasticity is necessary for some tests to be valid.
Bartlett’s test makes use of a χ2 test statistic.
It tests whether p populations have equal variances, using the samples obtained from the p populations.
Equality of variances is one of the many assumptions in the analysis of experimental data.
If this assumption does not hold, the F-tests in the analysis of variance are not valid.
Levene’s test, which SPSS reports and which is used below, is an alternative to Bartlett’s test that is less sensitive to departures from normality.
Test of Hypothesis:
1. Ho: The variances in final exam scores of the 3 sections are equal.
Ha: The variances in final exam scores of the 3 sections are not all equal.
2. TEST PROCEDURE: Homogeneity of variance test (Levene’s test)
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; Otherwise, fail to reject Ho.
5. Computations:
sig = 0.907
α = 0.05
6. DECISION: Since sig = 0.907 > α = 0.05, we fail to reject Ho.
7. CONCLUSION: At α = 5%, the variances in final exam scores of the three sections can be considered equal.
PROCEDURE: Analyze > Compare Means > Oneway ANOVA > Options > Homogeneity of Variance Test
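Levene’s statistic is a one-way ANOVA F ratio computed on the absolute deviations of each score from its own section mean. The sketch below assumes this mean-centred version (to our understanding, the one SPSS reports in One-Way ANOVA) and, because there are exactly three sections, uses the closed-form tail probability of an F(2, df2) variable; a general F distribution would need a statistics library. On the Math sections data it should give a sig close to the 0.907 reported above. The function names are illustrative.

```python
from statistics import mean

# Final exam scores of the three Mathematics sections (example above).
sections = [
    [95, 32, 47, 75, 83, 84, 73, 68],
    [85, 90, 79, 50, 32, 84, 78, 95, 65, 80],
    [79, 92, 63, 68, 76, 20, 37, 74, 86],
]


def levene_mean(groups):
    """Levene's W: a one-way ANOVA F computed on z_ij = |x_ij - mean_i|.

    Returns (W, df1, df2) with df1 = k - 1 and df2 = N - k.
    """
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(x - m) for x in g])  # absolute deviations from the group mean
    z_means = [mean(g) for g in z]
    z_grand = mean([v for g in z for v in g])
    k = len(z)
    n_total = sum(len(g) for g in z)
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2


def f_sf_df1_2(f, df2):
    """P(F > f) for F(2, df2): the closed form (1 + 2f/df2) ** (-df2/2).

    Valid ONLY for df1 = 2, i.e. exactly three groups.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


w, df1, df2 = levene_mean(sections)
p = f_sf_df1_2(w, df2)
print(f"Levene W = {w:.4f}, df = ({df1}, {df2}), sig = {p:.3f}")
```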

2. The Runs Test for Randomness
Inferential statistics will only be valid if random samples are taken from the population(s) of interest, i.e., successive observations must be independent of each other.
Tests for randomness are usually based on the sequence or order in which the observations were obtained.
Test for Randomness

1. Ho: The sequence of observations is random.
Ha: The sequence of observations is not random.
2. TEST PROCEDURE: Runs test for randomness
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; Otherwise, fail to reject
Ho.
5. Computations:
sig = 0.969
α = 0.05
6. DECISION: Since sig = 0.969 > α = 0.05, we fail to reject Ho.
7. CONCLUSION: At α = 5%, the sequence of observations is random.
PROCEDURE: Analyze > Nonparametric Tests > Legacy Dialogs > RUNS
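The runs test counts runs, i.e., maximal stretches of consecutive observations that fall on the same side of a cut point. A sketch assuming the sample median as cut point (values below it in one group, values at or above it in the other) and the large-sample normal approximation without a continuity correction; SPSS may adjust small samples differently, so its sig can differ slightly. The function name is illustrative.

```python
from statistics import NormalDist, median


def runs_test(data, cut=None):
    """Runs test for randomness about a cut point (default: the sample median).

    Returns (number of runs, Z, two-tailed sig) from the normal approximation.
    """
    if cut is None:
        cut = median(data)
    codes = [0 if x < cut else 1 for x in data]  # 0 = below cut, 1 = at/above cut
    n1 = codes.count(0)
    n2 = codes.count(1)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("all values fall on one side of the cut point")
    # A new run starts whenever the code changes from one observation to the next.
    runs = 1 + sum(1 for a, b in zip(codes, codes[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / variance ** 0.5
    sig = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, sig


# Usage, ASSUMING the Section 2 scores are listed in the order they were obtained.
print(runs_test([85, 90, 79, 50, 32, 84, 78, 95, 65, 80]))
```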
3. The One-Sample Test for Normality
The Shapiro-Wilk test (for N < 2000) or the Kolmogorov-Smirnov (K-S) test (for N > 2000) is used to determine whether the sample data came from a normal distribution.
It uses the normal distribution as the basis for deciding whether a certain distribution is normal or not.
Test of Hypothesis:
1. Ho: The distribution of data is normal.
Ha: The distribution of data is not normal.
2. TEST PROCEDURE: Shapiro-Wilk Test for Normality
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; Otherwise, fail to
reject Ho.
5. Computations:
sig = 0.438 (for section 1)
α = 0.05
6. DECISION: Since sig = 0.438 > α = 0.05, we fail to reject Ho.
7. CONCLUSION: At α = 5%, the distribution of data is normal in the three sections.
PROCEDURE: Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests.
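The Shapiro-Wilk statistic relies on tabulated coefficients and is not sketched here. The Kolmogorov-Smirnov statistic is simpler: it is the largest vertical gap between the sample’s empirical distribution function and a fitted normal CDF. The sketch computes only D, with the normal mean and standard deviation estimated from the sample; because the parameters are estimated, SPSS’s Explore output (as we understand it) attaches a Lilliefors correction to the significance, which this sketch does not compute. The function name is illustrative.

```python
from statistics import NormalDist, mean, stdev


def ks_d_normal(data):
    """Kolmogorov-Smirnov D against a normal CDF fitted to the same sample."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    d = 0.0
    for i, value in enumerate(x, start=1):
        f = fitted.cdf(value)
        # Compare F(x_i) with the empirical CDF just after (i/n) and just before ((i-1)/n) x_i.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


section1 = [95, 32, 47, 75, 83, 84, 73, 68]
print(f"D = {ks_d_normal(section1):.3f}")
```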

In instances where certain assumptions are not satisfied, appropriate transformations and adjustments to the data must be made before parametric methods (e.g., t, Z or F tests) are employed. Alternatively, the nonparametric counterpart of the appropriate parametric test may be used.
Nonparametric Statistical Tests
Also called distribution-free statistics.
No assumptions are made about the precise form of the sampled population.
Easier to apply.
Applicable to rank data.
Usable when two sets of observations come from different populations.
Often the only alternative when the sample size is small (n < 25).
Valid at the stated significance level whatever the shape of the distribution from which the sample was drawn.
Lower statistical efficiency.
NOTE: When their assumptions are satisfied, parametric statistical tests (e.g., Z, t, F tests) are more powerful than nonparametric tests.

Exercise 6. Testing Assumptions
Name: ___________________________ Score: __________
TO DO:
In the article “Shelf-Space Strategy in Retailing,” published in the Proceedings: Southern Marketing Association (1975), the effect of shelf height on the
supermarket sales of canned dog food is investigated. An experiment was
conducted at a small supermarket for a period of 8 days on the sales of a single
brand of dog food, referred to as Arf dog food, involving three levels of shelf
height: knee level, waist level, and eye level. During each day the shelf height of
the canned dog food was randomly changed on three different occasions. The
remaining sections of the gondola that housed the given brand were filled with a
mixture of dog food brands that were both familiar and unfamiliar to customers
in this particular geographic area. Sales, in hundreds of dollars, of Arf dog food
per day for the three shelf heights are as follows:
Shelf Height
Knee Level Waist Level Eye Level
77 88 85
82 94 85
86 93 87
78 90 81
81 91 80
86 94 79
77 90 87
81 87 93
Is there a significant difference in the average daily sales of this dog food
based on shelf height? Use a 0.01 level of significance.
Check the three underlying assumptions (normality, randomness and equality of variances) of the above problem.
Yes No
Are the data normally distributed? ( ) ( )
Are the sample data collected at random? ( ) ( )
Are the variances in sales for each shelf
height equal? ( ) ( )
Which family of tests do you think is more appropriate to apply?
Parametric tests
Nonparametric tests

Module 9
Test on Single Population
Learning Objectives

Decide whether to use parametric or nonparametric test for a single
population
Perform a test of hypothesis for the mean or median in one
population
Parametric Statistical Test: Z-test or t-test, Case of the Mean (µ) of a Single Population
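When σ is unknown, the test statistic for Ho: µ = µ0 is t = (x̄ − µ0)/(s/√n) with n − 1 degrees of freedom (the Z test uses a known σ in place of s). A minimal sketch with hypothetical bag weights; the critical value 2.571 is t(0.025, 5) from a standard t table, supplied by hand because Python’s standard library has no t distribution. The function name is illustrative.

```python
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """t statistic and degrees of freedom for Ho: mu = mu0."""
    n = len(data)
    se = stdev(data) / n ** 0.5  # estimated standard error of the sample mean
    return (mean(data) - mu0) / se, n - 1


# Hypothetical net weights (kg) of six bags of sugar; mu0 = 1 kg.
weights = [0.98, 1.02, 0.97, 0.99, 1.01, 0.96]
t, df = one_sample_t(weights, mu0=1.0)

# Two-tailed test at alpha = 0.05: critical value t(0.025, 5) = 2.571 from a t table.
T_CRIT = 2.571
decision = "reject Ho" if abs(t) > T_CRIT else "fail to reject Ho"
print(f"t = {t:.3f}, df = {df}: {decision}")
```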

Nonparametric counterpart: Binomial Test (based on the median/rank)
Example:
Six types of dried fishery products were tested for levels of histamine content. The histamine contents (mg/100 g sample) were as follows:
24.09 9.47 5.11 13.14 6.57 10.95
It is claimed that the median level of histamine content among the samples did not exceed the acceptable histamine level of 20 mg/100 g sample. Test the claim at the α = 0.01 level of significance.
Test of Hypothesis:
1. Ho: The median level of histamine content did not exceed 20 mg/100 g sample.
Ha: The median level of histamine content exceeds 20 mg/100 g sample.
2. TEST PROCEDURE: Binomial test
3. α = 1%
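4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:
PROCEDURE: In Data Editor, select ANALYZE > NONPARAMETRIC TESTS > LEGACY DIALOGS > BINOMIAL TEST
SPSS reports a two-tailed sig = 0.219 (1 of the 6 values lies above 20 and 5 lie below). Halving it gives 0.219/2 = 0.1095, which is the one-tailed sig only for Ha: median < 20. For the stated Ha (median exceeds 20), the sample points toward Ho, so the one-tailed sig is P(at least 1 of 6 above 20) = 63/64 ≈ 0.984.
α = 0.01
6. DECISION: Since sig > α = 0.01 under either computation, we fail to reject Ho.
7. CONCLUSION: At α = 1%, the median level of histamine content did not exceed 20 mg/100 g sample.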
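The binomial test used here is the sign test: each value is classified as above or below 20, and under Ho the count above 20 follows a Binomial(n, 0.5) distribution. The sketch computes the exact tail probabilities with math.comb so that both one-tailed p-values can be compared with the two-tailed value 0.219; values equal to the hypothesized median are dropped, and the function name is illustrative.

```python
from math import comb


def sign_test(data, m0):
    """Exact sign (binomial) test of the median against m0.

    Returns (above, below, p for Ha: median > m0, p for Ha: median < m0, two-tailed p).
    """
    above = sum(1 for x in data if x > m0)
    below = sum(1 for x in data if x < m0)
    n = above + below  # values equal to m0 are ignored

    def pmf(k):
        return comb(n, k) / 2 ** n  # Binomial(n, 0.5) probability of k values above m0

    p_greater = sum(pmf(k) for k in range(above, n + 1))  # P(X >= above)
    p_less = sum(pmf(k) for k in range(0, above + 1))  # P(X <= above)
    p_two = min(1.0, 2 * min(p_greater, p_less))
    return above, below, p_greater, p_less, p_two


histamine = [24.09, 9.47, 5.11, 13.14, 6.57, 10.95]  # mg/100 g sample
above, below, p_greater, p_less, p_two = sign_test(histamine, m0=20)
print(above, below, round(p_greater, 4), round(p_less, 4), round(p_two, 4))
```

Here p_less = 7/64 ≈ 0.109 is what halving 0.219 gives, while p_greater = 63/64 ≈ 0.984 is the one-tailed p-value for Ha: median > 20.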

Exercise 7. Test on Single Population
Name: ___________________________ Score: __________
Using SPSS, solve the following problem and perform a complete test of
statistical hypothesis.
An accountancy firm is investigating the installation of a computer system. On a test run, it obtained the following time savings on an audit of a selection of 10 major accounts (measured in hours):
74 12 35 26 34 42 30 45 8 33
At the 1% significance level, will the computer system make significant time
savings?
1) One-tailed or Two-tailed:____________________________________
2) Parametric or Nonparametric:________________________________
STEP BY STEP STATISTICAL HYPOTHESIS TESTING:
a) Ho:
_________________________________________________________
Ha:
_________________________________________________________
b) Test Procedure: ___________________________________________
c) Level of significance: ________________________________
d) Decision Rule: ____________________________________________
e) Computation:
α= _________
= _________
f) Decision:_________________________________________________
g) Conclusion: _______________________________________________

Module 10
Test of Hypothesis: Case of Two Population
Means – Related Samples
Learning Objectives

Decide whether to use parametric or nonparametric tests on two population means, case of paired or related samples.
Perform a statistical test of hypothesis on two population means, case
of paired or related samples.

Test of Hypothesis:
1. Ho: There is no difference between the scores of a control group and their
matched individuals.
Ha: There is a difference between the scores of a control group and their
matched individuals.
2. TEST PROCEDURE: Wilcoxon Signed-Rank Test
3. α = 5%
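4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:
Ranks (y - x)                 N     Mean Rank    Sum of Ranks
Negative Ranks (a)            6     6.00         36.00
Positive Ranks (b)            3     3.00         9.00
Ties (c)                      1
Total                         10
a. y < x    b. y > x    c. y = x
Test Statistics (a)           y - x
Z                             -1.604 (b)
Asymp. Sig. (2-tailed)        .109
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
sig = 0.109 (two-tailed, since Ha does not specify a direction)
α = 0.05
6. DECISION: Since sig = 0.109 > α = 0.05, we fail to reject Ho.
7. CONCLUSION: At α = 5%, there is no significant difference between the scores of the control group and their matched individuals.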
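The output above comes from ranking the absolute paired differences and summing the ranks of the positive and of the negative differences. The paired scores themselves are not listed in this section, so the sketch below is exercised on small hypothetical data. It assumes the large-sample normal approximation with the usual variance correction for tied |d| values, and computes Z from the sum of positive ranks, as the SPSS footnote indicates. Function names are illustrative.

```python
from statistics import NormalDist


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Normal-approximation Wilcoxon signed-rank test on the differences y - x.

    Returns (T_plus, T_minus, Z based on T_plus, two-tailed sig).
    """
    # Round to avoid spurious float differences, then drop zero differences (ties).
    d = [round(b - a, 10) for a, b in zip(x, y)]
    d = [v for v in d if v != 0]
    n = len(d)
    ranks = average_ranks([abs(v) for v in d])
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    t_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    counts = {}  # size of each group of tied |d| values
    for v in d:
        counts[abs(v)] = counts.get(abs(v), 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / 48
    mean_t = n * (n + 1) / 4
    var_t = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    z = (t_plus - mean_t) / var_t ** 0.5
    sig = 2 * (1 - NormalDist().cdf(abs(z)))
    return t_plus, t_minus, z, sig


# Hypothetical paired scores, for illustration only.
print(wilcoxon_signed_rank([10, 12, 14, 16], [11, 14, 13, 20]))
```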
Exercise 8. Test of Hypothesis: Case of Two Population Means – Related Samples
Name: ___________________________ Score: __________
It is claimed that a new diet will reduce a person’s weight in a period of two
weeks. The weights of 7 women who followed this diet were recorded before
and after a 2-week period.
Woman
1 2 3 4 5 6 7
Weight before 58.5 60.3 61.7 69.0 64.0 62.6 56.7

Weight after 60.0 54.9 58.1 62.1 58.5 59.9 54.4
Test the claim at the 5% level of significance.
1) One-tailed or Two-tailed:__________________________________________
2) Parametric or Nonparametric:______________________________________
a) Ho: __________________________________________________________
Ha: __________________________________________________________
b) Test Procedure: ________________________________________________
d) Decision Rule: __________________________________________________
e) Computation:
α= _________
= _________
f) Decision:_______________________________________________________
g) Conclusion: _____________________________________________________

Module 11
Test of Hypothesis: Case of Two Population
Means – Independent Samples
Learning Objectives

Decide whether to use parametric or nonparametric tests on two population means, case of independent samples.
Perform a statistical test of hypothesis on two population means, case
of independent samples.
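For two independent samples with equal population variances, the parametric test statistic is t = (x̄1 − x̄2)/√(sp²(1/n1 + 1/n2)) with n1 + n2 − 2 degrees of freedom, where sp² is the pooled sample variance. A sketch with hypothetical data; SPSS’s Independent-Samples T Test output also has a row for unequal variances, which this sketch does not compute. The function name is illustrative.

```python
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, df) for Ho: mu1 = mu2, with df = n1 + n2 - 2.
    """
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


# Hypothetical samples, for illustration only.
t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```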


Case of Two Population Means – Independent Samples
Name: ___________________________ Score: __________
Production line quantities for two managers in two plants of a large company
are compared. Each data value represents the amount of production during
randomly selected 1-hour periods over a whole week.
Manager A:
15 13 8 16 12 15 12 18 11 12
9 10 7 9
Manager B:
14 15 10 16 11 13 15 12 14 11
Use the 1% level of significance to test the hypothesis that there is no significant difference in the mean production rate.
1) One-tailed or Two-tailed:____________________________________
a) Ho: _____________________________________________________
Ha: _____________________________________________________
b) Test Procedure: ___________________________________________
d) Decision Rule: _____________________________________________
e) Computation:
α= _________
= _________
f) Decision:__________________________________________________
g) Conclusion: _______________________________________________

Module 12
Test of Hypothesis: Case of Two or More
Population Means – One-way Classification
Learning Objectives

Decide whether to use parametric or nonparametric tests on two or more population means, one-way classification
Perform a statistical test of hypothesis on two or more population
means, one way classification.
One-Way ANOVA
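In a one-way ANOVA the F ratio compares the variation between group means with the variation within groups. As an illustration only, the sketch applies it to the Math sections data of Module 8. Because that example has exactly three groups, sig uses the closed-form tail of an F(2, df) distribution; this shortcut would not apply to four or more groups. The function name is illustrative.

```python
from statistics import mean

# Final exam scores of the three Mathematics sections (Module 8 example).
sections = [
    [95, 32, 47, 75, 83, 84, 73, 68],
    [85, 90, 79, 50, 32, 84, 78, 95, 65, 80],
    [79, 92, 63, 68, 76, 20, 37, 74, 86],
]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


f, df_b, df_w = one_way_anova(sections)
# Only for exactly three groups (df_b = 2): P(F > f) = (1 + 2f/df_w) ** (-df_w/2).
sig = (1 + 2 * f / df_w) ** (-df_w / 2)
print(f"F = {f:.3f}, df = ({df_b}, {df_w}), sig = {sig:.3f}")
```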


Case of Two or More Population Means – One-Way Classification
Name: ___________________________ Score: __________
In order to compare the effectiveness of four methods of teaching young children a computer programming language, independent random samples of size 6 are taken from large groups of children taught by each of these four methods, and their standardized achievement test scores are recorded as follows:
Method Scores
A 75 73 68 72 87 75
B 84 92 84 82 87 85
C 62 65 68 67 67 66
D 74 76 73 72 76 74
At α = 0.01, is there evidence to suggest a difference in scores among the 4 teaching methods?
1) Response Variable:____________________________________
2) Independent Variable: ______________________________________
a) Ho: __________________________________________________________
Ha: __________________________________________________________
b) Test Procedure: ________________________________________________
d) Decision Rule: __________________________________________________
e) Computation:
α= _________
= _________
f) Decision:_______________________________________________________
g) Conclusion: _____________________________________________________

Module 13
Test of Hypothesis: Case of Two or More
Population Means – Two-way Classification
Learning Objectives
Decide whether to use parametric or nonparametric tests on two or more population means, two-way classification
Perform a statistical test of hypothesis on two or more population
means, two-way classification.
Features:
1. It employs one-directional blocking: the experimental units are grouped into blocks within which the units are more or less homogeneous.
2. Each block is a complete replication of the entire set of treatments.
3. The number of experimental units in a block should be equal to the number of
treatments, or some multiple of it.
Randomization
1. Group or stratify the experimental units into r blocks, with each block having t (or
some multiple of t) experimental units.
2. Allocate the treatments to the experimental units within a block at random, and do this from block to block, independently of the results of randomization in other blocks.
Computation of Sums of Squares
CF = (Y..)²/tr
TSS = ∑∑ (Yij)² – CF
TrSS = ∑ (Yi.)²/r – CF
RSS = ∑ (Y.j)²/t – CF
ESS = TSS – TrSS – RSS
Analysis of Variance Table (each mean square is MS = SS/df):
SV          df                SS      MS      Fc
Treatment   t – 1             TrSS    MSTr    MSTr/MSE
Block       r – 1             RSS     MSR     MSR/MSE
Error       (t – 1)(r – 1)    ESS     MSE
Total       tr – 1            TSS
Test of Hypothesis
1. To test for difference among treatment means (effects)
Test statistic: Fc = MSTr/ MSE ~ F[t – 1,(t – 1)(r – 1)]
2. To test for difference among block means (effects)

Test statistic: Fc = MSR/ MSE ~ F[r – 1,(t – 1)(r – 1)]
EXAMPLE:
Suppose the US Golf Association (USGA) wants to compare the mean distances traveled by four different brands of golf balls when struck with a driver. Ten golfers (the blocks) each used a driver to hit a randomly selected ball of each brand, in a random sequence. The distance is recorded for each hit, and the results are shown below, organized by brand.
GOLFER (Block) BRAND A BRAND B BRAND C BRAND D Block Total
1 202.4 203.2 223.7 203.6 832.9
2 242.0 248.7 259.8 240.7 991.2
3 220.4 227.3 240.0 207.4 895.1
4 230.0 243.1 247.7 226.9 947.7
5 191.6 211.4 218.7 200.1 821.8
6 247.7 253.0 268.1 244.0 1012.8
7 214.8 214.8 233.9 195.8 859.3
8 245.4 243.6 257.8 227.9 974.7
9 224.0 231.5 238.2 215.7 909.4
10 252.2 255.2 265.4 245.2 1018.0
Treatment Total 2270.5 2331.8 2453.3 2207.3 GT=9262.9
Means 227.0 233.2 245.3 220.7 GM=231.5725
a) Compare the mean distances for the four brands. Use 5% level of significance.
b) At α= 0.05, are there effects of the different golfers on the mean distance?
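The sums-of-squares formulas above can be checked on the golf data. A sketch that follows the CF, TSS, TrSS, RSS and ESS definitions directly (rows are golfers, columns are brands A to D) and forms the two F ratios; the p-values would then come from an F table or SPSS. The function name is illustrative.

```python
# Distances from the golf ball example: one row per golfer (block),
# columns are brands A, B, C, D (treatments).
golf = [
    [202.4, 203.2, 223.7, 203.6],
    [242.0, 248.7, 259.8, 240.7],
    [220.4, 227.3, 240.0, 207.4],
    [230.0, 243.1, 247.7, 226.9],
    [191.6, 211.4, 218.7, 200.1],
    [247.7, 253.0, 268.1, 244.0],
    [214.8, 214.8, 233.9, 195.8],
    [245.4, 243.6, 257.8, 227.9],
    [224.0, 231.5, 238.2, 215.7],
    [252.2, 255.2, 265.4, 245.2],
]


def rcbd_anova(y):
    """Randomized complete block ANOVA using the CF-based formulas.

    y[j][i] is the observation for block j and treatment i.
    """
    r = len(y)  # number of blocks
    t = len(y[0])  # number of treatments
    cf = sum(sum(row) for row in y) ** 2 / (t * r)  # CF = (Y..)^2 / tr
    tss = sum(v ** 2 for row in y for v in row) - cf
    trss = sum(sum(row[i] for row in y) ** 2 for i in range(t)) / r - cf
    rss = sum(sum(row) ** 2 for row in y) / t - cf
    ess = tss - trss - rss
    df_tr, df_bl, df_e = t - 1, r - 1, (t - 1) * (r - 1)
    mse = ess / df_e
    return {
        "TrSS": trss,
        "RSS": rss,
        "ESS": ess,
        "TSS": tss,
        "df": (df_tr, df_bl, df_e),
        "F_treatment": (trss / df_tr) / mse,
        "F_block": (rss / df_bl) / mse,
    }


for name, value in rcbd_anova(golf).items():
    print(name, value)
```

With this data the treatment and block F ratios come out near 54.3 and 66.3, with 3 and 9 numerator degrees of freedom and 27 error degrees of freedom.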
(Treatment means)
6. Decision: Since sig < 0.001 < α = 0.05 (Fc ≈ 54.31 with 3 and 27 df), we reject Ho.
7. Conclusion: At α = 5%, there are significant differences among treatment means.
(Block means)
6. Decision: Since sig < 0.001 < α = 0.05 (Fc ≈ 66.26 with 9 and 27 df), we reject Ho.
7. Conclusion: At α = 5%, there are significant differences among block means.
Nonparametric counterpart: Friedman Test
It is used to analyze k related samples.
It is the counterpart of the two-way analysis of variance for a randomized block design in which the assumption of normality is replaced by the assumption that the distributions are continuous.
Example:
A clothing manufacturer conducted an experiment to study the effect on
productivity of increases in its employees’ hourly wages. Four treatments
were used and 12 employees were selected and grouped according to the
length of time they had been with the company. The employees were
observed for 3 weeks, and their productivity was measured as the average
number of nondefective garments each produced per hour. The resulting
productivity measures appear in the table:
TREATMENTS: (1) No increase in hourly wage; (2) Increase hourly wage by $0.50; (3) Increase hourly wage by $1.00; (4) Increase hourly wage by $1.50
Group (1) (2) (3) (4)
Group 1 (less than 1 year) 2.4 3.0 3.1 3.2
Group 2 (1-5 years) 4.8 6.1 5.9 5.7
Group 3 (over 5 years) 5.1 7.0 7.2 7.3
a) Is there evidence that the mean productivity levels differ among the
four pay programs? Use α=0.01
b) Is there evidence that the mean productivity levels differ among the 3
groups? Use α=0.05
PROCEDURE: In Data Editor, select ANALYZE > NONPARAMETRIC TESTS > LEGACY DIALOGS > K-RELATED SAMPLES > FRIEDMAN
Test of Hypothesis (TREATMENTS):

1. Ho: The mean productivity levels did not differ among the four pay programs.
Ha: The mean productivity levels differ among the four pay programs.
2. TEST PROCEDURE: Friedman Test
3. α = 1%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:
sig = 0.122
α = 0.01
6. DECISION: Since sig = 0.122 > α = 0.01, we fail to reject Ho.

7. CONCLUSION: At α = 1%, the mean productivity levels did not differ among the four pay programs.

Test of Hypothesis (BLOCKS):

1. Ho: The mean productivity levels did not differ among the 3 groups.
Ha: The mean productivity levels differ among the 3 groups.
2. TEST PROCEDURE: Friedman Test
3. α = 5%
4. Decision Rule: Reject Ho if sig < α; otherwise, fail to reject Ho.
5. Computations:
sig = 0.018
α = 0.05
6. DECISION: Since sig = 0.018 < α = 0.05, we reject Ho.

7. CONCLUSION: At α = 5%, the mean productivity levels differ among the 3 groups.
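The Friedman statistic ranks the observations within each block and compares the rank sums: χ²r = 12/(bk(k + 1)) ΣRi² − 3b(k + 1), referred to a chi-square distribution with k − 1 df. A standard-library sketch, assuming no tie correction (this table has no ties within rows or columns) and using closed-form chi-square tail probabilities for integer df; on the productivity data it should agree with the sig values 0.122 and 0.018 above. Function names are illustrative.

```python
from math import erfc, exp, gamma, sqrt


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with a positive integer df."""
    h = x / 2
    if df % 2 == 0:
        # Even df: exp(-h) * sum of h**i / i! for i = 0 .. df/2 - 1
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum of h**(i - 1/2) / Gamma(i + 1/2), i = 1 .. (df-1)/2
    total = erfc(sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += exp(-h) * h ** (i - 0.5) / gamma(i + 0.5)
    return total


def friedman(table):
    """Friedman chi-square; rows are blocks, columns are the k compared treatments.

    Ranks within each row, no tie correction. Returns (chi2, df, sig).
    """
    b, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, r in enumerate(average_ranks(row)):
            rank_sums[col] += r
    chi2 = 12 / (b * k * (k + 1)) * sum(R ** 2 for R in rank_sums) - 3 * b * (k + 1)
    return chi2, k - 1, chi2_sf(chi2, k - 1)


# Productivity data: rows are the 3 experience groups, columns the 4 pay programs.
pay = [
    [2.4, 3.0, 3.1, 3.2],
    [4.8, 6.1, 5.9, 5.7],
    [5.1, 7.0, 7.2, 7.3],
]
print("pay programs:", friedman(pay))
# To compare the 3 groups, transpose so that each pay program acts as a block.
print("groups:", friedman([list(col) for col in zip(*pay)]))
```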


Case of Two or More Population Means – Two-Way Classification
Name: ___________________________ Score: __________
A food chain sells a particular item at all its stores. Each store carries three
brands, two of which are economy brands. The management decides to
discontinue selling one of the economy brands. It has decided to look at the turn
time of each brand, i.e., the average time between successive purchases of
the same brand. Five of the stores in the chain are selected, and an
employee in each store reports the turn time (in min) for each brand.
STORE BRAND
1 4.1 3.9
2 5.2 5.1
3 5.0 5.0
4 4.9 4.7
5 6.1 5.9
Is there a difference in the mean turn times for the two economy brands?
Use α = 0.01
Is there a difference in the mean turn times for the 5 stores? Use α =
0.05
1) Treatment:____________________________________
2) Block: ______________________________________
(For Treatment Means)
a) Ho: __________________________________________________________
Ha: __________________________________________________________
b) Test Procedure: ________________________________________________
d) Decision Rule: __________________________________________________

e) Computation:
α= _________
= _________
f) Decision:_______________________________________________________
g) Conclusion: _____________________________________________________
(For Block Means)
a) Ho: __________________________________________________________
Ha: __________________________________________________________
b) Test Procedure: ________________________________________________
d) Decision Rule: __________________________________________________
e) Computation:
α= _________
= _________
f) Decision:_______________________________________________________
g) Conclusion: _____________________________________________________

Module 14
Using SPSS to find Simple Random Samples
Learning Objectives
Draw simple random samples from the constructed frame in SPSS
data
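Drawing a simple random sample from a frame amounts to selecting n distinct units so that every subset of size n is equally likely. A minimal Python sketch outside SPSS, assuming the frame is a list of hypothetical respondent ID numbers; the function name is illustrative.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame without replacement.

    A fixed seed makes the draw reproducible, so the same sample can be
    regenerated later for checking.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(frame, n))


# Hypothetical sampling frame: respondent ID numbers 1 to 500.
frame = list(range(1, 501))
print(simple_random_sample(frame, 20, seed=2014))
```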

Module 15
Measures of Correlations and Relationships
Learning Objectives
compute the correlation coefficient & test its significance.
compute the rank correlation coefficient & test its significance.
perform an appropriate test for categorical data – the chi-square test
(χ2) for independence.
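The correlation coefficient and the rank correlation coefficient are closely related: Spearman’s rs is Pearson’s r computed on the ranks. A sketch on hypothetical data, with the t statistic t = r√(n − 2)/√(1 − r²) (df = n − 2) commonly used to test Ho: ρ = 0; function names are illustrative.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def t_for_r(r, n):
    """t statistic (df = n - 2) for Ho: rho = 0; requires |r| < 1."""
    return r * ((n - 2) / (1 - r ** 2)) ** 0.5


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical: always increasing, but not linear
r = pearson_r(x, y)
print(r, t_for_r(r, len(x)), spearman_rs(x, y))
```

Because y always increases with x, the rank correlation is exactly 1 even though the linear correlation is slightly below 1.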


Measures of Relationship
Name: ___________________________ Score: __________
A random sample of 400 married men, all retired or at least 65 years old, was classified according to
educational attainment and number of children.
Educational Attainment (rows) by Number of Children (columns):
Educational Attainment 0-2 3-5 Over 5
None 12 22 26
Elementary 14 59 37
Highschool 20 80 34
College 26 31 19
Test the hypothesis that the number of children is independent of the level of education attained by
the father at α = 0.05.
1) Independent Variable:____________________________________
2) Dependent Variable: ______________________________________
a) Ho: __________________________________________________________
Ha: __________________________________________________________
b) Test Procedure: ________________________________________________
d) Decision Rule: __________________________________________________
e) Computation:
α= _________
= _________
f) Decision:_______________________________________________________
g) Conclusion: _____________________________________________________

Module 16
Regression Analysis
Learning Objectives
formulate predicting equation and test its significance
perform at least simple linear regression analysis
PARAMETRIC REGRESSION ANALYSIS
Regression Analysis is a statistical technique used for determining the probable form of the relationship between variables. The ultimate objective when using this method of analysis is usually to predict or estimate the value of one variable corresponding to a given value of another variable.
Recall:
Simple Regression Analysis: a form of linear relationship involving only one independent variable X, used to predict the dependent variable Y.
Objective: To find the possible relationship between two variables X and Y, where X and Y are paired variables.
Two variables X and Y are linearly related if their relationship can be expressed by the simple linear statistical model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where Yi = ith observed value of the random variable Y
Xi = ith observed value of the random variable X
β0 = regression constant. It is the true Y intercept.
β1 = regression coefficient. It measures the true increase in Y per unit increase in X.
εi = random error associated with the ith observation
This model is called the SIMPLE LINEAR REGRESSION MODEL.
Assumptions Underlying the SLRM:

1. The values of the independent variable X may either
be “fixed” or random.
2. The X’s are measured without error
3. The Y-values are statistically independent.
4. For each value of X, there is a subpopulation of the Y
values that is normally distributed.
5. The variances of the subpopulations of Y are all equal
to σ2.
6. The means of the subpopulations of Y all lie on the
same straight line.
PARAMETERS OF THE MODEL:
β0 = regression constant; β1 = regression coefficient; σ² = common population variance

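Result from Statistical Theory:
Estimators of the parameters based on an SRS of size n, where $S_{xy} = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$, $S_x^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2$ and $S_y^2 = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$:
1. $\beta_1$ is estimated by $b_1 = \dfrac{S_{xy}}{S_x^2}$
2. $\beta_0$ is estimated by $b_0 = \bar{Y} - b_1 \bar{X}$
3. $\sigma^2$ is estimated by $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2} = \dfrac{S_y^2 - b_1 S_{xy}}{n-2}$
Predicting Equation: $\hat{Y}_i = b_0 + b_1 X_i$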
Evaluation of the Simple Regression Equation
An overall measure of the adequacy of the equation is provided by the coefficient of determination, denoted by r². It is defined as
$r^2 = \dfrac{S_{xy}^2}{S_x^2 S_y^2} = \dfrac{b_1 S_{xy}}{S_y^2} = \dfrac{SSR}{SST}$
r² gives the proportion of the total variation in Y that is accounted for by the independent variable X. It ranges from 0 to 1, or 0 to 100%. The nearer its value is to 1, the better the fit of the regression line.
Note: If the model is not significant, do not use it for prediction; the relationship between X and Y might not be linear.
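The least-squares estimators and r² translate directly into code. A sketch on a small hypothetical data set (not the wage exercise), computing Sxy, Sx² and Sy² as corrected sums of cross-products and squares so that SSR = b1·Sxy and SST = Sy²; the function name is illustrative.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares fit of Y = b0 + b1 X.

    Returns (b0, b1, r_squared, sigma2_hat), with sigma2_hat = SSE / (n - 2).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)  # total sum of squares, SST
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ssr = b1 * sxy  # regression sum of squares
    r_squared = ssr / syy
    sigma2_hat = (syy - ssr) / (n - 2)  # SSE / (n - 2)
    return b0, b1, r_squared, sigma2_hat


# Hypothetical data, for illustration only.
b0, b1, r2, s2 = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y-hat = {b0:.2f} + {b1:.2f} X, r^2 = {r2:.2f}, sigma^2-hat = {s2:.2f}")
```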

Regression Analysis
Name: ___________________________ Score: __________
A young economist wants to verify if wage is related to the educational background of an
individual. He interviewed 20 randomly chosen individuals and obtained the following
results:
Columns (listed twice, observations 1-10 on the left and 11-20 on the right): Observation No., No. of Years in School, Monthly Wage (P)
1 0 300 11 15 1600
2 3 400 12 10 900
3 6 600 13 17 2000
4 10 800 14 8 700
5 1 400 15 14 1250
6 11 950 16 17 2500
7 11 950 17 10 850
8 7 650 18 13 1200
9 14 1000 19 9 600
10 2 450 20 14 1500
a. Identify the independent variable: _________________________

b. Identify the dependent variable: ___________________________
c. Plot a scatterplot diagram.
d. Find the equation of the regression line and interpret the result
e. Fit the regression line on the scatter plot diagram.
f. Compute the coefficient of determination and interpret it.
g. Estimate the monthly wage when the number of years in school is 15.
h. Test for the significance of β1 at α = 5%.

IBM-SPSS Statistics version 20 Training Module
Team Leader: MARITESS D. VILLANUEVA, MAT (Mathematics), MS Statistics
Technical Assistant: CLEO S. VILLANUEVA, MIT
Program Assistants: DIEGO MALONES, Ed. D.
FERDINAND D. ANABO, MBA
MICHELLE BACAS, MBA
MALOU BASQUEZ, MAT (Math)
ALVIN TENORIO, MAT (Math)
KRIS D. BALTERO, M. Chem
PET ROANA B. BATACANDOLO
JOHN KENETH ADA