
MANAGERIAL STATISTICS

Introduction

In business and economics, statistics plays an important role in research, in
forecasting business trends and sales, and in other business and economic
activities such as investment, production, employment, marketing, and management
and control. Private companies also use statistics in making intelligent company
policies.
Today, statistics is used in nearly every field. For this reason, students,
whether taking a course in business or in the sciences, should learn or at least get
acquainted with statistics. An adequate understanding and working knowledge of the
concepts and principles of statistics equips them with the knowledge necessary for
intelligent decision-making.

Statistics may be defined as the branch of mathematics that deals with the systematic
method of collecting, classifying, presenting, analyzing, and interpreting quantitative or
numerical data.

DIVISION OF STATISTICS

1. DESCRIPTIVE STATISTICS - concerned with the collection, classification, and
presentation of data designed to summarize and describe the group characteristics
of the data. Ex. measures of location, measures of variability, skewness and
kurtosis.

2. INFERENTIAL STATISTICS - refers to the drawing of conclusions or judgments
about a population based on a representative sample systematically taken from
the same population. Its aim is to give concise information about large groups of
data without dealing with each and every element of these groups. If the sample
taken is small, certain assumptions and inferences are made based on limited
information; if the sample drawn is large, its characteristics may be treated as
approximately equal to those of the whole population.

STEPS IN STATISTICAL INQUIRY OR INVESTIGATION:


In statistical investigation, after the problem has been clearly defined and its
objectives have been set up, the following steps have to be undertaken:

A. COLLECTION OF DATA:
The data collected must be valid, reliable, relevant and consistent with
other information to the problem at hand. Data collected may be classified
as:

a. Primary data – refer to data obtained directly from an original source by
means of actual observations or by conducting interviews. The direct
source could be an individual or family group, business entities or private
and government agencies.
b. Secondary data – refer to data or information that come from existing
records (published or unpublished) in usable form, such as surveys,
censuses, business journals and magazines, newspapers, commercial
publications, theses, dissertations, and research papers.
c. Internal data – data taken from the company’s own records of operations
such as sales records, production records, personnel records, etc.
d. External data – data that come from outside sources and not from the
company’s own records.

METHODS OF DATA COLLECTION

1. The Interview or Direct Method.
Data-gathering device wherein the research worker or interviewer gets
the needed data/information from the respondent or interviewee verbally and
directly in face-to-face contact. One marked advantage of this method is that
a skillful interviewer may draw from the interviewee certain types of personal and
confidential information which may not be obtainable through the other methods of
data collection.

2. The Questionnaire or Indirect Method.
Data-gathering instrument consisting of a list of well-planned, written
questions related to a particular topic, sent by mail to individuals, with space
provided for the response to each question, given out to acquire the needed
data/information.

3. Registration Method.
Pertains to records of births, marriages, and deaths at the NSO, or the
registration records of persons of voting age at the COMELEC.

4. Observation.
Employed when certain data or information cannot be secured adequately or
validly through the other methods of data collection. Observation must be
specific, systematic, quantitative, and expert, and its results must be checked
and substantiated.

5. Experimentation.
Data obtained as a result of a systematic effort in following the scientific
method.

B. PROCESSING OF DATA
After data have been collected, they have to be processed. Processing of
data includes:
a. Editing – the purpose is to detect errors and omissions, and to
ensure that the data gathered are accurate, consistent with other information,
complete, and arranged in such a way as to facilitate coding and
classification.
b. Coding – refers to assigning numerals and other symbols to the data
collected so that they can be grouped into a limited number of classes or
categories.
c. Classification – refers to sorting the data and grouping them on the
basis of similarity. The purpose of classification is to enable us to quickly see
all the possible characteristics in the data collected.
C. PRESENTATION OF DATA.

Data can be presented by means of the following modes:


a. Textual Presentation – this mode of presentation combines text and figures
in a statistical report. Ex. a news item in the newspaper which reports the
number of houses buried by a lahar flow and the number of families
evacuated from the lahar-ravaged municipalities during a recent lahar
calamity.

b. Tabular Presentation – this mode of presentation is better than the textual
form because the data are presented in a more concise and systematic
manner. The data are systematically presented through tables consisting of
vertical columns and horizontal rows with headings describing these rows and
columns. Example:

Table I. Frequency Table of Market Sectors

Value label (Business Classification)    Value    Frequency     %
Chemical and Materials (CM)                1         10         10
Consumer Products                          2          8          8
Durables/Capital Equipment (DCE)           3          7          7
Energy                                     4         13         13
Financial                                  5         24         24

c. Graphic Presentation – the most effective means of presenting statistical
data, because important relationships are brought out more clearly in graphs.
Graphs have a great advantage over tables because graphs convey
quantitative values and comparisons more readily than tables.

Types of Graphs:
1. Bar Graph. The simplest form of graphic presentation, generally intended for
comparison of simple magnitudes. It may be either a horizontal bar graph or a
vertical bar graph.

[Figure: bar graph by business classification]

2. Line Graph. The most widely used practical device, effective in showing a trend
(changes in value) over a period.

3. Circle or Pie Chart. A circle divided into parts whose sizes are proportional to
the magnitudes or percentages they represent. Used to show the component parts
of a whole.

4. Scatter Diagram. Provides a means for visual inspection of data that a
list of values for two variables cannot. It shows whether a relationship exists
between the variables. It also conveys both the direction and the shape of the
relationship.

5. Pictograph or Pictogram. Uses symbols, such as a stick figure for population,
to indicate data instead of a bar in a bar-type chart.

D. ANALYSIS OF DATA.

E. INTERPRETATION OF DATA.

KEY TERMS AND THEIR DEFINITIONS


• Analysis - is the manipulation of the data gathered, using descriptive and
inferential statistics.

• Cumulative Frequency - is used in getting the values of the median, quartiles,
deciles and percentiles.

• Data - point to statistical facts, principles, opinions and various items from
different sources.

• Data collection - the process and methods of gathering information by interview,
questionnaire, experiment, observation and documentary analysis.

• Data Presentation - takes the form of tables and graphs.

• Descriptive statistics - includes frequency distributions, measures of central
tendency, measures of central location, measures of dispersion or variation,
graphs, skewness and kurtosis. Likewise, it refers to techniques which are
concerned with the collection and presentation of data or information.

• Inferential statistics - the technique by which decisions and conclusions
about the population are made using only representative samples. It includes
both parametric and non-parametric tests, which are concerned with generalizing
information or making inferences about the population through representative
samples.

• Frequency distribution - is the tabulation of data grouped into class
intervals.

• Graphical presentation - points to the construction of bar graphs, frequency
polygons, pie charts and pictographs, among others.

• Grouped data - are properly organized and classified data, such as data arranged
in a frequency distribution.

• Interpretation - makes clear the results of the analysis, using statistical
methods to see whether significant differences or relationships exist between
variables.

• Parameter - is a characteristic of a population.

• Population - is the totality of all the actual observable characteristics of a
set of objects or individuals.

• Random sampling - involves the selection of samples such that each sample of
a given size has precisely the same probability of being selected. It includes the
simple random, stratified, cluster and multistage sampling techniques.

• Sample - refers to the elements (objects or individuals) selected from the
population.

• Schedule - is the extensive set of questions and instructions used in a personal
interview.

TYPES OF MEASUREMENT

Data can be classified into two types: continuous data and discontinuous or
discrete data.

• Continuous data - are measures like feet, pounds, kilos, minutes and meters.
These kinds of data can be measured to varying degrees of precision and
converted between units, for example, 1 yard equals three feet
(1 yd = 3 ft; 1 ft = 12 in.).

• Discontinuous or discrete data - are measurements expressed in whole units,
such as counts of people, number of objects, number of cars passing by, number of
houses, number of students, workers and so on.

MEASUREMENT OF SCALES

According to Stevens, there are four types of scales that are used in the sciences.
These are the nominal, ordinal, interval, and ratio scales.

• Nominal scales - are used as measures of identity. Examples are
classifications of individuals into categories, such as gender (male and female);
yes and no answers; religion (for instance, Muslims and Christians); political
parties (LP, Laban, Lakas, and KNP); dwelling place (rural and urban); and more of
such categories.

• Ordinal scales - are used in measurements like the ranking of individuals or
objects. Ordinal measures reveal which person or object is larger or smaller,
harder or softer, and include responses like strongly agree, agree, no opinion,
disagree, and strongly disagree.

• Interval scales - are numbers that reflect differences among items. Examples are
scores in a test, grades of students, ages, blood pressures, Fahrenheit and
Celsius thermometers.

• Ratio scale - the highest type of scale. The basic difference between the interval
and ratio scales is that a ratio scale has a true zero point. Examples are measures
of length, weight, loudness, width, and so on.

STATISTICAL SYMBOLS

Σ = capital Greek letter sigma denotes "summation of" or "the sum of"

f = small letter f denotes frequencies

F = capital letter F denotes cumulative frequencies

n = small letter n denotes sample size

N = capital letter N denotes population size

i = small letter i denotes class interval

X = capital letter X denotes independent variables

Y = capital letter Y denotes dependent variables

X̄ = X-bar denotes the mean of the sample

μ = small Greek letter mu denotes the population mean

Familiarize yourself with the following expressions:

x = y    x equals y

x ≠ y    x is not equal to y

x > y    x is greater than y

x < y    x is less than y

x ≥ y    x is greater than or equal to y

x ≤ y    x is less than or equal to y

The characteristics of the population are called parameters while the
characteristics of the sample are called statistics.

Characteristic                                    Parameter     Statistic
Mean                                              μ (mu)        x̄
Standard Deviation                                σ (sigma)     s
Number of Cases                                   N             n
Proportion                                        P             p
Pearson Product Moment Correlation Coefficient    ρ (rho)       r
Variance                                          σ²            s²

Summation Notation
Example 1. If N = 5 and the observations are X1 = 2; X2 = 4; X3 = 3; X4 = 5; X5 = 6,
find the sum of the five values of Xi using summation notation.

Solution: $\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 2 + 4 + 3 + 5 + 6 = 20$

Example 2. If N = 3 and the observations are X1 = 5; X2 = 4; X3 = 1, find the sum
of the values of Xi.

Solution: $\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 = 5 + 4 + 1 = 10$

Example 3. Let a be a constant. Find the sum of the values when the constant has
been added to each. Use Example 2, where N = 3 and X1 = 5; X2 = 4; X3 = 1.

Solution: $\sum_{i=1}^{N} (X_i + a) = (X_1 + a) + (X_2 + a) + (X_3 + a) = 5 + a + 4 + a + 1 + a = 10 + 3a$

So we can say that the sum of the values of a variable, when a constant has been
added to each, is equal to the sum of the values of the variable plus N times the
constant. Therefore,

$$ \sum_{i=1}^{N} (X_i + a) = \sum_{i=1}^{N} X_i + Na $$

Example 4. Suppose a constant a has been subtracted from each observation Xi.
Find the sum of the values, where N = 4 and X1 = 4; X2 = 7; X3 = 1; X4 = 5.

Solution: $\sum_{i=1}^{N} (X_i - a) = (X_1 - a) + (X_2 - a) + (X_3 - a) + (X_4 - a)$
$= (4 - a) + (7 - a) + (1 - a) + (5 - a)$
$= 17 - 4a$

So, the sum of the values of a variable when a constant has been
subtracted from each is equal to the sum of the values of the variable
minus N times the constant. Therefore,

$$ \sum_{i=1}^{N} (X_i - a) = \sum_{i=1}^{N} X_i - Na $$
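
The two summation rules can be checked numerically. The sketch below uses the observations from Example 4; the constant a = 2 and the helper name `sum_with_shift` are assumptions made only for illustration.

```python
def sum_with_shift(values, a):
    """Sum (x + a) term by term, as in the expanded solutions."""
    return sum(x + a for x in values)


values = [4, 7, 1, 5]   # X1..X4 from Example 4
a = 2                   # hypothetical constant, chosen for illustration
n = len(values)

# Rule for addition: sum(Xi + a) = sum(Xi) + N*a
lhs_add = sum_with_shift(values, a)
rhs_add = sum(values) + n * a

# Rule for subtraction: sum(Xi - a) = sum(Xi) - N*a
lhs_sub = sum_with_shift(values, -a)
rhs_sub = sum(values) - n * a

print(lhs_add, rhs_add)   # both sides of the addition rule
print(lhs_sub, rhs_sub)   # both sides of the subtraction rule
```

With a = 2, the subtraction rule gives 17 - 4(2) = 9 on both sides, matching the form 17 - 4a obtained in Example 4.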

THE NATURE OF STATISTICS: Statistical investigation can be classified into two major
functions:
1. Descriptive Statistics - methods of collecting and presenting data. It includes the
computation of measures of central tendency, measures of central location, and
measures of dispersion or variability. It also includes the construction
of tables and graphs.

2. Inferential Statistics - concerned with a higher degree of critical judgment and
advanced mathematical models, such as using the different statistical tools, both
the parametric and non-parametric tests. It is concerned with the analysis and
interpretation of data in order to draw conclusions and generalizations from
organized data. It also includes testing for significant relationships
between the dependent and independent variables, as well as for significant
differences between and among independent samples.
SAMPLE AND POPULATION

• Population – identifies the totality of objects under investigation. The researcher
may use the whole population as the subject of study when it is small and manageable
when employing statistical methods. However, if the population is too large, the
researcher may use a representative sample.

• Sampling – the method of getting a small part of the population, called the
sample, that serves as the representative of the population.

Note: If the population under study is too large to handle and will entail too much time, cost,
and effort, taking samples is a very good alternative. It should be noted that if only a small part
of the population is considered, sampling error should be expected. Thus, in drawing conclusions
about the population from which a sample is drawn, the researcher should learn how to draw
samples that are truly representative of the population. Different sampling techniques include
simple random sampling, stratified sampling, cluster sampling and multistage
sampling.
• A simple random sample is a subset of a statistical population in which each
member of the population has an equal probability of being chosen. A simple
random sample is meant to be an unbiased representation of a group.

• Stratified sampling refers to a type of sampling method. With stratified
sampling, the researcher divides the population into separate groups, called
strata. Then, a probability sample (often a simple random sample) is drawn
from each group.

• Cluster sampling is a sampling technique used when "natural" but relatively
heterogeneous groupings are evident in a statistical population. It is often used
in marketing research. In this technique, the total population is divided into these
groups (or clusters) and a simple random sample of the groups is selected.

• Multistage sampling can be a complex form of cluster sampling. Cluster
sampling is a type of sampling which involves dividing the population
into groups (or clusters). Then, one or more clusters are chosen at random and
everyone within the chosen cluster is sampled. In multistage sampling, a further
sample is drawn within each chosen cluster instead of sampling everyone in it.

Note: A problem that is commonly encountered is determining the sample size.
It is not advisable to set a certain percentage of the population; instead, the
margin of error, which ranges from 1% to 10% in social science research, should be
considered. The sample size relative to the population size is computed with this
formula:

$$ n = \frac{N}{1 + Ne^2} $$

Where: N = the population size
e = the margin of error
n = the sample size

Example 1. Find the sample size if the population size is 2500 at 95% accuracy.

Solution: At 95% accuracy, the corresponding margin of error is 5% or 0.05.
Using the formula,

$$ n = \frac{N}{1 + Ne^2} = \frac{2500}{1 + 2500(0.05)^2} = 344.83 \text{ or } 345 $$

Example 2. A researcher is conducting an investigation regarding the factors affecting
the performance of 200 teachers in the 1st district of Catarman, Northern Samar. If the
margin of error is 3%, how many of the teachers should be taken as respondents?

Solution: The target population is composed of 200 teachers in the 1st district of
Catarman, N. Samar. At a 3% margin of error, the sample size n shall be:

$$ n = \frac{N}{1 + Ne^2} = \frac{200}{1 + 200(0.03)^2} = 169.49 \text{ or } 169 $$
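
The sample-size formula above (often called Slovin's formula) is easy to script. The following is a minimal sketch; the function name `sample_size` is an assumption for illustration, and the value is returned unrounded so the rounding convention stays with the reader.

```python
def sample_size(population_size, margin_of_error):
    """Return n = N / (1 + N * e**2), unrounded."""
    return population_size / (1 + population_size * margin_of_error ** 2)


n1 = sample_size(2500, 0.05)   # Example 1: N = 2500, e = 5%
n2 = sample_size(200, 0.03)    # Example 2: N = 200, e = 3%

print(f"{n1:.2f}")   # compare with 344.83 in Example 1
print(f"{n2:.2f}")   # compare with 169.49 in Example 2
```

The examples above round to the nearest whole respondent (345 and 169); some practitioners round up instead so that the margin of error is not exceeded.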

II. MEASURES OF CENTRAL TENDENCY

For Grouped Data

a. Mean – defined as the arithmetic average. It is the sum of the observed
values divided by the number of observations. It is a computed average and
its magnitude is influenced by every value in the set. It is the location
measure most frequently used, but it can be misleading when the distribution
contains extremely large or small values.

Two ways of solving for the mean:

a) Long method: $\bar{X} = \frac{\sum fM}{n}$

b) Short method: $\bar{X} = A_m + \left( \frac{\sum fd}{n} \right) i$

Where: X̄ = the mean
ΣfM = the summation of the products of frequencies and midpoints
Σfd = the summation of the products of frequencies and deviations
Am = assumed mean, the midpoint of the class where the zero
deviation is placed
n = the number of cases or scores
i = the class interval

b. Median – the middle value of the distribution. Half of the values in the
distribution fall below the median, and the other half fall above it. It is the most
appropriate locator of center values.

$$ Me = L_{me} + \left( \frac{n/2 - f_b}{f_w} \right) i $$

Where: Me = the median
Lme = the lower boundary of the median class
n/2 = the value that locates the median class
fb = the less-than-or-equal-to cumulative frequency just below the median class
fw = the actual frequency within the median class
n = the total number of cases or scores
i = the class interval

c. Mode – the value that appears with the highest frequency. It is determined by the
formula:

$$ Mo = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) i $$

where: Mo = the mode
Lmo = the lower boundary of the modal class
d1 = the difference between the frequency of the modal class and
the frequency of the class next lower in value
d2 = the difference between the frequency of the modal class and
the frequency of the class next higher in value

Quantiles (Fractiles)

In a frequency distribution, a quantile or fractile is a value at or below which
a given fraction of the distribution must lie. Like the median, the quantiles or
fractiles are also positional measures.

a. Quartiles – are values that divide the distribution into 4 equal parts. These are
Q1, at or below which 25% of the distribution lies; Q2, at or below which 50% of the
distribution lies; and Q3, at or below which 75% of the distribution lies.

$$ Q_1 = L_{Q1} + \left( \frac{n/4 - f_{bQ1}}{f_{wQ1}} \right) i $$

where: Q1 = quartile one or first quartile
LQ1 = the lower boundary of the quartile one class
n/4 = the value that locates the quartile one class
fbQ1 = the less-than-or-equal-to cumulative frequency just below the
quartile one class
fwQ1 = the actual frequency within the quartile one class

b. Deciles – are values that divide the distribution into 10 equal parts.
The deciles are: D1, D2, D3, …, D9.

$$ D_1 = L_{D1} + \left( \frac{n/10 - f_{bD1}}{f_{wD1}} \right) i $$

c. Percentiles – are values that divide the distribution into 100 equal parts.
These are: P1, P2, P3, P4, …, P99.

$$ P_1 = L_{P1} + \left( \frac{n/100 - f_{bP1}}{f_{wP1}} \right) i $$

Note: You will notice that quartiles, deciles, and percentiles utilize the
median formula; they differ only in the fraction of n used and in the subscripts.

III. MEASURES OF VARIABILITY


(Measures of Absolute Variation & Measures of Relative Dispersion)

MEASURES OF VARIATION:
A. Range – the difference between the upper boundary of the highest class (UBHC)
and the lower boundary of the lowest class (LBLC).
$$ R = UBHC - LBLC $$

B. Interquartile Range: $IQR = Q_3 - Q_1$

C. Quartile Deviation: $QD = \frac{Q_3 - Q_1}{2}$

D. Mean Absolute Deviation: $MAD = \frac{\sum f\,|M - \bar{X}|}{n}$

E. Mean Squared Deviation or Variance: $S^2 = \frac{[\,n\sum fd^2 - (\sum fd)^2\,]\, i^2}{n(n-1)}$

F. Standard Deviation: $S = \sqrt{S^2}$

MEASURES OF RELATIVE DISPERSION:

a. Coefficient of Variation (CV): $CV = \frac{S}{\bar{X}} \times 100\%$

b. Coefficient of Quartile Deviation (CQD): $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$


Illustrative Problem:

Problem: The following is the distribution of the ages of 100 employees of Philippine
Christian University during the time of Carlito S. Puno as the President.

Class    f     M      fM     |M − x̄|   f|M − x̄|    d     fd    d²    fd²   ≤cumf
54-59    1    56.5    56.5    23.22      23.22      3      3     9      9     100
48-53    5    50.5   252.5    17.22      86.10      2     10     4     20      99
42-47   11    44.5   489.5    11.22     123.42      1     11     1     11      94
36-41   18    38.5   693.0     5.22      93.96      0      0     0      0      83
30-35   30    32.5   975.0     0.78      23.40     -1    -30     1     30      65
24-29   24    26.5   636.0     6.78     162.72     -2    -48     4     96      35
18-23   11    20.5   225.5    12.78     140.58     -3    -33     9     99      11
n = 100             3328                653.40            -87          265
i = 6

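Determine: a) Mean, X̄ b) Me c) Mo d) Q3 e) Q1 f) D7 g) D4 h) P35 i) P55 j) R
k) IQR l) QD m) MAD n) S² o) S p) CV q) CQD

Solution:

a) $\bar{X} = \frac{\sum fM}{n} = \frac{3328}{100} = 33.28$
   Alternative solution: $\bar{X} = A_m + \left( \frac{\sum fd}{n} \right) i = 38.5 + \left( \frac{-87}{100} \right) 6 = 33.28$

b) $Me = L_{me} + \left( \frac{n/2 - f_b}{f_w} \right) i = 29.5 + \left( \frac{100/2 - 35}{30} \right) 6 = 32.5$

c) $Mo = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) i = 29.5 + \left( \frac{6}{6 + 12} \right) 6 = 31.5$

d) $Q_3 = L_{Q3} + \left( \frac{3n/4 - f_{bQ3}}{f_{wQ3}} \right) i = 35.5 + \left( \frac{75 - 65}{18} \right) 6 = 38.83$

e) $Q_1 = L_{Q1} + \left( \frac{n/4 - f_{bQ1}}{f_{wQ1}} \right) i = 23.5 + \left( \frac{25 - 11}{24} \right) 6 = 27$

f) $D_7 = L_{D7} + \left( \frac{7n/10 - f_{bD7}}{f_{wD7}} \right) i = 35.5 + \left( \frac{70 - 65}{18} \right) 6 = 37.17$

g) $D_4 = L_{D4} + \left( \frac{4n/10 - f_{bD4}}{f_{wD4}} \right) i = 29.5 + \left( \frac{40 - 35}{30} \right) 6 = 30.5$

h) $P_{35} = L_{P35} + \left( \frac{35n/100 - f_{bP35}}{f_{wP35}} \right) i = 23.5 + \left( \frac{35 - 11}{24} \right) 6 = 29.5$

i) $P_{55} = L_{P55} + \left( \frac{55n/100 - f_{bP55}}{f_{wP55}} \right) i = 29.5 + \left( \frac{55 - 35}{30} \right) 6 = 33.5$

j) $R = UBHC - LBLC = 59.5 - 17.5 = 42$

k) $IQR = Q_3 - Q_1 = 38.83 - 27 = 11.83$

l) $QD = \frac{Q_3 - Q_1}{2} = \frac{IQR}{2} = \frac{11.83}{2} = 5.92$

m) $MAD = \frac{\sum f\,|M - \bar{X}|}{n} = \frac{23.22 + 86.10 + 123.42 + 93.96 + 23.40 + 162.72 + 140.58}{100} = \frac{653.40}{100} = 6.53$

n) $S^2 = \frac{[\,n\sum fd^2 - (\sum fd)^2\,]\, i^2}{n(n-1)} = \frac{[\,100(265) - (-87)^2\,]\, 6^2}{100(100 - 1)} = 68.84$

o) $S = \sqrt{68.84} = 8.30$

p) $CV = \frac{S}{\bar{X}} \times 100\% = \frac{8.30}{33.28} \times 100\% = 24.94\%$

q) $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\% = \frac{38.83 - 27}{38.83 + 27} \times 100\% = 17.97\%$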
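
The grouped-data formulas used in this solution can be cross-checked with a short script. This is a sketch under stated assumptions, not a general-purpose routine: it assumes integer class limits (so each lower boundary is the lower limit minus 0.5), equal class widths, and an interior modal class. The function names `quantile` and `mode` are illustrative. The variance is computed from the midpoints M rather than the coded deviations d, which is algebraically equivalent to the formula in item n).

```python
import math

# (lower limit, upper limit, frequency), lowest class first.
classes = [(18, 23, 11), (24, 29, 24), (30, 35, 30), (36, 41, 18),
           (42, 47, 11), (48, 53, 5), (54, 59, 1)]
width = 6                                   # class interval i
freqs = [f for _, _, f in classes]
mids = [(lo + hi) / 2 for lo, hi, _ in classes]
n = sum(freqs)

sum_fm = sum(f * m for f, m in zip(freqs, mids))
mean = sum_fm / n                           # long method


def quantile(k, parts):
    """k-th of `parts` quantiles: L + ((k*n/parts - fb) / fw) * i."""
    target = k * n / parts
    below = 0                               # cumulative frequency below the class (fb)
    for lo, _, f in classes:
        if below + f >= target:             # first class whose <=cumf reaches the target
            return (lo - 0.5) + (target - below) / f * width
        below += f
    raise ValueError("k must not exceed parts")


def mode():
    """Lmo + (d1 / (d1 + d2)) * i for the class with the highest frequency."""
    j = freqs.index(max(freqs))
    d1 = freqs[j] - (freqs[j - 1] if j > 0 else 0)
    d2 = freqs[j] - (freqs[j + 1] if j + 1 < len(freqs) else 0)
    return (classes[j][0] - 0.5) + d1 / (d1 + d2) * width


median, q1, q3 = quantile(1, 2), quantile(1, 4), quantile(3, 4)
d7 = quantile(7, 10)
mad = sum(f * abs(m - mean) for f, m in zip(freqs, mids)) / n
sum_fm2 = sum(f * m * m for f, m in zip(freqs, mids))
variance = (n * sum_fm2 - sum_fm ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(f"mean={mean:.2f} median={median:.2f} mode={mode():.2f}")
print(f"Q1={q1:.2f} Q3={q3:.2f} D7={d7:.2f}")
print(f"MAD={mad:.2f} S2={variance:.2f} S={std_dev:.2f}")
```

Passing integers `k` and `parts` (for example `quantile(35, 100)` for P35) avoids floating-point surprises when locating the class that contains the target position.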

Exercises:

1. For the given frequency distribution table, determine the following: a) Mean b)
Me c) Mo d) Q3 e) Q1 f) D8 g) D5 h) P65 i) P35 j) R k) IQR l) QD m) MAD n)
S² o) S p) CV q) CQD

Classes    f    M    fM    d    fd    d²    fd²    |M − x̄|    f|M − x̄|    ≤cumf
95 – 99    2
90 – 94    2
85 – 89    7
80 – 84    9
75 – 79   10
70 – 74    8
65 – 69    2

2. In the given frequency distribution table, determine: a) Mean b) Me c) Mo d) Q1
e) Q3 f) D5 g) D7 h) P25 i) P65 j) R k) IQR l) QD m) MAD n) S² o) S

Classes    f    M    fM    d    fd    d²    fd²    |M − x̄|    f|M − x̄|    ≤cumf
60 - 64    6
55 - 59    7
50 - 54   10
45 - 49    8
40 - 44    8
35 - 39    5
30 - 34    4
25 - 29    2

V. HYPOTHESIS TESTING
In either accepting or rejecting a null hypothesis, an incorrect decision can be
made. A null hypothesis can be accepted when it should have been rejected, or rejected
when it should have been accepted. Thus, in accepting or rejecting the null hypothesis,
two types of decision errors could be committed.

A Type I error is committed if the null hypothesis is rejected when it is true.

A Type II error is committed if the null hypothesis is accepted when it is false.

A. CHI-SQUARE TEST (χ²)

Another widely used non-parametric test of significance is the χ² test. It can
test for significant differences between the observed distribution of data among
categories and the expected distribution of data based upon the null hypothesis, or
for a significant relationship between variables. It is used in cases of one-sample
analysis, two independent samples, or k independent samples.

Illustrative Problem.

Test the hypothesis that there is no significant relationship between the gender of the
employees and their job satisfaction level at the 0.05 level of significance, if in a
certain school the following results were obtained.

Sex       Low    Medium    High    Total
Male       45      60       55      160
Female      9      10       10       29
Total      54      70       65      189

I. Statement of hypotheses:
Ho: There is no significant relationship between the gender of the employees
and their job satisfaction level.
H1: There is a significant relationship between the gender of the employees and
their job satisfaction level.

II. Statistical test: use the χ² test of independence.

III. Level of significance and critical value:
@ 0.05 and df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
Critical χ² value = 5.99

Expected value: $E = \frac{C_t \times R_t}{G_t}$, where Ct = column total, Rt = row total,
and Gt = grand total.

IV. Computation:
Male/low:     E = (54 × 160) / 189 = 45.71
Male/med:     E = (70 × 160) / 189 = 59.26
Male/high:    E = (65 × 160) / 189 = 55.03
Female/low:   E = (54 × 29) / 189 = 8.29
Female/med:   E = (70 × 29) / 189 = 10.74
Female/high:  E = (65 × 29) / 189 = 9.97

 O       E       O − E    (O − E)²/E
45     45.71     -0.71     0.011028
60     59.26      0.74     0.009241
55     55.03     -0.03     0.000016
 9      8.29      0.71     0.060808
10     10.74     -0.74     0.050987
10      9.97      0.03     0.000090
        χ² = Σ (O − E)²/E = 0.132170

Decision: Since the critical χ² value of 5.99 > the computed χ² value of 0.132170, the
null hypothesis, Ho, is accepted while the alternative hypothesis, H1, is rejected.
Therefore, there is no significant relationship between the gender of the employees and
their job satisfaction level.
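
The expected frequencies and the χ² statistic for this contingency table can be reproduced with a few lines of code. This is a minimal sketch using only the standard library; the critical value 5.99 is taken from the text rather than computed. Because the script does not round the expected values to two decimals, its χ² differs slightly from a hand computation that does.

```python
# Observed frequencies: rows = Male, Female; columns = Low, Medium, High.
observed = [[45, 60, 55],
            [9, 10, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (column total x row total) / grand total, for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 5.99   # critical value at 0.05 with df = 2, as given in the text

print([[round(e, 2) for e in row] for row in expected])
print(f"chi2={chi2:.4f} df={df}")
print("accept Ho" if chi2 < critical else "reject Ho")
```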


Note: For any hypothesis test that compares the critical value of the test statistic
with the computed value, when the critical value is greater than the computed
value (critical value > computed value), the null hypothesis is accepted, leading to the
rejection of the alternative hypothesis. But when the critical value is less than the
computed value (critical value < computed value), the null hypothesis is rejected, leading
to the acceptance of the alternative hypothesis.

Exercise:
Test the hypothesis that there is no significant relationship between the students'
class level and their attitudes with respect to fraternities, using the 5% level of
significance.

Students    Favorable    Neutral    Unfavorable    Total
Junior          80          60          70
Senior         100          50          70
Total

B. LINEAR CORRELATION

Correlation analysis is used to measure the nature of the relationship or
association between variables.

The PEARSON Product Moment Correlation

The Pearson product moment correlation reveals the magnitude and direction
of relationships.
Pearson's r measures relationships between variables that are linearly related. Its value
ranges from +1 through 0 to -1. The r symbolizes the coefficient's estimate of linear
association based on sample data. The formula for Pearson's r is:

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^2 - (\sum x)^2\,]\,[\,n\sum y^2 - (\sum y)^2\,]}} $$

Where: x = observed data for the independent variable
y = observed data for the dependent variable
n = sample size
r = degree of relationship between x and y


Range of values of Pearson's r

Range of Values         Interpretation
± 1.00                  Perfect positive/negative correlation
± 0.91 to ± 0.99        Very high positive/negative correlation
± 0.71 to ± 0.90        High positive/negative correlation
± 0.51 to ± 0.70        Moderate positive/negative correlation
± 0.31 to ± 0.50        Low positive/negative correlation
± 0.01 to ± 0.30        Negligible positive/negative correlation
0.00                    No correlation
Illustrative Problem:

A research study was conducted to determine the correlation between students'
grades in English and their grades in Mathematics. A random sample of 10 Education
students majoring in Physics at a certain university was taken, and the results of the
sampling are tabulated as shown. Use the 5% level of significance.

Student Number       1    2    3    4    5    6    7    8    9   10
English grade       93   89   84   91   90   83   75   81   84   77
Mathematics grade   91   86   80   88   89   87   78   78   85   76

Solution: Let x = grade in English
y = grade in Mathematics

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^2 - (\sum x)^2\,]\,[\,n\sum y^2 - (\sum y)^2\,]}} $$


Stud. No    English x    Math y      xy       x²       y²
 1             93          91       8463     8649     8281
 2             89          86       7654     7921     7396
 3             84          80       6720     7056     6400
 4             91          88       8008     8281     7744
 5             90          89       8010     8100     7921
 6             83          87       7221     6889     7569
 7             75          78       5850     5625     6084
 8             81          78       6318     6561     6084
 9             84          85       7140     7056     7225
10             77          76       5852     5929     5776

n = 10    ∑x = 847    ∑y = 838    ∑xy = 71,236    ∑x² = 72,067    ∑y² = 70,480

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^2 - (\sum x)^2\,]\,[\,n\sum y^2 - (\sum y)^2\,]}} $$

$$ r = \frac{10(71{,}236) - (847)(838)}{\sqrt{[\,10(72{,}067) - (847)^2\,]\,[\,10(70{,}480) - (838)^2\,]}} = \frac{2574}{\sqrt{(3261)(2556)}} $$

r = 0.8916, say 0.89 - high correlation (from the range of values)

TESTING THE SIGNIFICANCE OF r

Although the value of r obtained is high (0.89), we still cannot be sure that it
is statistically significant, so we have to test the significance of r. The t-test will be
used.


Formula: $t = r \sqrt{\frac{n - 2}{1 - r^2}}$

where: n − 2 = degrees of freedom; r = Pearson's r coefficient

I. Statement of Hypotheses:

Ho: There is no correlation between grades in English and grades in
Mathematics.
H1: There is a correlation between grades in English and grades in
Mathematics.

II. Statistical Test:

$$ t = r \sqrt{\frac{n - 2}{1 - r^2}} $$

III. Level of Significance and Critical Value:

@ α = 0.05 and df = n – 2 = 10 – 2 = 8, critical value of t = 2.306

IV. Computation:

$$ t = 0.89 \sqrt{\frac{10 - 2}{1 - (0.89)^2}} = 5.52 $$

Conclusion: Inasmuch as the critical t-value of 2.306 < the computed t-value
of 5.52, the null hypothesis (Ho) is rejected while the alternative hypothesis (H1)
is accepted. Therefore, there is a significant correlation between grades in English
and grades in Mathematics.
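
The computation of r and of the t statistic can be verified with the raw grades. The sketch below applies the same raw-score formula term by term and uses only the standard library. It carries r at full precision into the t formula, so its t value comes out slightly higher than the 5.52 obtained above with r rounded to 0.89; the critical value 2.306 is taken from the text.

```python
import math

english = [93, 89, 84, 91, 90, 83, 75, 81, 84, 77]   # x
math_grade = [91, 86, 80, 88, 89, 87, 78, 78, 85, 76]  # y
n = len(english)

sum_x, sum_y = sum(english), sum(math_grade)
sum_xy = sum(x * y for x, y in zip(english, math_grade))
sum_x2 = sum(x * x for x in english)
sum_y2 = sum(y * y for y in math_grade)

# Pearson's r from the raw-score formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# t statistic for testing the significance of r, with df = n - 2.
t = r * math.sqrt((n - 2) / (1 - r ** 2))
critical_t = 2.306   # two-tailed, alpha = 0.05, df = 8 (from the text)

print(f"r={r:.4f} t={t:.2f}")
print("reject Ho" if abs(t) > critical_t else "accept Ho")
```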


Exercise:

Ten employees in one industrial organization have the following characteristics of number of years of
experience(X) and yearly salary (Y)(given in thousand pesos). Solve the Pearson product –moment
correlation (r) for the data and interpret the result.
SN X Y XY X2 Y2
1 7 18
2 11 16
3 33 25
4 24 22
5 5 19
6 18 23
7 35 24
8 12 19
9 9 21
10 10 26

C. ANOVA (Analysis of Variance)

Used for testing the null hypothesis that the means of several populations are equal.
The means of 3 or more populations which follow normal distributions
can be compared simultaneously in just one application of this test. This test, therefore, is
the generalization of the Z-test and t-test of two normal population means.

ANOVA uses a single-factor, fixed-effects model to compare the effects of one factor
on a continuous dependent variable. It uses squared deviations, or variances, so that the
distances of individual data points from their own mean or from the
grand mean can be summed.

The test statistic for ANOVA is the F-ratio, which compares the variances from the two
sources (between groups and within groups).

Formula:

$$ F\text{ ratio} = \frac{MSB}{MSW} = \frac{\text{Between-groups variance}}{\text{Within-groups variance}} = \frac{\text{Mean square between}}{\text{Mean square within}} $$

Where:

$$ MSB = \frac{\text{Sum of squares between}}{\text{Degrees of freedom between}} = \frac{SSB}{df_B} $$

Degrees of freedom for SSB: dfB = k − 1, where k pertains to the number of groups
or samples.

$$ MSW = \frac{\text{Sum of squares within}}{\text{Degrees of freedom within}} = \frac{SSW}{df_W} $$

Degrees of freedom for SSW: dfW = k(n − 1), where n pertains to the number of items
per column (the size of each sample).

$$ \sum X = \sum X_A + \sum X_B + \sum X_C + \dots $$

$$ \sum X^2 = \sum X_A^2 + \sum X_B^2 + \sum X_C^2 + \dots $$

$$ SST = \sum X^2 - \frac{(\sum X)^2}{N} $$

$$ SSB = \frac{(\sum X_A)^2 + (\sum X_B)^2 + (\sum X_C)^2}{n} - \frac{(\sum X)^2}{N} $$

$$ SSW = SST - SSB $$

Illustrative Problem:

Three brands of infant's powdered milk (infant formula) were given to three groups of
8 infants each, and the results were monitored for a certain period of time during an outreach
program of a certain university in Cavite. The results in terms of weight gain are
tabulated below:

Respondent   Brand A (XA)   Brand B (XB)   Brand C (XC)
1            4.5            3.2            3.0
2            4.1            3.0            2.8
3            3.6            3.8            3.2
4            5.3            3.9            3.6
5            4.8            4.2            3.5
6            2.7            3.1            3.5
7            4.3            4.0            2.9
8            3.8            3.3            3.6

Test the hypothesis that there is no significant difference in the mean growth of the three
groups of infants given the three brands of infant powdered milk at the 0.01 level.


I. Statement of Hypotheses
Ho: There is no significant difference in the mean growth of the three
groups of infants given the three brands of infant powdered milk.
H1: There is a significant difference … given the three brands of infant
powdered milk.
Solution (completing the table below):

              Brand A           Brand B           Brand C
Respondent    XA      XA²       XB      XB²       XC      XC²
1             4.5     20.25     3.2     10.24     3.0     9.00
2             4.1     16.81     3.0     9.00      2.8     7.84
3             3.6     12.96     3.8     14.44     3.2     10.24
4             5.3     28.09     3.9     15.21     3.6     12.96
5             4.8     23.04     4.2     17.64     3.5     12.25
6             2.7     7.29      3.1     9.61      3.5     12.25
7             4.3     18.49     4.0     16.00     2.9     8.41
8             3.8     14.44     3.3     10.89     3.6     12.96
Σ             33.1    141.37    28.5    103.03    26.1    85.91


k=3
n=8
N = 3x8 = 24

II. Statistical test: Use the F-ratio.

III. Level of significance and critical value

At the 0.01 level of significance:
dfB = k - 1 = 3 - 1 = 2
dfW = k(n - 1) = 3(8 - 1) = 21
Critical F value = 5.78

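IV. Computation:

$$F = \frac{MSB}{MSW}$$

$$\Sigma X = \Sigma X_A + \Sigma X_B + \Sigma X_C = 33.1 + 28.5 + 26.1 = 87.7$$

$$\Sigma X^2 = \Sigma X_A^2 + \Sigma X_B^2 + \Sigma X_C^2 = 330.31$$

$$SST = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 330.31 - \frac{(87.7)^2}{24} = 9.84$$

$$SSB = \frac{(33.1)^2 + (28.5)^2 + (26.1)^2}{8} - \frac{(87.7)^2}{24} = 3.16$$

$$SSW = SST - SSB = 9.84 - 3.16 = 6.68$$

$$MSB = \frac{SSB}{df_B} = \frac{3.16}{2} = 1.58$$

$$MSW = \frac{SSW}{df_W} = \frac{6.68}{21} = 0.32$$

$$\text{Therefore: } F = \frac{MSB}{MSW} = \frac{1.58}{0.32} = 4.94$$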
Decision: Since the critical F value of 5.78 is greater than the computed F ratio
of 4.94, the null hypothesis (Ho) is accepted (not rejected). Therefore, there is no significant
difference in the mean growth of the three groups of infants given the three
brands of infant powdered milk.
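
The computation above can be checked with a short Python sketch that follows the same sums-of-squares formulas (SST, SSB, SSW = SST - SSB). It assumes equal group sizes, as in this problem, and takes the critical value 5.78 from the text rather than computing it; the variable names are only illustrative.

```python
# Weight gains from the illustrative problem, one list per brand.
brand_a = [4.5, 4.1, 3.6, 5.3, 4.8, 2.7, 4.3, 3.8]
brand_b = [3.2, 3.0, 3.8, 3.9, 4.2, 3.1, 4.0, 3.3]
brand_c = [3.0, 2.8, 3.2, 3.6, 3.5, 3.5, 2.9, 3.6]
groups = [brand_a, brand_b, brand_c]

k = len(groups)        # number of groups
n = len(groups[0])     # size of each sample (equal sizes assumed)
N = k * n              # total number of observations

sum_x = sum(sum(g) for g in groups)               # ΣX
sum_x2 = sum(x * x for g in groups for x in g)    # ΣX²
correction = sum_x ** 2 / N                       # (ΣX)² / N

sst = sum_x2 - correction
ssb = sum(sum(g) ** 2 for g in groups) / n - correction
ssw = sst - ssb

df_b = k - 1
df_w = k * (n - 1)
msb = ssb / df_b
msw = ssw / df_w
f_ratio = msb / msw

critical_f = 5.78  # tabular F at the 0.01 level with df = (2, 21), as quoted in the text

print(f"SST = {sst:.2f}, SSB = {ssb:.2f}, SSW = {ssw:.2f}")
print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_ratio:.2f}")
print("Reject Ho" if f_ratio > critical_f else "Do not reject Ho")
```

Carried at full precision, the F ratio comes out slightly higher than 4.94 (about 4.98), because the hand computation rounds MSB and MSW to two decimals before dividing. It is still below 5.78, so the decision does not change.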

Exercise
A. Three administrators were tasked with packing noodles in plastic cups that must
weigh 200 grams each. A random sample of 6 plastic cups from each administrator was
weighed and the results are tabulated below. Test the hypothesis that there is no significant
difference in the average weight of the cup noodles packed by the 3
administrators at the 0.05 level.

Administrator
Cup A B C
1 198 188 199
2 201 195 200
3 196 193 198
4 201 196 201
5 199 200 198
6 196 190 197


D. REGRESSION ANALYSIS

This section deals with the simplest type of prediction. When we take the
observed values of X to estimate or predict corresponding Y values, the process is
called simple prediction. When more than one X variable is used, the outcome is a
function of multiple predictors. Simple and multiple predictions are made using a
technique called regression analysis.

Regression is a term used to describe the process of estimating the
relationship between two variables. The relationship is estimated by fitting a straight
line through the given data. The method of least squares permits us to find a line of
best fit, called the regression line, which keeps the errors of prediction to a minimum. The
equation for a fitted straight line is:

$$y = a + bx$$

where: y = predicted value
a = y-intercept
b = slope of the line (regression coefficient)

To find the y-intercept (a):

$$a = \bar{y} - b\bar{x}$$

where: $\bar{x}$ = mean of the x-values
$\bar{y}$ = mean of the y-values

To find the slope (b):

$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2}$$

Illustrative Problem

Dr. Fred Santos, the Administrator of the biggest University in Asia, would
like to estimate the number of enrollees to be expected in the 7th week of their
2-month-long (8-week) school promotion. The number of enrollees during the
past 6 weeks is tabulated below.

Week number (x)                    1    2    3    4    5    6    7
Number of enrollees, in hundreds   6    5.5  6.4  5.1  4.9  6.6  ?

Determine the predicted number of enrollees in the 7th week.

Solution.

Week No. (x)   No. of enrollees (y)   xy      x²
1              6                      6       1
2              5.5                    11      4
3              6.4                    19.2    9
4              5.1                    20.4    16
5              4.9                    24.5    25
6              6.6                    39.6    36
7              ?

Σx = 21        Σy = 34.5              Σxy = 120.7   Σx² = 91

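$$b = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{6(120.7) - (21)(34.5)}{6(91) - (21)^2} = -0.0029$$

$$a = \bar{y} - b\bar{x}, \qquad \bar{x} = \frac{\Sigma x}{n} = \frac{21}{6} = 3.5, \qquad \bar{y} = \frac{\Sigma y}{n} = \frac{34.5}{6} = 5.75$$

$$a = 5.75 - (-0.0029)(3.5) = 5.76$$

By the regression equation:

$$y = a + bx = 5.76 + (-0.0029)(7) = 5.74$$

The predicted number of enrollees in the 7th week is 5.74 hundred, or about 574.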
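
The same least-squares steps can be sketched in Python as a check on the arithmetic. The sketch below uses only the slope and intercept formulas given in this section; the helper name `predict` is an illustrative choice, not part of the original notes.

```python
# Enrollment data for the first six weeks (enrollees in hundreds).
weeks = [1, 2, 3, 4, 5, 6]
enrollees = [6, 5.5, 6.4, 5.1, 4.9, 6.6]

n = len(weeks)
sum_x = sum(weeks)
sum_y = sum(enrollees)
sum_xy = sum(x * y for x, y in zip(weeks, enrollees))
sum_x2 = sum(x * x for x in weeks)

# Slope: b = [nΣxy - (Σx)(Σy)] / [nΣx² - (Σx)²]
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: a = mean(y) - b * mean(x)
a = sum_y / n - b * (sum_x / n)


def predict(x):
    """Predicted y on the fitted line y = a + bx."""
    return a + b * x


y7 = predict(7)
print(f"b = {b:.4f}, a = {a:.2f}, predicted week 7 = {y7:.2f} hundred")
```

The slope is very close to zero, so the fitted line is nearly flat and the prediction stays near the six-week mean of 5.75 hundred.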

E. The table below shows the monthly income (x) and the monthly expenses (y) of 7
families in a certain barangay in Makati. Estimate the monthly expenditure of the
family whose income is P 8250.

Family Number   Income (x)   Expenses (y)   xy   x²
1               6600         4980
2               5875         4680
3               7250         5650
4               4925         3700
5               5678         5668
6               5975         4260
7               6950         6380
8               8250         ?

THE PARAMETRIC TEST

Parametric tests are tests that require normally distributed data measured at the
interval or ratio level.

The parametric tests include the t-test, z-test, F-test and analysis of variance for
tests of difference; r, the Pearson Product-Moment Coefficient of Correlation, for
tests of relationship or association; and Simple Linear Regression Analysis and
Multiple Regression Analysis for prediction and forecasting.


I. The t-Test. The t-test is used to compare two means: the means of two
independent samples or groups, or the means of correlated
samples before and after a treatment. Ideally, the t-test is used when there
are fewer than 30 observations, but some researchers use the t-test even with
more than 30.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Where: t = the t-test statistic
$\bar{X}_1$ = mean of group 1
$\bar{X}_2$ = mean of group 2
SS1 = sum of squares of group 1
SS2 = sum of squares of group 2
n1 = number of observations in group 1
n2 = number of observations in group 2
Solution:
_______________________________________________________________
Male (X1)   X1²    Female (X2)   X2²
14          196    12            144
18          324    9             81
17          289    11            121
16          256    5             25
4           16     10            100
14          196    3             9
12          144    7             49
10          100    2             4
9           81     6             36
17          289    13            169

ΣX1 = 131   ΣX1² = 1891   ΣX2 = 78   ΣX2² = 738


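n1 = 10   n2 = 10

$$\bar{X}_1 = 13.1 \qquad \bar{X}_2 = 7.8$$

$$SS_1 = \Sigma X_1^2 - \frac{(\Sigma X_1)^2}{n_1} = 1891 - \frac{(131)^2}{10} = 174.9$$

$$SS_2 = \Sigma X_2^2 - \frac{(\Sigma X_2)^2}{n_2} = 738 - \frac{(78)^2}{10} = 129.6$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{13.1 - 7.8}{\sqrt{\left(\dfrac{174.9 + 129.6}{10 + 10 - 2}\right)\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}}$$

t = 2.88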
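
As a check on the hand computation, the pooled t statistic can be sketched in Python using the same sum-of-squares formula. The helper name `sum_of_squares` is an illustrative choice; the data are the male and female spelling scores from the solution table.

```python
from math import sqrt

# Scores from the solution table.
male = [14, 18, 17, 16, 4, 14, 12, 10, 9, 17]
female = [12, 9, 11, 5, 10, 3, 7, 2, 6, 13]


def sum_of_squares(values):
    """SS = ΣX² - (ΣX)² / n, as used in the text."""
    return sum(v * v for v in values) - sum(values) ** 2 / len(values)


n1, n2 = len(male), len(female)
mean1, mean2 = sum(male) / n1, sum(female) / n2
ss1, ss2 = sum_of_squares(male), sum_of_squares(female)

# Pooled variance estimate times (1/n1 + 1/n2), then the t ratio.
pooled = (ss1 + ss2) / (n1 + n2 - 2)
t = (mean1 - mean2) / sqrt(pooled * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(f"mean1 = {mean1}, mean2 = {mean2}")
print(f"SS1 = {ss1:.1f}, SS2 = {ss2:.1f}, df = {df}, t = {t:.2f}")
```

The critical value is not computed here; it would still be read from a t table for the chosen significance level and degrees of freedom.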

Solving by the Stepwise Method:

I. Problem: Is there a significant difference between the performance of male
and female students in spelling?
II. Hypothesis: H0: There is no significant difference between the performance
of male and female AB students in spelling.
H0: $\bar{X}_1 = \bar{X}_2$
H1: There is a significant difference between the performance of male
and female AB students in spelling.
H1: $\bar{X}_1 \neq \bar{X}_2$
III. Level of Significance:
α = .05; df = n1 + n2 - 2 = 10 + 10 - 2 = 18; t-tabular value = 2.101
IV. Statistics: t-test for independent samples
V. Decision Rule: If the t-computed value is greater than the t-tabular/critical
value, reject the null hypothesis (H0).
VI. Conclusion: Since the t-computed value of 2.88 is greater than the t-tabular
value of 2.101 at the .05 level of significance with 18 degrees of freedom, the null
hypothesis is rejected in favor of the alternative hypothesis. This means that
there is a significant difference between the performance of male and female
AB students in spelling. It implies that the male students performed better than
the female students, considering that the mean score of the male
students (13.1) is greater than the mean score of the female students
(7.8).

Exercise 1.
Two groups of experimental rats were injected with a tranquilizer at 1.0 mg and 1.5
mg doses, respectively. The time, in seconds, that it took them to fall asleep is given
below. Use the t-test for independent samples at the .01 level to test the null hypothesis that the
difference in dosage has no effect on the length of time it took them to fall asleep.

1.0 mg. dose 9.8 13.2 11.2 9.5 13.0 12.1 9.8 12.3 7.9 10.2 9.7
1.5 mg. dose 12.0 7.4 9.8 11.5 13.0 12.5 9.8 10.5 13.5

Exercise 2.
To find out whether a new serum would arrest leukemia, 16 patients, who had all
reached an advanced stage of the disease, were selected. Eight patients received the
treatment and eight did not. Survival time was measured from the time the experiment was
conducted.

No Treatment ( x1) 2.1 3.2 3.0 2.8 2.1 1.2 1.8 1.9
With treatment ( x2 ) 4.2 5.1 5.0 4.6 3.9 4.3 5.2 3.9

THE t-TEST FOR CORRELATED SAMPLES

The t-test for correlated samples is used when comparing the means before and
after a treatment. It is also used to compare the means of a pretest and a posttest.
Formula:

$$t = \frac{\bar{D}}{\sqrt{\dfrac{\Sigma D^2 - \dfrac{(\Sigma D)^2}{n}}{n(n - 1)}}}$$

where: $\bar{D}$ = the mean difference between the pretest and the posttest
ΣD² = the sum of the squared differences between the pretest and the posttest
ΣD = the sum of the differences between the pretest and the posttest
n = the sample size


Example: An experimental study was conducted on the effect of programmed materials
in English on the performance of 20 selected college students. Before the program was
implemented, a pretest was administered, and after 5 months the same instrument
was used to get the posttest result. The following is the result of the experiment.

Pretest (X1)   Posttest (X2)   D = X1 - X2   D²
20 25 -5 25
30 35 -5 25
10 25 -15 225
15 25 -10 100
20 20 0 0
10 20 -10 100
18 22 -4 16
14 20 -6 36
15 20 -5 25
20 15 5 25
18 30 -12 144
15 10 5 25
15 16 -1 1
20 25 -5 25
18 10 8 64
40 45 -5 25
10 15 -5 25
10 10 0 0
12 18 -6 36
20 25 -5 25
ΣD = -81   ΣD² = 947

$$\bar{D} = \frac{\Sigma D}{n} = \frac{-81}{20} = -4.05$$

Using the formula for t:

$$t = \frac{-4.05}{\sqrt{\dfrac{947 - \dfrac{(-81)^2}{20}}{20(20 - 1)}}}$$

t = -3.17
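
The paired computation can be verified with a short Python sketch that builds the D and D² columns and applies the formula for correlated samples given above. The variable names are illustrative, and the differences are taken as pretest minus posttest, matching the table.

```python
from math import sqrt

# Pretest (X1) and posttest (X2) scores of the 20 students, in table order.
pretest = [20, 30, 10, 15, 20, 10, 18, 14, 15, 20,
           18, 15, 15, 20, 18, 40, 10, 10, 12, 20]
posttest = [25, 35, 25, 25, 20, 20, 22, 20, 20, 15,
            30, 10, 16, 25, 10, 45, 15, 10, 18, 25]

# D = X1 - X2 for each student, as in the table.
diffs = [x1 - x2 for x1, x2 in zip(pretest, posttest)]

n = len(diffs)
sum_d = sum(diffs)                   # ΣD
sum_d2 = sum(d * d for d in diffs)   # ΣD²
mean_d = sum_d / n                   # mean difference

# t = mean(D) / sqrt( [ΣD² - (ΣD)²/n] / [n(n - 1)] )
t = mean_d / sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
df = n - 1

print(f"ΣD = {sum_d}, ΣD² = {sum_d2}, mean D = {mean_d}")
print(f"df = {df}, t = {t:.2f}")
```

The negative sign of t only reflects the direction of subtraction: posttest scores are, on average, higher than pretest scores.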

Solving by the Stepwise Method:

I. Problem: Is there a significant difference between the pretest and the posttest on
the use of programmed materials in English?
II. Hypothesis: H0: There is no significant difference between the pretest and the
posttest; that is, the use of the programmed materials did not affect the
students' performance in English.
H1: The posttest result is higher than the pretest result.
III. Level of Significance:
α = .05
df = n - 1 = 20 - 1 = 19
t at .05 = -1.729 ≈ -1.73

IV. Statistics: t-test for correlated samples

V. Decision Rule: If the t-computed value is greater than or beyond the critical
value, reject the null hypothesis.
VI. Conclusion: Since the t-computed value of -3.17 is beyond the t-critical value
of -1.73 at the .05 level of significance with 19 degrees of
freedom, the null hypothesis is rejected in favor of
the research hypothesis. This means that the posttest result
is higher than the pretest result. It implies that the use of the
programmed materials in English is effective.

** The One-Sample Mean Test ( t- test)

Exercise
An admission test was administered to incoming freshmen in the College of Nursing
and the College of Veterinary Medicine, with 100 randomly selected students each. The mean
scores of the two samples were x1 = 90 and x2 = 85, and the variances of the test scores
were 40 and 35, respectively. Is there a significant difference between the two groups?
Use the .01 level of significance.