[Figure: fields where statistics is applied, including measurements, medicine, psychology and government]
Terminology
• Data: a set of observations, values, elements or objects under consideration.
• Population (N): the complete set of all possible observations or elements.
• Universe: the set of all entities under study.
• Sample (n): a representative part of a population.
• Variable: an attribute of interest that can be observed on each entity in the universe.
Types of data
1. Qualitative data (attributes): data in the form of categories, for example
   • students classified according to sex
   • age level
   • curriculum level
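• Two lengths measured from zero: 90 cm and 40 cm.
• 90/40 = 9/4. Now add 5 cm to both lengths:
• 95/45 = 19/9 ≠ 9/4, so adding a constant to both values changes their ratio.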
Subscript and summation notation
• ∑ (the summation symbol, Greek capital letter sigma) denotes that the subscripted variables are to be added.
• $x_i$ (read "x sub i"), where i stands for the numbers 1, 2, 3, …, n.
• $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$
• $\sum_{i=1}^{4} x_i^3 = x_1^3 + x_2^3 + x_3^3 + x_4^3$
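A minimal Python sketch of the Σ notation, assuming the values x1, …, xn are stored in a list. The helper name `summation` and the four values used are hypothetical, chosen only to illustrate the two expressions above.

```python
def summation(values, power=1):
    """Return x_1**power + x_2**power + ... + x_n**power (Σ x_i^power for i = 1..n)."""
    return sum(x ** power for x in values)


xs = [1, 2, 3, 4]  # hypothetical x_1, x_2, x_3, x_4

print(summation(xs))     # Σ x_i   = 1 + 2 + 3 + 4
print(summation(xs, 3))  # Σ x_i^3 = 1 + 8 + 27 + 64
```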
Continuation of Subscript and summation notation
• $\sum x^2 \overset{?}{=} \left(\sum x\right)^2$
• $\sum x^2 \sum y^2 \overset{?}{=} \sum x^2 y^2$
• $\sum x \sum y \overset{?}{=} \sum xy$
x    y    xy    x²    y²    x²y²
5    2    10    25    4     100
2    6    12    4     36    144
3    5    15    9     25    225
4    3    12    16    9     144
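The three questions about summation can be checked numerically with the x and y columns of the table. This Python sketch computes both sides of each; for this data no pair is equal, which is enough to show that none of the three equalities holds in general.

```python
x = [5, 2, 3, 4]
y = [2, 6, 5, 3]

sum_x = sum(x)                                       # Σx
sum_y = sum(y)                                       # Σy
sum_xy = sum(a * b for a, b in zip(x, y))            # Σxy
sum_x2 = sum(a ** 2 for a in x)                      # Σx²
sum_y2 = sum(b ** 2 for b in y)                      # Σy²
sum_x2y2 = sum((a * b) ** 2 for a, b in zip(x, y))   # Σx²y²

print("Σx² =", sum_x2, "  (Σx)² =", sum_x ** 2)
print("Σx²Σy² =", sum_x2 * sum_y2, "  Σx²y² =", sum_x2y2)
print("ΣxΣy =", sum_x * sum_y, "  Σxy =", sum_xy)
```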
• Systematic Sampling
Simple Frequency Table
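• Steps
1. Determine the range (highest score − lowest score).
2. Decide on the number of class intervals or categories (usually 5 to 15); the number of classes ≈ range / i, where i is the class size.
3. Write the class intervals, starting with the lowest lower limit.
4. Determine the frequency of each class.
5. Compute the class mark: (LL + UL)/2.
6. Determine the class boundaries: 0.5 below the lower limit (LL) and 0.5 above the upper limit (UL).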
10 12 16 18 20 26 29 30 33 36 39 40 44 45 49 50 52 54 54
11 13 15 19 21 24 26 27 33 32 34 39 45 46 49 50 52 51 53
14 17 19 22 25 29 32 35 36 45 44 47 48 50 51 53 52 54 53
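The six steps can be sketched in Python for the scores listed above. This assumes integer scores and five classes; `frequency_table` is a hypothetical helper, and the class size is rounded up from (range + 1) / number of classes so that the highest score always falls in the last class. With range 44 and 5 classes this gives i = 9, so the classes are 10-18, 19-27, 28-36, 37-45 and 46-54.

```python
import math

scores = [
    10, 12, 16, 18, 20, 26, 29, 30, 33, 36, 39, 40, 44, 45, 49, 50, 52, 54, 54,
    11, 13, 15, 19, 21, 24, 26, 27, 33, 32, 34, 39, 45, 46, 49, 50, 52, 51, 53,
    14, 17, 19, 22, 25, 29, 32, 35, 36, 45, 44, 47, 48, 50, 51, 53, 52, 54, 53,
]


def frequency_table(data, num_classes):
    """Return rows of ((LL, UL), frequency, class mark, (lower, upper boundary))."""
    low, high = min(data), max(data)
    data_range = high - low                                # step 1: range
    width = math.ceil((data_range + 1) / num_classes)      # step 2: class size i (rounded up)
    rows = []
    for k in range(num_classes):
        ll = low + k * width                               # step 3: start at the lowest lower limit
        ul = ll + width - 1
        freq = sum(ll <= v <= ul for v in data)            # step 4: class frequency
        mark = (ll + ul) / 2                               # step 5: class mark
        boundaries = (ll - 0.5, ul + 0.5)                  # step 6: class boundaries
        rows.append(((ll, ul), freq, mark, boundaries))
    return rows


for (ll, ul), f, mark, (lb, ub) in frequency_table(scores, 5):
    print(f"{ll}-{ul}  f={f}  class mark={mark}  boundaries={lb}-{ub}")
```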
[Figure: vertical bar graph of Series 1, 2 and 3 across Categories 1 to 4]
Horizontal Bar Graph
[Figure: horizontal bar graph of Series 1, 2 and 3 across Categories 1 to 4]
Line graph
[Figure: line graph of Series 1, 2 and 3 across Categories 1 to 4]
Pie Chart
[Figure: pie chart titled "Sales" with slices for the 1st, 2nd, 3rd and 4th quarters]
Class interval   Class mark
10-18            14
19-27            23
28-36            32
37-45            41
46-54            50
Measures of Central Tendency
(Measures of Position)
• A measure of position or central tendency is a single figure that represents the general level of the magnitudes or values of the items in a set of data.
• Mean: the arithmetic average of a set of values or a distribution.
• Median: the middle value when the data are arranged in order.
• Mode: the value that occurs most often.
Mean
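Mean is the sum of all items divided by the total number of items or terms.
Example data sets:
12, 12, 13, 14, 14
15, 21, 21, 23, 23, 25
16, 16, 21, 21, 23, 24
Median examples (values arranged in order):
12, 13, 13, 14, 14 → Median = 13
21, 21, 23, 23, 24 → Median = 23
17, 20, 21, 24, 25 → Median = 21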
Median
• Characteristics • When to Use
Mode examples:
12, 13, 14, 15, 12 → Mode = 12
21, 21, 23, 23, 24, 25 → Modes = 21 and 23 (bimodal)
16, 21, 21, 23, 23, 24 → Modes = 21 and 23 (bimodal)
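A short sketch with Python's standard `statistics` module (Python 3.8 or later for `multimode`), applied to small illustrative score lists. `median` sorts the values itself, and `multimode` returns every most-frequent value, which is how a bimodal set shows up.

```python
import statistics

print(statistics.mean([12, 12, 13, 14, 14]))           # sum / count
print(statistics.median([21, 21, 23, 23, 24]))         # middle of 5 ordered values
print(statistics.median([12, 13, 13, 14, 14, 15]))     # even n: average of the two middle values
print(statistics.mode([12, 13, 14, 15, 12]))           # most frequent value
print(statistics.multimode([21, 21, 23, 23, 24, 25]))  # all modes of a bimodal set
```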
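• Positively skewed (skewed to the right): Mean > Median > Mode
• Negatively skewed (skewed to the left): Mode > Median > Mean
[Figure: skewed distributions marking the mean, median and mode. Source: mvpprograms.com (7/7/2011)]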
SOURCE: stewardess.inhatc.ac.kr (7/7/2011)
Kurtosis is the degree of peakedness of a distribution. A normal distribution is a mesokurtic distribution. A pure leptokurtic distribution has a higher peak than the normal distribution and heavier tails. A pure platykurtic distribution has a lower peak than a normal distribution and lighter tails.
MEAN FOR GROUPED DATA
10 12 16 18 20 26 29 30 33 36 39 40 44 45 45 50 52 54 54
11 13 15 19 21 24 26 27 29 30 33 32 34 39 45 45 52 51 53
18 11 15 14 17 19 22 25 29 32 35 36 45 44 45 53 52 54 53
Class interval   f    xi   f·xi
10-18            12   14   168
19-27            10   23   230
28-36            13   32   416
37-45            11   41   451
46-54            11   50   550
i = 9            N = 57    Σf·xi = 1815

$\bar{x} = \frac{\sum f x_i}{N} = \frac{1815}{57} = 31.84$
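The grouped-mean computation can be sketched in Python, assuming each class is given as a (lower limit, upper limit) pair; `grouped_mean` is a hypothetical helper name. The class marks are recomputed from the limits rather than typed in.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: Σ f·xi / N, where xi is the class mark (LL + UL) / 2."""
    marks = [(ll + ul) / 2 for ll, ul in classes]
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, marks)) / n


classes = [(10, 18), (19, 27), (28, 36), (37, 45), (46, 54)]
freqs = [12, 10, 13, 11, 11]
print(round(grouped_mean(classes, freqs), 2))  # 1815 / 57
```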
MEDIAN FOR GROUPED DATA
Class interval   TLL-TUL     f    <cf
10-18            9.5-18.5    12   12
19-27            18.5-27.5   10   22
28-36            27.5-36.5   13   35   ← median class
37-45            36.5-45.5   11   46
46-54            45.5-54.5   11   57
i = 9                        N = 57
• n/2 = 57/2 = 28.5, so the median class is 28-36, the first class whose <cf (35) reaches 28.5.
• $\text{Median} = TLL + \frac{n/2 - f_{m-1}}{f_m}\,(i)$
WHERE: $f_{m-1}$ is the sum of all frequencies before the median class
$f_m$ is the frequency of the median class
TLL is the true lower limit of the median class
i is the class size
$= 27.5 + \frac{28.5 - 22}{13}(9) = 27.5 + \frac{6.5}{13}(9) = 27.5 + (0.5)(9) = 27.5 + 4.5$
Median = 32
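A Python sketch of the grouped-median formula, assuming equal-width integer classes given as (LL, UL) pairs; `grouped_median` is a hypothetical helper. It walks the <cf column until it reaches n/2, then interpolates inside that class.

```python
def grouped_median(classes, freqs):
    """Median = TLL + (n/2 - cf_before) / f_m * i, for equal-width integer classes."""
    half = sum(freqs) / 2
    width = classes[0][1] - classes[0][0] + 1   # class size i (9 for 10-18)
    cum = 0                                     # <cf of the classes before the current one
    for (ll, _ul), f in zip(classes, freqs):
        if cum + f >= half:                     # first class whose <cf reaches n/2
            tll = ll - 0.5                      # true lower limit of the median class
            return tll + (half - cum) / f * width
        cum += f
    raise ValueError("frequencies must not be empty")


classes = [(10, 18), (19, 27), (28, 36), (37, 45), (46, 54)]
freqs = [12, 10, 13, 11, 11]
print(grouped_median(classes, freqs))  # 27.5 + (28.5 - 22) / 13 * 9
```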
MODE FOR GROUPED DATA
Class interval   TLL-TUL     f    <cf
10-18            9.5-18.5    12   12
19-27            18.5-27.5   10   22
28-36            27.5-36.5   13   35   ← modal class
37-45            36.5-45.5   11   46
46-54            45.5-54.5   11   57
i = 9                        N = 57

• $\text{Mode} = TLL + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,(i)$
Where:
TLL is the true lower limit of the modal class (the class with the highest frequency)
Δ1 is the frequency of the modal class minus the frequency of the class above it
Δ2 is the frequency of the modal class minus the frequency of the class below it
i is the class size

Δ1 = 13 − 10 = 3, Δ2 = 13 − 11 = 2
$\text{Mode} = 27.5 + \frac{3}{3 + 2}(9) = 27.5 + (0.6)(9) = 27.5 + 5.4 = 32.9$
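A Python sketch of the grouped-mode formula, using the usual definitions Δ1 = f(modal) − f(class before) and Δ2 = f(modal) − f(class after). `grouped_mode` is a hypothetical helper; it assumes equal-width integer classes and a modal class whose frequency is higher than at least one neighbour (otherwise Δ1 + Δ2 would be 0).

```python
def grouped_mode(classes, freqs):
    """Mode = TLL + Δ1 / (Δ1 + Δ2) * i, with Δ1 = f_m - f_before and Δ2 = f_m - f_after."""
    m = freqs.index(max(freqs))                   # modal class (first one if tied)
    width = classes[0][1] - classes[0][0] + 1     # class size i
    f_before = freqs[m - 1] if m > 0 else 0
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - f_before
    d2 = freqs[m] - f_after
    tll = classes[m][0] - 0.5                     # true lower limit of the modal class
    return tll + d1 / (d1 + d2) * width


classes = [(10, 18), (19, 27), (28, 36), (37, 45), (46, 54)]
freqs = [12, 10, 13, 11, 11]
print(round(grouped_mode(classes, freqs), 2))  # 27.5 + 3 / 5 * 9
```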
Measures of Variation
• Range
• Variance
• Standard Deviation
Range is the simplest measure of variation: the difference between the largest and the smallest value in a given set of data. A larger range suggests greater variation or dispersion, but the range is influenced by extreme values called outliers, because only two values are considered and all other values are ignored.
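$\sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
The final exam scores of five students were 80, 88, 92, 90 and 85. Determine the variance and standard deviation.
μ = 435/5 = 87
x     x − μ   (x − μ)²
88    1       1
80    −7      49
92    5       25
90    3       9
85    −2      4
Σ(x − μ)² = 88
σ² = 88/5 = 17.6
σ = √17.6 ≈ 4.20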
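The same population variance and standard deviation, sketched in Python. It divides by N, matching the formula above; Python's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.

```python
import math

scores = [80, 88, 92, 90, 85]
n = len(scores)

mu = sum(scores) / n                       # mean μ = 435 / 5
ss = sum((x - mu) ** 2 for x in scores)    # Σ(x − μ)²
variance = ss / n                          # population variance σ²
sd = math.sqrt(variance)                   # population standard deviation σ

print(mu, ss, variance, round(sd, 2))
```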
Properties of the normal curve
1. It is bell-shaped.
2. The mean, median and mode are equal and located at the center of the distribution.
3. The normal curve is unimodal.
4. The curve is symmetrical about the mean.
5. The curve is continuous.
6. The curve never touches the x-axis (it is asymptotic to the x-axis).
7. The total area under the normal distribution is 100% (that is, 1).
Areas under the normal curve
$Z = \frac{x - \mu}{\sigma}$
Where:
x is the score
μ is the mean
σ is the standard deviation
Area from z = 0 to z = 1.96 = 0.4750
Area = 0.4750 × 100% = 47.50%
Area from z = −1.53 to z = 0 = 0.4370 or 43.70%
Area from z = −1.3 to z = 0.99
−1.3 to 0 = 0.4032
0 to 0.99 = 0.3389
===============
Total area = 0.7421 or 74.21%
Area to the right of z = 0.71
Area (half) = 0.5000
0 to 0.71 = 0.2611 (−)
====================
Total area = 0.2389 or 23.89%
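Instead of a printed z table, the areas can be computed from the standard normal cumulative distribution, which Python's `math.erf` gives via Φ(z) = ½(1 + erf(z/√2)). This is a sketch; `phi` and `area_between` are hypothetical helper names. Results agree with the table values above to four decimals.

```python
import math


def phi(z):
    """Cumulative standard normal probability P(Z ≤ z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2 (z1 < z2)."""
    return phi(z2) - phi(z1)


print(round(area_between(0, 1.96), 4))     # table: 0.4750
print(round(area_between(-1.53, 0), 4))    # table: 0.4370
print(round(area_between(-1.3, 0.99), 4))  # table: 0.4032 + 0.3389
print(round(1 - phi(0.71), 4))             # table: 0.5000 − 0.2611
```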
The average weekly income of 2,000 workers is P151.00 with a standard deviation of P15.00. Assuming that the weekly incomes are normally distributed, find the number of workers who earn:
1. from P119.50 to P155.50 per week
2. less than or equal to P127.50 per week
3. greater than or equal to P185.50 per week
$Z = \frac{x - \mu}{\sigma}$, mean μ = P151.00, sd σ = P15.00
1. Z for P119.50 and P155.50
z119.50 = (119.50 − 151.00)/15 = −2.10; area from 0 = 0.4821
z155.50 = (155.50 − 151.00)/15 = 0.30; area from 0 = 0.1179
======
Total area = 0.6000 or 60%
Total number of workers whose salary falls from P119.50 to P155.50
= 2000 × 0.60
= 1,200 workers
2. less than or equal to P127.50 per week
z127.50 = (127.50 − 151.00)/15 = −1.5667 ≈ −1.57; area from 0 = 0.4418
Area (half) = 0.5000
0.5000 − 0.4418 = 0.0582 or 5.82%
Total number of workers whose salary is less than or equal to P127.50
= 2000 × 0.0582
= 116.4 or 116 workers
3. greater than or equal to P185.50 per week
z185.50 = (185.50 − 151.00)/15 = 2.30; area from 0 = 0.4893
Area (half) = 0.5000
0.5000 − 0.4893 = 0.0107 or 1.07%
Total number of workers whose salary is greater than or equal to P185.50
= 2000 × 0.0107
= 21.4 or 21 workers
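The three worker counts can be reproduced with a short Python sketch. Each z is rounded to two decimals, as when reading a printed z table, so z for P127.50 becomes −1.57 (−23.5/15 = −1.5667). `phi` and `z_score` are hypothetical helper names; Φ comes from `math.erf` instead of a table.

```python
import math

N_WORKERS = 2000
MEAN, SD = 151.00, 15.00


def phi(z):
    """Cumulative standard normal probability P(Z ≤ z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def z_score(x):
    """z = (x − mean) / sd, rounded to 2 decimals like a z-table lookup."""
    return round((x - MEAN) / SD, 2)


p_between = phi(z_score(155.50)) - phi(z_score(119.50))  # P119.50 to P155.50
p_at_most = phi(z_score(127.50))                         # at most P127.50
p_at_least = 1 - phi(z_score(185.50))                    # at least P185.50

for p in (p_between, p_at_most, p_at_least):
    print(round(p, 4), round(N_WORKERS * p))
```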
Simple Tests of Hypothesis
• A hypothesis is a statement or tentative theory that aims to explain facts about the real world.
• Hypotheses are subjected to testing: if found to be statistically true, they are accepted; if found to be false, they are rejected.
• Ho: the null hypothesis
• Ha: the alternative hypothesis
• Rejection of Ho implies acceptance of Ha.
• Acceptance of Ha implies rejection of Ho.
Type 1 and type 2 errors
Type 1 error (α error): rejecting the null hypothesis (action) when Ho is in fact true (actual condition), so the alternative hypothesis Ha is false. This is the producers' risk, because articles produced may be rejected or not sold.
Type 2 error (β error): accepting the null hypothesis (action) when Ho is in fact false (actual condition), so the alternative hypothesis Ha is true. This is the consumers' risk, because the consumer may accept products whose quality is too poor to meet the standards.
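[Figure: two-tailed test, with a rejection region in each tail and the acceptance region between them]
[Figure: one-tailed (one-sided) test to the right, with the rejection region of area α beyond the critical (tabular) value]
[Figure: one-tailed (one-sided) test to the left, Ha: M < Mo, with the rejection region of area α below the critical (tabular) value]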
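Z test (population standard deviation)
• One sample: $Z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$
• Two samples: $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
T-test (sample standard deviation)
• Two samples: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$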
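The three test statistics can be written as small Python functions, a sketch that mirrors the formulas term by term. The function names are hypothetical, and the two-sample t assumes the pooled (equal-variance) form shown above.

```python
import math


def z_one_sample(xbar, mu, sigma, n):
    """Z = (x̄ − μ)·√n / σ, used when the population standard deviation is known."""
    return (xbar - mu) * math.sqrt(n) / sigma


def z_two_sample(xbar1, xbar2, sigma, n1, n2):
    """Z = (x̄1 − x̄2) / (σ·√(1/n1 + 1/n2)), with a common population σ."""
    return (xbar1 - xbar2) / (sigma * math.sqrt(1 / n1 + 1 / n2))


def t_two_sample(xbar1, xbar2, s1, s2, n1, n2):
    """Pooled-variance t for two independent samples (df = n1 + n2 − 2)."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (xbar1 - xbar2) / (math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2))
```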
Steps in hypothesis Testing
• Formulate the null hypothesis that there is no significant difference between the items being compared. State the alternative hypothesis, which will be used in case Ho is rejected.
• Set the level of significance (α).
• Determine the test to be used: use z if the population standard deviation is given and t if the sample standard deviation is given.
• Determine the tabular value. Use the z table for a z test.
• For the t tabular value, compute the degrees of freedom (df), then look up the tabular value in the table of the t-distribution.
  – Single sample: df = n − 1
  – Two samples: df = n1 + n2 − 2
• Compute the z or t statistic.
• Compare the computed value with the tabular value.
  – Reject Ho if the absolute computed value is equal to or greater than (≥) the absolute tabular value.
  – Accept Ho if the absolute computed value is less than (<) the absolute tabular value.
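The degrees-of-freedom and decision steps can be sketched in Python; `degrees_of_freedom` and `decide` are hypothetical helper names. The tabular value itself still comes from a z or t table.

```python
def degrees_of_freedom(n1, n2=None):
    """df = n − 1 for a single sample, n1 + n2 − 2 for two samples."""
    return n1 - 1 if n2 is None else n1 + n2 - 2


def decide(computed, tabular):
    """Reject Ho when |computed| ≥ |tabular|; otherwise accept Ho."""
    return "reject Ho" if abs(computed) >= abs(tabular) else "accept Ho"


print(degrees_of_freedom(25))      # single sample
print(degrees_of_freedom(10, 12))  # two samples
print(decide(-2.183, 3.250))
```

If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the two-tailed t tabular value, which can replace the table lookup.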
Critical Values of Z
Test type                 α = 0.10   α = 0.05   α = 0.025   α = 0.01
One-tailed test (<, >)    ±1.28      ±1.645     ±1.96       ±2.33

Test type                 α = 0.10   α = 0.05   α = 0.02    α = 0.01
Two-tailed test (≠)       ±1.645     ±1.96      ±2.33       ±2.58
The average grade of freshman students was found to be 115. A sample of 25 students' grades has a mean of 126.7 and a standard deviation of 24.2. Is there reason to believe that the grades of the 25 students do not differ significantly from those of the others? Use α = 5% to test Ho: μ = 115 against Ha: μ ≠ 115.
Ho: There is no significant difference between the average grade of the freshman students and the grades of the 25 students (μ = 115).
Ha: There is a significant difference between the average grade of the freshman students and the grades of the 25 students (μ ≠ 115).
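α = 5% (two-tailed, α/2 = 0.025)   df = n − 1 = 25 − 1 = 24   t(0.025, 24) = 2.064
$t = \frac{(\bar{x} - \mu)\sqrt{n-1}}{s} = \frac{(126.7 - 115)\sqrt{25 - 1}}{24.2} = 2.3685$
Since tc = 2.3685 > t(0.025, 24) = 2.064, reject the null hypothesis and accept the alternative: there is a significant difference between the average grade of the freshman students and the grades of the 25 students.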
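A Python sketch of this one-sample t test. It uses the same form as the slide, t = (x̄ − μ)·√(n − 1)/s, which matches a standard deviation s computed with divisor n; with an (n − 1)-divisor s, √n would be used instead. The tabular value 2.064 (two-tailed, α = 0.05, df = 24) is hard-coded from a t table.

```python
import math


def t_one_sample(xbar, mu, s, n):
    """t = (x̄ − μ)·√(n − 1) / s, the form used in this example."""
    return (xbar - mu) * math.sqrt(n - 1) / s


t_c = t_one_sample(126.7, 115, 24.2, 25)
T_TAB = 2.064  # two-tailed, α = 0.05, df = 24 (from a t table)

print(round(t_c, 4))
print("reject Ho" if abs(t_c) >= T_TAB else "accept Ho")
```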
• One of the deans of a higher education institution claims that the average salary of the instructors in the college is P15,200.00 per month. The salaries of 10 instructors are P15,000, P15,100, P15,400, P15,300, P15,200, P14,700, P15,100, P14,800, P15,000 and P14,900. Is there enough evidence to reject the dean's claim at α = 0.01? Assume that the population is normally distributed.
• Ho: μ = P15,200/mo. (dean's claim)
• Ha: μ ≠ P15,200/mo.
• df = n − 1 = 10 − 1 = 9
• x̄ = P15,050
• s = 217.307
• tc = (x̄ − μ)/(s/√n) = (15,050 − 15,200)/(217.307/√10) = −2.183; |tc| = 2.183
• α = 0.01, α/2 = 0.005
• ttab = 3.250
• tcomp = 2.183 < ttab = 3.250
• Accept Ho and reject the alternative.
• At the α = 1% level, there is not enough evidence to reject the dean's claim that the mean salary of an instructor in the college is P15,200.00.
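The salary example can be checked with Python's `statistics` module. `statistics.stdev` uses the n − 1 divisor, which reproduces s = 217.307, and t = (x̄ − μ)/(s/√n). The tabular value 3.250 (two-tailed, α = 0.01, df = 9) is hard-coded from a t table.

```python
import math
import statistics

salaries = [15000, 15100, 15400, 15300, 15200, 14700, 15100, 14800, 15000, 14900]
mu0 = 15200                                # dean's claim (Ho)
n = len(salaries)

xbar = statistics.mean(salaries)           # sample mean
s = statistics.stdev(salaries)             # sample SD with n − 1 divisor
t_c = (xbar - mu0) / (s / math.sqrt(n))    # computed t
T_TAB = 3.250                              # two-tailed, α = 0.01, df = 9 (from a t table)

print(xbar, round(s, 3), round(t_c, 3))
print("reject Ho" if abs(t_c) >= T_TAB else "accept Ho")
```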
Regression & Correlation
The Pearson product-moment coefficient of correlation
Figure 1
• Correlation
Table 2. Correlations
                                               alcohol level   number of accidents (thousands)
alcohol level                Pearson Correlation   1              .984**
                             Sig. (2-tailed)                      .000
                             N                     7              7
number of accidents          Pearson Correlation   .984**         1
(thousands)                  Sig. (2-tailed)       .000
                             N                     7              7
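• ** significant at the 1% level
Interpretation: r = .984 indicates a very strong positive correlation between alcohol level and the number of accidents, significant at the 1% level.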
b′ = 1.029
a′ = Σy/n − b′ Σx/n = (195/7) − 1.029(140/7)
a′ = 27.86 − 20.58 = 7.28
y = 7.28 + 1.029x
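A Python sketch of the least-squares line y = a′ + b′x and of Pearson's r using the usual computational formulas. The seven raw data pairs are not listed in the slides, so the functions are shown on hypothetical data lying exactly on y = 1 + 2x; only the intercept is recomputed from the slide's totals (n = 7, Σx = 140, Σy = 195, b′ = 1.029).

```python
import math


def least_squares(x, y):
    """Return (a, b) for the line y = a + b·x fitted by least squares."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope b′
    a = sy / n - b * sx / n                         # intercept a′ = Σy/n − b′Σx/n
    return a, b


def pearson_r(x, y):
    """Pearson product-moment coefficient of correlation."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Intercept from the slide's totals
a_prime = 195 / 7 - 1.029 * 140 / 7
print(round(a_prime, 2))

# Hypothetical data on the line y = 1 + 2x
print(least_squares([1, 2, 3, 4], [3, 5, 7, 9]))
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))
```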
Table 4. Simple Regression
Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model 1           B         Std. Error          Beta             t        Sig.
(Constant)        7.286     1.863                                3.912    .011
alcohol level     1.029     .083                .984             12.348   .000
a. Dependent Variable: number of accidents (thousands)
Interpretation:
• Number of accidents = 7.286 + 1.029 × alcohol level
• The model can be used to predict the number of accidents given an alcohol level.
• As the alcohol level increases, the number of accidents also increases.
Table 5. Model Summary
Figure 2
Analysis
Table 7. Correlations (Pearson correlation; N = 20 for every pair)
                        final exam   entrance exam   age      intelligence quotient
final exam              1            .892**          .651**   .196
entrance exam           .892**       1               .742**   .009
age                     .651**       .742**          1        .182
intelligence quotient   .196         .009            .182     1
**. Correlation is significant at the 0.01 level (2-tailed).
ANOVA table
Table 8. ANOVA(b)
Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     1495.088         3    498.363       27.061   .000(a)
Residual       294.662          16   18.416
Total          1789.750         19
Interpretation: the regression model is significant at the 1% level.
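The regression ANOVA can be checked in Python from the regression and total rows alone: the residual sum of squares and df are the differences, each mean square is SS/df, and F = MS(regression)/MS(residual). This is a sketch using only the values in Table 8.

```python
ss_regression, df_regression = 1495.088, 3
ss_total, df_total = 1789.750, 19

ss_residual = ss_total - ss_regression          # residual SS = total − regression
df_residual = df_total - df_regression          # residual df = total − regression
ms_regression = ss_regression / df_regression   # mean square = SS / df
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual           # F = MS(regression) / MS(residual)

print(round(ss_residual, 3), df_residual, round(ms_residual, 3), round(f_value, 3))
```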
                        Unstandardized Coefficients   Standardized Coefficients
Model 1                 B         Std. Error          Beta             t        Sig.
(Constant)              2.329     46.444                               .050     .961
entrance exam           1.438     .229                .967             6.274    .000
age                     -1.993    3.026               -.103            -.659    .520
intelligence quotient   .135      .069                .206             1.960    .068
a. Dependent Variable: final exam
Interpretation:
• Final Exam = 2.329 + 1.438 × entrance exam − 1.993 × age + 0.135 × intelligence quotient
• The only predictor that has a significant effect on the final exam is the result of the entrance exam (Sig. = .000); age (Sig. = .520) and intelligence quotient (Sig. = .068) are not significant at the 5% level.
Using stepwise regression analysis, all variables that had no significant effect on the final exam were removed, and the variables that had a significant effect were retained.
Table 11. Coefficients(a)
Source of variation   Sum of squares   df   MSS = SS/df   F computed   F tabular (5%)   F tabular (1%)
Between columns       817              2    408.5         13.12        3.68             6.36
Within columns        467              15   31.13
Total                 1284             17