Module 8
TESTING HYPOTHESIS
The null hypothesis is the hypothesis of no difference. For two population
means it states that μ1 − μ2 = 0, that is, μ1 = μ2; therefore, Ho: μ1 − μ2 = 0, or
there is no significant difference. Otherwise, if μ1 − μ2 ≠ 0, that is, μ1 ≠ μ2
(either μ1 > μ2 or μ1 < μ2), then reject Ho in favor of H1: μ1 − μ2 ≠ 0, or there
is a significant difference.
However, rejecting the null hypothesis, Ho, in favor of the alternative
hypothesis, H1, when Ho is in fact true and should have been accepted is called
a Type I Error or False Positive. The probability of committing a Type I Error
is denoted by α, the significance level of the test (commonly 0.05).
On the other hand, accepting the null hypothesis, Ho, when it is in fact
false and should have been rejected is called a Type II Error or False Negative.
In other words, failing to detect a difference that really exists is a Type II
Error; its probability is denoted by β.
Module 9
CORRELATION AND REGRESSION
Example 1
The following data represent the math grades for a random sample of 12
freshmen at a certain college along with their scores on an intelligence test while
they were still seniors in high school.
STUDENT   INTELLIGENCE      MATH GRADE   Xi²      XiYi
          TEST SCORE (Xi)   (Yi)
1         65                85           4225     5525
2         50                74           2500     3700
3         55                76           3025     4180
4         65                90           4225     5850
5         55                85           3025     4675
6         70                87           4900     6090
7         65                94           4225     6110
8         70                98           4900     6860
9         55                81           3025     4455
10        70                91           4900     6370
11        50                76           2500     3800
12        55                74           3025     4070
TOTAL     725               1011         44475    61685
$$\bar{X} = \frac{725}{12} = 60.42 \qquad \bar{Y} = \frac{1011}{12} = 84.25$$

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$

n∑XY = 12(61,685) = 740,220
∑X∑Y = 725(1011) = 732,975
n∑X² = 12(44,475) = 533,700
(∑X)² = (725)² = 525,625

$$b = \frac{740{,}220 - 732{,}975}{533{,}700 - 525{,}625} = \frac{7{,}245}{8{,}075}$$

b = 0.897 or b = 0.90
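The intercept a can be computed in either of two ways:

$$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - (\sum X)^2} \quad \text{or} \quad a = \bar{Y} - b\bar{X}$$

Using the first formula:

$$a = \frac{1011(44{,}475) - 725(61{,}685)}{8{,}075} = \frac{44{,}964{,}225 - 44{,}721{,}625}{8{,}075} = \frac{242{,}600}{8{,}075} = 30.04$$

Using the second formula with b rounded to 0.90:

$$a = 84.25 - 0.9(60.42) = 84.25 - 54.378 = 29.87$$

The small difference between the two values comes from rounding b.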
Therefore the linear regression equation from the general form of
Y = a + bX is
Y = 30.04 + 0.90X
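The least-squares arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch using the Example 1 data; the variable names are illustrative and not part of the module.

```python
# Example 1 data: intelligence test score (X) and math grade (Y).
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Common denominator of the slope and intercept formulas.
denom = n * sum_x2 - sum_x ** 2

b = (n * sum_xy - sum_x * sum_y) / denom        # slope
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom   # intercept

print(f"b = {b:.3f}, a = {a:.2f}")
```

Computing a from the raw sums, rather than from the rounded slope, avoids the rounding gap between the two intercept formulas.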
The regression line plotted on the scatter diagram is
[Figure: scatter diagram with the fitted regression line; vertical axis MATH GRADE. The line passes through the points tabulated below.]
X     Y
0     30.04
70    93.04
50    75.04
60    84.04
80    102.04
10    39.04
30    57.04
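The null hypothesis is Ho: b = 0, that is, Y does not depend on X. Reject Ho if |tb| > tα(n−2).
The variance of the estimate is

$$s_{y.x}^2 = \frac{\sum y^2 - (\sum xy)^2 / \sum x^2}{n-2}$$

where x = X − X̄ and y = Y − Ȳ are deviations from the means, so that
∑x² = ∑X² − (∑X)²/n, ∑y² = ∑Y² − (∑Y)²/n and ∑xy = ∑XY − ∑X∑Y/n.

From Example 1, ∑Y² = 85,905, so

∑x² = 44,475 − 525,625/12 = 672.917
∑y² = 85,905 − (1011)²/12 = 85,905 − 85,176.75 = 728.25
∑xy = 61,685 − 732,975/12 = 603.75

$$s_{y.x}^2 = \frac{728.25 - (603.75)^2 / 672.917}{10} = \frac{728.25 - 541.693}{10} = \frac{186.557}{10} = 18.656$$

Therefore

$$t_b = \frac{b}{\sqrt{s_{y.x}^2 / \sum x^2}} = \frac{0.897}{\sqrt{18.656 / 672.917}} = \frac{0.897}{0.1665} = 5.39$$

From Table A.2, the tabular value tt = tα(n−2) with α = 0.05 and df = n − 2 = 10 is 2.23. Since tb > tt, we
reject Ho: b = 0, the hypothesis that Y does not depend on X.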
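The slope test can be checked numerically. The following is a minimal sketch in plain Python, assuming that the sums in the variance formula are deviation sums (taken about the means X̄ and Ȳ), which is the usual least-squares convention; the variable names are illustrative.

```python
import math

# Example 1 data: intelligence test score (X) and math grade (Y).
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Deviation (centered) sums of squares and cross-products.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                               # slope
s2_yx = (syy - sxy ** 2 / sxx) / (n - 2)    # variance of the estimate
t_b = b / math.sqrt(s2_yx / sxx)            # t statistic for Ho: b = 0

t_table = 2.228  # two-tailed tabular t, df = 10, alpha = 0.05
print(f"s2_yx = {s2_yx:.3f}, t_b = {t_b:.2f}, reject Ho: {abs(t_b) > t_table}")
```

Under this assumption the slope statistic comes out near 5.39, the same value obtained when the correlation coefficient of these data is tested for significance, as expected for simple linear regression.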
Lesson 2 – Correlation
The correlation coefficient r was first introduced by Karl Pearson and is often called
the product-moment correlation. This correlation coefficient measures the amount of
spread about the linear least-squares equation. Its range is from -1.0 to 1.0. If all the
points are exactly on the straight line, the value of r will be either +1.0 or -1.0, depending on
whether the relationship is positive or negative. As the absolute value of r approaches 1.0, the
points lie closer to the line. The value of r approaches 0 (zero) if the points are
more randomly scattered, as indicated in the figures below:
To compute the value of the correlation coefficient r, the formula is

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
STUDENT   TEST SCORE   MATH GRADE   Y²
          X            Y
1         65           85           7225
2         50           74           5476
3         55           76           5776
4         65           90           8100
5         55           85           7225
6         70           87           7569
7         65           94           8836
8         70           98           9604
9         55           81           6561
10        70           91           8281
11        50           76           5776
12        55           74           5476
TOTAL     725          1011         85,905
Computing for r
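The arithmetic for r can be sketched in plain Python from the tabulated data. This is an illustrative check of the product-moment formula, not the module's own worked solution; the variable names are assumptions.

```python
import math

# Example data: test score (X) and math grade (Y).
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.4f}")
```

The result, about 0.8625, is the value of r carried into the significance test that follows.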
To test the significance of the correlation coefficient, the following relationships are
used:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$ with n − 2 degrees of freedom if n < 50

$$z = \frac{r}{1/\sqrt{n-2}}$$ if n > 50, where the critical value is Z = 1.645

For the example, with r = 0.8625 and n = 12:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.8625\sqrt{12-2}}{\sqrt{1-0.7439}} = \frac{0.8625(3.162)}{\sqrt{0.2561}} = \frac{2.727}{0.506} = 5.389$$

The tabular value tt with n − 2 = 10 df at α = 0.05 is
tt = 2.228
Accept Ho if tc is less than tt; reject Ho if tc is equal to or greater than tt.
Since tc = 5.389 is greater than tt = 2.228, reject Ho; we can say that there is a correlation between the two
variables.
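The small-sample test for r is easy to reproduce. A minimal sketch, assuming the values r = 0.8625, n = 12 and the tabular value 2.228 quoted in the text; the names are illustrative.

```python
import math

r = 0.8625       # correlation coefficient from the example
n = 12           # number of students
t_table = 2.228  # tabular t, df = n - 2 = 10, alpha = 0.05

# t statistic for Ho: no correlation, with n - 2 degrees of freedom.
t_c = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

reject_h0 = t_c >= t_table
print(f"t_c = {t_c:.3f}, reject Ho: {reject_h0}")
```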
Exercise 9
Correlation and Regression
Name:________________________________________Score:___________
Course/Year:_____________________Time:________Date:_____________
Code No.____________
1. The data below show the final grades in algebra and physics obtained by
10 students selected at random from a large group of students.
Algebra (X) 75 80 93 65 78 71 98 68 84 77
Physics (Y) 82 78 86 72 81 80 95 72 89 74
2. The following data are indexed prices of gold and copper over the last 10
years. Assuming these indexed values constitute a random sample from
the population of possible values, test for the existence of correlation
between the indexed prices of the two metals.
Gold: 76 62 70 58 52 53 53 56 57 56
Copper: 80 68 73 63 65 68 65 63 65 66
3. An exclusive school in Manila conducted a study on the relationship between age
and teaching performance of teachers, as evaluated by the students using a
5-point scale, where 5 is the highest. With a random sample of 16
teachers, the result is shown below.
Module 10
CHI-SQUARE, X2
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where O = the observed cell frequency
      E = the expected or theoretical frequency
Lesson 1 – The One-Sample Chi-Square Test
The actual or observed frequencies (O) are: 55 in favor and 45 not in favor.
Since Ho states that opinion is equally divided, the expected frequencies, E, are equal for both:
½N = 50 for in favor and ½N = 50 for not in favor. To facilitate the computation, a
working table is recommended:
Referring to Table A., the tabular value, χ²t, with df = (number of
categories − 1) = 2 − 1 = 1 and α = 0.05 is 3.841.
The null hypothesis, Ho, is accepted if χ²c < χ²t. It is rejected if χ²c > χ²t. In
our example, since χ²c < χ²t (1.0 < 3.841), the decision is to accept Ho; that
is, the proportions of residents who are in favor and not in favor
of putting a certain project in their barangay are equal.
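The one-sample computation can be written out in a few lines. A minimal sketch in plain Python using the observed and expected frequencies from the example; the variable names are illustrative.

```python
# Observed and expected frequencies: in favor, not in favor.
observed = [55, 45]
expected = [50, 50]  # Ho: opinion is equally divided, so E = N / 2

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi2_c = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2_table = 3.841  # tabular value, df = 1, alpha = 0.05
accept_h0 = chi2_c < chi2_table
print(f"chi2_c = {chi2_c:.1f}, accept Ho: {accept_h0}")
```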
(In a 2x2 Contingency Table). Suppose that age (independent variable) is being
tested for association with the level of job satisfaction (dependent variable). Age
can be categorized into two independent samples, Young and Old, while the
dependent variable can be categorized into low and high levels of job satisfaction.
A 2x2 contingency table can be constructed as follows:
When df = 1 and any expected frequency is small (less than 10), the formula
below, which includes a correction for continuity, should be used:

$$\chi^2 = \frac{N\left(|AD - BC| - \tfrac{N}{2}\right)^2}{(A+C)(B+D)(A+B)(C+D)}$$
Using the formula

$$\chi_c^2 = \frac{N(AD - BC)^2}{(A+C)(B+D)(A+B)(C+D)} = \frac{140[(25)(25) - 30(60)]^2}{85(55)(55)(85)} = \frac{193{,}287{,}500}{21{,}855{,}625}$$

χ²c = 8.844
The tabular value χ²t with df = 1 and α = 0.05 is 3.841. Since χ²c > χ²t
(8.844 > 3.841), reject Ho, which states that there is no association between age and
level of job satisfaction.
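The 2x2 shortcut formula, with and without the continuity correction, can be sketched as follows. The cell values A = 25, B = 30, C = 60, D = 25 are read off the worked arithmetic; which age and satisfaction category each cell represents is not assumed here, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d, continuity=False):
    """Chi-square for a 2x2 table with cells A, B (first row) and C, D (second row)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if continuity:
        # Continuity correction: subtract N/2 before squaring (never below zero).
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

chi2_c = chi_square_2x2(25, 30, 60, 25)
chi2_corrected = chi_square_2x2(25, 30, 60, 25, continuity=True)

chi2_table = 3.841  # tabular value, df = 1, alpha = 0.05
print(f"uncorrected = {chi2_c:.3f}, corrected = {chi2_corrected:.3f}")
print("reject Ho:", chi2_c > chi2_table)
```

For these frequencies the corrected statistic is smaller than the uncorrected one, but both exceed the tabular value, so the decision does not change.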
Lesson 3 – The Chi-Square Test for Tables Having More Than Four Cells
The computed chi-square can easily be obtained by using a table similar to the one below:
From Table A.3, χ²t = 16.919 with df = 9 and α = 0.05. Since χ²c > χ²t, we
reject the null hypothesis, Ho, in favor of the alternative hypothesis, H1, that there
is a relationship between ethnic group and political party preference.
Lesson 4 – Application of the Chi-Square Test for the Effect of Two or More Independent Variables
In this case, there are two 2x2 contingency tables. Combining the df's of the
two tables, the final df is 3 when referring to the tabular values.
Exercise 10
Chi-Square, X2
Name:________________________________________Score:___________
Course/Year:_____________________Time:________Date:_____________
Code No.____________
1. Two groups, A and B, consist of 100 people each who have a disease. A
serum is given to group A but not to group B (which is called the control);
otherwise, the two groups are treated identically. It is found that in groups A
and B, 75 and 65 people, respectively, recover from the disease. We
would expect 70 people in each of the groups to recover and 30 in each
group not to recover, as shown in the tables below:
FREQUENCIES OBSERVED
                            Recover   Do not Recover   TOTAL
Group A (using serum)       75        25               100
Group B (not using serum)   65        35               100
TOTAL                       140       60               200

Using the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
2. Students, teachers, and school employees are asked to respond on a scale of
proposed action, with the responses in 3 categories. Determine whether
there are significant differences in their responses.
                      Plan of Action
                      A     B     C
Students              20    20    60
Teachers              40    40    20
School Employees      10    70    20
Module 11
TEST OF SIGNIFICANCE WITH “t” DISTRIBUTION
To test the equality of means between two samples, the t-test is used.
There are two applications: one for comparing the means of paired samples and
the other for comparing the means of two non-paired samples. The latter has two
methods: a) samples with the same number of observations, and b) samples with
different numbers of observations.
Each value of x1 is paired in some way with a value of x2. In other words, there is
a basis for pairing the observations. The formulas used are:

$$t_c = \frac{\bar{d}}{\sqrt{s^2 / n}}$$

$$s^2 = \frac{\sum d^2 - (\sum d)^2 / n}{n - 1}$$

Where: n = number of pairs
       d = the difference between paired values, i.e., x1 − x2
       d̄ = mean difference
       s² = estimated variance
The null hypothesis is Ho: µ1 − µ2 = 0 (there is no difference between the two means).
Example 1
x1      x2      d      d²      x1²      x2²
70      85      -15    225     4900     7225
65      70      -5     25      4225     4900
83      83      0      0       6889     6889
82      88      -6     36      6724     7744
86      90      -4     16      7396     8100
75      84      -9     81      5625     7056
74      75      -1     1       5476     5625
74      88      -14    196     5476     7744
65      79      -14    196     4225     6241
68      68      0      0       4624     4624
84      88      -4     16      7056     7744
90      88      2      4       8100     7744
81      84      -3     9       6561     7056
70      85      -15    225     4900     7225
66      70      -4     16      4356     4900
1133    1225    -92    1046    86533    100817
The degrees of freedom used in reading the table are the number of pairs minus 1: df = 15 − 1 = 14.

$$\bar{X}_1 = \frac{1133}{15} = 75.533$$

$$s^2 = \frac{\sum d^2 - (\sum d)^2 / n}{n - 1} = \frac{1046 - (-92)^2 / 15}{14} = 34.41$$

$$\bar{d} = \frac{-92}{15} = -6.133$$

$$t_c = \frac{\bar{d}}{\sqrt{s^2 / n}} = \frac{-6.133}{\sqrt{34.41 / 15}} = \frac{-6.133}{\sqrt{2.294}} = -4.05$$

From Table A.2, the tabulated value of t with df = n − 1 = 14 at α = 0.05 is 2.145. Since |tc| > tt
(4.05 > 2.145), reject Ho. There is indeed a significant difference between the intelligence test
scores of husbands and wives.
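The paired-sample computation can be reproduced directly from the fifteen pairs of scores. A minimal sketch in plain Python; the list names are illustrative, and the tabular value 2.145 is the one quoted in the text.

```python
import math

# Paired scores from Example 1 (x1 and x2 columns).
x1 = [70, 65, 83, 82, 86, 75, 74, 74, 65, 68, 84, 90, 81, 70, 66]
x2 = [85, 70, 83, 88, 90, 84, 75, 88, 79, 68, 88, 88, 84, 85, 70]

d = [a - b for a, b in zip(x1, x2)]   # differences x1 - x2
n = len(d)                            # number of pairs
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

s2 = (sum_d2 - sum_d ** 2 / n) / (n - 1)   # estimated variance of d
d_bar = sum_d / n                          # mean difference
t_c = d_bar / math.sqrt(s2 / n)

t_table = 2.145  # tabular t, df = n - 1 = 14, alpha = 0.05
print(f"s2 = {s2:.2f}, d_bar = {d_bar:.3f}, t_c = {t_c:.2f}")
print("reject Ho:", abs(t_c) > t_table)
```

The sign of tc only reflects the order of subtraction; the decision uses its absolute value.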
Lesson 2 – Means of Two Non-Paired Samples
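$$t_c = \frac{(\bar{X}_1 - \bar{X}_2)\sqrt{n}}{s\sqrt{2}}$$

Where tc = computed t value
      X̄1 = sample mean of the first group
      X̄2 = sample mean of the second group
      s = pooled standard deviation, the square root of the pooled variance s²

$$s^2 = \frac{\left[\sum x_1^2 - (\sum x_1)^2 / n\right] + \left[\sum x_2^2 - (\sum x_2)^2 / n\right]}{2(n - 1)}$$

For the data of Example 1, s² = 61.752, so

$$s = \sqrt{s^2} = \sqrt{61.752} = 7.858$$

$$t_c = \frac{(\bar{X}_1 - \bar{X}_2)\sqrt{n}}{s\sqrt{2}} = \frac{(75.533 - 81.667)\sqrt{15}}{7.858\sqrt{2}} = \frac{-6.134(2.739)}{7.858} = -2.138$$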
From Table A.2, the value tt with df = 2(n − 1) = 28 and α = 0.05 is 2.048. Since
|tc| > tt (2.138 > 2.048), reject Ho in favor of the alternative hypothesis, H1; that is, there is indeed
a significant difference between the scores of husbands and wives.
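For two non-paired samples with the same number of observations, the pooled variance and tc can be computed as sketched below. The example reuses the x1 and x2 columns of the earlier table, now treated as independent samples; the variable names are illustrative, and 2.048 is the standard two-tailed tabular t for df = 28.

```python
import math

# Scores of the two groups (equal sample sizes).
x1 = [70, 65, 83, 82, 86, 75, 74, 74, 65, 68, 84, 90, 81, 70, 66]
x2 = [85, 70, 83, 88, 90, 84, 75, 88, 79, 68, 88, 88, 84, 85, 70]

n = len(x1)  # observations per group

# Corrected sums of squares for each group.
ss1 = sum(v * v for v in x1) - sum(x1) ** 2 / n
ss2 = sum(v * v for v in x2) - sum(x2) ** 2 / n

s2 = (ss1 + ss2) / (2 * (n - 1))   # pooled variance
s = math.sqrt(s2)                  # pooled standard deviation

mean1, mean2 = sum(x1) / n, sum(x2) / n
t_c = (mean1 - mean2) * math.sqrt(n) / (s * math.sqrt(2))

t_table = 2.048  # two-tailed tabular t, df = 2(n - 1) = 28, alpha = 0.05
print(f"s2 = {s2:.3f}, s = {s:.3f}, t_c = {t_c:.3f}")
print("reject Ho:", abs(t_c) > t_table)
```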
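For samples with different numbers of observations, the formulas are:

$$t_c = \frac{\bar{X}_1 - \bar{X}_2}{s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$$s^2 = \frac{\left[\sum X_1^2 - (\sum X_1)^2 / n_1\right] + \left[\sum X_2^2 - (\sum X_2)^2 / n_2\right]}{(n_1 - 1) + (n_2 - 1)}$$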
Example 1
Suppose we are going to compare the yield per plot of 2 varieties of rice:
n1 = 10            n2 = 7
∑X1 = 380          ∑X2 = 247
X̄1 = 38.00         X̄2 = 35.29
∑X1² = 14,662      ∑X2² = 8,823

$$s^2 = \frac{(14{,}662 - 14{,}440) + (8{,}823 - 8{,}716)}{9 + 6} = \frac{222 + 107}{15} = 21.93$$

$$s = \sqrt{s^2} = \sqrt{21.93} = 4.68$$

$$t_c = \frac{38.00 - 35.29}{4.68}\sqrt{\frac{(10)(7)}{10 + 7}} = \frac{2.71\sqrt{4.118}}{4.68} = 1.18$$

The tabular value of t with df = (n1 − 1) + (n2 − 1) = 15 and α = 0.05 is 2.131.
Since |tc| < tt (1.18 < 2.131), Ho is accepted, which means that varieties X1 and X2 do not differ
significantly.
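The unequal-sample-size case needs only the summary figures of the rice example. A minimal sketch in plain Python, using ∑X2² = 8,823 as it appears in the worked arithmetic; the variable names are illustrative.

```python
import math

# Summary statistics for the two rice varieties.
n1, n2 = 10, 7
sum1, sum2 = 380, 247          # sums of yields
sumsq1, sumsq2 = 14662, 8823   # sums of squared yields

mean1, mean2 = sum1 / n1, sum2 / n2

# Pooled variance from the corrected sums of squares.
ss1 = sumsq1 - sum1 ** 2 / n1
ss2 = sumsq2 - sum2 ** 2 / n2
s2 = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))
s = math.sqrt(s2)

t_c = (mean1 - mean2) / s * math.sqrt(n1 * n2 / (n1 + n2))

t_table = 2.131  # tabular t, df = 15, alpha = 0.05
print(f"s2 = {s2:.2f}, s = {s:.2f}, t_c = {t_c:.2f}")
print("accept Ho:", abs(t_c) < t_table)
```

Carrying full precision gives s² of about 21.96 instead of 21.93, because the worked example rounds the intermediate terms; tc still rounds to 1.18.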
Paired
Exercise 11
Test of Significance with “t” Distribution
Name:________________________________________Score:___________
Course/Year:_____________________Time:________Date:_____________
Code No.____________
TEST I   TEST II
18 16
12 14
8 8
6 8
3 8
12 10
16 3
7 14
8 12
12 8
15 14
7 4
5 6
3 6
11 7
Module 12
ANALYSIS OF VARIANCE (ANOVA)
The analysis of variance utilizes the F-test. Despite its name, ANOVA is used to
test hypotheses about population means rather than population variances.
This type of ANOVA tests whether two or more groups are affected by
various treatments. In the actual experiment we can see that the means of the
groups vary. This variation of the group means from the grand mean is called
between-groups variance. The variation of the scores within each group is
called within-groups variance, and the variation of all individual scores is called
total variance.
CATEGORIES
              A1        A2        ...   Ak        TOTAL
Scores        x11       x12       ...   x1k
              x21       x22       ...   x2k
              x31       x32       ...   x3k
              ...       ...       ...   ...
Sums          ∑i xi1    ∑i xi2    ...   ∑i xik    ∑i ∑j xij
Means         X̄.1       X̄.2       ...   X̄.k       X̄..
# of Cases    r         r         ...   r         N
(sums over i run from 1 to r; sums over j run from 1 to k)
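$$\text{Total sum of squares (TSS)} = \sum_i \sum_j x_{ij}^2 - CF$$

Where CF = Correction Factor

$$CF = \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{N}$$

$$\text{Between sum of squares (BSS)} = \sum_j \frac{\left(\sum_i x_{ij}\right)^2}{r} - CF$$

Within sum of squares (WSS) = TSS – BSS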
Example 1
Suppose we want to find out if there is a significant difference between the
lengths of lives of the 3 brands of fluorescent tubes (in years). The samples of
eight tubes were selected at random and their corresponding lengths of lives
were as follows:
FLUORESCENT BRAND
          A       B       C       TOTAL
          2.5     2.0     2.0
          2.6     2.4     2.2
          3.5     2.6     2.0
          2.4     3.0     2.1
          3.7     1.8     1.2
          3.7     1.5     1.5
          2.8     2.1     1.5
          2.5     2.2     2.0
TOTAL     23.7    17.6    14.5    55.8
MEAN      2.96    2.20    1.81
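$$CF = \frac{(55.8)^2}{24} = \frac{3113.64}{24} = 129.735$$

$$TSS = \sum_i \sum_j X_{ij}^2 - CF = 139.940 - 129.735 = 10.205$$

$$BSS = \sum_j \frac{\left(\sum_i X_{ij}\right)^2}{r} - CF = \frac{23.7^2}{8} + \frac{17.6^2}{8} + \frac{14.5^2}{8} - CF = \frac{1081.7}{8} - 129.735 = 5.478$$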
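The sums of squares for the fluorescent-tube data can be verified with a short script. This is a sketch in plain Python following the CF, TSS, BSS and WSS definitions of this module; the mean squares and the F ratio use the usual degrees of freedom k − 1 and N − k, and the variable names are illustrative.

```python
# Lengths of life (in years) of eight tubes for each brand.
brands = {
    "A": [2.5, 2.6, 3.5, 2.4, 3.7, 3.7, 2.8, 2.5],
    "B": [2.0, 2.4, 2.6, 3.0, 1.8, 1.5, 2.1, 2.2],
    "C": [2.0, 2.2, 2.0, 2.1, 1.2, 1.5, 1.5, 2.0],
}

k = len(brands)   # number of groups
r = 8             # cases per group
N = k * r         # total number of cases

grand_total = sum(sum(v) for v in brands.values())
cf = grand_total ** 2 / N                                     # correction factor
tss = sum(x * x for v in brands.values() for x in v) - cf     # total SS
bss = sum(sum(v) ** 2 for v in brands.values()) / r - cf      # between SS
wss = tss - bss                                               # within SS

bms = bss / (k - 1)   # between-groups mean square
wms = wss / (N - k)   # within-groups mean square
f_c = bms / wms

print(f"CF = {cf:.3f}, TSS = {tss:.3f}, BSS = {bss:.3f}, WSS = {wss:.3f}")
print(f"F = {f_c:.2f}")
```

With 2 and 21 degrees of freedom, the computed F can then be compared with the tabular F at α = 0.05.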
The computation is summarized in the ANOVA table below:
ANOVA
* significant at α = 0.05
Since Fc > Ft, we can reject Ho, which means that there is a significant
difference between the lengths of lives of the three brands of fluorescent tubes.
If, after using the ANOVA, Ho is rejected, it is imperative
for us to test where the difference or differences lie. There are several tests to
determine this, and one of them is Scheffe’s Test (1957). This is done by
arranging the individual means and comparing them with each other.
Comparing, we have:
X̄C = 1.81     X̄C vs X̄B
X̄B = 2.20     X̄C vs X̄A
X̄A = 2.96     X̄B vs X̄A
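Then compute the F ratio for each pair of groups:

$$F = \frac{(\bar{X}_1 - \bar{X}_2)^2}{WMS\,(N_1 + N_2) / (N_1 N_2)}$$

where WMS is the within-groups mean square from the ANOVA table, and N1 and N2 are the
numbers of cases in the two groups being compared. The critical value is
FS = Ft (k-1)
= 3.47 (2)
= 6.94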
Each of the three F’s computed above is then compared with the value of
6.94. If a computed F value is larger than 6.94, then the two means being
compared differ significantly from each other:
Since F1 < FS (2.714 < 6.94), the lengths of lives of brands B and
C are not significantly different from each other.
Since F2 > FS (23.625 > 6.94), the lengths of lives of brands
C and A are significantly different from each other.
Finally, since F3 > FS (10.314 > 6.94), the lengths of lives of
brands B and A are significantly different from each other.
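The pairwise comparisons can be sketched in plain Python as well. This example assumes the denominator uses the within-groups mean square, as is usual for Scheffe's test, and computes it from unrounded group means, so the F values differ slightly from the rounded figures quoted above while the three decisions stay the same. The names are illustrative.

```python
from itertools import combinations

# Lengths of life (in years) of eight tubes for each brand.
brands = {
    "A": [2.5, 2.6, 3.5, 2.4, 3.7, 3.7, 2.8, 2.5],
    "B": [2.0, 2.4, 2.6, 3.0, 1.8, 1.5, 2.1, 2.2],
    "C": [2.0, 2.2, 2.0, 2.1, 1.2, 1.5, 1.5, 2.0],
}

k = len(brands)
N = sum(len(v) for v in brands.values())
means = {name: sum(v) / len(v) for name, v in brands.items()}

# Within-groups sum of squares: squared deviations from each group mean.
wss = sum((x - means[name]) ** 2 for name, v in brands.items() for x in v)
wms = wss / (N - k)

f_table = 3.47           # tabular F for df = (2, 21), alpha = 0.05
f_s = f_table * (k - 1)  # Scheffe critical value

results = {}
for g1, g2 in combinations(brands, 2):
    n1, n2 = len(brands[g1]), len(brands[g2])
    f = (means[g1] - means[g2]) ** 2 / (wms * (n1 + n2) / (n1 * n2))
    results[(g1, g2)] = f
    verdict = "significant" if f > f_s else "not significant"
    print(f"{g1} vs {g2}: F = {f:.3f} ({verdict} against {f_s:.2f})")
```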
Exercise 12
Analysis of Variance (ANOVA)
Name:________________________________________Score:___________
Course/Year:_____________________Time:________Date:_____________
Code No.____________
1 2 3 4
25.6 25.2 20.8 31.6
24.3 28.6 26.7 29.8
27.9 24.7 22.2 34.3
Test if there are significant differences in the sales between regions. If there
are, further test where the differences lie.