ExercisesSection 22 WSolutions

Syllabus section 22: Goodness of fit test
EXERCISES WITH SOLUTIONS
EXERCISE 1
A plumber has noticed a certain association between the type of sink chosen by his clients and
their area of residence. To confirm or reject his intuition, he selects a sample of 500 clients and
classifies them according to the contingency table below:

Sink \ Area      Downtown   Outskirt   Suburb
Cabinet Style       60         50        80
Pedestal            50         40        70
Wall mounted        40         60        50
a) Correctly state the hypotheses we need to verify.
H0 : there is NO ASSOCIATION between “sink” and “Area”

H1 : there exists an ASSOCIATION between “sink” and “Area”
b) Is the plumber’s intuition correct? Answer by using α = 0.01.
We have to run a Chi-square test of independence. The test statistic to be used is

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n},$$

where $E_{ij}$ is the expected frequency under the null hypothesis. This statistic has approximately a Chi-square distribution with
(r-1)(c-1) degrees of freedom. Setting α = 0.01 and taking into account that in this case r = 3 and c = 3,
we have $\chi^2_{4,\,0.01} = 13.28$.
Therefore, the null hypothesis will be rejected only if the observed value of the test statistic is larger than
13.28.
The “expected frequencies” are the absolute frequencies we would expect if the two variables were
independent:

$$E_{ij} = \frac{R_i C_j}{n} = \frac{(\text{total of the } i\text{-th row})(\text{total of the } j\text{-th column})}{\text{sample size}}$$
The following table shows the expected frequencies under the null hypothesis:

Sink \ Area      Downtown   Outskirt   Suburb
Cabinet Style       57         57        76
Pedestal            48         48        64
Wall mounted        45         45        60

Let us compute the value of the test statistic:

$$\chi^2 = \frac{(60-57)^2}{57} + \frac{(50-57)^2}{57} + \frac{(80-76)^2}{76} + \frac{(50-48)^2}{48} + \frac{(40-48)^2}{48} + \frac{(70-64)^2}{64} + \frac{(40-45)^2}{45} + \frac{(60-45)^2}{45} + \frac{(50-60)^2}{60} = 10.4294$$

Since 10.4294 < 13.28, we do not reject H0: at the 1% level there is not sufficient evidence of an association between
“sink” and “Area” (p-value 0.0338).
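The expected frequencies and the statistic of Exercise 1 can be reproduced mechanically. The following is a minimal sketch in Python, not part of the original solution; the function name `chi_square_independence` and the variable `sinks` are hypothetical names chosen for illustration.

```python
def chi_square_independence(observed):
    """Return (expected, statistic, df) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: E_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Observed table of Exercise 1 (rows: Cabinet Style, Pedestal, Wall mounted)
sinks = [[60, 50, 80], [50, 40, 70], [40, 60, 50]]
expected, statistic, df = chi_square_independence(sinks)
print(expected)
print(round(statistic, 2), df)
```

The statistic is then compared with the tabulated critical value for `df` degrees of freedom, as in the solution above.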
EXERCISE 2
A plumber has noticed a certain association between the type of sink chosen by his clients and
their area of residence. To confirm or reject his intuition, he selects a sample of 1000 clients and
classifies them according to the contingency table below:

Sink \ Area      Downtown   Outskirt   Suburb
Cabinet Style      140        120       180
Pedestal           120        100       100
Wall mounted        90         80        70
a) Correctly state the hypotheses we need to verify.
H0 : there is NO ASSOCIATION between “sink” and “Area”

H1 : there exists an ASSOCIATION between “sink” and “Area”
b) Is the plumber’s intuition correct? Answer by using α = 0.025.
We have to run a Chi-square test of independence. The test statistic to be used is

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n},$$

where $E_{ij}$ is the expected frequency under the null hypothesis. This statistic has approximately a Chi-square distribution with
(r-1)(c-1) degrees of freedom. Setting α = 0.025 and taking into account that in this case r = 3 and c = 3,
we have $\chi^2_{4,\,0.025} = 11.14$.
Therefore, the null hypothesis will be rejected only if the observed value of the test statistic is larger than
11.14.
The “expected frequencies” are the absolute frequencies we would expect if the two variables were
independent:

$$E_{ij} = \frac{R_i C_j}{n} = \frac{(\text{total of the } i\text{-th row})(\text{total of the } j\text{-th column})}{\text{sample size}}$$
The following table shows the expected frequencies under the null hypothesis:

Sink \ Area      Downtown   Outskirt   Suburb
Cabinet Style      154        132       154
Pedestal           112         96       112
Wall mounted        84         72        84

Let us compute the value of the test statistic:

$$\chi^2 = \frac{(140-154)^2}{154} + \frac{(120-132)^2}{132} + \frac{(180-154)^2}{154} + \frac{(120-112)^2}{112} + \frac{(100-96)^2}{96} + \frac{(100-112)^2}{112} + \frac{(90-84)^2}{84} + \frac{(80-72)^2}{72} + \frac{(70-84)^2}{84} = 12.4279$$

Since 12.4279 > 11.14, we reject H0: we can conclude that there is an association between “sink” and
“Area” (p-value 0.0144).
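The p-values quoted in Exercises 1 and 2 are not in the usual tables, but for an even number of degrees of freedom the Chi-square upper tail has a closed form. A minimal Python sketch under that assumption follows; the function name `chi2_sf_even_df` is hypothetical, and the two statistics are copied from the solutions above.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x); this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    # P(chi2_{2m} > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_ex1 = chi2_sf_even_df(10.4294, 4)  # statistic of Exercise 1
p_ex2 = chi2_sf_even_df(12.4279, 4)  # statistic of Exercise 2
print(round(p_ex1, 4), round(p_ex2, 4))
```

Comparing each p-value with the chosen α gives the same decisions as the critical-value rule: 0.0338 > 0.01 in Exercise 1 and 0.0144 < 0.025 in Exercise 2.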
EXERCISE 3
On a sample of 100 workers, we want to find a possible link between the average contract duration (years)
and the industry type (A = Automotive, NA = Not Automotive). The contingency table below shows the results:

Industry type \ Contract duration      2      5     10   Total
A                                     10     15     20     45
NA                                    15     10     30     55
Total                                 25     25     50    100

a) State the two hypotheses we should test.
Chi-square test for independence.
H0 : there is NO ASSOCIATION between “Contract duration” and “Industry type”

H1 : there exists an ASSOCIATION between “Contract duration” and “Industry type”
b) Write the formula of the statistic we should use in the test.
The test statistic is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n};$$

under H0 it is approximately distributed as a chi-square with (r − 1)(c − 1) degrees of freedom.
c) Make your final decision about the hypotheses with a significance level of 5%.
Fixing a significance level of 5%, with r = 2 and c = 3, we have $\chi^2_{2,\,0.05} = 5.99$.
We reject H0 if the observed value of the test statistic is > 5.99.

Expected frequencies $E_{ij}$:

Industry type \ Contract duration       2        5       10
A                                     11.25    11.25    22.5
NA                                    13.75    13.75    27.5

Contributions $(O_{ij} - E_{ij})^2 / E_{ij}$:

Industry type \ Contract duration       2        5       10
A                                     0.1389   1.2500   0.2778
NA                                    0.1136   1.0227   0.2273

The observed value of the test statistic is the sum of these contributions, 3.0303. Since 3.0303 < 5.99, we do not reject H0.
EXERCISE 4
On a sample of 100 workers, we want to find a possible link between age and the means of transport
used to reach the workplace. The contingency table below shows the results:

Age \ Transport    Private car   Public transport   Others   Total
18<age<30              10              35             10       55
≥30                    25              15              5       45
Total                  35              50             15      100

a) State the two hypotheses we should test.
Chi-square test. The objective is to test independence.

H0 : “Age” and “Transport” are independent (the chi-square index of association in the population is 0)
H1 : “Age” and “Transport” are not independent (the chi-square index of association in the population is ≠ 0)
b) Write the formula of the statistic we should use in the test.
The test statistic is:

$$C = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n}.$$

Under H0 the test statistic is approximately distributed as a chi-square with (r − 1)(c − 1) degrees of freedom; in
our case r = 2 and c = 3, so the number of degrees of freedom is 2.
c) Calculate (approximately) the p-value and make your final decision about hypothesis with a level of
significance of 5%.
To derive the p-value, the observed value of the test statistic is required.

Expected frequencies $E_{ij}$:

Age \ Transport    Private car   Public transport   Others
18<age<30             19.25           27.5           8.25
≥30                   15.75           22.5           6.75

Contributions $(O_{ij} - E_{ij})^2 / E_{ij}$:

Age \ Transport    Private car   Public transport   Others
18<age<30            4.4448          2.0455         0.3712
≥30                  5.4325          2.5000         0.4537

The sum is equal to 15.2477; the p-value is:

$$P(C > 15.2477 \mid H_0) < 0.005,$$

because $\chi^2_{2,\,0.005} = 10.60 < 15.2477$. Since the p-value is below 0.05, we reject H0. The two variables are dependent.
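With 2 degrees of freedom the Chi-square tail probability has a simple closed form, which gives a rough check of the bound read from the table. The following is a worked sketch, not part of the original solution:

```latex
P(\chi^2_2 > x) = e^{-x/2}
\quad\Longrightarrow\quad
P(C > 15.2477 \mid H_0) = e^{-15.2477/2} = e^{-7.62385} \approx 0.0005 < 0.005
```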
EXERCISE 5
In order to evaluate a possible association between the payment of the vehicle tax and the geographical
area, a survey of 400 users was conducted. The results are shown in the following table:
Payment vehicle tax \Geographical Area Center-North South-Islands
No 40 70
Yes 160 130
a) Specify the hypothesis to verify.

H0: independence between payment of the vehicle tax and the geographical area
H1: there exists dependence between payment of the vehicle tax and the geographical area
b) Compute the p-value of the test. For a significance level equal to α=0.05, what is the decision
concerning the hypothesis of the test?
Starting from the marginal frequencies $O_{i\cdot}$ and $O_{\cdot j}$ and the sample size n:

Payment vehicle tax    Center-North   South-Islands   O_i.
No                          40              70         110
Yes                        160             130         290
O_.j                       200             200         400

we compute the expected frequencies $E_{ij} = \frac{O_{i\cdot}\, O_{\cdot j}}{n}$:

Payment vehicle tax    Center-North   South-Islands
No                          55              55
Yes                        145             145
The value of the test statistic is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(40-55)^2}{55} + \frac{(70-55)^2}{55} + \frac{(160-145)^2}{145} + \frac{(130-145)^2}{145} = 11.2853$$

Under H0 this statistic has approximately a chi-square distribution with (r-1)(c-1) = 1 degree of freedom.
Therefore, p-value = $\Pr(\chi^2_1 > 11.2853)$. From the tables, p-value < 0.005 (the precise
value, not available in the table, is 0.0008). Therefore, for every significance level of 1% or more,
and in particular for α = 0.05, p-value < α. We reject the null hypothesis: there is dependence between payment of the vehicle
tax and the geographical area.
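For a 2 × 2 table the statistic can also be obtained from the well-known shortcut $n(ad - bc)^2 / (R_1 R_2 C_1 C_2)$, and the 1-degree-of-freedom p-value follows from the standard normal distribution. A minimal Python sketch of both steps, with hypothetical function names `chi_square_2x2` and `chi2_sf_1df` and no continuity correction (matching the solution above):

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi2_sf_1df(x):
    """P(chi2_1 > x), using the fact that chi2_1 is the square of a standard normal."""
    return math.erfc(math.sqrt(x / 2))


# Exercise 5: rows No / Yes, columns Center-North / South-Islands
statistic = chi_square_2x2(40, 70, 160, 130)
p_value = chi2_sf_1df(statistic)
print(round(statistic, 4), round(p_value, 4))
```

The shortcut gives the same value as summing the four cell contributions; it is only a convenience for 2 × 2 tables.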
EXERCISE 6
In order to evaluate the existence of a possible association between the payment of the Rai tax (i.e.
“canone Rai”) and the geographical area, a survey of 200 users was conducted. The results are shown in the
following table:
Rai tax\ Geographical Area Center-North South-Islands
Yes 80 65
No 20 35
a) Specify the hypothesis to verify.
H0: independence between payment of the Rai tax and the geographical area;
H1: there exists dependence between payment of the Rai tax and the geographical area
b) Compute the p-value of the test. For a significance level equal to α=0.01, what is the decision concerning
the hypothesis of the test?
Starting from the marginal frequencies $O_{i\cdot}$ and $O_{\cdot j}$ and the sample size n:

Rai tax    Center-North   South-Islands   O_i.
Yes             80              65         145
No              20              35          55
O_.j           100             100         200

we compute the expected frequencies $E_{ij} = \frac{O_{i\cdot}\, O_{\cdot j}}{n}$:

Rai tax    Center-North   South-Islands
Yes            72.5            72.5
No             27.5            27.5
The value of the test statistic is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(80-72.5)^2}{72.5} + \frac{(65-72.5)^2}{72.5} + \frac{(20-27.5)^2}{27.5} + \frac{(35-27.5)^2}{27.5}$$

$$= 0.7759 + 0.7759 + 2.0455 + 2.0455 = 5.6428$$

Under H0 this statistic has approximately a chi-square distribution with (r-1)(c-1) = 1 degree of freedom.
Therefore, p-value = $\Pr(\chi^2_1 > 5.6428)$. From the tables, 0.01 < p-value < 0.025 (the
precise value, not available in the table, is 0.0175). Since p-value > α = 0.01, we do not reject the null
hypothesis.
EXERCISE 7
Gianni is gambling with Andrea on the results of a die roll. Gianni thinks that the die used by Andrea is
unfair. Gianni carries out 60 rolls of the die and observes the following results.
Result 1 2 3 4 5 6
Observed frequency 15 5 5 15 10 10
In order to verify whether the die used by Andrea is really unfair, Gianni decides to run an appropriate
hypothesis test.
a) Specify the hypothesis to be verified.
If the die is fair, the probability of each outcome is 1/6 and then, over 60 rolls, the expected frequency for
each outcome is $\frac{1}{6} \cdot 60 = 10$. Let us use a chi-square test to evaluate whether the deviation between the
observed frequencies ($O_i$) and the expected frequencies ($E_i$) is significantly different from 0. Hence:
H0: the die is fair,
H1: the die is unfair.
b) Compute the p-value of the test. Fix a significance level equal to α = 0.05 and decide about the hypothesis
previously specified.
Let us compute the statistic

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{(15-10)^2 + (5-10)^2 + (5-10)^2 + (15-10)^2 + (10-10)^2 + (10-10)^2}{10} = 10,$$

which under H0 follows approximately a $\chi^2$ distribution with 6 – 1 = 5 degrees of freedom.
Fix a significance level equal to α = 0.05.
The p-value of the test is: p-value = $\Pr(\chi^2_5 > 10)$.
From the tables, 0.05 < p-value < 0.1; therefore we do not reject the null hypothesis:
there is not sufficient empirical evidence to consider the die unfair (the exact p-value,
not given in the tables, is 0.0752).
EXERCISE 8
Luigi is gambling with Filippo on the results of a die roll. Luigi thinks that the die used by Filippo is unfair.
Luigi carries out 120 rolls of the die and observes the following results.
Result 1 2 3 4 5 6
Observed frequency 15 25 23 22 17 18
In order to verify whether the die used by Filippo is really unfair, we decide to run an appropriate
hypothesis test.
a) Specify the hypothesis to be verified.
If the die is fair, the probability of each outcome is 1/6 and then, over 120 rolls, the expected frequency for
each outcome is $\frac{1}{6} \cdot 120 = 20$. Let us use a chi-square test to evaluate whether the deviation between the
observed frequencies ($O_i$) and the expected frequencies ($E_i$) is significantly different from 0. Hence:
H0: the die is fair,
H1: the die is unfair.
b) Compute the p-value of the test. Fix a significance level equal to α = 0.05 and decide about the hypothesis
previously specified.
Let us compute the statistic

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{(15-20)^2 + (25-20)^2 + (23-20)^2 + (22-20)^2 + (17-20)^2 + (18-20)^2}{20} = 3.8,$$

which under H0 follows approximately a $\chi^2$ distribution with 6 – 1 = 5 degrees of freedom.
Fix a significance level equal to α = 0.05.
The p-value of the test is: p-value = $\Pr(\chi^2_5 > 3.8)$.
From the tables, p-value > 0.05; therefore we do not reject the null hypothesis: there is
not sufficient empirical evidence to consider the die unfair (the exact p-value, not
given in the tables, is 0.5786).
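The two die exercises follow the same recipe, so they can be checked together. The sketch below is a minimal Python illustration, not part of the original solutions; the function names `gof_uniform` and `chi2_sf_odd_df` are hypothetical, and the tail formula used is the closed form that holds for an odd number of degrees of freedom.

```python
import math


def gof_uniform(observed):
    """Goodness-of-fit statistic against equal expected counts; returns (statistic, df)."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1


def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(chi2_df > x); this closed form holds only for odd df."""
    if df <= 0 or df % 2 == 0:
        raise ValueError("this closed form requires a positive odd df")
    total, term = 0.0, math.sqrt(x)
    # Series: sum_{k=1}^{m} x^(k - 1/2) / (1 * 3 * ... * (2k - 1)), with df = 2m + 1
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


results = []
for rolls in ([15, 5, 5, 15, 10, 10], [15, 25, 23, 22, 17, 18]):  # Exercises 7 and 8
    statistic, df = gof_uniform(rolls)
    results.append((statistic, df, chi2_sf_odd_df(statistic, df)))
    print(round(statistic, 2), df, round(results[-1][2], 4))
```

Both p-values exceed α = 0.05, in line with the decisions reached from the tables.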
EXERCISE 9
A pastry chef wants to know whether his customers predominantly like some kind of pastry. A sample of 80
customers were asked which kind of pastry they prefer, and the classification in the following table was
obtained:

With fruits   With cream   With chocolate   Dry
    22            26             15          17
a) At a 10% significance level, is it possible for the pastry chef to conclude, on the basis of sample
evidence, that all the 4 typologies of pastry are equally preferred?
                                              With fruits   With cream   With chocolate   Dry          Total
Probability distribution of the population    p1            p2           p3               p4           1
O_i = observations drawn from the population  22            26           15               17           80
Assumed probability distribution of the
population                                    1/4           1/4          1/4              1/4          1
E_i = expected number of observations if
the assumed distribution is true              80*1/4=20     80*1/4=20    80*1/4=20        80*1/4=20    80
We run the test:
H0 : the probability distribution of the population is uniform (i.e. $p_1 = p_2 = p_3 = p_4 = \frac{1}{4}$)
H1 : the probability distribution of the population is not uniform
We use the test statistic

$$U = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i},$$

which under H0 has approximately a $\chi^2_{K-1}$ distribution (with K = 4 and n = 80).

$$U = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} = \frac{(22-20)^2}{20} + \frac{(26-20)^2}{20} + \frac{(15-20)^2}{20} + \frac{(17-20)^2}{20} = 3.7$$

We reject H0 if $U > \chi^2_{3,\,0.1}$, where $\chi^2_{3,\,0.1} = 6.25$.
Since 3.7 < 6.25, at a 10% significance level we do not reject the null hypothesis H0: there is not
sufficient empirical evidence to state that the customers prefer a particular kind of pastry. The sample
evidence is therefore compatible with the pastry chef's conclusion that all 4 types of pastry are
equally preferred.
b) Describe the logic of the test statistic used to verify the hypothesis in the previous point.
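The statistic is small when the observed and expected frequencies are close and grows as they diverge, so only large
values are evidence against H0. A test with significance level α of the null hypothesis H0 that the assumed
probabilities are correct, against the alternative hypothesis that they are not, is based on the following decision rule:

$$\text{reject } H_0 \text{ if } \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} > \chi^2_{K-1,\,\alpha},$$

where $\chi^2_{K-1,\,\alpha}$ is the value for which $P(\chi^2_{K-1} > \chi^2_{K-1,\,\alpha}) = \alpha$
and the random variable $\chi^2_{K-1}$ follows a chi-square distribution with (K − 1) degrees of freedom.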
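The meaning of the critical value can be illustrated by simulation: when H0 is true, the statistic should exceed $\chi^2_{3,\,0.1} = 6.25$ in roughly 10% of samples. The sketch below is a small Monte Carlo illustration in Python under the setting of Exercise 9 (80 customers, 4 equally likely categories); the function names, the number of simulations and the seed are arbitrary assumptions, and the simulated share is only approximately 0.10.

```python
import random


def gof_uniform_statistic(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def rejection_rate(n_customers=80, k=4, critical=6.25, n_sim=5000, seed=0):
    """Share of samples drawn under H0 whose statistic exceeds `critical`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        counts = [0] * k
        for _ in range(n_customers):
            counts[rng.randrange(k)] += 1  # each category equally likely under H0
        if gof_uniform_statistic(counts) > critical:
            rejections += 1
    return rejections / n_sim


rate = rejection_rate()
print(rate)
```

A share close to α under H0 is exactly what the decision rule is designed to guarantee: the test rejects a true null hypothesis with probability about α.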
EXERCISE 10
A professor suggests to his students to choose one of the following books for their studies: A, B, C or D. He
believes that the students have no particular preferences and so the books will be equally (Uniformly)
chosen. To test this assumption, the professor collects the chosen book for a random sample of 100
students, with these results: 20 preferred book A, 40 B, 30 C and 10 D.
a) Which is the statistic we must use to help the professor in verifying his hypothesis?
We must use the “Chi-square test”, that is, we must calculate the statistic:

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ are the observed frequencies and $E_i$ the expected ones, in this case the frequencies of the Uniform
distribution.
Under H0 this statistic is approximately distributed as a Chi-square with K-1 degrees of freedom (K = number of classes).
b) Test if the professor's hypothesis is rejected, with α = 0.01.
Observed $O_i$ and expected $E_i$ frequencies in this situation are:

        A     B     C     D
O_i    20    40    30    10
E_i    25    25    25    25

from which:

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} = \frac{(-5)^2}{25} + \frac{15^2}{25} + \frac{5^2}{25} + \frac{(-15)^2}{25} = 20$$

In a Chi-square distribution with 3 (4-1) degrees of freedom the 99th percentile is, according to our table,
11.34.
Since 20 > 11.34 we must reject the null hypothesis H0 of the professor: the distribution of preferences
among books is not Uniform.
EXERCISE 11
A professor suggests to his students to choose one of the following books for their studies: A, B, C or D. He
believes that the students have no particular preferences and so the books will be equally (Uniformly)
chosen. To test this assumption, the professor collects the chosen book for a random sample of 200
students, with these results: 70 preferred book A, 40 B, 60 C and 30 D.
a) Test if the professor's hypothesis is rejected, with α = 0.05.
We must use the “Chi-square test”, that is, we must calculate the statistic:

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ are the observed frequencies and $E_i$ the expected ones, in this case the frequencies of the Uniform
distribution.
Under H0 this statistic is approximately distributed as a Chi-square with K-1 degrees of freedom (K = number of classes).
Observed $O_i$ and expected $E_i$ frequencies in this situation are:

        A     B     C     D
O_i    70    40    60    30
E_i    50    50    50    50

from which:

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} = \frac{20^2}{50} + \frac{(-10)^2}{50} + \frac{10^2}{50} + \frac{(-20)^2}{50} = 20$$

In a Chi-square distribution with 3 (4-1) degrees of freedom the 95th percentile is, according to our table,
7.81. Since 20 > 7.81 we must reject the null hypothesis H0 of the professor: the distribution of preferences
among books is not Uniform.
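Exercises 10 and 11 apply the same decision rule to different samples and significance levels. The following minimal Python sketch, with the hypothetical helper `gof_statistic` and the critical values copied from the solutions above, makes the rule explicit:

```python
def gof_statistic(observed, probabilities):
    """Chi-square goodness-of-fit statistic for counts against hypothesised probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))


uniform = [0.25] * 4  # H0: the four books are equally likely to be chosen
cases = {
    "Exercise 10": ([20, 40, 30, 10], 11.34),  # counts, critical value at alpha = 0.01
    "Exercise 11": ([70, 40, 60, 30], 7.81),   # counts, critical value at alpha = 0.05
}

decisions = {}
for name, (counts, critical) in cases.items():
    statistic = gof_statistic(counts, uniform)
    decisions[name] = (statistic, statistic > critical)  # True means "reject H0"
    print(name, statistic, decisions[name][1])
```

Passing the probabilities explicitly keeps the helper usable for non-uniform null hypotheses as well.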
EXERCISE 12
You want to verify if there is an association between the area of residence of families and the presence of
underage children. To this aim, a random sample of 100 families is analyzed and the collected information
is organized in the following contingency table:
Presence of underage children \ Area of residence    Urban   Rural
Yes                                                    12      12
No                                                     53      23

a) What are the null and alternative hypotheses to verify?
We need to use a chi-square test where the hypotheses are as follows:
H0 : There is no association between "presence of children" and "Area of residence"
H1 : There is association between "presence of children" and "Area of residence"
b) Determine, at the significance level α = 0.05, if there is association between the two variables.
The test statistic to use is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n}.$$

Under the null hypothesis, the test statistic has an approximate Chi-square distribution with (r − 1)(c − 1)
degrees of freedom. We reject H0 when

$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\,\alpha}$$

Setting α = 0.05 and since r = 2 and c = 2, we have $\chi^2_{1,\,0.05} = 3.84$. In the following table we
calculate the expected frequencies $E_{ij} = \frac{R_i C_j}{n}$:
Presence of underage children \ Area of residence    Urban   Rural
Yes                                                   15.6     8.4
No                                                    49.4    26.6

While in the following table we calculate the quantities $(O_{ij} - E_{ij})^2 / E_{ij}$:

Presence of underage children \ Area of residence    Urban    Rural
Yes                                                  0.8308   1.5429
No                                                   0.2623   0.4872

The sum of the values in the last table, i.e. the value of the test statistic, is 3.1232.
Since 3.1232 < 3.84 we do not reject H0 and we conclude that there is no evidence that the Presence of
underage children and the Area of residence are associated.
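In a 2 × 2 table every cell has the same absolute deviation $|O_{ij} - E_{ij}|$, because the row and column totals are fixed. This gives a quick worked check of the statistic (a sketch, not part of the original solution):

```latex
|O_{ij} - E_{ij}| = 3.6 \ \text{for every cell}
\quad\Longrightarrow\quad
\chi^2 = 3.6^2 \left( \frac{1}{15.6} + \frac{1}{8.4} + \frac{1}{49.4} + \frac{1}{26.6} \right)
       = 12.96 \times 0.24099 \approx 3.1232
```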
EXERCISE 13
You want to verify if there is an association between the area of residence of families and the presence of
underage children. To this aim, a random sample of 500 families is analyzed and the collected information
is organized in the following contingency table:
Presence of underage children \ Area of residence    Urban   Rural
Yes                                                   180     145
No                                                     80      95

a) What are the null and alternative hypotheses to verify?
We need to use a chi-square test where the hypotheses are as follows:
H0 : There is no association between "presence of children" and "Area of residence"
H1 : There is association between "presence of children" and "Area of residence"
b) Determine, at the significance level α = 0.01, if there is association between the two variables.
The test statistic to use is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n}.$$

Under the null hypothesis, the test statistic has an approximate Chi-square distribution with (r − 1)(c − 1)
degrees of freedom. We reject H0 when

$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\,\alpha}$$

Setting α = 0.01 and since r = 2 and c = 2, we have $\chi^2_{1,\,0.01} = 6.63$. In the following table we
calculate the expected frequencies $E_{ij} = \frac{R_i C_j}{n}$:
Presence of underage children \ Area of residence    Urban   Rural
Yes                                                   169     156
No                                                     91      84

While in the following table we calculate the quantities $(O_{ij} - E_{ij})^2 / E_{ij}$:

Presence of underage children \ Area of residence    Urban    Rural
Yes                                                  0.7160   0.7756
No                                                   1.3297   1.4405

The sum of the values in the last table, i.e. the value of the test statistic, is 4.2618.
Since 4.2618 < 6.63 we do not reject H0 and we conclude that there is no evidence that the Presence of
underage children and the Area of residence are associated.
EXERCISE 14
A sample of cyclists has been classified according to their gender and to their opinion towards the bike
routes of a city:
OPINION \ GENDER    Male   Female
Low                  16      39
Sufficient           24      17
Good                 20      24
You would like to carry on a test in order to verify whether there is an association between the two
variables.
a) Write down the hypotheses to test.
H0 : there is no association between “OPINION” and “GENDER”

H1 : there is association between “OPINION” and “GENDER”
b) Do the data constitute enough empirical evidence to affirm that the two variables are
significantly related? Answer by adopting α = 0.05.
We need to run a Chi-square test of independence. The test statistic to be used is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i C_j}{n}$$

Under the null hypothesis, the test statistic follows approximately a chi-square distribution with
(r-1)(c-1) degrees of freedom.
Setting α = 0.05 and knowing that r = 3 and c = 2, we have $\chi^2_{2,\,0.05} = 5.99$.
The null hypothesis will be rejected if the observed value of the test statistic is above the critical value
$\chi^2_{2,\,0.05} = 5.99$.
In order to compute the test statistic, we first add the marginal totals to the observed table:

OPINION \ GENDER    Male   Female   R_i
Low                  16      39      55
Sufficient           24      17      41
Good                 20      24      44
C_j                  60      80     140

Below, the expected frequencies under the null hypothesis, $E_{ij} = \frac{R_i C_j}{n}$:

OPINION \ GENDER     Male      Female
Low                 23.5714   31.4286
Sufficient          17.5714   23.4286
Good                18.8571   25.1429

Let’s compute the value of the test statistic:

$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(16-23.5714)^2}{23.5714} + \frac{(39-31.4286)^2}{31.4286} + \frac{(24-17.5714)^2}{17.5714} + \frac{(17-23.4286)^2}{23.4286} + \frac{(20-18.8571)^2}{18.8571} + \frac{(24-25.1429)^2}{25.1429} = 8.4931$$

8.4931 > 5.99, therefore we reject H0. The data provide sufficient empirical evidence to affirm that the two
variables are associated at a level of significance α = 0.05 (p-value 0.014313).
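The statistic and the quoted p-value can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original solution; it relies on the fact that with 2 degrees of freedom the Chi-square upper tail equals exp(−x/2), and all variable names are illustrative.

```python
import math

# Observed table of Exercise 14 (rows: Low, Sufficient, Good; columns: Male, Female)
observed = [[16, 39], [24, 17], [20, 24]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        statistic += (o - e) ** 2 / e

# With 2 degrees of freedom: P(chi2_2 > x) = exp(-x / 2)
p_value = math.exp(-statistic / 2)
print(round(statistic, 4), round(p_value, 6))
```

Since the p-value is below α = 0.05, the code leads to the same rejection of H0 as the critical-value comparison.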
EXERCISE 15
100 people are selected at random for an interview. The following two-way table refers to the questions
“will you take part in the next masked parade?” and “interviewee’s gender”.

Gender \ “Will take part…”    Yes    No
M                              12    28
F                              13    47

a) What is the statistical test used to detect association between the variables in the two-way table? What
are the null and the alternative hypotheses?
In order to test for independence we carry out the Chi-squared ($\chi^2$) test. The test’s hypotheses are:
H0: “there is no association between the variables (they are independent)”

H1: “there exists some association between the variables (they are dependent)”
b) Calculate the p-value of the test you indicated in the previous point.
To calculate the p-value we can use the statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

which under H0 has approximately the distribution $\chi^2_{(r-1)(c-1)} = \chi^2_1$ (where c = r = 2 are the number of
columns and the number of rows of the joint distribution table).
If the two variables were independent, the table would be:

Gender \ “Will take part…”    Yes    No
M                              10    30
F                              15    45

The test statistic is calculated as follows:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(12-10)^2}{10} + \frac{(13-15)^2}{15} + \frac{(28-30)^2}{30} + \frac{(47-45)^2}{45} = \frac{4}{10} + \frac{4}{15} + \frac{4}{30} + \frac{4}{45} = 0.8889$$

The p-value is:

$$p\text{-value} = P(\chi^2_1 > \chi^2) = P(\chi^2_1 > 0.8889)$$

Since $\chi^2_{1,\,0.10} = 2.71 > 0.8889$, the p-value is larger than 0.10.
c) Based on the result obtained in point b), make an assertion on H0, with α = 0.05. Briefly comment on the
output.
Since p-value > α, we do not reject the null hypothesis: there is not sufficient empirical evidence to state
that the willingness to take part in the parade varies across genders.
EXERCISE 16
100 people are selected at random for an interview. The following two-way table refers to the questions
“will you take part in the next masked parade?” and “interviewee’s gender”.

Gender \ “Will take part…”    Yes    No
M                              10    30
F                              30    30

a) What is the statistical test used to detect association between the variables in the two-way table? What
are the null and the alternative hypotheses?
In order to test for independence we carry out the Chi-squared ($\chi^2$) test. The test’s hypotheses are:
H0: “there is no association between the variables (they are independent)”

H1: “there exists some association between the variables (they are dependent)”
b) Calculate the p-value of the test you indicated in the previous point.
To calculate the p-value we can use the statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

which under H0 has approximately the distribution $\chi^2_{(r-1)(c-1)} = \chi^2_1$ (where c = r = 2 are the number of
columns and the number of rows of the joint distribution table).
If the two variables were independent, the table would be:

Gender \ “Will take part…”    Yes    No
M                              16    24
F                              24    36

The test statistic is calculated as follows:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(10-16)^2}{16} + \frac{(30-24)^2}{24} + \frac{(30-24)^2}{24} + \frac{(30-36)^2}{36} = \frac{36}{16} + \frac{36}{24} + \frac{36}{24} + \frac{36}{36} = 6.25$$

The p-value is:

$$p\text{-value} = P(\chi^2_1 > \chi^2) = P(\chi^2_1 > 6.25)$$

Thus the p-value is between 0.01 and 0.025.
c) Based on the result obtained in point b), make an assertion on H0, with α = 0.05. Briefly comment on the
output.
Since p-value < α, we reject the null hypothesis: there is sufficient empirical evidence to state that the
willingness to take part in the parade varies across genders (the variables are statistically dependent).
EXERCISE 17
A retail manager of home appliances wants to analyze whether customers have a particular preference
when they choose a flat TV. The three kinds of flat TV sold are: LCD, plasma, LED. In a simple random sample
of 288 purchasers of a flat TV, 112 bought LCD, 103 plasma, 73 LED.
a) Calculate the p-value of the test to verify whether the customers do have a preference or not.
To verify whether the customers have a preference or not, we need to compare the distribution of the
preferences with a uniform distribution:

                                               LCD           PLASMA        LED           Total
                                               p1            p2            p3            1
O_i = observed distribution                    112           103           73            288
Uniform probability distribution               1/3           1/3           1/3           1
E_i = expected distribution in case of no
preferences                                    288*1/3=96    288*1/3=96    288*1/3=96    288

Test:
H0 : uniform distribution (so $p_1 = p_2 = p_3 = \frac{1}{3}$)
H1 : distribution is not uniform, customers have preferences

$$U = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

is distributed under H0 approximately as a $\chi^2_{k-1}$ (with k = 3 and n = 288).

$$U = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(112-96)^2}{96} + \frac{(103-96)^2}{96} + \frac{(73-96)^2}{96} = 8.6875$$

$$p\text{-value} = P(\chi^2_2 > U) = P(\chi^2_2 > 8.6875)$$

From the $\chi^2_2$ distribution table we can see that 0.01 < p-value < 0.025.
b) Assuming α equal to 0.05, what final conclusion would you suggest to the retail manager?
Justify your answer.
Since p-value < 0.05, at a significance level of 5% we reject the null hypothesis H0: there is enough empirical evidence to say
that customers have a preference in their TV set choice.
EXERCISE 18
A market research on drink consumption has been conducted in order to verify the association between
type of drinks and consumers’ age. A survey was administered to 120 customers and their preferences have
been collected and recorded into the following cross-table:
Age of customers \ Type of drink    Cocktail   Liqueur
< 30 years                             70         10
≥ 30 years                             18         22
a) Using an appropriate statistical test, verify the hypothesis of independence of the two variables,
providing the assumptions and the computations needed to take the final decision. What are the
conclusions? Briefly explain.
Perform a chi-square test. The hypotheses to be tested are:
H0: The variables “Age of customer” and “Type of drink” are independent
H1: The variables “Age of customer” and “Type of drink” are dependent
The test statistic to verify the hypothesis of independence is given by:

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(70-58.6667)^2}{58.6667} + \frac{(10-21.3333)^2}{21.3333} + \frac{(18-29.3333)^2}{29.3333} + \frac{(22-10.6667)^2}{10.6667} = 24.6305$$

Since no level of significance is specified, we calculate the p-value considering $\chi^2$ with (2-1)(2-1) = 1 d.f.

$$P[\chi^2 > 24.6305] < 0.001 \ \rightarrow\ \text{Reject } H_0 \text{ (reject independence)}$$

Using the sample data we can conclude that the dependence between Type of drink and Age of customer is
statistically significant.
EXERCISE 19
The business agents working for the same company suspect a salary gap against women. In order to verify
their suspicion, they consider a sample of 46 agents and collect the following data on gender and income:
Gender \ Income Low-Income Middle-Income High-Income
Women 5 10 4
Men 6 10 11
a) Can they conclude that there is actually a dependency between gender and income? Build and use an
appropriate statistical test with a level of significance α = 0.01. What could the agents conclude?
The appropriate test to verify the association between the two variables is the chi-square test of
independence.
The hypotheses to be tested are:
H 0 : there is no association between the variables "Income" and "Gender" (i.e. they are independent)
H1 : there is an association between the variables "Income" and "Gender" (i.e. they are dependent)
The test statistic to be used is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

which, when H0 is true, has approximately the distribution $\chi^2_{(r-1)(c-1)} = \chi^2_2$ (where c = 3 is the number of columns and r = 2
is the number of rows in the table of the joint frequencies).

Expected frequencies table:

Gender \ Income   Low-Income   Middle-Income   High-Income   Tot
Women               4.5435        8.2609          6.1957      19
Men                 6.4565       11.7391          8.8043      27
Tot                11            20              15           46
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(5-4.5435)^2}{4.5435} + \frac{(10-8.2609)^2}{8.2609} + \frac{(4-6.1957)^2}{6.1957} + \frac{(6-6.4565)^2}{6.4565} + \frac{(10-11.7391)^2}{11.7391} + \frac{(11-8.8043)^2}{8.8043} = 2.027$$

We reject H0 when the observed value of the test statistic $\chi^2$ is located in the right tail of the
distribution $\chi^2_2$. From the table of the Chi-square distribution we get $\chi^2_{2;\,0.01} = 9.21 > 2.027$, so we do not
reject the null hypothesis of independence at the 1% significance level (there is no evidence that Income and Gender are
dependent).
On the basis of the sample considered and the test result, the agents cannot conclude that there is a salary gap against women.
EXERCISE 20
A researcher wants to analyse the relation, if any, between children weight and time spent in sports in a
week. A random sample of 320 children between 8 and 10 years old is observed. Let X be weekly time
spent in sports (0 = “less than 1 hour”; 1 = “1-3 hours”; 2 = “more than 3 hours”) and Y be the weight (0 =
“normal”; 1 = “slightly overweight”; 2 = “heavily overweight”).
X\Y 0 1 2
0 30 70 0
1 20 30 60
2 10 40 60
a) Write the hypotheses we should test.

Hypotheses to be tested:
H0: the two variables are independent;

H1: the two variables are not independent.
b) Write the general formula of the statistic that should be used in this case.
Compute the expected frequencies for each pair (i, j) of values of X and Y under the null hypothesis as
$E_{ij} = \frac{R_i C_j}{n}$, where $R_i$ is the marginal frequency corresponding to the i-th row and $C_j$ is the marginal
frequency corresponding to the j-th column. Reject H0 if:

$$\sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(3-1)(3-1),\,0.05},$$

where $O_{ij}$ are the observed frequencies.
c) Test the hypothesis of a relation between the two variables with a significance level α = 0.05.
The following table contains the expected frequencies $E_{ij}$:

X\Y      0         1         2       Tot
0      18.75     43.75     37.5      100
1      20.625    48.125    41.25     110
2      20.625    48.125    41.25     110
Tot    60       140       120        320

The values $(O_{ij} - E_{ij})^2 / E_{ij}$ for each cell are:

X\Y      0         1         2
0      6.75      15.75     37.5
1      0.0189    6.8263    8.5227
2      5.4735    1.3718    8.5227

The sum, 90.7359, exceeds $\chi^2_{(3-1)(3-1),\,0.05} = 9.49$: we reject the hypothesis that X and Y are independent.
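Looking at the individual contributions shows which cells drive the rejection. The sketch below is a minimal Python illustration, not part of the original solution; the variable names are hypothetical.

```python
# Observed table of Exercise 20 (rows: X = 0, 1, 2; columns: Y = 0, 1, 2)
observed = [[30, 70, 0], [20, 30, 60], [10, 40, 60]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Cell contributions (O_ij - E_ij)^2 / E_ij with E_ij = R_i * C_j / n
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
statistic = sum(sum(row) for row in contributions)

# Largest contribution together with its (row, column) position
largest = max(
    (value, i, j) for i, row in enumerate(contributions) for j, value in enumerate(row)
)
print(round(statistic, 4), largest)
```

The largest contribution comes from the cell X = 0, Y = 2, where no children were observed although 37.5 would be expected under independence.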
EXERCISE 21
You want to study the relationship existing between the revenues of a group of companies and whether or
not the companies have a web page. You conduct a sample survey on 445 companies and obtain:
Revenues\ Web site Yes No
>10 Mil $ 56 88
≤10 Mil $ 99 202
i) Are revenues and having a web site independent (state the null and the alternative hypotheses and verify
whether there is an association between the two variables)? Use the p-value approach.
H0: the two variables X, Y are independent
H1: the two variables X, Y are NOT independent; there is a relationship between them
Expected frequencies: \( E_{ij} = \frac{R_i C_j}{n} \)
Revenues\ Web site Yes No
>10 Mil $ 155*144/445 = 50.1573 290*144/445 = 93.8427
≤10 Mil $ 155*301/445 = 104.8427 290*301/445 = 196.1573
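We reject H0 when
\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\,\alpha} \]
Here r = 2 and c = 2, so the statistic has (2-1)(2-1) = 1 degree of freedom, and
\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 0.6806 + 0.3638 + 0.3256 + 0.1740 = 1.544 \]
The p-value is greater than 0.10.
For any reasonable α we do not reject H0: the data give no evidence against the independence of the two variables.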
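The p-value approach of Exercise 21 can be illustrated with a short Python sketch. The function names are hypothetical. The p-value helper assumes exactly 1 degree of freedom, where a chi-square variable is the square of a standard normal, so that P(chi-square > x) = erfc(sqrt(x/2)); for other degrees of freedom a statistics library would be needed.

```python
import math

# Observed counts from Exercise 21
# (rows: revenues > 10 Mil $, revenues <= 10 Mil $; columns: web site Yes, No).
observed = [
    [56, 88],
    [99, 202],
]


def chi_square_statistic(table):
    """Sum of (O_ij - E_ij)^2 / E_ij with E_ij = R_i * C_j / n (illustrative helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat


def p_value_1df(x):
    """Upper-tail probability of a chi-square variable with exactly 1 df."""
    return math.erfc(math.sqrt(x / 2))


stat = chi_square_statistic(observed)
p_value = p_value_1df(stat)

print("statistic:", round(stat, 3))
print("p-value:", round(p_value, 3))
print("reject H0 at alpha = 0.10:", p_value < 0.10)
```

The computed p-value lies above 0.10, in agreement with the conclusion of the solution.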
