
Examiners’ commentaries 2015


ST104b Statistics 2: Preliminary examination

Important note

This commentary reflects the examination and assessment arrangements for this course in the
academic year 2014–15. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).

Information about the subject guide

Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refers to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.

Comments on specific questions

Candidates should answer all FOUR questions: QUESTION 1 of Section A (40 marks) and all
THREE questions from Section B (60 marks in total). Candidates are strongly advised to
divide their time accordingly.

Section A

Answer all parts of Question 1 (40 marks in total).

Question 1

(a) For each one of the statements below say whether the statement is true or false,
explaining your answer.
i. For two independent events, A and B, such that P (A) > 0 and P (B) > 0,
then P (A ∪ B) < P (A) + P (B).
ii. If two events A and B are independent, then A^c and B are independent.
iii. If B ⊂ A, P (A) < 1 and P (B) > 0, then A^c and B are independent.
iv. The significance level of a test is the probability that the null hypothesis is
false.
v. If y and x have a sample correlation coefficient of 0, the estimate of β_1 in the
regression equation y = β_0 + β_1 x + ε will be 0.
(10 marks)
Reading for this question
For (i.–iii.), the relevant probability concepts are covered in Section 2.9 of the subject guide;
for (iv.) see Section 9.7; and for (v.) see Section 11.7.


Approaching the question


i. True. Under independence, P (A ∩ B) = P (A) P (B), so:

P (A ∪ B) = P (A) + P (B) − P (A) P (B) < P (A) + P (B).

ii. True. Noting that B = (A ∩ B) ∪ (A^c ∩ B), then:

\[ P(A^c \cap B) = P(B) - P(A \cap B) = P(B) - P(A)\,P(B) = (1 - P(A))\,P(B) = P(A^c)\,P(B). \]

iii. False. Since B ⊂ A, then:

\[ P(A^c \cap B) = 0 \neq P(A^c)\,P(B) > 0. \]

iv. False. It is the probability that we reject a true null hypothesis.


v. True. The estimate of the slope is proportional to Σ (x_i − x̄)(y_i − ȳ), as is the sample
correlation coefficient. If one is 0, so is the other.
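Statements i. to iii. can be sanity-checked on a small finite sample space. The sketch below assumes one roll of a fair die and two pairs of events chosen purely for illustration (none of these events come from the question); it is a numerical check of particular cases, not a proof.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair die (an assumption, not from the question).
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

def independent(e1, e2):
    """True if P(e1 and e2) = P(e1) P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

# A = even number, B = {1, 2}: independent, since P(A and B) = 1/6 = (1/2)(1/3).
a, b = {2, 4, 6}, {1, 2}
union_below_sum = prob(a | b) < prob(a) + prob(b)      # statement i.
complement_independent = independent(omega - a, b)     # statement ii.

# B2 is a subset of A2, with P(A2) < 1 and P(B2) > 0.
a2, b2 = {1, 2, 3}, {1}
subset_case_independent = independent(omega - a2, b2)  # statement iii.

print(union_below_sum, complement_independent, subset_case_independent)
```

For these events the first two checks hold and the third fails, in line with the answers above.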

(b) A random variable X can take the values 0, 1 and 2. We know that:

\[ P(X = 0) = 1 - \frac{3\alpha}{4}, \quad P(X = 1) = \frac{\alpha}{2} \quad \text{and} \quad P(X = 2) = \frac{\alpha}{4}. \]
One observation is taken and we want to estimate α. Consider the estimators
T1 = X and T2 = 2X(X − 1).
i. Show that T1 and T2 are both unbiased estimators.
ii. Which one of the two estimators would you prefer and why?
(8 marks)
Reading for this question
Estimator properties are covered in Section 7.6 of the subject guide.
Approaching the question
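i. We have:

\[ E(T_1) = \sum_x x\,p(x) = 0 \times \left(1 - \frac{3\alpha}{4}\right) + 1 \times \frac{\alpha}{2} + 2 \times \frac{\alpha}{4} = \frac{\alpha}{2} + \frac{\alpha}{2} = \alpha \]

and:

\[ E(T_2) = \sum_x 2x(x-1)\,p(x) = 2(0)(-1) \times \left(1 - \frac{3\alpha}{4}\right) + 2(1)(0) \times \frac{\alpha}{2} + 2(2)(1) \times \frac{\alpha}{4} = \frac{4\alpha}{4} = \alpha. \]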

Therefore, both estimators are unbiased.


ii. The better estimator is the one with the smaller mean squared error. Since both
estimators are unbiased, this is the one with the smaller variance. Because the expected
values are equal, Var(T) = E(T^2) − α^2 for both estimators, so it is enough to compare the
second moments, E(T_1^2) and E(T_2^2), and choose the estimator with the smaller value.


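We have:

\[ E(T_1^2) = \sum_x x^2\,p(x) = 0^2 \times \left(1 - \frac{3\alpha}{4}\right) + 1^2 \times \frac{\alpha}{2} + 2^2 \times \frac{\alpha}{4} = \frac{\alpha}{2} + \alpha = \frac{3\alpha}{2} \]

and:

\[ E(T_2^2) = \sum_x 2^2 x^2 (x-1)^2\,p(x) = 2^2(0^2)((-1)^2) \times \left(1 - \frac{3\alpha}{4}\right) + 2^2(1^2)(0^2) \times \frac{\alpha}{2} + 2^2(2^2)(1^2) \times \frac{\alpha}{4} = \frac{16\alpha}{4} = 4\alpha. \]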

Since E(T_1^2) < E(T_2^2) for any α > 0, then Var(T_1) < Var(T_2) and so T_1 is the preferred estimator.
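The moment calculations for the two estimators can be checked with exact rational arithmetic. The sketch below assumes an illustrative value α = 1/2 (not given in the question; any value keeping all three probabilities in [0, 1] would do), and the helper names are hypothetical.

```python
from fractions import Fraction

def pmf(alpha):
    """Return {x: P(X = x)} for the distribution in part (b)."""
    return {0: 1 - 3 * alpha / 4, 1: alpha / 2, 2: alpha / 4}

def moments(estimator, alpha):
    """Return (mean, variance) of estimator(X) under pmf(alpha)."""
    p = pmf(alpha)
    mean = sum(estimator(x) * px for x, px in p.items())
    second = sum(estimator(x) ** 2 * px for x, px in p.items())
    return mean, second - mean ** 2

def t1(x):
    return x

def t2(x):
    return 2 * x * (x - 1)

alpha = Fraction(1, 2)  # illustrative value only, an assumption
mean1, var1 = moments(t1, alpha)
mean2, var2 = moments(t2, alpha)
print(mean1, var1)
print(mean2, var2)
```

For this α both means equal α, while the variances are 3α/2 − α^2 = 1/2 for T_1 and 4α − α^2 = 7/4 for T_2, so T_1 has the smaller variance.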

(c) A random sample {X_1, X_2, ..., X_n} is drawn from the following distribution:

\[ p(x; \lambda) = \frac{\lambda^{2x} e^{-\lambda^2}}{x!}. \]

i. Find the maximum likelihood estimator for λ.
ii. State the maximum likelihood estimator for θ = λ^3.
(9 marks)
Reading for this question
Maximum likelihood estimation is covered in Section 7.9 of the subject guide.
Approaching the question
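i. The likelihood function is:

\[ L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{2X_i} e^{-\lambda^2}}{X_i!} = \frac{\lambda^{2\sum_{i=1}^{n} X_i}\, e^{-n\lambda^2}}{\prod_{i=1}^{n} X_i!}. \]

The log-likelihood function is:

\[ l(\lambda) = \ln L(\lambda) = 2\left(\sum_{i=1}^{n} X_i\right) \ln \lambda - n\lambda^2 - \ln\left(\prod_{i=1}^{n} X_i!\right). \]

Differentiating:

\[ \frac{d}{d\lambda}\, l(\lambda) = \frac{2\sum_{i=1}^{n} X_i}{\lambda} - 2n\lambda = \frac{2\sum_{i=1}^{n} X_i - 2n\lambda^2}{\lambda}. \]

Setting to zero, we rearrange for the estimator:

\[ 2\sum_{i=1}^{n} X_i - 2n\hat{\lambda}^2 = 0 \quad \Rightarrow \quad \hat{\lambda} = \left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^{1/2} = \bar{X}^{1/2}. \]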

ii. By the invariance principle of maximum likelihood estimators:

\[ \hat{\theta} = \hat{\lambda}^3 = \bar{X}^{3/2}. \]
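As a rough numerical cross-check of the closed-form estimator, the log-likelihood can be maximised over a grid and compared with the square root of the sample mean. The sample below is invented for illustration, and the grid range (0, 5] is an arbitrary assumption.

```python
import math

def log_likelihood(lam, xs):
    """Log-likelihood of lam for p(x; lam) = lam**(2x) * exp(-lam**2) / x!."""
    n = len(xs)
    return (2 * sum(xs) * math.log(lam) - n * lam ** 2
            - sum(math.lgamma(x + 1) for x in xs))

xs = [0, 2, 1, 3, 1, 0, 2, 4]  # hypothetical sample, not from the question

lam_hat = math.sqrt(sum(xs) / len(xs))  # closed form from part i.
theta_hat = lam_hat ** 3                # invariance, part ii.

# Crude grid search with step 0.001 as an independent check on the closed form.
grid = [k / 1000 for k in range(1, 5001)]
lam_grid = max(grid, key=lambda lam: log_likelihood(lam, xs))

print(lam_hat, lam_grid, theta_hat)
```

The grid maximiser should agree with the closed form to within the grid spacing.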


(d) A professor finds that he awards a final grade of A to 20% of his students. Of
those who obtain a final grade of A, 70% obtained an A on the midterm
examination. Also, 10% of the students who failed to obtain a final grade of A
earned an A on the midterm examination. What is the probability that a
student with an A on the midterm examination will obtain a final grade of A?
(5 marks)
Reading for this question
Conditional probability is covered in Section 2.9 of the subject guide.
Approaching the question
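Let A_f denote a final grade of A and A_m an A on the midterm examination. By Bayes' theorem:

\[ P(A_f \mid A_m) = \frac{P(A_f \cap A_m)}{P(A_m)} = \frac{P(A_m \mid A_f)\,P(A_f)}{P(A_m \mid A_f)\,P(A_f) + P(A_m \mid A_f^c)\,P(A_f^c)} = \frac{0.7 \times 0.2}{(0.7 \times 0.2) + (0.1 \times 0.8)} = 0.6364. \]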
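The same Bayes' theorem calculation can be written as a short function. This is a minimal sketch, and the function and argument names are hypothetical.

```python
def posterior(prior, p_evidence_given_event, p_evidence_given_complement):
    """P(event | evidence) by Bayes' theorem for an event and its complement."""
    numerator = p_evidence_given_event * prior
    denominator = numerator + p_evidence_given_complement * (1 - prior)
    return numerator / denominator

# prior = P(final A), then P(midterm A | final A) and P(midterm A | no final A).
p_final_a_given_midterm_a = posterior(0.2, 0.7, 0.1)
print(round(p_final_a_given_midterm_a, 4))  # 0.6364
```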

(e) The random variables ε_i, for i = 1, 2 and 3, are independent with mean 0 and
variance 1, and α is an unknown parameter. Suppose that you are given
observations y_1, y_2 and y_3 such that:

y_1 = 2α + ε_1
y_2 = −α + ε_2
y_3 = −α + ε_3.

Find the least squares estimator of α, verify it is unbiased and calculate its
variance.
(8 marks)
Reading for this question
Least squares estimation is covered in Section 7.8 of the subject guide.
Approaching the question
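We need to find the value α̂ that minimises:

\[ (y_1 - 2\alpha)^2 + (y_2 + \alpha)^2 + (y_3 + \alpha)^2. \]

Differentiating with respect to α and setting to zero, we have:

\[ -2 \times 2(y_1 - 2\hat{\alpha}) + 2(y_2 + \hat{\alpha}) + 2(y_3 + \hat{\alpha}) = 0 \]

and therefore:

\[ \hat{\alpha} = \frac{2y_1 - y_2 - y_3}{6}. \]

To verify it is unbiased:

\[ E(\hat{\alpha}) = \frac{2E(y_1) - E(y_2) - E(y_3)}{6} = \frac{2 \times 2\alpha - (-\alpha) - (-\alpha)}{6} = \alpha. \]

Also, since y_1, y_2 and y_3 are independent, each with variance 1:

\[ \mathrm{Var}(\hat{\alpha}) = \left(\frac{2}{6}\right)^2 + \left(\frac{1}{6}\right)^2 + \left(\frac{1}{6}\right)^2 = \frac{1}{6}. \]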
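Because the estimator is a linear combination of the observations, its bias and variance follow directly from the weights. The sketch below uses the general one-parameter formula α̂ = Σ c_i y_i / Σ c_i^2 with c = (2, −1, −1), which reduces to the estimator derived above; the variable names are hypothetical.

```python
from fractions import Fraction

coeffs = [2, -1, -1]                 # y_i = coeffs[i] * alpha + eps_i
sum_sq = sum(c * c for c in coeffs)  # equals 6

def least_squares(ys):
    """Least squares estimate of alpha: sum(c_i * y_i) / sum(c_i ** 2)."""
    return sum(c * y for c, y in zip(coeffs, ys)) / sum_sq

# alpha_hat = sum(w_i * y_i) with weights w_i = c_i / 6.
weights = [Fraction(c, sum_sq) for c in coeffs]

# E(alpha_hat) = bias_factor * alpha, so the estimator is unbiased if this is 1.
bias_factor = sum(w * c for w, c in zip(weights, coeffs))

# Independent errors with variance 1 give Var(alpha_hat) = sum(w_i ** 2).
variance = sum(w * w for w in weights)

print(bias_factor, variance)
```

With noise-free observations such as (6, −3, −3), which correspond to α = 3, `least_squares` returns exactly 3.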


Section B

Answer all questions from this section.

Question 2

The random variable X has a probability density function given by:


f (x) = k(2x + 1)(x + 3)
defined over the region 0 < x < 1, and 0 otherwise.

(a) Show that k = 6/43.


(3 marks)
(b) Calculate P (X < 0.25 | X < 0.5).
(4 marks)
(c) Calculate Cov(1/(2X + 1), 2X + 1).
(8 marks)
(d) Determine the standard deviation of X.
(5 marks)

Reading for this question

Continuous random variables are covered in Section 3.7 of the subject guide.

Approaching the question

(a) We have:

\[ 1 = \int_0^1 f(x)\,dx = \int_0^1 k(2x+1)(x+3)\,dx = k \int_0^1 (2x^2 + 7x + 3)\,dx = k \left[\frac{2x^3}{3} + \frac{7x^2}{2} + 3x\right]_0^1 = \frac{43}{6}\,k. \]

Hence k = 6/43.
(b) Since {X < 0.25} ∩ {X < 0.5} = {X < 0.25}, we have:

\[ P(X < 0.25 \mid X < 0.5) = \frac{P(X < 0.25)}{P(X < 0.5)} = \frac{\int_0^{0.25} f(x)\,dx}{\int_0^{0.5} f(x)\,dx} = \frac{(0.25)^3 \times 2/3 + (0.25)^2 \times 7/2 + 3 \times 0.25}{(0.5)^3 \times 2/3 + (0.5)^2 \times 7/2 + 3 \times 0.5} = 0.3983. \]
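Parts (a) and (b) only involve integrating a polynomial, so they can be verified exactly. The sketch below represents (2x + 1)(x + 3) by its coefficients; the helper `integral_0_to` is a hypothetical name.

```python
from fractions import Fraction

# Coefficients of (2x + 1)(x + 3) = 3 + 7x + 2x^2, lowest power first.
poly = [Fraction(3), Fraction(7), Fraction(2)]

def integral_0_to(b, coeffs):
    """Integrate sum(c_j * x**j) from 0 to b exactly."""
    return sum(c * b ** (j + 1) / (j + 1) for j, c in enumerate(coeffs))

# (a) The density must integrate to 1 over (0, 1).
k = 1 / integral_0_to(Fraction(1), poly)

# (b) The constant k cancels in the ratio of the two probabilities.
cond_prob = integral_0_to(Fraction(1, 4), poly) / integral_0_to(Fraction(1, 2), poly)

print(k, cond_prob, float(cond_prob))
```

The exact conditional probability is 47/118, which rounds to 0.3983.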

(c) We have:

\[ \mathrm{Cov}\left(\frac{1}{2X+1},\, 2X+1\right) = E\left(\frac{2X+1}{2X+1}\right) - E\left(\frac{1}{2X+1}\right) E(2X+1) = 1 - E\left(\frac{1}{2X+1}\right) E(2X+1). \]


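Also:

\[ E(X) = \frac{6}{43} \int_0^1 x(2x+1)(x+3)\,dx = \frac{6}{43} \int_0^1 (2x^3 + 7x^2 + 3x)\,dx = \frac{6}{43} \left[\frac{2x^4}{4} + \frac{7x^3}{3} + \frac{3x^2}{2}\right]_0^1 = 0.6047 \]

and:

\[ E\left(\frac{1}{2X+1}\right) = \frac{6}{43} \int_0^1 \frac{(2x+1)(x+3)}{2x+1}\,dx = \frac{6}{43} \left[\frac{x^2}{2} + 3x\right]_0^1 = 0.4884. \]

Therefore:

\[ \mathrm{Cov}\left(\frac{1}{2X+1},\, 2X+1\right) = 1 - 0.4884 \times (2 \times 0.6047 + 1) = -0.0791. \]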
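The covariance can also be evaluated with exact fractions, which avoids rounding the intermediate expectations. In the sketch below the helper name is hypothetical. Exact arithmetic gives −146/1849, approximately −0.0790, so the fourth decimal place differs slightly from the value obtained with rounded intermediates.

```python
from fractions import Fraction

k = Fraction(6, 43)

def integral_0_to_1(coeffs):
    """Integrate sum(c_j * x**j) over (0, 1); coefficients lowest power first."""
    return sum(Fraction(c, j + 1) for j, c in enumerate(coeffs))

# x(2x + 1)(x + 3) = 3x + 7x^2 + 2x^3
mean_x = k * integral_0_to_1([0, 3, 7, 2])

# (2x + 1)(x + 3) / (2x + 1) = 3 + x
mean_recip = k * integral_0_to_1([3, 1])

# Cov(1/(2X+1), 2X+1) = 1 - E(1/(2X+1)) * E(2X+1)
cov = 1 - mean_recip * (2 * mean_x + 1)

print(mean_x, mean_recip, cov, float(cov))
```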

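(d) We have:

\[ \sigma = \sqrt{E(X^2) - (E(X))^2}. \]

Since:

\[ E(X^2) = \frac{6}{43} \int_0^1 x^2(2x+1)(x+3)\,dx = \frac{6}{43} \int_0^1 (2x^4 + 7x^3 + 3x^2)\,dx = \frac{6}{43} \left[\frac{2x^5}{5} + \frac{7x^4}{4} + \frac{3x^3}{3}\right]_0^1 = 0.4395 \]

then:

\[ \sigma = \sqrt{0.4395 - (0.6047)^2} = 0.2717. \]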
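The standard deviation can be checked in the same way. The sketch below keeps the variance as an exact fraction and only takes the square root at the end; the helper name is hypothetical. Without rounding the intermediate moments the result is about 0.2719, marginally different from the figure obtained from the four-decimal values.

```python
import math
from fractions import Fraction

k = Fraction(6, 43)

def integral_0_to_1(coeffs):
    """Integrate sum(c_j * x**j) over (0, 1); coefficients lowest power first."""
    return sum(Fraction(c, j + 1) for j, c in enumerate(coeffs))

mean_x = k * integral_0_to_1([0, 3, 7, 2])            # E(X)
second_moment = k * integral_0_to_1([0, 0, 3, 7, 2])  # E(X^2)

variance = second_moment - mean_x ** 2
sd = math.sqrt(variance)  # math.sqrt converts the Fraction to a float

print(second_moment, variance, sd)
```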

Question 3

(a) Consider two random variables X and Y . They both take the values −1, 0 and
1. The joint probabilities for each pair of values, (x, y), are given in the
following table.
            X = −1    X = 0    X = 1
Y = −1       0.09      0.16     0.15
Y = 0        0.09      0.08     0.03
Y = 1        0.12      0.16     0.12

i. Calculate E(X | Y = 0) and E(X | X + Y = 1).


ii. Define U = |X| and V = Y . Calculate E(U ) and the covariance of U and V .
Are U and V correlated?

(10 marks)

(b) A random sample of size n = 20 is taken from N(µ, σ²). Consider the following
hypothesis test:
H0 : σ² = 2 vs. H1 : σ² > 2
to be conducted at the 10% significance level. Determine the power of the test
for σ² = 2 and σ² = 3. (You may use the closest available values in the
statistical tables provided.)
(10 marks)


Reading for this question

For (a), discrete bivariate distributions are covered in Sections 5.6 to 5.8 of the subject guide.
For (b), tests for variances of normal distributions are covered in Section 9.11.

Approaching the question

(a) i. We have P(Y = 0) = 0.09 + 0.08 + 0.03 = 0.2, hence:

\[ P(X = -1 \mid Y = 0) = \frac{0.09}{0.2} = 0.45, \quad P(X = 0 \mid Y = 0) = \frac{0.08}{0.2} = 0.4, \quad P(X = 1 \mid Y = 0) = \frac{0.03}{0.2} = 0.15 \]

and therefore:

\[ E(X \mid Y = 0) = -1 \times 0.45 + 0 \times 0.4 + 1 \times 0.15 = -0.3. \]

We also have P(X + Y = 1) = 0.16 + 0.03 = 0.19, hence:

\[ P(X = 0 \mid X + Y = 1) = \frac{0.16}{0.19} = \frac{16}{19}, \quad P(X = 1 \mid X + Y = 1) = \frac{0.03}{0.19} = \frac{3}{19} \]

and therefore:

\[ E(X \mid X + Y = 1) = 0 \times \frac{16}{19} + 1 \times \frac{3}{19} = \frac{3}{19} = 0.1579. \]
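Both conditional expectations follow the same pattern: restrict the joint table to the conditioning event, renormalise, and average X. A minimal sketch, with a hypothetical helper that takes the conditioning event as a predicate on (x, y):

```python
# Joint probabilities P(X = x, Y = y), keyed by (x, y), copied from the table.
joint = {
    (-1, -1): 0.09, (0, -1): 0.16, (1, -1): 0.15,
    (-1, 0): 0.09, (0, 0): 0.08, (1, 0): 0.03,
    (-1, 1): 0.12, (0, 1): 0.16, (1, 1): 0.12,
}

def conditional_mean_x(condition):
    """E(X | condition), where condition is a predicate on (x, y)."""
    total = sum(p for (x, y), p in joint.items() if condition(x, y))
    weighted = sum(x * p for (x, y), p in joint.items() if condition(x, y))
    return weighted / total

e_given_y0 = conditional_mean_x(lambda x, y: y == 0)
e_given_sum1 = conditional_mean_x(lambda x, y: x + y == 1)

print(round(e_given_y0, 4), round(e_given_sum1, 4))
```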
ii. Here is the table of joint probabilities:

            U = 0    U = 1
V = −1       0.16     0.24
V = 0        0.08     0.12
V = 1        0.16     0.24
We then have P (U = 0) = 0.16 + 0.08 + 0.16 = 0.4 and P (U = 1) = 1 − P (U = 0) = 0.6.
Also, P (V = −1) = 0.4, P (V = 0) = 0.2 and P (V = 1) = 0.4. So:

E(U) = 0 × 0.4 + 1 × 0.6 = 0.6
E(V) = −1 × 0.4 + 0 × 0.2 + 1 × 0.4 = 0
E(UV) = 1 × (−1) × 0.24 + 1 × 1 × 0.24 = 0.

Hence Cov(U, V) = E(UV) − E(U) E(V) = 0 − 0.6 × 0 = 0. Since the covariance is zero,
so is the correlation coefficient, therefore U and V are uncorrelated.
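The covariance of U = |X| and V = Y can be computed directly from the original joint table of X and Y, without first tabulating the distribution of (U, V). A minimal sketch with a hypothetical `expectation` helper:

```python
# Joint probabilities P(X = x, Y = y), keyed by (x, y), copied from the question.
joint = {
    (-1, -1): 0.09, (0, -1): 0.16, (1, -1): 0.15,
    (-1, 0): 0.09, (0, 0): 0.08, (1, 0): 0.03,
    (-1, 1): 0.12, (0, 1): 0.16, (1, 1): 0.12,
}

def expectation(g):
    """E(g(X, Y)) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_u = expectation(lambda x, y: abs(x))       # E(U)
e_v = expectation(lambda x, y: y)            # E(V)
e_uv = expectation(lambda x, y: abs(x) * y)  # E(UV)
cov_uv = e_uv - e_u * e_v

print(round(e_u, 4), round(cov_uv, 4))
```

Floating-point arithmetic may leave a tiny non-zero residue in `cov_uv`, so comparisons with zero are best made with a tolerance.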

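(b) Let T = (n − 1)S²/σ_0² be the test statistic, where σ_0² = 2 is the value of σ² under H0.
The power of the test at σ² is:

\[ \beta(\sigma) = P_\sigma(H_0 \text{ is rejected}) = P_\sigma\left(T > \chi^2_{\alpha,\, n-1}\right) = P_\sigma\left(\frac{(n-1)S^2}{\sigma_0^2} > \chi^2_{\alpha,\, n-1}\right) = P_\sigma\left(\frac{(n-1)S^2}{\sigma^2} > \frac{\sigma_0^2}{\sigma^2} \cdot \chi^2_{\alpha,\, n-1}\right) = P\left(\chi^2_{n-1} > \frac{\sigma_0^2}{\sigma^2} \cdot \chi^2_{\alpha,\, n-1}\right). \]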


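Hence here:

\[ \beta(\sigma) = P\left(\chi^2_{19} > 2 \times \chi^2_{0.1,\,19} / \sigma^2\right) = P\left(\chi^2_{19} > 2 \times 27.20 / \sigma^2\right). \]

With any given value of σ², we may compute β(σ). For the σ² values requested, we obtain
the following.

σ²                         2        3
2 × χ²_{0.1, 19} / σ²      27.20    18.13
Approx. β(σ)               0.10     0.50

At σ² = 2, the value under H0, the power equals the significance level of 0.10, because
27.20 is by definition the upper 10% point of the χ² distribution with 19 degrees of freedom.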
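The power values can also be approximated without tables. The sketch below implements the upper tail of the χ² distribution through the standard power series for the regularised lower incomplete gamma function, using only the Python standard library; the function names are hypothetical and the critical value 27.20 is taken from the tables as above. At σ² = 2 the power should reproduce the 10% significance level, and at σ² = 3 it comes out a little above 0.5.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for x > 0.

    Uses the series gamma(a, z) = z**a * exp(-z) * sum_k z**k / (a (a+1) ... (a+k))
    with a = df / 2 and z = x / 2.
    """
    a, z = df / 2, x / 2
    term = 1 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return 1 - lower

n = 20
sigma0_sq = 2
critical = 27.20  # upper 10% point of chi-square(19), from statistical tables

def power(sigma_sq):
    """Power of the test of H0: sigma^2 = 2 against H1: sigma^2 > 2."""
    return chi2_sf(sigma0_sq * critical / sigma_sq, n - 1)

print(round(power(2), 3), round(power(3), 3))
```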

Question 4

A supermarket chain investigated its profit margins over a period of 4 years. It split
its merchandise into 6 categories and recorded the profit margin for each category
for each year. Among the 6 categories were meat products, with an average yearly
profit margin of 1.1%. The investigators constructed an analysis of variance
(ANOVA) table, which is given below (some entries are missing).

Source      DF    SS       MS       F
Category          62.10
Year                                3.31
Residual                   3.14
Total

(a) Complete the table.


(7 marks)

(b) Is there a significant difference between the profit margins of different
categories? What about the profit margins for different years?
(6 marks)
(c) Construct a 95% confidence interval for the average yearly profit margin for
meat products.
(4 marks)
(d) Consider the two-way ANOVA model:

X_ij = µ + γ_i + β_j + ε_ij

for i = 1, 2, ..., r and j = 1, 2, ..., c. State the assumed distribution of ε_ij.


(3 marks)

Reading for this question

Two-way analysis of variance is covered in Section 10.9 of the subject guide.

Approaching the question

(a) The complete table is:


Source      DF    SS        MS       F
Category     5     62.10    12.42    3.96
Year         3     31.18    10.39    3.31
Residual    15     47.10     3.14
Total       23    140.38
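The missing entries follow from the three given values together with the design (6 categories, 4 years, one observation per cell). The sketch below spells out the arithmetic; the variable names are hypothetical.

```python
r, c = 6, 4            # number of categories and number of years
ss_category = 62.10    # given
f_year = 3.31          # given
ms_residual = 3.14     # given

# Degrees of freedom for a two-way layout with one observation per cell.
df_category = r - 1
df_year = c - 1
df_residual = df_category * df_year
df_total = r * c - 1

# MS = SS / DF and F = MS / MS_residual, rearranged as needed.
ms_category = ss_category / df_category
f_category = ms_category / ms_residual
ms_year = f_year * ms_residual
ss_year = ms_year * df_year
ss_residual = ms_residual * df_residual
ss_total = ss_category + ss_year + ss_residual

print(df_category, df_year, df_residual, df_total)
print(round(ms_category, 2), round(f_category, 2))
print(round(ms_year, 2), round(ss_year, 2))
print(round(ss_residual, 2), round(ss_total, 2))
```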


(b) We test the hypothesis H0: There is no difference between the profit margins of different
categories against the alternative H1: There is a difference between the profit margins of
different categories. The F value is 3.96 and, at a 5% significance level, the critical value is
F_{0.05, 5, 15} = 2.90. Since 3.96 > 2.90, we reject H0 and conclude that there is evidence of a
difference between categories.

We also test the hypothesis H0: There is no difference between the profit margins in different
years against the alternative H1: There is a difference between the profit margins in different
years. The F value is 3.31 and, at a 5% significance level, the critical value is
F_{0.05, 3, 15} = 3.29. Since 3.31 > 3.29, we again reject H0 and conclude that there is evidence
of a difference between years, although only marginally at this level.

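(c) We have x̄ = 1.1, t_{0.025, 15} = 2.131 and s² = 3.14. The mean for meat products is an
average over 4 yearly observations, so a 95% confidence interval is:

\[ 1.1 \pm 2.131 \times \sqrt{\frac{3.14}{4}} \quad \Rightarrow \quad (-0.79,\ 2.99). \]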
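The interval endpoints can be reproduced in a few lines. The sketch takes the t critical value from the tables as quoted above and uses the residual mean square as the estimate of σ²; the variable names are hypothetical.

```python
import math

mean = 1.1          # average yearly profit margin for meat products (%)
t_crit = 2.131      # t_{0.025, 15}, from statistical tables
ms_residual = 3.14  # residual mean square, used as the estimate of sigma^2
n_years = 4         # number of observations behind the category mean

half_width = t_crit * math.sqrt(ms_residual / n_years)
interval = (mean - half_width, mean + half_width)

print(round(interval[0], 2), round(interval[1], 2))  # -0.79 2.99
```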

(d) We assume that the ε_ij are independent with ε_ij ∼ N(0, σ²) for all i = 1, 2, ..., r and j = 1, 2, ..., c.
