
Solutions for Statistics 1

Mock Examination 2010


Dr James Abdey

Section A
1 (a)

(Total 8 marks)

i. 95% CI formula: $\bar{x} \pm t_{(n-1),\,0.025} \times \dfrac{s}{\sqrt{n}}$
Degrees of freedom: $n - 1 = 24$
t value: 2.064
Interval end-points: $18.8 \pm 2.064 \times \dfrac{5.25}{\sqrt{25}}$ = 16.63 and 20.97

Confidence interval: (16.63, 20.97)


ii. Interpretation: 95% confident that true mean lies in this interval.
iii. Since 20 is in the CI, the manufacturer's claim is supported.
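
The interval in part i can be re-computed as a quick check. This is a minimal sketch using the figures quoted in the solution (sample mean 18.8, s = 5.25, n = 25, tabulated t value 2.064); the variable names are illustrative, and the t value is taken from tables rather than computed.

```python
import math

# Figures quoted in the solution; t_crit is the tabulated t value for 24 df.
sample_mean = 18.8
sample_sd = 5.25
n = 25
t_crit = 2.064

# Margin of error: t * s / sqrt(n)
margin = t_crit * sample_sd / math.sqrt(n)
ci_lower = sample_mean - margin
ci_upper = sample_mean + margin

print(f"95% CI: ({ci_lower:.2f}, {ci_upper:.2f})")

# Part iii: the claimed mean of 20 is supported if it lies inside the interval.
claimed_mean = 20
print("Claim inside interval:", ci_lower <= claimed_mean <= ci_upper)
```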

1 (b)

(Total 8 marks)

i. False: $P(-1 \le Z \le 1) = 0.6827$.
ii. False: If $r < 0$, then the estimated regression slope is $< 0$ always.
iii. False: Depends on $\alpha$ and degrees of freedom.
iv. False: Median $\le Q_3$.

1 (c)

(Total 6 marks)

Mean = 31.3, median = 25, mode = 15 and 25 (bimodal), range = 69.


Mean greater than median due to extreme observations 62 and 74.

1 (d)

(Total 6 marks)

Let Pb and Sb be percolator and steamer breaking down respectively.


i. $P(Pb \cap Sb^c) = P(Pb)\,P(Sb^c) = 0.92 \times 0.33 = 0.3036$.
ii. $P(Pb^c \cap Sb) = P(Pb^c)\,P(Sb) = 0.08 \times 0.67 = 0.0536$.
iii. $P(Pb \cap Sb^c) + P(Pb^c \cap Sb) = 0.3036 + 0.0536 = 0.3572$.
iv. $1 - P(Pb^c \cap Sb^c) = 1 - P(Pb^c)\,P(Sb^c) = 1 - (0.08 \times 0.33) = 0.9736$.
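
The four products can be verified with a few lines of arithmetic. This sketch assumes, as the solution does, that the two events are independent, and takes P(Pb) = 0.92 and P(Sb) = 0.67 from the working; the variable names are illustrative.

```python
# Probabilities used in the solution; independence of the two events is assumed.
p_pb = 0.92            # P(Pb)
p_sb = 0.67            # P(Sb)
p_pb_c = 1 - p_pb      # P(Pb^c)
p_sb_c = 1 - p_sb      # P(Sb^c)

only_pb = p_pb * p_sb_c                 # part i
only_sb = p_pb_c * p_sb                 # part ii
exactly_one = only_pb + only_sb         # part iii (the two events are disjoint)
at_least_one = 1 - p_pb_c * p_sb_c      # part iv

for label, value in [("i", only_pb), ("ii", only_sb),
                     ("iii", exactly_one), ("iv", at_least_one)]:
    print(f"{label}: {value:.4f}")
```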

1 (e)

(Total 8 marks)
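Number of professional males = $\frac{250}{2000} \times 450 \approx 56$

Number of professional females = $\frac{250}{2000} \times 296 = 37$

Number of CPD males = $\frac{250}{2000} \times 591 \approx 74$

Number of CPD females = $\frac{250}{2000} \times 261 \approx 33$

Number of leisure males = $\frac{250}{2000} \times 60 \approx 7$

Number of leisure females = $\frac{250}{2000} \times 342 \approx 43$
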

To select the sample, obtain a list (sampling frame) from the university. Randomly select
the appropriate number of students from each stratum.
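
The proportional allocation can be sketched in code. The stratum sizes (450, 296, 591, 261, 60, 342) and the sample size of 250 out of 2000 come from the working above; the largest-remainder rounding rule is an assumption, chosen because it reproduces the stated figures (including 7 rather than 8 leisure males) while keeping the total at exactly 250.

```python
from math import floor

# Stratum sizes from the solution (they sum to 2000).
strata = {
    "professional males": 450,
    "professional females": 296,
    "CPD males": 591,
    "CPD females": 261,
    "leisure males": 60,
    "leisure females": 342,
}
sample_size = 250


def proportional_allocation(strata, sample_size):
    """Allocate a sample across strata in proportion to stratum size.

    Rounding uses the largest-remainder rule (an assumption, not stated
    in the solution) so that the allocations sum to sample_size.
    """
    total = sum(strata.values())
    exact = {name: sample_size * size / total for name, size in strata.items()}
    alloc = {name: floor(value) for name, value in exact.items()}
    shortfall = sample_size - sum(alloc.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name],
                          reverse=True)
    for name in by_remainder[:shortfall]:
        alloc[name] += 1
    return alloc


allocation = proportional_allocation(strata, sample_size)
for name, count in allocation.items():
    print(f"{name}: {count}")
```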

1 (f)

(Total 10 marks)

i. [Figure: Regression line of Holiday Expenditure v. Family Income; x-axis: Family income (ticks 20 to 100), y-axis: Holiday expenditure (ticks 10 to 30).]

Need accurate line, title and axis labels.


ii. Line indicates a positive (linear) relationship between family income and holiday
expenditure. 25% of each additional pound of income is spent on holidays. According
to the model, a family with no income spends £5 on holidays.
iii. If income drops by £3,500, then expenditure is likely to decrease by £875.
iv. To determine whether extrapolation is used in prediction, knowledge of the range of x data
would be required.
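
The arithmetic in parts ii and iii follows from the fitted line. This is a small sketch assuming the line is expenditure = 5 + 0.25 × income, as the interpretation above implies; the function name is illustrative and the units are whatever the question uses.

```python
# Intercept and slope implied by the interpretation in part ii.
intercept = 5.0
slope = 0.25


def predicted_expenditure(income):
    """Holiday expenditure predicted by the fitted line."""
    return intercept + slope * income


# Part ii: predicted expenditure at zero income is the intercept.
print(predicted_expenditure(0))

# Part iii: a change in income shifts the prediction by slope * change.
income_drop = 3500
expenditure_drop = slope * income_drop
print(expenditure_drop)
```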

1 (g)

(Total 4 marks)

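i. $P(WW) + P(YY) + P(BB) = \left(\dfrac{18}{30} \times \dfrac{17}{29}\right) + \left(\dfrac{8}{30} \times \dfrac{7}{29}\right) + \left(\dfrac{4}{30} \times \dfrac{3}{29}\right) = \dfrac{187}{435} = 0.4299$.

ii. $P(W^c W^c) = \dfrac{12}{30} \times \dfrac{11}{29} = \dfrac{22}{145} = 0.1517$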
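
Both answers can be checked with exact fractions. The sketch below assumes two draws without replacement from 30 items split 18 W, 8 Y and 4 B, which is what the denominators 30 and 29 in the working imply; the helper name is illustrative.

```python
from fractions import Fraction

# Counts implied by the working: 18 W, 8 Y, 4 B, 30 in total.
counts = {"W": 18, "Y": 8, "B": 4}
total = sum(counts.values())


def both_from_group(group_size, total):
    """P(two draws without replacement both come from a group of this size)."""
    return Fraction(group_size, total) * Fraction(group_size - 1, total - 1)


# Part i: both draws are the same type.
p_same = sum(both_from_group(size, total) for size in counts.values())

# Part ii: neither draw is W, i.e. both come from the 12 non-W items.
p_neither_w = both_from_group(total - counts["W"], total)

print(p_same, round(float(p_same), 4))
print(p_neither_w, round(float(p_neither_w), 4))
```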

Section B
2 (a)

(Total 12 marks)

i. H0 : Supplier and insulation type are independent


H1 : Supplier and insulation type are not independent
ii. Degrees of freedom = $(4 - 1) \times (3 - 1) = 6$
At $\alpha = 0.05$, critical value is 12.59
Hence reject H0; the test is statistically significant at the 5% level.
For second level, choose $\alpha < 0.05$.
Test, say, at $\alpha = 0.01$, critical value is 16.81.
Hence do reject H0 (just) at the 1% level.
iii. The test is highly significant. We conclude that there is an association between supplier
and insulation type.
From the table it can be seen that supplier C tends to specialise in Seal F, which supplier
A tends to avoid.
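
The two critical values can be sanity-checked without tables. For an even number of degrees of freedom the chi-squared upper-tail probability has a closed form, so the sketch below evaluates it at 12.59 and 16.81 with 6 degrees of freedom. The test statistic itself is not quoted in this solution, so only the critical values are checked; the function name is illustrative.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with a positive even df.

    Uses the closed form exp(-x/2) * sum_{j<df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half_x = x / 2
    terms = (half_x ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half_x) * sum(terms)


df = (4 - 1) * (3 - 1)
tail_5pct = chi2_upper_tail_even_df(12.59, df)
tail_1pct = chi2_upper_tail_even_df(16.81, df)
print(df, round(tail_5pct, 4), round(tail_1pct, 4))
```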

2 (b)

(Total 13 marks)

i. Draw stem-and-leaf plot with sensible stems


Informative title
Stem label
Leaf label
Accuracy (including leaf alignment)
Stem-and-leaf plot of waiting times of A&E patients (in hours)
Stem = hours | Leaf = fraction of hour
1 | 045688
2 | 1144456679
3 | 02478
4 | 2446
5 | 1234
6 | 0026
7 | 16

ii. Mean = 3.69 hours


Median = 3.2 hours
Q1 = 2.325 (accept [2.3, 2.4]), Q3 = 5.125 (accept [5.1, 5.2]),
Quartile deviation = $\dfrac{Q_3 - Q_1}{2} = 1.4$.

iii. Mean > median, hence distribution is skewed, specifically positively skewed. Mean affected
by extreme values such as 7.1 and 7.6. Modal waiting time 2.4 hours.
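
The summary statistics can be recomputed from the stem-and-leaf display. In this sketch the leaves are read as tenths of an hour, and the quartiles use linear interpolation at position n × p in the ordered data; that convention is an assumption, chosen because it reproduces the stated Q1 = 2.325 and Q3 = 5.125 (other conventions give values inside the accepted ranges).

```python
from statistics import mean, median, mode

# Stems are whole hours, leaves are tenths of an hour.
stem_and_leaf = {
    1: "045688",
    2: "1144456679",
    3: "02478",
    4: "2446",
    5: "1234",
    6: "0026",
    7: "16",
}

# Work in integer tenths of an hour to avoid floating-point noise.
tenths = sorted(
    stem * 10 + int(leaf)
    for stem, leaves in stem_and_leaf.items()
    for leaf in leaves
)


def quantile_at_np(sorted_values, p):
    """Quantile at 1-based position n*p with linear interpolation (assumed rule)."""
    n = len(sorted_values)
    position = n * p
    if not 1 <= position < n:
        raise ValueError("position n*p must lie in [1, n)")
    whole = int(position)
    fraction = position - whole
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    return below + fraction * (above - below)


mean_hours = mean(tenths) / 10
median_hours = median(tenths) / 10
mode_hours = mode(tenths) / 10
q1_hours = quantile_at_np(tenths, 0.25) / 10
q3_hours = quantile_at_np(tenths, 0.75) / 10
quartile_deviation = (q3_hours - q1_hours) / 2

print(len(tenths), round(mean_hours, 2), median_hours, mode_hours)
print(q1_hours, q3_hours, round(quartile_deviation, 2))
```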

3 (a)

(Total 13 marks)

Least squares is used for the estimation of linear regression parameters.


Correlation coefficient is a measure of the strength of the linear relationship between two
variables.
i. Title, axis labels and accurate plot.

[Figure: Scatter plot of executive bonuses v. total sales; x-axis: Sales (millions of euros), y-axis: Bonus (thousands of euros).]

There is a strong, positive linear relationship between sales and bonuses.


ii. Calculation of least squares regression line: $y = a + bx$
$b = \dfrac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = 0.067$
$a = \bar{y} - b\bar{x} = 18.491$
Hence the estimated regression line is $y = 18.491 + 0.067x$
For every additional million euros in sales, bonuses increase by 67 euros.

iii. a. $y = 18.491 + 0.067(75) = 23.516$, i.e. 23,516 euros

b. $y = 18.491 + 0.067(1000) = 85.491$, i.e. 85,491 euros


The second forecast is not reliable due to considerable extrapolation.
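
The two forecasts follow directly from the fitted line. This is a minimal sketch using the quoted estimates (intercept 18.491, slope 0.067), with sales in millions of euros and bonus in thousands of euros as on the plot axes; the function name is illustrative.

```python
# Estimates quoted in the solution: bonus (thousand euros) on sales (million euros).
a = 18.491
b = 0.067


def predicted_bonus_euros(sales_millions):
    """Predicted bonus in euros for a given sales figure in millions of euros."""
    bonus_thousands = a + b * sales_millions
    return bonus_thousands * 1000


# Slope interpretation: one extra million euros of sales adds b thousand euros.
print(round(b * 1000))

for sales in (75, 1000):
    print(sales, round(predicted_bonus_euros(sales)))
```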

3 (b)

(Total 12 marks)

Difference between the means: $20.54 - 17.46 = 3.08$

Standard error formula: $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = 0.1088$.
Confidence interval formula:
$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$


Since n1 + n2 > 30, can use the standard normal approximation.


z value: 1.96

C.I.: (2.867, 3.293)

i. The computed 95% confidence interval does support the anecdotal evidence since 0 is
not included. This suggests that skilled operatives earn (significantly) more than unskilled
operatives.
ii. Since 3 euros is included in the confidence interval, then this is consistent with a claim
that skilled operatives earn this amount more than unskilled operatives.
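
The interval and both conclusions can be re-derived from the quoted figures. This sketch takes the sample means (20.54 and 17.46), the standard error 0.1088 and the z value 1.96 as given; the individual variances and sample sizes are not quoted here, so the standard error is not recomputed.

```python
# Figures quoted in the solution.
mean_skilled = 20.54
mean_unskilled = 17.46
standard_error = 0.1088
z_crit = 1.96

difference = mean_skilled - mean_unskilled
margin = z_crit * standard_error
ci_lower = difference - margin
ci_upper = difference + margin

print(f"95% CI for the difference: ({ci_lower:.3f}, {ci_upper:.3f})")

# Part i: zero outside the interval suggests a significant difference.
print("0 in interval:", ci_lower <= 0 <= ci_upper)
# Part ii: a claimed difference of 3 euros is consistent if it lies inside.
print("3 in interval:", ci_lower <= 3 <= ci_upper)
```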

4 (a)

(Total 10 marks)

i. $P(X > 75) = P\left(Z > \dfrac{75 - 71}{2.25}\right) = P(Z > 1.78) = 0.0375$

Expressed as a percentage = 3.75%.

ii. $P(63 < X < 75) = P\left(\dfrac{63 - 71}{2.25} < Z < \dfrac{75 - 71}{2.25}\right) = P(-3.56 < Z < 1.78) \approx 0.9625$

Expressed as a percentage = 96.25%.
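
Parts i and ii can be checked with Python's standard library. The sketch assumes X is normal with mean 71 and standard deviation 2.25, as the standardisation above implies. Because it uses exact z values rather than values rounded to two decimal places for table look-up, the results differ slightly from the tabulated answers in the fourth decimal place.

```python
from statistics import NormalDist

# Distribution implied by the standardisation: mean 71, standard deviation 2.25.
x_dist = NormalDist(mu=71, sigma=2.25)

# Part i: upper-tail probability beyond 75.
p_above_75 = 1 - x_dist.cdf(75)

# Part ii: probability of falling between 63 and 75.
p_between = x_dist.cdf(75) - x_dist.cdf(63)

print(f"P(X > 75) = {p_above_75:.4f}")
print(f"P(63 < X < 75) = {p_between:.4f}")
```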


iii. $P(Z > -2.33) = 0.99$. Hence $\dfrac{x - 71}{2.25} = -2.33$. Solve to give x = 65.76 kg.

iv. $\bar{X} \sim N\left(71, \dfrac{2.25^2}{36}\right)$. So,
$P(\bar{X} > 72) = P\left(Z > \dfrac{72 - 71}{2.25/6}\right) = P(Z > 2.67) = 0.00379$

4 (b)

(Total 15 marks)

i.
Interviewer-administered questionnaires face-to-face: Advantages: high response rate,
clarify meaning of terms, instant results. Disadvantages: interviewer effect, high cost.
Interviewer-administered telephone questionnaires: Advantages: Relatively cheaper,
can call-back, clarify meaning of terms, instant results. Disadvantages: interviewer
effect, biased against people without telephones.
Questionnaires delivered by post: Advantages: Cheap, no interviewer bias.
Disadvantages: high non-response rate, cannot probe respondents for more detailed
answers.
Questionnaires delivered by email: Advantages: Zero marginal cost, no interviewer
bias. Disadvantages: high non-response rate, cannot probe respondents for more
detailed answers, biased against those without email accounts.

ii. Some suggested responses (not exhaustive).


Survey of teachers: Sensitive topic (pay), hence use of an interviewer inappropriate.
Professional association would have email contact details, hence use option 4. Low
response rate unlikely to occur here, since teachers would be particularly interested in
pay-related surveys.
Survey of teenagers: Easiest way to contact students is face-to-face at school. Given
broad range of leisure activities, the interviewer could probe respondents for greater
depth. Telephone infeasible since access to a directory of mobile numbers covering
all networks unavailable.
Government survey: Government inquiry would have access to household addresses
(via the electoral register). Hence a postal questionnaire would be suitable. Some
incentive could be introduced to boost response rate.
