
Full-time MBA: Business Statistics (BST510)

2011-12 September
Exam Paper: Guideline answers

Dr Paul Bottomley (Room F03)


bottomleypa@cf.ac.uk
Q11: Mobile Phone Ownership
The table below shows the percentage of the population owning a
mobile phone for selected countries in 1998.
Turkey     Brazil   Chile     Uruguay      Hungary      Greece
2.5        2.6      2.7       4.5          7.0          8.8

Belgium    France   Germany   Netherlands  Spain        Malaysia
9.6        9.9      9.9       10.9         10.9         11.2

Greenland  Canada   Austria   Switzerland  New Zealand  United Kingdom
11.5       13.6     14.1      14.2         14.7         15.2

Portugal   USA      Italy     Iceland      Singapore    Australia
15.3       20.0     20.4      23.6         24.0         26.0

Denmark    Israel   Japan     Sweden       Norway       Finland
27.0       27.0     30.2      35.6         37.8         46.6

Source: International Telecommunications Union Yearbook (2000).


Q11a: Calculate median, lower and upper
quartiles for mobile phone ownership
(30%)
Position of Q1 = (n+1)/4 = (30 + 1)/4 = 7.75
Position of Q2 = 2(n+1)/4 = 2(30 + 1)/4 = 15.50
Position of Q3 = 3(n+1)/4 = 3(30 + 1)/4 = 23.25

Determine quartile values using the interpolation approach:
don't average or round to the nearest integer.

Value of Q1 = 9.6 + 0.75 x (9.9 - 9.6) = 9.83
Value of Q2 = 14.1 + 0.50 x (14.2 - 14.1) = 14.15
Value of Q3 = 24.0 + 0.25 x (26.0 - 24.0) = 24.50
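The interpolation steps above can be sketched in a few lines of Python. This is only an illustration of the (n+1)/4 position rule used in this answer; the function name `quartile` is a hypothetical helper, and statistical packages often use slightly different conventions.

```python
def quartile(sorted_values, k):
    """Return the k-th quartile (k = 1, 2, 3) using position k(n+1)/4."""
    n = len(sorted_values)
    position = k * (n + 1) / 4        # 1-based position, may be fractional
    lower_rank = int(position)        # rank of the observation just below
    fraction = position - lower_rank  # how far to move towards the next one
    low_value = sorted_values[lower_rank - 1]
    high_value = sorted_values[lower_rank]  # assumes lower_rank < n
    return low_value + fraction * (high_value - low_value)

# Mobile phone ownership (%) for the 30 countries in the table
ownership = sorted([
    2.5, 2.6, 2.7, 4.5, 7.0, 8.8, 9.6, 9.9, 9.9, 10.9,
    10.9, 11.2, 11.5, 13.6, 14.1, 14.2, 14.7, 15.2, 15.3, 20.0,
    20.4, 23.6, 24.0, 26.0, 27.0, 27.0, 30.2, 35.6, 37.8, 46.6,
])

q1, q2, q3 = (quartile(ownership, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

The unrounded Q1 is 9.825, which the answer reports as 9.83.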
Q11b: Build a Box-Plot
To build a Box-Plot requires information on the median, and the
quartiles (part a), plus the position of the fences.
Whiskers = max. and min. values within the fences - not outliers.

Upper fence: Q3 + (1.5 x IQR)
Lower fence: Q1 - (1.5 x IQR)

Inter-quartile range (IQR) = Q3 - Q1 = 24.50 - 9.83 = 14.67

Upper fence: 24.50 + (1.5 x 14.67) = 46.51
Lower fence: 9.83 - (1.5 x 14.67) = -12.18, set to 0

Round to zero. We cannot have negative ownership rates.
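As a rough check on the fences, the sketch below recomputes them from the rounded quartiles and then lists which observations fall outside. The variable names are illustrative, and clamping the lower fence at zero simply mirrors the rounding argument above.

```python
ownership = [
    2.5, 2.6, 2.7, 4.5, 7.0, 8.8, 9.6, 9.9, 9.9, 10.9,
    10.9, 11.2, 11.5, 13.6, 14.1, 14.2, 14.7, 15.2, 15.3, 20.0,
    20.4, 23.6, 24.0, 26.0, 27.0, 27.0, 30.2, 35.6, 37.8, 46.6,
]

q1, q3 = 9.83, 24.50                     # quartiles from part (a), rounded
iqr = q3 - q1                            # inter-quartile range
upper_fence = q3 + 1.5 * iqr
lower_fence = max(q1 - 1.5 * iqr, 0.0)   # ownership cannot be negative

# Whiskers end at the most extreme values that are still inside the fences
inside = [x for x in ownership if lower_fence <= x <= upper_fence]
outliers = [x for x in ownership if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(round(upper_fence, 2), lower_fence)
print(whisker_low, whisker_high, outliers)
```

Only Finland (46.6) lies beyond the upper fence, so the upper whisker stops at Norway (37.8).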


Q11b: Box-Plot, Mobile Phone Ownership
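[Figure: horizontal box-plot of mobile phone ownership on a 0 to 50% scale, with the lower fence (LF) and upper fence (UF) marked and one outlier (*) beyond the upper fence.]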

Remember:
Always draw in the position of the upper and lower fences.
Don't connect the whiskers to the fences!
Always draw BP to scale, though graph paper is not required,
but a ruler would definitely help!
Choose your scale carefully: use the width of the page.
Can be drawn horizontally or vertically: you decide.
Q11c: Describe the Main Features
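[Figure: box-plot of mobile phone ownership on a 0 to 50% scale, with the lower fence (LF) and upper fence (UF) marked and Finland shown as an outlier (*).]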
I want a discussion of central tendency, dispersion & shape.
Median suggests that within the typical country, 14.15% of
people own a mobile phone.
Excluding the outlier (Finland), ownership varies from 2.5% in
Turkey to 37.8% in Norway (range = 35.3%). Focusing on the middle
50% of observations, IQR suggests ownership varies by about 15% (10-25%).
Bowley's skew is moderately positive as the median is closer to Q1
(actually about 0.4). Pearson's skew is also positive given the relative
length of the whiskers. So, more smaller values and a few larger ...
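The Bowley figure can be checked directly from the quartiles. The sketch below assumes the usual quartile-based definition, (Q3 + Q1 - 2 x Q2) / (Q3 - Q1), which is not written out in these answers.

```python
# Quartiles from part (a), rounded to two decimal places
q1, q2, q3 = 9.83, 14.15, 24.50

# Bowley's coefficient compares the two halves of the box:
# positive when the median sits closer to Q1 than to Q3
bowley_skew = (q3 + q1 - 2 * q2) / (q3 - q1)

print(round(bowley_skew, 2))
```

The result is roughly 0.41, in line with the moderately positive value quoted above.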
Q11d: Measures of Central Tendency
What is the typical level of mobile phone ownership?
Justify your choice of mean (16.9%), median or mode. (20%)

Data is metric so lots of different measures to choose from.


Can be difficult to identify with non-Bell-shaped distributions.
From box-plot, we know the data is positively skewed (Bowley
and Pearson), and Finland is an outlier. This suggests median
(robust statistic) will be more representative than the mean.
From the table, we can deduce that mobile phone ownership is
multi-modal: 9.9%, 10.9% and 27.0% (two countries each) so
mode not very helpful.

Median is likely to be most representative.
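For comparison, all three measures can be computed with Python's standard `statistics` module. This is only a cross-check of the figures quoted in the answer; `multimode` needs Python 3.8 or later.

```python
from statistics import mean, median, multimode

ownership = [
    2.5, 2.6, 2.7, 4.5, 7.0, 8.8, 9.6, 9.9, 9.9, 10.9,
    10.9, 11.2, 11.5, 13.6, 14.1, 14.2, 14.7, 15.2, 15.3, 20.0,
    20.4, 23.6, 24.0, 26.0, 27.0, 27.0, 30.2, 35.6, 37.8, 46.6,
]

average = mean(ownership)      # pulled upwards by the long right tail
middle = median(ownership)     # robust to the Finland outlier
modes = multimode(ownership)   # every value tied for most frequent

print(round(average, 1), round(middle, 2), modes)
```

The mean sits above the median, as expected for positively skewed data, and the three tied modes show why the mode is not very helpful here.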


Q12: Sales and Advertising
The table contains data on sales and advertising (both measured in
10,000s) for seven brands of toothpaste. Identify the dependent
variable; then estimate a regression that relates advertising to sales. (55%)

Brand   Ad_Exp (X)   Sales (Y)   Ad*Sales   Ad_sq
A           30           20          600      900
B           10           35          350      100
C           50           30         1500     2500
D           60           60         3600     3600
E           72           90         6480     5184
F            4            8           32       16
G           90           72         6480     8100
Sum        316          315        19042    20400

Must identify dependent and independent variables; X causes Y.


But you don't need the sum of Sales (Y) squared: calculating it wastes time!
Be careful: sum sales = 315, sum ad_exp = 316. (poor question).
Toothpaste: Regression Cont.
The slope coefficient is:

$$ b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{7 \times 19042 - (316 \times 315)}{7 \times 20400 - (316 \times 316)} = \frac{133294 - 99540}{142800 - 99856} = \frac{33754}{42944} = 0.786 $$

The intercept is:

$$ a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{315}{7} - 0.79 \times \frac{316}{7} = 9.34 $$
Q12b: Regression Coefficients (25%)

The slope coefficient is:

$$ b = \frac{33754}{42944} = 0.786 $$

The intercept is:

$$ a = \frac{315}{7} - 0.79 \times \frac{316}{7} = 9.34 $$

[9.52 in Excel, which works to many decimal places]

Intercept shows the value of Y when X is zero.


If X is zero, Y will be 9.34 units (abstract)
If a brand spent zero on ad_exp, then expect sales of 93,400.
Sales are positive as they depend on many factors (not just ad_exp),
including distribution and competition.

Slope shows the impact of a 1 unit change in X on Y.


As X increases by 1 unit, Y increases by 0.79 units (abstract).
Ad_exp rises by 1 unit, sales revenue rises by 0.79 units (better)
Ad_exp increases by 10,000, sales increase by 7,900 (best).
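The two coefficients can be reproduced from the raw data with a short script. This is a minimal sketch of the hand calculation, not a replacement for a statistics package; it also shows why Excel reports an intercept of 9.52 when the slope is not rounded to 0.79 first.

```python
ad_exp = [30, 10, 50, 60, 72, 4, 90]   # X, in units of 10,000
sales = [20, 35, 30, 60, 90, 8, 72]    # Y, in units of 10,000

n = len(ad_exp)
sum_x = sum(ad_exp)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(ad_exp, sales))
sum_xx = sum(x * x for x in ad_exp)

# Least squares slope and intercept from the column totals
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept_full = sum_y / n - slope * sum_x / n    # slope at full precision
intercept_hand = sum_y / n - 0.79 * sum_x / n     # slope rounded to 0.79

print(round(slope, 3), round(intercept_full, 2), round(intercept_hand, 2))
```

The gap between 9.52 and 9.34 comes entirely from rounding the slope before computing the intercept.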
Q12c: Predicting Sales Revenue
Use this regression equation to predict: (i) sales for a brand that
spends 200,000 on advertising, (ii) the level of advertising that
would predict sales of 500,000. (20%)

(i) If X = 20, Y = 9.34 + (0.79 * 20) = 25.14

(ii) If Y = 50, must now solve the equation for X (Don't need to know!)
50 = 9.34 + (0.79 * X), so 40.66 = 0.79 * X and X = 40.66 / 0.79 = 51.47

Always think about the units of measurement: X & Y = 10,000.

If a brand spends 200K on advertising, we predict sales will be 251,400.
[Units = 10,000, move decimal point 4 places to the right].
Likewise, sales of 500K are predicted by advertising of about 514,700.

Confidence? Both values lie within the range of Xs (4 to 90) used to estimate the model.

Don't worry: there should be carryover to future time periods!
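Both predictions can be written as two small functions. The names `predict_sales` and `required_advertising` are hypothetical helpers, and the coefficients are the hand-rounded values used in this answer.

```python
INTERCEPT = 9.34   # a, in units of 10,000
SLOPE = 0.79       # b, change in sales per unit of advertising
UNIT = 10_000      # both variables are measured in 10,000s

def predict_sales(ad_exp):
    """Predicted sales (Y) for a given advertising spend (X)."""
    return INTERCEPT + SLOPE * ad_exp

def required_advertising(target_sales):
    """Solve Y = a + bX for X: the spend that predicts the target sales."""
    return (target_sales - INTERCEPT) / SLOPE

sales_at_20 = predict_sales(200_000 / UNIT)          # part (i)
ad_for_50 = required_advertising(500_000 / UNIT)     # part (ii)

print(round(sales_at_20 * UNIT), round(ad_for_50, 2))

# Both X values fall inside the observed range of advertising spend
observed_ad_exp = [30, 10, 50, 60, 72, 4, 90]
in_range = all(min(observed_ad_exp) <= x <= max(observed_ad_exp)
               for x in (200_000 / UNIT, ad_for_50))
```

Converting back to money values only at the end keeps the arithmetic in the same units as the regression.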
Q13a: Binomial Assumptions
What conditions must be satisfied for a binomial probability
distribution to be applicable in a given situation. Illustrate your
answer with reference to parts (b) and (c). (20%)

Discrete probability distribution: two possible outcomes.


Experiment consists of n independent trials.
With a constant probability of success.
Good answers discuss how reasonable assumptions are in
the context of scenarios (b) and (c).

Thinking about a natural upper limit helps with identification.


Part (c) is binomial, while part (b) is Poisson.
Q13b: Requesting a Quotation
A builder receives on average 8 requests per 5 day week from
people looking for estimates (quotes) for new building projects.
What is the chance of the builder receiving:
(i) Zero requests for estimates on any weekday?
(ii) More than one request on any weekday? (40%)

Poisson distribution problem: no obvious upper limit here.

8 requests per 5 days. Assuming events occur at a constant
rate (Assumption 1), we expect 1.6 (8 / 5) per day (mu).

BUT, what about pent-up weekend demand: are people more
likely to contact the builder on a Monday rather than a Friday?
Q13b: Requesting a Quotation
What is the chance that (i) there will be no requests on
any weekday, (ii) more than 1 request on any weekday.
$$ P(X = x) = \frac{\mu^{x} e^{-\mu}}{x!} $$

(i)
$$ P(X = 0) = \frac{1.6^{0}\, e^{-1.6}}{0!} = \frac{1 \times 0.2019}{1} = 0.2019 $$

(ii) P(More than 1) = 1 - P(X = 0 or X = 1).

$$ P(X = 1) = \frac{1.6^{1}\, e^{-1.6}}{1!} = \frac{1.6 \times 0.2019}{1} = 0.3230 $$

P(X > 1) = 1 - (0.2019 + 0.3230) = 0.4751
* There is about a 48% chance of more than 1 enquiry on any weekday.
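The same Poisson probabilities can be checked numerically. The sketch below assumes the constant daily rate of 1.6 requests discussed earlier; `poisson_pmf` is an illustrative helper built only on the standard library.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson variable with mean rate mu."""
    return mu ** x * exp(-mu) / factorial(x)

mu = 8 / 5                         # 8 requests per 5-day week = 1.6 per day

p_zero = poisson_pmf(0, mu)        # (i) no requests on a given weekday
p_one = poisson_pmf(1, mu)
p_more_than_one = 1 - (p_zero + p_one)   # (ii) complement of 0 or 1

print(round(p_zero, 4), round(p_one, 4), round(p_more_than_one, 4))
```

Using the complement avoids summing an open-ended tail, since a Poisson variable has no upper limit.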
Poisson: Graphical Illustration
[Figure: bar chart of P(X = x) for x = 0 to 8; vertical axis: Probability, from 0 to 0.35.]

Rate of occurrence (mu) = 1.6 requests per day.
But there is no upper limit: P(X greater than 8) is not shown.
Q13c: Successful Contracts (40%)
If the builder wins on average 25% of building projects he
provides estimates for, what is the probability that of the 6
jobs quoted for last week, 1 or 2 will result in new business?
Binomial: only two possible outcomes (win vs. lose).
Each estimate / quotation constitutes a separate trial.
Independent: winning Job A does not impact winning Job B.

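$$ P(X = x) = \frac{n!}{(n - x)!\,x!}\; p^{x} (1 - p)^{n - x} $$

$$ P(X = 1) = \frac{6!}{5!\,1!} \times 0.25^{1} \times 0.75^{5} = 6 \times 0.25 \times 0.2373 = 0.356 $$

$$ P(X = 2) = \frac{6!}{4!\,2!} \times 0.25^{2} \times 0.75^{4} = 15 \times 0.0625 \times 0.3164 = 0.297 $$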
There is a 65.3% (0.356+0.297) chance of winning 1 or 2 new
contracts when providing estimates for 6 jobs last week.
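A short script confirms the binomial arithmetic. It assumes, as the answer does, six independent quotations each with a 25% chance of success; `binomial_pmf` is an illustrative helper and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) successes in n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n_jobs = 6        # quotations provided last week
p_win = 0.25      # long-run share of quotations that are won

p_one = binomial_pmf(1, n_jobs, p_win)
p_two = binomial_pmf(2, n_jobs, p_win)
p_one_or_two = p_one + p_two   # mutually exclusive outcomes, so add

print(round(p_one, 3), round(p_two, 3), round(p_one_or_two, 3))
```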
Q14a. Mapping Onto the
Standard Normal Distribution (Z)
A local bakery knows that the weights of the bread loaves
it bakes are Normally distributed with a mean of 1000g
and a standard deviation of 12g. What proportion of loaves
will weigh: (50%)

a) More than 1015g?
b) Between 970g and 1020g?
c) Always draw a picture; a sketch will do. Marks will
be lost if you don't!

[Figure: Normal distribution curve with mean = 1000, SD = 12.]
(i) More than 1015 grammes?
[Figure: Normal distribution (mean = 1000, SD = 12) with area (a) above 1015, mapped onto the Standard Normal Distribution N(0,1) with area (a) above Z1.]

The Z value corresponding to X1 = 1015 is:

$$ Z_1 = \frac{X_1 - \mu}{\sigma} = \frac{1015 - 1000}{12} = 1.25 $$

P(X > 1015) = P(Z > +1.25). Not drawn to scale, but always shown!
Area (a) = 0.1057: the area that lies above Z = +1.25 SD.
10.6% of loaves weigh more than 1015g. [Always interpret the result:
1 sentence is OK. Avoids client confusion!]
(ii) Between 970 and 1020 grammes.
[Figure: Normal distribution (mean = 1000, SD = 12) with area (a) below 970 and area (b) above 1020, mapped onto the Standard Normal Distribution N(0,1) with area (a) below Z1 and area (b) above Z2.]

The Z values corresponding to X1 = 970 and X2 = 1020 are:

$$ Z_1 = \frac{X_1 - \mu}{\sigma} = \frac{970 - 1000}{12} = -2.50 \qquad Z_2 = \frac{X_2 - \mu}{\sigma} = \frac{1020 - 1000}{12} = 1.67 $$

P(970 < X < 1020) = P(-2.50 < Z < +1.67). Not drawn to scale!
Area (a) = 0.0062: the area that lies below Z = -2.5 SD (by symmetry, Z = +2.50).
Area (b) = 0.0475: the area that lies above 1.67 standard deviations (Z = +1.67).
1 - [0.0475 + 0.0062] = 0.9463. About 95% of loaves are within these limits.
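Both bread-weight proportions can be cross-checked without tables using `statistics.NormalDist` from the Python standard library (3.8+). This is a sketch for checking only; because it does not round Z to two decimal places, the last digits can differ slightly from the table-based figures.

```python
from statistics import NormalDist

loaves = NormalDist(mu=1000, sigma=12)   # weight of a loaf in grammes

# (i) proportion heavier than 1015g: upper tail beyond Z = 1.25
p_above_1015 = 1 - loaves.cdf(1015)

# (ii) proportion between 970g and 1020g: difference of two cumulative areas
p_between = loaves.cdf(1020) - loaves.cdf(970)

# The Z values used in the hand calculation
z1 = (970 - 1000) / 12
z2 = (1020 - 1000) / 12

print(round(p_above_1015, 3), round(p_between, 3), z1, round(z2, 2))
```

Either route gives about 10.6% of loaves above 1015g and about 95% between 970g and 1020g.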
Q14b: Design Agency
A design agency has developed new packaging for a brand of
chocolates. The purple and black chocolate box is evaluated
by 45 consumers and receives an average rating of 5.80 with a
standard deviation of 1.8 using 11-point rating scale (0 = very
poor, 10 = very good).
Write down a suitable null and alternative and significance level
to determine if the packaging scored above the mid-point of 5.

H0: pop. mean (μ) ≤ 5; H1: pop. mean (μ) > 5.

Test will be performed at the 5% significance level (α): the social
science industry standard.
Alpha (α) shows the chance of committing a Type 1 error. It
should be less than 5 in 100 (one-tailed test) if we reject H0.
NB: setting up a hypothesis test is theory driven (no sample info.)
Q14c: Chocolates Packaging (25%)
Use the appropriate table to find the statistical significance of
this sample mean and explain the logic behind your steps.
H0: pop. mean (μ) ≤ 5; H1: pop. mean (μ) > 5.

Significance level (α): 5% (1-tailed test).
Critical value = +1.64.

Central Limit Theorem states that the sampling distribution of
sample means (SDSM) will be Normally distributed if we take
a large enough sample (n > 30), regardless of the shape of
the distribution of this variable in the population.
Hypothesis Test: Chocolate Box
[Figure: sampling distribution of the sample mean under H0 (μ = 5, sample mean = 5.8), mapped onto Z(0,1) with the rejection region above +1.64.]

$$ Z = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{5.8 - 5}{1.8 / \sqrt{45}} = \frac{0.8}{0.268} = 2.98 $$

Z value (2.98) is greater than the critical value (+1.64), thus
reject H0. The sampling evidence suggests that the population
mean is greater than 5: the null is unlikely to be true. So, we
conclude the packaging is favorably received (pop mean > 5)
subject to the usual caveats of making a Type 1 error etc.
***Always draw a picture and identify the critical region(s)
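The test statistic and decision can be reproduced in a few lines. This is a sketch of the one-tailed Z test described above; the exact critical value from `NormalDist().inv_cdf(0.95)` is about 1.645, which the table-based answer reports as +1.64, and the p-value is an extra not required by the question.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 5.80
sample_sd = 1.8
n = 45
mu_null = 5          # mid-point of the 0 to 10 scale under H0
alpha = 0.05         # one-tailed significance level

standard_error = sample_sd / sqrt(n)
z = (sample_mean - mu_null) / standard_error

standard_normal = NormalDist()                    # N(0, 1)
critical_value = standard_normal.inv_cdf(1 - alpha)
p_value = 1 - standard_normal.cdf(z)              # upper-tail area beyond z

reject_h0 = z > critical_value
print(round(standard_error, 3), round(z, 2), round(critical_value, 3), reject_h0)
```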
Extra Questions to Think About
Q11: If a printing error revealed that the adoption of mobile
phones in Finland was not 46.6% but (i) 36.6% or (ii) 40.6%,
what changes (if any) would you make to the box-plot?

Q12: The correlation between sales and advertising is r =
+0.85. Could this impressive positive correlation have
occurred because of sampling error or chance? Write down
a suitable null and alternative, and test this hypothesis.

If you're unsure how to test the correlation, see Mick Silver,
pp.100-103, or tutorial 3 (box-office and cinema screens), or
the extra question in tutorial 5 (all answers on LC).
If you still have further questions:

Maths Support Unit:
Every day, 11.00 to 13.00
School of Mathematics, Room M0.37

Discussion Board on LC:
(don't email me directly, as many
people may have the same question).
Section A: Multiple Choice (40%)

1: C 2: D
3: B 4: A
5: B 6: C
7: A 8: D
9: B 10: B

Following typical textbook format, I will go through the


answers to the even numbered questions; you can work
out the logic behind the odd ones. Practice makes perfect!
Q1. A market research survey includes a number of
socio-demographic questions about each respondent.
Which statement is NOT correct? The respondent's

A) Marital status (married, divorced, separated, never
married) is a nominal scaled variable.

B) Level of education (no qualifications, O-levels,
A-levels, degree) is an ordinal scaled variable.

C) Household income (10-20K, 21-30K, 31-40K,
41-60K, 61-80K, 81-100K) is ordinal scaled:
it doesn't satisfy the equal-interval property needed to add/subtract.

D) Age (in years) is a metric (ratio) scaled variable.


Q2. Wages of factory workers (per hour) are found to be
symmetrical and bell-shaped. Assuming that the mean =
11 and standard deviation = 2.20, we would expect:

[Figure: bell curve centred on the mean of 11, with -2 SD at 6.60 and +2 SD at 15.40.]

(i) 95% of workers to earn between 8.80 and 13.20
(ii) 99% of workers to earn between 6.60 and 15.40

A) (i)=T (ii)=T, B) (i)=T (ii)=F, C) (i)=F (ii)=T, D) (i)=F (ii)=F

68% within 1 SD, 95% within 2 SD, 99% within 3 SD
Q3. Skewness (i) refers to the shape of the distribution
of a variable. (ii) When a variable is negatively skewed,
the median will be greater than the mean. (iii) The
distribution of people's income is usually a negatively
skewed variable.

A) Only (i) is true.
B) Only (i) and (ii) are true.
C) Only (i) and (iii) are true.
D) Only (ii) and (iii) are true.

[Figure: histogram of Frequency (0 to 15) against values 1 to 13.]

Income distribution is positively skewed, so mean >
median. Do we live in a world full of millionaires?
Q4. A designer develops 3 new logos for a brand. A
random sample of 500 loyal or occasional brand users
select the logo they prefer. How many loyal customers
would you expect to prefer the green logo if there was
no relationship between the two variables?

Customer   Blue   Red   Green
Loyal        60   120     120
Occ.         80    80      40

Expected Freq. = (row tot. x column tot.) / sample size
Expected Freq. = (300 x 160) / 500 = 96
A) 96, B) 120, C) 200, D) 375
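The expected-frequency rule extends to every cell of the table. The sketch below applies (row total x column total) / sample size to the observed counts; the dictionary layout is just one convenient way to hold the table.

```python
observed = {
    "Loyal": {"Blue": 60, "Red": 120, "Green": 120},
    "Occ.": {"Blue": 80, "Red": 80, "Green": 40},
}

sample_size = sum(sum(row.values()) for row in observed.values())
row_totals = {customer: sum(row.values()) for customer, row in observed.items()}

column_totals = {}
for row in observed.values():
    for logo, count in row.items():
        column_totals[logo] = column_totals.get(logo, 0) + count

# Expected count in each cell if customer type and logo were unrelated
expected = {
    customer: {
        logo: row_totals[customer] * column_totals[logo] / sample_size
        for logo in column_totals
    }
    for customer in observed
}

print(expected["Loyal"]["Green"])
```

The observed 120 loyal customers choosing green is well above the 96 expected under no relationship.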
Q5. The following diagram has four quadrants. What
is the contribution of the two pairs of data points
labelled P and Q to the covariance calculation?

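$$ \mathrm{Cov}(XY) = \frac{1}{n} \sum \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) $$

[Figure: four-quadrant scatter plot (X axis 200 to 1000, Y axis 0 to 200) with the means of X and Y marked and the two pairs of points labelled P and Q.]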

A) = P positive covariance and Q positive covariance


B) = P positive covariance and Q negative covariance
C) = P negative covariance and Q negative covariance
D) = P negative covariance and Q positive covariance
Q6. Which of the following statements about the
coefficient of determination (R²) is NOT true? The
coefficient of determination

A) Measures how well the least squares regression line fits the
data.

B) Is the square of the correlation coefficient when
there is only 1 independent variable.

C) Must be larger than 0.5 for a managerially useful
model. Market research study = 0.4/0.5 OK. Time-series
(Bank of England & UK economy) = 0.95+.

D) Is a value between 0 and 1 inclusive.


Q7. Which of the following pairs of events are
mutually exclusive?

Scenario                       Event A          Event B
1. Throw 1 coin                Getting a head   Getting a tail
2. Throw 2 coins               Getting a head   Getting a tail
3. Customers settling their    By cash          Every 3 months
   accounts due
4. Men's appearance            Over 1.8m tall   Grey hair

Mutually exclusive events don't have any elements in
common (Venn diagram: circles don't overlap).
A) = Scenario 1 only
B) = Scenarios 1 and 2
C) = Scenarios 1, 2, and 4
D) = Scenarios 1, 2, 3, and 4
Q8. On average, 60 telephone calls are received by a
firm's switchboard each hour during the morning. If we
want to find the probability that exactly 2 calls will be
received during any 1-minute period, using the Poisson
distribution, what is the corresponding value of mu?

A) mu = 60
B) mu = 30
C) mu = 2
D) mu = 1

Recall: Assumption #1: Events occur at a constant
rate throughout the interval. So with 60 calls per
60 minute period, there will be 1 call each minute.
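Rescaling the rate is the only step the question asks for, but carrying it through to the probability shows why it matters. The sketch below assumes the constant-rate assumption above holds; the probability itself is not part of the original question.

```python
from math import exp, factorial

calls_per_hour = 60
minutes_per_hour = 60

# Rescale the hourly rate to the 1-minute interval of interest
mu = calls_per_hour / minutes_per_hour

# Poisson probability of exactly 2 calls in that minute
x = 2
p_exactly_two = mu ** x * exp(-mu) / factorial(x)

print(mu, round(p_exactly_two, 4))
```

Using mu = 60 or mu = 2 in the formula would answer a different question, which is why option D is the right rate.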
Q9. What does the Central Limit Theorem indicate
will happen to the distribution of sample means drawn
from a non-Normally distributed population as the
sample size gets larger and larger? The distribution
of sample means will be:

A) Bell-shaped with more and more variability.


B) Bell-shaped with less and less variability.
C) Non-Normal with more and more variability.
D) Non-Normal with less and less variability.

Recall the Jelly-Mould slide and all those strange SDSM
distributions: uniform, bimodal & skewed populations?
Q10. Suppose a researcher wishes to determine
whether the average length of the working week for a
population of workers has changed from the previously
reported claim of 40.2 hours. Identify the appropriate
null and alternative hypothesis from these options.

A) H0: μ < 40.2; H1: μ ≥ 40.2
B) H0: μ = 40.2; H1: μ ≠ 40.2
C) H0: μ ≠ 40.2; H1: μ = 40.2
D) H0: μ [operator illegible] 40.2; H1: μ [operator illegible] 40.2

Recall, each pair of hypotheses must cover all three
states of the world (be exhaustive) and be mutually exclusive.
H1: Ask yourself: what do you want to prove?
