Subject CT6
Statistical Methods
CT6 Online Classroom Questions Handout
Decision theory
1
ActEd
For each of the zero-sum game payoff matrices below determine the minimax solution:
(i)
A
I
-1
0
1
2
II
5
6
[2]
(ii)
I
1
0
2
1
2
3
A
II
5
3
4
III
2
1
5
[2]
(iii)
A
B
1
2
I
1
3
II
2
-4
[4]
[Total 8]
        Q1    Q2    Q3
D1      23    34    16
D2      30    19    18
D3      23    27    20
D4      32    19    19
(i)
[2]
(ii)
[2]
(iii)
Given the following distribution P(Q1) = 0.25, P(Q2) = 0.15, P(Q3) = 0.60,
determine the Bayes criterion solution to the problem.
[2]
[Total 6]
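The Bayes criterion in part (iii) weights each column of the table by its prior probability. A minimal Python sketch of that arithmetic follows. The table as extracted does not say whether the entries are losses or profits, so the sketch reports the expected value of every decision and shows both possible choices; the names `payoffs` and `prior` are illustrative only.

```python
# Entries of the decision table: rows D1..D4, columns Q1..Q3.
payoffs = {
    "D1": [23, 34, 16],
    "D2": [30, 19, 18],
    "D3": [23, 27, 20],
    "D4": [32, 19, 19],
}
prior = [0.25, 0.15, 0.60]  # P(Q1), P(Q2), P(Q3) from part (iii)

# Expected value of each decision under the prior distribution.
expected = {
    d: sum(p * v for p, v in zip(prior, row)) for d, row in payoffs.items()
}

# Assumption: if the entries are losses, the Bayes solution minimises the
# expected value; if they are profits, it maximises it.
best_if_losses = min(expected, key=expected.get)
best_if_profits = max(expected, key=expected.get)

for decision, value in expected.items():
    print(decision, round(value, 2))
```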
ActEd
A statistician is trying to decide whether a coin is fair or biased towards tails. The
decision is to be made on whether a tail is obtained on one toss of the coin.
(i)
[1]
If the coin is biased, the probability of obtaining a tail is 0.75. The loss function is:
statistician
(ii)
(iii)
loss
biased
fair
nature
biased
fair
2
-1
1
0
(a)
(b)
(a)
(b)
[3]
Determine the solution using Bayes criterion, if P(fair coin) = 0.8.
[5]
[Total 9]
                                 Slots      Dice       Cards
                                 250,000    550,000    1,150,000
Expected revenue per customer    60         120        160
The casino operator is uncertain about the number of customers and decides to prepare a
profit forecast based on cautious, best estimate and optimistic numbers of customers.
The figures are 14,000, 20,000 and 23,000 respectively.
(i)
[2]
(ii)
[2]
(iii)
Determine the Bayes criterion solution based on the annual profit given the
probability distribution:
P(cautious) = 0.2, P(best estimate) = 0.7 and P(optimistic) = 0.1.
[2]
[Total 6]
Bayesian statistics
1
Determine the posterior distribution for λ given the observed values x1, ..., xn
of the number of claims in each of n months.
(ii)
(iii)
[2]
If n = 5 and
[3]
[4]
[Total 9]
The number of claims registered per week has a Poisson distribution for which the
mean, λ, is either 1 or 2. The prior distribution for λ is given by:
P(λ = 1) = 0.4
P(λ = 2) = 0.6
Given that three claims are registered in a particular week, calculate the Bayesian
estimate of λ under squared error loss, and under zero-one loss.
[4]
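With a two-point prior, the posterior is obtained by multiplying each prior probability by the Poisson likelihood of the observation and renormalising. The sketch below illustrates this; it relies on the standard results that the Bayesian estimate is the posterior mean under squared error loss and the posterior mode under zero-one loss. The helper `poisson_pmf` is an illustrative name.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

prior = {1: 0.4, 2: 0.6}  # P(lambda = 1), P(lambda = 2)
observed = 3              # claims registered in the week

# Prior times likelihood, then normalise so the weights sum to 1.
unnormalised = {lam: p * poisson_pmf(observed, lam) for lam, p in prior.items()}
total = sum(unnormalised.values())
posterior = {lam: w / total for lam, w in unnormalised.items()}

posterior_mean = sum(lam * p for lam, p in posterior.items())  # squared error loss
posterior_mode = max(posterior, key=posterior.get)             # zero-one loss

print(posterior, posterior_mean, posterior_mode)
```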
The number of claims arising each month from a general insurance portfolio has a
Poisson distribution, with unknown Poisson parameter λ. Claims are monitored over a
period of 50 months, and an average of 210 claims per month are observed.
(i)
(ii)
If 7 claims were recorded on the most recent day for which data are available,
determine the posterior distribution for λ, and hence find the Bayesian estimate
of λ under quadratic loss.
[6]
(iii)
Discuss briefly the differences between the estimators in (i) and (ii), indicating
which you think is preferable.
[2]
[Total 14]
Estimation
1
The most recent ten claims under a particular class of insurance policy were:
35 111 201 309 442 617 843 1,330 2,368 4,685
(i)
Assuming that the claims came from a lognormal distribution with parameters
μ and σ, derive the formula for the maximum likelihood estimates of these
parameters and estimate the parameters based on the observed data.
[5]
(ii)
Assuming that the claims come from a Pareto distribution with parameters α
and λ, use the method of moments to estimate these parameters.
[3]
(iii)
Assuming that the claims come from a Weibull distribution with parameters c
and γ, use the method of percentiles (based on the 25th and 75th percentiles) to
estimate these parameters.
[5]
[Total 13]
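For part (i), the lognormal maximum likelihood estimates have a closed form: the mean of the log-claims, and the square root of their average squared deviation with divisor n. A short sketch of the numerical step, assuming this standard result (the derivation itself is what the question asks for):

```python
import math

claims = [35, 111, 201, 309, 442, 617, 843, 1330, 2368, 4685]

logs = [math.log(x) for x in claims]
n = len(logs)

# MLE of mu: sample mean of the log-claims.
mu_hat = sum(logs) / n

# MLE of sigma^2: mean squared deviation with divisor n (not n - 1).
sigma2_hat = sum((y - mu_hat) ** 2 for y in logs) / n
sigma_hat = math.sqrt(sigma2_hat)

print(round(mu_hat, 3), round(sigma_hat, 3))
```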
ActEd
In a portfolio of motor policies, the annual number of claims for a single policy has a
Poisson distribution with parameter λ. The parameter λ is not the same for all policies
in the portfolio, but is modelled as a random variable with density:
G (n + 2) 1 5
P ( N = n) =
, n = 0,1, 2,3,
G (n + 1)G (2) 6 6
(ii)
[3]
Hence, obtain the mean and variance of the number of claims per annum.
[2]
[Total 5]
Reinsurance
1
ActEd
Claims from a certain portfolio have a Pareto distribution with parameters α = 3 and
λ = 500. A retention limit of 400 is in force, with the excess of this amount on any
claim being paid by a reinsurer.
(i)
[2]
(ii)
[4]
(iii)
What is the mean amount paid by the reinsurer on all claims in which it is
involved?
[2]
[Total 8]
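For part (iii), a sketch of the calculation is given below. It assumes the two-parameter Pareto form with survival function (λ / (λ + x))^α, taking α = 3 and λ = 500, for which the excess over a retention M, given that the claim exceeds M, is again Pareto with parameters α and λ + M. The midpoint-rule integration is only a sanity check on the closed form; all function names are illustrative.

```python
def pareto_survival(x, alpha, lam):
    """P(X > x) for the Pareto form with survival (lam / (lam + x)) ** alpha."""
    return (lam / (lam + x)) ** alpha

def mean_excess_closed_form(retention, alpha, lam):
    """E[X - M | X > M] = (lam + M) / (alpha - 1); requires alpha > 1."""
    return (lam + retention) / (alpha - 1)

def mean_excess_numeric(retention, alpha, lam, upper=1_000_000.0, steps=200_000):
    """Midpoint-rule integral of the survival function over (M, upper),
    divided by P(X > M). The tail beyond `upper` is ignored."""
    h = (upper - retention) / steps
    area = h * sum(
        pareto_survival(retention + (i + 0.5) * h, alpha, lam) for i in range(steps)
    )
    return area / pareto_survival(retention, alpha, lam)

alpha, lam, retention = 3, 500, 400
p_involved = pareto_survival(retention, alpha, lam)  # chance the reinsurer is involved
closed = mean_excess_closed_form(retention, alpha, lam)
numeric = mean_excess_numeric(retention, alpha, lam)

print(round(p_involved, 4), closed, round(numeric, 2))
```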
ActEd
(i)
An insurer effects excess of loss reinsurance with retention limit M. Obtain the
distribution of the claim amounts paid by the reinsurer, given that they are
referred to the reinsurer, if claim amounts have an exponential distribution with
parameter λ.
[2]
(ii)
(i)
The random variable X has the lognormal distribution with density function
f(x) and parameters μ and σ. Show that for any positive integer k:
$$\int_L^U x^k f(x)\,dx = e^{k\mu + \frac{1}{2}k^2\sigma^2}\,\left[\Phi(U_k) - \Phi(L_k)\right]$$
where $L_k = \frac{\ln L - \mu}{\sigma} - k\sigma$ and $U_k = \frac{\ln U - \mu}{\sigma} - k\sigma$.
[6]
(ii)
X 28
X > 28
(a)
(b)
Calculate the mean and variance of the claims paid by the insurer.
(c)
The loss severity distribution for a portfolio of household insurance policies is assumed
to be Pareto with parameters α = 3.5, λ = 1,000.
Next year, losses are expected to increase by 5%, and the insurer has decided to
introduce a policyholder excess of 100.
Calculate the probability that a loss next year is borne entirely by the policyholder.
[3]
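A sketch of the calculation, assuming the Pareto form with distribution function F(x) = 1 - (λ / (λ + x))^α. Under that form, multiplying every loss by 1.05 leaves α unchanged and scales λ to 1.05λ. A loss is borne entirely by the policyholder when it does not exceed the excess of 100.

```python
alpha, lam = 3.5, 1000.0
inflation = 1.05
excess = 100.0

# Scaling a Pareto(alpha, lam) loss by k gives a Pareto(alpha, k * lam) loss.
lam_next = lam * inflation

# P(inflated loss <= excess) under the scaled distribution.
prob_below_excess = 1 - (lam_next / (lam_next + excess)) ** alpha

# Cross-check: the same event expressed in terms of this year's loss.
prob_check = 1 - (lam / (lam + excess / inflation)) ** alpha

print(round(prob_below_excess, 4))
```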
A specialist motor insurer writes policies with individual excesses of 500 per claim.
The insurer has taken out a reinsurance policy whereby the insurer pays out a maximum
of 4,500 in respect of each individual claim, the rest being paid by the reinsurer. The
individual claims, gross of reinsurance and the excess, are believed to follow an
exponential distribution with parameter λ.
Over the last year, the insurer has gathered the following data:
There were 5 claims which were not processed because the loss was less than the
excess.
There were 11 claims where the insurer paid out 4,500 and the reinsurer the remainder.
There were 26 other claims in respect of which the insurer paid out a total of 76,457.
Derive the log-likelihood function of λ.
[6]
Credibility theory
1
Claim amounts under a particular insurance portfolio are believed to follow a Normal
distribution with variance σ₁² and an unknown mean θ. The insurer observes a sample
of n policies that have given rise to a claim for which the mean amount is a. The prior
distribution of θ is assumed to be Normal with mean μ and variance σ₂².
(i)
(ii)
(a)
(b)
(iii)
Z(x/n) + (1 - Z)μ
where μ is the mean of the beta prior distribution, and express Z as a function
of α, β and n.
[3]
A claims analyst estimates that the mean and standard deviation of the prior distribution
of θ are 0.20 and 0.25 respectively.
(ii)
[3]
From a random sample of 50 policies, a claim is made on 24% of them during the year.
(iii)
(a)
(b)
(c)
(1)
(2)
State the limiting value of Z as s and n increase and explain what this
means.
[4]
[Total 10]
EBCT
1
The total amount claimed for a particular risk in a portfolio is observed for each of 5
consecutive years. An insurer decides to use Empirical Bayes Credibility, Model 1,
where the credibility premium combines the mean for the particular risk with the
estimated value of E[m(θ)]. Data from 3 risks in this portfolio over 5 years are
available. Let X_ij be the claim for Risk i in year j. The table shows various
summary statistics for the observed data.

          X̄_i     Σ_{j=1}^{5} (X_ij - X̄_i)²
Risk 1    122     2,848
Risk 2    164     1,628
Risk 3    106     1,887
Calculate the estimated credibility factor, and calculate the credibility premium for
Risk 1.
[4]
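The calculation can be sketched as follows, assuming the usual EBCT Model 1 estimators: the overall mean for E[m(θ)], the average within-risk sample variance for E[s²(θ)], and the sample variance of the risk means less E[s²(θ)]/n for var[m(θ)]. Variable names are illustrative.

```python
risk_means = [122, 164, 106]    # X-bar_i for Risks 1 to 3
within_ss = [2848, 1628, 1887]  # sum over j of (X_ij - X-bar_i)^2
n_years, n_risks = 5, 3

# Estimate of E[m(theta)]: the overall mean.
overall_mean = sum(risk_means) / n_risks

# Estimate of E[s^2(theta)]: average of the within-risk sample variances.
e_s2 = sum(ss / (n_years - 1) for ss in within_ss) / n_risks

# Estimate of var[m(theta)]: variance of the risk means, less E[s^2]/n.
var_m = (
    sum((m - overall_mean) ** 2 for m in risk_means) / (n_risks - 1)
    - e_s2 / n_years
)

z = n_years / (n_years + e_s2 / var_m)  # credibility factor
premium_risk_1 = z * risk_means[0] + (1 - z) * overall_mean

print(round(z, 4), round(premium_risk_1, 2))
```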
ActEd
The table below shows the total aggregate claims, Y_ij, and the corresponding risk
volumes, P_ij, over the past six years for the three types of pet insurance offered by a
small company:

Risk            Σ_{j=1}^{6} Y_ij    Σ_{j=1}^{6} P_ij
Cat             54,278              362
Dog             68,861              485
Stick insect    1,689               78
(i)
Using EBCT Model 2, the credibility premium per unit of risk volume for the
coming year for cats is 148.357. Calculate the credibility premium per unit of
risk volume for the coming year for dogs and stick insects.
[7]
(ii)
[2]
[Total 9]
An insurance company has insured a fleet of cars for the last four years. For year j
(j = 1, ..., 4), let Y_j and P_j be the total amount claimed and the number of cars in the
fleet, respectively. Let X_j = Y_j / P_j be the average amount claimed per car in year j.
Assume that the distribution of X_j depends on a risk parameter θ and that the
conditions of Empirical Bayes Credibility Theory Model 2 are satisfied.
The company has insured ten similar fleets over the last four years. Using the data from
these years, E[m(θ)], E[s²(θ)] and V[m(θ)] are estimated to be 62.8, 106.32 and 5.8
respectively.
(a)
Calculate next year's credibility premium for a fleet of cars with claims over the
last four years given below, if the fleet will have 16 cars next year.

Year                    1        2        3        4
Total amount claimed    1,000    1,200    1,500    1,400
Number of cars          15       16       18       15

(b)
Explain how and why the credibility factor would be affected if the estimate of
V[m(θ)] increases, and comment on the effect on the credibility premium.
[5]
[Total 8]
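Part (a) can be sketched as below, assuming the standard EBCT Model 2 form in which the credibility factor is ΣP_j / (ΣP_j + E[s²(θ)] / V[m(θ)]) and the fleet's own experience is the volume-weighted mean claim per car. Variable names are illustrative.

```python
claims = [1000, 1200, 1500, 1400]   # Y_j, total amount claimed in year j
cars = [15, 16, 18, 15]             # P_j, number of cars in year j
e_m, e_s2, v_m = 62.8, 106.32, 5.8  # given estimates of E[m], E[s^2], V[m]
cars_next_year = 16

total_volume = sum(cars)
x_bar = sum(claims) / total_volume  # volume-weighted mean claim per car

z = total_volume / (total_volume + e_s2 / v_m)  # credibility factor
premium_per_car = z * x_bar + (1 - z) * e_m
premium_next_year = cars_next_year * premium_per_car

print(round(z, 4), round(premium_per_car, 2), round(premium_next_year, 2))
```

Raising `v_m` in this sketch shrinks the ratio `e_s2 / v_m`, which pushes `z` towards 1 and moves the premium towards the fleet's own experience; this is the mechanism part (b) asks about.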
For the past five years an insurance company has insured 15 different chains of
newsagents shops against damage to their premises and stock from any cause. For
chain i, i = 1, 2, ..., 15, and year j, j = 1, 2, ..., 5, the random variable Y_ij represents the
annual claims and P_ij represents the number of shops in the chain. The sequence
$\{\{(Y_{ij}; P_{ij})\}_{j=1}^{5}\}_{i=1}^{15}$
Model 2. The data for the first three chains in this collective are shown in the table
below. Also shown for each of the first two chains is the credibility premium per shop
for the coming year.
Y_ij ; P_ij
                                 Chain 1     Chain 2     Chain 3
j = 1                            450; 2      2500; 3     4950; 9
j = 2                            220; 2      1140; 4     39600; 9
j = 3                            3700; 2     3600; 4     14850; 11
j = 4                            250; 2      3900; 4     29700; 12
j = 5                            380; 2      860; 5      9900; 14
Credibility premium per shop     750         733
(a)
Calculate the credibility premium per shop for the coming year for Chain
number 3.
(b)
Explain carefully why the credibility premium per shop for the coming year is
higher for Chain 1 than for Chain 2 even though the average annual claim per
shop is lower for Chain 1 than for Chain 2.
[11]
[Total 17]
Risk models 1
1
The number of claims arising from a hurricane in a particular region has a Poisson
distribution with mean λ. The claim severity distribution has mean 0.5 and variance 1.
(i)
Determine the mean and variance of the total amount of claims arising from a
hurricane.
[2]
(ii)
The number of hurricanes in this region in one year has a Poisson distribution
with mean μ. Determine the mean and variance of the total amount claimed
from all the hurricanes in this region in one year.
[3]
[Total 5]
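Both parts rest on the compound Poisson moment formulae E[S] = rate × E[X] and var[S] = rate × E[X²]. The sketch below applies them twice: once within a hurricane, and once across hurricanes, treating the total from one hurricane as the "severity" in part (ii). The numerical values of λ and μ are hypothetical, chosen only to make the example runnable.

```python
def compound_poisson_moments(rate, sev_mean, sev_var):
    """Mean and variance of S = X_1 + ... + X_N where N ~ Poisson(rate)
    and the X_i are iid with the given mean and variance."""
    second_moment = sev_var + sev_mean ** 2
    return rate * sev_mean, rate * second_moment

lam = 10.0  # hypothetical mean number of claims per hurricane
mu = 3.0    # hypothetical mean number of hurricanes per year

# Part (i): total claims from a single hurricane.
mean_one, var_one = compound_poisson_moments(lam, 0.5, 1.0)

# Part (ii): annual total, with one hurricane's total as the severity.
mean_year, var_year = compound_poisson_moments(mu, mean_one, var_one)

print(mean_one, var_one, mean_year, var_year)
```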
A portfolio consists of two types of policies. For type 1, the number of claims in a year
has a Poisson distribution with mean 1.5 and the claim sizes are exponentially
distributed with mean 5. For type 2, the number of claims in a year has a Poisson
distribution with mean 2 and the claim sizes are exponentially distributed with mean 4.
Let S be the total amount claimed on the whole portfolio in one year. All policies are
assumed to be independent.
Derive the MGF of S and show that S has a compound Poisson distribution.
[4]
A bicycle wheel manufacturer claims that its products are virtually indestructible in
accidents and therefore offers a guarantee to purchasers of pairs of its wheels. There are
250 bicycles covered, each of which has a probability p of being involved in an
accident (independently) in a year. Despite the manufacturer's publicity, if a bicycle is
involved in an accident, there is in fact a probability of 0.1 for each wheel
(independently) that the wheel will need to be replaced at a cost of 100. Let S denote
the total cost of replacement wheels in a year.
(i)
$$M_S(t) = \left[\frac{p\,(e^{100t} + 9)^2}{100} + 1 - p\right]^{250}$$
[4]
(ii)
[6]
Suppose instead that the manufacturer models the cost of replacement wheels as a
random variable T based on a portfolio of 500 wheels, each of which (independently)
has a probability of 0.1p of requiring replacement.
(iii)
(iv)
Suppose p = 0.05 .
(a)
(b)
(c)
[2]
[5]
[Total 17]
Risk models 2
1
The total claim amount, S , on a portfolio of insurance policies has a compound Poisson
distribution with Poisson parameter 50. Individual loss amounts have an exponential
distribution with mean 75. However, the terms of the policies mean that the maximum
sum payable by the insurer in respect of a single claim is 100.
(i)
(ii)
(iii)
a normal distribution
a log-normal distribution
[7]
[3]
[3]
[Total 13]
The total claims arising from a certain portfolio of insurance policies over a given
month is represented by
$$S = \begin{cases} \sum_{i=1}^{N} X_i & \text{if } N > 0 \\ 0 & \text{if } N = 0 \end{cases}$$
[8]
A general insurance company has a portfolio of fire insurance policies, which offer
cover for just one fire each year.
Within the portfolio, there are three types of buildings for which the average cost of a
claim and probability of a claim are given in the table below.
Type of       Number of Risks    Average Cost of     Probability
building      Covered            a Claim (000s)      of a Claim
Small         147                12.4                0.031
Medium        218                27.8                0.028
Large         21                 130.3               0.017
It is assumed that the cost of a claim has an exponential distribution, and that all the
buildings in the portfolio represent independent risks for this insurance cover.
(i)
Show that the mean and standard deviation of annual aggregate claims from this
portfolio of insurance policies are 272,715 and 150,671, respectively and
obtain the cumulant generating function.
[4]
(ii)
(iii)
Market conditions dictate that the insurer can only charge a premium which
includes a loading of 25%. Calculate the amount of capital that the insurer must
allocate to this line of business in order to ensure that the probability that annual
aggregate claims exceed premium income and capital is 0.05 (again using a
normal approximation).
[2]
(iv)
Ruin theory
1
ActEd
S(t) is a compound Poisson process with Poisson parameter 40, and claim size
distribution which is logN(5, 4).
(i)
(ii)
The initial surplus is 400,000, and the rate of premium income is 41,000 per unit
time. Assuming that U(t) can be approximated by a normal distribution, find
the probability that U(10) < 0.
[3]
[Total 6]
[3]
Show that if the first claim occurs at time t, the probability that this claim causes
ruin is $e^{-U} e^{-(1+\alpha)\lambda t}$.
[3]
$$\frac{e^{-U}}{2 + \alpha}$$
(ii)
(iii)
Show that if the insurer wishes to set α such that the probability of ruin at the
first claim is less than 1% then it must choose $\alpha > 100e^{-U} - 2$.
[4]
[2]
[Total 9]
ActEd
The aggregate daily claims (000s) on a certain portfolio of policies occur according to
a Poisson process with parameter 30. Individual claim sizes (000s) have a gamma
distribution with mean 40 and variance 800. The insurer adds a premium loading factor
of 50%.
(i)
[4]
(ii)
Hence calculate the minimum amount of capital needed to ensure that the
probability of ultimate ruin is less than 0.1%.
[1]
(iii)
[1]
[Total 6]
Claims arrive as a Poisson process with rate λ, and the premium loading factor is 25%.
(a)
(b)
[3]
ActEd
Claims arrive as a Poisson process with rate λ. Individual claim sizes are exponentially
distributed with mean 100. The insurer uses a premium loading factor of 0.2.
A proportional reinsurance arrangement has been proposed, with a retained proportion
of α. The reinsurer uses a security loading of 0.4.
(i)
State the range of possible values of α such that the probability of ruin is less
than 1.
[2]
(ii)
(a)
$$\frac{2\alpha - 1}{100\alpha(7\alpha - 1)}$$
(b)
(c)
Determine the expected profit per unit time and the upper bound for the
probability of ultimate ruin for the value of α calculated in part (ii)(b).
[10]
(iii)
Compare the profit and the probability of ultimate ruin in part (ii)(c) with those
where no reinsurance is effected.
[2]
[Total 14]
ActEd
(i)
(ii)
Show that each of the following distributions is a member of the exponential
family:
(a)
Y_i ~ Poi(μ_i)
(b)
Y_i ~ Exp(λ_i)
(c)
Y_i ~ N(μ_i, σ²)
(d)
Y_i = Z_i / n where Z_i ~ Bin(n, μ_i)
[8]
For each distribution in part (i), use the properties of exponential families to
determine their mean, variance and variance function.
[8]
[Total 16]
(i)
The gamma distribution with mean μ and variance μ²/α has density function:
$$f(y) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-y\alpha/\mu} \qquad (y > 0)$$
(a)
(b)
Use the properties of exponential families to confirm that the mean and
variance of the distribution are μ and μ²/α.
[9]
The next six questions refer to the following generalised linear model:
The number of claims observed in the last year on 15 policies were as follows:
i
10 11 12 13 14 15
yi
ActEd
(i)
Show that the log-likelihood function for this model, based on observations
{y_i : i = 1, ..., 15}, is given by:
$$\ln L(\mu_i) = \sum_{i=1}^{15} \{ y_i \ln \mu_i - \mu_i + c \}$$
where c is a constant.
[1]
(ii)
Identify the canonical link function associated with the Poisson model from the
Tables.
[1]
ActEd
The insurance company proposes a simple model (Model A) with linear predictor:
η_i = α for i = 1, ..., 15
(i)
(ii)
Use the canonical link function to show that the log-likelihood in terms of α is
given by:
$$\ln L(\alpha) = \sum_{i=1}^{15} \left( \alpha y_i - e^{\alpha} + c \right)$$
[1]
(iii)
(iv)
Hence, show that the maximum likelihood estimate for α is
$$\hat{\alpha} = \ln\left(\frac{1}{15}\sum_{i=1}^{15} y_i\right) = -0.3102$$
[1]
[1]
[1]
[Total 4]
ActEd
The insurance company proposes a second model (Model B) with linear predictor:
$$\eta_i = \begin{cases} \alpha & \text{for } i = 1, \dots, 10 \\ \beta & \text{for } i = 11, \dots, 15 \end{cases}$$
(i)
[1]
(ii)
Write down the log-likelihood function for Model B and derive maximum
likelihood estimates for α and β.
[2]
(iii)
[1]
[Total 4]
ActEd
The insurance company proposes a third model (Model C) with linear predictor:
hi = a i for i = 1,...,15
(i)
[1]
(ii)
[2]
(iii)
[1]
[Total 4]
ActEd
(i)
(ii)
Write down the log-likelihood function of the saturated model for the data in the
previous question in terms of yi only.
[1]
(iii)
(a)
(b)
12.89
[1]
[7]
(b)
[2]
[Total 11]
ActEd
(i)
(ii)
(iii)
(b)
[2]
(b)
[2]
ActEd
η = α_i + β_j + γx
where i denotes region (i = 1, ..., 4), j denotes socio-economic group (j = 1, ..., 5) and
x denotes age. He has fitted a binomial model to a set of relevant data and has obtained
the following estimates of the parameters.
α_1 = -2.3975
α_2 = -2.3118
α_3 = -2.7375
α_4 = -2.6562
β_1 = 0
β_2 = 0.1242
β_3 = 0.3894
β_4 = 0.4665
β_5 = 0.6616
γ = 0.0012
(i)
Explain why the statistician has chosen a binomial model and write down the
canonical link function.
[2]
(ii)
Use the canonical link function to predict the probability of each of the
following lives developing the condition.
(iii)
(a)
(b)
[2]
[Total 8]
10
Model 1
log mij = m
266.35
Degrees of
Freedom
11
Model 2
log mij = a i
202.19
Model 3
log mij = a i + b j
10.68
Deviance
11
(a)
(b)
[7]
ActEd
A generalised linear model is being used to estimate the expected future lifetime of
individuals aged exactly 25. The following covariates are used:
(i)
A
Si
Oj
Write the parameterised form of the linear predictor for the following models:
(a)
cigarettes
(b)
(cigarettes)²
(c)
occupation
[3]
(ii)
Write the parameterised form of the linear predictor for the following models:
(a)
cigarettes + (cigarettes)²
(b)
cigarettes + alcohol
(c)
cigarettes + sex
(d)
alcohol + occupation
(e)
sex + occupation
(f)
cigarettes + sex + occupation
[6]
[Total 9]
12
ActEd
(i)
(ii)
[1]
Explain in words what the following represent and relate that to a reduction in
life expectancy:
(a)
(iii)
(b)
smoker * drinker
[2]
Using the covariates from 11, write the parameterised form of the linear
predictor for the following models:
(a)
(b)
(c)
(iv)
smoker drinker
[3]
Using the covariates from 11, write the parameterised form of the linear
predictor for the following models:
(a)
(b)
(c)
[3]
[Total 9]
13
(ii)
[3]
(iii)
A company is analysing its claims data on a portfolio of motor policies, and uses
a gamma distribution to model the claim severities. The company uses three
rating factors:
policyholder age (as a continuous variable)
policyholder gender
vehicle rating group (as a factor).
(a)
Write down the form of the linear predictor when all rating factors are
included as main effects.
(b)
ActEd
The figures below give the claim payments made during the calendar years 2007-2009
for a certain portfolio of general insurance policies:
Accident    Development year
year        0      1      2
2007        320    460    110
2008        350    410
2009        400

2007/08    5%
2008/09    4%
2009/10    3%
2010/11    4%
Use the inflation adjusted chain ladder method to estimate the total amount outstanding
for future claims arising from accident years 2008 and 2009.
[6]
The table below shows cumulative paid claims and premium income on a portfolio of
general insurance policies.
Underwriting    Development Year                Premium
year            0         1         2           income
2002            38,419    77,112    91,013      120,417
2003            31,490    78,504                117,101
2004            43,947                          135,490
(i)
Assuming an ultimate loss ratio of 93% for underwriting years 2003 and 2004,
calculate the Bornhuetter-Ferguson estimate of outstanding claims for this
triangle.
[8]
(ii)
[2]
[Total 10]
The delay triangles given below relate to a portfolio of motor insurance policies.
The cost of claims settled during each year is given in the table below:
(Figures in 000s)
Accident    Development Year
Year        0        1      2
2004        4,144    694    183
2005        4,767    832
2006        5,903

Accident    Development Year
Year        0      1     2
2004        581    75    28
2005        626    71
2006        674
Calculate the outstanding claims reserve for this portfolio using the average cost per
claim method with grossing-up factors, and state the assumptions underlying your
result.
[7]
1. cov(X, Y)
2. cov(X, c)
3. cov(2X, 3Y)
4. cov(2X + 1, 5 - 3Y)
5. cov(X, X)
6. cov(aX + b, cY + d)
7. cov(X, Y + Z)
8. corr(X, Y)
If {X_t} denotes a time series defined at integer times and {Z_t} is white noise with
variance σ² (and mean 0), calculate each of the following:
1. cov(Z_2, Z_3)
2. cov(Z_3, Z_3)
3. cov(X_2, Z_3)
   ie what is the connection between future white noise (Z_3) and the current
   observed time series (X_2)?
4. cov(Z_1 + Z_2, Z_1 + Z_3)
5. cov(0.5Z_1 + Z_2, 0.5Z_2 + Z_3)
6. var(Z_1 + Z_2)
A time series is strictly stationary if all the statistical properties are unchanged over
time.
For a weakly stationary time series, we require that E(X_t) and var(X_t) are constant,
and that the covariances are the same for the same lags:
$\operatorname{cov}(X_{t_i}, X_{t_j}) = \operatorname{cov}(X_{t_i + k}, X_{t_j + k})$
Why do we want a time series to be stationary?
Indeterminism
Invertibility
A time series is invertible if we can calculate the white noise terms (residuals) from
observed data values by inverting the formula for the process.
Examples:
$X_t = 0.8X_{t-1} + e_t$
$X_t = e_t + 0.6e_{t-1}$
Why do we want a time series to be invertible?
Markov property
A time series has the Markov property if we can predict the future development of the
time series from its present state alone. In the course this is formally expressed as the
process probabilities depending only on the most recent value:
$P(X_t \in A \mid X_{t_1} = x_1, \dots, X_{t_m} = x_m) = P(X_t \in A \mid X_{t_m} = x_m)$
Examples:
$X_t = 0.8X_{t-1} + e_t$
$X_t = 0.8X_{t-1} - 0.3X_{t-2} + e_t$
$X_t = e_t + 0.6e_{t-1}$
Characteristic polynomials
1.
Write the formula for the time series in terms of the backwards shift operator:
$\phi(B)X_t = \theta(B)e_t$
2.
3a.
3b.
Examples:
Check stationarity and invertibility of:
$12X_t = 10X_{t-1} - 2X_{t-2} + 12e_t - 11e_{t-1} + 2e_{t-2}$
$X_t = 0.8X_{t-1} + e_t$
$X_t = e_t + 0.6e_{t-1}$
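The check for the first example can be sketched numerically, assuming the usual criterion that a process is stationary when all roots of the autoregressive characteristic polynomial lie outside the unit circle, and invertible when the same holds for the moving average polynomial. The helper names below are illustrative, and only the quadratic case is handled.

```python
import cmath

def quadratic_roots(c0, c1, c2):
    """Roots of c0 + c1*z + c2*z**2 = 0, assuming c2 != 0."""
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [(-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)]

def all_outside_unit_circle(roots):
    """True when every root has modulus strictly greater than 1."""
    return all(abs(r) > 1 for r in roots)

# 12 X_t = 10 X_{t-1} - 2 X_{t-2} + 12 e_t - 11 e_{t-1} + 2 e_{t-2}
# AR polynomial: 12 - 10B + 2B^2, MA polynomial: 12 - 11B + 2B^2
ar_roots = quadratic_roots(12, -10, 2)
ma_roots = quadratic_roots(12, -11, 2)

stationary = all_outside_unit_circle(ar_roots)
invertible = all_outside_unit_circle(ma_roots)

print(ar_roots, ma_roots, stationary, invertible)
```

The two first-order examples need no solver: 1 - 0.8B has the single root 1.25, and 1 + 0.6B has the single root -1/0.6.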
Autocovariance Function
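$\gamma_0 = \operatorname{var}(X_t)$
$\gamma_k = \operatorname{cov}(X_t, X_{t+k})$
Note: if the process is not stationary, then the covariances $\gamma_k(t)$ would depend on t and k.
Autocorrelation Function (ACF)
$\rho_k = \operatorname{corr}(X_t, X_{t+k})$
$-1 \le \rho_k \le 1$
$\phi_1 = \rho_1, \quad \phi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$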
MA(q)
A moving average process of order q is a weighted average of the past q white noise
terms (plus a new white noise term):
$X_t = e_t + \beta_1 e_{t-1} + \dots + \beta_q e_{t-q}$ (zero mean)
$X_t = \mu + e_t + \beta_1 e_{t-1} + \dots + \beta_q e_{t-q}$ (mean μ)
Always stationary
ρ_k cuts off for k > q
φ_k decays as k → ∞
Not Markov
Do the test for invertibility
AR(p)
An autoregressive process of order p depends on the previous p values (plus just one
white noise term):
$X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + e_t$ (zero mean)
$X_t = \mu + \alpha_1(X_{t-1} - \mu) + \dots + \alpha_p(X_{t-p} - \mu) + e_t$ (mean μ)
Always invertible
ρ_k decays as k → ∞
[5]
(ii)
[1]
[Total 6]
(i)
(a)
(b)
(ii)
State the conditions on the values of the parameters such that the process
is invertible.
[5]
{e_t : t = 0, 1, ...}
(ii)
[5]
State, without performing additional calculations, what you would expect to find
if you were to calculate ρ_k and φ_k for larger values of k.
[2]
[Total 7]
coefficients ρ_1 and ρ_2.
[3]
(b)
State, with a reason, whether the process possesses the Markov property.
(c)
Show that $\gamma_0 = \frac{700}{169}\sigma^2$ and $\gamma_1 = \frac{600}{169}\sigma^2$, and find the value of $\gamma_2$.
[8]
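$X_t = \alpha^t X_0 + \sum_{j=0}^{t-1} \alpha^j e_{t-j}$
$E(X_t) = \alpha^t E(X_0)$
$\operatorname{var}(X_t) = \alpha^{2t} \operatorname{var}(X_0) + \frac{1 - \alpha^{2t}}{1 - \alpha^2}\,\sigma^2$
$X_t = \sum_{j=0}^{\infty} \alpha^j e_{t-j}$
$\frac{\sigma^2}{1 - \alpha^2}$.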
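ARMA(p, q)
$X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + e_t + \beta_1 e_{t-1} + \dots + \beta_q e_{t-q}$ (zero mean)
$X_t = \mu + \alpha_1(X_{t-1} - \mu) + \dots + \alpha_p(X_{t-p} - \mu) + e_t + \beta_1 e_{t-1} + \dots + \beta_q e_{t-q}$ (mean μ)
ρ_k decays as k → ∞
φ_k decays as k → ∞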
[6]
Difference operator
$Y_t = \nabla^d X_t$ is a stationary ARMA(p, q)
Example:
Classify the following processes:
$X_t - 0.6X_{t-1} - 0.3X_{t-2} - 0.1X_{t-3} = e_t - 0.25e_{t-1}$
$2X_t = 7X_{t-1} - 9X_{t-2} + 5X_{t-3} - X_{t-4} + e_t + e_{t-2}$
$6X_t = 17X_{t-1} - 17X_{t-2} + 7X_{t-3} - X_{t-4} + Z_t$
A time series that depends on more than one variable. This is as opposed to a univariate
time series, which is a time series with only one variable in it (white noise doesn't count
as a variable). For example AR(p), MA(q), ARMA(p, q) and ARIMA(p, d, q) are all
univariate. Multivariate time series can be written in vector (VARMA) form:
Example:
$X_t = 0.7X_{t-1} - 0.1Y_{t-1} + e_t$
$Y_t = 0.2X_{t-1} - 0.3Y_{t-1} + e_t$
Is this example time series:
a)
b)
c)
Cointegrated
This means that the individual processes are not stationary but when combined in some
way, they do produce a stationary series.
Two time series processes, X t and Yt are cointegrated if:
Bilinear models
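$X_t + \alpha(X_{t-1} - \mu) = \mu + e_t + \beta e_{t-1} + b(X_{t-1} - \mu)e_{t-1}$
Threshold AR models
$$X_t = \mu + \begin{cases} \alpha_1(X_{t-1} - \mu) + e_t & \text{if } X_{t-1} \le d \\ \alpha_2(X_{t-1} - \mu) + e_t & \text{if } X_{t-1} > d \end{cases}$$
$X_t = \mu + \alpha_t(X_{t-1} - \mu) + e_t$, $\alpha_t$ random
ARCH models
$$X_t = \mu + e_t \sqrt{\alpha_0 + \sum_{k=1}^{p} \alpha_k (X_{t-k} - \mu)^2}$$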
Note: ARCH has now been tested twice (Sept 2007 and Sept 2008).
linear trend
exponential trend
seasonal/periodic
take logs
method of moving
averages
seasonal differencing
method of seasonal
means
difference to minimise s d2
Identify
p, d and q
identify d
MA( q )
AR ( p )
ARMA( p, q )
(the sample
fk decays
fk cuts off (ie within 95% CI) when k > p ,
rk decays
if not MA( q ) or AR ( p ) then ARMA( p, q )
start with ARMA(1,1) and work up to more
complicated models if fit poor
AIC only add new parameters if relative
reduction in sum of squares of residuals
-2
Estimate the
parameters
method of moments
e n
solve rk = rk
minimise
method of maximum
likelihood
et2
shouldn't be a pattern
with t
±1.96/√n
should be no more than
5% outside this CI
portmanteau test
(Ljung-Box test):
ACF of the residuals is 0
Tables pg 42
one-tailed test
Tables pg 42
assume normal
distribution
two-tailed test (with
continuity correction)
[Figure: Ratio plotted against Year, t]
[Figure 1b: Autocorrelation, r_k, plotted against Lag, k]
(i)
Explain which feature of the Figures indicates that differencing is not required in
order to obtain a stationary series.
[1]
(ii)
On the basis of the sample ACF, r_k, the company's analyst decides to fit a first-order autoregressive model to the data. State, with reasons, whether you
consider this to be a reasonable decision and indicate what additional plot you
would require in order to make a firmer recommendation.
[3]
(iii)
The model is fitted and the residuals calculated. The sample ACF of the
residuals is shown in Figure 1c. State what conclusions you would draw from
the plot.
[1]
[Figure 1c: Autocorrelation, r_k, plotted against Lag, k]
[Total 5]
ACF     0.854    0.820    0.762
PACF    0.854    0.371    0.085
(ii)
[4]
[Total 6]
ActEd
The monthly sales (in kilolitres) of red wine by Australian winemakers from January
1980 through to October 1991 are shown in the graph below:
[Figure: Sales plotted against Month]
(i)
(a)
Give two features which indicate that the time series is non-stationary.
(b)
[3]
The data is logged and seasonally adjusted and the statistician tries to fit an
ARIMA( p, d , q ) model to the adjusted data:
(ii)
SACF               X_t       ∇X_t       ∇²X_t      ∇³X_t
sample variance    0.0791    0.0159     0.0471     0.1565
r1                 0.8805    -0.4774    -0.6672    -0.7453
r2                 0.8541    0.0433     0.1918     0.2998
r3                 0.8245    -0.0374    0.0010     -0.0115
r4                 0.8025    -0.0920    -0.1148    -0.1316
r5                 0.8110    0.1810     0.1967     0.1985
r6                 0.7815    -0.1284    -0.1409    -0.1456
Use the data in the table to choose the most appropriate value for d . State your
reasons clearly.
[2]
After being differenced an appropriate number of times, the statistician examines the
SACF and SPACF:
[Figures: SACF and SPACF plotted against Lag]
(iii)
(iv)
[2]
[2]
[Total 9]
$\hat{x}_n(1), \dots, \hat{x}_n(k-1)$
This is a weighted average of the past values but there is less emphasis on older values.
Rearrangements: $\hat{x}_n(1) = \alpha x_n + (1 - \alpha)\hat{x}_{n-1}(1)$ or $\hat{x}_n(1) = \hat{x}_{n-1}(1) + \alpha[x_n - \hat{x}_{n-1}(1)]$
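The second rearrangement (previous forecast plus a fraction of the latest forecast error) translates directly into a loop. In the sketch below the data are hypothetical, and starting the recursion from the first observation is an assumption, since the source does not say how the forecast is initialised.

```python
def exponential_smoothing(series, alpha, initial=None):
    """One-step-ahead forecasts using
    forecast_n = forecast_{n-1} + alpha * (x_n - forecast_{n-1})."""
    # Assumption: start from the first observation unless told otherwise.
    forecast = series[0] if initial is None else initial
    forecasts = []
    for x in series:
        forecast = forecast + alpha * (x - forecast)
        forecasts.append(forecast)
    return forecasts

data = [10.0, 12.0, 11.0, 13.0]  # hypothetical observations
smoothed = exponential_smoothing(data, alpha=0.5)
print(smoothed)
```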
A modeller has attempted to fit an ARMA(p, q) model to a set of data using the Box-Jenkins methodology. The plot of residuals based on this proposed fit is shown below.
[Figure: Residuals based on fitted model, plotted against Time]
(i)
(ii)
Under the assumptions of the model, the residuals should form a white noise
process.
(a)
(b)
(c)
[3]
(iii)
(b)
[6]
[Total 15]
1.
2.
$x = F^{-1}(u)$
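As an illustration of inverting the distribution function, the sketch below uses an exponential distribution, chosen here only because its inverse has a simple closed form; the rate and the uniform values are hypothetical.

```python
import math

def exponential_inverse_cdf(u, rate):
    """Solve u = F(x) = 1 - exp(-rate * x) for x, with 0 <= u < 1."""
    return -math.log(1 - u) / rate

def exponential_cdf(x, rate):
    """F(x) for the exponential distribution with the given rate."""
    return 1 - math.exp(-rate * x)

uniforms = [0.1, 0.5, 0.9]  # hypothetical values of u from U(0, 1)
samples = [exponential_inverse_cdf(u, rate=2.0) for u in uniforms]
print(samples)
```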
Inverse transform method (discrete random variable)
1.
2.
where the discrete random variable, X, can take only the values x_1, x_2, ..., x_N
(x_1 < x_2 < ... < x_N)
Box-Muller algorithm (page 39 Tables)
1.
2.
Return:
1.
2.
3.
$z_1 = v_1\sqrt{\frac{-2\ln S}{S}}$ and $z_2 = v_2\sqrt{\frac{-2\ln S}{S}}$
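The returned values involving S correspond to the polar method. The sketch below assumes the usual steps, which are missing from the text as extracted: draw v1 and v2 uniformly on (-1, 1), set S = v1² + v2², and reject the pair unless 0 < S < 1. The seed and sample size are arbitrary.

```python
import math
import random

def polar_normal_pair(rng):
    """Return two independent N(0, 1) variates using the polar method."""
    while True:
        v1 = 2 * rng.random() - 1  # uniform on (-1, 1)
        v2 = 2 * rng.random() - 1
        s = v1 * v1 + v2 * v2
        if 0 < s < 1:              # keep only points inside the unit circle
            factor = math.sqrt(-2 * math.log(s) / s)
            return v1 * factor, v2 * factor

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
draws = [z for _ in range(20_000) for z in polar_normal_pair(rng)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((z - sample_mean) ** 2 for z in draws) / (len(draws) - 1)
print(round(sample_mean, 3), round(sample_var, 3))
```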
Acceptance/rejection method
Scales up h(x), so that the area under Ch(x) includes all of the area under f(x):
$C = \max_{\text{all } x} \frac{f(x)}{h(x)}$
1.
2.
3.
4.
If $u_2 < g(x) = \frac{f(x)}{C\,h(x)}$ then return x, otherwise repeat.
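A minimal sketch of the acceptance/rejection loop is given below. The target density f(x) = 6x(1 - x) on (0, 1) and the uniform proposal h(x) are hypothetical choices made for illustration, giving C = 1.5; the earlier numbered steps are assumed to generate the candidate x from h and an independent uniform u2.

```python
import random

def f(x):
    """Hypothetical target density: 6x(1 - x) on (0, 1)."""
    return 6 * x * (1 - x)

def h(x):
    """Proposal density: U(0, 1)."""
    return 1.0

C = 1.5  # maximum of f(x) / h(x), attained at x = 0.5

def accept_reject(rng):
    """Repeat until a candidate from h is accepted."""
    while True:
        x = rng.random()   # candidate drawn from h
        u2 = rng.random()  # independent uniform used for the test
        if u2 < f(x) / (C * h(x)):
            return x

rng = random.Random(1)  # arbitrary seed
sample = [accept_reject(rng) for _ in range(20_000)]
sample_mean = sum(sample) / len(sample)
print(round(sample_mean, 3))
```

In the long run a proportion 1/C of candidates is accepted, so here about one third are rejected.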
ActEd
1< x < 3
(i)
Explain why the inverse transform method is not appropriate for generating
random variates from X .
[1]
(ii)
f ( x)
h( x)
, is
43
36
where h( x) is the
f ( x)
Ch ( x )
Use the random numbers 0.461, 0.966, 0.024 and 0.222 to generate
random variates from the U (1, 3) distribution.
(d)
Use the random numbers 0.458, 0.373, 0.711 and 0.606 together with the
probability function in part (ii)(b) to decide which of the values in part
(ii)(c) should be kept.
[5]
(iii)
Explain the proportion of the pseudo-random numbers used that will be rejected
in the long run.
[2]
(iv)
Errors
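Using $\hat{\theta} = \frac{1}{n}\sum x_i$ to estimate $\theta = E(X)$:
absolute error = $|\hat{\theta} - \theta|$
relative error = $\frac{|\hat{\theta} - \theta|}{|\theta|}$
Using $\frac{\hat{\theta} - \theta}{\hat{\tau}/\sqrt{n}} \sim N(0,1)$, the number of simulations required, n, to ensure:
$P(|\hat{\theta} - \theta| < \varepsilon) = 1 - \alpha$ is $n > \frac{z_{\alpha/2}^2\,\hat{\tau}^2}{\varepsilon^2}$
$P\left(\frac{|\hat{\theta} - \theta|}{|\theta|} < \varepsilon\right) = 1 - \alpha$ is $n > \frac{z_{\alpha/2}^2\,\hat{\tau}^2}{\varepsilon^2\,\hat{\theta}^2}$
where $\hat{\theta} = \bar{x}$, $\theta = E[X]$ and $\hat{\tau}^2 = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat{\theta})^2$.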
ActEd
Monte Carlo simulation is being used to model the life expectancy of a particular group
of annuitants. The standard deviation of the life expectancy has been estimated as 6.52
years and a previous estimate of the mean life-expectancy was 74.13 years. Calculate
how many simulations should be performed to ensure that, with 95% probability, the:
(i)
absolute error when estimating the mean life expectancy is less than 0.05
[2]
(ii)
relative error when estimating the mean life expectancy is less than 0.01%.
[2]
[Total 4]
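The two sample-size bounds can be evaluated as sketched below, assuming z = 1.96 for 95% probability and using the previous estimate of the mean in place of the unknown true mean for the relative error. The function name is illustrative.

```python
import math

def simulations_needed(sd, epsilon, z=1.96, mean=None):
    """Smallest integer n with n > (z * sd / epsilon)^2 for an absolute
    error bound, or that quantity divided by mean^2 for a relative bound."""
    bound = (z * sd / epsilon) ** 2
    if mean is not None:
        bound /= mean ** 2
    return math.floor(bound) + 1

# (i) absolute error below 0.05
n_absolute = simulations_needed(sd=6.52, epsilon=0.05)

# (ii) relative error below 0.01%, i.e. 0.0001
n_relative = simulations_needed(sd=6.52, epsilon=0.0001, mean=74.13)

print(n_absolute, n_relative)
```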