ACTL3003
Insurance Risk Models
The likelihood and log-likelihood of an i.i.d. sample x = (x_1, …, x_n) are:
L(θ; x) = ∏_{i=1}^n f(x_i; θ)
ℓ(θ; x) = log L(θ; x) = Σ_{i=1}^n ln f(x_i; θ)
The maximum likelihood estimator θ̂ = (θ̂_1, …, θ̂_m) is the set of parameter values that maximises the log-likelihood function. Equivalently, set the score vector, i.e. the vector of partial derivatives of the log-likelihood, to zero:
S(θ; x) = ∇_θ ℓ(θ; x) = (∂ℓ(θ; x)/∂θ_1, …, ∂ℓ(θ; x)/∂θ_m)^T = (0, …, 0)^T
The Hessian is the matrix of second derivatives of the log-likelihood:
H(θ; x) = ∂²ℓ(θ; x)/∂θ ∂θ^T =
[ ∂²ℓ/∂θ_1²    …  ∂²ℓ/∂θ_1∂θ_m ]
[      ⋮        ⋱        ⋮       ]
[ ∂²ℓ/∂θ_m∂θ_1 …  ∂²ℓ/∂θ_m²    ]
The observed information matrix is I = −H(θ̂; x).
The square roots of the diagonal elements of the inverse of this information matrix give the standard errors of the MLEs.
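As a sketch of how standard errors fall out of the information matrix, the snippet below fits an Exponential(rate) model by MLE on a small hypothetical sample (the data values are assumptions, not from the notes) and differentiates the log-likelihood numerically:

```python
import numpy as np

# Hypothetical i.i.d. losses; for an Exponential(rate) model the MLE
# solves the score equation in closed form: rate_hat = 1/mean(x).
x = np.array([0.5, 1.2, 0.3, 2.1, 0.8, 1.7, 0.9, 1.1])
n = len(x)

def loglik(rate):
    # log-likelihood of i.i.d. Exponential(rate) data
    return n * np.log(rate) - rate * x.sum()

rate_hat = 1.0 / x.mean()

# Observed information I = -l''(rate_hat), via a central finite difference
h = 1e-5
hess = (loglik(rate_hat + h) - 2 * loglik(rate_hat) + loglik(rate_hat - h)) / h**2
info = -hess
se = np.sqrt(1.0 / info)   # sqrt of (inverse information) gives the standard error
```

For this model the analytic answer is se = rate_hat/√n, which the numerical Hessian reproduces.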
Zhi Ying Feng
For losses x_i observed in excess of a deductible M, the likelihood function for the loss data is given by the conditional density of the ground-up loss given that it exceeds M:
L(θ) = ∏_{i=1}^n f(x_i + M; θ) / [1 − F(M; θ)]
with any censored observations contributing [1 − F(x_i + M; θ)] / [1 − F(M; θ)] instead.
P-P Plot
To construct a P-P plot:
Calculate the theoretical CDF at each of the ranked observed data points, F(x_(i); θ̂) ∈ (0, 1)
For i = 1, 2, …, n, plot the empirical probabilities (i − 0.5)/n on the x-axis against F(x_(i); θ̂) on the y-axis
If the empirical cumulative probabilities match the theoretical values, then the plot should lie approximately on the line y = x
Assumption: there is no censoring or truncation in the data
A P-P plot scales everything onto a 0–1 scale and distributes the points uniformly over that range, allowing one to focus on the fit where most of the probability mass lies
Q-Q plot
A Q-Q plot plots the theoretical quantiles against the empirical quantiles of the ranked observations.
To construct a Q-Q plot:
Rank the observed data into order statistics, i.e. from smallest to largest: x_(1) ≤ ⋯ ≤ x_(n)
For i = 1, 2, …, n, calculate the theoretical quantiles F^(−1)((i − 0.5)/n; θ̂)
For i = 1, 2, …, n, plot the points x_(i) on the x-axis against F^(−1)((i − 0.5)/n; θ̂) on the y-axis
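The P-P and Q-Q coordinates described above can be computed directly; the sketch below assumes a fitted Exponential model and a small hypothetical sample (the actual plotting is omitted):

```python
import numpy as np

# Hypothetical losses; the Exponential(rate_hat) fit is an assumed example.
x = np.sort(np.array([0.1, 0.4, 0.5, 0.9, 1.3, 1.8, 2.6, 4.0]))
n = len(x)
rate_hat = 1.0 / x.mean()                 # fitted parameter

emp = (np.arange(1, n + 1) - 0.5) / n     # (i - 0.5)/n, x-axis of the P-P plot
pp_y = 1.0 - np.exp(-rate_hat * x)        # F(x_(i); theta_hat), y-axis of the P-P plot
qq_y = -np.log(1.0 - emp) / rate_hat      # F^{-1}((i - 0.5)/n), y-axis of the Q-Q plot
```

A good fit would put (emp, pp_y) and (x, qq_y) close to the line y = x.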
Kolmogorov-Smirnov Test
KS = max_x | F_n(x) − F(x; θ̂) |
Where:
F_n(x) is the empirical CDF of the data
F(x; θ̂) is the fitted theoretical CDF
Anderson-Darling Test
For n observations, the Anderson-Darling test statistic is:
AD = n ∫ [F_n(x) − F(x; θ̂)]² / { F(x; θ̂)[1 − F(x; θ̂)] } · f(x; θ̂) dx
Similar to the KS test, the theoretical distribution must be continuous and it does not work for
grouped data. The critical values for the AD test depend on the specific distribution that is tested.
Both the KS and AD tests look at the difference between the empirical and assumed distributions, the KS in absolute value and the AD in squared differences. However, the AD test is a weighted average that places more emphasis on goodness-of-fit in the tails than in the middle. For both tests, generally the LOWER the test statistic, the better the model. However, neither the KS nor the AD test accounts for the number of parameters in the model, so more complex models will often fare better.
Chi-Squared Goodness-of-fit Test
In the Chi-squared test, first break down the whole range of observed values into k subintervals:
0 = c_0 < c_1 < ⋯ < c_k = ∞
The expected number of observations in the interval (c_{j−1}, c_j], assuming that the model in the null hypothesis is true, is given by:
E_j = n p_j
where
p_j = F(c_j; θ̂) − F(c_{j−1}; θ̂)
The observed number in the same interval is:
O_j = n p̂_j
where
p̂_j = F_n(c_j) − F_n(c_{j−1})
The Chi-squared test statistic, which adjusts the d.o.f. for the number of estimated parameters m, is given by:
χ² = Σ_{j=1}^k (O_j − E_j)²/E_j ≈ χ²(k − 1 − m)
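A minimal numerical sketch of the statistic above, assuming a fitted Exponential(rate) model and hypothetical interval boundaries and observed counts:

```python
import numpy as np

# Hypothetical fitted rate, cut points and observed counts (n = 100)
rate = 0.5
c = np.array([0.0, 1.0, 2.0, 4.0, np.inf])   # 0 = c0 < c1 < ... < ck
observed = np.array([42, 21, 24, 13])        # O_j
n = observed.sum()

F = lambda x: 1.0 - np.exp(-rate * x)        # Exponential CDF; F(inf) = 1
p = F(c[1:]) - F(c[:-1])                     # p_j = F(c_j) - F(c_{j-1})
expected = n * p                             # E_j = n p_j
chi2_stat = np.sum((observed - expected) ** 2 / expected)

# d.o.f. = k - 1 - (estimated parameters) = 4 - 1 - 1 = 2;
# the 5% critical value of chi-squared with 2 d.o.f. is 5.991
reject = chi2_stat > 5.991
```

Here the statistic is small, so the Exponential fit would not be rejected at the 5% level.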
A response Y belongs to the exponential dispersion family if its density can be written as:
f_Y(y) = exp{ [yθ − b(θ)]/φ + c(y; φ) }
Where:
θ is the location, or natural, parameter and is a function of the mean μ
φ is the scale (dispersion) parameter and b(·) is the cumulant function
If Y_1, Y_2, …, Y_n are i.i.d. with exponential family distributions, then the sample mean Ȳ is also exponential dispersion with the same location parameter θ, but scale parameter φ/n.
The mean and variance follow from derivatives of the cumulant function:
E[Y] = (d/dt) E[e^{tY}] |_{t=0} = b′(θ)
Var(Y) = (d²/dt²) ln E[e^{tY}] |_{t=0} = φ b″(θ)
The variance can involve both θ and φ, depending on the function b. Thus it can be related to the mean for certain distributions. To emphasise this, the variance can be written as:
Var(Y) = φ V(μ)
where V(·) is the variance function.
Natural Parameter, Scale Parameter and Variance Function
For the common exponential family members:

Distribution                θ                  Scale φ    Variance function
Normal(μ, σ²)               μ                  σ²         V(μ) = 1
Gamma(α, β)                 −1/μ               1/α        V(μ) = μ²
Inverse Gaussian(μ, σ²)     −1/(2μ²)           σ²         V(μ) = μ³
Poisson(μ)                  ln μ               1          V(μ) = μ
Binomial(n, p), μ = np      ln[p/(1 − p)]      1          V(μ) = μ(1 − μ/n)
Negative Binomial(r, p)     ln(1 − p)          1          V(μ) = μ(1 + μ/r)
Example: Gamma(α, β)
The Gamma(α, β) density is:
f(y) = β^α y^{α−1} e^{−βy} / Γ(α) = exp{ α ln β − ln Γ(α) + (α − 1) ln y − βy }
Rewriting in the form
f(y) = exp{ [yθ − b(θ)]/φ + c(y; φ) }
Where:
θ = −β/α = −1/μ and φ = 1/α
b(θ) = −ln(−θ) and c(y; φ) = (1/φ − 1) ln y + (1/φ) ln(1/φ) − ln Γ(1/φ)
Then:
E[Y] = b′(θ) = −1/θ = α/β = μ
Var(Y) = φ b″(θ) = (1/α)(1/θ²) = μ²/α = α/β²
V(μ) = b″(θ) = μ²
Components of a GLM
A GLM has three main components:
A linear predictor that is a function of the explanatory variables (systematic/location)
An exponential family distribution for the responses/errors (stochastic/spread)
A link function that connects the mean response to the linear predictor
Linear Predictor
The linear predictor is a function of the covariates that is linear in the parameters:
η_i = x_i^T β = β_1 x_{i1} + ⋯ + β_p x_{ip}
Where:
x_i are the covariates
β = (β_1, …, β_p)^T are the parameters to be estimated
Link Function
If we have estimated the parameters, e.g. using MLE, and we have the values for the covariates,
then we can calculate the value of the linear predictor. Then to obtain the expected response value
μ_i, we substitute the linear predictor into the inverse of a link function g:
η_i = g(μ_i) ⟺ μ_i = g^(−1)(η_i)
If the response has an exponential family distribution, then given b and φ, all that is needed to define this exponential family distribution is θ. This can be computed from the linear predictor using the link function g(·) and the fact that μ is a function of θ:
μ_i = b′(θ_i) = g^(−1)(η_i) = g^(−1)(x_i^T β)
If the link function is chosen such that θ_i = η_i = x_i^T β, it is known as the canonical link function, or natural link function. To find the canonical link function, we can use μ = b′(θ) to derive θ as a function of μ, and equate this to g(μ).
Some common canonical link functions are:
Distribution    Canonical link            Name
Normal          g(μ) = μ                  Identity (additive model)
Poisson         g(μ) = ln μ               Log (multiplicative model)
Binomial        g(μ) = ln[μ/(n − μ)]      Logit
Gamma           g(μ) = 1/μ                Reciprocal
Maximum Likelihood Estimation
The likelihood of independent responses y_1, …, y_n is:
L(β, φ) = ∏_{i=1}^n f(y_i) = ∏_{i=1}^n exp{ [y_i θ_i − b(θ_i)]/φ + c(y_i; φ) }
ℓ(β, φ) = Σ_{i=1}^n [y_i θ_i − b(θ_i)]/φ + Σ_{i=1}^n c(y_i; φ)
Since we want to estimate the βs, we need to get η_i = x_i^T β involved by using the link function:
μ_i = g^(−1)(η_i) = g^(−1)(x_i^T β) and θ_i = (b′)^(−1)(μ_i) = (b′)^(−1)(g^(−1)(x_i^T β))
so by the chain rule the score for each β_j is:
∂ℓ/∂β_j = Σ_{i=1}^n (y_i − μ_i) x_{ij} / [φ V(μ_i) g′(μ_i)]
The canonical link function allows us to write the log-likelihood function directly in terms of the parameters we want to estimate, since θ_i = x_i^T β:
ℓ(β, φ) = Σ_{i=1}^n [y_i x_i^T β − b(x_i^T β)]/φ + Σ_{i=1}^n c(y_i; φ)
Then we can differentiate ℓ with respect to β (and φ), and set the derivatives to zero to find the MLEs.
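In practice the score equations are solved iteratively. A minimal sketch using iteratively reweighted least squares (IRLS — the standard numerical approach, though not derived in these notes) for a Poisson GLM with canonical log link on hypothetical data:

```python
import numpy as np

# Hypothetical design matrix (intercept + one covariate) and counts
X = np.column_stack([np.ones(6), np.array([0., 1., 2., 3., 4., 5.])])
y = np.array([1., 2., 2., 4., 7., 11.])

beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta                 # linear predictor eta_i = x_i^T beta
    mu = np.exp(eta)               # mu_i = g^{-1}(eta_i) for the log link
    W = mu                         # IRLS weights for the canonical link
    z = eta + (y - mu) / mu        # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

mu_hat = np.exp(X @ beta)
# At the MLE the score vanishes: sum_i (y_i - mu_i) x_ij = 0 for each j
score = X.T @ (y - mu_hat)
```

Each IRLS step is a weighted least-squares fit of the working response z; at convergence the canonical-link score equations hold.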
Goodness of Fit
The saturated model is a model where there are as many parameters as observations, so that the fitted values are equal to the observed values, i.e. all variation is explained by the covariates:
ŷ_i = μ̂_i = y_i for all i
Its log-likelihood is:
ℓ(y; y) = Σ_{i=1}^n [y_i θ̃_i − b(θ̃_i)]/φ + Σ_{i=1}^n c(y_i; φ)
Where:
θ̃_i = (b′)^(−1)(y_i) is the natural parameter that sets the fitted mean equal to y_i
Scaled Deviance
One way of assessing the goodness of fit of a given GLM is to use the likelihood ratio criterion. This gives the difference in the log-likelihood between the model estimated using MLE and the saturated model, which fits the data perfectly:
−2 ln[ L(y; μ̂) / L(y; y) ] = 2[ℓ(y; y) − ℓ(y; μ̂)] = (2/φ) Σ_{i=1}^n { y_i(θ̃_i − θ̂_i) − [b(θ̃_i) − b(θ̂_i)] } = D(y, μ̂)/φ ≈ χ²(n − p)
Where:
D(y, μ̂)/φ is the scaled deviance, and D(y, μ̂) is known as the deviance, which is similar to the RSS for ordinary linear models
p is the number of estimated parameters
The difference in scaled deviance can be used to compare nested models, i.e. models that are
subsets of each other with the same distribution and link function, but more parameters in the linear
predictor. Consider two estimated models with scaled deviance below:
                      Model 1    Model 2
Scaled deviance       D₁*        D₂*
No. of parameters     q          p > q

To decide whether Model 2 is a significant improvement (decrease in scaled deviance) over the more parsimonious Model 1, we compare the difference in scaled deviance:
D₁* − D₂* ~ χ²(p − q)
As a rule of thumb, the 5% critical value for a chi-squared distribution with v d.o.f. is approximately 2v. Thus, Model 2 is better than Model 1 if D₁* − D₂* > 2(p − q).
The deviance residual for observation i is:
r_i^D = sign(y_i − μ̂_i) √d_i
Where:
d_i is the contribution of observation i to the deviance, so Σ_i d_i = D(y, μ̂)
The sign(·) function returns 1 for positive arguments and −1 for negative arguments
If the model is correct and the sample size n is large, then the scaled deviance is approximately χ²(n − p), so the expected value of the deviance is approximately n − p, and each observation should contribute approximately (n − p)/n ≈ 1 to the deviance. Thus, if d_i is significantly greater than 1, this indicates a departure from the model assumptions for that observation
Typically, deviance residuals are plotted against fitted values to examine whether the variance function is correct, i.e. the residuals should have zero mean and constant variance
The Pearson residual is:
r_i^P = (y_i − μ̂_i) / √V(μ̂_i)
If the model is correct, the Pearson residuals are approximately normal, so check with a Q-Q plot. However, Pearson residuals are often skewed for non-normal data, while deviance residuals are more likely to be symmetrically distributed. Thus, deviance residuals are preferred.
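Both residual types are sketched below for a Poisson fit; the observed counts and fitted means are hypothetical values, not taken from the notes:

```python
import numpy as np

y = np.array([2., 0., 5., 3., 1., 4.])
mu = np.array([2.5, 0.8, 4.1, 2.9, 1.4, 3.3])

# Poisson unit deviance d_i = 2[y ln(y/mu) - (y - mu)], with y ln y = 0 at y = 0
with np.errstate(divide="ignore", invalid="ignore"):
    ylog = np.where(y > 0, y * np.log(y / mu), 0.0)
d = 2.0 * (ylog - (y - mu))
dev_resid = np.sign(y - mu) * np.sqrt(d)

pearson_resid = (y - mu) / np.sqrt(mu)   # V(mu) = mu for the Poisson
```

The unit deviances d are always non-negative, and the deviance residual carries the sign of y − μ̂.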
Overdispersion
Overdispersion is the presence of greater variance in a data set than what would be expected based
on a fitted model. While it is usually possible to choose the model parameters such that the model's
theoretical mean is approximately equal to the sample mean of the data, the observed variance may
often be higher than the theoretical variance. Overdispersion tends to occur when one or more of the
relevant covariates have not been measured.
Overdispersion is often encountered with very simple parametric models, such as Poisson. When
using a Poisson regression model for count data, we assume that the mean is equal to the variance
but this may not reflect our data. However, the Poisson distribution only has one free parameter, since φ = 1, so it does not allow the variance to be adjusted independently of the mean.
Estimating φ
In the absence of overdispersion, the Poisson dispersion parameter φ should be close to 1. One way to estimate φ is to use the Pearson χ² statistic, which is the sum of the squared Pearson residuals:
χ² = Σ_{i=1}^n (y_i − μ̂_i)²/Var(Y_i) = (1/φ) Σ_{i=1}^n (y_i − μ̂_i)²/V(μ̂_i) ≈ χ²(n − p)
so its expected value is approximately n − p. Taking the expectation of both sides gives an approximately unbiased estimate of φ:
φ̂ = [1/(n − p)] Σ_{i=1}^n (y_i − μ̂_i)²/V(μ̂_i)
This method is preferred for the gamma case, as well as for non-aggregate data. Otherwise, an
alternative estimation is to use the fact that the scaled deviance is also approximately 2 distributed
with n p degrees of freedom, so its expected value is also n p . Then the estimate of is:
φ̂ = D(y, μ̂)/(n − p)
If the estimated value of φ is significantly greater than 1, then it indicates that overdispersion is present. Then we can either:
Use an alternative model with additional free parameters, e.g. negative binomial
Use maximum quasi-likelihood estimation, which is also applicable to overdispersion in other models
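The moment estimate of φ from the Pearson statistic can be sketched for the simplest case, an intercept-only Poisson fit (so μ̂_i = ȳ and V(μ) = μ); the counts below are hypothetical:

```python
import numpy as np

y = np.array([0, 3, 1, 7, 2, 9, 0, 4, 6, 8], dtype=float)
n, p = len(y), 1                    # one fitted parameter (the intercept)
mu_hat = np.full(n, y.mean())       # fitted means for the intercept-only model

pearson_chi2 = np.sum((y - mu_hat) ** 2 / mu_hat)
phi_hat = pearson_chi2 / (n - p)    # approximately unbiased estimate of phi
overdispersed = phi_hat > 1.0
```

Here ȳ = 4 and the sample variation is well above Poisson, so φ̂ clearly exceeds 1.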
Quasi-Likelihood Function
When overdispersion is present, we can use the quasi-likelihood function to specify a more appropriate variance function. Estimation using quasi-likelihood ONLY requires the mean-variance relationship to be specified, which can be easily estimated from data. Thus, quasi-likelihood is useful when we don't know for sure what the underlying distribution of the data is.
For any given mean μ_i and variance function V(μ_i), the quasi-log-likelihood Q is defined as:
Q(μ; y) = Σ_{i=1}^n ∫_{y_i}^{μ_i} (y_i − z)/[φ V(z)] dz
so that its derivatives mimic the GLM score:
∂Q/∂β_j = Σ_{i=1}^n (y_i − μ_i)/[φ V(μ_i)] · ∂μ_i/∂β_j
To find the maximum quasi-likelihood (MQL) estimates of the parameters j , we only need to
specify the mean i , variance function V i and the link function g.
Example: Poisson Variance Function
Let's say we think the data is Poisson, but we aren't sure, so we only specify the variance function as V(μ) = μ but not the data distribution nor φ. Then the quasi-log-likelihood is:
Q = Σ_{i=1}^n ∫_{y_i}^{μ_i} (y_i − z)/(φ z) dz = (1/φ) Σ_{i=1}^n [y_i ln μ_i − μ_i] + constant
Thus, if φ = 1, then the MQL estimates will be the same as the MLEs from a Poisson regression. However, the standard errors will be multiplied by a factor of √φ̂, reflecting the greater degree of uncertainty due to the presence of overdispersion.
Module 5: Copulas
Copula
Consider a random vector (X_1, X_2, …, X_n) with marginal CDFs F_i(x). Applying the probability integral transform to each component gives a random vector (F_1(X_1), …, F_n(X_n)) with uniformly distributed marginals; its joint CDF C is the copula of X. By Sklar's theorem, the joint CDF
F(x_1, x_2, …, x_n) = P(X_1 ≤ x_1, X_2 ≤ x_2, …, X_n ≤ x_n)
and marginal CDFs
F_j(x) = P(X_j ≤ x), j = 1, 2, …, n
are linked by:
F(x_1, x_2, …, x_n) = C(F_1(x_1), F_2(x_2), …, F_n(x_n))
Or equivalently:
P(X_1 ≤ x_1, …, X_n ≤ x_n) = C(P(X_1 ≤ x_1), …, P(X_n ≤ x_n))
When modelling, we can model the marginal distributions separately then plug them into the
copula function to model the joint distribution
Conversely, given a joint distribution function, the corresponding copula that describes the
dependence structure of this joint CDF can be found by substituting the inverse of the marginal
distributions into the joint CDF:
C(u_1, …, u_n) = F(F_1^(−1)(u_1), …, F_n^(−1)(u_n))
If X_1, X_2, …, X_n are independent, then the copula is the independence copula:
C(u_1, u_2, …, u_n) = u_1 u_2 ⋯ u_n
If X_1, X_2, …, X_n are completely dependent, i.e. X_1 = ⋯ = X_n = X, then the copula is:
C(u_1, …, u_n) = min(u_1, …, u_n), so that F(x_1, …, x_n) = min(F_1(x_1), …, F_n(x_n))
The copula density is:
c(u_1, u_2, …, u_n) = ∂^n C(u_1, u_2, …, u_n) / (∂u_1 ∂u_2 ⋯ ∂u_n)
For continuous marginal densities, the joint probability density of X_1, X_2, …, X_n is:
f(x_1, x_2, …, x_n) = ∂^n F(x_1, …, x_n)/(∂x_1 ⋯ ∂x_n)
= ∂^n C(F_1(x_1), …, F_n(x_n))/(∂u_1 ⋯ ∂u_n) · [∂F_1(x_1)/∂x_1] ⋯ [∂F_n(x_n)/∂x_n]
= c(F_1(x_1), F_2(x_2), …, F_n(x_n)) · f_1(x_1) f_2(x_2) ⋯ f_n(x_n)
Properties of Copulas
For n = 2, the copula C is a function mapping [0, 1]² to [0, 1] that is non-decreasing and right-continuous in each argument, with:
C(u, 0) = C(0, u) = 0 and C(u, 1) = C(1, u) = u
C(v_1, v_2) − C(u_1, v_2) − C(v_1, u_2) + C(u_1, u_2) ≥ 0 for u_1 ≤ v_1, u_2 ≤ v_2
Invariance Property
Suppose a random vector X = (X_1, X_2, …, X_n) has copula C and suppose T_1, …, T_n are non-decreasing, continuous functions of X_1, …, X_n. Then consider the random vector defined by:
(T_1(X_1), …, T_n(X_n))
Its joint CDF is:
P(T_1(X_1) ≤ x_1, …, T_n(X_n) ≤ x_n) = P(X_1 ≤ T_1^(−1)(x_1), …, X_n ≤ T_n^(−1)(x_n)) = F(T_1^(−1)(x_1), …, T_n^(−1)(x_n))
and its marginal CDFs are F_j(T_j^(−1)(x_j)), so applying Sklar's theorem to this joint CDF recovers the same copula C.
Therefore, this random vector has the same copula C. This implies that:
Copulas are invariant under non-decreasing transformations of the random variables.
However, the marginal p.d.f.s will change under transformation
An implication of this property is that even if two random vectors X = (X_1, …, X_n) and Y = (Y_1, …, Y_n) have the same copula, i.e. C_X = C_Y, it does not mean that X and Y have the same distribution, since Y can be a non-decreasing transformation of X. However, if X and Y have the same distribution, then C_X = C_Y.
Archimedean Copulas
An Archimedean copula C has the form:
C(u_1, u_2, …, u_n) = ψ^(−1)( ψ(u_1) + ψ(u_2) + ⋯ + ψ(u_n) )
Where:
The function ψ is called the generator
Each Archimedean copula has a single parameter α that controls the degree of dependence
Clayton Copula
The Clayton copula is defined by:
C(u_1, u_2, …, u_n) = ( Σ_{k=1}^n u_k^{−α} − n + 1 )^{−1/α}, α > 0
with generator and inverse:
ψ(t) = t^{−α} − 1 and ψ^(−1)(s) = (1 + s)^{−1/α}
In the case of n = 2:
C(u, v) = (u^{−α} + v^{−α} − 1)^{−1/α}
Frank Copula
The Frank copula is defined by:
C(u_1, u_2, …, u_n) = −(1/α) ln( 1 + ∏_{k=1}^n (e^{−αu_k} − 1) / (e^{−α} − 1)^{n−1} )
with generator and inverse:
ψ(t) = −ln[ (e^{−αt} − 1)/(e^{−α} − 1) ], α ≠ 0
ψ^(−1)(s) = −(1/α) ln[ 1 + e^{−s}(e^{−α} − 1) ]
Simulation of Copulas
In the bivariate case, to simulate a pair of random variables with known distributions F and G, whose dependence structure is described by a copula C, use the conditional distribution method:
1. Simulate two independent uniform(0, 1) random variables u and t
2. Using the given copula, transform t into v ∈ (0, 1) so that it has the right dependence structure w.r.t. u. The conditional distribution of V given U = u is:
c_u(v) = P(V ≤ v | U = u) = lim_{Δu→0} [C(u + Δu, v) − C(u, v)]/Δu = ∂C(u, v)/∂u
Set c_u(v) = t and solve for v = c_u^(−1)(t)
3. Set x = F^(−1)(u) and y = G^(−1)(v)
This gives a pair of simulated outcomes for the random variables X and Y, with distributions F and G respectively, such that their dependence is represented by the copula C.
Example
Let X and Y both be exponential with mean 1. The copula describing their dependence is given by:
C(u, v) = uv/(u + v − uv)
Simulate a pair of outcomes for (X, Y) given the following uniform(0, 1) pseudo-random numbers: 0.3726791, 0.6189313.
1. Treat the generated U(0, 1) random variables as:
(u, t) = (0.3726791, 0.6189313)
2. Set up the conditional distribution function and set it equal to t:
c_u(v) = ∂C(u, v)/∂u = v²/(u + v − uv)² = t
Solving for v:
v = √t · u / [1 − √t(1 − u)] = 0.5788953
3. Transform to the marginal distributions:
x = F^(−1)(u) = −ln(1 − u) = 0.4662971
y = G^(−1)(v) = −ln(1 − v) = 0.8648739
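The worked example can be verified numerically; the snippet reproduces the three values from the given uniforms:

```python
import math

# Copula C(u,v) = uv/(u+v-uv) with exponential(1) marginals for X and Y
u, t = 0.3726791, 0.6189313

# dC/du = v^2/(u+v-uv)^2; setting this equal to t and solving for v gives:
v = math.sqrt(t) * u / (1.0 - math.sqrt(t) * (1.0 - u))

x = -math.log(1.0 - u)   # x = F^{-1}(u), exponential mean 1
y = -math.log(1.0 - v)   # y = G^{-1}(v), exponential mean 1
```

Any fresh pair of independent uniforms (u, t) run through the same three lines yields a new dependent draw of (X, Y).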
For a fixed number of claims n, the aggregate loss is:
S = X_1 + X_2 + ⋯ + X_n = Σ_{i=1}^n X_i
The distribution of a sum of two independent random variables is given by the convolution:
F_{X+Y}(s) = ∫ F_Y(s − x) f_X(x) dx
f_{X+Y}(s) = ∫ f_Y(s − x) f_X(x) dx
Alternatively, by independence the mgf/pgf of the sum factorises:
m_S(t) = E[e^{tS}] = E[e^{t(X_1 + ⋯ + X_n)}] = E[e^{tX_1}] ⋯ E[e^{tX_n}] = m_{X_1}(t) ⋯ m_{X_n}(t)
p_S(t) = E[t^S] = E[t^{X_1 + ⋯ + X_n}] = p_{X_1}(t) ⋯ p_{X_n}(t)
There is a one-to-one relationship between a distribution and its mgf/pgf, so we can use the
mgf/pgf of the individual losses to find the mgf of the sum S, then if the mgf/pgf can be recognized
then we know the distribution of S. Both mgf and pgf have some useful identities involving Taylor
series expansion about 0:
p_X(t) = E[t^X] = Σ_{n=0}^∞ P(X = n) t^n, so that P(X = n) = p_X^{(n)}(0)/n!
m_X(t) = E[e^{tX}] = Σ_{n=0}^∞ E[X^n] t^n/n!, so that E[X^n] = m_X^{(n)}(0)
Consider a claim of the form X = IB, where I indicates whether a claim occurs, with P(I = 1) = q, and B is the claim amount given that a claim occurs. Conditioning on I:
F_X(x) = P(X ≤ x | I = 0)P(I = 0) + P(X ≤ x | I = 1)P(I = 1) = (1 − q) + q F_B(x), x ≥ 0
The mgf is:
m_X(t) = E[e^{tX} | I = 0]P(I = 0) + E[e^{tX} | I = 1]P(I = 1) = (1 − q) + q m_B(t)
The mean is given by:
E[X] = E[E[X | I]] = E[X | I = 0]P(I = 0) + E[X | I = 1]P(I = 1) = q E[B]
The variance is given by:
Var(X) = E[Var(X | I)] + Var(E[X | I]) = q Var(B) + E[B]² q(1 − q)
since E[X | I] = I E[B] and Var(X | I) = I² Var(B) = I Var(B).
Example
Consider a car insurance policy where the policyholder has a 0.1 probability of having an accident:
If an accident occurs, the damage to the vehicle is uniformly distributed between 0 and 2500
The policy has a deductible of 500 and a policy limit of 1500
Find the density of X, the claim amount the insurer is liable for.
The random variable X is given by:
X = 0 if 0 ≤ B ≤ 500, X = B − 500 if 500 < B < 2000, X = 1500 if 2000 ≤ B ≤ 2500, where B ~ U(0, 2500)
The distribution of X is mixed:
P(X = 0) = 0.9 + 0.1 × (500/2500) = 0.92
f_X(x) = 0.1 × (1/2500) = 0.00004 for 0 < x < 1500 (i.e. 500 < B < 2000)
P(X = 1500) = 0.1 × (500/2500) = 0.02 (i.e. 2000 ≤ B ≤ 2500)
S = X_1 + X_2 + ⋯ + X_N = Σ_{i=1}^N X_i
Where:
N is the number of claims, commonly Poisson(λ), Binomial(n, p) or Negative Binomial(r, p), independent of the X_i
X_i is the amount of the i-th claim; the X_i are i.i.d. with:
CDF P(x)
P.d.f. p(x)
Moments p_k = E[X^k]
E[S] = E[E[S | N]] = E[ E[Σ_{i=1}^N X_i | N] ] = E[N p_1] = E[N] p_1
Var(S) = E[Var(S | N)] + Var(E[S | N]) = E[N Var(X_i)] + Var(N p_1) = E[N](p_2 − p_1²) + p_1² Var(N)
m_S(t) = E[e^{tS}] = E[ E[e^{t(X_1 + ⋯ + X_N)} | N] ] = E[m_X(t)^N] = E[e^{N ln m_X(t)}] = m_N(ln m_X(t))
Distribution of S
The general expression for the distribution of S can be found by conditioning on the no. of claims:
F_S(x) = Σ_{n=0}^∞ P(S ≤ x | N = n) P(N = n) = Σ_{n=0}^∞ P(N = n) P^{*n}(x)
Where:
P^{*n}(x) is the n-fold convolution of P, i.e. the CDF of X_1 + ⋯ + X_n
N is the no. of claims, so it will always be discrete; thus this formula works for any type of X
However, the type of S will depend on the type of X
If X is continuous
If X is a continuous r.v., then S will generally have a mixed distribution with a probability mass at 0, since:
F_S(0) = Σ_{n=0}^∞ P(N = n) P^{*n}(0) = P(N = 0)
If X is mixed
If X is mixed, then S will also generally have a mixed distribution, with:
P(S = 0) = P(N = 0) + P(N > 0 and all X_i = 0) ≥ P(N = 0)
If X is discrete
If X is discrete, then we can derive a similar expression for the p.m.f. of S:
f_S(x) = Σ_{n=0}^∞ P(S = x | N = n) P(N = n) = Σ_{n=0}^∞ P(N = n) p^{*n}(x)
Where:
p^{*0}(0) = 1
This is efficient if the number of possible outcomes for N is small, otherwise too tedious
Mixed Poisson Distribution
Here the distribution of Λ represents the different classes of risk: for a certain class of risk we have a certain value of λ, and N will have a different distribution. Given N | Λ = λ ~ Poisson(λ) and mixing density u(λ):
P(N = n) = ∫_0^∞ P(N = n | Λ = λ) u(λ) dλ = ∫_0^∞ [e^{−λ} λ^n / n!] u(λ) dλ
The expected value and variance of N are:
E[N] = E[E[N | Λ]] = E[Λ]
Var(N) = E[Var(N | Λ)] + Var(E[N | Λ]) = E[Λ] + Var(Λ)
The mgf is:
m_N(t) = E[E[e^{tN} | Λ]] = E[e^{Λ(e^t − 1)}] = m_Λ(e^t − 1)
Example: Λ ~ Gamma(α, β)
Find the distribution of N given that N | Λ ~ Poi(Λ) and Λ ~ Gamma(α, β).
Using the m.g.f. approach:
m_N(t) = m_Λ(e^t − 1) = [1 − (e^t − 1)/β]^{−α} = [ (β/(1+β)) / (1 − e^t/(1+β)) ]^α
which is the mgf of a Negative Binomial(α, p) distribution with p = β/(1 + β).
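This identity can be checked numerically by integrating the Poisson pmf against the Gamma density and comparing with the Negative Binomial pmf (the parameter values are arbitrary choices for the check):

```python
import math
import numpy as np

alpha, beta = 2.0, 3.0
lam = np.linspace(1e-9, 60.0, 600_001)       # integration grid for lambda
h = lam[1] - lam[0]
gamma_pdf = beta**alpha * lam**(alpha - 1) * np.exp(-beta * lam) / math.gamma(alpha)

def trapezoid(f):
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

def mixed_pmf(n):
    # P(N = n) = integral of Poisson(n; lambda) * u(lambda) d lambda
    return trapezoid(np.exp(-lam) * lam**n / math.factorial(n) * gamma_pdf)

def nb_pmf(n):
    # Negative Binomial with r = alpha and p = beta/(1 + beta)
    coef = math.gamma(alpha + n) / (math.gamma(alpha) * math.factorial(n))
    return coef * (beta / (1 + beta))**alpha * (1 + beta)**(-n)

max_err = max(abs(mixed_pmf(n) - nb_pmf(n)) for n in range(6))
```

For n = 0 both sides equal (β/(1+β))^α = 0.5625 here, and the first few probabilities agree to numerical precision.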
Compound Poisson Distribution
If N ~ Poisson(λ), then S = Σ_{i=1}^N X_i ~ Compound Poisson(λ, P(x))
Where:
E[N] = Var(N) = λ and m_N(t) = exp[λ(e^t − 1)]
E[S] = λ p_1
Var(S) = λ p_2
m_S(t) = exp[λ(m_X(t) − 1)]
The cumulant generating function of X is k_X(t) = ln m_X(t), and the k-th cumulant of a random variable X is defined as:
κ_k = (d^k/dt^k) k_X(t) |_{t=0}
For a compound Poisson S, k_S(t) = ln m_S(t) = λ[m_X(t) − 1], so for k = 1, 2, 3, …:
κ_k(S) = λ (d^k/dt^k) m_X(t) |_{t=0} = λ p_k
Then we can use the cumulants to derive the expected value, variance, skewness and kurtosis of S:
E[S] = κ_1 = λ p_1
Var(S) = κ_2 = λ p_2
Skew(S) = κ_3/κ_2^{3/2} = λ p_3 / (λ p_2)^{3/2}
Excess kurtosis of S = κ_4/κ_2² = λ p_4 / (λ p_2)²
Theorem 1
Let S1 , S2 ,..., Sn be independent compound Poisson random variables with parameters i and Pi x .
Then the sum is also a compound Poisson random variable:
S = S_1 + S_2 + ⋯ + S_n ~ Compound Poisson(λ, P(x))
Where:
λ = Σ_{i=1}^n λ_i and P(x) = Σ_{i=1}^n (λ_i/λ) P_i(x), or equivalently p(x) = Σ_{i=1}^n (λ_i/λ) p_i(x)
This means that independent portfolios of losses can be easily aggregated. Also, the total
claims paid over n years is compound Poisson, even if losses vary in severity and frequency
across the years
Proof: the severity mgf of the mixture distribution is
m_X(t) = ∫_0^∞ e^{tx} p(x) dx = ∫_0^∞ e^{tx} Σ_{i=1}^n (λ_i/λ) p_i(x) dx = Σ_{i=1}^n (λ_i/λ) m_{X_i}(t)
Since m_{S_i}(t) = exp[λ_i(m_{X_i}(t) − 1)]:
m_S(t) = ∏_{i=1}^n m_{S_i}(t) = exp[ Σ_{i=1}^n λ_i(m_{X_i}(t) − 1) ] = exp[ λ m_X(t) − Σ_{i=1}^n λ_i ] = exp[λ(m_X(t) − 1)]
Thus the mgf of S is that of a compound Poisson distribution with parameters λ and P(x).
Theorem 2
If S ~ Compound Poisson(λ, p(x_i) = π_i, i = 1, …, m), i.e. the claims take m distinct amounts x_1, …, x_m, then we can write S in the form:
S = x_1 N_1 + ⋯ + x_m N_m
Where:
N_i is the number of claims of amount x_i, with N_i ~ Poisson(λπ_i)
N_1, N_2, …, N_m are mutually independent
Proof:
We need to show that the N_i are mutually independent and Poisson distributed. First define the sum of the number of claims from each possible claim amount by:
N = Σ_{i=1}^m N_i
If we know what N is, i.e. if N = n, then the numbers of claims (N_1, N_2, …, N_m) of each claim amount have a multinomial distribution with parameters (n, π_1, π_2, …, π_m). The joint mgf is:
E[exp(Σ_{i=1}^m t_i N_i)] = Σ_{n=0}^∞ E[exp(Σ_{i=1}^m t_i N_i) | N = n] P(N = n)
= Σ_{n=0}^∞ (e^{−λ} λ^n / n!) (Σ_{i=1}^m π_i e^{t_i})^n
= e^{−λ} exp(λ Σ_{i=1}^m π_i e^{t_i})
= exp(Σ_{i=1}^m λπ_i e^{t_i} − Σ_{i=1}^m λπ_i)
= ∏_{i=1}^m exp[λπ_i(e^{t_i} − 1)]
Since the joint mgf is a product of the individual mgfs, it implies that N_1, N_2, …, N_m are mutually independent. Next, by setting t_i = t and t_j = 0 for j ≠ i, we have the mgf of a particular N_i:
E[exp(tN_i)] = exp[λπ_i(e^t − 1)]
This is the mgf of a Poisson random variable with parameter λπ_i, and therefore N_i ~ Poi(λπ_i).
Sparse Vector Algorithm
Theorem 2 allows us to use the sparse vector algorithm, an alternative to convolution that is more
efficient for small m. Suppose S has a compound Poisson distribution with λ = 0.8 and the distribution for the individual claim amount is given by:
x_i:               1       2       3
π_i = P(X = x_i):  0.250   0.375   0.375
Then S = N_1 + 2N_2 + 3N_3, where N_i ~ Poi(λπ_i) are mutually independent. The distributions of N_1, 2N_2 and 3N_3, i.e. the total $1, $2 and $3 claims, are each easy to write down, and convolving them gives the distribution of S.
Recursion Algorithms
Another method to get the distribution of S when the claim amount X i is discrete is to use recursive
algorithms, which requires a certain family of distributions
The (a,b) Family
A distribution in the (a,b) family has the following property:
P(N = n) = (a + b/n) P(N = n − 1), n = 1, 2, …
This includes the Poisson (a = 0, b = λ), binomial and negative binomial distributions.
Panjer's Algorithm
For claim amounts on the non-negative integers, the pmf of S satisfies the recursion:
f_S(s) = 1/[1 − a p(0)] Σ_{j=1}^s (a + bj/s) p(j) f_S(s − j), s = 1, 2, …
With starting value:
f_S(0) = P(N = 0) if p(0) = 0
f_S(0) = m_N(ln p(0)) if p(0) > 0
Note that if p(x) = 0 for x > x_max, then the upper bound of the sum can be reduced to min(s, x_max).
The second outcome for f_S(0) is derived through:
f_S(0) = Σ_{n=0}^∞ P(X_1 + ⋯ + X_N = 0 | N = n) P(N = n) = Σ_{n=0}^∞ p(0)^n P(N = n) = E[p(0)^N] = E[e^{N ln p(0)}] = m_N(ln p(0))
In the Poisson case (a = 0, b = λ) the recursion simplifies to:
f_S(s) = (λ/s) Σ_{j=1}^s j p(j) f_S(s − j)
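The Poisson form of the recursion can be run directly on the sparse-vector example (λ = 0.8, amounts 1, 2, 3 with probabilities 0.250, 0.375, 0.375):

```python
import math

lam = 0.8
p = {1: 0.250, 2: 0.375, 3: 0.375}   # p(0) = 0, so f_S(0) = P(N = 0)

f = [math.exp(-lam)]                 # f_S(0) = e^{-lambda}
for s in range(1, 16):
    # Panjer recursion, Poisson member of the (a,b) family: a = 0, b = lambda
    f.append(lam / s * sum(j * p.get(j, 0.0) * f[s - j]
                           for j in range(1, min(s, 3) + 1)))
```

Hand checks: f_S(1) = λ·p(1)·f_S(0) = 0.2e^{−0.8} and f_S(2) = 0.32e^{−0.8}, and the probabilities up to s = 15 account for nearly all of the mass.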
Convolutions when p(0) > 0
When the claim amount distribution has mass at zero, split the mgf:
m_X(t) = Σ_{x≥0} p(x) e^{tx} = p(0) + Σ_{x≥1} p(x) e^{tx} = p(0) + [1 − p(0)] Σ_{x≥1} { p(x)/[1 − p(0)] } e^{tx} = q + p̄ m_X̃(t)
Where we define:
q = p(0), so that p̄ = 1 − p(0)
A new random variable X̃ whose pmf can be obtained by comparing coefficients of e^{tx}:
p̃(0) = 0 and p̃(x) = p(x)/[1 − p(0)], x = 1, 2, …
Then:
E[e^{t(X_1 + ⋯ + X_n)}] = [q + p̄ m_X̃(t)]^n
This has the form of the mgf of a compound binomial distribution. Thus, the n-th convolution of X satisfies:
Σ_{i=1}^n X_i ≐ Σ_{i=1}^Ñ X̃_i
Where:
Ñ ~ Binomial(n, p̄) with p̄ = 1 − p(0) counts the nonzero claims
The pmf of X̃ is p̃(x) = p(x)/[1 − p(0)]
For the binomial, the (a, b) parameters are a = −p̄/(1 − p̄) = −[1 − p(0)]/p(0) and b = (n + 1)p̄/(1 − p̄) = (n + 1)[1 − p(0)]/p(0)
So now this convolution is equivalent to the density of a compound binomial, and since the binomial distribution is part of the (a,b) family, we can apply Panjer's algorithm:
p^{*n}(x) = 1/[1 − a p̃(0)] Σ_{j=1}^x (a + bj/x) p̃(j) p^{*n}(x − j)
Since p̃(0) = 0 and p̃(j) = p(j)/[1 − p(0)], this simplifies to:
p^{*n}(x) = [1/p(0)] Σ_{j=1}^{min(x, x_max)} [ (n + 1)j/x − 1 ] p(j) p^{*n}(x − j), with p^{*n}(0) = p(0)^n
Note that despite transforming S into a sum of X̃, the pmf in the algorithm is still the original p(j).
To apply the recursion algorithms to a continuous claim amount X, first discretise it on a grid of span h:
f_0 = F_X(h/2 − 0) for j = 0
f_j = F_X(jh + h/2 − 0) − F_X(jh − h/2 − 0) for j = 1, 2, …, m − 1
f_m = 1 − F_X(mh − h/2 − 0) for j = m
so that Σ_{j=0}^m f_j = 1.
Approximation Methods
Sometimes it is not possible to compute the distribution of S, either because the data available is not
detailed enough, or it is impossible to fit a tractable model to the data. Rather than taking the risk of
fitting a wrong model, a quick approximation may be used instead.
CLT Approximation
Assuming that the distribution of S is approximately symmetrical, the central limit theorem suggests that we can use the approximation:
P(S ≤ s) = P( [S − E(S)]/√Var(S) ≤ [s − E(S)]/√Var(S) ) ≈ Φ( [s − E(S)]/√Var(S) )
Normal Power Approximation
To correct for skewness, the normal power approximation uses:
P(S ≤ s) ≈ Φ(x*), where x* = √( 9/γ_S² + 1 + 6[s − E(S)]/[γ_S √Var(S)] ) − 3/γ_S
Where:
γ_S = Skew(S) is the skewness coefficient of S
Edgeworth Approximation
F_S(x) ≈ Φ(z) − (γ_1/6) Φ^(3)(z) + (γ_2/24) Φ^(4)(z) + (γ_1²/72) Φ^(6)(z), z = [x − E(S)]/√Var(S)
Where:
Φ^(k)(z) = d^k Φ(z)/dz^k, with Φ′(z) = φ(z) = (1/√(2π)) e^{−z²/2}, Φ″(z) = −z φ(z), etc.
γ_1 and γ_2 are the skewness and excess kurtosis of S
Translated Gamma Approximation
Approximate S by x_0 + U, where U ~ Gamma(α, β) and (α, β, x_0) are chosen such that the first three moments of S and x_0 + U coincide:
F_S(x) ≈ G(x − x_0; α, β), where G(x; α, β) = [β^α/Γ(α)] ∫_0^x t^{α−1} e^{−βt} dt
Equating the mean, variance and skewness gives the method of moments estimators for (α, β, x_0):
E[S] = E[x_0 + U] = x_0 + α/β
Var(S) = Var(x_0 + U) = α/β²
Skew(S) = Skew(U) = 2/√α
solving:
α = 4/Skew(S)², β = √(α/Var(S)), x_0 = E[S] − α/β
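The moment matching above is a three-line computation; the sketch uses a compound Poisson S with hypothetical λ and severity moments p_1, p_2, p_3:

```python
import numpy as np

# Hypothetical compound Poisson: lambda = 10, Exp(1)-style moments p_k = k!
lam = 10.0
p1, p2, p3 = 1.0, 2.0, 6.0

mean, var = lam * p1, lam * p2            # kappa_1, kappa_2
skew = lam * p3 / (lam * p2) ** 1.5       # kappa_3 / kappa_2^{3/2}

alpha = 4.0 / skew**2                     # from Skew = 2/sqrt(alpha)
beta = np.sqrt(alpha / var)               # from Var = alpha/beta^2
x0 = mean - alpha / beta                  # from Mean = x0 + alpha/beta
```

The fitted (α, β, x_0) reproduce the target mean, variance and skewness exactly by construction.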
Approximating the Individual Model
In the individual model the aggregate claims are S̃ = Σ_{i=1}^n I_i b_i, where policy i pays a fixed benefit b_i with probability q_i = P(I_i = 1). This can be approximated by a collective model S = Σ_{i=1}^n N_i b_i ~ Compound Poisson(λ, P(x)), with:
λ = Σ_{i=1}^n λ_i and p(x) = Σ_{i=1}^n (λ_i/λ) I(x = b_i)
When approximating an individual model with the collective model, we need to make assumptions for λ_i, the contribution of the individual policy with benefit b_i to the total expected number of claims λ. Note that in both assumptions the distribution of the claim size X, i.e. p(x), remains the same.
Assumption 1
λ_i = q_i, so that λ = Σ_{i=1}^n q_i
This assumption ensures that the expected number of claims of size b_i in the approximating collective model will be the same as in the individual model. Therefore, the expected values of the aggregate claims S and S̃ will also be the same:
E[S] = λ p_1 = Σ_{i=1}^n q_i b_i = E[S̃]
Assumption 2
λ_i = −ln(1 − q_i) = q_i + q_i²/2 + q_i³/3 + ⋯, so that P(N_i = 0) = e^{−λ_i} = 1 − q_i
This assumption ensures that the probability of no claims in the approximating collective model will be the same as in the individual model. Therefore, the probability of zero aggregate claims will be the same:
P(S = 0) = P(S̃ = 0)
However, the approximating collective model will then have both a higher mean and variance, since λ_i = −ln(1 − q_i) > q_i:
E[S] = Σ_{i=1}^n λ_i b_i > Σ_{i=1}^n q_i b_i = E[S̃]
Module 2: Reinsurance
Reinsurance is the transfer of risk from an insurer to a reinsurer, i.e. the insurer pays a deterministic
premium to the reinsurer to protect itself from a random loss arising from claims.
Types of Reinsurance
Proportional Reinsurance
In a proportional reinsurance contract, the insurer pays a retained proportion α of a claim X:
Insurer pays Y = αX
Reinsurer pays Z = (1 − α)X
This is simply a change of scale, so the mean and variance of what the insurer is liable for are:
E[Y] = αE[X], σ_Y² = α²σ_X², m_Y(t) = m_X(αt)
Non-proportional Reinsurance
In an excess of loss reinsurance contract, the reinsurer pays the excess over a retention d, with the
premium being a fixed cost, e.g. $0.025 per unit of coverage in excess of retention.
Reinsurer pays Z = (X − d)₊ = max(X − d, 0)
For the direct insurer, a non-proportional reinsurance will always have the least variance of retained
claims for all reinsurance arrangements with the same expected retained claims. The reinsurer may
also limit the payment to an amount L, so that Z = min((X − d)₊, L).
Stop-Loss Premiums
The expected cost to the reinsurer, P_d = E[(S − d)₊], is called the stop-loss premium. If S is discrete on the integers:
P_d = Σ_{k>d} (k − d) f_S(k)
and it satisfies the recursion:
P_{d+1} = P_d − [1 − F_S(d)], with P_0 = E[S]
If d is not an integer:
P_d = P_{⌊d⌋} − (d − ⌊d⌋)[1 − F_S(⌊d⌋)]
The variance of the reinsurer's payment is:
Var[(S − d)₊] = E[(S − d)₊²] − P_d²
where the second moment P_d^{(2)} := E[(S − d)₊²] satisfies:
P_{d+1}^{(2)} = P_d^{(2)} − 2P_{d+1} − [1 − F_S(d)], with P_0^{(2)} = E[S²]
The probability of a claim q for each of the 16000 lives is 0.02. The excess of loss reinsurance with
retention of 30000 is available at a cost of 0.025 per dollar of coverage. Calculate the probability
that the total cost will exceed 8250000.
The total cost to the insurer after reinsurance is the total retained claims payable plus the
reinsurance premium. With reinsurance, policies with benefit amount of 1,2 and 3 will remain
unchanged but for those with benefit amount of 5 and 10, the insurer is now only liable for 3. Thus,
the portfolio of retained business is given by:
Thus, the expected value and variance of the total retained claims are:
E[S] = Σ_{k=1}^3 n_k b_k q_k and Var(S) = Σ_{k=1}^3 n_k b_k² q_k(1 − q_k)
where n_k is the number of policies at each retained benefit level b_k. With amounts in units of $10,000, the reinsurance premium is 162.5, so by the normal approximation:
P(S + 162.5 > 825) = P(S > 662.5) = P( Z > [662.5 − E(S)]/√Var(S) ) = P(Z > 2.653) = 0.0041
Stochastic Processes
Rather than looking at the aggregate losses at a point in time as a random variable S, we are
interested in modelling it over a period of time as a stochastic process S(t).
The increment of a stochastic process over an interval (t, t + h] is a random variable:
X(t + h) − X(t)
Poisson Process
A Poisson process N(t) with rate λ satisfies:
N(t) ≥ 0 and N(t) is integer-valued and non-decreasing
N(0) = 0
Stationary, independent increments: P(N(s + t) − N(s) = n) = P(N(t) = n) = e^{−λt}(λt)^n/n!
Or equivalently, the inter-arrival time between the (n−1)-th and n-th event has an Exponential(λ) distribution
The probability of more than one jump at a time is 0:
P(N(t + h) − N(t) = 0) = e^{−λh} = 1 − λh + o(h)
P(N(t + h) − N(t) = 1) = λh e^{−λh} = λh + o(h)
P(N(t + h) − N(t) ≥ 2) = o(h)
S(t) = Σ_{i=1}^{N(t)} X_i
Where:
N(t) is a Poisson(λ) process and the X_i are i.i.d. with CDF P(x), independent of N(t)
In a Poisson process, the increments can only have a height of 0 or 1, but for a compound Poisson process the increments can have a height of X_i:
S(t + h) − S(t) = Σ_{i=N(t)+1}^{N(t+h)} X_i ≐ Σ_{i=1}^{N(h)} X_i ~ Compound Poisson(λh, P(x))
Cramer-Lundberg Process
Let
U(t) = u + ct − Σ_{i=1}^{N(t)} X_i = u + ct − S(t)
Where:
u is the initial surplus, or initial capital
c is the constant yearly premium rate, which is the cash inflow. It is defined to cover the expected aggregate loss over a unit period of time, with a relative security loading θ:
c = (1 + θ)E[S(1)] = (1 + θ)E[Σ_{i=1}^{N(1)} X_i] = (1 + θ)E[N(1)]E[X] = (1 + θ)λE[X]
N(t) is a Poisson(λ) process counting the claims
U(t) represents the insurer's surplus at time t. The surplus drifts upward at the constant premium rate, and each downward jump represents a loss from a claim.
The Cramer-Lundberg process has stationary and independent increments, because:
U(t + s) − U(t) = cs − Σ_{i=N(t)+1}^{N(t+s)} X_i ≐ cs − Σ_{i=1}^{N(s)} X_i = U(s) − U(0)
which is independent of {U(v) : 0 ≤ v ≤ t} (independent increments) and has the same distribution as an increment of length s starting from 0 (stationary increments).
Probability of Ruin
The survival of an insurance company will depend on certain variables, e.g. the initial surplus u,
loading of premiums and the level of reinsurance. One way of monitoring these variables is the
concept of the probability of ruin. In the Cramer-Lundberg model, the time to ruin T is defined as:
T = inf{ t ≥ 0 : U(t) < 0 }
i.e. the first time that the insurer's surplus U(t) becomes negative. The probability that a company will be ruined by time t, given the initial surplus u, is denoted by:
ψ(u, t) = P(T ≤ t)
and the probability of ultimate ruin is:
ψ(u) = lim_{t→∞} ψ(u, t) = P(T < ∞)
Adjustment Coefficient
Consider the excess of losses over premiums over the interval (0, t], i.e. the negative of the surplus growth:
S(t) − ct = −[U(t) − U(0)]
Define the adjustment coefficient R as the smallest positive solution of the following equation:
E[e^{r(S(t) − ct)}] = 1
Since E[e^{rS(t)}] = e^{λt[m_X(r) − 1]}, this is equivalent to:
e^{−rct} e^{λt[m_X(r) − 1]} = 1 ⟺ λ + cr = λ m_X(r) ⟺ 1 + (1 + θ)E[X] r = m_X(r)
As long as θ > 0, there will be exactly one positive solution R besides r = 0, since the LHS is linear in r and the RHS is convex, so they will intersect at some positive R.
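The defining equation is easy to solve numerically. The sketch below takes Exp(1) claims with θ = 0.25, for which the known closed form R = θ/[(1 + θ)E[X]] = 0.2 provides a check:

```python
# Solve 1 + (1 + theta) * E[X] * r = m_X(r) for X ~ Exp(1), theta = 0.25
theta, mean_x = 0.25, 1.0

def g(r):
    # RHS minus LHS; m_X(r) = 1/(1 - r) for Exp(1), valid for r < 1
    return 1.0 / (1.0 - r) - (1.0 + (1.0 + theta) * mean_x * r)

# Bisection on (0, 1): g < 0 just above 0 and g > 0 near r = 1
lo, hi = 1e-6, 0.999
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(mid) < 0.0:
        lo = mid
    else:
        hi = mid
R = 0.5 * (lo + hi)
```

Bisection suffices because g crosses zero exactly once on the interval when θ > 0.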
Alternatively, the adjustment coefficient is the R that makes e^{−RU(t)} a martingale, i.e. satisfies E[e^{−RU(t)}] = e^{−Ru}:
E[e^{−RU(t)} | U(v), 0 ≤ v ≤ s] = e^{−RU(s)} E[e^{−R[U(t) − U(s)]} | U(v), 0 ≤ v ≤ s]
= e^{−RU(s)} E[e^{−R[U(t) − U(s)]}]  (independent increments)
= e^{−RU(s)} E[e^{−R[U(t−s) − U(0)]}]  (stationary increments)
The remaining expectation is equal to 1, since the increment in the exponential can be written as:
−[U(t − s) − U(0)] = S(t − s) − c(t − s)
so that:
E[e^{−R[U(t−s) − U(0)]}] = E[e^{R[S(t−s) − c(t−s)]}] = e^{−Rc(t−s)} e^{λ(t−s)[m_X(R) − 1]} = 1
which is exactly the equation we solved to find the adjustment coefficient.
Applying the optional stopping theorem to this martingale at the ruin time T gives:
ψ(u) = e^{−Ru} / E[e^{−RU(T)} | T < ∞]
Since U(T) < 0 at ruin, E[e^{−RU(T)} | T < ∞] > 1, so we have Lundberg's upper bound:
ψ(u) ≤ e^{−Ru}
If P(X ≤ x_max) = 1 for some finite x_max, then U(T) ≥ −x_max, so E[e^{−RU(T)} | T < ∞] ≤ e^{Rx_max} and the lower bound is:
ψ(u) ≥ e^{−R(u + x_max)}
Therefore, the upper and lower bounds on the probability of ultimate ruin are:
e^{−R(u + x_max)} ≤ ψ(u) ≤ e^{−Ru}
To find ψ(u) exactly, we first need to find the denominator of the formula in Theorem 13.4.1, E[e^{−RU(T)} | T < ∞]. Denote U(T−) as the surplus just before ruin and Y as the claim that causes ruin, so Y ~ Exp(β). The deficit at ruin is −U(T) = Y − U(T−), and by the memoryless property of the exponential distribution:
P(−U(T) > x | T < ∞) = P(Y > U(T−) + x | Y > U(T−)) = P(Y > x)
so −U(T) | T < ∞ ~ Exp(β), and:
E[e^{−RU(T)} | T < ∞] = β/(β − R)
Then we have the probability of ultimate ruin as:
ψ(u) = [(β − R)/β] e^{−Ru} = [1/(1 + θ)] exp( −[θβ/(1 + θ)] u )
using R = θβ/(1 + θ) for Exp(β) claims. Here the probability of ruin decreases exponentially in the adjustment coefficient, so the insurer will adjust (maximise) the adjustment coefficient to minimise the probability of ruin.
Applications to Reinsurance
Instead of setting aside a large amount of capital to lower its probability of ruin, the direct insurer can purchase reinsurance. Let h(x) ≤ x be the amount paid by the reinsurer for a claim of amount x, and let c_h be the reinsurance premium rate. The surplus process becomes:
U(t) = u + (c − c_h)t − Σ_{i=1}^{N(t)} [X_i − h(X_i)]
To solve for the adjustment coefficient, consider the case of proportional reinsurance, h(x) = (1 − α)x:
U(t) = u + (c − c_h)t − Σ_{i=1}^{N(t)} αX_i = u + (c − c_h)t − αS(t)
Then the adjustment coefficient is the smallest positive solution to:
E[e^{r[αS(t) − (c − c_h)t]}] = 1 ⟺ e^{−r(c − c_h)t} e^{λt[m_X(αr) − 1]} = 1 ⟺ (c − c_h)r = λ[m_X(αr) − 1]
In general, when there is proportional/excess of loss reinsurance, we need to solve:
(c − c_h)r = λ[m_{X − h(X)}(r) − 1]
The adjustment coefficient with reinsurance will generally be higher than the corresponding adjustment coefficient without reinsurance, since reinsurance reduces the probability of ruin, and the ruin probability decreases as the adjustment coefficient increases.
Example: Adjustment Coefficient with Proportional Reinsurance
An insurer models its surplus using the Cramer-Lundberg process with λ = 1, claim distribution X ~ Exp(1) and security loading θ = 0.25. The insurer is considering proportional reinsurance with retention α, where the reinsurer's security loading is k = 0.4. What is the retention that maximises R_h?
Setting up the equation to solve for the adjustment coefficient, with retained claims αX:
c = (1 + 0.25)λE[X] = 1.25
c_h = (1 + 0.4)λE[(1 − α)X] = 1.4(1 − α)
c − c_h = 1.25 − 1.4(1 − α) = 1.4α − 0.15
m_{αX}(r) = 1/(1 − αr)
so R_h solves (1.4α − 0.15)r = αr/(1 − αr). Maximising over α gives α ≈ 0.6922 and R_h = 0.223787.
Example: Adjustment Coefficient with Excess of Loss Reinsurance
Now suppose instead that the insurer buys excess of loss reinsurance with retention d, so it retains X ∧ d. For X ~ Exp(1):
c_h = 1.4λE[(X − d)₊] = 1.4 ∫_d^∞ (x − d)e^{−x} dx = 1.4e^{−d}
m_{X∧d}(r) = ∫_0^d e^{rx} e^{−x} dx + e^{rd} e^{−d} = (1 − e^{−d(1−r)})/(1 − r) + e^{−d(1−r)}
and R_h is the smallest positive solution of (c − c_h)r = λ[m_{X∧d}(r) − 1].
Compared to the proportional reinsurance case, the adjustment coefficient here is much higher, which means the probability of ruin will be much lower.
Theorem 14.5.1
This theorem formalises the results of the previous two examples, if:
We are in a Cramer-Lundberg setting
We are considering two reinsurance contracts, one of which is excess of loss
The premium loading and expected amount paid by the reinsurer, i.e. the reinsurance
premium, is the same for both contracts
Then Theorem 14.5.1 states that the adjustment coefficient under the excess of loss contract will always be at least as high as under any other type of reinsurance contract. The excess of loss contract is thus optimal for the direct insurer, and it also gives a lower variance of retained claims.
De Finettis Modification
Using the probability of ruin as a criterion presents some issues:
Minimising u supposes that companies should let their surplus grow without limit, which
is not realistic
If some of the surplus is distributed from time to time, e.g. as dividend, then calculations of
u are wrong
Using the overall mean X̄ of all the groups makes sense only if the portfolio is homogeneous
Using the mean of a particular group X̄_j makes sense only if the group is sufficiently large and arguably different from the other groups
The credibility approach is to take a weighted average of these two extremes. In general, the
credibility premium is given by:
P^cred = z_j X̄_j + (1 − z_j) X̄
Where:
μ(θ) = E[X_{j,T+1} | θ] is the individual premium
σ²(θ) = Var(X_{j,T+1} | θ)
P^coll = m = E[μ(Θ)] = ∫ E[X_{j,T+1} | θ] f_Θ(θ) dθ is the collective premium
Bayesian Premium
Given T observations x = (x₁, …, x_T) of X = (X₁, …, X_T), we are interested in estimating the individual premium μ(θ) = E[X_{T+1} | θ].
Define:
L(θ, g(x)) as the loss function, if θ is the true parameter and g(x) is the value taken by the estimator when X = x is observed
f_{Θ|X}(θ | x) as the posterior distribution of Θ, i.e. after the observations have been made
The Bayesian estimator is defined as the estimator that minimises the expected loss w.r.t. the posterior distribution after observation:
min_g ∫ L(θ, g(x)) f_{Θ|X}(θ | x) dθ
The Bayesian premium is defined as the Bayesian estimator w.r.t. the quadratic loss function L(θ, g) = (μ(θ) − g)², which turns out to be the posterior expectation of μ(Θ):
P^Bayes = E[μ(Θ) | X] = ∫ μ(θ) f_{Θ|X}(θ | x) dθ
In other words, the Bayesian premium minimises the quadratic error between the estimated premium and the individual premium μ(θ); its expected quadratic loss is E[Var(μ(Θ) | X)].
The expected quadratic loss of the collective premium is higher:
E[(μ(Θ) − m)²] = Var(μ(Θ)) = E[Var(μ(Θ) | X)] + Var(E[μ(Θ) | X]) ≥ E[Var(μ(Θ) | X)]
To calculate the Bayesian premium, we first need to determine the posterior distribution of Θ using the realisations of X:
f_{Θ|X}(θ | x) = f_{X|Θ}(x | θ) f_Θ(θ) / f_X(x)
This gives 4 cases depending on whether X_t and Θ are continuous or discrete. For example, if X_t is discrete and Θ is continuous:
π(θ | x) = [∏_{t=1}^T P(X_t = x_t | θ)] f_Θ(θ) / ∫ [∏_{t=1}^T P(X_t = x_t | θ)] f_Θ(θ) dθ
and if X_t is continuous, replace P(X_t = x_t | θ) by the conditional density f_{X|Θ}(x_t | θ). The Bayesian premium is then:
P^Bayes = E[μ(Θ) | X = x] = ∫ μ(θ) π(θ | x) dθ
For example, suppose the conditional likelihood is Bernoulli and the prior is Beta:
P(X_t = x | θ) = θ^x (1 − θ)^{1−x}, x ∈ {0, 1} (Bernoulli(θ))
f_Θ(θ) ∝ θ^{α−1} (1 − θ)^{β−1}, 0 < θ < 1 (Beta(α, β))
For this particular choice of the conditional likelihood and prior distribution, the posterior distribution will also be a Beta distribution. Since X_t is discrete and Θ is continuous:
π(θ | x) ∝ [∏_{t=1}^T θ^{x_t} (1 − θ)^{1−x_t}] θ^{α−1} (1 − θ)^{β−1} = θ^{Σ_t x_t + α − 1} (1 − θ)^{T − Σ_t x_t + β − 1}
Proportionality holds since the denominator is just a normalising constant. Now denote s as the sum of the observations and T as the number of observations. In this example, T is the number of observed years and s = Σ_{t=1}^T x_t is the total number of years with a claim. Then the posterior distribution is:
π(θ | x) ∝ θ^{α+s−1} (1 − θ)^{β+T−s−1}, i.e. Θ | X ~ Beta(α + s, β + T − s)
P^Bayes = E[μ(Θ) | X] = E[Θ | X] = (α + s)/(α + β + T)
To write this in credibility premium form, it needs to be a weighted average of the expected number of claims for this particular risk, X̄ = s/T, and for all risks, m = E[Θ] = α/(α + β):
(α + s)/(α + β + T) = [T/(T + α + β)] (s/T) + [(α + β)/(T + α + β)] [α/(α + β)] = z X̄ + (1 − z) m
This is indeed in the form of a credibility premium, with the credibility factor being:
z = T/(T + α + β) and 1 − z = (α + β)/(T + α + β)
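The Beta-Bernoulli posterior mean above can be sketched in a few lines (the prior parameters and claim history below are hypothetical):

```python
def bayes_premium(claims, alpha, beta):
    """Posterior-mean (Bayesian) premium for Bernoulli claims with a
    Beta(alpha, beta) prior, written in credibility form z*xbar + (1-z)*m."""
    T, s = len(claims), sum(claims)
    z = T / (T + alpha + beta)        # credibility factor
    m = alpha / (alpha + beta)        # collective premium = prior mean
    return z * (s / T) + (1 - z) * m  # equals (alpha + s) / (alpha + beta + T)

# hypothetical: uniform Beta(1, 1) prior, five observed years, two with a claim
p = bayes_premium([1, 0, 1, 0, 0], 1.0, 1.0)   # = (1 + 2) / (2 + 5) = 3/7
```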
Buhlmann Model
The Bayesian estimator/premium is the best possible estimator, but it can be tedious to calculate and it requires us to specify the conditional likelihood and the prior distribution. The idea of the Buhlmann model is to restrict the class of allowable estimators to those which are linear in the observations x.
In the Buhlmann model, we make the following assumptions:
There is only 1 policy j for 1 insured, observed over T periods
Conditional on θ, the X_jt are i.i.d. with μ(θ) = E[X_jt | θ] and σ²(θ) = Var(X_jt | θ)
The structural parameters are m = E[μ(Θ)], a = Var(μ(Θ)) and s² = E[σ²(Θ)]
The linear Bayesian premium takes the form:
P^Bayes_{T+1} = a₁ X̄_j + a₀
The linear Bayesian estimator is the one that minimises the mean squared error, which is equivalent to minimising:
min_{a₀, a₁} E[(X_{j,T+1} − a₁ X̄_j − a₀)²]
Taking partial derivatives w.r.t. the 2 parameters gives:
∂/∂a₀: −2 E[X_{j,T+1} − a₁ X̄_j − a₀] = 0 ⟹ a₀ = E[X_{j,T+1}] − a₁ E[X̄_j]
∂/∂a₁: −2 E[X̄_j (X_{j,T+1} − a₁ X̄_j − a₀)] = 0 ⟹ Cov(X̄_j, X_{j,T+1}) − a₁ Var(X̄_j) = 0
Now using the following results from the preliminary tutorial exercises:
E[X̄_j] = E[X_{j,T+1}] = m
Cov(X̄_j, X_{j,T+1}) = Var(μ(Θ)) = a
Var(X̄_j) = a + s²/T
we obtain:
a₁ = Cov(X̄_j, X_{j,T+1}) / Var(X̄_j) = a/(a + s²/T) = T/(T + s²/a)
a₀ = (1 − a₁) E[X̄_j] = (1 − a₁) m
Therefore, the Bayesian premium in the Buhlmann model is:
P^Bayes_{T+1} = a₁ X̄_j + (1 − a₁) m
P^cred_{T+1} = z X̄_j + (1 − z) m
With:
z = a₁ = T/(T + K), K = s²/a and m = E[μ(Θ)]
If a increases, i.e. there is more heterogeneity between risks, then K will decrease and thus z will increase. So if the risks we have are quite different from each other, then we will place more weight on the individual mean X̄_j
If s² decreases, i.e. there is less heterogeneity within each risk, then K will decrease and thus z will increase. So if each risk group doesn't vary much within itself, then we rely more on the individual mean X̄_j
Nonparametric Estimation
There are three ways of estimating the structural parameters m, s² and a:
Pure Bayesian procedure: intuitively set them using the knowledge of an experienced actuary
Parametric estimation: specify the conditional likelihood and the prior, and compute m, s² and a from their parameters
Nonparametric estimation: with data on J risks over T years, use:
m̂ = X̄ = (1/J) Σ_{j=1}^J X̄_j = (1/(JT)) Σ_{j=1}^J Σ_{t=1}^T X_jt
ŝ² = (1/J) Σ_{j=1}^J s_j², where s_j² = (1/(T − 1)) Σ_{t=1}^T (X_jt − X̄_j)²
â = max{ (1/(J − 1)) Σ_{j=1}^J (X̄_j − X̄)² − ŝ²/T, 0 }
Note that if â = 0 then z = 0, since all risks will have the same risk profile.
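The nonparametric estimators can be sketched directly (the two-risk, three-year data set below is hypothetical):

```python
def buhlmann_estimates(X):
    """Nonparametric estimates of m, s^2, a and the credibility factor z
    from observations X[j][t] for J risks over T years."""
    J, T = len(X), len(X[0])
    xbar = [sum(row) / T for row in X]                       # individual means
    m = sum(xbar) / J                                        # overall mean
    s2 = sum(sum((x - xb) ** 2 for x in row) / (T - 1)
             for row, xb in zip(X, xbar)) / J                # within-risk variance
    a = max(sum((xb - m) ** 2 for xb in xbar) / (J - 1) - s2 / T, 0.0)
    z = T / (T + s2 / a) if a > 0 else 0.0                   # z = 0 when a-hat = 0
    return m, s2, a, z

# hypothetical portfolio: two risks observed for three years
m, s2, a, z = buhlmann_estimates([[5, 8, 11], [11, 13, 12]])
```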
Example: Credibility Premium with Parametric Estimation
It is known that given the risk profile θ, the number of claims follows a Poi(θ) distribution. Among all those insured, the parameter θ has a Pareto(3, 10) distribution. Suppose that a policy had the following claim experience in the last 3 years. Determine the credibility premium for the 4th year.
Year 1: 2 claims
Year 2: 3 claims
Year 3: 1 claim
The conditional likelihood and prior distribution are:
N | θ ~ Poi(θ) and θ ~ Pareto(3, 10)
Given these, we can derive:
μ(θ) = E[N₄ | θ] = θ and σ²(θ) = Var(N₄ | θ) = θ
The structural parameters are then:
m = E[θ] = 10/(3 − 1) = 5
a = Var(θ) = (3 × 10²)/((3 − 1)²(3 − 2)) = 75
s² = E[σ²(θ)] = E[θ] = 5
K = s²/a = 5/75 = 1/15
Therefore, the credibility premium is given by:
z = T/(T + K) = T/(T + 1/15) = 15T/(15T + 1)
P₄^cred = z N̄ + (1 − z) m = [15T/(15T + 1)] N̄ + [1/(15T + 1)] × 5
Note that since we want to estimate the expected number of claims, T is the risk exposure, i.e. the number of observations. Here we took 3 observations, i.e. 3 years of experience, and N̄ is the average number of claims for this particular policy, so we have:
N̄ = (2 + 3 + 1)/3 = 2 and T = 3
P₄^cred = (45/46)(2) + (1/46)(5) = 2.06522
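The arithmetic of this example can be replayed directly:

```python
# structural parameters implied by the Pareto(3, 10) prior on the Poisson mean
m = 10 / (3 - 1)                            # E[theta] = 5
a = 3 * 10 ** 2 / ((3 - 1) ** 2 * (3 - 2))  # Var(theta) = 75
s2 = m                                      # E[Var(N|theta)] = E[theta] for Poisson
K = s2 / a                                  # = 1/15
T, Nbar = 3, (2 + 3 + 1) / 3                # three years, average of 2 claims
z = T / (T + K)                             # = 45/46
P4 = z * Nbar + (1 - z) * m                 # credibility premium for year 4
```

This reproduces P₄ = 95/46 ≈ 2.06522.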
Example: Credibility Premium with Nonparametric Estimation
Two policies have the following annual aggregate claims over the last 3 years:
Policy A: 5, 8, 11 (so X̄₁ = 8 and s₁² = 9)
Policy B: 11, 13, 12 (so X̄₂ = 12 and s₂² = 1)
so that X̄ = 10 = m̂ and ŝ² = (9 + 1)/2 = 5.
Estimate Buhlmann's credibility factor and use this to estimate next year's credibility premium for each policy.
First, we need to test whether there is sufficient heterogeneity using the F-test:
F = MSB/MSW = [SSB/(J − 1)] / [SSW/(J(T − 1))] ~ F(J − 1, J(T − 1))
SSB = T Σ_{j=1}^J (X̄_j − X̄)²
SSW = Σ_{j=1}^J Σ_{t=1}^T (X_jt − X̄_j)²
Here J = 2 and T = 3.
The null is that the portfolio is homogeneous, so if this test accepts the null then the credibility factor is zero, i.e. the Buhlmann credibility premium is the same for all risk classes and equal to the overall mean. However, if the test rejects, then the Buhlmann credibility premium is calculated as:
ŝ² = (1/J) Σ_j s_j² = (9 + 1)/2 = 5
â = (1/(J − 1)) Σ_j (X̄_j − X̄)² − ŝ²/T = [(8 − 10)² + (12 − 10)²] − 5/3 = 19/3
K̂ = ŝ²/â = 5/(19/3) = 15/19
z = T/(T + K̂) = 3/(3 + 15/19) = 19/24 = 0.7917
P_A^cred = z X̄₁ + (1 − z) m̂ = (19/24)(8) + (5/24)(10) = 8.42
P_B^cred = z X̄₂ + (1 − z) m̂ = (19/24)(12) + (5/24)(10) = 11.58
Alternatively, the credibility factor z can be estimated from the F-test statistic:
F = MSB/MSW = (24/1)/(20/4) = 4.8 and z = 1 − 1/F = 1 − 1/4.8 = 19/24
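A sketch of the F-test route to the same premiums (assuming the table reads policy A = (5, 8, 11) and policy B = (11, 13, 12)):

```python
A, B = [5, 8, 11], [11, 13, 12]   # assumed reading of the two policies' claims
J, T = 2, 3
xbar = [sum(A) / T, sum(B) / T]                              # 8.0 and 12.0
grand = sum(xbar) / J                                        # 10.0
SSB = T * sum((xb - grand) ** 2 for xb in xbar)              # between-risk
SSW = sum((x - xb) ** 2
          for row, xb in zip([A, B], xbar) for x in row)     # within-risk
F = (SSB / (J - 1)) / (SSW / (J * (T - 1)))                  # F statistic
z = 1 - 1 / F                                                # shortcut z = 1 - 1/F
PA = z * xbar[0] + (1 - z) * grand                           # ~8.42
PB = z * xbar[1] + (1 - z) * grand                           # ~11.58
```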
Buhlmann-Straub Model
The Buhlmann-Straub model is an extension of the Buhlmann model where the risk exposure, e.g. the number of policyholders, is changing with each observation. In this model we assume that for the j-th class of risk:
The risk class j is characterised by its specific risk parameter θ_j, which is a realisation of the random variable Θ_j
X_jt = S_jt / w_jt is the claim amount per unit of volume w_jt in year t. Depending on what measure of volume we use, X_jt has different interpretations. E.g. if w_jt is the number of claims in year t, then X_jt is the average claim size in year t. Since the risk exposure changes with each observation, the observed data are volume-weighted averages of the individual observations
Conditional on θ_j, X_j1, X_j2, …, X_jT are independent with:
μ(θ_j) = E[X_jt | θ_j] and Var(X_jt | θ_j) = σ²(θ_j)/w_jt
Define:
w_j· = Σ_{t=1}^T w_jt as the aggregate volume of risk j, with X̄_j = (1/w_j·) Σ_{t=1}^T w_jt X_jt
m = E[μ(Θ)] (the collective premium), a = Var(μ(Θ)) and s² = E[σ²(Θ)]
w·· = Σ_{j=1}^J w_j· as the aggregate volume of the portfolio, with X̄ = (1/w··) Σ_{j=1}^J w_j· X̄_j
The credibility premium in the Buhlmann-Straub model is:
P^cred_{j,T+1} = z_j X̄_j + (1 − z_j) m = m + z_j (X̄_j − m)
Where:
z_j = w_j·/(w_j· + K) and K = s²/a
Under the Buhlmann-Straub model, this credibility estimator is the best linear estimator under the quadratic loss function, given experience and collective data. The actual expected quadratic loss is:
E[(P^cred_{j,T+1} − μ(θ_j))²] = a(1 − z_j) = z_j s²/w_j·
This quadratic loss is smaller than a = Var(μ(Θ)), which is the quadratic loss if we just used the collective premium m, i.e. the best estimator with only collective data and no experience
It is also smaller than s²/w_j·, which is the quadratic loss of X̄_j, i.e. the best estimator with only experience data and no collective information
The structural parameters are estimated nonparametrically by:
ŝ² = (1/J) Σ_{j=1}^J s_j², where s_j² = (1/(T − 1)) Σ_{t=1}^T w_jt (X_jt − X̄_j)²
â = [w·· / (w··² − Σ_{j=1}^J w_j·²)] × [Σ_{j=1}^J w_j· (X̄_j − X̄)² − (J − 1) ŝ²]
ẑ_j = w_j· / (w_j· + K̂), where K̂ = ŝ²/â
Then estimate m as a weighted average of the experience, using the estimated credibility factors:
m̂ = Σ_{j=1}^J (ẑ_j / ẑ·) X̄_j, where ẑ· = Σ_{j=1}^J ẑ_j
P^cred_{j,T+1} = ẑ_j X̄_j + (1 − ẑ_j) m̂ = m̂ + ẑ_j (X̄_j − m̂)
With m estimated in this way, the expected quadratic loss becomes a(1 − z_j)(1 + (1 − z_j)/z·).
2. Calculate X̄_j and X̄. Note that since the table is in terms of aggregate claim amounts, the values in the table are NOT X_jt but S_jt. To obtain X_jt, use X_jt = S_jt / w_jt.
Policy 1: S_1t = (8000, 11000, 15000) with volumes w_1t = (40, 50, 70), so w_1· = 160 and
X̄₁ = (8000 + 11000 + 15000)/160 = 212.5
Policy 2: S_2t = (20000, 24000, 19000) with volumes w_2t = (100, 120, 115), so w_2· = 335 and
X̄₂ = (20000 + 24000 + 19000)/335 = 188.06
X̄ = (w_1· X̄₁ + w_2· X̄₂)/w·· = (160 × 212.5 + 335 × 188.06)/495 = 195.96
3. Estimate s² and a:
ŝ² = (1/J) Σ_j (1/(T − 1)) Σ_t w_jt (X_jt − X̄_j)² = 25160.58
â = [w··/(w··² − Σ_j w_j·²)] × [Σ_j w_j· (X̄_j − X̄)² − (J − 1) ŝ²]
  = [495/(495² − 160² − 335²)] × [160(212.5 − 195.96)² + 335(188.06 − 195.96)² − 25160.58] = 182.48
K̂ = ŝ²/â = 25160.58/182.48 = 137.88
4. Calculate the credibility factors:
z₁ = w_1·/(w_1· + K̂) = 160/(160 + 137.88) = 0.537
z₂ = w_2·/(w_2· + K̂) = 335/(335 + 137.88) = 0.708
5. The credibility premium per unit of volume is ẑ_j X̄_j + (1 − ẑ_j) m̂; multiplying by next year's volume w_j4 gives the premium for year 4:
P^cred_{1,4} = w_14 [z₁ X̄₁ + (1 − z₁) m̂]
P^cred_{2,4} = w_24 [z₂ X̄₂ + (1 − z₂) m̂] = 18158.10
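The Buhlmann-Straub estimation above can be replayed in code. Note the worked example rounds intermediate values, so exact arithmetic gives ŝ² ≈ 25164 rather than 25160.58; the credibility factors agree to three decimals:

```python
S = [[8000, 11000, 15000], [20000, 24000, 19000]]  # aggregate claims S_jt
w = [[40, 50, 70], [100, 120, 115]]                # volumes w_jt
J, T = 2, 3
X = [[s / v for s, v in zip(Sr, wr)] for Sr, wr in zip(S, w)]  # X_jt = S_jt / w_jt
wj = [sum(r) for r in w]                            # 160 and 335
wtot = sum(wj)                                      # 495
xbar = [sum(v * x for v, x in zip(wr, Xr)) / wsum
        for wr, Xr, wsum in zip(w, X, wj)]          # 212.5 and ~188.06
grand = sum(ws * xb for ws, xb in zip(wj, xbar)) / wtot   # ~195.96
s2 = sum(sum(v * (x - xb) ** 2 for v, x in zip(wr, Xr)) / (T - 1)
         for wr, Xr, xb in zip(w, X, xbar)) / J     # within-risk estimate
a = wtot / (wtot ** 2 - sum(ws ** 2 for ws in wj)) * (
    sum(ws * (xb - grand) ** 2 for ws, xb in zip(wj, xbar)) - (J - 1) * s2)
K = s2 / a
z = [ws / (ws + K) for ws in wj]                    # ~0.537 and ~0.708
```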
Run-Off Triangles
A run-off triangle arranges the incremental claims X_ij by accident year i (rows) and development year j (columns), where:
Accident year indicates the year that the claim is incurred in
Development year indicates the year the claim is reported or paid; it could start at 0 instead of 1
Each entry in the triangle X_ij is the amount of the total claims incurred in accident year i paid in development year j. E.g. if the first entry for 1995 is 90, this means that of the claims incurred in year 1995, 90 was reported in 1995, i.e. the first year
Each diagonal represents the total amount of claims paid in a particular calendar year
The aim is to complete the table by predicting the numbers in the bottom right triangle
Chain-Ladder Technique
One model for the incremental claims is:
E[X_ij] = α_i β_j, with X_ij ~ Poisson(α_i β_j) independently
Where:
α_i is a parameter varying by accident year i
β_j is the development factor for development year j, independent of accident year i; it represents the expected proportion of claims paid in development year j, with the normalisation Σ_{j=1}^t β_j = 1
Under this normalisation, α_i is the expected total claims arising in accident year i, which is the same across row i
The likelihood function, which is the product of all observed probabilities in the upper-left triangle, is:
L(α, β) = ∏_{i=1}^t ∏_{j=1}^{t+1−i} e^{−α_i β_j} (α_i β_j)^{x_ij} / x_ij!
Taking the log and then setting the derivatives to zero gives the MLEs:
α̂_i = R_i / Σ_{j=1}^{t+1−i} β̂_j and β̂_j = K_j / Σ_{i=1}^{t+1−j} α̂_i, with Σ_{j=1}^t β̂_j = 1
Where:
R_i = Σ_{j=1}^{t+1−i} x_ij are the sums of row i
K_j = Σ_{i=1}^{t+1−j} x_ij are the sums of column j
The method for completing the run-off triangle using the chain-ladder technique is:
1. Transform the run-off triangle to incremental form first if it is given in cumulative form
2. Calculate the first α̂ and the last β̂ using:
α̂₁ = R₁ and β̂_t = K_t / α̂₁
3. Work your way down the rows (starting with the first accident year) and up the columns (starting with the last development year), alternating between:
α̂_{t−j+1} = R_{t−j+1} / (β̂₁ + … + β̂_j) and β̂_j = K_j / (α̂₁ + … + α̂_{t−j+1})
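A small sketch of the Poisson chain-ladder fit. The fixed-point iteration on the likelihood equations and the 3-year incremental triangle are illustrative assumptions; the fitted α_i reproduce the classical chain-ladder ultimates:

```python
def poisson_chain_ladder(tri):
    """MLEs for X_ij ~ Poisson(alpha_i * beta_j) fitted on the observed
    upper-left triangle; tri[i] holds the t-i incremental entries of
    accident year i. Returns (alpha, beta) normalised so sum(beta) == 1,
    making alpha[i] the fitted ultimate claims for accident year i."""
    t = len(tri)
    alpha = [float(sum(tri[i])) for i in range(t)]  # initial guess: row sums
    beta = [1.0 / t] * t
    for _ in range(500):  # fixed-point iteration on the likelihood equations
        beta = [sum(tri[i][j] for i in range(t - j)) /
                sum(alpha[i] for i in range(t - j)) for j in range(t)]
        alpha = [sum(tri[i]) / sum(beta[:t - i]) for i in range(t)]
    scale = sum(beta)  # normalise so the beta_j sum to one
    return [a * scale for a in alpha], [b / scale for b in beta]

# hypothetical incremental triangle with 3 accident years
tri = [[100, 50, 25],
       [110, 60],
       [120]]
alpha, beta = poisson_chain_ladder(tri)
```

For this triangle the ultimates are 175, 198.33 and 213.33, matching the usual cumulative development-factor calculation.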
An alternative solution that applies ONLY to incremental run-off triangles is demonstrated below:
[Diagram omitted: the missing cell X_34 is computed from the adjacent blocks of the triangle, labelled A, B and C.]
Another parameterisation models the calendar year k = i + j − 1:
E[X_ij] = π_j λ_k, with X_ij ~ Poi(π_j λ_k)
Where:
π_j is the percentage of claims reported in development year j, which is the same across columns, with Σ_{j=1}^t π_j = 1
λ_k is the expected total claims paid in calendar year k
To estimate π_j and λ_k, construct the likelihood function, which is the product of all observed probabilities in the upper-left triangle:
L(π, λ) = ∏_{i,j: i+j−1 ≤ t} e^{−π_j λ_{i+j−1}} (π_j λ_{i+j−1})^{x_ij} / x_ij!
Setting the derivatives of the log-likelihood to zero gives:
π̂_j = K_j / Σ_{k=j}^t λ̂_k and λ̂_k = D_k / Σ_{j=1}^k π̂_j
Where:
K_j are the sums of column j
D_k are the sums of diagonal k, i.e. calendar year k
Start with λ̂_t = D_t (using Σ_j π̂_j = 1) and π̂_t = K_t / λ̂_t, then move to earlier calendar years and earlier development years at each recursion:
λ̂_k = D_k / (π̂₁ + … + π̂_k) and π̂_j = K_j / (λ̂_j + … + λ̂_t)
Since there is no data to estimate λ_k for k > t, i.e. future calendar years, we can extrapolate from the fitted λ̂_k for k ≤ t using either linear regression or log-linear extrapolation, i.e. linear regression on the logs.
Another alternative is a normal model:
E[X_ij] = α_i β_j, with X_ij ~ N(α_i β_j, σ²)
Since the response distribution is normal, the estimates α̂_i and β̂_j can be found using least squares, by minimising the sum of squares, as this is equivalent to MLE:
min Σ_{i,j} (x_ij − α_i β_j)²
Bornhuetter-Ferguson Method
In the BF method, the future claims are determined by loss ratios and development factors. The BF method is demonstrated below using the following example:
The ultimate loss ratios for underwriting years 2004, 2005 and 2006 are expected to be in line with 2003, and the total claims paid are $1,942,000. Calculate the BF estimate of the IBNR reserve required and state the assumptions underlying this estimate.
Assumptions:
Accident year 2003, i.e. the first accident year, is fully run-off
Each accident year will develop in the same way
The data is already adjusted for inflation, or past inflation pattern will repeat in future
The estimated loss ratio is appropriate
Method:
1. Transform the run-off table into cumulative form if it isn't already in cumulative form
2. Derive the development factors d_j from the column sums of the cumulative run-off triangle:
d_j = Σ_{k=1}^{t−j+1} C_{k,j} / Σ_{k=1}^{t−j+1} C_{k,j−1}
where C_{k,j} is the cumulative claim amount for accident year k at development year j, and the sums run over the accident years for which development year j has been observed.
3. Calculate the expected ultimate loss ratio:
Loss Ratio = Incurred Claims / Earned Premiums
In this example, the loss ratio for all accident years is assumed to follow 2003, which is:
LR = 715/860 = 0.8314
4. Find the initial estimate of the ultimate loss (IUL) for each accident year i using:
IUL_i = Loss Ratio × Earned Premium_i
In this example, the initial ultimate loss for 2003 is 0.8314 × 860 = 715, and similarly for 2004-2006.
5. Find the expected claims (EC) reported to date for each accident year by dividing the IUL by the appropriate cumulative development factor, which is a product of the development factors from step 2, depending on which years are left to develop currently.
In this example, 2003 is already fully developed, so EC is 715, and for the other years:
2004: EC = 781.516/d₃ = 754.213
2005: EC = 814.772/(d₂d₃) = 698.955
2006: EC = 848.028/(d₁d₂d₃) = 586.384
6. Finally, calculate the ultimate loss (UL) for each accident year and the IBNR reserve required:
UL_i = claims already reported + claims yet to be reported, where the claims yet to be reported are IUL_i − EC_i
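Steps 4-6 for a single accident year can be sketched as follows. The loss ratio and the 2004 development figures come from the example; the premium of 940 is inferred from IUL/LR, and the reported amount of 700 is purely hypothetical since the triangle itself is not reproduced here:

```python
def bf_ultimate(reported, premium, loss_ratio, cum_dev):
    """Bornhuetter-Ferguson ultimate loss for one accident year.
    cum_dev is the cumulative development factor still to be applied."""
    iul = loss_ratio * premium            # step 4: initial ultimate loss
    unreported = iul * (1 - 1 / cum_dev)  # expected claims yet to emerge
    return reported + unreported          # step 6: BF ultimate loss

# 2004 figures from the example; reported=700 is a hypothetical placeholder
ul_2004 = bf_ultimate(reported=700.0, premium=940.0,
                      loss_ratio=0.8314, cum_dev=781.516 / 754.213)
```

Here the emerging claims are 781.516 − 754.213 = 27.303, so the BF ultimate is the reported amount plus 27.303.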
Laplace Criterion
The Laplace criterion selects the outcome with the highest expected value under the assumption that
all states of growth are equally likely to occur. In this example, either bonds or savings plans are
equally good.
E[stocks] = (10000 + 6500 − 4000)/3 = 4167
E[bonds] = (8000 + 6000 + 1000)/3 = 5000
E[savings] = 5000
Maximin Criterion & Minimax Criterion
The maximin or Wald criterion applies to gains and selects the alternative that maximizes the
minimum returns/payoffs. It is based on the assumption that the decision maker is pessimistic about
the future. In this example, we focus solely on the slow growth state and select the investment that
maximizes payoff in this state, i.e. savings.
If the data is w.r.t losses instead of gains like the example, then the corresponding criterion is the
minimax criterion. First calculate the maximum possible losses of each alternative, and then choose
the alternative with the minimum.
Maximax Criterion
The maximax criterion selects the alternative that has maximizes the maximum returns/payoffs. It is
based on the assumption that the decision maker is optimistic about the future. In this example, we
focus solely on the accelerated growth state and select the investment that maximizes payoff in this
state, i.e. stocks.
Hurwicz Criterion
The Hurwicz criterion takes a weighted average of the best and the worst payoff of each alternative, using a coefficient of optimism. In this example, for a coefficient of 0.6, we take a weighted average of the payoffs under the accelerated growth state (optimistic) and the slow growth state (pessimistic). The best investment in this case is bonds.
Minimax Regret Criterion
The minimax regret criterion first calculates the opportunity loss (regret) of each alternative under each state, i.e. the difference between the best payoff in that state and the payoff actually obtained. For example, if the state is accelerated growth then stocks are best, so the opportunity loss from investing in bonds is 10,000 − 8,000 = 2,000. The maximum regrets for the three alternatives are $9,000, $4,000 and $5,000. Therefore, to minimise the maximum regret, the bond investment is the best.
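All five criteria can be computed from the payoff matrix in a few lines:

```python
payoffs = {  # states: accelerated, normal, slow growth
    "stocks":  [10000, 6500, -4000],
    "bonds":   [8000,  6000,  1000],
    "savings": [5000,  5000,  5000],
}
laplace = {k: sum(v) / len(v) for k, v in payoffs.items()}     # equal-weight EV
maximin = max(payoffs, key=lambda k: min(payoffs[k]))          # pessimistic
maximax = max(payoffs, key=lambda k: max(payoffs[k]))          # optimistic
hurwicz = max(payoffs,
              key=lambda k: 0.6 * max(payoffs[k]) + 0.4 * min(payoffs[k]))
best = [max(col) for col in zip(*payoffs.values())]            # best payoff per state
max_regret = {k: max(b - x for b, x in zip(best, v)) for k, v in payoffs.items()}
minimax_regret = min(max_regret, key=max_regret.get)
```

This reproduces the conclusions above: savings under maximin, stocks under maximax, bonds under Hurwicz (0.6) and bonds under minimax regret.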
Game Theory
Game theory is concerned with decisions in the face of conflict, which exists when the interest of
two or more parties are in competition. In a two-person, zero-sum game:
The payoffs in a two-person zero-sum game can be represented using a payoff matrix. A zero-sum game is a game where the sum of the payoffs among the players is zero. For a two-person zero-sum game, the gain from a strategy of Player A is the loss for Player B. For example, if Player A selects alternative 1 and Player B selects alternative y, then the gain to Player A, and hence the loss to Player B, is 40.
In general, Player A will use the maximin principle for gains, i.e. find the minimum gains of each
alternative and choose the highest one. Player B will use the minimax principle for losses, i.e. find
the maximum loss of each alternative and choose the lowest one.
Firstly, we can reduce the payoff matrix by removing dominated strategies. For the union, strategies 1 and 4 always have higher gains than 2 and 3, while for the management, strategies y and z always have lower losses than w and x. This reduces the payoff matrix to:
Using the minimax approach, if the union selects strategy 4, then the management will select strategy y. But given the management selects strategy y, the union will change to strategy 1 since it has a higher payoff of 65. Thus, the game ends up in an infinite loop with no saddle point reached, where one of the players will always be dissatisfied and want to change.
Expected Gains/Losses Approach
When there is no saddle point, one can select strategies in a random fashion but with a certain probability structure, i.e. select each strategy a certain percentage of the time such that the player's gains or losses are equal regardless of the opponent's selection of strategies. Selecting a strategy a given percentage of the time is equivalent to selecting it with a given probability on a one-time basis. We can solve for this probability by equating the expected gains of the two possible strategies.
If management selects y, the possible payoffs to the union are 65 and 50. Then if the union selects strategy 1 with probability p, the union's expected gain is:
65p + 50(1 − p)
If management selects z, the possible payoffs to the union are 45 and 55. Then if the union selects strategy 1 with probability p, the union's expected gain is:
45p + 55(1 − p)
p is determined such that the union is indifferent between strategies 1 and 4, regardless of what the management selects, i.e. the expected gains from both possibilities must be the same:
65p + 50(1 − p) = 45p + 55(1 − p) ⟹ 25p = 5 ⟹ p = 0.2
This means that the union will select strategy 1 20% of the time, and strategy 4 80% of the time.
Under this strategy, regardless of what decision the management makes, the expected gains for the
union is the same.
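The indifference calculation above can be sketched as:

```python
# Expected union gain if it plays strategy 1 with probability p:
#   against y: 65p + 50(1-p);  against z: 45p + 55(1-p)
# Indifference: 65p + 50(1-p) = 45p + 55(1-p)  =>  25p = 5
p = (55 - 50) / ((65 - 50) + (55 - 45))   # = 0.2
value = 65 * p + 50 * (1 - p)             # expected gain against either strategy
```

The value of the game to the union is 53 whichever strategy management plays.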
Assume that XYZ models total loss from a single disaster using a Pareto distribution with mean
$1.5 million and standard deviation $2.5 million. The density of a Pareto(α, λ) distribution is:
f_X(x) = α λ^α / (λ + x)^{α+1}, x > 0
Given the mean and s.d., we can solve for the parameters:
E[X] = λ/(α − 1) = 1,500,000
Var(X) = α λ² / [(α − 1)²(α − 2)] = 2,500,000²
Dividing, Var(X)/E[X]² = α/(α − 2) = 25/9, which gives:
α = 3.125 and λ = 1,500,000 × (α − 1) = 3,187,500
Then without reinsurance, i.e. D₀, the expected loss per claim is $1.5 million. Reinsurance will cover any excess over $2 million, so that the expected amount paid by the reinsurer is:
E[(X − 2,000,000)₊] = ∫_{2,000,000}^∞ (x − 2,000,000) f(x) dx = ∫_{2,000,000}^∞ [1 − F(x)] dx
= ∫_{2,000,000}^∞ [λ/(λ + x)]^α dx = [λ/(λ + 2,000,000)]^α × (λ + 2,000,000)/(α − 1) = 532,889
Then the expected amount payable by XYZ with reinsurance is:
E[min(X, d)] = E[X] − E[(X − d)₊] = 1,500,000 − 532,889 = 967,111
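The closed-form stop-loss expectation can be checked numerically:

```python
alpha, lam, d = 3.125, 3_187_500.0, 2_000_000.0
mean = lam / (alpha - 1)                                  # 1,500,000
# Stop-loss premium E[(X-d)+] = integral_d^inf (1 - F(x)) dx,
# with Pareto survival function 1 - F(x) = (lam / (lam + x)) ** alpha
reinsurer = (lam / (lam + d)) ** alpha * (lam + d) / (alpha - 1)
retained = mean - reinsurer                               # E[min(X, d)]
```

This reproduces the reinsurer's expected payment of about $532,889 and the insurer's retained expectation of about $967,111.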
There are 4 possible states of nature, denoted θ₀, θ₁, θ₂ and θ₃, representing 0, 1, 2 and 3 or more disasters. Then if XYZ decides to go with:
D₀: no reinsurance, hence the reinsurance premium is zero. The net total claims are $1.5 million per disaster.
D₁: reinsurance covering one disaster. The net claim for the first disaster is $967,111, and for any more disasters it is $967,111 plus $1.5 million for each additional disaster.
D₂: reinsurance premium is $1 million. The net total claim for 1 disaster is $967,111, for 2 disasters it is 2 × $967,111, and for 3 or more disasters it is 2 × $967,111 plus $1.5 million for each further disaster.
This then becomes a decision under uncertainty problem, and the decision matrix which is in terms
of losses is:
The minimax solution to this decision problem is the alternative that minimizes the maximum loss,
and maximum loss is incurred when there are 3 disasters. Therefore, the minimum of the maximum
loss is D2 , i.e. the reinsurance plan that covers 2 disasters.
Now suppose that XYZ believes that the number of disasters each year follows a Poisson distribution with mean 0.9. Then the probability of each state of nature is:
P(θ₀) = e^{−0.9} = 0.407
P(θ₁) = 0.9 e^{−0.9} = 0.366
P(θ₂) = 0.9² e^{−0.9}/2 = 0.165
P(θ₃) = 1 − 0.407 − 0.366 − 0.165 = 0.062
Then this becomes a decision under risk problem now that we have the probability structure. Thus,
we should choose the decision that minimises the expected loss, which is no reinsurance D0