Zhi Ying Feng

ACTL3003 Insurance Risk Models

Module 3: Fitting Loss Models


Parametric Model Estimation
The first step in fitting a parametric model is to analyse the data by summarising it (mean, median,
standard deviation, skewness, kurtosis, quantiles, min, max, etc.), then graphing it using
histograms, density plots, Q-Q plots, etc. From this information, we can consider possible
parametric distributions that could be fitted to the loss data, e.g. Pareto, log-normal, Gamma.
Incomplete Data
Often we have to fit a model to incomplete loss data, as the exact data might not be available. This
could be due to observations being grouped, so only the range of values is available, or in the
presence of censoring and/or truncation:
- Left truncation (excess/deductible): an observation is left truncated at M if it is NOT recorded
when it is below M, and recorded at its exact value when it is above M, i.e. there is no
information at all about losses below M
- Right censoring (policy limit): an observation is right censored at d if, when it is above d, it is
recorded as being equal to d, but recorded at its exact value when it is below d, i.e. you have
partial information about these losses: they are at least as high as d
Maximum Likelihood Parameter Estimation
The likelihood function for a statistical model gives an indication of how likely it is to observe the
set of n observations, if the model is correct.
    L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta)
    l(\theta; x) = \log L(\theta; x) = \sum_{i=1}^{n} \ln f(x_i; \theta)

The maximum likelihood estimator \hat{\theta} = (\hat{\theta}_1, ..., \hat{\theta}_m) is the set of parameter values that maximises the
log-likelihood function. Equivalently, set the score vector, i.e. the vector of partial derivatives of the
log-likelihood, to zero:

    S(\theta; x) = \nabla l(\theta; x) = \left( \frac{\partial l(\theta; x)}{\partial \theta_1}, ..., \frac{\partial l(\theta; x)}{\partial \theta_m} \right)^T = (0, ..., 0)^T
The Hessian matrix is the m x m matrix of second derivatives of the log-likelihood:

    H(\theta; x) = \frac{\partial^2 l(\theta; x)}{\partial \theta \, \partial \theta^T}
                 = \begin{pmatrix}
                     \partial^2 l / \partial \theta_1^2 & \cdots & \partial^2 l / \partial \theta_1 \partial \theta_m \\
                     \vdots & \ddots & \vdots \\
                     \partial^2 l / \partial \theta_m \partial \theta_1 & \cdots & \partial^2 l / \partial \theta_m^2
                   \end{pmatrix}

The Fisher information is given by:

    I(\theta) = -E\left[ H(\theta; x) \right]

A consistent estimator of the covariance matrix of the MLE is given by:

    \widehat{Var}(\hat{\theta}) = I(\hat{\theta})^{-1} \approx \left[ -H(\hat{\theta}; x) \right]^{-1}

The square roots of the diagonal elements of this matrix give the standard errors of the MLE.


Constructing the Likelihood Function
Consider a set of incomplete loss data with a policy limit and a deductible of $M, recorded in
the form (x_i, \delta_i) for i = 1, 2, ..., n, where:
- \delta_i = 1 if the loss exceeds the policy limit, i.e. the observation is right censored
- x_i is the loss in excess of the deductible M, i.e. left truncated, so that:

    x_i = \text{loss} - M  \iff  \text{loss} = x_i + M

Then the likelihood function for the loss data is given by:

    L(\theta) = \prod_{i=1}^{n} \left[ \frac{1 - F(x_i + M)}{1 - F(M)} \right]^{\delta_i} \left[ \frac{f(x_i + M)}{1 - F(M)} \right]^{1 - \delta_i}
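As an illustration only, the sketch below maximises this truncated/censored likelihood numerically, assuming a simple exponential loss model and hypothetical data and variable names (these are not from the original notes):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import expon

# Hypothetical data: x = losses in excess of the deductible M,
# delta = 1 if the payment was capped at the policy limit (right censored)
M = 500.0
x = np.array([120.0, 430.0, 1500.0, 260.0, 1500.0])
delta = np.array([0, 0, 1, 0, 1])

def neg_loglik(log_scale):
    scale = np.exp(log_scale[0])          # keep the scale parameter positive
    surv_M = expon.sf(M, scale=scale)     # 1 - F(M): left-truncation adjustment
    # censored observations contribute the survival function, exact ones the density
    ll = np.where(delta == 1,
                  expon.logsf(x + M, scale=scale),
                  expon.logpdf(x + M, scale=scale))
    return -np.sum(ll - np.log(surv_M))

res = minimize(neg_loglik, x0=[np.log(1000.0)])
print("MLE of the exponential scale:", np.exp(res.x[0]))
```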

Graphical Model Evaluation


Once we have chosen and estimated a parametric model, we need to judge its quality. The first step is to
make some graphical comparisons:
- Histogram vs. fitted parametric density function
- Empirical CDF vs. fitted parametric CDF
- Probability-probability (P-P) plot: theoretical vs. empirical cumulative probabilities
- Quantile-quantile (Q-Q) plot: theoretical vs. empirical quantiles
P-P Plot
A P-P plot plots the theoretical CDF evaluated at the ranked observations against the empirical
CDF. To construct a P-P plot:
- Rank the observed data into order statistics, i.e. from smallest to largest: x_{(1)} \le ... \le x_{(n)}
- Calculate the theoretical CDF at each of the observed data points: F(x_{(i)}; \hat{\theta}) \in (0, 1)
- For i = 1, 2, ..., n, plot (i - 0.5)/n on the x-axis against F(x_{(i)}; \hat{\theta}) on the y-axis
If the empirical cumulative probabilities match the theoretical values, then the plot should lie
approximately on the line y = x.
- Assumption: there is no censoring or truncation in the data
- The P-P plot scales everything onto a 0-1 scale and distributes the points roughly uniformly over that
range, allowing one to focus on the fit where most of the probability mass lies

Q-Q Plot
A Q-Q plot plots the theoretical quantiles against the empirical quantiles of the ranked observations.
To construct a Q-Q plot:
- Rank the observed data into order statistics, i.e. from smallest to largest: x_{(1)} \le ... \le x_{(n)}
- For i = 1, 2, ..., n, calculate the theoretical quantiles F^{-1}\left( \frac{i - 0.5}{n}; \hat{\theta} \right)
- For i = 1, 2, ..., n, plot x_{(i)} on the x-axis against F^{-1}\left( \frac{i - 0.5}{n}; \hat{\theta} \right) on the y-axis
- Assumption: there is no censoring or truncation in the data
- Q-Q plots tend to focus on the tails, especially for distributions with heavy tails, since most
of the points tend to be clustered around the middle of the distribution
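A minimal sketch of how both plots can be constructed, assuming a fitted lognormal model and hypothetical loss data (the data, model choice and variable names are illustrative assumptions, not from the notes):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical losses and a fitted lognormal model (parameters estimated by MLE)
x = np.sort(np.array([210.0, 350.0, 480.0, 900.0, 1500.0, 2600.0, 4100.0, 9800.0]))
n = len(x)
shape, loc, scale = stats.lognorm.fit(x, floc=0)

emp = (np.arange(1, n + 1) - 0.5) / n          # empirical plotting positions (i - 0.5)/n

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# P-P plot: empirical probabilities vs fitted CDF at the order statistics
ax1.scatter(emp, stats.lognorm.cdf(x, shape, loc, scale))
ax1.plot([0, 1], [0, 1])                       # reference line y = x
ax1.set_title("P-P plot")

# Q-Q plot: order statistics vs fitted quantiles at (i - 0.5)/n
ax2.scatter(x, stats.lognorm.ppf(emp, shape, loc, scale))
ax2.plot([x.min(), x.max()], [x.min(), x.max()])
ax2.set_title("Q-Q plot")
plt.show()
```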

Model Hypothesis Tests


To test the null and alternative hypotheses:
- H_0: the data came from a population with the specified model
- H_1: the data did not come from such a population


Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test statistic measures the distance between the empirical and theoretical CDFs:

    KS = \max_x \left| F_n(x) - F(x; \hat{\theta}) \right|

Where:
- F_n(x) is the empirical distribution function
- F(x; \hat{\theta}) is the assumed (continuous) theoretical distribution
- \hat{\theta} is the MLE for \theta under the null hypothesis
This test does not work for grouped data.

Anderson-Darling Test
For n observations, the Anderson-Darling test statistic is:

    AD = n \int \frac{\left[ F_n(x) - F(x; \hat{\theta}) \right]^2}{F(x; \hat{\theta}) \left[ 1 - F(x; \hat{\theta}) \right]} \, f(x; \hat{\theta}) \, dx

Similar to the KS test, the theoretical distribution must be continuous and the test does not work for
grouped data. The critical values for the AD test depend on the specific distribution being tested.

Both the KS and AD tests look at the difference between the empirical and assumed distributions,
KS in absolute value and AD in squared differences. However, the AD test is a weighted average
with more emphasis on goodness-of-fit in the tails than in the middle. For both tests, generally the
LOWER the test statistic, the better the model. However, neither the KS nor the AD test accounts for
the number of parameters in the model, so more complex models will often fare better.
Chi-Squared Goodness-of-fit Test
In the Chi-squared test, first break the whole range of observed values into k subintervals:

    0 = c_0 < c_1 < ... < c_k

The expected number of observations in the interval (c_{j-1}, c_j], assuming that the model in the null
hypothesis is true, is given by:

    E_j = n \hat{p}_j,  where  \hat{p}_j = F(c_j; \hat{\theta}) - F(c_{j-1}; \hat{\theta})

The actual number of observations in the interval (c_{j-1}, c_j] is given by:

    O_j = n p_j^{emp},  where  p_j^{emp} = F_n(c_j) - F_n(c_{j-1})

The Chi-squared test statistic, which adjusts the degrees of freedom for the number of parameters, is:

    \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j} \sim \chi^2\left( k - 1 - \text{no. of parameters estimated} \right)



Schwarz-Bayesian Criterion
All 3 previous tests are sensitive to sample size, i.e. the test statistic tends to increase as the sample size
increases. This means that a larger sample size gives an increased probability of rejecting the null
hypothesis.
An alternative that does take into account both the number of parameters and the sample size is the
SBC score, which is defined by:

    SBC = l(x; \hat{\theta}) - \frac{k}{2} \log n

Where:
- l(x; \hat{\theta}) is the maximised log-likelihood
- k is the number of parameters estimated
- n is the sample size
Generally, a larger SBC score is preferred.
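A minimal sketch of comparing two candidate models by their SBC scores, assuming hypothetical loss data and MLE fits from scipy (not part of the original notes):

```python
import numpy as np
from scipy import stats

# Hypothetical loss data
x = np.array([210.0, 350.0, 480.0, 900.0, 1500.0, 2600.0, 4100.0, 9800.0])
n = len(x)

def sbc(loglik, k):
    # SBC = maximised log-likelihood minus (k/2) * log(n)
    return loglik - 0.5 * k * np.log(n)

# Fit two candidate models by MLE and compare their SBC scores
shape, loc, scale = stats.lognorm.fit(x, floc=0)
ll_lognorm = np.sum(stats.lognorm.logpdf(x, shape, loc, scale))

a, loc_g, scale_g = stats.gamma.fit(x, floc=0)
ll_gamma = np.sum(stats.gamma.logpdf(x, a, loc_g, scale_g))

print("SBC lognormal:", sbc(ll_lognorm, k=2))   # 2 free parameters (loc fixed at 0)
print("SBC gamma    :", sbc(ll_gamma, k=2))     # the larger SBC is preferred
```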


Module 4: Generalised Linear Models


GLMs are an extension of linear regression models in that they still relate a response variable y to a
set of covariates or independent variables with known information. However, unlike linear
regression, GLMs do not require the responses/errors to be normally distributed.
In reality, the error is often non-normal for many reasons, e.g. heterogeneity, skewness or kurtosis.
This can be because the data itself is not normally distributed, which can be identified by graphing
the mean-variance relationship. In linear regression, the variance is constant with the mean, but for:
- Count data: the response is an integer and the variance may increase linearly with the mean; use a
Poisson distribution
- Count data as proportions: counts of numbers of failures or successes, where the variance is an
inverted parabola; use a binomial distribution
- Time to death data: the variance increases faster than linearly with the mean; use a Gamma or
exponential distribution
- Binary response variables: when the response is binary, there is inherent heterogeneity
GLMs have many other useful actuarial applications because:
- Models are often multiplicative, hence linear on a log scale. Other GLMs allow linearity on
other scales, e.g. logit and reciprocal
- Claim frequencies are generally modelled using a Poisson distribution
- Claim severities are generally modelled using gamma or lognormal distributions

Exponential Family of Distributions


The responses in GLMs instead have a distribution that belongs to the exponential family of
distributions. A probability distribution for a random variable Y belongs to the exponential family if
its density has the form:

    f_Y(y) = \exp\left( \frac{y\theta - b(\theta)}{\phi} + c(y; \phi) \right)

Where:
- \theta is the location, or natural, parameter and is a function of the mean
- \phi is the scale (dispersion) parameter
- b(\theta) and c(y; \phi) are known functions that specify the distribution
If Y_1, Y_2, ..., Y_n are i.i.d. with exponential family distributions, then the sample mean \bar{Y} also has an
exponential dispersion distribution with the same location parameter \theta, but with scale parameter \phi / n.

Properties of Exponential Family Distributions

The moment generating function can be expressed as:

    m_Y(t) = E[e^{Yt}] = \exp\left( \frac{b(\theta + t\phi) - b(\theta)}{\phi} \right)

The cumulant generating function is then:

    \kappa_Y(t) = \ln m_Y(t) = \frac{b(\theta + t\phi) - b(\theta)}{\phi}

Thus, the mean and variance are given by:

    E[Y] = \frac{\partial}{\partial t} \kappa_Y(t) \Big|_{t=0} = b'(\theta)
    Var(Y) = \frac{\partial^2}{\partial t^2} \kappa_Y(t) \Big|_{t=0} = \phi \, b''(\theta)

- The mean \mu is a function of \theta only and vice versa, so we can write \theta as \theta(\mu)
- The variance can involve both \theta and \phi, depending on the function b. Thus it can be related to
the mean for certain distributions. To emphasise this, the variance can be written as

    Var(Y) = \phi \, V(\mu)

where V(\mu) = b''(\theta(\mu)) is called the variance function.

Common Exponential Family Distributions


    Distribution                     Location parameter \theta                     Scale parameter \phi    Variance function V(\mu)
    Normal N(\mu, \sigma^2)          \theta = \mu                                  \phi = \sigma^2         V(\mu) = 1
    Gamma(\alpha, \beta)             \theta = -\beta/\alpha = -1/\mu               \phi = 1/\alpha         V(\mu) = \mu^2
    Inverse Gaussian(\mu, \sigma^2)  \theta = -1/(2\mu^2)                          \phi = \sigma^2         V(\mu) = \mu^3
    Poisson(\mu)                     \theta = \ln \mu                              \phi = 1                V(\mu) = \mu
    Binomial(n, p)                   \theta = \ln\frac{p}{1-p} = \ln\frac{\mu}{n-\mu}   \phi = 1           V(\mu) = \mu(1 - \mu/n)
    Negative Binomial(r, p)          \theta = \ln(1 - p)                           \phi = 1                V(\mu) = \mu(1 + \mu/r)
Example: Gamma Distribution


Show that the Gamma(\alpha, \beta) distribution belongs to the exponential family.

    f(y) = \frac{\beta^\alpha y^{\alpha - 1} e^{-\beta y}}{\Gamma(\alpha)}
         = \exp\left( \alpha \ln \beta + (\alpha - 1) \ln y - \beta y - \ln \Gamma(\alpha) \right)
         = \exp\left( \frac{ y \left( -\beta/\alpha \right) - \left( -\ln(\beta/\alpha) \right) }{ 1/\alpha } + (\alpha - 1) \ln y + \alpha \ln \alpha - \ln \Gamma(\alpha) \right)
         = \exp\left( \frac{y\theta - b(\theta)}{\phi} + c(y; \phi) \right)

Where:

    \theta = -\frac{\beta}{\alpha} = -\frac{1}{\mu},  \quad  \phi = \frac{1}{\alpha},  \quad  b(\theta) = -\ln(-\theta),
    c(y; \phi) = \left( \frac{1}{\phi} - 1 \right) \ln y + \frac{1}{\phi} \ln \frac{1}{\phi} - \ln \Gamma\!\left( \frac{1}{\phi} \right)

The mean and variance are then:

    E[Y] = b'(\theta) = -\frac{1}{\theta} = \frac{\alpha}{\beta} = \mu
    Var(Y) = \phi \, b''(\theta) = \frac{1}{\alpha} \cdot \frac{1}{\theta^2} = \frac{\mu^2}{\alpha} = \frac{\alpha}{\beta^2}

Note that the mean is related to \theta through \theta = -1/\mu.
Therefore the variance function is given by:

    V(\mu) = b''(\theta(\mu)) = \frac{1}{\theta^2} = \mu^2

Components of a GLM
A GLM has three main components:
- A linear predictor that is a function of the explanatory variables (systematic/location component)
- An exponential family distribution for the responses/errors (stochastic/spread component)
- A link function that connects the mean response to the linear predictor
Linear Predictor
The linear predictor is a function of the covariates that is linear in the parameters:

    \eta_i = \sum_j x_{ij} \beta_j = \beta_0 + \beta_1 x_{i1} + ... + \beta_p x_{ip}

Where:
- x_{ij} are the covariates
- \beta_j are the parameters that need to be estimated
Link Function
The link function links the mean response, i.e. E[Y_i] = \mu_i, to the linear predictor \eta_i = x_i^T \beta:

    \eta_i = g(\mu_i)

If we have estimated the parameters, e.g. using MLE, and we have the values of the covariates,
then we can calculate the value of the linear predictor. To obtain the expected response value
\mu_i, we substitute the linear predictor into the inverse of the link function g:

    \eta_i = g(\mu_i)  \iff  \mu_i = g^{-1}(\eta_i)

If the response has an exponential family distribution, then given b and \phi, all that is needed to
define this exponential family distribution is \theta. This can be computed from the linear predictor
using the link function g(.) and the fact that \theta is a function of \mu:

    \theta_i = \theta(\mu_i) = \theta\!\left( g^{-1}(\eta_i) \right) = \theta\!\left( g^{-1}(x_i^T \beta) \right)



Canonical Link Function
If g(.) = \theta(.), then we have that:

    \theta_i = \theta\!\left( g^{-1}(\eta_i) \right) = \theta\!\left( \theta^{-1}(\eta_i) \right) = \eta_i

This is known as the canonical link function, or natural link function. To find the canonical link
function, we can use the \mu = b'(\theta) step to derive \theta as a function of \mu and equate this to g(\mu).
Some common canonical link functions are:

    Distribution    Canonical link                      Name
    Normal          g(\mu) = \mu                        Identity (additive model)
    Poisson         g(\mu) = \ln \mu                    Log link (multiplicative model)
    Binomial        g(\mu) = \ln\frac{\mu}{n - \mu}     Logit
    Gamma           g(\mu) = -\frac{1}{\mu}             Reciprocal

Maximum Likelihood Estimation


To estimate the parameters in the linear predictor component of a GLM, we usually use maximum
likelihood estimation. Since the response has a distribution from the exponential family, the
likelihood and log-likelihood functions are given by:

    L(\beta) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} \exp\left( \frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i; \phi) \right)

    l(\beta) = \sum_{i=1}^{n} \left[ \frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i; \phi) \right]

Since we want to estimate the \beta's, we need to get \eta_i = x_i^T \beta involved by using the link function,
writing \theta_i = \theta\!\left( g^{-1}(x_i^T \beta) \right):

    l(\beta) = \sum_{i=1}^{n} \left[ \frac{ y_i \, \theta\!\left( g^{-1}(x_i^T \beta) \right) - b\!\left( \theta\!\left( g^{-1}(x_i^T \beta) \right) \right) }{\phi} + c(y_i; \phi) \right]

Example: Gamma Distribution


Consider a GLM with one covariate, the gamma distribution and the canonical (reciprocal) link function.
For the gamma distribution, b(\theta) = -\ln(-\theta) and \theta = -1/\mu, so the canonical link gives:

    g(\mu_i) = \theta_i = -\frac{1}{\mu_i} = \eta_i = \beta_0 + \beta_1 x_i

The canonical link function allows us to write the log-likelihood function in terms of the
parameters we want to estimate:

    l = \sum_{i=1}^{n} \left[ \frac{ y_i \theta_i + \ln(-\theta_i) }{\phi} + c(y_i; \phi) \right]
      = \sum_{i=1}^{n} \left[ \frac{ y_i (\beta_0 + \beta_1 x_i) + \ln\!\left( -(\beta_0 + \beta_1 x_i) \right) }{\phi} + c(y_i; \phi) \right]

(with the constraint \beta_0 + \beta_1 x_i < 0 so that each \mu_i > 0). Then we can differentiate l with respect
to \beta_0 and \beta_1, and set the derivatives to zero to find the MLEs.
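As a numerical illustration (not the notes' own derivation), the sketch below maximises a gamma GLM log-likelihood directly with scipy; a log link is used here purely for numerical convenience rather than the canonical reciprocal link, and the data and variable names are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Hypothetical claim-severity data with one covariate (e.g. vehicle age)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([820.0, 640.0, 700.0, 510.0, 470.0, 390.0])

def neg_loglik(params):
    b0, b1, log_alpha = params
    alpha = np.exp(log_alpha)            # gamma shape, so phi = 1/alpha
    mu = np.exp(b0 + b1 * x)             # log link used here for numerical convenience
    beta = alpha / mu                    # gamma rate, so that E[Y] = alpha/beta = mu
    ll = alpha * np.log(beta) + (alpha - 1) * np.log(y) - beta * y - gammaln(alpha)
    return -np.sum(ll)

res = minimize(neg_loglik, x0=[np.log(np.mean(y)), 0.0, 0.0], method="Nelder-Mead")
b0, b1, log_alpha = res.x
print("intercept:", b0, "slope:", b1, "dispersion phi:", np.exp(-log_alpha))
```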

Goodness of Fit
The saturated model is a model with as many parameters as observations, so that the
fitted values are equal to the observed values, i.e. all variation is explained by the covariates:

    \hat{\mu}_i = y_i   for all i

The log-likelihood of the saturated model is given by:

    l(y; y) = \sum_{i=1}^{n} \left[ \frac{ y_i \tilde{\theta}_i - b(\tilde{\theta}_i) }{\phi} + c(y_i; \phi) \right]

Where:
- \tilde{\theta}_i is the canonical parameter corresponding to \mu_i = y_i

Scaled Deviance
One way of assessing the goodness of fit of a given GLM is to use the likelihood ratio criterion.
This compares the log-likelihood of the model estimated using MLE with that of the
saturated model, which fits the data perfectly:

    -2 \ln \frac{ L(y; \hat{\mu}) }{ L(y; y) } = 2\left[ l(y; y) - l(y; \hat{\mu}) \right]
      = \frac{2}{\phi} \sum_{i=1}^{n} \left[ y_i \left( \tilde{\theta}_i - \hat{\theta}_i \right) - b(\tilde{\theta}_i) + b(\hat{\theta}_i) \right]

This is also known as the scaled deviance, written as:

    \frac{ D(y, \hat{\mu}) }{\phi} \overset{approx.}{\sim} \chi^2(n - p)

Where:
- \hat{\theta}_i is the canonical parameter corresponding to the fitted mean \hat{\mu}_i of the chosen model
- D(y, \hat{\mu}) is known as the deviance, which plays a role similar to the RSS in ordinary linear models
- The scaled deviance is approximately (asymptotically) chi-squared distributed with degrees of freedom
equal to the number of observations minus the number of estimated parameters
- In general, the smaller the deviance, the better the model fit

The difference in scaled deviance can be used to compare nested models, i.e. models that are
subsets of each other with the same distribution and link function, but more parameters in the linear
predictor. Consider two estimated models with scaled deviances as below:

                          Model 1    Model 2
    Scaled deviance       D_1        D_2
    No. of parameters     q          p > q

To decide whether Model 2 is a significant improvement (decrease in scaled deviance) over the
more parsimonious Model 1, we compare the difference in scaled deviance to a chi-squared distribution:

    D_1 - D_2 \sim \chi^2(p - q)

As a rule of thumb, the 5% critical value for a chi-squared distribution with v d.o.f. is
approximately 2v. Thus, Model 2 is preferred over Model 1 if D_1 - D_2 > 2(p - q).
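A small sketch of this nested-model comparison, using hypothetical deviances and parameter counts and the exact chi-squared critical value instead of the 2v rule of thumb:

```python
from scipy.stats import chi2

# Hypothetical scaled deviances and parameter counts for two nested GLMs
D1, q = 132.4, 3      # smaller model
D2, p = 118.9, 5      # larger model (Model 1 nested within Model 2)

diff = D1 - D2                      # approximately chi-squared with p - q d.o.f.
crit = chi2.ppf(0.95, df=p - q)     # exact 5% critical value (rule of thumb: 2*(p - q))
print("deviance reduction:", diff, "critical value:", crit, "prefer Model 2:", diff > crit)
```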



Residuals in GLM
Residuals are the primary tool for assessing goodness-of-fit after a model has been chosen, and they
should show no pattern if the fitted model is correct. They also help detect misspecification of
the form of the variance function and allow estimation of \phi to check for overdispersion.
The deviance residuals are defined as:

    r_i^D = \text{sign}(y_i - \hat{\mu}_i) \sqrt{d_i}

Where:
- The sign(.) function returns 1 for positive arguments and -1 for negative arguments
- d_i is the contribution of the i-th observation to the deviance
- If the model is correct and the sample size n is large, then the scaled deviance is
approximately \chi^2(n - p), so the expected value of the deviance is n - p, and each observation
should contribute approximately (n - p)/n \approx 1 to the deviance. Thus, if d_i is significantly
greater than 1, this indicates a departure from the model assumptions for that observation
- Typically, deviance residuals are plotted against fitted values to examine whether the
variance function is correct, i.e. the residuals should have zero mean and roughly constant variance

The Pearson residuals are defined as:

    r_i^P = \frac{ y_i - \hat{\mu}_i }{ \sqrt{ V(\hat{\mu}_i) } }

If the model is correct, the Pearson residuals are approximately normal, so check with a Q-Q plot.
However, Pearson residuals are often skewed for non-normal data, while deviance residuals are
more likely to be symmetrically distributed. Thus, deviance residuals are preferred.

Overdispersion
Overdispersion is the presence of greater variance in a data set than would be expected based
on a fitted model. While it is usually possible to choose the model parameters such that the model's
theoretical mean is approximately equal to the sample mean of the data, the observed variance may
often be higher than the theoretical variance. Overdispersion tends to occur when one or more of the
relevant covariates have not been measured.
Overdispersion is often encountered with very simple parametric models, such as the Poisson. When
using a Poisson regression model for count data, we assume that the mean is equal to the variance,
but this may not reflect our data. The Poisson distribution has no free dispersion parameter
(\phi = 1), so it does not allow the variance to be adjusted independently of the mean.
Estimating \phi
In the absence of overdispersion, the Poisson dispersion parameter \phi should be close to 1. One way
to estimate \phi is to use the Pearson \chi^2 statistic, which is the sum of the squared Pearson residuals:

    \chi^2 = \sum_{i=1}^{n} \frac{ (y_i - \hat{\mu}_i)^2 }{ Var(Y_i) } = \frac{1}{\phi} \sum_{i=1}^{n} \frac{ (y_i - \hat{\mu}_i)^2 }{ V(\hat{\mu}_i) } \overset{approx.}{\sim} \chi^2(n - p)

Since the Pearson \chi^2 statistic is approximately \chi^2 distributed, its expected value is approximately
n - p. Taking the expectation of both sides gives an approximately unbiased estimate of \phi as:

    \hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{ (y_i - \hat{\mu}_i)^2 }{ V(\hat{\mu}_i) }

This method is preferred for the gamma case, as well as for non-aggregate data. Otherwise, an
alternative estimate uses the fact that the scaled deviance is also approximately \chi^2 distributed
with n - p degrees of freedom, so its expected value is also n - p. Then the estimate of \phi is:

    \hat{\phi} = \frac{ D(y, \hat{\mu}) }{ n - p }

If the estimated value of \phi is significantly greater than 1, this indicates that overdispersion is
present. Then we can either:
- Use an alternative model with additional free parameters, e.g. negative binomial
- Use maximum quasi-likelihood estimation, which is also applicable to dispersion with other models
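Returning to the Pearson-based estimate of \phi above, a minimal numeric sketch with hypothetical Poisson-GLM output (observed counts and fitted means are invented for illustration):

```python
import numpy as np

# Hypothetical Poisson-GLM output: observed counts and fitted means
y = np.array([0, 2, 1, 5, 3, 8, 4, 10])
mu_hat = np.array([0.8, 1.5, 1.9, 3.2, 3.8, 5.1, 5.9, 7.3])
p = 2                                   # number of estimated parameters

# For Poisson, V(mu) = mu, so the Pearson chi-square is sum((y - mu)^2 / mu)
pearson_chi2 = np.sum((y - mu_hat) ** 2 / mu_hat)
phi_hat = pearson_chi2 / (len(y) - p)   # dispersion estimate; well above 1 suggests overdispersion
print("estimated dispersion:", phi_hat)
```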
Quasi-Likelihood Function
When overdispersion is present, we can use the quasi-likelihood function to specify a more
appropriate variance function. Estimation using quasi-likelihood ONLY requires the mean-variance
relationship to be specified, which can be estimated from the data fairly easily. Thus, quasi-likelihood
is useful when we don't know for sure what the underlying distribution of the data is.
For a given mean \mu_i and variance function V(\mu_i), the quasi-log-likelihood Q is defined as:

    Q = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{ y_i - z }{ \phi V(z) } \, dz,
    \qquad
    \frac{\partial Q}{\partial \beta_j} = \sum_{i=1}^{n} \frac{ y_i - \mu_i }{ \phi V(\mu_i) } \frac{ \partial \mu_i }{ \partial \beta_j }

To find the maximum quasi-likelihood (MQL) estimates of the parameters \beta_j, we only need to
specify the mean \mu_i, the variance function V(\mu_i) and the link function g.
Example: Poisson Variance Function
Suppose we think the data is Poisson, but we aren't sure, so we only specify the variance function as
V(\mu) = \mu, but not the data distribution nor \phi. Then the quasi-log-likelihood is:

    Q = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{ y_i - z }{ \phi z } \, dz = \frac{1}{\phi} \sum_{i=1}^{n} \left[ y_i \ln \mu_i - \mu_i \right] + \text{constant}

Compare this to the log-likelihood if we assumed the data was Poisson:

    l = \sum_{i=1}^{n} \left[ y_i \ln \mu_i - \mu_i \right] - \sum_{i=1}^{n} \ln(y_i!)

Thus, if \phi = 1, the MQL estimates will be the same as the MLEs from a Poisson regression.
However, the standard errors will be multiplied by a factor of \sqrt{\hat{\phi}}, reflecting the greater degree of
uncertainty due to the presence of overdispersion.

Module 5: Copulas
Copula
Consider a random vector (X_1, X_2, ..., X_n) with marginal CDFs F_i(x). Applying the probability
integral transform to each component gives a random vector with uniformly distributed marginals:

    (U_1, U_2, ..., U_n) = \left( F_1(X_1), F_2(X_2), ..., F_n(X_n) \right),  \quad  U_i \sim U(0, 1)

The copula of (X_1, X_2, ..., X_n) is defined as the joint cumulative distribution function of (U_1, U_2, ..., U_n):

    C(u_1, u_2, ..., u_n) = P(U_1 \le u_1, U_2 \le u_2, ..., U_n \le u_n)
Sklar's Theorem
Sklar's Theorem states that for any random variables X_1, X_2, ..., X_n with joint CDF

    F(x_1, x_2, ..., x_n) = P(X_1 \le x_1, X_2 \le x_2, ..., X_n \le x_n)

and marginal CDFs F_j(x), j = 1, 2, ..., n, there exists a copula C such that:

    F(x_1, x_2, ..., x_n) = C\!\left( F_1(x_1), F_2(x_2), ..., F_n(x_n) \right)

Or equivalently:

    P(X_1 \le x_1, ..., X_n \le x_n) = C\!\left( P(X_1 \le x_1), ..., P(X_n \le x_n) \right)

The implications of Sklar's Theorem are that:
- We can describe the joint distribution of (X_1, X_2, ..., X_n) by using the marginal distributions
F_j(x) and the copula C
- When modelling, we can model the marginal distributions separately and then plug them into the
copula function to model the joint distribution

Conversely, given a joint distribution function, the corresponding copula that describes the
dependence structure of this joint CDF can be found by substituting the inverses of the marginal
distributions into the joint CDF:

    C(u_1, ..., u_n) = F\!\left( F_1^{-1}(u_1), ..., F_n^{-1}(u_n) \right)

For the bivariate case, the marginal distributions can be found using:

    F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y)   and   F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x, y)

Copula & Dependence


A copula usually has parameters that capture the dependence structure between random variables, while
separating out the effects of their marginal distributions. If X_1, X_2, ..., X_n are independent, then their copula is:

    F(x_1, ..., x_n) = F_1(x_1) \cdots F_n(x_n) = C\!\left( F_1(x_1), ..., F_n(x_n) \right)
    \implies  C(u_1, u_2, ..., u_n) = u_1 u_2 \cdots u_n

If X_1, X_2, ..., X_n are completely dependent, i.e. X_1 = ... = X_n = X, then the copula is:

    F(x_1, ..., x_n) = P\!\left( X \le \min(x_1, ..., x_n) \right) = \min\!\left( P(X \le x_1), ..., P(X \le x_n) \right) = C\!\left( F(x_1), ..., F(x_n) \right)
    \implies  C(u_1, u_2, ..., u_n) = \min(u_1, u_2, ..., u_n)


Copula Density
The p.d.f. of the copula distribution, c, is given by:

    c(u_1, u_2, ..., u_n) = \frac{ \partial^n C(u_1, u_2, ..., u_n) }{ \partial u_1 \partial u_2 \cdots \partial u_n }

For continuous marginal densities, the joint probability density of (X_1, X_2, ..., X_n) is:

    f(x_1, x_2, ..., x_n) = \frac{ \partial^n F(x_1, ..., x_n) }{ \partial x_1 \cdots \partial x_n }
      = \frac{ \partial^n C\!\left( F_1(x_1), ..., F_n(x_n) \right) }{ \partial u_1 \cdots \partial u_n } \frac{ \partial F_1(x_1) }{ \partial x_1 } \cdots \frac{ \partial F_n(x_n) }{ \partial x_n }
      = c\!\left( F_1(x_1), ..., F_n(x_n) \right) f_1(x_1) f_2(x_2) \cdots f_n(x_n)

so that

    c\!\left( F_1(x_1), ..., F_n(x_n) \right) = \frac{ f(x_1, x_2, ..., x_n) }{ f_1(x_1) f_2(x_2) \cdots f_n(x_n) }

In this form, the copula p.d.f. can be interpreted as:
- The ratio of the joint p.d.f. to what it would have been under independence
- The adjustment factor needed to convert the independence p.d.f. into the joint p.d.f.

Properties of Copulas
For n = 2, the copula C is a function mapping [0,1]^2 to [0,1] that is non-decreasing and right
continuous. It has the following properties:
- \lim_{u_k \to 0} C(u_1, u_2) = 0 for k = 1, 2. For higher dimensions: \lim_{u_k \to 0} C(u_1, ..., u_n) = 0
- \lim_{u_1 \to 1} C(u_1, u_2) = u_2 and \lim_{u_2 \to 1} C(u_1, u_2) = u_1. For higher dimensions: C(1, ..., 1, u_k, 1, ..., 1) = u_k
- C satisfies the rectangle inequality, for u_1 \le v_1 and u_2 \le v_2:

    C(v_1, v_2) - C(u_1, v_2) - C(v_1, u_2) + C(u_1, u_2) \ge 0
Invariance Property
Suppose a random vector X = (X_1, X_2, ..., X_n) has copula C, and suppose T_1, ..., T_n are non-decreasing,
continuous functions. Then consider the random vector:

    \left( T_1(X_1), ..., T_n(X_n) \right)

Its joint CDF is:

    P\!\left( T_1(X_1) \le x_1, ..., T_n(X_n) \le x_n \right) = P\!\left( X_1 \le T_1^{-1}(x_1), ..., X_n \le T_n^{-1}(x_n) \right)
      = C\!\left( P\!\left( X_1 \le T_1^{-1}(x_1) \right), ..., P\!\left( X_n \le T_n^{-1}(x_n) \right) \right)
      = C\!\left( P\!\left( T_1(X_1) \le x_1 \right), ..., P\!\left( T_n(X_n) \le x_n \right) \right)

Therefore, the transformed random vector has the same copula C. This implies that:
- Copulas are invariant under non-decreasing transformations of the random variables.
However, the marginal distributions will change under the transformation.
An implication of this property is that even if two random vectors X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_n)
have the same copula, i.e. C_X = C_Y, it does not mean that X and Y have the same distribution, since Y can
be a non-decreasing transformation of X. However, if X and Y do have the same distribution, then C_X = C_Y.


The Frechet Bounds
Define the Frechet bounds as:

    Frechet lower bound:  L_F(u_1, ..., u_n) = \max\!\left( \sum_{k=1}^{n} u_k - (n - 1), \, 0 \right)
    Frechet upper bound:  U_F(u_1, ..., u_n) = \min(u_1, ..., u_n)

Then any copula function satisfies the following bounds:

    L_F(u_1, ..., u_n) \le C(u_1, ..., u_n) \le U_F(u_1, ..., u_n)

The Frechet upper bound satisfies the definition of a copula, but the lower bound does not for n \ge 3.

Archimedean Copulas
An Archimedean copula C has the form:

    C(u_1, u_2, ..., u_n) = \psi^{-1}\!\left( \psi(u_1) + \psi(u_2) + ... + \psi(u_n) \right)

Where:
- The function \psi is called the generator
- \psi is decreasing and convex, with \psi(1) = 0
- Each Archimedean copula has a single parameter \alpha that controls the degree of dependence

Clayton Copula
The Clayton copula is defined by:

    C(u_1, u_2, ..., u_n) = \left( \sum_{k=1}^{n} u_k^{-\alpha} - n + 1 \right)^{-1/\alpha}

This is an Archimedean copula with:

    \psi(t) = t^{-\alpha} - 1,  \alpha > 0,  \qquad  \psi^{-1}(s) = (1 + s)^{-1/\alpha}

In the case of n = 2:

    C(u, v) = \left( u^{-\alpha} + v^{-\alpha} - 1 \right)^{-1/\alpha}

Frank Copula
The Frank copula is defined by:

    C(u_1, u_2, ..., u_n) = \frac{1}{\ln \alpha} \ln\!\left( 1 + \frac{ \prod_{k=1}^{n} (\alpha^{u_k} - 1) }{ (\alpha - 1)^{n-1} } \right)

This is an Archimedean copula with:

    \psi(t) = -\ln\!\left( \frac{\alpha^t - 1}{\alpha - 1} \right),  \alpha \ne 1,  \qquad  \psi^{-1}(s) = \frac{1}{\ln \alpha} \ln\!\left( 1 + e^{-s} (\alpha - 1) \right)

Simulation of Copulas
In the bivariate case, to simulate a pair of random variables with known distributions F and G,
whose dependence structure is described by a copula C, use the conditional distribution method:
1. Simulate two independent uniform random variables u and t
2. Using the given copula, transform t into a v in (0,1) so that it has the right dependence
structure with respect to u:
- First we need the conditional distribution function of V given U = u, denoted c_u(v):

    c_u(v) = P(V \le v \mid U = u) = \lim_{\Delta u \to 0} \frac{ C(u + \Delta u, v) - C(u, v) }{ \Delta u } = \frac{\partial}{\partial u} C(u, v)

- Find the inverse of c_u(v), then set:

    c_u(v) = t  \iff  v = c_u^{-1}(t)

3. Map (u, v) into (x, y) using the inverse distribution functions:

    x = F^{-1}(u),   y = G^{-1}(v)

This gives two simulated outcomes for the random variables X and Y, with distributions F and
G respectively, such that their dependence is represented by the copula C.
Example
Let X be exponential with mean 1 and Y be standard normal. The copula describing their
dependence is given by:

    C(u, v) = \frac{uv}{u + v - uv}

Simulate a pair of outcomes for (X, Y) given the following uniform(0,1) pseudo-random
numbers: (0.3726791, 0.6189313).
1. Treat the generated U(0,1) random variables as:

    (u, t) = (0.3726791, 0.6189313)

2. Set up the conditional distribution function and set it equal to t:

    c_u(v) = \frac{ \partial C(u, v) }{ \partial u } = \left( \frac{v}{u + v - uv} \right)^2 = t

Thus, its inverse is:

    v = c_u^{-1}(t) = \frac{ u \sqrt{t} }{ 1 - \sqrt{t}(1 - u) }

3. Use the inverse to find v:

    v = \frac{ u \sqrt{t} }{ 1 - \sqrt{t}(1 - u) } = 0.5788953

4. Map (u, v) into (x, y) using:

    x = F^{-1}(u) = -\ln(1 - u) = 0.4662971
    y = \Phi^{-1}(v) = \Phi^{-1}(0.5788953) \approx 0.199
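A minimal sketch reproducing this conditional-distribution simulation in code (the function name and random seed are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def simulate_pair(u, t):
    # Conditional distribution method for C(u, v) = uv / (u + v - uv):
    # c_u(v) = dC/du = (v / (u + v - uv))^2, so setting c_u(v) = t and inverting gives
    v = u * np.sqrt(t) / (1.0 - np.sqrt(t) * (1.0 - u))
    x = -np.log(1.0 - u)      # X ~ Exponential(mean 1) via the inverse CDF
    y = norm.ppf(v)           # Y ~ standard normal via the inverse CDF
    return x, y

print(simulate_pair(0.3726791, 0.6189313))   # reproduces the worked example's (u, t)
print(simulate_pair(*rng.uniform(size=2)))   # a fresh simulated pair
```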

Module 1: Individual & Collective Risk Model


Individual Risk Model
A portfolio of insurance contracts will potentially experience a sequence of losses, and the aim is to
model the distribution of the aggregate sum S of these losses over a certain period of time. If the
number of losses is deterministic, then we have an individual risk model:
    S = X_1 + X_2 + ... + X_n = \sum_{i=1}^{n} X_i

There are two ways of obtaining the exact distribution of S:
- Convolution formula
- Moment generating functions
Convolution of Random Variables
Convolution is the operation of determining the distribution of the sum of two random variables:

    F_{X+Y}(s) = \sum_x F_Y(s - x) P(X = x)   or   \int F_Y(s - x) f_X(x) \, dx
    f_{X+Y}(s) = \sum_x f_Y(s - x) P(X = x)   or   \int f_Y(s - x) f_X(x) \, dx

MGF & PGF


If all X_i are independent, then the mgf and pgf of S can be written as:

    m_S(t) = E[e^{tS}] = E\!\left[ e^{t(X_1 + ... + X_n)} \right] = E[e^{tX_1}] \cdots E[e^{tX_n}] = m_{X_1}(t) \cdots m_{X_n}(t)
    p_S(t) = E[t^S] = E\!\left[ t^{X_1 + ... + X_n} \right] = p_{X_1}(t) \cdots p_{X_n}(t)

There is a one-to-one relationship between a distribution and its mgf/pgf, so we can use the
mgf/pgf of the individual losses to find the mgf/pgf of the sum S; if the resulting mgf/pgf can be
recognised, then we know the distribution of S. Both the mgf and pgf have useful identities involving
Taylor series expansions about 0 (for integer-valued X):

    p_X(t) = E[t^X] = \sum_{n \ge 0} t^n P(X = n) = \sum_{n \ge 0} \frac{ p_X^{(n)}(0) }{ n! } t^n
    m_X(t) = E[e^{tX}] = \sum_{n \ge 0} e^{tn} P(X = n) = \sum_{n \ge 0} \frac{ m_X^{(n)}(0) }{ n! } t^n

Modelling Individual Losses Given a Claim Occurs


Often we will model the distribution of the individual loss only when a claim occurs. This allows us
to model claim frequency and claim severity separately:

    X = I B

Where:
- I is an indicator variable for a claim occurring:

    I = 1 with probability q,   I = 0 with probability 1 - q

- B is the claim amount random variable, given that a claim occurs, i.e. given I = 1


In this case, the distribution of the individual loss is:

    F_X(x) = P(X \le x) = P(X \le x \mid I = 0) P(I = 0) + P(X \le x \mid I = 1) P(I = 1)
           = (1 - q) + q \, P(B \le x),   x \ge 0

The mgf of the individual loss is:

    m_X(t) = E\!\left[ e^{tX} \mid I = 0 \right] P(I = 0) + E\!\left[ e^{tX} \mid I = 1 \right] P(I = 1)
           = (1 - q) + q \, E\!\left[ e^{tB} \right]
           = (1 - q) + q \, m_B(t)

The mean is given by:

    E[X] = E\!\left[ E[X \mid I] \right] = E[X \mid I = 0] P(I = 0) + E[X \mid I = 1] P(I = 1) = q \, E[B]

The variance is given by:

    Var(X) = Var\!\left( E[X \mid I] \right) + E\!\left[ Var(X \mid I) \right]
           = Var\!\left( I \, E[B] \right) + E\!\left[ I \, Var(B) \right]
           = E[B]^2 \, Var(I) + q \, Var(B)
           = q(1 - q) E[B]^2 + q \, Var(B)
Deductible & Policy Limit


Insurers introduce deductibles and policy limits as a way of controlling the cost and variability of
individual claim losses:
- Deductible d: the insurer only starts paying the claim amount above d
- Policy limit L: the insurer pays claims up to the limit L; this does not include the deductible!
For a policy with both a deductible and a policy limit, if we denote the damage random variable by D,
then the claim random variable, if a claim occurs, is:

    B = \min\!\left( \max(D - d, 0), L \right)
Example
Consider a car insurance policy where the policyholder has a 0.1 probability of having an accident:
- If an accident occurs, the damage to the vehicle is uniformly distributed between 0 and 2500
- The policy has a deductible of 500 and a policy limit of 1500
Find the density of X, the claim amount the insurer is liable for.
The random variable X is given by:

    B \sim U(0, 2500),   X = \min\!\left( \max(B - 500, 0), 1500 \right)

In this case, we can't use the X = IB structure directly, since a claim amount of 0 could be the result of
no claim occurring, or of damage below the deductible.


This can be represented by a diagram of the payment as a function of the damage (figure omitted).

The density of X is given by:

    f_X(x) = \begin{cases}
      0.9 + 0.1 \times \frac{500}{2500} = 0.92 & x = 0 \quad (\text{no accident, or } 0 \le B \le 500) \\
      0.1 \times \frac{1}{2500} = 0.00004 & 0 < x < 1500 \quad (500 < B < 2000) \\
      0.1 \times \frac{500}{2500} = 0.02 & x = 1500 \quad (2000 \le B \le 2500)
    \end{cases}
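As a quick sanity check (not part of the notes), a Monte Carlo sketch of this mixed distribution with hypothetical simulation settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Simulate the claim amount X = I * min(max(B - 500, 0), 1500), with claim probability 0.1
accident = rng.uniform(size=n) < 0.1
damage = rng.uniform(0, 2500, size=n)
x = np.where(accident, np.clip(damage - 500, 0, 1500), 0.0)

print("P(X = 0)    ~", np.mean(x == 0))        # expect about 0.92
print("P(X = 1500) ~", np.mean(x == 1500))     # expect about 0.02
print("density on (0,1500) ~", np.mean((x > 0) & (x < 1500)) / 1500)   # expect about 0.00004
```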

Collective Risk Model


The collective risk model is another way to separate claim frequency from claim severity, by
assuming that the total number of claims in the whole portfolio is also a random variable. Thus, the
aggregate loss is a random sum:

    S = X_1 + X_2 + ... + X_N = \sum_{i=1}^{N} X_i

Where:
- N is the number of claims, commonly Poisson(\lambda), Binomial(n, p) or Negative Binomial(r, p)
- X_i is the amount of the i-th claim
- The X_i's are i.i.d. with CDF P(x), p.d.f. p(x) and moments p_k = E[X^k]
- The X_i's and N are mutually independent

Under this model, the mean of the aggregate loss is:

    E[S] = E\!\left[ E[S \mid N] \right] = E\!\left[ E\!\left[ \sum_{i=1}^{N} X_i \,\Big|\, N \right] \right] = E\!\left[ N \, E[X_i] \right] = p_1 E[N]

The variance of the aggregate loss is:

    Var(S) = E\!\left[ Var(S \mid N) \right] + Var\!\left( E[S \mid N] \right)
           = E\!\left[ N \, Var(X_i) \right] + Var\!\left( N \, E[X_i] \right)
           = E[N] \left( p_2 - p_1^2 \right) + p_1^2 \, Var(N)


The m.g.f. of S can be expressed as a function of the m.g.f. of X and the m.g.f. of N:

    m_S(t) = E[e^{tS}] = E\!\left[ E\!\left[ e^{t(X_1 + ... + X_N)} \mid N \right] \right] = E\!\left[ m_X(t)^N \right] = E\!\left[ e^{N \ln m_X(t)} \right] = m_N\!\left( \ln m_X(t) \right)

Distribution of S
The general expression for the distribution of S can be found by conditioning on the number of claims:

    F_S(x) = \sum_{n \ge 0} P(S \le x \mid N = n) P(N = n) = \sum_{n \ge 0} P^{*n}(x) P(N = n)

Where:
- P^{*n}(x) is the n-th convolution of the CDF of X, i.e. the CDF of X_1 + ... + X_n
- N is the number of claims, so it is always discrete; thus this formula works for any type of X.
However, the type of S will depend on the type of X.

If X is continuous
If X is a continuous r.v., then S will generally have a mixed distribution with:
- A probability mass at 0, since:

    F_S(0) = \sum_{n \ge 0} P^{*n}(0) P(N = n) = P(N = 0)

- A continuous distribution everywhere else, with a density integrating to 1 - P(N = 0)

If X is mixed
If X is mixed, then S will also generally have a mixed distribution with:
- A probability mass at 0 because of P(N = 0) and P(X = 0)
- A mixed (if X is not continuous for x > 0), or otherwise continuous, distribution elsewhere
- A density integrating to something less than 1

If X is discrete
If X is discrete, then we can derive a similar expression for the p.m.f. of S:

    f_S(x) = \sum_{n \ge 0} P(S = x \mid N = n) P(N = n) = \sum_{n \ge 0} p^{*n}(x) P(N = n)

Where:
- p^{*0}(0) = 1
- This is efficient if the number of possible outcomes for N is small, otherwise it is too tedious
- p^{*n}(x) can be calculated using convolution in a table or using de Pril's algorithm
- In a spreadsheet layout, f_S(x) is the sumproduct of the row of p^{*n}(x) values and the row of P(N = n) values



Inhomogeneous Portfolio
In an inhomogeneous portfolio, we want to model N, the number of claims, separately for different classes
of risk. We can let:
- N | \Lambda = \lambda be Poisson(\lambda)
- \Lambda have p.d.f. u(\lambda), \lambda > 0
- Here the distribution of \Lambda represents the different classes of risk, so for a certain class of risk we
have a certain value of \lambda and N will have a different distribution

The distribution of N is given by:

    P(N = n) = \int_0^\infty P(N = n \mid \Lambda = \lambda) \, u(\lambda) \, d\lambda = \int_0^\infty \frac{ e^{-\lambda} \lambda^n }{ n! } u(\lambda) \, d\lambda

The expected value and variance of N are:

    E[N] = E\!\left[ E[N \mid \Lambda] \right] = E[\Lambda]
    Var(N) = E\!\left[ Var(N \mid \Lambda) \right] + Var\!\left( E[N \mid \Lambda] \right) = E[\Lambda] + Var(\Lambda)

The m.g.f. of N is:

    m_N(t) = E[e^{tN}] = E\!\left[ E[e^{tN} \mid \Lambda] \right] = E\!\left[ e^{\Lambda (e^t - 1)} \right] = m_\Lambda(e^t - 1)

Example: \Lambda ~ Gamma(\alpha, \beta)
Find the distribution of N given that N | \Lambda = \lambda ~ Poi(\lambda) and \Lambda ~ Gamma(\alpha, \beta).
Using the m.g.f. approach:

    m_N(t) = m_\Lambda(e^t - 1) = \left( \frac{\beta}{\beta - (e^t - 1)} \right)^{\alpha} = \left( 1 - \frac{1}{\beta}(e^t - 1) \right)^{-\alpha}

This is the m.g.f. of a negative binomial distribution, so the distribution of N is:

    N \sim \text{Negative Binomial}\!\left( r = \alpha, \; p = \frac{\beta}{1 + \beta} \right)

Then the distribution of S will be a compound negative binomial.

Compound Poisson Distribution

A common distribution for N is Poisson(\lambda), for which S has a compound Poisson distribution:

    S = \sum_{i=1}^{N} X_i \sim \text{Compound Poisson}(\lambda, P(x))

Where:
- E[N] = Var(N) = \lambda and m_N(t) = \exp\!\left( \lambda (e^t - 1) \right)
- P(x) is the CDF of the i.i.d. X_i, the amount of each claim
Using the formulae derived earlier, we find that:

    E[S] = \lambda p_1,  \quad  Var(S) = \lambda p_2,  \quad  m_S(t) = \exp\!\left( \lambda \left( m_X(t) - 1 \right) \right)
Cumulant Generating Function

The cumulant generating function is defined by:

    \kappa_X(t) = \ln m_X(t)

The k-th cumulant of a random variable X with c.g.f. \kappa_X(t) is defined as:

    \kappa_k = \frac{d^k}{dt^k} \kappa_X(t) \Big|_{t=0} = \kappa_X^{(k)}(0)

Some properties of cumulants are:
- \kappa_1 = E[X], and \kappa_k = E\!\left[ (X - E[X])^k \right] for k = 2, 3

For a compound Poisson random variable S, the c.g.f. is \kappa_S(t) = \lambda \left( m_X(t) - 1 \right), so:

    \kappa_k = \frac{d^k}{dt^k} \lambda \left( m_X(t) - 1 \right) \Big|_{t=0} = \lambda \frac{d^k}{dt^k} m_X(t) \Big|_{t=0} = \lambda p_k

Then we can use the cumulants to derive the expected value, variance, skewness and (excess) kurtosis of S:

    E[S] = \kappa_1 = \lambda p_1
    Var(S) = \kappa_2 = \lambda p_2
    Skew(S) = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{ \lambda p_3 }{ (\lambda p_2)^{3/2} } = \frac{ p_3 }{ \sqrt{\lambda} \, p_2^{3/2} }
    Kurt(S) = \frac{\kappa_4}{\kappa_2^2} = \frac{ \lambda p_4 }{ (\lambda p_2)^2 } = \frac{ p_4 }{ \lambda p_2^2 }
Theorem 1
Let S_1, S_2, ..., S_n be independent compound Poisson random variables with parameters \lambda_i and P_i(x).
Then the sum is also a compound Poisson random variable:

    S = S_1 + S_2 + ... + S_n \sim \text{Compound Poisson}(\lambda, P(x))

Where:

    \lambda = \sum_{i=1}^{n} \lambda_i   and   P(x) = \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} P_i(x),   or equivalently   p(x) = \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} p_i(x)

This means that independent portfolios of losses can be easily aggregated. Also, the total
claims paid over n years is compound Poisson, even if losses vary in severity and frequency
across the years.



Proof:
The mgf of P(x) can be expressed as a weighted sum of the mgfs of the individual claim distributions:

    m_X(t) = \int_0^\infty e^{tx} p(x) \, dx = \int_0^\infty e^{tx} \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} p_i(x) \, dx
           = \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} \int_0^\infty e^{tx} p_i(x) \, dx = \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} m_{X_i}(t)

The mgf of each S_i is given by:

    m_{S_i}(t) = \exp\!\left( \lambda_i \left( m_{X_i}(t) - 1 \right) \right)

Then the m.g.f. of the sum S = S_1 + ... + S_n is:

    m_S(t) = \prod_{i=1}^{n} m_{S_i}(t)
           = \exp\!\left( \sum_{i=1}^{n} \lambda_i \left( m_{X_i}(t) - 1 \right) \right)
           = \exp\!\left( \lambda \left( \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} m_{X_i}(t) - 1 \right) \right)
           = \exp\!\left( \lambda \left( m_X(t) - 1 \right) \right)

Thus the mgf of S is that of a compound Poisson distribution with parameters \lambda and P(x).
Theorem 2
If S \sim \text{Compound Poisson}(\lambda, p(x_i) = \pi_i, i = 1, ..., m), i.e. the claim amounts take only the values
x_1, ..., x_m, then we can write S in the form:

    S = x_1 N_1 + ... + x_m N_m

Where:
- N_1, N_2, ..., N_m are mutually independent
- N_i \sim \text{Poi}(\lambda_i) with \lambda_i = \lambda \pi_i, and N_i represents the number of claims of amount x_i
- Each x_i N_i represents the aggregate sum of claims of amount x_i, so it is also a compound
Poisson random variable with parameters \lambda_i and p(x_i) = 1, since it only includes claims of amount x_i

Proof:
We need to show that the N_i are mutually independent and Poisson distributed. First define the total
number of claims as the sum over each possible claim amount:

    N = \sum_{i=1}^{m} N_i

If we know what N is, i.e. if N = n, then the numbers of claims (N_1, N_2, ..., N_m) of each claim amount
have a multinomial distribution with parameters (n, \pi_1, \pi_2, ..., \pi_m).


Thus, we can find the joint moment generating function of (N_1, ..., N_m) by conditioning on N:

    E\!\left[ \exp\!\left( \sum_{i=1}^{m} t_i N_i \right) \right]
      = \sum_{n \ge 0} E\!\left[ \exp\!\left( \sum_{i=1}^{m} t_i N_i \right) \Big| N = n \right] P(N = n)
      = \sum_{n \ge 0} \left( \sum_{i=1}^{m} \pi_i e^{t_i} \right)^n \frac{ e^{-\lambda} \lambda^n }{ n! }
      = e^{-\lambda} \exp\!\left( \lambda \sum_{i=1}^{m} \pi_i e^{t_i} \right)
      = \exp\!\left( \sum_{i=1}^{m} \lambda_i e^{t_i} - \sum_{i=1}^{m} \lambda_i \right)
      = \prod_{i=1}^{m} \exp\!\left( \lambda_i \left( e^{t_i} - 1 \right) \right)

Since the joint mgf is a product of the individual mgfs, it follows that N_1, N_2, ..., N_m are mutually
independent. Next, by setting t_i = t and t_j = 0 for j \ne i, we obtain the mgf of a particular N_i:

    E\!\left[ \exp(t N_i) \right] = \exp\!\left( \lambda_i \left( e^t - 1 \right) \right)

This is the mgf of a Poisson random variable with parameter \lambda_i, and therefore N_i \sim \text{Poi}(\lambda_i).
Sparse Vector Algorithm
Theorem 2 allows us to use the sparse vector algorithm, an alternative to convolution that is more
efficient for small m. Suppose S has a compound Poisson distribution with \lambda = 0.8 and the
distribution of the individual claim amount given by:

    x_i             1       2       3
    P(X = x_i)      0.250   0.375   0.375

We can avoid convolution by writing S as the sum of the $1, $2 and $3 claims:

    S = N_1 + 2 N_2 + 3 N_3,   with independent  N_i \sim \text{Poi}(0.8 \, \pi_i)

Then the distributions of N_1, 2N_2 and 3N_3, i.e. the totals of the $1, $2 and $3 claims, can each be
tabulated separately and combined to give the distribution of S (a code sketch is given below).
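A minimal sketch of the sparse vector algorithm for this example (the truncation point of the support is an illustrative assumption):

```python
import numpy as np
from scipy.stats import poisson

lam, amounts, probs = 0.8, [1, 2, 3], [0.250, 0.375, 0.375]
max_s = 20                                   # truncate the support of S at 20 for illustration

f_S = np.zeros(max_s + 1)
f_S[0] = 1.0                                 # start with the distribution of an "empty" sum

for x, pi in zip(amounts, probs):
    # distribution of x * N_x, where N_x ~ Poisson(lambda * pi): mass only on multiples of x
    n_max = max_s // x
    f_xN = np.zeros(max_s + 1)
    f_xN[np.arange(n_max + 1) * x] = poisson.pmf(np.arange(n_max + 1), lam * pi)
    f_S = np.convolve(f_S, f_xN)[: max_s + 1]   # convolve the independent components

print("P(S = s) for s = 0..5:", np.round(f_S[:6], 5))
```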


Recursion Algorithms
Another method to get the distribution of S when the claim amount X_i is discrete is to use recursive
algorithms, which require the counting distribution to belong to a certain family.
The (a,b) Family
A distribution in the (a,b) family has the following property:

    P(N = n) = \left( a + \frac{b}{n} \right) P(N = n - 1),  \quad n = 1, 2, ...

This means that P(N = n) can be obtained by recursion using the initial value P(N = 0). There are
only 3 members of the (a,b) family: the Poisson, binomial and negative binomial distributions.
Panjer's Recursion Algorithm

Consider S = \sum_{i=1}^{N} X_i and let:
- N belong to the (a,b) family
- X_i be i.i.d. integer-valued non-negative random variables
Then Panjer's recursion algorithm can be used to compute the density of S:

    f_S(s) = \frac{1}{1 - a \, p(0)} \sum_{j=1}^{s} \left( a + \frac{bj}{s} \right) p(j) \, f_S(s - j),  \quad s = 1, 2, ...

With starting value:

    f_S(0) = \begin{cases} P(N = 0) & \text{if } p(0) = 0 \\ m_N(\ln p(0)) & \text{if } p(0) > 0 \end{cases}

Note that if p(x) = 0 for x > x_{max}, then the upper bound of the sum can be reduced to \min(s, x_{max}).
The second expression for f_S(0) is derived through:

    f_S(0) = \sum_{n \ge 0} P(X_1 + ... + X_N = 0 \mid N = n) P(N = n) = \sum_{n \ge 0} p(0)^n P(N = n) = E\!\left[ p(0)^N \right] = m_N(\ln p(0))

If S \sim \text{Compound Poisson}(\lambda, P(x)), then N is Poisson and part of the (a,b) family (with a = 0, b = \lambda), so we have:

    f_S(s) = \frac{\lambda}{s} \sum_{j=1}^{s} j \, p(j) \, f_S(s - j)

With a starting value that doesn't depend on whether p(0) is positive or zero:

    f_S(0) = e^{\lambda (p(0) - 1)}
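A minimal sketch of Panjer's recursion for the compound Poisson case, reusing the claim-size distribution from the sparse vector example above (function and variable names are illustrative):

```python
import numpy as np

def panjer_compound_poisson(lam, p, s_max):
    """Panjer recursion for a compound Poisson S with integer claim amounts.

    p[j] = P(X = j) for j = 0, ..., len(p)-1; returns f[s] = P(S = s) for s = 0..s_max.
    """
    f = np.zeros(s_max + 1)
    f[0] = np.exp(lam * (p[0] - 1.0))            # starting value
    for s in range(1, s_max + 1):
        j = np.arange(1, min(s, len(p) - 1) + 1) # only claim sizes with positive probability
        f[s] = (lam / s) * np.sum(j * p[j] * f[s - j])
    return f

# Example: lambda = 0.8, P(X=1) = 0.25, P(X=2) = P(X=3) = 0.375
f = panjer_compound_poisson(0.8, np.array([0.0, 0.25, 0.375, 0.375]), s_max=5)
print(np.round(f, 5))
```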



De Pril's Recursion Algorithm
De Pril's recursion algorithm can be used to find the density of the n-th convolution of a non-negative,
integer-valued random variable X with a positive probability mass at 0, i.e. p(0) > 0.
To derive the algorithm, first consider the mgf of X:

    m_X(t) = \sum_{x \ge 0} p(x) e^{tx} = p(0) + \sum_{x \ge 1} p(x) e^{tx}
           = p(0) + (1 - p(0)) \sum_{x \ge 1} \frac{ p(x) }{ 1 - p(0) } e^{tx}
           = q + p \, m_{\tilde{X}}(t)

Where we define:
- q = p(0), so that p = 1 - p(0)
- A new random variable \tilde{X} whose pmf can be obtained by comparing coefficients of e^{tx}:

    \tilde{p}(0) = 0,  \quad  \tilde{p}(x) = \frac{ p(x) }{ 1 - p(0) },  \quad x = 1, 2, ...

Then, if we consider the n-th convolution of X, i.e. X_1 + ... + X_n, we have:

    E\!\left[ e^{t(X_1 + ... + X_n)} \right] = \left( q + p \, m_{\tilde{X}}(t) \right)^n

This has the form of the mgf of a compound binomial distribution. Thus, the n-th convolution of X
has a compound binomial distribution with parameters (n, p = 1 - p(0), \tilde{P}(x)), i.e.:

    \sum_{i=1}^{n} X_i \overset{d}{=} \sum_{i=1}^{N} \tilde{X}_i

Where:
- N \sim \text{Bin}(n, p), which is part of the (a,b) family with:

    a = -\frac{p}{1 - p} = -\frac{1 - p(0)}{p(0)}   and   b = \frac{(n + 1) p}{1 - p} = \frac{(n + 1)(1 - p(0))}{p(0)}

- The pmf of \tilde{X} is \tilde{p}(x)

So this convolution is equivalent to the density of a compound binomial, and since the binomial
distribution is part of the (a,b) family, we can apply Panjer's algorithm:

    p^{*n}(x) = \frac{1}{1 - a \tilde{p}(0)} \sum_{j=1}^{x} \left( a + \frac{bj}{x} \right) \tilde{p}(j) \, p^{*n}(x - j)
              = \frac{1}{p(0)} \sum_{j=1}^{\min(x, \, x_{max})} \left( \frac{(n + 1) j}{x} - 1 \right) p(j) \, p^{*n}(x - j)

With initial value:

    p^{*n}(0) = p(0)^n

Note that despite transforming the sum into a compound binomial in \tilde{X}, the pmf appearing in the
final recursion is the original p(j).


Discretization
Both Panjer's and de Pril's algorithms require a discrete distribution for X, but in reality X is often
fitted to a continuous or mixed distribution; in this case we need to discretize the distribution.
One method of discretizing a continuous distribution is the method of rounding:
1. Segregate the continuous distribution onto a finite set of points 0, h, 2h, ..., mh
- If the range of X is not finite, then m has to be chosen in order to have a good
representation of the right tail
- The span h should be relatively small compared to the mean
2. Let f_j be the probability mass placed at jh, for j = 0, 1, ..., m:

    f_0 = F_X(h/2 - 0)
    f_j = F_X(jh + h/2 - 0) - F_X(jh - h/2 - 0)   for j = 1, 2, ..., m - 1
    f_m = 1 - F_X(mh - h/2 - 0)

- This ensures that \sum_{j=0}^{m} f_j = 1

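A minimal sketch of the method of rounding, assuming an exponential severity with mean 1000 and an illustrative choice of span h and number of points m:

```python
import numpy as np
from scipy.stats import expon

def discretize_rounding(cdf, h, m):
    # Method of rounding: probability mass f_j at jh, for j = 0, ..., m
    j = np.arange(1, m)
    f = np.empty(m + 1)
    f[0] = cdf(h / 2)
    f[1:m] = cdf(j * h + h / 2) - cdf(j * h - h / 2)
    f[m] = 1.0 - cdf(m * h - h / 2)       # remaining tail mass placed at mh
    return f

# Example: discretize an Exponential with mean 1000 on a span of h = 100 with m = 100 points
f = discretize_rounding(lambda x: expon.cdf(x, scale=1000), h=100, m=100)
print(f[:5], f.sum())                      # masses at 0, 100, ..., 400; total mass = 1
```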
Approximation Methods
Sometimes it is not possible to compute the distribution of S, either because the data available is not
detailed enough, or it is impossible to fit a tractable model to the data. Rather than taking the risk of
fitting a wrong model, a quick approximation may be used instead.
CLT Approximation
Assuming that the distribution of S is roughly symmetrical, the central limit theorem suggests the
approximation:

    F_S(s) = P(S \le s) = P\!\left( Z \le \frac{ s - E[S] }{ \sqrt{Var(S)} } \right) \approx \Phi\!\left( \frac{ s - E[S] }{ \sqrt{Var(S)} } \right)

However, this is usually a poor approximation:
- For the individual model, if n is small then the CLT is not effective, i.e. the sample size is too small
- For the collective model, if \lambda (for compound Poisson) or r (for compound negative binomial) is small,
then S is not close to a normal distribution
- The CLT approximation does not work well for highly skewed distributions

Normal Power Approximation

For positively skewed distributions such as the compound Poisson, we can use the normal power
approximation, which is the CLT approximation with a correction term for skewness. Writing

    z = \frac{ x - E[S] }{ \sqrt{Var(S)} },

the approximation is

    P(S \le x) \approx \Phi\!\left( \sqrt{ \frac{9}{\gamma_1^2} + 1 + \frac{6 z}{\gamma_1} } - \frac{3}{\gamma_1} \right)

Where:
- \gamma_1 = Skew(S) is the skewness coefficient
- If we want a quantile, use x^* = s + \frac{\gamma_1}{6}(s^2 - 1), i.e. the quantile of S is approximately
E[S] + \sqrt{Var(S)} \, x^*, where s is the standard normal quantile at the desired probability level
- If S is a discrete distribution, add a 0.5 discontinuity correction when approximating


Edgeworth's Approximation
Another approximation method is to take the CLT approximation and add terms that improve
the accuracy. Let:

    \Phi^{(k)}(x) = \frac{d^k}{dx^k} \Phi(x)

Then Edgeworth's approximation for the standardised variable Z = (S - E[S]) / \sqrt{Var(S)} is:

    P(Z \le z) \approx \Phi(z) - \frac{\gamma_1}{6} \Phi^{(3)}(z) + \frac{\gamma_2}{24} \Phi^{(4)}(z) + \frac{\gamma_1^2}{72} \Phi^{(6)}(z)

where \gamma_1 and \gamma_2 are the skewness and excess kurtosis of S, and:

    \Phi^{(1)}(x) = \phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2},  \quad  \Phi^{(2)}(x) = -x \, \phi(x),  \quad  etc.
Translated Gamma Approximation


An alternative correcting the CLT approximation for positive skewness is to use a distribution that
is naturally positively skewed. We can approximate S with U x0 with U ~ Gamma , with

, , x0 such that the first, second and third moment of S and U x0 coincide
x x0

1 t
FS x U x0 x G x x0 ; ,
t e dt
0
Equating the mean, variance and third moment gives the method of moments estimators for , , x0 :

S U x0 x0
Var S Var U x0
S

U x
0
1

U x0

1
4

12

x0

The translated Gamma approximation has the following properties:

If U ~ Gamma , then 2U ~ 2 2 , useful for calculating P U x x0

If , and x0 such that the mean and variance converge to a constant

constant and Var S 2 2 constant

Then G x x0 ; , converges to N , 2 , i.e. it becomes a CLT approximation

S x0

If S is discrete distributions, add 0.5 discontinuity correction when approximating

Zhi Ying Feng

S x



Approximating Individual Model with Collective Model
We may want to approximate an individual model with a collective model to take advantage of the
computational advantages of compound distributions, e.g. the ability to use recursive algorithms.
Consider S from the individual model and \tilde{S} from the collective model:

    S = \sum_{i=1}^{n} I_i b_i   \approx   \tilde{S} = \sum_{i=1}^{n} N_i b_i

Where:
- I_i ~ Bernoulli(q_i) indicates whether a claim occurs on policy i
- N_i ~ Poi(\lambda_i) represents the number of claims of amount b_i
- \tilde{S} is compound Poisson with parameters:

    \lambda = \sum_{i=1}^{n} \lambda_i   and   P(x) = \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} I_{[b_i, \infty)}(x),   i.e.   p(x) = \sum_{i=1}^{n} \frac{\lambda_i}{\lambda} I(x = b_i)

When approximating an individual model with the collective model, we need to make an assumption
for \lambda_i, the contribution of policy i to the total expected number of claims \lambda. Note
that under both assumptions below the distribution of the claim size X, i.e. p(x), remains the same.
Assumption 1

    \lambda_i = E[I_i] = q_i  \quad\Rightarrow\quad  \lambda = \sum_{i=1}^{n} q_i

This assumption ensures that the expected number of claims of size b_i in the approximating collective
model is the same as in the individual model. Therefore, the expected values of the aggregate
claims S and \tilde{S} will also be the same:

    E[\tilde{S}] = \sum_{i=1}^{n} q_i b_i = E[S]

However, the approximating collective model will have a higher variance:

    Var(\tilde{S}) = \sum_{i=1}^{n} q_i b_i^2 \ge \sum_{i=1}^{n} q_i (1 - q_i) b_i^2 = Var(S)

Assumption 2

    \lambda_i = -\ln(1 - q_i) = q_i + \frac{q_i^2}{2} + \frac{q_i^3}{3} + ... \ge q_i

This assumption ensures that the probability of no claims on each policy in the approximating collective
model is the same as in the individual model. Therefore, the probability of zero aggregate claims will be
the same:

    P(\tilde{S} = 0) = P(S = 0)

However, the approximating collective model will have both a higher mean and a higher variance:

    E[\tilde{S}] = \sum_{i=1}^{n} \lambda_i b_i \ge \sum_{i=1}^{n} q_i b_i = E[S]
    Var(\tilde{S}) = \sum_{i=1}^{n} \lambda_i b_i^2 \ge \sum_{i=1}^{n} q_i b_i^2 \ge \sum_{i=1}^{n} q_i (1 - q_i) b_i^2 = Var(S)

Module 2: Reinsurance
Reinsurance is the transfer of risk from an insurer to a reinsurer, i.e. the insurer pays a deterministic
premium to the reinsurer to protect itself from a random loss arising from claims.

Types of Reinsurance
Proportional Reinsurance
In a proportional reinsurance contract, the insurer pays a retained proportion \alpha of each claim X:
- Insurer pays Y = \alpha X
- Reinsurer pays Z = (1 - \alpha) X
This is simply a change of scale, so the mean and variance of what the insurer is liable for are:

    E[Y] = \alpha E[X],  \quad  \sigma_Y^2 = \alpha^2 \sigma_X^2

Non-proportional Reinsurance
In an excess of loss reinsurance contract, the reinsurer pays the excess over a retention d, with the
premium being a fixed cost, e.g. $0.025 per unit of coverage in excess of the retention:
- Insurer pays Y = \min(X, d)
- Reinsurer pays Z = (X - d)_+
For the direct insurer, excess of loss reinsurance will always have the least variance of retained
claims among all reinsurance arrangements with the same expected retained claims. The reinsurer may
also limit its payment to an amount L, so that:
- Insurer pays Y = \min(X, d) + (X - d - L)_+
- Reinsurer pays Z = \min\!\left( (X - d)_+, L \right)
Stop Loss Premium


In excess of loss reinsurance on the aggregate loss S with a retention d, the reinsurance premium for the
direct insurer is the expected amount by which the aggregate loss exceeds the retention:

    P_d = E\!\left[ (S - d)_+ \right] =
      \begin{cases}
        \int_d^\infty (x - d) f_S(x) \, dx = \int_d^\infty \left( 1 - F_S(x) \right) dx & \text{if S is continuous} \\
        \sum_{k > d} (k - d) f_S(k) & \text{if S is discrete}
      \end{cases}

Alternatively, if S is discrete (integer-valued) the stop-loss premium can be calculated recursively:
- If d is an integer:

    P_{d+1} = P_d - \left( 1 - F_S(d) \right),   with   P_0 = E[S]

- If d is not an integer:

    P_d = P_{\lfloor d \rfloor} - (d - \lfloor d \rfloor) \left( 1 - F_S(\lfloor d \rfloor) \right)
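A minimal sketch of the integer-retention recursion, assuming a hypothetical aggregate-loss pmf (function and variable names are illustrative):

```python
import numpy as np

def stop_loss_premiums(f_S, d_max):
    """Stop-loss premiums P_d = E[(S - d)+] for integer retentions d = 0..d_max.

    f_S[k] = P(S = k) for k = 0, ..., len(f_S)-1 (an integer-valued aggregate loss).
    """
    F_S = np.cumsum(f_S)
    P = np.empty(d_max + 1)
    P[0] = np.sum(np.arange(len(f_S)) * f_S)          # P_0 = E[S]
    for d in range(d_max):
        P[d + 1] = P[d] - (1.0 - F_S[d])              # P_{d+1} = P_d - (1 - F_S(d))
    return P

# Example with a hypothetical aggregate-loss pmf on {0, 1, ..., 4}
f_S = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
print(stop_loss_premiums(f_S, d_max=4))
```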



To calculate the variance of the reinsurer's payment, we can calculate the second moment recursively.
Writing P_d^{(2)} = E\!\left[ (S - d)_+^2 \right]:

    Var\!\left( (S - d)_+ \right) = P_d^{(2)} - P_d^2

    P_d^{(2)} = P_{d-1}^{(2)} - 2 P_{d-1} + \left( 1 - F_S(d - 1) \right),   with   P_0^{(2)} = E[S^2]

Example: Non-Proportional Reinsurance

A life insurance company covers 16000 lives with 1-year term life insurance in the amounts shown
below (benefit amounts in units of $10,000):

    Benefit amount b_k     1      2      3      5      10
    Number of lives n_k    8000   3500   2500   1500   500

The probability of a claim q for each of the 16000 lives is 0.02. Excess of loss reinsurance with a
retention of 30000 is available at a cost of 0.025 per dollar of coverage. Calculate the probability
that the total cost will exceed 8,250,000 (i.e. 825 in units of $10,000).
The total cost to the insurer after reinsurance is the total retained claims payable plus the
reinsurance premium. With reinsurance, policies with benefit amounts of 1, 2 and 3 remain
unchanged, but for those with benefit amounts of 5 and 10 the insurer is now only liable for 3. Thus,
the retained portfolio has 8000 lives at amount 1, 3500 at amount 2 and 4500 at amount 3, and the
expected value and variance of the total retained claims S are:

    E[S] = \sum_{k=1}^{3} n_k b_k q_k = (8000 \times 1 + 3500 \times 2 + 4500 \times 3) \times 0.02 = 570

    Var(S) = \sum_{k=1}^{3} n_k b_k^2 q_k (1 - q_k) = 0.02 \times 0.98 \times (8000 \times 1^2 + 3500 \times 2^2 + 4500 \times 3^2) = 1225

Now, the reinsurance premium is calculated as:

    P = \text{coverage} \times 0.025

The coverage is what the reinsurer would have to pay in excess of the retention 3, which in this case
is:
- 2 per life for each of the 1500 lives insured for 5
- 7 per life for each of the 500 lives insured for 10
Therefore:

    P = (1500 \times 2 + 500 \times 7) \times 0.025 = 162.5

Thus, the desired probability, using the normal approximation, is:

    P(S + 162.5 > 825) = P\!\left( Z > \frac{ 825 - 162.5 - E[S] }{ \sqrt{Var(S)} } \right) = P(Z > 2.643) \approx 0.0041

Stochastic Processes
Rather than looking at the aggregate loss at a point in time as a random variable S, we are
interested in modelling it over a period of time as a stochastic process S(t).
The increment of a stochastic process is the random variable X(t + h) - X(t).
- A stochastic process has independent increments if the random variable X(t + s) - X(t) is
independent of X(t) for all s, t > 0, i.e. future increments are independent of the past and present.
Equivalently, if X(t_0), X(t_1) - X(t_0), ..., X(t_n) - X(t_{n-1}) are independent for all
t_0 < t_1 < ... < t_n, then the process has independent increments.
- A stochastic process has stationary increments if X(t_2) - X(t_1) and X(t_2 + h) - X(t_1 + h),
i.e. increments over intervals of the same length, have the same probability distribution for all t_1, t_2, h.
Poisson Process

A stochastic process {N(t), t \ge 0} is a counting process if it represents the number of events that
occur up to time t. A counting process satisfies:
- N(t) \ge 0 and N(t) is integer-valued
- N(s) \le N(t) for s \le t, i.e. it must be non-decreasing
- For s < t, N(t) - N(s) is the number of events that occurred in the interval (s, t]

A counting process {N(t), t \ge 0} is a Poisson process with rate \lambda if it satisfies:
- N(0) = 0
- It has independent increments
- The number of events in any interval of length t has a Poisson distribution with mean \lambda t:

    P\!\left( N(s + t) - N(s) = n \right) = P\!\left( N(t) = n \right) = \frac{ e^{-\lambda t} (\lambda t)^n }{ n! }

- Or equivalently, the inter-arrival time between the (n-1)-th and n-th events has an
Exponential(\lambda) distribution
- The probability of more than one jump at a time is 0:

    P\!\left( N(t + h) - N(t) = 0 \right) = e^{-\lambda h} = 1 - \lambda h + o(h)
    P\!\left( N(t + h) - N(t) = 1 \right) = \lambda h e^{-\lambda h} = \lambda h + o(h)
    P\!\left( N(t + h) - N(t) \ge 2 \right) = o(h)



Compound Poisson Process
A compound Poisson process is defined as:

    S(t) = \sum_{i=1}^{N(t)} X_i

Where:
- N(t) is a Poisson process with parameter \lambda
- The X_i are i.i.d. with distribution P(x) and are independent of N(t)
- In a Poisson process, the increments can only have a height of 0 or 1, but for a compound
Poisson process the increments can have a height of X_i
- Its increments also have a compound Poisson distribution:

    S(t + h) - S(t) = \sum_{i = N(t) + 1}^{N(t + h)} X_i \overset{d}{=} \sum_{i=1}^{N(h)} X_i \sim \text{Compound Poisson}(\lambda h, P(x))

Cramer-Lundberg Process
Let S(t) = \sum_{i=1}^{N(t)} X_i be a compound Poisson process. A Cramer-Lundberg process is defined as:

    U(t) = u + ct - \sum_{i=1}^{N(t)} X_i = u + ct - S(t)

Where:
- u is the initial surplus, or initial capital
- c is the constant yearly premium rate, which gives the cash inflows. It is defined to cover the
expected aggregate loss over a unit period of time, with a relative security loading \theta:

    c = (1 + \theta) E[S(1)] = (1 + \theta) E\!\left[ \sum_{i=1}^{N(1)} X_i \right]
      = (1 + \theta) E\!\left[ E\!\left[ \sum_{i=1}^{N(1)} X_i \,\Big|\, N(1) \right] \right]
      = (1 + \theta) E[N(1)] E[X] = (1 + \theta) \lambda E[X]

- S(t) = \sum_{i=1}^{N(t)} X_i is the aggregate loss up to time t, which gives the cash outflows
- U(t) represents the insurer's surplus at time t. The continuous upward drift represents the
constant premium income, and the downward jumps represent the losses from claims
The Cramer-Lundberg process has stationary and independent increments, because:

    U(t + s) - U(t) = cs - \sum_{i = N(t) + 1}^{N(t + s)} X_i
      \overset{d}{=} cs - \sum_{i=1}^{N(s)} X_i = U(s) - U(0)   (stationary increments)

and this increment is independent of U(v) for v \le t (independent increments).


Probability of Ruin
The survival of an insurance company will depend on certain variables, e.g. the initial surplus u, the
loading of premiums and the level of reinsurance. One way of monitoring these variables is the
concept of the probability of ruin. In the Cramer-Lundberg model, the time to ruin T is defined as:

    T = \inf\{ t \ge 0 \mid U(t) < 0 \}

i.e. the first time that the insurer's surplus U(t) becomes negative. The probability that the company
will be ruined by time t, given the initial surplus u, is denoted by:

    \psi(u, t) = P(T \le t)

The probability of ultimate ruin is given by:

    \psi(u) = \lim_{t \to \infty} \psi(u, t) = P(T < \infty)

Adjustment Coefficient
Consider the excess of losses over premiums over the interval (0, t], i.e. the negative of the surplus growth:

    -\left[ U(t) - U(0) \right] = S(t) - ct

Define the adjustment coefficient R as the smallest positive solution of the following equation:

    E\!\left[ e^{r(S(t) - ct)} \right] = 1
    \iff  e^{-rct} \, m_{S(t)}(r) = e^{-rct} \, e^{\lambda t (m_X(r) - 1)} = 1
    \iff  cr = \lambda \left( m_X(r) - 1 \right)
    \iff  (1 + \theta) E[X] \, r = m_X(r) - 1

As long as \theta > 0, there will be only one positive solution to this equation other than 0, since the
LHS is linear in r and the RHS is concave up (convex), so they intersect at exactly one positive R.
Alternatively, the adjustment coefficient is the R that makes e^{-R U(t)} a martingale, i.e. satisfies:

    E\!\left[ e^{-R U(t)} \right] = e^{-R u}

To show this, we need to show that, for s \le t:

    E\!\left[ e^{-R U(t)} \mid U(v), 0 \le v \le s \right] = e^{-R U(s)}

Starting with the LHS:

    E\!\left[ e^{-R U(t)} \mid U(v), 0 \le v \le s \right]
      = e^{-R U(s)} \, E\!\left[ e^{-R (U(t) - U(s))} \mid U(v), 0 \le v \le s \right]
      = e^{-R U(s)} \, E\!\left[ e^{-R (U(t) - U(s))} \right]          (independent increments)
      = e^{-R U(s)} \, E\!\left[ e^{-R (U(t - s) - U(0))} \right]      (stationary increments)

The last expectation is equal to 1, since the increment in the exponent can be written as:

    -\left[ U(t - s) - U(0) \right] = S(t - s) - c(t - s)

so that it is the same expression we solved to find the adjustment coefficient:

    E\!\left[ e^{-R (U(t-s) - U(0))} \right] = E\!\left[ e^{R(S(t-s) - c(t-s))} \right] = e^{-Rc(t-s)} \, e^{\lambda (t-s)(m_X(R) - 1)} = 1



Probability of Ruin in the Cramer-Lundberg Process
If U(t) is a Cramer-Lundberg process with \theta > 0, then for u \ge 0 the probability of ultimate ruin is:

    \psi(u) = \frac{ e^{-Ru} }{ E\!\left[ e^{-R U(T)} \mid T < \infty \right] }

Since U(T) < 0 at ruin, E\!\left[ e^{-R U(T)} \mid T < \infty \right] > 1, so we obtain Lundberg's upper bound:

    \psi(u) \le e^{-Ru}

If the claim size is bounded, i.e. P(X \le x_{max}) = 1 for some finite x_{max}, then U(T) \ge -x_{max}, giving the lower bound:

    \psi(u) \ge e^{-R(u + x_{max})}

Therefore, the upper and lower bounds on the probability of ultimate ruin are:

    e^{-R(u + x_{max})} \le \psi(u) \le e^{-Ru}

Example: Exponential Claims

Consider X ~ exp(\beta). To find R we look for the positive solution of:

    (1 + \theta) \frac{r}{\beta} = \frac{\beta}{\beta - r} - 1   \implies   R = \frac{\theta \beta}{1 + \theta}

To find \psi(u), we first need the denominator of the ruin probability formula above (Theorem 13.4.1).
Denote by U(T-) the surplus just before ruin and by Y the claim that causes ruin, so Y ~ exp(\beta):

    -U(T) = Y - U(T-),   i.e.   -U(T) \mid T < \infty  \overset{d}{=}  Y - U(T-) \mid Y > U(T-)

Since Y is exponential, it has the memoryless property, so:

    P\!\left( Y - U(T-) > x \mid Y > U(T-) \right) = P(Y > x)
    \implies  -U(T) \mid T < \infty \sim \exp(\beta)

Therefore, the conditional expectation is the mgf of an exponential evaluated at R:

    E\!\left[ e^{-R U(T)} \mid T < \infty \right] = \frac{\beta}{\beta - R} = 1 + \theta

Then we have the probability of ultimate ruin as:

    \psi(u) = \frac{ e^{-Ru} }{ 1 + \theta } = \frac{1}{1 + \theta} \exp\!\left( -\frac{\theta \beta}{1 + \theta} u \right)

Here the probability of ruin decreases as the adjustment coefficient increases, so the insurer
will aim to maximise the adjustment coefficient in order to minimise the probability of ruin.
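A minimal numeric check of this example, assuming illustrative values \beta = 1, \theta = 0.25 and u = 5 (the numbers and function name are not from the notes):

```python
import numpy as np
from scipy.optimize import brentq

# Adjustment coefficient for exponential(beta) claims with security loading theta:
# solve (1 + theta) * E[X] * r = m_X(r) - 1 for the positive root r = R
beta, theta, u = 1.0, 0.25, 5.0

def lundberg_eq(r):
    return (1 + theta) * (1 / beta) * r - (beta / (beta - r) - 1)

R = brentq(lundberg_eq, 1e-9, beta - 1e-9)        # the root lies in (0, beta)
print("R numeric :", R, " closed form:", theta * beta / (1 + theta))
print("Lundberg bound exp(-R u):", np.exp(-R * u))
print("Exact ruin probability  :", np.exp(-R * u) / (1 + theta))
```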

Applications to Reinsurance
Instead of setting aside a large amount of capital to lower its probability of ruin, the direct insurer can
purchase reinsurance. Let h(x) \le x be the amount paid by the reinsurer for a claim of amount x:
- For proportional reinsurance: h(X) = (1 - \alpha) X
- For excess of loss reinsurance: h(X) = (X - d)_+

The reinsurance premium c_h is the expected value of the amount paid by the reinsurer, loaded by k:

    c_h = (1 + k) \lambda E\!\left[ h(X) \right]

Then the Cramer-Lundberg process for the direct insurer with reinsurance is:

    U(t) = u + (c - c_h) t - \sum_{i=1}^{N(t)} \left[ X_i - h(X_i) \right]

To solve for the adjustment coefficient, consider the case of proportional reinsurance:

    U(t) = u + (c - c_h) t - \sum_{i=1}^{N(t)} \alpha X_i = u + (c - c_h) t - \alpha S(t)

Consider the loss in the period (0, t]:

    -\left[ U(t) - U(0) \right] = \alpha S(t) - (c - c_h) t

Then the adjustment coefficient is the smallest positive solution to:

    E\!\left[ e^{r \left[ \alpha S(t) - (c - c_h) t \right]} \right] = 1
    \iff  e^{-r(c - c_h)t} \, e^{\lambda t \left( m_X(\alpha r) - 1 \right)} = 1
    \iff  (c - c_h) r = \lambda \left( m_X(\alpha r) - 1 \right)

In general, when there is proportional or excess of loss reinsurance, we need to solve:

    (c - c_h) r = \lambda \left( m_{X - h(X)}(r) - 1 \right)

A suitably chosen reinsurance arrangement will generally increase the adjustment coefficient relative to
the case without reinsurance, since reinsurance reduces the probability of ruin, and a larger adjustment
coefficient corresponds to a smaller ruin probability.
Example: Adjustment Coefficient with Proportional Reinsurance
An insurer models its surplus using the Cramer-Lundberg process with claim distribution
X ~ exp(1) and security loading \theta = 0.25. The insurer is considering proportional reinsurance with
retention \alpha and reinsurer security loading k = 0.4. What is the retention \alpha that maximises R_h?
Setting up the equation to solve for the adjustment coefficient, with retained claims \alpha X:

    m_{\alpha X}(r) = m_X(\alpha r) = \frac{1}{1 - \alpha r}
    c = 1.25 \lambda,  \quad  c_h = (1 + 0.4) \lambda (1 - \alpha) E[X] = 1.4 \lambda (1 - \alpha)
    c - c_h = \lambda \left[ 1.25 - 1.4(1 - \alpha) \right] = \lambda (1.4 \alpha - 0.15)

Then the equation for R_h is:

    (1.4 \alpha - 0.15) \, r = \frac{1}{1 - \alpha r} - 1

Maximising the solution r with respect to \alpha gives:

    \alpha = 0.671933,   R_h = 0.223787
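A minimal sketch reproducing this optimisation numerically (the root bracket, search bounds and function names are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

# Proportional reinsurance example: X ~ exp(1), theta = 0.25, reinsurer loading k = 0.4
def adjustment_coeff(alpha):
    net_loading = 1.4 * alpha - 0.15                  # (c - c_h)/lambda, with E[X] = 1
    if net_loading <= alpha:                          # retained business must be positively loaded,
        return 0.0                                    # otherwise no positive adjustment coefficient
    # solve (1.4*alpha - 0.15) r = 1/(1 - alpha r) - 1 for the positive root r < 1/alpha
    f = lambda r: net_loading * r - (1.0 / (1.0 - alpha * r) - 1.0)
    return brentq(f, 1e-8, 1.0 / alpha - 1e-8)

res = minimize_scalar(lambda a: -adjustment_coeff(a), bounds=(0.2, 1.0), method="bounded")
print("optimal retention:", res.x, " R_h:", adjustment_coeff(res.x))   # ~0.6719, ~0.2238
```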



Example: Adjustment Coefficient with Excess of Loss Reinsurance
Now the insurer considers excess of loss reinsurance with retention d, with the same reinsurer security
loading of k = 0.4. What is the d that maximises R_h?
Setting up the equation to solve for the adjustment coefficient, with retained claims X - h(X) = \min(X, d):

    m_{X - h(X)}(r) = E\!\left[ e^{r \min(X, d)} \right] = \int_0^d e^{rx} e^{-x} \, dx + e^{rd} \int_d^\infty e^{-x} \, dx
      = \frac{ 1 - e^{-d(1 - r)} }{ 1 - r } + e^{-d(1 - r)} = \frac{ 1 - r e^{-d(1 - r)} }{ 1 - r }

    c_h = 1.4 \lambda E\!\left[ (X - d)_+ \right] = 1.4 \lambda \int_d^\infty (x - d) e^{-x} \, dx = 1.4 \lambda e^{-d}

Then the equation for R_h is:

    \left( 1.25 - 1.4 e^{-d} \right) r = \frac{ 1 - r e^{-d(1 - r)} }{ 1 - r } - 1

A closed form solution cannot be found by differentiating with respect to d, but solving numerically
(e.g. in R) gives:

    d = 0.9632226,   R_h = 0.3493290

Compared to the proportional reinsurance case, the adjustment coefficient here is much higher,
which means the probability of ruin will be much lower.
Theorem 14.5.1
This theorem formalises the results of the previous two examples. If:
- We are in a Cramer-Lundberg setting
- We are considering two reinsurance contracts, one of which is excess of loss
- The premium loading and the expected amount paid by the reinsurer, i.e. the reinsurance
premium, are the same for both contracts
then Theorem 14.5.1 states that the adjustment coefficient with the excess of loss contract will
always be at least as high as with any other type of reinsurance contract. The excess of loss contract is
also the most optimal for the direct insurer in the sense of giving the lowest variance of retained claims.

De Finetti's Modification
Using the probability of ruin as a criterion presents some issues:
- Minimising \psi(u) supposes that companies should let their surplus grow without limit, which
is not realistic
- If some of the surplus is distributed from time to time, e.g. as dividends, then the calculations of
\psi(u) are wrong

Optimal Dividend Strategy

A dividend is a distribution of surplus to shareholders, and shareholders will want to maximise the
expected present value of dividends paid until ruin. The optimal dividend strategy is the optimal
decision on how much and when dividends are to be distributed with respect to the EPV of the dividends,
rather than the probability of ruin, which is usually 1 in this context. The optimal dividend strategy
is sometimes a barrier strategy, e.g. in the Cramer-Lundberg model with an exponential claim size
distribution. Under a barrier strategy, any excess of surplus over a barrier is paid out immediately as
dividends, but no dividends are paid when the surplus is below the barrier.

Module 6: Updating Insurance Premiums over Time


Credibility Theory
An insurer's portfolio will often have different groups of contracts that are heterogeneous, i.e. have different risks. In setting a premium for a particular group, there may only be limited data for that group, but there may be a lot of data when it is combined with other related contracts. In this case, to calculate the premium for a particular group j, i.e. its expected claim costs, there are two extreme choices:
- Use the overall mean \bar{X} of all the groups - makes sense only if the portfolio is homogeneous
- Use the mean of the particular group \bar{X}_j - makes sense only if the group is sufficiently large and arguably different from the other groups
The credibility approach is to take a weighted average of these two extremes. In general, the credibility premium is given by:
P^{cred} = z_j \bar{X}_j + (1 - z_j) \bar{X}
where:
- \bar{X}_j is an estimate of the expected aggregate claims based on data from the particular risk
- \bar{X} is an estimate of the expected aggregate claims based on similar risks
- z_j is the credibility factor, representing the weight attached to the particular risk. It reflects how much trust is placed in the data from the risk itself compared to the data from a group of similar risks. In general:
  - The more data there are from the individual risk, the higher the credibility factor
  - The more relevant/similar the other risks are, the lower the credibility factor
  - The credibility factor will be higher if there is low variance within the particular risk, or if there is high variance between different risks

Greatest Accuracy Credibility


Let X_{jt} denote the claim size of policy j during year t. This random variable is a function of a risk profile \theta_j that cannot be observed, so it is modelled as an outcome of a random variable \Theta. The risk profile is assumed to be the same for a given contract across time t, but different between policies, i.e. across j.
Consider observations of claims over T years for a policy j. Given the risk profile \theta_j, the claim sizes X_{j1}, ..., X_{jT} are independent across time t. The claim sizes across policies, i.e. for different j with different risk profiles, are always independent.
Individual & Collective Premium
Now we want to take the T years of claim experience into account to calculate the premium for the next year, T+1. The correct individual premium for this policy is the expectation of the claim size as a function of the risk profile, which is itself a random variable:
\mu(\theta) = E[ X_{j,T+1} | \theta ]
\sigma^2(\theta) = Var( X_{j,T+1} | \theta )

The collective premium for the entire portfolio is simply the expected claim size, which is a number, not a random variable:
P^{coll} = m = E[ \mu(\theta) ] = \int E[ X_{j,T+1} | \theta ] f_\theta(\theta) d\theta = E[ X_{j,T+1} ]
Bayesian Premium
Given T observations x of X = (X_1, ..., X_T), we are interested in estimating the individual premium \mu(\theta). We know that, given \theta, the distribution of X is known:
F_{X|\theta}(x | \theta) = P( X \le x | \theta )
Define:
- g(x) as the estimator of \mu(\theta) based on the observations x
- L( \mu(\theta), g(x) ) as the loss function, if \mu(\theta) is the true parameter and g(x) is the value taken by the estimator when X = x is observed
- f_\theta(\theta) as the prior distribution of \theta, i.e. before any observations are made
- f_{\theta|X}(\theta | x) as the posterior distribution of \theta, i.e. after the observations have been made
The Bayesian estimator is defined as the estimator that minimises the expected loss with respect to the posterior distribution after observation:
\min_g \int L( \mu(\theta), g(x) ) f_{\theta|X}(\theta | x) d\theta
The quadratic loss function for an estimator \hat{\mu}_j is:
L( \mu(\theta), \hat{\mu}_j ) = ( \mu(\theta) - \hat{\mu}_j )^2
The Bayesian premium is defined as the Bayesian estimator with respect to the quadratic loss function, which turns out to be the posterior expectation of \mu(\theta):
P^{Bayes} = E[ \mu(\theta) | X ] = \int \mu(\theta) f_{\theta|X}(\theta | x) d\theta
In other words, the Bayesian premium minimises the quadratic error between the estimated premium and the true premium \mu(\theta_j) that would apply if \theta_j were known.
The expected quadratic loss of the Bayesian premium is:
E[ ( \mu(\theta) - E[ \mu(\theta) | X ] )^2 ] = E[ Var( \mu(\theta) | X ) ]
The expected quadratic loss of the collective premium is higher:
E[ ( \mu(\theta) - m )^2 ] = Var( \mu(\theta) ) = E[ Var( \mu(\theta) | X ) ] + Var( E[ \mu(\theta) | X ] ) \ge E[ Var( \mu(\theta) | X ) ]

General Model for Bayesian Premium
In a general model we are given:
- T realisations x of X
- the distribution of X given \theta, with F_{X|\theta}(x | \theta) = P( X \le x | \theta ) and mean \mu(\theta) = E[ X | \theta ]
- a prior distribution \pi(\theta) of \theta, determined before the observations
To calculate the Bayesian premium, we first need to determine the posterior distribution of \theta using the realisations of X:
F_{\theta|X}(\theta | x) = P( \Theta \le \theta | X = x )
We will need the density of the posterior distribution, which we denote \pi(\theta | x):
\pi(\theta | x) = P( X = x | \theta ) \pi(\theta) / P( X = x )
This gives 4 cases depending on whether X_t and \theta are continuous or discrete. For example:
- If X_t is discrete and \theta is continuous, then:
  \pi(\theta | x) = \prod_{t=1}^T P( X_t = x_t | \theta ) \pi(\theta) / \int \prod_{t=1}^T P( X_t = x_t | \theta ) \pi(\theta) d\theta
- If X_t is continuous and \theta is discrete, then:
  \pi(\theta | x) = \prod_{t=1}^T f_{X|\theta}( x_t | \theta ) \pi(\theta) / \sum_\theta \prod_{t=1}^T f_{X|\theta}( x_t | \theta ) \pi(\theta)
Finally, we can calculate the Bayesian premium using:
P^{Bayes} = E[ \mu(\theta) | X = x ] = \int \mu(\theta) \pi(\theta | x) d\theta
Example: Bayesian/Credibility Premium for Conjugate Distributions


A conjugate prior is one for which the posterior distribution is in the same family as the prior, e.g. both Normal. For such a choice of the likelihood f_{X|\theta}(x | \theta) and prior \pi(\theta), the posterior \pi(\theta | x) has the same algebraic form as the prior, but with different parameters.
Let X_t be the indicator of whether a claim occurs in year t. Conditional on the risk parameter \theta, X_t is distributed as Bernoulli(\theta). The risk parameter cannot be observed, but the chief actuary has assumed a prior distribution of Beta(\alpha, \beta). The chief actuary also has observations X_1, ..., X_T of whether a claim occurred in each of the past T years. Calculate the Bayesian premium for year T+1 and show it can be written in the form of a credibility premium.

Firstly, we need to derive the posterior distribution given:
P( X_t = x | \theta ) = \theta^x (1 - \theta)^{1-x}, x = 0, 1   (Bernoulli(\theta))
\pi(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{\alpha-1} (1 - \theta)^{\beta-1}, 0 < \theta < 1   (Beta(\alpha, \beta))
For this particular choice of conditional likelihood and prior, the posterior will also be a Beta distribution. Since X_t is discrete and \theta is continuous:
\pi(\theta | x) = \prod_{t=1}^T P( X_t = x_t | \theta ) \pi(\theta) / \int \prod_{t=1}^T P( X_t = x_t | \theta ) \pi(\theta) d\theta \propto \theta^{\sum_t x_t} (1-\theta)^{T - \sum_t x_t} \theta^{\alpha-1} (1-\theta)^{\beta-1}
The proportionality holds because the denominator is just a constant. Now denote s = \sum_{t=1}^T x_t, the total number of years with a claim, and T the number of observed years. Then the posterior distribution is:
\pi(\theta | x) \propto \theta^{\alpha + s - 1} (1 - \theta)^{\beta + T - s - 1}, i.e. \theta | x ~ Beta( \alpha + s, \beta + T - s )
Then the Bayesian premium under the posterior distribution is:
\mu(\theta) = E[ X_{T+1} | \theta ] = \theta
P^{Bayes} = E[ \mu(\theta) | X ] = E[ \theta | X ] = ( \alpha + s ) / ( \alpha + \beta + T )
To write this in credibility premium form, it needs to be a weighted average of the expected number of claims for this particular risk, \bar{X} = s/T, and for the collective, m = E[ \mu(\theta) ] = E[\theta] = \alpha / (\alpha + \beta). Rewriting the Bayesian premium:
P^{Bayes} = \frac{\alpha + s}{\alpha + \beta + T} = \frac{T}{\alpha + \beta + T} \cdot \frac{s}{T} + \frac{\alpha + \beta}{\alpha + \beta + T} \cdot \frac{\alpha}{\alpha + \beta} = z \bar{X} + (1 - z) m
This is indeed in the form of a credibility premium, with credibility factor:
z = \frac{T}{\alpha + \beta + T}, 1 - z = \frac{\alpha + \beta}{\alpha + \beta + T}
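A small sketch of this conjugate updating and its credibility form (Python; the Beta(2, 8) prior and the 5-year claim history in the usage line are hypothetical illustrations):

```python
def bayes_premium_beta_bernoulli(a, b, claims):
    """Posterior mean of theta for a Beta(a, b) prior and Bernoulli(theta) claim
    indicators, returned in credibility form z*Xbar + (1 - z)*m."""
    T, s = len(claims), sum(claims)          # years observed, years with a claim
    m = a / (a + b)                          # collective premium E[theta]
    z = T / (a + b + T)                      # credibility factor
    xbar = s / T
    # identical to the posterior mean (a + s) / (a + b + T)
    assert abs((a + s) / (a + b + T) - (z * xbar + (1 - z) * m)) < 1e-12
    return z * xbar + (1 - z) * m

# hypothetical illustration: Beta(2, 8) prior, claims in 2 of the last 5 years
print(bayes_premium_beta_bernoulli(2, 8, [1, 0, 0, 1, 0]))   # 4/15 = 0.2667
```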
Buhlmann Model
The Bayesian estimator/premium is the best possible estimator, but it can be tedious to calculate and it requires us to specify the conditional likelihood and the prior distribution. The idea of the Buhlmann model is to restrict the class of allowable estimators to those which are linear in the observations x.
In the Buhlmann model, we make the following assumptions:
- There is only 1 policy j for 1 insured
- The risk parameter \theta is a random variable with distribution \pi(\theta)
- Conditional on \theta, X_1, X_2, ..., X_T are independent and identically distributed
For the previously defined quantities, denote the moments as:
\mu(\theta) = E[ X_{jt} | \theta ]
\sigma^2(\theta) = Var( X_{jt} | \theta )
m = E[ \mu(\theta) ]
a = Var( \mu(\theta) )
s^2 = E[ \sigma^2(\theta) ]
Bayesian Premium in Buhlmann Model


Under the Buhlmann model we are looking for a linear Bayesian estimator of the form:
P^{Bayes}_{T+1} = a_0 + a_1 \bar{X}_j, where \bar{X}_j = \frac{1}{T} \sum_{t=1}^T X_{jt}
The linear Bayesian estimator is the one that minimises the mean squared error, which is equivalent to minimising:
\min_{a_0, a_1} E[ ( \mu(\theta) - a_0 - a_1 \bar{X}_j )^2 ] = \min_{a_0, a_1} E[ ( X_{j,T+1} - a_0 - a_1 \bar{X}_j )^2 ]
Taking partial derivatives with respect to the two parameters gives:
\partial / \partial a_0: -2 E[ X_{j,T+1} - a_1 \bar{X}_j - a_0 ] = 0, so a_0 = E[ X_{j,T+1} ] - a_1 E[ \bar{X}_j ]
\partial / \partial a_1: Cov( \bar{X}_j, X_{j,T+1} ) - a_1 Var( \bar{X}_j ) = 0
Now using the following results from the preliminary tutorial exercises:
E[ \bar{X}_j ] = E[ X_{j,T+1} ] = m
Cov( \bar{X}_j, X_{j,T+1} ) = Var( \mu(\theta) ) = a
Var( \bar{X}_j ) = a + s^2 / T
we can solve the equations to arrive at:
a_1 = \frac{ Cov( \bar{X}_j, X_{j,T+1} ) }{ Var( \bar{X}_j ) } = \frac{a}{ a + s^2/T } = \frac{Ta}{ Ta + s^2 }
a_0 = (1 - a_1) m
Therefore, the Bayesian premium in the Buhlmann model is:
P^{Bayes}_{T+1} = a_1 \bar{X}_j + (1 - a_1) m


When the Bayesian premium is linear in the observations, it is equal to the credibility premium, since it is a weighted average of the collective premium m and the individual mean \bar{X}_j:
P^{cred}_{T+1} = z \bar{X}_j + (1 - z) m
with:
z = a_1 = \frac{T}{T + K}, K = \frac{s^2}{a} = \frac{ E[ \sigma^2(\theta) ] }{ Var( \mu(\theta) ) } and m = E[ \mu(\theta) ]

Credibility Coefficient & Factor


The K in the above formula for the credibility premium is known as the credibility coefficient; the credibility factor z is decreasing in K.
- If T increases then z increases: more experience, in terms of risk exposure / number of observations, means we can give more credibility to the mean of the individual risk \bar{X}_j, which is the estimate obtained from experience
- If a increases, i.e. there is more heterogeneity between risks, then K decreases and thus z increases: if the risks are quite different from each other, we place more weight on the individual mean \bar{X}_j
- If s^2 decreases, i.e. there is less heterogeneity within the risk, then K decreases and thus z increases: if each risk group doesn't vary much within itself, we rely more on the individual mean \bar{X}_j

Nonparametric Estimation
There are three ways of estimating the parameters m, s^2 and a:
- Pure Bayesian procedure: set them intuitively using the knowledge of an experienced actuary
- Parametric estimation: if the distributions f_{X|\theta}(x | \theta) and \pi(\theta) are known, the parameters can be calculated from these distributions
- Nonparametric estimation: if the prior distribution and conditional likelihood are not given, the values of these parameters are estimated from data on a collective of similar risks
In nonparametric estimation, denote:
- X_{jt} as the claim size of policy j in year t, where X_{j1}, X_{j2}, ..., X_{jT} are i.i.d. conditional on \theta_j
- the available set of data as X_{jt} for 1 \le j \le J and 1 \le t \le T
Then the unbiased estimator for m is the overall arithmetic average:
\hat{m} = \bar{X} = \frac{1}{J} \sum_{j=1}^J \bar{X}_j = \frac{1}{J} \sum_{j=1}^J \frac{1}{T} \sum_{t=1}^T X_{jt}
The unbiased estimator for s^2 = E[ \sigma^2(\theta) ] is:
\hat{s}^2 = \frac{1}{J} \sum_{j=1}^J s_j^2, where s_j^2 = \frac{1}{T-1} \sum_{t=1}^T ( X_{jt} - \bar{X}_j )^2
The unbiased estimator for a = Var( \mu(\theta) ) is:
\hat{a} = \max\left( \frac{1}{J-1} \sum_{j=1}^J ( \bar{X}_j - \bar{X} )^2 - \frac{\hat{s}^2}{T}, 0 \right)
Note that if \hat{a} = 0 then z = 0, since all risks are then treated as having the same risk profile.
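A sketch of these estimators for a balanced J x T array of claims (Python, NumPy; the two-row array in the usage line is a made-up illustration):

```python
import numpy as np

def buhlmann_nonparametric(X):
    """X is a J x T array of claims (row j = risk j). Returns (m, s2, a, z)
    using the unbiased estimators above."""
    J, T = X.shape
    xbar_j = X.mean(axis=1)                          # individual means
    m = xbar_j.mean()                                # overall mean
    s2 = X.var(axis=1, ddof=1).mean()                # average within-risk sample variance
    a = max(xbar_j.var(ddof=1) - s2 / T, 0.0)        # between-risk variance estimate
    z = 0.0 if a == 0 else T / (T + s2 / a)          # credibility factor
    return m, s2, a, z

# made-up illustration with two risks observed for three years
print(buhlmann_nonparametric(np.array([[3., 5., 4.], [10., 12., 14.]])))
```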
Example: Credibility Premium with Parametric Estimation
It is known that, given the risk profile \theta, the number of claims follows a Poi(\theta) distribution. Among all those insured, the parameter \theta has a Pareto(3, 10) distribution. Suppose that a policy had the following claim experience in the last 3 years. Determine the credibility premium for the 4th year.
Year 1: 2 claims, Year 2: 3 claims, Year 3: 1 claim
The conditional likelihood and prior distribution are:
N | \theta ~ Poi(\theta), \theta ~ Pareto(3, 10)
Given these, we can derive:
\mu(\theta) = E[ N_4 | \theta ] = \theta and \sigma^2(\theta) = Var( N_4 | \theta ) = \theta
We will also need the moments of these quantities:
m = E[ \mu(\theta) ] = E[\theta] = \frac{\lambda}{\alpha - 1} = \frac{10}{2} = 5
s^2 = E[ \sigma^2(\theta) ] = E[\theta] = 5
a = Var( \mu(\theta) ) = Var(\theta) = \frac{ \alpha \lambda^2 }{ (\alpha - 1)^2 (\alpha - 2) } = 75
This allows us to calculate K and hence the credibility factor z:
K = \frac{s^2}{a} = \frac{5}{75} = \frac{1}{15}, z = \frac{T}{T + K} = \frac{15T}{15T + 1}
Therefore, the credibility premium is given by:
P^{cred}_4 = z \bar{N} + (1 - z) m = \frac{15T}{15T + 1} \bar{N} + \frac{1}{15T + 1} \cdot 5
Note that since we want to estimate the expected number of claims, T is the risk exposure, i.e. the number of observations. Here we have T = 3 years of experience and \bar{N} is the average number of claims for this particular policy, so:
\bar{N} = \frac{1}{3}( 2 + 3 + 1 ) = 2 and T = 3
P^{cred}_4 = \frac{45}{46} \cdot 2 + \frac{1}{46} \cdot 5 = 2.06522
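A quick numerical check of this example (Python):

```python
alpha, lam = 3.0, 10.0                                   # Pareto(3, 10) prior for theta
m = lam / (alpha - 1)                                    # E[theta] = 5
s2 = m                                                   # E[Var(N | theta)] = E[theta] = 5
a = alpha * lam**2 / ((alpha - 1)**2 * (alpha - 2))      # Var(theta) = 75
K = s2 / a                                               # 1/15
T, claims = 3, [2, 3, 1]
z = T / (T + K)                                          # 45/46
print(round(z * (sum(claims) / T) + (1 - z) * m, 5))     # 2.06522
```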

Example: Credibility Premium with Nonparametric Estimation
An insurance company sells automobile insurance and has two types of policies, A and B. The aggregate claim amounts for the last 3 years are shown below.
Policy Type | Year 1 | Year 2 | Year 3 | \bar{X}_j | s_j^2
A           |   5    |   8    |   11   |    8      |  9
B           |   11   |   13   |   12   |   12      |  1
Overall: \bar{X} = 10 = \hat{m} and \hat{s}^2 = 5.
Estimate Buhlmann's credibility factor and use this to estimate next year's credibility premium for each policy.
First, we test whether there is sufficient heterogeneity between the groups using the F-test:
F = \frac{MSB}{MSW} = \frac{ SSB / (J-1) }{ SSW / (J(T-1)) } ~ F( J-1, J(T-1) )
SSB = T \sum_{j=1}^J ( \bar{X}_j - \bar{X} )^2, SSW = \sum_{j=1}^J \sum_{t=1}^T ( X_{jt} - \bar{X}_j )^2, with J = 2, T = 3
The null hypothesis is that the portfolio is homogeneous, so if the test accepts the null then the credibility factor is zero, i.e. the Buhlmann credibility premium is the same for all risk classes and equal to the overall mean. If the test rejects, the Buhlmann credibility premium is calculated as:
\hat{s}^2 = \frac{1}{J} \sum_{j=1}^J s_j^2 = \frac{9 + 1}{2} = 5
\hat{a} = \frac{1}{J-1} \sum_{j=1}^J ( \bar{X}_j - \bar{X} )^2 - \frac{\hat{s}^2}{T} = (4 + 4) - \frac{5}{3} = \frac{19}{3}
z = \frac{T}{ T + \hat{s}^2 / \hat{a} } = \frac{ T\hat{a} }{ T\hat{a} + \hat{s}^2 } = \frac{19}{24}
P^{cred}_A = z \bar{X}_A + (1 - z) \hat{m} = \frac{19}{24} \cdot 8 + \frac{5}{24} \cdot 10 = 8.42
P^{cred}_B = z \bar{X}_B + (1 - z) \hat{m} = \frac{19}{24} \cdot 12 + \frac{5}{24} \cdot 10 = 11.58
Alternatively, the credibility factor z can be estimated from the F test statistic as z = 1 - 1/F; here F = 24/5 = 4.8 and 1 - 1/F = 19/24, the same value.
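The same figures can be reproduced directly (Python, NumPy):

```python
import numpy as np

X = np.array([[5., 8., 11.],       # policy type A
              [11., 13., 12.]])    # policy type B
J, T = X.shape
xbar_j, xbar = X.mean(axis=1), X.mean()

SSB = T * ((xbar_j - xbar) ** 2).sum()             # between-group sum of squares = 24
SSW = ((X - xbar_j[:, None]) ** 2).sum()           # within-group sum of squares = 20
F = (SSB / (J - 1)) / (SSW / (J * (T - 1)))        # F statistic = 4.8

s2 = X.var(axis=1, ddof=1).mean()                  # 5
a = xbar_j.var(ddof=1) - s2 / T                    # 19/3
z = T * a / (T * a + s2)                           # 19/24
print(F, 1 - 1 / F, z)                             # 4.8, 0.7917, 0.7917
print(z * xbar_j + (1 - z) * xbar)                 # premiums 8.42 and 11.58
```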

Buhlmann-Straub Model
The Buhlmann-Straub model is an extension of the Buhlmann model in which the risk exposure, e.g. the number of policyholders, changes with each observation. In this model we assume that for the j-th class of risk:
- The risk class j is characterised by its specific risk parameter \theta_j, which is a realisation of \Theta
- S_{jt} is the aggregate claim amount in year t
- V_{jt} is the volume associated with S_{jt} in year t; it is treated as a constant and denoted w_{jt}
- X_{jt} = S_{jt} / V_{jt} is the claim amount per unit of volume in year t. Depending on the measure of volume used, X_{jt} has different interpretations, e.g. if V_{jt} is the number of claims in year t, then X_{jt} is the average claim size in year t. Since the risk exposure changes with each observation, the observed data are these averages rather than the individual observations
- Conditional on \theta_j, the X_{j1}, X_{j2}, ..., X_{jT} are independent with:
  E[ X_{jt} | \theta_j ] = \mu(\theta_j) and Var( X_{jt} | \theta_j ) = \sigma^2(\theta_j) / w_{jt}

For a particular risk j, denote:
- \mu(\theta_j) = E[ X_{jt} | \theta_j ]  -  the individual risk premium
- \sigma^2(\theta_j) / w_{jt} = Var( X_{jt} | \theta_j )  -  the variance within the individual risk
- w_{j.} = \sum_{t=1}^T w_{jt}  -  the aggregate volume
- \bar{X}_j = \sum_{t=1}^T \frac{ w_{jt} }{ w_{j.} } X_{jt}  -  the weighted mean of the outcomes
For the collective, i.e. the entire portfolio, denote:
- m = E[ \mu(\theta) ]  -  the collective premium
- a = Var( \mu(\theta) )  -  the variance between individual risk premiums
- s^2 = E[ \sigma^2(\theta) ]  -  the average variance within individual risks
- w_{..} = \sum_{j=1}^J w_{j.}  -  the aggregate volume
- \bar{X} = \sum_{j=1}^J \frac{ w_{j.} }{ w_{..} } \bar{X}_j  -  the weighted mean of the outcomes

Credibility Estimator in Buhlmann-Straub Model


If m, s^2 and a are all known, then the credibility estimator in the Buhlmann-Straub model is:
P^{cred}_{j,T+1} = z_j \bar{X}_j + (1 - z_j) m = m + z_j ( \bar{X}_j - m )
where:
z_j = \frac{ w_{j.} }{ w_{j.} + K } and K = \frac{s^2}{a}
- The credibility factor z_j now depends on the risk j
- If V_{jt} = w_{jt} = 1, then w_{j.} = T and z_j is equivalent to the credibility factor z in the simple Buhlmann model
Under the Buhlmann-Straub model, this credibility estimator is the best linear estimator under the quadratic loss function, given the experience and collective data. The actual expected quadratic loss is:
E[ ( \mu(\theta_j) - P^{cred}_{j,T+1} )^2 ] = a (1 - z_j) = z_j s^2 / w_{j.}
In comparison, if we used one of the two extremes:
- This quadratic loss is smaller than a = Var( \mu(\theta) ), which is the quadratic loss if we just used the collective premium m, i.e. the best estimator with only collective data and no experience
- It is also smaller than s^2 / w_{j.}, which is the quadratic loss of \bar{X}_j, i.e. the best estimator with only experience data and no collective information



Estimation of m, s^2 and a
In the Buhlmann-Straub model, if s^2 and a need to be estimated, use:
\hat{s}^2 = \frac{1}{J} \sum_{j=1}^J s_j^2 = \frac{1}{J} \sum_{j=1}^J \frac{1}{T-1} \sum_{t=1}^T w_{jt} ( X_{jt} - \bar{X}_j )^2
\hat{a} = \frac{ w_{..} }{ w_{..}^2 - \sum_{j=1}^J w_{j.}^2 } \left( \sum_{j=1}^J w_{j.} ( \bar{X}_j - \bar{X} )^2 - (J-1) \hat{s}^2 \right)
\hat{z}_j = \frac{ w_{j.} }{ w_{j.} + \hat{s}^2 / \hat{a} }
Then estimate m as a weighted average of the experience, using the estimated credibility factors:
\hat{m} = \sum_{j=1}^J \frac{ \hat{z}_j }{ \hat{z}_. } \bar{X}_j, where \hat{z}_. = \sum_{j=1}^J \hat{z}_j
Then the credibility premium for risk j is:
P^{cred}_{j,T+1} = \hat{z}_j \bar{X}_j + (1 - \hat{z}_j) \hat{m} = \hat{m} + \hat{z}_j ( \bar{X}_j - \hat{m} )
The expected quadratic loss in this case is given by:
E[ ( \mu(\theta_j) - P^{cred}_{j,T+1} )^2 ] = a (1 - \hat{z}_j) \left( 1 + \frac{ 1 - \hat{z}_j }{ \hat{z}_. } \right)

Example: Calculating Buhlmann-Straub Credibility Premium


Aggregate past claims data on a portfolio of 2 groups of policyholders are given below. Estimate the Buhlmann-Straub credibility premium to be charged in year 4 for each group of policyholders.
Group 1 - Total claim amount: 8000, 11000, 15000 (years 1-3); Number in group: 40, 50, 70, 75 (years 1-4)
Group 2 - Total claim amount: 20000, 24000, 19000 (years 1-3); Number in group: 100, 120, 115, 95 (years 1-4)
Use the following steps:
1. Calculate the volume measures w_{j.} and w_{..}, which are the numbers of policyholders in each group over the observed years:
w_{1.} = 40 + 50 + 70 = 160, w_{2.} = 100 + 120 + 115 = 335, w_{..} = 160 + 335 = 495
2. Calculate \bar{X}_j and \bar{X}. Note that since the table is in terms of aggregate claim amounts, the values in the table are NOT X_{jt} but S_{jt}; to obtain X_{jt}, use X_{jt} = S_{jt} / w_{jt}:
\bar{X}_1 = ( 8000 + 11000 + 15000 ) / 160 = 212.50
\bar{X}_2 = ( 20000 + 24000 + 19000 ) / 335 = 188.06
\bar{X} = \frac{ w_{1.} \bar{X}_1 + w_{2.} \bar{X}_2 }{ w_{..} } = \frac{ 160 \cdot 212.50 + 335 \cdot 188.06 }{ 495 } = 195.96
3. Now estimate s^2 and a:
\hat{s}^2 = \frac{1}{ J(T-1) } \sum_j \sum_t w_{jt} ( X_{jt} - \bar{X}_j )^2 = 25160.58
\hat{a} = \frac{ w_{..} }{ w_{..}^2 - \sum_j w_{j.}^2 } \left( \sum_j w_{j.} ( \bar{X}_j - \bar{X} )^2 - (J-1) \hat{s}^2 \right) = 182.48
4. Then estimate the credibility factors:
\hat{K} = \hat{s}^2 / \hat{a} = 25160.58 / 182.48 = 137.88
z_1 = \frac{ w_{1.} }{ w_{1.} + \hat{K} } = \frac{160}{160 + 137.88} = 0.537, z_2 = \frac{335}{335 + 137.88} = 0.708
5. Lastly, the estimator for m is:
\hat{m} = \sum_{j=1}^2 \frac{ z_j }{ z_. } \bar{X}_j = \frac{0.537}{0.537 + 0.708} \cdot 212.50 + \frac{0.708}{0.537 + 0.708} \cdot 188.06 = 198.6016
6. Putting these all together, we can calculate the TOTAL credibility premium, i.e. the expected total claim amount, for each group in year 4:
P^{cred}_{1,4} = w_{14} [ z_1 \bar{X}_1 + (1 - z_1) \hat{m} ] = 75 ( 0.537 \cdot 212.50 + 0.463 \cdot 198.6016 ) = 15454.90
P^{cred}_{2,4} = w_{24} [ z_2 \bar{X}_2 + (1 - z_2) \hat{m} ] = 95 ( 0.708 \cdot 188.06 + 0.292 \cdot 198.6016 ) = 18158.10
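A sketch reproducing these Buhlmann-Straub calculations (Python, NumPy); small differences from the hand-rounded figures above are due to rounding:

```python
import numpy as np

S = np.array([[ 8000., 11000., 15000.],    # total claim amounts, group 1
              [20000., 24000., 19000.]])   # group 2
w = np.array([[ 40.,  50.,  70.],          # numbers in group, years 1-3
              [100., 120., 115.]])
w4 = np.array([75., 95.])                  # numbers in group in year 4
J, T = S.shape

X = S / w                                  # claims per policyholder
wj = w.sum(axis=1)                         # 160, 335
xbar_j = S.sum(axis=1) / wj                # 212.50, 188.06
xbar = (wj * xbar_j).sum() / wj.sum()      # 195.96

s2 = (w * (X - xbar_j[:, None]) ** 2).sum() / (J * (T - 1))         # ~25161
c = wj.sum() / (wj.sum() ** 2 - (wj ** 2).sum())
a = c * ((wj * (xbar_j - xbar) ** 2).sum() - (J - 1) * s2)          # ~182.5
z = wj / (wj + s2 / a)                                              # 0.537, 0.708
m = (z * xbar_j).sum() / z.sum()                                    # ~198.6
print(w4 * (z * xbar_j + (1 - z) * m))     # total premiums ~15455 and ~18158
```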

Module 7: Predicting Unpaid Losses of a Contract


In general insurance, there is usually a delay between the occurrence and the actual payment of
claims. This could be a result of either a delay in reporting the claims, or a delay in settling the
reported claims. The reasons for these delays could be:
Delay in assessing exact size or amount of claims
Delay in investigating whether claims are valid
Legal proceedings
Incidents that have already occurred, but claims are not filed until later
Claims consist of series of payments, e.g. disability insurance
An insurer must estimate these incurred-but-not-reported (IBNR) claims that have arisen on or before the valuation date, so it can set aside reserves to cover these future liabilities. The aim is to use past data on the claim payment pattern over a number of policy years to determine the expected present value of IBNR claims. These techniques are called run-off techniques, and the past data are presented in a cumulative or incremental run-off triangle, where:
- The accident year indicates the year in which the claim was incurred
- The development year indicates the year in which the claim is reported/paid; it could start at 0 instead of 1
- Each entry X_{ij} in the triangle is the amount of the claims incurred in accident year i that is paid in development year j. E.g. a first entry of 90 would mean that, of the claims incurred in 1995, 90 was reported in 1995, i.e. in the first development year
- Each diagonal represents the total amount of claims paid in a particular calendar year
The aim is to complete the table by predicting the numbers in the bottom-right triangle.

GLM Family of IBNR Techniques


Several IBNR models are in fact GLMs with a log link function and a Poisson error:
E[ X_{ij} ] = \alpha_i \beta_j \gamma_{i+j-1}, X_{ij} ~ Poisson( \alpha_i \beta_j \gamma_{i+j-1} )
where:
- \alpha_i is a parameter varying by accident year i
- \beta_j is the development factor for year j, independent of the accident year i; it represents the proportion of claims reported in development year j
- \gamma_{i+j-1} is a parameter varying by calendar year, e.g. allowing for inflation; the calendar year is i+j-1 when the first development year is denoted year 1 (if it is denoted year 0, the calendar year is i+j)

Zhi Ying Feng

Zhi Ying Feng


Chain-Ladder Technique
In this method we set \gamma = 1 and assume that the response is Poisson distributed, with the expected claim amount for claims arising in accident year i and settled in development year j equal to the total claims of year i multiplied by the proportion of claims reported in year j:
\mu_{ij} = E[ X_{ij} ] = \alpha_i \beta_j, X_{ij} ~ Poisson( \alpha_i \beta_j )
where:
- \alpha_i is the total claims arising in accident year i, which is the same across row i
- \beta_j is the proportion of claims reported in development year j, which is the same across column j, with \sum_{j=1}^t \beta_j = 1
This is a GLM with a log link function: \ln \mu_{ij} = \ln \alpha_i + \ln \beta_j.
The likelihood function is the product over all observed cells in the upper-left triangle:
L( \alpha, \beta ) = \prod_{i=1}^{t} \prod_{j=1}^{t+1-i} e^{ -\alpha_i \beta_j } \frac{ ( \alpha_i \beta_j )^{ x_{ij} } }{ x_{ij}! }
Taking the log and then setting the derivatives to zero gives the MLEs:
\hat{\alpha}_i = \frac{ \sum_{j=1}^{t+1-i} x_{ij} }{ \sum_{j=1}^{t+1-i} \hat{\beta}_j } = \frac{ R_i }{ \sum_{j=1}^{t+1-i} \hat{\beta}_j }, \hat{\beta}_j = \frac{ \sum_{i=1}^{t+1-j} x_{ij} }{ \sum_{i=1}^{t+1-j} \hat{\alpha}_i } = \frac{ K_j }{ \sum_{i=1}^{t+1-j} \hat{\alpha}_i }, with \sum_{j=1}^{t} \hat{\beta}_j = 1
where:
- R_i are the row sums of the observed triangle
- K_j are the column sums of the observed triangle

The method for completing the run-off triangle using the chain-ladder technique is:
1. Transform the run-off triangle to incremental form first if it is given in cumulative form
2. Calculate the first \hat{\alpha} and the last \hat{\beta} using:
\hat{\alpha}_1 = R_1 (since the first row is fully developed and \sum_j \hat{\beta}_j = 1) and \hat{\beta}_t = K_t / \hat{\alpha}_1
3. Work through the remaining parameters recursively, alternating between the \hat{\alpha}'s (in increasing order of i) and the \hat{\beta}'s (in decreasing order of j), using:
\hat{\alpha}_{t-j+1} = \frac{ R_{t-j+1} }{ \hat{\beta}_1 + ... + \hat{\beta}_j } = \frac{ R_{t-j+1} }{ 1 - \sum_{k=j+1}^{t} \hat{\beta}_k } and \hat{\beta}_j = \frac{ K_j }{ \hat{\alpha}_1 + ... + \hat{\alpha}_{t-j+1} }
4. Calculate each value missing in the incremental run-off triangle as:
\hat{X}_{ij} = \hat{\alpha}_i \hat{\beta}_j
An alternative shortcut that applies ONLY to incremental run-off triangles is \hat{X}_{34} = B \cdot C / A, where A, B and C are the sums over the regions of the triangle indicated in the (omitted) diagram. This method works even if some elements of A, B or C are themselves predictions.
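A sketch of the marginal-totals recursion above (Python, NumPy), operating on an incremental triangle stored as an n x n array with NaN in the unobserved cells; the small triangle in the usage lines is hypothetical:

```python
import numpy as np

def chain_ladder_poisson(tri):
    """Marginal-totals (Poisson GLM) chain ladder for an incremental run-off
    triangle given as an n x n array with NaN in the unobserved lower-right part.
    Returns (alpha, beta, fitted) with fitted[i, j] = alpha_i * beta_j."""
    tri = np.asarray(tri, dtype=float)
    n = tri.shape[0]
    R = np.nansum(tri, axis=1)               # row sums of the observed cells
    K = np.nansum(tri, axis=0)               # column sums of the observed cells
    alpha, beta = np.zeros(n), np.zeros(n)
    alpha[0] = R[0]                          # row 0 is fully developed, sum(beta) = 1
    beta[n - 1] = K[n - 1] / alpha[0]
    for m in range(1, n):
        alpha[m] = R[m] / (1.0 - beta[n - m:].sum())   # row m has n-m observed columns
        beta[n - 1 - m] = K[n - 1 - m] / alpha[:m + 1].sum()
    return alpha, beta, np.outer(alpha, beta)

# hypothetical 3 x 3 incremental triangle
tri = np.array([[ 90.,  60.,  30.],
                [100.,  70., np.nan],
                [110., np.nan, np.nan]])
print(chain_ladder_poisson(tri)[2])          # fitted/predicted values for every cell
```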



Arithmetic Separation Method
In this method we set \alpha = 1 and assume that the response is Poisson distributed with mean:
\mu_{ij} = E[ X_{ij} ] = \beta_j \lambda_k, X_{ij} ~ Poi( \beta_j \lambda_k )
where:
- \beta_j is the percentage of claims reported in development year j, which is the same across column j, with \sum_j \beta_j = 1
- \lambda_k is a calendar year effect, which is the same across each diagonal, with k = i + j - 1
To estimate the \beta_j and \lambda_k, construct the likelihood function, which is the product over all observed cells in the upper-left triangle:
L( \beta, \lambda ) = \prod_{ i,j : i+j-1 \le t } e^{ -\beta_j \lambda_{i+j-1} } \frac{ ( \beta_j \lambda_{i+j-1} )^{ x_{ij} } }{ x_{ij}! }
Taking the log and maximising yields the MLEs:
\hat{\beta}_j = \frac{ K_j }{ \sum_{k=j}^{t} \hat{\lambda}_k }, \hat{\lambda}_k = \frac{ D_k }{ \sum_{j=1}^{k} \hat{\beta}_j }
where:
- K_j are the sums of column j
- D_k are the sums of diagonal k = i + j - 1
These MLEs give a direct recursive way to estimate each parameter, starting from the top-right corner of the triangle (j = t, k = t):
\hat{\lambda}_t = D_t (since \sum_{j=1}^{t} \hat{\beta}_j = 1), \hat{\beta}_t = K_t / \hat{\lambda}_t
then moving one column/diagonal at a time towards the bottom-left, at each recursion using:
\hat{\lambda}_k = \frac{ D_k }{ \hat{\beta}_1 + ... + \hat{\beta}_k } = \frac{ D_k }{ 1 - \sum_{j=k+1}^{t} \hat{\beta}_j }, \hat{\beta}_j = \frac{ K_j }{ \sum_{k=j}^{t} \hat{\lambda}_k }
Since there are no data to estimate \lambda_k for k > t, i.e. future calendar years, we extrapolate from the fitted \hat{\lambda}_k for k \le t, using either linear regression or log-linear extrapolation, i.e. linear regression on the logs.
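A sketch of this recursion (Python, NumPy), again on an incremental n x n triangle with NaN below the leading anti-diagonal; the indexing is 0-based and the extrapolation of future calendar-year effects is left out:

```python
import numpy as np

def separation_method(tri):
    """Arithmetic separation method on an n x n incremental triangle (NaN below
    the leading anti-diagonal). Returns (beta, lam): beta[j] is the proportion
    reported in development year j (summing to 1), lam[k] the calendar-year
    effect for k = i + j (0-based)."""
    tri = np.asarray(tri, dtype=float)
    n = tri.shape[0]
    K = np.nansum(tri, axis=0)                                   # column sums
    D = np.array([np.nansum([tri[i, k - i] for i in range(k + 1)]) for k in range(n)])
    beta, lam = np.zeros(n), np.zeros(n)
    lam[n - 1] = D[n - 1]                    # betas on the longest diagonal sum to 1
    beta[n - 1] = K[n - 1] / lam[n - 1]
    for k in range(n - 2, -1, -1):
        lam[k] = D[k] / (1.0 - beta[k + 1:].sum())   # diagonal k uses beta_0..beta_k
        beta[k] = K[k] / lam[k:].sum()               # column k lies on diagonals k..n-1
    return beta, lam
```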


de Vijlder's Least Squares Method
In this method we set \gamma = 1 and assume that the response has a Normal distribution, with a mean of the same multiplicative form as in the chain-ladder method:
\mu_{ij} = E[ X_{ij} ] = \alpha_i \beta_j, X_{ij} ~ N( \alpha_i \beta_j, \sigma^2 )
Since the response distribution is Normal, maximum likelihood is equivalent to least squares, so the estimates \hat{\alpha}_i and \hat{\beta}_j are found by minimising the sum of squares over the observed cells:
\sum_{i,j} ( x_{ij} - \alpha_i \beta_j )^2
However, in this method there is an extra parameter, \sigma^2, to estimate as well.

Bornhuetter-Ferguson Method
In the BF method, the future claims are determined by loss ratios and development factors. The BF method is demonstrated below using the following example.
The ultimate loss ratios for underwriting years 2004, 2005 and 2006 are expected to be in line with 2003, and the total claims paid to date are $1,942,000. Calculate the BF estimate of the IBNR reserve required and state the assumptions underlying this estimate.
Assumptions:
Accident year 2003, i.e. the first accident year, is fully run-off
Each accident year will develop in the same way
The data is already adjusted for inflation, or past inflation pattern will repeat in future
The estimated loss ratio is appropriate
Method:
1. Transform the run-off table into cumulative form if it isn't already in cumulative form.
2. Derive the development factors d_j from the column sums of the cumulative run-off triangle, using only the rows for which both development years are observed:
d_j = \frac{ \sum_k C_{k,j+1} }{ \sum_k C_{k,j} }
In this example, the development factors are:
DY0 -> DY1: d_1 = ( 620 + 660 + 700 ) / ( 473 + 512 + 611 ) = 1.2406
DY1 -> DY2: d_2 = ( 690 + 750 ) / ( 620 + 660 ) = 1.125
DY2 -> DY3: d_3 = 715 / 690 = 1.0362


3. Calculate the loss ratio:
Loss Ratio = Incurred Claims / Earned Premiums
In this example, the loss ratio for every accident year is assumed to follow 2003, which is:
LR = 715 / 860 = 0.8314
4. Find the initial estimate of the ultimate loss (IUL) for each accident year i using:
IUL_i = Loss Ratio x Premium_i
In this example, the initial ultimate loss for each accident year is:
2003: 0.8314 x 860 = 715
2004: 0.8314 x 940 = 781.516
2005: 0.8314 x 980 = 814.772
2006: 0.8314 x 1020 = 848.028
5. Calculate the expected claims to date (EC), i.e. given the IUL for an accident year, how much of the claims would we expect to have arisen already:
EC_i = IUL_i / (cumulative development factor)
The cumulative development factor is the product of the development factors from step 2 for the development years still outstanding. In this example, 2003 is already fully developed, so its EC is 715, and for the other years:
2004: 781.516 / d_3 = 754.213
2005: 814.772 / ( d_2 d_3 ) = 698.955
2006: 848.028 / ( d_1 d_2 d_3 ) = 586.384
6. Finally, calculate the ultimate loss (UL) for each accident year and the IBNR reserve required:
UL_i = claims already reported + claims yet to be reported = (last value in row i) + ( IUL_i - EC_i )

In this example, the results are shown in the table below.
Accident Year | Claims Already Reported | Claims Yet to be Reported (IUL - EC) | Ultimate Claims
2003          | 715                     | 0                                    | 715
2004          | 750                     | 781.516 - 754.213 = 27.303           | 777.303
2005          | 700                     | 814.772 - 698.955 = 115.817          | 815.817
2006          | 647                     | 848.028 - 586.384 = 261.644          | 908.644
Total ultimate claims: 3216.764
Claims paid to date: 1942
IBNR reserve: 1274.764
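The whole calculation can be reproduced in a few lines (Python, NumPy); the cumulative triangle and earned premiums below are the figures implied by the worked steps above:

```python
import numpy as np

# cumulative claims for accident years 2003-2006 and the earned premiums
C = [[473., 620., 690., 715.],
     [512., 660., 750.],
     [611., 700.],
     [647.]]
premiums = np.array([860., 940., 980., 1020.])

d = [sum(r[j + 1] for r in C if len(r) > j + 1) / sum(r[j] for r in C if len(r) > j + 1)
     for j in range(3)]                              # 1.2406, 1.1250, 1.0362

LR = C[0][-1] / premiums[0]                          # loss ratio 715/860
IUL = LR * premiums                                  # initial ultimate losses
cdf = [np.prod(d[len(r) - 1:]) for r in C]           # outstanding development factors
EC = IUL / cdf                                       # expected claims to date
UL = np.array([r[-1] for r in C]) + (IUL - EC)       # ultimate losses by accident year
print(UL.sum() - 1942)                               # IBNR reserve, about 1274.76
```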

Module 8: Decision Theory & Game Theory


Decision Theory
Decision theory is concerned with making decisions in the face of risk and uncertainty.
Decision under Risk
Condition of risk exists when perfect information is not available, but the probabilities of
outcomes can be estimated. Two useful tools for decision making under risk are:
Expected value: choose the alternative with the highest expected value
Expected opportunity loss: choose the alternative with the lowest expected regret or
opportunity loss, i.e. the loss experienced from a particular decision
Decision under Uncertainty
Decision making under uncertainty refers to situations in which the probabilities of the potential outcomes are not known. Consider the following example, with the payoff (in $) from three investments under three states of economic growth:
Investment | Accelerated growth | Normal growth | Slow growth
stocks     | 10000              | 6500          | -4000
bonds      | 8000               | 6000          | 1000
savings    | 5000               | 5000          | 5000
Laplace Criterion
The Laplace criterion selects the alternative with the highest expected value under the assumption that all states of growth are equally likely to occur. In this example, bonds and savings are equally good:
E[stocks] = \frac{1}{3}( 10000 + 6500 - 4000 ) = 4167
E[bonds] = \frac{1}{3}( 8000 + 6000 + 1000 ) = 5000
E[savings] = 5000
Maximin Criterion & Minimax Criterion
The maximin or Wald criterion applies to gains and selects the alternative that maximizes the
minimum returns/payoffs. It is based on the assumption that the decision maker is pessimistic about
the future. In this example, we focus solely on the slow growth state and select the investment that
maximizes payoff in this state, i.e. savings.
If the data is w.r.t losses instead of gains like the example, then the corresponding criterion is the
minimax criterion. First calculate the maximum possible losses of each alternative, and then choose
the alternative with the minimum.
Maximax Criterion
The maximax criterion selects the alternative that maximizes the maximum returns/payoffs. It is
based on the assumption that the decision maker is optimistic about the future. In this example, we
focus solely on the accelerated growth state and select the investment that maximizes payoff in this
state, i.e. stocks.


Hurwicz Criterion
The Hurwicz criterion is a compromise between the maximin and maximax criteria, assuming that
the decision maker is neither pessimistic nor optimistic, but somewhere in between.

Choose the alternative that maximises \alpha \cdot (maximum payoff) + (1 - \alpha) \cdot (minimum payoff), where \alpha is the coefficient of optimism.
In this example, for a coefficient of \alpha = 0.6, we take a weighted average of the payoffs under the accelerated growth state (optimistic) and the slow growth state (pessimistic): stocks 0.6(10000) + 0.4(-4000) = 4400, bonds 0.6(8000) + 0.4(1000) = 5200, savings 5000. The best investment in this case is bonds.

Minimax Regret Criterion


The minimax regret criterion is based on minimising the maximum opportunity loss, i.e. the loss experienced from choosing a particular decision over another. For this example, the matrix of opportunity losses (regrets) is:
Investment | Accelerated growth | Normal growth | Slow growth
stocks     | 0                  | 0             | 9000
bonds      | 2000               | 500           | 4000
savings    | 5000               | 1500          | 0
For example, if the state is accelerated growth then stocks are best, so the opportunity loss from investing in bonds is 10,000 - 8,000 = 2,000. The maximum regret for each alternative is then $9000, $4000 and $5000 respectively. Therefore, to minimise the maximum regret, the bond investment is the best.
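These criteria are easy to tabulate (Python, NumPy) using the payoff matrix above; the "normal growth" label for the middle state is an assumption, and the Hurwicz coefficient is the \alpha = 0.6 of the example:

```python
import numpy as np

payoff = np.array([[10000, 6500, -4000],    # stocks: accelerated, normal, slow growth
                   [ 8000, 6000,  1000],    # bonds
                   [ 5000, 5000,  5000]])   # savings
names = ["stocks", "bonds", "savings"]

laplace = payoff.mean(axis=1)                                    # 4167, 5000, 5000
maximin = payoff.min(axis=1)                                     # pick the largest minimum
maximax = payoff.max(axis=1)                                     # pick the largest maximum
hurwicz = 0.6 * payoff.max(axis=1) + 0.4 * payoff.min(axis=1)    # 4400, 5200, 5000
regret = payoff.max(axis=0) - payoff                             # opportunity-loss matrix
max_regret = regret.max(axis=1)                                  # 9000, 4000, 5000

for label, vals, best in [("Laplace", laplace, np.argmax(laplace)),
                          ("maximin", maximin, np.argmax(maximin)),
                          ("maximax", maximax, np.argmax(maximax)),
                          ("Hurwicz", hurwicz, np.argmax(hurwicz)),
                          ("minimax regret", max_regret, np.argmin(max_regret))]:
    print(label, vals, "->", names[best])
```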

Game Theory
Game theory is concerned with decisions in the face of conflict, which exists when the interests of two or more parties are in competition. In a two-person, zero-sum game, the payoffs can be represented using a payoff matrix. A zero-sum game is a game where the sum of the payoffs among the players is zero: the gain from a strategy of Player A is the loss to Player B. For example, in the payoff matrix considered here, if Player A selects alternative 1 and Player B selects alternative y, then the gain to Player A, equivalently the loss to Player B, is 40.
In general, Player A will use the maximin principle for gains, i.e. find the minimum gains of each
alternative and choose the highest one. Player B will use the minimax principle for losses, i.e. find
the maximum loss of each alternative and choose the lowest one.

Pure Strategy Nash Equilibrium
In the previous example, if both players follow the maximin/minimax principle, then:
Player A will choose strategy 1 since it maximizes the minimum payoff
Player B will choose strategy y since it minimizes the maximum loss
Therefore, the solution to the game, i.e. the strategy selection that satisfies both players, is Player A
chooses 1 and Player B chooses y. If A assumes B is attempting to minimise its own losses and
selects y, then A will choose 1 and if B knows that A has selected 1, B will minimise its own losses
by selecting y. Thus, neither player has any incentive to move to a different strategy nor such a
strategy is known as a pure strategy. Pure strategies exist only when the solution has Nash
equilibrium, and this point in the payoff matrix is known as a saddle point. The saddle point will
always be the one that is largest in its column, but smallest in its row.
A strategy is dominant if regardless of what the other player does, it is still better than the
alternative strategy, known as the dominated strategy. In this example, for Player A strategy 1 is
always better regardless of which strategy B chooses. If a dominant strategy exists, then we can
remove the dominated strategy from the payoff matrix, since Player A will never consider it.
Mixed Strategies Nash Equilibrium
A mixed strategy arises when the game does not result in a unique pure strategy, i.e. there is no equilibrium/saddle point, and the game reaches a point where a player has to choose randomly among two or more strategies. Consider the following scenario:
- The union wants to maximise the employees' gains
- The management wants to minimise its losses

Firstly, we can reduce the payoff matrix by removing dominated strategies: for the union, strategies 1 and 4 always have higher gains than 2 and 3, while for the management, strategies y and z always have lower losses than w and x. This reduces the payoff matrix (gains to the union) to:
      y    z
1    65   45
4    50   55
Using the minimax approach, if the union selects strategy 4, then the management will select strategy y. But given the management selects strategy y, the union will switch to strategy 1, since it has a higher payoff of 65. Thus the game ends up in an infinite loop with no saddle point being reached, where one of the players is always dissatisfied and wants to change.
Expected Gains/Losses Approach
When there is no saddle point, one can select a strategy in a random fashion with a certain probability structure, i.e. select each strategy a certain percentage of the time such that the player's expected gains or losses are the same regardless of the opponent's selection of strategies. Selecting a strategy a given percentage of the time is equivalent to selecting it with a given probability on a one-time basis. We can solve for this probability by equating the expected gains of the two possible strategies.

In the union and management example:
- If management selects y, the possible payoffs to the union are 65 and 50, so if the union selects strategy 1 with probability p, the union's expected gain is 65p + 50(1 - p)
- If management selects z, the possible payoffs to the union are 45 and 55, so the union's expected gain is 45p + 55(1 - p)
p is determined such that the union is indifferent between strategies 1 and 4 regardless of what the management selects, i.e. the expected gains from both possibilities must be the same:
65p + 50(1 - p) = 45p + 55(1 - p) => 25p = 5 => p = 0.2
This means that the union will select strategy 1 20% of the time and strategy 4 80% of the time. Under this strategy, regardless of the decision the management makes, the expected gain to the union is the same (53).
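For a 2 x 2 game the equalising probability can be computed directly (Python), using the reduced union/management matrix above:

```python
import numpy as np

# rows: union strategies 1 and 4; columns: management strategies y and z (gains to the union)
A = np.array([[65., 45.],
              [50., 55.]])

# choose p = P(play strategy 1) so the expected gain is the same against either column
p = (A[1, 1] - A[1, 0]) / (A[0, 0] - A[1, 0] - A[0, 1] + A[1, 1])
value = p * A[0, 0] + (1 - p) * A[1, 0]
print(p, value)          # 0.2 and 53.0
```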

Example: Reinsurance Decisions


ABC Company is a large organization responsible for managing and staging sporting events. An
insurance company XYZ has made an agreement to provide insurance coverage for disasters
leading to cancellation of a sporting event. Under the terms of the contract agreement the insurer
will cover the total cost of the claim for up to three disasters in any single year. XYZ is now
considering taking out reinsurance where the reinsurer will cover any excess of the total cost over
$2 million per disaster. The reinsurer has quoted premiums for two different types of coverage, D1 (premium $500,000) and D2 (premium $1,000,000), described below; alternatively, XYZ can choose not to reinsure (D0).

Assume that XYZ models total loss from a single disaster using a Pareto distribution with mean
$1.5 million and standard deviation $2.5 million. The density for a Pareto distribution is:


f_X(x) = \frac{ \alpha \lambda^\alpha }{ ( \lambda + x )^{ \alpha + 1 } }
Given the mean and s.d., we can solve for the parameters:
E[X] = \frac{ \lambda }{ \alpha - 1 } = 1,500,000
Var(X) = \frac{ \alpha \lambda^2 }{ ( \alpha - 1 )^2 ( \alpha - 2 ) } = 2,500,000^2
=> \frac{ \alpha }{ \alpha - 2 } = \left( \frac{2.5}{1.5} \right)^2 => \alpha = 3.125, \lambda = 1,500,000 ( \alpha - 1 ) = 3,187,500
Then without reinsurance, i.e. D0, the expected loss per disaster is $1.5 million. Reinsurance will cover any excess over $2 million, so the expected amount paid by the reinsurer per disaster is:
E[ ( X - 2,000,000 )_+ ] = \int_{2,000,000}^{\infty} ( x - 2,000,000 ) f(x) dx
This integral is difficult to evaluate directly; instead we can use:
\int_{d}^{\infty} ( x - d ) f(x) dx = \int_{d}^{\infty} ( 1 - F(x) ) dx = \int_{d}^{\infty} \left( \frac{ \lambda }{ \lambda + x } \right)^{\alpha} dx = \frac{ \lambda + d }{ \alpha - 1 } \left( \frac{ \lambda }{ \lambda + d } \right)^{\alpha} = 532,889
Then the expected amount payable by XYZ per reinsured disaster is:
E[ \min( X, d ) ] = E[X] - E[ ( X - d )_+ ] = 1,500,000 - 532,889 = 967,111
There are 4 possible states of nature, denoted \theta_0, \theta_1, \theta_2 and \theta_3, representing 0, 1, 2 and 3 or more disasters (XYZ only covers up to three). Then if XYZ decides to go with:
- D0: no reinsurance, so the reinsurance premium is zero. The net total claims are just $1.5 million multiplied by the number of disasters
- D1: the reinsurance premium is $500,000 and the reinsurance applies to one disaster. The net total claim for 1 disaster is $967,111, and for any further disasters it is $967,111 plus $1.5 million per additional disaster
- D2: the reinsurance premium is $1 million and the reinsurance applies to two disasters. The net total claim is $967,111 for 1 disaster, 2 x $967,111 for 2 disasters, and 2 x $967,111 plus $1.5 million for 3 or more disasters
This then becomes a decision under uncertainty problem, and the decision matrix, in terms of total losses (reinsurance premium plus net claims), is:
Decision | \theta_0  | \theta_1  | \theta_2  | \theta_3
D0       | 0         | 1,500,000 | 3,000,000 | 4,500,000
D1       | 500,000   | 1,467,111 | 2,967,111 | 4,467,111
D2       | 1,000,000 | 1,967,111 | 2,934,222 | 4,434,222
The minimax solution to this decision problem is the alternative that minimises the maximum loss, and the maximum loss is incurred when there are 3 disasters. The minimum of these maximum losses is achieved by D2, i.e. the reinsurance plan that covers 2 disasters.
Now suppose that XYZ believes that the number of disasters each year follows a Poisson distribution with mean 0.9. Then the probability of each state of nature is:
P(\theta_0) = e^{-0.9} = 0.407
P(\theta_1) = 0.9 e^{-0.9} = 0.365
P(\theta_2) = 0.9^2 e^{-0.9} / 2 = 0.165
P(\theta_3) = 1 - 0.407 - 0.365 - 0.165 = 0.063
This becomes a decision under risk problem now that we have a probability structure. Thus, we should choose the decision that minimises the expected loss, which is no reinsurance, D0.
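A sketch reproducing the decision matrix and both criteria (Python, NumPy); the loss figures follow the description above (reinsurance premium plus XYZ's net claims):

```python
import numpy as np

# Pareto(alpha, lam) with density alpha * lam**alpha / (lam + x)**(alpha + 1)
alpha, lam, d = 3.125, 3_187_500, 2_000_000
EX = lam / (alpha - 1)                                            # 1,500,000
stop_loss = (lam / (lam + d)) ** alpha * (lam + d) / (alpha - 1)  # E[(X - d)+] ~ 532,889
retained = EX - stop_loss                                         # E[min(X, d)] ~ 967,111

# total losses (premium + net claims) under 0, 1, 2, 3+ disasters
D0 = np.array([0, EX, 2 * EX, 3 * EX])
D1 = 500_000 + np.array([0, retained, retained + EX, retained + 2 * EX])
D2 = 1_000_000 + np.array([0, retained, 2 * retained, 2 * retained + EX])
table = np.vstack([D0, D1, D2])

print(table.max(axis=1))                  # minimax: D2 has the smallest maximum loss
p = np.array([np.exp(-0.9), 0.9 * np.exp(-0.9), 0.405 * np.exp(-0.9)])
p = np.append(p, 1 - p.sum())             # P(0), P(1), P(2), P(3 or more)
print(table @ p)                          # expected losses: D0 is the smallest (~1.33m)
```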