
Constructing GLMs - linear predictors

Ch16, pg 16-7
Specimen 2010, Q1
September 2011, Q9 (ii)
GLMs require two types of variable to be defined:

Weight/exposure - these are the weights used in the model fit to attach an
importance to each observation. Eg in a claim frequency model the exposure
would be defined as the length of time the policy has been on risk; for an
average claim size model, the exposure would be the number of claims for the
observation.
Common choices for the prior weight are equal to:
o 1 (eg when modeling claim counts)
o the number of exposures (eg when modeling claim frequency)
o the number of claims (eg when modeling claim severity)

Response - this is the value that the model is trying to predict. Hence, in
the claim frequency model, it is the number of claims for that observation,
and for the average claim size model, it is the total claims cost for that
observation.

In general the name of the model corresponds to the ratio: response/weight, ie:
Claim frequency = number of claims / policy years
Average claim size = cost of claims / number of claims
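
A minimal sketch of these two ratios computed from summarised data (the pandas
column names and figures below are hypothetical):

import pandas as pd

# hypothetical summarised data: one row per rating cell
data = pd.DataFrame({
    "policy_years": [900.0, 25450.0],
    "num_claims":   [45, 1200],
    "claim_cost":   [90000.0, 2640000.0],
})

frequency = data["num_claims"].sum() / data["policy_years"].sum()   # claims per policy year
severity  = data["claim_cost"].sum() / data["num_claims"].sum()     # average claim size
print(frequency, severity)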
A categorical factor is a factor to be used for modeling where the values of
each level are distinct, and often cannot be given any natural ordering or score.
An example is car manufacturer, which has many possible values. By contrast, a non-categorical factor is one that takes a naturally ordered value, eg age or car
value (these may need to be rounded at the input stage to reduce the number of
levels to a convenient number).
An interaction term is used where the pattern in the response variable is
modeled better by including extra parameters for each combination of two or
more factors. This combination adds predictive value over and above the
separate single factors.
Initial analyses
Ch17, pg 23
April 2010, Q8

One-way analyses - these indicate whether a variable contains enough
information to be included in any model (eg if 99.5% of a variable's
exposures are in one level, it may not be suitable for modeling).
Assuming there is some viable distribution by levels of the factor,
consideration needs to be given to any individual levels containing very low
exposure and claim counts. If these levels are not ultimately combined with
other levels, the GLM maximum likelihood algorithm may not converge. (If a
factor level has zero claims and a multiplicative model is being fitted, the
theoretically correct multiplier for that level will be close to zero, and the
parameter estimate corresponding to the log of that multiplier may be so
large and negative that the numerical algorithm seeking the maximum
likelihood will not converge.)
In addition to investigating the exposure and claim distribution, a query of
one-way statistics (eg frequency, severity, loss ratio and pure premium) will
give a preliminary indication of the effect of each factor.
Example:

Factor 1     Factor 2     Exposure    Predicted value    Total response =
(car age)    (mileage)                (apply GLM)        exposure x predicted value
0-1          0-8k              900    0.4                    360
0-1          8k+            25,450    0.8                 20,360
2-6          0-8k            4,700    0.5                  2,350
2-6          8k+            13,025    1.0                 13,025
7+           0-8k            5,273    1.5                  7,909
7+           8k+               652    3.0                  1,956
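
A short sketch reproducing the table above and rolling it up one-way by car age
(the column names are my own; the figures are those shown in the table):

import pandas as pd

cells = pd.DataFrame({
    "car_age":   ["0-1", "0-1", "2-6", "2-6", "7+", "7+"],
    "mileage":   ["0-8k", "8k+", "0-8k", "8k+", "0-8k", "8k+"],
    "exposure":  [900, 25450, 4700, 13025, 5273, 652],
    "predicted": [0.4, 0.8, 0.5, 1.0, 1.5, 3.0],
})
cells["total_response"] = cells["exposure"] * cells["predicted"]

# one-way view by car age: exposure-weighted average predicted value
one_way = cells.groupby("car_age")[["exposure", "total_response"]].sum()
one_way["weighted_avg"] = one_way["total_response"] / one_way["exposure"]
print(one_way)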

Two-way analyses - these consider key statistics summarized by each
combination of a pair of factors. They can be represented graphically with
multiple lines and stacked exposure bars.
Two-way analyses are particularly useful where we think there is some
correlation between levels of two factors, and can help us gain a better
understanding of our data prior to performing a GLM.
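
A minimal sketch of a two-way summary (hypothetical data and column names),
showing claim frequency by each combination of two factors:

import pandas as pd

df = pd.DataFrame({
    "car_age":  ["0-1", "0-1", "2-6", "2-6"],
    "mileage":  ["0-8k", "8k+", "0-8k", "8k+"],
    "exposure": [900.0, 25450.0, 4700.0, 13025.0],
    "claims":   [360, 20360, 2350, 13025],
})

two_way = pd.pivot_table(df, values=["claims", "exposure"],
                         index="car_age", columns="mileage", aggfunc="sum")
frequency = two_way["claims"] / two_way["exposure"]   # frequency by cell
print(frequency)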

Correlation analyses - although not used in the GLM process, an
understanding of the correlations within a portfolio is helpful when
interpreting the results of a GLM. In particular, it can explain why the
multivariate results for a particular factor differ from the univariate results,
and can indicate which factors may be affected by the removal or inclusion
of any other factor in the GLM.
One commonly-used statistic for categorical factors is Cramér's V statistic.
It takes values between 0 and 1. A value of 0 means that knowledge of one
of the two factors gives no knowledge of the value of the other. A value of 1
means that knowledge of one of the factors allows the value of the other
factor to be deduced.
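
A sketch of the calculation, assuming a contingency table of exposure counts
for two categorical factors (the figures are hypothetical; scipy supplies the
chi-squared statistic):

import numpy as np
from scipy.stats import chi2_contingency

# rows = levels of factor A, columns = levels of factor B (hypothetical counts)
table = np.array([[900, 25450],
                  [4700, 13025],
                  [5273, 652]])

chi2_stat, _, _, _ = chi2_contingency(table)
n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2_stat / (n * k))
print(cramers_v)   # 0 = no association, 1 = one factor determines the other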

Distribution analyses - the distribution of key data items can be
considered for the purpose of identifying any unusual features or data
problems that should be investigated prior to modeling. Mainly this
concerns the distribution of claim amounts (that is, the number of claims
by average claim size), which is examined in order to identify features
such as unusually large claims and distortions resulting from average
reserves placed on newly reported claims.
Understanding of the data is enhanced by investigating how the response
distribution varies by different levels of a factor.
Distribution analyses can also highlight specific anomalies that might
require addressing prior to modeling. For example, if many new claims have
a standard average reserve allocated to them, it might be appropriate to
adjust the amount of such an average reserve if it was felt that the average
level was systematically below or above the ultimate average claims cost.
Methods for simplifying factors
Ch16, pg 58-60
September 2010, Q1 (ii), (iii)
1. Group and summarise the data prior to loading - this requires
knowledge of the pattern that is expected. It is now mainly adopted as a
method to thin out redundant codes that have little exposure.
2. Grouping in the modeling package - often called a custom factor, this
method simply assigns a single parameter to represent the relativity for
multiple levels of the factor.
3. Curve fitting, or use of a variate - the levels of a factor are each
assigned an x-value and a polynomial is fitted to the factor. In this case, the
parameters in the model are just the parameters from the polynomial itself,
excluding the constant term (a sketch follows this list).
4. Piecewise curve fitting - the factor levels are broken into sections and a
custom factor and/or curve from methods 2 and 3 is applied to each
section. By combining these in different ways, the join at each section
boundary can be disjoint or piecewise continuous as the modeler thinks
appropriate.
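
A minimal sketch of method 3 (the figures are invented; in the GLM itself the
polynomial terms, excluding the constant, would enter the linear predictor
directly rather than being fitted afterwards as here):

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7])            # eg vehicle age levels
relativity = np.array([0.40, 0.45, 0.52, 0.60,    # fitted relativity per level
                       0.72, 0.85, 1.02, 1.20])

coeffs = np.polyfit(x, np.log(relativity), deg=2)  # quadratic on the log scale
smoothed = np.exp(np.polyval(coeffs, x))           # smoothed relativities
print(smoothed)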
Spatial smoothing
Ch17, pg 14-17
September 2010, Q6
GLMs work well where the number of levels of a factor is small (around 100 or
less), or where the levels form a naturally continuous variable that can be
fitted as a function (say, a polynomial or a set of polynomial splines).
For a factor such as postcode where there are many levels (c. 1.7m in the UK) an
alternative approach such as spatial smoothing is required. This allows the model
to fit many values to the postcode factor, and then removes the noise from these
predictions by adjusting the relativity to take into account neighbouring values.

This improves the predicted values by taking into account the credibility (or lack
thereof) for the response in a single location.
Two main forms of spatial smoothing are typically employed:
Distance-based smoothing
Adjacency-based smoothing
The features of each form of smoothing make it more or less appropriate to use,
depending on the underlying processes behind the loss type being modelled.
Distance-based smoothing incorporates information about nearby location
codes based on the distance between the location codes: the further away a
location code, the less influence (or weight) is given to its experience.
This is true regardless of whether an area is urban or rural, and whether natural
or artificial boundaries (such as rivers) exist between location codes. It may
therefore not be appropriate for certain perils such as theft, where a river
with no bridge separates two areas whose claims experience is quite
different.
As such, distance-based smoothing methods are often employed for weather-related perils, where there is less danger of over- or under-smoothing urban and
rural areas.
Distance-based smoothing methods have the advantage of being easy to
understand and implement, as no distributional assumptions are required in the
algorithm.
Distance-based methods can also be enhanced by amending the distance metric
to include dimensions other than longitude and latitude. Eg including urban
density in the distance metric would allow urban areas to be more influenced by
experience in nearby urban areas than by nearby rural ones, which may be
appropriate.
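
A much-simplified sketch of the idea (hypothetical coordinates, relativities and
exposures; a Gaussian kernel on straight-line distance, which ignores the
urban/rural and boundary issues noted above):

import numpy as np

coords = np.array([[51.50, -0.12],    # latitude, longitude per location code
                   [51.51, -0.10],
                   [51.60, -0.30],
                   [52.20,  0.12]])
raw = np.array([1.30, 0.70, 1.10, 0.95])          # noisy per-location relativities
exposure = np.array([50.0, 40.0, 500.0, 800.0])   # credibility weights

bandwidth = 0.05                                   # controls the degree of smoothing
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
w = np.exp(-(d / bandwidth) ** 2) * exposure       # distance decay x exposure
smoothed = (w * raw).sum(axis=1) / w.sum(axis=1)   # weighted average of neighbours
print(smoothed)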
Adjacency-based smoothing incorporates information about directly
neighbouring location codes. Each location code is influenced by its direct
neighbours, each of which is in turn influenced by its direct neighbours;
distributional assumptions or prior knowledge of the claims processes can be
incorporated in the technique. The algorithms are therefore iterative and
complex to implement.
As this smoothing method relies on defining which location codes neighbor each
other, natural or artificial boundaries (eg rivers or motorways) can be reflected in
the smoothing process.
Location codes tend to be smaller in urban regions and larger in rural areas, so
adjacency-based smoothing can sometimes handle urban and rural differences
more appropriately for non-weather-related perils.
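
A deliberately crude sketch of the iterative flavour of adjacency-based
smoothing (real implementations incorporate distributional assumptions, eg
CAR-type models; everything below is hypothetical):

import numpy as np

raw = np.array([1.30, 0.70, 1.10, 0.95])
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # adjacency of location codes
alpha = 0.5                                           # weight given to neighbour average

smoothed = raw.copy()
for _ in range(20):                                   # iterate towards convergence
    new = smoothed.copy()
    for i, nbrs in neighbours.items():
        new[i] = (1 - alpha) * raw[i] + alpha * smoothed[nbrs].mean()
    smoothed = new
print(smoothed)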

The degree of smoothing is an important consideration. Employing too low a
level of spatial smoothing would mean that near or neighbouring location codes
have little influence on the location code in question. This can result in some of
the random noise element being captured together with the true underlying
residual variation. This causes distortions and reduces the predictiveness of the
model.
Conversely, employing too high a level of spatial smoothing can result in the
blurring of experience so that some of the true underlying residual variation is
lost, again causing distortions.
Appropriate diagnostics should therefore be used to assess the level of smoothing
required.
Model validation
Ch17, pg 37-40
September 2010, Q6 (iv)
Whilst many individual aspects of the model selection can be tested using specific
formal statistical tests during the modeling process, it can also be helpful to
perform an overall validation of the effectiveness of a model by testing its
predictiveness on out-of-sample experience.
Validation samples of, say, 20% of the total data can be withheld from the
modeling process. A range of tests can then be undertaken on this validation
sample comparing actual experience with that predicted by the selected model.
Another related way of assessing the predictiveness of a model is to calculate a
lift curve on an out-of-sample model validation dataset.
This method is also useful for comparing two models of different forms.
One approach is to rank all policies in the validation dataset in order of expected
experience (according to the model being tested), and then to group up the
policies into bands of equal exposure based on this ranking. The actual
experience for each group can then be calculated and displayed as a curve. The
steeper the curve, the more effective the model is at distinguishing between high
and low experience because there is a greater differentiation between the good
and bad risks.
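
A minimal sketch of this banding approach (hypothetical data and column names;
pandas only):

import pandas as pd

val = pd.DataFrame({
    "exposure":  [1.0, 0.5, 1.0, 0.8, 1.0, 0.6, 1.0, 0.9],
    "actual":    [0, 0, 1, 0, 0, 1, 2, 1],           # eg observed claim counts
    "predicted": [0.05, 0.08, 0.11, 0.15, 0.22, 0.30, 0.45, 0.60],
})

val = val.sort_values("predicted")                    # rank by expected experience
val["cum_exposure"] = val["exposure"].cumsum()
val["band"] = pd.cut(val["cum_exposure"], bins=4, labels=False)   # ~equal exposure bands

lift = val.groupby("band")[["actual", "exposure"]].sum()
lift["actual_frequency"] = lift["actual"] / lift["exposure"]
print(lift["actual_frequency"])   # a strongly increasing pattern indicates good differentiation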
Another useful way of presenting the final fitted values from the model, and
comparing them to the observed values, is the gains curve. With this method
the data are sorted high to low according to the fitted model values, and then the
chart shows the cumulative values from the fitted model and the cumulative
observed values from the data. A statistical measure for the lift produced by the
model is called the Gini coefficient. This can be thought of as the area enclosed
by the model curve and the diagonal line as a ratio of the triangle above the
diagonal.

The Gini coefficient is a measure of statistical dispersion that can range from 0 to
1. The higher the Gini coefficient, the more predictive the model.
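
A sketch of the gains-curve and Gini calculation described above, on
hypothetical fitted and observed values (the area is computed with a simple
trapezium rule):

import numpy as np

fitted   = np.array([0.60, 0.45, 0.30, 0.22, 0.15, 0.11, 0.08, 0.05])
observed = np.array([2.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0])

order = np.argsort(-fitted)                            # sort high to low by model
cum_obs = np.cumsum(observed[order]) / observed.sum()  # cumulative observed values
cum_pop = np.arange(1, len(observed) + 1) / len(observed)

# trapezium-rule area under the gains curve (prepend the origin)
x = np.concatenate([[0.0], cum_pop])
y = np.concatenate([[0.0], cum_obs])
area_model = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2)

gini = (area_model - 0.5) / 0.5    # area above the diagonal / triangle above the diagonal
print(gini)                        # higher = more predictive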
The structure of a GLM
Ch16, pg 12
September 2011, Q9 (i)
The linear model structure:

Y_i = \sum_{j=1}^{k} X_{ij} \beta_j + \varepsilon_i

generalises into the following structure:

Y_i = g^{-1} \left( \sum_{j=1}^{k} X_{ij} \beta_j + \xi_i \right) + \varepsilon_i

which can be written in matrix form as:

Y = g^{-1}(X \cdot \beta + \xi) + \varepsilon

where:
g(\cdot) is known as the link function
X is the design matrix of factors
\beta is a vector of parameters to be estimated
\xi is a vector of offsets or known effects - these are included when we know
the effect of an explanatory variable and include this as a known effect
\varepsilon is the error term appropriate to Y
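
As an illustration only (statsmodels is not the software referred to in the
course), a log-link Poisson claim count model of this form, with log exposure
entering as a known offset (the \xi term above), might be fitted as follows;
the data and column names are hypothetical:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "claims":   [36, 2036, 235, 1302, 791, 196],
    "exposure": [900, 25450, 4700, 13025, 5273, 652],
    "car_age":  ["0-1", "0-1", "2-6", "2-6", "7+", "7+"],
    "mileage":  ["0-8k", "8k+", "0-8k", "8k+", "0-8k", "8k+"],
})

model = smf.glm("claims ~ C(car_age) + C(mileage)", data=df,
                family=sm.families.Poisson(),
                offset=np.log(df["exposure"])).fit()
print(model.summary())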

Analysis of the significance of factors


Ch16, pg 23
September 2011, Q9 (iii)
April 2012, Q2
The deviance of a model compares the observed value Y_i with the fitted value \mu_i,
making allowance for the weights and assigning higher importance to errors where
the variance should be small. The total deviance for a model is the sum of the
individual residual deviances:

D = \sum_{i=1}^{n} d(Y_i; \mu_i)

The deviance can be adjusted by the scale parameter \phi to give a standardised
measure (the scaled deviance) that can be compared with other models:

D^* = D / \phi
Tests to analyse the significance of factors used in the model:

Chi-squared statistics - the number of degrees of freedom (df) for the
model is defined as the number of observations less the number of
parameters.
Two nested models (that is, one is a subset of the other) can be compared
using a \chi^2 test for the change in scaled deviance:

D^*_1 - D^*_2 \sim \chi^2_{df_1 - df_2}

F statistics - in cases where the scale parameter \phi for the model is not
known, its estimator is distributed as a \chi^2 distribution, and the ratio of two
\chi^2 distributions is an F distribution:

\frac{D_1 - D_2}{(df_1 - df_2)(D_2 / df_2)} \sim F_{df_1 - df_2,\, df_2}
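
A sketch of the chi-squared test on the change in scaled deviance between two
nested models (hypothetical data; for a Poisson model the scale parameter is 1,
so deviance and scaled deviance coincide):

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

df = pd.DataFrame({
    "claims":   [36, 2036, 235, 1302, 791, 196],
    "exposure": [900, 25450, 4700, 13025, 5273, 652],
    "car_age":  ["0-1", "0-1", "2-6", "2-6", "7+", "7+"],
    "mileage":  ["0-8k", "8k+", "0-8k", "8k+", "0-8k", "8k+"],
})
offset = np.log(df["exposure"])
fam = sm.families.Poisson()

m1 = smf.glm("claims ~ C(car_age) + C(mileage)", data=df, family=fam, offset=offset).fit()
m2 = smf.glm("claims ~ C(car_age)", data=df, family=fam, offset=offset).fit()

p_value = chi2.sf(m2.deviance - m1.deviance, m1.df_model - m2.df_model)
print(p_value)   # a small p-value suggests the extra factor is significant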

Akaike Information Criterion (AIC) - this is a statistic primarily used for
model selection (that is, comparing which of two models is the better fit).
If two models are nested, then the more usual chi-squared test is the most
appropriate to use.
If the models are not nested, the AIC can be used. The AIC for a model is
calculated as:

AIC = -2 x log-likelihood + 2 x number of parameters

The AIC looks at the trade-off of the likelihood of a model against the
number of parameters: the lower the AIC, the better the fit. For example, if
two models fit the data equally well in terms of the log-likelihood, then the
model with fewer parameters is more parsimonious (and therefore
better).
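
A trivial numerical illustration of the AIC trade-off (figures invented):

# same log-likelihood, different numbers of parameters
log_likelihood_a, params_a = -1040.0, 12
log_likelihood_b, params_b = -1040.0, 20

aic_a = -2 * log_likelihood_a + 2 * params_a   # 2104
aic_b = -2 * log_likelihood_b + 2 * params_b   # 2120
print(aic_a < aic_b)                           # True: model A (fewer parameters) is preferred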

Other methods for checking the significance of a factor


Ch16, pg 29-35
April 2012, Q2 (a good example of ways to compare models)

Comparison with expectations - factor significance is initially checked
by considering the spread of relativity values for each level, combined with
the standard errors at each level. If the relativities are being fitted
individually, then the pattern of their values should be consistent with the
definition of the factor under consideration.

For example, if the levels represent a continuous variable (eg vehicle age),
then the relativity should vary smoothly as the factor value increases. The
error ranges of these relativities are also distinct (they don't overlap much),
indicating that the response from the data underlying each level has a
significantly different relativity value. Hence the factor should be accepted.
These can be illustrated on a graph.

Comparison over time - the time consistency check (derived by
interacting each factor in turn with a time-related factor) is important for
pricing work, because typically you will be analyzing data from two to seven
years ago, and then deploying rates for the next year. So if the pattern you
select is moving rapidly over time, then the model average selected will be
inappropriate for future periods.
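
A sketch of how such a check might be set up (hypothetical data; statsmodels
used purely for illustration): the factor is interacted with calendar year and
the interaction terms are inspected for a trend.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "year":     [2019]*4 + [2020]*4 + [2021]*4,
    "car_age":  ["0-1", "0-1", "7+", "7+"] * 3,
    "exposure": [500, 520, 480, 510, 505, 515, 490, 500, 495, 525, 470, 505],
    "claims":   [20, 22, 45, 50, 21, 23, 40, 44, 19, 24, 38, 41],
})

fit = smf.glm("claims ~ C(car_age) * C(year)", data=df,
              family=sm.families.Poisson(),
              offset=np.log(df["exposure"])).fit()
print(fit.params.filter(like=":"))   # interaction terms: a clear trend over year
                                     # suggests the factor's effect is moving over time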

Consistency checks with other factors - note that time is not the only
factor that can be used as a consistency check. If you are producing a
model for a multi-distribution channel business then it is particularly
important that each factor is checked to ensure that it is valid for every
channel.
Differences in the data collection methods by channel can cause problems
here.
A random factor could also be created in the data as a means of checking the
consistency of a factor.

The form of the exponential family


Ch16, pg 6
Tables page 27
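
For reference, in its standard form (the exact notation in the Tables may
differ slightly), the exponential family density can be written as:

f(y_i; \theta_i, \phi) = \exp\left( \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right)

with E[Y_i] = b'(\theta_i) and Var[Y_i] = a_i(\phi)\, b''(\theta_i).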
