Beruflich Dokumente
Kultur Dokumente
Linear Models
2007 CAS Predictive Modeling Seminar
Prepared by
Louise Francis
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com
Louise_francis@msn.com
October 11, 2007
Objectives
Gentle introduction to Linear
Models
Illustrate some simple applications
of linear models
Address some practical modeling
issues
Show features common to LMs
and GLMs
Predictive Modeling
GLMs
Data Mining
Severity
Year
fitted values
) 2
min (Yi Y )
Total Award
Initial Indemnity reserve
Policy Limit
Attorney Involvement
Lags
Closing
Report
Injury
Sprain, back injury, death, etc
Simple Illustration
Total Settlement vs. Initial Indemnity Reserve
Scatterplot Matrix
Analysis Tool
Pak (Add In) that
comes wit Excel
Click Tools, Data
Analysis,
Regression
Y=lnTotal
Award
10.13
14.08
10.31
Predicted Residual
11.76
-1.63
12.47
1.61
11.65
-1.34
model (SST)
Adjusted R2
R2 Statistic
Significance of Regression
F statistic:
residual components
determine if it is significant
Intercept
lnInitialIndemnity
Res
Coefficients
10.343
Standard
Error
0.112
0.154
0.011
t Stat
P-value
92.122
0
13.530
8.21E-40
Random Residual
zi =
( yi yi )
se
, se =
(y
)
y
i i
i=1
N k 1
500
1000
-4
Observation
1500
2000
Outliers
May represent error
May be legitimate but have undue influence
on regression
Can downweight oultliers
Non-Linear Relationship
Non-Linear Relationships
Suppose Relationship between dependent and
Transformation of Variables
Apply a transformation to either the
Y = log(Y)
X = log(X)
X = 1/X
Y=Y1/2
Transformation of Variables:
Skewness of Distribution
Use Exploratory Data Analysis to detect skewness, and heavy tails
Box Plot
Histogram
500
20
450
400
18
Y Values
14
12
10
8
6
4
2
0
Frequency
16
350
300
250
200
150
100
50
0
lnTotalAward
9.6
10.4 11.2
12
16
16.8 17.6
Transformation of Variables
Suppose the Claim Severity is a function of the
Total
567,889
168,747
863,485
1,097,402
801,748
302,500
Model
Model is Model Y = ai, where i is a category of
16, 000. 00
14, 000. 00
12, 000. 00
10, 000. 00
Drop Series Fields Here
8, 000. 00
6, 000. 00
4, 000. 00
2, 000. 00
BRUISE
Y=a1
Total
BURN
CRUSHI CUT/ PU
NG
NCT
EYE
534.23
FRACTU
RE
OTHER
SPRAIN STRAIN
Y=a
9
Injur y
Two Categories
Model Y = ai, where i is a category
Mean
Variance
Observations
Hypothesized Mea
df
t Stat
P(T<=t) one-tail
t Critical one-tail
P(T<=t) two-tail
t Critical two-tail
Variable 1
124,002
2.35142E+11
354
0
1591
-7.17
0.00
1.65
0.00
1.96
Variable 2
440,758
1.86746E+12
1448
Involvement
Variable is 1 If Attorney Involved, and 0
Otherwise
Attorneyinvolvement-insurer
Y
Y
Y
N
Y
N
Y
Y
N
Attorney TotalSettlement
1
25000
1
1300000
1
30000
0
42500
1
25000
0
30000
1
36963
1
145000
0
875000
Design Matrix
Injury Code
1
1
12
11
17
Injury_Other
0
0
0
0
1
bn*Xn+e
or categorical dummies
Multiple Regression
Y = a + b1* Initial Reserve+ b2* Report Lag + b3*PolLimit
+ b4*age+ ciAttorneyi+dkInjury k+e
SU M M A R Y O U TPU T
R egression Statistics
M ultiple R
R Square
Adjusted R Square
Standard Error
O bservations
0.49844
0.24844
0.24213
1.10306
1802
A N O VA
df
R egression
R esidual
Total
Intercept
lnInitialIndem nityR es
lnR eportlag
Policy Lim it
C lm t Age
Attorney
Injury_Backinjury
Injury_Braindam age
Injury_Burnschem ical
Injury_Burnsheat
Injury_C irculatorycondition
15
1786
1801
SS
718.36
2173.09
2891.45
MS
47.89
1.22
t Stat
64.374
9.588
1.887
4.405
-1.037
10.599
-1.995
3.719
2.375
3.645
1.196
F
39.360
P-value
0.000
0.000
0.059
0.000
0.300
0.000
0.046
0.000
0.018
0.000
0.232
Y = u+ai + bj + + e, where ai + bj
are offsets to u
Multicollinearity
Predictor variables are assumed
uncorrelated
Assess with correlation matrix
variables
Use Factor analysis or Principle components
to produce a new variable which is a
weighted average of the correlated variables
Use stepwise regression to select variables
to include
GLMs
Link functions
Use dummy coding for
categorical variables
Deviance
Test significance of
coefficients