Copula Regression

BY
RAHUL A. PARSA
DRAKE UNIVERSITY
&
STUART A. KLUGMAN
SOCIETY OF ACTUARIES
Outline of Talk
OLS Regression
Generalized Linear Models (GLM)
Copula Regression
Continuous case
Discrete Case
Examples
Notation
Y: dependent variable
$X_1, X_2, \ldots, X_k$: independent variables
Assumption
Y is related to the Xs in some functional form:
$$E[Y \mid X_1 = x_1, \ldots, X_k = x_k] = f(x_1, x_2, \ldots, x_k)$$
OLS Regression
Y is linearly related to Xs
OLS Model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
OLS Regression
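$$\min \sum_i (Y_i - \hat{Y}_i)^2$$
Estimated Model
$$\hat{\beta} = (X'X)^{-1} X'Y$$
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_k X_{ki}$$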
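As a minimal sketch of the least-squares fit described on this slide, the snippet below estimates the coefficients with NumPy on simulated data. The function names, the simulated data, and the seed are illustrative assumptions and are not taken from the talk.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients (intercept first) for a regression of y on X."""
    X1 = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    # lstsq minimizes ||y - X1 b||^2, which is the least-squares criterion
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def ols_predict(X, beta):
    """Fitted values: intercept plus linear combination of the covariates."""
    return beta[0] + np.asarray(X) @ beta[1:]

# Hypothetical data: two covariates, known coefficients, small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

beta_hat = ols_fit(X, y)
sse = np.sum((y - ols_predict(X, beta_hat)) ** 2)   # minimized sum of squares
```

Using `lstsq` rather than forming the inverse of X'X explicitly is a numerical-stability choice; both give the same estimate when X has full column rank.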
OLS
Multivariate Normal Distribution
Assume $Y, X_1, X_2, \ldots, X_k$ jointly follow a multivariate normal distribution.
Then the conditional distribution of Y | X is normal, with mean and variance given by
$$E(Y \mid X = x) = \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (x - \mu_X)$$
$$\text{Variance} = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{YX}^{T}$$
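The conditional mean and variance of a multivariate normal can be computed directly from a partitioned covariance matrix. The following is a small sketch under the assumption that Y is stored as the first coordinate; the mean vector, covariance matrix, and conditioning point are made-up illustrative values.

```python
import numpy as np

def mvn_conditional(mu, Sigma, x):
    """Conditional mean and variance of Y | X = x, with Y as the first coordinate."""
    mu = np.asarray(mu, float)
    Sigma = np.asarray(Sigma, float)
    mu_y, mu_x = mu[0], mu[1:]
    S_yy, S_yx, S_xx = Sigma[0, 0], Sigma[0, 1:], Sigma[1:, 1:]
    w = np.linalg.solve(S_xx, S_yx)            # Sigma_XX^{-1} Sigma_XY
    mean = mu_y + w @ (np.asarray(x, float) - mu_x)
    var = S_yy - S_yx @ w                      # does not depend on x
    return mean, var

# Illustrative inputs (assumed, not from the talk)
mu = [0.0, 0.0, 0.0]
Sigma = [[1.0, 0.7, 0.7],
         [0.7, 1.0, 0.7],
         [0.7, 0.7, 1.0]]
cond_mean, cond_var = mvn_conditional(mu, Sigma, x=[0.5, -1.0])
```

Note that the conditional variance is constant in x, which is the homoscedasticity that OLS assumes.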
OLS & MVN

Y-hat = Estimated Conditional mean
It is the MLE
Estimated Conditional Variance is the error variance
OLS and MLE result in same values
Closed form solution exists
GLM
Y belongs to an exponential family of distributions
$$E(Y \mid X = x) = g^{-1}(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$$
g is called the link function

x's are not random
Y|x belongs to the exponential family
Conditional variance is no longer constant
Parameters are estimated by MLE using numerical
methods
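To illustrate what "MLE using numerical methods" can look like for a GLM, here is a sketch of iteratively reweighted least squares for one specific case: a Poisson response with a log link. The choice of family, link, simulated data, and function name are assumptions made for the example; the talk does not prescribe them.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25, tol=1e-10):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    beta[0] = np.log(np.mean(y))          # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X1 @ beta                   # linear predictor
        mu = np.exp(eta)                  # inverse link applied to eta
        z = eta + (y - mu) / mu           # working response
        WX = X1 * mu[:, None]             # IRLS weights equal mu for this family/link
        beta_new = np.linalg.solve(X1.T @ WX, WX.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical data generated from a known model
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 1))
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 0]))
beta_hat = fit_poisson_glm(X, y)
```

Unlike OLS, there is no closed-form solution here; each iteration solves a weighted least-squares problem until the coefficients stop changing.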
GLM
Generalization of GLM: Y can be any distribution
(See Loss Models)

Computing predicted values is difficult
No convenient expression for the conditional variance
Copula Regression
Y can have any distribution
Each Xi can have any distribution
The joint distribution is described by a Copula
Estimate Y by the conditional mean E(Y | X = x)
Copula
Ideal Copulas will have the following properties:
ease of simulation
closed form for conditional density
different degrees of association available for different
pairs of variables.
Good Candidates are:

Gaussian or MVN Copula
t-Copula
MVN Copula
The CDF for the MVN copula is
$$F(x_1, x_2, \ldots, x_n) = G(\Phi^{-1}[F(x_1)], \ldots, \Phi^{-1}[F(x_n)])$$
Where G is the multivariate normal cdf with zero
mean, unit variance, and correlation matrix R.

The density of the MVN copula is
$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) \exp\left\{-\frac{v^T (R^{-1} - I) v}{2}\right\} |R|^{-0.5}$$
where v is a vector with ith element $v_i = \Phi^{-1}[F(x_i)]$.
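The MVN copula density on this slide can be evaluated from marginal pdf and cdf values alone. The sketch below assumes NumPy and the standard-library `NormalDist` for the normal quantile function; the two-parameter Pareto form used for the marginals, and the evaluation point, are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

_N01 = NormalDist()   # standard normal, used for the quantile transform

def gaussian_copula_density(pdf_vals, cdf_vals, R):
    """Joint density from marginal pdf/cdf values and a correlation matrix R."""
    R = np.asarray(R, float)
    v = np.array([_N01.inv_cdf(u) for u in cdf_vals])      # normal scores of F(x_i)
    quad = v @ (np.linalg.inv(R) - np.eye(len(v))) @ v      # v'(R^{-1} - I)v
    return float(np.prod(pdf_vals) * np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(R)))

# Assumed marginal form: two-parameter Pareto with shape alpha and scale theta
def pareto_pdf(x, alpha, theta):
    return alpha * theta**alpha / (x + theta) ** (alpha + 1)

def pareto_cdf(x, alpha, theta):
    return 1.0 - (theta / (x + theta)) ** alpha

x1, x2 = 50.0, 120.0
R = [[1.0, 0.7], [0.7, 1.0]]
joint = gaussian_copula_density(
    [pareto_pdf(x1, 3, 100), pareto_pdf(x2, 4, 300)],
    [pareto_cdf(x1, 3, 100), pareto_cdf(x2, 4, 300)],
    R,
)
```

With R equal to the identity matrix the exponent vanishes and the joint density reduces to the product of the marginals, which is a quick sanity check on any implementation.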
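Conditional Distribution in MVN Copula
The conditional distribution of $x_n$ given $x_1, \ldots, x_{n-1}$ is
$$f(x_n \mid x_1, \ldots, x_{n-1}) = f(x_n) \exp\left\{-0.5 \left[\frac{\{\Phi^{-1}[F(x_n)] - r^T R_{n-1}^{-1} v_{n-1}\}^2}{1 - r^T R_{n-1}^{-1} r} - \{\Phi^{-1}[F(x_n)]\}^2\right]\right\} (1 - r^T R_{n-1}^{-1} r)^{-0.5}$$
where
$$v_{n-1} = (v_1, \ldots, v_{n-1})^T, \qquad R = \begin{pmatrix} R_{n-1} & r \\ r^T & 1 \end{pmatrix}$$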
Copula Regression
Continuous Case
Parameters are estimated by MLE.
If $Y, X_1, \ldots, X_k$ are continuous variables, then we use the previous equation to find the conditional mean.
One-dimensional numerical integration is needed to compute the mean.
Copula Regression
Discrete Case
When one of the covariates is discrete
Problem:
determining discrete probabilities from the Gaussian
copula requires computing many multivariate
normal distribution function values and thus
computing the likelihood function is difficult
Solution:
Replace discrete distribution by a continuous
distribution using a uniform kernel.
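One way to read the uniform-kernel idea is that each probability mass p(k) is spread evenly over an interval around k, so the resulting continuous distribution has a density and a CDF that stays strictly between 0 and 1. The sketch below does this for a Poisson variable, assuming a kernel of width 1 centered on each integer; the kernel width and the function names are assumptions, not details given on the slide.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass at integer k, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def continuized_poisson(k, lam):
    """Density and CDF at integer k after spreading p(k) uniformly over (k - 0.5, k + 0.5)."""
    below = sum(poisson_pmf(j, lam) for j in range(k))   # P(X < k)
    p_k = poisson_pmf(k, lam)
    density = p_k                  # mass p(k) over an interval of width 1
    cdf = below + 0.5 * p_k        # half of the mass at k lies to the left of k
    return density, cdf

f3, F3 = continuized_poisson(3, lam=5.0)
```

Because the continuized CDF never hits 0 or 1 at an observed value, its normal score is finite and the continuous-case density formula can be applied without evaluating multivariate normal distribution functions.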
Copula Regression Standard Errors

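How to compute standard errors of the estimates?
As $n \to \infty$, the MLE $\hat{\theta}_n$ converges to a normal distribution with mean $\theta$ and variance $I(\theta)^{-1}$, where
$$I(\theta) = -n \, E\left[\frac{\partial^2}{\partial \theta^2} \ln f(X, \theta)\right]$$
$I(\theta)$ is the information matrix.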
How to compute Standard Errors

Loss Models: To obtain the information matrix, it is necessary to take both derivatives and expected values, which is not always easy. A way to avoid this problem is to simply not take the expected value.
This is called the observed information.
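The observed-information approach can be sketched numerically: take the Hessian of the negative log-likelihood at the MLE by finite differences, invert it, and read the standard errors off the diagonal. The example below uses an exponential model only because its MLE and standard error are known in closed form; the model, step size, and function names are assumptions for illustration.

```python
import numpy as np

def numerical_hessian(fun, theta, rel_step=1e-3):
    """Central finite-difference Hessian of a scalar function at theta."""
    theta = np.asarray(theta, float)
    k = len(theta)
    h = rel_step * np.maximum(np.abs(theta), 1.0)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h[i]
            ej = np.zeros(k); ej[j] = h[j]
            H[i, j] = (fun(theta + ei + ej) - fun(theta + ei - ej)
                       - fun(theta - ei + ej) + fun(theta - ei - ej)) / (4 * h[i] * h[j])
    return H

def standard_errors(neg_loglik, theta_hat):
    """Standard errors from the inverse observed information at the MLE."""
    obs_info = numerical_hessian(neg_loglik, theta_hat)   # no expectation taken
    cov = np.linalg.inv(obs_info)
    return np.sqrt(np.diag(cov)), cov

# Hypothetical sample from an exponential distribution with mean 100
rng = np.random.default_rng(1)
x = rng.exponential(scale=100.0, size=500)
neg_loglik = lambda th: len(x) * np.log(th[0]) + np.sum(x) / th[0]
theta_hat = np.array([x.mean()])          # closed-form MLE of the mean
se, cov = standard_errors(neg_loglik, theta_hat)
```

Dividing the covariance matrix by the outer product of the standard errors gives the correlations between parameter estimates.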
Examples
All examples have three variables
R matrix:
$$R = \begin{pmatrix} 1 & 0.7 & 0.7 \\ 0.7 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$$
Error measured by $\sum_i (Y_i - \hat{Y}_i)^2$
Also compared to OLS
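Data with this dependence structure can be simulated by drawing correlated normals, mapping them to uniforms, and applying each marginal quantile function. The sketch below assumes two-parameter Pareto marginals for the first two variables and, purely as a stand-in, an exponential marginal for the third (a gamma quantile function has no closed form and would need a numerical library). Sample size, seed, and function names are illustrative assumptions.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal CDF applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def pareto_ppf(u, alpha, theta):
    """Quantile function of the assumed two-parameter Pareto."""
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def simulate_gaussian_copula(n, R, ppfs, rng):
    """Draw n rows whose dependence is a Gaussian copula with correlation matrix R."""
    L = np.linalg.cholesky(np.asarray(R, float))
    z = rng.standard_normal((n, len(ppfs))) @ L.T    # rows are N(0, R)
    u = std_normal_cdf(z)                            # uniform marginals
    x = np.column_stack([ppf(u[:, j]) for j, ppf in enumerate(ppfs)])
    return x, z

R = np.full((3, 3), 0.7)
np.fill_diagonal(R, 1.0)
ppfs = [
    lambda u: pareto_ppf(u, 3, 100),
    lambda u: pareto_ppf(u, 4, 300),
    lambda u: -100.0 * np.log1p(-u),   # exponential with mean 100, stand-in for gamma
]
rng = np.random.default_rng(42)
x, z = simulate_gaussian_copula(5000, R, ppfs, rng)
```

The correlation matrix R applies to the normal scores z, not to the simulated x values themselves, whose Pearson correlations will generally differ from 0.7.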
Example 1
Dependent: X3 (Gamma)
Although X2 is simulated from a Pareto distribution, its parameter estimates do not converge, so a gamma model is fit instead.

Variable      Parameters   MLE
X1-Pareto     3, 100       3.44, 161.11
X2-Pareto     4, 300       1.04, 112.003
X3-Gamma      3, 100       3.77, 85.93

Error:
Copula   59000.5
OLS      637172.8
Ex 1 - Standard Errors
Diagonal terms are standard deviations and off-diagonal terms are correlations.
Parameters: X1 Pareto (Alpha1, Theta1), X2 Gamma (Alpha2, Theta2), X3 Gamma (Alpha3, Theta3), and the correlations R(2,1), R(3,1), R(3,2).

            Alpha1    Theta1    Alpha2    Theta2    Alpha3    Theta3    R(2,1)    R(3,1)    R(3,2)
Alpha1    0.266606  0.966067  0.359065  -0.33725  0.349482  -0.33268  -0.42141  -0.33863  -0.29216
Theta1    0.966067  15.50974  0.390428  -0.25236  0.346448  -0.26734  -0.37496  -0.29323  -0.25393
Alpha2    0.359065  0.390428  0.025217  -0.78766  0.438662  -0.35533  -0.45221  -0.30294  -0.42493
Theta2    -0.33725  -0.25236  -0.78766  3.558369  -0.38489  0.464513  0.496853   0.35608  0.470009
Alpha3    0.349482  0.346448  0.438662  -0.38489  0.100156  -0.93602  -0.34454  -0.46358  -0.46292
Theta3    -0.33268  -0.26734  -0.35533  0.464513  -0.93602  2.485305  0.365629  0.482187  0.481122
R(2,1)    -0.42141  -0.37496  -0.45221  0.496853  -0.34454  0.365629  0.010085  0.457452  0.465885
R(3,1)    -0.33863  -0.29323  -0.30294   0.35608  -0.46358  0.482187  0.457452   0.01008  0.481447
R(3,2)    -0.29216  -0.25393  -0.42493  0.470009  -0.46292  0.481122  0.465885  0.481447  0.009706
Example 1 - Cont
Maximum likelihood estimate of the correlation matrix
$$\hat{R} = \begin{pmatrix} 1 & 0.711 & 0.699 \\ 0.711 & 1 & 0.713 \\ 0.699 & 0.713 & 1 \end{pmatrix}$$
Example 2
X1 & X2 estimated empirically

Variable      Parameters   MLE
X1-Pareto     3, 100       F(x) = x/n - 1/(2n), f(x) = 1/n
X2-Pareto     4, 300       F(x) = x/n - 1/(2n), f(x) = 1/n
X3-Gamma      3, 100       4.03, 81.04

Error:
Copula   595,947.5
OLS      637,172.8
GLM      814,264.754
Example 3
The Pareto distribution for X2 is estimated by an exponential distribution

Variable      Parameters   MLE
X1-Poisson                 5.65
X2-Pareto     4, 300       119.39
X3-Gamma      3, 100       3.67, 88.98

Error:
Copula   574,968
OLS      582,459.5
Example 4
X1 & X2 estimated empirically
c = # of obs < x and a = # of obs = x

Variable      Parameters   MLE
X1-Poisson                 F(x) = c/n + a/(2n), f(x) = a/n
X2-Pareto     4, 300       F(x) = x/n - 1/(2n), f(x) = 1/n
X3-Gamma      3, 100       3.96, 82.48

Error:
Copula   559,888.8
OLS      582,459.5
GLM      652,708.98
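The empirical marginal used in these examples, F(x) = c/n + a/(2n) with f(x) = a/n, can be computed from a sample with two sorted searches. The sketch below is one possible implementation with an assumed function name; for a sample with no ties it reduces to (rank - 1/2)/n.

```python
import numpy as np

def empirical_mid_cdf(sample):
    """Return F(x) = c/n + a/(2n) and f(x) = a/n for every observation x."""
    sample = np.asarray(sample)
    n = len(sample)
    srt = np.sort(sample)
    c = np.searchsorted(srt, sample, side="left")         # number of obs below x
    a = np.searchsorted(srt, sample, side="right") - c    # number of obs equal to x
    return c / n + a / (2 * n), a / n

F, f = empirical_mid_cdf([2, 5, 5, 7])
```

For the sample above, the tied value 5 has c = 1 and a = 2, so its CDF value is 1/4 + 2/8 = 0.5 and its density value is 0.5. The values of F stay strictly inside (0, 1), so the normal scores needed by the copula are always finite.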
Example 5
Dependent: X1 (Poisson)
X2 is estimated by an exponential distribution

Variable      Parameters   MLE
X1-Poisson                 5.65
X2-Pareto     4, 300       119.39
X3-Gamma      3, 100       3.66, 88.98

Error:
Copula   108.97
OLS      114.66
Example 6
Dependent: X1 (Poisson)
X2 & X3 estimated empirically

Variable      Parameters   MLE
X1-Poisson                 5.67
X2-Pareto     4, 300       F(x) = x/n - 1/(2n), f(x) = 1/n
X3-Gamma      3, 100       F(x) = x/n - 1/(2n), f(x) = 1/n

Error:
Copula   110.04
OLS      114.66
