Paul Embrechts, ETH Zurich; Rüdiger Frey, University of Leipzig; Alexander McNeil, ETH Zurich
19th International Summer School of the Swiss Association of Actuaries, 10-14 July 2006, University of Lausanne
http://www.pupress.princeton.edu/titles/8056.html
http://www.math.ethz.ch/mcneil/book/
embrechts@math.ethz.ch  ruediger.frey@math.uni-leipzig.de  mcneil@math.ethz.ch
© 2006 (Embrechts, Frey, McNeil)
QUANTITATIVE RISK MANAGEMENT
Concepts, Techniques, Tools
Overview
I. Introduction to QRM and Multivariate Risk Models
II. Modelling Extreme Risks
III. Operational Risk
IV. Credit Risk Management
V. Dynamic Credit Models and Credit Derivatives
Financial Risk
We are primarily concerned with the main categories of financial risk:
Market risk: the risk of a change in the value of a financial position due to changes in the value of the underlying components on which that position depends, such as stock and bond prices, exchange rates, commodity prices, etc.
Credit risk: the risk of not receiving promised repayments on outstanding investments such as loans and bonds, because of the default of the borrower.
Operational risk: the risk of losses resulting from inadequate or failed internal processes, people and systems, or external events.
Insurance Risk
The insurance industry also has a long-standing relationship with risk. The Institute and Faculty of Actuaries use the following definition of the actuarial profession: "Actuaries are respected professionals whose innovative approach to making business successful is matched by a responsibility to the public interest. Actuaries identify solutions to financial problems. They manage assets and liabilities by analysing past events, assessing the present risk involved and modelling what could happen in the future." An additional risk category entering through insurance is underwriting risk: the risk inherent in insurance policies sold.
Some Dates
1950s. Foundations of modern risk analysis are laid by the work of Markowitz and others on portfolio theory.
1970s. Oil crises and the abolition of Bretton Woods turn energy prices and exchange rates into volatile risk factors.
1973. The CBOE (Chicago Board Options Exchange) starts operating. Fischer Black and Myron Scholes publish an article on the rational pricing of options. [Black and Scholes, 1973]
1980s. Deregulation; globalization; mergers on an unprecedented scale; advances in IT.
Growth of Markets
Example 1. Average daily trading volume at the New York Stock Exchange: 1970: 3.5 million shares; 1990: 40 million shares.
Example 2. Global market in OTC derivatives (nominal value):
                         1995          1998
FOREX contracts          $13 trillion  $18 trillion
Interest rate contracts  $26 trillion  $50 trillion
All types                $47 trillion  $80 trillion
Source: BIS; see [Crouhy et al., 2001]. $1 trillion = $1 × 10^12.
Basel II Continued
Two options for the measurement of credit risk: the standardised approach and the internal-ratings-based approach (IRB). Pillar 1 sets out the minimum capital requirements (Cooke ratio): total amount of capital ≥ 8% of risk-weighted assets = MRC (minimum regulatory capital). Explicit treatment of operational risk.
MRC (minimum regulatory capital) := 8% of risk-weighted assets
Extremes Matter
"From the point of view of the risk manager, inappropriate use of the normal distribution can lead to an understatement of risk, which must be balanced against the significant advantage of simplification. From the central bank's corner, the consequences are even more serious because we often need to concentrate on the left tail of the distribution in formulating lender-of-last-resort policies. Improving the characterization of the distribution of extreme values is of paramount importance." [Alan Greenspan, Joint Central Bank Research Conference, 1995]
Concentration Risk
"Over the last number of years, regulators have encouraged financial entities to use portfolio theory to produce dynamic measures of risk. VaR, the product of portfolio theory, is used for short-run, day-to-day profit-and-loss exposures. Now is the time to encourage the BIS and other regulatory bodies to support studies on stress test and concentration methodologies. Planning for crises is more important than VaR analysis. And such new methodologies are the correct response to recent crises in the financial industry." [Scholes, 2000]
Interdisciplinarity
The quantitative risk manager of the future should have a combined skillset that includes concepts, techniques and tools from many fields:
mathematical finance;
statistics and financial econometrics;
actuarial mathematics;
non-quantitative skills, especially communication skills;
humility: QRM is a small piece of a bigger picture!
Risk Factors
Generally the loss Lt+1 for the period [t, t+1] will depend on changes in a number of fundamental risk factors over the period, such as stock prices and index values, yields and exchange rates. Writing Xt+1 for the vector of changes in the underlying risk factors, the loss is given by a formula of the form Lt+1 = l[t](Xt+1), where l[t] : Rd → R is a known function which we call the loss operator. The book contains examples showing how the loss operator is derived for different kinds of portfolio, a process known as mapping.
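For a portfolio of stocks with log-prices as risk factors, the mapping can be sketched as follows (a Python illustration; the holdings and prices are invented for the example, not taken from the slides):

```python
import numpy as np

# Portfolio: lambda_i shares at current prices S_{t,i}; the risk factors are
# log-prices, so X_{t+1,i} = log S_{t+1,i} - log S_{t,i} and the loss operator is
#   l_[t](x) = -sum_i lambda_i * S_{t,i} * (exp(x_i) - 1).
shares = np.array([100.0, 50.0])     # hypothetical holdings lambda_i
prices = np.array([40.0, 120.0])     # hypothetical current prices S_{t,i}

def loss_operator(x):
    """Portfolio loss for a vector x of log-price changes."""
    return -(shares * prices * np.expm1(x)).sum()

print(loss_operator(np.array([-0.01, 0.005])))   # loss when stock 1 falls 1%
```

No change in the risk factors gives zero loss, and a fall in a held stock produces a positive loss, matching the sign convention of the slides.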
Loss Distribution
The loss distribution is the distribution of Lt+1 = l[t](Xt+1). But which distribution exactly? The conditional distribution of Lt+1 given Ft = σ({Xs : s ≤ t}), the history up to and including time t? Or the unconditional distribution, under the assumption that (Xt) forms a stationary time series? The conditional problem forces us to model the dynamics of the risk factors and is most suitable for market risk. The unconditional approach is used for longer time intervals and is also typical in credit portfolio management.
VaR_α = q_α(FL) = inf{l ∈ R : FL(l) ≥ α},   (1)
where we use the notation q_α(FL) or q_α(L) for a quantile of the distribution of L and FL← for the (generalized) inverse of FL. Provided E(|L|) < ∞, expected shortfall is defined as
ES_α = (1/(1 − α)) ∫_α^1 q_u(FL) du.   (2)
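Definitions (1) and (2) are easy to check numerically. The sketch below (Python with numpy, an assumption — the slides themselves use S-PLUS) estimates both risk measures from a simulated sample of standard normal losses, for which the theoretical values are VaR_0.99 ≈ 2.326 and ES_0.99 = φ(2.326)/0.01 ≈ 2.665:

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=100_000)        # simulated losses (standard normal)

alpha = 0.99
var_hat = np.quantile(L, alpha)     # empirical alpha-quantile = VaR_alpha
es_hat = L[L >= var_hat].mean()     # mean loss beyond VaR = ES_alpha

print(var_hat, es_hat)              # close to 2.326 and 2.665
```

The same two lines work for any simulated loss distribution, which is how such measures are typically obtained in Monte Carlo risk systems.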
[Figure: density of a loss distribution on (−10, 10) with the 5% tail probability shaded; the 95% VaR and 95% ES = 3.3 are marked.]
Expected Shortfall
For continuous loss distributions expected shortfall is the expected loss, given that the VaR is exceeded. For any α ∈ (0, 1) we have
ES_α = E(L; L ≥ q_α(L)) / (1 − α) = E(L | L ≥ VaR_α),
where we have used the notation E(X; A) := E(X 1_A) for a generic integrable rv X and a generic set A ∈ F. For a discontinuous loss df we have the more complicated expression
ES_α = (1/(1 − α)) [ E(L; L ≥ q_α) + q_α (1 − α − P(L ≥ q_α)) ].
[Acerbi and Tasche, 2002]
The Axioms
A coherent risk measure is a real-valued function ρ on some space of rvs (representing losses) that fulfills the following four axioms:
1. Monotonicity. For two rvs with L1 ≤ L2 we have ρ(L1) ≤ ρ(L2).
2. Subadditivity. For any L1, L2 we have ρ(L1 + L2) ≤ ρ(L1) + ρ(L2).
This is the most debated property, necessary for the following reasons: it reflects the idea that risk can be reduced by diversification and that a merger creates no extra risk; it makes decentralized risk management possible; if a regulator uses a non-subadditive risk measure, a financial institution could reduce risk capital by splitting into subsidiaries.
The Axioms II
3. Positive homogeneity. For λ ≥ 0 we have ρ(λL) = λρ(L). If there is no diversification we should have equality in the subadditivity axiom.
4. Translation invariance. For any a ∈ R we have ρ(L + a) = ρ(L) + a.
Remarks: VaR is in general not coherent. ES (as we have defined it) is coherent.
Non-subadditivity of VaR is relevant in the presence of skewed loss distributions (credit-risk management, derivative books), or if traders optimize against VaR.
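The non-subadditivity of VaR for skewed losses can be seen in a tiny discrete example (a Python sketch; the two-point loan-loss distribution is an illustrative assumption, not from the slides):

```python
import itertools

# Two iid loan-type losses: lose 100 with probability 0.04, else 0.
outcomes = [(100, 0.04), (0, 0.96)]

def var(dist, alpha=0.95):
    """Smallest l with P(L <= l) >= alpha (the alpha-quantile of a discrete df)."""
    acc = 0.0
    for loss, p in sorted(dist):
        acc += p
        if acc >= alpha:
            return loss

# Distribution of L1 + L2 for independent copies.
joint = {}
for (l1, p1), (l2, p2) in itertools.product(outcomes, outcomes):
    joint[l1 + l2] = joint.get(l1 + l2, 0.0) + p1 * p2
summed = list(joint.items())

# Each position alone: P(L <= 0) = 0.96 >= 0.95, so VaR = 0.
# Pooled: P(S <= 0) = 0.9216 < 0.95, so VaR jumps to 100.
print(var(outcomes), var(summed))
```

Here ρ(L1 + L2) = 100 > 0 = ρ(L1) + ρ(L2): merging the positions increases measured risk, exactly the regulatory loophole described above.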
[Figure: daily log-returns of BMW and Siemens, 23.01.85 to 23.01.92: time series of each stock and a scatterplot of BMW against Siemens.]
History: Berlin Wall; 16th October 1989.
Basics II
Densities. Joint densities f(x) = f(x1, . . . , xd), when they exist, are related to joint dfs by
F(x1, . . . , xd) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xd} f(u1, . . . , ud) du1 · · · dud.
Independence. The components of X are mutually independent if and only if
F(x) = ∏_{i=1}^d Fi(xi), x ∈ Rd,
or, when X has a density,
f(x) = ∏_{i=1}^d fi(xi), x ∈ Rd.
Moments
The mean vector of X is E(X) = (E(X1), . . . , E(Xd))′ and the covariance matrix is cov(X) = E((X − E(X))(X − E(X))′), assuming finiteness of moments in both cases. Writing Σ for cov(X), the (i, j)th element of this matrix is σij = cov(Xi, Xj) = E(XiXj) − E(Xi)E(Xj). The correlation matrix of X is the matrix P with (i, j)th element
ρij = σij / √(σii σjj),
the ordinary pairwise linear correlation of Xi and Xj. Writing Δ = diag(√σ11, . . . , √σdd) we have P = Δ^{−1} Σ Δ^{−1}.
Moments II
Mean vectors and covariance matrices are extremely easily manipulated under linear operations on the vector X. For any matrix B ∈ R^{k×d} and vector b ∈ Rk we have
E(BX + b) = B E(X) + b,   cov(BX + b) = B cov(X) B′.   (3)
Covariance matrices must be positive semi-definite; writing Σ for cov(X) we see that (3) implies var(a′X) = a′Σa ≥ 0 for any a ∈ Rd.
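These identities can be verified numerically (a numpy sketch; the matrices B, b and the moments of X are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])                 # E(X)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T                                  # a valid (psd) covariance matrix

B = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]])   # B in R^{2x3}
b = np.array([5.0, -1.0])

mean_Y = B @ mu + b                              # E(BX + b) = B E(X) + b
cov_Y = B @ Sigma @ B.T                          # cov(BX + b) = B cov(X) B'

# Positive semi-definiteness: var(a'X) = a' Sigma a >= 0 for any a.
a = rng.normal(size=3)
print(mean_Y, float(a @ Sigma @ a) >= 0.0)
```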
The sample correlation matrix R has (i, j)th element rij = sij / √(sii sjj). Writing D = diag(√s11, . . . , √sdd) we have R = D^{−1} S D^{−1}.
[Figure: simulated bivariate samples with standard normal margins, plotted on X-Y axes from −4 to 4.]
Could be used to model two regimes: ordinary and extreme.
Multivariate t: W has an inverse gamma distribution, W ~ Ig(ν/2, ν/2). This gives a multivariate t with ν degrees of freedom; equivalently ν/W ~ χ²_ν.
Symmetric generalised hyperbolic: W has a GIG (generalised inverse Gaussian) distribution.
where μ ∈ Rd, Σ ∈ R^{d×d} is a positive definite matrix, ν is the degrees of freedom and kν,Σ,d is a normalizing constant.
If X has density f then E(X) = μ and cov(X) = (ν/(ν − 2))Σ, so that μ and Σ are the mean vector and dispersion matrix respectively. For finite variances/correlations we need ν > 2. Notation: X ~ td(ν, μ, Σ).
If Σ is diagonal the components of X are uncorrelated; they are not independent. The multivariate t distribution has heavy tails.
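The mixture representation can be checked by simulation: with W ~ Ig(ν/2, ν/2), equivalently ν/W ~ χ²_ν, the vector √W Z has cov = ν/(ν − 2) Σ. A numpy sketch (sample size, seed and ν = 8 are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
nu, n, d = 8, 200_000, 2

# W ~ Ig(nu/2, nu/2)  <=>  nu / W ~ chi-squared with nu degrees of freedom
W = nu / rng.chisquare(nu, size=n)
Z = rng.normal(size=(n, d))                  # Z ~ N_d(0, I), i.e. Sigma = I here
X = np.sqrt(W)[:, None] * Z                  # X ~ t_d(nu, 0, I)

# Theory: cov(X) = nu/(nu - 2) * Sigma = (8/6) * I
print(np.cov(X.T))
```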
[Figure: scatterplots of simulated bivariate normal and t samples alongside the BMW-Siemens return data.]
Simulated data (2000 points) from models fitted by maximum likelihood to the BMW-Siemens data.
provided W has finite variance. We observe from (6) and (7) that the parameters μ and Σ are not in general the mean vector and covariance matrix of X. Note that a finite covariance matrix requires var(W) < ∞, whereas the variance mixtures only require E(W) < ∞.
x > 0, where Kλ denotes a modified Bessel function of the third kind with index λ and the parameters satisfy χ > 0, ψ ≥ 0 if λ < 0; χ > 0, ψ > 0 if λ = 0; and χ ≥ 0, ψ > 0 if λ > 0. For more on this Bessel function see [Abramowitz and Stegun, 1965]. The GIG density actually contains the gamma and inverse gamma densities as special limiting cases, corresponding to χ = 0 and ψ = 0 respectively. Thus, when λ = −ν/2, χ = ν and ψ = 0 the mixture distribution in (5) is multivariate t.
Special Cases
If λ = 1 we get a multivariate distribution whose univariate margins are one-dimensional hyperbolic distributions, a model widely used in univariate analyses of financial return data.
If λ = −1/2 the distribution is known as a normal inverse Gaussian (NIG) distribution. This model has also been used in univariate analyses of return data; its functional form is similar to the hyperbolic with a slightly heavier tail.
If λ > 0 and χ = 0 we get a limiting case known variously as the generalised Laplace, Bessel function or variance gamma distribution.
If λ = −ν/2, χ = ν and ψ = 0 we get an asymmetric or skewed t distribution.
Elliptical distributions
A random vector (X1, . . . , Xd)′ is spherical if its distribution is invariant under rotations, i.e. for all U ∈ R^{d×d} with U′U = UU′ = Id we have UX =d X.
A random vector (X1, . . . , Xd)′ is called elliptical if it is an affine transform of a spherical random vector (Y1, . . . , Yk)′: X = AY + b, A ∈ R^{d×k}, b ∈ Rd.
A normal variance mixture as in (4) with μ = 0 and Σ = I is spherical; any normal variance mixture is elliptical.
References
[Barndorff-Nielsen and Shephard, 1998] (generalized hyperbolic distributions)
[Barndorff-Nielsen, 1997] (NIG distribution)
[Eberlein and Keller, 1995] (hyperbolic distributions)
[Prause, 1999] (GH distributions, PhD thesis)
[Fang et al., 1990] (elliptical distributions)
(i) F = (F1, . . . , Fp)′ is a random vector of factors with p < d;
(ii) ε = (ε1, . . . , εd)′ is a random vector of idiosyncratic error terms, which are uncorrelated and have mean zero;
(iii) B ∈ R^{d×p} is a matrix of constant factor loadings and a ∈ Rd a vector of constants;
(iv) cov(F, ε) = E((F − E(F))ε′) = 0.
Conversely, if (9) holds for the covariance matrix of a random vector X, then X follows the factor model (8) for some a, F and ε. If, moreover, X is Gaussian, then F and ε may be taken to be independent Gaussian vectors, so that ε has independent components.
C. Copulas
1. Basic Copula Primer 2. Copula-Based Dependence Measures 3. Normal Mixture Copulas 4. Archimedean Copulas 5. Fitting Copulas to Data
What is a Copula?
A copula is a multivariate distribution function with standard uniform margins. Equivalently, a copula is any function C : [0, 1]d → [0, 1] satisfying the following properties:
1. C(u1, . . . , ud) is increasing in each component ui.
2. C(1, . . . , 1, ui, 1, . . . , 1) = ui for all i ∈ {1, . . . , d}, ui ∈ [0, 1].
3. For all (a1, . . . , ad), (b1, . . . , bd) ∈ [0, 1]d with ai ≤ bi we have
Σ_{i1=1}^{2} · · · Σ_{id=1}^{2} (−1)^{i1+···+id} C(u1,i1, . . . , ud,id) ≥ 0,
where uj,1 = aj and uj,2 = bj for all j ∈ {1, . . . , d}.
Sklar's Theorem
Let F be a joint distribution function with margins F1, . . . , Fd. There exists a copula C such that for all x1, . . . , xd in [−∞, ∞]
F(x1, . . . , xd) = C(F1(x1), . . . , Fd(xd)).
If the margins are continuous then C is unique; otherwise C is uniquely determined on Ran F1 × Ran F2 × · · · × Ran Fd. Conversely, if C is a copula and F1, . . . , Fd are univariate distribution functions, then F defined above is a multivariate df with margins F1, . . . , Fd.
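The converse part of Sklar's Theorem gives a recipe for building multivariate models: choose any margins and any copula. A numpy sketch using a Gaussian copula with exponential and Pareto margins (the specific margin and parameter choices are illustrative assumptions):

```python
import math
import numpy as np

rng = np.random.default_rng(7)
rho, n = 0.6, 50_000

# Step 1: sample from the bivariate Gaussian copula C^Ga (probability transform).
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
ndtr = np.vectorize(lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2))))
u = ndtr(z)                                   # uniform margins, Gaussian copula

# Step 2: apply inverse margins F_i^{-1} to obtain the desired joint df.
x1 = -np.log(1 - u[:, 0])                     # Exp(1) margin
x2 = (1 - u[:, 1]) ** (-1 / 3) - 1            # Pareto margin, F(x) = 1 - (1+x)^-3

# Margins are as specified; all dependence comes from the copula alone.
print(x1.mean(), np.corrcoef(x1, x2)[0, 1])
```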
Fréchet bounds. Every copula C satisfies
max{ Σ_{i=1}^d ui + 1 − d, 0 } ≤ C(u1, . . . , ud) ≤ min{u1, . . . , ud}.   (10)
The upper bound is the df of (U, . . . , U)′ and the copula of a random vector X where Xi = Ti(X1) a.s. for increasing functions T2, . . . , Td. It represents perfect positive dependence or comonotonicity.
The lower bound is only a copula when d = 2. It is the df of (U, 1 − U)′ and the copula of (X1, X2)′ where X2 = T(X1) a.s. for T decreasing. It represents perfect negative dependence or countermonotonicity.
The copula representing independence is C(u1, . . . , ud) = ∏_{i=1}^d ui.
Parametric Copulas
There are basically two possibilities:
Copulas implicit in well-known parametric distributions. Sklar's Theorem states that we can always find a copula in a parametric distribution function. Denoting the df by F and assuming the margins F1, . . . , Fd are continuous, the implied copula is
C(u1, . . . , ud) = F(F1^{−1}(u1), . . . , Fd^{−1}(ud)).
Such a copula may not have a simple closed form.
Closed-form parametric copula families generated by some explicit construction that is known to yield copulas. The best example is the well-known Archimedean copula family. These generally have limited numbers of parameters and limited flexibility; the standard Archimedean copulas are dfs of exchangeable random vectors.
where Φ denotes the standard univariate normal df, Φ_P denotes the joint df of X ~ Nd(0, P) and P is a correlation matrix. Write C^Ga_ρ when d = 2. P = Id gives independence and P = Jd gives comonotonicity.
t Copula
C^t_{ν,P}(u) = t_{ν,P}(t_ν^{−1}(u1), . . . , t_ν^{−1}(ud)), where t_ν is the df of a standard univariate t distribution, t_{ν,P} is the joint df of the vector X ~ td(ν, 0, P) and P is a correlation matrix. Write C^t_{ν,ρ} when d = 2.
Clayton copula
C^Cl_θ(u1, . . . , ud) = (u1^{−θ} + · · · + ud^{−θ} − d + 1)^{−1/θ}, θ > 0.
Simulating Copulas
Simulating the Gaussian copula C^Ga_P:
Simulate X ~ Nd(0, P); set U = (Φ(X1), . . . , Φ(Xd))′ (probability transformation).
Simulating the t copula C^t_{ν,P}:
Simulate X ~ td(ν, 0, P); set U = (tν(X1), . . . , tν(Xd))′, where tν is the df of a standard univariate t distribution.
Simulation of Archimedean copulas is less obvious, but also turns out to be fairly simple in the majority of cases.
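The t copula recipe can be sketched in a few lines (Python/scipy, an assumption — scipy's `t.cdf` stands in for tν; parameters ν = 4, ρ = 0.7 are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
nu, rho, n = 4, 0.7, 20_000
P = np.array([[1.0, rho], [rho, 1.0]])

# Simulate X ~ t_d(nu, 0, P) via the normal variance-mixture representation.
W = nu / rng.chisquare(nu, size=n)
Z = rng.multivariate_normal([0.0, 0.0], P, size=n)
X = np.sqrt(W)[:, None] * Z

# Probability transform with the univariate t df gives the t copula sample.
U = stats.t.cdf(X, df=nu)

print(U[:, 0].mean())       # margins are (approximately) uniform
```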
Simulating Copulas II
[Figure: 2000 simulated points from the Gaussian, Gumbel, Clayton and t4 copulas, and the corresponding samples after transformation to standard normal margins.]
Copula References
[Embrechts et al., 2002] (dependence in QRM)
[Joe, 1997] (dependence in general)
[Nelsen, 1999] (standard reference on bivariate copulas)
[Daul et al., 2003] (summary article)
[Cherubini et al., 2004] (copulas in finance)
Tail Dependence
Clearly λu ∈ [0, 1] and λl ∈ [0, 1]. For copulas of elliptically symmetric distributions λu = λl =: λ. This is true, more generally, for all copulas with radial symmetry.
Terminology: λu ∈ (0, 1]: upper tail dependence; λu = 0: asymptotic independence in the upper tail; λl ∈ (0, 1]: lower tail dependence; λl = 0: asymptotic independence in the lower tail.
The Gumbel copula is upper tail dependent for θ > 1: λu = 2 − 2^{1/θ}. The Clayton copula is lower tail dependent for θ > 0: λl = 2^{−1/θ}. All formulas are derived in the book.
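For the Clayton copula, λl = 2^{−1/θ} can be checked directly from the closed-form diagonal C(u, u) = (2u^{−θ} − 1)^{−1/θ}, since λl = lim_{u→0} C(u, u)/u (a small Python sketch; θ = 2 is an arbitrary choice):

```python
theta = 2.0

def clayton_diag(u, theta):
    """Clayton copula on the diagonal: C(u, u) = (2 u^{-theta} - 1)^{-1/theta}."""
    return (2 * u ** (-theta) - 1) ** (-1 / theta)

# lambda_l = lim_{u -> 0} C(u, u) / u should approach 2^{-1/theta}
for u in (1e-2, 1e-4, 1e-6):
    print(u, clayton_diag(u, theta) / u)
print("2^(-1/theta) =", 2 ** (-1 / theta))
```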
Rank Correlation
Spearman's rho (a pure copula quantity):
ρS(X1, X2) = ρ(F1(X1), F2(X2)) = 12 ∫_0^1 ∫_0^1 (C(u1, u2) − u1 u2) du1 du2.
Kendall's tau: take an independent copy (X̃1, X̃2) of (X1, X2); then
τ(X1, X2) = 2 P((X1 − X̃1)(X2 − X̃2) > 0) − 1 = 4 ∫_0^1 ∫_0^1 C(u1, u2) dC(u1, u2) − 1.
t Dependence
[Figure/table: joint exceedances of high quantiles under Gaussian and t copulas.]
For the normal copula the probability is given. For t copulas the factor by which the Gaussian probability must be multiplied is given.
Financial Interpretation
Consider daily returns on five financial instruments and suppose that we believe that all correlations between returns are equal to 50%. However, we are unsure about the best multivariate model for these data.
If returns follow a multivariate Gaussian distribution, then the probability that on any day all returns fall below their 1% quantiles is 7.48 × 10^−5. In the long run such an event will happen once every 13,369 trading days on average, that is roughly once every 51.4 years (assuming 260 trading days in a year).
On the other hand, if returns follow a multivariate t distribution with four degrees of freedom, then such an event will happen 7.68 times more often, that is roughly once every 6.7 years.
Rank Correlations
Gaussian Case
Let X be a bivariate random vector with copula C^Ga_ρ and continuous margins. Then the rank correlations are
τ(X1, X2) = (2/π) arcsin ρ,   (11)
ρS(X1, X2) = (6/π) arcsin(ρ/2).   (12)
Normal Variance Mixture Case. Formula (11) also holds when X has the copula of a normal variance mixture distribution with correlation parameter ρ, for example the t copula C^t_{ν,ρ}.
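Relation (11) is easy to verify by simulation for a bivariate normal (numpy sketch; sample size and seed are arbitrary). For ρ = 0.5 the theory gives τ = (2/π) arcsin(0.5) = 1/3:

```python
import math
import numpy as np

rng = np.random.default_rng(11)
rho, n = 0.5, 1_000
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

# Kendall's tau: average of sign((X1 - X1')(X2 - X2')) over all distinct pairs.
d1 = np.sign(X[:, None, 0] - X[None, :, 0])
d2 = np.sign(X[:, None, 1] - X[None, :, 1])
iu = np.triu_indices(n, k=1)
tau_hat = (d1 * d2)[iu].mean()

print(tau_hat, 2 / math.pi * math.asin(rho))   # both near 1/3
```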
(√W1 Z1, . . . , √W1 Zs1, . . . , √Wm Zd−sm+1, . . . , √Wm Zd)′.
The class of such functions coincides with the class of Laplace transforms of dfs G on R+ satisfying G(0) = 0. Recall that the Laplace transform Ĝ of G is given by
Ĝ(t) = ∫_0^∞ e^{−tx} dG(x), t ≥ 0.
For this reason we refer to generators with completely monotonic inverses as LT-Archimedean generators and the resulting copulas as LT-Archimedean copulas.
Special Cases
Clayton copula: we generate a gamma variate V ~ Ga(1/θ, 1) with θ > 0. The df of V has Laplace transform Ĝ(t) = (1 + t)^{−1/θ}.
Gumbel copula: we generate a positive stable variate V ~ St(1/θ, 1, γ, 0), where γ = (cos(π/(2θ)))^θ and θ > 1. This df has Laplace transform Ĝ(t) = exp(−t^{1/θ}), as desired.
For definitions of these distributions, consult the book.
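The LT-Archimedean sampling algorithm is short in code. For the Clayton case one draws the frailty V ~ Ga(1/θ, 1) and sets Ui = Ĝ(Ei/V) with Ei iid standard exponential (a numpy sketch; θ = 2 and the sample size are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, d, n = 2.0, 2, 50_000

V = rng.gamma(1 / theta, 1.0, size=n)            # frailty V ~ Ga(1/theta, 1)
E = rng.exponential(size=(n, d))                  # E_i = -log(uniform)

# U_i = Ghat(E_i / V) with Ghat(t) = (1 + t)^(-1/theta), the LT of V's df;
# the resulting U has the Clayton copula with parameter theta.
U = (1 + E / V[:, None]) ** (-1 / theta)

print(U[:, 0].mean())    # margins are exactly uniform, so mean is near 0.5
```

The dependence is positive and strong here (for Clayton, Kendall's tau equals θ/(θ+2) = 0.5 at θ = 2).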
[Figure: pairwise scatterplots of a sample from a four-dimensional Gumbel copula (gumbel[,1] to gumbel[,4]).]
ρS(Xi, Xj) = (6/π) arcsin(ρij/2) ≈ ρij, where the final approximation is very accurate. This suggests we estimate P by the matrix of pairwise Spearman's rank correlation coefficients RS.
It follows from Proposition 5.37 in the book that τ(Xi, Xj) = (2/π) arcsin ρij, so that a possible estimator of P is the matrix R* with components rij* = sin(π rijτ/2), where rijτ is the pairwise sample Kendall's tau. This may not be positive definite, in which case R* can be transformed by the eigenvalue method given in Algorithm 5.55 to obtain a positive definite matrix that is close to R*. The remaining parameter ν of the copula could then be estimated by maximum likelihood.
BMW-Siemens Example I
[Figure: BMW-Siemens data transformed to pseudo-uniform margins.]
BMW-Siemens Example II
Copula   estimates              std. error(s)    log-likelihood
Gauss    ρ = 0.70               0.0098           610.39
t        ρ = 0.70, ν = 4.89     0.0122, 0.73     649.25
Gumbel   θ = 1.90               0.0363           584.46
Clayton  θ = 1.42               0.0541           527.46
Goodness-of-fit. Akaike's criterion (AIC) suggests choosing the model that minimizes AIC = 2p − 2(log-likelihood), where p = number of parameters of the model. This is clearly the t model.
Remark. Formal methods for goodness-of-fit are also available.
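The AIC comparison can be reproduced directly from the fitted log-likelihoods (Python sketch; p = 1 for Gauss, Gumbel and Clayton, p = 2 for the t copula, with the values from the table):

```python
# AIC = 2p - 2 * log-likelihood; smaller is better.
fits = {            # (number of parameters, log-likelihood)
    "Gauss":   (1, 610.39),
    "t":       (2, 649.25),
    "Gumbel":  (1, 584.46),
    "Clayton": (1, 527.46),
}
aic = {name: 2 * p - 2 * ll for name, (p, ll) in fits.items()}
best = min(aic, key=aic.get)
print(aic, "->", best)     # the t copula minimizes AIC
```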
[Figure: pairwise pseudo-observations for the stock-return data (GE, IBM, MCD, MSFT) and the t copula log-likelihood plotted against ν.]
Daily returns on AT&T, General Electric, IBM, McDonald's, Microsoft. The form of the likelihood for ν indicates non-Gaussian dependence.
Types of Distributions
Definition (equality in type). Two random variables U and V are of the same type if U =d aV + b for some a > 0, b ∈ R. In terms of their dfs FU and FV this means FU(x) = FV((x − b)/a). Thus random variables of the same type have the same df, up to possible changes of scale and location.
where 1 + ξx > 0 and ξ is the shape parameter. Note that this parametrization is continuous in ξ. For ξ > 0, Hξ is equal in type to the classical Fréchet df; for ξ < 0, Hξ is equal in type to the classical Weibull df.
We introduce location and scale parameters μ and σ > 0 and work with Hξ,μ,σ(x) := Hξ((x − μ)/σ). Clearly Hξ,μ,σ is of type Hξ.
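The GEV family can be coded directly from its df, Hξ(x) = exp(−(1 + ξx)^{−1/ξ}) for ξ ≠ 0 and exp(−e^{−x}) for ξ = 0 (a Python sketch; the continuity of the parametrization in ξ is visible numerically):

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """H_{xi,mu,sigma}(x) = H_xi((x - mu)/sigma)."""
    y = (x - mu) / sigma
    if xi == 0.0:                       # Gumbel limit
        return math.exp(-math.exp(-y))
    t = 1 + xi * y
    if t <= 0:                          # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1 / xi))

# Continuity in xi at the Gumbel case (xi = 0 vs xi very small):
print(gev_cdf(1.0, 0.0), gev_cdf(1.0, 1e-8))
```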
[Figure: GEV distribution functions H(x) (left) and densities (right), x from −2 to 2.]
Solid line corresponds to ξ = 0 (Gumbel); dotted line is ξ = 0.5 (Fréchet); dashed line is ξ = −0.5 (Weibull). μ = 0 and σ = 1.
Examples
Recall: F ∈ MDA(Hξ) iff there are sequences cn and dn with
P((Mn − dn)/cn ≤ x) = F^n(cn x + dn) → Hξ(x).
We have the following examples:
The exponential distribution, F(x) = 1 − e^{−λx}, λ > 0, x ≥ 0, is in MDA(H0) (Gumbel case). Take cn = 1/λ, dn = (log n)/λ.
The Pareto distribution, F(x) = 1 − (κ/(κ + x))^α, α, κ > 0, x ≥ 0, is in MDA(H_{1/α}) (Fréchet case). Take cn = κ n^{1/α}/α, dn = κ n^{1/α} − κ.
Extremal Index
The value θ is known as the extremal index of the process (to be distinguished from the tail index of distributions in the Fréchet class). For processes with an extremal index, normalized block maxima converge in distribution provided that maxima of the associated iid process converge in distribution, that is, provided the underlying distribution F is in MDA(Hξ) for some ξ.
Hξ^θ(x) is a distribution of the same type as Hξ(x). It is also a GEV distribution with exactly the same shape parameter ξ; only the location and scaling of the distribution change.
= H,dn,cn (y).
Implication: we collect data on block maxima and fit the three-parameter form of the GEV. For this we require a lot of raw data so that we can form sufficiently many, sufficiently large blocks.
L(θ; y) = Σ_{i=1}^m log hθ(yi), where hθ is the GEV density and y1, . . . , ym are the block maxima; we maximize this w.r.t. θ to obtain the MLE θ̂ = (ξ̂, μ̂, σ̂)′. Clearly, in defining blocks, bias and variance must be traded off: we reduce bias by increasing the block size n, and we reduce variance by increasing the number of blocks m.
[Figure: time series of the data, 05.01.60 to 05.01.85.]
Return Levels
Rn,k, the k n-block return level, is defined by P(Mn > Rn,k) = 1/k; i.e. it is the level which is exceeded in one out of every k n-blocks, on average. We use the approximation
Rn,k ≈ H^{−1}_{ξ,μ,σ}(1 − 1/k) = μ + (σ/ξ) ((−log(1 − 1/k))^{−ξ} − 1).
We wish to estimate this functional of the unknown parameters of our GEV model for maxima of n-blocks.
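The return level Rn,k = H^{-1}_{ξ,μ,σ}(1 − 1/k) translates directly into code. A Python sketch, evaluated at the GEV estimates shown below (ξ̂ = 0.334, σ̂ = 0.672, μ̂ = 1.975):

```python
import math

def return_level(k, xi, mu, sigma):
    """R_{n,k} = H^{-1}_{xi,mu,sigma}(1 - 1/k)."""
    y = -math.log(1 - 1 / k)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1)

# 10- and 50-block return levels for the fitted annual-maximum model:
for k in (10, 50):
    print(k, return_level(k, 0.334, 1.975, 0.672))
```

Note the heavy-tail effect: with ξ̂ ≈ 0.33 the 50-block level is far above the 10-block level, not just marginally larger.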
Annual maxima (selected years):
1963 2.806479   1964 1.253012   1965 1.757765   1966 2.460411   1967 1.558183
1971 1.522388   1972 1.319013   1973 3.051598   1974 3.671256   1975 2.362394
1979 2.957772   1980 3.006734   1981 2.886327   1982 3.996544   1983 2.697254
1987 5.253623
$par.ests: xi sigma mu 0.3343843 0.6715922 1.974976 $par.ses: xi sigma mu 0.2081 0.130821 0.1512828 $nllh.final: [1] 38.33949
= 0.027 .
It is important to construct confidence intervals for such statistics. We use asymptotic likelihood ratio ideas to construct asymmetric intervals: the so-called profile likelihood method.
[Figure: the data, 05.01.65 to 05.01.85, with the estimated return level superimposed.]
References
On EVT in general: [Embrechts et al., 1997], [Reiss and Thomas, 1997].
On the Fisher-Tippett Theorem: [Fisher and Tippett, 1928], [Gnedenko, 1943].
Application of the block maxima method to S&P data: [McNeil, 1998].
where β > 0, and the support is x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ when ξ < 0. This subsumes: ξ > 0, Pareto (reparametrized version); ξ = 0, exponential; ξ < 0, Pareto type II.
Moments. For ξ > 0 the distribution is heavy-tailed; E(X^k) does not exist for k ≥ 1/ξ.
[Figure: GPD distribution functions G(x) (left) and densities (right), x from 0 to 4.]
Solid line corresponds to ξ = 0 (exponential); dotted line is ξ = 0.5 (Pareto); dashed line is ξ = −0.5 (Pareto type II). β = 1.
Examples
1. Exponential. F(x) = 1 − e^{−λx}, λ > 0, x ≥ 0. Then Fu(x) = F(x), x ≥ 0: the lack-of-memory property. e(u) = 1/λ for all u ≥ 0.
2. GPD. F(x) = Gξ,β(x). Then Fu(x) = Gξ,β+ξu(x), for 0 ≤ x < ∞ if ξ ≥ 0 and 0 ≤ x ≤ −β/ξ − u if ξ < 0: the excess distribution of a GPD remains GPD with the same shape. Moreover
e(u) = (β + ξu)/(1 − ξ), for 0 ≤ u < ∞ if 0 ≤ ξ < 1 and 0 ≤ u ≤ −β/ξ if ξ < 0.
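The stability of the GPD under excesses can be verified numerically: if X ~ Gξ,β then P(X − u > x | X > u) equals the survival function of Gξ,β+ξu (a Python sketch; the parameter values are illustrative assumptions):

```python
def gpd_sf(x, xi, beta):
    """Survival function 1 - G_{xi,beta}(x) of the GPD, for xi != 0."""
    return (1 + xi * x / beta) ** (-1 / xi)

xi, beta, u, x = 0.5, 1.0, 2.0, 3.0

# The excess probability over u computed two ways:
lhs = gpd_sf(u + x, xi, beta) / gpd_sf(u, xi, beta)   # P(X - u > x | X > u)
rhs = gpd_sf(x, xi, beta + xi * u)                    # 1 - G_{xi, beta + xi*u}(x)
print(lhs, rhs)                                       # identical

# Mean excess is linear in u: e(u) = (beta + xi*u)/(1 - xi)
print((beta + xi * u) / (1 - xi))
```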
lim_{u↑xF} sup_{0≤x<xF−u} |Fu(x) − Gξ,β(u)(x)| = 0, for some positive function β(u),
if and only if F ∈ MDA(Hξ), ξ ∈ R. This theorem provides a characterization of the distributions in MDA(Hξ): they are the distributions whose excess distributions converge to a generalized Pareto with shape parameter ξ. This covers essentially all the common continuous distributions used in risk management or insurance mathematics.
[Figure: time series of losses, 030180 to 030190, with values ranging up to about 250.]
Estimating Excess df
[Figure: empirical estimate of the excess distribution Fu(x − u) above a high threshold, with fitted GPD.]
(13)
To examine this function we generally construct the mean excess plot {(Xi,n, en(Xi,n)) : 2 ≤ i ≤ n}, where Xi,n denotes the ith order statistic. If the data support a GPD model over a high threshold we would expect this plot to become linear, in view of (13).
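The sample mean excess function behind the plot is one line of code. A numpy sketch on exact Pareto data (for this distribution the GPD holds at every threshold, so the mean excess is linear in u — here e(u) = (1 + u)/2; data choice and seed are assumptions):

```python
import numpy as np

def mean_excess(data, thresholds):
    """e_n(u) = average of (X_i - u) over the observations X_i exceeding u."""
    data = np.asarray(data)
    return [data[data > u].mean() - u for u in thresholds]

rng = np.random.default_rng(8)
x = rng.pareto(3.0, size=100_000)      # survival (1+x)^-3, i.e. GPD xi = 1/3
us = [0.5, 1.0, 2.0, 4.0]
print(mean_excess(x, us))              # roughly linear: (1 + u)/2
```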
[Figure: sample mean excess plot, mean excess against threshold.]
F̄(x) = F̄(u) (1 + ξ(x − u)/β)^{−1/ξ},   (14)
which, if we know F̄(u), gives us a formula for tail probabilities. This formula may be used to derive formulas for risk measures like VaR and expected shortfall.
VaR_α = u + (β/ξ) ( ((1 − α)/F̄(u))^{−ξ} − 1 ).   (15)
Assuming that ξ < 1, the associated expected shortfall can be calculated easily from (2) and (15). We obtain
ES_α = VaR_α/(1 − ξ) + (β − ξu)/(1 − ξ).   (16)
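Formulas (15) and (16) translate directly into code (a Python sketch; the threshold u, the GPD parameters and the tail weight F̄(u) are illustrative assumptions):

```python
def gpd_var(alpha, u, xi, beta, pu):
    """VaR_alpha = u + (beta/xi) * (((1 - alpha)/pu)^(-xi) - 1), pu = P(L > u)."""
    return u + (beta / xi) * (((1 - alpha) / pu) ** (-xi) - 1)

def gpd_es(alpha, u, xi, beta, pu):
    """ES_alpha = VaR_alpha/(1 - xi) + (beta - xi*u)/(1 - xi), valid for xi < 1."""
    return gpd_var(alpha, u, xi, beta, pu) / (1 - xi) + (beta - xi * u) / (1 - xi)

u, xi, beta, pu = 10.0, 0.25, 3.0, 0.05      # hypothetical threshold model
for a in (0.99, 0.999):
    print(a, gpd_var(a, u, xi, beta, pu), gpd_es(a, u, xi, beta, pu))
```

Far out in the tail the ratio ES/VaR approaches 1/(1 − ξ), which is the point made on the next line of the slides.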
ES_α/VaR_α → (1 − ξ)^{−1} as α → 1, so that the shape parameter ξ of the GPD effectively determines the ratio when we go far enough out into the tail.
F̄̂(x) = (Nu/n) (1 + ξ̂ (x − u)/β̂)^{−1/ξ̂},   (18)
which is valid for x ≥ u. For α ≥ 1 − Nu/n we obtain analogous point estimators of VaR_α and ES_α. Asymmetric confidence intervals can be constructed using the profile likelihood method as described in the book (page 284).
[Figures: GPD tail estimates of 1 − F(x) on log-log scale, with profile-likelihood confidence intervals for high quantile and expected shortfall estimates.]
Illustration
[Figure: daily return series and QQ-plot of the ordered data against exponential quantiles.]
(19)
provided 1 + ξ(x − μ)/σ > 0, and zero otherwise, where the parameters satisfy μ, ξ ∈ R and σ > 0.
Implications of model
The intensity measure is
Λ(A) = (t2 − t1) (1 + ξ(x − μ)/σ)^{−1/ξ},
for a set of the form A = (t1, t2) × (x, ∞) ⊂ X, so that
P(k points in A) = exp(−Λ(A)) Λ(A)^k / k!.
The one-dimensional intensity of exceedances of the level x ≥ u is
τ(x) = (1 + ξ(x − μ)/σ)^{−1/ξ},   (20)
for u ≤ x < xF.
Statistical Inference
The likelihood of the exceedance data {(Tj, Xj) : j = 1, . . . , Nu} is
L = exp{−n τ(u)} ∏_{j=1}^{Nu} λ(Xj),
where λ is the intensity of the process at the level of the exceedance. For a justification see [Daley and Vere-Jones, 2003]. Fitting this model allows immediate inference about the numbers and magnitudes of threshold exceedances at different levels. In particular, the well-known GPD model for the excess distribution is easily derived.
[Figure: daily return series, 1996-2002, with threshold exceedances and a QQ-plot of the excesses against exponential quantiles.]
[Figure: three return series, 02.01.85 to 02.01.94 — DAX log-returns, simulated normal data, simulated Student t data.]
Log-returns for the DAX index from 02.01.85 until 30.12.94, compared with simulated iid data from fitted normal and t distributions.
[Figure: correlograms (ACF up to lag 30) of the raw and absolute values for the DAX returns and for the simulated normal and t data.]
where σt, the volatility, is a function of the history up to time t − 1 represented by Ft−1, and Zt is assumed independent of Ft−1. Mathematically, σt is Ft−1-measurable, where Ft−1 is the filtration generated by (Xs)s≤t−1, and therefore var(Xt | Ft−1) = σt². Volatility is the conditional standard deviation of the process.
with αj > 0.
Intuition: volatility is influenced by large observations in the recent past.
(Xt) follows a GARCH(p, q) process (generalised ARCH) if, for all t,
σt² = α0 + Σ_{j=1}^p αj X²_{t−j} + Σ_{k=1}^q βk σ²_{t−k},   (22)
with αj, βk > 0. The process is covariance stationary if and only if
Σ_{j=1}^p αj + Σ_{k=1}^q βk < 1.
ARCH and GARCH are technically uncorrelated white noise processes, since the autocovariance function is given by
γ(h) := cov(Xt, Xt+h) = E(σt+h Zt+h σt Zt) = E(Zt+h) E(σt+h σt Zt) = 0.
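The "uncorrelated but dependent" nature of GARCH is easy to see in simulation: returns show no serial correlation while squared returns do. A numpy sketch of recursion (22) for a GARCH(1,1) process (the coefficient values are arbitrary assumptions satisfying α1 + β1 < 1):

```python
import numpy as np

rng = np.random.default_rng(2)
a0, a1, b1, n = 0.05, 0.1, 0.85, 50_000      # alpha1 + beta1 = 0.95 < 1

Z = rng.normal(size=n)                        # innovations, strict white noise
X = np.empty(n)
sig2 = a0 / (1 - a1 - b1)                     # start at the stationary variance
for t in range(n):
    X[t] = np.sqrt(sig2) * Z[t]
    sig2 = a0 + a1 * X[t] ** 2 + b1 * sig2    # GARCH(1,1) volatility recursion

def acf1(y):
    """Lag-1 sample autocorrelation."""
    y = y - y.mean()
    return (y[:-1] * y[1:]).mean() / (y ** 2).mean()

# Returns: near zero; squared returns: clearly positive (volatility clustering).
print(acf1(X), acf1(X ** 2))
```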
with α0, α1, β > 0, α1 + β < 1 and |φ| < 1 for a stationary model with finite variance. This model is a reasonable fit for many daily financial return series, particularly under the assumption that the driving innovations are heavier-tailed than normal.
with α0, α1, β > 0, α1 + β < 1 and |φ| < 1. We assume (Zt) is strict white noise with E(Zt) = 0 and var(Zt) = 1, but leave the exact innovation distribution unspecified. Other GARCH-type models could be used if desired.
Dynamic EVT
Given a data sample Xt−n+1, . . . , Xt we adopt a two-stage estimation procedure. (Typically we take n = 1000.)
We forecast μt+1 and σt+1 by fitting an AR-GARCH model with unspecified innovation distribution by quasi-maximum-likelihood (QML) and calculating 1-step predictions. (QML yields consistent, asymptotically normal estimates of the GARCH parameters.)
We consider the model residuals to be iid realizations from the innovation distribution and estimate the tails of this distribution using EVT (GPD fitting). In particular, estimates of quantiles q_α(Z) and expected shortfalls ES_α(Z) for the distribution of (Zt) can be determined.
Risk Measures
Recall that we must distinguish between risk measures based on the tails of the conditional and unconditional distributions of the loss, in this case the negative return. We are interested in the former and thus calculate risk measures based on the conditional distribution F[Xt+1 | Ft]. For a one-step time horizon, risk measure estimates are easily computed from estimates of q_α(Z) and ES_α(Z) and predictions of μt+1 and σt+1 using
VaR^t_α = μt+1 + σt+1 q_α(Z),
ES^t_α = μt+1 + σt+1 ES_α(Z).
[Figure: 1000-day negative log-return series.]
1000-day excerpt from the series of negative log-returns on the Standard & Poor's index containing the crash of 1987.
[Figure: correlograms of abs(data), residuals and abs(residuals), lags 0 to 30.]
Heavy-Tailedness Remains
[Figure: QQ-plot of residuals; raw data from S&P.]
Gains
[Figure: tail estimate 1 - F(x) on log scale; GPD fit versus normal fit.]
[Figure: time series with risk measure estimates, 01.07.98 to 01.01.00.]
[Table: backtest violation counts: 73 104 78 86; 55 74 61 59.]
Remark: The performance of ES estimates is even more sensitive to the suitability of the model in the tail region.
λ(t) = τ + ψ Σ_{j: Tj < t} h(t - Tj; Xj - u),  ψ, δ > 0.
Under this formulation the increase in intensity depends not only on the time since an event but also on the size of the past event.
[Figure: sample path of the self-exciting intensity λ(t), jumping at event times T0, . . . , T4 above the threshold u.]
Statistical Inference
The model can also be fitted easily by ML. The exceedance data are {(Tj, Xj): j = 1, . . . , Nu}. The parameters (in M1) are θ = (τ, ψ, γ, δ).
log L(θ) = - ∫₀ⁿ λ(t) dt + Σ_{j=1}^{Nu} log λ(Tj).
[Figure: fitted intensity λ(t) over time, 0 to 2000.]
S-PLUS Analysis
$threshold:
[1] 0.01506956
$theta:
        tau        psi      gamma    delta
 0.03154302 0.01605022 0.02622928 12.25498
$theta.ses:
[1]  0.01132330  0.00687112  0.01090378 25.61115098
Operational Risk
Definition: The risk of losses resulting from inadequate or failed internal processes, people and systems, or external events.
Remark: This definition includes legal risk, but excludes strategic and reputational risk. Note: Solvency 2.
The Basic Indicator Approach sets COP^BIA = α · GI, where GI is the average annual gross income over the previous three years and α = 15% (set by the Committee based on QISs).
Standardised Approach
Similar to the BIA, but on the level of each business line:
COP^SA = Σ_{i=1}^{8} βi GIi,  βi ∈ [12%, 18%], i = 1, 2, . . . , 8, with 3-year averaging.
8 business lines: Corporate finance (18%), Trading & sales (18%), Retail banking (12%), Commercial banking (15%), Payment & settlement (18%), Agency services (15%), Asset management (12%), Retail brokerage (12%).
Incorporation of risk diversification benefits allowed. Given the continuing evolution of analytical approaches for operational risk, the Committee is not specifying the approach or distributional assumptions used to generate the operational risk measures for regulatory capital purposes. Example: Loss distribution approach.
COP^IMA = Σ_{i=1}^{8} Σ_{k=1}^{7} γik eik  (first attempt),
where eik is the expected loss for business line i, risk type k, and γik is a scaling factor. 7 loss types:
Internal fraud; External fraud; Employment practices and workplace safety; Clients, products & business practices; Damage to physical assets; Business disruption and system failures; Execution, delivery & process management.
[Figure: business lines BL1, . . . , BL8 versus risk types RT1, . . . , RT7; each cell (i, k) carries a loss L^{T+1}_{i,k}.]
LDA: continued
For each business line/loss type cell (i, k) one models L^{T+1}_{i,k}, the OpRisk loss for business line i, loss type k, over the future (one-year, say) period [T, T+1]:
L^{T+1}_{i,k} = Σ_{ℓ=1}^{N^{T+1}_{i,k}} X^ℓ_{i,k}.
LDA: continued
Remark: Look at the structure of the loss random variable L^{T+1}:
L^{T+1} = Σ_{i=1}^{8} Σ_{k=1}^{7} L^{T+1}_{i,k} = Σ_{i=1}^{8} Σ_{k=1}^{7} Σ_{ℓ=1}^{N^{T+1}_{i,k}} X^ℓ_{i,k} = Σ_{i=1}^{8} L^{T+1}_i.
S_{N(t)} = Σ_{k=1}^{N(t)} X_k.
Remarks: F_{S_{N(t)}}(x) = P(S_{N(t)} ≤ x) is called the total loss df. If t is fixed, we write S_N and F_{S_N} instead. The random variable S_{N(t)} is also referred to as a random sum.
Compound Sums
Assume: 1. (Xk) are iid with common df G, G(0) = 0; 2. N and (Xk) are independent. S_N is then referred to as a compound sum. The pdf of N is denoted by p_N(k) = P(N = k), k = 0, 1, . . . , and N is called a compounding rv.
Proposition 1: Let S_N be a compound sum and let the above assumptions hold. Then
F_{S_N}(x) = Σ_{k=0}^{∞} p_N(k) G^{k*}(x),  x ≥ 0,
where G^{k*} denotes the k-fold convolution of G.
Proposition 2: Let S_N be a compound sum and let the above assumptions hold. Then the Laplace-Stieltjes transform of S_N satisfies
F̂_{S_N}(s) = ∫₀^∞ e^{-sx} dF_{S_N}(x) = M_N(ln Ĝ(s)),  s ≥ 0,
where M_N denotes the moment-generating function of N.
Proposition 3: Let S_N be a compound sum and let the above assumptions hold. If E(N²) < ∞ and E(X₁²) < ∞, we have that
E(S_N) = E(N) E(X₁),
var(S_N) = var(N) (E(X₁))² + E(N) var(X₁).
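Proposition 3 is easy to check by simulation. A Python sketch with the illustrative choice N ~ Poi(100) and X ~ Exp(1), for which E(S_N) = 100 and var(S_N) = 100 · 1 + 100 · 1 = 200:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n_sims = 100.0, 200_000
N = rng.poisson(lam, n_sims)
# given N = k, a sum of k iid Exp(1) severities is Gamma(k, 1)
S = rng.gamma(np.maximum(N, 1), 1.0) * (N > 0)
mean_est, var_est = S.mean(), S.var()   # should be close to 100 and 200
```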
For the compound Poisson case N ~ Poi(λ) this simplifies to E(S_N) = λ E(X₁) and var(S_N) = λ E(X₁²).
S_N := Σ_{i=1}^{d} S_{N_i} = Σ_{i=1}^{d} Σ_{k=1}^{N_i} X_{i,k}
is again compound Poisson with λ = Σ_{i=1}^{d} λ_i and severity df G = Σ_{i=1}^{d} (λ_i/λ) G_i.
G is hence a mixture distribution. A simulation from G can be done in two steps: first draw i ∈ {1, . . . , d} with probability λ_i/λ, and then draw a loss with df G_i.
Over-dispersion
For compound Poisson rvs with N ~ Poi(λ) we have that E(N) = var(N) = λ. Count data, however, often exhibit over-dispersion, meaning that they indicate E(N) < var(N). This can be achieved by mixing, i.e. by randomizing the parameter λ.
Randomization
Other examples of mixing include:
from Black-Scholes to stochastic volatility models (randomize σ);
from N_d(μ, Σ) to elliptical distributions (randomize Σ), e.g. the multivariate t distribution; randomization of both μ and Σ leads to generalized hyperbolic distributions;
mixing models for credit risk (randomizing the default probability);
credibility theory in insurance (randomizing the underlying risk parameter);
Bayesian inference; . . .
Poisson Mixtures
Definition: Let Λ be a positive rv with distribution function F_Λ. A rv N given by
P(N = k) = ∫₀^∞ P(N = k | Λ = λ) dF_Λ(λ) = ∫₀^∞ e^{-λ} (λ^k / k!) dF_Λ(λ)
is called a mixed Poisson rv with structure or mixing distribution F_Λ. A compound sum with a mixed Poisson rv as the compounding rv is called a compound mixed Poisson rv.
Lemma: Suppose that N is mixed Poisson with structure df F_Λ. Then E(N) = E(Λ) and var(N) = E(Λ) + var(Λ), i.e. for Λ non-degenerate, N is over-dispersed.
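A quick numerical check of the lemma, assuming for illustration a Ga(5, 2) mixing distribution (this gamma-mixed Poisson is negative binomial): E(N) = E(Λ) = 10 and var(N) = E(Λ) + var(Λ) = 10 + 20 = 30.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims = 200_000
lam = rng.gamma(5.0, 2.0, n_sims)   # Lambda ~ Ga(shape=5, scale=2)
N = rng.poisson(lam)                # N | Lambda ~ Poi(Lambda)
# over-dispersion: sample variance exceeds sample mean
```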
For Λ ~ Ga(α, β) one obtains the negative binomial distribution:
P(N = k) = (Γ(α + k) / (Γ(α) k!)) β^α / (β + 1)^{α + k}.
Approximations
The distribution of S_N is generally intractable. Normal approximation for CPoi(λ, G):
F_{S_N}(x) ≈ Φ( (x - E(N)E(X₁)) / √(var(N)(E(X₁))² + E(N)var(X₁)) ).
Translated-gamma approximation for CPoi(λ, G): approximate S_N by k + Y, where k is a translation parameter and Y ~ Ga(α, β). The parameters k, α, β are found by matching the mean, variance and skewness.
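Both approximations can be sketched for CPoi(100, Exp(1)); the moment matching for the translated gamma has a closed-form solution, using the fact that Ga(α, β) has skewness 2/√α:

```python
import numpy as np
from scipy.stats import norm, gamma

lam = 100.0
mean = lam * 1.0                   # lam * E(X)
var = lam * 2.0                    # lam * E(X^2) for Exp(1)
skew = lam * 6.0 / var ** 1.5      # lam * E(X^3) / var^{3/2}

q_normal = norm.ppf(0.999, loc=mean, scale=np.sqrt(var))

# translated gamma k + Ga(a, b): match mean, variance and skewness
a = 4.0 / skew ** 2                # skewness 2/sqrt(a) = skew
b = np.sqrt(a / var)               # variance a/b^2 = var
k = mean - a / b                   # mean k + a/b = mean
q_tgamma = k + gamma.ppf(0.999, a, scale=1.0 / b)
```

The positively skewed translated gamma puts more mass in the right tail, so its 99.9% quantile exceeds the normal one.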
Simulated CPoi(100, Exp(1)) data together with normal and translated gamma approximations. The 99.9% quantile estimates are also given.
Simulated CPoi(100, Pa(4, 1)) data together with normal and translated gamma approximations. GPD approximation based on the POT method is also performed.
Panjer Class
Recursive method for approximating S_N in case the severity distribution G is discrete and N satisfies a specific condition.
Panjer class: The probability mass distribution of N belongs to the Panjer(a, b) class for some a, b ∈ R if p_N(k) = (a + b/k) p_N(k - 1) for k ≥ 1.
The only nondegenerate examples of distributions belonging to a Panjer(a, b) class are:
binomial B(n, p) with a = -p/(1 - p) and b = (n + 1)p/(1 - p);
Poisson Poi(λ) with a = 0 and b = λ;
negative binomial NB(α, p) with a = 1 - p and b = (α - 1)(1 - p).
Panjer Recursion
For a discrete severity rv X₁ we denote g_i := P(X₁ = i) and s_i := P(S_N = i).
Theorem: Suppose N satisfies the Panjer(a, b) class condition and g₀ = 0. Then s₀ = p_N(0) and, for k ≥ 1,
s_k = Σ_{i=1}^{k} (a + bi/k) g_i s_{k-i}.
For continuous severity distributions, discretization is necessary. A correction for g₀ > 0 is possible. Estimation of s_k far in the tail is more tricky.
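A minimal implementation of the recursion for the Poisson case (a = 0, b = λ), checked against the degenerate severity X₁ ≡ 1, for which S_N = N and the recursion must reproduce the Poisson pmf:

```python
import numpy as np
from scipy.stats import poisson

def panjer_poisson(lam, g, kmax):
    """Panjer recursion for N ~ Poi(lam): g[i] = P(X1 = i), g[0] = 0 assumed."""
    s = np.zeros(kmax + 1)
    s[0] = np.exp(-lam)              # s_0 = p_N(0) when g_0 = 0
    for k in range(1, kmax + 1):
        imax = min(k, len(g) - 1)
        i = np.arange(1, imax + 1)
        s[k] = np.sum((lam * i / k) * g[i] * s[k - i])
    return s

g = np.array([0.0, 1.0])             # degenerate severity X1 = 1
s = panjer_poisson(5.0, g, 20)
```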
Example
Simulated CPoi(100, LN(1, 1)) data together with the Panjer recursion approximation. Normal, translated gamma and GPD approximations are also performed.
Further Topics
S_N can be looked upon as a process in time, S_{N(t)}. Instead of N we then have the process {N(t), t ≥ 0} counting the number of events in [0, t]. Interesting examples for N(t) are: the homogeneous Poisson process; the non-homogeneous Poisson process; the Cox or doubly stochastic Poisson process. Of further interest is the surplus process C_t = u + ct - S_{N(t)} and the corresponding ruin probability
ψ(u, T) = P( inf_{0 ≤ t ≤ T} C_t < 0 ).
Ruin Probability
Recall ruin results for ψ(u) := ψ(u, ∞) = P( inf_{0 ≤ t < ∞} C_t < 0 ).
Cramér-Lundberg (small claims): ψ(u) < e^{-Ru} for u > 0, and ψ(u) ~ C₁ e^{-Ru} as u → ∞, where R > 0 is the adjustment coefficient.
Large claims (subexponential severity df G): ψ(u) ~ (ρ/(1 - ρ)) (1/μ) ∫_u^∞ Ḡ(t) dt as u → ∞, with ρ = λμ/c < 1.
Subexponential Distributions
Definition: For X₁, . . . , X_n positive iid random variables with common distribution function F_X, denote S_n = Σ_{k=1}^{n} X_k and M_n = max(X₁, . . . , X_n). The distribution function F_X is called subexponential (denoted F_X ∈ S) if for some (and then for all) n ≥ 2
lim_{x→∞} P(S_n > x) / P(M_n > x) = 1.
Examples: Pareto, generalized Pareto, lognormal, loggamma, . . .
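The defining limit can be illustrated by simulation for the Pareto df with P(X > x) = x^{-2}, x ≥ 1 (an illustrative choice). Since {M_n > x} ⊂ {S_n > x}, the ratio is at least 1 and approaches 1 for large x:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2_000_000
# inverse transform: X = U^{-1/2} has P(X > x) = x^{-2}, x >= 1
X1 = rng.uniform(size=n) ** -0.5
X2 = rng.uniform(size=n) ** -0.5
x = 100.0
p_sum = np.mean(X1 + X2 > x)              # P(S_2 > x)
p_max = np.mean(np.maximum(X1, X2) > x)   # P(M_2 > x)
ratio = p_sum / p_max                     # close to 1 for large x
```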
20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) ~ HPois(100t) and X₁ ~ Exp(1).
20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) ~ HPois(100t) and X₁ ~ Pareto(2, 1).
20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) a doubly stochastic Poisson process with a two-state Markov intensity process, HPois(10t) and HPois(100t) with mean holding times 5 and 0.2 respectively, and X₁ ~ Exp(1).
20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) a doubly stochastic Poisson process with a two-state Markov intensity process, HPois(10t) and HPois(100t) with mean holding times 5 and 0.2 respectively, and X₁ ~ Pareto(2, 1).
Basel II proposal
Period: one year. Distribution: should be based on internal data/models, external data and expert opinion. Confidence level: α = 99.9%; for economic capital purposes even α = 99.95% or α = 99.97%. Risk measure: VaR. Total capital charge:
C^{T+1,OR} = Σ_{i,k} VaR_α(L^{T+1}_{i,k}).
Skewness
100 iid loans: 2%-coupon, 100 face value, 1% default probability (period: 1 year):
Xi = 2 with probability 99%, Xi = -100 with probability 1% (loss).
For the portfolio one finds VaR_{95%}(Σ_{i=1}^{100} Xi) > Σ_{i=1}^{100} VaR_{95%}(Xi): for such skewed loss distributions VaR fails to be subadditive.
Special Dependence
Given rvs X₁, . . . , X_n with marginal dfs F₁, . . . , F_n, one can always find a copula C so that for the joint model F(x₁, . . . , x_n) = C(F₁(x₁), . . . , F_n(x_n)) VaR is superadditive:
VaR_α(Σ_{k=1}^{n} X_k) > Σ_{k=1}^{n} VaR_α(X_k).
Very Heavy-tailedness
Pareto: take X₁, X₂ independent with P(X_i > x) = x^{-1/2}, x ≥ 1. Then for x > 2
P(X₁ + X₂ > x) = 2√(x - 1)/x > P(2X₁ > x) = √(2/x),
so that VaR_α(X₁ + X₂) > VaR_α(2X₁) = VaR_α(X₁) + VaR_α(X₂).
Pareto type: a similar result holds for X₁, X₂ independent with P(X_i > x) = x^{-1/α} L(x), where α > 1 and L is slowly varying. For α < 1 we obtain subadditivity. WHY?
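The superadditivity can be confirmed numerically. A Python sketch using inverse-transform sampling (X = U^{-2} has the stated tail P(X > x) = x^{-1/2}); the level 0.99 is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
X1 = rng.uniform(size=n) ** -2.0   # P(X > x) = x^{-1/2}, x >= 1
X2 = rng.uniform(size=n) ** -2.0

alpha = 0.99
var_single = (1 - alpha) ** -2.0            # analytic VaR: F(x) = 1 - x^{-1/2}
var_sum = np.quantile(X1 + X2, alpha)       # simulated VaR of the sum
# superadditivity: VaR(X1 + X2) > VaR(X1) + VaR(X2)
```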
Several reasons: the (Marcinkiewicz-Zygmund) strong law of large numbers; an argument based on stable distributions. The main reason, however, comes from functional analysis: in the spaces L^p, 0 < p < 1, there exist no convex open sets other than the empty set and L^p itself. As a consequence, 0 is the only continuous linear functional on L^p; this is in violent contrast to L^p, p ≥ 1. Discussion: no reasonable risk measures exist; diversification goes the wrong way.
Definition: An R^d-valued random vector X is said to be regularly varying if there exists a sequence (a_n), 0 < a_n ↑ ∞, and a Radon measure μ ≠ 0 on B(R̄^d \ {0}) with μ(R̄^d \ R^d) = 0, so that as n → ∞,
n P(a_n^{-1} X ∈ ·) → μ(·) on B(R̄^d \ {0}).
Note that: (a_n) ∈ RV_{1/α} for some α > 0, and μ(uB) = u^{-1/α} μ(B).
Theorem (several versions; Samorodnitsky): If (X₁, X₂) is regularly varying with index 1/α, α < 1, then for confidence levels sufficiently close to 1, VaR is subadditive.
L^{T+1}_{i,k} = Σ_{l=1}^{N^{T+1}_{i,k}} X^l_{i,k}.
Tasks: a suitable model for the severity X^l_{i,k}; a suitable model for the frequency N^{T+1}_{i,k}; estimation of VaR for each cell (i, k).
[Figure: pooled OpRisk losses by loss type (types 1 to 3), 1992 to 2002.]
[Figure: mean excess plot of OpRisk losses, mean excess versus threshold.]
Stylized Facts
Stylized facts about OpRisk losses: loss amounts show extremes; loss occurrence times are irregularly spaced in time (reporting bias, economic cycles, regulation, management interactions, structural changes, . . . ); non-stationarity (frequency(!), severity(?)). Large losses are of main concern: repetitive versus non-repetitive losses. However: severity is of key importance.
Peaks-over-Threshold Method
Excess distribution: asymptotically generalized Pareto (GPD):
P(X - u > y | X > u) ≈ (1 + ξ y / β(u))^{-1/ξ}.
This yields tail and quantile estimates for x > u and confidence levels close to 1.
Threshold Choice
Application of POT-based estimates requires the choice of an appropriate threshold u. Rates of convergence are very tricky: there is no generally valid convergence rate; the convergence rate depends on F, in particular on the slowly varying function L, in a complicated way and may be very slow; L is not visible from the data directly.
Estimated shape parameters ξ_i by business line: 1.19 (*), 1.17, 1.01, 1.39 (*), 1.23, 1.22 (*), 0.85, 0.98; * means significant at the 95% level.
[Formula: spliced df with tail index ξ₁ for x ≤ v and tail index ξ₂ for x > v.]
Shape Plots
[Figure: shape plots of the estimated ξ against threshold (xdata up to 600); in the second case ξ₂ < ξ₁.]
(contd)
The change of behavior is typically visible in the mean excess plot. The hard case is v high: typically only few observations above v; the mean excess plot may not reveal anything; a classical POT analysis easily yields incorrect results; vast overestimation of VaR is possible.
Mixture Models
Example 2: Consider F_X = (1 - p)F₁ + pF₂, with F_i exact Pareto, i.e. F_i(x) = 1 - x^{-1/ξ_i} for x ≥ 1 and 0 < ξ₁ < ξ₂. Asymptotically, the tail index of F_X is ξ₂. VaR_α can be obtained numerically; furthermore it does not correspond to the VaR of a Pareto distribution with tail index ξ₂, but to the VaR corresponding to F₂ at a level lower than α.
[Figures: mean excess plot (thresholds up to 200) and shape plot (exceedances 433 down to 48) for the mixture example.]
Including Frequencies
The POT method can be embedded into a wider framework based on point processes. In the iid case exceedance times asymptotically follow a homogeneous Poisson process. Extensions (several possibilities): including the severities (marked Poisson process); non-stationarity (non-homogeneous Poisson processes); over-dispersion (doubly stochastic processes); short-range dependence (clustering).
If F_X is subexponential (and N is well behaved, as in the binomial, Poisson and negative binomial cases), then
lim_{x→∞} P(Σ_{i=1}^{N} X_i > x) / (1 - F_X(x)) = E(N).
Approximation of VaR:
VaR_α(Σ_{i=1}^{N} X_i) ≈ VaR_{α̃}(X),  α̃ = 1 - (1 - α)/E(N).
Bounds on VaR
Find optimal bounds
VaR^{T+1}_l ≤ VaR_α(Σ_{k=1}^{d} L^{T+1}_k) ≤ VaR^{T+1}_u,
given the marginal VaRs and dependence information. Solution: Fréchet problem; mass transportation problem.
For comonotone losses L^{T+1}_k, k = 1, . . . , d, VaR is additive:
VaR_α(Σ_{k=1}^{d} L^{T+1}_k) = Σ_{k=1}^{d} VaR_α(L^{T+1}_k).
Example: under comonotonicity VaR_{0.999}(Σ_i L_i) = Σ_i VaR_{0.999}(L_i) = 0.79, whereas the worst-case bound gives VaR_{0.999}(Σ_i L_i) ≤ 1.93.
Correlation
Correlation: Correlation (linear, rank, tail) is a one-number summary: ρ, ρ_S, λ, . . . Careful: linear correlation does not exist for ξ > 0.5; linear correlation is typically small for heavy-tailed rvs. Knowledge of correlation (linear, rank, tail, . . . ) and the individual marginal models is sufficient for some purposes, but totally insufficient in general.
Correlation
Upper and lower bounds on the linear correlation ρ(L₁, L₂) for L₁ ~ Pareto(2.5) (left) and L₁ ~ Pareto(2.05) (right), with L₂ ~ Pareto(α).
Copulas
Copula: With L^{T+1}_i ~ F_i the joint distribution can be written as
P(L^{T+1}_1 ≤ l₁, . . . , L^{T+1}_d ≤ l_d) = C(F₁(l₁), . . . , F_d(l_d)).
The function C is known as a copula and is a joint distribution on [0, 1]^d with uniform marginals. A copula and marginal distributions determine the joint model completely. However, there are not enough OpRisk data: one year of loss data corresponds to a single observation of (L^{T+1}_1, . . . , L^{T+1}_d).
L_k = Σ_{i=1}^{N_k(T)} X_{i,k},  k = 1, 2.
Proposition 1: For L₁, . . . , L_d iid and subexponential we have, for L = L₁ + · · · + L_d, that P(L > x) ~ d P(L₁ > x).
Proposition 2: Let L_i = Σ_{k=1}^{N_i} X_{i,k}, i = 1, . . . , d, be independent. Furthermore, for i = 1, . . . , d, the X_{i,k} are iid with P(X_{i,1} > x) = x^{-1/ξ_i} h_i(x), h_i slowly varying. If ξ₁ > · · · > ξ_d, we have that P(L > x) ~ c P(X_{1,1} > x).
Discussion
Tail issues: robust statistics; scaling; mixtures. Infinite mean: industry occasionally uses constrained estimation to ξ < 1, or estimates under the condition of a finite upper limit.
Discussion
(contd)
Aggregation issues: adding risk measures across a 7 × 8 table; reduction because of correlation effects. Data issues: impact of pooling; incorporation of external data and expert opinion; credibility theory.
References
[1] Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer.
[2] McNeil, A.J., Frey, R., and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.
[3] Moscadelli, M. (2004). The Modelling of Operational Risk: Experience with the Analysis of the Data Collected by the Basel Committee. Banca d'Italia, Report 517, July 2004.
[4] Nešlehová, J., Embrechts, P., and Chavez-Demoulin, V. (2006). Infinite mean models and the LDA for operational risk. Journal of Operational Risk, 1(1), 3-25.
Dependence between defaults is a key issue in credit risk management. In large balanced loan portfolios the main risk is the occurrence of many joint defaults; this might be termed extreme credit risk. Dependence between defaults critically affects the performance of many basket credit derivatives. Sources of dependence between defaults: dependence caused by common factors (e.g. interest rates and changes in economic growth) affecting all obligors; default of company A may have a direct impact on the default probability of company B, and vice versa, because of direct business relations, a phenomenon known as counterparty risk or contagion.
Empirical Evidence I
Moody's annual default rates (defaulted companies/overall number of rated companies) and changes in economic growth from 1920 to 1999; changes in economic growth clearly affect default rates.
Empirical Evidence II
[Figure: annual default rates by rating class CCC, B, BB, BBB, A.]
Standard & Poor's default data from 1980 to 2000 show clear evidence of cycles; we expect within-year and between-year dependence.
[Figure: probability of a given number of losses, 0 to 60.]
Distribution of number of defaults for homogeneous portfolio of 1000 BB loans with default probability 1%; Bernoulli mixture model with default correlation 0.22% is compared with independent default model.
Overview
In our treatment of credit risk management we cover the following: structural or firm-value models for credit risk, in particular Merton's model, KMV and CreditMetrics, with focus on determining individual default probabilities; static models for credit portfolios with focus on dependence modelling; approaches to model simulation and calibration.
Remark. In Merton's model equity is a call option on V with strike F. The option interpretation of equity explains certain conflicts of interest between shareholders and bondholders.
Under this assumption the default probability of our firm is readily computed. We have
P(V_T < F) = P(ln V_T < ln F) = Φ( (ln(F/V₀) - (μ_V - σ_V²/2)T) / (σ_V √T) ).  (24)
In line with economic intuition this is increasing in F and σ_V and decreasing in V₀ and μ_V.
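Equation (24) in code, together with the stated monotonicity checks; the input values are hypothetical:

```python
from math import log, sqrt
from scipy.stats import norm

def merton_pd(V0, F, mu_V, sigma_V, T):
    """Merton default probability P(V_T < F), equation (24)."""
    d = (log(F / V0) - (mu_V - 0.5 * sigma_V ** 2) * T) / (sigma_V * sqrt(T))
    return norm.cdf(d)

pd = merton_pd(V0=1.0, F=0.8, mu_V=0.05, sigma_V=0.25, T=1.0)
```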
Pricing of Debt
We now consider the value of the firm's debt, or equivalently of a zero-coupon bond issued by the firm. Recall from (23) that the value of the firm's debt equals the difference of default-free debt and a put option on V with strike F, i.e. F_t = F p₀(t, T) - P^BS(t, V_t; r, σ_V, F, T), where p₀(t, T) = exp(-r(T - t)) is the price of a default-free zero-coupon bond. The Black-Scholes formula for European puts now yields
F_t = p₀(t, T) F Φ(d_{t,2}) + V_t Φ(-d_{t,1}).  (26)
Credit Spreads
Recall that the credit spread measures the difference of the (continuously compounded) yields of a defaultable zero-coupon bond p₁(t, T) and a default-free zero-coupon bond p₀(t, T), i.e.
c(t, T) = -(1/(T - t)) (ln p₁(t, T) - ln p₀(t, T)) = -(1/(T - t)) ln (p₁(t, T)/p₀(t, T)).  (27)
In Merton's model we have p₁(t, T) = (1/F) F_t. Note that c(t, T) depends only on σ_V and on the ratio F p₀(t, T)/V_t (a measure of the indebtedness of the firm). In line with economic intuition it is increasing in both quantities.
Credit spread c(t, T) (%) as a function of σ_V (top) and time to maturity T - t (bottom) for a fixed debt-to-firm-value ratio d = 0.6. In the upper picture T - t = 2; in the lower picture σ_V = 0.25. Note that for T - t < 0.25 (3 months) the credit spread implied by Merton's model is approximately zero.
(28)
In the KMV model the EDF has a similar structure, but the normal df is replaced by an estimated function, and V₀ and σ_V are determined from equity data using Merton's model.
Calculation of EDFs
Distance to default. Relation (28) for EDF_Merton might be too simplistic. Therefore KMV defines in an intermediate step the distance to default, DD, by
DD := (V₀ - F̃)/(σ_V V₀),  (29)
where F̃ represents the default threshold (typically liabilities payable within one year). (29) is an approximation of the argument of (28), since μ_V and σ_V² are small and ln V₀ - ln F̃ ≈ (V₀ - F̃)/V₀.
Calculation of EDFs. KMV uses DD as state variable and assumes that firms with equal DD have equal EDFs. The functional relation between EDF and DD is not postulated, but estimated empirically using a proprietary default database.
Empirical probabilities of migrating from one rating to another within 1 year. Source: Standard & Poor's CreditWeek. For example, the 1-year default probability of a B-rated firm is 5.2%.
References
[Merton, 1974] (on Merton's model); [Crosbie and Bohn, 2002] (KMV's default model); [Kealhofer and Bohn, 2001] (KMV's portfolio model); [RiskMetrics-Group, 1997] (CreditMetrics manual); [Crouhy et al., 2000] (good comparison of different models); [Crouhy et al., 2001] (book by the same authors).
Distribution of Defaults
For y = (y₁, . . . , y_m) in {0, 1}^m we get
P(Y = y | Ψ = ψ) = Π_{i=1}^{m} p_i(ψ)^{y_i} (1 - p_i(ψ))^{1-y_i},
and unconditionally
f(y) = P(Y = y) = ∫_{R^p} Π_{i=1}^{m} p_i(ψ)^{y_i} (1 - p_i(ψ))^{1-y_i} g(ψ) dψ,
where g(ψ) is the probability density of the factors. By adding exposures and assumptions about losses given default we complete the specification of a one-period model.
Default Correlation
Definition. The default or event correlation of firms i and j is given by corr(Y_i, Y_j). Denote by p_i = P(Y_i = 1) = E(Y_i) the unconditional default probabilities. We have cov(Y_i, Y_j) = E(Y_i Y_j) - p_i p_j and var(Y_i) = p_i(1 - p_i). Hence
corr(Y_i, Y_j) = (E(Y_i Y_j) - p_i p_j) / √(p_i(1 - p_i) p_j(1 - p_j)),
so that default correlation can be computed from the joint default probability E(Y_i Y_j) = P(Y_i = 1, Y_j = 1). Note that in mixture models p_i = ∫_{R^p} p_i(ψ) g(ψ) dψ.
CreditRisk+
CreditRisk+ may be represented as a Bernoulli mixture model. The distribution of the default indicators is given by p_i(Ψ) = 1 - exp(-w_i'Ψ). Here Ψ = (Ψ₁, . . . , Ψ_p)' is a vector of independent gamma-distributed macroeconomic factors and w_i = (w_{i,1}, . . . , w_{i,p})' is a vector of positive factor weights.
Remark: CreditRisk+ is usually presented as a Poisson mixture where, conditional on Ψ, the default of counterparty i occurs independently of other counterparties with Poisson intensity λ_i(Ψ) = w_i'Ψ. This leads to the above default probabilities. The model assumptions imply that the distribution of the number of defaults is a sum of independent negative binomials.
π_k := P(Y_{i₁} = 1, . . . , Y_{i_k} = 1) = E(Q^k) = ∫₀¹ q^k dG(q).
Unconditional default probabilities and higher-order joint default probabilities are moments of the mixing distribution.
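For a concrete mixing distribution, e.g. Q ~ Beta(a, b) (illustrative parameters below), these moments are available in closed form, π₁ = a/(a+b) and π₂ = a(a+1)/((a+b)(a+b+1)), and can be verified by simulation:

```python
import numpy as np

a, b = 1.0, 99.0
pi1 = a / (a + b)                                    # E(Q)
pi2 = a * (a + 1) / ((a + b) * (a + b + 1))          # E(Q^2)
default_corr = (pi2 - pi1 ** 2) / (pi1 * (1 - pi1))  # corr(Y_i, Y_j)

# check pi_2 = P(Y_1 = 1, Y_2 = 1) by simulation
rng = np.random.default_rng(4)
n = 400_000
Q = rng.beta(a, b, n)
Y1 = rng.uniform(size=n) < Q
Y2 = rng.uniform(size=n) < Q
pi2_sim = np.mean(Y1 & Y2)
```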
[Figure:] Beta density g(q) of the mixing variable Q in an exchangeable Bernoulli mixture model with π = 0.005 and ρ_Y = 0.0018.
[Figure: tail probabilities of Q for different mixing distributions.] The horizontal line at 0.01 shows that the models only diverge around the 99th percentile of Q. For given π and π₂ there is little model risk.
Special Cases
One-factor model:
p_i(ψ) = Φ( (Φ^{-1}(p_i) + √ρ ψ) / √(1 - ρ) ),  (31)
where ρ turns out to be the asset correlation between any two critical variables X_i, X_j.
Conditional on a realization ψ of the common factor, the SLLN says that, almost surely,
lim_{m→∞} L^{(m)}/m = q = p₁(ψ).
In other words, the distribution of Q = p₁(Ψ) can be thought of as the distribution of the portfolio loss (expressed as a fraction) in an infinitely large exchangeable portfolio.
Remarks
In this example the asymptotic portfolio loss distribution, the distribution of
Q = Φ( (Φ^{-1}(p̄) + √ρ Ψ) / √(1 - ρ) ),
has been called the Vasicek loss distribution [Vasicek, 1997]. (It is a probit-normal distribution.) The idea of looking at infinitely fine-grained portfolios, where the individual risks become negligible and the systematic factor(s) dominate(s), has been taken up by other researchers in more complicated models and has influenced regulation [Gordy, 2003].
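Sampling from the Vasicek loss distribution is immediate; a sketch with illustrative p̄ and ρ (the sign in front of √ρ Ψ is irrelevant by the symmetry of Ψ). By construction E(Q) = p̄, while the distribution is heavily right-skewed:

```python
import numpy as np
from scipy.stats import norm

p_bar, rho = 0.01, 0.2
rng = np.random.default_rng(5)
psi = rng.standard_normal(500_000)                  # systematic factor Psi ~ N(0,1)
Q = norm.cdf((norm.ppf(p_bar) + np.sqrt(rho) * psi) / np.sqrt(1 - rho))
```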
The df and quantiles of Q are
P(L^{(m)} ≤ l) ≈ Φ( (√(1 - ρ) Φ^{-1}(l/m) - Φ^{-1}(p̄)) / √ρ ),
q_α(Q) = Φ( (Φ^{-1}(p̄) + √ρ Φ^{-1}(α)) / √(1 - ρ) ).
General Results
We will give two results, as in our book; proofs are found in [Frey and McNeil, 2003]. Let (e_i)_{i∈N} be an infinite sequence of positive deterministic exposures, (Y_i)_{i∈N} the corresponding sequence of default indicators, and (δ_i)_{i∈N} a sequence of random variables with values in (0, 1] representing percentage losses given that default occurs. In this setting the loss for a portfolio of size m is given by L^{(m)} = Σ_{i=1}^{m} L_i, where L_i = e_i δ_i Y_i are the individual losses. We now make some technical assumptions for our model.
lim_{m→∞} (1/m) E(L^{(m)} | Ψ = ψ) = lim_{m→∞} (1/m) Σ_{i=1}^{m} ℓ_i(ψ) = ℓ(ψ) for all ψ ∈ R^p; ℓ(ψ) is known as the asymptotic conditional loss function. We preserve the composition of the portfolio as it grows.
3. There is some C < ∞ such that Σ_{i=1}^{m} (e_i/i)² < C for all m. No individual exposure may be too large.
Examples
Consider one-factor models with e_i = δ_i ≡ 1.
Exchangeable model: Proposition 2 implies that for m large we have q_α(L^{(m)}) ≈ m q_α(Q). The tail of the distribution of L^{(m)} is determined by the tail of Q.
Homogeneous group model: If the relative group sizes m_r/m converge to fixed constants λ_r as m → ∞ for all r, we obtain ℓ(ψ) = Σ_{r=1}^{k} λ_r h(μ_r + ψ). Hence for m large we have
q_α(L^{(m)}) ≈ m Σ_{r=1}^{k} λ_r h(μ_r + Φ^{-1}(α)).
q_α(L^{(m)}) ≈ Σ_{i=1}^{m} δ_i e_i p_i(Φ^{-1}(α)).
In the internal-ratings-based (IRB) approach the capital required for risk i is proportional to
δ_i e_i p_i(Φ^{-1}(0.999)) = δ_i e_i Φ( (Φ^{-1}(p_i) + √ρ Φ^{-1}(0.999)) / √(1 - ρ) ).
C1. Motivation
Consider a Bernoulli mixture model for a loan portfolio and assume that the overall loss is of the form L = Σ_{i=1}^{m} L_i, where the L_i are conditionally independent given some economic factor vector Ψ. Suppose we wish to measure the portfolio risk with expected shortfall and to calculate a capital allocation based on expected shortfall contributions at the confidence level α. We need to evaluate the conditional expectations
E(L | L ≥ q_α(L)) and E(L_i | L ≥ q_α(L)).  (33)
A possible approach is to use Monte Carlo (MC) simulation, although the problem of rare-event simulation arises.
θ = E(h(X)) = ∫ h(x) f(x) dx,  (34)
for some known function h. For the probability of an event we have h(x) = 1_{x∈A} for some set A ⊂ R; for expected shortfall computation we have h(x) = x 1_{x≥c} for some c ∈ R. Where the analytical evaluation of θ is difficult we can resort to an MC approach:
1. Simulate X₁, . . . , X_n independently from density f.
2. Compute the standard MC estimate θ̂_n^MC = (1/n) Σ_{i=1}^{n} h(X_i).
θ = ∫ h(x) r(x) g(x) dx = E_g(h(X) r(X)),  (35)
where E_g denotes expectation with respect to the density g and r(x) := f(x)/g(x) is the likelihood ratio. Hence we can approximate the integral with the following algorithm:
1. Simulate X₁, . . . , X_n independently from density g.
2. Compute the IS estimate θ̂_n^IS = (1/n) Σ_{i=1}^{n} h(X_i) r(X_i).
The variance of the IS estimate is driven by E_g(h(X)² r(X)²); the aim is to make E_g(h(X)² r(X)²) small compared to E(h(X)²). Consider the case of estimating a tail probability where h(x) = 1_{x≥c} for c significantly larger than the mean of X. We try to choose g so that the likelihood ratio r(x) = f(x)/g(x) is small for x ≥ c; in other words we make the event {X ≥ c} more likely under the IS density g than it is under the original density f.
Exponential Tilting
For t ∈ R we write M_X(t) = E(e^{tX}) = ∫ e^{tx} f(x) dx for the moment-generating function of X, which we assume is finite for t ∈ R. It is not hard to check that we can define a density g_t(x) := e^{tx} f(x)/M_X(t), which can be used for importance sampling when X is light-tailed. Define μ_t to be the mean of X with respect to the density g_t, i.e. μ_t := E_{g_t}(X) = E(X exp(tX))/M_X(t). How can we choose t optimally for a particular importance-sampling problem? In the case of tail probability estimation, theory suggests we should choose t as the solution of μ_t = c.
For X ~ N(0, 1), tilting gives g_t(x) ∝ e^{tx} e^{-x²/2}, so that under the tilted distribution X ~ N(t, 1). In particular, exponential tilting is a convenient way of shifting the mean of X.
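For the normal example this gives a complete IS scheme. A sketch estimating θ = P(X > c) for X ~ N(0, 1) with tilt t = c, where the likelihood ratio is r(x) = φ(x)/φ(x - c) = exp(-cx + c²/2); the choice c = 3 is illustrative:

```python
import numpy as np
from scipy.stats import norm

c, n = 3.0, 200_000
rng = np.random.default_rng(6)
x = rng.standard_normal(n) + c                 # draws from g_c = N(c, 1)
r = np.exp(-c * x + 0.5 * c ** 2)              # likelihood ratio phi(x)/phi(x - c)
theta_is = np.mean((x > c) * r)                # IS estimate of P(X > c)
theta_exact = norm.sf(c)                       # exact tail probability
```

Because r(x) ≤ e^{-c²/2} on the event {x > c}, the IS estimator has far smaller variance than naive MC at the same sample size.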
Q(A) := ∫_A g(x) dx,  A ⊂ R.
With this notation (35) becomes θ = E_P(h(X)) = E_Q(h(X) r(X)), so that r(X) equals dP/dQ, the (measure-theoretic) density of P with respect to Q.
exp(tX) ;A , MX (t)
The IS algorithm remains essentially unchanged: simulate independent n 1 realizations Xi under the measure Qt and set IS = n i=1 Xirt(Xi) as before.
352
353
P({y}) = Π_{i=1}^{m} p_i^{y_i} (1 - p_i)^{1-y_i},  y ∈ {0, 1}^m.
M_L(t) = E( exp(t Σ_{i=1}^{m} e_i Y_i) ) = Π_{i=1}^{m} E(e^{t e_i Y_i}) = Π_{i=1}^{m} (e^{t e_i} p_i + 1 - p_i).
The measure Q_t is given by Q_t({y}) = E_P(e^{tL}/M_L(t); Y = y) and hence
Q_t({y}) = (exp(t Σ_{i=1}^{m} e_i y_i) / M_L(t)) P({y}).
Q_t({y}) = Π_{i=1}^{m} q̄_{t,i}^{y_i} (1 - q̄_{t,i})^{1-y_i},
where q̄_{t,i} := exp(t e_i) p_i / (exp(t e_i) p_i + 1 - p_i). The default indicators remain independent but with new default probabilities q̄_{t,i}. The optimal value of t is chosen such that E_{Q_t}(L) = c, leading to the equation Σ_{i=1}^{m} e_i q̄_{t,i} = c.
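A sketch for a homogeneous portfolio (illustrative values m = 100, p_i = 0.05, e_i = 1, c = 20): the tilt t is found with a root-finder from Σ_i e_i q̄_{t,i} = c, and the IS estimate of P(L ≥ c) is compared with the exact binomial tail available in this simple case:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom

m, p, c, n = 100, 0.05, 20, 50_000

def mean_tilted(t):
    q = np.exp(t) * p / (np.exp(t) * p + 1 - p)   # tilted default probability
    return m * q - c                              # E_{Q_t}(L) - c

t_star = brentq(mean_tilted, 0.0, 10.0)           # solve sum_i e_i qbar_{t,i} = c
q_bar = np.exp(t_star) * p / (np.exp(t_star) * p + 1 - p)

rng = np.random.default_rng(7)
L = rng.binomial(m, q_bar, n)                     # losses simulated under Q_t
M_L = (np.exp(t_star) * p + 1 - p) ** m           # mgf M_L(t)
weights = M_L * np.exp(-t_star * L)               # likelihood ratio r_t(L)
est = np.mean((L >= c) * weights)                 # IS estimate of P(L >= c)
exact = binom.sf(c - 1, m, p)                     # exact binomial tail
```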
Example
We consider an exchangeable portfolio of 100 firms with identical unit exposures, default probabilities 0.05 and asset correlations (i.e. values of ρ) 0.05. The aim is to calculate the tail probability P(L ≥ 20) by IS. (In such a simple model it can in fact be calculated analytically to be 0.00112.) We compare:
1. Naive Monte Carlo (n = 10000).
2. IS for the factor distribution (n = 10000).
3. Naive Monte Carlo for the factor (n = 10000) and IS for the conditional default distribution (n₁ = 50).
4. IS for the factor distribution (n = 10000) and for the conditional default distribution (n₁ = 50).
Results
[Figure: running estimates of P(L ≥ 20) over 10000 iterations for No IS, Outer IS, Inner IS and Full IS.]
where g is a link function, typically a mapping from R to (0, 1) like a distribution function (e.g. g = Φ); x_i and z_i are explanatory variables (covariates) for the i-th obligor, such as indicators for rating category or sector, or firm-specific information from the balance sheet; β are unknown parameters (generally including an intercept).
[Diagram: graphical model with observed defaults y and latent factors b.]
N.B. the factors are represented here by b; the hyperparameters of the distribution of b complete the specification.
A Multi-Period Model
Given factors Ψ_t in time period t we assume that the individual default indicators Y_{t,1}, . . . , Y_{t,m_t} are conditionally independent with

Y_{ti} | Ψ_t = ψ_t ~ Be(p_{ti}(ψ_t)), i = 1, . . . , m_t,   (37)

where p_{ti}(ψ_t) = g(μ_{σ(t,i)} + x'_{ti}β + z'_{ti}ψ_t). Note some slight notational changes: σ(t, i) returns the credit rating of company i, and μ_1, . . . , μ_K are unknown intercepts.
where ω² := var(ε_{ti}).
Multi-Period Model

[Graphical model: in each period t, covariates x_t and latent factors b_t determine the default data y_t; likewise x_{t+1} and b_{t+1} determine y_{t+1}, with the factors linked across periods.]
Serial Dependence

Serial dependence in default probabilities can be modelled using correlated latent factors Ψ_1, . . . , Ψ_T. We assume that
1. conditional on (Ψ_t)_{t=1,...,T} the default indicator vectors Y_1, . . . , Y_T are independent; moreover, Y_t depends on Ψ_t only;
2. the latent factors (Ψ_t)_{t=1,...,T} form a Markov chain.
Remark: The above assumptions define a state space model (hidden Markov model) for the sequence Y_1, . . . , Y_T.
f(y_t | β, σ) = ∫_{R^p} ∏_{i=1}^{m_t} P(Y_{ti} = y_{ti} | Ψ_t = ψ, β) g(ψ | σ) dψ,

where p = dim(Ψ_t) and g(· | σ) is the density of Ψ_t. The likelihood function with Ψ_1, . . . , Ψ_T independent is

∏_{t=1}^T f(y_t | β, σ).   (38)

To evaluate this expression we have an integral over R^{Tp}, which makes standard maximum likelihood difficult.
Bayesian Inference
We distinguish between observed quantities D := (x_t, z_t, y_t)_{t=1,...,T} and unobserved quantities Θ := (β, σ, Ψ_1, . . . , Ψ_T). The prior distribution p(Θ) expresses a state of knowledge (or ignorance) about the unobserved elements before the data D are obtained. Inference in our model is based on the posterior distribution

p(Θ | D) = p(D | Θ) p(Θ) / p(D) = p(D | Θ) p(Θ) / ∫ p(D | Θ') p(Θ') dΘ'.

Problem: finding p(Θ | D)!
Advantages of MCMC
calibration of complex models with multivariate, serially correlated latent factors;
straightforward implementation; fast simulation;
point estimates, standard errors and (joint) confidence sets of parameters are inherent in the output;
inference about derived model parameters (e.g. default correlations, implied asset correlations) is as easy as for primary parameters;
the posterior path of the latent factors (Ψ_t) can be compared with other macro-economic variables;
prior information about parameters governing portfolio dependence can be entered into the analysis.
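As a toy illustration of the MCMC idea (everything here is invented for illustration, and is far simpler than the Gibbs samplers used for the portfolio models): a random-walk Metropolis sampler for the posterior of a single default probability with a Gaussian prior on its logit.

```python
import numpy as np

def rw_metropolis(log_post, x0, n_iter=20_000, step=0.3, seed=0):
    """Generic random-walk Metropolis sampler for a 1-d log-posterior."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    out = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
            x, lp = prop, lp_prop
        out[i] = x
    return out

# Toy posterior: k defaults among m firms, logit(p) = mu, prior mu ~ N(0, 2^2)
m, k = 100, 5
def log_post(mu):
    p = 1.0 / (1.0 + np.exp(-mu))
    return k * np.log(p) + (m - k) * np.log(1.0 - p) - mu**2 / 8.0

draws = rw_metropolis(log_post, x0=-3.0)
p_draws = 1.0 / (1.0 + np.exp(-draws[5000:]))     # discard burn-in
print(p_draws.mean())                              # posterior mean, near k/m
```

The draws after burn-in approximate the posterior; posterior means, standard errors and credible intervals are read off the sample directly, which is the practical point of the list above.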
Empirical Analysis

m_{tk} := #{i : σ(t, i) = k} (number of companies in rating class k in period t), M_t := (M_{t1}, . . . , M_{tK})' (vector of default count variables).

We fit several models to S&P default data (rating classes CCC, B, BB, BBB, A) by Gibbs sampling with non-informative priors. The sequence (Ψ_t) of scalar latent factors satisfies an AR(1) process: Ψ_t = αΨ_{t−1} + ε_t, where ε_0, ε_1, . . . are iid Gaussian and Ψ_0 = ε_0 / √(1 − α²) (stationary initialization).

Given (Ψ_t)_{t=1,...,T}, we assume M_1, . . . , M_T conditionally independent.

Model 1: M_{tk} | Ψ_t = ψ_t ~ B(m_{tk}, g(μ_k + ψ_t)), where g(x) = 1/(1 + exp{−x}) is the logit response.
[Graphical model: in each period t the latent factor b_t drives the default counts M_{t1}, . . . , M_{tK} for exposures m_{t1}, . . . , m_{tK}; similarly b_{t+1} drives M_{t+1,1}, . . . , M_{t+1,K}.]
Results

Posterior means (with standard deviations) of the rating-class parameters:

A: 9.097 (0.654)   BBB: 7.144 (0.356)   BB: 5.712 (0.276)   B: 3.872 (0.239)

and of the two AR(1) parameters: 0.396 (0.083) and 0.649 (0.169).
[Figure: MCMC trace plots over 2500 iterations and a histogram of the posterior draws of α (range 0.0 to 0.8).]
Extended models

Time heterogeneity: (x_t) denotes the Chicago Fed National Activity Index.
Model 2: M_{tk} | Ψ_t = ψ_t ~ B(m_{tk}, g(μ_k + βx_t + ψ_t)).

Heterogeneity among rating classes:
Model 3: M_{tk} | Ψ_t = ψ_t ~ B(m_{tk}, g(μ_k + σ_k ψ_t)).

Sector heterogeneity: Consider S industry sectors, and denote by M_{tsk} (m_{tsk}) the number of defaults (companies) for rating class k, period t, and sector s. Set Ψ_t = (Ψ_{t1}, . . . , Ψ_{tS})', where Ψ_{ts} := Ψ_t + η_{ts}, with (Ψ_t) as before and η_{t1}, . . . , η_{tS} iid Gaussian.
Model 4: M_{tsk} | Ψ_t ~ B(m_{tsk}, g(μ_k + Ψ_{ts})). The covariance matrix of Ψ_t is of compound symmetry type.
A model without sector-specific latent factors yields an overall implied asset correlation of 6.9% for the same dataset.
The elements of the matrix corr(Yti, Ytj ) are migration correlations of obligors i and j.
[Figure: rating migration diagram with categories AAA, AA, A, BBB, BB, B, CCC and D.]
Let ε_{t1}, . . . , ε_{t,m_t} be iid rvs with df G (independent of Ψ_t). We define V_{ti} := x'_{ti}β + z'_{ti}Ψ_t + ε_{ti}, i = 1, . . . , m_t, and notice that R_{ti} = j if and only if V_{ti} falls between the thresholds μ_{σ(t,i),j} and μ_{σ(t,i),j+1}.

Interpretation: V_{ti} is the asset value and the (μ_{σ(t,i),j}) are critical liability levels. We refer to corr(V_{ti}, V_{tj}) as the implied asset correlation between obligors i and j.
A. Credit Derivatives
1. Overview
2. Credit default swaps
3. Portfolio products
A1. Overview.
We find it convenient to divide the universe of credit-risky securities into the following three groups: vulnerable securities, single-name credit derivatives and portfolio credit derivatives.

Vulnerable securities. These are securities whose actual payoff is affected by the default of the issuer (or of a party in a contract), but where trading or management of credit risk is not the primary purpose of the transaction. Examples: vulnerable options, interest rate swaps and corporate bonds.
An Overview ctd
Single-name credit derivatives. Credit derivatives are derivative securities which are primarily used for the management or trading of credit risk. In the case of a single-name credit derivative the payoff depends only on the credit risk of a single obligor. Prime example: the credit default swap (CDS).

Portfolio credit derivatives. Credit derivatives whose payoff depends on the credit risk within a portfolio of obligors. Prime examples: basket credit derivatives and collateralized debt obligations (CDOs).
CDS Indices
There is a variety of standardized CDS indices for different regions (North America, Europe, Asia etc.) and industry sectors. These indices are fairly liquid, and quotes for protection buyer and seller positions are readily obtained. Indices for North America are known under the label DJ CDX . . . , indices for Europe and Asia under the labels DJ iTraxx Europe . . . , DJ iTraxx Asia . . . etc.

Consider a CDS index on m names. The payoff of the index corresponds to the payoff of a portfolio of CDSs containing 1/m units of a single-name CDS on every name in the portfolio. In particular, the index spread is (approximately) given by the average spread of the single-name CDSs in the portfolio.
[Figure: payment flows in a CDO: Assets → SPV → Liabilities (Senior, Mezzanine, Equity tranches); labels include interest and principal, protection fee, CDS premiums, initial investment and proceeds.]

Payments in a CDO structure; above arrow: asset-based structure; below arrow: synthetic CDO.
Types of CDOs
Asset-based CDO. These CDOs are based on a portfolio of real assets such as bonds (CBOs), loans (CLOs), mortgage-backed securities etc. Noteholders make an initial investment used to buy the underlying assets; in return they receive interest and principal repayments.

Synthetic CDO. Here the underlying assets consist of single-name CDSs on a pool of firms; the noteholders make default payments and receive a periodic protection fee. There are no initial payments. Mixed forms exist.

In a balance-sheet CDO the pool of underlying assets remains essentially unchanged over the duration of the transaction. In an arbitrage CDO the underlying assets are actively traded with the goal of making additional profit.
Economic motivations
CDOs are arranged for a number of reasons.

Spread arbitrage. Often the sale price of the notes exceeds the initial value of the underlying assets, as the notes have a more favourable risk-return profile due to diversification.

Regulatory capital relief. An important reason for balance-sheet CLOs; the structure enables banks to sell some of their credit risk but maintain the borrower-lender relationship.

Risk transfer. Banks use CDOs and related credit derivatives to improve the risk/return profile of their loan book.
Reduced-form models.
In reduced-form models the default of a firm is modelled by some random time τ whose df is specified by the modeller; the precise economic mechanism leading to default is left open. A similar modelling philosophy was used in Bernoulli mixture models for credit risk management.

Reduced-form models are popular in practice. They lead to tractable formulas explaining the price of credit-risky securities in terms of economic covariates. With reduced-form models, the pricing machinery for default-free term-structure models can be applied to many defaultable securities as well.
The hazard rate of a random time τ with df F, survival function F̄ = 1 − F and density f is

λ(t) = lim_{h↓0} (1/h) P(τ ≤ t + h | τ > t) = lim_{h↓0} (1/h) (F(t + h) − F(t)) / F̄(t).

Example. 1) The exponential distribution with parameter λ has df F(t) = 1 − e^{−λt}, so that λ(t) = λ for all t > 0. 2) The Weibull distribution has df F(t) = 1 − exp(−λt^α) for λ, α > 0. This yields f(t) = λαt^{α−1} exp(−λt^α) and λ(t) = λαt^{α−1}. Note that λ(t) is decreasing in t if α < 1 and increasing if α > 1.
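A quick numerical check of the Weibull hazard (the function name is mine):

```python
import numpy as np

def weibull_hazard(t, lam, alpha):
    """lambda(t) = f(t) / (1 - F(t)) = lam * alpha * t**(alpha - 1)
    for the Weibull df F(t) = 1 - exp(-lam * t**alpha)."""
    return lam * alpha * t ** (alpha - 1)

t = np.array([0.5, 1.0, 2.0])
print(weibull_hazard(t, lam=0.02, alpha=1.0))  # constant: exponential case
print(weibull_hazard(t, lam=0.02, alpha=0.5))  # decreasing hazard (alpha < 1)
print(weibull_hazard(t, lam=0.02, alpha=1.5))  # increasing hazard (alpha > 1)
```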
If τ has hazard rate λ(t), its survival function is F̄(t) = exp(−∫_0^t λ(s) ds).   (39)
Conditional Expectations
Prices of credit derivatives will be (conditional) expectations wrt (H_t).

Lemma. Let τ be a random time with jump indicator process Y_t = 1_{{τ ≤ t}} and associated default history (H_t). Then, for any integrable rv X and any t ≥ 0, we have

E(1_{{τ>t}} X | H_t) = 1_{{τ>t}} E(X; τ > t) / P(τ > t).

This gives the following expression for the conditional survival function of τ: with X := 1_{{τ>s}}, s > t, we get

P(τ > s | H_t) = E(X | H_t) = E(1_{{τ>t}} X | H_t) = 1_{{τ>t}} F̄(s) / F̄(t).
Consider a zero-recovery, zero-coupon bond maturing at T; given a deterministic short rate r(s), its price at time t equals

E( exp(−∫_t^T r(s) ds) 1_{{τ>T}} | H_t ).   (40)

If τ has hazard rate λ(t), we have F̄(s)/F̄(t) = exp(−∫_t^s λ(u) du). Hence (40) equals 1_{{τ>t}} exp(−∫_t^T (r(s) + λ(s)) ds).

Remark. The bond price can be viewed as the price of a default-free bond in a model with adjusted interest rate R(t) = r(t) + λ(t). This holds more generally in models where the default time is a doubly stochastic random time.
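A numerical sketch of the zero-recovery bond price exp(−∫_t^T (r(s) + λ(s)) ds), with trapezoidal integration; the flat curves and function name are illustrative:

```python
import numpy as np

def defaultable_bond_price(t, T, r, lam, n=2001):
    """Price (given survival up to t) of a zero-recovery defaultable
    zero-coupon bond with deterministic short rate r(.) and hazard
    rate lam(.): exp(-int_t^T (r(s) + lam(s)) ds)."""
    s = np.linspace(t, T, n)
    vals = r(s) + lam(s)
    h = (T - t) / (n - 1)
    integral = h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
    return np.exp(-integral)

# flat 3% short rate and 2% hazard rate, 5-year bond
p = defaultable_bond_price(0.0, 5.0, lambda s: np.full_like(s, 0.03),
                           lambda s: np.full_like(s, 0.02))
print(p)   # exp(-0.25) ≈ 0.7788
```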
Martingale Modelling
When building a model for pricing derivatives, the dynamics of the objects of interest (e.g. interest rates or default times) are often specified directly under a risk-neutral measure Q. This approach is termed martingale modelling. Using risk-neutral pricing, if the value H of an asset at maturity T is exogenously given, its price H_t at time t < T can be computed as the conditional expectation under Q of the discounted payoff. Denote by (r_t) the default-free short rate. We have the following formula
H_t = E^Q( exp(−∫_t^T r_s ds) H | F_t ).   (41)
Model parameters are determined by equating the price of traded securities computed via (41) to observed market prices (calibration to market data).
The value of the premium leg, with premium dates t_1 < · · · < t_N and spread x, is

V^Prem(x; λ^Q) = E^Q( ∑_{k=1}^N x exp(−∫_0^{t_k} r(s) ds) 1_{{τ>t_k}} ) = x ∑_{k=1}^N exp(−∫_0^{t_k} (r(s) + λ^Q(s)) ds).
The value of the default payment leg is

V^Def(λ^Q) = E^Q( exp(−∫_0^τ r(u) du) 1_{{τ<t_N}} ).

Since τ has density f(t) = λ^Q(t) exp(−∫_0^t λ^Q(u) du), defining R(u) := r(u) + λ^Q(u) we get

V^Def(λ^Q) = ∫_0^{t_N} e^{−∫_0^t r(s) ds} f(t) dt = ∫_0^{t_N} λ^Q(t) e^{−∫_0^t R(s) ds} dt.
Fair swap spread x. Since there are no initial payments in a CDS, the initial value of the contract is zero. Hence x is given by the relation V^Prem(x; λ^Q) = V^Def(λ^Q), which is easily solved for x. Note that x depends on the intensity function λ^Q, as V^Prem and V^Def depend on λ^Q.
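With flat r and flat λ^Q both legs have closed forms, so the fair (per-period) spread is a simple ratio; a sketch with illustrative numbers and loss given default normalized to one:

```python
import math

def fair_cds_spread(lam, r=0.03, tN=5.0, N=20):
    """Per-period fair spread x with flat hazard lam, flat rate r and
    equally spaced premium dates t_k = k * tN / N:
    x = V_Def / sum_k exp(-(r + lam) * t_k)."""
    R = r + lam
    annuity = sum(math.exp(-R * tN * k / N) for k in range(1, N + 1))
    v_def = lam / R * (1.0 - math.exp(-R * tN))   # closed-form default leg
    return v_def / annuity

x = fair_cds_spread(lam=0.02)
print(x, 4 * x)   # with quarterly payments the annualized spread is near lam
```

The approximate equality of the annualized spread and the hazard rate (for unit loss given default) is the usual back-of-the-envelope check.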
Calibration to CDS-Spreads
Assume that we observe spreads quoted in the market for one or more CDSs on the same reference entity. In order to calibrate our model we have to determine the implied risk-neutral hazard rate function λ^Q which ensures that the fair CDS spreads implied by the model equal the market quotes. Suppose that the market information consists of the fair spread x of one CDS with maturity t_N; we then take λ^Q constant. Hence λ^Q has to solve the equation
x ∑_{k=1}^N p_0(0, t_k) e^{−λ^Q t_k} = λ^Q ∫_0^{t_N} p_0(0, t) e^{−λ^Q t} dt,

where p_0(0, t) denotes the default-free zero-coupon bond price.
Note that the lhs of this equation (premium payments) is decreasing in λ^Q, whereas the rhs (default payments) is increasing in λ^Q, so that a unique solution exists.
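The monotonicity makes calibration a one-dimensional root-finding problem; a bisection sketch under flat-rate, unit-loss-given-default assumptions (all names and defaults are illustrative):

```python
import math

def implied_hazard(x_market, r=0.03, tN=5.0, N=20, lo=1e-8, hi=2.0):
    """Back out a constant risk-neutral hazard rate lam from an observed
    per-period CDS spread x_market by bisection on
    gap(lam) = premium leg - default leg, which is decreasing in lam."""
    def gap(lam):
        R = r + lam
        prem = x_market * sum(math.exp(-R * tN * k / N) for k in range(1, N + 1))
        dflt = lam / R * (1.0 - math.exp(-R * tN))
        return prem - dflt
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

Round-tripping a spread generated from a known hazard rate recovers that rate, which is a useful sanity check on any calibration routine.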
References
Our presentation is based on Sections 9.2.1 and 9.3 of [McNeil et al., 2005]. For more information on doubly stochastic random times see e.g. Sections 9.2.3 and 9.4 of that volume or [Lando, 1998] and [Lando, 2004].
C1. Introduction
Existing reduced-form models for credit portfolios can be divided into the following model classes.

Models with conditionally independent defaults, such as [Duffie and Singleton, 1999] and [Lando, 1998]. Easy to treat; in particular, similar valuation formulas for credit derivatives as in default-free term-structure models; no default contagion.

Copula models, such as [Li, 2001] and [Schönbucher and Schubert, 2001]. Easily calibrated to the defaultable term structure; allow for default contagion. Main drawback: general copula models have a fairly unintuitive parametrization of dependence.
Comments. 1) Specifying the dependence structure C and the marginal distributions F̄_i separately is useful for calibration. The model is calibrated to a given term structure of single-name CDS spreads by specifying the F̄_i; calibration of the dependence structure (i.e. C) can then be done independently. 2) Typically F̄_i is written as F̄_i(t) = exp(−∫_0^t λ_i(s) ds), where λ_i(s) = f_i(s)/F̄_i(s) is the marginal hazard rate.
The joint survival function admits the mixture representation

F̄(t_1, . . . , t_m) = E( ∏_{i=1}^m P(τ_i > t_i | V) ) =: E( ∏_{i=1}^m F̄_{i|V}(t_i | V) )   (43)
= ∫_{R^p} ∏_{i=1}^m F̄_{i|V}(t_i | v) g_V(v) dv.
Comments. 1) This is similar to the representation of static threshold models as Bernoulli mixture models. In particular, for fixed T, Y_T follows a Bernoulli mixture model with factor vector V and conditional default probabilities Q_{T,i}(v) = 1 − F̄_{i|V}(T | v). 2) The latent V is sometimes termed the frailty of the default times. 3) The mixture representation is very useful for simulation and pricing.
In the one-factor Gauss copula model this reduces to a one-dimensional integral:

∫_R ∏_{i=1}^m Φ( (d_i(t) − √ρ_i v) / √(1 − ρ_i) ) (1/√(2π)) e^{−v²/2} dv.
[Figure: trajectories of the default intensity λ(t) (values roughly 0.010 to 0.025) over one year.]

A trajectory of default intensity for different default correlations in a typical factor copula model, assuming T_1 = 4 months.
λ_i(t, Y_t) = h( t, ∑_{j=1}^m y_j ).   (45)
In exchangeable models λ_i(t, Y_t) is necessarily of this form. Moreover, there is a natural interpretation in terms of mean-field interaction. An extension to models with several groups is possible. [Yu, 2005] claims that h(t, l) = 0.01 + 0.001 · 1_{{l>0}} is a reasonable model for European telecom bonds.
Note that f(l) = (K − l)^+ − (K_1 − l)^+ (a put spread with strike prices K and K_1).
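The put-spread payoff in code (names illustrative):

```python
def tranche_notional(loss, attach, detach):
    """Remaining notional of a [attach, detach] tranche given cumulative
    portfolio loss: (detach - loss)+ - (attach - loss)+, a put spread."""
    return max(detach - loss, 0.0) - max(attach - loss, 0.0)

print(tranche_notional(10.0, 20.0, 40.0))   # 20.0: tranche untouched
print(tranche_notional(30.0, 20.0, 40.0))   # 10.0: partially written down
print(tranche_notional(50.0, 20.0, 40.0))   # 0.0: tranche wiped out
```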
A Stylized Example

Stylized CDO. We assume that the payoff of a tranche is simply given by N(T), the value of its notional at maturity. Real CDOs are more complicated, as there is intermediate income, but the stylized example retains the essential features.

Impact of default dependence. With more dependence but the same marginal default probabilities, the equity tranche increases in value and the senior tranches decrease in value; the impact on mezzanine tranches is unclear. These qualitative properties carry over to the more complex structures actually traded.
[Figure: payoff of the tranches of a stylized CDO with attachment points at 20, 40 and 60, overlaid with two different loss distributions (independence vs. dependence).]
V^def = E^Q( ∫_0^T D(t) dL(t) ) = D(T) E^Q(L(T)) + ∫_0^T E^Q(L(t)) (−D'(t)) dt,

using integration by parts. As L(t) is a function of the portfolio loss L_t, this can be computed by one-dimensional integration if we know the distribution of L_t. Premium payments can also be expressed in terms of L_t.
This means that the holder of the [0, 3] tranche gets 27.6% of the notional upfront and 5% of the outstanding notional per year as a running premium.
A different correlation parameter ρ is needed to explain the price of each tranche. For mezzanine tranches the implied correlation ρ is not unique.
[Figure: implied tranche and base correlations (roughly 10% to 30%) plotted across the tranches.]
This is a typical pattern of tranche and base correlations, called the base correlation skew. In particular, a model based on the Gauss copula cannot explain all prices simultaneously.
Alternative models have been proposed, such as models with interacting intensities or common-shock models.
Bibliography
[Abramowitz and Stegun, 1965] Abramowitz, M. and Stegun, I., editors (1965). Handbook of Mathematical Functions. Dover Publications, New York.
[Acerbi and Tasche, 2002] Acerbi, C. and Tasche, D. (2002). On the coherence of expected shortfall. J. Banking Finance, 26:1487–1503.
[Andersen and Sidenius, 2004] Andersen, L. and Sidenius, J. (2004). Extensions to the Gaussian copula: Random recovery and random factor loadings. Journal of Credit Risk, 1:29–70.
[Atkinson, 1982] Atkinson, A. (1982). The simulation of generalized inverse Gaussian and hyperbolic random variables. SIAM J. Sci. Comput., 3(4):502–515.
[Balkema and de Haan, 1974] Balkema, A. and de Haan, L. (1974). Residual life time at great age. Ann. Probab., 2:792–804.
[Barndorff-Nielsen, 1997] Barndorff-Nielsen, O. (1997). Normal inverse Gaussian distributions and stochastic volatility modelling. Scand. J. Statist., 24:1–13.
[Barndorff-Nielsen and Shephard, 1998] Barndorff-Nielsen, O. and Shephard, N. (1998). Aggregation and model construction for volatility models. Preprint, Center for Analytical Finance, University of Aarhus.
[Black and Scholes, 1973] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. J. Polit. Economy, 81(3):637–654.
[Bluhm, 2003] Bluhm, C. (2003). CDO modeling: techniques, examples and applications. Preprint, HVB Group, Munich.
[Bluhm et al., 2002] Bluhm, C., Overbeck, L., and Wagner, C. (2002). An Introduction to Credit Risk Modeling. CRC Press/Chapman & Hall, London.
[Cherubini et al., 2004] Cherubini, U., Luciano, E., and Vecchiato, W. (2004). Copula Methods in Finance. Wiley, Chichester.
[Clayton, 1996] Clayton, D. (1996). Generalized linear mixed models. In Gilks, W., Richardson, S., and Spiegelhalter, D., editors, Markov Chain Monte Carlo in Practice, pages 275–301. Chapman & Hall, London.
[Crosbie and Bohn, 2002] Crosbie, P. and Bohn, J. (2002). Modeling default risk. Technical document, Moody's/KMV, New York.
[Crouhy et al., 2000] Crouhy, M., Galai, D., and Mark, R. (2000). A comparative analysis of current credit risk models. J. Banking Finance, 24:59–117.
[Crouhy et al., 2001] Crouhy, M., Galai, D., and Mark, R. (2001). Risk Management. McGraw-Hill, New York.
[Daley and Vere-Jones, 2003] Daley, D. and Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, volume I: Elementary Theory and Methods. Springer, New York, 2nd edition.
[Daul et al., 2003] Daul, S., De Giorgi, E., Lindskog, F., and McNeil, A. (2003). The grouped t-copula with an application to credit risk. Risk, 16(11):73–76.
[Davis and Lo, 2001] Davis, M. and Lo, V. (2001). Infectious defaults. Quant. Finance, 1(4):382–387.
[Duffie and Singleton, 1999] Duffie, D. and Singleton, K. (1999). Modeling term structures of defaultable bonds. Rev. Finan. Stud., 12:687–720.
[Eberlein and Keller, 1995] Eberlein, E. and Keller, U. (1995). Hyperbolic distributions in finance. Bernoulli, 1:281–299.
[Eberlein et al., 1998] Eberlein, E., Keller, U., and Prause, K. (1998). New insights into smile, mispricing, and value at risk: the hyperbolic model. J. Bus., 38:371–405.
[Embrechts et al., 1997] Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer, Berlin.
[Embrechts et al., 2002] Embrechts, P., McNeil, A., and Straumann, D. (2002). Correlation and dependency in risk management:
properties and pitfalls. In Dempster, M., editor, Risk Management: Value at Risk and Beyond, pages 176–223. Cambridge University Press, Cambridge.
[Fang et al., 1990] Fang, K.-T., Kotz, S., and Ng, K.-W. (1990). Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
[Fisher and Tippett, 1928] Fisher, R. and Tippett, L. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Phil. Soc., 24:180–190.
[Frey and Backhaus, 2004] Frey, R. and Backhaus, J. (2004). Portfolio credit risk models with interacting default intensities: a Markovian approach. Preprint, University of Leipzig.
[Frey and McNeil, 2002] Frey, R. and McNeil, A. (2002). VaR and
expected shortfall in portfolios of dependent credit risks: Conceptual and practical insights. J. Banking Finance, pages 1317–1344.
[Frey and McNeil, 2003] Frey, R. and McNeil, A. (2003). Dependent defaults in models of portfolio credit risk. J. Risk, 6(1):59–92.
[Genest and Rivest, 1993] Genest, C. and Rivest, L. (1993). Statistical inference procedures for bivariate Archimedean copulas. J. Amer. Statist. Assoc., 88:1034–1043.
[Glasserman and Li, 2003] Glasserman, P. and Li, J. (2003). Importance sampling for portfolio credit risk. Preprint, Columbia Business School.
[Gnedenko, 1943] Gnedenko, B. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Ann. of Math., 44:423–453.
[Gordy, 2003] Gordy, M. (2003). A risk-factor model foundation for ratings-based capital rules. J. Finan. Intermediation, 12(3):199–232.
[Greenspan, 2002] Greenspan, A. (2002). Speech before the Council on Foreign Relations. In International Financial Risk Management, Washington, D.C., 19th November.
[Hawkes, 1971] Hawkes, A. (1971). Point spectra of some mutually exciting point processes. J. R. Stat. Soc. Ser. B Stat. Methodol., 33:438–443.
[Hull and White, 2004] Hull, J. and White, A. (2004). Valuation of a CDO and an nth to default CDS without Monte Carlo simulation. J. Derivatives, 12:8–23.
[Jarrow and Yu, 2001] Jarrow, R. and Yu, F. (2001). Counterparty risk and the pricing of defaultable securities. J. Finance, 53:2225–2243.
[Joe, 1997] Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.
[Kealhofer and Bohn, 2001] Kealhofer, S. and Bohn, J. (2001). Portfolio management of default risk. Technical document, Moody's/KMV, New York.
[Lando, 1998] Lando, D. (1998). Cox processes and credit risky securities. Rev. Derivatives Res., 2:99–120.
[Lando, 2004] Lando, D. (2004). Credit Risk Modeling: Theory and Applications. Princeton University Press, Princeton.
[Laurent and Gregory, 2003] Laurent, J. and Gregory, J. (2003). Basket default swaps, CDOs and factor copulas. Preprint, University of Lyon and BNP Paribas.
[Li, 2001] Li, D. (2001). On default correlation: a copula function approach. J. of Fixed Income, 9:43–54.
[Marshall and Olkin, 1988] Marshall, A. and Olkin, I. (1988). Families of multivariate distributions. J. Amer. Statist. Assoc., 83:834–841.
[McNeil, 1998] McNeil, A. (1998). History repeating. Risk, 11(1):99.
[McNeil et al., 2005] McNeil, A., Frey, R., and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton.
[McNeil and Wendin, 2003] McNeil, A. and Wendin, J. (2003). Generalised linear mixed models in portfolio credit risk modelling. Preprint, ETH Zurich.
[Merton, 1974] Merton, R. (1974). On the pricing of corporate debt: The risk structure of interest rates. J. Finance, 29:449–470.
[Nelsen, 1999] Nelsen, R. (1999). An Introduction to Copulas. Springer, New York.
[Ogata, 1988] Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc., 83:9–27.
[Pickands, 1975] Pickands, J. (1975). Statistical inference using extreme order statistics. Ann. Statist., 3:119–131.
[Prause, 1999] Prause, K. (1999). The generalized hyperbolic model: estimation, financial derivatives and risk measures. PhD thesis, Institut für Mathematische Statistik, Albert-Ludwigs-Universität Freiburg.
[Reiss and Thomas, 1997] Reiss, R.-D. and Thomas, M. (1997). Statistical Analysis of Extreme Values. Birkhäuser, Basel.
[RiskMetrics-Group, 1997] RiskMetrics-Group (1997). CreditMetrics technical document.
[Robert and Casella, 1999] Robert, C. and Casella, G. (1999). Monte Carlo Statistical Methods. Springer, New York.
[Scholes, 2000] Scholes, M. (2000). Crisis and risk management. Amer. Econ. Rev., pages 17–22.
[Schönbucher and Schubert, 2001] Schönbucher, P. and Schubert, D. (2001). Copula-dependent default risk in intensity models. Preprint, Universität Bonn.
[Smith, 1987] Smith, R. (1987). Estimating tails of probability distributions. Ann. Statist., 15:1174–1207.
[Smith, 1989] Smith, R. (1989). Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone. Statist. Sci., 4:367–393.
[Steinherr, 1998] Steinherr, A. (1998). Derivatives. The Wild Beast of Finance. Wiley, New York.
[Tavakoli, 2001] Tavakoli, J. (2001). Credit Derivatives and Synthetic Structures: A Guide to Investments and Applications. Wiley, New York, 2nd edition.
c 2006 (Embrechts, Frey, McNeil) 456
[Yu, 2005] Yu, F. (2005). Correlated defaults and the valuation of defaultable securities. Math. Finance. To appear.