
Quantitative Risk Management: Concepts, Techniques and Tools

Paul Embrechts, ETH Zurich; Rüdiger Frey, University of Leipzig; Alexander McNeil, ETH Zurich

19th International Summer School of the Swiss Association of Actuaries, 10-14 July 2006, University of Lausanne http://www.pupress.princeton.edu/titles/8056.html http://www.math.ethz.ch/mcneil/book/ embrechts@math.ethz.ch ruediger.frey@math.uni-leipzig.de mcneil@math.ethz.ch

QUANTITATIVE RISK MANAGEMENT
Concepts, Techniques, Tools

Alexander J. McNeil, Rüdiger Frey, Paul Embrechts

Princeton Series in Finance


Overview
I. Introduction to QRM and Multivariate Risk Models
II. Modelling Extreme Risks
III. Operational Risk
IV. Credit Risk Management
V. Dynamic Credit Models and Credit Derivatives


I: Introduction to QRM and Multivariate Risk Models


A. QRM: The Nature of the Challenge
B. Multivariate Risk Factor Models
C. Copulas


A. The Nature of the Challenge


1. Financial Risk in Perspective
2. QRM: the Nature of the Challenge
3. Loss Distributions
4. Risk Measures Based on Loss Distributions


A1. Financial Risk in Perspective


What is Risk?
"hazard, a chance of bad consequences, loss or exposure to mischance" [OED]
any event or action that may adversely affect an organization's ability to achieve its objectives and execute its strategies
the quantifiable likelihood of loss or less-than-expected returns


Risk and Randomness


Risk relates to uncertainty and hence to the notion of randomness. Arguably randomness had eluded a clear, workable definition for centuries, until Kolmogorov offered an axiomatic definition of randomness and probability in 1933. We assume that readers/participants are familiar with basic notation, terminology and results from elementary probability and statistics, the branch of mathematics dealing with stochastic models and their application to the real world. The word stochastic is derived from the Greek Stochazesthai, the art of guessing, or Stochastikos, meaning skilled at aiming, stochos being a target.


Financial Risk
We are primarily concerned with the main categories of financial risk:
Market risk: the risk of a change in the value of a financial position due to changes in the value of the underlying components on which that position depends, such as stock and bond prices, exchange rates, commodity prices, etc.
Credit risk: the risk of not receiving promised repayments on outstanding investments such as loans and bonds, because of the default of the borrower.
Operational risk: the risk of losses resulting from inadequate or failed internal processes, people and systems, or external events.

Insurance Risk
The insurance industry also has a longstanding relationship with risk. The Institute and Faculty of Actuaries use the following definition of the actuarial profession: "Actuaries are respected professionals whose innovative approach to making business successful is matched by a responsibility to the public interest. Actuaries identify solutions to financial problems. They manage assets and liabilities by analysing past events, assessing the present risk involved and modelling what could happen in the future."
An additional risk category entering through insurance is underwriting risk: the risk inherent in insurance policies sold.

The Road to Basel


"Risk management: one of the most important innovations of the 20th century." [Steinherr, 1998]
The late 20th century saw a revolution on financial markets. It was an era of innovation in academic theory, product development (derivatives) and information technology, and of spectacular market growth. Large derivatives losses and other financial incidents raised banks' consciousness of risk. Banks became subject to regulatory capital requirements, internationally coordinated by the Basel Committee at the Bank for International Settlements.

Some Dates
1950s. Foundations of modern risk analysis are laid by the work of Markowitz and others on portfolio theory.
1970s. Oil crises and the abolition of Bretton Woods turn energy prices and exchange rates into volatile risk factors.
1973. CBOE, the Chicago Board Options Exchange, starts operating. Fischer Black and Myron Scholes publish an article on the rational pricing of options. [Black and Scholes, 1973]
1980s. Deregulation; globalization; mergers on unprecedented scale; advances in IT.


Growth of Markets
Example 1. Average daily trading volume at the New York Stock Exchange: 1970: 3.5 million shares; 1990: 40 million shares.
Example 2. Global market in OTC derivatives (nominal value):

                             1995            1998
    FOREX contracts          $13 trillion    $18 trillion
    Interest rate contracts  $26 trillion    $50 trillion
    All types                $47 trillion    $80 trillion

Source: BIS; see [Crouhy et al., 2001]. $1 trillion = $1 × 10^12.


Disasters of the 1990s


The period 1993-1996 saw some spectacular derivatives-based losses:
Orange County (1.7 billion US$)
Metallgesellschaft (1.3 billion US$)
Barings (1 billion US$)
Although, to be fair, classical banking produced its own large losses, e.g. 50 billion CHF of bad loans written off by the Big Three in the early nineties.


The Regulatory Process


1988. First Basel Accord takes first steps toward an international minimum capital standard. Approach fairly crude and insufficiently differentiated.
1993. The birth of VaR. Seminal G-30 report addresses off-balance-sheet products (derivatives) in a systematic way for the first time. At the same time JPMorgan introduces the Weatherstone 4.15 daily market risk report, leading to the emergence of RiskMetrics.
1996. Amendment to Basel I allows internal VaR models for market risk in larger banks.
2001 onwards. Second Basel Accord, focussing on credit risk but also putting operational risk on the agenda. Banks may opt for a more advanced, so-called internal-ratings-based approach to credit.

Basel II: What is New?


Rationale for the New Accord: more flexibility and risk sensitivity.
Structure of the New Accord: a three-pillar framework:
Pillar 1: minimal capital requirements (risk measurement)
Pillar 2: supervisory review of capital adequacy
Pillar 3: public disclosure


Basel II Continued
Two options for the measurement of credit risk:
1. Standard approach
2. Internal-ratings-based approach (IRB)
Pillar 1 sets out the minimum capital requirements (Cooke ratio):

    total amount of capital / risk-weighted assets ≥ 8%,

i.e. MRC (minimum regulatory capital) := 8% of risk-weighted assets.
Explicit treatment of operational risk.

A2. QRM: the Nature of the Challenge


In our book we have tried to contribute to the establishment of the new discipline of QRM. This has two main strands:
Fixing the Foundations: putting current practice onto a firmer mathematical footing where, for example, concepts like profit-and-loss distributions, risk factors, risk measures, capital allocation and risk aggregation are given formal definitions. In doing this we have been guided by the consideration of what topics should form the core of a course on QRM for a wide audience of students.
Going Beyond Current Practice: gathering material on techniques and tools which go beyond current practice and address some of the deficiencies that have been raised repeatedly by critics.


Extremes Matter
From the point of view of the risk manager, inappropriate use of the normal distribution can lead to an understatement of risk, which must be balanced against the significant advantage of simplification. From the central bank's corner, the consequences are even more serious because we often need to concentrate on the left tail of the distribution in formulating lender-of-last-resort policies. Improving the characterization of the distribution of extreme values is of paramount importance. [Alan Greenspan, Joint Central Bank Research Conference, 1995]


Extremes Matter II: LTCM


With globalisation increasing, you'll see more crises. Our whole focus is on the extremes now - what's the worst that can happen to you in any situation - because we never want to go through that [LTCM] again. [John Meriwether, The Wall Street Journal, 21st August 2000]
Much space is devoted in our book to models for financial risk factors that go beyond the normal (or Gaussian) model and attempt to capture the related phenomena of heavy tails, volatility and extreme values.


The Interdependence and Concentration of Risks


The multivariate nature of risk presents an important challenge. Whether we look at market risk or credit risk, or overall enterprise-wide risk, we are generally interested in some form of aggregate risk that depends on high-dimensional vectors of underlying risk factors such as individual asset values in market risk, or credit spreads and counterparty default indicators in credit risk. A particular concern in our multivariate modelling is the phenomenon of dependence between extreme outcomes, when many risk factors move against us simultaneously.


Dependent Extreme Values: LTCM


Extreme, synchronized rises and falls in financial markets occur infrequently but they do occur. The problem with the models is that they did not assign a high enough chance of occurrence to the scenario in which many things go wrong at the same time, the "perfect storm" scenario. [Business Week, September 1998]
In a perfect storm scenario the risk manager discovers that the diversification he thought he had is illusory; practitioners also describe this as a concentration of risk.


Concentration Risk
Over the last number of years, regulators have encouraged financial entities to use portfolio theory to produce dynamic measures of risk. VaR, the product of portfolio theory, is used for short-run, day-to-day profit-and-loss exposures. Now is the time to encourage the BIS and other regulatory bodies to support studies on stress test and concentration methodologies. Planning for crises is more important than VaR analysis. And such new methodologies are the correct response to recent crises in the financial industry. [Scholes, 2000]


The Problem of Scale


A further challenge in QRM is the typical scale of our portfolios, which at their most general may represent the entire position in risky assets of a financial institution. Calibration of detailed multivariate models for all risk factors is an almost impossible task and hence any sensible strategy involves dimension reduction, that is to say the identification of key risk drivers and a concentration on modelling the main features of the overall risk landscape with a fairly broad-brush approach. This applies both to market risk and credit risk models. In the latter, factor models for default dependence are at least as important as detailed models of individual default.


Interdisciplinarity
The quantitative risk manager of the future should have a combined skillset that includes concepts, techniques and tools from many fields:
mathematical finance;
statistics and financial econometrics;
actuarial mathematics;
non-quantitative skills, especially communication skills;
humility: QRM is a small piece of a bigger picture!


A3. Loss Distributions


To model risk we use the language of probability theory. Risks are represented by random variables mapping unforeseen future states of the world into values representing profits and losses. The risks which interest us are aggregate risks. In general we consider a portfolio which might be
a collection of stocks and bonds;
a book of derivatives;
a collection of risky loans;
a financial institution's overall position in risky assets.

Portfolio Values and Losses


Consider a portfolio and let Vt denote its value at time t; we assume this random variable is observable at time t. Suppose we look at risk from the perspective of time t and consider the time period [t, t + 1]. The value Vt+1 at the end of the time period is unknown to us.
The distribution of (Vt+1 − Vt) is known as the profit-and-loss or P&L distribution. We denote the loss by Lt+1 = −(Vt+1 − Vt). By this convention, losses will be positive numbers and profits negative. We refer to the distribution of Lt+1 as the loss distribution.


Risk Factors
Generally the loss Lt+1 for the period [t, t + 1] will depend on changes in a number of fundamental risk factors in that period, such as stock prices and index values, yields and exchange rates. Writing Xt+1 for the vector of changes in the underlying risk factors, the loss will be given by a formula of the form

    Lt+1 = l[t](Xt+1),

where l[t] : R^d → R is a known function which we call the loss operator. The book contains examples showing how the loss operator is derived for different kinds of portfolio; this process is known as mapping. An illustration follows below.
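As an illustration (a standard mapping example along the lines of those in the book): consider a portfolio of λi shares of d stocks with prices St,i, and take the log-prices as risk factors, so that Xt+1,i = ln St+1,i − ln St,i. Then

    Lt+1 = −(Vt+1 − Vt) = −Σ_{i=1}^d λi St,i ( e^{Xt+1,i} − 1 ),

and the loss operator is l[t](x) = −Σ_{i=1}^d λi St,i ( e^{xi} − 1 ).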

Loss Distribution
The loss distribution is the distribution of Lt+1 = l[t](Xt+1). But which distribution exactly?
The conditional distribution of Lt+1 given Ft = σ({Xs : s ≤ t}), the history up to and including time t?
The unconditional distribution under the assumption that (Xt) form a stationary time series?
The conditional problem forces us to model the dynamics of the risk factors and is most suitable for market risk. The unconditional approach is used for longer time intervals and is also typical in credit portfolio management.

A4. Risk Measures


Risk measures attempt to quantify the riskiness of a portfolio. The most popular risk measures, like VaR, describe the right tail of the loss distribution of Lt+1 (or the left tail of the P&L). We put aside the question of whether to look at the conditional or unconditional loss distribution and assume that this has been decided. Denote the distribution function of the loss L := Lt+1 by FL, so that P(L ≤ x) = FL(x).


VaR and Expected Shortfall


Let 0 < α < 1. Value at Risk is defined as

    VaRα = qα(FL) = FL^←(α),                                    (1)

where we use the notation qα(FL) or qα(L) for the α-quantile of the distribution of L, and FL^← for the (generalized) inverse of FL. Provided E(|L|) < ∞, expected shortfall is defined as

    ESα = (1/(1 − α)) ∫α^1 qu(FL) du.                            (2)
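As a quick numerical companion to (1) and (2) (a minimal sketch, not from the slides; the function name var_es and the simulated t losses are my own choices):

```python
import numpy as np

def var_es(losses, alpha=0.95):
    """Empirical VaR and expected shortfall of a loss sample.

    VaR_alpha is the empirical alpha-quantile of the losses; ES_alpha
    averages the losses at or beyond that quantile, a Monte Carlo
    version of (2) for a continuous loss df.
    """
    losses = np.asarray(losses)
    var = np.quantile(losses, alpha)
    es = losses[losses >= var].mean()
    return var, es

# Usage: heavy-tailed losses simulated from a Student t distribution
rng = np.random.default_rng(0)
print(var_es(rng.standard_t(df=4, size=100_000), alpha=0.99))
```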


VaR in Visual Terms


[Figure: density of the loss distribution, with mean loss = -2.4, 95% VaR = 1.6 and 95% ES = 3.3; the 5% probability mass lies to the right of the VaR.]

Losses and Profits


[Figure: the corresponding profit-and-loss (P&L) density, with mean profit = 2.4; the 95% VaR = 1.6 now appears as the 5% probability mass in the left tail.]

Expected Shortfall
For continuous loss distributions expected shortfall is the expected loss, given that the VaR is exceeded. For any α ∈ (0, 1) we have

    ESα = E(L; L ≥ qα(L)) / (1 − α) = E(L | L ≥ VaRα),

where we have used the notation E(X; A) := E(X 1A) for a generic integrable rv X and a generic set A ∈ F. For a discontinuous loss df we have the more complicated expression

    ESα = (1/(1 − α)) ( E(L; L ≥ qα) + qα (1 − α − P(L ≥ qα)) ).

[Acerbi and Tasche, 2002]

Coherent Measures of Risk


There are many possible measures of the risk in a portfolio, such as VaR, ES or stress losses. To decide which are reasonable risk measures, a systematic approach is called for. New approach of Artzner et al. (1999):
Give a list of properties (axioms) that a reasonable risk measure should have; such risk measures are called coherent.
Study the coherence of standard risk measures (VaR, ES, stress losses etc.).
On a more theoretical level: characterize all coherent risk measures.
Goal: look at practically relevant aspects of this approach.

Purposes of Risk Measurement


Risk measures are used for the following purposes:
Determination of risk capital. A risk measure gives the amount of capital needed as a buffer against (unexpected) future losses to satisfy a regulator.
Management tool. Risk measures are used in internal limit systems.
Insurance premia. These can be viewed as a measure of the riskiness of insured claims.
Our interpretation: a risk measure gives the amount of capital that needs to be added to a position with loss L, so that the position becomes acceptable to an (internal/external) regulator.

The Axioms
A coherent risk measure is a real-valued function ρ on some space of rvs (representing losses) that fulfills the following four axioms:
1. Monotonicity. For two rvs with L1 ≤ L2 we have ρ(L1) ≤ ρ(L2).
2. Subadditivity. For any L1, L2 we have ρ(L1 + L2) ≤ ρ(L1) + ρ(L2). This is the most debated property, necessary for the following reasons: it reflects the idea that risk can be reduced by diversification and that a merger creates no extra risk; it makes decentralized risk management possible; if a regulator uses a non-subadditive risk measure, a financial institution could reduce risk capital by splitting into subsidiaries.

The Axioms II
3. Positive homogeneity. For λ ≥ 0 we have ρ(λL) = λρ(L). If there is no diversification we should have equality in the subadditivity axiom.
4. Translation invariance. For any a ∈ R we have ρ(L + a) = ρ(L) + a.
Remarks: VaR is in general not coherent; ES (as we have defined it) is coherent. Non-subadditivity of VaR is relevant in the presence of skewed loss distributions (credit-risk management, derivative books), or if traders optimize against VaR. A numerical illustration follows below.
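A minimal numerical illustration of the non-subadditivity of VaR for skewed losses (a hypothetical two-loan example, not from the slides):

```python
import numpy as np

# Two iid loan losses, each 0 with probability 0.96 and 100 with
# probability 0.04.  At alpha = 0.95:
#   P(L <= 0) = 0.96 >= 0.95, so VaR_0.95(L1) = VaR_0.95(L2) = 0;
#   P(L1 + L2 <= 0) = 0.96**2 = 0.9216 < 0.95,
#   so VaR_0.95(L1 + L2) = 100 > VaR_0.95(L1) + VaR_0.95(L2).
rng = np.random.default_rng(1)
n = 1_000_000
l1 = np.where(rng.random(n) < 0.04, 100.0, 0.0)
l2 = np.where(rng.random(n) < 0.04, 100.0, 0.0)
for name, sample in [("L1", l1), ("L2", l2), ("L1+L2", l1 + l2)]:
    print(name, np.quantile(sample, 0.95))
```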

B. Multivariate Risk Factor Models


1. Motivation: Multivariate Risk Factor Data
2. Basics of Multivariate Modelling
3. The Multivariate Normal Distribution
4. Normal Mixture Distributions
5. Generalized Hyperbolic Distributions
6. Dimension Reduction and Factor Models


B1. Motivation: Multivariate Risk Factor Data


Assume we have data on risk-factor changes X1, . . . , Xn. These might be daily (log) returns in the context of market risk or longer-interval returns in credit risk (e.g. monthly/yearly asset value returns). What are appropriate multivariate models?
Distributional models. In the unconditional approach to risk modelling we require appropriate multivariate distributions, which are calibrated under the assumption that the data come from a stationary time series.
Dynamic models. In the conditional approach we use multivariate time series models that allow us to make risk forecasts.
This module concerns the first issue. A motivating example shows the kind of data features that particularly interest us.

Bivariate Daily Return Data


[Figure: time series of daily log-returns on BMW and SIEMENS, 23.01.85 to 23.01.92, together with a scatterplot of SIEMENS against BMW returns.]
BMW and Siemens: 2000 daily (log) returns 1985-1993.



Three Extreme Days


[Figure: the same BMW and SIEMENS return series and scatterplot, with three extreme days marked 1, 2 and 3.]
Those extreme days: 19.10.1987, 16.10.1989, 19.08.1991



History

[Photos: New York, 19th October 1987; Berlin Wall, 16th October 1989; The Kremlin, 19th August 1991.]



B2. Basics of Multivariate Modelling


A d-dimensional random vector of risk-factor changes X = (X1, . . . , Xd)′ has joint df

    F(x) = F(x1, . . . , xd) = P(X1 ≤ x1, . . . , Xd ≤ xd).

The marginal dfs Fi of the individual risks are given by

    Fi(xi) = P(Xi ≤ xi) = F(∞, . . . , ∞, xi, ∞, . . . , ∞).

In some cases we work instead with joint survival functions

    F̄(x) = F̄(x1, . . . , xd) = P(X1 > x1, . . . , Xd > xd),

and marginal survival functions F̄i(xi) = P(Xi > xi) = F̄(−∞, . . . , −∞, xi, −∞, . . . , −∞).

Basics II
Densities. Joint densities f(x) = f(x1, . . . , xd), when they exist, are related to joint dfs by

    F(x1, . . . , xd) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xd} f(u1, . . . , ud) du1 . . . dud.

Independence. The components of X are said to be mutually independent if and only if

    F(x) = Π_{i=1}^d Fi(xi),   x ∈ R^d,

or, when X possesses a joint density, if and only if

    f(x) = Π_{i=1}^d fi(xi),   x ∈ R^d.

Moments
The mean vector of X is E(X) = (E(X1), . . . , E(Xd))′ and the covariance matrix is

    cov(X) = E( (X − E(X))(X − E(X))′ )

(assuming finiteness of moments in both cases). Writing Σ for cov(X), the (i, j)th element of this matrix is σij = cov(Xi, Xj) = E(XiXj) − E(Xi)E(Xj). The correlation matrix of X is the matrix P with (i, j)th element

    ρij = σij / √(σii σjj),

the ordinary pairwise linear correlation of Xi and Xj. Writing Δ = diag(√σ11, . . . , √σdd) we have P = Δ⁻¹ Σ Δ⁻¹.

Moments II
Mean vectors and covariance matrices are extremely easily manipulated under linear operations on the vector X. For any matrix B ∈ R^{k×d} and vector b ∈ R^k we have

    E(BX + b) = B E(X) + b,
    cov(BX + b) = B cov(X) B′.                                  (3)

Covariance matrices must be positive semi-definite; writing Σ for cov(X) we see that (3) implies that var(a′X) = a′Σa ≥ 0 for any a ∈ R^d.


Estimators of Covariance and Correlation


Assumptions. We have data X1, . . . , Xn which are either iid, or at least serially uncorrelated, from a distribution with mean vector μ, finite covariance matrix Σ and correlation matrix P.
Standard method-of-moments estimators of μ and Σ are the sample mean vector X̄ and the sample covariance matrix S defined by

    X̄ = (1/n) Σ_{i=1}^n Xi,    S = (1/n) Σ_{i=1}^n (Xi − X̄)(Xi − X̄)′.

The sample correlation matrix R has (i, j)th element given by rij = sij / √(sii sjj). Writing D = diag(√s11, . . . , √sdd) we have R = D⁻¹ S D⁻¹.

Properties of the Estimators?


Further properties of the estimators X̄, S and R depend on the true multivariate distribution of the observations. They are not necessarily the best estimators of μ, Σ and P in all situations, a point that is often forgotten in financial risk management, where they are routinely used. If our data are iid multivariate normal Nd(μ, Σ) then X̄ and S are the maximum likelihood estimators (MLEs) of the mean vector μ and covariance matrix Σ. Their behaviour as estimators is well understood and statistical inference concerning the model parameters is relatively unproblematic. However, certainly at short time intervals such as daily data, the multivariate normal is not a good description of financial risk factor returns and other estimators of μ and Σ may be better.

B3. Multivariate Normal (Gaussian) Distribution


This distribution has joint density

    f(x) = (2π)^{−d/2} |Σ|^{−1/2} exp( −(x − μ)′Σ⁻¹(x − μ) / 2 ),

where μ ∈ R^d and Σ ∈ R^{d×d} is a positive definite matrix. If X has density f then E(X) = μ and cov(X) = Σ, so that μ and Σ are the mean vector and covariance matrix respectively. A standard notation is X ∼ Nd(μ, Σ). Clearly, the components of X are mutually independent if and only if Σ is diagonal. For example, X ∼ Nd(0, I) if and only if X1, . . . , Xd are iid N(0, 1).

Bivariate Standard Normals


[Figure: perspective plots of bivariate standard normal densities; ρ = 0.9 in the left plots and ρ = 0.7 in the right plots.]

Properties of Multivariate Normal Distribution


The marginal distributions are univariate normal.
Linear combinations a′X = a1X1 + · · · + adXd are univariate normal with distribution a′X ∼ N(a′μ, a′Σa).
Conditional distributions are multivariate normal.
The sum of squares (X − μ)′Σ⁻¹(X − μ) ∼ χ²d (chi-squared).
Simulation:
1. Perform a Cholesky decomposition Σ = AA′.
2. Simulate iid standard normal variates Z = (Z1, . . . , Zd)′.
3. Set X = μ + AZ.
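A sketch of this simulation algorithm in Python (numpy also offers a built-in multivariate normal sampler; the helper below just makes the Cholesky steps explicit):

```python
import numpy as np

def rmvnorm(n, mu, sigma, rng=None):
    """Simulate N_d(mu, sigma) by the Cholesky algorithm above:
    sigma = A A' with A lower triangular, then X = mu + A Z."""
    rng = rng or np.random.default_rng()
    a = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    z = rng.standard_normal((n, len(mu)))        # iid N(0, 1) variates
    return np.asarray(mu, dtype=float) + z @ a.T

x = rmvnorm(2000, mu=[0.0, 0.0], sigma=[[1.0, 0.7], [0.7, 1.0]])
print(np.cov(x, rowvar=False))                   # should approximate sigma
```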

Testing for Multivariate Normality


If data are multivariate normal then the margins must be univariate normal. This can be assessed graphically with QQplots or tested formally with tests like Jarque-Bera or Anderson-Darling. However, normality of the margins is not sufficient; we must test joint normality. There are numerical tests of multivariate normality (see book), or one can perform a graphical test by calculating

    (Xi − X̄)′ S⁻¹ (Xi − X̄),   i = 1, . . . , n.

These should form (approximately) a sample from a χ²d distribution, and this can be assessed with a QQplot. (QQplots compare empirical quantiles with theoretical quantiles of a reference distribution.)
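A sketch of the graphical test, assuming scipy and matplotlib are available (variable names are my own):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=1000)

# Squared Mahalanobis distances (X_i - Xbar)' S^{-1} (X_i - Xbar)
xc = x - x.mean(axis=0)
s_inv = np.linalg.inv(np.cov(x, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", xc, s_inv, xc)

# Under joint normality these are approximately chi-squared with
# d = 2 degrees of freedom; the QQ-plot should be close to the diagonal.
stats.probplot(d2, dist=stats.chi2(df=x.shape[1]), plot=plt)
plt.show()
```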

Deficiencies of Multivariate Normal for Risk Factors


Tails of the univariate margins are very thin and generate too few extreme values.
Simultaneous large values in several margins are relatively infrequent; the model cannot capture the phenomenon of joint extreme moves in several risk factors.
Very strong symmetry (known as elliptical symmetry). Reality suggests more skewness may often be present.


B4. Normal Mixture Distributions


Let Z ∼ Nd(0, Id) be a vector of iid standard normal variates and let W be an independent, positive, scalar random variable. Let μ ∈ R^d and A ∈ R^{d×d} be a vector and a matrix of constants respectively. The vector X given by

    X = μ + √W AZ                                               (4)

is said to have a multivariate normal variance mixture distribution. Easy calculations give E(X) = μ and cov(X) = E(W)Σ, where Σ := AA′. Note that X | W = w ∼ Nd(μ, wΣ). The rv W can be thought of as a common shock impacting the variances of all components. The distribution of X is a mixture of normals but does not itself have a normal distribution.

Examples of Normal Variance Mixtures


Two-point mixture:

    W = k1 with probability p, k2 with probability 1 − p,   k1 > 0, k2 > 0, k1 ≠ k2.

Could be used to model two regimes, ordinary and extreme.
Multivariate t: W has an inverse gamma distribution, W ∼ Ig(ν/2, ν/2). This gives a multivariate t with ν degrees of freedom. Equivalently, ν/W ∼ χ²ν.
Symmetric generalised hyperbolic: W has a GIG (generalised inverse Gaussian) distribution.

The Multivariate t Distribution


This has density

    f(x) = k_{ν,Σ,d} ( 1 + (x − μ)′Σ⁻¹(x − μ)/ν )^{−(ν+d)/2},

where μ ∈ R^d, Σ ∈ R^{d×d} is a positive definite matrix, ν is the degrees of freedom and k_{ν,Σ,d} is a normalizing constant.
If X has density f then E(X) = μ and cov(X) = ν/(ν − 2) Σ, so that μ and Σ are the mean vector and dispersion matrix respectively. For finite variances/correlations we require ν > 2. Notation: X ∼ td(ν, μ, Σ).
If Σ is diagonal the components of X are uncorrelated, but they are not independent. The multivariate t distribution has heavy tails.

Bivariate Normal and t


[Figure: perspective plots of bivariate normal and t densities with ρ = 0.7, ν = 3 and all variances equal to 1.]

Fitted Normal and t3 Distributions

[Figure: scatterplots of 2000 points simulated from the fitted normal model (left) and the fitted t3 model (right).]

Simulated data (2000) from models fitted by maximum likelihood to the BMW-Siemens data.

Multivariate Normal Mean-Variance Mixtures


Normal variance mixtures are elliptically symmetric distributions. We can introduce asymmetry and skewness by considering the following more general mixture construction:

    X = μ + Wγ + √W AZ,                                         (5)

where γ ∈ R^d is a vector of asymmetry parameters and all other terms are as in (4). If γ = 0 then we are back in the elliptical variance mixture family.
Main example: when W has a GIG distribution we get the generalized hyperbolic family.


Moments of Mean-Variance Mixtures


Since X | W ∼ Nd(μ + Wγ, WΣ) it follows that

    E(X) = E(E(X | W)) = μ + E(W)γ,                             (6)
    cov(X) = E(cov(X | W)) + cov(E(X | W)) = E(W)Σ + var(W)γγ′, (7)

provided W has finite variance. We observe from (6) and (7) that the parameters μ and Σ are not in general the mean vector and covariance matrix of X. Note that a finite covariance matrix requires var(W) < ∞, whereas the variance mixtures only require E(W) < ∞.


Generalised Inverse Gaussian (GIG) Distribution


The random variable X has a generalised inverse Gaussian (GIG) distribution, written X ∼ N⁻(λ, χ, ψ), if its density is

    f(x) = χ^{−λ} (√(χψ))^λ / ( 2 K_λ(√(χψ)) ) · x^{λ−1} exp( −(χ x⁻¹ + ψ x)/2 ),   x > 0,

where K_λ denotes a modified Bessel function of the third kind with index λ and the parameters satisfy χ > 0, ψ ≥ 0 if λ < 0; χ > 0, ψ > 0 if λ = 0; and χ ≥ 0, ψ > 0 if λ > 0. For more on this Bessel function see [Abramowitz and Stegun, 1965].
The GIG density actually contains the gamma and inverse gamma densities as special limiting cases, corresponding to χ = 0 and ψ = 0 respectively. Thus, when ψ = 0 and γ = 0 the mixture distribution in (5) is multivariate t.

Sampling from Normal Mixture Distributions


It is straightforward to simulate normal mixtures:
1. Generate Z ∼ Nd(0, Id).
2. Generate W independently.
3. Set X = μ + Wγ + √W AZ.
Example: t distribution (and skewed t). We require W ∼ Ig(ν/2, ν/2); alternatively, generate V ∼ χ²ν and set W = ν/V.
Example: generalized hyperbolic distribution. To sample from the GIG distribution we can use an algorithm in [Atkinson, 1982]; see also the work of [Eberlein et al., 1998].
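For example, the t case of this recipe in Python (a sketch for the symmetric case γ = 0):

```python
import numpy as np

def rmvt(n, nu, mu, sigma, rng=None):
    """Simulate the multivariate t by the recipe above (gamma = 0):
    W = nu / V with V ~ chi-squared(nu), then X = mu + sqrt(W) A Z."""
    rng = rng or np.random.default_rng()
    a = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    z = rng.standard_normal((n, len(mu)))
    w = nu / rng.chisquare(nu, size=n)           # inverse gamma mixing rv
    return np.asarray(mu, dtype=float) + np.sqrt(w)[:, None] * (z @ a.T)

x = rmvt(2000, nu=3, mu=[0.0, 0.0], sigma=[[1.0, 0.7], [0.7, 1.0]])
```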

B5. Generalized Hyperbolic Distributions


The generalized hyperbolic density is

    f(x) = c · K_{λ−d/2}( √( (χ + Q(x; μ, Σ))(ψ + γ′Σ⁻¹γ) ) ) exp( (x − μ)′Σ⁻¹γ ) / ( √( (χ + Q(x; μ, Σ))(ψ + γ′Σ⁻¹γ) ) )^{d/2−λ},

where Q(x; μ, Σ) = (x − μ)′Σ⁻¹(x − μ) and the normalising constant is

    c = (√(χψ))^{−λ} ψ^λ (ψ + γ′Σ⁻¹γ)^{d/2−λ} / ( (2π)^{d/2} |Σ|^{1/2} K_λ(√(χψ)) ).


Generalized Hyperbolic Distribution II


Notation: X ∼ GHd(λ, χ, ψ, μ, Σ, γ).
Closure under linear operations: if X ∼ GHd(λ, χ, ψ, μ, Σ, γ) and we consider Y = BX + b, where B ∈ R^{k×d} and b ∈ R^k, then Y ∼ GHk(λ, χ, ψ, Bμ + b, BΣB′, Bγ). This means of course that the marginal distributions are univariate generalized hyperbolic. A version of the variance-covariance method may be based on this family.


Special Cases
If λ = 1 we get a multivariate distribution whose univariate margins are one-dimensional hyperbolic distributions, a model widely used in univariate analyses of financial return data.
If λ = −1/2 then the distribution is known as a normal inverse Gaussian (NIG) distribution. This model has also been used in univariate analyses of return data; its functional form is similar to the hyperbolic with a slightly heavier tail.
If λ > 0 and χ = 0 we get a limiting case of the distribution known variously as a generalised Laplace, Bessel function or variance gamma distribution.
If λ = −ν/2, χ = ν and ψ = 0 we get an asymmetric or skewed t distribution.

Empirical Experience with GH Family


The normal mixture structure of this family makes it possible to fit it with the EM algorithm [McNeil et al., 2005]. Our experience shows that skewed t and NIG models are useful special cases; often the elliptical special cases (γ = 0) cannot be rejected.

                    GH        NIG       Hyp       t         VG        Gauss
 Daily              17306.44  17306.43  17305.61  17304.97  17302.5   17144.38
   p-value          --        0.85      0.20      0.09      0.00      0.00
   γ = 0            17303.10  17303.06  17302.15  17301.85  17299.15
   p-value          0.15      0.24      0.13      0.10      0.01
 Weekly             2890.65   2889.90   2889.65   2890.65   2888.98   2872.36
   p-value          --        0.22      0.16      1.00      0.07      0.00
   γ = 0            2887.52   2886.74   2886.48   2887.52   2885.86
   p-value          0.18      0.17      0.14      0.28      0.09

GBP, Euro, Yen, CHF against USD: returns 00-04.


Elliptical distributions
A random vector (X1, . . . , Xd)′ is spherical if its distribution is invariant under rotations, i.e. for all U ∈ R^{d×d} with UU′ = U′U = Id,

    UX =d X.

A random vector (X1, . . . , Xd)′ is called elliptical if it is an affine transform of a spherical random vector (Y1, . . . , Yk)′:

    X = AY + b,   A ∈ R^{d×k}, b ∈ R^d.

A normal variance mixture in (4) with μ = 0 and Σ = Id is spherical; any normal variance mixture is elliptical.

References
[Barndorff-Nielsen and Shephard, 1998] (generalized hyperbolic distributions)
[Barndorff-Nielsen, 1997] (NIG distribution)
[Eberlein and Keller, 1995] (hyperbolic distributions)
[Prause, 1999] (GH distributions, PhD thesis)
[Fang et al., 1990] (elliptical distributions)


B6. Dimension Reduction and Factor Models


Idea: explain the variability in a d-dimensional vector X in terms of a smaller set of common factors.
Definition: X follows a p-factor model if

    X = a + BF + ε,                                             (8)

where
(i) F = (F1, . . . , Fp)′ is a random vector of factors with p < d,
(ii) ε = (ε1, . . . , εd)′ is a random vector of idiosyncratic error terms, which are uncorrelated and have mean zero,
(iii) B ∈ R^{d×p} is a matrix of constant factor loadings and a ∈ R^d a vector of constants,
(iv) cov(F, ε) = E((F − E(F))ε′) = 0.

Remarks on Theory of Factor Models


The factor model (8) implies that the covariance matrix Σ = cov(X) satisfies

    Σ = BΩB′ + Υ,

where Ω = cov(F) and Υ = cov(ε) (a diagonal matrix). Factors can always be transformed so that they are orthogonal:

    Σ = BB′ + Υ.                                                (9)

Conversely, if (9) holds for the covariance matrix of a random vector X, then X follows the factor model (8) for some a, F and ε. If, moreover, X is Gaussian, then F and ε may be taken to be independent Gaussian vectors, so that ε has independent components.

Factor Models in Practice


We have multivariate financial return data X1, . . . , Xn which are assumed to follow (8). Two situations are to be distinguished:
1. Appropriate factor data F1, . . . , Fn are also observed, for example returns on relevant indices. We have a multivariate regression problem; the parameters a and B can be estimated by multivariate least squares.
2. Factor data are not directly observed. We assume the data X1, . . . , Xn are identically distributed and calibrate the factor model by one of two strategies: statistical factor analysis, where we first estimate B and Υ from (9) and use these to reconstruct F1, . . . , Fn; or principal components, where we fabricate F1, . . . , Fn by PCA and estimate B and a by regression.

C. Copulas
1. Basic Copula Primer
2. Copula-Based Dependence Measures
3. Normal Mixture Copulas
4. Archimedean Copulas
5. Fitting Copulas to Data


C1. Basic Copula Primer


Copulas help in the understanding of dependence at a deeper level. They:
show us potential pitfalls of approaches to dependence that focus only on correlation;
allow us to define useful alternative dependence measures;
express dependence on a quantile scale, which is natural in QRM;
facilitate a bottom-up approach to multivariate model building;
are easily simulated and thus lend themselves to Monte Carlo risk studies.

What is a Copula?
A copula is a multivariate distribution function with standard uniform margins. Equivalently, a copula is any function C : [0, 1]^d → [0, 1] satisfying the following properties:
1. C(u1, . . . , ud) is increasing in each component ui.
2. C(1, . . . , 1, ui, 1, . . . , 1) = ui for all i ∈ {1, . . . , d}, ui ∈ [0, 1].
3. For all (a1, . . . , ad), (b1, . . . , bd) ∈ [0, 1]^d with ai ≤ bi we have

    Σ_{i1=1}^2 · · · Σ_{id=1}^2 (−1)^{i1+···+id} C(u_{1 i1}, . . . , u_{d id}) ≥ 0,

where u_{j1} = aj and u_{j2} = bj for all j ∈ {1, . . . , d}.



Probability and Quantile Transforms


Lemma 1 (probability transform). Let X be a random variable with continuous distribution function F. Then F(X) ∼ U(0, 1) (standard uniform):

    P(F(X) ≤ u) = P(X ≤ F^←(u)) = F(F^←(u)) = u,   u ∈ (0, 1).

Lemma 2 (quantile transform). Let U be uniform and F the distribution function of any rv X. Then F^←(U) =d X, so that P(F^←(U) ≤ x) = F(x).
These facts are the key to all statistical simulation and essential in dealing with copulas.
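A quick numerical check of the two lemmas, taking F to be a Student t df (a minimal sketch, not from the slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Lemma 1: F(X) is standard uniform when F is continuous
x = rng.standard_t(df=4, size=50_000)
u = stats.t.cdf(x, df=4)
print(u.mean(), np.quantile(u, [0.1, 0.5, 0.9]))    # roughly uniform

# Lemma 2: F^{-1}(U) has df F
y = stats.t.ppf(rng.random(50_000), df=4)
print(np.quantile(y, [0.25, 0.5, 0.75]))            # matches t(4) quantiles
```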


Sklar's Theorem

Let F be a joint distribution function with margins F1, . . . , Fd. There exists a copula C such that for all x1, . . . , xd in [−∞, ∞]

    F(x1, . . . , xd) = C(F1(x1), . . . , Fd(xd)).

If the margins are continuous then C is unique; otherwise C is uniquely determined on Ran F1 × Ran F2 × · · · × Ran Fd. Conversely, if C is a copula and F1, . . . , Fd are univariate distribution functions, then F defined above is a multivariate df with margins F1, . . . , Fd.


Sklar's Theorem: Proof in the Continuous Case


Henceforth, unless explicitly stated, vectors X will be assumed to have continuous marginal distributions. In this case

    F(x1, . . . , xd) = P(X1 ≤ x1, . . . , Xd ≤ xd)
                      = P(F1(X1) ≤ F1(x1), . . . , Fd(Xd) ≤ Fd(xd))
                      = C(F1(x1), . . . , Fd(xd)).

The unique copula C can be calculated from F, F1, . . . , Fd using

    C(u1, . . . , ud) = F(F1^←(u1), . . . , Fd^←(ud)).


Copulas and Dependence Structures


Sklar's theorem shows how a unique copula C describes, in a sense, the dependence structure of the multivariate df of a random vector X. This motivates a further definition.
Definition (copula of X). The copula of (X1, . . . , Xd) is the df C of (F1(X1), . . . , Fd(Xd)).
Invariance. C is invariant under strictly increasing transformations of the marginal distributions: if T1, . . . , Td are strictly increasing, then (T1(X1), . . . , Td(Xd)) has the same copula as (X1, . . . , Xd).


The Fréchet Bounds


For every copula C(u1, . . . , ud) we have the important bounds

    max{ Σ_{i=1}^d ui + 1 − d, 0 } ≤ C(u) ≤ min{u1, . . . , ud}.    (10)

The upper bound is the df of (U, . . . , U) and the copula of a random vector X where Xi = Ti(X1) almost surely for increasing functions T2, . . . , Td. It represents perfect positive dependence or comonotonicity.
The lower bound is only a copula when d = 2. It is the df of the vector (U, 1 − U) and the copula of (X1, X2) where X2 = T(X1) almost surely for T decreasing. It represents perfect negative dependence or countermonotonicity.
The copula representing independence is C(u1, . . . , ud) = Π_{i=1}^d ui.

Parametric Copulas
There are basically two possibilities.
Copulas implicit in well-known parametric distributions. Sklar's Theorem states that we can always find a copula in a parametric distribution function. Denoting the df by F and assuming the margins F1, . . . , Fd are continuous, the implied copula is

    C(u1, . . . , ud) = F(F1^←(u1), . . . , Fd^←(ud)).

Such a copula may not have a simple closed form.
Closed-form parametric copula families generated by some explicit construction that is known to yield copulas. The best example is the well-known Archimedean copula family. These generally have limited numbers of parameters and limited flexibility; the standard Archimedean copulas are dfs of exchangeable random vectors.

Examples of Implicit Copulas


Gaussian copula:

    C_P^Ga(u) = Φ_P( Φ⁻¹(u1), . . . , Φ⁻¹(ud) ),

where Φ denotes the standard univariate normal df, Φ_P denotes the joint df of X ∼ Nd(0, P) and P is a correlation matrix. Write C_ρ^Ga when d = 2. P = Id gives independence and P = Jd gives comonotonicity.

t copula:

    C_{ν,P}^t(u) = t_{ν,P}( tν⁻¹(u1), . . . , tν⁻¹(ud) ),

where tν is the df of a standard univariate t distribution, t_{ν,P} is the joint df of the vector X ∼ td(ν, 0, P) and P is a correlation matrix. Write C_{ν,ρ}^t when d = 2. P = Jd gives comonotonicity, but P = Id does not give independence.



Examples of Explicit (Archimedean) Copulas


Gumbel copula:

    C_θ^Gu(u1, . . . , ud) = exp( −[ (−log u1)^θ + · · · + (−log ud)^θ ]^{1/θ} ),   θ ≥ 1.

θ = 1 gives independence; θ → ∞ gives comonotonicity.

Clayton copula:

    C_θ^Cl(u1, . . . , ud) = ( u1^{−θ} + · · · + ud^{−θ} − d + 1 )^{−1/θ},   θ > 0.

θ → 0 gives independence; θ → ∞ gives comonotonicity.


Simulating Copulas
Simulating the Gaussian copula C_P^Ga:
Simulate X ∼ Nd(0, P).
Set U = (Φ(X1), . . . , Φ(Xd))′ (probability transformation).

Simulating the t copula C_{ν,P}^t:
Simulate X ∼ td(ν, 0, P).
Set U = (tν(X1), . . . , tν(Xd))′ (probability transformation), where tν is the df of the univariate t distribution.

Simulation of Archimedean copulas is less obvious, but also turns out to be fairly simple in the majority of cases.
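Both algorithms in Python (a minimal sketch; the function names rcopula_gauss and rcopula_t are my own):

```python
import numpy as np
from scipy import stats

def rcopula_gauss(n, p, rng=None):
    """Gaussian copula: X ~ N_d(0, P), then U_i = Phi(X_i)."""
    rng = rng or np.random.default_rng()
    x = rng.multivariate_normal(np.zeros(len(p)), p, size=n)
    return stats.norm.cdf(x)

def rcopula_t(n, nu, p, rng=None):
    """t copula: X ~ t_d(nu, 0, P), then U_i = t_nu(X_i)."""
    rng = rng or np.random.default_rng()
    z = rng.multivariate_normal(np.zeros(len(p)), p, size=n)
    w = nu / rng.chisquare(nu, size=n)      # inverse gamma mixing variable
    return stats.t.cdf(np.sqrt(w)[:, None] * z, df=nu)

p = np.array([[1.0, 0.7], [0.7, 1.0]])
u = rcopula_t(2000, nu=4, p=p)              # sample with tail dependence
```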

Simulating Copulas II
[Figure: scatterplots of simulated samples from the Gaussian, Gumbel, Clayton and t4 copulas. Gauss: ρ = 0.7; Gumbel: θ = 2; Clayton: θ = 2.2; t: ρ = 0.71, ν = 4.]

Meta-Distributions and Their Simulation


By the converse of Sklar's Theorem we know that if C is a copula and F1, . . . , Fd are univariate dfs, then F(x) = C(F1(x1), . . . , Fd(xd)) is a multivariate df with margins F1, . . . , Fd. We refer to F as a meta-distribution with the dependence structure represented by C. For example, if C is a Gaussian copula we get a meta-Gaussian distribution, and if C is a t copula we get a meta-t distribution. If we can sample from the copula C, then it is easy to sample from F: we generate a vector (U1, . . . , Ud) with df C and then return

    (F1^←(U1), . . . , Fd^←(Ud))′.


Simulating Meta-Gaussian Distributions


[Figure: simulated samples from four meta-distributions with standard normal margins and Gaussian, Gumbel, Clayton and t4 copulas; linear correlation ρ(X1, X2) ≈ 0.7 in all cases.]



A Note on Symmetry of Copulas


Exchangeability. A copula is exchangeable if it is the df of an exchangeable random vector U, i.e. (U1, . . . , Ud) =d (U_{Π(1)}, . . . , U_{Π(d)}) for any permutation (Π(1), . . . , Π(d)) of (1, . . . , d). The copula must then satisfy C(u1, . . . , ud) = C(u_{Π(1)}, . . . , u_{Π(d)}). Examples: Clayton, Gumbel, or Gauss and t when P is an equicorrelation matrix.
Radial symmetry. A copula is radially symmetric if it is the df of a random vector U satisfying (U1, . . . , Ud) =d (1 − U1, . . . , 1 − Ud). Examples: Gauss and t.

Copula References
[Embrechts et al., 2002] (dependence in QRM)
[Joe, 1997] (dependence in general)
[Nelsen, 1999] (standard reference on bivariate copulas)
[Daul et al., 2003] (summary article)
[Cherubini et al., 2004] (copulas in finance)


C2. Copula-Based Dependence Measures


Consider a pair of random variables (X1, X2) with continuous marginal distributions F1 and F2 and unique copula C. In this section we consider scalar measures of dependence for (X1, X2) which depend only on C, and not on the marginal distributions.
We consider coefficients of tail dependence, which provide a way of comparing the extremal dependence properties of copulas, i.e. the amount of dependence in the joint tails of the bivariate distribution.
We consider rank correlations, which turn out to be useful in the calibration of copulas to empirical data. (They are more useful than the standard linear correlation, which is not a copula-based measure.)


Tail Dependence or Extremal Dependence


When the limit exists, the coefficient of upper tail dependence is

    λu(X1, X2) = lim_{q→1} P( X2 > F2^←(q) | X1 > F1^←(q) ).

Analogously, the coefficient of lower tail dependence is

    λl(X1, X2) = lim_{q→0} P( X2 ≤ F2^←(q) | X1 ≤ F1^←(q) ).

These are functions of the copula, given by

    λu = lim_{q→1} (1 − 2q + C(q, q)) / (1 − q),
    λl = lim_{q→0} C(q, q) / q.

Tail Dependence
Clearly λu ∈ [0, 1] and λl ∈ [0, 1]. For copulas of elliptically symmetric distributions λu = λl =: λ. This is true, more generally, for all copulas with radial symmetry.
Terminology:
λu ∈ (0, 1]: upper tail dependence; λu = 0: asymptotic independence in the upper tail;
λl ∈ (0, 1]: lower tail dependence; λl = 0: asymptotic independence in the lower tail.


Examples of Tail Dependence


The Gaussian copula is asymptotically independent for |ρ| < 1.
The t copula is tail dependent when ρ > −1:

    λ = 2 t_{ν+1}( −√( (ν + 1)(1 − ρ) / (1 + ρ) ) ).

The Gumbel copula is upper tail dependent for θ > 1: λu = 2 − 2^{1/θ}.
The Clayton copula is lower tail dependent for θ > 0: λl = 2^{−1/θ}.
All formulas are derived in the book.

Rank Correlation
Spearman's rho:

    ρS(X1, X2) = ρ( F1(X1), F2(X2) )   (a linear correlation of the copula),

    ρS(X1, X2) = 12 ∫0^1 ∫0^1 { C(u1, u2) − u1u2 } du1 du2.

Kendall's tau. Take an independent copy of (X1, X2), denoted (X̃1, X̃2):

    ρτ(X1, X2) = 2 P( (X1 − X̃1)(X2 − X̃2) > 0 ) − 1,

    ρτ(X1, X2) = 4 ∫0^1 ∫0^1 C(u1, u2) dC(u1, u2) − 1.

Properties of Rank Correlation


The following statements are true for Spearman's rho (ρS) or Kendall's tau (ρτ), but not for Pearson's linear correlation (ρ):
ρS depends only on the copula of (X1, X2);
ρS is invariant under strictly increasing transformations of the random variables;
ρS(X1, X2) = 1 if and only if X1, X2 are comonotonic;
ρS(X1, X2) = −1 if and only if X1, X2 are countermonotonic.


Sample Rank Correlations


Consider iid bivariate data (X1,1, X1,2), . . . , (Xn,1, Xn,2). The standard estimator of ρτ(X1, X2) is

    (n choose 2)⁻¹ Σ_{1≤i<j≤n} sgn( (Xi,1 − Xj,1)(Xi,2 − Xj,2) ),

and the estimator of ρS(X1, X2) is

    12/(n(n² − 1)) Σ_{i=1}^n ( rank(Xi,1) − (n + 1)/2 )( rank(Xi,2) − (n + 1)/2 ).

C3. Normal Mixture Copulas


A useful class of parametric copulas is contained in multivariate normal mixture distributions. These copulas have found applications in both market and credit risk. In this section we will:
explore the implications of the different extremal behaviours of the Gaussian and t copulas;
give a useful formula for the Kendall's tau of normal variance mixture copulas;
present a couple of more exotic copulas derived from mixture representations.

Gaussian and t3 Copulas Compared


[Figure: scatterplots of bivariate samples with normal dependence (left) and t dependence (right); copula parameter ρ = 0.7; quantile lines at 0.5% and 99.5%.]

Joint Tail Probabilities at Finite Levels


                             Quantile
 ρ    C     95%           99%           99.5%         99.9%
 0.5  N     1.21 × 10^-2  1.29 × 10^-3  4.96 × 10^-4  5.42 × 10^-5
 0.5  t8    1.20          1.65          1.94          3.01
 0.5  t4    1.39          2.22          2.79          4.86
 0.5  t3    1.50          2.55          3.26          5.83
 0.7  N     1.95 × 10^-2  2.67 × 10^-3  1.14 × 10^-3  1.60 × 10^-4
 0.7  t8    1.11          1.33          1.46          1.86
 0.7  t4    1.21          1.60          1.82          2.52
 0.7  t3    1.27          1.74          2.01          2.83

For the normal copula the joint tail probability is given; for the t copulas the factor by which the Gaussian probability must be multiplied is given.

Joint Tail Probabilities, d ≥ 2

                             Dimension d
 ρ    C     2             3             4             5
 0.5  N     1.29 × 10^-3  3.66 × 10^-4  1.49 × 10^-4  7.48 × 10^-5
 0.5  t8    1.65          2.36          3.09          3.82
 0.5  t4    2.22          3.82          5.66          7.68
 0.5  t3    2.55          4.72          7.35          10.34
 0.7  N     2.67 × 10^-3  1.28 × 10^-3  7.77 × 10^-4  5.35 × 10^-4
 0.7  t8    1.33          1.58          1.78          1.95
 0.7  t4    1.60          2.10          2.53          2.91
 0.7  t3    1.74          2.39          2.97          3.45

We consider only the 99% quantile and the case of equal correlations.


Financial Interpretation
Consider daily returns on five financial instruments and suppose that we believe that all correlations between returns are equal to 50%. However, we are unsure about the best multivariate model for these data.
If returns follow a multivariate Gaussian distribution, then the probability that on any day all returns fall below their 1% quantiles is 7.48 × 10^-5. In the long run such an event will happen once every 13369 trading days on average, that is roughly once every 51.4 years (assuming 260 trading days in a year).
On the other hand, if returns follow a multivariate t distribution with four degrees of freedom, then such an event will happen 7.68 times more often, that is roughly once every 6.7 years.

Rank Correlations
Gaussian case. Let X be a bivariate random vector with copula C_ρ^Ga and continuous margins. Then the rank correlations are

    ρτ(X1, X2) = (2/π) arcsin ρ,                                (11)
    ρS(X1, X2) = (6/π) arcsin(ρ/2).                             (12)

Normal variance mixture case. Formula (11) also holds when X has the copula of a normal variance mixture distribution with correlation parameter ρ, for example the t copula C_{ν,ρ}^t.

More Exotic Normal Mixture Copulas


The t copula is popular in applications but has its drawbacks. It is radially symmetric, meaning that the dependence in the joint upper tail is the same as that in the joint lower tail, and its bivariate margins are exchangeable. Moreover, a single degree-of-freedom parameter determines the level of tail dependence for all bivariate margins. Other copulas of normal mixtures can get around these drawbacks.
Copulas of skewed members of the GH family. The NIG copula or the copula of the skewed t offer interesting possibilities for modelling more asymmetry. In general the bivariate margins of these copulas need be neither radially symmetric nor exchangeable.
Grouped normal mixture copulas. We define a block structure; within blocks the variables have copulas of standard normal mixture distributions.

Construction of the Grouped t Copula


Gν denotes the df of a univariate Ig(ν/2, ν/2) distribution.
Let Z ∼ Nd(0, Σ) and let U ∼ U(0, 1) be a uniform variate independent of Z.
Partition {1, . . . , d} into m subsets of sizes s1, . . . , sm and for k = 1, . . . , m let νk be the degree-of-freedom parameter associated with group k.
Let Wk = G_{νk}⁻¹(U), so that W1, . . . , Wm are perfectly dependent.
The grouped t copula is the copula of

    X = ( √W1 Z1, . . . , √W1 Z_{s1}, . . . , √Wm Z_{d−sm+1}, . . . , √Wm Zd )′.

C4. Archimedean Copulas


Bivariate Archimedean copulas have the form

    C(u1, u2) = ψ⁻¹( ψ(u1) + ψ(u2) ),

where ψ : [0, 1] → [0, ∞], the so-called copula generator, is continuous, strictly decreasing, convex and satisfies ψ(1) = 0 and ψ(0) = ∞. The simplest higher-dimensional extension is the exchangeable construction

    C(u1, . . . , ud) = ψ⁻¹( ψ(u1) + · · · + ψ(ud) ).

This yields a valid copula in any dimension d if and only if ψ⁻¹ is a completely monotonic function. This is true in the Gumbel and Clayton cases, where ψ(u) = (−ln u)^θ and ψ(u) = u^{−θ} − 1 respectively.

Complete Monotonicity and Laplace Transforms


ψ⁻¹ : [0, ∞] → [0, 1] is a completely monotonic function if it satisfies

    (−1)^k (d^k/dt^k) ψ⁻¹(t) ≥ 0,   k ∈ N, t > 0.

The class of such functions coincides with the class of Laplace transforms of dfs G on R₊ satisfying G(0) = 0. Recall that the Laplace transform Ĝ of G is given by

    Ĝ(t) = ∫0^∞ e^{−tx} dG(x),   t ≥ 0.

For this reason we refer to generators with completely monotonic inverses as LT-Archimedean generators and the resulting copulas as LT-Archimedean copulas.

Construction of an LT-Archimedean Copula


Let V be a rv with df G and let U1, . . . , Ud be conditionally independent given V with conditional distribution function P(Ui ≤ u | V = v) = exp(−v Ĝ⁻¹(u)) for u ∈ [0, 1]. The distribution of (U1, . . . , Ud) is an LT-Archimedean copula:

    F(u1, . . . , ud) = P(U1 ≤ u1, . . . , Ud ≤ ud)
                      = ∫0^∞ P(U1 ≤ u1, . . . , Ud ≤ ud | V = v) dG(v)
                      = ∫0^∞ e^{−v( Ĝ⁻¹(u1) + ··· + Ĝ⁻¹(ud) )} dG(v)
                      = Ĝ( Ĝ⁻¹(u1) + · · · + Ĝ⁻¹(ud) ).

[Marshall and Olkin, 1988]

Sampling LT-Archimedean Copulas


For any LT-Archimedean copula generator ψ there exists a distribution function G on R₊ with G(0) = 0 such that Ĝ = ψ⁻¹. The main practical problems are (i) to find G and (ii) to sample from the corresponding distribution. If this can be achieved we then have the algorithm:
1. Generate a variate V with df G such that Ĝ = ψ⁻¹.
2. Generate independent uniform variates X1, . . . , Xd.
3. Return (U1, . . . , Ud) = ( Ĝ(−ln(X1)/V), . . . , Ĝ(−ln(Xd)/V) ).

c 2006 (Embrechts, Frey, McNeil)

106

Special Cases
Clayton copula. We generate a gamma variate V ∼ Ga(1/θ, 1) with θ > 0. The df of V has Laplace transform Ĝ(t) = (1 + t)^{−1/θ}.
Gumbel copula. We generate a positive stable variate V ∼ St(1/θ, 1, γ, 0), where γ = (cos(π/(2θ)))^θ and θ > 1. This df has Laplace transform Ĝ(t) = exp(−t^{1/θ}), as desired.
For definitions of these distributions, consult the book.
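For the Clayton case the whole algorithm is a few lines of Python (a sketch; the Gumbel case additionally needs a positive stable generator, which numpy does not provide directly):

```python
import numpy as np

def rclayton(n, d, theta, rng=None):
    """Clayton copula via the algorithm above: V ~ Ga(1/theta, 1) has
    Laplace transform Ghat(t) = (1 + t)^(-1/theta), and
    U_i = Ghat(-ln(X_i)/V) = (1 - ln(X_i)/V)^(-1/theta)."""
    rng = rng or np.random.default_rng()
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    x = rng.random((n, d))
    return (1.0 - np.log(x) / v) ** (-1.0 / theta)

u = rclayton(1000, d=2, theta=2.2)    # strong lower tail dependence
```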

c 2006 (Embrechts, Frey, McNeil)

107

Simulating Gumbel Copula


[Figure: pairwise scatterplots of 1000 points from the 4-dimensional Gumbel copula with θ = 2.]

Partially Exchangeable Archimedean Copulas


By mixing generators it is possible to create copulas with group structure. A possible 3-dimensional construction is

    C(u1, u2, u3) = ψ2⁻¹( ψ2 ∘ ψ1⁻¹( ψ1(u1) + ψ1(u2) ) + ψ2(u3) ).

1. ψ1, ψ2 are LT-Archimedean generators.
2. The derivative of ψ2 ∘ ψ1⁻¹ must be completely monotonic.
3. All bivariate margins are bivariate Archimedean copulas.
4. If (U1, U2, U3) have this df then the pair (U1, U2) are, roughly speaking, more dependent than the pairs (U1, U3) and (U2, U3).

C5. Fitting Copulas to Data


We have data vectors X1, . . . , Xn with identical distribution function F. We write Xt = (Xt,1, . . . , Xt,d)′ for an individual data vector and X = (X1, . . . , Xd)′ for a generic random vector with df F. We assume further that this df F has continuous margins F1, . . . , Fd and thus, by Sklar's theorem, a unique representation F(x) = C(F1(x1), . . . , Fd(xd)). This module is devoted to the problem of estimating the parameters θ of a parametric copula Cθ. The main method we consider is maximum likelihood estimation, but we first outline a simpler method-of-moments procedure using sample rank correlation estimates; this method has the advantage that marginal distributions do not need to be estimated, so that inference about the copula is margin-free.

Method-of-Moments Using Rank Correlation


Recall the standard estimators of Kendall's rank correlation and Spearman's rank correlation. We will use the notation R^τ and R^S to denote matrices of pairwise estimates. These can be shown to be positive semi-definite (see book).
Calibrating the Gauss copula with Spearman's rho. Suppose we assume a meta-Gaussian model for X with copula C_P^Ga and we wish to estimate the correlation matrix P. It follows from Theorem 5.36 in the book that

    ρS(Xi, Xj) = (6/π) arcsin(ρij/2) ≈ ρij,

where the final approximation is very accurate. This suggests we estimate P by the matrix of pairwise Spearman's rank coefficients R^S.

Calibrating t Copula with Kendall's tau


Suppose we assume a meta-t model for X with copula C_{ν,P}^t and we wish to estimate the correlation matrix P. The theoretical relationship between Spearman's rho and P is not known in this case, but a relationship between Kendall's tau and P is known. It follows from Proposition 5.37 in the book that

    ρτ(Xi, Xj) = (2/π) arcsin ρij,

so that a possible estimator of P is the matrix R* with components given by r*ij = sin(π r^τ_ij / 2). This may not be positive definite, in which case R* can be transformed by the eigenvalue method given in Algorithm 5.55 to obtain a positive definite matrix that is close to R*. The remaining parameter ν of the copula could then be estimated by maximum likelihood.
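A sketch of this moment estimator in Python (omitting the eigenvalue correction):

```python
import numpy as np
from scipy import stats

def kendall_to_p(x):
    """Estimate P componentwise via P_ij = sin(pi * tau_ij / 2);
    the eigenvalue fix for non-positive-definite results
    (Algorithm 5.55 in the book) is omitted here."""
    d = x.shape[1]
    p = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(x[:, i], x[:, j])
            p[i, j] = p[j, i] = np.sin(np.pi * tau / 2.0)
    return p

rng = np.random.default_rng(5)
x = rng.multivariate_normal(np.zeros(3),
                            [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]],
                            size=2000)
print(kendall_to_p(x))
```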

Maximum Likelihood Method


To estimate the copula by ML we require a so-called pseudo-sample of observations from the copula. To construct such a sample we are required to estimate the marginal distributions. This can be done with:
1. parametric models F̂1, . . . , F̂d;
2. a form of the empirical distribution function, such as

    F̂j(x) = (1/(n + 1)) Σ_{i=1}^n 1{Xi,j ≤ x},   j = 1, . . . , d;

3. the empirical df with an EVT tail model.
The second method, known as pseudo-maximum likelihood, means that we essentially work with the ranks of the original data, standardized to lie on the copula scale. For statistical properties see [Genest and Rivest, 1993].
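Constructing the pseudo-sample by the empirical-df method amounts to taking componentwise ranks (a sketch):

```python
import numpy as np
from scipy import stats

def pseudo_obs(x):
    """Pseudo-sample on the copula scale: componentwise ranks of the
    data divided by n + 1 (the empirical-df method, option 2 above)."""
    x = np.asarray(x)
    return stats.rankdata(x, axis=0) / (x.shape[0] + 1.0)

rng = np.random.default_rng(6)
u = pseudo_obs(rng.standard_normal((500, 2)))   # values lie in (0, 1)
```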

Estimating the Copula


We form the pseudo-sample

    Ûi = (Ûi,1, . . . , Ûi,d) = (F̂1(Xi,1), . . . , F̂d(Xi,d)),   i = 1, . . . , n,

and fit a parametric copula Cθ by maximum likelihood. The copula density is

    c(u1, . . . , ud; θ) = ∂^d C(u1, . . . , ud; θ) / (∂u1 · · · ∂ud),

where θ denotes the unknown parameters. The log-likelihood is

    l(θ; Û1, . . . , Ûn) = Σ_{i=1}^n log c(Ûi,1, . . . , Ûi,d; θ).

Independence of the vector observations is assumed for simplicity.


BMW-Siemens Example I
[Figure: scatterplot of the BMW-SIEMENS pseudo-sample on the copula scale.]

The pseudo-sample from the copula after estimation of the margins.



BMW-Siemens Example II
Copula    Parameter estimates (std. errors)     log-likelihood
Gauss     0.70 (0.0098)                         610.39
t         0.70, 4.89 (0.0122, 0.73)             649.25
Gumbel    1.90 (0.0363)                         584.46
Clayton   1.42 (0.0541)                         527.46

Goodness-of-fit. Akaike's criterion (AIC) suggests choosing the model that minimizes AIC = 2p − 2(log-likelihood), where p = number of parameters of the model. This is clearly the t model. Remark. Formal methods for goodness-of-fit are also available.
c 2006 (Embrechts, Frey, McNeil) 116

Dow Jones Example


[Figure: pairwise scatterplots of the pseudo-sample from the copula (GE, IBM, MCD, MSFT) after estimation of the margins.]

c 2006 (Embrechts, Frey, McNeil) 117

Dow Jones Example II:


[Figure: profile log-likelihood (roughly 500 to 650) plotted against the degrees-of-freedom parameter ν of the t copula.]

Daily returns on ATT, General Electric, IBM, McDonald's, Microsoft. The form of the likelihood for ν indicates non-Gaussian dependence.
c 2006 (Embrechts, Frey, McNeil) 118

II: Modelling Extreme Risks


A. Extreme Value Theory B. Threshold Models C. More Advanced Topics in Extremes

c 2006 (Embrechts, Frey, McNeil)

119

A. Extreme Value Theory: Maxima


1. Limiting Behaviour of Maxima 2. The Fisher-Tippett Theorem 3. Maximum Domains of Attraction 4. Maxima of Dependent Data 5. The Block Maxima Method

c 2006 (Embrechts, Frey, McNeil)

120

A1. Limiting Behaviour of Maxima


Broadly speaking, there are two principal kinds of model for extreme values. The most traditional are models for block maxima, which are the subject of this module. These are models for the largest observations collected from large samples of identically distributed observations. The asymptotic theory of maxima is the main subject of classical EVT. A more modern and powerful group of models are the models for threshold exceedances. These are models for all large observations that exceed some high level, and are generally considered to be the most useful for practical applications, due to their more efficient use of the (often limited) data on extreme outcomes.

c 2006 (Embrechts, Frey, McNeil)

121

Notation for Study of Maxima


Let X_1, X_2, . . . be iid random variables with distribution function (df) F. In risk management applications these could represent financial losses, operational losses or insurance losses. Let M_n = max(X_1, . . . , X_n) be the maximum loss in a sample of n losses. Clearly

P(M_n ≤ x) = P(X_1 ≤ x, . . . , X_n ≤ x) = F^n(x).

It can be shown that, almost surely, M_n → x_F as n → ∞, where x_F := sup{x ∈ R : F(x) < 1} is the right endpoint of F. But what about normalized maxima?

c 2006 (Embrechts, Frey, McNeil)

122

Limiting Behaviour of Sums or Averages


(See [Embrechts et al., 1997], Chapter 2.) We are familiar with the central limit theorem. Let X_1, X_2, . . . be iid with finite mean μ and finite variance σ². Let S_n = X_1 + X_2 + · · · + X_n. Then

P((S_n − nμ)/(σ√n) ≤ x) → Φ(x),   n → ∞,

where Φ is the distribution function of the standard normal distribution,

Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du.

Note, more generally, the limiting distributions for appropriately normalized sample sums are the class of stable distributions; the Gaussian distribution is a special case.
c 2006 (Embrechts, Frey, McNeil) 123

Limiting Behaviour of Sample Maxima


(See [Embrechts et al., 1997], Chapter 3.) Let X_1, X_2, . . . be iid from F and let M_n = max(X_1, . . . , X_n). Suppose we can find sequences of real numbers c_n > 0 and d_n such that (M_n − d_n)/c_n, the sequence of normalized maxima, converges in distribution, i.e.

P((M_n − d_n)/c_n ≤ x) = F^n(c_n x + d_n) → H(x),

for some non-degenerate df H(x). If this condition holds we say that F is in the maximum domain of attraction of H, abbreviated F ∈ MDA(H). Note that such an H is determined up to location and scale, i.e. will specify a unique type of distribution.
c 2006 (Embrechts, Frey, McNeil) 124

Types of Distributions
Definition (Equality in type). Two random variables U and V are of the same type if U =_d aV + b for some a > 0, b ∈ R. In terms of their dfs F_U and F_V this means F_U(x) = F_V((x − b)/a). Thus random variables of the same type have the same df, up to possible changes of scale and location.

c 2006 (Embrechts, Frey, McNeil)

125

A2. The Fisher-Tippett Theorem


The generalized extreme value (GEV) distribution has df

H_ξ(x) = exp(−(1 + ξx)^{−1/ξ}),   ξ ≠ 0,
H_ξ(x) = exp(−e^{−x}),            ξ = 0,

where 1 + ξx > 0 and ξ is the shape parameter. Note, this parametrization is continuous in ξ. For

ξ > 0:  H_ξ is equal in type to the classical Fréchet df;
ξ < 0:  H_ξ is equal in type to the classical Weibull df;
ξ = 0:  H_ξ is equal in type to the classical Gumbel df.

We introduce location and scale parameters μ and σ > 0 and work with H_{ξ,μ,σ}(x) := H_ξ((x − μ)/σ). Clearly H_{ξ,μ,σ} is of type H_ξ.
c 2006 (Embrechts, Frey, McNeil) 126

GEV Distribution Functions and Densities


[Figure: GEV distribution functions H(x) and densities h(x). Solid line corresponds to ξ = 0 (Gumbel); dotted line is ξ = 0.5 (Fréchet); dashed line is ξ = −0.5 (Weibull); μ = 0 and σ = 1.]
c 2006 (Embrechts, Frey, McNeil) 127

Fisher-Tippett Theorem (1928)


Theorem: If F ∈ MDA(H) then H is of the type H_ξ for some ξ. In other words, if suitably normalized maxima converge in distribution to a non-degenerate limit, then the limit distribution must be an extreme value distribution. Remark 1: Essentially all commonly encountered continuous distributions are in the maximum domain of attraction of an extreme value distribution. Remark 2: We can always choose normalizing sequences c_n and d_n so that the limit law H appears in standard form (without relocation or rescaling).

c 2006 (Embrechts, Frey, McNeil)

128

A3. Maximum Domains of Attraction


When does F ∈ MDA(H_ξ) hold?

Fréchet Case (ξ > 0). Gnedenko (1943) showed that for ξ > 0

F ∈ MDA(H_ξ)  ⟺  1 − F(x) = x^{−1/ξ} L(x),

for some slowly varying function L(x). A function L on (0, ∞) is slowly varying if

lim_{x→∞} L(tx)/L(x) = 1,   t > 0.

Summary: The distributions in this class have heavy tails that decay like power functions. Not all moments are finite. Examples are Pareto, log-gamma, F, t and Cauchy.
c 2006 (Embrechts, Frey, McNeil) 129

Maximum Domains of Attraction II


Gumbel Case (ξ = 0). This class contains distributions with tails that decay faster than power tails, for example exponentially. They are lighter-tailed distributions for which all moments are finite. Examples are the normal, lognormal, exponential and gamma.

Weibull Case (ξ < 0). For ξ < 0, F ∈ MDA(H_ξ) if and only if x_F < ∞ and

1 − F(x_F − x^{−1}) = x^{1/ξ} L(x),

for some slowly varying function L(x). Examples are short-tailed distributions like the uniform and beta.

c 2006 (Embrechts, Frey, McNeil)

130

Examples
Recall: F ∈ MDA(H_ξ) iff there are sequences c_n and d_n with

P((M_n − d_n)/c_n ≤ x) = F^n(c_n x + d_n) → H_ξ(x).

We have the following examples:

The exponential distribution, F(x) = 1 − e^{−λx}, λ > 0, x ≥ 0, is in MDA(H_0) (Gumbel case). Take c_n = 1/λ, d_n = (log n)/λ.

The Pareto distribution, F(x) = 1 − (1 + x)^{−α}, α > 0, x ≥ 0, is in MDA(H_{1/α}) (Fréchet case). Take c_n = n^{1/α}/α, d_n = n^{1/α}.

c 2006 (Embrechts, Frey, McNeil)
131

A4. Maxima of Dependent Data


Let (X_t)_{t∈Z} be a (strictly) stationary time series and let (X̃_t)_{t∈Z} be an associated iid series with the same marginal distribution F. Let M_n = max(X_1, . . . , X_n) and M̃_n = max(X̃_1, . . . , X̃_n) be the respective maxima of n-blocks. For many processes (X_t)_{t∈Z} it may be shown that there exists a real number θ in (0, 1] such that

lim_{n→∞} P{(M̃_n − d_n)/c_n ≤ x} = H(x),

for a non-degenerate limit H(x), if and only if

lim_{n→∞} P{(M_n − d_n)/c_n ≤ x} = H^θ(x).

c 2006 (Embrechts, Frey, McNeil)
132

Extremal Index
The value θ is known as the extremal index of the process (to be distinguished from the tail index of distributions in the Fréchet class). For processes with an extremal index, normalized block maxima converge in distribution provided that maxima of the associated iid process converge in distribution, that is, provided the underlying distribution F is in MDA(H_ξ) for some ξ.

H^θ(x) is a distribution of the same type as H(x). It is also a GEV distribution with exactly the same shape parameter ξ. Only the location and scaling of the distribution change.

c 2006 (Embrechts, Frey, McNeil)

133

Interpretation of Extremal Index


Writing u = c_n x + d_n we observe that, for large enough n,

P(M_n ≤ u) ≈ P(M̃_{nθ} ≤ u) = F^{nθ}(u),

so that at high levels the probability distribution of the maximum of n observations from the time series with extremal index θ is like that of the maximum of nθ < n observations from the associated iid series. In a sense nθ can be thought of as counting the number of roughly independent clusters of observations in n observations, and θ is often interpreted as the reciprocal of the mean cluster size.

c 2006 (Embrechts, Frey, McNeil)

134

Extremal Index: Examples


Not every strictly stationary process has an extremal index (EKM, page 418), but for the kinds of time series processes that interest us in financial modelling an extremal index generally exists. We distinguish between the cases when θ = 1 and the cases when θ < 1; for the former there is no tendency for extremes to cluster at high levels and large sample maxima behave exactly like maxima from similarly-sized iid samples. Strict white noise processes (iid rvs) have extremal index θ = 1. ARMA processes with iid Gaussian innovations have θ = 1 (EKM, pages 216–218). However, if the innovation distribution is in MDA(H_ξ) for ξ > 0, then θ < 1 (EKM, page 415). ARCH and GARCH processes have θ < 1 (EKM, pages 476–480).
c 2006 (Embrechts, Frey, McNeil) 135

A5. The Block Maxima Method


Assume that we have a large enough block of n iid random variables so that the limit result is more or less exact, i.e. there exist c_n > 0, d_n ∈ R such that, for some ξ,

P((M_n − d_n)/c_n ≤ x) ≈ H_ξ(x).

Now set y = c_n x + d_n, so that

P(M_n ≤ y) ≈ H_ξ((y − d_n)/c_n) = H_{ξ,d_n,c_n}(y).

We wish to estimate ξ, d_n and c_n.

Implication: We collect data on block maxima and fit the three-parameter form of the GEV. For this we require a lot of raw data so that we can form sufficiently many, sufficiently large blocks.

c 2006 (Embrechts, Frey, McNeil)

136

ML Inference for Maxima


We have block maxima data y = (M_n^{(1)}, . . . , M_n^{(m)}) from m blocks of size n. We wish to estimate θ = (ξ, μ, σ). We construct a log-likelihood by assuming we have independent observations from a GEV with density h_θ,

l(θ; y) = Σ_{i=1}^m log h_θ(M_n^{(i)}) 1_{{1 + ξ(M_n^{(i)} − μ)/σ > 0}},

and maximize this w.r.t. θ to obtain the MLE θ̂ = (ξ̂, μ̂, σ̂). Clearly, in defining blocks, bias and variance must be traded off. We reduce bias by increasing the block size n; we reduce variance by increasing the number of blocks m.
c 2006 (Embrechts, Frey, McNeil) 137
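A base-R sketch of this ML step for ξ ≠ 0 (our own minimal code; the EVIS gev() call used later performs the same task with added diagnostics); y is the vector of block maxima, assumed given:

## GEV negative log-likelihood and ML fit for block maxima y (base-R sketch).
gev_nll <- function(par, y) {
  xi <- par[1]; mu <- par[2]; sig <- par[3]
  if (sig <= 0) return(Inf)
  w <- 1 + xi * (y - mu) / sig
  if (any(w <= 0)) return(Inf)                   # outside the support
  sum(log(sig) + (1 + 1/xi) * log(w) + w^(-1/xi))
}
fit <- optim(c(0.1, mean(y), sd(y)), gev_nll, y = y)
fit$par                                          # (xi, mu, sigma) estimates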

The S&P 500 Example


It is the early evening of Friday 16th October 1987. In the equity markets it has been an unusually turbulent week, which has seen the S&P 500 index fall by 9.21%. On that Friday alone the index is down 5.25% on the previous day, the largest one-day fall since 1962. At our disposal are all daily closing values of the index since 1960. We analyse annual maxima of daily percentage falls in the index. These values M_260^{(1)}, . . . , M_260^{(28)} are assumed to be iid from H_{ξ,μ,σ}. Remark. Although we have only justified this choice of limiting distribution for maxima of iid data, it turns out that the GEV is also the correct limit for maxima of stationary time series, under some technical conditions on the nature of the dependence. These conditions are fulfilled, for example, by GARCH processes.
c 2006 (Embrechts, Frey, McNeil) 138

S&P 500 Return Data


[Figure: daily percentage falls in the S&P 500 index, 05.01.60 to 16th October 1987.]

c 2006 (Embrechts, Frey, McNeil)

139

Assessing the Risk in S&P


We will address the following two questions: What is the probability that next year's maximum exceeds all previous levels? What is the 40-year return level R_{260,40}? In the first question we assess the probability of observing a new record. In the second problem we define and estimate a rare stress or scenario loss.

c 2006 (Embrechts, Frey, McNeil)

140

Return Levels
R_{n,k}, the k n-block return level, is defined by P(M_n > R_{n,k}) = 1/k; i.e. it is that level which is exceeded in one out of every k n-blocks, on average. We use the approximation

R_{n,k} ≈ H^{−1}_{ξ,μ,σ}(1 − 1/k) = μ + (σ/ξ)((−log(1 − 1/k))^{−ξ} − 1).

We wish to estimate this functional of the unknown parameters of our GEV model for maxima of n-blocks.

c 2006 (Embrechts, Frey, McNeil)

141
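In code, the return level is a one-line functional of the GEV parameters; the parameter values plugged in below are taken from the EVIS output on the next slide:

## k n-block return level from GEV parameters (base-R sketch of the formula above).
return_level <- function(xi, sigma, mu, k)
  mu + (sigma / xi) * ((-log(1 - 1/k))^(-xi) - 1)
return_level(0.3344, 0.6716, 1.975, 40)   # roughly 6.83, cf. the S&P example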

S-Plus Maxima Analysis with EVIS


> out <- gev(-sp, "year")
> out
$n.all: [1] 6985
$n: [1] 28
$data:
 1960 2.268191  1961 2.083017  1962 6.675635  1963 2.806479
 1964 1.253012  1965 1.757765  1966 2.460411  1967 1.558183
 1968 1.899367  1969 1.903001  1970 2.768166  1971 1.522388
 1972 1.319013  1973 3.051598  1974 3.671256  1975 2.362394
 1976 1.797353  1977 1.625611  1978 2.009257  1979 2.957772
 1980 3.006734  1981 2.886327  1982 3.996544  1983 2.697254
 1984 1.820587  1985 1.455301  1986 4.816644  1987 5.253623
$par.ests:
       xi     sigma        mu
0.3343843 0.6715922  1.974976
$par.ses:
     xi    sigma        mu
 0.2081 0.130821 0.1512828
$nllh.final: [1] 38.33949

c 2006 (Embrechts, Frey, McNeil)

142

S&P Example (continued)


Answers: The probability is estimated by

1 − H_{ξ̂,μ̂,σ̂}(max(M_260^{(1)}, . . . , M_260^{(28)})) = 0.027.

R_{260,40} is estimated by

H^{−1}_{ξ̂,μ̂,σ̂}(1 − 1/40) = 6.83.

It is important to construct confidence intervals for such statistics. We use asymptotic likelihood ratio ideas to construct asymmetric intervals, the so-called profile likelihood method.
c 2006 (Embrechts, Frey, McNeil) 143

Estimated 40-Year Return Level


[Figure: S&P negative returns, 05.01.60 to 05.01.85, with the estimated 40-year return level.]

c 2006 (Embrechts, Frey, McNeil)

144

References
On EVT in general: [Embrechts et al., 1997] [Reiss and Thomas, 1997] On Fisher-Tippett Theorem: [Fisher and Tippett, 1928] [Gnedenko, 1943] Application of Block Maxima Method to S&P Data: [McNeil, 1998]
c 2006 (Embrechts, Frey, McNeil) 145

B. EVT: Modelling Threshold Exceedances


1. The Generalized Pareto Distribution 2. Modelling Excess Losses over Thresholds 3. Modelling Tails of Loss Distributions 4. The POT Model

c 2006 (Embrechts, Frey, McNeil)

146

B1. Generalized Pareto Distribution


The GPD is a two-parameter distribution with df

G_{ξ,β}(x) = 1 − (1 + ξx/β)^{−1/ξ},   ξ ≠ 0,
G_{ξ,β}(x) = 1 − exp(−x/β),           ξ = 0,

where β > 0, and the support is x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ when ξ < 0. This subsumes:

ξ > 0: Pareto (reparametrized version)
ξ = 0: exponential
ξ < 0: Pareto type II.

Moments. For ξ > 0 the distribution is heavy-tailed; E(X^k) does not exist for k ≥ 1/ξ.
c 2006 (Embrechts, Frey, McNeil) 147

GPD: Dfs and Densities


[Figure: GPD distribution functions G(x) and densities g(x). Solid line corresponds to ξ = 0 (exponential); dotted line is ξ = 0.5 (Pareto); dashed line is ξ = −0.5 (Pareto type II); β = 1.]
c 2006 (Embrechts, Frey, McNeil) 148

The Role of the GPD


The GPD is a natural limiting model for excess losses over high thresholds. To discuss this idea we need the concepts of the excess distribution over a threshold and the mean excess function. Let u be a high threshold and define the excess distribution above the threshold u to have the df

F_u(x) = P(X − u ≤ x | X > u) = (F(x + u) − F(u))/(1 − F(u)),

for 0 ≤ x < x_F − u, where x_F is the right endpoint of F. The mean excess function of a rv X is e(u) = E(X − u | X > u). It is the mean of the excess distribution above the threshold u, expressed as a function of u.
c 2006 (Embrechts, Frey, McNeil) 149

Examples
1. Exponential. F(x) = 1 − e^{−λx}, λ > 0, x ≥ 0. Then

F_u(x) = F(x),   x ≥ 0,

the lack-of-memory property, and e(u) = 1/λ for all u ≥ 0.

2. GPD. F(x) = G_{ξ,β}(x). Then

F_u(x) = G_{ξ,β+ξu}(x),   0 ≤ u < ∞ if ξ ≥ 0, and 0 ≤ u ≤ −β/ξ if ξ < 0.

The excess distribution of a GPD remains GPD with the same shape. For ξ < 1,

e(u) = (β + ξu)/(1 − ξ),   0 ≤ u < ∞ if 0 ≤ ξ < 1, and 0 ≤ u ≤ −β/ξ if ξ < 0.

The mean excess function is linear in the threshold u.

c 2006 (Embrechts, Frey, McNeil) 150

Asymptotics of Excess Distribution


Theorem [Balkema and de Haan, 1974; Pickands, 1975]. We can find a function β(u) such that

lim_{u→x_F} sup_{0≤x<x_F−u} |F_u(x) − G_{ξ,β(u)}(x)| = 0,

if and only if F ∈ MDA(H_ξ), ξ ∈ R. This theorem provides a characterization of the distributions in MDA(H_ξ): they are the distributions whose excess distributions converge to a generalized Pareto with shape parameter ξ. This amounts to all the common continuous distributions used in risk management or insurance mathematics.

c 2006 (Embrechts, Frey, McNeil)

151

B2. Modelling Excess Distributions


We assume we are dealing with a loss distribution F for which the excess distribution over some high threshold u is exactly generalized Pareto: from now on, F_u = G_{ξ,β} for some parameters ξ ∈ R and β > 0. This is clearly an idealized model, but it facilitates a number of later calculations. Given data X_1, . . . , X_n, a random number N_u of them will exceed the threshold. We relabel these exceedances X̃_1, . . . , X̃_{N_u} and calculate the excess amounts Y_i = X̃_i − u, for i = 1, . . . , N_u. To estimate ξ and β we fit the GPD to the excess data. We may use various fitting methods, including maximum likelihood (ML) and probability-weighted moments (PWM).

c 2006 (Embrechts, Frey, McNeil)

152
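A base-R sketch of the ML fit for ξ ≠ 0 (our own code; the EVIS gpd() call shown later does the same job); the vector losses and the threshold u = 10 anticipate the Danish example:

## GPD negative log-likelihood and ML fit for excesses over u (base-R sketch).
gpd_nll <- function(par, y) {                 # y: excess amounts over the threshold
  xi <- par[1]; beta <- par[2]
  if (beta <= 0) return(Inf)
  w <- 1 + xi * y / beta
  if (any(w <= 0)) return(Inf)
  sum(log(beta) + (1 + 1/xi) * log(w))
}
excess <- losses[losses > 10] - 10            # 'losses' assumed given (e.g. danish)
fit <- optim(c(0.1, mean(excess)), gpd_nll, y = excess)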

Danish Fire Loss Data


The Danish data consist of 2167 losses exceeding one million Danish Krone from the years 1980 to 1990. The loss figure is a total loss for the event concerned and includes damage to buildings, damage to contents of buildings as well as loss of profits. The data have been adjusted for inflation to reflect 1985 values.
[Figure: large insurance claims (Danish fire losses), 030180 to 030190.]

c 2006 (Embrechts, Frey, McNeil)

153

EVIS POT Analysis


> out <- gpd(danish, 10)
> out
$n: [1] 2167
$data:
  [1]  11.37482  26.21464  14.12208
  [4]  11.71303  12.46559  17.56955
  [7]  13.62079  21.96193  19.26568
 ...etc...
[106] 144.65759 263.25037  28.63036
[109]  17.73927
$threshold: [1] 10
$p.less.thresh: [1] 0.9497
$n.exceed: [1] 109
$par.ests:
       xi     beta
0.4969857 6.975468
$par.ses:
       xi     beta
0.1362838  1.11349
$varcov:
            [,1]        [,2]
[1,]  0.01857326 -0.08194611
[2,] -0.08194611  1.23986096
$information: [1] "observed"
$converged: [1] T
$nllh.final: [1] 374.893

Estimating Excess df
[Figure: estimate of the excess distribution F_u(x − u) for the Danish data, plotted against x on a log scale.]

c 2006 (Embrechts, Frey, McNeil)

155

Serially Dependent Data


In the application of ML we usually assume the underlying losses, and hence the excess losses, are iid. This is unproblematic for insurance or operational risk data, but not for financial time series. If the data are serially dependent but show no tendency for extremes to cluster (suggesting θ = 1), then high-level threshold exceedances occur as a Poisson process and the excess loss amounts over the threshold can essentially be modelled as iid data. If extremal clustering is present (suggesting θ < 1, as in a GARCH process), then the iid assumption is unsatisfactory. The easiest approach is to neglect this problem. Although the likelihood is misspecified with respect to the serial dependence of the data, the point estimates should still be reasonable, although standard errors may be too small.
c 2006 (Embrechts, Frey, McNeil) 156

Excesses Over Higher Thresholds


From the model G_{ξ,β} for the excess distribution over u we can easily infer a model for the excess distribution over any higher threshold. We have that

F_v(x) = G_{ξ,β+ξ(v−u)}(x),   v ≥ u.

Thus the excess distribution over higher thresholds remains GPD with the same ξ parameter but a scaling that grows linearly with the threshold v. Provided ξ < 1, the mean excess function is given by

e(v) = (β + ξ(v − u))/(1 − ξ) = ξv/(1 − ξ) + (β − ξu)/(1 − ξ),   (13)

where u ≤ v < ∞ if 0 ≤ ξ < 1 and u ≤ v ≤ u − β/ξ if ξ < 0. The linearity of the mean excess function in v is commonly used as a diagnostic for data admitting a GPD model for the excess distribution. It forms the basis for a simple graphical method for deciding on an appropriate threshold, as follows.

c 2006 (Embrechts, Frey, McNeil) 157

Using Mean Excess Plot to Set a Threshold


For positive-valued loss data X_1, . . . , X_n we define the sample mean excess function to be an empirical estimator of the mean excess function:

e_n(v) = Σ_{i=1}^n (X_i − v) 1_{{X_i > v}} / Σ_{i=1}^n 1_{{X_i > v}}.

To view this function we generally construct the mean excess plot

{(X_{i,n}, e_n(X_{i,n})) : 2 ≤ i ≤ n},

where X_{i,n} denotes the ith order statistic. If the data support a GPD model over a high threshold, we would expect this plot to become linear, in view of (13).
c 2006 (Embrechts, Frey, McNeil) 158
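A base-R sketch of the sample mean excess plot (our own code; the largest order statistic is dropped so that every plotted threshold has at least one exceedance):

## Sample mean excess plot (base-R sketch).
mean_excess_plot <- function(x) {
  u <- sort(x)[2:(length(x) - 1)]                 # order statistics as thresholds
  e <- sapply(u, function(v) mean(x[x > v] - v))  # e_n(v)
  plot(u, e, xlab = "Threshold", ylab = "Mean Excess")
}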

Mean Excess Plot for Danish Data


[Figure: mean excess plot for the Danish data; mean excess against threshold.]

c 2006 (Embrechts, Frey, McNeil)

159

B3. Modelling Tails of Loss Distributions


Under our assumption that F_u = G_{ξ,β} for some u, ξ and β, we have, for x ≥ u,

F̄(x) = P(X > u) P(X > x | X > u)
      = F̄(u) P(X − u > x − u | X > u)
      = F̄(u) F̄_u(x − u)
      = F̄(u) (1 + ξ(x − u)/β)^{−1/ξ},   (14)

which, if we know F̄(u), gives us a formula for tail probabilities. This formula may be used to derive formulas for risk measures like VaR and expected shortfall.
c 2006 (Embrechts, Frey, McNeil) 160

Calculating VaR and Expected Shortfall


For α ≥ F(u) we have that VaR is equal to

VaR_α = q_α(F) = u + (β/ξ)(((1 − α)/F̄(u))^{−ξ} − 1).   (15)

Assuming that ξ < 1, the associated expected shortfall can be calculated easily from (2) and (15). We obtain

ES_α = (1/(1 − α)) ∫_α^1 q_x(F) dx = VaR_α/(1 − ξ) + (β − ξu)/(1 − ξ).   (16)

c 2006 (Embrechts, Frey, McNeil)

161
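In code, (15) and (16) are direct to evaluate once ξ̂, β̂ and F̄(u) are available; the numbers below anticipate the Danish fit (ξ̂ = 0.497, β̂ = 6.98, u = 10, N_u/n = 109/2167) and reproduce estimates close to those shown on the later slides:

## GPD-based VaR and ES estimators, formulas (15) and (16) (base-R sketch).
gpd_risk <- function(xi, beta, u, Fbar_u, alpha) {
  VaR <- u + (beta / xi) * (((1 - alpha) / Fbar_u)^(-xi) - 1)
  ES  <- VaR / (1 - xi) + (beta - xi * u) / (1 - xi)   # requires xi < 1
  c(VaR = VaR, ES = ES)
}
gpd_risk(0.497, 6.98, 10, 109/2167, 0.99)   # approx. VaR 27.3, ES 58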

Ratios of Risk Measures


It is interesting to look at how the ratio of the two risk measures behaves for large values of the quantile probability α. It is easily calculated from (15) and (16) that

lim_{α→1} ES_α/VaR_α = (1 − ξ)^{−1},   0 ≤ ξ < 1,
lim_{α→1} ES_α/VaR_α = 1,              ξ < 0,   (17)

so that the shape parameter ξ of the GPD effectively determines the ratio when we go far enough out into the tail.

c 2006 (Embrechts, Frey, McNeil)

162

Estimating Tails and Risk Measures


Tail probabilities, VaRs and expected shortfalls are all given by formulas of the form g(ξ, β, F̄(u)). We estimate these quantities by replacing ξ and β by their estimates and replacing F̄(u) by the simple empirical estimator N_u/n. For tail probabilities we get the estimator of [Smith, 1987],

F̄̂(x) = (N_u/n)(1 + ξ̂(x − u)/β̂)^{−1/ξ̂},   (18)

which is valid for x ≥ u. For α ≥ 1 − N_u/n we obtain analogous point estimators of VaR_α and ES_α. Asymmetric confidence intervals can be constructed using the profile likelihood method as described in the book (page 284).
c 2006 (Embrechts, Frey, McNeil) 163

Estimating Tail of Underlying df


[Figure: estimated tail of the underlying distribution for the Danish data; 1 − F(x) against x, both on log scales.]

c 2006 (Embrechts, Frey, McNeil)

164

Estimating a Quantile (99%)

[Figure: tail plot on log scales with profile likelihood curves at the 95 and 99 levels; marked values 23.3, 27.3 and 33.1 (99% quantile point estimate and asymmetric confidence interval).]

c 2006 (Embrechts, Frey, McNeil)

165

Estimates of 99% VaR and ES

[Figure: estimates of the 99% VaR and ES for the Danish data; tail plot on log scales with marked values 41.6, 58.2 and 154.7.]

c 2006 (Embrechts, Frey, McNeil)

166

B4. The POT Model


When the loss data form a stationary time series, the timing and magnitude of threshold exceedances are both of interest. The POT (peaks-over-thresholds) model is a limiting model for threshold exceedances in iid processes (or processes with extremal index θ = 1). The limit is derived by considering datasets X_1, . . . , X_n and thresholds u_n that increase with n, and letting n → ∞ (MFE, Section 7.4.2). The limiting model says that: Exceedances occur according to a homogeneous Poisson process in time. Excess amounts above the threshold are iid and independent of exceedance times. The distribution of excesses is generalized Pareto.
c 2006 (Embrechts, Frey, McNeil) 167

Illustration

[Figure: simulated threshold exceedances over time and a QQ-plot of the excesses against GPD quantiles (xi = 0.09); 200 largest from 2078 simulated t data (4.6 dof).]


c 2006 (Embrechts, Frey, McNeil) 168

Representations of the POT Model


There are various alternative ways of describing this model: A marked Poisson point process, where exceedances are points and GPD-distributed excesses are marks. A (non-homogeneous) two-dimensional Poisson point process, where points (t, x) in the space record the times and magnitudes of exceedances. The latter representation leads to a very powerful way of fitting the model, as proposed in [Smith, 1989]. Other standard EVT models, such as generalized extreme value (GEV) models for maxima and GPDs for excesses, drop out of this representation easily.
c 2006 (Embrechts, Frey, McNeil) 169

Smith Formulation of POT Model


Original data: (X_t)_{1≤t≤n}. Exceedances of threshold u: {(t, X_t) : 1 ≤ t ≤ n, X_t > u}. Also label these {(T_j, X̃_j) : 1 ≤ j ≤ N_u} for convenience. We assume that points in the space χ = (0, n] × (u, ∞) occur according to a non-homogeneous Poisson process with intensity

λ(t, x) = (1/σ)(1 + ξ(x − μ)/σ)^{−1/ξ−1},   (19)

provided (1 + ξ(x − μ)/σ) > 0, and zero otherwise, where the parameters satisfy ξ, μ ∈ R and σ > 0.

c 2006 (Embrechts, Frey, McNeil)

170

Implications of model
The intensity measure is

Λ(A) = ∫_{t_1}^{t_2} ∫_x^∞ λ(t, y) dy dt = (t_2 − t_1)(1 + ξ(x − μ)/σ)_+^{−1/ξ},

for a set of the form A = (t_1, t_2) × (x, ∞) ⊂ χ, so that

P(k points in A) = exp(−Λ(A)) Λ(A)^k / k!.

The 1-d intensity of exceedances of the level x ≥ u is

τ(x) = (1 + ξ(x − μ)/σ)^{−1/ξ},   (20)

so that the 1-d process is homogeneous Poisson.

c 2006 (Embrechts, Frey, McNeil) 171

GPD Models for Excesses


Consider a generic rv X from (X_t). We calculate that

P(X > u + x | X > u) = τ(u + x)/τ(u) = (1 + ξx/(σ + ξ(u − μ)))^{−1/ξ} = Ḡ_{ξ,β}(x),

where β = σ + ξ(u − μ) and

G_{ξ,β}(x) = 1 − (1 + ξx/β)^{−1/ξ},   0 ≤ x < x_F,

is the df of the generalized Pareto (x_F = ∞ if ξ ≥ 0, x_F = −β/ξ if ξ < 0).


c 2006 (Embrechts, Frey, McNeil) 172

Statistical Inference
The likelihood of the exceedance data {(T_j, X̃_j) : j = 1, . . . , N_u} is

L(ξ, σ, μ; data) = exp(−Λ((0, n] × (u, ∞))) Π_{j=1}^{N_u} λ(T_j, X̃_j)
                 = exp(−n τ(u)) Π_{j=1}^{N_u} λ(X̃_j).

For a justification see [Daley and Vere-Jones, 2003]. Fitting this model allows immediate inference about the numbers and magnitudes of threshold exceedances at different levels. In particular, the well-known GPD model for the excess distribution is easily derived.
c 2006 (Embrechts, Frey, McNeil) 173

Unsuitability for Financial Series

[Figure: negative S&P 500 log returns with threshold exceedances clustered in time, and a QQ-plot of the excesses against GPD quantiles (xi = 0.1); 200 worst days from 8 years of negative SP500 log returns.]


c 2006 (Embrechts, Frey, McNeil) 174

POT for Financial TS


Exceedances tend to form clusters in time, violating the Poisson assumption. The usual approach is to assume that the cluster centres follow a Poisson process and to decluster the data, leaving only single cluster representatives. Declustering procedures are fairly ad hoc and are wasteful of data. No attempt is made to model the in-cluster behaviour.

c 2006 (Embrechts, Frey, McNeil)

175

C. More Advanced Topics in Extremes


1. Motivation and Stylized Facts 2. GARCH Models 3. Dynamic EVT in Time Series Framework 4. Self-Exciting Extreme Value Models

c 2006 (Embrechts, Frey, McNeil)

176

C1. Motivation and Stylized Facts


In this module we consider dynamic models for extreme values that are more suitable for market risk applications. We recall that typical financial return data show strong evidence against the iid assumption. If (S_t) denotes (say daily) values of an asset price and returns (X_t) are defined by X_t = ln(S_t/S_{t−1}), we generally observe a number of stylized facts: Returns are not iid, but their correlation is low. Absolute returns are highly correlated. Volatility appears to change randomly with time. Returns are leptokurtic or heavy-tailed. Extremes appear in clusters.
c 2006 (Embrechts, Frey, McNeil) 177

Stylized Facts: Volatility


[Figure: log-returns for the DAX index from 02.01.85 until 30.12.94, compared with simulated iid data from fitted normal and t distributions.]
c 2006 (Embrechts, Frey, McNeil) 178

Stylized Facts: Correlation


[Figure: correlograms (ACF against lag) of the raw and absolute values for the three datasets in the previous figure (DAX, simulated normal, simulated t).]


c 2006 (Embrechts, Frey, McNeil) 179

Towards Models for Financial Time Series


We seek theoretical stochastic process models that can mimic these stylized facts. In particular, we require models that generate volatility clustering, since most of the other observations flow from this. Econometricians have proposed a number of useful models, including the ARCH/GARCH class. We examine to what extent EVT can be combined with such models to obtain estimates of conditional risk measures. We also consider the class of self-exciting point process models for extremes in financial data. These models have their origins in seismology.

c 2006 (Embrechts, Frey, McNeil)

180

C2. GARCH Models


Let (Z_t)_{t∈Z} follow a strict white noise process (iid) with mean zero and variance one. ARCH and GARCH processes (X_t)_{t∈Z} take the general form

X_t = σ_t Z_t,   t ∈ Z,   (21)

where σ_t, the volatility, is a function of the history up to time t − 1, represented by F_{t−1}, and Z_t is assumed independent of F_{t−1}. Mathematically, σ_t is F_{t−1}-measurable, where F_{t−1} is the filtration generated by (X_s)_{s≤t−1}, and therefore var(X_t | F_{t−1}) = σ_t². Volatility is the conditional standard deviation of the process.
c 2006 (Embrechts, Frey, McNeil)

181

ARCH and GARCH Processes


(X_t) follows an ARCH(p) process if, for all t,

σ_t² = α_0 + Σ_{j=1}^p α_j X_{t−j}²,   with α_j > 0.

Intuition: volatility is influenced by large observations in the recent past.

(X_t) follows a GARCH(p,q) process (generalised ARCH) if, for all t,

σ_t² = α_0 + Σ_{j=1}^p α_j X_{t−j}² + Σ_{k=1}^q β_k σ_{t−k}²,   with α_j, β_k > 0.   (22)

Intuition: more persistence is built into the volatility.


c 2006 (Embrechts, Frey, McNeil) 182
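A base-R sketch simulating a GARCH(1,1) path with Gaussian innovations (the parameter values are our own illustrative assumptions); paths of this kind reproduce the volatility clustering seen in the stylized facts:

## Simulate a GARCH(1,1) process (base-R sketch; parameters are assumptions).
sim_garch11 <- function(n, a0 = 1e-5, a1 = 0.1, b1 = 0.85) {
  X <- numeric(n)
  sig2 <- a0 / (1 - a1 - b1)              # start at the stationary variance
  for (t in 1:n) {
    X[t] <- sqrt(sig2) * rnorm(1)
    sig2 <- a0 + a1 * X[t]^2 + b1 * sig2  # recursion (22) with p = q = 1
  }
  X
}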

Stationarity and Autocorrelations


The condition for the GARCH equations to define a covariance stationary process with finite variance is that

Σ_{j=1}^p α_j + Σ_{k=1}^q β_k < 1.

ARCH and GARCH are technically uncorrelated white noise processes, since the autocovariance function is given by

γ(h) := cov(X_t, X_{t+h}) = E(σ_{t+h} Z_{t+h} σ_t Z_t) = E(Z_{t+h}) E(σ_{t+h} σ_t Z_t) = 0,   h > 0.

c 2006 (Embrechts, Frey, McNeil)

183

Absolute and Squared GARCH Processes


Although (X_t) is an uncorrelated process, it can be shown that the processes (X_t²) and (|X_t|) possess profound serial dependence. In fact (X_t²) can be shown to have a kind of ARMA-like structure: a GARCH(1,1) model is like an ARMA(1,1) model for (X_t²).

c 2006 (Embrechts, Frey, McNeil)

184

Hybrid ARMA/GARCH Processes


Although changes in volatility are the most obvious feature of financial return series, there is sometimes some evidence of serial correlation at small lags. This can be modelled by

X_t = μ_t + ε_t,   ε_t = σ_t Z_t,

where μ_t follows an ARMA specification, σ_t follows a GARCH specification, and (Z_t) is (0,1) strict white noise. μ_t and σ_t are respectively the conditional mean and standard deviation of X_t given the history to time t − 1; they satisfy E(X_t | F_{t−1}) = μ_t and var(X_t | F_{t−1}) = σ_t².
c 2006 (Embrechts, Frey, McNeil) 185

A Simple Eective Model: AR(1)+GARCH(1,1)


For our purposes the following model will suffice:

μ_t = φ X_{t−1},
σ_t² = α_0 + α_1 (X_{t−1} − μ_{t−1})² + β σ_{t−1}²,

with α_0, α_1, β > 0, α_1 + β < 1 and |φ| < 1 for a stationary model with finite variance. This model is a reasonable fit for many daily financial return series, particularly under the assumption that the driving innovations are heavier-tailed than normal.

c 2006 (Embrechts, Frey, McNeil)

186

C3. EVT in a Time Series Framework


We assume (negative) returns follow a stationary time series of the form X_t = μ_t + σ_t Z_t. The dynamics of the conditional mean μ_t and conditional volatility σ_t are given by an AR(1)-GARCH(1,1) model:

μ_t = φ X_{t−1},
σ_t² = α_0 + α_1 (X_{t−1} − μ_{t−1})² + β σ_{t−1}²,

with α_0, α_1, β > 0, α_1 + β < 1 and |φ| < 1. We assume (Z_t) is strict white noise with E(Z_t) = 0 and var(Z_t) = 1, but leave the exact innovation distribution unspecified. Other GARCH-type models could be used if desired.
c 2006 (Embrechts, Frey, McNeil) 187

Dynamic EVT
Given a data sample X_{t−n+1}, . . . , X_t we adopt a two-stage estimation procedure. (Typically we take n = 1000.) We forecast μ_{t+1} and σ_{t+1} by fitting an AR-GARCH model with unspecified innovation distribution by quasi-maximum-likelihood (QML) and calculating 1-step predictions. (QML yields consistent, asymptotically normal estimates of the GARCH parameters.) We consider the model residuals to be iid realizations from the innovation distribution and estimate the tails of this distribution using EVT (GPD-fitting). In particular, estimates of quantiles q_α(Z) and expected shortfalls ES_α(Z) for the distribution of (Z_t) can be determined.
c 2006 (Embrechts, Frey, McNeil) 188
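A base-R sketch of the second stage (the first stage, the QML AR-GARCH fit producing standardized residuals z and one-step forecasts mu1 and sig1, is assumed already done, e.g. in S-Plus); gpd_nll is the GPD negative log-likelihood sketched earlier:

## Dynamic EVT, stage two: GPD tail fit on residuals, then conditional VaR (sketch).
cond_VaR <- function(z, mu1, sig1, alpha = 0.99, u = quantile(z, 0.9)) {
  y <- z[z > u] - u                              # residual excesses over u
  fit <- optim(c(0.1, mean(y)), gpd_nll, y = y)
  xi <- fit$par[1]; beta <- fit$par[2]
  zq <- u + (beta / xi) * ((length(z) * (1 - alpha) / length(y))^(-xi) - 1)
  mu1 + sig1 * zq                                # quantile of F_[X_{t+1}|F_t]
}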

Risk Measures
Recall that we must distinguish between risk measures based on tails of conditional and unconditional distributions of the loss, in this case the negative return. We are interested in the former and thus calculate risk measures based on the conditional distribution F_{[X_{t+1}|F_t]}. For a one-step time horizon, risk measure estimates are easily computed from estimates of q_α(Z) and ES_α(Z) and predictions of μ_{t+1} and σ_{t+1}, using

VaR_α^t = μ̂_{t+1} + σ̂_{t+1} q̂_α(Z),
ES_α^t = μ̂_{t+1} + σ̂_{t+1} ÊS_α(Z).

c 2006 (Embrechts, Frey, McNeil)

189

Example with S&P Data

[Figure: 1000-day excerpt from the series of negative log returns on the Standard & Poor's index containing the crash of 1987, together with the estimated conditional standard deviation.]
c 2006 (Embrechts, Frey, McNeil) 190

Prewhitening with GARCH


[Figure: correlograms of the data and absolute data (top) and of the GARCH residuals and absolute residuals (bottom); after prewhitening the serial dependence largely disappears.]

c 2006 (Embrechts, Frey, McNeil)

191

Heavy-Tailedness Remains
[Figure: QQ-plot of the residuals against the quantiles of the standard normal; raw data from S&P. The residuals remain heavy-tailed.]

c 2006 (Embrechts, Frey, McNeil)

192

Comparison with Standard Conditional Distributions


[Figure: estimated tails of the innovation distribution for losses and gains, comparing the GPD fit with normal and t models; 1 − F(x) against x on log scales.]

c 2006 (Embrechts, Frey, McNeil)

193

Dynamic EVT: 95% and 99% VaR Predictions


[Figure: DAX returns, losses (+ve) and profits (−ve), 01.01.98 to 01.01.00, with dynamic 95% and 99% VaR predictions.]

c 2006 (Embrechts, Frey, McNeil)

194

Backtesting: Violation Counts


                        S&P          DAX
Length of Test         7414         5146

0.95 Quantile
Expected                371          257
Conditional EVT         366 (0.41)   258 (0.49)
Conditional Normal      384 (0.25)   238 (0.11)
Conditional t           404 (0.04)   253 (0.41)
Unconditional EVT       402 (0.05)   266 (0.30)

0.99 Quantile
Expected                 74           51
Conditional EVT          73 (0.48)    55 (0.33)
Conditional Normal      104 (0.00)    74 (0.00)
Conditional t            78 (0.34)    61 (0.11)
Unconditional EVT        86 (0.10)    59 (0.16)

(Binomial test p-values in parentheses.)

Remark: The performance of ES estimates is even more sensitive to the suitability of the model in the tail region.
c 2006 (Embrechts, Frey, McNeil) 195

C4. Self-Exciting Processes


In these models we assume a self-exciting structure for threshold exceedances. Let H_t denote the history of exceedances up to but not including time t, that is {(T_j, X̃_j) : 0 < T_j < t}. The conditional intensity of exceedances of the threshold u is

λ(t) = lim_{δ→0+} δ^{−1} P{exceedance in [t, t + δ) | H_t}.

We parameterize this conditional intensity by assuming

λ(t) = τ + ψ Σ_{j: 0<T_j<t} h(t − T_j; X̃_j − u),   τ, ψ > 0,

where h(s, x) is positive-valued, decreasing in its first argument and increasing in its second.


c 2006 (Embrechts, Frey, McNeil) 196

General forms for h(s, x)


M1: h(s, x) = e^{−γs} e^{δx},           γ, δ > 0,
M2: h(s, x) = (s + γ)^{−(ρ+1)} e^{δx},  γ, ρ, δ > 0.

Under this formulation the increase in intensity depends not only on the time since an event but also on the size of the past event.

[Figure: sample path of the conditional intensity λ(t): the intensity jumps at the exceedance times T_0, . . . , T_4 by amounts depending on the mark sizes and decays towards the baseline between events.]

c 2006 (Embrechts, Frey, McNeil)

197

Hawkes and Marked Hawkes Processes


Processes of this kind are known as (temporal) Hawkes processes. They have been applied to earthquakes and their aftershocks (where they have spatio-temporal extensions). [Hawkes, 1971] [Ogata, 1988] [Daley and Vere-Jones, 2003] If we combine the Hawkes process for occurrence times with an iid GPD assumption for the excess returns above the threshold u, we get a so-called marked Hawkes process with unpredictable marks. Ideally, however, we would also like a model where the size of the marks could depend on the current level of excitement.

c 2006 (Embrechts, Frey, McNeil)

198

Statistical Inference
The model can be fitted easily by ML. The exceedance data are {(T_j, X̃_j) : j = 1, . . . , N_u} and the parameters (in M1) are θ = (τ, ψ, γ, δ). The likelihood is

L(θ; data) = exp(−∫_0^n λ(t) dt) Π_{j=1}^{N_u} λ(T_j).

c 2006 (Embrechts, Frey, McNeil)

199
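A base-R sketch of this likelihood under specification M1 (the exponential decay allows the integrated intensity to be written in closed form; the data layout, event times T in (0, n] and excesses x, is our assumption):

## Negative log-likelihood of the M1 Hawkes model (base-R sketch).
hawkes_nll <- function(theta, T, x, n) {
  tau <- theta[1]; psi <- theta[2]; gam <- theta[3]; del <- theta[4]
  if (any(theta <= 0)) return(Inf)
  lam <- sapply(seq_along(T), function(j) {   # lambda(T_j) from earlier events
    past <- T < T[j]
    tau + psi * sum(exp(-gam * (T[j] - T[past]) + del * x[past]))
  })
  Lam <- tau * n +                            # integral of lambda over (0, n]
    (psi / gam) * sum(exp(del * x) * (1 - exp(-gam * (n - T))))
  Lam - sum(log(lam))
}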

Hawkes Model in Action

[Figure: the 200 most extreme negative log returns over time, together with the fitted Hawkes intensity.]


c 2006 (Embrechts, Frey, McNeil) 200

S-PLUS Analysis
$threshold: [1] 0.01506956
$theta:
       tau        psi      gamma    delta
0.03154302 0.01605022 0.02622928 12.25498
$theta.ses:
[1] 0.01132330 0.00687112 0.01090378 25.61115098
$ll.poisson: [1] -668.1688
$ll.max: [1] -648.215


c 2006 (Embrechts, Frey, McNeil) 201

III: Operational Risk


A. Operational Risk and Insurance Analytics

c 2006 (Embrechts, Frey, McNeil)

202

A. Operational Risk and Insurance Analytics


1. A New Risk Class 2. Insurance Analytics Toolkit 3. The Capital Charge Problem 4. Marginal VaR Estimation 5. Global VaR Estimation

c 2006 (Embrechts, Frey, McNeil)

203

A1. A New Risk Class


The New Accord (Basel II) 1988: Basel Accord (Basel I): minimal capital requirements against credit risk, one standardised approach, Cooke ratio 1996: Amendment to Basel I: market risk, internal models, netting 1999: Several Consultative Papers on the New Accord (Basel II) to date: CP3: Third Consultative Paper on the New Basel Capital Accord (www.bis.org/bcbs/) 2007+: full implementation of Basel II

c 2006 (Embrechts, Frey, McNeil)

204

Basel II: What is new?


Rationale for the New Accord: more flexibility and risk sensitivity. Structure of the New Accord, a three-pillar framework: Pillar 1: minimal capital requirements (risk measurement). Pillar 2: supervisory review of capital adequacy. Pillar 3: public disclosure.

c 2006 (Embrechts, Frey, McNeil)

205

Basel II: What is new? (contd)


Two options for the measurement of credit risk: the standard approach and the internal ratings based approach (IRB). Pillar 1 sets out the minimum capital requirements (Cooke ratio, McDonough ratio):

total amount of capital ≥ 8% of risk-weighted assets,

MRC (minimum regulatory capital) := 8% of risk-weighted assets.

Explicit treatment of operational risk.

c 2006 (Embrechts, Frey, McNeil) 206

Operational Risk
Definition: The risk of losses resulting from inadequate or failed internal processes, people and systems, or external events.

Remark: This definition includes legal risk, but excludes strategic and reputational risk. Note: Solvency 2.
c 2006 (Embrechts, Frey, McNeil) 207

Operational Risk (contd)


Notation: C_OP: capital charge for operational risk. Target: C_OP ≈ 12% of MRC (down from an initial 20%). Estimated total losses in the US (2001): $50b. Some examples: 1977: Credit Suisse, Chiasso affair. 1995: Nick Leeson/Barings Bank, 1.3bn. 2001: September 11. 2001: Enron (largest US bankruptcy so far). 2002: Allied Irish, 450m.
c 2006 (Embrechts, Frey, McNeil) 208

Risk Measurement Methods for Operational Risk


Pillar 1 regulatory minimal capital requirements for operational risk: Three distinct approaches: 1. Basic Indicator Approach 2. Standardised Approach 3. Advanced Measurement Approach (AMA)

c 2006 (Embrechts, Frey, McNeil)

209

Basic Indicator Approach


Capital charge:

C_OP^BIA = α · GI,

where C_OP^BIA is the capital charge under the Basic Indicator Approach, GI is the average annual gross income over the previous three years, and α = 15% (set by the Committee based on CISs).

c 2006 (Embrechts, Frey, McNeil)

210

Standardised Approach
Similar to the BIA, but on the level of each business line:

C_OP^SA = Σ_{i=1}^8 β_i GI_i,

with β_i ∈ [12%, 18%], i = 1, 2, . . . , 8, and 3-year averaging. The 8 business lines:

Corporate finance (18%)      Commercial banking (15%)
Trading & sales (18%)        Payment & settlement (18%)
Retail banking (12%)         Agency services (15%)
Asset management (12%)       Retail brokerage (12%)


c 2006 (Embrechts, Frey, McNeil) 211

Advanced Measurement Approach (AMA)


Allows banks to use their internally generated risk estimates. Preconditions: the bank must meet qualitative and quantitative standards before being allowed to use the AMA. Risk mitigation via insurance is possible (≤ 20% of C_OP^SA). Incorporation of risk diversification benefits is allowed. Given the continuing evolution of analytical approaches for operational risk, the Committee is not specifying the approach or distributional assumptions used to generate the operational risk measures for regulatory capital purposes. Example: the loss distribution approach.
c 2006 (Embrechts, Frey, McNeil) 212

Internal Measurement Approach


Capital charge (similar to the Basel II model for credit risk):

C_OP^IMA = Σ_{i=1}^8 Σ_{k=1}^7 γ_{ik} e_{ik}   (first attempt),

where e_{ik} is the expected loss for business line i, risk type k, and γ_{ik} is a scaling factor. The 7 loss types:

Internal fraud
External fraud
Employment practices and workplace safety
Clients, products & business practices
Damage to physical assets
Business disruption and system failures
Execution, delivery & process management
c 2006 (Embrechts, Frey, McNeil) 213

Loss Distribution Approach (LDA)


[Diagram: the business line / risk type matrix. Rows BL_1, . . . , BL_8 (business lines) and columns RT_1, . . . , RT_7 (risk types); each cell (i, k) contains the loss variable L_{i,k}^{T+1}, illustrated by a loss time series from 1992 to 2002.]
c 2006 (Embrechts, Frey, McNeil) 214

LDA: continued
For each business line/loss type cell (i, k) one models L_{i,k}^{T+1}, the OP risk loss for business line i, loss type k, over the future (one-year, say) period [T, T + 1]:

L_{i,k}^{T+1} = Σ_{ℓ=1}^{N_{i,k}^{T+1}} X_{i,k}^ℓ   (next period's loss for cell (i, k)).

Note that X_{i,k}^ℓ is truncated from below.

c 2006 (Embrechts, Frey, McNeil)

215

LDA: continued
Remark: Look at the structure of the loss random variable L^{T+1}:

L^{T+1} = Σ_{i=1}^8 Σ_{k=1}^7 L_{i,k}^{T+1}   (next period's total loss)
        = Σ_{i=1}^8 Σ_{k=1}^7 Σ_{ℓ=1}^{N_{i,k}^{T+1}} X_{i,k}^ℓ
        = Σ_{i=1}^8 L_i^{T+1}   (often used decomposition).

Check again the overall complexity of the (BL, RT) matrix.


c 2006 (Embrechts, Frey, McNeil) 216

A2. Insurance Analytics: an Essential Toolkit


Total Loss Amount. Denote by N(t) the (random) number of losses over a fixed period [0, t] and write X_1, X_2, . . . for the individual losses. The aggregate loss is

S_{N(t)} = Σ_{k=1}^{N(t)} X_k.

Remarks: F_{S_{N(t)}}(x) = P(S_{N(t)} ≤ x) is called the total loss df. If t is fixed, we write S_N and F_{S_N} instead. The random variable S_{N(t)} is also referred to as a random sum.
c 2006 (Embrechts, Frey, McNeil) 217

Compound Sums
Assume: 1. (X_k) are iid with common df G, G(0) = 0. 2. N and (X_k) are independent. S_N is then referred to as a compound sum. The pdf of N is denoted by p_N(k) = P(N = k), k = 0, 1, . . ., and N is called a compounding rv.

Proposition 1: Let S_N be a compound sum and let the above assumptions hold. Then

F_{S_N}(x) = Σ_{k=0}^∞ p_N(k) G^{k*}(x),   x ≥ 0,

where G^{k*} denotes the k-fold convolution of G.

c 2006 (Embrechts, Frey, McNeil) 218

Proposition 2: Let S_N be a compound sum and let the above assumptions hold. Then the Laplace–Stieltjes transform of S_N satisfies

F̂_{S_N}(s) = ∫_0^∞ e^{−sx} dF_{S_N}(x) = Σ_{k=0}^∞ p_N(k) (Ĝ(s))^k = M_N(Ĝ(s)),   s ≥ 0,

where M_N(s) = E(s^N) denotes the generating function of N.

Proposition 3: Let S_N be a compound sum and let the above assumptions hold. If E(N²) < ∞ and E(X_1²) < ∞, we have that

E(S_N) = E(N) E(X_1),
var(S_N) = var(N)(E(X_1))² + E(N) var(X_1).
c 2006 (Embrechts, Frey, McNeil) 219

Compound Poisson Distribution


Example 1: Consider N ~ Poi(λ). Then S_N is referred to as a compound Poisson rv. The generating function of N satisfies M_N(s) = exp(−λ(1 − s)), and hence

F̂_{S_N}(s) = exp(−λ(1 − Ĝ(s))).

Notation: S_N ~ CPoi(λ, G). If E(X_1²) < ∞, the moments of S_N are, by Proposition 3,

E(S_N) = λ E(X_1)   and   var(S_N) = λ E(X_1²).

c 2006 (Embrechts, Frey, McNeil)
220
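A base-R sketch that simulates CPoi(λ, G) and checks the moment formulas empirically (the Exp(1) severity is our illustrative choice, so E(X_1) = 1 and E(X_1²) = 2):

## Simulate compound Poisson losses and check mean and variance (base-R sketch).
r_cpois <- function(m, lambda, rsev = rexp) {
  N <- rpois(m, lambda)
  sapply(N, function(k) sum(rsev(k)))   # sum of k severities; 0 when k = 0
}
S <- r_cpois(1e5, 100)
c(mean(S), var(S))   # approx. lambda*E(X1) = 100 and lambda*E(X1^2) = 200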

Aggregation of Compound Poisson rvs


Suppose that the compound sums S_{N_i} ~ CPoi(λ_i, G_i), i = 1, . . . , d, and that these rvs are independent. Then

S_N := Σ_{i=1}^d S_{N_i} = Σ_{i=1}^d Σ_{k=1}^{N_i} X_{i,k}

is again a compound Poisson rv, S_N ~ CPoi(λ, G), where

λ = Σ_{i=1}^d λ_i   and   G = Σ_{i=1}^d (λ_i/λ) G_i.

G is hence a mixture distribution. A simulation from G can be done in two steps: first draw i ∈ {1, . . . , d} with probability λ_i/λ, and then draw a loss with df G_i.
c 2006 (Embrechts, Frey, McNeil) 221

Binomial Loss Model


Example 2: Suppose N ~ Bin(n, p). S_N is then called the (individual risk) binomial model. Consider a time interval [0, 1] and let N_n denote the total number of losses in [0, 1] for a fixed n. Suppose further that we have a number of potential loss generators that can produce, with probability p_n, a loss in each small subinterval ((k − 1)/n, k/n], k = 1, . . . , n. Moreover, the occurrence of a loss in any particular subinterval is not influenced by the occurrence of losses in other intervals, and np_n → λ for a λ > 0 as n → ∞. For a fixed severity distribution, S_{N_n} is then a binomial model with N_n ~ Bin(n, p_n), and it converges in law to a compound Poisson rv with parameter λ as n → ∞.
c 2006 (Embrechts, Frey, McNeil) 222

Over-dispersion
For compound Poisson rvs with N ~ Poi(λ) we have that

E(N) = var(N) = λ.

Count data, however, often exhibit over-dispersion, meaning that they indicate

E(N) < var(N).

This can be achieved by mixing, i.e. by randomizing the parameter λ.

c 2006 (Embrechts, Frey, McNeil)

223

Randomization
Other examples of mixing include: from Black–Scholes to stochastic volatility models (randomize σ); from N_d(μ, Σ) to elliptical distributions (randomize Σ), e.g. the multivariate t distribution, while randomization of both μ and Σ leads to generalized hyperbolic distributions; mixing models for credit risk (randomizing the default probability); credibility theory in insurance (randomizing the underlying risk parameter); Bayesian inference; ...
c 2006 (Embrechts, Frey, McNeil) 224

Poisson Mixtures
Definition: Let Λ be a positive rv with distribution function F_Λ. A rv N given by

P(N = k) = ∫_0^∞ P(N = k | Λ = λ) dF_Λ(λ) = ∫_0^∞ e^{−λ} (λ^k/k!) dF_Λ(λ)

is called a mixed Poisson rv with structure or mixing distribution F_Λ. A compound sum with a mixed Poisson rv as the compounding rv is called a compound mixed Poisson rv.

Lemma: Suppose that N is mixed Poisson with structure df F_Λ. Then

E(N) = E(Λ)   and   var(N) = E(Λ) + var(Λ),

i.e. for Λ non-degenerate, N is over-dispersed.

c 2006 (Embrechts, Frey, McNeil)

225

Negative Binomial Distribution


Example 3: For Λ ~ Ga(α, β), the mixed Poisson rv is negative binomial, N ~ NB(α, β/(β + 1)):

P(N = k) = (Γ(α + k)/(Γ(α) k!)) (β/(β + 1))^α (1/(β + 1))^k.

Further,

E(N) = α/β = E(Λ),
var(N) = α/β + α/β² = E(Λ) + var(Λ).

Compounding leads to compound mixed Poisson rvs. Many more interesting models exist.

c 2006 (Embrechts, Frey, McNeil) 226

Approximations
The distribution of S_N is generally intractable. Normal approximation for CPoi(λ, G):

F_{S_N}(x) ≈ Φ((x − E(N)E(X_1)) / √(var(N)(E(X_1))² + E(N) var(X_1))).

However, the skewness of S_N is positive:

E(S_N − E(S_N))³ / (var(S_N))^{3/2} = λ E(X_1³) / (var(S_N))^{3/2} > 0.

Translated-gamma approximation for CPoi(λ, G): approximate S_N by k + Y, where k is a translation parameter and Y ~ Ga(α, β). The parameters k, α, β are found by matching the mean, variance and skewness.
c 2006 (Embrechts, Frey, McNeil) 227
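A base-R sketch of the moment matching (for the gamma distribution the skewness is 2/√α, which gives the closed forms below); the severity moments m1, m2, m3 are inputs, e.g. 1, 2, 6 for Exp(1):

## Translated-gamma approximation of CPoi(lambda, G) by moment matching (sketch).
tgamma_approx <- function(lambda, m1, m2, m3) {
  mu <- lambda * m1; v <- lambda * m2       # mean and variance of S_N
  skew <- lambda * m3 / v^1.5               # skewness of S_N
  alpha <- 4 / skew^2                       # gamma shape: skewness = 2/sqrt(alpha)
  beta <- sqrt(alpha / v)                   # gamma rate: variance = alpha/beta^2
  k <- mu - alpha / beta                    # translation matches the mean
  function(x) pgamma(x - k, shape = alpha, rate = beta)
}
F_approx <- tgamma_approx(100, 1, 2, 6)     # Exp(1) severities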

Example: Light-tailed Severities

Simulated CPoi(100, Exp(1)) data together with normal and translated gamma approximations. The 99.9% quantile estimates are also given.
c 2006 (Embrechts, Frey, McNeil) 228

Example: Heavy-tailed Severities

Simulated CPoi(100, Pa(4, 1)) data together with normal and translated gamma approximations. GPD approximation based on the POT method is also performed.
c 2006 (Embrechts, Frey, McNeil) 229

Panjer Class
Recursive method for approximating S_N in the case where the severity distribution G is discrete and N satisfies a specific condition.

Panjer class: the probability mass distribution of N belongs to the Panjer(a, b) class for some a, b ∈ R if

p_N(k) = (a + b/k) p_N(k − 1)   for k ≥ 1.

The only nondegenerate examples of distributions belonging to a Panjer(a, b) class are:

binomial Bin(n, p), with a = −p/(1 − p) and b = (n + 1)p/(1 − p);
Poisson Poi(λ), with a = 0 and b = λ;
negative binomial NB(α, p), with a = 1 − p and b = (α − 1)(1 − p).
c 2006 (Embrechts, Frey, McNeil) 230

Panjer Recursion
For a discrete severity rv X_1 we denote g_i := P(X_1 = i) and s_i := P(S_N = i).

Theorem: Suppose N satisfies the Panjer(a, b) class condition and g_0 = 0. Then s_0 = p_N(0) and, for k ≥ 1,

s_k = Σ_{i=1}^k (a + bi/k) g_i s_{k−i}.

For continuous severity distributions, discretization is necessary. A correction for g_0 > 0 is possible. Estimation of s_k far in the tail is more tricky.
c 2006 (Embrechts, Frey, McNeil) 231
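A base-R sketch of the recursion for a severity supported on {1, 2, . . .} (the usage line is a compound Poisson example with a = 0, b = λ, so s_0 = p_N(0) = e^{−λ}):

## Panjer recursion (base-R sketch); g[i] = P(X1 = i), s[k+1] = P(S_N = k).
panjer <- function(a, b, p0, g, kmax) {
  s <- numeric(kmax + 1); s[1] <- p0
  for (k in 1:kmax) {
    i <- 1:min(k, length(g))
    s[k + 1] <- sum((a + b * i / k) * g[i] * s[k - i + 1])
  }
  s
}
s <- panjer(a = 0, b = 5, p0 = exp(-5), g = c(0.5, 0.5), kmax = 50)
sum(s)   # close to 1 once kmax is large enough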

Example

Simulated CPoi(100, LN(1, 1)) data together with the Panjer recursion approximation. Normal, translated gamma and GPD approximations are also performed.
c 2006 (Embrechts, Frey, McNeil) 232

Further Topics
S_N can be looked upon as a process in time, S_{N(t)}. Instead of N we then have the process {N(t), t ≥ 0} counting the number of events in [0, t]. Interesting examples for N(t) are: the homogeneous Poisson process; the non-homogeneous Poisson process; the Cox or doubly stochastic Poisson process. Of further interest are the surplus process

C_t = u + ct − S_{N(t)}

and the corresponding ruin probability

ψ(u, T) = P_u{inf_{0≤t≤T} C_t < 0}.

Rare event simulation.


c 2006 (Embrechts, Frey, McNeil) 233

Homogeneous Poisson Process

Ten realizations of a homogeneous Poisson process with λ = 100.

c 2006 (Embrechts, Frey, McNeil)

234

Mixed Poisson Process

Ten realizations of a mixed Poisson process with Λ ~ Ga(100, 1).

c 2006 (Embrechts, Frey, McNeil)

235

Ruin Probability
Recall the ruin results for

ψ(u) := ψ(u, ∞) = P_u{inf_{0≤t<∞} C_t < 0}.

Cramér–Lundberg (small claims): there is an R > 0 with

ψ(u) < e^{−Ru},  u > 0,   and   ψ(u) ~ C_1 e^{−Ru},  u → ∞.

Embrechts–Veraverbeke (large claims with severity df G):

ψ(u) ~ C_2 ∫_u^∞ Ḡ(t) dt,   u → ∞.

c 2006 (Embrechts, Frey, McNeil)
236

Subexponential Distributions
Definition: For X_1, . . . , X_n positive iid random variables with common distribution function F_X, denote S_n = Σ_{k=1}^n X_k and M_n = max(X_1, . . . , X_n). The distribution function F_X is called subexponential (denoted by F_X ∈ S) if, for some (and then for all) n ≥ 2,

lim_{x→∞} P(S_n > x)/P(M_n > x) = 1.

Examples: Pareto, generalized Pareto, lognormal, loggamma, ...
c 2006 (Embrechts, Frey, McNeil) 237

Ruin Process with Exponential Claims

20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) ~ HPois(100t) and X_1 ~ Exp(1).
c 2006 (Embrechts, Frey, McNeil) 238

Ruin Process with Pareto Claims

20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) ~ HPois(100t) and X_1 ~ Pareto(2, 1).
c 2006 (Embrechts, Frey, McNeil) 239

Ruin Process with Exponential Claims

20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) a doubly stochastic Poisson process with a two-state Markov intensity process: HPois(10t) and HPois(100t) with mean holding times 5 and 0.2, respectively, and X_1 ~ Exp(1).
c 2006 (Embrechts, Frey, McNeil) 240

Ruin Process with Pareto Claims

20 simulations from the ruin process C_t, 0 ≤ t ≤ 1, with (N(t)) a doubly stochastic Poisson process with a two-state Markov intensity process: HPois(10t) and HPois(100t) with mean holding times 5 and 0.2, respectively, and X_1 ~ Pareto(2, 1).
c 2006 (Embrechts, Frey, McNeil) 241

A3. The Capital Charge Problem within LDA

c 2006 (Embrechts, Frey, McNeil)

242

Loss Distribution Approach (contd)


Choose: a period T; a distribution of L_{i,k}^{T+1} for each cell (i, k); the interdependence between cells; a confidence level α ∈ (0, 1), α ≈ 1; a risk measure g_α.

Capital charge for each cell: C_{i,k}^{T+1,OR} = g_α(L_{i,k}^{T+1}). Total OR loss: C^{T+1,OR}, based on the C_{i,k}^{T+1,OR}.

c 2006 (Embrechts, Frey, McNeil)

243

Basel II proposal
Period: one year. Distribution: should be based on internal data/models, external data, expert opinion. Confidence level: α = 99.9%; for economic capital purposes even α = 99.95% or α = 99.97%. Risk measure: VaR. Total capital charge:

C^{T+1,OR} = Σ_{i,k} VaR_α(L_{i,k}^{T+1}),

with a possible reduction due to correlation effects.

c 2006 (Embrechts, Frey, McNeil)

244

Basel II Proposal: Summary


Marginal VaR calculations: VaR^1_α, . . . , VaR^l_α. Global VaR estimate: VaR^+_α = VaR^1_α + · · · + VaR^l_α. Reduction because of correlation effects: VaR_α < VaR^+_α. Further possibilities: insurance, pooling, ...
c 2006 (Embrechts, Frey, McNeil) 245

Coherence and VaR


VaR is in general not coherent, because of: 1. skewness; 2. special dependence; 3. very heavy-tailed losses. VaR is coherent for elliptical distributions.

c 2006 (Embrechts, Frey, McNeil)

246

Skewness
100 iid loans: 2% coupon, 100 face value, 1% default probability (period: 1 year). The loss on loan i is

X_i = −2 with probability 99%,
X_i = 100 with probability 1%.

Two portfolios: L_1 = Σ_{i=1}^{100} X_i and L_2 = 100 X_1 (!). Then

VaR_{95%}(L_1) > VaR_{95%}(100 X_1),

i.e.

VaR_{95%}(Σ_{i=1}^{100} X_i) > Σ_{i=1}^{100} VaR_{95%}(X_i).

c 2006 (Embrechts, Frey, McNeil)

247
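The superadditivity here can be checked exactly with a few lines of base R (the number of defaults in L_1 is Binomial(100, 0.01), and a quantile-based VaR is used):

## Exact check of the loan example (base-R sketch).
p <- dbinom(0:100, 100, 0.01)                   # P(k defaults)
loss <- 102 * (0:100) - 200                     # loss with k defaults
VaR_L1 <- loss[min(which(cumsum(p) >= 0.95))]   # 95% quantile of L1: 106
VaR_L2 <- 100 * (-2)                            # 95% quantile of 100*X1: -200
c(VaR_L1, VaR_L2)                               # 106 > -200: superadditive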

Special Dependence
Given rvs X_1, . . . , X_n with marginal dfs F_1, . . . , F_n, one can always find a copula C so that, for the joint model F(x_1, . . . , x_n) = C(F_1(x_1), . . . , F_n(x_n)), VaR is superadditive:

VaR_α(Σ_{k=1}^n X_k) > Σ_{k=1}^n VaR_α(X_k).

In particular, take the nice case F_1 = · · · = F_n = N(0, 1).


c 2006 (Embrechts, Frey, McNeil) 248

Special Dependence

c 2006 (Embrechts, Frey, McNeil)

249

Very Heavy-tailedness
Pareto: take X_1, X_2 independent with P(X_i > x) = x^{−1/2}, x ≥ 1. Then, for x > 2,

P(X_1 + X_2 > x) = 2√(x − 1)/x > P(2X_1 > x),

so that

VaR_α(X_1 + X_2) > VaR_α(2X_1) = VaR_α(X_1) + VaR_α(X_2).

Pareto-type: a similar result holds for X_1, X_2 independent with P(X_i > x) = x^{−1/ξ} L(x), where ξ > 1 and L is slowly varying. For ξ < 1, we obtain subadditivity. WHY?
c 2006 (Embrechts, Frey, McNeil) 250

Several reasons: the (Marcinkiewicz–Zygmund) strong law of large numbers; an argument based on stable distributions. The main reason, however, comes from functional analysis: in the spaces L^p, 0 < p < 1, there exist no convex open sets other than the empty set and L^p itself. Hence, as a consequence, 0 is the only continuous linear functional on L^p; this is in violent contrast to L^p, p ≥ 1. Discussion: no reasonable risk measures exist; diversification goes the wrong way.
c 2006 (Embrechts, Frey, McNeil) 251

Definition: An R^d-valued random vector X is said to be regularly varying if there exist a sequence (a_n), 0 < a_n ↑ ∞, and a Radon measure μ ≠ 0 on B(R̄^d \ {0}) with μ(R̄^d \ R^d) = 0, so that, as n → ∞,

n P(a_n^{−1} X ∈ ·) → μ(·)   vaguely on B(R̄^d \ {0}).

Note that: (a_n) is regularly varying, and μ satisfies the scaling property μ(uB) = u^{−1/ξ} μ(B) for B ∈ B(R̄^d \ {0}).

Theorem (several versions; Samorodnitsky): If (X_1, X_2) ~ RV_{1/ξ}, ξ < 1, then for α sufficiently close to 1, VaR_α is subadditive.
c 2006 (Embrechts, Frey, McNeil) 252

Phase Transition of Value-at-Risk


Theorem: Assume that X = (X_1, X_2) is a random vector such that X_i ~ F for all i, where F is continuous and F̄ ∈ RV_{−α}, α > 0, and X has an Archimedean copula with generator which is regularly varying at 0 with index β < 0. Then there exists a constant q_2(α, β) such that

VaR_γ(X_1 + X_2) ~ (q_2(α, β))^{1/α} VaR_γ(X_1),   γ → 1.

[Figure: behaviour of q_2(α, β) with respect to α and β, respectively.]
c 2006 (Embrechts, Frey, McNeil) 253

A4. Marginal VaR Estimation


LDA revisited. Recall: VaR_{i,k} is the Value-at-Risk of a compound sum,

L_{i,k}^{T+1} = Σ_{l=1}^{N_{i,k}^{T+1}} X_{i,k}^l.

Tasks: a suitable model for the severity X_{i,k}; a suitable model for the frequency N_{i,k}^{T+1}; estimation of VaR_{i,k}.
c 2006 (Embrechts, Frey, McNeil) 254

Some OpRisk Data


[Figure: operational loss series, 1992 to 2002, for loss types 1, 2 and 3, and the pooled operational losses.]

c 2006 (Embrechts, Frey, McNeil)

255

[Figure: mean excess plot of the pooled operational losses; the increasing plot suggests a heavy tail, P(L > x) ≈ x^{−1/ξ} L(x).]


c 2006 (Embrechts, Frey, McNeil) 256

Stylized Facts
Stylized facts about OpRisk losses: loss amounts show extremes; loss occurrence times are irregularly spaced in time (reporting bias, economic cycles, regulation, management interactions, structural changes, . . . ); non-stationarity (frequency(!), severity(?)). Large losses are of main concern. Repetitive versus non-repetitive losses. However: severity is of key importance.

c 2006 (Embrechts, Frey, McNeil)

257

Peaks-over-Threshold Method

[Figure: schematic of the POT method, showing the distribution of the exceedances and the distribution of the inter-arrival times.]


c 2006 (Embrechts, Frey, McNeil) 258

Peaks-over-Threshold Method (POT)


X_1, . . . , X_n iid with distribution function F satisfying

F̄(x) = x^{−1/ξ} L(x),   ξ > 0 and L slowly varying.

Excess distribution: asymptotically generalized Pareto (GPD),

P(X − u > y | X > u) ≈ (1 + ξy/β(u))^{−1/ξ}.

POT-MLE estimation of tail probabilities and risk measures:

F̄̂(x) = (N_u/n)(1 + ξ̂(x − u)/β̂)^{−1/ξ̂},   x > u,

VaR̂_α = u + (β̂/ξ̂)(((n/N_u)(1 − α))^{−ξ̂} − 1),   α close to 1.

c 2006 (Embrechts, Frey, McNeil) 259

Threshold Choice
Application of POT-based estimates requires the choice of an appropriate threshold u. Rates of convergence are very tricky: there is no generally valid convergence rate; the convergence rate depends on F, in particular on the slowly varying function L, in a complicated way and may be very slow; and L is not visible from the data directly. Threshold choice is therefore very difficult, and a trade-off between bias and variance usually takes place. Diagnostic tools: graphical tools (mean excess plot, shape plot, ...); bootstrap and other methods requiring extra conditions on L.
c 2006 (Embrechts, Frey, McNeil) 260

Basel II QIS 2002 Data

POT analysis of severities: P(L_i > x) = x^{−1/ξ_i} L_i(x)

  Business line           ξ_i
  Corporate finance       1.19 (*)
  Trading & sales         1.17
  Retail banking          1.01
  Commercial banking      1.39 (*)
  Payment & settlement    1.23
  Agency services         1.22 (*)
  Asset management        0.85
  Retail brokerage        0.98

  (*) means significant at the 95% level

ξ_i > 1: infinite mean. Remark: different picture at the level of individual banks.

Issues Regarding Infinite Mean Models

Reason for ξ > 1? Potentially:
- wrong analysis
- EVT conditions not fulfilled
- contamination, mixtures

We concentrate on the latter. Two examples:
- Contamination above a high threshold
- Mixture models

Main aim: show through examples how certain data structures can lead to infinite mean models.

Contamination Above a High Threshold

Example 1: Consider the model

  F̄_X(x) = (1 + ξ1 x/β1)^{−1/ξ1}           if x ≤ v,
  F̄_X(x) = (1 + ξ2 (x − v*)/β2)^{−1/ξ2}    if x > v,

where 0 < ξ1 < ξ2 and β1, β2 > 0; v* is a constant depending on the model parameters in such a way that F_X is continuous.

VaR can be calculated explicitly:

  VaR_α(X) = (β1/ξ1) [ (1 − α)^{−ξ1} − 1 ]       if α ≤ F_X(v),
  VaR_α(X) = v* + (β2/ξ2) [ (1 − α)^{−ξ2} − 1 ]  if α > F_X(v).

Shape Plots

[Figure: shape plots (estimated ξ against threshold) for data from the contamination model. Left: easy case, v low. Right: hard case, v high.]

Finite Mean Case

[Figure: shape plot for the finite mean case.]

Careful: a similar picture arises for v low and ξ1 < ξ2 < 1.

Contamination above a high threshold (cont'd)

Easy case: v low
- Change of behavior typically visible in the mean excess plot

Hard case: v high
- Typically only few observations above v
- Mean excess plot may not reveal anything
- Classical POT analysis easily yields incorrect results
- Vast overestimation of VaR possible

Mixture Models

Example 2: Consider F_X = (1 − p) F1 + p F2, with Fi exact Pareto, i.e. F_i(x) = 1 − x^{−1/ξ_i} for x ≥ 1, and 0 < ξ1 < ξ2.

- Asymptotically, the tail index of F_X is ξ2.
- VaR can be obtained numerically; it does not correspond to the VaR of a Pareto distribution with tail index ξ2, but rather to the VaR corresponding to F2 at a level lower than α.
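A quick numerical illustration (a sketch, not the original computation): solve F̄_X(x) = 1 − α for the mixture by root finding, with p = 0.1, ξ1 = 0.7, ξ2 = 1.6 as in the table two slides below.

import numpy as np
from scipy.optimize import brentq

p, xi1, xi2 = 0.1, 0.7, 1.6

def sf_mix(x):
    # Survival function of the mixture (1-p)*F1 + p*F2, Fi exact Pareto
    return (1 - p) * x ** (-1 / xi1) + p * x ** (-1 / xi2)

for alpha in (0.9, 0.99, 0.999, 0.99999):
    var_mix = brentq(lambda x: sf_mix(x) - (1 - alpha), 1.0, 1e12)
    var_pareto2 = (1 - alpha) ** (-xi2)   # VaR of exact Pareto with tail index xi2
    print(alpha, var_mix, var_pareto2)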

Classical POT analysis can be very misleading:

[Figure: shape plot for data from the mixture model: estimated ξ with 95% confidence intervals against threshold (0.936 up to 13.5) and number of exceedances (500 down to 48), together with the corresponding mean excess plot.]

VaR for Mixture Models

  α                                        0.9    0.95    0.99     0.999    0.9999     0.99999
  VaR_α(F_X)                               6.39   12.06   71.48    2222.77  ~10^5      4.64·10^6
  VaR_α(Pareto(ξ2))                        46.42  147.36  2154.43  10^5     4.64·10^6  2.15·10^8
  implied ξ (from VaR_α = (1 − α)^{−ξ})    0.8    0.83    0.93     1.12     1.27       1.33

Value-at-Risk for mixture models with p = 0.1, ξ1 = 0.7 and ξ2 = 1.6. Note that the first row reproduces the second row shifted by one level: VaR_α(F_X) corresponds to the VaR of the ξ2-Pareto tail at a lower level.

Including Frequencies

The POT method can be embedded into a wider framework based on point processes.
- iid case: exceedance times asymptotically follow a homogeneous Poisson process
- Extensions: several possibilities
  - Including the severities: marked Poisson process
  - Non-stationarity: non-homogeneous Poisson processes
  - Over-dispersion: doubly stochastic processes
  - Short-range dependence: clustering

VaR Estimation for Compound Sums

Proposition: Let X, X1, X2, . . . be iid with F_X ∈ S (subexponential). If for some ε > 0, Σ_{n=1}^∞ (1 + ε)^n P(N = n) < ∞ (satisfied for instance in the important binomial, Poisson and negative binomial cases), then

  lim_{x→∞} P(Σ_{i=1}^N X_i > x) / F̄_X(x) = E(N).

Approximation of VaR:

  VaR_α(Σ_{i=1}^N X_i) ≈ VaR_α̃(X),  α̃ = 1 − (1 − α)/E(N),  α close to 1.

Marginal VaR Estimation

Approximative method:
1. Estimate the excess distribution of the severities using POT
2. Calculate VaR_α(L^{T+1}_{i,k}) via the closed-form approximation

  VaR_α ≈ u + (β/ξ) [ ( n(1 − α) / (N_u E(N^{T+1}_i)) )^{−ξ} − 1 ]

Monte Carlo methods (see the sketch below):
1. Choose a distribution for the severities and a process for the frequencies (possibly jointly)
2. After a large number of simulations, estimate the VaR of the compound sum via the POT-MLE method
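A minimal Monte Carlo sketch of the second method (assumed Poisson frequency and GPD severities; illustrative parameter values only):

import numpy as np
from scipy.stats import genpareto, poisson

rng = np.random.default_rng(2)
lam, xi, beta = 25.0, 0.6, 1.0       # assumed frequency and GPD severity parameters
n_sim, alpha = 100_000, 0.999

# Simulate n_sim realizations of the compound sum L = sum_{l=1}^N X_l
counts = poisson.rvs(lam, size=n_sim, random_state=rng)
losses = np.array([genpareto.rvs(xi, scale=beta, size=n, random_state=rng).sum()
                   for n in counts])

# Empirical VaR of the compound sum (POT could also be refitted to `losses`)
print(np.quantile(losses, alpha))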

POT-MLE VaR Estimate

[Figure: VaR_{99.9%} of a GPD(ξ, 1) rv as a function of ξ ∈ (0, 1.5).]

Small changes in ξ lead to considerable changes in VaR.

Marginal VaR Estimate: Issues

- VaR is an exponential function of ξ and therefore very sensitive to ξ
- Confidence intervals for VaR widen rapidly with increasing ξ and decreasing sample size
- Conclusion 1: for very high levels (99.9% or 99.97%) there is typically substantial uncertainty and variability in the VaR estimate due to the lack of data
- Conclusion 2: issues far in the tail call for judgement
- ξ > 1 is an issue!

A5. Global VaR Estimation

Recall:
- Global VaR estimate: VaR⁺ = VaR_1 + · · · + VaR_l
- Reduction because of correlation effects: VaR < VaR⁺
- In general, VaR⁺ is not the upper bound!

Bounds on VaR

Find optimal bounds VaR_l and VaR_u with

  VaR_l ≤ VaR_α(Σ_{k=1}^d L^{T+1}_k) ≤ VaR_u,

given the marginal VaRs and dependence information.

Solution:
- Fréchet problem
- Mass transportation problem

Example 1: Comonotonic Case

Recall: L^{T+1}_1, . . . , L^{T+1}_d are comonotonic if there exist a rv Z and increasing functions f^{T+1}_1, . . . , f^{T+1}_d such that

  L^{T+1}_i = f^{T+1}_i(Z),  i = 1, . . . , d.

If L^{T+1}_1, . . . , L^{T+1}_d are comonotonic, VaR is additive:

  VaR_α(Σ_{k=1}^d L^{T+1}_k) = Σ_{k=1}^d VaR_α(L^{T+1}_k).

Example 2: No Dependence Information

Take L^{T+1}_i = L_i, i = 1, . . . , d = 8, and F_{L_1} = · · · = F_{L_d} = Pareto(1, 1.5).

Comonotonic case:

  VaR_{0.999}(Σ_{i=1}^8 L_i) = Σ_{i=1}^8 VaR_{0.999}(L_i) = 0.79

Unconstrained upper bound:

  VaR_{0.999}(Σ_{i=1}^8 L_i) ≤ 1.93

Example 3: No Dependence Information

The case of unequal marginals F_{L_1}, . . . , F_{L_d} is more difficult.

[Figure: bounds on VaR using the OpRisk portfolio given in Moscadelli (2004).]

Correlation

- Correlation (linear, rank, tail) is a one-number summary: ρ, ρ_S, λ, . . .
- Careful: linear correlation does not exist for ξ > 0.5
- Linear correlation is typically small for heavy-tailed rvs
- Knowledge of correlation (linear, rank, tail, ...) is sufficient for individual models, but totally insufficient in general

Correlation

[Figure: upper and lower bound on the linear correlation ρ(L1, L2) for L1 ∼ Pareto(2.5) (left) and L1 ∼ Pareto(2.05) (right), with L2 ∼ Pareto(θ).]

Copulas

Copula: with L^{T+1}_i ∼ F_i, the joint distribution can be written as

  P(L^{T+1}_1 ≤ l_1, . . . , L^{T+1}_d ≤ l_d) = C(F_1(l_1), . . . , F_d(l_d)).

The function C is known as a copula; it is a joint distribution on [0, 1]^d with uniform marginals. A copula and marginal distributions determine the joint model completely. However: there are not enough OpRisk data; one year of loss data corresponds to a single observation of (L^{T+1}_1, . . . , L^{T+1}_d).

Dynamic Dependence Models

In order to use the data at hand, we need a dynamic model for the compound processes

  L_k = Σ_{i=1}^{N_k(T)} X_{i,k},  k = 1, 2.

Consider d = 2:
1. make the severities (X_{i,1}) and (X_{i,2}) dependent, i ≥ 1
2. make the frequency processes {N_1(t) : t ≤ T} and {N_2(t) : t ≤ T} dependent
3. a combination of both

Dependent Counting Processes

Various models for dependent counting processes {N_1(t) : t ≤ T} and {N_2(t) : t ≤ T} exist:
- Common shock models
- Point process models
- Mixed Poisson processes with dependent mixing rvs
- Lévy copulas

So far, there is no general dependence concept:
- It is not clear how to quantify dependence between processes
- It is less transparent how the dependence structure of the frequency processes affects the dependence structure of the compound rvs

One Loss Causes Ruin Problem

Question: how do the marginal severities affect the global loss?
- Based on the Lorenz curve in economics
- 20-80 rule for 1/ξ = 1.4
- 0.1-95 rule for 1/ξ = 1.01

Proposition 1: For L_1, . . . , L_d iid and subexponential we have, for L = L_1 + · · · + L_d, that

  P(L > x) ∼ d P(L_1 > x),  x → ∞.

Proposition 2: Suppose in addition that L_i = Σ_{k=1}^{N_i} X_{i,k}, where the N_i ∼ Poi(λ_i) are independent. Furthermore, for i = 1, . . . , d, the X_{i,k} are iid with

  P(X_i > x) = x^{−1/ξ_i} h_i(x),  h_i slowly varying.

If ξ_1 > · · · > ξ_d, we have that

  P(L > x) ∼ c P(X_1 > x).

Discussion

Tail issues:
- robust statistics
- scaling
- mixtures

Infinite mean:
- industry occasionally uses constrained estimation with ξ < 1
- estimate under the condition of a finite upper limit

Discussion (cont'd)

Aggregation issues:
- adding risk measures across a 7 × 8 table
- reduction because of correlation effects

Data issues:
- impact of pooling
- incorporation of external data and expert opinion
- credibility theory

References

[1] Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer.
[2] McNeil, A.J., Frey, R., and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.
[3] Moscadelli, M. (2004). The Modelling of Operational Risk: Experience with the Analysis of the Data Collected by the Basel Committee. Banca d'Italia, report 517, July 2004.
[4] Nešlehová, J., Embrechts, P., and Chavez-Demoulin, V. (2006). Infinite mean models and the LDA for operational risk. Journal of Operational Risk, 1(1), 3-25.

IV: Credit Risk Management

A. Introduction to Credit Risk
B. Mixture Models of Credit Risk
C. Monte Carlo Techniques
D. Statistical Inference for Credit Models

A. Introduction to Credit Risk

1. The Nature of the Challenge
2. The Merton Model
3. The KMV Model
4. Models Based on Credit Migration

A1. Credit Risk: Nature of the Challenge

Definition. Credit risk is the risk that unexpected changes in the credit quality of counterparties or issuers of securities in a given portfolio affect the value of that portfolio. It includes in particular losses due to default and downgrading of counterparties.

Measuring and managing credit risk is of high importance for financial institutions:
- Lending is a key business for most banks. It creates a substantial amount of credit risk, which can be reduced only partially by trading in secondary markets.
- Most OTC transactions involve significant credit risk.
- Regulatory developments such as Basel II.

Applications of Credit Risk Models

a) Credit risk management
- Computation of the loss distribution and associated risk measures (such as VaR and ES) for portfolios of defaultable bonds and loans
- Determination of risk capital (economic capital or regulatory capital) and risk contributions

b) Pricing of credit-risky securities
- Corporate bonds, swaps and vulnerable securities (e.g. options whose writer may default)
- Single-name credit derivatives such as credit default swaps
- Portfolio-related products such as collateralized debt obligations (CDOs) or basket credit derivatives (e.g. i-th-to-default swaps)

Specific Issues in Credit Risk Management

The management of credit risk poses similar problems to the management of market risk. In addition there are a number of specific challenges, which are less relevant in market risk:
- Lack of public information on credit quality, which might lead to adverse selection problems.
- Scarcity of reliable data: most relevant data are private; the risk measurement horizon is typically at least one year.
- Loss distributions are typically strongly skewed with a long upper tail, leading to frequent small gains and occasional large losses.
- Modelling of dependence is even more important than in market risk management, as the tail of the loss distribution is influenced strongly by the specification of the dependence between defaults.

Dependence between defaults is the key issue in credit risk management:
- In large balanced loan portfolios the main risk is the occurrence of many joint defaults; this might be termed extreme credit risk.
- Dependence between defaults critically affects the performance of many basket credit derivatives.

Sources of dependence between defaults:
- Dependence caused by common factors (e.g. interest rates and changes in economic growth) affecting all obligors.
- Default of company A may have a direct impact on the default probability of company B, and vice versa, because of direct business relations; a phenomenon known as counterparty risk or contagion.

Empirical Evidence I

[Figure.] Moody's annual default rates (defaulted companies/overall number of rated companies) and changes in economic growth from 1920 to 1999; changes in economic growth clearly affect default rates.

Empirical Evidence II

[Figure: empirical default probabilities by rating class (CCC, B, BB, BBB, A), 1980-2000.]

Standard and Poor's default data from 1980 to 2000 show clear evidence of cycles; we expect within-year and between-year dependence.

Impact of Dependence on the Loss Distribution

[Figure: probability mass functions of the number of losses, dependent vs independent defaults.]

Distribution of the number of defaults for a homogeneous portfolio of 1000 BB loans with default probability 1%; a Bernoulli mixture model with default correlation 0.22% is compared with the independent default model.

Overview

In our treatment of credit risk management we cover the following:
- Structural or firm-value models for credit risk, in particular Merton's model, KMV, CreditMetrics. Focus on determining individual default probabilities.
- Static models for credit portfolios with a focus on dependence modelling.
- Approaches to model simulation and calibration.

A2. Merton's Model

Merton's model [Merton, 1974] is the prototype of all firm-value models; it is an influential benchmark even today.

Modelling of default: Consider a firm with stochastic asset value (V_t), financing itself by equity (i.e. by issuing shares) and debt. Denote by S_t and F_t the value at time t ≤ T of equity and debt, so that V_t = S_t + F_t, 0 ≤ t ≤ T. Assume that:
- Debt consists of a single zero coupon bond with face value F and maturity T.
- Default occurs if the firm misses a payment to its debt holders, and hence only in T.

Modelling of Default continued

In T we have two possible cases:
i) V_T ≥ F. The debtholders receive F; shareholders receive the residual value S_T = V_T − F, and there is no default.
ii) V_T < F. The firm cannot meet its financial obligations, and shareholders hand over control to the bondholders, who liquidate the firm; hence we have F_T = V_T, S_T = 0.

In summary we obtain

  S_T = (V_T − F)^+ and F_T = min(V_T, F) = F − (F − V_T)^+.   (23)

Remark. In Merton's model equity is a call option on V with strike F. The option interpretation of equity explains certain conflicts of interest between shareholders and bondholders.

Dynamics of (V_t) and Default Probability

It is assumed that the asset value (V_t) follows a diffusion of the form dV_t = μ_V V_t dt + σ_V V_t dW_t for constants μ_V ∈ ℝ, σ_V > 0, and a Brownian motion (W_t)_{t≥0}, so that

  V_T = V_0 exp( (μ_V − σ_V²/2) T + σ_V W_T );

in particular ln V_T ∼ N( ln V_0 + (μ_V − σ_V²/2) T, σ_V² T ).

Under this assumption the default probability of our firm is readily computed. We have

  P(V_T < F) = P(ln V_T < ln F) = Φ( ( ln(F/V_0) − (μ_V − σ_V²/2) T ) / (σ_V √T) ).   (24)

In line with economic intuition this is increasing in F and σ_V and decreasing in V_0 and μ_V.
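A small numerical sketch of formulas (24)-(27) (illustrative parameter values, not from the original slides):

import numpy as np
from scipy.stats import norm

def merton(V0, F, mu_V, sigma_V, r, T):
    """Default probability, equity/debt values and credit spread in Merton's model."""
    # Real-world default probability, eq. (24)
    pd = norm.cdf((np.log(F / V0) - (mu_V - 0.5 * sigma_V**2) * T) / (sigma_V * np.sqrt(T)))
    # Black-Scholes call on V with strike F, eq. (25)
    d1 = (np.log(V0 / F) + (r + 0.5 * sigma_V**2) * T) / (sigma_V * np.sqrt(T))
    d2 = d1 - sigma_V * np.sqrt(T)
    equity = V0 * norm.cdf(d1) - F * np.exp(-r * T) * norm.cdf(d2)
    debt = V0 - equity                     # V_t = S_t + F_t
    # Credit spread, eq. (27): c = -(1/T) ln(p1/p0) with p1 = F_0/F
    p0 = np.exp(-r * T)
    spread = -np.log(debt / (F * p0)) / T
    return pd, equity, debt, spread

print(merton(V0=100.0, F=60.0, mu_V=0.08, sigma_V=0.25, r=0.03, T=2.0))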

Pricing of Equity and Debt

In Merton's model it is assumed that the asset value (V_t) is a traded security, and that the riskless interest rate equals some constant r ≥ 0. Under this assumption we can price equity and debt using the Black-Scholes formula.

Pricing of equity. Recall that equity is just a call option on the asset value V_t. Hence the Black-Scholes formula yields

  S_t = C^{BS}(t, V_t; σ_V, r, T, F) := V_t Φ(d_{t,1}) − F e^{−r(T−t)} Φ(d_{t,2}),   (25)

  d_{t,1} = ( ln(V_t/F) + (r + σ_V²/2)(T − t) ) / (σ_V √(T − t)),  d_{t,2} = d_{t,1} − σ_V √(T − t).

Pricing of Debt

We now consider the value of the firm's debt, or equivalently of a zero coupon bond issued by the firm. Recall from (23) that the value of the firm's debt equals the difference of default-free debt and a put option on V with strike F, i.e.

  F_t = F p_0(t, T) − P^{BS}(t, V_t; r, σ_V, F, T),

where p_0(t, T) = exp(−r(T − t)) is the price of a default-free zero-coupon bond. The Black-Scholes formula for European puts now yields

  F_t = p_0(t, T) F Φ(d_{t,2}) + V_t Φ(−d_{t,1}).   (26)

Credit Spreads

Recall that the credit spread measures the difference of the (continuously compounded) yields of a default-free zero coupon bond p_0(t, T) and a defaultable zero coupon bond p_1(t, T), i.e.

  c(t, T) = −(1/(T − t)) ( ln p_1(t, T) − ln p_0(t, T) ) = −(1/(T − t)) ln( p_1(t, T)/p_0(t, T) ).

In Merton's model we have p_1(t, T) = F_t/F and hence

  c(t, T) = −(1/(T − t)) ln( Φ(d_{t,2}) + (V_t / (F p_0(t, T))) Φ(−d_{t,1}) ).   (27)

Note that c(t, T) depends only on σ_V and on the ratio F p_0(t, T)/V_t (a measure of the indebtedness of the firm). In line with economic intuition it is increasing in both quantities.

Credit Spreads in Merton's Model

[Figure: credit spread c(t, T) (%) as a function of σ_V (top) and of time to maturity T − t (bottom), for fixed debt-to-firm-value ratio d = 0.6. In the upper picture T − t = 2; in the lower picture σ_V = 0.25.]

Note that for T − t < 0.25 (3 months) the credit spread implied by Merton's model is approximately zero.

A3. The KMV Model

The KMV model, developed in the 1990s, is a popular extension of Merton's model, which is now maintained by Moody's KMV.

Contributions of KMV: extends Merton's model to give better empirical performance; implementation using a proprietary database.

Key concept: EDF (expected default frequency). This is simply the one-year default probability of a given firm. In Merton's model we get from (24)

  EDF^{Merton} = Φ( −( ln V_0 − ln F + (μ_V − σ_V²/2) ) / σ_V ).   (28)

In the KMV model the EDF has a similar structure, but Φ is replaced by an empirically estimated function, and V_0 and σ_V are determined from equity data using Merton's model.

The KMV Model continued

Determination of V_0 and σ_V. The market value V_0 of a firm's assets is not fully observable:
- Market value and book value can differ widely.
- Only equity, and at most a part of the liabilities, is traded.

Therefore KMV backs out V_0 from the observable value of a firm's equity using the Merton model, by inverting the pricing formula (25). In an iterative procedure the asset volatility σ_V (which differs in general from the equity volatility) is determined from equity data.

Remark. KMV uses a slightly more sophisticated pricing model for equity; typically F is taken as the sum of short-term debt and half of long-term debt.

Calculation of EDFs

Distance to default. Relation (28) for EDF^{Merton} might be too simplistic. Therefore KMV defines in an intermediary step the distance to default (DD) by

  DD := (V_0 − F̃) / (σ_V V_0),   (29)

where F̃ represents the default threshold (typically liabilities payable within one year). (29) is an approximation of the argument of (28), since μ_V and σ_V² are small and ln V_0 − ln F̃ ≈ (V_0 − F̃)/V_0.

Calculation of EDFs. KMV uses DD as a state variable and assumes that firms with equal DD have equal EDFs. The functional relation between EDF and DD is not postulated, but estimated empirically using the proprietary default database.

Applying the KMV Approach: an Example

  Variable                 Value      Notes
  Market value of equity   $110,688   Share price × shares outstanding.
  Book liabilities         $64,062    From balance sheet.
  Market value of assets   $170,558   From option-pricing model.
  Asset volatility         0.21
  Default point            $47,499    Liabilities payable within 1 year.
  Distance-to-default      3.5        Given by the ratio (170,558 − 47,499)/(0.21 × 170,558).
  EDF (one year)           0.25%      Determined using the empirical mapping between DD and EDF.

The example is concerned with the situation of Phillip Morris Inc. at the end of April 2001. Financial quantities in million USD. Example provided by KMV.
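The distance-to-default arithmetic from the table, as a one-line check (Python):

V0, F_tilde, sigma_V = 170_558.0, 47_499.0, 0.21
dd = (V0 - F_tilde) / (sigma_V * V0)    # eq. (29)
print(round(dd, 2))                     # approximately 3.4, reported as 3.5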

A4. Models Based on Credit Migration

Overview. In the credit migration approach each firm is given a particular credit rating measuring its credit quality; moreover, the probability of moving from one rating to another (including default) is specified.
- The credit rating serves as state variable, i.e. firms with equal rating have equal transition probabilities.
- Credit ratings for major companies or sovereigns and rating transition matrices are provided by rating agencies such as Moody's or Standard & Poor's (S&P).
- The standard industry model in this class is CreditMetrics, developed by J.P. Morgan and the RiskMetrics Group.

Transition Probabilities: an Example from S&P

Rating at year-end (%):

  Initial  AAA    AA     A      BBB    BB     B      CCC    Default
  AAA      90.81  8.33   0.68   0.06   0.12   0.00   0.00   0.00
  AA       0.70   90.65  7.79   0.64   0.06   0.14   0.02   0.00
  A        0.09   2.27   91.05  5.52   0.74   0.26   0.01   0.06
  BBB      0.02   0.33   5.95   86.93  5.30   1.17   0.12   0.18
  BB       0.03   0.14   0.67   7.73   80.53  8.84   1.00   1.06
  B        0.00   0.11   0.24   0.43   6.48   83.46  4.07   5.20
  CCC      0.22   0.00   0.22   1.30   2.38   11.24  64.86  19.79

Empirical probabilities of migrating from one rating to another within 1 year. Source: Standard & Poor's CreditWeek. For example, the 1-year default probability of a B-rated firm is 5.2%.

Cumulative Default Probabilities according to S&P

  Term   1      2      3      4      5      7      10
  AAA    0.00   0.00   0.07   0.15   0.24   0.66   1.40
  AA     0.00   0.02   0.12   0.25   0.43   0.89   1.29
  A      0.06   0.16   0.27   0.44   0.67   1.12   2.17
  BBB    0.18   0.44   0.72   1.27   1.78   2.99   4.34
  BB     1.06   3.48   6.12   8.68   10.97  14.46  17.73
  B      5.20   11.00  15.95  19.40  21.88  25.14  29.02
  CCC    19.79  26.92  31.63  35.97  40.15  42.64  45.10

Average cumulative default rates (%). Source: Standard & Poor's CreditWeek. To be consistent, these rates should be approximately equal to the default rates obtained from the n-fold product of the one-year transition matrix.
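The consistency check mentioned above can be carried out directly (a sketch in Python; the matrix is the one-year matrix from the previous slide, with default treated as an absorbing state):

import numpy as np

# One-year transition matrix (%), rows: AAA, AA, A, BBB, BB, B, CCC, Default
P = np.array([
    [90.81, 8.33, 0.68, 0.06, 0.12, 0.00, 0.00, 0.00],
    [0.70, 90.65, 7.79, 0.64, 0.06, 0.14, 0.02, 0.00],
    [0.09, 2.27, 91.05, 5.52, 0.74, 0.26, 0.01, 0.06],
    [0.02, 0.33, 5.95, 86.93, 5.30, 1.17, 0.12, 0.18],
    [0.03, 0.14, 0.67, 7.73, 80.53, 8.84, 1.00, 1.06],
    [0.00, 0.11, 0.24, 0.43, 6.48, 83.46, 4.07, 5.20],
    [0.22, 0.00, 0.22, 1.30, 2.38, 11.24, 64.86, 19.79],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 100.0],   # default is absorbing
]) / 100.0

for n in (1, 2, 5, 10):
    Pn = np.linalg.matrix_power(P, n)
    print(n, np.round(100 * Pn[:-1, -1], 2))   # n-year cumulative default probabilities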

Credit Migrations and KMV Approach Compared

Advantages of the KMV approach:
- The KMV EDF reacts quickly to changes in the economic prospects of a firm, whereas agencies are often slow to adjust ratings.
- EDFs tend to reflect the current macroeconomic environment and tend to be better predictors of defaults over short time horizons.

Advantages of the credit migration approach:
- The KMV approach is sensitive to over- and underreactions in equity markets. If the KMV model is widely followed, this might have destabilizing effects on the real economy.
- As rating agencies focus on average credit quality through the business cycle, risk capital requirements based on rating transitions fluctuate less, helping to provide liquidity in credit markets.

Credit Migration Models as Firm-Value Models

Given n + 1 rating classes j ∈ {0, 1, . . . , n} of increasing quality (0 is default), suppose that for a given firm the probability of being in class j at year-end is given by probabilities q_j, 0 ≤ j ≤ n. Suppose that the asset value V_T is lognormal. Choose thresholds

  −∞ = d_0 < d_1 < · · · < d_n < d_{n+1} = ∞ such that P(d_j < V_T ≤ d_{j+1}) = q_j for all j.

Definition. The firm belongs to rating class j ∈ {0, . . . , n} at T if and only if d_j < V_T ≤ d_{j+1}.

Remarks. 1) The transition probabilities of our firm-value model are invariant under simultaneous increasing transformations of V and the thresholds, so that we may work with the normally distributed X = ln V. 2) Portfolio versions of KMV/CreditMetrics are similar.

References

[Merton, 1974] (on Merton's model)
[Crosbie and Bohn, 2002] (KMV's default model)
[Kealhofer and Bohn, 2001] (KMV's portfolio model)
[RiskMetrics-Group, 1997] (CreditMetrics manual)
[Crouhy et al., 2000] (good comparison of different models)
[Crouhy et al., 2001] (book by the same authors)

B. Mixture Models of Portfolio Credit Risk

1. Bernoulli Mixture Models
2. One-Factor Mixture Models
3. KMV/CreditMetrics as a Mixture Model
4. Large Portfolio Behaviour of Mixture Models

Notation for One-Period Portfolio Models

Consider a portfolio of m firms and a time horizon T (typically 1 year).
- For i ∈ {1, . . . , m} define Y_i to be the default indicator of company i, i.e. Y_i = 1 if the company defaults by time T, Y_i = 0 otherwise.
- The loss given default is denoted by δ_i e_i, where e_i is the exposure and δ_i ∈ [0, 1] represents the percentage loss given default.
- The portfolio loss equals L := Σ_{i=1}^m e_i δ_i Y_i.

Simplifications:
- We consider only a two-state model (default/no default). All of the ideas generalize to models with more states representing rating classes; the notation is simpler in the two-state setting.
- We mostly neglect the modelling of exposures.

B1. Bernoulli Mixture Models

These provide a way of capturing the dependence among Bernoulli events (i.e. defaults/non-defaults) occurring in a fixed time period.

Definition (Bernoulli mixture model). Given some p < m and a p-dimensional random vector Ψ = (Ψ_1, . . . , Ψ_p)′, the default indicator vector Y follows a Bernoulli mixture model with factor vector Ψ if there are functions p_i : ℝ^p → (0, 1), 1 ≤ i ≤ m, such that, conditional on Ψ, the components of Y are independent Bernoulli rvs with

  P(Y_i = 1 | Ψ = ψ) = p_i(ψ).

Distribution of Defaults

For y = (y_1, . . . , y_m)′ in {0, 1}^m we get

  P(Y = y | Ψ = ψ) = Π_{i=1}^m p_i(ψ)^{y_i} (1 − p_i(ψ))^{1−y_i},

and the unconditional distribution is given by

  f(y) = P(Y = y) = ∫_{ℝ^p} Π_{i=1}^m p_i(ψ)^{y_i} (1 − p_i(ψ))^{1−y_i} g(ψ) dψ,

where g(ψ) is the probability density of the factors. By adding exposures and assumptions about losses given default we complete the specification of a one-period model.

Default Correlation

Definition. The default or event correlation of firms i and j is given by corr(Y_i, Y_j).

Denote by p̄_i = P(Y_i = 1) = E(Y_i) the unconditional default probabilities. We have cov(Y_i, Y_j) = E(Y_i Y_j) − p̄_i p̄_j and var(Y_i) = p̄_i (1 − p̄_i). Hence

  corr(Y_i, Y_j) = ( E(Y_i Y_j) − p̄_i p̄_j ) / √( p̄_i (1 − p̄_i) p̄_j (1 − p̄_j) ),

so that the default correlation can be computed from the joint default probability E(Y_i Y_j) = P(Y_i = 1, Y_j = 1). Note that in mixture models p̄_i = ∫_{ℝ^p} p_i(ψ) g(ψ) dψ.

CreditRisk+

CreditRisk+ may be represented as a Bernoulli mixture model. The distribution of the default indicators is given by p_i(Ψ) = 1 − exp(−w_i′Ψ). Here Ψ = (Ψ_1, . . . , Ψ_p)′ is a vector of independent gamma-distributed macroeconomic factors and w_i = (w_{i,1}, . . . , w_{i,p})′ is a vector of positive factor weights.

Remark: CreditRisk+ is usually presented as a Poisson mixture where, conditional on Ψ, the default of counterparty i occurs independently of other counterparties with Poisson intensity λ_i(Ψ) = w_i′Ψ. This leads to the above default probabilities. The model assumptions mean that the distribution of the number of defaults is a sum of independent negative binomials.

B2. One-Factor Mixture Models

Often it is useful to work in a one-factor model (p = 1):
- Fitting to default data is relatively easy
- The behaviour of large portfolios is easy to understand

In the exchangeable special case the conditional default probabilities p_i(Ψ) are identical for all i, making Y exchangeable. Define the rv Q := p_1(Ψ) with df G(q):

  π := P(Y_i = 1) = E(Y_i) = E(E(Y_i | Q)) = E(Q),

  π_k := P(Y_{i_1} = 1, . . . , Y_{i_k} = 1) = E(Q^k) = ∫_0^1 q^k dG(q).

Unconditional default probabilities and higher-order joint default probabilities are moments of the mixing distribution.

Exchangeable Bernoulli Mixtures

Default correlations: it follows that, for i ≠ j, cov(Y_i, Y_j) = π_2 − π² = var(Q) ≥ 0. Hence the default correlation is given by

  ρ_Y := corr(Y_i, Y_j) = (π_2 − π²) / (π − π²).

Examples of mixing distributions:
- Beta: Q ∼ Beta(a, b), g(q) = β(a, b)^{−1} q^{a−1} (1 − q)^{b−1}, a, b > 0. Corresponds to one-factor CreditRisk+ [Frey and McNeil, 2002].
- Probit-normal: Q = Φ(μ + σΨ), Ψ ∼ N(0, 1) (CreditMetrics/KMV)
- Logit-normal: Q = (1 + exp(−(μ + σΨ)))^{−1}, Ψ ∼ N(0, 1) (CreditPortfolioView)

The Clayton Copula Model

A further model is obtained by taking a gamma-distributed factor Ψ ∼ Ga(1/θ, 1) and setting

  Q = p_1(Ψ) = exp( −Ψ (π^{−θ} − 1) ).

This gives joint default probabilities π_k = (k π^{−θ} − k + 1)^{−1/θ} for k = 1, 2, . . . . It is known as the Clayton copula model because it can be understood as an asset value model where the multivariate asset value changes in the time interval [0, T] have a Clayton copula. See Section 8.4.4 and Example 8.22 in MFE.

Parameterizing Mixing Distributions

In these two-parameter examples, if we fix the default probability π and the default correlation ρ_Y (or π_2), we fully calibrate the model. For instance, in the exchangeable Beta-Bernoulli mixture model we have

  π = a/(a + b) and π_2 = π (a + 1)/(a + b + 1).

[Figure: Beta density g(q) of the mixing variable Q in the exchangeable Bernoulli mixture model with π = 0.005 and ρ_Y = 0.0018.]
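A sketch of this calibration (assuming the Beta mixture; it uses the identity ρ_Y = 1/(a + b + 1), which follows from the moment formulas above):

import numpy as np

def beta_params(pi, rho_Y):
    """Beta mixing distribution matching default probability pi and
    default correlation rho_Y, via rho_Y = 1/(a+b+1)."""
    s = 1.0 / rho_Y - 1.0           # a + b
    return pi * s, (1.0 - pi) * s   # a, b

a, b = beta_params(0.005, 0.0018)

# Simulate one year of an exchangeable portfolio of m = 1000 obligors
rng = np.random.default_rng(3)
q = rng.beta(a, b)                  # realization of the mixing variable Q
defaults = rng.binomial(1, q, size=1000).sum()
print(a, b, defaults)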

Comparison of Exchangeable Models (π and π_2 fixed)

The tail of the distribution of Q determines the tail of the loss distribution of the portfolio.

[Figure: log-log plot of P(Q > q) for the probit-normal, beta, logit-normal and Clayton mixing distributions.]

A horizontal line at 0.01 shows that the models only diverge around the 99th percentile of Q. For given π and π_2 there is little model risk.

One-Factor Model with k Homogeneous Groups

Exchangeability may be too restrictive, for instance if we want to model a portfolio with obligors belonging to different rating classes. Hence more general one-factor models are needed.

Example (homogeneous-group model): We have k groups and r(i) ∈ {1, . . . , k} gives the group membership of firm i. We model default probabilities by

  p_i(Ψ) = h(μ_{r(i)} + σΨ),   (30)

for σ > 0 and a real-valued rv Ψ. If h is increasing, the conditional default probabilities are comonotonic. Possible choices: h = Φ, Ψ ∼ N(0, 1).

Remark: This model is akin to a GLMM (generalized linear mixed model) and can be fitted by ML.

B3. KMV/CreditMetrics as a Mixture Model

These industry models belong to the class of structural or firm-value models descending from Merton's influential credit risk model [Merton, 1974].

We assume that default occurs for counterparty i if a critical variable X_i (often interpreted as asset value) lies below a critical threshold d_i (often interpreted as liabilities) at the end of the time period of interest. We assume X = (X_1, . . . , X_m)′ satisfies:
- X has a multivariate normal distribution.
- X_i ∼ N(0, 1) for all i (since we can standardize X_i and d_i without altering the default probability).
- X_i follows a standard linear factor model.

Factor Model for Critical Variables

  X_i = z_i′Ψ + ε_i,

where Ψ ∼ N_p(0, Ω) is a random vector of normally distributed common economic factors, z_i is a vector of loadings for the i-th counterparty, and ε_i is a normally distributed error, which is independent of Ψ and of ε_j for j ≠ i.

The term z_i′Ψ is the systematic risk component for counterparty i and has variance β_i := z_i′Ω z_i, whereas ε_i is the idiosyncratic risk component, with variance 1 − β_i.

KMV/CreditMetrics as a Mixture Model

KMV/CreditMetrics is a Bernoulli mixture model with factor vector Ψ. The conditional independence of defaults given Ψ follows from the independence of the idiosyncratic terms ε_1, . . . , ε_m. The conditional default probabilities are

  p_i(ψ) = P(Y_i = 1 | Ψ = ψ) = P(X_i < d_i | Ψ = ψ) = P(ε_i < d_i − z_i′ψ) = Φ( (d_i − z_i′ψ) / √(1 − β_i) ).

We can reparametrize in terms of the default probability p̄_i of counterparty i:

  p_i(ψ) = Φ( (Φ^{−1}(p̄_i) − z_i′ψ) / √(1 − β_i) ).

Special Cases

One-factor model:

  p_i(Ψ) = Φ( (Φ^{−1}(p̄_i) − √β_i Ψ) / √(1 − β_i) ),

where Ψ is a standard normally distributed factor.

Homogeneous correlation model:

  p_i(Ψ) = Φ( (Φ^{−1}(p̄) − √ρ Ψ) / √(1 − ρ) ),   (31)

where ρ turns out to be the asset correlation between any two critical variables X_i ≠ X_j.

B4. Large Portfolios and Basel II

As motivation, consider an exchangeable Bernoulli default model of KMV/CreditMetrics type, i.e. a model in which defaults are conditionally iid with common conditional default probability Q := p_1(Ψ), where p_1 has the form (31). Suppose e_i = δ_i = 1 for all companies. Let L^{(m)} = Σ_{i=1}^m Y_i and consider increasing the portfolio size.

Conditional on a realization ψ of the common factor, the SLLN says that, almost surely,

  lim_{m→∞} L^{(m)}/m = q = p_1(ψ).

In other words, the distribution of Q = p_1(Ψ) can be thought of as the distribution of the portfolio loss (expressed as a fraction) in an infinitely large exchangeable portfolio.

Remarks

In this example the asymptotic portfolio loss distribution, the distribution of

  Q = Φ( (Φ^{−1}(p̄) − √ρ Ψ) / √(1 − ρ) ),

has been called the Vasicek loss distribution [Vasicek, 1997]. (It is a probit-normal distribution.)

The idea of looking at infinitely fine-grained portfolios, where the individual risks become negligible and the systematic factor(s) dominate(s), has been taken up by other researchers in more complicated models and has influenced regulation. [Gordy, 2003]

Computations for Large Portfolios

Of course the practical relevance of the large-portfolio results is for making computations when m is large. Consider again the simple example. For tail probabilities we have

  P(L^{(m)} > l) ≈ P(Q > l/m) = Φ( (Φ^{−1}(p̄) − √(1 − ρ) Φ^{−1}(l/m)) / √ρ ).

For Value-at-Risk we have

  VaR_α(L^{(m)}) ≈ m q_α(Q) = m Φ( (Φ^{−1}(p̄) + √ρ Φ^{−1}(α)) / √(1 − ρ) ).
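Both formulas are easy to evaluate and to check against simulation (a sketch in Python; parameter values are illustrative):

import numpy as np
from scipy.stats import norm

def vasicek_var(p_bar, rho, alpha, m):
    """Large-portfolio (Vasicek) approximation to VaR_alpha of the number of defaults."""
    q = norm.cdf((norm.ppf(p_bar) + np.sqrt(rho) * norm.ppf(alpha)) / np.sqrt(1 - rho))
    return m * q

# Monte Carlo check in the one-factor model (31)
rng = np.random.default_rng(4)
p_bar, rho, m, alpha, n_sim = 0.01, 0.05, 1000, 0.99, 50_000
psi = rng.standard_normal(n_sim)
q_psi = norm.cdf((norm.ppf(p_bar) - np.sqrt(rho) * psi) / np.sqrt(1 - rho))
losses = rng.binomial(m, q_psi)
print(vasicek_var(p_bar, rho, alpha, m), np.quantile(losses, alpha))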

General Results

We will give two results, as in our book; proofs are found in [Frey and McNeil, 2003].

Let (e_i)_{i∈ℕ} be an infinite sequence of positive deterministic exposures, (Y_i)_{i∈ℕ} the corresponding sequence of default indicators, and (δ_i)_{i∈ℕ} a sequence of random variables with values in (0, 1] representing percentage losses given that default occurs. In this setting the loss for a portfolio of size m is given by L^{(m)} = Σ_{i=1}^m L_i, where L_i = e_i δ_i Y_i are the individual losses.

We now make some technical assumptions for our model.

Assumptions for Large Portfolio Results

1. There is a p-dimensional random vector Ψ and functions ℓ_i : ℝ^p → [0, 1] such that, conditional on Ψ, the (L_i)_{i∈ℕ} form a sequence of independent random variables with mean ℓ_i(ψ) = E(L_i | Ψ = ψ). (We extend the conditional independence assumption to losses.)

2. There is a function ℓ̄ : ℝ^p → ℝ₊ such that

  lim_{m→∞} (1/m) E(L^{(m)} | Ψ = ψ) = lim_{m→∞} (1/m) Σ_{i=1}^m ℓ_i(ψ) = ℓ̄(ψ) for all ψ ∈ ℝ^p;

ℓ̄(ψ) is known as the asymptotic conditional loss function. (We preserve the composition of the portfolio as it grows.)

3. There is some C < ∞ such that Σ_{i=1}^∞ (e_i/i)² < C. (No individual exposure may be too large.)

Large Portfolio Result I

In large portfolios the portfolio loss is essentially determined by the asymptotic conditional loss function and the realization of the factor vector:

Proposition 1. Consider a sequence L^{(m)} = Σ_{i=1}^m L_i satisfying the assumptions above. Denote by P(· | Ψ = ψ) the conditional distribution of the sequence (L_i)_{i∈ℕ} given Ψ = ψ. Then

  lim_{m→∞} (1/m) L^{(m)} = ℓ̄(ψ), P(· | Ψ = ψ) a.s., for all ψ ∈ supp(Ψ).

This obviously applies to the number of defaults Σ_{i=1}^m Y_i if we set δ_i = e_i ≡ 1. We simply assume that lim_{m→∞} (1/m) Σ_{i=1}^m p_i(ψ) = p̄(ψ) for some function p̄ : ℝ^p → [0, 1].

Large Portfolio Result II

For one-factor Bernoulli mixture models we can obtain a stronger result, which links the quantiles of L^{(m)} to quantiles of the mixing distribution.

Proposition 2. Consider a sequence L^{(m)} = Σ_{i=1}^m L_i satisfying the assumptions, with a one-dimensional mixing variable Ψ with distribution function G(ψ). Assume that the asymptotic conditional loss function ℓ̄(ψ) is strictly increasing and right continuous, and that G is strictly increasing at q_α(Ψ), i.e. that G(q_α(Ψ) + δ) > α for every δ > 0. Then

  lim_{m→∞} (1/m) q_α(L^{(m)}) = ℓ̄(q_α(Ψ)).   (32)

Examples

Consider one-factor models with e_i = δ_i ≡ 1.

Exchangeable model: Proposition 2 implies that for m large we have q_α(L^{(m)}) ≈ m q_α(Q). The tail of the distribution of L^{(m)} is determined by the tail of Q.

Homogeneous-group model: Let the relative group sizes m_r/m converge to fixed constants λ_r as m → ∞ for all r. We obtain p̄(ψ) = Σ_{r=1}^k λ_r h(μ_r + σψ). Hence for m large we have

  q_α(L^{(m)}) ≈ m Σ_{r=1}^k λ_r h(μ_r + σ Φ^{−1}(α)).

Implications for Basel II

The formulas coming from a large-portfolio analysis of the KMV/CreditMetrics model have been influential in deriving the capital rules for Basel II. Consider the model in (31). For m large we have

  q_α(L^{(m)}) ≈ Σ_{i=1}^m δ_i e_i p_i(Φ^{−1}(1 − α)) = Σ_{i=1}^m δ_i e_i Φ( (Φ^{−1}(p̄_i) + √ρ Φ^{−1}(α)) / √(1 − ρ) ).

In the internal-ratings-based (IRB) approach the capital required for risk i is proportional to

  δ_i e_i Φ( (Φ^{−1}(p̄_i) + √ρ Φ^{−1}(0.999)) / √(1 − ρ) ).

Implications for Basel II ctd.

Hence the IRB capital charge can be considered as the 99.9% quantile of the portfolio loss in a large homogeneous portfolio following a one-factor KMV-type model.

Parameters:
- The bank specifies p̄_i.
- The correlation parameter ρ and the one-factor assumption are imposed by the regulator, independently of the portfolio considered.
- In particular, the IRB approach is not a fully internal model.

C. Monte Carlo Techniques for Credit Risk

1. Motivation
2. Importance Sampling
3. Application to Bernoulli Mixtures

C1. Motivation

Consider a Bernoulli mixture model for a loan portfolio and assume that the overall loss is of the form L = Σ_{i=1}^m L_i, where the L_i are conditionally independent given some economic factor vector Ψ.

Suppose we wish to measure the portfolio risk with expected shortfall and to calculate a capital allocation based on expected shortfall contributions at the confidence level α. We need to evaluate the conditional expectations

  E(L | L ≥ q_α(L)) and E(L_i | L ≥ q_α(L)).   (33)

A possible approach is to use Monte Carlo (MC) simulation, although the problem of rare-event simulation arises.

Standard Monte Carlo

In a generic Monte Carlo problem we have an rv X with density f and we wish to compute an expected value of the form

  θ = E(h(X)) = ∫ h(x) f(x) dx,   (34)

for some known function h. For the probability of an event we have h(x) = 1_{x∈A} for some set A ⊂ ℝ; for expected shortfall computation we have h(x) = x 1_{x≥c} for some c ∈ ℝ. Where the analytical evaluation of θ is difficult we can resort to an MC approach:

1. Simulate X_1, . . . , X_n independently from the density f.
2. Compute the standard MC estimate θ̂^{MC}_n = (1/n) Σ_{i=1}^n h(X_i).

Rare Event Simulation

The MC estimator converges to θ by the strong law of large numbers (SLLN), but the speed of convergence may not be particularly fast, particularly when we are dealing with rare-event simulation.

If we consider the application in (33), we see that if α = 0.99, say, then only 1% of our standard Monte Carlo draws will lead to a portfolio loss higher than q_{0.99}(L). The standard MC estimator, which consists of averaging the simulated values of L or L_i over all draws leading to a simulated portfolio loss L ≥ q_α(L), will be unstable and subject to high variability, unless the number of simulations is very large.

The technique of importance sampling is a way of reducing this variability and is well suited to problems of the kind we consider.

C2. Importance Sampling

Importance sampling is based on an alternative representation of θ in (34). We consider an importance-sampling density g (whose support should contain that of f) and define the likelihood ratio r(x) by r(x) := f(x)/g(x) whenever g(x) > 0 and r(x) = 0 otherwise. The integral may be written in terms of the likelihood ratio as

  θ = ∫ h(x) r(x) g(x) dx = E_g(h(X) r(X)),   (35)

where E_g denotes expectation with respect to the density g. Hence we can approximate the integral with the following algorithm:

1. Simulate X_1, . . . , X_n independently from the density g.
2. Compute the IS estimate θ̂^{IS}_n = (1/n) Σ_{i=1}^n h(X_i) r(X_i).

Reducing the Variance

The art of importance sampling is in choosing g such that, for fixed n, the variance of the IS estimator is considerably smaller than that of the standard Monte Carlo estimator:

  var_g(θ̂^{IS}_n) = (1/n) ( E_g(h(X)² r(X)²) − θ² ),

  var(θ̂^{MC}_n) = (1/n) ( E(h(X)²) − θ² ).

The aim is to make E_g(h(X)² r(X)²) small compared to E(h(X)²).

Consider the case of estimating a tail probability, where h(x) = 1_{x≥c} for c significantly larger than the mean of X. We try to choose g so that the likelihood ratio r(x) = f(x)/g(x) is small for x ≥ c; in other words, we make the event {X ≥ c} more likely under the IS density g than it is under the original density f.

Exponential Tilting

For t ∈ ℝ we write M_X(t) = E(e^{tX}) = ∫ e^{tx} f(x) dx for the moment generating function of X, which we assume is finite for t ∈ ℝ. It is not hard to check that we can define a density

  g_t(x) := e^{tx} f(x) / M_X(t),

which can be used for importance sampling when X is light-tailed. Define μ_t to be the mean of X with respect to the density g_t, i.e.

  μ_t := E_{g_t}(X) = E( X exp(tX) ) / M_X(t).

How can we choose t optimally for a particular importance-sampling problem? In the case of tail probability estimation, theory suggests we should choose t as the solution of μ_t = c.

Exponential Tilting for the Normal Distribution

We illustrate the concept of exponential tilting in the simple case of a standard normal random variable. Suppose X ∼ N(0, 1) with density φ(x). Using exponential tilting we obtain the new density g_t(x) = exp(tx) φ(x)/M_X(t). The moment generating function of X is known to be M_X(t) = exp(t²/2). Hence

  g_t(x) = (1/√(2π)) exp( tx − (t² + x²)/2 ) = (1/√(2π)) exp( −(x − t)²/2 ),

so that under the tilted distribution X ∼ N(t, 1). In particular, exponential tilting is a convenient way of shifting the mean of X.
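A sketch of exponentially tilted IS for the tail probability P(X ≥ c), X ∼ N(0, 1) (here μ_t = t, so the optimal tilt is simply t = c):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
c, n = 4.0, 100_000
t = c                                 # optimal tilt: mu_t = t = c

x = rng.normal(loc=t, size=n)         # simulate under g_t, i.e. N(t, 1)
r = np.exp(-t * x + 0.5 * t**2)       # likelihood ratio f(x)/g_t(x)
theta_is = np.mean((x >= c) * r)

print(theta_is, norm.sf(c))           # IS estimate vs exact P(X >= c)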

An Abstract View of Importance Sampling

To handle the more complex application to portfolio credit risk we consider importance sampling from a slightly more general viewpoint. Given densities f and g, we define probability measures P and Q by

  P(A) = ∫_A f(x) dx and Q(A) = ∫_A g(x) dx,  A ⊂ ℝ.

With this notation (35) becomes

  θ = E_P(h(X)) = E_Q(h(X) r(X)),

so that r(X) equals dP/dQ, the (measure-theoretic) density of P with respect to Q.

A More General View of Exponential Tilting

Using this more abstract view, exponential tilting can be applied in more general situations: given a rv X on (Ω, F, P) such that M_X(t) = E_P(exp(tX)) < ∞, define the measure Q_t on (Ω, F) by

  dQ_t/dP = exp(tX)/M_X(t), i.e. Q_t(A) = E_P( exp(tX)/M_X(t) ; A ),

and note that (dQ_t/dP)^{−1} = M_X(t) exp(−tX) = r_t(X).

The IS algorithm remains essentially unchanged: simulate independent realizations X_1, . . . , X_n under the measure Q_t and set θ̂^{IS}_n = (1/n) Σ_{i=1}^n h(X_i) r_t(X_i) as before.

C3. Application to Bernoulli Mixtures

Consider a portfolio loss of the form L = Σ_{i=1}^m e_i Y_i, where the e_i are deterministic, positive exposures and the Y_i are default indicators with default probabilities p_i; Y follows a Bernoulli mixture model with factor vector Ψ and conditional default probabilities p_i(Ψ).

We study the problem of estimating exceedance probabilities θ = P(L ≥ c) for c substantially larger than E(L) using importance sampling. We consider first the situation where the default indicators Y_1, . . . , Y_m are independent, and discuss subsequently the extension to the case of conditionally independent default indicators. Our exposition is based on [Glasserman and Li, 2003].

Independent Default Indicators

Here we use the more general IS approach and set Ω = {0, 1}^m, the state space of Y. The probability measure P is given by

  P({y}) = Π_{i=1}^m p_i^{y_i} (1 − p_i)^{1−y_i},  y ∈ {0, 1}^m.

The moment generating function of L is

  M_L(t) = E( exp(t Σ_{i=1}^m e_i Y_i) ) = Π_{i=1}^m E(e^{t e_i Y_i}) = Π_{i=1}^m ( e^{t e_i} p_i + 1 − p_i ).

The measure Q_t is given by Q_t({y}) = E_P(e^{tL}/M_L(t); Y = y) and hence

  Q_t({y}) = ( exp(t Σ_{i=1}^m e_i y_i) / M_L(t) ) P({y}).

Independent Default Indicators II

We obtain

  Q_t({y}) = Π_{i=1}^m ( exp(t e_i y_i) / (exp(t e_i) p_i + 1 − p_i) ) p_i^{y_i} (1 − p_i)^{1−y_i},

and it follows that

  Q_t({y}) = Π_{i=1}^m q_{t,i}^{y_i} (1 − q_{t,i})^{1−y_i}, where q_{t,i} := exp(t e_i) p_i / (exp(t e_i) p_i + 1 − p_i).

The default indicators remain independent, but with new default probabilities q_{t,i}. The optimal value of t is chosen such that E_{Q_t}(L) = c, leading to the equation Σ_{i=1}^m e_i q_{t,i} = c.
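A sketch of this IS scheme for independent defaults (Python; the tilt t is found by solving Σ e_i q_{t,i} = c numerically, and the portfolio parameters are assumptions for illustration):

import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(6)
m = 100
e = np.ones(m)                     # unit exposures (assumption)
p = np.full(m, 0.01)               # default probabilities (assumption)
c, n = 10.0, 50_000                # threshold and number of simulations

def tilted_p(t):
    w = np.exp(t * e) * p
    return w / (w + 1.0 - p)       # q_{t,i}

t_star = brentq(lambda t: e @ tilted_p(t) - c, 0.0, 50.0)
q = tilted_p(t_star)

Y = rng.uniform(size=(n, m)) < q   # defaults under Q_t
L = Y @ e
# Likelihood ratio r_t(L) = M_L(t*) * exp(-t* L)
log_ML = np.sum(np.log(np.exp(t_star * e) * p + 1.0 - p))
theta_is = np.mean((L >= c) * np.exp(log_ML - t_star * L))
print(theta_is)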

Conditionally Independent Default Indicators

The first step in the extension to conditionally independent defaults is obvious: given a realization ψ of the economic factors, the conditional exceedance probability θ(ψ) := P(L ≥ c | Ψ = ψ) is estimated using the approach for independent default indicators. This yields Algorithm 8.26 in MFE and gives an estimate θ̂^{IS,1}_{n_1}(ψ), where n_1 is the number of random draws of (Y_1, . . . , Y_m).

Our ultimate aim is to estimate θ = P(L ≥ c). In a naive approach we could generate realizations ψ_1, . . . , ψ_n of Ψ and estimate θ by calculating the average (1/n) Σ_{i=1}^n θ̂^{IS,1}_{n_1}(ψ_i). However, a dramatic improvement can be obtained by also applying importance sampling to the distribution of Ψ. This is the idea behind Algorithm 8.27 in MFE.

One-Factor KMV/CreditMetrics Model

Consider a model with conditional default probabilities p_i(Ψ), where Ψ ∼ N(0, 1). Instead of generating ψ_1, . . . , ψ_n from a standard normal distribution, we should use exponential tilting to generate them from a N(μ, 1) distribution for some sensibly chosen value of μ. An approach to determining μ and references to the literature on this subject are given in MFE (p. 373). We obtain the algorithm:

1. Generate ψ_1, . . . , ψ_n ∼ N(μ, 1) independently.
2. For each ψ_i calculate θ̂^{IS,1}_{n_1}(ψ_i) by importance sampling.
3. Determine the full IS estimator θ̂^{IS}_n = (1/n) Σ_{i=1}^n r(ψ_i) θ̂^{IS,1}_{n_1}(ψ_i), where r(ψ) = exp(−μψ + μ²/2).
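A compact sketch of the full (two-level) IS scheme, assuming the exchangeable setting of the example on the next slide with unit exposures; the outer tilt μ is simply set by hand here rather than optimized:

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(7)
m, p_bar, rho = 100, 0.05, 0.05
c, n, n1 = 20.0, 2_000, 50
mu = -2.0                                     # outer tilt towards the high-loss region (assumption)

def cond_p(psi):                              # conditional default probability p(psi)
    return norm.cdf((norm.ppf(p_bar) - np.sqrt(rho) * psi) / np.sqrt(1 - rho))

def inner_is(p):
    """IS estimate of P(L >= c | psi) for independent Bernoulli(p) defaults."""
    f = lambda t: m * np.exp(t) * p / (np.exp(t) * p + 1 - p) - c
    t = brentq(f, 0.0, 50.0) if f(0.0) < 0 else 0.0
    q = np.exp(t) * p / (np.exp(t) * p + 1 - p)
    L = rng.binomial(m, q, size=n1).astype(float)
    log_ML = m * np.log(np.exp(t) * p + 1 - p)
    return np.mean((L >= c) * np.exp(log_ML - t * L))

psi = rng.normal(loc=mu, size=n)
r = np.exp(-mu * psi + 0.5 * mu**2)           # outer likelihood ratio
theta = np.mean(r * np.array([inner_is(cond_p(s)) for s in psi]))
print(theta)                                  # compare with the exact value 0.00112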

Example

We consider an exchangeable portfolio of 100 firms with identical unit exposures, default probabilities 0.05 and asset correlations (i.e. values of ρ) 0.05. The aim is to calculate the tail probability P(L ≥ 20) by IS. (In such a simple model it can in fact be calculated analytically to be 0.00112.) We compare:

1. Naive Monte Carlo (n = 10000).
2. IS for the factor distribution (n = 10000).
3. Naive Monte Carlo for the factor (n = 10000) and IS for the conditional default distribution (n_1 = 50).
4. IS for the factor distribution (n = 10000) and the conditional default distribution (n_1 = 50).

Results

[Figure: four panels (No IS, Outer IS, Inner IS, Full IS) showing the running estimate of P(L ≥ 20) over 10000 iterations, on a scale from 0.0005 to 0.0020.]

D. Statistical Inference in Portfolio Credit Models

1. Statistical Inference in Credit Models
2. Bernoulli Mixture Models as GLMMs
3. Estimation of Models
4. Examples
5. Migration Models

D1. Introduction to Statistics for Mixture Models

Industry models generally separate the problems of estimating (i) default probabilities and (ii) model parameters describing the dependence of defaults.

1. Default probabilities are usually estimated by a historical default rate for similar companies, where the similarity metric may be based on ratings (CreditMetrics) or a proprietary measure like distance-to-default (KMV).
2. Dependence is usually described by a macroeconomic or fundamental factor model. Parameters of factor models are often simply assigned by economic argument or derived from factor analyses of proxy variables (e.g. equity returns for asset value returns in the firm-value models).

Ad Hoc Calibration and Model Risk

The ad hoc nature of some of the attempts to model dependence raises in particular the issue of model risk. For example, in KMV/CreditMetrics, how confident are we that we can correctly determine the size of the systematic component of risk (determined by the common factors and loadings)?

Misspecification of the asset correlation parameter ρ in (31) can have a drastic impact on any calculations that are made with this model, for example the determination of VaR.

Simple Moment Estimators for Exchangeable Models

In exchangeable models we have a simple moment estimator for π, π_2. Recall that the parameters of typical exchangeable Bernoulli mixtures are determined from π and π_2.

Observations: historical data for years j = 1, . . . , n. In year j we have m_j obligors, of which M_j default.

Moment estimator: a simple moment-style estimator of π_k, 1 ≤ k ≤ min{m_1, . . . , m_n}, is given by

  π̂_k = (1/n) Σ_{j=1}^n (M_j choose k)/(m_j choose k)
       = (1/n) Σ_{j=1}^n [ M_j (M_j − 1) · · · (M_j − k + 1) ] / [ m_j (m_j − 1) · · · (m_j − k + 1) ].

Note that π̂ is simply (1/n) Σ_{j=1}^n M_j/m_j. The estimator is consistent; see [Frey and McNeil, 2003].
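These estimators in a few lines of Python (hypothetical default-count data):

import numpy as np
from math import comb

def pi_k_hat(M, m, k):
    """Moment estimator of pi_k from yearly default counts M and cohort sizes m."""
    return np.mean([comb(Mj, k) / comb(mj, k) for Mj, mj in zip(M, m)])

# Hypothetical data: 10 years, 1000 obligors per year
M = [8, 15, 5, 22, 9, 30, 12, 7, 18, 10]
m = [1000] * 10

pi1, pi2 = pi_k_hat(M, m, 1), pi_k_hat(M, m, 2)
rho_Y = (pi2 - pi1**2) / (pi1 - pi1**2)   # implied default correlation
print(pi1, pi2, rho_Y)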

D2. Bernoulli Mixture Models and GLMMs

We consider a sub-class of Bernoulli mixture models that fit in the generalized linear mixed model (GLMM) framework. Conditional on the factors Ψ (known in GLMMs as random effects) we assume

  p_i(Ψ) = P(Y_i = 1 | Ψ = ψ) = g(x_i′β + z_i′ψ),   (36)

where
- g(·) is a link function, typically a mapping from ℝ to (0, 1) like a distribution function (e.g. g = Φ);
- x_i and z_i are explanatory variables (covariates) for the i-th obligor, such as indicators for rating category or sector, or firm-specific information from the balance sheet;
- β are unknown parameters (generally including an intercept).

GLMM as DAG (Directed Acyclic Graph)

[Figure: DAG with covariate node x and factor node b both feeding the response node y.]

N.B. the factors Ψ are represented here by b; the remaining nodes are hyperparameters of the distribution of b.

A Multi-Period Model

Given factors Ψ_t in time period t, we assume that the individual default indicators Y_{t1}, . . . , Y_{tm_t} are conditionally independent with

  Y_{ti} | Ψ_t = ψ_t ∼ Be(p_{ti}(ψ_t)),  i = 1, . . . , m_t, where   (37)

  p_{ti}(ψ_t) = g( μ_{γ(t,i)} − x_{ti}′β − z_{ti}′ψ_t ).

Note some slight notational changes: γ(t, i) returns the credit rating of company i, and μ_1, . . . , μ_K are unknown intercepts.

The Asset Value Interpretation

Let ε_{t1}, . . . , ε_{tm_t} be iid rvs with distribution function g, also independent of Ψ_t. Set

  V_{ti} := x_{ti}′β + z_{ti}′ψ_t + ε_{ti} for i = 1, . . . , m_t.

Observe that (37) corresponds to a model in which company i in period t defaults if and only if V_{ti} ≤ μ_{γ(t,i)}. The rv V_{ti} can be interpreted as the asset value and μ_{γ(t,i)} as the critical liability level.

The implied asset correlation of firms i and j in period t is

  ρ(V_{ti}, V_{tj}) = cov(z_{ti}′Ψ_t, z_{tj}′Ψ_t) / √( (var(z_{ti}′Ψ_t) + w²)(var(z_{tj}′Ψ_t) + w²) ),

where w² := var(ε_{ti}). Probit link: w² = 1. Logit link: w² = π²/3.

Multi-Period Model

[Figure: DAG with covariates x_t, x_{t+1}, default data y_t, y_{t+1} and independent factors b_t, b_{t+1}.]

Multi-Period Model With Serial Dependence

[Figure: the same DAG, but with an arrow from b_t to b_{t+1}, i.e. serially dependent factors.]

Serial Dependence

Serial dependence in default probabilities can be modelled using correlated latent factors Ψ_1, . . . , Ψ_T. We assume that:
1. conditional on (Ψ_t)_{t=1}^T, the default indicator vectors Y_1, . . . , Y_T are independent; moreover, Y_t depends on Ψ_t only;
2. the latent factors (Ψ_t)_{t=1}^T form a Markov chain.

Remark: The above assumptions define a state space model (hidden Markov model) for the sequence Y_1, . . . , Y_T.

D3. Estimation of Models

IID random effects: the unconditional mass function of Y_t = (Y_{t1}, . . . , Y_{tm_t})′ is

  f(y_t | β, θ) = ∫_{ℝ^p} Π_{i=1}^{m_t} P(Y_{ti} = y_{ti} | Ψ_t = ψ, β) g(ψ | θ) dψ,

where p = dim(Ψ) and g(· | θ) is the density of Ψ_t. The likelihood function with Ψ_1, . . . , Ψ_T independent is

  L(β, θ | observed data) = Π_{t=1}^T f(y_t | β, θ).   (38)

There is no between-period dependence.

Dependent Random Effects

Let Ψ_1, . . . , Ψ_T have joint density g(ψ_1, . . . , ψ_T | θ). The likelihood function L(β, θ | observed data) now takes the form

  ∫ Π_{t=1}^T Π_{i=1}^{m_t} P(Y_{ti} = y_{ti} | ψ_t, β) g(ψ_1, . . . , ψ_T | θ) dψ_1 · · · dψ_T.

To evaluate this expression we have an integral over ℝ^{Tp}, which makes standard maximum likelihood difficult.

Bayesian Inference

We distinguish between the observed quantities D := (x_t, z_t, y_t)_{t=1}^T and the unobserved quantities Θ := (β, θ, ψ_1, . . . , ψ_T). The prior distribution p(Θ) expresses a state of knowledge (or ignorance) about the unobserved elements before the data D are obtained. Inference in our model is based on the posterior distribution p(Θ | D):

  p(Θ | D) = p(D | Θ) p(Θ) / p(D) = p(D | Θ) p(Θ) / ∫ p(D | Θ) p(Θ) dΘ.

Problem: finding p(Θ | D)!

Markov Chain Monte Carlo (MCMC)

Assume we want to simulate from a (multivariate) distribution p(x). Idea:
- Construct an ergodic Markov chain with p as its stationary distribution.
- Regard a sample of the Markov chain (possibly after a certain burn-in) as a sample from p.

Constructing such a Markov chain turns out to be surprisingly simple (see the sketch below):
- Metropolis-Hastings algorithm
- Gibbs sampler (special case)

MCMC can be used to simulate from p(Θ | D) even in complex cases. [Robert and Casella, 1999, Clayton, 1996]
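A minimal random-walk Metropolis-Hastings sketch (generic scalar target; illustrative only, not the Gibbs sampler actually used for the credit models below):

import numpy as np

def metropolis_hastings(log_target, x0, n_iter=10_000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings targeting exp(log_target)."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_target(x0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.standard_normal()
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Example: sample from N(1, 2^2); discard a burn-in before use
chain = metropolis_hastings(lambda x: -0.5 * ((x - 1.0) / 2.0) ** 2, x0=0.0)
print(chain[2000:].mean(), chain[2000:].std())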

Advantages of MCMC

- Calibration of complex models with multivariate, serially correlated latent factors
- Implementation is straightforward; simulation is fast
- Point estimates, standard errors and (joint) confidence sets of parameters are inherent in the output
- Inference about derived model parameters (e.g. default correlations, implied asset correlations) is as easy as for primary parameters
- The posterior path of the latent factors (Ψ_t) can be compared with other macroeconomic variables
- Prior information about parameters governing portfolio dependence can be entered into the analysis

D4. Empirical Analysis of S&P Default Data

Homogeneous portfolio: x_{ti} = x_t for all companies. Observed data collected on a six-month basis:

  M_{tk} := Σ_{i: γ(t,i)=k} Y_{ti} (number of defaults for rating class k),
  m_{tk} := #{i : γ(t, i) = k} (number of companies for rating class k),
  M_t := (M_{t1}, . . . , M_{tK})′ (vector of default count variables).

We fit several models to S&P default data (rating classes CCC, B, BB, BBB, A) by Gibbs sampling with non-informative priors. The sequence (Ψ_t) of scalar latent factors satisfies an AR(1) process:

  Ψ_t = αΨ_{t−1} + ε_t, with Ψ_0 = ε_0/√(1 − α²) and ε_0, ε_1, . . . iid N(0, σ²).

Empirical Analysis

Given (Ψ_t)_{t=1}^T, we assume M_1, . . . , M_T conditionally independent.

Model 1: M_{tk} | Ψ_t = ψ_t ∼ B(m_{tk}, g(μ_k − ψ_t)), where g(x) = 1/(1 + exp(x)) is the logit-type response.

[Figure: state space model (DAG) for the sequence (M_t) of default counts, with factors b_t, b_{t+1} driving the counts M_{t1}, . . . , M_{tK}.]
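A simulation sketch of this state-space model (Python; parameter values are taken roughly from the posterior means reported on the Results slide below, and the cohort size is an assumption):

import numpy as np

rng = np.random.default_rng(8)
T, m_k = 40, 500                          # periods and cohort size (assumption)
mu = {"A": 9.097, "BBB": 7.144, "BB": 5.712, "B": 3.872, "CCC": 1.593}
a, sigma = 0.396, 0.649                   # AR(1) parameters (as reported below)

g = lambda x: 1.0 / (1.0 + np.exp(x))     # response as on the slide: g(x) = 1/(1+exp(x))

# Latent AR(1) factor with stationary initialization
psi = np.empty(T)
psi[0] = sigma * rng.standard_normal() / np.sqrt(1 - a**2)
for t in range(1, T):
    psi[t] = a * psi[t - 1] + sigma * rng.standard_normal()

# Conditional default counts M_tk ~ Binomial(m_k, g(mu_k - psi_t))
M = {k: rng.binomial(m_k, g(mu_k - psi)) for k, mu_k in mu.items()}
print({k: v[:5] for k, v in M.items()})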

Results

Posterior means (standard deviations):

  μ_A      μ_BBB    μ_BB     μ_B      μ_CCC    α        σ
  9.097    7.144    5.712    3.872    1.593    0.396    0.649
  (0.654)  (0.356)  (0.276)  (0.239)  (0.245)  (0.083)  (0.169)

[Figure: MCMC trace plots of selected parameters over 3000 iterations and a histogram of the posterior of α.]

Extended Models

Time heterogeneity, where (x_t) denotes the Chicago Fed National Activity Index:
  Model 2: M_{tk} | Ψ_t = ψ_t ∼ B(m_{tk}, g(μ_k − βx_t − ψ_t)).

Heterogeneity among rating classes:
  Model 3: M_{tk} | Ψ_t = ψ_t ∼ B(m_{tk}, g(μ_k − σ_k ψ_t)).

Sector heterogeneity: consider S industry sectors, and denote by M_{tsk} (m_{tsk}) the number of defaults (companies) for rating class k, period t, and sector s. Set b_t = (b_{t1}, . . . , b_{tS})′, where b_{ts} := Ψ_t + η_{ts}, with (Ψ_t) as before and η_{t1}, . . . , η_{tS} iid Gaussian:
  Model 4: M_{tsk} | b_t ∼ B(m_{tsk}, g(μ_k − b_{ts})). The covariance matrix of b_t is of compound symmetry type.

Some Empirical Conclusions

- There is a residual, cyclical, latent component in the systematic risk, even after accounting for observed business cycle covariates.
- The implied asset correlation for companies sharing an industry sector is 10.5%, whereas the across-sector counterpart is only 6.0%. (A model without sector-specific latent factors yields an overall implied asset correlation of 6.9% for the same dataset.)
- Implied asset correlations do not appear to fall monotonically with increasing probability of default. [McNeil and Wendin, 2003]

D5. Migration Models in GLMM Framework


Consider a latent factor (random effect) b_t (vector/scalar). Given b_t we assume that R_{t1}, ..., R_{tm_t} are conditionally independent with

P(R_{ti} ≤ ℓ | b_t = b) = g(μ_{r(t,i),ℓ} - x'_{ti}β - z'_{ti}b),

where g : R → (0, 1) is an increasing response function, e.g. Φ(x) (ordered probit) or 1/(1 + e^{-x}) (ordered logit); r(t, i) is the rating of obligor i at the outset of period t; the intercepts μ_{k,ℓ} and regressor coefficients β are unknown parameters satisfying -∞ = μ_{k,-1} ≤ μ_{k,0} ≤ · · · ≤ μ_{k,K} = ∞.
c 2006 (Embrechts, Frey, McNeil) 381

GLMMs for Ordered Categorical Responses


P(R_{ti} ≤ ℓ | b_t = b) = g(μ_{r(t,i),ℓ} - x'_{ti}β - z'_{ti}b)

x_{ti} and z_{ti} are additional covariates other than rating; b_t are latent factors with hyperparameters θ. This defines a generalized linear mixed model (GLMM) for the ordered, categorical responses R_{t1}, ..., R_{tm_t}. We refer to x'_{ti}β + z'_{ti}b_t as the systematic risk of obligor i.
Conditional transition probabilities: P(R_{ti} = ℓ | b_t) = P(R_{ti} ≤ ℓ | b_t) - P(R_{ti} ≤ ℓ-1 | b_t).
c 2006 (Embrechts, Frey, McNeil) 382

Rating Transitions as Multinomial Trials


Define the rating indicator Y_{ti} := (1_{R_{ti}=0}, ..., 1_{R_{ti}=K})'. Conditional on b_t, we have Y_{t1}, ..., Y_{tm_t} conditionally independent with

Y_{ti} | b_t = b ~ Multinomial(1, p_{r(t,i)}(x'_{ti}β + z'_{ti}b)),

where p_k(z) = (p_{k,0}(z), ..., p_{k,K}(z))' with

p_{k,ℓ}(z) = g(μ_{k,ℓ} - z) - g(μ_{k,ℓ-1} - z).

The elements of the matrix corr(Y_{ti}, Y_{tj}) are migration correlations of obligors i and j.
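To make the multinomial structure concrete, the following sketch computes the probability vector p_k(z) from a set of cutoffs under the ordered-probit link; the cutoff values and z below are hypothetical.

    import numpy as np
    from scipy.stats import norm

    def transition_probs(cutoffs, z):
        # p_{k,l}(z) = g(mu_{k,l} - z) - g(mu_{k,l-1} - z) with g = Phi.
        # cutoffs holds the finite mu_{k,l}; -inf and +inf are added internally.
        mu = np.concatenate(([-np.inf], np.asarray(cutoffs, dtype=float), [np.inf]))
        return np.diff(norm.cdf(mu - z))   # probabilities of categories 0..K, sum to 1

    p = transition_probs([-2.3, -1.0, 0.2, 1.5], z=0.3)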
c 2006 (Embrechts, Frey, McNeil) 383

Interpretation as an Asset Value Model

[Figure: the real line partitioned by the cutoff levels into rating buckets D, CCC, B, BB, BBB, A, AA, AAA.]

Let ε_{t1}, ..., ε_{tm_t} be iid rvs with df g (independent of b_t). We define

V_{ti} := x'_{ti}β + z'_{ti}b_t + ε_{ti},  i = 1, ..., m_t,

and notice that R_{ti} = ℓ ⟺ V_{ti} ∈ (μ_{r(t,i),ℓ-1}, μ_{r(t,i),ℓ}].

Interpretation: V_{ti} is the asset value and the μ_{r(t,i),ℓ} are critical liability levels. We refer to corr(V_{ti}, V_{tj}) as the implied asset correlation between obligors i and j.
c 2006 (Embrechts, Frey, McNeil) 384

V: Dynamic Credit Models


A. Credit Derivatives
B. Reduced-Form Models: the Single-Firm Case
C. Reduced-Form Models for Credit Portfolios

c 2006 (Embrechts, Frey, McNeil)

385

A. Credit Derivatives
1. Overview
2. Credit default swaps
3. Portfolio products

c 2006 (Embrechts, Frey, McNeil)

386

A1. Overview.
We find it convenient to divide the universe of credit-risky securities into the following three groups:
Vulnerable securities
Single-name credit derivatives
Portfolio credit derivatives
Vulnerable securities. These are securities whose actual payoff is affected by the default of the issuer (or a party in a contract), but where trading or management of credit risk is not the primary purpose of the transaction. Examples: vulnerable options, interest rate swaps and corporate bonds.
c 2006 (Embrechts, Frey, McNeil) 387

An Overview ctd
Single-name credit derivatives. Credit derivatives are derivative securities which are primarily used for the management or trading of credit risk. In the case of a single-name credit derivative the payoff depends only on the credit risk of a single obligor. Prime example: the credit default swap (CDS).
Portfolio credit derivatives. Credit derivatives whose payoff depends on the credit risk within a portfolio of obligors. Prime examples: basket credit derivatives and CDOs (collateralized debt obligations).

c 2006 (Embrechts, Frey, McNeil)

388

A Popular Asset Class


The market for credit derivatives is growing rapidly, and credit derivatives have become indispensable tools for managing credit risk; they might even contribute to overall financial stability. The view of Alan Greenspan ([Greenspan, 2002]):
More recently, instruments [. . . ] such as credit default swaps, collateralized debt obligations and credit-linked notes have been developed and their use has grown rapidly in recent years. The result? Improved credit risk management together with more and better risk management tools appear to have significantly reduced loan concentrations in telecommunications and, indeed, other areas and the associated stress on banks and other financial institutions. [..] Obviously this market [the market for credit derivatives] is still too new to have been tested in a widespread down-cycle for credit, but, to date, it appears to have functioned well.

c 2006 (Embrechts, Frey, McNeil)

389

A2. Single-name credit derivatives and CDSs


Credit default swaps (CDS) are the workhorse of the market for credit derivatives. Three parties involved:
C (reference entity); default of C triggers the default payment.
A (protection buyer); makes periodic premium payments to B.
B (protection seller); makes the default payment to A if τ_C ≤ T.

[Diagram: A pays B a premium at fixed dates until default or maturity; if default of C occurs, B makes the default payment to A; if not, no payment.]

c 2006 (Embrechts, Frey, McNeil)
390

CDS: the payments


Premium payments. These are due at times 0 < t_1 < · · · < t_N. If τ_C > t_k, A pays at t_k a premium of size x(t_k - t_{k-1}), where x denotes the swap spread; after τ_C premium payments stop. No initial payments.
Default payment. If τ_C ≤ t_N = T, B pays A the LGD of C at τ_C. Sometimes B receives an accrued premium payment of size x(τ_C - t_k) for τ_C ∈ (t_k, t_{k+1}).
Fair swap spread x*. Since there are no initial payments, x* is chosen such that the value at t = 0 of the default payments equals the value of the premium payments. x* is the quantity which is quoted on the market.

c 2006 (Embrechts, Frey, McNeil)

391

CDS Indices
There are a variety of standardized CDS indices for different regions (North America, Europe, Asia etc.) and industry sectors. These indices are fairly liquid, and quotes for protection buyer and seller positions are readily obtained. Indices for North America are known under the label DJ CDX . . . , indices for Europe and Asia under the labels DJ iTraxx Europe . . . , DJ iTraxx Asia . . . etc.
Consider a CDS index on m names. The payoff of the index corresponds to the payoff of a portfolio of CDSs containing 1/m units of a single-name CDS on every name in the portfolio. In particular, the index spread is (approximately) given by the average spread of the single-name CDSs in the portfolio.
c 2006 (Embrechts, Frey, McNeil) 392

A3. Portfolio Credit Derivatives


Basket credit derivatives. The most important basket credit derivative is the kth-to-default swap. This product has a similar structure to an ordinary CDS, but instead of a single reference entity we now consider a portfolio with m ≥ k obligors (the basket).
Premium payments are as before; the default payment is triggered if the kth default time T_k < t_N; the size of the payment may depend on the identity ξ_k of the defaulting firm. Most important: the first-to-default swap.
The contract is sensitive to default dependence. General principle: more dependence but same marginal default probabilities ⇒ the fair spread of the first-to-default swap decreases, the fair spreads of higher-order default swaps increase (numerical illustration on the next slide).
c 2006 (Embrechts, Frey, McNeil) 393

Basket Credit Derivatives and Default Dependence


A numerical illustration. We consider kth-to-default swaps with maturity T = 5 years on a portfolio of m = 5 firms for low, moderate and high default correlations in a Gaussian copula model calibrated to the same single-name CDS spreads (i.e. the same marginal distribution of default times).

k   low correlation   moderate correlation   high correlation
1   5.38%             5.02%                  4.54%
2   0.75%             0.94%                  1.14%
3   0.090%            0.177%                 0.318%
4   0.0077%           0.0269%                0.0776%
5   0.0003%           0.0024%                0.0123%

c 2006 (Embrechts, Frey, McNeil)

394

CDOs - Basic Structure


There are a variety of CDO contracts, but all have the same basic structure. Each CDO has an asset side and a liability side, linked by a special purpose vehicle (SPV).
The assets consist of credit-risky securities related to a pool of reference entities; typically bonds, loans or (in synthetic CDOs) protection-seller positions in single-name CDSs. These assets are acquired by the SPV. To finance the asset purchase the SPV issues notes. This amounts to a repackaging of the assets.
The notes form the liability side of the structure. They belong to tranches of different seniority, called senior, mezzanine and equity pieces. Due to the repackaging, most losses on the assets are borne by the equity piece, and the credit rating of the mezzanine and senior tranches is higher than the average rating of the asset pool.
c 2006 (Embrechts, Frey, McNeil) 395

CDOs - Basic Structure

[Diagram: Assets → SPV → Liabilities (Senior, Mezzanine, Equity tranches). Asset-based structure: the assets pay interest and principal to the SPV, which passes interest and principal to the noteholders against their initial investment. Synthetic structure: CDS premiums flow to the SPV and protection fees to the noteholders, who cover the default payments on the CDSs via the default payments of the CDO.]

Payments in a CDO structure; above arrow: asset-based structure; below arrow: synthetic CDO.

c 2006 (Embrechts, Frey, McNeil)

396

Types of CDOs
Asset-based CDOs. These CDOs are based on a portfolio of real assets such as bonds (CBOs), loans (CLOs), mortgage-backed securities etc. Noteholders make an initial investment used to buy the underlying assets; in return they receive interest and principal repayments.
Synthetic CDOs. Here the underlying assets consist of single-name CDSs on a pool of firms; the noteholders make default payments and receive a periodic protection fee. No initial payments. Mixed forms exist.
In a balance-sheet CDO the pool of underlying assets remains essentially unchanged over the duration of the transaction. In an arbitrage CDO the underlying assets are actively traded with the goal of making additional profit.
c 2006 (Embrechts, Frey, McNeil) 397

Economic motivations
CDOs are arranged for a number of reasons.
Spread arbitrage. Often the sale price of the notes exceeds the initial value of the underlying assets, as the notes have a more favourable risk-return profile due to diversification.
Regulatory capital relief. An important reason for balance-sheet CLOs; the structure enables banks to sell some of their credit risk but maintain the borrower-lender relationship.
Risk transfer. Banks use CDOs and related credit derivatives to improve the risk/return profile of their loan book.

c 2006 (Embrechts, Frey, McNeil)

398

References on credit derivatives


More information can be found for instance in [Schönbucher, 2003], [Bluhm et al., 2002], [Bluhm, 2003], [Tavakoli, 2001].

c 2006 (Embrechts, Frey, McNeil)

399

B. Reduced-Form Models: the Single-Firm Case


1. Mathematical Tools 2. CDS-Pricing and Model Calibration

c 2006 (Embrechts, Frey, McNeil)

400

Reduced-form models.
In reduced-form models the default of a firm is modelled by some random time τ whose df is specified by the modeller; the precise economic mechanism leading to default is left open. A similar modelling philosophy was used in Bernoulli mixture models for credit risk management.
Reduced-form models are popular in practice. They lead to tractable formulas explaining the price of credit-risky securities in terms of economic covariates. With reduced-form models, the pricing machinery for default-free term-structure models can be applied to many defaultable securities as well.
c 2006 (Embrechts, Frey, McNeil) 401

B1. Mathematical Tools


Consider a random time τ (a rv on some probability space (Ω, F, P) with values in [0, ∞)), modelling the default time of the firm under consideration. We denote the df of τ by F(t) = P(τ ≤ t), the density by f(t) = F'(t) and the survival function by F̄(t) := 1 - F(t).
The default indicator process (Y_t) is defined by Y_t = 1_{τ≤t}, t ≥ 0. Note that (Y_t) jumps from 0 to 1 at τ and that 1 - Y_t = 1_{τ>t}.
The function Γ(t) := -ln F̄(t) is called the cumulative hazard function of τ. Finally, γ(t) := f(t)/(1 - F(t)) = f(t)/F̄(t) is called the hazard rate of τ. We have F(t) = 1 - e^{-Γ(t)} and Γ'(t) = f(t)/F̄(t) = γ(t), so that Γ(t) = ∫_0^t γ(s) ds.
c 2006 (Embrechts, Frey, McNeil) 402

Cumulative Hazard Function and Hazard Rate


γ(t) can be interpreted as the instantaneous chance of default at t, given survival up to time t. In fact, for h > 0 we have P(τ ≤ t + h | τ > t) = (F(t + h) - F(t))/F̄(t), yielding

lim_{h→0} (1/h) P(τ ≤ t + h | τ > t) = lim_{h→0} (1/h) (F(t + h) - F(t))/F̄(t) = γ(t).

Examples. 1) The exponential distribution with parameter λ has df F(t) = 1 - e^{-λt}, so that γ(t) = λ for all t > 0.
2) The Weibull distribution has df F(t) = 1 - exp(-λt^α) for λ, α > 0. This yields f(t) = λαt^{α-1} exp(-λt^α) and γ(t) = λαt^{α-1}. Note that γ(t) is decreasing in t if α < 1 and increasing if α > 1.
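A quick numerical check of the Weibull formulas (a sketch; the parameter values are arbitrary):

    import numpy as np

    def weibull_hazard(t, lam, alpha):
        # gamma(t) = f(t)/Fbar(t) for F(t) = 1 - exp(-lam * t**alpha).
        f = lam * alpha * t**(alpha - 1) * np.exp(-lam * t**alpha)
        F_bar = np.exp(-lam * t**alpha)
        return f / F_bar                               # equals lam*alpha*t**(alpha-1)

    t = np.array([0.5, 1.0, 2.0])
    print(weibull_hazard(t, lam=0.1, alpha=1.0))       # constant 0.1 (exponential case)
    print(weibull_hazard(t, lam=0.1, alpha=0.5))       # decreasing hazard, alpha < 1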
c 2006 (Embrechts, Frey, McNeil) 403

Modelling the Flow of Information


When setting up a dynamic model in finance we have to specify how information is revealed to investors over time. Usually this is done using filtrations. A filtration (F_t) on our underlying probability space (Ω, F, P) is an increasing family {F_t : t ≥ 0} of sub-σ-algebras of F: F_t ⊆ F_s ⊆ F for 0 ≤ t ≤ s < ∞. F_t represents the events an investor can observe at time t.
Here we assume that the only observable quantity is the default history of the firm under consideration. The appropriate filtration is therefore given by (H_t) with

H_t = σ({Y_u : u ≤ t}).    (39)

By construction, τ is an (H_t)-stopping time.

c 2006 (Embrechts, Frey, McNeil) 404

Conditional Expectations
Prices of credit derivatives will be (conditional) expectations wrt (H_t).
Lemma. Let τ be a random time with jump indicator process Y_t = 1_{τ≤t} and associated default history (H_t). Then, for any integrable rv X and any t ≥ 0, we have

E(1_{τ>t} X | H_t) = 1_{τ>t} E(X; τ > t) / P(τ > t).

This gives the following expression for the conditional survival function of τ. Put X := 1_{τ>s}, s > t. We get

P(τ > s | H_t) = E(X | H_t) = E(1_{τ>t} X | H_t) = 1_{τ>t} F̄(s)/F̄(t).
c 2006 (Embrechts, Frey, McNeil) 405

Application to Bond Pricing


Suppose that P represents a risk-neutral measure, and that the default-free interest rate is given by the function r(t). In that case the price at time t of a defaultable zero-coupon bond with maturity T and zero recovery is given by

E( exp(-∫_t^T r(s) ds) 1_{τ>T} | H_t ).    (40)

If τ has hazard rate γ(t), we have F̄(s)/F̄(t) = exp(-∫_t^s γ(u) du). Hence (40) equals 1_{τ>t} exp(-∫_t^T (r(s) + γ(s)) ds).

Remark. The bond price can be viewed as the price of a default-free bond in a model with adjusted interest rate R(t) = r(t) + γ(t). This holds more generally in models where the default time is a doubly stochastic random time.
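With constant r and γ the formula collapses to a single exponential; a sketch with illustrative values:

    import numpy as np

    def defaultable_zcb(t, T, r, gamma, survived=True):
        # Zero-recovery defaultable zero-coupon bond with constant rates:
        # price = 1_{tau > t} * exp(-(r + gamma) * (T - t)).
        return float(survived) * np.exp(-(r + gamma) * (T - t))

    print(defaultable_zcb(0.0, 5.0, r=0.03, gamma=0.007))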
c 2006 (Embrechts, Frey, McNeil) 406

B2. CDS-Pricing and Model Calibration


Approaches to pricing credit risk. The financial pricing approach is nowadays standard for credit derivatives. Here prices are computed as the expected discounted value of the payoff under some risk-neutral measure or equivalent martingale measure; model parameters are calibrated to observed market prices of traded credit products.
The actuarial pricing approach is mostly used for loan pricing. There prices are the sum of the expected discounted payoff under the physical measure (the measure that models the actual probability of default) and a risk premium. The size of the risk premium is often related to the notion of the economic capital of a loan.

c 2006 (Embrechts, Frey, McNeil)

407

Martingale Modelling
When building a model for pricing derivatives, the dynamics of the objects of interest (e.g. interest rates or default times) are often specified directly under a risk-neutral measure Q. This approach is termed martingale modelling. Using risk-neutral pricing, if the value H of an asset at maturity T is exogenously given, its price H_t at time t < T can be computed as the conditional expectation under Q of the discounted payoff. Denote by (r_t) the default-free short rate. We have the following formula:

H_t = E^Q( exp(-∫_t^T r_s ds) H | F_t ).    (41)

Model parameters are determined by equating the prices of traded securities computed via (41) to observed market prices (calibration to market data).
c 2006 (Embrechts, Frey, McNeil) 408

CDS-Pricing and Model Calibration


Following market practice, we consider CDS pricing under the martingale modelling approach in a simple reduced-form model with deterministic risk-neutral hazard rates. This
illustrates the martingale modelling approach;
is practically relevant: since the CDS market is quite liquid, most pricing models for credit derivatives are calibrated to observed CDS spreads.
Our setup. We work directly under a risk-neutral measure Q.
τ has risk-neutral hazard rate γ^Q(t).
δ ∈ (0, 1) gives the deterministic loss given default.
r(t) ≥ 0 denotes the deterministic default-free interest rate; hence p_0(t, T) = exp(-∫_t^T r(s) ds) gives the price of the default-free bond.
c 2006 (Embrechts, Frey, McNeil) 409

Pricing the CDS


Recall the structure of the CDS payments. As a start we price the default payment leg and the premium payment leg separately.
The premium payment leg. Assume a generic swap spread x and ignore for simplicity the accrued payment. The expected discounted value of the premium payments equals

V^Prem(x; γ^Q) = E^Q( Σ_{k=1}^N exp(-∫_0^{t_k} r(u) du) x (t_k - t_{k-1}) 1_{t_k < τ} )
             = x Σ_{k=1}^N p_0(0, t_k)(t_k - t_{k-1}) Q(τ > t_k),

which is easily computed using Q(τ > t_k) = exp(-∫_0^{t_k} γ^Q(s) ds).

c 2006 (Embrechts, Frey, McNeil)
410

Pricing the CDS ctd.


The default payment leg. We obtain

V^Def(γ^Q) = E^Q( δ exp(-∫_0^τ r(u) du) 1_{τ<t_N} ).

Since τ has density f(t) = γ^Q(t) exp(-∫_0^t γ^Q(u) du), defining R(u) := r(u) + γ^Q(u) we get

V^Def(γ^Q) = δ ∫_0^{t_N} e^{-∫_0^t r(s) ds} f(t) dt = δ ∫_0^{t_N} γ^Q(t) e^{-∫_0^t R(s) ds} dt.

Fair swap spread x*. Since there are no initial payments in a CDS, the initial value of the contract is zero. Hence x* is given by the relation V^Prem(x*; γ^Q) = V^Def(γ^Q), which is easily solved for x*. Note that x* depends on the intensity function γ^Q, as V^Prem and V^Def depend on γ^Q.
c 2006 (Embrechts, Frey, McNeil) 411

Calibration to CDS-Spreads
Assume that we observe spreads quoted in the market for one or more CDSs on the same reference entity. In order to calibrate our model we hence have to determine the implied risk-neutral hazard rate function γ^Q which ensures that the fair CDS spreads implied by the model equal the market quotes.
Suppose that the market information consists of the fair spread x* of one CDS with maturity t_N; hence we take γ^Q constant. Then γ^Q has to solve the equation

x* Σ_{k=1}^N p_0(0, t_k)(t_k - t_{k-1}) e^{-γ^Q t_k} = δ γ^Q ∫_0^{t_N} p_0(0, t) e^{-γ^Q t} dt.

Note that the lhs of this equation (premium payments) is decreasing in γ^Q, whereas the rhs (default payments) is increasing in γ^Q, so that a unique solution exists.
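A sketch of this calibration with a constant hazard rate; the quarterly premium dates and the flat 3% interest rate are illustrative assumptions, and scipy's brentq performs the one-dimensional root search.

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    def implied_hazard(x_star, delta, r=0.03, T=5.0, dt=0.25):
        # Solve V_prem(x*, gamma) = V_def(gamma) for a constant hazard gamma.
        tk = np.arange(dt, T + 1e-12, dt)              # premium dates t_1,...,t_N
        p0 = lambda t: np.exp(-r * t)                  # default-free bond price
        prem = lambda g: x_star * np.sum(p0(tk) * dt * np.exp(-g * tk))
        dflt = lambda g: delta * quad(lambda t: g * np.exp(-g * t) * p0(t), 0, T)[0]
        return brentq(lambda g: prem(g) - dflt(g), 1e-8, 1.0)

    gamma_q = implied_hazard(x_star=0.0042, delta=0.6)    # 42 bp index spread
    print(gamma_q, 1 - np.exp(-5 * gamma_q))              # ~0.007 and ~3.4%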
c 2006 (Embrechts, Frey, McNeil) 412

Calibration to CDS-Spreads ctd


Example. Consider the 5-year iTraxx EUR from August 4, 2004. The index level (quoted average spread) was 42 bp. Assuming a constant LGD δ = 60% and a homogeneous portfolio, this leads to γ^Q = 0.007 and hence to a 5-year risk-neutral default probability Q(τ ≤ 5) = 1 - e^{-5γ^Q} ≈ 3.44%.
Comments. 1) If we observe spreads for several CDSs on the same reference entity but with different maturities, a time-dependent (e.g. piecewise constant or piecewise linear) risk-neutral hazard rate has to be used.
2) A good first approximation for implied hazard rates is given by γ^Q ≈ x*/δ. This approximation is frequently used in practice.

c 2006 (Embrechts, Frey, McNeil)

413

References
Our presentation is based on Sections 9.2.1 and 9.3 of [McNeil et al., 2005]. For more information on doubly stochastic random times see e.g. Sections 9.2.3 and 9.4 of that volume or [Lando, 1998] and [Lando, 2004].

c 2006 (Embrechts, Frey, McNeil)

414

C. Reduced-Form Models for Credit Portfolios


1. Introduction
2. Factor Copula Models
3. Default Contagion in Reduced-Form Models
4. Pricing of CDOs
5. CDOs: Explaining Market Prices

c 2006 (Embrechts, Frey, McNeil)

415

C1. Introduction
Existing reduced-form models for credit portfolios can be divided into the following model classes:
Models with conditionally independent defaults, such as [Duffie and Singleton, 1999], [Lando, 1998]. Easy to treat; in particular, similar valuation formulas for credit derivatives as in default-free term-structure models; no default contagion.
Copula models, such as [Li, 2001], [Schönbucher and Schubert, 2001]. Easily calibrated to the defaultable term structure; allow for default contagion. Main drawback: in general copula models there is a fairly unintuitive parametrization of dependence.

c 2006 (Embrechts, Frey, McNeil)

416

Reduced Form Models for Credit Portfolios ctd.


Factor copula models, such as [Laurent and Gregory, 2003], [Schönbucher, 2003] or [Hull and White, 2004]. Special case of copula models, where the copula has a factor structure. Here contagion can be interpreted in terms of incomplete information on unobservable factors. These models have become the market standard.
Models with interacting intensities, where default contagion is explicitly modelled, often using Markov chains. Examples include [Jarrow and Yu, 2001], [Davis and Lo, 2001] and some of our own work.
We concentrate mainly on factor copula models; they are practically relevant and a natural extension of static threshold models with factor structure. Our presentation follows Sections 9.7 and 9.8 of [McNeil et al., 2005].
c 2006 (Embrechts, Frey, McNeil) 417

Basic Concepts and Notation


Consider m firms with default times τ_i and default indicator process Y_t = (Y_{t,1}, ..., Y_{t,m})' with Y_{t,i} = 1_{τ_i≤t}.
F̄_i(t) = P(τ_i > t) is the survival function of obligor i; the joint survival function is F̄(t_1, ..., t_m) = P(τ_1 > t_1, ..., τ_m > t_m).
The ordered default times are denoted by T_0 < T_1 < · · · < T_m; ξ_n ∈ {1, ..., m} gives the identity of the firm defaulting at time T_n.
Filtrations. H_t^i = σ({Y_{s,i} : s ≤ t}) is the default history of firm i. H_t = σ({Y_{s,i} : s ≤ t, 1 ≤ i ≤ m}) = σ({(T_n, ξ_n) : T_n ≤ t}) is the default history of the portfolio. (H_t) is often called the internal filtration of (Y_t).

c 2006 (Embrechts, Frey, McNeil)

418

C2. Factor Copula Models


Copulas in a nutshell. A copula is a df C on [0, 1]^m with uniform margins.
Copulas and dependence structure. If a multivariate df F has continuous margins F_1, ..., F_m and X ~ F, the copula C of F is the df of (F_1(X_1), ..., F_m(X_m)), and we have Sklar's identity F(x_1, ..., x_m) = C(F_1(x_1), ..., F_m(x_m)).
Survival copulas. Similarly, the survival function of X can be written as F̄(x_1, ..., x_m) = Ĉ(F̄_1(x_1), ..., F̄_m(x_m)), where the survival copula Ĉ is defined by Ĉ(u_1, ..., u_m) = C̄(1 - u_1, ..., 1 - u_m), with C̄ the survival function of C.
Example. The Gauss copula C_P^Ga is the copula of X ~ N_m(0, P), P a correlation matrix. By the symmetry of N_m(0, P), Ĉ_P^Ga = C_P^Ga.
c 2006 (Embrechts, Frey, McNeil) 419

Copula Models for Default Times


In copula models the marginal distributions at t = 0 and the survival copula (denoted Ĉ) of the vector of default times (τ_1, ..., τ_m) are specified. Hence the survival function of the default times is given by

F̄(t_1, ..., t_m) = Ĉ(F̄_1(t_1), ..., F̄_m(t_m)).    (42)

Comments. 1) Specifying the dependence structure Ĉ and the marginal distributions F̄_i separately is useful for calibration. The model is calibrated to a given term structure of single-name CDS spreads by specifying the F̄_i; calibration of the dependence structure (i.e. Ĉ) can then be done independently.
2) Typically F̄_i is written as F̄_i(t) = exp(-∫_0^t γ_i(s) ds), where γ_i(s) = f_i(s)/F̄_i(s) is the marginal hazard rate.
c 2006 (Embrechts, Frey, McNeil) 420

Factor Copula Models


Consider a copula C and U ~ C. Suppose that there is a p-dimensional random vector V with p < m such that, conditional on V, the U_i are independent. In that case the model is termed a factor copula model. The survival function has the following form:

F̄(t_1, ..., t_m) = P(U_1 ≤ F̄_1(t_1), ..., U_m ≤ F̄_m(t_m))
                = E( Π_{i=1}^m P(U_i ≤ F̄_i(t_i) | V) ) =: E( Π_{i=1}^m F̄_{i|V}(t_i | V) ),    (43)

where F̄_{i|V}(t | v) := P(U_i ≤ F̄_i(t) | V = v) is the conditional survival function of τ_i given V.


c 2006 (Embrechts, Frey, McNeil) 421

Mixture Representation of Survival Function


Equation (43) is termed the mixture representation of factor copula models. Denote by g_V the density of V. We sometimes write (43) more explicitly as

F̄(t_1, ..., t_m) = ∫_{R^p} Π_{i=1}^m F̄_{i|V}(t_i | v) g_V(v) dv.

Comments. 1) This is similar to the representation of static threshold models as Bernoulli mixture models. In particular, for fixed T, Y_T follows a Bernoulli mixture model with factor vector V and conditional default probabilities Q_{T,i}(v) = 1 - F̄_{i|V}(T | v).
2) The latent V is sometimes termed the frailty of the default times.
3) The mixture representation is very useful for simulation and pricing.
c 2006 (Embrechts, Frey, McNeil) 422

Simulation of factor copula models


We have the following algorithm:
1. Simulate a realization of V.
2. Simulate independent rvs τ_i with df 1 - F̄_{i|V}(t | V), 1 ≤ i ≤ m.
Comment. Importance sampling techniques discussed in the context of static Bernoulli mixture models can be employed to speed up simulations. This is particularly useful for rare-event simulation, as in the pricing of CDO tranches with high attachment points.
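A sketch of this algorithm for the exchangeable one-factor Gauss copula treated on the next slides, with a constant marginal hazard rate; all parameter values are illustrative.

    import numpy as np
    from scipy.stats import norm

    def simulate_default_times(m, rho, gamma, n_sims, seed=0):
        # One-factor Gauss copula with marginal survival Fbar(t) = exp(-gamma*t):
        # step 1 draws the factor V, step 2 the conditionally independent taus.
        rng = np.random.default_rng(seed)
        V = rng.standard_normal((n_sims, 1))
        eps = rng.standard_normal((n_sims, m))
        X = np.sqrt(rho) * V + np.sqrt(1 - rho) * eps
        U = norm.cdf(X)                        # U has the Gauss copula
        return -np.log(U) / gamma              # tau solves Fbar(tau) = U

    tau = simulate_default_times(m=125, rho=0.2, gamma=0.007, n_sims=10000)
    print((tau <= 5).mean())                   # average 5-year default frequency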

c 2006 (Embrechts, Frey, McNeil)

423

One-Factor Gauss Copula Model


We compute the mixture representation for a one-factor Gauss copula model; this model is frequently used in practice.
Representation of the copula. Let X_i = √ρ_i V + √(1 - ρ_i) ε_i, ρ_i ∈ (0, 1), with V, (ε_i)_{1≤i≤m} iid standard normal rvs; hence X ~ N_m(0, P), P a correlation matrix with (i, j)-th element ρ_{ij} = √(ρ_i ρ_j). Set U_i = Φ(X_i), so that U ~ C_P^Ga.
Corresponding survival function. Put d_i(t) := Φ^{-1}(F̄_i(t)). Then

F̄_{i|V}(t | v) = P(U_i ≤ F̄_i(t) | V = v) = Φ( (d_i(t) - √ρ_i v) / √(1 - ρ_i) ),

and hence

F̄(t_1, ..., t_m) = (1/√(2π)) ∫_R Π_{i=1}^m Φ( (d_i(t_i) - √ρ_i v) / √(1 - ρ_i) ) e^{-v²/2} dv.

c 2006 (Embrechts, Frey, McNeil)
424

One-Factor Gauss Copula Model ctd


Exchangeable special case. If ρ_i ≡ ρ ∈ [0, 1) for all firms, the copula is exchangeable. In that case P is an equicorrelation matrix with off-diagonal element ρ, and the survival function becomes

F̄(t_1, ..., t_m) = (1/√(2π)) ∫_R Π_{i=1}^m Φ( (d_i(t_i) - √ρ v) / √(1 - ρ) ) e^{-v²/2} dv.

In this model default occurs before T if X_i > d_i(T) or, equivalently, -X_i < -d_i(T). The vector -X is often interpreted as an asset-return vector; (-d_1(T), ..., -d_m(T)) are the corresponding default thresholds. Since -X ~ N_m(0, P) as well, ρ is often interpreted as an asset correlation.
Remark. Similar computations are possible for copulas of other normal mean-variance mixtures such as the t copula.
c 2006 (Embrechts, Frey, McNeil) 425

C3. Default contagion in reduced-form models


Recall that we speak of default contagion if the conditional default probability of some firm is increased given the default of other firms. The modelling of default contagion has recently attracted a lot of attention.
Default contagion in reduced-form models is best discussed in terms of default intensities. The default intensity λ_{t,i} of firm i at time t is the instantaneous chance of default given the default history (H_t) of all firms in the portfolio, i.e.

λ_{t,i} = ∂/∂T P(τ_i ≤ T | H_t) |_{T=t}.

It can be shown that Y_{t,i} - ∫_0^{t∧τ_i} λ_{s,i} ds is a martingale wrt (H_t).
c 2006 (Embrechts, Frey, McNeil) 426

Default contagion in factor copula models


In a factor copula model V is unobservable if information is restricted to the default history (H_t). New default information leads to an updating of the conditional distribution G_{V|H_t} and hence to changing default intensities.

[Figure: simulated path of the default intensity λ (range 0.010-0.030) over one year, for default correlations of 2% and 0.5%.]

A trajectory of the default intensity for different default correlations in a typical factor copula model, assuming T_1 = 4 months.
c 2006 (Embrechts, Frey, McNeil) 427

Models with Interacting Intensities


Basic idea. Default contagion is explicitly modelled. The default intensity is modelled as a function λ_i(t, Y_t) of time and of the default state Y_t of the portfolio at time t. (Extension to stochastic state variables is possible.)
Advantage. Intuitive and explicit parametrization of the dependence between defaults; Markov process techniques are available for the analysis and simulation of the model.
Disadvantage. Calibration to the term structure of defaultable bonds or CDSs is more difficult than with copula models, as the marginal distribution of the default times is typically not available in closed form.

c 2006 (Embrechts, Frey, McNeil)

428

Construction via Markov Chains.


A model with interacting intensities is conveniently defined as a time-inhomogeneous Markov chain with state space S = {0, 1}^m and transition rate functions (from y to x)

λ(t, y, x) = 1_{y_i=0} λ_i(t, y)  if x = y^i for some i ∈ {1, ..., m},  and 0 else,    (44)

where y^i ∈ S is obtained from y ∈ S by flipping the ith coordinate.
Interpretation. The chain can jump only to neighbouring states, which differ from the current state Y_t by exactly one default; if Y_{t,i} = 0, the probability of a jump in [t, t + h) to state Y_t^i (default of firm i) is approximately h λ_i(t, Y_t).
c 2006 (Embrechts, Frey, McNeil) 429

Modelling Default Intensities


The default intensities λ_i(t, y) are the essential ingredient of the model.
[Jarrow and Yu, 2001]: primary-secondary framework.
[Frey and Backhaus, 2004]: homogeneous group model with

λ_i(t, Y_t) = h(t, M(Y_t)),  where  M(y) = Σ_{i=1}^m y_i.    (45)

In exchangeable models λ_i(t, Y_t) is necessarily of this form. Moreover, there is a natural interpretation in terms of mean-field interaction. An extension to a model with several groups is possible.
[Yu, 2005] claims that h(t, l) = 0.01 + 0.001 · 1_{l>0} is a reasonable model for European telecom bonds.
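A sketch simulating the homogeneous-group model by drawing successive exponential waiting times; this is exact here because the [Yu, 2005] intensity is constant in t between defaults, and the portfolio size is an illustrative choice.

    import numpy as np

    def simulate_defaults(m, h, T, seed=0):
        # lambda_i(t, Y_t) = h(t, M(Y_t)): total jump rate is (#alive) * h.
        rng = np.random.default_rng(seed)
        alive, t, times = m, 0.0, []
        while alive > 0:
            rate = alive * h(t, m - alive)
            t += rng.exponential(1.0 / rate)   # waiting time to the next default
            if t > T:
                break
            times.append(t)
            alive -= 1
        return times

    h = lambda t, l: 0.01 + 0.001 * (l > 0)    # h(t, l) as in [Yu, 2005]
    print(simulate_defaults(m=10, h=h, T=5.0))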
c 2006 (Embrechts, Frey, McNeil) 430

C4. Pricing of CDOs


Overview. In practice the prices of portfolio derivatives are mostly computed using Monte Carlo. Explicit pricing formulas exist for first-to-default swaps. In factor copula models there are semianalytic pricing formulas for CDOs. These exploit the mixture representation and the conditional independence of the default times.
Our setup. We consider a factor copula model for (Y_t). Throughout we assume that the model has been set up under an equivalent martingale measure Q and that the risk-free interest rate r and the LGDs δ_i are deterministic; D(t) = exp(-∫_0^t r(s) ds) is the default-free discount factor.
c 2006 (Embrechts, Frey, McNeil) 431

Synthetic CDOs: Payment Description


Notation. Consider a portfolio of m loans with nominals e_i, relative LGDs δ_i and default indicator process (Y_t). The cumulative loss of the portfolio at t is given by L_t = Σ_{i=1}^m δ_i e_i Y_{t,i}.
The CDO. Maturity T. We have k tranches, characterized by attachment points 0 = K_0 < K_1 < · · · < K_k ≤ Σ_{i=1}^m e_i. The notional of tranche γ at time t is given by N_γ(t) = f_γ(L_t) with

f_γ(l) = K_γ - K_{γ-1}  for l < K_{γ-1},
         K_γ - l        for l ∈ [K_{γ-1}, K_γ],
         0              for l > K_γ.

Note that f_γ(l) = (K_γ - l)^+ - (K_{γ-1} - l)^+ (a put spread with strike prices K_γ and K_{γ-1}).
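The put-spread representation of the tranche notional, as a two-line sketch:

    import numpy as np

    def tranche_notional(L, K_lo, K_hi):
        # N_gamma(t) = (K_hi - L)^+ - (K_lo - L)^+ for cumulative loss L.
        return np.maximum(K_hi - L, 0.0) - np.maximum(K_lo - L, 0.0)

    print(tranche_notional(np.array([0.0, 25.0, 50.0]), K_lo=20.0, K_hi=40.0))
    # -> [20. 15.  0.]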
c 2006 (Embrechts, Frey, McNeil) 432

A stylized Example
Stylized CDO. We assume that the payoff of tranche γ is simply given by N_γ(T), the value of the notional at maturity. Real CDOs are more complicated, as there is intermediate income, but the stylized example retains the essential features.
Impact of default dependence. More dependence, same marginal default probabilities ⇒ the equity tranche increases in value, senior tranches decrease in value. The impact on mezzanine tranches is unclear. These qualitative properties carry over to the more complex structures actually traded.

c 2006 (Embrechts, Frey, McNeil)

433

Default Correlation and CDO Tranches


[Figure: payoff of the equity, mezzanine and senior tranches as a function of the cumulative loss L, overlayed with two loss distributions (dependence vs. independence).]

Payoff of a stylized CDO with attachment points at 20, 40 and 60 with two different loss distributions overlayed.

c 2006 (Embrechts, Frey, McNeil) 434

Payments of a Synthetic CDO


Consider a CDO with attachment points K_0 < · · · < K_k and the notional of tranche γ given by N_γ(t) = (K_γ - L_t)^+ - (K_{γ-1} - L_t)^+; define the cumulative loss of tranche γ as L_γ(t) := N_γ(0) - N_γ(t).
Default payments of the CDO. The default payment of tranche γ at the nth default time T_n < T is given by ΔL_γ(T_n) = L_γ(T_n) - L_γ(T_{n-1}) (the part of the cumulative loss at T_n falling in the layer [K_{γ-1}, K_γ]).
Protection fee or premium payments. The holder of tranche γ receives periodic premium payments at 0 < t_1 < · · · < t_N = T of size x_γ^CDO (t_n - t_{n-1}) N_γ(t_n). No initial payments. x_γ^CDO is called the (fair) CDO spread.

c 2006 (Embrechts, Frey, McNeil)

435

Synthetic CDOs: Pricing


Using partial integration we obtain for the value of the default payments of tranche γ

V_γ^def = E^Q( ∫_0^T D(t) dL_γ(t) ) = D(T) E^Q(L_γ(T)) + ∫_0^T r(t) D(t) E^Q(L_γ(t)) dt.

As L_γ(t) is a function of L_t, this can be computed by one-dimensional integration if we know the distribution of L_t. The premium payments can also be expressed in terms of L_t.

c 2006 (Embrechts, Frey, McNeil)

436

Computing the distribution of Lt


In factor copula models the distribution of L_t can be determined using the mixture representation. Consider first the simplest case of a homogeneous model (identical exposures, LGDs and conditional default probabilities). Denote by Q_t(v) = 1 - F̄_{1|V}(t | v) the conditional probability of default before t given V = v. Then L_t = δ e M_t, where M_t = Σ_{i=1}^m Y_{t,i} is conditionally binomial with parameters m and Q_t(v). The unconditional df of L_t is then determined by integrating out the factors.
For extensions to inhomogeneous portfolios or stochastic LGD we refer to the literature. [Laurent and Gregory, 2003] suggest the use of Fourier inversion techniques; [Hull and White, 2004] develop (approximate) recursion schemes.
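A sketch computing the distribution of M_t in the exchangeable one-factor Gauss model by integrating the conditional binomial distribution over the factor (Gauss-Hermite quadrature; the conditional default probability is written in the standard Vasicek form, which is equivalent to the formula of the previous slides by the symmetry of V). All parameter values are illustrative.

    import numpy as np
    from scipy.stats import binom, norm

    def loss_count_distribution(m, rho, p_marg, n_nodes=64):
        # P(M_t = j), j = 0..m; p_marg is the marginal default probability before t.
        d = norm.ppf(p_marg)                             # default threshold
        v, w = np.polynomial.hermite_e.hermegauss(n_nodes)
        w = w / np.sqrt(2 * np.pi)                       # weights for E[f(V)], V ~ N(0,1)
        Q = norm.cdf((d - np.sqrt(rho) * v) / np.sqrt(1 - rho))   # Q_t(v)
        j = np.arange(m + 1)
        return binom.pmf(j[:, None], m, Q[None, :]) @ w  # integrate out the factor

    pmf = loss_count_distribution(m=125, rho=0.2, p_marg=0.0344)
    print(pmf.sum(), (pmf * np.arange(126)).sum() / 125)  # ~1 and ~0.0344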
c 2006 (Embrechts, Frey, McNeil) 437

C5. Explaining Market Data


The financial industry has developed CDS indices for a variety of sectors; spreads for CDO tranches on these indices are available.
Observed CDS spreads. Consider the 5-year iTraxx EUR from August 4, 2004. The index level was 42 bp. Assuming δ = 60% and a homogeneous portfolio, this leads to a risk-neutral marginal hazard rate γ^Q = 0.007 and hence to a 5-year default probability of 3.44%.
Observed spreads for CDO tranches. On the market we observed the following tranche prices (taken from [Hull and White, 2004]).

Tranche   [0,3]    [3,6]    [6,9]    [9,12]   [12,22]
Quote     27.6%    1.68%    0.70%    0.43%    0.20%

This means that the holder of the [0,3] tranche gets 27.6% of the notional upfront and 5% of the outstanding notional per year.

c 2006 (Embrechts, Frey, McNeil) 438

Gaussian Copula and Market Quotes


Following the market standard, we try to reproduce the observed prices with the Gaussian copula model and an appropriate correlation parameter ρ.

Tranche           [0,3]    [3,6]    [6,9]    [9,12]   [12,22]
Market quote      27.6%    1.68%    0.70%    0.43%    0.20%
Gauss ρ = 21.9%   27.6%    2.95%    1.05%    0.42%    0.09%
Gauss ρ = 4.2%    43.1%    1.68%    0.10%    0.005%   6·10⁻⁵%
Gauss ρ = 87.9%   n/a      1.68%    1.35%    1.14%    0.87%
Gauss ρ = 14.8%   33.2%    2.69%    0.70%    0.20%    0.02%
Gauss ρ = 22.3%   27.3%    2.96%    1.07%    0.43%    0.09%
Gauss ρ = 30.5%   21.6%    3.05%    1.35%    0.67%    0.20%

A different correlation parameter is needed to explain the price of each tranche. For mezzanine tranches ρ is not unique.
c 2006 (Embrechts, Frey, McNeil) 439

Implied Tranche- and Base Correlation


The implied tranche correlation is the value of ρ in a homogeneous Gaussian copula model leading to the observed tranche price (generally not uniquely defined).
The implied base correlation is the value of ρ explaining the price (spread) of an equity tranche with the corresponding attachment point ([0,3], [0,6], [0,9], . . . ). Moreover, (hypothetical) prices of equity tranches can be computed recursively from observed prices of CDO tranches.
Unlike tranche correlations, base correlations are generally uniquely defined. However, they are more difficult to interpret than tranche correlations.

c 2006 (Embrechts, Frey, McNeil)

440

Computing Base Correlation from CDO-Spreads


Consider an example with two tranches, say [0, 3] and [3, 6], with observed CDO spreads x*_1 and x*_2.
Step 1. The spread of the equity tranche is decreasing in ρ ⇒ there is a unique value ρ_1 of ρ explaining the observed spread x*_1.
Step 2 (price of the [0, 6] tranche). The payoff of the [0, 6] tranche with spread x*_2 equals the sum of the payoffs of the [0, 3] tranche with spread x*_2 and of the [3, 6] tranche with spread x*_2. By the definition of x*_2, the price of the [3, 6] tranche with spread x*_2 is zero. Hence the price of the [0, 6] tranche with spread x*_2 is equal to the price of the [0, 3] tranche with spread x*_2; the price of the [0, 3] tranche is computable using the Gauss copula model with ρ_1.
Step 3. Find the asset correlation ρ_2 explaining the price of the [0, 6] tranche with spread x*_2.
c 2006 (Embrechts, Frey, McNeil) 441

Implied Tranche- and Base Correlation


In our example we have the following values for the tranche and base correlations.

Tranche   Tranche correlation   Base correlation
[0,3]     22.4%                 22.4%
[3,6]     4.2%                  32.1%
[6,9]     14.8%                 38.8%
[9,12]    22.3%                 43.3%
[12,22]   30.5%                 57.0%

[Figure: implied tranche and base correlations (in %) plotted against the upper attachment point.]

This is a typical pattern of tranche and base correlations, called a base correlation skew. In particular, a model based on the Gauss copula cannot explain all prices simultaneously.
c 2006 (Embrechts, Frey, McNeil) 442

Base Correlation Skews


Informal explanations:
High risk premia for low-probability events such as a payout on the default leg of a senior tranche.
Market segmentation (not very convincing).
Model risk and oversimplification of the Gaussian copula model.
Modelling attempts:
Alternative copulas; see e.g. [Burtschell et al., 2005].
Random recovery rates, negatively correlated with default probabilities (see e.g. [Andersen and Sidenius, 2004]).
Alternative models such as models with interacting intensities or common shock models.
c 2006 (Embrechts, Frey, McNeil) 443

Bibliography
[Abramowitz and Stegun, 1965] Abramowitz, M. and Stegun, I., editors (1965). Handbook of Mathematical Functions. Dover Publications, New York.
[Acerbi and Tasche, 2002] Acerbi, C. and Tasche, D. (2002). On the coherence of expected shortfall. J. Banking Finance, 26:1487-1503.
[Andersen and Sidenius, 2004] Andersen, L. and Sidenius, J. (2004). Extensions to the Gaussian copula: Random recovery and random factor loadings. Journal of Credit Risk, 1:29-70.
[Atkinson, 1982] Atkinson, A. (1982). The simulation of generalized inverse Gaussian and hyperbolic random variables. SIAM J. Sci. Comput., 3(4):502-515.
c 2006 (Embrechts, Frey, McNeil) 444

[Balkema and de Haan, 1974] Balkema, A. and de Haan, L. (1974). Residual life time at great age. Ann. Probab., 2:792-804.
[Barndorff-Nielsen, 1997] Barndorff-Nielsen, O. (1997). Normal inverse Gaussian distributions and stochastic volatility modelling. Scand. J. Statist., 24:1-13.
[Barndorff-Nielsen and Shephard, 1998] Barndorff-Nielsen, O. and Shephard, N. (1998). Aggregation and model construction for volatility models. Preprint, Center for Analytical Finance, University of Aarhus.
[Black and Scholes, 1973] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. J. Polit. Economy, 81(3):637-654.
c 2006 (Embrechts, Frey, McNeil) 445

[Bluhm, 2003] Bluhm, C. (2003). CDO modeling: techniques, examples and applications. Preprint, HVB Group, Munich.
[Bluhm et al., 2002] Bluhm, C., Overbeck, L., and Wagner, C. (2002). An Introduction to Credit Risk Modeling. CRC Press/Chapman & Hall, London.
[Cherubini et al., 2004] Cherubini, U., Luciano, E., and Vecchiato, W. (2004). Copula Methods in Finance. Wiley, Chichester.
[Clayton, 1996] Clayton, D. (1996). Generalized linear mixed models. In Gilks, W., Richardson, S., and Spiegelhalter, D., editors, Markov Chain Monte Carlo in Practice, pages 275-301. Chapman & Hall, London.
[Crosbie and Bohn, 2002] Crosbie, P. and Bohn, J. (2002). Modeling default risk. Technical document, Moody's/KMV, New York.
c 2006 (Embrechts, Frey, McNeil) 446

[Crouhy et al., 2000] Crouhy, M., Galai, D., and Mark, R. (2000). A comparative analysis of current credit risk models. J. Banking Finance, 24:59-117.
[Crouhy et al., 2001] Crouhy, M., Galai, D., and Mark, R. (2001). Risk Management. McGraw-Hill, New York.
[Daley and Vere-Jones, 2003] Daley, D. and Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, volume I: Elementary Theory and Methods. Springer, New York, 2nd edition.
[Daul et al., 2003] Daul, S., De Giorgi, E., Lindskog, F., and McNeil, A. (2003). The grouped t-copula with an application to credit risk. Risk, 16(11):73-76.
[Davis and Lo, 2001] Davis, M. and Lo, V. (2001). Infectious defaults. Quant. Finance, 1(4):382-387.
c 2006 (Embrechts, Frey, McNeil) 447

[Duffie and Singleton, 1999] Duffie, D. and Singleton, K. (1999). Modeling term structures of defaultable bonds. Rev. Finan. Stud., 12:687-720.
[Eberlein and Keller, 1995] Eberlein, E. and Keller, U. (1995). Hyperbolic distributions in finance. Bernoulli, 1:281-299.
[Eberlein et al., 1998] Eberlein, E., Keller, U., and Prause, K. (1998). New insights into smile, mispricing, and value at risk: the hyperbolic model. J. Bus., 38:371-405.
[Embrechts et al., 1997] Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer, Berlin.
[Embrechts et al., 2002] Embrechts, P., McNeil, A., and Straumann, D. (2002). Correlation and dependency in risk management:
c 2006 (Embrechts, Frey, McNeil) 448

properties and pitfalls. In Dempster, M., editor, Risk Management: Value at Risk and Beyond, pages 176-223. Cambridge University Press, Cambridge.
[Fang et al., 1990] Fang, K.-T., Kotz, S., and Ng, K.-W. (1990). Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
[Fisher and Tippett, 1928] Fisher, R. and Tippett, L. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Phil. Soc., 24:180-190.
[Frey and Backhaus, 2004] Frey, R. and Backhaus, J. (2004). Portfolio credit risk models with interacting default intensities: a Markovian approach. Preprint, University of Leipzig.
[Frey and McNeil, 2002] Frey, R. and McNeil, A. (2002). VaR and
c 2006 (Embrechts, Frey, McNeil) 449

expected shortfall in portfolios of dependent credit risks: Conceptual and practical insights. J. Banking Finance, pages 1317-1344.
[Frey and McNeil, 2003] Frey, R. and McNeil, A. (2003). Dependent defaults in models of portfolio credit risk. J. Risk, 6(1):59-92.
[Genest and Rivest, 1993] Genest, C. and Rivest, L. (1993). Statistical inference procedures for bivariate Archimedean copulas. J. Amer. Statist. Assoc., 88:1034-1043.
[Glasserman and Li, 2003] Glasserman, P. and Li, J. (2003). Importance sampling for portfolio credit risk. Preprint, Columbia Business School.
[Gnedenko, 1943] Gnedenko, B. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Ann. of Math., 44:423-453.
c 2006 (Embrechts, Frey, McNeil) 450

[Gordy, 2003] Gordy, M. (2003). A risk-factor model foundation for ratings-based capital rules. J. Finan. Intermediation, 12(3):199-232.
[Greenspan, 2002] Greenspan, A. (2002). Speech before the Council on Foreign Relations. In International Financial Risk Management, Washington, D.C., 19th November.
[Hawkes, 1971] Hawkes, A. (1971). Point spectra of some mutually exciting point processes. J. R. Stat. Soc. Ser. B Stat. Methodol., 33:438-443.
[Hull and White, 2004] Hull, J. and White, A. (2004). Valuation of a CDO and an nth to default CDS without Monte Carlo simulation. J. Derivatives, 12:8-23.
[Jarrow and Yu, 2001] Jarrow, R. and Yu, F. (2001). Counterparty risk and the pricing of defaultable securities. J. Finance, 53:2225-2243.
c 2006 (Embrechts, Frey, McNeil) 451

[Joe, 1997] Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.
[Kealhofer and Bohn, 2001] Kealhofer, S. and Bohn, J. (2001). Portfolio management of default risk. Technical document, Moody's/KMV, New York.
[Lando, 1998] Lando, D. (1998). Cox processes and credit risky securities. Rev. Derivatives Res., 2:99-120.
[Lando, 2004] Lando, D. (2004). Credit Risk Modeling: Theory and Applications. Princeton University Press, Princeton.
[Laurent and Gregory, 2003] Laurent, J. and Gregory, J. (2003). Basket default swaps, CDOs and factor copulas. Preprint, University of Lyon and BNP Paribas.
c 2006 (Embrechts, Frey, McNeil) 452

[Li, 2001] Li, D. (2001). On default correlation: a copula function approach. J. of Fixed Income, 9:43-54.
[Marshall and Olkin, 1988] Marshall, A. and Olkin, I. (1988). Families of multivariate distributions. J. Amer. Statist. Assoc., 83:834-841.
[McNeil, 1998] McNeil, A. (1998). History repeating. Risk, 11(1):99.
[McNeil et al., 2005] McNeil, A., Frey, R., and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton.
[McNeil and Wendin, 2003] McNeil, A. and Wendin, J. (2003). Generalised linear mixed models in portfolio credit risk modelling. Preprint, ETH Zurich.
c 2006 (Embrechts, Frey, McNeil) 453

[Merton, 1974] Merton, R. (1974). On the pricing of corporate debt: The risk structure of interest rates. J. Finance, 29:449-470.
[Nelsen, 1999] Nelsen, R. (1999). An Introduction to Copulas. Springer, New York.
[Ogata, 1988] Ogata, Y. (1988). Statistical models for earthquake occurrences and residuals analysis for point processes. J. Amer. Statist. Assoc., 83:9-27.
[Pickands, 1975] Pickands, J. (1975). Statistical inference using extreme order statistics. Ann. Statist., 3:119-131.
[Prause, 1999] Prause, K. (1999). The generalized hyperbolic model: estimation, financial derivatives and risk measures. PhD thesis, Institut für Mathematische Statistik, Albert-Ludwigs-Universität Freiburg.
c 2006 (Embrechts, Frey, McNeil) 454

[Reiss and Thomas, 1997] Reiss, R.-D. and Thomas, M. (1997). Statistical Analysis of Extreme Values. Birkhäuser, Basel.
[RiskMetrics-Group, 1997] RiskMetrics-Group (1997). CreditMetrics technical document.
[Robert and Casella, 1999] Robert, C. and Casella, G. (1999). Monte Carlo Statistical Methods. Springer, New York.
[Scholes, 2000] Scholes, M. (2000). Crisis and risk management. Amer. Econ. Rev., pages 17-22.
[Schönbucher, 2003] Schönbucher, P. (2003). Credit Derivatives Pricing Models. Wiley.
[Schönbucher and Schubert, 2001] Schönbucher, P. and Schubert, D. (2001). Copula-dependent default risk in intensity models. Preprint, Universität Bonn.

c 2006 (Embrechts, Frey, McNeil) 455

[Smith, 1987] Smith, R. (1987). Estimating tails of probability distributions. Ann. Statist., 15:1174-1207.
[Smith, 1989] Smith, R. (1989). Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone. Statist. Sci., 4:367-393.
[Steinherr, 1998] Steinherr, A. (1998). Derivatives. The Wild Beast of Finance. Wiley, New York.
[Tavakoli, 2001] Tavakoli, J. (2001). Credit Derivatives and Synthetic Structures: A Guide to Investments and Applications. Wiley, New York, 2nd edition.
c 2006 (Embrechts, Frey, McNeil) 456

[Vasicek, 1997] Vasicek, O. (1997). The loan loss distribution. Preprint, KMV Corporation.

[Yu, 2005] Yu, F. (2005). Correlated defaults and the valuation of defaultable securities. Math. Finance. To appear.

c 2006 (Embrechts, Frey, McNeil)

457
