A. Patton
FN3142
2015
Undergraduate study in
Economics, Management,
Finance and the Social Sciences
This is an extract from a subject guide for an undergraduate course offered as part of the
University of London International Programmes in Economics, Management, Finance and
the Social Sciences. Materials for these programmes are developed by academics at the
London School of Economics and Political Science (LSE).
For more information, see: www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
A. Patton, Department of Economics, Duke University.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the author is unable to enter into any correspondence relating to, or arising
from, the guide. If you have any comments on this subject guide, favourable or unfavourable,
please use the form at the back of this guide.
Contents
1 Introduction 1
1.1 Route map to the guide . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Why study quantitative finance? . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Syllabus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Aims of the course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Learning outcomes for the course . . . . . . . . . . . . . . . . . . . . . . 2
1.6 Overview of learning resources . . . . . . . . . . . . . . . . . . . . . . . . 3
1.6.1 The subject guide . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.6.2 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.6.3 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.6.4 Online study resources . . . . . . . . . . . . . . . . . . . . . . . . 4
1.7 The structure of the subject guide . . . . . . . . . . . . . . . . . . . . . . 5
1.8 Examination advice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 ARMA processes 41
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Autoregressive-moving average (ARMA) processes . . . . . . . . . . . . . 42
4.2.1 Autoregressive (AR) processes . . . . . . . . . . . . . . . . . . . . 42
4.2.2 The MA(1) process . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.3 Moving average (MA) processes . . . . . . . . . . . . . . . . . . . 43
4.2.4 ARMA processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Autocovariance functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Predictability, R2 and ARMA processes . . . . . . . . . . . . . . . . . . . 47
4.5 Choosing the best ARMA model . . . . . . . . . . . . . . . . . . . . . . 48
4.6 Overview of chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.7 Reminder of learning outcomes . . . . . . . . . . . . . . . . . . . . . . . 50
Chapter 1
Introduction
This subject guide is designed to help you understand, and eventually master, the
material to be covered in the final examination of FN3142 Quantitative finance.
This material is generally technical in nature, and the best way to learn it is to work
through all of the activities and derivations in this subject guide and the accompanying
readings. This is not a passive course! Merely reading this subject guide is not enough –
you need to be willing to devote time to solving the numerous practice questions and
problems presented here. Be sure to check the VLE for additional practice questions
and discussion. Solutions for the activities are presented at the end of each chapter, to
help you learn some ‘tricks’ for answering these questions. The ‘test your knowledge’
questions at the end of each chapter have no solutions – you need to try to solve those
questions for yourself, and then convince yourself that you have done it correctly (and
then perhaps compare your answers with a friend or classmate).
1.3 Syllabus
Building on concepts introduced in FN3092 Corporate finance and EC2020
Elements of econometrics, this course introduces econometric tools related to
time-series analysis and applies them to study issues in asset pricing, investment theory,
risk analysis and management, market microstructure, and return forecasting.
Topics addressed by this course are:
Time-series analysis
Risk management
understand some of the practical issues in the forecasting of key financial market
variables, such as asset prices, risk and dependence.
1.6. Overview of learning resources
Tsay, R.S., Analysis of Financial Time Series. (John Wiley & Sons, New Jersey,
2010) third edition. [ISBN 9780470414354].
The book by Tsay is the closest to this guide, though it is pitched at the Masters rather
than undergraduate level. He covers some of the material in more depth than is required
for this course. If you are interested in postgraduate study in finance or econometrics,
you may find the readings from Tsay helpful. Taylor’s book is also aimed at Masters
students, but covers several of the topics we cover in this guide. Campbell, Lo and
Mackinlay is a classic, graduate-level, book covering topics in finance and financial
econometrics.
For additional reading on finance and investments topics that arise in this subject guide
see the following books:
Bodie, Z., A. Kane and A.J. Marcus Investments. (McGraw-Hill, U.S.A., 2013)
ninth edition [ISBN 9780077861674].
Elton, E.J., M.J. Gruber, S.J. Brown and W.N. Goetzmann Modern Portfolio
Theory and Investment Analysis. (John Wiley & Sons, New York, 2009) eighth
edition [ISBN 978118038093].
For additional reading/revision of regression and hypothesis testing topics see the
following books, both of which are aimed at undergraduate students:
The VLE
The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:
Self-testing activities: Doing these allows you to test your own understanding of
subject material.
Electronic study materials: The printed materials that you receive from the
University of London are available to download, including updated reading lists
and references.
A student discussion forum: This is an open space for you to discuss interests and
experiences, seek support from your peers, work collaboratively to solve problems
and discuss subject material.
Videos: There are recorded academic introductions to the subject, interviews and
debates and, for some courses, audio-visual tutorials and conclusions.
Recorded lectures: For some courses, where appropriate, the sessions from previous
years’ Study Weekends have been recorded and made available.
Study skills: Expert advice on preparing for examinations and developing your
digital literacy skills.
Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.
The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library you will either need to use
your University of London Student Portal login details, or you will be required to
register and use an Athens login: http://tinyurl.com/ollathens
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine. If you are having trouble finding an article listed in a
reading list, try removing any punctuation from the title, such as single quotation
marks, question marks and colons.
For further advice, please see the online help pages:
www.external.shl.lon.ac.uk/summon/about.php
Chapter 1: Introduction
• Aims and objectives for the course
• Recommended reading
1.8. Examination advice
each question. You are strongly advised to divide your time in this manner. The
examination for this course contains a mix of quantitative and qualitative questions.
Examples of examination questions are provided at the end of each chapter, and a
complete sample examination paper is provided at the end of this guide.
Chapter 2
Financial econometrics concepts and
statistics review
2.1 Introduction
This chapter introduces some key concepts and definitions from financial econometrics
that will be used throughout this subject guide: time series, sampling frequencies,
return definitions. We then review some fundamental definitions and results from
statistics: definitions and calculation of moments (means, variances, skewness, etc.) and
distribution and density functions. We also review definitions of moments for vector
random variables.
Introduce some terminology and concepts from financial econometrics for studying
financial data
Review results and definitions for moments, distributions and densities, for both
scalar and vector random variables.
2.3. Some important concepts
Examples of financial time series that we may wish to forecast include:
exchange rates
interest rates
inflation rates
A central question in finance relates to the risk/return trade off, and so forecasting just
the price, or just the change in the price of an asset (i.e., the return on the asset) is
only one half of the problem. As risk-averse investors, we will also care about
forecasting risk, measured in some way, and so we may also be interested in forecasting
other properties of a time series, such as:
the liquidity of the market for the asset (measured by trading volume, trade
intensity, bid-ask spread, etc.)
risk management
portfolio management
option pricing
government/monetary policy
Financial data is available at a range of frequencies (how often we see a data point):
annually
monthly
weekly
daily
Economics: need to be able to tell whether a model makes economic sense or not.
• Most disciplines that use forecasts (or statistics in general) have specialist
forecasters within their field: biology has biostatisticians, medicine has
epidemiologists, atmospheric science has meteorologists, and economics has
econometricians. Why aren’t there just expert forecasters out there ready to
work with any data set? Because knowing where the data comes from and how
it is generated generally leads to better forecasts.
Common sense: do you believe the forecast a particular model tells you? Should
you?
Here Rt+1 is called the ‘net return’ (it will be a number like 0.03, -0.01, etc.) and
(1 + Rt+1 ) is called the ‘gross return’ (which will be something like 1.03, 0.99, etc.)
2.4. Forecasting returns and prices
Both definitions give approximately the same answer when returns are not ‘too large’
(less than around 0.10, or 10%, in absolute value).
Activity 2.1 Compute the arithmetic returns and the continuously compounded
returns for the following cases:
    Pt     Pt+1    R^A_{t+1}    R^L_{t+1}
    100    103
    100     92
    100    145
    100     30
Throughout this course we will focus on continuously compounded returns. One reason
for doing so is that it allows for simple time series aggregation of returns. For example,
let Yt+5 = log Pt+5 − log Pt be the weekly return on an asset, and let
Xt+1 = log Pt+1 − log Pt be the daily return on the same asset. Then notice that:

Yt+5 = log Pt+5 − log Pt
     = (log Pt+5 − log Pt+4) + (log Pt+4 − log Pt+3) + (log Pt+3 − log Pt+2) + (log Pt+2 − log Pt+1) + (log Pt+1 − log Pt)
     = Xt+5 + Xt+4 + Xt+3 + Xt+2 + Xt+1
and so the weekly continuously compounded return is simply the sum of the daily
continuously compounded returns through the week. It might be noted that while
continuously compounded returns allow for simple aggregation of returns through time,
they do not allow for simple aggregation of returns across stocks, to get a portfolio
return for example. However for reasonable values of returns the difference is small.
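The aggregation property is easy to verify numerically. The sketch below uses a week of made-up prices (the numbers are hypothetical, purely for illustration):

```python
import math

# Hypothetical daily closing prices from day t to day t+5 (made-up numbers).
prices = [100.0, 101.5, 100.8, 102.3, 103.0, 104.1]

# Daily continuously compounded returns: X_{t+1} = log(P_{t+1}) - log(P_t).
daily = [math.log(prices[i + 1]) - math.log(prices[i]) for i in range(5)]

# Weekly return computed directly: Y_{t+5} = log(P_{t+5}) - log(P_t).
weekly = math.log(prices[-1]) - math.log(prices[0])

# The sum of daily log returns telescopes to the weekly log return.
print(abs(weekly - sum(daily)) < 1e-12)  # True
```

Any other price path would work equally well: the equality is an algebraic identity, not a property of these particular numbers.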
Activity 2.2 Let Zt+2 = (Pt+2 − Pt ) /Pt be the two-day arithmetic return, and let
Wt+1 = (Pt+1 − Pt ) /Pt be the one-day arithmetic return. Find an expression for
Zt+2 as a function of Wt+1 and Wt+2 (and notice that it is not as nice as the
expression for continuously compounded returns).
Here we will show that forecasting prices is equivalent to forecasting returns. This is true
so long as we include today’s price as part of the information set (which we always do).
The reason for the emphasis on the equivalence between prices and returns is that while
prices are often the economic object of interest, they have statistical properties that
(Footnote: Jensen's inequality states that if g is a convex function, like the exponential, then g(E[X]) ≤ E[g(X)].)
make them hard to deal with. Prices (usually) have a ‘unit root’, meaning, amongst
other things, that the variance of prices diverges to infinity as time goes on. Dealing
with variables that have a unit root requires more care than required for variables with
no unit root. (We will look at this problem in Chapter 18, on spurious regressions.)
Returns, generally, do not have a unit root, which makes their analysis a lot easier
econometrically.
For the remainder of the course we will discuss forecasting prices and returns
interchangeably.
Example 2.1 Coin-tossing: Let X = 1 if the coin comes up heads, and let X = 0 if
the coin comes up tails. Then X is a discrete random variable, and {0, 1} is the set of
possible realisations of the random variable.
Example 2.2 Rainfall: Let X be the rainfall in London in millimetres over the past
year. Then X is a continuous random variable with support on the non-negative real
line. (It’s not the entire real line because we can’t see negative rainfall.) Example
realisations of X in this example are 200, 34.3535, 0, etc.
F (x) ≡ Pr [X ≤ x]
If we do not know the complete distribution of a random variable, but we do know its
first two moments (i.e., its mean and variance) then we write that X ∼ (µ, σ²). For some
distributions, such as the normal, knowing the mean and variance is sufficient to
completely describe the random variable. E.g.: if we know X is normally distributed
with mean 2 and variance 5 we write X ∼ N(2, 5). Other distributions are characterised
by other properties: for example, if we know X is uniformly distributed between −3 and
10 we write X ∼ Unif(−3, 10).
Definition 2.3 (Probability mass function, or pmf ) The probability mass function,
f, of a discrete random variable X is given by:
f (x) ≡ Pr [X = x]
The points at which a discrete random variable has a positive pmf are known as the
‘support’ of this random variable.
A continuous random variable is formally defined by F (x) being a continuous function
of x. For continuous random variables Pr [X = x] = 0 for all x, by the continuity of the
cdf, and so instead of using pmfs we use probability density functions:
Definition 2.4 (Probability density function, or pdf) The probability density
function, f, of a continuous random variable X is the function that satisfies:

F(x) = ∫_{−∞}^{x} f(s) ds,  for all x
Figure 2.1 shows an illustration of CDFs for a discrete and a continuous random
variable, and their corresponding PMF and PDF.
Activity 2.3 Unlike a pmf, a pdf can take values greater than one. (As the pmf is a
probability, it will always lie between zero and one.) To prove this, consider a
random variable uniformly distributed on the interval [a, b] , where a < b. A
‘Unif(a, b)’ random variable has the cdf F (x) = (x − a) / (b − a) , for a < x < b.
Find the pdf of this random variable, and then find values of a and b such that the
pdf takes values greater than one.
2.5. Revision of basic statistics
Figure 2.1: CDFs, PDF and PMF for a continuous and discrete random variable.
Definition 2.5 (Time series) A time series is an ordered set of realisations from some
random variable. Usually the set is ordered according to time (thus the name ‘time
series’).
The field of time series analysis is broad and complex. Some of the definitions given
here are adequate for the purposes of this course, but may not be sufficient for
higher-level study. The standard graduate text on time series analysis for
econometricians is Hamilton (1994), and the interested student should look there for a
more rigorous treatment.
Examples of time series can be found everywhere: daily temperatures in Paris, closing
price of Google shares, sales of prawn sandwiches at Wright’s Bar each Tuesday, etc. In
Figure 2.2 I plot a time series of daily EUR/US dollar exchange rates and exchange rate
returns over the period 1999-2009, and in Figure 2.3 I plot a time series of one-second
prices on IBM on 31 December 2009.
Definition 2.7 (Variance) The variance of the random variable X with pdf f is:

σ² ≡ V[X] ≡ E[(X − µ)²]
          = ∫_{−∞}^{∞} (x − µ)² · f(x) dx
          = E[X²] − µ²

The 'standard deviation' of a random variable is the square root of the variance:

σ = √(E[X²] − µ²)
Definition 2.8 (Skewness) The skewness of the random variable X with pdf f is:

s ≡ Skew[X] ≡ E[(X − µ)³] / σ³ = (1/σ³) ∫_{−∞}^{∞} (x − µ)³ · f(x) dx
Definition 2.9 (Kurtosis) The kurtosis of the random variable X with pdf f is:

κ ≡ Kurt[X] ≡ E[(X − µ)⁴] / σ⁴ = (1/σ⁴) ∫_{−∞}^{∞} (x − µ)⁴ · f(x) dx
Figure 2.2: Euro/US dollar exchange rate, and daily exchange rate return, January 1999
to December 2009.
Figure 2.3: IBM stock price, and 1-second returns, on 31 December 2009.
Figure 2.4: Student’s (1927) memory aids for platykurtosis (κ < 3) and leptokurtosis
(κ > 3).
Definition 2.10 (Moment) The pth 'moment' of the random variable X with pdf f is:

m̃_p ≡ E[X^p] = ∫_{−∞}^{∞} x^p · f(x) dx

Definition 2.11 (Central moment) The pth 'central moment' of the random variable
X with pdf f is:

m_p ≡ E[(X − µ)^p] = ∫_{−∞}^{∞} (x − µ)^p · f(x) dx
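These population moments can be checked numerically. The sketch below integrates against the standard normal density with a simple trapezoid rule (the integration bounds and grid size are arbitrary choices) and recovers variance 1, skewness 0 and kurtosis 3, the values quoted later for the normal distribution:

```python
import math

def npdf(x):
    """Standard normal density, used here as the pdf f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def central_moment(f, mu, p, lo=-10.0, hi=10.0, n=100_001):
    """Approximate E[(X - mu)^p] = integral of (x - mu)^p f(x) dx (trapezoid rule)."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * (x - mu) ** p * f(x)
    return total * h

mu = 0.0
var = central_moment(npdf, mu, 2)
skew = central_moment(npdf, mu, 3) / var ** 1.5
kurt = central_moment(npdf, mu, 4) / var ** 2
print(round(var, 4), round(abs(skew), 4), round(kurt, 4))  # 1.0 0.0 3.0
```

The same helper works for any density with thin enough tails; only the bounds would need widening for heavier-tailed distributions.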
Activity 2.4 A random variable is symmetric around zero if f (x) = f (−x) for all
x. Using the integral definition of skewness, show that all random variables that are
symmetric around zero must have zero skewness.
The above definitions apply for continuous random variables, which is the most
common case. For a discrete random variable with pmf f, the definition is modified
slightly. For example, the pth moment is defined as

m̃_p ≡ E[X^p] = Σ_{i=1}^{n} x_i^p f(x_i)

where the sum is taken over all the possible values for X. These values are denoted
(x_1, x_2, ..., x_n). If we set p = 1 we obtain the mean for a discrete random variable:

µ = E[X] = Σ_{i=1}^{n} x_i f(x_i)
Activity 2.5 The return on a stock takes the value 2 with probability 0.4, the value 0 with probability 0.5, and the value −5 with probability 0.1. Find the mean and standard deviation of the return on this stock.
FXY (x, y) ≡ Pr [X ≤ x ∩ Y ≤ y]
If (X, Y ) have cdf FXY , we write that ‘(X, Y ) is distributed according to FXY ’, or
‘(X, Y ) ∼ FXY ’ in shorthand.
The symbol ∩ denotes intersection, and can be thought of as ‘and’ in this application.
Union is denoted by ∪ and can be thought of as ‘or’ in this application.
Definition 2.13 (Bivariate probability mass function) The probability mass
function, fXY, of discrete random variables X and Y is given by:

fXY(x, y) ≡ Pr[X = x ∩ Y = y]
The marginal distribution of X (i.e., the distribution just of X rather than of both X
and Y) is obtained by summing the joint pmf over all possible values of y:

fX(x) = Σ_y fXY(x, y)

The marginal density of X is obtained from the joint density by 'integrating out' Y:

fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy
Recall that if X and Y are independent then their joint cdf is just the product of the
univariate cdfs:

FXY(x, y) = FX(x) FY(y)
and if they are discrete then their joint pmf is the product of their univariate pmfs:

fXY(x, y) ≡ Pr[X = x ∩ Y = y] = Pr[X = x] Pr[Y = y] ≡ fX(x) fY(y);

if they are continuous then their joint pdf is the product of their univariate pdfs:

fXY(x, y) = ∂²FXY(x, y)/∂x∂y = ∂²(FX(x) FY(y))/∂x∂y = fX(x) fY(y)
The following two important quantities are derived from the joint distribution of two
random variables:
Definition 2.15 (Covariance) The covariance between the random variables X and Y
with joint pdf fXY is:

Cov[X, Y] ≡ E[(X − µx)(Y − µy)]
          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µx)(y − µy) fXY(x, y) dx dy
          = E[XY] − µx µy

where µx = E[X] and µy = E[Y].
Definition 2.16 (Correlation) The correlation between the random variables X and Y
is:

Corr[X, Y] ≡ Cov[X, Y] / √(V[X] · V[Y])

and it always lies between −1 and 1:

−1 ≤ Corr[X, Y] ≤ 1
Recall that if X and Y are independent then the conditional cdf and pdf of Y|X = x
are equal to the unconditional cdf and pdf of Y.
Activity 2.6 Covariance and correlation measure the linear relationship between
two variables, and it is possible that two variables have zero correlation but are not
independent. For example, let X ∼ N(0, 1) and let Y = X². Show that
Cov[X, Y] = 0, and show that X and Y are not independent.
The mean vector of a vector random variable Z = [X1, X2, ..., Xn]′ is
µ ≡ E[Z] ≡ E[[X1, X2, ..., Xn]′] = [E[X1], E[X2], ..., E[Xn]]′
Notice that Σ is a symmetric matrix, so element (i, j) is equal to element (j, i) , for any
i, j.
Definition 2.20 (Correlation matrix) Any covariance matrix can be decomposed into
a matrix containing the standard deviations on the diagonal and the correlation matrix:

Σ = D R D    (all matrices are n × n)

where

D = [ σ1   0   ···  0
      0    σ2  ···  0
      ⋮    ⋮    ⋱   ⋮
      0    0   ···  σn ]

R = [ 1    ρ12  ···  ρ1n
      ρ12  1    ···  ρ2n
      ⋮    ⋮     ⋱   ⋮
      ρ1n  ρ2n  ···  1  ]

If all variances are strictly positive, then the correlation matrix can be obtained from
the covariance matrix by pre- and post-multiplying the covariance matrix by D⁻¹:

R = D⁻¹ Σ D⁻¹
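In code, the R = D⁻¹ΣD⁻¹ relationship is a one-liner with NumPy. The covariance matrix below is a made-up 3 × 3 example, not taken from the text:

```python
import numpy as np

# A hypothetical 3x3 covariance matrix (symmetric, positive variances on the diagonal).
Sigma = np.array([[4.0, 1.2, 0.0],
                  [1.2, 9.0, -1.5],
                  [0.0, -1.5, 1.0]])

# D^{-1} has the reciprocals of the standard deviations on its diagonal.
D_inv = np.diag(1.0 / np.sqrt(np.diag(Sigma)))

# R = D^{-1} Sigma D^{-1}: ones on the diagonal, correlations off it.
R = D_inv @ Sigma @ D_inv
print(np.round(R, 3))  # e.g. R[0, 1] = 1.2 / (2 * 3) = 0.2
```

Note the diagonal of R comes out as exactly one, which is a useful sanity check on any covariance-to-correlation conversion.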
Higher moments, such as skewness and kurtosis, can also be defined for vector random
variables, but it requires some cumbersome notation, and will not be needed in this
course.
Activity 2.7 Let X ∼ N (2, 5) , Y ∼ N (0, 3) and W ∼ N (1, 6) , and assume that
all three variables are independent of each other. Define Z = [X, Y, W ]0 . Find the
mean vector, covariance matrix, and correlation matrix of Z.
1. Consider a bond whose pay-off has the following distribution:

   Payoff   Probability
    100        0.80
     70        0.05
     50        0.10
      0        0.05
(a) Plot the cdf and pmf of the pay-off on this bond.
(b) Find the mean (expected) pay-off on this bond.
(c) Find the standard deviation of the pay-off on this bond.
2. Let Zt+3 = (Pt+3 − Pt ) /Pt be the three-day arithmetic return, and let
Wt+1 = (Pt+1 − Pt ) /Pt be the one-day arithmetic return. Find an expression for
Zt+3 as a function of Wt+1 , Wt+2 and Wt+3 .
3. Don’t forget to check the VLE for additional practice problems for this chapter.
Activity 2.1

    Pt     Pt+1    R^A_{t+1}    R^L_{t+1}
    100    103      0.0300       0.0296
    100     92     -0.0800      -0.0834
    100    145      0.4500       0.3716
    100     30     -0.7000      -1.2040
So when returns (either arithmetic or logarithmic) are small, less than around 0.10, the
two definitions of returns are very close. When returns are large the two definitions give
different answers. (Also notice that while arithmetic returns can never go below -100%,
logarithmic returns can be below -100%: in the fourth row the arithmetic return is -70%
while the logarithmic return is -120.4%.)
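The table above can be reproduced in a few lines of code:

```python
import math

# Price pairs from Activity 2.1: (P_t, P_{t+1}).
pairs = [(100, 103), (100, 92), (100, 145), (100, 30)]
for p0, p1 in pairs:
    r_arith = p1 / p0 - 1       # arithmetic (net) return R^A
    r_log = math.log(p1 / p0)   # continuously compounded (log) return R^L
    print(f"{r_arith:+.4f}  {r_log:+.4f}")
```

The last row makes the divergence for large moves obvious: an arithmetic return of −0.70 corresponds to a log return of about −1.204.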
2.9. Solutions to activities
Activity 2.2
We are given:

Wt+1 = (Pt+1 − Pt)/Pt
and Zt+2 = (Pt+2 − Pt)/Pt

then we derive:

Zt+2 = (Pt+2 − Pt)/Pt
     = (Pt+2 − Pt+1)/Pt + (Pt+1 − Pt)/Pt
     = [(Pt+2 − Pt+1)/Pt+1] × (Pt+1/Pt) + Wt+1
     = Wt+2 (1 + Wt+1) + Wt+1
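A quick numerical check of this identity, using hypothetical prices for three consecutive days (any positive prices would do):

```python
# Hypothetical prices for days t, t+1, t+2.
p0, p1, p2 = 100.0, 103.0, 99.0

w1 = p1 / p0 - 1          # one-day arithmetic return W_{t+1}
w2 = p2 / p1 - 1          # one-day arithmetic return W_{t+2}
z_direct = p2 / p0 - 1    # two-day return computed straight from prices
z_formula = w2 * (1 + w1) + w1

print(abs(z_direct - z_formula) < 1e-12)  # True
```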
Activity 2.3

F(x) ≡ Pr[X ≤ x] = { 0,                x ≤ a
                     (x − a)/(b − a),  a < x < b
                     1,                x ≥ b

f(x) ≡ ∂F(x)/∂x = { 0,          x ≤ a
                    1/(b − a),  a < x < b
                    0,          x ≥ b

(Note that the pdf is not a function of x: this means it is a flat line over the interval
[a, b], which is why this distribution is called the 'uniform' distribution.) So now we just
need to find values of a and b where f(x) is greater than one. This will be true for any
a and b where b − a < 1. For example, if a = 0 and b = 1/2, then f(x) = 1/(1/2) = 2.
Activity 2.4
E[X³] = ∫_{−∞}^{∞} x³ f(x) dx
      = ∫_{−∞}^{0} x³ f(x) dx + ∫_{0}^{∞} x³ f(x) dx
      = ∫_{−∞}^{0} x³ f(−x) dx + ∫_{0}^{∞} x³ f(x) dx,   since f(x) = f(−x)
      = ∫_{0}^{∞} −x³ f(x) dx + ∫_{0}^{∞} x³ f(x) dx,    changing the sign of x in the first integral
      = ∫_{0}^{∞} (−x³ + x³) f(x) dx,                    gathering terms
      = 0
It can also be shown that skewness is zero when the variable is symmetric around some
general point a, using the same logic as above (though a little more notation is needed).
Activity 2.5

It is useful to expand the table and fill it with some other calculations:

  Payoff   Prob   Payoff²   Payoff × Prob   Payoff² × Prob
     2     0.4       4          0.8              1.6
     0     0.5       0          0.0              0.0
    -5     0.1      25         -0.5              2.5
  Sum: -3  1.0      29          0.3              4.1

Then we can obtain the mean:

E[X] = Σ_{i=1}^{3} x_i f(x_i) = 2 × 0.4 + 0 × 0.5 + (−5) × 0.1 = 0.3

Next we compute the uncentered second moment, the variance and the standard
deviation:

E[X²] = Σ_{i=1}^{3} x_i² f(x_i) = 4 × 0.4 + 0 × 0.5 + 25 × 0.1 = 4.1

so V[X] = E[X²] − (E[X])² = 4.1 − 0.3² = 4.01, and σ = √4.01 ≈ 2.0025.
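The same calculation in code, a direct sketch of the discrete-moment formulas:

```python
# Payoffs and probabilities for the discrete random variable.
payoffs = [2.0, 0.0, -5.0]
probs = [0.4, 0.5, 0.1]

mean = sum(x * p for x, p in zip(payoffs, probs))        # E[X] = sum of x_i f(x_i)
second = sum(x * x * p for x, p in zip(payoffs, probs))  # E[X^2]
std = (second - mean ** 2) ** 0.5                        # sqrt(E[X^2] - mu^2)
print(round(mean, 4), round(second, 4), round(std, 4))   # 0.3 4.1 2.0025
```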
Activity 2.6
Part (2): There are many possible ways to show that these variables are not
independent. The easiest is to show that the conditional density of X given X² is not
equal to the unconditional density of X. For example, assume that X² takes the value
2. Then there are only two possible values for X, namely √2 and −√2, so the
distribution of X|X² = 2 is discrete:

f_{X|X²}(x | X² = 2) = { −√2, with probability 1/2
                         +√2, with probability 1/2

which is not equal to the unconditional N(0, 1) density of X, so X and X² are not
independent.
Activity 2.7

Z = [X, Y, W]′
so E[Z] = E[[X, Y, W]′] = [E[X], E[Y], E[W]]′ = [2, 0, 1]′

Σ ≡ V[Z] = V[[X, Y, W]′]

  = [ V[X]       Cov[X, Y]   Cov[X, W]
      Cov[X, Y]  V[Y]        Cov[Y, W]
      Cov[X, W]  Cov[Y, W]   V[W]      ]

  = [ 5 0 0
      0 3 0
      0 0 6 ]

All of the covariances are zero since we are told that the variables are independent.
Finally, the correlation matrix is very simple (this was a bit of a trick question): since
all the covariances are zero, all correlations are also zero, and so there is nothing to
work out:

R = [ 1 0 0
      0 1 0
      0 0 1 ]
Chapter 3
Basic time series concepts
3.1 Introduction
Many problems in quantitative finance involve the study of financial data. Such data
most often comes in the form of ‘time series,’ which is a sequence of random variables
that are ordered through time. Before moving on to financial applications, we must first
cover some fundamental topics in time series analysis, such as autocorrelation, white
noise processes and ARMA processes (covered in the next chapter). These two chapters
are the most theoretical in this guide, and they may not appear closely related to finance, but
they lay the foundations for the topics we will cover in later chapters.
Present the ‘law of iterated expectations’ and illustrate its use in time series
analysis.
Describe the various forms of ‘white noise’ processes used in the analysis of
financial data
We will generally focus on the special case of ‘covariance stationary’ time series, where:
E [Yt ] = µ ∀ t
V [Yt ] = σ 2 ∀ t
and Cov [Yt , Yt−j ] = γj , ∀ j, t
(The notation ‘∀ t’ means ‘for all t.’) The first two conditions imply that the
unconditional means and variances of each of the Yt ’s are assumed to be the same
through time. This does not imply that their conditional means and variances will be
the same, and we will spend a lot of time looking at these. Many economic and financial
time series can be treated as though this assumption holds. The third condition implies
that all autocovariances, denoted γj and defined below, are also constant through
time, so when describing an autocovariance we need only denote it with a ‘j,’ not also
with a t.
Definition 3.1 (Autocovariance) The jth-order autocovariance of a time series Yt is:
γj = Cov [Yt , Yt−j ]
= E [(Yt − µ) (Yt−j − µ)]
= E [Yt · Yt−j ] − µ2
Note that γ0 = Cov [Yt , Yt ] = V [Yt ] .
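A sample analogue of γj replaces the expectation with an average over the observed series. The sketch below checks it on simulated iid noise, where γ0 should be near the variance and all higher-order autocovariances should be near zero:

```python
import random

def autocovariance(y, j):
    """Sample j-th order autocovariance: average of (y_t - ybar)(y_{t-j} - ybar)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((y[t] - ybar) * (y[t - j] - ybar) for t in range(j, n)) / n

random.seed(42)
eps = [random.gauss(0, 1) for _ in range(100_000)]  # iid N(0, 1) noise

gamma0 = autocovariance(eps, 0)  # should be close to V[Y] = 1
gamma1 = autocovariance(eps, 1)  # should be close to 0
print(round(gamma0, 2), round(gamma1, 2))
```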
3.3. The Law of Iterated Expectations
Example 3.1 Let Yt take the value 1 or 0 depending on whether the tth coin toss
came up heads or tails. If the coin is fair (that is, the probability of seeing a 'head' is
always equal to one-half) then all of the autocorrelations of Yt are zero. That is,
Corr[Yt, Yt−j] = 0 for all j ≠ 0.
This is because a fair coin has no 'memory': the probability of seeing a tail at time t
is unaffected by whether we saw a tail at time t − j (j ≠ 0).
Example 3.2 Let It be all the information available as at date t, so It ⊆ It+1. Then, by the law of iterated expectations,

Et[ Et+1[Yt+2] ] = Et[Yt+2]
Example 3.3 Again, let It be all the information available as at date t, and notice
that an unconditional expectation employs an ‘empty’ information set, which is
smaller than any non-empty information set. Then
E [ Et [Yt+1 ] ] = E [Yt+1 ]
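A Monte Carlo sketch of this last result, using an AR(1) process where Et[Yt+1] = φYt (the parameter values here are arbitrary): averaging the conditional means across a long simulation gives roughly the same answer as averaging the realisations, namely the unconditional mean of zero.

```python
import random

random.seed(1)
phi = 0.5
y = 0.0
cond_means, realised = [], []
for _ in range(100_000):
    cond_means.append(phi * y)        # E_t[Y_{t+1}] given the current value
    y = phi * y + random.gauss(0, 1)  # the realised Y_{t+1}
    realised.append(y)

# Both averages estimate E[E_t[Y_{t+1}]] = E[Y_{t+1}] = 0.
print(round(sum(cond_means) / 100_000, 2), round(sum(realised) / 100_000, 2))
```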
Activity 3.1 (1) If X = −1 with probability 1/2 and X = +1 with probability 1/2,
and E [Y |X] = X, show that E [Y ] = 0.
(2) If E [Y |X] = X 2 , and X ∼ N (0, 32 ) , find E [Y ] .
Almost everything we derive in this chapter, and much of what we will derive in later
chapters, is based on just three things:
E[a + bX] = a + bE[X]
E[a + bX + cY] = a + bE[X] + cE[Y]
V[a + bX + cY] = V[bX + cY]
              = V[bX] + V[cY] + 2Cov[bX, cY]
              = b²V[X] + c²V[Y] + 2bc · Cov[X, Y]
Activity 3.2 Consider two stocks that generate returns as X ∼ N (1, 2) and
Y ∼ N (2, 3) and assume that these returns are independent. Now consider two
portfolios of these two stocks, where
W = (1/2) X + (1/2) Y
Z = (3/4) X + (1/4) Y
Let U = [W, Z]0 . Find the mean vector, covariance matrix, and correlation matrix of
U.
3.5. Application to an AR(1) process
Definition 3.5 (iid white noise) εt is independent and identically distributed (iid)
white noise if
εt is independent of εt−j ∀ j ≠ 0, and
εt ∼ F ∀ t, where F is some distribution
In this case we write that εt ∼ iid W N (or εt ∼ iid F ). If in addition we know that
E [εt ] = 0 then we write εt ∼ iid W N (0), and say that εt is a zero-mean iid white noise
process. If we know that E [εt ] = 0 and V [εt ] = σ 2 then we write εt ∼ iid W N (0, σ 2 )
and we say that εt is a zero-mean iid white noise process with variance σ 2 .
Notice that these three definitions carry an increasing amount of information. Simple
white noise only imposes that the process has zero serial correlation. iid white noise
imposes zero serial dependence, which is stronger than zero serial correlation, and
further imposes that the distribution is the same at all points in time. Gaussian white
noise imposes both serial independence and a distributional assumption on the series.
Most often, people work with Gaussian white noise (it simplifies many calculations) but
it should be noted that this is the most restrictive form of white noise.
Activity 3.3 Any time series, Yt+1, may be decomposed into its conditional mean,
Et[Yt+1], and a 'remainder' process, εt+1:

Yt+1 = Et[Yt+1] + εt+1

1. Show that εt+1 is a zero-mean white noise process.

2. Show that εt+1 has mean zero conditional on the information set available at
time t.

3. Show that the white noise term εt+1 is uncorrelated with the conditional mean
term, Et[Yt+1]. (Hint: it might make the problem easier if you define a new variable,
µt+1 = Et[Yt+1], and treat µt+1 as a separate random variable which is
observable at time t. Note that we denote it with a subscript 't + 1' even though
it is observable at time t.)
The above equation is a particular type of time series, namely an ‘autoregressive process
of order 1’ or a ‘first-order autoregression’, or an ‘AR(1)’ process. It’s called this
because the variable Yt is ‘regressed’ onto itself (the ‘auto’ part of the name) lagged 1
period (the ‘first-order’ part of the name). For the rest of this course you can assume
that the time series we consider are stationary (we will cover non-stationary processes
later in the notes). Notice that the time series defined in the equation above has only
two fundamental parameters: φ, called the (first-order) autoregressive coefficient, and
σ 2 , the variance of the ‘innovation process’, εt . All the properties of Yt are simply
functions of φ and σ 2 , and when asked for a particular property of Yt it should always
be given as a function of φ and σ 2 .
Problem 1

Taking unconditional expectations of both sides, and noting that stationarity implies
E[Yt] = E[Yt−1] = µ:

E[Yt] = φE[Yt−1]
µ = φµ
µ(1 − φ) = 0, which implies that
µ = 0, as |φ| < 1
Problem 2

γ0 ≡ V[Yt]
= V[φYt−1 + εt]
= V[φYt−1] + V[εt] + 2Cov[φYt−1, εt]
= φ²V[Yt−1] + σ² + 0
= φ²γ0 + σ²

so γ0(1 − φ²) = σ², and hence

γ0 = σ²/(1 − φ²)
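These two results are easy to corroborate by simulation. A Python/NumPy sketch (the values of φ and σ², the use of Gaussian innovations, and the seed are all arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, sigma2, n = 0.8, 2.0, 200_000

# Simulate Y_t = phi * Y_{t-1} + eps_t with Gaussian innovations
eps = rng.normal(0.0, np.sqrt(sigma2), n)
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]
y = y[1_000:]  # drop a burn-in so the start-up value has no influence

print(y.mean())               # near mu = 0
print(y.var())                # near gamma_0
print(sigma2 / (1 - phi**2))  # theoretical gamma_0
```

The sample mean and variance should be close to 0 and σ²/(1 − φ²) respectively, with the discrepancy shrinking as the sample grows.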
3.6. Overview of chapter
Problem 3
Problem 4
Describe the various forms of ‘white noise’ processes used in the analysis of
financial data
2. Verify explicitly that Et[Yt+2] = Et[Et+1[Yt+2]], as implied by the law of iterated
expectations, for an AR(1) process.
3. Don’t forget to check the VLE for additional practice problems for this chapter.
Activity 3.1

Part (1): If X = −1 with probability 1/2 and X = +1 with probability 1/2, and
E[Y|X] = X, show that E[Y] = 0.

If E[Y|X] = X,
then E[E[Y|X]] = E[X], taking unconditional expectations of both sides.
Note that E[E[Y|X]] = E[Y], by the LIE, so E[Y] = E[X] = (1/2)(−1) + (1/2)(+1) = 0.
Part (2): Now E[Y|X] = X²,
so E[E[Y|X]] = E[X²] = 1, since X² = 1 with probability 1, and thus E[Y] = 1 by the LIE.
Activity 3.2
and so

V[U] = V[(W, Z)′] = [ V[W]       Cov[W, Z] ]  =  [ 1.25    1.125  ]
                    [ Cov[W, Z]  V[Z]      ]     [ 1.125   1.3125 ]
Activity 3.3
and so εt+1 also has unconditional mean zero. Next we show that it is serially uncorrelated:

Thus Cov[εt+1, εt+1−j] = 0 for all j > 0, and so εt+1 is serially uncorrelated. So we have
shown that εt+1 is a zero-mean white noise process (part (1) of the problem) by using
the answer to part (2). Now we move to part (3):

Thus εt+1 is uncorrelated with the conditional mean term, µt+1 ≡ Et[Yt+1].
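This decomposition can also be checked numerically. Taking, purely for illustration, an AR(1) for Yt (so that µt+1 = φYt and the remainder is εt+1 = Yt+1 − φYt), the Python/NumPy sketch below verifies both properties in a simulated sample (φ, the Gaussian innovations and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n = 0.7, 200_000

# Simulate the AR(1) Y_t = phi * Y_{t-1} + eps_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

mu = phi * y[:-1]   # conditional mean E_t[Y_{t+1}]
rem = y[1:] - mu    # remainder process eps_{t+1}

print(np.corrcoef(rem, mu)[0, 1])            # near 0: uncorrelated with the conditional mean
print(np.corrcoef(rem[1:], rem[:-1])[0, 1])  # near 0: serially uncorrelated
```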
Chapter 4
ARMA processes
4.1 Introduction
This chapter builds on the concepts and techniques introduced in the previous chapter.
We will study the class of autoregressive-moving average (ARMA) processes, which are
a widely-used and very useful class of models for time series data.
Introduce the most widely-used classes of time series models: autoregressive (AR),
moving average (MA) and ARMA processes.
Describe the various forms of ‘white noise’ processes used in the analysis of
financial data
Yt = φ0 + εt + θεt−1, εt ∼ WN(0, σ²)

The above equation is another common type of time series, known as a ‘moving average
process of order 1’, or a ‘first-order moving average’, or an ‘MA(1)’ process. By recursive
substitution we can get some insight as to where this process gets its name (we drop the
intercept, φ0, for simplicity):
Yt = εt + θεt−1
= εt + θ(Yt−1 − θεt−2) = εt + θYt−1 − θ²εt−2
= εt + θYt−1 − θ²(Yt−2 − θεt−3) = εt + θYt−1 − θ²Yt−2 + θ³εt−3
= ...
= εt + Σ_{i=1}^{∞} (−1)^{i+1} θ^i Yt−i

Thus an MA(1) process can be re-written as a weighted sum of all lags of Yt plus an
innovation term.
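The last line can be rearranged to recover the innovation from observed data: εt = Yt − θYt−1 + θ²Yt−2 − ... = Σ_{i=0} (−θ)^i Yt−i. A quick numerical check of this (a Python/NumPy sketch; θ = 0.5 and the seed are arbitrary choices with |θ| < 1 so the weights die out, and the series is started from ε0 so the recovery is exact):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n = 0.5, 200

# Simulate Y_t = eps_t + theta * eps_{t-1}, with Y_0 = eps_0
eps = rng.standard_normal(n)
y = eps.copy()
y[1:] += theta * eps[:-1]

# Rebuild the last innovation from the observed Y's:
# eps_t = Y_t - theta*Y_{t-1} + theta^2*Y_{t-2} - ... = sum_i (-theta)^i * Y_{t-i}
t = n - 1
eps_hat = sum((-theta) ** i * y[t - i] for i in range(t + 1))

print(abs(eps_hat - eps[t]))  # essentially zero
```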
Activity 4.1 Let Yt be defined as below. Find its mean, variance, first
autocovariance and second autocovariance.

Yt = φ0 + εt + θεt−1, εt ∼ WN(0, σ²)
4.3. Autocovariance functions
We showed that for this process the first two autocovariances are

γ1 = φσ²/(1 − φ²) = φγ0
γ2 = φ²σ²/(1 − φ²) = φ²γ0

where γ0 = V[Yt], which was denoted σ²y above. It can be shown (see Activity 3.4) that
for stationary AR(1) processes, the jth-order autocovariance is

γj = φ^j γ0 for j ≥ 0
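Equivalently, the autocorrelations decay geometrically: ρj = γj/γ0 = φ^j. A quick simulation check (Python/NumPy sketch; the value of φ, the Gaussian innovations and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
phi, n = 0.6, 200_000

# Simulate a stationary AR(1)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

yc = y - y.mean()
gamma0 = (yc * yc).mean()
for j in (1, 2, 3):
    rho_j = (yc[j:] * yc[:-j]).mean() / gamma0
    print(j, rho_j, phi**j)  # sample vs theoretical autocorrelation
```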
Yt = φ0 + φ1Yt−1 + φ2Yt−2 + εt, εt ∼ WN(0, σ²)

ρ1 = φ1/(1 − φ2)
ρj = φ1ρj−1 + φ2ρj−2 for j ≥ 2
Yt = 0.6Yt−1 + 0.2Yt−2 + εt
Yt = 0.1Yt−1 + 0.7Yt−2 + εt
Yt = 0.4Yt−1 − 0.4Yt−2 + εt
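The recursion above makes the theoretical ACF of a stationary AR(2) easy to compute. The Python/NumPy sketch below evaluates it for the three processes just listed:

```python
import numpy as np

def ar2_acf(phi1, phi2, nlags=10):
    """Theoretical ACF of a stationary AR(2), using rho_1 = phi1/(1 - phi2)
    and rho_j = phi1*rho_{j-1} + phi2*rho_{j-2} for j >= 2."""
    rho = np.empty(nlags + 1)
    rho[0] = 1.0
    rho[1] = phi1 / (1.0 - phi2)
    for j in range(2, nlags + 1):
        rho[j] = phi1 * rho[j - 1] + phi2 * rho[j - 2]
    return rho

for phi1, phi2 in [(0.6, 0.2), (0.1, 0.7), (0.4, -0.4)]:
    print((phi1, phi2), np.round(ar2_acf(phi1, phi2, nlags=4), 3))
```

The third process, with φ2 < 0, produces an ACF that oscillates in sign rather than decaying monotonically.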
The autocorrelation functions of these processes are given in the lower panel of Figure 4.1.
Now let us look at the autocorrelations of some MA processes. We have already derived the
autocovariance function for an MA(1) process; its full ACF follows once we note that γj = 0
for j ≥ 2 (which is simple to show). Next consider the ACF of an MA(2) process:
Yt = φ0 + εt + θ1εt−1 + θ2εt−2, εt ∼ WN(0, σ²)
Let us now look at the ACFs of a few MA(1) and MA(2) processes, presented in Figure
4.2.
Yt = εt + 0.8εt−1 , εt ∼ W N (0, 1)
Yt = εt + 0.2εt−1 , εt ∼ W N (0, 1)
Yt = εt − 0.5εt−1 , εt ∼ W N (0, 1)
Yt = εt + 0.6εt−1 + 0.2εt−2 , εt ∼ W N (0, 1)
Yt = εt + 0.1εt−1 + 0.7εt−2 , εt ∼ W N (0, 1)
Yt = εt + 0.4εt−1 − 0.4εt−2 , εt ∼ W N (0, 1)
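The autocovariance derivations in the solutions to the activities imply ρ1 = θ1(1 + θ2)/(1 + θ1² + θ2²) and ρ2 = θ2/(1 + θ1² + θ2²) for an MA(2), with ρj = 0 for j ≥ 3; an MA(1) is the special case θ2 = 0, giving ρ1 = θ/(1 + θ²). A short Python sketch evaluates these for the six example processes:

```python
def ma2_acf(theta1, theta2=0.0):
    """rho_1 and rho_2 of an MA(2); an MA(1) is the case theta2 = 0."""
    g0 = 1.0 + theta1**2 + theta2**2      # gamma_0 / sigma^2
    return theta1 * (1.0 + theta2) / g0, theta2 / g0

for thetas in [(0.8,), (0.2,), (-0.5,), (0.6, 0.2), (0.1, 0.7), (0.4, -0.4)]:
    r1, r2 = ma2_acf(*thetas)
    print(thetas, round(r1, 3), round(r2, 3))
```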
[Figures 4.1 and 4.2 appear here: autocorrelation functions of the AR processes and MA processes listed above.]
4.4 Predictability, R² and ARMA processes
R² = 1 − V[εt+1]/V[Yt+1]
If the variance of the residual is exactly equal to the variance of the original variable
then R² = 0, and we would conclude that the model is not very good. If the residual has
very low variance, then R² will be close to 1, and we conclude that we have a good
model. We can also use R² to measure the degree of (mean) predictability in a time
series process.
Let’s consider an AR(1) as an example:

R² = 1 − V[εt+1]/V[Yt+1]
= 1 − σ²/(σ²/(1 − φ1²))
= φ1²
So the larger the autoregressive coefficient, φ1, the larger the R², and the greater the
degree of predictability in this variable.
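This link between φ1 and R² is easy to confirm: simulate an AR(1), regress Yt+1 on Yt by least squares, and compare the regression R² with φ1². A Python/NumPy sketch (φ1 = 0.8, the Gaussian innovations and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
phi1, n = 0.8, 200_000

y = np.zeros(n)
for t in range(1, n):
    y[t] = phi1 * y[t - 1] + rng.standard_normal()

x, z = y[:-1], y[1:]                        # regress Y_{t+1} (z) on Y_t (x)
xc, zc = x - x.mean(), z - z.mean()
beta = (xc * zc).mean() / (xc * xc).mean()  # OLS slope
resid = zc - beta * xc
r2 = 1.0 - (resid**2).mean() / (zc**2).mean()

print(beta)         # near phi1 = 0.8
print(r2, phi1**2)  # regression R^2 near phi1^2 = 0.64
```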
1. (a) Derive the variance of the forecast error for the optimal one-step and two-step
forecasts of each of Yt and Xt .
(b) Find the values for θ and σ 2 that make Xt and Yt equally predictable
(according to the variance of their forecast errors) for one-step and two-step
forecasts.
(c) Given these values, which variable is easier to predict three steps ahead?
MSE = (1/T) Σ_{t=1}^{T} et²

where et is the residual from the ARMA model. A lower MSE means that the errors are
generally smaller, which means that the model is providing a better fit. Note that the
MSE can never increase when you increase the order of an ARMA model – the worst
that can happen is that the MSE stays the same. However, the (potential) improvement
in MSE may not come from the model being good; it may come from ‘in-sample
overfitting’. Overfitting occurs when a researcher adds variables to a model that appear
to be good, because they increase the R² or lower the MSE, but are not really useful for
forecasting. Thus, the MSE goodness-of-fit measure is not useful for helping us to find a
good model for forecasting, as it ignores the impact of estimation error on forecast
accuracy.
Let us now consider a few alternative measures for choosing a model for forecasting.
Recall that the sample variance is usually defined as the sum of squared errors divided
by (T − 1) to account for the fact that the sample mean is estimated. An analogue to
MSE that does reflect the number of parameters estimated is s²:

s² = (1/(T − k)) Σ_{t=1}^{T} et² = (T/(T − k)) · MSE
where k is the number of parameters in the regression.
It turns out that a number of other interesting goodness-of-fit measures can also be
written as a function of the sample size and the number of parameters, multiplied by
the MSE. The Akaike Information Criterion (AIC), Hannan-Quinn Information
Criterion (HQIC) and Schwarz’s Bayesian Information Criterion (known as either BIC
or SIC) are:
AIC = exp(2k/T) · MSE
HQIC = {log(T)}^{2k/T} · MSE
BIC = (√T)^{2k/T} · MSE
To select the best model from a given set of models, we estimate all of them and then
choose the model that minimises our selection criterion: MSE, s², AIC, HQIC, or BIC.
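As an illustrative sketch of this procedure (not the guide’s own code), the Python/NumPy example below restricts attention to pure AR(p) models, which can be fit by least squares; all models are estimated on the same sample so their MSEs are comparable, and each criterion is computed in the multiplicative-penalty form given above. The data-generating process, sample size and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)
n, phi = 2000, 0.8
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

max_p = 4
Y = y[max_p:]                                              # common estimation sample
lags = np.column_stack([y[max_p - j: n - j] for j in range(1, max_p + 1)])
T = len(Y)

crit = {}
for p in range(max_p + 1):
    X = np.hstack([np.ones((T, 1)), lags[:, :p]])          # intercept + p lags
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = Y - X @ beta
    mse = (e**2).mean()
    k = p + 1                                              # number of parameters
    crit[p] = {"MSE": mse,
               "s2": T / (T - k) * mse,
               "AIC": np.exp(2 * k / T) * mse,
               "HQIC": np.log(T) ** (2 * k / T) * mse,
               "BIC": np.sqrt(T) ** (2 * k / T) * mse}
    print(p, {name: round(v, 4) for name, v in crit[p].items()})

best_bic = min(crit, key=lambda p: crit[p]["BIC"])
print("order chosen by BIC:", best_bic)
```

Because the AR(p) models are nested and fit on the same sample, the raw MSE can never rise as p grows, which is exactly the overfitting problem the penalised criteria are designed to counter.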
Figure 4.3: Penalty applied by various model selection criteria for the addition of an extra
parameter. MSE applies no penalty, so this function is always equal to 1. Here T = 1000.
To see how these four measures compare, we can plot the penalty term that each of
them applies for adding an extra regressor; see Figure 4.3. A penalty factor of 1 implies
no penalty at all, while a penalty factor greater than 1 implies some penalty. Obviously,
the MSE has a penalty factor of 1 for all k.
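The penalty factors plotted in Figure 4.3 follow directly from the formulas above, so we can tabulate them; a short Python sketch for T = 1000:

```python
import numpy as np

T = 1000
print(" k      s2     AIC    HQIC     BIC")
for k in range(1, 11):
    pens = (T / (T - k),                # s^2 penalty factor
            np.exp(2 * k / T),          # AIC penalty factor
            np.log(T) ** (2 * k / T),   # HQIC penalty factor
            np.sqrt(T) ** (2 * k / T))  # BIC penalty factor
    print(f"{k:2d}", "  ".join(f"{p:.4f}" for p in pens))
```

For every k the factors are ordered s² < AIC < HQIC < BIC, matching the ranking of strictness described below.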
Figure 4.3 shows the proportion by which the MSE must decrease before the
information criterion will report that the larger model is an improvement. The MSE
itself simply requires that the larger model decreases the MSE by some (possibly tiny)
amount. The s² measure requires a bit more improvement, the AIC and HQIC more
still, and the BIC is the strictest measure. As such, when selecting between different
models, the MSE will always choose the largest, while the model that the BIC selects
will generally be smaller than those selected by the HQIC, AIC and s².
The AIC and BIC are the two most widely-used model selection criteria, and there are
arguments in favour of each over the other. Most software packages report both, leaving
it to the researcher to decide which measure to use. The BIC will pick smaller models,
which is generally a good thing for forecasting, while the AIC will tend to pick larger
models.
2. Let

Yt = φ1Yt−1 + φ2Yt−2 + θεt−1 + εt, εt ∼ WN(0, σ²)
4.9. Solutions to activities
Activity 4.1

Let Yt be defined as below. Find its mean, variance, first autocovariance and second
autocovariance.

Yt = φ0 + εt + θεt−1, εt ∼ WN(0, σ²)

E[Yt] = E[φ0 + εt + θεt−1] = φ0

γ0 = V[Yt]
= V[φ0 + εt + θεt−1]
= V[εt] + V[θεt−1] + 2Cov[εt, θεt−1]
= σ² + θ²σ² + 0
= σ²(1 + θ²)

γ1 = Cov[Yt, Yt−1] = E[(εt + θεt−1)(εt−1 + θεt−2)] = θE[ε²t−1] = θσ²

γ2 = Cov[Yt, Yt−2] = 0, since Yt and Yt−2 have no innovations in common.
Activity 4.2

Yt = φ0 + εt + θ1εt−1 + θ2εt−2, εt ∼ WN(0, σ²)

γ0 = V[Yt]
= V[φ0 + εt + θ1εt−1 + θ2εt−2]
= V[εt] + V[θ1εt−1] + V[θ2εt−2]
= σ² + θ1²σ² + θ2²σ²
= σ²(1 + θ1² + θ2²)
γ1 = Cov[Yt, Yt−1]
= θ1E[ε²t−1] + θ1θ2E[ε²t−2]
= θ1σ² + θ1θ2σ²
= σ²θ1(1 + θ2)

γ2 = Cov[Yt, Yt−2]
= θ2E[ε²t−2]
= θ2σ²

γj = 0 for j ≥ 3
Activity 4.3

(a) Derive the variance of the forecast error for the optimal one-step and two-step
forecasts of each of Yt and Xt.

For Yt = φYt−1 + εt, with εt ∼ WN(0, 1):

Ŷt+1,t = Et[Yt+1] = φYt
so V[e^y_{t+1,t}] = V[εt+1] = 1

Ŷt+2,t = Et[Yt+2] = φ²Yt, since Yt+2 = φYt+1 + εt+2 = φ(φYt + εt+1) + εt+2
= φ²Yt + φεt+1 + εt+2
so V[e^y_{t+2,t}] = V[φεt+1 + εt+2] = 1 + φ²

For Xt = ut + θut−1, with ut ∼ WN(0, σ²):

X̂t+1,t = Et[Xt+1] = θut
so V[e^x_{t+1,t}] = V[ut+1] = σ²

X̂t+2,t = Et[Xt+2] = 0
so V[e^x_{t+2,t}] = V[ut+2 + θut+1] = σ²(1 + θ²)
(b) Find the values for θ and σ 2 that make Xt and Yt equally predictable (according to
the variance of their forecast errors) for one-step and two-step forecasts.
V[e^y_{t+1,t}] = V[e^x_{t+1,t}] ⇒ σ² = 1
V[e^y_{t+2,t}] = V[e^x_{t+2,t}] ⇒ 1 + φ² = 1 + θ² ⇒ θ = ±φ
(c) Given these values, which variable is easier to predict three steps ahead?

Ŷt+3,t = Et[Yt+3] = φ³Yt
so V[e^y_{t+3,t}] = 1 + φ² + φ⁴

X̂t+3,t = Et[Xt+3] = 0
so V[e^x_{t+3,t}] = σ²(1 + θ²)
= 1 + φ², if σ² = 1 and θ = ±φ
≤ 1 + φ² + φ⁴

Thus Xt has the (weakly) smaller three-step forecast error variance, and so Xt is easier
to predict three steps ahead (strictly so whenever φ ≠ 0).
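A simulation check of this comparison (a Python/NumPy sketch; φ = θ = 0.9 with σ² = 1, consistent with part (b), while the Gaussian innovations and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
phi = theta = 0.9
n = 200_000

eps = rng.standard_normal(n)   # innovations for Y (variance 1)
u = rng.standard_normal(n)     # innovations for X (sigma^2 = 1)

y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]
x = u.copy()
x[1:] += theta * u[:-1]

e_y = y[3:] - phi**3 * y[:-3]  # Y_{t+3} - E_t[Y_{t+3}]
e_x = x[3:]                    # X_{t+3} - E_t[X_{t+3}], since E_t[X_{t+3}] = 0

print(e_y.var(), 1 + phi**2 + phi**4)  # sample vs theory for Y
print(e_x.var(), 1 + theta**2)         # sample vs theory for X
```

The realised three-step error variance for Xt should sit below that for Yt, in line with the derivation above.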