
Continuous Distributions

by R.J. Reed
These notes are based on handouts for lecture modules given jointly by myself and the late Jeff Harrison.
A list of the distributions considered in this report is given on page 255.
There are many additional theoretical results in the exercises—full solutions are provided at the end.
The colour red indicates a hyperlink. The hyperlink to a page number is always to the top of the page, more
precisely to the headline of the page. Thus if a link is given as page 40(§13.1), it is better to click on the section
reference—the page number is useful if you have a printed copy.
Version 2, January 2019. Version 3, August 2019. Version 4, March 2020.

Contents

1 Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Revision of some basic results: 3. Exercises: 5. Order statistics: 8. Stable distributions: 11.
Infinitely divisible distributions: 15. Exercises: 15.

2 Univariate Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19


The uniform distribution: 19. Exercises: 25. The exponential distribution: 27. Exercises: 31.
The Gamma and χ2 distributions: 33. Exercises: 37. The beta and arcsine distributions: 40.
Exercises: 44. The normal distribution: 46. Exercises: 49. The lognormal distribution: 52.
Exercises: 55. The power law and Pareto distributions: 56. Exercises: 60. The t, Cauchy and
F distributions: 63. Exercises: 68. Non-central χ2, t and F: 71. Exercises: 76. Size, shape and
related characterization theorems: 77. Exercises: 81. Laplace, Rayleigh and Weibull distributions: 81.
Exercises: 83. The logistic distribution: 86. Extreme value distributions: 89. Exercises: 96.
The Lévy and inverse Gaussian distributions: 98. Exercises: 102. Other distributions with bounded
support: 103. Exercises: 104. Other distributions with unbounded support: 107. Exercises: 110.

3 Multivariate Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117


General results: 117. Exercises: 121. The bivariate normal: 123. Exercises: 127. The multivariate
normal: 130. Exercises: 139. Quadratic forms of normal random variables: 141. Exercises: 156.
The bivariate t distribution: 159. The multivariate t distribution: 160. Exercises: 163. The Dirichlet
distribution: 163. Exercises: 165.

Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Chapter 1.
Exercises on page 5: 167. Exercises on page 15: 171.
Chapter 2.
Exercises on page 25: 175. Exercises on page 31: 181. Exercises on page 37: 185. Exercises on page 44: 189.
Exercises on page 49: 193. Exercises on page 55: 198. Exercises on page 60: 200. Exercises on page 68: 206.
Exercises on page 76: 213. Exercises on page 81: 215. Exercises on page 83: 216. Exercises on page 96: 220.
Exercises on page 102: 223. Exercises on page 104: 226. Exercises on page 110: 229.
Chapter 3.
Exercises on page 121: 235. Exercises on page 127: 236. Exercises on page 139: 241. Exercises on
page 156: 244. Exercises on page 163: 248. Exercises on page 165: 249.

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

Comments are welcome—even comments such as “not useful because it omits xxx”. Please send comments and details of errors to my
Wordpress account, bcgts.wordpress.com, where the most up-to-date version of these notes will be found.

CHAPTER 1

Foundations
1 Revision of some basic results
1.1 Conditional variance and expectation. For any random vector (X, Y) such that E[Y^2] is finite, the conditional variance of Y given X is defined to be
    var(Y|X) = E[Y^2|X] − (E[Y|X])^2 = E[ (Y − E[Y|X])^2 | X ]        (1.1a)
This is a function of X. Taking expectations of the first equality and using the standard result var(Z) = E[Z^2] − (E[Z])^2 with Z = E[Y|X] shows that
    var(Y) = E[ var(Y|X) ] + var( E[Y|X] )        (1.1b)
Equation (1.1b) is often called the Law of Total Variance and is probably best remembered in the following form:
    var(Y) = E[conditional variance] + var(conditional mean)
This is similar to the decomposition in the analysis of variance. A generalization is given in exercise 2.13 on page 7.
Definition (1.1a). For any random vector (X, Y, Z) such that E[XY], E[X] and E[Y] are all finite, the conditional covariance between X and Y given Z is defined to be
    cov(X, Y|Z) = E[XY|Z] − E[X|Z] E[Y|Z]
An alternative definition is
    cov(X, Y|Z) = E[ (X − E[X|Z])(Y − E[Y|Z]) | Z ]        (1.1c)
Note that cov(X, Y|Z) is a function of Z. Using the results cov(X, Y) = E[XY] − E[X]E[Y] and
    cov( E[X|Z], E[Y|Z] ) = E[ E[X|Z] E[Y|Z] ] − E[X]E[Y]
gives the Law of Total Covariance
    cov(X, Y) = E[ cov(X, Y|Z) ] + cov( E[X|Z], E[Y|Z] )        (1.1d)
This can be remembered as
    cov(X, Y) = E[conditional covariance] + cov(conditional means)
Setting X = Y in the Law of Total Covariance gives the Law of Total Variance.
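As a quick numerical illustration of equation (1.1b): the following sketch is not part of the original notes; it assumes Python with numpy and uses the hierarchy X ∼ uniform(0, 1) and, given X = x, Y ∼ uniform(0, x) (the same model as exercise 2.14), for which E[Y|X] = X/2 and var(Y|X) = X^2/12.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X = rng.uniform(0.0, 1.0, size=n)
Y = X * rng.uniform(0.0, 1.0, size=n)       # given X = x, Y is uniform(0, x)

lhs = Y.var()                               # var(Y)
rhs = (X**2 / 12).mean() + (X / 2).var()    # E[var(Y|X)] + var(E[Y|X])
print(lhs, rhs)                             # both are close to 7/144 ≈ 0.0486
```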

1.2 Conditional independence. Recall that X ⊥⊥ Y | Z means that X and Y are conditionally independent
given Z. By definition
X ⊥⊥ Y | Z iff P[X ≤ x, Y ≤ y|Z] = P[X ≤ x|Z] P[Y ≤ y|Z] a.e. on (x, y) ∈ R^2.
By first considering simple random variables and then non-negative random variables by taking limits, we see that
if X ⊥⊥ Y | Z and E[X], E[Y ] and E[XY ] are all finite then
E[XY |Z] = E[X|Z] E[Y |Z] (1.2a)
Example(1.2a). Conditional independence does not imply independence.
Here is a simple demonstration: suppose box 1 contains two fair coins and box 2 contains two coins which have heads on
both sides. A box is chosen at random—denote the result by Z. A coin is selected from the chosen box and tossed—denote
the result by X; then the other coin from the chosen box is tossed independently of the first coin—denote the result by Y .
Clearly X ⊥⊥ Y | Z. However
    P[X = H, Y = H] = 5/8    but    P[X = H] = P[Y = H] = 3/4
so P[X = H, Y = H] ≠ P[X = H] P[Y = H] and X and Y are not independent.
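A two-line simulation makes these numbers concrete. This is an informal check, not part of the notes, and it assumes Python with numpy.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

Z = rng.integers(1, 3, size=n)            # chosen box: 1 (two fair coins) or 2 (two double-headed coins)
p_heads = np.where(Z == 1, 0.5, 1.0)      # P[heads] for any coin from the chosen box
X = rng.random(n) < p_heads               # first toss
Y = rng.random(n) < p_heads               # second toss, independent given the box

print((X & Y).mean())                     # ≈ 5/8 = 0.625
print(X.mean(), Y.mean())                 # each ≈ 3/4, and (3/4)^2 = 0.5625 ≠ 0.625
```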


1.3 The hazard function. Suppose X is a non-negative absolutely continuous random variable with density function fX and distribution function FX. Then the survivor function is the function SX(t) = 1 − FX(t) = P[X > t] and the hazard function is the function
    hX(t) = fX(t) / [1 − FX(t)] = fX(t) / SX(t)    for t > 0.        (1.3a)
If the random variable X represents the lifetime of some item, then the hazard function is a measure of the failure rate given the item has already lasted for a time t.
The cumulative hazard function is the function
    HX(t) = ∫_0^t hX(u) du    for t > 0.        (1.3b)
The following relations follow:
    hX(t) = −(d/dt) ln SX(t)        HX(t) = −ln SX(t)        SX(t) = e^{−HX(t)}        fX(t) = hX(t) e^{−HX(t)}
It follows that any one of the functions hX, HX, SX and fX determines all the others.
The function g : [0, ∞) → [0, ∞) can be used as a hazard function iff
    g ≥ 0    and    ∫_0^∞ g(t) dt = ∞
For a justification of this result see exercise 2.19 on page 7.
Example (1.3a). Suppose X has the exponential density fX(x) = λ e^{−λx} for x > 0. Find the hazard and cumulative hazard functions.
Solution. Now FX(t) = 1 − e^{−λt}. Hence hX(t) = λ, which is constant. Also HX(t) = λt for t > 0.
Example (1.3b). Suppose β > 0 and γ > 0 and X has the Weibull density
    fX(x) = (β x^{β−1} / γ^β) e^{−(x/γ)^β}    for x > 0.
Find the hazard and cumulative hazard functions.
Solution. The distribution function of X is FX(x) = 1 − exp(−x^β / γ^β) for x > 0 and hence
    hX(t) = β t^{β−1} / γ^β    and    HX(t) = t^β / γ^β    for t > 0.
Note that if β < 1 then the hazard function is decreasing and if β > 1 then the hazard function is increasing. If β = 1 then the hazard function is constant: this is just the exponential density again.

Further results about the hazard function are given in exercise 2.20 and exercise 2.21 on page 7.
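The relations in this section are easy to check numerically. The sketch below is not from the notes; it assumes Python with numpy, uses the arbitrary values β = 2 and γ = 1.5, and checks the Weibull formulas of example (1.3b) against the definitions (1.3a) and (1.3b).

```python
import numpy as np

# Weibull from example (1.3b): F(x) = 1 - exp(-(x/gamma)**beta)
beta, gamma = 2.0, 1.5
t = np.linspace(0.005, 5.0, 1000)
dt = t[1] - t[0]

F = 1.0 - np.exp(-(t / gamma) ** beta)
f = beta * t ** (beta - 1) / gamma ** beta * np.exp(-(t / gamma) ** beta)

h = f / (1.0 - F)                                    # hazard by definition (1.3a)
print(np.max(np.abs(h - beta * t ** (beta - 1) / gamma ** beta)))   # ≈ 0: matches the closed form

H = np.cumsum(h) * dt                                # crude numerical version of (1.3b)
print(np.max(np.abs(H - t ** beta / gamma ** beta)))                # small discretization error (of order dt)
```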
1.4 Skewness. Skewness is a measure of the asymmetry of a distribution.
Definition (1.4a). Suppose X is a random variable with finite expectation µ and finite variance σ^2. Then the skewness of X is defined to be
    skew[X] = E[ ((X − µ)/σ)^3 ] = E[(X − µ)^3] / σ^3        (1.4a)
A distribution is positively skewed iff skew(X) > 0, and then the density function has a long tail to the right of the mean. Exercise 2.22 on page 7 shows that
    skew[X] = ( E[X^3] − 3µσ^2 − µ^3 ) / σ^3        (1.4b)
Example (1.4b). Suppose X is a random variable with finite mean µ and finite variance σ^2. Suppose further that the distribution of X is symmetric about a; this means that X − a =_d a − X, where the notation W =_d Z means that the random variables W and Z have the same distribution. Show that E[X] = a and skew[X] = 0.
Solution. The fact that E[X − a] = E[a − X] implies E[X] = a. Similarly E[(X − a)^3] = E[(a − X)^3] implies E[(X − a)^3] = 0 and hence skew[X] = 0.
See exercise 2.23 on page 7 for an example of a distribution which is not symmetric but which has zero skewness.

1.5 Kurtosis. Kurtosis is a measure of the “peakedness” of a distribution or, equivalently, the fatness of the tails.
Definition (1.5a). Suppose X is a random variable with finite expectation µ and finite variance σ^2. Then the kurtosis of X is defined to be
    κ[X] = E[ ((X − µ)/σ)^4 ] = E[(X − µ)^4] / σ^4        (1.5a)
Exercise 2.25 on page 8 shows that, provided E[X^3] is finite,
    κ[X] = ( E[X^4] − 4µE[X^3] + 6µ^2σ^2 + 3µ^4 ) / σ^4        (1.5b)
Example (1.5b). Suppose the random variable X has the uniform distribution on (0, 1). Find the skewness and kurtosis of X.
Solution. Now E[X] = ∫_0^1 x dx = 1/2, E[X^2] = ∫_0^1 x^2 dx = 1/3 and hence var[X] = 1/12. Also E[X^3] = ∫_0^1 x^3 dx = 1/4. Using equation (1.4b) gives E[(X − µ)^3] = E[X^3] − 3µσ^2 − µ^3 = 1/4 − 3/24 − 1/8 = 0 and hence skew[X] = 0. Using equation (1.5b) gives E[(X − µ)^4] = E[X^4] − 4µE[X^3] + 6µ^2σ^2 + 3µ^4 = 1/5 − 4/8 + 6/48 + 3/16 = 1/80. Hence κ[X] = (1/80) / (1/12)^2 = 9/5.
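A simulation check of example (1.5b); this is not part of the notes and assumes Python with numpy.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=2_000_000)

mu, sigma = x.mean(), x.std()
print(((x - mu)**3).mean() / sigma**3)    # skewness, ≈ 0
print(((x - mu)**4).mean() / sigma**4)    # kurtosis, ≈ 9/5 = 1.8
```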
1.6 Location-scale families. Informally, a location-scale family is a family of distributions with two parameters: one parameter determines the location and the other parameter, which must be positive, determines the scale.
Definition (1.6a). Suppose X and Y are real-valued random variables. Then X and Y have the same type or belong to the same location-scale family iff there exist a ∈ R and b > 0 such that
    Y =_d a + bX
Equivalently, the distribution functions FX and FY have the same type iff there exist a ∈ R and b > 0 such that
    FY(y) = FX( (y − a)/b )    for all y ∈ R.        (1.6a)
Two distributions have the same type if they represent the same quantity but in different physical units. The parameter a determines the location and the parameter b determines the scale. It is easy to see that the relation of being of the same type is an equivalence relation (reflexive, symmetric and transitive) on the class of all distribution functions.
Definition (1.6b). The family of distributions F is said to be a location-scale family iff for all distribution functions F ∈ F and G ∈ F there exist a ∈ R and b > 0 such that F(y) = G( (y − a)/b ) for all y ∈ R.
Example (1.6c). Suppose X ∼ N(µX, σX^2) and Y ∼ N(µY, σY^2); then X and Y have the same type. We say that the family of distributions {N(µ, σ^2) : µ ∈ R, σ > 0} is a location-scale family of distributions.
Similarly, a family of distributions is called a scale family of distributions if any one member can be expressed as a positive multiple of another.
Definition (1.6d). The family of distributions F is said to be a scale family iff for all distribution functions F ∈ F and G ∈ F there exists b > 0 such that F(y) = G(y/b) for all y ∈ R.
Example (1.6e). Suppose Xλ ∼ exponential(λ). In §9.1, we shall see that this implies αXλ ∼ exponential(λ/α). Hence the family of distributions {exponential(λ) : λ > 0} is a scale family of distributions because
    Xλ =_d (µ/λ) Xµ
• If X and Y are in the same location-scale family which has finite expectations, then there exists b > 0 such that Y − µY =_d b(X − µX). Hence skew(X) = skew(Y) and κ(X) = κ(Y).
• By differentiating equation (1.6a), if X and Y are in the same location-scale family and both have absolutely continuous distributions with densities fX and fY respectively, then there exist a ∈ R and b > 0 such that
    fY(y) = (1/b) fX( (y − a)/b )
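The first bullet point can be seen numerically: skewness and kurtosis are unchanged under a + bX with b > 0. The sketch below is not from the notes; it assumes Python with numpy and uses the exponential(1) distribution purely as a convenient non-symmetric example.

```python
import numpy as np

def skew_kurt(z):
    mu, sigma = z.mean(), z.std()
    return ((z - mu)**3).mean() / sigma**3, ((z - mu)**4).mean() / sigma**4

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=2_000_000)    # any non-symmetric distribution will do
y = 5.0 + 2.5 * x                           # same type: a = 5, b = 2.5

print(skew_kurt(x))    # ≈ (2, 9) for the exponential(1) distribution
print(skew_kurt(y))    # essentially identical: they depend only on the type
```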

2 Exercises (exs-basic.tex)

Revision exercises.
2.1 The following assumptions are made about the interest rates for the next three years. Suppose the interest rate for year 1
is 4% p.a. effective. Let V2 and V3 denote the interest rates in years 2 and 3 respectively. Suppose V2 = 0.04 + U2 and
V3 = 0.04 + 2U3 where U2 and U3 are independent random variables with a uniform distribution on [−0.01, 0.01]. Hence
V2 has a uniform distribution on [0.03, 0.05] and V3 has a uniform distribution on [0.02, 0.06].
(a) Find the expectation of the accumulated amount at the end of 3 years of £1,000 invested now.
(b) Find the expectation of the present value of £1,000 in three years’ time. [Ans]

2.2 Suppose X and Y are positive random variables such that U = X/(X + Y) and V = X + Y are independent.
(a) Prove that (X^2 + Y^2)/XY and X + Y are independent.
(b) Prove that (X + Y)^2/XY and (X + Y)^2 are independent. [Ans]
2.3 Uniform to triangular. Suppose X and Y are i.i.d random variables with the uniform distribution uniform(−a, a),
where a > 0. Find the density of W = X + Y and sketch the shape of the density. [Ans]
2.4 Suppose the random variable X has the density fX(x) = 1/2 for −1 < x < 1. Find the density of Y = X^4. [Ans]
2.5 Suppose X is a random variable with X > 0 a.e. and such that E[X] and E[1/X] both exist. Prove that E[X] + E[1/X] ≥ 2. [Ans]
2.6 (a) Suppose X is a random variable with X ≥ 0 and density function f. Let F denote the distribution function of X. Show that
        E[X] = ∫_0^∞ [1 − F(x)] dx    and    E[X^r] = ∫_0^∞ r x^{r−1} [1 − F(x)] dx    for r = 1, 2, . . . .
(b) Now suppose X is a random variable with a ≤ X ≤ b where −∞ < a < b < ∞. Prove that
        E[X] = a + ∫_a^b [1 − F(x)] dx = b − ∫_a^b F(x) dx    [Ans]
2.7 Suppose X and Y are independent random variables with absolutely continuous distributions with densities fX and fY
respectively.
(a) Find the densities of W = Y − X and Z = Y + X.
(b) Find the densities of V = |Y − X| and T = (Y − X)2 . [Ans]
2.8 Suppose a > 0 and X and Y are i.i.d. random variables with the density
        f(x) = e^x / a    for −∞ < x < ln(a).
    Show that the density of W = |X − Y| is e^{−w} for w > 0; this is the exponential(1) distribution. [Ans]
2.9 Suppose X1, X2, . . . , Xn are independent and identically distributed positive random variables with mean µ, and Sn = X1 + · · · + Xn.
(a) Show that
        E[1/Sn] ≥ 1/(nµ)
    Hint: use Jensen’s Inequality, which states that if X is a random variable which takes values in the interval I and has a finite expectation and φ : I → R is a convex function, then φ(E[X]) ≤ E[φ(X)].
(b) Show that
        E[1/Sn] = ∫_0^∞ ( E[e^{−tX}] )^n dt    [Ans]
2.10 Suppose X1, X2, . . . , Xn are independent and identically distributed positive random variables.
(a) Suppose E[1/Xi] is finite for all i. By using the arithmetic-geometric mean inequality, show that E[1/Sj] is finite for all j = 2, 3, . . . , n where Sj = X1 + · · · + Xj.
(b) Suppose E[Xi] and E[1/Xi] both exist and are finite for all i. Show that
        E[Sj/Sn] = j/n    for j = 1, 2, . . . , n. [Ans]
2.11 Suppose X and Y are positive random variables with E[Y] > 0. Suppose further that X/Y is independent of X and X/Y is independent of Y.
(a) Suppose E[X^2], E[Y^2] and E[X^2/Y^2] are all finite. Show that E[X] = E[X/Y] E[Y]. Hence deduce that there exists b ∈ R with X/Y = b almost everywhere.
(b) Use characteristic functions to prove there exists b ∈ R with X/Y = b almost everywhere. [Ans]
2.12 Recursive calculation of the sample variance. Suppose {xn}n≥1 is a sequence in R. Let tn = Σ_{i=1}^n xi and vn = Σ_{i=1}^n (xi − tn/n)^2 for n ≥ 1. Hence
        vn = Σ_{i=1}^n xi^2 − tn^2/n
(a) Show that t_{n+1}^2 = tn^2 + x_{n+1}^2 + 2 x_{n+1} tn for n ≥ 1.
(b) Hence show that
        v_{n+1} = vn + (n/(n + 1)) (x_{n+1} − tn/n)^2    for n ≥ 1.
It follows that if sn^2 = Σ_{i=1}^n (xi − x̄n)^2/(n − 1) then n s_{n+1}^2 = (n − 1) sn^2 + n (x_{n+1} − x̄n)^2/(n + 1), where x̄n = Σ_{i=1}^n xi/n.
[Ans]

Conditional expectation.
2.13 Suppose (X, Y) is a random vector and g : R → R such that E[Y^2] < ∞ and E[g(X)^2] < ∞. Show that
        E[ (Y − g(X))^2 ] = E[ var[Y|X] ] + E[ (E[Y|X] − g(X))^2 ]    [Ans]

2.14 (For this question, you need the results that if X has the uniform distribution on (0, b), which is denoted uniform(0, b), then E[X] = b/2, E[X^2] = b^2/3 and var[X] = b^2/12.) Suppose X ∼ uniform(0, 1) and the distribution of Y given X = x is uniform(0, x). By using the law of total expectation E[Y] = E[ E[Y|X] ] and the law of total variance, which is equation (1.1b), find E[Y] and var[Y]. [Ans]
2.15 The best predictor of the random variable Y. Given the random vector (X, Y) with E[X^2] < ∞ and E[Y^2] < ∞, find that random variable Ŷ = g(X) which is a function of X and provides the best predictor of Y. Precisely, show that Ŷ = E[Y|X], which is a function of X, minimizes
        E[ (Y − Ŷ)^2 ]    [Ans]

2.16 Suppose the random vector (X, Y) satisfies 0 < E[X^2] < ∞ and 0 < E[Y^2] < ∞. Suppose further that E[Y|X = x] = a + bx a.e.
(a) Show that µY = a + bµX and E[XY] = aµX + bE[X^2]. Hence show that cov[X, Y] = b var[X] and E[Y|X] = µY + ρ(σY/σX)(X − µX) a.e.
(b) Show that var[E(Y|X)] = ρ^2 σY^2 and E[ (Y − E(Y|X))^2 ] = (1 − ρ^2) σY^2.
(Hence if ρ ≈ 1 then Y is near E(Y|X) with high probability; if ρ = 0 then the variation of Y about E(Y|X) is the same as the variation about the mean µY.)
(c) Suppose E(X|Y) = c + dY a.e. where bd < 1 and d ≠ 0. Find expressions for E[X], E[Y], ρ^2 and σY^2/σX^2 in terms of a, b, c and d. [Ans]
2.17 Best linear predictor of the random variable Y. Suppose the random vector (X, Y) satisfies 0 < E[X^2] < ∞ and 0 < E[Y^2] < ∞.
    Find a and b such that the random variable Ŷ = a + bX provides the best linear predictor of Y. Precisely, find a ∈ R and b ∈ R which minimize E[ (Y − a − bX)^2 ].
    Note. Suppose E[Y|X] = a0 + b0 X. By exercise 2.15, we know that E[Y|X] = a0 + b0 X is the best predictor of Y. Hence a0 + b0 X is also the best linear predictor of Y. [Ans]
2.18 Suppose the random vector (X, Y) has the density
        f(X,Y)(x, y) = (6/7)(x + y)^2    for x ∈ [0, 1] and y ∈ [0, 1];  and 0 otherwise.
(a) Find the best predictor of Y.
(b) Find the best linear predictor of Y.
(c) Compare the plots of the answers to parts (a) and (b) as functions of x ∈ [0, 1]. [Ans]
Hazard function, skewness and kurtosis.
2.19 Suppose g is a function : [0, ∞) → [0, ∞) with
        g ≥ 0    and    ∫_0^∞ g(t) dt = ∞
    Show that g can be used as the hazard function of a distribution. [Ans]
2.20 Suppose T1 and T2 are independent non-negative absolutely continuous random variables with hazard functions h1 and
h2 respectively. Let T = min{T1 , T2 }. Show that hT (t) = h1 (t) + h2 (t) for t > 0. [Ans]
2.21 Suppose the random variable T is non-negative and absolutely continuous. Let Y = HT (T ) where HT is the cumulative
hazard function of T . Prove that Y ∼ exponential (1). [Ans]
2.22 Suppose X is a random variable with finite expectation µ and finite variance σ^2.
(a) Show that
        skew[X] = ( E[X^3] − 3µσ^2 − µ^3 ) / σ^3        (2.22a)
(b) If a ∈ R and b > 0 show that skew[a + bX] = skew[X].
(c) If a ∈ R and b < 0 show that skew[a + bX] = −skew[X]. [Ans]
2.23 Suppose X ∼ N(−2, σ^2 = 1), Y ∼ N(1, σ^2 = 2), and I has the Bernoulli distribution with P[I = 1] = 1/3 and P[I = 0] = 2/3. Suppose further that X, Y and I are independent and Z = IX + (1 − I)Y. Show that E[Z] = 0, var[Z] = 11/3 and skew[Z] = 0 but the distribution of Z is not symmetric. [Ans]

2.24 Suppose the random variable B has the Bernoulli distribution with P[B = 1] = p and P[B = 0] = 1 − p where p ∈ [0, 1].
(a) Find skew[B].
(b) Find κ[B]. [Ans]
2.25 Suppose X is a random variable with finite expectation µ, finite variance σ^2, and finite third moment E[X^3].
(a) Show that
        κ[X] = ( E[X^4] − 4µE[X^3] + 6µ^2σ^2 + 3µ^4 ) / σ^4        (2.25a)
(b) If a ∈ R and b ≠ 0 show that κ[a + bX] = κ[X]. [Ans]

3 Order statistics
3.1 Basics. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with an absolutely continuous distribution
which has distribution function F and density f . Suppose further that
X1:n , X2:n , . . . , Xn:n
denote the order statistics of X1 , X2 , . . . , Xn . This means
X1:n = min{X1 , . . . , Xn }
Xn:n = max{X1 , . . . , Xn }
and the random variables X1:n , X2:n , . . . , Xn:n consist of X1 , X2 , . . . , Xn arranged in increasing order; hence
X1:n ≤ X2:n ≤ · · · ≤ Xn:n

3.2 Finding the density of (X1:n, . . . , Xn:n). Let g(y1, . . . , yn) denote the density of (X1:n, . . . , Xn:n). Note that (X1:n, . . . , Xn:n) can be regarded as a transformation T of the vector (X1, . . . , Xn).
• Suppose n = 2. Let A1 = {(x1, x2) ∈ R^2 : x1 < x2} and A2 = {(x1, x2) ∈ R^2 : x1 > x2}. Then T : A1 → A1 is 1–1 and T : A2 → A1 is 1–1. Hence for all (y1, y2) ∈ A1, the density g(y1, y2) of (X1:2, X2:2) is
    g(y1, y2) = fX1,X2(y1, y2) / |∂(y1, y2)/∂(x1, x2)| + fX1,X2(y2, y1) / |∂(y1, y2)/∂(x1, x2)|
              = fX1,X2(y1, y2)/|1| + fX1,X2(y2, y1)/|−1|
              = 2 f(y1) f(y2)
• Suppose n = 3. For this case, we need A1, A2, A3, A4, A5 and A6 where
    A1 = {(x1, x2, x3) ∈ R^3 : x1 < x2 < x3}
    A2 = {(x1, x2, x3) ∈ R^3 : x1 < x3 < x2}
etc. There are 3! = 6 orderings of (x1, x2, x3). So this leads to
    g(y1, y2, y3) = 3! f(y1) f(y2) f(y3)
• For the general case of n ≥ 2, we have
    g(y1, . . . , yn) = n! f(y1) · · · f(yn)    for y1 < · · · < yn.        (3.2a)

3.3 Finding the distribution of Xr:n by using distribution functions. Dealing with the maximum is easy:
    Fn:n(x) = P[Xn:n ≤ x] = P[X1 ≤ x, . . . , Xn ≤ x] = ∏_{i=1}^{n} P[Xi ≤ x] = {F(x)}^n
and by differentiation
    fn:n(x) = n f(x) {F(x)}^{n−1}
Provided the random variables are positive, using the result of exercise 2.6 on page 6 gives the following expression for the expectation of the maximum:
    E[Xn:n] = ∫_0^∞ [ 1 − {F(x)}^n ] dx

Now for the distribution of the minimum, X1:n:
    P[X1:n > x] = P[X1 > x, . . . , Xn > x] = ∏_{i=1}^{n} P[Xi > x] = {1 − F(x)}^n
    F1:n(x) = 1 − P[X1:n > x] = 1 − {1 − F(x)}^n
    f1:n(x) = n f(x) {1 − F(x)}^{n−1}
Provided the random variables are positive, using the result of exercise 2.6 on page 6 gives
    E[X1:n] = ∫_0^∞ {1 − F(x)}^n dx
Now for the general case, Xr:n where 2 ≤ r ≤ n − 1. The event {Xr:n ≤ x} occurs iff at least r random variables from X1, . . . , Xn are less than or equal to x. Hence
    P[Xr:n ≤ x] = Σ_{j=r}^{n} C(n, j) {F(x)}^j {1 − F(x)}^{n−j}        (3.3a)
                = Σ_{j=r}^{n−1} C(n, j) {F(x)}^j {1 − F(x)}^{n−j} + {F(x)}^n
Differentiating gives
    fr:n(x) = Σ_{j=r}^{n−1} C(n, j) j f(x) {F(x)}^{j−1} {1 − F(x)}^{n−j}
              − Σ_{j=r}^{n−1} C(n, j) (n − j) f(x) {F(x)}^j {1 − F(x)}^{n−j−1} + n f(x) {F(x)}^{n−1}
            = Σ_{j=r}^{n} [ n! / ((j − 1)!(n − j)!) ] f(x) {F(x)}^{j−1} {1 − F(x)}^{n−j}
              − Σ_{j=r}^{n−1} [ n! / (j!(n − j − 1)!) ] f(x) {F(x)}^j {1 − F(x)}^{n−j−1}
            = [ n! / ((r − 1)!(n − r)!) ] f(x) {F(x)}^{r−1} {1 − F(x)}^{n−r}        (3.3b)
Note that equation (3.3b) is true for all r = 1, 2, . . . , n.
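Equation (3.3a) is straightforward to verify by simulation. The following sketch is not part of the notes; it assumes Python with numpy and uses uniform(0, 1) samples, for which F(x) = x, with the arbitrary choices n = 7, r = 3 and x = 0.4.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(4)
n, r = 7, 3
samples = np.sort(rng.uniform(0.0, 1.0, size=(200_000, n)), axis=1)
Xr = samples[:, r - 1]                              # the r-th order statistic X_{r:n}

x = 0.4                                             # for uniform(0,1), F(x) = x
formula = sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(r, n + 1))   # equation (3.3a)
print(formula, (Xr <= x).mean())                    # the two values agree to Monte Carlo accuracy
```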
3.4 Finding the distribution of Xr:n by using the density of (X1:n, . . . , Xn:n). Recall that the density of (X1:n, . . . , Xn:n) is g(y1, . . . , yn) = n! f(y1) · · · f(yn) for y1 < · · · < yn.
Integrating out yn gives
    g(y1, . . . , yn−1) = n! f(y1) · · · f(yn−1) ∫_{yn−1}^{∞} f(yn) dyn = n! f(y1) · · · f(yn−1) [1 − F(yn−1)]
Now integrate out yn−1:
    g(y1, . . . , yn−2) = n! f(y1) · · · f(yn−2) ∫_{yn−2}^{∞} f(yn−1) [1 − F(yn−1)] dyn−1
                       = n! f(y1) · · · f(yn−2) [1 − F(yn−2)]^2 / 2!
Integrating out yn−2 gives
    g(y1, . . . , yn−3) = n! f(y1) · · · f(yn−3) [1 − F(yn−3)]^3 / 3!
By induction, for r = 1, 2, . . . , n − 1 we have
    g(y1, . . . , yr) = n! f(y1) · · · f(yr) [1 − F(yr)]^{n−r} / (n − r)!    for y1 < y2 < · · · < yr.
Assuming r ≥ 3 and integrating over y1 gives
    g(y2, . . . , yr) = n! ∫_{y1=−∞}^{y2} f(y1) · · · f(yr) [1 − F(yr)]^{n−r} / (n − r)! dy1 = n! F(y2) f(y2) · · · f(yr) [1 − F(yr)]^{n−r} / (n − r)!
Integrating over y2 gives
    g(y3, . . . , yr) = n! ( [F(y3)]^2 / 2! ) f(y3) · · · f(yr) [1 − F(yr)]^{n−r} / (n − r)!    for y3 < · · · < yr.
And so on, leading to equation (3.3b).
3.5 Joint distribution of (Xj:n, Xr:n) by using the density of (X1:n, . . . , Xn:n). Suppose X1:n, . . . , Xn:n denote the order statistics from the n random variables X1, . . . , Xn which have density f(x) and distribution function F(x). Suppose 1 ≤ j < r ≤ n; then the joint density of (Xj:n, Xr:n) is
    f(j:n,r:n)(u, v) = c f(u) f(v) [F(u)]^{j−1} [F(v) − F(u)]^{r−1−j} [1 − F(v)]^{n−r}        (3.5a)
where
    c = n! / [ (j − 1)! (r − 1 − j)! (n − r)! ]
The method used to derive this result is the same as that used to derive the distribution of Xr:n in the previous paragraph.
Example (3.5a). Suppose X1, . . . , Xn are i.i.d. random variables with density f(x) and distribution function F(x). Find expressions for the density and distribution function of Rn = Xn:n − X1:n, the range of X1, . . . , Xn.
Solution. The density of (X1:n, Xn:n) is
    f(1:n,n:n)(u, v) = n(n − 1) f(u) f(v) [F(v) − F(u)]^{n−2}    for u < v.
Now use the transformation R = Xn:n − X1:n and T = X1:n. The absolute value of the Jacobian is one. Hence
    f(R,T)(r, t) = n(n − 1) f(t) f(r + t) [F(r + t) − F(t)]^{n−2}    for r > 0 and t ∈ R.
Integrating out T gives
    fR(r) = n(n − 1) ∫_{t=−∞}^{∞} f(t) f(r + t) [F(r + t) − F(t)]^{n−2} dt
The distribution function is, for v > 0,
    FR(v) = n(n − 1) ∫_{r=0}^{v} ∫_{t=−∞}^{∞} f(t) f(r + t) [F(r + t) − F(t)]^{n−2} dt dr
          = n(n − 1) ∫_{t=−∞}^{∞} f(t) ∫_{r=0}^{v} f(r + t) [F(r + t) − F(t)]^{n−2} dr dt
          = n ∫_{t=−∞}^{∞} f(t) [ [F(r + t) − F(t)]^{n−1} ]_{r=0}^{r=v} dt = n ∫_{t=−∞}^{∞} f(t) [F(v + t) − F(t)]^{n−1} dt

3.6 Joint distribution of (Xj:n, Xr:n) by using distribution functions. Suppose u < v and then define the counts N1, N2 and N3 as follows:
    N1 = Σ_{i=1}^{n} I(Xi ≤ u)        N2 = Σ_{i=1}^{n} I(u < Xi ≤ v)        and        N3 = n − N1 − N2 = Σ_{i=1}^{n} I(Xi > v)
Now P[X1 ≤ u] = F(u); also P[u < X1 ≤ v] = F(v) − F(u) and P[X1 > v] = 1 − F(v). It follows that the vector (N1, N2, N3) has the multinomial distribution with probabilities ( F(u), F(v) − F(u), 1 − F(v) ).
The joint distribution function of (Xj:n, Xr:n) is:
    P[Xj:n ≤ u and Xr:n ≤ v] = P[N1 ≥ j and N1 + N2 ≥ r] = Σ_{ℓ=r}^{n} Σ_{k=j}^{ℓ} P[N1 = k and N1 + N2 = ℓ]
                             = Σ_{ℓ=r}^{n} Σ_{k=j}^{ℓ} [ n! / (k!(ℓ − k)!(n − ℓ)!) ] [F(u)]^k [F(v) − F(u)]^{ℓ−k} [1 − F(v)]^{n−ℓ}
Now the joint density of (Xj:n, Xr:n) is:
    f(j:n,r:n)(u, v) = ∂^2/∂u∂v P[Xj:n ≤ u and Xr:n ≤ v]

Using the abbreviations a = F(u), b = F(v) − F(u) and c = 1 − F(v) gives
    ∂/∂u P[Xj:n ≤ u and Xr:n ≤ v]
        = f(u) Σ_{ℓ=r}^{n} { Σ_{k=j}^{ℓ} [ n! / ((k − 1)!(ℓ − k)!(n − ℓ)!) ] a^{k−1} b^{ℓ−k} c^{n−ℓ}
                             − Σ_{k=j}^{ℓ−1} [ n! / (k!(ℓ − k − 1)!(n − ℓ)!) ] a^{k} b^{ℓ−k−1} c^{n−ℓ} }
        = f(u) Σ_{ℓ=r}^{n} [ n! / ((j − 1)!(ℓ − j)!(n − ℓ)!) ] a^{j−1} b^{ℓ−j} c^{n−ℓ}
and hence
    ∂^2/∂u∂v P[Xj:n ≤ u and Xr:n ≤ v] = f(u) f(v) [ n! / ((j − 1)!(r − j − 1)!(n − r)!) ] a^{j−1} b^{r−j−1} c^{n−r}
as required; see equation (3.5a) on page 10.
3.7 Asymptotic distributions. The next proposition gives the asymptotic distribution of the median.
Proposition (3.7a). Suppose the random variable X has an absolutely continuous distribution with density f which is positive and continuous at the median, µ̃. Suppose i_n = ⌊n/2⌋ + 1. Then
    2 √n f(µ̃) ( X_{i_n:n} − µ̃ ) →_D N(0, 1)    as n → ∞.
This means that X_{i_n:n} is asymptotically normal with mean µ̃ and variance 1/(4n f(µ̃)^2).
Proof. See page 223 in [ARNOLD et al.(2008)].
For other results, see chapter 8 in [ARNOLD et al.(2008)].
Example (3.7b). Suppose U1, . . . , U_{2n−1} are i.i.d. random variables with the uniform(0, 1) distribution. Find the asymptotic distribution of the median of U1, . . . , U_{2n−1}.
Solution. The median is U_{n:(2n−1)} and hence by the proposition √(8n − 4) ( U_{n:(2n−1)} − 1/2 ) →_D N(0, 1) as n → ∞. Of course √(8n)/√(8n − 4) → 1 as n → ∞. Hence by Lemma 23 on page 263 of [FRISTEDT & GRAY(1997)] we have
    √(8n) ( U_{n:(2n−1)} − 1/2 ) →_D N(0, 1)    as n → ∞, and
    lim_{n→∞} P[ U_{n:(2n−1)} − 1/2 < t/√(8n) ] = Φ(t)    for t ∈ R.
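A simulation of example (3.7b); not part of the notes, assuming Python with numpy and an arbitrary choice of n.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 200, 20_000
u = rng.uniform(0.0, 1.0, size=(reps, 2 * n - 1))
med = np.median(u, axis=1)                          # U_{n:(2n-1)}, the sample median

z = np.sqrt(8 * n) * (med - 0.5)                    # scaling from example (3.7b)
print(z.mean(), z.std())                            # ≈ 0 and ≈ 1, as N(0,1) predicts
```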

4 Stable distributions
4.1 The basic definition. It is well known that if X1, X2, . . . , Xn are i.i.d. random variables with the N(µ, σ^2) distribution, then X1 + · · · + Xn ∼ N(nµ, nσ^2); hence X1 + · · · + Xn =_d (n − √n)µ + √n X where X ∼ N(µ, σ^2). This means that X1 + · · · + Xn and X are of the same type¹ for all n ∈ {1, 2, . . .}.
A distribution is stable iff its shape is preserved under addition up to shift and scale. The formal definition is
Definition (4.1a). The random variable X has a stable distribution iff it is non-constant and for every n ∈ {1, 2, . . .}, if X1, . . . , Xn are i.i.d. random variables with the same distribution as X then the random variables X1 + · · · + Xn and X have the same type; this means that for every n ∈ {1, 2, . . .} there exist an ∈ R and bn > 0 such that X1 + · · · + Xn =_d an + bn X.
The {an} are called the centring parameters and the {bn} are called the scaling parameters. We need to insist on non-constant in the definition; otherwise X = µ, an = (n − 1)µ and bn = 1 would be a solution.
If the centring constant is always zero, then the distribution is said to be strictly stable.
Definition (4.1b). The random variable X has a strictly stable distribution iff it is non-constant and for every n ∈ {1, 2, . . .} there exists bn > 0 such that, if X1, . . . , Xn are i.i.d. random variables with the same distribution as X, then X1 + · · · + Xn =_d bn X.
Similarly, the distribution function F is stable if for every n ∈ {1, 2, . . .} there exist constants an ∈ R and bn > 0 such that if X1, . . . , Xn are independent and have distribution function F then bn^{−1}(X1 + · · · + Xn) + an has distribution function F. Again the non-constant requirement is essential; otherwise X = µ and bn = n would be a solution.
    ¹ Same type is defined in §1.6 on page 5.
Note that if a + bX =_d c + dX, then either a = c and b = d or X is constant. It follows that if we exclude constant random variables, then the values of the parameters an and bn in definitions (4.1a) and (4.1b) are unique.
Example (4.1c). Suppose X has the normal N(µ, σ^2) distribution. Show that X is stable and find an and bn.
Solution. X1 + · · · + Xn =_d (n − √n)µ + √n X. Hence the centring parameter is an = (n − √n)µ and the scaling parameter is bn = √n.
If X has a stable distribution with a finite variance, then by equating expectations and variances we see that X1 + · · · + Xn =_d (n − √n)µ + √n X.
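The identity in example (4.1c) can be checked by comparing moments of the two sides by simulation. This sketch is not from the notes and assumes Python with numpy.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n = 1.0, 2.0, 5
reps = 500_000

lhs = rng.normal(mu, sigma, size=(reps, n)).sum(axis=1)                        # X1 + ... + Xn
rhs = (n - np.sqrt(n)) * mu + np.sqrt(n) * rng.normal(mu, sigma, size=reps)    # a_n + b_n X

print(lhs.mean(), lhs.var())   # ≈ n*mu = 5 and n*sigma^2 = 20
print(rhs.mean(), rhs.var())   # the same, as the identity predicts
```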
4.2 Basic properties.
Proposition (4.2a). Suppose the random variable X is stable with centring parameter an and scaling parameter bn.
(a) Suppose c ∈ R and d ∈ R with d ≠ 0. Then the random variable Y = c + dX is stable with centring parameter dan + (n − bn)c and scaling parameter bn.
(b) The random variable −X is stable with centring parameter −an and scaling parameter bn.
Proof.
(a) Suppose Y1, . . . , Yn are i.i.d. random variables with the same distribution as Y. Then Y1 + · · · + Yn =_d nc + d(X1 + · · · + Xn) where X1, . . . , Xn are i.i.d. random variables with the same distribution as X. Now X1 + · · · + Xn =_d an + bn X. Hence Y1 + · · · + Yn =_d (nc + dan) + d bn X =_d dan + (n − bn)c + bn Y.
(b) Suppose X1, . . . , Xn are i.i.d. random variables with the same distribution as X. Then X1 + · · · + Xn =_d an + bn X. Hence −X1 − · · · − Xn =_d −an − bn X; also −X1, . . . , −Xn are i.i.d. random variables with the same distribution as −X. Hence the result.
Proposition (4.2b). Suppose X has a stable distribution with centring parameter an and scaling parameter bn, and suppose Y has a stable distribution with centring parameter cn and the same scaling parameter bn. Suppose further that X and Y are independent. Then the random variable Z = X + Y has a stable distribution with centring parameter an + cn and scaling parameter bn.
Proof. Suppose Z1, . . . , Zn are i.i.d. random variables with the same distribution as Z. Then Z1 + · · · + Zn =_d (X1 + · · · + Xn) + (Y1 + · · · + Yn) where X1, . . . , Xn are i.i.d. with the distribution of X and Y1, . . . , Yn are i.i.d. with the distribution of Y. Hence X1 + · · · + Xn =_d an + bn X and Y1 + · · · + Yn =_d cn + bn Y. Hence Z1 + · · · + Zn =_d (an + cn) + bn(X + Y) =_d (an + cn) + bn Z as required.
Proposition (4.2c). Suppose X and Y are i.i.d. random variables with a stable distribution which has centring parameters an and scaling parameters bn. Then X − Y has a strictly stable distribution with scaling parameter bn.
Proof. This follows immediately from propositions (4.2a) and (4.2b).
This last proposition means that X − Y has a strictly stable distribution which is symmetric about zero.² By using this result, proofs about stable distributions can sometimes be reduced to proofs about strictly stable distributions which are symmetric about 0.
    ² The random variable X has a distribution which is symmetric about 0 iff X and −X have the same distribution.
4.3 More advanced results.
Proposition (4.3a). Suppose X has a stable distribution with centring parameters {an} and scaling parameters {bn}. Then there exists α ∈ (0, 2] such that
    bn = n^{1/α}    for every n ∈ {1, 2, . . .}.
The constant α is called the characteristic exponent of the distribution.
Proof. This proof is based on pages 170–171 of [FELLER(1971)].
First we assume that the distribution of X is symmetric and strictly stable. Part (b) of exercise 6.18 shows that for all integers m ≥ 1 and n ≥ 1 we have b_{m+n} X =_d bm X1 + bn X2 where X1 and X2 are i.i.d. random variables with the same distribution as X. Hence for x ≥ 0 we have
    { X1 ≥ 0, X2 ≥ (b_{m+n}/bn) x } ⊆ { X = (bm/b_{m+n}) X1 + (bn/b_{m+n}) X2 ≥ x }

Independence of X1 and X2 implies
    P[X ≥ x] ≥ P[X1 ≥ 0] P[ X2 ≥ (b_{m+n}/bn) x ]
Because the distribution of X is symmetric, we must³ have P[X ≥ 0] ≥ 1/2. Hence
    P[X ≥ x] ≥ (1/2) P[ X ≥ (b_{m+n}/bn) x ]    for all x ≥ 0 and all integers m ≥ 1 and n ≥ 1.
We now show that the set
    { bn / b_{m+n} : m ≥ 1, n ≥ 1 }
is bounded above. For suppose not: then there exists a sequence (mj, nj) such that
    b_{nj} / b_{mj+nj} → ∞  as j → ∞    and hence    b_{mj+nj} / b_{nj} → 0  as j → ∞
Hence
    P[X > x] ≥ lim_{j→∞} (1/2) P[ X ≥ (b_{mj+nj}/b_{nj}) x ] ≥ 1/4
and 1 − FX(x) = P[X > x] ≥ 1/4 for all x gives the required contradiction. We have shown there exists K > 0 such that
    bn / b_{m+n} ≤ K    for all integers m ≥ 1 and n ≥ 1.
To clarify the following argument, we shall now treat b as a function b : {1, 2, . . .} → (0, ∞). By part (a) of exercise 6.18 we know that b(r^k) = [b(r)]^k for all integers k ≥ 1 and r ≥ 1.
Fix r ∈ {2, 3, . . .} and let α = ln(r)/ln(b(r)); hence b(r) = r^{1/α}. Similarly fix s ∈ {2, 3, . . .} with s ≠ r and let β = ln(s)/ln(b(s)); hence b(s) = s^{1/β}. We need to show α = β.
For every j ∈ {1, 2, . . .}, we have b(r^j) = [b(r)]^j = r^{j/α}. Similarly for every k ∈ {1, 2, . . .}, we have b(s^k) = [b(s)]^k = s^{k/β}.
Fix ℓ ∈ {1, 2, . . .}; then there exists j ∈ {1, 2, . . .} such that r^j ≤ s^ℓ ≤ r^{j+1}, which implies s^{ℓ/β} ≤ r^{j/β} r^{1/β}. Hence b(s^ℓ) ≤ [b(r^j)]^{α/β} r^{1/β}. Hence for every ℓ ∈ {1, 2, . . .} there exists j ∈ {1, 2, . . .} such that
    K ≥ b(r^j) / b(s^ℓ) ≥ [b(r^j)]^{(β−α)/β} / r^{1/β}
As ℓ → ∞ so j → ∞ and b(r^j) = [b(r)]^j → ∞ because b(r) > 1. Hence β ≤ α. Interchanging r and s shows that α ≤ β. Hence α = β and b(n) = n^{1/α} for all n ∈ {1, 2, . . .}.
It remains to prove that α ≤ 2. Suppose α > 2; we shall obtain a contradiction. Now
    E[X^2] = ∫_0^∞ P[X^2 > x] dx = ∫_0^∞ P[|X| > √x] dx = Σ_{k=1}^{∞} ∫_{2^{k−1}}^{2^k} P[|X| > √x] dx        (4.3a)
Now for every t > 0 and n ∈ {1, 2, . . .} we have
    P[ |X1 + · · · + Xn| > t bn ] = P[ bn|X| > t bn ] = P[ |X| > t ]
Hence there exists t > 0 such that P[ |X1 + · · · + Xn| > t bn ] ≤ 1/4. Using lemma 2 on page 149 of [FELLER(1971)] gives
    (1/2)( 1 − e^{−n P[|X| > t bn]} ) ≤ P[ |X1 + · · · + Xn| ≥ t bn ] ≤ 1/4
So this implies there exists K2 such that n P[|X| > t bn] < K2 for all n. Substituting x = t bn gives P[|X| > x] ≤ (K2 t^α) x^{−α}. Hence
    ∫_{2^{k−1}}^{2^k} P[|X| > √x] dx ≤ K2 t^α 2^{k(1−α/2)}
If α > 2 then equation (4.3a) implies E[X^2] < ∞. If a stable distribution has a finite variance, then by equating expectations and variances we see that X1 + · · · + Xn =_d (n − √n)µ + √n X, and hence α = 2. So we have a contradiction and we must have α ∈ (0, 2].
Now suppose X is stable with centring parameters {an} and scaling parameters {bn}. Proposition (4.2c) implies the required result.
    ³ By definition, symmetric means that the distributions of X and −X are the same. Let α = P[X > 0] = P[X < 0]. Then 2α + P[X = 0] = 1; hence P[X ≥ 0] = α + P[X = 0] = 1 − α ≥ 1/2.
Proposition (4.3b). Every stable distribution is continuous.
Proof. This proof is based on page 215 of [FELLER(1971)].
First suppose X is a random variable with a strictly stable distribution with scaling factors {bn}.

Part (b) of exercise 6.18 shows that for all integers m ≥ 1 and n ≥ 1 we have b_{m+n} X =_d bm X1 + bn X2 where X1 and X2 are i.i.d. random variables with the same distribution as X.
Suppose there exists x ∈ R with x ≠ 0 such that P[X = x] = p > 0. Then
    p^2 = P[X1 = x] P[X2 = x] = P[(X1 = x) ∩ (X2 = x)] ≤ P[ X = (bm + bn)x / b_{m+n} ]
So we have infinitely many points {xj} with P[X = xj] ≥ p^2; contradiction. Hence
    P[X = x] = 0    for all x ≠ 0.
Now suppose P[X = 0] = p > 0. By proposition (4.3a) we know that b1 = 1^{1/α} = 1. Using b2 X =_d b1 X1 + b1 X2 = X1 + X2 shows that P[X1 + X2 = 0] = P[X = 0] = p. But we have already established that P[X = x] = 0 for all x ≠ 0; hence P[X1 + X2 = 0] = P[X1 = 0] P[X2 = 0] = p^2. Hence p = p^2 and hence p = 0. This proves X has a continuous distribution.
Now suppose X has a stable distribution. Let V = X1 − X2 where X1 and X2 are i.i.d. random variables with the same distribution as X. Hence V has a strictly stable distribution and so is continuous. Suppose P[X = x] = p > 0; then p^2 = P[X1 = x, X2 = x] ≤ P[V = 0], which gives a contradiction. Hence the distribution of X is continuous.
Now for the characteristic function:
Proposition (4.3c). Suppose X has a stable distribution with characteristic exponent α. Then there exist β ∈ [−1, 1], c ∈ R and d > 0 such that for all t ∈ R we have
    E[e^{itX}] = exp( itc − d^α |t|^α [ 1 − iβ sgn(t) tan(πα/2) ] )    if α ≠ 1;
    E[e^{itX}] = exp( itc − d|t| [ 1 + (2iβ/π) sgn(t) ln(|t|) ] )    if α = 1.        (4.3b)
where
    sgn(t) = −1 if t < 0;  0 if t = 0;  1 if t > 0.
Proof. See pages 204–207 of [BREIMAN(1968)].
If X has a stable distribution with a finite variance, then by equating expectations and variances we see that X1 + · · · + Xn =_d (n − √n)µ + √n X. Hence α = 2 and the characteristic function is
    E[e^{itX}] = exp( itc − d^2 t^2 [1 − iβ sgn(t) tan(π)] ) = exp( itc − d^2 t^2 )
Hence X ∼ N(c, σ^2 = 2d^2). Hence the only stable distribution with finite variance is the normal distribution; also, the only stable distribution with α = 2 is the normal distribution.
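The α = 2 reduction can be checked numerically. The sketch below is not part of the notes; it assumes Python with numpy, implements only the α ≠ 1 branch of equation (4.3b), and compares it at α = 2 with the empirical characteristic function of an N(c, 2d^2) sample.

```python
import numpy as np

def stable_cf(t, alpha, c, d, beta):
    # The alpha != 1 branch of equation (4.3b).
    t = np.asarray(t, dtype=float)
    return np.exp(1j * c * t
                  - d**alpha * np.abs(t)**alpha
                  * (1 - 1j * beta * np.sign(t) * np.tan(np.pi * alpha / 2)))

rng = np.random.default_rng(8)
c, d = 0.5, 1.2
x = rng.normal(c, np.sqrt(2) * d, size=400_000)      # N(c, sigma^2 = 2 d^2)

t = np.array([-1.0, 0.3, 2.0])
print(stable_cf(t, 2.0, c, d, beta=0.7))             # beta drops out when alpha = 2 since tan(pi) = 0
print(np.exp(1j * np.outer(t, x)).mean(axis=1))      # empirical E[e^{itX}]; close to the line above
```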
If X1 and X2 are independent and have stable distributions with the same characteristic exponent then, by proposition (4.3a), they have the same scaling parameters. Hence it follows by proposition (4.2b) that X1 + X2 has a stable distribution. This leads to an alternative definition of a stable distribution.
Proposition (4.3d). The random variable X has a stable distribution iff for every γ1 > 0 and γ2 > 0 there exist a ∈ R and b > 0 such that if X1 and X2 are i.i.d. random variables with the same distribution as X then
    γ1 X1 + γ2 X2 =_d a + bX.
Proof.
⇐ Setting γ1 = γ2 = 1 shows that there exist a2 ∈ R and b2 > 0 such that if X1 and X2 are i.i.d. random variables with the same distribution as X then X1 + X2 =_d a2 + b2 X. We now proceed by induction. Suppose X1, . . . , Xn, Xn+1 are i.i.d. random variables with the distribution of X. By the induction assumption, there exist an ∈ R and bn > 0 such that X1 + · · · + Xn =_d an + bn X. By independence, X1 + · · · + Xn + Xn+1 =_d an + bn X1 + Xn+1 where X1 and Xn+1 are i.i.d. with the same distribution as X. Using the given assumption shows that there exist c ∈ R and bn+1 > 0 such that bn X1 + Xn+1 =_d c + bn+1 X. Hence X1 + · · · + Xn+1 =_d (an + c) + bn+1 X. Hence the result by induction.
⇒ By exercise 6.17.

5 Infinitely divisible distributions
5.1 The definition.
Definition (5.1a). The random variable X : (Ω, F, P) → (R, B) has an infinitely divisible distribution iff for every n ∈ {1, 2, . . .} there exist i.i.d. random variables X1, X2, . . . , Xn such that
    X =_d X1 + · · · + Xn
The key result is that stable implies infinitely divisible.
Proposition (5.1b). Suppose the random variable X has a stable distribution. Then X has an infinitely divisible distribution.
Proof. Suppose n ∈ {1, 2, . . .} and X1, X2, . . . , Xn are i.i.d. random variables with the same distribution as X. By definition (4.1a), the definition of a stable distribution, we know there exist an ∈ R and bn > 0 such that
    an + bn X =_d X1 + · · · + Xn
Hence
    X =_d Σ_{j=1}^{n} (Xj − an/n) / bn = Y1 + · · · + Yn
where Y1, . . . , Yn are i.i.d. Hence the result.
The concept of infinite divisibility is important when deriving various limit theorems in probability theory; see for example chapter 16 in [FRISTEDT & GRAY(1997)] and chapters 6 and 9 in [FELLER(1971)].
5.2 Two examples. These two examples depend on properties of distributions considered later in these notes and they can safely be omitted on a first reading. First, a distribution which is stable and infinitely divisible.
Example (5.2a). Suppose X ∼ N(µ, σ^2) and n ∈ {1, 2, . . .}. Then X =_d Y1 + · · · + Yn where Y1, . . . , Yn are i.i.d. random variables with the N(µ/n, σ^2/n) distribution.
Now for a distribution which is infinitely divisible but not stable.
Example (5.2b). Suppose X ∼ gamma(k, α) and n ∈ {1, 2, . . .}. Then X =_d Y1 + · · · + Yn where Y1, . . . , Yn are i.i.d. random variables with the gamma(k/n, α) distribution; see exercise 12.6 on page 38.
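A simulation check of example (5.2b); this is not part of the notes, assumes Python with numpy, and assumes gamma(k, α) is parametrized by shape k and rate α (so the mean is k/α), which is what the comments below use.

```python
import numpy as np

rng = np.random.default_rng(9)
k, alpha, n = 3.0, 2.0, 4
reps = 500_000

x = rng.gamma(shape=k, scale=1 / alpha, size=reps)                        # gamma(k, alpha), rate parametrization
y = rng.gamma(shape=k / n, scale=1 / alpha, size=(reps, n)).sum(axis=1)   # sum of n i.i.d. gamma(k/n, alpha)

print(x.mean(), x.var())   # ≈ k/alpha = 1.5 and k/alpha^2 = 0.75
print(y.mean(), y.var())   # the same: the sum has the gamma(k, alpha) distribution
```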

6 Exercises (exs-orderstable.tex)

Order statistics.
6.1 Suppose X1 and X2 are i.i.d. random variables with the uniform(0, 1) distribution. Let Y denote the point which is
closest to an endpoint—either 0 or 1.
(a) Find the distribution of Z, the distance from Y to the nearest endpoint.
(b) Find the distribution of Z, the distance from 0 to Y . [Ans]
6.2 Suppose X1, X2, . . . , Xn are i.i.d. random variables with the uniform(0, 1) distribution.
(a) Find the distribution of Xj:n. (b) Find E[Xj:n]. [Ans]
6.3 Suppose X1 , X2 , X3 and X4 are i.i.d. random variables with the uniform(0, 1) distribution.
(a) Find the density of (X3:4 , X4:4 ).
(b) Find P[X3:4 + X4:4 ≤ 1]. [Ans]
6.4 Suppose X1, X2, . . . , Xn are i.i.d. random variables with the uniform(0, 1) distribution. Define (Y1, Y2, . . . , Yn) by
        Y1 = X1:n/X2:n,  Y2 = X2:n/X3:n,  . . . ,  Yn−1 = X(n−1):n/Xn:n,  Yn = Xn:n
    Show that Y1, . . . , Yn are independent and that V1 = Y1, V2 = Y2^2, . . . , Vn = Yn^n are i.i.d. [Ans]
6.5 Suppose X1, X2, X3 and X4 are i.i.d. random variables with the uniform(0, 1) distribution. Find the distributions of Y = X3:4 − X1:4 and Y = X4:4 − X2:4. [Ans]
6.6 Suppose X1, X2 and X3 are i.i.d. random variables with the uniform(0, 1) distribution. Find the conditional density of X2:3 given (X1:3, X3:3). [Ans]

6.7 Suppose X and Y are i.i.d. random variables with support in the interval [a, b] where −∞ < a < b < ∞.
(a) Let Z = max{X, Y}. Prove that
        E[Z] = [ z {FX(z)}^2 ]_{z=a}^{z=b} − ∫_a^b {FX(z)}^2 dz
(b) Let V = min{X, Y}. Prove that
        E[V] = [ −v {1 − FX(v)}^2 ]_{v=a}^{v=b} + ∫_a^b {1 − FX(v)}^2 dv
(c) Check the value of E[Z] + E[V]. [Ans]


6.8 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with order statistics X1:n , X2:n , . . . , Xn:n . Find an expression for
E[ X1 |X1:n , X2:n , . . . , Xn:n ]. [Ans]
6.9 Suppose X1, . . . , Xn are i.i.d. random variables with a distribution which is symmetric about 0. Let FX denote the distribution function of this distribution. If Y = |X1|, then Y has distribution function FY with FY(x) = 2FX(x) − 1. Suppose Y1, . . . , Yn are i.i.d. random variables with distribution function FY.
(a) Prove that for r ∈ {1, . . . , n}
        E[Xr:n] = (1/2^n) [ Σ_{k=0}^{r−1} C(n, k) E[Y(r−k):(n−k)] − Σ_{k=r}^{n} C(n, k) E[Y(k−r+1):k] ]        (6.9a)
(b) Prove that for r ∈ {1, . . . , n} and m ∈ {1, 2, . . .}
        E[Xr:n^m] = (1/2^n) [ Σ_{k=0}^{r−1} C(n, k) E[Y(r−k):(n−k)^m] + (−1)^m Σ_{k=r}^{n} C(n, k) E[Y(k−r+1):k^m] ]
(c) Prove that for 1 ≤ j < r ≤ n
        E[Xj:n Xr:n] = (1/2^n) [ Σ_{k=0}^{j−1} C(n, k) E[Y(j−k):(n−k) Y(r−k):(n−k)]
                                 − Σ_{k=j}^{r−1} C(n, k) E[Y(k−j+1):k] E[Y(r−k):(n−k)]
                                 + Σ_{k=r}^{n} C(n, k) E[Y(k−r+1):k Y(k−j+1):k] ]
(d) Prove that for 1 ≤ j < r ≤ n, m1 ∈ {1, 2, . . .} and m2 ∈ {1, 2, . . .},
        E[Xj:n^{m1} Xr:n^{m2}] = (1/2^n) [ Σ_{k=0}^{j−1} C(n, k) E[Y(j−k):(n−k)^{m1} Y(r−k):(n−k)^{m2}]
                                           + (−1)^{m1} Σ_{k=j}^{r−1} C(n, k) E[Y(k−j+1):k^{m1}] E[Y(r−k):(n−k)^{m2}]
                                           + (−1)^{m1+m2} Σ_{k=r}^{n} C(n, k) E[Y(k−r+1):k^{m1} Y(k−j+1):k^{m2}] ]
    [GOVINDARAJULU(1963)] [Ans]

6.10 Suppose k > r. It is known⁴ that if the random variable X has an absolutely continuous distribution with distribution function F then the conditional distribution function P[Xk:n < y | Xr:n = x] is the same as the distribution function of the (k − r)th order statistic in a sample of size (n − r) from the distribution function
        F1(y) = [F(y) − F(x)] / [1 − F(x)]  if y > x;  and F1(y) = 0 otherwise.
    Suppose X1 and X2 are i.i.d. absolutely continuous non-negative random variables with density function f(x) and distribution function F(x). By using the above result, show that X2:2 − X1:2 is independent of X1:2 if and only if X ∼ exponential(λ). [Ans]
6.11 Suppose X1, X2, . . . , Xn are i.i.d. absolutely continuous non-negative random variables with density function f(x) and distribution function F(x). Define the vector (Y1, Y2, . . . , Yn) by
        Y1 = X1:n,  Y2 = X2:n/X1:n,  . . . ,  Yn = Xn:n/X1:n
(a) Find an expression for the density of the vector (Y1, Y2, . . . , Yn) in terms of f and F.
(b) Hence derive expressions for the density of the vector (Y1, Y2) = (X1:n, X2:n/X1:n) and the density of the random variable Y1 = X1:n. [Ans]

    ⁴ For example, page 38 of [GALAMBOS & KOTZ(1978)].

6.12 Record values. Suppose X0, X1, X2, . . . are i.i.d. random variables with an absolutely continuous distribution. Let N denote the index of the first variable which is greater than X0. Hence
        {N = 1} = {X1 > X0}
        {N = 2} = {X1 < X0, X2 > X0}
        {N = 3} = {X1 < X0, X2 < X0, X3 > X0}    etc.
    Find the distribution of N and E[N]. [Ans]
Stable distributions.
6.13 Suppose the random variable X has a stable distribution with characteristic exponent α ≠ 1. Prove there exists β ∈ R such that X − β has a strictly stable distribution. Hint: Express S_{nm} as the sum of m independent random variables each with the same distribution as X1 + · · · + Xn. [Ans]
6.14 Suppose the random variable X has a stable distribution with characteristic exponent α = 1. Prove that a_{nm} = m an + n am and X1 + · · · + Xn =_d n ln(n) + nX, and hence deduce E[X] does not exist. [Ans]
6.15 Suppose X has a strictly stable distribution with characteristic exponent α ∈ (0, 2]. Prove that for integers m ≥ 1 and n ≥ 1 we have m^{1/α} X1 + n^{1/α} X2 =_d (n + m)^{1/α} X. [Ans]
6.16 This question uses the same notation as in equation (4.3b) for the characteristic function of a stable distribution. Suppose X1 has a stable distribution with characteristic exponent α and parameters {c1, d1, β1} and X2 has a stable distribution with the same characteristic exponent α and parameters {c2, d2, β2}. Suppose further that X1 and X2 are independent. Show that X1 + X2 has a stable distribution with parameters {c = c1 + c2, d = (d1^α + d2^α)^{1/α}, β} where
        β = (β1 d1^α + β2 d2^α) / (d1^α + d2^α)    [Ans]
6.17 (a) Suppose the random variable X has a stable distribution with characteristic exponent α ≠ 1 and characteristic function with parameters {c, d, β}. Suppose γ > 0. Show that γX has a stable distribution with characteristic exponent α ≠ 1 and characteristic function with parameters {γc, γd, β}.
(b) Suppose X1 and X2 are i.i.d. random variables with a stable distribution with characteristic exponent α ≠ 1 and characteristic function with parameters {c, d, β}. Suppose γ1 > 0 and γ2 > 0. Prove there exist a ∈ R and b > 0 such that γ1 X1 + γ2 X2 =_d a + bX.
(c) Suppose X1 and X2 are i.i.d. random variables with a stable distribution with characteristic exponent α = 1 and characteristic function with parameters {c, d, β}. Suppose γ1 > 0 and γ2 > 0. Prove there exist a ∈ R and b > 0 such that γ1 X1 + γ2 X2 =_d a + bX. [Ans]
6.18 Suppose the random variable X has a strictly stable distribution with scaling factors {bn}.
(a) Prove that b_{mn} = bm bn for all integers m ≥ 1 and n ≥ 1. Hence prove that for integers k ≥ 1 and r ≥ 1, if n = r^k then bn = br^k. Prove also that b1 = 1 and br > 1 for r ∈ {2, 3, . . .}.
    Hint: Express S_{nm} as the sum of m independent random variables each with the same distribution as X1 + · · · + Xn.
(b) Prove that for all integers m ≥ 1 and n ≥ 1 we have b_{m+n} X =_d bm X1 + bn X2 where X1 and X2 are i.i.d. random variables with the same distribution as X. [Ans]
CHAPTER 2

Univariate Continuous Distributions

7 The uniform distribution


7.1 Definition of the uniform distribution.
Definition (7.1a). Suppose a ∈ R, b ∈ R and a < b. Then the random variable X has the uniform distribution uniform(a, b) iff X has density
    f(x) = 1/(b − a)  for x ∈ (a, b);  and f(x) = 0 otherwise.
The distribution function is
    F(x) = 0  if x < a;    F(x) = (x − a)/(b − a)  if x ∈ (a, b);    F(x) = 1  if x ≥ b.
The uniform distribution is also called the rectangular distribution.
If X ∼ uniform(0, 1) and Y = a + (b − a)X then Y ∼ uniform(a, b). It follows that the family of distributions {uniform(a, b) : a ∈ R, b ∈ R, a < b} is a location-scale family; see definition (1.6b) on page 5.
Moments. The moments E[X^n] are finite for n ≠ −1:
    E[X] = ∫_a^b x f(x) dx = (a + b)/2        E[X^2] = ∫_a^b x^2 f(x) dx = (a^2 + ab + b^2)/3        var[X] = (b − a)^2/12
    E[X^n] = ∫_a^b x^n f(x) dx = (b^{n+1} − a^{n+1}) / ((n + 1)(b − a))    for n ≠ −1, n ∈ R.
The moment generating function and characteristic function.
    E[e^{tX}] = ∫_a^b e^{tx}/(b − a) dx = (e^{tb} − e^{ta}) / (t(b − a))  for t ≠ 0, and 1 for t = 0;
    E[e^{itX}] = 2 sin( (b − a)t/2 ) e^{i(a+b)t/2} / (t(b − a))  for t ≠ 0, and 1 for t = 0.
[Figure (7.1a). Left: plot of the density of uniform(a, b). Right: plot of the distribution function of uniform(a, b).]
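A quick numerical check of the moments and the moment generating function above; not part of the notes, assuming Python with numpy and the arbitrary values a = −1, b = 3, t = 0.7.

```python
import numpy as np

rng = np.random.default_rng(10)
a, b = -1.0, 3.0
x = rng.uniform(a, b, size=2_000_000)

print(x.mean(), (a + b) / 2)                        # E[X]
print((x**2).mean(), (a**2 + a*b + b**2) / 3)       # E[X^2]
print(x.var(), (b - a)**2 / 12)                     # var[X]

t = 0.7
print(np.exp(t * x).mean(), (np.exp(t*b) - np.exp(t*a)) / (t * (b - a)))   # E[e^{tX}] at t = 0.7
```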

7.2 Sum of two i.i.d. uniform random variables.
Example (7.2a). Suppose X ∼ uniform(0, a), Y ∼ uniform(0, a) and X and Y are independent. Find the distribution of Z = X + Y.
Solution. Clearly Z ∈ (0, 2a). The usual convolution integral gives
    fZ(z) = ∫_x fX(x) fY(z − x) dx = (1/a) ∫_0^a fY(z − x) dx
where fY(z − x) = 1/a when 0 < z − x < a, i.e. when z − a < x < z. Hence

    fZ(z) = (1/a^2) ∫_{max{0, z−a}}^{min{a, z}} dx = ( min{a, z} − max{0, z − a} ) / a^2
          = z/a^2  if 0 < z < a;  and  (2a − z)/a^2  if a < z < 2a.
and the distribution function is
    FZ(z) = z^2/(2a^2)  if 0 < z < a;
    FZ(z) = 2z/a − z^2/(2a^2) − 1 = 1 − (2a − z)^2/(2a^2)  if a < z < 2a.
A graph of the density and distribution function is shown in figure (7.2a). For obvious reasons, this is called the triangular distribution.
[Figure (7.2a). Plot of the density (left) and distribution function (right) of the triangular distribution.]
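The triangular density of example (7.2a) is easy to confirm by simulation. This sketch is not from the notes; it assumes Python with numpy and takes a = 2.

```python
import numpy as np

rng = np.random.default_rng(11)
a = 2.0
z = rng.uniform(0, a, size=1_000_000) + rng.uniform(0, a, size=1_000_000)

def f_Z(z, a):
    # Triangular density from example (7.2a).
    z = np.asarray(z, dtype=float)
    return np.where(z < a, z / a**2, (2 * a - z) / a**2)

edges = np.linspace(0.0, 2 * a, 41)
hist, _ = np.histogram(z, bins=edges, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - f_Z(mid, a))))   # small: Monte Carlo error only
```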

7.3 Sum of n i.i.d. uniform random variables: the Irwin-Hall distribution. Now for the general result on the sum of n independent and identically distributed uniforms.
Proposition (7.3a). Suppose X1, X2, . . . , Xn are i.i.d. random variables with the uniform(0, 1) distribution. Let Sn = X1 + · · · + Xn. Then the density and distribution function of Sn are given by
    Fn(t) = (1/n!) Σ_{k=0}^{n} (−1)^k C(n, k) ((t − k)^+)^n    for all t ∈ R and all n = 1, 2, . . . .        (7.3a)
    fn(t) = (1/(n − 1)!) Σ_{k=0}^{n} (−1)^k C(n, k) ((t − k)^+)^{n−1}    for all t ∈ R and all n = 2, 3, . . . .        (7.3b)
Proof. We prove the result for Fn(t) by induction on n.
If n = 1, then the right hand side of equation (7.3a) gives [t^+ − (t − 1)^+], which equals 0 if t < 0; t if 0 < t < 1; and 1 if t > 1, as required. Also, for t ∈ R and n = 2, 3, . . . , we have
    fn(t) = ∫_0^1 f_{n−1}(t − x) f1(x) dx = ∫_0^1 f_{n−1}(t − x) dx = ∫_{y=t−1}^{t} f_{n−1}(y) dy = F_{n−1}(t) − F_{n−1}(t − 1)
Assume that equation (7.3a) is true for n; to prove it true for n + 1:
    f_{n+1}(t) = Fn(t) − Fn(t − 1)
               = (1/n!) [ Σ_{k=0}^{n} (−1)^k C(n, k) ((t − k)^+)^n − Σ_{k=0}^{n} (−1)^k C(n, k) ((t − k − 1)^+)^n ]
               = (1/n!) [ Σ_{k=0}^{n} (−1)^k C(n, k) ((t − k)^+)^n + Σ_{ℓ=1}^{n+1} (−1)^ℓ C(n, ℓ − 1) ((t − ℓ)^+)^n ]
               = (1/n!) Σ_{k=0}^{n+1} (−1)^k C(n + 1, k) ((t − k)^+)^n
by using the combinatorial identity C(n, k) + C(n, k − 1) = C(n + 1, k). Integrating f_{n+1}(t) gives
    F_{n+1}(t) = ∫_0^t f_{n+1}(x) dx = (1/(n + 1)!) Σ_{k=0}^{n+1} (−1)^k C(n + 1, k) ((t − k)^+)^{n+1}
This establishes the proposition.

Corollary (7.3b). Suppose X1, X2, . . . , Xn are i.i.d. random variables with the uniform(0, 1) distribution. Let Sn = X1 + · · · + Xn. Then the density of Sn is given by
    fn(t) = (1/(n − 1)!) Σ_{k=0}^{[t]} (−1)^k C(n, k) (t − k)^{n−1}    for all t ∈ R and all n = 2, 3, . . . .        (7.3c)
Proof. Note that [t] is the floor function, which is defined by
    [t] = j  if j ∈ {0, 1, 2, . . .} and j ≤ t < j + 1.
Equation (7.3c) follows immediately from equation (7.3b).
The random variable X is said to have the Irwin-Hall distribution with order n iff X has the density given in equation (7.3b).
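The Irwin-Hall formulas translate directly into code. The sketch below is not part of the notes; it assumes Python with numpy and checks equations (7.3a) and (7.3b) against simulation for n = 4.

```python
import numpy as np
from math import comb, factorial

def irwin_hall_cdf(t, n):
    # Equation (7.3a): distribution function of the sum of n i.i.d. uniform(0,1) variables.
    return sum((-1)**k * comb(n, k) * max(t - k, 0.0)**n for k in range(n + 1)) / factorial(n)

def irwin_hall_pdf(t, n):
    # Equation (7.3b): the corresponding density, valid for n >= 2.
    return sum((-1)**k * comb(n, k) * max(t - k, 0.0)**(n - 1) for k in range(n + 1)) / factorial(n - 1)

rng = np.random.default_rng(12)
n = 4
s = rng.uniform(0.0, 1.0, size=(1_000_000, n)).sum(axis=1)

for t in (0.5, 1.7, 3.2, float(n)):
    print(t, irwin_hall_cdf(t, n), (s <= t).mean())     # formula vs simulation; F_n(n) = 1

# The density should be the derivative of the distribution function:
t0, h = 1.7, 1e-5
print(irwin_hall_pdf(t0, n), (irwin_hall_cdf(t0 + h, n) - irwin_hall_cdf(t0 - h, n)) / (2 * h))
```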
The proposition easily extends to other uniforms. For example, suppose X1, X2, . . . , Xn are i.i.d. random variables with the uniform(0, a) distribution and Sn = X1 + · · · + Xn. Then the proposition can be applied to the sum Sn′ = Y1 + · · · + Yn, where Yj = Xj/a. Hence
    Fn(t) = P[X1 + · · · + Xn ≤ t] = P[Sn′ ≤ t/a]
          = (1/(a^n n!)) Σ_{k=0}^{n} (−1)^k C(n, k) ((t − ka)^+)^n    for all t ∈ R and all n = 1, 2, . . . .
    fn(t) = (1/(a^n (n − 1)!)) Σ_{k=0}^{n} (−1)^k C(n, k) ((t − ka)^+)^{n−1}    for all t ∈ R and all n = 2, 3, . . . .
Similarly if the random variables come from the uniform(a, b) distribution.
An alternative proof of proposition(7.3a) is based on taking the weak limit of the equivalent result for discrete
uniforms; the proof below is essentially that given on pages 284–285 in [F ELLER(1968)].
Proof. (An alternative proof of proposition(7.3a).) For m = 1, 2, . . . and n = 2, 3, . . . , suppose Xm1, Xm2, . . . , Xmn are i.i.d. random variables with the discrete uniform distribution on the points 0, 1/m, 2/m, . . . , m/m = 1. Let X′mj = mXmj + 1. Then X′m1, X′m2, . . . , X′mn are i.i.d. random variables with the discrete uniform distribution on the points 1, 2, . . . , m + 1.
Now the discrete uniform distribution on the points 1, 2, . . . , m + 1 has probability generating function
$$\frac{s(1-s^{m+1})}{(m+1)(1-s)} \qquad\text{for } |s|<1.$$
Hence the probability generating function of X′m1 + · · · + X′mn is
$$E\bigl[s^{X'_{m1}+\cdots+X'_{mn}}\bigr] = \frac{s^n(1-s^{m+1})^n}{(m+1)^n(1-s)^n} \qquad\text{for } |s|<1.$$
Hence the generating function of the sequence P[X′m1 + · · · + X′mn ≤ j] is
$$\frac{s^n(1-s^{m+1})^n}{(m+1)^n(1-s)^{n+1}} \qquad\text{for } |s|<1. \tag{7.3d}$$
Also, for j ∈ {0, 1/m, 2/m, . . . , (mn−1)/m, mn/m = n} we have
$$P[X_{m1}+\cdots+X_{mn}\le j] = P[X'_{m1}+\cdots+X'_{mn}\le mj+n]$$
and this is the coefficient of $s^{mj+n}$ in the expansion of equation(7.3d), which in turn is
$$\frac{1}{(m+1)^n}\times\text{coefficient of } s^{mj}\text{ in the expansion of } \frac{(1-s^{m+1})^n}{(1-s)^{n+1}} = \sum_{\ell=0}^{n}(-1)^{\ell}\binom{n}{\ell}\frac{s^{m\ell+\ell}}{(1-s)^{n+1}}$$
which is
$$\frac{1}{(m+1)^n}\times\text{coefficient of } s^{0}\text{ in the expansion of } \sum_{\ell=0}^{n}(-1)^{\ell}\binom{n}{\ell}\frac{s^{m\ell-mj+\ell}}{(1-s)^{n+1}}$$
This is clearly 0 if ℓ > j. Otherwise it is
$$\frac{1}{(m+1)^n}\sum_{\ell=0}^{n}(-1)^{\ell}\binom{n}{\ell}\binom{mj-m\ell-\ell+n}{n} = \frac{1}{n!}\sum_{\ell=0}^{n}(-1)^{\ell}\binom{n}{\ell}\frac{(mj-m\ell-\ell+n)!}{(m+1)^n(mj-m\ell-\ell)!} \tag{7.3e}$$
by using the binomial series $1/(1-z)^{n+1} = \sum_{\ell=0}^{\infty}\binom{\ell+n}{n}z^{\ell}$ for |z| < 1. Taking the limit as m → ∞ of the expression in (7.3e) gives
$$\frac{1}{n!}\sum_{\ell=0}^{n}(-1)^{\ell}\binom{n}{\ell}\bigl((j-\ell)^+\bigr)^{n}$$
This proves the result when j is an integer. If j is any rational, take a sequence of values of m tending to ∞ with mj an integer. Hence the result in equation(7.3a) holds for any t by right continuity.
A note on the combinatorial identity implied by equation(7.3a) on page 20. Now Fn(t) = P[Sn ≤ t] = 1 for t ≥ n. By equation(7.3a), this implies $\sum_{k=0}^{n}(-1)^k\binom{n}{k}(t-k)^n = n!$ for t ≥ n. How do we prove this identity without probability?
For all t ∈ R we have the identity
$$\sum_{k=0}^{n}\binom{n}{k}(-1)^k e^{tk} = (1-e^t)^n$$
Setting t = 0 gives $\sum_{k=0}^{n}\binom{n}{k}(-1)^k = 0$. Differentiating the identity once and setting t = 0 gives
$$\sum_{k=0}^{n}\binom{n}{k}(-1)^k k = 0$$
Similarly, differentiating r times and setting t = 0 gives
$$\sum_{k=0}^{n}\binom{n}{k}(-1)^k k^r = \begin{cases} 0 & \text{if } r = 0,1,2,\ldots,n-1; \\ (-1)^n n! & \text{if } r = n. \end{cases} \tag{7.3f}$$
and hence
$$\sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k} k^r = \begin{cases} 0 & \text{if } r = 0,1,2,\ldots,n-1; \\ n! & \text{if } r = n. \end{cases}$$
For all t ∈ R
$$\sum_{k=0}^{n}\binom{n}{k}(-1)^k (t-k)^n = \sum_{k=0}^{n}\binom{n}{k}(-1)^k\sum_{j=0}^{n}\binom{n}{j}t^j(-1)^{n-j}k^{n-j} = \sum_{j=0}^{n}\binom{n}{j}(-1)^j t^j\sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}k^{n-j} = n!$$
This generalizes the combinatorial result implied by equation(7.3a). See also question 16 on page 65 in [F ELLER(1968)].
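A small numerical check of the two identities above may help; the following snippet is illustrative only and not part of the original notes.

```python
# Illustrative check of sum_{k=0}^n (-1)^k C(n,k) (t-k)^n = n! for t >= n,
# and of (7.3f): sum_{k=0}^n C(n,k) (-1)^k k^r = 0 for r < n and (-1)^n n! for r = n.
import math

n, t = 6, 9.7   # any t >= n will do
lhs = sum((-1)**k * math.comb(n, k) * (t - k)**n for k in range(n + 1))
print(lhs, math.factorial(n))             # both equal 720 (up to rounding)

for r in range(n + 1):
    s = sum(math.comb(n, k) * (-1)**k * k**r for k in range(n + 1))
    print(r, s)                           # 0 for r = 0,...,n-1 and (-1)^n n! for r = n
```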
7.4 Representing the uniform distribution as the sum of independent Bernoulli random variables. Now
every y ∈ [0, 1) can be represented as a ‘binary decimal’. This means we can write
y = 0.x1 x2 x3 . . . where each xj is either 0 or 1.
This representation motivates the following result:
Proposition(7.4a). Suppose X1 , X2 , . . . are i.i.d. random variables with the Bernoulli distribution

$$X_k = \begin{cases} 1 & \text{with probability } 1/2; \\ 0 & \text{with probability } 1/2. \end{cases}$$
Then the random variable
$$V = \sum_{k=1}^{\infty}\frac{X_k}{2^k}$$
has the uniform(0, 1) distribution.
Proof. Let $S_n = \sum_{k=1}^{n} X_k/2^k$ for n = 2, 3, . . . . Now the moment generating function of $X_k/2^k$ is
$$E\bigl[e^{tX_k/2^k}\bigr] = \frac{1}{2} + \frac{1}{2}e^{t/2^k}$$
Hence
$$E[e^{tS_n}] = \frac{1}{2^n}\prod_{k=1}^{n}\Bigl[e^{t/2^k}+1\Bigr]$$
Using the identity $\bigl(e^{t/2^{n+1}}-1\bigr)\bigl(e^{t/2^{n+1}}+1\bigr) = e^{t/2^n}-1$, and induction, it is possible to show
$$E[e^{tS_n}] = \frac{1}{2^n}\prod_{k=1}^{n}\Bigl[e^{t/2^k}+1\Bigr] = \frac{1}{2^n}\,\frac{e^t-1}{e^{t/2^n}-1} \tag{7.4a}$$
and hence
$$E[e^{tS_n}] = \frac{e^t-1}{2^n\bigl(e^{t/2^n}-1\bigr)} \to \frac{e^t-1}{t} \qquad\text{as } n\to\infty.$$
Because $(e^t-1)/t$ is the moment generating function of uniform(0, 1), we see that V ∼ uniform(0, 1) as required.
Equation(7.4a) proves the following representation. Suppose Vn ∼ uniform(0, 1/2^n), and X1, X2, . . . , Xn, Vn are all independent; then
$$V_n + \sum_{k=1}^{n}\frac{X_k}{2^k} \sim \text{uniform}(0,1) \qquad\text{for all } n\in\{1,2,\ldots\}.$$
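The representation above is easy to see empirically; the sketch below (illustrative only, not from the original notes) builds a truncated binary expansion from fair Bernoulli bits plus an independent uniform(0, 1/2^n) remainder and checks the first two moments.

```python
# Illustrative sketch of proposition (7.4a): V_n + sum_{k=1}^n X_k/2^k is uniform(0,1)
# when the X_k are fair Bernoulli bits and V_n ~ uniform(0, 1/2^n) is independent.
import numpy as np

rng = np.random.default_rng(1)
n, size = 20, 100_000
bits = rng.integers(0, 2, size=(size, n))                 # X_1, ..., X_n in {0, 1}
weights = 0.5 ** np.arange(1, n + 1)                      # 1/2, 1/4, ..., 1/2^n
v = bits @ weights + rng.uniform(0.0, 0.5**n, size=size)  # V_n + sum X_k / 2^k

# The sample should be indistinguishable from uniform(0, 1):
print(v.min(), v.max(), v.mean(), v.var())                # mean ~ 0.5, variance ~ 1/12
```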
7.5 Order statistics for the uniform distribution. Suppose X1 , . . . , Xn are i.i.d. with the uniform(0, 1) dis-
tribution.
Maximum and minimum. It is straightforward to check the following results:
$$P[X_{n:n}\le x] = x^n \ \text{ for } 0\le x\le 1, \qquad\text{and}\qquad E[X_{n:n}] = \frac{n}{n+1}$$
$$P[X_{1:n}\le x] = 1-(1-x)^n \ \text{ for } 0\le x\le 1, \qquad\text{and}\qquad E[X_{1:n}] = \frac{1}{n+1}$$
Note that E[X1:n] = 1 − E[Xn:n] → 0 as n → ∞.
For x ≥ 0 and n > x we have $P[nX_{1:n}>x] = P[X_{1:n}>x/n] = \bigl(1-\tfrac{x}{n}\bigr)^n \to e^{-x}$ as n → ∞. This implies $nX_{1:n}\overset{D}{\Longrightarrow}\text{exponential}(1)$ as n → ∞, where $\overset{D}{\Longrightarrow}$ denotes weak convergence (also called convergence in distribution). Now for Xk:n.
The distribution of Xk:n . By equations(3.3a) and (3.3b) we have
$$P[X_{k:n}\le t] = \sum_{j=k}^{n}\binom{n}{j}t^j(1-t)^{n-j}$$
$$f_{X_{k:n}}(t) = n\binom{n-1}{k-1}t^{k-1}(1-t)^{n-k} \qquad\text{for } 0\le t\le 1. \tag{7.5a}$$
This is the Beta density beta(k, n − k + 1) which is considered later—see §13.1 on page 40.
The limiting distribution of Xk:n as n → ∞.
$$P[nX_{k:n}>t] = 1-P[X_{k:n}\le t/n] = 1-\sum_{j=k}^{n}\binom{n}{j}\left(\frac{t}{n}\right)^j\left(1-\frac{t}{n}\right)^{n-j}$$
$$= \left(1-\frac{t}{n}\right)^{n} + \binom{n}{1}\frac{t}{n}\left(1-\frac{t}{n}\right)^{n-1} + \cdots + \binom{n}{k-1}\left(\frac{t}{n}\right)^{k-1}\left(1-\frac{t}{n}\right)^{n-k+1}$$
$$\to e^{-t} + te^{-t} + \cdots + \frac{t^{k-1}e^{-t}}{(k-1)!}$$
which is equal to P[Y > t] where Y ∼ gamma(k, 1)—this will be shown in §11.2 on page 34.
We have shown that $nX_{k:n}\overset{D}{\Longrightarrow}\text{gamma}(k,1)$ as n → ∞ for any fixed k ∈ {1, 2, . . .}.
Note also that for x < 0, we have $P[n(X_{n:n}-1)<x] = P[X_{n:n}<1+x/n] = \bigl(1+\tfrac{x}{n}\bigr)^n \to e^{x}$ as n → ∞, and for x > 0, $P[nX_{1:n}<x] = P[X_{1:n}<x/n] = 1-(1-x/n)^n \to 1-e^{-x}$ as n → ∞.
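The gamma limit for nXk:n can be checked numerically; the following sketch is illustrative only and compares an empirical tail probability with the gamma(k, 1) tail derived above.

```python
# Illustrative check that n X_{k:n} is approximately gamma(k, 1) for large n.
import math
import numpy as np

rng = np.random.default_rng(2)
n, k, t = 500, 3, 2.5
u = rng.uniform(0.0, 1.0, size=(20_000, n))
kth = np.partition(u, k - 1, axis=1)[:, k - 1]    # X_{k:n} for each replication
scaled_kth = n * kth

empirical = np.mean(scaled_kth > t)
exact = math.exp(-t) * sum(t**j / math.factorial(j) for j in range(k))  # P[gamma(k,1) > t]
print(empirical, exact)
```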
7.6 The probability integral transform. Every probability distribution function F is monotonic increasing;
if F is a strictly increasing probability distribution function on the whole of R, then we know from elementary
analysis that F has a unique inverse G = F −1 . If U ∼ uniform(0, 1) and Y = G(U ), then clearly Y has distribution
function F . Hence we can simulate variates from the distribution F by simulating variates x1 , x2 , . . . xn from the
uniform(0, 1) distribution and calculating G(x1 ), G(x2 ), . . . , G(xn ).
Now for the general case. Suppose F : R → [0, 1] is a distribution function (not necessarily continuous). We first
need to define the “inverse” of F .
Proposition(7.6a). Suppose F : R → [0, 1] is a distribution function and we let G(u) = min{x: F (x) ≥ u}
for u ∈ (0, 1). Then
{x: G(u) ≤ x} = {x: F (x) ≥ u}.
Proof. Fix u ∈ (0, 1); then
x0 ∈ R.H.S. ⇒ F (x0 ) ≥ u
⇒ G(u) ≤ x0 by definition of G
⇒ x0 ∈ L.H.S.
Conversely
x0 ∈ L.H.S. ⇒ G(u) ≤ x0
⇒ min{x: F (x) ≥ u} ≤ x0
Let x∗ = min{x: F(x) ≥ u}. Hence x∗ ≤ x0. Choose a sequence {xn}n≥1 with xn ↓↓ x∗ as n → ∞ (this means that the sequence {xn}n≥1 strictly decreases with limit x∗). Hence F(xn) ≥ u for all n = 1, 2, . . . . Now F is a distribution function; hence F is right continuous; hence F(x∗) ≥ u. Also x0 ≥ x∗ and F is monotonic increasing; hence F(x0) ≥ F(x∗) ≥ u. Hence x0 ∈ R.H.S.
Suppose the distribution function F is continuous at α ∈ R. Then G(β) = α implies F (α) ≥ β. Also, for every
x < α we have F (x) < β. Hence F (α) = β. We have shown that G(β) = α implies F (α) = β and hence
F G(β) = β. If the random variable X has the distribution function F and F is continuous, then P[F (X) ≥ u] =
P[G(u) ≤ X] = 1 − F G(u) = 1 − u.
We have shown the following two important results.
• If the random variable X has the distribution function F and F is continuous, then the random variable F (X)
has the uniform(0, 1) distribution.
• Suppose F is a distribution function and G is defined in terms of F as explained above. If U has a uniform
distribution on (0, 1) and X = G(U ) then
P[X ≤ x] = P [G(U ) ≤ x] = P [F (x) ≥ U ] = F (x).
Hence
If U ∼ uniform(0, 1), then the distribution function of G(U ) is F .

As explained before the proposition, if the distribution function F is strictly increasing on the whole of R then
F −1 , the inverse of F , exists and G = F −1 . If F is the distribution function of a discrete distribution, then F is
constant except for countably many jumps and the inverse of F does not exist. However, G(u) is still defined by
the proposition and this method of simulating from the distribution F still works.
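The following Python sketch (illustrative only, not part of the original notes) shows the simulation method of this section in both the strictly increasing case and the discrete case, where G(u) = min{x : F(x) ≥ u} is used instead of a true inverse.

```python
# Illustrative sketch of the probability integral transform: simulate from a target
# distribution F by computing G(U) where U ~ uniform(0,1) and G(u) = min{x : F(x) >= u}.
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(0.0, 1.0, size=100_000)

# Continuous case: F strictly increasing, so G = F^{-1}.
# For the exponential(lam) distribution, G(u) = -ln(1-u)/lam.
lam = 2.0
x_exp = -np.log1p(-u) / lam
print(x_exp.mean())                       # should be close to 1/lam = 0.5

# Discrete case: F is a step function, but G(u) = min{x : F(x) >= u} still works.
values = np.array([0, 1, 2, 3])
probs = np.array([0.1, 0.2, 0.3, 0.4])
cdf = np.cumsum(probs)
x_disc = values[np.searchsorted(cdf, u)]  # smallest index with F(x) >= u
print(np.bincount(x_disc) / len(x_disc))  # should be close to probs
```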
7.7 Using the probability integral transformation to prove results about order statistics.
Suppose X1, . . . , Xn are i.i.d. with the uniform(0, 1) distribution. By equation(7.5a) on page 23 we have
$$f_{X_{k:n}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1-x)^{n-k} \qquad\text{for } 0\le x\le 1.$$
Suppose Y1, . . . , Yn are i.i.d. with an absolutely continuous distribution with distribution function FY and density function fY and we wish to find the distribution of Yk:n.
Then X1 = FY(Y1), . . . , Xn = FY(Yn) are i.i.d. with the uniform(0, 1) distribution and hence
$$f_{X_{k:n}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1-x)^{n-k}$$
Now FY is monotonic increasing and continuous and the transformation (X1, . . . , Xn) ↦ (Y1, . . . , Yn) is order preserving; hence
$$P[Y_{k:n}\le y] = P[F_Y(Y_{k:n})\le F_Y(y)] = P[X_{k:n}\le F_Y(y)] = \int_{-\infty}^{F_Y(y)} f_{X_{k:n}}(x)\,dx$$
and hence
$$f_{Y_{k:n}}(y) = \frac{n!}{(k-1)!\,(n-k)!}\,\{F_Y(y)\}^{k-1}\{1-F_Y(y)\}^{n-k}\,f_Y(y)$$
This approach provides a general method for proving results about the order statistics of a sample from a contin-
uous distribution function.
7.8 Random partitions of an interval. Suppose U1, . . . , Un are i.i.d. random variables with the uniform(0, 1) distribution and let U1:n, . . . , Un:n denote the order statistics. These variables partition the interval [0, 1] into n + 1 disjoint intervals with the following lengths:
D1 = U1:n, D2 = U2:n − U1:n, . . . , Dn = Un:n − U(n−1):n, Dn+1 = 1 − Un:n
Clearly D1 + · · · + Dn+1 = 1. The absolute value of the Jacobian of the transformation (U1:n, . . . , Un:n) ↦ (D1, . . . , Dn) is
$$\left|\frac{\partial(d_1,\ldots,d_n)}{\partial(u_{1:n},\ldots,u_{n:n})}\right| = 1$$
The density of (U1:n, . . . , Un:n) is given by (3.2a) on page 8. Hence the density of (D1, . . . , Dn) is
$$f_{(D_1,\ldots,D_n)}(d_1,\ldots,d_n) = n! \qquad\text{for } d_1\ge 0,\ \ldots,\ d_n\ge 0,\ \textstyle\sum_{\ell=1}^{n} d_\ell\le 1. \tag{7.8a}$$
There are many results on random partitions of an interval—see [F ELLER(1971)], [DAVID &BARTON(1962)] and
[W HITWORTH(1901)]. One result is given in exercise 14.9 on page 44.
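A quick simulation (illustrative only) makes the exchangeability of the spacings visible: every Di, including the end spacings D1 and Dn+1, has mean 1/(n + 1).

```python
# Illustrative check of (7.8a): the n+1 spacings of n uniform(0,1) points are exchangeable
# with common mean 1/(n+1).
import numpy as np

rng = np.random.default_rng(4)
n, reps = 9, 50_000
u = np.sort(rng.uniform(0.0, 1.0, size=(reps, n)), axis=1)
padded = np.concatenate([np.zeros((reps, 1)), u, np.ones((reps, 1))], axis=1)
spacings = np.diff(padded, axis=1)        # D_1, ..., D_{n+1}; each row sums to 1
print(spacings.shape)                     # (reps, n+1)
print(spacings.mean(axis=0))              # each column close to 1/(n+1) = 0.1
```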
7.9 Summary. The uniform distribution.
• Density. Suppose a < b. Then X has the uniform(a, b) density iff
  fX(x) = 1/(b − a) for x ∈ (a, b).
• The distribution function. F(x) = (x − a)/(b − a) for x ∈ (a, b).
• Moments.
  E[X] = (a + b)/2,  var[X] = (b − a)²/12,  E[X^n] = (b^{n+1} − a^{n+1}) / ((n + 1)(b − a)) for n ≠ −1, n ∈ R.
• M.g.f. and c.f.
$$M_X(t) = E[e^{tX}] = \begin{cases} \dfrac{e^{tb}-e^{ta}}{t(b-a)} & \text{for } t\ne 0; \\ 1 & \text{for } t=0; \end{cases} \qquad \varphi_X(t) = E[e^{itX}] = \begin{cases} \dfrac{2\sin\!\bigl(\frac{(b-a)t}{2}\bigr)e^{i(a+b)t/2}}{t(b-a)} & \text{for } t\ne 0; \\ 1 & \text{for } t=0. \end{cases}$$
• Properties.
  The sum of two independent uniforms on (0, a) is the triangular distribution on (0, 2a).
  If X has the distribution function F and F is continuous, then F(X) ∼ uniform(0, 1).

8 Exercises (exs-uniform.tex)

8.1 Suppose X ∼ uniform(a, b). Show that
$$E[(X-\mu)^n] = \frac{\bigl[1-(-1)^{n+1}\bigr](b-a)^n}{(n+1)2^{n+1}} = \begin{cases} \dfrac{(b-a)^n}{(n+1)2^n} & \text{if } n \text{ is even;} \\ 0 & \text{if } n \text{ is odd.} \end{cases} \quad\text{[Ans]}$$
8.2 Suppose X and Y are independent random variables such that X has density fX (x) = 6x(1 − x) for 0 < x < 1 and Y
has density fY (y) = 2y for 0 < y < 1. Find the density of Z = X 2 Y . [Ans]
8.3 Transforming uniform to exponential. Suppose X ∼ uniform(0, 1). Find the distribution of Y = − ln X. [Ans]
8.4 Product of independent uniforms.
(a) Suppose X and Y are i.i.d. with the uniform(0, 1) distribution. Let Z = XY . Find the density and distribution
function of Z. (An alternative expression of this problem is given in exercise 8.5.)
(b) (Note: this part makes use of the fact that the sum of independent exponentials is a gamma distribution—see propo-
sition(11.6a) on page 35.) Suppose X1 , . . . , Xn are i.i.d. with the uniform(0, 1) distribution and let Pn = X1 · · · Xn .
Find the density of − ln Pn and hence find the density of Pn . [Ans]
8.5 Suppose X1 ∼ uniform(0, 1), X2 ∼ uniform(0, X1 ), X3 ∼ uniform(0, X2 ), and in general, Xn ∼ uniform(0, Xn−1 )
for n ∈ {2, 3, . . .}.
(a) Prove by induction that the density of Xn is
$$f(x_n) = \frac{\bigl(\ln(1/x_n)\bigr)^{n-1}}{(n-1)!} \qquad\text{for } 0 < x_n < 1.$$
(b) By using the result of part (b) of exercise 8.4, find the density of Xn. [Ans]
8.6 Suppose X is a random variable with an absolutely continuous distribution with density f. The entropy of X is defined to be
$$H(X) = -\int f(x)\ln f(x)\,dx$$
Suppose X ∼ uniform(a, b). Find the entropy of X.
(It can be shown that the continuous distribution on the interval (a, b) with the largest entropy is the uniform.) [Ans]
8.7 Sum and difference of two independent uniforms. Suppose X ∼ uniform(0, a) and Y ∼ uniform(0, b) and X and Y are
independent.
(a) Find the density of V = X + Y and sketch its shape.
(b) Find the density of W = Y − X and sketch its shape. [Ans]
8.8 Suppose X ∼ uniform(0, a) and Y ∼ uniform(0, b) and X and Y are independent. Find the distribution of V =
min{X, Y } and find P[V = X]. [Ans]
Page 26 §8 Mar 10, 2020(20:25) Bayesian Time Series Analysis

8.9 A waiting time problem. Suppose you arrive at a bus stop at time t = 0. The stop is served by two bus routes. From past
observations, you assess that the time X1 to wait for a bus on route 1 has the uniform(0, a) distribution and the time X2
to wait for a bus on route 2 has the uniform(0, b) distribution. Also X1 and X2 are independent. (Clearly this assumption
will not hold in practice!!) A bus on route 1 takes the time α to reach your destination whilst a bus on route 2 takes the
time α + β.
Suppose the first bus arrives at the stop at time t0 and is on route 2. Should you catch it if you wish to minimize your
expected arrival time? [Ans]
8.10 Suppose U ∼ uniform(0, 1) and the random variable V has an absolutely continuous distribution with finite expectation
and density f . Also U and V are independent. Let W denote the fractional part of U + V ; this means that W =
U + V − bU + V c. Show that W ∼ uniform(0, 1).
(See also Poincaré’s roulette problem; pages 62–63 in [F ELLER(1971)] for example.) [Ans]
8.11 Suppose X and Y are i.i.d. random variables with the uniform(0, 1) distribution. Let V = min{X, Y } and W =
max{X, Y }.
(a) Find the distribution functions, densities and expectations of V and W .
(b) Find P[V ≤ v, W ≤ w] and hence derive f(V,W ) (v, w), the joint density of (V, W ).
(c) Find the density of (W |V ≤ v) and hence derive E[W |V ≤ v]. [Ans]
8.12 Suppose two points are chosen independently and at random on a circle with a circumference which has unit length.
(a) Find the distribution of the lengths of the intervals (X1 , X2 ) and (X2 , X1 ).
(b) Find the distribution of the length of the interval L which contains the fixed point Q.
(See page 23 in [F ELLER(1971)].) [Ans]
8.13 Suppose n points are distributed independently and uniformly on a disc with radius r. Let D denote the distance from
the centre of the disc to the nearest point. Find the density and expectation of D. [Ans]
8.14 Suppose the random vector (X1 , X2 ) has a distribution which is uniform over the disc {(x, y) ∈ R2 : x2 + y 2 ≤ a2 }.
Find the density of X1 . (See also the semicircle distribution defined in equation (34.1b) on page 103.) [Ans]
8.15 Suppose V1 , V2 and V3 are i.i.d. random variables with the uniform(0, 1) distribution.
(a) Let R = max{V1 , V2 } and Θ = 2πV3 . Find the density of (R, Θ).
(b) Let X = R cos Θ and Y = R sin Θ. Show that the random vector (X, Y ) is uniformly distributed on the unit disc
{(x, y) ∈ R2 : 0 < x2 + y 2 < 1}. [Ans]
8.16 Suppose X1 , . . . , Xn are i.i.d. with the uniform(0, 1) distribution.
(a) Find E[Xk:n ] and var[Xk:n ] for k = 1, 2, . . . , n.
(b) Find the joint density of (Xj:n , Xk:n ).
(c) Find E[Xj:n Xk:n ], cov[Xj:n , Xk:n ] and corr[Xj:n , Xk:n ]. [Ans]
8.17 Suppose Y1 , . . . , Yn are i.i.d. with an absolutely continuous distribution with distribution function FY and density
function fY . By transforming to the uniform distribution, find the density of (Yi:n , Yj:n ) where 1 ≤ i < j ≤ n. [Ans]
8.18 Suppose X ∼ uniform(a, b).
(a) Find skew[X], the skewness of X. (b) Find κ[X], the kurtosis of X.
Note: skew[X] = E[(X − µ)³]/σ³ and κ[X] = E[(X − µ)⁴]/σ⁴ where σ² = E[(X − µ)²]. See §1.4 and §1.5. [Ans]
8.19 Tukey’s lambda distribution. Suppose a > 0 and X ∼ uniform(0, 1). Define the random variable Y by
$$Y = \frac{aX^{\lambda}-(1-X)^{\lambda}}{\lambda} \ \text{ if } \lambda\ne 0 \qquad\text{and}\qquad Y = \ln\!\left(\frac{X^{a}}{1-X}\right) \ \text{ if } \lambda = 0.$$
Then Y has the Tukey lambda(a, λ) distribution. Note that if λ > 0, then −1/λ ≤ Y ≤ a/λ.
Suppose λ > 0; find E[Y n ] for n ∈ {1, 2, 3, . . .}.
(Note there is no simple closed form for the density and distribution functions.) [Ans]
8.20 The triangular density. Suppose a, b and c are real numbers with a < c < b. Define the function f by
$$f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & \text{if } a\le x\le c; \\ \dfrac{2(b-x)}{(b-a)(b-c)} & \text{if } c\le x\le b; \\ 0 & \text{otherwise.} \end{cases}$$
(a) Check that f is a density function and plot its shape.
(b) Suppose the random variable X has the density f . Find E[X] and var[X].

[Ans]
The Irwin-Hall distribution.


8.21 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the uniform(0, 1) distribution. Let Sn = X1 + · · · + Xn .
(a) Show that the density of Sn is given by
$$g_n(t) = \frac{1}{2(n-1)!}\sum_{k=0}^{n}(-1)^k\binom{n}{k}\operatorname{sgn}(t-k)\,(t-k)^{n-1} \qquad\text{for all } t\in\mathbb{R} \text{ and all } n=2,3,\ldots. \tag{8.21a}$$
where
$$\operatorname{sgn}(t-k) = \begin{cases} -1 & \text{if } k>t; \\ 0 & \text{if } k=t; \\ 1 & \text{if } k<t. \end{cases}$$
(b) Show that the distribution function of Sn is given by
$$F_n(t) = \frac{1}{n!}\sum_{k=0}^{[t]}(-1)^k\binom{n}{k}(t-k)^{n} \qquad\text{for all } t\in\mathbb{R} \text{ and all } n=1,2,\ldots. \tag{8.21b}$$
and
$$F_n(t) = \frac{1}{2}\left[1+\frac{1}{n!}\sum_{k=0}^{n}(-1)^k\binom{n}{k}\operatorname{sgn}(t-k)\,(t-k)^{n}\right] \qquad\text{for all } t\in\mathbb{R} \text{ and all } n=1,2,\ldots. \tag{8.21c}$$
[Ans]
8.22 Suppose Xn has the Irwin-Hall distribution with order n.
(a) Show that E[Xn] = n/2 and var[Xn] = n/12 and the m.g.f. of Xn is
$$E[e^{tX_n}] = \left(\frac{e^t-1}{t}\right)^{n} \qquad\text{for all } t\in\mathbb{R} \text{ with } t\ne 0.$$
(b) Show that
$$\frac{X_n - n/2}{\sqrt{n/12}} \overset{D}{\Longrightarrow} N(0,1) \qquad\text{as } n\to\infty. \quad\text{[Ans]}$$
8.23 Suppose X has the Irwin-Hall distribution with order n.
(a) Show that E[X 2 ] = (3n2 + n)/12, E[X 3 ] = (n3 − n2 + 2n)/8 and E[X 4 ] = (n/240)[15n3 + 30n2 + 5n − 2].
(b) Show that the skewness of X is skew[X] = 0.
(c) Show that the kurtosis of X is κ[X] = 3 − 6/(5n). [Ans]
8.24 Suppose a < b and X1 , X2 , . . . , Xn are i.i.d. random variables with the uniform(a, b) distribution. Let Sn = X1 + · · · +
Xn . Find expressions for the distribution function and density function of Sn . [Ans]

9 The exponential distribution


9.1 The basics
Definition(9.1a). Suppose λ > 0. Then X has the exponential distribution, exponential (λ), iff X has an
absolutely continuous distribution with density

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x>0; \\ 0 & \text{if } x<0. \end{cases}$$
Clearly the density f of the exponential distribution is decreasing on [0, ∞) and is convex on [0, ∞) because f′′ > 0. Also f(x) → 0 as x → ∞.
The distribution function.
$$F(x) = \begin{cases} 1-e^{-\lambda x} & \text{if } x>0; \\ 0 & \text{if } x<0. \end{cases}$$
The quantile function and median. The quantile function is
$$F^{-1}(p) = -\frac{\ln(1-p)}{\lambda} \qquad\text{for } p\in[0,1).$$
The median is F⁻¹(1/2) = ln 2/λ.
Moments. These can be obtained by integrating by parts.
$$E[X] = \frac{1}{\lambda} \qquad E[X^2] = \frac{2}{\lambda^2} \qquad E[X^n] = \frac{n!}{\lambda^n} \ \text{ for } n=1,2,\ldots. \tag{9.1a}$$
More generally, E[X^n] = Γ(n + 1)/λ^n for all n ∈ (0, ∞). Hence
$$\operatorname{var}[X] = \frac{1}{\lambda^2} \qquad\text{and}\qquad E\bigl[(X-1/\lambda)^n\bigr] = \frac{n!}{\lambda^n}\sum_{k=0}^{n}\frac{(-1)^k}{k!} \ \text{ for } n=1,2,\ldots.$$
The moment generating function and characteristic function.
$$E[e^{tX}] = \frac{\lambda}{\lambda-t} \ \text{ for } t<\lambda \qquad\text{and}\qquad \varphi(t) = E[e^{itX}] = \frac{\lambda}{\lambda-it}$$
Multiple of an exponential distribution. Suppose X ∼ exponential (λ) and Y = αX where α > 0. Then
P[Y > t] = P[X > t/α] = e−λt/α and hence Y ∼ exponential ( λ/α).
It follows that the family of exponential distributions {exponential (λ) : λ > 0} is a scale family of distributions—
see definition(1.6d) on page 5 for the meaning of this term.
Sum of i.i.d. exponentials. The sum of i.i.d. random variables with an exponential distribution has a gamma
distribution—this is explained in proposition(11.6a) on page 35.
9.2 The exponential as the limit of geometric distributions.
Suppose events can only occur at times δ, 2δ, 3δ, . . . , and events at different times are independent. Let
P[event occurs at time kδ] = p
Let T denote the time to the first event. Then
P[T > kδ] = (1 − p)k
Hence P[T = kδ] = (1 − p)k−1 p for k ∈ {1, 2, . . .} and E[T ] = δ/p.
Now suppose δn → 0 as n → ∞ in such a way that E[Tn ] = δn/pn = 1/α is constant. Then
lim P[Tn > t] = lim (1 − pn )t/δn = lim (1 − αδn )t/δn = e−αt (9.2a)
n→∞ n→∞ n→∞
and hence $T_n \overset{D}{\Longrightarrow} T$ as n → ∞ where T ∼ exponential(α).
This result can be rigorously stated as follows.
Proposition(9.2a). Suppose that for every p ∈ (0, 1), the random variable Xp has the geometric distribution
$$P[X_p = k] = (1-p)^{k-1}p \qquad\text{for } k=1,2,\ldots$$
Then
$$pX_p \overset{D}{\Longrightarrow} \text{exponential}(1) \qquad\text{as } p\to 0.$$
Proof. The m.g.f. of Xp is
$$E[e^{tX_p}] = \sum_{k=1}^{\infty}e^{tk}(1-p)^{k-1}p = \frac{pe^t}{1-(1-p)e^t} \qquad\text{for } t<-\ln(1-p).$$
It follows that the m.g.f. of pXp is
$$E[e^{tpX_p}] = \sum_{k=1}^{\infty}e^{tpk}(1-p)^{k-1}p = \frac{pe^{tp}}{1-(1-p)e^{tp}} \qquad\text{for } t<-\ln(1-p)/p.$$
Setting x = pt in the standard inequality x < e^x − 1 < x/(1 − x) for x < 1 shows that
$$t < \frac{e^{tp}-1}{p} < \frac{t}{1-tp} \ \text{ for } p<1/t, \qquad\text{and hence}\qquad \lim_{p\to 0}\frac{e^{tp}-1}{p} = t$$
Hence for every t ∈ (0, ∞) we have
$$\frac{1-(1-p)e^{tp}}{pe^{tp}} = 1-\frac{1}{e^{tp}}\,\frac{e^{tp}-1}{p} \longrightarrow 1-t \qquad\text{as } p\to 0.$$
We have shown that
$$\lim_{p\to 0}E[e^{tpX_p}] = \frac{1}{1-t}$$
and the limit on the right hand side is the m.g.f. of the exponential(1) distribution. Hence, by the result¹ on page 390 of [BILLINGSLEY(1995)], we have $pX_p \overset{D}{\Longrightarrow} \text{exponential}(1)$ as p → 0.

1
Suppose s0 > 0 and {Mn } is a sequence of moment generating functions which exist for t ∈ (−s0 , s0 ) and Mn (t) → M (t)
as n → ∞ for all t ∈ (−s0 , s0 ) where M is also a moment generating function. Then this implies convergence in distribution
of the corresponding distributions.
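A numerical illustration of the proposition (not part of the original notes): for a small success probability p, the scaled geometric pXp has tail probabilities close to those of exponential(1).

```python
# Illustrative check of proposition (9.2a): if X_p is geometric on {1,2,...} with success
# probability p, then p X_p is approximately exponential(1) for small p.
import numpy as np

rng = np.random.default_rng(5)
p = 0.001
x = rng.geometric(p, size=200_000)        # numpy's geometric is supported on {1, 2, ...}
scaled = p * x
# Compare a few tail probabilities with exp(-t):
for t in (0.5, 1.0, 2.0):
    print(t, np.mean(scaled > t), np.exp(-t))
```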
Corollary(9.2b). Suppose λ ∈ (0, ∞) and {pn }n≥1 is a sequence in (0, 1) such that npn → λ as n → ∞.
Suppose further that for every n ∈ {1, 2, . . .} the random variable Xn has the geometric distribution P[Xn = k] = (1 − pn)^{k−1} pn for k = 1, 2, . . . . Then
$$\frac{X_n}{n} \overset{D}{\Longrightarrow} \text{exponential}(\lambda) \qquad\text{as } n\to\infty.$$
Proof. The assumption npn → λ as n → ∞ implies pn → 0 as n → ∞. Proceeding as in the proposition gives $p_nX_n \overset{D}{\Longrightarrow} \text{exponential}(1)$ as n → ∞. Let λn = npn; hence λn → λ as n → ∞. The Convergence of Types theorem² then implies
$$\frac{p_nX_n}{\lambda_n} = \frac{X_n}{n} \overset{D}{\Longrightarrow} \text{exponential}(\lambda) \qquad\text{as } n\to\infty.$$
Here is another approach to defining points distributed randomly on [0, ∞). Suppose X1, . . . , Xn are i.i.d. random variables with the uniform(0, ℓn) distribution. Then P[X(1) > t] = (1 − t/ℓn)^n for t ∈ (0, ℓn). Now suppose n → ∞ in such a way that n/ℓn = λ is fixed. Then limn→∞ P[X(1) > t] = e^{−λt}, which means that
$$X_{(1)} \overset{D}{\Longrightarrow} T \qquad\text{as } n\to\infty$$
where T ∼ exponential(λ). Informally, this result says that if points are distributed randomly on the line such that the mean density of points is λ, then the distance to the first point has the exponential(λ) distribution.
9.3 The lack of memory or Markov property of the exponential distribution. Suppose the random variable T
models the lifetime of some component. Then the random variable T is said to have the lack of memory property
iff the remaining lifetime of an item which has already lasted for a length of time x has the same distribution as T .
This means
P[T > x + t|T > x] = P[T > t]
and hence
P[T > x + t] = P[T > t] P[T > x] for all t > 0 and x > 0.
Similarly, the distribution with distribution function F has the lack of memory property iff [1 − F (x + t)] =
[1 − F (x)][1 − F (t)] for all x > 0 and all t > 0.
If X ∼ exponential (λ) then 1 − F (x) = e−λx and hence X has the lack of memory property. Conversely
Proposition(9.3a). Suppose X is an absolutely continuous random variable on [0, ∞) with the lack of memory
property. Then there exists λ > 0 such that X ∼ exponential (λ).
Proof. Let G(x) = P[X > x]. Then
G(x + y) = G(x)G(y) for all x ≥ 0 and all y ≥ 0.
Suppose x = m/n is rational. Then G(m/n) = [G(1/n)]^m. Raising to the nth power gives [G(m/n)]^n = [G(1/n)]^{mn} = [G(1)]^m. Hence G(m/n) = [G(1)]^{m/n}.
Now suppose x is any real number in [0, ∞). Choose sequences qn and rn of rationals such that qn ≤ x ≤ rn and qn → x as n → ∞ and rn → x as n → ∞. Hence [G(1)]^{qn} = G(qn) ≥ G(x) ≥ G(rn) = [G(1)]^{rn}. Letting n → ∞ gives G(x) = [G(1)]^x.
Now let λ = −ln[G(1)]. Then G(x) = [G(1)]^x = e^{−λx}. See also [FELLER(1968)], page 459.
The proof of proposition(9.3a) depends on finding a solution of the functional equation f (x + y) = f (x)f (y).
Taking logs of this equation gives the Cauchy functional equation f (x + y) = f (x) + f (y). Both of these equations
have been studied very extensively—see [S AATY(1981)], [ACZ ÉL(1966)], [K UCZMA(2009)], etc.
9.4 Distribution of the minimum and maximum. Suppose Xj ∼ exponential (λj ) for j = 1, 2, . . . , n. Suppose
further that X1 , X2 , . . . , Xn are independent. Let X1:n = min{X1 , . . . , Xn }. Then
P[X1:n > t] = P[X1 > t] · · · P[Xn > t] = e−(λ1 +···+λn )t for t > 0.
Hence X1:n ∼ exponential (λ1 + · · · + λn ).
In particular, we have shown that if X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution,
then X1:n ∼ exponential (nλ). Hence
$$X_{1:n} \ \text{ has the same distribution as } \ \frac{X_1}{n}$$
This property characterizes the exponential distribution—see page 39 of [GALAMBOS & KOTZ(1978)].
The maximum is Xn:n and clearly $P[X_{n:n}\le t] = \prod_{k=1}^{n}\bigl[1-e^{-\lambda_k t}\bigr]$ for t ∈ [0, ∞). See also exercise 10.15.

2
See Lemma 1 on page 193 of [B ILLINGSLEY(1995)]
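A short simulation (illustrative only) of the minimum result of §9.4: the minimum of independent exponential(λj) random variables is exponential(λ1 + · · · + λn).

```python
# Illustrative check of §9.4: the minimum of independent exponential(lambda_j) random
# variables is exponential(lambda_1 + ... + lambda_n).
import numpy as np

rng = np.random.default_rng(6)
lams = np.array([0.5, 1.0, 2.5])
x = rng.exponential(scale=1.0 / lams, size=(200_000, len(lams)))   # column j ~ exp(lams[j])
m = x.min(axis=1)
total = lams.sum()
print(m.mean(), 1.0 / total)              # both close to 1/(lambda_1+...+lambda_n) = 0.25
for t in (0.1, 0.5):
    print(np.mean(m > t), np.exp(-total * t))
```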
9.5 The order statistics of the exponential distribution. Suppose we think of X1 , . . . , Xn as the times when
n events occur. Then we have shown that the time to the first event has the exponential (nλ) distribution. Using
the lack of memory property suggests that the extra time to the second event, X2:n − X1:n , should have the
exponential ( (n − 1)λ ) distribution. And so on. This result is established in the following proposition.
Proposition(9.5a). Suppose X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution. De-
fine Z1 , . . . , Zn by
Z1 = nX1:n , Z2 = (n − 1)(X2:n − X1:n ), . . . , Zn = Xn:n − X(n−1):n
Then Z1 , . . . ,Zn are i.i.d. random variables with the exponential (λ) distribution.
Proof. We know that the density of (X1:n, . . . , Xn:n) is $g(x_1,\ldots,x_n) = n!\,\lambda^n e^{-\lambda\sum_{j=1}^{n}x_j}$ for 0 < x1 < · · · < xn. Also
$$X_{1:n} = \frac{Z_1}{n}, \qquad X_{2:n} = \frac{Z_1}{n}+\frac{Z_2}{n-1}, \qquad\ldots,\qquad X_{n:n} = \frac{Z_1}{n}+\frac{Z_2}{n-1}+\cdots+\frac{Z_{n-1}}{2}+Z_n$$
and hence the Jacobian of the transformation is
$$\frac{\partial(x_{1:n},\ldots,x_{n:n})}{\partial(z_1,\ldots,z_n)} = \frac{1}{n!}$$
Hence the density of (Z1, . . . , Zn) is
$$f_{(Z_1,\ldots,Z_n)}(z_1,\ldots,z_n) = \frac{1}{n!}\,n!\,\lambda^n e^{-\lambda(z_1+\cdots+z_n)} = \lambda^n e^{-\lambda(z_1+\cdots+z_n)} \qquad\text{for } z_1>0,\ \ldots,\ z_n>0.$$
This establishes the proposition.
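Proposition (9.5a) is easy to check by simulation; the following sketch (illustrative only, not from the original notes) forms the normalized spacings and confirms that every one of them behaves like an exponential(λ) variable.

```python
# Illustrative check of proposition (9.5a): the normalized spacings of exponential order
# statistics, Z_1 = n X_{1:n}, Z_k = (n-k+1)(X_{k:n} - X_{(k-1):n}), are i.i.d. exponential(lam).
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 2.0, 6, 100_000
x = np.sort(rng.exponential(scale=1.0 / lam, size=(reps, n)), axis=1)
spacings = np.diff(np.concatenate([np.zeros((reps, 1)), x], axis=1), axis=1)
z = spacings * np.arange(n, 0, -1)        # multiply k-th spacing by (n - k + 1)
print(z.mean(axis=0))                     # every column close to 1/lam = 0.5
print(z.var(axis=0))                      # every column close to 1/lam^2 = 0.25
```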
9.6 Link with the the order statistics from a uniform distribution.
Proposition(9.6a). Suppose Y1 , . . . , Yn+1 are i.i.d. random variables with the exponential (λ) distribution, and
let
Sℓ = Y1 + · · · + Yℓ for ℓ = 1, . . . , n + 1.
Then
$$\left(\frac{Y_1}{S_{n+1}},\ldots,\frac{Y_n}{S_{n+1}}\right) \ \text{ is independent of } \ S_{n+1}. \tag{9.6a}$$
Suppose U1, . . . , Un are i.i.d. random variables with the uniform(0, 1) distribution and denote the vector of order statistics by (U1:n, . . . , Un:n). Let
D1 = U1:n, D2 = U2:n − U1:n, . . . , Dn = Un:n − U(n−1):n, Dn+1 = 1 − Un:n
Then
$$\left(\frac{Y_1}{S_{n+1}},\ldots,\frac{Y_{n+1}}{S_{n+1}}\right) \ \text{ has the same distribution as } \ (D_1,\ldots,D_{n+1}) \tag{9.6b}$$
Proof. By equation(3.2a), the density of the vector (U1:n, . . . , Un:n) is
$$g(x_1,\ldots,x_n) = \begin{cases} n! & \text{if } 0<x_1<\cdots<x_n<1; \\ 0 & \text{otherwise.} \end{cases}$$
Also
$$f_{(Y_1,\ldots,Y_{n+1})}(y_1,\ldots,y_{n+1}) = \lambda^{n+1}e^{-\lambda(y_1+\cdots+y_{n+1})} \qquad\text{for } y_1>0,\ \ldots,\ y_{n+1}>0.$$
Consider the transformation:
$$X_1 = \frac{Y_1}{S_{n+1}}, \quad X_2 = \frac{Y_2}{S_{n+1}}, \quad\ldots,\quad X_n = \frac{Y_n}{S_{n+1}}, \quad X_{n+1} = Y_1+\cdots+Y_{n+1}$$
Or
$$Y_1 = X_1X_{n+1}, \quad Y_2 = X_2X_{n+1}, \quad\ldots,\quad Y_n = X_nX_{n+1}, \quad Y_{n+1} = (1-X_1-X_2-\cdots-X_n)X_{n+1}$$
The absolute value of the Jacobian of the transformation is:
$$\left|\frac{\partial(y_1,\ldots,y_{n+1})}{\partial(x_1,\ldots,x_{n+1})}\right| = x_{n+1}^{n}$$
The determinant can be easily evaluated by replacing the last row by the sum of all the rows—this gives an upper triangular determinant.
Hence the density of (X1, . . . , Xn+1) is
$$f_{(X_1,\ldots,X_{n+1})}(x_1,\ldots,x_{n+1}) = \begin{cases} \lambda^{n+1}e^{-\lambda x_{n+1}}x_{n+1}^{n} & \text{for } x_1\ge 0,\ \ldots,\ x_n\ge 0,\ \sum_{\ell=1}^{n}x_\ell\le 1,\ x_{n+1}>0; \\ 0 & \text{otherwise.} \end{cases}$$
Now Xn+1 = Y1 + · · · + Yn+1 is the sum of (n + 1) i.i.d. exponentials; hence by proposition(11.6a) on page 35, Xn+1 ∼ gamma(n + 1, λ) and has density
$$f_{X_{n+1}}(x_{n+1}) = \frac{\lambda^{n+1}x_{n+1}^{n}e^{-\lambda x_{n+1}}}{n!}$$
It follows that (X1, . . . , Xn) is independent of Xn+1 and (X1, . . . , Xn) has density
$$f_{(X_1,\ldots,X_n)}(x_1,\ldots,x_n) = n! \qquad\text{for } x_1\ge 0,\ \ldots,\ x_n\ge 0,\ \textstyle\sum_{\ell=1}^{n}x_\ell\le 1.$$
This establishes (9.6a).
Using equation(7.8a) on page 24 shows that the density of (D1, . . . , Dn) is the same as the density of (X1, . . . , Xn). Also D1 + · · · + Dn + Dn+1 = 1 and X1 + · · · + Xn + Yn+1/Sn+1 = 1. Hence (9.6b).
Corollary(9.6b). With the same notation as the proposition,
$$\left(\frac{S_1}{S_{n+1}},\ldots,\frac{S_n}{S_{n+1}}\right) \ \text{ has the same distribution as } \ (U_{1:n},\ldots,U_{n:n}) \tag{9.6c}$$
Also
$$\left(\frac{S_1}{S_{n+1}},\ldots,\frac{S_n}{S_{n+1}}\right) \ \text{ is independent of } \ S_{n+1}. \tag{9.6d}$$
Proof. The proposition shows that (X1, . . . , Xn) has the same distribution as (D1, . . . , Dn). Taking partial sums of both sides gives (9.6c). Result (9.6d) follows directly from (9.6a).
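Result (9.6c) can be visualized by simulation; the sketch below (illustrative only) compares the partial-sum ratios built from exponentials with the order statistics of a uniform sample.

```python
# Illustrative check of corollary (9.6b): (S_1/S_{n+1}, ..., S_n/S_{n+1}) built from i.i.d.
# exponentials has the same distribution as the order statistics of n uniform(0,1) variables.
import numpy as np

rng = np.random.default_rng(8)
n, reps = 4, 200_000
y = rng.exponential(scale=1.0, size=(reps, n + 1))
s = np.cumsum(y, axis=1)
ratios = s[:, :n] / s[:, [n]]                     # S_k / S_{n+1}, k = 1..n
u_order = np.sort(rng.uniform(0.0, 1.0, size=(reps, n)), axis=1)
print(ratios.mean(axis=0))                        # ~ (1/(n+1), 2/(n+1), ..., n/(n+1))
print(u_order.mean(axis=0))                       # same values: E[U_{k:n}] = k/(n+1)
```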
9.7 Decomposition of an exponential. In exercise 12.6 on page 38 it is shown that if X ∼ gamma(1/2, α), Y ∼ gamma(1/2, α) and X and Y are independent, then X + Y has the gamma(1, α) = exponential(α) distribution. Thus an exponential can be expressed as the sum of two independent gammas. A further decomposition is given in exercise 10.4.

9.8 Summary. The exponential distribution.
• Density. X ∼ exponential(λ) iff fX(x) = λe^{−λx} for x > 0.
• The distribution function. FX(x) = 1 − e^{−λx} for x > 0.
• Moments. E[X] = 1/λ, E[X²] = 2/λ², var[X] = 1/λ².
• M.g.f. and c.f. MX(t) = E[e^{tX}] = λ/(λ − t) for t < λ; φX(t) = E[e^{itX}] = λ/(λ − it).
• Properties.
  Suppose X ∼ exponential(λ); then αX ∼ exponential(λ/α).
  The lack of memory property: P[X > x + t | X > x] = P[X > t].
  If X1, . . . , Xn are i.i.d. exponential(λ), then X1:n ∼ exponential(nλ).

10 Exercises (exs-exponential.tex)

10.1 Suppose X ∼ exponential (λ).


(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
10.2 The failure rate function of the exponential distribution.
(a) Suppose λ ∈ (0, ∞) and X ∼ exponential (λ). Show that the hazard function hX is constant on (0, ∞). The
exponential is said to have a constant failure rate.
(b) Suppose X is a random variable with an absolutely continuous distribution on (0, ∞) and X has a constant failure
rate. Show that X has an exponential distribution. [Ans]
10.3 Most elementary analysis texts contain a proof of the result:
$$\lim_{n\to\infty}\left(1-\frac{1}{n}\right)^{n} = e^{-1}$$
By using this result, show that if {δn} is a real sequence in (0, ∞) such that δn → 0 as n → ∞ and α ∈ (0, ∞), then
$$\lim_{n\to\infty}\bigl(1-\alpha\delta_n\bigr)^{t/\delta_n} = e^{-\alpha t}$$
This is used in equation(9.2a) on page 28. [Ans]
10.4 Suppose X ∼ exponential (λ).
(a) Let Y be the integer part of X; hence Y = bXc. Let Z be the fractional part of X; hence Z = X − bXc.
Find the distributions of Y and Z and show that Y and Z are independent.
(b) Now let W = dXe, the ceiling of X; hence {W = n} = {n − 1 < X ≤ n}. Find the distribution of W . [Ans]
10.5 Suppose X ∼ uniform(0, 1) and λ > 0. Prove that
$$Y = -\frac{\ln(1-X)}{\lambda} \sim \text{exponential}(\lambda) \quad\text{[Ans]}$$
10.6 Suppose X and Y are i.i.d. random variables with the exponential (λ) distribution.
(a) Find the distribution of V = X/(X + Y ). This is a special case of part (a) of exercise 12.7 on page 38.
(b) Find the distribution of Z = (X − Y )/(X + Y ). [Ans]
10.7 Suppose X and Y are i.i.d. random variables with the exponential (λ) distribution. Find the distribution of Z = X/(Y +1).
[Ans]
10.8 (a) Suppose X ∼ exponential(λ), Y ∼ exponential(µ) and X and Y are independent. Show that
$$P[X<Y] = \frac{\lambda}{\lambda+\mu}$$
(b) Suppose X1, . . . , Xn are independent random variables with Xj ∼ exponential(λj) for j = 1, . . . , n. Show that for k ∈ {1, 2, . . . , n} we have
$$P[\,X_k < \min\{X_\ell : \ell\ne k\}\,] = \frac{\lambda_k}{\sum_{j=1}^{n}\lambda_j}$$
(c) Suppose X1, . . . , Xn are independent random variables with Xj ∼ exponential(λj) for j = 1, . . . , n. Show that
$$P[X_1<X_2<\cdots<X_n] = \left(\frac{\lambda_1}{\lambda_1+\cdots+\lambda_n}\right)\left(\frac{\lambda_2}{\lambda_2+\cdots+\lambda_n}\right)\cdots\left(\frac{\lambda_{n-1}}{\lambda_{n-1}+\lambda_n}\right) = \prod_{k=1}^{n}\frac{\lambda_k}{\sum_{j=k}^{n}\lambda_j}$$
(d) Deduce that the event A = {X1 < min(X2, . . . , Xn)} is independent of the event B = {X2 < · · · < Xn}. [Ans]
10.9 Suppose X ∼ exponential (µ) and Y ∼ exponential (δ) where 0 < δ ≤ µ. Suppose further that f : (0, ∞) → (0, ∞)
with f differentiable and f 0 (x) > 0 for all x > 0. Prove that E[f (X)] ≤ E[f (Y )]. [Ans]
10.10 Suppose X and Y are i.i.d. random variables with the exponential (λ) distribution. Find the conditional density of X
given X + Y = z. What is E[X|X + Y ]? [Ans]
10.11 Suppose X1 and X2 are i.i.d. random variables with the exponential (λ) distribution. Let Y1 = X1 −X2 and Y2 = X1 +X2 .
(a) Find the densities of Y1 and Y2 . Note that Y1 has the Laplace(0, λ) distribution—see §27.1 on page 81.
(b) What is the density of R = |X1 − X2 |? [Ans]
10.12 Suppose the random variables X and Y are i.i.d. with the exponential (1) distribution. Let U = min{X, Y } and
V = max{X, Y}. Prove that U ∼ exponential(2) and that V has the same distribution as X + Y/2. [Ans]
10.13 A characterization of the exponential distribution. Suppose X1 and X2 are i.i.d. random variables which are non-negative
and absolutely continuous. Let Y = min{X1 , X2 } and R = |X1 − X2 |. Then Y and R are independent iff X1 and X2
have the exponential distribution. [Ans]
10.14 Suppose X1 ∼ exponential (λ1 ), X2 ∼ exponential (λ2 ) and X1 and X2 are independent.
(a) Find P[min{X1 , X2 } = X1 ].
(b) Show that {min{X1 , X2 } > t} and {min{X1 , X2 } = X1 } are independent.
(c) Let R = max{X1 , X2 } − min{X1 , X2 }. Find P[R > t].
(d) Show that min{X1 , X2 } and R are independent. [Ans]
10.15 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution and let Xn:n denote the max-
imum. Suppose further that Y1, Y2, . . . , Yn are independent random variables such that Yk ∼ exponential(kλ) for k = 1, 2, . . . , n and let Y = Σ_{k=1}^{n} Yk.
Show that Xn:n has the same distribution as Y. [Ans]
10.16 Suppose X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution. Find
(a) E[Xk:n ] (b) var[Xk:n ] (c) cov[Xj:n , Xk:n ]. [Ans]
10.17 Suppose X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution.
Let Z = nX1:n + (n − 1)X2:n + · · · + 2X(n−1):n + Xn:n . Find E[Z] and var[Z]. [Ans]
10.18 Suppose X1, . . . , Xn are i.i.d. random variables with the exponential(λ) distribution. Prove that
$$X_{1:n} \ \text{ is independent of } \ \sum_{\ell=1}^{n}\bigl(X_\ell - X_{1:n}\bigr) \quad\text{[Ans]}$$

10.19 Suppose X1 , X2 , X3 , . . . is a sequence of i.i.d. random variables with the exponential (1) distribution. For n =
1, 2, 3, . . . , let Zn = max{X1 , . . . , Xn }. Show that
$$P[Z_n - \ln(n) < x] = \left(1-\frac{e^{-x}}{n}\right)^{n} \qquad\text{and}\qquad \lim_{n\to\infty}P[Z_n - \ln(n) < x] = e^{-e^{-x}} \quad\text{[Ans]}$$
10.20 (a) Sum of two independent exponentials. Suppose X ∼ exponential (λ), Y ∼ exponential (µ) and X and Y are
independent. Find the density of Z = X + Y .
(b) Ratio of two independent exponentials. Suppose X ∼ exponential (λ), Y ∼ exponential (µ) and X and Y are
independent. Find the distribution of Z = X/Y .
(c) Product of two independent exponentials. Suppose X ∼ exponential (λ), Y ∼ exponential (µ) and X and Y are
independent.
(i) Find the distribution function of Z = XY. Express your answer in terms of the modified Bessel function of the second kind, order 1, which is:
$$K_1(y) = \int_{x=0}^{\infty}\cosh(x)\,e^{-y\cosh(x)}\,dx \qquad\text{for } \Re(y)>0.$$
(ii) Find the density of Z = XY. Express your answer in terms of the modified Bessel function of the second kind, order 0, which is:
$$K_0(y) = \int_{x=0}^{\infty}e^{-y\cosh(x)}\,dx \qquad\text{for } \Re(y)>0.$$

(iii) Write down what these answers become when λ = µ. [Ans]


10.21 Suppose X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution. Define Y1 , . . . , Yn as follows:
Y1 = X1 , Y2 = X1 + X2 , . . . , Yn = X1 + · · · + Xn .
Find the density of the vector (Y1 , . . . , Yn ). [Ans]
10.22 Sum of a random number of independent exponentials. Suppose X1 , X2 , X3 , . . . are i.i.d. random variables with
the exponential (λ) distribution. Suppose further that N is an integer valued random variable which is independent of
X = (X1 , X2 , . . .) and such that P[N = k] = q k−1 p for k ∈ {1, 2, . . .} where q ∈ (0, 1) and p = 1 − q. Let
Z = X1 + · · · + XN
Show that Z ∼ exponential (pλ). [Ans]
10.23 Suppose λ ∈ (0, ∞) and X ∼ exponential (λ). Suppose further that Y is a random variable which is independent of X
and takes values in (0, ∞).
(a) Prove that the conditional distribution of X − Y given X > Y is exponential (λ).
(b) By using part(a) or otherwise, prove that Y and X − Y are conditionally independent given X > Y . [Ans]
10.24 As usual, let N denote the positive integers {1, 2, . . .}. Suppose {Xk : k ∈ N} is a countable collection of independent random variables such that Xk ∼ exponential(λk) for every k ∈ N and Σ_{k=1}^{∞} λk < ∞.
(a) Let M = inf{Xk : k ∈ N}. Show that M ∼ exponential(Σ_{k=1}^{∞} λk).
(b) Suppose i ∈ {1, 2, . . .}. Show that
$$P[\,X_i < X_j \text{ for all } j\in\mathbb{N}\setminus\{i\}\,] = \frac{\lambda_i}{\sum_{k=1}^{\infty}\lambda_k}$$
(c) Let Y = Σ_{k=1}^{∞} Xk and µ = Σ_{k=1}^{∞} 1/λk. Because all terms are non-negative, we can interchange the integral and summation and obtain E[Y] = µ. Show that P[Y < ∞] = 1 iff Σ_{k=1}^{∞} 1/λk < ∞. [Ans]

11 The gamma and chi-squared distributions


11.1 Definition of the Gamma distribution.
Definition(11.1a). Suppose n > 0 and α > 0. Then the random variable X has the gamma(n, α) distribution
iff X has density
$$f(x) = \frac{\alpha^n x^{n-1}e^{-\alpha x}}{\Gamma(n)} \qquad\text{for } x>0. \tag{11.1a}$$
By definition, $\Gamma(n) = \int_0^{\infty}x^{n-1}e^{-x}\,dx$ for all n ∈ (0, ∞)³. It follows that
$$\int_0^{\infty}x^{n-1}e^{-\alpha x}\,dx = \frac{\Gamma(n)}{\alpha^n} \qquad\text{provided } \alpha>0 \text{ and } n>0. \tag{11.1b}$$
We shall see that n is a shape parameter and 1/α is a scale parameter; sometimes α is called the rate parameter.
The gamma distribution when n is an integer is called the Erlang distribution.
³ The integral $\int_0^{\infty}x^{n-1}e^{-x}\,dx$ diverges for n ≤ 0. Also Γ(1/2) = √π, Γ(1) = 1 and Γ(n) = (n − 1)! for n ∈ {2, 3, . . .}.
11.2 The distribution function. There is a simple expression for the distribution function only when n is a
positive integer: if X ∼ gamma(n, α) where n is a positive integer then
$$P[X\le x] = 1-e^{-\alpha x}\left[1+\frac{\alpha x}{1!}+\frac{(\alpha x)^2}{2!}+\cdots+\frac{(\alpha x)^{n-1}}{(n-1)!}\right] \tag{11.2a}$$
This is easy to check—differentiating the right hand side of equation(11.2a) gives the density in equation(11.1a).
Equation(11.2a) also implies
$$P[X>x] = e^{-\alpha x}+\frac{\alpha x e^{-\alpha x}}{1!}+\frac{(\alpha x)^2 e^{-\alpha x}}{2!}+\cdots+\frac{(\alpha x)^{n-1}e^{-\alpha x}}{(n-1)!}$$
Example(11.2a). Suppose X ∼ gamma(4, 1/100). Find P[X > 300].
Solution. Now αx = 3. Hence P[X > 300] = e⁻³(1 + 3 + 9/2 + 27/6) = 13e⁻³.
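The Erlang formula (11.2a) is easy to evaluate directly; the sketch below (illustrative only) checks example (11.2a) against scipy's gamma distribution, which is parameterized by shape a = n and scale 1/α.

```python
# Illustrative check of equation (11.2a) for integer n, using scipy for comparison.
import math
from scipy.stats import gamma

def erlang_cdf(x, n, alpha):
    """P[X <= x] for X ~ gamma(n, alpha) with integer n, via equation (11.2a)."""
    return 1.0 - math.exp(-alpha * x) * sum((alpha * x)**j / math.factorial(j)
                                            for j in range(n))

n, alpha, x = 4, 1.0 / 100.0, 300.0
print(erlang_cdf(x, n, alpha))                   # 1 - 13*exp(-3), as in example (11.2a)
print(gamma.cdf(x, n, scale=1.0 / alpha))        # scipy uses shape n and scale 1/alpha
print(1.0 - 13.0 * math.exp(-3.0))
```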

11.3 Multiple of a gamma distribution. Suppose n > 0 and α > 0 and X ∼ gamma(n, α) with density fX (x).
Suppose further that Y = βX where β > 0. Then the density of Y is given by:
$$f_Y(y) = \frac{f_X(x)}{|dy/dx|}\bigg|_{x=y/\beta} = \frac{\alpha^n(y/\beta)^{n-1}\exp(-\alpha y/\beta)}{\beta\,\Gamma(n)}$$

Hence Y = βX ∼ gamma(n, α/β ). Thus the parameter 1/α is a scale parameter of the gamma distribution—if
X has a gamma distribution with scale parameter b and Y = βX, then Y has a gamma distribution with scale
parameter βb. Also, for any fixed n > 0, the family of distributions {gamma(n, α) : α > 0} is a scale family—see
definition(1.6d) on page 5.
11.4 Moments and shape of the gamma distribution. Using the result that $\int f(x)\,dx = 1$ easily gives
$$E[X^k] = \frac{\Gamma(n+k)}{\alpha^k\,\Gamma(n)} \qquad\text{for } n+k>0. \tag{11.4a}$$
and so
$$E[X] = \frac{n}{\alpha} \qquad\text{and}\qquad \operatorname{var}(X) = \frac{n}{\alpha^2} \qquad\text{and}\qquad E\!\left[\frac{1}{X}\right] = \frac{\alpha}{n-1} \ \text{ for } n>1. \tag{11.4b}$$

[Figure(11.4a). Plot of the gamma density function for n = 1/2, n = 1 and n = 2 (all with α = 1).]

Figure (11.4a) shows that n is a shape parameter of the gamma distribution. By §11.3, we know that if X ∼
gamma(n, α) then Y = X/α ∼ gamma(n, 1). So without loss of generality, we now consider the shape of the
density of gamma(n, 1) distribution.
Let g(x) = xn−1 e−x . If n ≤ 1, then g(x) = e−x /x1−n is monotonic decreasing and hence the density of the
gamma(n, 1) distribution is monotonic decreasing.
If n > 1, then g 0 (x) = e−x xn−2 [n − 1 − x]. Clearly, if x < n − 1 then g 0 (x) > 0; if x = n − 1 then g 0 (x) = 0 and
if x > n − 1 then g 0 (x) < 0. Hence the density first increases to the maximum at x = n − 1 and then decreases.
By using §11.3, it follows that the maximum of the density of a gamma(n, α) density occurs at x = (n − 1)/α.
See also exercises 12.3 and 12.4 on page 38.
11.5 The moment generating function of a gamma distribution. Suppose X ∼ gamma(n, α). Then
$$M_X(t) = E[e^{tX}] = \int_0^{\infty}e^{tx}\,\frac{\alpha^n x^{n-1}e^{-\alpha x}}{\Gamma(n)}\,dx = \frac{\alpha^n}{\Gamma(n)}\int_0^{\infty}x^{n-1}e^{-(\alpha-t)x}\,dx = \frac{\alpha^n}{\Gamma(n)}\,\frac{\Gamma(n)}{(\alpha-t)^n} = \frac{1}{(1-t/\alpha)^n} \qquad\text{for } t<\alpha. \tag{11.5a}$$
Hence the characteristic function is 1/(1 − it/α)^n; in particular, if n = 1, the characteristic function of the exponential(α) distribution is α/(α − it).
Equation(11.5a) shows that for integral n, the Gamma distribution is the sum of n independent exponentials. The
next paragraph gives the long proof of this.
11.6 Representing the gamma distribution as a sum of independent exponentials. The following propo-
sition shows that the distribution of the waiting time for the nth event in a Poisson process with rate α has the
gamma(n, α) distribution.
Proposition(11.6a).Suppose X1 , X2 . . . . , Xn are i.i.d. random variables with the exponential density αe−αx
for x ≥ 0. Then Sn = X1 + · · · + Xn ∼ gamma(n, α).
Proof. By induction: let gn denote the density of Sn. Then for all t > 0 we have
$$g_{n+1}(t) = \int_0^t g_n(t-x)\,\alpha e^{-\alpha x}\,dx = \int_0^t \frac{\alpha^n(t-x)^{n-1}e^{-\alpha(t-x)}}{\Gamma(n)}\,\alpha e^{-\alpha x}\,dx = \frac{\alpha^{n+1}e^{-\alpha t}}{\Gamma(n)}\int_0^t (t-x)^{n-1}\,dx = \frac{\alpha^{n+1}e^{-\alpha t}}{\Gamma(n)}\left[-\frac{(t-x)^n}{n}\right]_{x=0}^{x=t} = \frac{\alpha^{n+1}t^n e^{-\alpha t}}{\Gamma(n+1)}$$
as required.
The result that the sum of n independent exponentials has the Gamma distribution is the continuous analogue of
the result that the sum of n independent geometrics has a negative binomial distribution.
Link with the Poisson distribution and Poisson process. Equation(11.2a) on page 34 implies P[X ≤ x] = P[Y ≥
n] where Y has a Poisson distribution with expectation αx. In terms of the Poisson process with rate α, the
relation P[X ≤ x] = P[Y ≥ n] means that the nth event occurs before time x iff there are at least n events in
[0, x].
Suppose Sn = X1 + · · · + Xn as in the proposition. Let Nt denote the number of indices k ≥ 1 with Sk ≤ t. Then
$$P[N_t=n] = P[S_n\le t \text{ and } S_{n+1}>t] = G_n(t)-G_{n+1}(t) = \frac{e^{-\alpha t}(\alpha t)^n}{n!}$$
by using equation(11.2a) on page 34. An alternative statement of this result: suppose X1, . . . , Xn+1 are i.i.d. random variables with the exponential(α) distribution and x0 > 0. Then
$$P[X_1+\cdots+X_n\le x_0 < X_1+\cdots+X_{n+1}] = \frac{e^{-\alpha x_0}(\alpha x_0)^n}{n!}$$
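A simulation sketch (illustrative only, not from the original notes) of the link just described: with i.i.d. exponential(α) inter-event times, the event count in [0, t] follows the Poisson distribution with mean αt.

```python
# Illustrative check of §11.6: with i.i.d. exponential(alpha) inter-event times, the number
# of events in [0, t] is Poisson with mean alpha*t.
import math
import numpy as np

rng = np.random.default_rng(9)
alpha, t, reps, cap = 2.0, 3.0, 50_000, 60
gaps = rng.exponential(scale=1.0 / alpha, size=(reps, cap))   # cap >> alpha*t events suffice
arrival_times = np.cumsum(gaps, axis=1)
counts = np.sum(arrival_times <= t, axis=1)                   # N_t for each replication

for n in range(3, 9):
    exact = math.exp(-alpha * t) * (alpha * t)**n / math.factorial(n)
    print(n, np.mean(counts == n), round(exact, 4))
```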
11.7 Normal limit and approximation. Suppose Gn ∼ gamma(n, α). Using the representation in proposi-
tion(11.6a) and the fact that each Xj has expectation 1/α and variance 1/α2 , the central limit theorem implies
$$\frac{G_n-E[G_n]}{\sqrt{\operatorname{var}[G_n]}} = \frac{G_n-n/\alpha}{\sqrt{n/\alpha^2}} \overset{D}{\Longrightarrow} N(0,1) \qquad\text{as } n\to\infty.$$
and hence for large n
$$P[G_n\le x] \approx \Phi\!\left(\frac{\alpha x-n}{\sqrt{n}}\right)$$
The local central limit theorem⁴ shows that
$$\lim_{n\to\infty}\frac{\sqrt{n}}{\alpha}\,f_{G_n}\!\left(\frac{n+z\sqrt{n}}{\alpha}\right) = \mathit{n}(z) \qquad\text{where } \mathit{n}(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} \tag{11.7a}$$
See exercise 12.18 on page 39 below.

⁴ The local central limit theorem. Suppose Y1, Y2, . . . are i.i.d. random variables with mean 0 and variance 1 and characteristic function φY. Suppose further that |φY|^k is integrable for some positive k and sup{|φY(t)| : |t| ≥ δ} < 1 for all δ > 0. Let Sn = Y1 + · · · + Yn; then Sn/√n has a bounded continuous density fn for all n ≥ k and sup_{x∈R} |fn(x) − n(x)| → 0 as n → ∞.
This formulation is due to Michael Wichura: galton.uchicago.edu/~wichura/Stat304/Handouts/L16.limits.pdf.
See also page 516 in [F ELLER(1971)].
11.8 Lukacs’ characterization of the gamma distribution.


Proposition(11.8a). Suppose X and Y are both positive, non-degenerate 5 and independent random variables.
Then X/(X +Y ) is independent of X +Y iff there exist k1 > 0, k2 > 0 and α > 0 such that X ∼ gamma(k1 , α)
and Y ∼ gamma(k2 , α).
Proof.
⇐ This is exercise 12.7 on page 38.
⇒ This is proved in [L UKACS(1955)] and [M ARSAGLIA(1989)].
We can easily extend this result to n variables:
Proposition(11.8b). Suppose X1 , X2 , . . . , Xn are positive, non-degenerate and independent random variables.
Then Xj /(X1 + · · · + Xn ) is independent of X1 + · · · + Xn for j = 1, 2, . . . , n iff there exist α > 0,
k1 > 0, . . . , kn > 0 such that Xj ∼ gamma(kj , α) for j = 1, 2, . . . , n.
Proof.
⇐ Let W = X2 + · · · + Xn . By equation(11.5a), W ∼ gamma(k2 + · · · + kn , α). Also X1 ∼ gamma(k1 , α) and
W and X1 are independent positive random variables. Hence X1 /(X1 + · · · + Xn ) is independent of X1 + · · · + Xn by
proposition(11.8a). Similarly Xj /(X1 + · · · + Xn ) is independent of X1 + · · · + Xn for j = 2, . . . , n.
⇒ Let Wj = X1 + · · · + Xn − Xj . Then Wj and Xj are independent positive random variables. Also Xj /(Wj + Xj ) is
independent of Wj + Xj . By proposition(11.8a), there exist kj > 0, kj∗ > 0 and αj > 0 such that Xj ∼ gamma(kj , αj )
and Wj ∼ gamma(kj∗ , αj ). Hence X1 + · · · + Xn = Wj + Xj ∼ gamma(kj + kj∗ , αj ). The same argument works for
j = 1, 2, . . . , n; this implies α1 = · · · = αn . The result follows.

11.9 The χ2 distribution. For n ∈ (0, ∞) the gamma( n/2, 1/2) distribution has density:
$$f(x) = \frac{x^{n/2-1}e^{-x/2}}{2^{n/2}\,\Gamma(n/2)} \qquad\text{for } x>0. \tag{11.9a}$$
This is the density of the χ2n distribution. If n is a positive integer, then n is called the degrees of freedom.
Definition(11.9a). Suppose n > 0. Then the random variable has a χ2n distribution iff X ∼ gamma( n/2, 1/2).
Moments of the χ² distribution. If Y ∼ χ²_n = gamma(n/2, 1/2), then equation(11.4a) shows that the kth moment of Y is given by
$$E[Y^k] = \begin{cases} \dfrac{2^k\,\Gamma(k+n/2)}{\Gamma(n/2)} & \text{if } n>-2k; \\ \infty & \text{if } n\le -2k. \end{cases} \tag{11.9b}$$
In particular E[Y] = n, E[Y²] = n(n + 2), var[Y] = 2n,
$$E[\sqrt{Y}] = \frac{\sqrt{2}\,\Gamma\bigl((n+1)/2\bigr)}{\Gamma(n/2)} \qquad\text{and}\qquad E\!\left[\frac{1}{Y}\right] = \frac{1}{n-2} \ \text{ provided } n>2. \tag{11.9c}$$

Convolutions of independent χ2 distributions. By equation(11.5a), the c.f. of the χ2n distribution is 1/(1 − 2it)n/2 .
It immediately follows that if X ∼ χ2m , Y ∼ χ2n and X and Y are independent, then X + Y ∼ χ2m+n .
If X ∼ χ2k and n ∈ {1, 2, . . .}, then X = Y1 + · · · + Yn where Y1 , . . . ,Yn are i.i.d. random variables with the
χ2k/n distribution. Hence by definition(5.1a) on page 15, the distribution χ2k is infinitely divisible.
Relations with other distributions.
• If n ∈ (0, ∞) and X ∼ gamma(n, α) then 2αX ∼ gamma(n, 1/2) = χ22n .
• It immediately follows from the expressions for the densities that χ22 = exponential ( 1/2).
• If X ∼ N (0, 1), then X 2 ∼ χ21 ; also if X1 , . . . , Xn are i.i.d. random variables with the N (0, 1) distribution
then X12 + · · · + Xn2 ∼ χ2n . See §15.5 on page 48 for the proofs of these two results.
• Suppose Vn ∼ χ2n ; then we can represent Vn as the sum of n i.i.d. random variables each with expectation 1
and variance 2. Hence, by the central limit theorem,
$$\frac{V_n-E[V_n]}{\sqrt{\operatorname{var}[V_n]}} = \frac{V_n-n}{\sqrt{2n}} \overset{D}{\Longrightarrow} N(0,1) \qquad\text{as } n\to\infty.$$
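A short simulation (illustrative only) of the points listed above: a sum of n squared standard normals has mean n and variance 2n, and its standardization is approximately N(0, 1) for large n.

```python
# Illustrative check: a sum of n squared standard normals behaves like chi^2_n,
# with mean n, variance 2n, and (V_n - n)/sqrt(2n) approximately N(0,1) for large n.
import numpy as np

rng = np.random.default_rng(10)
n, reps = 50, 100_000
v = np.sum(rng.standard_normal(size=(reps, n))**2, axis=1)    # V_n ~ chi^2_n
print(v.mean(), n)                                            # ~ n
print(v.var(), 2 * n)                                         # ~ 2n
z = (v - n) / np.sqrt(2 * n)
print(np.mean(z <= 1.0))                                      # ~ Phi(1) = 0.8413
```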

5
To exclude the trivial case that both X and Y are constant. In fact if one of X and Y is constant and X/(X +Y ) is independent
of X + Y , then the other must be constant also.
11.10 The generalized gamma distribution.


Definition(11.10a). Suppose n > 0, λ > 0 and b > 0. Then the random variable X has the generalized
gamma distribution ggamma(n, λ, b) iff X has density
bλn bn−1 −λxb
f (x) = x e for x > 0. (11.10a)
Γ(n)
Note that
• if n = b = 1, then the generalized gamma is the exponential distribution;
• if b = 1, the generalized gamma is the gamma distribution;
• if n = 1, the generalized gamma is the Weibull distribution—introduced below in §27.2 on page 82;
• if n = 1, b = 2 and λ = 1/2σ2 , the generalized gamma is the Rayleigh distribution—introduced below in §27.4
on page 83;
and finally
• if n = 1/2, b = 2 and λ = 1/2σ2 then the generalized gamma is the half-normal distribution—introduced in
exercise 16.25 on page 52.
It is left to an exercise (see exercise 12.22 on page 39) to check:
• The function f in equation(11.10a) integrates to 1 and so is a density.
• If X ∼ ggamma(n, λ, b) then Y = X b ∼ gamma(n, λ).
• The moments are given by the expression:
$$E[X^k] = \frac{\Gamma(k/b+n)}{\lambda^{k/b}\,\Gamma(n)}$$
The generalized gamma distribution is used in survival analysis and reliability theory to model lifetimes.
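Because X^b ∼ gamma(n, λ), generalized gamma variates can be simulated by transforming gamma variates; the sketch below is illustrative only and checks the moment formula above.

```python
# Illustrative sketch of §11.10: if Y ~ gamma(n, lam) then X = Y^(1/b) ~ ggamma(n, lam, b),
# so generalized gamma variates can be simulated by transforming gamma variates.
import math
import numpy as np

rng = np.random.default_rng(11)
n, lam, b = 1.5, 2.0, 3.0
y = rng.gamma(shape=n, scale=1.0 / lam, size=200_000)
x = y ** (1.0 / b)

# Compare a couple of moments with E[X^k] = Gamma(n + k/b) / (lam^(k/b) Gamma(n)):
for k in (1, 2):
    exact = math.gamma(n + k / b) / (lam**(k / b) * math.gamma(n))
    print(k, (x**k).mean(), exact)
```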

11.11 Summary. The gamma distribution.
• Density. X has the gamma(n, α) density for n > 0 and α > 0 iff
  fX(x) = α^n x^{n−1} e^{−αx} / Γ(n) for x > 0.
• Moments. E[X] = n/α; var[X] = n/α²; and E[X^k] = Γ(n + k)/(α^k Γ(n)) for n + k > 0.
• M.g.f. and c.f.
  MX(t) = E[e^{tX}] = 1/(1 − t/α)^n for t < α; φX(t) = E[e^{itX}] = 1/(1 − it/α)^n.
• Properties.
gamma(1, α) is the exponential (α) distribution.
If X ∼ gamma(n, α) and β > 0 then βX ∼ gamma(n, α/β ).
The gamma(n, α) distribution is the sum of n independent exponential (α) distributions.
If X ∼ gamma(m, α), Y ∼ gamma(n, α) and X and Y are independent, then X + Y ∼ gamma(m + n, α).
The χ2n distribution.
• This is the gamma( n/2, 1/2) distribution.
• If X ∼ χ2n , then E[X] = n, var[X] = 2n and the c.f. is φ(t) = 1/(1 − 2it)n/2 .
• If X ∼ χ2m , Y ∼ χ2n and X and Y are independent, then X + Y ∼ χ2m+n .
• The χ22 distribution is the exponential ( 1/2) distribution.

12 Exercises (exs-gamma.tex)
12.1 The Gamma function. This is defined to be $\Gamma(x) = \int_0^{\infty}u^{x-1}e^{-u}\,du$ for x > 0. Show that
(a) Γ(x + 1) = xΓ(x) for all x > 0;  (b) Γ(1) = 1;
(c) Γ(n) = (n − 1)! for all integral n ≥ 2;  (d) Γ(1/2) = √π;
(e) $\Gamma\bigl(n+\tfrac12\bigr) = \dfrac{1\cdot 3\cdot 5\cdots(2n-1)}{2^n}\sqrt{\pi} = \dfrac{(2n)!}{2^{2n}n!}\sqrt{\pi}$ for integral n ≥ 1. [Ans]
12.2 Suppose X ∼ Γ(n, α).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X.
In particular, if X ∼ χ²_n, then skew[X] = √(8/n) and κ[X] = 3 + 12/n. [Ans]
12.3 By §11.4 on page 34, we know that if n > 1, the maximum of the gamma(n, 1) density occurs at x = n − 1. Show that the maximum value of the density when n > 1 is approximately
$$\frac{1}{\sqrt{2\pi(n-1)}}$$
Hint: Stirling's formula is $n! \sim n^n e^{-n}\sqrt{2\pi n}$ as n → ∞. [Ans]
12.4 Suppose f is the density of the Γ(n, α) distribution.
(a) Show that if 0 < n ≤ 1 then the function f is convex (f′′ ≥ 0).
(b) Show that if 1 < n ≤ 2 then f is concave and then convex with a point of inflection at x = (n − 1 + √(n − 1))/α.
(c) Show that if n > 2 then f is convex, then concave, and then convex with points of inflection at (n − 1 − √(n − 1))/α and (n − 1 + √(n − 1))/α. [Ans]
12.5 Suppose X ∼ gamma(m, α) and Y ∼ gamma(n, α) and X and Y are independent. Find E[ Y /X ]. [Ans]
12.6 (a) Gamma densities are closed under convolution. Suppose X ∼ gamma(n1 , α), Y ∼ gamma(n2 , α) and X and Y are
independent. Prove that X + Y has the gamma(n1 + n2 , α) distribution.
(b) Hence show that the distribution gamma(n, α) is infinitely divisible. [Ans]
12.7 Suppose X ∼ gamma(m, α) and Y ∼ gamma(n, α) and X and Y are independent.
(a) Show that U = X + Y and V = X/(X + Y) are independent.
(b) Show that U = X + Y and V = Y/X are independent. In both cases, find the densities of U and V. [Ans]
12.8 Suppose X1 ∼ gamma(k1 , λ), X2 ∼ gamma(k2 , λ) and X3 ∼ gamma(k3 , λ). Suppose further that X1 , X2 and X3 are
independent. Let
$$Y_1 = \frac{X_1}{X_1+X_2} \qquad Y_2 = \frac{X_1+X_2}{X_1+X_2+X_3} \qquad Y_3 = X_1+X_2+X_3$$
Show that Y1 , Y2 and Y3 are independent and find their distributions. [Ans]
12.9 Suppose X ∼ gamma(n, α). Show that
$$P\!\left[X\ge\frac{2n}{\alpha}\right] \le \left(\frac{2}{e}\right)^{n} \quad\text{[Ans]}$$
12.10 Suppose the random variable X has the following moments:
(a) E[X k ] = 2k−1 (k + 2)! for k = 1, 2, 3, . . . . (b) E[X k ] = 2k (k + 1)! for k = 1, 2, 3, . . . .
In both cases, find the distribution of X. [Ans]
12.11 Suppose X ∼ gamma(m, α) and Y ∼ gamma(n, α) and X and Y are independent. Show that
$$E[X\,|\,X+Y=v] = \begin{cases} \dfrac{mv}{m+n} & \text{if } v>0; \\ 0 & \text{otherwise.} \end{cases} \quad\text{[Ans]}$$
12.12 Suppose the random variable X ∼ exponential (Y ) where Y ∼ gamma(n, α).
(a) Find the density of X and the value of E[X].
(b) Find the conditional density of Y given X = x. [Ans]
12.13 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution. Suppose further that Z ∼
exponential (µ) and is independent of X1 , X2 , . . . , Xn . Find
P[X1 + · · · + Xn > Z] [Ans]
12.14 Suppose X ∼ gamma(α, λ). Show that
$$E[X\ln X] = \frac{\alpha}{\lambda}\bigl[\psi(\alpha+1)-\ln\lambda\bigr]$$
where ψ denotes the digamma function⁶ which satisfies
$$\psi(z) = \frac{d}{dz}\ln\Gamma(z) = \frac{\Gamma'(z)}{\Gamma(z)} \qquad\text{and hence}\qquad \Gamma(z)\psi(z) = \Gamma'(z) = \int_{x=0}^{\infty}x^{z-1}e^{-x}\ln x\,dx \quad\text{[Ans]}$$

12.15 Suppose X ∼ exponential (λ), and given X = x, the n random variables Y1 , . . . , Yn are i.i.d. exponential (x).7 Find the
distribution of (X|Y1 , . . . , Yn ) and E[X|Y1 , . . . , Yn ]. [Ans]
12.16 Suppose X ∼ gamma(n, α). By expressing the density of X in the form
$$f_X(x) = \frac{\alpha^n}{\Gamma(n)}\exp\bigl[(n-1)\ln x - \alpha x\bigr] \qquad\text{for } x\in(0,\infty),$$
show that gamma(n, α) belongs to the exponential family of distributions with natural parameters n − 1 and −α and natural statistics ln X and X. [Ans]
6
See page 258 in [A BRAMOWITZ &S TEGUN(1965)]
7
This means that f(Y1 ,...,Yn )|X (y1 , . . . , yn |x) = Πni=1 fYi |X (yi |x) = xn e−x(y1 +···+yn ) .
12.17 The Poisson-Gamma mixture. Suppose X ∼ gamma(n, α); suppose further that given the random variable X then Y
has a Poisson distribution with expectation X. Compute P[Y = j] for j = 0, 1, 2, . . . . [Ans]
12.18 Suppose Gn ∼ gamma(n, α) and Sn = α(Gn − n/α) where n > 1 is an integer. Hence Sn = α(Gn − n/α) = Σ_{i=1}^{n} α(Xi − 1/α) = Σ_{i=1}^{n} Yi where each Xi ∼ exponential(α) and each Yi has mean 0 and variance 1.
Check that the conditions of the local central limit theorem (§11.7 on page 35) are satisfied and hence verify the limiting result (11.7a) on page 35. [Ans]
12.19 Length biased sampling in the Poisson process. Suppose {Xj }j≥1 is a sequence of i.i.d. random variables with the
exponential (α) distribution. For n ≥ 1, let Sn = X1 + · · · + Xn and suppose t ∈ (0, ∞).
Define the random variable K to be the unique integer with SK−1 < t ≤ SK ; equivalently K = min{j : Sj ≥ t}.
(a) Find the density of XK . Find E[XK ] and compare with 1/α, the expectation of an exponential (α) distribution.
Note that a longer interval has a higher chance of containing t!
(b) Let Wt denote the waiting time to the next event after time t; hence Wt = SK − t.
Find the distribution of Wt . [Ans]
12.20 Suppose X ∼ χ2n .
By expressing the density of X in the form
$$f_X(x) = \frac{e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}\exp\bigl[(n/2-1)\ln x\bigr] \qquad\text{for } x\in(0,\infty),$$
show that χ²_n belongs to the exponential family of distributions with natural parameter n/2 − 1 and natural statistic ln X. [Ans]
12.21 Suppose U ∼ uniform(0, 1). Find the distribution of Y = −2 ln(U ). [Ans]
12.22 The generalized gamma distribution.
(a) Show that the function f defined in equation(11.10a) is a density.
(b) Suppose X ∼ ggamma(n, λ, b). Show that Y = X b ∼ gamma(n, λ).
(c) Suppose X ∼ ggamma(n, λ, b). Find the central moments E[X k ] for k = 1, 2, . . . . [Ans]
12.23 (a) Suppose that for every p ∈ (0, 1), the random variable Xp has the negative binomial distribution⁸ with parameters j and p; this means that
$$P[X_p = k] = \binom{k-1}{k-j}p^j(1-p)^{k-j} \qquad\text{for } k\in\{j, j+1, j+2, \ldots\}.$$
Prove that
$$pX_p \overset{D}{\Longrightarrow} \text{gamma}(j,1) \qquad\text{as } p\to 0.$$
(b) Suppose j ∈ {1, 2, . . .} and λ ∈ (0, ∞); suppose further that {pn}n≥1 is a sequence in (0, 1) such that npn → λ as n → ∞. Finally, suppose that for every n ∈ {1, 2, . . .} the random variable Xn has the negative binomial distribution with parameters j and pn ∈ (0, 1); this means $P[X_n = k] = \binom{k-1}{k-j}p_n^j(1-p_n)^{k-j}$ for k ∈ {j, j + 1, j + 2, . . .}. Prove that
$$\frac{X_n}{n} \overset{D}{\Longrightarrow} \text{gamma}(j,\lambda) \qquad\text{as } n\to\infty. \quad\text{[Ans]}$$

8
The negative binomial distribution is the distribution of the serial number of the j th success in a sequence of Bernoulli trials.
13 The beta and arcsine distributions


The beta distribution.
The distribution of order statistics from the uniform distribution uniform(0, 1) leads to the beta distribution—this
is considered in §7.5 on page 23.
13.1 The density and distribution function.
Definition(13.1a). Suppose α > 0 and β > 0. Then the random variable X has the beta distribution,
beta(α, β), iff X has density
    f(x; α, β) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1} for 0 < x < 1. (13.1a)
• Checking equation(13.1a) is a density function. Now Γ(α) = ∫_0^∞ u^{α−1} e^{−u} du by definition. Hence
    Γ(α)Γ(β) = ∫_0^∞ ∫_0^∞ u^{α−1} v^{β−1} e^{−u−v} du dv
Now use the transformation x = u/(u + v) and y = u + v; hence u = xy and v = y(1 − x). Clearly 0 < x < 1 and
0 < y < ∞; also ∂(u,v)/∂(x,y) = y. Hence
    Γ(α)Γ(β) = ∫_{x=0}^1 ∫_{y=0}^∞ y^{α+β−1} x^{α−1} (1 − x)^{β−1} e^{−y} dy dx = Γ(α + β) ∫_{x=0}^1 x^{α−1} (1 − x)^{β−1} dx

• The beta function is defined by
    B(α, β) = ∫_0^1 t^{α−1} (1 − t)^{β−1} dt = Γ(α)Γ(β)/Γ(α + β) for all α > 0 and β > 0.
Properties of the beta and gamma functions can be found in most advanced calculus books. Recall that Γ(n) =
(n − 1)! if n is a positive integer and Γ(1/2) = √π.
Equation (13.1a) now becomes
    f(x; α, β) = x^{α−1}(1 − x)^{β−1}/B(α, β) for 0 < x < 1.

• The distribution function of the beta distribution, beta(α, β), is
    F(x; α, β) = ∫_0^x f(t; α, β) dt = (1/B(α, β)) ∫_0^x t^{α−1}(1 − t)^{β−1} dt = I_x(α, β)/B(α, β) for x ∈ (0, 1).
The integral, I_x(α, β), is called the incomplete beta function.
13.2 Shape of the density. The beta density can take many different shapes—see figure(13.2a).
Finding the mode. By differentiation, f′(x; α, β) = 0 implies x(α + β − 2) = α − 1. This has a solution for x
with x ≥ 0 if either (a) α + β > 2 and α ≥ 1 or (b) α + β < 2 and α ≤ 1. Hence we have a solution for x with
x ∈ [0, 1] if either (a) α ≥ 1 and β ≥ 1 with α + β ≠ 2 or (b) α ≤ 1 and β ≤ 1 with α + β ≠ 2.9
By checking the second derivative, we see
    mode[X] = (α − 1)/(α + β − 2) if α ≥ 1 and β ≥ 1 with α + β ≠ 2.
If α = β, then the density is symmetric about x = 1/2 and hence the expectation is 1/2.
For further results about the shape of the beta density, see exercise 14.1 on page 44.
13.3 Moments. Using the fact that ∫_0^1 x^{α−1}(1 − x)^{β−1} dx = B(α, β), it is easy to check that
    E[X] = α/(α + β),  E[X²] = (α + 1)α/((α + β + 1)(α + β))  and hence  var[X] = αβ/((α + β)²(α + β + 1)) (13.3a)

9
The exceptional case α + β = 2 implies α = β = 1 and the density is uniform on [0, 1].
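As a quick numerical sanity check of equation(13.3a), here is a minimal Python sketch (not part of the original notes; it assumes the numpy package is available) comparing simulated and theoretical moments of a beta(α, β) sample:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta_ = 2.0, 5.0
    x = rng.beta(alpha, beta_, size=1_000_000)

    mean_formula = alpha / (alpha + beta_)
    var_formula = alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1))

    print(x.mean(), mean_formula)   # both close to 0.2857
    print(x.var(), var_formula)     # both close to 0.02551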

[Figure(13.2a). Shape of the beta density for various values of the parameters: left panel α = 1/2, β = 1/2; α = 5, β = 1; α = 1, β = 3; right panel α = 2, β = 2; α = 2, β = 5; α = 5, β = 2.]

13.4 Some distribution properties.
• Suppose X ∼ beta(α, β); then 1 − X ∼ beta(β, α).
• The beta(1, 1) distribution is the same as the uniform distribution on (0, 1). The beta(α, 1) distribution is the
same as the powerlaw(α, 1, 0) distribution.
• Suppose X ∼ gamma(n1, α) and Y ∼ gamma(n2, α). Suppose further that X and Y are independent. Then
X/(X + Y) ∼ beta(n1, n2). See exercise 12.7 on page 38. In particular, if X ∼ χ²_{2k} = gamma(k, 1/2),
Y ∼ χ²_{2m} = gamma(m, 1/2) and X and Y are independent, then X/(X + Y) ∼ beta(k, m). (A numerical check of this property is sketched below.)
• If X ∼ beta(α, 1) then − ln(X) ∼ exponential(α). See exercise 14.4 on page 44.
• For the link between the beta and F distributions, see exercises 22.29 and 22.30 on page 71.
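The gamma-ratio property above is easy to check numerically. A short Python sketch (not from the notes; it assumes numpy) simulates X ∼ gamma(n1, α) and Y ∼ gamma(n2, α) and compares the empirical moments of X/(X + Y) with those of beta(n1, n2):

    import numpy as np

    rng = np.random.default_rng(1)
    n1, n2, alpha = 3.0, 4.0, 2.0            # shape n, rate alpha
    x = rng.gamma(shape=n1, scale=1/alpha, size=500_000)
    y = rng.gamma(shape=n2, scale=1/alpha, size=500_000)
    r = x / (x + y)                          # should behave like beta(n1, n2)

    print(r.mean(), n1 / (n1 + n2))                          # ~0.4286
    print(r.var(), n1*n2 / ((n1+n2)**2 * (n1+n2+1)))         # ~0.0306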
13.5 Linear transformation of a beta distribution. Suppose a ∈ R, h > 0 and X ∼ beta(α, β). Then the
density of the linear transformation Y = a + hX is
    fY(y) = (Γ(α + β)/(Γ(α)Γ(β))) (y − a)^{α−1}(h + a − y)^{β−1}/h^{α+β−1} for a < y < a + h.
This is called the beta(α, β) distribution with location parameter a and scale parameter h.
Clearly for fixed α > 0 and fixed β > 0, this family of beta distributions for a ∈ R and h ∈ (0, ∞) forms a
location-scale family—see definition(1.6b) on page 5.
The beta prime distribution.
13.6 The beta prime distribution—definition. A random variable X with the beta distribution takes values in
[0, 1] and so is often used to model a probability. The corresponding odds ratio, which is X/(1 − X), has a beta
prime distribution.
Definition(13.6a). Suppose α > 0 and β > 0. Then the random variable X is said to have the beta prime
distribution, beta′(α, β), iff it has density
    f(x) = x^{α−1}(1 + x)^{−α−β}/B(α, β) for x > 0. (13.6a)
The beta prime distribution is sometimes called the beta distribution of the second kind.
13.7 The beta prime distribution—basic properties.
Shape of the density function. See exercise 14.10.
Moments of the beta prime distribution. See exercise 14.11 on page 45.
Link between the beta and beta prime distributions. The key result is that if X ∼ beta(α, β), then X/(1 − X) ∼
beta′(α, β). See exercise 14.13 on page 45.
If X ∼ beta(α, β), then Y = 1 − X ∼ beta(β, α); hence the previous result implies 1/X − 1 ∼ beta′(β, α).
Conversely, if X ∼ beta′(α, β) then X/(1 + X) ∼ beta(α, β).

Distribution function. Suppose Y ∼ beta′(α, β); then Y has distribution function
    FY(y) = FX(y/(1 + y)) for y > 0,
where FX denotes the distribution function of X ∼ beta(α, β).
Links with other distributions. We shall see later (see §21.8 on page 66) that the beta prime distribution is just a
multiple of the F-distribution. Also, the standard beta prime distribution, beta′(1, 1), is the same as the standard
log-logistic distribution, loglogistic(1, 1)—see equation(36.4d) on page 109.

The arcsine distribution.
13.8 The arcsine distribution on (0, 1). The arcsine distribution is the beta(1/2, 1/2) distribution.
Definition(13.8a). The random variable X has the arcsine distribution on (0, 1) iff X has density
    fX(x) = 1/(π√(x(1 − x))) for x ∈ (0, 1).
A plot of the density is given in figure(13.8a).
[Figure(13.8a). Plot of the arcsine(0, 1) density.]

The distribution function. Suppose X has the arcsine distribution; then
    FX(x) = P[X ≤ x] = ∫_{u=0}^x du/(π√(u(1 − u))) = (2/π) arcsin(√x) = arcsin(2x − 1)/π + 1/2 for x ∈ [0, 1]. (13.8a)
Linear transformation of the arcsine(0, 1) distribution. Suppose X ∼ arcsine(0, 1) and Y = a + bX where a ∈ R
and b ∈ (0, ∞). Then Y has density
    fY(y) = 1/(π√((y − a)(a + b − y))) for y ∈ (a, a + b).
This leads to the definition of the arcsine(a, b) distribution.
13.9 The arcsine distribution on (a, a + b).
Definition(13.9a). Suppose a ∈ R and b ∈ (0, ∞). Then the random variable X has the arcsine distribution
on (a, a + b), denoted arcsine(a, b), iff X has density
    fX(x) = 1/(π√((x − a)(a + b − x))) for x ∈ (a, a + b).
This means that the distribution defined in definition(13.8a) can also be described as the arcsin(0, 1) distribution.
Linear transformation of an arcsine(a, b) distribution. Suppose X ∼ arcsine(a, b) and c ∈ R and d ∈ (0, ∞).
Then c + dX ∼ arcsine(c + ad, bd).
In particular,
if X ∼ arcsine(0, 1) then a + bX ∼ arcsine(a, b);
if X ∼ arcsine(a, b) then (X − a)/b ∼ arcsine(0, 1).
It follows that the family of distributions {arcsin(a, b) : a ∈ R, b > 0} is a location-scale family—see defini-
tion(1.6b) on page 5.

The distribution function of the arcsine(a, b) distribution.
If X ∼ arcsine(a, b) then (X − a)/b ∼ arcsine(0, 1); hence
    FX(x) = P[X ≤ x] = P[(X − a)/b ≤ (x − a)/b] = (2/π) arcsin(√((x − a)/b)) for a ≤ x ≤ a + b. (13.9a)
Quantile function and median. The quantile function of the arcsine(a, b) distribution is
    FX⁻¹(p) = a + b sin²(πp/2) for p ∈ (0, 1). (13.9b)
Hence the median of the distribution is FX⁻¹(1/2) = a + b(1/√2)² = a + b/2.
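Equation(13.9b) gives an immediate inverse-transform sampler for the arcsine(a, b) distribution (compare exercise 14.19). A small Python sketch, not part of the notes and assuming numpy, is:

    import numpy as np

    rng = np.random.default_rng(2)
    a, b = -1.0, 2.0                         # arcsine(-1, 2), supported on (-1, 1)
    p = rng.uniform(size=200_000)
    x = a + b * np.sin(np.pi * p / 2) ** 2   # F^{-1}(p) from equation (13.9b)

    print(np.median(x), a + b/2)             # both close to 0
    print(x.mean(), a + b/2)                 # mean of arcsine(a, b) is a + b/2
    print(x.var(), b**2 / 8)                 # variance of arcsine(a, b) is b^2/8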
Distribution of the square of an arcsine distribution.
By part (a) of exercise 14.20 on page 46, if X ∼ arcsine(−1, 2), then X² ∼ arcsine(0, 1). It follows that if
X ∼ arcsine(−1, 2), then
    X² has the same distribution as (1 + X)/2.
It also follows that if X ∼ arcsine(0, 1), then X and 4(X − 1/2)² have the same distribution. This property
effectively characterizes the arcsine distribution—see theorem 1 in [ARNOLD&GROENEVELD(1980)].
Further properties of the arcsine distribution are in exercises 14.15–14.18 on page 45.
13.10 The generalized arcsine distribution. This is the beta(1 − α, α) distribution where α ∈ (0, 1). Now
Euler’s reflection formula states that
    Γ(x)Γ(1 − x) = π/sin(πx) for x ∈ (0, 1). (13.10a)
This implies the density of the generalized arcsine distribution is
    f(x) = x^{−α}(1 − x)^{α−1}/(Γ(α)Γ(1 − α)) = (sin(πα)/π) x^{−α}(1 − x)^{α−1} for x ∈ (0, 1).
The expectation is 1 − α and the variance is (1 − α)α/2. Setting α = 1/2 in this distribution gives the standard
arcsine(0, 1) distribution.

13.11 Summary.
The beta distribution. Suppose α > 0 and β > 0; then X ∼ beta(α, β) iff X has density
    f(x; α, β) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1} for 0 < x < 1.
• Moments: E[X] = α/(α + β) and var[X] = αβ/((α + β)²(α + β + 1)).
• Suppose X ∼ beta(α, β); then 1 − X ∼ beta(β, α).
• The beta(1, 1) distribution is the same as the uniform distribution on (0, 1).
• Suppose X ∼ gamma(n1, α) and Y ∼ gamma(n2, α). Suppose further that X and Y are independent. Then
X/(X + Y) ∼ beta(n1, n2).
• If X ∼ beta(α, 1) then − ln(X) ∼ exponential(α). If X ∼ beta(m/2, n/2) then nX/(m(1 − X)) ∼ F_{m,n}.
The beta prime distribution. Suppose α > 0 and β > 0; then X ∼ beta′(α, β) iff the density is
    f(x) = x^{α−1}(1 + x)^{−α−β}/B(α, β) for x > 0.
• If X ∼ beta(α, β), then X/(1 − X) ∼ beta′(α, β).
The arcsine distribution. If X ∼ arcsine(0, 1) = beta(1/2, 1/2) then X has density
    fX(x) = 1/(π√(x(1 − x))) for x ∈ (0, 1).
• Moments: E[X] = 1/2 and var[X] = 1/8.

14 Exercises (exs-betaarcsine.tex)

The beta distribution.

14.1 Shape of the beta density. Let
    x0 = (1 − α)/(2 − α − β)
(a) Suppose α ∈ (0, 1) and β ∈ (0, 1). Show that fX first decreases and then increases with the minimum value at
x = x0. Also fX(x) → ∞ as x ↓ 0 and fX(x) → ∞ as x ↑ 1.
(b) If α = β = 1, then fX(x) = 1, which is the density of the uniform(0, 1) distribution.
(c) If α ∈ (0, 1) and β ≥ 1 then fX(x) is decreasing with fX(x) → ∞ as x ↓ 0. If α ≥ 1 and β ∈ (0, 1) then fX(x) is
increasing with fX(x) → ∞ as x ↑ 1.
(d) If α = 1 and β > 1 then fX(x) is decreasing with mode at x = 0; if α > 1 and β = 1 then fX(x) is increasing with
mode at x = 1.
(e) If α > 1 and β > 1 then fX increases and then decreases with mode at x = x0. [Ans]
14.2 Moments of the beta distribution. Suppose X ∼ beta(α, β).
(a) Find an expression for E[X m ] for m = 1, 2, . . . .
(b) Find E[X] and var[X].
(c) Find E[1/X] if α > 1. [Ans]
14.3 Suppose α > 0 and β > 0 and X ∼ beta(α, β).
(a) Find skew[X], the skewness of X. (Note: the distribution is positively skewed if α < β and negatively skewed if
α > β. If α = β then the distribution is symmetric about 1/2.)
(b) Find κ[X], the kurtosis of X.
(c) In particular, find the values of the skewness and kurtosis when X1 ∼ beta(2, 2), X2 ∼ beta(3, 2), X3 ∼ beta(2, 3)
and X4 ∼ beta(1/2, 1/2) = arcsine(0, 1). [Ans]
14.4 Suppose X ∼ beta(α, 1). Show that − ln(X) ∼ exponential (α). [Ans]
14.5 Link between the beta and Pareto distributions.
(a) Suppose X ∼ beta(α, β). Find the density of Y = 1/X.
(b) In particular, if X ∼ beta(α, 1), show that Y = 1/X has the Pareto(α, 1, 0) distribution.
Conversely, if Y ∼ Pareto(α, 1, 0) then X = 1/Y ∼ beta(α, 1). [Ans]
14.6 Link between the beta distribution function and the binomial distribution function. Suppose X ∼ beta(k, n − k + 1)
and Y ∼ binomial (n, p). Prove that P[X > p] = P[Y ≤ k − 1], or equivalently P[X ≤ p] = P[Y ≥ k]. We assume
p ∈ [0, 1], k ∈ {1, 2, . . .} and n ∈ {k, k + 1, . . .}.
Note. This has already been proved—see equation(7.5a) on page 23 where it is shown that if X1 , . . . , Xn are i.i.d. with
the uniform(0, 1) distribution then the density of Xk:n is the beta(k, n − k + 1) distribution. Clearly {Xk:n ≤ p} if there
at least k of the n random variables X1 , . . . , Xn in the interval [0, p]. [Ans]
14.7 Suppose α > 0 and β > 0. Suppose further that Y ∼ beta(α, β) and X is a random variable such that the distribution
of X given Y = y is binomial (n, y). Show that the distribution of Y given X = k is beta(α + k, β + n − k). [Ans]
14.8 Suppose Y ∼ beta(α, β) where α > 0 and β > 0, and X is a random variable such that the distribution of X given
Y = y is negativebinomial(y, k)10. Show that the distribution of Y given X = n is beta(α + k, β + n − k). [Ans]
14.9 Random partitions of an interval—see §7.8 on page 24. Suppose U1 , . . . , Un are i.i.d. random variables with the uniform
uniform(0, 1) distribution and let U1:n , . . . , Un:n denote the order statistics. These variables partition the interval [0, 1]
into n + 1 disjoint intervals with the following lengths:
D1 = U1:n , D2 = U2:n − U1:n , . . . , Dn = Un:n − U(n−1):n , Dn+1 = 1 − Un:n
Clearly D1 + · · · + Dn+1 = 1. Find E[Dk ] for k = 1, 2, . . . , n + 1. [Ans]

10 The random variable X ∼ negativebinomial(p, k) means that X is the serial number of the k-th success in a sequence of Bernoulli
trials with probability of success equal to p. Hence for n ∈ {k, k + 1, . . .} we have P[X = n] = C(n−1, k−1) p^k (1 − p)^{n−k}.

The beta prime distribution


14.10 Shape of the density. Suppose α > 0 and β > 0. Suppose further that X ∼ beta 0 (α, β) has density function f .
(a) Find f 0 (x), the derivative of the density.
(b) Suppose α ∈ (0, 1); show that f is decreasing with f (x) → ∞ as x ↓ 0.
(c) Suppose α = 1; show that f is decreasing with mode at x = 0.
(d) Suppose α > 1; show that f first increases and then decreases with mode at (α − 1)/(β + 1).
(e) Find f 00 (x), the second derivative of the density.
(f) Let
    x1 = [(α − 1)(β + 2) − √((α − 1)(β + 2)(α + β))]/((β + 1)(β + 2)) and x2 = [(α − 1)(β + 2) + √((α − 1)(β + 2)(α + β))]/((β + 1)(β + 2))
Suppose α ∈ (0, 1]; show that f is convex. Suppose α ∈ (1, 2]; show that f is initially concave and then convex
with inflection point at x = x2 . Suppose α > 2; show that f is initially convex, then concave and then convex again
with inflection points at x = x1 and x = x2 . [Ans]
14.11 Moments. Suppose X has the beta prime distribution, beta′(α, β).
(a) Show that E[X] = α/(β − 1) provided β > 1.
(b) Show that var[X] = α(α + β − 1)/((β − 2)(β − 1)²) provided β > 2.
(c) Suppose −α < k < β; show that
    E[X^k] = B(α + k, β − k)/B(α, β) [Ans]
14.12 Suppose X ∼ beta 0 (α, β).
(a) Find skew[X], the skewness of X. (b) Find κ[X], the kurtosis of X. [Ans]
14.13 Suppose X ∼ beta(α, β). Show that X/(1 − X) ∼ beta′(α, β). [Ans]
14.14 Suppose X has the beta prime distribution, beta′(α, β).
(a) Show that 1/X ∼ beta′(β, α).
(d) Suppose X ∼ gamma(n1, 1) and Y ∼ gamma(n2, 1). Suppose further that X and Y are independent. Show that
X/Y ∼ beta′(n1, n2).
(e) Suppose X ∼ χ²_{n1}, Y ∼ χ²_{n2} and X and Y are independent. Show that X/Y ∼ beta′(n1/2, n2/2). [Ans]

The arcsine distribution


14.15 Prove equation(13.9a) on page 43. [Ans]
14.16 Shape of the arcsine density. Suppose X ∼ arcsine(a, b) has density function, f .
(a) Show that f is symmetric about a + b/2.
(b) Show that f first decreases and then increases with minimum value at x = a + b/2.
(c) Show that f is convex. [Ans]
14.17 Moments of the arcsine distribution. Suppose X ∼ arcsine(0, 1), the standard arcsine distribution.
(a) Show that
    E[X^n] = C(2n, n)/4^n for n = 1, 2, . . . .
In particular, E[X] = 1/2, E[X²] = 3/8 and hence var[X] = 1/8.
(b) Let µ = E[X] = 1/2. Show that E[(X − µ)^n] = 0 if n is odd and
    E[(X − µ)^n] = C(n, n/2)/4^n if n is even.
In particular var[X] = 1/8.
(c) Show that the characteristic function of X is
    φX(t) = E[e^{itX}] = Σ_{k=0}^∞ (C(2k, k)/(4^k k!)) (it)^k
(d) Now suppose X ∼ arcsine(a, b). Find E[X] and var[X]. [Ans]
14.18 Suppose X ∼ arcsine(a, b).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]

14.19 (a) Suppose U ∼ uniform(0, 1), a ∈ R and b ∈ (0, ∞). Show that
    X = a + b sin²(πU/2) ∼ arcsine(a, b)
(b) Suppose X ∼ arcsine(a, b). Show that
    U = (2/π) arcsin(√((X − a)/b)) ∼ uniform(0, 1) [Ans]

14.20 (a) Suppose X ∼ arcsine(−1, 2). Prove that X 2 ∼ arcsine(0, 1).


(b) Suppose X ∼ uniform(−π, 2π). Prove that sin(X), sin(2X) and −cos(2X) all have the arcsine(−1, 2) distribution.
(c) Suppose X ∼ uniform(0, π). Show that Y = cos(X) ∼ arcsine(−1, 2).
(d) Suppose X ∼ uniform(0, π/2). Show that Y = sin2 (X) ∼ arcsine(0, 1). [Ans]
14.21 Suppose X ∼ gamma(1/2, α), Y ∼ gamma(1/2, α) and X and Y are independent. Show that X/(X + Y) ∼ arcsine(0, 1).
[Ans]
14.22 Suppose X ∼ uniform(−π, 2π), Y ∼ uniform(−π, 2π) and X and Y are independent.
(a) Prove that sin(X + Y ) ∼ arcsine(−1, 2). (b) Prove that sin(X − Y ) ∼ arcsine(−1, 2). [Ans]

15 The normal distribution


15.1 The density function.
Definition(15.1a). Suppose µ ∈ (−∞, ∞) and σ ∈ (0, ∞). Then the random variable X has the normal
distribution N(µ, σ²) if it has density
    fX(x) = (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] for x ∈ R. (15.1a)
The normal density has the familiar “bell” shape. There are points of inflection at x = µ − σ and x = µ + σ—this
means that f″(x) = 0 at these points and the curve changes from convex, when x < µ − σ, to concave and then to
convex again when x > µ + σ. Clearly the mode is at x = µ.

[Figure(15.1a). The graph of the normal density. Points A and B are the points of inflection, at x = µ − σ and x = µ + σ.]

To check that the function fX defined in equation(15.1a) is a density function:
Clearly fX(x) ≥ 0 for all x ∈ R. Using the substitution t = (x − µ)/σ gives
    I = ∫_{−∞}^∞ fX(x) dx = ∫_{−∞}^∞ (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] dx
      = (1/√(2π)) ∫_{−∞}^∞ exp[−t²/2] dt = (2/√(2π)) ∫_0^∞ exp[−t²/2] dt = √(2/π) J
where
    J² = ∫_0^∞ ∫_0^∞ exp[−(x² + y²)/2] dy dx = ∫_0^{π/2} ∫_0^∞ r exp[−r²/2] dr dθ = π/2
and hence J = √(π/2).
This shows that fX integrates to 1 and hence is a density function.


One reason for the importance of the normal distribution is the central limit theorem—this asserts that the
normalized sum of independent random variables tends to a normal distribution. This theorem is explained in
most probability books; see for example page 258 in [F ELLER(1971)].
15.2 The distribution function, mean and variance. The standard normal distribution is the normal distribution N(0, 1); its distribution function is
    Φ(x) = (1/√(2π)) ∫_{−∞}^x exp[−t²/2] dt
This function is widely tabulated. Note that:
• Φ(−x) = 1 − Φ(x). See exercise 16.1 on page 49.
• If X has the N(µ, σ²) distribution, then for −∞ < a < b < ∞ we have
    P[a < X ≤ b] = ∫_a^b (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] dx = (1/√(2π)) ∫_{(a−µ)/σ}^{(b−µ)/σ} exp[−t²/2] dt
                 = Φ((b − µ)/σ) − Φ((a − µ)/σ)
The mean of the N(µ, σ²) distribution:
    E[X] = ∫_{−∞}^∞ [(x − µ) + µ] (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] dx = 0 + µ = µ
because the function x ↦ x exp[−x²/2] is odd.
The variance of the N(µ, σ²) distribution: use integration by parts as follows
    var[X] = ∫_{−∞}^∞ (x − µ)² (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] dx = (σ²/√(2π)) ∫_{−∞}^∞ t² exp[−t²/2] dt
           = (2σ²/√(2π)) ∫_0^∞ t · t exp[−t²/2] dt = (2σ²/√(2π)) ∫_0^∞ exp[−t²/2] dt = σ²
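As an illustration of the identity P[a < X ≤ b] = Φ((b − µ)/σ) − Φ((a − µ)/σ), the following Python sketch (not part of the notes; it uses the relation Φ(x) = (1 + erf(x/√2))/2 from the standard library, plus numpy for a simulation check) evaluates a normal probability:

    import math
    import numpy as np

    def Phi(x):
        # standard normal distribution function via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    mu, sigma, a, b = 10.0, 2.0, 9.0, 13.0
    exact = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

    rng = np.random.default_rng(3)
    x = rng.normal(mu, sigma, size=1_000_000)
    print(exact, np.mean((x > a) & (x <= b)))   # both about 0.6247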
15.3 The moment generating function and characteristic function. Suppose X ∼ N (µ, σ 2 ) and X = µ+σY .
Then Y ∼ N (0, 1) by using the usual change of variable method. It has already been pointed out in example (1.6c)
on page 5 that the family of distributions {N (µ, σ 2 ) : µ ∈ R, σ > 0} is a location-scale family.
For s ∈ R, the moment generating function of X is given by
    MX(s) = E[e^{sX}] = e^{sµ} E[e^{sσY}] = e^{sµ} ∫_{−∞}^∞ e^{sσt} (1/√(2π)) e^{−t²/2} dt
          = (e^{sµ}/√(2π)) ∫_{−∞}^∞ exp[−(t² − 2σst)/2] dt = (e^{sµ}/√(2π)) ∫_{−∞}^∞ exp[−(t − σs)²/2 + σ²s²/2] dt
          = exp[sµ + σ²s²/2]
Similarly the characteristic function of X is E[e^{itX}] = e^{iµt − σ²t²/2}.
15.4 Moments of the normal distribution. Moments of a distribution can be obtained by expanding the moment generating function as a power series: E[X^r] is the coefficient of s^r/r! in the expansion of the moment
generating function. It is easy to find the moments about the mean of a normal distribution in this way: if
X ∼ N(µ, σ²) and Y = X − µ then E[e^{sY}] = exp[σ²s²/2] which can be expanded in a power series in powers
of s. Hence
    E[(X − µ)^{2n+1}] = E[Y^{2n+1}] = 0 for n = 0, 1, . . .
and
    E[(X − µ)^{2n}] = E[Y^{2n}] = (2n)! σ^{2n}/(2^n n!) for n = 0, 1, . . . (15.4a)
For example, E[(X − µ)²] = σ² and E[(X − µ)⁴] = 3σ⁴.
Similarly we can show that (see exercise 16.22 on page 51):
    E[|X − µ|^n] = (2^{n/2} σ^n/√π) Γ((n + 1)/2) for n = 0, 1, . . . .
Complicated expressions are available for E[X^n] and E[|X|^n]; for example, see [WINKELBAUER(2014)].

15.5 Sum of squares of independent N(0, 1) variables.
Proposition(15.5a). Suppose X1, . . . , Xn are i.i.d. random variables with the N(0, 1) distribution.
Let Z = X1² + · · · + Xn². Then Z ∼ χ²_n.
Proof. Consider n = 1. Now X1 has density
    fX1(x) = (1/√(2π)) e^{−x²/2} for x ∈ R.
Then Z = X1² has density
    fZ(z) = 2 fX1(√z) |dx/dz| = 2 (1/√(2π)) e^{−z/2} (1/(2√z)) = z^{−1/2} e^{−z/2}/(2^{1/2} Γ(1/2)) for z > 0.
Thus Z ∼ χ²_1. By §11.9 on page 36, we know that if X ∼ χ²_m, Y ∼ χ²_n and X and Y are independent, then X + Y ∼ χ²_{n+m}.
Hence Z ∼ χ²_n in the general case.
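Proposition(15.5a) is easy to see numerically. A short Python sketch (not in the notes; it assumes numpy) compares the sum of squares of n standard normals with a direct χ²_n sample:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 5
    z = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)

    print(z.mean(), n)        # chi^2_n has mean n
    print(z.var(), 2 * n)     # and variance 2n
    # compare a few empirical quantiles with a direct chi^2_n sample
    w = rng.chisquare(n, size=200_000)
    print(np.quantile(z, [0.25, 0.5, 0.75]))
    print(np.quantile(w, [0.25, 0.5, 0.75]))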

15.6 Linear combination of independent normals.
Proposition(15.6a). Suppose X1, X2, . . . , Xn are independent random variables with Xi ∼ N(µi, σi²) for
i = 1, 2, . . . , n. Let T = Σ_{i=1}^n bi Xi where bi ∈ R for i = 1, 2, . . . , n. Then
    T ∼ N( Σ_{i=1}^n bi µi , Σ_{i=1}^n bi² σi² )
Proof. Using moment generating functions gives
    MT(s) = E[e^{sT}] = Π_{i=1}^n E[e^{s bi Xi}] = Π_{i=1}^n MXi(s bi) = Π_{i=1}^n exp[s bi µi + s² bi² σi²/2]
          = exp[ s Σ_{i=1}^n bi µi + s² Σ_{i=1}^n bi² σi²/2 ]
which is the mgf of N( Σ_{i=1}^n bi µi , Σ_{i=1}^n bi² σi² ).
Corollary(15.6b). If X1, . . . , Xn are i.i.d. N(µ, σ²), then X̄ = Σ_{i=1}^n Xi/n ∼ N(µ, σ²/n).
This result implies the normal distribution is stable—see definition(4.1a) on page 11.
Note also that if X ∼ N(µ, σ²) and n ∈ {1, 2, . . .}, then X =d Y1 + · · · + Yn where Y1, . . . , Yn are i.i.d. random
variables with the N(µ/n, σ²/n) distribution. Hence by definition(5.1a) on page 15, the distribution N(µ, σ²) is
infinitely divisible.
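A minimal numerical illustration of proposition(15.6a), not part of the notes and assuming numpy: simulate T = b1X1 + b2X2 + b3X3 for independent normals and compare its mean and variance with Σbiµi and Σbi²σi².

    import numpy as np

    rng = np.random.default_rng(5)
    mu = np.array([1.0, -2.0, 0.5])
    sigma = np.array([1.0, 2.0, 0.5])
    b = np.array([2.0, -1.0, 3.0])

    x = rng.normal(mu, sigma, size=(500_000, 3))
    t = x @ b

    print(t.mean(), b @ mu)                 # ~5.5
    print(t.var(), (b**2) @ (sigma**2))     # ~10.25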

15.7 Independence of two linear combinations of independent normals.
Proposition(15.7a). Suppose X1, X2, . . . , Xn are independent random variables with Xi ∼ N(µi, σi²) for
i = 1, 2, . . . , n. Let V = Σ_{i=1}^n bi Xi and W = Σ_{i=1}^n ci Xi where bi ∈ R and ci ∈ R for i = 1, 2, . . . , n. Then V
and W are independent iff
    Σ_{i=1}^n bi ci σi² = 0 (15.7a)
Proof. Using moment generating functions gives
    M(V,W)(s, t) = E[e^{sV + tW}] = Π_{i=1}^n E[e^{(s bi + t ci)Xi}] = Π_{i=1}^n MXi(s bi + t ci) = Π_{i=1}^n exp[(s bi + t ci)µi + (s bi + t ci)² σi²/2]
                 = Π_{i=1}^n exp[(s bi + t ci)µi + s² bi² σi²/2 + s t bi ci σi² + t² ci² σi²/2]
                 = MV(s) MW(t) exp[ s t Σ_{i=1}^n bi ci σi² ]
which proves the proposition.
A particular case. Suppose X and Y are i.i.d. random variables with the normal N(µ, σ²) distribution. Let
V = X + Y and W = X − Y. Using the notation of the proposition we have b1 = 1, b2 = 1, c1 = 1 and c2 = −1.
Hence V and W are independent.

15.8 Characterizations of the normal distribution. There are many characterizations of the normal distribu-
tion11 —here are two of the most useful and interesting.
Proposition(15.8a). Cramér’s theorem. Suppose X and Y are independent random variables such that Z =
X + Y has a normal distribution. Then both X and Y have normal distributions—although one may have a
degenerate distribution.
Proof. See, for example, page 298 in [M ORAN(2003)].
Proposition(15.8b). The Skitovich-Darmois theorem. Suppose n ≥ 2 and X1 , . . . , Xn are independent
non-degenerate random variables. Suppose a1 , . . . , an , b1 , . . . , bn are all in R and
L1 = a1 X1 + · · · + an Xn L2 = b1 X1 + · · · + bn Xn
If L1 and L2 are independent, then all random variables Xj with aj bj 6= 0 are normal.
Proof. See, for example, page 89 in [K AGAN et al.(1973)].
A particular case of the Skitovich-Darmois theorem. Suppose X and Y are independent random variables and
V = X + Y and W = X − Y . If V and W are independent then X and Y have normal distributions.
Example(15.8c). Suppose X1, X2, . . . , Xn are independent non-degenerate random variables such that V = X̄ and
W = X1 − X̄ are independent. Show that X1, X2, . . . , Xn all have normal distributions.
Solution. Now V = a1X1 + · · · + anXn with a1 = · · · = an = 1/n and W = b1X1 + · · · + bnXn with b1 = 1 − 1/n,
b2 = · · · = bn = −1/n. The result follows by the Skitovich-Darmois theorem.

15.9 Summary. The normal distribution.
• Density. X has the N(µ, σ²) distribution iff it has the density
    fX(x) = (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] for x ∈ R.
• Moments: E[X] = µ and var[X] = σ².
• The distribution function: P[X ≤ x] = Φ((x − µ)/σ), where Φ, the N(0, 1) distribution function, is widely tabulated.
• The moment generating function: MX(t) = E[e^{tX}] = exp[tµ + t²σ²/2].
• The characteristic function: φX(t) = E[e^{itX}] = exp[iµt − σ²t²/2].
• A linear combination of independent normals has a normal distribution.
• The sum of squares of n independent N(0, 1) variables has the χ²_n distribution.

16 Exercises (exs-normal.tex)

16.1 Show that Φ(−x) = 1 − Φ(x). Hence deduce that Φ⁻¹(p) + Φ⁻¹(1 − p) = 0 for p ∈ [0, 1]. [Ans]
2 2
16.2 Suppose X ∼ N (µ, σ ). Suppose further that P[X ≤ 140] = 0.3 and P[X ≤ 200] = 0.6. Find µ and σ . [Ans]
16.3 Suppose X ∼ N (µ, σ 2 ).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
16.4 (a) Show that
    ∫_{−∞}^∞ Φ(−x)ϕ(x) dx = ∫_{−∞}^∞ Φ(x)ϕ(x) dx = 1/2
(b) Suppose a ∈ R and σ > 0. Show that
    ∫_{−∞}^∞ Φ((a − x)/σ) (1/σ) ϕ((x − a)/σ) dx = ∫_{−∞}^∞ Φ((x − a)/σ) (1/σ) ϕ((x − a)/σ) dx = 1/2 [Ans]
16.5 Suppose a ∈ R, b ∈ R and d ∈ R. Suppose further that c > 0. Prove that
    ∫_{x=−∞}^∞ Φ(d − a − bx)ϕ(cx − d) dx = (1/c) Φ( (cd − bd − ac)/√(c² + b²) )
Now suppose c < 0. Prove that
    ∫_{x=−∞}^∞ Φ(d − a − bx)ϕ(cx − d) dx = (1/c) Φ( (cd − bd − ac)/√(c² + b²) ) − 1/c = −(1/c) Φ( (ac + bd − cd)/√(c² + b²) )
If c = 0, the integral diverges. [Ans]
11 For example, see [MATHAI&PEDERZOLI(1977)] and [PATEL&READ(1996)].

16.6 Suppose ϕ denotes the density function of the N(0, 1) distribution.
(a) Prove that, for w > 0 we have
    ∫_w^∞ (1 + 1/x²) ϕ(x) dx = ϕ(w)/w
(b) Suppose a ≤ b and m² = max{a², b²}. Prove that
    (b − a)ϕ(m) ≤ ∫_a^b ϕ(x) dx ≤ (b − a)ϕ(0) [Ans]

16.7 Equality of distributions of linear combinations of independent normals. Suppose X1, X2, . . . , Xn are independent
normal random variables with distributions N(µ1, σ1²), N(µ2, σ2²), . . . , N(µn, σn²) respectively. Suppose further that
a1, . . . , an, b1, . . . , bn are all in R.
(a) Show that Σ_{k=1}^n ak Xk and Σ_{k=1}^n bk Xk have the same distribution iff Σ_{k=1}^n ak µk = Σ_{k=1}^n bk µk and Σ_{k=1}^n ak² σk² = Σ_{k=1}^n bk² σk².
(b) Show that (X1 + · · · + Xj)/√j and (X1 + · · · + Xm)/√m have the same distributions. [Ans]
16.8 Tail probabilities of the normal distribution. Suppose ϕ denotes the density function of the N(0, 1) distribution.
(a) Prove that
    (1/x − 1/x²)ϕ(x) ≤ xϕ(x)/(1 + x²) ≤ 1 − Φ(x) ≤ ϕ(x)/x for all x > 0.
(b) Prove that 1 − Φ(x) ∼ ϕ(x)/x as x → ∞.
(c) Prove that for any x < y we have
    (x²/(1 + x²)) (ϕ(x)/x − ϕ(y)/y) ≤ Φ(y) − Φ(x) ≤ ϕ(x)/x − ϕ(y)/y
(d) Prove that for any α > 0 we have
    Φ(x + α) − Φ(x) ∼ ϕ(x)/x − ϕ(x + α)/(x + α) as x → ∞.
(e) Prove that for all x > 0 and all j ∈ {1, 2, 3, . . .} we have
    ϕ(x)[ 1/x − 1/x³ + 1·3/x⁵ − 1·3·5/x⁷ + · · · − 1·3·5 · · · (4j − 3)/x^{4j−1} ] ≤ 1 − Φ(x)
        ≤ ϕ(x)[ 1/x − 1/x³ + 1·3/x⁵ − 1·3·5/x⁷ + · · · + 1·3 · · · (4j − 1)/x^{4j+1} ]
(f) Prove that for all k ∈ {1, 2, 3, . . .} we have
    1 − Φ(x) ∼ ϕ(x)[ 1/x − 1/x³ + 1·3/x⁵ − 1·3·5/x⁷ + · · · + (−1)^k 1·3·5 · · · (2k − 1)/x^{2k+1} ] as x → ∞. [Ans]
16.9 Suppose Y has the distribution function FY(y) with
    FY(y) = (1/2)Φ(y) if y < 0;  FY(y) = 1/2 + (1/2)Φ(y) if y ≥ 0.
Find E[Y^n] for n = 0, 1, . . . . [Ans]
16.10 Suppose X is a random variable with density fX(x) = c e^{−Q(x)} for all x ∈ R where Q(x) = ax² − bx and a ≠ 0.
(a) Find any relations that must exist between a, b and c and show that X must have a normal density.
(b) Find the mean and variance of X in terms of a and b. [Ans]
16.11 The entropy of a normal random variable. For the definition of entropy, see exercise 8.6 on page 25. Suppose
X ∼ N (µ, σ 2 ). Find the entropy of X.
(It can be shown that the continuous distribution with mean µ and variance σ 2 with the largest entropy is N (µ, σ 2 ).)
[Ans]
16.12 (a) Suppose X and Y are i.i.d. random variables with the N (0, σ 2 ) distribution. Find the density of Z = X 2 + Y 2 .
(b) Suppose X1 ,. . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution. Let Z = X12 + · · · + Xn2 . Find the
distribution of Z. [Ans]
16.13 Suppose X and Y are i.i.d. random variables. Let V = X 2 + Y 2 and W = X/Y . Are V and W independent? [Ans]
16.14 Suppose X ∼ N (µ, σ 2 ). Suppose further that, given X = x, the n random variables Y1 , . . . , Yn are i.i.d. N (x, σ12 ).12
Find the distribution of (X|Y1 , . . . , Yn ). [Ans]

12 This means that f_{(Y1,...,Yn)|X}(y1, . . . , yn|x) = Π_{i=1}^n f_{Yi|X}(yi|x).

16.15 Suppose X and Y are i.i.d. N(0, 1).
(a) Let Z1 = X + Y and Z2 = X − Y. Show that Z1 and Z2 are independent. Hence deduce that the distribution of
(X − Y)/(X + Y) is the same as the distribution of X/Y.
(b) By using the relation XY = ((X + Y)/2)² − ((X − Y)/2)², find the characteristic function of Z = XY.
(c) By using the relation E[e^{itXY}] = ∫_{−∞}^∞ E[e^{ityX}] fY(y) dy, find the characteristic function of Z = XY.
(d) Now suppose X and Y are i.i.d. N(0, σ²). Find the c.f. of Z = XY.
(e) Now suppose X and Y are i.i.d. N(µ, σ²). Find the c.f. of Z = XY. [Ans]
16.16 Suppose X1 , X2 , X3 and X4 are i.i.d. N (0, 1). Find the c.f. of X1 X2 + X3 X4 and the c.f. of X1 X2 − X3 X4 . See also
exercise 28.12 on page 84. [Ans]
16.17 (a) Suppose b ∈ (0, ∞). Show that
    ∫_0^∞ exp[ −(u² + b²/u²)/2 ] du = √(π/2) e^{−b} (16.17a)
(b) Suppose a ∈ R with a ≠ 0 and b ∈ R. Show that
    ∫_0^∞ exp[ −(a²u² + b²/u²)/2 ] du = (π/(2a²))^{1/2} e^{−|ab|} (16.17b)
This result is used in exercise 28.28 on page 86. [Ans]
16.18 (a) Suppose X ∼ N (0, 1), Y ∼ N (0, 1) and X and Y are independent. Find the density of X|X + Y = v.
(b) Suppose X ∼ N (µ1 , σ12 ), Y ∼ N (µ2 , σ22 ) and X and Y are independent. Find the density of X|X + Y = v [Ans]
16.19 Suppose X1, X2, . . . , Xn are independent random variables with distributions N(µ1, σ1²), N(µ2, σ2²), . . . , N(µn, σn²)
respectively. Suppose further that X1 − X̄ is independent of X̄. Prove that
    σ1² = (σ2² + · · · + σn²)/(n − 1) [Ans]
16.20 Suppose X1, X2, . . . , Xn are independent random variables with distributions N(µ1, σ²), N(µ2, σ²), . . . , N(µn, σ²)
respectively.
(a) By using characteristic functions, prove that (X1 − X̄, X2 − X̄, . . . , Xn − X̄) is independent of X̄.
(b) Deduce that X̄ is independent of the sample variance Σ_{j=1}^n (Xj − X̄)²/(n − 1).
(c) Prove that the range Xn:n − X1:n is independent of X̄. [Ans]
16.21 Suppose X1, X2, . . . , Xn are i.i.d. random variables with the normal N(0, σ²) distribution.
(a) Show that the random variables
    Wk = Xk²/(X1² + · · · + Xn²) and X1² + · · · + Xn²
are independent for any k = 1, 2, . . . , n and show that Wk ∼ beta(1/2, (n−1)/2).
In particular
    X1²/(X1² + X2²) ∼ beta(1/2, 1/2) = arcsine(0, 1)
(b) Show that
    (X1² + X2²)/(X1² + X2² + X3² + X4²) ∼ uniform(0, 1) [Ans]
2

The folded normal and half-normal distributions. Suppose X ∼ N (µ, σ 2 ). Then |X| has the folded normal
distribution, folded (µ, σ 2 ). The half-normal is the folded (0, σ 2 ) distribution.
16.22 Suppose X ∼ N(µ, σ²). Show that
    E[|X − µ|^n] = (2^{n/2} σ^n/√π) Γ((n + 1)/2) for n = 0, 1, . . . .
This also gives E[|X|^n] for the half-normal distribution. [Ans]
2
16.23 The folded normal distribution. Suppose Y ∼ folded (µ, σ ).
(a) Find the density of Y . (b) Find the distribution function of Y .
(c) Find E[Y ] and var[Y ]. (d) Find the c.f. of Y . [Ans]
d
16.24 (a) Show that folded (µ, σ 2 ) = folded (−µ, σ 2 ).
(b) Suppose X ∼ folded (µ, σ 2 ) and b ∈ (0, ∞). Show that bX ∼ folded (bµ, b2 σ 2 ). [Ans]

16.25 The half-normal distribution, folded(0, σ²). Suppose X ∼ folded(0, σ²).


(a) Find fX , the density of X. (b) Show that fX is decreasing and hence has mode at x = 0.
(c) Show that fX is initially concave and then convex with inflection point at x = σ.
(d) Find the distribution function of X. (e) Find E[X] and var[X].
(f) Find the c.f. of X. [Ans]
16.26 Suppose X ∼ folded(0, σ²).
(a) Show that for n = 0, 1, . . .
    E[X^{2n}] = σ^{2n}(2n)!/(n! 2^n) and E[X^{2n+1}] = σ^{2n+1} 2^{n+1/2} n!/√π
(b) Show that the skewness and kurtosis are
    skew[X] = √2(4 − π)/(π − 2)^{3/2} and κ[X] = (3π² − 4π − 12)/(π − 2)² [Ans]
16.27 (a) Suppose X ∼ folded (0, 1). Show that X ∼ χ1 where the chi distribution is explained on page 110.
(b) Suppose X ∼ folded (0, σ 2 ) and b ∈ (0, ∞). Show that bX ∼ folded (0, b2 σ 2 ). Hence the family of distributions
{ folded ( 0, σ 2 ) : σ ∈ (0, ∞) } is a scale family of distributions—see definition(1.6d) on page 5. [Ans]

17 The lognormal distribution


17.1 The definition.
Definition(17.1a). Suppose µ ∈ R and σ ∈ R; then the random variable X has the lognormal distribution,
logN (µ, σ 2 ), iff ln(X) ∼ N (µ, σ 2 ).
Hence:
• if X ∼ logN (µ, σ 2 ) then ln(X) ∼ N (µ, σ 2 );
• if Z ∼ N (µ, σ 2 ) then eZ ∼ logN (µ, σ 2 ).
It follows that if X ∼ logN(µ, σ²) then ln(X) =d µ + σY where Y ∼ N(0, 1). Hence X =d e^µ (e^Y)^σ. We have
shown the following.
If X ∼ logN(µ, σ²) then X =d e^µ W^σ where W ∼ logN(0, 1).

17.2 The density and distribution function. Suppose X ∼ logN(µ, σ²) and let Z = ln(X). Then
    FX(x) = P[X ≤ x] = P[Z ≤ ln(x)] = Φ((ln(x) − µ)/σ)
hence the distribution function of the logN(µ, σ²) distribution is
    FX(x) = Φ((ln x − µ)/σ) for x > 0.
Differentiating the distribution function gives the density:
    fX(x) = (1/(σx)) ϕ((ln x − µ)/σ) = (1/(√(2π) σx)) exp[−(ln x − µ)²/(2σ²)] for x > 0.
The density can also be obtained by transforming the normal density as follows. Now X = e^Z where Z ∼
N(µ, σ²). Hence |dx/dz| = e^z = x; hence fX(x) = fZ(z)|dz/dx| = fZ(ln x)/x where fZ is the density of N(µ, σ²).
The shape of the density function fX is considered in exercise 18.1 on page 55.

17.3 Moments. Suppose X ∼ logN(µ, σ²). Then E[X^n] = E[e^{nZ}] = exp[nµ + n²σ²/2] for any n ∈ R. In
particular
    E[X] = exp[µ + σ²/2] (17.3a)
    var[X] = E[X²] − {E[X]}² = e^{2µ+σ²}( e^{σ²} − 1 )
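A quick numerical check of these moment formulas (a Python sketch, not from the notes, assuming numpy):

    import numpy as np

    rng = np.random.default_rng(6)
    mu, sigma = 0.3, 0.8
    x = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)

    print(x.mean(), np.exp(mu + sigma**2 / 2))                        # ~1.859
    print(x.var(), np.exp(2*mu + sigma**2) * (np.exp(sigma**2) - 1))  # ~3.10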

[Figure(17.2a). The graph of the lognormal density for (µ = 0, σ = 0.25), (µ = 0, σ = 0.5) and (µ = 0, σ = 1). In all 3 cases we have median = 1, mode < 1 and mean > 1—see exercise 18.4 on page 55.]

17.4 The moment generating function. Suppose X ∼ logN(µ, σ²) and consider E[e^{tX}].
Case 1. Suppose t < 0. Now X ≥ 0 and hence tX ≤ 0 and e^{tX} ≤ 1. Hence E[e^{tX}] ≤ 1.
Case 2. Suppose t > 0. Without loss of generality, consider the case when µ = 0 and σ = 1. Hence X = e^Z where
Z ∼ N(0, 1) and
    E[e^{tX}] = (1/√(2π)) ∫_{z=−∞}^∞ exp[ t e^z − z²/2 ] dz
For z > 0 we have e^z > 1 + z + z²/2 + z³/6 and hence t e^z − z²/2 > t + tz + (z²/6)[3t − 3 + tz] > t + tz for z sufficiently
large, say z > K. But ∫_K^∞ exp[t + tz] dz = ∞. Hence for all t > 0 we have E[e^{tX}] = ∞.
We have shown that the m.g.f. only exists when t < 0. There is no simple closed form for its value.
17.5 Other properties.
• Suppose X1, . . . , Xn are independent random variables with Xi ∼ logN(µi, σi²) for i = 1, . . . , n. Then
    Π_{i=1}^n Xi = X1 · · · Xn ∼ logN( Σ_{i=1}^n µi , Σ_{i=1}^n σi² )
• Suppose X1, . . . , Xn are i.i.d. with the logN(µ, σ²) distribution. Then
    (X1 · · · Xn)^{1/n} ∼ logN(µ, σ²/n)
• If X ∼ logN(µ, σ²), b ∈ R and c > 0 then
    cX^b ∼ logN( ln(c) + bµ, b²σ² ) (17.5a)
See exercises 18.11 and 18.9 below for the derivations of these results.
17.6 The multiplicative central limit theorem.
Proposition(17.6a). Suppose X1, . . . , Xn are i.i.d. positive random variables such that
    E[ln(X)] = µ and var[ln(X)] = σ²
both exist and are finite. Then
    ( (X1 · · · Xn)/e^{nµ} )^{1/√n} →D logN(0, σ²) as n → ∞.
Proof. Let Yi = ln(Xi) for i = 1, 2, . . . , n. Then
    ln[ ( (X1 · · · Xn)/e^{nµ} )^{1/√n} ] = Σ_{i=1}^n (Yi − µ)/√n →D N(0, σ²) as n → ∞.13
13 The classical central limit theorem asserts that if X1, X2, . . . is a sequence of i.i.d. random variables with finite expectation µ
and finite variance σ² and Sn = (X1 + · · · + Xn)/n, then
    √n (Sn − µ) →D N(0, σ²) as n → ∞.
See page 357 in [BILLINGSLEY(1995)].
Now if Xn →D X as n → ∞ then g(Xn) →D g(X) as n → ∞ for any continuous function g. Taking g(x) = e^x proves
the proposition.
Using equation(17.5a) shows that if X ∼ logN(0, σ²) then X^{1/σ} ∼ logN(0, 1). It follows that
    lim_{n→∞} P[ ( (X1 · · · Xn)/e^{nµ} )^{1/(σ√n)} ≤ x ] = Φ(ln x) for all x > 0.
Also, if we let
    W = ( (X1 · · · Xn)/e^{nµ} )^{1/√n}, then (X1 · · · Xn)^{1/√n} = e^{µ√n} W and (X1 · · · Xn)^{1/n} = e^µ W^{1/√n}
and hence by equation(17.5a), (X1 · · · Xn)^{1/n} is asymptotically logN(µ, σ²/n).
We can generalise proposition(17.6a) as follows:
Proposition(17.6b). Suppose X1, X2, . . . is a sequence of independent positive random variables such that for
all i = 1, 2, . . .
    E[ln(Xi)] = µi, var[ln(Xi)] = σi² and E[ |ln(Xi) − µi|³ ] = ωi³
all exist and are finite. For n = 1, 2, . . . , let
    µ(n) = Σ_{i=1}^n µi,  s²(n) = Σ_{i=1}^n σi²,  ω³(n) = Σ_{i=1}^n ωi³
Suppose further that ω(n)/s(n) → 0 as n → ∞. Then
    ( (X1 · · · Xn)/e^{µ(n)} )^{1/s(n)} →D logN(0, 1) as n → ∞.
Proof. Let Yi = ln(Xi) for i = 1, 2, . . . , n. Then
    ln[ ( (X1 · · · Xn)/e^{µ(n)} )^{1/s(n)} ] = Σ_{i=1}^n (Yi − µi)/s(n) →D N(0, 1) as n → ∞.14
Using the transformation g(x) = e^x proves the proposition.
Also, if we let
    W = ( (X1 · · · Xn)/e^{µ(n)} )^{1/s(n)}, then X1 · · · Xn = e^{µ(n)} W^{s(n)}
and hence by equation(17.5a), the random variable X1 · · · Xn is asymptotically logN(µ(n), s²(n)).
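The following Python sketch (not part of the notes; it assumes numpy) illustrates proposition(17.6a) by simulating products of i.i.d. positive random variables and comparing the logarithm of the normalized product with the limiting normal moments:

    import numpy as np

    rng = np.random.default_rng(7)

    # moments of ln X for X ~ uniform(0.5, 2) estimated from a large sample
    big = np.log(rng.uniform(0.5, 2.0, size=2_000_000))
    mu, sigma2 = big.mean(), big.var()

    n, reps = 400, 20_000
    logs = np.log(rng.uniform(0.5, 2.0, size=(reps, n)))

    # ln[((X1...Xn)/e^{n mu})^{1/sqrt(n)}] should be approximately N(0, sigma^2)
    t = (logs.sum(axis=1) - n * mu) / np.sqrt(n)
    print(t.mean(), 0.0)
    print(t.var(), sigma2)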
17.7 Usage. The multiplicative central limit theorem suggests the following applications of the lognormal which
can be verified by checking available data.
• Grinding, where a whole is divided into a multiplicity of particles and the particle size is measured by volume,
mass, surface area or length.
• Distribution of farm size (which corresponds to a division of land)—where a 3-parameter lognormal can be
used. The third parameter would be the smallest size entertained.
• The size of many natural phenomena is due to the accumulation of many small percentage changes—leading to
a lognormal distribution.

17.8 Summary. The lognormal distribution.
• X ∼ logN(µ, σ²) iff ln(X) ∼ N(µ, σ²).
• Moments: if X ∼ logN(µ, σ²) then E[X] = exp[µ + σ²/2] and var[X] = e^{2µ+σ²}( e^{σ²} − 1 ).
• The product of independent lognormals is lognormal.
• If X ∼ logN(µ, σ²), b ∈ R and c > 0 then cX^b ∼ logN( ln(c) + bµ, b²σ² ).
• The multiplicative central limit theorem.
14 Lyapunov central limit theorem with δ = 1. Suppose X1, X2, . . . is a sequence of independent random variables such that
E[Xi] = µi and var[Xi] = σi² are both finite. Let s²_n = σ1² + · · · + σn² and suppose
    lim_{n→∞} (1/s³_n) Σ_{i=1}^n E[ |Xi − µi|³ ] = 0; then Σ_{i=1}^n (Xi − µi)/s_n →D N(0, 1) as n → ∞.
See page 362 in [BILLINGSLEY(1995)].

18 Exercises (exs-logN.tex)

18.1 Shape of the density function of the lognormal distribution. Suppose X ∼ logN(µ, σ²) with density function fX.
(a) Show that fX first increases and then decreases with mode at x = e^{µ−σ²}. Show also that fX(x) → 0 as x ↓ 0 and as
x → ∞.
(b) Show that fX is initially convex, then concave and then convex again with points of inflection at
    x1 = exp[ µ − (3/2)σ² − (1/2)σ√(σ² + 4) ] and x2 = exp[ µ − (3/2)σ² + (1/2)σ√(σ² + 4) ] [Ans]
18.2 Suppose X ∼ logN (µ, σ 2 ). Let E[X] = α and var[X] = β. Express µ and σ 2 in terms of α and β. [Ans]
18.3 An investor forecasts that the returns on an investment over the next 4 years will be as follows: for each of the first
2 years he estimates that £1 will grow to £(1 + I) where I is a random variable with E[I] = 0.08 and var[I] = 0.001;
for each of the last 2 years he estimates that £1 will grow to £(1 + I) where I is a random variable with E[I] = 0.06 and
var[I] = 0.002.
Suppose he further assumes that the return Ij in year j is independent of the returns in all other years and that 1 + Ij has
a lognormal distribution, for j = 1, 2, . . . , 4.
Calculate the amount of money which must be invested at time t = 0 in order to ensure that there is a 95% chance that
the accumulated value at time t = 4 is at least £5,000. [Ans]
18.4 Suppose X ∼ logN (µ, σ 2 ).
(a) Find the median and mode and show that: mode < median < mean.
(b) Find expressions for the lower and upper quartiles of X in terms of µ and σ.
(c) Suppose αp denotes the p-quantile of X; this means that P[X ≤ αp] = p. Prove that αp = e^{µ+σβp} where βp is the
p-quantile of the N(0, 1) distribution. [Ans]
18.5 Suppose X ∼ logN (µ, σ 2 ).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
18.6 The geometric mean and geometric variance of a distribution. Suppose each xi in the data set {x1, . . . , xn} satisfies
xi > 0. Then the geometric mean of the data set is g = (x1 · · · xn)^{1/n}, or ln(g) = (1/n) Σ_{i=1}^n ln(xi), or ln(g) = (1/n) Σ_j fj ln(xj)
where fj is the frequency of the observation xj. This definition motivates the following.
Suppose X is a random variable with X > 0. Then GMX, the geometric mean of X, is defined by ln(GMX) =
∫_0^∞ ln(x) fX(x) dx = E[ln(X)].
Similarly, we define the geometric variance, GVX, by
    ln(GVX) = E[ (ln X − ln GMX)² ] = var[ln(X)]
and the geometric standard deviation by GSDX = √GVX.
Suppose X ∼ logN(µ, σ²). Find GMX and GSDX. [Ans]
18.7 Suppose X ∼ logN(µ, σ²) and k > 0. Show that
    E[X | X < k] = e^{µ+σ²/2} Φ( (ln(k) − µ − σ²)/σ ) / Φ( (ln(k) − µ)/σ )
and
    E[X | X ≥ k] = e^{µ+σ²/2} Φ( (µ + σ² − ln(k))/σ ) / [ 1 − Φ( (ln(k) − µ)/σ ) ] [Ans]

18.8 Suppose X ∼ logN(µ, σ²). Then the j-th moment distribution function of X is defined to be the function G : [0, ∞) →
[0, 1] with
    G(x) = (1/E[X^j]) ∫_0^x u^j fX(u) du
(a) Show that G is the distribution function of the logN(µ + jσ², σ²) distribution.
(b) Suppose γX denotes the Gini coefficient of X (also called the coefficient of mean difference of X). By definition
    γX = (1/(2E[X])) ∫_0^∞ ∫_0^∞ |u − v| fX(u) fX(v) du dv
Hence
    γX = E|X − Y|/(2E[X])
where X and Y are independent with the same distribution. Prove that
    γX = 2Φ(σ/√2) − 1 [Ans]

18.9 Suppose X ∼ logN (µ, σ 2 ).


(a) Find the distribution of 1/X.
(b) Suppose b ∈ R − {0} and c > 0. Find the distribution of cX b .
In particular, if X ∼ logN(µ, σ²) and c > 0 then cX ∼ logN(µ + ln c, σ²). Hence if σ > 0 and Xµ ∼ logN(µ, σ²),
then e^{λ−µ} Xµ =d Xλ. Hence the family of distributions {Xµ : µ ∈ R} is a scale family of distributions—see
definition(1.6d) on page 5. [Ans]
18.10 Suppose X1 and X2 are independent random variables with Xi ∼ logN (µi , σi2 ) for i = 1 and i = 2. Find the distribution
of X1 /X2 . [Ans]
18.11 (a) Suppose X1, . . . , Xn are independent random variables with Xi ∼ logN(µi, σi²) for i = 1, . . . , n. Find the
distribution of Π_{i=1}^n Xi = X1 · · · Xn.
(b) Suppose X1, . . . , Xn are i.i.d. with the logN(µ, σ²) distribution. Find the distribution of (X1 · · · Xn)^{1/n}.
(c) Suppose X1, . . . , Xn are independent random variables with Xi ∼ logN(µi, σi²) for i = 1, . . . , n. Suppose further
that a1, . . . , an are real constants. Show that
    Π_{i=1}^n Xi^{ai} ∼ logN(mn, sn²)
for some mn and sn and find explicit expressions for mn and sn. [Ans]

19 The power law and Pareto distributions


The power law distribution.
19.1 The power law distribution.
Definition(19.1a). Suppose a0 ∈ R, h ∈ (0, ∞) and α ∈ (0, ∞). Then the random variable X has the power law
distribution, powerlaw(α, h, a0), iff X has density
    f(x) = α(x − a0)^{α−1}/h^α for a0 < x < a0 + h. (19.1a)
It is easy to see that if α < 1 then f is monotonic decreasing and if α > 1 then f is monotonic increasing. Also,
if α < 1 or α > 2 then f is convex, and if 1 < α < 2 then f is concave. The density is shown in figure (19.1a)
for three values of α.
[Figure(19.1a). The density of powerlaw(α, 1, 0) for α = 1/2, α = 2 and α = 4.]

The distribution function of powerlaw(α, h, a0) is
    F(x) = (x − a0)^α/h^α for a0 < x < a0 + h.
The standard power law distribution is powerlaw(α, 1, 0); this has density f(x) = αx^{α−1} for 0 < x < 1 and
distribution function F(x) = x^α for 0 < x < 1. Clearly
    powerlaw(α, 1, 0) =d beta(α, 1)

19.2 Moments and elementary transformations. If X ∼ powerlaw(α, 1, 0) = beta(α, 1), then E[X] = α/(α + 1),
E[X²] = α/(α + 2) and var[X] = α/((α + 1)²(α + 2)). Further results are given in exercise 20.1 on page 60.
Clearly,
    X ∼ powerlaw(α, h, a0) iff (X − a0)/h ∼ powerlaw(α, 1, 0)
and
    if X ∼ powerlaw(α, h, 0) and β ∈ (0, ∞), then βX ∼ powerlaw(α, βh, 0)
It follows that for fixed α ∈ (0, ∞), the family of distributions {powerlaw(α, b, 0) : b ∈ (0, ∞)} is a scale family
of distributions—see definition(1.6d) on page 5.
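Since F(x) = x^α on (0, 1), a powerlaw(α, h, a0) variable can be simulated as a0 + hU^{1/α} with U ∼ uniform(0, 1) (see exercise 20.4). A Python sketch, not part of the notes and assuming numpy:

    import numpy as np

    rng = np.random.default_rng(8)
    alpha, h, a0 = 3.0, 2.0, 1.0
    u = rng.uniform(size=500_000)
    x = a0 + h * u ** (1.0 / alpha)      # powerlaw(alpha, h, a0) by inverse transform

    # E[X] = a0 + h*alpha/(alpha+1); var[X] = h^2 * alpha/((alpha+1)^2 (alpha+2))
    print(x.mean(), a0 + h * alpha / (alpha + 1))
    print(x.var(), h**2 * alpha / ((alpha + 1) ** 2 * (alpha + 2)))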
19.3 A characterization of the power law distribution. Suppose X ∼ powerlaw(α, h, 0); then
    f(x) = αx^{α−1}/h^α and F(x) = x^α/h^α for x ∈ (0, h).
Also, for all c ∈ (0, h) we have
    E[X | X ≤ c] = ∫_0^c x (αx^{α−1}/c^α) dx = αc/(α + 1) = (c/h) E[X]
The next proposition shows this result characterizes the power law distribution (see [DALLAS(1976)]).
Proposition(19.3a). Suppose X is a non-negative absolutely continuous random variable such that there exists
h > 0 with P[X ≤ h] = 1. Suppose further that for all c ∈ (0, h) we have
    E[X | X ≤ c] = (c/h) E[X] (19.3a)
Then there exists α > 0 such that X ∼ powerlaw(α, h, 0).
Proof. Let f denote the density and F denote the distribution function of X. Then equation(19.3a) becomes
    ∫_0^c x f(x)/F(c) dx = (c/h) ∫_0^h x f(x) dx
Let δ = (1/h) ∫_0^h x f(x) dx. Then δ ∈ (0, 1) and
    ∫_0^c x f(x) dx = c F(c) δ for all c ∈ (0, h). (19.3b)
Differentiating with respect to c gives
    c f(c) = [F(c) + c f(c)] δ
and hence
    F′(c)/F(c) = α/c where α = δ/(1 − δ) > 0.
Integrating gives ln F(c) = A + α ln(c) or F(c) = kc^α. Using F(h) = 1 gives F(c) = c^α/h^α for c ∈ (0, h), as required.
The above result leads on to another characterization of the power law distribution:
Proposition(19.3b). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F and such that there exists h > 0 with F(h) = 1. Then
    E[ Sn/X(n) | X(n) = x ] = c with c independent of x for all x ∈ (0, h) (19.3c)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h).
Proof. ⇒ Writing Sn = X(1) + · · · + X(n) in equation(19.3c) gives
    (c − 1)x = E[ X(1) + · · · + X(n−1) | X(n) = x ] for all x ∈ (0, h).
It is easy to see that given X(n) = x, the vector (X(1), . . . , X(n−1)) has the same distribution as the vector of n − 1
order statistics (Y(1), . . . , Y(n−1)) from the density f(y)/F(x) for 0 < y < x. Hence Y(1) + · · · + Y(n−1) = Y1 + · · · + Yn−1
and
    (c − 1)x = (n − 1)E[Y] where Y has density f(y)/F(x) for y ∈ (0, x). (19.3d)
Hence
    ∫_0^x y f(y) dy = ((c − 1)/(n − 1)) x F(x)
Because X(j) < X(n) for all j = 1, 2, . . . , n − 1, equation(19.3c) implies c < n; also equation(19.3d) implies c > 1. Hence
δ = (c − 1)/(n − 1) ∈ (0, 1). This applies for all x ∈ (0, h) and c is independent of x. So we have equation(19.3b) again and we
must have F(x) = x^α/h^α for x ∈ (0, h).
⇐ See part (a) of exercise 20.7 on page 61.

The next result is an easy consequence of the last one—it was originally announced in [S RIVASTAVA(1965)] but
the proof here is due to [DALLAS(1976)].

Proposition(19.3c). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F and such that there exists h > 0 with F(h) = 1. Then
    Sn/max{X1, . . . , Xn} is independent of max{X1, . . . , Xn} (19.3e)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h).
Proof. ⇒ Clearly equation(19.3e) implies equation(19.3c). ⇐ See part (b) of exercise 20.7 on page 61.

The Pareto distribution.

19.4 The Pareto distribution. Suppose α > 0 and x0 > 0. Then the random variable X is said to have a Pareto
distribution iff X has the distribution function
    FX(x) = 1 − (x0/x)^α for x ∈ [x0, ∞).
It follows that X has density
    fX(x) = αx0^α/x^{α+1} for x ∈ [x0, ∞).
Shifting the x-axis by the distance a leads to the general definition:
Definition(19.4a). Suppose α ∈ (0, ∞), x0 ∈ (0, ∞) and a ∈ [0, ∞). Then the random variable X has the
Pareto distribution, Pareto(α, x0, a), iff X has density
    fX(x) = αx0^α/(x − a)^{α+1} for x ∈ [a + x0, ∞) (19.4a)
It follows that X has distribution function
    FX(x) = 1 − x0^α/(x − a)^α for x ∈ [a + x0, ∞). (19.4b)
The standard Pareto distribution is Pareto(1, 1, 0). This has density f(x) = 1/x² for x ∈ [1, ∞) and distribution
function F(x) = 1 − 1/x for x ∈ [1, ∞).
By differentiation, we see that the density is monotonic decreasing and convex on (a + x0 , ∞). The shape of the
Pareto density is shown in figure(19.4a).
The Pareto distribution has been used to model the distribution of incomes, the distribution of wealth, the sizes of
human settlements, etc.
[Figure(19.4a). The density of Pareto(α, x0, 0) for α = 1, α = 2 and α = 3 (all with x0 = 1).]

19.5 Elementary transformations of the Pareto distribution. Clearly


X ∼ Pareto(α, x0 , a) iff X − a ∼ Pareto(α, x0 , 0)
and
X ∼ Pareto(α, x0 , a) iff (X − a)/x0 ∼ Pareto(α, 1, 0)
Also
suppose X ∼ Pareto(α, x0 , 0) and b ∈ (0, ∞), then bX ∼ Pareto(α, bx0 , 0) (19.5a)

It follows that for fixed α ∈ (0, ∞), the family of distributions {Pareto(α, b, 0) : b ∈ (0, ∞)} is a scale family
of distributions—see definition(1.6d) on page 5. The parameter α is called the shape parameter and the
parameter x0 is called the scale parameter of the Pareto distribution.

Link between the Pareto and power law distributions. It is important to note that
if X ∼ Pareto(α, h, 0) then 1/X ∼ powerlaw(α, 1/h, 0),
and
if X ∼ powerlaw(α, h, 0) then 1/X ∼ Pareto(α, 1/h, 0).
So a result about one distribution can often be transformed into an equivalent result about the other.

Link between the Pareto and beta distributions. The previous results specialize to
if X ∼ Pareto(α, 1, 0) then 1/X ∼ powerlaw(α, 1, 0) = beta(α, 1),
and
if X ∼ beta(α, 1) = powerlaw(α, 1, 0) then 1/X ∼ Pareto(α, 1, 0).

19.6 Quantile function, median and moments. Using equation(19.4b) gives the quantile function
    FX⁻¹(p) = a + x0/(1 − p)^{1/α} for p ∈ [0, 1).
Hence the median is FX⁻¹(1/2) = a + 2^{1/α} x0.
Clearly the mode of the distribution Pareto(α, x0, a) is at a + x0.
Moments of the Pareto(α, x0, 0) distribution. Suppose X ∼ Pareto(α, x0, 0). Then
    E[X^n] = αx0^α ∫_{x=x0}^∞ x^{n−α−1} dx = ∞ if n ≥ α;  = αx0^n/(α − n) if n ∈ (0, α).
In particular, if X ∼ Pareto(α, x0, 0), then
    E[X] = αx0/(α − 1) if α > 1, and var[X] = αx0²/((α − 1)²(α − 2)) if α > 2.

19.7 A characterization of the Pareto distribution. Suppose X ∼ Pareto(α, x0, 0). Suppose further that
α > 1 so that the expectation is finite. We have
    f(x) = αx0^α/x^{α+1} and F(x) = 1 − x0^α/x^α for x > x0.
Because the expectation is finite, we have for all c > x0
    E[X | X > c] = ∫_c^∞ x f(x)/[1 − F(c)] dx = αc^α ∫_c^∞ x^{−α} dx = αc/(α − 1) = (c/x0) E[X]
The next proposition shows this result characterizes the Pareto distribution (see [DALLAS(1976)]).
Proposition(19.7a). Suppose X is a non-negative absolutely continuous random variable with a finite expectation and such that there exists x0 > 0 with P[X > x0] = 1. Suppose further that for all c > x0 we have
    E[X | X > c] = (c/x0) E[X] (19.7a)
Then there exists α > 1 such that X ∼ Pareto(α, x0, 0).

Proof. Let f denote the density and F denote the distribution function of X. Then equation(19.7a) becomes
    ∫_c^∞ x f(x)/[1 − F(c)] dx = (c/x0) ∫_{x0}^∞ x f(x) dx (19.7b)
Let δ = (1/x0) ∫_{x0}^∞ x f(x) dx. We are assuming E[X] is finite; hence δ ∈ (1, ∞). Equation(19.7b) leads to
    ∫_c^∞ x f(x) dx = c[1 − F(c)] δ for all c > x0. (19.7c)
Differentiating equation(19.7c) with respect to c gives
    −c f(c) = [1 − F(c) − c f(c)] δ
and hence
    c f(c)[δ − 1] = [1 − F(c)] δ
    F′(c)/(1 − F(c)) = α/c where α = δ/(δ − 1) > 1.
Integrating gives −ln[1 − F(c)] = A + ln(c^α) or 1 − F(c) = k/c^α. Using F(x0) = 0 gives 1 − F(c) = x0^α/c^α for c > x0,
as required.
The above result leads on to another characterization of the Pareto distribution:
Proposition(19.7b). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F with a finite expectation and such that there exists x0 > 0 with P[X > x0] = 1.
Then
    E[ Sn/X(1) | X(1) = x ] = c with c independent of x for all x > x0 (19.7d)
iff there exists α > 1 such that X ∼ Pareto(α, x0, 0).
Proof. ⇒ Writing Sn = X(1) + · · · + X(n) in equation(19.7d) gives
    (c − 1)x = E[ X(2) + · · · + X(n) | X(1) = x ]
It is easy to see that given X(1) = x, the vector (X(2), . . . , X(n)) has the same distribution as the vector of n − 1 order
statistics (Y(1), . . . , Y(n−1)) from the density f(y)/[1 − F(x)] for y > x. Hence Y(1) + · · · + Y(n−1) = Y1 + · · · + Yn−1 and
    (c − 1)x = (n − 1)E[Y] where Y has density f(y)/[1 − F(x)] for y > x. (19.7e)
Hence
    ∫_x^∞ y f(y) dy = ((c − 1)/(n − 1)) x[1 − F(x)]
Because X(j) > X(1) for all j = 2, 3, . . . , n, equation(19.7d) implies c > nx/x = n. Hence δ = (c − 1)/(n − 1) ∈ (1, ∞). Recall c is
independent of x; hence we have equation(19.7c) again and we must have F(x) = 1 − x0^α/x^α for x ∈ (x0, ∞).
⇐ See part (b) of exercise 20.22 on page 62.
The next result is an easy consequence of the last one—it was originally announced in [SRIVASTAVA(1965)] but
the proof here is due to [DALLAS(1976)].
Proposition(19.7c). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F with finite expectation and such that there exists x0 > 0 with P[X > x0] = 1.
Then
    Sn/min{X1, . . . , Xn} is independent of min{X1, . . . , Xn} (19.7f)
iff there exists α > 1 such that X ∼ Pareto(α, x0, 0).
Proof. ⇒ Clearly equation(19.7f) implies equation(19.7d). ⇐ See part (c) of exercise 20.22 on page 62.

20 Exercises (exs-powerPareto.tex)

The power law distribution.


20.1 Suppose X has the powerlaw(α, h, a) distribution. Find E[X], E[X 2 ] and var[X]. [Ans]
20.2 Suppose X ∼ powerlaw(α, h, a0 ).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
20.3 Transforming the power law distribution to the exponential. Suppose X ∼ powerlaw(α, h, 0). Let Y = − ln(X);
equivalently Y = ln( 1/X ). Show that Y − ln( 1/h) ∼ exponential (α). [Ans]

20.4 Suppose U1, U2, . . . , Un are i.i.d. random variables with the uniform(0, 1) distribution.
(a) Find the distribution of Mn = max(U1, . . . , Un).
(b) Find the distribution of Y = U1^{1/n}.
(c) Suppose X ∼ powerlaw(α, h, a). Show that X ∼ a + hU^{1/α} where U ∼ uniform(0, 1). Hence show that
    E[X^n] = Σ_{j=0}^n (α/(α + j)) C(n, j) h^j a^{n−j} for n = 1, 2, . . . . [Ans]

20.5 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the powerlaw(α, h, a) distribution. Find the distribution of
Mn = max(X1 , . . . , Xn ). [Ans]
20.6 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the power law distribution powerlaw(α, 1, 0). By using the
2
density of Xk:n , find E[Xk:n ] and E[Xk:n ]. [Ans]
20.7 Suppose X1, X2, . . . , Xn are i.i.d. random variables with the powerlaw(α, h, 0) distribution.
(a) Show that E[ Sn/X(n) | X(n) = x ] = c where c is independent of x.
(b) Show that Sn/max{X1, . . . , Xn} is independent of max{X1, . . . , Xn}. [Ans]
20.8 Suppose r > 0 and X1 , X2 , . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution
function F such that there exists h > 0 with F (h) = 1.
(a) Show that for some i = 1, 2, . . . , n − 1
        E[ X(i)^r / X(i+1)^r | X(i+1) = x ] = c   with c independent of x, for x ∈ (0, h)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h).
(b) Assuming the expectation is finite, show that for some i = 1, 2, . . . , n − 1
        E[ X(i+1)^r / X(i)^r | X(i+1) = x ] = c   with c independent of x, for x ∈ (0, h)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h). [DALLAS(1976)] [Ans]
20.9 Suppose X1, X2, . . . , Xn are i.i.d. random variables with the power law distribution powerlaw(α, 1, 0), which has
distribution function F(x) = x^α for 0 < x < 1 where α > 0.
(a) Let
        W1 = X1:n/X2:n,   W2 = X2:n/X3:n,   . . . ,   Wn−1 = X(n−1):n/Xn:n,   Wn = Xn:n
Prove that W1, W2, . . . , Wn are independent and find the distribution of Wk for k = 1, 2, . . . , n.
(b) Hence find E[Xk:n] and E[Xk:n²]. [Ans]

The Pareto distribution.
20.10 Relationship with the power law distribution. Relationship between the Pareto and uniform distributions.
Recall that if α > 0, then U ∼ uniform(0, 1) iff Y = U^(1/α) ∼ powerlaw(α, 1, 0) = beta(α, 1).
(a) Suppose α > 0. Show that U ∼ uniform(0, 1) iff Y = U^(−1/α) ∼ Pareto(α, 1, 0).
In particular, if U ∼ uniform(0, 1) then x0 U^(−1/α) ∼ Pareto(α, x0, 0).
(b) Suppose α > 0 and x0 > 0. Show that Y ∼ Pareto(α, x0, a) iff Y = a + x0 U^(−1/α) where U ∼ uniform(0, 1).
In particular, if Y ∼ Pareto(α, x0, a) then x0^α/(Y − a)^α ∼ uniform(0, 1).
(c) Show that X ∼ powerlaw(α, 1, 0) iff 1/X ∼ Pareto(α, 1, 0). [Ans]
20.11 Suppose X ∼ Pareto(α, x0 , a).
(a) Find E[X n ] for n = 1, 2, . . . .
(b) Let µX = E[X]. Find E[(X − µX )n ] for n = 1, 2, . . . . In particular, find an expression for var[X].
(c) Find MX (t) = E[etX ], the moment generating function of X and φX (t) = E[eitX ], the characteristic function of X.
[Ans]
20.12 Suppose X ∼ Pareto(α, x0 , a).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
20.13 Suppose X ∼ Pareto(α, x0 , 0) and d ∈ [x0 , ∞). Show that the conditional distribution of X given X ≥ d is
Pareto(α, d, 0). [Ans]
20.14 Show that the Pareto( 1/2, 1, 0) distribution provides an example of a distribution with E[1/X] finite but E[X] infinite.
[Ans]
20.15 Positive powers of a Pareto distribution. Suppose α ∈ (0, ∞) and x0 ∈ (0, ∞) and X ∼ Pareto(α, x0, 0). Show that
X^n ∼ Pareto(α/n, x0^n, 0) for n ∈ (0, ∞).
It follows that if X ∼ Pareto(1, 1, 0), then X^(1/α) ∼ Pareto(α, 1, 0) and, by (19.5a), x0 X^(1/α) ∼ Pareto(α, x0, 0). [Ans]
20.16 Link between the Pareto and exponential distributions.
(a) Suppose X ∼ Pareto(α, x0 , 0). Let Y = ln(X). Show that Y has a shifted exponential distribution: Y − ln(x0 ) ∼
exponential (α).
(b) Suppose X ∼ exponential (λ). Show that Y = eX ∼ Pareto(λ, 1, 0). [Ans]
20.17 Suppose X ∼ Pareto(α, x0 , 0). Find the geometric mean of X and the Gini coefficient of X. The geometric mean of a
distribution is defined in exercise 18.6 on page 55 and the Gini coefficient is defined in exercise 18.8 on page 55. [Ans]
20.18 Suppose X1 , X2 , . . . , Xn are independent random variables with Xj ∼ Pareto(αj , x0 , a) for j = 1, 2, . . . , n. Find the
distribution of Mn = min(X1 , X2 , . . . , Xn ). [Ans]
20.19 Suppose X1, X2, . . . , Xn are i.i.d. random variables with the Pareto distribution Pareto(α, 1, 0). By using the density of
Xk:n, find E[Xk:n] and E[Xk:n²]. [Ans]
20.20 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Pareto(α, 1, 0) distribution.
(a) Let
        W1 = X1:n,   W2 = X2:n/X1:n,   . . . ,   Wn−1 = X(n−1):n/X(n−2):n,   Wn = Xn:n/X(n−1):n
Prove that W1, W2, . . . , Wn are independent and find the distribution of Wk for k = 1, 2, . . . , n.
(b) Hence find E[Xk:n] and E[Xk:n²]. See exercise 20.19 for an alternative derivation. [Ans]
20.21 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the powerlaw(α, 1, 0) distribution. Suppose also Y1 , Y2 , . . . , Yn
are i.i.d. random variables with the Pareto(α, 1, 0) distribution. Show that for k = 1, 2, . . . , n
        Xk:n   and   1/Y(n−k+1):n   have the same distribution.        [Ans]
20.22 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Pareto(α, x0 , 0) distribution.
(a) Prove that the random variable X1:n is independent of the random vector ( X2:n/X1:n , . . . , Xn:n/X1:n ).
(b) Show that E[ Sn/X(1) | X(1) = x ] = c where c is independent of x.
(c) Prove that X1:n is independent of Sn/X1:n = (X1 + · · · + Xn)/X1:n. [Ans]
20.23 Suppose r > 0 and X1 , X2 , . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution
function F with finite expectation and such that there exists x0 > 0 with P[X > x0 ] = 1.
(a) Show that for some i = 1, 2, . . . , n − 1
        E[ X(i+1)^r / X(i)^r | X(i) = x ] = c   with c independent of x, for x > x0
iff there exists α > r/(n − i) such that F(x) = 1 − x0^α/x^α for x > x0.
(b) Show that for some i = 1, 2, . . . , n − 1
        E[ X(i)^r / X(i+1)^r | X(i) = x ] = c   with c independent of x, for x > x0
iff there exists α > r/(n − i) such that F(x) = 1 − x0^α/x^α for x > x0. [DALLAS(1976)] [Ans]
20.24 Suppose X and Y are i.i.d. random variables with the Pareto(α, x0 , 0) distribution. Find the distribution function and
density of Y /X . [Ans]
20.25 Suppose X and Y are i.i.d. random variables with the Pareto(α, x0 , 0) distribution. Let M = min(X, Y ). Prove that M
and Y /X are independent. [Ans]
20.26 A characterization of the Pareto distribution. It is known that if X and Y are i.i.d. random variables with an absolutely
continuous distribution and min(X, Y ) is independent of X − Y , then X and Y have an exponential distribution—see
[C RAWFORD(1966)].
Now suppose X and Y are i.i.d. positive random variables with an absolutely continuous distribution and min(X, Y ) is
independent of Y /X . Prove that X and Y have a Pareto distribution.
Combining this result with exercise 20.25 gives the following characterization theorem: suppose X and Y are i.i.d. pos-
itive random variables with an absolutely continuous distribution; then min(X, Y ) is independent of Y /X if and only if
X and Y have the Pareto distribution. [Ans]
20.27 Another characterization of the Pareto distribution. Suppose X1 , X2 , . . . , Xn are i.i.d. absolutely continuous non-
negative random variables with density function f (x) and distribution function F (x). Suppose further that F (1) = 0 and
f (x) > 0 for all x > 1 and 1 ≤ i < j ≤ n. Show that Xj:n/Xi:n is independent of Xi:n if and only if there exists β > 0
such that each Xi has the Pareto(β, 1, 0) distribution.
Using the fact that X ∼ Pareto(α, x0 , 0) iff X/x0 ∼ Pareto(α, 1, 0), it follows that if F (x0 ) = 0 and f (x) > 0 for all
x > x0 where x0 > 0 then Xj:n/Xi:n is independent of Xi:n if and only if there exists β > 0 such that each Xi has the
Pareto(β, x0 , 0) distribution. [Ans]
21 The t, Cauchy and F distributions

The tn distribution

21.1 Definition of the tn distribution.
Definition(21.1a). Suppose n ∈ (0, ∞). Then the random variable T has a t-distribution with n degrees of
freedom iff
        T = X / √(Y/n)        (21.1a)
where X ∼ N(0, 1), Y ∼ χ2n , and X and Y are independent.
It follows that the conditional distribution of T given Y = y is N (0, n/y).
Density: Finding the density is a routine calculation and is left to exercise 22.1 on page 68 where it is shown that
the density of the tn distribution is
        fT(t) = [1 / (B(1/2, n/2) √n)] (1 + t²/n)^{−(n+1)/2} = [Γ((n+1)/2) / (Γ(n/2) √(πn))] (1 + t²/n)^{−(n+1)/2}   for t ∈ R.        (21.1b)
We can check that the function fT defined in equation(21.1b) is a density for any n ∈ (0, ∞) as follows. Clearly
fT (t) > 0; also, by using the transformation θ = 1/(1 + t2 /n), it follows that
        ∫_{−∞}^{∞} (1 + t²/n)^{−(n+1)/2} dt = 2 ∫_0^{∞} (1 + t²/n)^{−(n+1)/2} dt = √n ∫_0^1 θ^{(n−2)/2} (1 − θ)^{−1/2} dθ = √n B(1/2, n/2)
Hence fT is a density.
Now Y in equation(21.1a) can be replaced by Z1² + · · · + Zn² where Z1, Z2, . . . , Zn are i.i.d. with the N(0, 1) distribution.
Hence Y/n has variance 2/n, and its distribution becomes more and more clustered about the constant 1 as n becomes larger.
Hence T is more spread out than the normal for small n, but tends to the normal as n → ∞.
See exercise 22.7 on page 68 for a mathematical proof of this limiting result.
Figure(21.1a) graphically demonstrates the density of the t-distribution is similar to the shape of the normal den-
sity but has heavier tails.
Figure(21.1a). Plot of the t2, t10 and standard normal densities.
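The defining representation (21.1a) is easy to check by simulation. The following Python sketch (the sample size, seed and the use of scipy.stats.t are choices made here for illustration only) compares simulated values of X/√(Y/n) with the tn distribution:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, reps = 5, 200_000

    X = rng.standard_normal(reps)      # X ~ N(0,1)
    Y = rng.chisquare(n, size=reps)    # Y ~ chi^2_n, independent of X
    T = X / np.sqrt(Y / n)             # T ~ t_n by definition (21.1a)

    # Kolmogorov-Smirnov comparison of the simulated T with the t_n distribution
    print(stats.kstest(T, stats.t(df=n).cdf))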
21.2 Moments of the tn distribution. Suppose T ∼ tn. Now it is well known that the integral ∫_1^∞ (1/x^j) dx
converges if j > 1 and diverges if j ≤ 1. It follows that
        ∫_1^∞ t^r / (n + t²)^{(n+1)/2} dt   converges if r < n.
Hence the function t^r fT(t) is integrable iff r < n.
Provided n > 1, E[T] exists and equals √n E[X] E[1/√Y] = 0.
Provided n > 2, var(T) = E[T²] = n E[X²] E[1/Y] = n/(n − 2) by using equation(11.9b) on page 36 which
gives E[1/Y] = 1/(n − 2).
21.3 Linear transformation of the tn distribution. Suppose m ∈ R, s > 0 and V = m + sT. Then
        fV(v) = fT(t) |dt/dv| = [1 / (B(1/2, n/2) s √n)] [1 + (1/n)((v − m)/s)²]^{−(n+1)/2}        (21.3a)
Also E[V] = m for n > 1 and
        var(V) = s² n/(n − 2)   for n > 2        (21.3b)
This is called a tn(m, s²) distribution.
The Cauchy distribution
21.4 The Cauchy distribution. The Cauchy distribution is basically the t1 distribution.
Definition(21.4a). Suppose a ∈ R and s ∈ (0, ∞). Then the random variable X has the Cauchy distribution,
Cauchy(a, s), iff X has the t1 (a, s2 ) distribution; hence X has the following density
        γs(x) = s / ( π[s² + (x − a)²] )   for x ∈ R.
The standard Cauchy distribution, denoted Cauchy(0, 1), is the same as the t1 distribution and has density
        γ1(x) = 1 / ( π(1 + x²) )   for x ∈ R.
Clearly if a ∈ R, s ∈ (0, ∞) and X ∼ Cauchy(0, 1) = t1 , then a + sX ∼ Cauchy(a, s) = t1 (a, s2 ).
Shape of the density of the Cauchy(a, s) distribution. See exercise 22.9 and figure(21.4a).
Figure(21.4a). Plot of the normal, standard Cauchy and the Cauchy(0, 2) = t1(0, 4) densities.
21.5 Elementary properties of the Cauchy distribution.
• Moments. The expectation, variance and higher moments of the Cauchy distribution are not defined.
• The distribution function of the Cauchy(a, s) distribution. This is
        Fs(x) = (1/π) tan^{−1}( (x − a)/s )   where tan^{−1}( (x−a)/s ) ∈ (0, π).
This is probably better written as
        Fs(x) = 1/2 + (1/π) tan^{−1}( (x − a)/s )   where now tan^{−1}( (x−a)/s ) ∈ (−π/2, π/2).        (21.5a)
• The characteristic function. Suppose the random variable T has the standard Cauchy distribution Cauchy(0, 1).
Then
        φT(t) = E[e^{itT}] = e^{−|t|}        (21.5b)
and hence if W ∼ Cauchy(a, s), then W = a + sT and E[e^{itW}] = e^{iat−s|t|}.
Note. The characteristic function can be derived by using the calculus of residues, or by the following trick. Using integration
by parts gives
        ∫_0^∞ e^{−y} cos(ty) dy = 1 − t ∫_0^∞ e^{−y} sin(ty) dy   and   ∫_0^∞ e^{−y} sin(ty) dy = t ∫_0^∞ e^{−y} cos(ty) dy
and hence
        ∫_0^∞ e^{−y} cos(ty) dy = 1/(1 + t²)
Now the characteristic function of the bilateral exponential¹⁵ density f(x) = (1/2) e^{−|x|} for x ∈ R is
        φ(t) = (1/2) ∫_{−∞}^{∞} (cos(ty) + i sin(ty)) e^{−|y|} dy = ∫_0^∞ e^{−y} cos(ty) dy = 1/(1 + t²)
Because this function is absolutely integrable, we can use the inversion theorem to get
        (1/2) e^{−|t|} = (1/(2π)) ∫_{−∞}^{∞} e^{−ity}/(1 + y²) dy = (1/(2π)) ∫_{−∞}^{∞} e^{ity}/(1 + y²) dy   as required.
The Cauchy distribution is infinitely divisible. If X ∼ Cauchy(a, s) then E[e^{itX}] = e^{iat−s|t|}. Hence if n ∈
{1, 2, . . .}, then X = Y1 + · · · + Yn where Y1 , . . . ,Yn are i.i.d. random variables with the Cauchy(a/n, s/n) dis-
tribution. Hence by definition(5.1a) on page 15, the distribution Cauchy(a, s) is infinitely divisible. Note that
infinite divisibility is a consequence of stability which is proved in exercise 22.14.
Further properties of the Cauchy distribution can be found in exercises 22.9–22.22 starting on page 69.
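Equation (21.5a) also gives a convenient way of simulating the Cauchy distribution by inverting the distribution function (see exercise 22.17). A minimal Python sketch; the parameter values, seed and sample size are arbitrary choices:

    import numpy as np
    from scipy import stats

    # If U ~ uniform(0,1) then a + s*tan(pi*(U - 1/2)) ~ Cauchy(a, s), by inverting (21.5a).
    rng = np.random.default_rng(2)
    a, s, reps = 1.0, 2.0, 200_000

    U = rng.uniform(size=reps)
    X = a + s * np.tan(np.pi * (U - 0.5))

    print(stats.kstest(X, stats.cauchy(loc=a, scale=s).cdf))  # large p-value expected
    print(np.median(X))                                       # close to a; recall the mean is not defined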
The F distribution
21.6 Definition of the F distribution.
Definition(21.6a). Suppose m ∈ (0, ∞) and n ∈ (0, ∞). Suppose further that X ∼ χ2m , Y ∼ χ2n and X and Y
are independent. Then
        F = (X/m) / (Y/n)   has the Fm,n distribution.
Finding the density of the Fm,n distribution is a routine calculation and is left to exercise 22.23 on page 70 where
it is shown that the density of the Fm,n distribution is
        fF(x) = [Γ((m+n)/2) m^{m/2} n^{n/2} x^{m/2−1}] / [Γ(m/2) Γ(n/2) (mx + n)^{(m+n)/2}]   for x ∈ (0, ∞).        (21.6a)
Shape of the density function. If m ∈ (0, 2] then fF is decreasing, whilst if m ∈ (2, ∞) then fF first increases and
then decreases with mode at x = (m − 2)n/m(n + 2). See also exercise 22.24 and figure(21.6a).
Figure(21.6a). Plot of the F10,4 and F10,50 densities.
¹⁵ The bilateral exponential or Laplace distribution is considered in §27.1 on page 81.
21.7 The connection between the t and F distributions. Recall the definition of a tn distribution:
        Tn = X / √(Y/n)
where X and Y are independent, X ∼ N(0, 1) and Y ∼ χ²n.
Now X² ∼ χ²1; hence
        Tn² = (X²/1) / (Y/n) ∼ F1,n        (21.7a)
It follows that if X ∼ Cauchy(0, 1) = t1, then X² ∼ F1,1.
Example(21.7a). Using knowledge of the F density and equation(21.7a), find the density of Tn.
Solution. Let W = Tn²; hence W ∼ F1,n. Then
        fW(w) = [fTn(−√w) + fTn(√w)] / (2√w)
But equation(21.7a) clearly implies the distribution of Tn is symmetric about 0; hence for w > 0
        fW(w) = fTn(√w)/√w   and   fTn(w) = w fW(w²) = w [Γ((n+1)/2) n^{n/2} w^{−1}] / [Γ(1/2) Γ(n/2) (w² + n)^{(n+1)/2}] = [Γ((n+1)/2) / (√(nπ) Γ(n/2))] (1 + w²/n)^{−(n+1)/2}
Finally, by symmetry, fTn(−w) = fTn(w).
21.8 Properties of the F distribution. The following properties of the F -distribution are considered in exercises
22.25–22.33 on pages 70–71.
• If X ∼ Fm,n then 1/X ∼ Fn,m .
• If X ∼ Fm,n then E[X] = n/(n − 2) for n > 2 and var[X] = 2n2 (m + n − 2)/[m(n − 2)2 (n − 4)] for n > 4.
See exercise 22.25 on page 70.
• If X1 ∼ gamma(n1, α1), X2 ∼ gamma(n2, α2) and X1 and X2 are independent then
        (n2 α1 X1) / (n1 α2 X2) ∼ F_{2n1,2n2}        (21.8a)
In particular, if X and Y are i.i.d. with the exponential(λ) distribution, then X/Y ∼ F2,2.
• If X ∼ beta(m/2, n/2) then nX/[m(1−X)] ∼ Fm,n. See exercise 22.29 on page 71.        (21.8b)
• If X ∼ Fm,n then mX/(n+mX) ∼ beta(m/2, n/2) and n/(n+mX) ∼ beta(n/2, m/2). See exercise 22.30 on page 71.        (21.8c)
• If X ∼ Fm,n then mX/n ∼ beta′(m/2, n/2). See exercise 22.31 on page 71.        (21.8d)
• Suppose X ∼ Fm,n. Then mX →D χ²m as n → ∞. See exercise 22.32 on page 71.
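Relations such as (21.8c) above are easy to verify numerically. A minimal Python sketch; the degrees of freedom, seed and sample size are arbitrary choices made for illustration only:

    import numpy as np
    from scipy import stats

    # Check of (21.8c): if X ~ F(m,n) then mX/(n + mX) ~ beta(m/2, n/2).
    rng = np.random.default_rng(3)
    m, n, reps = 6, 11, 200_000

    X = stats.f(dfn=m, dfd=n).rvs(size=reps, random_state=rng)
    B = m * X / (n + m * X)

    print(stats.kstest(B, stats.beta(a=m/2, b=n/2).cdf))
    print(B.mean(), (m/2) / (m/2 + n/2))   # sample mean vs the beta mean m/(m+n)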
21.9 Fisher’s z distribution.
Definition(21.9a). If X ∼ Fm,n , then
        ln(X)/2 ∼ FisherZ(m, n)
It follows that if X ∼ FisherZ (n, m) then e2X ∼ Fn,m .
The skew t distribution
21.10 The skew t-distribution. The idea is to split the factor n/(n + t²) in the tn density into
        [1 − t/√(n + t²)] × [1 + t/√(n + t²)]
and apply different powers to each factor. This leads to the following definition.
Definition(21.10a). Suppose a > 0 and b > 0. Then the random variable X has the skew t-distribution iff X has density
        fX(x) = [Γ(a + b) / (2^{a+b−1} Γ(a) Γ(b) √(a + b))] [1 + x/√(a + b + x²)]^{a+1/2} [1 − x/√(a + b + x²)]^{b+1/2}   for x ∈ R.        (21.10a)
Note:
• This distribution is denoted skewt(a, b).
• Using the standard algebraic result (1 + y)(1 − y) = 1 − y² shows that an alternative expression is
        fX(x) = [Γ(a + b) / (2^{a+b−1} Γ(a) Γ(b))] (1/√(a + b + x²)) [1 + x/√(a + b + x²)]^{a} [1 − x/√(a + b + x²)]^{b}   for x ∈ R.
• To prove fX(x) in equation(21.10a) integrates to 1 we use the transformation
        v = x/√(a + b + x²)   which implies   dv/dx = (a + b)/(a + b + x²)^{3/2}
Note that dv/dx > 0 for all x and hence the transformation is a 1-1 map: (−∞, ∞) −→ (−1, 1). Hence
        ∫_{−∞}^{∞} (1/√(a + b + x²)) [1 + x/√(a + b + x²)]^{a} [1 − x/√(a + b + x²)]^{b} dx
            = ∫_{−∞}^{∞} [(a + b)/(a + b + x²)^{3/2}] [1 + x/√(a + b + x²)]^{a−1} [1 − x/√(a + b + x²)]^{b−1} dx
            = ∫_{−1}^{1} (1 + v)^{a−1} (1 − v)^{b−1} dv = 2^{a+b−1} ∫_0^1 w^{a−1} (1 − w)^{b−1} dw = 2^{a+b−1} Γ(a)Γ(b)/Γ(a + b)
where w = (1 + v)/2. This proves that fX(x) in equation(21.10a) is a density.
21.11 Properties of the skew t-distribution.
• Suppose X ∼ skewt(a, b) and Y ∼ skewt(b, a). Then fY (x) = fX (−x) for all x ∈ R.
• Suppose a ∈ (0, ∞). The skewt(a, a) distribution is the same as the t2a distribution. (Exercise 22.34.)
The shape of the density is displayed in figure (21.11a).
Figure(21.11a). Plot of the skew t-density for various a and b with a + b = 6. Note that a = b = 3 is the t6 density.
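One way of sampling from the skewt(m, n) distribution is via the beta representation of exercise 22.35(c) below. The following Python sketch is illustrative only; the parameter values, seed and the comparison with the exact mean of exercise 22.36 are choices made here:

    import numpy as np
    from scipy import stats
    from scipy.special import gamma as G

    # Sample skewt(m, n) via exercise 22.35(c): sqrt(m+n)*(2B-1)/(2*sqrt(B*(1-B))) with B ~ beta(m, n).
    rng = np.random.default_rng(4)
    m, n, reps = 4.0, 2.0, 400_000

    B = stats.beta(m, n).rvs(size=reps, random_state=rng)
    X = np.sqrt(m + n) * (2*B - 1) / (2*np.sqrt(B*(1 - B)))

    # exact mean from exercise 22.36 (valid for m > 1/2 and n > 1/2)
    exact_mean = np.sqrt(m + n) * (m - n) * G(m - 0.5) * G(n - 0.5) / (2 * G(m) * G(n))
    print(X.mean(), exact_mean)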
21.12 Summary.
• The tn distribution. The random variable T ∼ tn iff
        T = X / √(Y/n)
where X ∼ N(0, 1), Y ∼ χ²n and X and Y are independent.
• Moments:
        E[T] = 0        var[T] = n/(n − 2) for n > 2        E[1/Y] = 1/(n − 2)
• Suppose T ∼ tn, m ∈ R and s > 0. Then V = m + sT ∼ tn(m, s²).
The Cauchy distribution. This has density
        γ1(t) = 1 / ( π(1 + t²) )   for t ∈ R.
It is the t1 distribution. The Cauchy(a, s) distribution is the same as the t1(a, s²) distribution.
The F distribution. Suppose m > 0 and n > 0. Suppose further that X ∼ χ²m, Y ∼ χ²n and X and Y are independent. Then
        F = (X/m) / (Y/n)   has an Fm,n distribution.
• If X ∼ tn then X² ∼ F1,n.
22 Exercises                                            (exs-tCauchyF.tex)

The t distribution.
22.1 (a) Using the definition of the tn distribution given in definition(21.1a) on page 63, show that the density of the tn dis-
tribution is given by equation(21.1b).
(b) Using the fact that the conditional distribution of T given Y = y is N (0, n/y), show that the density of the tn distri-
bution is given by equation(21.1b). [Ans]
22.2 Shape of the tn density function. Suppose X ∼ tn has the density fX .
(a) Show that fX (x) is symmetric about x = 0.
(b) Show that fX is initially increasing and then decreasing with mode at x = 0. Also fX (x) → 0 as x → ±∞.
(c) Show that fX is initially convex, then concave and then convex again with points of inflection at x = −√(n/(n + 2))
and x = √(n/(n + 2)). [Ans]
22.3 Moments of the tn distribution. Suppose T ∼ tn and k ∈ {1, 2, 3, . . . , n − 1}. Clearly E[T k ] = 0 when k is odd. If k is
even, and hence k = 2r where r ∈ {1, 2, 3, . . .} and r ≤ (n − 1)/2, then prove that
        E[T^k] = n^{k/2} Γ((k+1)/2) Γ((n−k)/2) / [Γ(1/2) Γ(n/2)] = n^{k/2} (k − 1)(k − 3) · · · 3·1 / [(n − 2)(n − 4) · · · (n − k + 2)(n − k)]
or, in terms of r:
        E[T^{2r}] = n^r Γ(r + 1/2) Γ(n/2 − r) / [Γ(1/2) Γ(n/2)] = n^r (2r − 1)(2r − 3) · · · 3·1 / [(n − 2)(n − 4) · · · (n − 2r + 2)(n − 2r)]
If k ≥ n then
        E[T^k] = ∞ if k is even and k ≥ n;   E[T^k] is undefined if k is odd and k ≥ n.        [Ans]
22.4 Suppose X ∼ tn .
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
22.5 Suppose X, Y1, . . . , Yn are i.i.d. random variables with the N(0, σ²) distribution. Find the distribution of
        Z = X / √( (Y1² + · · · + Yn²)/n )        [Ans]
22.6 Suppose n > 0, s > 0 and α ∈ R. Show that
        ∫_{−∞}^{∞} [1 + (t − α)²/s²]^{−n/2} dt = s B(1/2, (n − 1)/2)        [Ans]
22.7 (a) Prove that the limit as n → ∞ of the tn density given in equation(21.1b) is the standard normal density.
(b) Suppose Tn ∼ tn for n ∈ {1, 2, . . .}. Show that Tn → Z a.e. as n → ∞. [Ans]
22.8 The t2 distribution.
(a) Show that the density of the t2 distribution is
        f(x) = 1/(2 + x²)^{3/2}   for x ∈ R.
(b) Show that the distribution function of the t2 distribution is
        F(x) = (1/2) [1 + x/√(2 + x²)]   for x ∈ R.
(c) Show that the quantile function Q(u) = F^{−1}(u) is
        Q(u) = (2u − 1)/√(2u(1 − u))   for u ∈ (0, 1).
(d) Moments. E[X] exists and equals 0; E[X^n] does not exist for n > 1.
(e) Measures of variability. Suppose T ∼ t2. Show that the mean absolute deviation E|T| = √2 and the interquartile
range is 2√(2/3) ≈ 1.633.
(f) Suppose X ∼ N(0, 1) and Y ∼ exponential (1) and X and Y are independent. Show that
        X/√Y ∼ t2
(g) Suppose X and Y are i.i.d. random variables with the exponential (1) distribution. Show that
        X/Y ∼ F2,2   and hence   (X − Y)/√(2XY) ∼ t2        [Ans]
The Cauchy distribution.
22.9 Shape of the Cauchy density. Suppose γs denotes the density of the distribution Cauchy(a, s).
(a) Show that γs (x) is symmetric about x = a.
(b) Show that γs (x) first increases and then decreases with mode at x = a.
(c) Show that γs(x) is initially convex, then concave and then convex with points of inflection at x = a − s/√3 and
x = a + s/√3. [Ans]
22.10 The quantile function and median of the Cauchy distribution. Suppose a ∈ R, s ∈ (0, ∞) and X ∼ Cauchy(a, s). Show
that the quantile function of X is
Fs−1 (p) = a + s tan[π(p − 1/2)] for p ∈ (0, 1).
and the median of X is a. [Ans]
22.11 Linear transformation of a Cauchy distribution. Suppose X ∼ Cauchy(a, s), c ∈ R and d ∈ (0, ∞). Show that
Y = c + dX ∼ Cauchy(c + da, ds).
It follows that the family of Cauchy distributions {Cauchy(a, s) : a ∈ R, s > 0} form a location-scale family—see
definition(1.6b) on page 5. [Ans]
22.12 Sums of independent Cauchy distributions.
(a) Suppose X1 ∼ Cauchy(a1 , s1 ) and X2 ∼ Cauchy(a2 , s2 ). Suppose further that X1 and X2 are independent. Show
that X1 + X2 ∼ Cauchy(a1 + a2 , s1 + s2 ).
(b) Suppose X1 , . . . , Xn are i.i.d. random variables with the Cauchy(a, s) distribution. Show that X1 + · · · + Xn ∼
Cauchy(na, ns). [Ans]
22.13 Suppose X1 , . . . , Xn are i.i.d. with density γs .
(a) Show that Y = (X1 + · · · + Xn)/n also has the Cauchy(0, s) distribution.
(b) Let Mn = median(X1 , . . . , Xn ). Show that Mn is asymptotically normal with mean 0 and a variance which tends
to 0. [Ans]
22.14 The Cauchy distribution is stable. Suppose X ∼ Cauchy(a, s) where s > 0.
(a) Prove that X has a strictly stable distribution with characteristic exponent α = 1.
(b) Using the notation in equation(4.3b), show that X is stable with characteristic function {c = a, d = s, β = 0}. [Ans]
22.15 (a) Suppose X ∼ Cauchy(0, 1). Show that Y = 1/X ∼ Cauchy(0, 1).
(b) Suppose X ∼ Cauchy(0, s). Show that Y = 1/X ∼ Cauchy(0, 1/s).
(c) Suppose X has a non-central Cauchy(m, s) distribution with median m. Hence
        fX(x) = s / ( π[s² + (x − m)²] )   for x ∈ R.
Find the density of Y = 1/X. [Ans]
22.16 Suppose X and Y are i.i.d. with the N (0, σ 2 ) distribution. Find the distribution of:
(a) W = X/Y ; (b) W = X/|Y |; (c) W = |X|/|Y |. [Ans]
22.17 (a) Suppose U has the uniform distribution uniform(− π/2, π/2). Show that tan(U ) ∼ Cauchy(0, 1).
(b) Suppose U has the uniform distribution uniform(−π, π). Show that tan(U ) ∼ Cauchy(0, 1).
(c) Suppose a ∈ R and s ∈ (0, ∞). Suppose further that U ∼ uniform(0, 1). Show that a + s tan[π(U − 1/2)] ∼
Cauchy(a, s). Conversely, if X ∼ Cauchy(a, s) then 1/2 + (1/π) tan^{−1}( (X − a)/s ) ∼ uniform(0, 1). [Ans]
22.18 (a) Suppose X ∼ Cauchy(0, s). Find the density of 2X. (This shows that 2X has the same distribution as X1 + X2
where X1 and X2 are i.i.d. with the same distribution as X.)
(b) Suppose U and V are i.i.d. with the Cauchy(0, s) distribution. Let X = aU + bV and Y = cU + dV . Find the
distribution of X + Y . [Ans]
22.19 Suppose X and Y are i.i.d. with the N(0, 1) distribution. Define R and Θ by R² = X² + Y² and tan(Θ) = Y/X where
R > 0 and Θ ∈ (−π, π). Show that R² has the χ²2 distribution, tan(Θ) has the Cauchy(0, 1) distribution, and R and Θ
are independent. Show also that the density of R is r e^{−r²/2} for r > 0. [Ans]
22.20 Suppose X has the Cauchy(0, 1) distribution.
(a) Find the density of 2X/(1 − X²). Hint: tan(2θ) = 2 tan(θ)/(1 − tan²(θ)).
(b) Find the density of V = (1/2)(X − 1/X). [Ans]
22.21 From a point O, radioactive particles are directed at an absorbing line which is at a distance b from O. Suppose OP
denotes the perpendicular from the point O to the absorbing line—and hence the length of OP is b. The direction of
emission is measured by the angle Θ from the straight line OP . Suppose Θ is equally likely to be any direction in
(− π/2, π/2). Formally, Θ ∼ uniform(− π/2, π/2).
(a) Determine the density of X, the distance from P where the particle hits the absorbing line.
(b) What is the density of 1/X ? [Ans]
22.22 The symmetric Cauchy distribution in R². Define the function f : R² → (0, ∞) by
        f(x, y) = 1 / [2π(1 + x² + y²)^{3/2}]
(a) Show that f is a density function.
(b) Find the marginal densities.
(c) Suppose (X, Y ) has the density f and we transform to polar coordinates: X = R cos Θ and Y = R sin Θ. Show that
R and Θ are independent and find the distributions of R and Θ.
The last question can be generalized to produce this density—in this case, the direction must be uniform over the surface
of a hemisphere. [Ans]
The F distribution.
22.23 (a) By using a bivariate transformation and definition(21.6a) on page 65, show that the density of the Fm,n distribution
is given by equation(21.6a) on page 65.
(b) Using definition(21.6a) on page 65, we see that the distribution of F given Y = y is
        (n/(my)) X   where X ∼ χ²m = gamma(m/2, 1/2).
By §11.3 on page 34, we see that the distribution of F given Y = y is gamma(m/2, my/2n). Hence derive the density
of the Fm,n distribution given in equation(21.6a) on page 65. [Ans]
22.24 Shape of the F density. Suppose m ∈ (0, ∞), n ∈ (0, ∞), and X ∼ Fm,n with density f .
(a) Suppose m ∈ (0, 2). Show that f is decreasing on (0, ∞) with f (x) → ∞ as x → 0.
(b) Suppose m = 2. Show that f is decreasing with mode at x = 0 with f (0) = 1 − 2/n.
(c) Suppose m ∈ (2, ∞). Show that f first increases and then decreases with mode at x = (m − 2)n/m(n + 2).
(d) Suppose m ∈ (0, 2]; show that f is convex. Suppose m ∈ (2, 4]; show that f is initially concave and then convex
with point of inflection at
        β = n(m − 2)/[m(n + 2)] + (n/m) √[2(m − 2)(n + 4)(m + n)] / [(n + 2)(n + 4)]
Suppose m ∈ (4, ∞); show that f is initially convex, then concave and then convex again with points of inflection at
        α = n(m − 2)/[m(n + 2)] − (n/m) √[2(m − 2)(n + 4)(m + n)] / [(n + 2)(n + 4)]   and at β.
Note that α > 0 when m > 4. [Ans]
22.25 Moments of the F distribution. Suppose F has the Fm,n distribution.
(a) Show E[F] = ∞ for n ∈ (0, 2] and E[F] = n/(n − 2) for n > 2.
(b) Show that var[F] is undefined for n ∈ (0, 2]; var[F] = ∞ for n ∈ (2, 4] and
        var[F] = 2n²(m + n − 2) / [m(n − 2)²(n − 4)]   for n > 4.
(c) Suppose r ∈ {1, 2, . . .}; show that
        E[F^r] = (n/m)^r m(m + 2) · · · (m + 2r − 2) / [(n − 2)(n − 4) · · · (n − 2r)]   for n > 2r.        [Ans]
22.26 Suppose X ∼ F (m, n).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
22.27 Suppose X and Y are i.i.d. N(0, σ²). Find the density of Z where
        Z = Y²/X²   if X ≠ 0;   and   Z = 0   if X = 0.        [Ans]
22.28 (a) Suppose X ∼ Fm,n. Show that 1/X ∼ Fn,m.
(b) Suppose X1 ∼ gamma(n1, α1), X2 ∼ gamma(n2, α2) and X1 and X2 are independent. Show that
        (n2 α1 X1) / (n1 α2 X2) ∼ F_{2n1,2n2}
In particular, if X1 ∼ exponential (λ), X2 ∼ exponential (λ) and X1 and X2 are independent, then
        X1/X2 ∼ F2,2        [Ans]
22.29 Suppose X ∼ beta(m/2, n/2). Show that nX/[m(1 − X)] ∼ Fm,n. [Ans]
22.30 (a) Suppose X ∼ Fm,n. Show that mX/(n + mX) ∼ beta(m/2, n/2) and n/(n + mX) ∼ beta(n/2, m/2).
(b) Suppose X ∼ F2α,2β where α > 0 and β > 0. Show that αX/β ∼ beta′(α, β). [Ans]
22.31 Suppose W ∼ Fm,n. Show that mW/n ∼ beta′(m/2, n/2). Conversely, if Y ∼ beta′(a, b) then bY/a ∼ F2a,2b. [Ans]
22.32 Suppose W ∼ Fm,n. Show that mW →D χ²m as n → ∞. [Ans]
22.33 (a) Suppose X ∼ Fn,n. Show that
        (√n/2) ( √X − 1/√X ) ∼ tn
(b) Suppose X ∼ χ²n, Y ∼ χ²n and X and Y are independent. Let
        T = (√n/2) (X − Y)/√(XY)
Prove that T ∼ tn. [Ans]
The skew t distribution.
22.34 Suppose a ∈ (0, ∞). Show that the skewt(a, a) distribution is the same as the t2a distribution. [Ans]
22.35 (a) Suppose X ∼ F2m,2n and W = mX/n. Show that
        (√(m + n)/2) ( √W − 1/√W ) ∼ skewt(m, n)
(b) Suppose Y ∼ χ²2m, Z ∼ χ²2n and Y and Z are independent. Show that
        (√(m + n)/2) (Y − Z)/√(YZ) ∼ skewt(m, n)
(c) Suppose B ∼ beta(m, n). Show that
        (√(m + n)/2) (2B − 1)/√(B(1 − B)) ∼ skewt(m, n)        [Ans]
22.36 Suppose X ∼ skewt(m, n). Show that
        E[X] = [(m + n)^{1/2} (m − n)/2] Γ(m − 1/2) Γ(n − 1/2) / [Γ(m) Γ(n)]   and   E[X²] = [(m + n)/4] [(m − n)² + m + n − 2] / [(m − 1)(n − 1)]
where the result for E[X] holds for m > 1/2 and n > 1/2 and the result for E[X²] holds for m > 1 and n > 1. [Ans]
22.37 Suppose X ∼ skewt(m, n). Show that the density of X is unimodal with mode at
        (m − n) √(m + n) / [√(2m + 1) √(2n + 1)]        [Ans]
23 Non-central chi-squared, t and F
23.1 The non-central χ2 -distribution with 1 degree of freedom. We know that if Z ∼ N (0, 1), then Z 2 ∼ χ21 .
Now suppose
W = (Z + a)2 where Z ∼ N (0, 1) and a ∈ R.
Then W is said to have a non-central χ21 distribution with non-centrality parameter a2 .
We can also write W ∼ Y 2 where Y ∼ N (a, 1).
The moment generating function of W.
        E[e^{tW}] = E[e^{t(Z+a)²}] = (1/√(2π)) ∫_{−∞}^{∞} e^{t(z+a)²} e^{−z²/2} dz
But
        t(z + a)² − z²/2 = z²t + 2azt + a²t − z²/2 = z²(t − 1/2) + 2azt + a²t
                = (t − 1/2) [ z² − 4azt/(1 − 2t) − 2a²t/(1 − 2t) ]
                = −[(1 − 2t)/2] [ (z − 2at/(1 − 2t))² − 2a²t/(1 − 2t) − 4a²t²/(1 − 2t)² ]
                = −[(1 − 2t)/2] [ (z − 2at/(1 − 2t))² − 2a²t/(1 − 2t)² ]
and hence, if α = 2at/(1 − 2t) and t < 1/2,
        E[e^{tW}] = E[e^{t(Z+a)²}] = exp( a²t/(1 − 2t) ) (1/√(2π)) ∫_{−∞}^{∞} exp( −(1 − 2t)(z − α)²/2 ) dz
                = (1 − 2t)^{−1/2} exp( a²t/(1 − 2t) )        (23.1a)
The density of W. Using the usual transformation formula for densities shows that for w > 0 we have
        fW(w) = [φ(√w − a) + φ(−√w − a)] / (2√w) = [φ(√w − a) + φ(√w + a)] / (2√w)
                = exp(−(w + a²)/2) [exp(a√w) + exp(−a√w)] / (2√(2πw))        (23.1b)
                = (1/√(2πw)) exp(−(w + a²)/2) cosh(a√w)   because cosh(x) = (e^x + e^{−x})/2 for all x ∈ R.
The standard expansion for cosh is cosh(x) = Σ_{j=0}^{∞} x^{2j}/(2j)! for all x ∈ R; also Γ(n + 1/2) = (2n)! √π/(4^n n!) for
all n = 0, 1, 2, . . . ; hence
        fW(w) = (1/√(2w)) exp(−(w + a²)/2) Σ_{j=0}^{∞} (a²w)^j / (√π (2j)!) = (1/√(2w)) exp(−(w + a²)/2) Σ_{j=0}^{∞} (a²w/4)^j / (j! Γ(j + 1/2))
                = (1/√(2w)) exp(−(w + a²)/2) (a√w/2)^{1/2} I_{−1/2}(a√w)
                = (1/2) (a²/w)^{1/4} exp(−(w + a²)/2) I_{−1/2}(a√w)        (23.1c)
where, for all x > 0,
        I_{−1/2}(x) = Σ_{j=0}^{∞} [1/(j! Γ(j + 1/2))] (x/2)^{2j−1/2}
is a modified Bessel function of the first kind.
Note. The general definition of a modified Bessel function of the first kind is
        Iν(x) = (x/2)^ν Σ_{j=0}^{∞} x^{2j} / (4^j j! Γ(ν + j + 1))   for all ν ∈ R and x ∈ C.        (23.1d)
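The density formula (23.1b) can be checked against a library implementation of the non-central χ² distribution. A minimal Python sketch (scipy's ncx2 is parametrised by the same non-centrality a² as here; the value of a and the grid are arbitrary choices):

    import numpy as np
    from scipy import stats

    # Compare the density (23.1b) of W = (Z + a)^2 with scipy's noncentral chi^2
    # with 1 degree of freedom and noncentrality a^2.
    a = 1.7
    w = np.linspace(0.05, 12.0, 7)

    phi = stats.norm.pdf
    f_formula = (phi(np.sqrt(w) - a) + phi(np.sqrt(w) + a)) / (2*np.sqrt(w))   # equation (23.1b)
    f_scipy = stats.ncx2(df=1, nc=a**2).pdf(w)

    print(np.max(np.abs(f_formula - f_scipy)))   # essentially zero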
23.2 The non-central χ²-distribution with n degrees of freedom where n ∈ {1, 2, . . .}.
Suppose X1 ∼ N(µ1, 1), X2 ∼ N(µ2, 1), . . . , Xn ∼ N(µn, 1) are independent. Then Σ_{j=1}^n (Xj − µj)² ∼ χ²n but
Σ_{j=1}^n Xj² does not have a χ² distribution. We say
        W = Σ_{j=1}^n Xj²   has a non-central χ²n distribution with non-centrality parameter λ = Σ_{j=1}^n µj².        (23.2a)
This can be written as: suppose X ∼ N(µ, In) then XᵀX ∼ χ²_{n,µᵀµ}. In particular, if X ∼ N(µ, σ²) then
X²/σ² ∼ χ²_{1,µ²/σ²}, the non-central χ²1 distribution with non-centrality parameter µ²/σ².
Note that some authors define the non-centrality parameter to be λ/2.
Moments. See exercise 24.1 on page 76 for the following moments:
        E[W] = n + λ   and   var[W] = 2n + 4λ
So we see that if W1 has a non-central χ²n distribution with non-centrality parameter λ and W2 ∼ χ²n, then
E[W1] ≥ E[W2] because λ = Σ_{j=1}^n µj² ≥ 0.
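The representation (23.2a) and the moments above are easy to check by simulation. A minimal Python sketch; the means µj, seed and sample size are arbitrary choices:

    import numpy as np

    # W = sum_j X_j^2 with X_j ~ N(mu_j, 1) independent (equation 23.2a);
    # check E[W] = n + lambda and var[W] = 2n + 4*lambda.
    rng = np.random.default_rng(5)
    mu = np.array([0.5, -1.0, 2.0])          # arbitrary means; lambda = sum(mu^2)
    n, lam, reps = len(mu), float(np.sum(mu**2)), 300_000

    X = rng.standard_normal((reps, n)) + mu
    W = np.sum(X**2, axis=1)

    print(W.mean(), n + lam)                 # ~ n + lambda
    print(W.var(), 2*n + 4*lam)              # ~ 2n + 4*lambda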
The moment generating function of W. By equation(23.1a) the moment generating function of X1² is
        E[e^{tX1²}] = (1 − 2t)^{−1/2} exp( µ1²t/(1 − 2t) )
Hence
        E[e^{tW}] = (1 − 2t)^{−n/2} exp( λt/(1 − 2t) )   for t < 1/2.        (23.2b)
Distribution of S². Suppose X1 ∼ N(µ1, σ²), X2 ∼ N(µ2, σ²), . . . , Xn ∼ N(µn, σ²) are independent. Then
        (n − 1)S²/σ² = Σ_{k=1}^n (Xk − X̄)²/σ² ∼ χ²_{n−1,λ}   where λ = Σ_{k=1}^n (µk − µ̄)²/σ²
For the proof see example(42.13b) on page 137.
23.3 The non-central χ2 -distribution with n degrees of freedom—the basic decomposition theorem. The
easiest proof is by using moment generating functions and equation(23.2b). Here is a proof from first principles:
Proposition(23.3a). Suppose n ∈ {1, 2, . . .} and W has a non-central χ2n distribution with non-centrality
parameter λ > 0. Then W has the same distribution as U + V where:
U has a non-central χ21 distribution with non-centrality parameter λ;
V has a χ2n−1 distribution;
U and V are independent.
Proof. Let µj = √(λ/n) for j = 1, . . . , n; hence Σ_{j=1}^n µj² = λ.
We are given that W has a non-central χ²n distribution with non-centrality parameter λ > 0. Hence W ∼ Σ_{j=1}^n Xj² where
X1, . . . , Xn are independent with Xj ∼ N(µj, 1) for j = 1, . . . , n.
Let e1 = (1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1) denote the standard basis of Rⁿ. Set b1 = (µ1/√λ, . . . , µn/√λ). Then
{b1, e2, . . . , en} form a basis of Rⁿ. Use the Gram-Schmidt orthogonalization procedure to create the basis {b1, . . . , bn}.
Define B to be the n × n matrix with rows {b1, . . . , bn}; then B is orthogonal.
Suppose X = (X1, . . . , Xn) is the n × 1 vector of the Xj and set Y = BX. Then Y ∼ N(Bµ, BIBᵀ = I) where µ = (µ1, . . . , µn) = √λ b1. Hence
Y1 ∼ N(b1ᵀµ = √λ, 1) and Yj ∼ N(bjᵀµ = 0, 1) for j = 2, . . . , n and Y1, . . . , Yn are independent. Also YᵀY = XᵀX.
Finally, let U = Y1² and V = Σ_{j=2}^n Yj².
23.4 The non-central χ²-distribution with n degrees of freedom—the density function. We use proposition(23.3a).
Now U has a non-central χ²1 distribution with non-centrality parameter λ. Using equation(23.1b) gives
        fU(u) = [1/(2^{3/2} Γ(1/2) √u)] e^{−(u+λ)/2} [e^{√(λu)} + e^{−√(λu)}]   for u > 0.
Also, V ∼ χ²_{n−1} has density
        fV(v) = e^{−v/2} v^{(n−3)/2} / [2^{(n−1)/2} Γ((n−1)/2)]   for v > 0.
Using independence of U and V gives
        f_(U,V)(u, v) = [u^{−1/2} v^{(n−3)/2} e^{−(u+v)/2} e^{−λ/2} / (2^{(n+2)/2} Γ(1/2) Γ((n−1)/2))] [e^{√(λu)} + e^{−√(λu)}]
Now use the transformation X = U + V and Y = V. The Jacobian equals 1. Hence for y > 0 and x > y
        f_(X,Y)(x, y) = [e^{−x/2} e^{−λ/2} x^{(n−4)/2} / (2^{n/2} Γ(1/2) Γ((n−1)/2))] (y/x)^{(n−3)/2} (x/(x−y))^{1/2} [e^{√(λ(x−y))} + e^{−√(λ(x−y))}] / 2
Now
        (x/(x−y))^{1/2} [e^{√(λ(x−y))} + e^{−√(λ(x−y))}] / 2 = (x/(x−y))^{1/2} Σ_{j=0}^{∞} λ^j (x − y)^j / (2j)!
                = Σ_{j=0}^{∞} [(λx)^j / (2j)!] (1 − y/x)^{j−1/2}
and so we have
        f_(X,Y)(x, y) = [e^{−x/2} e^{−λ/2} x^{(n−4)/2} / (2^{n/2} Γ(1/2) Γ((n−1)/2))] Σ_{j=0}^{∞} [(λx)^j / (2j)!] (y/x)^{(n−3)/2} (1 − y/x)^{j−1/2}   for y > 0 and x > y.
We need to integrate out y. By setting w = y/x we get
        ∫_{y=0}^{x} (y/x)^{(n−3)/2} (1 − y/x)^{j−1/2} dy = x ∫_{w=0}^{1} w^{(n−3)/2} (1 − w)^{j−1/2} dw = x B((n−1)/2, j + 1/2) = x Γ((n−1)/2) Γ(j + 1/2) / Γ(n/2 + j)
and hence for x > 0
        fX(x) = [e^{−x/2} e^{−λ/2} x^{(n−2)/2} / (2^{n/2} Γ(n/2))] Σ_{j=0}^{∞} [(λx)^j / (2j)!] [Γ(j + 1/2) Γ(n/2) / (Γ(1/2) Γ(n/2 + j))]
                = [e^{−x/2} e^{−λ/2} x^{(n−2)/2} / 2^{n/2}] Σ_{j=0}^{∞} (λx)^j / (4^j j! Γ(n/2 + j))        (23.4a)
The expression for the modified Bessel function of the first kind in equation(23.1d) on page 72 gives
        I_{n/2−1}(√(λx)) = [(λx)^{(n−2)/4} / 2^{n/2−1}] Σ_{j=0}^{∞} (λx)^j / (4^j j! Γ(n/2 + j))
Hence an alternative expression for the density is
        fX(x) = (1/2) e^{−x/2} e^{−λ/2} (x/λ)^{(n−2)/4} I_{n/2−1}(√(λx))        (23.4b)
This is the same as equation(23.1c) if we set n = 1 and λ = a².
A plot of the density of the χ28 distribution and the density of the non-central χ28 distribution with non-centrality
parameter µ equal to 5 and 10 is given in figure(23.4a).
Figure(23.4a). Plot of the non-central χ²8 density for various values of the non-centrality parameter µ.
23.5 The general definition of the non-central χ² for any n ∈ (0, ∞). Now exercise 24.2 shows that if f is
the density of the non-central χ²n distribution with non-centrality parameter λ, then
        f(x) = Σ_{j=0}^{∞} [e^{−λ/2} (λ/2)^j / j!] f_{n+2j}(x)        (23.5a)
where f_{n+2j}(x) is the density of the χ²_{n+2j} distribution. This representation permits the following generalization:
Definition(23.5a). Suppose λ ∈ [0, ∞) and n ∈ (0, ∞). Then the random variable X has the non-central χ²
distribution with n degrees of freedom and non-centrality parameter λ iff X has the density function given
in equation (23.5a).
Suppose X ∼ χ²_{n,λ} where λ ∈ [0, ∞) and n ∈ (0, ∞).
Distribution function. By integrating equation(23.5a) we get the distribution function of X:
        FX(x) = Σ_{j=0}^{∞} [e^{−λ/2} (λ/2)^j / j!] F_{n+2j}(x)   for x ∈ (0, ∞),
where F_{n+2j} is the distribution function of the χ²_{n+2j} distribution.
Moment generating function. For t ∈ (−∞, 1/2) we have
        MX(t) = E[e^{tX}] = Σ_{j=0}^{∞} [e^{−λ/2} (λ/2)^j / j!] ∫_0^∞ e^{tx} f_{n+2j}(x) dx
                = Σ_{j=0}^{∞} [e^{−λ/2} (λ/2)^j / j!] (1 − 2t)^{−(n/2+j)} = (1 − 2t)^{−n/2} Σ_{j=0}^{∞} e^{−λ/2} (λ/2)^j / [j! (1 − 2t)^j]
                = (1 − 2t)^{−n/2} exp( −λ/2 + λ/(2(1 − 2t)) ) = (1 − 2t)^{−n/2} exp( λt/(1 − 2t) )
Representation in terms of the Poisson distribution. Suppose V has a Poisson distribution with mean λ/2 and the
distribution of W given V = j is the χ²_{n+2j} distribution. Then W has the moment generating function
        E[e^{tW}] = Σ_{j=0}^{∞} E[e^{tW} | V = j] e^{−λ/2} (λ/2)^j / j! = Σ_{j=0}^{∞} (1 − 2t)^{−(n/2+j)} e^{−λ/2} (λ/2)^j / j! = E[e^{tX}]
Hence the distribution of W is the non-central χ²_{n,λ} distribution.
Moments. Using the fact that X ∼ W we have
        E[X] = E[W] = Σ_{j=0}^{∞} E[W | V = j] e^{−λ/2} (λ/2)^j / j! = Σ_{j=0}^{∞} (n + 2j) e^{−λ/2} (λ/2)^j / j! = n + 2E[V] = n + λ        (23.5b)
Similarly
        E[X²] = Σ_{j=0}^{∞} (n + 2j)(n + 2j + 2) e^{−λ/2} (λ/2)^j / j! = n² + 2n + 4(n + 1)E[V] + 4E[V²]
                = n² + 2n + 2(n + 1)λ + 4(λ/2 + λ²/4)        (23.5c)
and hence
        var[X] = 2(n + 2λ)
This could also be obtained by using the law of total variance which is equation(1.1b) on page 3.
Asymptotic normality. Suppose X ∼ χ²_{n,λ}. Then
        [X − (n + λ)] / √(2(n + 2λ))  →D  N(0, 1)   as n → ∞,        (23.5d)
which implies
        [X − (n + λ)] / √(2(n + 2λ))  →D  N(0, 1)   as λ → ∞.
23.6 The non-central t distribution.
Definition(23.6a). Suppose n ∈ (0, ∞) and µ ∈ R. Then the random variable T has a non-central t-
distribution with n degrees of freedom and non-centrality parameter µ iff
        T = (X + µ) / √(Y/n)        (23.6a)
where X ∼ N(0, 1), Y ∼ χ²n, and X and Y are independent.
See exercise 24.6 on page 76 for the following moments:
        E[T] = µ √(n/2) Γ((n−1)/2) / Γ(n/2)   for n > 1   and   var[T] = n(1 + µ²)/(n − 2) − [µ² n/2] [Γ((n−1)/2)/Γ(n/2)]²   for n > 2.
If X ∼ N(µ, σ²) and Y ∼ χ²n and X and Y are independent, then
        T = (X/σ) / √(Y/n)
has the non-central tn distribution with non-centrality parameter µ/σ.
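Definition (23.6a) can be checked directly against scipy's implementation of the non-central t distribution, which uses the same non-centrality parameter µ. A minimal sketch with arbitrary parameter values, seed and sample size:

    import numpy as np
    from scipy import stats

    # Simulate (23.6a): T = (X + mu)/sqrt(Y/n) with X ~ N(0,1) and Y ~ chi^2_n independent.
    rng = np.random.default_rng(6)
    n, mu, reps = 7, 1.5, 200_000

    X = rng.standard_normal(reps)
    Y = rng.chisquare(n, size=reps)
    T = (X + mu) / np.sqrt(Y / n)

    print(stats.kstest(T, stats.nct(df=n, nc=mu).cdf))
    print(T.mean(), stats.nct(df=n, nc=mu).mean())   # compare with the exact mean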
23.7 The non-central F distribution.
Definition(23.7a). Suppose m > 0 and n > 0. Suppose further that X has a non-central χ2m distribution with
non-centrality parameter λ, Y ∼ χ2n and X and Y are independent. Then
        F = (X/m) / (Y/n)   has a non-central Fm,n distribution with non-centrality parameter λ.
See exercise 24.8 for the following moments:
        E[F] = n(m + λ) / [m(n − 2)]   for n > 2   and   var[F] = 2 (n/m)² [(m + λ)² + (m + 2λ)(n − 2)] / [(n − 2)²(n − 4)]   for n > 4.
If F1 has the non-central Fm,n distribution with non-centrality parameter λ and F2 has the Fm,n distribution, then
E[F1 ] ≥ E[F2 ]. This follows from the corresponding property of the non-central χ2 distribution.
The F -statistic used to test a hypothesis will usually have a central F distribution if the hypothesis is true and a
non-central F distribution if the hypothesis is false. The power of a test is the probability of rejecting the null
hypothesis when it is false. Hence calculating the power of a test will often involve calculating probabilities from
a non-central F distribution.
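As an illustration of such a power calculation, here is a minimal Python sketch; the degrees of freedom, non-centrality parameter and significance level are arbitrary choices, and scipy's ncf is assumed to use the same non-centrality parameter λ as above:

    from scipy import stats

    # An F-test that rejects when F > f_crit has power P[F' > f_crit],
    # where F' has the relevant non-central F distribution.
    m, n = 3, 20          # numerator / denominator degrees of freedom
    lam = 8.0             # non-centrality parameter under the alternative
    alpha = 0.05

    f_crit = stats.f(dfn=m, dfd=n).ppf(1 - alpha)        # central F critical value
    power = stats.ncf(dfn=m, dfd=n, nc=lam).sf(f_crit)   # tail probability under the alternative
    print(f_crit, power)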
Similarly we have
Definition(23.7b). Suppose m > 0 and n > 0. Suppose further that X has a non-central χ2m distribution with
non-centrality parameter λ1 , Y has a non-central χ2n distribution with non-centrality parameter λ2 , and X and
Y are independent. Then
        F = (X/m) / (Y/n)   has a doubly non-central Fm,n distribution with non-centrality parameters (λ1, λ2).
24 Exercises                                            (exs-noncentral.tex)
24.1 Suppose n ∈ {1, 2, . . .} and W has a non-central χ2n distribution with non-centrality parameter λ. By using the repre-
sentation in equation(23.2a), find E[W ] and var[W ]. [Ans]
24.2 Suppose λ ∈ (0, ∞) and the random variable V has the Poisson distribution with mean λ/2. Suppose further that the
distribution of W given V = j is the χ2n+2j distribution where n ∈ {1, 2, . . .}. Show that the distribution of W is the
non-central χ2n with non-centrality parameter λ. [Ans]
24.3 Suppose n ∈ {2, 3, . . .} and X1 , . . . , Xn are independent random variables. Suppose further that for j = 1, . . . , n we
have Xj ∼ χ2kj ,λj where kj ∈ (0, ∞) and λj ∈ [0, ∞). Find the distribution of Z = X1 + · · · + Xn . [Ans]
24.4 Suppose X ∼ χ²_{n,λ} where n ∈ (0, ∞) and λ ∈ [0, ∞).
(a) Find the skewness, skew[X].
(b) Find the kurtosis, κ[X]. [Ans]
24.5 Suppose X ∼ χ2n,λwhere λ ∈ [0, ∞) and n ∈ (0, ∞). By using moment generating functions, prove the limiting result
in equation(23.5d).
24.6 Suppose T has the non-central t distribution with n degrees of freedom and non-centrality parameter µ. Show that
        E[T] = µ √(n/2) Γ((n−1)/2) / Γ(n/2)   for n > 1   and   var[T] = n(1 + µ²)/(n − 2) − [µ² n/2] [Γ((n−1)/2)/Γ(n/2)]²   for n > 2.        [Ans]
24.7 Suppose T has the non-central t distribution with n degrees of freedom and non-centrality parameter µ. Show that T 2
has the non-central F1,n distribution with non-centrality parameter µ2 . [Ans]
24.8 Suppose F has the non-central Fm,n distribution with non-centrality parameter λ. Show that
        E[F] = n(m + λ) / [m(n − 2)]   for n > 2   and   var[F] = 2 (n/m)² [(m + λ)² + (m + 2λ)(n − 2)] / [(n − 2)²(n − 4)]   for n > 4.        [Ans]
24.9 Suppose λ ∈ (0, ∞) and the random variable N has the Poisson distribution with mean λ/2. Suppose further that the
distribution of X given N = j is the Fm+2j,n distribution where m ∈ (0, ∞) and n ∈ (0, ∞).
Show that the distribution of X is the non-central Fm,n with non-centrality parameter λ.
(Hint: use exercise 24.2.) [Ans]
25 Size, shape and related characterization theorems
25.1 Size and shape: the definitions. The results in this section on size and shape are from [M OSSIMAN(1970)]
and [JAMES(1979)].
Definition(25.1a). The function g : (0, ∞)n → (0, ∞) is an n-dimensional size variable iff
g(ax) = ag(x) for all a > 0 and all x ∈ (0, ∞)n .
Definition(25.1b). Suppose g : (0, ∞)n → (0, ∞) is an n-dimensional size variable. Then the function z :
(0, ∞)n → (0, ∞)n is the shape function associated with g iff
        z(x) = x / g(x)   for all x ∈ (0, ∞)^n.
25.2 Size and shape: standard examples.
• The standard size function. This is g(x1, . . . , xn) = x1 + · · · + xn. The associated shape function is the function
z : (0, ∞)^n → (0, ∞)^n with
        z(x1, . . . , xn) = ( x1/(x1 + · · · + xn), . . . , xn/(x1 + · · · + xn) )
• Dimension 1 size. This is g(x1, . . . , xn) = x1. The associated shape function is
        z(x1, . . . , xn) = ( 1, x2/x1, . . . , xn/x1 )
• Dimension 2 size. This is g(x1, . . . , xn) = x2. The associated shape function is
        z(x1, . . . , xn) = ( x1/x2, 1, . . . , xn/x2 )
• Volume. This is g(x1, . . . , xn) = (x1² + · · · + xn²)^{1/2}. The associated shape function is
        z(x1, . . . , xn) = ( x1/(x1² + · · · + xn²)^{1/2}, . . . , xn/(x1² + · · · + xn²)^{1/2} )
• The maximum. This is g(x1, . . . , xn) = max{x1, . . . , xn}. The associated shape function is
        z(x1, . . . , xn) = ( x1/max{x1, . . . , xn}, . . . , xn/max{x1, . . . , xn} )
• Root n size. This is g(x1, . . . , xn) = (x1 x2 . . . xn)^{1/n}. The associated shape function is
        z(x1, . . . , xn) = ( x1/(x1 x2 . . . xn)^{1/n}, . . . , xn/(x1 x2 . . . xn)^{1/n} )
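These definitions are straightforward to express in code. A tiny Python sketch (purely illustrative) of a size variable and its associated shape function, showing the scaling property g(ax) = a g(x) and the scale invariance of the shape:

    import numpy as np

    def g_sum(x):               # the standard size: g(x) = x1 + ... + xn
        return np.sum(x)

    def shape(x, g):            # shape function associated with a size variable g
        return np.asarray(x) / g(x)

    x = np.array([1.0, 2.0, 5.0])
    print(g_sum(3*x), 3*g_sum(x))                # equal: g(ax) = a g(x)
    print(shape(3*x, g_sum), shape(x, g_sum))    # equal: the shape is unchanged by scaling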
25.3 Size and shape: the fundamental result. We shall show that:
• if any one shape function z(X) is independent of the size variable g(X), then every shape function is independent
of g(X);
• if two size variables g(X) and g ∗ (X) are both independent of the same shape function z(X), then g(X)/g ∗ (X) is
almost surely constant.
First a specific example16 of this second result:
Example(25.3a). Suppose X = (X1 , X2 , X3 ) ∼ logN (µ, Σ) distribution. By definition, this means that if Y1 = ln(X1 ),
Y2 = ln(X2 ) and Y3 = ln(X3 ), then (Y1 , Y2 , Y3 ) ∼ N (µ, Σ).
Define the three size functions:
        g1(x) = x1        g2(x) = √(x2 x3)        g3(x) = (x1 x2 x3)^{1/3}
and let z1 , z2 and z3 denote the corresponding shape functions. Suppose g1 (X) is independent of z1 (X).
(a) Show that var[Y1 ] = cov[Y1 , Y2 ] = cov[Y1 , Y3 ].
(b) Show that g1 (X) is independent of z2 (X). (c) Show that g1 (X) is independent of g2 (X)/g1 (X).
¹⁶ Understanding this example is not necessary for the rest of the section. The example makes use of the definition of the
multivariate normal and the fact that normals are independent if the covariance is zero. See Chapter3:§40.6 on page 126.
Now suppose g3(X) is also independent of z1(X).
(d) Show that cov[Y1 , S] = cov[Y2 , S] = cov[Y3 , S] where S = Y1 + Y2 + Y3 .
(e) Show that var[Y2 ] + cov[Y2 , Y3 ] = var[Y3 ] + cov[Y2 , Y3 ] = 2var[Y1 ].
(f) Show that var[2Y1 − Y2 − Y3 ] = 0 and hence g1 (X)/g3 (X) is constant almost everywhere.

Solution. We are given X1 is independent of 1, X2 /X1 , X3 /X1 . Taking logs shows that Y1 is independent of (Y2 −
Y1 , Y3 − Y1 ) and these are normal. Hence cov[Y1 , Y2 − Y1 ] = cov[Y1 , Y3 − Y1 ] = 0 and hence (a).
(b) follows because Y1 is independent of (Y1 − 21 Y2 − 12 Y3 , 21 Y2 − 12 Y3 , 12 Y3 − 21 Y2 ).
(c) Now cov[Y1 , 12 (Y2 + Y3 ) − Y1 ) = 21 cov[Y1 , Y2 ] + 21 cov[Y1 , Y3 ] − var[Y1 ] = 0. By normality, ln (g1 (X)) = Y1 is independent
of log (g2 (X)) − ln (g1 (X)). Because the exponential function is one-one, we have (c).
(d) The assumption g3 (X) is independent of z1 (X) implies, by taking logs, that S is independent of (Y2 − Y1 , Y3 − Y1 ) and
these are normal. Hence (d).
(e) Expanding cov[Y1 , S] and using part (a) shows that cov[Y1 , S] = 3var[Y1 ]. Similarly, expanding cov[Y2 , S] shows that
var[Y2 ] + cov[Y2 , Y3 ] + cov[Y1 , Y2 ] = cov[Y2 , S] = cov[Y1 , S] = 3var[Y1 ]. Hence (e).
(f) Now var[2Y1 − Y2 − Y3 ] = 4var[Y1 ] − 4cov[Y 1 , Y2 ] − 4cov[Y1 , Y3 ] + var[Y2 ] + var[Y3 ] + 2cov[Y2 , Y3 ] = 0. Hence
var[Y1 − 31 S] = 0; hence var[ln g1 (X)/g3 (X) ] = 0. Hence (f).
Now for the general result:
Proposition(25.3b). Suppose g : (0, ∞)n → (0, ∞) is an n-dimensional size variable and z ∗ : (0, ∞)n →
(0, ∞)n is any shape function. Suppose further that X is a random vector such that z ∗ (X) is non-degenerate
and independent of g(X). Then
(a) for any other shape function z1 : (0, ∞)n → (0, ∞)n , z1 (X) is independent of g(X);
(b) if g2 : (0, ∞)^n → (0, ∞) is another size variable such that z*(X) is independent of both g2(X) and g(X), then
        g2(X) / g(X)   is constant almost everywhere.
Proof. Let g* and g1 denote the size variables which lead to the shape functions z* and z1. Hence
        z*(x) = x/g*(x)   and   z1(x) = x/g1(x)   for all x ∈ (0, ∞)^n.
For all x ∈ (0, ∞)^n we have
        g1( z*(x) ) = g1( x/g*(x) ) = g1(x)/g*(x)   by using g1(ax) = a g1(x).
Hence for all x ∈ (0, ∞)^n
        z1( z*(x) ) = z*(x) / g1( z*(x) ) = [x/g*(x)] × [g*(x)/g1(x)] = z1(x)        (25.3a)
Equation(25.3a) shows that z1(X) is a function of z*(X); also, we are given that z*(X) is independent of g(X). Hence
z1(X) is independent of g(X). This proves (a).
(b) Because of part (a), we can assume
        z2(X) = X/g2(X)   is independent of g(X)   and   z(X) = X/g(X)   is independent of g2(X)
Applying g to the first and g2 to the second gives
        g( z2(X) ) = g(X)/g2(X)   is independent of g(X)   and   g2( z(X) ) = g2(X)/g(X)   is independent of g2(X)
and hence
        g2(X)/g(X)   is independent of both g2(X) and g(X).
Hence result by part (b) of exercise 2.11 on page 6.
25.4 A characterization of the gamma distribution.
Proposition(25.4a). Suppose X1, X2, . . . , Xn are independent positive non-degenerate random variables.
Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
        g*(x) = Σ_{j=1}^n xj
Then there exists a shape vector z(X) which is independent of g*(X) iff there exist α > 0, k1 > 0, . . . , kn > 0
such that Xj ∼ gamma(kj, α) for j = 1, 2, . . . , n.
Proof.
⇐ Now g*(X) = Σ_{j=1}^n Xj ∼ gamma(k1 + · · · + kn, α). Proposition(11.8b) implies Xj/(X1 + · · · + Xn) is independent
of g*(X) = X1 + · · · + Xn for j = 1, 2, . . . , n. Hence the standard shape vector
        z(X) = ( X1/(X1 + · · · + Xn), X2/(X1 + · · · + Xn), . . . , Xn/(X1 + · · · + Xn) )
is independent of g*(X). Hence all shape vectors are independent of g*(X).
⇒ By proposition(25.3b), if there exists one shape vector which is independent of g ∗ (X), then all shape vectors are
independent of g ∗ (X). Hence Xj /(X1 + · · · + Xn ) is independent of g ∗ (X) = X1 + · · · + Xn for j = 1, 2, . . . , n. Hence
by proposition(11.8b), there exists α > 0 and kj > 0 such that Xj ∼ gamma(kj , α) for j = 1, 2, . . . , n.
This result implies many others. For example, suppose X1, X2, . . . , Xn are independent random variables with
Xj ∼ gamma(kj, α). Then every shape vector is independent of X1 + X2 + · · · + Xn; in particular,
        X1 + X2 + · · · + Xn   is independent of   Xj / max{X1, X2, . . . , Xn}
and
        X1 + X2 + · · · + Xn   is independent of   (X1 + X2 + · · · + Xn) / max{X1, X2, . . . , Xn}
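A numerical illustration of proposition(25.4a) in Python (the shape parameters kj, common rate, seed and sample size are arbitrary choices; zero correlation is of course only a necessary symptom of the independence):

    import numpy as np

    # For independent gamma(k_j, alpha) variables with a common rate, the shape vector
    # X/sum(X) is independent of the size sum(X).
    rng = np.random.default_rng(7)
    k = np.array([0.5, 1.0, 3.0])    # gamma shape parameters k_j (common rate alpha = 1)
    reps = 200_000

    X = rng.gamma(shape=k, size=(reps, len(k)))
    size = X.sum(axis=1)
    shape1 = X[:, 0] / size          # first coordinate of the standard shape vector

    print(np.corrcoef(size, shape1)[0, 1])                     # close to 0
    print(np.corrcoef(size, X[:, 0] / X.max(axis=1))[0, 1])    # another shape coordinate, also close to 0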
25.5 A characterization of the Pareto distribution.
Proposition(25.5a). Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate random variables.
Suppose g ∗ (0, ∞)n → (0, ∞) denotes the size variable
g ∗ (x) = min{x1 , . . . , xn }
Then there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for
j = 1, 2, . . . , n such that Xj ∼ Pareto(αj , x0 , 0) for j = 1, 2, . . . , n.
Proof.
⇐ Let Y1 = ln(X1 ) and Y2 = ln(X2 ). Then Y1 −ln(x0 ) ∼ exponential (α1 ) and Y2 −ln(x0 ) ∼ exponential (α2 ) and Y1 and
Y2 are independent. By exercise 26.5 on page 81, we know that if Y1 −a ∼ exponential (λ1 ) and Y2 −a ∼ exponential (λ2 )
and Y1 and Y2 are independent, then min{Y1 , Y2 } is independent of Y2 − Y1 .
This establishes U = min{Y1 , Y2 } is independent of V = Y2 − Y1 . But (Y3 , . . . , Yn ) is independent of U and V . Hence
min{Y1 , . . . , Yn } is independent of Y2 −Y1 . Similarly min{Y1 , . . . , Yn } is independent of Yj −Y1 for j = 2, . . . , n. Hence
min{Y1 , . . . , Yn } is independent of the vector (Y2 − Y1 , Y3 − Y1 , . . . , Yn − Y1 ). And hence g ∗ (X) = min{X1 , . . . , Xn ) is
independent of the shape vector (1, X2/X1 , . . . , Xn/X1 ) as required.
⇒ Suppose n = 2. Using the shape vector (1, x2/x1 ) implies that we are given min{X1 , X2 } is independent of X2 /X1 .
Taking logs shows that min{Y1 , Y2 } is independent of Y2 − Y1 where Y1 = ln(X1 ) and Y2 = ln(X2 ).
It is known (see [C RAWFORD(1966)]) that if Y1 and Y2 are independent random variables with an absolutely continuous
distribution and min(Y1 , Y2 ) is independent of Y2 − Y1 , then there exist a ∈ R, α1 > 0 and α2 > 0 such that Y1 − a ∼
exponential (α1 ) and Y2 − a ∼ exponential (α2 ). Hence fY1 (y1 ) = α1 e−α1 (y1 −a) for y1 > a and fY2 (y2 ) = α2 e−α2 (y2 −a) for
y2 > a.
Hence X1 = eY1 ∼ Pareto(α1 , x0 = ea , 0) and X2 = eY2 ∼ Pareto(α2 , x0 = ea , 0) where x0 > 0.
For n > 2 we are given that
Xj
is independent of min{X1 , . . . , Xn } for j = 1, 2, . . . , n.
min{X1 , . . . , Xn }
But
min{X1 , . . . , Xn } = min{Xj , Zj } where Zj = min{Xi : i 6= j}
Hence for some x0j > 0, λj > 0 and λ∗j > 0, Xj ∼ Pareto(λj , x0j , 0) and Zj ∼ Pareto(λ∗j , x0j , 0). Because Zj =
min{Xi : i 6= j} we must have x0j ≤ x0i for j 6= i. It follows that all x0j are equal. Hence result.
25.6 A characterization of the power law distribution. Because the inverse of a Pareto random variable
has the power law distribution, the previous proposition can be transformed into a result about the power law
distribution.
Proposition(25.6a). Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate random variables.
Suppose g ∗ (0, ∞)n → (0, ∞) denotes the size variable
g ∗ (x) = max{x1 , . . . , xn }
Then there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for
j = 1, 2, . . . , n such that Xj ∼ powerlaw(αj , x0 , 0) for j = 1, 2, . . . , n.
Proof.
⇐ Let Y1 = ln( 1/X1 ) and Y2 = ln( 1/X2 ). By exercise 20.3 on page 60, Y1 −ln( 1/x0 ) ∼ exponential (α1 ) and Y2 −ln( 1/x0 ) ∼
exponential (α2 ) and Y1 and Y2 are independent. By exercise 26.5 on page 81, we know that if Y1 − a ∼ exponential (λ1 )
and Y2 − a ∼ exponential (λ2 ) and Y1 and Y2 are independent, then min{Y1 , Y2 } is independent of Y1 − Y2 .
This establishes U = min{Y1 , Y2 } is independent of V = Y1 − Y2 . But (Y3 , . . . , Yn ) is independent of U and V . Hence
min{Y1 , . . . , Yn } is independent of Y1 −Y2 . Similarly min{Y1 , . . . , Yn } is independent of Y1 −Yj for j = 2, . . . , n. Hence
min{Y1 , . . . , Yn } is independent of the vector (Y1 − Y2 , Y1 − Y3 , . . . , Y1 − Yn ). And hence g ∗ (X) = max{X1 , . . . , Xn ) is
independent of the shape vector (1, X2/X1 , . . . , Xn/X1 ) as required.
⇒ Suppose n = 2. Using the shape vector (1, x2/x1 ) implies that we are given max{X1 , X2 } is independent of X2 /X1 .
Set Y1 = ln( 1/X1 ) and Y2 = ln( 1/X2 ). Hence min{Y1 , Y2 } is independent of Y2 − Y1 .
It is known (see [C RAWFORD(1966)]) that if Y1 and Y2 are independent random variables with an absolutely continuous
distribution and min(Y1 , Y2 ) is independent of Y2 − Y1 , then there exist a ∈ R, α1 > 0 and α2 > 0 such that Y1 − a ∼
exponential (α1 ) and Y2 − a ∼ exponential (α2 ). Hence fY1 (y1 ) = α1 e−α1 (y1 −a) for y1 > a and fY2 (y2 ) = α2 e−α2 (y2 −a) for
y2 > a.
Hence X1 = e−Y1 ∼ powerlaw(α1 , h = e−a , 0) and X2 = e−Y2 ∼ powerlaw(α2 , h = e−a , 0) where h > 0.
For n > 2 we are given that
Xj
is independent of max{X1 , . . . , Xn } for j = 1, 2, . . . , n.
max{X1 , . . . , Xn }
But
max{X1 , . . . , Xn } = max{Xj , Zj } where Zj = max{Xi : i 6= j}
Hence for some hj > 0, λj > 0 and λ∗j > 0, Xj ∼ powerlaw(λj , hj , 0) and Zj ∼ powerlaw(λ∗j , hj , 0). Because
Zj = max{Xi : i 6= j} we must have hj ≥ hi for j 6= i. It follows that all hj are equal. Hence result.
25.7 Independence of size and shape for the multivariate lognormal. This result requires a basic knowledge
of the multivariate normal—see Chapter3:§42 on page 130.
We say that the random vector X = (X1 , . . . , Xn ) ∼ logN (µ, Σ) iff ln(X) = ( ln(X1 ), . . . , ln(Xn ) ) ∼ N (µ, Σ).
Proposition(25.7a). Suppose X = (X1 , . . . , Xn ) ∼ logN (µ, Σ). Suppose further that g1 : (0, ∞)n → (0, ∞)
denotes the size variable
g1 (x) = (x1 · · · xn )1/n
Then g1 (X) is independent of every shape vector z(X) iff there exists c ∈ R such that cov[Yj , Y1 + · · · + Yn ] = c
for all j = 1, 2, . . . , n, where Y = (Y1 , . . . , Yn ) = (ln(X1 ), . . . , ln(Xn ) ).
Proof. By proposition(25.3b) on page 78, we need only prove g1(X) is independent of one shape function. Consider the
shape function z*(x) = (1, x2/x1, . . . , xn/x1).
Now g1(X) is independent of z*(X) iff (X1 · · · Xn)^{1/n} is independent of (1, X2/X1, . . . , Xn/X1). This occurs iff Y1 + · · · + Yn
is independent of (Y2 − Y1, . . . , Yn − Y1). But the Y's are normal; hence by proposition(42.8b) on page 133, this occurs iff
cov[Yi − Y1, Σ_{j=1}^n Yj] = 0 for i = 2, 3, . . . , n; and this occurs iff cov[Yi, Σ_{j=1}^n Yj] = cov[Y1, Σ_{j=1}^n Yj] for i = 2, 3, . . . , n.
This result implies many others. For example, suppose X1, X2, . . . , Xn are independent random variables with
Xj ∼ logN(µj, σ²) for j = 1, 2, . . . , n. Then
        (X1 X2 · · · Xn)^{1/n}   is independent of   Xj / max{X1, X2, . . . , Xn}
and
        (X1 X2 · · · Xn)^{1/n}   is independent of   (X1 + X2 + · · · + Xn) / max{X1, X2, . . . , Xn}   etc.
Proposition(25.7a) leads to the following characterization of the lognormal distribution.
Proposition(25.7b). Suppose X1, X2, . . . , Xn are independent positive non-degenerate random variables.
Suppose g1 : (0, ∞)^n → (0, ∞) denotes the size variable
        g1(x) = (x1 · · · xn)^{1/n}
Then there exists a shape vector z(X) which is independent of g1(X) iff there exists σ > 0 such that every
Xj ∼ logN(µj, σ²).
Proof.
⇐ Let Yj = ln(Xj ). Then Yj ∼ N (µj , σ 2 ); also Y1 , . . . , Yn are independent. Hence cov[Yj , Y1 + · · · + Yn ] = σ 2 for
j = 1, 2, . . . , n. Hence result by previous proposition.
⇒ By proposition(25.3b), if there exists one shape  vector which is independent of g1 (X), then all shape vectors are inde-
pendent of g1 (X). Hence 1, X2/X1 , . . . , Xn/X1 is independent of g1 (X) = (X1 · · · Xn )1/n . Hence Yk − Y1 is independent
of Y1 + · · · + Yn for k = 2, . . . , n. Hence, by the Skitovich-Darmois theorem—see proposition(15.8b), every Yk is normal.
26 Exercises                                            (exs-sizeshape.tex)
26.1 Suppose X = (X1 , X2 ) is a 2-dimensional random vector with X1 = aX2 where a ∈ R. Show that if z : (0, ∞)2 →
(0, ∞)2 is any shape function, then z(X) is constant almost everywhere. [Ans]
26.2 Suppose X = (X1 , X2 ) is a 2-dimensional random vector with the distribution given in the following table:
X2
1 2 3 6
1 0 0 1/4 0
2 0 0 0 1/4
X1 1/4
3 0 0 0
6 0 1/4 0 0

Define the size variables g1 (x) = x1 x2 and g2 (x) = x1 + x2 .
(a) Suppose z is any shape function: (0, ∞)2 → (0, ∞)2 . Show that z(X) cannot be almost surely constant. Also, show
that z(X) is independent of both g1 (X) and g2 (X).
(b) Find the distribution of g1 (X)/g2 (X). [Ans]
26.3 A characterization of the generalized gamma distribution. Prove the following result.
Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate random variables. Suppose g ∗ (0, ∞)n → (0, ∞)
denotes the size variable
 1/b
n
X
g ∗ (x) =  xbj  where b > 0.
j=1
Then there exists a shape vector z(X) which is independent of g ∗ (X) iff there exist α > 0, k1 > 0, . . . , kn > 0 such that
Xj ∼ ggamma(kj , α, b) for j = 1, 2, . . . , n.
Hint: use the result that X ∼ ggamma(n, λ, b) iff X b ∼ Γ(n, λ) and proposition(25.4a). [Ans]
26.4 Suppose X1 ∼ exponential (λ1 ), Y ∼ exponential (λ2 ) and X and Y are independent.
(a) Find P[X1 < X2 ].
(b) By using the lack of memory property of the exponential distribution, find the distribution of X1 − X2 .
(c) By using the usual convolution formula for densities, find the density of X1 − X2 . [Ans]
26.5 Suppose Y1 − a ∼ exponential (λ1 ) and Y2 − a ∼ exponential (λ2 ) and Y1 and Y2 are independent. Show that U =
min{Y1 , Y2 } is independent of V = Y2 − Y1 . [Ans]
26.6 A generalization of proposition(25.5a) on page 79. Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate
random variables. and θ1 , θ2 , . . . , θn are positive constants. Suppose g ∗ (0, ∞)n → (0, ∞) denotes the size variable
 
∗ x1 xn
g (x) = min , ...,
θ1 θn
Prove there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for j =
1, 2, . . . , n such that Xj ∼ Pareto(αj , θj x0 , 0) for j = 1, 2, . . . , n. [JAMES(1979)] [Ans]
26.7 A generalization of proposition(25.6a) on page 79. Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate
random variables and θ1 , θ2 , . . . , θn are positive constants. Suppose g ∗ (0, ∞)n → (0, ∞) denotes the size variable
 
∗ x1 xn
g (x) = max , ...,
θ1 θn
Prove there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for j =
1, 2, . . . , n such that Xj ∼ powerlaw(αj , θj x0 , 0) for j = 1, 2, . . . , n. [JAMES(1979)] [Ans]

27 Laplace, Rayleigh and Weibull distributions


27.1 The Laplace or bilateral exponential distribution. Suppose µ ∈ R and α > 0. Then the random
variable X is said to have the Laplace(µ, α) distribution iff X has the density
α
fX (x) = e−α|x−µ| for x ∈ R.
2
Clearly if X ∼ Laplace(µ, α), then X − µ ∼ Laplace(0, α) and α(X − µ) ∼ Laplace(0, 1); see also exercise 28.2.
As figure(27.1a) shows, the density consists of two equal exponential densities spliced back to back.
Decomposition. If V ∼ Laplace(0, 1), then V = X − Y where X and Y are i.i.d. random variables with the
exponential (1) distribution. See part(b) of exercise 28.1 on page 83.
Page 82 §27 Mar 10, 2020(20:25) Bayesian Time Series Analysis

3.0

2.5

2.0

1.5

1.0

0.5

0.0

−2 −1 0 1 2
Figure(27.1a). The bilateral exponential density for µ = 0 and α = 6.
(wmf/bilateralExponential,72mm,54mm)

Characteristic function. Suppose V ∼ Laplace(0, 1); then by the decomposition just described,
1 1 1
E[eitV ] = E[eitX ] E[e−itY ] = =
1 − it 1 + it 1 + t2
It follows that if X ∼ Laplace(µ, α), then
α2
E[eitX ] = eitµ 2 2
α +t
The usual inversion formula for characteristic functionsZ shows that
1 −|x| 1 1
e = e−itx dt
2 2π R 1 + t2
So this gives a way of finding the characteristic function of the Cauchy distribution—see equation(21.5b) on
page 65.
Moments. Suppose X ∼ Laplace(0, α). Then
(2n)!
E[X 2n−1 ] = 0 and E[X 2n ] = for n = 1, 2, . . . . (27.1a)
α2n
It follows that var[X] = E[X 2 ] = 2/α2 .
Entropy. For the definition of entropy, see exercise 8.6 on page 25. For the entropy of the Laplace distribution, see
exercise 28.7 on page 84. Now consider the class C of continuous distributions with E[ |X| ] finite and non-zero
density on R. It can be shown that the distribution in C with maximum entropy is the Laplace distribution—see
pages 62–65 in [KOTZ et al.(2001)].
Convolutions of independent Laplace distributions. See exercise 28.14 on page 84.
27.2 The Weibull distribution. Suppose β > 0 and γ > 0. Then the random variable X is said to have the
Weibull (β, γ) distribution iff X has the density
βxβ−1 −(x/γ)β
fX (x) = e for x > 0. (27.2a)
γβ
The distribution function is F (x) = 1 − exp(−xβ /γ β ) for x > 0. The Weibull distribution is frequently used to
model failure times.
The density can take several shapes as figure(27.3a) illustrates.
27.3 Elementary properties of the Weibull distribution.
Multiple of a Weibull distribution. Suppose X ∼ Weibull (β, γ) and d ∈ (0, ∞). Let Y = dX. Then Y has the
density
dx 1 βy β−1 −(x/(dγ) )β
fY (y) = fX (x)| | = fX (y/d) = e
dy d (dγ)β
which is the density of the Weibull (β, dγ) distribution.
It follows that for fixed β ∈ (0, ∞), the family of distributions {Weibull (β, γ) : γ ∈ (0, ∞)} is a scale family of
distributions—see definition(1.6d) on page 5.
Link with the exponential distribution. By setting β = 1 in equation(27.2a), we see that
d
Weibull (1, γ) = exponential (1/γ)
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §28 Page 83

This result can be generalised to

if Y ∼ Weibull (β, γ) then X = (Y /γ)β ∼ exponential (1)


Conversely,
if X ∼ exponential (1) then Y = γX 1/β ∼ Weibull (β, γ) (27.3a)
2.5
β = 1/2, γ = 1
2.0

1.5 β = 5, γ = 1

1.0 β = 1.5, γ = 1

0.5 β = 1, γ = 1

0.0
0.0 0.5 1.0 1.5 2.0 2.5

Figure(27.3a). The Weibull density for shape parameter β = 1/2, β = 1, β = 1.5 and β = 5;
all with scale parameter γ = 1.
(wmf/weibulldensity,72mm,54mm)


27.4 The Rayleigh distribution. Suppose σ > 0; then the Weibull (2, σ 2) is called the Rayleigh (σ) distribu-
tion. Hence, the random variable X has the Rayleigh (σ) distribution if X has the density
r 2 2
fR (r) = 2 e−r /2σ for r > 0.
σ
Clearly if R ∼ Rayleigh (σ) and b > 0 then bR ∼ Rayleigh (bσ). It follows that the family of distributions
{ Rayleigh (σ) : σ ∈ (0, ∞) } is a scale family of distributions—see definition(1.6d) on page 5.
The Rayleigh distribution is used to model the lifetime of various items and the magnitude of vectors—see exer-
cise 28.26 on page 86. There are plots of the density in figure(27.4a).
1.2 σ = 0.5
σ = 1.5
1.0 σ=4

0.8

0.6

0.4

0.2

0.0
0 2 4 6 8 10

Figure(27.4a). The Rayleigh distribution density for σ = 0.5, σ = 1.5 and σ = 4.


(wmf/Rayleighdensity,72mm,54mm)

28 Exercises (exs-LaplaceRayleighWeibull.tex)

The Laplace or bilateral exponential distribution.


28.1 (a) Suppose α > 0. Suppose further that X has the exponential density αe−αx for x > 0 and Y has the exponential
density αeαx for x < 0 and X and Y are independent. Show that X + Y ∼ Laplace(0, α).
(b) Suppose α > 0 and the random variables X and Y have the exponential density αe−αx for x > 0. Suppose further
that X and Y are independent. Show that X − Y ∼ Laplace(0, α). [Ans]
28.2 Linear transformation of a Laplace distribution.
(a) Suppose X ∼ Laplace(µ, α); suppose further that k > 0 and b ∈ R. Show that kX + b ∼ Laplace(kµ + b, α/k).It
follows that the family of distributions { Laplace(µ, σ) : µ ∈ R, σ ∈ (0, ∞) } is a location-scale family—see
definition(1.6b) on page 5.
(b) Suppose X ∼ Laplace(µ, α). Show that α(X − µ) ∼ Laplace(0, 1). [Ans]
Page 84 §28 Mar 10, 2020(20:25) Bayesian Time Series Analysis

28.3 Shape of the Laplace density. Suppose µ ∈ R and α ∈ (0, ∞). Suppose further that X ∼ Laplace(µ, α) has density fX .
(a) Show that fX is symmetric about x = µ.
(b) Show that fX increases on (−∞, µ) and decreases on (µ, ∞) and hence the mode is at x = µ.
(c) Show that fX is convex on (−∞, µ) and on (µ, ∞). [Ans]
28.4 Suppose X has the Laplace(µ, α) distribution.
(a) Show that the distribution function of X is
−α(µ−x)
1
2e if x ≤ µ;
FX (x) =
1 − 12 e−α(x−µ) if x ≥ µ.
(b) Show that the quartile function is

−1 µ + ln(2p)/α if p ∈ [0, 1/2];
FX (p) =
µ − ln(2 − 2p)/α if p ∈ [ 1/2, 1];
and hence the median is µ.
(c) Show that E[X] = µ and var[X] = 2/α2 .
(d) Show that the moment generating function of X
α2 eµt
E[etX ] = for |t| < α. [Ans]
α 2 − t2
28.5 Suppose Y ∼ Laplace(0, α). Show that
(2n)!
E[Y 2n−1 ] = 0 and E[Y 2n ] = for n = 1, 2, . . . .
α2n
d
In particular, if X ∼ Laplace(µ, α), then X = Y + µ and E[X] = µ and var[X] = E[Y 2 ] = 2/α2 . [Ans]
28.6 Suppose X ∼ Laplace(µ, α).
(a) Show that the skewness of X, skew[X] = 0.
(b) Show that the kurtosis of X, κ[X] = 6. [Ans]
28.7 Entropy of the Laplace distribution.For the definition of entropy, see exercise 8.6 on page 25.
Suppose X ∼ Laplace(µ, α). Find the entropy of X. [Ans]
28.8 (a) Suppose X has the Laplace(0, α) distribution. Show that |X| ∼ exponential (α).
(b) Suppose X ∼ exponential (λ), Y ∼ exponential (µ) and X and Y are independent. Find the density of Z = X − Y .
[Ans]
28.9 Suppose X and Y are independent random variables with X ∼ exponential (α) and Y ∼ Bernoulli ( 1/2).
Show that X(2Y − 1) ∼ Laplace(0, α). [Ans]
Pn 2
28.10 (a) Suppose X1 , . . . , Xn are i.i.d. with the Laplace(µ, α) distribution. Show that 2α i=1 |Xi − µ| ∼ χ2n .
(b) Suppose X and Y are i.i.d. Laplace(µ, α). Show that
|X − µ|
∼ F2,2 [Ans]
|Y − µ|
28.11 Suppose X and Y are i.i.d. uniform uniform(0, 1). Show that ln( X/Y ) ∼ Laplace(0, 1). [Ans]
28.12 Suppose X1 , X2 , X3 and X4 are i.i.d. N (0, 1).
(a) Show that X1 X2 − X3 X4 ∼ Laplace(0, 1).
(b) Show that X1 X2 + X3 X4 ∼ Laplace(0, 1). (See also exercise 16.16 in §2.16 on page 51.) [Ans]
28.13 Exponential scale mixture of normals.
√ Suppose X and Y are independent
√ random variables with X ∼ exponential (1)
and Y ∼ N (0, 1). Show that Y 2X ∼ Laplace(0, 1) and µ + Y 2X/α ∼ Laplace(µ, α).
Note. We can use either characteristic functions or densities. [Ans]
28.14 Convolutions of independent Laplace distributions.
(a) Suppose X and Y are i.i.d. random variables with the Laplace(0, α) distribution. Find the densities of V = X + Y
and W = X − Y .
(b) Suppose X ∼ Laplace(0, α1 ), Y ∼ Laplace(0, α2 ) and X and Y are independent. Find the densities of V = X + Y
and W = X − Y . (Hint: use characteristic functions.) [Ans]
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §28 Page 85

The Weibull distribution.


28.15 Shape of the Weibull density. Suppose the random variable X ∼ Weibull (β, γ) has density function fX .
0 00
(a) Find the derivatives fX (x) and fX (x).
(b) Suppose β ∈ (0, 1). Show that fX is decreasing and convex with fX (x) → ∞ as x ↓ 0.
(c) Suppose β = 1. Show that fX is decreasing and convex with mode at x = 0.
(d) Suppose β ∈ (1, 2]. Show that fX (x) is concave and then convex with point of inflection at
√ 1/β
3(β − 1) + (5β − 1)(β − 1)

x2 = γ

(e) Suppose β ∈ (2, ∞). Show that fX (x) is initially convex, then concave and then convex again with points of
inflection at
√ 1/β √ 1/β
3(β − 1) − (5β − 1)(β − 1) 3(β − 1) + (5β − 1)(β − 1)
 
x1 = γ and x2 = γ [Ans]
2β 2β
β
28.16 Suppose X has the Weibull (β, γ) distribution; hence fX (x) = βxβ−1 e−(x/γ) /γ β for x > 0.
(a) Suppose α > 0; find the distribution of Y = αX. (b) Find an expression for E[X n ] for n = 1, 2, . . . .
(c) Find the mean, variance, median and mode of X. (d) Find E[et ln(X) ], the m.g.f. of ln(X).
(e) Show that E[X] → γ and var[X] → 0 as β → ∞. [Ans]
28.17 Suppose X ∼ Weibull (β, γ).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
28.18 Suppose X has the Weibull (β, γ) distribution.
(a) Find hX (x) = f (x)/[1 − F (x)], the hazard function of X.
(b) Check that if β < 1 then hX decreases as x increases; if β = 1 then hX is constant; and if β > 1 then hX increases
as x increases. [Ans]
28.19 (a) Suppose U ∼ uniform(0, 1) distribution. Show that X = γ(− ln U )1/β ∼ Weibull (β, γ).
(b) Suppose X ∼ Weibull (β, γ). Show that U = exp −(X/γ)β ] ∼ uniform(0, 1).
 
[Ans]
28.20 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables each with the Weibull (β, γ) distribution. As usual, let X1:n =
min{X1 , . . . , Xn }. Show that X1:n ∼ Weibull (β, γ/n1/β ). [Ans]
D
28.21 Suppose Xβ ∼ Weibull (β, γ). Show that Xβ =⇒ γ as β → ∞. [Ans]

The Rayleigh distribution.


28.22 Suppose R has the Rayleigh (σ) distribution:
r −r2 /2σ2
fR (r) = e for r > 0.
σ2
(a) Show that fR increases and then decreases with mode at r = σ.

(b) Show that fR is initially concave, then convex with inflection point at r = 3σ.
(c) Find the distribution function of R. (d) Find an expression for E[Rn ] for n = 1, 2, . . . .
(e) Find E[R] and var[R]. (f) Find the quartile function and median of R. [Ans]
(g) Find the hazard function of R.
28.23 Suppose X ∼ Rayleigh (σ).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
2
28.24 (a) Suppose X ∼ N (µ, σ ). Show that
Z ∞
(x − µ)2 µ2
   
1 σ µ
x√ exp − dx = √ exp − + µΦ
0 2πσ 2 2σ 2 2π 2σ 2 σ
(b) Suppose X ∼ Rayleigh (σ) where σ ∈ (0, ∞). Find the moment generating function of X. [Ans]

28.25 (a) Suppose U ∼ uniform(0, 1) and X = σ −2 ln U . Show that X ∼ Rayleigh (σ).
Conversely, if X ∼ Rayleigh (σ), then exp −X 2 /(2σ 2 ) ∼ uniform(0, 1).

(b) Suppose X has the exponential (λ) = gamma(1,
√ λ) distribution. Find the distribution of Y = X. In particular,
suppose X ∼ exponential (1); show that R = 2X has the Rayleigh (1) distribution. [Ans]
Page 86 §29 Mar 10, 2020(20:25) Bayesian Time Series Analysis

28.26 (a) Suppose X and Y are i.i.d. with the N (0, σ 2 ) distribution. Define R and Θ by R = X 2 + Y 2 , X = R cos Θ and
Y = R sin Θ with Θ ∈ [0, 2π). Prove that R and Θ are independent and find the density of R and Θ.
(b) Suppose R ∼ Rayleigh (σ), Θ ∼ uniform(−π, π) and R and Θ are independent. Show that X = R cos Θ and
Y = R sin Θ are i.i.d. N (0, σ 2 ).
(c) The Box-Muller √transformation.√ Suppose X ∼ exponential (1), Y ∼ uniform(0, 1) and X and Y are independent.
Let (V, W ) = ( 2X sin(2πY ), 2X cos(2πY )). Show that V and W are i.i.d. N (0, 1) random variables.
(d) Suppose U1 and√U2 are i.i.d. random variables
√ with the uniform(0, 1) distribution.
Let (V, W ) = ( −2 ln U1 sin(2πU2 ), −2 ln U1 cos(2πU2 )). Show that V and W are i.i.d. N (0, 1) random vari-
ables.
(e) Suppose X and Y are i.i.d. random variables with the N (0, σ 2 ) distribution. Let
2XY X2 − Y 2
W =√ and Z = √
X2 + Y 2 X2 + Y 2
Prove that W and Z are i.i.d. random variables with the N (0, 1) distribution. [Ans]
28.27 (a) Suppose R has the Rayleigh (σ) distribution. Find the distribution of R2 .
(b) Suppose
√ R has the Rayleigh (1) distribution. Show that the distribution of R2 is χ22 . Conversely, if X ∼ χ22 then
X ∼ Rayleigh (1).
Pn
(c) Suppose R1 , . . . , Rn are i.i.d. with the Rayleigh (σ) distribution. Show that Y = i=1 Ri2 has the gamma(n, 1/2σ 2 )
distribution. [Ans]
28.28 Suppose X ∼ Rayleigh (s) where s > 0, and Y |X ∼ N (µ, σ = X). Show that Y ∼ Laplace(µ, 1/s). [Ans]

29 The logistic distribution


29.1 The logistic distribution.
Recall the logistic function is the sigmoid curve
M
f (x) = for x ∈ R.
1+ e−k(x−x0 )
The logistic distribution has the logistic curve as its distribution function. The basic version of the density is

π e−πx/ 3
fX (x) = √ √ (29.1a)
3 (1 + e−πx/ 3 )2
and this has distribution function
1
FX (x) = √ for x ∈ R. (29.1b)
1 + e−πx/ 3
A linear transformation gives the following general form of the logistic distribution:
Definition(29.1a). Suppose µ ∈ R and σ > 0. Then the random variable X has the logistic distribution,
logistic(µ, σ 2 ), iff X has density function

π e−π(x−µ)/(σ 3)
fX (x) = √ h i for x ∈ R. (29.1c)
σ 3 1 + e−π(x−µ)/(σ√3) 2

The distribution function is


1
FX (x) = √ for x ∈ R. (29.1d)
1+e −π(x−µ)/(σ 3)

Another parametrization has s = σ 3/π; this gives
e−(x−µ)/s 1
fX (x) = 2 and FX (x) = −(x−µ)/s
(29.1e)
1 + e

s 1+e −(x−µ)/s

Alternative expressions for equations(29.1c), (29.1d) and (29.1e) are given in exercise 31.2.

Shape of the logistic density. It is easy to see that the density is symmetric about µ and the mode is µ–see also
exercise 31.1 and plots of the density in figure(29.1a).
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §29 Page 87

0.5
µ = 0, σ = 1
1.0
µ = 0, σ = 1
0.4 µ = 0, σ = 2
0.8

0.3
0.6 µ = 0, σ = 4

0.2 µ = 0, σ = 2
0.4

0.1 µ = 0, σ = 4 0.2

0.0 0.0
−15 −10 −5 0 5 10 15 −5 0 5 10 15 20
Figure(29.1a). Plots of the density and distribution function of the logistic distribution.
(wmf/logisticDensity,wmf/logisticDf,80mm,60mm)

29.2 Basic properties of the logistic distribution.


• If X ∼ logistic(µ, σ 2 ), and Y = a + bX, then Y ∼ logistic(a + bµ, b2 σ 2 ). See exercise 31.5 on page 96. It
follows that the family of distributions {logistic(µ, σ 2 ) : µ ∈ R, σ > 0} form a location-scale family—see
definition(1.6b) on page 5.
• If X ∼ logistic(0, 1) then X has the density in equation(29.1a) and distribution function in equation(29.1b).
• If X ∼ logistic(0, π 2 /3) then
e−x 1
fX (x) = −x
and FX (x) = for x ∈ R. (29.2a)
(1 + e ) 2 1 + e−x
This is the simplest form of the logistic density and sometimes called the standard logistic distribution. Multi-
plying the numerator and denominator of the density by e2x shows that fX (x) = ex /(1 + ex )2 .
• If X ∼ logistic(µ, σ 2 ), then
π
fX (x) = √ FX (x)[1 − FX (x)] for x ∈ R.
σ 3

29.3 Moments of the logistic distribution. Recall that if Y ∼ logistic(0, π 2 /3) then
e−y
fY (y) = for y ∈ R.
(1 + e−y )2
Straightforward algebra show that fY (−y) = fY (y) and hence E[Y n ] = 0 when n is odd. By using the expansion
∞  ∞
−2 n X

1 X
= x = (−1)n (n + 1)xn
(1 + x)2 n
n=0 n=0
we get

y 2 e−y
Z
E[Y 2 ] = 2 dy
0 (1 + e−y )2
Z ∞ X∞
=2 y 2 e−y (−1)n (n + 1)e−ny dy
0 n=0

X Z ∞
=2 n
(−1) (n + 1) y 2 e−(n+1)y dy
n=0 0

X Γ(3)
=2 (−1)n (n + 1) by using equation(11.1b) on page 33.
(n + 1)3
n=0

X (−1)n
=4
(n + 1)2
n=0
Because the series is absolutely convergent, we can rearrange and get
Page 88 §29 Mar 10, 2020(20:25) Bayesian Time Series Analysis
   
∞ ∞ ∞ ∞
X 1 X 1  X 1 X 1 
E[Y 2 ] = 4  −  = 4 −2

(n + 1)2 (n + 1) 2 (n + 1) 2 (n + 1)2

n=0 n=0 n=0 n=0
n even n odd n odd
"∞ ∞
# ∞
X 1 X 1 X 1 π2 π2
=4 − 2 = 2 = 2 =
k2 (2k)2 k2 6 3
k=1 k=1 k=1
by using equation 23.2.24 on page 807 in [A BRAMOWITZ &S TEGUN(1965)].
shown that if Y ∼ logistic(0, π 2 /3) then E[Y ] = 0 and var[Y ] = π 2 /3. If X ∼ logistic(µ, σ 2 ), then
We have √
X = µ + 3σY /π where Y ∼ logistic(0, π 2 /3). Hence
E[X] = µ and var[X] = σ 2

General result for moments of the logistic(0, π 2 /3) distribution. Suppose X ∼ logistic(0, π 2 /3). Because the
distribution is symmetric, we have
E[X 2r+1 ] = 0 for r = 0, 1, 2, . . . .
It remains to find the even moments: for r = 1, 2, . . . , we have
Z ∞ Z ∞
2r 2r e−x 2r e−x
E[X ] = x dx = 2 x dx (29.3a)
−∞ (1 + e−x )2 0 (1 + e−x )2
Consider the standard expansion
∞  
1 X `+n `
= z for |z| < 1
(1 − z)n+1 n
`=0
Setting n = 1 and replacing z by −z gives
∞   ∞
1 X `+1 ` `
X
= (−1) z = (` + 1)(−1)` z ` for |z| < 1.
(1 + z)2 1
`=0 `=0
Applying this result to equation(29.3a) gives
Z ∞ ∞
X ∞
X Z ∞
` −(`+1)x
2r
E[X ] = 2 x 2r
(` + 1)(−1) e dx = 2 (` + 1)(−1)`
x2r e−(`+1)x dx
0 `=0 `=0 0
∞ ∞
X Γ(2r + 1) X (−1)j−1
=2 (` + 1)(−1)` = 2Γ(2r + 1)
(` + 1)2r+1 j 2r
`=0 j=1
Because r ≥ 1, the series is absolutely convergent, and we can rearrange to get
   
∞ ∞ ∞ ∞
X 1 X 1  X 1 1 X 1 
= 2Γ(2r + 1)  2r
−2 2r
= 2Γ(2r + 1)  2r
− 2r−1
j (2`) j 2 `2r
j=1 `=1 j=1 `=1
 
1
= 2Γ(2r + 1) 1 − 2r−1 ζ(2r)
2
where ζ denotes the Riemann zeta function—see page 807 in [A BRAMOWITZ &S TEGUN(1965)]. By equation
23.2.16 in that reference, we have
(2π)2r
ζ(2r) = |B2r |
2(2r)!
where Bn are the Bernoulli numbers—see page 804 in [A BRAMOWITZ &S TEGUN(1965)]. Hence
E[X 2r ] = (22r − 2)π 2r |B2r |
Equation 23.1.3 in the same reference gives
1 1 1
B0 = 1, B1 = − , B2 = , B3 = 0, B4 = −
2 6 30
and hence
π2 7π 4
E[X 2 ] = 2π 2 |B2 | = and E[X 4 ] = 14π 4 |B4 | = (29.3b)
3 15
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §30 Page 89

29.4 Characteristic function of the logistic distribution. Suppose Y ∼ logistic(0, π 2 /3). Then
Z ∞ Z ∞
itY eity−y eity ey
φY (t) = E[e ] = −y 2
dy = y 2
dy
−∞ (1 + e ) −∞ (1 + e )

Consider the transformation u/(1 − u) = ey or u = ey /(1 + ey ). which maps y ∈ (−∞, ∞) 7→ u ∈ (0, 1). Note
that du y y 2
dy = e /(1 + e ) . Hence

u it
Z 1  Z 1
φY (t) = du = uit (1 − u)−it du = B(1 + it, 1 − it) = Γ(1 + it) Γ(1 − it) (29.4a)
0 1 − u 0

Euler’s reflection formula (equation 6.1.17 on page 256 in [A BRAMOWITZ &S TEGUN(1965)]) states that
π
Γ(z) Γ(1 − z) = provided z is not an integer.
sin(πz)
Using this result and the facts that Γ(z + 1) = zΓ(z) and sinh x = −i sin(ix) gives
itπ πt
φY (t) = itΓ(it) Γ(1 − it) = = for all t ∈ R.
sin(πit) sinh(πt)

If X ∼ logistic(µ, σ 2 ), then X = µ + 3σY /π where Y ∼ logistic(0, π 2 /3). Hence
√ ! √ !
iµt
√ iµt 3σ 3σ
φX (t) = e φY ( 3σt/π) = e Γ 1 + it Γ 1 − it
π π

iµt 3σt
=e √ for all t ∈ R.
sinh( 3σt)

30 Extreme value distributions


30.1 Extreme value distributions. The term extreme value distribution is used to denote a distribution which
is the limit as n → ∞ of the distribution of max{X1 , . . . , Xn } or min{X1 , . . . , Xn } where X1 , . . . , Xn are
i.i.d. random variables from an unbounded distribution. Further information about extreme value distributions can
be found in [KOTZ &NADARAJAH(2000)].

30.2 The standard Gumbel distribution.

Definition(30.2a). The random variable X has the standard Gumbel distribution iff X has an absolutely
continuous distribution with density function
fX (x) = exp −(e−x + x) for x ∈ R.
 
(30.2a)

Elementary properties.
• X has distribution function
FX (x) = exp −e−x for x ∈ R.
 

and this shows that the function defined by equation(30.2a) is a valid probability density function.
h √ i h √ i
• The mode is at x = 0, with points of inflection at x = ln (3 − 5)/2 and x = ln (3 + 5)/2 . See exer-
cise 31.12 on page 97.
• Median is ln( − ln 2 ). See exercise 31.13 on page 97.
• The moment generating function of X is
MX (t) = E[etX ] = Γ(1 − t) for t < 1.
See exercise 31.14 on page 97.
Page 90 §30 Mar 10, 2020(20:25) Bayesian Time Series Analysis

30.3 First and second moments of the standard Gumbel distribution. The derivative of the moment gener-
ating function is
Z ∞ Z ∞ Z ∞
0 d −t d −t ln(v)
MX (t) = v exp(−v) dv = e exp(−v) dv = − ln(v)v −t exp(−v) dv
0 dt 0 dt 0
R∞
Hence E[X] = γ where γ = − 0 e−x ln(x) dx is Euler’s constant.
Using equations 6.3.1 and 6.3.16 on pages 258–259 in [A BRAMOWITZ &S TEGUN(1965)] shows that the digamma
function ψ(z) satisfies Γ0 (z) = ψ(z)Γ(z) and
∞ 
Γ0 (z) X

d 1 1
ψ(z) = ln( Γ(z) ) = = − − γ for z > 0.
dz Γ(z) k+1 z+k
k=0
The derivative of the digamma function is the trigamma function. Using equation 6.4.10 on page 260 and equation
23.2.24 on page 807 in [A BRAMOWITZ &S TEGUN(1965)] shows that

X 1 π2
ψ 0 (1) = =
k2 6
k=1
Differentiating MX (t) = E[etX ] = Γ(1 − t) gives E[X] = −Γ0 (1) = −ψ(1)Γ(1) = γ and
π2
E[X 2 ] = Γ00 (1) = ψ 0 (1)Γ(1) + ψ(1)Γ0 (1) = ψ 0 (1) + [ψ(1)]2 = + γ2
6
Hence
var[X] = π 2 /6
Differentiating again and using equation 6.4.2 on page 260 in [A BRAMOWITZ &S TEGUN(1965)] gives
2π 2 γ π2 3π 2 γ
Γ000 (1) = ψ 00 (1)Γ(1) + 2ψ 0 (1)Γ0 (1) + ψ(1)Γ00 (1) = ψ 00 (1) − − γ( + γ 2 ) = ψ 00 (1) − − γ3
6 6 6
3π 2 γ
= −2ζ(3) − − γ3
6
and hence
E[X 3 ] − 3µσ 2 − µ3 −Γ000 (1) − 3µσ 2 − µ3 12ζ(3)
skew[X] = = =
σ3 σ3 π2
Finally
E[X 4 ] = Γ0000 (1) = ψ 000 (1)Γ(1) + 3ψ 00 (1)Γ0 (1) + 3ψ 0 (1)Γ00 (1) + ψ(1)Γ000 (1)
π2 π2 3π 2 γ
= 6ζ(4) + 6γζ(3) + 3 ( + γ 2 ) + γ(2ζ(3) + + γ3)
6 6 6
π4
= 6ζ(4) + 8γζ(3) + + π2γ 2 + γ 4
12
and hence E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 = 6ζ(4) + π 4 /12 which implies
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 27
κ[X] = =
σ4 5
30.4 Relationships with other distributions. The following relationships are left to exercise 31.15.
• If X has the standard Gumbel distribution then FX (X) = exp(e−X ) ∼ uniform(0, 1).
• If U ∼ uniform(0, 1) then FX−1 (U ) = −ln( − ln U ) has the standard Gumbel distribution.
• If Z ∼ exponential (1) then − ln Z has the standard Gumbel distribution.
• If X has the standard Gumbel distribution then Z = e−X ∼ exponential (1).
• Suppose X and Y are i.i.d. random variables with the standard Gumbel distribution. Then Z = X − Y ∼
logistic(0, π 2 /3). This can be proved by using the convolution integral or by using characteristic functions—
see exercise 31.17 on page 97.
The following proposition shows why the Gumbel distribution is an extreme value distribution:
Proposition(30.4a). Suppose X1 , X2 , . . . is a sequence of i.i.d. random variables with the exponential (1)
distribution. Let
Vn = max{X1 , . . . , Xn } − ln n for n = 1, 2, . . .
Then
D
Vn =⇒ Z as n → ∞.
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §30 Page 91

where Z has the standard Gumbel distribution.


Proof. Now
P[Xn:n ≤ x] = [ FX (x) ]n = (1 − e−x )n for x > 0.
Hence for x > 0 n
e−x

n −x
P[Vn ≤ x] = P[Xn:n ≤ x + ln n] = 1 − e−(x+ln n) = → ee

1− as n → ∞.
n
D
Similarly, if Wn = min{−X1 , . . . , −Xn } + ln n = − max{X1 , . . . , Xn } + ln n, then Wn =⇒ −Z as n → ∞.
The distribution of −Z is called the reverse standard Gumbel distribution, or the standard Gumbel distribution for
minima.
Definition(30.4b). The random variable X has the reverse standard Gumbel distribution iff X has an abso-
lutely continuous distribution with density function
fX (x) = ex exp −ex for x ∈ R.
 
(30.4a)
30.5 The general Gumbel or type I extreme value distribution—definition.
Definition(30.5a). Suppose a ∈ R and b 6= 0. Then the random variable X has the Gumbel distribution,
Gumbel (a, b) iff X has density
x−a x−a
    
1
fX (x) = exp − exp − exp − for x ∈ R.
|b| b b
Clearly the Gumbel (0, 1) distribution is the standard Gumbel distribution defined in definition (30.4b) above.
0.4 0.4

a = 0, b = 1 a = 0, b = −1
0.3 0.3

a = 0, b = 2 a = 0, b = −2
0.2 0.2

0.1 0.1 a = 0, b = −6
a = 0, b = 6

0.0 0.0
−10 −5 0 5 10 15 20 −20 −15 −10 −5 0 5 10

Figure(30.5a). Plots of the density function of the Gumbel distribution.


(wmf/GumbelDensity1,wmf/GumbelDensity2,80mm,60mm)

If a = 0 and b ∈ (0, ∞), then the Gumbel (0, b) distribution has density
1  
fX (x) = e−x/b exp −e−x/b for x ∈ R.
b
Suppose b ∈ (0, ∞). Then the random variable X has the reverse Gumbel (0, b) distribution or the Gumbel (0, b)
distribution for minima iff X has an absolutely continuous distribution with density function
h i
fX (x) = ex/b exp −ex/b for x ∈ R. (30.5a)

30.6 The general Gumbel distribution—properties.


Linear transformation of a Gumbel distribution.
dy
Suppose X ∼ Gumbel (a, b), c ∈ R and d 6= 0. Let Y = c + dX. Then | dx | = |d| and

y − c − ad y − c − ad
    
dx 1 1
fY (y) = fX (x) = fX (x) = exp − exp − exp −
dy |d| |bd| bd bd
which is the density of the Gumbel (c + ad, bd) distribution.
Relationship to the standard Gumbel distribution. The last result implies that if X ∼ Gumbel (a, b) then X =
a + bY where Y ∼ Gumbel (0, 1). Hence the family of distributions {Gumbel (a, b) : a ∈ R, b > 0} is a location-
scale family, and the family of distributions {Gumbel (a, b) : a ∈ R, b < 0} is a location-scale family—see
definition(1.6b) on page 5.
Page 92 §30 Mar 10, 2020(20:25) Bayesian Time Series Analysis

The distribution function of X ∼ Gumbel (a, b). For x ∈ R we have


FX (x) = P[X ≤ x] = P[X − a ≤ x − a]
 
 P X−a ≤ x−a

b b if b > 0;
=
 P X−a ≥ x−a = 1 − P X−a ≤ x−a
   
b b b b if b < 0;

 FV x−a = exp −e−(x−a)/b
 
b if b > 0;
=
 1 − FV x−a = 1 − exp −e−(x−a)/b
 
b if b < 0.
where V ∼ Gumbel (0, 1), the standard Gumbel distribution.

The quantile function, moments and the moment generating function. See exercise 31.16.

30.7 Links with other distributions.


• Suppose X ∼ Gumbel (a, b) where a ∈ R and b > 0. Let Y = e−X . Then Y ∼ Weibull ( 1b , e−a ).
For the proof see exercise 31.18.
• Suppose Y ∼ Weibull (β, γ) where β > 0 and γ > 0. Then X = −ln Y ∼ Gumbel ( − ln γ, β1 ).
For the proof see exercise 31.19.
• Suppose X ∼ Gumbel (aX , b) and Y ∼ Gumbel (aY , b). Suppose further that X and Y are independent. Then
X − Y ∼ logistic(aX − aY , π 2 b2 /3). For the proof see exercise 31.17.

30.8 The Fréchet or type II extreme value distribution.


Definition(30.8a). Suppose α > 0. Then the random variable X has the standard Fréchet or type II extreme
value distribution iff X has distribution function

FX (x) =
0  if x < 0; (30.8a)
exp −x−α if x > 0.

Elementary properties.
• X has density
fX (x) = αx−1−α exp −x−α

for x > 0. (30.8b)

• Suppose Y ∼ exponential (1) and X = Y −1/α . Then for x > 0 we have


P[X ≤ x] = P[Y −1/α ≤ x] = P[Y 1/α ≥ 1/x] = P[Y ≥ 1/xα ] = exp(−1/xα )
and this is the distribution function of the Fréchet distribution.

• Moments. Now
Z ∞  
−1/α −1/α −y 1
E[X] = E[Y ]= y e dy = Γ 1 − for α > 1 by equation(11.1b) on page 33.
y=0 α
Similarly
Z ∞  
2 −2/α −2/α −y 2
E[X ] = E[Y ]= y e dy = Γ 1 − for α > 2.
y=0 α

1
• Median. By setting FX (x) = 2 we see that the median is at
1

α
ln 2

• Shape of the density. By differentiating the density we see that the mode is at
 α 1/α
xM =
1+α
and fX increases for x < xM and decreases for x > xM .
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §30 Page 93

30.9 The general Fréchet or type II extreme value distribution. This is a linear transformation of the standard
Fréchet distribution defined above.
Definition(30.9a). Suppose a ∈ R, b > 0 and α > 0. Then the random variable X has the Fréchet distribution,
Fréchet(α, a, b) iff X has density
 !
α x − a −1−α x − a −α
  
fX (x) = exp − for x > a. (30.9a)
b b b
1.2
0.5
a = 0, b = 1, α = 1 1.0 a = 0, b = 1, α = 3
0.4
0.8
0.3 a = 0, b = 2, α = 1 a = 0, b = 2, α = 3
0.6

0.2 0.4
a = 0, b = 6, α = 1 a = 0, b = 6, α = 3
0.1 0.2

0.0 0.0
0 2 4 6 8 10 12 14 0 2 4 6 8 10 12 14

Figure(30.9a). Plots of the density function of the Fréchet distribution.


Left: α = 1; right: α = 3. Note the difference in scales on the y-axis.
(wmf/Frechet-1,wmf/Frechet-3,80mm,60mm)

Linear transformation of the Fréchet distribution. If X ∼ Fréchet(α, a, b), a1 ∈ R and b1 > 0 then a1 + b1 X ∼
Fréchet(α, a1 + b1 a, bb1 )—see exercise 31.21 on page 98.
It follows that Fréchet(α, 0, 1) is the standard Fréchet distribution defined in equations(30.8b) and (30.8a). Also,
if X ∼ Fréchet(α, a, b), then X = a + bV where V ∼ Fréchet(α, 0, 1).
Basic properties.
• The distribution function of the Fréchet(α, a, b) distribution is
 !
x − a −α

F (x) = exp − for x > a. (30.9b)
b

• Median. The median is at


b
a+ √
α
ln 2
• Moments. 
E[X] = a + bΓ(1 − α1 ) if α > 1;
∞ otherwise

2 2 2 1 2
 
var[X] = b Γ 1 − α − b Γ 1 − α if α > 2;
∞ otherwise.
Links with other distributions. The following results are left to exercise 31.21 on page 98.
• If U ∼ uniform(0, 1), α > 0, a ∈ R, and b > 0 then a + b (− ln U )−1/α ∼ Fréchet(α, a, b).
• If X1 , . . . , Xn are i.i.d. random variables with the Fréchet(α, a, b) distribution and Y = max{X1 , . . . , Xn }
then
Y ∼ Fréchet(α, a, n1/α b)

• If X ∼ Fréchet(α, a, b) then
1
∼ Weibull (α, b−1 )
X
Page 94 §30 Mar 10, 2020(20:25) Bayesian Time Series Analysis

30.10 The reverse Weibull or type III extreme value distribution.


Definition(30.10a). Suppose α > 0. Then the random variable X has the standard type III extreme value
distribution iff X has distribution function 
exp (−(−x)α ) if x < 0;
FX (x) =
1 if x > 0.
Elementary properties.
• X has density
fX (x) = α(−x)α−1 exp (−(−x)α ) for x < 0. (30.10a)
• Suppose Y ∼ exponential (1) and X = −Y 1/α . Then for x < 0 we have
P[X ≤ x] = P[−Y 1/α ≤ x] = P[Y 1/α ≥ −x] = P[Y ≥ (−x)α ] = exp (−(−x)α )
and this is the distribution function of the type III extreme value distribution.
• Moments. Now for any α > 0 we have
Z ∞  
1
E[X] = −E[Y 1/α ] = − y 1/α e−y dy = −Γ 1 +
y=0 α
Similarly Z ∞  
2 2/α 2/α −y 2
E[X ] = E[Y ]= y e dy = Γ 1 +
y=0 α
1
• Median. By setting FX (x) = 2 we see that the median is at
√α
− ln 2

30.11 Link with the Weibull distribution. Suppose α > 0 and Y ∼ Weibull (α, 1). By equation(27.2a) on
page 82, Y has density
α
fY (y) = αy α−1 e−y for y > 0.
Let X = −Y . Then X has density
fX (x) = α(−x)α−1 exp (−(−x)α ) for x < 0.
and this is the same as equation(30.10a).
Hence we see that if X has the standard type III extreme value distribution, then −X ∼ Weibull (α, 1) and hence
properties of the type III extreme value distribution can be derived from properties of the Weibull distribution.
30.12 The general type III extreme value distribution. This is a linear transformation of the standard type III
distribution.
Definition(30.12a). Suppose a ∈ R, b > 0 and α > 0. Then the random variable X has the type III extreme
value distribution, extremeIII (α, a, b) iff X has density
x − a α−1 x−a α
     
α
fX (x) = − exp − − for x < a. (30.12a)
b b b
1.0 2.5

a = 0, b = 1, α = 1 a = 0, b = 1, α = 6
0.8 2.0

0.6 a = 0, b = 2, α = 1 1.5 a = 0, b = 2, α = 6

0.4 1.0
a = 0, b = 6, α = 1 a = 0, b = 6, α = 6
0.2 0.5

0.0 0.0
−8 −6 −4 −2 0 −8 −6 −4 −2 0

Figure(30.12a). Plots of the density function of the type III extreme value distribution. Left: α = 1; right: α = 6.
Note the difference in scales on the y-axis. When a = 0, b = 6 and α = 1 then f (x) = 61 exp( x6 ) for x < 0.
(wmf/extremeIII-alpha1,wmf/extremeIII-alpha6,80mm,60mm)
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §30 Page 95

Basic properties.
• The distribution function of the extremeIII (α, a, b) distribution. This is
x−a α
   
F (x) = exp − − for x < a. (30.12b)
b
with F (x) = 1 for x ≥ a.
• Linear transformation of the type III extreme value distribution. Suppose X ∼ extremeIII (α, a, b), a1 ∈ R and
b1 > 0. Then a1 + b1 X ∼ extremeIII (α, a1 + b1 a, b1 b).
In particular, if X ∼ extremeIII (α, a, b), then X = a + bV where V ∼ extremeIII (α, 0, 1).
• Median. This occurs at √
α
a − b ln 2

• Moments.  
1
E[X] = a − bΓ 1 +
α
1 2
   
2 2 2
var[X] = b Γ 1 + −b Γ 1+
α α

30.13 Unified formulation—the generalized extreme value distribution. The Fréchet(α, 0, 1) distribution
has density
fX (x) = αx−1−α exp −x−α for x > 0.


Consider the transformation y = α(x − 1); then y ∈ (−α, ∞) and


  
dx fX (x)  y −1−α y −α
fY (y) = fX (x) =
= 1+ exp − 1 + for y > −α. (30.13a)
dy α α α
The extremeIII (α, 0, 1) distribution has density
fX (x) = α(−x)α−1 exp (−(−x)α ) for x < 0.

dy
Consider the transformation y = α(1 + x); then y ∈ (−∞, α), dx = α and

dx y α−1   y α 
fY (y) = fX (x) = 1 − exp − 1 − for y < α.
dy α α
Using the parameter β = −α where now β < 0 gives
 !
y −1−β y −β
  
fY (y) = 1 + exp − 1 + for y < −β. (30.13b)
β β
Comparing equations(30.13a) and (30.13b) leads to the following unified formulation of an extreme value distri-
bution:
Definition(30.13a).The standardized extreme value density is defined to be the density
( −1−α  −α 
1 + αx exp − 1 + αx if α 6= 0;
f (x; α) = (30.13c)
−x
 
exp −(e + x) if α = 0, any x ∈ R.
If α > 0 then x > −α, and if α < 0 then x < −α.
Note that the expression for α = 0 in equation(30.13c) is the limit as α → 0 of the expression for α 6= 0. The
distribution function is 
( −α 
exp − 1 + αx if α 6= 0;
F (x; α) = (30.13d)
−x
 
exp −(e ) if α = 0, any x ∈ R.
Just as for the density, the expression for the distribution function for α = 0 is the limit as α → 0 of the expression
for α 6= 0.
Equation(30.13d) is valid for x > −α when α > 0 and is valid for x < −α when α < 0. In detail
(
0   if x ≤ −α;
when α > 0, then F (x; α) = exp − 1 + x −α if x > −α.
α
Page 96 §31 Mar 10, 2020(20:25) Bayesian Time Series Analysis

and (  −α 
when α < 0, then F (x; α) = exp − 1 + αx if x < −α;
1 if x ≥ −α;
A linear transformation leads to the following density.
Definition(30.13b). Suppose α ∈ R, µ ∈ R and σ > 0. Then the extreme value density GEV (α, µ, σ) is defined
to be the density
1+1/α −τ (x)
(1
σ [τ (x)] e if α 6= 0;
f (x; α, µ, σ) = 1 −τ (x)
σ τ (x)e if α = 0, and x ∈ R.
for x > µ − ασ if α > 0 and for x < µ − σα if α < 0, where
−α
1 + x−µ
(
ασ if α 6= 0;
τ (x) = x−µ 
exp − σ if α = 0.
The distribution function of GEV (α, µ, σ) is F (x; α, µ, σ) = e−τ (x) .
Clearly GEV (0, µ, σ) = Gumbel (µ, σ).
30.14 Properties of the generalized extreme value distribution. For proofs of the following, see exer-
cise 31.22.
• If X ∼ exponential (1), then µ − σ ln X ∼ GEV (0, µ, σ) = Gumbel (µ, σ).
• If X ∼ GEV (α, µ, σ), a > 0 and b ∈ R, then aX + b ∼ GEV (α, aµ + b, aσ).
• If X ∼ Weibull (β, γ) then β(1 − γ ln(X/γ)) ∼ GEV  (0, β,γ) = Gumbel (beta, γ).
• If X ∼ GEV (0, µ, σ) = Gumbel (µ, σ) then σ exp − X−µ µσ ∼ Weibull (µ, σ).

31 Exercises (exs-extreme.tex)

The logistic distribution.


31.1 Shape of the logistic density. Suppose X ∼ logistic(µ, σ 2 ) has density function f .
(a) Show that f is symmetric about x = µ.
(b) Show that f increases for x < µ and decreases for x > µ.
√ √
(c) Show that f is√initially convex
√ then concave and then convex with points of inflection at x = µ − σ 3 ln(2 + 3)/π
and x = µ + σ 3 ln(2 + 3)/π. [Ans]
31.2 Alternative expressions for the density and distribution function. √
(a) Show that the density of the logistic(µ, σ 2 ) distribution is, where s = σ 3/π
π(x − µ) x−µ
   
π 1
f (x) = √ sech2 √ = sech2 for x ∈ R.
4σ 3 2σ 3 4b 2b
This result explains why the logistic distribution is sometimes called the sech-squared distribution.

(b) Show that the distribution function of the logistic(µ, σ 2 ) distribution is,where s = σ 3/π
π(x − µ) x−µ
   
1 1 1 1
FX (x) = + tanh √ = + tanh for x ∈ R. [Ans]
2 2 2σ 3 2 2 2b
31.3 Quantile function and median of the logistic distribution. Show that the quantile function of the logistic(µ, σ 2 ) distribu-
tion is √  
−1 σ 3 p
FX (p) = µ + ln for p ∈ (0, 1).
π 1−p
−1 1
and hence the median of the logistic distribution is FX ( /2) = µ. [Ans]
31.4 Suppose Y ∼ logistic(µ, σ 2 ). Find the coefficients of skewness and kurtosis of X. [Ans]
2
31.5 Linear transformation of the logistic distribution.Suppose X ∼ logistic(µ, σ ) and Y = a + bX where a ∈ R and b ∈ R.
Prove that Y ∼ logistic(a + bµ, b2 σ 2 ). [Ans]
31.6 Suppose X ∼ logistic(µ, σ 2 ). Find the hazard function of X and show that it is proportional to the distribution function
of X. [Ans]
31.7 Suppose U ∼ uniform(0, 1) and  
U
Y = ln
1−U
Prove that Y ∼ logistic(0, π 2 /3).
Conversely, if Y ∼ logistic(0, π 2 /3) then U = eY /(1 + eY ) ∼ uniform(0, 1). [Ans]
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §31 Page 97

31.8 Suppose Y ∼ logistic(0, π 2 /3) and X = |Y |.


(a) Find the density and distribution function of X. (This is called the half-logistic distribution.)
(b) Find E[X] and var[X]. [Ans]
31.9 Suppose Y1 , Y2 , . . . , Yn are i.i.d. random variables with the logistic(0, π 2 /3) distribution.
(a) Find the density of Yr:n , the rth order statistic.
(b) Find the characteristic function of Yr:n .
(c) Find E[Yr:n ] and var[Yr:n ]. [Ans]
31.10 (a) Suppose µ ∈ R, σ > 0 and X ∼ exponential (1). Prove that

σ 3
Y =µ+ ln(eX − 1) ∼ logistic(µ, σ 2 )
π
In particular, if X ∼ exponential (1) then Y = ln(eX − 1) ∼ logistic(0, π 2 /3), the standard logistic distribution.
Conversely, if Y ∼ logistic(0, π 2 /3), the standard logistic distribution, then X = ln(eY + 1) ∼ exponential (1).
(b) Suppose X and Y are i.i.d. random variables with the exponential (1) distribution. Prove that
√  
σ 3 X
Y =µ+ ln ∼ logistic(µ, σ 2 ) [Ans]
π Y

31.11 (a) Suppose X ∼ logistic(0, π 2 /3), the standard logistic distribution. Prove that Y = eX + 1 ∼ Pareto(1, 1, 0) which
has density fY (y) = 1/y2 for y > 1.
(b) Suppose Y ∼ Pareto(1, 1, 0) which has density fY (y) = 1/y2 for y > 1. Prove that X = ln(Y −1) ∼ logistic(0, π 2 /3),
the standard logistic distribution. [Ans]
Extreme value distributions.
31.12 Suppose f is the density function of the standard Gumbel distribution.
(a) Find the derivative of f and show that f increases for x < 0, decreases for x > 0 and has a unique mode at x = 0.
00
(b) Find the second derivative of f and showh that√f is convex(f >h0) , then concave (f 00 < 0), then convex again
00
i √ i
(f > 0) with inflection points at x = ln (3 − 5)/2 and x = ln (3 + 5)/2 . [Ans]

31.13 Suppose X has the standard Gumbel distribution. Find the quantile function (the inverse of the distribution function)
and the median of the distribution. [Ans]
31.14 Suppose X has the standard Gumbel distribution. Find the moment generating function MX (t) = E[etX ]. [Ans]
31.15 Show that
(a) if U ∼ uniform(0, 1) then − ln( − ln U ) has the standard Gumbel distribution;
(b) if X has the standard Gumbel distribution then exp(e−X ) ∼ uniform(0, 1);
(c) if Z ∼ exponential (1) then − ln Z has the standard Gumbel distribution;
(d) if X has the standard Gumbel distribution then Z = e−X ∼ exponential (1). [Ans]
31.16 Suppose X ∼ Gumbel (a, b).
−1
(a) Find the quantile function, FX of X.
(b) Find the moment generating function of X.
(c) Find E[X] and var[X]. [Ans]
31.17 Suppose X and Y are i.i.d. random variables with the Gumbel (0, 1) distribution.
(a) By using the convolution integral, prove that Z = X − Y ∼ logistic(0, π 2 /3).
(b) By using characteristic functions, prove that Z = X − Y ∼ logistic(0, π 2 /3).
(c) Suppose X ∼ Gumbel (aX , b), Y ∼ Gumbel (aY , b) and X and Y are independent. Prove that Z = X − Y ∼
logistic(aX − aY , π 2 b2 /3). [Ans]

31.18 Suppose X ∼ Gumbel (a, b) where a ∈ R and b > 0. Let Y = e−X . Prove that Y ∼ Weibull ( b1 , e−a ). [Ans]

31.19 Suppose Y ∼ Weibull (β, γ) where β > 0 and γ > 0. Prove that X = −ln Y ∼ Gumbel ( − ln γ, β1 ). [Ans]

31.20 Suppose X ∼ Gumbel (0, 1) and Y = (−X|X < 0). Hence Y ≥ 0. Find the density of Y . [Ans]
Page 98 §32 Mar 10, 2020(20:25) Bayesian Time Series Analysis

31.21 (a) Suppose U ∼ uniform(0, 1), α > 0, a ∈ R, and b > 0. Prove that a + b (− ln U )−1/α ∼ Fréchet(α, a, b).
(b) Suppose X ∼ Fréchet(α, a, b), a1 ∈ R and b1 > 0. Prove that a1 + b1 X ∼ Fréchet(α, a1 + b1 a, bb1 ).
(c) Suppose X1 , . . . , Xn are i.i.d. random variables with the Fréchet(α, a, b) distribution and Y = max{X1 , . . . , Xn }.
Prove that
Y ∼ Fréchet(α, a, bn1/α )
(d) Suppose X ∼ Fréchet(α, 0, b). Prove that
1
∼ Weibull (α, b−1 ) [Ans]
X
31.22 (a) Suppose X ∼ exponential (1). Prove that µ − σ ln X ∼ GEV (0, µ, σ).
(b) Suppose X ∼ Weibull (β, γ) where β > 0 and γ > 0. Prove that β(1 − γ ln(X/γ)) ∼ GEV (0, β, γ).
 
(c) Suppose X ∼ GEV (0, µ, σ) where µ ∈ R and σ > 0. Prove that σ exp − X−µµσ ∼ Weibull (µ, σ).
(d) Suppose X ∼ GEV (α, µ, σ), a > 0 and b ∈ R.. Prove that Y = aX + b ∼ GEV (α, aµ + b, aσ). [Ans]

32 The Levy and inverse Gaussian distributions


The Lévy distribution
32.1 The standard Lévy distribution
Definition(32.1a). Suppose Z ∼ N (0, 1) and U = 1/Z 2 . Then U has the standard Lévy distribution,
Lévy(0, 1).

Derivation of the following properties of the standard Lévy distribution is left to exercise 33.1 on page 102:
• Density function of U .  
1 1
fU (u) = √ exp − for u ∈ (0, ∞).
2πu3 2u
• Distribution function of U .
  
1
FU (u) = 2 1 − Φ √ for u ∈ (0, ∞).
u
• Shape of the density function of U . The function fU√first increases and√ then decreases with mode at /3. The
1
density function fU has points of inflection at /3 −
1 10/15 and /3 +
1 10/15; it is initially convex, then concave
and finally convex.
• Moments of U .
E[U ] = ∞ and E[U k ] = ∞ for k = 1, 2, . . . .
It follows that the variance, skewness and kurtosis of U do not exist. Also, the moment generating function
MU (t) = E[etU ] = ∞ for all t > 0.
• Quantile function of U .
1
FU−1 (u) =  2 for p ∈ [0, 1).
−1
Φ (1 − p/2)
32.2 Characteristic function of the standard Lévy distribution. This derivation is a little tricky—here are the
details
Proposition(32.2a). Suppose U ∼ Lévy(0, 1). Then U has characteristic function
−1 if t < 0;
(
 
φU (t) = exp −|t|1/2 1 − i sgn(t)
 
for t ∈ R, where sgn(t) = 0 if t = 0; (32.2a)

1 if t > 0.
Proof. Suppose t > 0. Using the transformation w = (1 − i) t implies w2 = −2it and hence
Z ∞ Z ∞
2itu2 − 1
   
itu 1 1 1
φU (t) = e √ exp − du = √ exp du
0 2πu3 2u 0 2πu3 2u
Z ∞
e−w ∞ 1
   Z  2 
1 1 2 1 1  −1/2 1/2
= √ exp − w u+ du = √ √ exp − u − wu du (32.2b)
0 2πu3 2 u 2π 0 u3 2

Now let y = 1/(w2 u); hence y > 0, du 2 2
dy = w u , and u
−1/2
− wu1/2 = wy 1/2 − y −1/2 . Hence

2 Univariate Continuous Distributions Mar 10, 2020(20:25) §32 Page 99

e−w ∞ −1/2
Z  2 
1  1/2
φU (t) = √ wy exp − wy − y −1/2 dy (32.2c)
2π 0 2
Taking the average of equations(32.2b) and (32.2c) gives
e−w ∞ wy −1/2 y −3/2
Z    2 
1  1/2 −1/2
φU (t) = √ + exp − wy − y dy
2π 0 2 2 2
Now let z = wy 1/2 − y −1/2 ; hence
e−w ∞ −z2 /2 √
Z 
φU (t) = √ e dz = exp (−w) = exp −(1 − i) t
2π −∞
√
Finally, using complex conjugates gives, for t > 0, φU (−t) = φU (t) = exp −(1 + i) t .

32.3 The general Lévy distribution.


Definition(32.3a). Suppose a ∈ R and b > 0. Then V has the Lévy distribution, Lévy(a, b), iff V = a + bU
where U ∼ Lévy(0, 1).
Most properties of the Lévy(a, b) follow immediately from those for the standard Lévy distribution, Lévy(0, 1).
• Clearly V takes values in (a, ∞).
• Suppose X ∼ Lévy(a, b), c ∈ R and d > 0. Then c + dX ∼ Lévy(c + ad, bd).
It follows that the family of Lévy distributions {Lévy(a, b) : a ∈ R, b > 0} form a location-scale family—see
definition(1.6b) on page 5.
• Density function of V .
r  
b 1 b
fV (v) = exp − for v ∈ (a, ∞).
2π (v − a)3/2 2(v − a)
• Distribution function of V . " r !#
b
FV (v) = 2 1 − Φ
v−a
• Shape of the density function of V . The function fV √first increases and then √ decreases with mode at a + /3.
b
The function fV has points of inflection at a + ( /3 −
1 10/15)b and a + ( /3 +
1 10/15)b; it is initially convex, then
concave and finally convex.
• Moments of V .
E[V ] = ∞ and E[V k ] = ∞ for k = 1, 2, . . . .
It follows that the variance, skewness and kurtosis of V do not exist.
• Quantile function of V .
b
FV−1 (v) = a +  2 for p ∈ (0, 1).
−1
Φ (1 − p/2)
• Characteristic function of V .
 
φV (t) = exp ita − b1/2 |t|1/2 1 − i sgn(t)

for t ∈ R

1.0

0.8 b = 1/2
b=1
0.6 b=2

0.4

0.2

0.0
0 1 2 3 4
Figure(32.3a). Plot of the density of the Lévy distribution for a = 0 and various values of b.
The height of the mode decreases as b increases.
(wmf/LevyDistribution,80mm,60mm)
Page 100 §32 Mar 10, 2020(20:25) Bayesian Time Series Analysis

The inverse Gaussian distribution

32.4 The inverse Gaussian or Wald distribution. The name “inverse Gaussian” is misleading because this
distribution is not the inverse of the normal distribution. In fact the name is due to the fact that its cumulant
generating function (logarithm of the characteristic function) is the inverse of the cumulant generating function of
a Gaussian random variable.
Definition(32.4a). Suppose µ > 0 and λ > 0. Then the inverse Gaussian distribution, inverseGaussian(µ, λ),
is defined to be the distribution with density
r  
λ λ 2
f (x) = exp − 2 (x − µ) for x > 0. (32.4a)
2πx3 2µ x
There are two parameters: it is explained below that λ is the shape parameter and µ is the mean. The distribution
inverseGaussian(1, 1) is called the standard inverse Gaussian distribution.
It is easy to check that if X ∼ inverseGaussian(µ, λ) and α > 0, then αX ∼ inverseGaussian(αµ, αλ). This
implies that if X ∼ inverseGaussian(1, λ/µ) and Y = µX then Y ∼ inverseGaussian(µ, λ).
It follows that the family of distributions {inverseGaussian(µ, λ) : µλ = α}, where α > 0, is a scale family—see
definition (1.6d) on page 5.

32.5 The inverse Gaussian distribution: the distribution function. Consider the function
r √ 
λ x−µ λ 1/2 µ 
f (x) = = x − 1/2 for x > 0.
x µ µ x
Let Y = f (X) where X ∼ inverseGaussian(µ, λ). Then
√  
df λ 1 µ
= + > 0 for all x > 0.
dx 2µ x1/2 x3/2
Also f (x) → −∞ as x → 0 and f (x) → ∞ as x → ∞.
Hence this is a 1-1 transformation from x ∈ (0, ∞) 7→ y ∈ (−∞, ∞). The density of Y is
2µx3/2
 2
dx 2µ 1 y
fY (y) = fX (x) = fX (x) √
= √ exp − (32.5a)
dy λ(x + µ) (x + µ) 2π 2
It is left to an exercise to show that ! 2
y e−y /2
fY (y) = 1− p √ for y ∈ R. (32.5b)
4λ/µ + y 2 2π
Hence the distribution function of Y is
s !
2λ/µ 4λ
FY (y) = Φ(y) + e Φ − y2 + for y ∈ (−∞, ∞). (32.5c)
µ
Because the transformation is 1-1, we have
√ ! r ! r !
λ(x − µ) λ x−µ λ x + µ
FX (x) = FY √ =Φ + e2λ/µ Φ − for x > 0. (32.5d)
xµ x µ x µ

32.6 The inverse Gaussian distribution: the characteristic function. Let α = 2itµ2 /λ. Then
Z ∞ r Z ∞  
itX itx λ λ 2
φX (t) = E[e ] = e f (x; µ, λ) dx = exp itx − 2 (x − µ) dx
0 2πx3 0 2µ x
r Z ∞  
λ λ
= 3
exp itx − 2 (x2 − 2µx + µ2 ) dx
2πx 0 2µ x
 r Z ∞  
λ λ λx λ
= exp exp itx − 2 − dx
µ 2πx3 0 2µ 2x
 r Z ∞
xλ(1 − α) 2itµ2
 
λ λ λ
= exp exp − − dx where α = .
µ 2πx3 0 2µ2 2x λ
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §32 Page 101

Hence " #
r Z ∞ 1/2
− −

λ λ λ λ(1 α) xλ(1 α) λ
φX (t) = exp − (1 − α)1/2 exp − − dx
µ µ 2πx3 0 µ 2µ2 2x
r Z ∞ " 2 #

 
λ λ λ λ(1 α) µ
= exp − (1 − α)1/2 exp − x− dx
µ µ 2πx3 0 2xµ2 (1 − α)1/2
 
λ λ 1/2
= exp − (1 − α)
µ µ
because the integral of the density of the inverseGaussian µ(1 − α)−1/2 , λ is equal to 1. We have shown that

" r #
itX λ λ 2itµ2
φX (t) = E[e ] = exp − 1− (32.6a)
µ µ λ

32.7 The inverse Gaussian distribution: other properties.


Shape of the density. See exercise 33.9. The density f first increases and then decreases with mode at
"r #
9µ2 3µ
µ 1+ 2 −
4λ 2λ
The density is first convex, then concave and finally convex.
3.0 3.0

2.5 µ = 1, λ = 0.2 2.5 µ = 3, λ = 0.2

2.0 2.0
µ = 1, λ = 1 µ = 3, λ = 1
1.5 1.5

1.0 µ = 1, λ = 3 1.0 µ = 3, λ = 3

0.5 0.5

0.0 0.0
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5

Figure(32.7a). Plots of the density function of the inverse Gaussian distribution. Left: µ = 1; right: µ = 3.
(wmf/inverseGaussian-mu1,wmf/inverseGaussian-mu3,80mm,60mm)
• Moments—see exercise 33.7. Suppose X ∼ inverseGaussian(µ, λ). Then
µ3
E[X] = µ and var[X] =
λ
• Convolutions—see exercise 33.10. The following results are proved by using characteristic functions.
(a) Suppose X1 , . . . , Xn are independent random variables with Xk ∼ inverseGaussian(µk , λk ) for k = 1, . . . , n.
Then
n n
X λk Xk 2
X λk
2
∼ inverseGaussian(µ, µ ) where µ =
µk µk
k=1 k=1
(b) Suppose X1 , . . . , Xn are i.i.d. random variables with the inverse Gaussian inverseGaussian(µ, λ) distribution.
X ∼ inverseGaussian(µ, nλ)
(c) Suppose α > 0, X ∼ inverseGaussian(µ1 , αµ21 ), Y ∼ inverseGaussian(µ2 , αµ22 ) and X and Y are indepen-
dent, then X + Y ∼ inverseGaussian( µ, αµ2 ) where µ = µ1 + µ2 .
• Limiting distributions. See exercise 33.12.
D
inverseGaussian(µ, λ) =⇒ Lévy(0, λ) as µ → ∞
and
D
inverseGaussian(µ, λ) =⇒ µ as λ → ∞
Page 102 §33 Mar 10, 2020(20:25) Bayesian Time Series Analysis

33 Exercises (exs-LevyInverseGaussian.tex.tex)

The Lévy distribution.


33.1 The standard Lévy distribution. Suppose U has the standard Lévy distribution, Lévy(0, 1).
(a) Show that the density function of U is
 
1 1
fU (u) = √ exp − for u ∈ (0, ∞).
2πu3 2u
(b) Show that the distribution function of U is   
1
FU (u) = 2 1 − Φ √ for u ∈ (0, ∞).
u
(c) Show that E[U k ] = ∞ for k = 1, 2, . . . .
(d) Show that the quantile function of U is
1
FU−1 (u) =  2 for p ∈ [0, 1).
Φ−1 (1 − p/2)
(e) Show that fU first increases and then decreases with mode at 1/3.
√ √
(f) Show that the function fU has points of inflection at 1/3 − 10/15 and 1/3 + 10/15; show further that fU is initially
convex, then concave and finally convex. [Ans]

33.2 Suppose U ∼ Lévy(0, 1). Show that X = 1/ U has the standard half-normal distribution. [Ans]
33.3 Suppose X andY are independent random
 variables such that X ∼ Lévy(a1 , b1 ) and Y ∼ Lévy(a2 , b2 ). Show that
1/2 1/2 2
X + Y ∼ Lévy a1 + a2 , (b1 + b2 ) . [Ans]
33.4 Suppose X ∼ Lévy(a, b) where a ∈ R and b > 0.
(a) Prove that X has a stable distribution with characteristic exponent α = 1/2.
(b) Using the notation in equation(4.3b), show that X is stable with characteristic function {c = a, d = b, β = 1}. [Ans]

The inverse Gaussian distribution.


33.5 Suppose we denote the density of the inverse Gaussian distribution in equation(32.4a) by f (x; µ, λ). Prove that
1 x λ 1 x µ
f (x; µ, λ) = f ( ; 1, ) and f (x; µ, λ) = f ( ; , 1)
µ µ µ λ λ λ
The first equality says if X ∼ inverseGaussian(1, µλ ) then µX ∼ inverseGaussian(µ, λ), and the second equality says if
X ∼ inverseGaussian( µλ , 1) then λX ∼ inverseGaussian(µ, λ). [Ans]
33.6 (a) Derive equation(32.5b) from equation(32.5a).
(b) Derive equation(32.5c) from equation(32.5b). [Ans]
33.7 Suppose X ∼ inverseGaussian(µ, λ).
(a) Show that
µ3
E[X] = µ and var[X] =
λ
(b) Find expressions for E[X 3 ] and E[X 4 ]. [Ans]
33.8 Suppose X ∼ inverseGaussian(µ, λ).
(a) Find skew[X], the skewness of X.
(b) Find κ[X], the kurtosis of X. [Ans]
33.9 Suppose X ∼ inverseGaussian(µ, λ) with density function f .
(a) Show that f increases and then decreases with"mode at #
r
9µ2 3µ
µ 1+ 2 −
4λ 2λ
(b) Show that f 00 is given by
r
λ(x − µ)2
 2 4
6λx3 2λ2
 
λ λx
f 00 (x) = exp − + + x 2
(15 − ) − 10λx + λ 2
32πx11 2µ2 x µ4 µ2 µ2
By investigating the quadratic in x inside the last set of square brackets, it is possible to show that f 00 is initially
positive, then negative and then positive again. Hence f is initially convex, then concave and then convex again. The
positions of the two points of inflection are the positive roots of this quartic—very complicated! (Note that we need
only consider the special case of the quartic when µ = 1 by the first equality in exercise 33.5.) [Ans]
2 Univariate Continuous Distributions Mar 10, 2020(20:25) §34 Page 103

33.10 (a) Suppose X1 , . . . , Xn are independent random variables with Xk ∼ inverseGaussian(µk , λk ) for k = 1, . . . , n. Show
that
n n
X λk Xk 2
X λk
2
∼ inverseGaussian(µ, µ ) where µ =
µk µk
k=1 k=1
(b) Suppose X1 , . . . , Xn are i.i.d. random variables with the inverse Gaussian inverseGaussian(µ, λ) distribution. Show
that
X ∼ inverseGaussian(µ, nλ) [Ans]
33.11 Suppose X ∼ inverseGaussian(µ, λ). Show that X is infinitely divisible. [Ans]
D
33.12 (a) Show that inverseGaussian(µ, λ) =⇒ Lévy(0, λ) as µ → ∞.
D
(b) Show that inverseGaussian(µ, λ) =⇒ µ as λ → ∞. [Ans]
33.13 Suppose X ∼ inverseGaussian(µ, λ). By expressing the density of X in the form
r    
λ λ λ λ 1
fX (x) = exp exp − 2 x − for x ∈ (0, ∞).
2πx3 µ 2µ 2 x
show that inverseGaussian(µ, λ) belongs to the exponential family of distributions with natural parameters −λ/(2µ2 )
and −λ/2 and natural statistics X and 1/X. [Ans]

34 Other distributions with bounded support


34.1 The Wigner or semicircle distribution. The random variable X has the standard semicircle distribution
iff X has density
2p
fX (x) = 1 − x2 for x ∈ [−1, 1]. (34.1a)
π
Suppose µ ∈ R and r > 0. Then the random variable X has the semicircle distribution with mean µ and radius r
iff X has density
2 p
fX (x) = 2 r2 − (x − µ)2 for x ∈ [µ − r, µ + r]. (34.1b)
πr

34.2 The triangular distribution. Suppose a ∈ R, b ∈ (0, ∞) and p ∈ [0, 1]; then the random variable X has
the triangular distribution, triangular(a, b, p) iff X has the following density.
2
If p = 0: fX (x) = b2
(a + b − x) for x ∈ [a, a + b].
( 2
pb2
(x − a) if x ∈ [a, a + pb];
If 0 < p < 1: fX (x) = 2
b2 (1−p)
(a + b − x) if x ∈ [a + pb, a + b].
2
If p = 1: fX (x) = b2
(x − a) for x ∈ [a, a + b].

fX (x) .............. fX (x) .............. fX (x) ..............


.. .. ..
... ... ...
.. ............ .. .......... .. ...
.........
... ... ......... ... .... ..... ... ...... ....
... ... ...... ... .. . ......... ... ...... ...
... ...
......
...... ... .
... . ..... ... .
.......... ...
. . ..... .
... ... ...... ... ... . ..... ... ...... ...
... ...
......
...... ... ... . ..... ... .
.......... ...
... ... ...... ... .. . ..... ... ....... ...
... 2/b ...
......
...... ... .
.
.. 2/b
.
.
.....
..... ... .
...........
. . 2/b ...
...
... ... ...... ... ... . ..... ... ..
...
... ... ......
...... ... ..
. . .....
..... ... ......... ...
...
... ... ...... ... ..
. .
. ..... ... ...
.....
. ...
... .
.
......
...... ... .... . ..... ... ..
..
.... ...
. ..... ....
... ..
. ...... ... ... ..... ... ..
..
. ...
... ... ...... . ... ..
.
.
. .... . ... .
..
.....
. .. .
.....................................................................................................................................................................................
. ...................................................................................................................................................................................... .....................................................................................................................................................................................
. . .
.... .... ....
0 a a+b x 0 a a + pb a+b x 0 a a+b x
Figure(34.2a). Plot of triangular density for p = 0(left), 0 < p < 1(centre) and p = 1(right).
(PICTEX)

34.3 The sine distribution. Suppose a ∈ R and b ∈ (0, ∞); then the random variable X has the sine distribution, which we shall denote as sinD(a, b), iff X has density
$$f_X(x) = \frac{\pi}{2b}\sin\left(\pi\,\frac{x-a}{b}\right) \qquad\text{for } x \in [a, a+b]. \tag{34.3a}$$
From standard properties of the sine function, it is clear that the density first increases and then decreases with
mode at x = a + b/2; also f 00 < 0 and hence the density is concave.

34.4 The U-power distribution. Suppose a ∈ R, b ∈ (0, ∞) and k ∈ {1, 2, 3, . . .}; then the random variable X
has the U-power distribution, which we shall denote as Upower(a, b, k), iff X has density
2k + 1 x − a 2k
 
fX (x) = for x ∈ [a − b, a + b]. (34.4a)
2b b
The density of the Upower(0, 1, k) distribution is f (x) = (k + 1/2)x2k for x ∈ [−1, 1].
Shape of the Upower(a, b, k) density. Differentiating equation(34.4a) gives, for x ∈ [a − b, a + b]
(2k + 1)(2k) x − a 2k−1 (2k + 1)(2k)(2k − 1) x − a 2k−2
   
0 00
fX (x) = and fX (x) =
2b2 b 2b3 b
Hence fX is convex and fX first decreases and then increases with minimum at x = a. Also the modes are at a − b
and a + b and fX is symmetric about x = a.
Illustrative plots of the density are given in figure (34.4a).
[Figure(34.4a). Plots of the U-power density on [−1, 1]: Upower(0, 1, 1) (black) and Upower(0, 1, 3) (red).]
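The U-power distribution is also easy to simulate from the quantile function of exercise 35.19. A small numpy sketch, checking the variance formula (2k + 1)/(2k + 3) of exercise 35.20:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, k = 0.0, 1.0, 3
u = rng.uniform(size=200_000)
# F^{-1}(u) = a + b*(2u-1)^{1/(2k+1)}; use sign*|.|^power to take the odd root of a negative number
x = a + b * np.sign(2 * u - 1) * np.abs(2 * u - 1) ** (1 / (2 * k + 1))
print(x.mean())                              # approximately E[X] = a = 0
print(x.var(), (2 * k + 1) / (2 * k + 3))    # both approximately 7/9
```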

35 Exercises (exs-otherbounded.tex)

The Wigner or semicircle distribution
35.1 The density and distribution function of the standard semicircle distribution.
(a) Check that equation(34.1a) integrates to 1 and so is a density.
(b) Show that the density increases for x ≤ 0, decreases for x ≥ 0, has a mode at 0 and is concave for all x ∈ [−1, 1].
(c) Show that the distribution function is
$$F_X(x) = \frac{1}{2} + \frac{1}{\pi}x\sqrt{1-x^2} + \frac{1}{\pi}\arcsin x \qquad\text{for } x \in [-1, 1]. \text{ [Ans]}$$
35.2 Moments of the standard semicircle distribution. Suppose X has the standard semicircle distribution with the density in
equation(34.1a).
(a) Show that the moments of X are given by
$$E[X^{2n-1}] = 0 \qquad\text{and}\qquad E[X^{2n}] = \frac{1}{2^{2n}}\,\frac{1}{n+1}\binom{2n}{n} \qquad\text{for } n = 1, 2, \ldots.$$
In particular E[X] = 0 and var[X] = E[X 2 ] = 1/4.
(b) Show that the skewness is skew[X] = 0 and the kurtosis is κ[X] = 2. [Ans]
35.3 Suppose the random vector (X, Y) has the uniform distribution on the unit disc {(x, y) ∈ R² : 0 < x² + y² < 1}. Prove that both X and Y have the standard semicircle distribution. [Ans]
35.4 Let semicircle(µ, r) denote the semicircle distribution with mean µ and radius r, where µ ∈ R and r > 0.
Suppose the random variable X ∼ semicircle(µ, r).
(a) Prove that X = µ + rY where Y has the standard semicircle distribution, semicircle(0, 1).
(b) Show that the distribution function of X is
$$F_X(x) = \frac{1}{2} + \frac{x-\mu}{\pi r^2}\sqrt{r^2-(x-\mu)^2} + \frac{1}{\pi}\arcsin\frac{x-\mu}{r} \qquad\text{for } x \in (\mu-r, \mu+r).$$
Note that part (a) implies E[X] = µ, var[X] = r²/4, skew[X] = 0 and κ[X] = 2.
It also implies that the family of distributions {semicircle(µ, r) : µ ∈ R, r > 0} is a location-scale family—see
definition (1.6b) on page 5. [Ans]
35.5 Show that beta(3/2, 3/2) = semicircle(1/2, 1/2). [Ans]

The triangular distribution
35.6 The triangular(0, 1, p) distribution has the following density:
If p = 0: f_X(x) = 2(1 − x) for x ∈ [0, 1].
If 0 < p < 1: f_X(x) = 2x/p if x ∈ [0, p], and f_X(x) = 2(1 − x)/(1 − p) if x ∈ [p, 1].   (35.6a)
If p = 1: f_X(x) = 2x for x ∈ [0, 1].
Suppose X ∼ triangular(a, b, p). Show that X = a + bY where Y ∼ triangular(0, 1, p).
It follows that the family of distributions {triangular(a, b, p) : a ∈ R, b ∈ (0, ∞) } is a location-scale family—see
definition(1.6b) on page 5. [Ans]
35.7 (a) Suppose X ∼ triangular(0, 1, 1). Show that E[Xⁿ] = 2/(n + 2) for n ∈ {1, 2, . . .}.
(b) Suppose X ∼ triangular(0, 1, p) where p ∈ [0, 1). Show that
$$E[X^n] = \frac{2}{(n+1)(n+2)}\,\frac{1-p^{n+1}}{1-p} \qquad\text{for } n \in \{1, 2, \ldots\}.$$
(c) Suppose X ∼ triangular(a, b, p) where a ∈ R, b ∈ (0, ∞) and p ∈ [0, 1]. Show that
$$E[X] = a + \frac{b}{3}(1+p) \qquad\text{and}\qquad \text{var}[X] = \frac{b^2}{18}(1-p+p^2) \text{ [Ans]}$$
35.8 Skewness and kurtosis. Suppose X ∼ triangular(a, b, p) where a ∈ R, b ∈ (0, ∞) and p ∈ [0, 1].
(a) Show that the skewness is
$$\text{skew}[X] = \frac{2^{1/2}(1-2p)(1+p)(2-p)}{5(1-p+p^2)^{3/2}}$$
If Y ∼ triangular(a, b, 1 − p), then skew[Y] = −skew[X]; and if X ∼ triangular(a, b, 1/2), then skew[X] = 0.
(b) Show that the kurtosis is κ[X] = 12/5. [Ans]
35.9 The distribution function. Suppose X ∼ triangular(a, b, p) where a ∈ R, b ∈ (0, ∞) and p ∈ [0, 1].
(a) Show that the distribution function of X is as follows.
If p = 0: F_X(x) = 1 − (a + b − x)²/b² for x ∈ [a, a + b].
If 0 < p < 1: F_X(x) = (x − a)²/(pb²) if x ∈ [a, a + pb], and F_X(x) = 1 − (a + b − x)²/(b²(1 − p)) if x ∈ [a + pb, a + b].
If p = 1: F_X(x) = (x − a)²/b² for x ∈ [a, a + b].
(b) Show that the quantile function of X is, for all u ∈ [0, 1]:
$$F_X^{-1}(u) = \begin{cases} a + b\sqrt{pu} & \text{if } 0 \le u \le p; \\ a + b - b\sqrt{(1-u)(1-p)} & \text{if } p \le u \le 1. \end{cases}$$
Hence if U ∼ uniform(0, 1), then F_X^{-1}(U) ∼ triangular(a, b, p). [Ans]
35.10 Suppose X ∼ triangular(0, 1, p). Show that 1 − X ∼ triangular(0, 1, 1 − p). [Ans]
35.11 Suppose X ∼ triangular(a, b, p) where a ∈ R, b ∈ (0, ∞) and p ∈ [0, 1]. Suppose further that c ∈ R and d ∈ (0, ∞).
(a) Show that c + dX ∼ triangular(c + da, db, p).
(b) Show that c − dX ∼ triangular(c − da − db, db, 1 − p).
In particular, if X ∼ triangular(a, b, p), then −X ∼ triangular(−a − b, b, 1 − p). [Ans]
35.12 Suppose U1 and U2 are i.i.d. random variables with the uniform(a, a + b) distribution where a ∈ R and b ∈ (0, ∞).
(a) Show that min{U1 , U2 } ∼ triangular(a, b, 0) and max{U1 , U2 } ∼ triangular(a, b, 1).
(b) Show that |U2 − U1 | ∼ triangular(0, b, 0).
(c) Show that U1 + U2 ∼ triangular(2a, 2b, 1/2) and U2 − U1 ∼ triangular(−b, 2b, 1/2).
In particular, if U1 and U2 are i.i.d. random variables with the uniform(0, 1) distribution, then we have U1 + U2 ∼
triangular(0, 2, 1/2). This is the Irwin-Hall distribution with n = 2. [Ans]
35.13 Link with the beta distribution. Show that triangular(0, 1, 0) = beta(1, 2) and triangular(0, 1, 1) = beta(2, 1). [Ans]
The sine distribution
35.14 Suppose X ∼ sinD (a, b) where a ∈ R and b ∈ (0, ∞). Show that X = a + bY where Y ∼ sinD (0, 1).
Hence show that the family of distributions {sinD (a, b) : a ∈ R and b ∈ (0, ∞) } is a location-scale family of
distributions—see definition(1.6b) on page 5. [Ans]
35.15 Distribution function, quantile function and m.g.f. of the sine distribution. Suppose X ∼ sinD (a, b) where a ∈ R and
b ∈ (0, ∞).
(a) Show that the distribution function of X is given by
$$F_X(x) = \frac{1}{2}\left[1 - \cos\left(\pi\,\frac{x-a}{b}\right)\right] \qquad\text{for } x \in [a, a+b].$$
(b) Show that the quantile function of X is
$$F_X^{-1}(p) = a + \frac{b}{\pi}\arccos(1-2p) \qquad\text{for } p \in (0, 1),$$
and the median is a + b/2.
(c) Show that the m.g.f. of X is given by
$$M_X(t) = \frac{\pi^2\left(e^{at} + e^{(a+b)t}\right)}{2(b^2t^2 + \pi^2)} \qquad\text{for } t \in \mathbb{R}. \text{ [Ans]}$$
35.16 Moments of the sine distribution.
(a) Suppose X ∼ sinD(0, 1). Show that E[X] = 1/2, E[X²] = 1/2 − 2/π², var[X] = 1/4 − 2/π², E[X³] = 1/2 − 3/π² and E[X⁴] = 1/2 + 24/π⁴ − 6/π².
(b) Suppose X ∼ sinD(a, b) where a ∈ R and b ∈ (0, ∞). Show that E[X] = a + b/2, E[X²] = a² + b²(1/2 − 2/π²) + ab, and var[X] = b²(1/4 − 2/π²). [Ans]
35.17 Skewness and kurtosis of the sine distribution. Suppose X ∼ sinD(a, b) where a ∈ R and b ∈ (0, ∞).
(a) Show that the skewness of X is skew[X] = 0.
(b) Show that the kurtosis of X is κ[X] = (384 − 48π² + π⁴)/(π² − 8)². [Ans]

The U-power distribution
35.18 Suppose X ∼ Upower(a, b, k) where a ∈ R, b ∈ (0, ∞) and k ∈ {1, 2, . . .}. Show that X = a + bY where Y ∼
Upower(0, 1, k).
Hence show that for every k ∈ {1, 2, . . .}, the family of distributions {Upower(a, b, k) : a ∈ R and b ∈ (0, ∞) } is a
location-scale family of distributions—see definition(1.6b) on page 5. [Ans]
35.19 Distribution function and quantile function of the Upower distribution. Suppose X ∼ Upower(a, b, k) where a ∈ R, b ∈ (0, ∞) and k ∈ {1, 2, . . .}.
(a) Show that the distribution function of X is given by
$$F_X(x) = \frac{1}{2}\left[1 + \left(\frac{x-a}{b}\right)^{2k+1}\right] \qquad\text{for } x \in [a-b, a+b]. \tag{35.19a}$$
(b) Show that the quantile function of X is
$$F_X^{-1}(p) = a + b(2p-1)^{1/(2k+1)} \qquad\text{for } p \in (0, 1),$$
and the median is a.
It follows that if U ∼ uniform(0, 1) then a + b(2U − 1)^{1/(2k+1)} ∼ Upower(a, b, k), and if X ∼ Upower(a, b, k) then $\frac{1}{2}\left[1 + \left(\frac{X-a}{b}\right)^{2k+1}\right]$ ∼ uniform(0, 1). [Ans]

35.20 Moments of the U-power distribution.
(a) Suppose X ∼ Upower(0, 1, k) where k ∈ {1, 2, . . .}. Show that
$$E[X^n] = \begin{cases} 0 & \text{if } n \text{ is odd;} \\ \dfrac{2k+1}{n+2k+1} & \text{if } n \text{ is even.} \end{cases}$$
(b) Deduce that E[X] = 0 and var[X] = (2k + 1)/(2k + 3).
(c) Suppose X ∼ Upower(a, b, k) where a ∈ R, b ∈ (0, ∞) and k ∈ {1, 2, . . .}. Show that
$$E[(X-a)^n] = \begin{cases} 0 & \text{if } n \text{ is odd;} \\ b^n\,\dfrac{2k+1}{n+2k+1} & \text{if } n \text{ is even.} \end{cases}$$
Deduce that E[X] = a and var[X] = b²(2k + 1)/(2k + 3). [Ans]
35.21 Skewness and kurtosis of the U-power distribution. Suppose X ∼ Upower(a, b, k) where a ∈ R, b ∈ (0, ∞) and
k ∈ {1, 2, . . .}.
(a) Show that the skewness of X is skew[X] = 0.
(b) Show that the kurtosis of X is
$$\kappa[X] = \frac{(2k+3)^2}{(2k+1)(2k+5)} \text{ [Ans]}$$
35.22 (a) Suppose X_k ∼ Upower(0, 1, k) for k ∈ {1, 2, . . .}. Show that X_k $\stackrel{D}{\Longrightarrow}$ Y as k → ∞, where Y is a discrete random variable with P[Y = −1] = P[Y = 1] = 1/2.
(b) Suppose a ∈ R, b ∈ (0, ∞) and X_k ∼ Upower(a, b, k) for k ∈ {1, 2, . . .}. Show that X_k $\stackrel{D}{\Longrightarrow}$ Y as k → ∞, where Y is a discrete random variable with P[Y = a − b] = P[Y = a + b] = 1/2. [Ans]

36 Other distributions with unbounded support
36.1 The chi distribution. Suppose X ∼ χ²_n where n ∈ (0, ∞). Then Y = √X has the chi distribution with n degrees of freedom. This distribution is denoted the χ_n distribution.
[Figure(36.1a). Density of the chi distribution for n = 1, n = 2, n = 3 and n = 6.]

36.2 The Maxwell distribution. q Suppose β > 0 Then the random variable X has the Maxwell (β) distribution
iff X has the same distribution as N12 + N22 + N32 where N1 , N2 and N3 are i.i.d. random variables with the
N (0, β) distribution.

Relation to other distributions. Clearly if X ∼ Maxwell (1) then X 2 ∼ χ23 . Conversely if Y ∼ χ23 then Y ∼
Maxwell (1). Finally, the Maxwell (1) distribution is the same as the chi distribution with 3 degrees of freedom.
Density of the Maxwell distribution. Now
s
N12 N22 N32 √
X=β + + = β Y where Y ∼ χ23 .
β2 β2 β2
Hence  2 r
2x x 2 x2 −x2 /(2β 2 )
fX (x) = 2 fY 2
= e for x ∈ (0, ∞).
β β π β3
For the shape of the density see exercise 37.8 on page 112.
Distribution function of the Maxwell distribution. For x ∈ (0, ∞) we have
$$F_X(x) = \int_0^x f_X(u)\,du = \frac{1}{\beta}\sqrt{\frac{2}{\pi}}\int_0^x u\,\frac{u}{\beta^2}\,e^{-u^2/(2\beta^2)}\,du$$
Integrating by parts gives
$$F_X(x) = \frac{1}{\beta}\sqrt{\frac{2}{\pi}}\left[-u\,e^{-u^2/(2\beta^2)}\Big|_0^x + \int_0^x e^{-u^2/(2\beta^2)}\,du\right] = \frac{1}{\beta}\sqrt{\frac{2}{\pi}}\left[-x\,e^{-x^2/(2\beta^2)} + \beta\sqrt{2\pi}\int_0^{x/\beta}\frac{e^{-v^2/2}}{\sqrt{2\pi}}\,dv\right]$$
$$= \frac{1}{\beta}\sqrt{\frac{2}{\pi}}\left[-x\,e^{-x^2/(2\beta^2)} + \beta\sqrt{2\pi}\left(\Phi\!\left(\frac{x}{\beta}\right) - \frac{1}{2}\right)\right] = 2\Phi\!\left(\frac{x}{\beta}\right) - 1 - \frac{1}{\beta}\sqrt{\frac{2}{\pi}}\,x\,e^{-x^2/(2\beta^2)}$$
Multiple of a Maxwell distribution. Suppose X ∼ Maxwell(β) and α ∈ (0, ∞). Let Y = αX. Then Y has density
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \frac{1}{\alpha}\,f_X\!\left(\frac{y}{\alpha}\right) = \sqrt{\frac{2}{\pi}}\,\frac{y^2}{\alpha^3\beta^3}\,e^{-y^2/(2\alpha^2\beta^2)}$$
Hence Y ∼ Maxwell(αβ). It follows that the family of distributions { Maxwell(β) : β ∈ (0, ∞) } is a scale family of distributions—see definition (1.6d) on page 5.
Moment generating function of the Maxwell distribution. First, consider the m.g.f. of the Maxwell(1) distribution. For t ∈ R we have
$$E[e^{tX}] = \sqrt{\frac{2}{\pi}}\int_0^\infty x^2 e^{tx - x^2/2}\,dx = \sqrt{\frac{2}{\pi}}\,e^{t^2/2}\int_0^\infty x^2 e^{-(x-t)^2/2}\,dx = \sqrt{\frac{2}{\pi}}\,e^{t^2/2}\int_{-t}^\infty (v+t)^2 e^{-v^2/2}\,dv$$
$$= \sqrt{\frac{2}{\pi}}\,e^{t^2/2}\left[\int_{-t}^\infty v\cdot v\,e^{-v^2/2}\,dv + 2t\int_{-t}^\infty v\,e^{-v^2/2}\,dv + t^2\int_{-t}^\infty e^{-v^2/2}\,dv\right]$$
$$= \sqrt{\frac{2}{\pi}}\,e^{t^2/2}\left[-t\,e^{-t^2/2} + \sqrt{2\pi}\,\Phi(t) + 2t\,e^{-t^2/2} + \sqrt{2\pi}\,t^2\,\Phi(t)\right] = \sqrt{\frac{2}{\pi}}\,t + 2(1+t^2)\,e^{t^2/2}\,\Phi(t)$$
It follows that if X ∼ Maxwell(β) then X has m.g.f.
$$E[e^{tX}] = \sqrt{\frac{2}{\pi}}\,\beta t + 2(1+\beta^2t^2)\,e^{\beta^2t^2/2}\,\Phi(\beta t)$$
Moments of the Maxwell distribution. See exercises 37.9 and 37.10 on page 112.
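The construction in terms of three independent normals also gives a direct simulation check of the moment formulas in exercise 37.9. A minimal numpy sketch (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 1.5
n = rng.normal(scale=beta, size=(200_000, 3))    # N1, N2, N3 i.i.d. N(0, beta^2)
x = np.linalg.norm(n, axis=1)                    # X = sqrt(N1^2 + N2^2 + N3^2) ~ Maxwell(beta)
print(x.mean(), 2 * beta * np.sqrt(2 / np.pi))   # E[X] = 2*beta*sqrt(2/pi)
print(x.var(), beta**2 * (3 - 8 / np.pi))        # var[X] = beta^2 * (3 - 8/pi)
```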
36.3 The exponential logarithmic distribution. Suppose p ∈ (0, 1) and b ∈ (0, ∞); then the random variable X has the exponential logarithmic distribution, denoted expLog(b, p), iff X has density
$$f_X(x) = -\frac{(1-p)\,e^{-x/b}}{b\,\ln(p)\left[1-(1-p)e^{-x/b}\right]} \qquad\text{for } x \in (0, \infty). \tag{36.3a}$$
Illustrative plots of the density are given in figure(36.3a).
[Figure(36.3a). Plot of the expLog(1, 0.25) density in black and the expLog(1, 0.75) density in green; the exponential density 2e^{−2x} is shown as a dotted red line.]
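Exercise 37.14 gives an inverse-transform recipe: if U ∼ uniform(0, 1) then b ln[(1 − p)/(1 − p^U)] ∼ expLog(b, p). The following sketch uses it and compares the empirical mean with the series expression of exercise 37.13 (truncated at 200 terms, which is more than enough here):

```python
import numpy as np

rng = np.random.default_rng(3)
b, p = 1.0, 0.25
u = rng.uniform(size=200_000)
x = b * np.log((1 - p) / (1 - p ** u))           # ~ expLog(b, p) by exercise 37.14

k = np.arange(1, 200)
mean_series = -b / np.log(p) * np.sum((1 - p) ** k / k ** 2)   # E[X] from exercise 37.13 with n = 1
print(x.mean(), mean_series)                     # both approximately 0.71
```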

36.4 The log-logistic distribution. Suppose k ∈ (0, ∞) and b ∈ (0, ∞); then the random variable X has the log-logistic distribution, denoted loglogistic(k, b), iff X has density
$$f(x) = \frac{b^k\,k\,x^{k-1}}{(b^k + x^k)^2} \qquad\text{for } x \in (0, \infty). \tag{36.4a}$$
An alternative expression is
$$f(x) = \frac{\frac{k}{b}\left(\frac{x}{b}\right)^{k-1}}{\left[1 + \left(\frac{x}{b}\right)^k\right]^2} \qquad\text{for } x \in (0, \infty).$$
The distribution function of the loglogistic(k, b) distribution. Integrating equation (36.4a) gives
$$F(x) = \frac{x^k}{b^k + x^k} \qquad\text{for } x \in [0, \infty). \tag{36.4b}$$
The quantile function is
$$F^{-1}(p) = b\left(\frac{p}{1-p}\right)^{1/k} \qquad\text{for } p \in [0, 1), \tag{36.4c}$$
and the median is F^{-1}(1/2) = b.
Shape of the density function. The derivative of f is
$$f'(x) = kb^k x^{k-2}\,\frac{b^k(k-1) - (k+1)x^k}{(b^k + x^k)^3}$$
Hence if k ∈ (0, 1) then f is decreasing with f(x) → ∞ as x → 0. If k = 1, then f is decreasing with mode at x = 0. If k > 1, then f first increases and then decreases with mode at
$$x = b\left(\frac{k-1}{k+1}\right)^{1/k}$$
Further results about the shape of the density can be found in exercise 37.17.
Distribution of a multiple of a log-logistic distribution. Suppose X ∼ loglogistic(k, b) where b ∈ (0, ∞) and k ∈
(0, ∞). Suppose further that α > 0 and Y = αX. Applying the usual transformation formula to equation(36.4a)
shows that Y ∼ loglogistic(k, αb).
It follows that if k ∈ (0, ∞) is fixed, then the family of distributions {loglogistic(k, b) : b ∈ (0, ∞)} is a scale
family of distributions—see definition(1.6d) on page 5.
The standard log-logistic distribution. The standard log-logistic distribution is loglogistic(1, 1) and this has density
$$f(x) = \frac{1}{(1+x)^2} \qquad\text{for } x \in (0, \infty). \tag{36.4d}$$
This is the same as the beta′(1, 1) distribution.
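The quantile function (36.4c) makes simulation immediate (this is exercise 37.21(a)); the sketch below also checks the mean formula of exercise 37.19, valid for k > 1:

```python
import numpy as np

rng = np.random.default_rng(4)
k, b = 3.0, 2.0
u = rng.uniform(size=200_000)
x = b * (u / (1 - u)) ** (1 / k)                       # ~ loglogistic(k, b)
print(x.mean(), b * (np.pi / k) / np.sin(np.pi / k))   # E[X] = b*(pi/k)/sin(pi/k) for k > 1
```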
36.5 The hyperbolic secant distribution. Suppose µ ∈ R and σ ∈ (0, ∞); then the random variable X has the hyperbolic secant distribution, which we shall denote as Hsecant(µ, σ), iff X has density
$$f_X(x) = \frac{1}{2\sigma}\,\operatorname{sech}\!\left(\frac{\pi}{2}\,\frac{x-\mu}{\sigma}\right) \qquad\text{for } x \in \mathbb{R}. \tag{36.5a}$$
The density of the Hsecant(0, 1) distribution is
$$f_X(x) = \frac{1}{2}\,\operatorname{sech}\!\left(\frac{\pi x}{2}\right) = \frac{1}{e^{\pi x/2} + e^{-\pi x/2}} \qquad\text{for } x \in \mathbb{R}. \tag{36.5b}$$

[Figure(36.5a). Plot of the Hsecant(0, 1) density in black and the N(0, 1) density in red; the Hsecant(0, 1) density has the higher peak.]

Shape of the Hsecant(0, 1) density. Differentiating equation (36.5b) gives, for x ∈ R,
$$f'_X(x) = -\frac{\pi}{2}\,\frac{e^{\pi x/2} - e^{-\pi x/2}}{\left[e^{\pi x/2} + e^{-\pi x/2}\right]^2}$$
$$f''_X(x) = -\frac{\pi^2}{4}\,\frac{\left[e^{\pi x/2} + e^{-\pi x/2}\right]^2 - 2\left[e^{\pi x/2} - e^{-\pi x/2}\right]^2}{\left[e^{\pi x/2} + e^{-\pi x/2}\right]^3} = \frac{\pi^2}{4}\,\frac{\left[e^{\pi x/2} - e^{-\pi x/2}\right]^2 - 4}{\left[e^{\pi x/2} + e^{-\pi x/2}\right]^3}$$
So fX is initially increasing for x < 0 and then decreasing for x > 0; hence the mode is at x = 0. Also fX is initially convex, then concave and then convex again with points of inflection when [e^{πx/2} − e^{−πx/2}]² = 4, i.e. when e^{πx/2} = √2 + 1 or √2 − 1 = 1/(√2 + 1); hence πx/2 = ±ln(√2 + 1). Clearly the density fX is symmetric about 0 because the sech function is symmetric about 0.
36.6 The Gompertz distribution. The Gompertz distribution has an exponentially increasing failure rate; for
this reason it is often used to model human lifetimes.
The definition. Suppose Y has the reverse Gumbel distribution defined in equation(30.4a) on page 91. Hence Y
has density fY (y) = exp [−ey + y] and distribution function FY (y) = 1 − exp [−ey ] for y ∈ R.
Define the random variable X to be the same as Y given Y > 0; then X has density
$$f(x) = \frac{f_Y(x)}{P[Y > 0]} = \frac{e^x\exp[-e^x]}{e^{-1}} = e^x\exp\left[-(e^x - 1)\right] \qquad\text{for } x \in (0, \infty).$$
This is defined to be the Gompertz(1, 1) distribution and is generalised as follows.
Definition(36.6a). Suppose a ∈ (0, ∞) and b ∈ (0, ∞). Then the random variable X has the Gompertz
distribution, Gompertz(a, b), iff X has the density
$$f(x) = \frac{a}{b}\,e^{x/b}\exp\left[-a(e^{x/b} - 1)\right] \qquad\text{for } x \in [0, \infty). \tag{36.6a}$$
Distribution function. The distribution function of the Gompertz(a, b) distribution is
$$F(x) = 1 - \exp\left[-a(e^{x/b} - 1)\right] \qquad\text{for } x \in [0, \infty). \tag{36.6b}$$
Multiple of a Gompertz distribution. Suppose X ∼ Gompertz(a, b) and c ∈ (0, ∞). Then cX ∼ Gompertz(a, bc).
It follows that if a ∈ (0, ∞) is fixed, then the family of distributions { Gompertz(a, b) : b ∈ (0, ∞) } is a scale
family of distributions—see definition(1.6d) on page 5.
Mean of the Gompertz distribution. This is
$$E[X] = \frac{ae^a}{b}\int_0^\infty x\,e^{x/b}\exp[-a\,e^{x/b}]\,dx = abe^a\int_1^\infty \ln y\,e^{-ay}\,dy \qquad\text{by using the transformation } y = e^{x/b},$$
$$= be^a\int_1^\infty \frac{1}{y}\,e^{-ay}\,dy \qquad\text{by integrating by parts,}$$
$$= be^a\int_a^\infty \frac{1}{z}\,e^{-z}\,dz$$
and this is as far as we can get—the integral is a version of the exponential integral which is considered in [ABRAMOWITZ & STEGUN(1965)].
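In practice the exponential integral is available numerically, for example as scipy.special.exp1, so the mean b e^a ∫_a^∞ z^{-1}e^{-z} dz can be evaluated and compared with a simulation based on the inverse transform of exercise 37.30(a). A sketch under these assumptions:

```python
import numpy as np
from scipy.special import exp1      # exp1(a) = integral from a to infinity of e^(-z)/z dz

a, b = 0.5, 2.0
mean_exact = b * np.exp(a) * exp1(a)             # E[X] = b * e^a * E1(a)

rng = np.random.default_rng(5)
u = rng.uniform(size=200_000)
x = b * np.log(1 - np.log(u) / a)                # ~ Gompertz(a, b) by exercise 37.30(a)
print(mean_exact, x.mean())                      # close to each other
```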

37 Exercises (exs-other.tex)

The chi (χ) distribution
37.1 The density and distribution function of the chi distribution.
(a) Suppose Y ∼ χ_n. Show that the density of Y is
$$f_Y(y) = \frac{y^{n-1}e^{-y^2/2}}{2^{(n-2)/2}\,\Gamma(n/2)} \qquad\text{for } y > 0. \tag{37.1a}$$
(b) Show that the distribution function of Y is
$$F_Y(y) = \frac{\Gamma(n/2,\;y^2/2)}{\Gamma(n/2)} \qquad\text{where } \Gamma(n, x) \text{ is the incomplete gamma function: } \Gamma(n, x) = \int_0^x y^{n-1}e^{-y}\,dy. \text{ [Ans]}$$
37.2 Links between the chi distribution and other distributions.
(a) Suppose X ∼ χ₁. Show that X $\stackrel{d}{=}$ |Y| where Y ∼ N(0, 1). More generally, show that if X ∼ χ₁ then σX ∼ folded(0, σ²), which is the half-normal distribution. See exercise 16.25 and exercise 16.23 on page 51.
(b) Suppose X ∼ χ₂. Show that X ∼ Rayleigh(1).
(c) Suppose X ∼ χ₃. Show that X has the standard Maxwell density
$$f(y) = \sqrt{\frac{2}{\pi}}\,y^2 e^{-y^2/2} \qquad\text{for } y > 0. \text{ [Ans]}$$
37.3 Moments of the chi distribution. Suppose n ∈ (0, ∞) and X ∼ χ_n. Show that
$$E[X^k] = 2^{k/2}\,\frac{\Gamma\!\left(\frac{n+k}{2}\right)}{\Gamma(n/2)} \qquad\text{for } k \in (0, \infty). \tag{37.3a}$$
In particular
$$E[X] = 2^{1/2}\,\frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma(n/2)} \qquad E[X^2] = n \qquad \text{var}[X] = n - \mu^2 = n - 2\left(\frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma(n/2)}\right)^2 \text{ [Ans]}$$
37.4 Skewness and kurtosis of the chi distribution. Suppose X has the chi distribution with n degrees of freedom and write µ = E[X] and σ² = var[X]. Show that
$$\text{skew}[X] = \frac{\mu(1-2\sigma^2)}{\sigma^3} \qquad\text{and}\qquad \kappa[X] = \frac{2 - 2\mu\sigma\,\text{skew}[X] + \sigma^2}{\sigma^2} \text{ [Ans]}$$
37.5 Shape of the density of the chi distribution. Denote the density of χ_n by f.
(a) Suppose n = 1. Show that f is decreasing with mode at 0 equal to f(0) = √(2/π). Show that f is concave for 0 < x < 1 and convex for x > 1 with point of inflection at x = 1.
(b) Suppose 0 < n < 1. Show that f is decreasing with f(x) → ∞ as x → 0. Suppose n > 1. Show that f increases and then decreases with mode at x = √(n − 1).
(c) If n ≤ 7/8, show that f(x) is convex for all x.
If 7/8 < n < 1, show that f is initially convex, then concave and then convex again with two points of inflection at $\sqrt{\tfrac{1}{2}\left[2n-1 \pm \sqrt{8n-7}\right]}$.
If 1 ≤ n ≤ 2, show that f is initially concave and then convex with one point of inflection at $\sqrt{\tfrac{1}{2}\left[2n-1 + \sqrt{8n-7}\right]}$.
If n > 2, show that f is initially convex, then concave and then convex again with two points of inflection at $\sqrt{\tfrac{1}{2}\left[2n-1 \pm \sqrt{8n-7}\right]}$. [Ans]

37.6 For x > −1/2, let
$$\theta(x) = \left(\frac{\Gamma(x+1)}{\Gamma(x+1/2)}\right)^2 - x$$
Then θ(x) → 1/4 as x → ∞. This result is proved in [WATSON(1959)].
For n ∈ {1, 2, . . .}, suppose X_n has the χ_n distribution and let µ_n = E[X_n].
(a) By using Watson's result, prove that
$$\frac{E[X_n]}{n^{1/2}} \to 1 \text{ as } n \to \infty, \qquad\text{and}\qquad E[X_n] - n^{1/2} \to 0 \text{ as } n \to \infty.$$
(b) By using Watson's result, prove that
$$\text{var}[X_n] \to \frac{1}{2} \text{ as } n \to \infty. \text{ [Ans]}$$
37.7 Asymptotic normality of the chi distribution. Suppose {V_n} is a sequence of random variables such that V_n ∼ χ²_n for every n ∈ {1, 2, . . .}. By the central limit theorem we know that
$$\frac{V_n - n}{\sqrt{2n}} \stackrel{D}{\Longrightarrow} N(0, 1) \qquad\text{as } n \to \infty.$$
(a) By using the delta¹⁷ method, prove that if {W_n} is a sequence of random variables such that W_n ∼ χ_n, then
$$\frac{W_n - \sqrt{n}}{1/\sqrt{2}} \stackrel{D}{\Longrightarrow} N(0, 1) \qquad\text{as } n \to \infty.$$
(b) Prove that if {W_n} is a sequence of random variables such that W_n ∼ χ_n, then
$$\frac{W_n - \mu_n}{\sigma_n} \stackrel{D}{\Longrightarrow} N(0, 1) \qquad\text{as } n \to \infty, \text{ where } \mu_n = E[W_n] \text{ and } \sigma_n^2 = \text{var}[W_n]. \text{ [Ans]}$$

¹⁷ The delta method is the following theorem: suppose {X_n} is a sequence of random variables such that
$$\frac{X_n - \mu}{\sigma_n} \stackrel{D}{\Longrightarrow} N(0, 1) \qquad\text{as } n \to \infty,$$
where µ ∈ R and {σ_n} is a sequence in (0, ∞) such that σ_n → 0 as n → ∞. Suppose further that g is a real-valued function which is differentiable at µ and g′(µ) ≠ 0. Then
$$\frac{g(X_n) - g(\mu)}{g'(\mu)\,\sigma_n} \stackrel{D}{\Longrightarrow} N(0, 1) \qquad\text{as } n \to \infty.$$
The Maxwell distribution
37.8 Shape of the Maxwell density. Suppose X ∼ Maxwell(β) with density function fX.
(a) Show that fX initially increases and then decreases with mode at x = β√2.
(b) Show that fX is initially convex, then concave and then convex again with points of inflection at
$$x_1 = \beta\sqrt{\frac{5-\sqrt{17}}{2}} \qquad\text{and}\qquad x_2 = \beta\sqrt{\frac{5+\sqrt{17}}{2}} \text{ [Ans]}$$
37.9 Moments of the Maxwell distribution. Suppose X ∼ Maxwell(β).
(a) Show that E[X] = 2β√(2/π) and var[X] = β²(3 − 8/π).
(b) Show that
$$E[X^n] = \frac{2^{n/2+1}\,\beta^n}{\sqrt{\pi}}\,\Gamma\!\left(\frac{n+3}{2}\right) \qquad\text{for } n \in \{1, 2, 3, \ldots\}. \text{ [Ans]}$$
37.10 Skewness and kurtosis of the Maxwell distribution. Suppose X ∼ Maxwell(β). Show that
$$\text{skew}[X] = \frac{2\sqrt{2}\,(16-5\pi)}{(3\pi-8)^{3/2}} \qquad\text{and}\qquad \kappa[X] = \frac{15\pi^2 + 16\pi - 192}{(3\pi-8)^2} \text{ [Ans]}$$

The exponential logarithmic distribution
37.11 (a) Check that equation(36.3a) defines a density function.
(b) Show that the density of expLog(b, p) is decreasing and convex on (0, ∞) and hence has a unique mode at 0.
(c) Find the distribution function of the expLog(b, p) distribution.
(d) Find the quantile function of the expLog(b, p) distribution and find an expression for the median. [Ans]
37.12 (a) Suppose X ∼ expLog(b, p). Show that X = bY where Y ∼ expLog(1, p).
Definition(1.6d) on page 5 implies that for fixed p ∈ (0, 1), the family of distributions {expLog(b, p) : b > 0} is a
scale family of distributions.
(b) Find h_X(x), the hazard function of the expLog(b, p) distribution, and show that h_X(x) = (1/b)·h_Y(x/b) where X = bY.
(c) Show that the hazard function of the expLog(b, p) distribution is decreasing on [0, ∞). Thus the exponential-
logarithmic distribution can be used to model objects which improve with age. [Ans]
37.13 Moments of the expLog(b, p) distribution. For n ∈ {1, 2, . . .} show that
$$E[X^n] = -\frac{b^n\,n!}{\ln(p)}\sum_{k=1}^{\infty}\frac{(1-p)^k}{k^{n+1}}$$
Hence show that E[Xⁿ] → 0 as p → 0 and E[Xⁿ] → bⁿ n! as p → 1. [Ans]
37.14 (a) Suppose U ∼ uniform(0, 1), p ∈ (0, 1) and b ∈ (0, ∞). Show that
$$X = b\ln\left(\frac{1-p}{1-p^U}\right) \sim \text{expLog}(b, p)$$
(b) Suppose X ∼ expLog(b, p) where p ∈ (0, 1) and b ∈ (0, ∞). Show that
$$\frac{\ln\left[1-(1-p)e^{-X/b}\right]}{\ln(p)} \sim \text{uniform}(0, 1) \text{ [Ans]}$$
37.15 Suppose Y₁, Y₂, . . . are i.i.d. random variables with the exponential(λ) distribution. Suppose further that N is independent of {Y₁, Y₂, . . .} and has the logarithmic distribution:
$$P[N = n] = -\frac{(1-p)^n}{n\ln(p)} \qquad\text{for } n \in \{1, 2, 3, \ldots\}.$$
Show that
$$X = \min\{Y_1, Y_2, \ldots, Y_N\} \sim \text{expLog}(1/\lambda,\, p) \text{ [Ans]}$$
37.16 Suppose b ∈ (0, ∞) and X_n ∼ expLog(b, p_n).
(a) Show that X_n $\stackrel{D}{\Longrightarrow}$ 0 as p_n → 0.
(b) Show that X_n $\stackrel{D}{\Longrightarrow}$ exponential(1/b) as p_n → 1. [Ans]
The log-logistic distribution
37.17 Shape of the density function. Suppose X ∼ loglogistic(k, b) has density function f.
(a) Show that the second derivative of f is
$$f''(x) = kb^k x^{k-3}\,\frac{b^{2k}(k-1)(k-2) - 4(k^2-1)x^k b^k + (k+1)(k+2)x^{2k}}{(b^k + x^k)^4}$$
(b) Suppose k ∈ (0, 1]. Show that f is convex.
(c) Suppose k ∈ (1, 2]. Show that f is initially concave and then convex with point of inflection at
$$x_2 = b\left[\frac{2(k^2-1) + k\sqrt{3(k^2-1)}}{(k+1)(k+2)}\right]^{1/k}$$
(d) Suppose k > 2. Show that f is initially convex, then concave and then convex again with points of inflection at
$$x_1 = b\left[\frac{2(k^2-1) - k\sqrt{3(k^2-1)}}{(k+1)(k+2)}\right]^{1/k}$$
and at x₂. [Ans]
37.18 The hazard function of the log-logistic distribution. Suppose X ∼ loglogistic(k, b).
(a) Show that the hazard function of X is
$$h(x) = \frac{kx^{k-1}}{b^k + x^k} \qquad\text{for } x \in (0, \infty).$$
(b) Show that if k ∈ (0, 1] then h is decreasing. Show that if k > 1 then h first increases and then decreases with maximum at x = b(k − 1)^{1/k}. [Ans]
37.19 Moments of the log-logistic distribution. Suppose X ∼ loglogistic(k, b) where k ∈ (0, ∞) and b ∈ (0, ∞).
(a) By using Euler's reflection formula, (13.10a), show that
$$E[X^n] = \begin{cases} \infty & \text{if } n \ge k; \\[4pt] b^n\,B\!\left(1-\dfrac{n}{k},\,1+\dfrac{n}{k}\right) = b^n\,\dfrac{\pi n/k}{\sin(\pi n/k)} & \text{if } n \in [0, k). \end{cases}$$
(b) In particular, show that if k > 1 then
$$E[X] = b\,B\!\left(1-\frac{1}{k},\,1+\frac{1}{k}\right) = b\,\frac{\pi/k}{\sin(\pi/k)}$$
and if k > 2 then
$$E[X^2] = b^2\,B\!\left(1-\frac{2}{k},\,1+\frac{2}{k}\right) = b^2\,\frac{2\pi/k}{\sin(2\pi/k)} \text{ [Ans]}$$
37.20 Power transformation of a log-logistic distribution. Suppose X ∼ loglogistic(k, b) and Y = X n where n ∈ (0, ∞).
Show that Y ∼ loglogistic(k/n, bn ).
In particular, if X ∼ loglogistic(1, 1) then Y = bX 1/k ∼ loglogistic(k, b). [Ans]
37.21 (a) Suppose U ∼ uniform(0, 1), b ∈ (0, ∞) and k ∈ (0, ∞). Show that X = b[U/(1 − U )]1/k ∼ loglogistic(k, b).
(b) Suppose X ∼ loglogistic(k, b). Show that Y = X k /(bk + X k ) ∼ uniform(0, 1).
(c) Suppose X ∼ loglogistic(k, b). Show that Y = ln X ∼ logistic( ln b, σ 2 = π 2 /(3k 2 ) ).
(d) Suppose X ∼ logistic(a, σ 2 = π 2 /(3k 2 ) ). Show that Y = eX ∼ loglogistic(k, ea ). [Ans]
37.22 Limiting distribution of the log-logistic distribution.
(a) Suppose X_k ∼ loglogistic(k, 1). Show that X_k $\stackrel{D}{\Longrightarrow}$ 1 as k → ∞.
(b) Suppose X_k ∼ loglogistic(k, b). Show that X_k $\stackrel{D}{\Longrightarrow}$ b as k → ∞.

The hyperbolic secant distribution
37.23 (a) Check that equation(36.5b) defines a density function.
(b) Suppose µ ∈ R and σ ∈ (0, ∞). Show that if X ∼ Hsecant(µ, σ), then X = µ + σY where Y ∼ Hsecant(0, 1).
Hence deduce that {Hsecant(µ, σ) : µ ∈ R, σ ∈ (0, ∞) } forms a location-scale family of distributions—see
definition(1.6b) on page 5.
Note that parts(a) and (b) show that equation(36.5a) defines a density function.
(c) Shape of the Hsecant(µ, σ) density. Suppose fX is the density of the Hsecant(µ, σ) distribution. Show that fX is symmetric about µ and fX increases for x < µ and decreases for x > µ with mode at x = µ. Show also that fX is initially convex, then concave and then convex again with points of inflection when x = µ ± (2σ/π)·ln(√2 + 1). [Ans]
37.24 The distribution function and quantile function of the hyperbolic secant distribution.
(a) Suppose X ∼ Hsecant(0, 1). Show that the distribution function of X is
$$F_X(x) = \frac{2}{\pi}\arctan\left[\exp\left(\frac{\pi x}{2}\right)\right] \qquad\text{for } x \in \mathbb{R}.$$
(b) Suppose X ∼ Hsecant(µ, σ). Show that the distribution function of X is
$$F_X(x) = \frac{2}{\pi}\arctan\left[\exp\left(\frac{\pi(x-\mu)}{2\sigma}\right)\right] \qquad\text{for } x \in \mathbb{R}.$$
(c) Suppose X ∼ Hsecant(µ, σ). Show that the quantile function of X is
$$F_X^{-1}(p) = \mu + \frac{2\sigma}{\pi}\ln\left[\tan\left(\frac{\pi p}{2}\right)\right] \qquad\text{for } p \in (0, 1).$$
Hence deduce that the median of X is µ. [Ans]
37.25 Moment generating function and characteristic function.
(a) Suppose X ∼ Hsecant(0, 1). Show that the m.g.f. of X is M_X(t) = sec(t) for t ∈ (−π/2, π/2) and the characteristic function of X is φ_X(t) = E[e^{itX}] = sech(t) for t ∈ R.
(b) Suppose X ∼ Hsecant(µ, σ). Show that
$$M_X(t) = e^{\mu t}\sec(\sigma t) \text{ for } t \in \left(-\frac{\pi}{2\sigma},\,\frac{\pi}{2\sigma}\right) \qquad\text{and}\qquad \varphi_X(t) = E[e^{itX}] = e^{i\mu t}\operatorname{sech}(\sigma t) \text{ for } t \in \mathbb{R}.$$
Hint for part (a). First show that
$$B(\alpha, \beta) = c\int_{-\infty}^{\infty} \frac{e^{-c\alpha w}}{(1+e^{-cw})^{\alpha+\beta}}\,dw$$
and then deduce that for any q > p > 0 we have
$$q\int_{-\infty}^{\infty} \frac{e^{-pw}}{1+e^{-qw}}\,dw = B(p/q,\,1-p/q)$$
Then use the fact that B(x, 1 − x) = π/sin(πx) for any x ∈ (0, 1). [Ans]
37.26 Moments of the hyperbolic secant distribution.
(a) Suppose X ∼ Hsecant(0, 1). Clearly all odd moments are 0 because the density is symmetric about 0. Show that
E[X] = 0 E[X 2 ] = 1 E[X 3 ] = 0 E[X 4 ] = 5
Hence deduce var[X] = 1, skew[X] = 0 and κ[X] = 5.
(b) Suppose X ∼ Hsecant(µ, σ). For this case, all odd moments about µ are zero. Show that
E[X] = µ E[X 2 ] = σ 2 + µ2
Hence deduce var[X] = σ 2 , skew[X] = 0 and κ[X] = 5. [Ans]

The Gompertz distribution
37.27 Suppose X ∼ Gompertz(a, b).
(a) Find the quantile function of X and show that the median is b ln(1 + ln 2/a).
(b) Show that X has hazard function
$$h(x) = \frac{a}{b}\,e^{x/b} \qquad\text{for } x \in [0, \infty),$$
and hence X has an exponentially increasing hazard rate. [Ans]
37.28 Shape of the Gompertz density. Suppose X ∼ Gompertz(a, b) and f denotes the density of X.
(a) Suppose a < 1. Show that f first increases and then decreases with mode at x = −b ln a.
(b) Suppose a ≥ 1. Show that f is decreasing with mode at x = 0.
(c) Find the second derivative of f.
(d) Suppose a < (3 − √5)/2. Show that f is initially convex and then concave and then convex again with points of inflection at x₁ = b ln[(3 − √5)/(2a)] and x₂ = b ln[(3 + √5)/(2a)].
(e) Suppose (3 − √5)/2 ≤ a < (3 + √5)/2. Show that f is initially concave and then convex with point of inflection at x₂ = b ln[(3 + √5)/(2a)].
(f) Suppose a ≥ (3 + √5)/2. Show that f is convex. [Ans]
37.29 Suppose b ∈ (0, ∞) and the random variable X has the reverse Gumbel (0, b) distribution. The density of this distribution
is given in equation(30.5a) on page 91. Show that the conditional distribution of X given X ≥ 0 is the Gompertz(1, b)
distribution. [Ans]
37.30 Link between the Gompertz and uniform distributions.
(a) Suppose X ∼ uniform(0, 1), a ∈ (0, ∞) and b ∈ (0, ∞). Show that
$$Y = b\ln\left(1 - \frac{1}{a}\ln X\right) \sim \text{Gompertz}(a, b)$$
(b) Suppose X ∼ Gompertz(a, b). Show that
$$Y = \exp\left[-a(e^{X/b} - 1)\right] \sim \text{uniform}(0, 1) \text{ [Ans]}$$
37.31 Link between the Gompertz and exponential distributions.
(a) Suppose X ∼ Gompertz(a, b). Show that Y = e^{X/b} − 1 ∼ exponential(a).
(b) Suppose X ∼ exponential(1), a ∈ (0, ∞) and b ∈ (0, ∞). Show that Y = b ln(X/a + 1) ∼ Gompertz(a, b). [Ans]

The Linnik distribution
37.32 (a) Suppose X and Y are independent random variables such that X ∼ exponential(1) and Y ∼ Cauchy(1). Show that the characteristic function of the random variable Z = XY is
$$\varphi_Z(t) = E[e^{itZ}] = \frac{1}{1+|t|}$$
(b) A generalization of part (a). Suppose X and Y are independent random variables such that X ∼ exponential(1) and Y has the symmetric stable distribution with characteristic function φ_Y(t) = E[e^{itY}] = e^{−|t|^α} where 0 < α < 2. Show that the characteristic function of the random variable Z = X^{1/α}Y is
$$\varphi_Z(t) = E[e^{itZ}] = \frac{1}{1+|t|^{\alpha}}$$
This distribution is called the Linnik distribution. [Ans]
CHAPTER 3
Multivariate Continuous Distributions
38 General results
38.1 The mean and variance matrices. If X = (X₁, . . . , X_n)ᵀ is an n × 1 random vector then, provided the univariate expectations E[X₁], . . . , E[X_n] exist, we define
$$E[\mathbf{X}] = \big(E[X_1], \ldots, E[X_n]\big)^T$$
Provided the second moments E[X₁²], . . . , E[X_n²] are finite, the variance matrix or covariance matrix of X is the n × n matrix
$$\text{var}[\mathbf{X}] = E\big[(\mathbf{X} - \boldsymbol\mu)(\mathbf{X} - \boldsymbol\mu)^T\big] \qquad\text{where } \boldsymbol\mu = E[\mathbf{X}].$$

The (i, j) entry in the variance matrix is cov[Xi , Xj ]. In particular, the ith diagonal entry is var[Xi ].
Clearly:
• the variance matrix is symmetric;
• if X1 , . . . , Xn are i.i.d. with variance σ 2 , then var[X] = σ 2 I;
• var[X] = E[XXT ] − µµT ; (38.1a)
• we shall denote the variance matrix by Σ or ΣX .
We shall often omit stating “when the second moments are finite” when it is obviously needed. Random vectors
will be nearly always column vectors, but may be written in text as row vectors in order to save space.

38.2 Linear transformations. If Y = X + a then var[Y] = var[X].
More generally, if Y = A + BX where A is m × 1 and B is m × n, then µ_Y = A + Bµ_X and
$$\text{var}[\mathbf{Y}] = E\big[(\mathbf{Y}-\boldsymbol\mu_Y)(\mathbf{Y}-\boldsymbol\mu_Y)^T\big] = E\big[B(\mathbf{X}-\boldsymbol\mu_X)(\mathbf{X}-\boldsymbol\mu_X)^T B^T\big] = B\,\text{var}[\mathbf{X}]\,B^T$$
In particular, if a = (a₁, . . . , a_n) is a 1 × n vector, then aX = Σᵢ aᵢXᵢ is a random variable and
$$\text{var}[\mathbf{a}\mathbf{X}] = \mathbf{a}\,\text{var}[\mathbf{X}]\,\mathbf{a}^T = \sum_{i=1}^n\sum_{j=1}^n a_i a_j\,\text{cov}[X_i, X_j]$$

Example(38.2a). Suppose the random vector X = (X₁, X₂, X₃) has variance matrix
$$\text{var}[\mathbf{X}] = \begin{bmatrix} 6 & 2 & 3 \\ 2 & 4 & 0 \\ 3 & 0 & 2 \end{bmatrix}$$
Let Y₁ = X₁ + X₂ and Y₂ = X₁ + X₂ − X₃. Find the variance matrix of Y = (Y₁, Y₂).
Solution. Now Y = AX where
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & -1 \end{bmatrix}$$
Hence
$$\text{var}[\mathbf{Y}] = \text{var}[A\mathbf{X}] = A\,\text{var}[\mathbf{X}]\,A^T = \begin{bmatrix} 14 & 11 \\ 11 & 10 \end{bmatrix}$$
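Example (38.2a) is easy to reproduce numerically; a minimal numpy sketch:

```python
import numpy as np

Sigma = np.array([[6.0, 2.0, 3.0],
                  [2.0, 4.0, 0.0],
                  [3.0, 0.0, 2.0]])     # var[X]
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, -1.0]])        # Y = A X
print(A @ Sigma @ A.T)                  # [[14. 11.] [11. 10.]]
```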

38.3 Positive definiteness of the variance matrix. Suppose X is an n × 1 random vector. Then for any n × 1
vector c we have
cT var[X]c = var[cT X] ≥ 0 (38.3a)
Hence var[X] is positive semi-definite (also called non-negative definite).
Proposition(38.3a). Suppose X is a random vector with finite second moments and such that no element of X
is a linear combination of the other elements. Then var[X] is a symmetric positive definite matrix.
Proof. No element of X is a linear combination of the other elements; this means that if a is an n × 1 vector with aT X
constant then we must have a = 0.
Now suppose var[cT X] = 0; this implies cT X is constant and hence c = 0. Hence cT var[X]c = 0 implies var[cT X] = 0
which implies c = 0. This result, together with equation(38.3a) shows that var[X] must be positive definite.

Example(38.3b). Consider the random vector Z = (X, Y, X + Y )T where µX = E[X], µY = E[Y ] and ρ = corr[X, Y ].
Show that var[Z] is not positive definite.
Solution. Let a = (1, 1, −1). Then a var[Z] aT = var[aZ] = var[0] = 0.

38.4 The square root of a variance matrix; the transformation to independent random variables Suppose
C is a real symmetric positive definite n × n matrix. Because C is real and symmetric, we can write C = LDLT
where L is orthogonal1 and D = diag(d1 , . . . , dn ) is diagonal and d1 , . . . , dn are the eigenvalues of C. Because
C is also positive definite, we have d1 > 0, . . . , dn > 0. Hence we can write C = (LD1/2 LT )(LD1/2 LT ) = QQ
where Q is symmetric and nonsingular.
If we only know C is real, symmetric and non-negative definite, then we only have d1 ≥ 0, . . . , dn ≥ 0. We can
still write C = (LD1/2 LT )(LD1/2 LT ) = QQ where Q is symmetric but Q is now possibly singular.

Now suppose X is a random vector with finite second moments and such that no element of X is a linear combi-
nation of the other elements; then var[X] is a real symmetric positive definite matrix. Hence var[X] = QQ and
if Y = Q−1 X then var(Y) = Q−1 var[X] (Q−1 )T = I. This means that if X is a random vector with finite second
moments and such that no element of X is a linear combination of the other elements, then there exists a linear
transformation of X to independent variables.
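The construction C = (LD^{1/2}Lᵀ)(LD^{1/2}Lᵀ) = QQ and the whitening step Y = Q⁻¹X can be written directly with an eigendecomposition. A sketch, assuming var[X] is positive definite (function name symmetric_sqrt is ours):

```python
import numpy as np

def symmetric_sqrt(C):
    """Return the symmetric square root Q with Q @ Q = C, for symmetric positive definite C."""
    d, L = np.linalg.eigh(C)            # C = L diag(d) L^T with L orthogonal
    return L @ np.diag(np.sqrt(d)) @ L.T

Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
Q = symmetric_sqrt(Sigma)
print(np.allclose(Q @ Q, Sigma))        # True

rng = np.random.default_rng(6)
X = rng.multivariate_normal(mean=[0, 0], cov=Sigma, size=100_000)
Y = X @ np.linalg.inv(Q).T              # Y = Q^{-1} X applied row-wise
print(np.cov(Y.T))                      # approximately the identity matrix
```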

38.5 The covariance between two random vectors.


Definition(38.5a). If X is an m × 1 random vector with finite second moments and Y is an n × 1 random vector
with finite second moments, then cov[X, Y] is the m × n matrix with (i, j) entry equal to cov[Xi , Yj ].
Because the (i, j) entry of cov[X, Y] equals cov[Xi , Yj ], it follows that
cov[X, Y] = E[(X − µX )(Y − µy )T ] = E[XYT ] − µX µTY
Also:
• because cov[Xi , Yj ] = cov[Yj , Xi ], it follows that cov[X, Y] = cov[Y, X]T ;
• if n = m, the covariance matrix cov[X, Y] is symmetric;
• cov[X, X] = var[X];
• we shall often denote the covariance matrix by Σ or ΣX,Y .

38.6 The correlation matrix. Suppose the n × 1 random vector X has the variance matrix Σ. Let D be the n × n diagonal matrix with diagonal equal to $\sqrt{\text{diag}(\Sigma)}$. Then the correlation matrix of X is given by
$$\text{corr}[\mathbf{X}] = D^{-1}\Sigma D^{-1}$$
Clearly, the (i, j) element of corr[X] is corr(Xᵢ, Xⱼ). Also, corr[X] is the variance matrix of the random vector Z = (Z₁, . . . , Z_n) where $Z_j = (X_j - E[X_j])/\sqrt{\text{var}(X_j)}$.
Conversely, given corr[X] we need the vector of standard deviations in order to determine the variance matrix. In fact, var[X] = D corr[X] D where D is the diagonal matrix with entries (stdev(X₁), . . . , stdev(X_n)).
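In code the passage from Σ to corr[X] and back is just a pair of diagonal scalings; a small numpy sketch with an arbitrary example matrix of our own choosing:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 9.0, -3.0],
                  [1.0, -3.0, 16.0]])
sd = np.sqrt(np.diag(Sigma))            # standard deviations
Dinv = np.diag(1 / sd)
R = Dinv @ Sigma @ Dinv                 # corr[X] = D^{-1} Sigma D^{-1}
print(R)
print(np.allclose(np.diag(sd) @ R @ np.diag(sd), Sigma))   # recovers Sigma: True
```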

1
Orthogonal means that LT L = I and hence L−1 = LT . Because C = LDLT , we have LT CL = D and hence LT (C − λI)L =
D − λI; hence |C − λI| = |D − λI| and the eigenvalues of C equal the eigenvalues of D—see page 39 of [R AO(1973)].

38.7 Quadratic forms. Results about quadratic forms are important in regression and the analysis of variance.
A quadratic form in (x, y) is an expression of the type ax2 +by 2 +cxy; a quadratic form in (x, y, z) is an expression
of the form ax2 + by 2 + cz 2 + dxy + exz + f yz. Thus, for example, 2x2 + 4x + 3y 2 is not a quadratic form in (x, y).
Definition(38.7a). Suppose A is a real n × n symmetric matrix. Then the quadratic form of A is the function q_A : Rⁿ → R with
$$q_A(\mathbf{x}) = \sum_{j=1}^n\sum_{k=1}^n a_{jk}\,x_j x_k = \mathbf{x}^T A\mathbf{x}$$
Suppose we have xᵀAx where the matrix A is not symmetric. Because xᵀAx is a scalar, we have xᵀAx = xᵀAᵀx and hence xᵀAx = xᵀBx where B = ½(A + Aᵀ). In this way, we can work round the requirement that A is symmetric.
Example(38.7b). Suppose A = I, the identity matrix. Then $q_A(\mathbf{X}) = \mathbf{X}^T A\mathbf{X} = \sum_{k=1}^n X_k^2$.
Example(38.7c). Suppose A = 1, the n × n matrix with every entry equal to 1. Then $q_A(\mathbf{X}) = \mathbf{X}^T A\mathbf{X} = \left(\sum_{k=1}^n X_k\right)^2$.
If A and B are both real n × n symmetric matrices and a, b ∈ R, then aA + bB can be used to create a new quadratic form:
$$q_{aA+bB}(\mathbf{X}) = \mathbf{X}^T(aA+bB)\mathbf{X} = a\,q_A(\mathbf{X}) + b\,q_B(\mathbf{X})$$
Example(38.7d). Suppose A = I and B = 1. Then $q_{aI+b1}(\mathbf{X}) = a\sum_{k=1}^n X_k^2 + b\left(\sum_{k=1}^n X_k\right)^2$.
In particular $q_{I-1/n}(\mathbf{X}) = \sum_{k=1}^n X_k^2 - \frac{1}{n}\left(\sum_{k=1}^n X_k\right)^2 = \sum_{k=1}^n (X_k - \overline{X})^2$ and hence the sample variance $S^2 = \sum_{k=1}^n (X_k - \overline{X})^2/(n-1)$ is a quadratic form in X = (X₁, . . . , X_n).
Example(38.7e). Suppose
$$A_1 = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix} \qquad\text{and}\qquad A_2 = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$
Then $\mathbf{X}^T A_1\mathbf{X} = X_1X_2 + \cdots + X_{n-1}X_n$. Note that the matrix A₁ is not symmetric. Now A₂ = A₁ + A₁ᵀ is symmetric and $q_{A_2}(\mathbf{X}) = \mathbf{X}^T A_2\mathbf{X} = 2\mathbf{X}^T A_1\mathbf{X} = 2\sum_{k=1}^{n-1} X_k X_{k+1}$.
38.8 Mean of a quadratic form.
Proposition(38.8a). Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Suppose A is a
real n × n symmetric matrix. Then qA (X), the quadratic form of A has expectation
E[XT AX] = trace(AΣ) + µT Aµ (38.8a)
Proof. Now XᵀAX = (X − µ)ᵀA(X − µ) + µᵀAX + XᵀAµ − µᵀAµ and hence
$$E[\mathbf{X}^T A\mathbf{X}] = E\big[(\mathbf{X}-\boldsymbol\mu)^T A(\mathbf{X}-\boldsymbol\mu)\big] + \boldsymbol\mu^T A\boldsymbol\mu$$
Because (X − µ)ᵀA(X − µ) is a scalar, we have
$$E\big[(\mathbf{X}-\boldsymbol\mu)^T A(\mathbf{X}-\boldsymbol\mu)\big] = E\big[\text{trace}\big((\mathbf{X}-\boldsymbol\mu)^T A(\mathbf{X}-\boldsymbol\mu)\big)\big] = E\big[\text{trace}\big(A(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^T\big)\big] \quad\text{because trace}(AB) = \text{trace}(BA),$$
$$= \text{trace}\big(E\big[A(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^T\big]\big) = \text{trace}(A\Sigma) \quad\text{because } E[\text{trace}(V)] = \text{trace}(E[V]).$$
Hence the result.
The second term in equation(38.8a) is xT Ax evaluated at x = µ; this simplifies some derivations. We now apply
this result to some of the examples above.
Example(38.8b). Suppose A = I, the identity matrix. Then $q_A(\mathbf{X}) = \mathbf{X}^T A\mathbf{X} = \sum_{j=1}^n X_j^2$ and equation (38.8a) gives
$$E\left[\sum_{j=1}^n X_j^2\right] = \sum_{j=1}^n \sigma_j^2 + \sum_{j=1}^n \mu_j^2$$
Example(38.8c). Suppose A = 1, the n × n matrix with every entry equal to 1. Then $q_A(\mathbf{X}) = \mathbf{X}^T A\mathbf{X} = \left(\sum_{j=1}^n X_j\right)^2$ and equation (38.8a) gives
$$E\left[\left(\sum_{j=1}^n X_j\right)^2\right] = \sum_{j=1}^n\sum_{k=1}^n \sigma_{jk} + \left(\sum_{j=1}^n \mu_j\right)^2$$
Example(38.8d). Continuation of example (38.7d). Suppose X₁, . . . , X_n are i.i.d. random variables with mean µ and variance σ². Consider the quadratic form $q_A(\mathbf{X}) = \sum_{k=1}^n (X_k - \overline{X})^2$. Then q_A(X) = XᵀAX where A = I − (1/n)1. Now var[X] = σ²I; hence equation (38.8a) gives
$$E[q_A(\mathbf{X})] = \sigma^2\,\text{trace}\!\left(I - \tfrac{1}{n}\mathbf{1}\right) + \mathbf{x}^T A\mathbf{x}\Big|_{\mathbf{x}=\boldsymbol\mu} = (n-1)\sigma^2 + \sum_{k=1}^n (x_k - \overline{x})^2\Big|_{\mathbf{x}=\boldsymbol\mu} = (n-1)\sigma^2$$
Hence if $S^2 = \sum_{k=1}^n (X_k - \overline{X})^2/(n-1)$, then E[S²] = σ².
Example(38.8e). Suppose X₁, . . . , X_n are i.i.d. random variables with mean µ and variance σ². First we shall find the matrix A with $q_A(\mathbf{X}) = \sum_{k=1}^{n-1}(X_k - X_{k+1})^2 = (X_1-X_2)^2 + (X_2-X_3)^2 + \cdots + (X_{n-1}-X_n)^2$. Now $q_A(\mathbf{X}) = X_1^2 + 2X_2^2 + \cdots + 2X_{n-1}^2 + X_n^2 - 2\sum_{k=1}^{n-1}X_kX_{k+1}$. Using the matrix A₂ in example (38.7e) gives $q_A(\mathbf{X}) = X_1^2 + 2X_2^2 + \cdots + 2X_{n-1}^2 + X_n^2 - \mathbf{X}^T A_2\mathbf{X}$. Hence q_A(X) = XᵀA₁X − XᵀA₂X where A₁ = diag[1 2 2 · · · 2 1], and so q_A(X) = XᵀAX where
$$A = A_1 - A_2 = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix}$$
Equation (38.8a) gives $E[q_A(\mathbf{X})] = \sigma^2\,\text{trace}(A) + q_A(\mathbf{x})\big|_{\mathbf{x}=(\mu,\ldots,\mu)} = \sigma^2(2n-2) + 0 = \sigma^2(2n-2)$.

38.9 Variance of a quadratic form. This result is complicated!


Proposition(38.9a). Suppose X₁, . . . , X_n are independent random variables with E[X_j] = 0 for j = 1, . . . , n. Suppose all the random variables have the same finite second and fourth moments; we shall use the following notation:
$$E[X_j^2] = \sigma^2 \qquad\text{and}\qquad E[X_j^4] = \mu_4$$
Suppose A is an n × n symmetric matrix with entries $a_{ij}$ and d is the n × 1 column vector with entries (a₁₁, . . . , a_nn) = diag(A). Then
$$\text{var}[\mathbf{X}^T A\mathbf{X}] = (\mu_4 - 3\sigma^4)\,\mathbf{d}^T\mathbf{d} + 2\sigma^4\,\text{trace}(A^2)$$
Proof. Now $\text{var}[\mathbf{X}^T A\mathbf{X}] = E\big[(\mathbf{X}^T A\mathbf{X})^2\big] - \big(E[\mathbf{X}^T A\mathbf{X}]\big)^2$.
Because E[X] = 0, using equation (38.8a) gives $E[\mathbf{X}^T A\mathbf{X}] = \text{trace}(A\Sigma) = \sigma^2\,\text{trace}(A) = \sigma^2\sum_{j=1}^n a_{jj}$. Let c = XᵀA and Z = XXᵀ; then c is a 1 × n row vector and Z is an n × n matrix and $(\mathbf{X}^T A\mathbf{X})^2 = \mathbf{X}^T A\mathbf{X}\mathbf{X}^T A\mathbf{X} = \mathbf{c}Z\mathbf{c}^T = \sum_j\sum_k c_j c_k Z_{jk} = \sum_j\sum_k c_j c_k X_j X_k$. The jᵗʰ entry in the row vector c = XᵀA is $\sum_{i=1}^n X_i a_{ij}$ and the kᵗʰ entry is $\sum_{\ell=1}^n X_\ell a_{\ell k}$. Hence
$$(\mathbf{X}^T A\mathbf{X})^2 = \sum_i\sum_j\sum_k\sum_\ell a_{ij}\,a_{\ell k}\,X_i X_j X_k X_\ell$$
Using independence of the X's gives
$$E[X_i X_j X_k X_\ell] = \begin{cases} \mu_4 & \text{if } i=j=k=\ell; \\ \sigma^4 & \text{if } i=j \text{ and } k=\ell, \text{ or } i=k \text{ and } j=\ell, \text{ or } i=\ell \text{ and } j=k \text{ (indices not all equal);} \\ 0 & \text{otherwise.} \end{cases}$$
Hence
$$E\big[(\mathbf{X}^T A\mathbf{X})^2\big] = \mu_4\sum_i a_{ii}^2 + \sigma^4\Bigg[\sum_{\substack{i,k\\ i\ne k}} a_{ii}a_{kk} + \sum_{\substack{i,j\\ i\ne j}} a_{ij}a_{ji} + \sum_{\substack{i,j\\ i\ne j}} a_{ij}^2\Bigg]$$
Now
$$\sum_{\substack{i,k\\ i\ne k}} a_{ii}a_{kk} = \sum_{i=1}^n a_{ii}\big(\text{trace}(A) - a_{ii}\big) = [\text{trace}(A)]^2 - \mathbf{d}^T\mathbf{d}$$
$$\sum_{\substack{i,j\\ i\ne j}} a_{ij}^2 = \sum_{i=1}^n\sum_{j=1}^n a_{ij}^2 - \mathbf{d}^T\mathbf{d} = \text{trace}(A^2) - \mathbf{d}^T\mathbf{d} \qquad\text{and}\qquad \sum_{\substack{i,j\\ i\ne j}} a_{ij}a_{ji} = \sum_{\substack{i,j\\ i\ne j}} a_{ij}^2 = \text{trace}(A^2) - \mathbf{d}^T\mathbf{d}$$
and hence
$$E\big[(\mathbf{X}^T A\mathbf{X})^2\big] = (\mu_4 - 3\sigma^4)\,\mathbf{d}^T\mathbf{d} + \sigma^4\big([\text{trace}(A)]^2 + 2\,\text{trace}(A^2)\big)$$
and hence the result.



Example(38.9b). Suppose X1 , . . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution and A is a symmetric
n × n matrix. By §15.4 on page 47, we know that E[Xj4 ] = 3σ 4 . Hence
var[XT AX] = 2σ 4 trace(A2 )

We can generalize proposition(38.9a) to non-zero means as follows:


Proposition(38.9c). Suppose X1 , . . . , Xn are independent random variables with E[Xj ] = µj for j = 1, . . . , n.
Suppose all the random variables have the same finite second, third and fourth moments about the mean; we
shall use the following notation: E[X] = µ and
E[(Xj − µj )2 ] = σ 2
E[(Xj − µj )3 ] = µ3
E[(Xj − µj )4 ] = µ4
Suppose A is an n × n symmetric matrix with entries ai,j and d is the n × 1 column vector with entries
(a11 , . . . , ann ) = diag(A). Then
var[XT AX] = (µ4 − 3σ 4 )dT d + 2σ 4 trace(A2 ) + 4σ 2 µT A2 µ + 4µ3 µT Ad (38.9a)
Proof. See exercise 39.11 on page 122.

38.10
Summary.
The variance matrix.
• var[X] = E[ (X − µ)(X − µ)T ] = E[XXT ] − µµT .
• var[A + BX] = B var[X] BT
• var[X] is symmetric positive semi-definite
• if no element of X is a linear combination of the others, then var[X] is symmetric positive definite
• if var[X] is positive definite, there exists symmetric non-singular Q with var[X] = QQ
• if X has finite second moments and no element is a linear combination of the other elements, then there exists a linear
transformation of X to independent variables
The covariance matrix.
• cov[X, Y] = E[ (X − µX )(Y − µY )T ] = E[XYT ] − µX µTY
• cov[X, Y] = cov[Y, X]T
• cov[X, X] = var[X]
• if the dimensions of X and Y are equal, then cov[X, Y] is symmetric
Quadratic forms.
• qA (x) = xT Ax where A is a real symmetric matrix
• E[qA (X)] = trace(AΣ) + µT Aµ

39 Exercises (exs-multiv.tex)

39.1 Suppose X is an m × 1 random vector and Y is an n × 1 random vector. Suppose further that all second moments of X and Y are finite, and suppose a is an m × 1 vector and b is an n × 1 vector. Show that
$$\text{cov}[\mathbf{a}^T\mathbf{X},\,\mathbf{b}^T\mathbf{Y}] = \mathbf{a}^T\text{cov}[\mathbf{X},\mathbf{Y}]\,\mathbf{b} = \sum_{i=1}^m\sum_{j=1}^n a_i b_j\,\text{cov}[X_i, Y_j] \text{ [Ans]}$$

39.2 Further properties of the covariance matrix. Suppose X is an m × 1 random vector and Y is an n × 1 random vector.
Suppose further that all second moments are finite.
(a) Show that for any m × 1 vector b and any n × 1 vector d we have
cov[X + b, Y + d] = cov[X, Y]
(b) Show that for any ` × m matrix A and any p × n matrix B we have
cov[AX, BY] = Acov[X, Y]BT
(c) Suppose a, b, c and d ∈ R; suppose further that V is an m × 1 random vector and W is an n × 1 random vector with
finite second moments.
cov[aX + bV, cY + dW] = ac cov[X, Y] + ad cov[X, W] + bc cov[V, Y] + bd cov[V, W]
Both sides are m × n matrices.
(d) Suppose a and b ∈ R and both X and V are m × 1 random vectors. Show that
var[aX + bV] = a2 var[X] + ab cov[X, V] + ab cov[V, X] + b2 var[V] [Ans]
39.3 Suppose X is an m-dimensional random vector with finite second order moments and such that no element of X is a linear combination of the other elements.
Show that for any n-dimensional random vector Y, there exists an n × m matrix A such that
$$\text{cov}[\mathbf{Y} - A\mathbf{X},\,\mathbf{X}] = 0 \text{ [Ans]}$$
39.4 Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Prove the following results:
(a) E[(AX + a)(BX + b)T ] = AΣBT + (Aµ + a)(Bµ + b)T where A is m × n, a is m × 1, B is r × n and b is r × 1.
(b) E[(X + a)(X + a)T ] = Σ + (µ + a)(µ + a)T where a is n × 1.
(c) E[XaT X] = (Σ + µµT )a where a is n × 1. [Ans]
39.5 Suppose Y1 , Y2 , . . . , Yn are independent random variables each with variance 1. Let X1 = Y1 , X2 = Y1 + Y2 , . . . , Xn =
Y1 + · · · + Yn . Find the n × n matrix var[X]. [Ans]
39.6 Suppose X is an n × 1 random vector with finite second moments. Show that for any n × 1 vector α ∈ Rn we have
E[(X − α)(X − α)T ] = var[X] + (µX − α)(µx − α)T [Ans]
39.7 Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Prove the following results:
(a) E[(AX + a)T (BX + b)] = trace(AΣBT ) + (Aµ + a)T (Bµ + b) where A and B are m × n, and a and b are m × 1.
(b) E[XT X] = trace(Σ) + µT µ
(c) E[(AX)T (AX)] = trace(AΣAT ) + (Aµ)T (Aµ) where A is n × n.
(d) E[(X + a)T (X + a)] = trace(Σ) + (µ + a)T (µ + a) where a is n × 1. [Ans]
39.8 Suppose X is an m-dimensional random vector, Y is an n-dimensional random vector and A is an m × n real matrix.
Prove that
E[ XT AY ] = trace(AΣY,X ) + µTX AµY
where ΣY,X is the n × m matrix cov[Y, X]. [Ans]
39.9 Quadratic forms. Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Suppose A is a real n × n
symmetric matrix and b is an n × 1 real vector.
Show that
E[(X − b)T A(X − b)] = trace(AΣ) + (µ − b)T A(µ − b)
In particular
• E[(X − µ)ᵀA(X − µ)] = trace(AΣ).
• If ‖a‖ denotes the length of the vector a, then ‖a‖ = √(aᵀa) and E‖X − b‖² = trace(Σ) + ‖µ − b‖². [Ans]
39.10 Suppose X₁, . . . , X_n are random variables with E[X_j] = µ_j and var[X_j] = σ_j² for j = 1, . . . , n; also cov[X_j, X_k] = 0 for k > j + 1. If
$$Q = \sum_{k=1}^n (X_k - \overline{X})^2$$
show that
$$E[Q] = \frac{(n-1)\alpha - 2\beta}{n} + \gamma$$
where α = σ₁² + · · · + σ_n², β = cov[X₁, X₂] + cov[X₂, X₃] + · · · + cov[X_{n−1}, X_n] and $\gamma = \sum_{k=1}^n \mu_k^2 - \frac{1}{n}\left(\sum_{k=1}^n \mu_k\right)^2$.
Note that if all variables have the same mean, then γ = 0. [Ans]
39.11 Variance of a quadratic form—proof of proposition(38.9c) on page 121.
(a) Show that XT AX = W1 + W2 + c where W1 = (X − µ)T A(X − µ), W2 = 2µT A(X − µ) and c = µT Aµ.
(b) Show that var[W2 ] = 4σ 2 µT A2 µ.
(c) Show that cov[W1 , W2 ] = E[W1 W2 ] = 2µT A E[YYT AY] = 2µ3 µT Ad where Y = X − µ.
(d) Hence show var[XT AX] = (µ4 − 3σ 4 )dT d + 2σ 4 trace(A2 ) + 4σ 2 µT A2 µ + 4µ3 µT Ad [Ans]
39.12 Suppose X₁, . . . , X_n are random variables with common expectation µ and common variance σ². Suppose further that cov[X_j, X_k] = ρσ² for j ≠ k. Show that $\sum_{k=1}^n (X_k - \overline{X})^2$ has expectation σ²(1 − ρ)(n − 1) and hence
$$\frac{\sum_{k=1}^n (X_k - \overline{X})^2}{(1-\rho)(n-1)}$$
is an unbiased estimator of σ². [Ans]
39.13 Suppose X₁, . . . , X_n are i.i.d. random variables with the N(µ, σ²) distribution. Let
$$S^2 = \frac{\sum_{k=1}^n (X_k - \overline{X})^2}{n-1} \qquad\text{and}\qquad Q = \frac{\sum_{k=1}^{n-1} (X_{k+1} - X_k)^2}{2(n-1)}$$
(a) Show that var[S²] = 2σ⁴/(n − 1). (See also exercise 43.8 on page 139.)
(b) Show that E[Q] = σ² and var[Q] = 2σ⁴(6n − 8)/[4(n − 1)²]. [Ans]
39.14 (a) Expectation of XᵀAY. Suppose X is an n-dimensional random vector with expectation µ_X, Y is an m-dimensional random vector with expectation µ_Y and A is an n × m real matrix. Let Z = XᵀAY.
Prove that E[Z] = trace(A cov[Y, X]) + µ_XᵀAµ_Y.
(b) Suppose (X₁, Y₁), . . . , (X_n, Y_n) are i.i.d. random vectors with a distribution with expectation (µ_X, µ_Y) and variance matrix
$$\begin{bmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{bmatrix} \qquad\text{where } \sigma_{XY} = \text{cov}[X, Y].$$
Suppose
$$S_{XY} = \frac{\sum_{j=1}^n (X_j - \overline{X})(Y_j - \overline{Y})}{n-1}$$
Show that E[S_{XY}] = σ_{XY}. [Ans]

40 The bivariate normal
40.1 The density. Here is the first of several equivalent formulations of the density.
Definition(40.1a). The random vector (X₁, X₂) has a bivariate normal distribution iff it has density
$$f_{X_1X_2}(x_1, x_2) = \frac{|P|^{1/2}}{2\pi}\exp\left(-\frac{(\mathbf{x}-\boldsymbol\mu)^T P\,(\mathbf{x}-\boldsymbol\mu)}{2}\right) \tag{40.1a}$$
where $\mathbf{x} = (x_1, x_2)^T$, $\boldsymbol\mu = (\mu_1, \mu_2)^T \in \mathbb{R}^2$ and P is a real symmetric positive definite 2 × 2 matrix.
Suppose the entries in the 2 × 2 real symmetric matrix P are denoted as follows²:
$$P = \begin{bmatrix} a_1 & a_2 \\ a_2 & a_3 \end{bmatrix}$$
It follows that equation (40.1a) is equivalent to
$$f(x_1, x_2) = \frac{\sqrt{a_1a_3 - a_2^2}}{2\pi}\exp\left(-\frac{a_1(x_1-\mu_1)^2 + 2a_2(x_1-\mu_1)(x_2-\mu_2) + a_3(x_2-\mu_2)^2}{2}\right) \tag{40.1b}$$
A more common form of the density is given in equation (40.3a) on page 124.

To show that equation (40.1b) defines a density. Clearly f ≥ 0. It remains to check that f integrates to 1. Let y₁ = x₁ − µ₁ and y₂ = x₂ − µ₂. Then
$$\int\!\!\int f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 = \frac{|P|^{1/2}}{2\pi}\int\!\!\int \exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1\,dy_2 \tag{40.1c}$$
$$= \frac{|P|^{1/2}}{2\pi}\int\!\!\int \exp\left\{-\frac{1}{2}\left(\sqrt{a_1}\,y_1 + \frac{a_2}{\sqrt{a_1}}\,y_2\right)^2\right\}\exp\left\{-\frac{y_2^2}{2}\left(a_3 - \frac{a_2^2}{a_1}\right)\right\}dy_1\,dy_2 \tag{40.1d}$$
Now use the transformation $z_1 = \sqrt{a_1}\,y_1 + \frac{a_2}{\sqrt{a_1}}\,y_2$ and $z_2 = y_2\sqrt{\frac{a_1a_3-a_2^2}{a_1}}$. This transformation has Jacobian $\sqrt{a_1a_3-a_2^2} = |P|^{1/2}$ and is a 1–1 map R² → R²; it gives
$$\int\!\!\int f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 = \frac{1}{2\pi}\int\!\!\int \exp\left(-\frac{z_1^2+z_2^2}{2}\right)dz_1\,dz_2 = 1$$
by using the fact that the integral of the standard normal density equals one.

² It is easy to check that the real symmetric matrix P is positive definite iff a₁ > 0 and a₁a₃ − a₂² > 0.
40.2 The marginal distributions of X₁ and X₂. For the marginal density of X₂ we need to find the integral
$$f_{X_2}(x_2) = \int_{x_1} f_{X_1X_2}(x_1, x_2)\,dx_1$$
First let Y₁ = X₁ − µ₁ and Y₂ = X₂ − µ₂ and find the density of Y₂:
$$f_{Y_2}(y_2) = \int_{y_1} f_{Y_1Y_2}(y_1, y_2)\,dy_1 = \frac{|P|^{1/2}}{2\pi}\int_{y_1}\exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1$$
Using the decomposition in equation (40.1d) gives
$$f_{Y_2}(y_2) = \frac{|P|^{1/2}}{2\pi}\exp\left\{-\frac{y_2^2}{2}\left(a_3 - \frac{a_2^2}{a_1}\right)\right\}\int_{y_1}\exp\left\{-\frac{1}{2}\left(\sqrt{a_1}\,y_1 + \frac{a_2}{\sqrt{a_1}}\,y_2\right)^2\right\}dy_1 = \frac{|P|^{1/2}}{2\pi}\exp\left\{-\frac{y_2^2}{2}\,\frac{a_1a_3-a_2^2}{a_1}\right\}\sqrt{\frac{2\pi}{a_1}} = \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{y_2^2}{2\sigma_2^2}\right)$$
where σ₂² = a₁/(a₁a₃ − a₂²) = a₁/|P|. It follows that the density of X₂ = Y₂ + µ₂ is
$$f_{X_2}(x_2) = \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right) \qquad\text{where } \sigma_2^2 = \frac{a_1}{a_1a_3-a_2^2} = \frac{a_1}{|P|}$$
We have shown that the marginal distributions are normal: X₂ has the N(µ₂, σ₂²) distribution and, similarly, X₁ has the N(µ₁, σ₁²) distribution, where
$$\sigma_1^2 = \frac{a_3}{a_1a_3-a_2^2} = \frac{a_3}{|P|} \qquad\text{and}\qquad \sigma_2^2 = \frac{a_1}{a_1a_3-a_2^2} = \frac{a_1}{|P|}$$
If X₁ and X₂ are both normal, it does not follow that (X₁, X₂) is normal—see example (42.8a) on page 133.
40.3 The covariance and correlation between X₁ and X₂. Of course, cov[X₁, X₂] = cov[Y₁, Y₂] where Y₁ = X₁ − µ₁ and Y₂ = X₂ − µ₂. So it suffices to find cov[Y₁, Y₂] = E[Y₁Y₂]. The density of (Y₁, Y₂) is
$$f_{Y_1Y_2}(y_1, y_2) = \frac{|P|^{1/2}}{2\pi}\exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)$$
It follows that
$$\int\!\!\int \exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1\,dy_2 = \frac{2\pi}{\sqrt{a_1a_3-a_2^2}}$$
Differentiating with respect to a₂ gives
$$\int\!\!\int (-y_1y_2)\exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1\,dy_2 = \frac{2\pi a_2}{(a_1a_3-a_2^2)^{3/2}}$$
and hence
$$\text{cov}[X_1, X_2] = E[Y_1Y_2] = -\frac{a_2}{a_1a_3-a_2^2}$$
The correlation between X₁ and X₂ is
$$\rho = \frac{\text{cov}[X_1, X_2]}{\sigma_1\sigma_2} = -\frac{a_2}{\sqrt{a_1a_3}}$$
These results lead to an alternative expression for the density of a bivariate normal:
$$f_{X_1X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right\} \tag{40.3a}$$
We have also shown that
$$\text{var}[\mathbf{X}] = \begin{bmatrix} \sigma_1^2 & \text{cov}[X_1, X_2] \\ \text{cov}[X_1, X_2] & \sigma_2^2 \end{bmatrix} = P^{-1}$$
P is sometimes called the precision matrix—it is the inverse of the variance matrix var[X].
Summarizing some of these results:
$$P = \begin{bmatrix} a_1 & a_2 \\ a_2 & a_3 \end{bmatrix} \qquad |P| = a_1a_3 - a_2^2 \qquad P^{-1} = \frac{1}{a_1a_3-a_2^2}\begin{bmatrix} a_3 & -a_2 \\ -a_2 & a_1 \end{bmatrix}$$
$$\Sigma = P^{-1} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \qquad |\Sigma| = (1-\rho^2)\sigma_1^2\sigma_2^2 \qquad \Sigma^{-1} = P = \frac{1}{(1-\rho^2)\sigma_1^2\sigma_2^2}\begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} \tag{40.3b}$$
Example(40.3a). Suppose (X, Y) has a bivariate normal distribution with density
$$f(x, y) = \frac{1}{k}\exp\left(-\frac{1}{2}(x^2 + 2y^2 - xy - 3x - 2y + 4)\right)$$
Find the mean vector and the variance matrix of (X, Y). What is the value of k?
Solution. Let Q(x, y) = a₁(x − µ₁)² + 2a₂(x − µ₁)(y − µ₂) + a₃(y − µ₂)². So we want Q(x, y) = x² + 2y² − xy − 3x − 2y + 4. Equating coefficients of x², xy and y² gives a₁ = 1, a₂ = −1/2 and a₃ = 2. Hence
$$P = \begin{bmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 2 \end{bmatrix} \qquad\text{and}\qquad \Sigma = P^{-1} = \frac{4}{7}\begin{bmatrix} 2 & \tfrac{1}{2} \\ \tfrac{1}{2} & 1 \end{bmatrix}$$
Also |P| = 7/4 and hence k = 2π/|P|^{1/2} = 4π/√7.
Now ∂Q/∂x = 2a₁(x − µ₁) + 2a₂(y − µ₂) and ∂Q/∂y = 2a₂(x − µ₁) + 2a₃(y − µ₂). If ∂Q/∂x = 0 and ∂Q/∂y = 0 then we must have x = µ₁ and y = µ₂ because |P| = a₁a₃ − a₂² ≠ 0.
Applying this to Q(x, y) = x² + 2y² − xy − 3x − 2y + 4 gives the equations 2µ₁ − µ₂ − 3 = 0 and 4µ₂ − µ₁ − 2 = 0. Hence (µ₁, µ₂) = (2, 1).
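Example (40.3a) can be verified numerically from P alone; a short sketch whose symbol names mirror the example:

```python
import numpy as np

P = np.array([[1.0, -0.5],
              [-0.5, 2.0]])                          # precision matrix read off the quadratic form
Sigma = np.linalg.inv(P)
print(Sigma)                                         # (4/7) * [[2, 1/2], [1/2, 1]]
print(2 * np.pi / np.sqrt(np.linalg.det(P)))         # k = 2*pi/|P|^(1/2) = 4*pi/sqrt(7)

# the mean solves the linear equations from the example: 2*mu1 - mu2 = 3 and -mu1 + 4*mu2 = 2
mu = np.linalg.solve(np.array([[2.0, -1.0], [-1.0, 4.0]]), np.array([3.0, 2.0]))
print(mu)                                            # [2. 1.]
```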

40.4 The characteristic function. Suppose Xᵀ = (X₁, X₂) has the bivariate density defined in equation (40.1a). Then for all t ∈ R², the characteristic function of X is
$$\varphi(\mathbf{t}) = E\big[e^{i\mathbf{t}^T\mathbf{X}}\big] = \frac{|P|^{1/2}}{2\pi}\int\!\!\int e^{i\mathbf{t}^T\mathbf{x}}\exp\left(-\frac{(\mathbf{x}-\boldsymbol\mu)^T P(\mathbf{x}-\boldsymbol\mu)}{2}\right)dx_1\,dx_2 = \frac{|P|^{1/2}}{2\pi}\,e^{i\mathbf{t}^T\boldsymbol\mu}\int\!\!\int \exp\left(\frac{2i\mathbf{t}^T\mathbf{y} - \mathbf{y}^T P\mathbf{y}}{2}\right)dy_1\,dy_2$$
by setting y = x − µ. But yᵀPy − 2itᵀy = (y − iΣt)ᵀP(y − iΣt) + tᵀΣt where Σ = P⁻¹ = var[X]. Hence
$$\varphi(\mathbf{t}) = \frac{|P|^{1/2}}{2\pi}\,e^{i\mathbf{t}^T\boldsymbol\mu - \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}\int\!\!\int \exp\left(-\frac{(\mathbf{y}-i\Sigma\mathbf{t})^T P(\mathbf{y}-i\Sigma\mathbf{t})}{2}\right)dy_1\,dy_2 = e^{i\mathbf{t}^T\boldsymbol\mu - \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$$
by using the fact that the integral of equation (40.1a) is 1.
Example(40.4a). Suppose X = (X₁, X₂) ∼ N(µ, Σ). Find the distribution of X₁ + X₂.
Solution. The c.f. of X is $\varphi_X(t_1, t_2) = \exp\left(i\mu_1t_1 + i\mu_2t_2 - \frac{1}{2}(t_1^2\sigma_1^2 + 2t_1t_2\sigma_{12} + t_2^2\sigma_2^2)\right)$. Setting t₁ = t₂ = t gives the c.f. of X₁ + X₂ to be $\varphi_{X_1+X_2}(t) = \exp\left(i(\mu_1+\mu_2)t - \frac{1}{2}t^2(\sigma_1^2 + 2\sigma_{12} + \sigma_2^2)\right)$. Hence X₁ + X₂ ∼ N(µ₁ + µ₂, σ²) where σ² = σ₁² + 2σ₁₂ + σ₂² = σ₁² + 2ρσ₁σ₂ + σ₂².

40.5 The conditional distributions. We first find the conditional density of Y₁ given Y₂ where Y₁ = X₁ − µ₁ and Y₂ = X₂ − µ₂. Now
$$f_{Y_1|Y_2}(y_1|y_2) = \frac{f_{Y_1Y_2}(y_1, y_2)}{f_{Y_2}(y_2)}$$
We use the following forms:
$$f_{Y_1Y_2}(y_1, y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{y_1^2 - \frac{2\rho\sigma_1}{\sigma_2}y_1y_2 + \frac{\sigma_1^2}{\sigma_2^2}y_2^2}{2\sigma_1^2(1-\rho^2)}\right\} \qquad\text{and}\qquad f_{Y_2}(y_2) = \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{y_2^2}{2\sigma_2^2}\right)$$
Hence
$$f(y_1|y_2) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_1^2(1-\rho^2)}}\exp\left\{-\frac{y_1^2 - \frac{2\rho\sigma_1}{\sigma_2}y_1y_2 + \frac{\rho^2\sigma_1^2}{\sigma_2^2}y_2^2}{2\sigma_1^2(1-\rho^2)}\right\} = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_1^2(1-\rho^2)}}\exp\left\{-\frac{\left(y_1 - \frac{\rho\sigma_1}{\sigma_2}y_2\right)^2}{2\sigma_1^2(1-\rho^2)}\right\}$$
which is the density of the $N\!\left(\frac{\rho\sigma_1}{\sigma_2}y_2,\;\sigma_1^2(1-\rho^2)\right)$ distribution.
It follows that the distribution of X₁ given X₂ = x₂ is $N\!\left(\mu_1 + \frac{\rho\sigma_1}{\sigma_2}(x_2-\mu_2),\;\sigma_1^2(1-\rho^2)\right)$, and hence E[X₁|X₂] = µ₁ + (ρσ₁/σ₂)(X₂ − µ₂) and var[X₁|X₂] = σ₁²(1 − ρ²).
In terms of the original notation, σ₁²(1 − ρ²) = 1/a₁ and ρσ₁/σ₂ = −a₂/a₁, and hence the distribution of X₁ given X₂ = x₂ is $N\!\left(\mu_1 - \frac{a_2}{a_1}(x_2-\mu_2),\;\frac{1}{a_1}\right)$.
Example(40.5a). Suppose the 2-dimensional random vector X has the bivariate normal N(µ_X, Σ_X) distribution where
$$\boldsymbol\mu_X = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \qquad\text{and}\qquad \Sigma_X = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix}$$
Find the distribution of X₁ + X₂ given X₁ = X₂.
Solution. Let Y₁ = X₁ + X₂ and Y₂ = X₁ − X₂. Then
$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = B\mathbf{X} \qquad\text{where } B = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
Hence Y ∼ N(µ_Y, Σ_Y) where
$$\boldsymbol\mu_Y = \begin{bmatrix} 3 \\ 1 \end{bmatrix} \qquad\text{and}\qquad \Sigma_Y = B\Sigma_X B^T = \begin{bmatrix} 11 & 1 \\ 1 & 3 \end{bmatrix}$$
Note that for the random vector Y we have σ₁² = 11, σ₂² = 3 and ρ = 1/√33. We now want the distribution of Y₁ given Y₂ = 0. This is $N\!\left(\mu_1 + \frac{\rho\sigma_1}{\sigma_2}(y_2 - \mu_2),\;\sigma_1^2(1-\rho^2)\right) = N(8/3,\;32/3)$.
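The computation in example (40.5a) is a one-liner with matrices; the sketch below reproduces µ_Y, Σ_Y and the conditional moments using the standard conditional-normal formulas:

```python
import numpy as np

mu_X = np.array([2.0, 1.0])
Sigma_X = np.array([[4.0, 2.0],
                    [2.0, 3.0]])
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])               # Y1 = X1 + X2, Y2 = X1 - X2

mu_Y = B @ mu_X                           # [3. 1.]
Sigma_Y = B @ Sigma_X @ B.T               # [[11. 1.] [1. 3.]]

# conditional moments of Y1 given Y2 = 0
cond_mean = mu_Y[0] + Sigma_Y[0, 1] / Sigma_Y[1, 1] * (0 - mu_Y[1])
cond_var = Sigma_Y[0, 0] - Sigma_Y[0, 1] ** 2 / Sigma_Y[1, 1]
print(cond_mean, cond_var)                # 8/3 and 32/3
```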
We have also shown that if the random vector (X1 , X2 ) is bivariate normal, then E[X1 |X2 ] is a linear function
of X2 and hence the best predictor and best linear predictor are the same—see exercises 2.15 and 2.18 on page 7.
40.6 Independence of X1 and X2 .
Proposition(40.6a). Suppose (X1 , X2 ) ∼ N (µ, Σ). Then X1 and X2 are independent iff ρ = 0.
Proof. If ρ = 0 then fX1 X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ). Conversely, if X1 and X2 are independent then cov[X1 , X2 ] = 0
and hence ρ = 0.
In terms of entries in the precision matrix: X1 and X2 are independent iff a2 = 0.
40.7 Linear transformation of a bivariate normal.
Proposition(40.7a). Suppose X has the bivariate normal distribution N (µ, Σ) and C is a 2 × 2 non-singular
matrix. Then the random vector Y = a + CX has the bivariate normal distribution N (a + Cµ, CΣCT ).
Proof. The easiest way is to find the characteristic function of Y. For all 2 × 1 vectors t ∈ R² we have
φY(t) = E[e^{it^T Y}] = e^{it^T a} E[e^{it^T CX}] = e^{it^T a} e^{it^T Cµ − ½t^T CΣC^T t}
which is the characteristic function of the bivariate normal N (a + Cµ, CΣCT ).
We need C to be non-singular in order to ensure the variance matrix of the result is non-singular.

We can transform a bivariate normal to independent normals as follows.


Proposition(40.7b). Suppose X has the bivariate normal distribution N(µ, Σ) where
µ = (µ1, µ2)^T   and   Σ = [[σ1², ρσ1σ2],[ρσ1σ2, σ2²]]
Define Y to be the random vector with components Y1 and Y2 where:
X1 = µ1 + σ1Y1
X2 = µ2 + ρσ2Y1 + σ2√(1 − ρ²) Y2
Then Y ∼ N(0, I).

Proof. Note that X = µ + BY and Y = B^{−1}(X − µ) where
B = [[σ1, 0],[ρσ2, σ2√(1 − ρ²)]]   and   B^{−1} = [[1/σ1, 0],[−ρ/(σ1√(1 − ρ²)), 1/(σ2√(1 − ρ²))]]
It is straightforward to check that B^{−1}Σ(B^{−1})^T = I. Hence Y ∼ N(0, I).
Example(40.7c). Suppose X1 and X2 are i.i.d. random variables with the N(0, 1) distribution. Suppose further that µ1 ∈ R,
µ2 ∈ R, σ1 ∈ (0, ∞), σ2 ∈ (0, ∞) and ρ ∈ (−1, 1).
(a) Let Y1 = µ1 + σ1X1 and Y2 = µ2 + ρσ2X1 + √(1 − ρ²) σ2X2. Find the distribution of (Y1, Y2).
(b) Let Y1′ = µ1 + ρσ1X1 + √(1 − ρ²) σ1X2 and Y2′ = µ2 + σ2X1. Find the distribution of (Y1′, Y2′).
Solution. (a) Now Y = a + CX where X ∼ N(0, I) and
a = (µ1, µ2)^T   and   C = [[σ1, 0],[ρσ2, √(1 − ρ²) σ2]]
Hence Y ∼ N(a, CC^T) and C is non-singular with
CC^T = [[σ1², ρσ1σ2],[ρσ1σ2, σ2²]]
(b) The same distribution as Y = (Y1, Y2).
For the transformation of a bivariate normal distribution by polar coordinates and the Box-Muller transformation
see exercise 28.26 on page 86.
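A small simulation sketch of the construction in proposition (40.7b) and example (40.7c): building a correlated bivariate normal from independent N(0, 1) variables. It assumes numpy; the parameter values are illustrative only.

import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, sigma1, sigma2, rho = 1.0, -2.0, 2.0, 0.5, 0.7

X = rng.standard_normal((100_000, 2))            # X1, X2 i.i.d. N(0, 1)
Y1 = mu1 + sigma1 * X[:, 0]
Y2 = mu2 + rho * sigma2 * X[:, 0] + np.sqrt(1 - rho**2) * sigma2 * X[:, 1]

print(np.cov(Y1, Y2))        # ≈ [[σ1², ρσ1σ2], [ρσ1σ2, σ2²]]
print(np.corrcoef(Y1, Y2))   # off-diagonal ≈ ρ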
40.8 Summary. The bivariate normal distribution.
Suppose X = (X1, X2) ∼ N(µ, Σ) where µ = E[X] and Σ = var[X].
• Density.
fX(x) = (|P|^{1/2}/2π) exp{ −(x − µ)^T P(x − µ)/2 }   where P = Σ^{−1} is the precision matrix
      = (1/(2πσ1σ2√(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ (x1 − µ1)²/σ1² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)²/σ2² ] }
• If P = [[a1, a2],[a2, a3]] then Σ = P^{−1} = (1/(a1a3 − a2²)) [[a3, −a2],[−a2, a1]].
• P = Σ^{−1} = (1/(σ1²σ2²(1 − ρ²))) [[σ2², −ρσ1σ2],[−ρσ1σ2, σ1²]]   and   |Σ| = (1 − ρ²)σ1²σ2².
• The marginal distributions. X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²).
• The characteristic function: φ(t) = e^{it^T µ − ½t^T Σt}; the m.g.f. is E[e^{t^T X}] = e^{t^T µ + ½t^T Σt}.
• The conditional distributions.
The distribution of X1 given X2 = x2 is N( µ1 + (ρσ1/σ2)(x2 − µ2), σ1²(1 − ρ²) ).
The distribution of X2 given X1 = x1 is N( µ2 + (ρσ2/σ1)(x1 − µ1), σ2²(1 − ρ²) ).
• X1 and X2 are independent iff ρ = 0.
• Linear transformation of a bivariate normal. If C is non-singular, then Y = a + CX has a bivariate normal
distribution with mean a + Cµ and variance matrix CΣC^T.

41 Exercises (exs-bivnormal.tex)

41.1 Sum of two independent bivariate normals. Suppose X = (X1 , X2 ) ∼ N (µX , ΣX ) and Y = (Y1 , Y2 ) ∼ N (µY , ΣY ).
Prove that X + Y ∼ N (µX + µY , ΣX + ΣY ). [Ans]
41.2 Suppose (X, Y ) has the density
f_{XY}(x, y) = c e^{−(x² − xy + y²)/3}
(a) Find c. (b) Are X and Y independent? [Ans]



41.3 Suppose (X, Y ) has a bivariate normal distribution with density
f(x, y) = k exp{ −(x² + 2xy + 4y²) }
Find the mean vector and the variance matrix of (X, Y ). What is the value of k? [Ans]
41.4 Suppose (X, Y ) has a bivariate normal distribution with density
f(x, y) = (1/k) exp{ −½(2x² + y² + 2xy − 22x − 14y + 65) }
Find the mean vector and the variance matrix of (X, Y ). What is the value of k? [Ans]
41.5 Suppose the random vector Y = (Y1, Y2) has the density
f(y1, y2) = (1/k) exp{ −½(y1² + 2y2² − y1y2 − 3y1 − 2y2 + 4) }   for y = (y1, y2) ∈ R².
Find E[Y] and var[Y]. [Ans]
41.6 Evaluate the integral
∫_{−∞}^{∞} ∫_{−∞}^{∞} exp{ −(y1² + 2y1y2 + 4y2²) } dy1 dy2   [Ans]
41.7 Suppose the random vector Y = (Y1, Y2) has the density
f(y1, y2) = k exp{ −(1/12)( y1² + 2y1(y2 − 1) + 4(y2 − 1)² ) }   for y = (y1, y2) ∈ R².
Show that Y ∼ N(µ, Σ) and find the values of µ and Σ. [Ans]
41.8 Suppose (X, Y ) has the bivariate normal distribution N(µ, Σ). Let σX² = var[X], σY² = var[Y ] and ρ = corr(X, Y ).
(a) Show that X and Y − ρσY X/σX are independent.
(b) Suppose θ satisfies tan(θ) = σX/σY; show that X cos θ + Y sin θ and X cos θ − Y sin θ are independent. [Ans]
41.9 Suppose X = (X1, X2) has a bivariate normal distribution with E[X1] = E[X2] = 0 and variance matrix Σ. Prove that
X^T PX − X1²/σ1² ∼ χ²₁
where P is the precision matrix of X. [Ans]
41.10 (a) Suppose E[X1 ] = µ1 , E[X2 ] = µ2 and there exists α such that Y = X1 + αX2 is independent of X2 . Prove that
E[X1 |X2 ] = µ1 + αµ2 − αX2 .
(b) Use part (a) to derive E[X1 |X2 ] for the bivariate normal. [Ans]
41.11 Suppose X = (X, Y ) has the bivariate normal distribution N (µ, Σ). Because the matrix Σ is positive definite, the
Cholesky decomposition asserts that there is a unique lower triangular matrix L with Σ = LLT . Define the random
vector Z by X = µ + LZ. Show that Z ∼ N (0, I) and hence the two components of Z are i.i.d. random variables with
the N (0, 1) distribution. [Ans]
41.12 An alternative method for constructing the bivariate normal. Suppose X and Y are i.i.d. N(0, 1). Suppose ρ ∈ (−1, 1)
and Z = ρX + √(1 − ρ²) Y.
(a) Find the density of Z.
(b) Find the density of (X, Z).
(c) Suppose µ1 ∈ R, µ2 ∈ R, σ1 > 0 and σ2 > 0. Find the density of (U, V ) where U = µ1 + σ1 X and V = µ2 + σ2 Z.
[Ans]
41.13 Suppose X1 ∼ N (0, σ12 ), X2 ∼ N (0, σ22 ) and X1 and X2 are independent. Let Z = X1 + X2 .
(a) Find the distribution of the random vector (X1 , Z).
(b) Find E[X1 |Z] and E[eitX1 |Z]. [Ans]
41.14 Suppose X1 , . . . , Xn are i.i.d. random variables with the N (0, σ12 ) distribution and Y1 , . . . , Yn are i.i.d. random variables
with the N (0, σ22 ) distribution. Suppose further that all the random variables X1 , . . . , Xn , Y1 , . . . , Yn are independent.
Let Zj = Xj + Yj for j = 1, . . . , n; hence Zj ∼ N (0, σ12 + σ22 ).
Define α : Rn → {1, 2, . . . , n} by 
α(x1 , . . . , xn ) = min j ∈ {1, 2, . . . , n} : xj = max{x1 , . . . , xn }
Let W = Xα(Z1 ,...,Zn ) . Find expressions for E[W ] and E[eitW ]. (We can think of this as a problem where the Zj are
observed but we are really interested in the Xj .) [Ans]
41.15 Suppose (X1, X2) has the bivariate normal distribution with density given by equation(40.3a). Define Q by:
f_{X1X2}(x1, x2) = e^{−Q(x1,x2)} / (2πσ1σ2√(1 − ρ²))
Hence
Q(x1, x2) = (1/(2(1 − ρ²))) [ (x1 − µ1)²/σ1² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)²/σ2² ]
Define the random variable Y by Y = Q(X1, X2). Show that Y has the exponential density. [Ans]

41.16 (a) Suppose (X1, X2) has a bivariate normal distribution with E[X1] = E[X2] = 0. Hence it has characteristic function
φX(t) = exp{ −½t^T Σt } = exp{ −½( σ1²t1² + 2σ12t1t2 + σ2²t2² ) }
Explore the situations when Σ is singular.
(b) Now suppose (X1, X2) has a bivariate normal distribution without the restriction of zero means. Explore the situations when the variance matrix Σ is singular. [Ans]
41.17 Suppose T1 and T2 are i.i.d. N(0, 1). Set X = a1T1 + a2T2 and Y = b1T1 + b2T2 where a1² + a2² > 0 and a1b2 ≠ a2b1.
(a) Show that E[Y |X] = X(a1b1 + a2b2)/(a1² + a2²).
(b) Show that E[ (Y − E(Y |X))² ] = (a1b2 − a2b1)²/(a1² + a2²). [Ans]
41.18 (a) Suppose (X, Y ) has a bivariate normal distribution with var[X] = var[Y ]. Show that X + Y and X − Y are
independent random variables.
(b) Suppose (X, Y ) has a bivariate normal distribution with E[X] = E[Y ] = 0, var[X] = var[Y ] = 1 and cov[X, Y ] = ρ.
Show that X² and Y² are independent iff ρ = 0.
(Note. If var[X] = σX² and var[Y ] = σY² then just set X1 = X/σX and Y1 = Y/σY.) [Ans]
41.19 (a) Suppose (X, Y ) has a bivariate normal distribution N(µ, Σ) with
µ = (0, 0)^T   and   Σ = [[σ², ρσ²],[ρσ², σ²]]
Let (R, Θ) denote the polar coordinates of (X, Y ). Find the distribution of (R, Θ) and the marginal distribution of Θ.
If ρ = 0, equivalently if X and Y are independent, show that R and Θ are independent.
(b) Suppose X and Y are i.i.d. random variables with the N(0, σ²) distribution. Let
T1 = (X² − Y²)/√(X² + Y²)   and   T2 = 2XY/√(X² + Y²)
Show that T1 and T2 are i.i.d. random variables with the N(0, σ²) distribution. [Ans]
41.20 Suppose (X1, X2) has a bivariate normal distribution with E[X1] = E[X2] = 0. Let Z = X1/X2.
(a) Show that
fZ(z) = σ1σ2√(1 − ρ²) / [ π(σ2²z² − 2ρσ1σ2z + σ1²) ]
(b) Suppose X1 and X2 are i.i.d. random variables with the N(0, σ²) distribution.
(i) What is the distribution of Z?
(ii) What is the distribution of W = (X1 − X2)/(X1 + X2)? [Ans]
41.21 Suppose (X1 , X2 ) has a bivariate normal distribution with E[X1 ] = E[X2 ] = 0 and var[X1 ] = var[X2 ] = 1. Let
ρ = corr[X1 , X2 ] = cov[X1 , X2 ] = E[X1 X2 ]. Show that
(X1² − 2ρX1X2 + X2²)/(1 − ρ²) ∼ χ²₂   [Ans]

41.22 Suppose (X, Y ) has a bivariate normal distribution with E[X] = E[Y ] = 0. Show that
P[X ≥ 0, Y ≥ 0] = P[X ≤ 0, Y ≤ 0] = 1/4 + (1/2π) sin⁻¹ρ
P[X ≤ 0, Y ≥ 0] = P[X ≥ 0, Y ≤ 0] = 1/4 − (1/2π) sin⁻¹ρ   [Ans]
41.23 Normality of conditional distributions does not imply normality. Suppose the random vector (X, Y ) has the density
f_{(X,Y)}(x, y) = C exp{ −(1 + x²)(1 + y²) }   for x ∈ R and y ∈ R.
Find the marginal distributions of X and Y and show that the conditional distributions (X|Y = y) and (Y |X = x) are
both normal. [Ans]

42 The multivariate normal


42.1 The multivariate normal distribution. The n × 1 random vector X has a non-singular multivariate
normal distribution iff X has density
 
fX(x) = C exp{ −½(x − µ)^T P(x − µ) }   for x ∈ Rⁿ    (42.1a)
where
• C is a constant so that the density integrates to 1;
• µ is a vector in Rn ;
• P is a real symmetric positive definite n × n matrix called the precision matrix.
42.2 Integrating the density. Because P is a real symmetric positive definite matrix, there exists an orthogonal
matrix L with
P = L^T DL
where L is orthogonal and D is diagonal with entries d1 > 0, . . . , dn > 0. This result is explained in §38.4 on
page 118; the values d1, . . . , dn are the eigenvalues of P.
Consider the transformation Y = L(X − µ); this is a 1–1 transformation: Rⁿ → Rⁿ whose Jacobian has absolute value
| ∂(y1, . . . , yn)/∂(x1, . . . , xn) | = |det(L)| = 1
Note that X − µ = L^T Y. The density of Y is
fY(y) = C exp{ −½ y^T LPL^T y } = C exp{ −½ y^T Dy } = C ∏_{j=1}^n exp{ −½ dj yj² }
It follows that Y1, . . . , Yn are independent with distributions N(0, 1/d1), . . . , N(0, 1/dn) respectively, and
C = √(d1 ⋯ dn) / (2π)^{n/2} = √det(D) / (2π)^{n/2} = √det(P) / (2π)^{n/2}
So equation(42.1a) becomes
fX(x) = (√det(P) / (2π)^{n/2}) exp{ −½(x − µ)^T P(x − µ) }   for x ∈ Rⁿ    (42.2a)
Note that the random vector Y satisfies E[Y] = 0 and var[Y] = D^{−1}. Using X = µ + L^T Y gives
E[X] = µ
var[X] = var[L^T Y] = L^T var[Y]L = L^T D^{−1}L = P^{−1}
and hence P is the precision matrix, that is, the inverse of the variance matrix. So equation(42.1a) can be written as
fX(x) = (1/((2π)^{n/2}√det(Σ))) exp{ −½(x − µ)^T Σ^{−1}(x − µ) }   for x ∈ Rⁿ    (42.2b)
where µ = E[X] and Σ = P^{−1} = var[X]. This is defined to be the density of the N(µ, Σ) distribution.
Notes.
• A real matrix is the variance matrix of a non-singular normal distribution iff it is symmetric and positive
definite.
• The random vector X is said to have a spherical normal distribution iff X ∼ N (µ, σ 2 I). Hence X1 , . . . , Xn are
independent and have the same variance.
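As a small check of the density (42.2b), the following sketch (not part of the notes; it assumes numpy and scipy and uses illustrative values) evaluates the density written with the precision matrix P = Σ^{−1} and compares it with scipy's multivariate normal pdf.

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
P = np.linalg.inv(Sigma)

x = np.array([0.2, -0.4, 1.0])
quad = (x - mu) @ P @ (x - mu)
pdf_by_hand = np.sqrt(np.linalg.det(P)) / (2*np.pi)**(len(mu)/2) * np.exp(-quad/2)
print(pdf_by_hand, multivariate_normal(mu, Sigma).pdf(x))   # should agree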
42.3 The characteristic function. Suppose the n-dimensional random vector X has the N(µ, Σ) distribution.
We know that the transformation Y = L(X − µ) leads to Y ∼ N(0, D^{−1}). Because L is orthogonal we have
X = µ + L^T Y. Hence the characteristic function of X is:
φX(t) = E[e^{it^T X}] = E[e^{it^T µ + it^T L^T Y}] = e^{it^T µ} E[e^{it^T L^T Y}]   for all n × 1 vectors t ∈ Rⁿ.
But Y1, . . . , Yn are independent with distributions N(0, 1/d1), . . . , N(0, 1/dn), respectively. Hence
E[e^{it^T Y}] = E[e^{i(t1Y1 + ··· + tnYn)}] = e^{−t1²/2d1} ⋯ e^{−tn²/2dn} = e^{−½t^T D^{−1}t}   for all n × 1 vectors t ∈ Rⁿ.
Applying this result to the n × 1 vector Lt gives
E[e^{it^T L^T Y}] = e^{−½t^T L^T D^{−1}Lt} = e^{−½t^T Σt}

We have shown that if X ∼ N(µ, Σ) then
φX(t) = E[e^{it^T X}] = e^{it^T µ − ½t^T Σt}   for all t ∈ Rⁿ.
The moment generating function of X is E[e^{t^T X}] = e^{t^T µ + ½t^T Σt}.

42.4 The singular multivariate normal distribution. In the last section we saw that if µ ∈ Rⁿ and Σ is an
n × n real symmetric positive definite matrix, then the function φ : Rⁿ → C with
φ(t) = e^{it^T µ − ½t^T Σt}   for t ∈ Rⁿ
is a characteristic function. The condition on Σ can be relaxed to non-negative definite as follows:
Proposition(42.4a). Suppose µ ∈ Rⁿ and V is an n × n real symmetric non-negative definite matrix. Then
the function φ : Rⁿ → C with
φ(t) = e^{it^T µ − ½t^T Vt}   for t ∈ Rⁿ    (42.4a)
is a characteristic function.
Proof. For k = 1, 2, . . . , set Vk = V + (1/k)I where I is the n × n identity matrix. Then Vk is symmetric and positive
definite and so
φk(t) = e^{it^T µ − ½t^T Vkt}
is a characteristic function.
Also φk(t) → φ(t) as k → ∞ for all t ∈ Rⁿ. Finally, φ is continuous at t = 0. It follows that φ is a characteristic function
by the multidimensional form of Lévy's convergence theorem3 .

If V is symmetric and positive definite, then we know that φ in equation(42.4a) is the characteristic function of
the N (µ, V) distribution.
If V is only symmetric and non-negative definite and not positive definite, then by §38.3 on page 118, we know
that some linear combination of the components is zero and the density does not exist. In this case, we say that
the distribution with characteristic function φ is a singular multivariate normal distribution.

42.5 Linear combinations of the components of a multivariate normal.


Proposition(42.5a). Suppose the n-dimensional random vector X has the possibly singular N (µ, Σ) distribu-
tion. Then for any n × 1 vector ` ∈ Rn the random variable Z = `T X has a normal distribution.
Proof. Use characteristic functions. For t ∈ R we have
φZ(t) = E[e^{itZ}] = E[e^{itℓ^T X}] = φX(tℓ) = e^{itℓ^T µ − ½t²ℓ^T Σℓ}
and hence Z ∼ N( ℓ^T µ, ℓ^T Σℓ ).
Conversely:
Proposition(42.5b). Suppose X is an n-dimensional random vector such that for every n × 1 vector ` ∈ Rn
the random variable `T X is univariate normal. Then X has the multivariate normal distribution.
Proof. The characteristic function of X is φX(t) = E[e^{it^T X}]. Now Z = t^T X is univariate normal. Also E[t^T X] = t^T µ
and var[t^T X] = t^T Σt where µ = E[X] and Σ = var[X]. Hence Z ∼ N(t^T µ, t^T Σt). Hence the characteristic function
of Z is, for all u ∈ R:
φZ(u) = e^{iut^T µ − ½u²t^T Σt}
Take u = 1; hence
φZ(1) = e^{it^T µ − ½t^T Σt}
But φZ(1) = E[e^{iZ}] = E[e^{it^T X}]. So we have shown that
E[e^{it^T X}] = e^{it^T µ − ½t^T Σt}
and so X ∼ N(µ, Σ).
Combining these two previous propositions gives a characterization of the multivariate normal distribution:
the n-dimensional random vector X has a multivariate normal distribution iff every linear combination of
the components of X has a univariate normal distribution.

3
Also called the “Continuity Theorem.” See, for example, page 361 in [F RISTEDT &G RAY(1997)].

42.6 Linear transformation of a multivariate normal.


Proposition(42.6a). Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution.
Suppose further that B is an m × n matrix with m ≤ n and rank(B) = m; hence B has full rank.
Let Z = BX. Then Z has the non-singular N (Bµ, BΣBT ) distribution.
Proof. We first establish that BΣBT is positive definite. Suppose x ∈ Rm with xT BΣBT x = 0. Then yT Σy = 0 where
y = BT x. Because Σ is positive definite, we must have y = 0. Hence BT x = 0; hence x1 αT1 + · · · + xm αTm = 0 where
α1 , . . . , αm are the m rows of B. But rank(B) = m; hence x = 0 and hence BΣBT is positive definite.
The characteristic function of Z is, for all t ∈ Rm :
φZ(t) = E[e^{it^T Z}] = E[e^{it^T BX}] = φX(B^T t) = e^{it^T Bµ − ½t^T BΣB^T t}    (42.6a)
Hence Z ∼ N (Bµ, BΣBT ).
What if B is not full rank? Suppose now that X has the possibly singular N (µ, Σ) distribution and B is any m × n
matrix where now m > n is allowed. Equation(42.6a) for φZ (t) still holds. Also if v is any vector in Rm , then
vT BΣBT v = zT Σz where z is the n × 1 vector BT v. Because Σ is non-negative definite, it follows that BΣBT is
non-negative definite. Hence Y = BX has the possibly singular N (Bµ, BΣBT ) distribution. We have shown the
following result:
Corollary(42.6b). Suppose that X has the possibly singular N (µ, Σ) distribution and B is any m × n matrix
where m > n is allowed. Then Y = BX has the possibly singular N (Bµ, BΣBT ) distribution.
Here is an example where AX and BX have the same distribution but A ≠ B.

Example(42.6c). Suppose the 2-dimensional random vector X ∼ N(0, I). Let Y1 = X1 + X2, Y2 = 2X1 + X2, Z1 = X1√2
and Z2 = (3X1 + X2)/√2. Then
Y = AX where A = [[1, 1],[2, 1]]   and   Z = BX where B = [[√2, 0],[3/√2, 1/√2]]
Let Σ = AA^T = BB^T. Then Y ∼ N(0, Σ) and Z ∼ N(0, Σ).
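A one-line numerical confirmation of example (42.6c), assuming numpy: A ≠ B but AA^T = BB^T, so AX and BX share the same N(0, Σ) distribution.

import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
B = np.array([[np.sqrt(2), 0.0],
              [3/np.sqrt(2), 1/np.sqrt(2)]])
print(np.allclose(A @ A.T, B @ B.T))   # True; the common value is [[2, 3], [3, 5]]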

42.7 Transforming a multivariate normal into independent normals. The following proposition shows that
we can always transform the components of a non-singular multivariate normal into i.i.d. random variables with
the N (0, 1) distribution; see also §38.4 on page 118. We shall show below in §42.14 on page 138 how to convert
a singular multivariate normal into a non-singular multivariate normal.
Proposition(42.7a). Suppose the random vector X has the non-singular N(µ, Σ) distribution.
Then there exists a non-singular matrix Q such that Q^T Q = P, the precision matrix, and
Q(X − µ) ∼ N(0, I)
Proof. From §42.2 on page 130 we know that the precision matrix P = Σ^{−1} = L^T DL where L is orthogonal and
D = diag[d1, . . . , dn] with d1 > 0, . . . , dn > 0. Hence P = L^T D^{1/2}D^{1/2}L where D^{1/2} = diag[√d1, . . . , √dn]. Hence
P = Q^T Q where Q = D^{1/2}L. Because L is non-singular, it follows that Q is also non-singular.
If Z = Q(X − µ), then E[Z] = 0 and var[Z] = QΣQ^T = QP^{−1}Q^T = I. Hence Z ∼ N(0, I) and Z1, . . . , Zn are
i.i.d. random variables with the N(0, 1) distribution.

It also follows that if Y = LX where L is the orthogonal matrix which satisfies LT DL = Σ−1 , then Y ∼
N (Lµ, D−1 ). Hence Y1 , Y2 , . . . , Yn are independent with var[Yk ] = 1/dk where 1/d1 , 1/d2 , . . . , 1/dn are the
eigenvalues of Σ.
We now have another characterization of this result: the random vector X has a non-singular normal distri-
bution iff there exists an orthogonal transformation L such that the random vector LX has independent
normal components.
An orthogonal transformation of a spherical normal is a spherical normal: if the random vector has the spherical
normal distribution N (µ, σ 2 I) and L is orthogonal, then Y ∼ N (Lµ, σ 2 I) and hence Y also has a spherical normal
distribution.
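The following sketch illustrates proposition (42.7a) numerically: whitening a non-singular multivariate normal with Q = D^{1/2}L built from the eigendecomposition of the precision matrix. It assumes numpy and uses illustrative values; it is not part of the notes.

import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

P = np.linalg.inv(Sigma)
d, U = np.linalg.eigh(P)          # P = U diag(d) U^T, so L = U^T in the notes' notation
Q = np.diag(np.sqrt(d)) @ U.T     # Q^T Q = P

X = rng.multivariate_normal(mu, Sigma, size=100_000)
Z = (X - mu) @ Q.T                # Z = Q(X - mu) applied to each sample
print(np.cov(Z, rowvar=False))    # ≈ identity matrix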
42.8 The marginal distributions. Suppose the n-dimensional random vector X has the N (µ, Σ) distribution.
Then the characteristic function of X is
φX(t) = E[e^{i(t1X1 + ··· + tnXn)}] = e^{it^T µ − ½t^T Σt}   for t ∈ Rⁿ.
and hence the characteristic function of X1 is
φX1(t1) = φX(t1, 0, . . . , 0) = e^{iµ1t1 − ½t1²Σ11}
and so X1 ∼ N (µ1 , Σ11 ). Similarly, Xj ∼ N (µj , Σjj ) where Σjj is the (j, j) entry in the matrix Σ.

Similarly, the random vector (Xi, Xj) has the bivariate normal distribution with mean vector (µi, µj) and variance
matrix
[[Σii, Σij],[Σij, Σjj]]
In general, we see that every marginal distribution of a multivariate normal is normal.
An alternative method for deriving marginal distributions is to use proposition (42.6a) on page 132: for example,
X1 = aX where a = (1, 0, . . . , 0).
The converse is false!!
Example(42.8a). Suppose X and Y are independent two-dimensional random vectors with distributions N(µ, ΣX) and
N(µ, ΣY) respectively where
µ = (µ1, µ2)^T,   ΣX = [[σ1², ρ1σ1σ2],[ρ1σ1σ2, σ2²]]   and   ΣY = [[σ1², ρ2σ1σ2],[ρ2σ1σ2, σ2²]]
and ρ1 ≠ ρ2. Let Z = X with probability 1/2 and Z = Y with probability 1/2.
Show that Z has normal marginals but is not bivariate normal. See also exercise 43.15 on page 140.
Solution. Let Z1 and Z2 denote the components of Z; hence Z = (Z1, Z2). Then Z1 ∼ N(µ1, σ1²) and Z2 ∼ N(µ2, σ2²).
Hence every marginal distribution of Z is normal.
Now E[Z] = µ and var[Z] = E[(Z − µ)(Z − µ)^T] = ½(ΣX + ΣY). The density of Z is fZ(z) = ½fX(z) + ½fY(z) and this is
not the density of N(µ, ½(ΣX + ΣY)); we can see that by comparing the values of these two densities at z = µ.
A special case is when ρ1 = −ρ2; then cov[Z1, Z2] = 0, Z1 and Z2 are normal but Z = (Z1, Z2) is not normal.
Now for the general case of a subvector of X.
Proposition(42.8b). Suppose X is an n-dimensional random vector with the possibly singular N(µ, Σ) distribution,
and X is partitioned into two sub-vectors:
X = (X1, X2)   where X1 is k × 1, X2 is ℓ × 1 and n = k + ℓ.
Now partition µ and Σ conformably as follows:
µ = (µ1, µ2)   and   Σ = [[Σ11, Σ12],[Σ21, Σ22]]    (42.8a)
where µ1 is k × 1, µ2 is ℓ × 1, Σ11 is k × k, Σ12 is k × ℓ, Σ21 is ℓ × k and Σ22 is ℓ × ℓ. Note that Σ21 = Σ12^T. Then
(a) X1 ∼ N(µ1, Σ11) and X2 ∼ N(µ2, Σ22);
(b) the random vectors X1 and X2 are independent iff Σ12 = 0, equivalently iff cov[X1, X2] = 0.
Proof. Now the characteristic function of X is E[e^{it^T X}] = e^{it^T µ − ½t^T Σt} for all t ∈ Rⁿ. Partitioning t conformably into
t = (t1, t2) gives
E[e^{it^T X}] = E[e^{it1^T X1 + it2^T X2}] = e^{it^T µ − ½t^T Σt}
Setting t2 = 0 shows that X1 ∼ N(µ1, Σ11). Hence part (a).
⇒ We are given cov[X1i, X2j] = 0 for all i = 1, . . . , k and j = 1, . . . , ℓ. Hence Σ12 = 0.
⇐ Because Σ12 = 0 we have
Σ = [[Σ11, 0],[0, Σ22]]
The characteristic function of X gives
E[e^{it^T X}] = e^{it^T µ − ½t^T Σt} = e^{it1^T µ1 − ½t1^T Σ11t1} e^{it2^T µ2 − ½t2^T Σ22t2}   for all t ∈ Rⁿ.
and hence E[e^{it1^T X1 + it2^T X2}] = E[e^{it1^T X1}] E[e^{it2^T X2}], and hence X1 and X2 are independent.

The converse of this result is considered in exercise 43.16 on page 141.


Similarly, if X ∼ N (µ, Σ) and we partition X into 3 sub-vectors X1 , X2 and X3 , then these 3 sub-vectors are
independent iff Σ12 = 0, Σ13 = 0 and Σ23 = 0. More generally:
• Pairwise independence implies independence for sub-vectors of a multivariate normal.
• If X ∼ N (µ, Σ) then X1 , . . . , Xn are independent iff all covariances equal 0. (42.8b)

Example(42.8c). Suppose the 5-dimensional random vector X = (X1, X2, X3, X4, X5) has the N(µ, Σ) distribution where
Σ = [[2, 4, 0, 0, 0],
     [4, 3, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 4, −1],
     [0, 0, 0, −1, 3]]
Then (X1, X2), X3, and (X4, X5) are independent.
42.9 Conditional distributions. To prove the following proposition, we use the following result: suppose
W1 is a k-dimensional random vector;
W2 is an `-dimensional random vector;
W1 and W2 are independent;
h is a function : R` → Rk .
Let V = W1 + h(W2 ). Then the conditional distribution of V given W2 has density
fV|W2(v|w2) = f(V,W2)(v, w2)/fW2(w2) = f(W1,W2)(v − h(w2), w2)/fW2(w2) = fW1( v − h(w2) )
In particular, if W1 ∼ N (µ1 , Σ1 ) then the conditional distribution of V = W1 + h(W2 ) given W2 = w2 has the
density of the N ( µ1 + h(w2 ), Σ1 ) distribution. We have shown the following.
Suppose W1 has the non-singular N (µ1 , Σ1 ) distribution and W2 is independent of W1 .
Then the conditional density of W1 + h(W2 ) given W2 = w2 is the density of N ( µ1 +
h(w2 ), Σ1 ). (42.9a)

Proposition(42.9a). Suppose X is an n-dimensional random vector with the non-singular N(µ, Σ) distribution,
and X is partitioned into two sub-vectors X1 (k × 1) and X2 (ℓ × 1) with n = k + ℓ.
Partition µ and Σ conformably as in equations(42.8a). Then the conditional distribution of X1 given X2 = x2
is the normal distribution:
N( µ1 + Σ12Σ22^{−1}(x2 − µ2), Σ11 − Σ12Σ22^{−1}Σ21 )    (42.9b)
Proof. We shall give two proofs of this important result; the first proof is shorter but requires knowledge of the answer!
Proof 1. Let B be the n × n matrix
B = [[I, −Σ12Σ22^{−1}],[0, I]]
where the top-left block is k × k and the bottom-right block is ℓ × ℓ. Note that B is invertible with inverse
B^{−1} = [[I, Σ12Σ22^{−1}],[0, I]]
But by proposition(42.6a), we know that BX ∼ N(Bµ, BΣB^T), where
BX = (X1 − Σ12Σ22^{−1}X2, X2),   Bµ = (µ1 − Σ12Σ22^{−1}µ2, µ2)   and   BΣB^T = [[Σ11 − Σ12Σ22^{−1}Σ21, 0],[0, Σ22]]
It follows that X1 − Σ12Σ22^{−1}X2 is independent of X2. Also
X1 − Σ12Σ22^{−1}X2 ∼ N( µ1 − Σ12Σ22^{−1}µ2, Σ11 − Σ12Σ22^{−1}Σ21 )
It follows by the boxed result(42.9a) that the conditional distribution of X1 given X2 = x2 has the density of
N( µ1 + Σ12Σ22^{−1}(x2 − µ2), Σ11 − Σ12Σ22^{−1}Σ21 )
Proof 2. We want to construct a k × 1 random vector W1 with
W1 = C1X1 + C2X2
where C1 is k × k and C2 is k × ℓ and such that W1 is independent of X2.
Now if W1 is independent of X2, then C0W1 is also independent of X2 for any C0; hence the answer is arbitrary up to a
multiplicative C0. So take C1 = I. This means we are now trying to find a k × ℓ matrix C2 such that W1 = X1 + C2X2 is
independent of X2.
Now cov[W1, X2] = 0; hence cov(X1, X2) + cov(C2X2, X2) = 0; hence Σ12 + C2Σ22 = 0 and hence C2 = −Σ12Σ22^{−1}.
So W1 = X1 − Σ12Σ22^{−1}X2 is independent of X2. Also X1 = W1 − C2X2; hence var[X1] = var[W1] + C2var[X2]C2^T;
hence var[W1] = Σ11 − Σ12Σ22^{−1}Σ21. The rest of the proof is as in the first proof.
The proof shows that the unique linear function of X2 which makes X1 − C2X2 independent of X2 is given by C2 = Σ12Σ22^{−1}.

Corollary(42.9b). The distribution in (42.9b) is a non-singular multivariate normal. Hence the matrix Σ11 −
Σ12Σ22^{−1}Σ21 is positive definite and non-singular.
Proof. Note that the inverse of a positive definite matrix is positive definite and any principal sub-matrix of a positive
definite matrix is positive definite. Both these results can be found on page 214 of [H ARVILLE(1997)].
Now Σ is positive definite; hence the inverse P = Σ^{−1} is also positive definite. Partition P conformably as
P = [[P11, P12],[P21, P22]]
Because PΣ = I we have
P11Σ11 + P12Σ21 = I,   P21Σ11 + P22Σ21 = 0,   P11Σ12 + P12Σ22 = 0,   P21Σ12 + P22Σ22 = I
Hence
P12 = −P11Σ12Σ22^{−1}   and   P21 = −P22Σ21Σ11^{−1}
Hence
P11( Σ11 − Σ12Σ22^{−1}Σ21 ) = I
Hence the matrix Σ11 − Σ12Σ22^{−1}Σ21 is non-singular with inverse P11. Because P is positive definite, P11 is also positive
definite and hence its inverse is positive definite.

42.10 The matrix of regression coefficients, partial covariance and partial correlation coefficients. The
k × ` matrix Σ12 Σ−1 22 is called the matrix of regression coefficients of the k-dimensional vector X1 on the `-
dimensional vector X2 ; it is obtained by multiplying the k × ` matrix cov[X1 , X2 ] = Σ12 by the ` × ` precision
matrix of X2 .
Similarly, the matrix of regression coefficients of X2 on X1 is the ` × k matrix Σ21 Σ−1
11 .

Let D1 denote the variance matrix of the conditional distribution of X1 given X2. Hence D1 is the k × k invertible
matrix Σ11 − Σ12Σ22^{−1}Σ21. Similarly, let D2 denote the ℓ × ℓ invertible matrix Σ22 − Σ21Σ11^{−1}Σ12.
By postmultiplying the following partitioned matrix by the partitioned form of Σ, it is easy to check that
P = Σ^{−1} = [[D1^{−1}, −D1^{−1}Σ12Σ22^{−1}],[−D2^{−1}Σ21Σ11^{−1}, D2^{−1}]]    (42.10a)
The matrix D1, which is the variance of the conditional distribution of X1 given X2, is also called the partial
covariance of X1 given X2. Thus the partial covariance between X1j and X1k given X2 is [D1]jk and the partial
correlation between X1j and X1k given X2 is defined to be
[D1]jk / √( [D1]jj [D1]kk )
This is sometimes denoted ρjk·r1r2...rℓ. For example, ρ13·567 is the partial correlation between X1 and X3 in the
conditional distribution of X1, X2, X3, X4 given X5, X6 and X7.
If X1 and X2 are independent then Σ12 = 0 and hence D1 = Σ11 ; this means that the partial covariance between
X1j and X1k given X2 equals the ordinary covariance cov[X1j , X1k ]; similarly the partial correlation equals the
ordinary correlation. In general these quantities are different and indeed may have different signs.
Example(42.10a). Suppose the 4-dimensional random vector X has the N(µ, Σ) distribution where
µ = (2, 3, 1, 4)^T   and   Σ = [[4, 2, −1, 2],[2, 8, 3, −2],[−1, 3, 5, −4],[2, −2, −4, 4]]
Find ρ13 and ρ13·24 and show that they have opposite signs.
Solution. Now ρ13 = σ13/√(σ11σ33) = −1/√(4 × 5) = −1/√20.
We have X1 = (X1, X3)^T and X2 = (X2, X4)^T; hence
D1 = Σ11 − Σ12Σ22^{−1}Σ21 = [[4, −1],[−1, 5]] − [[2, 2],[3, −4]] [[8, −2],[−2, 4]]^{−1} [[2, 3],[2, −4]] = (1/7)[[12, 4],[4, 6]]
Hence ρ13·24 = 4/√(12 × 6) = 2/√18 = √2/3.
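A numerical sketch of example (42.10a), assuming numpy (values taken from the example): it computes the partial covariance D1 and the partial correlation ρ13·24.

import numpy as np

Sigma = np.array([[ 4,  2, -1,  2],
                  [ 2,  8,  3, -2],
                  [-1,  3,  5, -4],
                  [ 2, -2, -4,  4]], dtype=float)

idx1, idx2 = [0, 2], [1, 3]                     # X1 = (X1, X3), X2 = (X2, X4)
S11 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S22 = Sigma[np.ix_(idx2, idx2)]

D1 = S11 - S12 @ np.linalg.solve(S22, S12.T)    # partial covariance of (X1, X3) given (X2, X4)
rho_13 = Sigma[0, 2] / np.sqrt(Sigma[0, 0] * Sigma[2, 2])
rho_13_24 = D1[0, 1] / np.sqrt(D1[0, 0] * D1[1, 1])
print(rho_13, rho_13_24)                        # ≈ -0.2236 and ≈ 0.4714 (opposite signs)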

42.11 The special case of the conditional distribution of X1 given (X2, . . . , Xn). For this case, k = 1 and
ℓ = n − 1. Hence D1 is 1 × 1. Denote the first row of the precision matrix P by [q11, q12, . . . , q1n]. Then by
equation(42.10a)
[q11, q12, . . . , q1n] = [ D1^{−1}, −D1^{−1}Σ12Σ22^{−1} ]
Hence
D1 = 1/q11   and   Σ12Σ22^{−1} = −(1/q11)[q12, . . . , q1n]
By equation(42.9b) on page 134, the conditional distribution of X1 given (X2, . . . , Xn) = (x2, . . . , xn) is
N( µ1 + Σ12Σ22^{−1}(x2 − µ2), D1 ) = N( µ1 − [q12(x2 − µ2) + ··· + q1n(xn − µn)]/q11, 1/q11 )
The proof of proposition(42.9a) on page 134 also shows that
the unique linear function a2X2 + ··· + anXn which makes X1 − (a2X2 + ··· + anXn) independent of
(X2, . . . , Xn) is given by a2 = −q12/q11, . . . , an = −q1n/q11.
42.12 The joint distribution of X̄ and S². This is a very important result in statistical inference!!
Suppose X1, . . . , Xn are i.i.d. random variables with the N(µ, σ²) distribution. To prove independence of X̄ and
S² we can proceed as follows. Let Y = BX where B is the n × n matrix
B = [[ 1/n,   1/n,   ··· ,  1/n ],
     [ −1/n, 1 − 1/n, −1/n, ··· , −1/n ],
     [ −1/n, −1/n, 1 − 1/n, ··· , −1/n ],
       ⋮
     [ −1/n, −1/n, ··· , −1/n, 1 − 1/n ]]
Hence (Y1, Y2, . . . , Yn) = (X̄, X2 − X̄, . . . , Xn − X̄). Then
BB^T = [[1/n, 0_{1×(n−1)}],[0_{(n−1)×1}, A]]   where A is an (n − 1) × (n − 1) matrix.
Hence X̄ is independent of (Y2, . . . , Yn) = (X2 − X̄, . . . , Xn − X̄). Now (X1 − X̄) + Σ_{k=2}^n Yk = Σ_{k=1}^n (Xk − X̄) = 0
and hence (X1 − X̄)² = ( Σ_{k=2}^n Yk )². Finally (n − 1)S² = Σ_{k=1}^n (Xk − X̄)² = (X1 − X̄)² + Σ_{k=2}^n Yk² = ( Σ_{k=2}^n Yk )² +
Σ_{k=2}^n Yk². Hence X̄ is independent of S². See also exercise 16.20 on page 51.
The following proposition also derives the distribution of S².
Proposition(42.12a). Suppose X1, . . . , Xn are i.i.d. random variables with the N(µ, σ²) distribution. Let
X̄ = (Σ_{k=1}^n Xk)/n   and   S² = Σ_{k=1}^n (Xk − X̄)²/(n − 1)
Then X̄ and S² are independent; also
X̄ ∼ N(µ, σ²/n)   and   (n − 1)S²/σ² ∼ χ²_{n−1}    (42.12a)
Proof. We shall give two proofs.
Method 1. Let Yk = (Xk − µ)/σ for k = 1, . . . , n. Then Ȳ = Σ_{k=1}^n Yk/n = (X̄ − µ)/σ and
(n − 1)S²/σ² = Σ_{k=1}^n (Xk − X̄)²/σ² = Σ_{k=1}^n (Yk − Ȳ)²
Because Y1, . . . , Yn are i.i.d. N(0, 1), we have Y = (Y1, . . . , Yn) ∼ N(0, I).
Consider the transformation from Y = (Y1, . . . , Yn) to Z = (Z1, . . . , Zn) with Z = AY where A is defined as follows.
Z1 = (Y1 + ··· + Yn)/√n = √n Ȳ
Hence the first row of the matrix A is (1/√n, . . . , 1/√n). Construct the other (n − 1) rows of A so that A is orthogonal. For
the explicit value of A, see exercise 43.12 on page 140. Because A is orthogonal, we have AA^T = I and hence
Σ_{k=1}^n Zk² = Z^T Z = Y^T A^T AY = Y^T Y = Σ_{k=1}^n Yk²
Now Y ∼ N(0, I); hence Z is also N(0, I). Since Σ_{k=1}^n Zk² = Σ_{k=1}^n Yk², we have
Σ_{k=2}^n Zk² = Σ_{k=1}^n Yk² − Z1² = Σ_{k=1}^n Yk² − nȲ² = Σ_{k=1}^n (Yk − Ȳ)² = (n − 1)S²/σ²

This proves Z1 = √n Ȳ is independent of S², and hence X̄ is independent of S². It also shows that
(n − 1)S²/σ² = Σ_{k=2}^n Zk² ∼ χ²_{n−1}
Hence the result.
Method 2. This is based on an algebraic trick applied to moment generating functions.
For all t1 ∈ R, . . . , tn ∈ R we have
Σ_{k=1}^n tk(Xk − X̄) = Σ_{k=1}^n tkXk − nX̄ t̄
and hence for all t0 ∈ R we have
t0X̄ + Σ_{k=1}^n tk(Xk − X̄) = Σ_{k=1}^n ( t0/n + tk − t̄ )Xk = Σ_{k=1}^n ckXk
where ck = t0/n + (tk − t̄). Note that Σ_{k=1}^n ck = t0 and Σ_{k=1}^n ck² = t0²/n + Σ_{k=1}^n (tk − t̄)².
Now let t = (t0, t1, . . . , tn); then the moment generating function of the vector Z = (X̄, X1 − X̄, . . . , Xn − X̄) is
E[e^{t·Z}] = E[e^{c1X1 + ··· + cnXn}] = Π_{k=1}^n E[e^{ckXk}] = Π_{k=1}^n exp{ µck + σ²ck²/2 } = exp{ µt0 + σ²t0²/2n } Π_{k=1}^n exp{ σ²(tk − t̄)²/2 }
The first factor is E[e^{t0X̄}]. Hence X̄ is independent of the vector (X1 − X̄, . . . , Xn − X̄) and hence X̄ and S² are
independent.
Using the identity Xk − µ = (Xk − X̄) + (X̄ − µ) gives
(1/σ²) Σ_{k=1}^n (Xk − µ)² = (1/σ²) Σ_{k=1}^n (Xk − X̄)² + [ (X̄ − µ)/(σ/√n) ]² = (n − 1)S²/σ² + [ (X̄ − µ)/(σ/√n) ]²    (42.12b)
The left hand side has the χ²n distribution and the second term on the right hand side has the χ²1 distribution. Using
moment generating functions and the independence of the two terms on the right hand side of equation(42.12b) gives
(n − 1)S²/σ² ∼ χ²_{n−1}.
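A simulation sketch of proposition (42.12a), assuming numpy and scipy with illustrative parameters: X̄ and S² are uncorrelated (in fact independent), and (n − 1)S²/σ² is consistent with the χ²_{n−1} distribution.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, mu, sigma = 8, 5.0, 2.0
X = rng.normal(mu, sigma, size=(50_000, n))

xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)

print(np.corrcoef(xbar, s2)[0, 1])                    # ≈ 0
q = (n - 1) * s2 / sigma**2
print(stats.kstest(q, 'chi2', args=(n - 1,)).pvalue)  # large p-value: consistent with chi^2_{n-1}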

42.13 Daly’s theorem. This can be regarded as a generalization of the result that X and S 2 are independent.
Proposition(42.13a). Suppose X ∼ N (µ, σ 2 I) and the function g : Rn → R is translation invariant: this
means g(x + a1) = g(x) for all x ∈ Rn and all a ∈ R. Then X and g(X) are independent.
Proof. This proof is based on [DALY(1946)]. The density of X = (X1, . . . , Xn) is
fX(x1, . . . , xn) = (1/((2π)^{n/2}σⁿ)) exp{ −(1/2σ²) Σ_k (xk − µk)² }   for (x1, . . . , xn) ∈ Rⁿ.
Hence the moment generating function of ( X̄, g(X) ) is
E[e^{t0X̄ + t1g(X)}] = (1/((2π)^{n/2}σⁿ)) ∫ ··· ∫ e^{(t0/n)Σ_k xk + t1g(x1,...,xn)} e^{−(1/2σ²)Σ_k(xk − µk)²} dx1 ··· dxn
The exponent is
(t0/n)Σ_k xk − (1/2σ²)Σ_k (xk − µk)² + t1g(x1, . . . , xn) = σ²t0²/2n + t0µ̄ − (1/2σ²)Σ_k ( xk − µk − σ²t0/n )² + t1g(x1, . . . , xn)
where µ̄ = Σ_k µk/n. Hence
E[e^{t0X̄ + t1g(X)}] = e^{σ²t0²/2n + t0µ̄} (1/((2π)^{n/2}σⁿ)) ∫_{x1∈R} ··· ∫_{xn∈R} e^{−(1/2σ²)Σ_k(xk − µk − σ²t0/n)² + t1g(x1,...,xn)} dx1 ··· dxn
Using the transformation (y1, . . . , yn) = (x1, . . . , xn) − σ²t0/n gives
E[e^{t0X̄ + t1g(X)}] = e^{σ²t0²/2n + t0µ̄} (1/((2π)^{n/2}σⁿ)) ∫_{y1∈R} ··· ∫_{yn∈R} e^{−(1/2σ²)Σ_k(yk − µk)² + t1g(y1 + σ²t0/n, ..., yn + σ²t0/n)} dy1 ··· dyn
                  = E[e^{t0X̄}] (1/((2π)^{n/2}σⁿ)) ∫_{y1∈R} ··· ∫_{yn∈R} e^{−(1/2σ²)Σ_k(yk − µk)² + t1g(y1, ..., yn)} dy1 ··· dyn
                  = E[e^{t0X̄}] E[e^{t1g(X)}]
where the translation invariance of g is used in the penultimate step. Hence X̄ and g(X) are independent.
Daly’s theorem implies X and S 2 are independent because S 2 is translation invariant. It also implies that the range
Rn = Xn:n − X1:n and X are independent because the range is translation invariant.
Example(42.13b). Suppose X1 ∼ N(µ1, σ²), X2 ∼ N(µ2, σ²), . . . , Xn ∼ N(µn, σ²) are independent. Show that
(n − 1)S²/σ² = Σ_{k=1}^n (Xk − X̄)²/σ² ∼ χ²_{n−1,λ}   where λ = Σ_{k=1}^n (µk − µ̄)²/σ²

Solution. Using the decomposition Xk − µ̄ = (Xk − X̄) + (X̄ − µ̄) gives
Σ_{k=1}^n (Xk − µ̄)²/σ² = Σ_{k=1}^n (Xk − X̄)²/σ² + (X̄ − µ̄)²/(σ²/n)
The second term on the right hand side has the χ²1 distribution and moment generating function 1/(1 − 2t)^{1/2}. The left hand
side has the non-central χ²n distribution with non-centrality parameter λ = Σ_{k=1}^n (µk − µ̄)²/σ²; by equation(23.2b) on page 73
its moment generating function is exp( λt/(1 − 2t) )/(1 − 2t)^{n/2}. By Daly's theorem or exercise 16.20 on page 51, X̄ and
Σ_{k=1}^n (Xk − X̄)² are independent; hence the moment generating function of the first term on the right hand side is
exp( λt/(1 − 2t) ) / (1 − 2t)^{(n−1)/2}

42.14 Converting a singular multivariate normal into a non-singular multivariate normal distribution.
Example(42.14a). Suppose the three dimensional vector X has the N(µ, Σ) distribution where
µ = (2, 1, 5)^T   and   Σ = [[2, 1, 3],[1, 5, 6],[3, 6, 9]]
Find B and µ so that if X = BY + µ then Y ∼ N(0, Ir).
Solution. Note that if x = [−1, −1, 1]^T then x^T Σx = 0; hence X has a singular multivariate normal distribution. Also, if
Q = (1/√13) [[5, 1],[1, 8]]   then   QQ^T = [[2, 1],[1, 5]]
Let
µ = (2, 1, 5)^T   and   B = [[5/√13, 1/√13],[1/√13, 8/√13],[6/√13, 9/√13]]   then   BB^T = Σ
Hence if X = BY + µ then 3Y1 = (8X1 − X2 − 15)/√13 and 3Y2 = (5X2 − X1 − 3)/√13 and Y ∼ N(0, I2).
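The following sketch (numpy assumed, not part of the notes) finds a rank-2 factor B with BB^T = Σ for the singular covariance matrix of example (42.14a) from its eigendecomposition; this B differs from the B in the example, but it gives the same law for X = BY + µ.

import numpy as np

Sigma = np.array([[2.0, 1.0, 3.0],
                  [1.0, 5.0, 6.0],
                  [3.0, 6.0, 9.0]])
mu = np.array([2.0, 1.0, 5.0])

vals, vecs = np.linalg.eigh(Sigma)
keep = vals > 1e-10                          # rank(Sigma) = 2: one eigenvalue is (numerically) zero
B = vecs[:, keep] * np.sqrt(vals[keep])      # any B with B B^T = Sigma works here
print(np.allclose(B @ B.T, Sigma))           # True

rng = np.random.default_rng(4)
Y = rng.standard_normal((100_000, 2))
X = Y @ B.T + mu
print(np.cov(X, rowvar=False))               # ≈ Sigma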

In general, we can proceed as follows:


Proposition(42.14b). Suppose X is an n-dimensional random vector with the N(µ, Σ) distribution where
rank(Σ) = r ≤ n. Then X = BY + µ where Y is an r-dimensional random vector with Y ∼ N(0, Ir) and B is
a real n × r matrix with Σ = BB^T and rank(B) = r.
Proof. If r = n, then the result follows from §38.4 on page 118. So suppose r < n.
Now Σ is a real symmetric non-negative definite matrix with rank(Σ) = r. Hence there exists an orthogonal matrix Q
such that
Σ = Q^T [[D, 0],[0, 0]] Q
where D is an r × r diagonal matrix with the non-zero eigenvalues {λ1, . . . , λr} of Σ on the main diagonal; these values
are all strictly positive. Define the n × n matrix T by
T = [[D^{−1/2}, 0],[0, I_{n−r}]] Q
Then T is non-singular and
TΣT^T = [[Ir, 0],[0, 0]]
Let W = TX; then W ∼ N(Tµ, TΣT^T). Partition W into the r × 1 vector W1 and the (n − r) × 1 vector W2; partition
α = Tµ conformably into α1 and α2. Then W1 ∼ N(α1, Ir) and W2 = α2 with probability 1.
Because T is non-singular, we can write X = T^{−1}W = BW1 + CW2 where B is n × r, C is n × (n − r) and
T^{−1} = [B  C]. Hence
X = BW1 + Cα2 = B(W1 − α1) + Bα1 + Cα2
  = B(W1 − α1) + T^{−1}α
  = BY + µ   where Y = W1 − α1.
Then Y ∼ N(0, Ir) and
Σ = T^{−1} [[Ir, 0],[0, 0]] (T^{−1})^T = [B  C] [[Ir, 0],[0, 0]] [B  C]^T = BB^T
Finally r = rank(Σ) = rank(BB^T) ≤ min{rank(B), rank(B^T)} = rank(B) ≤ r; hence rank(B) = r.

43 Exercises (exs-multivnormal.tex)

43.1 (a) Suppose the random vector X has the N (µ, Σ) distribution. Show that X − µ ∼ N (0, Σ).
(b) Suppose X1 , . . . , Xn are independent with distributions N (µ1 , σ 2 ), . . . , N (µn , σ 2 ) respectively. Show that the
random vector X = (X1 , . . . , Xn )T has the N (µ, σ 2 I) distribution where µ = (µ1 , . . . , µn )T .
(c) Suppose X ∼ N (µ, Σ) where X = (X1 , . . . , Xn )T . Suppose further that X1 , . . . , Xn are uncorrelated. Show that
X1 , . . . , Xn are independent.
(d) Suppose X and Y are independent n-dimensional random vectors with X ∼ N (µX , ΣX ) and Y ∼ N (µY , ΣY ).
Show that X + Y ∼ N (µX + µY , ΣX + ΣY ). [Ans]
43.2 Suppose (X1, X2, X3) has the density
f_{(X1,X2,X3)}(x1, x2, x3) = (1/(2π)^{3/2}) e^{−½(x1² + x2² + x3²)} + (x1x2x3/(2π)^{3/2}) e^{−(x1² + x2² + x3²)}   for x1, x2, x3 ∈ R.
(Note that the maximum of |x3 e^{−½x3²}| occurs at x3 = ±1 and has absolute value which is less than 1. Hence f ≥ 0
everywhere.)
Show that X1, X2 and X3 are pairwise independent but not independent. [Ans]
43.3 Suppose X ∼ N(µ, Σ) where
µ = (−3, 1, 4)^T   and   Σ = [[4, 0, −1],[0, 5, 0],[−1, 0, 2]]
(a) Are (X1, X3) and X2 independent?
(b) Are X1 − X3 and X1 − 3X2 + X3 independent?
(c) Are X1 + X3 and X1 − 2X2 − 3X3 independent? [Ans]
43.4 Suppose the random vector X = (X1, X2, X3) has the multivariate normal distribution N(0, Σ) where
Σ = [[2, 1, −1],[1, 3, 0],[−1, 0, 5]]
(a) Find the distribution of (X3|X1 = 1).
(b) Find the distribution of (X2|X1 + X3 = 1). [Ans]
43.5 From linear regression. Suppose a is an n × m matrix with n ≥ m and rank(a) = m. Hence a has full rank.
(a) Show that the m × m matrix a^T a is invertible.
(b) Suppose the n-dimensional random vector X has the N(µ, σ²I) distribution. Let B = (a^T a)^{−1}a^T (an m × n matrix)
and Y = BX. Show that
Y ∼ N( Bµ, σ²(a^T a)^{−1} )   [Ans]
43.6 Suppose the 5-dimensional random vector Z = (Y, X1, X2, X3, X4) is multivariate normal with finite expectation E[Z] =
(1, 0, 0, 0, 0) and finite variance var[Z] = Σ, where Σ has every diagonal entry equal to 1 and every off-diagonal entry
equal to 1/2.
Show that E[Y |X1, X2, X3, X4] = 1 + (1/5)(X1 + X2 + X3 + X4). [Ans]
43.7 Suppose X = (X1, X2, X3) has a non-singular multivariate normal distribution with E[Xj] = µj and var[Xj] = σj² for
j = 1, 2 and 3. Also
corr[X] = [[1, ρ12, ρ13],[ρ12, 1, ρ23],[ρ13, ρ23, 1]]
(a) Find E[X1|(X2, X3)] and var[X1|(X2, X3)].
(b) Find E[(X1, X2)|X3] and var[(X1, X2)|X3]. [Ans]
43.8 Suppose X1, . . . , Xn are i.i.d. random variables with the N(µ, σ²) distribution. As usual
S² = Σ_{j=1}^n (Xj − X̄)²/(n − 1)
By using the distribution of S², find var[S²]. (See also exercise 39.13 on page 122.) [Ans]

43.9 Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution and j ∈ {1, 2, . . . , n}. Show
that var[Xj |X(j) ] ≤ var[Xj ] where, as usual, X(j) denotes the vector X with Xj removed. [Ans]

43.10 Continuation of proposition(25.7a).


(a) Show that c ≥ 0.
(b) Show that the size variable g1 (X) is independent of every shape vector z(X) iff the n-dimensional vector (1, . . . , 1 )
is an eigenvector of Σ.
(c) Suppose c = 0. Show that (X1 · · · Xn )1/n is almost surely constant. [Ans]

43.11 Suppose X1, . . . , Xn are i.i.d. random variables with the N(µ, σ²) distribution. Consider the transformation from
(x1, . . . , xn) to (y1, . . . , yn) with
y1 = x̄ − µ,  y2 = x2 − x̄,  . . . ,  yn = xn − x̄
This transformation is 1–1 and
| ∂(y1, . . . , yn)/∂(x1, . . . , xn) | = 1/n
Find the density of (Y1, . . . , Yn). [Ans]

43.12 The Helmert matrix of order n.


(a) Consider the n × n matrix A with the following rows:
1 1 1
v1 = √ (1, 1, . . . , 1) v2 = √ (1, −1, 0, . . . , 0) v3 = √ (1, 1, −2, 0, . . . , 0)
n 2 6
and in general for k = 2, 3, . . . , n:
1
vk = √ (1, 1, . . . , 1, −(k − 1), 0, . . . , 0)
k(k − 1)

where the vector vk starts with (k − 1) terms equal to 1/ k(k − 1) and ends with (n − k) terms equal to 0.
Check that A is orthogonal. This matrix is used in §42.12 on page 136.
(b) Consider the 3 vectors
α1 = (a1, a2, a3),   α2 = ( a1, −a1²/a2, 0 ),   α3 = ( a1, a2, −(a1² + a2²)/a3 )
Check the vectors are orthogonal: this means α1 · α2 = α1 · α3 = α2 · α3 = 0.
Hence construct a 3 × 3 matrix which is orthogonal and which has a first row proportional to the vector α1.
(c) Construct an n × n orthogonal matrix which has a first row proportional to the vector α = (a1, a2, . . . , an). [Ans]

43.13 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the N (0, 1) distribution. Suppose a1 , a2 , . . . , an are real
constants with a21 + a22 + · · · + a2n 6= 0.
Find the conditional distribution of X12 + X22 + · · · + Xn2 given a1 X1 + a2 X2 + · · · an Xn = 0.
(Hint. Use part(c) of exercise 43.12). [Ans]

43.14 Suppose X = (X1, . . . , Xn) has the multivariate normal distribution with E[Xj] = µ, var[Xj] = σ² and corr[Xj, Xk] =
ρ^{|j−k|} for all j and k in {1, 2, . . . , n}. Hence X ∼ N(µ, Σ) where
µ = (µ, . . . , µ)^T   and   Σ = σ² [[1, ρ, ρ², ··· , ρ^{n−1}],[ρ, 1, ρ, ··· , ρ^{n−2}], ⋮ ,[ρ^{n−1}, ρ^{n−2}, ρ^{n−3}, ··· , 1]]
Show that the sequence {X1, X2, . . . , Xn} forms a Markov chain. [Ans]

43.15 Suppose X = (X1, . . . , Xn) is an n-dimensional random vector with density
fX(x) = (1/(2π)^{n/2}) [ 1 + Π_{k=1}^n xk e^{−½xk²} ] exp( −½ Σ_{j=1}^n xj² )   for x = (x1, . . . , xn) ∈ Rⁿ.
Using the usual notation, let X^{(ℓ)} denote the vector X with Xℓ removed for ℓ = 1, 2, . . . , n. Thus, for example,
X^{(1)} = (X2, X3, . . . , Xn) and X^{(2)} = (X1, X3, X4, . . . , Xn).
(a) Let gℓ denote the density of X^{(ℓ)}. Find gℓ and hence show that the distribution of X^{(ℓ)} is N(0, I).
(b) Show that X gives an example of a random vector whose distribution is not multivariate normal and whose components
are not independent, yet X^{(ℓ)} has the distribution of (n − 1) independent N(0, 1) variables for all ℓ.
See also [P IERCE &DYKSTRA(1969)]. [Ans]

43.16 See proposition(42.8b) on page 133.
Suppose X is an m-dimensional random vector with X ∼ N(µX, ΣX) and Y is an n-dimensional random vector with
Y ∼ N(µY, ΣY). Suppose further that X and Y are independent. Let Z = (X, Y)^T. Then Z ∼ N(µ, Σ) where
µ = (µX, µY)^T   and   Σ = [[ΣX, 0₁],[0₂, ΣY]]
where 0₁ is an m × n matrix and 0₂ is an n × m matrix with every entry 0. [Ans]

44 Quadratic forms of normal random variables


44.1 Introduction. Quadratic forms were introduced in §38.7 on page 119. In that section, it was shown
that common quantities such as Σ_j xj² and Σ_j (xj − x̄)² can be regarded as quadratic forms. In addition, general
expressions were obtained for the mean and variance of a quadratic form in random variables with an arbitrary
distribution.
We now consider the special case of quadratic forms in normal random variables. Extensive bibliographies of this
topic can be found in [S CAROWSKY(1973)] and [D UMAIS(2000)].
We investigate four problems:
• (A) moment generating functions of quadratic forms;
• (B) the independence of quadratic forms (including Craig’s theorem);
• (C) the distribution of a quadratic form;
• (D) partitioning a quadratic form into independent quadratic forms (including Cochran’s theorem).
(A): Moment generating functions of quadratic forms
44.2 The moment generating function of a quadratic form.
Theorem(44.2a). Suppose A is a real symmetric n × n matrix and X ∼ N(µ, Σ), possibly singular. Then the
m.g.f. of X^T AX is
E[e^{tX^T AX}] = (1/|I − 2tAΣ|^{1/2}) e^{tµ^T(I − 2tAΣ)^{−1}Aµ}    (44.2a)
provided t is sufficiently small that the matrix I − 2tAΣ is positive definite, or equivalently4 provided all the
eigenvalues of the matrix I − 2tAΣ are positive.
Proof. Let r = rank(Σ). Then by proposition(42.14b) on page 138 we can write X = BY + µ where Y ∼ N (0, Ir ) and
B is a real n × r matrix with Σ = BBT .
Now BT AB is a real symmetric r × r matrix and hence there exists an orthogonal r × r matrix P with BT AB = PT DP
where D is an r × r diagonal matrix with diagonal elements {λ1 , . . . , λr } which are the eigenvalues of BT AB.
From pages 545–546 in [H ARVILLE(1997)] we know that if F is an m × n matrix and G is an n × m matrix and qFG (λ)
denotes the characteristic polynomial of the m × m matrix FG and qGF (λ) denotes the characteristic polynomial of the
n × n matrix GF, then qGF (λ) = (−λ)n−m qFG (λ) provided n > m. It follows that the eigenvalues of the n × n matrix
ABBT = AΣ are the same as the eigenvalues of the r × r matrix BT AB together with n − r zeros; this implies they are
{λ1 , . . . , λr , 0, . . . , 0}.
Let Z = PY. Now P is orthogonal and Y ∼ N(0, Ir); hence Z ∼ N(0, Ir). Also
X^T AX = (Y^T B^T + µ^T)A(BY + µ) = Y^T B^T ABY + 2µ^T ABY + µ^T Aµ
       = Y^T P^T DPY + 2µ^T ABP^T Z + µ^T Aµ
       = Z^T DZ + α^T Z + µ^T Aµ   where α is the r × 1 vector 2PB^T Aµ
       = Σ_{j=1}^r (λjZj² + αjZj) + µ^T Aµ
By part(a) of exercise 45.5 on page 156, we know that
E[e^{tλjZj² + tαjZj}] = (1/√(1 − 2tλj)) exp( t²αj²/(2(1 − 2tλj)) )   for t ∈ R and tλj < 1/2.

4
A symmetric matrix is positive definite iff all its eigenvalues are strictly positive—see page 543 of [H ARVILLE(1997)].

Now λj may be negative; hence the condition |t| < 1/|2λj| is sufficient.
Suppose t ∈ R with |t| < min{1/|2λ1|, . . . , 1/|2λr|}. Since Z1, . . . , Zr are independent,
E[ exp{ Σ_{j=1}^r (tλjZj² + tαjZj) } ] = (1/√((1 − 2tλ1)···(1 − 2tλr))) exp{ (t²/2) Σ_{j=1}^r αj²/(1 − 2tλj) }
Now λ is an eigenvalue of AΣ iff 1 − 2tλ is an eigenvalue of I − 2tAΣ. Also, the determinant of a matrix equals the
product of its eigenvalues. Hence |I − 2tAΣ| = (1 − 2tλ1)···(1 − 2tλr). Hence
E[e^{tX^T AX}] = (1/|I − 2tAΣ|^{1/2}) exp{ tµ^T Aµ + (t²/2) α^T(I − 2tD)^{−1}α }    (44.2b)
By straightforward multiplication (I − 2tP^T DP)P^T(I − 2tD)^{−1}P = I and hence
P^T(I − 2tD)^{−1}P = (I − 2tP^T DP)^{−1} = (I − 2tB^T AB)^{−1}    (44.2c)
Replacing α by 2PB^T Aµ in the exponent of equation(44.2b) and using equation(44.2c) shows that
tµ^T Aµ + (t²/2)α^T(I − 2tD)^{−1}α = tµ^T [ I + 2tAB(I − 2tB^T AB)^{−1}B^T ] Aµ
Equation(44.2c) shows that Ir − 2tB^T AB is non-singular; also In − 2tAΣ = In − 2tABB^T is non-singular for our values
of t. By part (b) of exercise 45.5 on page 156, we know that if F and G are n × r matrices such that In − FG^T and
Ir − G^T F are non-singular, then In + F(Ir − G^T F)^{−1}G^T = (In − FG^T)^{−1}. So applying this result with F = 2tAB and
G = B shows that In + 2tAB(Ir − 2tB^T AB)^{−1}B^T = (In − 2tABB^T)^{−1} = (In − 2tAΣ)^{−1} where t ∈ R is such that
|t| < min{1/|2λ1|, . . . , 1/|2λr|}. Hence the result.
Corollary(44.2b). Suppose X1, X2, . . . , Xn are i.i.d. random variables with the N(0, σ²) distribution and A
is a real symmetric n × n matrix with rank(A) = r. Then the m.g.f. of X^T AX/σ² is
E[e^{tX^T AX/σ²}] = 1/|I − 2tA|^{1/2} = 1/√( (1 − 2tλ1)···(1 − 2tλr) )    (44.2d)
where λ1, . . . , λr are the non-zero eigenvalues of A.
Example(44.2c). Suppose X1, X2, X3 and X4 are i.i.d. random variables with the N(0, σ²) distribution. Let Q = X1X2 −
X3X4. Find the distribution of Q/σ².
Solution. Now Q = X^T AX where
A = [[0, 1/2, 0, 0],[1/2, 0, 0, 0],[0, 0, 0, −1/2],[0, 0, −1/2, 0]]
Now rank(A) = 4 and the eigenvalues of A are 1/2, 1/2, −1/2 and −1/2. Hence the m.g.f. of Q/σ² is 1/(1 − t²). Hence Q/σ² has the
bilateral exponential or Laplace distribution; see §27.1 on page 81.
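A quick numerical sketch of example (44.2c), assuming numpy with an illustrative σ: the eigenvalues of A are ±1/2 (each twice), and Q/σ² behaves like a standard Laplace variable (mean 0, variance 2).

import numpy as np

A = np.array([[0,  .5, 0,   0],
              [.5, 0,  0,   0],
              [0,  0,  0, -.5],
              [0,  0, -.5,  0]])
print(np.linalg.eigvalsh(A))          # [-0.5, -0.5, 0.5, 0.5]

rng = np.random.default_rng(5)
sigma = 1.7
X = rng.normal(0, sigma, size=(200_000, 4))
Q = (X[:, 0]*X[:, 1] - X[:, 2]*X[:, 3]) / sigma**2
print(Q.mean(), Q.var())              # ≈ 0 and ≈ 2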

44.3 Alternative expressions for the m.g.f. when the distribution is non-singular. Now suppose the distribution
of X is non-singular; hence the matrix Σ is invertible. Using part of the exponent in equation(44.2a) gives
t(I − 2tAΣ)^{−1}A = Σ^{−1} tΣ(I − 2tAΣ)^{−1}A = ½ Σ^{−1} [ (I − 2tΣA)^{−1} − I ]
where the last equality follows from part(b) of exercise 45.5 on page 156 with F = 2tΣ and G = A = A^T.
Hence an alternative expression for the m.g.f. when the distribution is non-singular is
E[e^{tX^T AX}] = (1/|I − 2tAΣ|^{1/2}) e^{−½µ^T Σ^{−1}[I − (I − 2tΣA)^{−1}]µ}    (44.3a)
Note that
Σ^{−1} − Σ^{−1}(Σ^{−1} − 2tA)^{−1}Σ^{−1} = Σ^{−1} − (I − 2tAΣ)^{−1}Σ^{−1} = [ I − (I − 2tAΣ)^{−1} ] Σ^{−1}
Similarly
Σ^{−1} − Σ^{−1}(Σ^{−1} − 2tA)^{−1}Σ^{−1} = Σ^{−1} − Σ^{−1}(I − 2tΣA)^{−1} = Σ^{−1} [ I − (I − 2tΣA)^{−1} ]
Hence alternative expressions for the m.g.f. when the distribution is non-singular include
E[e^{tX^T AX}] = (1/|I − 2tAΣ|^{1/2}) e^{−½µ^T[I − (I − 2tAΣ)^{−1}]Σ^{−1}µ}    (44.3b)
             = (1/|I − 2tAΣ|^{1/2}) e^{−½µ^T[Σ^{−1} − (Σ − 2tΣAΣ)^{−1}]µ}    (44.3c)

44.4 Other moment generating functions. Suppose A and B are real symmetric n × n matrices and X ∼
N(µ, Σ). Then we can write down the m.g.f. of the 2-dimensional random vector (X^T AX, X^T BX) as follows:
replace A by s1A + s2B in equation(44.2a) and then let t1 = ts1 and t2 = ts2. Hence, provided t1 and t2 are
sufficiently small,
E[e^{t1X^T AX + t2X^T BX}] = exp{ µ^T(I − 2t1AΣ − 2t2BΣ)^{−1}(t1A + t2B)µ } / |I − 2t1AΣ − 2t2BΣ|^{1/2}    (44.4a)
If the distribution is non-singular, then we can use equation(44.3b) or (44.3c) and then
E[e^{t1X^T AX + t2X^T BX}] = exp{ −½µ^T Σ^{−1}µ + ½µ^T(I − 2t1AΣ − 2t2BΣ)^{−1}Σ^{−1}µ } / |I − 2t1AΣ − 2t2BΣ|^{1/2}
In particular, if X ∼ N(0, Σ) then µ = 0 and
E[e^{t1X^T AX + t2X^T BX}] = 1 / |I − 2t1AΣ − 2t2BΣ|^{1/2}
Hence if X ∼ N(0, Σ), then X^T AX and X^T BX are independent iff
|I − 2t1AΣ − 2t2BΣ|^{1/2} = |I − 2t1AΣ|^{1/2} |I − 2t2BΣ|^{1/2}    (44.4b)
Moment generating function of a quadratic expression. Suppose X ∼ N(0, I), A is a real symmetric n × n matrix,
b is an n × 1 real vector and d ∈ R. Suppose further that we want the m.g.f. of the quadratic expression
Q = X^T AX + b^T X + d. First note that if C is an n × n non-singular matrix, x ∈ Rⁿ, b ∈ Rⁿ and t ∈ R then
(x − tC^{−1}b)^T C(x − tC^{−1}b) − t²b^T C^{−1}b = (x^T − tb^T C^{−1})(Cx − tb) − t²b^T C^{−1}b = x^T Cx − 2tb^T x
and hence if C = I − 2tA, we have
x^T x − (x − tC^{−1}b)^T C(x − tC^{−1}b) + t²b^T C^{−1}b = x^T x − x^T Cx + 2tb^T x = 2tx^T Ax + 2tb^T x
and hence
tx^T Ax + tb^T x + td = ½x^T x − ½(x − tC^{−1}b)^T C(x − tC^{−1}b) + td + ½t²b^T C^{−1}b
Hence the m.g.f. of Q is, provided t is sufficiently small that the matrix I − 2tA is positive definite (equivalently,
t is sufficiently small that all its eigenvalues are positive),
E[e^{tX^T AX + tb^T X + td}] = exp{ td + ½t²b^T C^{−1}b } ∫_{x∈Rⁿ} exp[ −½(x − tC^{−1}b)^T C(x − tC^{−1}b) ] / (2π)^{n/2} dx
  = exp{ td + ½t²b^T C^{−1}b } / √det(C)
  = (1/|I − 2tA|^{1/2}) exp{ td + ½t²b^T(I − 2tA)^{−1}b }    (44.4c)

(B) Independence of quadratic forms


44.5 Independence of two normal linear forms. This first result below is the key to the subsequent two results.
Proposition(44.5a).Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distri-
bution. Suppose A is an m1 × n matrix and B is an m2 × n matrix. Then AX and BX are independent iff
AΣBT = 0.
Proof. Let F denote the (m1 + m2) × n matrix obtained by stacking A on top of B, so that F^T = [A^T  B^T].
Then Y = FX is multivariate normal by Corollary(42.6b) on page 132. Using proposition(42.8b) on page 133 shows AX
and BX are independent iff cov[AX, BX] = 0. But cov[AX, BX] = AΣBT by exercise 39.2 on page 121. Hence result.
Similarly suppose Ai is mi × n for i = 1, 2, . . . , k. Then Ai ΣATj = 0 for all i 6= j implies A1 X, . . . , Ak X are
pairwise independent by the previous proposition; box display(42.8b) on page 133 then implies A1 X, . . . , Ak X
are independent.

44.6 Independence of normal quadratic and normal linear form—a simple special case.
Proposition(44.6a). Suppose X is an n-dimensional random vector with the N (µ, Σ) distribution. Suppose
further that A is k × n matrix and B is a symmetric idempotent n × n matrix with AΣB = 0. Then
AX and XT BX are independent.
Proof. We are given AΣB = 0 and B is symmetric; hence AΣBT = 0; hence AX and BX are independent by proposi-
tion(44.5a). Hence AX and (BX)T (BX) are independent. Hence result.
See exercise 45.13 for a variation of this proposition.
The requirement that B is idempotent is very restrictive. The previous proposition is still true without this condition
but the proof is much harder. We shall return to this problem after considering Craig’s theorem.
44.7 Independence of two normal quadratic forms—Craig’s theorem. The special case when both matrices
are idempotent is easy to prove.
Proposition(44.7a). Suppose X is an n-dimensional random vector with the N (µ, Σ) distribution. Suppose A
and B are n × n symmetric idempotent matrices such that AΣB = 0. Then XT AX and XT BX are independent.
Proof. By proposition(44.5a), we know that AX and BX are independent iff AΣBT = 0.
We are given AΣB = 0. Using the fact that B is symmetric gives AΣBT = 0 and hence AX and BX are independent.
Hence (AX)T (AX) and (BX)T (BX) are independent. But A and B are both idempotent; hence result.
See exercise 45.12 for a variation of this proposition. We shall also prove in proposition(44.12a) on page 147 that,
provided µ = 0, then XT AX/σ 2 ∼ χ2r where r = rank(A) and XT BX/σ 2 ∼ χ2s where s = rank(B).
The previous proposition is still true without the assumption of idempotent—but then the proof is much harder.
Theorem(44.7b). Craig’s theorem on the independence of two normal quadratic forms. Suppose X has
the non-singular normal distribution N (µ, Σ) and A and B are real and symmetric. Then
XT AX and XT BX are independent iff AΣB = 0
Proof.
⇒ Omitted because the proof is long and not used in applications. See [R EID &D RISCOLL(1988)] and pages 208–211
in [M ATHAI &P ROVOST(1992)].
⇐ Because Σ is real, symmetric and positive definite, then by §38.4 on page 118, we can write Σ = QQ where Q is
symmetric and nonsingular. Let C = QT AQ and D = QT BQ. Hence CD = QT AQQT BQ = QT AΣBQ = 0, because we
are assuming AΣB = 0. Taking the transpose and using the fact that both C and D are symmetric gives DC = 0.
See pages 559–560 in [H ARVILLE(1997)] for the result that if C and D are both n × n real symmetric matrices, then there
exists an orthogonal matrix P such that both PT CP and PT DP are diagonal iff CD = DC.
Let Y = PT Q−1 X; hence QPY = X.
Now var[Y] = PT Q−1 var[X]Q−1 P = I because var[X] = Σ = QQ and P is orthogonal. So Y1 , . . . , Yn are independent.
Let {λ1, . . . , λn} denote the eigenvalues of C; then X^T AX = Y^T P^T Q^T AQPY = Y^T P^T CPY = Y^T diag[λ1, . . . , λn]Y =
Σ_{k=1}^n λkYk². Similarly, let {µ1, . . . , µn} denote the eigenvalues of D; then X^T BX = Y^T P^T Q^T BQPY = Y^T P^T DPY =
Y^T diag[µ1, . . . , µn]Y = Σ_{k=1}^n µkYk².
Finally, note that CPP^T D = CD = 0; hence P^T CPP^T DP = 0, and hence diag[λ1, . . . , λn]diag[µ1, . . . , µn] = 0. This
implies λkµk = 0 for every k = 1, 2, . . . , n. So partition the set {1, 2, . . . , n} into N1 and N2 such that N1 = {j : λj ≠ 0}
and N2 = {1, 2, . . . , n} − N1. Then j ∈ N1 implies λj ≠ 0 and hence µj = 0; also µj ≠ 0 implies λj = 0 and hence
j ∈ N2. Hence X^T AX depends only on the random variables {Yj : j ∈ N1} and X^T BX depends only on the random
variables {Yj : j ∈ N2}; hence X^T AX and X^T BX are independent and the result is proved.
Zhang has given a slicker proof of this result—see after the proof of proposition (44.8a) below.

Note that Craig’s theorem includes proposition(44.7a) when the distribution is non-singular.

Example(44.7c). Applying Craig's theorem to a decomposition of X^T X. Suppose X1, . . . , Xn are i.i.d. random variables
with the N(µ, σ²) distribution. Let X̄ = Σ_{j=1}^n Xj/n and S² = Σ_{j=1}^n (Xj − X̄)²/(n − 1). Show that X̄ and S² are independent.
Solution. Now X^T X = Σ_{j=1}^n Xj² = Σ_{j=1}^n (Xj − X̄)² + nX̄². Also X^T 1X/n = nX̄², where 1 denotes the n × n matrix with
every entry equal to 1. Hence X^T X = X^T[I − 1/n]X + X^T 1X/n.
By Craig's theorem, X̄ and S² are independent iff AΣB = 0 where A = I − 1/n, Σ = σ²I and B = 1/n. This occurs iff
[I − 1/n]1/n = 0. But 11/n² = 1/n. Hence the result.
Example(44.7d). Suppose X1 , X2 and X3 are i.i.d. random variables with the N (0, σ 2 ) distribution. Let Q1 = X12 +
3X1 X2 + X22 + X1 X3 + X32 and Q2 = X12 − 2X1 X2 + 2X22 /3 − 2X1 X3 − X32 . Are Q1 and Q2 independent?
Solution. Now Q1 = X^T A1 X and Q2 = X^T A2 X where
    A1 = [ 1 3/2 1/2 ; 3/2 1 0 ; 1/2 0 1 ]   and   A2 = [ 1 −1 −1 ; −1 2/3 0 ; −1 0 −1 ]
Clearly A1 A2 ≠ 0; hence by Craig's theorem (44.7b), Q1 and Q2 are not independent.
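The condition in Craig's theorem is easy to check numerically. Here is a minimal sketch, assuming numpy is available (the variable names and values simply restate this example): since Σ = σ²I here, the condition AΣB = 0 reduces to A1 A2 = 0, and the product is visibly non-zero.

    import numpy as np

    # Matrices of example (44.7d); Sigma = sigma^2 I, so Craig's condition reduces to A1 @ A2 = 0.
    A1 = np.array([[1.0, 1.5, 0.5],
                   [1.5, 1.0, 0.0],
                   [0.5, 0.0, 1.0]])
    A2 = np.array([[ 1.0, -1.0, -1.0],
                   [-1.0,  2/3,  0.0],
                   [-1.0,  0.0, -1.0]])
    print(A1 @ A2)                  # not the zero matrix
    print(np.allclose(A1 @ A2, 0))  # False, so Q1 and Q2 are not independent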
Note that the following three propositions are equivalent:
[A] If X has the non-singular n-dimensional normal distribution N (0, I) and A and B are real symmetric n ×
n matrices, then XT AX and XT BX are independent iff AB = 0.
[B] If X has a non-singular n-dimensional normal distribution with var[X] = I and A and B are real symmetric
n × n matrices, then XT AX and XT BX are independent iff AB = 0.
[C] If X has a non-singular n-dimensional normal distribution and A and B are real symmetric n × n matrices,
then XT AX and XT BX are independent iff AΣB = 0 where Σ = var[X].
Clearly [C] ⇒ [B] ⇒ [A].
To prove [B] ⇒ [C]. Suppose X has the non-singular normal distribution N (µ, Σ) and A and B are real and sym-
metric. Let Y = Σ−1/2 X; then Y ∼ N (Σ−1/2 µ, I). Also XT AX = YT Σ1/2 AΣ1/2 Y and XT BX = YT Σ1/2 BΣ1/2 Y.
If XT AX and XT BX are independent, then [B] implies Σ1/2 AΣ1/2 Σ1/2 BΣ1/2 = 0 and hence AΣB = 0. Con-
versely, if AΣB = 0 then Σ1/2 AΣ1/2 Σ1/2 BΣ1/2 = 0 and hence XT AX and XT BX are independent.
To prove [A] ⇒ [B]. Suppose X has the non-singular normal distribution N (µ, I) and C and D are real and
symmetric. We need to prove that XT CX and XT DX are independent iff CD = 0.
Let Y = X−µ; then Y ∼ N (0, I). Also XT CX = (YT +µT )C(Y+µ) and XT DX = (YT +µT )D(Y+µ). Consider the
quadratic expression Q = αXT CX+XT DX = YT (αC+D)Y+2µT (αC+D)Y+µT (αC+D)µ. Using equation(44.4c)
where A is replaced by αC + D and b by 2(αC + D)µ and d by µT (αC + D)µ gives
    E[e^{tQ}] = exp( tµ^T(αC + D)µ + 2t² µ^T(αC + D)(I − 2tαC − 2tD)^{−1}(αC + D)µ ) / |I − 2tαC − 2tD|^{1/2}
Using
    4t²(αC + D)(I − 2tαC − 2tD)^{−1}(αC + D) = 2t(αC + D) [ Σ_{k=0}^∞ [2t(αC + D)]^k ] 2t(αC + D) = Σ_{k=2}^∞ [2t(αC + D)]^k
gives
    E[e^{tQ}] = (1/|I − 2tαC − 2tD|^{1/2}) exp( (1/2) µ^T Σ_{k=1}^∞ [2t(αC + D)]^k µ )
Now if CD = 0 then Σ_{k=1}^∞ [2t(αC + D)]^k = Σ_{k=1}^∞ (2tαC)^k + Σ_{k=1}^∞ (2tD)^k. Also by [A] we have Y^T CY and Y^T DY are independent; hence |I − 2tαC − 2tD| = |I − 2tαC| |I − 2tD|. Hence the m.g.f. factorizes:
    E[e^{tQ}] = (1/|I − 2tαC|^{1/2}) exp( (1/2) µ^T Σ_{k=1}^∞ (2tαC)^k µ ) × (1/|I − 2tD|^{1/2}) exp( (1/2) µ^T Σ_{k=1}^∞ (2tD)^k µ )
so that
    E[e^{αtX^T CX + tX^T DX}] = E[e^{αtX^T CX}] E[e^{tX^T DX}]
and this is equivalent to
    E[e^{t1 X^T CX + t2 X^T DX}] = E[e^{t1 X^T CX}] E[e^{t2 X^T DX}]
and hence X^T CX and X^T DX are independent.
The proof that XT CX and XT DX are independent implies CD = 0 is harder—see page 315 in [P ROVOST(1996)].
The equivalence of [A], [B] and [C] shows that if we wish to prove Craig’s theorem (which is just [C]), it is
sufficient to prove [A].
44.8 Independence of normal quadratic and normal linear form: the general non-singular case.
Proposition(44.8a). Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribu-
tion. Suppose further that A is k × n matrix and B is a symmetric n × n matrix. Then
AX and XT BX are independent iff AΣB = 0
Proof.
⇒ We are given AX and XT BX are independent; hence XT AT AX = (AX)T (AX) and XT BX are independent. By Craig’s
theorem, AT AΣB = 0. Premultiplying by A gives (AAT )T AΣB = 0.
Suppose A has full row rank; then by exercise 45.15 on page 156, we know that AAT is non-singular and hence AΣB = 0.
Suppose rank(A) = r < k. Let A1 denote the r × n matrix of r independent rows of A and let A2 denote the (k − r) × n matrix of the other k − r rows of A. Hence A2 = CA1 for some (k − r) × r matrix C. Consider the matrix A∗ where
    A∗ = [ A1 ; A2 ]  (A1 stacked on top of A2)
So A∗ can be obtained from A by an appropriate permutation of the rows. Now A1 X = f (AX) for some function f ; hence
A1 X and XT BX are independent. By the full row rank case we have A1 ΣB = 0. Hence A2 ΣB = CA1 ΣB = 0. Hence
A∗ ΣB = 0. Hence AΣB = 0.
⇐ The proof is similar to that of Craig’s theorem—see exercise 45.11 on page 156.
Here is another5 slicker proof of proposition(44.8a) due to [Z HANG(2017)] which uses generalized inverses:
⇐ We are given AΣB = 0. Hence cov[AX, BX] = AΣBT = AΣB = 0; thus AX is independent
of BX. Let B− be a generalized inverse of B; hence B = BB− B and hence XT BX = XT BB− BX =
(BX)T B− (BX) = f (BX). Because AX is independent of BX, it follows that AX is independent of
f (BX) = XT BX. Hence result.
⇒ We are given AX and XT BX are independent; hence XT AT AX and XT BX are independent; hence
by Craig’s theorem we have AT AΣB = 0. Now use the equality A = A(AT A)− AT A to get AΣB =
A(AT A)− AT AΣB = 0. Hence result.
Clearly, the Zhang proof of ⇐ also shows that AΣB = 0 implies XT AX and XT BX are independent, and this is
the ⇐ part of Craig’s theorem—proposition (44.7a) above.
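Zhang's step X^T BX = (BX)^T B^−(BX) uses only the defining property B = BB^−B of a generalized inverse. The following sketch (assuming numpy; the Moore–Penrose pseudo-inverse returned by np.linalg.pinv is one valid choice of B^−) illustrates the identity for a singular symmetric B.

    import numpy as np

    rng = np.random.default_rng(0)
    L = rng.standard_normal((4, 2))
    B = L @ L.T                        # symmetric, rank 2, hence singular
    Bg = np.linalg.pinv(B)             # one generalized inverse of B
    x = rng.standard_normal(4)

    print(np.allclose(B, B @ Bg @ B))                      # True: B = B B^- B
    print(np.isclose(x @ B @ x, (B @ x) @ Bg @ (B @ x)))   # True: x^T B x is a function of Bx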
The following example shows we can use this proposition to prove the standard result about the independence of
S 2 and X.
Example(44.8b). Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution.
Let X̄ = Σ_{j=1}^n Xj/n and S² = Σ_{j=1}^n (Xj − X̄)²/(n − 1). Show that X̄ and S² are independent.
Solution. Now X̄ = a^T X where a^T = (1, 1, . . . , 1)/n. Also by example(38.7d) on page 119, we have (n − 1)S² = X^T(I −
1/n)X where 1 is an n × n matrix with every entry equal to 1. Finally Σ = σ 2 I. Using the notation of proposition(44.8a) we
have AΣB = σ 2 aT (I − 1/n) = 0. Hence result.
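A numerical sketch of this example, assuming numpy is available (the sample size and seed are arbitrary choices): the condition AΣB = 0 can be checked directly, and simulation shows the sample correlation between X̄ and S² is essentially zero.

    import numpy as np

    n, mu, sigma = 5, 2.0, 3.0
    a = np.ones(n) / n                      # Xbar = a^T X
    B = np.eye(n) - np.ones((n, n)) / n     # (n-1) S^2 = X^T B X
    Sigma = sigma**2 * np.eye(n)
    print(np.allclose(a @ Sigma @ B, 0))    # True: A Sigma B = 0

    rng = np.random.default_rng(1)
    X = mu + sigma * rng.standard_normal((100_000, n))
    xbar = X.mean(axis=1)
    s2 = X.var(axis=1, ddof=1)
    print(np.corrcoef(xbar, s2)[0, 1])      # approximately 0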
44.9 Craig’s theorem for the singular case.
Proposition(44.9a). Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) dis-
tribution. Suppose further that A and B are n × n real symmetric matrices. Then XT AX and XT BX are
independent iff ΣAΣBΣ = 0, ΣAΣBµ = 0, ΣBΣAµ = 0, and µT AΣBµ = 0.
Proof. See [D RISCOLL &K RASNICKA(1995)].

(C) The distribution of a quadratic form


44.10 The variance of a quadratic form. A general result for the variance of a quadratic form was obtained
in §38.9 on page 120. See exercise 45.21 on page 157 for an expression for the variance of a quadratic form in
normal random variables.
44.11 Distribution of XT Σ−1 X. Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ)
distribution. By proposition(42.7a) on page 132, we know there exists a non-singular matrix Q such that Q^T Q = Σ^{−1} and Z = Q(X − µ) ∼ N(0, I). Hence Z^T Z = Σ_{i=1}^n Zi² ∼ χ²_n and so
    (X − µ)^T Q^T Q(X − µ) ∼ χ²_n
Summarizing:
if X ∼ N (µ, Σ) then (X − µ)T Σ−1 (X − µ) ∼ χ2n
This result for the bivariate case was given in exercise 41.21 on page 129. Here is a generalization:
Proposition(44.11a). Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribu-
tion. Then the random variable XT Σ−1 X has the non-central χ2n distribution with non-centrality parameter
λ = µT Σ−1 µ.
Proof. Take Z = QX where Σ−1 = QT Q. Then Z ∼ N (µZ , I) where µZ = Qµ. Also
XT Σ−1 X = XT QT QX = ZT Z = Z12 + · · · + Zn2
Hence, by Chapter2:§23.2 on page 72, XT Σ−1 X has the non-central χ2n distribution with non-centrality parameter
µTZ µZ = µT Σ−1 µ.
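A simulation sketch of proposition (44.11a), assuming numpy and scipy are available (the particular µ, Σ and sample size are illustrative): draw X ∼ N(µ, Σ), form X^T Σ^{−1} X and compare with the non-central χ²_n distribution with non-centrality parameter µ^T Σ^{−1} µ.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    mu = np.array([1.0, -0.5, 2.0])
    M = rng.standard_normal((3, 3))
    Sigma = M @ M.T + 3 * np.eye(3)            # a non-singular covariance matrix
    Sinv = np.linalg.inv(Sigma)

    X = rng.multivariate_normal(mu, Sigma, size=200_000)
    q = np.einsum('ij,jk,ik->i', X, Sinv, X)   # X^T Sigma^{-1} X for every draw
    lam = mu @ Sinv @ mu                       # non-centrality parameter

    print(q.mean(), 3 + lam)                                  # mean of chi^2_{3,lam} is 3 + lam
    print(stats.kstest(q, stats.ncx2(3, lam).cdf).statistic)  # small KS distance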

5
Yet another proof can be found in [S EARLE(1971)]. His proof of ⇐ is based on expressing B as LLT where L has full
column rank. This is possible but L will be complex unless B is non-negative definite. Hence the proof involves using the
complex multivariate normal which we have not considered—and nor does Searle. The proof of ⇒ in Searle is clearly false.
44.12 Distribution of XT CX when C is idempotent. Properties of idempotent matrices can be found in many
places; for example [H ARVILLE(1997)].
Proposition(44.12a).Suppose X1 , . . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution. Suppose
C is an n × n symmetric idempotent matrix with rank r (and so trace(C) = r). Then
XT CX
∼ χ2r
σ2
Proof. Because C is real and symmetric, it is orthogonally diagonalizable: this means we can write C = LDLT where
D = diag(d1 , . . . , dn ) is diagonal and L is orthogonal—see §38.4 on page 118. Now C2 = LDLT LDLT = LD2 LT and
C = LDLT . Also C is idempotent—this means that C2 = C. Hence D2 = D. This implies every dj is either 0 or 1 and
hence all the eigenvalues of C are either 0 or 1. It follows, after possibly rearranging the columns of L, that
    C = L [ I_r 0 ; 0 0 ] L^T        (44.12a)
where the submatrix Ir is the r × r identity matrix and r = rank(C).
Let Z = LT X; then Z ∼ N (0, σ 2 In ). Also
    X^T CX/σ² = Z^T L^T CLZ/σ² = (1/σ²) Z^T [ I_r 0 ; 0 0 ] Z = (Σ_{k=1}^r Zk²)/σ² ∼ χ²_r   as required.

See exercise 45.18 on page 157 for a generalization of this proposition.


44.13 Conditions for a quadratic form in non-singular normal variables to have a χ2 distribution.
Theorem(44.13a). Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution
and A is a real n × n symmetric matrix. Suppose AΣ is idempotent with rank(AΣ) = r; then XT AX has the
non-central χ2r distribution with non-centrality parameter µT Aµ.
Proof. Here are two proofs—both are instructive.
Proof 1. Now Σ is a real symmetric positive definite matrix. Hence there exists a non-singular Q such that Σ = QQT .
Let A1 = QT AQ. Now AΣAΣ = AΣ and Σ is non-singular; hence AΣA = A. Hence A21 = QT AΣAQ = A1 and
rank(A1 ) = rank(A) = rank(AΣ) = r because Q, QT and Σ are all non-singular. Because A1 is a real symmetric
idempotent matrix with rank r, there exists an orthogonal matrix P such that
 
    A1 = P [ I_r 0 ; 0 0 ] P^T        (44.13a)
Define P1 to be the n × r matrix and P2 to be the n × (n − r) matrix so that P = [ P1 P2 ]; hence A1 = P1 PT1 . From
equation(44.13a) we have
    P^T A1 = [ I_r 0 ; 0 0 ] P^T = [ P1^T ; 0 ]   and hence   [ I_r 0 ; 0 0 ] = P^T A1 P = [ P1^T ; 0 ] [ P1 P2 ] = [ P1^T P1  P1^T P2 ; 0  0 ]
Hence PT1 P1 = Ir .
Let Z = PT1 Q−1 X. Then var[Z] = PT1 Q−1 Σ(Q−1 )T P1 = PT1 Q−1 QQT (Q−1 )T P1 = Ir and hence Z ∼ N (PT1 Q−1 µ, Ir ).
Also ZT Z = XT (Q−1 )T P1 PT1 Q−1 X = XT (Q−1 )T A1 Q−1 X = XT AX. Hence XT AX has a non-central χ2r distribution with
non centrality parameter µT (Q−1 )T P1 PT1 Q−1 µ = µT Aµ.
Proof 2. This proof is based on expanding the moment generating function in equation(44.3b).
Now for |t| < min{1/|2λ1|, . . . , 1/|2λr|} we have
    [ I − (I − 2tAΣ)^{−1} ] Σ^{−1} = −[ Σ_{j=1}^∞ (2t)^j (AΣ)^j ] Σ^{−1} = −(2t/(1 − 2t)) A

because AΣ is idempotent. Also, as in the proof of proposition (44.2a) on page 141, we have |I − 2tAΣ| = (1 −
2tλ1 ) · · · (1 − 2tλr ) = (1 − 2t)r where r = rank(AΣ) = rank(A), because we are assuming Σ is non-singular. Substituting
into equation(44.3b) shows that for −1/2 < t < 1/2 we have
    E[e^{tX^T AX}] = (1/(1 − 2t)^{r/2}) exp( λt/(1 − 2t) )   where λ = µ^T Aµ.
By equation(23.2b), the m.g.f. of the non-central χ2 distribution with n degrees of freedom and non-centrality parameter λ
is
    (1/(1 − 2t)^{n/2}) exp( λt/(1 − 2t) )   for t < 1/2.
Hence XT AX has a non-central χ2 distribution with r degrees of freedom and non-centrality parameter µT Aµ.
The converse is also true:


Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution and A is a real
n×n symmetric matrix. Suppose further that XT AX has the non-central χ2m distribution with non-centrality
parameter λ. Then AΣ is idempotent, λ = µT Aµ and m = rank(A).
A proof of this result can be found in [D RISCOLL(1999)].
Here are two special cases:
• Suppose X ∼ N (0, In ) and A is a real symmetric n × n matrix. Then XT AX ∼ χ2r iff A is idempotent with
rank(A) = r.
• Suppose X ∼ N (µ, σ 2 In ) and A is a real symmetric n × n matrix. Then XT AX/σ 2 has a non-central χ2r
distribution with non-centrality parameter µT Aµ/σ 2 iff A is idempotent with rank(A) = r.
Hence this new proposition includes proposition(44.12a) as a special case.
Example(44.13b). Suppose X1 , X2 and X3 are i.i.d. random variables with the N (µ, σ 2 ) distribution. Suppose further that
    A = [ 1/2 0 1/2 ; 0 1 0 ; 1/2 0 1/2 ]

Find the distribution of the quadratic form XT AX/σ 2 .


Solution. Clearly A is real symmetric and idempotent with rank(A) = 2. Finally, µT Aµ/σ 2 = 3µ2 /σ 2 . Hence XT AX/σ 2
has the non-central χ22 distribution with non-centrality parameter 3µ2 /σ 2 .
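A sketch verifying this example numerically, assuming numpy and scipy are available (the values of µ and σ are arbitrary): A is idempotent with rank 2, so X^T AX/σ² should follow the non-central χ²_2 distribution with non-centrality parameter 3µ²/σ².

    import numpy as np
    from scipy import stats

    A = np.array([[0.5, 0.0, 0.5],
                  [0.0, 1.0, 0.0],
                  [0.5, 0.0, 0.5]])
    print(np.allclose(A @ A, A), np.linalg.matrix_rank(A))    # True 2: A is idempotent with rank 2

    mu, sigma = 1.5, 2.0
    m = mu * np.ones(3)
    lam = m @ A @ m / sigma**2                                # = 3 mu^2 / sigma^2
    rng = np.random.default_rng(3)
    X = m + sigma * rng.standard_normal((200_000, 3))
    q = np.einsum('ij,jk,ik->i', X, A, X) / sigma**2
    print(q.mean(), 2 + lam)                                  # mean of chi^2_{2,lam} is 2 + lam
    print(stats.kstest(q, stats.ncx2(2, lam).cdf).statistic)  # small KS distance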
44.14 Conditions for a quadratic form in possibly singular normal variables to have a χ2 distribution. The
requirement that X has a non-singular distribution is essential in proposition (44.13a)—here is an example which
demonstrates this.
Example(44.14a). Suppose Z ∼ N (0, 1), A = I2 and XT = ( Z 1 ). Show that X has a singular multivariate normal
distribution, AΣ is idempotent but XT AX does not have a non-central χ2 distribution.
Solution. We have X ∼ N(µ, Σ) where µ = (0, 1)^T and Σ = [ 1 0 ; 0 0 ].
Then (AΣ)² = AΣ, rank(AΣ) = 1 and X^T AX = Z² + 1. Now Z² ∼ χ²_1. Hence E[e^{tX^T AX}] = e^t/(1 − 2t)^{1/2}, which is not the m.g.f. of a non-central χ² distribution.
Now for the conditions which are necessary and sufficient for a quadratic form in possibly singular normal vari-
ables to have a χ2 distribution.
Theorem(44.14b). Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distri-
bution and A is a real n × n symmetric matrix. Suppose further that
ΣAΣAΣ = ΣAΣ and µT (AΣ)2 = µT AΣ and µT AΣAµ = µT Aµ
then XT AX has the non-central χ2r distribution with non-centrality parameter α where r = trace(AΣ) and
α = µT Aµ.
Proof. We use moment generating functions. From equation(23.2b) on page 73 we know that if W has the non-central
χ2r distribution with non-centrality parameter α then
 
    E[e^{tW}] = (1/(1 − 2t)^{r/2}) exp( αt/(1 − 2t) )   for t < 1/2.
Also, by equation(44.2a) we have the moment generating function of XT AX:
    E[e^{tX^T AX}] = (1/|I − 2tAΣ|^{1/2}) exp( tµ^T(I − 2tAΣ)^{−1}Aµ )
The first assumption is that ΣAΣAΣ = ΣAΣ; by part(a) of exercise 45.16 this implies (ΣA)3 = (ΣA)2 . This implies6
every eigenvalue of the matrix ΣA is either 0 or 1. Hence every eigenvalue of the transpose AΣ is either 0 or 1.
Now r = trace(AΣ); hence the number of non-zero eigenvalues of AΣ equals r. Using the fact that the determinant
equals the product of the eigenvalues and λ is an eigenvalue of AΣ iff 1 − 2tλ is an eigenvalue of I − 2tAΣ, we get
|I − 2tAΣ|1/2 = (1 − 2t)r/2 .
We have by induction µ^T(AΣ)^k = µ^T(AΣ) for k ∈ {1, 2, . . .}. Hence µ^T(AΣ)^k Aµ = µ^T Aµ = α for k ∈ {1, 2, . . .}.
Now it is well known that if the spectral radius of the matrix F is less than 1 then the geometric series Σ_k F^k converges and (I − F)^{−1} = Σ_{k=0}^∞ F^k; but the eigenvalues of the matrix 2tAΣ are either 2t or 0. Hence for −1/2 < t < 1/2 we can expand (I − 2tAΣ)^{−1} and get
    µ^T(I − 2tAΣ)^{−1} Aµ = Σ_{k=0}^∞ (2t)^k µ^T(AΣ)^k Aµ = α/(1 − 2t)

6
Now ΣAx = λx implies (ΣA)2 x = λΣAx = λ2 x and (ΣA)3 x = λ3 x. Hence λ2 = λ3 and hence λ = 0 or 1.
Hence the proposition.


It can be shown that the conditions specified in the previous proposition are necessary and sufficient: see pages
197–201 in [M ATHAI &P ROVOST(1992)]

(D) Partitioning a quadratic form into independent pieces


44.15 Cochran’s theorem. Cochran’s theorem is frequently used in the analysis of variance and is con-
cerned with partitioning a sum of squares into independent pieces. The simplest case is as follows. Suppose
X1 , X2 , . . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution. Then
    Σ_{j=1}^n Xj²/σ² = Σ_{j=1}^n (Xj − X̄)²/σ² + nX̄²/σ²
By §42.12 on page 136 we know that Σ_{j=1}^n (Xj − X̄)² and nX̄² are independent. Hence
    Σ_{j=1}^n Xj²/σ² ∼ χ²_n   is partitioned into   Σ_{j=1}^n (Xj − X̄)²/σ² = (n − 1)S²/σ² ∼ χ²_{n−1}   and   nX̄²/σ² ∼ χ²_1
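A simulation sketch of this partition, assuming numpy is available (sample size and seed are arbitrary): the two pieces satisfy the algebraic identity exactly and behave like independent χ²_{n−1} and χ²_1 variables.

    import numpy as np

    rng = np.random.default_rng(4)
    n, sigma = 6, 2.0
    X = sigma * rng.standard_normal((100_000, n))               # N(0, sigma^2) samples
    xbar = X.mean(axis=1)

    total = (X**2).sum(axis=1) / sigma**2                       # ~ chi^2_n
    within = ((X - xbar[:, None])**2).sum(axis=1) / sigma**2    # ~ chi^2_{n-1}
    between = n * xbar**2 / sigma**2                            # ~ chi^2_1

    print(np.allclose(total, within + between))    # the algebraic identity
    print(within.mean(), between.mean())           # approximately n-1 and 1
    print(np.corrcoef(within, between)[0, 1])      # approximately 0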
44.16 Matrix results when the sum is the identity matrix. The proof of Cochran’s theorem has led to several
results about matrices. We start with a version where the sum of the matrices is the identity matrix.
Proposition(44.16a). Suppose A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = In . Let
rj = rank(Aj ) for j = 1, 2, . . . , k.
Then the following 3 statements are equivalent:
(1) r1 + · · · + rk = n (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j
Proof.
(3) ⇒ (2) Multiply the equation A1 + · · · + Ak = In by Aj . This gives A2j = Aj .
(2) ⇒ (1) We need the following two general results: trace(A + B) = trace(A) + trace(B), and rank(B) = trace(B) for
every idempotent matrix B. Hence n = trace(I_n) = Σ_{j=1}^k trace(Aj) = Σ_{j=1}^k rank(Aj).
(1) ⇒ (3) We use the rank factorization theorem for matrices: suppose A is an m × n matrix with rank(A) = r where
r 6= 0. Then there exists an m × r matrix B and an r × n matrix C such that A = BC and rank(B) = rank(C) = r. See
page 38 in [H ARVILLE(1997)].
The rank factorization theorem implies that for every j = 1, 2, . . . , k, there exists an n × rj matrix Bj and an rj × n
matrix Cj such that Aj = Bj Cj . Also r1 + r2 + · · · + rk = n. Let B and C denote the n × n matrices defined by
    B = [ B1 B2 · · · Bk ]   and   C = [ C1 ; C2 ; . . . ; Ck ]  (the Cj stacked one above the other)
Then BC = B1 C1 + · · · Bk Ck = A1 + · · · + Ak = In . Thus C is the inverse of the matrix B and hence CB = In . Hence
Cj B` = 0 for j 6= `. Hence for j 6= ` we have Aj A` = Bj Cj B` C` = 0, as required.
See also pages 434–439 of [H ARVILLE(1997)].
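A numerical sketch of the proposition for one concrete decomposition of I_n into orthogonal projections (assuming numpy; the dimensions are arbitrary): all three statements hold simultaneously, as asserted.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 6
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
    A1 = Q[:, :2] @ Q[:, :2].T        # projection onto a 2-dimensional subspace
    A2 = Q[:, 2:5] @ Q[:, 2:5].T      # projection onto an orthogonal 3-dimensional subspace
    A3 = Q[:, 5:] @ Q[:, 5:].T        # projection onto the remaining 1-dimensional subspace

    print(np.allclose(A1 + A2 + A3, np.eye(n)))                 # the Aj sum to I_n
    ranks = [np.linalg.matrix_rank(A) for A in (A1, A2, A3)]
    print(ranks, sum(ranks) == n)                               # statement (1)
    print(all(np.allclose(A @ A, A) for A in (A1, A2, A3)))     # statement (2)
    print(np.allclose(A1 @ A2, 0) and np.allclose(A1 @ A3, 0) and np.allclose(A2 @ A3, 0))  # statement (3)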
Now for a more sophisticated result which is, effectively, an algebraic version of Cochran’s theorem.
Proposition(44.16b). Suppose A1 , . . . , Ak are symmetric non-negative definite n × n matrices with A1 + · · · +
Ak = In . Let rj = rank(Aj ) for j = 1, 2, . . . , k. Suppose further that r1 + · · · + rk = n.
Then there exists an n × n orthogonal matrix L such that for any x ∈ Rn , if y = Lx then
    x^T A1 x = y1² + · · · + y_{r1}²
    x^T A2 x = y_{r1+1}² + · · · + y_{r1+r2}²
    · · · · · ·
    x^T Ak x = y_{r1+···+r_{k−1}+1}² + · · · + yn²        (44.16a)
Also A1 , . . . , Ak are all idempotent and Ai Aj = 0 for i 6= j.
Proof. Because A1 is a real symmetric n × n matrix, there exists an orthogonal matrix L1 such that A1 = L1 D1 LT1 where
D1 = diag(d1 , . . . , dn ) is diagonal with entries which are the eigenvalues of A1 —see §38.4 on page 118. Because A1 is
non-negative definite, we must have d1 ≥ 0, . . . , dn ≥ 0. Now rank(A1 ) = r1 ; hence r1 of the eigenvalues are non-zero
and n − r1 are zero. Without loss of generality, we can assume d1 > 0, . . . , dr1 > 0 and dr1 +1 = · · · = dn = 0.
Let B = A2 + · · · + Ak Now rank(C + D) ≤ rank(C) + rank(D) for all matrices C and D with the same dimensions; hence
n = rank(In ) = rank(A1 + · · · + Ak ) ≤ rank(A1 ) + rank(B) ≤ r1 + · · · + rk = n
Hence all these terms are equal and rank(B) = r2 + · · · + rk .


Now let y = L1^T x. Then Σ_{j=1}^n yj² = y^T y = x^T x = x^T A1 x + x^T Bx = y^T D1 y + x^T Bx = Σ_{j=1}^{r1} dj yj² + x^T Bx.
Hence Σ_{j=1}^{r1} (1 − dj) yj² + Σ_{j=r1+1}^n yj² = x^T Bx = y^T L1^T BL1 y. Because L1 and L1^T are non-singular, we have rank(L1^T BL1) = rank(B) = r2 + · · · + rk = n − r1. Hence d1 = · · · = d_{r1} = 1.
Similarly, for j = 1, 2, . . . , k, there exist orthogonal Lj with Aj = Lj Dj Lj^T where
    Dj = [ I_{rj} 0 ; 0 0 ]   and I_{rj} is the rj × rj identity matrix.
Let Mj consist of the first rj columns of Lj and Nj the remaining n − rj columns; hence Lj = [ Mj  Nj ] where Mj is n × rj and Nj is n × (n − rj).
Clearly
Aj = Lj Dj LTj = Mj Irj MTj = Mj MTj (44.16b)
Now define the n × n matrix M as follows
M = [ M1 M2 · · · Mk ]
Hence
MMT = M1 MT1 + · · · + Mk MTk = A1 + · · · + Ak = In
Thus M is an orthogonal matrix and MT M = In . Hence MTj Mj = Irj and MTj M` = 0 for j 6= `.
Hence MTj Aj M` = MTj Mj MTj M` = 0, and MTj A` M` = MTj M` MT` M` = 0; of course MTj Aj Mj = Irj . Hence
    M^T A1 M = [ I_{r1} 0 ; 0 0 ]   and   M^T A2 M = [ 0 0 0 ; 0 I_{r2} 0 ; 0 0 0 ]   and so on.
Let L = MT and then if y = Lx equations(44.16a) follow immediately.
Using the representation Aj = Mj MTj in equation(44.16b) shows that Aj is idempotent and Aj A` = 0 for j 6= `.
Here is an alternative method for proving the first part of the theorem which is quite nice.
Now λ is an eigenvalue of A1 iff |A1 − λI| = 0. Because B = I − A1 , it follows that λ is an eigenvalue of A1 iff
|B − (1 − λ)I| = 0. Hence λ is an eigenvalue of A1 iff 1 − λ is an eigenvalue of B. Now A1 is a real symmetric matrix;
hence r1 , the rank of A1 , equals7 the number of non-zero eigenvalues of A1 . So A1 has n − r1 eigenvalues equal to 0
and hence B has n − r1 eigenvalues equal to 1. Because B has rank n − r1 and is symmetric, it follows that B has
n − r1 eigenvalues equal to 1 and r1 eigenvalues equal to 0. Similarly A1 has r1 eigenvalues equal to 1 and n − r1
eigenvalues equal to 0.

44.17 Partitioning a quadratic form into independent pieces—Cochran’s theorem. Converting proposi-
tion(44.16a) into a result about random variables gives the following.
Theorem(44.17a). Suppose the n-dimensional random vector X has the N (µ, σ 2 I) distribution and
XT X = XT A1 X + · · · + XT Ak X
where A1 , . . . , Ak are symmetric matrices with ranks r1 , . . . , rk . Consider the following three statements:
(1) r1 + · · · + rk = n (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j
If any one of these statements is true then all three are true and X^T A1 X, . . . , X^T Ak X are independent and
    X^T Aj X/σ² ∼ χ²_{rj, µ^T Aj µ/σ²} for j = 1, 2, . . . , k,   and   X^T X/σ² ∼ χ²_{n, µ^T µ/σ²}
Proof. Proposition(44.16a) shows that if any one of these statements is true then all three are true. By proposition(44.5a)
on page 143 we know that A1 X, . . . , Ak X are pairwise independent. The result in box display(42.8b) on page 133
shows that A1 X, . . . , Ak X are independent. Now (A1 X)T (A1 X) = XT A1 X because A1 is idempotent. It follows that
XT A1 X, . . . , XT Ak X are independent. Proposition (44.13a) on page 147 implies XT Aj X/σ 2 ∼ χ2rj ,µT Aj µ/σ2 for j =
1, 2, . . . , k. Hence result.

7
For a symmetric matrix, the rank equals the number of non-zero eigenvalues. This follows from the fact that we can diago-
nalize a symmetric matrix. We do need the matrix to be symmetric. Consider the 3 × 3 matrix
    [ 0 2 4 ; 0 0 3 ; 0 0 0 ]
This has all eigenvalues equal to 0 but rank 2.
Example(44.17b). Distribution of the sample variance from a normal sample. See also example(44.7c) on page 144.
Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution.
(a) Show that X̄ = Σ_{j=1}^n Xj/n and Σ_{j=1}^n (Xj − X̄)² are independent, and that (n − 1)S²/σ² = Σ_{j=1}^n (Xj − X̄)²/σ² ∼ χ²_{n−1}.
(b) Show that
    T = (X̄ − µ)/(S/√n) ∼ t_{n−1}
Solution. (a) By expressing Xj = (Xj − X̄) + X̄ we get Σ_{j=1}^n Xj² = Σ_{j=1}^n (Xj − X̄)² + nX̄². Now X^T X = X^T[I_n − 1/n]X + X^T 1X/n where 1 is the n × n matrix with every entry equal to 1 (see §38.7 on page 119). So for this case we have A1 = I_n − 1/n and A2 = 1/n. Clearly A1 A2 = 0 and the theorem can be applied, giving
    X^T[I_n − 1/n]X/σ² ∼ χ²_{n−1}   and   X^T 1X/(nσ²) = (Σ_{j=1}^n Xj)²/(nσ²) = nX̄²/σ² ∼ χ²_{1, nµ²/σ²}
(b) Now (X̄ − µ)/(σ/√n) ∼ N(0, 1) and hence
    T = [ (X̄ − µ)/(σ/√n) ] / √(S²/σ²) ∼ t_{n−1}   as required.
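A simulation sketch of part (b), assuming numpy and scipy are available (n, µ, σ and the sample size are arbitrary): the statistic T = (X̄ − µ)/(S/√n) behaves like a t_{n−1} variable.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, mu, sigma = 8, 5.0, 2.0
    X = mu + sigma * rng.standard_normal((200_000, n))
    xbar = X.mean(axis=1)
    s = X.std(axis=1, ddof=1)
    T = (xbar - mu) / (s / np.sqrt(n))

    print(T.var(), (n - 1) / (n - 3))                     # var of t_{n-1} is (n-1)/(n-3)
    print(stats.kstest(T, stats.t(n - 1).cdf).statistic)  # small KS distance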
Converting proposition(44.16b) into a result about random variables gives the following.
Theorem(44.17c). Cochran’s theorem. Suppose the n-dimensional random vector X has the N (0, σ 2 I) dis-
tribution and
XT X = XT A1 X + · · · + XT Ak X
where A1 , . . . , Ak are symmetric non-negative definite matrices with ranks r1 , . . . , rk with r1 + · · · + rk = n.
Then XT A1 X, . . . , XT Ak X are independent and
    X^T Aj X/σ² ∼ χ²_{rj}   for j = 1, 2, . . . , k.
Proof. Use proposition(44.16b). From §42.6, we know that an orthogonal transformation of a spherical normal is
spherical normal—hence Y1 , Y2 , . . . , Yn are i.i.d. random variables with the N (0, σ 2 ) distribution. Hence result.
Cochran’s theorem can be extended to a non-zero mean as follows:
Proposition(44.17d). Non-central version of Cochran’s theorem. Suppose the n-dimensional random vector
X has the N (µ, σ 2 I) distribution and
XT X = XT A1 X + · · · + XT Ak X
where A1 , . . . , Ak are symmetric non-negative definite matrices with ranks r1 , . . . , rk with r1 + · · · + rk = n.
Then XT A1 X, . . . , XT Ak X are independent and
    X^T Aj X/σ² ∼ χ²_{rj, µ^T Aj µ/σ²}   for j = 1, 2, . . . , k.
Proof. Use the transformation Y = LX constructed in proposition(44.16b); hence Y ∼ N (Lµ, σ 2 I) and Y1 , . . . , Yn are
independent.
From §23.2 on page 72 we know that if W ∼ N (µ, In ), then WT W ∼ χ2n,λ where λ = µT µ. Apply this to Z = MTj X/σ ∼
N (MTj µ/σ, Irj ); hence ZT Z ∼ χ2rj ,λj where λj = µT Mj MTj µ/σ 2 = µT Aj µ/σ 2 . Also ZT Z = XT Mj MTj X/σ 2 =
XT Aj X/σ 2 .
44.18 Matrix results when the sum is any symmetric matrix.
Proposition(44.18a). Suppose A, A1 and A2 are symmetric n × n matrices with
A = A1 + A2
Suppose further that A and A1 are idempotent and A2 is non-negative definite.
Then A1 A2 = A2 A1 = 0 and A2 is idempotent. Also rank(A) = r1 + r2 where r1 = rank(A1 ) and r2 = rank(A2 ).
Proof. First note that if C is a real symmetric matrix, then there exists an orthogonal matrix Q with C = QT DQ where
D is a diagonal matrix with the eigenvalues of C on the diagonal. Hence if C is also idempotent, then the eigenvalues are
either 0 or 1 and C is non-negative definite. Thus every real symmetric idempotent matrix is non-negative definite.
This implies A, A1 and A2 are all non-negative definite. Using the fact that A and A1 are idempotent gives
A = A2 = (A1 + A2 )2 = A1 + A1 A2 + A2 A1 + A22
Hence
A2 = A1 A2 + A2 A1 + A22 (44.18a)
Pre-multiplying by A1 and using the fact that A1 is idempotent gives A1 A2 A1 + A1 A22 = 0.


It is well known that trace(A + B) = trace(A) + trace(B) and trace(AB) = trace(BA). Hence
trace(A1 A2 A1 ) + trace(A2 A1 A2 ) = 0 (44.18b)
Now the (i, i)-element of A1 A2 A1 is βi^T A2 βi where βi^T is row i of A1. This quantity is non-negative because A2 is non-
negative definite. Similarly, because both A1 and A2 are non-negative definite, equation(44.18b) implies trace(A1 A2 A1 ) =
trace(A2 A1 A2 ) = 0. Clearly F = A1 A2 A1 is also symmetric and non-negative definite; hence all the eigenvalues of F are
non-negative; also because trace(F) = 0, the sum of the eigenvalues equals 0. Hence A1 A2 A1 = 0; hence A1 A2 A1 A2 = 0.
Now if B is a real symmetric matrix with B2 = 0, then because B can be diagonalized, it follows that we must have B = 0.
Hence A1 A2 = 0. Similarly A2 A1 = 0. Substituting in equation(44.18a) gives A22 = A2 .
The three matrices A, A1 and A2 commute in pairs. Hence by pages 559-560 of [H ARVILLE(1997)], the matrices are
simultaneously diagonalizable—this means there exists a non-singular matrix P with P−1 AP = D, P−1 A1 P = D1 and
P−1 A2 P = D2 where D, D1 and D2 are all diagonal. Let r = rank(A). Because A is idempotent, the diagonal of D consists
of r entries equal to 1 and the rest equal to 0. Similarly for D1 and D2 . Because D = D1 + D2 we have r = r1 + r2 .
We can extend the previous result to any finite number of terms as follows.
Corollary(44.18b). Suppose A, A1 , A2 , . . . , Ak are symmetric n × n matrices with
A = A1 + A2 + · · · + Ak
Suppose further that A and A1 , . . . , Ak−1 are idempotent and Ak is non-negative definite.
Then Ai Aj = 0 for i 6= j and Ak is idempotent. Also rank(A) = r1 + · · · + rk where rj = rank(Aj ) for
j = 1, . . . , k.
Proof. The proof is by induction. The previous proposition implies P(2). We shall now prove P(k) ⇒ P(k + 1).
So we are given A = A1 + B where B = A2 + · · · + Ak+1 , the matrices A, A1 , . . . , Ak are symmetric idempotent and Ak+1
is non-negative definite. Hence A1 , . . . , Ak+1 are all non-negative definite and hence B is non-negative definite. Hence
P(2) implies A1 B = BA1 = 0 and B is idempotent.
Now apply P(k) to B = A2 + · · · + Ak+1 which we can do because the k matrices B, A2 , . . . , Ak are all idempotent and
Ak+1 is non-negative definite. So P(k) implies Ak+1 is idempotent and Ai Aj = 0 for i 6= j and i,j ∈ {2, . . . , k + 1}.
Now apply P(2) to the decomposition A = Ak+1 + C where A and Ak+1 are both idempotent and C = A1 + · · · + Ak is
non-negative definite. Hence C is idempotent and we can apply P(k) to C = A1 + · · · + Ak which implies Ai Aj = 0 for
i 6= j and i,j ∈ {1, . . . , k}. Finally, A1 B = 0 then implies A1 Ak+1 = 0.
44.19 Other matrix results. Further results where the sum of the matrices is not necessarily the identity matrix.
Proposition(44.19a). Suppose A, A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = A.
Let r = rank(A) and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) A is idempotent (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j.
If any two of these statements are true then all 3 are true and also r = r1 + · · · + rk .
Proof. We first show
(2) and (3) ⇒ (1). This result is easy because A2 = (A1 + · · · + Ak )2 = A1 + · · · + Ak by using (2) and (3).
(1) and (2) ⇒ (3). Now A is symmetric and idempotent with rank r; hence there exists an orthogonal matrix Q such
that
    Q^T AQ = [ I_r 0 ; 0 0 ]
Let Bj = Q^T Aj Q for j = 1, 2, . . . , k; hence
    B1 + · · · + Bk = Q^T AQ = [ I_r 0 ; 0 0 ]        (44.19a)
Now suppose C is a non-negative definite matrix with entries cij; then x^T Cx ≥ 0 for all x ∈ R^n; by choosing the appropriate values for x we see that cii ≥ 0 for all i. Now suppose C is also symmetric and idempotent. Hence C = C² = CC^T and hence cii = Σ_j cij². This shows that cii = 0 implies cij = cji = 0 for all j.
For our problem, every Bj is real, symmetric and idempotent and hence non-negative definite. So equation(44.19a)
implies that for every j = 1, 2, . . . , k there exists an r × r matrix Cj such that
    Bj = [ Cj 0 ; 0 0 ]
Because Bj is symmetric and idempotent, it follows that Cj is symmetric and idempotent; also equation(44.19a) implies
C1 + · · · + Ck = Ir . By proposition (44.16a), Ci Cj = 0 for all i 6= j. Hence Bi Bj = 0 for all i 6= j. Hence Ai Aj = 0 for
all i 6= j.
(1) and (3) ⇒ implies (2). Now Ai Aj = 0; hence ATj ATi = 0. But Ai and Aj are symmetric; hence Ai Aj = 0 = Aj Ai .
Hence Ai Aj = Aj Ai for all i 6= j. Hence there exists an orthogonal matrix Q and diagonal matrices D1 , . . . , Dk such that
QT Ai Q = Di for all i. Denote the diagonal elements of Di by {di1 , . . . , din }.
Let D = D1 +· · ·+Dk ; hence D is diagonal and QT AQ = D. Because A is symmetric and idempotent, the diagonal elements
of D are either 0 or 1. Also D = D2 = (D1 + · · · + Dk )2 = D21 + · · · + D2k because Di Dj = QT Ai QQT Aj Q = QT Ai Aj Q = 0
for all i 6= j. Picking out the (1, 1) element of the equation D21 + · · · + D2k = D gives d211 + · · · + d2k1 = 0 or 1. Arguing
this way shows that every diagonal element of every one of the matrices D1 , . . . , Dk is either 0 or 1. Hence every Di is
idempotent; and hence every Ai = QDi QT is idempotent, as required.
Suppose (1) and (2) are both true. Because the matrices are idempotent, we have trace(A) = rank(A) and trace(Ai ) =
rank(Ai ). Hence
    rank(A) = trace(A) = trace(Σ_i Ai) = Σ_i trace(Ai) = Σ_i rank(Ai)
This completes the proof.

We now consider a variation of the previous result which is less useful—because we are no longer assuming the
matrices are symmetric. First we need a lemma
Lemma(44.19b). The rank cancellation rule. Suppose L1 and L2 are ` × m matrices, A is m × n1 and B is
n1 × n2 . Suppose further that L1 AB = L2 AB and rank(AB) = rank(A). Then L1 A = L2 A.
Proof. Let r = rank(A). Using the full rank factorization of A shows there exists an m × r matrix F and an n1 × r
matrix G with A = FGT and rank(F) = rank(G) = r.
Hence rank(AB) = rank(FGT B) = rank(GT B) because F has full column rank—see exercise 45.15. Hence rank(GT B) = r.
Now L1 AB = L2 AB; hence L1 FGT B = L2 FGT B; hence L1 FGT B(GT B)T = L2 FGT B(GT B)T . But the matrix GT B has
full row rank; hence GT B(GT B)T is non-singular by exercise 45.15. Hence L1 F = L2 F and hence L1 A = L2 A.

This lemma is used in the following proof


Proposition(44.19c). Suppose A, A1 , . . . , Ak are real n × n matrices with A1 + · · · + Ak = A. Suppose A is
idempotent. Let r = rank(A) and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) Ai Aj = 0 for all i 6= j and rank(A2i ) = rank(Ai ) for all i.
(2) Aj is idempotent for every j = 1, 2, . . . , k
(3) r = r1 + · · · + rk .
Then each of the statements (1), (2) and (3) imply the other two.
Proof.
(1) ⇒ (2) Using Ai Aj = 0 gives Ai A = A2i and A2i A = A3i . We also need to use A is idempotent. Hence A2i = Ai A =
Ai A2 = (Ai A)A = A2i A = A3i .
Now rank(A2i ) = rank(Ai ). Now use rank cancellation rule with L1 = A = B = Ai and L2 = I. Hence A2i = Ai . Similarly,
every Ai is idempotent; hence (2).
(2) ⇒ (3) For an idempotent matrix, the rank equals the trace; hence
    Σ_{i=1}^k rank(Ai) = Σ_{i=1}^k trace(Ai) = trace( Σ_{i=1}^k Ai ) = trace(A) = rank(A)
and hence (3).
(3) ⇒ (1)
Let A0 = In − A. Hence A0 + A1 + · · · + Ak = In and A0 is idempotent. By exercise 45.2 on page 156, we know that for
any n × n matrix A, A is idempotent iff rank(A) + rank(I − A) = n. Applying this to A0 gives rank(A0 ) = n − rank(A).
Hence rank(A0 ) + rank(A1 ) + · · · + rank(Ak ) = n.
Because the rank of a sum of matrices is less than or equal the sum of the ranks, we have
rank(In − A1 ) ≤ rank(A0 ) + rank(A2 ) + · · · + rank(Ak ) = n − rank(A1 )
But In = A1 + (In − A1 ); hence n = rank(In ) ≤ rank(A1 ) + rank(In − A1 ). Hence rank(A1 ) + rank(In − A1 ) = n and
hence A1 is idempotent. Similarly every Ai is idempotent and hence rank(A2i ) = rank(Ai ).
Now In = (A1 + A2 ) + (In − A1 − A2 ). Hence n ≤ rank(A1 + A2 ) + rank(In − A1 − A2 ). But
rank(In − A1 − A2 ) = rank(A0 + A3 + · · · + Ak ) ≤ rank(A0 ) + rank(A3 ) + · · · + rank(Ak )
= n − rank(A1 ) − rank(A2 ) ≤ n − rank(A1 + A2 )
Hence rank(A1 + A2 ) + rank(In − A1 − A2 ) = n. Hence A1 + A2 is idempotent. Hence, by exercise 45.3 on page 156, we
know that A1 A2 = A2 A1 = 0.
Hence the proposition is proved.
44.20 Converting the matrix results into random variable results. Converting the matrix result in corol-
lary(44.18b) on page 152 into a result about random variables gives:
Proposition(44.20a). Suppose X ∼ N (0, σ 2 In ). Suppose further that A, A1 , . . . , Ak are k + 1 symmetric n × n
matrices with A = A1 + · · · + Ak . Suppose Ak is non-negative definite,
    X^T AX/σ² ∼ χ²_r   and   X^T Aj X/σ² ∼ χ²_{rj} for j = 1, . . . , k − 1,
where r = rank(A) and rj = rank(Aj) for j = 1, 2, . . . , k. Then X^T A1 X, . . . , X^T Ak X are independent and
    X^T Ak X/σ² ∼ χ²_{rk}   where rk = r − (r1 + · · · + r_{k−1}).
Proof. By exercise 45.18, we know that A, A1 , . . . , Ak−1 are all idempotent. Hence corollary(44.18b) implies Ai Aj = 0
for i 6= j and Ak is idempotent. Also r = r1 + · · · + rk .
By proposition(44.5a) on page 143 we know that A1 X, . . . , Ak X are pairwise independent. Also, the result in box
display(42.8b) on page 133 shows that A1 X, . . . , Ak X are independent. Now (A1 X)T (A1 X) = XT A1 X because A1 is
idempotent. Hence XT A1 X, . . . , XT Ak X are independent.
We now generalize the previous result to non-zero mean and general non-singular variance matrix Σ.
Proposition(44.20b). Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution.
Suppose further that A1 , A2 , . . . , Ak are symmetric n × n matrices and A = A1 + · · · + Ak . Hence A is also
symmetric. Suppose AΣ, A1 Σ, . . . , Ak−1 Σ are all idempotent and Ak Σ is non-negative definite.
Let rj = rank(Aj ) for j = 1, 2, . . . , k and r = rank(A).
Then
• XT Aj X ∼ χ2rj ,µT Aj µ for j = 1, 2. . . . , k
• XT AX ∼ χ2r,µT Aµ
• XT A1 X, . . . , XT Ak X are independent
Also Ak Σ is idempotent, r = r1 + · · · + rk , and Ai ΣAj = 0 for all i 6= j.
Proof. Applying Corollary(44.18b) to AΣ, A1 Σ, . . . , Ak Σ shows that Ak Σ is idempotent and Ai ΣAj Σ = 0. Because
Σ is non-singular, this implies Ai ΣAj = 0. Using rank(AΣ) = rank(A) etc. shows that r = r1 + · · · + rk .
By proposition(44.13a) on page 147, XT Aj X ∼ χ2rj ,µT Aj µ and XT AX ∼ χ2r,µT Aµ .
By proposition(44.5a) on page 143 and the result in box display(42.8b), we know that A1 X, . . . , Ak X are independent.
Now (A1 X)T (A1 X) = XT A1 X because A1 is idempotent. Hence XT A1 X, . . . , XT Ak X are independent.
The next proposition is the random variable form of proposition(44.19a) on page 152.
Theorem(44.20c). Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution.
Suppose further that A, A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = A. Let r = rank(A)
and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) AΣ is idempotent (2) Aj Σ is idempotent for every j = 1, 2, . . . , k (3) Ai ΣAj = 0 for all i 6= j.
If any two of these statements are true then all three are true and also r = r1 + · · · + rk and
• XT Aj X ∼ χ2rj ,µT Aj µ for j = 1, 2, . . . , k
• XT AX ∼ χ2r,µT Aµ
• XT A1 X, . . . , XT Ak X are independent
Proof. Applying proposition(44.19a) on page 152 to the matrices AΣ, A1 Σ, . . . , Ak Σ shows that any two of the
statements are true then all three are true and also r = r1 + · · · + rk .
Because Σ is non-singular, rank(AΣ) = rank(A) = r and hence by proposition(44.13a) on page 147 we have XT AX ∼
χ2r,µT Aµ . Similarly XT Aj X ∼ χ2rj ,µT Aj µ for j = 1, 2, . . . , k.
Proposition(44.5a) on page 143 implies A1 X, . . . , Ak X are pairwise independent. The result in box display(42.8b) on
page 133 shows that A1 X, . . . , Ak X are independent. By Zhang’s approach (see after proposition(44.8a) on page 145) we
know that for any n × n symmetric matrix A we have XT AX = (AX)T A− (AX) = fA (AX) for some function fA where
A− is a generalized inverse of A. Hence XT A1 X, . . . , XT Ak X are independent.
The previous proposition implies the following special case:
Corollary(44.20d). Suppose the n-dimensional random vector X has the non-singular N (µ, σ 2 In ) distribution.
Suppose further that A, A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = A. Let r = rank(A)
and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) A is idempotent (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j.
If any two of these statements are true then all three are true and also r = r1 + · · · + rk and
• XT Aj X/σ 2 ∼ χ2rj ,µT Aj µ/σ2 for j = 1, 2, . . . , k
• XT AX/σ 2 ∼ χ2r,µT Aµ/σ2


• XT A1 X, . . . , XT Ak X are independent
Note that it is not generally true that X ∼ χ2n,λ implies X/a ∼ χ2n,λ/a . However, in this case we see the theorem
implies XT AX ∼ χ2r,µT Aµ and, by applying the theorem to X/σ, that XT AX/σ 2 ∼ χ2r,µT Aµ/σ2

The corollary implies the following result: suppose X ∼ N (0, I), Q1 = XT A1 X, and Q2 = XT A2 X where A1
and A2 are real symmetric n × n matrices. Suppose further that Q1 and Q2 are independent and Q1 + Q2 has a
χ2 -distribution. Then Q1 and Q2 both have χ2 -distributions.
44.21 A standard example from the analysis of variance. Suppose
X11 X12 · · · X1n
X21 X22 · · · X2n
.. .. .. ..
. . . .
Xm1 Xm2 · · · Xmn
is an m × n array of i.i.d. random variables with the N (µ, σ 2 ) distribution. As usual, let
    X·· = Σ_{i=1}^m Σ_{j=1}^n Xij/(mn),   Xi· = Σ_{j=1}^n Xij/n   and   X·j = Σ_{i=1}^m Xij/m
By simple algebra,
    Σ_{i=1}^m Σ_{j=1}^n (Xij − X··)² = Σ_{i=1}^m Σ_{j=1}^n (Xij − Xi· + Xi· − X··)²
                                     = Σ_{i=1}^m Σ_{j=1}^n (Xij − Xi·)² + n Σ_{i=1}^m (Xi· − X··)²
Hence
    Q = Q1 + Q2
where
    Q = Σ_{i=1}^m Σ_{j=1}^n (Xij − X··)²,   Q1 = Σ_{i=1}^m Σ_{j=1}^n (Xij − Xi·)²   and   Q2 = n Σ_{i=1}^m (Xi· − X··)²
Let Yij = Xij − µ for all i and j. Then Yij ∼ N(0, σ²). Also
    Q = Σ_{i=1}^m Σ_{j=1}^n (Yij − Y··)²,   Q1 = Σ_{i=1}^m Σ_{j=1}^n (Yij − Yi·)²   and   Q2 = n Σ_{i=1}^m (Yi· − Y··)²
Now Q, Q1 and Q2 are quadratic forms in the mn × 1 vector Y. In principle, although it would be tedious to
do so, we could find symmetric matrices A, A1 and A2 with Q = YT AY, Q1 = YT A1 Y, and Q2 = YT A2 Y. Now
Q/(mn − 1) is the sample variance of the mn random variables {Yij}; hence, by equation(42.12a),
    Q/σ² ∼ χ²_{mn−1}
Similarly, Σ_{j=1}^n (Yij − Yi·)²/(n − 1) is the sample variance of {Yi1, Yi2, . . . , Yin}. Hence Σ_{j=1}^n (Yij − Yi·)²/σ² ∼ χ²_{n−1}. But the random variables { Σ_{j=1}^n (Yij − Yi·)² : i = 1, . . . , m } are independent. Hence
    Q1/σ² ∼ χ²_{m(n−1)}
Hence by proposition(44.20a) on page 154, Q1 and Q2 are independent and
    Q2/σ² ∼ χ²_{m−1}
Similarly, if Q3 = Σ_{i=1}^m Σ_{j=1}^n (Xij − X·j)² and Q4 = m Σ_{j=1}^n (X·j − X··)², then Q = Q3 + Q4. Also Q3 and Q4 are independent and
    Q3/σ² ∼ χ²_{n(m−1)}   and   Q4/σ² ∼ χ²_{n−1}
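A simulation sketch of this decomposition, assuming numpy is available (m, n, µ, σ and the number of replications are arbitrary choices): the identity Q = Q1 + Q2 holds exactly and the empirical means match the stated degrees of freedom.

    import numpy as np

    rng = np.random.default_rng(7)
    m, n, mu, sigma, reps = 4, 5, 10.0, 2.0, 100_000
    X = mu + sigma * rng.standard_normal((reps, m, n))

    xbar_all = X.mean(axis=(1, 2), keepdims=True)      # X..
    xbar_row = X.mean(axis=2, keepdims=True)           # X_i.

    Q  = ((X - xbar_all)**2).sum(axis=(1, 2))
    Q1 = ((X - xbar_row)**2).sum(axis=(1, 2))
    Q2 = n * ((xbar_row - xbar_all)**2).sum(axis=(1, 2))

    print(np.allclose(Q, Q1 + Q2))                 # the algebraic identity
    print((Q1 / sigma**2).mean(), m * (n - 1))     # approximately m(n-1)
    print((Q2 / sigma**2).mean(), m - 1)           # approximately m-1
    print(np.corrcoef(Q1, Q2)[0, 1])               # approximately 0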
45 Exercises (exs-quadraticForms.tex)

45.1 Suppose the n-dimensional random vector X has the N (µ, Σ) distribution, A is an m × n real matrix and B is an
n × n symmetric matrix. Show that
cov[ AX, XT BX ] = 2AΣBµ [Ans]
45.2 Suppose A is a real n × n matrix. Show that A is idempotent iff rank(A) + rank(I − A) = n. [Ans]
45.3 Suppose A and B are real idempotent n × n matrices. Prove that A + B is idempotent iff BA = AB = 0. [Ans]
45.4 Suppose P1 and P2 are symmetric idempotent matrices and P1 −P2 is non-negative definite. Prove that P1 P2 = P2 P1 = P2
and P1 − P2 is idempotent. [Ans]
45.5 The following two results are used in the proof of proposition(44.2a).
(a) Suppose X ∼ N(0, 1). Prove that
    E[e^{sX² + tX}] = (1/√(1 − 2s)) exp( t²/(2(1 − 2s)) )   for t ∈ R and s < 1/2.
(b) Suppose F and G are n × r matrices such that In − FGT and Ir − GT F are non-singular. Show that
(In − FGT )−1 = In + F(Ir − GT F)−1 GT [Ans]
Independence of quadratic forms
45.6 Suppose X1 , X2 . . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution. Suppose further that X12 + · · · + Xn2
and the quadratic form Q = XT AX are independent. Prove that A = 0. [Ans]
45.7 Suppose X1 , X2 X3 and X4 are i.i.d. random variables with the N (0, σ 2 ) distribution. Let Q1 = X1 X2 − X3 X4 and
Q2 = (a1 X1 + a2 X2 + a3 X3 + a4 X4 )2 . Suppose Q1 and Q2 are independent; show that we must have Q2 = 0. [Ans]
45.8 Suppose X1 , X2 . . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution. Suppose further that Qj = XT Aj X
is a quadratic form in X for j = 1, 2, . . . , k. Suppose further that Q1 , Q2 , . . . , Qk are pairwise independent. Prove that
Q1 , Q2 , . . . , Qk are independent. [Ans]
45.9 Suppose X is an n-dimensional random vector with the non-singular N (0, σ 2 I) distribution. Suppose further that a is
1 × n non-zero matrix and B is a symmetric n × n matrix. Prove that aX and XT BX are independent iff the two quadratic
forms Q1 = XT aT aX and Q2 = XT BX are independent. [Ans]
45.10 (a) Suppose A is a real symmetric non-negative definite n × n matrix. By considering vectors such as x1 = (1, 0, . . . , 0),
x2 = (0, 1, 0, . . . , 0), etc, it is clear that every diagonal element of a non-negative definite matrix A is non-negative.
Suppose that ajj , the j th diagonal element of A is zero. Prove by contradiction that every element in the j th row
and j th column of A equals zero.
(b) Suppose X is an n-dimensional random vector with the non-singular N (0, σ 2 I) distribution. Suppose further that A1
and A2 are real symmetric non-negative definite n × n matrices. Let Q1 = XT A1 X, Q2 = XT A2 X, and Q = XT AX
where A is a real symmetric n × n matrix. Show that Q is independent of Q1 + Q2 iff Q is independent of Q1 and
Q is independent of Q2 . [Ans]
45.11 See proposition(44.8a) on page 145. Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) dis-
tribution. Suppose further that A is k × n matrix and B is a symmetric n × n matrix.
Prove that AΣB = 0 implies AX and XT BX are independent. [Ans]
45.12 Independence of two normal quadratic forms—see §44.7. Suppose X is an n-dimensional random vector with the
N (0, σ 2 I) distribution and A and B are n × n symmetric idempotent matrices. Then XT AX and XT BX are independent
iff AB = 0.
Hint: the implication ⇐ has been proved in proposition(44.7a). [Ans]
45.13 Independence of normal quadratic and normal linear form—see §44.6. Suppose X is an n-dimensional random
vector with the N (0, σ 2 I) distribution, a is n × 1 vector and B is a symmetric idempotent n × n matrix. Then aT X and
XT BX are independent random variables iff aT B = 0.
Hint: the implication ⇐ has been proved in proposition(44.6a). [Ans]
45.14 Suppose X1 and X2 are i.i.d. random variables with the N (0, σ 2 ) distribution. Find those linear functions of X1 and X2
which are independent of (X1 − X2 )2 . [Ans]
45.15 From page 75 of [H ARVILLE(1997)] we know that for any m × n matrix A we have rank(A) = rank(AAT ). Also
rank(A) = rank(AT ); hence rank(A) = rank(AT ) = rank(AAT ) = rank(AT A). Also, by page 37 of [H ARVILLE(1997)],
rank(AB) ≤ min{rank(A), rank(B)}.
(a) Suppose the m × n matrix A has full column rank, and hence n ≥ m. Show that AT A is non-singular.
(b) Suppose the m × n matrix A has full row rank, and hence m ≥ n. Show that AAT is non-singular.
(c) Suppose A is m × n and B is r × m. Show that rank(BA) = rank(BT BA). Hence deduce that if B has full column
rank then rank(BA) = rank(A).
(d) Suppose A is m × n and B is n × r with full row rank. Show that rank(AB) = rank(A). [Ans]
45.16 Suppose Σ is an n × n non-negative definite matrix and A is an n × n symmetric matrix.


(a) Show that ΣAΣAΣ = ΣAΣ iff (ΣA)3 = (ΣA)2 .
(b) Show that rank(AΣ) = rank(ΣA) = rank(AΣA).
(c) Show that rank(AΣAΣ) = rank(ΣAΣ) = rank(ΣAΣA).
(d) Show that AΣ is idempotent iff Σ1/2 AΣ1/2 is idempotent. [Ans]
Distribution of a quadratic form
45.17 Continuation of examples(38.7d) and (38.8d). Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) dis-
tribution. By using proposition(44.13a) on page 147 show that
    (n − 1)S²/σ² = Σ_{j=1}^n (Xj − X̄)²/σ² ∼ χ²_{n−1}   [Ans]
45.18 Distribution of XT CX when C is idempotent—see §44.12. Suppose X1 , . . . , Xn are i.i.d. random variables with the
N (0, σ 2 ) distribution and C is an n × n symmetric matrix. Show that
XT CX
∼ χ2r iff C is idempotent with rank(C) = r.
σ2
Hint: the implication ⇐ has been proved in proposition(44.12a); use characteristic functions for ⇒ . [Ans]
45.19 The hat matrix in linear regression.
(a) Suppose x is an n × p real matrix with rank(x) = p < n. Let H = x(xT x)−1 xT . Show that H and In − H are both
idempotent and find rank(H) and rank(In − H).
(b) Suppose b is a p × 1 real vector and the n-dimensional random vector Y has the N (xb, σ 2 In ) distribution. Find the
values of E[YT HY] and E[YT (In − H)Y] and the distributions of YT HY/σ 2 and YT (In − H)Y/σ 2 .
(c) Show that YT HY and YT (In − H)Y are independent and hence write down the distribution of
    ( Y^T HY/p ) / ( Y^T (In − H)Y/(n − p) )   [Ans]

45.20 The following two results are used in part (c) of exercise 45.21 below.
(a) Suppose A is a real n × n non-singular matrix and let B = A−1 . Assuming the necessary differentiability, show that
    dB/dx = −B (dA/dx) B   and   d²B/dx² = −B (d²A/dx²) B + 2 B (dA/dx) B (dA/dx) B
(b) Suppose A is an n × n real matrix with eigenvalues {λ1, . . . , λn}. Show that
    [trace(A)]² = trace(A²) + 2 Σ_{i<j} λi λj   [Ans]

45.21 Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distribution and A is a real n × n
symmetric matrix.
(a) Suppose g(t) = |I − 2tAΣ|. Show that
    g(0) = 1,   (dg/dt)|_{t=0} = −2 trace(AΣ)   and   (d²g/dt²)|_{t=0} = 8 Σ_{i<j} λi λj

(b) The cumulant generating function of the random variable Y is KY (t) where

    K_Y(t) = ln M_Y(t) = Σ_{r=1}^∞ κr t^r/r!
and then κ1 = µ and κ2 = σ². Hence µ = (dK_Y(t)/dt)|_{t=0} and σ² = (d²K_Y(t)/dt²)|_{t=0}.
Hence prove that
var[XT AX] = 2 trace[ (AΣ)2 ] + 4µT AΣAµ [Ans]
45.22 Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distribution and A is a real n × n
symmetric matrix.
(a) Suppose |t| < min{1/|2λ1 |, . . . , 1/|2λn |}. Show that
    −(1/2) ln|I − 2tAΣ| = Σ_{r=1}^∞ (2^{r−1} t^r/r) trace[ (AΣ)^r ]

(b) Suppose |t| < min{1/|2λ1 |, . . . , 1/|2λn |}. Show that



    [ I − (I − 2tAΣ)^{−1} ] Σ^{−1} = −Σ_{r=1}^∞ 2^r t^r (AΣ)^{r−1} A
(c) By using parts (a) and (b), prove that the rth cumulant of the quadratic form XT AX is
    κr = 2^{r−1} (r − 1)! trace[ (AΣ)^r ] + 2^{r−1} r! µ^T (AΣ)^{r−1} Aµ
This result generalizes part (b) of exercise 45.21. [Ans]
Partitioning a quadratic form into independent pieces
45.23 Continuation of example(44.17b) on page 151. Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) dis-
tribution and µ0 6= µ. Show that
    T = (X̄ − µ0)/(S/√n)
has a non-central tn−1 distribution and find the value of the non-centrality parameter. [Ans]
45.24 Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution. Find the distribution of
    Y = nX̄²/S²   [Ans]
45.25 Suppose X1, X2, X3 and X4 are i.i.d. random variables with the N(µ, σ²) distribution. Let
    Q1 = (X1 − X2)²/2,   Q2 = [X3 − (X1 + X2)/2]²/(3/2)   and   Q3 = [X4 − (X1 + X2 + X3)/3]²/(4/3)
(a) Show that
    Σ_{i=1}^4 (Xi − X̄)² = Q1 + Q2 + Q3
(b) Show that Q1/σ², Q2/σ² and Q3/σ² are independent, each with the χ²_1 distribution.
Hint: recall exercise 2.12 on page 6 which considers the recursive calculation of s². Clearly the result of the current exercise can be extended to n random variables. [Ans]
45.26 Suppose {Xij : i = 1, . . . , m; j = 1, . . . , n} is a m × n array of i.i.d. random variables with the N (µ, σ 2 ) distribution.
Let
    Q = Σ_{i=1}^m Σ_{j=1}^n (Xij − X··)²,   Q2 = n Σ_{i=1}^m (Xi· − X··)²,   Q4 = m Σ_{j=1}^n (X·j − X··)²
and
    Q5 = Σ_{i=1}^m Σ_{j=1}^n (Xij − Xi· − X·j + X··)²
(a) Show that Q = Q2 + Q4 + Q5.
(b) Show that Q2, Q4 and Q5 are independent and
    Q5/σ² ∼ χ²_{(m−1)(n−1)}   [Ans]
45.27 Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution and let
    X̄^{(1)} = Σ_{i=2}^n Xi/(n − 1)
Then nX̄ = X1 + (n − 1)X̄^{(1)}.
(a) Show that Σ_{i=1}^n (Xi − X̄)² = Σ_{i=2}^n (Xi − X̄^{(1)})² + (1 − 1/n)(X1 − X̄^{(1)})².
(b) Show that Σ_{i=2}^n (Xi − X̄^{(1)})² and (X1 − X̄^{(1)})² are independent and find the distribution of
    (n − 1)(X1 − X̄^{(1)})²/(nσ²)
(The two terms on the right hand side of (a) are independent and the required distribution is χ²_1.) [Ans]
45.28 Suppose the n-dimensional random vector X has the N (µ, Σ) distribution where µ = µ(1, 1, . . . , 1)T and
    Σ = σ² [ 1 ρ ··· ρ ; ρ 1 ··· ρ ; ··· ; ρ ρ ··· 1 ] = σ²[ (1 − ρ)In + ρ1 ]
Find the distribution of
    Y = Σ_{j=1}^n (Xj − X̄)² / (σ²(1 − ρ))   [Ans]
46 The bivariate t distribution


46.1 The bivariate t-distribution with equal variances. One possible version of the bivariate t-density is
    f_{(X,Y)}(x, y) = [1/(2π√(1 − ρ²))] [1 + (x² − 2ρxy + y²)/(ν(1 − ρ²))]^{−(ν+2)/2}        (46.1a)
for ν > 0, ρ ∈ (−1, 1), x ∈ R and y ∈ R.

If x denotes the 2 × 1 vector (x, y), then an alternative expression which is equivalent to equation(46.1a) is
    f_X(x) = [ν^{(ν+2)/2}/(2π√(1 − ρ²))] (ν + x^T C^{−1} x)^{−(ν+2)/2}        (46.1b)
where C = [ 1 ρ ; ρ 1 ] and C^{−1} = (1/(1 − ρ²)) [ 1 −ρ ; −ρ 1 ].
This distribution is called the tν (0, C) distribution. We shall see below in §46.3 on page 160 that C = corr[X] and
X and Y have equal variances.

46.2 Characterization of the bivariate tν(0, C) distribution. The univariate tν-distribution is the distribution of Z/√(W/ν) where Z ∼ N(0, 1), W ∼ χ²_ν and Z and W are independent. The generalisation to 2 dimensions is:
Proposition(46.2a). Suppose Z = (Z1, Z2) ∼ N(0, C) where
    C = [ 1 ρ ; ρ 1 ]   and ρ ∈ (−1, 1).
Suppose further that W ∼ χ²_ν and Z and W are independent. Define X = (X, Y) by
    X = Z/(W/ν)^{1/2}
Then X = (X, Y ) has the tν (0, C) density given in (46.1a).
Proof. The density of (Z1 , Z2 , W ) is
    f_{(Z1,Z2,W)}(z1, z2, w) = [1/(2π√(1 − ρ²))] exp[ −(z1² − 2ρz1z2 + z2²)/(2(1 − ρ²)) ] × [w^{ν/2−1}/(2^{ν/2}Γ(ν/2))] exp[−w/2]
Consider the transformation to (X, Y, W) where X = Z1/√(W/ν), Y = Z2/√(W/ν) and W = W. This is a 1–1 transformation and the absolute value of the Jacobian is
    |∂(x, y, w)/∂(z1, z2, w)| = ν/w
Hence
    f_{(X,Y,W)}(x, y, w) = (w/ν) f(z1, z2, w)
                         = [w^{ν/2}/(2^{ν/2+1} πν Γ(ν/2) √(1 − ρ²))] exp[ −(w/2)( (x² − 2ρxy + y²)/(ν(1 − ρ²)) + 1 ) ]
                         = [w^{ν/2}/(2^{ν/2+1} πν Γ(ν/2) √(1 − ρ²))] exp[−wα/2]   where α = (x² − 2ρxy + y²)/(ν(1 − ρ²)) + 1        (46.2a)
Now using the fact that the integral of the χ²_n density is 1 gives
    ∫_0^∞ x^{n/2−1} exp[−x/2] dx = 2^{n/2} Γ(n/2)
which implies
    ∫_0^∞ t^{ν/2} exp[−tα/2] dt = (2/α)^{ν/2+1} Γ(ν/2 + 1) = (ν/2)(2/α)^{ν/2+1} Γ(ν/2)
Integrating the variable w out of equation(46.2a) gives
    f_{(X,Y)}(x, y) = [1/(2π√(1 − ρ²))] [1 + (x² − 2ρxy + y²)/(ν(1 − ρ²))]^{−(ν+2)/2}   for (x, y) ∈ R²
which is equation(46.1a) above.
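A sampling sketch of this characterization, assuming numpy is available (ν, ρ and the sample size are arbitrary): draw Z ∼ N(0, C) and an independent W ∼ χ²_ν, set X = Z/√(W/ν), and check the moments quoted in §46.3.

    import numpy as np

    rng = np.random.default_rng(8)
    nu, rho, size = 7, 0.6, 500_000
    C = np.array([[1.0, rho], [rho, 1.0]])

    Z = rng.multivariate_normal(np.zeros(2), C, size=size)
    W = rng.chisquare(nu, size=size)
    X = Z / np.sqrt(W / nu)[:, None]          # bivariate t_nu(0, C) draws

    print(np.cov(X.T))                        # approximately (nu/(nu-2)) C
    print(nu / (nu - 2) * C)
    print(np.corrcoef(X.T)[0, 1], rho)        # sample correlation approximately rho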
46.3 Properties of the bivariate tν (0, C) distribution.


• The marginal distributions. Both X and Y have t-distributions with ν degrees of freedom. The proof of this is
left to exercise 48.1 on page 163.
• Moments. E[X] = E[Y ] = 0 and var[X] = var[Y ] = ν/(ν − 2) for ν > 2. The correlation is corr[X, Y ] = ρ
and the covariance is cov[X, Y ] = ρν/(ν − 2). The proof of these results is left to exercise 48.2 on page 163. It
follows that
    var[X] = (ν/(ν − 2)) [ 1 ρ ; ρ 1 ] = (ν/(ν − 2)) C   and   corr[X] = C
• If ρ = 0, then equation(46.1a) becomes
    f_{(X,Y)}(x, y) = (1/(2π)) [1 + (x² + y²)/ν]^{−(ν+2)/2}
Note that f_{(X,Y)}(x, y) ≠ f_X(x) f_Y(y) and hence X and Y are not independent even when ρ = 0.

46.4 Generalisation to non-equal variances. Suppose T1 = aX and T2 = bY where a ≠ 0 and b ≠ 0 and X = (X, Y) ∼ tν(0, C). Thus
    T = (T1, T2)^T = [ a 0 ; 0 b ] X   and   Σ = var[T] = (ν/(ν − 2)) [ a² abρ ; abρ b² ] = (ν/(ν − 2)) R
where⁸
    R = [ a² abρ ; abρ b² ],   R^{−1} = (1/(a²b²(1 − ρ²))) [ b² −abρ ; −abρ a² ]   and   |R^{−1}| = 1/(a²b²(1 − ρ²))
The absolute value of the Jacobian is |ab|. Substituting in equation(46.1a) on page 159 gives
    f_T(t) = [1/(2π|ab|√(1 − ρ²))] [1 + (b²t1² − 2ρab t1t2 + a²t2²)/(νa²b²(1 − ρ²))]^{−(ν+2)/2}
           = [1/(2π|R|^{1/2})] [1 + t^T R^{−1} t/ν]^{−(ν+2)/2} = [ν^{(ν+2)/2}/(2π|R|^{1/2})] (ν + t^T R^{−1} t)^{−(ν+2)/2}
This is the tν(0, R) distribution. Note that var[T] = (ν/(ν − 2)) R.

47 The multivariate t distribution


47.1 The density of the multivariate t-distribution, tν (0, I). If we put ρ = 0 in equation(46.1b) we see that if
T ∼ tν (0, I) then
    f_T(t) = [ν^{(ν+2)/2}/(2π)] (ν + t^T t)^{−(ν+2)/2}

Generalizing to p-dimensions leads to the following definition.
Definition(47.1a). The p-dimensional random vector T has the t-distribution tν(0, I), where 0 is the p × 1 zero vector and I is the p × p identity matrix, iff T has density
    f(t) ∝ 1/(ν + t^T t)^{(ν+p)/2}
where ν ∈ R and ν > 2.
An alternative expression is:
    f(t1, . . . , tp) = κ/(ν + t1² + · · · + tp²)^{(ν+p)/2}

8 In general, the inverse of the 2 × 2 symmetric matrix [ a c ; c b ] is (1/(ab − c²)) [ b −c ; −c a ], provided ab ≠ c².
The constant of proportionality, κ, can be determined by integration. Integrating out tp gives


    f(t1, . . . , t_{p−1}) = κ ∫_{−∞}^∞ dtp/(ν + t1² + · · · + tp²)^{(ν+p)/2} = 2κ ∫_0^∞ dtp/(ν + t1² + · · · + tp²)^{(ν+p)/2}
                           = 2κ ∫_0^∞ dtp/(α + tp²)^{(ν+p)/2}   where α = ν + t1² + · · · + t_{p−1}²
                           = (2κ/α^{(ν+p)/2}) ∫_0^∞ dtp/(1 + tp²/α)^{(ν+p)/2}
                           = (2κ√α/(α^{(ν+p)/2}√(ν+p−1))) ∫_0^∞ dx/(1 + x²/(ν+p−1))^{(ν+p)/2}   where x = tp√(ν+p−1)/√α
Using the standard result that
    2 ∫_0^∞ (1 + t²/n)^{−(n+1)/2} dt = √n B(1/2, n/2) = √n Γ(1/2)Γ(n/2)/Γ((n+1)/2)
implies
    f(t1, . . . , t_{p−1}) = (κ√π/α^{(ν+p−1)/2}) Γ((ν+p−1)/2)/Γ((ν+p)/2)
                           = (κ√π Γ((ν+p−1)/2)/Γ((ν+p)/2)) · 1/[ν + t1² + · · · + t_{p−1}²]^{(ν+p−1)/2}
By induction
    f(t1) = (κ π^{(p−1)/2} Γ((ν+1)/2)/Γ((ν+p)/2)) · 1/(ν + t1²)^{(ν+1)/2}
and so
    κ = ν^{ν/2} Γ((ν+p)/2)/(π^{p/2} Γ(ν/2))

It follows that the density of the p-dimensional tν (0, I) is


    f(t) = [ν^{ν/2} Γ((ν+p)/2)/(π^{p/2} Γ(ν/2))] · 1/(ν + t^T t)^{(ν+p)/2}        (47.1a)
 

47.2 Characterization of the tν (0, I) distribution.


Proposition(47.2a). Suppose Z1 , Z2 , . . . , Zp are i.i.d. with the N (0, 1) distribution and W has the χ2ν dis-
tribution. Suppose further that Z = (Z1 , Z2 , . . . , Zp ) and W are independent. Define T = (T1 , T2 , . . . , Tp )
by
    T = Z/(W/ν)^{1/2}
Then T has the density in equation(47.1a).
Proof. See exercise 48.3 on page 163.

47.3 Properties of the tν (0, I) distribution.


• TT T/p has the F (p, ν) distribution—see exercise 48.6 on page 163.
• The contours of the distribution are ellipsoidal (the product of independent t distributions does not have this
property).
• The marginal distribution of an r-dimensional subset of T has the tν (0, I) distribution. In particular, each Ti
has the tν distribution. These results follow immediately from the characterization in §47.2.
• E[T] = 0 and var[T] = E[TTᵀ] = ν/(ν−2) I for ν > 2 (because W ∼ χ²ν implies E[1/W] = 1/(ν − 2)). Finally,
corr[T] = I. The first and last of these properties are checked by simulation in the sketch below.
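A minimal simulation check (illustrative only; it reuses the construction of §47.2 with arbitrary ν, p and seed) of the F(p, ν) property and of the moments.

    # Check T^T T / p ~ F(p, nu) and var[T] = (nu/(nu-2)) I by simulation.
    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(2)
    nu, p, n = 7.0, 3, 200_000
    Z = rng.standard_normal((n, p))
    W = rng.chisquare(nu, size=n)
    T = Z / np.sqrt(W / nu)[:, None]

    print(kstest(np.sum(T**2, axis=1) / p, 'f', args=(p, nu)))   # F(p, nu) check
    print(np.cov(T, rowvar=False))                               # approx (nu/(nu-2)) I = 1.4 I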

47.4 The p-dimensional t-distribution: tν(m, C).

Here C is a real, symmetric, positive definite p × p matrix.
The Cholesky decomposition implies there exists a real and nonsingular L with C = LLᵀ. Let
    V = m + L T    where T ∼ tν(0, I); here V, m and T are p × 1 and L is p × p.
Then E[V] = m and var[V] = L var[T] Lᵀ = ν/(ν−2) LLᵀ = ν/(ν−2) C. See exercise 48.4 on page 163 for the proof of the result
    TᵀT = (V − m)ᵀ C⁻¹ (V − m)                                                                                  (47.4a)
It follows that V has density:
    f(v) = κ / { |L| [ν + (v − m)ᵀ C⁻¹ (v − m)]^((ν+p)/2) }    where κ = ν^(ν/2) Γ((ν+p)/2)/(π^(p/2) Γ(ν/2)) and |L| = |C|^(1/2)        (47.4b)
A random variable which has the density given in equation(47.4b) is said to have the tν (m, C) distribution.
Definition(47.4a). Suppose C is a real, symmetric, positive definite p × p matrix and m is a p × 1 vector in Rᵖ.
Then the p-dimensional random vector V has the tν(m, C) distribution iff V has the density
    f(v) ∝ 1/[ν + (v − m)ᵀ C⁻¹ (v − m)]^((ν+p)/2)
It follows that
    E[V] = m    and    var[V] = ν/(ν−2) C
and the constant of proportionality is given in equation(47.4b).
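For concreteness, here is a small helper evaluating the logarithm of the density in equation(47.4b). This is an illustrative sketch rather than anything from the text: the function name mvt_logpdf is ours, and the one-dimensional check at the end uses the fact that tν(0, 1) is the ordinary tν distribution.

    # Evaluate log f(v) for the t_nu(m, C) density of equation (47.4b).
    import numpy as np
    from scipy.special import gammaln
    from scipy.stats import t

    def mvt_logpdf(v, m, C, nu):
        p = len(m)
        diff = np.asarray(v, dtype=float) - np.asarray(m, dtype=float)
        quad = diff @ np.linalg.solve(C, diff)            # (v - m)^T C^{-1} (v - m)
        _, logdetC = np.linalg.slogdet(C)                 # log|C|, and |L| = |C|^(1/2)
        logkappa = 0.5 * nu * np.log(nu) + gammaln((nu + p) / 2) - 0.5 * p * np.log(np.pi) - gammaln(nu / 2)
        return logkappa - 0.5 * logdetC - 0.5 * (nu + p) * np.log(nu + quad)

    # Check: with p = 1, m = 0 and C = 1 this is the ordinary t_nu log-density.
    print(mvt_logpdf([1.2], [0.0], np.eye(1), 5.0), t.logpdf(1.2, df=5))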
47.5 Linear transformation of the tν(m, C) distribution. Suppose T ∼ tν(m, C); thus m is the mean vector
and ν/(ν−2) C is the covariance matrix of the random vector T. Suppose V = a + AT where A is non-singular.
It follows that T = A⁻¹(V − a), E[V] = a + Am and var[V] = ν/(ν−2) ACAᵀ.
Let m1 = a + Am and C1 = ACAᵀ. Then V has the tν(m1, C1) distribution—see exercise 48.5 on page 163.
47.6 Characterization of the tν(m, C) distribution.
Proposition(47.6a). Suppose Z has the non-singular multivariate normal distribution N(0, Σ) and W has
the χ²ν distribution. Suppose further that Z and W are independent. Then T = m + Z/(W/ν)^(1/2) has the
tν(m, Σ) distribution.
Proof. Because Z has a non-singular distribution, Σ is positive definite and there exists a symmetric non-singular Q
with Σ = QQ. Let Y = Q⁻¹Z. Then var[Y] = Q⁻¹ var[Z] (Q⁻¹)ᵀ = I. So Y ∼ N(0, I). Hence
    T1 = Y/(W/ν)^(1/2) ∼ tν(0, I)
Using §47.5 gives T = m + QT1 ∼ tν(m, Σ) as required.
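A simulation sketch of proposition(47.6a) (illustrative only; the particular m, Σ, ν and seed are arbitrary): the sample mean should be close to m and the sample covariance close to ν/(ν−2) Σ.

    # Simulate T = m + Z / sqrt(W/nu) with Z ~ N(0, Sigma) and W ~ chi^2_nu.
    import numpy as np

    rng = np.random.default_rng(3)
    nu, n = 8.0, 300_000
    m = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])

    Z = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    W = rng.chisquare(nu, size=n)
    T = m + Z / np.sqrt(W / nu)[:, None]

    print(T.mean(axis=0))             # approx m
    print(np.cov(T, rowvar=False))    # approx (nu/(nu-2)) Sigma = (4/3) Sigma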

47.7 Summary.
• The bivariate t-distribution tν(0, R). This has E[T] = 0 and var[T] = ν/(ν−2) R. The density is
    fT(t) = ν^((ν+2)/2)/(2π|R|^(1/2)) [ν + tᵀR⁻¹t]^(−(ν+2)/2)
  Particular case:
    fT(t) = 1/(2π√(1−ρ²)) [1 + (t1² − 2ρ t1t2 + t2²)/(ν(1−ρ²))]^(−(ν+2)/2)    where var[T] = ν/(ν−2) ( 1  ρ ; ρ  1 )
• The p-dimensional t-distribution tν(m, R). This has E[T] = m and var[T] = ν/(ν−2) R. The density is
    f(t) = ν^(ν/2) Γ((ν+p)/2) / { π^(p/2) Γ(ν/2) |R|^(1/2) [ν + (t − m)ᵀR⁻¹(t − m)]^((ν+p)/2) }
• Characterization of the t-distribution. Suppose Z ∼ N(0, Σ) and W has the χ²ν distribution. Suppose
further that Z and W are independent. Then T = m + Z/(W/ν)^(1/2) has the tν(m, Σ) distribution.

48 Exercises (exs-t.tex)

48.1 Suppose T has the bivariate t-density given in equation(46.1a) on page 159. Show that both marginal distributions are
the tν-distribution and hence have density given in equation(21.1b) on page 63:
    f(t) = 1/(B(1/2, ν/2) √ν) [1 + t²/ν]^(−(ν+1)/2)    for t ∈ R.    [Ans]
48.2 Suppose T has the bivariate t-density given in equation(46.1a) on page 159 and ν > 2.
(a) Find E[X] and var[X].
(b) Find cov[X, Y ] and corr[X, Y ]. [Ans]
48.3 Prove proposition(47.2a) on page 161: Suppose Z1, Z2, . . . , Zp are i.i.d. with the N(0, 1) distribution and W has the
χ²ν distribution. Suppose further that Z = (Z1, Z2, . . . , Zp) and W are independent. Define T = (T1, T2, . . . , Tp) by
T = Z/(W/ν)^(1/2). Then T has the following density
    f(t) = [ν^(ν/2) Γ((ν+p)/2)/(π^(p/2) Γ(ν/2))] · 1/[ν + tᵀt]^((ν+p)/2)    [Ans]
48.4 Prove equation(47.4a) on page 162: TᵀT = (V − m)ᵀC⁻¹(V − m). [Ans]


48.5 See §47.5 on page 162. Suppose T ∼ tν (m, C) and V = a + AT where A is non-singular. Prove that V ∼ tν (m1 , C1 )
where m1 = a + Am and C1 = ACAT . [Ans]
48.6 Suppose the p-variate random vector T has the tν(0, I) distribution. Show that TᵀT/p has the F(p, ν) distribution. [Ans]

49 The Dirichlet distribution


49.1 The Dirichlet distribution can be regarded as a multivariate generalization of the beta distribution. In fact
in one dimension, the Dirichlet distribution reduces to the beta distribution.
49.2 One dimension. Suppose X1 ∼ gamma(k1 , α) and X2 ∼ gamma(k2 , α) and X1 and X2 are independent.
Let Y1 = X1 /(X1 + X2 ), Y2 = X2 /(X1 + X2 ) and Z = X1 + X2 . By exercise 12.7 on page 38 we know that
Y1 ∼ beta(k1 , k2 ), Y2 ∼ beta(k2 , k1 ) and Z ∼ gamma(k1 + k2 , α). Also Y1 and Z are independent and Y2 and Z
are independent. Note that the random vector (Y1 , Y2 ) does not have a density function because Y1 + Y2 = 1. So
Y1 has density:
    fY1(y1) = [Γ(k1 + k2)/(Γ(k1)Γ(k2))] y1^(k1−1) (1 − y1)^(k2−1)    for 0 < y1 < 1.
Definition(49.2a). Suppose k1 > 0 and k2 > 0; then Y1 has the Dirichlet distribution Dir(k1 , k2 ) iff Y1 ∼
beta(k1 , k2 ).
Note that
• fY1(y1)/(k1 + k2 − 1) is, when k1 and k2 are positive integers, the probability of k1 − 1 successes in k1 + k2 − 2
independent Bernoulli trials in which the probability of success on any trial is y1.
• 1 − Y1 has the distribution beta(k2 , k − k2 ) = Dir(k2 , k − k2 ) where k = k1 + k2 .
49.3 Two dimensions. Suppose X1 ∼ gamma(k1 , α), X2 ∼ gamma(k2 , α), X3 ∼ gamma(k3 , α) and X1 , X2
and X3 are independent. The joint density of (X1 , X2 , X3 ) is
    f(X1,X2,X3)(x1, x2, x3) = α^(k1+k2+k3) x1^(k1−1) x2^(k2−1) x3^(k3−1) e^(−α(x1+x2+x3)) / [Γ(k1)Γ(k2)Γ(k3)]    for x1 > 0, x2 > 0 and x3 > 0.
Let
    Y1 = X1/(X1 + X2 + X3)        Y2 = X2/(X1 + X2 + X3)        and        Z = X1 + X2 + X3
Hence the transformation (X1, X2, X3) ↦ (Y1, Y2, Z) maps
    {(x1, x2, x3) ∈ R³ : x1 > 0, x2 > 0, x3 > 0} ↦ {(y1, y2, z) ∈ R³ : y1 > 0, y2 > 0, y1 + y2 < 1, z > 0}
Note that X1 = Y1 Z, X2 = Y2 Z and X3 = Z(1−Y1 −Y2 ). The absolute value of the Jacobian of the transformation
is
    |∂(x1, x2, x3)/∂(y1, y2, z)| = det( z  0  y1 ; 0  z  y2 ; −z  −z  1−y1−y2 ) = det( z  0  y1 ; 0  z  y2 ; 0  0  1 ) = z²


where we have added the first and second rows to the third row in the first determinant in order to get the second
determinant. Hence the density of (Y1 , Y2 , Z) is
    f(Y1,Y2,Z)(y1, y2, z) = [α^(k1+k2+k3) y1^(k1−1) y2^(k2−1) (1 − y1 − y2)^(k3−1) / (Γ(k1)Γ(k2)Γ(k3))] e^(−αz) z^(k1+k2+k3−1)
for y1 > 0, y2 > 0, y1 + y2 < 1, z > 0.
This density factorizes into the densities of (Y1, Y2) and Z; hence (Y1, Y2) is independent of Z. Clearly Z ∼
gamma(k1 + k2 + k3, α) and the density of (Y1, Y2) is:
    f(Y1,Y2)(y1, y2) = [Γ(k1 + k2 + k3)/(Γ(k1)Γ(k2)Γ(k3))] y1^(k1−1) y2^(k2−1) (1 − y1 − y2)^(k3−1)    for y1 > 0, y2 > 0, y1 + y2 < 1.    (49.3a)
Definition(49.3a). Suppose k1 > 0, k2 > 0 and k3 > 0. Then the random vector (Y1, Y2) has the Dirichlet
distribution Dir(k1, k2, k3) iff (Y1, Y2) has the density in equation(49.3a).
Given definition(49.3a), consider the transformation (Y1, Y2) ↦ (Y1, Y3 = 1 − Y1 − Y2). The absolute value of
the Jacobian is 1 and hence the density of (Y1, Y3) is
    f(Y1,Y3)(y1, y3) = [Γ(k1 + k2 + k3)/(Γ(k1)Γ(k2)Γ(k3))] y1^(k1−1) y3^(k3−1) (1 − y1 − y3)^(k2−1)    for y1 > 0, y3 > 0, y1 + y3 < 1.
Similarly, considering the transformation (Y1, Y2) ↦ (Y3 = 1 − Y1 − Y2, Y2) shows that the density of (Y2, Y3) is
    f(Y2,Y3)(y2, y3) = [Γ(k1 + k2 + k3)/(Γ(k1)Γ(k2)Γ(k3))] y2^(k2−1) y3^(k3−1) (1 − y2 − y3)^(k1−1)    for y2 > 0, y3 > 0, y2 + y3 < 1.
Let k = k1 + k2 + k3. By integrating out the other variable we see that Y1 ∼ beta(k1, k − k1), Y2 ∼ beta(k2, k − k2)
and Y3 = 1 − Y1 − Y2 ∼ beta(k3, k − k3).
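The construction above is easy to verify by simulation. The sketch below (illustrative only; the values of k1, k2, k3, α and the seed are arbitrary) builds (Y1, Y2) from three independent gamma variables and checks the stated marginal distribution of Y1.

    # Build (Y1, Y2) ~ Dir(k1, k2, k3) from independent gammas and check Y1 ~ beta(k1, k2 + k3).
    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(4)
    k1, k2, k3, alpha, n = 2.0, 3.0, 1.5, 1.0, 200_000

    X = rng.gamma(shape=[k1, k2, k3], scale=1.0 / alpha, size=(n, 3))
    Y = X[:, :2] / X.sum(axis=1, keepdims=True)
    print(kstest(Y[:, 0], 'beta', args=(k1, k2 + k3)))   # large p-value expected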

49.4 The general case of n dimensions. Now suppose X1 , . . . , Xn+1 are independent random variables and
Xj ∼ gamma(kj , α) for j = 1, . . . , n + 1. Let
    Yj = Xj/(X1 + · · · + Xn+1)    for j = 1, . . . , n + 1.
and let Z = X1 + · · · + Xn+1. Then the absolute value of the Jacobian of the transformation (X1, . . . , Xn+1) ↦
(Y1, . . . , Yn, Z) is zⁿ. The density of (Y1, . . . , Yn, Z) is
    f(Y1,...,Yn,Z)(y1, . . . , yn, z) = [α^(k1+···+kn+1) y1^(k1−1) · · · yn^(kn−1) (1 − y1 − · · · − yn)^(kn+1−1) / (Γ(k1) · · · Γ(kn+1))] e^(−αz) z^(k1+···+kn+1−1)
and hence Z is independent of (Y1 , . . . , Yn ) and the density of (Y1 , . . . , Yn ) can be written down.
Definition(49.4a). Suppose n ∈ {1, 2, . . .} and (k1 , . . . , kn+1 ) ∈ (0, ∞)n+1 . The random vector (Y1 , . . . , Yn )
has the Dirichlet distribution Dir(k1 , . . . , kn+1 ) iff (Y1 , . . . , Yn ) has density
    f(Y1,...,Yn)(y1, . . . , yn) = [Γ(k1 + · · · + kn+1)/(Γ(k1) · · · Γ(kn+1))] y1^(k1−1) · · · yn^(kn−1) (1 − y1 − · · · − yn)^(kn+1−1)    (49.4a)
for y1 > 0, y2 > 0, . . . , yn > 0 and y1 + y2 + · · · + yn < 1.

An immediate consequence of the fact that equation(49.4a) defines a density is the Dirichlet integral formula,
which is
    ∫ · · · ∫_A  y1^(k1−1) · · · yn^(kn−1) (1 − y1 − · · · − yn)^(kn+1−1) dy1 . . . dyn = Γ(k1) · · · Γ(kn+1)/Γ(k1 + · · · + kn+1)    (49.4b)
where A = { (y1, . . . , yn) : y1 > 0, y2 > 0, . . . , yn > 0, y1 + y2 + · · · + yn < 1 }.
The special case of n = 1 reduces to the beta distribution. Suppose (k1 , k2 ) ∈ (0, ∞)2 . Then Y ∼ Dir(k1 , k2 ) iff
Y has density
    fY(y) = [Γ(k1 + k2)/(Γ(k1)Γ(k2))] y^(k1−1) (1 − y)^(k2−1)    for 0 < y < 1,
and this is just the density of the beta(k1, k2) distribution.
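The Dirichlet integral formula is easy to check numerically in small dimensions. The sketch below (illustrative only; the values of k1, k2, k3 are arbitrary) verifies equation(49.4b) for n = 2 by direct two-dimensional quadrature over the simplex.

    # Check the Dirichlet integral formula (49.4b) in the case n = 2.
    from math import gamma
    from scipy.integrate import dblquad

    k1, k2, k3 = 2.5, 1.5, 3.0
    integrand = lambda y2, y1: y1**(k1 - 1) * y2**(k2 - 1) * (1.0 - y1 - y2)**(k3 - 1)
    lhs, err = dblquad(integrand, 0.0, 1.0, lambda y1: 0.0, lambda y1: 1.0 - y1)
    rhs = gamma(k1) * gamma(k2) * gamma(k3) / gamma(k1 + k2 + k3)
    print(lhs, rhs)   # the two values agree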

49.5 Marginal distributions. Suppose (Y1, . . . , Yn) ∼ Dir(k1, . . . , kn+1) and let k = k1 + · · · + kn+1. Then we
know that
    Yj = Xj/(X1 + · · · + Xn+1)    for j = 1, . . . , n
where X1 , . . . , Xn+1 are independent random variables and Xj ∼ gamma(kj , α) for j = 1, . . . , n + 1.
Marginal distribution of Yj. Now
    Yj = Xj/(Xj + Wj)
where Xj ∼ gamma(kj, α), Wj ∼ gamma(k − kj, α) and Xj and Wj are independent. By exercise 12.7 on
page 38 it follows that
    Yj ∼ beta(kj, k − kj) = Dir(kj, k − kj)    for j = 1, 2, . . . , n + 1.    (49.5a)
Note that Yn+1 = 1 − (Y1 + · · · + Yn ).
Using equation (13.3a) on page 40 shows that
    E[Yj] = kj/k        E[Yj²] = kj(kj + 1)/[k(k + 1)]        var[Yj] = kj(k − kj)/[k²(k + 1)]    (49.5b)
Marginal distribution of (Y1, Y2). Now
    (Y1, Y2) ∼ ( X1/(X1 + X2 + Z3) , X2/(X1 + X2 + Z3) )    where Z3 = X3 + · · · + Xn+1 = (X1 + · · · + Xn+1) − (X1 + X2)
Also X1 ∼ gamma(k1, α), X2 ∼ gamma(k2, α), Z3 ∼ gamma(k − k1 − k2, α) and X1, X2 and Z3 are independent.
Hence (Y1, Y2) ∼ Dir(k1, k2, k − k1 − k2).
Similarly, for any 1 ≤ j < ℓ ≤ n we have (Yj, Yℓ) ∼ Dir(kj, kℓ, k − kj − kℓ).
Other marginal distributions. Similarly it can be shown that (Y1, Y2, Y3) ∼ Dir(k1, k2, k3, k − k1 − k2 − k3), etc.
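These moment formulas are simple to confirm by simulation. The sketch below (illustrative only; the parameter vector and the seed are arbitrary) uses numpy's built-in Dirichlet sampler, which returns the full vector (Y1, . . . , Yn+1).

    # Check E[Yj] = kj/k and var[Yj] = kj(k - kj)/(k^2 (k + 1)) by simulation.
    import numpy as np

    rng = np.random.default_rng(5)
    kvec = np.array([1.0, 2.0, 3.0, 4.0])      # (k1, ..., k_{n+1}) with n = 3
    k = kvec.sum()

    Y = rng.dirichlet(kvec, size=500_000)      # rows (Y1, ..., Y_{n+1}) with Y_{n+1} = 1 - Y1 - ... - Yn
    print(Y.mean(axis=0), kvec / k)
    print(Y.var(axis=0), kvec * (k - kvec) / (k**2 * (k + 1)))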

50 Exercises (exs-dirichlet.tex)

50.1 Suppose (Y1 , Y2 , Y3 , Y4 , Y5 , Y6 ) ∼ Dir(k1 , k2 , k3 , k4 , k5 , k6 , k7 ). Show that


(Y1 , Y2 + Y3 , Y4 + Y5 + Y6 ) ∼ Dir(k1 , k2 + k3 , k4 + k5 + k6 , k7 ) [Ans]
50.2 Suppose (Y1 , Y2 , Y3 , Y4 , Y5 , Y6 ) ∼ Dir(k1 , k2 , k3 , k4 , k5 , k6 , k7 ). Find the distribution of V = Y1 + Y2 + Y3 + Y4 . [Ans]
50.3 Suppose (Y1 , Y2 ) ∼ Dir(k1 , k2 , k3 ).
(a) Find the conditional density of Y1 given Y2 = y2 .
(b) Find E[Y1 |Y2 = y2 ]. [Ans]
50.4 Suppose (Y1 , . . . , Yn ) ∼ Dir(k1 , . . . , kn+1 ) and suppose 1 ≤ i < j ≤ n.
(a) Find E[Yi Yj ]. (b) Find cov[Yi , Yj ] and corr[Yi , Yj ]. [Ans]
50.5 Suppose (Y1 , . . . , Yn ) ∼ Dir(k1 , . . . , kn+1 ) and (j1 , . . . , jn ) ∈ (0, ∞)n . Find
    E[Y1^(j1) · · · Yn^(jn)]
What does this result become in the special case E[Y1 · · · Yn ]? [Ans]
50.6 The inverted Dirichlet distribution or the Dirichlet distribution of the second kind.
Suppose (Y1 , . . . , Yn ) ∼ Dir(k1 , . . . , kn+1 ). Consider the transformation
    Zi = Yi/(1 − Y1 − · · · − Yn)    for i = 1, . . . , n.
(a) Show that
    Yi = Zi/(1 + Z1 + · · · + Zn)
and hence the transformation maps
{(y1 , . . . , yn ) : y1 > 0, . . . , yn > 0, y1 + · · · + yn < 1} −→ {(z1 , . . . , zn ) : z1 > 0, . . . , zn > 0}
(b) Show that the Jacobian of the transformation is

    |∂(z1, . . . , zn)/∂(y1, . . . , yn)| = (1 + z1 + · · · + zn)^(n+1)


(c) Show that the density of (Z1 , . . . , Zn ) is


    f(Z1,...,Zn)(z1, . . . , zn) = [Γ(k1 + · · · + kn+1)/(Γ(k1) · · · Γ(kn+1))] · z1^(k1−1) · · · zn^(kn−1) / (1 + z1 + · · · + zn)^(k1+···+kn+1)
for (z1, . . . , zn) ∈ (0, ∞)ⁿ.
(d) Find E[Zi ], var[Zi ] and cov[Zi , Zj ].
See also [N G et al.(2011)]. [Ans]
ANSWERS

Chapter 1 Section 2 on page 5 (exs-basic.tex)

2.1 (a) A = £1,000 × 1.04 × (1 + V2) × (1 + V3). Hence E[A] = £1,000 × 1.04³ = £1,124.864 or £1,124.86.
(b) For this case
    C = 1,000/[1.04 (1 + V2)(1 + V3)]    and    E[C] = (1,000/1.04) E[1/(1 + V2)] E[1/(1 + V3)]
Now
    E[1/(1 + V2)] = 50 ∫_{0.03}^{0.05} dv/(1 + v) = 50 [ln(1 + v)]_{0.03}^{0.05} = 50(ln 1.05 − ln 1.03)
    E[1/(1 + V3)] = 25 ∫_{0.02}^{0.06} du/(1 + u) = 25 [ln(1 + u)]_{0.02}^{0.06} = 25(ln 1.06 − ln 1.02)
Hence E[C] = 889.133375744 or £889.13. [←EX]
2.2 (a) Now (X 2 + Y 2 )/XY = U/(1 − U ) + (1 − U )/U = f1 (U ) where 1 − U > 0 because X and Y are positive. Also
f (U ) and V are independent; hence result. (b) (X + Y )2 /XY = (X 2 + Y 2 )/XY + 2 = f2 (U ). Also f2 (U ) and V are
independent. [←EX]
2.3 Clearly −2a < W < 2a. For w ∈ (−2a, 2a) we have
    fW(w) = ∫ fX(x)fY(w − x) dx
The integrand is non-zero when −a < x < a and −a < w − x < a; this implies w − a < x < w + a. Hence
    fW(w) = ∫_{max(−a,w−a)}^{min(a,w+a)} fX(x)fY(w − x) dx = [min(a, w + a) − max(−a, w − a)]/(4a²)
          = { (2a − w)/(4a²) if w > 0;  (w + 2a)/(4a²) if w < 0 } = (1/2a)(1 − |w|/2a)
Figure(3a). The shape of the triangular density.

[←EX]
2.4 Clearly 0 ≤ Y < 1; also dy/dx = 4x³ = 4y^(3/4).
    fY(y) = Σ_x fX(x)/|dy/dx| = Σ_x fX(x)/(4y^(3/4)) = Σ_x 1/(8y^(3/4)) = 1/(4y^(3/4))    [←EX]
2.5 Now (X − 1)2 ≥ 0; hence X 2 + 1 ≥ 2X. Because X > 0 a.e., we have X + 1/X ≥ 2 a.e. Hence result. [←EX]
2.6 (a)
Z ∞ Z ∞ Z ∞ Z ∞ Z t Z ∞
r−1 r−1 r−1
rx [1 − F (x)] dx = rx f (t) dt dx =
f (t)dx dt = tr f (t) dt = E[X r ]
rx
0 x=0 t=x t=0 x=0 t=0
R∞ R b−a
(b) Let Y = X − a. Then 0 ≤ Y ≤ b − a and by part (a) we have E[Y ] = 0 [1 − FY (y)] dy = 0 [1 − FY (y)] dy.
Now E[Y ] = E[X] − a and hence
Z b−a Z b−a Z b
E[X] = a + P[Y > y] dy = a + P[X > a + y] dy = a + P[X > y] dy
0 0 a
as required. [←EX]
2.7 (a) These are standard results in probability:
Z ∞ Z ∞
fW (w) = fY (w + x)fX (x) dx and fZ (z) = fY (z − x)fX (x) dx
x=−∞ x=−∞
(b) Now for v > 0 we have
P[V ≤ v] = P[−v ≤ W ≤ v] = FW (v) − FW (−v)
and hence by differentiation, we get
fV (v) = fW (v) +√fW (−v) √
√ √
Similarly, FT (t) = P[T ≤ t] = P[− t ≤ Y − X ≤ t] = FW ( t) − FW (− t) and by differentiation
1 √ 1 √
fT (t) = √ fW ( t) + √ fW (− t) [←EX]
2 t 2 t


2.8 Suppose W = X − Y . Then for w < 0 we have


ln(a) ln(a)
ew e2y ew
Z Z
fW (w) = fX (w + y)fY (y) dy = dy =
y=−∞ −∞ a2 2
For w > 0 we have
ln(a)−w ln(a)−w
ew e2y e−w
Z Z
fW (w) = fX (w + y)fY (y) dy = 2
dy =
y=−∞ −∞ a 2
−w
Hence density of |W | is e for w > 0. This is the exponential (1) distribution. [←EX]
2.9 (a) Suppose φ(x) = 1/x; then φ is a convex function  on (0, ∞). Hence if X is positive random variable with finite
expectation, then 1/E[X] ≤ E[1/X]. Hence E 1/Sn ≥ 1/E[Sn ] = 1/(nµ). Trivially, the result is still true if
E[X] = ∞.
(b) Z ∞ Z ∞ Z ∞Z ∞
n
E[e−tX ] dt = E[e−tSn ] dt = e−tx dSn (x) dt
0 0 t=0 x=0
Z ∞Z ∞ Z ∞  
−tx 1 1
= e dt dSn (x) = dSn (x) = E
x=0 t=0 x=0 x Sn
by using the Fubini-Tonelli theorem that the order of integration can be changed for a non-negative integrand. [←EX]
2.10 (a) The arithmetic mean-geometric mean inequality gives
x1 + · · · + xn √
≥ n x1 · · · xn for all x1 > 0, . . . , xn > 0.
n
Hence
1 1

x1 + · · · + xn 1/n
nx1 · · · xn
1/n

Using independence gives


  " #!n
1 1 1
E ≤ E
Sn n X1
1/n

Now for x > 0 we have 


1 1/x if 0 < x ≤ 1;

x1/n 1 if x ≥ 1.
Hence E[1/Sn ] is finite. (b) Because they have identical distributions, E[X1 /Sn ] = · · · = E[Xn /Sn ]. Hence
X1 + · · · + Xn X1 + · · · + Xj
           
Sn X1 Sj X1 j
1=E =E = nE Hence E =E = jE = [←EX]
Sn Sn Sn Sn Sn Sn n

2.11 (a) Recall |cov[X1 , X2 ]| ≤ var[X1 ] var[X2 ]; hence cov[ X/Y , Y ] is finite and hence E[X/Y ] and var[X/Y ] are both
finite. Also E[X] = E[ X/Y ] E[Y ] and, because Y and X 2 /Y 2 are independent, we have E[Y ]E[X 2 /Y 2 ] = E[X 2 /Y ].
Hence    2      2  
X X X X X X
E[Y ] var = E[Y ]E 2
− E[Y ]E E = E − E[X]E
Y Y Y Y Y Y
   
X X
= E[X]E − E[X]E =0
Y Y
Hence var[X/Y ] = 0 and hence result.
(b) Because X/Y is independent of Y and Y /X is independent of X, we have φX (t) = φX/Y (t)φY (t) and φY (t) =
φY /X (t)φX (t). Hence φX (t)φY (t)φX/Y (t)φY /X (t) = φX (t)φY (t) and so φX/Y (t)φY /X (t) = 1. But for any character-
istic function |φ(t)| ≤ 1. Hence |φY /X (t)| = 1 everywhere. This implies1 Y /X is constant almost everywhere and this
establishes the result. [←EX]
2.12 Squaring tn+1 = tn + xn+1 gives (a). For part(b):
(n + 1)t2n − nt2n+1
 
n+1 n+1 2 1
= 2 n(n + 1)x2n+1 + (n + 1)t2n − nt2n+1

(vn+1 − vn ) = xn+1 +
n n n(n + 1) n
1
= 2 n2 x2n+1 + t2n − 2nxn+1 tn

n
as required. [←EX]
2.13 Denote E[Y |X] by h(X). Then
h 2 i h 2 i
E Y − g(X) = E Y − h(X) + h(X) − g(X)
h 2 i    h 2 i
= E Y − h(X) + 2E Y − h(X) h(X) − g(X) + E h(X) − g(X)
But        
E Y − h(X) h(X) − g(X) |X = h(X) − g(X) E Y − h(X) |X = h(X) − g(X) .0
Hence result. [←EX]
¹ See for example, pages 18–19 in [Lukacs(1970)] and exercise 4 on page 298 in [Ash(2000)].

2.14 E[Y |X] = X/2; hence E[Y ] = E[X]/2 = 1/4. Now var( E[Y |X] ) = var[X]/4 = 1/48. Finally var[Y |X = x] =
x2 /12 and hence E[ var(Y |X) ] = E[X 2 ]/12 = 1/36. Hence var[Y ] = 1/36 + 1/48 = 7/144. [←EX]
2.15 Now    
 2  2
E Y − Yb = E Y − E(Y |X) + E(Y |X) − Yb
h  2 
2 i h  i
= E Y − E(Y |X) + 2E Y − E(Y |X) E(Y |X) − Y b + E E(Y |X) − Y
b

By equation(1.1a) on page 3 and the law of total expectation, the first term is E[var(Y |X)]. Applying the law of total
expectation to the second term gives
h  i n h   io
2E Y − E(Y |X) E(Y |X) − Yb = 2E E Y − E(Y |X) E(Y |X) − Yb |X
n  o
= 2E E(Y |X) − Yb × 0 = 0
Hence  2   2 
E Y − Yb = E[var(Y |X)] + E E(Y |X) − Yb

which is minimized when Yb = E(Y |X). [←EX]


2.16 (a) For the second result, just use E(XY |X) = XE(Y |X) = aX + bX 2 and take expectations. Clearly cov[X, Y ] =
b var[X]. Then E(Y |X) = a + bX = a + bµX + b(X − µX ) = µY + b(X − µX ). Then use b = cov(X, Y )/var(X) =
2 h i2
and b = ρσY /σX . Finally E Y − E(Y |X) = E Y − µY − ρ σσX
  
ρσY /σX . (b) var E(Y |X) = b2 σX
2 Y
(X − µX ) =
σY2 + ρ2 σY2 − 2ρ σσX
Y
cov[X, Y ] = σY2 + ρ2 σY2 − 2ρ2 σY2 as required.
(c) We have µX = c + dµY and µY = a + bµX . Hence µX = (c + ad)/(1 − bd) and µY = (a + bc)/(1 − bd).
Now E[XY ] = cµY + dE[Y 2 ] and E[XY ] = aµX + bE[X 2 ]. Hence cov[X, Y ] = dvar[Y ] and cov[X, Y ] = bvar[X].
Hence σY2 /σX 2
= b/d. Finally ρ = cov[X, Y ]/(σX σY ) = d σσX Y
and hence ρ2 = d2 b/d = bd. [←EX]
2
  2 2 2 2
2.17 Let g(a, b) = E ( Y − a − bX ) = E[Y ] − 2aµY + a − 2bE[XY ] + b E[X ] + 2abµX . Hence we need to solve
∂g(a, b) ∂g(a, b)
= −2µY + 2a + 2bµX = 0 and = −2E[XY ] + 2bE[X 2 ] + 2aµX = 0
∂a ∂b
This gives
E[XY ] − µX µY σY σY
b= 2
=ρ and a = µY − bµX = µY − ρ µX [←EX]
E[X ] − µX
2 σX σX
2.18
    fX(x) = (6/7) ∫_{0}^{1} (x + y)² dy = (2/7)[(x + 1)³ − x³] = (2/7)(3x² + 3x + 1)    for x ∈ [0, 1].
Similarly
    fY(y) = (2/7)(3y² + 3y + 1)    for y ∈ [0, 1].
Hence
    fX|Y(x|y) = 3(x + y)²/(3y² + 3y + 1)    and    fY|X(y|x) = 3(x + y)²/(3x² + 3x + 1)    for x ∈ [0, 1] and y ∈ [0, 1],
and so the best predictor of Y is
    E[Y|X = x] = [3/(3x² + 3x + 1)] ∫_{0}^{1} (x² + 2xy + y²) y dy = [3/(3x² + 3x + 1)] [x²y²/2 + 2xy³/3 + y⁴/4]_{y=0}^{y=1}
               = [3/(3x² + 3x + 1)] (x²/2 + 2x/3 + 1/4) = (6x² + 8x + 3)/[4(3x² + 3x + 1)]
(b) Now µX = µY = 9/14, E[X²] = (2/7) ∫_{0}^{1} (3x⁴ + 3x³ + x²) dx = (2/7)[3x⁵/5 + 3x⁴/4 + x³/3]_{x=0}^{x=1} = (2/7)(3/5 + 3/4 + 1/3) = 101/210 and
    σX² = σY² = E[X²] − (9/14)² = 101/210 − 81/196 = 199/2940.
Also
    E[XY] = (6/7) ∫_{0}^{1} ∫_{0}^{1} xy(x + y)² dx dy = (6/7) ∫_{y=0}^{1} y { ∫_{x=0}^{1} x(x + y)² dx } dy = (6/7) ∫_{y=0}^{1} y (1/4 + 2y/3 + y²/2) dy
          = (6/7) ∫_{y=0}^{1} (y/4 + 2y²/3 + y³/2) dy = (6/7)(1/8 + 2/9 + 1/8) = (6/7)(17/36) = 17/42
Hence cov[X, Y] = 17/42 − (9/14)² = −5/588 and ρ = −(5/588) × (2940/199) = −25/199. Hence the best linear predictor is
    µY + ρ (σY/σX)(X − µX) = (9/14)(1 − ρ) + ρX = 144/199 − (25/199)X
(c) See figure(18a) below. [←EX]

Figure(18a). Plot of best predictor (solid line) and best linear predictor (dashed line) for exercise 2.18.
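Since the plot itself is not reproduced here, the short sketch below (illustrative only) tabulates the best predictor and the best linear predictor obtained above on a grid of x values, which is enough to reproduce the comparison shown in the figure.

    # Tabulate the best predictor E[Y | X = x] and the best linear predictor of exercise 2.18.
    import numpy as np

    x = np.linspace(0.0, 1.0, 6)
    best = (6 * x**2 + 8 * x + 3) / (4 * (3 * x**2 + 3 * x + 1))
    best_linear = 144 / 199 - 25 * x / 199
    for xi, b, bl in zip(x, best, best_linear):
        print(f"x = {xi:.1f}   best = {b:.4f}   best linear = {bl:.4f}")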

2.19 Define G : [0, ∞) → [0, ∞) and f : [0, ∞) → [0, ∞) by


Z t
G(t) = g(u) du and f (t) = g(t)e−G(t)
0
Then f ≥ 0 and
Z M M
f (t) dt = −e−G(t) = e−G(0) − e−G(M ) == e−0 − e−G(M ) = 1 − e−G(M ) → 1 as M → ∞.

0 0
Hence f satisfies the conditions for being a probability density function. [←EX]
2.20 Now P[T > t] = P[T1 > t] P[T2 > t] and fT (t) = fT1 (t)[1 − FT2 (t)] + fT2 (t)[1 − FT1 (t)] Hence
fT (t) fT1 (t) fT2 (t)
hT (t) = = + = hT1 (t) + hT2 (t) [←EX]
P[T > t] 1 − FT1 (t) 1 − FT2 (t)
2.21 Recall, because hT is a hazard function, it follows that
Z ∞
hT (t) dt = ∞
0
and hence HT is a 1 − 1 transformation : [0, ∞) → [0, ∞). Using the usual formula for transformation of random
variable gives

dt fT (t)
= e−HT (t) t=H −1 (y)

fY (y) = fT (t) =

dy hT (t) t=H −1 (y)
T
T

= e−y [←EX]
3 3 2 2 3 3 2 3 3 2 3
2.22 (a) Now E[(X − µ) ] = E[X ] − 3µE[X ] + 3µ E[X] − µ = E[X ] − 3µE[X ] + 2µ = E[X ] − 3µσ − µ by using
E[X 2 ] = σ 2 + µ2 . Hence result.
(b) Let Y = a + bX. Then E[Y ] = a + bµ and var[Y ] = b2 σ 2 . Because b > 0 it follows that σY = bσ. Hence
skew[Y ] = E[(bX − bµ)3 ]/b3 σ 3 = skew[X].
(c) In this case σY = −bσ. Hence result. [←EX]
2.23 Now E[Z] = E[I]E[X] + E[1 − I]E[Y ] = − 2/3 + 2/3 = 0. Because I(1 − I) = 0, it follows that Z n = I n X n + (1 − I)n Y n
for n = 2, 3, . . . . Hence var[Z] = E[Z 2 ] = 5/3 + 6/3 = 11/3. Finally E[(Z − µ)3 ] = E[I 3 X 3 + (1 − I)3 Y 3 ] =
E[X 3 ]/3 + 2E[Y 3 ]/3.
Now if W ∼ N (µ, σ 2 ), then W is symmetric about µ and hence E[(W −µ)3 ] = 0. This implies 0 = E[W 3 ]−3µE[W 2 ]+
3µ2 E[W ] − µ3 = E[W 3 ] − 3µE[W 2 ] + 2µ3 = E[W 3 ] − 3µσ 2 − µ3 . Thus E[W 3 ] = 3µσ 2 + µ3 . Using this general result
shows that E[X 3 ] = −14 and E[Y 3 ] = 7. Hence skew[Z] = 0.
Now fZ = 31 fX + 23 fY . Hence
1 1 2 1 1 1 2 1
fZ (1) = √ e−9/2 + √ and fZ (−1) = √ e−1/2 + √ e−1
3 2π 3 4π 3 2π 3 4π
and these are clearly not equal. Hence the distribution of Z is not symmetric. [←EX]

2.24 (a) Now E[B] = E[B 2 ] = p and var[B] = p(1 − p). Hence σ = p(1 − p). Also E[(B − p)3 ] = p(1 − p)3 − (1 −
√ p)p =
3
2 3 4 3 4 2 3 2
p − 3p + 3p − p − p + p = p − 3p + 2p = p(1 − 3p + 2p ) = p(1 − p)(1 − 2p). Hence skew[B] = (1 − 2p)/ p(1 − p).
(b) Now E[(B − p)4 ] = p(1 − p)4 + (1 − p)p4 = p − 4p2 + 6p3 − 3p4 = p(1 − 4p + 6p2 − 3p3 ) = p(1 − p)(1 − 3p + 3p2 )
and hence κ[B] = (1 − 3p + 3p2 )/[p(1 − p)]. [←EX]
2.25 (a) Either E[X ] = ∞ and then κ[X] = E[(X − µ) ] = ∞, or both are finite and then E[(X − µ) ] = E[X − 4µX 3 +
4 4 4 4

6µ2 X 2 − 4µ3 X + µ4 ] = E[X 4 ] − 4µE[X 3 ] + 6µ2 (σ 2 + µ2 ) − 3µ4 and hence the result.
(b) Let Y = a + bX. Then E[Y ] = a + bµ and var[Y ] = b2 σ 2 and hence σY = |b|σ. Also E[(Y − µY )4 ] = b4 E[(X − µ)4 ].
Hence skew[Y ] = b skew[X]. [←EX]

Chapter 1 Section 6 on page 15 (exs-orderstable.tex)

6.1 (a) Clearly 0 < Z < Also Z = min{X1 , X2 , 1 − X1 , 1 − X2 }. Hence P[Z ≥ z] = P[X1 ≥ z, X2 ≥ z, 1 − X1 ≥
1/2.
z, 1 − X2 ≥ z] = P[X1 ≥ z, X2 ≥ z, X1 ≤ 1 − z, X2 ≤ 1 − z] = P[z ≤ X1 ≤ 1 − z] P[z ≤ X2 ≤ 1 − z] = (1 − 2z)2 .
Hence the density is fZ (z) = 4(1 − 2z) for 0 < z < 1/2.
(b) Now
 
X(1) if X(1) < 1 − X(2) X(1) if X(1) + X(2) < 1
Z= =
X(2) if X(1) > 1 − X(2) X(2) if X(1) + X(2) > 1
Hence P[Z ≤ z] = P[X(1) ≤ z, X(1) + X(2) < 1] + P[X(2) ≤ z, X(1) + X(2) > 1]. If z < 1/2 then P[Z ≤ z] =
2(z − z 2 ) + 0. If z > 1/2 then P[Z ≤ z] = 1/2 + 2(z 2 − z) + 1/2 = 1 + 2z 2 − 2z. Or: If z ≤ 1/2 we have
{Z ≤ z} = {(X1 ≤ z) ∩ (X2 further from end than X1 )} ∪ {(X2 ≤ z) ∩ (X1 further from end than X2 )}. Hence
2
P[Z ≤ z] = 2P[ (X1 ≤ z) ∩ (X2 further from end than X1 ) ] = P[ (X1 ≤  z) ∩ (X1 < X2 ≤ 1 − X1 ) ] = 2(z − z ). If z>
1/2, P[Z ≤ z] = 2P [ (X ≤ z) ∩ (X further from end than X ) ] = 2P (X ≤ 1/2) ∩ (X further from end than X ) +
 1 2  1  1 2  1
2P ( 1/2 < X1 ≤ z) ∩ (X2 further from end than X1 ) = 2 × 1/4 + 2P ( 1/2 < X1 < z) ∩ (1 − X1 < X2 < X1 ) = 1/2 +
2z 2 − 2z + 1/2 as before. [←EX]
n!
6.2 (a) Using equation(3.3b) on page 9 gives the density f (x) = (j−1)!(n−j)! xj−1 (1 − x)n−j . Recall B(j, n − j + 1) =
Γ(j)Γ(n−j+1)
Γ(n+1) = (j−1)!(n−j)!
n!
1
. Hence the density of Xj:n is f (x) = B(j,n−j+1) xj−1 (1 − x)n−j for x ∈ (0, 1) which is
the beta(j, n − j + 1) distribution. (b) E[Xj:n ] = j/(n + 1) by using the standard result of the expectation of a Beta
distribution. [←EX]
6.3 Now the density of (X1:4 , X2:4 , X3:4 , X4:4 ) is g(x1 , x2 , x3 , x4 ) = 4! = 24 for 0 < x1 < x2 < x3 < x4 < 1. Hence the
marginal density of (X2:4 , X3:4 , X4:4 ) is g2,3,4 (x2 , x3 , x4 ) = 24x2 for 0 < x2 < x3 < x4 . Hence the marginal density of
(X3:4 , X4:4 ) is g3,4 (x3 , x4 ) = 12x23 for 0 < x3 < x4 < 1. Hence
Z Z
P[X3:4 + X4:4 < 1] = 12x2 I[0 < x < 1, 0 < y < 1, x < y, x + y < 1] dx dy
Z 1/2 Z 1−x Z 1/2
1
= 12x2 dy dx = 12x2 (1 − 2x) dx = [←EX]
x=0 y=x x=0 8
6.4 Now Xn:n = Yn , X(n−1):n = Yn−1 Yn , X(n−2):n = Yn−2 Yn−1 Yn , . . . , X1:n = Y1 · · · Yn . The absolute value of the
Jacobian of the transformation
is
∂(x1 , . . . , xn ) 2 3 n−1
∂(y1 , . . . , yn ) = (y2 · · · yn )(y3 · · · yn ) · · · (yn−1 yn )(yn ) = y2 y3 y4 · · · yn

Hence the density of the vector (Y1 , . . . , Yn ) is


f (y1 , . . . , yn ) = n!y2 y32 y43 · · · ynn−1 for 0 < y1 < 1, . . . , 0 < yn < 1.
Because the density factorizes, it follows that Y1 , . . . , Yn are independent. Also
fY1 (y1 ) = 1 fY2 (y2 ) = 2y2 fY3 (y3 ) = 3y32 . . . fYn (yn ) = nynn−1
It is easy to check that V1 = Y1 , V2 = Y22 , . . . , Vn = Ynn are i.i.d. random variables with the uniform(0, 1) distribution.
[←EX]
6.5 Now the density of (X1:4 , X2:4 , X3:4 , X4:4 ) is g(x1 , x2 , x3 , x4 ) = 4! = 24 for 0 < x1 < x2 < x3 < x4 < 1. Hence
the marginal density of (X2:4 , X3:4 , X4:4 ) is g2,3,4 (x2 , x3 , x4 ) = 24x2 for 0 < x2 < x3 < x4 < 1. Hence the marginal
R 1−y
density of (X2:4 , X4:4 ) is g2,4 (x2 , x4 ) = 24x2 (x4 − x2 ) for 0 < x2 < x4 < 1. Hence fY (y) = x2 =0 g2,4 (x2 , x2 + y) dx2 =
R 1−y
x2 =0
24x2 y dx2 = 12y(1 − y)2 for 0 < y < 1.
The marginal density of (X1:4 , X2:4 , X3:4 ) is g1,2,3 (x1 , x2 , x3 ) = 24(1 − x3 ) for 0 < x1 < x2 < x3 . Hence the marginal
R 1−z
density of (X1:4 , X3:4 ) is g1,3 (x1 , x3 ) = 24(x3 − x1 )(1 − x3 ) for 0 < x1 < x3 < 1. Hence fZ (z) = x1 =0 g1,3 (x1 , x1 +
R 1−z
z) dx1 = x1 =0 24z(1 − x1 − z) dx1 = 12z(1 − z)2 for 0 < z < 1. So both have the same distribution. [←EX]
6.6 Now the density of (X1:3 , X2:3 , X3:3 ) is g(x1 , x2 , x3 ) = 3! = 6 for 0 < x1 < x2 < x3 < 1. Hence the marginal
density of (X1:3 , X3:3 ) is g1,3 (x1 , x3 ) = 6(x3 − x1 ) for 0 < x1 < x3 < 1. Hence the conditional density of X2:3 given
(X1:3 , X3:3 ) = (x1 , x3 ) is
6 1
fX2:3 |(X1:3 ,X3:3 ) (x2 |(x1 , x3 ) = = for x2 ∈ (x1 , x3 ).
6(x3 − x1 ) x3 − x1
This is the uniform distribution on (x1 , x3 ). [←EX]
2
6.7 Now P[Z ≤ z] = P[X ≤ x] P[Y ≤ z] = {FX (z)} and hence fZ (z) = 2fX (z)FX (z). Hence
Z b Z b b Z b
2 2
E[Z] = zfZ (z) dz = z2fX (z)FX (z) dz = z {FX (z)} − {FX (z)} dz
a a a a
by using integration by parts.
2
(b) Now P[V > v] = P[X > v]P[Y > v] = {1 − FX (v)} and hence fV (v) = 2fX (v) {1 − FX (v)} and hence
b Z b
2 2
E[V ] = −v {1 − FX (v)} + {1 − FX (v)} dv
a a

(c) Adding gives


" # " #
Z b Z b
b
E[Z] + E[V ] = 2 vFX (v)|a − FX (v) dv = 2 b − FX (v) dv = 2E[X]
a a

by using part (b) of exercise 2.6. Of course, the result is immediate from the fact that min{x, y} + max{x, y} = x + y.
[←EX]
6.8 Now E[ X1 |X1:n , . . . , Xn:n ] = E[ Xj |X1:n , . . . , Xn:n ] for j = 2, . . . , n. Hence E[ X1 |X1:n , . . . , Xn:n ] = E[ X1 + · · · +
Xn |X1:n , . . . , Xn:n ]/n = E[ X1:n + · · · + Xn:n | X1:n , . . . , Xn:n ]/n = ( X1:n + · · · + Xn:n ) /n. [←EX]
6.9 By equation(3.3b) on page 9 we have
Z 0 Z ∞!  
n r−1 n−r
E[Xr:n ] = + r xfX (x) {FX (x)} {1 − FX (x)} dx
−∞ 0 r
The first integral. First set t = −x; then
 Z 0
n r−1 n−r
r xfX (x) {FX (x)} {1 − FX (x)} dx
r x=−∞
 Z ∞
n r−1 n−r
= −r tfX (t) {FX (−t)} {1 − FX (−t)} dt
r t=0
For t > 0 we have fY (t) = 2fX (t); also FX (−t) = 1 − FX (t) = 21 [1 − FY (t)] and 1 − FX (−t) = 12 {1 + FY (t)}. Hence
the integral is
 Z ∞
r n r−1 n−r
− n tfY (t) {1 − FY (t)} {1 + FY (t)} dt
2 r 0
n−r   Z ∞
r X n n−r r−1 `
=− n tfY (t) {1 − FY (t)} {FY (t)} dt
2 r ` 0
`=0
n−r Z ∞
1 X n! r−1 `
=− n tfY (t) {1 − FY (t)} {FY (t)} dt
2 (r − 1)!`!(n − r − `)! 0
`=0
Using equation(3.3b) on page 9 again shows that it equals
n−r   n  
1 X n 1 X n
=− n E[Y(`+1):(r+`) ] = − n E[Y(k−r+1):k ] by setting k = r + `.
2 r+` 2 k
`=0 k=r
Similarly
 Z ∞
n r−1 n−r
r xfX (x) {FX (x)} {1 − FX (x)} dx
r 0
  Z ∞
n 1 r−1 n−r
=r xfY (x) {1 + FY (x)} {1 − FY (x)} dx
r 2n 0
r−1  Z ∞
r−1
  X
n 1 r−1−k n−r
=r xfY (x) {FY (x)} {1 − FY (x)} dx
r 2n k 0 k=0
r−1
1 X n! (r − k − 1)!(n − r)!
= E[Y(r−k):(n−k) ]
2n (n − r)!k!(r − k − 1)! (n − k)!
k=0
r−1  
1 X n
= n E[Y(r−k):(n−k) ]
2 k
k=0
Part (b) is a trivial extension of part (a). For part (c), use equation(3.5a) on page 10 which is, for u < v
 j−1  r−1−j  n−r
f(j:n,r:n) (u, v) = cfX (u)fX (v) FX (u) FX (v) − FX (u) 1 − FX (v)
n!
where c =
(j − 1)!(r − 1 − j)!(n − r)!
Z ∞ Z ∞
 j−1  r−1−j  n−r
E[Xj:n Xr:n ] = c uvfX (u)fX (v) FX (u) FX (v) − FX (u) 1 − FX (v) dvdu
u=−∞ v=u
Write
Z ∞ Z ∞ Z ∞ Z ∞ Z 0 Z ∞ Z 0 Z 0
= + +
u=−∞ v=u u=0 v=u u=−∞ v=0 u=−∞ v=u
The middle integral. First set t = −u, hence t > 0 and
Z 0 Z ∞
 j−1  r−1−j  n−r
c uvfX (u)fX (v) FX (u) FX (v) − FX (u) 1 − FX (v) dvdu
u=−∞ v=0
Z ∞ Z ∞  j−1  r−1−j  n−r
= −c tvfX (t)fX (v) FX (−t) FX (v) − FX (−t) 1 − FX (v) dv dt
t=0 v=0
For t > 0 and v > 0 we have FX (−t) = 1 − FX (t) = 21 [1 − FY (t)], 1 − FX (v) = 12 [1 − FY (v) and FX (v) − FX (−t) =
1 1 1
2 [1 + FY (v)] − 2 [1 − FY (t)] = 2 [FY (v) + FY (t)]. Hence the middle integral is
Z ∞Z ∞
c  j−1  r−1−j  n−r
=− n tvfY (t)fY (v) 1 − FY (t) FY (v) + FY (t) 1 − FY (v) dv dt
2 t=0 v=0
r−1−j  Z ∞
c X r−1−j
=− n tfY (t){FY (t)}` {1 − FY (t)}j−1 dt ×
2 ` t=0
`=0
Z ∞
vfY (v){FY (v)}r−1−j−` {1 − FY (v)}n−r dv
v=0
r−1−j
X  r−1−j `!(j − 1)! (r − j − ` − 1)!(n − r)!

c
=− n E[Y(`+1):(`+j) ] E[Y(r−j−`):(n−j−`) ]
2 ` (` + j)! (n − j − `)!
`=0
r−1−j
c X (r − 1 − j)!(j − 1)!(n − r)!
=− E[Y(`+1):(`+j) ] E[Y(r−j−`):(n−j−`) ]
2n (` + j)!(n − j − `)!
`=0
r−1−j
X  n 
1
=− E[Y(`+1):(`+j) ] E[Y(r−j−`):(n−j−`) ]
2n j+`
`=0
r−1  
1 X n
=− E[Y(k−j+1):k ] E[Y(r−k):(n−k) ] by setting k = j + `.
2n k
k=j

The first integral. For x > 0 and y > 0 we have FX (x) = 21 [1+FY (x)], 1−FX (x) = 12 [1−FY (x)] and FX (y)−FX (x) =
1
2 [FY (y) − FY (x)]. Hence the first integral is
Z ∞Z ∞
 j−1  r−1−j  n−r
c uvfX (u)fX (v) FX (u) FX (v) − FX (u) 1 − FX (v) dvdu
u=0 v=u
Z ∞Z ∞
c  j−1  r−1−j  n−r
= n uvfY (u)fY (v) 1 + FY (u) FY (v) − FY (u) 1 − FY (v) dvdu
2 u=0 v=u
j−1  Z ∞Z ∞
c X j−1  `  r−1−j  n−r
= n uvfY (u)fY (v) FY (u) FY (v) − FY (u) 1 − FY (v) dvdu
2 ` u=0 v=u
`=0
j−1
c X (j − 1)! `!(r − j − 1)!(n − r)!
= n E[Y(`+1):(n−j+`+1) Y(r−j+`+1):(n−j+`+1) ]
2 `!(j − 1 − `)! (n − j + ` + 1)!
`=0
j−1
1 X n!
= E[Y(`+1):(n−j+`+1) Y(r−j+`+1):(n−j+`+1) ]
2n (j − 1 − `)!(n − j + ` + 1)!
`=0
j−1  
1 X n
= E[Y(`+1):(n−j+`+1) Y(r−j+`+1):(n−j+`+1) ]
2n n−j+1+`
`=0
j−1  
1 X n
= E[Y(j−k):(n−k) Y(r−k):(n−k) ] by setting k = j − ` − 1.
2n k
k=0
The third integral. First set t = −u and w = −v. Then the integral becomes
Z 0 Z 0
 j−1  r−1−j  n−r
c uvfX (u)fX (v) FX (u) FX (v) − FX (u) 1 − FX (v) dvdu
u=−∞ v=u
Z ∞Z t
 j−1  r−1−j  n−r
=c twfX (t)fX (w) FX (−t) FX (−w) − FX (−t) 1 − FX (−w) dw dt
t=0 w=0
Now for t > 0 and w > 0 we have FX (−t) = 1 − FX (t) = 21 [1 − FY (t)], 1 − FX (−w) = 12 [1 + FY (w)] and
FX (−w) − FX (−t) = 12 [FY (t) − FY (w)]. Hence the third integral is
Z ∞Z t
c  j−1  r−1−j  n−r
twfY (t)fY (w) 1 − FY (t) FY (t) − FY (w) 1 + FY (w) dw dt
2n t=0 w=0
n−r  Z ∞Z t
c X n−r  `  r−1−j  j−1
= n twfY (t)fY (w) FY (w) FY (t) − FY (w) 1 − FY (t) dw dt
2 ` t=0 w=0
`=0

n−r
c X (n − r)! `!(r − j − 1)!(j − 1)!
= n
E[Y(`+1):(r+`) Y(r+`−j+1):(r+`) ]
2 `!(n − r − `)! (r + `)!
`=0
n−r n−r  
1 X n! 1 X n
= n E[Y(`+1):(r+`) Y(r+`−j+1):(r+`) ] = n E[Y(`+1):(r+`) Y(r+`−j+1):(r+`) ]
2 (n − r − `)!(r + `)! 2 r+`
`=0 `=0
n  
1 X n
= n E[Y(k−r+1):k Y(k−j+1):k ] by setting k = r + `.
2 k
k=r
Part (d) is a trivial extension of part (c). [←EX]
6.10 ⇐ The joint density of (X1:2 , X2:2 ) is g(y1 , y2 ) = 2f (y1 )f (y2 ) = 2λ2 e−λ(y1 +y2 ) for 0 <
y 1 < y 2 . Now consider the
∂(w,y)
transformation to (W, Y ) = (X2:2 − X1:2 , X1:2 ). The absolute value of the Jacobian is ∂(y1 ,y2 ) = | − 1| = 1. Hence
f(W,Y ) (w, y) = 2λ2 e−λ(w+y+y) = 2λe−2λy λe−λw = fY (y)fW (w) where the density of X1:2 is fY (y) = 2λe−2λy . The
fact that the joint density is the product of the marginal densities implies W and Y are independent.
⇒ P[X2:2 − X1:2 > y|X1:2 = x] = P[X2:2 > x + y|X1:2 = x] = 1−F (x+y)
1−F (x) and this is independent of x. Taking x = 0
gives 1 − F (x + y) = (1 − F (x))(1 − F (y)) and F is continuous. Hence there exists λ > 0 with F (x) = 1 − e−λx .
[←EX]
6.11 By equation(3.2a) on page 8, the density of the vector is (X1:n , X2:n , . . . , Xn:n ) is g(x1 , . . . , xn ) = n!f (x1 ) · · · f (xn )
for 0 ≤ x1 ≤ x2 · · · ≤ xn . The transformation to (Y1 , Y2 , . . . , Yn ) has Jacobian with absolute value

∂(y1 , . . . , yn ) 1
∂(x1 , . . . , xn ) = y n−1

1
Hence for y1 ≥ 0 and 1 ≤ y2 ≤ · · · ≤ yn , the density of the vector (Y1 , Y2 , . . . , Yn ) is
h(y1 , . . . , yn ) = n!y1n−1 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn )
(b) Integrating yn from yn−1 to ∞ gives
h(y1 , . . . , yn−1 ) = n!y1n−2 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn−1 ) 1 − F (y1 yn−1 )
 

Then integrating yn−1 over yn−2 to ∞ gives


 2
1 − F (y1 yn−2 )
h(y1 , . . . , yn−2 ) = n!y1n−3 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn−2 )
2
and by induction
[1 − F (y1 y2 )]n−2
h(y1 , y2 ) = n!y1 f (y1 )f (y1 y2 )
(n − 2)!
n−1
h(y1 ) = nf (y1 ) [1 − F (y1 )]
as required. [←EX]
6.12 P[N = 1] = 1/2, P[N = 2] = P[X1 > X0 < X2 ] = 1/6, P[N = 3] = P[X1 < X0 , X2 < X0 , X3 > X0 ] = 2P[X1 < X2 <
P∞ 1
X0 < X3 ] = 2/4! = 1/12. In general, P[N = k] = (k − 1)!/(k + 1)! = 1/k(k + 1) for n = 1, 2, . . . . Hence E[N ] = k=1 k+1
which diverges. [←EX]
6.13 For integers m ≥ 1 and n ≥ 1, the random variable Snm is the sum of m independent random variables each of which
has the same distribution as an + bn X. Hence
d d
Snm = man + bn (X1 + · · · + Xm ) = man + bn Sm = man + bn am + bn bm X
d
Clearly we also have Snm = nam + bm an + bn bm X. Hence an (m − bm ) = am (n − bn ). But bn = n1/α and α 6= 1. Hence
there must exist β ∈ R such that
an = β(n − bn )
Hence if Y1 , . . . , Yn are i.i.d. random variables with the same distribution as X − β, then
d d d
Y1 + · · · + Yn = Sn − nβ = an + bn X − nβ = bn (X − β) = bn Y
Hence X − β has a strictly stable distribution. [←EX]
d d
6.14 Now Snm = anm + nmX and Snm = man + nam + nmX; hence anm = man + nam or
anm an am
= + which implies an = ln(nn ) = n ln(n)
nm n m
Hence
d
X1 + · · · + Xn = n ln(n) + nX [←EX]
6.15 Let Sn = X1 + · · · + Xn where X1 , . . . , Xn are i.i.d. random variables with the same distribution as X. Then Sm +
(Sn+m − Sm ) = Sn+m where Sm and Sn+m − Sm are independent; hence
d d
bm X1 + bn X2 = bn+m X which implies m1/α X1 + n1/α X2 = (n + m)1/α X [←EX]

6.16 If φX1 +X2 (t) denotes the characteristic function of X1 + X2 , then φX1 +X2 (t) = φX1 (t)φX2 (t).
First suppose α = 1. Then
 
2iβ1 2iβ2
φX1 +X2 (t) = exp itc − d1 |t|[1 + sgn(t) ln(|t|)] − d2 |t|[1 + sgn(t) ln(|t|)]
π π
2id1 |t|β1 2id2 |t|β2
 
= exp itc − d|t| − sgn(t) ln(|t|) − sgn(t) ln(|t|)
π π
 
2i|t|(d1 β1 + d2 β2 )
= exp itc − d|t| − sgn(t) ln(|t|)
π
 
2i|t|dβ
= exp itc − d|t| − sgn(t) ln(|t|) as required.
π
Now suppose α 6= 1; then
  πα   πα  
φX1 +X2 (t) = exp itc − dα α
1 |t| [1 − iβ1 sgn(t) tan ] − dα α
2 |t| [1 − iβ2 sgn(t) tan ]
2  2
 πα  πα 
= exp itc − dα |t|α + idα α
1 |t| β1 sgn(t) tan + idα α
2 |t| β2 sgn(t) tan
  πα2 2
α α α α
= exp itc − d |t| + id β|t| sgn(t) tan as required. [←EX]
2
6.17 (a) Substituting γt for t in equation(4.3b) gives the result.
(b) By part (a), γ1 X1 has a stable distribution with characteristic exponent α 6= 1 and characteristic function with
parameters {γ1 c, γ1 d, β} and γ2 X2 has a stable distribution with characteristic exponent α 6= 1 and characteristic function
with parameters {γ2 c, γ2 d, β}. Then use exercise 6.16. Hence γ1 X1 + γ2 X2 has a stable distribution with characteristic
exponent α and a characteristic function with parameters {c0 , d0 , β 0 } where
c0 = (γ1 + γ2 )c d0 = (γ1α + γ2α )1/α d and β 0 = β
This is the characteristic function of the distribution γ1 + γ2 − (γ1α + γ2α )1/α c + (γ1α + γ2α )1/α X as required.
 

(c) Now |t| sgn(t) = t; hence γ1 X1 has characteristic function


  
2β 2iβ
φγ1 X1 (t) = exp itγ1 c − it γ1 ln(γ1 ) − dγ1 |t| 1 + sgn(t) ln(|t|)
π π
and hence, if γ 0 = γ1 + γ2 ,
  
2β 2iβ
φγ1 X1 +γ2 X2 (t) = exp itγ 0 c − it (γ1 ln(γ1 ) + γ2 ln(γ2 )) − dγ 0 |t| 1 + sgn(t) ln(|t|)
π π
and this is the characteristic function of a + γ 0 X where
2β  0
γ ln(γ 0 ) − γ1 ln(γ1 ) − γ2 ln(γ2 )

a= [←EX]
π
d d d
6.18 (a) Now Snm = bnm X. Using the hint shows that Snm = bn Y1 + · · · + bn Ym = bm bn X. Hence bmn = bm bn . Using
b2 = b2 b1 shows that b1 = 1.
d
Now br = 1 implies X1 + · · · + Xr = X; using characteristic functions shows φX (t) = 1 which implies X = 0 constant,
d
a contradiction. Suppose br < 1, then if n = rj then X1 + · · · + Xn = [b(r)]j X → 0 as j → ∞. Using characteristic
functions shows X = 0, constant which is a contradiction. Hence br > 1.
(b) Now Sm+n = Sm + (Sm+n − Sm ) where Sm and Sm+n − Sm are independent. Hence result. [←EX]

Chapter 2 Section 8 on page 25 (exs-uniform.tex)

8.1
 b n Z b−a
a+b
Z
1 1 2
E[X − µ)n ] = x− dx = v n dv
b−a a 2 b − a a−b

n
 2
1 − (−1)n+1

n+1 n+1 − a)
(b − a) − (a − b) (b
= = [←EX]
2n+1 (b − a)(n + 1) (n + 1)2n+1
R1
P Y ≤ xz2 6x(1 − x) dx. Also
 
8.2 For 0 < z < 1 we have P[Z ≤ z] = x=0

h zi z 2

if x2 > z
P Y ≤ 2 = x2
x 1 if x2 < z
Hence √
Z 1
 z 2 Z z
P[Z ≤ z] = √ 2
6x(1 − x) dx + 6x(1 − x) dx
x= z x x=0
Z 1
1−x
= 6z 2 √ 3
dx + 3z − 2z 3/2
x= z x
  1
2 1 1
= 6z − 2 + + 3z − 2z 3/2
2x x x=√z
 
1 1 1
= 6z 2 + −√ + 3z − 2z 3/2
2 2z z
= 3z 2 + 6z − 8z 3/2

and hence fZ (z) = 6z + 6 − 12z 1/2 = 6(z + 1 − 2z 1/2 ) = 6( z − 1)2 for 0 < z < 1. [←EX]
−x −x
8.3 P[− ln X ≤ x] = P[ln X ≥ −x] = P[X ≥ e ] = 1 − e . Hence Y ∼ exponential (1). [←EX]
R1
8.4 (a) Distribution function: FZ (z) = P[XY ≤ z] = 0 P[X ≤ /y] dy. But P[X ≤ /y] = 1 if y ≤ z and = /y if z < y.
z z z
Rz R1
Hence FZ (z) = 0 dy + z z/y dy = z(1 − ln z) for 0 < z < 1.
Density: By differentiating FZ (z) we get fz (z) = − ln z for 0 < z < 1. Alternatively, consider
the
transformation
∂(z,v)
Z = XY and V = Y . Then 0 < Z < 1 and 0 < V < 1. The absolute value of the Jacobian is ∂(x,y) = y = v. Hence
f(Z,V ) (z, v) = f(X,Y ) ( z/v, v)/v. Hence
Z 1 Z 1
fX ( z/v) dv
fZ (z) = dv = = − ln z for 0 < z < 1.
v=0 v v=z v
Pn
(b) Now − ln Pn = j=1 Yj where Y1 , . . . , Yn are i.i.d. with the exponential (1) distribution, after using the result in
exercise 8.3. Hence Z = − ln Pn ∼ gamma(n, 1) and Z has density z n−1 e−z /Γ(n) for z > 0. Transforming back to Pn
shows the density of Pn is f (x) = (− ln x)n−1 /Γ(n) = (ln 1/x)n−1 /Γ(n) for x ∈ (0, 1). [←EX]
8.5 (a) First n = 2. Now the density of (X1 , X2 ) is
1
f(X1 ,X2 ) (x1 , x2 ) = fX2 |X1 (x2 |x1 )fX1 (x1 ) = for 0 < x2 < x1 < 1.
x1
Z 1 1  
1
fX2 (x2 ) = f(X1 ,X2 ) (x1 , x2 ) dx1 = − ln x1 = ln for 0 < x2 < 1.

x1 =x2 x1 =x2 x2
Assume true for n − 1; to prove for n. Now Xn ∼ uniform(0, Xn−1 ); hence
1
f(Xn−1 ,Xn ) (xn−1 , xn ) = fXn |Xn−1 (xn |xn−1 )fXn−1 (xn−1 ) = fX (xn−1 )
xn−1 n−1
n−2
1 ln 1/xn−1
= for 0 < xn < xn−1 < 1.
xn−1 (n − 2)!
and hence
Z 1 n−2 n−1
1 ln 1/xn−1 ln 1/xn
fXn (xn ) = dxn−1 = for 0 < xn < 1.
xn−1 =xn xn−1 (n − 2)! (n − 1)!
As required.
(b) Now Xn = Xn−1 Z where Z ∼ uniform(0, 1). Similarly, by induction, Xn = X1 Z1 · · · Zn−1 which is the product of
n random variables with the uniform(0, 1) distribution. Hence result by part (b) of exercise 8.4 [←EX]
8.6
Z b  
1 1
H(X) = − ln dx = ln(b − a) [←EX]
a b−a b−a
8.7 (a)
b
1 b min{b, v} − max{0, v − a}
Z Z
fV (v) = fX (v − y)fY (y) dy =fX (v − y)dy =
0 b 0 ab
by using fX (v − y) = 1/a when 0 < v − y < a; i.e. when v − a < y < v. Suppose a < b. Then

 v/ab if 0 < v < a;
fV (v) = 1/b if a < v < b;
(a + b − v)/ab if b < v < a + b.


(b) Now −a < W < b.


b
1 b min{a, b − w} − max{0, −w}
Z Z
fW (w) = fY (y + w) dy = fY (y + w)fX (y) dy =
0 a 0 ab
by using fY (y + w) = 1/b when 0 < y + w < b; i.e. when −w < y < b − w. Suppose a < b. Then

 (a + w)/ab if −a < w < 0;
fW (w) = 1/b if 0 < w < b − a;
(b − w)/ab if b − a < w < b.

Figure(7a). Plot of density of V = X + Y (left) and density of W = Y − X (right).

[←EX]
8.8 Now ( a−t b−t
a b for 0 ≤ t ≤ min{a, b};
P[V ≥ v] = P[X ≥ v]P[Y ≥ v] =
0 for t ≥ min{a, b}.
(a−t)(b−t)
(
1− ab if 0 ≤ t ≤ min{a, b};
FV (t) =
1 t ≥ min{a, b}.
a + b − 2t
fV (t) = for 0 ≤ t ≤ min{a, b}.
ab
Finally
Z min{a,b} Z a a
b−x

P[Y > x]
Z
1 1 − a/2b if a ≤ b;
P[V = X] = P[Y > X] = dx = dx + dx = [←EX]
0 a 0 ab min{a,b} a 1 − b/2a if a > b.
8.9 Let V denote the arrival time. If you take the bus on route 2 then E[V ] = V = t0 + α + β. If you wait for a bus on route 1,
then E[V ] = t0 + α + E[X2 − t0 |X2 > t0 ]. But the distribution of (X2 − t0 |X2 > t0 ) is uniform(0, a − t0 ), and hence
E[X2 − t0 |X2 > t0 ] = (a − t0 )/2. Hence route 1 is faster if (a − t0 )/2 < β and route 2 is faster if (a − t0 )/2 > β.
[←EX]
8.10 For w ∈ (0, 1) we have
X∞ X∞
P[W ≤ w] = P[W ≤ w, bU + V c = k] = P[U + V ≤ w + k, bU + V c = k]
k=−∞ k=−∞
X∞ ∞ Z
X 1
= P[V ≤ w + k − U, bU + V c = k] = P[u + V ≤ w + k, k ≤ u + V < k + 1] du
k=−∞ k=−∞ u=0
X∞ Z 1 ∞
X Z 1
= P[k ≤ u + V ≤ w + k] du = P[k − u ≤ V ≤ w + k − u] du
k=−∞ u=0 k=−∞ u=0

X∞ Z 1 Z w+k−u ∞
X
= fV (v)dv du = In
k=−∞ u=0 v=k−u k=−∞
where
Z k+w  
In = min{w + k − v, 1} − max{0, k − v} fV (v) dv
v=k−1
Z k−1+w Z k Z k+w
= [1 − k + v]fV (v)dv + wfV (v)dv + (w + k − v)fV (v)dv
v=k−1 v=k−1+w v=k
"Z #
Z k+w k−1+w Z k+w
=w fV (v) dv + vfV (v)dv − vfV (v)dv +
v=k−1+w v=k−1 v=k
" Z #
k+w Z k−1+w
k fV (v)dv − (k − 1) fV (v) dv
v=k v=k−1

and hence
n
"Z #
X Z n+w −n−1+w Z n+w
In = w fV (v)dv + vfV (v)dv − vfV (v)dv +
k=−n v=−n−1+w v=−n−1 v=n
" Z #
n+w Z −n−1+w
n fV (v)dv − (−n − 1) fV (v)dv
n v=−n−1
R n+w R n+w
Now E|V | < ∞; hence E[V + ] < ∞. Hence n vfV (v) dv → 0 as n → ∞. In turn, this implies n n fV (v) dv → 0
as n → ∞. Similarly for V − and
Pn the other two
R∞ integrals.
Hence P[W ≤ w] = limn→∞ k=−n In = w v=−∞ fV (v) dv = w as required. [←EX]
8.11 (a) P[V ≥ t] = (1 − t)2 . Hence FV (t) = P[V ≤ t] = 2t − t2 and fV (t) = 2(1 − t) for 0 ≤ t ≤ 1. Also E[V ] = 1/3.
FW (t) = P[W ≤ t] = t2 and fW (t) = 2t for 0 ≤ t ≤ 1. Also E[W ] = 2/3.
(b) For v < w we have P[V ≤ v, W ≤ w] = P[W ≤ w] − P[W ≤ w, V > v] = w2 − (w − v)2 = v(2w − v); whilst for
v > w we have P[V ≤ v, W ≤ w] = P[W ≤ w] = w2 . Hence
∂2
f(V,W ) (v, w) = P[V ≤ v, W ≤ w] = 2 for 0 ≤ v < w ≤ 1.
∂v∂w
(c) For v < w we have P[W ≤ w|V ≤ v] = (2w − v)/(2 − v) and for v > w we have P[W ≤ w|V ≤ v] = w2 /(2v − v 2 ).
Hence 
2/(2 − v) if v < w;
fW (w|V ≤ v) =
2w/(2v − v 2 ) if v > w.
Z v Z 1
2 2 2 2v 2 1 − v2 3 − v2
E[W |V ≤ v] = w dw + w dw = + =
2v − v w=0
2 2 − v w=v 3(2 − v) 2 − v 3(2 − v)
Note that E[W |V ≤ 1] = 2/3 = E[W ] and E[W |V ≤ 0] = 1/2 = E[X]. [←EX]
8.12 (a) Without loss of generality, suppose we measure distances clockwise from some fixed origin O on the circle. Let D
denote the length of the interval (X1 , X2 ). Then 0 < D < 1 and
 
X2 − X1 if X2 > X1 ; 0 if 1 > X2 − X1 > 0;
D= = X2 − X1 +
1 − X1 + X2 if X1 > X2 . 1 if −1 < X2 − X1 < 0.
The first line corresponds to points in the clockwise order O → X1 → X2 and the second line to points in the clockwise
order O → X2 → X1 .
So for y ∈ (0, 1) we have
P[D ≤ y] = P[0 ≤ X2 − X1 ≤ y] + P[X2 − X1 ≤ y − 1]
= P[X1 ≤ X2 ≤ min{X1 + y, 1}] + P[X2 ≤ X1 + y − 1]
"Z # "Z #
1−y Z 1 1
= y fX1 (x1 ) dx1 + (1 − x1 )fX1 (x1 ) dx1 + (x1 + y − 1)fX1 (x1 )dx1
x1 =0 x1 =1−y x1 =1−y

1 (1 − y)2 1 (1 − y)2
   
= y(1 − y) + y − + + y(y − 1) + − =y
2 2 2 2
as required.
(b) Without loss of generality, take Q to be the origin and measure clockwise. So we have either Q → X1 → X2 or
Q → X2 → X1 and both of these lead to the same probability. Consider the first case: Q → X1 → X2 . Then for
t ∈ (0, 1) we have
Z t Z t
t2
P[L ≤ t, Q → X1 → X2 ] = P[X2 ≥ 1 − (t − x1 )]fX1 (x1 ) dx1 = (t − x1 )dx1 =
x1 =0 x1 =0 2
2
Similarly for Q → X2 → X1 . Hence P[L ≤ t] = t and fl (t) = 2t for t ∈ (0, 1). Finally, E[L] = /3. 2 [←EX]
2
n 2
n
8.13 For x ∈ (0, r) we have P[D > x] = 1 − /r x 2 and hence P[D ≤ x] = 1 − 1 − /r . Hence
x 2

2 n−1
 
2nx x
fD (x) = 2 1 − 2 for x ∈ (0, r).
r r
Z r n−1 Z 1
2nx2 x2

E[D] = 2
1 − 2
dx = 2nrv 2 (1 − v 2 )n−1 dv
0 r r 0
Z 1  
Γ 3/2 Γ(n) Γ 3/2 Γ(n + 1)
1/2 n−1
= nr u (1 − u) du = nr  =r  [←EX]
0 Γ n + 3/2 Γ n + 3/2
8.14 Now f(X1 ,X2 ) (x1 , x2 ) is constant on the disc. Hence f(X1 ,X2 ) (x1 , x2 ) = 1/πa2 for x21 + x22 ≤ a2 . Hence
Z √a2 −x2
q
1 1 2 a2 − x21
fX1 (x1 ) = √ dx1 = for −a < x1 < a. [←EX]
x2 =− a2 −x21 πa2 πa2

8.15 Now P[R ≤ r] = r 2 for 0 < r < 1. Hence the density of R is fR (r) = 2r for r ∈ (0, 1). The density of Θ is fΘ (θ) = 1/2π
for θ ∈ (0, 2π). Also R and Θ are independent; hence the density of the random vector (R, Θ) is
r
f(R,Θ) (r, θ) = for r ∈ (0, 1) and θ ∈ (0, 2π).
π
Using the standard change of variable result gives
∂(r, θ) r 1 1
f(X,Y ) (x, y) = f(R,Θ) (r, θ)
= = for (x, y) ∈ {(x, y) ∈ R2 : 0 < x2 + y 2 < 1}. [←EX]
∂(x, y) π r π
R 1 α−1
8.16 (a) By equation(7.5a) and the standard result 0 x (1 − x)β−1 dx = Γ(α)Γ(β)/Γ(α + β) which is derived in §13.1 on
page 40, we have
Z 1  Z 1
n−1 k

n!
E[Xk:n ] = n t (1 − t)n−k dt = tk (1 − t)n−k dt
0 k−1 (n − k)!(k − 1)! 0
n! Γ(k + 1)Γ(n − k + 1) k
= =
(n − k)!(k − 1)! Γ(n + 2) n+1
Similarly
Z 1  Z 1
n − 1 k+1

n!
E[Xk:n 2
]= n t (1 − t)n−k dt = tk+1 (1 − t)n−k dt
0 k−1 (n − k)!(k − 1)! 0
n! Γ(k + 2)Γ(n − k + 1) (k + 1)k
= =
(n − k)!(k − 1)! Γ(n + 3) (n + 2)(n + 1)
and so
k(n − k + 1)
var[Xk:n ] =
(n + 1)2 (n + 2)
(b) Just substitute into equation(3.5a) on page 10. This gives f(Xj:n ,Xk:n ) (x, y) = cxj−1 (y − x)k−j−1 (1 − y)n−k for
0 ≤ x < y ≤ 1 where c = n!/[ (j − 1)!(k − j − 1)!(n − k)! ].
(c)
j(k + 1)
E[Xj:n Xk:n ] =
(n + 1)(n + 2)
j(k + 1) j k j(n − k + 1)
cov[Xj:n , Xk:n ] = E[Xj:n Xk:n ] − E[Xj:n ]E[Xk:n ] = − =
(n + 1)(n + 2) n + 1 n + 1 (n + 1)2 (n + 2)
s
cov[Xj:n , Xk:n ] j(n − k + 1)
corr[Xj:n , Xk:n ] = p = [←EX]
var[Xj:n ]var[Xk:n ] k(n − j + 1)
8.17 Now X1 = FY (Y1 ), . . . , Xn = FY (Yn ) are i.i.d. with the uniform(0, 1) distribution and hence the density of (Xi:n , Xj:n )
is
n! i−1 j−i−1 n−j
f1 (x, y) = fY (x)fY (y) {x} {y − x} {1 − y}
(i − 1)!(j − i − 1)!(n − j)!
Now FY is monotonic increasing and continuous and the transformation (X1 , . . . , Xn ) 7→ (Y1 , . . . , Yn ) is order preserv-
ing; hence
P[Yi:n ≤ x, Yj:n ≤ y] = P[FY (Yi:n ) ≤ FY (x), FY (Yj:n ) ≤ FY (y)]
Z FY (x) Z FY (y)
= P[Xi:n ≤ FY (x), Xj:n ≤ FY (y)] = f1 (x, y) dx dy
−∞ −∞
and hence the joint density of (Yi:n , Yj:n ) is
n! i−1 j−i−1 n−j
f(i:n,j:n) (y) = fY (x)fY (y) {FY (x)} {FY (y) − FY (x)} {1 − FY (y)}
(i − 1)!(j − i − 1)!(n − j)!
for −∞ < x < y < ∞. [←EX]
8.18 By exercise 8.1 on page 25 we have σ 2 = E[(X − µ)2 ] = (b − a)2 /12, E[(X − µ)3 ] = 0 and E[(X − µ)4 ] = (b − a)4 /80.
Hence γ1 = 0 and γ2 = 9/5. Note that the distribution of X is symmetric about µ = (a + b)/2 and this implies
skew[X] = 0. Alternatively, use example (1.5b) and skew[a + bX] = skew[X] when b > 0 and κ[a + bX] = κ[X].
[←EX]
8.19 Now
n  
1 X n
Yn = n (−1)j an−j X (n−j)λ (1 − X)λj
λ j
j=0
and hence
n   Z 1
n 1 X n j n−j
E[Y ] = n (−1) a xλ(n−j) (1 − x)λj dx
λ j 0
j=0
n  
1 X n
= n (−1)j an−j B ( λ(n − j) + 1, λj + 1 )
λ j
j=0
where B denotes the beta function described in §13.1 on page 40. [←EX]

8.20 (a) Clearly f ≥ 0 and


c Z c  2 c
x (c − a)2 c − a
Z
2 2 2
f (x) dx = (x − a) dx = − ax = =
a (b − a)(c − a) a (b − a)(c − a) 2 a (b − a)(c − a) 2 b−a
Similarly, substituting a → b and b → a gives
Z b Z b
2 b−c
f (x) dx = (b − x) dx =
c (b − a)(b − c) c b−a
Hence f is a density; a plot of the function is shown in figure (20a).
Figure(20a). Plot of density of triangular distribution.
(b) Now
c Z c  3 c
x ax2 2c2 − ac − a2
Z
2 2 2
xf (x) dx = (x − ax) dx = − =
a (b − a)(c − a) a (b − a)(c − a) 3 2 a 3(b − a)
Similarly, substituting a → b and b → a gives
Z b Z b
2 −2c2 + bc + b2
f (x) dx = (bx − x2 ) dx =
c (b − a)(b − c) c 3(b − a)
and hence
b2 + bc − ac − a2 (b − a)(b + a + c) a + b + c
E[X] = = =
3(b − a) 3(b − a) 3
as required. Finally and in a similar way,
3(b2 + ab + a2 + bc + ac + c2 ) 2 a2 + b2 + c2 − ab − ac − bc
E[X 2 ] = and hence var[X] = E[X 2 ] − {E[X]} = [←EX]
18 18
8.21 (a) Suppose j ≤ t < j + 1; then
j   n  
1 k n 1 k n
X X
n−1
gn (t) = (−1) (t − k) − (−1) (t − k)n−1
2(n − 1)! k 2(n − 1)! k
k=0 k=j+1
j   n  
1 k n 1 k n
X X
n−1
= (−1) (t − k) − (−1) (t − k)n−1
(n − 1)! k 2(n − 1)! k
k=0 k=0
j  
1 X n
= (−1)k (t − k)n−1 = fn (t)
(n − 1)! k
k=0
The second term equals 0 by expanding the term (t − k)n−1 and using equation(7.3f ).
(b) Equation(8.21b) follows directly from equation(7.3a). Equation(8.21c) is proved in the same way as part (a). [←EX]
d
8.22 Now Xn = U1 + · · · + Un where U1 , U2 , . . . , Un are i.i.d. random varaibles with the uniform(0, 1) distribution. Hence
E[Xn ] = nE[U1 ] = n/2 and var[X] = nvar[U1 ] = n/12. Also
 t n
n e −1
E[etXn ] = E[etU1 ] = for all t ∈ R with t 6= 0.
t
(b) follows by the central limit theorem. [←EX]
d 2 2 2
8.23 (a) Using X U1 + · · · + Un gives E[X ] = nE[U1 ] + n(n − 1)E[U1 ]E[U2 ] = n/3 + n(n − 1)/4 = (3n + n)/12 and
=
E[X 3 ] = n1 E[U13 ] + 231 n2 E[U12 ]E[U2 ] + 231 n2 E[U1 ]E[U22 ] + 1 13 1 n3 E[U1 ]E[U2 ]E[U3 ] = n/4 + n(n − 1)/2 +
      

n(n − 1)(n − 2)/8 = n2 (n + 1)/8.


Similarly,
          
4 n 4 4 n 3 4 n 2 2 4 n
E[X ] = E[U1 ] + 2 E[U1 ]E[U2 ] + E[U1 ]E[U2 ] + 3 E[U12 ]E[U2 ]E[U3 ]
1 31 2 22 2 211 3
  
4 n
+ E[U1 ]E[U2 ]E[U3 ]E[U4 ]
1111 4
n 4n(n − 1) 3n(n − 1) 6n(n − 1)(n − 2) n(n − 1)(n − 2)(n − 3) n[15n3 + 30n2 + 5n − 2]
= + + + + =
5 8 9 12 16 240

(b) Either use symmetry about µ = n/2 or use equation(2.22a) which implies
E[X 3 ] − 3µσ 2 − µ3 1 n(n2 + n) n3
 
n n
skew[X] = = − 3 − =0
σ3 σ3 8 2 12 8
(c) Using equation(2.25a) shows that we have
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 144 n3 (n + 1) n3 3n4
 
4
κ[X] = = 2 E[X ] − + +
σ4 n 4 8 16
3
 
144 n (n + 2) 6
= 2 E[X 4 ] − =3− [←EX]
n 16 5n
8.24 Let Yj = Xj − a. Then Y1 , Y2 , . . . , Yn are i.i.d. random variables with the uniform(0, b − a) distribution. Also
P[X1 + · · · + Xn ≤ t] = P[Y1 + · · · + Yn < −na]. Hence
n  
1 k n
X n
(x − na − kb + ka)+

Fn (x) = n
(−1) for all x ∈ R and all n = 1, 2, . . . .
(b − a) n! k
k=0
n  
1 k n
X n−1
(x − na − kb + ka)+

fn (x) = n
(−1) for all x ∈ R and all n = 2, 3, . . . . [←EX]
(b − a) (n − 1)! k
k=0

Chapter 2 Section 10 on page 31 (exs-exponential.tex)

n n
10.1 Using E[X ] = n!/λ and var[X] = 1/λ and equations(2.22a) and (2.25a) gives skew[X] = λ (6/λ −3/λ3 −1/λ3 ) =
2 3 3

2 and κ[X] = λ4 (24/λ4 − 24/λ4 + 6/λ4 + 3/λ4 ) = 9. [←EX]


10.2 (a) Now FX (t) = 1 − e−λt for t > 0. Hence hX (t) = fX (t)/[1 − FX (t)] = λ for t ∈ (0, ∞). (b) We are given
fX (t)/[1 − FX (t)] = λ where, of course, λ must be positive. Let Fc (t) = 1 − FX (t). Hence
Fc0 (t)
= −λ
Fc (t)
For u > 0, integrate t ∈ (0, u) and this gives
ln[Fc (u)] = −λu
Hence result. [←EX]
10.3 We take n large enough so that αδn < 1. Suppose t ∈ [1 − αδn , 1]. Then
1 1
1< <
t 1 − αδn
Integrating over t from 1 − αδn to 1 gives
αδn
αδn < − ln(1 − αδn ) <
1 − αδn
Hence
1 1 1 1
eαδn < < e 1/αδn −1 and hence <e<
1 − αδn (1 − αδn )1/αδn −1 (1 − αδn )1/αδn
Hence
1 − αδn 1
< (1 − αδn )1/αδn < and hence lim (1 − αδn )1/αδn = e−1
e e n→∞
as required. [←EX]
(a) For k = 0, 1, . . . , we have P[Y = k] = P[k ≤ X < k + 1] = 1 − e−λ(k+1) − 1 − e−λk = e−λk (1 − e−λ ) = q k p
   
10.4
where 1 − e−λ . This is the geometric P∞ distribution. P∞
For z ∈ (0, 1) we have P[Z ≤ z] = k=0 P[k < X ≤ k + z] = k=0 e−λk 1 − e−λz = 1 − e−λz /[1 − e−λ ]. Hence
   

the density is fZ (z) = λe−λz /[1 − e−λ ] for z ∈ (0, 1).


Clearly Y and Z are independent by the  lack of memory property of the exponential; or, from first principles: P[Y =
k, Z ≤ z] = P[k < X < k + z] = e−λk 1 − e−λz and this equals P[Y

 = k]P[Z ≤z].
(b) For k = 1, 2, . . . , we have P[W = k] = P[X = k]−P[X = k−1] = 1 − e−λk − 1 − e−λ(k−1) = e−λ(k−1) (1−e−λ ).


Hence P[W = k] = q k−1 p where p = 1 − e−λ and q = 1 − p. [←EX]


10.5 This is just the probability integral transformation—see §7.6 on page 23. The distribution function of an exponential (λ)
is F (x) = 1 − e−λx . Inverting this gives G(y) = − ln(1 − y)/λ. Hence G(U ) ∼ exponential (λ). [←EX]
10.6 (a) Either use equation(9.6c) on page 31 with n = 1 to get S1 /S2 = X/(X + Y ) ∼ U1:1 , and this is the uniform(0, 1) dis-
tribution. Or from first principles: let V = X/(X + Y ) and W = X + Y . Hence X = V W and Y = W (1 − V ); also
∂(x,y)
W > 0 and 0 < V < 1. Now ∂(v,w) = w; hence f(V,W ) (v, w) = wf(X,Y ) (x, y) = wλ2 e−λ(x+y) = wλ2 e−λw and
R∞
fV (v) = λ 0 wλe−λw dw = 1 for 0 < v < 1. (b) The distribution of (X, Y ) is f(X,Y ) (x, y) = λ2 e−λ(x+y) for
x > 0 and y > 0. Let v = (x − y)/(x + y) and w = x + y; hence x = w(1 + v)/2 and y = w(1 − v)/2. Note that w > 0 and
∂(x,y) w
v ∈ (−1, 1). The absolute value of the Jacobian is ∂(v,w) = 2 and hence f(V,W ) (v, w) = λ2 we−λw /2 for v ∈ (−1, 1)
and w > 0. Hence fV (v) = 1/2 for v ∈ (−1, 1). [←EX]

10.7 The
density
of (X, Y ) is f(X,Y ) (x, y) = λ2 e−λ(x+y) for x > 0 and y > 0. Let W = Y + 1 and Z = X/Y +1. Then
∂(w,z) 1
∂(x,y) = w . Hence f(W,Z) (w, z) = λ2 eλ we−λw(z+1) for w > 1 and z > 0. Integrating out w gives
1 + λ(z + 1)
fZ (z) = e−λz for z > 0. [←EX]
(z + 1)2

10.8 (a) By conditioning,


∞ ∞ ∞
λ
Z Z Z
−λx −µx −λx
P[X < Y ] = P[Y > x]λe dx = e λe dx = λ e−(λ+µ)x dx =
0 0 0 λ+µ
An alternative argument: P[X < Y |X] = P e−µX ; taking expectation gives P[X < Y ] = E[e−µX ] = λ/(λ + µ).
(b) Now min{X` : ` 6= k} ∼ exponential ( `6=k λ` ) and is independent of Xk . Hence result by part(a).
(c) The proof is by induction; so denote the proposition in the question by P (n). Part(a) proves P (2). Consider P (3):
Z ∞ Z ∞ Z ∞
P[X1 < X2 < X3 ] = fX1 (x1 ) fX2 (x2 ) λ3 e−λ3 x3 dx3 dx2 dx1
x1 =0 x2 =x1 x3 =x2
Z ∞ Z ∞ Z ∞ Z ∞
= fX1 (x1 ) fX2 (x2 )e−λ3 x2 dx2 dx1 = fX1 (x1 ) λ2 e−(λ2 +λ3 )x2 dx2 dx1
x1 =0 x2 =x1 x1 =0 x2 =x1
Z ∞ Z ∞
λ2
= fX1 (x1 ) (λ2 + λ3 )e−(λ2 +λ3 )x2 dx2 dx1
λ2 + λ3 x1 =0 x2 =x1
λ2
= P[X1 < Y ] where Y ∼ exponential (λ2 + λ3 ).
λ2 + λ3
λ2 λ1
= by using part(a).
λ2 + λ3 λ1 + λ2 + λ3
So P (3) is proved. Now assume P (n − 1).
Z ∞ Z ∞ Z ∞
P[X1 < X2 < · · · < Xn ] = fX1 (x1 ) · · · fXn−1 (xn−1 ) fXn (xn ) dxn dxn−1 . . . dx1
x1 =0 xn−1 =xn−2 xn =xn−1
Z ∞ Z ∞
= fX1 (x1 ) · · · fXn−1 (xn−1 )λn e−λn xn−1 dxn−1 . . . dx1
x1 =0 xn−1 =xn−2
∞ ∞
λn
Z Z
= fX1 (x1 ) · · · (λn−1 + λn )e−(λn−1 +λn )xn−1 dxn−1 . . . dx1
λn−1 + λn x1 =0 xn−1 =xn−2
λn
= P[X1 < · · · < Xn−2 < Y ] where Y ∼ exponential (λn−1 + λn ).
λn−1 + λn
Then use the P (n − 1) assumption to deduce P (n).
(d) Note that {X1 < · · · < Xn } = A ∩ B and part(c) shows that P[X1 < · · · < Xn ] = P[A]P[B]. Hence A and B are
independent. [←EX]
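An informal simulation of part (c) (illustrative only; it assumes numpy, and the product used below is the formula that the induction yields, which for n = 3 reduces to the λ_1λ_2/((λ_2 + λ_3)(λ_1 + λ_2 + λ_3)) displayed above):

    import numpy as np

    # Monte Carlo check of answer 10.8(c): for independent X_k ~ exponential(lambda_k),
    # P[X_1 < X_2 < ... < X_n] = prod_k lambda_k / (lambda_k + lambda_{k+1} + ... + lambda_n).
    rng = np.random.default_rng(1)
    lam = np.array([1.0, 2.0, 0.5, 3.0])
    x = rng.exponential(scale=1/lam, size=(500_000, lam.size))

    empirical = np.mean(np.all(np.diff(x, axis=1) > 0, axis=1))
    tail_sums = np.cumsum(lam[::-1])[::-1]        # lambda_k + ... + lambda_n for each k
    print("empirical:", empirical, " formula:", np.prod(lam / tail_sums))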
10.9 E[f(X)] − E[f(Y)] = ∫_0^∞ f(t)[µe^{−µt} − δe^{−δt}] dt = −∫_0^∞ (df/dt)[e^{−δt} − e^{−µt}] dt by integration by parts. Hence E[f(X)] − E[f(Y)] ≤ 0. [←EX]
10.10 The density of Z = X + Y is f_Z(z) = ∫_0^z f_Y(z − x)f_X(x) dx = ∫_0^z λ^2 e^{−λz} dx = λ^2 z e^{−λz} for z ≥ 0. The joint density of (X, Z) is f_{(X,Z)}(x, z) = f_X(x)f_Y(z − x) = λ^2 e^{−λz}. Hence
f_{X|Z}(x|z) = f_{(X,Z)}(x, z)/f_Z(z) = 1/z for x ∈ (0, z).
This is the uniform(0, z) distribution with expectation z/2. Hence E[X|X + Y] = (X + Y)/2. [←EX]
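A small numerical illustration of this conditional expectation (sketch only, assuming numpy; the bin width 0.02 and the values of λ and z0 are arbitrary):

    import numpy as np

    # Illustration of answer 10.10: for i.i.d. exponentials, E[X | X + Y] = (X + Y)/2.
    rng = np.random.default_rng(2)
    lam = 1.5
    x = rng.exponential(scale=1/lam, size=1_000_000)
    y = rng.exponential(scale=1/lam, size=1_000_000)
    z = x + y

    for z0 in (0.5, 1.0, 2.0, 4.0):
        near = np.abs(z - z0) < 0.02              # condition on X + Y close to z0
        print("z0 =", z0, " E[X | X+Y ~ z0] =", round(x[near].mean(), 3), " z0/2 =", z0/2)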
10.11 Now |∂(y_1, y_2)/∂(x_1, x_2)| = 2 and f_{(Y_1,Y_2)}(y_1, y_2) = (1/2)λ^2 e^{−λy_2} for y_2 ≥ 0, y_1 ∈ R and −y_2 ≤ y_1 ≤ y_2. Hence
f_{Y_2}(y_2) = ∫_{−y_2}^{y_2} (1/2)λ^2 e^{−λy_2} dy_1 = λ^2 y_2 e^{−λy_2} for y_2 ≥ 0.
f_{Y_1}(y_1) = ∫_{y_1}^{∞} (1/2)λ^2 e^{−λy_2} dy_2 = (1/2)λe^{−λy_1} if y_1 ≥ 0
f_{Y_1}(y_1) = ∫_{−y_1}^{∞} (1/2)λ^2 e^{−λy_2} dy_2 = (1/2)λe^{λy_1} if y_1 < 0
(b) f_R(r) = λe^{−λr}. [←EX]
10.12 Now P[U > t] = (P[X > t])^2 = e^{−2t} and hence U ∼ exponential(2).
Let W = (1/2)Y. Then f_W(w) = 2f_Y(2w) = 2e^{−2w} for w > 0. Hence, if Z = X + W, then for z ≥ 0 we have
f_Z(z) = ∫_0^z f_W(z − x)f_X(x) dx = ∫_0^z 2e^{−2z+2x} e^{−x} dx = 2e^{−2z} ∫_0^z e^{x} dx = 2e^{−2z}(e^{z} − 1) = 2e^{−z} − 2e^{−2z}
Finally, P[V ≤ t] = (P[X ≤ t])^2 = (1 − e^{−t})^2 = 1 − 2e^{−t} + e^{−2t} for t ≥ 0, and hence f_V(t) = 2e^{−t} − 2e^{−2t} for t ≥ 0. [←EX]
10.13 Now P[Y ≤ y, R ≤ r] = 2P[R ≤ r, X_1 < X_2, X_1 ≤ y] = 2P[X_2 − X_1 ≤ r, X_1 < X_2, X_1 ≤ y]. Hence
P[Y ≤ y, R ≤ r] = 2∫_0^y P[z < X_2 < r + z] f(z) dz = 2∫_0^y [F(r + z) − F(z)] f(z) dz = 2∫_{z=0}^{y} ∫_{t=0}^{r} f(z + t)f(z) dt dz
Differentiating gives f_{(Y,R)}(y, r) = 2f(y)f(r + y) for all y ≥ 0 and r ≥ 0.
Suppose X_1 and X_2 have the exponential distribution. Then f_{(Y,R)}(y, r) = 2λ^2 e^{−2λy} e^{−λr} = [2λe^{−2λy}][λe^{−λr}] = f_Y(y)f_R(r). Hence Y and R are independent.
Suppose Y and R are independent. Then we have f_Y(y)f_R(r) = 2f(y)f(r + y) for all y ≥ 0 and r ≥ 0. Let r = 0. Then f_Y(y)f_R(0) = 2{f(y)}^2. But f_Y(y) = 2f(y)[1 − F(y)]. Hence f_R(0)[1 − F(y)] = f(y), or cF(y) + f(y) = c where c = f_R(0). This differential equation can be solved by using the integrating factor e^{cy}. We have ce^{cy}F(y) + e^{cy}f(y) = ce^{cy}, i.e. d/dy {e^{cy}F(y)} = ce^{cy}. Hence e^{cy}F(y) = A + e^{cy}. But F(0) = 0; hence A = −1 and F(y) = 1 − e^{−cy} as required. [←EX]
10.14 (a) P[min{X_1, X_2} = X_1] = P[X_2 ≥ X_1] = ∫_{x=0}^{∞} P[X_2 ≥ x] λ_1 e^{−λ_1 x} dx = ∫_{x=0}^{∞} λ_1 e^{−(λ_1+λ_2)x} dx = λ_1/(λ_1 + λ_2).
(b) P[min{X_1, X_2} > t and min{X_1, X_2} = X_1] = P[X_2 > X_1 > t] = ∫_{x=t}^{∞} ∫_{y=x}^{∞} λ_1 e^{−λ_1 x} λ_2 e^{−λ_2 y} dy dx = ∫_{x=t}^{∞} λ_1 e^{−(λ_1+λ_2)x} dx = [λ_1/(λ_1 + λ_2)] e^{−(λ_1+λ_2)t} = P[min{X_1, X_2} > t] P[min{X_1, X_2} = X_1] as required.
(c) P[R > t and min{X_1, X_2} = X_1] = P[X_2 − X_1 > t] = P[X_2 > t + X_1] = ∫_{u=0}^{∞} e^{−λ_2(t+u)} λ_1 e^{−λ_1 u} du = λ_1 e^{−λ_2 t}/(λ_1 + λ_2). Similarly for P[R > t and min{X_1, X_2} = X_2]. Hence P[R > t] = [λ_1 e^{−λ_2 t} + λ_2 e^{−λ_1 t}]/(λ_1 + λ_2) for t > 0.
(d) Now P[R > t, min{X_1, X_2} > u] = P[R > t, min{X_1, X_2} > u, min{X_1, X_2} = X_1] + P[R > t, min{X_1, X_2} > u, min{X_1, X_2} = X_2] = P[X_2 − X_1 > t, X_1 > u] + P[X_1 − X_2 > t, X_2 > u] = ∫_{x_1=u}^{∞} P[X_2 > t + x_1] f_{X_1}(x_1) dx_1 + ∫_{x_2=u}^{∞} P[X_1 > t + x_2] f_{X_2}(x_2) dx_2 = λ_1 e^{−λ_2 t} e^{−(λ_1+λ_2)u}/(λ_1 + λ_2) + λ_2 e^{−λ_1 t} e^{−(λ_1+λ_2)u}/(λ_1 + λ_2). Hence
P[R > t, min{X_1, X_2} > u] = [λ_1 e^{−λ_2 t}/(λ_1 + λ_2) + λ_2 e^{−λ_1 t}/(λ_1 + λ_2)] e^{−(λ_1+λ_2)u} = P[R > t] P[min{X_1, X_2} > u] [←EX]
10.15 Clearly the m.g.f. of Y is
E[e^{tY}] = Π_{k=1}^{n} E[e^{tY_k}] = Π_{k=1}^{n} kλ/(kλ − t) = n!λ^n/[(λ − t)(2λ − t)···(nλ − t)] for t < λ.
For x ∈ [0, ∞) we have P[X_{n:n} ≤ x] = (1 − e^{−λx})^n with density f_n(x) = nλe^{−λx}(1 − e^{−λx})^{n−1} for x > 0. Hence X_{n:n} has m.g.f.
E[e^{tX_{n:n}}] = nλ ∫_0^∞ e^{−(λ−t)x}(1 − e^{−λx})^{n−1} dx for t < λ.
We prove the result by induction. For k = 1, 2, . . . , n, let P(k) denote the proposition
P(k): ∫_0^∞ e^{−(kλ−t)x}(1 − e^{−λx})^{n−k} dx = (n − k)!λ^{n−k}/[(kλ − t)(kλ + λ − t)···(nλ − t)]
Clearly P(n) is true. Express P(k + 1) as
∫_0^∞ e^{−(kλ−t)x} e^{−λx}(1 − e^{−λx})^{n−k−1} dx = (n − k − 1)!λ^{n−k−1}/[(kλ + λ − t)···(nλ − t)]
and then integrate by parts to deduce P(k).
Hence by induction, P(k) is true for all k ∈ {1, 2, . . . , n}. [←EX]
10.16 By proposition (9.5a)
X_{k:n} ∼ Z_1/n + Z_2/(n − 1) + ··· + Z_k/(n − k + 1) where Z_1, . . . , Z_k are i.i.d. with the exponential(λ) distribution.
Hence
E[X_{k:n}] = (1/λ) Σ_{ℓ=1}^{k} 1/(n + 1 − ℓ) and var[X_{k:n}] = (1/λ^2) Σ_{ℓ=1}^{k} 1/(n + 1 − ℓ)^2
Finally, cov[X_{j:n}, X_{k:n}] = var[X_{j:n}] for 1 ≤ j < k ≤ n. [←EX]
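These expressions for the mean and variance are easy to check by simulation; the sketch below is illustrative only (it assumes numpy, and λ, n, k and the seed are arbitrary choices).

    import numpy as np

    # Check of answer 10.16: for exponential(lambda) order statistics,
    #   E[X_{k:n}]   = (1/lambda)   * sum_{l=1}^{k} 1/(n+1-l)
    #   var[X_{k:n}] = (1/lambda^2) * sum_{l=1}^{k} 1/(n+1-l)^2
    rng = np.random.default_rng(3)
    lam, n, k = 2.0, 7, 3
    samples = np.sort(rng.exponential(scale=1/lam, size=(400_000, n)), axis=1)
    xkn = samples[:, k - 1]                       # k-th order statistic

    w = 1.0 / (n + 1 - np.arange(1, k + 1))
    print("mean:", xkn.mean(), "vs", w.sum() / lam)
    print("var: ", xkn.var(),  "vs", (w**2).sum() / lam**2)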
10.17 By proposition (9.5a) we know that
X_{k:n} = Σ_{ℓ=1}^{k} Z_ℓ/(n − ℓ + 1) for k = 1, 2, . . . , n,
where Z_1, . . . , Z_n are i.i.d. with the exponential(λ) distribution. Hence
Z = Σ_{k=1}^{n} (n − k + 1)X_{k:n} = Σ_{k=1}^{n} (n − k + 1) Σ_{ℓ=1}^{k} Z_ℓ/(n − ℓ + 1) = Σ_{ℓ=1}^{n} [Z_ℓ/(n − ℓ + 1)] Σ_{k=ℓ}^{n} (n − k + 1) = Σ_{ℓ=1}^{n} [(n − ℓ + 2)/2] Z_ℓ
Hence we can write
Z = (1/2) Σ_{j=1}^{n} (j + 1)T_j where T_1, . . . , T_n are i.i.d. with the exponential(λ) distribution.
Hence E[Z] = Σ_{j=1}^{n} (j + 1)/2λ = n(n + 3)/4λ and var[Z] = Σ_{j=1}^{n} (j + 1)^2/4λ^2 = n(2n^2 + 9n + 13)/24λ^2. [←EX]
10.18 Now Σ_{ℓ=1}^{n} (X_ℓ − X_{1:n}) = Σ_{ℓ=1}^{n} X_ℓ − nX_{1:n} = Σ_{ℓ=1}^{n} X_{ℓ:n} − nX_{1:n} = Z_2 + ··· + Z_n, which is independent of Z_1. Hence the result. [←EX]
10.19 Now F_X(x) = 1 − e^{−x} for x > 0. Hence P[Z_n < x] = (1 − e^{−x})^n for x > 0. Hence the result. [←EX]
z z Rz
10.20 (a) For z > 0 we have fZ (z) = x=0 fX (x)fY (z − x) dx = λµ x=0 e−λx e−µ(z−x) dx = λµe−µz x=0 e−(λ−µ)x dx =
( e−λz − e−µz )/( 1/λ − 1/µ). R
∞ R∞
(b) P[Z ≤ t] = P[Y ≥ X/t] = x=0 P[Y ≥ x/t]λe−λx dx = λ x=0 e−µx/t e−λx dx = λ/(λ + µ/t) = λt/(µ + λt).
(c) First note the following integration " result: √ √ !#
Z ∞ Z ∞ √ α λx
e−(α/x+λx) dx = exp − αλ √ + √ dx
0 x=0 λx α
p
Using the substitution x λ/α = e−y gives
Z ∞ r Z ∞
α h √ i −y
r Z ∞
α √
−(α/x+λx) y −y
e dx = exp − αλ e + e e dy = e−y e−2 αλ cosh(y) dy
x=0 λ y=−∞ λ y=−∞
r "Z ∞ √ 0 √
#
α
Z
= e−y e−2 αλ cosh(y) dy + e−y e−2 αλ cosh(y) dy
λ y=0 y=−∞
r Z ∞ √ Z ∞ √

α −y −2 αλ cosh(y) y −2 αλ cosh(y)
= e e dy + e e dy
λ y=0 y=0

r Z ∞ √
r
α α
=2 cosh(y)e−2 αλ cosh(y) dy = 2 K1 (2 αλ)
λ y=0 λ
Using this result we get
Z ∞ Z ∞h i
−λx
F (z) = P[Z ≤ z] = P[XY ≤ z] = P[Y ≤ /x]λe
z dx = 1 − e−µz/x λe−λx dx
x=0 x=0
Z ∞ p p
−(µz/x+λx)
=1−λ e dx = 1 − 2 λµzK1 (2 λµz)
x=0
The density function can be found by differentiating the distribution function:
Z ∞ Z ∞ " √ √ !#
1 −(λx+µz/x) 1 p x λ µz
f (z) = λµ e dx = λµ exp − λµz √ + √ dx
0 x 0 x µz x λ
√ √
Using the substitution x = µzey / λ gives
Z ∞ h p Z ∞
i h p i
f (z) = λµ exp − λµz ey + e−y dy = λµ exp −2 λµz cosh(y) dy
y=−∞ y=−∞
Z ∞ h p i p
= 2λµ exp −2 λµz cosh(y) dy = 2λµK0 (2 λµz)
y=0
√ √ 2

(c) When λ = µ we have F (z) = 1 − 2λ zK1 (2λ z). and f (z) = 2λ K0 (2λ z). [←EX]
∂(y1 ,...,yn ) n −λyn
10.21 Now f(Y1 ,...,Yn ) (y1 , . . . , yn ) = f(X1 ,...,Xn ) (x1 , . . . , xn ) ∂(x 1 ,...,xn )
= f (x
(X1 ,...,Xn ) 1 , . . . , x n ) = λ e for 0 < y1 <
y2 < · · · < yn .
Or by induction using fY1 (y1 ) = λe−λy1 for y1 > 0 and
f(Y1 ,...,Yn ) (y1 , . . . , yn ) = fYn |(Y1 ,...,Yn−1 ) (yn |y1 , . . . , yn−1 )f(Y1 ,...,Yn−1 ) (y1 , . . . , yn−1 ) = λe−λ(yn −yn−1 ) λn−1 e−λyn−1 =
λn e−λyn [←EX]
10.22 Use characteristic functions:
E[e^{itZ}] = Σ_{k=1}^{∞} E[e^{itZ} | N = k] P[N = k] = Σ_{k=1}^{∞} {φ_X(t)}^k p q^{k−1} = pφ_X(t)/[1 − qφ_X(t)] = pλ/(λ − it − qλ) = pλ/(pλ − it)
as required. [←EX]
10.23 (a) Just integrate over the possible values of Y : for t > 0 we have
R
P[X − Y ≤ t, X > Y ] P[Y < X ≤ t + Y ] y∈(0,∞)
P[y < X ≤ t + y] PY (dy)
P[X − Y ≤ t | X > Y ] = = = R
P[X > Y ] P[X > Y ] y∈(0,∞)
e−λy PY (dy)
 −λy
− e−λ(t+y) PY (dy)
R
y∈(0,∞)
e
= R = 1 − e−λt
y∈(0,∞)
e−λy PY (dy)
(b) For t > 0 and A ∈ B ∩ (0, ∞) we have
R
P[Y ∈ A, Y < X < t + Y ] P[y < X ≤ t + y] PY (dy)
P[Y ∈ A, X − Y ≤ t | X > Y ] = = A R
P[X > Y ] y∈(0,∞)
e−λy PY (dy)
R  −λy
− e−λ(t+y) PY (dy)

e
= A R
y∈(0,∞)
e−λy PY (dy)
R −λy
−λt R A
e PY (dy)
= [1 − e ] −λy P (dy)
y∈(0,∞)
e Y

= P[X − Y ≤ t | X > Y ] P[Y ∈ A | X > Y ] [←EX]


Q∞ Q∞ −λk t  P∞  
10.24 (a) Suppose t > 0. Then P[M > t] = k=1 P[Xk > t] = k=1 e = exp − k=1 λk t .
P ∞
(b) Now P[Xi < Xj for all j ∈ N − {i} ] = P[Xi < inf{Xj : j ∈ N−{i}] = λi / k=1 λk by part(a) and exercise 10.8.
(c) If µ < ∞ then because E[Y ] = µ it follows that P[Y = ∞] = 0.
Conversely, suppose P[Y < ∞] = 1. Then P[e−Y > 0] = 1 and hence E[e−Y ] > 0. But

Y 1 1
E[e−Y ] = = Q∞
1 + 1/λk k=1 (1 + 1/λk )
k=1
Q∞ Q∞ P∞
Hence Pk=1 (1 + 1/λk ) < ∞. But it is well known that if every ak ≥ 0 then k=1 (1 + ak ) < ∞ iff k=1 ak < ∞.

Hence k=1 1/λk < ∞. [←EX]
Chapter 2 Section 12 on page 37 (exs-gamma.tex)
12.1 (a) Integrating by parts gives Γ(x + 1) = ∫_0^∞ u^x e^{−u} du = [−u^x e^{−u}]_0^∞ + x∫_0^∞ u^{x−1} e^{−u} du = xΓ(x). (b) Γ(1) = ∫_0^∞ e^{−u} du = 1. (c) Use parts (a) and (b) and induction. (d) Use the transformation u = (1/2)t^2; hence Γ(1/2) = ∫_0^∞ u^{−1/2} e^{−u} du = √2 ∫_0^∞ e^{−t^2/2} dt = √π. The final equality follows because the integral over (−∞, ∞) of the standard normal density is 1. (e) Use induction. For n = 1 we have Γ(3/2) = √π/2 which is the right hand side. Now
Γ(n + 1 + 1/2) = (n + 1/2)Γ(n + 1/2) = [(2n + 1)/2] Γ(n + 1/2) = [1·3·5···(2n − 1)·(2n + 1)/2^{n+1}] √π as required. [←EX]
12.2 Now
E[X^k] = Γ(n + k)/(α^k Γ(n)) for n + k > 0 and var[X] = n/α^2
Using equation (2.22a) shows that
skew[X] = (E[X^3] − 3µσ^2 − µ^3)/σ^3 = [(n + 2)(n + 1)n − 3n^2 − n^3]/n^{3/2} = 2/√n
Using equation (2.25a) shows that we have
κ[X] = (E[X^4] − 4µE[X^3] + 6µ^2σ^2 + 3µ^4)/σ^4 = 3(n + 2)/n [←EX]
12.3 fX (x) = xn−1 e−x /Γ(n). At x = n − 1, fX (x) = (n − 1)n−1 e−(n−1) /Γ(n). Hence result by Stirling’s formula. [←EX]
12.4 For all x > 0 we have
αn n−2 −αx αn n−3 −αx 
f 0 (x) = [n − 1 − αx] and f 00 (x) = (n − 1)(n − 2) − 2αx(n − 1) + α2 x2

x e x e
Γ(n) Γ(n)
(a) If 0 < n ≤ 1 we have f 00 (x) > 0 as required.
 √   √ 
Note the roots of the quadratic are δ1 = n − 1 − n − 1 /α and δ2 = n − 1 + n − 1 /α.
(b) If 1 < n ≤ 2 we have f 0 (x) > 0 for x < (n − 1)/α and f 0 (x) < 0 for x > (n − 1)/α. Only δ2 > 0 and f 00 (x) < 0
for x < δ2 and f 00 (x) > 0 for x > δ2 .
(c) If n > 2 then both δ1 > 0 and δ2 > 0 and f 00 (x) > 0 for x < δ1 ; f 00 (x) < 0 for x ∈ (δ1 , δ2 ) and f 00 (x) > 0 for
x > δ2 . [←EX]
12.5 E[Y ] = n/α and E[1/X] = α/(m − 1) provided m > 1. Hence E[ Y /X ] = n/(m − 1) provided m > 1. [←EX]
12.6 The simplest way is to use the moment generating function: MX+Y (t) = E[et(X+Y ) ] = E[etX ]E[etY ] = 1/(1 − t/α)n1 +n2
which is the mgf of a gamma(n1 + n2 , α) distribution. Alternatively,
Z t
αn1 +n2 e−αt t
Z
fX+Y (t) = fX (t − u)fY (u) du = (t − u)n1 −1 un2 −1 du
0 Γ(n1 )Γ(n2 ) u=0
αn1 +n2 tn1 +n2 −1 e−αt 1 αn1 +n2 tn1 +n2 −1 e−αt Γ(n1 )Γ(n2 )
Z
= (1 − w)n1 −1 wn2 −1 dw =
Γ(n1 )Γ(n2 ) w=0 Γ(n1 )Γ(n2 ) Γ(n1 + n2 )
αn1 +n2 tn1 +n2 −1 e−αt
=
Γ(n1 + n2 )
where we have used the transformation w = u/t.
(b) If X ∼ gamma(k, α) and n ∈ {1, 2, . . .}, then X = Y1 +· · ·+Yn where Y1 , . . . ,Yn are i.i.d. random variables with the
gamma(k/n, α) distribution. Hence by definition(5.1a) on page 15, the distribution gamma(k, α) is infinitely divisible.
[←EX]
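A brief numerical sketch of part (a) (illustrative only; it assumes numpy, whose gamma sampler is parametrised by shape and scale = 1/α, and the parameter values and seed are arbitrary): the sum of independent gamma(n_1, α) and gamma(n_2, α) samples should reproduce the mean, variance and skewness of gamma(n_1 + n_2, α).

    import numpy as np

    # Check of answer 12.6(a): gamma(n1, alpha) + gamma(n2, alpha) ~ gamma(n1 + n2, alpha).
    rng = np.random.default_rng(4)
    n1, n2, alpha = 2.5, 4.0, 1.3
    s = rng.gamma(n1, 1/alpha, size=500_000) + rng.gamma(n2, 1/alpha, size=500_000)

    n = n1 + n2
    print("mean:", s.mean(), "vs", n/alpha)
    print("var: ", s.var(),  "vs", n/alpha**2)
    print("skew:", np.mean((s - s.mean())**3) / s.std()**3, "vs", 2/np.sqrt(n))   # cf. answer 12.2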
12.7 (a) Clearly u ∈ R and 0 < v < 1; also x = uv and y = u(1 − v). Hence
n+m n+m−1 −αu m−1
(1 − v)n−1

∂(x, y)
= u and f(U,V ) (u, v) = α u e v
= fU (u)fV (v)
∂(u, v) Γ(n)Γ(m)
where U ∼ gamma(n + m, α) and V ∼ beta(m, n).
(1+v)2
(b) Clearly u ∈ R and v ∈ R; also x = u/(1 + v) and y = uv/(1 + v). Hence ∂(u,v) ∂(x,y) = and

u

u αn+m un+m−2 e−αu v n−1 αn+m un+m−1 e−αu v n−1


f(U,V ) (u, v) = =
(1 + v)2 (1 + v)m+n−2 Γ(n)Γ(m) (1 + v)m+n Γ(n)Γ(m)
n+m n+m−1 −αu n−1
α u e Γ(n + m) v
= = fU (u)fV (v)
Γ(n + m) Γ(n)Γ(m) (1 + v)m+n
where U ∼ gamma(n + m, α) and 1/(1 + V ) ∼ beta(m, n). [←EX]
12.8 Note that y1 ∈ (0, 1), y2 ∈ (0, 1) and y3 > 0. Also x1 = y1 y2 y3 , x2 = y2 y3 (1 − y1 ) and x3 = y3 (1 − y2 ) and the absolute
value of the Jacobian of the transformation is given by ∂(x
1 ,x2 ,x3 ) 2
∂(y1 ,y2 ,y3 ) = y2 y3 . Now

λk1 +k2 +k3 xk1 1 −1 xk2 2 −1 x3k3 −1 e−λ(x1 +x2 +x3 )


f(X1 ,X2 ,X3 ) (x1 , x2 , x3 ) = for x1 > 0, x2 > 0, and x3 > 0.
Γ(k1 )Γ(k2 )Γ(k3 )
Hence
λk1 +k2 +k3 y1k1 −1 y2k1 +k2 −1 y3k1 +k2 +k3 −1 (1 − y1 )k2 −1 (1 − y2 )k3 −1 e−λy3
f(Y1 ,Y2 ,Y3 ) (y1 , y2 , y3 ) =
Γ(k1 )Γ(k2 )Γ(k3 )
y1k1 −1 (1 − y1 )k2 −1 y2k1 +k2 −1 (1 − y2 )k3 −1 λk1 +k2 +k3 y3k1 +k2 +k3 −1 e−y3
=
B(k1 , k2 ) B(k1 + k2 , k3 ) Γ(k1 + k2 + k3 )
Hence Y1 ∼ beta(k1 , k2 ), Y2 ∼ beta(k1 + k2 , k3 ) and Y3 ∼ gamma(k1 + k2 + k3 , λ). [←EX]
12.9 Now ∫ e^{tX} dP ≥ ∫_{{X>x}} e^{tX} dP ≥ e^{tx} P[X ≥ x]. Hence, for all x ≥ 0 and all t < α we have P[X ≥ x] ≤ e^{−tx} E[e^{tX}]. Hence
P[X ≥ x] ≤ inf_{t<α} e^{−tx} E[e^{tX}] = α^n inf_{t<α} e^{−tx}/(α − t)^n
By differentiation, the infimum occurs at t = α − n/x. Hence
P[X ≥ x] ≤ α^n e^{n−αx}/(n/x)^n and P[X ≥ 2n/α] ≤ 2^n e^{−n} as required. [←EX]
12.10 (a) The moment generating function of X is
∞ k ∞ ∞ 
2k−1 (k + 2)!tk X k + 2

X t X 1
E[etX ] = 1 + E[X k ] = 1 + = (2t)k = for |t| < 1/2.
k! k! 2 (1 − 2t)3
k=1 k=1 k=0
P∞
Hence X ∼ gamma(3, 1/2) = χ26 . (b) E[etX ] = k=0 k+1 k
 2 2
k (2t) = 1/(1 − 2t) for |t| < /2. This is gamma(2, /2) = χ4 .
1 1
[←EX]
12.11 Let V = X + Y . Then
f(X,V ) (x, v) f(X,Y ) (x, v − x)
fX|V (x|v) = =
fV (v) fV (v)
α(αx)m−1 e−αx α (α(v − x))n−1 e−α(v−x) Γ(m + n)
=
Γ(m) Γ(n) α(αv)m+n−1 e−αv
Γ(m + n) xm−1 (v − x)n−1
= for 0 < x < v.
Γ(m)Γ(n) v m+n−1
Hence for v > 0 we have
v v
Γ(m + n) xm (v − x)n−1 Γ(m + n)  x m  x n−1
Z Z
E[X|X + Y = v] = m+n−1
dx = 1− dx
Γ(m)Γ(n) x=0 v Γ(m)Γ(n) x=0 v v
1
Γ(m + n) Γ(m + n) Γ(m + 1)Γ(n) mv
Z
= v tm (1 − t)n−1 dt = v = [←EX]
R∞ Γ(m)Γ(n) t=0 Γ(m)Γ(n) Γ(m + n + 1) m +n

12.12 (a) Now fX (x) = 0 fX|Y (x|y)fY (y) dy = αn y=0 y n e−(α+x)y dy/Γ(n) = nαn /(x + α)n+1 for x > 0. E[X] can be
R
 
found by integration; or E[X|Y ] = 1/Y and hence E[X] = E E[X|Y ] = E[ 1/Y ] = α/(n − 1).
(b) Now f(X,Y ) (x, y) = αn y n e−(x+α)y /Γ(n). Hence fY |X (y|x) = f(X,Y ) (x, y)/fX (x) = y n (x + α)n+1 e−(x+α)y /Γ(n + 1)
for x > 0 and y > 0. [←EX]
12.13 Let W = X1 + · · · + Xn . Then W ∼ gamma(n, λ) and
Z ∞ Z ∞
λn
P[Z > W ] = P[Z > w]fW (w) dw = wn−1 e−(λ+µ)w dw
w=0 (n − 1)! w=0
Z ∞
λn (λ + µ)n wn−1 e−(λ+µ)w λn
= n
dw =
(λ + µ) w=0 (n − 1)! (λ + µ)n
by using the integral of the gamma density is 1. Hence answer is 1 − λ /(λ + µ)n .
n
[←EX]
12.14 Now Z ∞
Γ(α) = λα xα−1 e−λx dx
0
and hence
∞ ∞
d
Z Z
0 α α−1 −λx
Γ (α) = Γ(α) = λ ln(λ) x e dx + λα xα−1 ln(x) e−λx dx
dα 0 0
leading to
Z ∞ Z ∞
Γ0 (α + 1) = λα+1 ln(λ) xα e−λx dx + λα+1 xα ln(x) e−λx dx
0 0
= ln(λ)Γ(α + 1) + λΓ(α)E[X ln X]
and hence
1 α
E[X ln X] = [ ψ(α + 1)Γ(α + 1) − ln(λ)Γ(α + 1) ] = [ ψ(α + 1) − ln λ ] [←EX]
λΓ(α) λ
12.15
f(X|Y1 ,...,Yn ) (x|y1 , . . . , yn ) ∝ f (y1 , . . . , yn , x) = f (y1 , . . . , yn |x)fX (x)
n
Y
= xe−xyi λe−λx = λxn e−x(λ+y1 +···+yn )
i=1
Using the integral of the Gamma density gives
Z ∞
λn!
λxn e−x(λ+y1 +···+yn ) dx =
0 (λ + y1 + · · · + yn )n+1
and hence
xn e−x(λ+y1 +···+yn ) (λ + y1 + · · · + yn )n+1
f(X|Y1 ,...,Yn ) (x|y1 , . . . , yn ) =
n!
which is the gamma(n + 1, λ + y1 + · · · + yn ) distribution. Hence E[X|Y1 = y1 , . . . , Yn = yn ] = (n + 1)/(λ + y1 + · · · + yn ).
[←EX]
12.16 We can express the density in the form
k
!
X
fX (x|θ) = h(x)g(θ) exp ηi (θ)hi (x)
i=1
where θ = (n, α), h(x) = 1, g(θ) = αn /Γ(n), η1 (θ) = n − 1, η2 (θ) = −α, h1 (x) = ln x and h2 (x) = x. [←EX]
12.17 The joint density of (X, Y ) is
e−x xj αn xn−1 e−αx
P[Y = j|X = x]fX (x) = for x > 0 and j = 0, 1, . . . .
j! Γ(n)
Hence Z ∞ −x j n n−1 −αx Z ∞
e x α x e αn
P[Y = j] = dx = xj+n−1 e−(1+α)x dx
x=0 j! Γ(n) j!Γ(n) x=0
Z ∞
αn Γ(j + n) (1 + α)j+n j+n−1 −(1+α)x
= x e dx
j!Γ(n) (1 + α)j+n x=0 Γ(j + n)
j
αn Γ(j + n) j + n − 1  α n
  
1
= =
j!Γ(n) (1 + α)j+n j 1+α 1+α
j+n−1 j n

which is the negative binomial distribution j p (1 − p) , the distribution of the number of successes before the
nth failure. [←EX]
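A numerical sketch of this mixture result (illustrative only; it assumes numpy and scipy, and with p = α/(1 + α) scipy's nbinom.pmf uses exactly the pmf displayed above):

    import numpy as np
    from scipy.stats import nbinom

    # Check of answer 12.17: X ~ gamma(n, alpha), Y | X ~ Poisson(X)  =>  Y is negative binomial.
    rng = np.random.default_rng(5)
    n, alpha = 3, 1.5
    x = rng.gamma(n, 1/alpha, size=400_000)       # shape n, rate alpha (scale = 1/alpha)
    y = rng.poisson(x)

    p = alpha / (1 + alpha)
    for j in range(4):
        # nbinom.pmf(j, n, p) = C(j+n-1, j) p^n (1-p)^j, the pmf derived above
        print(j, np.mean(y == j), nbinom.pmf(j, n, p))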
12.18 Now φX (t) = E[eitX ] = α/(α − it). Using Y = αX − 1 gives φY (t) = E[eitY ] = e−it E[eitαX ] = e−it /(1 − it). Hence
|φY (t)| = 1/(1 + t2 )1/2 . Choose k > 2 and then |φY (t)|k ≤ 1/(1 + t2 ) which is integrable. Also, for |t| ≥ δ we have
|φY (t)| ≤ 1/(1 + δ 2√
)1/2 < 1 for all δ > 0.
It follows
√ that Sn / n √ has a bounded continuous density with density fn which satisfies limn→∞ fn (z) = n(z). Now
Sn / n = (αGn − n)/ n. Hence
√  √ 
n n+z n
fn (z) = fGn and hence the required result. [←EX]
α α
12.19 Let Lt = SK − SK−1 . Note that {K = 1} = {S1 ≥ t}. Then for x < t we have

X ∞
X
P[Lt ≤ x] = P[Lt ≤ x, K = n] = P[Sn − Sn−1 ≤ x, K = n]
n=1 n=2

X ∞
X
= P[Sn − Sn−1 ≤ x, Sn−1 < t ≤ Sn ] = P[Xn ≤ x, Sn−1 < t ≤ Sn−1 + Xn ]
n=2 n=2
∞ Z x
X ∞ Z
X x Z t
−αy
= P[Sn−1 < t < Sn−1 + y]αe dy = fSn−1 (v)αe−αy dv dy
n=2 y=0 n=2 y=0 v=t−y
∞ Z
X t Z x ∞ Z
X t
fSn−1 (v)αe−αy dy dv =
 −α(t−v)
− e−αx fSn−1 (v) dv

= e
n=2 v=t−x y=t−v n=2 v=t−x
P∞
But n=2 fSn−1 (v) = α. Hence
P[Lt ≤ x] = 1 − e−αx − αxe−αx and fLt (x) = α2 xe−αx
If x > t then

X ∞
X
P[Lt ≤ x] = P[Lt ≤ x, K = n] = P[t < X1 ≤ x] + P[Sn − Sn−1 ≤ x, K = n]
n=1 n=2

X
= e−αt − e−αx + P[Sn − Sn−1 ≤ x, Sn−1 < t ≤ Sn ]
n=2

X
= e−αt − e−αx + P[Xn ≤ x, Sn−1 < t ≤ Sn−1 + Xn ]
n=2
∞ Z x
X
= e−αt − e−αx + P[Sn−1 < t < Sn−1 + y]αe−αy dy
n=2 y=0
∞ Z
X t Z x
−αt −αx
=e −e + fSn−1 (v)αe−αy dy dv
n=2 v=0 y=t−v
∞ Z
X t
= e−αt − e−αx +
 −α(t−v)
− e−αx fSn−1 (v) dv

e
n=2 v=0
−αx −αx
=1−e − αte
−αx
and fLt (x) = α(1 + αt)e .
Z t Z ∞
E[Lt ] = α2 x2 e−αx dx + α(1 + αt)xe−αx dx
x=0 x=t
Z t Z ∞
= −αt2 e−αt + 2α xe−αx dx + α(1 + αt) xe−αx dx
0 x=t
2 − e−αt
=
α
and so E[Lt ] > 1/α.
(b)

X
P[Wt ≤ x] = P[SK ≤ t + x] = P[Sn ≤ t + x, K = n]
n=1

X
= P[t < X1 ≤ t + x] + P[Sn ≤ t + x, Sn ≥ t, Sn−1 < t]
n=2
∞ Z t
X
= e−αt − e−α(x+t) + fSn−1 (y) P[t − y ≤ Xn ≤ t + x − y] dy
n=2 0
∞ Z
X t
= e−αt − e−α(x+t) + fSn−1 (y) e−α(t−y) − e−α(x+t−y) dy
 

n=2 0
−αx
=1−e
So Wt has the same exponential distribution as the original Xj —this is the “lack of memory” property. [←EX]
12.20 We can express the density in the form
k
!
X
fX (x|θ) = h(x)g(θ) exp ηi (θ)hi (x)
i=1
where θ = n, h(x) = e−x/2 n/2
, g(θ) = 1/(2 Γ(n/2)), η1 (θ) = /2 − 1 and h1 (x) = ln x.
n [←EX]
du 1 −y/2 2
12.21 Use fY (y) = fU (u) dy = 2 e . This is the χ2 distribution. [←EX]
b
12.22 (a) Use the transformation y = λx . Then
∞ ∞
y n−1 e−y
Z Z
f (x) dx = dy = 1
0 0 Γ(n)
(b) Straightforward. (c) Use the transformation y = xb . [←EX]
12.23 (a) Just use Proposition (9.2a) on page 28 and the fact that the m.g.f. of the sum of j i.i.d. random variables is the
j th power of the m.g.f. of one. (b) Similarly use corollary (9.2b) on page 29.
From first principles. (a) The m.g.f. of Xp is
∞ j
k−1 j pet
X   
E[etXp ] = etk p (1 − p)k−j = for t < − ln(1 − p).
k−j 1 − (1 − p)et
k=j
It follows that the m.g.f. of pXp is
j
petp

E[etpXp ] = for t < − ln(1 − p)/p.
1 − (1 − p)etp
Setting x = pt in the standard inequality x < ex − 1 < x/(1 − x) for x < 1 shows that
etp − 1 t etp − 1
t< < for p < 1/t, and hence lim =t
p 1 − tp p→0 p
Hence for every t ∈ (0, ∞) we have
1 − (1 − p)etp 1 etp − 1
= 1 − −→ 1 − t as p → 0.
petp etp p
We have shown that
1
lim E[etpXp ] =
p→0 (1 − t)j
and the limit on the right hand side is the m.g.f. of the gamma(j, 1) distribution.
(b) Similar generalisation of the proof of corollary (9.2b) on page 29. [←EX]
Chapter 2 Section 14 on page 44 (exs-betaarcsine.tex)
0 α−2 β−2 0
14.1 (a) By differentiation, fX (x) = cx (1 − x) [α − 1 + x(2 − α − β)] where c = 1/B(α, β) > 0. Hence fX (x) < 0
0
for x < x0 and fX (x) > 0 for x > x0 .
0 α−2
(c) Use fX (x) = cx (1 − x)β−2 [(α − 1)(1 − x) + x(1 − β)]. [←EX]
14.2
Γ(α + β) 1 m+α−1 Γ(α + β)Γ(m + α)
Z
E[X m ] = x (1 − x)β−1 dx =
Γ(α)Γ(β) 0 Γ(α)Γ(m + α + β)
(b) E[X] = α/(α + β) and E[X 2 ] = (α + 1)α/(α + β + 1)(α + β). Hence var[X] = αβ/(α + β + 1)(α + β)2 . (c)
E[1/x] = (α + β − 1)/(α − 1). [←EX]
14.3 Now
α αβ
E[X] = var[X] =
α+β (α + β)2 (α + β + 1)
and
(α + 2)(α + 1)α (α + 3)
E[X 3 ] = E[X 4 ] = E[X 3 ]
(α + β + 2)(α + β + 1)(α + β) (α + β + 3)
(a) Using equation(2.22a) gives
E[X 3 ] − 3µσ 2 − µ3 α(α + β + 1)3/2 (α + β)2 (α + 2)(α + 1)
 
3αβ 2
skew[X] = = − − α
σ3 (αβ)3/2 (α + β + 2)(α + β + 1) (α + β + 1)
2(α + β + 1)1/2 (β − α)
=
(αβ)1/2 (α + β + 2)
(b) Using equation(2.25a) gives


E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4
κ[X] =
σ4
(α + β) (α + β + 1)2 β − 3α
4
6α3 β 3α4
 
3
= E[X ] + +
(αβ)2 α+β (α + β)4 (α + β + 1) (α + β)4
3(α + β + 1)(2α2 + α2 β + αβ 2 − 2αβ + 2β 2 )
=
αβ(α + β + 2)(α + β + 3)
(c) Substituting α = β = 2 gives skew[X1 ] = 0 and κ[X1 ] = 15/7. Substituting α = 3 and β = 2 gives skew[X2 ] = − 2/7
and κ[X2 ] = 33/14. Substituting α = 2 and β = 3 gives skew[X3 ] = 2/7 and κ[X3 ] = 33/14. Substituting α = β = 1/2 gives
skew[X4 ] = 0 and κ[X4 ] = 3/2. [←EX]
14.4 X has density f_X(x) = αx^{α−1}. Let Y = −ln(X). Then Y ∈ (0, ∞), X = e^{−Y} and dx/dy = −x. Hence f_Y(y) = f_X(x)|dx/dy| = αe^{−y(α−1)} × e^{−y} = αe^{−αy} for y ∈ (0, ∞), as required. [←EX]
dy 2
14.5 (a) Now fY (y) = fX (x)/ dx x fx (x). Hence
β−1
Γ(α + β) (y − 1)β−1

Γ(α + β) 1 1
fY (y) = 1 − = for y > 1.
Γ(α)Γ(β) y α+1 y Γ(α)Γ(β) y α+β
(b) Now fY (y) = α/y α+1 for y > 1; this is the Pareto(α, 1, 0) distribution. [←EX]
14.6
Z 1
Γ(n + 1)
P[X > p] = z k−1 (1 − z)n−k dz
p Γ(k)Γ(n − k + 1)
Z 1
Γ(n + 1)
= [p + (1 − p)y]k−1 (1 − p)n−k+1 (1 − y)n−k dy
Γ(k)Γ(n − k + 1) 0
k−1  Z 1
k−1 r

Γ(n + 1) X
= p (1 − p)n−r y k−1−r (1 − y)n−k dy
Γ(k)Γ(n − k + 1) r 0
r=0
k−1 k−1  
X n! (k − 1 − r)!(n − k)! X n r
= pr (1 − p)n−r = p (1 − p)n−r
(n − k)!r!(k − 1 − r)! (n − r)! r
r=0 r=0
= P[Y ≤ k − 1] [←EX]
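This beta–binomial identity can be confirmed numerically (illustrative sketch, assuming scipy; the values of n, k and p are arbitrary):

    from scipy.stats import beta, binom

    # Check of answer 14.6: X ~ beta(k, n-k+1), Y ~ binomial(n, p)  =>  P[X > p] = P[Y <= k-1].
    n, k, p = 10, 4, 0.35
    print(beta.sf(p, k, n - k + 1))      # P[X > p]
    print(binom.cdf(k - 1, n, p))        # P[Y <= k-1]  -- the two values agree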
14.7 For k ∈ {0, 1, . . . , n} we have
1
n Γ(α + β) Γ(k + α)Γ(n + β − k)
Z  
P[X = k] = P[X = k|Y = y]fY (y) dy =
y=0 k Γ(α)Γ(β) Γ(n + β + α)
and hence
fY (y, X = k) Γ(n + α + β)
fY (y|X = k) = = y k+α−1 (1 − y)n+β−k−1 [←EX]
P[X = k] Γ(k + α)Γ(n + β − k)
14.8 For k ∈ {n, n + 1, . . .} we have
1
n − 1 Γ(α + β) Γ(k + α)Γ(n + β − k)
Z  
P[X = k] = P[X = k|Y = y]fY (y) dy =
y=0 k − 1 Γ(α)Γ(β) Γ(n + β + α)
and hence
fY (y, X = k) Γ(n + α + β)
fY (y|X = k) = = y k+α−1 (1 − y)n+β−k−1 [←EX]
P[X = k] Γ(k + α)Γ(n + β − k)
14.9 By §7.5 on page 23 we know that Uk:n ∼ beta(k, n − k + 1) and hence E[Uk:n ] = k/(n + 1) and hence E[Dk ] = 1/(n + 1).
[←EX]
14.10 (a) Let c = B(α, β). Then
xα−2 xα−2
cf 0 (x) = [(α − 1)(1 + x) − (α + β)x] = [(α − 1) − x(1 + β)]
(1 + x)α+β+1 (1 + x)α+β+1
(b), (c) and (d) follow directly from (a).
(e) A routine calculation gives cf 00 (x) = (α − 1)(α − 2) − 2(α − 1)(β + 2)x + (β + 1)(β + 2)x2 . The discriminant of this
quadratic in x is 4(α − 1)(β + 2)(α + β).
(f) If α ∈ (0, 1] then f 00 (x) > 0 for all x and hence f is convex. If α ∈ (1, 2] then x1 < 0, f 00 (x) < 0 for all x < x2 ,
f 00 (x2 ) = 0, and finally f 00 (x) > 0 for x > x2 . If α > 2 then both x1 > 0 and x2 > 0 and f 00 (x) > 0 for all x < x1 ,
f 00 (x) < 0 for all x ∈ (x1 , x2 ) and f 00 (x) > 0 for all x > x2 . [←EX]
14.11 (a) Suppose α > 0 and β > 0. Then for all x > 0 we have
Z ∞
xα−1 xα B(α + 1, β − 1) α
fX (x) = α+β
Hence E[X] = α+β
dx = =
B(α, β) (1 + x) 0 B(α, β) (1 + x) B(α, β) β−1
R∞ α −α−β
using 0 x (1 + x) dx = B(α + 1, β − 1) for all α > −1 and β > 1.
(b) Similarly, for all β > 2 we have
Z ∞
2 xα+1 B(α + 2, β − 2) α(α + 1)
E[X ] = α+β
dx = =
0 B(α, β) (1 + x) B(α, β) (β − 1)(β − 2)
Hence var[X]. R∞
(c) E[X k ] = 0 xk+α−1 (1 + x)−α−β dx/B(α, β) = B(α + k, β − k)/B(α, β) by using the integral of the beta prime
density in equation(13.6a) equals 1 for α + k > 0 and β − k > 0. Hence part(a). [←EX]
14.12 (a) For β > 4 we have
α α(α + β − 1)
E[X] = var[X] =
β−1 (β − 1)2 (β − 2)
(α + 2)(α + 1)α (α + 3)(α + 2)(α + 1)α
E[X 3 ] = E[X 4 ] =
(β − 1)(β − 2)(β − 3) (β − 1)(β − 2)(β − 3)(β − 4)
Using equation(2.22a) shows that
E[X 3 ] − 3µσ 2 − µ3 (β − 1)3 (β − 2)3/2 3α2 (α + β − 1) α3
 
3
skew[X] = = 3/2 E[X ] − −
σ3 α (α + β − 1)3/2 (β − 1)3 (β − 2) (β − 1)3
(β − 2)1/2 4α + 2β − 2
= for β > 3.
α(α + β + 1)1/2 β−3
(b) Using equation(2.25a) shows that we have
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4
κ[X] =
σ4
(β − 1) (β − 2)2
4
4αE[X 3 ] 6α3 (α + β − 1) 3α4
 
4
= 2 E[X ] − + +
α (α + β − 1)2 (β − 1) (β − 1)4 (β − 2) (β − 1)4
 
3(β − 2) (α + 2)β 2 + (α2 + 4α − 4)β + 5α2 − 5α + 2)
= for β > 4. [←EX]
α(β − 4)(β − 3)(α + β − 1)
dy
14.13 Let Y = X/(1 − X); then Y ∈ (0, ∞). Also X = Y /(1 + Y ), 1 − X = 1/(1 + Y ) and dx = (1 + y)2 . Hence
dx xα−1 (1 − x)β−1 y α−1

1 1
fY (y) = fX (x) = = for y ∈ (0, ∞), as required. [←EX]
dy B(α, β) (1 + y)2 B(α, β) (1 + y)α+β
dy dy
| = x2 fX (x) = xα+1 B(α, β)(1 + x)α+β =

14.14 (a) Let Y = 1/X; hence | dx | = 1/x2 . Hence fY (y) = fX (x)/| dx
y β−1 B(β, α)(1 + y)α+β as required.

(Note that B(α, β) = B(β, α).)
(d) Let V = X/Y and W = Y ; then ∂(v,w)
1
∂(x,y) = y . Now

xn1 −1 y n2 −1 e−(x+y) f(X,Y ) (x, y) xn1 −1 y n2 e−(x+y) v n1 −1 wn1 +n2 −1 e−w(1+v)


f(X,Y ) (x, y) = and f(V,W ) (v, w) = = =
Γ(n1 )Γ(n2 ) ∂(v,w)
Γ(n1 )Γ(n2 ) Γ(n1 )Γ(n2 )
∂(x,y)
and hence
Z ∞
v n1 −1 v n1 −1 Γ(n1 + n2 )
fV (v) = wn1 +n2 −1 e−w(1+v) dw = as required.
Γ(n1 )Γ(n2 ) w=0 Γ(n1 )Γ(n2 ) (1 + v)n1 +n2
(f) Just use X/Y = (2X)/(2Y ) and part (e). [←EX]
2
14.15 Using the transformation u = v gives
x Z √x √x 2
du √
Z
2dv 2
√ = √ = arcsin(v) = arcsin( x)

u=0 π u(1 − u) v=0 π 1 − v
2 π v=0 π
The final equality to prove is:
2 √ arcsin(2x − 1) 1
arcsin( x) = + for x ∈ [0, 1]
π π 2
Throughout x ∈ [0, 1] and we take arcsin(x) ∈ [0, π2 ].
Let y = arcsin(x); then sin(y) = x and sin(−y) = −x. Hence arcsin(−x) = − arcsin(x).
Let y = π2 − arccos(x). Then sin(y) = x and hence arccos(x) + arcsin(x) = π2 .
Now sin(y) = x and 1 − 2x2 = cos2 (y) − sin2 (y) = cos(2y); hence 21 arccos(1 − 2x2 ) = y = arcsin(x).

Combining gives 2 arcsin( x) = arccos(1 − 2x) = π2 − arcsin(1 − 2x) = π2 + arcsin(2x − 1). [←EX]
14.16 The density of X is


1
f (x) = √ for x ∈ (a, a + b).
π (x − a)(a + b − x))
Hence for t > 0 we have
1
f (t + a + b/2) = p = f (a + b/2 − t)
π (t + b/2) (b/2 − t)
(b) Differentiating (x − a)(a + b − x)π 2 f (x)2 = 1 gives 2(x − a)(a + b − x)f 0 + (2a + b − 2x)f = 0 and hence
2x − 2a − b
f 0 (x) =
2π(x − a)3/2 (a + b − x)3/2
(c) Differentiating again gives 2(x − a)(a + b − x)f 00 + (6a + 3b − 6x)f 0 = 2f . Hence f 00 (x) > 0. [←EX]
14.17 (a) Setting α = 1/2 and β = 1/2 in the answer for exercise 14.2 gives
(n − 1/2)(n − 3/2) · · · 1/2
 
n Γ(n + 1/2) (2n)! 1 2n
E[X ] = 1 = = n =
Γ( /2)Γ(n + 1) n! 4 (n!)2 4n n
(b) Because X is symmetric about µ = 1/2, it follows that E[ (X − µ)n ] = 0 if n is odd. If n is odd
Z 1
n n 1 n
E[X ] = E[ (X − µ) ] = 1 2 x − 1/2 x−1/2 (1 − x)−1/2 dx
Γ( /2) 0
Z 1/2
2 un du
= 1 2 by setting u = x − 1/2.
Γ( /2) −1/2 (1 − 4u2 )1/2
Z 1 (n−1)/2
1 v dv
= n 1 2 by setting v = 4u2 .
2 Γ( /2) 0 (1 − v)1/2
 
1 Γ( n/2 + 1/2)Γ( 1/2) 1 Γ( n/2 + 1/2) 1 1 n
= n 1 2 = n 1 = by using the result in part(a).
2 Γ( /2) Γ( n/2 + 1) 2 Γ( /2)Γ( n/2 + 1) 2n 2n n/2
(c) Now
Z 1 ∞ ∞
Γ(k + 1/2) (it)k X 1 2k (it)k
 
1 X
φX (t) = 1 2 eitx x−1/2 (1 − x)−1/2 dx = = [←EX]
Γ( /2) 0 Γ( 1/2)Γ(k + 1) k! 4k k k!
k=0 k=0
(d) Just use X = a + bZ where Z ∼ arcsine(0, 1). Hence E[X] = a + b/2 and var[X] = b2 /8. [←EX]
14.18 If X ∼ arcsine(a, b) then Y = (X − a)/b ∼ arcsine(0, 1); hence skew[X] = skew[Y ] and κ[X] = κ[Y ]. Now
√ E[Y ] =

var[Y ] = 1/8, E[Y 3 ] = 5/16 and E[Y 4 ] = 35/128. Hence skew[Y ] = (5/16) − (3/16) − (1/8) /(1/16 2) = 0 and
1/2,

κ[X] = (35/128) − (5/8) + (3/16) + (3/16) /(1/64) = 3/2. [←EX]


14.19 (a) and (b) Use the results for the distribution function, equation(13.9a), and the quantile function, equation(13.9b), and
the results in §7.6 on page 23. [←EX]
14.20 (a) Let Y = X 2 ; then Y ∈ (0, 1). Also
fX (x) fX (x) fX (x) 1 1
fY (y) = 2 dy
=2 = = √ = √ as required.
| dx | 2|x| |x| |x|π 1 − x2 π y(1 − y)
(b) Let Y = sin(X). Then Y ∈ (−1, 1). Also
fX (x) fX (x) 2 1 1 1
fY (y) = 2 dy = 2 = = p = √ as required.
| dx | | cos(x)| 2π | cos(x)| π 1 − y 2 π (1 − y)(1 + y)
Let Y = sin(2X). Then Y ∈ (−1, 1). Also
fX (x) fX (x) 1 1 1 1
fY (y) = 4 dy = 4 = = p = √ as required.
| dx | 2| cos(2x)| π | cos(2x)| π 1 − y 2 π (1 − y)(1 + y)
Let Y = − cos(2X). Then Y ∈ (−1, 1). Also
fX (x) fX (x) 1 1 1 1
fY (y) = 4 dy = 4 = = p = √ as required.
| dx | 2| sin(2x)| π | sin(2x)| π 1−y 2 π (1 − y)(1 + y)
(c)
fX (x) 1 1 1
fY (y) = dy = = p = √ as required.
| dx | π sin(x) π 1 − y 2 π (1 − y)(1 + y)
dy √
(e) In this case fX (x) = 2/π and | dx | = 2 sin(x) cos(x) = 2 y(1 − y). [←EX]

14.21 By part (a) of exercise 12.7 on page 38, we know that V ∼ beta( 12 , 12 ) = arcsine(0, 1). [←EX]
14.22 (a) Now V = X + Y has a triangular distribution on (−2π, 2π) with density:
|v|
 
1
fV (v) = 1− for −2π < v < 2π.
2π 2π
Let Z = sin(V ) = sin(X + Y ). Then Z ∈ (−1, 1). Also
|v|
X fV (v) X fV (v)  
1 1 X
fZ (z) = dz
= =√ 1−
v
| dv | v
| cos(v)| 1 − z 2 2π v 2π
The 4 values of v leading to z are sin−1 −1
P(z), π − sin (z), which are both√positive, and −2π + sin−1 (z) and −π − sin−1 (z)
which are both negative. This gives v |v| = 4π. Hence fZ (z) = 1/π 1 − z . 2 (b) Now −Y ∼ Uniform(−π, π).
Hence result follows by part (a). [←EX]
Chapter 2 Section 16 on page 49 (exs-normal.tex)
16.1 Using the transformation v = −t gives
Φ(−x) = ∫_{−∞}^{−x} (1/√(2π)) exp(−t^2/2) dt = −∫_{∞}^{x} (1/√(2π)) exp(−v^2/2) dv = ∫_{x}^{∞} (1/√(2π)) exp(−v^2/2) dv = 1 − Φ(x)
Let z = Φ^{−1}(1 − p); hence p = 1 − Φ(z) = Φ(−z); hence the result. [←EX]
16.2 (140 − µ)/σ = Φ−1 (0.3) = −0.5244005 and (200 − µ)/σ = 0.2533471. Hence σ = 77.14585 and µ = 180.4553. [←EX]
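The same arithmetic can be reproduced with scipy, assuming (as the quoted quantiles indicate) that the exercise specifies P[X < 140] = 0.3 and P[X < 200] = 0.6; the sketch below is illustrative only.

    import numpy as np
    from scipy.stats import norm

    # Reproduces answer 16.2 under the assumption P[X < 140] = 0.3 and P[X < 200] = 0.6.
    z1, z2 = norm.ppf(0.3), norm.ppf(0.6)         # -0.5244005 and 0.2533471
    A = np.array([[1.0, z1], [1.0, z2]])          # equations mu + sigma*z = quantile
    mu, sigma = np.linalg.solve(A, np.array([140.0, 200.0]))
    print(mu, sigma)                              # approx 180.4553 and 77.14585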
16.3 Now skew[X] = skew[Y ] and κ[X] = κ[Y ] where Y ∼ N (0, 1). Now the distribution of Y is symmetric about 0;
hence skew[X] = skew[Y ] = 0. Now E[Y 4 ] = 3; hence κ[X] = κ[Y ] = E[Y 4 ] = 3. [←EX]
16.4 By §7.6 we know that for any continuous distribution function we have F(X) ∼ uniform(0, 1) and hence E[F(X)] = 1/2. Hence
∫ F(x)f_X(x) dx = 1/2 and hence ∫_{−∞}^{∞} Φ(x)ϕ(x) dx = 1/2
Using Φ(−x) = 1 − Φ(x) gives the other result in part (a).
(b) The transformation y = (x − a)/σ has dy/dx = 1/σ and this gives part (b). [←EX]
16.5 Using the transformation y = cx − d shows that
Z ∞
1 ∞
 
bd by
Z
Φ(d − a − bx)ϕ(cx − d) dx = Φ d−a− − ϕ(y) dy
x=−∞ c y=−∞ c c
1 ∞ cd − ac − bd by
Z  
= P Z≤ − ϕ(y) dy
c y=−∞ c c
cd − ac − bd
 
1 bY
= P Z+ ≤
c c c
1
= P[cZ + bY ≤ cd − ac − bd]
c
where Y and Z are i.i.d. random variables with the N (0, 1) distribution. Hence cZ + bY ∼ N (0, c2 + b2 ) and
Z ∞
cd − ac − bd
 
1
Φ(d − a − bx)ϕ(cx − d) dx = Φ √
x=−∞ c c 2 + b2
as required.
If c < 0, using the transformation y = cx − d shows that
Z ∞
1 −∞
 
bd by
Z
Φ(d − a − bx)ϕ(cx − d) dx = Φ d−a− − ϕ(y) dy
x=−∞ c y=∞ c c
1 −∞ cd − ac − bd by
Z  
= P Z≤ − ϕ(y) dy
c y=∞ c c
cd − ac − bd
 
1 bY
=− P Z+ ≤
c c c
1
= − P[cZ + bY ≥ cd − ac − bd]
c
cd − ac − bd

1 1
= Φ √ −
c 2
c +b 2 c
where Y and Z are i.i.d. random variables with the N (0, 1) distribution.
If c = 0 then Z ∞ Z ∞
Φ(d − a − bx)ϕ(cx − d) dx = ϕ(−d) Φ(d − a − bx) dx
x=−∞ x=−∞
and use the inequality in part(a) of exercise 16.8. [←EX]
16.6 (a) Integrating by parts gives


2

∞ ∞ Z ∞
ϕ(x) 1 e−x /2
Z Z
1 1 −x2 /2 1 2
dx = √ e dx = − √ − √ e−x /2 dx
x2 x 2 x

w 2π w 2π
w
2π w
−w2 /2 Z ∞
1 e
=√ − ϕ(x) dx
2π w w
Hence result.
(b) Suppose a ≤ x ≤ b. If x ≥ 0 then x2 ≤ b2 whilst if x < 0 then x2 ≤ a2 . Hence for all a ≤ x ≤ b we have
x2 ≤ max{a2 , b2 } = m2 and hence ϕ(m) ≤ ϕ(x) ≤ ϕ(0). Integrating over the interval (a, b) gives the result. [←EX]
16.7 Linear combinations of independent normals are normal and a normal distribution is specified by its mean and variance.
[←EX]
16.8 Using part (a) of exercise 16.6 gives
Z ∞  Z ∞
ϕ(x) 1
= 1 + 2 ϕ(u) du ≥ ϕ(u) du = 1 − Φ(x)
x x u x
Also Z ∞ Z ∞
1 + x2
 
ϕ(x) 1 1
= 1 + 2 ϕ(u) du ≤ 1 + 2 ϕ(u) du = [1 − Φ(x)]
x x u x x x2
It is easy to check that for all x > 0 we have x2 + 1 > x; hence
1 1 x x + x3 − 1 − x2 − x3 x − 1 − x2
− 2− = = 2 <0
x x 1 + x2 x2 (1 + x2 ) x (1 + x2 )
(b) By part (a) we have x[1 − Φ(x)]/ϕ(x) ≤ 1 and x[1 − Φ(x)]/ϕ(x) ≥ x2 /(1 + x2 ) → 1 as x → ∞. Hence part (b).
(c) Using part (a) of exercise 16.6 gives
Z y 
ϕ(x) ϕ(y) 1
− = 1 + 2 ϕ(u) du
x y x u
Then as for part
R ∞ (a). Part2k(d) is an easy consequence.
(e) Let Ik = x ϕ(u)/u du. Integrating by parts shows that
Z ∞ ∞ Z ∞
1 ϕ(u) 1 ϕ(x) Ik
Ik+1 = ϕ(u) 2k+2 du = − 2k+1

2k+1
uϕ(u) du = 2k+1

x u (2k + 1)u
x x (2k + 1)u (2k + 1)x 2k +1
2k+1 ϕ(x) ϕ(x)
Hence Ik + (2k + 1)Ik+1 = ϕ(x)/x for k = 1, 2, . . . . Hence I0 + I1 = x and I1 + 3I2 = x3 ; these two equations
lead to I0 = ϕ(x) x1 − x13 + 3I2 . Then use I2 + 5I3 = ϕ(x) to get I0 = ϕ(x) x1 − x13 + 1.3
 
x5 x5
− 1.3.5I3 .
By induction we see that for k = 2, 3, . . .
k−1 1.3.5 . . . (2k − 3)
 
1 1 1.3
I0 = ϕ(x) − + + · · · + (−1) + (−1)k 1.3.5 . . . (2k − 1)Ik
x x3 x5 x2k−1
Hence for k = 2j with j = 1, 2, 3, . . . 
1.3.5 . . . (4j − 3)

1 1 1.3
I0 ≥ ϕ(x) − + + ··· −
x x3 x5 x4j−1
Similarly for k = 2j + 1 with j = 1, 2, 3,  ...
1.3.5 . . . (4j − 1)

1 1 1.3
I0 ≤ ϕ(x) − 3 + 5 + ··· +
x x x x4j+1
R∞
Also I0 = x ϕ(u) du = 1 − Φ(x). Hence part (e).
(f) The first entries in part (e) give    
1 1 1 1 3
ϕ(x) − ≤ I0 ≤ ϕ(x) − +
x x3 x x3 x5
Hence
I0 3/x5 3
1≤ ≤ 1 + =1+ 2 2 → 1 as x → ∞.
ϕ(x)(1/x − 1/x ) 3 1/x − 1/x 3 x (x − 1)
Hence I0 ∼ ϕ(x) x1 − x13 as x → ∞. Similarly


I0 1/x − 1/x3 1
1≥ ≥ = → 1 as x → ∞.
ϕ(x)(1/x − 1/x + 3/x ) 3 5 1/x − 1/x + 3/x3 5 1 + 3/(x4 − x2 )
Hence I0 ∼ ϕ(x) x1 − x13 + x35 as x → ∞. Similarly for the general case.

[←EX]
16.9 We can take Y = 0 with probability 1/2 and Y = Z with probability 1/2, where Z ∼ N (0, 1). Hence E[Y n ] = 12 E[Z n ] = 0
if n is odd and 12 E[Z n ] = n!/(2(n+2)/2 ( n/2)!) if n is even.. [←EX]
16.10 (a) Clearly we hmust have c > 0; i also a > 0 is necessaryin order  to ensure fX (x) can integrate  to1. Now Q(x)  =
b b 2 b2 b2 b 2 b2 (x−µ)2
2

a(x − a x) = a (x − 2a ) − 4a2 and hence fX (x) = c exp − 4a2 exp −a(x − 2a ) = c exp − 4a2 exp − 2σ2
 2
b 1 b
where µ = 2a and σ 2 = 2a . Because fX (x) integrates to 1, we must also have c exp − 4a 2 = σ√12π . This answers (a).
b 1

(b) X ∼ N 2a , σ 2 = 2a . [←EX]
16.11 Now ln{ f (x) } = − ln(σ 2π) − (x − µ)2 /2σ 2 . Hence
√ Z Z
(x − µ)2 √ E[(X − µ)2 ] √ 1
H(X) = ln(σ 2π) f (x) dx + 2
f (x) dx = ln(σ 2π) + 2
= ln(σ 2π) + [←EX]
σ 2σ 2
16.12 Clearly X/σ and Y /σ are i.i.d. with the N (0, 1) distribution. Hence (X 2 + Y 2 )/σ 2 ∼ χ22 = Γ(1, 1/2) which is the
exponential ( 1/2) distribution with density 12 e−x/2 for x > 0. Hence X 2 + Y 2 ∼ exponential ( 1/2σ2 ) with expectation 2σ 2 .
(b) Clearly X1/σ, . . . , Xn/σ are i.i.d. with the N (0, 1) distribution. Hence Z 2 /σ 2 ∼ χ2n = gamma( n/2, 1/2). Hence
Z 2 ∼ gamma( n/2, 1/2σ2 ). [←EX]
1 −(x2 +y 2 )/2
16.13 The distribution
of (X, Y ) is f(X,Y ) (x, y) = 2π e for x ∈ R and y ∈ R. The absolute value of the Jacobian is
∂(v,w) 2(x2 +y 2 ) 2
∂(x,y) = y2
= 2(w + 1). Note that two values of (x, y) lead to the same value of (v, w). Hence f(V,W ) (v, w) =
−v/2
1
2π(1+w2 )
e for v > 0 and w ∈ R. Hence V and W are independent with fV (v) = 21 e−v/2 for v > 0 and fW (w) =
1
π(1+w2 )
for w ∈ R. So V ∼ exponential ( 21 ) and W ∼ Cauchy(1). [←EX]
16.14
f(X|Y1 ,...,Yn ) (x|y1 , . . . , yn ) ∝ f (y1 , . . . , yn , x) = f (y1 , . . . , yn |x)fX (x)
n
(yi − x)2 (x − µ)2
   
Y 1 1
= √ exp − √ exp −
σ 2π
i=1 1
2σ12 σ 2π 2σ 2
 Pn 2 Pn
nx2 x2 µ2

i=1 yi 2x i=1 y1 2µx
∝ exp − + − 2 − 2+ − 2
2σ12 2σ12 2σ1 2σ 2σ 2 2σ
 Pn 2 2

x i=1 y1 nx x µx
∝ exp 2
− 2 − 2+ 2
σ1 2σ1 2σ σ
Pn
αx2
 
n 1 µ yi
= exp − + βx where α = 2 + 2 and β = 2 + i=12
2 σ1 σ σ σ1
 α 
2
∝ exp − (x − β/α)
2
Hence the distribution of (X|Y1 = y1 , . . . , Yn = yn ) is N ( β/α, σ 2 = 1/α). Note that α, the precision of the result is the
sum of the (n + 1)-precisions. Also, the mean is a weighted average of the input means:
 P
β µ σ12 /n + ( yi )σ 2
=  [←EX]
α σ12 /n + σ 2
1 2 1 2 2 2
16.15 (a) Now E[eitZ1 +isZ2 ] = E[ei(t+s)X ]E[ei(t−s)Y ] = e− 2 (t+s) e− 2 (t−s) = e−s e−t itZ1
√ = E[e ]E[e
isZ2
].√
We have shown that X −Y and X +Y are i.i.d. N (0, 2). Hence V1 = (X −Y )/ 2 and V2 = (X +Y )/ 2 are i.i.d. N (0, 1).
Finally X−Y /X+Y = V1/V2 . (b) Let X1 = X+Y 2
2 , Then X1 ∼ N (0, σ = /2). Let X2 =
1 X−Y 2
2 , Then X2 ∼ N (0, σ = /2).
1
2 2
Let Z1 = 2X1 and Z2 = 2X2 . Now X1 and X2 are√independent by part (a); hence Z1 and √ 2Z are independent. Hence Z1
2
and Z2 are i.i.d. χ1 = gamma( /2, /2) with c.f.√1/ 1 − 2it. Hence −Z2 has the c.f.√
1 1 1/ 1 + 2it. Because Z1 and Z2 are
independent, the c.f. of 2XY = Z1 − Z2 is 1/ 1 + 4t2 . Hence the c.f. of XY is 1/ 1 + t2 .
(c)
Z ∞ Z ∞
1 1 2
E[eitXY ] = E[eityX ]fY (y) dy = E[eityX ] √ e− 2 y dy
−∞ −∞ 2π
Z ∞ Z ∞
1 − 12 t2 y 2 − 12 y 2 1 1 2 2 1
=√ e e dy = √ e− 2 y (1+t ) dy = √
2π −∞ 2π −∞ 1 + t2
√ √
(d) Now X = σX1 and Y = σY1 where the c.f. of X1 Y1 is 1/ 1 + t2 . Hence the c.f. of XY is 1/ 1 + σ 4 t2 .
(e) Take σ = 1. Then the m.g.f. is
Z ∞ Z ∞
1 1 2
E[etXY ] = E[etyX ]fY (y) dy = E[etyX ] √ e− 2 (y−µ) dy
−∞ −∞ 2π
Z ∞ µ2 Z

1 1 2 2 1 2 e− 2 1 2 2
=√ eµty+ 2 t y e− 2 (y−µ) dy = √ e− 2 y (1−t )+µy(1+t) dy
2π −∞ 2π −∞
µ2 Z

e− 2 (1 − t2 )
  
2µy
= √ exp − y2 − dy
2π −∞ 2 1−t
Z ∞ " 2 #
µ2 µ2 (1 + t) (1 − t2 )
  
1 µ
= exp − + √ exp − y− dy
2 2(1 − t) 2π −∞ 2 1−t
 2 
µ t 1
= exp √
1−t 1 − t2
2
For the general case E[etXY ] = E[etσ X1 Y1 ] where X1 and Y1 are i.i.d. N ( µ/σ, 1) and hence
µ2 t iµ2 t
   
1 1
E[etXY ] = exp √ and the c.f. is E[e itXY
] = exp √ [←EX]
1 − tσ 2
1−σ t4 2 1 − itσ 2
1 + σ 4 t2
16.16 Use the previous question. In both cases, the c.f. is 1/(1 + t2 ). [←EX]
16.17 (a) Now
∞ ∞  
b b
Z Z
 
b2
 1  
b2

exp − 21 u2 + u2
du = 1 − 2 + 2 + 1 exp − 12 u2 + u2
du
0 2 0 u u
Consider the integral
∞  
b
Z   
1 2 b2
I1 = + 1 exp − 2 u + u2
du
0 u2
The transformation u → z with z = u − ub is a 1 − 1 transformation: (0, ∞) → (−∞, ∞). Also dz
du =1+ b
u2
. Hence
Z ∞
1 2 √
I1 = e−b e− 2 z dz = e−b 2π
−∞
Now consider the integral
∞  
b
Z   
b2
I2 = 1 − 2 exp − 12 u2 + u2
du
0 u
√ √
Consider the transformation z = u + ub . This is a 1 − 1 transformation (0, b) → (∞, 2 b) and a 1 − 1 transformation
√ √
( b, ∞) → (2 b, ∞). Hence
Z √b Z ∞ !   Z 2√b Z ∞
b  
b2
 1 2 1 2
I2 = + √ 1
1 − 2 exp − 2 u + u2 2
du = eb e− 2 z dz + √ eb e− 2 z dz = 0
0 b u ∞ 2 b
as required.
(b) Just use the transformation u = |a|v in part (a) and then set b1 = b/|a|. [←EX]
16.18 (a) Let V = X + Y ; then V ∼ N (0, 2) and
f(X,V ) (x.v) fX (x)fY (v − x)
fX|V (x|v) = =
fV (v) fV (v)
 2
(v − x)2
   2
1 x 1 1 v
= √ exp − √ exp − √ exp −
2π 2 2π 2 4π 4
2
  
1 1 v
= √ exp − x2 + (v − x)2 −
π 2 2
2
    h 
1 2 v 1 v i2
= √ exp − x − vx + = √ exp − x −
π 4 π 2
v 1
and this is the density of N ( 2 , 2 ).
(b) First suppose X ∼ N (0, σ12 ) and Y ∼ N (0, σ22 ). Then V = X + Y ∼ N (0, σ12 + σ22 ). Then
x2 (v − x)2
   
1 1
fX (x)fY (v − x) = q exp − 2 q exp −
2πσ 2 2σ1 2πσ 2 2σ22
1 2
2
 
1 v
fV (v) = q exp −
2π(σ12 + σ22 ) 2(σ1 + σ22 )
2

and hence q
1 σ12 + σ22 
1 x2 (v − x)2

v2

fX|V (x|v) = √ exp − + − 2
2 σ12 σ22 σ1 + σ22
q
2π σ12 σ22
q
2 #
σ12 + σ22
"
1 1 x(σ12 + σ22 ) − σ12 v
=√ exp −
σ12 σ22 (σ12 + σ22 )
q
2π σ2 σ2 2
1 2
and hence the density of X|X + Y = v is
σ12 v σ12 σ22
 
N ,
σ12 + σ22 σ12 + σ22
Now suppose V |W has density fV |W (v|w). Let V1 = V + a and W1 = W + b. Then
f(V1 ,W1 ) (v1 , w1 ) f(V,W ) (v1 − a, w1 − b)
fV1 |W1 (v1 |w1 ) = = = fV |W (v1 − a|w1 − b)
fW1 (w1 ) fW (w1 − b)
It follows that the general result is that the density of X|X + Y = v is


 2
σ1 (v − µ1 − µ2 ) σ12 σ22

N + µ1 , 2 [←EX]
σ12 + σ22 σ1 + σ22
16.19 Now X = a1 X1 + · · · + an Xn where a1 = · · · = an = 1/n and X1 − X = b1 X1 + · · · + bn Xn where b1 = 1 − 1/n and
b2 = · · · = bn = −1/n. By equation(15.7a) we have
n n  Pn 2 
X 1X 1 2 σ
0= ai bi σi2 = bi σi2 = σ1 − i=1 i
n n n
i=1 i=1
Hence result. [←EX]
16.20 In the following, t = (t1 + · · · + tn )/n. The characteristic function of (X, X1 − X, X2 − X, . . . , Xn − X) is,
h P i
  n
E exp itX + it1 (X1 − X) + · · · + itn (Xn − X) = E exp i j=1 Xj (tj + t/n − t)
n
Y  
= E exp iXj (tj + t/n − t
j=1
Yn
exp µj (tj + t/n − t) − 21 σ 2 (tj + t/n − t)2

=
j=1
X 
= exp µt − 12 σ 2 t2 /n exp µj (tj − t) − 12 σ 2 (tj − t)2
 P
   
= E exp itX E exp it1 (X1 − X) + · · · + itn (Xn − X)
Hence the result. Parts (b) and (c) follow from the result that if V and W are random objects and f and g are measurable
functions, then f (V ) and f (W ) are independent. [←EX]
16.21 By part (a) of exercise 12.7 on page 38, we know that if W ∼ gamma(m, α), Z ∼ gamma(n, α) and W and Z are
independent, then W + Z and W/(W + Z) are independent and W/(W + Z) ∼ beta(m, n). 
(a) Take W = Xk2 ∼ χ21 = gamma( 1/2, 1/2) and Z = X12 + · · · + Xn2 − Xk2 ∼ χ2n−1 = gamma (n−1)/2, 1/2 . Hence part (a).
(b) Take W = X12 +X22 ∼ χ22 = gamma(1, 1/2) and Z = X32 +X42 ∼ χ22 = gamma(1, 1/2). Also beta(1, 1) = uniform(0, 1).
Hence result. [←EX]
16.22 Z ∞ Z µ
(x − µ)2 (x − µ)2
    
1
E |X − µ|n = √ (x − µ)n exp − n
 
dx + (µ − x) exp − dx
2πσ 2 µ 2σ 2 −∞ 2σ 2
Z ∞  2
2 n n t
=√ t σ exp − dt
2π 0 2
Z ∞
σn σn 2n/2 σ n
   
n+1 n+1
=√ v (n−1)/2 exp(− v/2) dv = √ 2(n+1)/2 Γ = √ Γ [←EX]
2π 0 2π 2 π 2
16.23 (a)and(b) Now Y = |X| where X ∼ N (µ, σ 2 ). Hence
x−µ −x − µ x−µ
      x + µ
FY (x) = P[Y ≤ x] = P[−x ≤ X ≤ x] = Φ −Φ =Φ +Φ −1
σ σ σ σ
Now FY (x) = FX (x) − FX (−x) implies fY (x) = fX (x) + fX (−x); hence
 r
(x − µ)2 (x + µ)2 (x2 + µ2 )
      µx 
1 1 2
fY (x) = √ exp − + √ exp − = exp − cosh (16.23a)
2πσ 2 2σ 2 2πσ 2 2σ 2 πσ 2 2σ 2 σ2
by using cosh x = 21 (ex + e−x ). (b) To find E[X] and var[X] we proceed as follows.
Z ∞ Z ∞
(x − µ)2 (x + µ)2
    
1
E[Y ] = √ x exp − dx + x exp − dx
2πσ 2 0 2σ 2 0 2σ 2
Z ∞ ∞
(x − µ)2 (x + µ)2
  Z   
1
=√ (x − µ) exp − dx + (x + µ) exp − dx + A
2πσ 2 0 2σ 2 0 2σ 2
µ2 µ2
    
1
=√ σ 2 exp − 2 + σ 2 exp − 2
2πσ 2 2σ 2σ
where
Z ∞ Z ∞
(x − µ)2 (x + µ)2
    
µ
A= √ exp − dx − exp − dx
2πσ 2 0 2σ 2 0 2σ 2
"Z #
∞ Z ∞
µσ 2 2
h µ  µ i h  µ i
=√ e−y /2 dy − e−y /2 dy = µ Φ −Φ − = µ 1 − 2Φ −
2πσ 2 −µ/σ µ/σ σ σ σ
Hence r
µ2
   µ i
2 h
E[Y ] = σ exp − 2 + µ 1 − 2Φ −
π 2σ σ
2 2
Clearly var[Y ] = var[|X|] = E[X 2 ] − {E[|X|} = var[X] + {E[X]} − µ2Y = σ 2 + µ2 − µ2Y .
(d) The mgf, E[etY ], is
Z ∞ Z ∞
(x − µ)2 (x + µ)2
    
1
√ exp tx − dx + exp tx − dx =
2πσ 2 0 2σ 2 0 2σ 2
 2 2 Z ∞ Z ∞
(x − µ − σ 2 t)2
 2 2
(x + µ − σ 2 t)2
     
1 σ t σ t
√ exp + µt exp − dx + exp − µt exp − dx
2πσ 2 2 0 2σ 2 2 0 2σ 2
 2 2 h  2 2 h
σ t  µ i σ t µ i
= exp + µt 1 − Φ − − σt + exp − µt 1 − Φ − σt
2 σ 2 σ
Hence the cf is
σ 2 t2 σ 2 t2
 h  µ i  h µ i
φY (t) = E[eitY ] = exp − + iµt 1 − Φ − − iσt + exp − − iµt 1 − Φ − iσt [←EX]
2 σ 2 σ
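The expression for E[Y] above is easy to check by simulation (illustrative sketch only, assuming numpy and scipy; the values of µ, σ and the seed are arbitrary):

    import numpy as np
    from scipy.stats import norm

    # Check of the mean in answer 16.23: X ~ N(mu, sigma^2), Y = |X|,
    # E[Y] = sigma*sqrt(2/pi)*exp(-mu^2/(2 sigma^2)) + mu*(1 - 2*Phi(-mu/sigma)).
    rng = np.random.default_rng(6)
    mu, sigma = 0.8, 1.7
    y = np.abs(rng.normal(mu, sigma, size=1_000_000))

    theory = sigma*np.sqrt(2/np.pi)*np.exp(-mu**2/(2*sigma**2)) + mu*(1 - 2*norm.cdf(-mu/sigma))
    print("empirical:", y.mean(), " theory:", theory)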
16.24 (a) Obvious from the expression for the density function, equation(16.23a).
(b) Now X = |Y | where Y ∼ N (µ, σ 2 ). Hence bX = |bY | and bY ∼ N (bµ, b2 , σ 2 ). [←EX]
q
2 2 2 2
16.25 (a) By exercise 16.23, fX (x) = √2 e−x /2σ = σ1 π2 e−x /2σ for x ∈ (0, ∞).
σ 2π
2 2 2 2 2 2
(b) Now fX (x) = ce−x /2σ with c > 0; hence fX 0
(x) = −(cx/σ 2 )e−x /2σ and fX 00
(x) = c(x2 − σ 2 )e−x /2σ /σ 4 . Hence
00 00
fX is decreasing on (0, ∞). (c) For x < σ we have fX < 0 and for x > σ we have fq X > 0.
(e) By exercise 16.23, E[X] = σ π2 and var[X] = σ 2 − µ2X =

(d) By exercise 16.23, FX (x) = 2Φ x/σ − 1.
 
σ 2 (1 − 2/π). (f) By exercise 16.23, φX (t) = 2 exp −σ 2 t2 /2 [1 − Φ (−iσt)]. [←EX]
16.26 (a) By exercise 16.22,
  2n/2 σ n
 
n+1
E Xn = √ Γ for n = 0, 1, . . . .
π 2
√ √
(b)
 Now E[X] = σ 2/ π and var[X] = σ 2 (1 − 2/π). Then substitute in equation(2.22a) which is skew[X] =
E[X 3 ] − 3µX σx2 − µ3X /σX 3
. Use equation(2.25a),
E[X ] − 4µX E[X 3 ] + 6µ2X σX
4 2
+ 3µ4X π2 16 12(π − 2) 12
 
κ[X] = 4
= 3− + + 2 [←EX]
σX (π − 2)2 π π2 π
16.27 (a) Now X = |Y | where Y ∼ N (0, 1). Hence X 2 = Y 2 ∼ χ21 . (b) Now X = |Y | where Y ∼ N (0, σ 2 ). Hence
2 2
bX ∼ b|Y | where bY ∼ N (0, b σ ). [←EX]
Chapter 2 Section 18 on page 55 (exs-logN.tex)
18.1 (a) If x → √
∞ then ln x → ∞ and hence fX (x) → 0.
Let c = ln[ 2π σ] and c1 = c + µ2 /(2σ 2 ). Then
(ln x − µ)2 1 
= c1 − 2 (ln x)2 + 2(σ 2 − µ) ln x

ln[fX (x)] = c − ln x − 2
2σ 2σ
If x → 0 then ln x → −∞ and hence ln[fX (x)] → −∞ and hence fX (x) → 0. Differentiating gives
f 0 (x) 2 ln x 2(σ 2 − µ) 0
−2σ 2 X and hence σ 2 xfX (x) = fX (x) µ − σ 2 − ln x
 
= +
fX (x) x x
2 2
0
If follows that fX (x) > 0 for x < eµ−σ and fX 0
(x) < 0 for x > eµ−σ .
(b) Differentiating again gives
00 0 0
 fX (x)
σ 2 xfX (x) + σ 2 fX (x) µ − σ 2 − ln x −

(x) = fX
x
Hence
00 0
 fX (x) fX (x) [µ − σ 2 − ln x][µ − 2σ 2 − ln x] − σ 2
σ 2 xfX (x) µ − 2σ 2 − ln x −

(x) = fX =
x x σ2
fX (x)
= 2 [(ln x)2 + (3σ 2 − 2µ) ln x + µ2 − 3σ 2 µ + 2σ 4 − σ 2 ]
σ x
The discriminant of the quadratic in ln x is σ 2 (σ 2 + 4). Hence result. [←EX]
18.2 We know that α = e^{µ + σ^2/2} and β = e^{2µ + σ^2}(e^{σ^2} − 1). Hence
σ^2 = ln(1 + β/α^2) and e^µ = α/√(1 + β/α^2) = α^2/√(β + α^2), or µ = ln[α^2/√(β + α^2)] [←EX]
18.3 Let S4 denote the accumulated value at time t = 4 and let s0 denote the initial amount invested. Then S4 = s0 (1 + I1 )(1 +
P4
I2 )(1 + I3 )(1 + I4 ) and ln(S4 /s0 ) = j=1 ln(1 + Ij )
1 2
Recall that if If Y ∼ lognormal(µ, σ ) then Z = ln Y ∼ N (µ,pσ 2 ). Also E[Y ] = E[eZ ] = eµ+ 2 σ and var[Y ] =
2
2 2 2
e2µ+σ (eσ − 1). Hence eσ = 1 + var[Y ]/E[Y ]2 and eµ = E[Y ]/ 1 + var[Y ]/E[Y ]2 or µ = ln E[Y ] − σ 2 /2.
Using mean=1.08 and variance=0.001 gives µ1 = 0.0765325553785 and σ12 = 0.000856971515297.
Using mean=1.06 and variance=0.002 gives µ2 = 0.0573797028389 and σ22 = 0.00177841057009.
Hence ln(S4 /s0 ) ∼ N (2µ1 + 2µ2 , 2σ12 + 2σ22 ) = N (0.267824516435, 0.00527076417077). We
qwant
0.95 = P[S5 > 5000] = P[ln(S5 /s0 ) > ln(5000/s0 )] = P[Z > (ln(5000/s0 ) − (2µ1 + 2µ2 ))/ 2σ12 + 2σ22 ] Hence
 
ln(5000/s ) − (2µ + 2µ ) ln(5000/s0 ) − (2µ1 + 2µ2 )
0.05 = Φ  q0 1 2 
and so q = Φ−1 (0.05)
2
2σ1 + 2σ2 2 2
2σ1 + 2σ22
q
Hence ln(5000/s0 ) = (2µ1 + 2µ2 ) + Φ−1 (0.05) 2σ12 + 2σ22 = 0.148408095871.
 q 
Hence s0 = 5000 exp −(2µ1 + 2µ2 ) − Φ−1 (0.05) 2σ12 + 2σ22 = 4310.39616086 or £4,310.40. [←EX]
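The numerical steps above can be reproduced directly (illustrative sketch, assuming numpy and scipy; the helper lognormal_params is my own name for the inversion of the mean/variance relations quoted at the start of the answer):

    import numpy as np
    from scipy.stats import norm

    # Reproduces the arithmetic of answer 18.3.
    def lognormal_params(mean, var):
        # invert E[Y] = exp(mu + sigma2/2) and var[Y] = E[Y]^2 * (exp(sigma2) - 1)
        sigma2 = np.log(1 + var/mean**2)
        return np.log(mean) - sigma2/2, sigma2

    mu1, s21 = lognormal_params(1.08, 0.001)
    mu2, s22 = lognormal_params(1.06, 0.002)
    m, v = 2*mu1 + 2*mu2, 2*s21 + 2*s22           # ln(S4/s0) ~ N(m, v)

    s0 = 5000 * np.exp(-m - norm.ppf(0.05)*np.sqrt(v))
    print(mu1, s21, mu2, s22)
    print("initial investment s0:", s0)           # approx 4310.40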
18.4 The median is eµ because P[X < eµ ] = P[ln(X) < µ] = 1/2. Hence the median equals the geometric mean. The mean
1 2
is eµ+ 2 σ by equation(17.3a) on page 52. For the mode, we need to differentiate the density function which is
(ln(x) − µ)2
 
c
fX (x) = exp − for x > 0.
x 2σ 2
Hence
c 2(ln(x) − µ) (ln(x) − µ)2
   
dfX (x) c
= − 2− exp −
dx x x 2xσ 2 2σ 2
2 2 1 2
which equals 0 when x = eµ−σ . Clearly mode = eµ−σ < median µ
 = e < mean = e 2 .
µ+ σ
ln(q1 )−µ ln(q1 )−µ
(b) Lower quartile: 0.25 = P[X < q1 ] = P[ln(X) < ln(q1 )] = Φ σ and hence σ = −0.6744898 and hence
q1 = eµ−0.6744898σ . Similarly for the upper quartile,
 q3 =eµ+0.6744898σ .
ln(αp )−µ
(c) p = P[X ≤ αp ] = P[ln(X) ≤ ln(αp )] = Φ σ . Hence ln(αp ) = µ + σβp as required. [←EX]
p
k 1 2 2 1 2
2 2

18.5 (a) Now E[X ] = exp[kµ + 2 k σ ] and var[X] = exp[2µ + σ ] exp(σ ) − 1 . Hence σ = exp[µ + 2 σ ] exp(σ 2 ) − 1,
E[X 3 ] = exp[3µ + 92 σ 2 ], µσ 2 = exp[3µ + 23 σ 2 ] exp(σ 2 ) − 1 and µ3 = exp[3µ + 32 σ 2 ]. Hence


3
E[X 3 ] − 3µσ 2 − µ3 = exp[3µ + σ 2 ] exp[3σ 2 ] − 3 exp(σ 2 ) − 1 − 1
  
2
Using equation(2.22a) shows that

E[X 3 ] − 3µσ 2 − µ3 exp[3σ 2 ] − 3 exp[σ 2 ] − 1 − 1
skew[X] = =
σ3 exp[σ 2 ] − 1
3/2

exp[3σ 2 ] − 3 exp[σ 2 ] + 2 p p
= exp[σ 2 ] − 1 = exp[σ 2 ] + 2 exp[σ 2 ] − 1
exp[2σ ] − 2 exp σ ] + 1
2 2

Now E[X 4 ] = exp[4µ + 8σ 2 ], 4µE[X 3 ] = 4 exp[4µ + 5σ 2 ], 6µ2 σ 2 = 6 exp[4µ + 2σ 2 ] exp[σ 2 ] − 1 and 3µ4 = 3 exp[4µ +
2σ 2 ]. Using equation(2.25a) shows that we have
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 exp[6σ 2 ] − 4 exp[3σ 2 ] + 6 exp[σ 2 ] − 3
κ[X] = = 2
σ4 exp[σ 2 ] − 1
= exp[4σ 2 ] + 2 exp[3σ 2 ] + 3 exp[2σ 2 ] − 3
by using the factorization x6 − 4x3 + 6x − 3 = (x − 1)2 (x4 + 2x3 + 3x2 − 3). [←EX]
18.6 (a) Now X ∼ logN (µ, σ 2 ); hence ln(X) ∼ N (µ, σ 2 ). Hence ln(GMX ) = E[ln(X)] = µ; hence GMX = eµ .
2
(b) Now ln(GVX ) = var[ln(X)] = σ 2 . Hence GVX = eσ and GSDX = eσ . [←EX]
18.7 Now for x ∈ (0, k) we have
(ln(x) − µ)2
 
1
f (x|X < k) = √ exp −
P[X < k]σx 2π 2σ 2
(ln(x) − µ)2 √ ln(k) − µ
   
1
= exp − where α = σ 2πΦ
xα 2σ 2 σ
and hence
1 k (ln(x) − µ)2
Z  
E[X|X < k] = exp − dx
α 0 2σ 2
Using the transformation w = ln(x) − µ − σ 2 /σ gives dw 1 2
 2 2
dx = xσ and (ln(x) − µ) /σ = (w + σ) . Hence
(ln(k)−µ−σ 2 )/σ (ln(k)−µ−σ 2 )/σ


(w + σ)2 w2 σ 2
   
σ
Z Z
1
E[X|X < k] = exp − xσ dw = exp − + + µ dw
α −∞ 2 α −∞ 2 2
 2

ln(k)−µ−σ
σ2
Φ σ
= eµ+ 2  
ln(k)−µ
Φ σ
1 2
The other result is similar or use E[X|X < k]P[X < k] + E[X|X > k]P[X > k] = E[X] = eµ+ 2 σ . [←EX]
18.8 (a)
Z x
(ln(u) − µ)2
  
1 2 2 j 1
G(x) = exp −jµ − j σ u √ exp − du
2 0 uσ 2π 2σ 2
Z x
(ln(u) − µ − jσ 2 )2
 
1
= √ exp − du as required.
0 uσ 2π 2σ 2
Setting j = 1 in part (a) shows that xfX (x) = E[X]fX1 (x) where X1 ∼ logN (µ + σ 2 , σ 2 ). (b)
Z ∞Z u Z ∞Z ∞
2E[X]γX = (u − v)fX (u)fX (v) dvdu + (v − u)fX (u)fX (v) dvdu
Zu=0
∞ Z v=0
u Z u=0 v=u
∞ Z v
= (u − v)fX (u)fX (v) dvdu + (v − u)fX (u)fX (v) dvdu
u=0 v=0 v=0 u=0
Z ∞Z u
=2 (u − v)fX (u)fX (v) dvdu
u=0 v=0
Z ∞ Z ∞ Z u 
=2 uFX (u)fX (u)du − 2 vfX (v)dv fX (u)du
u=0 u=0 v=0
Z ∞ Z ∞ 
= 2E[X] FX (u)fX1 (u)du − FX1 (u) fX (u)du where X1 ∼ logN (µ + σ 2 , σ 2 ).
 u=0 u=0
  
= 2E[X] P[X ≤ X1 ] − P[X1 ≤ X] = 2E[X] P[ X/X1 ≤ 1] − P[ X1/X ≤ 1]

But X/X1 ∼ logN (−σ 2 , 2σ 2 ) and P[ X1/X ≤ 1] = P[X ≥ X1 ] = 1 − P[X < X1 ]. Hence
γX = 2P[Y ≤ 1] − 1 where Y ∼ logN (−σ 2 , 2σ 2 ).
 
σ
= 2Φ √ − 1 as required. [←EX]
2

18.9 (a) Let Z = 1/X. Then ln(Z) = − ln(X) ∼ 2


 N (−µ, σ ). Hence Z ∼ logN (−µ,
2
 σ ). (b) Let Z = cX b . Then
ln(Z) = ln(c) + b ln(X) ∼ N ln(c) + bµ, b2 σ 2 . Hence Z ∼ logN ln(c) + bµ, b2 σ 2 . [←EX]

18.10 Let Z = X1 /X2 . Then Z ∼ logN (µ1 − µ2 , σ12 + σ22 ) by using the previous 2 questions. [←EX]
Pn Pn Pn 2
 Pn Pn 2

18.11 (a) Let Z = X1 · · · Xn . Then ln(Z) = ln(Xi ) ∼ N
i=1P i=1 µi , i=1 σi . Hence Z ∼ logN i=1 µi , i=1 σi .
1/n n 2 2
(b) Let Z = (X ·
Q1 n · · Xn ) . Then ln(Z) =Pn i=1 ln(X i )/n ∼ N (µ,
Pn σ /n). Hence
Pn Z ∼ logN (µ, σ /n). Pn
(c) Let Z = i=1 Xiai . Then ln(Z) =

i=1 ai ln(Xi ) ∼ N i=1 ai µi , i=1 a2i σi2 . Hence mn = i=1 ai µi and
qP
n 2 2
sn = i=1 ai σi . [←EX]
Chapter 2 Section 20 on page 60 (exs-powerPareto.tex)
20.1 Now Y = (X − a)/h has the standard power law distribution powerlaw(α, 1, 0) which has density f (y) = αy α−1 for
R1
0 < y < 1. For j = 1, 2, . . . , we have E[Y j ] = α 0 y α+j−1 dy = α/(α + j). Hence
α α α
E[Y ] = E[Y 2 ] = and hence var[Y ] =
α+1 α+2 (α + 1)2 (α + 2)
Now X = a + hY . Hence
αh αh2 2aαh αh2
E[X] = a + E[X 2 ] = + + a2 and var[X] = [←EX]
α+1 α+2 α+1 (α + 1)2 (α + 2)
20.2 Because skew[a + bX] = skew[X] and κ[a + bX] = κ[X] when b > 0, we can assume X ∼ powerlaw(α, 1, 0). Now
E[X n ] = α/(α + n) and var[X] = α/(α + 1)2 (α + 2). Using equation(2.22a) shows that we have
E[X 3 ] − 3µσ 2 − µ3 (α + 1)3 (α + 2)3/2 3α2 α3
 
α
skew[X] = = − −
σ3 α3/2 α + 3 (α + 1)3 (α + 2) (α + 1)3
(α + 2)3/2 α(α + 1)3 3α2 2(1 − α)(α + 2)1/2
 
3
= − − α =
α 3/2 α+3 (α + 2) α1/2 (α + 3)
Using equation(2.25a) shows that we have
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 (α + 1)4 (α + 2)2 4α2 6α3 3α4
 
α
κ[X] = = − + +
σ4 α2 α + 4 (α + 1)(α + 3) (α + 1)4 (α + 2) (α + 1)4
1 (α + 1)4 (α + 2)2 4α(α + 1)3 (α + 2)2
 
= − + 6α2 (α + 2) + 3α3 (α + 2)2
α α+4 α+3
3(α + 2) (α + 1) (α + 2)(1 − 4α − α2 )
3
 
= + α2 (2 + 2α + α2 )
α (α + 3)(α + 4)
3(α + 2)
(α + 1)3 (α + 2)(1 − 4α − α2 ) + α2 (2 + 2α + α2 )(α + 3)(α + 4)
 
=
α(α + 3)(α + 4)
3(α + 2)  2 
= 3α − α + 2 [←EX]
α(α + 3)(α + 4)
20.3 Now X ∈ (0, h). Hence Y ∈ (ln( 1/h), ∞) and Y − ln( 1/h) ∈ (0, ∞).
Now P[Y ≤ y] = P[ln(X) ≥ −y] = P[X ≥ e−y ] = 1 − e−αy /hα = 1 − e−α(y−ln(1/h)) for y > ln( 1/h). Hence the
density is αe−α(y−ln(1/h)) for y > ln( 1/h), a shifted exponential. [←EX]
20.4 (a) FMn (x) = P[Mn ≤ x] = xn for 0 < x < 1. This is the powerlaw(n, 1, 0) distribution with density fMn (x) = nxn−1
for 0 < x < 1. (b) P[U 1/n ≤ x] = P[U ≤ xn ] = xn for 0 < x < 1. The same distribution as for part (a).
(c) Now (X − a)/h ∼ powerlaw(α, 1, 0); hence by part (b) we have (X − a)/h ∼ U 1/α and X ∼ a + hU 1/α . Then use
the binomial theorem and E[U j/α ] = α/(α + j). [←EX]
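Both parts are easy to illustrate by simulation (sketch only, assuming numpy; the values of α, n and the threshold 0.8 are arbitrary): U^{1/α} should have moments α/(α + j), and the maximum of n uniforms should have distribution function x^n.

    import numpy as np

    # Check of answer 20.4: U^(1/alpha) ~ powerlaw(alpha, 1, 0) and M_n = max of n uniforms ~ powerlaw(n, 1, 0).
    rng = np.random.default_rng(7)
    alpha, n = 2.5, 4

    x = rng.uniform(size=500_000)**(1/alpha)
    for j in (1, 2, 3):
        print("E[X^%d]:" % j, np.mean(x**j), "vs", alpha/(alpha + j))

    m = rng.uniform(size=(500_000, n)).max(axis=1)
    print("P[M_n <= 0.8]:", np.mean(m <= 0.8), "vs", 0.8**n)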
20.5 P[Mn ≤ x] = (x − a)nα /hnα and so Mn ∼ powerlaw(nα, h, a). [←EX]
20.6 Equation(3.3b) on page 9 gives
n! k−1 n−k n!
fk:n (x) = f (x) {F (x)} {1 − F (x)} = αxkα−1 (1 − xα )n−k
(k − 1)!(n − k)! (k − 1)!(n − k)!
and using the transformation v = xα gives
Z 1 Z 1
n! n! 1
E[Xk:n ] = αxkα (1 − xα )n−k dx = v k−1+ α (1 − v)n−k dv
(k − 1)!(n − k)! 0 (k − 1)!(n − k)! 0
n! Γ(k + α1 )Γ(n − k + 1) n! Γ(k + α1 )
= 1
=
(k − 1)!(n − k)! Γ(n + α + 1) (k − 1)! Γ(n + α1 + 1)
2 n! Γ(k + α2 )
E[Xk:n ]= [←EX]
(k − 1)! Γ(n + α2 + 1)
20.7 (a)
Rx
n − 1 0 yf (y) dy

E[Y1 + · · · + Yn−1 ] (n − 1)E[Y ] (n − 1)α
 
Sn
E X(n) = x = 1 + =1+ =1+ =1+
X(n) x x x F (x) α+1
as required.
(b) The density of (X1:n , X2:n , . . . Xn:n ) is f (x1 , . . . , xn ) = n!αn (x1 x2 · · · xn )α−1 /hnα for 0 ≤ x1 ≤ x2 · · · ≤ xn . Con-
sider the transformation to (W1 , W2 , . . . , Wn ) where W1 = X1:n /Xn:n , W2 = X2:n /Xn:n , . . . , Wn−1 = X(n−1):n /Xn:n
and Wn = Xn:n . This has Jacobian with absolute value
∂(w1 , . . . , wn ) 1
∂(x1 , . . . , xn ) = xn−1

n
Hence for 0 < w1 < 1, . . . , 0 < wn < 1, the density of the vector (W1 , . . . , Wn ) is
αn−1
h(w1 , . . . , wn ) = wnn−1 f (w1 wn )f (w2 wn ) · · · f (wn−1 wn ) = α(n−1) w1α−1 w2α−1 · · · wn−1 α−1 (n−1)α
wn
h
Hence W1 , W2 , . . . , Wn are independent. Hence W1 + · · · + Wn−1 is independent of Wn as required. [←EX]
20.8 (a) The distribution of X(i) give X(i+1) = x is the same as the distribution of the maximum of i independent random
i
variables from the density f (y)/F (x) for y ∈ (0, x); this maximum has distribution function {F (y)/F (x)} and density
i−1 i
if (y){F (y)} /{F (x)} for y ∈ (0, x).
(a) Hence
" #
r Z x
X(i) i
E X = x = y r f (y){F (y)}i−1 dy (20.8a)

r (i+1)
X(i+1) xr {F (x)}i 0
⇐ Substituting f (y) = αy α−1 /hα and FR(y) = y α /hα in the right hand side of equation(20.8a) gives iα/(iα + r)
x
as required. ⇒ Equation(20.8a) gives i 0 y r f (y){F (y)}i−1 dy = cxr {F (x)}i for x ∈ (0, h). Differentiating with
r i−1
respect to x gives ix f (x){F (x)} = cx {F (x)}i−1 [rF (x) + xif (x)]. Hence f (x)/F (x) = cr/ix(1 − c) > 0
r−1

because c < 1. Hence result. (b)


" #
r Z x
X(i+1) ixr f (y){F (y)}i−1
E X(i+1) = x = dy (20.8b)

r
X(i) {F (x)}i 0 yr
and then as for part (a). [←EX]
20.9 By equation(3.2a), the density of the vector is (X1:n , X2:n , . . . , Xn:n ) is
g(x1 , . . . , xn ) = n!f (x1 ) · · · f (xn ) = n!αn (x1 x2 · · · xn )α−1 for 0 ≤ x1 ≤ x2 · · · ≤ xn .
The transformation to (W1 , W2 , . . . , Wn ) has Jacobian with absolute value

∂(w1 , . . . , wn ) 1 2 n−2 n−1
∂(x1 , . . . , xn ) = x2 · · · xn where x2 · · · xn = w2 w3 · · · wn−1 wn and x1 = w1 w2 · · · wn

Hence for 0 < w1 < 1, . . . , 0 < wn < 1, the density of the vector (W1 , . . . , Wn ) is
h(w1 , . . . , wn ) = n!αn xα−1 1 (x2 · · · xn )α = (αw1α−1 )(2αw22α−1 ) · · · (nαwnnα−1 ) = fW1 (w1 )fW2 (w2 ) · · · fWn (wn )
Hence W1 , W2 , . . . , Wn are independent. Also fWk (wk ) = kαwkkα−1 which is powerlaw(kα, 1, 0).
(b) Now Xk:n = Wk Wk+1 · · · Wn ; hence
kα (k + 1)α nα
E[Xk:n ] = EWk ]E[Wk+1 ] · · · E[Wn ] = ···
kα + 1 (k + 1)α + 1 nα + 1
n! 1 1 1 n! Γ(k + α1 )
= ··· =
(k − 1)! k + α1 k + 1 + α1 n + α1 (k − 1)! Γ(n + 1 + α1 )
2
Similarly for E[Xk:n ] = EWk2 ]E[Wk+1
2
] · · · E[Wn2 ]. [←EX]
−1/α
20.10 (a) Just use P[Y ≤ y] = P[U ≤ y] = P[U 1/α ≥ 1/y] = P[U ≥ 1/yα ]. (b) Just use Y ∼ Pareto(α, x0 , a)
iff (Y − a)/x0 ∼ Pareto(α, 1, 0) and part (a). (c) By part (a), 1/X ∼ Pareto(α, 1, 0) iff 1/X ∼ U −1/α where
U ∼ uniform(0, 1) and hence iff X ∼ U 1/α where U ∼ uniform(0, 1) and hence iff X ∼ powerlaw(α, 1, 0). [←EX]
20.11 (a) Let Y = X − a; then Y ∼ Pareto(α, x0 , 0) and E[Y n ] = αxn 0 /(α − n) for α > n and ∞ otherwise. Hence
n   n   n−k
X n n−k X n a αxk0
E[X n ] = E[(Y + a)n ] = a E[Y k ] = if n < α and E[Y n ] = ∞ otherwise.
k k α−k
k=0 k=0
(b) Now µY = E[Y ] = αx0 /(α − 1) for α > 1 and E[(X − µX )n ] = E[(Y − µY )n ] and
n   n   n−k
αxk0

n
X n n−k n−k k
X n n−k αx0
E[(Y − µY ) ] = (−1) µY E[Y ] = (−1)
k k α−1 α−k
k=0 k=0
n
αn−k+1 xn0
X n  
= (−1)n−k for n < α.
k (α − 1)n−k (α − k)
k=0
In particular
αx20
var[X] = var[Y ] = for 2 < α.
(α − 2)(α − 1)2

(c) Suppose t < 0. Then E[etX ] = x0 αxα tx α+1
R
0 e /x dx. Set v = −tx; hence v > 0. Then
Z ∞ α −v α+1 Z ∞
αx0 e (−t) dx α(−x0 t)α e−v
E[etX ] = = dx = α(−x0 t)α Γ(−α, −x0 t)
−tx0 v α+1 (−t) −tx0 v α+1
R∞
where Γ(s, x) = x ts−1 e−t dt is the incomplete gamma function. Hence the c.f. is E[eitX ] = α(−ix0 t)α Γ(−α, −ix0 t).
[←EX]
20.12 Because skew[X + a] = skew[X] and κ[X + a] = κ[X], we can assume X ∼ Pareto(α, x0 , 0). We know that E[X n ] =
αxn0 /(α − n) for α > n and var[X] = αx20 /(α − 1)2 (α − 2) for α > 2. Using equation(2.22a) shows that for α > 3 we
have
E[X 3 ] − 3µσ 2 − µ3 (α − 1)3 (α − 2)3/2 3α2 α3
 
α
skew[X] = = − −
σ3 α3/2 α − 3 (α − 1)3 (α − 2) (α − 1)3
(α − 2)1/2   2(α + 1)(α − 2)1/2
= 1/2 (α − 1)3 (α − 2) − 3α(α − 3) − α2 (α − 2)(α − 3) =
α (α − 3) α1/2 (α − 3)
Using equation(2.25a) shows that for α > 4 we have
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4
κ[X] =
σ4
(α − 1)4 (α − 2)2 4α2 6α3 3α4
 
α
= − + +
α2 α − 4 (α − 1)(α − 3) (α − 1)4 (α − 2) (α − 1)4

(α − 2)2 (α − 1)4 4α(α − 1)3 6α2


 
= − + + 3α3
α α−4 (α − 3) (α − 2)
2
(α − 2) 6α2 (α − 3)(α − 4)
 
= (α − 1)4 (α − 3) − 4α(α − 1)3 (α − 4) + + 3α3 (α − 3)(α − 4)
α(α − 3)(α − 4) (α − 2)
3(α − 2)
(α − 1)3 (α − 2)(1 + 4α − α2 ) + α2 (α − 3)(α − 4)(α2 − 2α + 2)
 
=
α(α − 3)(α − 4)
3(α − 2)(3α2 + α + 2)
= [←EX]
α(α − 3)(α − 4)
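These closed forms can be checked against scipy, whose pareto(b) corresponds to Pareto(α = b, 1, 0) and which reports kurtosis in excess form (κ − 3). A small sketch (Python, numpy/scipy assumed); α is arbitrary but must exceed 4 for both formulas to apply.

    import numpy as np
    from scipy import stats

    a = 6.0                                               # shape parameter, needs a > 4
    skew_formula = 2 * (1 + a) / (a - 3) * np.sqrt((a - 2) / a)
    kurt_formula = 3 * (a - 2) * (3 * a**2 + a + 2) / (a * (a - 3) * (a - 4))

    skew_scipy, kurt_scipy = stats.pareto(a).stats(moments='sk')
    print(skew_formula, float(skew_scipy))                # should agree
    print(kurt_formula - 3, float(kurt_scipy))            # scipy reports excess kurtosis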
20.13 For x ∈ [d, ∞) we have
    P[X > x | X > d] = P[X > x]/P[X > d] = (x_0/x)^α / (x_0/d)^α = (d/x)^α    [←EX]
20.14 We know that E[X] is infinite. Also
    E[1/X] = ∫_1^∞ (1/x)·(1/(2x^{3/2})) dx = (1/2) ∫_1^∞ x^{−5/2} dx = (1/2) [−(2/3) x^{−3/2}]_1^∞ = 1/3    [←EX]

20.15 Let Y = X^n; then dy/dx = n x^{n−1} and hence
    f_Y(y) = (α/n)(x_0^n)^{α/n} / y^{α/n + 1}  for y ≥ x_0^n,
which is the density of the Pareto(α/n, x_0^n, 0) distribution. [←EX]
20.16 (a) P[Y ≤ y] = P[ln(X) ≤ y] = P[X ≤ e^y] = 1 − x_0^α e^{−αy} = 1 − e^{−α(y − ln(x_0))} for y > ln(x_0). Hence the density
is α e^{−α(y − ln(x_0))} for y > ln(x_0), a shifted exponential. In particular, if X ∼ Pareto(α, 1, 0), then the distribution of
Y = ln(X) is the exponential(α) distribution.
(b) For y > 1 we have P[Y ≤ y] = P[e^X ≤ y] = P[X ≤ ln y] = 1 − exp(−λ ln y) = 1 − 1/y^λ. [←EX]
20.17 Now GM_X is defined by ln(GM_X) = E[ln X]. Either use exercise 20.16 or directly: E[ln X] = α x_0^α ∫_{x_0}^∞ ln(x)/x^{α+1} dx =
α x_0^α ∫_{ln(x_0)}^∞ y e^{−αy} dy = ln(x_0) + 1/α and hence GM_X = x_0 exp(1/α).
From answer 18.8 on page 200 we have, where E[X] = αx0 /(α − 1),
Z ∞ Z ∞ Z u 
2E[X]γX = 2 uF (u)f (u)du − 2 vf (v)dv f (u)du
u=0 u=0 v=0
Z ∞ h  x α i αxα Z ∞ Z u
αxα αxα

0 0 0 0
=2 u 1− du − 2 v dv du
u=x0 u uα+1 u=x0 v=x0 v
α+1 uα+1
Z ∞  Z ∞ Z u

 
1 1 1
= 2αxα 0 α
− 0
du − 2α 2 2α
x 0 α
dv du
u=x0 u u2α u=x0 v=x0 v uα+1
2α2 x0 2αx0
= −
(2α − 1)(α − 1) (2α − 1)
and hence
α α−1 1
γX = − = [←EX]
2α − 1 2α − 1 2α − 1
20.18

0
1
xα0
n
x0α1 +···+αn
P[Mn > x] = P[X1 > x] · · · P[Xn > x] = · · · =
(x − a)α1 (x − a)αn (x − a)α1 +···+αn
which is the Pareto(α1 + · · · + αn , x0 , a) distribution. [←EX]
20.19 Using equation(3.3b) on page 9 gives
    f_{k:n}(x) = [n!/((k−1)!(n−k)!)] f(x) {F(x)}^{k−1} {1 − F(x)}^{n−k} = [n!/((k−1)!(n−k)!)] (α/x^{α+1}) (1 − 1/x^α)^{k−1} (1/x^{α(n−k)})
              = [n!/((k−1)!(n−k)!)] (α/x^{(n−k+1)α+1}) (1 − 1/x^α)^{k−1}
and hence
    E[X_{k:n}] = [n!/((k−1)!(n−k)!)] ∫_1^∞ (α/x^{(n−k+1)α}) (1 − 1/x^α)^{k−1} dx
               = [n!/((k−1)!(n−k)!)] ∫_0^1 (v^{n−k+1}/v^{1+1/α}) (1 − v)^{k−1} dv    (using v = 1/x^α)
               = [n!/((k−1)!(n−k)!)] ∫_0^1 v^{n−k−1/α} (1 − v)^{k−1} dv
               = [n!/((k−1)!(n−k)!)] Γ(n − k + 1 − 1/α) Γ(k) / Γ(n + 1 − 1/α) = [n!/(n−k)!] Γ(n − k + 1 − 1/α) / Γ(n + 1 − 1/α)
Similarly
    E[X_{k:n}²] = [n!/(n−k)!] Γ(n − k + 1 − 2/α) / Γ(n + 1 − 2/α)    [←EX]
20.20 By equation(3.2a) on page 8, the density of the vector is (X1:n , X2:n , . . . , Xn:n ) is
g(x1 , . . . , xn ) = n!f (x1 ) · · · f (xn ) = n!αn / (x1 x2 · · · xn )α+1 for 1 ≤ x1 ≤ x2 · · · ≤ xn .
The transformation
to (W1 , W2 , . . . , Wn ) has Jacobian with absolute value
∂(w1 , . . . , wn ) 1 n−1 n−2
∂(x1 , . . . , xn ) = x1 · · · xn−1 where x1 · · · xn−1 = w1 w2 · · · wn−1 and xn = w1 w2 · · · wn

Hence for w1 > 1, . . . , wn > 1, the density of the vector (W1 , . . . , Wn ) is


n!αn n!αn
h(w1 , . . . , wn ) = α α+1
= nα+1 (n−1)α+1
(x1 · · · xn−1 ) xn w1 w2 2α+1 w α+1
· · · wn−1 n
nα (n − 1)α 2α α
= · · · 2α+1 α+1 = fW1 (w1 )fW2 (w2 ) · · · fWn (wn )
w1nα+1 w2(n−1)α+1 wn−1 wn
(n−k+1)α
Hence W1 , W2 , . . . , Wn are independent. Also fWk (wk ) = (n−k+1)α+1
wk
which is Pareto((n − k + 1)α, 1, 0).
(b) Now Xk:n = W1 W2 · · · Wk ; hence
nα (n − 1)α (n − k + 1)α
E[Xk:n ] = EW1 ]E[W2 ] · · · E[Wk ] = ···
nα − 1 (n − 1)α − 1 (n − k + 1)α − 1
n! 1 1 1 n! Γ(n − k + 1 − α1 )
= 1 1
··· =
(n − k)! n − α n − 1 − α n − k + 1 − α (n − k)! Γ(n + 1 − α1 )
1

2
Similarly for E[Xk:n ] = EW12 ]E[W22 ] · · · E[Wk2 ]. [←EX]
20.21 Just use (X1 , . . . , Xn ) ∼ ( Y11 , . . . , Y1n ). [←EX]
20.22 Define the vector (Y1 , Y2 , . . . , Yn ) by
X2:n Xn:n
Y1 = X1:n , Y2 = , . . . , Yn =
X1:n X1:n
Exercise 6.11 on page 16(with answer 6.11 on page 174) shows that for y1 > 0 and 1 ≤ y2 ≤ · · · ≤ yn the density of
the vector (Y1 , . . . , Yn ) is
h(y1 , . . . , yn ) = n!y1n−1 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn )
αn xαn
0 1 1 1
= n! αn+1 · · · α+1
y1 y2α+1 y3α+1 yn
αnxαn 1 1 1 1 1 1
= αn+1 0
(n − 1)!αn−1 α+1 α+1 · · · α+1 = g(y1 ) (n − 1)!αn−1 α+1 α+1 · · · α+1
y1 y2 y3 yn y2 y3 yn
where g is the density of Y1 = X1:n (see answer 20.18 above). Hence part (a).
(b) R∞
n − 1 x yf (y) dy

E[Y1 + · · · + Yn−1 ] (n − 1)E[Y ] (n − 1)α
 
Sn
E X(1) = x = 1 + =1+ =1+ =1+
X(1) x x x 1 − F (x) α−1
as required. (c) The result of part (a) implies Y1 is independent of Y2 + · · · + Yn = (Sn −X1:n )/X1:n = Sn/X1:n − 1. Hence
Y1 is independent of Sn/X1:n as required. [←EX]
20.23 The distribution of X(i+1) give X(i) = x is the same as the distribution of the minimum of n − i independent random
n on−i
1−F (y)
variables from the density f (y)/[1 − F (x)] for y ∈ (x, ∞); this minimum has distribution function 1 − 1−F (x)
and density (n − i)f (y){1 − F (y)}n−i−1 /{1 − F (x)}n−i for y ∈ (x, ∞). Hence
" #
r Z ∞
X(i+1) (n − i)
E r X(i) = x = xr {1 − F (x)}n−i y r f (y){1 − F (y)}n−i−1 dy (20.23a)

X(i) x
⇐ Substituting f (x) = αxα 0 /x
α+1
and F (x) = 1 − xα α
0 /x in the right hand side of equation(20.23a) gives (n −
i)α/((nR− i)α + r) as required. The condition α > r/(n − i) ensures the integral is finite. ⇒ Equation(20.23a) gives

(n − i) x y r f (y){1 − F (y)}n−i−1 dy = cxr {1 − F (x)}n−i for x ∈ (x0 , ∞). Differentiating with respect to x gives
f (x)/{1 − F (x)} = cr/[x(n − i)(c − 1)] = α/x where α = rc/[(n − i)(c − 1)] > r/(n − i). Hence result. Part (b) is
similar. [←EX]
20.24 Let Z = Y /X . Note that F (x) = 1 − xα α
0 /x forx ≥ x0 and 0 otherwise. Then for z > 1 we have
Z ∞ Z ∞ Z ∞
xα αxα αx2α 1 1
P[Z ≤ z] = F (zx)f (x) dx = 1 − α0 α α+1
0
dx = 1 − 0
α 2α+1
dx = 1 − α
x0 x0 z x x z x0 x 2z
For z < 1 we have Z ∞ Z ∞ 
xα αxα zα zα

P[Z ≤ z] = F (zx)f (x) dx = 1 − α0 α 0
α+1
dx = z α − =
x0 /z x0 /z z x x 2 2
and hence
α/z α+1 if z > 1;
1
fZ (z) = 12 α−1 [←EX]
2 αz if z < 1;

20.25
P[M ≤ x, Y /X ≤ y] = P[M ≤ x, Y /X ≤ y, X ≤ Y ] + P[M ≤ x, Y /X ≤ y, Y < X]
= P[X ≤ x, Y ≤ yX, X ≤ Y ] + P[Y ≤ x, Y ≤ yX, Y < X]

P[X ≤ x, Y ≤ yX, X ≤ Y ] + P[Y ≤ x, Y < X] if y > 1;
=
P[Y ≤ x, Y ≤ yX] if y < 1.
Rx
x0
P[v ≤ Y ≤ yv]f (v) dv + P[Y ≤ x, Y < X] if y > 1;
=
P[Y ≤ x, Y ≤ yX] if y < 1.
(Rx Rx
[F (yv) − F (v)] f (v) dv + x0 [1 − F (v)] f (v) dv if y > 1;
= Rxx0  
x0
1 − F ( v/y) f (v) dv if y < 1.
(  x
x2α
 
f (v)
1
Z
α 1 − + 1 if y > 1; 1 0
= x0 I y α
where I = α
dv = α 1 − 2α
yα if y < 1. x0 v 2x0 x
(
1
x2α 1 − 2yα if y > 1;
 
0
= 1 − 2α yα
x 2 if y < 1.
= P[M ≤ x] P[ Y /X ≤ y] by using exercises 20.18 and 20.24. [←EX]
20.26 Let X1 = ln(X) and Y1 = ln(Y ). Then X1 and Y1 are i.i.d. random variables with an absolutely continuous distribution.
Also min(X1 , Y1 ) = ln [min(X, Y )] is independent of Y1 − X1 = ln(Y /X). Hence there exists λ > 0 such that X1 and
Y1 have the exponential (λ) distribution. Hence X = eX1 and Y = eY1 have the Pareto(λ, 1, 0) distribution. [←EX]
20.27 ⇐ By equation(3.3b) on page 9 the density of Xi:n is
 i−1  n−i
n! i−1 n−i n! β 1 1
fi:n (t) = f (t) {F (t)} {1 − F (t)} = 1 −
(i − 1)!(n − i)! (i − 1)!(n − i)! tβ+1 tβ tβ
n! β β i−1
= βn+1
t −1 for t > 1.
(i − 1)!(n − i)! t
By equation(3.5a) on page 10, the joint density of (Xi:n , Xj:n ) is, where c = n!/[(i − 1)!(j − i − 1)!(n − j)!],
 i−1  j−1−i  n−j
f(i:n,j:n) (u, v) = cf (u)f (v) F (u) F (v) − F (u) 1 − F (v)
i−1  j−i−1  n−j
β2

1 1 1 1
= c β+1 β+1 1 − β −
u v u uβ vβ vβ
 β i−1  β j−i−1
β2 u −1 v − uβ
= c β+1 β(n−j+1)+1
u v uβ uβ v β
β2 i−1  β j−i−1
= c β(j−1)+1 β(n−i)+1 uβ − 1 v − uβ

for 1 ≤ u < v.
u v
Use the transformation (T, W ) = (Xi:n , Xj:n/Xi:n ). The absolute value of the Jacobian is ∂(t,w)∂(u,v) = | /u| = 1/t. Hence
1

β2 β i−1  β j−i−1
f(T,W ) (t, w) = c t −1 w −1 = fi:n (t)fW (w)
tβn+1 wβ(n−i)+1
The fact that the joint density is the product of the marginal densities implies W and Y = Xi:n are independent.
⇒ The joint density of (Xi:n , Xj:n ) is given by equation(3.5a)
on page 10. The transformation to T = Xi:n , W =
∂(t,w)
Xj:n /Xi:n has Jacobian with absolute value ∂(u,v) = 1/u = 1/t. Hence (T, W ) has density
 i−1  j−i−1  n−j
f(T,W ) (t, w) = ctf (t)f (wt) F (t) F (wt) − F (t) 1 − F (wt)
Now T = Xi:n has density given by equation(3.3b) on page 9:
n! i−1 n−i
fT (t) = f (t) {F (t)} {1 − F (t)}
(i − 1)!(n − i)!
Hence the conditional density is, for all t > 1 and w > 1,
j−i−1  n−j
(n − i)! F (tw) − F (t) 1 − F (tw)

f(T,W ) (t, w) tf (tw)
f(W |T ) (w|t) = =
fT (t) (j − i − 1)!(n − j)! 1 − F (t) 1 − F (t) 1 − F (t)
(n − i)! 1 − F (tw)
 
∂q(t, w) j−i−1 n−j
= − {1 − q(t, w)} {q(t, w)} where q(t, w) =
(j − i − 1)!(n − j)! ∂w 1 − F (t)
and by independence, this must be independent of t. Hence there exists a function g(w) with
∂q(t, w) j−i−1 n−j
g(w) = {1 − q(t, w)} {q(t, w)}
∂w
j−i−1 
∂q(t, w) X j − i − 1

r n−j
= (−1)r {q(t, w)} {q(t, w)}
∂w r
r=0

j−i−1
X  j−i−1

∂q(t, w) r+n−j
= (−1)r {q(t, w)}
r ∂w
r=0
j−i−1  r+n−j+1
∂ X j−i−1 {q(t, w)}

= (−1)r
∂w r r+n−j+1
r=0
and hence
j−i−1 r+n−j+1
X  j−i−1 {q(t, w)}
Z 
g1 (w) = g(w) dw = (−1)r
r r+n−j+1
r=0
j−i−1
X  j−i−1

∂g1 (w) r+n−j ∂q(t, w)
0= = (−1)r {q(t, w)}
∂t r ∂t
r=0

∂q(t, w) ∂q(t, w)
= g(w)
∂t ∂w
and hence ∂q(t,w)
∂t = 0 and so q(t, w) is a function of w only. Setting t = 1 shows that q(t, w) = (1 − F (tw))/(1 − F (t)) =
q(1, w) = (1 − F (w))/(1 − F (1)). Hence 1 − F (tw) = (1 − F (w))(1 − F (t))/(1 − F (1)). But F (1) = 0; hence we have
the following equation for the continuous function F :
(1 − F (tw)) = (1 − F (t))(1 − F (w))
for all t ≥ 1 and w ≥ 1 with boundary conditions F (1) = 0 and F (∞) = 1. This is effectively Cauchy’s logarithmic
functional equation. It leads to
1
F (x) = 1 − β for x ≥ 1. [←EX]
x

Chapter 2 Section 22 on page 68 (exs-tCauchyF.tex)

22.1 (a) Use the transformation from (X, Y ) to (W, Z) where


X
W =Y and Z=p
Y /n
Hence w ∈ (0, ∞) and z ∈ R and
√ √
∂(w, z)
= √n = √ n
∂(x, y) y w
Now √
∂(x, y) w
f(W,Z) (w, z) = f(X,Y ) (x, y)
= f(X,Y ) (x, y) √
∂(w, z) n
n/2−1 −y/2

1 2 y e w
= √ e−x /2 n/2 √
2π 2 Γ n/2 n
1 2
= 1/2 (n+1)/2 1/2  e−z w/2n w(n−1)/2 e−w/2
π 2 n Γ n/2
But

Γ n+1
Z 
(n−1)/2 −αw 2
w e dw = (n+1)/2
0 α
Hence
Γ n+1

z2
 
1 2 1
fZ (z) = 1/2 (n+1)/2 1/2  where α = 1+
π 2 n Γ n/2 α(n+1)/2 2 n
 −(n+1)/2
Γ n+1/2 z2
 
1
= 1/2 √ 1+ as required.
π Γ n/2 n n
(b) Now
2
y (n−1)/2 e−y(1+x /n)/2
r
y −yx2 /2n 1 n/2−1 −y/2
f(T,Y ) (x, y) = fT |Y (x|y)fY (y) = e y e = √
2πn 2n/2 Γ(n/2) 2(n+1)/2 πnΓ(n/2)
and hence the density of T is
Z ∞ Z ∞
1 2
fT (t) = f(T,Y ) (t, y) dy = √ y (n−1)/2 e−y(1+x /n)/2
dy
y=0 2(n+1)/2 πnΓ(n/2) y=0
x2
Comparing the integral with the density of the gamma( n+1 1
2 , 2 (1 + n )) distribution, gives the required result. [←EX]

22.2 (a) Substituting into equation(21.1b) shows that for all y > 0 we have fX (y) = fX (−y).

x2
−(n+1)/2 √
(b) For all x ∈ R we have cfX (x) = 1 + n where c = B( 1/2, n/2) n > 0. Differentiating with respect to x
gives
−(n+3)/2
x2
 
2x n + 1
0
=− cfX (x) 1+
n 2 n
0 0
Hence fX (x) > 0 for x < 0 and fX (x) < 0 for x > 0. (c) Differentiating again gives
−(n+5)/2 −(n+3)/2
4x2 n + 1 x2 x2
    
00 n+3 2 n+1
cfX (x) = 2 1+ − 1+
n 2 2 n n 2 n
−(n+5)/2
x2 x2
       
2 n+1 n+3
= 2 1+ 2x2 −n 1+
n 2 n 2 n
−(n+5)/2
x2
  
2 n+1
(n + 2)x2 − n
 
= 2 1+
n 2 n
which proves (c). [←EX]
22.3 Using the definition of the tn distribution in equation(21.1a) and the independence of X and Y gives
Xk
 
E[T k ] = nk/2 E = nk/2 E[X k ] E[Y −k/2 ]
Y k/2
where
k!
E[X k ] = k
 by equation(15.4a) on page 47.
2k/2 Γ 2 +1
(2r)! Γ(r + 1/2)
=r
= (2r − 1)(2r − 3) · · · 3.1 = 2r (r − 21 )(r − 32 ) · · · 23 12 = 2r
2 r! Γ 1/2)
n−k

1 Γ 2
E[Y −k/2 ] = k/2  for k < n by equation(11.9b) on page 36.
2 Γ n2
1 1 1
= k/2 n−2 n−4 =
2 ( 2 )( 2 ) · · · ( n−k
2 ) (n − 2)(n − 4) · · · (n − k + 2)(n − k)
Hence result. [←EX]
22.4 Using equation(2.22a) shows that
E[X 3 ] − 3µσ 2 − µ3
skew[X] =
σ3
Hence skew[T ] = 0 provided n > 3 and is undefined otherwise.
(b) Provided n > 4 we have E[X 4 ] = 3n2 /(n − 2)(n − 4), E[X 2 ] = n/(n − 2) and E[X] = 0. Using equation(2.25a)
shows that we have
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 3(n − 2)
κ[X] = = [←EX]
σ4 n−4

22.5 Let X 0 = X/σ . Then X 0 ∼ N (0, 1). Let Y 0 = (Y12 + · · · + Yn2 )/σ 2 . Then Y 0 ∼ χ2n . Also
X0
Z=p ∼ tn [←EX]
Y 0 /n

22.6 First, let x = (t − α)/s:


Z ∞  n/2 Z ∞  n/2
1 1
dt = s dx
−∞ 1 + (t − α)2 /s2 −∞ 1 + x2
But from equation(21.1b) on page 63, we know that
Z ∞ −(n+1)/2
t2 √
1+ dt = B( 1/2, n/2) n
−∞ n

Letting x = t/ n gives
Z ∞
(1 + x2 )−(n+1)/2 dx = B 1/2, n/2

−∞
and hence the result. [←EX]
Pn P
22.7 (a)Proof 1. Now Y = Z12 + · · · + Zn2 ; hence Zi2 /n−→1 as n q
i=1 → ∞ by the Weak Law of Large Numbers. Using
√ √ Pn 2 P
the simple inequality | a − 1| = |a − 1|/| a + 1| < |a − 1| gives i=1 Zi /n−→1 as n → ∞. One of the standard
results by Slutsky(see for example p285 in [G RIMMETT &S TIRZAKER(1992)]) is:
D P D
if Zn −→Z as n → ∞ and Yn −→c as n →√∞ where c 6= 0 then Zn /Yn −→Z/c as n → ∞. Hence result.
n −n
Proof 2. Stirling’s formula is n! ∼ n e 2πn as n → ∞. Using this we can show that
1 1
lim √ =√ as n → ∞.
1 n
n→∞ B( /2, /2) n 2π
Also
−(n+1)/2
t2

2
lim 1 + = e−t /2 as n → ∞.
n→∞ n
qP q P
Pn a.e. n a.e. n a.e.
(b) By the SLLN, i=1 Zi2 /n −→ 1 as n → ∞; hence 2
i=1 Zi /n −→ 1 as n → ∞; hence n/ i=1 Zi2 −→ 1 as
q P
n a.e.
n → ∞; hence Z n/ i=1 Zi2 −→ Z as n → ∞. [←EX]

22.8 (a) Just set n = 2 in equation(21.1b) on page 63 and use Γ( 3/2) = 1/2Γ( 1/2) = π/2. (b) First use the transformation
z = 1/x and then w = 2z 2 :
zdz dw x
Z Z Z
1 1
2 3/2
dx = − 2
=− 3/2
= (1 + w)−1/2 = √
(2 + x ) (2z + 1) 4(1 + w) 2 2 x2 + 2
and hence x
Z x
dy y x 1 y 1
= = √ + because lim p =−

2 3/2
p
−∞ (2 + y ) 2 y + 2 −∞ 2 x + 2 2 2

2 2 y→−∞ 2 y +22

(c) Now 2u − 1 = x/ 2 + √ x2 ; hence (2u − 1)2 = x2 /(2 + x2 ) which gives x2 = (2u − √ 1)2 /[2u(1 − u)]. Taking the square
root gives x = ±(2u − 1)/ 2u(1 − u). But Q(u) % as x %. Hence x = +(2u − 1)/ 2u(1 − u). (d) By §21.2.
(e) We have
Z ∞
x dx ∞ √
2 −1/2
E|T | = 2 = −2(2 + x ) = 2
0 (2 + x2 )3/2 0
√ √
The interquartile range is Q( 3/4) − Q( 1/4) = 2 2/ 3. (f) Use definition(21.1a); also by equation(11.9a) on page 36
we know that the χ22 is the exponential ( 1/2) distribution; finally, by §9.1 on page 27 we know that if X ∼ exponential (λ)
and Y = αX where α > 0 then Y ∼ exponential ( λ/α).
(g) Now X ∼ gamma(1, 1) and Y ∼ gamma(1, 1). Hence by equation(21.8a) we have that X/Y ∼ F2,2 . Result follows
by exercise 22.33 below.  See also [C ACOULLOS(1965)] and problem 326 on page  81 in [C ACOULLOS(1989)]. [←EX]
22.9 (a) Now γs (x) = s/π s2 + (x − a)2 . Hence for t > 0 we have γs (a + t) = s/π s2 + t2 = γs (a − t).
2
(b) Differentiating gives γs0 (x) = −2s(x − a)/π s2 + (x − a)2 . Hence γs0 (x) > 0 for x ∈ (−∞, a) and γs0 (x) < 0 for

x ∈ (a, ∞). (c) Differentiating again gives
πγ 00 (x) [s2 + (x − a)2 ]2 − 4(x − a)2 [s2 + (x − a)2 ] [s2 + (x − a)2 ] − 4(x − a)2 s2 − 3(x − a)2
− s = = =
2s [s2 + (x − a)2 ]4 [s2 + (x − a)2 ]3 [s2 + (x − a)2 ]3
√ √ √
The numerator equals 0 when x √ = a − s/ 3 and x = a + s/ 3; it is negative for x < a − s/ 3, then positive and then
negative again when x > a + s/ 3. Hence result. [←EX]
−1 1
22.10 Setting F s (x)
= p in equation (21.5a) and solve for p. Clearly Fs ( / 2 ) = a. [←EX]
dy
22.11 Just use dx = d and fY (y) = fX ((y − c)/d)d. [←EX]
22.12 (a)(b) Just use characteristic functions. [←EX]
t t
h |t|
in
itY in (X1 +···Xn ) in X1 n −s −s|t|
22.13 (a) φ(t) = E[e ] = E[e ] = E[e ] = e n = e as required. (b) Use proposition(3.7a)
√ D
on page 11. Hence 2 n Mn /(πs) =⇒ N (0, 1) as n → ∞. Hence Mn is asymptotically normal with mean 0 and
variance π²s²/(4n). [←EX]
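The contrast between parts (a) and (b) is easy to see by simulation: the sample mean of Cauchy observations has the same Cauchy(0, s) law whatever n is, while the sample median concentrates. A sketch (Python, numpy/scipy assumed; s, n and the number of replications are arbitrary):

    import numpy as np
    from scipy import stats

    s, n, reps = 2.0, 100, 20_000
    rng = np.random.default_rng(2)
    x = s * rng.standard_cauchy((reps, n))

    means = x.mean(axis=1)               # should still be Cauchy(0, s)
    medians = np.median(x, axis=1)       # should be tightly concentrated around 0

    print(stats.kstest(means, stats.cauchy(scale=s).cdf))                      # consistent with Cauchy(0, s)
    print(np.percentile(means, [25, 75]), np.percentile(medians, [25, 75]))    # quartiles: wide vs narrow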
ita−s|t| itna−ns|t|
22.14 (a) Now φX (t) = e . Hence the characteristic function of X1 + · · · + Xn is e and this is the characteristic
function of nX. Hence α = 1 and X is strictly stable. (b) Just substitute in equation(4.3b). [←EX]
1 dy 1 1
22.15 (a) Now fX (x) = π(1+x 2) for x ∈ R and =
dx x2 . Hence f Y (y) = π(1+y 2 )
for y ∈ R. Hence Y ∼ Cauchy(0, 1).
(b) As for part (a), fY (y) = π(1+ss 2 y2 ) for y ∈ R which is Cauchy(0, 1/s). Or: X/s ∼ Cauchy(0, 1); by part (a) we have
s/X ∼ Cauchy(0, 1); hence 1/X ∼ Cauchy(0, 1/s). s
(c) fY (y) = π[s2 y2 +(my−1) 2 ] for y ∈ R. [←EX]

22.16 Now W = X/Y = ( X/σ)/( Y /σ). Hence we can take σ = 1 without loss of generality. (a) Now W = X/Y = X/ Z
where X ∼ N (0, 1) and Z ∼ χ21 and X and Z are independent. Hence W ∼ t1 = γ1 , the Cauchy density.
2
(b) As for part (a). (c) The folded Cauchy ∂u density which 2is fW (w) = 2/π(1 + w ) for w > 0. P
[←EX]
∂u
22.17 (a) Let W = tan(U ). Then fW (w) = fU (u) ∂w = 1/π(1 + w ). (b) Let W = tan(U ). Then fW (w) = u fU (u) ∂w
=
2
(2/2π) × 1/(1 + w ) as required. [←EX]

22.18 (a) φ2X (t) = E[eit2X ] = E[ei(2t)X ] = e−2s|t| . This is the Cauchy γ2s distribution.
(b) φX+Y (t) = E[eit(X+Y ) ] = E[eit(aU +cU +bV +dV ) ] = E[eit(a+c)U ]E[eit(b+d)V ] = e−s(a+c)|t| e−s(b+d)|t| = e−s(a+b+c+d)|t|
which is the Cauchy distribution γs(a+b+c+d) . [←EX]
22.19 We have Y = R sin(Θ) and X = R cos(Θ). Also ∂(x,y) = r. Hence

∂(r,θ)

∂(x, y)
f(R,Θ) (r, θ) = f(X,Y ) (x, y) = 1 e−(x2 +y2 )/2 r = 1 re−r2 /2
∂(r, θ) 2π 2π
2
Hence Θ is uniform on (−π, π), R has density re−r /2 for r > 0 and R and Θ are independent.
If W = R2 then the density of W is fW (w) = 12 e−w/2 for w > 0; this is the χ22 distribution. [←EX]
22.20 (a) Let X = tan(Θ); hence Θ ∈ (− π/2, π/2). Then P[Θ ≤ θ] = P[X ≤ tan(θ)] = 1/2 + θ/π and fΘ (θ) = 1/π. Hence Θ
has the uniform distribution on (− π/2, π/2). Now 2X/(1 − X 2 ) = tan(2Θ). So we want the distribution of W = tan(Y )
where Y has the uniform distribution on (−π, π). Hence

dw
=2 1 1
X
fW (w) = fY (y) This is the Cauchy(0, 1) distribution.
y
dy 2π 1 + w2
By part(a), V1 has the Cauchy(0, 1) distribution. Hence V ∼ Cauchy(0, 1). [←EX]
22.21 Now X = b tan Θ and so 0 < X < ∞. For x > 0 we have P[X ≤ x] = P[|b tan Θ| ≤ x] = P[| tan Θ| ≤ x/b] =
2/π tan−1 x/b. Differentiating gives

2 b
fX (x) =
π b2 + x 2
(b) P[ /X ≤ x] = P[X ≥ /x] = 1 − π tan
1 1 2 −1 1
/bx and this has density π2 1+bb2 x2 for x > 0. [←EX]
22.22 (a) Clearly f ≥ 0. Using the transformation y = (1 + x2 )1/2 tan t gives
Z ∞ Z ∞ Z π/2
1 1 1 cos t 1
f (x, y) dy = dy = dt =
y=−∞ 2π y=−∞ (1 + x2 + y 2 )3/2 2π t=−π/2 1 + x2 π(1 + x2 )
which is the standard Cauchy distribution. This answers parts (a) and (b).
(c) The absolute value of the Jacobian of the transformation is ∂(x,y) ∂(r,θ) = r. Hence

r
f(R,Θ) (r, θ) = for r > 0 and θ ∈ (0, 2π).
2π(1 + r2 )3/2
Hence R and Θ are independent. Θ ∼ uniform(0, 2π) and R has density fR (r) = r/(1 + r2 )3/2 for r > 0. [←EX]
22.23 (a) Use the transformation from (X, Y ) to (V, W ) where
X/m nX
V = = and W =Y
Y /n mY
Hence v ∈ (0, ∞) and w ∈ (0, ∞) and
∂(v, w) n n
∂(x, y) = my = mw

Now
∂(x, y)
f(V,W ) (v, w) = f(X,Y ) (x, y) = f(X,Y ) (x, y) mw
∂(v, w) n
xm/2−1 e−x/2 y n/2−1 e−y/2 mw
=  
2m/2 Γ m/2 2n/2 Γ n/2 n
(mwv/n)m/2−1 e−mvw/2n wn/2−1 e−w/2 mw
=  
2m/2 Γ m/2 2n/2 Γ n/2 n
v m/2−1 (m/n)m/2 w mv
= (m+n)/2
w(m+n)/2−1 e− 2 (1+ n )
2 Γ(m/2)Γ(n/2)
R∞
Using 0
wk−1 e−αw dw = Γ(k)/αn with α = 21 (1 + mv n ) and integrating out w gives
m+n
Γ m+n
 
Γ 2 v m/2−1 (m/n)m/2 2 v m/2−1 mm/2 nn/2
fV (v) = m n (m+n)/2 (m+n)/2
= m n as required.
Γ( 2 )Γ( 2 ) 2 α Γ( 2 )Γ( 2 ) (n + mv)(m+n)/2
(b) Now
 my m/2 xm/2−1 e−myx/2n y n/2−1 e−y/2
f(F,Y ) (x, y) = fF |Y (x|y)fY (y) =
2n Γ(m/2) 2n/2 Γ(n/2)
and hence
xm/2−1  m m/2 Z ∞  y mx 
fF (x) = (m+n)/2 y (m+n)/2−1 exp − ( + 1) dy
2 Γ(m/2)Γ(n/2) n y=0 2 n
xm/2−1  m m/2 Γ( (m + n)/2 )2(m+n)/2

= (m+n)/2
2 Γ(m/2)Γ(n/2) n (mx/n + 1)(m+n)/2

Γ m+n

2 n/2 n/2 xm/2−1
= m n m n [←EX]
Γ( 2 )Γ( 2 ) (mx + n)(m+n)/2
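The construction behind part (a) translates directly into a sampler: draw independent χ²_m and χ²_n variables and form (X/m)/(Y/n). A minimal check against scipy's F distribution (Python with numpy/scipy assumed; the degrees of freedom are arbitrary):

    import numpy as np
    from scipy import stats

    m, n, N = 5, 11, 50_000
    rng = np.random.default_rng(3)

    x = rng.chisquare(m, N)          # X ~ chi^2_m
    y = rng.chisquare(n, N)          # Y ~ chi^2_n, independent of X
    v = (x / m) / (y / n)            # should follow F_{m,n}

    print(stats.kstest(v, stats.f(m, n).cdf))    # no evidence against the F_{m,n} law expected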
22.24 Write the density as
m+n
xm/2−1 m/2 n/2 Γ( 2 ) mm/2 nn/2
f (x) = c where c = m n m n =
[mx + n](m+n)/2 Γ( 2 )Γ( 2 ) B( m n
2 , 2)
Differentiating gives
f 0 (x) [mx + n](m+n)/2 ( m 2 − 1)x
m/2−2
− xm/2−1 ( m+n 2 )m[mx + n]
(m+n)/2−1
=
c [mx + n]m+n
m/2−2
x h m m+n i xm/2−2
= ( − 1)(mx + n) − xm( ) = [n(m − 2) − xm(n + 2)]
[mx + n](m+n)/2+1 2 2 2[mx + n](m+n)/2+1
and this shows (a), (b) and (c). Differentiating again gives
2f 00 (x) [mx + n](m+n)/2+1 du m/2−2
[n(m − 2) − xm(n + 2)] ( m+n
  (m+n)/2
dx − x 2 + 1)m[mx + n]
=
c [mx + n]m+n+2
Hence
m+n
4f 00 (x)[mx + n] 2 +2
 
du
= [mx + n] 2 − xm/2−2 [n(m − 2) − xm(n + 2)] (m + n + 2)m
c dx
where
du
2 = xm/2−3 [n(m − 2)(m − 4) − m(n + 2)(m − 2)x]
dx
Hence
cxm/2−3
f 00 (x) = m+n [a2 x2 + a1 x + a0 ]
4[mx + n] 2 +2
where a2 = m2 (n + 2)(n + 4), a1 = 2mn(n + 4)(2 − m) and a0 = n2 (m − 2)(m − 4).
The roots of the quadratic equation a2 x2 + a1 x + a0 = 0 are α and β.
If m ∈ (0, 2], then a2 > 0, a1 > 0 and a0 > 0. If m ∈ (2, 4] then a2 > 0, a1 < 0 and a0 < 0. If m ∈ (4, ∞), then
a2 > 0, a1 < 0 and a0 > 0. Hence result. [←EX]
22.25 (a) Now F = (nX)/(mY ); hence E[F ] = nE[X]E[1/Y ]/m = nmE[1/Y ]/m = nE[1/Y ] = n/(n − 2). Also
E[F 2 ] = n2 E[X 2 ]E[1/Y 2 ]/m2 = n2 m(m + 2)/[(n − 2)(n − 4)m2 ] = n2 (m + 2)/[m(n − 2)(n − 4)]. Hence var[F ] =
2n2 (m + n − 2)/[ m(n − 2)2 (n − 4) ] for n > 4.
(c) Now F = (nX)/(mY ); hence E[F r ] = nr E[X r ]E[1/Y r ]/mr where, by equation(11.9b) on page 36, we have
2r Γ( m/2 + r) 2−r Γ( n/2 − r)
 
1
E[X r ] = and E r
=
m
Γ( /2) Y Γ(n/2)
and hence for n > 2r
 n r Γ( m/ + r)Γ( n/ − r)  n r ( m/ + r − 1)( m/ + r − 2) · · · ( m/ )
2 2 2 2 2
E[F r ] = =
m Γ( m/2)Γ( n/2) m ( n/2 − 1)( n/2 − 2) · · · ( n/2 − r)
as required. [←EX]
22.26 Provided n > 6 we have
 n 3 m(m + 2)(m + 4) 2n2 (m + n − 2) n
E[X 3 ] = var[F ] = E[X] =
m (n − 2)(n − 4)(n − 6) m(n − 2)2 (n − 4) n−2
Using equation(2.22a) shows that
E[X 3 ] − 3µσ 2 − µ3 m1/2 (n − 4)1/2 m(n − 2)3 (n − 4)[E[X 3 ] − 3µσ 2 − µ3 ]
skew[X] = =
σ3 81/2 (m + n − 2)1/2 n3 (m + n − 2)
1/2 1/2

m (n − 4) 8(n + 2m − 2) (n + 2m − 2) 8(n − 4)
= 1/2 = √
8 (m + n − 2)1/2 m(n − 6) (n − 6) m(m + n − 2)
(b) Provided n > 8 we also have
 n 4 m(m + 2)(m + 4)(m + 6) n4 (m + 2)(m + 4)(m + 6)
E[X 4 ] = = 3
m (n − 2)(n − 4)(n − 6)(n − 8) m (n − 2)(n − 4)(n − 6)(n − 8)
Using equation(2.25a) shows that we have
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4
κ[X] = 4
 2 σ 2 
3(n − 4) mn + 4n + m2 n + 8mn − 16n + 10m2 − 20m + 16
= for n > 8. [←EX]
m(n − 8)(n − 6)(m + n − 2)
22.27 Now Z = X12 /Y12 where X1 = X/σ and Y1 √ = Y /σ are i.i.d. N (0, 1). Now X12 and Y12 are i.i.d. χ21 . Hence Z has the F1,1
distribution which has density fZ (z) = 1/π z(1 + z) for z > 0. [←EX]
22.28 (a) By definition(21.6a) on page 65. (b) Using §11.3 on page 34 gives 2α1 X1 ∼ gamma(n1 , 1/2) = χ22n1 and 2α2 X2 ∼
gamma(n2 , 1/2) = χ22n2 . Hence result by definition(21.6a). [←EX]

22.29 Let Y = nX/m(1 − X); then Y ∈ (0, ∞). Also X = mY /(n + mY ), 1 − X = n/(n + mY ) and
m dy 1 dy n
= and hence =
n dx (1 − x)2 dx m(1 − x)2
Hence
dx xm/2−1 (1 − x)n/2−1 m(1 − x)2 y m/2−1 mm/2 nn/2

fY (y) = fX (x) = = as required. [←EX]
B( m n
B m n

dy 2 , 2) n 2 , 2
(my + n)(m+n)/2
2
22.30 (a) This is the reverse of exercise 22.29.
n/X
Now X ∼ Fm,n implies 1/X ∼ Fn,m . Hence m+n/X ∼ beta( n/2, m/2). Hence result.
dy
(b) Let Y = αX/β; then | dx | = α/β. Now for x ∈ (0, ∞) we have
1 (2α)α (2β)β xα−1 1 αα β β xα−1
fX (x) = =
B(α, β) 2α+β [αx + β]α+β B(α, β) [αx + β]α+β
fX (x) 1 αα−1 β β+1 (βy/α)α−1 1 y α−1
fY (y) = dy = = for y ∈ (0, ∞).
| dx | B(α, β) β α+β [1 + y]α+β B(α, β) [1 + y]α+β
Or use X = (Y /2α)/(Z/2β) = (βY )/(αZ) where Y ∼ χ22α , Z ∼ χ22β and Y and Z are independent. Hence X ∼
beta 0 (α, β) by part (f) of 14.14 on page 45. [←EX]
22.31 Now W = nX/(mY ); hence mW/n = X/Y where X ∼ χ2m , Y ∼ χ2n and X and Y are independent. Hence by part (f)
of exercise 14.14 on page 45, X/Y ∼ beta 0 ( m/2, n/2). [←EX]
P
22.32 Proof 1. Now W = (X/m)/(Y /n). Now Y = Z12 + · · · + Zn2 where Z1 , . . . , Zn are i.i.d. N (0, 1). Hence Y /n−→1 as
D 2
n → ∞. Hence, by Slutsky’s theorem (see answer to exercise 22.7 on page 208), mW = X/(Y /n)−→χm as n → ∞.
Proof 2. Now
Γ( m+n ) mm/2 nn/2 wm/2−1
fW (w) = m 2 n for w ∈ (0, ∞).
Γ( 2 )Γ( 2 ) [mw + n](m+n)/2
mm/2 wm/2−1 Γ( m+n
2 )n
n/2
lim fW (w) = lim
n→∞ Γ( m2 )
n→∞ Γ( n )[mw + n](m+n)/2
2
mm/2 wm/2−1 1 Γ( m+n
2 )
= m lim n
Γ( 2 ) n→∞ (1 + mw/n)n/2 Γ( )[mw + n]m/2
2
mm/2 wm/2−1 −mw/2 Γ( m+n
2 )
= m e lim n
Γ( 2 ) n→∞ Γ( )[mw + n]m/2
2
mm/2 wm/2−1 −mw/2 1
= e
Γ( m2 ) 2m/2
√ n+ 12 −n
by using Stirling’s formula: n! ∼ 2π n e as n → ∞. Finally, convergence in densities implies convergence in
distribution (see page 252 in [F ELLER(1971)]). [←EX]
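The convergence is already visible for moderate n; a sketch comparing the distribution function of mW, W ∼ F_{m,n}, with the χ²_m distribution function on a grid (Python with scipy assumed; m and the grid are arbitrary):

    import numpy as np
    from scipy import stats

    m = 4
    grid = np.linspace(0.1, 20, 200)
    for n in (10, 100, 1000):
        # P[mW <= t] = F_{m,n}-cdf(t/m); compare with the chi^2_m cdf
        gap = np.max(np.abs(stats.f(m, n).cdf(grid / m) - stats.chi2(m).cdf(grid)))
        print(n, gap)                 # the maximum discrepancy shrinks as n grows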
22.33 (a) Let fX (x) denote the density of X; then
Γ(n) xn/2−1
fX (x) = n for x > 0.
Γ( 2 )Γ( n2 ) (1 + x)n
√ √  
dy √
 √n(1+x)
Define Y by 2Y = n X − √1X . Then 4 dx = n x1/2 1 1
+ x3/2 = x3/2 . Hence
4x3/2 x(n+1)/2

dx 4Γ(n) 4Γ(n)  x (n+1)/2
fY (y) = fX (x) = fX (x) √ =√ = √
dy n(1 + x) nΓ( n )Γ( n ) (1 + x)n+1
2 2 nΓ( n )Γ( n ) 1 + 2x + x2
2 2
Now
1 + 2x + x2 1 4y 2 2
 
y
= +x+2= +4=4 1+
x x n n
Hence
Γ(n) 1 1
fY (y) = √ ∝ for y ∈ R.
nΓ( n2 )Γ( n2 )2n−1 (1 + y 2 /n)(n+1)/2 (1 + y 2 /n)(n+1)/2
This is the tn distribution. To show the multiplicative constants are equal use the duplication formula of the gamma
function which asserts that Γ(n)Γ(1/2) = 2n−1 Γ(n/2)Γ((n + 1)/2). This result is given in equation 6.1.18 on page 256
in [A BRAMOWITZ &S TEGUN(1965)]. √ √ 
(b) By equation(21.8a) on page 66 we know that W = X/Y ∼ Fn,n . Also T = 2n W − √1W . Hence result.
[←EX]
2
Suppose X has density fX and Y = α(X) where the function α has an inverse. Let fY denote the density of Y . Then if Z has
density fY then α−1 (Z) has density fX . This follows because P[α−1 (Z) ≤ z] = P[Z ≤ α(z)] = P[Y ≤ α(z)] = P[α(X) ≤
α(z)] = P[X ≤ z].

22.34 Setting a = b in equation(21.10a) gives


 a+1/2  a+1/2  a+1/2
Γ(2a) x x Γ(2a) 2a
fX (x) = √ 1+ √ 1− √ = √
22a−1 Γ(a)Γ(a) 2a 2a + x2 2a + x2 22a−1 Γ(a)Γ(a) 2a 2a + x2
 a+1/2
Γ(2a) 1 1
= 2a−1/2 √
2 Γ(a) Γ(a) a 1 + x2 /2a
 a+1/2
Γ(a + 1/2) 1 1
= 1/2 √
2 Γ(1/2) Γ(a) a 1 + x2 /2a
by using the duplication formula of the gamma function described in answer 22.33 on page 211 with n = 2a. [←EX]
22.35 Let fX (x) denote the density of X; then
Γ(m + n) (2m)m (2n)n xm−1 Γ(m + n) mm nn xm−1
fX (x) = = for x > 0.
Γ(m)Γ(n) (2mx + 2n)m+n Γ(m)Γ(n) (mx + n)m+n
Define W = mX/n; then dw
m
dx = n and mx + n = n(1 + w). Hence

dx Γ(m + n) wm−1

fW (w) = fX (x) =
dw Γ(m)Γ(n) (1 + w)m+n
√ √  √  √
dy
Define Y by 2Y = m + n W − √1W . Then 4 dw = m + n w11/2 + w13/2 = m+n(1+w)
w3/2
. Hence
3/2
wm+1/2

dw 4Γ(m + n)
fY (y) = fW (w) = fW (w) √ 4w =√
dy m + n(1 + w) m + nΓ(m)Γ(n) (1 + w)m+n+1
4Γ(m + n) wm+1/2 1
=√
m + nΓ(m)Γ(n) (1 + w)m+1/2 (1 + w)n+1/2
m+1/2  n+1/2
w−1 w−1

4Γ(m + n) 1 1
=√ + −
m + nΓ(m)Γ(n) 2 2(w + 1) 2 2(w + 1)
1/2 m+1/2
−1/2
n+1/2
1/2
1 w −w 1 w1/2 − w−1/2 w1/2
  
4Γ(m + n) w
=√ + −
m + nΓ(m)Γ(n) 2 2 w+1 2 2 w+1
Now
(1 + w)2 1 + 2w + w2 4y 2 m + n + y2
 
1
= = +w+2= +4=4
w w w m+n m+n
Hence

w1/2 − w−1/2 w1/2 y m+n y
=√ p = p
2 1+w m+n 2 m+n+y 2 2 m + n + y2
Hence result.
(b) Now X = (Y /2m)/(Z/2n) = (nY )/(mZ) ∼ F2m,2n . Then apply part (a). (c) By exercise 12.7 on page 38 we know
that if Y ∼ χ22m , Z ∼ χ22n and Y and Z are independent, then B = Y /(Y + Z) ∼ beta(m, n). Finally
2B − 1 Y −Z
√ = √ [←EX]
B(1 − B) YZ
22.36 Now
√ √ √ !
m+n Y Z
X= √ −√ where Y ∼ χ22m , Z ∼ χ22n and Y and Z are independent.
2 Z Y
By equation(11.9c) on page 36 we know that

√ 1 Γ(m − 1/2)
 
2Γ(m + 1/2) 1
E[ Y ] = and E √ =√ for m > 1/2.
Γ(m) Y 2 Γ(m)
Hence, for m > 1/2 and n > 1/2 we have
√ √ Γ(m + 1/2)Γ(n − 1/2) Γ(n + 1/2)Γ(m − 1/2)
   
1 1
E[ Y ]E √ − E[ Z]E √ = −
Z Y Γ(m)Γ(n) Γ(m)Γ(n)
(m − 1/2)Γ(m − 1/2)Γ(n − 1/2) − (n − 1/2)Γ(n − 1/2)Γ(m − 1/2)
=
Γ(m)Γ(n)
and hence E[X]. For E[X 2 ] use
   
2 m+n Y Z 1 1
X = + −2 and E[Y ] = 2m and E = for m > 1. [←EX]
4 Z Y Y 2(m − 1)

22.37 Now for x ∈ R we have


 m+1/2  n+1/2
x x
f (x) = C 1 + √ 1− √
m + n + x2 m + n + x2
 m−1/2  n−1/2
df (x) m+n x x
=C 1+ √ 1− √ ×
dx (m + n + x2 )3/2 m + n + x2 m + n + x2
    
x x
(m + 1/2) 1 − √ − (n + 1/2) 1 + √
m + n + x2 m + n + x2
 m−1/2  n−1/2
m+n x x
=C 1+ √ 1− √ ×
(m + n + x2 )3/2 m+n+x 2 m + n + x2
 
(m + n + 1)x
m−n− √
m + n + x2
All the factors are always positive except the last. Checking the derivative of the last factor shows it is monotonic
decreasing. Setting equal to 0 shows it is positive for

(m − n) m + n
x< √ √
2m + 1 2n + 1
and then negative. Hence result. [←EX]

Chapter 2 Section 24 on page 76 (exs-noncentral.tex)


24.1 E[W] = Σ_{j=1}^n E[X_j²] = Σ_{j=1}^n (1 + µ_j²) = n + λ.
Suppose Z ∼ N (0, 1). Then E[Z] = E[Z 3 ] = 0, E[Z 2 ] = 1 and E[Z 4 ] = 3. Suppose X = Z + µ. Then E[X] = µ,
E[X 2 ] = 1 + µ2 , E[X 3 ] = 3µ + µ3 , and E[X 4 ] = 3 + 6µ2 + µ4 . Hence var[X 2 ] = 2 + 4µ2 .
Hence var[W ] = var[X12 ] + · · · + var[Xn2 ] = 2n + 4λ. [←EX]
24.2 Rearranging equation(23.5a) on page 74 gives
∞ ∞
X e−λ/2 ( λ/2)j e−x/2 x(n+2j−2)/2 X e−λ/2 ( λ/2)j
fX (x) = = fn+2j (x)
j! 2(n+2j)/2 Γ( n/2 + j) j!
j=0 j=0

where fn+2j (x) is the density of the χ2n+2j distribution. Hence result. [←EX]
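This mixture representation gives a two-stage sampler for the non-central χ²: draw J ∼ Poisson(λ/2) and then, given J = j, a central χ²_{n+2j}. A sketch checking it against scipy's ncx2 (Python, numpy/scipy assumed; n and λ are arbitrary):

    import numpy as np
    from scipy import stats

    n, lam, N = 3, 2.5, 50_000
    rng = np.random.default_rng(4)

    j = rng.poisson(lam / 2, N)           # Poisson(lambda/2) mixing variable
    x = rng.chisquare(n + 2 * j)          # given J=j, a central chi^2 with n+2j degrees of freedom

    print(stats.kstest(x, stats.ncx2(n, lam).cdf))   # consistent with the non-central chi^2_{n,lambda} law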
24.3 Use moment generating functions or characteristic functions.
n    
Y 1 λj t 1 λt
E[etZ ] = exp = exp
j=1
(1 − 2t)kj /2 1 − 2t (1 − 2t)k/2 1 − 2t
Hence Z has a non-central χ2k distribution with non-centrality parameter λ where k = k1 + · · · + kn and λ = λ1 + · · · + λn .
[←EX]
24.4 By equation(11.9b) on page 36, we know that if Z ∼ χ2n then
E[Z] = n E[Z 2 ] = n(n + 2) E[Z 3 ] = n(n + 2)(n + 4) E[Z 4 ] = n(n + 2)(n + 4)(n + 6)
Hence
E[W |V = j] = n + 2j E[W 2 |V = j] = (n + 2j)(n + 2j + 2)
E[W 3 |V = j] = (n + 2j)(n + 2j + 2)(n + 2j + 4) E[W 4 |V = j] = (n + 2j)(n + 2j + 2)(n + 2j + 4)(n + 2j + 6)
Also if V has the Poisson distribution with mean α, then it is known that
E[V 2 ] = α + α2 E[V 3 ] = α(1 + 3α + α2 ) E[V 4 ] = α(1 + 7α + 6α2 + α3 )
By equations(23.5b) and (23.5c) we know that
E[X] = n + λ E[X 2 ] = n2 + 2n + 2nλ + 4λ + λ2 var[X] = 2(n + 2λ)
(a) Now
(n + 2j)(n + 2j + 2)(n + 2j + 4) = n3 + 6n2 + 8n + j[6n2 + 24n + 16] + j 2 [12n + 24] + 8j 3
= n3 + 6n2 + 8n + 2j[3n2 + 12n + 8] + 12j 2 [n + 2] + 8j 3
Hence
λ λ2 λ 3λ λ2
E[X 3 ] = n3 + 6n2 + 8n + λ[3n2 + 12n + 8] + 12(n + 2)( + ) + 8 (1 + + )
2 4 2 2 4
= n3 + 6n2 + 8n + λ[3n2 + 18n + 24] + λ2 [3n + 12] + λ3
Also 3µσ 2 + µ3 = 6(n + λ)(n + 2λ) + (n + λ)3 = n3 + 6n2 + λ(3n2 + 18n) + λ2 (3n + 12) + λ3 .
Using equation(2.22a) on page 7 gives
E[X 3 ] − 3µσ 2 − µ3 8n + 24λ n + 3λ
skew[X] = = = 23/2
σ3 [2(n + 2λ)]3/2 (n + 2λ)3/2

(b) Best done with a computer algebra programme! Here are the main steps.
(n + 2j)(n + 2j + 2)(n + 2j + 4)(n + 2j + 6)
= n(n + 2)(n + 4)(n + 6) + j[8n3 + 72n2 + 176n + 96] + 8j 2 [3n2 + 18n + 22] + 32j 3 (n + 3) + 16j 4
Hence
E[X 4 ] = n(n + 2)(n + 4)(n + 6) + λ[4n3 + 36n2 + 88n + 48] + 2λ(2 + λ)[3n2 + 18n + 22]+
(16λ + 24λ2 + 4λ3 )(n + 3) + (8λ + 28λ2 + 12λ3 + λ4 )
Using equation(2.25a) on page 8 gives
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 3n2 + 12nλ + 12λ2 + 12n + 48λ 12n + 48λ
κ[X] = 4
= 2
=3+ [←EX]
σ (n + 2λ) (n + 2λ)2
24.5 Suppose X ∼ χ2n,λ . Let
X − (n + λ)
V = √
2(n + 2λ)
Then
t
MV (t) = E[etV ] = E[et1 X ] exp (−t1 (n + λ)) where t1 = √
2(n + 2λ)
 
1 λt1
= exp exp (−t1 (n + λ))
(1 − 2t1 )n/2 1 − 2t1
 
λt1 exp(−t1 n)
= exp exp(−t1 λ)
1 − 2t1 (1 − 2t1 )n/2

2λt21 2λt21 exp( −nt/ 2(n + 2λ) )
   
exp(−t1 n)
= exp = exp √
1 − 2t1 (1 − 2t1 )n/2 1 − 2t1 (1 − 2t/ 2(n + 2λ) )n/2

Note that exp 2λt21 /(1 − 2t1 ) → 1 as n → ∞. Now if Vn ∼ χ2n , then

Vn − n D e−nt/ 2n 2
√ =⇒ N (0, 1) as n → ∞. Hence √ → et /2 as n → ∞.
2n (1 − 2t/ 2n) n/2

In the following derivation


 we set m = n + 2λ; this gives
   
nt
 


 exp − 2(n+2λ)  exp − (m−2λ)t

2m
lim   = lim  
   
n→∞ 
n/2  m→∞
(m−2λ)/2 
2t
1 − √2m

1 − √ 2t
2(n+2λ)
      
 exp
2λt

2m   exp − √mt
2m
= lim  

m→∞
−λ    m/2 
1− √2t 1− √2t
2m 2m
2
= 1 × et /2
[←EX]
24.6
    E[T] = E[ (X + µ)/√(Y/n) ] = √n E[X + µ] E[Y^{−1/2}] = √n µ E[Y^{−1/2}] = √n µ 2^{−1/2} Γ((n−1)/2) / Γ(n/2)  as required.
Similarly
    E[T²] = E[ (X + µ)²/(Y/n) ] = n E[(X + µ)²] E[1/Y] = n(1 + µ²)/(n − 2)  and hence result. [←EX]
24.7 Now T = (X + µ)/ Y /n where X ∼ N (0, 1), Y ∼ χ2n and X and Y are independent. Hence
(X + µ)2 W
T2 = = where W ∼ χ21,µ2 , Y ∼ χ2n and W and Y are independent.
Y /n Y /n
Hence result. [←EX]
24.8
n n 1
E[F ] = E[X]E[1/Y ] = (m + λ) as required.
m m n−2
Now for the variance:
n2 n2 1
E[F 2 ] = E[X 2 ]E[1/Y 2 ] = 2 (2m + 4λ + m2 + 2mλ + λ2 )
m 2 m (n − 2)(n − 4)
Hence
n2 1
(n − 2) 2(m + 2λ) + (m + λ)2 − (n − 4)(m + λ)2
  
var[F ] =
m (n − 2) (n − 4)
2 2

n2 1
2(m + λ)2 + 2(n − 2)(m + 2λ) as required.
 
= 2 [←EX]
m (n − 2) (n − 4)
2

24.9 By exercise 24.2 we know that



X e−λ/2 ( λ/2)j
fm,λ (x) = fm+2j (x)
j!
j=0
2
where fm,λ (x) is the density of the non-central χm with
non-centrality parameter λ. and fm+2j (x) is the density of the
χ2m+2j distribution.
It is easy to check that if X and Y are independent and V = (X/m)/(Y /n) = (nX)/(mY ) then
Z ∞
mvw mw
fV (v) = fX ( )fY (w) dw
w=0 n n
The definition shows that the density gm,n,λ of the non-central Fm,n distribution with non-centrality parameter λ is
Z ∞
mvw mw
gm,n,λ (v) = fm,λ ( )fn (w) dw
w=0 n n
Because all terms are positive, we can interchange the order of the sum and integral as follows:
Z ∞X ∞
e−λ/2 ( λ/2)j mvw mw
gm,n,λ (v) = fm+2j ( )fn (w) dw
w=0 j=0 j! n n

e−λ/2 ( λ/2)j ∞ mvw mw
X Z
= fm+2j ( )fn (w) dw
j! w=0 n n
j=0

X e−λ/2 ( λ/2)j
= gm+2j,n (v)
j!
j=0
where gm+2j,n is the density of the Fm+2j,n distribution. [←EX]

Chapter 2 Section 26 on page 81 (exs-sizeshape.tex)



26.1 Consider the size variable g_1 : (0, ∞)² → (0, ∞) with g_1(x) = x_1. The associated shape function is z_1(x) = (1, x_2/x_1) =
(1, a). Hence z_1(x) is constant. For any other shape function z_2, we know that z_2(x) = z_2( z(x) ) (see the proof of
equation(25.3a) on page 78). Hence result. [←EX]
26.2 (a) Consider the size variable g ∗ (x) = x1 ; the associated shape function is z ∗ (x) = 1, x2 /x1 . Hence


3 with probability 1/2;
z ∗ (X) = 1
/3 with probability 1/2.
If z is any other shape function, then by equation(25.3a) on page 78 we have z ∗ (X) = z ∗ ( z(X) ). Hence z(X) cannot be
almost surely constant.
The possible values of the 3 quantities are as follows:
probability z(X) g1 (X) g2 (X)

1/4 ( /4, /4)
1 3
√3 4
1/4 ( 1/4, 3/4) 12 4

1/4 ( 3/4, 1/4) √3 4
1/4 ( 3/4, 1/4) 12 4
Clearly z(X) is independent of both g1 (X) and g2 (X).
(b) By proposition(25.3b)

on page 78 we know that g1 (X)/g2 (X) is almost surely constant. It is easy to check that
g1 (X)/g2 (X) = 3/4. [←EX]
26.3 ⇐ Let Yj = Xjb for j = 1, 2, . . . , n. Then Y1 , . . . , Yn are independent random variables. By proposition(25.4a), the
b b
 
shape vector 1, Y2/Y1 , . . . , Yn/Y1 is independent of Y1 + · · · + Yn . This means 1, X2 /X1b , . . . , Xn/X1b is independent of
X1b + · · · + Xnb . Hence 1, X2/X1 , . . . , Xn/X1 is independent of (X1b + · · · + Xnb )1/b as required.


⇒ We are given that 1, X2/X1 , . . . , Xn/X1 is independent of (X1b + · · · + Xnb )1/b . Hence 1, X2b/X1b , . . . , Xnb/X1b is
 

independent of X1b + · · · + Xnb . By proposition(25.4a), there exist α > 0, k1 > 0, . . . , kn > 0 such that Xjb ∼
gamma(kj , α) and Rhence Xj ∼ ggamma(kj , α, b). [←EX]
∞ R∞
26.4 (a) P[X1 < X2 ] = 0 P[X2 > x]λ1 e−λ1 x dx = 0 e−λ2 x λ1 e−λ1 x dx = λ1 /(λ1 + λ2 ).
(b) The lack of memory property implies the distribution of V = X2 − X1 given X2 > X1 is the same as the distribution
of X2 and the distribution of X2 − X1 given X2 < X1 is the same as the distribution of −X1 . Hence
(
λ2 λ1 λ1 eλ1 v λ1λ+λ
2
if v ≤ 0;
fV (v) = fV (v|X2 < X1 ) + fV (v|X1 < X2 ) = −λ2 v λ1
2
λ1 + λ2 λ1 + λ2 λ2 e λ1 +λ2 if v ≥ 0.
(c) Now V = X2 − X1 . Hence
Z Z ∞
fV (v) = fX2 (x + v)fX1 (x) dx = λ1 λ2 e−λ2 v e−(λ1 +λ2 )x dx
{x:x+v>0} x=max{−v,0}
Hence if v ≥ 0 we have

λ1 λ2 e−λ2 v
Z
fV (v) = λ1 λ2 e−λ2 v e−(λ1 +λ2 )x dx =
0 λ1 + λ2
and if v ≤ 0 we have

λ1 λ2 eλ1 v
Z
−λ2 v
fV (v) = λ1 λ2 e e−(λ1 +λ2 )x dx = [←EX]
−v λ1 + λ2
26.5 Now V = Y2 − Y1 . By exercise 26.4, we know that
(
λ2 λ1 v
λ1 +λ2 e if v ≤ 0;

λ1 λ2 eλ1 v if v ≤ 0;
fV (v) = and P[V ≤ v] =
λ1 + λ2 e−λ2 v if v ≥ 0. 1 − λ1λ+λ 1
2
e−λ2 v if v ≥ 0.
P[U ≥ u] = e−λ1 (u−a) e−λ2 (u−a) = ea(λ1 +λ2 ) e−(λ1 +λ2 )u for u ≥ a.
Now for u ≥ a and v ∈ R we have
Z ∞
P[U ≥ u, V ≤ v] = P[Y1 ≥ u, Y2 ≥ u, Y2 − Y1 ≤ v] = P[u ≤ Y2 ≤ v + y1 ]fY1 (y1 ) dy1
y1 =u
Z ∞
= λ1 eλ1 a P[u ≤ Y2 ≤ v + y1 ]e−λ1 y1 dy1
y1 =u

where  R v+y1
P[u ≤ Y2 ≤ v + y1 ] = λ2 eλ2 a y2 =u
e−λ2 y2 dy2 = eλ2 a [e−λ2 u − e−λ2 (v+y1 ) ] if v + y1 > u;
0 if v + y1 < u.
Hence for v ≥ 0 we have
Z ∞
P[U ≥ u, V ≤ v] = λ1 eλ1 a P[u ≤ Y2 ≤ v + y1 ]e−λ1 y1 dy1
y1 =u
Z ∞
(λ1 +λ2 )a
e−λ1 y1 e−λ2 u − e−λ2 (v+y1 ) dy1
 
= λ1 e
y1 =u
−λ1 u −(λ1 +λ2 )u
λ1 e−λ2 v
   
(λ1 +λ2 )a −λ2 u e −λ2 v e (λ1 +λ2 )a −(λ1 +λ2 )u
= λ1 e e −e =e e 1−
λ1 λ1 + λ2 λ1 + λ2
= P[U ≥ u]P[V ≤ v]
Similarly, for v ≤ 0 we have
Z ∞
P[U ≥ u, V ≤ v] = λ1 eλ1 a P[u ≤ Y2 ≤ v + y1 ]e−λ1 y1 dy1
y1 =u−v
Z ∞
(λ1 +λ2 )a
e−λ1 y1 e−λ2 u − e−λ2 (v+y1 ) dy1 = P[U ≥ u]P[V ≤ v]
 
= λ1 e
y1 =u−v
Hence the result. [←EX]
26.6 Suppose θj > 0 and x0 > 0. Then X ∼ Pareto(α, θj x0 , 0) iff X/θ
j ∼ Pareto(α, x0 , 0) and proceed as in the proof of
proposition(25.5a) on page 79. [←EX]
26.7 Suppose θj > 0 and h > 0. Then X ∼ powerlaw(α, θj h, 0) iff X/θ
j ∼ powerlaw(α, h, 0) and proceed as in the proof of
proposition(25.6a) on page 79. [←EX]

Chapter 2 Section 28 on page 83 (exs-LaplaceRayleighWeibull.tex)

28.1 (a) Let W = X + Y . For w > 0,


∞ ∞ ∞
α −αw
Z Z Z
fW (w) = fX (x)fY (w − x) dx = αe−αx αeα(w−x) dx = α2 eαw e−2αx dx = e
x=w x=w x=w 2
For w < 0
w w
α αw
Z Z
fW (w) = fY (y)fX (w − y) dy = α2 e−αw e2αy dy = e
−∞ −∞ 2
and hence fW (w) = α2 e−α|w| for w ∈ R; this is the Laplace(0, α) distribution.
(b) The Laplace(0, α) distribution by part (a). [←EX]
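Part (a) amounts to saying that the difference of two independent exponential(α) variables is Laplace(0, α), which is easy to confirm by simulation (a sketch in Python with numpy/scipy assumed; note that scipy's laplace uses a scale parameter equal to 1/α, and α is arbitrary):

    import numpy as np
    from scipy import stats

    alpha, N = 1.5, 100_000
    rng = np.random.default_rng(5)

    w = rng.exponential(1 / alpha, N) - rng.exponential(1 / alpha, N)   # difference of two iid exponential(alpha)
    print(stats.kstest(w, stats.laplace(scale=1 / alpha).cdf))          # consistent with Laplace(0, alpha)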
tX 2 µt 2 2 tY bt tkX bt 2 µkt
28.2 (a) Now E[e ] = α e /(α − t ). Hence if Y = kX + b then E[e ] = e E[e ] = e α e /(α − k 2 t2 ) =
2

et(kµ+b) α12 /(α12 − t2 ) where α1 = α/k. This is the mgf of the Laplace(kµ + b, α/k) distribution.
(b) Now X − µ ∼ Laplace(0, α); hence α(X − µ) ∼ Laplace(0, 1). [←EX]

28.3 (a) Clearly fX (µ + y) = fX (µ − y) for all y ∈ (0, ∞). (b) Differentiating fX gives
−α2
(
0 2 exp [−α(x − µ)] if x ∈ (µ, ∞);
fX (x) =
α2
2 exp [−α(µ − x)] if x ∈ (−∞, µ);
0
Hence (b). (c) Differentiating fX gives
α3
(
00 2 exp [−α(x − µ)] if x ∈ (µ, ∞);
fX (x) =
α3
exp [−α(µ − x)] if x ∈ (−∞, µ);
2
00
and hence fX> 0 for x ∈ (−∞, µ) and x ∈ (µ, ∞). [←EX]
28.4 (a) By integrating the density. (b) Set FX (x) = p and solve for x. (c) Because fX is symmetric about µ. (b) The
expectation, median and mode are all µ. Also var[X] = var[X − µ] = 2/α2 by using the representation of the Laplace
as the difference of two independent exponentials given in exercise 28.1. (c) Using the representation in exercise 28.1
again implies the moment generating function is
α α α2 eµt
E[etX ] = eµt E[et(X−µ) ] = eµt = 2 for |t| < α. [←EX]
α − t α + t α − t2
2n−1
28.5 Because the distribution is symmetric about 0, it follows that E[Y ] = 0 for all integers n ≥ 1. Also, by using
equation(11.1b) on page 33 we get
α ∞ 2n −α|x|
Z Z ∞
(2n)!
2n
E[Y ] = x e =α x2n e−αx = α 2n+1 as required. [←EX]
2 −∞ 0 α
28.6 Using equation(2.22a) and equation(27.1a) shows that
E[X 3 ] − 3µσ 2 − µ3
skew[X] = =0
σ3
(b) Using equation(27.1a) shows that µ = E[X] = 0, E[X 4 ] = 4!/α4 and σ 2 = 2/α2 . Hence equation(2.25a) shows that
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 E[X 4 ]
κ[X] = = =6 [←EX]
σ4 σ4
28.7
α −α|x−µ|
Z

H(X) = − e ln(α/2) − α|x − µ| dx
2
Z x∈R
1 −|v| 
= e |v| − ln(α/2) dv by setting v = α(x − µ).
v∈R 2
Z ∞ Z 0
1 −v  1 v 
= e v − ln(α/2) dv + e −v − ln(α/2) dv
v=0 2 v=−∞ 2
Z ∞
e−v v − ln(α/2) dv = 1 − ln(α/2)

= [←EX]
v=0 R x α −α|y| R x α −αy R x −αy
28.8 (a) For x > 0 we have P[|X| < x] = P[−x < X < x] = −x 2 e dy = 2 0 2 e dy = 0 αe dy.
R∞
Hence |X| has the exponential density αe−αx for x > 0. (b) For z > 0 we have fZ (z) = y=0 fX (z + y)fY (y) dy =
R∞
λµ y=0 e−λz e−λy e−µy dy = λµe−λz /(λ + µ).
R∞ R∞
For z < 0 we have fZ (z) = y=−z fX (z + y)fY (y) dy = λµe−λz y=−z e−(λ+µ)y dy = λµeµz /(λ + µ). [←EX]
28.9 Let Z = X(2Y − 1). Then E[e^{tZ}] = (1/2) E[e^{−tX}] + (1/2) E[e^{tX}] = (1/2) α/(α + t) + (1/2) α/(α − t) = α²/(α² − t²) for |t| < α as required. [←EX]
28.10 (a) By part (b), α(X − µ) ∼ Laplace(0, 1); hence α|X − µ| ∼ exponential(1); hence α Σ_{i=1}^n |X_i − µ| ∼ gamma(n, 1);
hence 2α Σ_{i=1}^n |X_i − µ| ∼ gamma(n, 1/2) = χ²_{2n}. (b) Now |X − µ| ∼ exponential(α) = gamma(1, α). Hence result by
equation(21.8a) on page 66. [←EX]
28.11 Let W = X and Z = ln( X/Y ). Now (X, Y ) ∈ (0, 1) × (0, 1). Clearly W ∈ (0, 1). Also 0 < X < X/Y ; hence
eZ = X/Y > X = W . This implies: if Z > 0 then 0 < W < 1 and if Z < 0 then 0 < W < eZ .
Then | ∂(w,z) z
∂(x,y) | = /y = e /x. Hence f(W,Z) (w, z) = y = we
1 −z
.
R 1 −z R ez
If z > 0 then fZ (z) = 0 we dw = 2 e . If z < 0 then fZ (z) = 0 we−z dw = 21 ez . The Laplace(0, 1) distribution.
1 −z

[←EX]
28.12 Let Y1 = (X1 + X2 )/2; Y2 = (X3 + X4 )/2, Y3 = (X1 − X2 )/2 and Y4 = (X3 − X4 )/2. Then X1 X2 = Y12 − Y32 and
X3 X4 = Y22 − Y42 . Hence X1 X2 − X3 X4 = (Y12 + Y42 ) − (Y22 + Y32 ).
Now
t1 − t3 t2 − t4
    
  t1 + t3 t2 + t4
E exp ( i(t1 Y1 + t2 Y2 + t3 Y3 + t4 Y4 ) ) = E exp i X1 + X2 + X3 + X4
2 2 2 2
(t1 + t3 )2 (t1 − t3 )2 (t2 + t4 )2 (t2 − t4 )2
 
= exp − − − −
8 8 8 8
 2 2 2 2
t1 + t3 + t2 + t4
= exp − = E[eit1 Y1 ] E[eit2 Y2 ] E[eit3 Y3 ] E[eit4 Y4 ]
4

Hence Y1 , Y2 , Y3 and Y4 are i.i.d. N (0, σ 2 = 1/2). Hence 2(X1 X2 − X3 X4 ) = 2(Y12 + Y42 ) − 2(Y22 + Y32 ) is equal to the
difference of two independent χ22 = exponential ( 1/2) distributions which is the Laplace(0, 1/2) distribution.
(b) X1 X2 + X3 X4 = (Y12 + Y22 ) − (Y32 + Y42 ) and then as for part (a). [←EX]
28.13 Using characteristic functions.

Z ∞ √
Z ∞ Z ∞
1 2 2 1
E[eitY 2X ] = E[eitY 2x ]e−x dx = e− 2 2xt e−x dx = e−x(1+t ) dx =
0 0 0 1 + t2
and this is the c.f. of the Laplace(0, 1) distribution. Hence result.
√ 1 ,y2 ) √

Using densities. Let Y1 = X and Y2 = Y 2X. Hence Y1 > 0 and Y2 ∈ R and ∂(y ∂(x,y) = 2x = 2y1 . Hence
f(X,Y ) (x, y) 1 −x 1 −y2 /2 1 2
f(Y1 ,Y2 ) (y1 , y2 ) = √ =√ e √ e = √ e−y1 e−y2 /4y1
2y1 2y1 2π 2 πy1

Using the substitution y1 = z 2 /2 and equation(16.17b) on page 51 gives


Z ∞ Z ∞ y2
1 2 1 z2 2 1
fY2 (y2 ) = √ e−(y1 +y2 /4y1 ) dy1 = √ e−( 2 + 2z2 ) dz = e−|y2 |
0 2 πy1 0 2π 2
as required. [←EX]
28.14 (a) Note that X, Y , −X and −Y all have the same distribution. Hence V , W , −V and −W all have the same distribution.
Clearly,
α2
Z Z
fV (v) = fX (x)fY (v − x) dx = e−α|x|−α|v−x| dx
x∈R 4 x∈R
Suppose v > 0; then the exponent is
−αv if 0 < x < v;
(
−α|x| − α|v − x| = −αv + 2αx if x < 0;
αv − 2αx if 0 < v < x.
By straightforward integration we get for v > 0
α
fV (v) = (1 + αv)e−αv
4
The distribution is symmetric; hence the general result is
α
fV (v) = (1 + α|v|)e−α|v| for v ∈ R.
4
(b) Now φV (t) = E[eitV ] = E[eitX ]E[eitY ] = α12 α22 /(α12 + t2 )(α22 + t2 ). Hence
α2 α2
φV (t) = 2 2 2 φX (t) − 2 1 2 φY (t)
α2 − α1 α2 − α1
and hence
α2 α2
fV (x) = 2 2 2 fX (x) − 2 1 2 fY (x)
α2 − α1 α2 − α1
α22 α1 −α1 |x| α2 α2
= e − 2 1 2 e−α2 |x| for x ∈ R. [←EX]
α22 2
− α1 2 α2 − α1 2
28.15 (a) By straightforward differentiation we get

 
0 β β−2 −xβ /γ β
fX (x) = βx e β−1−β β
γ γ
and
γ β 00 xβ β 2 x2β
 
β−3 −xβ /γ β
f (x) = x e (β − 1)(β − 2) − 3β(β − 1) β + 2β
β X γ γ
β
Note that the roots of the quadratic in x on the right hand side are when x = x1 and x = x2 . If β = 2, then x1 = 0; if
β < 2 then x1 is negative or imaginary.
0 00
(b) In this case fX (x) < 0 and fX (x) > 0. (c) If β = 1 then fX (x) = e−x/γ /γ, the exponential density.
00
(d) Suppose β ∈ (1, 2]. Then the quadratic in the expression for fX (x) is negative for x small and then becomes positive
as x becomes larger. (e) Similar. [←EX]
dy β
28.16 (a) Now dx= α; hence fY (y) = βy β−1 e−y/(αγ) /(αγ)β which is the Weibull (β, αγ) distribution.
(b) Using the substitution y = xβ /γ β gives
Z ∞ Z ∞  
β β n
E[X n ] = β xn+β−1 e−(x/γ) dx = γ n y n/β e−y dy = γ n Γ 1 +
γ 0 0 β
 
(c) E[X] = γΓ(1 + /β ) where β is the shape and γ is the scale. Also var[X] = γ Γ(1 + /β ) − Γ(1 + 1/β )2
1 2 2

The median is γ(ln 2)1/β . The mode is γ(1 − 1/β )1/β if β > 1 and 0 otherwise. (d) E[et ln X ] = E[X t ] = γ t Γ 1 + t/β .

(e) Follows from (c). [←EX]
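The moment formula in (b) matches scipy's weibull_min with shape β and scale γ, which gives a quick numerical check (a sketch; Python with numpy/scipy assumed and the parameter values are arbitrary):

    import numpy as np
    from scipy import stats
    from scipy.special import gamma

    beta, gam = 1.7, 2.0
    dist = stats.weibull_min(beta, scale=gam)     # density beta*x**(beta-1)*exp(-(x/gam)**beta)/gam**beta

    for n in (1, 2, 3):
        print(dist.moment(n), gam**n * gamma(1 + n / beta))   # the two columns should agree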

28.17 Using equation(2.22a) and the expressions for E[X k ] and var[X] in exercise 28.16 gives
3 3 2 3
E[X 3 ] − 3µσ 2 − µ3 γ Γ(1 + β ) − 3µσ − µ Γ3 − 3Γ1 Γ2 + 2Γ31
skew[X] = = =
σ3 σ3 (Γ2 − Γ21 )3/2
where Γk = Γ(1 + βk ).
(b) Equation(2.25a) shows that
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 Γ4 − 4Γ1 Γ3 + 6Γ21 (Γ2 − Γ21 ) + 3Γ41 Γ4 − 4Γ1 Γ3 + 6Γ21 Γ2 − 3Γ41
κ[X] = = = [←EX]
σ4 (Γ2 − Γ21 )2 (Γ2 − Γ21 )2
28.18 (a) It is h(x) = f (x)/[1 − F (x)] = βxβ−1 /γ β . (b) Straightforward. [←EX]
28.19 (a) (− ln U ) has the exponential (1) distribution; hence result by (27.3a). (b) Straightforward. [←EX]
28.20 Suppose x ∈ (0, ∞). Then P[X1:n > x] = P[X > x]n = exp[−xβ /γ β ]n = exp[−xβ /(γ/n1/β )β ] as required. [←EX]
28.21 Let Gγ denote the distribution function of the point mass at x = γ. The distribution function of Xβ is Fβ (x) =
1 − exp[−xβ /γ β ] for x ∈ (0, ∞).
If x ∈ (0, γ), then
Fβ (x) = 1 − exp[−xβ /γ β ] −→ 1 − e−0 = 0 = Gγ (x) as β → ∞.
If x ∈ (γ, ∞), then
Fβ (x) = 1 − exp[−xβ /γ β ] −→ 1 − 0 = 1 = Gγ (x) as β → ∞.
D
Hence Fβ (x) → Gγ (x) as β → ∞ at all continuity points of Gγ ; hence Fβ =⇒ Gγ as β → ∞. [←EX]
28.22 (a) Just use σ² f′_R(r) = (σ² − r²) e^{−r²/2σ²}/σ². (b) Just use σ⁴ f″_R(r) = r(r² − 3σ²) e^{−r²/2σ²}/σ².
(c) P[R ≤ r] = 1 − e^{−r²/2σ²} for r ≥ 0. (d) Using the substitution r² = y shows
    E[R^n] = (1/σ²) ∫_0^∞ r^{n+1} e^{−r²/2σ²} dr = (1/(2σ²)) ∫_0^∞ y^{n/2} e^{−y/2σ²} dy = 2^{n/2} σ^n Γ( (n + 2)/2 )
where the last equality comes from the integral of the gamma density: ∫_0^∞ x^{n−1} e^{−λx} dx = Γ(n)/λ^n.
(e) E[R] = σ√(π/2); E[R²] = 2σ² and hence var[R] = (4 − π)σ²/2. (f) From part (c), F_R^{−1}(p) = σ√(−2 ln(1 − p))
and the median is at σ√(2 ln 2). (g) h_R(r) = f_R(r)/[1 − F_R(r)] = r/σ² for r ∈ (0, ∞). [←EX]
k
28.23 Using equation(2.22a) and the expression for E[X ] in exercise 28.22 gives
√ √
E[X 3 ] − 3µσ 2 − µ3 3σ 3 2π/2 − 3σ 3 2π(4 − π)/4 − σ 3 (2π)3/2 /8
skew[X] = =
σ3 σ 3 (4 − π)3/2 /23/2
√ √
6 π − 12 π(4 − π)/4 − π 3/2 2π 3/2 − 6π 1/2
= =
(4 − π)3/2 (4 − π)3/2
k

(b) Using the expression for E[X ] in exercise 28.22 gives µ = E[X] = σ 2π/2, E[X 4 ] = 8σ 4 and σ 2 = (4 − π)σ 2 /2.
Hence equation(2.25a) shows that
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 32 − 3π 2
κ[X] = = [←EX]
σ4 (4 − π)2
28.24
∞ Z ∞
σ2 (x − µ) (x − µ)2 (x − µ)2
Z    
1
L.H.S. = √ exp − dx + µ √ exp − dx
2πσ 2 0 σ2 2σ 2 0 2πσ 2 2σ 2
 ∞
−σ 2 (x − µ)2 σ2 µ2
   µ
=√ exp − + µP[X > 0] = √ exp − + µΦ
2πσ 2 2σ 2
x=0 2πσ 2 2σ 2 σ
(b)
Z ∞
r2 exp[t2 σ 2 /2] ∞ (r − tσ 2 )2
  Z  
1
MR (t) = E[etR ] = 2 r exp tr − 2 dr = r exp − dr
σ r=0 2σ σ2 r=0 2σ 2

2π exp[t2 σ 2 /2] ∞ (r − tσ 2 )2
 
r
Z
= √ exp − dr
σ r=0 2πσ 2 2σ 2

2π exp[t2 σ 2 /2] t2 σ 2
   
σ
= √ exp − + tσ 2 Φ(tσ)
σ 2π 2

= 1 + t 2πσ 2 exp σ 2 t2 /2 Φ(tσ) for t ∈ R.
 
[←EX]
√ 2 2 2 2
28.25 (a) P[X ≤ x] = P[ −2 ln U ≤ x/σ] = P[−2 ln U ≤ x2 /σ 2 ] = P[ln U ≥ −x2 /2σ 2 ] = P[U ≥ e−x /2σ ] = 1 − e−x /2σ
as required. √
2
(b) X has density fX (x) = λe−λx for x > 0. Hence fY (y) = 2λye−λy for y > 0. This is the Rayleigh (1/ 2λ)
distribution. [←EX]

28.26 (a) The absolute value of the Jacobian of the transformation is



∂(x, y) cos θ −r sin θ
∂(r, θ) = sin θ
=r
r cos θ
Hence for r ∈ (0, ∞) and θ ∈ (0, 2π) we have
r −r2 /2σ2
f(R,Θ) (r, θ) = e
2πσ 2
Hence Θ is uniform on (0, 2π) with density f (θ) = 1/2π and R ∼ Rayleigh (σ).
(b)
r 2 2 1 1 1 −(x2 +y2 )/2σ2
f(X,Y ) (x, y) = 2 e−r /2σ = e for (x, y) ∈ R2 .
σ 2π r 2πσ 2
Hence X and Y are i.i.d. N (0, σ 2 ). √
(c) By exercise 28.25, we know that 2X ∼ Rayleigh (1); also 2πY ∼ uniform(0, 2π). Hence result by part (b).
(e) By part (a), X = R cos Θ and Y = R sin Θ where Θ ∼ uniform(0, 2π), R ∼ Rayleigh (σ) and R and Θ are
independent. Hence W = R sin 2Θ and Z = R cos 2Θ. Result follows by part (b). [←EX]
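Parts (a)–(c) together are the familiar polar (Box–Muller type) construction of normal variables. A minimal sketch, generating R by exercise 28.25(a) and checking one coordinate (Python with numpy/scipy assumed; σ is arbitrary):

    import numpy as np
    from scipy import stats

    sigma, N = 1.3, 100_000
    rng = np.random.default_rng(6)

    u = 1.0 - rng.random(N)                    # uniform on (0, 1], so log(u) is finite
    r = sigma * np.sqrt(-2 * np.log(u))        # Rayleigh(sigma), exercise 28.25(a)
    theta = rng.uniform(0, 2 * np.pi, N)       # independent uniform angle

    x, y = r * np.cos(theta), r * np.sin(theta)
    print(stats.kstest(x, stats.norm(scale=sigma).cdf))   # each coordinate consistent with N(0, sigma^2)
    print(np.corrcoef(x, y)[0, 1])                        # the two coordinates are uncorrelated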
2 2 2
28.27 (a) Now fR (r) = re−r /2σ /σ 2 for r > 0. Let V = R2 . Then fV (v) = e−v/2σ /2σ 2 which is the exponential (1/2σ 2 ),
or equivalently gamma(n = 1, α = 1/2σ 2 ). (b) The density of V = R2 is fV (v) = e−v/2 /2 which is the
exponential (1/2), or equivalently gamma(n = 1, α = 1/2) = χ²_2. (c) Because the sum of i.i.d. exponentials is
a Gamma distribution, the result follows. [←EX]
28.28 First consider the case µ = 0. Then we have
x 2 2 1 2 2
fX (x) = 2 e−x /2s for x > 0. Also fY |X (y|x) = √ e−y /2x for y ∈ R.
s 2πx 2
Hence, by using the integration result in equation(16.17b) on page 51, we get
Z ∞ Z ∞
1 1 2 2 2 2 1
fY (y) = fY |X (y|x)fX (x) dx = √ e− 2 (x /s +y /x ) dx = e−|y|/s
x=0 x=0 s
2 2π 2s
which is the Laplace(0, 1/s) distribution. Finally, if Y |X ∼ N (µ, σ = X) then (Y − µ)|X ∼ N (0, σ = X) and
(Y − µ) ∼ Laplace(0, 1/s). Hence result. [←EX]

Chapter 2 Section 31 on page 96 (exs-extreme.tex)

√ y > 0 we have f (µ + y) = f (µ − y).


31.1 (a) For (b) Write the density as sf (x) = e(x−µ)/s /[1 + e(x−µ)/s ]2 where
s = σ 3/π. Differentiating gives
e(x−µ)/s [1 − e(x−µ)/s ]
s2 f 0 (x) =
[1 + e(x−µ)/s ]3
and this proves (b). (c) Differentiating again gives
e(x−µ)/s [1 − 4e(x−µ)/s + e2(x−µ)/s ]
s3 f 00 (x) =
[1 + e(x−µ)/s ]4
2
√ √ √
Now the √equation z −√ 4z + 1 = 0 has roots 2 ± 3; so we have e(x−µ)/s = 2 ±√ 3; hence x = µ + s ln(2√± 3). Now
ln(2 − 3) = − ln(2 + 3). Hence the points of inflection are x = µ − s ln(2 + 3) and x = µ + s ln(2 + 3). [←EX]
31.2 Recall
1 2 e−x 1 2 e−2x
sech x = = x = 2 and hence sech (x) =
cosh x e + e−x 1 + e−2x 4 (1 + e−2x )2
which leads to the result.
(b) Using tanh(x) = (e^{2x} − 1)/(e^{2x} + 1) gives
π(x − µ)
 
1 1 1 1 1
+ tanh(x) = and hence FX (x) = + tanh √ [←EX]
2 2 1 + e−2x 2 2 2σ 3
31.3 Use equation(29.1d), set FX (x) = p and solve for x. [←EX]
31.4 Skewness: γ1 = E[(Y − µ)3 ]/σ 3 = 0.
 2
Kurtosis: κ = E[(Y − µ)4 ]/ E[(Y − µ)2 ] . By equation(29.3b) we know that if X ∼ logistic(0, π 2 /3), then E[X 4 ] =
 √ 4 4
7π 4 /15. Hence E[(Y − µ)4 ] = σ π 3 E[X 4 ] = 21σ 2 2
5 and E[(Y − µ) ] = σ . Hence κ = 21/5 = 4.2. [←EX]
dy
31.5 Now y = a + bx and hence | dx | = |b| and x − µ = (y − a − bµ)/b. Hence

π e−π(y−a−bµ)/(bσ 3)
fY (y) = √ h i for y ∈ R.
σ|b| 3 1 + e−π(y−a−bµ)/(bσ√3) 2

and this is the density of the logistic(a + bµ, b2 σ 2 ) distribution. If b < 0 we need to use the relation e−z /(1 + e−z )2 =
ez /(1 + ez )2 to put the relation in the form of equation(29.1c). [←EX]

31.6 Now

e−π(x−µ)/(σ 3)
fX (x) π 1 π
1 − FX (x) = √ and hence h(x) = = √ √ = √ FX (x) [←EX]
1+ e−π(x−µ)/(σ 3) 1 − FX (x) σ 3 1 + e−π(x−µ)/(σ 3) σ 3

31.7 We know that if Y ∼ logistic(0,π 2 /3) then the distribution function of Y is FY (y) = 1/(1 + e−y ) for y ∈ R. This has
inverse FY−1 (u) = ln y/(1 − y) . Hence by §7.6 on page 23, we have the result. [←EX]
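In other words ln( U/(1 − U) ) with U ∼ uniform(0, 1) has the logistic(0, π²/3) distribution, and µ + (σ√3/π) ln( U/(1 − U) ) then has the logistic(µ, σ²) distribution by exercise 31.5. A minimal sampling sketch (Python with numpy/scipy assumed; scipy's standard logistic is exactly logistic(0, π²/3), and the sample size is arbitrary):

    import numpy as np
    from scipy import stats

    N = 100_000
    rng = np.random.default_rng(7)
    u = rng.random(N)
    x = np.log(u / (1 - u))                    # logistic(0, pi^2/3) by the inverse-transform argument above

    print(stats.kstest(x, stats.logistic().cdf))   # consistent with the standard logistic
    print(x.var(), np.pi**2 / 3)                   # sample variance close to pi^2/3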
31.8 (a) The density is fX (x) = 2e−x /(1 + e−x )2 for x > 0. For the distribution function:
1 1 1 e−x 1 − e−x
FX (x) = P[−x ≤ Y ≤ x] = FY (x) − FY (−x) = − = − =
1 + e−x 1 + ex 1 + e−x 1 + e−x 1 + e−x
for x > 0.
(b) Proceeding as for the logistic gives
Z ∞ ∞ ∞
xe−x X
n Γ(2) X (−1)n
E[X] = 2 dx = 2 (−1) (n + 1) dx = 2 = 2 ln 2 = ln 4
0 (1 + e−x )2 (n + 1)2 n+1
n=0 n=0
2
Also var[X] = E[X 2 ] − { E[X] } = E[Y 2 ] − (ln 4)2 = π 2 /3 − (ln 4)2 . [←EX]
31.9 (a) By equation(3.3b) on page 9 we have
n! r−1 n−r
fr:n (y) = fY (y) {FY (y)} {1 − FY (y)}
(r − 1)!(n − r)!
e−y 1 e−(n−r)y
=c −y −y r−1
2
(1 + e ) (1 + e ) (1 + e−y )n−r
n! e−(n−r+1)y
= for x ∈ R.
(r − 1)!(n − r)! (1 + e−y )n+1
Using the transformation u/(1 − u) = ey or u = ey /(1 + ey ) = 1/(1 + e−y ) gives du y y 2
dy = e /(1 + e ) = e
−y
/(1 + e−y )2
and hence for t ∈ R we have
Z ∞ −(n−r+1)y Z ∞ −(n−r−it)y
itYr:n n! ity e n! e e−y
E[e ]= e dy = dy
(r − 1)!(n − r)! −∞ (1 + e−y )n+1 (r − 1)!(n − r)! −∞ (1 + e−y )n−1 (1 + e−y )2
Z 1 n−r−it Z 1
n! 1−u n!
= un−1 du = (1 − u)n−r−it ur+it−1 du
(r − 1)!(n − r)! 0 u (r − 1)!(n − r)! 0
Γ(r + it) Γ(n − r − it + 1)
=
Γ(r)Γ(n − r + 1)
(c) Now
dE[eitYr:n ] iΓ0 (r + it) Γ(n − r − it + 1) iΓ(r + it) Γ0 (n − r − it + 1)
= −
dt Γ(r)Γ(n − r + 1) Γ(r)Γ(n − r + 1)
d2 E[eitYr:n ] Γ00 (r + it) Γ(n − r − it + 1) Γ(r + it) Γ00 (n − r − it + 1)
=− −
dt2 Γ(r)Γ(n − r + 1) Γ(r)Γ(n − r + 1)
Γ0 (r + it) Γ0 (n − r − it + 1) Γ0 (r + it) Γ0 (n − r − it + 1)
+ +
Γ(r)Γ(n − r + 1) Γ(r)Γ(n − r + 1)
and hence
dE[eitYr:n ]

dt = iψ(r) − iψ(n − r + 1)
t=0
dE[eitYr:n ]

E[Yr:n ] = −i = ψ(r) − ψ(n − r + 1)
dt t=0
d2 E[eitYr:n ] Γ00 (r) Γ00 (n − r + 1)

dt2 = − Γ(r) − Γ(n − r + 1) + 2ψ(r)ψ(n − r + 1)


t=0
where ψ denotes the digamma function which satisfies
d Γ0 (z) Γ00 (z)
ψ(z) = ln( Γ(z) ) = and hence = ψ 0 (z) + [ ψ(z) ]2
dz Γ(z) Γ(z)
Hence
d2 E[eitYr:n ]

2 0 0 2
E[Yr:n ] = − = ψ (r) + ψ (n − r + 1) + [ψ(r) − ψ(n − r + 1)]
dt2 t=0
and
var[Yr:n 2
] = ψ 0 (r) + ψ 0 (n − r + 1) [←EX]

31.10 (a) Let Z = ln(eX − 1). Then this transformation maps x ∈ (0, ∞) 7→ z ∈ (−∞,
z
∞). Also P[Z ≤ z] = P[ln(eX − 1) ≤
z] = P[eX − 1 ≤ ez ] = P[eX ≤ 1 + ez ] = P[X ≤ ln(1 + ez )] = 1 − e− ln(1+e ) . Hence

1 1 σ 3
P[Z ≤ z] = 1 − = 2
and so Z ∼ logistic(0, π /3) and Z ∼ logistic(0, σ 2 )
1 + ez 1 + e−z π
and hence the result. (b) By exercise 10.20, we know that if Z = X/Y then
t 1
P[Z ≤ t] = for t ≥ 0 and hence P[ ln(Z) ≤ z ] = P[Z ≤ ez ] = for z ∈ R.
1+t 1 + e−z
This implies Z ∼ logistic(0, π 2 /3) and then as for part (a). [←EX]
dy
31.11 (a) Now fX (x) = ex /(1 + ex )2 for x ∈ R and | dx | = ex = y − 1. Also the transformation is 1-1 and maps x ∈ R → y ∈
(1, ∞). Hence
fX (x) 1
fY (y) = = for y > 1.
y − 1 y2
(b) This transformation is 1-1 from y ∈ (1, ∞) → x ∈ R with | dx dy | = 1/(y − 1) = e
−x
. [←EX]
0
 −x  0 0
31.12 (a) The derivative is f (x) = f (x) e − 1 . Hence f (x) < 0 for x < 0 and f (x) > 0 for x > 0.
(b) Now f 00 (x) = f 0 (x) e−x − 1 − f (x)e−x = f (x) e−2x − 3e−x + 1 . So f 00 (x) = 0 implies e−2x − 3e−x + 1 = 0.
   
√ √
Setting y = e−x gives a quadratic in y with solutions y1 = (3 − 5)/2 and y2 = (3 + 5)/2. Hence the solutions for x
are
√ ! √ !
3− 5
   
1 2 1 2 3+ 5
x2 = ln( ) = ln √ = ln = −0.96 and x1 = ln( ) = ln √ = ln = 0.96
y2 3+ 5 2 y1 3− 5 2
Now if y < y1 , then f 00 (x) > 0; if y1 < y < y2 then f 00 (x) < 0 and if y > y2 then f 00 (x) > 0. Hence if x > x1 , then
f′′(x) > 0; if x_2 < x < x_1 then f′′(x) < 0 and if x < x_2 then f′′(x) > 0, as required. [←EX]
31.13 Setting F (x) = u gives ln(u) = −e−x and hence F −1 (u) = x = −ln (−ln(u)). In particular, for the median we need
u = 1/2 and then F −1 (u) = − ln (ln 2) = 0.3665. [←EX]
31.14 Using the substitution v = e−x gives
Z ∞ Z ∞ Z ∞
tX tx −x tx −x
 −x
v −t exp(−v) dv = Γ(1 − t)
  
E[e ] = e exp −(e + x) dx = e exp −(e ) e dx =
−∞ −∞ 0
for 1 − t > 0. [←EX]
31.15 (a) Use §7.6 and exercise 31.13. If U ∼ uniform(0, 1), then the distribution function of G(U ) is F . Also, if the
distribution function F is strictly increasing on the whole of R then F −1 , the inverse of F , exists and G = F −1 .
(b) See §7.6. If the random variable X has the distribution function F and F is continuous, then the random variable
F (X) has the uniform(0, 1) distribution.
(c) The transformation y = −ln z is 1–1 and maps (0, ∞) onto (−∞, ∞). Also if y = −ln z then z = e^{−y} and
|dz/dy| = e^{−y}   and hence   f_Y(y) = f_Z(z) |dz/dy| = exp(−e^{−y}) e^{−y}
which is the density of the standard Gumbel.
(d) The transformation z = e^{−x} is 1–1 and maps (−∞, ∞) onto (0, ∞). Also if z = e^{−x} then
|dz/dx| = e^{−x}   and hence   f_Z(z) = f_X(x) |dx/dz| = exp(−e^{−x}) = e^{−z}
which is the density of the exponential(1) distribution. [←EX]
31.16 (a) Now X = a + bV where V ∼ Gumbel (0, 1). Suppose p ∈ (0, 1). If b > 0, then
P[ X ≤ a + bFV−1 (p) ] = P[ a + bV ≤ a + bFV−1 (p) ] = P[ V ≤ FV−1 (p) ] = p
−1
and hence FX (p) = a + bFV−1 (p) = a − bln (−ln(p)).
If b < 0 we have
P[ X ≤ a + bFV−1 (p) ] = P[ a + bV ≤ a + bFV−1 (p) ] = P[ V ≥ FV−1 (p) ] = 1 − p
−1
and hence FX (p) = a + bFV−1 (1 − p) = a − bln (−ln(1 − p)).
(b) Using X = a + bV where V ∼ Gumbel (0, 1) gives MX (t) = E[etX ] = eat Γ(1 − bt) for t ∈ (−∞, 1/b) if b > 0 and
for t ∈ (1/b, ∞) if b < 0.
(c) E[X] = a + bγ and var[X] = b2 var[V ] = b2 π 2 /6 [←EX]
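As a quick numerical illustration of part (c), the sketch below is an addition (not part of the notes); it assumes numpy's rng.gumbel(loc=a, scale=b) with b > 0 generates a + bV with V ∼ Gumbel(0, 1), and checks E[X] = a + bγ and var[X] = b²π²/6.

    import numpy as np

    rng = np.random.default_rng(1)
    a, b = 2.0, 1.5                                   # arbitrary test values
    x = rng.gumbel(loc=a, scale=b, size=1_000_000)
    print(x.mean(), a + b * np.euler_gamma)           # E[X]   = a + b*gamma
    print(x.var(), b**2 * np.pi**2 / 6)               # var[X] = b^2 pi^2 / 6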
31.17 (a) For z ∈ R we have
f_Z(z) = ∫_{−∞}^{∞} f_X(z + y) f_Y(y) dy = ∫_{−∞}^{∞} exp(−e^{−z−y}) e^{−z−y} exp(−e^{−y}) e^{−y} dy
       = e^{−z} ∫_{−∞}^{∞} exp(−a e^{−y}) e^{−2y} dy   where a = 1 + e^{−z}.
Using the substitution v = e^{−y} and equation(11.1b) gives
f_Z(z) = e^{−z} ∫_0^{∞} exp(−av) v dv = e^{−z}/a² = e^{−z}/(1 + e^{−z})²
and this is the density of logistic(0, π²/3).
(b) Now φX (t) = Γ(1 − it). Hence φZ (t) = Γ(1 − it)Γ(1 + it) and this is the c.f. of the logistic(0, π 2 /3) distribution by
equation(29.4a).
(c) Now φX (t) = eiaX t Γ(1 − ibt). Hence φZ (t) = ei(aX −aY )t Γ(1 − ibt)Γ(1 + ibt) and this is the c.f. of logistic(aX −
aY , π 2 b2 /3). [←EX]
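Part (a) says the difference of two independent standard Gumbel variables is logistic(0, π²/3). The following hedged simulation check is an addition (not from the notes); it assumes scipy's stats.logistic with scale 1 is that distribution.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    z = rng.gumbel(size=500_000) - rng.gumbel(size=500_000)
    print(z.var(), np.pi**2 / 3)                  # variance should be about pi^2/3
    print(stats.kstest(z, stats.logistic.cdf))    # KS test should not reject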
31.18 Now X = a + bV where V ∼ Gumbel (0, 1). Also, by part (d) of exercise 31.15, we know that W = e−V ∼
exponential (1). Hence Y = e−X = e−a e−bV = e−a W b where W ∼ exponential (1). Hence by result(27.3a) on
page 83, we have the result. [←EX]
31.19 By result(27.3a) on page 83, we can write Y = bW 1/k where W ∼ exponential (1). Hence X = −ln Y = − ln b− k1 ln W .
By part (c) of exercise 31.15, we know that −ln W ∼ Gumbel (0, 1). Hence X ∼ Gumbel(−ln b, k1 ). [←EX]
31.20 For y > 0,
F_Y(y) = P[−X ≤ y | X < 0] = P[X ≥ −y | X < 0] = P[−y ≤ X < 0]/P[X < 0] = (F_X(0) − F_X(−y))/F_X(0)
and hence the density of Y is
f_Y(y) = f_X(−y)/F_X(0) = e · e^y exp(−e^y)   for y > 0,
and this is the density of a Gompertz distribution. [←EX]
31.21 (a) Use §7.6 on page 23 and equation(30.9b).

dy
(b) If Y = a1 + b1 X then dx = b1 . Using equation(30.9a) gives the result.
(c) For y < a
−α !
y−a

n
P[Y < y] = P[X < y] = exp − n
b
which is the distribution function of Fréchet(α, a, bn1/α ).
(d) if Y = 1/X then for y > 0 we have
 −α !   α 
1 y
P[Y < y] = P[1/X < y] = P[X > 1/y] = 1 − exp − = 1 − exp − −α
yb b
which is the distribution function of the Weibull (α, b−1 ) distribution. [←EX]
31.22 (a) By exercise 31.15, we know that − ln X ∼ Gumbel (0, 1). By §30.6, we know that µ − σ ln X ∼ Gumbel (µ, σ).
(b) Suppose X ∼ Weibull (β, γ) where β > 0 and γ > 0. By §30.7, we know that − ln X ∼ Gumbel (− ln γ, β1 ), and
hence −βγ ln X ∼ Gumbel (−βγ ln γ, γ) and hence βγ ln γ + β − βγ ln X ∼ Gumbel (β, γ) = GEV (0, β, γ).
(c) Suppose X ∼ GEV (0, µ, σ) = Gumbel (µ, σ). Then
X −µ X −µ
   
1
∼ Gumbel 0, and hence exp − ∼ Weibull (µ, 1)
µσ µ µσ
Hence by exercise 28.16 on page 85, we know that
X −µ
 
σ exp − ∼ Weibull (µ, σ)
µσ

dy
(d) If α = 0 then this is the result for the Gumbel distribution—see §30.6 on page 91. If α 6= 0, then dx = |a| and
1+1/α
y−b
 
dx 1 1 −τ y−b

fY (y) = fX (x) = fX (x) = τ e a
dy a aσ a
where
−α
y−b y − b − aµ
  
τ = 1+
a αaσ
which is the density of GEV (α, aµ + b, aσ). [←EX]

Chapter 2 Section 33 on page 102 (exs-LevyInverseGaussian.tex)

33.1 (a) Now U = 1/Z 2 where Z ∼ N (0, 1). Clearly the range of the transformation is (0, ∞) and
1 −1/2u z 3

dz 1
fU (u) = fZ (z) = 2 √ e
=√ e−1/2u for u > 0.
du 2π 2
2πu3
(b) For u > 0 we have
√ √
  
1
P[U ≤ u] = P[Z 2 ≥ 1/u] = P[Z ≥ 1/ u] + P[Z ≤ −1/ u] = 2 1 − Φ √
u
(c) Now
Z ∞ Z ∞
e−1/2 ∞ k−3/2
  Z
1 1 1
E[U k ] = uk √ exp − du ≥ √ uk−3/2 e−1/2 du = √ u du = ∞
0 2πu3 2u 2π 1 2π 1
(d) Setting FU (u) = p leads to expression for FU−1 (u).


(e) Differentiating gives
1
fU0 (u) = √ u−7/2 e−1/(2u) (1 − 3u)
2 2π
So fU0 (u) > 0 for u < 1/3 and fU0 (u) < 0 for u > 1/3 with mode at 1/3.
(f) Differentiating again gives
1
fU00 (u) = √ u−11/2 e−1/(2u) (15u2 − 10u + 1)
4 2π √ √
The equation 15u2 − 10u + 1 = 0 has roots α1 = 1/3 − 10/15 and α2 = 1/3 + 10/15. Hence fU00 (u) = 0 iff u = α1 or u = α2 .
Also, if u < α1 then fU00 (u) > 0; if α1 < u < α2 then fU00 (u) < 0 and if u > α2 then fU00 (u) > 0. [←EX]
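The representation U = 1/Z² in part (a) and the distribution function 2[1 − Φ(1/√u)] of part (b) can be illustrated numerically. The sketch below is an added check, under the assumption that scipy's stats.levy is the Lévy(0, 1) distribution of the notes.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    u = 1.0 / rng.standard_normal(400_000) ** 2
    print(stats.kstest(u, stats.levy.cdf))                                   # should not reject
    print(stats.levy.cdf(2.0), 2 * (1 - stats.norm.cdf(1 / np.sqrt(2.0))))   # part (b) at u = 2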
33.2 Now U = 1/Z² where Z ∼ N(0, 1). Hence, by exercise 16.25 on page 52, X has the same distribution as |Z|, as required. [←EX]
33.3 Use characteristic functions    
1/2 1/2
φX+Y (t) = φX (t)φY (t) = exp ita1 − b1 |t|1/2 1 − i sgn(t) exp ita2 − b2 |t|1/2 1 − i sgn(t)
   
 
1/2 1/2
= exp it(a1 + a2 ) − (b1 + b2 )|t|1/2 1 − i sgn(t)

as required. [←EX]
33.4 (a) Suppose X₁, . . . , Xₙ are i.i.d. random variables with the Lévy(a, b) distribution. Then X₁ + · · · + Xₙ ∼ Lévy(na, n²b); hence the Lévy(a, b) distribution is stable with α = 1/2.
(b) Now φ_X(t) = exp( ita − b^{1/2}|t|^{1/2}(1 − i sgn(t)) ) and compare with equation(4.3b). [←EX]
33.5 Now r r
   
λ λ 2 1 1 2
f (x; 1, λ) = exp − (x − 1) and f (x; µ, 1) = exp − 2 (x − µ)
2πx3 2x 2πx3 2µ x
Hence r r
λ3 x µ 2
   
1 x λ λ λ x 2 1 x µ λ
f ( ; 1, ) = exp − ( − 1) and f ( ; , 1) = exp − ( − ) [←EX]
µ µ µ 2πx3 2x µ λ λ λ 2πx3 2µ2 x λ λ
2 2 2
33.6 (a) Now y = λ(x − µ) /(xµ ) and hence
4λ λ(x + µ)2 y x−µ
+ y2 = 2
and hence p =
µ xµ 4λ/µ + y 2 x+µ
proving the result.
(b) Clearly
Z y  
w exp −w2 /2
FY (y) = Φ(y) − p √ dw
−∞ 4λ/µ + w2 2π
e2λ/µ y
 
w
Z
1 2 4λ
= Φ(y) − √ p exp − (w + ) dw
2π −∞ 4λ/µ + w2 2 µ
dz
= wz ; hence
p
and now use the substitution z = w2 + 4λ/µ which has dw

Z y2 +4λ/µ
e2λ/µ e2λ/µ ∞
 2  2
z z
Z
FY (y) = Φ(y) − √ exp − dz = Φ(y) + √ √ 2 exp − dz
2π ∞ 2 2π y +4λ/µ 2
s !
2λ/µ 4λ
= Φ(y) + e Φ − y + 2 [←EX]
µ
33.7 Now X has m.g.f.
" r # r
λ λ 2tµ 2 λ λ 2tµ2 λ
tX
M (t) = E[e ] = exp − 1− and ln[M (t)] = − 1− = [1 − g(t)]
µ µ λ µ µ λ µ
where
1/2 −1/2
2tµ2 µ2 2tµ2 µ2 1
 
g(t) = 1 − and g 0 (t) = − 1− =−
λ λ λ λ g(t)
Hence
1 λ µ
M 0 (t) = − g 0 (t) = and hence M 0 (t) = µ

M (t) µ g(t) t=0
Next
1 1 2 µ µ3
M 00 (t) − {M 0
(t)} = − 2
g 0
(t) = 3
(33.7a)
M (t) M (t)2 {g(t)} λ {g(t)}
Setting t = 0 gives E[X 2 ] − µ2 = µ3 /λ and hence var[X] = µ3 /λ.
(b) Differentiating equation(33.7a) again gives
1 0 00 1 000 2 0 3 2 0 00 µ3 g 0 (t) 3µ5
− M (t)M (t) + M (t) − {M (t)} + M (t)M (t) = −3 =
M (t)2 M (t) M (t)3 M (t)2 λ {g(t)}4 λ2 {g(t)}5
and hence
3 0 00 1 000 2 0 3 3µ5
− M (t)M (t) + M (t) + {M (t)} = 5
(33.7b)
M (t)2 M (t) M (t)3 λ2 {g(t)}
Setting t = 0 gives
3µ5 3µ4 3µ5
−3µE[X 2 ] + E[X 3 ] + 2µ3 = and hence E[X 3
] = µ 3
+ + 2
λ2 λ λ
Differentiating equation(33.7b) again gives
2 2 4
12 {M 0 (t)} M 00 (t) 3 {M 00 (t)} 4M 0 (t)M 000 (t) M 0000 (t) 6 {M 0 (t)} 15µ7
− − + − = 7
M (t)3 M (t)2 M (t)2 M (t) M (t)4 λ3 {g(t)}
Setting t = 0 gives
2 15µ7
12µ2 E[X 2 ] − 3 E[X 2 ] − 4µE[X 3 ] + E[X 4 ] − 6µ4 = 3

λ
and hence
6µ5 15µ6 15µ7
E[X 4 ] = µ4 + + 2 + 3 [←EX]
λ λ λ
33.8 Using equation(2.22a) and the expressions for E[X k ] and var[X] in exercise 33.7 gives
E[X 3 ] − 3µvar[X] − µ3 µ3 + 3µ4 /λ + 3µ5 /λ2 − 3µ4 /λ − µ3  µ 1/2
skew[X] = = =3
{var[X]}
3/2 9/2
µ /λ 3/2 λ
(b) Equation(2.25a) shows that
E[X 4 ] − 4µE[X 3 ] + 6µ2 var[X] + 3µ4
κ[X] = 2
{var[X]}
µ4 + 6µ5 /λ + 15µ6 /λ2 + 15µ7 /λ3 − 4µ4 − 12µ5 /λ − 12µ6 /λ2 + 6µ5 /λ + 3µ4
=
µ6 /λ2
3µ6 /λ2 + 15µ7 /λ3 15µ + 3λ
= = [←EX]
µ6 /λ2 λ
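The moment results of exercises 33.7 and 33.8 can be cross-checked against scipy. This is an added sketch, under the assumption that the notes' inverseGaussian(µ, λ) corresponds to scipy's invgauss(µ/λ, scale=λ); note that scipy reports the excess kurtosis κ[X] − 3 = 15µ/λ.

    import numpy as np
    from scipy import stats

    mu, lam = 2.0, 5.0                    # arbitrary test values
    m, v, s, k = stats.invgauss(mu / lam, scale=lam).stats(moments='mvsk')
    print(m, mu)                          # mean            = mu
    print(v, mu**3 / lam)                 # variance        = mu^3/lam
    print(s, 3 * np.sqrt(mu / lam))       # skewness        = 3*sqrt(mu/lam)
    print(k, 15 * mu / lam)               # excess kurtosis = 15*mu/lam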
33.9 Let c = √(λ/(2π)) and α = λ/(2µ²). Then for x > 0 we have
µ2
  
1
f (x) = c 3/2 exp −α x − 2µ +
x x
2
    
0 µ 3 −5/2 −3/2 2 −7/2
f (x) = c exp −α x − 2µ + − x − αx + αµ x
x 2
α(x − µ)2 x−7/2 x2
     
= c exp − λ 1 − 2 − 3x
x 2 µ
The last factor is initially positive at x = 0 and then monotonically decreases to −∞. Hence f initially increases and
then decreases. The mode occurs when the last factor equals 0. This occurs when
"r #
λx2 9µ2 3µ
+ 3x − λ = 0 which has one positive root at µ 1+ 2 −
µ2 4λ 2λ
(b) Differentiating again gives
α(x − µ)2
 
f 00 (x) = c exp − ×
x
 
15 −7/2 3 −5/2 7 2 −9/2 3
x + αx − αµ x + α(µ2 x−2 − 1)(− x−5/2 − αx−3/2 + αµ2 x−7/2 )
4 2 2 2
2
α(x − µ)
 
c
= exp − ×
4 x
h i
15x−7/2 + 6αx−5/2 − 14αµ2 x−9/2 + 2α(µ2 x−2 − 1)(−3x−5/2 − 2αx−3/2 + 2αµ2 x−7/2 )
cx−11/2 α(x − µ)2 
 
15x2 + 6αx3 − 14αµ2 x + 2α(µ2 − x2 )(−3x − 2αx2 + 2αµ2 )

= exp −
4 x
r " 2 #
λ(x − µ)2
  2  2
λ 2 2 x 3x
= exp − 15x + λ − 1 + 2λx −5 [←EX]
32πx11 2µ2 x µ2 µ2
33.10 (a) Now λXk /µ2k has characteristic function:


λk √
 
λk
exp − 1 − 2it
µk µk
Pn
Hence k=1 λk Xk /µ2k has characteristic function
h √ i
exp µ − µ 1 − 2it
which is the characteristic function of the inverseGaussian(µ, µ2 ) distribution. (b) Now X1 + · · · + Xn has c.f.
" r #
nλ nλ 2itµ2
exp − 1−
µ µ λ
and hence X has c.f. " #
r
nλ nλ 2itµ2
exp − 1−
µ µ nλ
which is the c.f. of the inverseGaussian(µ, nλ) distribution. [←EX]
33.11 Suppose X1 , . . . , Xn are i.i.d. random variables with the inverseGaussian(µ, λ) distribution. Then X1 + · · · + Xn ∼
inverseGaussian(nµ, n2 λ). Hence we can express X = X1 + · · · + Xn where X1 , . . . , Xn are i.i.d. random variables
with the inverseGaussian(µ/n, λ/n2 ) distribution. Hence X is infinitely divisible. [←EX]
33.12 (a) By equation(32.5d) we have
lim_{µ→∞} F_X(x) = Φ(−√(λ/x)) + Φ(−√(λ/x)) = 2Φ(−√(λ/x)) = 2[1 − Φ(√(λ/x))]
and this is the distribution function of the Lévy(0, λ) distribution.
(b) Use equation(32.6a) for the characteristic function:
" r !# " #
λ 2itµ2 λ 2itµ2 /λ
lim ΦX (t) = lim exp 1− 1− = lim exp p
λ→∞ λ→∞ µ λ λ→∞ µ 1 + 1 − 2itµ2 /λ
" #
2itµ
= lim exp p = eitµ as required. [←EX]
λ→∞ 1 + 1 − 2itµ /λ 2

33.13 We can express the density in the form


k
!
X
fX (x|θ) = h(x)g(θ) exp ηi (θ)hi (x)
i=1

where θ = (µ, λ), h(x) = (2πx3 )−1/2 , g(θ) = λ exp(λ/µ), η1 (θ) = −λ/(2µ2 ), η2 (θ) = −λ/2, h1 (x) = x and
h2 (x) = 1/x. [←EX]

Chapter 2 Section 35 on page 104 (exs-otherbounded.tex)

R1 √
35.1 (a) Now −1
1 − x2 dx = π/2 because it is half the area of the circle with radius 1; or use the substitution x = sin θ and
hence the integral is
Z π/2
1 π/2 π
Z
2
cos θ dθ = (1 + cos 2θ) dθ =
−π/2 2 −π/2 2

(b) Now fX (x) = c 1 − x2 where c > 0 and hence
−cx c cx2
f 0 (x) = √ and f 00 (x) = − √ −
1 − x2 1 − x2 (1 − x2 )3/2
0 0 00
Hence f (x) < 0 for x > 0 and f (x) < 0 for x > 0; also f (x) < 0 for all x ∈ [−1, 1].
(c) For x ∈ [−1, 1] we have, by using the substitution t = sin θ,
Z x arcsin x
2 xp 1 arcsin x
Z Z 
1 sin 2θ
FX (x) = fX (t) dt = 1 − t dt =
2 (1 + cos 2θ) dθ = θ+
−1 π −1 π −π/2 π 2 −π/2
1 1 1 1 1 1
= arcsin x + + sin(2 arcsin x) = arcsin x + + x cos(arcsin x)
π 2 2π π 2 π
which gives the result. [←EX]
35.2 (a) By symmetry, all the odd moments are zero—this is the first result. For the even moments:
Z 1 Z π/2 Z π/2
2 p 4 4
E[X 2n ] = x2n 1 − x2 dx = sin2n θ cos2 θ dθ = cos θ sin2n θ cos θ dθ
π −1 π 0 π 0
" #π/2
2n+1 π/2 2n+2 π/2
θ θ sin2n+2 θ
Z Z
4 sin 4 sin 4
= cos θ + dθ = dθ
π 2n + 1 π 0 2n + 1 π 0 2n + 1
0
Now:
Z π/2 Z π/2 π/2
Z π/2
sin2n θ dθ = sin2n−1 θ sin θ dθ = − sin2n−1 θ cos θ 0 + (2n − 1) sin2n−2 θ cos2 θ dθ

0 0 0
Z π/2 Z π/2
= (2n − 1) sin2n−2 θ dθ − (2n − 1) sin2n θ dθ
0 0
2n−1
which gives the reduction formula I2n = 2n I2n−2 and hence
(2n − 1)(2n − 3) · · · 3 (2n − 1)(2n − 3) · · · 3 π (2n − 1)(2n − 3) · · · 3 π
 
(2n)! π 2n π
I2n = I2 = = = =
(2n)(2n − 2) · · · 4 (2n)(2n − 2) · · · 4 4 2n n! 2 22n n!n! 2 n 22n+1
and hence  
4 4 2n + 2 π
E[X 2n ] = I2n+2 =
π(2n + 1) π(2n + 1) n + 1 22n+3
which gives the required result. (b) By equation(2.22a) and equation(2.25a) we have
E[X 3 ] − 3µσ 2 − µ3 E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 E[X 4 ]
skew[X] = = 0 and κ[X] = = =2 [←EX]
σ3 σ4 σ4
35.3 The density of (X, Y ) is
1
f(X,Y ) (x, y) = for (x, y) ∈ {(x, y) ∈ R2 : 0 < x2 + y 2 < 1}.
π
Hence the marginal density of X is
Z √1−x2
1 2p
fX (x) = √ dy = 1 − x2 for x ∈ (−1, 1).
π − 1−x2 π
Similarly for fY (y). [←EX]
35.4 (a) Now X has the density in equation(34.1b). Let Y = (X − µ)/r. Then Y ∈ (−1, 1) and Y has density

dx 2p
fY (y) = fX (x) = fX (x) × r = 1 − y2
dy π

(b) Part(a) implies P[X ≤ x] = P[Y ≤ (x − µ)/r] = FY (x − µ)/r and then use part (c) in exercise 35.1. [←EX]
35.5 Substituting α = 3/2 and β = 3/2 in equation(13.1a) on page 40 gives
s
 2
Γ(3) 1/2 3/2 8 p 8 1 1
f (x) = 3 x (1 − x) = x(1 − x) = − x− [←EX]
Γ( /2)Γ( 3/2) π π 4 2

35.6 Now X = a + bY; hence dx/dy = b > 0. Hence f_Y(y) = f_X(x) dx/dy = b f_X(a + by). [←EX]

35.7 (a) Now f_X(x) = 2x; hence E[X^n] = ∫_0^1 2x^{n+1} dx = 2/(n + 2).
(b) If p = 0 then
Z 1  
1 1 2
E[X n ] = 2xn (1 − x) dx = 2 − =
0 n + 1 n + 2 (n + 1)(n + 2)
If 0 < p < 1 then
Z p n+1 Z 1 n
2x (1 − x) 2pn+1 1 − pn+1 1 − pn+2
 
n 2x 2
E[X ] = dx + dx = + −
0 p p 1−p n+2 1−p n+1 n+2
n+1 n+2 n+1
2 1−p 1−p 2 1−p 2 1 − pn+1
 
2
= + pn+1 − = −
n+1 1−p n+2 1−p n+1 1−p n+2 1−p
2 1 − pn+1
=
(n + 1)(n + 2) 1 − p
(c) If X ∼ triangular(0, 1, p) then E[X] = (1 + p)/3 for all p ∈ [0, 1]. Also E[X 2 ] = (1 + p + p2 )/6 and hence
var[X] = (1 − p + p2 )/18 for all p ∈ [0, 1]. Hence if X ∼ triangular(a, b, p) then E[X] = a + b(1 + p)/3 and
var[X] = b2 (1 − p + p2 )/18. [←EX]
35.8 By the definition of skewness and kurtosis, equation(1.4a) on page 4 and equation(1.5a) on page 5, we know that we
1
can assume X ∼ triangular(0, 1, p). Now E[X 3 ] = 10 (1 + p + p2 + p3 ) and E[X] = 13 (1 + p) and var[X] = 18 1
(1 − p + p2 ).
Hence, using equation(2.22a) on page 7, we have
E[X 3 ] − 3µσ 2 − µ3
 
1 1 2 3 1 2 1 3
skew[X] = = (1 + p + p + p ) − (1 + p)(1 − p + p ) − (1 + p)
σ3 σ 3 10 18 27
1 
27 + 27p + 27p2 + 27p3 − 15 − 15p3 − 10 − 30p − 30p2 − 10p3

=
270σ 3 √ √
2  2 3
 2
= 2 − 3p − 3p + 2p = (1 + p)(2 − p)(1 − 2p)
5(1 − p + p2 )3/2 5(1 − p + p2 )3/2
1
For the kurtosis, we need E[X 4 ] = 15 (1 + p + p2 + p3 + p4 ). Hence, using equation(2.25a) on page 8, we have
E[X ] − 4µE[X ] + 6µ σ + 3µ4
4 3 2 2
κ[X] =
σ4
1 2 1 1
(1 + p + p + p3 + p4 ) − 15
2
(1 + 2p + 2p2 + 2p3 + p4 ) + 27 (1 + p)2 (1 − p + p2 ) + 27 (1 + p)4 12
= 15 1
= [←EX]
2 (1 − p + p )
2 2 5
35.9 (a) By integration, F_X(x) = ∫_a^x f_X(u) du. (b) Just set F_X(x) = u and solve for x. [←EX]
35.10 Let Y = 1 − X. Then |dy/dx| = 1. Then use equation(35.6a) on page 105. [←EX]
35.11 (a) Now X = a + bY where Y ∼ triangular(0, 1, p). Hence c + dX = c + da + dbY where Y ∼ triangular(0, 1, p) and
c + da ∈ R and db ∈ (0, ∞). Hence c + dX ∼ triangular(c + da, db, p).
(b) Now c − dX = c − da − dbY = c − da − db + db(1 − Y ) where 1 − Y ∼ triangular(0, 1, 1 − p), c − da − db ∈ R
and db ∈ (0, ∞). Hence c − dX ∼ triangular(c − da − db, db, 1 − p). [←EX]
35.12 (a) Let Z = min{U1 , U2 }; then a ≤ Z ≤ a + b and P[Z ≥ x] = P[U1 ≥ x]P[U2 ≥ x] = [1 − FU (x)]2 = (a +
b − x)2 /b2 . Hence FZ (x) = 1 − (a + b − x)2 /b2 and exercise 35.9 shows that this is the distribution function of the
triangular(a, b, 0) distribution.
Similarly, if Z = max{U1 , U2 }; then a ≤ Z ≤ a + b and P[Z ≤ x] = [FU (x)]2 = (x − a)2 /b2 and this is the distribution
function of the triangular(a, b, 1) distribution.
(b) Now 0 ≤ |U2 − U1 | ≤ a + b and P[|U2 − U1 | ≥ x] = P[U2 − U1 ≥ x] + P[U1 − U2 ≥ x] = 2P[U2 − U1 ≥ x] =
R a+b−x R a+b−x z=a+b−x
2P[U2 ≥ x + U1 ] = b2 z=a P[U2 ≥ x + z] dz = b22 z=a (a + b − x − z) dz = b12 −(a + b − x − z)2 z=a =
2 2 2 2
(b − x) /b . Hence the distribution function of |U2 − U1 | is 1 − (b − x) /b which is the distribution function of the
triangular(0, b, 0) distribution.
(c) By §7.2 on page 19, we know that V1 and V2 are i.i.d. random variables with the uniform(0, 1) distribution then
V1 + V2 ∼ triangular(0, 2, 1/2). In our case, U1 = a + bV1 and hence U1 + U2 = 2a + b(V1 + V2 ) ∼ triangular(2a, 2b, 1/2)
by part(a) of exercise 35.11.
Now U1 ∼ uniform(a, a + b); hence Z1 = 2a + b − U1 ∼ uniform(a, a + b). Hence Z1 + U2 ∼ triangular(2a, 2b, 1/2).
We have shown that 2a + b + U2 − U1 ∼ triangular(2a, 2b, 1/2); hence by part(a) of exercise 35.11 we have U2 − U1 ∼
triangular(−b, 2b, 1/2). [←EX]
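The identities in parts (a) and (b) are easy to verify by simulation against the explicit distribution functions derived above. The following is an added sketch (uniform(a, a + b) and triangular(a, b, p) are as defined in the notes):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    a, b, n = 1.0, 3.0, 400_000
    u1, u2 = rng.uniform(a, a + b, n), rng.uniform(a, a + b, n)
    print(stats.kstest(np.minimum(u1, u2), lambda x: 1 - (a + b - x)**2 / b**2))  # triangular(a, b, 0)
    print(stats.kstest(np.maximum(u1, u2), lambda x: (x - a)**2 / b**2))          # triangular(a, b, 1)
    print(stats.kstest(np.abs(u2 - u1),    lambda x: 1 - (b - x)**2 / b**2))      # triangular(0, b, 0)

Large p-values in the three Kolmogorov–Smirnov tests are consistent with the stated triangular distributions.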
35.13 Using equation(35.6a), we know that if X ∼ triangular(0, 1, 0), then fX (x) = 2(1 − x) for x ∈ [0, 1], and if X ∼
triangular(0, 1, 1), then fX (x) = 2x for x ∈ [0, 1]. The beta density is given in equation(13.1a) on page 40. [←EX]
dy
35.14 If y = (x − a)/b then | dx | = 1/b and fY (y) = bfX ( a + by ) = π2 sin(πy) for y ∈ [0, 1] and this is the density of the
sinD (0, 1) distribution.
d
Similarly, if X ∼ sinD (a, b) and Y ∼ sinD (c, d) then X = bY /d + a − bc/d. [←EX]
35.15 (a) For t ∈ [a, a + b] we get, by using the substitution v = π(x − a)/b
π t x−a 1 π(t−a)/b t−a
Z   Z   
π(t−a)/b 1
FX (t) = sin π dx = sin(v) dv = |− cos(v)|v=0 = 1 − cos π
2b a b 2 v=0 2 b
(b) Setting FX (t) = p gives the quantile function. Setting p = 1/2 gives the median.
(c) Suppose Y ∼ sinD (0, 1) then the m.g.f. of Y is
1
π 1 ty
Z 1
π ty cos(πy) π
Z
tY cos(πy)
E[e ] = e sin(πy) dy = − e + t ety dy
2 y=0 2 π
y=0 2 y=0 π
t 1 ty
Z
1 t
= (e + 1) + e cos(πy) dy
2 2 y=0
" 1 Z 1 # Z 1
1 t t ty sin(πy) ty sin(πy) 1 t t2
ety sin(πy) dy

= (e + 1) + e −t e dy = (e + 1) −
2 2 π y=0 y=0 π 2 2π y=0
1 t t2
(e + 1) − 2 E[etY ]
=
2 π
where two integration by parts have been performed. Hence
π 2 (et + 1) π 2 (etb + 1)
E[etY ] = 2 2
and E[etX ] = E[et(a+bY ) ] = eta 2 2 for t ∈ R. [←EX]
2(t + π ) 2(t b + π 2 )
h R 1 cos(πx) i 1 1 R 1
cos(πx) 1
π
R1 π

35.16 (a) Now E[X] = x sin(πx) dx = −x + dx = 2 + 2 x=0 cos(πx) dx = 12 .
2x=0 2
 π x=0 x=0 π 
1
π 1 x2 cos(πx)
π
R 1 2x cos(πx) R1
dx = 12 + x=0 x cos(πx) dx = 21 − π22 .
2
R 2
Similarly E[X ] = 2 x=0 x sin(πx) dx = 2 − π + x=0 π
x=0
 1 
x3 cos(πx)
π 1 π
R 1 3x2 cos(πx) R1
dx = 12 + 23 x=0 x2 cos(πx) dx where
3
R 3
E[X ] = 2 x=0 x sin(πx) dx = 2 − π + x=0 π
x=0
R1 2 2 1
R 2
x=0
x cos(πx) dx = − π x=0 x sin(πx) dx = − π2
.
 
4
1 R 1 4x3 cos(πx)
π 1 π x cos(πx)
R1
dx = 12 + 2 x=0 x3 cos(πx) dx where
4
R 4
E[X ] = 2 x=0 x sin(πx) dx = 2 − π + x=0 π
x=0
R1 3 3 1
R 2 3 1
 4

x=0
x cos(πx) dx = − π x=0 x sin(πx) dx = − π π − π3 .
(b) Now X = a+bY where Y ∼ sinD (0, 1). Using the results from part(a) gives E[X] = a+ b/2, E[X 2 ] = a2 +2abE[Y ]+
2 2
b2 E[Y 2 ] = a2 + ab + b /2 − 2b /π2 and var[X] = b2 var[Y ] = b2 ( 1/4 − 2/π2 ). [←EX]
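The moments E[X] = 1/2, E[X²] = 1/2 − 2/π² and var[X] = 1/4 − 2/π² for sinD(0, 1) can be confirmed by simulation, sampling via the inverse of F(x) = (1 − cos(πx))/2 from exercise 35.15. This is an added sketch, not part of the solution.

    import numpy as np

    rng = np.random.default_rng(5)
    x = np.arccos(1 - 2 * rng.random(1_000_000)) / np.pi   # inverse-cdf sampling of sinD(0, 1)
    print(x.mean(), 0.5)                                   # E[X]
    print(np.mean(x**2), 0.5 - 2 / np.pi**2)               # E[X^2]
    print(x.var(), 0.25 - 2 / np.pi**2)                    # var[X]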
35.17 Without loss of generality, we can assume X ∼ sinD (0, 1).
(a) By equation(2.22a) on page 7 and exercise 35.16,
E[X 3 ] − 3µσ 2 − µ3 E[X 3 ] − 23 σ 2 − 81
skew[X] = = =0
σ3 σ3
(b) By equation(2.25a) on page 8 and exercise 35.16,
4 6 3 3 3
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 4 E[X ] − 1 + π 2 + 8 − π 2 + 16
κ[X] = = 16π
σ4 (π 2 − 8)2
4 4 4 2 4 2
16π E[X ] − 7π + 48π π − 48π + 384
= = [←EX]
(π − 8)
2 2 (π 2 − 8)2
dy
35.18 If y = (x − a)/b then | dx | = 1/b and fY (y) = bfX ( a + by ) = (k + 1/2)y 2k for y ∈ [−1, 1] and this is the density of the
Upower(0, 1, k) distribution.
d
Similarly, if X ∼ Upower(a, b, k) and Y ∼ Upower(c, d, k) then X = bY /d + a − bc/d. [←EX]
35.19 (a) For x ∈ [a − b, a + b] we get, by using the substitution v = (t − a)/b
2k " 2k+1 #
2k + 1 x t−a 2k + 1 (x−a)/b 2k x−a
Z  Z 
1 2k+1 (x−a)/b 1
FX (x) = dt = v dv = v v=−1
= 1+
2b t=a−b b 2 v=−1 2 2 b
(b) Setting FX (x) = p and solving for x gives the quantile function. Setting p = 1/2 gives the median. [←EX]
R1
35.20 (a) Just use E[X n ] = (k + 1/2) x=−1 x2k+n dx = (k + 1/2)x2k+n+1 |1x=−1 .
(b) Clearly E[X 2 ] = (2k + 1)/(2k + 3).
(c) Just use (X − a)/b ∼ Upower(0, 1, k) and part(a). [←EX]
35.21 Without loss of generality, we can assume X ∼ Upower(0, 1, k).
(a) By equation(2.22a) on page 7 and exercise 35.20,
E[X 3 ] − 3µσ 2 − µ3
skew[X] = =0
σ3
(b) By equation(2.25a) on page 8 and exercise 35.20,
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 E[X 4 ] (2k + 3)2
κ[X] = = = [←EX]
σ4 σ4 (2k + 1)(2k + 5)
35.22 (a) Let Fk denote the distribution function of Xk . Using equation(35.19a) gives
0 if x = −1;
(
1
lim Fk (x) = lim (1 + x2k+1 ) = 1/2 if −1 < x < 1;
k→∞ k→∞ 2
1 if x = 1.
(b) Because Xk = a + bZk where Zk ∼ Upower(0, 1, k). [←EX]

Chapter 2 Section 37 on page 110 (exs-other.tex)

dx
37.1 (a) Now = 2y and hence fY (y) = fX (x) dx
dy = 2yfX (x) where

dy

xn/2−1 e−x/2
fX (x) =
2n/2 Γ( n/2)
Hence part(a). For part(b)
y2 y2
xn/2−1 e−x/2
Z Z
2
FY (y) = P[Y ≤ y] = P[X ≤ y ] = fX (x) dx = dx
0 0 2n/2 Γ( n/2)
y 2 /2 2
Γ( n/2, y /2)
Z
1
= v n/2−1 e−v dv = by substituting x = 2v. [←EX]
Γ( n/2) 0 Γ( n/2)
2
37.2 (a) Substituting n = 1 in equation(37.1a) gives, for y > 0, fY (y) = 21/2 e−y /2
/π 1/2 . Then see the answer to exer-
cise 16.25.
(b) Substituting n = 2 in equation(37.1a) gives, for y > 0,
2
fY (y) = ye−y /2 which is the density of the Rayleigh (1) distribution—see §27.4 on page 83.
(c) Substituting n = 3 in equation(37.1a) gives, for y > 0,
2
y 2 e−y /2 √
fY (y) = 1/2 3 where Γ( 3/2) = 21 Γ( 1/2) = 2π . Hence result. [←EX]
2 Γ( /2)
37.3 Using the transformation w = x2 /2 gives
Z ∞ Z ∞
k 1 k+n−1 −x2 /2 1
E[X ] = (n−2)/2 x e dx = (n−2)/2 2(k+n−1)/2 w(k+n−2)/2 e−w dw
2 Γ(n/2) x=0 2 Γ(n/2) w=0
Z ∞ 
2k/2 (k+n−2)/2 −w k/2 Γ
(n+k)/2
= w e dw = 2 [←EX]
Γ(n/2) w=0 Γ( n/2)
37.4 By equation(37.3a) we have E[X 3 ] = (n + 1)µ. By exercise 2.22 on page 7 we have
E[X 3 ] − 3µσ 2 − µ3 1   1 
= 3 nµ + µ − 3µ(n − µ2 ) − µ3 = 3 µ − 2µn + 2µ3

skew[X] =
σ3 σ σ
and then using σ 2 = n − µ2 gives the required result.
For the kurtosis, use E[X 4 ] = (n + 2)n, E[X 3 ] = (n + 1)µ and σ 2 = n − µ2 . Hence by equation (2.25a) on page 8.
E[X 4 ] − 4µE[X 3 ] + 6µ2 σ 2 + 3µ4 1 
= 4 (n + 2)n − 4µ2 n − 4µ2 + 6µ2 σ 2 + 3µ4

κ[X] =
σ4 σ
1 
= 4 (n + 2)(σ 2 + µ2 ) − 4µ2 n − 4µ2 + 6µ2 σ 2 + 3µ2 (n − σ 2 )

σ
1   1 
= 4 (n + 2)σ 2 − 2µ2 + 3µ2 σ 2 = 4 2σ 2 − 2µ2 + 4µ2 σ 2 + nσ 2 − µ2 σ 2

σ σ
1 
= 4 2σ 2 − 2µ2 + 4µ2 σ 2 + σ 4 as required.

[←EX]
√ −x σ 2 √
/2 −x2 /2 0 −x2 /2 00 −x2 /2 2
37.5 (a) Now f (x) = 2e / π = ce where c > 0. Hence f (x) = −xce and f (x) = ce [x − 1]. Hence
f 0 (x) < 0 for all x; f 00 (x) < 0 for x < 1, f 00 (1) = 0 and f 00 (x) > 0 for x > 1. Hence part(a).
2 2 2
(b) Now f (x) = cxn−1 e−x /2 where c > 0. Hence f 0 (x) = ce−x /2 [(n − 1)xn−2 − xn ] = cxn−2 e−x /2 [n − 1 − x2 ].
Hence part(b).
2 2
(c) Now f 00 (x) = ce−x /2 [(n − 1)(n − 2)xn−3 − nxn−1 − (n − 1)xn−1 + xn+1 ] = ce−x /2 [(n − 1)(n − 2)xn−3 − (2n −
2
1)xn−1 +xn+1 ] = ce−x /2 xn−3 [x4 −(2n−1)x2 +(n−1)(n−2)]. Solving the equation x4 −(2n−1)x2 +(n−1)(n−2) = 0
gives the points of inflection. If n < 7/8 there are none; if n = 7/8 there is one; if 7/8 < n < 1 there are two; if 1 ≤ n ≤ 2
there is one and if n > 2 there are 2.
As x becomes large, x4 − (2n − 1)x2 + (n − 1)(n − 2) > 0 and hence f 00 (x) > 0 and f is convex.
As x becomes small, sgn[f 00 (x)] = sgn[(n − 1)(n − 2)] and so if n < 1 or n > 2 then f is initially convex; otherwise it
is initially concave.
The result follows. [←EX]
37.6 (a) Now
 1/2  1/2
21/2 Γ (n+1)/2 21/2 n−1 n−1 n−1
    
µn 2 1
= = θ + = θ + 1 − → 1 n → ∞.
n1/2 n1/2 Γ( n/2) n1/2 2 2 n 2 n
Similarly
(n+1)/2
 1/2
n−1 n−1
  
1/2 1/2 Γ 1/2 1/2
µn − n = 2 − n = 2 θ + − n1/2
Γ( n/2) 2 2
1/2
2θ n−1

n−1 +n−1−n
  
1/2 2
= 2θ +n−1 −n =  1/2
2 2θ n−1

+n−1 + n1/2
2
2θ n−1

2 −1
= 1/2
n−1

2θ 2 + n − 1 + n1/2
1/2 1/2
where we have  used the relation a − b = (a − b)/(a1/2 + b1/2 ).
n−1 1/2
Now θ 2 → /4 as n → ∞. Hence µn − n → 0 as n → ∞.
1
(b) Now
 !2
Γ (n+1)/2 n−1
 
2
µn = 2 = n − 1 + 2θ
Γ( n/2) 2
and hence
n−1
 
2 1 1
var[Xn ] = n − µn = 1 − 2θ →1−2× = as n → ∞. [←EX]
2 4 2
37.7 (a) Let Xn = Vn /n. Then


Xn − 1 D
p =⇒ N (0, 1) as n → ∞.
2/n
p √ √
Now use the delta method with µ = 1, σn = 2/n and g(x) = x. Then g 0 (x) = 1/(2 x) and g 0 (1) = 1/2 6= 0. Hence
√ hp i
D
2n Xn − 1 =⇒ N (0, 1) as n → ∞.
and hence √
√ √ D Wn − n D
2 Wn − 2n =⇒ N (0, 1) as n → ∞. or, equivalently, √ =⇒ N (0, 1) as n → ∞.
1/ 2
(b) Now
√ √
W n − µn Wn − n n − µn
 
1
= √ √ + = an Yn + bn
σn 2σn 1/ 2 σn
where √
n − µn
 
1
an = √ → 1 as n → ∞, bn = → 0 as n → ∞
2σn σn
and √
Wn − n D
Yn = √ =⇒ N (0, 1) as n → ∞.
1/ 2
The result follows by Slutsky’s theorem or see example 25.8 on page 334 in [B ILLINGSLEY (1995)]. [←EX]
2 −x2 /(2β 2 ) 0 −x2 /(2β 2 )
 3 2
37.8 (a) Now fX (x) = cx e for x ∈ (0, ∞) where c > 0. Hence fX (x) = c 2x − x /β e . Hence
0
√ 0

fX (x) > 0 for x < β 2 and fX (x) < 0 for x > β 2.
2 2 2 2
00
(x) = c 2 − 3x2 /β 2 + x4 /β 4 − 2x2 /β 2 e−x /(2β ) = c1 x4 − 5x2 β 2 + 2β 4 e−x /(2β ) where c1 = c/β 4 .
   
(b) Now fX
2 2
00
Hence fX (x) = c [(x − x1 )(x − x2 )] e−x /(2β ) and the result follows. [←EX]
37.9 (a) Suppose Y ∼ Maxwell (1). Then integrating by parts gives
r Z ∞ r Z ∞ r
2 2 −y 2 /2 2 −y 2 /2 2
E[Y ] = y ye dy = 2ye dy = 2
π y=0 π y=0 π
and E[X] = βE[Y ]. By using the integral of the Maxwell (1) density equals 1, we get
r Z ∞ r Z ∞
2 −y 2 /2 2 2
E[Y ] =2 3
y ye dy = 3y 2 e−y /2 dy = 3
π y=0 π y=0
and hence E[X 2 ] = 3β 2 .
(b) Suppose Y ∼ Maxwell (1). Using the substitution v = y 2 /2 gives
r Z ∞ r Z ∞
n 2 n+1 −y 2 /2 2
E[Y ] = y ye dy = 2(n+1)/2 v (n+1)/2 e−v dv
π y=0 π y=0
2n/2+1 ∞ (n+1)/2 −v 2n/2+1
 
n+3
Z
= √ v e dv = √ Γ [←EX]
π y=0 π 2
37.10 Without loss of generality, we can assume X ∼ Maxwell(1). Then µ = 2√(2/π), σ² = (3π − 8)/π and E[X³] = 8√2/√π.
Using equation(2.22a) gives
skew[X] = (E[X³] − 3µσ² − µ³)/σ³ = π^{3/2} (E[X³] − 3µσ² − µ³)/(3π − 8)^{3/2} = (8√2 π − 6√2(3π − 8) − 16√2)/(3π − 8)^{3/2}
Now E[X⁴] = 8Γ(7/2)/√π = 15. By equation(2.25a) we have
κ[X] = (E[X⁴] − 4µE[X³] + 6µ²σ² + 3µ⁴)/σ⁴ = π² (15 − 128/π + 48(3π − 8)/π² + 192/π²)/(3π − 8)²   [←EX]
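The general moment formula of exercise 37.9(b), E[Y^n] = 2^{n/2+1} Γ((n+3)/2)/√π for Y ∼ Maxwell(1), can be compared with scipy. This added sketch assumes scipy's stats.maxwell with scale 1 is the Maxwell(1) distribution of the notes.

    import numpy as np
    from scipy import stats
    from scipy.special import gamma

    for n in range(1, 5):
        exact = 2 ** (n / 2 + 1) * gamma((n + 3) / 2) / np.sqrt(np.pi)
        print(n, stats.maxwell.moment(n), exact)   # the two columns should agree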
37.11 (a) Let u = (1 − p)e^{−x/b}. Then du/dx = −(1 − p)e^{−x/b}/b and
∫_0^∞ f_X(x) dx = (1/ln(p)) ∫_{1−p}^0 du/(1 − u) = −(1/ln(p)) ∫_0^{1−p} du/(1 − u) = (1/ln(p)) ln(1 − u) |_0^{1−p} = 1
Also fX (x) > 0 for all x ∈ (0, ∞). Hence fX is a density.
(b) Differentiating gives
0 1 (1 − p)e−x/b
fX (x) = 2 < 0 because ln(p) < 0.
b ln(p) [1 − (1 − p)e−x/b ]2
00 −1 (1 − p)e−x/b [1 + (1 − p)e−x/b ]
fX (x) = 3 >0
b ln(p) [1 − (1 − p)e−x/b ]3
(c)
ln[1 − (1 − p)e−x/b ]
FX (x) = 1 − for x ∈ (0, ∞).
ln(p)
(d) Setting F_X(x) = u where u ∈ (0, 1) gives F_X^{−1}(u) = x = b[ ln(1 − p) − ln(1 − p^{1−u}) ]. The median is F_X^{−1}(1/2) = b ln(1 + √p). [←EX]

37.12 (a) If X = bY then dx
dy = b and fY (y) = bfX (by). Hence Y ∼ expLog(1, p).

(b) The hazard function is


fX (t) (1 − p)e−x/b
hX (x) = =− for x ∈ (0, ∞).
1 − FX (t) b[1 − (1 − p)e−x/b ] ln[1 − (1 − p)e−x/b ]
and clearly hX (x) = b1 hY ( xb ) where Y ∼ expLog(1, p).
(c) By part(b), we just need consider hY (x). Let α(x) = (1 − p)e−x and β(x) = ln[1 − α(x)]. Hence α0 (x) = −α(x) and
α0 (x) α(x) α(x)
β 0 (x) = − = and hY (x) = −
1 − α(x) 1 − α(x) β(x)[1 − α(x)]
Hence
β(x)[1 − α(x)]α0 (x) − α(x) β 0 (x)[1 − α(x)] − β(x)α0 (x)

0 α(x) + β(x)
hY (x) = − = α(x) <0
β(x) [1 − α(x)]
2 2 β(x)2 [1 − α(x)]2
by using y + ln(1 − y) < 0 for y ∈ (0, 1). [←EX]
P∞ k
37.13 For x ∈ (−1, 1) we know that k=1 x = x/(1 − x). Hence

1 X
fX (x) = − (1 − p)k e−kx/b
b ln(p)
k=1

and
∞ Z ∞
n 1 X
E[X ] = − (1 − p)k xn e−kx/b dx
b ln(p) 0
k=1

1 X bn+1 n!
=− (1 − p)k n+1 by using the integral of the gamma density is 1.
b ln(p) k
k=1
Now ∞
X (1 − p)k X∞
1
≤ = ζ(n + 1) which is finite, and hence E[X n ] → 0 as p → 0.

k n+1 k n+1


k=1 k=1
Using L’Hôpital’s rule shows that E[X n ] → bn n! as p → 1. [←EX]
−1
37.14 (a) By §7.6 on page 23 we know that FX (U ) ∼ expLog(b, p); also exercise 37.11 shows that
−1
(u) = b ln(1 − p) − ln(1 − p1−u
 
FX
Finally, use 1 − U ∼ uniform(0, 1).
(b) By §7.6 on page 23 we know that if the random variable X has the distribution function F and F is continuous, then
the random variable F (X) has the uniform(0, 1) distribution. By exercise 37.11, we know that
ln[1 − (1 − p)e−x/b ]
FX (x) = 1 − for x ∈ (0, ∞).
ln(p)
Also, if U ∼ uniform(0, 1) then 1 − U ∼ uniform(0, 1). Hence result. [←EX]
37.15 For x ∈ (0, ∞),
∞ ∞ ∞  k
X
k
X
−kλx (1− p)k 1 X (1 − p)e−λx
P[X > x] = (P[Y1 > x]) P[N = k] = − e =−
k ln(p) ln(p) k
k=1 k=1 k=1
−λx
ln[1 − (1 − p)e ]
= [←EX]
ln(p)

37.16 By exercise 37.11, we know that


ln[1 − (1 − p)e−x/b ]
FX (x) = 1 − for x ∈ (0, ∞).
ln(p)
(a) Now FX (0) = 0; also for x > 0, we have FX (x) → 1 as p → 0.
(b) Using L’Hôpital’s rule we have
pe−x/b
lim FX (x) = 1 − lim = 1 − e−x/b [←EX]
p→1 p→1 1 − (1 − p)e−x/b
37.17 (a) Differentiating f 0 gives


f 00 (x) (bk + xk ) bk (k − 1)(k − 2)xk−3 − 2(k − 1)(k + 1)x2k−3 − bk (k − 1)xk−2 − (k + 1)x2k−2 3kxk−1
   
=
kbk (bk + xk )4
k k k k
  k
− b (k − 1) − (k + 1)xk 3kxk
 
k−3 (b + x ) b (k − 1)(k − 2) − 2(k − 1)(k + 1)x
=x
(bk + xk )4
(bk + xk )(k − 1) bk (k − 2) − 2(k + 1)xk − 3kxk bk (k − 1) + 3kxk (k + 1)xk
 
= xk−3
(bk + xk )4
(k − 1) (b + xk )(bk (k − 2) − 2(k + 1)xk ) − 3kxk bk + 3kxk (k + 1)xk
 k 
= xk−3
(bk + xk )4
k k k k k k
+ 3kxk (k + 1)xk
 2k 2k

k−3 (k − 1) (b (k − 2) − 2(k + 1)x b + b x (k − 2) − 2(k + 1)x ) − 3kx b
=x
(bk + xk )4
k k k k
 2k 2k

k−3 (k − 1) (b (k − 2) − 4(k + 1)x b − 2(k + 1)x ) + 3kx (k + 1)x
=x
(bk + xk )4
b (k − 1)(k − 2) − 4(k − 1)(k + 1)xk bk − 2(k − 1)(k + 1)x2k + 3kxk (k + 1)xk
2k
= xk−3
(bk + xk )4
k k
b (k − 1)(k − 2) − 4(k − 1)x b + (k + 1)(k + 2)x2k
2k 2
= xk−3
(bk + xk )4
(b) If k ∈ (0, 1] then f″(x) > 0 and hence f is convex. (c) and (d) The numerator, treated as a quadratic in x^k,
has discriminant 12b^{2k} k² (k² − 1). [←EX]


37.18 (a) Now
bk f (x) kxk−1
1 − F (x) = k k
and hence h(x) = = k
b +x 1 − F (x) b + xk
(b) The derivative of the hazard function is
kxk−2 
h0 (x) = k (k − 1)bk − xk

k 2
[←EX]
(b + x )
37.19 Now
Z ∞ Z ∞
n n bk kxk−1
E[X ] = x f (x) dx = xn k dx
x=0 x=0 (b + xk )2
Consider the transformation x ∈ (0, ∞) 7→ u ∈ (1, 0) defined by
bk du kbk xk−1
u= k which has = −
b + xk dx (bk + xk )2
Hence
Z 1  n n
E[X n ] = bn (1 − u)n/k u−n/k du = bn B 1 − , 1 +
u=0 k k
provided 1 − nk > 0 or n < k. Also
 n n Γ(1 − nk )Γ(1 + nk ) bn n n n bn n π
bn B 1 − , 1 + = bn = Γ(1 − ) Γ( ) =
k k Γ(2) k k k k sin(πn/k)
where the last equality follows by Euler’s reflection formula, (13.10a). [←EX]
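A numerical cross-check of E[X^n] = b^n (nπ/k)/sin(nπ/k) for n < k. This added sketch assumes the notes' loglogistic(k, b) is scipy's fisk distribution with shape c = k and scale b.

    import numpy as np
    from scipy import stats

    k, b = 4.5, 2.0                      # arbitrary test values with k > 4
    for n in (1, 2, 3, 4):
        exact = b**n * (n * np.pi / k) / np.sin(n * np.pi / k)
        print(n, stats.fisk(c=k, scale=b).moment(n), exact)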
37.20 Now dy/dx = nx^{n−1}. Hence
k k k−n k k k/n−1 k k/n
k/n−1
kbk xk−1 nb x nb y n b2 y
fY (y) = = = = where b2 = bn . [←EX]
nxn−1 (bk + xk )2 (bk + xk )2 (bk + y k/n )2 (bk/n + y k/n )2
2
37.21 (a) and (b) Use the results for the distribution function, equation(36.4b), and the quantile function, equation(36.4c), and
the results in §7.6 on page 23.
dy 1
(c) Now y = ln(x) and hence dx = x and for y > 0 we have
kbk xk kbk eky kbk e−ky
fY (y) = = =
(bk + xk )2 (bk + eky )2 (1 + bk e−ky )2
e−k(y−β)
=k where β = ln(b).
(1 + e−k(y−β) )2
and this is the density of the logistic(β, σ 2 = π 2 /(3k 2 ) ). √
(d) Suppose X ∼ logistic(a, σ 2 = π 2 /(3k 2 ) ), then k = π/(σ 3); hence X has density
e−k(x−a)
fX (x) = k for x ∈ R.
[1 + e−k(x−a) ]2
dy
Now y = ex and hence dx = ex = y; also let b = ea ; hence
k e−kx eka k ekx bk kbk y k−1
fY (y) = −kx ka
= kx k
= k [←EX]
y [1 + e e ] 2 y [e + b ] 2 [b + y k ]2
37.22 (a) Let Z ∼ loglogistic(1, 1) and Xk = Z 1/k . Then Xk ∼ loglogistic(k, 1). Now P[Z > 0] = 1 and on the set
a.e. D
{ω : Z(ω) > 0} we have Xk (ω) = Z(ω)1/k → 1 as k → ∞. Hence Xk −→ 1 as k → ∞; hence Xk =⇒ 1 as k → ∞.
a.e. a.e.
(b) Now Xk ∼ loglogistic(k, b); hence Xk = bZk where Zk ∼ loglogistic(k, 1) −→ 1 as k → ∞. Hence Xk −→ b as
D
k → ∞; hence Xk =⇒ b as k → ∞.
37.23 (a) Using the substitution y = eπx/2 gives
Z ∞ Z ∞
4 ∞ 1
  ∞  h
dx π πi
Z
4 −1 4
fX (x) dx = 2 = dy = tan (y) = − =1

eπx/2 + e−πx/2 π 1 1 + y 2 π π 2 4

−∞ 0 y=1

πy
(b) Let Y = (X − µ)/σ; then dx 1

dy = σ and fY (y) = σfX (µ + σy) = 2 sech 2 . (c) Follows from part(b) and the shape

of the density of the Hsecant(0, 1) distribution. [←EX]
37.24 (a) For x ∈ R we have
Z x Z x πz/2
dz e dz 2 exp(πx/2) du  πx i
Z
2 −1
h
FX (x) = πz/2 + e−πz/2
= πz
= = tan exp for x ∈ R.
−∞ e −∞ 1 + e π u=0 1 + u2 π 2
(b) By exercise 37.23 we know that X = µ + σY where Y ∼ Hsecant(0, 1). Hence FX (x) = P[X ≤ x] = P[Y ≤
(x − µ)/σ] and then use part(a).
(c) Just solve FX (x) = p. [←EX]
37.25 (a) We know that
Z 1
B(α, β) = xα−1 (1 − x)β−1 dx for α > 0 and β > 0.
x=0
Suppose c > 0 and consider the transformation
 
cv x 1 x
e = or v = ln
1−x c 1−x
dv 1
So this is a 1-1 map from x ∈ (0, 1) to v ∈ (−∞, ∞). Also dx =
cx(1−x) ; hence
Z 1 Z ∞ Z ∞
ecαv e−cαw
B(α, β) = xα−1 (1 − x)β−1 dx = c cv α+β
dv = c −cw )α+β
dw
x=0 v=−∞ (1 + e ) w=−∞ (1 + e
Now set c = q; hence q > 0. Also, set p = cα = qα; hence for any q > 0, p > 0 and β > 0 we have
Z ∞
e−pw
q −qw )p/q+β
dw = B(p/q, β)
w=−∞ (1 + e
Hence for any q > p > 0 we have
Z ∞
e−pw
q −qw
dw = B(p/q, 1 − p/q)
w=−∞ 1 + e
Now for any x ∈ (0, 1) we have B(x, 1 − x) = π/ sin(πx). Hence for any q > p > 0 we have
Z ∞
e−pw π
−qw
dw =
w=−∞ 1 + e q sin(πp/q)
Similarly, if 0 > p > q, then −q > −p > 0 and
Z ∞
e−pw π
−qw
dw =
w=−∞ 1 + e |q| sin(πp/q)
Hence Z ∞ Z ∞ −x(π/2−t)
etx e 1 1
MX (t) = πx/2 −πx/2
dx = −πx
dx = = = sec(t) for t ∈ (−π/2, π/2).
−∞ e +e −∞ 1 + e sin(π/2 − t) cos(t)
It follows that φX (t) = E[eitX ] = sec(it) = sech(t) for t ∈ (−π/2, π/2) and then this clearly holds for all t ∈ R.
(b) Just use X = µ + σY where Y ∼ Hsecant(0, 1). [←EX]
37.26 (a) Differentiating the moment generating function MX (t) = sec(t) gives
0 00
MX (t) = sec(t) tan(t) and MX (t) = sec(t)[tan2 (t) + sec2 (t)]
0 2 00 3
Hence E[X] = MX (0) = 0 and E[X ] = MX (0) = sec (0) = 1. Also
000
MX (t) = sec(t) tan3 (t) + 2 sec3 (t) tan(t) + 3 sec3 (t) tan(t)
iv
MX (t) = sec(t) tan4 (t) + 3 sec3 (t) tan2 (t) + 6 sec3 (t) tan(t) + 2 sec5 (t) + 9 sec3 (t) tan2 (t) + 3 sec5 (t)
000 iv
and hence MX (0) = 0 and MX (0) = 5. The values for the skewness and kurtosis follow directly from equations(2.22a)
and (2.25a).
(b) Just use X = µ + σY where Y ∼ Hsecant(0, 1). [←EX]
37.27 (a) Set F (x) = p where F is the distribution function and is given in equation(36.6b) on page 110. (b) Just use h(x) =
f (x)/[1 − F (x)]. [←EX]
37.28 (a) Now bf (x)/a = ex/b exp −a(ex/b − 1) for x ∈ [0, ∞). Differentiating f gives
 

bf 0 (x) 1 x/b h ih i
= e exp −a(ex/b − 1) 1 − aex/b
a b
If a < 1 then 1 − aex/b is first positive and then negative with root at x = −b ln a.
(b) If a ≥ 1 then 1 − aex/b < 0 for x > 0.
(c)
bf 00 (x) ex/b h ih i
= 2 exp −a(ex/b − 1) 1 − 3aex/b + a2 e2x/b
a b
(d) The sign of f 00 (x) is clearly the same as the sign of γ(x) = a2 e2x/b − 3aex/b + 1. Now
√ √
2 2x/b x/b x/b x/b 3− 5 3+ 5
γ(x) = a e − 3ae + 1 = (e a − α1 )(e a − α2 ) where α1 = and α2 =
2 2
If a < α1 , then γ(0) > 0. When x = x1 , then γ(x) = 0; also γ(x) < 0 for x1 < x < x2 . And so on. [←EX]
37.29 The distribution function of X is FX (x) = 1 − b exp[−ex/b ] for x ∈ R. Hence
ex/b exp −ex/b
 
fx (x)
=
1 − FX (0) be−1
and this is the density of Gompertz(1, b). [←EX]
37.30 (a) Inverting the Gompertz distribution function given in equation(36.6b) gives
ln(1 − p)
 
F −1 (p) = b ln 1 −
a
−1
Also, by §7.6 on page 23 we see that F (X) ∼ Gompertz(a, b). But if X ∼ uniform(0, 1) then 1 − X ∼ uniform(0, 1).
Hence result.
(b) By §7.6 on page 23 we know that if the random variable X has the distribution function F and F is continuous, then
the random variable F (X) has the uniform(0, 1) distribution. By equation(36.6b), we know that
h i
F (x) = 1 − exp −a(ex/b − 1) for x ∈ (0, ∞).
Also, if U ∼ uniform(0, 1) then 1 − U ∼ uniform(0, 1). Hence result. [←EX]
a x/b x/b

37.31 (a) By equation(36.6a) we have fX (x) = b e exp −a(e − 1) for x ∈ [0, ∞). Now {x : x ∈ [0, ∞) } →
7 {y :y∈
dy
[0, ∞) } and | dx | = ex/b /b. Hence
dx a h i
fY (y) = fX (x)| | = be−x/b ex/b exp −a(ex/b − 1) = a exp[−ay]
dy b
for y ∈ [0, ∞). This proves part(a).
dy
(b) Now {x : x ∈ [0, ∞) } 7→ {y : y ∈ [0, ∞) } and | dx | = b/(x + a). Hence
dx x+a aey/b
fY (y) = fX (x)| | = e−x = exp[−a(ey/b − 1)]
dy b b
which proves part(b). [←EX]
37.32 (a) By equation(21.5b) on page 65 we know that φY (t) = E[eitY ] = e−|t| ; also the density of X is fX (x) = e−x for
x > 0. Hence Z ∞ Z ∞
itZ itXY itxY −x
φZ (t) = E[e ] = E[e ]= E[e ]e dx = e−|tx| e−x dx
0 0
 ∞ −x(1+t)
R 1
e dx = 1+t if t > 0; 1
= R0∞ −x(1−t) 1 =
0
e dx = 1−t if t < 0. 1 + |t|
(b)
Z ∞ Z ∞
itZ itX 1/α Y itx1/α Y −x α 1
φZ (t) = E[e ] = E[e ]= E[e ]e dx = e−x|t| e−x dx = [←EX]
0 0 1 + |t|α

Chapter 3 Section 39 on page 121 (exs-multiv.tex)

T T T T T T T T
39.1 cov[a X, b Y] = E[(a X − a µX )(b Y − b µY ) ] = a cov[X, Y]b. [←EX]
39.2 (a) Just use E[X + a] = µX + b and E[Y + d] = µY + d.  
T T T T
(b) cov[AX, BY] = E (AX − Aµ  X )(BY − BµY ) = E A(X − µX )(Y − µY ) B =T Acov[X, Y]B .
(c)h cov[aX + bV, cY + dW] = E (aX + bV − aµX − bµY )(cY
i + dW − cµY − dµW ) which equals
T
E {a(X − µX ) + b(V − µY )} {c(Y − µY ) + d(W − µW )} which equals the right hand side.
(d) Similar to (c). [←EX]
39.3 Now cov[Y − AX, X] = cov[Y, X] − Avar[X]. Now ΣX = var[X] is non-singular. Hence if we take A = cov[Y, X]Σ−1
X
then cov[Y − AX, X] = 0. [←EX]
39.4 (a) Expanding the left hand side gives AE[XXT ]BT +AµbT +aµT BT +abT = AE[XXT ]BT +(Aµ+a)(Bµ+b)T −AµµT BT .
Then use equation(38.1a): Σ = E[XXT ] − µµT . (b) This is just a special case of (a).
(c) Now aT X is 1 × 1 and hence aT X = XT a. This implies E[XaT X] = E[XXT ]a and hence the result. [←EX]
39.5
1 1 1 ··· 1
 
1 2 2 ··· 2
···
 
var[X] =  1 2 3 3 [←EX]
. . . .. .. 
 .. .. .. . .
1 2 3 ··· n
39.6 Let Y = X − α. Then E[(X − α)(X − α)T ] = E[YYT ] = var[Y] + µY µTY = var[X] + (µX − α)(µx − α)T . [←EX]
T T T T T T T T T T T
39.7 Left hand side equals E[X A BX] + µ A b + a Bµ + a b = E[X A BX] − µ A Bµ + (Aµ + a) (Bµ + b). By
equation(38.8a) we have E[XT AT BX] = trace(AT BΣ) + µT AT Bµ. Because trace(A) = trace(AT ) and ΣT = Σ, we have
trace(AT BΣ) = trace(ΣT BT A) = trace(ΣBT A). Then use trace(CA) = trace(AC). Other parts are all special cases of
this result. [←EX]
T T T T T
39.8 Use the argument in proposition(38.8a): X AY = (X − µX ) A(Y − µY ) + µX AY + X AµY − µX AµY . Hence
E[XT AY] = E (X − µX )T A(Y − µY ) ] + µTX AµY . Because (X − µX )T A(Y − µY ) is a scalar, we have
E[ (X − µX )T A(Y − µY ) ] = E trace (X − µX )T A(Y − µY )
 

= E trace A(Y − µY )(X − µX )T


 
because trace(AB) = trace(BA).
T
 
= trace E A(Y − µX )(X − µX ) because E[trace(V)] = trace(E[V]).
Hence result. [←EX]
39.9 Just apply proposition(38.8a) on page 119 to the random vector X − b which has expectation µ − b and variance Σ.
[←EX]
T 1
39.10 Using the representation in example(38.7d) on page 119, we have Q = X AX where A = I − n 1. Using equation(38.8a)
on page 119 gives E[Q] = E[XT AX] = trace(AΣ) + µT Aµ. Using A = I − n1 1 gives trace(AΣ) = trace(Σ) −
1 1 T T 1 T
Pn 2 1
Pn 2
n trace(1Σ) = α − n (α + 2β) = [(n − 1)α − 2β] /n. Finally µ Aµ = µ Iµ − n µ 1µ = k=1 µk − n k=1 µk .
[←EX]
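The identity E[XᵀAX] = trace(AΣ) + µᵀAµ used here can be illustrated by simulation with A = I − (1/n)1. The sketch below is an added numerical check with made-up values of µ and Σ; it is not part of the solution.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 4
    mu = np.array([1.0, 2.0, 0.5, -1.0])
    L = rng.normal(size=(n, n))
    Sigma = L @ L.T                                  # an arbitrary covariance matrix
    A = np.eye(n) - np.ones((n, n)) / n
    X = rng.multivariate_normal(mu, Sigma, size=500_000)
    q = np.einsum('ij,jk,ik->i', X, A, X)            # x'Ax for each sampled row
    print(q.mean(), np.trace(A @ Sigma) + mu @ A @ mu)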
39.11 (a) Straightforward algebraic expansion combined with A = A^T.
n Pn
(b) Let b = Aµ. Then W2 = 2bT Y = 2 k=1 bk Yk and var[W2 ] = 4σ 2 k=1 b2k = 4σ 2 bT b = 4σ 2 µT A2 µ.
(c) Now E[W2 ] = 0; hence cov[W1 , W2 ] = E[W1 W2 ] = 2E[YT AYµT AY]. Now µT AY is 1 × 1; hence cov[W1 , W2 ] =
2E[µT AYYT AY] = 2E[bT Zc] P c = AY and Z =P
P where YYPT
. P
Hence cov[W1 , W2 ] = 2E[ j k bj ck Zjk ] = 2E[ j k bj ck Yj Yk ]. Using ck = ` ak` Y` gives cov[W1 , W2 ] =
Pn
2E[ j j ` bj ak` Yj Yk Y` ] = 2µ3 k=1 bk akk = 2µ3 bT d proving (c).
P P P

(d) var[XT AX] = var[W1 ] + var[W2 ] + 2cov[W1 , W2 ] = (µ4 − 3σ 4 )dT d + 2σ 4 trace(A2 ) + var[W2 ] + 2cov[W1 , W2 ].
[←EX]
Pn 2 T 1
39.12 Let Q = k=1 (Xk − X) . By example(38.7d) on page 119, we know that Q = X AX where A = I − n
1. Then
1 2 2 2 T T
trace(AΣ) = trace(IΣ) − n trace(1Σ) = nσ − σ [1 + (n − 1)ρ] = σ (1 − ρ)(n − 1). Also µ Aµ = (X AX) = 0.

X=µ
Hence result. [←EX]
39.13 By §15.4 on page 47, we know that E[(X − µ)2 ] = σ 2 , E[(X − µ)3 ] = 0 and E[(X − µ)4 ] = 3σ 4 .
(a) Now (n − 1)S 2 = XT AX where A = I − n1 1. By equation(38.9a) on page 121, var[XT AX] = (µ4 − 3σ 4 )dT d +
2σ 4 trace(A2 )+4σ 2 µT A2 µ+4µ3 µT Ad = 2σ 4 trace(A2 )+4σ 2 µT A2 µ. Now Aµ = 0. Hence var[XT AX] = 2σ 4 trace(A2 ).
Now A = I − n1 1; hence A2 = I − n1 1 − n1 1 + n12 12 = I − n2 1 + n1 1 = A. Hence trace(A2 ) = trace(A) = n − 1 and so
var[XT AX] = 2σ 4 (n − 1).
Pn Pn−1
(b) By example(38.8e) on page 120, 2(n − 1)Q = 2 k=1 Xk2 − X12 − Xn2 − 2 k=1 Xk Xk+1 and 2(n − 1)E[Q] =
σ 2 (2n−2). Also var[2(n−1)Q] = (µ4 −3σ 4 )dT d+2σ 4 trace(A2 )+4σ 2 µT A2 µ+4µP T 4
3 µ Ad = 2σ trace(A
2
)+4σ 2 µT A2 µ.
n Pn P n
Again Aµ = 0. Hence var[2(n − 1)Q] = 2σ 4 trace(A2 ). Now trace(A2 ) = j=1 [A 2
]jj = j=1 k=1 ajk akj =
Pn Pn 2
j=1 k=1 ajk = 6(n − 2) + 4 = 6n − 8. Hence result. [←EX]
39.14 (a) We use the following results: trace(AB) = trace(BA) and trace( E[A] ) = E[ trace(A) ]. Hence E[Z] = E[ XT AY ] =
E[ trace(AYXT ) ] = trace(E[AYXT ]) = trace(A E[YXT ]) = trace[A( cov[Y, X] + µY µTX ) ] = trace[A cov[Y, X] ] +
trace[AµY µTX ] = trace[A cov[Y, X] ] + trace[µTX AµY ] as required.
Pn
(b) Now (n − 1)SXY = j=1 Xj Yj − nX Y = XT [I − 1/n]Y. By part(a) we have E[(n − 1)SXY ] = trace( Acov[Y, X] ) +
µTX AµY where A = In − 1/n and cov[Y, X] = σXY In . Also µX = µX j and µY = µY j where j is an n × 1
vector with every entry equal to 1. Now trace(Acov[Y, X]) = σXY trace(In − 1/n) = (n − 1)σXY and µTX AµY =
µX µY jT (I − 1/n)j = µX µY (jT j − jT 1j/n) = µX µY (n − n) = 0. Hence result. [←EX]

Chapter 3 Section 41 on page 127 (exs-bivnormal.tex)

41.1 Use characteristic functions. [←EX]


For questions 2–9, see the method in the solution of example(40.3a) on page 125.
41.2 Setting Q(x, y) = 32 (x2 − xy + y 2 ) gives a1 = 2/3, a2 = − 1/3 and a3 = 2/3. Hence

 
2/3 − 1/3 1
P= and |P| = . Hence c = 1/2π 3.
− /3
1 2/3 3
(b) We have cov[X1 , X2 ] = −a2 /(a1 a3 − a22 ) = 1 and hence they are not independent. [←EX]
41.3 Setting Q(x, y) = 2(x2 + 2xy +4y 2 ) gives  a 1 = 2, a2 = 2 and a3 = 8. Clearly (µ , µ
 1 2  ) = (0, 0). Hence
2 2 −1 1 8 −2
P= and |P| = 12. Also Σ = P =
2 8 12 −2 2
1/2
√ √
Finally |P| = 2 3 and so k = 3/π. [←EX]
41.4 Setting Q(x, y) = 2x2 + y 2 + 2xy − 22x − 14y + 65 gives a1 = 2, a2 = 1 and  a3 = 1. Hence
2 1 1 −1
P= and |P| = 1 and Σ = P−1 =
1 1 −1 2
Finally k = 2π/|P|1/2 = 2π.
Now for the mean vector: setting ∂Q(x,y) ∂x = 0 and ∂Q(x,y)
∂y = 0 gives 4µ1 + 2µ2 − 22 = 0 and 2µ2 + 2µ1 − 14 = 0. Hence
(µ1 , µ2 ) = (4, 3). [←EX]
41.5 Setting Q(y1 , y2 ) = y12 + 2y22 − y1 y2 − 3y1 − 2y2 + 4 gives a1 = 1, a2 = − 1/2 and a3 = 2. Hence
   
1 − 1/2 7/4 and Σ = P−1 =
1 8 2
P= and |P| =
− 1/2 2 7 2 4
∂Q(y1 ,y2 ) ∂Q(y1 ,y2 )
Setting ∂y1 = 0 and ∂y2 = 0 gives 2µ1 − µ2 − 3 = 0 and 4µ2 − µ1 − 2 = 0. Hence µ1 = 2 and µ2 = 1. [←EX]
41.6 Proceed as in question 41.3 above. Hence integral = π/√3. [←EX]
41.7 Setting Q(y1 , y2 ) = 61 y12 + 2y1 (y2 − 1) + 4(y2 − 1)2 gives µ1 = 0, µ2 = 1, a1 = 1/6, a2 = 1/6 and a3 = 2/3. Hence
   
1 1 1 8 −2
P= and |P| = 1/12 and Σ = P−1 = [←EX]
6 1 4 −2 2
41.8 (a) cov(X, Y −ρσY X/σX ) = cov[X, Y ]−ρσY var[X]/σX = ρσX σY −ρσX σY = 0. (b) cov[ X cos θ+Y sin θ, X cos θ−
Y sin θ ] = var[X] cos2 θ − var[Y ] sin2 θ = 0. [←EX]
X2 X2
41.9 Using equation(40.3b) on page 125 for P gives XT PX − σ21 = (σ22 X12 − 2ρσ1 σ2 X1 X2 + σ12 X22 )/[σ12 σ22 (1 − ρ2 )] − σ21 =
1 1
(ρ2 X12 /σ12 − 2ρX1 X2 /σ1 σ2 + Xp 2 2 2 2 2
2 /σ2 )/(1 − ρ ) = (ρX1 /σ1 − X2 /σ2 ) /(1 − ρ ).
Let Z = (ρX1 /σ1 − X2 /σ2 )/ 1 − ρ2 ). Then Z is normal with E[Z] = 0 and var[Z] = E[Z 2 ] = (ρ2 + 1 −
2ρcov[X1 , X2 ]/σ1 σ2 )/(1 − ρ2 ) = (ρ2 + 1 − 2ρ2 )/(1 − ρ2 ) = 1. Hence Z 2 ∼ χ21 as required. [←EX]
41.10 (a) E[X1 |X2 ] = E[Y − αX2 |X2 ] = E[Y |X2 ] − αX2 = E[Y ] − αX2 = µ1 + αµ2 − αX2 .
(b) Suppose (X1 , X2 ) has a bivariate normal distribution with µ1 = µ2 = 0. Consider the random vector (X2 , Y ) with
Y = X1 − ρσ1 X2 /σ2 . This is a non-singular linear transformation and hence by proposition(40.7a) on page 126 the new
vector has a bivariate normal with mean (0, 0) and  variance matrix
2
σ2 0
0 (1 − ρ2 )σ12
Hence Y and X2 are independent. Hence by part (a) we have E[X1 |X2 ] = X2 ρσ1 /σ2 . Hence if V1 = X1 + µ1 and
V2 = X2 + µ2 then E[V1 |V2 ] = µ1 + (V2 − µ2 )ρσ1 /σ2 . [←EX]
41.11 A straightforward calculation gives
 
σX p0
L=
ρσY σY 1 − ρ2
Now LZ = X − µ ∼ N (0, Σ). Hence E[Z] = 0 and Lvar[Z]LT = Σ. Now L and hence LT are invertible and hence
var[Z] = L−1 Σ(LT )−1 = I [←EX]
41.12 (a) Z has the N (0, 1) distribution. (b) Using
 2
x − 2ρxz + z 2

fXY (x, y) fXY (x, y) 1
fXZ (x, z) = = p = exp −
2(1 − ρ2 )
p
∂(x,z) 1 − ρ2 2π 1 − ρ2

∂(x,y)
(c) We now have
∂(u, v)
∂(x, z) = σ1 σ2

and hence the density of (U, V ) is that given in equation(40.3a) on page 124. [←EX]
41.13 Now X = (X1 , X2 ) ∼ N 0, Σ) where Σ = diag(σ12 , σ22 ). If Z = (X1 , Z), then Z = CX where
   2 
1 0 σ1 σ12
C= and hence Z ∼ N (0, CΣCT ) where CΣCT =
1 1 σ12 σ12 + σ22
q
(b) The conditional distribution of X1 given Z is N (ρ σσZ1 Z, σ12 (1−ρ2 )) where ρ = σ1 / σ12 + σ22 and ρ σσZ1 = σ12 /(σ12 +σ22 ).
Hence E[X1 |Z] = σ12 Z/(σ12 + σ22 ) and
itσ 2 Z 1 t2 σ12 σ22
 
itX1
E[e |Z] = exp 2 1 2 − [←EX]
σ1 + σ2 2 σ12 + σ22
41.14 Now W = X1 iff α(Z1 , . . . , Zn ) = 1 iff Z1 ≥ Z2 and Z1 ≥ Z3 and · · · and Z1 ≥ Zn ; W = X2 iff α(Z1 , . . . , Zn ) = 2 iff
Z2 > Z1 and Z2 ≥ Z3 and · · · and Z2 ≥ Zn ; etc. Hence
Xn Z Xn Z
E[W ] = W dP = W I [α(Z1 , . . . , Zn ) = j] dP
j=1 α(Z1 ,...,Zn )=j j=1 Ω

Because we are dealing with continuous distributions, we can see that all terms in the sum are equal and hence
Z Z
E[W ] = n W I [α(Z1 , . . . , Zn ) = 1] dP = n X1 I [α(Z1 , . . . , Zn ) = 1] dP

  Ω
= n E X1 I(Z1 ≥ Z2 ) · · · I(Z1 ≥ Zn )
 
= n E X1 I(Z1 ≥ Z2 ) · · · I(Z1 ≥ Zn ) Z1
Now X1 , I(Z1 ≥ Z2 ), . . . , I(Z1 ≥ Zn ) are conditionally independent given Z1 ; hence by equation(1.2a) we have
 
E[W ] = n E E[X1 |Z1 ]E[I(Z1 ≥ Z2 )|Z1 ] · · · E[I(Z1 ≥ Zn )|Z1 ]
q
Using the result of exercise 41.13 and setting V = Z1 σ12 + σ22 gives
  n−1 
2
 σ Z1 Z1 nσ 2 h
n−1
i
E[W ] = n E  2 1 2 Φ  q  = q 1 E V {Φ(V )}
 
σ1 + σ2 σ12 + σ22 σ12 + σ22
Similarly,
  n−1 
itσ12 Z1 t2 σ12 σ22
 
Z1
E[eitW ] = n E exp − Φ q
  
σ12 + σ22 2(σ12 + σ22 )

σ12 + σ22
  n−1 
2 2 2 2
   
t σ1 σ2 itσ1 Z1 Z1
= n exp − E exp Φ q
  
2(σ12 + σ22 ) σ12 + σ22

2
σ +σ 2
1 2
   
t2 σ12 σ22 itσ12 V
 
= n exp − E exp  q  {Φ(V )}n−1  [←EX]
2(σ12 + σ22 ) σ12 + σ22
41.15 Now Z Z Z
1
E[eitY ] = eitQ(x1 ,x2 ) fX1 X2 (x1 , x2 ) dx1 dx2 = e−(1−it)Q(x1 ,x2 ) dx1 dx2
p
2πσ1 σ2 1 − ρ2 x1 x2
Define α1 and α2 by σ1 = (1 − it)1/2 α1 and σ2 = (1 − it)1/2 α2 . Hence
1 p α1 α2 1
E[eitY ] = 2πα1 α2 1 − ρ2 = =
σ1 σ2 1 − it
p
2πσ1 σ2 1 − ρ2
and hence Y has an exponential(1) distribution. [←EX]
41.16 (a) If |Σ| = 0 then σ12 σ22 = σ12
2
. The possibilities are
d
(i) σ12 = 0, σ1 = 0 and σ2 6= 0; φ(t) = exp(− 12 σ22 t22 ). Hence (X1 , X2 ) = (0, Z) where Z ∼ N (0, σ22 ).
d
(ii) σ12 = 0, σ1 6= 0 and σ2 = 0; φ(t) = exp(− 21 σ12 t21 ). (X1 , X2 ) = (Z, 0) where Z ∼ N (0, σ12 ).
d
(iii) σ12 = 0, σ1 = 0 and σ2 = 0; φ(t) = 1; (X1 , X2 ) = (0, 0).
(iv) σ12 6= 0 and ρ = σ12 /σ1 σ2 = +1; φ(t) = E[ei(t1 X1 +t2 X2 ) ] = exp − 21 (σ1 t1 + σ2 t2 )2 . Hence if Z = σ2 X1 − σ1 X2 then
 
a.e.
setting t1 = tσ2 and t2 = tσ1 gives φZ (t) = E[ei(tσ2 X1 −tσ1 X2 ) ] = E[exp(− 21 t2 × 0)] = 1. Hence σ2 X1 = σ1 X2 .
a.e.
(v) σ12 6= 0 and ρ = σ12 /σ1 σ2 = −1; σ2 X1 = −σ1 X2 .
d d
(b) (i) (X1 , X2 ) = (µ1 , Z) where Z ∼ N (µ2 , σ22 ). (ii) (X1 , X2 ) = (Z, µ2 ) where Z ∼ N (µ1 , σ12 ).
d a.e. a.e.
(iii) (X1 , X2 ) = (µ1 , µ2 ). (iv) σ2 (X1 − µ1 ) = σ1 (X2 − µ2 ). (v) σ2 (X1 − µ1 ) = −σ1 (X2 − µ2 ). [←EX]
41.17 (a) Now (X, Y ) is bivariate normal with E[X] = E[Y ] = 0 and variance matrix
 
a21 + a22 a1 b1 + a2 b2
a1 b1 + a2 b2 b21 + b22
2 2
Hence E[Y |X] = ρσ2 X/σ1 = X(a1 b1 + a2 b2 )/(a1 + a2 ).
(b) By simple algebra
(a2 T1 − a1 T2 )(a2 b1 − a1 b2 )
Y − E[Y |X] =
a21 + a22
2 (a2 T1 − a1 T2 )2 (a2 b1 − a1 b2 )2
Y − E[Y |X] =
(a21 + a22 )2
and E (a2 T1 − a1 T2 )2 = E(a22 T12 − 2a1 a2 T1 T2 + a21 T22 ) = a21 + a22 . Hence result. [←EX]
41.18 (a) cov[X + Y, X − Y ] = var[X] + cov[Y, X] − cov[X, Y ] − var[Y ] = var[X] − var[Y ] = 0. Hence result.
(b) Clearly ρ = 0 implies (X, Y ) are independent and hence X 2 and Y 2 are independent. Conversely, suppose X 2 and
Y 2 are independent. Recall the characteristic function is
   
1 T 1 2 2

φX (t) = exp − t Σt = exp − t1 + 2ρt1 t2 + t2
2 2
Now E[X 2 Y 2 ] is the coefficient of t21 t22 /4 in the expansion of φ(t). Hence E[X 2 Y 2 ] is the coefficient of t21 t22 /4 in the
expansion of
1 2  1 2 2
1− t1 + 2ρt1 t2 + t22 + t1 + 2ρt1 t2 + t22
2 8
Hence E[X 2 Y 2 ] is the coefficient of t21 t22 /4 in the expansion of
1 2 2
t1 + 2ρt1 t2 + t22
8
Hence E[X 2 Y 2 ] = 2ρ2 + 1. Independence of X 2 and Y 2 implies E[X 2 ]E[Y 2 ] = 1. Hence ρ = 0.
This could also be obtained by differentiating the characteristic function—but this is tedious. Here are some of the steps:
 
1
φ(t) = exp − f (t1 , t2 ) where f (t1 , t2 ) = t21 + 2ρt1 t2 + t22
2
 
∂φ(t) 1 ∂f 1
=− exp − f (t1 , t2 )
∂t1 2 ∂t1 2
" 2 #
2 2
  
∂ φ(t) 1 ∂ f 1 ∂f 1
= − + exp − f (t1 , t2 )
∂t21 2 ∂t21 4 ∂t1 2
∂ 2 φ(t)
 
1
= g(t1 , t2 ) exp − f (t1 , t2 ) where g(t1 , t2 ) = t21 + ρ2 t22 − 1 + 2ρt1 t2
∂t21 2
" 2 #
∂ 4 φ(t) ∂2g 1 ∂2f
  
∂f ∂g 1 ∂f 1
= − g(t ,
1 2 t ) − + g(t ,
1 2t ) exp − f (t ,
1 2t )
∂t21 ∂t22 ∂t22 2 ∂t22 ∂t2 ∂t2 4 ∂t2 2
Setting t1 = t2 = 0 gives 2ρ2 − 21 (−1)(2) = 2ρ2 + 1 as above. [←EX]
41.19 (a) The absolute value of the Jacobian of the transformation is

∂(x, y)
∂(r, θ) = r

Now
 2
x − 2ρxy + y 2

1
f(X,Y ) (x, y) = exp − for x ∈ R and y ∈ R.
2σ 2 (1 − ρ2 )
p
2πσ 2 1 − ρ2
 2
r ( 1 − ρ sin(2θ) )

r
f(R,Θ) (r, θ) = exp − for 0 < θ < 2π and r > 0.
2σ 2 (1 − ρ2 )
p
2πσ 2 1 − ρ2
Z ∞ p
2 ( 1 − ρ sin(2θ) ) 1 − ρ2
 
1
fΘ (θ) = r exp −r dr = for 0 < θ < 2π.
2σ 2 (1 − ρ2 ) 2π(1 − ρ sin(2θ) )
p
2πσ 2 1 − ρ2 r=0
If ρ = 0 then
r2
 
r
f(R,Θ) (r, θ) = exp − 2 for 0 < θ < 2π and r > 0.
2πσ 2 2σ
and hence R and Θ are independent. Note that Θ ∼ uniform(0, 2π).
(b) We can assume σ 2 = 1 because it just cancels from the numerator and denominator. From part (a) we have
 2
r r
f(R,Θ) (r, θ) = exp − for 0 < θ < 2π and r > 0.
2π 2
So R and Θ are independent. Now T1 = R cos(2Θ) and T2 = R sin(2Θ). For this transformation, 2 values of (R, Θ) lead
∂(t1 ,t2 )
to the same value of (T1 , T2 ). The absolute value of the Jacobian of the transformation is ∂(r,θ) = 2r. Hence
 2 2
1 t +t
f(T1 ,T2 ) (t1 , t2 ) = exp − 1 2 for t1 ∈ R and t2 ∈ R.
2π 2
Hence result. [←EX]
41.20 Let V1 = X1 /σ1 and V2 = X2 /σ2 . Then (V1 , V2 ) has a bivariate normal distribution with E[V1 ] = E[V2 ] = 0 and
var[V1 ] = var[V2 ] = 1. Then " #
1 v12 − 2ρv1 v2 + v22
fV1 V 2 (v1 , v2 ) = exp −
2(1 − ρ2 )
p
2π 1 − ρ2
Consider the transformation (W, V ) with W = V1 and V = V1 /V2 . The range is effectively R2 —apart from a set of
measure 0. The Jacobian is
∂(w, v) 1 0 |v1 |
∂(v1 , v2 ) 1/v2 −v1 /v 2 = v 2
=
2 2
and hence
v 2 fV V (v1 , v2 )
fW V (w, v) = 2 1 2 where v1 = w and v2 = w/v
|v1 |
Hence
|w| w2 (v 2 − 2ρv + 1)
 
fW V (w, v) = exp −
2(1 − ρ2 )v 2
p
2πv 2 1 − ρ2
We now integrate out w to find the density of V :
Z ∞ Z ∞
w2 (v 2 − 2ρv + 1)
 
w 1
w exp −αw2 dw
 
fV (v) = exp − dw =
− 2 )v 2
p p
0 πv 1 − ρ
2 2 2(1 ρ πv 1 − ρ 0
2 2

1 v 2 − 2ρv + 1
= where α =
2(1 − ρ2 )v 2
p
2πv 2 α 1 − ρ2
p
1 − ρ2
=
π(v − 2ρv + 1)
2

Now Z = X1 /X2 = V (σ1 /σ2 ) which is a 1 − 1 transformation. Hence fZ (z) = σ2 fV (v)/σ1 and the result follows.
(b)(i) fZ (z) = 1/π(1 + z 2 ) for z ∈ R. This is the Cauchy(0, 1) distribution.
(b)(ii) We know that X − Y and X + Y are i.i.d. random variables with the N (0, 2σ 2 ) distribution. Hence W ∼
Cauchy(0, 1). [←EX]
41.21 The idea is to linearly transform (X1 , X2 ) to (V1 , V2 ) so that V1 and V2 are i.i.d. N (0, 1). In general, Σ = AAT and if we
set V = A−1 X then var[V]
 = A−1 Σ(A−1 )T = I and so V1 and V2 are independent.
a b 1 p p  1 p p  p
Suppose A = with a = 1 + ρ + 1 − ρ and b = 1 + ρ − 1 − ρ . Then a2 − b2 = 1 − ρ2 ,
b a 2 2
a2 + b2 = 1 and 2ab =ρ. Hence AAT = Σ.
1 a −b aX1 − bX2 −bX1 + aX2
Also A−1 = 2 . So let V1 = and V2 = .
a −b 2 −b a a −b
2 2 a2 − b2
Then E[V1 ] = E[V2 ] = 0, var[V1 ] = var[V2 ] = 1 and cov(V1 , V2 ) = 0. As (V1 , V2 ) is bivariate normal, this implies that
V1 and V2 are i.i.d. N (0, 1). Hence V12 + V22 ∼ χ22 . But
X 2 + X22 − 2ρX1 X2
V12 + V22 = 1
1 − ρ2
as required
or use proposition(40.7b) to get
X2 − ρX1 (X2 − ρX1 )2
and X1 are independent N (0, 1) and hence + X12 ∼ χ22 [←EX]
− 2
p
1−ρ 2 1 ρ
41.22 Recall  2
2ρxy y 2
 
1 1 x
f (x, y) = exp − − +
2(1 − ρ2 ) σ12 σ1 σ2 σ22
p
2πσ1 σ2 1 − ρ2

By using the transformation v = x/σ1 and w = y/σ2 which has ∂(v,w) ∂(x,y) = σ1 σ2 , we get

Z ∞Z ∞ Z ∞Z ∞
(x2 − 2ρxy + y 2 )
 
1
P[X ≥ 0, Y ≥ 0] = f (x, y) dxdy = exp − dxdy (41.22a)
2(1 − ρ2 )
p
x=0 y=0 x=0 y=0 2π 1 − ρ2
Z 0 Z ∞ Z 0 Z ∞
(x2 − 2ρxy + y 2 )
 
1
P[X ≤ 0, Y ≥ 0] = f (x, y) dxdy = exp − dxdy (41.22b)
2(1 − ρ2 )
p
x=−∞ y=0 x=−∞ y=0 2π 1 − ρ2

Now use polar coordinates: x = r cos θ and y = r sin θ. Hence tan θ = y/x and r2 = x2 + y 2 . Also ∂(x,y) ∂(r,θ) = r. Hence

Z π/2 Z ∞  2
r (1 − ρ sin 2θ)

1
P[X ≥ 0, Y ≥ 0] = r exp − drdθ
2(1 − ρ2 )
p
2π 1 − ρ2 θ=0 r=0
Z π/2 Z ∞
1 1 − ρ sin 2θ
= r exp[−αr2 ] drdθ where α =
2(1 − ρ2 )
p
2π 1 − ρ θ=0 r=0
2
Z π/2 ∞ Z π/2
1 exp[−αr2 ] 1 1
= dθ = dθ
−2α
p p
2π 1 − ρ θ=0
2
r=0 2π 1 − ρ θ=0
2 2α
1 − ρ2 π/2
p

Z
=
2π θ=0 1 − ρ sin 2θ
Similarly, where the transformation θ → θ − π/2 is used in the last equality
1 − ρ2 π 1 − ρ2 π/2
p p
dθ dθ
Z Z
P[X ≤ 0, Y ≥ 0] = =
2π θ=π/2 1 − ρ sin 2θ 2π θ=0 1 + ρ sin 2θ
dt
We now use the transformation t = tan θ. Hence dθ = sec2 θ = 1 + t2 . Also sin 2θ = 2t/(1 + t2 ), cos 2θ = (1 − t2 )/(1 + t2 )
2
and tan 2θ = 2t/(1 − t ). Hence
P[X ≥ 0, Y ≥ 0]
1 − ρ2 ∞ 1 − ρ2 ∞ 1 − ρ2 ∞
p p p
dt dt dt
Z Z Z
= 2 − 2ρt + 1
= =
2π t=0 t 2π t=0 (t − ρ)2 + (1 − ρ2 ) 2π t=−ρ t 2 + (1 − ρ2 )

p  ∞ 
1 − ρ2  t dx x
Z
1 1
= tan−1 p  by using the standard result = tan−1 + c

x2 + a2 a
p
2π a

1 − ρ2 1 − ρ2 t=−ρ
" #
1 π ρ 1 1
= + tan−1 p = + sin−1 ρ as required.
2π 2 1−ρ 2 4 2π
The transformation (X, Y ) → (−X, −Y ) shows that P[X ≥ 0, Y ≥ 0] = P[X ≤ 0, Y ≤ 0]. [←EX]
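A quick simulation check of P[X ≥ 0, Y ≥ 0] = 1/4 + (1/2π) sin⁻¹ρ for a standardized bivariate normal pair with correlation ρ (an added sketch, not part of the solution):

    import numpy as np

    rng = np.random.default_rng(7)
    rho = 0.6
    xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=2_000_000)
    print(np.mean((xy[:, 0] >= 0) & (xy[:, 1] >= 0)),
          0.25 + np.arcsin(rho) / (2 * np.pi))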
41.23 Let a = 1 + x2 .
∞ ∞ √ 2
π √ e−(1+x )
Z Z
C exp −a(1 + y 2 ) dy = C exp[−a] 2
   
fX (x) = exp −ay dy = C exp[−a] √ =C π √
y=−∞ y=−∞ a 1 + x2
Hence √
f(X,Y ) (x, y) 1 + x2
exp −(1 + x2 )y 2
 
fY |X (y|x) = = √
fX (x) π
 
2 1
and this is the density of the N 0, σ = 2(1+x2 ) distribution. [←EX]

Chapter 3 Section 43 on page 139 (exs-multivnormal.tex)

(a) The characteristic function of X is φX (t) = exp it µ − 21 tT Σt . Hence the characteristic function of Y = X − µ is
T

43.1
φY (t) = E exp(itT Y) = E exp(itT X) exp(−itT µ) = exp − 21 tT Σt . Hence Y ∼ N (0, Σ).
    

(b) E[exp(itT X)] = E[exp(it1 X1 )] · · · E[exp(itn Xn )] = exp(it1 µ1 − 21 σ 2 t21 ) · · · exp(itn µn − 21 σ 2 t2n ) = exp(itT µ −
1 2 T
2 σ t It) as required. Pn Pn
(c) Σ = diag[d1 , . . . , dn ]. Hence φX (t) = exp i i=1 ti di − 12 i=1 t2i di = exp(it1 µ1 − 12 t21 d1 ) · · · exp(itn µn − 12 t2n dn )

which means that X1 , . . . , Xn are independent with distributionsN (µ1 , d1 ), . . . , N (µn , dn ) respectively.
(d) φZ (t) = φX (t)φY (t) = exp itT (µX + µY ) − 21 tT (ΣX + ΣY )t [←EX]
43.2 The second term is an Z odd function of x3 ; hence Z
1 − 1 (x21 +x22 ) 1 1 2 1 − 1 (x21 +x22 )
f(X1 ,X2 ) (x1 , x2 ) = f(X1 ,X2 ,X3 ) (x1 , x2 , x3 ) dx3 = e 2 1/2
e− 2 x3 dx3 = e 2
x3 (2π) (2π) x3 (2π)
Hence X1 and X2 are independent. Similarly for the pair X1 and X3 and the pair X2 and X3 . Because X1 , X2 and X3
all have the N (0, 1) distribution, it is clear that
f(X1 ,X2 ,X3 ) (x1 , x2 , x3 ) 6= fX1 (x1 )fX2 (x2 )fX3 (x3 ) [←EX]
43.3 (a) The vectors (X1 , X3 ) and X2 are independent iff the 2 × 1 matrix cov[(X1 , X3 ), X2 ] = 0 (by property of the multi-
variate normal). But cov[X1 , X2 ] = 0 and cov[X3 , X2 ] = 0. Hence result. (b) cov[X1 − X3 , X1 − 3X2 + X3 ] =
var[X1 ] − 3cov[X1 , X2 ] + cov[X1 , X3 ] − cov[X3 , X1 ] + 3cov[X3 , X2 ] − var[X3 ] = 4 − 0 − 1 + 1 + 0 − 2 = 2. Hence
not independent. (c) cov[X1 + X3 , X1 − 2X2 − 3X3 ] = 4 − 0 + 3 − 1 + 0 − 6 = 0. Hence independent. [←EX]
43.4 (a) Let Z = (X1, X3). Using §42.8 on page 132 gives the distribution of Z is N(0, Σ1) where
    Σ1 = [  2  −1 ]
         [ −1   5 ]
Hence for Z = (Z1, Z2) we have σ1² = 2, σ2² = 5 and ρ = −1/√10. Using §40.5 on page 125 gives the distribution of (Z2|Z1 = 1) is N(−1/2, 9/2).
(b) First use proposition (42.6a) on page 132. Now the vector Z = (X2, X1 + X3) is Z = BX where
    B = [ 0  1  0 ]     Hence Z ∼ N(0, BΣB^T) where BΣB^T = [ 3  1 ]
        [ 1  0  1 ]                                          [ 1  5 ]
Hence, for Z = (Z1, Z2) we have σ1² = 3, σ2² = 5 and ρ = 1/√15. Using §40.5 on page 125 gives the distribution of (Z1|Z2 = 1) is N(1/5, 14/5). [←EX]
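The numbers quoted in part (b) can be reproduced from the usual bivariate conditional formulas E[Z1|Z2 = z2] = µ1 + (Σ12/Σ22)(z2 − µ2) and var[Z1|Z2] = Σ11 − Σ12²/Σ22. A minimal sketch (illustrative only, not part of the original solution; assumes numpy):

```python
# Conditional distribution of Z1 given Z2 = 1 when Z ~ N(0, [[3,1],[1,5]]).
import numpy as np

Sigma = np.array([[3.0, 1.0], [1.0, 5.0]])
mu = np.zeros(2)
z2 = 1.0
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (z2 - mu[1])   # = 1/5
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]        # = 14/5
print(cond_mean, cond_var)   # 0.2  2.8
```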

43.5 (a) Suppose x ∈ Rm with aT ax = 0; then xT aT ax = 0. Hence (ax)T (ax) = 0; hence ax = 0. Hence x1 α1 +· · ·+xm αm = 0
where α1 , . . . , αm are the columns of a. But rank(a) = m. Hence x = 0. Hence rank(aT a) = m and so aT a is invertible.
(b) First note that if G is any symmetric non-singular matrix, then GG−1 = I; hence (G−1 )T G = I; hence G−1 = (G−1 )T .
Clearly E[Y] = Bµ and var[Y] = σ 2 BBT = σ 2 (aT a)−1 aT a(aT a)−1 = σ 2 (aT a)−1 as required. [←EX]
43.6 By proposition (42.9a) on page 134, we know there exist a0, a1, a2, a3 and a4 in R such that E[Y|X1, X2, X3, X4] = a0 + a1X1 + a2X2 + a3X3 + a4X4. Taking expectations gives a0 = 1. Also, taking expectations of
    E[Y X1|X1, X2, X3, X4] = X1 E[Y|X1, X2, X3, X4] = X1 + a1X1² + a2X1X2 + a3X1X3 + a4X1X4
gives 1/2 = a1 + 1/2(a2+a3+a4). Similarly 1/2 = a2 + 1/2(a1+a3+a4), 1/2 = a3 + 1/2(a1+a2+a4), and 1/2 = a4 + 1/2(a1+a2+a3).
Subtracting shows that a1 = a2 = a3 = a4; indeed this was obvious from the symmetry in Σ. Combining this result with 1/2 = a1 + 1/2(a2+a3+a4) gives a1 = a2 = a3 = a4 = 1/5. [←EX]
43.7 (a) The matrix of regression coefficients is
    A Σ_Z^{−1} = [ ρ12σ1σ2   ρ13σ1σ3 ] [ σ2²       ρ23σ2σ3 ]^{−1}          (dimensions 1×2 and 2×2)
                                        [ ρ23σ2σ3   σ3²     ]
              = [ ρ12σ1σ2   ρ13σ1σ3 ] ( 1/(σ2²σ3²(1−ρ23²)) ) [  σ3²       −ρ23σ2σ3 ]
                                                              [ −ρ23σ2σ3    σ2²     ]
              = ( σ1/(σ2σ3(1−ρ23²)) ) [ σ3(ρ12 − ρ13ρ23)   σ2(ρ13 − ρ12ρ23) ]
Using the general result that E[Y|Z] = µY + AΣ_Z^{−1}(Z − µZ) and var[Y|Z] = ΣY − AΣ_Z^{−1}A^T gives
    E[X1|(X2, X3)] = µ1 + ( σ1/(1−ρ23²) ) [ (ρ12 − ρ13ρ23)(X2 − µ2)/σ2 + (ρ13 − ρ12ρ23)(X3 − µ3)/σ3 ]
    var[X1|(X2, X3)] = σ1²( 1 − (ρ12² − 2ρ12ρ23ρ13 + ρ13²)/(1 − ρ23²) )
(b) The matrix of regression coefficients is
    A Σ_Z^{−1} = [ ρ13σ1σ3 ] (1/σ3²)          (dimensions 2×1 and 1×1)
                 [ ρ23σ2σ3 ]
Using the same general result gives
    E[(X1, X2)|X3] = [ µ1 ] + (1/σ3²) [ ρ13σ1σ3(X3 − µ3) ]
                     [ µ2 ]           [ ρ23σ2σ3(X3 − µ3) ]
    var[(X1, X2)|X3] = [ σ1²       ρ12σ1σ2 ] − (1/σ3²) [ ρ13²σ1²σ3²       ρ13ρ23σ1σ2σ3² ]
                       [ ρ12σ1σ2   σ2²     ]            [ ρ13ρ23σ1σ2σ3²   ρ23²σ2²σ3²    ]
                     = [ (1 − ρ13²)σ1²          (ρ12 − ρ13ρ23)σ1σ2 ]
                       [ (ρ12 − ρ13ρ23)σ1σ2     (1 − ρ23²)σ2²      ]  [←EX]
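As a quick numerical cross-check (illustrative only, not from the text; the parameter values below are arbitrary), the closed forms in part (a) can be compared with the generic partitioned-covariance formulas:

```python
# Compare the closed-form E[X1|X2,X3] and var[X1|X2,X3] with the generic
# A * Sigma_Z^{-1} (Schur complement) computation.
import numpy as np

s1, s2, s3 = 1.5, 2.0, 0.7              # assumed standard deviations
r12, r13, r23 = 0.3, -0.4, 0.5          # assumed correlations
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[s1*s1,     r12*s1*s2, r13*s1*s3],
                  [r12*s1*s2, s2*s2,     r23*s2*s3],
                  [r13*s1*s3, r23*s2*s3, s3*s3]])
x23 = np.array([0.4, -1.1])             # conditioning values for (X2, X3)

A = Sigma[0, 1:]                        # cov[X1, (X2, X3)]
coef = A @ np.linalg.inv(Sigma[1:, 1:])
mean_generic = mu[0] + coef @ (x23 - mu[1:])
var_generic = Sigma[0, 0] - coef @ A

mean_closed = mu[0] + s1 / (1 - r23**2) * (
    (r12 - r13*r23) * (x23[0] - mu[1]) / s2
    + (r13 - r12*r23) * (x23[1] - mu[2]) / s3)
var_closed = s1*s1 * (1 - (r12**2 - 2*r12*r23*r13 + r13**2) / (1 - r23**2))
print(mean_generic, mean_closed)   # should agree
print(var_generic, var_closed)     # should agree
```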
43.8 By equation (42.12a), (n−1)S²/σ² ∼ χ²_{n−1}. Using the formula for the variance of a χ²_n distribution (see §11.9) gives
    var[ (n−1)S²/σ² ] = 2(n−1)
and hence var[S²] = 2σ⁴/(n−1). [←EX]
43.9 By equation (42.9b) on page 134 we have the conditional distribution of Xj given X(j):
    N( µj + σ12 Σ22^{−1}(x^{(j)} − µ^{(j)}),  σj² − σ12 Σ22^{−1} σ12^T )
where σ12 is a 1 × (n−1) vector of entries from Σ. Now Σ22 is positive definite; hence σ12 Σ22^{−1} σ12^T ≥ 0. Hence σj² − σ12 Σ22^{−1} σ12^T ≤ σj² as required. [←EX]
43.10 Let 1 denote the n × 1-dimensional vector with every entry equal to 1.
(a) Now c = cov[Yj, Y1 + ··· + Yn] is the sum of the jth row of Σ. Adding over all n rows gives nc = 1^TΣ1 = var[1^TY] = var[Y1 + ··· + Yn] ≥ 0.
(b) Suppose there exists c ∈ R such that cov[Yj, Y1 + ··· + Yn] = c for all j = 1, 2, . . . , n. Then Σ1 = c1 and hence 1 is an eigenvector of Σ. The converse is similar.
(c) We are given that cov[Yj, Y1 + ··· + Yn] = 0 for j = 1, 2, . . . , n. Consider the random vector Z = (Y1, . . . , Yn, Y1 + ··· + Yn). Because every linear combination ℓ^TZ of the components of Z has a univariate normal distribution, it follows that Z has a multivariate normal distribution. Also cov[Y, Y1 + ··· + Yn] = 0. Hence Y = (Y1, . . . , Yn) is independent of Y1 + ··· + Yn. But Y1 + ··· + Yn is a function of Y = (Y1, . . . , Yn); hence Y1 + ··· + Yn is almost surely constant³. Hence (X1 ··· Xn)^{1/n} is almost surely constant. [←EX]

³ By definition, the random variables X and Y are independent iff the generated σ-fields σ(X) and σ(Y) are independent. If Y = f(X), then σ(Y) ⊆ σ(X) and hence for every A ∈ σ(Y) we have A is independent of itself and so P(A) = 0 or P(A) = 1. Hence the distribution function of Y satisfies FY(y) = 0 or 1 for every y ∈ R and hence Y is almost surely constant.

43.11 We have
    ∑_{k=1}^n (x_k − µ)² = ∑_{k=1}^n (x_k − x̄)² + n(x̄ − µ)² = (x_1 − x̄)² + ∑_{k=2}^n y_k² + n y_1²
Now (x_1 − x̄) + ∑_{k=2}^n y_k = ∑_{k=1}^n (x_k − x̄) = 0 and hence (x_1 − x̄) = −∑_{k=2}^n y_k.
So we have ∑_{k=1}^n (x_k − µ)² = n y_1² + ∑_{k=2}^n y_k² + ( ∑_{k=2}^n y_k )².
Hence the density of (Y1, . . . , Yn) is
    f(y_1, . . . , y_n) = ( n/((2π)^{n/2}σ^n) ) exp( −n y_1²/(2σ²) ) exp( −(1/(2σ²))[ ∑_{k=2}^n y_k² + ( ∑_{k=2}^n y_k )² ] ) [←EX]

43.12 (a) Clearly v1·v1 = v2·v2 = ··· = vn·vn. Also v1·vk = 0 for k = 2, . . . , n. Also v2·vk = 0 for k = 3, . . . , n. And so on. Hence result.
(b) The matrix is
    [ a1/√(a1²+a2²+a3²)   a2/√(a1²+a2²+a3²)   a3/√(a1²+a2²+a3²) ]
    [ a1/Q2               −a1²/(a2Q2)          0                 ]
    [ a1/Q3                a2/Q3              −(a1²+a2²)/(a3Q3)  ]
where Q2 = √( a1²(a1²+a2²) )/a2 and Q3 = √( (a1²+a2²)(a1²+a2²+a3²) )/a3.
(c) The first row is
    α1 = (a1, a2, . . . , an) / √( ∑_{j=1}^n a_j² )
and for k = 2, . . . , n, the kth row is
    αk = ( a1, . . . , a_{k−1}, −( ∑_{j=1}^{k−1} a_j² )/a_k, 0, . . . , 0 ) / Q   where Q = √( ( ∑_{j=1}^{k−1} a_j² )( ∑_{j=1}^{k} a_j² ) ) / a_k  [←EX]
43.13 Let Y = CX where C is orthogonal and has first row equal to α = (a1, . . . , an)/√( ∑_{j=1}^n a_j² ). Because C is orthogonal we have Y^TY = X^TX; hence Y1² + ··· + Yn² = X1² + ··· + Xn² ∼ χ²_n. Also Y1, Y2, . . . , Yn are i.i.d. random variables with the N(0, 1) distribution. Now Y1 = (a1X1 + a2X2 + ··· + anXn)/√( ∑_{j=1}^n a_j² ). Hence we want the conditional distribution of Y1² + Y2² + ··· + Yn² given Y1 = 0. Clearly this has the χ²_{n−1} distribution. [←EX]
43.14 We need to find the conditional density f_{Xn|(Xn−1,...,X2,X1)}(xn|(xn−1, . . . , x2, x1)). This is the density of
    N( µY + AΣ_Z^{−1}(z − µZ),  ΣY − AΣ_Z^{−1}A^T )   where Y = {Xn} and Z = (Xn−1, . . . , X1).
The matrix of regression coefficients (dimensions 1×(n−1) and (n−1)×(n−1)) is
    A Σ_Z^{−1} = σ²[ ρ  ρ²  ···  ρ^{n−1} ] × (1/σ²) [ 1        ρ        ρ²       ···   ρ^{n−2} ]^{−1}
                                                     [ ρ        1        ρ        ···   ρ^{n−3} ]
                                                     [ ⋮        ⋮        ⋮        ⋱     ⋮       ]
                                                     [ ρ^{n−2}  ρ^{n−3}  ρ^{n−4}  ···   1       ]
              = ( 1/(1−ρ²) ) [ ρ  ρ²  ···  ρ^{n−1} ] [  1    −ρ     0      0    ···    0      0     0 ]
                                                      [ −ρ    1+ρ²  −ρ     0    ···    0      0     0 ]
                                                      [  0    −ρ    1+ρ²   −ρ   ···    0      0     0 ]
                                                      [  ⋮     ⋮     ⋮      ⋮    ⋱      ⋮      ⋮     ⋮ ]
                                                      [  0     0     0      0   ···   −ρ    1+ρ²   −ρ ]
                                                      [  0     0     0      0   ···    0     −ρ     1 ]
              = [ ρ  0  0  ···  0  0 ]
Thus the distribution of Xn given (Xn−1, . . . , X1) is N( µn + ρ(Xn−1 − µn−1), σ²(1 − ρ²) ), proving the Markov property. [←EX]
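The collapse of the regression-coefficient row vector to (ρ, 0, . . . , 0) is easy to verify numerically for a small case. A minimal sketch (illustrative only, not from the text; assumes numpy and arbitrary values of ρ, σ² and the dimension):

```python
# For Sigma_Z[i,j] = sigma^2 * rho^|i-j|, check that A @ Sigma_Z^{-1} = (rho, 0, ..., 0),
# where A = cov[X_n, (X_{n-1}, ..., X_1)] = sigma^2 * (rho, rho^2, ..., rho^{n-1}).
import numpy as np

rho, sigma2, m = 0.7, 2.5, 6           # m = n - 1 conditioning variables
idx = np.arange(m)
Sigma_Z = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])
A = sigma2 * rho ** np.arange(1, m + 1)
print(np.round(A @ np.linalg.inv(Sigma_Z), 10))   # [0.7 0.  0.  0.  0.  0. ]
```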
43.15 Now
    g_ℓ(x^{(ℓ)}) = ∫_{−∞}^{∞} fX(x) dx_ℓ = I1 + I2
where
    I1 = ( 1/(2π)^{n/2} ) ∫_{−∞}^{∞} exp( −½ ∑_{j=1}^n x_j² ) dx_ℓ = ( 1/(2π)^{(n−1)/2} ) exp( −½ ∑_{j=1, j≠ℓ}^n x_j² )
and
    I2 = ( 1/(2π)^{n/2} ) ∫_{−∞}^{∞} ( ∏_{k=1}^n x_k e^{−½x_k²} ) exp( −½ ∑_{j=1}^n x_j² ) dx_ℓ
       = ( 1/(2π)^{n/2} ) ( ∏_{k=1, k≠ℓ}^n x_k e^{−½x_k²} ) exp( −½ ∑_{j=1, j≠ℓ}^n x_j² ) ∫_{−∞}^{∞} x_ℓ e^{−x_ℓ²} dx_ℓ
Now because the integrand is an odd function of x_ℓ we have
    ∫_{−∞}^{∞} x_ℓ e^{−x_ℓ²} dx_ℓ = 0   and hence   I2 = 0.
Hence the density of the (n−1)-dimensional random vector X^{(ℓ)} is the density of (n−1) independent N(0, 1) distributions. [←EX]
43.16 Use characteristic functions. Suppose t is the 1 × (m+n) vector in R^{m+n} and t = [u, v] where u ∈ R^m and v ∈ R^n. Then
    φ(t) = E[e^{itZ}] = E[e^{iuX+ivY}] = E[e^{iuX}] E[e^{ivY}]
         = e^{iuµX − ½uΣXu^T} e^{ivµY − ½vΣYv^T}
         = e^{i(uµX + vµY) − ½(uΣXu^T + vΣYv^T)} = e^{itµ − ½tΣt^T}
as required. [←EX]

Chapter 3 Section 45 on page 156 (exs-quadraticForms.tex)

45.1 Using equation (38.8a) on page 119 gives
    cov[ AX, X^TBX ] = E[ (AX − Aµ)(X^TBX − E[X^TBX]) ]
                     = E[ (AX − Aµ)(X^TBX − trace(BΣ) − µ^TBµ) ]
                     = E[ (AX − Aµ)(X^TBX − µ^TBµ) ]   by using E[AX] = Aµ.
Now let Y = X − µ; hence Y ∼ N(0, Σ). Then
    cov[ AX, X^TBX ] = E[ AY(Y^TBY + Y^TBµ + µ^TBY) ]
                     = E[ AY(Y^TBY + 2Y^TBµ) ] = E[ AYY^TBY ] + 2E[ AYY^TBµ ]
Now 2E[ AYY^TBµ ] = 2AΣBµ.
Also YY^TBY = Y(Y^TBY) = Y ∑_{j=1}^n ∑_{k=1}^n b_{jk}Y_jY_k. The ith component of this vector is ∑_{j=1}^n ∑_{k=1}^n b_{jk}Y_iY_jY_k. Now E[Y_iY_jY_k] = 0 because the integrand is an odd function on R^n. Indeed, all odd central moments of a multivariate normal equal 0. Hence result. [←EX]
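The conclusion cov[AX, X^TBX] = 2AΣBµ can be checked by simulation. A minimal Monte Carlo sketch (not part of the original solution; assumes numpy, with arbitrary µ, Σ, A and symmetric B):

```python
# Monte Carlo check of cov[AX, X'BX] = 2 * A @ Sigma @ B @ mu for X ~ N(mu, Sigma).
import numpy as np

rng = np.random.default_rng(1)
n = 3
mu = np.array([1.0, -0.5, 2.0])
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)                 # positive definite covariance
A = rng.normal(size=(2, n))                     # any 2 x n matrix
B = rng.normal(size=(n, n)); B = (B + B.T) / 2  # symmetric

X = rng.multivariate_normal(mu, Sigma, size=400_000)
lin = X @ A.T                                   # rows: (AX)'
quad = np.einsum('ij,jk,ik->i', X, B, X)        # rows: X'BX
mc_cov = (lin * (quad - quad.mean())[:, None]).mean(axis=0)
print(mc_cov)                    # Monte Carlo estimate
print(2 * A @ Sigma @ B @ mu)    # theoretical value (agreement to MC error)
```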
45.2 If A is idempotent then (I − A)(I − A) = I − A. Hence A is idempotent iff I − A is idempotent.
Let B = I − A; then A + B = I. By proposition(44.19c) on page 153, we have A and B are idempotent iff rank(A) +
rank(B) = n. Hence result. [←EX]
45.3 ⇐ Now (A + B)2 = A2 + AB + BA + B2 = A + B. Hence result.
⇒ A + B = (A + B)2 = A + AB + BA + B; hence AB + BA = 0. Premultiplying and postmultiplying by A gives
AB + ABA = 0 and ABA + BA = 0. Hence AB = BA and hence AB = BA = 0 as required. [←EX]
45.4 Let A = P1 , A1 = P2 and A2 = P1 − P2 . Then A = A1 + A2 , A and A1 are idempotent and A2 is non-negative definite.
Hence A1 A2 = 0, A2 A1 = 0 and A2 is idempotent. The result follows by proposition(44.18a) on page 151. [←EX]
45.5 (a)
    E[e^{sX²+tX}] = (1/√(2π)) ∫_{−∞}^{∞} e^{sx²+tx} e^{−½x²} dx
The exponent is, for s < 1/2,
    −½[ x²(1 − 2s) − 2tx ] = −½( x√(1−2s) − t/√(1−2s) )² + t²/(2(1−2s))
Hence
    E[e^{sX²+tX}] = exp( t²/(2(1−2s)) ) (1/√(2π)) ∫_{−∞}^{∞} exp( −½( x√(1−2s) − t/√(1−2s) )² ) dx
                 = exp( t²/(2(1−2s)) ) ( 1/√(2π(1−2s)) ) ∫_{−∞}^{∞} exp( −½( v − t/√(1−2s) )² ) dv
                 = exp( t²/(2(1−2s)) ) / √(1−2s)
(b) [ In + F(Ir − G^TF)^{−1}G^T ][ In − FG^T ] = In − FG^T + F(Ir − G^TF)^{−1}G^T − F(Ir − G^TF)^{−1}G^TFG^T = In − FG^T + F(Ir − G^TF)^{−1}(Ir − G^TF)G^T = In [←EX]


45.6 Now X12 + · · · + Xn2 = XT IX. Hence by Craig’s theorem (proposition(44.7b) on page 144), AI = 0 and hence A = 0.
[←EX]

45.7 Now Q1 = X^TA1X and Q2 = X^TA2X where
    A1 = [ 0    1/2  0    0   ]          A2 = [ a1²    a1a2   a1a3   a1a4 ]
         [ 1/2  0    0    0   ]   and         [ a1a2   a2²    a2a3   a2a4 ]
         [ 0    0    0    1/2 ]               [ a1a3   a2a3   a3²    a3a4 ]
         [ 0    0    1/2  0   ]               [ a1a4   a2a4   a3a4   a4²  ]
Craig's theorem (proposition (44.7b) on page 144) implies A1A2 = 0 and hence a1 = a2 = a3 = a4 = 0 and A2 = 0. [←EX]
45.8 Just as for equation (44.4a) on page 143,
    E[ e^{t1X^TA1X + t2X^TA2X + ··· + tkX^TAkX} ] = 1/| I − 2t1σ²A1 − 2t2σ²A2 − ··· − 2tkσ²Ak |^{1/2}
By Craig's theorem (proposition (44.7b) on page 144), we know that AiAj = 0. Hence, by induction on k we have
    ∏_{j=1}^k | I − 2tjσ²Aj | = | I − 2t1σ²A1 − 2t2σ²A2 − ··· − 2tkσ²Ak |
and hence
    E[ e^{t1Q1 + ··· + tkQk} ] = E[ e^{t1X^TA1X + ··· + tkX^TAkX} ] = ∏_{j=1}^k E[ e^{tjX^TAjX} ] = ∏_{j=1}^k E[ e^{tjQj} ] [←EX]
45.9 ⇒ By proposition(44.8a) on page 145, aB = 0. Hence aT aB = 0. Hence Craig’s theorem (proposition(44.7b) on
page 144) implies Q1 and Q2 are independent.
⇐ Craig’s theorem (proposition(44.7b) on page 144) implies aT aB = 0. Hence (aaT )aB = aaT aB = 0. But aaT =
a21 + · · · + a2n is a non-zero scalar; hence aB = 0; hence result. [←EX]
45.10 (a) Proof by contradiction. Suppose a_{ij} = a_{ji} ≠ 0 where i ≠ j. Consider the n × 1 vector x with x_i ≠ 0, x_j ≠ 0 and x_k = 0 otherwise. Then
    x^TAx = x_i²a_{ii} + 2x_ix_ja_{ij} = x_i( x_ia_{ii} + 2x_ja_{ij} )
If a_{ii} = 0, take x_i = 1 and x_j = −a_{ij}. Then x^TAx = −2a_{ij}² < 0.
If a_{ii} > 0, take x_i = a_{ij}²/a_{ii} and x_j = −a_{ij}. Then x^TAx = x_i(a_{ij}² − 2a_{ij}²) < 0.
In both cases we have a contradiction and hence the result is proved.
(b) ⇐ By Craig's theorem (proposition (44.7b) on page 144) we have AA1 = 0 and AA2 = 0. Hence A(A1 + A2) = 0. Using Craig's theorem again implies Q is independent of Q1 + Q2.
⇒ By Craig's theorem (proposition (44.7b) on page 144) we have (A1 + A2)A = 0. Because A1 + A2 is a real symmetric matrix, we know there exists orthogonal L such that L^T(A1 + A2)L = diag(α1, . . . , αr, 0, . . . , 0) where r = rank(A1 + A2). Hence L^T(A1 + A2)LL^TAL = L^T(A1 + A2)AL = 0. Using the fact that L^T(A1 + A2)L = diag(α1, . . . , αr, 0, . . . , 0), we must have
    L^TAL = [ 0   0  ]    and using symmetry gives    L^TAL = [ 0   0  ]    where B2 is an (n−r) × (n−r) matrix.
            [ B1  B2 ]                                         [ 0   B2 ]
Now L^TA1L and L^TA2L are both non-negative definite matrices and hence all the diagonal elements are non-negative. Because L^T(A1 + A2)L = diag(α1, . . . , αr, 0, . . . , 0), the last n−r diagonal elements of L^TA1L and L^TA2L must be zero. Then use part (a); hence
    L^TA1L = [ C1  0 ]    and    L^TA2L = [ C2  0 ]    where C1 and C2 are both r × r matrices.
             [ 0   0 ]                    [ 0   0 ]
Hence L^TA1LL^TAL = 0 which implies A1A = 0. Similarly A2A = 0. The result follows by Craig's theorem. [←EX]
45.11 Take C = AQ and Q^TBQ.
P^TAX = P^TAQPY = P^TCPY = diag[λ1, . . . , λn]Y and X^TBX = Y^T diag[µ1, . . . , µn]Y = ∑_{j=1}^n µ_jY_j². [←EX]
45.12 We are given XT AX and XT BX are independent. Also XT AX/σ 2 ∼ χ2r where r = rank(A) and XT BX ∼ χ2s where
s = rank(B). Hence XT (A + B)X/σ 2 ∼ χ2r+s . Exercise 45.18 implies A + B is idempotent. Hence A + B = (A + B)2 =
A2 + AB + BA + B2 = A + AB + BA + B. Hence AB + BA = 0. Post-multiplying by B gives AB + BAB = 0. Then
pre-multiply by B; this gives BAB = 0. Hence AB = 0 as required. √ [←EX]
45.13 Let k = aT b. Now k = 0 implies a = 0 and hence result is trivial. So suppose k > 0 and let a1 = a/ k and let C be the
n × n matrix a1 aT1 . Clearly aT1 a1 = 1 and C is symmetric and idempotent.
We are given aT X and XT BX are independent. Hence aT1 X and XT BX are independent; hence (aT1 X)T (aT1 X) = XT CX
and XT BX are independent. By exercise 45.12 we must have BC = 0; hence BaaT = 0. Let g denote the n × 1 vector
Ba. Then gaT = 0 and hence gi aj = 0 for all i and j; hence gi (a21 + · · · + a2n ) = gi k = 0 and hence gi = 0. Hence Ba = 0
as required. [←EX]
45.14 Let C denote the 2 × 2 matrix
    C = [  1/2   −1/2 ]
        [ −1/2    1/2 ]
Then C is symmetric and idempotent. Also X^TCX = ½(X1 − X2)².
Suppose b is a 2 × 1 vector in R². It follows by exercise 45.13 that b^TX and (X1 − X2)² are independent iff b^TC = 0, which occurs iff b1 = b2. Hence the linear functions are a(X1 + X2) for a ∈ R. [←EX]

45.15 (a) Now AT A is an n × n matrix and rank(AT A) = rank(A) = n. (b) Now AAT is an m × m matrix and rank(AAT ) =
rank(A) = m. (c) rank(BA) = rank((BA)T BA) = rank(AT BT BA) ≤ rank(BT BA) ≤ rank(BA). Now B has full column
rank; hence BT B is non-singular; hence rank(BT BA) = rank(A). (d) Now rank(AB) = rank(BT AT ) = rank(AT ) because
BT has full column rank. Hence result. [←EX]
45.16 (a) ⇒ (ΣA)³ = ΣAΣAΣA = ΣAΣA = (ΣA)². ⇐ ΣAΣAΣAΣAΣ = ΣAΣAΣAΣ = ΣAΣAΣ. Now Σ is non-negative definite; hence there exists an n × r matrix B with Σ = BB^T and rank(Σ) = rank(B) = r. Hence ΣAΣABB^TAΣAΣ − ΣABB^TAΣ = 0. Hence (ΣAΣAB − ΣAB)(ΣAΣAB − ΣAB)^T = 0; hence ΣAΣAB = ΣAB. Postmultiply by B^T to get the result.
(b) Using the general result that rank(A) = rank(A^T) gives the first equality. Now rank(AΣ) = rank(ABB^T) = rank(AB) by part (c) of exercise 45.15 because B^T has full row rank. Hence rank(AΣ) = rank( (AB)(AB)^T ) = rank(AΣA).
(c) Using part (c) of exercise 45.15 and the general result that rank(A) = rank(AA^T) gives rank(ΣAΣ) = rank(ΣABB^T) = rank(ΣAB) = rank( (ΣAB)(B^TAΣ) ) = rank(ΣAΣAΣ). Using the general result that rank(AB) ≤ rank(A) gives rank(ΣAΣ) ≥ rank(ΣAΣA) = rank(AΣAΣ) ≥ rank(ΣAΣAΣ). But we have established that the left hand side equals the right hand side; hence the required result.
(d) ⇒ Now AΣAΣ = AΣ. Premultiply by Σ^{1/2} and postmultiply by Σ^{−1/2}; this gives Σ^{1/2}AΣAΣ^{1/2} = Σ^{1/2}AΣ^{1/2} as required. ⇐ Now Σ^{1/2}AΣAΣ^{1/2} = Σ^{1/2}AΣ^{1/2}. Premultiply by Σ^{−1/2} and postmultiply by Σ^{1/2}; this gives AΣAΣ = AΣ as required. [←EX]
45.17 By example (38.8d), we know that
    (n−1)S²/σ² = X^TAX/σ²   where A = In − 1/n.
It is easy to check that A is idempotent. Also rank(A) = trace(A) = n−1. Hence by proposition (44.13a) on page 147, X^TAX/σ² has a non-central χ²_{n−1} distribution with non-centrality parameter µ^TAµ/σ² where µ = µ1_{n×1}. But µ^TAµ = nµ² − nµ² = 0. Hence result. [←EX]
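A two-line check of the matrix facts used above (illustrative only, not from the text). Here A = In − 1/n is read, as in exercise 45.28, with 1 denoting the n × n matrix of ones:

```python
# Check that A = I_n - (1/n) * ones-matrix is idempotent with trace n - 1.
import numpy as np

n = 7
A = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(A @ A, A))   # True: A is idempotent
print(np.trace(A))             # 6.0 = n - 1
```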
45.18 We are given X^TCX/σ² ∼ χ²_r. Because C is a real symmetric matrix, there exists orthogonal L such that C = L^TDL where D is diagonal with entries d1, d2, . . . , dn, which are the eigenvalues of C. Let Y = LX/σ; then Y ∼ N(0, I) and X^TCX/σ² = X^TL^TDLX/σ² = Y^TDY = ∑_{j=1}^n d_jY_j². The characteristic function of this random variable is
    φ(t) = E[ e^{it ∑_j d_jY_j²} ] = ∏_{j=1}^n E[ e^{itd_jY_j²} ] = ∏_{j=1}^n 1/(1 − 2itd_j)^{1/2}   because Y_j² ∼ χ²_1.
But we are given X^TCX/σ² ∼ χ²_r; hence φ(t) = 1/(1 − 2it)^{r/2}. Hence for all t ∈ R we have
    (1 − 2it)^r = ∏_{j=1}^n (1 − 2itd_j)
The left hand side is a polynomial in t with r roots all equal to 1/(2i); hence so is the right hand side. Hence r of {d1, . . . , dn} must equal 1 and the rest equal 0. Hence C is idempotent with rank(C) = r. [←EX]
45.19 (a) H² = H and hence (In − H)² = In − H − H + H² = In − H. Because H is idempotent, rank(H) = trace(H) = trace( x(x^Tx)^{−1}x^T ) = trace( x^Tx(x^Tx)^{−1} ) = trace(Ip) = p and rank(In − H) = trace(In − H) = n − p.
(b) By equation (38.8a) we have E[Y^THY] = trace(Hσ²In) + µ^THµ = pσ² + b^Tx^THxb = pσ² + (xb)^T(xb). Similarly
E[Y^T(In − H)Y] = trace( (In − H)σ²In ) + µ^T(In − H)µ = σ²(n − p) + 0 = σ²(n − p).
By proposition (44.13a), Y^THY/σ² ∼ χ²_{p,λ} where λ = µ^THµ/σ² = (xb)^T(xb)/σ² and Y^T(In − H)Y/σ² ∼ χ²_{n−p}.
(c) By proposition (44.7a), Y^THY and Y^T(In − H)Y are independent. By definition (21.6a), the answer is the non-central F_{p,n−p} distribution with non-centrality parameter λ = (xb)^T(xb)/σ². [←EX]
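The hat-matrix facts in part (a) are quick to verify numerically. A minimal sketch (illustrative only, not from the text; assumes numpy and a random full-column-rank design matrix):

```python
# Check that H = x (x'x)^{-1} x' is idempotent with trace(H) = p and trace(I - H) = n - p.
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 3
x = rng.normal(size=(n, p))                    # full column rank with probability 1
H = x @ np.linalg.inv(x.T @ x) @ x.T
print(np.allclose(H @ H, H))                   # True
print(np.trace(H), np.trace(np.eye(n) - H))    # approximately 3.0 and 9.0
```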
45.20 (a) Differentiating BA = I gives
    (dB/dx)A + B(dA/dx) = 0   and hence   (dB/dx)A = −B(dA/dx),   which gives the first result.
Differentiating the first result gives
    d²B/dx² = −(dB/dx)(dA/dx)B − B(d²A/dx²)B − B(dA/dx)(dB/dx) = −B(d²A/dx²)B + 2B(dA/dx)B(dA/dx)B
(b) The left hand side is ( ∑_i λ_i )² = ∑_i λ_i² + 2∑_{i<j} λ_iλ_j. Because the eigenvalues of A² are {λ_1², . . . , λ_n²}, the right hand side is trace(A²) + 2∑_{i<j} λ_iλ_j = ∑_i λ_i² + 2∑_{i<j} λ_iλ_j. Hence result. [←EX]

45.21 (a) Now g(t) = (1 − 2tλ1) ··· (1 − 2tλr). Hence g(0) = 1 and
    ln g(t) = ∑_{j=1}^r ln(1 − 2tλ_j)   and hence   g′(t)/g(t) = ∑_{j=1}^r −2λ_j/(1 − 2tλ_j)   leading to   g′(0) = −2∑_{j=1}^r λ_j = −2 trace(AΣ)
Differentiating again gives
    ( g(t)g″(t) − [g′(t)]² )/[g(t)]² = −∑_{j=1}^r 4λ_j²/(1 − 2tλ_j)²
Setting t = 0 gives
    g″(0) − [g′(0)]² = −∑_{j=1}^r 4λ_j²   and hence   g″(0) = 4[trace(AΣ)]² − 4∑_{j=1}^r λ_j² = 8∑_{i<j} λ_iλ_j
(b) Let Y = X^TAX and H(t) = I − 2tAΣ. Then
    H(0) = I,   [H(t)]^{−1}|_{t=0} = I,   dH(t)/dt = −2AΣ   and   d²H(t)/dt² = 0
Use equation (44.3c):
    K_Y(t) = ln M_Y(t) = −½ ln[g(t)] − ½µ^TΣ^{−1}µ + ½µ^T[H(t)]^{−1}Σ^{−1}µ
Hence
    K_Y′(t) = −½ g′(t)/g(t) − ½ µ^T[H(t)]^{−1}(−2AΣ)[H(t)]^{−1}Σ^{−1}µ
leading to K_Y′(0) = trace(AΣ) + µ^TAµ. This result was shown in equation (38.8a) on page 119.
    K_Y″(t) = −½ ( g(t)g″(t) − [g′(t)]² )/[g(t)]² + ½ µ^T ( d²[H(t)]^{−1}/dt² ) Σ^{−1}µ
            = −½ ( g(t)g″(t) − [g′(t)]² )/[g(t)]² + ½ µ^T[ −H^{−1}(d²H/dt²)H^{−1} + 2H^{−1}(dH/dt)H^{−1}(dH/dt)H^{−1} ]Σ^{−1}µ
    K_Y″(0) = −4∑_{i<j} λ_iλ_j + 2[trace(AΣ)]² + 4µ^T(AΣ)²Σ^{−1}µ
            = −4∑_{i<j} λ_iλ_j + 2[trace(AΣ)]² + 4µ^TAΣAµ
The result follows by part (b) of exercise 45.20 on page 157. [←EX]


45.22 (a)
    −½ ln[ |I − 2tAΣ| ] = −½ ln[ ∏_{j=1}^n (1 − 2tλ_j) ] = −½ ∑_{j=1}^n ln(1 − 2tλ_j) = −½ ∑_{j=1}^n [ −∑_{r=1}^∞ (2tλ_j)^r/r ]
                        = ½ ∑_{j=1}^n ∑_{r=1}^∞ (2tλ_j)^r/r = ½ ∑_{r=1}^∞ ( (2t)^r/r ) ∑_{j=1}^n λ_j^r = ∑_{r=1}^∞ (t^r/r) 2^{r−1} trace[ (AΣ)^r ]
(b) I − (I − 2tAΣ)^{−1} = −∑_{r=1}^∞ 2^r t^r (AΣ)^r   and   [ I − (I − 2tAΣ)^{−1} ]Σ^{−1} = −∑_{r=1}^∞ 2^r t^r (AΣ)^{r−1}A
(c)
    K_{X^TAX}(t) = ∑_{r=1}^∞ κ_r t^r/r! = −½ ln[ |I − 2tAΣ| ] − ½µ^TΣ^{−1}µ + ½µ^T(Σ − 2tΣAΣ)^{−1}µ
                 = −½ ln[ |I − 2tAΣ| ] − ½µ^T[ I − (I − 2tAΣ)^{−1} ]Σ^{−1}µ
Hence κ_r = 2^{r−1}(r−1)! trace[ (AΣ)^r ] + µ^T 2^{r−1} r! (AΣ)^{r−1}Aµ [←EX]
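The first two cumulants implied by this formula, κ1 = trace(AΣ) + µ^TAµ and κ2 = 2 trace[(AΣ)²] + 4µ^TAΣAµ, can be checked against the sample mean and variance of X^TAX. A minimal Monte Carlo sketch (not part of the original solution; assumes numpy, with arbitrary µ, Σ and symmetric A):

```python
# Monte Carlo check of kappa_1 and kappa_2 for Q = X'AX with X ~ N(mu, Sigma).
import numpy as np

rng = np.random.default_rng(3)
n = 4
mu = np.array([0.5, -1.0, 2.0, 0.0])
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)
A = rng.normal(size=(n, n)); A = (A + A.T) / 2

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Q = np.einsum('ij,jk,ik->i', X, A, X)
k1 = np.trace(A @ Sigma) + mu @ A @ mu
k2 = 2 * np.trace(A @ Sigma @ A @ Sigma) + 4 * mu @ A @ Sigma @ A @ mu
print(Q.mean(), k1)    # should agree to within Monte Carlo error
print(Q.var(), k2)     # should agree to within Monte Carlo error
```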
45.23 Now T = V/√( W/(n−1) ) where V = (X̄ − µ0)/(σ/√n) and W = (n−1)S²/σ² ∼ χ²_{n−1}. Note that V and W are independent and
    V ∼ N( (µ − µ0)/(σ/√n), 1 )
Hence T has the non-central t_{n−1} distribution with non-centrality parameter (µ − µ0)/(σ/√n). [←EX]
45.24 Now X̄ ∼ N(µ, σ²/n); hence X̄√n/σ ∼ N( µ√n/σ, 1 ). Hence nX̄²/σ² has the non-central χ²_{1, nµ²/σ²} distribution. By example (44.17b) on page 151 we know that (n−1)S²/σ² ∼ χ²_{n−1} and X̄ is independent of S². Hence Y has the non-central F_{1,n−1} distribution with non-centrality parameter λ = nµ²/σ². [←EX]
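This can be confirmed by simulation. The sketch below (illustrative only, not from the text; it assumes numpy and scipy, and takes Y = nX̄²/S², the ratio implicit in the construction above) compares simulated values of Y with the non-central F(1, n−1) distribution:

```python
# Simulate Y = n * Xbar^2 / S^2 and compare with the non-central F(1, n-1)
# distribution with non-centrality lambda = n * mu^2 / sigma^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.2, 2.0, 10, 200_000
X = rng.normal(mu, sigma, size=(reps, n))
Y = n * X.mean(axis=1)**2 / X.var(axis=1, ddof=1)

lam = n * mu**2 / sigma**2
F = stats.ncf(dfn=1, dfd=n - 1, nc=lam)
print(np.median(Y), F.median())   # should be close
print(np.mean(Y < F.ppf(0.9)))    # should be close to 0.9
```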
45.25 Part (a) follows directly from exercise 2.12 on page 6. For part (b), let Yi = Xi − µ; so without loss of generality we can assume Xi ∼ N(0, σ²). Clearly Q1/σ², Q2/σ² and Q3/σ² all have the χ²_1 distribution; by equation (42.12a) on page 136 we know that ∑_{i=1}^4 (Xi − X̄)²/σ² ∼ χ²_3. Proposition (44.20a) on page 154 then shows Q1, Q2 and Q3 are independent. [←EX]
45.26 Now Xij − X̄·· = ( Xij − X̄i· − X̄·j + X̄·· ) + ( X̄i· − X̄·· ) + ( X̄·j − X̄·· ). Clearly ∑_{i=1}^m ( X̄i· − X̄·· ) = ∑_{j=1}^n ( X̄·j − X̄·· ) = 0. Hence ∑_{i=1}^m ∑_{j=1}^n ( X̄i· − X̄·· )( X̄·j − X̄·· ) = 0 and
    ∑_{i=1}^m ∑_{j=1}^n ( Xij − X̄i· − X̄·j + X̄·· )( X̄i· − X̄·· ) = ∑_{i=1}^m ∑_{j=1}^n ( Xij − X̄i· )( X̄i· − X̄·· ) = 0
because ∑_{j=1}^n ( Xij − X̄i· ) = 0. Hence part (a). Part (b) follows by proposition (44.20a) on page 154. [←EX]

45.27 By transforming to Yi = Xi − µ, without loss of generality we can assume µ = 0.
(a) Xi − X̄ = ( Xi − X̄^{(1)} ) − (1/n)( X1 − X̄^{(1)} ) and hence
    ∑_{i=1}^n ( Xi − X̄ )² = ∑_{i=1}^n ( Xi − X̄^{(1)} )² + (1/n)( X1 − X̄^{(1)} )² − (2/n)( X1 − X̄^{(1)} ) ∑_{i=1}^n ( Xi − X̄^{(1)} )
Now ∑_{i=1}^n ( Xi − X̄^{(1)} ) = X1 − X̄^{(1)} and ∑_{i=1}^n ( Xi − X̄^{(1)} )² = ∑_{i=2}^n ( Xi − X̄^{(1)} )² + ( X1 − X̄^{(1)} )². Hence
    ∑_{i=1}^n ( Xi − X̄ )² = ∑_{i=2}^n ( Xi − X̄^{(1)} )² + ( (n−1)/n )( X1 − X̄^{(1)} )²
as required.
(b) By equation (42.12a) on page 136, ∑_{i=1}^n ( Xi − X̄ )²/σ² ∼ χ²_{n−1} and ∑_{i=2}^n ( Xi − X̄^{(1)} )²/σ² ∼ χ²_{n−2}. Hence by proposition (44.20a) on page 154, the result follows. [←EX]
45.28 Note that Sylvester's theorem on page 416 of [HARVILLE(1997)] shows that det(In + SU) = det(Im + US). Hence for |ρ| < 1 we have det( In + ρ1/(1 − ρ) ) = det( In + (1, . . . , 1)^T(1, . . . , 1)ρ/(1 − ρ) ) = [1 + (n−1)ρ]/(1 − ρ). Hence Σ is non-singular for |ρ| < 1. By example (38.7d) on page 119 we know that ∑_{j=1}^n (Xj − X̄)² = X^T( In − 1/n )X. Hence Y = X^TAX where A = ( In − 1/n )/( σ²(1 − ρ) ) is symmetric and
    AΣ = ( In − 1/n )[ (1 − ρ)In + ρ1 ]/(1 − ρ) = [ (1 − ρ)In + ρ1 − (1 − ρ)1/n − ρ1 ]/(1 − ρ) = In − 1/n
Hence AΣ is idempotent. Also rank(AΣ) = rank(In − 1/n) = n − 1 because for an idempotent matrix, the rank equals the trace. Using theorem (44.13a) on page 147 shows X^TAX has the non-central χ²_{n−1} distribution with non-centrality parameter µ^TAµ = nµ² − nµ² = 0. Hence the answer is the χ²_{n−1} distribution. [←EX]

Chapter 3 Section 48 on page 163 (exs-t.tex)

48.1 Use the transformation x2² = (ν+1)(x − ρy)²/[ (1−ρ²)(ν + y²) ]. Then
    2π√(1−ρ²) fY(y) = ∫_{−∞}^{∞} [ 1 + (x² − 2ρxy + y²)/(ν(1−ρ²)) ]^{−(ν+2)/2} dx = ∫_{−∞}^{∞} [ ( (x − ρy)² + (y² + ν)(1−ρ²) )/( ν(1−ρ²) ) ]^{−(ν+2)/2} dx
                    = ∫_{−∞}^{∞} [ ( 1 + y²/ν )( 1 + x2²/(ν+1) ) ]^{−(ν+2)/2} √( (1−ρ²)(ν + y²)/(ν+1) ) dx2
and hence
    fY(y) = (1/(2π)) ( 1 + y²/ν )^{−(ν+2)/2} B( 1/2, (ν+1)/2 ) √(ν+1) √( (ν + y²)/(ν+1) )
          = ( B( 1/2, (ν+1)/2 )/(2π) ) √ν ( 1 + y²/ν )^{−(ν+1)/2}
and then use
    B( 1/2, (ν+1)/2 ) B( 1/2, ν/2 ) = 2π/ν [←EX]
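The conclusion—that the marginal of the bivariate t is univariate t_ν—can also be checked by direct numerical integration. The sketch below (illustrative only, not part of the original solution; it assumes numpy and scipy and takes the joint density as [1 + (x² − 2ρxy + y²)/(ν(1−ρ²))]^{−(ν+2)/2}/(2π√(1−ρ²)), consistent with the first display above):

```python
# Integrate the bivariate t density over x and compare with the t_nu density of y.
import numpy as np
from scipy import stats
from scipy.integrate import quad

nu, rho = 5.0, 0.4
c = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))

def joint(x, y):
    q = (x*x - 2*rho*x*y + y*y) / (nu * (1 - rho**2))
    return c * (1 + q) ** (-(nu + 2) / 2)

for y in (-1.5, 0.0, 2.0):
    marginal, _ = quad(joint, -np.inf, np.inf, args=(y,))
    print(marginal, stats.t.pdf(y, df=nu))   # the two columns should agree
```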
48.2 (a) Provided ν > 1, E[X] = 0. Provided ν > 2, var[X] = ν/(ν−2). (b) By the usual characterization of the t-distribution, E[XY] = ν E[Z1Z2] E[1/W] = ρν/(ν−2). Hence corr[X, Y] = ρ. [←EX]
48.3 Now
    f(Z1,...,Zp,W)(z1, . . . , zp, w) = ( 1/(2π)^{p/2} ) exp( −½ z^Tz ) × ( w^{ν/2−1}/( 2^{ν/2}Γ(ν/2) ) ) exp( −w/2 )
Use the transformation T = Z/(W/ν)^{1/2}; this has Jacobian ν^{p/2}/w^{p/2}. Also, Z^TZ = W T^TT/ν. Hence
    f(T1,...,Tp,W)(t1, . . . , tp, w) = ( w^{ν/2−1}/( 2^{(p+ν)/2}π^{p/2}Γ(ν/2) ) ) exp( −(w/(2ν)) t^Tt − w/2 ) ( w^{p/2}/ν^{p/2} )
                                     = ( w^{(p+ν)/2−1}/( 2^{(p+ν)/2}π^{p/2}ν^{p/2}Γ(ν/2) ) ) exp( −(w/2)( 1 + t^Tt/ν ) )
Integrating out w gives
    fT(t) = ( 1/( 2^{(p+ν)/2}π^{p/2}ν^{p/2}Γ(ν/2) ) ) ∫_0^∞ w^{(p+ν)/2−1} exp( −(w/2)( 1 + t^Tt/ν ) ) dw
Now let y = (1 + t^Tt/ν)w. This leads to
    fT(t) = ( 1/( 2^{(p+ν)/2}π^{p/2}ν^{p/2}Γ(ν/2)( 1 + t^Tt/ν )^{(p+ν)/2} ) ) ∫_0^∞ y^{(p+ν)/2−1} exp(−y/2) dy
          = ( ν^{ν/2}/( 2^{(p+ν)/2}π^{p/2}Γ(ν/2)( ν + t^Tt )^{(p+ν)/2} ) ) ∫_0^∞ y^{(p+ν)/2−1} exp(−y/2) dy
          = ( ν^{ν/2}/( 2^{(p+ν)/2}π^{p/2}Γ(ν/2)( ν + t^Tt )^{(p+ν)/2} ) ) × 2^{(p+ν)/2} Γ( (p+ν)/2 )
          = ν^{ν/2} Γ( (p+ν)/2 ) / ( π^{p/2}Γ(ν/2)( ν + t^Tt )^{(p+ν)/2} ) [←EX]

48.4 Now C = LLT and C is symmetric. Hence LT C = CLT ; hence C−1 LT = LT C−1 . Of course C−1 is also symmetric;
hence LC−1 = C−1 L.
Using (V − m)T = TT LT gives (V − m)T L = TT LT L = TT C and hence (V − m)T LC−1 = TT and T = C−1 LT (V − m).
Hence TT T = (V − m)T C−1 (V − m). [←EX]
48.5 First, note that
    (v − m1)^T C1^{−1} (v − m1) = (v − a − Am)^T C1^{−1} (v − a − Am) = (At − Am)^T C1^{−1} (At − Am)
                                = (t − m)^T A^T C1^{−1} A (t − m) = (t − m)^T C^{−1} (t − m)
Then use
    fV(v) ∝ 1/[ ν + (t − m)^T C^{−1} (t − m) ]^{(ν+p)/2} [←EX]
48.6 This follows from proposition(47.2a) on page 161. [←EX]

Chapter 3 Section 50 on page 165 (exs-dirichlet.tex)

50.1 Using the standard representation that Yj = Xj /(X1 + · · · + X7 ), we have


 
X1 W2 W3
(Y1 , Y2 + Y3 , Y4 + Y5 + Y6 ) = , ,
X1 + W2 + W3 + X7 X1 + W2 + W3 + X7 X1 + W2 + W3 + X7
where W2 = X2 + X3 ∼ gamma(k2 + k3 , α) and W3 = X4 + X5 + X6 ∼ gamma(k4 + k5 + k6 , α). Hence result. [←EX]
50.2 Using the standard representation that Yj = Xj /(X1 + · · · + X7 ), we have
X1 + X2 + X3 + X4 W
V = =
X1 + · · · + X7 W +Z
where W = X1 + X2 + X3 + X4 ∼ gamma(k1 + k2 + k3 + k4 , α), Z = X5 + X6 + X7 ∼ gamma(k5 + k6 + k7 , α) and W
and Z are independent. Hence V ∼ beta(k1 + k2 + k3 + k4 , k5 + k6 + k7 ). [←EX]
50.3 Now the joint density of (Y1, Y2) is
    f(Y1,Y2)(y1, y2) = ( Γ(k1+k2+k3)/( Γ(k1)Γ(k2)Γ(k3) ) ) y1^{k1−1} y2^{k2−1} (1 − y1 − y2)^{k3−1}   for y1 > 0, y2 > 0, y1 + y2 < 1,
and the marginal density of Y2 is beta(k2, k1+k3):
    fY2(y2) = ( Γ(k1+k2+k3)/( Γ(k2)Γ(k1+k3) ) ) y2^{k2−1} (1 − y2)^{k1+k3−1}   for 0 < y2 < 1.
Hence the conditional density is
    fY1|Y2(y1|y2) = ( Γ(k1+k3)/( Γ(k1)Γ(k3) ) ) y1^{k1−1}(1 − y1 − y2)^{k3−1}/(1 − y2)^{k1+k3−1}   for 0 < y1 < 1 − y2
                  = ( Γ(k1+k3)/( Γ(k1)Γ(k3) ) ) ( y1/(1−y2) )^{k1−1} ( 1 − y1/(1−y2) )^{k3−1} ( 1/(1−y2) )
and this is the density of a random variable (1 − y2)Z where Z ∼ beta(k1, k3).
(b) E[Y1|Y2 = y2] = (1 − y2) E[Z] = (1 − y2) k1/(k1 + k3). [←EX]
50.4 (a) Now
    E[YiYj] = ( Γ(k)/( Γ(ki)Γ(kj)Γ(k − ki − kj) ) ) ∫∫ yi yj yi^{ki−1} yj^{kj−1} (1 − yi − yj)^{k−ki−kj−1} dyi dyj
            = ( Γ(k)/( Γ(ki)Γ(kj)Γ(k − ki − kj) ) ) ∫∫ yi^{ki} yj^{kj} (1 − yi − yj)^{k−ki−kj−1} dyi dyj
            = ( Γ(k)/( Γ(ki)Γ(kj)Γ(k − ki − kj) ) ) × ( Γ(ki+1)Γ(kj+1)Γ(k − ki − kj)/Γ(k+2) ) = ki kj/( (k+1)k )
(b) We have
    cov[Yi, Yj] = ki kj/( (k+1)k ) − ki kj/k² = −ki kj/( k²(k+1) )   and   corr[Yi, Yj] = −√( ki kj/( (k − ki)(k − kj) ) ) [←EX]
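Both moments can be checked by simulation with numpy's Dirichlet sampler. A minimal sketch (not part of the original solution; it takes k = k1 + ··· + kn+1, consistent with the Γ(k) normalising constant above, and uses arbitrary parameter values):

```python
# Monte Carlo check of E[Yi Yj] = ki*kj/((k+1)k) and the correlation formula.
import numpy as np

rng = np.random.default_rng(5)
kvec = np.array([2.0, 3.0, 1.5, 4.0])     # (k1, ..., k_{n+1}); k = sum
k = kvec.sum()
Y = rng.dirichlet(kvec, size=500_000)
i, j = 0, 1
print(np.mean(Y[:, i] * Y[:, j]), kvec[i] * kvec[j] / ((k + 1) * k))
print(np.corrcoef(Y[:, i], Y[:, j])[0, 1],
      -np.sqrt(kvec[i] * kvec[j] / ((k - kvec[i]) * (k - kvec[j]))))
```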
50.5 Using the density in equation (49.4a) gives
    E[Y1^{j1} ··· Yn^{jn}] = ( Γ(k1+···+kn+1)/( Γ(k1) ··· Γ(kn+1) ) ) ∫···∫_A y1^{k1+j1−1} ··· yn^{kn+jn−1} (1 − y1 − ··· − yn)^{kn+1−1} dy1 ··· dyn
where A = { (y1, . . . , yn) : y1 > 0, y2 > 0, . . . , yn > 0, y1 + y2 + ··· + yn < 1 }. Using the Dirichlet integral formula in equation (49.4b) gives
    E[Y1^{j1} ··· Yn^{jn}] = ( Γ(k1+···+kn+1)/Γ(k1+···+kn+1+j1+···+jn) ) ∏_{i=1}^n Γ(ki+ji)/Γ(ki)
For the special case j1 = ··· = jn = 1,
    E[Y1 ··· Yn] = ∏_{i=1}^n ki/( k1 + ··· + kn+1 + i − 1 ) [←EX]

50.6 (a) Adding gives
    1 + Z1 + ··· + Zn = 1/(1 − Y1 − ··· − Yn)   and hence   Yi = Zi(1 − Y1 − ··· − Yn) = Zi/(1 + Z1 + ··· + Zn)
(b) Let α = 1 + z1 + ··· + zn; then ∂(y1, . . . , yn)/∂(z1, . . . , zn) is the determinant
    | (α − z1)/α²    −z1/α²          ···    −z1/α²            −z1/α²       |
    | −z2/α²         (α − z2)/α²     ···    −z2/α²            −z2/α²       |
    |   ⋮               ⋮                      ⋮                 ⋮          |
    | −zn−1/α²       −zn−1/α²        ···    (α − zn−1)/α²     −zn−1/α²     |
    | −zn/α²         −zn/α²          ···    −zn/α²            (α − zn)/α²  |
Subtracting column 2 from column 1 turns column 1 into (1/α, −1/α, 0, . . . , 0)^T; doing the same for columns 2, . . . , n−1 and then adding rows 1 to n−1 to row n reduces the determinant to
    | 1/α     0      ···    0      −z1/α²    |
    | −1/α    1/α    ···    0      −z2/α²    |
    |  ⋮       ⋮             ⋮        ⋮       |
    | 0       0      ···    1/α    −zn−1/α²  |
    | 0       0      ···    0       1/α²     |
and hence
    ∂(y1, . . . , yn)/∂(z1, . . . , zn) = 1/α^{n+1} = 1/(1 + z1 + ··· + zn)^{n+1}
(c) Now
    f(Z1,...,Zn)(z1, . . . , zn) = f(Y1,...,Yn)(y1, . . . , yn) | ∂(y1, . . . , yn)/∂(z1, . . . , zn) |
                                = ( Γ(k1+···+kn+1)/( Γ(k1) ··· Γ(kn+1) ) ) y1^{k1−1} ··· yn^{kn−1}(1 − y1 − ··· − yn)^{kn+1−1}/(1 + z1 + ··· + zn)^{n+1}
                                = ( Γ(k1+···+kn+1)/( Γ(k1) ··· Γ(kn+1) ) ) z1^{k1−1} ··· zn^{kn−1}/(1 + z1 + ··· + zn)^{k1+···+kn+1}
as required.
(d) Using the standard representation Yj = Xj/(X1 + ··· + Xn+1) shows that Zi = Xi/Xn+1 where Xi ∼ gamma(ki, α), Xn+1 ∼ gamma(kn+1, α) and Xi and Xn+1 are independent. Hence, using the result for the moments of the gamma distribution given in equation (11.4b) on page 34, we get
    E[Zi] = E[Xi] E[1/Xn+1] = (ki/α)( α/(kn+1 − 1) ) = ki/(kn+1 − 1)   for kn+1 > 1.
    E[Zi²] = E[Xi²] E[1/Xn+1²] = ( ki(ki+1)/α² )( α²/( (kn+1 − 1)(kn+1 − 2) ) ) = ki(ki+1)/( (kn+1 − 1)(kn+1 − 2) )   for kn+1 > 2.
    var[Zi] = ki( (ki+1)(kn+1 − 1) − ki(kn+1 − 2) )/( (kn+1 − 1)²(kn+1 − 2) ) = ki(ki + kn+1 − 1)/( (kn+1 − 1)²(kn+1 − 2) )   for kn+1 > 2.
    E[ZiZj] = E[Xi]E[Xj]E[1/Xn+1²] = (ki/α)(kj/α)( α²/( (kn+1 − 1)(kn+1 − 2) ) ) = ki kj/( (kn+1 − 1)(kn+1 − 2) )
    cov[Zi, Zj] = ki kj/( (kn+1 − 1)(kn+1 − 2) ) − ki kj/(kn+1 − 1)² = ki kj/( (kn+1 − 1)²(kn+1 − 2) ) [←EX]
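The moments in part (d) are easy to confirm by simulating the gamma ratio directly. A minimal sketch (not part of the original solution; assumes numpy, with arbitrary parameter values satisfying kn+1 > 2):

```python
# Monte Carlo check of E[Zi] and var[Zi] for Zi = Xi / X_{n+1},
# Xi ~ gamma(ki, alpha) and X_{n+1} ~ gamma(k_{n+1}, alpha) independent.
import numpy as np

rng = np.random.default_rng(6)
ki, kn1, alpha, reps = 2.5, 6.0, 1.7, 1_000_000
Xi = rng.gamma(shape=ki, scale=1 / alpha, size=reps)
Xn1 = rng.gamma(shape=kn1, scale=1 / alpha, size=reps)
Z = Xi / Xn1
print(Z.mean(), ki / (kn1 - 1))
print(Z.var(), ki * (ki + kn1 - 1) / ((kn1 - 1)**2 * (kn1 - 2)))
```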
REFERENCES
[A BRAMOWITZ &S TEGUN (1965)] A BRAMOWITZ , M. & S TEGUN , I. (1965). Handbook of mathematical functions.
Dover. 0-486-61272-4.
Cited on page 38 in exercise 12.14; page 88 in §29.3; page 88 in §29.3; page 88 in §29.3; page 89 in §29.4; page 90 in §30.3; page 90 in §30.3; page 90
in §30.3; page 110 in §36.6 and page 211 in answer to exercise 22.33.

[ACZ ÉL (1966)] ACZ ÉL , J. (1966). Lectures on functional equations and their applications. Academic Press. ISBN
0-12-043750-3.
Cited on page 29 in §9.3.

[A RNOLD et al. (2008)] A RNOLD , B. C., BALAKRISHNAN , N. & NAGARAJA , H. N. (2008). A first course in order
statistics. SIAM. ISBN 0-89871-648-9.
Cited on page 11 in proposition(3.7a) and page 11 in §3.7.

[A RNOLD &G ROENEVELD (1980)] A RNOLD , B. C. & G ROENEVELD , R. A. (1980). Some properties of the arcsine
distribution. Journal of the American Statistical Association, 75 173–175.
Cited on page 43 in §13.9.

[A SH (2000)] A SH , R. B. (2000). Real analysis and probability. 2nd ed. Academic Press. ISBN 0-12-065202-1.
Cited on page 168 in answer to exercise 2.11.

[B ILLINGSLEY (1995)] B ILLINGSLEY, P. (1995). Probability and measure. John Wiley. ISBN 0-471-00710-2.
Cited on page 28 in proposition(9.2a); page 29 in proposition(9.2b); page 53 in proposition(17.6a); page 54 in proposition(17.6b) and page 231 in answer
to exercise 37.7.

[B REIMAN (1968)] B REIMAN , L. (1968). Probability. Addison-Wesley. ISBN 0-20100646-4.


Cited on page 14 in proposition(4.3c).

[C ACOULLOS (1965)] C ACOULLOS , T. (1965). A relation between t and F distributions. Journal of the American
Statistical Association, 60 528–531. (Correction on page 1249.).
Cited on page 208 in answer to exercise 22.8.

[C ACOULLOS (1989)] C ACOULLOS , T. (1989). Exercises in probability. Springer. ISBN 1-4612-8863-0.


Cited on page 208 in answer to exercise 22.8.

[C RAWFORD (1966)] C RAWFORD , G. B. (1966). Characterization of geometric and exponential distributions. An-
nals of Mathematical Statistics, 37 1790–1795.
Cited on page 62 in exercise 20.26; page 79 in proposition(25.5a) and page 80 in proposition(25.6a).

[DALLAS (1976)] DALLAS , A. C. (1976). Characterizing the Pareto and power distributions. Annals of the Institute
of Statistical Mathematics, 28 491–497.
Cited on page 57 in §19.3; page 58 in §19.3; page 59 in §19.7; page 60 in §19.7; page 61 in exercise 20.8 and page 62 in exercise 20.23.

[DALY (1946)] DALY, J. J. (1946). On the use of the sample range in an analogue of student’s t-test. Annals of
Mathematical Statistics, 17 71–74.
Cited on page 137 in proposition(42.13a).

[DAVID &BARTON (1962)] DAVID , F. N. & BARTON , D. E. (1962). Combinatorial chance. Griffin. ISBN 0-85264-057-9.
Cited on page 24 in §7.8.

[D RISCOLL (1999)] D RISCOLL , M. F. (1999). An improved result relating quadratic forms and chi-square distri-
butions. The American Statistician, 53 273–275.
Cited on page 148 in §44.13.

[D RISCOLL &K RASNICKA (1995)] D RISCOLL , M. F. & K RASNICKA , B. (1995). An accessible proof of Craig’s
theorem in the general case. The American Statistician, 49 59–62.
Cited on page 146 in proposition(44.9a).

[D UMAIS (2000)] D UMAIS , M. F. (2000). The Craig-Sakamoto theorem. Master's thesis, Department of Mathematics and Statistics, McGill University. Downloadable from digitool.library.mcgill.ca/dtl publish/6/30372.html.
Cited on page 141 in §44.1.


[F ELLER (1968)] F ELLER , W. (1968). An introduction to probability theory and its applications. Volume I . 3rd
ed. John Wiley. ISBN 0-471-257087.
Cited on page 21 in §7.3; page 22 in §7.3 and page 29 in proposition(9.3a).

[F ELLER (1971)] F ELLER , W. (1971). An introduction to probability theory and its applications. Volume II . 2nd
ed. John Wiley. ISBN 0-471-257095.
Cited on page 12 in proposition(4.3a); page 13 in proposition(4.3a); page 13 in proposition(4.3b); page 15 in §5.1; page 24 in §7.8; page 26 in exer-
cise 8.10; page 26 in exercise 8.12; page 35 in §11.7; page 47 in §15.1 and page 211 in answer to exercise 22.32.

[F RISTEDT &G RAY (1997)] F RISTEDT, B. & G RAY, L. G. (1997). A modern approach to probability theory.
Springer. ISBN 0-8176-3807-5.
Cited on page 11 in example (3.7b); page 15 in §5.1 and page 131 in proposition(42.4a).

[G ALAMBOS &KOTZ (1978)] G ALAMBOS , J. & KOTZ , S. (1978). Characterizations of probability distributions.
Springer. ISBN 3-540-08933-0.
Cited on page 16 in exercise 6.10 and page 29 in §9.4.

[G OVINDARAJULU (1963)] G OVINDARAJULU , Z. (1963). Relationships among moments of order statistics in sam-
ples from two related populations. Technometrics, 5 514–518.
Cited on page 16 in exercise 6.9.

[G RIMMETT &S TIRZAKER (1992)] G RIMMETT, G. R. & S TIRZAKER , D. R. (1992). Probability and random pro-
cesses. 2nd ed. Oxford University Press. ISBN 0-19-853665-8.
Cited on page 208 in answer to exercise 22.7.

[H ARVILLE (1997)] H ARVILLE , D. A. (1997). Matrix algebra from a statistician’s perspective. Springer. ISBN
0-387-94978-X.
Cited on page 135 in proposition(42.9b); page 141 in theorem (44.2a); page 141 in theorem (44.2a); page 144 in theorem (44.7b); page 147 in §44.12;
page 149 in proposition(44.16a); page 149 in proposition(44.16a); page 152 in proposition(44.18a); page 156 in exercise 45.15; page 156 in exercise 45.15
and page 248 in answer to exercise 45.28.

[JAMES (1979)] JAMES , I. R. (1979). Characterization of a family of distributions by the independence of size and
shape variables. Annals of Statistics, 7 869–881.
Cited on page 77 in §25.1; page 81 in exercise 26.6 and page 81 in exercise 26.7.

[K AGAN et al. (1973)] K AGAN , A. M., L INNIK , Y. V. & R AO , C. R. (1973). Characterization problems in mathe-
matical statistics. John Wiley. ISBN 0-471-45421-4.
Cited on page 49 in proposition(15.8b).

[KOTZ et al. (2001)] KOTZ , S., KOZUBOWSKI , T. J. & P ODGORSKI , K. (2001). The Laplace distribution and gener-
alizations. https://www.researchgate.net/publication/258697410 The Laplace Distribution and Generalizations.
Cited on page 82 in §27.1.

[KOTZ &NADARAJAH (2000)] KOTZ , S. & NADARAJAH , S. (2000). Extreme value distributions. Imperial College
Press. ISBN 1-86094-224-5.
Cited on page 89 in §30.1.

[K UCZMA (2009)] K UCZMA , M. (2009). An introduction to the theory of functional equations and inequalities.
2nd ed. Birkhaüser. ISBN 978-3-7643-8748-8.
Cited on page 29 in §9.3.

[L UKACS (1955)] L UKACS , E. (1955). A characterization of the gamma distribution. Annals of Mathematical
Statistics, 26 319–324.
Cited on page 36 in proposition(11.8a).

[L UKACS (1970)] L UKACS , E. (1970). Characteristic functions. 2nd ed. Griffin. ISBN 0-85264-170-2.
Cited on page 168 in answer to exercise 2.11.

[M ARSAGLIA (1989)] M ARSAGLIA , G. (1989). The X+Y, X/Y characterization of the gamma distribution. In
G LESER , L., P ERLMAN , M. D., P RESS , S. J. & S AMPSON , A. R. (eds.), Contributions to probability and statistics.
Essays in honor of Ingram Olkin.. ISBN 1-4612-8200-4.
Cited on page 36 in proposition(11.8a).

[M ATHAI &P EDERZOLI (1977)] M ATHAI , A. M. & P EDERZOLI , G. (1977). Characterizations of the normal proba-
bility law. John Wiley. ISBN 0-85226-558-1.
Cited on page 49 in §15.8.

[M ATHAI &P ROVOST (1992)] M ATHAI , A. M. & P ROVOST, S. B. (1992). Quadratic forms in random variables.
Marcel Dekker. ISBN 0-8247-8691-2.
Cited on page 144 in theorem (44.7b) and page 149 in §44.14.

[M ORAN (2003)] M ORAN , P. A. P. (2003). An introduction to probability theory. 2nd ed. Oxford University
Press. ISBN 0-19-853242-3.
Cited on page 49 in proposition(15.8a).

[M OSSIMAN (1970)] M OSSIMAN , J. E. (1970). Size allometry: size and shape variables with characterizations of
the lognormal and generalized gamma distributions. Journal of the American Statistical Association, 65 930–
945.
Cited on page 77 in §25.1.

[N G et al. (2011)] N G , K. W., T IAN , G.-L. & TANG , M.-L. (2011). Dirichlet and related distributions. J. Wiley.
ISBN 0-470-68819-X.
Cited on page 166 in exercise 50.6.

[PATEL &R EAD (1996)] PATEL , J. K. & R EAD , C. B. (1996). Handbook of the normal distribution. 2nd ed. Dekker.
ISBN 0-82479-342-0.
Cited on page 49 in §15.8.

[P IERCE &DYKSTRA (1969)] P IERCE , D. A. & DYKSTRA , R. L. (1969). Independence and the normal distribution.
The American Statistician, 23 39.
Cited on page 140 in exercise 43.15.

[P ROVOST (1996)] P ROVOST, S. B. (1996). On Craig’s thereom and its generalizations. Journal of Statistical
Planning and Inference, 53 311–321.
Cited on page 145 in §44.7.

[R AO (1973)] R AO , C. R. (1973). Linear statistical inference and its applications. 2nd ed. John Wiley. ISBN
0-471-21875-8.
Cited on page 118 in §38.4.

[R EID &D RISCOLL (1988)] R EID , J. G. & D RISCOLL , M. F. (1988). An accessible proof of Craig’s theorem in the
noncentral case. The American Statistician, 42 139–142.
Cited on page 144 in theorem (44.7b).

[S AATY (1981)] S AATY, T. L. (1981). Modern nonlinear equations. Dover. ISBN 0-486-64232-1.
Cited on page 29 in §9.3.

[S CAROWSKY (1973)] S CAROWSKY, I. (1973). Quadratic forms in normal variables. Master’s thesis, Department
of Mathematics, McGill University. Downloadable from digitool.library.mcgill.ca/thesisfile50380.pdf.
Cited on page 141 in §44.1.

[S EARLE (1971)] S EARLE , S. R. (1971). Linear models. John Wiley. ISBN 0-471-76950-9.
Cited on page 146 in §44.8.

[S RIVASTAVA (1965)] S RIVASTAVA , M. S. (1965). A characterization of Pareto’s distribution and (k + 1)xk /θk+1 .
Annals of Mathematical Statistics, 36 361–362.
Cited on page 58 in §19.3 and page 60 in §19.7.

[WATSON (1959)] WATSON , G. (1959). A note on gamma functions. Edinburgh Mathematical Notes, 42 7–9.
Cited on page 111 in exercise 37.6.

[W HITWORTH (1901)] W HITWORTH , W. A. (1901). Choice and chance. 5th ed. George Bell. ISBN 1-332-81083-7.
Cited on page 24 in §7.8.

[W INKELBAUER (2014)] W INKELBAUER , A. (2014). Moments and absolute moments of the normal distribution.
arXiv:1209.4340v2.
Cited on page 47 in §15.4.

[Z HANG (2017)] Z HANG , J. (2017). On the independence of linear and quadratic forms in normal variates. Com-
munications in Statistics—Theory and Methods, 46 8493–8496.
Cited on page 146 in §44.8.
INDEX
The following distributions have entries in the index below:
Distributions with bounded support:
arcsine Irwin-Hall triangular uniform
beta power law Tukey’s lambda Upower
generalized arcsine semicircle
Distributions with support equal to the whole to R:
bilateral exponential hyperbolic secant logistic reverse Gumbel
Cauchy Laplace non-central t skew t
extreme value Linnik normal Student’s t
Gumbel
Distributions with support unbounded in one direction, either +∞ or −∞ but not both:
beta prime (beta0 ) Fisher’s z half-normal non-central χ2
chi (χ) folded normal inverse Gaussian non-central F
chi-squared (χ2 ) Fréchet Lévy Pareto
Erlang gamma log-logistic Rayleigh
exponential generalized gamma lognormal reverse Weibull
exponential logarithmic Gompertz Maxwell Weibull
F half-logistic
Multivariate distributions:
bivariate Cauchy bivariate t inverted Dirichlet multivariate t
bivariate normal Dirichlet multivariate normal

Symbols
=d (equal distributions) 4(§1.4);
κ[X] (kurtosis) 5(§1.5);
⇒D (convergence in distribution) 23(§7.5);
A
arcsine distribution 42–43(§13.8–§13.10), 45–46(ex14.15–ex14.22);
↔ beta 42(§13.8); ↔ gamma 46(ex14.21); ↔ normal 51(ex16.21);
↔ uniform 46(ex14.19), 46(ex14.20), 46(ex14.22); characterization 43(§13.9); kurtosis 45(ex14.18);
location-scale property 42(§13.9); skewness 45(ex14.18);
B
Bernoulli distribution
↔ uniform 22(§7.4); kurtosis 8(ex2.24); skewness 8(ex2.24);
beta distribution 40–41(§13.1–§13.5), 44(ex14.1–ex14.9);
↔ arcsine 42(§13.8); ↔ beta0 41(§13.6), 45(ex14.13); ↔ binomial 44(ex14.6);
↔ exponential 41(§13.4), 44(ex14.4); ↔ F 66(§21.8), 71(ex22.29), 71(ex22.30);
↔ gamma 38(ex12.7), 38(ex12.8), 41(§13.4); ↔ normal 51(ex16.21); ↔ Pareto 44(ex14.5);
↔ powerlaw 41(§13.4), 56(§19.1), 59(§19.5); ↔ semicircle 105(ex35.5); ↔ skewt 71(ex22.35);
↔ triangular 105(ex35.13); ↔ uniform 23(§7.5), 41(§13.4); kurtosis 44(ex14.3);
location-scale property 41(§13.5); skewness 44(ex14.3);
beta function 40(§13.1);
beta prime distribution (beta 0 ) 41–42(§13.6–§13.7), 45(ex14.10–ex14.14);
↔ beta 41(§13.6), 45(ex14.13); ↔ F 66(§21.8), 71(ex22.31); ↔ gamma 45(ex14.14);
↔ loglogistic 109(§36.4); kurtosis 45(ex14.12); skewness 45(ex14.12);
bilateral exponential distribution see Laplace distribution;
binomial distribution 44(ex14.7);
↔ beta 44(ex14.6);
bivariate Cauchy distribution 70(ex22.22);
bivariate normal distribution 123–127(§40.1–§40.8), 127–129(ex41.1–ex41.23);
see also multivariate normal distribution;
bivariate t distribution 159–160(§46.1–§46.4);
see also multivariate t distribution;
Box-Muller transformation 86(ex28.26);


C
Cauchy distribution 64–65(§21.4–§21.5), 69–70(ex22.9–ex22.22);
↔ normal 69(ex22.16), 69(ex22.19); ↔ t 64(§21.4); ↔ uniform 69(ex22.17);
infinitely divisible 65(§21.5); location-scale property 69(ex22.11); sample median 69(ex22.13);
stable distribution 69(ex22.14);
central limit theorem 47(§15.1), 53–54(§17.6);
central limit theorem, local see local central limit theorem;
central limit theorem, multiplicative see multiplicative central limit theorem;
chi distribution 107(§36.1), 110–111(ex37.1–ex37.7);
↔ folded 110(ex37.2); ↔ HalfNormal 110(ex37.2); ↔ Maxwell 107(§36.2), 110(ex37.2);
↔ normal 111(ex37.7); ↔ Rayleigh 110(ex37.2); kurtosis 111(ex37.4); skewness 111(ex37.4);
chi-squared distribution 36(§11.9), 39(ex12.20–ex12.21);
↔ exponential 36(§11.9); ↔ F 66(§21.8), 71(ex22.32); ↔ gamma 36(§11.9);
↔ Laplace 84(ex28.10); ↔ Maxwell 107(§36.2); ↔ normal 48(§15.5), 50(ex16.12);
↔ Rayleigh 86(ex28.27); ↔ skewt 71(ex22.35); ↔ uniform 39(ex12.21);
exponential family 39(ex12.20); infinitely divisible 36(§11.9); kurtosis 37(ex12.2);
skewness 37(ex12.2);
conditional
conditional expectation 3(§1.1), 7(ex2.13–ex2.18); conditional covariance 3(§1.1);
conditional independence 3(§1.2); conditional variance 3(§1.1); law of total covariance 3(§1.1);
law of total variance 3(§1.1);
correlation matrix 118(§38.6);
covariance matrix see random vectors;
Cramér’s theorem 49(§15.8);
D
Daly’s theorem 137(§42.13);
digamma function 38(ex12.14), 90(§30.3), 221(ans31.9);
Dirichlet distribution 163–165(§49.1–§49.5), 165–166(ex50.1–ex50.6);
of the second kind 165(ex50.6);
distribution of X and S 2 51(ex16.20), 136(§42.12);
Daly’s theorem 137(§42.13);
E
entropy
definition 25(ex8.6); Laplace distribution 82(§27.1), 84(ex28.7); normal distribution 50(ex16.11);
uniform distribution 25(ex8.6);
Erlang distribution 33(§11.1);
Euler’s reflection formula 43(§13.10), 89(§29.4), 113(ex37.19), 233(ex37.19);
exponential distribution 27–31(§9.1–§9.8), 31–33(ex10.1–ex10.24);
↔ beta 41(§13.4), 44(ex14.4); ↔ chi-squared 36(§11.9);
↔ expLog 112(ex37.15), 112(ex37.16); ↔ extremeIII 94(§30.10);
↔ F 71(ex22.28); ↔ ggamma 37(§11.10); ↔ gamma 28(§9.1), 31(§9.7), 35(§11.6);
↔ Gompertz 115(ex37.31); ↔ Gumbel 32(ex10.19), 90(§30.4), 97(ex31.15);
↔ Laplace 32(ex10.11), 83(ex28.1), 84(ex28.8), 84(ex28.9), 84(ex28.13);
↔ logistic 97(ex31.10); ↔ Pareto 62(ex20.16); ↔ powerlaw 60(ex20.3);
↔ Rayleigh 85(ex28.25), 86(ex28.27); ↔ uniform 23(§7.5), 25(ex8.3), 32(ex10.5), 32(ex10.6);
↔ Weibull 82(§27.3); characterization 16(ex6.10), 29(§9.3), 29(§9.4), 32(ex10.13);
distribution of difference 81(ex26.4); hazard function 4(§1.3), 31(ex10.2);
kurtosis 31(ex10.1); lack of memory 29(§9.3); limit of geometrics 28(§9.2);
Markov property 29(§9.3); order statistics 16(ex6.10), 29–30(§9.4–§9.6), 32(ex10.12–ex10.19), 81(ex26.5);
scale family property 28(§9.1); skewness 31(ex10.1);
exponential family of distributions
chi-squared distribution 39(ex12.20); gamma distribution 38(ex12.16);
inverse Gaussian distribution 103(ex33.13);
exponential logarithmic distribution 108(§36.3), 112(ex37.11–ex37.16);
↔ exponential 112(ex37.15), 112(ex37.16); ↔ uniform 112(ex37.14); hazard function 112(ex37.12);
scale family property 112(ex37.12);

extreme value distributions 89–96(§30.1–§30.14), 97–98(ex31.12–ex31.22);


general formulation 95–96(§30.13–§30.14); type I see Gumbel distribution;
type II see Fréchet distribution; type III see reverse Weibull distribution;

F
F distribution 65–66(§21.6–§21.8), 70–71(ex22.23–ex22.33);
↔ beta 66(§21.8), 71(ex22.29), 71(ex22.30); ↔ beta0 66(§21.8), 71(ex22.31);
↔ chi-squared 66(§21.8), 71(ex22.32); ↔ exponential 71(ex22.28); ↔ FisherZ 66(§21.9);
↔ gamma 66(§21.8), 71(ex22.28); ↔ Laplace 84(ex28.10); ↔ normal 70(ex22.27);
↔ skewt 71(ex22.35); ↔ t 66(§21.7), 71(ex22.33); kurtosis 70(ex22.26); skewness 70(ex22.26);
Fisher’s z distribution 66(§21.9);
↔ F 66(§21.9);
folded normal distribution 51–52(ex16.22–ex16.27), 110(ex37.2);
↔ chi 110(ex37.2); scale family property 52(ex16.27);
Fréchet distribution (type II extreme value) 92–93(§30.8–§30.9), 98(ex31.21);
↔ uniform 93(§30.9); ↔ Weibull 93(§30.9); order statistics 93(§30.9);

G
gamma distribution 33–37(§11.1–§11.11), 37–39(ex12.1–ex12.23), 78(§25.4);
↔ arcsine 46(ex14.21); ↔ beta 38(ex12.7), 38(ex12.8), 41(§13.4); ↔ beta0 45(ex14.14);
↔ chi-squared 36(§11.9); ↔ exponential 28(§9.1), 31(§9.7), 35(§11.6); ↔ F 66(§21.8), 71(ex22.28);
↔ ggamma 37(§11.10), 39(ex12.22); ↔ normal 50(ex16.12); ↔ Rayleigh 85(ex28.25), 86(ex28.27);
↔ uniform 23(§7.5); characterization(Lukacs) 36(§11.8); characterization (size and shape) 78(§25.4);
exponential-gamma mixture 38(ex12.12); exponential family 38(ex12.16);
infinitely divisible 15(§5.2), 38(ex12.6); kurtosis 37(ex12.2); limit of negative binomials 39(ex12.23);
normal approximation 35(§11.7); Poisson-gamma mixture 39(ex12.17); scale family property 34(§11.3);
skewness 37(ex12.2);
generalized arcsine distribution 43(§13.10);
generalized gamma distribution (ggamma) 37(§11.10), 39(ex12.22);
↔ exponential 37(§11.10); ↔ gamma 37(§11.10), 39(ex12.22); ↔ HalfNormal 37(§11.10);
↔ Rayleigh 37(§11.10); ↔ Weibull 37(§11.10); characterization (size and shape) 81(ex26.3);
geometric mean of a distribution 55(ex18.6);
geometric variance of a distribution 55(ex18.6);
geometric distribution 28(§9.2), 35(§11.6), 181(ans10.4);
Gini coefficient
definition 55(ex18.8); lognormal 55(ex18.8); Pareto 62(ex20.17);
Gompertz distribution 110(§36.6), 114–115(ex37.27–ex37.31);
↔ exponential 115(ex37.31); ↔ reverseGumbel 114(ex37.29); ↔ uniform 115(ex37.30);
hazard function 114(ex37.27); scale family property 110(§36.6);
Gumbel distribution (type I extreme value) 89–92(§30.2–§30.7);
↔ exponential 32(ex10.19), 90(§30.4), 97(ex31.15); ↔ logistic 90(§30.4), 97(ex31.17);
↔ uniform 90(§30.4), 97(ex31.15); ↔ Weibull 92(§30.7), 97(ex31.18), 97(ex31.19);
Gumbel distribution for minima 91(§30.4); kurtosis 90(§30.3); location-scale property 91(§30.6);
reverse Gumbel distribution 91(§30.4); skewness 90(§30.3);

H
half-logistic distribution 97(ex31.8);
↔ logistic 97(ex31.8);
half-normal distribution, folded (0, σ 2 ) 51–52(ex16.22–ex16.27);
↔ chi 110(ex37.2); ↔ ggamma 37(§11.10); ↔ Lévy 102(ex33.2);
scale family property 52(ex16.27);
hazard function 4(§1.3), 7(ex2.19–ex2.21);
exponential distribution 4(§1.3), 31(ex10.2); exponential logarithmic distribution 112(ex37.12);
Gompertz distribution 114(ex37.27); log-logistic distribution 113(ex37.18); logistic
distribution 96(ex31.6); Weibull distribution 4(§1.3), 85(ex28.18);
Helmert matrix 140(ex43.12);
hyperbolic secant distribution 109–110(§36.5), 113–114(ex37.23–ex37.26);
location-scale property 113(ex37.23);

I
infinitely divisible 15(§5.1–§5.2);
Cauchy distribution 65(§21.5); chi-squared distribution 36(§11.9);
gamma distribution 15(§5.2), 38(ex12.6); inverse Gaussian distribution 103(ex33.11);
normal distribution 15(§5.2), 48(§15.6);
inverse Gaussian distribution 100–101(§32.4–§32.7), 102–103(ex33.5–ex33.13);
↔ Lévy 101(§32.7), 103(ex33.12); exponential family 103(ex33.13); infinitely divisible 103(ex33.11);
kurtosis 102(ex33.8); scale family property 100(§32.4); skewness 102(ex33.8);
inverted Dirichlet distribution see Dirichlet distribution of the second kind;
Irwin-Hall distribution 20–22(§7.3), 27(ex8.21–ex8.24);
↔ triangular 105(ex35.12); kurtosis 27(ex8.23); skewness 27(ex8.23);

J
Jensen’s inequality 6(ex2.9);

K
kurtosis 5(§1.5), 5(§1.6), 8(ex2.24–ex2.25);
arcsine distribution 45(ex14.18); Bernoulli distribution 8(ex2.24); beta distribution 44(ex14.3);
beta prime distribution (beta 0 ) 45(ex14.12); chi distribution 111(ex37.4);
chi-squared distribution 37(ex12.2); exponential distribution 31(ex10.1);
F distribution 70(ex22.26); gamma distribution 37(ex12.2); Gumbel distribution 90(§30.3);
inverse Gaussian distribution 102(ex33.8); Irwin-Hall distribution 27(ex8.23);
Laplace distribution 84(ex28.6); logistic distribution 96(ex31.4); lognormal distribution 55(ex18.5);
Maxwell distribution 112(ex37.10); non-central chi-squared distribution 76(ex24.4);
normal distribution 49(ex16.3); Pareto distribution 61(ex20.12); power law distribution 60(ex20.2);
Rayleigh distribution 85(ex28.23); semicircle distribution 104(ex35.2); sine distribution 106(ex35.17);
triangular distribution 105(ex35.8); t distribution 68(ex22.4); uniform distribution 5(§1.5), 26(ex8.18);
Upower distribution 107(ex35.21); Weibull distribution 85(ex28.17);

L
Laplace distribution 81–82(§27.1), 83–84(ex28.1–ex28.14);
↔ chi-squared 84(ex28.10); ↔ exponential 32(ex10.11), 83(ex28.1), 84(ex28.8), 84(ex28.9), 84(ex28.13);
↔ F 84(ex28.10); ↔ normal 84(ex28.12), 84(ex28.13); ↔ Rayleigh 86(ex28.28);
↔ uniform 84(ex28.11); kurtosis 84(ex28.6); location-scale property 83(ex28.2);
skewness 84(ex28.6);
law of total variance 3(§1.1);
length biased sampling 39(ex12.19);
Lévy distribution 98–99(§32.1–§32.3), 102(ex33.1–ex33.4);
↔ HalfNormal 102(ex33.2); ↔ inverseGaussian 101(§32.7), 103(ex33.12);
location-scale property 99(§32.3); stable distribution 102(ex33.4);
linear predictor 7(ex2.15–ex2.18);
Linnik distribution 115(ex37.32);
local central limit theorem 35(§11.7), 39(ex12.18);
location-scale family of distributions 5(§1.6);
arcsine distribution 42(§13.9); beta distribution 41(§13.5); Cauchy distribution 69(ex22.11);
Gumbel distribution (type I extreme value) 91(§30.6); hyperbolic secant distribution 113(ex37.23);
Laplace distribution 83(ex28.2); Lévy distribution 99(§32.3); logistic distribution 87(§29.2);
normal distribution 5(§1.6), 47(§15.3); semicircle distribution 104(ex35.4);
sine distribution 106(ex35.14); triangular distribution 105(ex35.6); uniform distribution 19(§7.1);
Upower distribution 106(ex35.18);
log-logistic distribution 108–109(§36.4), 113(ex37.17–ex37.22);
↔ beta0 109(§36.4); ↔ logistic 113(ex37.21); ↔ uniform 113(ex37.21);
hazard function 113(ex37.18); scale family property 109(§36.4);
logistic distribution 86–89(§29.1–§29.4), 96–97(ex31.1–ex31.10);
↔ exponential 97(ex31.10); ↔ Gumbel 90(§30.4), 97(ex31.17); ↔ HalfLogistic 97(ex31.8);
↔ loglogistic 113(ex37.21); ↔ uniform 96(ex31.7); hazard function 96(ex31.6);
kurtosis 96(ex31.4); location-scale property 87(§29.2); median 96(ex31.3);
order statistics 97(ex31.9); quantile function 96(ex31.3); skewness 96(ex31.4);
standard logistic distribution 87(§29.2);
lognormal distribution 52–54(§17.1–§17.8), 55–56(ex18.1–ex18.11);
↔ normal 52(§17.1); Gini coefficient 55(ex18.8); kurtosis 55(ex18.5);
scale family property 56(ex18.9); skewness 55(ex18.5);

M
Maxwell distribution 107–108(§36.2), 112(ex37.8–ex37.10);
↔ chi 107(§36.2), 110(ex37.2); ↔ chi-squared 107(§36.2); ↔ normal 107(§36.2);
kurtosis 112(ex37.10); scale family property 108(§36.2); skewness 112(ex37.10);
multiplicative central limit theorem 53(§17.6);
multivariate lognormal distribution 77(§25.3), 80(§25.7);
characterization (size and shape) 80(§25.7);
multivariate normal distribution 77(§25.3), 130–138(§42.1–§42.14), 139–141(ex43.1–ex43.16);
see also bivariate normal distribution; Daly’s theorem 137(§42.13);
multivariate t distribution 160–162(§47.1–§47.7), 163(ex48.1–ex48.6);
see also bivariate t distribution;

N
negative binomial distribution 35(§11.6), 39(ex12.23), 44(ex14.8), 187(ex12.17);
non-central chi-squared distribution 71–75(§23.1–§23.5), 76(ex24.1–ex24.5);
kurtosis 76(ex24.4); skewness 76(ex24.4);
non-central F distribution 76(§23.7), 76–77(ex24.8–ex24.9);
non-central t distribution 75–76(§23.6), 76(ex24.6–ex24.7);
normal distribution 46–49(§15.1–§15.9), 49–52(ex16.1–ex16.27), 86(ex28.26);
↔ arcsine 51(ex16.21); ↔ beta 51(ex16.21); ↔ Cauchy 69(ex22.16), 69(ex22.19);
↔ chi 111(ex37.7); ↔ chi-squared 48(§15.5), 50(ex16.12); ↔ F 70(ex22.27);
↔ gamma 50(ex16.12); ↔ Laplace 84(ex28.12), 84(ex28.13); ↔ logN 52(§17.1);
↔ Maxwell 107(§36.2); ↔ Rayleigh 86(ex28.26); Box-Muller transformation 86(ex28.26);
characterizations 49(§15.8); cross product 84(ex28.12); entropy 50(ex16.11);
independence of mean and sample variance 51(ex16.20); infinitely divisible 15(§5.2), 48(§15.6);
kurtosis 49(ex16.3); linear combinations 48(§15.6–§15.7), 50(ex16.7);
location-scale property 5(§1.6), 47(§15.3); order statistics 51(ex16.20);
polar coordinates 69(ex22.19), 86(ex28.26); range of a sample 51(ex16.20); skewness 49(ex16.3);
stable distribution 14(§4.3), 48(§15.6); tail probabilities 50(ex16.8);

O
order statistics 8–11(§3.1–§3.7), 15–16(ex6.1–ex6.11);
asymptotic result 11(§3.7); probability integral transform 24(§7.7); uniform 15(ex6.1–ex6.6);

P
Pareto distribution 58–60(§19.4–§19.7), 61–63(ex20.10–ex20.27);
↔ beta 44(ex14.5); ↔ exponential 62(ex20.16); ↔ powerlaw 59(§19.5), 61(ex20.10);
↔ uniform 61(ex20.10); characterization 59(§19.7), 62(ex20.26), 63(ex20.27);
characterization (size and shape) 79(§25.5), 81(ex26.6); geometric mean 62(ex20.17);
Gini coefficient 62(ex20.17); kurtosis 61(ex20.12); order statistics 62(ex20.18–ex20.23);
scale family property 59(§19.5); skewness 61(ex20.12);
Poincaré’s roulette problem 26(ex8.10);
Poisson distribution 35(§11.6), 39(ex12.17), 75(§23.5), 76(ex24.2), 77(ex24.9), 213(ans24.4);
Poisson process 35(§11.6), 39(ex12.19);
power law distribution 56–58(§19.1–§19.3), 60–61(ex20.1–ex20.9);
↔ beta 41(§13.4), 56(§19.1), 59(§19.5); ↔ exponential 60(ex20.3);
↔ Pareto 59(§19.5), 61(ex20.10); ↔ uniform 61(ex20.4); characterization 57(§19.3);
characterization (size and shape) 79(§25.6), 81(ex26.7); kurtosis 60(ex20.2);
order statistics 61(ex20.5–ex20.9); scale family property 57(§19.2); skewness 60(ex20.2);
probability integral transform 23–24(§7.6–§7.7);

Q
quadratic forms, general results 119–121(§38.7–§38.9), 122–123(ex39.8–ex39.14);
definition 119(§38.7); mean 119(§38.8); variance 120(§38.9);
quadratic forms, normal 141–155(§44.1–§44.21), 156–158(ex45.1–ex45.28);

R
random partitions of an interval 24(§7.8), 44(ex14.9);
random vectors 117–118(§38.1–§38.6);
covariance matrix 118(§38.5), 121–122(ex39.1–ex39.3); linear transformation 117(§38.2);
transformation to independent random variables 118(§38.4); variance see variance matrix;
Rayleigh distribution 83(§27.4), 85–86(ex28.22–ex28.28);
↔ chi 110(ex37.2); ↔ chi-squared 86(ex28.27); ↔ exponential 85(ex28.25), 86(ex28.27);
↔ ggamma 37(§11.10); ↔ gamma 85(ex28.25), 86(ex28.27); ↔ Laplace 86(ex28.28);
↔ normal 86(ex28.26); ↔ uniform 85(ex28.25); ↔ Weibull 83(§27.4); kurtosis 85(ex28.23);
scale family property 83(§27.4); skewness 85(ex28.23);
record values 17(ex6.12);
reverse Gumbel distribution (type I extreme value) 91(§30.4);
↔ Gompertz 114(ex37.29);
reverse Weibull distribution (type III extreme value) 94–95(§30.10–§30.12);
↔ exponential 94(§30.10); ↔ Weibull 94(§30.11); median 94(§30.10), 95(§30.12);

S
scale family of distributions 5(§1.6);
exponential distribution 28(§9.1); exponential logarithmic distribution 112(ex37.12);
folded normal distribution 52(ex16.27); gamma distribution 34(§11.3);
Gompertz distribution 110(§36.6); half-normal distribution, folded (0, σ²) 52(ex16.27);
inverse Gaussian distribution 100(§32.4); log-logistic distribution 109(§36.4);
lognormal distribution 56(ex18.9); Maxwell distribution 108(§36.2); Pareto distribution 59(§19.5);
power law distribution 57(§19.2); Rayleigh distribution 83(§27.4); Weibull distribution 82(§27.3);
sech-squared distribution 96(ex31.2);
semicircle distribution 103(§34.1), 104–105(ex35.1–ex35.5);
↔ beta 105(ex35.5); ↔ uniform 104(ex35.3); kurtosis 104(ex35.2);
location-scale property 104(ex35.4); skewness 104(ex35.2);
shape and size see size and shape;
sine distribution 103(§34.3);
kurtosis 106(ex35.17); location-scale property 106(ex35.14); skewness 106(ex35.17);
size and shape 77–80(§25.1–§25.7);
skewness 4(§1.4), 5(§1.6), 7–8(ex2.22–ex2.24);
arcsine distribution 45(ex14.18); Bernoulli distribution 8(ex2.24); beta distribution 44(ex14.3);
beta prime distribution (beta′) 45(ex14.12); chi distribution 111(ex37.4);
chi-squared distribution 37(ex12.2); exponential distribution 31(ex10.1); F distribution 70(ex22.26);
gamma distribution 37(ex12.2); Gumbel distribution (type I extreme value) 90(§30.3);
inverse Gaussian distribution 102(ex33.8); Irwin-Hall distribution 27(ex8.23);
Laplace distribution 84(ex28.6); logistic distribution 96(ex31.4); lognormal distribution 55(ex18.5);
Maxwell distribution 112(ex37.10); non-central chi-squared distribution 76(ex24.4);
normal distribution 49(ex16.3); Pareto distribution 61(ex20.12); power law distribution 60(ex20.2);
Rayleigh distribution 85(ex28.23); semicircle distribution 104(ex35.2); sine distribution 106(ex35.17);
triangular distribution 105(ex35.8); t distribution 68(ex22.4); uniform distribution 5(§1.5), 26(ex8.18);
Upower distribution 107(ex35.21); Weibull distribution 85(ex28.17);
skew t distribution (skewt) 66–67(§21.10–§21.11), 71(ex22.34–ex22.37);
↔ beta 71(ex22.35); ↔ chi-squared 71(ex22.35); ↔ F 71(ex22.35);
↔ t 67(§21.11), 71(ex22.34);
Skitovich-Darmois theorem 49(§15.8);
stable distributions 11–14(§4.1–§4.3), 17(ex6.13–ex6.18);
Cauchy distribution 69(ex22.14); Lévy distribution 102(ex33.4); normal distribution 14(§4.3), 48(§15.6);
survivor function 4(§1.3);

T
triangular distribution 26(ex8.20), 103(§34.2), 105(ex35.6–ex35.13);
↔ beta 105(ex35.13); ↔ Irwin-Hall 105(ex35.12);
↔ uniform 6(ex2.3), 19(§7.2), 105(ex35.9), 105(ex35.12); kurtosis 105(ex35.8);
location-scale property 105(ex35.6); skewness 105(ex35.8);
trigamma function 90(§30.3);
Tukey’s lambda distribution 26(ex8.19);
type
distributions with same type 5(§1.6);
t distribution 63–65(§21.1–§21.5), 68–70(ex22.1–ex22.22);
↔ Cauchy 64(§21.4); ↔ F 66(§21.7), 71(ex22.33); ↔ skewt 67(§21.11), 71(ex22.34);
kurtosis 68(ex22.4); skewness 68(ex22.4);

U
uniform distribution 19–25(§7.1–§7.9), 25–27(ex8.1–ex8.24);
↔ arcsine 46(ex14.19), 46(ex14.20), 46(ex14.22); ↔ Bernoulli 22(§7.4); ↔ beta 23(§7.5), 41(§13.4);
↔ Cauchy 69(ex22.17); ↔ chi-squared 39(ex12.21); ↔ expLog 112(ex37.14);
↔ exponential 23(§7.5), 25(ex8.3), 32(ex10.5), 32(ex10.6); ↔ Fréchet 93(§30.9); ↔ gamma 23(§7.5);
↔ Gompertz 115(ex37.30); ↔ Gumbel 90(§30.4), 97(ex31.15); ↔ Laplace 84(ex28.11);
↔ loglogistic 113(ex37.21); ↔ logistic 96(ex31.7); ↔ Pareto 61(ex20.10);
↔ powerlaw 61(ex20.4); ↔ Rayleigh 85(ex28.25); ↔ semicircle 104(ex35.3);
↔ triangular 6(ex2.3), 19(§7.2), 105(ex35.9), 105(ex35.12); ↔ Weibull 85(ex28.19);
entropy 25(ex8.6); kurtosis 5(§1.5), 26(ex8.18); location-scale property 19(§7.1);
order statistics 11(§3.7), 15(ex6.1–ex6.6), 23(§7.5), 26(ex8.11), 26(ex8.16), 30(§9.6), 61(ex20.4);
skewness 5(§1.5), 26(ex8.18);
Upower distribution 104(§34.4);
kurtosis 107(ex35.21); location-scale property 106(ex35.18); skewness 107(ex35.21);

V
variance matrix 117–118(§38.1–§38.6), 121–122(ex39.1–ex39.7);
correlation matrix 118(§38.6); linear transformation 117(§38.2); positive definite 118(§38.3);
square root 118(§38.4);

W
Wald distribution see inverse Gaussian distribution;
Weibull distribution 82(§27.2–§27.3), 85(ex28.15–ex28.21);
↔ exponential 82(§27.3); ↔ extremeIII 94(§30.11); ↔ Fréchet 93(§30.9);
↔ ggamma 37(§11.10); ↔ Gumbel 92(§30.7), 97(ex31.18), 97(ex31.19); ↔ Rayleigh 83(§27.4);
↔ uniform 85(ex28.19); hazard function 4(§1.3), 85(ex28.18); kurtosis 85(ex28.17);
scale family property 82(§27.3); skewness 85(ex28.17);
Wigner distribution see semicircle distribution;
