
Springer Texts in Electrical Engineering

Multivariable Feedback Systems
F. M. Callier and C. A. Desoer

Linear Programming
M. Sakarovitch

Introduction to Random Processes
E. Wong

Stochastic Processes in Engineering Systems
E. Wong and B. Hajek
EUGENE WONG
BRUCE HAJEK

Stochastic Processes in
Engineering Systems

Springer-Verlag
New York Berlin Heidelberg Tokyo
Eugene Wong
Department of Electrical Engineering and Computer Sciences
University of California
Berkeley, California 94720 U.S.A.

Bruce Hajek
Department of Electrical and Computer Engineering
University of Illinois, Urbana-Champaign
Urbana, Illinois 61801 U.S.A.

Library of Congress Cataloging in Publication Data


Wong, Eugene, 1934-
Stochastic processes in engineering systems.
(Springer texts in electrical engineering)
Bibliography: p.
Includes index.
1. Electric engineering-Mathematics. 2. Stochastic
processes. I. Hajek, Bruce. II. Title. III. Series.
TK153.W66 1984 519.2'024'62 84-20257

Previous edition, Stochastic Processes in Information and Dynamical
Systems by E. Wong, was published by McGraw-Hill, Inc. in 1971.

© 1971, 1985 by Springer-Verlag New York Inc.


Softcover reprint of the hardcover 2nd edition 1985
All rights reserved. No part of this book may be translated or reproduced in
any form without written permission from Springer-Verlag, 175 Fifth
Avenue, New York, New York 10010, U.S.A.
The use of general descriptive names, trade names, trademarks, etc., in this
publication, even if the former are not especially identified, is not to be
taken as a sign that such names, as understood by the Trade Marks and
Merchandise Marks Act, may accordingly be used freely by anyone.

Typeset by Science Typographers, Inc., Medford, New York.

9 8 7 6 5 4 3 2 1
ISBN-13: 978-1-4612-9545-7 e-ISBN-13: 978-1-4612-5060-9
DOI: 10.1007/978-1-4612-5060-9
Preface

This book is a revision of Stochastic Processes in Information and
Dynamical Systems written by the first author (E.W.) and published
in 1971. The book was originally written, and revised, to provide a
graduate level text in stochastic processes for students whose primary
interest is its applications. It treats both the traditional topic of sta-
tionary processes in linear time-invariant systems as well as the more
modern theory of stochastic systems in which dynamic structure plays
a profound role. Our aim is to provide a high-level, yet readily acces-
sible, treatment of those topics in the theory of continuous-parameter
stochastic processes that are important in the analysis of information
and dynamical systems.
The theory of stochastic processes can easily become abstract. In
dealing with it from an applied point of view, we have found it difficult
to decide on the appropriate level of rigor. We intend to provide just
enough mathematical machinery so that important results can be stated
with precision and clarity; so much of the theory of stochastic processes
is inherently simple if a suitable framework is provided. The price
of providing this framework seems worth paying even though the ul-
timate goal is in applications and not the mathematics per se.
There are two main topics in the book: second-order properties of
stochastic processes (Chapter 3) and stochastic calculus in dynamical
systems with applications to white noise in nonlinear systems, nonlin-
ear filtering and detection (Chapters 4, 6 and 7). Each topic provides
a convenient core for a one-semester or one-quarter graduate course.
This material has been used for several years in two such courses in
Berkeley, and the second topic has been used in advanced graduate
courses at Urbana. The level of sophistication required for the first
course is considerably lower than that required by the second. For the
course centered on second-order properties, a good undergraduate back-
ground in probability theory and linear systems analysis is adequate;
for the course centered on stochastic calculus, an acquaintance with
measure theory is almost necessary. At Berkeley we have required a
prerequisite of an undergraduate course on integration. However, a
generally high level of mathematical sophistication is probably more
important than formal prerequisites.
This revised text differs from the original in that the original
Chapter 6 was replaced by two new chapters, Chapters 6 and 7, which
provide an introduction to a more recently developed calculus of random
processes in dynamical systems. This approach allows simultaneous
treatment of jump processes (found in queueing and optical commu-
nication systems, for example) and sample-continuous processes arising
as integrals of white Gaussian noise. This revision also contains a brief
introduction to the two-parameter martingale calculus which has been
developed during the last ten years, and it contains an introduction to
a related, recently developed theory of stochastic differential forms.
The supporting material in Chapters 1 and 2 should be used as
the instructor sees fit. Much of this material may be inappropriate for
one or the other of the two courses and should be omitted.
We have included a fair number of problems and exercises. So-
lutions have been provided for most of them in order to facilitate self
study and to provide a pool of examples and supplementary material
for the main text.
We have kept the list of references short and specific. In cases
where alternative references exist, we have chosen those which are
most familiar to us. The necessary incompleteness of such a list is
perhaps compensated by the high degree of relevance that each ref-
erence bears to the text. On basic points of probability theory, we have
relied heavily on the three well-known books by Doob, Loève, and
Neveu. On a more elementary level we have found the text by
Thomasian particularly comprehensive and lucid.
This book could not have been written without direct and indirect
assistance from many sources. First, it is obviously a direct result of
our teaching and research on stochastic processes. We are grateful to
the Department of Electrical Engineering and Computer Sciences at
the University of California, Berkeley and the Department of Electrical
and Computer Engineering at the University of Illinois, Urbana, for
their continuing support. The Army Research Office (Durham), the
National Science Foundation, the Office of Naval Research, and the
Department of Defense through its Joint Services Electronics Program
have supported our research over the years. The organization of the
original book and some of its initial writing were done during E. Wong's
very pleasant sabbatical year at the University of Cambridge with the
support of a John Simon Guggenheim fellowship. He is indebted to his
Cambridge host, Professor J. F. Coales, F. R. S., for his kindness.
Many colleagues and friends have generously provided suggestions
and criticisms. It is a pleasure to acknowledge the help given by Pierre
Brémaud, Dominique Decavele, Tyrone Duncan, Terence Fine, Larry
Shepp, Pravin Varaiya, and Moshe Zakai. We are very grateful to Joe
Doob for his many helpful suggestions which were for the most part
incorporated into the new chapters, and we are especially indebted to
Bill Root for a painstaking review of a major portion of the original
manuscript. The number of errors found so far makes us wonder how
many still remain, and we still dare not emulate our friend Elwyn
Berlekamp in offering a cash reward to readers for correction of errors!
Nevertheless, any report of errors will be appreciated.
Miss Bonnie Bullivant typed the manuscript and its many revisions
with both skill and patience.

Eugene Wong
Bruce Hajek
Contents

PREFACE v

1 ELEMENTS OF PROBABILITY THEORY 1

1. Events and probability 1
2. Measures on finite-dimensional spaces 5
3. Measurable functions and random variables 7
4. Sequences of events and random variables 11
5. Expectation of random variables 15
6. Convergence concepts 18
7. Independence and conditional expectation 24

2 STOCHASTIC PROCESSES 37

1. Definition and preliminary considerations 37
2. Separability and measurability 41
3. Gaussian processes and Brownian motion 46
4. Continuity 55
5. Markov processes 61
6. Stationarity and ergodicity 66

3 SECOND-ORDER PROCESSES 74
1. Introduction 74
2. Second-order continuity 77
3. Linear operations and second-order calculus 78
4. Orthogonal expansions 81
5. Wide-sense stationary processes 88
6. Spectral representation 97
7. Lowpass and bandpass processes 105
8. White noise and white-noise integrals 109
9. Linear prediction and filtering 116

4 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS 139

1. Introduction 139
2. Stochastic integrals 141
3. Processes defined by stochastic integrals 145
4. Stochastic differential equations 149
5. White noise and stochastic calculus 155
6. Generalizations of the stochastic integral 163
7. Diffusion equations 169

5 ONE-DIMENSIONAL DIFFUSIONS 180

1. Introduction 180
2. The Markov semigroup 182
3. Strong Markov processes 190
4. Characteristic operators 193
5. Diffusion processes 198

6 MARTINGALE CALCULUS 209

1. Martingales 209
2. Sample-path integrals 217
3. Predictable processes 222
4. Isometric integrals 227
5. Semimartingale integrals 233
6. Quadratic variation and the change of variable formula 236
7. Semimartingale exponentials and applications 241

7 DETECTION AND FILTERING 250

1. Introduction 250
2. Likelihood ratio representation 254
3. Filter representation-change of measure derivation 257
4. Filter representation-innovations derivation 262
5. Recursive estimation 269

8 RANDOM FIELDS 279

1. Introduction 279
2. Homogeneous random fields 280
3. Spherical harmonics and isotropic random fields 285
4. Markovian random fields 292
5. Multiparameter martingales 296
6. Stochastic differential forms 303

REFERENCES 311

SOLUTIONS TO EXERCISES 316

INDEX 355
1
Elements of Probability Theory

1. EVENTS AND PROBABILITY

The simplest situation in which probability can be considered involves a
random experiment with a finite number of outcomes. Let Ω = {ω₁, ω₂,
..., ω_N} denote the set of all outcomes of the experiment. In this case
a probability pᵢ can be assigned to each outcome ωᵢ. The only restrictions
that we place on these probabilities are: pᵢ ≥ 0 and Σ_{i=1}^N pᵢ = 1. Every
subset A of Ω in this case also has a well-defined probability which is
equal to the sum of the probabilities of the outcomes contained in A.
If the number of outcomes in Ω is countably infinite, the situation
is quite similar to the finite case. However, if Ω contains an uncountable
number of points, then not more than a countable number of them can
have nonzero probabilities, since the number of outcomes with probability
≥ 1/n must be less than or equal to n. In the general case, it is
necessary to consider probabilities to be defined on subsets of Ω, rather
than on points of Ω. A subset A on which a probability is defined is called
an event. Let 𝒜 denote the class of all events. A satisfactory theory of
probability demands that the complement of an event should again be
an event. Indeed, the occurrence of the complement Aᶜ is just the
nonoccurrence of A. Similarly, if A and B are two events, then the
simultaneous occurrence of both A and B should have a well-defined
probability, i.e., the intersection of two events should again be an event.
Thus, 𝒜 should be closed under complementation and pairwise intersection.
This immediately implies that 𝒜 is closed under all finite Boolean
set operations.¹ A class of sets which is closed under all finite Boolean
set operations is called a Boolean algebra or simply an algebra. An
elementary probability measure 𝒫 is a function defined on an algebra 𝒜
such that

0 ≤ 𝒫(A) ≤ 1 and 𝒫(Ω) = 1    (1.1a)

𝒫(A ∪ B) = 𝒫(A) + 𝒫(B) whenever A and B are disjoint (additivity)    (1.1b)

Both (1.1a) and (1.1b) are clearly natural conditions to be required of a
probability measure.
In order that we can consider sequences of events and possible
convergence of such sequences, it is necessary that not only finite, but all
countable set operations on events again yield events. A class of sets
is called a (Boolean) σ algebra if it is closed under all countable set
operations.² It is easy to verify that the intersection of arbitrarily many
σ algebras (of subsets of the same Ω) is again a σ algebra (of subsets of Ω).
Therefore, given an arbitrary class 𝒞 of subsets of Ω, there is a smallest
σ algebra 𝒜(𝒞) which contains 𝒞. This is because we can define 𝒜(𝒞) to be
the intersection of all σ algebras containing 𝒞, and there is at least one
such σ algebra, viz., the collection of all subsets of Ω. We shall say that 𝒞
generates its minimal σ algebra 𝒜(𝒞). Now, consider a Boolean algebra ℬ
together with an elementary probability measure 𝒫. Suppose that in
addition to (1.1a) and (1.1b), 𝒫 also satisfies:

Whenever {Aₙ} is a sequence of sets in ℬ such that Aₙ ⊃ Aₙ₊₁
and ∩_{n=1}^∞ Aₙ = ∅, then lim_{n→∞} 𝒫(Aₙ) = 0
(monotone sequential continuity at ∅)    (1.1c)

¹ Complementation, union, and intersection are the most familiar Boolean set operations.
Only complementation and either union or intersection need be defined. All
other set operations are then expressible in terms of the two basic operations.
² Since all set operations are expressible in terms of complementation and union, to
verify that a class is a σ algebra, we only need to verify that it is closed under
complementation and countable union.
where ∅ denotes the empty set. Conditions (1.1a) to (1.1c) are equivalent to
(1.1a) and the following condition taken together:

Whenever {Aₙ} is a sequence of pairwise disjoint sets in ℬ such
that ∪_{n=1}^∞ Aₙ is also in ℬ, then
𝒫(∪_{n=1}^∞ Aₙ) = Σ_{n=1}^∞ 𝒫(Aₙ)  (σ additivity)    (1.1d)

A set function 𝒫 defined on an algebra ℬ satisfying (1.1a) to (1.1c) is
called a probability measure. It is a fundamental result of probability
theory that a probability measure 𝒫 on an algebra ℬ extends uniquely to
a probability measure on the σ algebra generated by ℬ.

Proposition 1.1 (Extension Theorem). Let ℬ be an algebra and let 𝒜(ℬ) be
its generated σ algebra. If 𝒫 is a probability measure defined on ℬ,
then there is one and only one probability measure 𝒫′ defined on 𝒜(ℬ)
such that the restriction of 𝒫′ to ℬ is 𝒫 [Neveu, 1965, p. 23].
Thus, we have arrived at the basic concept of a probability space. A
probability space is a triplet (Ω, 𝒜, 𝒫) where Ω is a nonempty set whose
elements are usually interpreted as outcomes of a random experiment,
𝒜 is a σ algebra of subsets of Ω, and 𝒫 is a probability measure defined on
𝒜. The set Ω will be called the basic space, and its elements are called
points. Elements of 𝒜 are called events.
A subset of an event of zero probability is called a null set. Note
that a null set need not be an event. A probability space (Ω, 𝒜, 𝒫) is
said to be complete if every null set is an event (necessarily of zero
probability). If (Ω, 𝒜, 𝒫) is not already complete, 𝒫 can be uniquely
extended to the σ algebra 𝒜̄ generated by 𝒜 and its null sets. This
procedure is called completion. The process of completion is equivalent to
the following: For a given probability space (Ω, 𝒜, 𝒫), define for every
subset A of Ω an outer probability 𝒫*(A) and an inner probability
𝒫_*(A) by

𝒫*(A) = inf {𝒫(G): G ⊃ A, G ∈ 𝒜}
𝒫_*(A) = sup {𝒫(G): G ⊂ A, G ∈ 𝒜}    (1.2)

Obviously, on 𝒜 we have 𝒫* = 𝒫 = 𝒫_*. Thus, 𝒫 can be uniquely extended
to the class of all sets whose inner and outer probabilities are equal.
The additional sets for which 𝒫 can be so defined are exactly the same
as those gotten by completing (Ω, 𝒜, 𝒫) [Neveu, 1965, p. 17].
An example might best illustrate the preceding discussion. Set
Ω = [0,1), and let ℬ be the class consisting of [0,1), ∅, all semiopen
intervals [a,b) with 0 ≤ a < b ≤ 1, and all finite unions of disjoint
semiopen intervals. The class ℬ is an algebra, but not a σ algebra. If
A = [a,b) is a semiopen interval, we set 𝒫(A) = b − a. If A = ∪_{i=1}^m Aᵢ
is a union of disjoint intervals Aᵢ, we set 𝒫(A) = Σ_{i=1}^m 𝒫(Aᵢ). Clearly,
𝒫 satisfies conditions (1.1a) and (1.1b). We shall show that 𝒫 is, in fact,
σ additive.
ditions (1.1a) and (LIb). We shall show that <P is, in fact, IT additive.
00

Suppose that AI, A 2 , ••• are disjoint sets in ill such that A = U Ai is
i= 1
::llso in ill. Then A is a finite union of disjoint spmiopen intervals
II, . . . , 1 m , and each Ik n Ai is again a finite union of disjoint semi-
open intervals. Therefore, to prove that <P is IT additive, it is enough to
show that if
[atb) = U [ai,b i )
i= I
i = 1,2, ..
then
00 00

(P([a,b» = b - a = ~ <P([ai,b i » ~ (b i - ai) (1.3)


i= 1 i= 1
First, we note that for any finite n (reindexing the first n intervals, if
necessary, so that b₁ ≤ a₂, b₂ ≤ a₃, ...),

Σ_{i=1}^n (bᵢ − aᵢ) ≤ Σ_{i=1}^n (bᵢ − aᵢ) + Σ_{i=1}^{n−1} (aᵢ₊₁ − bᵢ) = bₙ − a₁ ≤ b − a

Hence, b − a ≥ Σ_{i=1}^∞ (bᵢ − aᵢ). Next, we note that for any δ > 0,

[a, b − δ] ⊂ ∪_{i=1}^∞ (aᵢ − δ/2ⁱ, bᵢ)

The Heine-Borel theorem [see, e.g., Rudin, 1966, p. 36] then states that
there is a finite N such that

[a, b − δ] ⊂ ∪_{i=1}^N (aᵢ − δ/2ⁱ, bᵢ)

It follows that

[a,b) ⊂ [b − δ, b) ∪ ∪_{i=1}^N [aᵢ − δ/2ⁱ, bᵢ)

b − a ≤ δ + Σ_{i=1}^N (bᵢ − aᵢ + δ/2ⁱ) ≤ 2δ + Σ_{i=1}^∞ (bᵢ − aᵢ)

Therefore, for every δ > 0, we have 0 ≤ (b − a) − Σ_{i=1}^∞ (bᵢ − aᵢ) ≤ 2δ,
and (1.3) is proved. Hence, 𝒫 is σ additive and is a probability measure.
It can be uniquely extended to the σ algebra 𝒜 generated by the semiopen
intervals. Sets in 𝒜 are called Borel sets of [0,1). 𝒫 can be further extended
by completion to 𝒜̄. Sets in 𝒜̄ are called Lebesgue measurable sets of
[0,1). We note that 𝒜 and 𝒜̄ include all intervals in [0,1) and not just
semiopen ones. In particular, a point x in [0,1) can be considered to be a
degenerate interval [x,x] and is in 𝒜.
Let Ω be a basic space, and let 𝒜 be a σ algebra of subsets of Ω.
The pair (Ω, 𝒜) is called a measurable space and is sometimes referred to
as a preprobability space. A nonnegative σ additive set function μ defined
on 𝒜 is called a measure. Thus, a probability measure is a measure
satisfying 𝒫(Ω) = 1. More generally, μ is said to be a finite measure if
μ(Ω) < ∞. Even more generally, if Ω is a countable union of sets A₁, A₂, ...
in 𝒜 such that μ(Aᵢ) is finite for each i, then μ is said to be a
σ-finite measure. For example, let Ω = R be the real line, let 𝒜 be the
σ algebra generated by the class of all intervals, and for any finite interval
A define μ(A) = length of A. Then, μ can be extended to a unique
σ-finite measure on (Ω, 𝒜), and upon completion it is just the Lebesgue
measure of the real line.

2. MEASURES ON FINITE-DIMENSIONAL SPACES


Let R denote the real line (−∞, ∞), and let Rⁿ denote the collection of all
ordered n-tuples x = (x₁, x₂, ..., xₙ), xᵢ ∈ R. A subset of Rⁿ of the
form {x: xᵢ ∈ Aᵢ, i = 1, 2, ..., n}, where A₁, ..., Aₙ are intervals,
will be called a rectangle. We denote by ℛⁿ the smallest σ algebra of
subsets of Rⁿ containing all rectangles. The σ algebra ℛⁿ is called the
n-dimensional Borel σ algebra, and sets in ℛⁿ are called Borel sets.
We note that ℛⁿ can be generated in many ways. ℛⁿ is the smallest
σ algebra containing every set of the form {x: −∞ < xᵢ < a} where i is
one of the integers 1, 2, ..., n and a is a real number. Thus, the class of
sets generating ℛⁿ can be smaller than the class of all rectangles. Of
course, the class of sets generating ℛⁿ can also be larger. For example, ℛⁿ
is also the smallest σ algebra containing every set of the form {x: xᵢ ∈ Aᵢ,
i = 1, ..., n} where the Aᵢ are one-dimensional Borel sets. We note that
not every set in ℛⁿ can be written in this way, for example, {(x₁,x₂):
x₁ + x₂ = 2}. A set of the form {x: xᵢ ∈ Aᵢ, i = 1, ..., n} will be
called a product and will be denoted by ∏_{i=1}^n Aᵢ.
The pair (Rⁿ, ℛⁿ) is obviously a measurable space. Measures defined
on (Rⁿ, ℛⁿ) are called Borel measures, and probability measures defined
on (Rⁿ, ℛⁿ) are called Borel probability measures. Let μ be a finite Borel
measure defined on (Rⁿ, ℛⁿ), and define a function M on Rⁿ by

M(a₁, a₂, ..., aₙ) = μ({x: −∞ < xᵢ < aᵢ, i = 1, 2, ..., n})    (2.1)
The function M satisfies the following rather obvious conditions:

M is nonnegative and bounded by μ(Rⁿ)    (2.2a)

For each i, M(a) → 0 as aᵢ → −∞    (2.2b)

M(a) → μ(Rⁿ) as a → (∞, ∞, ..., ∞)    (2.2c)

Condition (2.2a) is a simple property of measures, and conditions (2.2b)
and (2.2c) follow from monotone sequential continuity of μ.
Let A be any rectangle of the form

A = {x: aᵢ ≤ xᵢ < bᵢ, i = 1, ..., n} = ∩_{i=1}^n {x: aᵢ ≤ xᵢ < bᵢ}

Denoting Aᵢ = {x: aᵢ ≤ xᵢ < bᵢ}, we have (because of additivity)

μ((∩_{i=1}^{n−1} Aᵢ) ∩ (−∞, bₙ)) = μ((∩_{i=1}^{n−1} Aᵢ) ∩ (−∞, aₙ)) + μ(∩_{i=1}^n Aᵢ)

Hence,

μ(A) = μ((∩_{i=1}^{n−1} Aᵢ) ∩ (−∞, bₙ)) − μ((∩_{i=1}^{n−1} Aᵢ) ∩ (−∞, aₙ))

Continuing in this way, we find

μ(A) = Σ_{x∈S(A)} (−1)^{k(x)} M(x)

where M is defined by (2.1), S(A) denotes the collection of 2ⁿ vertices
{x: xᵢ = aᵢ or bᵢ} of A, and k(x) is the number of a's in x. It follows that M
satisfies two additional conditions:

Σ_{x∈S(A)} (−1)^{k(x)} M(x) ≥ 0 for all rectangles A of the form
A = {x: aᵢ ≤ xᵢ < bᵢ, i = 1, ..., n}    (2.3)

For each i, lim_{aᵢ↑bᵢ} Σ_{x∈S(A)} (−1)^{k(x)} M(x) = 0    (2.4)

Condition (2.3) might be referred to as the monotonicity condition. In
one dimension, it reduces simply to

M(b) − M(a) ≥ 0    for b ≥ a

Condition (2.4) is a continuity condition. In one dimension, it reduces to
left continuity:

lim_{a↑b} M(a) = M(b)
A function satisfying (2.2) to (2.4) will be called a distribution function.
Equation (2.1) defines a distribution function in terms of a finite Borel
measure.
Conversely, every distribution function defines a finite Borel measure.
To verify this, let 𝒞 be the class of all rectangles of the form

A = {x: aᵢ ≤ xᵢ < bᵢ, i = 1, ..., n}    (2.5)

Let ℬ(𝒞) be the smallest algebra (but not σ algebra) containing 𝒞. Then,
every set in ℬ(𝒞) is a finite union of disjoint rectangles of the form (2.5).
Given a distribution function M, we can define a nonnegative set function
μ on 𝒞 by

μ(A) = Σ_{x∈S(A)} (−1)^{k(x)} M(x)

and extend μ to ℬ(𝒞) by additivity. Monotone sequential continuity of μ
follows from conditions (2.4) and (2.2). Hence, μ can be extended to a
σ-additive set function on 𝒜(𝒞), which is just ℛⁿ. If M satisfies

M(a) → 1 as a → (∞, ∞, ..., ∞)    (2.6)

μ is a probability measure. Distribution functions satisfying (2.6) are
called probability distribution functions for obvious reasons.
We have established a one-to-one relationship between distribution
functions and finite Borel measures. Distribution functions are useful
characterizations of finite Borel measures, because, being point functions
rather than set functions, they are easier to specify. However, when they
are used as integrators, distribution functions usually have to be
interpreted as measures (cf. Sec. 1.5).
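As a numerical illustration of the vertex sum in (2.3), the sketch below (our own; the uniform distribution on the unit square serves only as a check) recovers the measure of a rectangle from a distribution function M by inclusion-exclusion over the 2ⁿ vertices.

```python
# Sketch: mu(A) = sum over the 2^n vertices x of A of (-1)^{k(x)} M(x),
# where k(x) counts how many coordinates of x equal the lower corner a_i.
from itertools import product

def rectangle_measure(M, a, b):
    """Measure of A = {x: a_i <= x_i < b_i} from a distribution function M."""
    n = len(a)
    total = 0.0
    for corner in product((0, 1), repeat=n):        # 0 -> a_i, 1 -> b_i
        x = [b[i] if corner[i] else a[i] for i in range(n)]
        k = corner.count(0)                         # number of a's in the vertex
        total += (-1) ** k * M(x)
    return total

def M_uniform(x):
    """Distribution function of the uniform measure on the unit square."""
    p = 1.0
    for xi in x:
        p *= min(max(xi, 0.0), 1.0)
    return p

# The measure of [0.2, 0.7) x [0.1, 0.4) should be 0.5 * 0.3 = 0.15.
print(rectangle_measure(M_uniform, [0.2, 0.1], [0.7, 0.4]))
```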

3. MEASURABLE FUNCTIONS AND RANDOM VARIABLES


Suppose that (Ω₁, 𝒜₁) and (Ω₂, 𝒜₂) are two measurable spaces, and f is a
function with domain Ω₁ and range in Ω₂. The function f is said to be a
measurable function or a measurable mapping of (Ω₁, 𝒜₁) into (Ω₂, 𝒜₂) if
for every set A in 𝒜₂, the set

f⁻¹(A) = {ω: f(ω) ∈ A}    (3.1)

is in 𝒜₁. The set f⁻¹(A) is called the inverse image of A.
If f is a measurable function (Rⁿ, ℛⁿ) → (Rᵐ, ℛᵐ), then f is called a
Borel function. All continuous functions are Borel functions. The pointwise
limit of a sequence of Borel functions is again a Borel function.
Indeed, the class of Borel functions is the smallest class of functions
which contains all continuous functions and is closed under pointwise
limit.
If (Ω, 𝒜) is a measurable space, and X is a measurable mapping of
(Ω, 𝒜) into (R, ℛ), then X is usually referred to as a real-valued
𝒜-measurable function. If a probability measure 𝒫 on (Ω, 𝒜) is defined,
then X is called a real random variable defined on the probability space
(Ω, 𝒜, 𝒫). We note that if X is 𝒜 measurable, and 𝒜′ is a σ algebra
containing 𝒜, then X is also 𝒜′ measurable.
Because ℛ is generated by the collection of half lines {x: −∞ <
x < a}, for X to be a real random variable, it is sufficient for every set
of the form {ω: X(ω) < a} to be an event. The function P_X defined by

P_X(a) = 𝒫({ω: X(ω) < a})    (3.2)

is called the probability distribution function of X. The relationship

P_X(A) = 𝒫({ω: X(ω) ∈ A})    (3.3)

defines a probability measure P_X on (R, ℛ). Our discussion in Sec. 2
indicated that the distribution function P_X and the Borel measure P_X
(denoted by the same symbol) are mutually uniquely determined.
Suppose that X₁, X₂, ..., Xₙ are n real random variables and
X = (X₁, X₂, ..., Xₙ). Then, X is a measurable function mapping (Ω, 𝒜)
into (Rⁿ, ℛⁿ). The function

P_X(a) = P_X(a₁, ..., aₙ) = 𝒫({ω: Xᵢ(ω) < aᵢ, i = 1, ..., n}),  a ∈ Rⁿ    (3.4)

is called the joint probability distribution function of X. At the same time,
the relationship

P_X(A) = 𝒫({ω: X(ω) ∈ A})    (3.5)

defines a Borel probability measure.

A real random variable X is said to be discrete (with probability 1)
if there exists a countable set S = {xᵢ} such that

Σ_{xᵢ∈S} 𝒫({ω: X(ω) = xᵢ}) = 1    (3.6)

If X is discrete, then its distribution function P_X is a function which is
constant except for jumps at xᵢ, i = 1, 2, ..., the size of the jump at
xᵢ being 𝒫({ω: X(ω) = xᵢ}). The probability measure P_X is concentrated
on the points in S. For an arbitrary Borel set A, we have

P_X(A) = Σ_{xᵢ∈A∩S} 𝒫({ω: X(ω) = xᵢ})    (3.7)
Let P be a probability measure defined on (Rⁿ, ℛⁿ). It is said to be
singular (with respect to the Lebesgue measure) if there exists a set S in
ℛⁿ such that P(S) = 1 and the Lebesgue measure of S is zero. On the
other hand, P is said to be absolutely continuous (with respect to the
Lebesgue measure) if Lebesgue measure (A) = 0 implies P(A) = 0. It
is clear that if X₁, X₂, ..., Xₙ are discrete random variables, then P_X
is singular. However, P_X being singular does not imply that X₁, X₂,
..., Xₙ are necessarily discrete, since the set S having simultaneously
zero Lebesgue measure and P_X measure 1 need not be countable.
If X₁, ..., Xₙ are such that P_X is absolutely continuous, then
there exists a nonnegative Borel function p_X(x), x ∈ Rⁿ, such that

P_X(A) = ∫_A p_X(x) dx    (3.8)

The function p_X is called the probability density function for the random
variables X₁, ..., Xₙ. Representation (3.8) results from an application
of the Radon-Nikodym theorem [Loève, 1963, p. 132]. In terms of
the distribution function, (3.8) takes on the more familiar form

P_X(a₁, a₂, ..., aₙ) = ∫_{−∞}^{a₁} ··· ∫_{−∞}^{aₙ} p_X(x₁, x₂, ..., xₙ) dx₁ dx₂ ··· dxₙ    (3.9)

In general, P_X is neither absolutely continuous nor singular, but we
can always write

P_X = αP_X⁽¹⁾ + (1 − α)P_X⁽²⁾    (3.10)

where 0 ≤ α ≤ 1, and P_X⁽¹⁾ and P_X⁽²⁾ are, respectively, absolutely
continuous and singular. The decomposition (3.10) is known as the Lebesgue
decomposition [Neveu, 1965, p. 108].
Given a probability distribution function P(x), x ∈ Rⁿ, a set of
random variables X₁, X₂, ..., Xₙ having P as their joint distribution
function is called a realization of P. Every probability distribution function,
i.e., every nonnegative function satisfying (2.3) to (2.6), has at
least one realization. Indeed, it has numerous realizations. One standard
realization can be constructed as follows: An n-dimensional probability
distribution function P defines a probability measure P on (Rⁿ, ℛⁿ).
Let X: Rⁿ → Rⁿ be the coordinate function defined by

X(x) = x, that is, Xᵢ(x) = xᵢ    (3.11)

Then X = (X₁, ..., Xₙ) as random variables on (Rⁿ, ℛⁿ, P) will have P
as the joint distribution function because

P({x: X₁(x) < a₁, X₂(x) < a₂, ..., Xₙ(x) < aₙ})
= P({x: xᵢ < aᵢ, i = 1, 2, ..., n}) = P(a₁, a₂, ..., aₙ)    (3.12)
If X = (X₁, X₂, ..., Xₙ) are real random variables defined on a
probability space (Ω, 𝒜, 𝒫), and f: Rⁿ → Rᵐ is a Borel function, then

Y = (Y₁, ..., Y_m) = f(X)    (3.13)

are again random variables defined on (Ω, 𝒜, 𝒫). If we denote the Borel
probability measure of X by P_X, then for A ∈ ℛᵐ,

P_Y(A) = 𝒫({ω: Y(ω) ∈ A})
= 𝒫({ω: f(X(ω)) ∈ A})
= P_X({x: f(x) ∈ A})
= P_X(f⁻¹(A))    (3.14)

Equation (3.14) expresses the transformation rule of Borel probability
measures (equivalently, distribution functions). We note that (3.14)
would be somewhat awkward to state in terms of distribution functions
directly, without introducing the Borel probability measures.
Suppose that f: Rⁿ → Rⁿ is a one-to-one and invertible mapping
such that both f and f⁻¹ have continuous partial derivatives with respect
to the coordinates. Further, suppose that X = (X₁, X₂, ..., Xₙ) are
random variables with a density function p_X. Then Y = f(X) also has a
density function, which is given by [see Thomasian, 1969, pp. 362-363]

p_Y(y) = p_X(f⁻¹(y)) |J(y)|    (3.15)

where J is the matrix with elements ∂fᵢ⁻¹(y)/∂yⱼ and |J(y)| stands for
the absolute value of the determinant. As an example of (3.15), suppose
that (X₁, X₂) has a joint density function

p_X(x₁, x₂) = (1/2π) exp[−½(x₁² + x₂²)]

and (Y₁, Y₂)′ = A(X₁, X₂)′, where A is a constant nonsingular matrix.
Denoting by x′ the transpose of x, we can write x₁² + x₂² = x′x. Hence,

p_X(A⁻¹y) = (1/2π) exp[−½ y′(AA′)⁻¹y]

and

p_Y(y) = (1/(2π|det A|)) exp[−½ y′(AA′)⁻¹y]

Other examples can be found in the exercise section.
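The closing Gaussian example can also be checked by simulation. The sketch below (the particular matrix A, sample size, and box width are arbitrary choices of ours) draws standard Gaussian pairs, forms Y = AX, and compares an empirical density estimate at one point against the closed form for p_Y derived above.

```python
# Sketch: Monte Carlo check of p_Y(y) = p_X(f^{-1}(y)) |J(y)| for Y = A X,
# X a standard Gaussian pair, so p_Y(y) = exp(-y'(AA')^{-1}y / 2) / (2 pi |det A|).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0], [0.5, 1.5]])            # any nonsingular matrix
X = rng.standard_normal((1_000_000, 2))
Y = X @ A.T                                       # Y = A X, row by row

def p_Y(y):
    S_inv = np.linalg.inv(A @ A.T)                # (AA')^{-1}
    return np.exp(-0.5 * y @ S_inv @ y) / (2 * np.pi * abs(np.linalg.det(A)))

# Empirical density near y0: fraction of samples in a small box / box area.
y0, h = np.array([1.0, 0.5]), 0.2
inside = np.all(np.abs(Y - y0) < h / 2, axis=1)
print(inside.mean() / h**2, "vs", p_Y(y0))        # the two should be close
```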


Finally, we note that the notation

𝒫({ω: X(ω) ∈ A})

is unduly clumsy. We shall use the simpler, but less exact, notation

𝒫(X ∈ A)

instead.

4. SEQUENCES OF EVENTS AND RANDOM VARIABLES


For a given probability space (Ω, 𝒜, 𝒫), let {Aₙ, n = 1, 2, ...} be a
sequence of events. The sequence {Aₙ} is said to be increasing if Aₙ₊₁ ⊃
Aₙ for every n and decreasing if Aₙ₊₁ ⊂ Aₙ for every n. A sequence
which is either increasing or decreasing is said to be monotone. The
limit lim_{n→∞} Aₙ is defined for every monotone sequence as ∪_{n=1}^∞ Aₙ or
∩_{n=1}^∞ Aₙ according as {Aₙ} is increasing or decreasing. For a general
sequence {Aₙ} (not necessarily monotone), we define superior and inferior
limits as follows:

lim sup_{n→∞} Aₙ = ∩_{n=1}^∞ ∪_{k≥n} Aₖ    (4.1)

lim inf_{n→∞} Aₙ = ∪_{n=1}^∞ ∩_{k≥n} Aₖ    (4.2)

The superior limit is the set of all points which occur in an infinite number
of Aₙ's, while the inferior limit is the set of all points which occur in all
but a finite number of Aₙ's. Hence, lim sup Aₙ ⊃ lim inf Aₙ. If the
superior limit and the inferior limit coincide, then we say that {Aₙ}
is a convergent sequence, and we set

lim Aₙ = lim sup Aₙ = lim inf Aₙ    (4.3)

We note that lim sup, lim inf, and lim all involve only countable set
operations. Hence all such sequential limits of events are again events.
Suppose that {Aₙ} is a convergent sequence of events, and A is its
limit. Then A is an event, and the following proposition relates 𝒫(A) to
𝒫(Aₙ).

Proposition 4.1. Every probability measure 𝒫 is sequentially continuous in
the sense that if {Aₙ} is a convergent sequence of events, then

𝒫(lim_{n→∞} Aₙ) = lim_{n→∞} 𝒫(Aₙ)    (4.4)
Proof: First, if {Aₙ} decreases to the empty set ∅, then (4.4) is simply
(1.1c) (sequential monotone continuity at ∅). If {Aₙ} is a decreasing
sequence with a nonempty limit A, then

𝒫(Aₙ) = 𝒫(A) + 𝒫(Aₙ − A)

and {Aₙ − A} decreases to ∅. If {Aₙ} is increasing with limit A, then

𝒫(Aₙ) = 𝒫(A) − 𝒫(A − Aₙ)

and {A − Aₙ} decreases to ∅. In either case, (4.4) follows once again
from (1.1c).
Now suppose {Aₙ} is convergent, but not necessarily monotone.
Set Bₙ = ∩_{k≥n} Aₖ and Cₙ = ∪_{k≥n} Aₖ. Then, Bₙ ⊂ Aₙ ⊂ Cₙ, and {Bₙ}
and {Cₙ} are monotone sequences. Therefore,

𝒫(lim Bₙ) = lim 𝒫(Bₙ) ≤ lim inf 𝒫(Aₙ)
𝒫(lim Cₙ) = lim 𝒫(Cₙ) ≥ lim sup 𝒫(Aₙ)

Since by definition lim Aₙ = lim Bₙ = lim Cₙ, we have

𝒫(lim Aₙ) ≤ lim inf 𝒫(Aₙ) ≤ lim sup 𝒫(Aₙ) ≤ 𝒫(lim Aₙ)

so that (4.4) is proved. ∎
The following rather obvious consequence of Proposition 4.1 is
known as the Borel-Cantelli lemma. For such an obvious result, it is
amazingly powerful and is a standard tool in proving properties which
are true with probability 1.

Proposition 4.2. For an arbitrary sequence of events {Aₙ}, Σ_{n=1}^∞ 𝒫(Aₙ) < ∞
implies 𝒫(lim sup Aₙ) = 0.
Proof: From Proposition 4.1 we have

𝒫(lim sup Aₙ) = 𝒫(lim_{n→∞} ∪_{k≥n} Aₖ)
= lim_{n→∞} 𝒫(∪_{k≥n} Aₖ) ≤ lim_{n→∞} Σ_{k≥n} 𝒫(Aₖ)

Because Σ_{n=1}^∞ 𝒫(Aₙ) < ∞,

lim_{n→∞} Σ_{k≥n} 𝒫(Aₖ) = 0

which proves the proposition. ∎
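A small simulation (ours, purely illustrative) makes the Borel-Cantelli lemma concrete: when the probabilities 𝒫(Aₙ) are summable, only finitely many of the (here independent) events occur along a sample path, whereas a non-summable choice keeps producing occurrences. Note that Proposition 4.2 itself asserts only the summable half; the divergent case relies on independence.

```python
# Sketch: simulate independent events A_n and record the index of the last
# A_n that occurs.  Summable probabilities (1/n^2) -> finitely many occur;
# non-summable ones (1/n) typically keep occurring up to the horizon.
import random

def last_occurrence(prob, N=200_000, seed=1):
    rng = random.Random(seed)
    last = 0
    for n in range(1, N + 1):
        if rng.random() < prob(n):
            last = n
    return last

print("P(A_n)=1/n^2, last occurrence:", last_occurrence(lambda n: 1 / n**2))
print("P(A_n)=1/n,   last occurrence:", last_occurrence(lambda n: 1 / n))
# Typically the first index is small while the second is near N, consistent
# with P(limsup A_n) = 0 exactly when the series of probabilities converges.
```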


Before proceeding to the consideration of sequences of random
variables, we first recall the definitions of infimum and supremum.
Let S be a set of numbers in [−∞, ∞]. A lower bound b (upper bound)
of S is defined by the property b ≤ x (b ≥ x) for all x in S. The infimum
of S, denoted by inf S, is the greatest lower bound of S, and the supremum
of S, denoted by sup S, is the least upper bound of S. Suppose that
{Xₙ} is a sequence of random variables defined on a common probability
space (Ω, 𝒜, 𝒫). If for every ω ∈ Ω and every n, Xₙ₊₁(ω) ≥ Xₙ(ω), then
{Xₙ} is said to be an increasing sequence. The sequence {Xₙ} is said to be
decreasing if {−Xₙ} is increasing. If a sequence {Xₙ} is either increasing
or decreasing, then it is said to be monotone.
If {Xₙ} is a monotone sequence, then we define its limit as

lim_{n→∞} Xₙ = sup_n Xₙ or lim_{n→∞} Xₙ = inf_n Xₙ

according as {Xₙ} is increasing or decreasing. We note that such limits
may assume values +∞ and −∞. If {Xₙ} is an arbitrary sequence, not
necessarily monotone, we define inferior and superior limits as

lim inf_{n→∞} Xₙ = lim_{n→∞} inf_{k≥n} Xₖ    (4.5)

lim sup_{n→∞} Xₙ = lim_{n→∞} sup_{k≥n} Xₖ    (4.6)

If the inferior and superior limits of {Xₙ} agree, we say that the sequence
{Xₙ} converges and set

lim_{n→∞} Xₙ = lim inf_{n→∞} Xₙ = lim sup_{n→∞} Xₙ    (4.7)

Again, we note that even though Xₙ is finitely valued for each n, the
limit lim Xₙ may assume values ±∞.

If {Xₙ} is a sequence of random variables, then both lim inf Xₙ and
lim sup Xₙ are random variables provided that they are finite at every ω ∈ Ω.
This is because

{ω: lim inf Xₙ(ω) < a} = {ω: sup_n inf_{k≥n} Xₖ(ω) < a}
= ∩_n ∪_{k≥n} {ω: Xₖ(ω) < a}    (4.8)

and

{ω: lim sup Xₙ(ω) < a} = {ω: inf_n sup_{k≥n} Xₖ(ω) < a}
= ∪_n ∩_{k≥n} {ω: Xₖ(ω) < a}    (4.9)
so that every set of the form

{ω: lim inf Xₙ(ω) < a} or {ω: lim sup Xₙ(ω) < a}

is an event. Indeed, if we had defined random variables to be extended
real-valued functions, as is often done, then inferior and superior limits of
a sequence of random variables are always random variables. The following
is an immediate consequence of (4.8) and (4.9).

Proposition 4.3. Let {Xₙ} be a sequence of random variables converging
to a limit X. Suppose that X(ω) is finite at every ω ∈ Ω; then X
is a random variable.
Let A be an event. Define the indicator function I_A as follows:

I_A(ω) = 1 if ω ∈ A, and I_A(ω) = 0 if ω ∉ A    (4.10)

It is obvious that I_A is a random variable. Suppose that X is a function
that can be written as

X = Σ_{p=1}^n αₚ I_{Aₚ}

where α₁, α₂, ..., αₙ are real numbers, and A₁, A₂, ..., Aₙ are
events. Then, X is called a simple random variable.

Proposition 4.4. Every random variable is the limit of a sequence of simple
random variables. Every nonnegative random variable is the limit
of an increasing sequence of nonnegative simple random variables.

Proof: We define {Xₙ} as follows:

Xₙ(ω) = −2ⁿ if X(ω) < −2ⁿ
Xₙ(ω) = k/2ⁿ if X(ω) ∈ [k/2ⁿ, (k+1)/2ⁿ), −2²ⁿ ≤ k ≤ 2²ⁿ − 1
Xₙ(ω) = 2ⁿ if X(ω) ≥ 2ⁿ    (4.11)

For a fixed ω and for n ≥ log₂|X(ω)|, we have

|Xₙ(ω) − X(ω)| ≤ 2⁻ⁿ

so that {Xₙ} converges to X at every ω. If X is nonnegative, then {Xₙ}
as defined by (4.11) is an increasing sequence of nonnegative random
variables. ∎
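The approximating sequence (4.11) is completely constructive; the sketch below evaluates it pointwise and exhibits the bound |Xₙ(ω) − X(ω)| ≤ 2⁻ⁿ, which holds once n ≥ log₂|X(ω)|.

```python
# Sketch: the simple approximations X_n of (4.11), evaluated pointwise.
import math

def X_n(x, n):
    """n-th dyadic simple approximation of the value x = X(omega)."""
    if x < -2**n:
        return -2.0**n
    if x >= 2**n:
        return 2.0**n
    return math.floor(x * 2**n) / 2**n    # the k/2^n with x in [k/2^n, (k+1)/2^n)

x = math.pi
for n in range(2, 7):                      # the bound needs n >= log2|x| ~ 1.65
    print(n, X_n(x, n), abs(X_n(x, n) - x) <= 2.0**-n)   # prints True each time
# For x >= 0 the sequence X_n(x) is nondecreasing in n, as the proof requires.
```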

5. EXPECTATION OF RANDOM VARIABLES


The expectation of a random variable X can be defined as the Stieltjes
integral

EX = ∫_{−∞}^∞ x dP(x)    (5.1)

where P(·) is the probability distribution function for X. Provided that
at least one of the two integrals ∫_0^∞ x dP(x) and −∫_{−∞}^0 x dP(x) is less
than ∞, the integral in (5.1) is well defined. As a definition for the
expectation, (5.1) is less than satisfactory on two counts. First, the definition
is not elementary in that it depends on the definition of a Stieltjes integral.
Secondly, it obscures the fact that EX is really the integral of X on Ω
with respect to the probability measure 𝒫. In other words,

EX = ∫_Ω X(ω) 𝒫(dω)    (5.2)

where the integral needs to be defined. For these reasons, we shall give a
definition for EX by defining the integral in (5.2) for a random variable X.
First, we define EX when X is a simple random variable. By definition,
this means that X has the form

X = Σ_{i=1}^n xᵢ I_{Aᵢ}    (5.3)

where the Aᵢ are events and the xᵢ are real constants. For such X we define

EX = Σ_{i=1}^n xᵢ 𝒫(Aᵢ)    (5.4)

Thus defined, EX satisfies the following properties:

Additivity: E(X + Y) = EX + EY    (5.5a)
Homogeneity: E(cX) = cEX    (5.5b)
Order preservation: X ≥ Y implies EX ≥ EY    (5.5c)

In addition, EX also satisfies the following important property.

Lemma. Let {Xₙ} be a monotone sequence of simple random variables
converging to a simple random variable X. Then

lim_{n→∞} EXₙ = EX

Proof: If {Xₙ} is decreasing, then {Xₙ − X} decreases to zero. If {Xₙ}
is increasing, then {X − Xₙ} decreases to zero. Since expectation is
additive, we only need to prove that if {Xₙ} decreases to zero, then

lim_{n→∞} EXₙ = 0

To do this, we note that for every ε > 0,

0 ≤ EXₙ ≤ (max Xₙ)𝒫(Xₙ > ε) + ε ≤ (max X₁)𝒫(Xₙ > ε) + ε

Since the events {Xₙ > ε} decrease to ∅, 𝒫(Xₙ > ε) → 0 as n → ∞; ε being
arbitrary, this completes the proof of the lemma. ∎


Next, let X be a nonnegative random variable, and let {Xnl be an
increasing sequence of nonnegative simple random variables converging
to X. Since {EXnl is a nondecreasing sequence of nonnegative numbers,
lim EX n = sup EXn always exists, but may be infinite. We shall show
n----+oo n
that two such sequences {Xn}, {Y n }, both converging to X, have the
same limit lim EX n = lim EY n • Hence, we can unambiguously define
n-+ 00 n-+ oc

EX = lim EX n (5.6)
n-HC

for nonnegative random variables. For a general X, we write


X = X+ - X_ (5.7)
where X+ and X_ are both nonnegative and define
EX = EX+ - EX_ (5.8)
provided that the right-hand side ii'l not of the form 00 - 00. It is easy
to verify that, thus defined, EX satisfies properties (5.5). The only thing
that remains to be shown is the uniqueness of the right-hand limit in
(5.6).

Proposition 5.1. Let {Xₙ} and {Yₙ} be two increasing sequences of
nonnegative simple random variables converging to the same limit X.
Then

lim_{n→∞} EXₙ = lim_{n→∞} EYₙ

Proof: Let p be a fixed integer and set

Zₙ = min (Xₙ, Yₚ)

It is clear that {Zₙ} is an increasing sequence of simple random variables.
Since Xₙ → X ≥ Yₚ as n → ∞, we have

lim_{n→∞} Zₙ = Yₚ

Using the lemma, we get

lim_{n→∞} EXₙ ≥ lim_{n→∞} EZₙ = EYₚ

Hence, lim_{n→∞} EXₙ ≥ lim_{p→∞} EYₚ. Reversing the roles of {Xₙ} and {Yₙ}
yields lim_{n→∞} EYₙ ≥ lim_{n→∞} EXₙ and proves the theorem. ∎

The basic properties (5.5) of EX are precisely those of an integral.
When we want to emphasize the nature of EX as an integral, we shall
write

EX = ∫_Ω X(ω) 𝒫(dω)

We shall also use the abbreviation ∫_Ω X d𝒫. A random variable X
is said to be integrable if E|X| < ∞. The following results on sequences
of integrable random variables are counterparts of standard results on
sequences of integrable functions in integration theory. Proofs will be
omitted [Loève, 1963, pp. 124-125].

Proposition 5.2 (Monotone Convergence). Let {Xₙ} be an increasing sequence
of nonnegative random variables converging to X. Then,

EX = lim_{n→∞} EXₙ    (5.9)

Proposition 5.3 (Fatou's Lemma). Let {Xₙ} be a sequence of random
variables. Suppose that there exists an integrable random variable
X such that Xₙ(ω) ≥ X(ω) for all n and ω. Then,

lim inf_{n→∞} EXₙ ≥ E lim inf_{n→∞} Xₙ    (5.10)

Proposition 5.4 (Dominated Convergence). Let {Xₙ} be a sequence of
random variables converging to X. Suppose that there exists
an integrable random variable Y such that |Xₙ(ω)| ≤ Y(ω) for all
n and ω. Then,

lim_{n→∞} EXₙ = EX    (5.11)

Suppose that X = (X₁, ..., Xₙ) are random variables defined
on (Ω, 𝒜, 𝒫) with a joint distribution function P_X. Let f be a nonnegative
Borel function. Then f(X(ω)), ω ∈ Ω, is a nonnegative random variable
defined on (Ω, 𝒜, 𝒫), and f(x), x ∈ Rⁿ, is a random variable defined on
(Rⁿ, ℛⁿ, P_X). Both of the integrals ∫_Ω f(X(ω)) 𝒫(dω) and ∫_{Rⁿ} f(x) P_X(dx)
are well defined, and we have

∫_Ω f(X(ω)) 𝒫(dω) = ∫_{Rⁿ} f(x) P_X(dx)    (5.12)

We leave the proof of (5.12) to Exercise 1.3. If f is not nonnegative, then
we write f = f₊ − f₋ as before, and (5.12) still holds provided that one
of the pair Ef₊(X) and Ef₋(X) is finite. The integral ∫_{Rⁿ} f(x) P_X(dx) is
called a Lebesgue-Stieltjes integral and is often written as ∫_{Rⁿ} f(x) dP_X(x)
to emphasize the role of P_X as a distribution function. However, for the
definition of the integral, it is the role of P_X as a measure which is crucial.
If X = (X₁, ..., Xₙ) are real random variables, the function
F_X(u), u ∈ Rⁿ, defined by

F_X(u) = E exp(i Σ_{k=1}^n uₖXₖ) = E cos(Σ_{k=1}^n uₖXₖ) + iE sin(Σ_{k=1}^n uₖXₖ)    (5.13)

is called the characteristic function for X. Expressed in terms of P_X, the
characteristic function is given by

F_X(u) = ∫_{Rⁿ} exp(i Σ_{k=1}^n uₖxₖ) P_X(dx)    (5.14)

Every distribution function is uniquely determined by its corresponding
characteristic function [compare with Eqs. (3.5.21) and (3.5.32)]. This
is easy to see when P_X is absolutely continuous with density p_X. In
that case,

F_X(u) = ∫_{Rⁿ} exp(i Σ_{k=1}^n uₖxₖ) p_X(x) dx    (5.15)

and p_X can be obtained from F_X by the inversion formula of the Fourier
integral, viz.,

p_X(x) = (1/(2π)ⁿ) ∫_{Rⁿ} exp(−i Σ_{k=1}^n uₖxₖ) F_X(u) du    (5.16)
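In one dimension, (5.15) and (5.16) are a Fourier transform pair and can be verified by quadrature. The sketch below (grid sizes and integration limits are ad hoc choices of ours) computes the characteristic function of a standard Gaussian numerically and inverts it at one point.

```python
# Sketch: 1-d characteristic function F_X(u) = integral e^{iux} p_X(x) dx
# and the inversion p_X(x) = (1/2pi) integral e^{-iux} F_X(u) du,
# checked for the standard Gaussian, where F_X(u) = exp(-u^2/2).
import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)         # density p_X

def F(u):                                          # trapezoidal quadrature of (5.15)
    return np.trapz(np.exp(1j * u * x) * p, dx=dx)

u = np.linspace(-10, 10, 4001)
du = u[1] - u[0]
Fu = np.array([F(ui) for ui in u])
print(np.allclose(Fu.real, np.exp(-u**2 / 2), atol=1e-5))   # matches exp(-u^2/2)

x0 = 0.7                                           # invert (5.16) at one point
p0 = np.trapz(np.exp(-1j * u * x0) * Fu, dx=du).real / (2 * np.pi)
print(p0, "vs", np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi))
```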

6. CONVERGENCE CONCEPTS

Thus far, when we speak of the convergence of a sequence of random
variables, we have been referring to pointwise convergence at every point
in Ω. For this type of convergence, the probability measure 𝒫 plays no
role. In probability theory, convergence concepts which depend on 𝒫
are of greater interest. In this section, we shall define some of these
convergence concepts and explore their interrelation.

Definition. A sequence of random variables {Xₙ} is said to converge
almost surely to X if there exists a set A such that 𝒫(A) = 0, and
for every ω ∉ A

lim_{n→∞} |Xₙ(ω) − X(ω)| = 0    (6.1)

We abbreviate the words "almost sure" and "almost surely" to a.s., and
we adopt the equivalent notations lim a.s. Xₙ = X and Xₙ →^{a.s.} X to
denote the a.s. convergence of {Xₙ} to X. We observe that as we have
defined it, the limit of an a.s. convergent sequence of random variables is
always finite except on a set of probability zero. Furthermore, two functions
which are both a.s. limits of the same sequence are equal except
on a set of probability zero. With these considerations, we see that we
can always take the limit X of an a.s. convergent sequence of random
variables {Xₙ} to be a random variable, i.e., it is finite and measurable.
We note that the convergence theorems for expectation, Propositions
5.2 to 5.4, remain valid if in the statements of these theorems
"convergence at every point" is replaced by "convergence almost surely."
This follows simply from the fact that if 𝒫(A) = 0, then ∫_A X d𝒫 = 0.
Thus, if Xₙ →^{a.s.} X, and we denote by A the set on which convergence
does not take place, then ∫_{Ω−A} Xₙ d𝒫 → ∫_{Ω−A} X d𝒫 is the same as
∫_Ω Xₙ d𝒫 → ∫_Ω X d𝒫.
Given a sequence {Xₙ} which may or may not be almost surely
convergent, it is difficult to apply the definition of a.s. convergence to
it, because we have no candidate for the limit X. For this reason, the
concept of mutual convergence is useful. We say that a sequence {Xₙ}
converges mutually a.s. if sup_{m≥n} |Xₘ − Xₙ| →^{a.s.} 0 as n → ∞. By virtue
of the Cauchy criterion for the convergence of a sequence of real numbers,
for each ω, we have sup_{m≥n} |Xₘ(ω) − Xₙ(ω)| → 0 as n → ∞ if and only
if the sequence of real numbers {Xₙ(ω)} converges. Therefore, mutual a.s.
convergence and a.s. convergence are equivalent, and we have the following.

Proposition 6.1. A sequence of random variables {Xₙ} converges a.s. if
and only if it converges mutually a.s.
Definition. A sequence of random variables {Xₙ} is said to converge
mutually in probability if for every ε > 0,

sup_{m≥n} 𝒫(|Xₘ − Xₙ| ≥ ε) → 0 as n → ∞    (6.2)

A sequence {Xₙ} is said to converge in probability if there exists X such
that for every ε > 0,

𝒫(|Xₙ − X| ≥ ε) → 0 as n → ∞    (6.3)

We use the notations lim in p. Xₙ = X or Xₙ →^{in p.} X to denote that
{Xₙ} converges in probability to X.

Proposition 6.2. Let {Xₙ} be a sequence of random variables:
(a) If {Xₙ} converges a.s., then it converges in probability to the
same limit.
(b) {Xₙ} converges in probability if and only if it converges mutually
in probability.
(c) If {Xₙ} converges in probability, then there is a subsequence
converging a.s. to the same limit.

Remark: The only part of Proposition 6.2 that is easy to prove is (a).
If Xₙ →^{a.s.} X, then

𝒫(sup_{m≥n} |Xₘ − X| ≥ ε) → 0 as n → ∞    (6.4)

for every ε > 0, which certainly implies that

𝒫(|Xₙ − X| ≥ ε) → 0 as n → ∞    (6.5)

for every ε > 0. The proof for parts (b) and (c) will be omitted.
There is a third type of convergence which is important to us. We
define it as follows.

Definition. A sequence of random variables {Xₙ} is said to converge in
vth mean (v > 0) to X if

E|Xₙ − X|ᵛ → 0 as n → ∞    (6.6)

We use the notation lim v.m. Xₙ = X or Xₙ →^{v.m.} X.

Remark: The case v = 2 is of particular importance and is known as
convergence in quadratic mean. We abbreviate quadratic mean
by q.m.
Proposition 6.3.
(a) If {Xₙ} converges in vth mean, then it converges in probability
to the same limit.
(b) {Xₙ} converges in vth mean if and only if

sup_{m≥n} E|Xₘ − Xₙ|ᵛ → 0 as n → ∞    (6.7)

Proof:
(a) We make use of the following rather obvious inequality known as
the Markov inequality:

E|Z|ᵛ = ∫_Ω |Z|ᵛ d𝒫 ≥ ∫_{|Z|≥ε} |Z|ᵛ d𝒫 ≥ εᵛ 𝒫(|Z| ≥ ε)    (6.8)

Therefore, if {Xₙ} converges in vth mean to X, then

𝒫(|Xₙ − X| ≥ ε) ≤ (1/εᵛ) E|Xₙ − X|ᵛ → 0 as n → ∞

which proves that Xₙ →^{in p.} X.
(b) We first suppose that {Xₙ} converges in vth mean to X. Then,

sup_{m≥n} E|Xₘ − Xₙ|ᵛ ≤ 2ᵛ {E|Xₙ − X|ᵛ + sup_{m≥n} E|Xₘ − X|ᵛ} → 0 as n → ∞

Conversely, suppose that (6.7) holds. Then, by virtue of the Markov
inequality (6.8), {Xₙ} also converges mutually in probability. It follows
from Proposition 6.2 that there is a subsequence {X′ₘ} converging almost
surely, and we denote the limit by X. Using Fatou's lemma, we find

E|Xₙ − X|ᵛ ≤ lim inf_{m→∞} E|Xₙ − X′ₘ|ᵛ

Since {X′ₘ} is a subsequence of {Xₙ}, we have

lim_{n→∞} lim inf_{m→∞} E|Xₙ − X′ₘ|ᵛ = lim_{m,n→∞} E|Xₘ − Xₙ|ᵛ

which is zero, because {Xₙ} converges mutually in v.m. Hence,

lim_{n→∞} E|Xₙ − X|ᵛ = 0

and {Xₙ} converges in v.m. to X. ∎


If we are given the pairwise joint distributions of a sequence of
random variables {Xₙ}, we can determine whether {Xₙ} converges in
probability and whether it converges in vth mean. This is simply because
if we know the joint distribution P_{mn} of Xₘ and Xₙ, we can compute
E|Xₘ − Xₙ|ᵛ and 𝒫(|Xₘ − Xₙ| ≥ ε). Thus, mutual convergence in v.m.
and in probability can be determined, hence, also convergence. On the
other hand, to determine whether {Xₙ} converges a.s. generally requires
that we know the joint distribution of all finite subsets of random variables
from the sequence {Xₙ}. There are, however, sufficient conditions on
pairwise distributions which ensure a.s. convergence. We state one of the
simplest and most useful of such conditions as follows.

Proposition 6.4. Suppose that for every ε > 0,

Σ_n sup_{m≥n} 𝒫(|Xₘ − Xₙ| ≥ ε) < ∞    (6.9)

Then {Xₙ} converges almost surely.

Proof: Since (6.9) implies sup_{m≥n} 𝒫(|Xₘ − Xₙ| ≥ ε) → 0 as n → ∞, the
sequence {Xₙ} converges in probability. Let X denote the limit and define

Aₙ^ε = {ω: |X(ω) − Xₙ(ω)| ≥ 2ε}

Since Aₙ^ε ⊂ {ω: max (|X(ω) − Xₘ(ω)|, |Xₘ(ω) − Xₙ(ω)|) ≥ ε}, we have

𝒫(Aₙ^ε) ≤ 𝒫(|X − Xₘ| ≥ ε) + 𝒫(|Xₘ − Xₙ| ≥ ε)

By letting m → ∞, we get

𝒫(Aₙ^ε) ≤ sup_{m≥n} 𝒫(|Xₘ − Xₙ| ≥ ε)

It follows from (6.9) that Σ_n 𝒫(Aₙ^ε) < ∞, and it follows from the
Borel-Cantelli lemma (Proposition 4.2) that

𝒫(lim sup_n Aₙ^ε) = 0

so that for every ε > 0, |Xₙ − X| ≥ 2ε for, at most, a finite number of
values of n with probability 1. If we take A = ∪_{k≥1} lim sup_n Aₙ^{1/k}, then
𝒫(A) = 0 and ω ∉ A implies that

lim_{n→∞} |Xₙ(ω) − X(ω)| = 0

proving the theorem. ∎
The following example illustrates some of the ideas introduced in
this section. Suppose {Xₙ} is a sequence of random variables such that

𝒫(Xₘ − Xₙ < a) = (1/√(2πσₘₙ²)) ∫_{−∞}^a exp(−x²/2σₘₙ²) dx    (6.10)

and σₘₙ² = |m − n|/mn. We want to investigate the convergence of
{Xₙ}. First, we compute E|Xₘ − Xₙ|² and find

E|Xₘ − Xₙ|² = (1/√(2πσₘₙ²)) ∫_{−∞}^∞ x² exp(−x²/2σₘₙ²) dx = σₘₙ² = |m − n|/mn    (6.11)

Therefore,

lim_{n→∞} sup_{m≥n} E|Xₘ − Xₙ|² = lim_{n→∞} (1/n) = 0

and we have shown that {Xₙ} converges in quadratic mean. It follows
that it also converges in probability.
To prove a.s. convergence is more difficult. First, we note the formula

∫_{−∞}^∞ x^{2k+2} exp(−tx²) dx = −(1/2t) ∫_{−∞}^∞ x^{2k+1} (d/dx)[exp(−tx²)] dx
= ((2k+1)/2t) ∫_{−∞}^∞ x^{2k} exp(−tx²) dx

Therefore,

E|Xₘ − Xₙ|⁴ = 3 (|m − n|/mn)²
E|Xₘ − Xₙ|⁶ = 15 (|m − n|/mn)³    (6.12)

Using the Markov inequality, we have

sup_{m≥n} 𝒫(|Xₘ − Xₙ| ≥ ε) ≤ sup_{m≥n} E|Xₘ − Xₙ|⁴/ε⁴ ≤ 3/(n²ε⁴)

Therefore, for every ε > 0,

Σ_n sup_{m≥n} 𝒫(|Xₘ − Xₙ| ≥ ε) < ∞

and it follows from Proposition 6.4 that {Xₙ} converges almost surely.
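One concrete sequence consistent with (6.10) is Xₙ = W(1/n) for a standard Brownian motion W, since then Xₘ − Xₙ is Gaussian with variance |1/m − 1/n| = |m − n|/mn; this identification is ours and is made only to allow a simulation. The sample paths show sup_{m≥n} |Xₘ − Xₙ| shrinking, in line with almost sure convergence.

```python
# Sketch: X_n = W(1/n) for a Brownian motion W has Gaussian increments with
# variance |m - n|/(m n), as in (6.10).  Watch sup_{m >= n} |X_m - X_n| decay.
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
t_asc = 1.0 / np.arange(N, 0, -1)              # times 1/N, ..., 1/2, 1 (increasing)
dW = rng.standard_normal(N) * np.sqrt(np.diff(np.concatenate(([0.0], t_asc))))
W_asc = np.cumsum(dW)                          # W sampled at the increasing times
X = W_asc[::-1]                                # X[n-1] = W(1/n), n = 1, ..., N

for n in (10, 100, 1000, 10000):
    print(n, np.abs(X[n - 1:] - X[n - 1]).max())   # sup_{m>=n} |X_m - X_n| shrinks
```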


As the final topic in this section, we consider the distribution function
of the limit of a convergent sequence of random variables.

Proposition 6.5. Let {Xₙ} converge in probability to X. Let Pₙ(·) denote
the distribution function of Xₙ, and let P(·) denote the distribution
function of X. Then, at every continuity point x of P(·),

lim_{n→∞} Pₙ(x) = P(x)    (6.13)
Proof: We write

P(x − ε) = 𝒫(X < x − ε, Xₙ < x) + 𝒫(X < x − ε, Xₙ ≥ x)
Pₙ(x) = 𝒫(X < x − ε, Xₙ < x) + 𝒫(X ≥ x − ε, Xₙ < x)

Therefore,

P(x − ε) ≤ Pₙ(x) + 𝒫(|X − Xₙ| ≥ ε)

Similarly, we find

Pₙ(x) ≤ P(x + ε) + 𝒫(|X − Xₙ| ≥ ε)

Because {Xₙ} converges to X in probability, we have

P(x − ε) ≤ lim inf_{n→∞} Pₙ(x) ≤ lim sup_{n→∞} Pₙ(x) ≤ P(x + ε)

Because P(·) is continuous at x, we get (6.13) by letting ε → 0. ∎

Remark: We note that a distribution function is completely determined
by its values at points of continuity. Because of this, P(·) is
determined everywhere by the sequence {Pₙ(·)}. The proof that a
distribution function P(·) is completely determined by its values at
points of continuity can be outlined as follows: Because P(·) is
nondecreasing and bounded by 1, the number of jumps of size 1/n
or greater must be less than n. Hence, the points of discontinuity
are, at most, countable. This means that points of continuity are
dense in (−∞, ∞), that is, every x ∈ (−∞, ∞) is the limit of a sequence
of continuity points. Because P(·) is left continuous at every x,
it is completely determined by its values at points of continuity.

7. INDEPENDENCE AND CONDITIONAL EXPECTATION

Two events A and B are said to be independent if

𝒫(A ∩ B) = 𝒫(A)𝒫(B)    (7.1)

N events A₁, A₂, ..., A_N are said to be independent if for any subset
{k₁, ..., k_r} of {1, 2, ..., N},

𝒫(∩_{i=1}^r A_{kᵢ}) = ∏_{i=1}^r 𝒫(A_{kᵢ})    (7.2)

More generally, we say that a family (countable or not) of events is a
family of independent events if every finite subfamily is a set of independent
events. If A and B are events such that 𝒫(B) > 0, then we define
the conditional probability of A given B by

𝒫(A|B) = 𝒫(A ∩ B)/𝒫(B)    (7.3)

If A and B are independent, then clearly

𝒫(A|B) = 𝒫(A)    (7.4)

We say that the random variables X₁, ..., X_N are independent
if for arbitrary constants x₁, ..., x_N, the events {X₁ < x₁}, ...,
{X_N < x_N} are independent. If we denote the distribution function of
Xᵢ by Pᵢ(·) and the joint distribution function of X₁, ..., X_N by P(·),
then X₁, ..., X_N are independent if and only if

P(x₁, ..., x_N) = ∏_{i=1}^N Pᵢ(xᵢ)    (7.5)

We leave the proof of this fact as an exercise. More generally, we define
an infinite and possibly uncountable collection of random variables to
be independent if every finite subset is independent.
Let X₁, X₂, ..., X_N and Y be random variables. We define the
conditional distribution function of Y relative to X₁, ..., X_N as follows:
1. If x₁, x₂, ..., x_N are such that 𝒫(X₁ = x₁, ..., X_N = x_N) > 0,
we set

P(y|x₁, ..., x_N) = 𝒫(Y < y | X₁ = x₁, ..., X_N = x_N)    (7.6)

2. If X₁, ..., X_N and Y have a joint density function p, we set

P(y|x₁, ..., x_N) = ∫_{−∞}^y p(x₁, ..., x_N, η) dη / ∫_{−∞}^∞ p(x₁, ..., x_N, η) dη    (7.7)

for all (x₁, ..., x_N) such that the denominator is positive.
3. If neither of the above conditions is satisfied, P(y|x₁, ..., x_N)
remains undefined for the present.

If the numerator and the denominator in (7.7) are continuous at
(x₁, ..., x_N), then it is not hard to see that

P(y|x₁, ..., x_N) = lim_{εᵢ↓0, i=1,...,N} 𝒫(Y < y | xᵢ ≤ Xᵢ < xᵢ + εᵢ, i = 1, ..., N)    (7.8)
The function defined by

p(y|x₁, ..., x_N) = p(x₁, ..., x_N, y) / ∫_{−∞}^∞ p(x₁, ..., x_N, η) dη    (7.9)

is called the conditional density function of Y relative to X₁, ..., X_N,
and it has the interpretation

p(y|x₁, ..., x_N) dy = 𝒫(y ≤ Y < y + dy | X₁ = x₁, ..., X_N = x_N)    (7.10)

Conditional distributions involve rather cumbersome notations, and
their use is limited to situations involving only a finite number of random
variables. In more advanced applications of probability theory, they
are all but abandoned in favor of the simpler yet more powerful concept
of conditional expectation.
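Before leaving conditional densities, here is a concrete instance of (7.9): let (X, Y) be bivariate normal with correlation ρ. The sketch below forms p(y|x) by numerically normalizing the joint density and compares it with the classical answer, Y given X = x distributed N(ρx, 1 − ρ²) (a standard fact, not derived above); the grid and the values of ρ and x₀ are our own choices.

```python
# Sketch: conditional density p(y|x) = p(x, y) / integral p(x, eta) d eta,
# as in (7.9), for a standard bivariate normal with correlation rho.
import numpy as np

rho = 0.6
def p_joint(x, y):                                # joint density p(x, y)
    q = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

x0 = 1.2
eta = np.linspace(-10, 10, 4001)
denom = np.trapz(p_joint(x0, eta), eta)           # the normalizing integral
p_cond = p_joint(x0, eta) / denom                 # (7.9) evaluated at x = x0

# Known result: Y | X = x0 is N(rho * x0, 1 - rho^2).
target = np.exp(-(eta - rho * x0)**2 / (2 * (1 - rho**2))) \
         / np.sqrt(2 * np.pi * (1 - rho**2))
print(np.max(np.abs(p_cond - target)))            # ~ 0 up to quadrature error
```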
Before defining the conditional expectation, we shall try to get an
intuitive picture of what it is. Suppose that 𝒜 is a σ algebra of subsets of
some basic space Ω. We call A an atom of 𝒜 if A ∈ 𝒜, and no subset of A
belongs to 𝒜 other than A itself and the empty set ∅. Atoms are a kind
of irreducible units of a σ algebra. It is obvious that two distinct atoms
must be disjoint. Suppose that there is a countable collection of sets
B₁, B₂, ..., which generates 𝒜, that is, 𝒜 is the minimal σ algebra
containing all the Bᵢ's. Then 𝒜 is said to be separable. Atoms of a separable
σ algebra have some important properties that we summarize as follows.

Proposition 7.1. Let 𝒜 be a separable σ algebra generated by a countable
collection of sets B₁, B₂, ....
(a) The collection of all atoms of 𝒜 can be indexed by a subset of the
real line. We denote the index set by T (T ⊂ R) and denote the
collection of atoms by {Aₜ, t ∈ T}.
(b) Every set B in 𝒜 is a union of atoms.
(c) Ω = ∪_{t∈T} Aₜ.

Proof: Take the generating sets B₁, B₂, ... and define B̃ᵢ to be Bᵢ or
its complement. Now, the atoms of 𝒜 can be constructed by forming the
countable intersections ∩ᵢ B̃ᵢ, because for any atom A, the intersection
A ∩ Bᵢ must be either A or empty. Since each atom, thus constructed,
is indexed by a sequence (possibly infinite) of binary numbers, the index
set T can be taken to be a subset of the real line. Each ω belongs to
either Bᵢ or its complement for each i. Hence, each ω belongs to one and
only one Aₜ so that Ω = ∪_{t∈T} Aₜ. Each set B in 𝒜 is the union of all
intersections ∩ᵢ B̃ᵢ such that B ∩ B̃ᵢ is nonempty for every i. Hence, B
is a union of atoms. ∎
On an atom of 𝒜, an 𝒜-measurable function X can only assume a
single value. Let A be an atom, and let x be a real number; then {ω:
X(ω) = x} must be in 𝒜 and so must A ∩ {ω: X(ω) = x}. By definition,
A being an atom implies

A ∩ {ω: X(ω) = x} = ∅ or A

Now suppose that 𝒜₁ and 𝒜₂ are two separable σ algebras such that
𝒜₁ ⊃ 𝒜₂. Since every atom of 𝒜₂ is a set in 𝒜₁, every atom of 𝒜₂ is a
union of atoms of 𝒜₁. Therefore, the atoms of 𝒜₂ are bigger than the
atoms of 𝒜₁, and the collection of atoms of 𝒜₂ gives a coarser partition
of Ω than the corresponding collection from 𝒜₁. A function measurable
with respect to 𝒜₂ is necessarily measurable with respect to 𝒜₁, but not
conversely, because an 𝒜₁-measurable function may take on more than
one value on an atom of 𝒜₂.
Let X be a random variable defined on a probability space (Ω, 𝒜, 𝒫).
We can write X as the difference X⁺ − X⁻ of a pair of nonnegative
random variables X⁺ and X⁻. We assume that at least one of the pair
EX⁺ and EX⁻ is finite so that EX is well defined. Let 𝒜′ be a sub-σ algebra
of 𝒜. Roughly speaking, the conditional expectation of X with respect
to 𝒜′ is an 𝒜′-measurable random variable obtained by averaging X
on the atoms of 𝒜′. The precise definition is as follows.

Definition. E^{𝒜′}X is uniquely defined up to sets in 𝒜′ of probability zero by the following two conditions:
(a) E^{𝒜′}X is measurable with respect to 𝒜′.
(b) Let I_A denote the indicator function of A. Then
E I_A E^{𝒜′}X = E I_A X     for all A in 𝒜′   (7.11)
The existence of E^{𝒜′}X as a possibly extended real-valued function is guaranteed by the Radon-Nikodym theorem, one version of which we state as follows.

Proposition 7.2. Let (Ω,𝒜,𝒫) be a probability space, and let μ be a nonnegative σ-additive set function on (Ω,𝒜), that is, μ is a measure. Suppose that for every A in 𝒜 such that 𝒫(A) = 0, we also have μ(A) = 0. Then, there exists a nonnegative, 𝒜-measurable, extended real-valued function φ such that for every A ∈ 𝒜,

μ(A) = ∫_A φ d𝒫   (7.12)

Furthermore, φ is unique up to sets of 𝒫-measure zero.



Remark: The function φ is called the Radon-Nikodym derivative of μ with respect to 𝒫 and is sometimes denoted by dμ/d𝒫.

We now apply the Radon-Nikodym theorem to the problem of defining conditional expectation. Suppose that X is a random variable on (Ω,𝒜,𝒫), and 𝒜′ is a sub-σ algebra of 𝒜. We write X = X⁺ − X⁻ as usual and assume that at least one of the pair EX⁺ and EX⁻ is finite. Now, define measures μ⁺ and μ⁻ on (Ω,𝒜′) by

μ⁺(B) = E I_B X⁺
μ⁻(B) = E I_B X⁻     B ∈ 𝒜′   (7.13)

If B ∈ 𝒜′ and 𝒫(B) = 0, then

μ⁺(B) = ∫_B X⁺ d𝒫 = 0
μ⁻(B) = ∫_B X⁻ d𝒫 = 0

Therefore, there exist 𝒜′-measurable functions φ⁺ and φ⁻ such that for all B in 𝒜′,

μ⁺(B) = ∫_B φ⁺ d𝒫     μ⁻(B) = ∫_B φ⁻ d𝒫   (7.14)

If we set

E^{𝒜′}X = E^{𝒜′}X⁺ − E^{𝒜′}X⁻ = φ⁺ − φ⁻   (7.15)

then E^{𝒜′}X will have the defining properties, and uniqueness follows from the uniqueness of φ⁺ and φ⁻.
If EX⁺ and EX⁻ are both finite, that is, if E|X| is finite, then E^{𝒜′}X can always be taken to be finite valued so that it is a random variable as we have defined it. If not, E^{𝒜′}X may have to assume values of ±∞ and is a random variable only in an extended sense. Of course, if we had defined random variables to be extended-valued functions to begin with, this difficulty would be avoided. This is done by many authors. However, this approach also has its disadvantages. For example, extended-valued functions cannot always be added, because the sum may involve ∞ − ∞. We shall continue to define random variables as real-valued functions. When the need arises, we shall make free use of extended-valued measurable functions. While they are not random variables in our sense, the difference is seldom important.
Roughly speaking, a conditional expectation has (almost surely) all the properties of an expectation. We make this precise by the following proposition.

Proposition 7.3. If X = c a.s., then E^{𝒜′}X = c a.s., and if X ≥ Y a.s., then E^{𝒜′}X ≥ E^{𝒜′}Y a.s. Furthermore, E^{𝒜′} is a linear operation, that is,

E^{𝒜′}(aX + bY) = a E^{𝒜′}X + b E^{𝒜′}Y     a.s.   (7.16)

Proof: Everything follows directly from the definition of E^{𝒜′}.


Conditional expectation also has the convergence properties of expectation. The results corresponding to Propositions 5.2 to 5.4 and a result concerning convergence in ν.m. are stated in the following proposition, but the proof will be omitted.

Proposition 7.4 (Convergence Properties).
(a) If X_n → X in ν.m. as n → ∞ and ν ≥ 1, then

E^{𝒜′}X_n → E^{𝒜′}X in ν.m. as n → ∞   (7.17)

(b) If {X_n} is a monotone sequence of integrable random variables converging to X, then {E^{𝒜′}X_n} is almost surely monotone and

E^{𝒜′}X_n → E^{𝒜′}X a.s. as n → ∞   (7.18)

(c) Let {X_n} be a sequence of integrable random variables. Suppose that there exists an integrable random variable X such that X_n ≥ X for all n; then

E^{𝒜′}(lim inf_{n→∞} X_n) ≤ lim inf_{n→∞} E^{𝒜′}X_n     a.s.   (7.19)

(d) Suppose that X_n → X a.s. and |X_n| ≤ Y for some integrable Y. Then

E^{𝒜′}X_n → E^{𝒜′}X a.s. as n → ∞   (7.20)

Conditional expectation is really a smoothing operation. Roughly speaking, E^{𝒜′}X is obtained by averaging X on the atoms of 𝒜′. Proposition 7.5 below summarizes some of the properties related to the fact that E^{𝒜′}X is a smoothed version of X.

Proposition 7.5 (Smoothing Properties).
(a) If B is an atom of 𝒜′ and 𝒫(B) > 0, then the value of E^{𝒜′}X on B is given by

(E^{𝒜′}X)_B = (1/𝒫(B)) ∫_B X d𝒫   (7.21)

(b) If every event in 𝒜′ is independent of every event of the form {ω: X(ω) < x}, then
E^{𝒜′}X = EX a.s.   (7.22)
In particular, if we denote 𝒜_0 = {Ω,∅}, then E^{𝒜_0}X = EX a.s.
(c) If Y is 𝒜′ measurable, then
E^{𝒜′}YX = Y E^{𝒜′}X a.s.   (7.23)
In particular, E^{𝒜′}Y = Y almost surely, and for any random variable X, E^{𝒜}X = X a.s.
(d) If 𝒜_1 ⊃ 𝒜_2, then

E^{𝒜_2}E^{𝒜_1}X = E^{𝒜_2}X     a.s.   (7.24)

Proof:
(a) If B is an atom of 𝒜′, then E^{𝒜′}X must be constant on B, because E^{𝒜′}X is 𝒜′ measurable. By definition,
E I_B E^{𝒜′}X = E I_B X
Since E^{𝒜′}X is constant on B, we also have
E I_B E^{𝒜′}X = (E I_B)(E^{𝒜′}X)_B
Hence, (E^{𝒜′}X)_B = (1/E I_B) E I_B X, which proves (a).
(b) Let B ∈ 𝒜′. Then I_B and X are independent. Therefore,
E I_B E^{𝒜′}X = E I_B X = (EX)(E I_B)
On the other hand, EX is 𝒜′ measurable and
E(I_B EX) = (EX)(E I_B)
Hence, E^{𝒜′}X = EX a.s. by virtue of the uniqueness of E^{𝒜′}X.
(c) If B is an event in 𝒜′, and Y = I_B, then for every A ∈ 𝒜′,
E I_A E^{𝒜′}YX = E I_A I_B X = E I_{A∩B} X
E I_A Y E^{𝒜′}X = E I_A I_B E^{𝒜′}X = E I_{A∩B} X

so that (7.23) follows. By linearity, (7.23) must also be true when Y is a linear combination of indicator functions (that is, Y is a simple function). For the rest, we note that by virtue of linearity, we only need to prove (7.23) for the case when both X and Y are nonnegative. Since Y is nonnegative, we can find an increasing sequence of nonnegative simple functions {Y_n} converging to Y. Because (7.23) is true for simple functions,
Y_n E^{𝒜′}X = E^{𝒜′}XY_n

From Proposition 5.2 we have for every A ∈ 𝒜′,

E I_A Y_n E^{𝒜′}X → E I_A Y E^{𝒜′}X as n → ∞
E I_A E^{𝒜′}XY_n = E I_A XY_n → E I_A XY = E I_A E^{𝒜′}XY as n → ∞

whence (7.23) follows.
(d) Suppose that 𝒜_1 ⊃ 𝒜_2 and A ∈ 𝒜_2. Then I_A is both 𝒜_2 measurable and 𝒜_1 measurable. Now for every A ∈ 𝒜_2,
E I_A E^{𝒜_2}E^{𝒜_1}X = E I_A E^{𝒜_1}X = E I_A X = E I_A E^{𝒜_2}X
which proves (7.24). ∎
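On a finite probability space the smoothing property (7.21) is transparent: the atoms of a sub-σ algebra form a partition, and E^{𝒜′}X is obtained by averaging X over each atom. The following minimal sketch in Python (the eight-point space, the values of X, and the particular partition are illustrative assumptions, not taken from the text) computes E^{𝒜′}X this way and checks the defining property (7.11):

import numpy as np

# Finite sample space with 8 equally likely points; X is a random variable on it.
probs = np.full(8, 1.0 / 8.0)
X = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0])

# A sub-sigma algebra is generated by a partition into atoms; here 3 atoms.
atoms = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]

# E^{A'}X is constant on each atom, equal to (1/P(B)) * integral of X over B, cf. (7.21).
EX_cond = np.empty_like(X)
for B in atoms:
    EX_cond[B] = np.sum(probs[B] * X[B]) / np.sum(probs[B])

# Defining property (7.11): E[I_B * E^{A'}X] = E[I_B * X] for every atom B.
for B in atoms:
    assert np.isclose(np.sum(probs[B] * EX_cond[B]), np.sum(probs[B] * X[B]))
print(EX_cond)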
We define two σ algebras 𝒜_1 and 𝒜_2 to be independent if, whenever B_1 ∈ 𝒜_1 and B_2 ∈ 𝒜_2,

𝒫(B_1 ∩ B_2) = 𝒫(B_1)𝒫(B_2)   (7.25)

Conditional independence can be defined in a similar way as follows: Two σ algebras 𝒜_1 and 𝒜_2 are said to be conditionally independent given 𝒜′ if, whenever B_1 ∈ 𝒜_1 and B_2 ∈ 𝒜_2, we have

E^{𝒜′}I_{B_1}I_{B_2} = E^{𝒜′}I_{B_1} E^{𝒜′}I_{B_2}   (7.26)

Conditional independence plays a vital role in the theory of Markov processes.
Suppose that X and Y are two random variables, and E|Y| < ∞. Let 𝒜_X denote the smallest σ algebra with respect to which X is measurable. Now, we want to show that E^{𝒜_X}Y can always be expressed as a Borel function of X, that is, there exists a real-valued Borel function f(x), x ∈ R, such that

E^{𝒜_X}Y = f(X)     a.s.   (7.27)

The notation E^{𝒜_X}Y is unnecessarily cumbersome, and one usually writes E(Y|X) instead.
To show (7.27), we begin by recalling the probability measure P_X on the σ algebra of Borel sets ℬ defined by
P_X(B) = 𝒫({ω: X(ω) ∈ B})   (7.28)
Let I_B(x), x ∈ R, and B ∈ ℬ, be the indicator function
I_B(x) = 1 if x ∈ B, 0 if x ∉ B   (7.29)

It is clear that E I_B(X) Y = 0 whenever 𝒫(X ∈ B) = P_X(B) = 0. Therefore, by writing Y = Y⁺ − Y⁻ as the difference of two nonnegative functions and applying the Radon-Nikodym theorem, we find that there exists a Borel function f such that

E I_B(X) Y = ∫_B f(x) P_X(dx)     B ∈ ℬ
= ∫_{{ω: X(ω)∈B}} f(X(ω)) 𝒫(dω)   (7.30)

By the definition of E^{𝒜_X}Y, we also have

E I_B(X) Y = ∫_{{ω: X(ω)∈B}} (E^{𝒜_X}Y)(ω) 𝒫(dω)   (7.31)

Because 𝒜_X is generated by X, every event A in 𝒜_X is of the form A = {ω: X(ω) ∈ B} for some B ∈ ℬ (see Exercise 16). A comparison of (7.30) and (7.31) now yields (7.27).
Exactly the same procedure can now be used to show that if 𝒜_X is the smallest σ algebra with respect to which X = (X_1, X_2, . . . , X_n) are all measurable, then
E^{𝒜_X}Y = f(X_1, . . . , X_n)     a.s.   (7.32)
where Y satisfies E|Y| < ∞, and f: Rⁿ → R is a Borel function. Again, we shall often write E(Y|X) instead of E^{𝒜_X}Y.
Earlier in this section we defined the conditional distribution function P(y|x_1, . . . , x_n) [see (7.6) and (7.7)] and interpreted it as
P(y|x_1, . . . , x_n) = 𝒫(Y < y | X_ν = x_ν, ν = 1, . . . , n)   (7.33)
However, P(y|x_1, . . . , x_n) was defined only if either of two conditions is satisfied. We can now remove this restriction. Let I_{{Y<y}} denote the indicator function
I_{{Y<y}}(ω) = 1 if Y(ω) < y, 0 otherwise   (7.34)
According to (7.32) there exists a Borel function, say g_y(x), x ∈ Rⁿ, such that
E(I_{{Y<y}}|X) = g_y(X)   (7.35)
We now define
P(y|x_1, . . . , x_n) = g_y(x)   (7.36)
By exactly the same procedure, we can define the joint distribution function
P(y_1, . . . , y_m|x_1, . . . , x_n)
= 𝒫(Y_j < y_j, j = 1, . . . , m | X_k = x_k, k = 1, . . . , n)   (7.37)

It is not hard to verify that (7.36) reduces to (7.6) and (7.7) under the corresponding conditions.
We close this chapter with an important result which is used repeatedly in Chapters 6 and 7. Let 𝒳 be a vector space of bounded real-valued functions on a set Ω. 𝒳 is said to be closed under uniform (resp. bounded monotone) convergence if whenever a sequence of f_n in 𝒳 converges uniformly to a function f (resp. whenever there is a pointwise monotone increasing sequence of functions f_n such that the functions are uniformly bounded above by a constant and f is the limit) it holds that f is in 𝒳.

Proposition 7.6 (Monotone Class Theorem). Let 𝒳 be a vector space of bounded real-valued functions defined on Ω which contains the constants and which is closed under both uniform and bounded monotone convergence. Let 𝒞 be a subset of 𝒳 which is closed under multiplication. Then 𝒳 contains all bounded functions measurable with respect to the σ-algebra generated by functions in 𝒞. (See [Meyer and Dellacherie, 1978] for a proof.)

EXERCISES

1. Let Ω = [0,∞).

(a) Let C_1 be the class of all intervals of the form [0,a). Show that C_1 is not an algebra.

(b) Let C_2 be the class of all unions of a finite number of intervals of the form [a,b). Show that C_2 is an algebra, but not a σ algebra.

(c) Show that C_2 is the smallest algebra which contains C_1.

(d) Show that the σ algebra generated by C_2 contains all intervals in [0,∞), closed or open at either end.

2. Suppose that Ω = [0,∞), and C is the class of all intervals of the form [0,a), 0 < a ≤ ∞. Let P(x), 0 ≤ x < ∞, be a left-continuous nondecreasing function such that P(0) = 0 and lim_{x→∞} P(x) = 1. We define 𝒫 on C by

𝒫([0,a)) = P(a)

(a) Show that σ(C) includes all intervals in [0,∞).

(b) How must 𝒫((a,b)), 𝒫([a,b)), and 𝒫([a,b]) be defined in terms of P in order for 𝒫 to be σ additive?

3. Let X = (X_1, . . . , X_n) be random variables defined on a probability space (Ω,𝒜,𝒫). We can uniquely define a probability measure P_X on (Rⁿ,ℬⁿ) by

P_X({x: α_ν ≤ x_ν < β_ν, ν = 1, . . . , n}) = 𝒫({ω: α_ν ≤ X_ν(ω) < β_ν, ν = 1, . . . , n})

Let f: Rⁿ → R be a nonnegative Borel function.

(a) Show that for arbitrary a and b

P_X({x: a ≤ f(x) < b}) = 𝒫({ω: a ≤ f(X(ω)) < b})

(b) Both ∫_{Rⁿ} f(x) P_X(dx) and ∫_Ω f(X(ω)) 𝒫(dω) are well defined by the procedure given in Sec. 1.5. Show that they are equal.
We note that ∫_{Rⁿ} f(x) P_X(dx) so defined is called a Lebesgue-Stieltjes integral and is sometimes written as ∫_{Rⁿ} f(x) dP(x). By writing f = f⁺ − f⁻, we can extend the definition to functions which are not necessarily nonnegative.

4. Let {A_n, n = 1, 2, . . . } be a sequence of sets. We can show that a point ω belongs to ∩_{n=1}^∞ ∪_{k≥n} A_k if and only if it belongs to an infinite number of A_n's as follows. Suppose that ω belongs to an infinite number of A_n's; then for every n, ω ∈ ∪_{k≥n} A_k. Therefore, ω ∈ ∩_{n=1}^∞ ∪_{k≥n} A_k. On the other hand, if ω belongs to only a finite number of A_n's, then there is some n_0 such that ω ∉ ∪_{k≥n_0} A_k. Since ∪_{k≥n_0} A_k contains ∩_{n=1}^∞ ∪_{k≥n} A_k, this proves that ω cannot belong to ∩_{n=1}^∞ ∪_{k≥n} A_k if ω belongs to only a finite number of A_n's. Show that ω ∈ ∪_{n=1}^∞ ∩_{k≥n} A_k if and only if ω belongs to all but a finite number of A_n's.

5. Suppose that X_1, . . . , X_n are n random variables with a joint density function p_X. Let Y_1, . . . , Y_n be defined by Y_i = f_i(X_1, . . . , X_n). Suppose that f has a differentiable inverse g so that X_i = g_i(Y_1, . . . , Y_n). Show that the joint density function of Y_1, . . . , Y_n is given by

p_Y(y_1, . . . , y_n) = p_X(g_1(y), . . . , g_n(y)) |J(y)|

where |J| denotes the absolute value of the determinant of J(y) = [∂g_i/∂y_j].
Suggestion: Consider the incremental "rectangle" in Rⁿ with sides dy_1, dy_2, . . . , dy_n and located at the point y. Under the transformation g this rectangle is mapped approximately into a "parallelepiped" located at g(y) with sides J(y) dy, the volume of which is |J(y)| dy_1 dy_2 · · · dy_n [see, for example, Birkhoff and MacLane, 1953, pp. 307-310]. The desired result now follows from the interpretation of the probability density function as probability per unit volume [see also, Thomasian, 1969, pp. 362-363].

6. Suppose that X_1, X_2, X_3 are random variables with a joint density p(x_1,x_2,x_3). Find the density function for Y = √(X_1² + X_2² + X_3²).

Hint: Introduce random variables Θ and Φ so that
X_1 = Y cos Θ
X_2 = Y sin Θ cos Φ
X_3 = Y sin Θ sin Φ

7. Suppose that X_1, . . . , X_n have a joint density function p_X. Let Y_k = Σ_{j=1}^k X_j, k = 1, . . . , n. Find the joint density p_Y for Y_1, . . . , Y_n.

8. Prove that

E[|X|/(1 + |X|)] − ε/(1 + ε) ≤ 𝒫(|X| ≥ ε) ≤ ((1 + ε)/ε) E[|X|/(1 + |X|)]
9. Show that X_n → X in probability as n → ∞ if and only if

lim_{n→∞} E[|X_n − X|/(1 + |X_n − X|)] = 0

10. Suppose that {X_n} is a sequence of random variables such that

𝒫(X_n − X_m < x) = (1/πσ_mn) ∫_{−∞}^x dξ/(1 + ξ²/σ_mn²)

with σ_mn² = |m − n|/mn. Does {X_n} converge in probability?
Hint: Compute E[|X_n − X_m|/(1 + |X_n − X_m|)].

11. Starting from the Schwarz inequality |EXY|² ≤ EX²EY², prove that X_n → X in q.m. as n → ∞ implies that

EX_n → EX as n → ∞

and

EX_n² → EX² as n → ∞

Hint: For the second part verify that

|EX_n² − EX²| = |E(X_n − X)(X_n + X)| ≤ [E(X_n − X)²]^{1/2}[E(X_n + X)²]^{1/2}

12. If {X_n} converges in q.m. to X and each X_n has a density function p_n(x) = (1/√(2πσ_n²)) exp[−½(x − μ_n)²/σ_n²], prove that X has a density

p(x) = (1/√(2πσ²)) exp[−½(x − μ)²/σ²]

provided that E(X − EX)² > 0.

13. Prove that X_1, . . . , X_n are mutually independent if and only if

𝒫(X_ν < x_ν, ν = 1, . . . , n) = Π_{ν=1}^n 𝒫(X_ν < x_ν)

14. Let X_1 and X_2 be two random variables such that EX_1 = 0 and EX_2 = 0.

(a) Suppose that we can find a linear combination Y = X_1 + aX_2 which is independent of X_2. Show that E(X_1|X_2) = −aX_2.
(b) If X_1 and X_2 have a joint density function

p(x_1,x_2) = (1/(2π√(1 − ρ²))) exp[−(x_1² − 2ρx_1x_2 + x_2²)/(2(1 − ρ²))]

find E(X_1|X_2).

15. Suppose that X_1 and X_2 have a joint density p(x_1,x_2). Let Y = √(X_1² + X_2²). Find E(X_1|Y).
Hint: Introduce a random variable Φ so that X_1 = Y cos Φ and X_2 = Y sin Φ.

16. Let 𝒜_X denote the smallest σ algebra with respect to which the random variables X = (X_1, X_2, . . . , X_n) are all measurable. Let X⁻¹(B) denote {ω: X(ω) ∈ B}. Show that 𝒜_X = {X⁻¹(B), B ∈ ℬⁿ}. In other words, 𝒜_X is the collection of all inverse images of Borel sets.
2
Stochastic Processes

1. DEFINITION AND PRELIMINARY CONSIDERATIONS

A stochastic process {X_t, t ∈ T} is a family of random variables, indexed by a real parameter t and defined on a common probability space (Ω,𝒜,𝒫). Unless otherwise specified, the parameter set T will always be taken to be an interval. By definition, for each t, X_t is an 𝒜-measurable function. For each ω, {X_t(ω), t ∈ T} is a function defined on T and is called a sample function of the process.
If T_n = {t_1, . . . , t_n} is a finite set from T, we denote by P_{T_n} the joint distribution function of {X_{t_1}, . . . , X_{t_n}}. The collection {P_{T_n}} as T_n ranges over all finite sets in T is called the family of finite-dimensional distributions of the process {X_t, t ∈ T}. Loosely speaking, problems which can be answered directly in terms of the finite-dimensional distributions involve no mathematical difficulties of a measure-theoretical nature. The more elementary applications of stochastic processes are problems of this type. Let {X_t, t ∈ T} be a stochastic process defined on (Ω,𝒜,𝒫). Let ℬ_X and 𝒜_X be, respectively, the smallest algebra and the
smallest σ algebra with respect to which X_t is measurable for every t ∈ T. The difference between ℬ_X and 𝒜_X is that ℬ_X is only closed under all finite set operations while 𝒜_X is closed under all countable set operations. Now, if all finite-dimensional distributions are known, then the probability of every event in ℬ_X is uniquely and directly determined, because every set in ℬ_X involves only a finite number of X_t's. In other words, 𝒫 is uniquely determined on ℬ_X. Now, since 𝒫 is a probability measure, it must be σ additive. Hence, by the extension theorem (Proposition 1.1.1), we can extend 𝒫 from ℬ_X to the smallest σ algebra containing ℬ_X, and that is just 𝒜_X. Therefore, the restriction of 𝒫 on 𝒜_X is uniquely determined by the set of all finite-dimensional distributions.

We illustrate the previous remarks by an example. Suppose that {X_t, 0 ≤ t ≤ 1} is a stochastic process. The event

{ω: X_t(ω) ≥ 0, t = k/n, k = 1, 2, . . . , n}   (1.1)

is a set in ℬ_X. The events

{ω: X_t(ω) ≥ 0, t = 1/k, k = 1, 2, . . . }   (1.2)

and

{ω: X_t(ω) ≥ 0, t = l/k, l = 0, 1, . . . , k; k = 1, 2, . . . }   (1.3)

are sets in 𝒜_X. However, the set

{ω: X_t(ω) ≥ 0, 0 ≤ t ≤ 1} = ∩_{t∈[0,1]} {ω: X_t(ω) ≥ 0}   (1.4)

may not even be in 𝒜, that is, it may not be an event, because it involves uncountable set operations on events. Whether it is an event or not depends on the precise nature of the probability space and the process. We shall consider these questions in greater detail in Sec. 2.
In practice, one seldom begins with a given probability space and a given family of random variables defined on it. Instead one often starts with a proposed collection of finite-dimensional distributions {P_{T_n}, all finite T_n in T}, which is usually obtained by a combination of observations and hypotheses. The question then arises as to whether we can always find a stochastic process having these distributions. We shall answer this question as clearly as we can, because it is a source of some confusion.
First, the collection of finite-dimensional distributions must be compatible in the following sense: If T_n and T_m are two ordered finite
sets from T such that T_n contains T_m, then P_{T_m} must be equal to P_{T_n} with the appropriate variables set to ∞. For example,
P_{t_1}(x_1) = P_{t_1t_2}(x_1, ∞)   (1.5)
Given a compatible family of finite-dimensional distributions {P_{T_n}, all finite T_n in T}, we can always find a probability space (Ω,𝒜,𝒫) and a family of random variables {X_t, t ∈ T} having the given finite-dimensional distributions. The proof is by construction. Let Ω = R^T = {the set of all real-valued functions defined on T}. Let X_t(ω) = the value of ω at t. Let ℬ_X and 𝒜_X be, respectively, the smallest Boolean algebra and the smallest σ algebra with respect to which every X_t is measurable. Now, every set in ℬ_X is of the form
{ω: (X_{t_1}(ω), X_{t_2}(ω), . . . , X_{t_n}(ω)) ∈ B}   (1.6)
where B is an n-dimensional Borel set. Given a compatible family of finite-dimensional distributions {P_{T_n}}, we set
𝒫({ω: (X_{t_1}(ω), X_{t_2}(ω), . . . , X_{t_n}(ω)) ∈ B})
= ∫_B dP_{t_1,t_2,...,t_n}(x_1, . . . , x_n)   (1.7)
This defines an elementary probability measure 𝒫 on (Ω,ℬ_X). Now, it can be shown that 𝒫, so defined, is not only finitely additive, but also σ additive. This means that 𝒫 is a probability measure and can be uniquely extended to 𝒜_X. To show that 𝒫 is σ additive, and not merely finitely additive, is fairly difficult [see Neveu, 1965, pp. 82-83], and the proof will be omitted. To summarize, we take Ω = R^T,
X_t(ω) = value of ω at t   (1.8)
and 𝒜_X to be the minimal σ algebra generated by {X_t, t ∈ T}. The probability measure 𝒫 is defined by (1.7). So defined, the process {X_t, t ∈ T} has the prescribed finite-dimensional distributions. We note that {X_t(ω), ω ∈ R^T, t ∈ T} as defined by (1.8) is called the coordinate function, because X_t(ω) is the tth coordinate of ω. Sets of the form (1.6) are called cylinder sets.
In the construction that we have just given, the basic space Ω was taken to be R^T, and the σ algebra was taken to be 𝒜_X, the minimal σ algebra containing all cylinder sets. In a sense, R^T is too big and 𝒜_X is rather small. For example, sets of the form
{ω: a ≤ X_t(ω) ≤ b for all t ∈ T}
are not in 𝒜_X, if T is uncountable. Sometimes, it may be convenient to work with a different Ω. Suppose that we are given a compatible family
of finite-dimensional distributions {P_{T_n}, all finite T_n in T}, and that we are also given a basic space Ω together with a family of functions {X_t(ω), ω ∈ Ω, t ∈ T}. Let 𝒜_X be defined as before. The question is, "can we always find a probability measure 𝒫 defined on (Ω,𝒜_X) so that {X_t, t ∈ T} has the prescribed finite-dimensional distributions?" The answer is an immediate "no." In order for this to be possible, {P_{T_n}} must satisfy other conditions, in addition to the compatibility condition. First of all, it may happen that for two different Borel sets A and B,
{ω: (X_{t_1}(ω), X_{t_2}(ω), . . . , X_{t_n}(ω)) ∈ A}
= {ω: (X_{t_1}(ω), X_{t_2}(ω), . . . , X_{t_n}(ω)) ∈ B}   (1.9)
Since the same set in 𝒜_X can have only one probability value, we must require

∫_A dP_{t_1,...,t_n} = ∫_B dP_{t_1,...,t_n}   (1.10)

whenever A and B satisfy (1.9). This consistency condition is sufficient to ensure that (1.7) defines an elementary probability 𝒫 on (Ω,ℬ_X). However, 𝒫 need not be σ additive (or equivalently, monotone-sequentially continuous at ∅). If 𝒫 is not σ additive, it means that we cannot define a probability measure on (Ω,𝒜_X) so that {X_t, t ∈ T} has the prescribed finite-dimensional distributions. We illustrate this discussion by an example. Let T = [0,1], and take Ω = C[0,1] = {all continuous functions on [0,1]}. Take X_t(ω) to be the coordinate function as before. Suppose that {P_{T_n}} is given by

P_{t_1,t_2,...,t_n}(x_1, . . . , x_n) = Π_{ν=1}^n ∫_{−∞}^{x_ν} (1/√(2π)) exp(−½z²) dz   (1.11)

In order to be consistent with (1.11), we must have

𝒫(X_t > ε, X_s < −ε) = [∫_ε^∞ (1/√(2π)) exp(−½z²) dz]²   (1.12)

Because Ω = C[0,1], the sets

A_n = {ω: X_t(ω) > ε, X_{t+1/n}(ω) < −ε}

must converge to ∅ for every ε > 0. However,

𝒫(X_t > ε, X_{t+1/n} < −ε) = [∫_ε^∞ (1/√(2π)) exp(−½z²) dz]² ↛ 0 as n → ∞

Hence, 𝒫 defined by (1.7) and (1.11) cannot be sequentially continuous at ∅ and cannot be extended to 𝒜_X. Intuitively, the reason is clear. Since Ω = C[0,1], any probability measure on (Ω,𝒜_X) gives 𝒫(Ω) = 1, which means that with probability 1, every sample function is continuous.

Clearly, not every compatible family of finite-dimensional distributions is consistent with this fact. In particular, (1.11) implies that {X_t, t ∈ [0,1]} is a family of independent and identically distributed random variables, and the independence between X_t and X_s, no matter how close t and s are, is incompatible with continuous sample functions. Later, we shall develop conditions on the finite-dimensional distributions which guarantee continuous sample functions.

2. SEPARABILITY AND MEASURABILITY

The definition of a stochastic process X_t requires it to be an 𝒜-measurable function of ω for each t, but places no condition on it as a function of t. When T is an interval, questions of an analytical nature concerning sample functions of the process usually involve an uncountable number of events and random variables and are not always answerable without additional assumptions. For example, the set {ω: X_t(ω) ≥ 0 for all t ∈ T} = ∩_{t∈T} {ω: X_t(ω) ≥ 0} may not be an event since it involves an uncountable intersection of events. Similarly, Y = sup_{t∈T} X_t may not be a random variable, because sets of the form

{ω: Y(ω) ≤ y} = ∩_{t∈T} {ω: X_t(ω) ≤ y}   (2.1)

may not be events. The same comment applies to inf_{t∈T} X_t and to

lim sup_{u→t} X_u = lim_{n→∞} sup_{|u−t|<1/n} X_u   (2.2)

Hence, even when lim_{u→t} X_u exists, it may not be a random variable. Further, an integral of the form

Z(ω) = ∫_a^b X_t(ω) dt

may not be well defined for almost all ω, and may not be a random variable even when it is well defined. These difficulties motivated the introduction of the concepts of a separable process and a measurable process [Doob, 1953, pp. 50-71].

Definition. A process {X_t, t ∈ T} is said to be separable if there exist a countable set S ⊂ T and a fixed null event Λ such that for any closed set K ⊂ [−∞,∞] and any open interval I, the two sets
{ω: X_t(ω) ∈ K, t ∈ I ∩ T}     {ω: X_t(ω) ∈ K, t ∈ I ∩ S}
differ by a subset of Λ.

The countable set S is called the separant or separating set. If {X_t, t ∈ T} is separable, then every set of the form {ω: X_t(ω) ∈ K, t ∈ I ∩ T} differs from an event by at most a null set and can be rendered an event by completing the probability space on which the process is defined. If the process {X_t, t ∈ T} is separable and ω ∉ Λ, then for any open interval I, X_t(ω) ≤ a, t ∈ I ∩ S implies X_t(ω) ≤ a for all t ∈ I ∩ T. Thus,

sup_{t∈I∩S} X_t(ω) ≥ sup_{t∈I∩T} X_t(ω)

Since T ⊃ S, the opposite inequality holds also. Hence for any open interval I

sup_{t∈I∩S} X_t(ω) = sup_{t∈I∩T} X_t(ω)   (2.3)

for all ω ∉ Λ. Because S is countable, sup_{t∈I∩S} X_t is a random variable; thus sup_{t∈I∩T} X_t is equal almost everywhere to a random variable. If the probability space is complete, then sup_{t∈I∩T} X_t is itself a random variable. Similar results hold for inf_{t∈I∩T} X_t and, hence, also for

lim sup_{s→t} X_s = lim_{n→∞} sup_{|s−t|<1/n} X_s
and
lim inf_{s→t} X_s = lim_{n→∞} inf_{|s−t|<1/n} X_s

Thus, lim_{s→t} X_s, when it exists, is equal almost everywhere to a random variable and can be made a random variable by completing the underlying probability space. Henceforth, the underlying probability space will always be assumed to be complete.
Given a probability space (Ω,𝒜,𝒫) and a process {X_t, t ∈ T} defined on it, then {X_t, t ∈ T} is either separable or not, and there is nothing one can do about it. In practice, the situation is never this rigid. Instead, one is usually free to choose the way in which {X_t, t ∈ T} is to be defined, as long as some specified finite-dimensional distributions are satisfied. Thus, the following result is an exceedingly important one.

Proposition 2.1. For every stochastic process {X_t, t ∈ T} there exists a process {X̃_t, t ∈ T}, defined on the same probability space, such that
(a) {X̃_t, t ∈ T} is separable
(b) 𝒫(X̃_t = X_t) = 1 for each t ∈ T

Remarks:
(a) Although the set {ω: X̃_t(ω) ≠ X_t(ω)} is a null event for each t, the set
{ω: X̃_t(ω) ≠ X_t(ω) for at least one t in T} = ∪_{t∈T} {ω: X̃_t(ω) ≠ X_t(ω)}   (2.4)

need not be an event and need not have zero probability even if it is an event. If it is a null event, then {X_t, t ∈ T} is itself a separable process.
(b) Obviously, {X_t, t ∈ T} and {X̃_t, t ∈ T} have the same finite-dimensional distributions.
(c) It may be necessary for X̃_t to assume values ±∞.
A proof of Proposition 2.1 will be omitted. Suffice it to note that the standard proof is by construction, and the separating set S that results is, in general, quite complicated. The situation improves if a continuity condition is satisfied by the finite-dimensional distributions. A process {X_t, t ∈ T} is said to be continuous in probability at t if

lim_{h→0} 𝒫(|X_{t+h} − X_t| ≥ ε) = 0   (2.5)

for every ε > 0. If {X_t, t ∈ T} is continuous in probability at every point in T, then we shall say simply {X_t, t ∈ T} is continuous in probability. We note that continuity in probability is verifiable in terms of two-dimensional distributions.

Proposition 2.2. Let {X_t, t ∈ T} be a separable process which is continuous in probability. Then every countable set dense in T is a separating set.

Thus, whenever a process {X_t, t ∈ T} is both separable and continuous in probability, the probability of an event involving an uncountable number of X_t can be computed by choosing a sequence of partitions of T in the usual manner. For example, suppose {X_t, 0 ≤ t ≤ 1} is both separable and continuous in probability, and we wish to compute 𝒫(X_t ≥ 0, 0 ≤ t ≤ 1). First, we note that we can take S to be the set of all dyadic rationals, that is,

S = {k/2ⁿ, 0 ≤ k ≤ 2ⁿ, n = 0, 1, . . . }

Hence,
𝒫(X_t ≥ 0, 0 ≤ t ≤ 1) = 𝒫(X_{k/2ⁿ} ≥ 0, 0 ≤ k ≤ 2ⁿ, n = 0, 1, . . . )
= 𝒫(∩_{n=0}^∞ {ω: X_{k/2ⁿ}(ω) ≥ 0, 0 ≤ k ≤ 2ⁿ})

Because A_n = {ω: X_{k/2ⁿ}(ω) ≥ 0, 0 ≤ k ≤ 2ⁿ} is a decreasing sequence in n, and because every probability measure is sequentially continuous, we have

𝒫(X_t ≥ 0, 0 ≤ t ≤ 1) = lim_{n→∞} 𝒫({ω: X_{k/2ⁿ}(ω) ≥ 0, 0 ≤ k ≤ 2ⁿ})   (2.6)

As an example of a nonseparable process, consider the following. Let Ω = [0,1], 𝒜 be the σ algebra of Lebesgue measurable sets, and let 𝒫 be the Lebesgue measure. Consider a process {X_t, t ∈ [0,1]} defined on (Ω,𝒜,𝒫) by

X_t(ω) = 1 if ω = t, 0 otherwise   (2.7)

The process {X_t, t ∈ [0,1]} is nonseparable because for any set S ⊂ [0,1],
{ω: X_t(ω) = 0 for all t ∈ S} = [0,1] − S
Therefore,
𝒫({ω: X_t(ω) = 0 for all t in [0,1]}) = 0
while if S is any countable set, then
𝒫({ω: X_t(ω) = 0 for all t in S}) = 1
Now, let {X̃_t, t ∈ [0,1]} be defined by
X̃_t(ω) = 0 for all t and ω   (2.8)
The process {X̃_t, t ∈ [0,1]} is clearly separable. Indeed, for every closed set K,
{ω: X̃_t(ω) ∈ K for all t ∈ [0,1]} = {ω: X̃_0(ω) ∈ K} = [0,1] if K ∋ 0, ∅ if K ∌ 0
For each t,
{ω: X_t(ω) = X̃_t(ω)} = [0,1] − {t}
which is an event with probability 1.
It is often desirable to be able to define integrals of the form ∫_a^b X_t(ω) dt. If the integral is to be interpreted as a Lebesgue integral of sample functions, Lebesgue integrability of almost all sample functions is clearly a necessity. Even if almost all sample functions of {X_t, t ∈ T} are Lebesgue integrable, the resulting integral ∫_a^b X_t(ω) dt still may not be a random variable. What is needed is that X_t(ω) defines a (t,ω) function measurable with respect to ℒ ⊗ 𝒜, where ℒ denotes the σ algebra of Lebesgue measurable sets in T.† We recall that ℒ is the smallest σ algebra containing all intervals and completed with respect to the measure which assigns lengths to intervals.

Definition. A process {X_t, t ∈ T} with a Lebesgue measurable parameter set T is said to be a measurable process if X_t(ω) is a (t,ω) function measurable with respect to ℒ ⊗ 𝒜, where ℒ is the σ algebra of Lebesgue measurable sets in T, and 𝒜 is the σ algebra of events in the defining probability space, i.e., for every x ∈ (−∞,∞),
{(t,ω): X_t(ω) ≤ x} ∈ ℒ ⊗ 𝒜

We note that whereas separability imposes no restriction on the finite-dimensional distributions, in general, measurability does. For example, it can be shown that if T is an interval, and if {X_t, t ∈ T} represents a collection of independent and identically distributed random variables, then it cannot be measurable unless the distribution of X_t is concentrated at a point (that is, X_t is a.s. a constant). The following proposition gives a sufficient condition on the finite-dimensional distribution for a measurable process to exist and also summarizes some preceding results [Doob, 1953, pp. 61-62].

Proposition 2.3. Let {X_t, t ∈ T} be a process continuous in probability, and let T be an interval. Then, there exists a process {X̃_t, t ∈ T} defined on the same probability space such that
(a) 𝒫(X̃_t = X_t) = 1 for each t ∈ T
(b) {X̃_t, t ∈ T} is separable
(c) {X̃_t, t ∈ T} is measurable
(d) Any countable set dense in T is a separating set for {X̃_t, t ∈ T}

We call the process {X̃_t, t ∈ T} in Proposition 2.3 a separable and measurable modification of {X_t, t ∈ T}. Whenever a process {X_t, t ∈ T} is continuous in probability, we can always assume that it has already been replaced by a separable and measurable modification. Doing so is almost never incompatible with any important assumption.
Finally, we note that if A is a Lebesgue measurable set of the real line, and if

∫_A E|X_t| dt < ∞

then almost all sample functions are Lebesgue integrable on A, and ∫_A X_t dt defines a random variable. This follows directly from Fubini's theorem.
† Let 𝒜_1 and 𝒜_2 be, respectively, σ algebras of subsets of Ω_1 and Ω_2. 𝒜_1 ⊗ 𝒜_2 denotes the smallest σ algebra which includes all sets of the form A_1 × A_2, A_1 ∈ 𝒜_1, A_2 ∈ 𝒜_2.

3. GAUSSIAN PROCESSES AND BROWNIAN MOTION

Many random phenomena in physical problems are well approximated by stochastic processes that are called Gaussian processes. This is fortunate, because Gaussian processes have distributions which enjoy great analytical simplicity. Brownian motion, or the Wiener process as it is sometimes called, is a specific kind of Gaussian process and plays a vital role in the modern theory of stochastic processes. Generalizing on some of the properties of Brownian motion has led to the theory of diffusion processes and sample-continuous martingales. Brownian motion is also the key to a proper understanding of "white noise," a widely used model for noise phenomena. Through this intervening role of Brownian motion, systems with white-noise disturbances can be studied using results of diffusion theory and martingale theory. This theme will be developed in considerable detail in Chaps. 4 and 6.
Let Z be a random variable such that EZ² < ∞. Let μ = EZ and σ² = E(Z − μ)². The random variable Z is said to be Gaussian either if σ² = 0, in which case Z is equal to μ with probability 1, or if

𝒫(Z < a) = ∫_{−∞}^a (1/√(2πσ²)) exp[−(z − μ)²/2σ²] dz   (3.1)

In other words, Z has a density function

p_Z(z) = (1/√(2πσ²)) exp[−(z − μ)²/2σ²]   (3.2)

whenever σ² > 0. We can compute the characteristic function and find

F(u) = E e^{iuZ} = ∫_{−∞}^∞ (1/√(2πσ²)) exp[−(z − μ)²/2σ²] e^{iuz} dz = exp(iuμ − ½u²σ²)   (3.3)

Since the distribution function is uniquely determined by the characteristic function, a necessary and sufficient condition for Z to be Gaussian is that Z should satisfy

E e^{iuZ} = e^{iuEZ − ½u²E(Z−EZ)²}   (3.4)

We note that this condition is valid even for the case E(Z − EZ)² = 0.
A stochastic process {X_t, t ∈ T} is said to be a Gaussian process if every finite linear combination of the form

Z = Σ_{i=1}^N α_i X_{t_i}   (3.5)

is a Gaussian random variable. For {X_t, t ∈ T} to be a Gaussian process, it is clearly necessary that for each t, X_t be a Gaussian random variable.

But this is not enough. A necessary and sufficient condition is given by the following.

Proposition 3.1. A process {X_t, t ∈ T} is Gaussian if and only if
(a) EX_t² < ∞ for each t ∈ T
(b) For every finite collection (t_1, . . . , t_N) ⊂ T

E exp(i Σ_{k=1}^N u_k X_{t_k}) = exp[i Σ_{k=1}^N u_k μ(t_k) − ½ Σ_{k,l=1}^N u_k u_l R(t_k,t_l)]   (3.6)

where
μ(t) = EX_t   (3.7)
and
R(t,s) = E[X_t − μ(t)][X_s − μ(s)]   (3.8)

Proof: Suppose that {X_t, t ∈ T} is a Gaussian process. Then by definition, Z = Σ_{k=1}^N u_k X_{t_k} is Gaussian. Therefore,

E e^{iZ} = e^{iEZ − ½E(Z−EZ)²}

By direct computation we find

EZ = Σ_{k=1}^N u_k μ(t_k)

and

E(Z − EZ)² = Σ_{k,l=1}^N u_k u_l R(t_k,t_l)

which yields (3.6). Conversely, suppose conditions (a) and (b) are satisfied. Let Z = Σ_{k=1}^N α_k X_{t_k} be an arbitrary finite linear combination. Then, using (3.6) we find

E e^{iuZ} = E exp(i Σ_{k=1}^N uα_k X_{t_k})
= exp[iu Σ_{k=1}^N α_k μ(t_k) − ½u² Σ_{k,l} α_k α_l R(t_k,t_l)]
= e^{iuEZ} e^{−½u²E(Z−EZ)²}

Therefore, Z is Gaussian. Since this is true for every linear combination, {X_t, t ∈ T} is a Gaussian process by definition. ∎

Remark: Suppose that {X_n} is a sequence of Gaussian random variables converging in quadratic mean to a random variable X. Then (see Chap. 1, Exercise 11)

μ_n = EX_n → μ = EX as n → ∞

and

σ_n² = E(X_n − μ_n)² → σ² = E(X − μ)² as n → ∞

Since X_n has a density function p_n(·) given by

p_n(x) = (1/√(2πσ_n²)) exp[−(x − μ_n)²/2σ_n²]

and the distribution functions of X_n converge to that of X, the density function p(·) of X must be given by

p(x) = (1/√(2πσ²)) exp[−(x − μ)²/2σ²]

In other words, the limit of a q.m. convergent sequence of Gaussian random variables is always a Gaussian random variable. Therefore, if {X_t, t ∈ T} is a Gaussian process, then not only is every sum of the form

Σ_{i=1}^N α_i X_{t_i}

a Gaussian random variable, but so is the limit of every q.m. convergent sequence of such sums. This makes precise the often stated result that a random variable obtained by a linear operation on a Gaussian process is always Gaussian. Linear operation is taken to mean the q.m. limit of a sequence of finite linear combinations.
The function μ(t) is called the mean function of {X_t, t ∈ T} or simply the mean. The function R(t,s) defined by (3.8) is called the covariance function. Covariance functions will be considered in some detail in Chap. 3. One important property of covariance functions is that for any finite collection (t_1, t_2, . . . , t_n) ⊂ T, the matrix R formed by setting R_ij = R(t_i,t_j) is nonnegative definite.¹ This is simply because for any complex constants a_1, a_2, . . . , a_n, we have

Σ_{i,j=1}^n a_i ā_j R_ij = Σ_{i,j=1}^n E{a_i[X_{t_i} − μ(t_i)]}{ā_j[X_{t_j} − μ(t_j)]}
= E|Σ_{i=1}^n a_i[X_{t_i} − μ(t_i)]|² ≥ 0   (3.9)

¹ The term positive semidefinite is more conventional than nonnegative definite. We have adopted the latter for a closer correspondence with the terminology usually associated with covariance functions.

It is apparent from (3.6) that the characteristic function F for every finite collection X_{t_1}, . . . , X_{t_N} is completely determined by the mean μ(t) and the covariance function R(t,s), viz.,

F(u_1, . . . , u_N) = exp[i Σ_{k=1}^N u_k μ(t_k) − ½ Σ_{k,l=1}^N u_k u_l R(t_k,t_l)]   (3.10)

If the matrix R = [R(t_k,t_l)] is positive definite and not merely nonnegative definite, then the inversion formula for the Fourier integral in R^N can be used to obtain the probability density function for X_{t_1}, . . . , X_{t_N}. Specifically, we have

p(x_1, . . . , x_N) = (1/(2π)^N) ∫_{−∞}^∞ · · · ∫_{−∞}^∞ F(u_1, . . . , u_N) exp(−i Σ_{k=1}^N u_k x_k) du_1 · · · du_N
= (1/((2π)^{N/2}|R|^{1/2})) e^{−½(x−μ)′R⁻¹(x−μ)}   (3.11)

where R⁻¹ denotes the inverse of R, x denotes the column vector with components x_1, . . . , x_N, μ denotes the column vector with components μ(t_1), . . . , μ(t_N), |R| denotes the determinant of R, and prime denotes transpose. When the matrix R is not positive definite, then (3.11) fails, and X_{t_1}, . . . , X_{t_N} do not have a joint density function. However, the joint distribution function can still be obtained from the characteristic function given by (3.6). In particular, if A is a rectangle such that the distribution function is continuous on its boundary, then we have

𝒫((X_{t_1}, . . . , X_{t_N}) ∈ A) = ∫_{R^N} F(u_1, . . . , u_N) ψ_A(u_1, . . . , u_N) du_1 · · · du_N   (3.12)

where

ψ_A(u_1, . . . , u_N) = (1/(2π)^N) ∫_A exp(−i Σ_{k=1}^N u_k x_k) dx_1 · · · dx_N

Finally, the distribution function 𝒫(X_{t_i} < x_i, i = 1, . . . , N) can be obtained by using (3.12) and taking limits.
These considerations show that all finite-dimensional distributions of a Gaussian process are completely determined once we specify its mean μ(t) and covariance function R(t,s). This is indeed the simplest way of specifying the finite-dimensional distributions of a Gaussian process.
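Since the mean and covariance functions determine all finite-dimensional distributions, samples of (X_{t_1}, . . . , X_{t_N}) can be generated from μ(t) and R(t,s) alone, for instance by factoring R = LL′ and setting X = μ + LZ with Z standard normal. The following Python sketch illustrates this; the choice R(t,s) = min(t,s), anticipating the Brownian motion example below, and the sizes are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.1, 1.0, 10)            # sample times t_1 < ... < t_N
mu = np.zeros(len(t))                    # mean function mu(t) = 0
R = np.minimum.outer(t, t)               # covariance R(t_k, t_l) = min(t_k, t_l)

# Draw samples X = mu + L Z with R = L L' (Cholesky), Z standard normal.
L = np.linalg.cholesky(R)
Z = rng.standard_normal((len(t), 100000))
X = mu[:, None] + L @ Z

# The sample covariance should reproduce R.
print(np.max(np.abs(np.cov(X) - R)))     # small for large sample sizes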

As an example, consider the Brownian motion process {X_t, t ≥ 0} defined by

{X_t, t ≥ 0} is a Gaussian process   (3.13a)
EX_t = 0     EX_tX_s = min(t,s)   (3.13b)

If 0 < t_1 < t_2 < · · · < t_N, the matrix R = [min(t_k,t_l)] is positive definite. Thus, the density function can be written down immediately by the use of (3.11). After a little rearrangement, we find

p(x_1,t_1; . . . ; x_N,t_N) = Π_{ν=1}^N [2π(t_ν − t_{ν−1})]^{−1/2} exp[−½(x_ν − x_{ν−1})²/(t_ν − t_{ν−1})]   (3.14)

where t_0 = 0 = x_0. Equation (3.14) shows that {X_{t_1}, X_{t_2} − X_{t_1}, . . . , X_{t_N} − X_{t_{N−1}}} must be a collection of independent random variables for every increasing (t_1, . . . , t_N). Any process having this property is called a process with independent increments. Thus, a Brownian motion process is a process with independent increments.
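The product form of (3.14) suggests the standard way of simulating a Brownian motion on a finite grid: generate the independent increments X_{t_ν} − X_{t_{ν−1}} as N(0, t_ν − t_{ν−1}) variables and form partial sums. A Python sketch (grid size, horizon, and sample count are arbitrary choices) that also checks EX_tX_s = min(t,s):

import numpy as np

rng = np.random.default_rng(1)
n, T = 1000, 1.0
dt = T / n
t = np.linspace(0.0, T, n + 1)

# Independent increments X_{t_v} - X_{t_{v-1}} ~ N(0, dt), with X_0 = 0.
dX = np.sqrt(dt) * rng.standard_normal((5000, n))
X = np.concatenate([np.zeros((5000, 1)), np.cumsum(dX, axis=1)], axis=1)

# Check E X_t X_s = min(t, s) at a pair of times.
i, j = 300, 700
print(np.mean(X[:, i] * X[:, j]), min(t[i], t[j]))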
A Brownian motion is also a Markov process, which is defined as follows.

Definition. A process {X_t, t ∈ T} is said to be a Markov process if for any increasing collection t_1, t_2, . . . , t_n in T,
𝒫(X_{t_n} ≤ x_n | X_{t_ν} = x_ν, ν = 1, . . . , n − 1)
= 𝒫(X_{t_n} ≤ x_n | X_{t_{n−1}} = x_{n−1})   (3.15)

In other words, given the past (X_{t_1}, . . . , X_{t_{n−2}}) and the present (X_{t_{n−1}}), the future (X_{t_n}) depends only on the present. For a Brownian motion, (3.15) is easily verified as follows:

𝒫(X_{t_n} ≤ x_n | X_{t_ν} = x_ν, ν = 1, . . . , n − 1)
= ∫_{−∞}^{x_n} p(x_1,t_1; . . . ; x_{n−1},t_{n−1}; ξ,t_n) dξ / p(x_1,t_1; . . . ; x_{n−1},t_{n−1})
= ∫_{−∞}^{x_n} [2π(t_n − t_{n−1})]^{−1/2} exp[−½(ξ − x_{n−1})²/(t_n − t_{n−1})] dξ
= 𝒫(X_{t_n} ≤ x_n | X_{t_{n−1}} = x_{n−1})

Indeed, a little reflection would show that any process with independent increments {X_t, t ∈ T} is also a Markov process.
A Brownian motion {X_t, t ≥ 0} also has the property (see Exercises 8 and 9)

E(X_t | X_τ, 0 ≤ τ ≤ s) = X_s     a.s.

for any t ≥ s. A process having this property is called a martingale. We shall define a martingale a little more generally as follows.

Definition. Let {X_t, t ∈ T} be a stochastic process, and let {𝒜_t, t ∈ T} be an increasing family of σ algebras such that for each t, X_t is 𝒜_t measurable. {X_t, 𝒜_t, t ∈ T} is said to be a martingale if t > s implies

E^{𝒜_s}X_t = X_s     a.s.   (3.16)

The process is said to be a submartingale (supermartingale) if the equality in (3.16) is replaced by ≥ (respectively, ≤).

For any Brownian motion {X_t, t ≥ 0}, if we take 𝒜_{Xt} to be the smallest σ algebra with respect to which {X_s, s ≤ t} are all measurable, then {X_t, 𝒜_{Xt}, t ≥ 0} is a martingale. Suppose that {X_t, t ≥ 0} is a Brownian motion, and {𝒜_t, t ≥ 0} is an increasing family of σ algebras such that for each t, X_t is 𝒜_t measurable and for each t, {X_s − X_t, s ≥ t} is independent of 𝒜_t. We shall emphasize this relationship between X_t and 𝒜_t by saying that {X_t, 𝒜_t, t ≥ 0} is a Brownian motion. If {X_t, 𝒜_t, t ≥ 0} is a Brownian motion, then it is a martingale. To prove this we merely note that for t > s,
E^{𝒜_s}X_t = E^{𝒜_s}[X_s + (X_t − X_s)] = X_s + E(X_t − X_s) = X_s     a.s.
We should note that instead of taking the parameter set to be [0,∞), we can define a Brownian motion on (−∞,∞) by replacing (3.13) with
EX_t = 0     EX_tX_s = ½(|t| + |s| − |t − s|)   (3.17)
What results is a pair of independent Brownian motions {X_t, t ≥ 0} and {X_{−t}, t ≥ 0} pieced together at t = 0 (see Exercise 7).
If {X_t, t ≥ 0} is a Brownian motion, then E(X_t − X_s)² = |t − s|. Therefore, by virtue of the Chebyshev inequality, we have

𝒫(|X_t − X_s| ≥ ε) ≤ |t − s|/ε² → 0 as s → t

for every ε > 0. Thus, a Brownian motion is continuous in probability. It follows from Proposition 2.3 that every Brownian motion has a separable and measurable modification. A separable Brownian motion has some important sample function properties. The first of those that we shall consider is an inequality that it shares with all separable second-order martingales.

Proposition 3.2. Let {X_t, a ≤ t ≤ b} be a separable martingale such that EX_t² < ∞ for every t in [a,b]. Then, for every positive ε,

𝒫(sup_{a≤t≤b} |X_t| ≥ ε) ≤ EX_b²/ε²   (3.18)

Proof: Since {X_t, t ∈ [a,b]} is separable, there is a countable set S such that

sup_{t∈S} |X_t| = sup_{t∈[a,b]} |X_t|     a.s.

Now, since S is countable, we can write S = ∪_n S_n where {S_n} is an increasing sequence of finite collections of points in [a,b]. Because 𝒫 is monotone-sequentially continuous,

𝒫(sup_{t∈S} |X_t| ≥ ε) = lim_{n→∞} 𝒫(max_{t∈S_n} |X_t| ≥ ε)

For a fixed n, let t_1, t_2, . . . , t_N be the points in S_n with t_1 < t_2 < · · · < t_N. Let ν(ω) be the first i (if any) such that |X_{t_i}(ω)| ≥ ε, and write

EX_b² = Σ_{k=1}^N 𝒫(ν = k)E(X_b² | ν = k) + 𝒫(max_{t∈S_n} |X_t| < ε)E(X_b² | max_{t∈S_n} |X_t| < ε)
≥ Σ_{k=1}^N 𝒫(ν = k)E(X_b² | ν = k)

E(X_b² | ν = k) = E[(X_b − X_{t_k})² | ν = k] + E(X_{t_k}² | ν = k) + 2E[X_{t_k}(X_b − X_{t_k}) | ν = k]

From the definition of ν, we have E(X_{t_k}² | ν = k) ≥ ε². Because the event ν = k depends only on X_{t_1}, X_{t_2}, . . . , X_{t_k}, we have

E[X_{t_k}(X_b − X_{t_k}) | ν = k] = E{X_{t_k}E[(X_b − X_{t_k}) | X_{t_1}, . . . , X_{t_k}] | ν = k} = 0

due to the martingale property. It follows that E(X_b² | ν = k) ≥ ε² and

EX_b² ≥ ε² Σ_{k=1}^N 𝒫(ν = k) = ε²𝒫(max_{t∈S_n} |X_t| ≥ ε) → ε²𝒫(sup_{a≤t≤b} |X_t| ≥ ε) as n → ∞

proving (3.18). ∎
Remarks:
(a) (3.18) holds for separable complex-valued martingales with E|X_b|² replacing EX_b². The proof is almost identical.
(b) If {X_t, a ≤ t ≤ b} is a separable process with independent increments such that 𝒫(X_t − X_s > 0) = 𝒫(X_t − X_s < 0), a ≤ t, s ≤ b, then we can prove

𝒫(sup_{a≤t≤b} |X_t| ≥ ε) ≤ 2𝒫(|X_b| ≥ ε)

which is a formula in the same spirit as (3.18) and can be proved in a similar way.
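Inequality (3.18) can be checked by simulation, using Brownian motion on [0,1] as the martingale and a fine grid as a stand-in for the separable supremum. A Python sketch (grid size, sample count, and the level ε = 1.5 are arbitrary choices for the example):

import numpy as np

rng = np.random.default_rng(2)
n, paths, eps = 1000, 20000, 1.5
dX = np.sqrt(1.0 / n) * rng.standard_normal((paths, n))
X = np.cumsum(dX, axis=1)                     # Brownian motion at t = k/n, X_0 = 0

sup_abs = np.max(np.abs(X), axis=1)           # sup over the grid
lhs = np.mean(sup_abs >= eps)                 # P(sup |X_t| >= eps)
rhs = np.mean(X[:, -1] ** 2) / eps**2         # E X_1^2 / eps^2
print(lhs, "<=", rhs)                         # illustrates (3.18)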

Proposition 3.3. With probability 1, every sample function of a separable Brownian motion is uniformly continuous on every finite interval.

The proof of this proposition will be deferred until the next section where it can be easily proved using some sufficient conditions for sample continuity. The sample-continuity property of a separable Brownian motion can be interpreted in a different way as follows: Let C denote the space of all continuous real-valued functions on [0,∞). Let X_t(ω), ω ∈ C, be the value of ω at t. Let 𝒜 denote the smallest σ algebra of subsets of C such that every X_t is measurable. Then it can be shown that there exists a unique probability measure 𝒫 on (C,𝒜) such that {X_t, 0 ≤ t < ∞} is a Brownian motion. So constructed, {X_t, t ≥ 0} is necessarily separable. Sample continuity in this context says no more than 𝒫(C) = 1. The measure 𝒫 on (C,𝒜) will be called the Wiener measure. Although the sample functions of a separable Brownian motion are almost surely continuous, they are highly irregular. For example, with probability 1, the sample functions of a separable Brownian motion are nowhere differentiable and are of unbounded variation on every interval. Roughly speaking, (X_{t+δ} − X_t) is of order O(√δ), which is incompatible with either differentiability or bounded variation. This behavior of the Brownian motion is made precise by the following proposition.

Proposition 3.4. Let T = [a,b] be a closed subinterval of [0,∞). Let T_n = {a = t_0^{(n)} < t_1^{(n)} < · · · < t_{N(n)}^{(n)} = b}, n = 1, 2, . . . , be a sequence of partitions of T such that

Δ_n = max_ν (t_ν^{(n)} − t_{ν−1}^{(n)}) → 0 as n → ∞

Let {X_t, t ≥ 0} be a Brownian motion; then

Σ_{ν=1}^{N(n)} (X_{t_ν^{(n)}} − X_{t_{ν−1}^{(n)}})² → b − a in q.m. as n → ∞

If Σ_{n=1}^∞ Δ_n < ∞, then the convergence is also almost sure.

Proof: To prove quadratic-mean convergence, we write

S_n = Σ_{ν=1}^{N(n)} (X_{t_ν^{(n)}} − X_{t_{ν−1}^{(n)}})² − (b − a)
= Σ_{ν=1}^{N(n)} [(X_{t_ν^{(n)}} − X_{t_{ν−1}^{(n)}})² − (t_ν^{(n)} − t_{ν−1}^{(n)})]

Since S_n is a sum of independent random variables, each with zero mean, we have

ES_n = 0
ES_n² = Σ_{ν=1}^{N(n)} E[(X_{t_ν^{(n)}} − X_{t_{ν−1}^{(n)}})² − (t_ν^{(n)} − t_{ν−1}^{(n)})]²
= 2 Σ_{ν=1}^{N(n)} (t_ν^{(n)} − t_{ν−1}^{(n)})² ≤ 2Δ_n(b − a)

Since Δ_n → 0 as n → ∞, this proves S_n → 0 in q.m. as n → ∞.
For the second part of the proposition, using the Chebyshev inequality (1.5.5), we find

𝒫(|S_n| > ε) ≤ ES_n²/ε² ≤ 2(b − a)Δ_n/ε²

Since by assumption Σ_n Δ_n < ∞, we have for every ε > 0,

Σ_{n=1}^∞ 𝒫(|S_n| > ε) < ∞

By the Borel-Cantelli lemma, |S_n| > ε for at most a finite number of n's with probability 1, proving the theorem. ∎

Remark: The condition Σ_n Δ_n < ∞ for a.s. convergence can be replaced by the condition that the T_n be nested, that is, T_{n+1} ⊃ T_n for each n [Doob, 1953, pp. 395-396].

As we said earlier, an intuitive interpretation of Proposition 3.4 is that dX_t ~ √(dt). This has some important consequences. For example, we would expect the following:

df(X_t) = f(X_{t+dt}) − f(X_t) = f′(X_t) dX_t + ½f″(X_t) dt

This relationship will be made precise subsequently by means of the stochastic integral.
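The convergence of the sums of squared increments in Proposition 3.4 is easy to observe numerically. The following Python sketch (a single simulated path on [0,1] with dyadic partitions, so that ΣΔ_n < ∞; the path resolution is an arbitrary choice) computes the sums for several partition levels, and they cluster around b − a = 1:

import numpy as np

rng = np.random.default_rng(3)
N = 2**16
dX = np.sqrt(1.0 / N) * rng.standard_normal(N)
X = np.concatenate([[0.0], np.cumsum(dX)])   # one Brownian path on [0, 1]

# Quadratic variation over dyadic partitions T_n with mesh 2^{-n}.
for n in [4, 8, 12, 16]:
    step = N // 2**n
    incr = X[::step][1:] - X[::step][:-1]
    print(n, np.sum(incr**2))                # approaches b - a = 1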

4. CONTINUITY
For a stochastic process {X_t, t ∈ T} defined on an interval T, we distinguish the following types of continuity. The process is said to be

1. Continuous in probability at t if for any ε > 0,

lim_{h→0} 𝒫(|X_{t+h} − X_t| ≥ ε) = 0   (4.1)

2. Continuous in νth mean at t (in quadratic mean if ν = 2), if

lim_{h→0} E|X_{t+h} − X_t|^ν = 0   (4.2)

3. Almost-surely continuous at t, if

𝒫({ω: lim_{h→0} |X_{t+h}(ω) − X_t(ω)| = 0}) = 1   (4.3)

4. Almost-surely sample continuous, if {X_t, t ∈ T} is separable and

𝒫(∪_{t∈T} {ω: lim_{h→0} |X_{t+h}(ω) − X_t(ω)| ≠ 0}) = 0   (4.4)

If a process {X_t, t ∈ T} is continuous in probability (in νth mean) at every t of T, then it is said to be continuous in probability (in νth mean). We have already encountered continuity in probability in connection with questions of separability. Continuity in quadratic mean (ν = 2) will play an important role in our discussion of second-order properties in Chap. 3. In this section we are primarily interested in a.s. continuity and a.s. sample continuity.
First of all, we note that almost-sure continuity at every t of T is not equivalent to a.s. sample continuity. This is simply because A_t = {ω: lim_{h→0} |X_{t+h}(ω) − X_t(ω)| ≠ 0}, being a null event for every t, does not imply that the uncountable union ∪_{t∈T} A_t is a null event. The defining condition (4.4) for sample continuity is not one that can be easily verified in terms of finite-dimensional distributions. Our primary objective here is to replace it by simpler sufficient conditions. To begin with, if a process is a.s. sample continuous, then it is necessarily a.s. continuous at every t in T, which in turn implies that the process is continuous in probability. Thus, every sample-continuous process is both separable (by definition) and continuous in probability. From Proposition 2.2 we know that for every such process, every countable set S dense in T is a separating set. Hence, with probability 1

sup_{t,s∈S, |t−s|<h} |X_t − X_s| = sup_{t,s∈T, |t−s|<h} |X_t − X_s|   (4.5)

Assume T to be a closed interval. With no loss of generality we can take T to be [0,1]. Then S can be taken to be

S = {k/2ⁿ, k = 0, 1, . . . , 2ⁿ − 1; n = 0, 1, 2, . . . }   (4.6)

Proposition 4.1. Let S be given by (4.6) and define

Z_ν(ω) = sup_{0≤k≤2^ν−1} |X_{(k+1)/2^ν}(ω) − X_{k/2^ν}(ω)|   (4.7)

Then,

sup_{t,s∈S, |t−s|<2^{−n}} |X_t − X_s| ≤ 2 Σ_{ν=n+1}^∞ Z_ν   (4.8)

Proof: If |t − s| < 2^{−n}, then we can find k such that 0 ≤ k ≤ 2ⁿ and |t − k/2ⁿ| < 2^{−n}, |s − k/2ⁿ| < 2^{−n}. If t ∈ S and |t − k/2ⁿ| < 2^{−n}, t must be of the form

t = k/2ⁿ + Σ_{ν=1}^m t_ν 2^{−(n+ν)}     (t_ν = 0, 1)

Thus,

|X_t − X_{k/2ⁿ}| ≤ Σ_{ν=n+1}^{n+m} Z_ν ≤ Σ_{ν=n+1}^∞ Z_ν

and the same bound holds for |X_s − X_{k/2ⁿ}|, so that

sup_{t,s∈S, |t−s|<2^{−n}} |X_t − X_s| ≤ 2 Σ_{ν=n+1}^∞ Z_ν

which proves the proposition. ∎


Proposition 4.1 shows that if

lim_{n→∞} Σ_{ν=n}^∞ Z_ν = 0   (4.9)

with probability 1, then

lim_{n→∞} sup_{t,s∈S, |t−s|<2^{−n}} |X_t − X_s| = 0   (4.10)

with probability 1, which proves that every sample function is uniformly continuous on T with probability 1, since in (4.10) S can be replaced by T.

An extremely useful condition which implies (4.9) is the Kolmogorov condition given in the following proposition.

Proposition 4.2. Let {X_t, t ∈ T} be a separable process and let T be a finite interval. If there exist strictly positive constants α, β, C such that

E|X_{t+h} − X_t|^α ≤ Ch^{1+β}   (4.11)

then

sup_{t,s∈T, |t−s|<h} |X_t − X_s| → 0 a.s. as h → 0   (4.12)

so that almost every sample function is uniformly continuous on T.

Remark: The great advantage of the Kolmogorov condition is simply that it only involves two-dimensional distributions and can usually be verified by direct computation.

Proof: From the Markov inequality (1.5.5), we have

𝒫(|X_{t+h} − X_t| ≥ ε) ≤ E|X_{t+h} − X_t|^α/ε^α     ε > 0

Therefore,
𝒫(|X_{t+h} − X_t| ≥ h^γ) ≤ Ch^{1+β−αγ}
Let 0 ≤ γ < β/α, and set δ = β − αγ > 0. Then
𝒫(|X_{t+h} − X_t| ≥ h^γ) ≤ Ch^{1+δ}
Hence,

𝒫(Z_ν ≥ (1/2^ν)^γ) = 𝒫(sup_{0≤k≤2^ν−1} |X_{(k+1)/2^ν} − X_{k/2^ν}| ≥ (1/2^ν)^γ)
≤ Σ_{k=0}^{2^ν−1} 𝒫(|X_{(k+1)/2^ν} − X_{k/2^ν}| ≥ (1/2^ν)^γ)
≤ C2^ν(1/2^ν)^{1+δ} = C2^{−δν}

Since Σ_{ν=0}^∞ 2^{−δν} < ∞, we have

Σ_{ν=0}^∞ 𝒫(Z_ν ≥ (1/2^ν)^γ) < ∞

By the Borel-Cantelli lemma, Z_ν ≥ 1/2^{νγ} for at most a finite number of ν's with probability 1. That is, there exists N(ω) almost-surely finite such that

Z_ν(ω) < 1/2^{γν}     for all ν ≥ N(ω)

and lim_{n→∞} Σ_{ν=n+1}^∞ Z_ν(ω) = 0 with probability 1. From Proposition 4.1,

sup_{t,s∈S, |t−s|<2^{−n}} |X_t − X_s| → 0 a.s. as n → ∞

Since S is a separating set in T, this proves Proposition 4.2. ∎


The Kolmogorov condition (4.11) is sufficient to prove the sample continuity of Brownian motion, viz., Proposition 3.3. We recall that for a Brownian motion

𝒫(X_t − X_s < x) = ∫_{−∞}^x (1/√(2π|t − s|)) exp(−½u²/|t − s|) du

Therefore,

E|X_t − X_s|⁴ = ∫_{−∞}^∞ u⁴ (1/√(2π|t − s|)) exp(−½u²/|t − s|) du = 3(t − s)²

and

E|X_{t+h} − X_t|⁴ = 3h²

which verifies the Kolmogorov condition with C = 3, α = 4, β = 1. Therefore, with probability 1, every sample function of a separable Brownian motion is uniformly continuous on every finite interval.
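The moment identity E|X_t − X_s|⁴ = 3(t − s)² can be confirmed by direct simulation of the Gaussian increments (a Python sketch; the values of h and the sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
for h in [0.5, 0.1, 0.02]:
    # X_{t+h} - X_t ~ N(0, h) for Brownian motion.
    incr = np.sqrt(h) * rng.standard_normal(1000000)
    print(h, np.mean(incr**4), 3.0 * h**2)   # E|increment|^4 vs 3 h^2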
Let T be a finite and closed interval which we take to be [0,1] with no loss of generality. Let C be the space of all continuous functions on [0,1]. If ω ∈ C, we define

X_t(ω) = value of ω at t     t ∈ [0,1]

The space C is a Banach space (complete normed linear space) with a norm given by

‖ω‖ = max_{0≤t≤1} |X_t(ω)|

Let ℬ be the minimal Boolean algebra with respect to which every X_t is measurable. ℬ is not a σ algebra. Now, every consistent family of finite-dimensional distributions for {X_t, t ∈ [0,1]} defines an elementary
probability measure 𝒫′ on (C,ℬ). However, 𝒫′ is, in general, only finitely additive, but not σ additive. If 𝒫′ is σ additive (equivalently, if 𝒫′ is sequentially continuous), then it can be extended to a probability measure 𝒫 on (C,𝒜) where 𝒜 is the minimal σ algebra with respect to which every X_t is measurable. What results then is a stochastic process {X_t, t ∈ [0,1]} defined on (C,𝒜,𝒫). Since 𝒫(C) = 1, every sample function of X_t is obviously continuous. We can interpret the Kolmogorov condition to mean that every consistent family of finite-dimensional distributions which satisfies (4.11) defines an elementary probability measure 𝒫′ on (C,ℬ) which is σ additive [Prokhorov, 1956]. It is possible to show this directly, but we won't do it here. In the case of Brownian motion, the resulting 𝒫 is known as the Wiener measure. We should note that every process {X_t, t ∈ T} defined on (C,𝒜,𝒫) in this way is necessarily separable and measurable.
If a process is not sample continuous, then it is clearly of interest to know the nature of its discontinuities. We shall say that a function f(t), 0 ≤ t ≤ 1, has only discontinuities of the first kind if (1) f is bounded, and (2) at every t ∈ (0,1) the limit from the left lim_{s↑t} f(s) = f_−(t) and the limit from the right lim_{s↓t} f(s) = f_+(t) exist. Clearly, [0,1] can easily be replaced by [a,b] in everything that we say. We give without proof a condition, similar to the Kolmogorov condition, which guarantees that a process {X_t, t ∈ [0,1]} has sample functions which have only discontinuities of the first kind with probability 1 [Cramér, 1966].

Proposition 4.3. Let {X_t, 0 ≤ t ≤ 1} be a separable process. If there exist strictly positive constants α, β, C such that

sup_{t≤s≤t+h} E(|X_{t+h} − X_s|^α |X_s − X_t|^α) ≤ Ch^{1+β}   (4.13)

then with probability 1, every sample function of {X_t, 0 ≤ t ≤ 1} has only discontinuities of the first kind.

Remark: Condition (4.13) is weaker than (4.11), as it should be. To show this, we need only to apply the Schwarz inequality and find

E(|X_{t+h} − X_s|^{α/2} |X_s − X_t|^{α/2}) ≤ [E|X_{t+h} − X_s|^α]^{1/2}[E|X_s − X_t|^α]^{1/2} ≤ Ch^{1+β}

Hence, if condition (4.11) is satisfied with α, β and C, then (4.13) is satisfied with α/2, β and C.
As an example of the application of (4.13), consider the random telegraph process {X_t, −∞ < t < ∞} defined as follows:
(a) 𝒫(X_t = 1) = 𝒫(X_t = −1) = ½
(b) {X_t, −∞ < t < ∞} is a process with independent increments, that is,

(X_{t_n} − X_{t_{n−1}}), (X_{t_{n−1}} − X_{t_{n−2}}), . . . , (X_{t_2} − X_{t_1}), X_{t_1}

are independent whenever t_n > t_{n−1} > · · · > t_1
(c) 𝒫(X_t − X_s = 0) = ½(1 + e^{−|t−s|})   (4.14)
𝒫(X_t − X_s = 2) = 𝒫(X_t − X_s = −2) = ¼(1 − e^{−|t−s|})

It is clear that with probability 1, {X_t, −∞ < t < ∞} takes only the values ±1. Because it has independent increments,

E(|X_{t+h} − X_s| |X_s − X_t|) = E|X_{t+h} − X_s| E|X_s − X_t|
= (1 − e^{−|t+h−s|})(1 − e^{−|s−t|})
≤ h²/4     for all s in [t, t+h]

Hence, (4.13) is satisfied, and with probability 1, every sample function has only discontinuities of the first kind.
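The random telegraph process can be simulated by switching the sign at the points of a Poisson stream. Matching (4.14) requires that the number of switches in an interval of length τ be Poisson with mean τ/2, since then the probability of an even count is ½(1 + e^{−τ}); this identification of the switching rate is an inference from (4.14), not stated in the text. A Python sketch:

import numpy as np

rng = np.random.default_rng(5)
paths, tau, lam = 200000, 0.7, 0.5           # lam = 1/2 inferred from (4.14)

# X_s = +/-1 with probability 1/2; X_t = X_s * (-1)^(number of switches in (s, t]).
Xs = rng.choice([-1.0, 1.0], size=paths)
flips = rng.poisson(lam * tau, size=paths)
Xt = Xs * (-1.0) ** flips

# Compare with (4.14): P(X_t - X_s = 0) = (1 + exp(-tau)) / 2.
print(np.mean(Xt == Xs), 0.5 * (1.0 + np.exp(-tau)))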
As a final topic, we note that the proof of Proposition 4.2 contains an estimate of the modulus of continuity which we now make explicit. As in Proposition 4.2, we assume {X_t, 0 ≤ t ≤ 1} to be separable and, for some positive constants C, α, β,

E|X_{t+h} − X_t|^α ≤ Ch^{1+β}

We then found that there existed an a.s. finite random variable N(ω) so that

Z_ν(ω) = sup_{0≤k≤2^ν−1} |X_{(k+1)/2^ν}(ω) − X_{k/2^ν}(ω)| < 1/2^{νγ}     for all ν ≥ N(ω)   (4.15)

where γ is any constant in [0, β/α). If we take S = {k/2ⁿ, k = 0, . . . , 2ⁿ − 1, n = 0, 1, . . . }, then from Proposition 4.1

sup_{t,s∈S, |t−s|<2^{−n}} |X_t(ω) − X_s(ω)| ≤ 2 Σ_{ν=n+1}^∞ Z_ν(ω)

so that for n ≥ N(ω)

sup_{t,s∈S, |t−s|<2^{−n}} |X_t(ω) − X_s(ω)| ≤ 2 Σ_{ν=n+1}^∞ 1/2^{νγ} = (2/(1 − 2^{−γ})) (1/2^γ)^{n+1}

for any γ in [0, β/α). Hence, for any h < 2^{−N(ω)},

sup_{t,s∈S, |t−s|≤h} |X_t(ω) − X_s(ω)| ≤ (2/(1 − 2^{−γ})) h^γ   (4.16)
Recalling that $S$ is a separating set in $[0,1]$, we have proven that

$$\lim_{h \downarrow 0}\ h^{-\beta/\alpha + \epsilon} \sup_{\substack{t,s \in [0,1]\\ |t-s| \le h}} |X_t - X_s| = 0 \tag{4.17}$$

with probability 1 for any $\epsilon > 0$. For a Brownian motion process, the
largest $\beta/\alpha$ that we can take is $\frac{1}{2}$. Therefore, for a separable Brownian
motion process,

$$\lim_{h \downarrow 0}\ \frac{\displaystyle\sup_{\substack{t,s \in [0,1]\\ |t-s| \le h}} |X_t - X_s|}{h^{\frac{1}{2} - \epsilon}} = 0 \qquad\text{with probability 1} \tag{4.18}$$

for any $\epsilon > 0$.

5. MARKOV PROCESSES

In this section we shall give a preliminary discussion of Markov processes.
In later chapters a more detailed treatment of sample-continuous Markov
processes will be given in connection with the diffusion equation and
stochastic differential equations. We recall the definition of a Markov
process given in Sec. 3.

Definition. A process $\{X_t, t \in T\}$ is said to be a Markov process if for any
increasing collection $t_1, t_2, \ldots, t_n$ in $T$

$$\mathcal{P}(X_{t_n} < x_n \mid X_{t_\nu} = x_\nu,\ \nu = 1, \ldots, n-1) = \mathcal{P}(X_{t_n} < x_n \mid X_{t_{n-1}} = x_{n-1}) \tag{5.1}$$

If the finite-dimensional distributions of a Markov process $\{X_t, t \in T\}$
can be expressed in terms of density functions, then whenever $t_1 < t_2 < \cdots < t_n$ we can write

$$p(x_1,t_1;\, x_2,t_2;\, \ldots;\, x_n,t_n) = p(x_n,t_n \mid x_1,t_1;\, \ldots;\, x_{n-1},t_{n-1})\,p(x_1,t_1;\, \ldots;\, x_{n-1},t_{n-1}) = p(x_n,t_n \mid x_{n-1},t_{n-1})\,p(x_1,t_1;\, \ldots;\, x_{n-1},t_{n-1}) \tag{5.2}$$

Therefore, by using (5.2) repeatedly, we find for $t_1 < t_2 < \cdots < t_n$,

$$p(x_1,t_1;\, \ldots;\, x_n,t_n) = p(x_1,t_1)\prod_{\nu=2}^{n} p(x_\nu,t_\nu \mid x_{\nu-1},t_{\nu-1}) \tag{5.3}$$

which means that all finite-dimensional distributions are completely
determined by the two-dimensional distributions. This fact is true for
all Markov processes and does not depend on the existence of density
functions.
To obtain an expression relating the $n$-dimensional distribution
function to the two-dimensional distribution, we first adopt the notation

$$P(x_1,t_1;\, x_2,t_2;\, \ldots;\, x_n,t_n) = \mathcal{P}(X_{t_\nu} < x_\nu;\ \nu = 1, \ldots, n) \tag{5.4}$$
$$P(x_\nu,t_\nu \mid x_{\nu-1},t_{\nu-1}) = \mathcal{P}(X_{t_\nu} < x_\nu \mid X_{t_{\nu-1}} = x_{\nu-1}) \tag{5.5}$$

The conditional distribution $P(x,t \mid \xi,s)$, $t > s$, is often called the transition
function. Suppose $t_1 < t_2 < t_3$; then we can write $P(x_1,t_1;\, x_2,t_2;\, x_3,t_3)$ as

$$P(x_1,t_1;\, x_2,t_2;\, x_3,t_3) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} \mathcal{P}(X_{t_3} < x_3 \mid X_{t_1} = \xi_1,\, X_{t_2} = \xi_2)\,\mathcal{P}(X_{t_1} \in d\xi_1,\, X_{t_2} \in d\xi_2)$$

where $d\xi_i$ stands for $[\xi_i,\ \xi_i + d\xi_i)$. Using the Markov property, we get

$$P(x_1,t_1;\, x_2,t_2;\, x_3,t_3) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} P(x_3,t_3 \mid \xi_2,t_2)\,dP(\xi_1,t_1;\, \xi_2,t_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} P(x_3,t_3 \mid \xi_2,t_2)\,dP(\xi_2,t_2 \mid \xi_1,t_1)\,dP(\xi_1,t_1)$$

More generally, we have for $t_1 < t_2 < \cdots < t_n$

$$P(x_1,t_1;\, x_2,t_2;\, \ldots;\, x_n,t_n) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_{n-1}} P(x_n,t_n \mid \xi_{n-1},t_{n-1})\,dP(\xi_{n-1},t_{n-1} \mid \xi_{n-2},t_{n-2})\cdots dP(\xi_2,t_2 \mid \xi_1,t_1)\,dP(\xi_1,t_1) \tag{5.6}$$

which expresses the $n$-dimensional distribution of a Markov process in
terms of the one-dimensional distribution $P(x,t)$ and the conditional
distribution $P(x,t \mid \xi,s)$, $t > s$, both of which are obtainable from the
two-dimensional distribution $P(\xi,s;\, x,t)$.
However, we should note that we cannot construct the finite-
dimensional distributions of a Markov process out of just any two-
dimensional distribution. There are two consistency conditions that must
be satisfied. The first one is obvious,

$$P(x,t) = \mathcal{P}(X_t < x,\ -\infty < X_s < \infty) = \int_{-\infty}^{\infty} P(x,t \mid \xi,s)\,dP(\xi,s) \tag{5.7}$$

This condition relates $P(x,t)$ to $P(x,t \mid \xi,s)$. The second condition is obtained
by noting that if $t_0 < s < t$, then

$$P(x,t \mid x_0,t_0) = \mathcal{P}(X_t < x \mid X_{t_0} = x_0) = \mathcal{P}(X_t < x,\ -\infty < X_s < \infty \mid X_{t_0} = x_0) = \int_{-\infty}^{\infty} \mathcal{P}(X_t < x \mid X_s = \xi,\, X_{t_0} = x_0)\,\mathcal{P}(X_s \in d\xi \mid X_{t_0} = x_0) = \int_{-\infty}^{\infty} P(x,t \mid \xi,s)\,dP(\xi,s \mid x_0,t_0)$$

This yields a condition that must be satisfied by the transition function
$P(x,t \mid \xi,s)$, $t > s$, namely,

$$P(x,t \mid x_0,t_0) = \int_{-\infty}^{\infty} P(x,t \mid \xi,s)\,dP(\xi,s \mid x_0,t_0) \qquad t_0 < s < t \tag{5.8}$$
Equation (5.8) is known as the Chapman-Kolmogorov equation. Given
distribution functions $P(x,t)$ and $P(x,t \mid \xi,s)$ satisfying (5.7) and (5.8), the
finite-dimensional distribution functions constructed by using (5.6) will
indeed satisfy the Markov property. Therefore, (5.7) and (5.8) are both
necessary and sufficient for $P(x,t)$ and $P(x,t \mid \xi,s)$ to be the one-dimensional
distribution and the transition function of a Markov process.

As an example, suppose that we want to define a Markov process
$\{X_t, -\infty < t < \infty\}$ with the following properties:

1. $X_t$ takes only the values $1$ or $-1$
2. $\mathcal{P}(X_t = 1) = \frac{1}{2} = \mathcal{P}(X_t = -1)$
3. For $t \ge s$,
$$\mathcal{P}(X_t = X_s) = q(t-s) \qquad \mathcal{P}(X_t = -X_s) = 1 - q(t-s)$$
where $q(t)$ is continuous and $q(0) = 1$

It is easy to verify that (5.7) is automatically satisfied. However, (5.8)
imposes a constraint on $q$, namely,

$$q(t - t_0) = q(t-s)q(s-t_0) + [1 - q(t-s)][1 - q(s-t_0)] \tag{5.9}$$

A second equation can also be obtained, but it turns out to be redundant.
Equation (5.9) looks a little simpler if we make a few changes of variables
and write

$$q(t+s) = q(t)q(s) + [1 - q(t)][1 - q(s)]$$

By setting $q(t) = \frac{1}{2}[1 + f(t)]$, we find

$$f(t+s) = f(t)f(s) \tag{5.10}$$

The only continuous solution of (5.10) not identically equal to 0 is

$$f(t) = e^{-\lambda t}$$

and $\lambda$ must be nonnegative, because $f(t) \le 1$. This means that $q(t)$ must
be of the form

$$q(t) = \tfrac{1}{2}\left(1 + e^{-\lambda t}\right) \tag{5.11}$$

The resulting process is precisely the random telegraph process that we
introduced in Sec. 4 (4.14), except for the trivial scale factor $\lambda$.
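It is also easy to check numerically that the form (5.11) satisfies the Chapman-Kolmogorov constraint (5.9) identically, and that a perturbed $q$ does not. The sketch below is purely illustrative; the grid of times, the value of $\lambda$, and the perturbed comparison function are our own choices.

```python
import numpy as np

def ck_residual(q, t0, s, t):
    """Residual of the Chapman-Kolmogorov constraint (5.9) for a given q."""
    lhs = q(t - t0)
    rhs = q(t - s) * q(s - t0) + (1 - q(t - s)) * (1 - q(s - t0))
    return abs(lhs - rhs)

lam = 1.7
q_good = lambda u: 0.5 * (1 + np.exp(-lam * u))      # the form (5.11)
q_bad = lambda u: 0.5 * (1 + np.exp(-lam * u ** 2))  # not of the form (5.11)

grid = [(t0, s, t) for t0 in np.linspace(0, 1, 5)
        for s in np.linspace(1.1, 2, 5)
        for t in np.linspace(2.1, 3, 5)]
print(max(ck_residual(q_good, *g) for g in grid))  # ~1e-16: (5.9) holds
print(max(ck_residual(q_bad, *g) for g in grid))   # order 0.1: (5.9) fails
```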
As a second example of using (5.8), we shall derive a set of conditions
which must be satisfied by the covariance function of a Gaussian-Markov
process. We can always assume that $\{X_t, t \in T\}$ has zero mean, because
$X_t$ is Markov if and only if $X_t - \mu(t)$ is Markov. If $\{X_t, t \in T\}$ is Gaussian,
zero mean, and if it has covariance function $R(t,s)$, then

$$E(X_t \mid X_s = \xi) = \frac{R(t,s)}{R(s,s)}\,\xi \tag{5.12}$$

To prove (5.12), we merely note that $X_s$ and $X_t - [R(t,s)/R(s,s)]X_s$
are Gaussian and uncorrelated, hence are independent. Thus,

$$E(X_t \mid X_s = \xi) = \frac{R(t,s)}{R(s,s)}\,\xi + E\left[X_t - \frac{R(t,s)}{R(s,s)}\,X_s\right] = \frac{R(t,s)}{R(s,s)}\,\xi$$

Using (5.12) and (5.8) together, we get

$$E(X_t \mid X_{t_0} = x_0) = \int_{-\infty}^{\infty} x\,dP(x,t \mid x_0,t_0) = \int_{-\infty}^{\infty} \frac{R(t,s)}{R(s,s)}\,\xi\,dP(\xi,s \mid x_0,t_0) = \frac{R(t,s)}{R(s,s)}\,\frac{R(s,t_0)}{R(t_0,t_0)}\,x_0 = \frac{R(t,t_0)}{R(t_0,t_0)}\,x_0 \qquad t > s > t_0$$

Therefore, for $\{X_t, t \in T\}$ to be Gaussian and Markov we must have

$$R(t,t_0) = \frac{R(t,s)R(s,t_0)}{R(s,s)} \qquad t > s > t_0 \tag{5.13}$$

Suppose that $T$ is an interval and $R(t,t)$ is strictly positive for $t$ in the
interior of $T$. Then we can show that any solution of (5.13) must have
the form

$$R(t,s) = f(\max(t,s))\,g(\min(t,s)) \qquad t, s \in T \tag{5.14}$$

It turns out that (5.14) is also a sufficient condition for a Gaussian process
to be Markov. The simplest way to show this is as follows. First, we note
that a Brownian motion is Gaussian, Markov, and has $R(t,s) = \min(t,s)$
(see Sec. 3). Set

$$r(t) = \frac{g(t)}{f(t)} \tag{5.15}$$

Because $R(t,s) \le \sqrt{R(t,t)R(s,s)}$, $r(t)$ must be monotone nondecreasing
in $t$. Now, if $X_t$ is a Gaussian process with zero mean and covariance
function given by (5.14), then it can be represented as

$$X_t = f(t)\,Y_{r(t)} \tag{5.16}$$
where $Y_t$ is a Brownian motion. Since $f(t)$ is a deterministic function, and
$r(t)$ is nondecreasing, the fact that $Y_t$ is Markov implies that $X_t$ is Markov.
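For instance, the Ornstein-Uhlenbeck covariance $R(t,s) = e^{-|t-s|}$ factors as in (5.14) with $f(t) = e^{-t}$ and $g(t) = e^{t}$, so that $r(t) = e^{2t}$ and (5.16) reads $X_t = e^{-t}\,Y_{e^{2t}}$ (compare Exercise 11). A minimal Monte Carlo sketch of this representation, with grid sizes and seeds of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.array([0.0, 0.3, 1.0])        # observation times
r = np.exp(2 * t)                    # time change r(t) = g(t)/f(t) = e^{2t}
f = np.exp(-t)                       # scale factor f(t) = e^{-t}

n_paths = 200_000
# Brownian motion Y sampled at the increasing times r(t):
# independent Gaussian increments with variances diff(r) (and r[0] at the start).
incr = rng.normal(size=(n_paths, len(t)))
incr *= np.sqrt(np.concatenate(([r[0]], np.diff(r))))
Y = np.cumsum(incr, axis=1)
X = f * Y                            # X_t = f(t) Y_{r(t)}, as in (5.16)

emp = (X.T @ X) / n_paths                         # empirical covariance
theo = np.exp(-np.abs(t[:, None] - t[None, :]))   # e^{-|t-s|}
print(np.round(emp, 3))
print(np.round(theo, 3))
```

The empirical covariance matrix reproduces $e^{-|t-s|}$, because $f(t)f(s)\min(r(t),r(s)) = e^{-t-s}e^{2\min(t,s)} = e^{-|t-s|}$.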
As was pointed out in Sec. 1.7, rather than dealing with conditional
distributions, it is far simpler to deal with conditional expectations. The
setting is, however, more abstract. We shall now reexamine the Markov
property in terms of conditional expectations. First, we adopt some
notation. Let $\{X_t, t \in T\}$ be a process defined on a fixed probability
space $(\Omega,\mathcal{A},\mathcal{P})$. Let $\mathcal{A}_t$ denote the smallest $\sigma$-algebra with respect to which
every $X_s$, $s \le t$, is measurable. Let $\mathcal{A}_t^+$ denote the smallest $\sigma$-algebra with
respect to which $X_s$ is measurable for every $s \ge t$. Suppose that $\{X_t, t \in T\}$
is a Markov process; then the following relations hold. All equalities are
understood to hold up to sets of zero probability.

1. The future and the past are conditionally independent given the
present. That is, if $Z$ is $\mathcal{A}_t^+$ measurable and $Y$ is $\mathcal{A}_t$ measurable, then
$$E(ZY \mid X_t) = E(Z \mid X_t)\,E(Y \mid X_t) \tag{5.17}$$
2. The future, given the past and the present, is equal to the future given
only the present. That is, if $Z$ is $\mathcal{A}_t^+$ measurable then
$$E(Z \mid \mathcal{A}_t) = E(Z \mid X_t) \tag{5.18}$$
Either 1 or 2 can be taken as the defining condition for a Markov process.

Roughly speaking, the counterparts of (5.17) and (5.18) in terms
of density functions are, respectively, given as follows:
$$p(x,t;\ x_0,t_0 \mid \xi,s) = p(x,t \mid \xi,s)\,p(x_0,t_0 \mid \xi,s) \qquad t > s > t_0$$
$$p(x,t \mid x_0,t_0;\ \xi,s) = p(x,t \mid \xi,s) \qquad t > s > t_0$$
In this form, the equivalence between conditions 1 and 2 is intuitively
clear. To get (5.17) from (5.18) is easy. We merely note that if $Z$ is $\mathcal{A}_t^+$
measurable and $Y$ is $\mathcal{A}_t$ measurable, then

$$E(ZY \mid X_t) = E\left(E(ZY \mid \mathcal{A}_t) \mid X_t\right) = E\left(Y\,E(Z \mid \mathcal{A}_t) \mid X_t\right) = E\left[Y\,E(Z \mid X_t) \mid X_t\right] = E(Z \mid X_t)\,E(Y \mid X_t)$$

To get (5.18) from (5.17), take an arbitrary set $B \in \mathcal{A}_t$ and compute

$$E\left[I_B\,E(Z \mid X_t)\right] = E\left\{E\left[I_B\,E(Z \mid X_t) \mid X_t\right]\right\} = E\left[E(Z \mid X_t)\,E(I_B \mid X_t)\right] = E\left[E(I_BZ \mid X_t)\right] = E(I_BZ)$$

Since $E\,I_B\,E(Z \mid X_t) = E\,I_BZ$ for every set $B$ in $\mathcal{A}_t$ and $E(Z \mid X_t)$ is $\mathcal{A}_t$
measurable, (5.18) follows by the definition of conditional expectation.
There is still another consequence of the Markov property that is
useful to express. Suppose that $Z$ is $\mathcal{A}_t^+$ measurable and $t_0 < t$. Then
we have
$$E(Z \mid X_{t_0}) = E\left[E(Z \mid X_t) \mid X_{t_0}\right] \tag{5.19}$$
To prove (5.19), we merely have to use (5.18) and write

$$E(Z \mid X_{t_0}) = E\left(E(Z \mid \mathcal{A}_t) \mid X_{t_0}\right) = E\left[E(Z \mid X_t) \mid X_{t_0}\right]$$

Equation (5.19) is really the Chapman-Kolmogorov equation (5.8) in
disguise. If we set $B = \{\omega\colon X_t < x\}$, then (5.8) can be rewritten as

$$P(x,t \mid x_0,t_0) = E\left[P(x,t \mid X_s,s) \mid X_{t_0} = x_0\right] \qquad t_0 < s < t \tag{5.20}$$

which follows immediately from (5.19). A comparison between (5.8) and
(5.19) reveals the notational simplicity gained by using conditional
expectation.

6. STATIONARITY AND ERGODICITY

Definition. A process $\{X_t, -\infty < t < \infty\}$ is said to be a stationary
process if for any $(t_1, \ldots, t_n)$ the joint distribution of $\{X_{t_1+t_0},
X_{t_2+t_0}, \ldots, X_{t_n+t_0}\}$ does not depend on $t_0$.

Thus, if $\{X_t, t \in T\}$ is stationary, the $n$-dimensional distribu-
tion function $P(x_1,t_1;\, x_2,t_2;\, \ldots;\, x_n,t_n)$ is equal to $P(x_1,0;\, x_2,t_2 - t_1;\,
\ldots;\, x_n,t_n - t_1)$, and hence, it depends only on the time differences
$t_2 - t_1, \ldots, t_n - t_1$.

Definition. A process $\{X_t, -\infty < t < \infty\}$ is said to be a wide-sense
stationary process if
(a) $EX_t^2 < \infty$
(b) $EX_t = \mu$ is a constant
(c) $E(X_t - \mu)(X_s - \mu)$ depends only on $t - s$

Because a stationary process may have $EX_t^2 = \infty$, a stationary process is
also wide-sense stationary if and only if $EX_t^2 < \infty$. Wide-sense sta-
tionarity will be discussed in Chap. 3 in connection with second-order
processes. In this section, we restrict our consideration to stationary
processes, which are sometimes referred to as strict-sense stationary
processes to emphasize the difference from wide-sense stationarity. The
random telegraph process is an example of a stationary process. The
Brownian motion is not stationary. A Gaussian process with zero mean
and a covariance function given by

$$R(t,s) = e^{-|t-s|}$$

is both stationary and Markov. It is known as the Ornstein-Uhlenbeck
process.
Let $\{X_t, -\infty < t < \infty\}$ be a stationary process defined on a com-
pleted probability space $(\Omega,\mathcal{A},\mathcal{P})$. We suppose that $\mathcal{A}$ is the completion
of the smallest $\sigma$-algebra with respect to which every $X_t$ is measurable.
We also suppose that $\{X_t, -\infty < t < \infty\}$ is a separable and measurable
process. Let $S$ denote the space of all $\mathcal{A}$-measurable random variables.
We agree that two random variables which are equal almost surely
count as the same element of $S$. Then $S$ is a linear space closed under
almost-sure convergence. Now, we define a family $\{T_t, -\infty < t < \infty\}$
of linear mappings of $S$ onto $S$ as follows:

$$T_tX_s = X_{s+t} \qquad -\infty < t, s < \infty \tag{6.1a}$$

If $Z(\omega) = f(X_{t_1}(\omega), \ldots, X_{t_n}(\omega))$, then

$$T_tZ = f(X_{t_1+t}, \ldots, X_{t_n+t}) \tag{6.1b}$$

If $\{Z_n\}$ is a sequence in $S$ converging almost surely to $Z$, then

$$T_tZ = \lim_{n\to\infty} \text{a.s. } T_tZ_n \tag{6.1c}$$

Since every $Z$ in $S$ can be approximated by (a.s. limit) a sequence of
Borel functions of a finite number of $X_t$'s, (6.1a) to (6.1c) adequately
define $\{T_t, -\infty < t < \infty\}$.

Since $\{X_t, -\infty < t < \infty\}$ is stationary, $\{T_tX_{t_\nu}, \nu = 1, \ldots, n\}$
and $\{X_{t_\nu}, \nu = 1, \ldots, n\}$ have the same distribution. Therefore,
$T_tZ$ and $Z$ have identical distributions whenever $Z$ is of the form
$f(X_{t_1}, \ldots, X_{t_n})$. By virtue of the fact that convergence a.s. implies
convergence in distribution, $T_tZ$ and $Z$ are always identically distributed.
It is also easy to see that $\{T_t, -\infty < t < \infty\}$ is a translation group,
that is, $T_{t+s} = T_tT_s$, $T_0 = I$, $T_{-t} = T_t^{-1}$. Summarizing these properties
of $T_t$, we have the following:

$$T_t \text{ is a linear mapping of } S \text{ onto } S \tag{6.2a}$$
$$T_t \text{ preserves a.s. convergence} \tag{6.2b}$$
$$T_{t+s} = T_tT_s, \quad T_0 = I, \quad T_{-t} = T_t^{-1} \tag{6.2c}$$
$$T_tZ \text{ and } Z \text{ are identically distributed}, \quad -\infty < t < \infty,\ Z \in S \tag{6.2d}$$

Now, suppose a random variable $Z$ in $S$ is such that for each $t$, $T_tZ = Z$
almost surely. That is, for each $t$, $T_tZ$ and $Z$ differ only on a set $A_t$ such
that $\mathcal{P}(A_t) = 0$. Then, $Z$ is said to be an invariant random variable of
the process $\{X_t, -\infty < t < \infty\}$.

Definition. A stationary process $\{X_t, -\infty < t < \infty\}$ is said to be ergodic
if every invariant random variable of the process is almost surely
equal to a constant.

We note that a random variable almost surely equal to a constant is
always invariant. The great interest in ergodic processes, from the point
of view of applications, is largely due to the following theorem.
Proposition 6.1. Let $\{X_t, -\infty < t < \infty\}$ be a separable and measurable
ergodic process. Let $f$ be any Borel function such that $E|f(X_0)| < \infty$.
Then

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(X_t)\,dt = Ef(X_0) \qquad\text{almost surely} \tag{6.3}$$

Remark: We note that because of stationarity, $Ef(X_t) = Ef(X_0)$ for
every $t$. We interpret Proposition 6.1 as saying that for an ergodic
process, time average equals ensemble average.
The proof of Proposition 6.1 will be omitted. We consider, instead,
an example illustrating the foregoing discussion. Suppose

$$X_t(\omega) = A(\omega)\cos\left[2\pi t + \Theta(\omega)\right] \qquad -\infty < t < \infty \tag{6.4}$$

where $A$ and $\Theta$ are independent random variables, and $\Theta$ is uniformly
distributed on $[0,2\pi)$. It is easy to show (e.g., by computing the char-
acteristic functions) that $\{X_t, -\infty < t < \infty\}$ is stationary. Now, for
this simple example, every $Z$ in $S$ is some Borel function of the pair
$(A,\Theta)$, and

$$T_tf(A,\Theta) = f\left(A,\ (\Theta + 2\pi t) \bmod 2\pi\right) \tag{6.5}$$

Every $Z$ in $S$ which depends only on $A$ and not on $\Theta$ is an invariant
random variable. Thus $\{X_t, -\infty < t < \infty\}$ is ergodic if and only if $A$
is almost surely a constant. This result can also be inferred from Proposi-
tion 6.1. For an arbitrary Borel function $f$ such that $E|f(X_0)| < \infty$,
we have

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(X_t(\omega))\,dt = \lim_{N\to\infty} \frac{1}{2N}\int_{-N}^{N} f\left(A(\omega)\cos\left[2\pi t + \Theta(\omega)\right]\right) dt = \lim_{N\to\infty} \frac{1}{2N}\int_{-N}^{N} f(A(\omega)\cos 2\pi t)\,dt = \int_0^1 f(A(\omega)\cos 2\pi t)\,dt \tag{6.6}$$

where we made repeated use of the fact that $f(A\cos(2\pi t + \Theta))$ is periodic
in $t$ with period 1. On the other hand,

$$Ef(X_0) = Ef(A\cos\Theta) = \int_{-\infty}^{\infty}\left[\frac{1}{2\pi}\int_0^{2\pi} f(a\cos\varphi)\,d\varphi\right] dP_A(a) = E\int_0^1 f(A\cos 2\pi t)\,dt \tag{6.7}$$
If we denote $\int_0^1 f(A\cos 2\pi t)\,dt$ by $\hat{f}(A)$, then the time average is $\hat{f}(A(\omega))$,
while the ensemble average is $E\hat{f}(A)$. These two are equal for every such $f$
if and only if $A(\omega)$ is almost surely equal to a constant.
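A quick numerical experiment makes the failure of ergodicity visible when $A$ is genuinely random: the time average settles to $\hat{f}(A(\omega))$, which varies from sample function to sample function. The sketch below is purely illustrative; the choice $f(x) = x^2$ and the distribution of $A$ are ours.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(-500, 500, 1_000_001)  # long time grid for the time average

def time_average_f(A, theta, f=lambda x: x * x):
    x = A * np.cos(2 * np.pi * t + theta)
    return f(x).mean()  # approximates (1/2T) * integral of f(X_t) dt

# With A uniform on [0, 2]: E f(X_0) = E[A^2] * E[cos^2] = (4/3)(1/2) = 2/3,
# but the time average along one path is A(w)^2 / 2, a non-constant random variable.
for _ in range(4):
    A, theta = rng.uniform(0, 2), rng.uniform(0, 2 * np.pi)
    print(time_average_f(A, theta), "vs A^2/2 =", A * A / 2)
print("ensemble average E f(X_0) =", 2 / 3)
```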
In general, it is not easy to give a simple condition which would
ensure ergodicity. For Gaussian processes, however, we have the following
sufficient condition for ergodicity:¹ Assume that $\{X_t, -\infty < t < \infty\}$ is
Gaussian and stationary, with mean $\mu$ and covariance function

$$R(\tau) = E(X_{t+\tau} - \mu)(X_t - \mu)$$

Then $\{X_t, -\infty < t < \infty\}$ is ergodic if

$$\int_{-\infty}^{\infty} |R(\tau)|\,d\tau < \infty \tag{6.8}$$

The question of possible equality between time average and ensemble
average can be asked in a different and more direct way. Suppose for a
fixed stationary process $\{X_t, -\infty < t < \infty\}$ and a fixed function $f$, we
seek a sufficient condition for

$$\frac{1}{2T}\int_{-T}^{T} f(X_t)\,dt \xrightarrow[T\to\infty]{\text{q.m.}} Ef(X_0) \tag{6.9a}$$

that is,

$$\lim_{T\to\infty} E\left[\frac{1}{2T}\int_{-T}^{T} f(X_t)\,dt - Ef(X_0)\right]^2 = 0 \tag{6.9b}$$

We note that $Ef(X_t) = Ef(X_0)$ for every $t$. Therefore,

$$E\left[\frac{1}{2T}\int_{-T}^{T} f(X_t)\,dt - Ef(X_0)\right]^2 = \frac{1}{(2T)^2}\int_{-T}^{T}\!\!\int_{-T}^{T} E\left\{[f(X_t) - Ef(X_t)][f(X_s) - Ef(X_s)]\right\} dt\,ds = \frac{1}{(2T)^2}\int_{-T}^{T}\!\!\int_{-T}^{T} R_f(t-s)\,dt\,ds \tag{6.10}$$

where we have denoted the covariance function of $f(X_t)$ by $R_f$. By a
change in the variables of integration $\tau = t - s$, $\sigma = t + s$, the last
integral in (6.10) can be rewritten as

$$\frac{1}{(2T)^2}\,\frac{1}{2}\int_{-2T}^{2T}\left[\int_{-2T+|\tau|}^{2T-|\tau|} d\sigma\right] R_f(\tau)\,d\tau = \frac{1}{2T}\int_{-2T}^{2T} R_f(\tau)\left(1 - \frac{|\tau|}{2T}\right) d\tau \le \frac{1}{2T}\int_{-\infty}^{\infty} |R_f(\tau)|\,d\tau$$


¹ A necessary and sufficient condition is for the spectral-distribution function, defined
by (3.5.21) and (3.5.22), to be continuous [Grenander, 1950, pp. 257-260].


Therefore, a sufficient condition for (6.9b) is given by

$$\int_{-\infty}^{\infty} |R_f(\tau)|\,d\tau < \infty \tag{6.11}$$

Unlike ergodicity, condition (6.11) has to be verified for each $f$ separately.

EXERCISES
1. Suppose that $\{X_t, 0 \le t \le 1\}$ is a family of independent random variables, i.e.,
every finite subcollection $X_{t_1}, \ldots, X_{t_n}$ is mutually independent. Show that
$\{X_t, 0 \le t \le 1\}$ cannot be continuous in probability unless there is a continuous
function $f(\cdot)$ such that for each $t$, $X_t(\omega) = f(t)$ for almost all $\omega$.

2. Suppose that $X$ is a Gaussian random variable with $EX = 0$ and $EX^2 = 1$. Let
$X_t = tX$, $-\infty < t < \infty$.
(a) Let $T_0$ be any countable set in $(-\infty,\infty)$, say $T_0 = \{t_1, t_2, \ldots\}$. Show that
$$\mathcal{P}(X_t = 0 \text{ for at least one } t \text{ in } T_0) = 0$$
(b) Let $T$ be an interval, say $[0,1]$. Show that
$$\mathcal{P}(X_t = 0 \text{ for at least one } t \text{ in } T) = \mathcal{P}\left(\bigcup_{t\in T}\{\omega\colon X_t(\omega) = 0\}\right) > 0$$
Note that even though $\{\omega\colon X_t(\omega) = 0 \text{ for at least one } t \text{ in } T\}$ is an uncountable
union, it is an event in this particular case.

3. For an extremely explicit example of a stochastic process, consider the following.
Let $\Omega = [0,1]$, let $\mathcal{A}$ be the $\sigma$-algebra of Borel sets in $[0,1]$, and let $\mathcal{P}$ be the Lebesgue
measure, so that $\mathcal{P}(\text{interval}) = \text{length}$. Define
$$X_t(\omega) = t\omega$$
Find the one-dimensional distribution function $P_t$ for this process. Repeat for
the two-dimensional distribution function $P_{t,s}$.

4. Find the mean function $\mu(t) = EX_t$ and the covariance function $R(t,s) = E[X_t - \mu(t)][X_s - \mu(s)]$ for the process defined in Exercise 3.

5. Verify that the process defined in Exercise 3 is both separable and measurable.

6. Let $Z$ and $\Theta$ be independent random variables such that $Z$ has a density function
$p_Z$ given by
$$p_Z(z) = \begin{cases} 0 & z < 0 \\ z\,e^{-z^2/2} & z \ge 0 \end{cases}$$
and $\Theta$ is uniformly distributed in the interval $[0,2\pi)$. Define
$$X_t = Z\cos(2\pi t + \Theta)$$
Show that $\{X_t, -\infty < t < \infty\}$ is a Gaussian process.

7. Let $\{X_t, -\infty < t < \infty\}$ be a Gaussian process with $EX_t = 0$ and
$$EX_tX_s = \tfrac{1}{2}\left(|t| + |s| - |t-s|\right)$$
Now define $Y_t = X_{-t}$ for $t \ge 0$. Show that $\{Y_t, 0 \le t < \infty\}$ and $\{X_t, 0 \le t < \infty\}$
are two independent processes.

8. Let $\{X_t, t \in T\}$ be a stochastic process such that $E|X_t| < \infty$ for every $t \in T$,
and let $\mathcal{A}_t$ denote the smallest $\sigma$-algebra with respect to which $X_s$ is measurable
for every $s \le t$. Suppose that
$$E(X_{t_n} \mid X_{t_1}, X_{t_2}, \ldots, X_{t_{n-1}}) = X_{t_{n-1}} \quad\text{a.s.}$$
whenever $t_1 < t_2 < \cdots < t_n$. Prove that whenever $t \ge s$
$$E^{\mathcal{A}_s}X_t = X_s \quad\text{a.s.}$$
Note: $E^{\mathcal{A}_s}X_t$ can also be written in a more suggestive way as $E(X_t \mid X_\tau, \tau \le s)$.

9. Let $\{X_t, t \ge 0\}$ be a Brownian motion. Use the result of Exercise 8 to prove
that for $t \ge s$,
$$E(X_t \mid X_\tau, \tau \le s) = X_s \quad\text{a.s.}$$

10. Let $\{X_t, t \in [0,1]\}$ be a separable Gaussian process with zero mean. If
$$E|X_t - X_s|^2 \le K|t-s|^{\gamma} \qquad \gamma > 0$$
show that $\{X_t, t \in [0,1]\}$ must be sample continuous no matter how small $\gamma$ may be.

11. Suppose that $\{X_t, -\infty < t < \infty\}$ is a Gaussian process with zero mean and
$EX_tX_s = e^{-|t-s|}$. Express $X_t$ in the form
$$X_t = f(t)\,W_{g(t)/f(t)}$$
where $\{W_s, 0 \le s < \infty\}$ is a standard Brownian motion.

12. Let $\{X_t, -\infty < t < \infty\}$ be a stationary Markov process which assumes only a
finite number of values $x_1, x_2, \ldots, x_n$. Let $P(\tau)$ be an $n \times n$ matrix with elements
$$p_{ij}(\tau) = \mathcal{P}(X_{t+\tau} = x_i \mid X_t = x_j)$$
(a) Suppose that $\lim_{\tau\downarrow 0}\,(1/\tau)[P(\tau) - I] = A$ exists and is finite. Show that $P(\tau)$
must have the form
$$P(\tau) = e^{\tau A}$$
(b) Let $q$ be an $n$ vector with
$$q_i = \operatorname{prob}(X_t = x_i)$$
and let $\mathbf{1}$ denote the $n$ vector $[1, 1, \ldots, 1]'$. Show that
$$P(\tau)q = q$$
and
$$P'(\tau)\mathbf{1} = \mathbf{1} \qquad (P' = \text{transpose of } P)$$
Suppose that $n = 2$ and $q = \left[\tfrac{1}{2}\ \ \tfrac{1}{2}\right]'$. Show that $P(\tau)$ must be of the form
$$P(\tau) = \frac{1}{2}\begin{bmatrix} 1 + e^{-\lambda\tau} & 1 - e^{-\lambda\tau} \\ 1 - e^{-\lambda\tau} & 1 + e^{-\lambda\tau} \end{bmatrix} \qquad \lambda \ge 0$$

13. Let $T$ be an interval and $R(t,s)$, $t, s \in T$, be a continuous covariance function such
that $R(t,t) > 0$ for every $t$ in the interior of $T$. Suppose that $R$ satisfies
$$R(t,t_0) = \frac{R(t,s)R(s,t_0)}{R(s,s)} \qquad t_0 < s < t$$
(a) Let $\rho(t,s) = R(t,s)/\sqrt{R(t,t)R(s,s)}$ and show that $\rho$ satisfies
$$\rho(t,t_0) = \rho(t,s)\rho(s,t_0) \qquad t_0 < s < t;\quad t_0, s, t \text{ in int}(T)$$
(b) Show that $\rho(t,s) > 0$ for all $t$ and $s$ in the interior of $T$.
(c) Let $a$ be a fixed point in $\operatorname{int}(T)$ and define
$$a(t) = \begin{cases} \rho(t,a) & t \le a \\[4pt] \dfrac{1}{\rho(t,a)} & t \ge a \end{cases}$$
Show that
$$\rho(t,s) = \frac{a(\min(t,s))}{a(\max(t,s))} \qquad t, s \in \operatorname{int}(T)$$
Note: This proves (5.14).

14. Let $\{X_t, -\infty < t < \infty\}$ be a q.m. continuous and stationary Gaussian-Markov
process. Find its covariance function.

15. Suppose that $\{X_t, -\infty < t < \infty\}$ is given by the form
$$X_t(\omega) = A(\omega)\cos\left[2\pi t + \Theta(\omega)\right]$$
where $A$ and $\Theta$ are independent, $A$ is nonnegative, and $\Theta$ is uniformly distributed
on $[0,2\pi)$.
(a) Show that $\{X_t, -\infty < t < \infty\}$ is stationary.
(b) Show that $EX_t = 0$ for all $t$ provided that $EA < \infty$. Does
$$M_T = \frac{1}{2T}\int_{-T}^{T} X_t\,dt$$
converge in probability to $EX_t$ as $T \to \infty$?
(c) Show that $\{X_t, -\infty < t < \infty\}$ is a Gaussian process if and only if $A$ has
a Rayleigh distribution, that is, $A$ has a density $p_A$ given by
$$p_A(r) = \frac{r}{\sigma^2}\exp\left(-\frac{1}{2}\frac{r^2}{\sigma^2}\right) \qquad r \ge 0$$
Is $\{X_t, -\infty < t < \infty\}$ Markov then?

16. Suppose that $\{X_t, t \ge 0\}$ and $\{Y_t, t \ge 0\}$ are two independent standard Brownian
motions. Let $Z_t = \sqrt{X_t^2 + Y_t^2}$.
(a) Suppose that $X_t$, $Y_t$ are observed at $t = 1, 2, 3$ and the data are summarized
as follows:
$$\begin{array}{c|ccc} t & 1 & 2 & 3 \\ \hline x & .3 & 1 & -2 \\ y & -.1 & -.2 & 1 \end{array}$$
Find $E\{Z_4^2 \mid \text{observed data}\}$ and show that
$$\sqrt{5} \le E\{Z_4 \mid \text{observed data}\} \le \sqrt{7}$$
(b) Is $\{Z_t, t \ge 0\}$ a Markov process?
3
Second-Order Processes

1. INTRODUCTION

In this chapter it will actually be easier to deal with complex-valued
random variables and stochastic processes. A complex-valued random
variable $X$ is a complex-valued function on $\Omega$ such that its real and
imaginary parts are real random variables.

Definition. A second-order random variable $X$ is one which satisfies
$E|X|^2 < \infty$. A second-order stochastic process $\{X_t, t \in T\}$ is a
one-parameter family of second-order random variables.

If $\{X_t, t \in T\}$ is a second-order process, then we can define the
mean, correlation function, and covariance function, respectively, as
follows:

$$EX_t = \mu(t) \tag{1.1}$$
$$EX_t\overline{X}_s = m(t,s) \tag{1.2}$$
$$E[X_t - \mu(t)]\,\overline{[X_s - \mu(s)]} = R(t,s) \tag{1.3}$$

where the overbar denotes the complex conjugate. As far as second-order
properties are concerned, we are primarily interested in "linear operations"
on processes. Since the output of a linear operation on $\{X_t, t \in T\}$ is
equal to the sum of the outputs of the same operation on $X_t - \mu(t)$ and
$\mu(t)$ separately, we can always assume zero mean with little loss in
generality. In this chapter we shall always assume that the mean is
zero. If the mean is zero, the correlation function and the covariance
function are the same thing. We use the term covariance function
exclusively. Roughly speaking, second-order properties of a process are
those properties that can be deduced knowing only its covariance function.
A covariance function satisfies a number of important properties,
some of which are listed below. (A numerical illustration of the first,
defining property appears after this list.)
1. Nonnegative definiteness
A complex-valued function $f(t,s)$ defined on a square $T \times T$ is said
to be nonnegative definite if for any finite collection $t_1, t_2, \ldots, t_n$
in $T$ and any complex constants $a_1, a_2, \ldots, a_n$
$$\sum_{j=1}^{n}\sum_{k=1}^{n} a_j\bar{a}_kf(t_j,t_k) \ge 0 \tag{1.4}$$
Every covariance function is nonnegative definite, because
$$\sum_{j=1}^{n}\sum_{k=1}^{n} a_j\bar{a}_kR(t_j,t_k) = E\left|\sum_{j=1}^{n} a_jX_{t_j}\right|^2 \ge 0$$
Conversely, if $R(t,s)$ is a nonnegative definite function on $T \times T$,
then we can always find a second-order process $\{X_t, t \in T\}$ whose
covariance function is $R(t,s)$. In fact, we can always find a pair of
real processes $\{Y_t, Z_t, t \in T\}$ such that $Y_t + iZ_t$ has covariance func-
tion $R(t,s)$, and $Y_t$, $Z_t$ are jointly Gaussian, i.e., any real linear combina-
tion $\alpha Y_t + \beta Z_t$ is again a Gaussian process. Summarizing, a function
$R(t,s)$ on $T \times T$ is a covariance function if and only if it is nonnegative
definite.
2. Hermitian symmetry
A covariance function $R(t,s)$ always satisfies
$$R(t,s) = \overline{R(s,t)} \tag{1.5}$$
because $EX_t\overline{X}_s = \overline{EX_s\overline{X}_t}$.
3. Schwarz inequality
If $Y$ and $Z$ are a pair of second-order random variables, then
$$|EY\overline{Z}| \le \sqrt{E|Y|^2E|Z|^2} \tag{1.6}$$
which is a special case of the Schwarz inequality. To prove (1.6) we
need only note that
$$E|Y|^2E|Z|^2 - |EY\overline{Z}|^2 = E|Z|^2\,E\left|Y - \frac{EY\overline{Z}}{E|Z|^2}\,Z\right|^2 \ge 0 \tag{1.7}$$
Applying (1.6) to a second-order process, we get immediately
$$|R(t,s)| \le \sqrt{R(t,t)R(s,s)} \tag{1.8}$$
whenever $R(t,s)$ is a covariance function.
4. Closure properties
(a) Multiplication. If $R_1(t,s)$ and $R_2(t,s)$ are two covariance functions,
then we can always find two independent Gaussian processes $Y_t$ and
$Z_t$ with covariance functions $R_1(t,s)$ and $R_2(t,s)$, respectively. The
covariance function of $X_t = Y_tZ_t$ is given by $R_1(t,s)R_2(t,s)$. Therefore,
the product of two covariance functions is again a covariance function.
(b) Addition. The sum $R_1(t,s) + R_2(t,s)$ of two covariance functions is
again a covariance function. The argument for this assertion is almost
identical to that of (a).
(c) Positive sums. A positive constant $C$ is always a covariance function,
simply because we can always take $X_t = X$ to be a Gaussian random
variable with zero mean and $EX^2 = C$. It follows that if $R_1(t,s),
R_2(t,s), \ldots, R_n(t,s)$ are covariance functions on $T \times T$ and $C_1, C_2,
\ldots, C_n$ are positive constants, then
$$R(t,s) = \sum_{\nu=1}^{n} C_\nu R_\nu(t,s) \tag{1.9}$$
is again a covariance function on $T \times T$.
(d) Pointwise limit. If $\{R_\nu(t,s),\ \nu = 1, \ldots\}$ is a sequence of covariance
functions on $T \times T$ converging pointwise to $R(t,s)$ at every point of
$T \times T$, then $R(t,s)$ is a covariance function. We only need to verify
the definition of a nonnegative definite function,
$$\sum_{j=1}^{n}\sum_{k=1}^{n} a_j\bar{a}_kR(t_j,t_k) = \lim_{\nu\to\infty}\sum_{j=1}^{n}\sum_{k=1}^{n} a_j\bar{a}_kR_\nu(t_j,t_k) \ge 0$$
(e) Bilinear forms. For any function $u(t)$, $u(t)\bar{u}(s)$ is a covariance
function, simply because we can always set $X_t = u(t)X$ where $X$ is
normal $N(0,1)$. It follows that any finite bilinear sum
$$\sum_{\nu=1}^{n} u_\nu(t)\bar{u}_\nu(s) \tag{1.10}$$
is a covariance function, and so is any convergent infinite series
$\sum_{\nu=1}^{\infty} u_\nu(t)\bar{u}_\nu(s)$. More generally, any pointwise limit of a sequence of
bilinear sums of the form (1.10) is a covariance function. This includes
not only infinite sums such as $\sum_{\nu=1}^{\infty} u_\nu(t)\bar{u}_\nu(s)$, but also integrals of the form
$\int_a^b u(t,\lambda)\bar{u}(s,\lambda)\,d\lambda$. It will be seen later that most covariance functions
can be represented in the form of bilinear sums and/or integrals, and
these representations play an extremely useful role in the application
of stochastic processes.
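As the numerical illustration promised above, nonnegative definiteness can be probed on any finite grid: the matrix $[R(t_j,t_k)]$ must have no negative eigenvalues. The sketch below is our own (grids and test functions are illustrative); $e^{-(t-s)^4}$ is a classic function that fails (1.4) even though it is symmetric and peaks on the diagonal.

```python
import numpy as np

def min_eig(R, t):
    """Smallest eigenvalue of the Gram matrix [R(t_j, t_k)] on a grid t."""
    G = R(t[:, None], t[None, :])
    return np.linalg.eigvalsh(G).min()

t = np.linspace(0.1, 5.0, 60)

# Valid covariances: smallest eigenvalue is nonnegative (up to roundoff).
print(min_eig(lambda u, v: np.minimum(u, v), t))        # Brownian motion
print(min_eig(lambda u, v: np.exp(-np.abs(u - v)), t))  # Ornstein-Uhlenbeck
# exp(-(t-s)^4) is NOT nonnegative definite: a clearly negative eigenvalue appears.
print(min_eig(lambda u, v: np.exp(-(u - v) ** 4), t))
```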

2. SECOND-ORDER CONTINUITY
Definition. A second-order process $\{X_t, t \in T\}$ is said to be continuous in
quadratic mean (q.m. continuous) at $t$ if
$$E|X_{t+h} - X_t|^2 \xrightarrow[h\to 0]{} 0$$
If a process is q.m. continuous at every $t$ of $T$, we shall say that it
is a q.m. continuous process.

Q.m. continuity of a second-order process must, of course, be closely
related to questions of continuity of the covariance function of the process.
The following proposition summarizes succinctly the relationship between
q.m. continuity and continuity of the covariance function.

Proposition 2.1. Let $\{X_t, t \in T\}$ be a second-order process on an interval
$T$, and let $R(t,s) = EX_t\overline{X}_s$ denote its covariance function.
(a) $\{X_t, t \in T\}$ is q.m. continuous at $t$ if and only if $R(\cdot,\cdot)$ is con-
tinuous at the diagonal point $(t,t)$.
(b) If $\{X_t, t \in T\}$ is q.m. continuous at every $t \in T$, then $R(\cdot,\cdot)$ is
continuous at every point of the square $T \times T$.
(c) If a nonnegative definite function on $T \times T$ is continuous at
every diagonal point, then it is continuous everywhere on $T \times T$.

Proof:
(a) If $R$ is continuous at $(t,t)$, then
$$E|X_{t+h} - X_t|^2 = R(t+h,\,t+h) - R(t,\,t+h) - R(t+h,\,t) + R(t,t) = [R(t+h,\,t+h) - R(t,t)] - [R(t,\,t+h) - R(t,t)] - [R(t+h,\,t) - R(t,t)] \xrightarrow[h\to 0]{} 0$$
Conversely, if $\{X_t, t \in T\}$ is q.m. continuous at $t$, then
$$R(t+h,\,t+h') - R(t,t) = EX_{t+h}\overline{X}_{t+h'} - EX_t\overline{X}_t = E(X_{t+h} - X_t)\overline{X}_{t+h'} + EX_t\left(\overline{X}_{t+h'} - \overline{X}_t\right)$$
From the Schwarz inequality (1.6), we have

$$|R(t+h,\,t+h') - R(t,t)| \le \left[E|X_{t+h} - X_t|^2\,E|X_{t+h'}|^2\right]^{1/2} + \left[E|X_t|^2\,E|X_{t+h'} - X_t|^2\right]^{1/2} \xrightarrow[h,h'\to 0]{} 0$$

(b) If $\{X_t, t \in T\}$ is q.m. continuous at every $t \in T$, then

$$|R(t+h,\,s+h') - R(t,s)| = \left|EX_{t+h}\overline{X}_{s+h'} - EX_t\overline{X}_s\right| = \left|E(X_{t+h} - X_t)\overline{X}_{s+h'} + EX_t\left(\overline{X}_{s+h'} - \overline{X}_s\right)\right| \le \sqrt{E|X_{t+h} - X_t|^2\,E|X_{s+h'}|^2} + \sqrt{E|X_t|^2\,E|X_{s+h'} - X_s|^2} \to 0$$

(c) Since every nonnegative definite function on $T \times T$ is the covariance
function of some second-order process on $T$, part (c) follows immediately
from (a) and (b). ∎

3. LINEAR OPERATIONS AND SECOND-ORDER CALCULUS


Let tXt, t E T} be a second-order process. A random variable Y is said to
be derived from a linear operation on tXt, t E T} if either
N
(a) Yew) = ~ apXtp(w) (3.1)
1.·=1
or
(b) Y is the q.m. limit of a sequence of such finite linear combinations
We denote the collection of all such random variables derived from a
given process tXt, t E T} by Xx. Two elements Y and Y' are not dis-
tinguished whenever ElY - Y'12 = 0. 1 Equality between second-order
random variables will always be understood to be up to zero q.m. dif-
ferences. The space Xx becomes an inner product space if we define
the inner product (Y,Z) by
(Y,Z) = EYZ (3.2)
The inner product automatically defines a norm
II YII = V(Y, Y) (3.3)
and a metric
d(Y,Z) = IIY - Z!I (3.4)
A sequence {Y n} in Xx is said to be a Cauchy sequence if
I/Ym - Ynll--tO m.n~QO

1 Strictly speaking, Xx is a space of equivalence class, as is generally the case of


L2 spaces.
Since $\|Y_m - Y_n\|^2 = E|Y_m - Y_n|^2$, a Cauchy sequence in $\mathcal{H}_X$ is a
mutually q.m. convergent sequence, which must converge in q.m. That
is, for every Cauchy sequence $\{Y_n\}$ in $\mathcal{H}_X$, there exists a $Y$ such that
$\|Y_n - Y\| \xrightarrow[n\to\infty]{} 0$. It is easy to show that $Y$ is in $\mathcal{H}_X$. This means that
$\mathcal{H}_X$ is an inner product space which, as a metric space, is complete. By
definition, $\mathcal{H}_X$ is a Hilbert space.

Second-order properties of a process $\{X_t, t \in T\}$ are those properties
that are derivable from its covariance function. Every property of $\mathcal{H}_X$ is
a second-order property, and conversely. It is easy to see, therefore,
that there is inherently a close relationship between linear operations
(defined by membership in $\mathcal{H}_X$) and second-order properties. Often,
questions on second-order properties are most lucidly answered in the
framework of Hilbert space theory. A good example of this is the
series representation of $X_t$ to be treated in the next section.
The quadratic-mean derivative of a process $\{X_t, t \in T\}$ at a point $t$
is defined by

$$\dot{X}_t = \lim_{h\to 0} \text{in q.m. } \frac{1}{h}\left(X_{t+h} - X_t\right) \tag{3.5}$$

Because of the equivalence between mutual q.m. convergence and q.m.
convergence, $\dot{X}_t$ exists if and only if

$$\frac{1}{hh'}\left[R(t+h,\,t+h') - R(t+h,\,t) - R(t,\,t+h') + R(t,t)\right] \text{ converges as } h, h' \to 0 \tag{3.6}$$

which holds if the mixed partial derivative of the covariance function, that is,
$\partial^2R(t_1,t_2)/\partial t_1\,\partial t_2$, exists in a neighborhood of $(t,t)$ and is continuous at
$(t,t)$. We note that

$$E\dot{X}_t\overline{\dot{X}}_s = \frac{\partial^2R(t,s)}{\partial t\,\partial s} \tag{3.7}$$

Since if $\dot{X}_t$ exists, it is equal to

$$\dot{X}_t = \lim_{n\to\infty} \text{in q.m. } n\left(X_{t+1/n} - X_t\right) \tag{3.8}$$

$\dot{X}_t \in \mathcal{H}_X$ by definition.
Quadratic-mean integrals arise even more frequently than q.m.
derivatives. Let $\{X_t, t \in T\}$ be a second-order process, and let $f(t)$ be a
complex-valued function defined on the interval $T$. We define the q.m.
integral $\int_T f(t)X_t\,dt$ as an element in $\mathcal{H}_X$ as follows. Let $\{T_n\}$ be a sequence
of partitions of $T$ such that as $n \to \infty$, $T_n$ becomes dense in $T$. To be
specific, let $T$ be a finite interval $[a,b]$ and

$$T_n = \left\{a = t_0^{(n)} < t_1^{(n)} < \cdots < t_n^{(n)} = b\right\} \tag{3.9}$$
$$\max_{1\le\nu\le n}\left(t_\nu^{(n)} - t_{\nu-1}^{(n)}\right) \xrightarrow[n\to\infty]{} 0 \tag{3.10}$$

We define the integral $\int_a^b f(t)X_t\,dt$ as

$$\int_a^b f(t)X_t\,dt = \lim_{n\to\infty} \text{in q.m. } \sum_{\nu=0}^{n-1} f\!\left(t_\nu'^{(n)}\right)X_{t_\nu'^{(n)}}\left(t_{\nu+1}^{(n)} - t_\nu^{(n)}\right) \tag{3.11}$$

where the $t_\nu'^{(n)}$ are any points satisfying $t_\nu^{(n)} \le t_\nu'^{(n)} < t_{\nu+1}^{(n)}$. The
integral $\int_a^b f(t)X_t\,dt$ is well defined by this procedure, provided that
the q.m. limit exists and is independent of the choice of $\{T_n\}$ and, for
each $\{T_n\}$, is independent of the choice of $\{t_\nu'^{(n)}\}$. In that case, we say that
the integral $\int_a^b f(t)X_t\,dt$ exists.

Proposition 3.1. The q.m. integral $\int_a^b f(t)X_t\,dt$ exists if and only if
$\int_a^b\!\int_a^b f(t)\bar{f}(s)R(t,s)\,dt\,ds$ exists as a Riemann integral.

Remark:
(a) We note that if $X_t$ is q.m. continuous, that is, if $R(t,s)$ is con-
tinuous, then $f(t)$ being piecewise continuous on $[a,b]$ is sufficient
to ensure the existence of $\int_a^b f(t)X_t\,dt$.
(b) The q.m. integral can easily be generalized to include infinite
intervals by taking the q.m. limit as one or both of the endpoints goes
to infinity.
(c) A common way in which q.m. integrals arise is when a process
$\{Y_t, t \in T'\}$ is generated from a second process $\{X_t, t \in T\}$ according
to the formula
$$Y_t = \int_T h(t,s)X_s\,ds \qquad t \in T' \tag{3.12}$$
Equation (3.12) often admits the interpretation of a linear system
with input $X_t$, output $Y_t$, and impulse response $h(t,s)$. (A numerical
sketch of this interpretation follows these remarks.)
(d) From the point of view of applications one might well prefer
$\int_T f(t)X_t(\omega)\,dt$ to be defined as the Lebesgue integral of a sample func-
tion $\{X_t(\omega), t \in T\}$. The existence of such a Lebesgue integral is
ensured if the process is a measurable process and if
$$\int_T |f(t)|\,E|X_t|\,dt < \infty$$
(cf. discussion at the end of Sec. 2.2).
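As promised in remark (c), here is a numerical sketch of the linear-system interpretation. For a q.m. continuous input, the output covariance of (3.12) is $EY_t\overline{Y}_s = \int_T\!\int_T h(t,u)\bar{h}(s,v)R(u,v)\,du\,dv$, and both this double integral and the Riemann sums in (3.11) can be evaluated on a grid. Everything below (the Ornstein-Uhlenbeck covariance, the moving-average kernel, the grid) is our own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
u = np.linspace(0.0, 4.0, 400)                        # grid on T = [0, 4]
du = u[1] - u[0]
R = np.exp(-np.abs(u[:, None] - u[None, :]))          # R(u, v) = e^{-|u-v|}

def kernel(t, s):
    # h(t, s): moving-average window of width 1 ending at t (our choice)
    return ((s <= t) & (s > t - 1.0)).astype(float)

t1, t2 = 2.0, 3.0
h1, h2 = kernel(t1, u), kernel(t2, u)

# Output covariance via the double integral of h(t,u) h(s,v) R(u,v):
theo = h1 @ R @ h2 * du * du

# Monte Carlo: Gaussian paths with covariance R, output formed by Riemann sums.
L = np.linalg.cholesky(R + 1e-10 * np.eye(len(u)))
X = rng.normal(size=(20_000, len(u))) @ L.T
Y1, Y2 = X @ h1 * du, X @ h2 * du
print("Monte Carlo E[Y1 Y2]:", np.mean(Y1 * Y2))
print("double integral     :", theo)
```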


4. ORTHOGONAL EXPANSIONS
A family $\mathcal{F}$ of elements of $\mathcal{H}_X$ is said to be an orthonormal (O-N) family if
any two distinct elements $Y$ and $Z$ of $\mathcal{F}$ satisfy
$$\|Y\| = 1 = \|Z\| \qquad (Y,Z) = EY\overline{Z} = 0 \tag{4.1}$$
The second of these conditions is called orthogonality. An O-N family $\mathcal{F}$ is
said to be complete in $\mathcal{H}_X$ if there exists no element of $\mathcal{H}_X$, except the
zero element, which is orthogonal to every element of $\mathcal{F}$.

Suppose that $\{X_t, t \in T\}$ is a q.m. continuous process, where $T$ is
an interval, finite or infinite. Let $T'$ be the set of all rational points in $T$.
For every $t \in T$, there exists a sequence $\{t_n\}$ in $T'$ such that $t_n \xrightarrow[n\to\infty]{} t$.
Since $\{X_t, t \in T\}$ is q.m. continuous, we have for every $t$,
$$X_t = \lim_{n\to\infty} \text{in q.m. } X_{t_n}$$
It follows that every element in $\mathcal{H}_X$ is a linear combination of $\{X_t,
t \in T'\}$ or the q.m. limit of a sequence of such linear combinations. In
short, the countable family $\{X_t, t \in T'\}$ is dense in $\mathcal{H}_X$. It follows that
every O-N family in $\mathcal{H}_X$ is at most countable [see, e.g., Taylor, 1961,
Sec. 3.2].
For a q.m. continuous process $\{X_t, t \in T\}$, let $\{Z_n, n = 1, 2, \ldots\}$
be an O-N family in $\mathcal{H}_X$. If $Y \in \mathcal{H}_X$, then
$$E\left|Y - \sum_{n=1}^{N} (Y,Z_n)Z_n\right|^2 = E|Y|^2 - \sum_{n=1}^{N} |(Y,Z_n)|^2 \ge 0$$
Therefore,
$$\infty > E|Y|^2 \ge \sum_{n=1}^{\infty} |(Y,Z_n)|^2$$
so that $\sum_{n=1}^{\infty} (Y,Z_n)Z_n$ is well defined and
$$Y - \sum_{n=1}^{\infty} (Y,Z_n)Z_n$$
is orthogonal to every $Z_n$. It follows that if $\{Z_n\}$ is complete in $\mathcal{H}_X$, then
every $Y$ in $\mathcal{H}_X$ has the representation
$$Y = \sum_{n=1}^{\infty} (Y,Z_n)Z_n \tag{4.2}$$

Suppose that $\{Z_n, n = 1, 2, \ldots\}$ is a given complete O-N family
in $\mathcal{H}_X$ and we set
$$\sigma_n(t) = (X_t, Z_n) = EX_t\overline{Z}_n \tag{4.3}$$
Then, from (4.2) we have
$$X_t = \sum_{n=1}^{\infty} \sigma_n(t)Z_n \qquad t \in T \tag{4.4}$$

The functions $\sigma_n(t)$, $t \in T$, are continuous, because $\{X_t, t \in T\}$ is q.m.
continuous. Further, the set of functions $\{\sigma_n, n = 1, 2, \ldots\}$ is also
linearly independent, i.e., for every $N$
$$\sum_{n=1}^{N} a_n\sigma_n(t) = 0 \quad\text{for all } t \in T$$
implies $a_n = 0$, $n = 1, \ldots, N$. The linear independence of $\{\sigma_n, n = 1,
2, \ldots\}$ is due to the fact that $\sum_{n=1}^{N} a_n\sigma_n(t) = 0$ for all $t \in T$ implies that
$\sum_{n=1}^{N} \bar{a}_nZ_n$ is orthogonal to $X_t$ for every $t \in T$, hence, also orthogonal
to every $Z_n$, which implies $a_n = 0$ for every $n$. It follows from (4.4) that
$$EX_t\overline{X}_s = R(t,s) = \sum_{n=1}^{\infty} \sigma_n(t)\bar{\sigma}_n(s) \qquad t, s \in T \tag{4.5}$$

Conversely, suppose that $\{\sigma_n(t), t \in T,\ n = 1, 2, \ldots\}$ is a linearly
independent family of continuous functions such that
$$R(t,s) = \sum_{n=1}^{\infty} \sigma_n(t)\bar{\sigma}_n(s) \qquad t, s \in T$$
Then, it follows from a very general representation theorem of Karhunen
[Karhunen, 1947] that there exists a complete O-N family $\{Z_n, n = 1,
2, \ldots\}$ in $\mathcal{H}_X$ such that
$$X_t(\omega) = \sum_{n=1}^{\infty} \sigma_n(t)Z_n(\omega) \qquad t \in T$$
Thus, (4.4) and (4.5) imply each other. Representations of the form (4.4) are
useful because they permit the continuum of random variables $\{X_t, t \in T\}$
to be represented by a countable number of orthonormal random variables
$\{Z_n\}$. However, their use is, in general, limited by the fact that it is usually
difficult to express the random variables $Z_n$ explicitly in terms of $\{X_t, t \in T\}$.
An exceptional case is when the $\{\sigma_n\}$ are orthogonal, that is, $\int_T \sigma_m(t)\bar{\sigma}_n(t)\,dt = 0$
whenever $m \ne n$. This motivates the expansion widely known as the
Karhunen-Loève expansion.

Consider a q.m. continuous process $\{X_t, a \le t \le b\}$, where the parame-
ter set is explicitly assumed to be a closed and finite interval.
Suppose that there exists an expansion of the form

$$X_t(\omega) = \sum_{n=1}^{\infty} u_n(t)Z_n(\omega) \tag{4.6}$$

where $\{Z_n\}$ and $\{u_n\}$ satisfy

$$EZ_m\overline{Z}_n = \delta_{mn} \tag{4.7}$$
$$\int_a^b u_m(t)\bar{u}_n(t)\,dt = \lambda_n\delta_{mn} \tag{4.8}$$

Now, from (4.5), the covariance function $R$ must satisfy

$$R(t,s) = \sum_{n=1}^{\infty} u_n(t)\bar{u}_n(s) \tag{4.9}$$

for each $(t,s)$ in $[a,b] \times [a,b]$. Now from the Schwarz inequality and the
fact that $R$ is continuous on $[a,b] \times [a,b]$ we have

$$\sup_{a\le t,s\le b}\left|\sum_{n=1}^{N} u_n(t)\bar{u}_n(s)\right| \le \sup_{a\le t\le b}\sum_{n=1}^{N} |u_n(t)|^2 \le \sup_{a\le t\le b} R(t,t) < \infty \tag{4.10}$$

Therefore, the convergence in (4.9) is bounded. It follows that for every $m$

$$\int_a^b R(t,s)u_m(s)\,ds = \lambda_mu_m(t) \tag{4.11}$$

What we have shown is that if an expansion (4.6), satisfying (4.7) and
(4.8), exists, then the $\{u_n\}$ must be solutions of the integral equation (4.11).
We shall see that under our assumptions, such an expansion always
exists.

The above considerations suggest that we investigate integral equa-
tions of the form of (4.11). Fortunately, such equations are well known.
We shall now summarize some of the important facts concerning them.
First, we shall introduce a few definitions. Consider the integral equation

Jab R(t,s)cp(S) ds = Acp(t) (4.12)

where we have explicitly denoted the interval T by [a,b]. We assume that


[a,b] is finite and R(t,s) is continuous on [a,b] X [a,b]. A nonzero number
A, for which there exists a cp satisfying both (4.12) and the condition
lb IcpU) 12 dt < 00, is called an eigenvalue of the integral equation. The
corresponding cp is called an eigenfunction.
1. Any eigenvalue of (4.12) must be real and positive. The fact that A is
real follows from the Hermitian symmetry R(t,s) = R(s,t). The fact
that A is positive follows from the nonnegative definiteness of R.
2. There is at least one eigenvalue for (4.12), if $R$ is not identically zero.
The largest eigenvalue $\lambda_0$ is given by
$$\lambda_0 = \max_{\|\varphi\|=1} \int_a^b\!\!\int_a^b R(t,s)\varphi(s)\bar{\varphi}(t)\,ds\,dt \qquad \|\varphi\| = \left[\int_a^b |\varphi(t)|^2\,dt\right]^{1/2} \tag{4.13}$$
This fact is not easily proved. It depends on both the nonnegative
definiteness of $R$ and its continuity [see, e.g., Taylor, 1961, pp. 334-336].
3. Let $\varphi_0(t)$ denote the normalized eigenfunction corresponding to $\lambda_0$;
then $\varphi_0(t)$ is continuous on $[a,b]$. This is because we can write
$$\varphi_0(t) = \frac{1}{\lambda_0}\int_a^b R(t,s)\varphi_0(s)\,ds \tag{4.14}$$
and the continuity of $\varphi_0(t)$ follows from the continuity of $R(t,s)$ in $t$.
4. Let $R_1(t,s) = R(t,s) - \lambda_0\varphi_0(t)\bar{\varphi}_0(s)$. Then $R_1(t,s)$ is both continuous
and nonnegative definite. The continuity of $R_1$ is obvious from 3.
To show nonnegative definiteness, let
$$Y_t = X_t - \varphi_0(t)\int_a^b X_\tau\bar{\varphi}_0(\tau)\,d\tau \tag{4.15}$$
Then, we have
$$EY_t\overline{Y}_s = EX_t\overline{X}_s - \bar{\varphi}_0(s)\,E\left[X_t\int_a^b \overline{X}_\sigma\varphi_0(\sigma)\,d\sigma\right] - \varphi_0(t)\,E\left[\overline{X}_s\int_a^b X_\tau\bar{\varphi}_0(\tau)\,d\tau\right] + \varphi_0(t)\bar{\varphi}_0(s)\,E\left[\int_a^b\!\!\int_a^b X_\tau\overline{X}_\sigma\bar{\varphi}_0(\tau)\varphi_0(\sigma)\,d\tau\,d\sigma\right] = R(t,s) - \lambda_0\varphi_0(t)\bar{\varphi}_0(s) = R_1(t,s) \tag{4.16}$$
Hence $R_1(t,s)$, being a covariance function, must be nonnegative
definite.
5. We observe that $\int_a^b R_1(t,s)\varphi_0(s)\,ds = 0$. Therefore, if we repeat step
2 and obtain $\lambda_1$, $\varphi_1$, then
$$\int_a^b R_1(t,s)\varphi_1(s)\,ds = \lambda_1\varphi_1(t) \qquad a \le t \le b$$
It follows that
$$\int_a^b \varphi_1(t)\bar{\varphi}_0(t)\,dt = \frac{1}{\lambda_1}\int_a^b \varphi_1(s)\left[\overline{\int_a^b R_1(s,t)\varphi_0(t)\,dt}\right] ds = 0$$
so that $\varphi_1$ is orthogonal to $\varphi_0$. In addition,
$$\int_a^b R(t,s)\varphi_1(s)\,ds = \int_a^b R_1(t,s)\varphi_1(s)\,ds + \lambda_0\varphi_0(t)\int_a^b \varphi_1(s)\bar{\varphi}_0(s)\,ds = \lambda_1\varphi_1(t) \qquad a \le t \le b \tag{4.17}$$
In other words, $\lambda_1$ and $\varphi_1$ are an eigenvalue and eigenfunction of (4.12).
6. It is clear that the above procedure can be iterated, and we get a
nonincreasing sequence of eigenvalues $\lambda_0, \lambda_1, \ldots$ and a corresponding
sequence of eigenfunctions $\varphi_0, \varphi_1, \ldots$ which are orthonormal, that is,
$$\int_a^b \varphi_m(t)\bar{\varphi}_n(t)\,dt = \delta_{mn} \tag{4.18}$$
7. The sequence $\lambda_0, \lambda_1, \ldots$ may terminate after a finite number of
terms, in which case we have
$$R(t,s) = \sum_{n=0}^{N} \lambda_n\varphi_n(t)\bar{\varphi}_n(s) \tag{4.19}$$
If the sequence $\lambda_0, \lambda_1, \ldots$ does not terminate, then $\lambda_n \xrightarrow[n\to\infty]{} 0$ (see
Exercise 2).
8. If the number of eigenvalues is infinite, then
$$\lim_{N\to\infty} \sup_{a\le t,s\le b}\left|R(t,s) - \sum_{n=0}^{N} \lambda_n\varphi_n(t)\bar{\varphi}_n(s)\right| = 0 \tag{4.20}$$
In other words,
$$\lim_{N\to\infty} \sum_{n=0}^{N} \lambda_n\varphi_n(t)\bar{\varphi}_n(s) = R(t,s) \quad\text{uniformly on } [a,b]^2 \tag{4.21}$$
This important result is known as Mercer's theorem [Riesz and
Sz.-Nagy, 1952, p. 245]. The fact that the convergence is uniform is
a strong result and immediately implies convergence in mean square,
that is,
$$\lim_{N\to\infty} \int_a^b\!\!\int_a^b \left|R(t,s) - \sum_{n=0}^{N} \lambda_n\varphi_n(t)\bar{\varphi}_n(s)\right|^2 dt\,ds = 0 \tag{4.22}$$
9. In general, the O-N family $\{\varphi_n\}$ is not complete in the space $L_2(a,b)$.
The most that can be said is that given $f \in L_2(a,b)$ (that is,
$\int_a^b |f(t)|^2\,dt < \infty$) we can write
$$f(t) = f_0(t) + \underset{N\to\infty}{\text{l.i.m.}} \sum_{n=0}^{N} \varphi_n(t)\int_a^b f(s)\bar{\varphi}_n(s)\,ds \tag{4.23}$$
where l.i.m. means limit in mean square and $f_0$ satisfies
$$\int_a^b R(t,s)f_0(s)\,ds = 0 \quad\text{for all } t \in [a,b] \tag{4.24}$$
It follows that $\{\varphi_n\}$ is complete in $L_2(a,b)$ if and only if (4.24) implies
$\int_a^b |f_0(t)|^2\,dt = 0$.
We are now in a position to state precisely the theorem concerning
the biorthogonal series for a second-order process. This expansion is
often referred to as the Karhunen-Loève expansion [Loève, 1963, pp.
478-479].

Proposition 4.1. Let $\{X(\omega,t), t \in [a,b]\}$ be a q.m. continuous second-order
process with covariance function $R(t,s)$.
(a) If $\{\varphi_n\}$ are the orthonormal eigenfunctions of
$$\int_a^b R(t,s)\varphi(s)\,ds = \lambda\varphi(t) \tag{4.25}$$
and $\{\lambda_n\}$ the eigenvalues, then
$$X(\omega,t) = \lim_{N\to\infty} \text{in q.m. } \sum_{n=0}^{N} \sqrt{\lambda_n}\,\varphi_n(t)b_n(\omega) \quad\text{uniformly for } a \le t \le b \tag{4.26}$$
where the $\{b_n\}$ satisfy
$$b_n(\omega) = \frac{1}{\sqrt{\lambda_n}}\int_a^b \bar{\varphi}_n(t)X(\omega,t)\,dt \tag{4.27}$$
and
$$Eb_m\bar{b}_n = \delta_{mn} \tag{4.28}$$
(b) Conversely, if $X(\omega,t)$ has an expansion of the form (4.26) with
$\int_a^b \varphi_m(t)\bar{\varphi}_n(t)\,dt = \delta_{mn} = Eb_m\bar{b}_n$, then $\{\varphi_n\}$ and $\{\lambda_n\}$ must be eigen-
functions and eigenvalues of (4.25).

Proof:
(a) By direct computation we have
$$E\left|X_t - \sum_{n=0}^{N} \sqrt{\lambda_n}\,\varphi_n(t)b_n\right|^2 = R(t,t) - \sum_{n=0}^{N} \lambda_n|\varphi_n(t)|^2$$
which goes to zero as $N \to \infty$ uniformly in $t$ by virtue of Mercer's
theorem.
(b) Suppose $X_t$ has the stated expansion. Then, we have
$$R(t,s) = EX_t\overline{X}_s = \sum_{n=0}^{\infty} \lambda_n\varphi_n(t)\bar{\varphi}_n(s)$$
Hence,
$$\int_a^b R(t,s)\varphi_m(s)\,ds = \int_a^b \sum_{n=0}^{\infty} \lambda_n\varphi_n(t)\bar{\varphi}_n(s)\varphi_m(s)\,ds = \lambda_m\varphi_m(t) \qquad a \le t \le b$$
The proof is complete. ∎
Consider now an example of how an integral equation of the form
of (4.25) might be solved. Let $R(t,s) = \min(t,s)$ and consider

$$\int_0^T \min(t,s)\varphi(s)\,ds = \lambda\varphi(t) \qquad 0 \le t \le T \tag{4.29}$$

or
$$\int_0^t s\varphi(s)\,ds + t\int_t^T \varphi(s)\,ds = \lambda\varphi(t)$$

Differentiating once with respect to $t$ yields

$$\int_t^T \varphi(s)\,ds = \lambda\dot{\varphi}(t) \qquad 0 < t < T \tag{4.30}$$

Differentiating again, we find

$$-\varphi(t) = \lambda\ddot{\varphi}(t) \qquad 0 < t < T \tag{4.31}$$

We also have the obvious boundary conditions $\varphi(0) = 0$ and $\dot{\varphi}(T) = 0$.
Equation (4.31) with initial condition $\varphi(0) = 0$ yields

$$\varphi(t) = A\sin\frac{t}{\sqrt{\lambda}}$$

Applying the condition $\dot{\varphi}(T) = 0$ obtained from (4.30), we find
$$\cos\frac{T}{\sqrt{\lambda}} = 0$$

In other words, the eigenvalues are given by

$$\lambda_n = \frac{T^2}{\left(n + \frac{1}{2}\right)^2\pi^2} \qquad n = 0, 1, 2, \ldots \tag{4.32}$$

The normalized eigenfunctions are given by

$$\varphi_n(t) = \sqrt{\frac{2}{T}}\,\sin\left[\left(n + \tfrac{1}{2}\right)\pi\,\frac{t}{T}\right] \tag{4.33}$$

It is rather interesting to note that because of Mercer's theorem, we have

$$\min(t,s) = \frac{2T}{\pi^2}\sum_{n=0}^{\infty} \frac{1}{\left(n + \frac{1}{2}\right)^2}\,\sin\left[\left(n + \tfrac{1}{2}\right)\pi\left(\frac{t}{T}\right)\right]\sin\left[\left(n + \tfrac{1}{2}\right)\pi\left(\frac{s}{T}\right)\right] \quad\text{uniformly on } [0,T]^2 \tag{4.34}$$

which is by no means an obvious result.
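The eigenvalues (4.32) are easy to confirm numerically: discretizing the integral operator in (4.29) on a grid turns it into a symmetric matrix whose largest eigenvalues approximate $\lambda_n$. A minimal sketch (the grid size and the value of $T$ are our choices):

```python
import numpy as np

T, m = 1.0, 2000
t = (np.arange(m) + 0.5) * (T / m)                 # midpoint grid on [0, T]
K = np.minimum(t[:, None], t[None, :]) * (T / m)   # discretized operator in (4.29)

eigs = np.sort(np.linalg.eigvalsh(K))[::-1][:5]
exact = T ** 2 / ((np.arange(5) + 0.5) ** 2 * np.pi ** 2)  # eigenvalues (4.32)
print(np.round(eigs, 6))   # numerically computed
print(np.round(exact, 6))  # T^2 / ((n + 1/2)^2 pi^2)
```

The two rows agree to several decimal places, the discrepancy shrinking as the grid is refined.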


Analytical solution of the integral equation (4.25) is, in general,
difficult. For some special cases, the integral equation can be reduced to a
differential equation and then solved in a way similar to the example
above. In particular, if the covariance function $R(t,s)$ has the form

$$R(t,s) = \int_{-\infty}^{\infty} e^{i2\pi\lambda(t-s)}\varphi(\lambda)\,d\lambda \tag{4.35}$$

where $\varphi(\lambda)$ is a ratio of polynomials in $\lambda$, then the integral equation can
be reduced to a differential equation with constant coefficients. The
details are given in Davenport and Root [1958, pp. 241-242].
Before leaving the subject of orthogonal expansions, consider a
second example. Suppose $EX_t\overline{X}_s = R(t,s) = \cos 2\pi(t-s)$. Then we can
show that the Karhunen-Loève expansion has only two terms. In fact,
any expansion in terms of an orthonormal basis in $\mathcal{H}_X$ has only two terms.
Furthermore, in this instance, the integral equation (4.25) can be solved
rather easily. To show all this, we first observe that
$$EX_0\overline{X}_{1/4} = 0 \qquad E|X_0|^2 = E|X_{1/4}|^2 = 1$$
Therefore $\{X_0, X_{1/4}\}$ is an O-N family in $\mathcal{H}_X$. Next, we observe that
$$E\left|X_t - X_0\cos 2\pi t - X_{1/4}\sin 2\pi t\right|^2 = 0$$
Therefore, $\{X_0, X_{1/4}\}$ is also complete. This means that $\mathcal{H}_X$ is two dimen-
sional. Hence, the Karhunen-Loève expansion has only two terms. The
eigenvalues and eigenfunctions depend on the interval $T$. Suppose the
interval is $[0,k]$, where $k$ is an integer. Then the eigenfunctions can be taken
to be $\{(1/\sqrt{k})e^{i2\pi t},\ (1/\sqrt{k})e^{-i2\pi t}\}$ or $\{\sqrt{2/k}\cos 2\pi t,\ \sqrt{2/k}\sin 2\pi t\}$.
If the interval $[0,T]$ is not integral, then the eigenfunctions can be taken
to be suitable linear combinations of $\sin 2\pi t$ and $\cos 2\pi t$.
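That $\mathcal{H}_X$ is two dimensional shows up numerically as a rank-2 Gram matrix: on any time grid, $[\cos 2\pi(t_j - t_k)]$ has exactly two nonzero eigenvalues. A quick check (the grid is our own choice):

```python
import numpy as np

t = np.linspace(0.0, 3.0, 200)
G = np.cos(2 * np.pi * (t[:, None] - t[None, :]))  # R(t,s) = cos 2*pi*(t - s)

eigs = np.sort(np.linalg.eigvalsh(G))[::-1]
# Two eigenvalues of order 100; all the others are at roundoff level,
# since cos 2*pi*(t-s) = cos(2*pi*t)cos(2*pi*s) + sin(2*pi*t)sin(2*pi*s).
print(np.round(eigs[:4], 8))
```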

5. WIDE-SENSE STATIONARY PROCESSES


A second-order process $\{X_t, -\infty < t < \infty\}$ is said to be wide-sense
stationary if its covariance function is a function of only the time dif-
ference, that is,
$$EX_t\overline{X}_s = R(t-s) \tag{5.1}$$
We recall once again that the mean $EX_t$ is assumed to be zero throughout
this chapter. Wide-sense stationarity means that
$$EX_{t+t_0}\overline{X}_{s+t_0} = EX_t\overline{X}_s \tag{5.2}$$
for any $t_0$. As before, let $\mathcal{H}_X$ denote the Hilbert space containing all
linear combinations of $\{X_t, -\infty < t < \infty\}$. Suppose we define a linear
operator $U_t$ for each $t$, mapping $\mathcal{H}_X$ onto $\mathcal{H}_X$, as follows:
$$U_tX_s = X_{t+s} \tag{5.3a}$$
$$\|Z_n - Z\| \xrightarrow[n\to\infty]{} 0 \quad\text{implies}\quad U_tZ = \lim_{n\to\infty} \text{in q.m. } U_tZ_n \tag{5.3b}$$
It is clear from the definition of $\mathcal{H}_X$ (3.1) that $U_tZ$ is well defined for
every $Z$ in $\mathcal{H}_X$ by (5.3). Equation (5.2) can now be rewritten as
$$(U_{t_0}X_t,\ U_{t_0}X_s) = (X_t, X_s) \qquad \forall\ t_0, t, s$$
which is easily extended to
$$(U_{t_0}Y,\ U_{t_0}Z) = (Y,Z) \qquad -\infty < t_0 < \infty;\quad Y, Z \in \mathcal{H}_X \tag{5.4}$$
Equation (5.4) means that for each $t$, $U_t$ is a unitary operator. It is also
easy to see that $U_tU_s = U_{t+s}$, $U_0 = I$, $U_t^{-1} = U_{-t}$, so that $\{U_t, -\infty <
t < \infty\}$ is a translation group. Summarizing these results, we can state
the following.

Every wide-sense stationary process $\{X_t, -\infty < t < \infty\}$ can be
represented as
$$X_t = U_tX_0 \tag{5.5}$$
where $\{U_t, -\infty < t < \infty\}$ is a translation group of unitary operators
mapping $\mathcal{H}_X$ onto $\mathcal{H}_X$. This result should be compared with the corre-
sponding result for the strictly stationary process given in Sec. 2.6.
The invariance of the covariance function with respect to time shifts
suggests that harmonic analysis (i.e., representation in terms of complex
sinusoids $e^{i2\pi\nu t}$) should play a useful role in the theory of wide-sense
stationary processes. To those who are familiar with the application of Fourier
integrals to the analysis of time-invariant linear systems, this is certainly
no surprise. We begin our discussion with a brief review of Fourier
integrals and their application to the analysis of linear time-invariant
systems. Let $L_p$, $C_0$, and $\mathcal{S}$ denote, respectively, the following function
spaces of Lebesgue measurable complex functions defined on $(-\infty,\infty)$:

$L_p$ is the space of all functions satisfying¹
$$\int_{-\infty}^{\infty} |f(t)|^p\,dt < \infty \tag{5.6a}$$

$C_0$ is the space of all bounded continuous functions such that
$$f(t) \xrightarrow[|t|\to\infty]{} 0 \tag{5.6b}$$

$\mathcal{S}$ is the space of all infinitely differentiable functions such that
for any integers $m$ and $n$,
$$\left|t^m\,\frac{d^nf(t)}{dt^n}\right| \xrightarrow[|t|\to\infty]{} 0 \tag{5.6c}$$

¹ Strictly speaking, $L_p$ is a space of equivalence classes. Two functions $f_1$ and $f_2$ belong
to the same equivalence class if and only if $\int_{-\infty}^{\infty} |f_1(t) - f_2(t)|^p\,dt = 0$.
The spaces $L_p$ and $C_0$ are complete normed linear spaces (Banach spaces)
with respective norms $\left[\int_{-\infty}^{\infty} |f(t)|^p\,dt\right]^{1/p}$ and $\sup_t |f(t)|$. The space $\mathcal{S}$ is
dense in both $L_p$ and $C_0$. That is, the completion of $\mathcal{S}$ with respect to the
norm of $L_p$ is $L_p$, and its completion with respect to the norm of $C_0$ is $C_0$.
Therefore, for every $f$ in $L_p$, we can find a sequence $\{f_n\}$ in $\mathcal{S}$ such that
$$\int_{-\infty}^{\infty} |f_n(t) - f(t)|^p\,dt \xrightarrow[n\to\infty]{} 0$$
and for every $f$ in $C_0$, we can find $\{f_n\}$ in $\mathcal{S}$ such that
$$\sup_t |f_n(t) - f(t)| \xrightarrow[n\to\infty]{} 0$$

For $f \in L_1$, the Fourier integral (or Fourier transform)

$$\hat{f}(\nu) = \int_{-\infty}^{\infty} e^{-i2\pi\nu t}f(t)\,dt \tag{5.7}$$

is an absolutely convergent integral for each $\nu \in (-\infty,\infty)$ and defines a
function $\hat{f}$ in $C_0$. If $f \in L_1$ and is of bounded variation in a neighborhood
of $t$, then the inversion formula is given by

$$\frac{f(t-0) + f(t+0)}{2} = \lim_{N\to\infty}\int_{-N}^{N} e^{i2\pi\nu t}\hat{f}(\nu)\,d\nu \tag{5.8}$$

where $f(t-0)$ and $f(t+0)$ denote, respectively, the limit from the left
and the limit from the right of $f$ at $t$. The right-hand side of (5.8) is often
written simply as $\int_{-\infty}^{\infty} e^{i2\pi\nu t}\hat{f}(\nu)\,d\nu$, although the integral is not absolutely
convergent unless $\hat{f}$ is also in $L_1$.
Roughly speaking, if $f \in L_1$, $L_2$, or $\mathcal{S}$, we have a pair of equations
relating a function $f$ and its Fourier transform $\hat{f}$:

$$\hat{f}(\nu) = \int_{-\infty}^{\infty} e^{-i2\pi\nu t}f(t)\,dt \qquad f(t) = \int_{-\infty}^{\infty} e^{i2\pi\nu t}\hat{f}(\nu)\,d\nu \tag{5.9}$$

These two equations are nearly identical, the only difference being the
terms $e^{\pm i2\pi\nu t}$ in the integrals. By convention, $\hat{f}$ is called the Fourier
transform of $f$, and $f$ is called the inverse Fourier transform of $\hat{f}$.
Depending on $f$, one or both of the integrals in (5.9) may have to be
defined as the limit in some sense of a sequence of finite integrals, and
the equality may only hold in a restricted sense. For example, if $f \in L_1$
and is of bounded variation, then the first integral is absolutely convergent,
but the second equation, strictly speaking, should be replaced by (5.8).
If $f \in \mathcal{S}$, then $\hat{f}$ is also in $\mathcal{S}$. In this case both integrals are absolutely
convergent, and equality holds for every $\nu$ and $t$ in $(-\infty,\infty)$. If $f \in L_2$,
then the first equation really says that there exists $\hat{f} \in L_2$ such that
$$\int_{-\infty}^{\infty}\left|\hat{f}(\nu) - \int_{-T_1}^{T_2} f(t)e^{-i2\pi\nu t}\,dt\right|^2 d\nu \xrightarrow[T_1,T_2\to\infty]{} 0$$
The second equation is also interpreted in a similar way. When we use
(5.9), it is understood that these equations are subject to such
qualifications.
The convolution product between two functions $f$ and $h$, when it
exists, is defined by¹

$$(f * h)(t) = \int_{-\infty}^{\infty} f(t-s)h(s)\,ds \tag{5.10}$$

We note that if $f$ and $h$ are in $\mathcal{S}$, then $f * h$ is also in $\mathcal{S}$. The convolution
product is symmetric in $f$ and $h$. This is easily seen by a change in the
variable of integration on the right-hand side of (5.10). If we denote
Fourier transformation by $\widehat{\;\cdot\;}$, then a most important property of con-
volution is given by the relationship
$$\widehat{f * h} = \hat{f}\hat{h} \tag{5.11}$$
Equations (5.10) and (5.11) are well-known representations of linear time-
invariant filtering operations. Roughly speaking, a linear filter is a linear
mapping of some input function space $V_i$ into some output function
space $V_o$. Suppose we define the time shift $T_a$ by
$$(T_af)(t) = f(t+a) \tag{5.12}$$
A linear filter $A$ is said to be time invariant if for every $a \in (-\infty,\infty)$,
$$AT_af = T_aAf$$
If the input space $V_i$ is $L_1$, and a filter $A$ is defined by $Af = h * f$, then
$A$ is linear and time invariant. The function $h$ is known as the impulse
response of the filter $A$. The Fourier transform $\hat{h}$ of the impulse response
is known as the transfer function. More generally, suppose a filter $A$
is defined by

$$(Af)(t) = \int_{-\infty}^{\infty} e^{i2\pi\nu t}\hat{h}(\nu)\hat{f}(\nu)\,d\nu \tag{5.13}$$

Then, $A$ is again time invariant and linear. The function $\hat{h}$ is again called
a transfer function. In general, $\hat{h}(\nu)$ may not be the Fourier integral of
any impulse response. To put it another way, the inverse Fourier
transform of $\hat{h}$ may not exist except as a generalized function. We have
been deliberately vague in specifying the input space $V_i$ and the output
space $V_o$, because they depend very much on the filter. For example,

¹ If $f, h \in L_1$, then $f * h \in L_1$. If $f, h \in L_2$, then $f * h$ is bounded. If $f \in L_1$ and $h$ is
bounded, then $f * h$ is again bounded.
$\hat{h}(\nu) = 2\pi i\nu$ represents the transfer function of a differentiator. Clearly,
the input to a differentiator must be differentiable. In any event, our
main interest here is not the filtering of known functions, but stochastic
processes.

Let $\{X_t, -\infty < t < \infty\}$ be a q.m. continuous and wide-sense
stationary process with covariance function $R(\tau)$. Assume $R \in L_1$; then
clearly $R$ also belongs to $L_1 \cap C_0$ because of the q.m. continuity of $X_t$.
Let $S$ be the Fourier transform of $R$ defined by

$$S(\nu) = \int_{-\infty}^{\infty} e^{-i2\pi\nu\tau}R(\tau)\,d\tau \qquad -\infty < \nu < \infty \tag{5.14}$$

The function $S$ is nonnegative because of the nonnegative definiteness
of $R$ (to be proved), and if $R \in L_1 \cap C_0$ as is assumed, then $S \in L_1 \cap C_0$
also. The inversion integral gives us

$$R(\tau) = \int_{-\infty}^{\infty} e^{i2\pi\nu\tau}S(\nu)\,d\nu \tag{5.15}$$

The function $S$ is called the spectral-density function for the process
$\{X_t, -\infty < t < \infty\}$, and has the interpretation of being the density of
the average power distribution per unit frequency. To make this notion
precise, we need to introduce the concept of linear time-invariant filtering
of wide-sense stationary processes.
Imagine a time-invariant linear filter characterized by an impulse
response function $h$, and suppose that the input is a sample function
$X_t(\omega)$, $-\infty < t < \infty$, of a wide-sense stationary process. It is natural
to view the output $Y_t(\omega)$, $-\infty < t < \infty$, as a sample function of another
second-order process and write

$$Y_t = \int_{-\infty}^{\infty} h(t-s)X_s\,ds \tag{5.16}$$

The difficulty is that if the convolution integral is to be viewed as an
absolutely convergent Lebesgue integral involving a sample function of
the $X$ process, then we need to impose conditions on the $X$ process which
are not in the province of second-order properties. Mathematically, it
is much neater to regard the integral in (5.16) as a q.m. integral as defined
in Sec. 3. From Proposition 3.1 we have that for the integral in (5.16)
to exist as a q.m. integral, it is necessary and sufficient that

$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} h(t-\tau)\bar{h}(t-\sigma)R(\tau-\sigma)\,d\tau\,d\sigma$$

exists as a Riemann integral. If $R \in L_1 \cap C_0$ as was assumed, and $h$


is square integrable, then

$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} h(t-\tau)\bar{h}(t-\sigma)R(\tau-\sigma)\,d\tau\,d\sigma = \int_{-\infty}^{\infty} |\hat{h}(\nu)|^2S(\nu)\,d\nu < \infty \tag{5.17}$$

and the existence of (5.16) as a q.m. integral is ensured. We should not
lose sight of the fact that to treat (5.16) as a q.m. integral is a mathe-
matical convenience. In applications, since we never observe more than
one sample function at a time, filtering operations should be interpreted
as being performed on a sample function. Thus, in practice, we really
want (5.16) to be interpreted both as a q.m. integral, which is the limit
of a q.m. convergent sequence of random variables, and as a Lebesgue
integral for almost all sample functions. In this chapter, we focus our
attention only on the former interpretation (cf. Sec. 2.2).

With $\{Y_t, -\infty < t < \infty\}$ defined by (5.16) with $h$ square integrable,
then

$$EY_t\overline{Y}_s = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} h(t-\tau)\bar{h}(s-\sigma)R(\tau-\sigma)\,d\tau\,d\sigma = \int_{-\infty}^{\infty} e^{i2\pi\nu(t-s)}|\hat{h}(\nu)|^2S(\nu)\,d\nu \tag{5.18}$$

which means that (1) $\{Y_t, -\infty < t < \infty\}$ is again wide-sense stationary,
and (2) the spectral-density function of $\{Y_t, -\infty < t < \infty\}$ is given by

$$S_Y(\nu) = |\hat{h}(\nu)|^2S(\nu) \tag{5.19}$$

If we take $\hat{h}(\nu)$ to be the indicator function for $[\nu_1,\nu_2]$, then from the usual
interpretation of filtering, the $Y$ process is just those components of the $X$ process
lying in the frequency range $\nu_1 \le \nu \le \nu_2$. Since

$$E|Y_t|^2 = \int_{-\infty}^{\infty} |\hat{h}(\nu)|^2S(\nu)\,d\nu = \int_{\nu_1}^{\nu_2} S(\nu)\,d\nu \tag{5.20}$$

it follows that $\int_{\nu_1}^{\nu_2} S(\nu)\,d\nu$ is just the average power of the $X$ process in
$[\nu_1,\nu_2]$. This justifies our earlier assertion that $S$ is nonnegative and that it
has the interpretation of average power per unit frequency.
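Relations (5.14) and (5.20) are easy to probe numerically. The sketch below is entirely our own construction: it takes the Ornstein-Uhlenbeck covariance $R(\tau) = e^{-|\tau|}$, whose spectral density works out to $S(\nu) = 2/(1 + (2\pi\nu)^2)$, computes $S$ by a discrete Fourier transform, and compares the band power $\int_{\nu_1}^{\nu_2} S(\nu)\,d\nu$ with its closed form.

```python
import numpy as np

# Spectral density of R(tau) = e^{-|tau|} via a discrete Fourier transform,
# compared with the closed form S(nu) = 2 / (1 + (2*pi*nu)^2).
dt, n = 0.01, 2 ** 17
tau = (np.arange(n) - n // 2) * dt
R = np.exp(-np.abs(tau))

S_num = np.real(np.fft.fftshift(np.fft.fft(np.fft.ifftshift(R)))) * dt
nu = np.fft.fftshift(np.fft.fftfreq(n, d=dt))
S_exact = 2.0 / (1.0 + (2 * np.pi * nu) ** 2)
print("max |S_num - S_exact|:", np.abs(S_num - S_exact).max())

# Average power in a band [nu1, nu2], as in (5.20):
nu1, nu2 = 0.1, 0.5
band = (nu >= nu1) & (nu <= nu2)
power_num = S_num[band].sum() * (nu[1] - nu[0])
power_exact = (np.arctan(2 * np.pi * nu2) - np.arctan(2 * np.pi * nu1)) / np.pi
print("band power, numeric:", power_num)
print("band power, exact  :", power_exact)
```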


Equation (5.15) is a representation of the covariance function as a
positive linear combination of sinusoids. This idea can be generalized to
situations where $R \notin L_1$ and the spectral-density function may not exist.
Intuitively, the situation is quite simple. When $R \in L_1 \cap C_0$, the average
power is smoothly distributed over the continuum of all frequencies
$(-\infty,\infty)$, with no single frequency having a finite amount of power.
In the more general situation, there may be spectral lines, i.e., distinct
frequencies with a finite amount of power. Even more complicated situa-
tions involving continuous, but not absolutely continuous, distributions
may arise. The general statement concerning the harmonic representation
of a stationary covariance function is given by Bochner's theorem,
stated below.

Proposition 5.1. A function $R(\tau)$, $-\infty < \tau < \infty$, is the covariance func-
tion of a q.m. continuous and wide-sense stationary process if and
only if it is of the form

$$R(\tau) = \int_{-\infty}^{\infty} e^{i2\pi\nu\tau}F(d\nu) \tag{5.21}$$

where $F$ is a finite Borel measure on the real line $(-\infty,\infty)$ and is
called the spectral measure.

Remarks:
(a) We note that $[1/R(0)]F$ is a probability measure defined on the
$\sigma$-algebra of Borel sets of the real line.
(b) The integral in (5.21) is defined exactly as the expectation was
defined in (Sec. 1.5).
(c) Alternatively, if we define a spectral-distribution function $\Phi$ by

$$\Phi(\nu) = F((-\infty,\nu)) \tag{5.22}$$

then $\Phi$ is a bounded nondecreasing function, and the integral in
(5.21) can be replaced by a Stieltjes integral

$$\int_{-\infty}^{\infty} e^{i2\pi\nu\tau}\,d\Phi(\nu)$$

(d) Finally, we note that if $\Phi$ is absolutely continuous (that is, $F$
is an absolutely continuous measure with respect to Lebesgue
measure), then there exists a nonnegative function $S$ such that

$$\Phi(\lambda) = \int_{-\infty}^{\lambda} S(\nu)\,d\nu \tag{5.23}$$

Naturally, $S(\nu)$ is called the spectral-density function, agreeing with
and generalizing our earlier definition.

Proof: First, if F is a finite Borel measure, then

f-.. . e i2 .... (t-·)F(dv) (5.24)

is nonnegative definite function, because it is the limit of a sequence of


N
nonnegative definite functions of the form L Un(t)lin(S). It is continuous
n=O
5. WIDE-SENSE STATIONARY PROCESSES 95

at t = s, because

I f- " "" (e i2 ".., - I)F(dll) I~


.-.0
0

Hence, it is continuous everywhere on (- 00,00) X (- 00,00). Therefore,


(5.22) defines the covariance function of some q.m. continuous wide-sense
stationary process. Thus, the first half of Proposition 5.1 is proved.
Next, suppose R(T) is the covariance function of some q.m. con-
tinuous and wide-sense stationary process. That is, R(t - s) is a con-
tinuous nonnegative definite function on (- 00,00) X (- 00,00). Now,
define a sequence of continuous covariance functions Rn(T) by

Rn(T) = {( 0 1 - 2n R(T)
H) ITI ~ 2n
(5.25)
ITI > 2n
The fact that Rn(T) is a nonnegative definite function follows from the
fact that max (0, 1 - ITI/2n) is a nonnegative definite function. Now,
clearly Rn E Ll n Co. Therefore, there corresponds a sequence of spectral-
density functions {Sn} defined by

Sn(lI) = f-"""" R n (T)e- i21rPT dT

= f2n
-2n
(1 - tl)2n
R(T)e- i27fVT dT (5.26)

N ow, let f and .I be functions in S which are Fourier transform pairs,

j(x) = f-"""" e- i2JrX1If(y) dy

fey) = f-"""" ei2 " X1Ij(x) dx

Define a linear functional p on S by

p(f) = f-"""" R(T)J(T) dT (5.27)

Using the dominated-convergence theorem, we have

Ip(f) I = l~~ I f-"""" Rn(T)j(T) dT I


= lim
n--->""
I f-"""" Sn(lI)f(lI) dll I
~ sup If(lI) I lim f-"""" Sn(lI) dll
p n~ao

= sup If(lI)IR(O)
p

This means that p is a linear functional defined on a set S dense in Co and


bounded with respect to the supremum norm. Thus, p can be extended to a
96 SECOND-ORDER PROCESSES

bounded-linear functional on Co, and the Riesz representation theorem


[see, e.g., Rudin, 1966, p. 131] stat.es t.hat there exists a bounded 0' additive
set function F defined on the Borel sets such that

p(f) = f-"'", f(v)F(dv) (5.28)

Because R is nonnegative definite, P is nonnegative and is thus a finite


Borel measure. Comparing (5.27) and (5.28), we conclude that

f-"'", R(r)j(r) dT = f-"'", f(v)P(dv) fE~ (5.29)

Now, for a fixed t define

f .. (v) = exp [ - ~ (2::) 2J ei2 .-.t (5.30a)

then

j .. (T) = ~ 2: e-1n(t-T)' (fi.30b)

It is easy to verify that fn, j .. are in ~. If R is continuous, then

f-"'", R(r)jn(T) dT -;;:;-: R(t)


and
f-"'", fn(v)F(dv) ~ f-"'", ei21rvtP(dv)
which proves the second half of Proposition 5.1. I
Proposition 5.1 gives an expression of a covariance function in terms
of the corresponding spectral measure (alternatively, the spectral-distribu-
tion function), but gives no inversion formula expressing the spectral
measure F in terms of the covariance function R. Actually, the proof
already contains the heart of both the direct and the inversion formulas
in (5.29). We need only to choose an approximating sequence of I's or
j's. For example, suppose b > a are two continuity points of P, that is,
a and b are such that
F([a,b]) = F«a,b» = F([a,b» = F«a,b])
Denote the common value of these four quantities by F ab, thenFab can be
found from (5.29) by approximating the indicator function of the interval
a to b by a sequence in ~. Specifically, let

fn(v) = In 1a(b e-!n(v-v')' dv'


"V2;
(5.31)
j.,(r) = exp [ - - - -
1 (2'lrT) 2J e- i21TTb - e- i2 ..Ta
2 n -2'lriT
6. SPECTRAL REPRESENTATION 97

Then, from (5.29) we have

Fab = lim f-"'", fn(v)F(dv)


n ..... '"
= lim f-"'", !n(T)R(T) dT
n->'"

Thus, we have in final form the inversion formula

Fab = lim
n ..... '"
! '"
-'"
R(T)
e-i27rTb _

-27rtT
e- i2 11"Ta
. exp
[ 1 (211"T) 2]
- - - - dT
2 n
(5.32)

Equation (5.32) holds only at continuity points of F. However,


because R(O) = F«- 00,00)) is finite, the number of discontinuities of
F which have jumps greater than lin is, at most, nR(O). Hence, the
discontinuity points (mass points) of F are, at most, countable. Thus,
the continuity points of F are always dense in (- 00,00). It follows that
F is completely determined by (5.32).
We can now repeat the arguments relating to linear time-invariant
filtering operations on a wide-sense stationary process. Instead of (5.18),
we have for the more general case

EYtY s = f-"'", e i21r v(t-s)lh(v)12F(dv)

where Y t is the output of a linear time-invariant filter and F is the spectral


measure of the input. We postpone a detailed discusson of this point until
later. First, we need to develop a spectral representation, not just for the
covariance function as is given by Proposition 5.1, but for the process
itself.

6. SPECTRAL REPRESENTATION

Let IX" - < t < 00 I be a q.m. continuous and wide-sense stationary


0()

process. If X, had a Fourier integral, then the invf'rsion formula would


immediately give a representation of X, as a linear combination of sinus-
oids. However, XI does not have a Fourier integral. That is, e-i21rv t}{t dt f-"'",
fails to exist as a q.m. integral or any other kind of integral. To obtain a
spectral representation, we introduce a kind of integrated Fourier integral
lXx, - 00 < A < 00 I as follows:
lim in q.m. Xa = 0 (6.1a)
a---i' - 00

If a and b are continuity points of the spectral measure F, we set

Xb - Xa = f-"'", (lab r ibvt dV) X t dt (6.1b)


18 SECOND-ORDER PROCESSES

At discontinuity points of F, X" is defined so that {X", - 00 <


A < oo} is q.m. continuous from the left (6. Ie)

Proposition 6.1. Defined as above, the process {XA, - 00 < A < oo}
satisfies

(6.2)

so that {X", - 00 < A < oo} is a process with orthogonal increments.

Remark: Denoting dX" = X"+d" - X" and F(dA) = F([A, ~ + dA)), we


can write (6.2) in a differential form as

o =
}..p
{I,0, AA;;t.= p.
p.
(6.3)

The process {X", - 00 < A < oo} will be called the spectral process
of {XI' - 00 < t < oo}. We shall show that

XI = f-.. . . e i2r "1 dX"


which is the spectral-representation formula that we are after. However,
first we have to define integrals such as f-"'", f(A) dX".

Definition.Let {X", - 00 < ~ < oo} be a process with orthogonal incre-


ments which is q.m. continuous from the left. Let

E dX" dX" = o""F(d~).


(a) If f = lab, the indicator function of [a,b), we set

f-"'. . f(A) dX" = Xb - Xa (6.4)


n
(b) If f =
.=L a.f., we require
1

.=I a. f-.. . . f.(A) dX"


n

f-"'", f(A) dX" = (6.5)


1

(c) If f-.. . . !fn(A) - f(A)12F(dA) -:;;:;: 0, we require

f-"'", f(A) dX" = lim in q.m. f-"'. . fn(A) dX"


n-+'"
(6.6)

It is clear that f-"'. . f(A) dX" is well defined for every ffor which there
exists a sequence of step functions {fn} (linear combinations of indicator
6. SPECTRAL REPRESENTATION 99

functions) such that

1-"'", Ifn(~) - f(~)12F(d~) ~ 0 (6.7)

Although we won't prove it, the class of all such f is precisely L2(F),
that is, the class of functions satisfying 1-"'", If(~) 12F(d~) < 00. For a
continuous function fin L 2(F), a suitable approximating sequence of step
functions can be constructed by sampling f at continuity points of F.
For an arbitrary fin L 2 (F), the construction of an approximating sequence
is somewhat more complicated [Doob, 1953, pp. 439-441].

Proposition 6.2. Let (XA' - 00 < ~ < 00 I be a process with orthogonal


increments assumed to be q.m. continuous from the left. Let F be a
finite Borel measure so that EXAX!' = oA!,F(d~). Then,

E [/-"'", f(~) dX A ] [/-"'", g(~) dXA]


= 1-"'", f(~) O(~)F(d~) f, g E L 2 (F) (6.8)

Proof: Equation (6.8) is certainly true if f and g are step functions.


Let Un I and (gn I be approximating sequences of step functions for f
and g, respectively, and write A (f) = 1-"'", f(~) dX A• Then,

EA(f)A(g) = E[A(f - fn) + A(fn)][A(g - gn) + A(gn)]


= 1-"'", fn(~)gn(~)F(d~) + EA(f - fn)A(g - gn)
+ EA(fn)A(g - gn) + EA(f - fn)A(gn)
Therefore,

I EA (f)AW - f-"'", fn(~)g:(~)F(d~)1


~ VEIA(f - fn)/2 VEIA(g - gn)12 + VEIA(fn)/2 VEIA(g - gn)12
+ VEIA(f - fn)12 VEIA(gn)12
and (6.8) follows by letting n ~ 00. I
Proposition 6.3. Let tXt, - 00 < t < 00 I be a q.m. continuous and wide-
sense stationary process, and let {XA' - 00 < ~ < 00 I be its spectral
process as defined by (6.1). Then

fES (6.9)

where J denotes the Fourier transform of f and S, as usual, denotes


the space of infinitely differentiable functions of rapid descent.
100 SECOND-ORDER PROCESSES

Proof: The definition of Xx as given by (6.1) yields

1_"',Jab(A) dXx = 1-"'", fab(t)X t dt

where lab denotes the indicator function of [a,b). Let Un} be a sequence
of step functions obtained by samplingf at continuity points of the spectral
measure F. As the sampling points become dense in (- 00,00 ),fn converges
to f uniformly and in L 1 metric. Therefore, Un} converges to J uniformly.
Now, for each n,

1-"'", fn(A) dX x = 1-"'", In(t)X t dt

Therefore, if f E £,

I
E 1-"'", f(A) dX x - 1-"'", J(t)X t dt r: ; 2E I 1-"'", [f(A) - fn(A)] dX x \2
+ 2E 1 1-"'", [J(t) - In(t)]X t dt 12 = 2 1-"'", If(A) - fn(A)12F(dA)

+ 2 II
'"
R(t - S)[J(t) - In(t)][J(S) - In(S)] dt ds

The first of these integrals goes to zero as n ---+ 00 by virtue of the con-
struction of fn. The second integral also goes to zero, because

ITT [JCt) - In(t)]e i2 ,,vt dt r-:: f(II) - fn(II)

and the convergence is bounded. Hence, from (5.21), we have

II R(t -
T

lim s)[J(t) - In(t)][JCs) - In(S)] dt ds


T-.", -T

= 1-"'", If(II) - fn(II)j2F(dll);::-;;; 0 I


Corollary. Under the hypotheses of Proposition 6.3,

1-"'", f(A)Y(A) F(dA) = II J(t){j(s)R( t -


'"
s) dt ds j, g E S (6.10)

Remark: Note the similarity between (6.10) and (5.29). Indeed, (5.29)
can be obtained from (6.10) by using

gn(A) = exp ( - !~)


2 11

(In(S) = ~ exp ( - ~ S2)


and letting n ---+ 00.
6. SPECTRAL REPRESENTATION 101

Proposition 6.4 (Spectral Representation). A q.m. continuous process


lXI, - 00 < t < oo} is wide-sense stationary if and only if there
exists a process with orthogonal increments {Xx, - 00 < A < oo}
such that EIXxl 2 is bounded and

(6.11)

Proof: If I Xx, - 00 < X < oo} is as described, we can always assume


that it is continuous from the left. Then,

defines a finite Borel measure, and E dXx dX!' = fJx!,F(dX). Since

E (/_0000 ei21rXt dXx) (/_0000 e i2 11"vs dXv) = 1_0000 ei21rX (t-slF(dX)

the process defined by 1_0000 e i21rXt dX x must be q.m. continuous and wide-
sense stationary by virtue of Bochner's theorem (Proposition 5.1).
Conversely, let {Xt, - 00 < t < oo} be q.m. continuous and wide-
sense stationary and define Xx hy (6.1). Then, (6.11) follows from (6.9)
by using the familiar approximation:;

and letting n ~ 00. I


Equation (6.11) finally provides us with a representation of wide-
sense stationary processes as a linear combination of sinusoids. The
spectral process {Xx, - 00 < X < oo} also provides a complete char-
acterization of the Hilbert space Xx generated by the process {XI' - 00 <
t < 00 }, and a complete characterization of every process that can be said
to be derived by a time-invariant linear operation on lXI, - 00 < t < oo}.

Proposition 6.5. Let {XI, - 00 < t < oo} be a <l.m. continuous and wide-
sense stationary process with spectral process lXx, - 00 < X < oo}
and spectral measure F. Let Xx be the Hilbert space generated by
{Xt' - 00 < t < oo}. Then, a random variable Y belongs to Xx
if and only if there exists 7J E L 2(F) such that

(6.12)
102 SECOND-ORDER PROCESSES

Proof: The proof that every Y of the form of (6.12) is in Xx is elementary.


We shall only consider the other half. First, suppose Y is of the form

L a.X
n
t.; then from (6.11), we have
.=1

hence (6.12) holds. In general, Y E Xx is the q.m. limit of a sequence


{Y n I, each of which is a finite sum. Hence
Y = lim in q.m. Yn = lim in q.m.
n-> "" n--> ""

Since {Ynl is q.m. convergent, it is also mutually q.m. convergent, and


from (6.8) we have

EIY m - Ynl 2 = f-"""" l1)m(X) - 1)n(X)12F(dX) m,n-->",,' 0

This means that {1)n I is a Cauchy sequence in L 2(F), and the completeness
of L 2(F) implies the existence of 1) E L 2(F) such that

f-"""" l1)n(X) - 1)(X)12F(dX) -;;:::::

But this is equivalent to saying that f-"""" 1),,(X) dX h converges in q.m. to


f-""""
1)(X)dXh • The proof is complete. I
Proposition 6.5 provides an explicit representation for elements of
Xx in terms of functions in L 2(F). Indeed, it provides a one-to-one
mapping of L 2(F) onto Xx which preserves the inner products, that is,

(6.13)

That is, (6.12) is an isometry between Xx and L 2 (F). Equation (6.12) is


extremely useful in representing an arbitrary linear operation on a wide-
sense stationary process.

Proposition 6.6. Let tXt, - 00 < t < 00 I be as in Proposition 6.5. Let


{UI, - 00 < t < 00 I be the corresponding translation group of
unitary operators as defined by (5.3). Suppose {Y t , - 00 < t < 00 I
is a process such that for each t, Y t E Xx. Then, the following
conditions are equivalent:
(a) There exists 7J E L 2(F) such that

tE(-oo,oo) (6.14)
6. SPECTRAL REPRESENTATION 103

(b) For each t E (- 00,00),

(6.15)
(c) For arbitrary t and 8 in (- 00,00),

EYtX. = EYoX.- t (6.16)

Proof: It is nearly obvious that (a) implies (c). By direct computation


we have

EYtX. = /-,"'"" e i21rX (t-3)TJ(A)F(dA) = EYoX.- t

To prove that (a) implies (b), we make use of the fact that
U t dX.,. = e i2 ..X1 dX x (6.17)
More precisely, this means that

EIU,(X H < - Xx) - e i2 .. X/ (Xx+. - XX)12 ~


<->0
°
Finally, if (6.14) holds, then we can write

UtYo = f-"""" TJ(A)U t dX.,. = f-"""" e i21rX1 TJ(A) dXx = Y,

This step can be made precise by approximating the stochastic integral in


(6.14) by sums.
Next, we prove that each of (b) and (c) implies (a). Suppose (b) is
satisfied. Because of Proposition 6.4, we can write

Yo = f-"""" TJ(A) dXx

Thus, (6.14) follows upon using (6.17). To prove that (c) implies (a), we
first note that from Proposition 6.4 for each t there exists TJ(-,t) E L 2(F)
such that

Y/ = f-"""" TJ(A,t) dXx

If (c) is satisfied, then from (6.16),

EYtX. = f-"""" TJ(A,t)e- i2 .. xsF(dA)

= EYoX._ t = f-"""" TJ(A,O)e i21rX \I-s)F(dA)


This means that for arbitrary {,s in (- 00,00), we have
104 SECOND-ORDER PROCESSES

If we denote :VI = f-"'", ei2 ".>.t 17 0_,0) dX>., then the above equation is equiva-
lent to
t,8 E (- 00 , 00 )

Since (Y t - :VI), being in Xx, is the q.m. limit of a sequence of finite sums
of the form 2:
a.X!., we have EIY t - :Vt/ 2 = 0. The proof is now
complete. I
Remark: The process {Y f , - 00 < t < oo} is necessarily wide-sense sta-
tionary. More than that, {Xl, Y t , - 00 < t < oo} may be said to
be jointly stationary (wide-sense) in the seIlse that every linear
combination aX! +
/3Y t defines a wide-sense stationary process. On
the other hand, suppose that for every tin (- 00,00), Zt E Xx and
{Zt, - 00 < t < oo} is wide-sense stationary. The process {Z/,
- 00 < t < oo} does not necessarily satisfy the conditions (6.14)
to (6.16) and is not necessarily jointly stationary with tXt, - 00 <
t < oo} (see Exercise 12).
If h( t), - 00 < t < 00, is a function such that its Fourier transform
h is in L 2(F), then
Yt = f-"'", h(t - T)XT dT
= f-"'", h(A)eib>.t dX>. (6.18)

Therefore, every process which can be expressed as a convolution of an


impulse response and a wide-sense stationary process has a spectral repre-
sentation of the form (6.14), and 17 is just the transfer function. How-
ever, not every process of the form (6.14) can be represented as a convolu-
tion. Hence, (6.14) is a more general representation. We shall call every
process of the form (6.14) a process derivable by a linear and time-
invariant filtering on the X process. N stu rally , 17 will be referred to as
the transfer function of the filter. We note that if {Y t , - 00 < t < oo}
is given by (6.14), then its spectral process {:V>., - 00 < A < oo} is
defined by
dfT>. = 17(A) dX>. (6.19)
The Y process is automatically wide-sense stationary and q.m. continuous
with
EYtY. = f-"'", ei2",>'<f-8)I17(AWF(dA) (6.20)

This means that


(6.21)
7. LOWPASS AND BANDPASS PROCESSES 105

Equation (6.21) generalizes (5.19). Equation (6.19) should be considered


the random-process counterpart of the familiar formula relating the
Fourier transforms of the input and output of a linear time-invariant
filter.
It may be useful to consider some examples. First, if {Xt, - 00 <
t < oo} is a wide-sense stationary process, then its q.m. derivative as
defined by (3.5) can be exprel:;sed as

(6.22)

whenever it exists. It is rather obvious from (6.22) that the q.m. deriva-
tive exists if and only if f-"'",
IAI2F(dA) < 00. As a second example, con-
sider the process {Y t , - 00 < t < oo} defined by

(6.23)

where sgn A = 1 or -1 according as A ~ 0 or A < O. Thus defined, Y t


is known as the Hilbert transform of XI. The Y process has exactly the
same covariance function as the X process. Also, since

Xt + iY t = 210'" eihXt dX x
we can say that X t +
iY t has no negative frequencies. Hilbert transforms
will be made use of again in the next section.

7. LOWPASS AND BANDPASS PROCESSES


Definition. A wide-sense stationary and q.m. continuous process {X"
- 00 <
t < oo} is said to be bandlimited to frequency W if its
spectral measure F satisfies
F« - 00,- W]) =0 = F([W,oo» (7.1)
In other words, the X process has no average power for frequencies
::; - W and ~ W. We note that the points -Wand Ware required
to be continuity points of F. A process bandlimited to frequency W can
always be written as

XI f w
= -w eihvt
~
dX v (7.2)

where it makes no difference if the limits are replaced by W ± 0 and


- W ± O. Equation (7.2) results from the fact that

E IXt - f-w+o
w-o e ihvt dX v 12 = E/ r
Jlvl~w
ei2 ...t dX v /2
= F« -. 00 , - W]) + F([W, 00» =0
106 SECOND-ORDER PROCESSES

The concept of a bandlimited process is an important one in prac-


tice, because such a process is specified for all t by its sampled values
spaced 1/2W apart in a formula which is known as the sampling theorem.

Let {Xt, - 00
Proposition 7.1 (Sampling Theorem). <t< oo} be a process
bandlimited to frequency W. Then

~ X sin 21rW(t - a - k/2W)


Xt = lim J~ooq·m. k=4 N a+k/2W 21rW(t _ a _ k/2W) (7.3)

for all t E (- 00,00), where a is an arbitrary constant.

Remark: Proposition 7.1 implies that Xx has a countable basis, namely


IXa+k/2W,k = 0, ±1, ±2, . . . }.

Proof: We begin with the representation (7.2) and for emphasis rewrite
it as
X t
= f-w+o
W-O
ei2nt dX
~
(7.4)

For a fixed t, the functionf(v) = ei2nt is square integrable over (- W,W).


Since {cpk(v) = ei2 ...k/2W, k = 0, ±1, ±2, . . . \ forms an orthonormal
basis in L 2 ( - W,W), we have

ei2 r>t = l.i.m.


N---.oo k=-N
LN
ei2nk/2W _1_ f W ei2 .. p(t-k/2W) dv
2W -w
where !.i.m. denotes limit in L2 mean. The Fourier series

k--N
L
N
ei2nk/2W _1_ f W ei2 .. (t-k/2W) dv
2W -w
p

~ ei2 ...k/2W sin 21rW(t - k/2W)


k=4 N 21rW(t - k/2W)
also converges pointwise to ei2rpt for v E (- W, W), but not at the end-
points v = ± W. At the endpoints, it converges to i(e i2rWt e- i2 ..Wt ) +
cos 21rWt. Therefore,

E 1X - ~ X sin 21rW(t - k/2W) \2


t k=4 N k/2W 21rW(t - k/2W)

= E 1 fW-O [ei2 ...t ~ ei2 ...k/2W sin 21rW(t - k/2W)] dX 12


k=4
_
-w+o N 21rW(t - k/2W) A

f W-O
-w+o
1 ei2 ...t _
k=4 N
~ ei2...k/2wsin 21rW(t - k/2W)
21rW(t - k/2W)
12 F(dt..) -----')
N---.oo
°
7. LOWPASS AND BANDPASS PROCESSES 107

which proves the theorem for ex = O. For ex ~ 0, we note that Y t = X t+a


is also bandlirnited to Wand (7.3) is merely the expansion for Y t- a • I

Remark: The proof fails if F merely satisfies F(( - 00, - W)) = 0 and
F((W, (0)) = 0 but has a finite mass at one or both of the points
± W. Then, we would have to write
X /. = j -w-o
W+o
e i21rv !
~
dX "

and would find

E / f. sin 27rW(t - k/2W) /2


~ XI - k~4N X k / 2W 27rW(t - k/2W)
~ sin 2 27rWt(F+ + F-) (7.5)

where F+ and F- are, respectively, the mass of F at + Wand - W.


As an example, suppose X t = cos(27rWt + cp) where 'P is a random
variable uniformly distributed in [0,271'). Then, {X" - 00 < t < 00 I
is a wide-sense stationary and q.m. continuous process with
EX/X. = R(t - s) = t cos 27rW(t - s)
Therefore, F is concentrated at ± W, assigning a mass of i to each
point. The sampling theorem obviously fails, because in this case,
k = 0, ±1, ..
which means that for every t, the sum
~ X sin 27rW(t - ex - k/2W)
f a+k/2W 27rW(t - ex - k/2W)

is proportional to Xa. On the other hand X a+1/4W is orthogonal to Xa.


So, X a +1/ 4W obviously cannot be represented by the sampling
formula.
In communication problems, we frequently encounter the so-called
bandpass processes. Here, we define a bandpass process with center
frequency Wo and bandwidth 2W(W o > W) as q.m. continuous and a
wide-sense stationary process with
F((- 00, - Wo - W]) = F([- Wo + W, Wo - W])
= F([+Wo + W, (0)) = 0 (7.6)
In other words, the average power is concentrated in the frequency ranges
(-Wo - W, -Wo + W) and (We - W, Wo + W). Of course, a band-
pass process is bandlimited to frequency W 0 + W. Therefore, the sam-
pling theorem can be applied. However, a straightforward application of
108 SECOND-ORDER PROCESSES

the sampling theorem requires a sampling rate of 2W 0 + 2W, which does


not reflect the fact that there is no average power in the frequency range
[- W 0 + W, W 0 - W]. From the point of view of bandwidth, a bandpass
has the same bandwidth as a process bandlimited to 2W. Thus, we should
expect a sampling rate of 4 W rather than 2 (W 0 + W). In practice W 0
is much greater than W, and there is a substantial difference between
these two rates of sampling. It turns out that a bandpass process with
bandwidth 2W can be represented in terms of the samples of itself and
the Hilbert transform, each being sampled at a rate of 2W to give a
combined rate of 4W. Let {X" - 00 < t < oo} be a bandpass process
with center frequency W 0 and bandwidth 2W. Since W 0 > W, II = 0 is a
continuity point of F. We define the Hilbert transform by

YI = f oo
-00 (-i sgn lI)e i2nt dX.
~

(7.7)
where sgn II = 1 or - 1 according as II > 0 or II < O. The value of sgn 0
is immaterial, since (7-7) can be rewritten as

Yt = i j -w,-w
-WO+IV -
e i2nt dX. - i
flVo+w
+wo-w
e i2 ... t dX.
~
(7.8)
Again, we note that the limits in the integrals in (7.8) can be replaced by
their limits from the left or the right without any material effect. The
process {Xt, - 00 < t < oo} can be written as

(7.9)

It follows that we can write

Xt + 'tV
.
t
= 2 ~wo+w °
w,-w
e,21r·t
~
dX •

= 2e i2rW ,t fW
-w e i2nt dX .+W 0

= 2ei2 .. Wot ~t (7.10)


X t - iY t = 2e-i2rWot fW
-w e i2 ... t dX p-
W0

= 2e-i2 ..Wot17t (7.11)


It is clear that both {~t, - 00 < t < oo} and {17t, - 00 < t < oo} are
bandlimited to W. Hence, by applying (7.3) to ~t and 171, we get

(""c.c+k/2We-i2 .. Wot
0/
+ I: k
C;a+ /2
we'"
02 W
ot
) sin 21rW(t - k/2W - a)
21rW(t _ k/2W _ a)
n= -00

Reexpressing 17a+k/2W and ~a+k/2W in terms of Xa+k/2W and Y a+kJ2W, we get


the following result.
8. WHITE NOISE AND WHITE-NOISE INTEGRALS 109

Proposition 7.2. Let {Xt, - 00 < t < oo} be a bandpass process with
center frequency W 0 and bandwidth 2W. Let Y t be its Hilbert
transform defined by (7.7). Then

L
N
sin 21rW(t - k/2W - a)
X t = lim in q.m.
N-.". k=-N 21rW(t - k/2W - a)
[Xk/2W+a cos 21rWo(t - k/2W - a)
- Yk/2w+asin21rWo(t - k/2W - a)] (7.12)
Equation (7.12) involves both X t and Y t being sampled at 2W,
giving a total sampling rate of 4W.
It is interesting to note that
(7.13)
Since ~t and 1/t are both bandlimited to W, we can expect 1~d2 +
11/112 to be
relatively slowly varying when W« Woo Therefore, IXd 2 +
IYd 2 is also
slowly varying (relative to a sinusoid with frequency W 0)' On the other
hand, most of the average power of XI is concentrated near ± W 0, so
XI itself is rapidly varying. If X t is real valued, then it can be written as

Xt vlX t + Y cos [21rWot + OCt)]


= 2 I2

where both viX,2 + Y t and OCt) are slowly varying.


2 Thus, vi X t 2 + Y t 2
has the interpretation of the envelope and OCt) the phase of a sinusoid
being slowly modulated in both its amplitude and phase. Thus, we see
that the Hilbert transform plays a rather important role in bandpass
processes. It permits the envelope of such a process to be expressed simply.

8. WHITE NOISE AND WHITE-NOISE INTEGRALS


A white noise is usually described as a wide-sense stationary process with
a spectral-density function given by
Sell) = So for all 1I (8.1)

Since for this case r"'""


S(lI) dll = R(O) 00, a white noise is not really

a second-order process at all, and the spectral-density function is not


well defined. However, if we proceed heuristically, (8.1) suggests that
R(T) must be given by
ReT) = EX/+TXt = O(T)SO
where OCT) is the Dirac 0 function. The reason for this is that

S(lI) = 1-"'"" e i27r>TR(T) dT


110 SECOND-ORDER PROCESSES

would then give (8.1). These considerations are purely formal and require
elaboration and substantiation. At the outset, we should distinguish
between the problem of handling white noise in a mathematically con-
sistent way and the problem of interpreting white noise as an abstraction
of physical phenomena. As far as the calculus of white noise is concerned,
the problem is not difficult, at least for linear problems. Nonlinear
problems involving Gaussian white noise are substantially more complex
and will be dealt with in a later chapter. The principal tool that we shall
use in establishing a self-consistent calculus for white noise is the second-
order stochastic integral that we introduced in Sec. 6 in connection with
spectral representation. There remains the problem of interpretation.
Since R(O) = 00 implies an infinite average power, a white noise
cannot be a physical process. If a white noise is not a physical process,
and if it leads to mathematical complications, then why is it used at all?
First, even though the calculus of white noise requires justification,
once justified it leads to a tremendous analytical simplification in many
problems. Secondly, many processes that one encounters in practice are
well approximated by white noise, but this statement requires amplifi-
cation. Because a white noise is not a second-order process (indeed, it
is not a stochastic process at all!), no sequence of processes {Xn(t), t E
(- oo,oo)} which is q.m. convergent for each t can converge to a white
noise. The way out of this difficulty is to recall that just as a 0 function is
never used outside of an integral, the same is true with white noise.

Definition. A sequence of q.m. continuous processes {X/nl, t E (- oo,oo)}


is said to "converge to a white noise" if
(a) For each f E L 2 , {Xn(f)} is a q.m. convergent sequence, where
Xn(f) is defined by

Xn(f) = f-"'", f(t)Xtin) dt (8.2)

(b) There exists a positive constant So such that

lim EXn(f)Xn(y) = So f-"'", f(t)g(t) dt (8.3)


n .... '"

This definition helps to make clear the idea that a process {XI, - 00 <
t < oo} is approximately a white noise. What we really mean is that for
all function f that we are concerned with, the quantity EX(f)X(f) is
very nearly equal to So f_"'",
If(t) 12 dt.
Suppose {Xt(n>, t E (- oo,oo)} is a sequence of processes converging
to a white noise. By definition, for every f E L 2, there exists a second-
8. WHITE NOISE AND WHITE-NOISE INTEGRALS 111

order random variable X (f) such that

EX (f) X (f) = So f-"'", IJ(t) 12 dt (S.4)

It is common practice to write X(f) as

XU) = /-,""" J(l)X t dt (S.5)

where X t is a white noise. We shall do so on many occasions. It should not


be forgotten, however, that the right-hand side of (8.5) is nothing more
than a symbolic way of writing X (f), and there exists no stochastic
process X t for which the right-hand side of (8.5) is an integral. Although
(8.5) is merely formal, X (f) does admit a representation as a second-order
stochastic integral as is indicated by the following proposition.

Proposition 8.1. Let {X / n), - 00 < t < oo} be a sequence of q. m. con-


tinuous converging to a white noise. Let

XU) = lim in q.m. f-"""" J(t)X/n) dt (8.6)


11---+""

Then

X(f) = f-"""" J(t) dZ t (8.7)

where {Zt - 00 <t< oo} is a process with orthogonal increments


defined by

Zt = lim in q.m. (t X.(n) ds (S.8)


n-'oo Jo
ProoJ; Define {ZI, - 00 < t < oo} by (8.S). If we denote the indicator
function of [a,b) by lab, then we can write
(S.9)

From (8.3) we have

E(Zb - Za)(Zd - Zc) = So f-"""" lab(t)lcd(t) dt (8.10)

Hence, {Zt, - 00 <t< oo} has orthogonal increments and

E dZ t dZ. = So ~t. dt (8.11)

Define the second-order stochastic integral f-"""" J(t) dZ t as in Sec. 6 for


J E L 2• If J is a step function, it is obvious from (8.9) that
XC!) = f-"'"" J(t) dZ t
112 SECOND-ORDER PROCESSES

If f E L2 is not a step, there exists a sequence of step functions converging


to f in L2 distance. Therefore

XCf) = /-,"", fn(t) dZ t + XC! - fn) --;;:::: f-"'", f(t) dZ + X(O)


t

= f-"'", f(t) dZ t

which proves the theorem. I


Manipulations of white-noise integrals 1_"'", Xd(t) dt are justified by
replacing it by the stochastic integral 1_"'", f(t) dZ t • In the future, when
we speak of being given a white noise {X" - 00 < t < 00 I, we shall take
it to mean that we are given a process with orthogonal increments {Z/,
- 00 < t < 00 1 which satisfies E dZ t dZ. = So 5t• dt. From now on we

shall also omit the constant So which is irrelevant to most considerations.


A most interesting property of a white noise is that its Fourier transform
is again a white noise. Of course, Fourier transform has to be defined
properly here.

Proposition 8.2. Let tXt, - 00 < t < 00 1 be a white noise. Then there
exists a second white noise {X p, - 00 < II < 00 1 such that

1_"'", k(t)Xt dt = f-"'", h(II)Xp dll (8.12)

for all h, k E L 2•
Remark: We repeat once again that the integrals in (8.12) are merely
symbolic representations of f-"'",
k(t) dZ t and 1_"'", h(lI) dip.

Proof: Define {i p, - 00 < II < 00 1 as follows:


(a) io = 0
(b) Zb - iO- = 1_"'", Iab(l) dZ t

Then,
E(Zb - Za)(id - ic) = 1_"'", Iob(t)fcd(t) dt
= 1_"'", I (II)lcd(lI) dll
ab

Hence, E dZp dill = 5PII dll, and lip, - 00 < II < 00 1has the same second-
order properties as {Z/, - 00 < t < 00 I. For an arbitrary h E L 2 , by
approximating h by step functions in the familiar way, we find

f-"'", k(t) dZ/ = 1_"'", h(lI) dZp (8.13)

proving the theorem. I


8. WHITE NOISE AND WHITE-NOISE INTEGRALS 113

The spectral process of a white noise can now be used to define


linear time-invariant filtering operations on white noise. Let 1/ E L 2 ,
and let tXt, - 00 < t < oo} be a white noise. Define

Yt = !_"'", 1/(v)e i2 .-vIX. dv (S.14)

that is,

Yt = !_"'", 1/(v)e i2r • 1 d2.

Then, Y I is a q.m. continuous and wide-sense stationary process with a


spectral-density function given by
(S.15)
Equation (S.15) justifies the interpretation that a white noise has a
spectral-density function which is equal to a constant for all frequencies.
This is because if we filter a white noise {XI' - 00 < t < 00 I with an
ideal filter having a transfer function
a~v~b
otherwise
then the total average output power is given by

EIYd 2 = E lib ei2 .. vIXv dv /2

= i b
dv = b - a

Equation (S.15) also has another interpretation. It suggests that every


process with a spectral density S(II) can be represented as white noise
filtered by a time-invariant linear filter whose transfer function h satisfies
!h(lI)j2 = S(II). This interpretation plays an important role in Wiener
filtering problems. Closely related to linear time-invariant filtering is the
idea of a differential equation driven by a white noise. Consider the
following example. Suppose XI. is a white noise and Y t satisfies
"V, = -Y, + XI. (8.16)
A good guess is that a solution of this equation is given by

Y =
I
f '"
- '" 1
1
+ i27r1l e i2 .. vl X dv
A

p
(S.17)

Before we can ascertain this, we need to give a precise interpretation to


an equation such as (8.16).
Even if Xl is not a white noise, but a q.m. continuous process, an
equation like (S.16) still cannot be interpreted as an ordinary differential
equation involving sample functions of the two processes without assump-
114 SECOND-ORDER PROCESSES

tions such as separability, sample differentiability, etc. However, these


assumptions would not be necessary if we interpret (8.16) as an integral
equation. More generally, let {X" - 00 < t < oo} be a white noise with
{ZI, - 00 < t < oo} as the corresponding process with orthogonal incre-
ment.s. A second-order process {Y " t E [a,bJ} is said to be the solution of
the differential equation
Yt = aCt) Y/ + (3(t)X , (8.18)
if it. satisfies
Yt = Y" + fat a(s) Y, ds + fat (3(s) dZ. (8.19)

Proposition 8.3. Let a and (3 E L2(a,b). Then. (8.19) has one and only
one solut.ion with the same initial condition Y a , provided that
EIY a l2 < 00.

Remark: The proof will be omitted, because it is identical to the corre-


sponding proof for stochastic differential equations of the Ito type.
At this point we merely point out the following facts concerning
the solution of (8.19).
(a) If we set Y/O) = Y a , a S t S b, and

Y/n+l) = fat a(s) Y/n) ds + fat (3(s) dZ.


then I Yt(n), a S t S b, n = 0, 1, . . . J converges m q.m. to the
unique solution of (8.19).
(b) The solution of (8.19) is q.m. continuous on [a,bl.
If {Xt(nl , - 00 < t < 00, n = 1, 2, . . . } is a sequence of q.m.
continuous processes, then for each n, we can consider the differential
equation
(8.20)
where Y,(n) is the q.m. derivative of Y/n). Equation (8.20) is equivalent
to the integral equation
Yt(n) = Ya(n) + I: a(s) Y/n) ds + fat (3(s)X.(n) ds
where the integrals are q.m. integral!'. Suppose Y,,(n) ~
n __ oo Y a, and {X/nl}

converges t.o a whit.e noise Xt. Then we can show that (Y/n), a S t S b}
converges to a q. m. continuous process {Y t , a S t S b} for any b such
that a, (3 E L2(a,b). Further, {Y t , a S t S b} satisfies the equation

Yt = Y" + fat a(s)Y. ds + fat {3(s)X. ds a S t ::; b


8. WHITE NOISE AND WHITE-NOISE INTEGRALS 115

These considerations justify the use of (8.19), even when the driving
force is "not quite white." Finally, we note that for our earlier example,
we can easily show that

Y
t
= f _
00
00 1 +1i27r1l e i2 .. vt dZ
v
(8.21)

if) the unique solution to the equation

t ~ a (8.22)

corresponding to the initial condition

Y
a
= f _
oo

00 1
1
+ i27r1l ei2.. va
-
dZ
v
(8.23)

For an arbitrary initial condition Y a , the general solution of (8.22) is of


the form

(8.24)

Hence, if EI Yal 2 stays finite as a ~ - 00, then

Y, a:·:·oo) f~oo e-(H) dZ.

By Proposition 8.2,

j~-
00 e-(t-.) dZ. = f -
00
00 27rill
1
+1e i2 .-vt dZ
v

Therefore, under rather general conditions, every solution of (8.22)


approaches (8.21) as the initial time a ~ - 00.
It is easy to generalize the preceding considerations to vector dif-
ferential equations
Y I = A(t)Y, + B(t)X,
where A and B are now matrices and Y, and XI are column vectors of
possibly different dimensions whose components are, respectively, second-
order processes and orthogonal white-noise processes. Proposition 8.3
is easily generalized to this case. The required conditions on A and B
are now

Llab 12 dt < IA,;(t) 00

Llab !Bi;(t)12 dt <


i,;

00
i,i
116 SECOND-ORDER PROCESSES

9. LINEAR PREDICTION AND FILTERING


Linear prediction and filtering are special cases of a general estimation
problem that can be formulated as follows: Let tXt, t E Tl be a second-
order process and let Y be a second-order random variable. Let Xx
denote Hilbert space generated by IX t , t E Tl. A random variable Y
is said to be a linear least-squares estimator of Y given tXt, t E Tl
if (1) Y E Xx, and (2)

ElY - YI2 = min EIZ - YI2 (9.1)


ZEXx

Filtering and prediction problems are estimation problems involving two


second-order processes {Xt! and {Yd, where for each t we want to find
the linear least-squares estimator of Y t given IX., 8 ~ tj. A pure predic-
tion problem is one where Y t = X t+ a for some positive Ct. In practice,
the parameter space for the two processes is usually the same, the most
common cases being [0,00) or (- 00,00). The fact that an estimator is
required for each t makes it especially important that the implementation
of the estimator be simple. Both the Wiener filter and Kalman-Bucy
filter, in different circumstances, achieve this goal extremely well. This
fact is responsible for the importance and the widespread use of these
filters. Before discussing these problems in detail, we first prove the
following important characterization of linear least-squares estimators.

Proposition 9.1.Let Y E Xx. Then, Y is a linear least-squares estimator


of Y given {Xi, t E Tl if and only if
E(Y - Y)Xt =0 for every t E T (9.2)
and equivalently,
E(Y - Y)Z = 0 for every Z E Xx

Proof: For any Z E Xx, Y- Z is again in Xx. Hence, if (9.2) holds,


then
EIZ - YI2 = EIZ _ Y + Y _ YI2
= EIZ - YI2 + ElY _ YI2 (9.3)
Therefore, (9.1) is satisfied.
Conversely, suppose Y satisfies (9.1). Set

Z = Y _ E(Y - Y)X t X
EIX t l2 t

Then, Z E Xx, and from (9.1), we have


EIZ - YI2 - ElY - YI2 2: 0
9. LINEAR PREDICTION AND FILTERING 117

However, by direct computation, we find

EIZ - YI2 = ElY - Yl2 - IE(YEIXt~;Xd2 ~ ElY _ YI2

Therefore, (9.2) follows.

Remark: Equation (9.3) proves the uniqueness of linear least-squares


estimators. If Z and Yare both linear least-squares estimators of
Y given tXt, t E T}, then EIZ - YI2 = ElY - Y12, so (9.3) implies
EIZ - YI2 = o. Hence, we shall refer to the linear least-squares
estimator of Y given {XI, t E T}. Furthermore, from Proposition
9.1 we see that the linear least-squares estimator of Y given {Xt,
t E T} is the projection of Y into the smallest Hilbert space Xx
spanned by {Xt, t E T}. We shall denote this projection by a more
explicit notation

The Wiener theory of filtering and prediction is distinguished by


two facts. (1) It deals with wide-sense stationary processes with well-
defined spectral-density functions, and (2) the given information on
which the estimator is to be constructed is the infinite past. These condi-
tions will be made more precise later. Under these conditions, it is often
possible to express the estimator as the output of a linear time-invariant
filtering operation on the observed process. This is an extremely con-
venient form for the solution from the point of view of implementation.
Let {Xt, - 00 < t < oo} and {Yt, - 00 < t < oo} be wide-sense
stationary processes. We assume that there exist bounded functions
Sx, S1/' and SX1/ such that
EXt+TX, = f-'X>", S.r(v)e i21rV7 d~ (9.4a)

(9.4b)

(g.4c)

The spectral-density functions S" and Sy are necessarily real and positive.
The cross-spectral density SX1/ need not be real. Under these assumptions
XI, Y I are not only individually wide-sense stationary processes, but are
in fact jointly wide-sense stationary in the sense that for arbitrary
complex constants a and b, Zt = aX t +
bY t is again a wide-sense sta-
tionary process. For such a process ZI, we have

EZt+TZ t = 1-«>", [la/ 2SxCv) + /b/ 2S II (v) + abSxy(p)


+ abS"1I(v)]ei2 .. VT dv (9.5)
118 SECOND-ORDER PROCESSES

In other words, the spectral-density function of {Z/, - 00 < t < co}


is given by
(9.6)
It is obvious that S. must be real and nonnegative for arbitrary a and b.
A necessary and sufficient condition for this is that for every l',
(9.7)
A proof of this fact is easily constructed by noting the fact that (9.6) is a
quadratic form involving the Hermitian matrix

S(l') = [~x(l') (9.8)


Sxy(l')

Hence, for SZ(l') to be nonnegative for every I! and a and b, it is necessary


and sufficient that S(I!) be nonnegative definite for every l'. This condi-
tion is equivalent to (9.7).
For such a pair of processes lXI, Y I , - 00 < t < oo}, we shall
consider the problem of finding the linear least-squares estimator of
Y I given {X., - 00 < 8 < t}. If we denote by Xx' the Hilbert space
spanned by IX., - 00 ~ 8 < t}, the problem is to find the projection
J?(YIIXXI). We recall that J?(YdXxl) is characterized by two properties:
(1) J?(YdXxl) belongs to XXi and (2) Y I - E(YdXxl) is orthogonal to
XXi. A solution to this problem in the form of a filtering operation will
be constructed using the ideas of white noise and white-noise integrals
that we introduced in the last section. First, we state the following
fundamental result of Paley and Wiener [1934, chap. 1].

Proposition 9.2. Let S E L1 be a real nonnegative function such that

f oo
-00
lIn S(l') I dl'
1 + l'2
< 00
(9.9)

Then there exists a function k C L2 such that


Ik(l')12 = S(l') (9.1Oa)
k(t) = 1_0000 ei27rPt k(l') dl' = 0 t < 0 (9.1Ob)
k(u + iv) = 1_0000 ei21r (u,+iv)tk(t) dt ~ 0 v>O (9.10c)

Remark: Condition (9.1Ob) shows that k is the transfer function of a


nonanticipative filter (physically realizable filter). Condition (9.10c)
is known as the minimum-phase condition in circuit theory. Roughly
speaking, it means that 11k is again non anticipative. However, 11k
9. LINEAR PREDICTION AND FILTERING 119

has no inverse Fourier transform, so the nonanticipative property


is a little more difficult to define.
A complete proof is somewhat complicated and will be omitted.
Instead, we shall outline a procedure by which the desired h can be found.
First, consider the transformation

v = - tan-
o (9.11)
2

Then, (9.9) becomes

~ f~~ lin 8 ( - tan~) IdO < 00

so that In 8( - tan 0/2) has a Fourier series

In 8 ( - tan D =
n
""
2:
= - co
an ein8 (9.12)

where an are given by

an = -1
271"
f~.
e- zn8 In 8
-~
(- tall -
2
0) dO (9.13)

With the correspondence (9.11), we have

.8 1 - ill
e' =- -- (9.14)
I + ill
so that (9.12) and (9.13) become

a (~)n
n= - 00
n 1 + ill (9.15)

and
an = ~ f"
71" - "" 1
~1_ (.!~-,,)n In 8(11) dv
+ 112 1 - ~11
(9.16)

If Ihl2 = 8, then

In 8(v) = In h + In h (9.17)
We now identify

h(p) = exp -
A

2
[ao +
n=1
-
(11+11J
2:"" an - - ill)n]
. (9.18)

Because a_ n = an, (9.17) is satisfied, hence also (9. lOa) . Conditions (9. lOb )
120 SECOND-ORDER PROCESSES

and (9.lOc) follow from the fact that

. ao ~ [l-i(U+iV)]"
feu 2" + /=1 an 1 + i(u + iv)
+ 'tV) =
(9.19)

is analytic for v < o. Hence, k(u + iv) is analytic for v < 0, and (9.lOb)
follows (more or less) from contour integration closing the contour in
the lower half plane. Condition (9.lOc) follows from the fact that if
fez) is analytic, then elfz ) has no zero. We call the above procedure the
spectral-factorization procedure, since S is usually identified with a
spectral-density function.
Let {Xl, - 00 < t < 00 I have a spectral-density function S"" and
let k be obtained by factoring Sx so that (9.10) is satisfied with Sx replacing
Sin (9. lOa) . Then, in view of our discussion on (8.15), {X t , - 00 < t < 00 I
can be regarded as the output of filtering a white noise Itt, - 00 <
t < 00 I with a nonanticipative filter having transfer function k. Condi-
tion (9.lOc) means that the white noise It"~ - 00 < t < 00 I can in turn
be obtained by filtering tXt, - 00 < t < 00 I by 11k, which is also
nonanticipative. A more precise statement of these results can be made
in terms of the process with orthogonal increments {Zt, - 00 < t < 00 I
corresponding to ttl, - 00 < t < 00 I. CO'1dition (9.lOb) then implies that
there exists a process {Z/, - 00 < t < 00 I withZ o = 0 andE dZ t dZ. = Ot. dt
such that

X t = f-"'", h(t - s) dZ.


= f-"'", ei2Htk(~') dZ. (9.20)

Condition (9.lOc) implies that for each t


Z, E xxt (9.21)
We can express Z I more explicitly in terms of the spectral process
{X., - 00 < v < 00 I as
Zt = f-"'h(lI)
'" .--!- (rJo ei2TrV' dS) dX.
t (9.22)

which points out even more clearly that tt (=Zt) is obtained by filtering
X t by 11k.
We shall now give the main result of the Wiener theory of filtering
as follows [Wiener, 1949].

Proposition 9.3. Let tXt, Yt, - 00 < t < 00 I be a pair of wide-sense sta-
tionary processes satisfying (9.4). Let k be obtained by factoring Sx
so that (9, lOa) to (9, lOe) are satisfied. Let I Z" - 00 < t < 00 I be
9. LINEAR PREDICTION AND FILTERING 121

as in (9.20) to (9.22). Then,

1!i(YtIJCxt) = f", get - 8) dZ. (9.23)

where g is given by

get)

,,= f " e,2,,,(
. -,--
-00
SXy(v) dv
h(v)
-oo<t<oo (9.24)

Remark: Because !SXy/k!2 = IS",!.12/S", is bounded by Sy (from (9.7», the


integral in (9.24) is well defined as a limit in L2 mean.

Proof: To prove (9.23), we need to verify the two conditions:

f~", get - 8) dZ s E JCx t (9.25a)

EY,X T = E [f~", get - 8) dZs] X T T ~t (9.25b)

To verify (9.25a), we note that f~", get - 8) dZ. is in JCz t , where JCzt
denotes the Hilbert space spanned by {Z 8, - 00 < 8 ~ t J. Since for each
t, Z, E JCx t , so JCzt C JCx t , verifying (9.25a). To verify (9.25b), we note
that

f-"'", get - 8) dZ. = f~ '" get - s) dZ. + 1'" get - s) dZ.


= f-"'"" [S",!, (v)/k(v)]e i2nt dZ v (9.26)

Hence, from (9.20) we have

E [f-"'", get - 8) dZ.] XT = f-"'", SXy(v)e i2 11'Vll-T) dv


= EY,X T (9.27)
On the other hand, since

X T = f~", h(T - 8) dZ.

and E dZ t dZ. = Ot8 dt, we have

E [1'" get - 8) dZ s ] X T = 0 whenever T ~ t

Therefore.

E [f~", get - 8) dZ,] X T = E [f-"'", get - 8) dZ s ] X T

= EYtX"T T ~ t
which verifie:-; (9.2.r5b) and completes the proof. I
122 SECON~RDER PROCESSES

The solution (9.23) can be put in a more useful form by using (9.22).
If we define

'9(v) = 10"" g(t)e- i2nl dt (9.28)

then we can write

(9.29)

which is the output to a time-invariant linear filter with tXt, - 00 <


t < 00 I as input and '9/1" as its transfer function. The filter is called the
Wiener filter. We should note that '9 is not the Fourier integral of g, since
the lower limit in the integral in (9.28) is 0 and not - 00, and g(t) in
general is not zero for t < O.
For a general S"" the spectral-factorization procedure that we out-
lined earlier (9.16 and 9.18) may not lead to a closed-form solution for 1".
It is also somewhat complicated. The factorization problem becomes
trivial if S., is a rational function. If Sz is rational, then we can always
write

n
m

(v - Zk)(V - Zk)
S,,( v) = K2 ::...k;_I=--_ _ _ _ __ (9.30)

k=l
n (v - Pk)(V - ilk)

where every Zk and Pk have positive imaginary parts. Since /1,,/2 = S" and
1,,(u + iv) has neither poles nor zeros for v < 0, h must be of the form

n
m
(v - Zk)
1,,( v) = A =-k:......:1'----_ __ (9.31)
n
k=1
(v - Pk)

where /A/2 = K2. The consta~lt A can be determined from (9.18),


m

n (-i - Zk)
k=1
t
n( -i) = ea.!2 =A "--n-=------

n (-i -
k=1
Pk)
9. LINEAR PREDICTION AND FILTERING 123

and ao is given by

ao 1 f"" - -1- I n S(IJ) dIJ


=-
7r .- "" 1 + IJ2

However, we note that any choice of A would yield an h satisfying (9.10).


Equation (9.18) merely gives a specific one. When {XI, - co < t < co}
is real, then Sx(IJ) is an even fUllction of IJ, and the constant A in (9.31)
can always be chosen so that the impulse response

is real valued. This is a convenient choice. However, as far as the Wiener


filtering problem is concerned, the choice of A is unimportant. It is fairly
obvious that the solution (9.23) or (9.29) is independent of this choice.
For our first example, consider the following pure prediction
problem: Sx(IJ) = [1 +
(27rIJ) 2]-1 and Y, = X 11a , a > O. The factorization
is trivial and we can take
_ 1
h(IJ) -
1 + 1-27rIJ
From (9.24) get) is found to be

get) = f"
-""
e1,' 1I"'pt
2e'1.21rlla
1
1
+ i27r/J
d 'II

e-(t+a) t> -a
{O
t < -a

From (9.28) we find

1
= e- a - - - -
1 + i27rIJ
so that
E(Yt/JCx t ) = e- a f-"""" e i27rPt elX. = e-aX t

For this simple example, the predictor is nothing more than an attenuator.
As a second example, suppose

S(IJ) = [1 + ;2.,), 7 + ;2.,),


4

1 + (27rIJ)2
124 SECOND-ORDER PROCESSES

Factorization of Sz(lI) = [1/1 + (211"11)2] + [1/4 + (211"11)2] yields


. V5 + V2i211"11
h(lI) = (1 + i211"11)(2 + i211"1I)
This gives
2 + i211"11

Therefore, (9.24) can be computed to yield


fOO 2 - i211"11
get) = ei2 -n-vt (1 + i21rl1)[V5 _ V2 (i211"1I)) dll

f (3
-00

OO 1
= v'5 + V2 1 + i211"11
-00

+ 2 + V.5/2 1 ) ~h~
- dV
1 + ,/5/2 V5 - V2 i21rl1 ~
:3
----:=-----= e- t t> 0
V5 + v'2
2 - V5/2 exp ( I~ t) t<O
V5 + V / 2 "V 2
and (9.28) gives
3 1
1(11) = V5 + V2 1 + i211"11
Finally, from (9.28) we get

= foo . :3 2 + i211"11 ei2.-vt dX v


- '" V.5 + V 2 v'!5 + V2 i21rl1
= 'v
.~ [X, + (2 - v-/-
5/2)
ft exp
2+ 10 -00

[- V5/2 (t - s)]X. dS] (9.31a)

It is quite obvious that our formulation of the filtering problem can


be extended to cover cases where both the process to be estimated and
the observation process are vector valued. The general case involves a
pair of vector-valued processes lXI, YI , t E TJ, not necessarily of the same
dimension, whose second-order properties are completely known. The
problem is then to estimate Y i given IX., sSt}. In wide-sense stationary
9. LINEAR PREDICTION AND FILTERING 125

cases where spectral-density functions exist, one would expect that the
Wiener theory can again be developed. If we denote by A+ the Hermitian
adjoint of a matrix A, we can define the matrix spectral-density function
Sx by

EXt+TXT+ = f-"'", Sx(v)e i2 ...t dv

The key step in obtaining a solution to the Wiener filtering problem is


again a spectral-factorization problem. But now, ,eve have to obtain a
factorization of the matrix-valued function Sx(v) into the form

Sx(v) = h(v)h+(v)

such that the matrix h(v) satisfies conditions similar to (9. lOb) and
(9.lOc). Except that instead of (9.lOc), the determinant of h(u +
iv) is
to have no zero for v > O. The final solution can be expressed in a form
which generalizes (9.29) as

where the matrix y(v) is similarly defined, as in the scalar case. The matrix
spectml-factorization problem is considerably more difficult than the
scalar problem. For the rational case, a number of finite algorithms to
achieve the factorization have been derived [Wiener and Masani, 1958;
Wong and Thomas, 1961; Youla, 1961].
In some areas of application, the Wiener formulation of the filtering
problem is not appropriate because of some of its inherent assumptions.
Among these are the following: (1) wide-sense stationarity and existence
of spectral densities; (2) the second-order properties of the processes
lXI, Y(, - co < t < co} are known, and no other information is known;
(3) the estimator is to be based on the infinite past of the observation
process. These limitations are removed in the formulation of the filtering
problem due to Kalman and Bucy. Im:tead, they made other assumptions
which are more natural in a great variety of applications. The form of the
solution is also different. While in the Wiener theory, the final solution
is in the form of a time-invariant linear and nonanticipative filter, the
Kalman-Bucy theory yields a differential equation which is satisfied by
the estimator. Implementation of the "filter" in feedback form is thus
immediate [Kalman and Bucy, 1961].
The Kalman-Bucy filter problem is usually stated in vector form
as follows: Let {XI, Y l , t 2: to} be a pair of vector-valued second-order
processes. The X process will be the observation process, and the Y
process is to be estimated. While this notational convention is consistent
with our earlier discussion, it is not universal. Often in the literature, the
126 SECOND-ORDER PROCESSES

two letters X and Yare used in just the opposite way. The basic assump-
tions are the following: Throughout, boldface will be used to denote
vectors and matrices, prime denotes transpose, +
denotes Hermitian
adjoint, and I denotes identity matrix.
1. The process to be estimated satisfies

Vt = F(t)Yt + A(t){t t> to (9.32)

where {t has components that are orthogonal white-noise processes


with

(9.33)
2. The observation process satisfies
Xt = H(t)Yt + B(t){t t > to (9.34)

Remarks:
(a) Both (9.32) and (9.34) are to be interpreted along the lines
discussed in Sec. 8. We shall denote the process with orthogonal
components which correspond to {t by Zt, so that formally Zt = {t.
If to > - 00, it is convenient to set Zro = o.
(b) We note that (9.34) is not really a differential equation, since
X t can be immediately expressed explicitly in terms of the Y and {
process by integrating (9.34). However, the problem would be no
more general if we replace (9.34) by a linear differential equation in
Xt. Such an equation can always be changed into (9.34) by redefining
the observation process.
(c) It is necessary to assume that the initial values Xlo and Y to are
random variables orthogonal to {Zr, t ;::: to}, in particular they can
simply be constants.
As usual, let xxt denote the smallest Hilbert space generated by
tXT' to ::; T ::; t l, and let E denote projection. The Kalman-Bucy filtering
problem is to find E(Y.IXxt), and the main results can be summarized
as follows.

Proposition 9.4. Let tXt, Y t , t ;::: to} satisfy (9.32) and (9.34). Let ~(slt) be
the unique solution of
d
ds ~(slt) = F(s)~(slt) s > t (9.35)

with initial condition ~(tlt) = 1. Let A(t) and B(t) III (9.32) and
(9.33) be continuous functions on [to, 00).
9. LINEAR PREDICTION AND FILTERING 127

(a) For 8 :2: t,


.2(YsIJex t) = ~(8It)2(YtIJext) (9.36)
(b) Let Vt denote .2(YtIJex t), then

V t = F(t)V t + K(t)[X t - H(t)V t] (9.37)


(c) Let £t = Y t - Vt be the error vector, and let l:(t) denote the
covariance matrix
l:(t) = E£t£t+ (9.38)
Then,
:t(t) = [A(t) - K(t)B(t)][A(t) - K(t)B(t)]+
+
[F(t) - K(t)H(t)]l:(t) +
l:(t)[F(t) - K(t)H(t)]+ (9.39)
and
K(t)B(t)B+(t) = A(t)B+(t) + l:(t)H+(t) (9.40)

Remarks:
(a) A complete proof is rather complicated, and will not be pre-
sented. Instead, we shall give a heuristic derivation.
(b) The continuity conditions on A(t) and B(t) are sufficient, but
not necessary. However, some smoothness condition is needed.
Unfortunately, this point is largely lost in our formal derivation.
(c) Equation (9.39) can be simplified somewhat by using (9.40). If
B(t)B+(t) is invertible, these two equations can be combined to give
a single equation in l:(t), which is a nonlinear differential equation
of the Riccati type.
(d) Once K is determined from (9.39) and (9.40), implementation
of (9.37) in feedback form is immediate and yields a continuous
estimate. Feedback implementation of (9.37) is often referred to
as the Kalman-Buey filter.
First, we derive (9.36) as follows: From (9.32) we can write for
8 :2: t,
Y. = ~(8It)Yt +~ 8 ~(81t)A(T) dZ T

Therefore,

J?(Y.IJext) = ~(8It)Vt + ~. ~(8IT)A(T) .2(dZ IJCx


T
t)

Now, let Jet denote the smallest Hilbert space containing X t ., Y t ., and
lZ1, T ::; tl. Because of (9.32) and (9.34), Jex t is contained in Jet. Because
128 SECOND-ORDER PROCESSES

Zt. is a process with orthogonal increments, and X/., Y1) are orthogonal to
{Zt,t ~ to},

Hence,
2(dZ.lxx t) = 2[E(dZ.!x t)lxxt ] = 0 T ~ t

It follows that

E(Y8!Xxt) = <I>(sIOY t
which was to be derived.
To derive (9.37), we first note that every element in xxt can be
written in the form of

Since Yt E xxt for each t, we can write

(9.41)

Thus,

dYt (t~.
= K(tlt) dXt + dt [a(t)Xt o + ito at K(tIT) dX.]
(9.42)

Since the bracketed terms in (9.42) are in Xxt, we can rewrite (9.42) as
dY t = K(tlt) dXt + E[dY t - K(tlt) dXtlxxt] (9.43)
Now, from (9.34)
2(dXtlxxt) = H(t)Yt dt (9.44)

From (9.32), we have

2(dYtIXx t ) = E(E(Y'+dt/XX t+dt ) - E(Yt/JCxt)IJCx t)


= E(dYtlxxt) = F(t)Yt dt (9.45)
Using (9.44) and (9.45) in (9.42) yields

dY t = F(t)YI dt + K(tlt)[dX t - H(t)Y t dt]

which is just (9.37) if we set K(tlt) = K(t).


To derive (9.39), we combine (9.32), (9.34) and (9.37) and obtain
dEl = [F(t) - K(t)H(t)]E/ + [A(t) - K(t)B(t)] dZ/ (9.46)
9. LINEAR PREDICTION AND FILTERING 129

Now,
d~(t) = ~(t + dt) - ~(t)
= E£t+dt£t+dt - E£t£t+
= E(£t+dt - £t)(£t+dt - £t)+ +
E(£t+dt - £t)£t+ + E£t(£t+dt - £t)+
=E d£t d£t+ + E d£t£t+ + E£t d£t+
Using (9.46) and the fact that dZ t is orthogonal to Jet, we find
d~(t) = [A(t) - K(t)B(t)][A(t) - K(t)B(t)]+ dt
+
[F(t) - K(t)H(t)]~(t) dt +
~(t)[F(t) - K(t)H(t)]+ dt

which is just (9.39).


Finally, to derive (9.40), we begin by writing £t as the solution of the
differential equation (9.46) in the form

(9.47)
where tl! satisfies

itd tl!(t!to) = [F(t) - K(t)H (t) ]tl!(t!to) t > to (9.48)


tl.;(t!t o) = I
Since £t is orthogonal to Jex t , we have for s ~ t,
E£tX s+ = 0 = E£t
" [rs (-']+
ito H(T)Y, dT + ito B(T) dZT (9.49)

Furthermore, for T ~ t, Vt E Jex t , hence


E£tYT+ = EE,(Y, - V,)+ = E£t[T+ (9.50)
Therefore, (9.49) becomeH

o= t: (E£t£,+)H+(T) dT +f tl!(t!T) [A(T)


- K(T)B(T)]B+(T) dT s ~ t (9.51)
whence for s < t,
(E£t£s+)H+(s) + tl!(t!s)A(s)B+(s) = tl!(t!s)K(s)B(s)B+(s) (9.52)
Letting sit and noting the continuity assumptions, \ve get
K(t)B(t)B+(t) = ~(t)H+(t) + A(t)B+(t)
which is (9.40). If we use (9.40) in (9.39), it is not hard to show that
(9.39) can be rewritten as
'i:.(t) = A(t)A+(t) - K(t)B(t)B+(t)K+(t)
+ F(t)l:(t) + ~(t)F+(t) (9.53)
which is a useful alternative form to (9.39). I
130 SECOND-ORDER PROCESSES

For an example which illustrates these procedures and illustrates the


difference between the Wiener filter and the Kalman-Bucy filter, con-
sider the following problem:
+
We want to estimate Y t using data I Y T NT, to < T ::; t} where
{Y t, Nt, - 00 < t < oo} are two uncorrelated wide-sense stationary
processes with spectral densities

(9.54)
and
(9.55)

In order to use the Kalman-Bucy procedure, we first have to convert the


problem into a standard form. From our earlier discussions concerning
spectral factorization, we know that Y t and Nt can be represented as

(9.56)
and
N
t
= -1
2
It
-00
e- 2 (t-T) dV
T
(9.57)

where ZT) V are mutually uncorrelated processes with orthogonal incre-


T

ments. This means that Y t and Nt satisfy


dYt = - Y t dt + dZt (9.58)
dNt = -2Nt dt + dVt (9.59)
Now, let
X t = e2t (Yt + Nt) (9.60)
Then, X t satisfies
dX t = e 2t (dYt + 2Y dt + dNt + 2N
t t dt)
= e 2t Y t dt + e 2t dZ t + e 2t dV t (9.61)
We can now identify the quantities F, H, A, B of (9.32) and (9.34) as
follows:
F(t) = -1
H(t) = e 2t
(9.62)
A(t) = [1 0]
B(t) = [e 2t e2t ]
Equation (9.40) becomes
2e 4t K(t) = e 2t + e21~(t) (9.63)
9. LINEAR PREDICTION AND FILTERING 131

and (9.53) becomes

:t(t) = 1 - 2e 4t K2(t) - 2};(t)


= i - 3};(t) - i};2(t) (9.64)

Equation (9.64) can be transformed into a linear equation by the sub-


stitution };(t) = 20-(t)/u(t), giving us

ii(t) + 30-(t) - iu(t) = 0 (9.65)

Solving (9.65) with initial condition 20-(to)/u(t o) = };(t o), we find


20-(t)
};(t)
= u(t)
_'" ){I - [3/VlO.0 -
- ... Uu
l/VlO ~(to)l tanh (VlO/2)(t - to)}
1 + [3/VlO + ~(to)/VWl tanh (VlO/2)(t - to)
(9.66)

The initial value };(t o) can be evaluated as follows: First, we note that the
linear least-squares estimator of Yto given Xto has the form aX to ' where a is
determined by
E(Yto - aXt.)X to = 0 (9.67)

This yields

(9.68)

Finally,

~(to) = EI Y'o
- aXt o l2 = EliYfo - iN to l2
= HEIYto !2 + EIN'oI2) = i (9.69)

This completes the solution for };(t), and via (9.63), also completes the
solution for K(t), and hence the Kalman-Bucy filter.
If we let to - - <Xl in (9.66), we get

~(t) - - - )
to~ - 00
(V 10 - 3) (9.70)

which yields K(t) = (V5/2 - 1)e- 2t and (9.37) becomes

dY't = - vt :Vt dt + (vt - 1)e- 2t dXt (9.71)


132 SECOND-ORDER PROCESSES

This gives us

(Vi-
:V t = f~", 1) exp [- Vi (t - r)]e-2T dXT

= f~", (Vi - 1) exp [- ",,/i (t - r)] (d~T + 2~T dr) (9.72)

where we have set ~T = e- 2tX T = Y T + NT' The final expression in (9.72)


represents the output of a linear time-invariant filter with Y t + Nt as
input and with a transfer function given by

( ~5 ) (27riv) +2 3 2 + i27rV
2" - 1 (27riv) + Vi = Vs + V2 Vs + V'2 i27rv
which should be compared with (9.31a).

EXERCISES
1. Test whether each of the following functions is nonnegative definite.

(a) R(t,s) = { ~ - It - sl It - sl :::; 1


it - sl > 1
(b) R(t,s) = e-I,-o, - 00 < t, s < 00

(c) R(t,s) = elt- ol

(d) R(t,s) = {~
It - sl :::; 1
It - sl > 1

2. Let Ao, AI, . . . be the eigenvalues of (4.12).

(a) Show that

L L
N N
R(t,t) - A.I'I'.(t)I' = E [ X, - 'I'.(t) lab oP.(s)X. ds [2 ~ 0
n=O n=O

(b) Show that

L
N
An :::; lab R(t,t) dt < 00

n=O
N
Hence,
L
n=O
An must converge and An ---> O.
n--> '"

3. Suppose that a q.m. continuous and wide-sense stationary process IX" - 00 <
t < 00 J has a covariance function R(·) which is periodic with period T, that is,

R(T + T) = R(r) for all T in (- 00, 00 )


EXERCISES 133

Define for n = 0, ± 1, ± 2,

Z
n
= -T !c
T X,e- in (27r IT)t
1 O '

(a) Show that {Znl are mutually orthogonal, that is,

whenever m .= n

(b) Show that for each t

L
00

X t = Znein (27rIT)t
n=-Q()

(c) Suppose that

R(T) = ~ _1_ cos n (211")


1 + n2
T
1..
n=O
T

find the eigenvalues and a complete set of orthonormal eigenfunctions to

10 T R(t - 15)'1'(15) ds = A<p(t)

4. Let R(t,s) be given by

R(t,s) = {~ - It - 151 o ::; It - 151 ::; ]


elsewhere
Find the eigenvalues and a complete set of O-N eigenfunctions for

lot R(t,s)<p(s) ds = A<p(t) 0 ::; t ::; t


. a2R(t,8)
Hmt: - - - = -20(t - 15) for 0 < t, 15 <t
at 2

5. (a) Suppose that R(t,s) satisfies

f
i...
ak a2k RU,s)
at Zk
= {- bk
i...
aZk oCt -
at"
8) a < t, 15 < b
k=O k-O
Show that the integral equation

lab RU,s)<p(s) ds = A<p(t)

can be reduced to the differential equation


71 m
\' a2k<p(t) '\' a2k<p(t)
A i... ak iJi2k = 1.. bk at2k a <t <b
k-O k=O

together with appropriate boundary conditions.


134 SECOND-ORDER PROCESSES

( b) Verify that, formally at least, if R( t,s) satisfies

R(t,s)

then it also satisfies

I
n a2k
ak - k R(I,s) =
Im
bk -
a 2k
/)(1 - 8)
al 2 al 2k
k=O k=O

(c) Reduce the equation

lab e-I'-.I~(8) ds = A~(t)


to a differential equation and two linearly independent boundary conditions on
~(a), ~(b), .p(a), and .p(b). [Hint: Each of these four quantities can be expressed as

a linear combination of lab e-8~(s) ds and lab e8~(s) ds.]

6. Suppose that {W" I ;::: O} is a standard Brownian motion and


X,(w) = J(t)WT(,)(w) o~ I ~ T
where J(t) is a continuous function and T(t) is a continuous nondecreasing function
with T(O) = O. Using the results on the solutions of (4.31), find an expansion of this
form

L <>n(I)Zn(w)
00

X,(w)
n=O
where {Zn, n = 0, 1, 2, . . . } are independent Gaussian random variables with
EZmZn = /)mn. What conditions do we need, if any, in order that each Zn belongs
to JCx?

7. Suppose that {X" - 00<t < oo} has a covariance function


R(r) = ie-ITI (3 cos T + sin Iri)
Find its spectral-density function S(v), - 00 < v < co.

8. Suppose that {X" - 00 < I < co} has the spectral density given in Exercise 7,
and {Y" - 00 < I < oo} is such that:
(a) Y, E JCx for every I

(b) EY,J? = .g.e-I'-'I[cos (I - 8) + sin (I - 8)]. Find EY,Y•.

9. Suppose that {X" - 00 < t < oo} is wide-sense stationary. Show that for a fixed
constant W, {e i2 n-lI'tXt, - 00 < I < oo} is again wide-sense stationary. Is
{cos 211" WtX" - 00 < I < co} wide-sense stationary?
EXERCISES 135

N
10. Suppose that X, = L Xk', - 00 <t < co, is the sum of N wide-sense stationary
k=l
and q.m. continuous processes Xu, X 2t, ••• , XN'. Show that we have a repre-
sentation

x, =
-0() e 'dg.
/0() i21rV

Is the process {g., - co < II < co} always a process with orthogonal increments?
(Note: X, = cos 2".WtZ" with Z, stationary, is an example of such a process.)

11. For a process of the type given in Exercise 10, show that

12. Let {X" - co <t < oo} be a wide-sense stationary process with a spectral
representation
0() A

X, = / _ 0() ei2 ",' dX.

Let

where ",,(11), - 00 < II < 00, is a real-valued function. Show that {Y" - 00 <
t < 00 I is wide-sense stationary.

13. Suppose that {X" - 00 < t < oo} is a real-valued wide-sense stationary process,
and let X, denote its Hilbert transform
0() A

X, = / -0() (-isgn lI)e i21rVt dX.

Let EX,X. = e-t'-·/.


(a) Show that X, is real valued
(b) Find EX,X, and EX,X.
(c) Let Z, = cos 27rvotX, + sin 27rv ot)(,. Show that Z, is of the form

with ",,(11) = 2".[v - 110 sgn Ill. [Hence, it must be wide-sense stationary (see Exer-
cise 12).J

14. Let {Z" - co < t < oo} be a process with orthogonal increments such that EZ,
0, EIZ, - Z.12 = It - sl and Zo = 0.
(a) Show that

EZ,t. = t(ltl + lsi - It - sl)


136 SECOND-ORDER PROCESSES

(b) Let {tnt, - 00 <t< 00 J be defined by

tnt = n(Zt+lln - Zt) n = 1,2, . . .

Show that for each n, {tnt, - 00 < t < 00 J is wide-sense stationary and find its
spectral-density function S,,(,,), - 00 < " < 00.

(c) Show that tnt converges to a white noise in the sense of (8.2) and (8.3).

15. Let f be a differentiable function such that its derivative j is continuous on [a,bj.
Show that
(b
Ja f(t) dZ t + Ja(b f(t)Z,
.
dt = f(b)Zb - f(a)Za

where I Zt, - 00 < t < 00 J is the process described in Exercise 14. (Hint: Make
use of the sequence It"d defined in Exercise 14 and show that

(bj(t) (ltn8dsdt~
ia}o n~oo In(bj(t)Ztdt
The rest is easy.)

16. Use the fI'sults of Exercise 15 and show that the solution of the integral equation

Y, = Ya - la t
Y. ds + Zt - Za

is given by Y , = Yae-(l-a) + l' e-(l-') dZ •.

17. Suppose that IX" a S t S b J is a q.m. continuous process such that

02
-
at oS
EXtX. =
-
oCt - 8) + pet,s) a<l<b

where p is a continuous nonnegative definite function on [a,bj.

(a) Show that X(f) "" fb/(t) dXt is well defined for all f satisfying (b 1/(01 2 dt
a Ja
such that
(1) X(·) is linear, that is, X(af + fJg) = aX(f) + fJX(g)
(2) If a S c < d S band 1 = I,d is the indicator function of [c,d), then
X(f) = Xd - Xc
(b) Find an expression for EX(f)X(g)

(c) Show that every element Y in Xx can be represented as

Y = 1b 7J(t)dX t + kXa
18. Let {W" t ;::: 0 J be a standard Brownian motion process, and let 11 be a real-valued
second-order random variable (EA = 0, EA 2 = 1) independent of the W process.
Suppose that the process
XI = At + W, 1;:::0
EXERCISES 137

is observed. Let At denote the linear least-squares estimator of A given X., 0 :::;;
s :::;; t. Find an explicit expression for At in the form of

A, = lot h(t,s) dX.


19. Let X, = Y cos 21TWt + Nt, where {N" - 00 < t < 00 I is a second-order process
with zero mean and ENtN. = e-'olt-81, and Y is a random variable with EY = 0,
E/ Y/2 = 1 and EY Nt = 0 for all t. Let y, denote the linear least-squares estimator
of Y given X., 0 :::;; s :::;; t.

(a) Find Yt in the form of

Pt = lot h(t,s) dX. + Xo


(b) Find Pt as the solution of a Kalman filtering problem with observation process
Zt = e·otX,. That is, show that Pt satisfies

Pt = a(t) Pt + K(t)[Z, - (3(t) P,]

and find a(·), K( .), and (3( .).

20. Suppose that {X" - 00 < t < "1 has a spectral density given by
1
S.(p) = 4 + (21TP)4
Find the predictor 1?(Xt+a /JCx t), and express it in the form

21. Let {X" - 00 <t < 00 I be as in Exercise 20, and let {Nt, - 00 < t < 00 I have

spectral density S.v(v) = 1/[1 + (211"v)21. Assume that EX,N, = 0 for all t and .~
and define
Y, = X, + Nt
Find 1?(X./JCy') and express it in the form

1?(Xt/JCyt) = 1_"'", ei2.... tH(p) dY,.


22. Suppose that the quantities A, B, F, and H appearing in (9.32) and (9.34) are
given as follows:

A(t) = [~]
B(t) = 1

F(t) = [~ - ~J
H(t) = [0 1)

Find the matrix K(t) appearing in (9.37) for t ~ 0, assuming]; (0) = o.


138 SECOND-ORDER PROCESSES

23. Suppose that Y. satisfies a differential equation


"V, = Y t +'71
where ET/tfi. = oCt - 8). Let X, satisfy
X, = y, +1;,
with E'7'~. = 0 and EI;,~. = oCt - 8). Reexpress these equations in the form of
(4.32) and (4.34) with

and

24. Suppose that in Exercise 23 the process 1;" instead of being white, satisfies
EI;,~. = e-/'-·/. Reexpress the two differential equations in the form of (4.32) and
(4.34) with suitable choices for X, and {to (Hint: Now (d/dt) 1;, +
1;, is a white noise.)
4
Stochastic Integrals and
Stochastic Differential
Equations

1. INTRODUCTION
Roughly speaking, stochastic differential equations are differential
equations driven by Gaussian white noise. Here, we are using the term
"stochastic differential equations" in a restricted sense and not merely to
denote differential equations with some probabilistic aspects. The impor-
tance of stochastic differential equations is largely due to the fact that the
solution of such an equation is a sample-continuous :Markov process, and
conversely, a large and important class of sample-continuous Markov
processes can be modeled by the solutions of stochastic differential equa-
tions. From the point of view of applications, this is a direct benefit of
using white noise as a noise model, and this fact accounts for its popular-
ity. After all, white noise is, at best, a tolerable abstraction and is never a
completely faithful representation of a physical noise source. Its raison
d'~tre is the simplicity of analysis that it brings about. We have seen this
in connection with Kalman filtering, and we shall see it again in the
Markovian nature of solutions to stochastic differential equations.
139
140 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

Stochastic differential equations are defined in terms of stochastic


integrals, which in turn need to be defined. The generally accepted defini-
tion for a stochastic integral is the one due to Ito and is often referred to
as the Ito integral. We shall call it simply the stochastic integral. In addi-
tion to its role in stochastic differential equations, the stochastic integral
is of tremendous importance in its own right. The main source of its
importance is due to the fact that an important class of martingales can be
represented as stochastic integrals. We shall explore this aspect of sto-
chastic integrals a little later.
It turns out that a calculus based on Ito's definition of stochastic
integrals is not compatible with rules of ordinary calculus. Since the
differential equations that we encounter in practice must usually be
interpreted in terms of ordinary calculus, it raises the question whether
stochastic differential equations can really be used to model differential
equations driven by "approximately white" noise. This is an important
question, since the uniqueness, existence, and Markov property of the
solution to a stochastic differential equation are all based on the interpre-
tation of the equation in terms of stochastic integrals. We shall pose this
question in a more precise way and give an answer. The answer is roughly
that a differential equation driven by "nearly white" Gaussian noise can
indeed be modeled by a stochastic differential equation, but in general, a
"correction term" has to be added to the equation.
The solution of a stochastic differential equation not only is a sample-
continuous Markov process, but under quite general conditions, its tran-
sition-probability density function can be shown to satisfy a pair of partial
differential equations called the backward and forward equations of
Kolmogorov, or the diffusion equations. These equations were derived by
Kolmogorov some time before Ito's work on stochastic differential equa-
tions, and they took on new light in view of Ito's results.
Both Kolmogorov and Ito were motivated in their work by the goal
of discovering conditions under which "local" properties of a Markov
process completely determine its probability law. This problem is brought
into a sharper focus in the case of Markov processes with stationary
transition probabilities. In that case, the results of semigroup theory can
be used to give a more complete answer to the question, "When do local
properties determine completely the probability law?" Theory of Markov
semigroups is sufficiently different to require a separate treatment. It will
be treated in the next chapter.
Finally, we should mention that by and large we shall restrict our-
selves to scalar-valued processes in this chapter. For applications, extension
to vector-valued processes is of great importance. This extension is not
difficult in spirit, but is rather complicated in detail and involves cumber-
some notations.
2. STOCHASTIC INTEGRALS 141

2. STOCHASTIC INTEGRALS
Let (g,a,<p) be a fixed probability space. Let {ai, - 00 < t < oo} be an
increasing family of sub-a- algebras of a, and let {WI' - 00 < t < oo} be a
Brownian motion process such that for each 8, the aggregate {W t - W.,
t ::::: 8} is independent of a. and W t is at measurable for each t. It follows
that for 8 2:: 0
EatW t+ = W,
8
(2.1)
Eat(W t+8 - W t )2 = 8 a.s.
We recall that we refer to this situation by saying that {W" a" - 00 <
t < oo} is a Brownian motion. By a stochastic integral we mean a quantity
of the form

(2.2)

Because W t is neither differentiable nor of bounded variation, (2.2) does


not have a well-defined interpretation aR an integral in the ordinary sense.
H the integrand cp does not depend on w, then (2.2) can be treated as a
second-order stochastic integral as introduced in Sec. 3.6. Interpreted in
that way, only the property of a Brownian motion as a process with
orthogonal increments is made use of, and not its other properties. If cp is
random, i.e., it depends on w, then (2.2) has to be defined anew. It turns
out that its definition now depends in a crucial way on the martingale
property (2.1) of W,.
We make the following assumptions on the integrand cp:

The function cp iR jointly measurable in (w,t) (with respect to a in w


and the Lebesgue measure in t). For each t, CPt is measurable with
respect to a, (2.3)

cp sa tiRfies lab E 1cp, 12 dt < 00 (2.4)

The stochastic integral is now defined in the following way:


1. If there exist times to, t 1 , • • • , in independent of w, such that
a = to < tl < . . '. < tn = band
tv :::; t < t.+ 1
cp(w,t) = CPv(w) (2.5)
V = 0, . . . , n - 1

and if rp satisfies (2.3) and (2.4), then we call cp an (w,t)-step function and
define the stochastic integral by

l
n-1
lab cp(w,t) dW(w,t) CPv(w)[W(W,tv+1) - W(w,t.)] (2.6)
v=o
142 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

2. If cP satisfies (2.3) and (2.4), then we shall show that there exists a
sequence of (w,t)-step functions {CPn(w,t)} satisfying (2.3) and (2.4) such
that
(2.7)

It will then follow that lab CPn(W,t) dW(w,t)


converges in q.m. as n -> 00,
and this q.m. limit is the same for any sequence of (w,t)-step functions
which satisfy (2.7). Therefore, we can define

(b cp(w,t) dW(w,t) = lim in q.m. (b CPn(W,t) dW(w,t) (2.8)


}a n-+co Ja
where {CPn} is any sequence of steps satisfying (2.7). We assume that both
a and b are finite. If not, the stochastic integral is defined as the limit in
q.m. as a -> - 00, or b -> 00, or both.

Proposition 2.1. Let {W t, - 00 < t < oo} be a Brownian motion, and


let cp(w,t) satisfy (2.3) and (2.4). Then,
(a) There exists a sequence of (w,t) steps {CPn} satisfying (2.3) and
(2.4) such that

Iicp - CPnl\2 = lab Elcp(·,t) - CPn(',t)12 dt ~0 (2.9)

(b) For each n, I(CPn) = lab


CPn(w,t) dW(w,t) is well defined by (2.6),
and {I(CPn)} converges in q.m. as n -> 00.
(c) If {CPn} and {cp~} are two sequences of (w,t) steps satisfying (2.3)
and (2.4) such that llcp - CPnll and llcp - cp~ll both go to zero as
n -> 00, then
(2.10)
n-4,. n-4,.

Proof:
(a) Suppose Ecp(·,t)ip(-'s) is continuous on [a,b] X [a,b], that is, CPt is q.m.
continuous on [a,b], then an approximating sequence of (w,t) step functions
{CPn} can be constructed by partitioning [a,b], sampling <p(w,t) at partition
points t/ n), defining CPn(W,t) = cp(w,tv(n), tv(n) ~ t < t~~l' and refining the
partitions to zero [max (t~~l - tv<n) -70]. Since CPt is q.m. continuous,
v n->"
Elcp(·,t) - CPn(·,t) 12 ~ 0 for every t in [a,b]. By the dominated conver-
n->"
gence theorem, we have

lb Elcp(·,t) - CPn(',t)12 dt ~0
More generally, if cP merely satisfies (2.3) and (2.4) and is not necessarily
q.m. continuous on [a,b], we construct a sequence of approximating (w,t)-
2. STOCHASTIC INTEGRALS 143

step functions in the following manner: First, let in be ep with its real and
imaginary parts truncated to ± n, then
(2.11)

Thus, we can always assume ep to be bounded. If cp is bounded, define

(2.12)

Then,g"C',t) is q.m. continuous on [a,b] and satisfies (2.3) and (2.4). Now,

lab Elep(·,t) - gn(-,t)/2 dt ::; lab 10'" e-TE 1 cp(-,t)

- cp (-, t - ~) /2 dt dr (2.13)

Since lab Icp(·,t) - ep (, t - ~) 12 dt ~ 0 whenever ep is bounded and


Lebesgue measurable, we have
(2.14)

It is now clear that we can construct a sequence of (w,t)-step functions


approximating cp by sampling gn at the partition points of a sequence of
partitions refining to zero. This completes the proof for (a).
(b) We assume that {CPnl is a sequence of (w,t)-step functions such that
Ilep - (Onll---> 0 and define I(epn) by (2.6) as
n->'"

(2.15)

Hence, if we write ApnW = W(t~~l) - W(tp(n), we have

EII(epnW = 2: 2: E(CPnp<pnl' ApnW A!'nW)


If JL> P, then AJLnW is independent of tet (n) while (CPnpCPnJLLlpnW) is tet (n)
measurable. Hence " "
if fJ. ,e P

and
EII(epn)12 = L E[lepnpI2(A.nW) 2]
p

= L Elepn.12E(!,.('.)(A. nW)
v
2

= L. ElepnpI2(t~~1 - tp(n)
(2.16)
144 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

Now, 1(\0111+..) - l(tpn) = l(tpm+n - tpn) and tpm+n - tpn is again a step.
Therefore,

EI1(tpm+,,) - l(tpn)12 = lab Eltpm+n(',t) - tpn(',t) 12 dt


~ 2 lab E/tp",+n(',t) - tp(·,t)12 dt
+ 2 lab E/tpn(',t) - tp(-,t)/2dt~O (2.17)

Hence, {1(tp,,)} is a mutually q.m. convergent sequence, and there exists


a second-order random variable I (tp) such that
EI1(tpn) - l(tp)/2 ~
n ......
0

and (b) is proved.


(c) Suppose Itpn} and {tp:} are (w,t) steps such that IItp - tpnll ~
n---+oo
0 and
IItp - tp:1I ~
n---+oo
O. Then,

n---+oo n-->"

This proves (c). I


Proposition 2.2. Let tp(w,t) and 1/t(w,t) satisfy (2.3) and (2.4). Then

(2.18)

Proof: It is enough to prove (2.18) for tp = 1/t, because

E1(tp)I(1/1) = -HEII(tp) + 1(1/1)1 2 - EII(tp) - 1(1/1)1 2]


+ ii[EII( -itp) + 1(1/1) 12 - E!l( -1'10) - 1(1/1) 121
= i[EII(tp + 1/1)1 2 - EII(tp - 1/t)l2]
+ !:.4 [EII( -itp + 1/1)1 2 - EII( -icp - 1/1)1 2] (2.19)

which means that (2.18) follows from the seemingly more special case
EI1(tp)12 = lab
Eltpt/2 dt. Now if tp is an (w,t)-step function, we have
already proved in (2.16) that

EI1(tp)/2 = lab E/tp(',t)/2 dt


3. PROCESSES DEFINED BY STOCHASTIC INTEGRALS 145

If cP is not a step, let {CPn} be an approximating sequence of steps, then


EII(cp)12 = EII(cp - CPn) I(CPn)12 +
= EII(<Pn)12 + 2 Re EI(cp - CPn)I(<Pn) + EII(cp - CPn)12
Since EII(cp - CPn)12 ---)
n->oo
0, we have

EII(cp)12 = lim EII(CPn)12


n->oo

= lim (b EICPn(.,t)12 dt = (b Elcp(·,t)12 dt


n---+ 00 }a }a
I

3 PROCESSES DEFINED BY STOCHASTIC INTEGRALS

Proposition 3.1. Let {Wt,ad be a Brownian motion and let cP satisfy


(2.3) and (2.4). Define a process {X/, a .::; t .::; b} by

X(w,t) = 1t <p(w,.s) dW(w,s) (3.1)

Then, {Xi, at, a .::; t .::; b} is a martingale, that is,

a.s. (3.2)

Remark: This important martingale property is intimately connected


with the fact that a stochastic integral is inherently defined by
forward difference approximations. We recall that if cP is an (w,t)-
step function, then

1b cp(w,t) dW(w,t) = .l <pp(w)[W(W,tp+l) -


p
W(w,tp)]

where each summand consists of a term <pp measurable with respect


to a and a forward increment ~p W which is independent of a
tp tp •

For a general <p, the stochastic integral is defined by approximating


cP ,vith steps, and it involves forward difference approximation
once again.

Proof; First, suppose that the integrand cP in (3.1) is an (w,t)-step func-


tion. Let t > s, and let t l, t2, . . . , tn be jump points of cP between 8
and t. We can write

X(w,t) - X(w,s) = f cp(w,T)dW(W,T)

= CPo(w)[W(W,tl) - W(w,s)] + CPl(W)


[W(W,t2) - W(w,t l )] + ... + CPn(W)[W(w,t) - W(w,t n)] (3.3)
where cpo is a. measurable and <Pk is at. measurable, k = 1, . . . , n. By
successively taking conditional expectation with respect to <XI., a/._ I ,
146 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

, a., we find
EIi'(X t - X.) = EIi·EIi., . . . EIi'-(X t - Xs)
= 0 a.s.
Since Xs is obviously as measurable, this proves the proposition for <p
equal to a step. If <p is not a step, let I <Pn) be step approximations to <p,
and define
Xn(w,t) = fat <Pn(W,t) dW(w,r) (3.4)

For each n, {Xn(w,t), a" a ~ t ~ b) is a martingale, and


EIi'(X t - X 8) = EIi'[X(',t) - Xn(·,t)] - EIi,[X(-,s) - Xn(-,S)]
Since EIX(·,t) - Xn(·,t)12~
n--->oo
0, we have

q,m.
~
n---> 00
0 (3.5)
Hence, EO,(X, - Xs) = 0, a.s., and the proof if' complete. I
A process I X I, a ~ t ~ b) as defined by (3.1) is obviously q. m.
continuous. Thus, we can choose a version of lXI, a ~ t ~ b) which is
separable and measurable. If we choose such a version and if we assume
that the Brownian process I Wt, a ~ t ~ b) in (3.1) is also separable, then
lXI, a ~ 1 ~ b) is sample continuous with probability 1. When <p is an
(w,t)-step function, sample continuity is obvious since I X" a ~ t ~ b) is
then a separable Brownian motion pieced together in a continuous way.
If <p is not a step, let {<Pn} be a sequence of (w,t)-step functions satisfying
(2.3) and (2.4) such that

Such a sequence can always be obtained by choosing a subsequence of any


sequence l.pm) such that II.pm - <p11 ~ o.
la <Pn., dW. and choose it to be separable, then for
m---> 00

t
If we set X"' =
each n, {Xnt' a ~ t ~ b) is sample continuous with probability 1. For
each n, {X nl - X I, a ~ t ~ b) is a separable second-order martingale. If
we apply the version of Proposition 2.3.2 for complex-valued martingales,
we get

<Y (sup IXnl - Xd


a5,t5,b
~ ~)
n
~ n 2EIXnb - X bl2 = n211<Pn - <p112 ~ n~
It follows that 2: cp( sup
n a<;tS;b
IX nt - XrI ~ l/n) < 00 and the Borel-Cantelli
3. PROCESSES DEFINED BY STOCHASTIC INTEGRALS 147

lemma implies that

A = lim sup {w: sup IXnt(w) - Xt(w)1 ~ .!}


n a$t$b n
1
is a null set, that is, for every wE A, sup IXnt(w) - Xt(w)1 ~ - for, at
t n
most, a finite number of n. Therefore, for w E A,
lim sup IXnt ( w) - X t ( w)1 = 0
n-+oo a5.t5;.b

and Xt(w), a ::; t ::; b, being the uniform limit of a sequence of continuous
functions, is itself continuous. This proves the sample continuity of
tXt, a ::; t ::; bl.
One immediate consequence of the martingale property is that a
Ht.ochastic integral does not behave like an ordinary integral. Consider,
for example, the stochastic integral lot
W8 dW 8 • If the integral is like an
ordinary integral, surely it must be equal to -HW t 2 - W 0 2) = -!W t 2. How-
ever, -!W t 2 is not a martingale, as is seen from the relationship
E(t8(-!W t2) = -!W8 2 + -Ht - s)

Therefore, lot W 8 dW s cannot be equal to tW t 2. What lot W. dW. is will be


clarified by the so-called Ito's differentiation rule, which will be stated
below.
To state the differentiation rule for stochastic integrals under its
natural hypotheses, we need to generalize the definition of stochastic
integrals to include integrands <p which satisfy (2.3), but instead of (2.4),
the weaker condition

almost surely (3.6)

This generalization is discussed in detail in Sec. 6 of this chapter (see, in


particular, Proposition 6.1). For now, we merely note that if <p satisfies
(2.3) and (3.6), then the stochastic integral 1b <Pt dW t is defined by

Cb <p(w,t) dW(w,t) = lim in p. (b <pn(w,t) dW(w,t)


Jr. n-too Ja
where I{Jn is defined by

<p n( ) = (
w,t
<pew,t) if 1t I<pew,s) 12 ds ::; n
o otherwise
Stochastic integrals appearing in the following proposition will be assumed
to be defined in this way if the integrands satisfy (3.6) rather than (2.4).
148 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

Proposition 3.2.Let X 1 (w,t), X 2(w,t), . . . ,Xn(w,t) be processes defined in


terms of a single Brownian motion W(w,t) as follows:
k = 1, . . . ,n
(3.7)
Let Y(w,t) = if;(Xit, X 2t, . . . ,Xn(, t), where if; is once continuously
differentiable with respect to t and has continuous second partials
with respect to the X's. Then, with probability 1,

lib
n

Y(w,t) = Y(w,a) + fIt ~(X(w,t'), t') dt' + if;k(X(W,t') , t')


k=l

dX~(w,t') + i i ~i
j=lk=l
b
if;JIc(X(W,t') , t')cpj(W,t')CPk(wt') dt' (3.8)

Remark: The surprising thing about (3.8) is the last term. It comes about
in roughly the following way. We recall that a Brownian motion
W t has the curious property (dW t )2 ~ dt. Therefore, dXj(t) dXk(t) '"
CPjCPk dt. Now,
dY t = Yt+dt - Y t = if;(X t+dt , t + dt) - if;(Xt,t)

= ~ dt + I
k
if;k dXk(t) +!
2
II
j k
if;jk dXj(t) dXk(t) + (3.9)

Both the first and the third term in (3.9) are of order dt, hence

dY t = ~ dt + I
k
if;k dXk(t) +!
2
L
j,k
if;jkCPjCPk dt + o(dt) (3.10)

which is nothing but a symbolic way of writing (3.8). We note that


dXk(t) can be replaced by fk dt + CPk dW t in both (3.8) and (3.10),
permitting Y t to be written in the form of
Yt = Ya = f g(w,t') dt' + it 'Y(w,t') dW(w,t')
If we apply (3.8) to Y t = iW?, we find immediately
dY t = W t dW t + tdt
or

or
lot Ws dW. = Y, - M =kW t 2 - it (3.11)

which is indeed a martingale.


4. STOCHASTIC DIFFERENTIAL EQUATIONS 149

It might be useful to isolate two special cases of Proposition 3.2.


First, suppose Y t = 1/;(X t ,t) depends only on a single X process, and
dX t = it dt +
CPt dW t • Then, (3.8) becomes
Yt = Ya +f ~(X.,s) ds + it 1/;'(X.,s) dX.
+ -21 lota 1/;"(X.,S)cp2(W,S) ds (3.12)

where prime denotes differentiation with respect to the first variable. For
the second special case, consider the product Y(w,t) = X 1 (W,t)X 2(w,t),
where Xl and X 2 satisfy (3.7) with k = 1, 2. Then (3.8) becomes
Yt = Ya + it X 2 (w,t') dX 1 (w,t') + it X 1 (w,t') dX 2 (w,t')

+ ~21a(t CPl(W,t')CP2(W,t') dt' (3.13)

4. STOCHASTIC DIFFERENTIAL EQUATIONS


From the point of applications, a major motivation for studying stochastic
differential equations is to give meaning to an equation of the form

(4.1)

where tt is a Gaussian white noise. At least formally, we know that


lot t. ds has all the attributes of a Brownian motion WI. Hence, formally
again, (4.1) appears to be equivalent to
XI = Xa + it rn(X.,s) ds + it u(X.,s) dW. (4.2)

With stochastic integrals having been defined, (4.2) is capable of taking


on a precise interpretation. Whether the interpretation is the one that
we really want to give to (4.1) is something else again. We postpone
examination of this question until Sec. 5. For the time being, we confine
ourselves to a study of (4.2) as an equation in the unknown X t and with
the last integral interpreted as a stochastic integral.
By a stochastic differential equation, we mean an equation of the
form
dX(w,t) = m(X(u;,t), t) dt + u(X(w,t), t) dW(u;,t) (4.3)
which is nothing more or less than a symbolic way of writing (4.2). A
process tXt, t ~ a) is said to satisfy (4.3) with initial condition Xa = X
if (1) for each t

J.t u(X.,s) dW.


150 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

is capable of being interpreted as a stochastic integral and (2) for each


t, X t is almost surely equal to the random variable defined by
X + 1t m(X.,s) ds + 1t u(Xs,s) dW.

Under the conditions that we shall assume, we can in fact assert a stronger
result than a.s. equality of the two random variables, viz., q.m. difference
between the two is zero.
We shall first state and prove an existence and uniqueness theorem
following Ito.

Proposition 4.1. Let {W t, a l , a ~ t ~ T < oo} be a separable Brownian


motion. Let X be a random variable measurable with respect to
aa and satisfy EX2 < 00. Let m(x,t) and u(x,t), - 00 < x < 00,
a ~ t ~ T, be Borel measurable functions in the pair (x,t). Let m
and u satisfy the following conditions:
Im(x,t) - m(y,t)1 + lu(x,t) - u(y,t)1 ~ Klx - yl (4.4)
Im(x,t) I + iu(.x,t) I ~ K V + 1 x2 (4.5)
Under these hypotheses, there exists a separable and measurable
process tXt, a ~ t ~ T} with the following properties:
Pt: For each tin [a,T] Xl is at-measurable
P 2: iT EX t 2dt < 00
Pa: {Xl, a ~ t ~ T} satisfies (4.2) with Xa = X
P 4 : With probability 1, tXt, a ~ t ~ T} is sample continuous
Ps: {Xl' a ~ t ~ T} is unique with probability 1
P 6 : tXt, a ~ t ~ T} is a Markov process

Remark: Condition (4.4) is known as the uniform Lipschitz condition.


Without loss of generality, the constants K in (4.4) and (4.5) can
be assumed to be the same.

Proof: We shall give a proof by constructing a solution. Since we shall


be dealing with a sequence of stochastic process, we shall write X(w,t)
or X(·,t) rather than Xl, because the subscript will be used to index
terms in the sequence. First, define a sequence of processes {Xn(' ,t),
a ~ t ~ T} as follows:

Xo(w,t) = X(w)
Xn+l(W,t) = X(w) +f m(Xn(W,S), s) ds

+ it U(Xn(W,S), s) dW(w,s) (4.6)

We need to show that the last integral is well defined as a stochastic


4. STOCHASTIC DIFFERENTIAL EQUATIONS 151

integral for each n. That is, we need to show that for each n
q(X n(W,t), t) is jointly measurable in (w,t), and for each t
is (Xt measurable (4.7)
and
(4.8)
This can be done by induction. First, we verify (4.7) and (4.8) for n = O.
Since Xo(w,t) = X(w), (4.7) is satisfied, because q is a Borel measurable
fUIlction, and u(X,t) is not only (Xt measurable, it is (Xu measurable. Using
(4.5), we have
q2(X,t) :::; K2(1 + X2)
so that
iT Eu 2 (Xo(-,t), t) dt :::; K2(1 + EX2)(T - a) < 00

and (4.8) is verified for n = O. Now, assume that (4.7) and (4.8) are both
satisfied for n = 0, 1, 2, . . . , k. Then from (4.6),
Xk+l(W,t) = X + it m(Xk(w,s), s) ds + it U(Xk(W,S), s) dW(w,s)
each of the three terms on the right is (it measurable, because {Xk(',S),
a :::; 8 :::; t) is (Xt measurable. Next, we note that for a :::; to :::; t :::; T,

[Xk+1 (W,t) - X k+1 (W,t O»)2 :::; 2 Ht: m(Xk(w,s), ds r 8)

+ [t: u(Xk(w,s), 8) dW(w,s) r} (4.9)


By using the Schwarz inequality on the second term and (2.18) on the
last term, we get

E[X k +1(-,t) - Xk+l(',tO»)2 :::; 2 [ (t - to) t: Em 2 (X k (·,s), s) ds

+ t: Eu 2 (Xk(-,S), s) d8J (4.10)

Now, (4 ..5) can be applied again, and we get

t:
E[X k +1 (',t) - X k +1 (-,t O»)2
:::; 2 {K2[1 + (t - to)] [1 + EXk2(·,S)] dS} (4.11)

Therefore, {Xk+l(·,t), a :::; t :::; T) is q.m. continuous and a measurable


version can be chosen. Furthermore,

la T EX~+l(-,t) dt :::; 2iT E[Xk+1(·,t) - X)2 dt + 2iT EX2 dt


:::; 4K2[1 + (T - a)](T - a) iT [1 + EXk2(-,S)] ds
+ 2(T - a)EX2 < 00 (4.12)
152 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

The induction is complete, and we have verified (4.7) and (4.8) for every n.
Therefore, the sequence of processes {X,,(',t), a ::::; t ::::; T, n = 0, 1, . . . l
is well defined.
Next we prove that for each t, {X,,(·,t), n = 0, 1, . . . l converges
in quadratic mean. To do this, define
~o(w,t) = X(w)
(4.13)
~n(w,t) = Xn(w,t) - Xn_1(w,t) n = 1, 2, . . .
Using (4.6), we get

~n+l(W,t) = lat [m(Xn(W,S), s) - m(Xn_1(w,s), s)] ds

+ f: [U(Xn(W,S), s) - u(Xn_1(w,s), s)] dW(w,s) (4.14)

If we make use of the inequality (A +


B) 2 ::::; 2A 2 +
2B2, the Schwarz
inequality, (2.18) on the stochastic integral, and the uniform Lipschitz
condition, we find
E~!+l(·,t) ::::; 2K2[1 + (T - a)] f: E~n2(·,S) ds (4.15)

The inequality (4.15) can be iterated starting from E~02(-,t) = EX2, and
we get

(4.16)

Now,
m
Xn+m(W,t) - Xn(W,t) 2: ~n+k(W,t)
k=l

and by the Cauchy-Schwarz inequality

(4.17)

Therefore, from (4.16) we get


.,
l
k=l
2n+kE~!+k(·,t)

{4K2[1 + (T - a)](t - a) }k (4.18)


k!
.,
Since .2: a k /k! converges to e" for every finite a,

°
k=O

sup E[Xn+m(·,t) - X n(.,t)J2 ~ (4.19)


m~O n---+co
4. STOCHASTIC DIFFERENTIAL EQUATIONS 153

uniformly in l. Therefore, for every l E [a,T], lXn(-,l)} is a q.m. con-


vergent sequence. Let the q.m. limit be denoted by X(·,t).
Thus, we have obtained a process {X(w,t), a .:::; t .:::; T} such that
sup E[Xn(·,t) - X(·,t»)2 ~ 0 (4.20)
a~t~T n->oo

Because for each n, {Xn(-,t), a ::; t ::; T} is q.m. continuous, the limit
process {XI, a ::; t .:::; T} is also q.m. continuous, hence continuous in
probability. It follows from Proposition 2.2.3 that a separable and mea-
surable version can be chosen for {X" a ::; t ::; 1'). We shall now show
that IX" a ::; t ::; T} so constructed satisfies PI - P 5•
First, for each t, Xn(·,t) is a, measurable for every n. Therefore,
XCt) is also at measurable for each t, and PI is proved. Next,

iT EX 2 dt .:::; {iT E[X(-,t) -


t 2 Xn(-,t))2 dt + iT EXn2(·,t) dt}
(4.21)
From (4.18) we have for some constant a,

sup
n
iT EXn2(·,t) dt ::; 2 [iT eat dt + (T - a) ] EX2 = A < 00

(4.22)
Hence, using (4.20) on (4.21) we get

iT EX,2 dt ::; 2A < 00

which proves P 2• Together, PI and P 2 ensure that u(Xs,S) dW. is well it


defined as a stochastic integral.
To prove that the process {Xi) a ::; t ::; T} is indeed a solution to
(4.2) with Xa = X, ,ve define

Dt = X t - X - it m(X.,s) ds - it u(X.,s) dW.

Using (4.6), we can rewrite D t as


D t = [X(·,t) - Xn+l(·,t)] - i [m(X(-,s), s) - m(Xn(-,S), s)] ds
- it [u(X(·,s), s) - u(X,,(·,s), S)] ds

It is now easy to show that each of the three terms on the right-hand
side goes to zero in quadratic mean as n ~ 00. Therefore
ED t 2 = 0
and for each t E [a,T],

Xl = X
r
+ Ja(t m(Xs,s) ds + Ja[t u(X.,s) dW. (4.23)
154 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

with probability 1. Further, both sides of (4.23) represent separable


processes if we choose a separable version for the stochastic integral.
Hence,
(P (X t = X + it m(X.,s) ds + it CT(X.,S) dW., a ::; t ::; T) = 1
(4.24)
and P a has been proved. Since the right-hand side of (4.23) represents a
sample-continuous process, {Xt' a ::; t ::; T} is also sample continuous,
and P 4 is proved. To prove uniqueness, suppose that X t and X t are both
solutions of (4.2) with Xa = Xa = X. Then, we can write
(X t - Xt) = it [m(X.,s) - m(X.,s)] ds
+f [CT(Xs,S) - CT(Xs,S») dW. (4.25)
Equation (4.25) has the same form as (4.14), and the same arguments
yield

E(Xt - X t )2 ::; 2K2[1 + (T - a») f E(X. - X.)2 d8 (4.26)


Inequality (4.26) has the form
d
d/(t) ::; cJ(t) t> a (4.27)

with J(t) ~ 0 and f(a) = O. Rewriting, we get


d
dt (e-ctf(t» ::; 0 t >a
Therefore, by integrating we get cctf(t) ::; J(a)e- ca , and it follows that
o ::; f(t) ::; f(a)ec(t-a) = 0

This proves thatf(t) = it E(X. - Xs)2ds = 0, a::; t::; T. From (4.26)


we have

Therefore, for each tin [a,T),


(P (XI ~ XI) = 0
and by a additivity,
(P (X t ~ X t at one or more rational points in [a,T)) = 0
XI are both chosen to be separable, then they are both sample
If X t and
continuous and
(P (X t = Xt for all t E [a,T)) = 1
5. WHITE NOISE AND STOCHASTIC CALCULUS 155

This proves that with probability 1 there is only one sample-continuous


solution to (4.2) with the same initial condition.
}'inally, we prove that {Xt, a ~ t ~ T} is a Markov process.
Using (4.2) we write

X t = Xs +f m(X.,T) dT +f IT(X"T) dW. a ~ s <t~ T

which can be regarded as a stochastic integral equation on the interval


s ~ t S T with Xs as the initial condition. Thus, for each t E [s,T], X t
can be obtained as a function of Xs and {W. - W., s ~ T ~ t}, that
is, XI is measurable with respect to the IT algebra generated by X. and
{W. - W., s ~ T ~ t}. Since Xs is (i. measurable, and {W W.,
T -

S ~ T ~ t} is independent of (is, X t is conditionally independent of (is


given X". A fortiori, X t is conditionally independent of {X., a ~ T ~ s}
given X g , and this proves the Markov property. I
Summarizing the preceding results, we find that, under the condi-
tions on X, m, and IT given in Proposition 4.1, the stochastic integral
equation

Xt = X + it m(X.,s) ds + it IT(X.,s) dW. (4.28)

has a unique sample-continuous solution which is Markov. We emphasize


again that, by definition, the last integral it
IT(X.,s) dW. in the integral
equation is to be interpreted as a stochastic integral. The question
whether this stochastic integral equation adequately models a differ-
ential equation driven by Gaussian white noise

(4.29)
Xu = X
remains unanswered. Indeed, this question cannot be answered without
more being said about what we want (4.29) to mean. As it stands, (4.29)
is merely a string of symbols, nothing more. We shall take up this ques-
tion in the next section.
Finally, we note that the existence of a solution to (4.2) is ensured
even without the Lipschitz condition (4.4), but then the uniqueness is
no longer guaranteed rSkorokhod. 1965, p. 59].

5. WHITE NOISE AND STOCHASTIC CALCULUS


In thi,; section, we offer an interpretation of differential equations driven
by white noise, and examine its relationship with stochastic differential
equations. The equation that we would like to give a precise meaning to
156 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

is the following:

d
- X(w,t) = m(X(w,t), t)
dt
+ u(X(w,t), t)s(w,t) (5.1)

where SI is a Gaussian white noise. Since white noise is an abstraction


and not a physical process, what one really means by (5.1) in practice
is probably an equation driven by a stationary Gaussian process with a
spectral density that is flat over a very wide range of frequencies. If we
take St to be such a process in (5.1), then there is no difficulty interpreting
(5.1) as an ordinary differential equation for each sample function, pro-
vided that the spectral density of St eventually goes to zero sufficiently
rapidly so that the sample functions are well behaved. While this is
probably what we want (5.1) to mean, this is not how ".. e want (5.1) to
be handled mathematically. If we take St to be a process with well-
behaved sample functions, we lose some of the simple statistical proper-
ties of XI, the primary one being the Markov property. In practice, the
interpretation of (5.1) that we really want is probably the following.
Take a sequence of Gaussian processes {s nCt)} which "converges" in
some suitable sense to a white Gaussian noise, and yet for each n Sn(-,t)
has well-behaved sample functions. Now, for each n the equation

d
dt Xn(W,t) = m(Xn(w,t), t) + u(X,,(w,t), l)S,,(w,t)
(5.2)

together with the initial condition Xn(w,a) = X(w) can be solved. We


assume that m and u are such that the solution exists and is unique for
almost all sample functions. Thus, we obtain a sequence of processes
{Xn(',t), a ~ t ~ T}. Suppose that as n-'> 00, {Sn(',t)} converges in a
suitable sense to white noise, and the sequence {X n(' ,t), a ~ t ~ T I
converges almost surely, or in quadratic mean, or even merely in prob-
ability, to a process {X (. ,t), a ~ t ~ T}. Then it is natural to say that
X t is the solution of

where St is a Gaussian white noise. This makes precise the interpretation


of (5.1). We still have to determine whether (5.1) can be modeled by a
stochastic differential equation as defined in the last section.
In order to make precise the notion of ISn(',t)} converging to a
white noise, we define

(5.3)
5. WHITE NOISE AND STOCHASTIC CALCULUS 157

and rewrite (5.2) as an integral equation


Xn(W,t) = Xn(w,a) + Dm(Xn(W,S), s) ds
+ DO"(X,,(w,s), s) dW neW,S) (5.4)
Since a Gaussian white noise !;t is the formal derivative of a Brownian
motion, we make precise the notion of {!;n(-'t) I converging to a Gaussian
white noise by requiring that
Wn(-,t) ~ K[W(·,t) - W(·,a)] (li5)
n---" co

where K is a constant and {W (. ,t), a ~ t ~ T I is a Brownian motion


process. Since the constant K can always be absorbed into 0" in (5.4), WE'
shall assume it to be 1. We want to resolve the following two questions:
First, under what conditions will {X n(-,t) , a ~ t ~ T I converge? Secondly,
if {Xn(-,t), a ~ t ~ Tl converges, does the limit {X(',t), a ~ t ~ TJ
satisfy a stochastic differential equation, and if SO, what stochastic
differential equation?
Before stating the precise results that can be proved concerning
these questions, we shall give a preliminary and heuristic discussion of
what we can expect. This is especially important since what can be
proved precisely at present is a little complicated and undoubtedly falls
far short of what is in fact true. To begin with, consider a sequence of
processes {Y nCt) I defined by
Yn(W,t) = D<p(Wn(W,t), t) dWn(w,t) (5.6)

where <p is a known function and {W n(' ,t) I converges to a Brownian


motion, and we want to determine what {Y n(',t) J converges to. Suppose
we define a function if;(x,t) by
if;(x,t) = loX <p(z,t) dz (5.7)

If we denote (a / at)if;(x,t) by if;(x,t), we find


dif;(Wn(w,t), t) = <p(Wn(w,t), t) dWn(w,t) + if;(Wn(w,t), t) dt (5.8)
In other words, we have
Yn(w,t)
(5.9)
Now, if if; and if; are reasonable functions, we would certainly expect that
as W,,(w,t) ~ W(w,t),
n-HO

f(TV,,(w,t), t) ----')
n-->
if;(W(w,t), t)
00

f(W,,(w,t), t) ----7
n--> 00
if;(W(w,t), t)
158 STOCHASTIC INTEGRALS AND STOCHASTIC D!FFERENTIAl EQUATIONS

Therefore, if all this is true, then


Yn(w,t) ~
n->oo
Y(w,t) = 1f(W(w,t), t) - 1f(W(w,a), a)
- lat ~(W(w,s), s) ds (5.10)

N ow, by the Ito's differentiation rule (Proposition 3.2),

1f(W(w,t), t) = If.-(W(w,a), a) + lat ~(W(w,s), s) ds


+ Ja[t If.-'(W(w,s) , s) dW(w,s) + ~2 Ja[t If.-"(W(W,S) , s) ds (5.11)

Noting If.-'(x,t) = 'P(x,t), we get

Y(w,t) =
Ja[t 'P(W(w,s), s) dW(w,s) +.!.2 Ja[t 'P'(W(w,s), s) ds (5.12)

Comparing (5.12) against (5.6), we get the interesting result

lat 'P(Wn(w,s), s) dWn(w,s) ;::: lat 'P(W(w,s), s) dW(w,s)


+.!.2 Ja[t 'P'(W(w,s), s) ds (5.13)

where the first term on the right-hand side in (5.13) is a stochastic


integral. The reason for the extra term is the same as the reason for the
extra term in the Ito's differentiation formula (3.8). As we discussed it
at that time, roughly speaking, the extra term is due to the fact that
(dW)t 2 is approximately dt.
In light of (5.13), we should expect a similar development for (5.4)
as n ---+ 00, namely, there will be an extra term. To find out what this
extra term is, we first rewrite (5.4) as
dXn(w,t) = m(Xn(W,t), t) dt + U(Xn(W,t), t) dWn(w,t) (5.14)
Now define

If.-(x,t) = /cox -(-)


1
u z,t
dz (5.15)

so that
dlf.-(Xn(w,t), t) = ~(Xn(W,t), t) dt +
If.-'(Xn(w,t), t) dXn(w,t)
. m(Xn(w,t), t) )
= If.-(Xn(w,t), t) +
«
u Xn w,t), t
+
) dt dWn(w,t) (5.16
or
1f(X r .(w,t), t) - If.-(Xn(w,a), a) = f J.L(Xn(w,s), s) ds
+ W,,(w,t) - W n(w,a) (5.17)
5. WHITE NOISE AND STOCHASTIC CALCULUS 159

where we have set p, = (m/u) +if;. Suppose {TVn(',t)} converges to a


Brownian motion TV (. ,t) and suppose that {X n (. ,t)} converges to a process
X(·lt). Then, under reasonable conditions, we would expect

!/I(X(w,t), t) - !/I(X(w,a), a) = 1t p,(X(w,s), s) ds


+ TV(w,t) - TV(w,a) (5.18)
If we assume that X(w,t) can be written 'n the form of

X(w,t) = X(w,a) + 1t few,s) ds + f cp(w,s) dW(w,s) (5.19)

then we can apply Ito's differentiation formula (3.8) to !/I(Xt,t) and get

!/I(X(w,t), t) - !/I(X(w,a), a) = 1t !/I'(X(w,s), s)f(w,s) ds

+ 1t if;(X(w,s), s) ds +f !/I'(X(w,s), s)cp(w,s) dW(w,s)

+ ~2 Ja(t !/I"(X(w,s), S)cp2(W,S) ds (5.20)

We can now equate (5.18) with (5.20) and get


cp(w,s)!/I'(X(w,s), s) = 1

Therefore, by noting that !/I' = l/u, we get


cp(w,t) = u(X(w,t), I) (5.21)
Further,

if; + u-1f + iu o (1)' =


--
u
p, = -m
u
+ if;
Hence,
f(w,t) = m(X(w,t), t) + iu(X(w,t), t)u'(X(w,t), t) (5.22)
Putting (5.21) and (5,22) into (5.19), we get

X t = Xa +f [m(X.,s) + iu(X.,s)u'(X.,s)] ds + f u(X.,s) dW.


(5.23)
What we have shown, at least formally, is that if we interpret a white-
noise-driven equation

(5.24)

by a sequence of equat.ions like (5.2), then the white-noise-driven differ-


ential equation is equivalent to a stochastic differential equat.ion given by
(5.25)
160 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

Again, we note the presence of an extra term fuu', which will be referred
to as the correction term.
We shall now state some convergence results concerning (5.13) and
(5.23) [Wong and Zakai, 1965a and b. 1966]. We need to define some types
of approximations {Wn(w,t) I to a Brownian motion W(w,t) as follows:
a.R.
AI: For each t, Wn(-,t) ~
n ...... oo
Wc,t). For each n and almost all w, Wn(w,.)
is sample continuous and of bounded variation on [a,T].
A 2: Al and also for almost all w, W neW,) uniformly bounded, i.e., for
almost all u;,
sup sup /Wn(w,t)/ < 00
n tE[a,bj

Aa: A2 and for each n and almost all w, W n(W,t) has a continuous deriva-
tive Wn(W,t).
A4: For each n, W n(w,t) is a polygonal approximation of W(w,t) defined
by
t- t·(n)
- W(w t·(n»] ___J _
, J t(n) _ Un)
J+I .1

l/n) S tS tj~J (5.26)


T and
max (t(n) - t·(n» ~ 0
j J+I J n ...... 00

Proposition 5.1. Let cp(x,t) have continuous partial derivatives cp'(x,t) =


(ajax)cp(x,t) and (ajat)cp(x,t) in - 00 < x < 00, a S t S b. Let
{Wn(w,t) I satisfy A 2 , then

Ja(b cp(Wn(w,t), t) dWn(w,t) ~


n-. Ja(b
00
cp(W(w,t) , t) dW(w,t)

+!2 }a(b cp'(W(w,t), t) dt (5.27)

Further, if cp(x,t) does not depend on t, then the conclusion holds


with Al replacing A 2 •

Proposition 5.2. Let m(x,t), u(x,t), u'(x,t) = (ajax)u(x,t), and u(x,t) = (aj
at)o(x,t) be continuous in - 00 < x < 00, a S t S b. Let m(x,t),
u(x,t), and u(x,t)u'(x,t) satisfy a unifOl"m Lipschitz condition, Le.,
if f denotes any of three quantities m, u, uu', then
/f(x,t) - f(y,t)/ s K/x - y/ (5.28)
Let {Xn(w,t), t 2': al satisfy (.5.2), and let {X(w,t), t 2': aJ satisfy the
5. WHITE NOISE AND STOCHASTIC CALCULUS 161

stochastic differential equation


dX(w,t) = m(X(w,t), t) dt + u(X(w,t), t) dW(w,t)
+ iu(X(w,t), t)u'(X(w,t), t) dt (5.29)
Let Xn(w,a) = X(w) = X(w,a), where X is independent of the
aggregate of differences {Wi - W a , t 2:: a} and EX2 < 00.
(a) If in addition, lu(x,t) I 2:: (3 > 0 and lu(x,t) I < Ku 2 (x,t), then
with {Wn(w,t)} satisfying Aa
a.s.
Xn(w,t) ~ X(w,t) a ~ t ~ b (5.30)
n--> 00

(b) If (Wn(w,t)} :-;utisfies A. and EX 4 < 00, then


<l.n!.
Xn(w,t) ~ X(w,t)
n--> 00
(5.31)
It should be mentioned that u symmetrized definition for stochastic
integrals has been proposed [Fisk, 1963; Stratonovich, 1966] for which
rules of ordinary calculus apply. Rewritten in terms of the Fisk-Straton-
ovich's integral, neither (5.13) nor (5.23) would contain an extra term.
However, this approach has the disadvantage that conditions which
guarantee the convergence of Fisk-Stratonovich's integral are less natural
and more difficult to verify than those of the stochastic integral. Further-
more, the martingale property of Ito's integral would be lost. As we shall
see in Chap. 6, an important application of the stochastic integral is in the
representation of likelihood ratios and filtering operations, and this appli-
cation depends on the martingale property. While these representations,
under suitable restrictions, can be reexpressed in terms of the Fisk-
Stratonovich integral, the resulting formulas will be considerably more
complicated.
Equations (5.27) and (fi.29) can be interpreted as expressions relating
a white-noise integral 1t cp(w,s)!;(w,s) ds to the stochastic integral
1t cp(w,s) dW(w,s) for the following two special cases:

1. cp(w,s) = If(W (w,s), s)


2. cp(w,s) = If(X(w,s), s), and X. is related to W. via a stochastic differ-
ential equation.
In general, l{J(w,t) may depend in a much more complicated way on {W.,
a ~ s ~ t}. The question arises as to whether it is possible to relate the
white-noise integral to the corresponding stochastic integral in the general
situation. This question has been resolved [Wong and Zakai, 1969].
Roughly speaking, the white-noise integral is equal to the corresponding
stochastic integral plus a correction term. If cp(w,t) is viewed as a functional
on {W(w,s), a ~ s ~ t}, then the correction term can be expressed in
terms of the Frechet differential of this functional.
162 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

In applications, differential equations driven by white noise fre-


quently appear in a vector form as follows:
Xt = m(Xt,t) + d(Xt,t){t (5.32)
where XI and mare n vectors, {I is a p vector of independent Gaussian
white noise, and d is a matrix of appropriate dimensions. There is no dif-
ficulty in extending the definition of stochastic integral to the form
lab t(w,t) dW(w,t)
where W t is a vector of independent Brownian motions, and tt is a matrix
provided that
lab LEI 'Pij(-,t) I dt < 2 00
,,",;

In terms of the extended definition of stochastic integral, stochastic differ-


ential equations in the vector form can be treated. Intuitively, it is highly
plausible that the white-noise equation (5.32) is equivalent to a stochastic
differential equation
(5.33)
The problem is to determine f and g. It was conjectured by Wong and
Zakai [1965a] that g = d, but

jk(X,t) = mk(x,t) + -1 ~~ aUkm(X,t) U!m(X,t) (5.34)


2 l ,m ax!
This has since been verified [McShane, 1969, 1970] under suitable
conditions.
As the final topic in this section, we briefly consider problems
arising in simulation. Suppose that we want to simulate a white-noise
differential equation
(5.35)
Roughly speaking, there is a time constant or a bandwidth associated
with the equation. While it is not clear how a bandwidth should be
defined, it clearly should be related to the maximum rate of change of
X t in some way. The following definition may be useful:
Im(x,t) I
B = sup [ + -------,--,-
u (x,t) ] 2

z,t 1 + Ixl 1 + Ixl2


Under assumption (4.5), this quantity is always finite. If Zt is a stationary
Gaussian process with a spectral density that is constant over a band-
width much greater than B, then it is intuitively clear that (5.35) can
6. GENERALIZATIONS OF THE STOCHASTIC INTEGRAL 163

be simulated by replacing tt by Zt. Hence, an analog and continuous-time


simulation of (5.35) can be achieved by implementing
X t = m(Xt,t) + a(Xt,t)Zt (5.36)
with a wide-band noise source Zt. Of course, this also simulates the
stochastic differential equation
dXt = m(Xt,t) dt + a(Xt,t) dWt + iaa'(Xt,t) dt (5.37)
The situation is less clear in discrete-time simulation. All depends
on the noise source. If one uses a random-number generator which
produces a sequence of independent Gaussian random variables Zl, Z2, . . . ,
then the difference equation
(5.38)
simulates (5.37) well, provided that we choose EZk 2 = ~ and ~ « liB.
Hence, (5.35) can be simulated by implementing (5.38). On the other
hand, suppose that the noise source is a wide-band noise generator with
bandwidth Bo » B. If we sample this noise source at a rate to permit a
faithful reproduction of this noise, we ,vould have to sample at 2Bo or
more. If we do this and produce a sequence Zl, Z2, . . . , then the dif-
ference equation
1
Xk+l = Xk + -B
2 0
m(Xk,tk) + a(Xk,tk)Zk (5.39)

is a good approximation to (5.35). The difference here is that Zl, Z2, ... ,
are no longer independent.

6. GENERALIZATIONS OF THE STOCHASTIC INTEGRAL


For a Brownian motion {Wt,cxd, we have defined the stochastic integral

I(<p,w) = lab <p(u.',t) dW(w,t) (6.1)

for integrands satisfying (1) <p jointly measurable in (w,t), (2) for each t
<Pt is at measurable, and (3) lab E/<pd
dt < 00. The stochastic integral
2

(6.1) can be generalized in two important directions. First, it can be


defined for integrands satisfying (1), (2), and instead of (3), the weaker
condition

a.s. (6.2)

Secondly, the Brownian motion {Wt,ad in (6.1) can be replaced by a


class of martingales {Xt,ad. In this section, we shall consider both these
generalizations and their applications.
164 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

Proposition 6.1. Let I Wt,G. t } be a Brownian motion and let ep(c..;,t) satisfy:
(a) ep is jointly measurable in (w,t).
(b) For each t, ept is at measurable,
(c) Jab
1r,c(w,t)12 dt < 00 almost surely.
Let epm be defined by

epm(w,t) = { ~(w,t) if 1t lep(w,t) 12 dt ::; m


(6.3)
otherwise
and let J(epm) denote the stochastic integral

J(epm) = 1b epm(w,t) dW(w,t)


Then, {J(epm) , m = 1, . . . } converges in probability as m ~ 00,

and we define

J(ep) = (b «-(w,t) dW(w,t) = lim in p. J(epm) (6.4)


Ja m~oo

Proof: Let epm be defined by (6.3). For each m, epm satisfies (2.3) and
(2.4) so that J(epm) is well defined. Now, for any w such that

1b lep(w,t)12 dt ::; min (m,n)

we have from (6.3)


sup lepm(w,t) - epn(w,t) I = 0
t

which in turn implies that 1b cpm(w,t) dW(w,t) 1b epn(w,t) dW(w,t). It


=
follows that for every e > 0,

<P (IJ(ep"") - J(epn) I ;::: e) :::; <P (Jab lept/ 2 dt > min (m,n») --;;::;;::;: 0
which proves that {I(epn)} converges in probability so that (6.4) is an
adequate qefinition for J(ep). I

Remarks.
(a) If l'Pn} is a sequence of functions satisfying conditions (2.3)
and (6.2), if lepm(w,t) I :::; lep(w,t)l, and if
epm -4
m---H.
ep in <P X £ measure (6.5)

then

(6.6)
6. GENERALIZATIONS OF THE STOCHASTIC INTEGRAL 165

(b) Now the process

Xt = 1t <p(w,s) dW (w,s) (6.7)

is no longer necessarily a martingale. Of course, a sufficient condition


for X t to be a martingale is precisely

(6.8)

However, this is not a necessary condition. If we define Tn(W) =


min t: 1t
<p2(W,S) ds 2: nand E'et 1n(W) = 00 if <p2(W,S) ds < n, 1b
then for each 11, X nt = X min (I,T n ) is a martingale. By definition X t is
said to be a local martingale [see, e,g., Kunita and Watanabe, 1967].
Next, we shall consider generalizations of the stochastic integral by
replacing the Brownian motion WI by a more general process. As a first
step in this direction, we shall replace W t by a process Z t satisfying the
following properties. Throughout, {<X t } again denotes an increasing family
of 0" algebras.
{ZI, <X t , a :::; t :::; b} is a martingale and EZ t 2 < 00 (6.9)
E(Zt - Z.)2 = Ea'(Zt - Z.)2 a.s. (6.10)
Let F(t) be a nondecreasing function so that
E(Zt - Z.)2 = F(t) - F(s) t 2: s (6.11)

Then, the stochastic integral

lab <p(w,s) dZ(w,s) (6.12)

is well defined for any <p satisfying


<p is jointly measurable in (w,t) and for each t, <Pt is <X t measurable
(6.13)
lab 1'P(w,s) 12 dF(s) < 00 a.s. (6.14)

The procedure for defining (6.12) is exactly the same as before and will
not be repeated.
The class of processes satisfying both (6.9) and (6.10) is still quite
restricted. In particular, if Z t is almost surely sample continuous, then
F(t) is necessarily continuous [for convenience, we set F(O) = 0] and Zt
can be expressed as
Zt = WF(t) (6.15)
where W t is a Brownian motion. Therefore, if we consider only sample-
continuous Zt, then the stochastic integral (6.12) is really the same as
166 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

what we already defined. The next step in generalizing the stochastic


integral is to get rid of the restriction (6.10). We begin with the following
result.

Proposition 6.2. Let {Z t, at, a .:::; t .:::; b} be a sample-continuous second-


order martingale. Then there is a unique decomposition

Zt 2 = ZIt + Z2t (6.16)


where {Z 2t, at, a .:::; t .:::; b} is a sample-continuous first-order mar-
tingale, and {ZIt, a .:::; t .:::; b} is sample continuous, nondecreasing,
with ZIa = o.

Remark: This proposition is a special case of the well-known super-


martingale-decomposition theorem of Meyer [1966, Chap. 7]. We
note that if Z t is a Brownian motion, then Z It is simply t.

Proposition 6.3. Let {Zt, at, a .:::; t .:::; b} be a sample-continuous second-


order martingale. Let {ZIt, a':::; t.:::; b} be defined as in (6.16). Sup-
pose that cp(w,t), w E Q, t E [a,b], is a jointly measurable function 1
such that for each t, CPt is at measurable and
(b 2
Ja CPt dZ It < 00 (6.17)

with probability 1. Then the stochastic integral

I(cp,w) = lab cp(w,t) dZ(w,t) (6.18)

is well defined by the following two properties:


(a) If cp is an (w,t)-step function, then

I(cp,w) = lv
CPv(w)[Z(w,tv+ I) - Z(w,t v )]

Remark: I t is clear that ZIt now plays the role played by l in the original
definition of stochastic integral.
If X t is of the form

Xt = lat few,s) ds + f cp(w,s) dZ(w,s) (6.19)

where the last integral is defined as in (6.18), then a transformation rule


similar to Proposition 3.2 holds once again. Let 1/;(x,t) be twice continuously
1 Here measurability in t refers to Borel measurability.
6. GENERALIZATIONS OF THE STOCHASTIC INTEGRAL 167

differentiable in x and once in t. Then


1{;(X t ,t) = 1{;(Xa ,a) + lat ¢/(X.,s) dX. + f !{(X.,s) ds

1 {t {I( ) 2 Z
+ "2}a if; X.,s /Ps d Is (6.20)

with probability 1 [Kunita and Watanabe, 1967].


Suppose that there exists a continuous and nondecreasing function
F(t), a ~ t ~ b, such that for almost all w, Zl(W,t) as a function of t
is absolutely continuous with respect to thc Borel measure generated by
F. That is, there exists an a.s. nonnegative function z(w,t) such that

a.s. (6.21)

If such an F exists it can always be taken +.0 be EZ lt , because (6.21)


implies that

Zl(W,t) = lo
a
t z(w,s)
- E d(EZ 1s )
zs
(6.22)

If ZIt has the representation (6.21), then Zt can be represented as a


stochastic integral
Z(w,t) = la Vz(w,s) dW(w,F(s»
t (6.23)

where IW., 0 ~ s ~ F(b) I is a Brownian motion. We note that (6.23) is


a stochastic integral of the type given by (6.12). Now, with (6.23) we
can rewrite any stochastic integral in terms of Zt,
(6.24)

Once again, \ve return to the basic definition of a stochastic integral in


terms of a Brownian motion.
As the final step in generalizing the stochastic integral, consider a
sample-continuous process I Xi, a ~ t ~ b I which has a decomposition
(6.25)
where Y t is almost surely of bounded variation, and Zt is a second-order
sample-continuous martingale. Clearly, we can define

lab /Pt dX t = lab /Pt dY t + lab /PI dZ t (6.26)

provided that the first integral exists almost surely as a Stieltjes integral
and the second as a stochastic integral. A process that can be decomposed
as in (6.25) was termed a quasi-martingale by Fisk [1965] who also gave
necessary and sufficient conditions for the existence of such a decomposi-
tion. U nfortunateiy, these conditions are not always easily verified.
168 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

As an example of applications of the generalized definition of


stochastic integral, we consider the following important representation
theorem due to Doob [1953, pp. 287-291].

Proposition 6.4.Let tXt, a :::; t :::; b} be a sample-continuous second-order


process. Let m(x,t) and u(x,t) be Borel functions of (x,t) satisfying
Im(x,t)1 :::; K y"1 +x 2 (6.27)
0:::; u(x.t):::; K~ (6.28)
Let Gt denote the smallest u algebra such that X s , s :::; t, are all
measurable, and suppose that {XI, a :::; t :::; b} satisfies the following
conditions:
(a) There exists {Z(w,t),a:::; t:::; b} suchthatZ/ 2:: O,EZ t < oo,and
sup ECJ.·X t2 :::; Z. (6.29)
t>.
(b) There exists a nondecreasing function f with lim f(h) = 0 such
ht O
that whenever a :::; t < t +h :::; b, we have with probability 1,

1 ECJ.'(X t+h - Xt) - fH (X.,s) dsl :::; hf(h)(1 + X (2)


m (6.30)

1 ECJ.'(X t+h - X )2 - fH u (Xs,S) ds I :::; hf(h)(1 + X(2)


t 2 (6.31)
Under these conditions, tXt, a :::; t :::; b} is a l\Tarkov process and
satisfies a stochastic differential equation
X t = Xa + it m(Xs.s) ds +f u(Xs,s) dW. (6.32)
where {W /, a :::; t :::; b} is a Brownian motion.

Remark: We have made no assumption that m and u satisfy a Lipschitz


condition. Without such an assumption, we cannot be sure that
there is a unique solution to (6.32). One possible consequence of
this is that the finite-dimensional distributions of {XI, a :::; t :::; b I
may not be completely determined by m, u, and the distribution
of Xa.

Proof: We shall give an outline of the proof. Let {Z/, a :::; t :::; b} be
defined by
Z/ = Xt - Xa - it m(X.,s) ds (6.33)
Because of (6.30), we can show that {Zt, G t , a :::; t ~ b} is a sample-
continuous martingale. Because of (6.27), it is also second order. Further-
more, if we define
Y/ = Z/2 - it u 2 (X.,s) ds (6.34)
7. DIFFUSION EQUATIONS 169

then because of (6.31), {Y t , at, a ~ t ~ b I is also a sample-continuous


martingale. Therefore, the process {Z It. a ~ t ~ b I defined by the de-
composition (6.16) is simply

Zll = 1t u 2 (X.,s) ds (6.35)

Clearly, ZII has the form of (6.21) with


F(t) = I (6.36)
and
z(w,t) = u 2 (X(w,t), t) (6.37)
From (6.23) we get

Zt = f u(X.,s) dW. (6.38)

Equation (6.38) combined with (6.33) yields (6.32). I

7. DIFFUSION EQUATIONS
In this section, we shall try to show that the transition probabilities of a
process satisfying a stochastic differential equation can be obtained by
solving either of a pair of partial differential equations. These equations
are called the backward and forward equations of Kolmogorov or, alter-
natively, diffusion equations. The forward equation is also sometimes
called the Fokker-Planck equation. The situation, however, is not com-
pletely satisfactory. As we shall see, the original derivation of Kolmogorov
involved assumptions that cannot be directly verified. Attempts in cir-
cumventing these assumptions involve other difficulties. We begin v.ith a
derivation of the diffusion equations following the lines of Kolmogorov
[1931].
Let {XI, a ~ t ~ b} be a l\Iarkov process, and denote
P(x,t/xo,to) = a>(XI < .r/X'o =, .ro) (7.1)
We call P(x,t/xo,to) the transition function of the process. If there is a
function p(x,t/xo,to) so that

P(x,t/xo,to) = !~'" p(u,t/xo,to) du (7.2)


then we call p(x,t/xo,to) the transition density function. Since {XI, a ::; t ::;
b I is a Markov process, P(x,t/xo,t o) satisfies the Chapman-Kolmogorov
equation
P(x,t/xo,to) = roo", P(x,tlz,s) dP(z,s/xo,t o) (7.3)

We now assume the crucial conditions on {X t, a ~ t ~ b I which make the


derivation of the diffusion equations possible. These conditions are very
170 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTfAL EQUATIONS

similar to conditions (6.30) and (6.31) which made it possible to represent


a process as the solution of a stochastic differential equation.
Define for a positive ~,

Mk(x,t; E,~) = ~Y-XIS;' (y - X)k dP(y, t + ~Ix,t) k = 0, 1,2


(7.4)
M 3(x,t; E,~) = ~Y-xls;.IY - xl 3 dP(y, t + ~Ix,t)
We assume that the Markov process {X I, a :::; t :::; b I satisfies the follow-
ing conditions:
1
-[1 - MO(X,t;E,~)]~O (7.5)
~ A!O
1
- M1(x,t; ~,~) ~ m(x,t) (7.6)
~ A!O

~M 2 (x,t; ~,~) ~ u 2 (x,t) (7.7)

1
-11l3(x,t;
~
E,~) ~
A!O
0 (7.8)
It is clear that if 1 - M o(x,t; E,~) ~ 0, then by dominated convergence,
A!O

<:p(IX I+A - XII > ~) = f-"'", [1 - M o(x.t; E,~)] dP(x,t) ~ 0


Therefore, (7.5) is considerably stronger than continuity in probability.
In addition, suppose that the transition function P(x,tlxo,t o) satisfies the
condition:
For each (x,t), P(x,tlxo,t o) is once differentiable in to and three-
times differentiable in xo, and the derivatives are continuous
and bounded in (xo,t o) (7.9)
Now we can derive the background equation as follows. Write the Chap-
man-Kolmogorov equation in the form

P(x,tlxo,to) = f-"'", P(x,tlz, to + ~) dP(z, to + ~Ixo,to) (7.10)

Because of (7.9), we can write, by virtue of the Taylor's theorem,


to + ~)
P(x,tlz, to + Ll) = P(x,tlxo, to + ~) + ap(x,tlxo,
axo
(z - xo)

+ -1 a p(x,tlxo, to + ~) (z - Xo )2
2

2 aX02

+ .!.6 aap(x,t!z,a to + ~) I
3 (z - Xo
)3
18 - xol :::; Iz - xol (7.11)
Z z=8
7. DIFFUSION EQUATIONS 171

Using (7.11) in (7.10) and using (7.4), we can write


P(x,tlxo,to) = (
llz-xol>.
P(x,tlz, to + !:J.) dP(z, to + !:J.lxo,to)
1 8k 1
+ kfo k! Mk(XO,tO; E.!:J.) aXok P(x,tlxo, to + !:J.) + "6 tz-xol~'
2

8 ap(x.tlz, to + !:J.) I (z - xo)a dP(z, to + !:J.lxo,to) (7.12)


8z a z= 8
This means that

I[
P(x.tlxo,to) - P(x,tlxo. to + !:J.)] __~ M ( . A)
1 xo,to, E,i.J.
!:J. !:J.
ap(x,tlxo, to _+ !:J.) _ ! -.! M 2(xo to· E!:J.)
8xo 2!:J.' , ,
a2p(x,tl:;~:0 + !:J.) I s; I [1 - M o~o,to; E,!:J.)]

+ Ma(xo,to; E,!:J.)
sup
Iaap(x,tlz, to + II a
!:J.)
(7.13)
6!:J. Iz -xol~. az
If we let !:J.1 0 and use conditions (7.5) through (7.8), (7.13) becomes
8 a
- - P(x,tlxo,to) = m(xo,to) - P(x,tlxo,to)
ato axo
a
+ io-2(xo,tO) -aX0
2

2
P(x,tlxo,to) a < to < t < b (7.14)

The "terminal" condition for (7.14) is

lim P(x,tlxo,to) = {~ x> Xo (7.15)


to r t x< Xo
Equation (7.14) is the backward equation of diffusion. The name is due to
the fact that it is an equation in the pair of initial variables (xo.to) moving
backward from t.
The forward equation can be derived in the following indirect way:
Let f(x), - 00 < x < 00, be a Schwartz function of rapid descent. That
is, f is infinitely differentiable, and for any k and m,
(7.16)

As we did in Chap. 3, the space of all such functions is denoted by S (cf.


3.5.6c). Define the function
J(tlxo,t o) = E[f(X t ) IX lv = xol
= f-"'", f(x) dP(x,tlxo,t o) (7.17)
172 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

Now, write

J(t + .llxo,to) = 1-"'", f(x) dP(x, t + .llxo,to)


= 1-"'", [/-"'", f(x) dP(x, t + .llz,t) ] dP(z,tlxo,t o) (7.18)

Since f is infinitely differentiable and satisfies (7.16), we have

f(x) = L-k!
2

k=O
1
j<k)(Z)(X - Z)k + H(3)(O)(X - z)3 (7.19)

Repeating the arguments leading to (7.14), we find

8J(i~;0,t) = 1-"'", [ m(z,t) d~~) + iu (z,t) dr;:) ]2 dP(z,tlxo,t o) (7.20)

Now, if P(x,tlxo,to) satisfies

For each (xo,t o), P(x,tlxo,to) is four-times differentiable in x


and once in t, and the derivatives are continuous in (x,t) (7.21)

and if u 2 (x,t) is twice continuously differentiable in x, and m(x,t) is once


continuously differentiable in x, then we have from (7.20) and integra-
tions by parts,

f-'"'" f(x) {~p(x,tlxo,to)


at
- 1 8 2 [u 2 (x,t)p(x,tlxo,t o)]
-2
ax 2

+~
ax
[m(x,t)p(X,t1xo,t o)]} dx = 0 (7.22)

Since (7.22) holds for all f E S, the quantity in the brackets must be
zero for almost all x, but being continuous, it must be zero for all x.
Therefore,
a 1 82
- p(x,tlxo,to) = - - [u 2 (x,t)p(x,tlxo,to)]
at 2 ax 2
8
- -
ax [m(x,t)p(x,tlxo,to)] b > t> to > a (7.23)

Equation (7.23) is the forward equation of diffusion, and is also called the
Fokker-Planck equation. The initial condition to be imposed is

f '"
-'"
f(x)p(x,tlxo,to) dx ~ f(xo)
I tlo
VfES (7.24)

that is, p(x,tolxo,to) = o(x - xo). A solution of (7.23), satisfying (7.24),


will be called its fundamental solution. Our derivation of the two equa-
tions of diffusion are now complete.
7. DIFFUSION EQUATIONS 173

If we view the two diffusion equations as possible means for deter-


mining the transition probabilities of the solution of a stochastic differential
equation, then the situation is still not entirely satisfactory. This is because
the diffusion equations have been derived under differentiability assump-
tions (7.9) and (7.21). If we don't know P(x,tlxo,t o), how do we know
whether it is differentiable the required number of times? This difficulty
is in part resolved by the following proposition.

Proposition 7.1. Let m(x,t) and u(x,t) satisfy the following conditions on
- 00 < x <
00, a S; t S; b:

There exist positive constants Uo and K so that

Im(x,t)1 S; K V~2
(7.25)
o < Uo S; u(x,t) S; K viI x2 +
There exist positive constants 'Y and K so that
Im(x,t) - m(y,t)1 S; Klx - YI'Y
(Holder condition) (7.26)
!u(x,t) - u(y,t) I S; Klx - YI'Y
Then, the following conclusions are valid:
(a) The backward equation

) i,)2P(x,tlxo,to) . ) ap(x,tlxo,to)
1 2(
2U xo,to
axo
2 + m (xo,to axo
a
= - - P(x,tlxo,t o) t > to (7.27)
ato
has a unique solution corresponding to condition (7.15). Further,
for t > to, P(x,tlxo,to) is differentiable with respect to x so we have
the transition density

a
p(x,tlxo,t) = - P(x,tlxo,t o) (7.28)
ax

(b) There exists a sample-continuous Markov process {X" a S;


t S; b I with transition function P(x,tlxo,t o).
(c) Conditions (7.5) to (7.8) are satisfied.
Cd) If m' (x,t) , u' (x,t), u" (x,t) satisfy (7.25) and (7.26), then p(x,tlxo,t o)
is the unique fundamental solution of the forward equation.
(e) If 'Y can be taken to be 1 in (7.26), then p(x,tlxo,to) is the transi-
tion density of the unique solution to the stochastic integral equation

(7.29)
174 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

Example 1. Suppose that {Xt, t ~ O} satisfies a stochastic differential


equation
(7.30)
Here, we have m(x,t) = -x and u(x,t) = V2(1 + x 2). Therefore,
the forward equation is given by
a2
-
a~
[(1 + x2)p(x,tlxo,to)] + -axa [xp(x,tlxo,t o)] = -
~
a
p(x,tlxo,t o) (7.31)

Because m and u do not depend on t, p(x,tlxo,t o) will depend only


on t - to and not on t and to separately. Furthermore, we can
rewrite (7.30) as
X t
= e-(H.)X to + V2 lto(t e-(H) VI + X B
2 dW B

and from this we expect that as t ~ <Xl, the conditional density


p(x,tlxo,t o) will approach the stationary density p(x). We also expect
that
ap(x,tlxo,to) ~ 0
at t-""

Therefore,
d2
-d [(1
X2
+ x 2)p(x)] + -ddX [xp(x)] = 0

or
d
- [(1
dx
+ x 2)p(x)] + xp(x) = constant

Because p'(x), p(x) ~ 0, the constant must be 0, and by direct


integration we get
1
p(x) = p(a) (1 + X2)t (7.32)

By requiring 1_«>«> p(x) dx = 1, we find peO) = t·


The above procedure illustrates how (7.31) is very frequently used,
namely, to find the stationary density when it exists. Actually, for this
example, (7.31) can be solved completely. If we set

U
( x,t,.,xo,to) -_ sinh- 1 x_ /- sinh- 1 Xo
2 v t - to
7. DIFFUSION EQUATIONS 175

then it can be shown that p(x,tlxo,to) is given by

p(x, tlx o , to) = 1 3{ 1 e-u'e-(t-to)


2(1 + X 2 )2 V-rr ( t - to)

+ ~ ju+.jt-toe- z '
,;;; u- .jt- to
liz} (7.34)

Example 2. Suppose that {Y" t ~ O} and {Z" t ~ O} are two independent


standard Brownian motion processes. Let
(7.35)
We shall use Proposition 6.4 and show that X, is Markov and derive
its stochastic differential equation. Let <B, denote the smallest u
algebra such that Y., Z., 8 ~ t are all measurable. Let (it denote the
smallest u algebra with respect to which X., 8 ~ t, are all measurable.
Clearly, for each t, <B, :) (i,. Now, for h > 0,
EIJ,(X,+1r. - X,) = EIJ'ECR.(X,+" - X,)
= EIJ'E(l\,{ (Y t+,,2 - Y,2) + (Z'+Jr.2 - Z,2)}
= EIJ'(h h) = 2h +
Similarly, we find that
EIJ,(X,+Jr. - X,)2 = 8h 2 + 4hX,
Clearly, the hypotheses of Proposition 6.4 are satisfied with
m(x,t) = 2 (7.36)
u 2 (x,t) = 4x (7.37)
Therefore, {X" t ~ O} satisfies the stochastic differential equation
dX, = 2dt + VX,dW, (7.38)
We note that u does not satisfy a uniform Lipschitz condition. How-
ever, it turns out that (7.38) has a unique nonnegative solution for
Xo = 0 anyway [McKean, 1960,1969].
According to Proposition 7.1 the conditional density for {X" t ~ O}
must satisfy the Fokker-Planck equation
2a a a
-2 [xp(x,tlxo,to)] - 2 - p(x,tlxo,to) = - p(x,tlxo,to) (7.39)
ax ax at
This equation has a unique fundamental solution given by

p(.r,tlxo,to) =
(t -
1
to
) exp - -1 ----
2 t - to
(x + xo) 10 (V- -no)
-
to t -
(7.40)
176 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

which can be verified by direct computation. In this example p(x,tlxo,t o)


can also be obtained directly from the definition for XI.
Finally, we shall state, but not derive, the diffusion equations in the
multidimensional case. Let I WI, a ::; t ::; b} be a vector, the components
of which are independent Brownian motion processes. Let X be a vector
random variable independent of IW " a ::; t ::; b}. Under appropriate
conditions on m and d, existence and uniqueness of the solution to the
vector stochastic integral equation

Xt = X + 1t m(X ,s) ds + 1t d(X.,s) dW.


8 (7.41)

can be established. The solution lX" a ::::; t ::; b} IS a vector Markov


process in the sense that
(7.42)
whenever E is a Borel set and s ::; t. Furthermore, with probability 1,

lim -hI E[(XI+h - Xt)IX/] = m(X/,t) (7.43)


hlO

lim ~ E[(XI+h - Xt)(X ,+h - X,VIX I] = d(X"t)dT(X"t)


hlO h
(7.44)
Suppose that IXI, a ::::; t ::; b} is a vector Markov process satisfying
(7.43) and (7.44). Then under some additional conditions similar to the
scalar case, the conditional density p(x,tixo,to) can be shown to satisfy a
pair of partial differential equations

and
1 \'
- ~
a 2
- - [(:Jij(x,t)p(x,tlxo,t o)] -
\'
~
a
-- [mi(x,t)p(x,tlxo,to)]
2 y.. a~a~ I
. a~

= ata p(x,tlxo,to) (7.46)

with initial condition p(x,tolxo,to) = o(x - xo). Under some additional


conditions, IXt, a ::; t ::; b} can also be shown to satisfy a stochastic dif-
ferential equation of the form (7.41). Therefore, if m and d are such that
(7.41) has a unique solution, and (7.46) has a unique fundamental solution
that is a density function, then the proce:-;s lX" a ::; t ::; b} satisfying
EXERCISES 177

(7.41) must have a density function which satisfies (7.46). There is a


major difference in the vector case, however. The best result that is known
on the uniqueness of solution for (7.46) requires that m and d be bounded,
which is a strong assumption. In practice there is an abundance of examples
to show that this boundedness condition is not essential. However, the
situation remains not completely clear.

EXERCISES
1. Let 'P satisfy (2.3) and (2.4). In addition, assume that its derivative q; is continuous
on [a,b J. Show that

lab q;(w,t)W(w,t) dt + lab <p(w,t) dW(w,t) = 'P(w,b)W(w,b) - <p(w,a)W(w,a)

almost surely. (Hint: Let {W n I be a sequence of step approximations to the


Brownian motion Wand consider

'PbWb - 'PaWa - lab q;,Wnt dt)


2. Suppose that Z, satisfies

Z, = 1 + lot Z<'PB dW,


Show that Z, is given by

Zt = exp (lot 'P. dW. - ~ lot <P.' dS)


(Hint: Consider In Z" and use Ito's differentiation formula.)

3. Let X, satisfy the stochastic differential equation

dX, = X,[m(t) dt + cr(t) dW,j t >0


where m and cr are nonrandom, i.e., independent of w. Show that X" t ;::: 0 has
the form

Xt = Xu exp [lot cr(s) dW. + lot f(s) ds J


and findf.

4. Let X, satisfy a stochastic differential equation


dX, = m(X"t) dt + cr(Xt,t) dW , t>o
where m and cr satisfy conditions given in Proposition 4.1. Show that

EX t 2 ::; 3 [ EX o' + K(l + t) lot (1 + EX.') ds J


From this inequality, show that if EX 0' < 00 then there exist finite constants A
and a such that
EX,' ::; A e''''
178 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

5. Let ~(x), - 00 < x < 00, be a real-valued Borel function with bounded-continuous
second derivative ~", and let {W" t ? 0 I be a standard Brownian motion. Show
that

X, = ~(W,) - ~ (t ~"(W.) ds
2 Jo
is a martingale. Thus, if ~" ? 0 (~" ~ 0) then ~(W,) is a submartingale (respec-
tively, supermartingale).

6. Let ~ be as in Exercise 5, and set

x, -
- [XIt]
X 2,
- [ W, ]
- ~(W,)

Show that X, is the solution of a vector stochastic differential equation


dX, = f(X,) dt + g(X,) dW,
where

and

Verify, both directly and by using (.5.34), that X, can also be considered to be the
solution of a white-noise equation

7. Let W k " k = 1, . . . , n be independent standard Brownian motion processes.


Define

X, =~ Wk,2
k=l

Show that X, satisfies a stochastic differential equation, and find this equation.
Is X, Markov?

8. Consider a white-noise equation


t >0
where X, is interpreted as the state at t, Jl., is the control at t, r, is a standard Gaus-
sian white noise, and b, u are constants.
(a) For open-loop control, Jl.t can depend only on the initial state X 0 and t. Find
the open-loop control Jl.(X o,t) which minimizes EX T2.
(b) For linear closed-loop control, we permit Jl., to take on the form Jl., = a(t)Xt.
Find the function aCt) which minimizes EX T2.

9. Consider the forward equation (7.23) for the case where m and u 2 are functions
only of x and not t, that is,
a 1 a2 a
- p(x,tixo,to) =-- [u'(x)p(x,tixo,to)] - - [m(x)p(x,tixo,t o)]
at 2 ax' ax
EXERCISES 179

Let w be any positive solution of the equation


1 d
-2 - [q2(X)W(X)] = m(x)w(x)
dx

Show that p(x,tlxo,to) can be written in the form

p(x,tlxo,to) = w(x) fA c h (t-t O)<p).,(x)1/;).,(xo) l'(dX)

where A is a Borel subset of the real line, I' a Borel measure, and <p)." 1/;)" are both
solutions of the Sturm-Liouville equation

-1 -d [ q2(X)W(X) -
2 dx
df(X)]
-
dx
+ Xw(x)f(x) = 0

10. For q2(X) = 1, m(x) = 0, we have the Brownian motion case and

p(x,tlxo,to) = V. 1
211"(t - to)
exp [ - -21 (~t -=- ~0;2]0

Show that p(x,tlxo,t o) has the form prescribed by Exercise 9. In particular,

p(x,tlxo,t o) = ~
211"
! 00
-.,
e-h'(t-to)eip(x-xo) d"

11. Suppose that q2(X,t) = (3(t) and m(x,t) = a(t)x. Show that the fundamental solu-
tion of the Fokker-Planck equation has the form

p(x,tlxo,to) = V 211"a1 2( t, to) exp {- ----z--(


1) [x - b(t,to)XOP}
2a t,to

How are a 2(t,to) and b(t,to) obtained?

12. Suppose that q2(X,t) = 1 and m(x,t) - 8gn x where sgn x is + 1 or - I according
as x ;::: 0 or x < o.
(a) Find the limiting density p(x) = lim p(x,tlxo,t o).
t-+ .,

(b) Find p(x,tlxo,to).


5
One-Dimensional Diffusions

1. INTRODUCTION
ThiR chapter iR an introduction to the semigroup treatment of Markov
processes with stationary transition functions. Modern theory of Markov
processes is primarily a semigroup theory. Even though much of this
theory has not found its way into applications in physical problems, the
elucidation that is made possible with the semigroup approach makes
it indispensable in any treatment of Markov processes.
Consider a Markov process {X(w,t), t E [0,00») defined on a proba-
bility space (n,a,cp). We assume that the transition function
cp(X( < biX. = a) = P(b,tla,s) t> s
depends only on t - s and not on t and s separately. We call such transi-
tion functions stationary transition functions. A process with a stationary
transition function need not be a stationary process. For example,
Brownian motion has a stationary transition function, but is not a
stationary process.
It is rather important that the set of values that X(w,t) can assume be
180
1. INTRODUCTION 181

explicitly stated. We assume that this set is an interval S which we shall


call the state space. The interval S can be finite or infinite, closed or
open at either end. Let (ft denote the q algebra of Borel sets in S. That is,
(ft is the smallest q algebra containing all subintervals of S. For E E (ft,
we denote
(1.1)
and call {Pa(E,t), a E S, E E (ft, t E [O,oo)} the transition function.
Since {XI, 0:::; t :::; oo} is Markov, Pa(E,t) satisfies the Chapman-
Kolmogorov equation, viz.,

(1.2)

If a transition density Pa(b,t) exists for t > 0, that is, if P a(E,t) can be
written as

P a(E.t) = IE Pa(b,t) db EE(ft


t> ° (1.3)

then Pa(b,t) satisfies

Pa(b, t + s) =Is Pa(X,s)Px(b,t) dx (1.4)

PaCt) is known for °: :; t :::;


Equations (1.2) and (1.4) reveal the basic theme of this chapter. If
E, then PaCt) for any t can be determined

of Pa(',t) for t """ °


by iterating (1.2), no matter how small E is. Therefore, the knowledge
suffices to determine PaCt) for all t. It turns out
that under suitable continuity conditions and with proper interpreta-
tion, Pa(·,o) and (ajat)Pa(',t) 11=0 completely determine P~Ct) for all t.
Furthermore, since PaCO) is always the same, viz.,

if a E A
Pa(A,O) = {~ if a E A

the behavior of Markov processes with stationary transition functions


depends entirely on (ajat)p,,(',t) 11=0' The goal is to deduce everything
about the behavior of a Markov process from (ajat)PaCt) 11=0' We note
that this is very much the same motivation underlying the derivation
of the diffusion equations (backward and forward equations of Kolmo-
gorov). The only difference is that in the stationary case, the setting is
much more general, and the answers are more clear cut. Of course,
when we say "everything about the behavior of the process" we mean
everything up to an arbitrary distribution of X 0, since we only make
use of the transition functions. In this sense, we do not distinguish
182 ONE-DIMENSIONAL DIFFUSIONS

between processes differing only in the distribution of X o. For example,


we shall define in this chapter a Brownian motion as any process {X"
t ;::: 0 I such that {X t - X 0, t 2:: 0 I is a Brownian motion in the earlier
sense. That is, here a Brownian motion is any Markov process with
transition density
1
Pa(b,t) = _ _ e-(1/2t) (b-a)' (1.5)
V27rt
Incidentally, this broadening of the definition of Brownian motion also
avoids the awkwardness in interpreting Pa(E,t) as
P a(E,t) = <p(Xt E EIX 0 = a)
when our earlier definition specified X 0 = 0, a.s.

2. THE MARKOV SEMIGROUP


Let tXt, 0 :::;; t < 00 I be a Markov process with a stationary transition
function P a and continuous in probability. We shall adopt the notations
EaZ = E(ZIX 0 = a) (2.1)
and
<Pa(B) = <p(BIXo = a) (2.2)
Because of the continuity in probability, for every open set E containing a,
lim <Pa(Xt E E) = lim Pa(E,t) = 1 (2.3)
t!O t!o
Let B denote the space of all complex-valued bounded functions
defined on the state space S and measurable with respect to the Borel
u algebra <ft. With the usual norm

11111 = sup If(x) I (2.4)


xES

the space B is a Banach space. For each t E [0,00), we define an operator


H t by
(2.5)
Since

H t maps B into B and is a contraction, that is,


IIHtili ~ IIfll (2.6)
2. THE MARKOV SEM/GROUP 183

Because of (2.6), for each t, H tis continuouE', that is,


IIfn - fll ->
n~oo
0 =} IIHdn - Hdll ->
n~QQ
0 (2.7)
A sequence Un} in B is said to converge strongly to f if Ilfn - fll -n-+oo
> 0,

and we shall use the notation


slim fn = f (2.8)

The one-parameter family of operators {Ht, 0 :::; t < oo} that we


have defined has some additional important properties. First and fore-
most, is the semigroup property. From the Chapman-Kolmogorov equa-
tion (l.2), we can write

(Ht+.f)(a) = Eaf(X t+.) = Isf(x)Pa(dx, t + s)


= Is f(x) Is Pb(dx,t)Pa(db,s)
= f~ (Htf) (b)Pa(db,s) = (H.IJtf)(a)
Therefore,
H t+. = H.Ht = HtH. (2.9)
In addition, because of (2.3) we also have
(Hd)(x) ~ f(x) (2.10)

at every continuity point of f. Roughly speaking, (2.9) and (2.10) imply


that H t must be of the form
Ht = etA

where A is necessarily given by A = (d/dt)H t It=o' Thus, the first step is to


define (d/dOH t It=o'
Let j)A denote the set of all functions f in B such that the limit
slim! (H,f - f)
fLO t
exists. This limit defines a linear operator A mapping j)A into B, that is,

Af = slim! (H,f - f) (2.11)


fLO t
The operator A is called the generator of the semigroup and of the Markov
process. If:DA = B and if A is a bounded operator (i.e., there exists finite
K such that IIAfl1 :::; Kllf!1 for all fEB), then we truly would have

Ht = etA
~ ,An
== 1.. tn
n=O n.
184 ONE-DIMENSIONAL DIFFUSIONS

However, this happens only in a few relatively uninteresting cases, for


example, when S contains only a finite number of points. In the general
case:DA r6 B, A is unbounded and etA is not well defined, The main goal of
this section is to show that if a Markov process is continuous in probability
and has a stationary transition function, then its transition function is
uniquely determined by its generator.
We note that if I is such that Hd E :DA, then
d
- Rd = AHd (2.12)
dt
and this is a version of the backward equation. To see that this is the case,
let IE denote the indicator function of E and suppose that HtIE E :DA for
t > O. Then
(RtlE)(a) = Pa(E,t)
and
a
- Pa(E,t) = (AP.(E,t»(a) (2.13)
at
As we shall see a little later, for a Brownian motion H dEE :DA for t > 0
and
(AI)(a) = ~ d 2/(a)
2 da 2
Therefore, for a Brownian motion,
a 1 (j2
at Pa(E,t) = "2 (ja 2 Pa(E,t) t> 0
which is just the backward equation for that process.
The procedure of constructing the semigroup {Ht, 0 :::; t < oo}, or
equivalently, determining the transition function, from its generator A
involves in an essential way the resolvent. First, define Bo as the set of all
functions I in B such that
IIHd - III t""T6 0 (2.14)

It is clear that Bo ::> :DA. It turns out that :DA is dense in Bo. That is, every
lEBo is the strong limit of a sequence from :DA. To show this, take any
i E Bo and set
{lin
in = n}o H.lds (2.15)

Then, for n = 1, 2, . . . ,
8 lim (Hdn - In) = n(H l/nl - f)
flO
2. THE MARKOV SEMIGROUP 185

so that fn E :DA for each n. Furthermore,

I!f.. - fil = It n J({olin (Hd - f) dt II:::; O:5t:5l/n


SUp IIHd - fll -n-...• 0

Therefore, :DA is dense in Bo. For f E Bo and X > o. We define


(RAJ) (a) = 10'" e-Xt(Hd) (a) dt (2.16)
The family of operators {RA' 0 < X < oo} is called the resolvent of the
semigroup {HI, 0 :::; t < oo}. The importance of RA is that it is simply
related to the generator A.

Proposition 2.1. For every g E Bo and every X > 0, RAg E :DA. Further-
more, f = RAg is the unique solution to the equation
Ai - Af = g f E:DA (2.17)

Proof: We shall sketch a proof with some details omitted. First, we verify
that RAg E :DA by computing

t1 (HtRAg - RAg) = t1 Jo{'" e-h8 (H +8g -


t Hsg) dB

= ~ (e ht 1'" e-X8 H 8g dB - !o '" e-h8H 8g dB)


= _eht -t1 Ict0 e- h8H g dB
8
+ -t1 (eXt - I)R hg •'0
~t
• -g

where - • denotes strong convergence. Therefore,


ARAg = -g + XRAg
and
xRAg - ARAg = g
Therefore, f = RAg is a solution to (2.17). Next, we prove that it is the
only solution.
Suppose that f1 and f2 are two solutions to (2.17). Set
tp = f1 - h
Then cp E :DA and
XCf - Atp = 0
Therefore,
186 ONE-DIMENSIONAL DIFFUSIONS

and
e-XtHt<p = H o<P = <P
It follows that
o ::; 1\<p11 = e-XtIIHt<p1\ ::; e-Xtll<pl! ~ 0

so that 11<p11 = 0, which proves uniqueness. I


Proposition 2.1 shows that the mapping Rx: Bo ~ :DA is one to one
and onto. If we denote the identity operator by I, then we have
Rx = (AI - A)-l (2.18)
and
(2.19)
By using R x, we can now construct the semigroup {HI, 0 ::; t < 00 I
from A.

Proposition 2.2.Let {Xt, 0 ::; t ::; 00 I be a :Markov process with a sta-


tionary transition function Pa(E,t), and let {Xt, 0 ::; t < 00 I be
continuous in probability. Then its transition function, equivalently,
its semigroup {H t , 0 ::; t < oo}, is uniquely determined by its
generator A.

Proof: First, we note that


ARxf = A 10'" e-XtHd dt = 10"' e-tHt1xf dt
so that II ARxfl\ ::; IIfll and

IIARxf- fll = 1110"' e-t(Httxf- f) dtl'


which goes to zero as A~ 00 by dominated convergence. Therefore,
(2.20)

N ext, define Ax by
Ax = AARx (2.21)
From (2.19) we have
IIAdli = AIIARdl1 = AIIARA! - fll ::; AIIARdl1 + AII!II ::; 2AII!1I
(2.22)
so that Ax is a bounded operator. We can define
2. THE MARKOV SEMIGROUP 187

Given A, we now determine {H t , 0 ~ t < oo} as follows: For


f E ~A we can show that
Htf = s lim etA>.! (2.23)
A---+ 00

For f E Bo, let {fn} be a sequence from ~A converging strongly to f.


Then
H,f = slim Htfn
n---+ 00

For any bounded-continuous f, we set

jn = n Io lln HJ ds (2.24)
Then, fn E Bo for each n, and
(2.25)
for each x E 8 and each t 2:: O. Finally, we note that
f(u,x) = e;uX

IS a bounded-continuous function in x for each real u. Therefore, the


characteristic function
EaeiuX, = Fa(u,t)
is uniquely determined by the generator A. This in turn implie8 that the
transition function PaCE,t) is uniquely determined by A. What need to
be shown are (2.23) and (2.25).
To prove (2.23), we note that
d
dtHd = AHd

and

Therefore,

//He! - etA >.!1I = II j; (AHa! - A>.elA>.f) ds II


~ II lot (A - Ax)H.f dS\! + II AA Jot (H. - e·A>.)f ds \I
~ lot II (A - AA)H .fll ds + lot IIAx(H. - elAA )!1I ds

If we set 'Pt = lot IIH.! - e'A>.fll ds and make use of (2.22), we find

rPt ~ 2'A.'Pt + lot IICA - A>.)H.JII ds (2.26)


188 ONE-DIMENSIONAL DIFFUSIONS

or

:t (e- 2}..tcpt) :::; e- 2}..t lot II (A - A}..)H.t1l ds

By direct integration, we get


_ -1 e- 2}..t
2X

lot II(A - A}..)H.f11 ds + 21X lot e- }..81!(A


2 - A}..)H.f11 ds (2.27)

Combining (2.26) and (2.27), we find

o :::; IIH t f - etA}..fll = <Pt :::; lot e- 2}..(t-B) II (A - A}.,.)H .fll ds ~ 0


which proves (2.23).
To prove (2.25), we write
(H,fn - Hd)(a) = Ea!n(X t) - Ea/(X t)
= Is PaCdx,t) [fn(x) - f(x)]
= Is Pa(dx,t) n IOl/n [E,J(X.) - f(x)] ds

Because {X" 0 :::; t < 00 I is continuous in probability and f is bounded


continuous,
I(Hdn)(a) - (Hd)(a)/ ~ 0
n-+ ""

by dominated convergence. I
Example. As an example, we shall derive the generator for a standard
Brownian motion. We recall that a Brownian motion has a transi-
tion-density function given by
1
Pa(:t,t) = - - e-(1/2t)(;r-a)' (2.28)
V27rt
Let f be any fUllction in B with a bounded-continuous second deriv-
ative f", and let e 2 denote the set of all such functions. Then,

-1 [Ea/(X t ) - f(a)] = -If"" - .


1- e-(l/2t)(x-a)
•[f(x) - f(a)] dx
t t -00 yl2;t
By using the Taylor's theorem at a, and by making a change in
the variable of integration, we get

!t [Ea/(X,) - f(a)] = fd"(a) + ! /"" _~ [1"(0) - f"(a)]e- ht dz


2 -"" v27r
2. THE MARKOV SEMIGROUP 189

where () lies between a and a + Vi z. Because f" is bounded con-


tinuous, we get
(Af)(a) = H"(a) (2.29)
for all f E e 2 , and we have also shown that 5)A ::> e 2 • What is more
difficult is to determine Bo and 5)A.
If we take the Laplace transform of (2.28), we get

to
OO
e-
A
tPa
(
x,t) dt =
exp (- ~
_ r:::
~2A
Ix - al) O<A< 00 (2.30)

Hence,

(Rx.f)(a) = /00 exp (- ~ Ix - aD f(x) dx (2.31)


-00 ~

For any bounded f, the right-hand side of (2.31) is a bounded-


continuous function. If we denote the set of all bounded-continuous
functions bye, then we have
5)A = RxBo C e
Since Bo is the closure of ~A under uniform convergence, we must
also have
Bo C e
On the other hand, for any bounded-continuous f

so that Bo ::> e. Hence, Bo = e. Now the only thing left to do is


to find 5)A.
For fEe = B o, let g = Rxf. Then

g(a) = Ju [f_OOoo exp (- vU Ix - al)f(x) dx ]

g'(a) = - loo exp [- v'2A (x - a)lf(x) dx

- f~oo exp [+ vU (x - a)lf(x) dx


g"(a) = -2f(a) + 2Ag(a)
Therefore, g" E e, that is, g E e 2 • We have shown that ~A C e 2 •
But earlier we showed that 5)A ::> e 2• Hence, ~A = e 2• To sum-
marize, for a Brownian motion we have
Bo = e (2.32)
190 ONE-DIMENSIONAL DIFFUSIONS

and
Af = if" (2.33)
Further, for every t > 0, Pa(X,t) is e 2 in a so that
a 1 a 2
at Pa(X,t) = "2 aa 2 Pa(X,t)
which is the familiar backward equation for Brownian motion.

3. STRONG MARKOV PROCESSES


Let {Xt, 0 ~ t < oo} be a separable Markov process, continuous in
probability, and with a stationary transition function P .. (E,t). Let at
denote the smallest q algebra with respect to which tXT r ~ t} are all
measurable. Then the Markov property can be stated as follows:
a.s. (3.1)
for all 8 ;;::: O. Often, we state (3.1) verbally as "future and present given
the past and present depends only on the present." In (3.1) the present
is a time t which is fixed. Roughly speaking. a strong Markov process
satisfies (3.1) even when the present is a suitably restricted random time,
i.e., it varies from sample function to sample function.
We assume from now on that all processes are sample right con-
tinuous with probability 1. Indeed, we shall shortly specialize to sample-
continuous processes. With {at, 0 ~ t < oo} defined as before, a non-
negative random variable r is said to be a Markov time if for every
t> 0,
(w: r(w) < t} Eat (3.2)
This means that if we observe a sample function X.(wo) on the interval
o~ 8 ~ t, we can always determine where r(wo) < t or r(wo) ;;::: t. How-
ever, we cannot always determine whether 7 (wo) = t or not. It is clear
that deterministic times are always Markov times. Another important
class of Markov times are the first passage times (for level a) defined by
'fa(w) = min It: Xt(w) = a} (3.3)
To show that 'fa is a Markov time, we write

Iw: 'fa(e.:) < t} = 0 {w: 'fa(W) ~


n=l
t - ~}
n

= 0 {w: X.(w)
n=l
= a for 80me 8 in [0, t - !1}
n
3. STRONG MARKOV PROCESSES 191

Since for each n, the set {w: X.(w) = a for some sin [0, t - lin]} is in ai,
{w: Ta(W) < t} Eat
Now, let S be the state space of the Markov process, and let a be in
the interior of S. We define
Ta+ = lim Tb (3.4)
bla
Ta- = lim Tb (3.5)
bra
and these are also Markov times. We should again note that if T is a
Markov time, the set {w: T(W) = t} need not be in at. For example,
neither {Ta+ = t} nor {Ta- = t} is necessarily in at.
Let T be a Markov time. We define the u algebra a r + as follows:
if and only if E ("\. {"-': T(W) < t} Eat
It is obvious that T is ar + measurable. If T = to is a deterministic time,
then

Thus, we see that if T represents the present, then ar + is a little bit more
than the past and present.

Definition. tXt,
every Markov time
°: ; t < OC)} is said to be a strong Markov process if for
T,

(3.6)
Every strong Markov process is Markov in the ordinary sense. This is
because if (3.6) is satisfied, then
<p(X t+. E Ela t ) = E d' <P(X t+
8 E Ela t +)

= Ed'PX,(E,s) = Px,(E,s) a.s.


which is just the ordinary Markov property.
For an example of a Markov process which is not a strong Markov
process, consider the following:
Xt(w) = max (0, t - T(w» o::;t< 00, (3.7)
where T(w) is a nonnegative random variable with
fJ'(T < t) = I - e- t (3.8)
This process is obviously Markov in the ordinary sense because:
1. Given XI = a > 0, X t+8 = a + s with probability I
192 ONE-DIMENSIONAL DIFFUSIONS

2. Given X t = 0, X. must be zero for s ~ t so that it provides no further


information
Now, the random variable T is a Markov time for this process because
{w: T(w) ~ t} = {e.;: Xt(w) = O} E G t (3.9)
Given T, X T +8 = s with probability 1. Therefore,
x> s
(3.10)
x~s

On the other hand,


PXr(x,s) = Po(x,s) = (p(X t+ 8 < xlX t = 0)
= e- t + }o
( min(t,.)
e-(t-lI) dy (3.11)
Obviously, (3.10) and (3.11) are not the same, so tXt, 0 ~ t < oo} cannot
be a strong Markov process.
There are two extremely useful criteria for determining whether a
Markov process is also a strong Markov process:

1. If (3.6) is satisfied for the following classes of Markov times, then the
process is strongly Markov:
1 = Ta aE8
T = Ta+ a E int (8)
T = Ta- a E int (8)
2. If for every t ~ 0, the operator H t maps bounded-continuous func-
tions into bounded-continuous functions, then the process is a strong
Markov process.

Processes satisfying (2) are called Feller processes. For example, a


Brownian motion is a Feller process, hence, it is a strong .Markov process.
For strong lVlarkov processes with continuous-sample functions,
the generator A is a local operator, that is, Aj at a depends only on j
in a neighborhood of a. This fact can be deduced from the following
proposition.

Proposition 3.1. Let tXt, 0 ~ t < oo} be a strong Markov process. Let
T(W) be a Markov time.
(a) Let j E Bo and define
u>.(a) = (R>.f)(a) = Ea 10" e->.tj(X t) dt (3.12)

Then
(3.13)
4. CHARACTERISTIC OPERATORS 193

(b) Let 9 E :DA, and let ET < 00, then


Ea !o~ (Ag)(X t) dt = Eag(XT) - g(a) (3.14)

Remark: Equation (3.14) is generally known as Dynkin's formula, even


though both are due to Dynkin [1965, pp. 132-133].

Proof: To prove (3.13), we only need to show


Ea t"" e-Xtf(Xt) dt = Eae-XTux(XT)
which can be done as follows:
Ea t"" e-Xtf(X t) dt = Ea 10 "" e-X(t-T)f(Xt+T) dt
= K.1o"" e-x(t+T)E[f(X t+T) [aT+] dt
= Ea 10"" e-X(t+T)ExT[f(X t)] dt
= Eae-XTuA(XT)
To prove (3.14), set fA = (A - A)g, so that
9 = (A - A)-ifx = Rxfx
From (3.13), we have
g(a) = Ea faT e-Xt[(A - A)g](X,) dt + Eae-xTg(XT)
which becomes (3.14) as A~ 0, provided that EaT < 00. I
Equation (3.14) reveals the local character of the generator A when
the process is sample continuous. To see this, let a E int (8), and let
(3.15)
Then, starting from a at t = 0, X, E (a - ~, a + ~), 0 :::; t < T. From
(3.14) we have

(Ag)(a) = lim Eag(XT~ - g(a) (3.16)


do EaT
where T is given by (3.16). It is easy to see that the right-hand side of
(3.16) depends only on 9 in a neighborhood of a. It turns out that under
some additional assumptions we can show that A is always a differential
operator. This will be taken up in the next section.

4. CHARACTERISTIC OPERATORS
For the remainder of this chapter we shall restrict ourselves to processes
tXt, 0 :::; t < 00 I satisfying the following conditions:
Every point in the state space 8 is reachable from every point in
194 ONE-DIMENSIONAL DIFFUSIONS

int (S) with a nonzero probability, that is,


<pa(Th < 00) >0 for every a E int (S) and every b E S (4.1a)
IXt, 0 ::; t < oo}
is a strong Markov process with a stationary
transition function. Starting from every point in S, every sample
function is continuous with probability 1 (4.1b)
We observe that if (4.1a) is not satisfied, then S can be decomposed
into sets, such that starting from a point in a set, X t remains in that set
for all t. In that case, the process can be decomposed into separate
processes. Together, C4.1a) and (4.1b) imply that the process IS a Feller
process. We recall that this means the following: If f is bounded and
continuous (f E C), then H,f E C. In this case it is sufficient to consider
the semigroup IH t , 0 ::; t < oo} as acting on C rather than B. This is a
great convenience, and we assume that this is done for the remainder of
this chapter.
Let Ta be the first passage time at the point a, defined as before by
Ta = min (t: X t = a) (-1-.2)
Let (a,b) be an open interval such that its closure [a,b] belongs to the
state space S. Define the first exit time from (a,b),
Tab = min (ta,tb) == ta 1\ tb (4.3)
We have the following result on Tab.

Proposition 4.1. Under assumption (4.1) we have


sup E;rTab < 00 (4.4)
as;xS;b

Proof; Under assumption (4.1), <Pa(Tb < 00) > 0, so that for some
t < 00,
<Pa(Tb > t) = aCt) < 1 (4.5)
Now, for a ::; x ::; b,
<Px(Tab > t) ::; <Px(Tb > t) ::; <Pa(1b > t) = aCt) < 1 (4.6)
N ext, by the Markov property, we have

<Px(1-ab> t + 8) = lab <Px(Tab > 8, X. C dZ)(J'z(Tab > t)


::; aCt) lab (J'xC'-c.b > 8, X. E dz)
= a(t)(J'xCTab > 8) (4.7)
SO that for some t < 00,
<Px(Tab > nt) ::; an(t) (4.8)
4. CHARACTERISTIC OPERATORS 195

Now, we write

EzTab = 10"" SCI'z(Tab E ds)


~
"" {(n+l)t
= '"" }nt SCI'z(Tab E ds)
n=O

: :; L
n=O
""
(n + 1)t[CI'z(Tab > nt) - Cl'z(Tab > (n + 1)t)]

= t ~ CI'",(Tab > nt) ~ t ~ an(t) (4.9)


/::0 n'=o
=
1 -
t (
a t)
Since there exists some t < 00 such that aCt) < 1, we have proved the
proposition. I
Now, let [a,b] C S and x E (a,b). Dynkin's formula (3.14) yields
Ex 10 M
(Ag)(X t ) dt = Exg(X Tab ) - g(x) (4.10)

Since we are considering {H t, 0 :::;; t < 00 I as acting on C, we can assume


9 E C. It follo\vs that Ag E C, and by shrinking (a,b) down to x, we get
Exg(XTab ) - g(x)
(Ag)(x) = lim (4.11)
(a.b)!!x) ExTab
Equation (4.11) expresses the local nature of A in the interior of S, since
(Ag)(x) depends only on the values of 9 in a neighborhood of x. How-
ever, (4.11) does not completely specify A. For example, suppose that
S = [0,00) and for every x E (0,00),

4. ) (x)
(~g = !')d
d 2g(x)
2
(4.12)
~ X

Both of the following two processes satisfy (4.12):


1. Absorbing Brownian motion

1 fa/Vi ,
<Pa(Xt = 0) = _ / - . r. e-!z dz
V 27r -a/vt

<Pa(O < X t < b) = {b _} [e-O/2t)(z-a)' - e-(l/2t)(z+a)'] dz


}o V 27rt

2. Reflecting Brownian motion

Cl'a(O :::;; X t < (b 1


b) = }o _ / - [e-(l/2t)(z-a)' + e-(1/2t) (z+a)'] dz
V 27rt

Obviously, these two processes have different generators.


196 ONE-DIMENSIONAL DIFFUSIONS

We can now use (4.11) to define an extension to A as follows. Let


J denote an interval which is the intersection of an open interval with S.
We call such a J a relatively open interval (in S). Let ::01 denote the set
of all functions such that

lim EXg(XTJ ) - g(x) = (Ag)(x) (4.13)


J!(x} ExTJ

exists at every xES. The difference between A and A is that A is defined


by a uniform limit, and A by a pointwise limit. We adopt the conven-
tion that if E zT J = 00 for every relatively open J containing x, then we
set (Ag)(x) = O. Of course, this can only happen if x is a closed endpoint
of S, because if x E int (S), then (4.4) applies. We shall denote by ::01
the set of all functions in C such that the limit in (4.13) exists at every
XES.
Let Pab+ and mab(x) be defined by
(4.14)
and
(4.15)
For x E int (8), we can write (4.13) as

(Ag)(x) = lim pab+(x)g(b) + [1 - Pab+(X)]g(a) - g(x)


(4.16)
b!x mab(x)
aix

It is clear that knowing the two functions Pab+ and mab for every [a,b] C 8
completely determines (Ag)(x) at everv interior point of 8. It will turn
out that they also elucidate the behavi'or of Ag at any closed endpoints.
The converse is also true. That is, knowing Ag in int (8) completely
determines Pab+ and mab for every [a,b] C S, as the following proposition
shows.

Proposition 4.2. For every [a,b] C S, the functions Pab+ and mab are,
respectively, the unique continuous solutions to the equations

(APab+)(X) = 0 a<x<b (4.17)


(Amab) (x) = -1 a<x<b (4.18)

subject to the boundary conditions

Pab+(b) = 1 = 1 - Pab+(a) (4.19)


mab(a) = 0 = mab(b) (4.20)
4. CHARACTERISTIC OPERATORS 197

Proof: We can easily verify (4.17) as follows:


(Apab+)(X) = lim Pca+(X)Pab+(d) + [1 - PCd+(X)]Pab+(C) - Pab+(X)
dLx mcd(x)
cTx
(4.21)
If x E (c,d) C (a,b), then to exit from (a,b) starting at x the process
must first exit from (c,d). Using the strong Markov property, we get
(4.22)
whenever x E (c,d) C (a,b). Using (4.22) in (4.21) results in (4.17).
Equation (4.18) can be verified in a similar way.
To prove continuity, let y > x, and set c = a, d = y in (4.22).
Then we get

But Pay+(X) ~
xTy
1, so that Pab+(X) ~ xTy
Pab+(Y). Similarly, we can show
Pab+(X) ~x!y
Pab+(Y) , so that Pab+ is continuous. Continuity of mab can be
proved in a similar way. Uniqueness is much more difficult to prove. It
depends on the fact that A satisfies a "minimum principle" [Dynkin,
1965, Chap. 1, p. 145]. I
If 8 is a closed interval, say [0,1], let
u(X) = POl+(X) (4.23)
Then, for 0 :::; a < b :::; 1, we have from (4.22)
u(x) - u(a)
Pab+(X) = u(b) - u(a) (4.24)

It turns out that (4.24) holds generally whether S is a closed interval or


not. The function u(x), unique only up to an affine transformation
u ~ aU + (3, is called the scale function. The scale function is always
continuous and nondecreasing, but bounded only if 8 is closed. It is clear
from (4.17) that
(Au)(x) = 0 (4.25)
for all x E int (8). A process is said to be in its natural scale if u(x) can
be taken to be equal to x. For example, a Brownian motion is in its
natural scale. If a process X t has a scale function u(x), then u(X t ) is a
process in its natural scale, so that it is no restriction to assume a process
to be in its natural scale.
It also turns out that there exists a Borel measure J.L, defined on the
198 ONE-DIMENSIONAL DIFFUSIONS

Borel sets in int (8), such that

() ( [u(x A y) - u(a)][u(b) - u(x V Y)] (d)


moo x = l(a,b) u(b) _ u(a) JL Y (4.26)

where x A Y = min (x,y) and x V y = max (x,y). In terms of u and JL, 1


can be written as a generalized differential operator in int (8) as
d d
A=-- (4.27)
dJL du
Actually, we are primarily interested in diffusion processes for which
both JL and u are absolutely continuous. In that case, the restriction of A
to twice-differentiable functions take on the form

A i d ( 1 dg(x»)
( g)(x) = JL'(x) dx u'(x)--a;.:

We won't give a precise definition to (4.27). Instead, we take up the


subject of diffusion processes in the next section.

5. DIFFUSION PROCESSES
There is some disagreement in recent literature as to the definition of a
diffusion process. Some authors call any process satisfying condition (4.1)
a diffusion, while other restrict the name to a smaller class of processes.
We shall adopt the more restrictive definition and define a diffusion
process as any process satisfying (4.1) and for which the limits

(5.1)

and
. E:r,(X'ab -
11m X)2 2( )
= 0- X (5.2)
atx E",Tab
b!x

exist at every x E int (8). We shall always assume that m and 0- 2 satisfy
a Holder condition (cf. 4.7) and
0-2(X) > 0 x E int (8) (5.3)

Proposition 5.1. Let {Xl, 0 ~ t < 00 I be a diffusion process. Let g E


C2 n 5)1. Then for every x E int (8);

(l g)(x) = i0- 2 (x) d 2g(x) + m(x) dg(x) (5.4)


dx 2 dx
5. DIFFUSION PROCESSES 199

Proof: By assumption g has a bounded-continuous second derivative


so that by Taylor's theorem we can write

g(XTa. ) = g(x) + g'(x)(X Ta • - x) + ig"(O) (X Tab - X)2 x, 0 E (a,b)

Since sup /g"(O) - g"(X)/ ) 0, (5.4) follows immediately from the


6E(a,b) 0 (a,b) l Ix)
definition of A. I
Because of Proposition 4.2, we know that if the equation

1 2( ) d 2g(x)
20"Xd;2
+ mx~=
() dg(x) 0 a<x<b (5.5)

has a continuous solution satisfying the boundary conditions g(a) = 0 =


1 - g(b), then it must be pab+(X). Such a solution is easily constructed,
and we find

laX exp [ - laY ~ dZ] dy


(5.6)
lab exp [ -laY ~~~) dZ] dy
It is clear that we can take the scale function to be

u(x) = Icx exp [ - Icy ~~~~) dzJ dy (5.7)

where c is any point in int (S).


The function mab can be found by solving

1 2( ) d 2m a b(X)
20" x dx 2
+ m () dmQb(X)
x --;z;;-
_- -1
a<x<b
(5.8)
mab(a) = 0 = mab(b)

To construct mab, define the Green's function

G ( X ) = [u(x 1\ y) - u(a)][u(b) - u(x V y)] ( b)


(5.9)
ab ,Y u(b) _ u(a) x, yEa,

Because (Au)(x) = 0, x E int (S),

1 2( ) a 2Gab (x,y)
20" X
+ mx
() aGab(x,y) =0
ax 2 ax

for x E (a,y) and x E (y,b). Furthermore, Gab(a,y) = Gab(b,y) = 0, and


aGab(x,y) I - aGab(x,y) I = -u'( )
ax x=y+ ax x=y- y
200 ONE-OIMENSIONAl DIFFUSIONS

Therefore,

--§:u 2 (y)u'(y) 5(x - y) (5.10)

and
(b 2
mab(x) = Ja (J2(y)U'(y) Gab(x,y) dy (5.11)

Given m and (J2 in int (S), Pab+ and mab are determined, and thus A
is completely determined in int (S). However, we know that if S is closed
at one or both endpoints, knowing A in the interior of S may not be
enough to determine the semigroup {H t , 0 ~ t < oo} (equivalently, the
transition function) uniquely. To clarify the situation, we need to study
the possible behavior at the boundaries. The first result in this direction
is the following.

Proposition 5.2. Let a be the left endpoint of S. Then a E S if and only if


u(a) > - 00,and for some c > a,

h a
c u(y) - u(a) d
(J2(y)U' (y)
y < 00 (5.12)

The right endpoint b of S belongs to S if and only if u(b) < 00, and
for some c < b,

lc
b u(b) - u(y) d
(J2(y)U' (y)
y < 00 (5.13)

Proof: We shall only prove the first half, the proof for the second half
being nearly identical.
First, suppose that a E S. Let a be any point such that [a,a] C S.
Then from (4.4) we have
sup maa(X) < 00
a<x<a

From (5.11) we find that sup maQ(X) occurs at a point c satisfying


a<x<a

(C u(y) - u(a) dy _ (a u(a) - u(y) dy


Ja u2(y)u'(y) Je (J2(y)U'(y)
and ma.,(C) is given by
(e u(y) - u(a)
mab(C) = 2 Ja (J2(y)U'(y) dy

Therefore, a E S implies (.5.12).


5. DIFFUSION PROCESSES 201

Conversely, assume that (5.12) holds, but suppose that a E S.


Then, for any point c E S,
E",Tc = lim mzc(x) a < x < c
z!a

= lim
z!a
t
Gzc(x,y) 2()2,() dy
C

U Y u Y
= 2 [u(c) - U(X)] (x u(y) - u(a) dy
u(c) - u(u) Ja u 2 (y)u'(y)
+ 2 [U(X) - u(a)] (c u(c) - u(y) dy
u(c) - u(a) Jx u 2(y)u'(y)

xl a. But ExTc must be


If (5.12) holds, the last expression goes to zero as
nondecreasing as x 1 a, because it takes longer and longer to get from
x to c. Hence, there is a contradiction and (5.12) implies that a E S. I
Proposition 5.2 shows that the knowledge of m and u 2 in the interior
of S is sufficient to determine whether S is closed at one or the other of
the endpoints. As an example, suppose that int (S) = (0,00), and
m(x) = 0, u 2 (x) = 1, 0 < x < o(). In this case, we can take u(x) = x.
Since for every finite c,
(e u(y) - u(O) (C
Jo u2(y)u'(y) dy = Jo y dy < 0()

S must be closed at the left endpoint 0, that is, S = [O,O(». If S is closed


at an endpoint, then A at that point may not be determined by the
knowledge of u 2 and m in int (S). The situation is as follows: Suppose
that a is a closed left. endpoint of S and that for every c > a, EaTe = o().
Then (Ag)(a) = O. However, if for some c > a, EaTc < 00, then (Ag)(a)
is not determined by m and u 2 in int (S). We now state a criterion which
separates these two cases.

Proposition 5.3. Let a be a closed left endpoint of S. If for some c E int (S),

ja
e 1
u2(y)u' (y)
dy < 0() (5.14)

then for x E [a,cJ,


(5.15)
On the other hand, if for every c E int (S),

[C 1 dy = 0()
(5.16)
}a u2(y)u'(y)

then EaTe = 00 for every c E int (S).


202 ONE-DIMENSIONAL DIFFUSIONS

Remarks:
(a) The point a is called a regular boundary if (5.14) is satisfied for
some c E int (8), otherwise it is called an exit boundary.
(b) The criterion of Proposition 5.3 can be modified for a closed
right endpoint in an obvious way, and we won't repeat it.
Let a be a regular left endpoint of 8. Define
. EaTc
Ka = 1I m - - (5.17)
da c - a
We note that Ka cannot be determined from m and u 2• Indeed, each
choice of Ka gives us a process with a different transition function. If b
is a regular right endpoint of 8, then we set

Kb = 1I· mEbTc
-- (5.18)
cU b - c
If a is a regular left endpoint, then
o 1
(Ag)(a) = - g'(a+) (5.19)
Ka
and if b is a regular right endpoint, then
1
(Ag)(b) = Kb g'(b-) (5.20)

To summarize the situation, we see that A is completely summarized


by the knowledge of m and u 2 in int (8) and the values of Ka and Kb at
regular boundaries, if there are any. The following proposition shows
that the transition function is also uniquely determined.

Proposition 5.4. Let g E C (that is, g is bounded continuous). Let fA be


defined by

fA(x) = !o" e-AtExg(Xt) dt (5.21)

Then for every A > 0, fA E :01 n C2 and is the unique continuous


solution to
(5.22)

Remark: We note that because .fA E :01 n C2, (5.22) takes on a differ-
ential form in the interior of 8, viz.,

x E int (8) (5.23)


5. DIFFUSION PROCESSES 203

On a closed boundary of S, (5.22) takes on the form given in the


following table:
Table 5.1

Nature of boundary Closed left endpoint a Closed right endpoint b

Exit Af,,(a+) = g(a) Af>.(b-) = g(b)

1 , 1 ,
Regular K/>.(a+) = A!>.(a+) - g(a) - fx(b-) = Aj,,(b-) - g(b)
Kb

If Ka = 0, the boundary a is called a reflecting boundary, and we


interpret the boundary condition to mean j~(a+) = o. Similar comments
apply if Kb = o. On the other hand, if Ka = 00 (or Kb = (0), the boundary
is called an absorbing boundary. We can now understand the two processes
both satisfying (4.12). For the absorbing Brownian motion, the boundary
at x = 0 is absorbing, and for the reflecting Bro,vnian motion, the
boundary at x = 0 is reflecting.
Let F(x,u; t) = Exe iuX, be the characteristic function. Because
g(x) = e iux is bounded continuous, we can use (5.22) to obtain the
Laplace transform of F. Hence, the characteristic function (hence, the
transition function) can be uniquely determined by solving the differential
equation (5.23) with the boundary conditions given in Table 5.1 for closed
endpoints and by inverting a Laplace transform. Actually, the transition
function can be found more directly.
As an example, let S = [0, (0) and set
J.L(x) = 0 O<x< 00 (5.24)
It is clear that x = 0 is a regular boundary. Thus, for a bounded-con-
tinuous g, j>.(x) = 10
e->.tExg(X t ) dt is the unique bounded-continuous
00

solution to
M~' (x) - Aj>.(x) = -g(x) O<x<oo (5.25)
subject to the boundary condition
1 I
2K o1,,(0+) = Aj>.(O+) - g(O) (5.26)

We interpret (5.26) to meanj~(O+) = 0 if Ko = 0, andj>.(O+) = (l/;\)g(O)


if Ko := 00. The solution for j>.(x) is rather easy. The best approach is
to seek a solution of the form

(5.27)
204 ONE-DIMENSIONAL DIFFUSIONS

where Fx(x,y) and vx(x) are determined by the following differential


equations and boundary conditions, together with the condition that
both Fx(x,y) and VA(X) are to be bounded in x:

1 a2 O<y<x< 00
- - Fx(x,y) - XFx(x,y) = 0
2 ax 2 O<x<y< 00

Fx(y+,y) = FA(y-,y)

~
ax
FA(x,y) I
x=y+
~ FA(x,y) Ix=y-
- ax -2
(5.28)

_1_ iJFA(x,y) I = XFA(O+,y)


2K o ax :>:=0+

1 d2
- -2 VA(X) - XVA(X) = 0 0 < x < 00
2 dx
(5.29)
1 ,
VA(O+) = XVx(O+) - 1
2K 0

It is obvious that the solutions must have the form

A(y) exp (- V2X x) x ~ y


FA(x,y) = { B(y) exp (- x) vv: + C(y) exp (V2X x) (5.30)
O:::;x:::;y
VA(X) = D exp (- vv: .1') 0:::; x < 00 (5.31)

where A(y), B(y), C(y) and D are determined by the subsidiary conditions
in (5.28) and (5.29).
Let g(u,x) = e- UX • Then, !A(U,X) is given by

!A(U,X) = 10 00
e-At(Exe-ux,) dt

= 10 00
rAt 10 00
e-ubP x(db,t) dt (5.32)

which is seen to be the double Laplace transform of the transition func-


tion P x(E,t). It is easy to see from (5.28) and (5.29) that neither VA nor
Fx depends on u, and we can write

(5.33)

It is clear now what the probabilistic interpretations of Fx(x,y) and


VA(X) are. If we write

P.,(E,t) = IE(O)(J>x(X t = 0) + IE p",(b,t) db (5.34)


5. DIFFUSION PROCESSES 205

where IB(x) = 0, 1 according as x fl E or x E E, then


VA(X) = 10 00
e-}'tCP:r(X t = 0) dt (5.35)
F}.(x,y) = 10 00
e-}'tpiy,t) dt (5.36)
It is easy to find v}. and FA, but to invert the Laplace transforms to get
cp,,(X t = 0) and P:r(y,t) is tedious.
It is fairly clear how the above example generalizes. We shall
sketch an outline of the situation below. First, the equation

x E int (S) (5.37)

for A > 0 has a pair of solutions f}. = <p}., 8}. with the following properties:
<p}.(x) > 0 8}.(x) > 0 x E int (S)
<p>.(x) nondecreasing, (l}.(x) nonincreasing
(5.38)
<p>.(x) bounded if and only if the right endpoint of S is closed
()>.(x) bounded if and only if the left endpoint of S is closed
The two functions <P>. and 8>. are linearly independent, and the Wronskian
A is given by
A(X) = ()>.(x)<p~(x) - <p>.(x)8~(x) = u'(x) (5.39)
where u(x) is the scale function. Therefore, every solution of (5.37) is a
linear combination of <P>. and 8>.. We now seek a bounded solution of
d2 df>.(x)
iu 2(x) dx2f>.(x) + p.(x) ----a;- = A!>.(x) - g(x)

x E int (S) (5.40)


subject to the boundary conditions at closed endpoints given in Table 5.l.
Imitating the procedure in the example, we seek a solution in the form of
(5.41)

where a and b are endpoints of S. The functions F, UA, v>. are of the form

F>.(x ) = {A(Y)<p}.(X) + B(y)8}.(x) X>y


,y C(y)<p>.(x) + D(y)8A(x) x<y
(5.42)

v>.(x) = alP}. (x) + {38)o..(x) (5.43)


u,,(x) = 'Y<PA(X) + 88>.(x) (5.44)
The unknowns in (5.42) to (5.44) are determined by requiring
F>.(Y+,y) = Fx(y-,y) (5.45)
2
~
ax F}.(x,y) I x=y+
- ~
ax F>.(x,y) [.
.r=y
_= (5.46)
206 ONE-DIMENSIONAL DIFFUSIONS

and by imposing the following boundary conditions:


F>.(x,y), v>.(x), u>.(x) stay bounded as x approaches an open
endpoint (5.47a)
At a closed endpoint, F>.(x,y) (as a function of x) satisfies
this boundary conditions of Table 5.1 for g == 0 (5.47b)

If the left endpoint is closed, v>.(x)(u>.(x» satisfies the


boundary conditions of Table 5.1 corresponding to g == 1
(g == 0). If the right endpoint is closed, VA(x) (uA(x»
satisfies the boundary conditions of Table 5.1 correspond-
ing to g == 0 (g == 1) (5.47e)

The probabilistic interpretation of u>., VA, and FA is given by

FA(x,y) dy = !o" e-At(P",(Xt E dy) dt a<x<y<b (5.4Sa)

uA(x) !o" e-At(P",(X


= t = b) dt (5.4Sb)

vA(x) = !o" e-At(P",(X t = a) dt (5.4Se)

As to be expected, the simplest case results when both endpoints are


open. In that case, both U A and VA are identically zero, and FA(x,y) is

-l
given by

F,(x,y) -
U2(y)~'(Y)
2
'P,(y)8,(x)
(5.49)
u 2(y)u'(y) 8,(y)'P,(x)

From (.5.49) and the fact that F,(x,y) is the Laplace transform of the
transition density px(y,t), we can finally deduce that Px(y,t) must satisfy
both the forward and the backward equations of diffusion. Why it is
so hard to prove that Px(y,t) satisfy the diffusion equations (without
assuming the differentiability conditions) is not well understood.

EXERCISES
1. Let IX" - 00 < t < 00 I be a random telegraph process which we define as a
stationary Markov process such that

(a) X, = +1 or -1 with <l'(X, = 1) = t = <l'(X, = -1)

(b) <l'(X, = X,) = t(I + e-!t-BI) = 1 - <l'(X, = -X.)

Find the generator A, and show that H = etA.


EXERCISES 207

2. Let {X" 0 ~ t < 00 I have a state space 8 = [0,00), and let


<P(X, < xiX. = a)

{ [ exp - -1(~-a)2J - exp


2 t-s
[1(~+a)2J}
- -
2 t-s
d~

+ ~7rI~ 1'" -
a/Vt-.
e- h ' dz
t > s
x>o
a>O
For a = 0 we have <P(X, = OIX. = 0) = 1. Find the generator A.

3. Let {X, t ;::: 0 I be a Brownian motion starting from 0, that is, X 0 = 0 with
probability 1. Let T< be defined by
Tc = min {t, X, = cI c >0
1>0

(a) Introduce a function J defined as


x>c
J(x) = { ~ x ~ c
and use (3.13) to find Ee- Ar,.
(b) Use the same method and prove the reflection principle of D. Andre,
<P(Tc ~ t) = 2<P(X, ;::: c)

4. Let {X" - 00 < t < 00 I be a zero mean Gaussian process with EX,X. = e-I'-.I,
namely, X is an Ornstein-Uhlenbeck process. Use (3.13) to find
<P(X, ;::: 0, 0 ~ t ~ 1)

5. Let W, and V, be standard Brownian motion processes with fVo = Vo = O. Let


Xl = W,' +
V,'. Show that X" t ;::: 0, is a diffusion process, and find the functions
u 2 and m.

6. For the process in Exercise 5 find a suitable scale function u(x), and determine
whether the state spaee 8 is closed at its left endpoint o. If 8 is closed at 0, deter-
mine whether it is a regular boundary or an exit boundary. If it is a regular
boundary, determine (Ay)(O).

7. Show that, except for notational changes, (5.22) can be viewed as a generalization
of (2.12), that is, if H,y E :J)A for each t, then it follows from (2.12) that

AJA = AJA - Y

8. Let X" t ;::: 0, be a diffusion process so that in iIlt (8) A = iu2(x)d'jdx2 + m(x)
djdx. Let h(x,t) be defined by

h(x,t) = Ex/(X,) exp [ - Jot k(X,) dS]


Give a heuristic derivation of the equation
a a'h(x,t) ah(x,t)
at' 2
1
-- hex t) = .u'(x) - - -
ax'
+ m(,x) - .-
iJ.r
- k(x)h(x,t) x E int (8)
208 ONE-DIMENSIONAL DIFFUSIONS

9. A precise and general statement of the results in Exercise 8 is known as Kac's


theorem [Kac, 1951] stated as follows: Let k andf be in Bo, and let k be nonnegative
and f be continuous. Let uh(x) be defined by

uh(x) = 10'" h(x,t)e-h' dt = Ex 10'" e-h'f(X,) exp [ - lot k(X.) d8J dt


Then Uh is the unique solution of
AUh -f = AUh - kUA

(which is roughly equivalent to ah/at = Ah - kh).


Use Kac's theorem to prove that if r is the amount of time that a Brownian
motion is positive during [0,1], then
2. _ r.
(l'(r ~ t) = - sm- 1 V t
".

This is the celebrated arc-sine law of Levy. [Hint: Set k(x) = 13«1 + sgn x)/2)
andf(x) = 1 in Kac's theorem.]

10. Let X" t ~ 0, be a standard Brownian motion. Use the result of Exercise 8 to find
the distribution of the quadratic functional

10 1
X," dt

11. Suppose that int (8) = (0,00). For the following pairs of (T" and m, determine:
(a) Whether 8 is closed at 0.

(b) If closed, is ° a regular boundary?

m(x)

1 -I
1 -x
-x
1 - tx
6
Martingale Calculus

1. MARTINGALES

Let (~, n, §,) be a


complete probability space, let T be a subset of R and let
~. = (~t: t E T) be a family of sub-a-algebras of (£ which is increasing:
~s c ~t for s < t. A stochastic process X = (Xt : t E T) is said to be
adapted to (£. if for each t in T, X t is ~t measurable. For example, X is
adapted to (£~, where for each t, ~; is the a-algebra generated by the
family of random variables {Xs: s::s; t}. We call ~~ the family of a-algebras
generated by the process X.
A complex-valued stochastic process X is called a martingale with
respect to ~. if it is adapted to ~. and if t ~ s implies that

a.s. (1.1)

The process is said to be a sUbmartingale (supermartingale) if it is real-


valued and equality in (1.1) is replaced by ~ (respectively, by ::s;).
For example, if w is a Wiener process which is adapted to ~., and if for
each t, {ws - Wt, S ~ t} is independent of ~t' then W is a martingale with
respect to ~. (cf. Sec. 2.3). We call w an ~. Wiener martingale.
209
210
MARTINGALE CALCULUS

For another example, let B be a random variable with EIBI finite.


Then the random process X defined by

t E T

is a martingale with respect to te.. Indeed, X is adapted to te. by the


definition of conditional expectations and Eq. (1.1) is a consequence of the
smoothing property of conditional expectation.
A key to the tractability of martingales is that they have versions with
well-behaved sample paths. The sample paths can be chosen to be especially
nice under some simple conditions on te •.

Definition. We say that teo = (tel: t;::: 0) satisfies the usual conditions if

1. teo contains all sets A in te with '3'(A) = 0, and


2. (Right continuity) tel = te /+ for each t, where tet+ = n te
s>t
s•

Proposition 1.1. Suppose that X = (Xt : t;::: 0) is an te. martingale (or


submartingale or supermartingale) such that EXt is right continuous
in t, and suppose that te. satisfies the usual conditions. Then there
exists a modification of X such that for all t ;::: 0,

X t ( w) = limXs( w)
s~t
s>t

and (1.2)
X t _ ( w) = lim Xs( w)
s~t
s<t

exists.

Remark: Proposition 1.1 and the closely related result, Proposition 1.2, are
due to Doob. Proofs can be found in [Doob, 1953], [Doob, 1984],
[Meyer, 1966] and in [Dellacherie and Meyer, 1982]. In typical proofs,
Proposition 1.2 in the case T = 7l+ is deduced first and then Proposi-
tions 1.1 and 1.2 are deduced.
A process X which satisfies the conditions (1.2) for all t for w in a set
(not depending on t) of probability one is said to have continuous-on-the-right
with limits-on-the-Ieft (corlol) sample paths, or is simply said to be a corlol
process. A corlol process is clearly separable with any countable dense subset
of R + serving as a separating set.
Given two numbers a and b we use a /\ b to denote the smaller and
a V b to denote the larger of the two numbers. Given two a-algebras te and
1. MARTINGALES 211

c>fJ of subsets of a given set we use cr V c>fJ to denote the smallest a-algebra
containing both crand c>fJ.

Proposition 1.2 (Martingale Convergence Theorem). Suppose that (Xt , eft : t E T)


with T = III + or T = Z + is a (separable, if T = III +) supermartingale
such that E(Xt /\ 0) is bounded below by a constant not depending on
t. Then there exists an a.s. finite random variable Xoo such that

Remark: Clearly Xoo is croo measurable, where Aoo = V crt, that is, croo is
t
the smallest a-algebra containing the algebra Ucr t• A similar conver-
t
gence theorem is true when T = III or T = Z and t tends to minus
infinity.
An arbitrary family e of random variables is called uniformly inte-
grable if

The following is a proposition from classical measure theory, the first half of
which is a generalization of the dominated convergence theorem.

Proposition 1.3.

(a) Let (Xn) be a sequence of random variables, with EIXnl finite for
each n, converging a.s. to a random variable X. Then (EIXI is finite
and Xn converges to X in I-mean) if and only if the family of random
variables {Xn} is uniformly integrable.
(b) Let X be a random variable on (Q,cr) with EIXI < + 00, and let
{cry: y E r} be an arbitrary family of sub-a-algebras of cr with
arbitrary parameter set r. Then

e = {E [Xlcry ] : y E r}

is a uniformly integrable family of random variables. (See [Meyer 1966,


pp. 18, 85] for a proof.)
A martingale (or submartingale or supermartingale) (Xt : t E T) is
called uniformly integrable if the family of random variables {Xt : t E T} is
uniformly integrable. We will deduce form Propositions 1.2 and 1.3 the
212 MARTINGALE CALCULUS

following proposition:

Proposition 1.4. Let T = IR + or T = Z +.


(a) If (Xt : t E T) is a unifonnly integrable separable martingale, then

a.s. and in I-mean

where Xoo is an etoo measurable random variable with EIXoo I finite,


and
a.s. for t E T (1.3)
( b) A random process (Xt : t E T) is a unifonnly integrable martingale
if and only if
a.s. for t E T (1.4)

for some random variable B with EIBI finite.


(c) Any B satisfying (1.4) is related to Xoo by

a.s. (1.5)

Proof: Convergence a.s. in part (a) is guaranteed by Proposition 1.2.


Convergence in I-mean then is a consequence of the unifonn integrability of
X. Now for s and t with s ~ t,

EIE [Xslett ] - E [Xoo lett] I ~ E [E [IXs - Xoo Ilett]]


= EIXs - Xool
which tends to zero as s tends to infinity by part (a). Since E[Xslettl = X t
for all s > t, Eq. (1.3) is proved.
If X has the representation (1.4) then it is unifonnly integrable by
Proposition 1.3. Conversely, if X is unifonnly integrable then it has a
separable modification and then (1.4) is true with B = Xoo' This proves part
(b).
By Eqs. (1.3) and (1.4),

E [Bler t ] = E [Xoo lett] a.s. for all t

Thus E[(B - Xoo)IAl = 0 for all A in the algebra Uet t , and thus for all A
t
in eroo by the extension theorem. Since Xoo is etoo measurable, Eq. (1.5)
follows. •
A nonnegative random variable S, S ~ + 00, is called an et. stopping
time if
(w: S( w) ~ t} E ett for all t ~ 0
1. MARTINGALES 213

Equivalently, S is a stopping time if the zero-one valued random process U


defined by lle = I{s:5 t} is an adapted random process.
Given a stopping time S define the a-algebra its as follows:

E E crs if and only if E n { w: S( w) :-;;; t} E crt> all t


If S = So is a deterministic time this definition causes no ambiguity. (Since
we assume that crt+ = crl' stopping times here are the same as Markov times
defined in Section 5.3.)
The martingale property persists at stopping times:

Proposition 1.5 (Ooob's Optional Sampling Theorem). Let X = (X t : t z 0) (or


X = (Xn: n E Z)) be a corlol submartingale, and let S and R be two
stopping times such that S :-;;; R. Then

a.s. (1.6)

with equality in case X is a martingale, if R and S are bounded above


by a constant. If X is a uniformly integrable martingale then equality
holds in (1.6), even if R and S are not bounded.

Proof: The proposition is not difficult to verify if R and S each have a


finite number of possible values (see Exercise 4), and the general case is
proved by an approximation argument as in the proof of Proposition 3.2 in
Chapter 2. •

Lemma. Let (Yk , cr k , k = 1, ... , n) be a positive submartingale. Define


Yn*( w) = max{Y1 ( w), ... , Yn( w)}. Then
(a) 0l[Yn* z A]A:-;;; E[I{y.:' <-A}Yn ] for A > 0, and
(b) E[(y"*)2]:-;;; 4E(Y';).

Proof: Let A > 0 and define a stopping time T to be the first k such that
Yk z X, and T = n if there is no such k. By the optional sampling theorem,
Y,. :-;;; E[y"lcr so that
T ]

and so

Since the event that Y,. z A is the same as the event that Yn* z A, this
proves (a).
214 MARTINGALE CALCULUS

Suppose that E(Y;) is finite, since (b) is trivial otherwise. Define


F("A) = 0'(Yn* < "A). Then, using integration by parts and part (a),

Now Yn* ~ Y1 + ... + Y" and E[l',?] ~ E[Y;] < + 00 so that E[(Yn*)2] is
finite, so (b) is implied. •

Proposition 1.6. (Doob's L2 Inequality). Let M be a separable martingale and


T < +00. Then

and if M is uniformly integrable then the inequality persists if T =


+00.

Proof: Suppose 8 1 ,82 , •.• is a separating set for the interval [0, T] with
81 =T. For n ~ 1, let t~, ... , t;: be 8 1 , ••• , 8 n arranged in increasing order.
Then (IMtZI, (£tZ: 1 ~ k ~ n) is a positive submartingale so, by the lemma,

(1.8)

As n tends to infinity the left-hand side of (1.8) converges to the left-hand


side of (1.7) by the monotone convergence theorem, and the proposition is
proved. •
Martingales are ideal for studying differentiation of probability mea-
sures, as we show next. Let 0 be a non empty set and let ce be a a-algebra of
subsets of O. The pair (0, ce) is called a measurable space. Now consider two
nonnegative a-additive set functions M and Mo defined on sets in ce such
that M(O) and Mo(O) are finite. We call M and Mo finite measures on
(0, ce). For a pair of finite measures on a measurable space we define the
1. MARTINGALES 215

concepts of absolute continuity and singularity as follows:


M is said to be absolutely continuous with respect to Mo if for every
A in tt such that Mo(A) = 0, we have M(A) = O. In symbols we write
M« Mo.
M and Mo are said to be mutually absolutely continuous if M « Mo
and Mo « M. In symbols we write M == Mo.
M and Mo are said to be singular if there exists a set A in tt such
that M(A) = 0 and Mo(n - A) = O. We denote this by M ..L Mo.
Two basic theorems related to these properties of measures are the
Lebesgue decomposition theorem and the Radon-Nikodym theorem. These
can be combined and stated for the case of finite measures as follows.

Proposition 1.7. Let (n, tt) be a measurable space, and let Mo and M be two
finite measures defined on (n, tt). Then there exists a nonnegative,
tt--measurable function A and a finite measure ~L such that for every A
in tt,

M(A) = f A(w)Mo(dw) + ~(A)


A

where ~ is singular with respect to Mo' A is unique up to a set of Mo


measure zero, and ~ is unique.

Remark: It is clear that M « Mo if and only if ~ is the zero measure. In


that case we call A the Radon-Nikodym derivative of M with
respect to Mo and denote A by

A= dM
dMo
If M « Mo and ~ is a sub-a-algebra of tt then the restriction of M to
0~ is always absolutely continuous with respect to the corresponding restric-
tion of Mo' It is easy to verify this fact and the formula (see Exercise 6)

oB
dM- =Eo
-
dM:f'
[dMl]
--
dM
o
~ a.s. Mo (1.9)

where M~ and Mlf denote the restrictions of M and Mo to ~, respectively,


and Eo indicates that the conditional expectation is with respect to Mo
measure. Although our definition for conditional expectation given in Section
1.7 was for probability measures, extension to finite measures causes no
difficulty.
We shall now restrict ourselves to probability measures which are
generated by stochastic processes. Let T be a possibly infinite subinterval of
216 MARTINGALE CALCULUS

R and let (Xt : t E T) be a family of m-vector valued functions on 0 such


that for each w fixed, X t ( w) is right continuous in t. Let Cf denote the
a-algebra generated by {Xt : t E T} . We call the pair (0, Cf) the sample
space of (Xt : t E T). Let (Tn) be an increasing sequence of finite subsets of
T such that Tn becomes dense in T as n tends to infinity, and Tn contains
the largest t in T, if such t exists. Then let Cfn denote the a-algebra
generated by {Xt : t E Tn} and let Cfoo denote the smallest a-algebra contain-
ing UCfn • It is clear that Cf:::> Cfoo • On the other hand, by the right continuity
n
of X t ( w) in t, each random variable X t is Cfoo measurable and so Cf = Cfoo •
Now suppose that 0' and 0'0 are two probability measures on (0, Cf)
such that 0' « 0'0. For each n, let

Then Ln is a nonnegative uniformly integrable martingale, and by Proposi-


tion 1.4 we have

lim L = d0' a.s. and in I-mean,


n~ 00 n d0'o

This equation provides a useful method for computing d0' / d0'o since Ln can
often be computed directly, as we now demonstrate.
First, by (1.9) we have

Now suppose that Tn contains n points. Then, since X t is m-vector valued,


the ordered collection (Xt : t E Tn) is mn-vector valued. Let PTn and PTno
denote the probability measures, defined on the Borel subsets of R mn, which
are induced by (Xt' t E Tn) under 0' and 0'0' respectively. Then, with 0'0
probability one,

(1.10)

where the Radon-Nikodym derivative dPT /dPT 0 is a Borel measurable


function on Rmn. If both P T and PT 0 are ab~oluteiy continuous with respect
to some measure (e.g., Leb~gue me~ure) then (1.10) can be written in terms
of the density functions PTn and PTno as
2. SAMPLE PATH INTEGRALS 217

For a specific example, consider the following. Suppose that under


°
0'o,{Xt> ~ t ~ I} is a Brownian motion and under 0', {Xi - t,O ~ t ~ I} is
a Brownian motion. Let Tn = {k/2n,k = 0,1, ... ,2 n }. Then,

dPT PTJX O ,X l , ••• ,X 2 n)


d PTnno (Xo, Xl' ••• , X2n) = ( )
PTnO X O , Xl' ••• , X 2 n

2n-l
TI (1/h'IT2- n ) e- ~(2n)(Xv+l -x v- 2-n)2
v~o

2n-l
TI (1/h'IT2- n ) e- ~(2n)(XV+l -xv)2
v~o

2n-l ]
= exp [ + L (XV+l - Xv) e-~
V~O

= e-~e(XI-XO)

Therefore, in this somewhat trivial example, we have with 0'o-measure 1,

As a check, we can verify that Eo d0' /d0'o = 1 by computing

2. SAMPLE PATH INTEGRALS

Consider the integral

f<l>( S, w) dV( s, w)

which we often write as

or simply as

for random processes <l> and V. Such an integral can in some cases be defined
as an integral involving deterministic functions (namely the sample paths of
<l> and V) for each w. Before defining the integral as a Lebesgue-Stieltjes
integral, we will briefly review the theory of Lebesgue-Stieltjes integrals for
deterministic functions.
Let G be a bounded increasing function on III + which is right continu-
ous. Then there exists a unique measure JL defined on the Borel subsets of III +
218 MARTINGALE CALCULUS

such that

J.L((a,b]) = G(b) -- G(a)

Then the Lebesgue-Stieltjes integral of a (deterministic) nonnegative Borel


function <{> with respect to G,

{Xl <{>( s) dG( s)

is defined to be equal to the Lebesgue integral of <{> with respect to /-':

(The integral we defined in Section 1.5 is the Lebesgue integral.)


Given a and b with a < b, let Wab denote the set of vectors t such
that, for some n, t = (to, t1> ... ' tn ), where a ::-:; to < ti ... < tn ::-:; b. Then
given a real function v on IR +, the variation of v over a finite interval [a, b]
is

n-l
fbldvsl ~ sup L IV(tk+ I ) - v(tk)1 ::-:; +00
a (E'1Tab k=O

and the variation over IR + is equal to

We say that v has finite variation if its variation over IR+ is finite.
A real function v has finite variation if and only if v can be written as

vet) = v(O) + vIet) - vit) (2.1)

where VI and V 2 are bounded increasing functions. If v has such a represen-


tation then there is a unique choice of VI and V2 (we call this (VI' v 2 )
canonical) such that VI (0) = v 2 (O) = 0 and

(2.2)

Henceforth we suppose that v is right continuous. Then its variation over


intervals [0, t] and the canonical (VI' V2) can be found by taking limits (use
2. SAMPLE PATH INTEGRALS 219

2n-l

Ddvsl
o
= lim L
n ..... oo k=O
Iv( t8;:+I) - v( WI,') I (2.3)

2n-l

v1(t) = lim L (V(t8;:+I)-V(t8;:»+ (2.4)


n ..... 00 k=O

and
2n-l
v2 ( t) = lim L (v( t8;:+I) - v( t8;:»_ (2.5)
n ..... 00 k=O

We suppose, furthennore, that v has finite variation. Then VI and V2 are


bounded right-continuous increasing functions on IR +.
Then we define the Lebesgue-Stieltjes integral of cf> with respect to V
by

1 cf>dvs
00
= lOOcf>+ dv 1 (s) -
0 0 0
l OO cf>_ dv (s)
1

-1o cf>+ dv
00
2( s) + 1 cf>- dv
0
00
2( s)

The integral is well defined and finite if the integral of 1cf>1 with respect to the
variation of v, defined by

(2.6)

is finite. The integral of cf> with respect to v over a compact interval [0, t] is
defined by

(2.7)

If the left-hand side of (2.6) is finite then the integral in (2.7) is finite, and as
a function of t the integral in (2.7) is right continuous and its total variation
is equal to the left-hand side of (2.6). It is convenient to use the notation
cf> • v to denote the integral as a function of t-thus

cf> • vt = Io cf>s dvs


t

A function V is said to have locally finite variation if the variation over


each compact interval [0, t] is finite. Then for any T> the function vT °
defined by v ( t) = v( t 1\ T) has finite variation. We can still define cf> • V t for
T
220 MARTINGALE CALCULUS

each t by letting it equal

io
t T
<f>s dvs (for some T, T ~ t) (2.8)

because the value of the integral in (2.8) is the same for all T exceeding t.
Then <f> • V t is well defined and finite for all t if

for all t (2.9)

and the integral in (2.9) is the variation of <f> • v over the interval [0, t].
Suppose now that (Q, ce, 0') is a probability space equipped with an
increasing family of sub-a-algebras of ce, ceo = (ce t , t ~ 0). Suppose that V =
(~: t ~ 0) is a real corlol random process which is ceo adapted. We then
define fldY"l, ~(t) and V;(t) by the same equations, Eqs. (2.3)-(2.5), which
were used for deterministic functions v. Then, for example, for each w fixed,
2n-l
tldy"l(w) = lim L W(t8;:+l>w)-V(t8;:,w)1
o n~oo k~O

V is said to have locally finite variation if

for each t

For the remainder of this section we will assume that V has locally finite
variation.
The corresponding facts for deterministic processes immediately imply
that tldy"l, VI(t) and V;(t) are increasing and right continuous in t and
o
that Eqs. (2.1) and (2.2) hold, for a.e. w. On the other hand, for each t fixed
these quantities are defined as (pointwise) limits of sequences of cet measur-
able random variables. Therefore fldy"l, VI and V; are adapted random
o
processes.
Next, if <f>( w, t) is a real Borel measurable function of t for each w then
we define <f> • ~(w) for each w to be the Lebesgue-Stieltjes integral of the
sample function s ~ <p( s, w) with respect to the sample functions ~ V( s, w).
In order that the resulting integral be measurable and adapted, a
condition on <f> as a function of w is needed. The appropriate condition is
that of progressive measurability.

Definition. <f> = (<p(t, w» is an ceo progressively measurable random pro-


cess if, for each t ~ 0, <p( s, w) restricted to [0, t] X Q is a 'iB([0, t]) X ce t
measurable function of (s, w).
2. SAMPLE PATH INTEGRALS 221

Proposition 2.1. Let V be a corlol, tt. -adapted random process and let <I> be
an tt. -progressively measurable process. Then the sample path Le-
besgue-Stieltjes integral

tl<l>( s, w )lldV( s, w)1 (2.10)


o
is an adapted random process which may be + 00 for some (t, w). If it is
a.s. finite for each t then

t<l>(s, w) dV(s, w) (2.11)


o

is an adapted corlol random process, the variation of which is given by


(2.10).

Proof: The only part of the proposition that does not follow immediately
from the facts given above for deterministic functions is that the integrals in
(2.10) and (2.11) are ttt measurable for each t. It is sufficient to consider the
case when <I> is bounded, because in the general case <I> is the pointwise limit
of a sequence of bounded <l>n with l<I>nl ~ 1<1>1, and the integrals of the <l>n
converge pointwise to those of <I> by the dominated convergence theorem. Fix
t with t ~ O. It suffices to prove (the stronger result) that the random
variable U defined by

U(w) = IatG(s,w) dV(s,w) (t fixed)

is ttt measurable whenever G is a (not necessarily adapted) bounded,


0iJ ([0, t]) X ttt measurable function of (s, w). Let 1C denote the collection of
such G for which U is ttt measurable. Clearly 1C contains functions of the
form I[a,b]xD where 0 ~ a < b ~ t and D is in ttt. The class of such simple
functions is closed under multiplication. Using the dominated convergence
theorem it is also clear that 1C is closed under uniform and bounded
monotone limits. Thus, by the monotone class theorem, 1C contains all
bounded 0iJ([0, t]) X ttt measurable functions, as was to be proved. •
A random counting process (Ct : t ~ 0) is defined to be an increasing
right-continuous integer-valued random process such that the jumps .:lCt =
Ct - Ct - are either zero or one for all w and t. A Poisson random process is a
counting process (Nt) such that No = 0,

Nt2 - N tl , Nta - N t2 ,···, N tn - N tn _l


are independent if 0 ~ t1 < ... < tn'
Nt - Ns is a Poisson random variable with mean t - s whenever t> s
222 MARTINGALE CALCULUS

A Poisson random process is called an tE. Poisson process if it is tE. adapted


and if {(Ns - Nt): s ~ t} is independent of tE t for each t> o. Given an tE.
Poisson process N, we define n t = Nt - t. Then for s ~ t, ns - n t has mean
zero and is independent of tEt so that

Therefore n is a martingale. The variation of n is given by

If cf> is a progressively measurable process then we compute that

and

For example,

fNs dNs = L Ns = 1 + 2 +
o s 5, t
I!.Ns~l

and, since ANs = 0 for all but countably many values of s,

fANs dns = L !lNs - f!lNs ds = Nt (2.12)


o s<t 0
I!.N-;;~l

Equation (2.12) shows that the integral of a bounded progressively


measurable (in particular, adapted) random process with respect to a
martingale is not necessarily a martingale. The reason is that the integrand
and integrator can simultaneously jump in a positively correlated way.
Integrals with respect to martingales are martingales if the integrands are
"predictable" as we see in Section 4.
We defined the Lebesgue-Stieltjes integral cf> • V in this section only
when cf> and V are real valued. The definitions extend naturally to complex-
valued cf> and V by requiring the integral to be linear in cf> and linear in V.

3. PREDICTABLE PROCESSES

A submartingale (Bp tE t : t ~ 0) has nonnegative drift by its very definition:


E [Bt - BsltE.] ~ 0 a.s. for t ~ s
3. PREDICTABLE PROCESSES 223

Clearly an adapted increasing process A with Ao = 0 has the same drift as


B, i.e.,

a.s. when t z s

if and only if M defined by M t = B t - At is a martingale. Such an A gives


rise to a representation of B as the sum of an increasing process A and a
martingale M:

(3.1)

The representation (3.1) of B as the sum of an increasing process and a


martingale is not unique. For example, a Poisson process N is a sub-
martingale relative to its own family of a-algebras (te;'), and t + (Nt - t)
and Nt + 0 are each representations of N with the form (3.1). In order to
make the representation (3.1) unique, we need to further restrict A.
To understand how a canonical version of A can be identified, we will
briefly consider discrete-time processes. Suppose that (ten: n E Z +) is an
increasing family of sub-a-algebras of te. A discrete-time process (An: n E
Z +) is called predictable if An+ 1 is ten measurable for each n. A predictable
process is adapted, but the converse is not generally true. Suppose now that
(Bn' ten : n E Z +) is a submartingale and that
(3.2)
where
(Mn' ten : n E Z +) is a martingale, and (3.3)
A is predictable and Ao = 0 (3.4)
Then
E[Bn+l - Bnlten] = E[An+l - Anlten] + E[Mn+l - Mnlten]
= An+ 1 - An z 0 a.s.

Thus, it must be that A is an increasing process and


n-l

An = L E[B k+1 - Bklte k ] a.s. (3.5)


k-O

Therefore, the submartingale B and conditions (3.2)-(3.4) uniquely de-


termine A and M. In addition, given a submartingale B, if A is defined by
Eq. (3.5) and M is then defined by Eq. (3.2), then A is increasing and
conditions (3.3) and (3.4) are true.
The predictability (rather than merely the adaptedness) requirement
on A was essential for making A unique. A similar concept plays an
analogous role for continuous-time processes, which we turn to next.
224 MARTINGALE CALCULUS

Let (n, 8, 0') be a probability space equipped with an increasing family


(8 t 0) of sub-a-algebras of 8 satisfying the usual conditions.
: t;,:::

Definition. Lp is the a-algebra of subsets of Iii + X n generated by random


processes q, (viewed as functions on IR + X n) of the form

q,(t,w) = U(w)I(a,b](t) (3.6)

where a and b are deterministic and U is an 8 a -measurable random


variable. Sets in Lp are called predictable (or 8. predictable). A
random process H = (H(t, w» is called predictable if it is a Lp-mea-
surable function of (t, w).
The processes q, in this definition are 8. adapted, and it is crucial that
they are left continuous-left continuity implies that q,( t, w) is determined
by {q,(s,w): s < t}.

Proposition 3.1.

(a) Suppose that H is an 8. -adapted process which is left continuous


in t for each w. Then H is predictable.
( b) Suppose that € > 0 and that K is a corlol random process such
that K t is 8 t _. measurable for all t (set 8 t _. = 8 0 if t < E). Then K
is predictable.

Proof: For each n let

n2
Hn(t,w) = L H(!,w)I(k1n,(k+l)ln](t).
k=O n

Then Hn is clearly predictable for each n and

lim Hn(t,w) = H(t,w)


n~oo

This proves part (a).


If 0 < 8 < € then the process K(t+IJ)- is predictable by part (a). This
process converges pointwise to the process K as 8 decreases to zero, and this
establishes part (b). •

Definition. Given any process X adapted to ~., if there exists a process A


such that Ao = 0, A is predictable, A has corlol sample paths of
locally finite variation, and X t - At is a martingale relative to (8.,0'),
then A is called the predictable compensator of X relative to
(8.,0').
3. PREDICTABLE PROCESSES 225

A corIol sUbmartingale (Bp ee t : t z 0) is class D if the collection of


random variables

{B T : T is an ee. -stopping time}


is uniformly integrable. The following is a celebrated theorem of P. A. Meyer
(the second part of which was more or less conjectured earlier by Doob on
the basis of the discrete-time analogue and many examples) about represen-
tations of the form (3.1).

Proposition 3.2.

( a) If a random process X has a predictable compensator, then it is


unique in the sense that any two predictable compensators are equal to
each other for all t, with probability one.
( b) If B is a corIol class D submartingale, then B has a predictable
compensator A, and the sample paths of A are increasing with prob-
ability one.

Proof: Rather than give a proof (which can be found in [Dellacherie and
Meyer, 1982] and in [Doob, 1984]) we only attempt to make plausible the
existence of predictable compensators for a class D submartingale B. Given a
positive integer M, define a process Am by (note the similarity to Eq. (3.5)):

A;' = 0 for 0 ~ t < 2- m


n-l

A;' = L E [ B(k+ 1)2- m - Bk2-mleek2-m 1


k~O

for n2- m ~ t < (n + 1)2- m and n z 1

Then Am has right-continuous increasing sample paths with probability one,


and for each t, either A;' = 0 or A;' is ee t _ 2 -m measurable. Thus, Am is
predictable. Furthermore, by the reasoning given for discrete-time processes,

(Bt - A;',eet : t = n2- m , n = 0,1,2, ... )

is a martingale. The assumption that B is class D implies that Am converges


in an appropriate sense to a process A which satisfies the required condi-
tions. A proof of (a variation of) Proposition 3.2 which is based on this idea
was given by K. M. Rao (1969). •

Remark: If A and A' are each predictable compensators of X, then their


difference (At - A;) is predictable, is a martingale, has initial value
226 MARTINGALE CALCULUS

zero, and has corlol locally finite variation sample paths. Thus, asser-
tion (a) of the proposition is equivalent to the fact that such
martingales are equal to zero for all t, with probability one.
Meyer's original proof, as well as those of Dol€ans (1968) and Rao
(1969) were based on a different characterization of compensator-they
were called "natural increasing processes." Doleans (1967) first estab-
lished equivalence of the two characterizations (see [Dellacherie and
Meyer 1982, p. 126] and [Doob, 1984, p. 483]).

Example 1. Let C be a counting process adapted to te. such that ECt is


finite for each t. We will show that C has a predictable compensator.
First, for fixed n, the process (Ct/>, n) is a class D submartingale since
the family of random variables
{ICT /\ nl : T a stopping time}
is dominated by the integrable random variable ICol + ICnl. Hence,
(Ct /\ n) has a predictable compensator A n for each n- in particular
Ct /\ n - A7 is a martingale. If m < n then Ct /\ m - A7/\ m is clearly also
a martingale, so by the uniqueness of the compensator of (Ct /\ m) we
have A7' = A7/\ m for all t, with probability one or, in other words,
A7' = A7 for all t in [0, m 1\ n], with probability one. Thus, a process
A is well defined up to a set of zero probability by
At = lim A7 for all t, a.s.
n--+ 00

and such a process A is the predictable compensator of C. If A has the


representation

for some nonnegative teo progressive process >t, then A is called the
intensity of C with respect to (te., ~).

Example 2. Recall that if N is an te. Poisson process then ~ - t is a


martingale, so that At = t is the te. predictable compensator of N.
Equivalently, the counting process N has intensity A = 1, relative to
(te., ~).
On the other hand, if 'iJ; = te for all t, then N is an £)t. predict-
able increasing process and ~ - Nt is a martingale (trivially) relative
to (£)t., ~). Thus, the unique £)t. predictable compensator of N is N
itself, and N does not have an intensity relative to '!Y. unless delta
functions are allowed. Thus, the predictable compensator of a process
depends crucially on the family of a-algebras used in the definition.
4. ISOMETRIC INTEGRALS 227

4. ISOMETRIC INTEGRALS

Recall that a corlol martingale (Mt : t ~ 0) is uniformly integrable if and


only if there is an ttoo -measurable random variable Moo such that M t =
E [Moo Ittt]. Let ')1L 2 denote the space of complex-valued corlol uniformly
integrable martingales M such that EIMoo 12 is finite. For M and N in ')1L 2
we define their inner product to be EMoo Noo ' which is simply the inner
product of Moo and Noo in the space L2(ttoo ) of square-integrable ttoo-mea-
surable random variables. The correspondence between a process M in ')1L 2
and the random variable Moo in L2(ttoo ) gives a one-to-one isometric map-
ping of ')1L2 onto L2(ttorJ. Since L2(ttoo ) is a Hilbert space, so is ')1L2.
If M E ')1L 2 then it is not difficult to show that for t > s,

a.s. ( 4.1)

Since the left-hand side of this equation is nonnegative we conclude that


(IMtI2: t ~ 0) is an tt. submartingale. In addition, by Doob's L2 inequality
(Proposition 1.6), sup IMt l 2 has finite mean, and this random variable
dominates

{IMT I2 : T is stopping time}

Therefore, IMI2 is a class D submartingale so that it has a predictable


compensator which we denote by (M,M).
If M and N are in ')1L2 we define (M, N) to be the predictable
compensator of (MtNt ). Since

the compensator (M, N) exists and

(M, N) = H (M + N,M + N) - (M - N,M - N)


+i( - iM + N, -iM + N) - i(iM + N,iM + N)}

Note from the definition that (N,M) = (M, N).


For all complex A,

0:-:; (M + AN,M + AN)


= (M,M) + 2Re(A(M,N» + IAI2(N,N)

The fact that the right-hand side is non-negative for all complex A implies
the useful inequality:

all t, a.s. ( 4.3)


228 MARTINGALE CALCULUS

We now define a stochastic integral with integrator M in '!)R,2 and


e
integrand <p in 2 (M) where

The result is a random process in '!)R,2 which we denote by cf> • M-that is,

The construction closely parallels the construction of integrals with respect


to the Wiener process, but now (M, M) plays the important role that
(w, w) did for Wiener integrals.

Proposition 4.1. There is a unique mapping <p --- <p • M from e2(M) to '!)R, 2
which satisfies properties (a)-(c):
( a) If cf> is a function in e (M) such that
2

if tk < t::5: t k + 1 , some k


( 4.4)
otherwise
for some to, ... , tn (such a cf> is called a step function) then

( 4.5)

( b) (Linearity)

( <p + l/;) • Moo = cf> • Moo + 1/1 • Moo (4.6)


(c) (Isometry)

The additional properties hold


(c') (Isometry)

Ecf> • Mool/; • Moo = E 1o cf>s;r,s d(M, M)s


00

a.s. for each t

( e) A( <p • M)t = cf>t AMt for all t, with probability one, where
AZt = Zt - Zt~ is the jump of a process Z at time t.
4. ISOMETRIC INTEGRALS 229

( f) Assume that M has locally finite variation and that M E ~ 2 •


Suppose that cP is a predictable process such that cP M is well defined 0

as an isometric integral and as a random Lebesgue-Stieltjes integral.


Then the two integrals agree for all t, with probability one.
(g) For M in ~2 and cP in e2(M), cP M is the unique process in ~2
0

with initial value zero such that


(cpoM,N) = cpo (M,N) (4.7)
for all N in ~2. (The integral on the right-hand side of (4.7) is a
Lebesgue-Stieltjes sample-path integral, and we consider two processes
which agree for all t with probability one to be the same.)

Proof: If cP is a step function then the set to,"" tn in (4.4) is not unique.
For example, if (4.4) is true for to, ... ,tn then it is also true for any sequence
t6, t{, ... , t~, for which to,"" tn is a subsequence. In spite of this nonunique-
ness, the sum in (4.5) does not depend on the choice of to,"" tn, so we can
use (a) to define cP Moo for a step function cpo If If; is another step function
e
0

in 2 (M), then a single sequence to,"" tn can be found so that (4.3) and the
corresponding statement for If; are both true. The linearity property is easily
established in this case. Next,
n-ln-l
ICP Moo 12
0 = L L CPtk+~tj+(Mtk+l - MtJ(Mtj+1 - MtJ
k=O j=O

Let Dkj denote the kith term on the right-hand side. The fact that CPtk+ is
eftk measurable for each k implies that for k > i,

EDkj = E[ CPtk+~tj+(Mtj+l - MtJE[ M tk +1 - MtkleftJ] = 0

and similarly EDkj = 0 for k < i. Also, using (4.1),


EDkk = E [ICPtk + 12 E [IMtk +1 - Mtk 12leftk ] ]

= E[ICPtk+ 12E[ (M,M)t k+ 1 - (M,M)tkleftJ]

= E[ICPtk+1 2(M,M)tk+l - (M,M)tJ].


Therefore
n-l
EICP 0 Moo 12 = E L ICPt k+12( (M,M)t k+ 1 - (M,M)tJ
k=O

= E l°OICPld(M,M)t = IICPI12.
o
Thus, properties ( b) and (c) are both true when cP and If; are step functions.
230 MARTINGALE CALCULUS

So far, we have only defined a random variable cp • Moo for each step
function cpo However, because of the one-to-one correspondence between ~2
and the space of square integrable teoo -measurable random variables dis-
cussed at the beginning of the section, this serves to define the process cp • M
in ~ 2 for any step function cpo The case of general cp will be considered next.
e
Let L denote the set of functions cp in 2 (M) such that there exists a
e
sequence of step functions cpn in 2 (M) such that

It is not difficult to verify that L is a closed subspace of 2 (M). In e


particular, L is closed under both uniform and bounded monotone conver-
gence, so by the monotone class theorem, L contains all of the bounded
real-valued functions in M. From this it follows that L = e2 (M).
e
We have thus proven that for any cp in 2 (M), there exists a sequence
e
of step functions cpn in 2 (M) such that IIcpn - cpll --+ O. In order that the
stochastic integral satisfy properties (a )-( c) we are forced to have

and thus, if cp • M exists, we must have

cp • Moo = lim in q.m. cpn • Moo (4.8)

This proves the assertion that the map satisfying properties (a )-( c) is
unique, if it exists.
Continuing now with the existence proof, we attempt to define cp • M
by Eq. (4.8). To see that this works we need only check two things: First, (use
la + bl 2 5: 21al 2 + 2I b I2),
E[lcpn • Moo - cpm. Moo 12] = Ilcpn _ cpml12
5: 2( Ilcpn - cpll2 + Ilcpm - cp1l2) ~ 0

so that the sequence of random variables cpn • Moo is Cauchy in 2-mean and
hence the limit in (4.8) exists. Second, if (l/;n) is another sequence of step
e
functions in 2 (M) such that Ilcp - l/;nll tends to zero as n tends to infinity,
then

E[Il/;n. Moo - cp. Moo 12] 5: 2E[Il/;n. Moo - cpn. Moo 12


+ Icpn • Moo - cp • Moo 12] (4.9)
and since
4. ISOMETRIC INTEGRALS 231

the left-hand side of (4.9) tends to zero as n tends to infinity. Thus 1/In - M
converges in 2-mean to q, - M, so our definition of q, - M does not depend on
the sequence q,n chosen. Thus q, - M is well defined for any q, in 2 (M).e
If q,n and 1/In are step functions then

( 4.10)

and if the functions are chosen so that 1Iq, - q,nll, 111/1 - 1/I nll, and hence also
1Iq, +tj; - q,n - 1/I nll tend to zero as n tends to infinity, we obtain (4.6) from
(4.10) by taking the limit in 2-mean of each term in (4.10). Property (c) is
e
easily verified for any q, in 2 (M) by a similar argument.
The isometry property (c') is implied by property (c) and the identi-
ties obtained by substituting q, - M and 1/1 - M or q, and 1/1 in for M and N in
Eq. (4.2).
For any s and t,

Property (d) is thus easy to establish for a step function q,:


n-l

q,-Mt£E[q,-Mooltrt ] = L q,tk+(Mtk+lAt-MtkAt) a.s.


k~O

= (I (O,t jq,) - Moo a.s.

To prove (d) in general we chose a sequence of step functions q,n with


1Iq, - q,nll tending to zero. Then

so that

q, - M t = lim in q.m.q,n - Mt = lim in q.m.I (O,tjq,n - Moo


n~oo n~oo

a.s.

which establishes (d).


Property (e) is certainly true if q, is a step function. Suppose that (e)
is true for each q,n in a sequence such that q,n converges either montonically
e
or uniformly to a function q, in 2 (M). Then

and
for all t, W
232 MARTINGALE CALCULUS

Thus, by the monotone class theorem, (e) is true for all bounded real
e
functions <f> in 2(M). Appealing to the approximation argument again, one
e
easily sees that (e) is true for all <f> in 2(M).
Property ( f ) is proved by the same method used to prove property ( e).
To prove Eq. (4.7) we only need to establish that U defined by

is a martingale. Since MN - (M, N) is a martingale it is easy to verify that


un defined by

is a martingale for each n, where <f>n is a sequence of bounded step functions


e
converging in 2(M) to <f>. Now by the Schwarz inequality and (4.3),

EI~ - ~nl s EI(<f> - cpn) • Mt)Ntl + EI(<f> - <f>n). (M,N)tl


s (EI( <f> - cpn) • M t I2EINt I2)1/2
+E[(I<f> - <f>nI2. (M,M)t(N,N)t)1/2]

s 211<f> - <f>nll(EIN~I)1/2

Thus, ~n converges to ~ in I-mean for each t. To see that this implies that
U is a martingale, note that for s > t

EIE [[f,ltrtl - Uti = EIE [[f, - [f,nltrtl I


sEI~-~nl~O

so that ~n converges to E[~ltrt], as well as to ~, in I-mean. Thus ~ is a.s.


equal to E[[f,ltr t ], so U is a martingale and (4.7) is established.
Finally, if Z is any process in ~2 with Zo = 0 such that (Z, N) =
<f>. (M, N) for all N in ~2, then in view of (4.7),

(<f>. M - Z, N) = <f>. (M, N) - <f>. (M, N) = 0

for all N in ~ 2 • Choosing N = <f> • M - Z we have (<f> • M - Z, <f> • M - Z)


= 0 so Z = <f> • M. This completes the proof of (g), and hence the proof of
the proposition. •

We can apply similar reasoning to Stieltjes-Lebesgue integrals:

Let V be a (not necessarily square-integrable) martingale


Proposition 4.2.
and a process with integrable variation and let H be a bounded
5. SEMIMARTINGALE INTEGRALS 233

predictable process. Then

defined as a Lebesgue-Stieltj es integral for each (t, w), is a martingale.

Prool: The result is obvious if H is a step function. In general there is a


sequence Hn of step functions such that

Then

Thus, the martingales ltH': dy' converge to ltHs dY: in I-mean for each t,
o 0
which yields the desired result. •

5. SEMIMARTINGALE INTEGRALS

Let X have a representation

where ME 0lL 2 and A is an adapted corlol process with finite variation.


This representation is not unique, but given any other such representation
X t = Xo + M: + A;, the process M - M' is equal to A' - A and therefore it
is in ~ 2 and is also a process with finite variation. Given a bounded
predictable process q, we define the stochastic integral q, • X by

By part ( I) of Proposition 4.1 this definition is not ambiguous even though


the representation for X is not unique.
The class of processes X with the representation above is nearly closed
under sufficiently smooth nonlinear transformations. The main reason it is
not is that I(Xt ) need not be integrable, even though X t is. In this section
we introduce the technique of localizing processes to a random time interval,
and this will allow us to present an important nonlinear transformation
formula in the next section.
234 MARTINGALE CALCULUS

Given a random process X = (Xt : t ~ 0) and a stopping time T, we


define X T , the process X stopped at time T, by

if T = °
xi = (~tX if 0.:00; t.:oo; T
if t ~ T
andT>O
andT>O
T

or equivalently, xi
= I{T> O}Xt/\ T· A random process M is called a local
martingale (resp.local square integrable martingale) if it is corlol and if
there is a sequence (Tn) of stopping times which is increasing (i.e., Tn( w) .:00;
Tn+ 1(w) a.s.) and which satisfies Tn ~ + 00 a.s., such that MTn is a
martingale (resp. square integrable martingale) for each n. Similarly, a
random process H is called locally bounded if there is a sequence Tn i + 00
of stopping times such that HTn is a bounded random process for each n.

Definition. A semimartingale relative to the family (at) is any random


process X which can be written as

for t ~ 0, a.s. (5.1)

where Xo is an teo-measurable random variable, M is a local square


integrable martingale with Mo = 0, and A is a right-continuous adapted
process whose paths are of locally finite variation with Ao = 0. We call
(5.1) a semimartingale representation of X.

Remark: It is more common in the literature to only require that M be a


local martingale (rather than a local square integrable martingale) in
the definition above. However, local martingales are semimartingales as
we have defined them [Delacherie and Meyer, 1982] so our definition of
a semimartingale is equivalent to the common definition. Note that
semimartingales have corlol sample paths since we require local square
integrable martingales to have corlol sample paths.
Let X be a semimartingale and let H be a locally bounded predictable
process. We call a stopping time R good relative to (X, H) if H is bounded
on the interval (0, R] and if X stopped at R has a representation

for t ~ 0, a.s.

where M E ~2 and A has finite variation. For any good stopping time R,
the integral H· X R of H with respect to XR can thus be defined by the
procedure given at the beginning of the section. If S is another good stopping
time, then so is R /\ S and

for t .:00; R /\ S, a.s.


5. SEMIMARTINGALE INTEGRALS 235

There exists a sequence R n i 00 of good stopping times, and then

for t::o:; R n , a.s.

Thus, the processes H 0 X Rn paste together into a unique process H 0 X such


that

If Sn i 00 is another sequence of good stopping times then

so that the definition of H X does not depend on which particular sequence


0

of good stopping times is used.


So defined, the semimartingale integral H X inherits many of the 0

properties which are common to the isometric and Lebesgue-Stieltjes in-


tegrals. For example,
(a) A(H X)t = HJ::"Xt all t, a.s.
0

( b) If X is a local martingale or a process of finite variation, then


H X has the same property.
0

(c) We can obtain the integral H X as a limit of sums (see 0

[Dellacherie and Meyer, 1982] for more infonnation):

Proposition 5.1. Let X be a semimartingale and let Hn be predictable


processes such that Hn( t, w) converges to zero for each (t, w) as n
tends to infinity and such that there exists a sequence of stopping times
Tk i + 00 and constants ck such that IHtl ::0:; Ck for 0::0:; t < Tk and all
n. Then
in p.
sup IHn 0 Xtl ----+
n ..... 00
o (5.2)
0<; t<; T

for any finite T. If U is an adapted corlol process then for each t > 0
(write 8;: for k2- n ):

(5.3)

Proof: First suppose that the Hn are unifonnly bounded by a constant c


and that X = Xo + M + A where M E 0lL 2 and A is an adapted corlol
process with finite variation. Then by Doob's L2 inequality,

EsuPIHnOMtI2::O:;4El°OIHtnI2d(M,M)t n-cJ 0
t 0
236 MARTINGALE CALCULUS

so that sup IHn • Mtl converges to zero in 2-mean, and


t

suplH n • Atl ~ lOOIHtnlldAtl n ~ 00' 0 a.s.


t 0

We thus have that sup IHn • Xtl converges to zero in probability as n tends
t
to infinity.
In the general case there are stopping times Rk i 00 such that

for t ~ 0, a.s.

where Ak has finite variation and Mk E '!))L2. Then H n • Xt = Hn • X t /\ Rk


for t < Rk so that
p.
sup ~

n --+ 00
o

by our conclusion in the special case. This implies Eq. (5.2). Applying what
was just proved to

H;: = ll._I{s5t} - L ~15/:I(t15/:,t15/:+d(s),


O:,,;k< 2 n

Ck = 2k and Tk = inf{t: I~I ~ k} yields Eq. (5.3). •

6. QUADRATIC VARIATION AND THE CHANGE OF VARIABLE FORMULA

Let X and Y be semimartingales. We define their co-quadratic variation


process [X, Y] by
[X,Y]t = liminp. XoYo
n --+ 00

+ L (Xt15/:+ 1 - Xt15/:)(Yr15/:+ 1 - 1';15/:) (6.1)


O:,,;k< 2 n

To see that the limit exists, note that

(Xt15/:+ 1 - X t15/:) ( Yr15/:+ 1 - Yrd

= (Xt15/:+1~15/:+1 - Xt15/:~15/:) - Xt15/:(~15/:+1 - ~15/:)


- ~15/:( X t15I:+1 - X t15/:)

Thus, the right-hand side of (6.1) is the limit of

as n tends to infinity. Proposition 5.1 implies that the limit exists in the
6. QUADRATIC VARIATION AND THE CHANGE OF VARIABLE FORMULA 237

sense of convergence in probability, and it is given by

(6.2)

This equation shows that we can choose a corlol version of [X, Y], which we
always will. Since Eq. (6.1) defines [X, Y]t up to a set of zero probability for
each t, and since a corlol process is determined by its values at a countable
set of times, any two corlol versions of [X, Y] are equal for all t, with
probability one.
The process [X, Y] is linear in X and [X, Y] = [Y, X ], so (ignoring a
set of zero probability)

[ X, Y] = H [X + Y, X + Y] - [X - Y, X - Y]
+i[ -iX + Y, -iX + Y] - i[iX + Y,iX + Y]} (6.3)

Processes of the form [Z, Z]t are increasing in t, and thus [X, Y] is a locally
finite variation process. The continuous part, [X, Y]C, of the process [X, Y]
is defined by

[X, Y]~ = [X, YL -[X, Y]o - L Il[X, YJ.


0<8,,;t

Proposition 6.1. Let X and Y be semimartingales.


(a) Il[X, Y]t = IlXtll~ for all t, with probability one.
(b) [X, Y]~ = [X, YL - XoYo - L
IlXs Ily' for all t, with prob-
0<8,,;t
ability one.
( c) If either X or Y have locally finite variation sample paths then

[X,YL=Xo¥o+ L IlXslly'
0<8,,;t

and [X, Y]c = o.


(d) If H is a locally bounded predictable process then [H· X, Y] =
H· [X, Y] and [H· X, y]C = H. [X, Y]c.
(e) If X and Y are in 0]L2 (resp. are local martingales) then XY
- [X, Y] is a uniformly integrable martingale (resp. a local martingale).

Proof: By Eq. (6.2),

MX, YL = Il(Xt~) - Xt_ll~ - ~_IlXt = dXtll~

which proves ( a), and ( b) is an immediate consequence. The first equation in


(c) can be readily deduced from the definition (6.1) of [X, Y], and then the
fact that [X, y]C = 0 is an immediate consequence. It is easy to verify that
238 MARTINGALE CALCULUS

[H· X, Y] = H· [X, Y] when H is a bounded predictable step function and


the general case can then be proved by an approximation argument similar
to those of the previous section. Then

and so the second assertion in part (d) is true.


If X and Yare each local martingales, then the stochastic integrals in
Eq. (6.2) are also local martingales, and so XY - [X, Y] is also a local
martingale. Now, suppose that X is in 0lL 2 • Then by what was just proved,
IXI 2 - [X, X] is a local martingale. Thus, there exist stopping times Rn i 00
so that for each n,

(6.4)

is a uniformly integrable martingale. It has initial value zero so that

(6.5)

Since IXI 2 is a uniformly integrable submartingale the variables IXt A R 12


converge in I-mean to Ixl as n tends to infinity for fixed t, t s + 00. ih
Eq. (6.5), [X,X]tAR also converges in I-mean, to [X,X]t. Thus, for each
t, t s + 00, the vari~ble in (6.4) converges in I-mean to IXt l2 - [X, X)t.
Hence, this limit process is also a uniformly integrable martingale. Sum-
marizing, if X is in 0lL 2 then IXt l2 - [X, X]t is a uniformly integrable
martingale. The analogous fact for XY -[X, Y] can now be deduced using
(6.3), and the proposition is proved. •
If M is in 0lL2 then both IM21 - [M,M] and IM21 - (M,M) are
martingales. Thus [M, M) - (M, M) is a martingale. Therefore, (M, M) is
the (unique) predictable compensator of the process [M, M). So far we have
defined (X, X) only for X in 0lL 2 •
To define (X, X) more generally we can extend the definition of the
predictable compensator A of a process X by relaxing the condition that
X - A be a martingale to the condition that X - A be a local martingale.
With this more general definition, the predictable compensator of a process
X continues to be unique (if it exists). A sufficient condition for the existence
of a predictable compensator for X is that X be a corlol process with locally
integrable variation.
If X and Yare semimartingales, we define (X, Y) to be the predictable
compensator, if it exists, of [X, Y]. The process (X, Y) exists, for example, if
X and Yare locally square integrable, for then [X, Y] is locally integrable.
Note that [X, Y] = (X, Y) whenever [X, Y) is predictable and XoYa =
O-this is true, for example, whenever [X, Y] is continuous and XoYa = o.
6. QUADRATIC VARIATION AND THE CHANGE OF VARIABLE FORMULA 239

Examples. Let w be a Wiener martingale. Then, as shown in Proposition


2.3.4, [w, w]t = t. Since this is a predictable increasing function with
initial value zero, it is equal to its predictable compensator. Thus,
(w,w)t=t.
For another example, let N be an t£. Poisson process and let n be
the martingale n t = Nt - t. Then both N and n have locally finite
variation sample paths, ~ Ns = ~ ns for all s, and ~ Ns takes values
zero and one. Therefore

O<s:s;t O<s:s;t
Since the predictable compensator of N is t, it follows that (N, N)t =
(n, n)t = t.

Equation (6.2) (which is virtually the definition of [X, Y]) yields that
for semimartingales X and Y,

(6.6)

We call Eq. (6.6) ItO's product lormula. It is an important special case of a


generalization of the next proposition to functions I on en.

Proposition 6.2 (Generalized Ito's Formula). Let I be a twice continuously


differentiable function on IJ;! and let X be a real semimartingale, or let
I be an analytic function and let X be a possibly complex-valued
semimartingale. Then I(Xt ) is also a semimartingale and

I(Xt) = I(Xo ) + f/'(X s-) dXs


o
+ ~ U(Xs ) - I(Xs-) - 1'(Xs-)~Xs)
s:s;t

+% {/"(XJ d[X,X]; (6.7)

Prool: Since I(Xs -) + 1'(Xs-)~Xs is the first-order Taylor's approxima-


tion to I(Xs ), Taylor's approximation formula yields that for t ~ 0,

s:s;t S5;t

(6.8)
where K is random and is defined by

K(X, I, t) = ~max{I/"(U)I: lui ~ SUPIXsl}


s:s;t
240 MARTINGALE CALCULUS

Thus, the sum of jumps term on the right-hand side of Eq. (6.7) converges
absolutely for each finite t, with probability one, and its variation is locally
finite. The right-hand side of Eq. (6.7) is therefore a semimartingale.
Equation (6.7) is certainly true if f(x) = lor if I(x) = x. The equation
when f(x) = xn becomes

xnt = xn0 + nltxn-ldX


s- s + "
~
(xns - xn
s- - nxn-1IlX)
s- s
o 85,t

n(n - 1) ltxn-2d[X X]C (6.9)


+ 2 0 8- , 8

For n = 2 this equation reduces to the product fonnula (6.6) and it is hence
true in that case. Now, argue by induction and suppose that for some
n, n ~ 2, that Eq. (6.9) for xn is correct. Then, since xn+l = x(xn), the
product formula yields

xn+l
t = ltxs- dXsn + ltxns- dXs + [X , xn] t (6.10)
o 0

Now, using Eq. (6.9),

[ X , xn] t = xn+l
0
+ nltxn-ld[X
s-'
X] s
o
+ L IlX8 ( X: - X:_ - nX:-=-l IlX8 )
85,t

= X;+l + nfX:-=-ld[X,X]; + L IlXs(X: - X:_)


o

and

fX
o
s- dX: = n fX:- dXs +
0
L
8:5, t
X s-( X: - X:_ - nX:-=-l IlX.)

n(n - 1) ltxn-ld[X X]C


+ 2 0 8- , s

Substituting these expressions into the right-hand side of Eq. (6.10), we


obtain that Eq. (6.9) is true for n replaced by n + 1. Therefore, Eq. (6.9) is
true for nonnegative integers n.
Since each side of Eq. (6.7) is linear in I, it is thus true whenever I is a
polynomial. In the general case, there exists a sequence of polynomials In so
that In and the first and second derivatives of In converge to I and its first
and second derivatives, uniformly on each bounded set. Then Eq. (6.7) is
valid for each In and, by Proposition 5.1 and the estimate (6.8) with I
replaced by I - In' we can take a term by tenn limit in probability to deduce
that Eq. (6.7) is valid for I. •
7. SEMIMARTINGALE EXPONENTIALS AND APPLICATIONS 241

7. SEMIMARTINGALE EXPONENTIALS AND APPLICATIONS

Given a semimartingale X, the exponential process of X, 0(X), is defined by

0(XL = exp(Xt - HX,X]n Il (1 + ~Xs)e-AXs (7.1)


S:5,t

Given any D > 0, there is a constant K so that

if Izl :0; D

and since the sum of I~XsI2 over s in any finite interval is a.s. finite, the
product term in (7.1) converges absolutely for all t, with probability one. The
product term thus represents a process with locally finite variation, so the
process 0(X) is itself a semimartingale. Note that 0(X)t = exp (t) if X t == t.
Using the generalized Ito's formula, it is not hard (although it is
tedious) to show that

(7.2)

Equation (7.2) can be viewed as a stochastic integral equation for the


semimartingale 0(X), for X given. It can be shown [C. Doleans-Dade, 1970]
by methods similar to those in Section 4.4 that 0(X) is the unique, up to a
set of zero probability, solution of the equation. (If we had provided a proof
of this fact or used it, we would have defined 0(X) using Eq. (7.2).) An
important consequence of Eq. (7.2) is that if X is a local martingale, then so
is 0(X).
It is easy to verify directly from the definitions that if X and Y are
each semimartingales then

0(X)0(Y) = 0(X + Y+[X,Y]) (7.3)

and that if X is a real semimartingale with ~Xs 2 -1 for all s then


0(X)t 20 for all t.
The first application of martingale exponentials is to prove a proposi-
tion which combines a characterization of the Wiener process due to P. Levy
with a characterization of the Poisson process due to S. Watanabe.

Proposition 7.1 (Characterization Theorem). Let w and n be real local


martingales where w is sample continuous with [w, w]t = t and N
defined by Ne = n t + t is a counting process with No = O. Then w is an
tt. Wiener martingale, N is an tt. Poisson process, and for any s 2 0
the following three a-algebras are independent:

tts
242 MARTINGALE CALCULUS

Proof: Since W and N are adapted, it suffices to prove that for any s, t
°
with ~ s < t,
wt - Ws is Gaussian, mean 0, variance t - s;
Nr - Ns is Poisson, mean t - s; and
Wt - Ws' Nt - Ns and 6es are independent (7.4)

Choose real constants U and v and define a local martingale X by

Xt = iuwt +( e iv - l)nt

S(X) is also a local martingale, and

[ X , X]Ct = -u 2 t'!:J.X
t = (e iv -l)!:J.Nt and IlNt E {O, I}
so we obtain

S(X)t = exp(iuwt + iv~ + t{u 2 /2 + 1 - e iv ))


Since the local martingale S(X) is bounded on finite intervals independently
of w, it is a martingale. Thus, for s ~ t,

or, equivalently,

This translates to

E[exp{iu(wt - ws ) + iv(Nt - N s ))I6esl


= exp ( - ( t - s)( u 2 /2 + 1 - e iv ))

As a function of (u, v) the right-hand side of this equation is the joint

°
characteristic function of a pair of independent random variables, the first of
which is Gaussian with mean and variance t - s, and the second of which
is Poisson with mean t - s. Thus, by the uniqueness of characteristic
functions, the conditions in (7.4) must be true. •
The next application of semimartingale exponentials is to the descrip-
tion of local martingales when the probability measure on the underlying
probability space is changed. We start out with a probability measure 0'0 on
(n, 6e). Suppose that M is a local martingale relative to (6e t ) on the
probability space (n, 6e, 0'0). For short, we say that M is a 0'0 local martingale.
Assume also that M is real valued, that Mo = 0, and that IlMt > 1 for all t,
and let L = S(M). Then Lo = 1, and L is a positive 0'0 local martingale.
Hence, L is also a 0'0 supermartingale (see Exercise 8) and the a.s. limit Loo
7. SEMI MARTINGALE EXPONENTIALS AND APPLICATIONS 243

exists. We assume that


EoLoo = 1
where the subscript" 0" on the expectation indicates that it is defined using
measure 0'0' Then L is a uniformly integrable 0'0 martingale. We can define
a new probability measure 0' on (Q, ct) which is equivalent to 0'0 (i.e., 0' and
0'0 assign zero probability to the same sets) by specifying the Radon-Nikodym
derivative:

(7.5)

We will use "E" without a subscript to denote expectations defined using


measure 0'.

Lemma 7.1. For any random variable D with EIDI finite and any sub-a-alge-
bra CJ of ct,

0' a.s.

Proof: We will first show that

(7.6)

Let V denote the left-hand side of this equation. To show that it is equal to
the right-hand side we will show that V has the two properties which
characterize the right-hand side. First, V is CJ measurable. Second, we must
prove that
all bounded CJ measurable Z (7.7)
Now

Eo[ZLooD] = E[ZD] = E[ZE[DICJ]]


= Eo [ZLooE[DICJ]] = Eo [Eo [ZLooE[DICJ]ICJ]]
= Eo [ZE[DICJ]Eo[LooICJ]] = Eo[ZV]
which proves (7.7) and hence also (7.6).
Since 0' « 0'0' the equality in (7.6) is also true 0' a.s., and since

0'( Eo [Loo iF] = 0) = Eo [ LooI{Eo[L""!§,l~od

= Eo[ Eo[LooICJ]I{Eo[L""I'51~0}] = 0

the lemma follows. •


244 MARTINGALE CALCULUS

Lemma 7.2. Suppose that U is an adapted random process such that Lt~ is
a 0'0 martingale (resp. 0'0 local martingale). Then U is a 0' martingale
(resp. 0' local martingale).

Proof: Suppose that Lt~ is a 0'0 martingale. Then, by Lemma 7.1, for
s> t,

0' a.s.

and the first assertion is proved. If, instead, Lt~ is only a 0'0 local
martingale, then there exist stopping times R n i 00 such that (LU)t /\ R is a
0'0 martingale for each n. Then, by the argument above, ~ /\ R is n a 0'
martingale for each n, which implies that U is a 0' local martingafe. •

Proposition 7.2 (Abstract Girsanoy's Theorem, [GirsanoY, 1960], [Van Schuppen and
Wong, 1974]).Let Y be a 0'0 local martingale. If (Y, M) exists (com-
puted under measure 0'0)' then Y - (Y, M) is a 0' local martingale.

Proof: By Lemma 7.2, it suffices to prove that (Y - (Y, M»)L is a 0'0 local
martingale. By Ito's product formula,

(1'; - (Y, M)t)L t = - {Ls- d(Y, M)s + [Y - (Y, M), L] t


o
+ a 0'0 local martingale (7.8)

Since (Y, M) has locally finite variation and is predictable and since dL t =
L t _ dMt ,

[Y- (Y,M),L]t=[Y,L]t- fl1(y,M)sdLs

= {Ls_ d[Y,ML + a 0'0 local martingale


o
Finally, since (Y, M) - [Y, M] is a 0'0 local martingale, the right-hand side
of (7.8) is a 0'0 local martingale. •

Example. Suppose that w is an (&., 0'0) Wiener martingale and that N is a


counting process with predictable intensity At, where At > 0 a.s. Let h
and p be predictable processes such that the equation

defines a local martingale M. Suppose that EoLoo = 1 where L = 0(M)


7. SEMI MARTINGALE EXPONENTIALS AND APPLICATIONS 245

and that 0' is defined by Eq. (7.5). Then, computing under measure 0'0'

and
[') itA - p
( M, N - J~ As ds = (M, N)t = J~ \.. s d(N, N\
o t 0 s

Therefore, under measure 0',

and

are each local martingales. Since the quadratic variation process of the
first of these is t, the first of these processes is in fact a 0' Wiener
process. The intensity of N under measure 0' is p.

The final application of martingale exponentials we give is a useful


martingale representation theorem. Let w denote a Wiener process and let N
denote a Poisson process, both defined on a complete probability space
(n, cr, 0'0)' Define an increasing family of a-algebras «(9t) by
I.9t = a( ws , Ns: s :::; t) V 0'o-null sets

The following lemma ensures that the family «(9t) satisfies the usual condi-
tions, so that, for example, «(9t, 0'0) martingales have corlol modifications.

Lemma 7.3 (Conditional Zero-One Law). (9t = (9t+.

Proof. (Based on an idea in [Doob, 1953J): For t ~ 0 define

Fix t and let X be a bounded (9t+ -measurable random variable. Since (9t+ is
independent of Glt t + ~ for each n ~ 1 we have

The martingale convergence theorem ensures that the left-hand side con-
verges a.s. to Eo[XIGltt ] so that Eo[XIGltt ] = EoX a.s. By considering all
possible X we conclude that the a-algebras Glt t and (9t+ are independent.
Now if Z is a sum of finitely many terms AiBi where the Ai are bounded
(9t-measurable random variables and the Bi are bounded Gltt-measurable
246 MARTINGALE CALCULUS

random variables, then


n
L Eo[(X - Eo[XI(~\])A;] EoB; = 0
i~l

By the monotone class theorem, the left-most expectation in this equation is


zero for all bounded (900 -measurable random variables Z. Thus, X = Eo [XI(9t]
a.s. which, since (9t contains all 0'0 null sets, implies that X is (9t measurable.
Since X was an arbitrary bounded (9t+ -measurable random variable, the
lemma follows. •

Proposition 7.3 (Martingale Representation Theorem).Let Y be any (corlol) local


martingale relative to (9t, 0'0)' Then Y has the representation

(7.9)

where n t = ~ - t, and Hand K are predictable with

0'0 a.s. for t < + 00 (7.10)

In the special case that Y is a martingale relative to (9t,0'o) with


Eol Yel 2 < + 00,

EolYel 2 = EolYol 2 + EoDHsl2 + IKsl2 ds (7.11)


o

Proof: Assume that Y is a real bounded corlol (9.,0'0) martingale and


choose T with T> O. Then by part (g) of Proposition 4.1, inequality (4.3),
and the fact that (w, w)t = t,

IEo..(,TC( t, w) d(Y, W)tl = IEo(Y, C· w>rl

~ Eo [ ( (Y, Y)T ~TCs2ds) 1/2]


for real bounded predictable processes C. This implies the following absolute
continuity condition for the indicated measures on the predictable subsets of
[0, T] X 0:

d(Y, w )to'o (dw) « dt0'o (dw)


Let H denote the corresponding Radon-Nikodym derivative-so, by defini-
tion,
7. SEMI MARTINGALE EXPONENTIALS AND APPLICATIONS 247

for all bounded predictable processes C. By choosing C of the form Ct ( w) =


Z( W )I{a < t,,; b} where Z is (fa measurable, we conclude that

is a martingale. It is also predictable and has corlol locally finite variation


sample paths. It is thus zero for all t, with probability one, so

o~ t ~ T a.s.

Similarly, there is a predictable process K so that

o :s; t ~ T a.s.

We will prove that Eqs. (7.9) and (7.11) hold.


Choose c > 0 and define the predictable process U by

and define martingales Y and Y by

and

~ = Yo + tHPs dws + tKPs dns


o 0

We will show that 1'; = ~. First define Zt = 1'; - ~ and note that (z, w) =
( z, n) = O. The jumps of Z are uniformly bounded so that we can choose
A > 0 so that A t::.Zt > -1 for all t. Then c£(AZ) is a positive local martingale
so there exist (9. stopping times Rk iT such that Lk = c£CAZ t 1\ R) is a
martingale for each k. Let 0'k denote the probability measure on (Q, (f)
which is absolutely continuous with respect to 0'0 such that d0'k = L} d0'o.
Since (z, w) = (z, n) = 0, Proposition 7.1 yields that w is a 0'k Wiener
martingale and N is a 0'k Poisson process. Thus 0'k and 0'0 ar~ identical on
(9T' which implies that L} = 1, 0'0 ~.s. Thus, Zt = 0 and 1'; = 1'; for 0 ~ t ~
R k • Since k was arbitrary, 1'; = 1'; for 0 ~ t ~ T. Now, 1'; converges in
2-mean to ~ as c --+ + 00 and so Eqs. (7.9) and (7.11) are obtained in the
limit. By an easy patching-together argument, Y has a representation of the
form (7.9) for all t ~ O.
Next, suppose that Y is a square integrable «(9t, 0'0) martingale, and let
T> O. Then there exists a sequence of bounded (9T-measurable random
248 MARTINGALE CALCULUS

variables Y'; converging in 2-mean to YT' Let yk denote the corlol martingale
on [0, T] defined by ~k = Eo[~kl~\]. Then yk is bounded so that

~k = Yok + ItHsk dws + It Ksk dn s t< T


a a
for some predictable Hk and Kk. Now

EaI Y; - Yom l2 + EoITIH: - H::'12 + IK: - K::'12 ~ = EoIY'; - Y;'1 2


a
and the right-hand side of this equation converges to zero as m and k tend
to infinity, so there must exist predictable functions H and K so that

EoITIH: - Hsl2 + IK: - Ksl2 ~ k ~ ex! 0


a

Equations (7.9) and (7.11) thus hold for 0 ~ t ~ T, and it is then an easy
matter to show that there exist H and K so that Eqs. (7.9) and (7.11) hold
for 0 ~ t ~ T.
The general case of the proposition can be proved by a more elaborate
approximation argument [Liptser and Sl-iryaev, 1976], which we omit. See
[Jacod, 1979] for a variety of extensions. •

EXERCISES

1. Verify Eq. (4.1).

2. Suppose that S and T are stopping times. Show that their minimum S /\ T and
maximum S v T are stopping times. If (Tn) is a sequence of stopping times show that
T* = supTn is a stopping time.
n

3. Use the optional sampling theorem to show that if M is a corlol martingale and T is a
stopping time then MT defined by Mr = M,,, T is also a martingale.

4. Prove the optional sampling theorem (Proposition 1.5) in the special case that there are
two nonrandom times tl < t2 such that S, R E {t l , t 2 } a.s.

5. Let U I , U 2 , • •• be independent and suppose that each U; is uniformly distributed over


the interval [0,2]. Let Zn = U I U2 ••• Un. Show that Z is a martingale relative to tt~ and
discuss the limiting behavior of Zn as n tends to infinity (Hint: Consider In Zn).

6. Verify Eq. (1.9).

7. Show that a positive local martingale M is a supermartingale.


EXERCISES 249

8. Let h be a predictable process such that 10 00


h~ ds s K a.s., let w be a Wiener
martingale, and let p > 1. Use the fact that E0(ph • W)T :S 1 for all stopping times T
to obtain an upper bound on E[0(h' w)f] which is uniform in T. Then deduce that
0(h' w) is class D, and finally, that E0(h' w)oo = 1.

9. Let (!'t n ) be an increasing family of sub·a·algebras of !'t, let !'too = V!'tn , let 0'0 and 0' be
probability Illeasures on !'too> and let 0'on and 0'n denote the restrictions of 0'0 and 0' to
!'tn. Suppose that 0'; « 0'n for each n and let Ln = d0'; jd0'n. Then L is a positive 0'0
martingale, so that Loo = lim a.s.Ln exists. Show that 0' « 0'0 if and only if EoLoo = 1.
(Hint: EoLoo = 1 is equival~;t"'to L being uniformly integrable.)

10. [Kakutani, 1948] Let g = {(XI' X2' ... ): Xi E 1Ii!} and let Xi denote the function on g
defined by Xi(X)=Xi' Let !'tn = a(XI"",Xn} and !'too =Vn!'tn. Let 0'0 and 0' be
probability Illeasures on (g, !'too) so that XI' X 2 , ••• are independent unit variance
Gaussian random ~ables under 0'0 and under 0'. Assume that EOXi = 0 and EX; = ai
for all i. Let S = L af. Using the previous problem, show that 0'0 == 0' if S < + 00 and
i=l
that 0'0 ..L 0' otherwise. Show also that if 0'0 == 0' then the Radon-NikodyJTI derivative
of 0' with respect to 0'0 is given by

where the SUIll converges 0'0 a.s.

11. Find the predictable compensators for wt4 and for N t4 , where w is a Wiener Illartingale
and N is an !'t. Poisson process.

12. Let (Vn'@"n: n E Z+) be a martingale with Vo = 0, let Bn = Vn + 1 - Vn for n:2: 1, and
suppose that IBnl = 1 for all n and w.

(a) Show that Bn + I is independent of !'tn for each n (so that B I , B 2 , ... are mutually
independent) and that 0'(Bi = 1) = 0'(Bi = -1) = 0.5.

( b) Show that any martingale (Mn) relative to the a-algebras (!'t;;') generated by V has
a representation
n
Mn = Mo + L HkBk n:2:1
k~l

where H is a discrete-time predictable process.


7
Detection and Filtering

1. INTRODUCTION

In both the detection problem and the filtering problem a sample path from
some random process 0 is observed. In the detection problem, it is pos-
tulated that one of two known probability measures ?P or C!Po is given on the
underlying space (g, ce). Presumably the distribution of the observed process
is different under ?P and ?Po. The problem is to intelligently guess which of
the two measures is in effect, on the basis of the observation. In the filtering
problem a random process t which may be a coordinate of some other
random process, is given and the problem is to produce at each time t an
estimate of ~t.
The detection and filtering problems are closely related. On the one
hand, the solution of the detection problem can involve estimating processes
under the assumption that one of the given probabilities is in effect. On the
other hand, one treatment of filtering is to introduce a probability measure
in addition to the one given, thus leading to the situation encountered in the
detection problem.
250
1. INTRODUCTION 251

For both the detection and the filtering problem the role of the
observations is to provide information, and this information is summarized
by the increasing family of a-algebras (9. = «(9t) generated by the observa-
tions:
(9t = a( Os: 0 ~ s ~ t)

We shall be largely interested in computing E[~tl(9t], which is known to be


the minimum mean square error estimate of ~t given (9t. This expectation is
defined only up to sets of zero probability which can depend on t, but a nice
version of the process always exists:

Proposition 1.1. Let (9. be an increasing family of sub-a-algebras of A (not


necessarily satisfying the usual conditions) and let ~ = (H t, w» be a
c.>B (IR +) X (f..measurable random process. Then there exists an (9. pro-
gressively measurable process ~ such that ~t = E[~tl(,l\] a.s. for all t
such that E~t or Eft is finite.

Proof: Consider first the case that ~(t, w) = f( t)U( w) where f is a bounded
Borel-measurable function and U is a bounded random variable. Let (Mt )
denote a separable version of the martingale E[UI(9t]. There exists a count-
able subset D of IR + such that for each t not in D, M is a.s. left continuous
at t [Doob 1953, Th. 11.2, p. 358]. Then the process (j defined by

_ ( limsupM LtnJ / n if t ft. D


lJ,; = n~ 00

Mt iftE D

is a progressively measurable process (it is a limit of such processes) and it


is a modification of M, and therefore of E[lJ,;I(9t]. Thus ~ defined by
~t = f(t) lIe is a progressively measurable version of E[~tl(9t].
Let % denote the vector space of all bounded measurable processes ~
such that there exists a progressively measurable version of E[~tl(9t]. Then %
includes the algebra of all finite linear combinations of processes of the form
f( t)U( w) as above.
Next, suppose that ~ is a bounded measurable process and that ~n is a
sequence of processes in % such that either ~n converges to ~ uniformly, or
~n( t, w) increases to ~(t, w) for all t, w, as n tends to infinity. Then ~ defined
by
Ht,w) = limsupt(t,w) (1.1)
n~oo

is a progressively measurable version of E[~tl(9t]. Thus ~ E %. % is there-


fore closed under uniform limits and under bounded monotone limits. By the
252 DETECTION AND FILTERING

monotone class theorem, X contains all bounded measurable processes.


Finally, given an arbitrary measurable process, ~,~ is the pointwise
limit of a sequence of bounded measurable processes ~n with I~nl .::; I~I. Then
~ defined by (1.1) satisfies the requirements of the proposition. •
If ~ t is redefined on a (t, w) set of dt dqy ( w) measure zero, then a
predictable process can be obtained:

Proposition 1.2. Let ~ = (g(t, w» be a 0?>(R+) X tt-me!lSurable random pro-


cess. Then there exists an (9. predictable process ~ and a subset D of IR
with Lebes$Ue measure zero (in fact D can be chosen to be countable)
such that ~t = E[~tl(9t] a.s. for all t such that t is not contained in D
and either E~-: or Ef; is finite.

Prool: Let X denote the space of bounded ~ for which Proposition 1.2 is
true. If ~(t, w) = l(t)U(w) and M and D are as in the proof of Proposition
1.1, then ~ defined by

= { I( t) limsupMLt(n-l)J/n if t $. D
~t = n-oo
o ift E D
(with the convention that M t = 0 for t < 0) and D satisfy the conclusion of
Proposition 1.2. Thus ~ is in X, and X also includes finite linear combina-
tions of such processes. The rest of the proof is nearly identical to that of
Proposition 1.1. •

Remark: ~ and ~ are a poor person's version of the so-called optional


projection and predictable projection, respectively, of ~ onto the family
«(9t) (see [Dellacherie and Meyer, 1982, Appendix I, Th. 6]).
In order to describe the dynamics of processes which are not" observed,"
we assume that there is a second increasing family a.
of sub-a-algebras of a
such that (9t c at for each t.

Proposition 1.3 (Representation of Projections onto (9 J. Let Z be an (a., qy)


semimartingale with representation

Zt = Zo + ltlscis + m t
o
where EIZol is finite, I is an a. progressive process with

E fils I cis < + 00


o
and m is an (a., qy) martingale. Then there exists an «(')., qy) martingale
1. INTRODUCTION 253

D so that for each t,

0' a.s. (1.2)

and

0' a.s. (1.3)

Proof: If we use Eq. (1.2) to define D and use the representation for Z and
the definition of Zt we obtain

where 19. progressive versions are chosen for conditional expectations where
appropriate. Now for 0 ~ a < t,

E[E[m t ll9 t ]ll9a] = E[m tll9a]


= E[E[mtl~a]ll9al = E[m all9a]
and

E[fE[181l9t] -E[tsll9sl dsll9a] = fE[tsI0 a] -E[/sll9sAa ] ds

= IaaE[ Isll9al - E[ Isll9


8 ] ds

which shows that D i~ indeed a martingale. Equation (1.3) is a consequence


of Eq. (1.2) since f = f except on a set of dt d0'( w) measure zero. •
Suppose now that the observed process is given by Ot = (Yt, Nt) and
that

Yt = Ioth s ds + wp (1.4)

where w is an ( ~ .' 0') Wiener martingale, (1.5)


his (£. progressive, E Itlhsl ds < + 00 (1.6)
o
and

Nt 1\8ds + Mp
= (1.7)
o
where N is a counting process, (1.8)
254 DETECTION AND FILTERING

M is an ( ce .' 0') martingale, (1.9)

A is ce. predictable, E flAsl cb; < + 00 (1.10)


o

The main point of conditions (1.7)-(1.10) is that N is a counting process with


teointensity A.
Since y and N are (9. progressive, we have y = y and N = N. Thus,
Proposition 1.3 yields that y and IV defined by

f t,
Yt = Yt - Jfi hs cb; and (1.11)
o

are (9. martingales. The processes Y and IV are called the innovations
processes corresponding to the observation process y and N.

2. LIKELIHOOD RATIO REPRESENTATION

Consider a simple hypothesis testing problem for which a process (Os:


o~ ssT) is observed and the problem is to decide which of two given
probability measures 0' or 0'0 is in effect on the underlying space (0, ce).
Associated with any decision procedure we can identify two conditional error
probabilities:

p; = <» [ decide 0'0 is in effect]


p;I = <»0 [decide 0' is in effect]
The a-algebra of the observations is (9T where we now let

(9t = a( Os: 0 s sst) V{ 0'0 null sets}

We also assume that (9t = (9t+ and that (90 consists only of ~o null sets and
their complements. Let 0'~ and 0't denote the restrictions of 0'0 and 0' to (9t'
We assume for simplicity that 0';[ and <»T are mutually absolutely continu-
ous and define At for 0 s t s T by

We observe that At = E o [A T I(9t] so that A is a 0'0 martingale. By our


assumptions on (9., 0'o(Ao = 1) = 1 and we can choose a modification of A
which is corlo!.
The likelihood ratio test with threshold y and randomization probabil-
ity a for deciding which probability measure is in effect is the following:
2. LIKELIHOOD RATIO REPRESENTATION 255

compute AT and

if AT > Y decide 0'


if AT = Y decide 0' with probability a, otherwise decide 0'0
if AT < Y decide 0'0

Then, for example,

and for any E between zero and one there is a choice of the parameters y and
a such that p;I = E. By the well-known Neyman-Pearson lemma of statistics,
the resulting likelihood ratio test achieves the minimum p; over all decision
rules with p;I = E.
The key to implementing and evaluating a likelihood ratio test is to
compute the likelihood ratio AT' and the key to understanding the likelihood
ratio process (A t) is to connect it to the behavior of (q) under 0' and 0'0. We
will first represent A as a martingale exponential.
Define, for n z 1,

Rn = inf {s: 0:::; s :::; T and As :::; lin, or s = + oo}


Then Rn is increasing in n for each w so there exists a limit R of Rn as n
tends to infinity. Since A has right-continuous sample paths, ARn :::; lin if
R n :::; T. By this fact and the optional sampling theorem, we have (using In
for the indicator of the event that Rn :::; T):

EoATI{R';'TJ:::; EoATIn = Eo(Eo[ AT I nl(9RJ)

= Eo( Eo [ A TI(9RJ In) = EoARJn :::; l/n

Thus EoArI{R';'TJ = 0, and since 0'O(AT > 0) = 1 it follows that 0'o(R ::;; T)
= o. Therefore A;~ is locally bounded, and As- is locally bounded away
from zero. We can thus define

and apply Ito's formula to obtain


256 DETECTION AND FILTERING

or

At = exp(Xt - ~[X,X]~ + L In(l + ~Xs) - ~Xs)


ss,t
Equivalently, A has the representation A = SeX), and X IS a 0'0 local
martingale since A is one as well.
Our goal now is to relate X (rather than A directly) to the behavior of
o under 0' and under 0'0' We now restrict our attention to the case that
Dr = (Yt, Nt)· We suppose that (Yt, Nr) satisfies conditions (1.4)-(1.10) for
some A and h-these are conditions on (y, N) under the measure 0'. We
suppose, on the other hand, that

Y is an ( (9., 0'0) Wiener martingale, and


N is an ( (9., 0'0) Poisson process

By the Levy-Watanabe characterization theorem, Proposition 6.7.1, this


completely specifies the distribution of (y, N) under measure 0'0'
We can now invoke the martingale representation theorem under
measure 0'0 to deduce that (let n t = Nr - t)

05.t5.T

for some (9. predictable processes <l> and I/; such that

0'0 a.s.

To see how <l> and I/; are related to 0', first apply the abstract Girsanov's
theorem, Proposition 6.7.2, to deduce that

Comparing this with the fact that the innovations processes y and IV in
(1.11) are also «(9.,0') martingales and using the fact that the «(9.,0')
predictable compensators of Y and N are unique, we conclude that

and dtd0'( w) a.e.

Hence the representation for X becomes

X = h· Y +(~ - 1) • n
3. FILTER REPRESENTATION - CHANGE OF MEASURE DERIVATION 257

so that

A = 0(X) = 0(h· Y)0((X - 1)· n) (2.1)

where

and

0((X - 1) • nL exp(fln(Xs) dN f(Xs - 1) dB)


= s -

exp (- txs - 1 dB) TI Xs


=
o s<t
!:>.N-; ~ 1

Examination of Eq. (2.1) reveals that the optimal solution to the


detection problem cleanly separates into a pure estimation part and a pure
detection part. More precisely, for the "pure detection case" in which A and
h are known nonrandom functions, Eq. (2.1) with h = h and :\ = A, together
with the likelihood ratio test, prescribes the optimal strategy. In the g~neral
case, the first part of the solution is to compute the estimates h and :\ and
the second part is to proceed as in the pure detection case.

Remark: The likelihood ratio representation (2.1) can be traced back to work
of Sosulin and Stratonovich (1965), Duncan (1968), and Kailath (1969).
For an account of its history for point process observations, see
[Bremaud (1981)].

3. FILTER REPRESENTATION - CHANGE OF MEASURE DERIVATION

Let (!2, ct, 0') be a complete probability space and let ~. = (~t) and (9. = «(9t)
be two increasing families of sub-a-algebras of ct which satisfy the usual
conditions. The a-algebra ~t represents "state information" up to time t and
(9t represents "observed information" up to time t. It is useful to define a
third increasing family ct. by ct t = ~t V(9t. We will assume that the observa-
tions a-algebras are generated by a pair of processes y and N:

(9t = 0'( y." Ns: 0 ~ s ~ t) V{ 0' null sets} (3.1)


A random process ~ adapted to ~. represents the state, or one coordinate of
the state, to be estimated. We assume that ~, y, and N have the following
representations:
258 DETECTION AND FILTERING

State Coordinate: 0:::;, t:::;, T (3.2)

(3.3)
Observation:
(3.4)

where

~o is ~o measurable, EI~ol < + 00,


f3 is ~. predictable, E Ia If3s I ds <
T
+ 00,
m is an ~. martingale,

h is 6£. progressive, iTo h; ds < + 00 a.s., (3.5)

w is an fJ t V ~T Wiener martingale, (3.6)

N is an fJ t V ~T adapted counting process, (3.7)

A is 6£. predictable, E
o
iTAs ds < + 00, As > 0, (3.8)
M is an fJ t V ~T martingale (3.9)

Remark: If we specialize the conditions to require h to be ~. progressive and


A to be ~. predictable then by a slight extension of the Levy-Watanabe
characterization theorem, w is independent of ~T and, given ~T' N is
an independent increment process with Poisson distributed increments.
The model is called doubly stochastic in this special case.
The purpose ofthis section is to derive a representation for ~t = E[~tlfJt]
by carrying out calculations under a second probability measure "Po. This
approach finds its roots in the work of Duncan (1968), Mortensen (1966), and
Zakai (1969). To begin, define a "P local martingale U by

We invoke the following assumption:

Assumption: EUT = 1 (3.10)

Define a new measure "Po on (n, 6£) by


3. FILTER REPRESENTATION - CHANGE OF MEASURE DERIVATION 259

Proposition 3.1.

(a) 0' «0'0 and (with n t = Nt - t) d0' /d0'o = LT where


L = f9(h· y + (AA- 1)· n). (L is given by the right-hand side of Eq.
(2.1) with h and >.. replaced by h and A.)
(b) The restrictions of 0' and 0'0 to S'lT are the same.
( c) y is an «(9., 0'0) Wiener martingale.
(d) N is an «(9.,0'0) Poisson process.
(e) For each s z 0, the three a-algebras

and

are 0'0 independent. In particular (take s = 0), S'lT and (9T are 0'0
independent.

Proof: Using the identity (6.7.3) we easily check that Lt~ = 1, which
implies (a). Next, V is an (S'lT V (91' 0') martingale with Vo = 1 so that
E[VTIS'lT] = 1. Thus, for any bounded S'lT-measurable random variable X,

EoX = E[XVT ] = E[E[XVTIS'lT]]


= E[XE[VTIS'lT]] = EX

so that ( b) is true. Now, computing under measure 0',


A-I
wt - <- h· w + -A
- • M' w) t = v
Jt

and
A-I
Mt - <- h· w + -A
- . M' M) t = N t - t = n t

so by the abstract Girsanov's theorem, y and n are each «(9t VS'lT' 0'0) local
martingales. Moreover, y is sample continuous with [y, Y]t = t and N is a
counting process. Properties (c)-( e) are then implied by the Levy-Watanabe
characterization theorem, Proposition 6.7.1. •

Remark: We will no longer appeal directly to assumption (3.10), but will only
use the fact that 0'0 satisfying the conclusions of Proposition 3.1 exists.
This is important since Proposition 3.1 can be established under much
less restrictive assumptions. In particular, it is not necessary to have
0'0 « 0'.
By Lemma 6.7.1, if 1'/ is a measurable process with E( 1'/t) finite for all t,
then

0' a.s. for each t


260 DETECTION AND FILTERING

where TI is any (9. progressive process such that

(3.11)

Note that

where A is the likelihood ratio process for 0' versus 0'0 relative to (9 ••
It follows that if we define ~ to be any predictable process such that
~t = TIt 0'0 a.s. unless t is in some Borel subset D of ~ + with Lebesgue
measure zero, then we have

dtd0'( w) a.e.

We make the additional assumptions

Eo i
0
T
(~sLshJ ds <
2
+00

EoiTI~s_Ls_(As - l)lds < +00


o
Eo io Ls_ d(m, m)s
T 2
< + 00 or

Proposition 3.2. Under the above assumptions,

a.s. for t ~ °
Lemma 3.1. [y, m]t = [n, m]t = °for t ~ 0,0'0 a.s.

Proof of Lemma 3.1: y, and therefore also [y, m], is sample continuous so
that

[ y, m L = (y, m)t

If s < t then Yt - Ys and m t - ms under 0'0 are conditionally independent


and each has conditional mean zero given @t" Thus mtYt is an (@., 0'0)
martingale so that (y, m) = 0, and thus [y, m] = 0.
Since n has locally finite variation,

[n,m]t= LD.nsD.ms (3.12)


s<;t

Since the probability that n has a jump at any fixed time is zero and since n
and m are independent under 0'0' it follows from (3.12) that [n, m] = 0. •
3. FILTER REPRESENTATION - CHANGE OF MEASURE DERIVATION 261

Lemma 3.2.

Proof of Lemma 3.2: First,

Eo [fpSLS dsl{9t] = fotEo [PsL sl{9tl ds (3.13)

if a jointly measurable version of the integrand on the right-hand side is


chosen. Now for s ~ t,

and the second a-algebra on the right-hand side is <3'0 independent of tes '
Since P.Ls is tes measurable and tes ::J {9s' we claim that

a.s. (3.14)

To establish Eq. (3.14) it suffices to prove that

for all bounded (9t-measurable random variables Z, which is easily done by


considering Z of a special form and applying the monotone class theorem.
Now, the right-hand side of (3.14) is, by definition, equal to /3•. Equations
(3.13) and (3.14) thus imply part (a) of the lemma.
Part ( b) is proved similarly:

Eo[fottPsLs- dYsl{9t] = fEo [tPs L .-I{9s1 dys

= t~s dys
o
The first equality can be proved by reducing it to the case that YsLs- is
replaced by a bounded te ° predictable step function, and the second equality
is true by the definition of ~s. The proof of part (c) is similar.
The fact that m is an (~o' <3'0) martingale and that {9T is <3'0 indepen-
dent of ~T implies that m, and therefore (tP°mt), is a (<3'o'~tV{9T)
262 DETECTION AND FILTERING

martingale. Thus,

and Lemma 3.2 is proved.

Proof of Proposition 3.2: By the stochastic integral equation for semi-


martingale exponentials,

so that, using Lemma 3.1,

Using this and Ito's product formula we have

or

Lt~t = ~o + f~s-Ls-hsdys + f~s-Ls-CA.s - 1) dn s


o 0

+ fLs-f3s ds + itLs_ dms (3.15)


o 0

The proof is completed by taking the conditional 0'0 expectation given fJ t of


each term in (3.15), and using Lemma 3.2. •

4. FILTER REPRESENTATION -INNOVATIONS DERIVATION

The so-called "innovations method" is applied to this section to derive


another representation of ~. In this section we assume once again that fJ. is
generated by processes y and N, that ~ is a process to be estimated, and
that &. is an increasing family of a-algebras satisfying the usual conditions
with fJ t C at.
(The a-algebras ~. of the last section are not used in this
section.) We assume that ~, y, and N have a representation of the form
(3.2)-(3.4), where now we assume that

~O is &0 measurable, EI~ol < + 00


f3 is &. predictable, E iT( f3s) 2 ds < + 00
o
m is an &. martingale, Em~ < + 00
4. FILTER REPRESENTATION -INNOVATIONS DERIVATION 263

(9t = a( Ys' Ns: s.:s; t) V{0' null sets}

h is tt. progressive, E iTo h; ds < + 00


w is an tt. Wiener martingale
N is an tt. adapted counting process
A is tt. predictable, E iTAs ds
o
< + 00, As > 0

M is an tt. martingale
We will also make the following assumptions, which can be relaxed.

At~e forall(t,w),wheree> 0
E iTo li; At dt < + 00 (e.g., A bounded)

By the argument used in the first part of the proof of Proposition 6.7.3,
there exist tt. predictable processes cf> and I/; so that

(m, w)t = fcf>s ds


o
o .:s; t.:s; T, 0' a.s.
(m,M)t=fl/;sds
o

Proposition 4.1. Under the above assumptions,

( 4.1)

where H and K are the predictable processes given by

Hs = ((lis - t)(h s - h s »)' +;Ps dsd0'(w) a.e. ( 4.2)

Ks = ((lis - t)(~s - >"s»~ +t dsd0'(w) a.e. (4.3)


As

and y and IV are the innovations processes


51 = Yt - fh s ds and Nt = Nt -It>..s ds
o 0
The truth of the proposition is directly implied by the next three
lemmas. Our proof is essentially that of Fujisaki, Kallianpur, and Kunita
(1972).

Lemma 4.1. There exists an ((9.,0') martingale D with Do = 0 and


DETECTION AND FILTERING
264

ED¥ < + 00 such that

~t = ~o + {/3s ds + Dt 0:0;; t:o;; T, a.s.


o
Proof: This is simply an application of the projection result, Proposition
1.3• •

Lemma 4.2. There exist predictable processes H and K such that

0' a.s.

and

o :0;; s :0;; T, 0' a.s.

Proof: Define X = -h Y -
0 (1 - I/:\.) 0 IV and let L = 0(X), or

Lt=exp(-hoys- t(-2
o
Ih ;+:\.s-I)ds) fl ~IA
S5,(
!!.Ns~l s

Let 'Tn = inf{t: L t ;;:: n or (X,X)t;;:: n} and define xn and Ln by X tn =


X t /\ T and L~ = L t /\ T • Since L t < n for t < 'Tn and L t :0;; Lr/e, we have that
L~ :0;;" n/e for all t. By computation we see that (X, X) is sample continuous,
also (xn, xn) = (X, X)t/\ T • Thus, both Ln and (xn, xn) are bounded
(uniformly in t, w). Since L n n is a bounded local (9. martingale it is actually a
martingale, so EL~ is equal to one. Define a probability measure 0'n by

and then by the abstract Girsanov's theorem,

and
are 0'n local martingales. Now

and

Thus, under measure 0'n, w up to time 'Tn is an (9. Wiener martingale and N
up to time 'Tn is an (9. Poisson process.
4. FILTER REPRESENTATION -INNOVATIONS DERIVATION 265

By inequality (6.4.3),

(D, xn); :::; (xn, Xn)t(D, D)t


:::; (const.) (D,D)t

so that (D, X n );, as well as Dl, has finite expectation under 0'. Since L n is
bounded, both (D, xn); and Dt2 have finite means under 0'n as well. Thus,
D - (D, xn) is an (0., 0'n) square integrable martingale with initial value
zero, so by the representation theorem of Section 6.7,

where H and K are predictable with

EnlTo H2 + K2 ds <
s s
+ 00

Therefore, for t :::; Tn'

Now the sum of terms in brackets is an (0., 0'n) martingale up to time


Tn (since the other terms are) and it is predictable and has locally finite
variation. It is thus zero, so we have

Now, letting n tend to infinity and patching together processes H and K, we


obtain

0:::; t:::; T

where H and K are predictable processes with

and a.s . •

Lemma 3. Hand K in Lemma 4.2 are given by Eqs. (4.2) and (4.3).

Proof: (See the remark below for a heuristic approach.) Define a sequence
266 DETECTION AND FILTERING

(Sn) of (9. stopping times by

Sn = inf {t: IYtl ~ n}


Then, since Y has continuous sample paths, IYtl is bounded above by n for
o ~ t ~ Sn. In what follows we will use /L to denote Lebesgue measure so
that, for example, yf3 • /L represents the process

yf3 0 /Lt = fYsf3s ds


o
Applying Ito's product formula, using the representations (3.2) and
(3.3) for ~ and y, and using the fact that [m, w] = (m, w) since w is sample
continuous, we obtain

~Y = yf3 • /L + Y • m + ~h 0 /L + ~ w + (m, w)
0 ( 4.4)
so that
(b) l/,Sn = (yf3 + ~h + <1» 0 /Lt A Sn + an cr. martingale
Thus, projecting this process onto the observations a-algebras (Proposition
1.3) yields

(b)tAS n = (yf3 + ~h + <l>f °/LtASn + an (9. martingale (4.5)

On the other hand, another application of Ito's product formula and


use of the representation (4.1) for ~ (which is valid by Lemmas 4.1 and 4.2)
gives

b = (y/3 + ~h + H) 0/L + an (9. local martingale ( 4.6)

Using the fact that

and

and comparing Eqs. (4.5) and (4.6) we find that

is an (9. local martingale. It is also predictable, has initial value zero, and has
locally finite variation corlol sample paths. It is hence zero for all t with
probability one, so that H t must be given by Eq. (4.2) for t S Sn. Since Sn
increases to T as n tends to infinity, Eq. (4.2) is thus true in general.
The identification of K is similar: First define a sequence (Tn) of (9.
stopping times by
Tn = inf {t: Nt ~ n}
4. FILTER REPRESENTATION -INNOVATIONS DERIVATION 267

Since the jumps of N have size at most one, Nt is bounded above by n for
o :s;
t :s; Tn' The analogue of Eq. (4.4) is
~N = NP • /L + N _ • m + O· . /L + L· M
+([m,M] - (m,M») + (m,M)
where N _ at s is equal to N s - (and similarly for ~_). Since Em, M] - (m, M)
is an ct. martingale, we have

which is analogous to Eq. (4.5). The rest of the proof for identifying K is so
similar to that for identifying H that we omit it. •

Remark: Since the proof of Lemma 4.3 is somewhat mysterious, we will give
a heuristic derivation of the equations for H and K by appealing
directly to the "orthogonality principle." Fix t, let dt be a small
positive number, and use the notation d~t = ~t+dt - ~t> etc. We know
that

for any square integrable (9t+ dt-measurable random variable Y. Since


dYt is (9t+dt measurable, this implies that

Using the relationships

dYt = (h t - ht) dt + dWt + o( dt) = dWt + O( dt)


~t+dt - ~t+dt = ~t - ~t + d~t - d~t
= ~t - ~t+ dm t - KtdYt - Htd~ + O(dt)
E [ dWt dYtl(9t] = E [ dwt ( dWt + O( dt) )1(9t] = dt + o( dt)
E[dwt dm t l(9t] =E[d(w,m)tl(9t] =4>tdt+o(dt)
E[(~t - gt) dWt +(h t - ht){dm t - KtdYt - Ht dNt )l(9t] = o(dt)
then implies that

which implies (heuristically) Eq. (4.2). This derivation can be made


rigorous, but only with much perseverance. Equation (4.3) for K can be
obtained similarly.
268 DETECTION AND FILTERING

Example [Kalman and Buey, 1961]. Let ~ be the unique solution to the linear
stochastic differential equation

(4.7)

and let the observation process y be given by

where v and w are Wiener processes, ~o is Gaussian, and ~o, v, and w


are mutually independent. Suppose that there are no counting process
observations. Then ~ is a Markov process and ~ and y are jointly
Gaussian random processes. By Proposition 4.1

(4.8)

where

(4.9)

By Ito's formula and Eq. (4.7),

d~; = 2~td~t + b 2 dt = (2a~; + b 2) dt + 2b~tdvt


so that by Proposition 4.1

( 4.10)

Ito's formula and Eq. (4.8) yield

dat)2 = (2aat)2 + H;) dt + 2~tHtelyt (4.11)

Thus, by Eqs. (4.9)-(4.11),

dHt = (2aHt + b2 - H;) dt +[Ut - 3~t)(~; -~n]' elyt


The coefficient of dYt can be reexpressed as

which is the conditional third central moment of ~t given (9t. Since ~t is


conditionally Gaussian, this moment is zero, so that H satisfies the
5. RECURSIVE ESTIMATION 269

detenninistic differential equation

dEt
dt
= 2 aH
t
+ b2 - H 2• H =
toO
(72
0 ( 4.12)

Equations (4.8) and (4.12) are a special case of the Kalman-Bucy filter
and associated Riccati equation given by Eqs. (3.9.37) and (3.9.39).
These equations provide a recursive method for computing ~t (see
Section 5).

5. RECURSIVE ESTIMATION

The estimation equations of the previous two sections are specialized in this
section to the case that a Markov process X represents an unobserved state,
and conditional moments

are to be computed. The goal of recursive filtering is to find a process (lfe:


o~ t ~ T) taking values in a possibly infinite dimensional space such that
the following conditions are satisfied:

• lfe+dt depends on lfe and the new observations (dYt, d~) .


• For each t the desired conditional moments g( X) t are completely
detennined by lfe.
Let the families of a-algebras (1Dt) and «('\) represent state and observa-
tion information, respectively, as in Section 7.3. Suppose that X is a Markov
process with respect to (1Dt) with a stationary transition function

p = (~(A, t): x E S,A E 'iB(S), t E [0, TJ)


that is, X is 1D. adapted,

a.s. for A E 'iB (S)

and P satisfies the Chapman-Kolmogorov equation, Eq. (5.1.2). We assume


that the state space S of X is a separable metric space (for example, S can
be an interval of the line as in Chapter 5, a subset of Euclidean n space, or a
finite or countably infinite set), we let 'iB(S) denote the collection of Borel
subsets of S, and we assume that the sample paths of X are corlol. The
treatment of Markov processes in Sections 5.1-5.3 immediately carries over
to this setting. In particular, the generator of X is a linear operator A, the
domain of which, Gj)A' is a subset of the Banach space of bounded 'iB(S)-mea-
surable functions on S.
270 DETECTION AND FILTERING

We assume that the observation a-algebras «(9t) are generated by y and


N as in (3.1), where

Yt = Iath( Xs) ds + W t (5.1)

Nt = Ia\(Xs) ds + Mt (5.2)

and we suppose that conditions (3.5)-(3.9) are true with hs = h(Xs) and
As = A(Xs _)·
To apply the estimation equations of Propositions 3.2 and 4.1 to
processes g of the form gt = g(Xt ), we must find a semimartingale represen-
tation of g as in Eq. (3.2). If g is in Gj)A' then an easy modification of the
proof of Dynkin's identity given in Chapter 5 yields that for 0 :s; s :s; t,

(since here s and t are not random, it is not necessary that X be strong
Markov). Equivalently, cg defined by

(5.3)

is an ~. martingale for each g in Gj)A •

Remark: The fact that Cf in Eq. (5.3) is a martingale for g in Gj)A char-
acterizes Ag. Moreover, if we use such martingale property as a
definition for Ag, we can extend the operator A to a much larger
domain (possibly including some unbounded functions).
Thus, if g E Gj)A then gt = g(Xt ) has the desired representation (3.2)
with Ps = Ag(Xs_) and M t = Cf. Proposition 3.2 then yields the representa-
tion

where, as always, n t = Nt - t. This equation is written in differential form,


which is simply a shorthand way to write an integral equation. This equation
implies that there is a corlol version of g( X) t. Suppose that the product gA
is in Gj). Then there exists a corlol version of (g(X)A(X»tVand so

(g(X)(A(X) - 1)); = (g(X)(A(X) - 1));_


5. RECURSIVE ESTIMATION 271

In that case we have

dg(XL=Ag(X)tdt +g(X)h(XLdYt +g(X)(:\(X) - 1L- dn t


( 5.4)

The right-hand side of this equation involves the observations Y and


N, and unnormalized moments corresponding to functions Ag, gh and
g(:\ - 1). If GLl is a subset of GDA such that (using 1 to denote the function
identically equal to one)

1, gh, g(:\ - 1), Ag E span (GLl) whenever g E GLl

then a set of equations of the form (5.4), one for each g in GLl, yields a
stochastic differential equation for

The equation is driven by the observations Y and N. Since

U satisfies the requirements of recursive filtering discussed at the beginning


of the section. If GLl is sufficiently large, then as we see next, Ue determines
the conditional distribution of X t given 191' so that recursive equations can be
obtained for that distribution.

Example [Wonham, 1964]. Suppose the statespace S is {I, 2, ... , n}. Functions
and measures on S are represented by column and row vectors, respec-
tively. The generator A is then represented by an n X n matrix which
we also call A, so that for a function I on S, it is consistent to
interpret AI as the product of a matrix A and a column vector I.
Let 8 i denote the function on S such that 8 i (z) is one for z = i
and is zero otherwise, and let II denote the vector process defined by

For a function I on S we then have

and I(X)t = ~t~ (5.5)


t

where 1 denotes a column vector of all ones.


272 DETECTION AND FILTERING

Equations (5.4) yield a representation for each of the coordinates


of II, and these representations can be written in vector form as

where diag ( v) denotes the n X n diagonal matrix with the elements of


v down the diagonal.
Equation (5.6) shows that lIt can be computed "recursively"
from the observations processes y and N, and by Eq. (5.5) the condi-
tional moments f( X)t are easily expressed in terms of II.

We will now obtain an analogue of Eq. (5.6) for the general case. For
disjoint sets An in 01>(8),

a.s. (5.7)

and since S is separable there exist versions of the conditional expectations


so that the exceptional set in (5.7) does not depend on the sets An [Doob
1953, p. 29]. With more care, a process Q can be constructed so that

Q( ., t, w) is a positive measure for t, w fixed


Q( A, ., .) is (C). progressive for each A E 01> (8) (5.8)

If there is a process (qt(x, w)) and a (deterministic) measure 1 on 8 such that

f bounded, 01> (8) measurable

then qt is the density of Qt with respect to 1.


Similarly, there exists a process II which satisfies the same conditions
(5.8) as Q but with f(X)t replaced by f(X)(" If there is a process 7Tt (X, w)
such that

f bounded, 01> (8) measurable

then 7Tt is the density of lIt with respect to 1.


5. RECURSIVE ESTIMATION 273

The bilinear product of a measure JL on 'iB(S) and a bounded 'iB(S)-


measurable function f is defined by

(JL, t) = 1s fdJL
The adjoint A* of A relative to this product is characterized by

(JL, At) = (A*JL, t)

(We will not enter into a discussion of the domain of A *.) If I is a reference
measure on 'iB (S) we use (., ·)z to denote the usual inner product

(f,gL = isf(x)g(x)l(dx)

The adjoint t* of A relative to this product is characterized by

(g,At)z = (t*g, fL

For example, if X is a diffusion with drift term m(x) and diffusion term
(J2(X) as described in Section 4.7, then

Af(x) =
af 1 a2 f
m(x)-a (x) + -2 (J2(X)-2 (x)
x ax

and, by an integration by parts, if g is a Schwartz function of rapid descent,


then the adjoint t* of A relative to Lebesgue measure satisfies

Now, Eq. (5.4) can be written

or, using M f to denote the operator on measures defined by MfJL(A) = L,f dJL,
274 DETECTION AND FILTERING

we have

This equation is true for g in 6j)A and 6j)A is dense in the space of all bounded
'iB(S)-measurable functions so we conclude that

(5.9)

This is a stochastic differential equation (actually, an integral equation) for


II driven by the observations. Since II t determines all the moments I( X)t'
Eq. (5.9) represents a possibly infinite dimensional filter. If II has a density
'IT with respect to some fixed reference measure I, then this equation becomes

Similarly, starting with Proposition 4.1, we can obtain a recursive


equation for the normalized conditional density q:

dqt(X) = e*qt(x) dt + qt(x)[ hex) - h(X)t] (dYt - h(X)t dt )

+qt-(X)[ A(X~(X)t-l( d~ - 'X t dt) (5.11)


A(X)t_

Ancestors of Eq. (5.11) were first given by Stratonovich (1960) and Kushner
(1967). In the case that h and A are identically constant functions (equiv-
alent to no observations) this equation reduces to dqt/dt = e*qt, which is
the Kolmogorov forward equation for the family of densities of X t as t
vanes.

Remark [DaviS, 1980].Consider Eq. (5.10) for an unnormalized conditional


density when A = 1 (equivalent to no counting process observations):

(5.12)

Seek a solution of the form

Then r(t, x) = 'lTt(x) exp (- h(x)Yt), so we can apply Ito's formula and
5. RECURSIVE ESTIMATION 275

Eq. (2.12) to deduce that

dr(t,x) = (Bt - th(x)2)r(t,x)dt (5.13)

where Bt is the (random) operator defined by

B*f(x) = exp( -h(x)Yt)e*(exp(h(-)Yt)t(-))(t,x)

for sufficiently regular functions f on S. The important point is that


there is no dYt term in Eq. (5.13). This affords the possibility that Eq.
(5.12) for r can be solved pathwise, and this then gives a method for
constructing 'ITt> and therefore all conditional moments f( X) t' as path-
wise functions of the process y.
A different tactic for finding recursive filtering equations is to work
directly from the definition (3.11) of f( X)t" Let ~o and L be defined as in
Section 7.3 and use .f!,(A, t) to denote ~(Xt E A). By Proposition 3.1,

and so for bounded 0?J(S)-measurable functions f,

f(X)t = Eo[Eo[ f(X t )Lt l(9t>Xt ]l(9t]

= 1s f( x )Eo [Lt l(9t> X t = x] ~o [Xt E dxl(9t]

where 'ITt is defined by

( 5.14)

Thus, 'ITt is the density of ITt with respect to the unconditional distribution
of X,.f!,( -, t).

Example. Let X t = X for all t for some random variable X. Suppose that h
and A have the form

ht=a(X)u(t) and At = exp(b(X)v(t))

for bounded measurable functions a, b, U, and v. Define random


276 DETECTION AND FILTERING

processes

and

and define a function F: R4 -+ R by

(
F(x,cp,t/;,t) = exp a(x)</> + 2" lita(x) u(s) ds + b{xH
0
2 2

+ {b(x)v(s) -Ids)

Then Eq. (5.14) becomes

'ITt ( x) = F( x, lilt, 'Itt> t)

Note that 'ITt determines all the moments fe X)t and that III and 'It' can
be recursively updated using the equations

Example [BeneA, 1981]. Suppose the state process X is the solution to the
stochastic differential equation

for some constant x o , and suppose the observation process y is given


by

Yo = 0

where v and w are independent Wiener processes. Using a variation of


the change of measure method just explained, BeneS shows that if

( 5.15)

for some constants k, b, and c then an unnormalized conditional


density Pt (relative to Lebesgue measure) of X t given (~\ is given by

where

F(x) = lo feu) du
x
-oo<x<oo
EXERCISES 277

and /L and S can be recursively computed using the two equations

ds t = 1 _ k 2s 2 • So = 0,
dt t,

/Lo = Xo

The process p does not satisfy the nonlinear filtering equation (5.10).
However, using direct computation and Eq. (5.15), one readily finds
that

where U is a random process (not depending on x). By Ito's product


rule we then verify the p defined by

satisfies the stochastic equation (5.10) for 'Tr. Therefore, if we can


establish uniqueness of solutions, we can conclude that p = 'TT. Since for
each t, Pt differs from Pt by a random factor (not depending on x), this
would then imply that P is an unnormalized conditional density as
claimed.

Remark: Other examples which lead to finite dimensional nonlinear filters,


geometric arguments for why finite dimensional filters do not exist for
certain (in fact, "most") examples, approximate filters, bounds on
minimal error for nonlinear filtering, connections to physics, applica-
tions to control, and a variety of other topics can be found in
[Hazewinkel and Willems (1981)] and [Kohlmann and Vogel (1979)].

EXERCISES

1. Let Z = (Zk: k E Z+) and 0 = (Ok: k E Z+) be random processes and let '8 k =
a( 0 0 , ••• , Ok)' Suppose that each zk takes values in S = {l, 2, ... , n} and that Ok takes
values in some finite set (not depending on k or w). Suppose that for each possible value
(J of Ok that there is an n X n matrix R«(J) such that

(i.e., (zk+ " Ok+ I) is conditionally independent of fJ k given Zk' and the transition mecha-
nisms are time homogeneous.} Define Ilk to be the row vector with ith entry '!J'(Zk =
il 0 , _... , Ok)' Derive the recursive filtering equation,
278 DETECTION AND FILTERING

where e is the column vector of all ones. (Note that the numerator on the right-hand
side is an unnormalized version of I1 k + 1)'

2. Find E[Ui t - ~t)2] in terms of a and b for the example of Section 7.4 in the case that
a~ = O. Find its limiting value as f tends to infinity.

3. Let U be a positive random variable with density function f and distribution function
F. Suppose that N is a counting process with intensity A = aI[ o. U) + bI[ u. + 00) relative
to the family of o-algebras

a(U)Yo(Ns:S75,f) f~O

Find a recursive equation for 'P[f < UINs: s 75, f].


8
Random Fields

l. INTRODUCTION

,Ve have used the term stochastic process to denote a collection of ran-
dom variables indexed by a single real parameter. In other \Yords, the
parameter space is a subset of the real line and usually an interval. In
most applications, this parameter is interpreted as time. There are many
applications where it is more appropriate to consider collections of ran-
dom variables indexed by points in a more general parameter space. For
example, in problems involving propagation of electromagnetic waves
through random media, the natural parameter space is a subset of R4,
representing space and time. A similar example is the velocity field in
turbulence theory. The term random field is often used to denote a
collection of random variables with a parameter space which is a subset
of Rn. There are other possible parameter spaces. For example, the
parameter space can be taken to be a function space of some kind. Such
is the case with generalized processes. Alternatively, ,ve can also take
the parameter space to be a collection of subsets of Rn. Such, for example,
279
280 RANDOM FIELDS

is the situation for random measures, which have already been made use
of in connection with second-order stochastic integrals. Generally speak-
ing, the kind of assumptions that we make concerning mutual depen-
dence of the collection of random variables reflects something of the
parameter space. For example, if the parameter space is an interval, we
usually assume continuity in probability. If the parameter space is a
linear topological space, we usually assume that the collection of random
variables as a function of the parameter is both linear and continuous in
probability. If the parameter space is a IT algebra, then we usually assume
that the collection of random variable is IT additive, and so on.
Compared to the one-parameter case, relatively little is known con-
cerning processes with a more general parameter space. Of course, a great
deal of the results concerning stochastic processes with a one-dimensional
parameter space do not depend on the fact that the parameter is one
dimensional. These results are easily generalized to more general collec-
tions of random variables. Such generalizations require little elaboration.
In this chapter, we shall focus our attention on problems of the following
two kinds: (1) problems which arise only when the parameter space is
more complex than one-dimensional, and (2) important properties of
one-dimensional processes which are not easily extended, because they
depend on the parameter space being one dimensional. As an example of
category (1), we have the rich interplay between the probabilistic prop-
erties of a random field and the geometry of its parameter space. Although
this interplay already appears in the one-dimensional case in the form of
stationarity, the geometry of the real line is obviously both degenerate and
rather trivial by comparison with the geometry of higher dimensions. As
an example of category (2), consider Markov processes. The definition
of a Markov process makes explicit use of the well-orderedness of the
real line. It is difficult to see how it can be generalized to a multidimen-
sional parameter space. The way that it is done is one of the most inter-
esting problems that we shall discuss in this chapter.
To avoid confusion, we shall adopt the following terminology: A
collection of random variables defined on a common probability space
will be called a stochastic process or a random field according as its
parameter space is one-dimensional or multidimensional. We should note
that while this terminology is widely used, it is by no means universal.
For example, a random field is often called a stochastic process with a
several-dimensional time.

2. HOMOGENEOUS RANDOM FIELDS


The simplest generalization is to take the parameter space to be an
n-dimensional interval, that is, A = {(z₁, z₂, ..., z_n): z_i ∈ I_i}, where

each I_i is an interval, closed, open, or semiclosed. For a random field
{X(ω,z), z ∈ A}, most of our discussion in Chaps. 2 and 3 carries over with
no difficulty, e.g., continuity questions, separability, Karhunen-Loeve
expansion. Results on wide-sense stationary processes can also be general-
ized to random fields with parameter space Rⁿ. Here, the generalization
is interesting and not entirely trivial.
Let {X_z, z ∈ Rⁿ} be a family of second-order random variables
defined on a probability space (Ω, 𝔄, P). We assume that X_z is continuous
in probability, i.e., for every ε > 0,

P(|X_z − X_{z'}| ≥ ε) → 0 as ‖z − z'‖ → 0    (2.1)

where ‖z‖ denotes the Euclidean norm

‖z‖ = (Σ_{i=1}^n z_i²)^{1/2}    (2.2)

As in Chap. 3, we allow X_z to be complex valued. The most straight-
forward generalizations of wide-sense stationary processes are homo-
geneous random fields, defined as follows: We say that {X_z, z ∈ Rⁿ} is
homogeneous if EX_z = μ does not depend on z and

E(X_{z+z₀} − μ)(X_{z'+z₀} − μ)* = E(X_z − μ)(X_{z'} − μ)*    (2.3)

for all z₀, z, z' in Rⁿ. Setting z₀ = −z' in (2.3), we see that the covariance
function

E(X_z − EX_z)(X_{z'} − EX_{z'})* = R(z − z')    (2.4)

depends only on z − z'. Of course, R(z − z') is also nonnegative definite;
i.e., for any finite number of points z₁, z₂, ..., z_N in Rⁿ and any collec-
tion of complex constants a₁, a₂, ..., a_N, we have

Σ_{i,j=1}^N a_i a_j* R(z_i − z_j) ≥ 0    (2.5)

Bochner's theorem (Proposition 3.5.1) can now be generalized as follows.

Proposition 2.1. A function R(z), z ∈ Rⁿ, is the covariance function of a
homogeneous q.m. continuous random field if and only if it is of
the form

R(z) = ∫_{Rⁿ} e^{i2π(ν,z)} F(dν)    (2.6)

where F is a finite Borel measure on Rⁿ, and (ν,z) denotes the inner
product

(ν,z) = Σ_{i=1}^n ν_i z_i    (2.7)

Proposition 2.1 can be proved in exactly the same way as Proposi-
tion 3.5.1, and we won't repeat it here. Similarly, we can obtain a spectral
representation for {X_z, z ∈ Rⁿ} of the form

X_z = ∫_{Rⁿ} e^{i2π(ν,z)} X̃(dν)    (2.8)

where X̃ is a random set function defined on Borel sets of Rⁿ such that

E X̃(A) X̃(B)* = F(A ∩ B)    (2.9)

F being defined by (2.6). An integral of the form

∫_{Rⁿ} f(ν) X̃(dν)

can be defined for any f ∈ L²(F) as the q.m. limit of a sequence of random
variables resulting from approximating f by a sequence {f_k}, where each
f_k is a linear combination of indicator functions of Borel sets and

∫_{Rⁿ} |f_k − f|² dF → 0 as k → ∞

The details are nearly identical to those given in Sec. 3.6.
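
As a numerical aside (ours, not from the text), the spectral representation (2.8)
suggests a direct way to simulate a homogeneous Gaussian field: replace X̃ by
independent complex Gaussian masses on a grid of frequency cells with
E|X̃(cell)|² = F(cell), as in (2.9). The Gaussian spectral density used below is
an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
v = np.linspace(-3, 3, 41)                          # frequency grid on R^2
dv = v[1] - v[0]
V1, V2 = np.meshgrid(v, v)
F_mass = np.exp(-np.pi * (V1**2 + V2**2)) * dv**2   # F(cell), assumed density

# Orthogonal random masses mimicking (2.9): independent cells,
# with E|mass|^2 = F(cell).
X_tilde = (rng.normal(size=V1.shape) + 1j * rng.normal(size=V1.shape)) \
          * np.sqrt(F_mass / 2)

def X(z1, z2):
    # Approximate X_z = sum over cells of exp(i 2 pi (nu, z)) * mass(cell)
    return np.sum(np.exp(2j * np.pi * (V1 * z1 + V2 * z2)) * X_tilde)

print(X(0.0, 0.0), X(0.5, 0.5))   # two (correlated) samples of the field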

Translations z → z + z₀ are transformations of Rⁿ onto Rⁿ which
leave the Euclidean distance ‖z − z'‖ between any pair of points un-
changed. However, these are not the only transformations of Rⁿ onto Rⁿ
which have this property. Transformations t: Rⁿ → Rⁿ of the form

t_i(z) = Σ_{j=1}^n a_{ij} z_j    (2.10)

where A = (a_{ij}) is an orthogonal matrix, also preserve Euclidean distances.
Such a transformation is called a rotation, and is often further designated
as being proper or improper according as the determinant of A is +1 or −1.
It follows that any succession of rotations and translations also preserves
Euclidean distances. It also turns out that every transformation t of Rⁿ
onto Rⁿ preserving Euclidean distances can be represented as a translation
followed by a rotation. Such distance-preserving transformations are
called rigid-body motions. The collection of all rigid-body motions forms
a group G which is called the Euclidean group.
Now consider a second-order random field {X_z, z ∈ Rⁿ} such that
EX_z = μ is a constant, which we assume to be zero, and

E X_{t(z)} X_{t(z')}* = E X_z X_{z'}*    (2.11)

for all z, z' ∈ Rⁿ and all rigid-body motions t. Suppose that we move z'

to the origin 0 by translation; then z is moved to z − z'. If we follow
this by a rotation which takes z − z' into (‖z − z'‖, 0, 0, ..., 0), the
origin 0 is left invariant. Therefore, from (2.11) we have

E X_z X_{z'}* = E X_{(‖z−z'‖,0,...,0)} X_{(0,0,...,0)}*
            = R(‖z − z'‖)    (2.12)

which is a function of the single real positive variable ‖z − z'‖. A random
field satisfying (2.11) will be called a homogeneous and isotropic random
field. It is obviously homogeneous.
For an isotropic and homogeneous random field, it is advantageous
to adopt a polar-coordinate system. We define the (n − 1)-dimensional
unit sphere S^{n−1} as the set of all points z in Rⁿ such that ‖z‖ = 1. An
arbitrary point z in Rⁿ can be uniquely identified by its distance from
the origin 0, which is ‖z‖, and the point on S^{n−1} intersected by a line
connecting z and 0, which is z/‖z‖.
Therefore any z ∈ Rⁿ is uniquely identified by ‖z‖ and any n − 1 coordi-
nates identifying a point of S^{n−1}. We choose a coordinate system for
S^{n−1}, recursively, as follows: S¹ is the unit circle in R², and every point
on S¹ has the form

(cos θ₁, sin θ₁)    (2.13)

Hence, S¹ has a single coordinate θ₁. For Sⁿ, consider the point z = (1,
0, ..., 0) ∈ R^{n+1} and call it the north pole. The set of all points on
Sⁿ with z₁ = cos θ_n forms an (n − 1)-dimensional sphere with radius
sin θ_n. If a coordinate system (θ₁, θ₂, ..., θ_{n−1}) has already been
chosen for S^{n−1}, then this automatically produces a coordinate system
(θ₁, θ₂, ..., θ_n) for Sⁿ. Starting from (2.13), we can thus generate a
coordinate system (θ₁, ..., θ_n) for Sⁿ with the property that θ_n = con-
stant represents a sphere of dimension n − 1. We have also generated a
coordinate system for Rⁿ where a point z is represented by (r, θ₁, ...,
θ_{n−1}) with r = ‖z‖, and (θ₁, ..., θ_{n−1}) is the intersection between S^{n−1}
and a line connecting z and the origin 0. The origin 0 has a degenerate
representation in this system given by r = 0 with θ₁, ..., θ_{n−1} undefined.
Starting from (2.6), we can now find the form of Bochner's theorem
for an isotropic and homogeneous random field. Since an isotropic and
homogeneous random field is necessarily homogeneous, we have

R(‖z‖) = ∫_{Rⁿ} e^{i2π(ν,z)} F(dν)    (2.14)

For any rotation t,

R(‖t(z)‖) = R(‖z‖)

so that

∫_{Rⁿ} e^{i2π(ν,t(z))} F(dν) = ∫_{Rⁿ} e^{i2π(ν,z)} F(dν)    (2.15)

Since (ν, t(z)) = (t(t⁻¹(ν)), t(z)) = (t⁻¹(ν), z), we have, upon a change in
the variable of integration,

∫_{Rⁿ} e^{i2π(ν,t(z))} F(dν) = ∫_{Rⁿ} e^{i2π(λ,z)} F(t(dλ)) = ∫_{Rⁿ} e^{i2π(ν,z)} F(t(dν))    (2.16)

It follows from (2.15) and (2.16) that for every z ∈ Rⁿ and every rotation t,

∫_{Rⁿ} e^{i2π(ν,z)} F(dν) = ∫_{Rⁿ} e^{i2π(ν,z)} F(t(dν))    (2.17)

Therefore, the Borel measure F must be isotropic, that is,

F(A) = F(t(A))    (2.18)
for every Borel set A and every rotation t. We can now take z to be
directed along the north pole (that is, z₁ = ‖z‖, z₂ = 0, z₃ = 0, ...),
and if we make use of the isotropy of F, then (2.14) becomes

R(‖z‖) = ∫₀^∞ ∫_{S^{n−1}} e^{i2πλ‖z‖ cos φ_{n−1}} dΩ_{φ₁,...,φ_{n−1}} F₀(dλ)    (2.19)

where dΩ is the differential surface element of S^{n−1}.
The set of all points on S^{n−1} with a fixed φ_{n−1} (that is, at a fixed
distance from the north pole) is a sphere of dimension n − 2 with radius
sin φ_{n−1}, the surface area of which must be sin^{n−2} φ_{n−1} area(S^{n−2}). Since
the integrand in (2.19) depends only on φ_{n−1}, we have

R(‖z‖) = K ∫₀^∞ [∫₀^π e^{i2πλ‖z‖ cos φ_{n−1}} sin^{n−2} φ_{n−1} dφ_{n−1}] F₀(dλ)    (2.20)

The constant K is just the total area of S^{n−2} and can be absorbed into F₀.
The inside integral in (2.20) can be evaluated to be

∫₀^π e^{i2πλr cos φ} sin^{n−2} φ dφ = Γ(1/2) Γ((n−1)/2) 2^{(n−2)/2} J_{(n−2)/2}(2πλr) / (2πλr)^{(n−2)/2}    (2.21)

where J is the Bessel function. Hence, the isotropic version of Bochner's
theorem is given as follows [see, e.g., Yaglom, 1962, pp. 81-86].

Proposition 2.2. A function R(r), 0 ≤ r < ∞, is the covariance function of
an isotropic and homogeneous q.m. continuous random field if
and only if

R(r) = ∫₀^∞ [J_{(n−2)/2}(λr) / (λr)^{(n−2)/2}] F₀(dλ)    (2.22)

where F₀ is a finite Borel measure on [0,∞).

We shall call F₀ the spectral measure for the random field. We note that
the constant 2π in (2.21) is absorbed into λ in (2.22) to result in a simpler
formula.
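
As a quick illustration (ours), (2.22) is easy to evaluate numerically. Taking F₀
to be a unit point mass at λ₀ (an assumed, degenerate spectral measure), R(r)
reduces to the Bessel kernel in the integrand; for n = 3 this is the familiar
kernel sin(λ₀r)/(λ₀r). The sketch below also checks numerically that the
induced covariance matrix is nonnegative definite, as (2.5) requires.

import math
import numpy as np
from scipy.special import jv

def R(r, lam0=2.0, n=3):
    # R(r) = c * J_{(n-2)/2}(lam0 r) / (lam0 r)^{(n-2)/2}, scaled so R(0) = 1
    nu = (n - 2) / 2
    x = np.maximum(lam0 * np.asarray(r, dtype=float), 1e-12)  # avoid 0/0 at r = 0
    return 2.0**nu * math.gamma(nu + 1) * jv(nu, x) / x**nu

rng = np.random.default_rng(1)
pts = rng.normal(size=(30, 3))                    # 30 random points in R^3
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(np.linalg.eigvalsh(R(D)).min())             # nonnegative, up to roundoff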
For an isotropic and homogeneous random field, the spectral repre-
sentation (2.8) assumes a special form which involves a countable family
of random set functions defined on the Borel sets of [0,∞) in place of the
n-dimensional set function X̃. This spectral-representation formula is
expressible in terms of spherical harmonics, which will be introduced in the
next section. In the process, we shall also introduce isotropic random fields
which are not necessarily homogeneous.

3. SPHERICAL HARMONICS AND ISOTROPIC RANDOM FIELDS


We have already defined the unit n-dimensional sphere Sⁿ as the set
of all points in R^{n+1} at unit distance from the origin. Starting from (2.13),
we can find a natural coordinate system for Sⁿ so that an arbitrary point
θ ∈ Sⁿ has a representation θ = (θ₁, θ₂, ..., θ_n) in which θ_n is the angle
of θ from the north pole. The north pole is thus defined by having θ_n = 0,
and for the north pole θ₁, θ₂, ..., θ_{n−1} are not specified.
The sphere Sⁿ is obviously invariant under any rotation (proper or
improper) of R^{n+1}, and the set of all rigid-body motions of R^{n+1} which
leave Sⁿ invariant is just the set of rotations. Suppose that we define a
metric d on Sⁿ by

d(θ,θ') = ψ(θ,θ')    (3.1)

where ψ(θ,θ') is just the angle between the two straight lines connecting
the origin in R^{n+1} to θ and θ'. It is easy to verify that (3.1) defines a metric.
Now, suppose that we consider the set of all transformations t: Sⁿ → Sⁿ
such that

d(θ,θ') = d(t(θ), t(θ'))    (3.2)

It turns out that this set is just the set of rotations in R^{n+1}. This set is
denoted by G(Sⁿ) and is easily verified to be a group. If we denote by
G(R^{n+1}) the group of rigid-body motions in R^{n+1}, and by T(R^{n+1}) the
group of all translations in R^{n+1}, then we have

G(R^{n+1}) = T(R^{n+1}) G(Sⁿ)    (3.3)

We also note that T(R^{n+1}) is isomorphic to R^{n+1}, since to each t ∈ R^{n+1}
corresponds the translation

t(z) = z + t    (3.4)

This observation and (3.3) allow us to identify R^{n+1} with G(R^{n+1})/G(Sⁿ),
where the latter denotes the collection of equivalence classes of rigid
motions, each equivalence class being made up of the set of all motions
which take the origin to the same point.
Now, consider the Laplacian operator on R^{n+1}, which we denote by
Δ(R^{n+1}). In Cartesian coordinates, we have

Δ(R^{n+1}) = ∂²/∂z₁² + ∂²/∂z₂² + ... + ∂²/∂z²_{n+1}    (3.5)

As domain of Δ(R^{n+1}), we take the set of all complex-valued functions on
R^{n+1} with bounded continuous second partials with respect to each z_i, and
we denote this set by C²(R^{n+1}). For each rigid-body motion g ∈ G(R^{n+1}),
we can define a mapping T_g: C²(R^{n+1}) onto C²(R^{n+1}) by

T_g f(z) = f(g(z))    (3.6)

It is both well known and easily verified that for each g ∈ G(R^{n+1}), T_g
commutes with Δ(R^{n+1}). That is,

T_g Δ(R^{n+1}) = Δ(R^{n+1}) T_g    (3.7)

The easiest proof of this fact is probably by representing each function in
C²(R^{n+1}) in terms of its Fourier integral.
In terms of polar coordinates, whereby each point in R^{n+1} is repre-
sented by a pair (r,θ) ∈ [0,∞) × Sⁿ, the Laplacian can be rewritten as

Δ(R^{n+1}) = (1/rⁿ) ∂/∂r (rⁿ ∂/∂r) + (1/r²) Δ(Sⁿ)    (3.8)

where Δ(Sⁿ), the Laplacian of the n-sphere, can be recursively defined as
follows:

Δ(Sⁿ) = (1/sin^{n−1} θ_n) ∂/∂θ_n (sin^{n−1} θ_n ∂/∂θ_n) + (1/sin² θ_n) Δ(S^{n−1})    (3.9)

From (3.8) it is easy to see that Δ(Sⁿ) commutes with each T_g, g ∈ G(Sⁿ),
since r is left invariant by any g ∈ G(Sⁿ) and T_g commutes with Δ(R^{n+1}).
It is now convenient to take as the domain of Δ(Sⁿ) the space C²(Sⁿ) of
functions in C²(R^{n+1}) which do not depend on the radial distance r. Now,
consider the eigenvalues and eigenfunctions of Δ(Sⁿ). An eigenvalue λ of
Δ(Sⁿ) is any complex number such that the equation

Δ(Sⁿ) f = λ f    (3.10)

has a solution f which is not identically zero. For an eigenvalue λ, any
not-identically-zero f satisfying (3.10) is called an eigenfunction corre-
sponding to λ. Because Δ(Sⁿ) is self-adjoint, the eigenvalues must be
real. To find the eigenvalues, first we seek those eigenvalues such that
(3.10) has a solution f which depends only on θ_n and not on (θ₁, θ₂, ...,
θ_{n−1}). For such a function we have

(1/sin^{n−1} θ_n) d/dθ_n [sin^{n−1} θ_n df(θ_n)/dθ_n] = λ f(θ_n)    (3.11)

If we set cos θ_n = x and f(θ_n) = h(cos θ_n), then (3.11) becomes

(1 − x²) d²h(x)/dx² − n x dh(x)/dx − λ h(x) = 0    (3.12)

which is an equation of familiar form and is sometimes called the Gegen-
bauer equation. By seeking a power-series solution, it is easy to discover
that a bounded h exists if and only if λ is of the form

λ_m = −m(m + n − 1)    m = 0, 1, 2, ...    (3.13)

and for n ≥ 2, the corresponding eigenfunctions are the Gegenbauer
polynomials defined by

f_m(θ_n) = h_m(cos θ_n) = C_m^{(n−1)/2}(cos θ_n)
         = K_m ∫₀^π (cos θ_n + i sin θ_n cos φ)^m sin^{n−2} φ dφ    (3.14)

where the constant K_m is such that f_m(0) = 1.
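
As a numerical sanity check (ours), one can verify (3.11) directly: with scipy's
unnormalized Gegenbauer polynomials standing in for the C_m^{(n−1)/2} above
(rescaled so that f_m(0) = 1), f_m(θ) = C_m^{(n−1)/2}(cos θ) satisfies the
eigenvalue equation with λ_m = −m(m + n − 1).

import numpy as np
from scipy.special import gegenbauer

n, m = 3, 4
C = gegenbauer(m, (n - 1) / 2)             # C_m^{alpha}(x) with alpha = (n-1)/2
f = lambda th: C(np.cos(th)) / C(1.0)      # rescaled so that f_m(0) = 1

th = np.linspace(0.3, 2.8, 400)            # stay away from the poles
h = th[1] - th[0]
df = np.gradient(f(th), h)                 # numerical derivative
lhs = np.gradient(np.sin(th)**(n - 1) * df, h) / np.sin(th)**(n - 1)
rhs = -m * (m + n - 1) * f(th)
print(np.max(np.abs(lhs - rhs)[5:-5]))     # small, up to finite-difference error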
We have thus found a subset of eigenvalues for (3.10), and for each
λ_m = −m(m + n − 1) we have found an eigenfunction of the form

f_m(θ) = h_m(cos θ_n)    (3.15)

For each g ∈ G(Sⁿ) we can define a linear operator T_g: C²(Sⁿ) → C²(Sⁿ) by

(T_g f)(θ) = f(g(θ))    (3.16)

Since each T_g commutes with Δ(Sⁿ), for each g and each f_m given by (3.15),
the function T_g f_m is again an eigenfunction corresponding to the same
eigenvalue λ_m = −m(m + n − 1). Now consider the Hilbert space L²(Sⁿ)
generated by complex-valued functions on Sⁿ which are square integrable
with respect to the uniform measure dθ (normalized so that ∫_{Sⁿ} dθ = 1).
For each m, the set of functions {T_g f_m, g ∈ G(Sⁿ)} spans a finite-dimen-
sional subspace ℋ_m of L²(Sⁿ) with dimension d_m. For n ≥ 2, we have

d_m = (2m + n − 1)(m + n − 2)! / [m! (n − 1)!]    (3.17)

For m ≠ m', ℋ_m and ℋ_{m'} are orthogonal. We now choose a real ortho-
normal basis {h_{ml}^{(n)}, l = 1, ..., d_m} for ℋ_m, where we always take
h_{m1}^{(n)} to be proportional to f_m as defined by (3.15). It turns out that the
set of functions

{h_{ml}^{(n)}, l = 1, ..., d_m, m = 0, 1, ...}    (3.18)

is a basis for L²(Sⁿ). Since L²(Sⁿ) contains C²(Sⁿ), this means that (3.13)
exhausts all eigenvalues for (3.10), and for each eigenvalue λ_m every
eigenfunction is a linear combination of {h_{ml}^{(n)}, l = 1, 2, ..., d_m}. We
shall call the functions {h_{ml}^{(n)}} spherical harmonics [Erdelyi, 1953,
Chap. 11].
Let F be a function defined on Sⁿ × Sⁿ such that for every g ∈
G(Sⁿ) and every (θ,θ') ∈ Sⁿ × Sⁿ,

F(g(θ), g(θ')) = F(θ,θ')    (3.19)

Let O denote the north pole as before. Then for every rotation τ which
leaves O fixed, we have

F(θ,O) = F(τ(θ),O)    (3.20)

Therefore, F(θ,O) can depend only on θ_n, that is, the last component of
θ. Suppose that F(·,O) ∈ ℋ_m. That is, suppose that

Δ(Sⁿ) F(·,O) = −m(m + n − 1) F(·,O)    (3.21)

Then it follows that F(θ,O) as a function of θ_n must satisfy (3.11) corre-
sponding to λ = −m(m + n − 1). This means that we must have

F(θ,O) = K C_m^{(n−1)/2}(cos θ_n)    (3.22)

where K is a constant. Now, for any fixed pair (θ,θ') there always exists
a motion g which simultaneously takes θ' into O and θ into (0, 0, ...,
ψ(θ,θ')), where ψ(θ,θ') denotes the angle between θ and θ'. This means that

F(θ,θ') = K C_m^{(n−1)/2}(cos ψ(θ,θ'))    (3.23)

What we have shown is that if F satisfies (3.19) and (3.21), then it must
be of the form (3.23).
Let {h_{ml}^{(n)}} denote the spherical harmonics that we defined earlier.
Consider the bilinear sum

F(θ,θ') = Σ_{l=1}^{d_m} h_{ml}^{(n)}(θ) h_{ml}^{(n)}(θ')    (3.24)

We shall show that the F in (3.24) satisfies both (3.19) and (3.21). For an
arbitrary g ∈ G(Sⁿ), ℋ_m is invariant under T_g, so that we can write

(T_g h_{ml}^{(n)})(θ) = h_{ml}^{(n)}(g(θ)) = Σ_{l'=1}^{d_m} a_{ll'}(g) h_{ml'}^{(n)}(θ)    (3.25)

Because

∫_{Sⁿ} h_{mk}^{(n)}(g(θ)) h_{ml}^{(n)}(g(θ)) dθ = ∫_{Sⁿ} h_{mk}^{(n)}(θ) h_{ml}^{(n)}(θ) dθ = δ_{kl}    (3.26)

we find that for each g ∈ G(Sⁿ),

Σ_{l'=1}^{d_m} a_{ll'}(g) a_{kl'}(g) = δ_{kl}    (3.27)

This means that A(g) = [a_{ij}(g)] is an orthogonal matrix, so that we also have

Σ_{l'=1}^{d_m} a_{l'l}(g) a_{l'k}(g) = δ_{lk}    (3.28)

It follows that

Σ_{l=1}^{d_m} h_{ml}^{(n)}(g(θ)) h_{ml}^{(n)}(g(θ')) = Σ_{l'=1}^{d_m} Σ_{k'=1}^{d_m} h_{ml'}^{(n)}(θ) h_{mk'}^{(n)}(θ') Σ_{l=1}^{d_m} a_{ll'}(g) a_{lk'}(g)
                                       = Σ_{l=1}^{d_m} h_{ml}^{(n)}(θ) h_{ml}^{(n)}(θ')    (3.29)
Hence, the function F defined by (3.24) satisfies (3.19). It is obvious that
it also satisfies (3.21), so that

Σ_{l=1}^{d_m} h_{ml}^{(n)}(θ) h_{ml}^{(n)}(θ') = K C_m^{(n−1)/2}(cos ψ(θ,θ'))    (3.30)

The constant K can be evaluated as follows: Setting θ = θ' in (3.30) and
integrating over Sⁿ, we get

∫_{Sⁿ} Σ_{l=1}^{d_m} h_{ml}^{(n)}(θ) h_{ml}^{(n)}(θ) dθ = d_m    (3.31)

Therefore,

K = d_m / C_m^{(n−1)/2}(1)    (3.32)
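
For the familiar case n = 2 (the sphere S² ⊂ R³), d_m = 2m + 1 and
C_m^{1/2} = P_m is the Legendre polynomial with P_m(1) = 1, so (3.30) is the
classical addition theorem with K = d_m. Here is a numerical check (ours),
using scipy's complex orthonormal spherical harmonics in place of a real
orthonormal basis (the bilinear sum is the same either way), rescaled by 4π
because the text normalizes the uniform measure to total mass one.

import numpy as np
from scipy.special import sph_harm, eval_legendre

m = 5
th1, ph1 = 0.7, 1.2        # polar and azimuthal angles of theta
th2, ph2 = 2.1, 0.4        # polar and azimuthal angles of theta'
cospsi = (np.cos(th1) * np.cos(th2)
          + np.sin(th1) * np.sin(th2) * np.cos(ph1 - ph2))

lhs = sum(4 * np.pi * sph_harm(k, m, ph1, th1) * np.conj(sph_harm(k, m, ph2, th2))
          for k in range(-m, m + 1))
rhs = (2 * m + 1) * eval_legendre(m, cospsi)
print(lhs.real, rhs)       # equal; the imaginary part of lhs vanishes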

We can now apply the results on spherical harmonics to the problem of
representing isotropic random fields. A second-order random field {X_z, z ∈
Rⁿ} is said to be isotropic if for every rotation ρ

E X_{ρ(z)} = E X_z    (3.33)

and

E X_{ρ(z)} X_{ρ(z')}* = E X_z X_{z'}*,    z, z' ∈ Rⁿ    (3.34)

If we adopt a polar-coordinate system z = (r,θ), r ∈ [0,∞), θ ∈ S^{n−1}, then



(3.33) and (3.34) imply that

E X_{(r,θ)} = μ(r)    (3.35)

and

E X_{(r,θ)} X_{(r',θ')}* = R(r, r', cos ψ(θ,θ'))    (3.36)

where ψ(θ,θ') is the angle between θ and θ'. Suppose that we now seek a
representation of the form

X_{(r,θ)} = Σ_{m=0}^∞ Σ_{l=1}^{d_m} X_{ml}(r) h_{ml}^{(n−1)}(θ)    (3.37)

where the h_{ml}^{(n−1)} are the spherical harmonics. The orthonormality of the
spherical harmonics yields

X_{ml}(r) = ∫_{S^{n−1}} X_{(r,θ)} h_{ml}^{(n−1)}(θ) dθ    (3.38)

so that

E X_{ml}(r) X_{m'l'}(r')*
  = ∫_{S^{n−1}×S^{n−1}} E X_{(r,θ)} X_{(r',θ')}* h_{ml}^{(n−1)}(θ) h_{m'l'}^{(n−1)}(θ') dθ dθ'
  = ∫_{S^{n−1}×S^{n−1}} R[r, r', cos ψ(θ,θ')] h_{ml}^{(n−1)}(θ) h_{m'l'}^{(n−1)}(θ') dθ dθ'    (3.39)

If {X_z, z ∈ Rⁿ} is q.m. continuous, then R(r,r',cos ψ) can be expanded in
terms of C_m^{(n−2)/2}(cos ψ) as

R(r,r',cos ψ) = Σ_{m=0}^∞ d_m R_m(r,r') C_m^{(n−2)/2}(cos ψ)    (3.40)

The bilinear form (3.32) and (3.40) can now be used in (3.39) to yield

E X_{ml}(r) X_{m'l'}(r')* = δ_{mm'} δ_{ll'} R_m(r,r')    (3.41)

This means that {X_{ml}(r)} is a countable family of orthogonal one-
dimensional stochastic processes.
Suppose that {X_z, z ∈ Rⁿ} is not only isotropic, but also homo-
geneous; then we know from (2.8) that we can write

X_z = ∫_{Rⁿ} e^{i2π(ν,z)} X̃(dν)    (3.42)

where X̃ is a random set function defined on the Borel sets of Rⁿ. Now,
adopt a polar-coordinate system for both ν and z in (3.42), so that

2πν = (λ,φ)    0 ≤ λ < ∞,  φ ∈ S^{n−1}
   z = (r,θ)    0 ≤ r < ∞,  θ ∈ S^{n−1}

It is obvious that

Δ(Rⁿ) e^{i2π(ν,z)} = −λ² e^{i2π(ν,z)}

It follows that we must be able to write

e^{iλr cos ψ(θ,φ)} = Σ_{m=0}^∞ C_m^{(n−2)/2}(cos ψ(θ,φ)) f_m(λr)    (3.43)

where f_m satisfies

(1/r^{n−1}) d/dr [r^{n−1} d f_m(λr)/dr] − [m(m + n − 2)/r²] f_m(λr) = −λ² f_m(λr)    (3.44)

so that

f_m(λr) = K_m J_{(n−2)/2+m}(λr) / (λr)^{(n−2)/2}    (3.45)

Therefore, from (3.43) and (3.32), we have

e^{iλr cos ψ(θ,φ)} = Σ_{m=0}^∞ K_m [J_{(n−2)/2+m}(λr) / (λr)^{(n−2)/2}] C_m^{(n−2)/2}(cos ψ(θ,φ))
                = Σ_{m=0}^∞ Σ_{l=1}^{d_m} (K_m/d_m) [J_{(n−2)/2+m}(λr) / (λr)^{(n−2)/2}] h_{ml}^{(n−1)}(θ) h_{ml}^{(n−1)}(φ)    (3.46)

We can now use (3.46) in (3.42) and get

X_{(r,θ)} = Σ_{m=0}^∞ Σ_{l=1}^{d_m} h_{ml}^{(n−1)}(θ) ∫₀^∞ [J_{(n−2)/2+m}(λr) / (λr)^{(n−2)/2}] X̃_{ml}(dλ)    (3.47)

where {X̃_{ml}} is a family of random set functions defined on the Borel sets
of [0,∞) by the formula

X̃_{ml}(A) = (K_m/d_m) ∫_{A×S^{n−1}} h_{ml}^{(n−1)}(φ) X̃(dλ/2π, dφ)    (3.48)

It is easy to verify that

E X̃_{ml}(A) X̃_{m'l'}(B)* = δ_{mm'} δ_{ll'} (K_m/d_m)² F₀(A ∩ B)    (3.49)

where F₀ is the Borel measure appearing in (2.22).


A comparison of (3.37) and (3.47) shows that for an isotropic and
homogeneous random field, we have

X_{ml}(r) = ∫₀^∞ [J_{(n−2)/2+m}(λr) / (λr)^{(n−2)/2}] X̃_{ml}(dλ)    (3.50)

with

R_m(r,r') = (K_m/d_m)² ∫₀^∞ [J_{(n−2)/2+m}(λr) / (λr)^{(n−2)/2}] [J_{(n−2)/2+m}(λr') / (λr')^{(n−2)/2}] F₀(dλ)    (3.51)

Thus we see that an isotropic random field can be decomposed into
a countable number of mutually uncorrelated stochastic processes with a
one-dimensional parameter. If the random field is also homogeneous, then
the correlation functions of these component processes can all be expressed
in terms of a single Borel measure, as is done in (3.51). Results very similar
to these can be obtained for random fields with parameter spaces which
are certain differentiable manifolds with a Riemannian metric, when these
random fields are homogeneous with respect to the motions which preserve
this metric. More generally, representation results for random fields de-
fined on homogeneous spaces can be obtained [Yaglom, 1961; Gangolli,
1967; Wong, 1969].

4. MARKOVIAN RANDOM FIELDS


One characterization of Markov processes is that the future and the past
should be independent given the present (see 2.5.17). Levy [1956] gen-
eralized the concept of the Markov property to random fields with parameter
space Rⁿ by identifying the present with any smooth, closed (n − 1)
surface separating the parameter space into a bounded part (past) and
an unbounded part (future). We should immediately note that this corre-
spondence does not quite reduce to the usual identification when the
parameter space is R¹, since a "closed" surface in R¹ corresponds to a
pair of points rather than to a single point. If we had identified the present
with any (n − 1) surface separating the parameter space into two parts,
and not just with closed surfaces, the definition of a Markov random
field would correspond more closely to that of Markov processes. However,
this would lead to other difficulties.
Let ∂D be a smooth (infinitely differentiable), closed surface sepa-
rating Rⁿ into a bounded part D⁻ and an unbounded part D⁺. A random
field {X_z, z ∈ Rⁿ} is said to be Markov if for any such ∂D, X_z and X_{z'},
z ∈ D⁻, z' ∈ D⁺, are independent given {X_z, z ∈ ∂D}. Now suppose that
{X_z, z ∈ Rⁿ} is a q.m. continuous zero-mean Gaussian isotropic and
homogeneous random field. From (2.22), we know that the probability
law of such a random field is completely specified by a finite Borel measure
F₀ on [0,∞). A question of obvious interest is: what must F₀ be so that the
random field {X_z, z ∈ Rⁿ} is Markov? We shall answer this question in
this section. The answer, unfortunately, is not very satisfactory in that
the only such Markov fields turn out to be degenerate ones.
Suppose that {X_z, z ∈ Rⁿ} is a q.m. continuous Gaussian isotropic
random field. From Sec. 3, we know that if we write

X_{(r,θ)} = Σ_{m=0}^∞ Σ_{l=1}^{d_m} X_{ml}(r) h_{ml}^{(n−1)}(θ)    (4.1)

then {X_{ml}(r)} are mutually uncorrelated processes. Since {X_z, z ∈ Rⁿ} is
Gaussian, the {X_{ml}(·)} are in fact independent Gaussian processes with
parameter space [0,∞). Now, suppose that {X_z, z ∈ Rⁿ} is Markov. Then
each X_{ml}(r), 0 ≤ r < ∞, is a Gaussian Markov process by the following
reasoning: Because each X_{ml}(·) is Gaussian, we only need to prove that
for each m and l,

E[X_{ml}(r) | X_{ml}(ρ), ρ ≤ r₀ < r] = E[X_{ml}(r) | X_{ml}(r₀)]    (4.2)

Because the {X_{ml}(·)} are mutually independent, we have

E[X_{ml}(r) | X_{ml}(ρ), ρ ≤ r₀ < r]
  = E[X_{ml}(r) | X_{m'l'}(ρ), for all m', l' and ρ ≤ r₀ < r]
  = E[X_{ml}(r) | X_{(ρ,θ)}, ρ ≤ r₀ < r, θ ∈ S^{n−1}]

Since X_{ml}(r) is given by

X_{ml}(r) = ∫_{S^{n−1}} h_{ml}^{(n−1)}(θ) X_{(r,θ)} dθ

we can apply the Markov property of {X_z, z ∈ Rⁿ} with ∂D = the sphere
with radius r₀, and find

E[X_{ml}(r) | X_{(ρ,θ)}, ρ ≤ r₀ < r, θ ∈ S^{n−1}] = E[X_{ml}(r) | X_{(r₀,θ)}, θ ∈ S^{n−1}]
  = E[X_{ml}(r) | X_{m'l'}(r₀), all m', l']
  = E[X_{ml}(r) | X_{ml}(r₀)]

which is just (4.2). We can summarize this result as follows.

Proposition 4.1. Let {X_z, z ∈ Rⁿ} be a q.m. continuous Gaussian isotropic
Markov random field. Define

X_{ml}(r) = ∫_{S^{n−1}} h_{ml}^{(n−1)}(θ) X_{(r,θ)} dθ    (4.3)

Then {X_{ml}(r), 0 ≤ r < ∞, m = 0, 1, 2, ...; 1 ≤ l ≤ d_m} is a family
of independent Gaussian Markov processes.
From (3.41) we know that the correlation function of X_{ml}(·) only
depends on m, and from (2.5.14) we know that it must have the form

E X_{ml}(r) X_{ml}(r') = R_m(r,r') = f_m(max(r,r')) g_m(min(r,r'))    (4.4)

If {X_z, z ∈ Rⁿ} is also homogeneous, then we know further that R_m must
be of the form [cf. (3.51)]

R_m(r,r') = f_m(max(r,r')) g_m(min(r,r'))
          = A_m ∫₀^∞ [J_{(n−2)/2+m}(λr)/(λr)^{(n−2)/2}] [J_{(n−2)/2+m}(λr')/(λr')^{(n−2)/2}] F₀(dλ)    (4.5)

where the A_m are positive constants. Let A_m also denote the operator

(A_m f)(r) = (1/r^{n−1}) d/dr [r^{n−1} df(r)/dr] − [m(m + n − 2)/r²] f(r)    (4.6)

For r > r', consider

[A_m R_m(·,r')](r) = g_m(r') (A_m f_m)(r)    (4.7)

and

[A_m R_m(r,·)](r') = f_m(r) (A_m g_m)(r')    (4.8)

Because of (4.5), (3.44), and (3.45), we have

[A_m R_m(·,r')](r) = A_m ∫₀^∞ (−λ²) [J_{(n−2)/2+m}(λr)/(λr)^{(n−2)/2}] [J_{(n−2)/2+m}(λr')/(λr')^{(n−2)/2}] F₀(dλ)
                  = [A_m R_m(r,·)](r')    (4.9)

so that for r > r',

g_m(r') (A_m f_m)(r) = f_m(r) (A_m g_m)(r')    (4.10)

Since, after division by f_m(r) g_m(r'), the two sides of (4.10) are functions
of different variables, each must be equal to a constant, i.e.,

(A_m f_m)(r) = ν_m f_m(r)    r > 0    (4.11)
(A_m g_m)(r) = ν_m g_m(r)    r > 0    (4.12)
Now, consider

R(r) = E X_{(r,θ)} X_{(0,·)} = E X_z X_0
     = Σ_{m,l} R_m(r,0) h_{ml}^{(n−1)}(θ) h_{ml}^{(n−1)}(φ)

It is obvious that we must have

R_m(r,0) = 0    m ≥ 1

because h_{01}^{(n−1)}(θ) ≡ 1 and all the other h_{ml}^{(n−1)} are orthogonal to it. There-
fore,

R(r) = R₀(r,0) = f₀(r) g₀(0)    (4.13)

It follows from (4.12) that R(·) must satisfy

(1/r^{n−1}) d/dr [r^{n−1} dR(r)/dr] = ν₀ R(r)    r > 0    (4.14)

The only solution of (4.14) such that R(0) < ∞ and R(‖z − z'‖) is non-

negative definite is given by

R(r) = A J_{(n−2)/2}(λ₀r) / (λ₀r)^{(n−2)/2}    (4.15)

where λ₀ and A are nonnegative constants.

If λ₀ = 0 in (4.15), then the random field is just a single Gaussian
random variable. It is Markov, but only trivially so. If λ₀ > 0, then

X_{ml}(r) = X_{ml} J_{(n−2)/2+m}(λ₀r) / (λ₀r)^{(n−2)/2}    (4.16)

where {X_{ml}} is a collection of independent Gaussian random variables.
Since J_{(n−2)/2+m}(λ₀r) has zeros on 0 ≤ r < ∞, the processes X_{ml}(r),
0 ≤ r < ∞, are not Markov.¹ By virtue of Proposition 4.1 the corre-
sponding random field is not Markov.
Equation (4.14) has a second solution of the form

R(r) = A K_{(n−2)/2}(λ₀r) / (λ₀r)^{(n−2)/2}    (4.17)

for which R(‖z − z'‖) is positive definite, but R(0) = ∞. Equation (4.17)
is the counterpart of the exponential correlation function R(τ) = e^{−a|τ|} of a
stationary Gaussian Markov process (Ornstein-Uhlenbeck process), and
the correspondence becomes more evident when we write

K_{(n−2)/2}(λ₀r) / (λ₀r)^{(n−2)/2} = ∫₀^∞ [J_{(n−2)/2}(λr) / (λr)^{(n−2)/2}] [λ^{n−1} / (λ² + λ₀²)] dλ    (4.18)

which suggests that the spectral density is of the form

f₀(λ) = λ^{n−1} / (λ² + λ₀²)    (4.19)

Of course, (4.18) corresponds to an unbounded spectral measure

F₀(dλ) = [λ^{n−1} / (λ² + λ₀²)] dλ    (4.20)

and there is no q.m. continuous random field with a correlation function


given by (4.17). It is possible to define a generalized random field with a
correlation function given by (4.17). It turns out that this generalized
random field, if Gaussian, is Markovian in a well-defined sense. To extend
the definition of a Markov random field to generalized random fields
requires some care, but it can be done [Wong, 1969].

1 This point was clarified for me by Professor Frank Spitzer.
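
As an aside (ours), the identity (4.18) is easy to check numerically for n = 2,
where it reads ∫₀^∞ J₀(λr) λ/(λ² + λ₀²) dλ = K₀(λ₀r). The integrand oscillates
and decays slowly, so the sketch below truncates the integral; only approximate
agreement should be expected.

from scipy.integrate import quad
from scipy.special import j0, k0

r, lam0 = 1.5, 1.0
lhs, _ = quad(lambda lam: j0(lam * r) * lam / (lam**2 + lam0**2),
              0, 200, limit=2000)        # truncated oscillatory integral
print(lhs, k0(lam0 * r))                 # approximately equal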



Levy [1948] defined a Brownian motion with parameter space Rⁿ as
a Gaussian random field {X_z, z ∈ Rⁿ} with zero mean and covariance
function

E X_z X_{z'} = ½(‖z‖ + ‖z'‖ − ‖z − z'‖)    (4.21)

For n = 1, it reduces to the ordinary Brownian motion on (−∞,∞) as
defined by (2.3.17). For odd-dimensional parameter spaces (n = 2p + 1),
Levy [1956] conjectured that {X_z, z ∈ R^{2p+1}} is Markovian of order
p + 1 in the following sense. A random field {X_z, z ∈ Rⁿ} is said to be
Markov of order ≤ p + 1 if, for any smooth, closed (n − 1) surface ∂D,
every approximation X̃_z to X_z in a neighborhood of ∂D which has the
property

lim_{δ↓0} (1/δ^p) |X_z − X̃_z| = 0,    δ = distance(z, ∂D)    (4.22)

also has the property that given X̃, X_z and X_{z'} are independent whenever
z ∈ D⁻ and z' ∈ D⁺. If X_z is Markov of order ≤ p + 1 but not ≤ p, then
it is said to be Markov of order p + 1. This conjecture, in a somewhat
different formulation, was proved by McKean [1963]. McKean showed that
for a Brownian motion on R^{2p+1}, given X and its "normal derivatives" ∂^k X,
k = 1, 2, ..., p, on ∂D, X_z, z ∈ D⁻, is independent of X_{z'}, z' ∈ D⁺. However,
a Brownian motion is not even once differentiable, so the normal
derivatives need to be defined, which McKean has done. Brownian motions
with an even-dimensional parameter space have no Markov property at all.
One way in which Markovian random fields (of some order) arise
naturally is through differential equations driven by white noise, very
much in the same way that diffusion processes are generated by white
noise. It is not difficult to define white noise, Gaussian or not, with a
multidimensional parameter. However, stochastic partial differential
equations as extensions of Ito equations have not been studied, except
for linear equations of the form

ΔX_z = kX_z + η_z

where η_z is a white noise. Several examples of this type have been given by
Whittle [1963].

5. MULTIPARAMETER MARTINGALES

In view of the important role that martingales have played in the develop-
ment of a theory of filtering and detection, one is motivated to generalize the
martingale concept to multiparameter processes. This can be done in a
number of ways. One of the simplest and most natural is to make use of the

natural partial ordering defined in terms of Cartesian coordinates. For
simplicity of exposition we limit the discussion for now to the two-dimen-
sional case.
Let R²₊ denote the positive quadrant {(t₁,t₂): 0 ≤ t₁, t₂ < ∞} and
define a partial ordering (>) as follows: For t, s ∈ R²₊, t > s if and only if
t₁ ≥ s₁ and t₂ ≥ s₂. Now martingales can be defined. Let (Ω, 𝔄, P) be a
probability space and {𝔄_t, t ∈ R²₊} be a family of σ-algebras such that

t > s ⟹ 𝔄_t ⊇ 𝔄_s    (increasing family)

A two-parameter process {M_t, t ∈ R²₊} is said to be a martingale relative to
{𝔄_t} if

t > s ⟹ E(M_t | 𝔄_s) = M_s    a.s.    (5.1)

An example of a two-parameter martingale is the Wiener process W_t,
defined as follows:
(a) {W_t, t ∈ R²₊} is a Gaussian process
(b) EW_t = 0 for every t
(c) EW_t W_s = min(t₁,s₁) min(t₂,s₂)

Now, let 𝔄_{Wt} be the σ-algebra generated by {W_s, s < t}. Then, W_t is a
martingale relative to {𝔄_{Wt}}.
The fact that the Wiener process is a martingale can be easily seen by
relating it to a "white noise process" as follows: Let 𝔅 denote the collection
of Borel sets in R²₊. Let {η(A), A ∈ 𝔅} be a Gaussian family of random
variables parametrized by Borel sets such that Eη(A) = 0 and

Eη(A)η(B) = area(A ∩ B)

It is clear that η is independent on disjoint sets. Now, let A_t denote the
rectangle {s ∈ R²₊: s < t}. Then, a Wiener process W_t can be expressed as

W_t = η(A_t)

The martingale property of W can now be derived as follows:

t > s ⟹ E(W_t | 𝔄_{Ws}) = E[η(A_t) | 𝔄_{Ws}]
       = E[η(A_s) + η(A_t − A_s) | 𝔄_{Ws}]
       = η(A_s)    a.s.
       = W_s

where we have used the fact that η(A_s) is 𝔄_{Ws} measurable while η(A_t − A_s)
(A_t − A_s being disjoint from A_s) is 𝔄_{Ws} independent.
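
A minimal simulation sketch (ours): discretizing η as independent N(0, cell area)
masses on a grid makes W_t = η(A_t) a cumulative sum in both coordinates, and a
Monte Carlo average reproduces EW_t W_s = min(t₁,s₁) min(t₂,s₂).

import numpy as np

rng = np.random.default_rng(0)
N = 100                                  # grid cells per axis on [0,1]^2
samples = []
for _ in range(4000):
    eta = rng.normal(scale=1.0 / N, size=(N, N))   # eta(cell) ~ N(0, area(cell))
    W = eta.cumsum(axis=0).cumsum(axis=1)          # W[i,j] ~ W at ((i+1)/N,(j+1)/N)
    # covariance check at t = (0.5, 0.5) and s = (1, 0.75):
    samples.append(W[N//2 - 1, N//2 - 1] * W[N - 1, 3*N//4 - 1])
print(np.mean(samples))   # close to min(0.5,1) * min(0.5,0.75) = 0.25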

Wong and Zakai (1974) considered the problem of representing
functionals of a Wiener process on R²₊ by suitably defined stochastic in-
tegrals. This work was followed shortly thereafter by the paper of Cairoli and
Walsh (1975), and these papers inaugurated a body of research that has
grown to substantial proportions. In this section we give a brief account of the
basic results on stochastic integration on R²₊.
Let {η(A), A ∈ 𝔅} be a white noise process and {W_t = η(A_t)} be a
Wiener process as before. Let {𝔄_t, t ∈ R²₊} be an increasing family of
σ-algebras such that B ∩ A_t = ∅ implies η(B) is 𝔄_t independent. Suppose
{φ_t, t ∈ T} is a random field on the unit square T = [0,1]², such that for
each t, φ_t is 𝔄_t measurable and

∫_T Eφ_t² dt < ∞

Denote the collection of all such random fields by ℋ₁. Then, ℋ₁ is a Hilbert
space with inner product

(φ,φ') = ∫_T E(φ_t φ'_t) dt
We now define type-1 stochastic integrals

φ · W = ∫_T φ_t dW_t

for φ ∈ ℋ₁ as follows:

(a) If φ_t(ω) = Z(ω) I_A(t), where I_A is the indicator function of a
rectangle A, then we set

φ · W = Z η(A)

(Note: Z is necessarily 𝔄_{t₀} measurable, where t₀ is the minimum point
on A.)

(b) If φ = Σ_{i=1}^m φ_i and each φ_i is of the form given in (a), then we set

φ · W = Σ_{i=1}^m φ_i · W

and call such φ's simple functions. For simple φ's, we have

E(φ · W)² = (φ,φ) = ‖φ‖²

(c) A Cauchy sequence of simple functions {φ_k} in ℋ₁ produces a
Cauchy sequence {φ_k · W} of square-integrable random variables.

Hence, we can define

(lim φ_k) · W = lim in q.m. φ_k · W

(d) It can be shown, in a way similar to that in Section 4.2, that every
φ ∈ ℋ₁ is the limit of a sequence of simple functions. Hence, φ · W is
well defined for all φ ∈ ℋ₁.

It is clear that φ · W is a straightforward generalization of the Ito
integral (defined in Section 4.2) and its properties are similar. These include:
(a) linearity: (αφ + βφ') · W = α(φ · W) + β(φ' · W)
(b) isometry: E(φ · W)(φ' · W) = (φ,φ')
(c) martingale: E(φ · W | 𝔄_t) = φI_{A_t} · W almost surely for every t ∈ T
    (i.e., E[∫_T φ_s dW_s | 𝔄_t] = ∫_{s<t} φ_s dW_s)
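
To make the construction concrete, here is a discrete sketch (ours) of the
type-1 integral: φ is sampled at the lower-left corner of each grid cell (so the
value used is measurable there, as in step (a)), multiplied by the white-noise
mass of the cell, and summed. The isometry E(φ · W)² = ‖φ‖² is checked for
φ_t = W_t, for which ‖φ‖² = ∫_T t₁t₂ dt = 1/4.

import numpy as np

rng = np.random.default_rng(2)
N, reps, vals = 60, 3000, []
for _ in range(reps):
    eta = rng.normal(scale=1.0 / N, size=(N, N))   # white-noise cell masses
    W = eta.cumsum(axis=0).cumsum(axis=1)          # W at upper-right cell corners
    phi = np.zeros((N, N))
    phi[1:, 1:] = W[:-1, :-1]                      # W at lower-left corners
    vals.append(np.sum(phi * eta))                 # the discrete phi . W
print(np.mean(np.square(vals)))                    # close to 1/4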

Type-1 stochastic integrals are not enough to represent all Wiener
functionals. That is, there are square-integrable random variables Z mea-
surable with respect to 𝔄_{(1,1)} that cannot be written as

Z = EZ + φ · W

To see this, consider a random variable Z constructed as follows: Let t and s
be two unordered points and let Δ_t and Δ_s be two forward incremental
rectangles at t and s, respectively. Define Z = η(Δ_t)η(Δ_s). For infinitesimal
Δ_t and Δ_s, to represent Z as a type-1 stochastic integral would require it to
be of the form

Z = X η(Δ_t) + Y η(Δ_s)

where X is 𝔄_t measurable and Y is 𝔄_s measurable. That would mean

EZ² = E[X η²(Δ_t) η(Δ_s) + Y η²(Δ_s) η(Δ_t)]
    = E{X η²(Δ_t) E[η(Δ_s) | 𝔄_t] + Y η²(Δ_s) E[η(Δ_t) | 𝔄_s]} = 0

which contradicts

EZ² = area(Δ_t) area(Δ_s) > 0
What is needed is a type-2 stochastic integral, which will be defined next.
Let G denote the set of all unordered pairs (t,s) in T × T such that t
is lower and to the right and s is higher and to the left. In other words,
t₁ > s₁ and t₂ < s₂. We denote this by s ∧ t. Let ℋ₂ denote the collection of

all random functions ψ_{t,s} defined on G such that:

(a) for each (t,s), ψ_{t,s} is a random variable measurable with respect
to 𝔄_{t∨s};
(b) ψ(t,s,ω) is jointly measurable with respect to ℒ × ℒ × 𝔄, where ℒ
denotes the σ-algebra of Lebesgue measurable sets in T;
(c) ∫_G Eψ²_{t,s} dt ds < ∞.

We can now define a type-2 stochastic integral, variously denoted by

ψ · W² = ∫_G ψ_{t,s} dW_t dW_s

as follows:
(a) Suppose that there exist rectangles Δ₁ and Δ₂ such that Δ₁ × Δ₂
⊂ G and

ψ_{t,s}(ω) = a(ω)    for (t,s) ∈ Δ₁ × Δ₂
          = 0       otherwise

Then, we set

ψ · W² = a η(Δ₁) η(Δ₂)

(b) We say ψ is simple if

ψ = Σ_{k=1}^m ψ_k

and each ψ_k is of the form given in (a). If ψ is simple, we set

ψ · W² = Σ_{k=1}^m ψ_k · W²

(c) It can be shown (Wong and Zakai, 1974) that simple ψ's are dense
in ℋ₂ with respect to the norm

‖ψ‖ = (∫_G Eψ²_{t,s} dt ds)^{1/2}

and that for a simple ψ,

E(ψ · W²)² = ‖ψ‖²

It follows that ψ · W² can be extended to all ψ ∈ ℋ₂ by approximat-
ing ψ with a sequence of simple functions ψ_n such that
‖ψ − ψ_n‖ → 0 and by defining

ψ · W² = lim in q.m. ψ_n · W²
            n→∞

Type-2 stochastic integrals have the following important properties:

(a) linearity: (αψ + βψ') · W² = α(ψ · W²) + β(ψ' · W²)
(b) isometry: E(ψ · W²)(ψ' · W²) = ∫_G E(ψ_{t,s} ψ'_{t,s}) dt ds
(c) orthogonality: E(ψ · W²)(φ · W) = 0,  ψ ∈ ℋ₂, φ ∈ ℋ₁
(d) martingale: E(ψ · W² | 𝔄_t) = ψI_{A_t×A_t} · W²
With stochastic integrals of type-1 and type-2, all square-integrable
Wiener functionals can now be represented. For example, W²_{(1,1)} = η²(T) can
be written as

W²_{(1,1)} = E W²_{(1,1)} + 2(W · W) + 2(1 · W²)
         = 1 + 2 ∫_T W_t dW_t + 2 ∫_G dW_t dW_s

Several proofs of the representation theorem have now been given (Wong
and Zakai, 1974; Cairoli and Walsh, 1975; Hajek, 1979). Rather than repro-
ducing one here, we show a heuristic argument for it.
Consider a functional

f(u,W) = exp ∫_T u(s) dW_s

where u is a deterministic function in L²(T). Intuitively, any square-inte-
grable Wiener functional can be approximated by sums of the form

Σ_u a(u) f(u,W)

so that it would be sufficient to obtain a representation for f(u,W). Now,
define

M_t = ∫_{A_t} u(s) dW_s

and

X_t = exp(M_t − ½ ∫_{A_t} u²(s) ds)

It follows from the differentiation rule for a one-parameter martingale that

d_{t₁} X = X d_{t₁} M    and    d_{t₂} X = X d_{t₂} M

Now, d_{t₁} d_{t₂} M = u(t) η(dt) and

d_{t₁} M d_{t₂} M = Σ_{s ∧ s': s∨s'=t} u(s) u(s') η(ds) η(ds')

Hence, if we denote A(u) = exp(½ ∫_T u²(s) ds), then

f(u,W) = A(u) X_{(1,1)} = A(u){1 + ∫_T X_t (d_{t₁}M d_{t₂}M + d_{t₁}d_{t₂}M)}
       = A(u){1 + ∫_T u(t) X_t dW_t + ∫_G u(s) u(s') X_{s∨s'} dW_s dW_{s'}}


which is the representation result that we have been seeking.
From the martingale property of the stochastic integrals of both types,
it is clear that the representation theorem implies that every square-integra-
ble Wiener martingale can be represented in the form

M_t = EM₀ + φI_{A_t} · W + ψI_{A_t×A_t} · W²
However, for the stochastic integrals to be useful as an analytical and
computational tool, we need more than a representation result; we need a
calculus. It turns out (Cairoli and Walsh, 1975) that a host of additional
concepts is needed. These include 1- and 2-martingales and strong and weak
martingales. Even with these additional concepts the resulting calculus is
complicated. To remove this obstacle, a theory of stochastic differential
forms is needed. Some of the basic elements of such a theory are presented in
the next section.
Before taking up stochastic differential forms, we should mention that
stochastic integration has been further generalized in a number of ways. Yor
(1976) generalized the representation result to n-parameter Wiener function-
als and in the process introduced multiple stochastic integrals of various
orders, of which our type-2 integral is a special case corresponding to order 2.
Hajek (1979) generalized the representation result by viewing martingales as
set-parametrized processes. In the process he not only extended stochastic
integration to a much more general setting but also illuminated the connec-
tion between multiple stochastic integrals and multiple Wiener integrals (Ito,
1951c).

6. STOCHASTIC DIFFERENTIAL FORMS

Intuitively, differential forms are objects to be integrated on reasonably
smooth k-dimensional surfaces in an n-dimensional space (n ≥ k). For
example, a 1-form in two-dimensional space can be written as

x = a(t) dt₁ + b(t) dt₂    (6.1)

If a curve γ in R² is represented by

t = t(α),    0 ≤ α ≤ 1

then ∫_γ x is given by

∫_γ x = ∫₀¹ [a(t(α)) dt₁(α)/dα + b(t(α)) dt₂(α)/dα] dα

Our goal is to generalize the concept to include not only cases where a(t)
and b(t) in (6.1) are multiparameter processes but also situations where the
processes involved are not sufficiently smooth to allow a representation such
as (6.1).
We begin with the observation (Whitney, 1957) that if we denote

X(γ) = ∫_γ x

then knowing X(γ) on a sufficiently large set of γ's completely specifies x.
In fact, it is enough to know X(γ) on horizontal and vertical lines, which
would allow a(t) and b(t), respectively, to be determined. Thus, a differen-
tial 1-form in n-space can be viewed as a function of lines parallel to the axes,
and 2-forms can be viewed as functions of rectangles, etc. This is the
approach we shall use in defining stochastic differential forms. For simplicity
of exposition we shall limit ourselves to differential forms on two-dimensional
spaces.
Consider the collection of all oriented horizontal and vertical line
segments σ. We give σ a positive orientation if it is increasing (in the
coordinate that is changing) and negative otherwise. Then, σ and −σ
represent the same set, which is denoted by |σ|. If σ is subdivided into two
subsegments σ' and σ'' having the same orientation as σ, then we write

σ = σ' + σ''

A stochastic differential 1-form X is a family of random variables parame-
trized by

𝒮 = {oriented horizontal and vertical line segments}


such that it is additive, i.e.,

X(σ' + σ'') = X(σ') + X(σ'')

To allow X to be extended to curves, we need some continuity conditions, to
be explained next.
Construct a linear space Γ₁ consisting of formal sums

Σ_{i=1}^m a_i σ_i

where the a_i are real constants and the σ_i are oriented line segments, with the
requirements that: (a) elements of Γ₁ equal under subdivision are not
distinguished, and (b) a(−σ) = (−a)σ. A differential 1-form X is easily
extended to Γ₁ by linearity, i.e.,

X(Σ_{i=1}^m a_i σ_i) = Σ_{i=1}^m a_i X(σ_i)

Elements of Γ₁ are called 1-chains.
Now, give each rectangle ρ in R² an orientation (say − for "in" and +
for "out"). The boundary ∂ρ of each oriented rectangle ρ is a 1-chain, and
we adopt a right-hand-screw convention so that the boundary of a positively
oriented rectangle is counterclockwise. Let Γ₂ denote the linear space
consisting of linear combinations Σ_i a_i ρ_i of oriented rectangles, with the
previously cited convention concerning equivalence under subdivision and
multiplication by −1. Elements of Γ₂ are called 2-chains. We are now ready
to define stochastic differential forms.
Let (Ω, 𝔄, P) be a fixed probability space and let ℛ denote the space of
all random variables on the space. We define stochastic k-forms on R² as
follows:

A 0-form is a function X: R² → ℛ
A 1-form is a linear function X: Γ₁ → ℛ
A 2-form is a linear function X: Γ₂ → ℛ

In addition, we shall assume the following conditions to be satisfied by
all stochastic differential forms:

(a) All 0-forms X(ω,t) are bimeasurable in (ω,t).
(b) All 1-forms satisfy two continuity conditions:

lim in p. X(σ) = 0 as ‖σ‖ → 0, where ‖σ‖ = length of σ    (6.2)

lim in p. X(∂ρ) = 0 as ‖ρ‖ → 0, where ‖ρ‖ = area of ρ    (6.3)

(c) All 2-forms satisfy

lim in p. X(ρ) = 0 as ‖ρ‖ → 0    (6.4)

With the continuity conditions (6.2)-(6.4), the 1- and 2-forms can now
be further extended. For example, a sequence of approximating 1-chains can
be constructed for any smooth curve γ by successively subdividing γ and
constructing a staircase approximation using the subdivision. If the subdivi-
sions are nested, then the difference between two staircase approximations is
the boundary of a 2-chain. Continuity (6.3) then allows a 1-form to be
extended to γ. Similarly, (6.4) allows a 2-form to be extended to a two-dimen-
sional set that can be approximated by 2-chains.
Before proceeding further, consider the following example. Let
{η(A), A ∈ 𝔅} be a Gaussian collection of random variables parametrized
by the collection 𝔅 of Borel sets in R² such that Eη(A) = 0 and

Eη(A)η(B) = area(A ∩ B)

The set-parameter process η will be called a Gaussian white noise. Now, for
any oriented rectangle, set

Z(ρ) = η(|ρ|)     if ρ has a + orientation
     = −η(|ρ|)    if ρ has a − orientation

Z can be extended to Γ₂ by linearity and, so extended, satisfies condition (6.4).
Thus, Z is a stochastic 2-form.
A 0-form W can also be defined in terms of η in a natural way. For
each t ∈ R², set

W(t) = ±η(A_t)

where A_t is the rectangle (unoriented) bounded by the two axes and t. The
sign is + if t is in the first or third quadrant and − otherwise.
A 1-form 𝒲 can be defined in terms of W as follows. Let ab denote an
oriented horizontal or vertical line segment from point a to point b, and set

𝒲(ab) = W(b) − W(a)

Then 𝒲 can be extended by linearity to Γ₁ and to smooth curves by
continuity. The above example suggests a close relationship among the three
forms Z, W, and 𝒲. To expose the precise relationship, however, requires the
introduction of some additional concepts.

Assign an orientation (+ or −) to each point t in R² and denote the
space of all linear combinations Σ_i a_i t_i by Γ₀. A 0-form can be extended to Γ₀
by linearity. Just as the boundary ∂ρ of a ρ in Γ₂ is in Γ₁, the boundary ∂σ
of a σ in Γ₁ is in Γ₀. We can now define the exterior derivative dX of an
r-form (r = 0,1) as an (r + 1)-form such that

dX(a) = X(∂a)    for all a ∈ Γ_{r+1}    (6.5)

It follows that for a 0-form Y,

dY(ab) = Y(b) − Y(a)

and, for a 1-form 𝒴 and a positively oriented rectangle ρ with vertices a, b,
c, d taken counterclockwise,

d𝒴(ρ) = 𝒴(ab) + 𝒴(bc) + 𝒴(cd) + 𝒴(da)

Observe that ∂∂ρ = 0 for all ρ ∈ Γ₂; hence ddY is always 0.
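
These definitions are easy to mimic on a grid (our illustration): represent a
0-form by values at grid points, a 1-form by values on horizontal and vertical
edges, and d by (6.5); the relation ddY = 0 is then just the telescoping of the
boundary sum around each cell.

import numpy as np

rng = np.random.default_rng(4)
N = 50
Y = rng.normal(size=(N + 1, N + 1))      # a 0-form: values at grid points

dY_h = Y[1:, :] - Y[:-1, :]              # dY(ab) = Y(b) - Y(a), horizontal edges
dY_v = Y[:, 1:] - Y[:, :-1]              # and on vertical edges

def d1(X_h, X_v):
    # d of a 1-form on each cell: the counterclockwise boundary sum
    # X(ab) + X(bc) + X(cd) + X(da); top and left edges traversed backwards
    return X_h[:, :-1] + X_v[1:, :] - X_h[:, 1:] - X_v[:-1, :]

print(np.abs(d1(dY_h, dY_v)).max())      # ddY = 0, exactly (up to roundoff)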
For our previous example, it is clear that

𝒲 = dW

but the relationship between Z and 𝒲 (or W) remains obscure. To expose
that relationship requires one more concept. Given a 1-form, say X, we can
express it in terms of a "coordinate system" similar to (6.1) as follows. Define
X_i (i = 1,2) as 1-forms such that

X₁(σ) = X(σ) and X₂(σ) = 0    if σ is horizontal
X₁(σ) = 0 and X₂(σ) = X(σ)    if σ is vertical

Then, we can write X = X₁ + X₂, and the representation is good everywhere
on Γ₁. For our example, we have

Z = d(𝒲₂)

To see this, consider a positively oriented rectangle ρ with lower-left corner
a, lower-right corner b, upper-right corner c, and upper-left corner d. Then,

∂ρ = ab + bc + cd + da

Now,

d(𝒲₂)(ρ) = 𝒲₂(∂ρ) = 𝒲₂(ab + bc + cd + da)
         = 𝒲(bc) + 𝒲(da)
         = W(c) − W(b) − W(d) + W(a)
         = η(|ρ|)
Note that the exterior derivative d is independent of any coordinate system.
Hence, the relationship between 𝒲 and W is coordinate independent.
However, the relationship between Z and 𝒲 (or W) depends explicitly on
the coordinate system.
The concept of "martingales" can be extended easily to differential
forms, and the results that follow illuminate the situation discussed in
Section 5. Let {𝔄_t, t ∈ R²} be a family of σ-algebras satisfying the following
conditions:

H1: t > s ⟹ 𝔄_t ⊇ 𝔄_s
H2: 𝔄_t = 𝔄_{t+} = ∩_{s₁>t₁, s₂>t₂} 𝔄_s

If σ is a point, a vertical or horizontal line segment, or a closed rectangle, denote
by t̲(σ) and t̄(σ) the minimum and maximum points of σ, respectively. An
r-form (r = 0,1,2) X is said to be adapted if X(σ) is 𝔄_{t̄(σ)} measurable.
Now, define 𝔄ⁱ_t, i = 1,2, as follows: For t = (t₁,t₂),

𝔄¹_t = ∨_{t₁} 𝔄_t = 𝔄_{(∞,t₂)}    and    𝔄²_t = ∨_{t₂} 𝔄_t = 𝔄_{(t₁,∞)}

A 1-form X is said to be a 1-martingale (2-martingale) if, for an oriented
horizontal (vertical) line segment σ,

E[X(σ) | 𝔄¹_{t̲(σ)}] = 0    (respectively, E[X(σ) | 𝔄²_{t̲(σ)}] = 0)    (6.6)

A 2-form can also be defined as an i-martingale (i = 1,2) in exactly the same
way. We merely need to take σ to be an oriented rectangle rather than a line
segment in (6.6). We say an r-form (r = 1,2) X is a martingale if it is an
i-martingale for both i = 1 and 2.
For 0-forms we retain the definition of martingale given earlier, namely (5.1).

Now, the question is: What is the relationship among the martingale r-forms
for different r? The following result, due in its original form to Cairoli and
Walsh (1975), relates 0-form martingales to 1-form martingales.

Proposition 6.1. Let {𝔄_t} satisfy the additional condition

H3: 𝔄¹_t and 𝔄²_t are conditionally independent given 𝔄_t

Let M be a 0-form such that M is a one-parameter martingale on each
of the two axes. Then, M is a martingale if and only if dM is a
martingale.
The relationship between martingale 1-forms and 2-forms is less inter-
esting.

Proposition 6.2. If a 1-form X is an i-martingale (i = 1,2), then dX_i is an
i-martingale.

Now, if X is a martingale 1-form, then X is a one-parameter martingale
on every horizontal line and every vertical line. If X is square-integrable,
then X has a quadratic variation on each horizontal or vertical line. Hence,
there exists a positive adapted 1-form ⟨X⟩ such that ⟨X⟩_i (i = 1,2) are the
horizontal and vertical quadratic variations. Furthermore, if Y is a 0-form,
then we can define a 1-form Y ∧ X by evaluating Y ∧ X on line segments σ
as follows:

(Y ∧ X)(σ) = ∫_σ Y_t d_{t₁} X_t    if σ is horizontal
           = ∫_σ Y_t d_{t₂} X_t    if σ is vertical

In either case the integral is well defined as an integral with respect to a
one-parameter martingale (cf. Chapter 6). Indeed, Y ∧ X remains well
defined if X is only a semimartingale on horizontal and vertical lines.
We can now proceed to state the differentiation formula for two-
parameter martingale 0-forms as follows:

Proposition 6.3. Let M be a square-integrable martingale 0-form. Let f be a
twice continuously differentiable function. Then

df(M) = f'(M) ∧ dM + ½ f''(M) ∧ ⟨dM⟩    (6.7)

where the prime denotes differentiation.

Consider the 0-form W defined earlier. It is easy to see that dW is a
martingale and

⟨dW⟩ = dμ(A)

where μ denotes the area (i.e., the Lebesgue measure).

The differentiation formula (6.7) can be generalized to Rⁿ without
change in form. It is also a global formula without dependence on any
coordinate system, and as such is far simpler than the differentiation
formulas that had been derived earlier for two-dimensional martingales.
The operation Y ∧ X (exterior product) can be defined for suitable p-
and r-forms and results in a (p + r)-form. The new cases for R² are p = 0,
r = 2 and p = 1 = r. If Y is a 0-form and X a martingale 2-form, then
Y ∧ X is basically an area stochastic integral such as was defined in Section 5. If
Y and X are both martingale 1-forms, then Y ∧ X is a martingale 2-form.
We shall not deal with the topic further except to consider some examples.
Consider the differential forms Z, W, and 𝒲 (= dW) introduced in an
earlier example. f(W) ∧ Z is a 2-form defined by

(f(W) ∧ Z)(ρ) = ∫_ρ f(W_t) dZ_t

where the integral is a type-1 integral introduced in Section 5. Now, 𝒲 ∧ 𝒲
≡ 0, but 𝒲₁ ∧ 𝒲₂ is given by a type-2 integral over the region determined by
Δ₁ = [t₁′, t₁] × [0, t₂] and Δ₂ = [0, t₁] × [t₂′, t₂]. If {𝔄_t} is generated
by the process W, then one can proceed to obtain representation theorems
not only for martingale 0-forms but also for martingale 1- and 2-forms. We
will not pursue the topic further here.
As a final topic, we shall present a brief discussion of the relationship
between differential forms and two-parameter Markov processes. Consider
the Wiener process W_t, t ∈ R²₊, defined in Section 5. One might think that if
any process is Markov it should be W_t. Surprisingly, W_t is not Markov! For
example, consider the triangular region S in R²₊ bounded by the line γ = {t:
t₁ + t₂ = 1}. For a point t outside of S, one can show that

E(W_t | W_{t'}, t' ∈ γ) ≠ E(W_t | W_{t'}, t' ∈ S)

demonstrating that W is not Markov. The situation is greatly illuminated by
the use of differential forms. Consider the two 1-forms d₁W and d₂W. It is
easy to show that each d_i W is a Markov process in the following sense: For
any curve ∂D that separates R²₊ into a bounded region D⁻ that includes the
origin and an unbounded region D⁺,

σ ⊂ D⁻ and σ⁺ ⊂ D⁺ ⟹ d_i W(σ) and d_i W(σ⁺) are independent given
{d_i W(σ'), σ' ⊂ ∂D}

To prove this, take a vertical segment σ⁺ in D⁺ and construct a set ρ by
projecting σ⁺ horizontally onto ∂D. Then, d(d₂W) = Z, so that

d(d₂W)(ρ) = Z(ρ) = η(|ρ|)

where η is the Gaussian white noise defined earlier. Since d(d₂W)(ρ) =
d₂W(∂ρ) and d₂W is zero on horizontal segments, we have

d₂W(σ⁺) = d₂W(σ₀) + η(|ρ|)

where σ₀ denotes the projection of σ⁺ on ∂D. The right-hand side is
independent of d₂W(σ) for any σ ⊂ D⁻, given the boundary data, and we have
proved d₂W is Markov. A similar proof works for d₁W. Hence, we conclude
that although W is not Markov, the components of dW are Markov.
As a second example of the relationship between differential forms and
two-parameter Markov processes, consider a pair of independent zero-mean
Gaussian generalized random fields ξ₁ and ξ₂, each with a covariance func-
tion given by (4.19) for n = 2 and λ₀ = 1, i.e.,

E ξ_i(t) ξ_i(t') = K₀(‖t − t'‖)    t, t' ∈ R²

The pair (ξ₁, ξ₂) defines a differential 1-form X via the relationship

X(σ) = ∫_σ ξ₁(t) dl_t    if σ is horizontal
     = ∫_σ ξ₂(t) dl_t    if σ is vertical

where dl_t denotes the path differential. Now, dX is a 2-form, and it can be
easily shown that

dX + α ∧ X = η    (6.8)

is a Gaussian white noise 2-form for any 1-form

α = α₁ dt₁ + α₂ dt₂

such that α₁ and α₂ are constants satisfying

α₁² + α₂² = 1

Equation (6.8) relates an isotropic and homogeneous Gauss-Markov field to
Gaussian white noise and can be used to prove the Markovian property. It
also strengthens our intuition that a Markov process with a parameter of any
dimension is "one derivative away" from white noise.
References

Beneš, V. E. (1981): Exact finite-dimensional filters for certain diffusions with nonlin-
ear drift, Stochastics, 5:65-92.
Birkhoff, G. and S. MacLane (1953): "A Survey of Modern Algebra," Macmillan, New
York.
Breiman, L. (1968): "Probability," Addison-Wesley, Reading, Mass.
Bremaud, P. (1981): "Point Processes and Queues, Martingale Dynamics," Springer-
Verlag, New York.
Bucy, R. S. and P. D. Joseph (1968): "Filtering for Stochastic Processes with
Applications to Guidance," Interscience, Wiley, New York.
Cairoli, R. and J. B. Walsh (1975): Stochastic integrals in the plane, Acta Math.
134:111-183.
Cramer, H. (1966): On stochastic processes whose trajectories have no discontinuities
of the second kind, Ann. di Matematica (iv), 71:85-92.
Davenport, W. B., Jr. and W. L. Root (1958): "An Introduction to the Theory of
Random Signals and Noise," McGraw-Hill, New York.
Davis, M. H. A. (1980): On a multiplicative functional transformation arising in
nonlinear filtering theory, Z. Wahrscheinlichkeitstheorie verw. Geb., 54: 125-139.
Dellacherie, C. and P. A. Meyer (1978): "Probabilities and Potential," North-Holland,
New York.
Dellacherie, C. and P. A. Meyer (1982): "Probabilities and Potential B, Theory of
Martingales," North-Holland, New York.
Doléans, C. (= Doléans-Dade, C.) (1967): Processus croissants naturels et processus
très-bien-mesurables, C.R. Acad. Sci. Paris, 264:874-876.

Doléans, C. (1968): Existence du processus croissant naturel associé à un potentiel de
la classe (D), Z. Wahrscheinlichkeitstheorie verw. Geb., 9:309-314.
Doléans, C. (1970): Quelques applications de la formule de changement de variables
pour les semimartingales, Z. Wahrscheinlichkeitstheorie verw. Geb., 16:181-194.
Doob, J. L. (1953): "Stochastic Processes," Wiley, New York.
Doob, J. L. (1984): "Classical Potential Theory and Its Probabilistic Counterpart,"
Springer-Verlag, New York.
Duncan, T. E. (1968): Evaluation of likelihood functions, Information and Control,
13:62-74.
Duncan, T. E. (1970): On the absolute continuity of measures, Annals Math. Stat.,
41:30-38.
Dynkin, E. B. (1965): "Markov Processes," (2 vols.), Academic, New York, Springer-
Verlag, Berlin.
Erdelyi, A. (1953): "Higher Transcendental Functions," vol. II, Bateman Manuscript
Proj., McGraw-Hill, New York.
Fujisaki, M., G. Kallianpur and H. Kunita (1972): Stochastic differential equations for
the non-linear filtering problem, Osaka J. Math. 9:19-40.
Gangolli, R. (1967): Abstract harmonic analysis and Levy's Brownian motion of several
parameters, Proc. 5th Berkeley Symp. Math. Stat. and Prob., 11-1:13-30.
Girsanov, I. V. (1960): On transforming a certain class of stochastic processes by
absolutely continuous substitution of measures, Theory of Prob. and Appl.,
5:285-301.
Hajek, B. (1979): "Stochastic Integration, Markov Property and Measure Transfor-
mation of Random Fields," Ph.D. dissertation, University of California, Berkeley.
Halmos, P. R. (1950): "Measure Theory," Van Nostrand, Princeton, N.J.
Hazewinkel, M. and J. C. Willems (Eds.) (1981): "The mathematics of filtering and
identification, Proceedings of the NATO Advanced Study Institute which took
place in June-July 1980 at Les Arcs, France," Reidel, Boston.
Ito, K. (1944): Stochastic integrals, Proc. Imp. Acad. Tokyo, 20:519-524.
Ito, K. (1951a): On stochastic differential equations, Mem. Amer. Math. Soc.,
4:1-51.
Ito, K. (1951b): On a formula concerning stochastic differentials, Nagoya Math. J.,
3:55-65.
Ito, K. (1951c): Multiple Wiener integral, J. Math. Soc. Japan, 3:157-169.
Jacod, J. (1979): "Calcul Stochastique et Problemes de Martingales," Springer-Verlag,
New York.
Jazwinski, A. H. (1970): "Stochastic Processes and Filtering Theory," Academic, New
York.
Kac, M. (1951): On some connections between probability theory and differential and
integral equations, Proc. 2nd Berkeley Symp. on Math. Stat. and Prob., 189-215.
Kailath, T. (1969): A general likelihood-ratio formula for random signals in Gaussian
noise, IEEE Trans. Inf. Th., IT-15:350-361.
Kakutani, S. (1948): On equivalence of infinite product measures, Ann. Math.,
49:214-224.

Kalman, R. E. and R. S. Bucy (1961): New results in linear filtering and prediction
theory, Trans. Am. Soc. Mech. Engn. Series D, J. Basic Eng., 83:95-108.
Karhunen, K. (1947): Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann.
Acad. Sci. Fenn., 37.
Kohlmann, M. and W. Vogel (Eds.) (1979): "Stochastic control theory and stochastic
differential systems, Proceedings of a Workshop of the 'Sonderforschungsbereich 72
der Deutschen Forschungsgemeinschaft an der Universitat Bonn,' which took place
in January 1979 at Bad Honnef," Lecture Notes in Control and Information
Sciences, 16, Springer-Verlag, New York.
Kolmogorov, A. N. (1931): Uber die analytische methoden in der wahrscheinlichkeits-
rechnung, Math. Ann., 104:415-458.
Kunita, H. and S. Watanabe (1967): On square integrable martingales, Nagoya Math.
J.,30:209-245.
Kushner, H. J. (1967): Dynamical equations for optimal nonlinear filtering, J. Diff.
Equat., 3:179-190.
Levy, P. (1956): A special problem of Brownian motion, and a general theory of
Gaussian random function, Proc. 3rd Berkeley Symp. Math. Stat. and Prob.,
2:133-175.
Liptser, R. S. and A. N. Shiryayev (1977): "Statistics of Random Processes, I and II,"
Springer-Verlag, New York.
Loeve, M. (1963): "Probability Theory," 3d ed., Van Nostrand, Princeton, N.J.
McKean, H. P., Jr. (1960): The Bessel motion and a singular integral equation, Mem.
Coli. Sci. Univ. Kyota, Series A, 33:317-322.
McKean, H. P., Jr. (1963): Brownian motion with a several dimensional time, Theory
of Prob. and Appl., 8:335-365.
McKean, H. P., Jr. (1969): "Stochastic Integrals," Academic, New York.
McShane, E. J. (1969): Toward a stochastic calculus, II, Proc. National Academy of
Sciences, 63:1084-1087.
McShane, E. J. (1970): Stochastic differential equations and models of random
processes, Proc. 6th Berkeley Symp. Math. Stat. and Prob., 3:263-294.
Meyer, P. A. (1966): "Probability and Potentials," Blaisdell, Waltham, Mass.
Mortensen, R. E. (1966): "Optimal Control of Continuous Time Stochastic Systems,"
Ph.D. dissertation, Dept. of Electrical Engineering, University of California, Berke-
ley.
Neveu, J. (1965): "Mathematical Foundations of the Calculus of Probability," Arniel
Feinstein (trans.), Holden-Day, San Francisco.
Paley, R. E. A. C. and N. Wiener (1934): "Fourier Transforms in the Complex
Domain," Amer. Math. Soc. Coll. Pub., Am. Math. Soc., 19.
Prokhorov, Yu V. (1956): Convergence of random processes and limit theorems in
probability theory, Theory of Prob. and Appl., 1:157-214.
Rao, K. M. (1969): On decomposition theorems of Meyer, Math. Scand., 24:66-78.
Riesz, F. and B. Sz.-Nagy (1955): "Functional analysis," Ungar, New York.
Root, W. L. (1962): Singular Gaussian measures in detection theory, Proc. Symp.
314 REFERENCES

Time Series Analysis, Brown University, 1962, Wiley, New York, 1963, pp. 292-316.
Rudin, Walter (1966): "Real and Complex Analysis," McGraw-Hill, New York.
Skorokhod, A. V. (1965): "Studies in the Theory of Random Processes," (trans. from
Russian), Addison-Wesley, Reading, Mass.
Slepian, D. (1958): Some comments on the detection of Gaussian signals in Gaussian
noise, IRE Trans. Inf. Th., 4:65-68.
Sosulin, Yu and R. L. Stratonovich (1965): Optimum detection of a diffusion process
in white noise, Radio Engrg. Electron. Phys., 10:704-714.
Stratonovich, R. L. (1960): Conditional Markov processes, Theory Prob. Appl.,
5:156-178.
Stratonovich, R. L. (1966): A new form of representation of stochastic integrals and
equations, SIAM J. Control, 4:362-371.
Taylor, A. E. (1961): "Introduction to Functional Analysis," Wiley, New York.
Thomasian, A. J. (1969): "The Structure of Probability Theory with Applications,"
McGraw-Hill, New York.
Van Schuppen, J. H. and E. Wong (1974): Transformation of local martingales under
a change of law, Annals of Prob., 2:878-888.
Whitney, H. (1957): "Geometric Integration Theory", Princeton University Press,
Princeton, N.J.
Whittle, P. (1963): Stochastic processes in several dimensions, Bull. Inst. Int. Statist.,
40:974-994.
Wiener, N. (1949): "Extrapolation, Interpolation, and Smoothing of Stationary Time
Series," Wiley, New York.
Wiener, N. and P. Masani (1958): The prediction theory of multivariate stochastic
processes-II, the linear predictor, Acta Mathematica, 99:93-137.
Wong, E. (1964): The construction of a class of stationary Markoff processes, Proc.
Symp. in Appl. Math., Am. Math. Soc., 16:264-276.
Wong, E. (1969): Homogeneous Gauss-Markov random fields, Ann. Math. Stat.,
40:1625-1634.
Wong, E. and J. B. Thomas (1961): On the multidimensional prediction and filtering
problem and the factorization of spectral matrices, J. Franklin Institute, 272:87-99.
Wong, E. and M. Zakai (1965a): On the relationship between ordinary and stochastic
differential equations, Int. J. Engng. Sci., 3:213-229.
Wong, E. and M. Zakai (1965b): On the convergence of ordinary integrals to stochas-
tic integrals, Ann. Math. Stat., 36:1560-1564.
Wong, E. and M. Zakai (1966): On the relationship between ordinary and stochastic
differential equations and applications to stochastic problems in control theory,
Proc. 3rd IFAC Congress, paper 3B.
Wong, E. and M. Zakai (1969): Riemann-Stieltjes approximations of stochastic in-
tegrals, Z. Wahrscheinlichkeitstheorie verw. Geb., 12:87-97.
Wong, E. and M. Zakai (1974): Martingales and stochastic integrals for processes with
a multi-dimensional parameter, Z. Wahrscheinlichkeitstheorie verw. Geb., 29:
109-122.
REFERENCES 315

Wonham, W. M. (1964): Some applications of stochastic differential equations to


optimal nonlinear filtering, Tech. Rept. 64-3, Feb. 1964, RIAS, Baltimore.
Wonham, W. M. (1970): Random differential equations in control theory, in "Prob-
abilistic Methods in Applied Math," vol. 2, Academic, New York.
Yaglom, A. M. (1961): Second-order homogeneous random fields, Proc. 4th Berkeley
Symp. Math. Stat. and Prob., 2:593-620.
Yaglom, A. M. (1962): "An Introduction to the Theory of Stationary Random
Functions," R. A. Silverman (trans.), Prentice-Hall, Englewood Cliffs, N.J.
Yor, M. (1976): Representation de martingale de carre integrable relative aux processus
de Wiener et de Poisson a n parameters, Z. Wahrscheinlichkeitstheorie verw. Geb.,
35:121-129.
Youla, D. C. (1961): On the factorization of rational matrices, IRE Trans. Inf. Th.,
IT-7:172-189.
Zakai, M. (1969): On the optimal filtering of diffusion processes, Z. Wahrschein-
lichkeitstheorie verw. Geb., 11:230-243.
Solutions to Exercises

CHAPTER 1
1. (a) We need only note that [O,a) (\ [0, a + 1) = [a, a + 1) e Ct.
m m+n
(b) Let A = U [a.,b.) and B = U [a.,b i )
;=1 .=",+1

Then
m+n m m+n
A VB = U [a.,b;) and A (\ B = U U [a;,bi) (\ [aj,b j )
i=1 i=1 j=m+l

But [a;,b i ) ( \ [aj,b j ) is either empty or of the form [min (ai,b j ), max (bi,b j ». Hence,
A (\ B is again a finite union of intervals of the form [a,b).
(c) Let C be any Boolean algebra containing Ct. Because [a,b) = [O,b) (\ [O,a),
C must contain all sets of the form [a,b) and, hence, all finite unions of such sets. Hence,
every Boolean algebra containing C) must also contain C 2• Because C 2 is a Boolean

n
algebra, it must also be the smallest.

(d) [a,b] ""


n=1
[a, b + .!.)
n

(a,b) 0 [a + .!., b)
10=1 n

(a,b] = n(a,b +.!.)


n=1 n
316
CHAPTER 1 317

z. (a) See Exercise l.

(b) If <P is <T additive, then it is also sequentially continuous so that from Solution
l.Id, we have

Cl'([a,bJ) = <P Cl~rr: [a, b + ~))


nl~m~ Cl' ([ a, b +~))
= lim
n-+ QO
P(b + ..!.)
n
- pea)
= P(b+) - pea)

Cl'«a,b» = !~m~ {P(b) - P (a +~)}


= P(b) - P(a+)
Cl'«a,bJ) = P(b+) - P(a+)

3. (a) Setj-I(A) = {x,j(x) EA}. If A Effi, thenr'(A) Effi n , becausejis a Borel


function. Hence,
P(f-'(A» = <P({w: X(w) Ej-I(A)}
= <P({w:j(X(w» EA})

(b) We can assume that j is nonnegative, becau~e othcrwise we can write


j = j+
- j- whcre rand j- arc both nonnegatiyc. If, in addition, j is also simple, i.e .•
there exist disjoint Borel sets .cL, . . . , A m such that

and
j(x) = jk X EAk,
k = I, . . . , m,

then, by definition, we have

I
Tn

/RJ(X)P(dX) = jkP(A k)
k=l

If we set X-leA) = {w: X(w) E A l. then we can write

f
m

( f(x)P(dx) = '\' j(X(w»Cl'(dw)


JR" ~ X-l(A.)

In
k=l

= j(X(w»<P(dw)

If j is not a simple function, then by definition

r f(x)P(dx)
JRn
= lim
m-+oo
r
]Rn
jm(x)P(dx)
318 SOLUTIONS TO EXERCISES

when If... } is a nondecreasing sequence of simple functions converging pointwise to f.


We now have

( f(x)P(dx) = lim ( f ...(x)P(dx)


JRn. m-HO JR"
= lim ( f ...(X(w»!J'(dw)
In
In
m-+CIO

= f(X(w»!J'(dw)

where the last equality follows from monotone convergence.

6. Let Xl = Y cos 8, X 2 = Y sin 8 cos <1>, and Xa = Y sin 0 sin <1>. If we denote the
joint density function of Y, 0, and <I> by p, then

s~nO0 sin <p I


cos 0 -y sin 0
p(y,8,<p) = Ism
s~n 0 C?S <p
0 sm <p
y cos 0 cos <p
y cos 0 sin <p
- y
y sm Oeos <p

PX(y cos 0, Y sin 0 cos <p, y sin 0 sin <p) = y2(2'11")1


sin 0 -!u'
e
Therefore,

py(y) = {2,.. d<p (T do_l_y2sinoe-11l2


10 10 (2'11")!
y;:::O

L Xi> we have X,
k
7. Since Y k = Y,and
j=1

Yk - Yk - l = Xk k = 2, . . . ,n
Therefore,
1 o 0 o
-1 1 0
PY(YI, .•• , Yn) = 0 -1 PX(YI, Y2 - YI, • . . , Yn - Yn-I)

o ...... -1

_1_ exp [ -
(2'11")"12
~ ~
2 ~
(Yk - Yk-I)2]
k=l
where Yo == O.

•. E-L!L
1 + IXI
= I
JIXI?.1
~d!J'+ I
+ IXI JIXI<. 1
~d!J'
+ IXI
This implies that
IXI
> --!J'(IXI > E)
E
E--
1 IXI + - 1 + E -
CHAPTER 2 319

and

14. (a) Because Y is independent of X 2, we have


E(YIX 2 ) = EY = EXI + aEX 2 = 0
On the other hand

so that

15. Let Xl = Y cos <1>. The joint density function of Y and <I> is given by

p(y,,,,) = Isin
cos
. '"
-ysin", 11
_ e-} (I/! cos! ~+1I2 sin! rp)

'" y COS'" 2,..


= -1 ye-lut
2,..
In other words, Y and <I> are independent, with <I> uniformly distributed on [0,2... ).
Therefore,
E(XIIY) = E(Ycos<l>IY) = YEcos<l>

= Y -1
2,..
!u0
2r
cos'" d", = 0

16. By definition ax contains every set of the form {w: Xi(W) E A}, A E (JlI, i 1,
. , n. It follows that if A I, • • • , A n are one-dimensional Borel sets then

n
n n
X-I (n Ai) = {w: Xi(W) E Ad E ax
i=l t=1

Since (Jln is the smallest <T algebra containing all n products of one-dimensional Borel
sets, it follows that for every b E (Jln,
X-I(B) E ax so that ax ::J {X-I(B), B E (Jln I
Conversely, consider the collection {X-I(B), BE (}tnl. It is a <T algebra, and
every Xi is clearly measurable with respect to {X-I(B), B E (Jln I. Hence, ax C
{X-I(B), B E (Jln}, and our assertion is proved.

CHAPTER 2
1. For any real number a,
Cl'(IX, - X.I > 0) ~ Cl'(X, > a +
0, X. < a - 0)
= [1 - P,(a o)JP.(a - 0)+
---> [1 - P,(a
.-.t
o)]P,(a - 0) +
320 SOLUTIONS TO EXERCISES

and continuity in probability means that


[1 - P,(a+»)P,(a) = °
for all t and a. Therefore, at all continuity points P, is either or 1. It follows that P,
is a function with a single jump of size 1, say, at f(t), and that for each t, X,(w) = f(t)
°
with probability 1. Because the X process is continuous in probability, the function
f must be continuous.

2. (a) <l'(/w:X,(w) = °for at least onetin Tnl = <l'(U /w:X(w) = -tl)

°
lET"

= L <l'(X = -t) =
lET"

(b) <l'(X +t = °for at least one t in [0,1)) = <l'(X E [-1,0)) = f~ 1 J2,.. e- z '
dx > °
3. P,(x) = <l'( /w: X,(w) < xl) = <l'(/w: tw < xl)
= Lebesgue measure of [0, i) n [0,1)

= min (1, i)
P,.,(XI,X2) = 0' ({ w: w < ¥, w < ~})

= Lebesgue measure of [ 0, ¥) n [ 0, ~) n [0,1)

= min (1, ~I,~}


4. (a) p.(t) = EX, = 101 wt dw = ~t
R(t,s) = 10 1
(wt - ~t)(ws - ~s) dw = tslol (w - "~-)2d",
_0.1 _.it!
- 3 "4 - 12

(b) We note that


X, = Z cos 8 cos 2,..t - Z sin 8 sin 2,..t
=A cos 2,..t - B sin 2,..t
where A = Z cos 8 and B = Z sin 8. If we denote the joint density function of A and
B by p, then
-zsinOI =-ze
P( zcosO, Z sinO)1 ~::
1 _1 z 2
2
zcos (J 2"."

or
1
p(z cos 8, z sin 8) = - e-1••
2,..
CHAPTER 2 321

and

p(a,b)

It now follows that every linear combination


n n n

;=1
L (XiX" = (.L (Xi cos 27rti) A - (L
;=1 ;=1
(Xi sin 27rti) B

is a Gaussian random variable. By definition, {X" - 00 <t < oo} IS a Gausshn


process.

8. Let ffixo denote the smallest algebra (not <T algebra) such that for every T ::; s, all
sets of the form {w: XT(W) < a} are in ffi". It is clear that ax. is generated by ffiu.
Now, every set A in ffi x • depends on only a finite collection X," X,,, . . . ,X,,,, ti ::; s.
Therefore, for every A E ffix.
EIAX, = E{E[IAX,IX," X,,, . . . ,X, .. X.])
Writing X, = X,+ - X,-, where both X,+ and X,- are nonnegative, we have
EIAX,+ = EIAX,+ A E ffi x •
EIAX,- = EIAX.- A E ffixo
Each of these four terms defines a finite measure on ffi .. which has a unique extension
to a... It follows that

for otherwise, we would have two different extensions of the same measure. Similarly,
EIAX,- = EIAX.-
and

Since X. is a" measurable, we have


EI1"X, = X.
with probability 1.

11. Suppose that X, = f(t)Wo(t)lfCt), then

EX,X. = f(t)f(o) 111m


. (g(t)
-,-
ges»)
f(t) f(s)
Because g(t)/f(l) i~ uuudccreasing, we have

EX,X. = f(t)f(s) g(lIl~n (t,s»


f(mm (t,s»
= f(max (t,s»g(min (t,s»
= e-[t-.I = e-[mux (t,.)-min (t.,))

It follows that f(t)g(t) = 1 so that get) 1//(1) and


f(max (t,s»
f(max (t,s»g(min (t,s»
f(min (t,s»
322 SOLUTIONS TO EXERCISES

Hence, we can take !(t) = ke-I where k is any nonzero constant. Thus,
X, = ke-IWI(l/I:).~'
12. (a) Since IX" - co < t < co I is Markov, the Chapman-Kolmogorov equation
yields
n
<P(X,.... = xilXo = Xj) L <P(X,+. = xilX. = Xk)<P(X. = xklXo = Xj)
k-l
or, equivalently,
n
Pij(t + 8) = L Pik(t)Pk;(8)
.1:-1
t,8 >0

In matrix form, this can be rewritten as


p(t + 8) = p(t)p(8)

so that

lim ! [p(t + 8) - p(8») = lim ! [p(t) - l)p(8) = Ap(8)


t!ot t!ot
Hence,

p(8) = Ap(8) 8> 0

the unique solution of which corresponding to p(O) = I is p(8) = e8A •


n
(b) L <P(XI+T = Xi!X, = Xj)<P(X, = Xj) = <P(X'+T = Xi)
i=1

L <P(X'+T = Xi!X, = Xj) = 1


n

i-I

Hence, p(r)q = q and pT(r)1 = 1.

If q = [t]. then we have p(r)l = I and pT(r)1 = I, and p(r) must have the form
p(r) = L~1(r) 1 - !(r)]
!(r)

and p(O) = j(O) [ _ ~ - ~] = A. Because every entry in p( r) is nonnegative, !(O)

must be less than or equal to zero. Setting j(O) = - X, we have .from part (a)
CHAPTER 2 323

14. Because {X" - co < t < co) is stationary, its covariance function depends only
on the time difference. Set

E[(X, - EX,)(X. - EX.»)


p(t - 8) = -'-'---':..--..-----''--'---=----~
E(X, - EX,)'

then from Solution 2.13, p must satisfy

p(t + 8) = p(t)p(s) t, 8 2: 0

It follows that for any positive integers m,n, we have

so that p ( ; ) = pm/n(l) = e(m/n)lnP(l). By continuity we must have

p( t) = e nn P(I) t 2: 0
and by symmetry

pet) = eltllnp(l)
= e- Xlti

where we have set ->. = In pel).

15. (a) Consider the characteristic function

E exp (i k~l UkX'HT) = E {2171" 10 2


". exp [iA k~l Uk cos (271"tk + 271"T + 0) ] dO}
=
I
E { 2; 12",
(2'1f"(r+l) f
exp iA ~ Uk cos (271"tk + 1/t) #
[ ] }

I
k-l

= E {L 10 2
". exp [iA
k=l
Uk cos (271"tk + 1/t) ] #}
(i I
n

=E exp UkX,.)
k=l

(b) EX, = EA (L 10 21f" cos (21l"t + 0) dO) = 0

M T(W) "" 2IT ! -


TT X,(w) dt = A(w) .2.
2T
! -
T
T
cos [21l"t + 9(w») dt _ _0
T-+OQ
324 SOLUTIONS TO EXERCrSES

(c) Let Y =A cos 0 and Z = A sin 8, then

X. = Y cos 2rt - Z sin 2rt


and Y = X 0, Z = - X 10 so that {X .. t E ( - co, co)} is a Gaussian process if and only
if Y and Z are jointly Gaussian. Since

EYZ = EA2~ (2,.. sin o cos OdO =0


27r }o
EY2 = EZ2 = tEA 2 = 0'2

Y and Z are jointly Gaussian if and only if

pyz(y,z) = _1_ exp [_


27rO' 2
~ (y2 + Z2)]
20'2

Hence, by the transformation rule for random variables (see, e.g., Exercise 1.5), we
have

= . 0) Icos.
pyz(r cos 0, r SID
SIn
0
0
-rsin 0
r COS 0
I
r (r2)
= 27ru exp - 20'2
2

and

PA(r) r
= - exp
0"2
(1 r2)
- - -
2 u2
r~O

{X. - co <t < co} is not :Markov because for t ~ }

E(Xt!X o = y, Xi = z) = y cos 27rt - z sin 27rt

which depends on y, contrary to the l\Iarkov property.

16. (a) Since X" Y, are independent and Markov

E(Z.2IX"X2,X3'y"Y 2,Y3) = E(X.2IX3) + E(l".2IY 3)


= E[(X 4 - X3)2 + 2X3(X. - X 3) +X3 21X3J
+ E[(Y, - Y3)2 + 2Y 3(Y. - l"3) + r 32 1Y 3 ]
= 1 + X3 2 + 1 + Y 3 2
Therefore, E(Z.210bserved data) = 7.
Now, by the Schwarz inequality

E(Z.ldata) ~ V E(Z.2Idata)
=0
On the other hand, the Schwarz inequality applied to summations yields
CHAPTER 2 325

so that
W+1I2 > Ixxo+ yyol > (xxo + YYo)
y - VXoz + yo2 - VX02 + Y02
= v'xoz + Y02 + _/ (x
~
- xo) + _/ ~
(y - Yo)
V X02 + Yo' V X02 + Yo'
It follows that

v'X;-;- + Y.2
=

Hence, E(Z.ldata) 2: VI + (2)2 = VS.


(b) Introduce {8 t , I 2: O} so that
x, = Z, cos e, Y t = Z, s n e,

Erf( Z,J IZtp Z,,, . Z,,,_,i = E\E[f(Z,JIZtj,e'i,j = 1, . . . , n - 11


= E{E[.f(Z,JIZ",_"e,,,_,J!z,,, , Z,,,_, I
Now,

and

E[f(Z,JIZ'n_, = ro, 8',,_1 = 001 = E[f(VX, .. 2 + Y.! I


X,,>-) = ro cos 00, Y tn _1 = ro sin 00 1

= to tOO
dr
0
2" rJ(r)
dO -----'-'--'--
27r(t n - tn_I)

exp { - .
2(tn -
1
t,,_,)
[r' + r02 - 2rro cos (0 - 00 ) I}

t '" t
Because cos 0 is periodic with period 27r, a change in variable yields

E[.f(Z,,,J IZtH = ro, 0, -I = 00 1 = dr 2"


dO' _~r--,-J(,-r,-)~
o 0 '27r(t" - In_I!

exp [ - ;--.. _1_- (r 2


2(t" - tn-I)
+ r02 - 2rro cos 0') J
which is independent of 00 . It follows that

and

so that IZ" I 2: O} is Markov (see Proposition 4.6.4 for an easy means of proving
this fact).
326 SOLUTIONS TO EXERCISES

CHAPTER 3
1. (a) Compute

B(II) = f 00 R(T) dT == f 1 (1 - JTi)e-· tr .,. dT

!o
-00 -1

=2 1
(1 - T) cos 211'JIT dT

== -- 11 . 2
SIn
2(1 - cos 211'11)
211'JIT dT == - - - - - -
211'11 0 (211'1')2

== (Si::vy ~ 0

It follows from one-half of Bochner's theorem that R is nonnegative definite.

(b) Repeat the same procedure as in (a),

(c) If R(t,s) == e ll - al then R(t,t) = 1, and it violates (1.8), viz.

JR(t,s) J ::;; V R(t,t)R(s,s)


(d) R is continuous at all diagonal points (t,t), but not continuous everywhere, so
it cannot be nonnegative definite.

3. (a) Since R is continuous and periode, we can write it in a Fourier series as

R(T) = L'" R nein (2 7r IT)1


n=-oo

where Rn ~ O. It follows that

ff
T
EZmZn = R(t - s)e- i (2 7r IT)(ml-n&) dt ds
o
2:
00

RnOmkOnk = Rnomn
k=-oo

2:
N N
(b) E [ X t - Znein(27rIT)t [2 = R(O) ~
L.. EJZnJ2
n=-N lI=-N
N
= R(O)
n= -N
2: R,. ----->
,N-Jooo
0

(c) We can write


00

~ R n ein (27r IT)T


'-I
n=-oo

where Rn = 1/2(1 + n2), n ;;e 0 and Ro = 1. Since the family {e in (2"IT)I, 0 ::;; t ::;; T,
n = 0, ±1, . . . \ is orthogonal, we can clearly take 'l'n(t) = (I/VT)e in (2 7r IT)l to be
the orthonormal eigenfunction. The eigenvalues are An = RnT.
CHAPTER 3 327

4. First, we write

A",(t) ... lot (1 - t + S)",(S) ds + Ie! (1 - s + t)",(s) ds


Differentiating it once, we get

A",'(t) =- lot ",(s) ds + Ie! ",(s) ds


Differentiating once more, we find
A","(t) = -2",(t)

as was suggested by the hint. The second of the above equations yields the boundary
condition -",'(0) = ",'(t). The first and second equations yield ",(0) + ",(t) = j",'(O).
From the equation A","(t) = -2",(t), we get

"'( t) = A cos ~ t + B sin ~~ t


Applying the condition - ",' (0) = ",' (t) yields

",(t) = C cos ~~ (t - i)

Applying the second boundary condition yields the transcendental equation

_ F- cos l V2/A _~
t V 2/A = = cot l V 2/A
sini
4
vI27X 4

which is to be solved for the eigenvalues A. Finally, for normalization we choose C = 2


so that

",(t) = 2 cos ~ (t - 'V

6. Since the W process is a Brownian motion, we can write

WT(t) = L
'"
n=O
vx: CPn(r(t»Zn o~ t ~ T

where An and CPn are given (4.32) and (4.33), respectively, and {Znl are Gaussian and
orthonormal. Hence, the desired expansion is obtained by setting

"n(t) = !(t) vx: "'n(t)


Suppose that we define T-1(t) = min Is: T(S) = tl. Then,

Since a Brownian motion is q.m. continuous, the Hilbert space Xw generated by


{WI, 0 ~ t ~ T(T) I is spanned by {WI, t E S} where S is any dense subset of [O,T(T»).
It follows that every Zn E Xx if and only if there exists a dense subset S of [O,T( T»)
such that
for every t E S
328 SOLUTIONS TO EXERCISES

7. R(T) = te-IT! (3 cos + sin IrD


T

S(lI) = I"-.. e-'I1!"""R(r) dr ... 2 ( .. cos 211"lITR(r) dr


Jo
= 41- Ic"
0
e-t' cos 211"lIT(3 cos r + sin r) dr

=!8 Jor" e-T [3 cos (1 + 211"/I)r + 3 cos (1


+ sin (1 + 211"/I)r
- 211"lIT)

+ sin (1 - 211"/I)r) dr
1[3 3
1 + 211"/1 1 - 211"/1
= 8 1 + (1 + 211"/1)2 + 1+0- 211"/1)2 + 1 + (1 + 211"11)2 + 1 + (1 - 211" v)
]
2

1 16 + 4(211"v)2 = ! 4 + (211"/1)2
== 8 4 + (271"/1)4 2 4 + (271"v)4

•. Let pet - s) = EY,R,. Thcn if }", = j _"'", e


i2 •rvI H(/I) d.t.
pet - 8) = 1-"", pi2n (l-')H(v)S(v) d/l

Therefore,

H(/I) = _1_
S( v)
I'"
-",
e- i2 .. >T p (r) dr

1 1
= S(v) 4 + (211'"v)4

The covariance funct on of thl' V prO(,l'8S is given by

E(· i271"lnX,)(e i2"lI',X .. ) = e'~·11'(I-')EX,X.


= e d 7l"11'('-')R,(t - 8)

cos 211'"lVtX, is not wide-sense 8tationary.

1 . (a) More generally, Y , = j _"'", H(v)ed''''1 dX p is real valued i

H(v) = H(-v)

To prove this, first assume HE L2( - 00, 00), then H is the fourier transform of a
CHAPTER 3 329

real-valued function hand

Y, = f-"'", h(t - 8)X. ds


If H fi. £2( - 00,00) we truncate H(,,) to " ±n. It follows then that Y, is the q.m.
limit of a sequence of real-valued processes.
(b) Since EX,X. = e-r,-.r,
2
EldX.12
+ (2",,)2 d"
A

=
1
It follows that

EX,X. = EX,i. = f'"- '" I-i sgn 1112e i2 ... (,-.)


1
2
+ (2n-v)2
d"
= e-I'-.I

EX,X. = EX,X. = f'"- '" (-i sgn II)e i2 11"V('-.)


1 + (2,,")2
2 d"

= 4 fro
oo sin2"I'(t - s)
1 + (2n-v)2 dl'

(c) Z, = f-OOoo (cos 2"" ot - i sgn I' sin 2"I' ot)e i2"'" dX.
= f-OOoo e-' sgn '(2"0)<+i2"" dX.

14. (a) Check the three cases min (t,s) > 0, max (t,s) < 0, and max (t,s) > 0 >
min (t,s).

= n2(Z'+lln - Z,)(Z.+lln - Z.)

= ~ (I t + ~ / + /s + ~ /- It - sl + It I + lsi - It - sl - It I

- 18
I
+ -I
1 I + II t
n
- 8 -
1/
-
n
-
It + -1 I -
I n
lsi + Ii.t + -1 - 8 II')
It

=~' (I t - 8 - ~ + It - + ~ I -
/ 8 21t - sl)
I
o ~ It - s. ~
II

elsewhere

= l~ n(l
f-lin .
- nT)e-· 2 >rVT dT = 2n frl~ (1 - nT) cos 2"I'T dT
0
2
= -2n21c lin sin 2"I'T dT = -2n- ( 1 - cos -:;-
2,,1')
2,,1' 0 (2"1')2

=~ sin 2 "I' = (Sin "I'ln) 2


(2,,1') 2 n "I'ln

(e) E [f-OOoo r",f(t) dtJ [f-OOoo r n,g(t) dtJ = ff !(t)ii(s)R,(n)(t - 8) ds

= f oo j
-00
(1')6(,,)
(Sin "I'ln) 2
---
1rv/n
dll
-oo
(dominated convergence)
n~oo
)
f oo
-00
j (,,){i(I')
- d"

= f-OOoo 1(t)0(t) dt
330 SOLUTIONS TO EXERCISES

15. Set Zn' = lot In. ds. Then, we have


Z .., = n lot (Z.+lIn - Z,) ds
= n(h~:l/n z, ds - lot Z. dS)
= n (!et+l/n Z.ds - Io l/n Z.ds)
Therefl)re,

I{t+l/n (Z, - + JO[lIn z. ds I


n
EIZ, - Znel 2 = n 2E it Z.) ds

~ 2n 2 [ E If+ l/n (Zt - Z.) ds 12 + E llol/n Z, ds

~ 2n 2 [; f+l / (S - t) ds + ; 10 lin S dS]


n

2
= -n ~
n---?
0
00
uniformly in t

It follows that
{b.
Ja f(t)Zn, dt + Ja{b (t)ln' dt = f(b)Z.~ - (a)Zna

We have already shown that Z., ~ Z, uniformly in t, and by definition,


n--> ""

(b f(t) dZ, = lim in q.m. {b f(t)!n' dt


Ja n---+- 00 Ja
Hence, the desired result follows.

16. Starting from Y, = Y" - it Y, ds + z, - z"' wc set U, = it y. ds and write


U, + U, = Ya + Z, - Z"
Hence, we have

U, = e-' [It Y"e'ds + 1t (Z, - Za)e' dS]

and
Y, = U, = -Ut + Ya + Z, - Za

If we replace 1t Z.e· ds by - lot e' dZ, + Z,e' - Zaea, we get

Y, = Yae-('-") + 1t e-(t-·) dZ.

which wa~ to be shown.


n
17. (a) Define X(lcd) = Xd - Xc as required. If f = 2: OIdi and each Ji is an indi-
i=l

2: OIiXUi).
n
cator function of a half-open interval, calif a step function and set XU) =
i= 1
CHAPTER 3 331

If f and g are step functions, then

lab f(t)g(t) dt + ff p(t,s)f(t)g(s) ds


b

EX(f)X(g) -
a

It follows that

lab If(t)1 2dt ::; EIX(f)12::; K lab If(t)J2dt


By taking q.m. limits, XU) can be defined for any f which 's the V limit of a sequence
of step functions. But the set of step functions is dense in L2(a,b), hence our definition
is complete.

lab f(t)g(t) dt + ff p(t,s)f(t)g(8) dt ds


b

(b) EXU)X(g) =
a

L"'iXI,
n

(e) If Y is a finite sum Y = then Y = Jab 'f/(I) dX, + !.-Xa


i=O

where 'f/ is a step function. Every Y E Xx is the q.m. limit of a sequence of finite sums.
Since X( 'f/n) converges in q.m. if and only if 'f/n is a L2-convergent sequence, the result
is proved.

18. We use the condition

E(A - A,)X. = 0
and find

Hence, lot h(t,r) dr = t/ (1 + t) and

s
10o
8
h(t,r) dr = ~~
1 +t
1
h(ts)
,
= 1-+-t 0::;8::;t

Therefore, AI = [1/(1 + t)]X,.


19. (a) We use the condition

E( Y - f,)x. = 0 8 ::; t

and find

hot
h(t,r) -
a
ar (EXTX.) dr + EXoX. = cos 211"Ws
332 SOLUTIONS TO EXERCISES

Since EX,X. = cos 21rWt C03 21rWs + e- vo !'-'!, we have


cos 21rWs [21rW lot h(t,T) sin 21rWT dT J
=e- V,' - 1'0 lot h(t,T) sgn (T - s)e-Vo!T-.! dT

Differentiating both sides twice with respect to s yields

- (21r W)' cos 21r W 8 [ 21r W lot h(t,T) sin 21r WT dTJ
= vo'e- v,' - 1'0 Ja(t h(t,T) sgn (T - s)e-Vo!T-.! ciT
ah(t,s)
+ 21'0-----;;;-
or

21'0 ah~~8) + {[v o' + (21rW)'](21rW) 101 h(t,T) sin 21rWT ciT} cos 21rlFs = 0

That means that h(t,s) must have the form


h(l,s) = get) + f(t) sin 21r W s
and

J(t) = - ~
21'0
[1'0' + (21rW)') (I h(t,T) sin 21rWT ciT
Ja
which yields

r(t) {I + 21'0
~ [1'0' + (21rWJ')} Jo(f sin' 21rlVT ciT
+ {~.
2vo
[vu' + (21rW)') (t sin 21rWT dT} get)
Jo = 0 (**)

From (*) we find (upon setting s = 0)

21rW Jot l1(1,T) sin 21rWT dT = 1 - 1'0 lot h(t,T)e-V"T dT


which yields

.r(tl [1'0 Ja(t e-VoT sin 21rlVT ciT - _41rv.oH_T


+
1'0' (21r W)'
-·1 + (1'0 (at e-VoT ciT) get)
J(
= 1 (***)

Equations (**) and (** *) suffice to determine f(l) and a(t), which in turn determine
h(t,s) completely.

(b) \Ve bpgin by noting that f-"'", I'-V,'T!e i '"'' dT = 2vo/(12lrvl' + po') so that {N"
- 00 <t < oo} can be viewed as a white noise filtered by transfer function Y2vo/
(vo + i21rv) or
N, + voN, = Y2vo r,
where r, is a standard white-noise process. It follows that
.Y, (-21rW sin 21rWt)Y + Nt
= (-21rW sin 21rWt) Y - vo(X. - cos 21rWtY) + Y2vo r,
CHAPTER 3 333

Let Z, = cVo'X/. Then

i, = Yevo'(vo cos 2".Wt - 2".W sin 2".Wt) + V2vo eVo'!;,


1'=0
which is in the standard form (9.32) and (9.34), with Z, replacing X,. Hence, we have
A(t) = 0 = F(t), B(t) = .y2v;; CVo' and

H(l) = eVo'(vo cos 2".Wt - 2".W sin 2".Wt)

The estimator Y, must satisfy


p, = k(t)[Z, - H(t) Y,j

Equations (!).39) and (9.40) now hecome

i(t) = +R2(t)k 2(t) - 2H(t)~(t)k(t)


K(t)B2(t) = ~(t)H(t)

Continuing, we get

. H2(t)
~(t) = - - - ~2(t)
B2(t)

which can be transformed into a linear equation by setting

'li(t) H2(t)
u(t) = B2(t) 1;(t)

we then get

d H2(t)
u(t) = li(t) - - -
dt B2(t)

'li(t) = 'li(O) exp r H2(t)


- - - ---
R2(1)
H'(O) ]
B2(O)

It is now eIcar that K(t) can he found pxplicitly.

20. We begin by factoring 8,(v) into the form

8 (v)
L
= [(i2".v) + 1 + i][(i2".v) + 1 - i][(i2".v) - 1 + i][(i2".v) - 1 - iJ
It is clear that an Ii satisfying (9.10) can be taken to be

1
h(v) = ----
[(i2".v) + 1 + i][(i2n) + 1 - iJ
('orresponding to

h(t) = e-' sin t t ;0:: 0


=0 t < 0
334 SOLUTIONS TO EXERCISES

The function g(t) can be found by using (9.24) as

g(t) = / .. ei2"" [8%11(")]


-.. h(v)

= f-.. . ei~"ei2""'h(,,) d"


= e-(I-t<x) sin (t + a) t ;::: -a
= 0 t::; -a

Now, -9(,,) is given by

[
cos a +
sin a(2ri" + 1) ]
= e-a (2ri" +
1 - i)(2ri" + 1 + i)
From (9.29) we get

2(X,+aIJCx') = 1_0000 e-a [cos a+ sin a(2".i" + 1»)e i2"" dX.


= (cos a + sin a)e-aX, + e-a sin aX,
21. Here, 8ue,,) has the form

8 ,,- 1 . 1 _ 5+ (2".,,)2 + (27r/J)4 _ hv I


.( ) - 1 + (2".,,)2 + 4 + (2".,,)4 - [1 + (2".,,)2](4 + (2".,,)4J - 1 ( )1

The function h(,,) has the form

h,,' _ (27ri" + z,)(2".i" + Z2)


( ) - + pd(27ri" + P2)(2".i" + p,)
(2".i"

where we take PI = 1 + i, P2 = 1 - i, P, = 1, and ZI and Z2 have positive real parts.


For this problem

Therefore,

Since g(.) is the inverse transform of [8,"(,,)/';(,,)] and only gel) for I;::: 0 contributes
CHAPTER 3 335

then, Y(II) = . al + . a2 . Finally,


211"tll + PI 211"tll + P2

H(II)

Using calculus of residues, we get

Now, it only remains to compute ZI and Z2 which can be shown to be given by


= (5)t exp (±i/2 tan- I
ZI,Z2 09)
22. For this case (9.40) becomes

K(t) = [~J + 1:(t) [~J


and (9.53) becomes

±(t) = [~ gJ - [~ gJ - 1:(t) [~ g] - [g ~J 1:(1) + 1:(t) l.~ ~ J 1:(t)


+ [~ - ~] 1:(t) + 1:(t) [ _ ~ ~]
Since 1: (0) = 0, 1:(t) == 0, t ::::: 0 is a solution. If the ~ollltion to (f).5a) is unique, as
can be shown, then 1:(t) == 0 is the only solution, and we have K(t) = [~J "Ve can
verify by writing

y, = FY, + [~J r,
X, = HY, + r,
Hence,

.
y, = FY, + [IJ .-
0 (X, HY,)

23. Let Y 2' "" Y, and Y 11 = V,. Then

[Y~IIJ = [01 0IJ [YIIJ


2, Y
+ [10 0OJ ['TI'J
2, ~,

X, [0 1) [YIIJ
=
Y
+ [0 1) ['TI']
2, ~,
336 SOLUTIONS TO EXERCISES

24. Since e- 1t -. I = f-"'", ei2 vt2/[1 + (211"11)2)


11" dll, (l/y2) (~~, + ~.) = It is a stan-
dard white noise. Therefore,

(~+ 1) X, = (~+ 1) Y, +y2 I,


Let Z, = [(d/dt) + I)X" then we have
Z, = [1 l)Y, + [0 y2) [;J
which is in the form of (9.34).

CHAPTER 4
1. Without loss of generality we can assume a = 0 and b = 1 and set
W,(n) = W k1n kin ~ t <k + l/n
k = 0, 1, . . . ,n - 1
Because E<p''P8 is clearly continuous on [0,1)2, we have
n-1
\ ' <Pkln( W k+lln -
~
W kin) --->
n~oo
j 0
1 'P, d W,
k~O

On the other hand

I
n-l
<p,W, - <poWo - 10 1 tP,W,(n) dt = <p,W, - <poWo - W kln fk~:1/n tPl dt
k~O

I
11-1

= <PIWI - <PoWo - W kln (<P(k+1lln - <Pkln)


k~O

I
n

<f'kln(TVkln - HT(k_l)ln)
k~1

I
n -1

= <Pkln(HT(k+l)ln - W kln )
k~O

I
n

+ (<Pkln - <P(k-ll/n)(Wkln - lV(k_ll/n)


k~1

The first term goes to 101 <PI dW, as n -> 00, while

I
k~1
! (<Pkln - <Pk-lIn)(Wkln - W(k-ll/n) I ~ ~-!
k=1
(<Pkln - <P(k_1lln)2 ! (Wkl:-~
k-1
W(k_1lln)2

a.s.
---> O. Since
!c 1 a.s.
tP,W,(n) dt - - >
!c 1
tP,W, dt, we have
n--+ QC 0 n--+ OCI 0

<PIWI - <poWo - 101 tPIW, dt = 101 <PI dW,


which was to be proved.
CHAPTER 4 337

2. Set X, = In Z, then

d X ,= -l dZ, - -2Z,2
Z,
1 1
- Z,2",,2 dt = "', dW, -
1
2",,2 dt

Since Xo = In Zo = 0 we have

Z, = eX, = exp (lot "'. dW. - ~ lot ",.2 dS)

3. Define Y, = In X, - In X o. Then, Ito's differentiation rule yields

1 1 1
dY, = - dX, - - - ,,2(t)X,2 dt
X, 2X,2
,,2( t)
= met) dt + ,,(t) dW, - 2 dt

Since Yo = 0, we have

Y, = lot ,,(s) dW. + lot [m(s) - u2~S) ] ds

1}
or

[t [t [ ,,2(S)
X, = Xoe Y ' = exp { }o ,,(8) dW. +}o m(s) - 2 ds

4. We first write

X t = Xo + lot m(X"s) ds + lot ,,(X.,s) dWs


By the Cauchy-Schwarz inequality, we have
N N
(L k=l
akr ~ N
k=l
L ak 2
It follows that

EX/2 ~ lot E"2(X,,s) ds + E [lot m(X"s) ds T}


a {EXo2 +
~ 3 [ EXo2 + K lot E(l + X.2) ds + Kt lot E(1 + X,2) dsJ
3 [ EXo2 + K(l + t) lot E(l + X,2) dsJ
=

If we set = lot E(l + X,2) ds, then


f(t)

d
dif(t) - 3K(l + t)f(t) ~ 1 + :lEXo2
or
d
di [e- 3K (t+i212) f(t)] ~ (1 + 3EX o 2)e- 3K (t+i2/2)
338 SOLUTIONS TO EXERCISES

Since j(O) = 0, we have

j(t) ~ (1 + 3EXo')e 3K (t+,'I,) 10 00


e- 3K (t+,212) dt = Be 3K (t+,2/2)

and

EX,2 < j'(t) ~ (1 + 3EX02) + 3K(1 + t)f(t) ~ Ae 3K"

5. We merely have to note that

[t
rp(W,) - rp(O) = }o rp'(W.) dW. +:21 }o[t rp"(W.) ds

Therefore, X, = rp(O) + lot rp'(W.) dW•. Since rp" is bounded (say 1",,"1 ~ B), we have

Irp'(x) I ~ Irp'(O) I + Blxl. Hence, lot Elrp'(W.)I'd8 < 00, and lot rp'(W,) dW. is a
martingale and so is X,.

L
n
7. First, let Z, = W k ,'. Then,
k=l

L L
n n
(Z, - Z.) = (Wkt - Wka)2 +2 Wk. (W kt - Wk.)
k=l k=l
Therefore,

L
n
E[(Z, - Z.)IWkn T ~ 8, k = I, . . . , n] E(Wki - Wka)2 = n(t - 8)
k=l
and

L
n

E[(Zt - Z.)'IWkn T ~ 8, k = I, ' .. , n] E(Wkt - Wk.)'


k=l

L L
n
+ E(Wk, - Wk.)'E(Wu - WI,)' +4 W k8 'E(Wkt - Wk.)'

+ L
k~l k=l
4 Wk.WI.E[(Wkt - Wk.)(W/t - WI.)] - 3n(t - 8)' + n(n - 1)(t - 8)2
k~l

+ 4(t - 8)Z.
It follows that

lim E(Zt+to - Zt)IZn T ~ tj = n


tolO
lim E(Z,+to - Z,l'IZ" T ~ t] = 4Z,
tolO

From Proposition 6.4 we know that Z is Markov and satisfies

dZ, = n dt + 2 vz, dW,


CHAPTER 4 339

for some Brownian motion W. Since X, = 0;, we have


1 1 1 1 1
dX, = - /_ dZ, - - Z,-!(4Z,) dt = - (n dt + 2X, dW,) -dt
2 -V Z, 8 2X, 2X,

= (~2 - ~) ~ dt
2 X,
+ dW,.
We note that neither the stochastic differential equation for Z nor the one for X
satisfies the conditions of Proposition 4.1. However, since Z is Markov and
X, = V
Z" the process X is also Markov.

8. (a) For p. = p.(Xo,t) the white-noise equation is equivalent to a stochastic differ-


ential equation.
dX, = p.(X o,t)[b + u dW,]
or

X, = X 0 +b Jr/ p.(X o,s) ds + u lot p.(X o,s) dW.


Therefore,

E(XT 2IX o = x) = [x + b loT p.(x,s) ds T+ u2 loT p.2(X,S) ds


Suppose we denote the above functional of p. by F(p.). We get a minimizing p. by setting
aF(p. + • op.) I = 0
a. <=0

and get

10 T {2u 2p.(X,S) + 2b [ x + b JOT p.(x,r) drJ} ojJ(X,s) ds = 0

for all o}J.. This yields


bx
p.(x,s) = }J.(x) = (u 2 + b 2T)

(b) For p., = a(OX" the equivalent stochastic differential equation is now

dX, = ba(t)X, dt + ua(t)X, dW, + iu2a'(t)X, dt


= X,[rn(t) dt + ua(t) dW,J
We know from Solution 4.3 that the solution is given by

X, = Xo exp {lot ua(s) dW. + t)t [m(s) - u'a;0JJ dS}


-= Xoexp [lot ua(s) dW. + Jo' ba(s) ds J
Now since JOT ua(S) dW. is a Gaussian random variable with zero mean and variance
2;' = u' JoT a'(s) ds, we have
E exp [2 loT ua(s) dW, ] = exp [ 20-' t)T (8) d.~
a2 ]
340 SOLUTIONS TO EXERCISES

Therefore,

loT ba(s) ds + 20- 2 10 T",'(s) ds ]


r
E(X T21Xo = x) = X2 exp [2

= X2 exp {2 loT [qa( s) + ;q b ds - ~ G T} Y


~ x'e- hb1u )2 T

so that a(s) = b/2q 2, 0 :5 s :5 T, is the minimizing function.

9. We use the standard technique of separation of variables. Consider the equation


al(x,t) 1 a2 a
-;;t = 2 ax' [q2(X)j(X,t») - ;u [m(x)/(x,t)]
If f(x,t) is a product /(x,t) = yet) W(x)",,(x), then we have

W(x)",,(x) dy(t) = get)


dt
(!i {~~
dx 2 dx
[q2(X)W(X)",,(x)] - In(X)JV(X),,,,(X)})

= get) -1 --.
d [ q2(X) W(x) - - d""(X)]
'2 ax ax
Therefore,

get)
1 dg(t)
dt
-- l- Id - -- [ q2(X)W(X)--
W(x)",,(x) 2 dx
d",,(x)
elx
1
.J

The two sides, being functions of different yariables, must be constants. HenC'e, if
f(x,t) is It product, then it must have the form

.h(x,/) = e-AI""x(x)

when ""x sati:;fies the Sturm-Liouville equation

~~
2 elx
[u2(X)II'(X) d""(X)]
ax
+ XlV(x)",,(J') = 0

Under rather general conditions, it can be shown that every solution /(x,t) can
be represented as a linear combination of produets. SinC'e p(x,tlxo,lo) is a fundioll of
t - 10 , x, and .1'0, it must have the assumed form.

10. The Sturm-Liouville equation in this case is

U"(x) + Vex) = 0

Let X = _§,,2, then/.(x) = ~,v, are the bounded solutions. Kuw

J'" _ 00 e- i ' x 1 (12 tX')


V 2".t exp - dx = e-l .'1

By the inversion formula of the Fourier integral, we get

0;/ exp 1 (12 tX') = I J - 2". _00


00 e-~"'ei'£ d"
CHAPTER 4 341

and

1 exp [_ ...:..(X_-_X_o)c-'] !
V2".(t - to) 2 (t - to)

11. The Fokker-Planek equation for this ease has the form

op o'p 0
at = t~(t) ox' - aCt) oX (xp)

Assuming p to have the form

p(x,tlxo,l o) = _ /1 exp [_ ~ (x - bX o)']


·V 2".a' 2a'
we get
lop lId 1 du' db
- - - (a') + -
- (x - bxo)'-- + -Xo (x - bxo)-
P ot 2 a' dt 2a 4 dt a' dt
1 op 1
-- = - - (x - bxo)
p ox a'
1 o'p I 1
- = - (x - bxo)' - -
pox' a4 a'

Upon substituting into the Fokker-Planck equation, we get

[ (x - bxo)' _ ~J da' + ~ (x _ bxo) db = ~ [(X - bxo>' - -~J


2a 4 2a 2 dt a' dt 2 a' a'
a(t)x
+ --
a 2
(x - bxo) - a(t)

Therefore,

-du' = 2a' ( - ~ + a ) = (3 + 2aa'


dt 2a'
db a
- = a'- b = ab
dt a'
which are to be solved with a'(to,t o) = 0, b(to,lo) = 1.

op
12. (a) We assume -~ 0 EO that
ot t-> 00

1 d'p(x) d
-2 -dx'- + dx
- [sgn xp(x)1 = 0

or
dp(x)
- - + 2 sgn xp(x)
dx
= eonstant
342 SOLUTIONS TO EXERCISES

Since p'( x), p( x) ~ 0, this constant must be zero. Therefore,

which is already normalized.


(b) Consider the Sturm-Liouville equation

d [ p(x)~
"21 dx dCP(X)] + Ap(x)cp(x) = 0

This can be written as


d 2",(x)· d",(x)
- - - 2 sgnx - -
dx· dx
+ >..<p<x) = 0

The solutions satisfying, (1) d",/dx continuous at x = 0, and (2) ptp2 bounded, are of
the form
",(x) = A.eizi sin IIX

where 112 = >.. - 1. It is convenient to choose A. to obtain the normalization

1-"'", ""(x),,,..(x)p(x) dx = a( II - II')

It can be shown that A. should be Y2/.,... Hence,

<P<II,X) = ~ eizi sin IIX


The transition-density function must have the form

p(x,tlxo,to) = p(x) 10'" e-(V'+IH'-"><P<II,X),y(II,Tu) dll


Because p(x,tolxo,to) = a(x - xo), we have

,y(II,XO) = 1-"'", ",(II,x)ll(x - xo) £Ix = 'P(II,XO)

Therefore,

p(x,tlxo,to) = e-2izieizi+iz.i ~ { '" e-(I+V'HI-I.> sin IIX sin IIXO dll


.,. 10
= r(izl-iz.i>e-(l-">! ( '" e-V'(I-' o> [cos II(X -
.,. 10 Xo) - cos II(X + Xo)] dll
= e-(izl-iz.il-('-'u> 1_ {exp [_ (x - XO)2] _ exp [ _ _(X_+_X_O,--)2]}
Y2.,.. 2(t - to) 2(t - to)

CHAPTER 5
1. (H,f)(x) = Ez(f(X,»
= !(1 + e-')f(x) + -l(1 - e-')f( -x) x = 1, -1

If we set f = [~~~ 1)]. then the operator H, has a representation as a matrix, and
CHAPTER 5 343

we have

The generator A also has a matrix representation and can be found by


.
t! 0 t
1
A = hm - (H, - I) = lim - (1 - e- I )
/lo2t
1 [-1 1] 1 -1

=![-1
2 1-11]
We can verify that H, = e,A in many ways. For example, we note that

-1] 1 =-A

so that for n = 1, 2, . . . ,

An = (-l)n-IA = - -
(-I)n[ 1
2 -1
Therefore,

2. Imitating the example at the end of See. 5.2, we find

:DA = If: I bounded continuous on [0, oc), I" boundeu continuous on (0, oc) I
and

(AI)(a) = ! d 2f(a) a > 0


2 da 2
=0 a=O

3. (a) With j as defined, we have

ux(a) = Eo !o 00 e-Xtj(X,) dt = !o 00 proh (X t > clXo = a)e-x, dt

= {OO [ {OO _1_ cxp [_ 2- (x _ a)2]


}o }c y'2;t 21
dX] e·-At dl

Since {OO 1 exp [ _ ...!: (x - a)2] e- A' dt = exp (- ~Ix - al)/y2X, we find
}o Y21rt 2t

ux(a) = ic
00 exp (- vI2x
_ /
Ix -
'V 2X
a I)
dx
l
344 SOLUTIONS TO EXERCISES

~ exp [-
2X
V2X (c - a)] a:::;c

'"' ! - ~ exp [- v'2X (a - c)] a :::: c


X 2X

Equation (3.13) now yields (with T = Tc)


ux(a) = ux(c)Ear XT ,
Since the Brownian motion X, starts from 0 at t = 0, we get
ux(O) _ ;--
Eoe-"T, = - - = exp (- V 2>. c)
ux(c)
(b) Assume that Tc has a density function q(B), 0 :::; 8 :::; oc so that

cJ>(Tc :::; t) = lot q( B) dB


Taking the Laplace transform, we get

('" e-'"
}o}o
(t q(s) ds dt = !>.}o('" e-"'q(t) dt = !>. Er"T, = !>. exp (- V2X c)
On the other hand,

uA(O) = ('" e-).'cJ>(X, > c) dt = !- exp (- v'2X c)


}o - 2>.

Because of the uniqueness of Laplace transforms

t > 0,
;2;
4. Let TO = min {t: X, = OJ. Then,

<P(S, :::: 0,0:::; t:::; I) = 10'" e-"'/2 (j'(TO ~ JIX o = a) da

Now, let p(x,tlxo,s) denote the transition density and define


f(x) = 0 x ~ 0
= x <1 °
Then, (3.12) yields

IIA(a) = 10'" r"'cp(X, < OIX o = a) dt = f~ '" [10'" e-A1p(x,tla,O) dt] dx

Using TO in (3.13), we get


uA(a) = ux(O)Eae-XTO

Let qa (-) denote the density of TO given X 0 = a. Then we have

For a > 0, we get


CHAPTER 5 345

Therefore, for a > 0,

<l'(TO ~ I[Xo = a) = tOO qa(t) dt = 2{! - <l'(X, < O[Xo = a)}

Finally,

<l'(X, ~ 0,0 ::s; t ::s; 1) =}o


roo 1
V211" e- a2 / 2[1 - 2<l'(X, < O[Xo = a)) da

The joint density function for X 0 and X, has the form

p(x,t;xo,O) =
211"
VI
I-p'
exp [ - 1
2(I-p')
(x' - 2pxxo + x o')]
where p = e-'. Therefore,
00 1
/ ---= e- u2 /2<l'(X, ::s; O[Xo = a) da
o V211"

= r
}o
00 da /0
-00
dx
211"
VI
I-p2
- exp [ -
2(I-p2)
1 (x' - 2pxa + a2) ]
(Letting a = r cos 0, x = r sin 0, we get)

=
}o
r 00 r dr /0 -,,-/2
dO
211" ~
1 - exp [ - _ _r'_ _ (1 -
2(1 - p2)
p sin 20) ]

=
VI -
211"
p2/0 -,,-/2 I - p
I
sin 20
dO

1 I.
= - - ~ Sln-' p
4 211"

Thus, we finally arrive at the expression

5. See Solution 4.7 where we determined that

m(x) = 2 q2(X) = 4x

6. By (5.7) we can take

u(x) = l c
x
exp [icY
- 2m(z)
- - dz ]
0 q2(Z)
dy

Since 2m(z) /q2(Z) = I/x, it is convenient to take c = 1 and find

u(x) = t X
e-1ny dy = In x

Since u(O) = - co, S is open at o.


346 SOLUTIONS TO EXERCISES

7. We rewrite (2.12) for B,y E :DA as


d
-B,y = AB,y
dt

If we set I). = 10" e-)..'B,y dt, then

(00 e-)..'!:. B,y dt = -g + X (., r)..'B,g dt =


10 dt 10
Using (2.12), we get

fo OO
e--)..' -d H,g dt = A
dt
foo e-)..'H,g dt
0
= AI).

Hence,
AI).. = Xf).. - y

8. First, we write

hex, t + 0) = Ex {/(X'+<l) exp [ - lotH k(X.) dsJ}

= Ex (E {/(X,+o) exp [ - lot + ;; k(X.) dsJ IX T, 0 .::; T So})


Because X is Markov and has a stationary transition function,

hex, t + 0) = Ex (exp [ - 10;; k(X,) dsJ E {f(X'+<l) exp [ - lot k(X.+<l) dsJ I
X;;} )

= Ex {ex p [ - 10;; k(X,) dsJ h(X;;,t) }


= h(x,t) + h'(x,t)Ex(X;; - x) + !h"(x,t)Ex(X/j - X)2
- k(x)h(x,t) 0 + 0(0)
Therefore,
ah(x,t)
-at --. = 1II(x)h'(x,t) +i u2(x)h"(x,t) - k(x)h(x,t)

9. Setting k(x) = /3(1 + sgn x/2) and I(x) = 1 in Kac's theorem, we get

u~.cx) = Ex 10
(00
e- Xt exp
((t -/3 10
1 + sgn
2
X. )
ds dt

and u).. satisfies

Therefore,
1 d 2u)..(x)
- -- -
2 dx 2
(/3 + X)ux(x) -1 x>o
1 d 2u)..(x)
- - - - -Xu)..(x) = -1
2 dx 2
x<o
CHAPTER 5 347

and u~ must have the form

u>.(x) = A exp [- V2(i3 + X) xl + _1_ x >0


X+i3
... B exp (y'2}: x) + ~X x < 0

Continuity of u>. and u~ at 0 yields


1 1 1
A
~0-X+i3
1 1 1
B
~0-i
Since the Brownian motion s arts at 0, we only need u~(O) which is given by
1
u>.(O) = ---:;===
VX(X + 13)
which is the double Laplace transform of the distribution of

ret) = lot C+ ~gn X') ds

If we define q(s,t) as the density of ret), that is, q(s,t) ds = (J'(T(t) E ds), then

u).(O) =
VX(X +
1
(J)
= If
0
'"
e-(At-fJ')q(s,t) ds dt

Inverting the Laplace transform once (wi.h respect to (J) yields

Ico'" 1 1
e-A'q(s,t) dt = - - --_ e- A'
.y;; VX
Inverting once again yields
III
q(s,t) = - - --- t > s
7r V; Vt - s
= 0 t < s
Finally

(J'(r(l) ::; t) = (t q(s,t) ds


}o
=!
7r}O
(t 1
Vs(l _ s)
ds = ~ sin- 1
7r
vi
10. Define

h(x,t) = Ex exp ( -(J lot dS) X,2

and consider the equation


ah 1 a 2h
- = - - - px 2h h(x,O) = 1
at 2 ax 2
We attempt a solution of the form

h(x,t) = A (t)e- a (t)x 2


<>(0) = 0
A(O) = 1
348 SOLUTIONS TO EXERCISES

Substituting the trial solution in the differential equation yields


A(t)
-~ - x 2a(t) = 2a 2(t)x 2 - aCt) - (1x 2
A (t)
Equating like terms, we get
a + 2a 2 (t) = (1
A(t)
-~ = -aCt)
A (t)
If we let a (t) = iJ(t) /2v(t), then
ii(t)
aCt) + 2«2(t) = 2v(t) = (1

so that
ii(t) = 2{1v(t)
With the initial condition a(O) = 0, we get
aCt) = y2;3 tanh Y2~ t
and
A (t) = exp ( - lut y2;3 tanh y 2{1 SdS)
= exp (- In cosh Y2~ t)
1
cosh y~t
Therefore,

h(x,1) ~. exp r ~ y2{1 (tanh Y2~ t)x 2)


cosh 2{1 t
and

h(O,t)
cosh y2(:l t
and the density function for Z = 10 1
X,, dt can be found by inverting h(O,1), that is,

1 ~C+i"" 1
pz(z) = - ea. - - - - d{1
27ri C - i'" cosh V2;3

11. ,,'(x) m(x) !lex) Closed or open Regularity

e" - 1
-1 Closed Regular
2
1 -x lox eu2 dy Closed Regular
ezu - 1
x -x Closed Regular
2

- 3
:r~r loX y3e-2. dy Open
CHAPTER 6 349

CHAPTER 6

1. E[IM, - Msl2lt1's] = E[IM,1 2 - 2Re(M,Ms) + IMsl2lcts]


= E[IM,12Icts ] - 2 Re(MsE[M,ItI'sD + IMsl2
= E[IM,12It1's] - 21Msl2 + IMsl2

2. It suffices to prove that for any t ~ 0 the events {S 1\ T ~ t}, {S V T ~ t}, and
{T* ~ t} are in CPt. These events can be expressed in terms of events in ti, as {S ~ t} U {T
~ t}, {S ~ t} n {T ~ t}, and n{Tn ~ t}, respectively, which implies the result desired.

3. Let s < t. Then T' = (T 1\ t) V s is a bounded stopping time with T' ~ s so that
E[MT'I(t s ] = Ms by the optional sampling theorem. Now

so

E[M'A TltI'.] = M. + (MT - Ms)I{T ""s) = MT A .

4. It suffices to consider the case that X is a submartingale. It is easy to check that Xs is


d's measurable, and by definition E[XRltis ] is d's measurable. Hence it suffices to prove
that for any bounded tis' measurable random variable U,

or, equivalently, EUXR ~ EUXs ' Now

Since {R = t 2 } = {R ~ td e is in ti'l and UI{s~'d is ti'l measurable, this becomes

Since E[X'2 - X" Id"l] ~ 0, so is the left·hand side of this equation. This completes the
proof.

5. (a) Form~n,

E[ZmIZl,"" Zn] = E[Um ·· .Un+1ZnIU1 , · · . , Un]


= (EUmEUm-l···EUn+l)Zn = Zn

(b) By the martingale convergence theorem, Zn converges a.s. to a random variable


Zoo' Now

In (Zn) = L InUi , ElnU, = 112


2" 0 In(u)du = (In2) - 1 < 0
i=l

and E[(ln(u,))2] < +00. Then, by the weak law of large numbers, (l/n)ln(Zn) n";J
In 2 - 1. Thus In (Zn) converges in probability to - 00, which means that Zn converges in
probability to zero. Thus Zoo = O-that is, Zn converges a.S. to zero. Since EZn = 1 for all
n, Zn does not converge in p·mean for any p ~ 1.
350 SOLUTIONS TO EXERCISES

6. Let A E "B. Then M(A) = t (dMGf, /dM:J') dMo. On the other hand, by the definition of
conditional expectation, we also have

Equating these two expressions for M(A) and noting that each side of Eq. (l.9) is 6Ji\
measurable, the conclusion follows.

7. Let Rn i + 00 be stopping times such that (MRn 1\ t, CEt : t ~ 0) is a martingale for each
n ~ l. Then, as n tends to + 00, we use a conditional version of Fatou's lemma to deduce
that for t > s,

E [MtiCE s ] = E [liminfMRn 1\ tlCEs] a.s.


n~oo

os; liminfE [MRn 1\ tlCE s ] = liminfMRn 1\ S = Ms a.s.


n-oo n-oo

8. E[ t9(h· w);'j = Eex p ( ph· WT - ~ fh;ds)

= E[ t9(ph· W)TexP( p2;- p fh;ds)]

os; exp ( (p2 - P) ~) ,


as desired. Then, by the Markov inequality,

which implies that t9(h· w) is class D. Finally, since t9(h· w) is a local martingale, there
exists a sequence 'Tn i + 00 of stopping times such that Et9(h· W)T n = 1 for each n. Then
Et9(h • w)oo = 1 by Proposition l.3.

9. If '!P« '!Po then the Radon-Nikodym derivative A = d'!P/d'!Po exists and Ln = E[AICEnl.
By Proposition l.4, Loo = A a.s. so that ELoo = l. Conversely, suppose that ELoo = l.
Then for c ~ 1,

E[LnI(Ln"C}] ~ E[ LnI(Loo"C-l/C,ILn-Lool"l/cd

~ E[ (Loo - l/C)I(L oo "C-l/C,IL n-Loo l"I/C)]

so that

Equivalently, since ELn = 1 for each n,

lim limsupE[LnI(Ln> C)] =0


c-oo n-oo

so that L is a uniformly integrable martingale. In particular, Ln = E[LooICEnl. Now if U is


CHAPTER 6 351

a bounded random variable which is ttn measurable for some n, then EU = EOULn =
EoULoo. Then, by the monotone class theorem, EU = EoUL"" for all ct"" bounded measura-
ble U. Hence 0' « 0'0 and L"" is the Radon-Nikodym derivative.

10. Ln defined in the previous problem is given by

Suppose that S < + 00. Then

Eo [Ln In Ln] = E [In Ln] = ~f a; ~ ~S


i=l

so that for c > 1,

~o

so that (Ln) is uniformly integrable. Thus 0' « 0'0'


Suppose no~ that S = + 00. Unde~ measure 0'0' In Ln is a Gaussian random variable
with mean - t L ai and variance L ai. Therefore, Po [In Ln ~ c] ~ 1 for all
k~1 k~O
constants c. Combining this with the fact that, by the martingale convergence theorem, Ln
converges 0'0 a.s., conclude that 0'o(A) = 1 where A is the event that lim Ln = O. On the
n~ 00
other hand, L - 1 is a nonnegative 0' martingale so that

0' [ lim L;; 1 exists and is finite] = 1.


n~oo

Therefore 0'(A) = O. Thus 0' 1- 0'0'

11- By Ito's formula,

W,4 = 4 j tWs3 dws + 6 jt2


Ws ds
o 0

so we see that 6 jtws2 ds is the predictable compensator of w,'. Next (simply check the
o
jumps)

so the predictable compensator of Nt' is

j t(Ns _ + 1) 4 - (Ns _) 4 ds which



equals jt (Ns + 1) 4 - N; ds.
o 0

and
352 SOLUTIONS TO EXERCISES

Thus, 0'[Bn+ 1 = 1Ia~] = 0'[Bn+ 1 = -1Ia~] = 0.5 for each n, and this implies part
(a).
There exists a sequence of functions Fn so that

n;e:O

Now

so if we define a predictable process H by Ho = 0 and

n>O

then Mn+l = Mn + Bn+1Hn+l for n;e: o. This implies that the representation to be proved
is valid.

CHAPTER 7
1. For each possible value (J of Ok+l'

II k + 1(J) = 0'(Zk+l =jIOk+l = (J'(')k)

= 0'(Zk+l =j,Ok+l = (J1(')k)j'!J'(Ok+l = (J1(')k)

.E'!J'(Zk+l =j,Ok+l = (J,zk = il(')k)

.E '!J'(Zk+l =j',Ok+l = (J,Zk = il(')k)


i.j'

Now the ith term in the numerator of the last expression is equal to

which is the same as IIk(i)Ri/(J). The denominator above is the sum of similar terms, and
the desired conclusion follows.

2. Since H, is not random, it is equal to E[a, - ~,)2]. Equation (4.12) for H can be written

Setting u(t)ju(t) = H, - a, we get

so u, = Ae P' + Be- P'

for some constants A and B. Using the initial condition u(O)ju(O) = Ho - a = - a, we get

u(t) {(1 + (J)e P' - (1 - (J)e- p , } where (J = -


-a
u(t) = P (1 + (J)eP' +(1 - (J)e-P' P

Since H, = a + u(t)ju(t), this yields H,. We find that H, tends to a + past tends to
infinity. This limit value can also be obtained by setting H, = 0 in Eq. (4.12).
CHAPTER 7 353

3. The goal is to find a recursive equation for ~, where ~ = I[o,v). Let tt, = (J as, Ns:
o os; s
os; t). Then, arguing heuristically,

E[d~,ltt,] = -op[t < V OS; t + dtltt,]


= - p[ t < Vltt,] op [t < V OS; t + dtl t < V, tt,]
= -~,/(t)dt/(l - F(t»

This suggests the easily verified fact that if fi, = -~,/(t)/(l -,F(t)); then m defined by
Eq. (3.2) is an tt. martingale. Now As = (a - b)~s + b, ~; = ~s' ~s = ~s-' and '" in Eq. (4.3)
is zero, so Eq. (4.3) becomes

K = at_-(a - b)(~s_)2 - bt-


s (a-b)t3b

and Eq. (4.1) yields the recursive equation ao = 1):

~,= ~o - {tf(S)/(l - F(s»)ds + f:Ks(dNs -[(a - b)t + b] ds)


INDEX

a -measurable function, 8 Bandlimited process, 105


a . Poisson process, 222 Bandpass process, 107
Absolute continuity, 215 Basic space, 3
with respect to Lebesgue measure, Bochner's theorem, 94-96
9 generalization to Rn, 281
Absorbing boundary, 203 homogeneous and isotropic case,
Absorbing Brownian motion, 195, 284-285
203 Boolean algebra, 2
Adapted, 209 Borel-Cantelli lemma, 12
Algebra, 2 Borel function, 7
Almost-sure convergence, 19 Borel measure, 5, 8
criterion for, 22 Borel probability measure, 5, 8
Arc-sine law, 208 absolute continous, 9
Atom of a (J' algebra, 26 singular, 9
Average, see Expectation Borel sets, 5
Borel (J' algebra, 5
Backward equation of diffusion, 171 Boundary
356 INDEX

Boundary (cont.) Continuity


absorbing, 203 almost-sure, 55
closed or open, 200 almost-surely sample, 55
exit, 202 of Brownian motion, 58-59
reflecting, 203 sufficient condition, 57
regular, 202 modulus of, 60-61
Boundary condition, 202-203 in vth mean, 55
Brownian motion, 50 in probability, 43, 55
absorbing, 195, 203 in quadratic mean, 55, 77
approximations to, 160 characterization, 77
Gaussian white noise as deriva- Convergence
tive of, 156-157 almost-sure, 19, 22
generator, 189 of distributions, 23-24
Markov property, 50 of expectations, 17
martingale property, 51 mutual, 19-21
modulus of continuity, 61 in vth mean, 20
multidimensional parameter, 296 in probability, 19
quadratic variation, 53 in quadratic mean, 20
reflecting, 195, 203 sequential, of events, 11
sample continuity, 58--59 to stochastic integral, 160
semigroup, 188--190 strong, in Banach space, 183
stochastic integral with respect to, to a white noise, 110-111
141-145, 163-164 Convolution, 91
Coordinate function
Co space, 89 finite-dimensional, 9
Cadlag, see corlol infinite-dimensional, 39
Cauchy sequence, 78 Corlol, 210
Chapman-Kolmogorov equation, 63 Correction term for white-noise inte-
stationary case, 181 gral, 160-162
in terms of conditional expecta- Correlation function, 74
tion,66 Counting process, 221
Characteristic function, 18 Covariance function, 48, 74
Characteristic operator, A, 196 properties, 75-77
Characterization theorem of Levy
and Watanabe, 241
Chebyshev inequality, see Markov Decreasing sequences
inequality, v = 2 of events, 11
Compensator, 224,238 of random variables, 13
Completion of a probability space, 3 Detection, 250, 254-257
Conditional density function, 26 Differentiability of measures, 214-
Conditional distribution function, 215
25, 32-33 Differential equation
Conditional expectation, 27 driven by white noise, 113-115,
convergence properties, 29 126
smoothing properties, 29-31 stochastic, 149-150
Conditional independence, 31 Differentiation in quadratic mean,
Conditional probability, 25 79
INDEX 357

Differentiation rule of stochastic in- Fatou's lemma, 17


tegral, 147-148, 239 Feller process, 192, 194
Diffusion equations, 169 Filtering
backward, 171 and detection, 250
forward, 172, 274 Kalman-Bucy, 125-127, 268-269
fundamental solution, 172 linear time-invariant, 91, 101
multidimensional case, 176 recursive, 231, 269, 274
solution using Laplace transform, Wiener, 120-122
205-206 Finite-dimensional distributions of a
Diffusion process, 198 stochastic process, 37-41
Discontinuities of first kind, 59 compatibility condition, 38-39
random telegraph process, 60 First exit time, 194
random telegraph process, 60 First passage time, 190
Distribution function, 5-7 Fokker-Planck equation, 172
probability, 7-8 Forward equation of diffusion, 172
joint, 8 fundamental solution, 172
Dominated convergence, 17 Fourier integral, 90
Dynkin's formula, 192 inversion formula, 90
Fundamental solution, 172

Eigenfunction
of integral equation, 83 Gauss-Markov process, condition for,
of Laplacian operator, 287 64-65
Eigenvalue Gauss-Markov random field, iso-
of integral equation, 83 tropic, 292-293
of Laplacian operator, 286 and homogeneous, 293-295
Envelope, 109 Gaussian process, 46
Ergodic process, 67 characteristic function, 47
condition for a Gaussian process, density function, 49
69 linear operations on, 47-48
Ergodic theorem, 68 Markov, 64-65
Estimator Gaussian random variable, 46
linear least squares, 116-117 Gaussian white noise
recursive, 269 convergence, 160-161
Euclidean distance, 282 correction term, 160-162
Euclidean group, 282 as derivative of Brownian motion,
Euclidean norm, 281 156-157
Event, 1-3 in differential equations, 156-160
convergence, 11 in integrals, 157
Exit boundary, 202 simulation, 162-163
Exit time, 194 Gegenbauer equation, 287
Expectation, 15-17 Gegenbauer polynomials, 287
conditional, 27, 29-31 Generalized process, 279
Convergence, 17 Generator (of Markov semigroup),
Exponential of a semimartingale, 183
241 Girsanov's theorem, 244
Extension thereom, 3 Green's function, 199
358 INDEX

Hilbert space: Kalman-Bucy filtering, 125-127,


projection onto, 117 268--269
second-order process, 101 Karhunen-Loeve expansion, 82-88
wide-sense stationary process, 101 Kolmogorov condition (for sample
Hilbert transform, 105, 108 continuity), 57
Holder condition, 173 Kolmogorov's equations, see Diffu-
Homogeneous random field, 281 sion equations
isotropic case, 283
Hypothesis testing, 254-255
V martingale inequality, 214
Lp space, 89
Impulse response, 91 Laplacian operator, 286
Increasing sequence Lebesgue decomposition, 215
of events, 11 of Borel probability measure, 9
of random variables, 13 Lebesgue measure, 5
Independence Lebesgue-Stieltjes integral, 18, 33,
of events, 24 217-219
of random variables, 25 martingale property, 222, 232-233
Independent increments, 50 for random processes, 220-221
Indicator function, 14 Likelihood ratio, see Radon-Nikodym
Infimum, 13 derivative
Innovations, 254 Likelihood ratio test, 254-255
derivation of filters, 262-267 Limit, sequential, 11
Integrable random variable, 17 Limit inferior
Integral of events, 11
quadratic-mean, 79-80 of random variables, 13
stochastic, see Stochastic integral Limit superior:
Integral equation for orthogonal ex- of events, 11
pansion,83 of random variables, 13
Intensity of a counting process, 226 Linear least-squares estimator, 116
Invariant random variable, 67 Linear operations
Inverse Fourier transform, 90 on Gaussian process, 47-48
Inverse image, 7 on second-order process, 78
Inversion formula Linear time-invariant filtering, 91
Fourier integral, 90 characterization, 101
Fourier-Stieltjes integrals, 96-97 Lipschitz condition, 150
spectral distribution, 96-97 Local martingale, 165, 234
Isotropic random field, 289 Local square integrable martingale,
Ito integral, 140; (see also Stochastic 234
integral) Locally bounded, 234
Ito's differentiation rule, 147-148, Locally finite variation, 219
239
Ito's product formula, 239
Markov inequality, v == 2, 21
Markov process, 50, 61
Kac's theorem, 208 and conditional expectation, 65-66
Kakutani's dichotomy theorem, 249 Gaussian, 64
INDEX 359

  representation by stochastic equation, 168
  strong, 191
Markov random field, 292
Markov semigroup, 182-183
Markov time, 190
Martingale, 51, 209
  local, 165, 234
  and stochastic integral, 145, 166, 228, 232-233
Martingale inequality, 51, 214
Martingale representation theorem, 246
Mean function (or mean), 48
Measurable function, 7
  Borel, 7
Measurable process, 45
Measurable space, 5, 215
Measure
  absolute continuity, 215
  Borel, 5, 8
  finite, 5
  Lebesgue, 5
  mutual absolute continuity, 215
  probability, 3
    Borel, 5, 8, 9
  σ finite, 5
  singular, 215
  spectral, 94, 284
Mercer's theorem, 85
Meyer's predictable compensator theorem, 225
Minimum phase condition, 118
Modulus of continuity, 60-61
  Brownian motion, 61
Monotone class theorem, 33
Monotone convergence, 17
Monotone sequence of events, 11
  of random variables, 13
Monotone sequential continuity at 0, 2
Mutual convergence, 19-21
  almost-sure, 19
  in probability, 20
  in vth mean, 21
  in quadratic mean, 21
Natural scale, 197
Nonanticipative filter, 118
Nonnegative definite function, 75
Null set, 3
Optional sampling theorem, 213
Ornstein-Uhlenbeck process, 66
Orthogonal expansion, 81
Orthogonal increments, 98
Orthonormal family, 81
Paley-Wiener condition, 118
Pathwise
  integration, 217
  solutions to filtering equations, 274-275
Polar coordinates, 285
Predictable
  compensator, 224, 238
  continuous time process, 224
  discrete time process, 223
  projection, 252
  σ algebra, 224
Prediction, linear least-squares, 116
Probability density function, 9
Probability distribution function, 7-8
Probability measure, 3
  Borel, 5, 8, 9
  elementary, 2
Probability space, 3
  completion, 3
Process
  with independent increments, 50
  with orthogonal increments, 98
Progressively measurable, 220
Projection, 117
  optional, 252
  predictable, 252
Quadratic-mean continuity, 55, 77
Quadratic-mean convergence, 20
Quadratic-mean derivative, 79
Quadratic-mean integral, 79-80
Quadratic variation, 236
Quasi-martingales, 167
Radon-Nikodym derivative, 28, 215
  connection to martingales, 216, 243-244
  finite-dimensional approximations, 216-217
  representation, 257
Radon-Nikodym theorem, 27, 215
Random field, 279-280
  homogeneous, 281
  and isotropic, 283
  isotropic, 289
  spectral representation, 282, 291
Random measure, 280
Random telegraph process, 59-60
  discontinuities of, 60, 63
Random variable
  discrete, 8
  invariant, 67
  real, 8
  second-order, 74
  simple, 14
Realization
  of probability distribution, 9
  for stochastic process, 39
Recursive estimation, 269
Recursive filtering, 269
  equation of motion, 274
  reduction to Kalman-Bucy filtering, 268-269
Reflecting boundary, 203
Reflecting Brownian motion, 195, 203
Reflection principle of D. André, 207
Resolvent, 185
Riccati equation, 127
Rigid-body motion, 282
Rotation, 282
S-space, 89
Sample continuity, 55
  of Brownian motion, 58-59
  condition for, 57
Sample function, 37
Sample space, 216
Sampling theorem, 106
  for bandpass process, 109
Scale function, 197
Second-order calculus, 79-80
Second-order random variable, 74
Semigroup
  of Brownian motion, 188-190
  of Markov processes, 182-183
Semigroup property, 183
Semimartingale, 234
  approximation of integral, 235
  integral, 235
Separability of stochastic process, 41-45
Separable and measurable modification, 45
Separable process, 41-45
Separable σ algebra, 26
Separating set, 42
  of process continuous in probability, 43
Sequence
  of events, 11
  of random variables, 13-14
Sequential continuity, 11
Sequential limits of events, 11
σ additivity, 3
σ algebra, 2
  atoms of, 26
  Borel, 5
  generated, 2
  minimal, 2
  predictable, 224
  separable, 26
Simple random variables, 14
Simulation of white noise, 162-163
Singularity, 215
  with respect to Lebesgue measure, 9
Spectral-density function, 92, 94
  input-output relationship, 93
  interpretation in terms of average power, 93
Spectral-distribution function, 94
  for random fields, 281, 284
Spectral factorization, 119-120
Spectral measure, 94
  for random fields, 281, 284
Spectral representation
  of homogeneous random fields, 282
  isotropic case, 291
  of wide-sense stationary process, 101
Spherical harmonics, 288
State space (of diffusion process), 181
Stationary process, 66-67
  wide-sense, 66, 88-89, 101
  jointly, 104
Stationary transition function, 180
Step function, 141
Stieltjes integral, 18, 33
Stochastic differential equation, 149-150
  and diffusion equations, 173
  properties of solution, 150-155
  representation of Markov process, 168
Stochastic integral, 98, 141-145, 163-164
  differentiation rule, 147-148, 239
  local martingales defined by, 165
  martingales defined by, 145
  pathwise, 220-221
  with respect to Brownian motion, 141-145, 163-164
  with respect to martingales, 166
  with respect to process with orthogonal increments, 98
  sample continuity, 146-147
Stochastic integral equation, see Stochastic differential equation
Stochastic process
  definition, 37
  sample functions, 37
  second-order, 74
Stopping time, 212
Strong convergence, 183
Strong Markov process, 191
Submartingale, 41, 209
Supermartingale, 51, 209
Supremum, 13
Transfer function, 91
Transformation rule
  of Borel probability measures, 10
  of probability density function, 10
  of stochastic integral, 147-148
Transition density function, 169
Transition function, 62, 169
  stationary, 180
Translation, 282
Translation group
  for stationary process, 67
  for wide-sense stationary process, 88-89
Uniformly integrable, 211
  martingale, 211
Usual conditions, 210
Variation, 218
  quadratic, 236
White noise, 109-115
  convergence to, 110-111
  differential equations driven by, 113-115, 126
  Gaussian, see Gaussian white noise
  integral, as stochastic integral, 111
Wide-sense stationary process, 66, 88-89, 101
Wiener filtering, 120-122
Wiener measure, 53, 59
Wiener martingale, 209
Wiener process, see Brownian motion
Zero-one law, 245
University Series in Modern Engineering

Elements of State Space Theory of Systems
A. V. Balakrishnan

Kalman Filtering Theory
A. V. Balakrishnan

Systems & Signals
N. Levan
