for Chemical and Biological Engineers
Michael D. Graham
James B. Rawlings
Publishing
Madison, Wisconsin
and printed
in Lucida using LATEX,
set
was
and bound
book
This
by
Cover
ham
M. and
design by Cheryl
LLC
Cheryl M. Rawlings,
publisher
Madison, WI 53705
1. Chemical engineering. 2. Mathematical modeling.
I. Rawlings, James B. II. Title.
Printed in the United States of America.
First Printing
May 2013
Preface
Research undertaken by modern chemical and biological engineers incorporates a wide range of mathematical principles and methods. This book came about as the authors struggled to incorporate modern topics into a one- or two-semester course sequence for new graduate students, while not losing the essential aspects of traditional mathematical modeling syllabi. Topics that we decided are particularly important but not represented in traditional texts include: matrix factorizations such as the singular value decomposition, basic qualitative dynamics of nonlinear differential equations, integral representations of partial differential equations, probability and stochastic processes, and state estimation. The reader will find many more in the book. These topics are generally absent in many texts, which often have a bias toward the mathematics of 19th- through early 20th-century physics. We also believe that the book will be of substantial interest to active researchers, as it is in many respects a survey of the applied mathematics commonly encountered by chemical and biological engineering practitioners, and contains many topics that were almost certainly absent in their chemical engineering graduate coursework.
Due to the wide range of topics that we have incorporated, the level of discussion in the book ranges from very detailed to broadly descriptive, allowing us to focus on important core topics while also introducing the reader to more advanced or specialized ones. Some important but technical subjects such as convergence of power series have been treated only briefly, with references to more detailed sources. We en-
methods.
JBR
Madison, Wisconsin
MDG
Madison, Wisconsin
Acknowledgments
This book grew out of the lecture notes for graduate level analysis courses taught by the authors in the Department of Chemical and Biological Engineering at the University of Wisconsin-Madison. We have benefited from the feedback of many graduate students taking these classes, and appreciate the enthusiasm with which they received some early and incomplete drafts of the notes. Especially Andres Merchan, Kushal Sinha, and Megan Zagrobelny provided helpful discussion and assistance.
Contents
1 Linear Algebra
1.1 Vectors and Linear Spaces
1.1.1 Subspaces
1.2.5 The Outer Product, Dyads, and Projection Operators
1.2.6 Partitioned Matrices and Matrix Operations
1.3 Systems of Linear Algebraic Equations
1.3.1 Introduction to Existence and Uniqueness
1.4.6 Schur Decomposition
1.4.7 Singular Value Decomposition
1.5 Functions of Matrices
1.5.1 Polynomial and Exponential
1.5.2 Optimizing Quadratic Functions
1.5.3 Vec Operator and Kronecker Product of Matrices
1.6 Exercises
2 Ordinary Differential Equations
2.1 Introduction
2.2 First-Order Linear Systems
2.2.1 Superposition Principle for Linear Differential Equations
2.2.2 Homogeneous Linear Systems with Constant Coefficients
2.5.2 Lyapunov Functions
2.6.6 Method of Multiple Scales
2.7 Qualitative Dynamics of Nonlinear Initial-Value Problems
2.7.1 Introduction
2.7.2 Invariant Subspaces and Manifolds
2.7.3 Some Special Nonlinear Systems
2.7.4 Long-Time Behavior and Attractors
2.7.5 The Fundamental Local Bifurcations of Steady States
2.8 Numerical Solutions of Initial-Value Problems
2.8.1 Euler Methods: Accuracy and Stability
2.8.2 Stability, Accuracy, and Stiff Systems
2.8.3 Higher-Order Methods
2.9 Numerical Solutions of Boundary-Value Problems
2.9.1 The Method of Weighted Residuals
2.10 Exercises
3.5 Exercises
4.4 Sampling
4.8 PCA and PLS regression
4.9 Appendix: Proof of the Central Limit Theorem
4.10 Exercises
5.4.1 Introduction
5.4.2 Optimal Dynamic Estimator
A Mathematical Tables
A.1 Laplace Transform Table
A.2 Statistical Distributions
A.3 Vector and Matrix Derivatives
Author Index
Citation Index
Subject Index
List of Figures
1.3 An iteration of the Newton-Raphson method for solving $f(x) = 0$ in the scalar case.
2.8 Solution to the initial-value problem with nonhomogeneous boundary conditions.
2.18 A limit cycle (thick dashed curve) and a trajectory (thin solid curve) approaching it.
2.19 Periodic (left) and quasiperiodic (right) orbits on the surface of a torus. The orbit on the right eventually passes through every point in the domain.
2.20 A limit cycle for the Rössler system, $a = b = 0.2$, $c = 1$.
3.10 Concentration versus membrane penetration distance for different reaction rate constants.
3.11 Transient heating of slab, cylinder, and sphere.
3.12 Wavy-walled domain.
4.1 Normal distribution, with probability density $p_\xi(x) = (1/\sqrt{2\pi\sigma^2}) \exp(-\tfrac{1}{2}(x - m)^2/\sigma^2)$.
4.2 Multivariate normal for $n = 2$. The contour lines show ellipses containing 95, 75, and 50 percent probability.
4.3 The geometry of quadratic form $x^TAx = b$.
4.5 A joint density function for the two uncorrelated random variables in Example 4.8.
list Of
starting
With
20 A molecules, 100 B molecules and 0 C
molecules,
k
1/20, k-l =
5.14 Simulationof 2 A
simulation.
5.15 Cumulative distribution for 2 A
B at t = 1 with no =
500, Q = 500. Discrete master equation (steps) versus
omega expansion (smooth)
5.16 The change in 95%confidence intervals for R(klk) versus
489
490
491
493
497
498
512
522
523
524
List of Tables
3.1 Gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinates.
Larger table of Laplace transforms.
Statistical distributions defined and used in the text and exercises.
List of Examples and Statements
2.6 Example: Fourier series of a nonperiodic function
2.7 Example: Generating trigonometric basis functions
2.11 Example: Steady-state temperature profile with fixed end
2.12 Example: Steady-state temperature profile with insulated
2.13 Example: Steady-state temperature profile with fixed flux
2.14 Example: Fixed flux revisited
2.15 Example: Nonhomogeneous boundary-value problem and the Green's function
2.16 Definition: (Lyapunov) Stability
2.17 Definition: Attractivity
2.18 Definition: Asymptotic stability
2.19 Definition: Exponential stability
2.20 Definition: Lyapunov function
2.21 Theorem: Lyapunov stability
2.22 Theorem: Asymptotic stability
2.23 Theorem: Exponential stability
2.24 Theorem: Lyapunov function for linear systems
2.25 Definition: Exponential stability (discrete time)
2.26 Definition: Lyapunov function (discrete time)
2.27 Theorem: Lyapunov stability (discrete time)
2.28 Theorem: Asymptotic stability (discrete time)
2.29 Theorem: Exponential stability (discrete time)
2.30 Example: Matched asymptotic expansion analysis of the reaction equilibrium assumption
2.31 Example: Oscillatory dynamics of a nonlinear system
2.32 Theorem: Poincaré-Bendixson
3.2 Example: The divergence theorem and conservation laws
3.3 Example: Steady-state temperature distribution in a circular cylinder
3.4 Example: Transient diffusion in a slab
3.15Example:
function of the normal density
Characteristic
356
4.1 Example: mean and covariance of the multivariate
The
4.2 Example:
normal
function of the multivariate normal 361
Characteristic
365
4.3 Example:
normal density
Marginal
4.4 Example:
4.5 Example: Nonlinear transformation
4.6 Example: Maximum of two random variables
4.7 Example: Independent implies uncorrelated
4.8 Example: Does uncorrelated imply independent?
4.16 Theorem: Lindeberg-Feller central limit theorem
4.22 Theorem: Mean and variance of samples from a normal
4.23 Example: Comparing PCR and PLSR
4.24 Theorem: Taylor's theorem with bound on remainder
Example: Average properties from sampling
5.2 Example: Transport of many particles suspended in a fluid
5.3 Example: Fokker-Planck equations for diffusion on a plane
5.4 Algorithm: First reaction method
5.5 Algorithm: Gillespie's direct method or SSA
5.6 Example: Observability of a chemical reactor
5.7 Theorem: Riccati iteration and estimator stability
Definition: Continuity (with probability one)
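Linear Algebra
$$\alpha x \in V$$
$$\alpha(\beta x) = (\alpha\beta)x$$
$$(\alpha + \beta)x = \alpha x + \beta x$$
$$\alpha(x + y) = \alpha x + \alpha y$$
$$1x = x, \quad 0x = 0$$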
from the origin to a point on the unit sphere. The sum of two such
vectors will no longer lie on the unit sphere; vectors defining points
on the sphere do not form a linear space. Regarding notation, many
1.1.1 Subspaces
$$x + y \in S, \qquad \alpha x \in S \tag{1.1}$$
For example, if $V$ is the plane ($\mathbb{R}^2$), then any line through the origin on that plane is a subspace.
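$$\|\alpha x\| = |\alpha| \, \|x\|$$
$$\|x + y\| \le \|x\| + \|y\| \quad \text{(triangle inequality)}$$
$$\|x\|_2 = \Big(\sum_i |x_i|^2\Big)^{1/2}$$
$$\|x\|_p = \Big(\sum_i |x_i|^p\Big)^{1/p}$$
Particularly useful are the cases $p = 1$, sometimes called the "taxicab norm" (why?) and $p = \infty$: $\|x\|_\infty = \max_i |x_i|$.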
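As a rough illustration of the $p$-norm formulas above, the following Python sketch evaluates the cases $p = 1$, $p = 2$, and $p = \infty$ for one vector. The function name `p_norm` and the sample vector are illustrative choices, not taken from the text.

```python
import math

def p_norm(x, p):
    """Return the p-norm of the sequence x (p >= 1), or the max norm if p is math.inf."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0]                  # illustrative vector
taxicab = p_norm(x, 1)           # sum of absolute values of the components
euclidean = p_norm(x, 2)         # usual Euclidean length
max_norm = p_norm(x, math.inf)   # largest absolute component
print(taxicab, euclidean, max_norm)
```

For $p = 1$ the norm adds up the distance traveled along each coordinate direction separately, which is one way to think about the "taxicab" name.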
The INNER PRODUCT generalizes the dot product of elementary algebra and measures the alignment of a pair of vectors: an inner product
$$(\alpha x, y) = \alpha (x, y)$$
$$(x, x) > 0, \quad \text{if } x \neq 0$$
The overbar denotes complex conjugate. Notice that the square root of the inner product $\sqrt{(x, x)}$ satisfies all the properties of a norm, so it is a measure of the length of $x$. The usual inner product in $\mathbb{R}^n$ is
$$(x, y) = \sum_{i=1}^{n} x_i y_i$$
in which case $\sqrt{(x, x)} = \|x\|_2$. This is a straightforward generalization of the formula for the dot product $x \cdot y$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ and has the same geometric meaning
$$(x, y) = \|x\|_2 \|y\|_2 \cos\theta$$
where $\theta$ is the angle between the vectors. See Exercise 1.1 for a derivation. If we are considering a space of complex numbers rather than real numbers, the usual inner product becomes
$$(x, y) = \sum_{i=1}^{n} x_i \bar{y}_i$$
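The inner product formulas above can be checked numerically with a small Python sketch. It assumes the convention that the second argument is conjugated, so that $(\alpha x, y) = \alpha(x, y)$ holds; the helper names `inner`, `norm`, and `angle` and the sample vectors are illustrative only.

```python
import math

def inner(x, y):
    """Usual inner product; the second argument is conjugated (no effect for real entries)."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))

def norm(x):
    """Length induced by the inner product: sqrt((x, x))."""
    return math.sqrt(inner(x, x).real)

def angle(x, y):
    """Angle between two real vectors from (x, y) = |x| |y| cos(theta)."""
    return math.acos(inner(x, y).real / (norm(x) * norm(y)))

x = [1.0, 0.0]
y = [1.0, 1.0]
theta = angle(x, y)            # angle between the two real vectors

z = [1j, 1.0]                  # a complex vector
w = [1.0, 1j]
alpha = 2 + 1j
lhs = inner([alpha * zi for zi in z], w)   # (alpha z, w)
rhs = alpha * inner(z, w)                  # alpha (z, w)
print(theta, inner(z, z), lhs, rhs)
```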
Finally, we can represent a vector $x$ in $\mathbb{R}^n$ as a single column of elements, a COLUMN VECTOR, and define its TRANSPOSE $x^T$ as a ROW VECTOR
is
Otherwise the set is LINEARLY DEPENDENT.
A
= 0 for all i.
x = alel +
+ a3e3 +
directions form a basis. But more generally, any three LI vectors form a basis for $\mathbb{R}^3$.
Although any set of LI vectors that span a space form a basis, some bases are more convenient than others. The elements of an ORTHONORMAL (ON) basis satisfy these properties
$$(e_i, e_j) = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$
The symbol $\delta_{ij}$ is called the KRONECKER DELTA. In an orthonormal basis, any vector can be expressed
1.2 Linear Operators and Matrices
An operator $A$ is a mapping that takes elements of one set (the DOMAIN of $A$) and converts them into elements of another (the RANGE of $A$). LINEAR operators satisfy
(1.2)
$$A(\alpha u) = \alpha (Au)$$
'422
Aln
Aml Am2
Thefirst subscript of each element denotes its row, while the second
denotesits column. The transformation of a vector
into another
Y=
xn
thenoccursthrough matrix-vector multiplication. That is: y = Ax,
whichmeans
Linear
Algebra
+ X3C3+
1. $A$ is SQUARE if $m = n$.
2. $A$ is DIAGONAL if $A$ is square and $A_{ij} = 0$ for $i \neq j$. This is
1,...q
(AB)ij = ui
Equivalently,
$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, q$$
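The index formula for the matrix product translates almost directly into code. The sketch below is a minimal pure-Python version, assuming matrices are stored as lists of rows; the name `matmul` and the two sample matrices are illustrative.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x q matrix B: (AB)_ij = sum_k A_ik B_kj."""
    m, n, q = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
AB = matmul(A, B)   # B on the right swaps the columns of A
BA = matmul(B, A)   # B on the left swaps the rows of A
print(AB, BA)
```

Comparing `AB` and `BA` shows directly that the order of the factors matters.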
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
- 7
Since these are not equal, we conclude that the two vector operations do not commute.
$$(AB)^T = B^T A^T$$
$$(ABC)^T = C^T B^T A^T$$
If $A$, $x$, and $y$ are real, then the inner product between the vector $Ax$ and the vector $y$ is given by
$$(Ax)^T y = x^T A^T y \tag{1.3}$$
$$(Lx, y) = (x, L^* y) \tag{1.4}$$
in the
$$(ABC)_{ij} = \sum_{k} \sum_{l} A_{ik} B_{kl} C_{lj}$$
the indices $k$ and $l$ appear twice in the summations, while the indices $i$ and $j$ only appear once. This observation suggests a simplified notation for products, in which the presence of the repeated indices implies
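Solution
(a) $(Ax, y) = (x, \bar{A}^T y)$
$$(Ax, y) = A_{ij} x_j \bar{y}_i = x_j A_{ij} \bar{y}_i = x_j \overline{\bar{A}_{ij} y_i} = x_j \overline{(\bar{A}^T)_{ji} y_i} = (x, \bar{A}^T y)$$
$$(AB)^T_{ij} = (AB)_{ji} = A_{jk} B_{ki} = B_{ki} A_{jk} = B^T_{ik} A^T_{kj} = (B^T A^T)_{ij}$$
$$(A + A^T)_{ij} = A_{ij} + A_{ji} = A_{ji} + A_{ij} = (A + A^T)_{ji} = (A + A^T)^T_{ij}$$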
= AkiAkj
= AjkAki
(AAT) ij = AikAkj
= AikAjk
= AjkAik
= (ATA) ji
= (ATA)Tj
AjkAki
(AAT)T
11
Given two LI vectors $v_1$ and $v_2$ in $\mathbb{R}^n$, Gram-Schmidt uses projection to construct an orthogonal pair
$$u_1 = v_1$$
$$u_2 = v_2 - \frac{(v_2, u_1)}{\|u_1\|^2} u_1$$
In higher dimensions, where we have $v_3$, $v_4$, etc., we continue the process, subtracting off the components parallel to the previously determined orthogonal vectors
$$u_3 = v_3 - \frac{(v_3, u_1)}{\|u_1\|^2} u_1 - \frac{(v_3, u_2)}{\|u_2\|^2} u_2$$
and so on.
For a matrix $Q$ with orthonormal columns, $Q^TQ = I$.
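The Gram-Schmidt recursion above can be sketched in a few lines of Python. This is the classical form, in which each new vector has its components along the previously built $u_i$ subtracted off; the function names and the three sample vectors are assumptions made for illustration.

```python
def dot(x, y):
    """Usual real inner product."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Return mutually orthogonal vectors spanning the same space as the LI input vectors."""
    basis = []
    for v in vectors:
        u = list(v)
        for w in basis:
            coeff = dot(v, w) / dot(w, w)          # (v, w) / |w|^2
            u = [ui - coeff * wi for ui, wi in zip(u, w)]
        basis.append(u)
    return basis

v1 = [1.0, 1.0, 0.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 1.0]
u1, u2, u3 = gram_schmidt([v1, v2, v3])
print(u1, u2, u3)
```

Dividing each $u_i$ by its length would give an orthonormal set. In floating-point work on nearly dependent vectors, the modified Gram-Schmidt variant or a QR routine is usually preferred over this classical form.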
$$Pv_2 = (\hat{u}_1 \hat{u}_1^T) v_2$$
That is, $P$ is given by what we will call the OUTER PRODUCT between $\hat{u}_1$ and itself: $\hat{u}_1 \hat{u}_1^T$. More generally, the outer product $uv^T$ between vectors $u$ and $v$ is a matrix, called a DYAD, that satisfies the following properties
$$(uv^T)_{ij} = u_i v_j$$
$$(uv^T)w = u(v^Tw)$$
$$w^T(uv^T) = (w^Tu)v^T$$
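A short Python sketch can make the dyad properties concrete: it builds the projection matrix as the outer product of a unit vector with itself and applies it to a second vector. The vectors and helper names here are illustrative assumptions.

```python
import math

def outer(u, v):
    """Dyad u v^T: element (i, j) is u_i * v_j."""
    return [[ui * vj for vj in v] for ui in u]

def matvec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

u = [3.0, 4.0]
length = math.sqrt(sum(c * c for c in u))
u_hat = [c / length for c in u]            # unit vector along u

P = outer(u_hat, u_hat)                    # projection onto the direction of u
v = [2.0, 1.0]
Pv = matvec(P, v)                          # component of v parallel to u
residual = [vi - pi for vi, pi in zip(v, Pv)]
print(Pv, residual)
```

Applying `P` a second time leaves `Pv` unchanged, and the residual `v - Pv` is orthogonal to `u`, which is the step Gram-Schmidt relies on.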
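$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1\ell} \\ A_{21} & A_{22} & \cdots & A_{2\ell} \\ \vdots & \vdots & & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{k\ell} \end{bmatrix}$$
where each $A_{ij}$ is an $m_i \times n_j$ submatrix of $A$. Note that $\sum_{i=1}^{k} m_i = m$ and $\sum_{j=1}^{\ell} n_j = n$. Two of the more useful matrix partitions are column partitioning and row partitioning. If we let the $m$-vectors $a_i$, $i = 1, 2, \ldots, n$ denote the $n$ column vectors of $A$, then the column partitioning of $A$ is
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$$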
al
a2
am
A12
Akl Ak2
AIC
Bll
Bml
B12
Bin
Bmn
Linear Algebra
14
1. Scalarmultiplication.
Ml 1 A12
RAk1 Ak2
2. Transpose.
11
21
18
2e
ke
4. Matrix multiplication. If $q_i = r_i$ for $i = 1, \ldots, \ell$, then we say the partitioned matrices conform, and the matrices can be multiplied
Cll
Cle
Ckl
These formulas are all easily verified by reducing all the partitioned matrices back to their scalar elements. Notice that we do not have to remember any new formulas. These are the same formulas that we learned for matrix operations when the submatrices $A_{ij}$ and $B_{ij}$ were scalar elements (except we normally do not write the transpose for scalars in the transpose formula). The conclusion is that all the usual rules apply provided that the matrices are partitioned so that all the implied operations are defined.
in the form
Ax = b
15
b e
and x (e Rti) is the vector of
whereA e R
unknowns.
consider the vectors Ci that form the columns of A. The solution x (if
$A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$
has at least one solution $x$ if and only if the columns of $A$ are not linearly independent from $b$.
b e R3. Conversely, if the column vectors of A are not LI, then they do
(1) Ax
= 0 (the homogeneous
(2)Ax
(1)Ax
solutionfor all b * 0.
(b) If the columns of $A$ are NOT LI, then the matrix is SINGULAR or NONINVERTIBLE. In this case:
solutions comprise the NULLSPACEof A.
(2) Forb * 0, Ax
= b has either:
of A, or
in the RANGE
Linear
16
solution to Ax
ticular
Algeb
= b and any
combination
x = XH + XP where Axpofth
i.e.,
0,
=
Ax
lutions of
Qhd
1.3.2 Solving $Ax = b$: LU Decomposition
$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 8 \\ 0 & 0 & 7 \end{bmatrix}$$
All the elements below the diagonal are zero. Since the third row has only one nonzero element, it corresponds to a single equation with a single unknown. Once this equation is solved, the equation above it has only a single unknown and is therefore easy to solve, and so on.
LU decomposition depends on the fact that a square matrix $A$ can be written $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. Using this fact, solving $Ax = b$ consists of three steps, the first of which takes the most computation:
1. Find $L$ and $U$ from $A$: LU factorization.
2. Solve $Lc = b$ for $c$: forward substitution.
3. Solve $Ux = c$ for $x$: back substitution.
$$A = \begin{bmatrix} 3 & 5 & 2 \\ 0 & 8 & 2 \\ 6 & 2 & 8 \end{bmatrix}$$
row
0 82
Step c. Now the first column of the matrix is zero below the diagonal. We move to the second column. Replace row 3 with a linear combination of row 2 and row 3 that makes the second element of row 3 zero
$$\begin{bmatrix} 3 & 5 & 2 \\ 0 & 8 & 2 \\ 0 & 0 & 6 \end{bmatrix} = U$$
This matrix is now the upper triangular matrix $U$. For a matrix in higher dimensions, the procedure would be continued until all of the elements below the diagonal were zero. The matrix $L$ is simply composed of the multipliers $L_{ij}$ that were computed at each step
$$L = \begin{bmatrix} 1 & 0 & 0 \\ L_{21} & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix}$$
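The elimination steps of this example can be reproduced with a short Python sketch. It performs LU factorization without row exchanges (so it assumes no zero pivot is met), followed by forward and back substitution, and also forms the product of the pivots. The function names and the right-hand side `b` are illustrative; `b` is chosen so that the solution is a vector of ones.

```python
def lu_factor(A):
    """Return (L, U) with A = L U, using elimination without row exchanges."""
    n = len(A)
    U = [[float(v) for v in row] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; row exchanges would be needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]            # multiplier stored in L
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]       # row_i <- row_i - L_ik * row_k
    return L, U

def forward_sub(L, b):
    """Solve L c = b for lower triangular L with unit diagonal."""
    c = []
    for i in range(len(b)):
        c.append(b[i] - sum(L[i][j] * c[j] for j in range(i)))
    return c

def back_sub(U, c):
    """Solve U x = c for upper triangular U."""
    n = len(c)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[3, 5, 2],
     [0, 8, 2],
     [6, 2, 8]]
L, U = lu_factor(A)
b = [10.0, 10.0, 16.0]            # equals A times [1, 1, 1]
x = back_sub(U, forward_sub(L, b))
pivot_product = U[0][0] * U[1][1] * U[2][2]
print(L, U, x, pivot_product)
```

Once `L` and `U` are stored, each additional right-hand side costs only the two triangular solves. Production codes add row pivoting for numerical robustness.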
Now, for any vector $b$ the simple systems $Lc = b$ and then $Ux = c$
arises. One often finds a complicated definition based on submatrices, but having the LU decomposition in hand a much simpler formula emerges (Strang, 1980). For a square matrix $A$ that can be decomposed into $LU$, the determinant is the product of the pivots
$$\det A = \prod_{i=1}^{n} U_{ii}$$
If $m$ permutations of rows must be performed to complete the decomposition, then the decomposition has the form $PA = LU$, and
$$\det A = (-1)^m \prod_{i=1}^{n} U_{ii}$$
The matrix $A^{-1}$ exists if and only if $\det A \neq 0$, in which case $\det A^{-1} = (\det A)^{-1}$. Another key property of the determinant is that
1.3.4 Rank of a Matrix
Before we define the rank of a matrix, it is useful to establish the following property of matrices: the number of linearly independent columns of a matrix is equal to the number of linearly independent rows.
Example 1.8: Linearly independent columns, rows of a matrix
Given $A \in \mathbb{R}^{m \times n}$. Assume $A$ has $c$ linearly independent columns and $r$ linearly independent rows. Show $c = r$.
Solution
al
a2
am
Each column of the $A$ matrix can be expressed as a linear combination of the $c$ linearly independent $v_i$ vectors. We denote this statement as follows
aj
A:mxn
V:mxc
A:cxn
linearcombination of the Vi representing the jth column vector of matrixA. If we place all the j,j = 1, . .. , n next to each other, we have
matrixA. Next comes the key step. Repartition the relationship above
as follows
51
ai
A:mxn
V:mxc
A:cxn
20
lineQF
of
elementsof the ith row of V, written as the
row vector
resslble as linear
combinations
o;
$$R(A) = \{ y \in \mathbb{R}^m \mid y = Ax, \; x \in \mathbb{R}^n \}$$
The range of a matrix is the set of all vectors that can be generated with the product $Ax$ for all $x \in \mathbb{R}^n$. Equivalently, if $v_i \in \mathbb{R}^m$ are the linearly independent columns of $A$, then the range of $A$ is the span of the
$$N(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \}$$
$$R(A^T) = \{ x \in \mathbb{R}^n \mid x = A^T y, \; y \in \mathbb{R}^m \}$$
$$N(A^T) = \{ y \in \mathbb{R}^m \mid A^T y = 0 \}$$
$$R(A) \perp N(A^T)$$
$$R(A^T) \perp N(A)$$
y Ta1 Y a2
relationship follows by switching the roles of $A$ and $A^T$ in the preceding argument (see Exercise 1.15). Note that the range of a matrix is sometimes called the image, and the null space is sometimes called the kernel.
We can also state this result in terms of the null spaces. A solution to $Ax = b$ exists for all $b$ if and only if $N(A^T) = \{0\}$ and a solution to $Ax = b$ is unique if and only if $N(A) = \{0\}$. More generally, a solution to $Ax = b$ exists for a particular $b$ if and only if $b \in R(A)$, by the definition of the range of $A$. From the fundamental theorem, that means $y^T b = 0$ for all $y \in N(A^T)$. And if $N(A^T) = \{0\}$ we recover the existence condition 1 stated above. These statements provide a succinct generalization of the results described in Section 1.3.1.
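The existence condition $y^Tb = 0$ for all $y \in N(A^T)$ can be tried on a small hypothetical case. In the Python sketch below, the $3 \times 2$ matrix, the basis vector for $N(A^T)$, and the two right-hand sides are all assumed for illustration; the null-space basis is supplied by hand rather than computed.

```python
def dot(x, y):
    """Usual real inner product."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical 3 x 2 matrix: every vector in its range has a zero third element.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
null_AT = [[0.0, 0.0, 1.0]]          # basis for N(A^T), found by inspection

def has_solution(b, null_basis, tol=1e-12):
    """Ax = b is solvable iff b is orthogonal to every basis vector of N(A^T)."""
    return all(abs(dot(y, b)) <= tol for y in null_basis)

columns = [list(col) for col in zip(*A)]
b_in_range = [2.0, -1.0, 0.0]
b_not_in_range = [2.0, -1.0, 3.0]
print(has_solution(b_in_range, null_AT), has_solution(b_not_in_range, null_AT))
```

The same basis vector is orthogonal to every column of `A`, which is the statement $R(A) \perp N(A^T)$ for this small case.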
1.3.7 Least-Squares Solution for Overdetermined Systems
Now consider the OVERDETERMINED problem, $Ax = b$ where $A \in$
to the function
$$P(x) = x^TA^TAx - 2x^TA^Tb + b^Tb$$
Ax = b
R(AT)
N(AT)
MA)
operationyields
= AljAjkXk Xl
or in matrix form
dx
Therefore, the condition that $P$ be minimized is equivalent to solving
These are called the NORMAL EQUATIONS.
Notice
that
defined
and write the least-squares solution to the normal equations as
$$x_{ls} = (A^TA)^{-1}A^Tb$$
The matrix on the right-hand side is ubiquitous in least-squares problems; it is known as the pseudoinverse of $A$ (or Moore-Penrose pseudoinverse in honor of mathematician E. H. Moore and physicist Roger Penrose) and given the symbol $A^\dagger$. The least-squares solution is
$$x_{ls} = A^\dagger b$$
$$A^\dagger = (A^TA)^{-1}A^T$$
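As a small check of the normal equations, the Python sketch below fits a straight line to three data points by forming $A^TA$ and $A^Tb$ explicitly and solving the resulting $2 \times 2$ system. The data, the helper names, and the use of Cramer's rule for the $2 \times 2$ solve are illustrative assumptions.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve_2x2(M, r):
    """Solve the 2 x 2 system M z = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0.0:
        raise ZeroDivisionError("A^T A is singular; columns of A are not LI")
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

# Overdetermined system: fit b ~ x0 + x1 * t at three sample times (assumed data).
t = [0.0, 1.0, 2.0]
b = [1.0, 2.0, 4.0]
A = [[1.0, ti] for ti in t]

AT = transpose(A)
x_ls = solve_2x2(matmul(AT, A), matvec(AT, b))     # normal equations A^T A x = A^T b
residual = [bi - yi for bi, yi in zip(b, matvec(A, x_ls))]
AT_residual = matvec(AT, residual)                 # should vanish at the minimum
print(x_ls, residual, AT_residual)
```

Forming $A^TA$ explicitly squares the condition number of the problem, so library routines based on QR or the SVD are generally preferred for ill-conditioned data.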
A = 21
I
1
1
¹Putting proof aside for a moment, the condition is at least easy to remember. The $A$ in the overdetermined system for which we apply least squares has more rows than columns, so the rank of $A$ is at most the number of columns. The least-squares solution is unique if and only if the rank is equal to this largest value, i.e., the rank of $A$ equals the number of columns.
(g) No. The range of A does not have a nonzero third element and
this b does.
(h) The solution is
100
000
0
0
O
0
-1
b0 =
-1
The spaces $R(A)$ and $N(A^T)$ are orthogonal, and therefore so are $b_0$ and $r$. The method of least squares projects $b$ into the range
solve this equation for $z$ and insert into the equation $x = A^Tz$ that we
²Transpose the result of Exercise 1.41.
minimum-norm solution
(1.6)
where each
[=
kg m al m a2 kg
s2m2
3 kg ms a4
m2s2
(m) a5 (m) a6
29
m = 3,n = 6, and x =
- 0,whereA e
We
PUR
Ap
113 =
groups
as the
Readerswith a background in fluid mechanics will recognize
(Bird, Stewart, and Lightfoot, 2002).
NUMBER
REYNOLDS
Because the solution to $Ax = 0$ is not unique, this choice of dimensionless groups is not unique: each $\Pi_i$ can be replaced by any nonzero power of it, and the $\Pi_i$s can be multiplied by one another and by any constant to yield other equally valid dimensionless groups. For example, $\Pi_2$ can be replaced in this set by $\Pi_2\Pi_3$; fluid mechanicians recognize this quantity as the FRICTION FACTOR.
Now we return to the general case where we have $n$ quantities and $m$ units. Because $A$ has $m$ LI rows (and thus $m$ LI columns; see Example 1.8), it has a nullspace of $n - m$ dimensions, and therefore there is an $n - m$ dimensional subspace of vectors $x$ that will solve $Ax = 0$. This result gives us the BUCKINGHAM PI THEOREM: given a problem with $n$ dimensional parameters containing $m$ units, the problem can be recast in terms of $l = n - m$ dimensionless groups (Lin and Segel, 1974). This theorem holds under the condition that $\operatorname{rank}(A) = m$; in principle it is possible for the rank of $A$ to be less than $m$. One somewhat artificial example where this issue arises is the following: if all units of length are
One might
species B (or mole or mass fractions of these species) would be independent units, but they are not. Unlike kilograms and meters, which
cannot be added to one another, moles of A and moles of B can be added
$$f(x) = 0 \tag{1.7}$$
$$f(x_e) = f(x + d) = 0$$
We next expand the right-hand side in a Taylor series around $x$. It is now convenient to switch to component notation to express the second-order Taylor series approximation for vector $f$
$$f_i(x + d) = f_i(x) + \left.\frac{\partial f_i}{\partial x_j}\right|_x d_j + \frac{1}{2} \left.\frac{\partial^2 f_i}{\partial x_j \partial x_l}\right|_x d_j d_l + O(\|d\|^3)$$
where the notation $O(\delta^p)$ denotes terms that are "of order $\delta^p$," which means that they decay to zero at least as fast as $\delta^p$ in the limit $\delta \to 0$.
Figure 1.3: An iteration of the Newton-Raphson method for solving $f(x) = 0$ in the scalar case.
An approximate solution to this equation can be found if the terms that are quadratic and higher degree in $d$ are neglected, yielding the linearized problem
$$f_i(x + d) = f_i(x) + \left.\frac{\partial f_i}{\partial x_j}\right|_x d_j$$
Setting $f(x + d) = 0$ and defining the JACOBIAN matrix $J_{ij}(x) = \partial f_i / \partial x_j$ this can be rearranged into the linear system
$$J(x) d = -f(x)$$
This equation can be solved for $d$ (e.g., by LU decomposition) to yield a new guess for the solution $x^+ = x + d$ in which we use the superscript $x^+$ to denote the variable $x$ at the next iterate. Denoting the solution by $d = -J^{-1}(x) f(x)$, the process can be summarized as
$$x^+ = x - J^{-1}(x) f(x) \tag{1.8}$$
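The iteration (1.8) can be sketched in Python for a system of two equations. The test problem (a circle of radius 2 intersected with the line $x_1 = x_2$), the starting guess, and the function names are assumptions for illustration; the linear system $J(x)d = -f(x)$ is solved by Cramer's rule only because it is $2 \times 2$.

```python
import math

def f(x):
    """Illustrative system: x1^2 + x2^2 = 4 and x1 = x2."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]

def jacobian(x):
    """J_ij = d f_i / d x_j for the system above."""
    return [[2.0 * x[0], 2.0 * x[1]],
            [1.0, -1.0]]

def solve_2x2(J, r):
    """Solve J d = r by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if det == 0.0:
        raise ZeroDivisionError("singular Jacobian")
    return [(r[0] * J[1][1] - J[0][1] * r[1]) / det,
            (J[0][0] * r[1] - J[1][0] * r[0]) / det]

def newton(f, jac, x0, tol=1e-12, max_iter=50):
    """Repeat x+ = x + d with J(x) d = -f(x) until |f| is below tol."""
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        d = solve_2x2(jac(x), [-v for v in fx])
        x = [xi + di for xi, di in zip(x, d)]
    raise RuntimeError("Newton-Raphson did not converge")

root = newton(f, jacobian, [1.0, 0.5])
print(root)
```

From this starting guess the iterates approach $x_1 = x_2 = \sqrt{2}$. A different starting guess could lead to the other root at $-\sqrt{2}$ or fail if the Jacobian becomes singular.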
is
Ei+= Ei Ji-jllxe+ Xl
El + O
1 Jjk
O + JjklxeEk +
EkEl + O
2 Xl
Ji-j1
I -1 Jjk
Jjk
+
-J
X1
2 ij Xl
= Jj1Jjk I
ikEk+
EIEk
Ji-j1
I -1 Jjk
Jjk
+
J
X1
EIEk + O
Ji-j1
X1
Jjk + -J -1 Jjk
2 ij Xl
16k+0 (11611
3)
-1 Jjk
(Jj1Jjk)
AJ
Xl
2 ij Xl
1 -l Jjk
ik
-J
Xl
Jjk
I 1
EIEk+ O (116113)
EIEk+ O (116113)
+ O (11611
3)
10-8. Inde d
| 4 The
Problem
Algebraic Eigenvalue
33
x and x12,where
Nowconsider new variables
x'l = TIIXI + T12 X 2
X'2= T21Xl +
mationof the vector y of the form y' = Wy, then y' = WAT-1x'
Thematrix WAT-I provides the mapping from x' to y'. Some importantcoordinatetransformations that take advantage of the properties
of the operator A are described in Section 1.4.
Eigenvalue
problems arise in a variety of contexts. One of the most
importantis in the solution of systems of linear ordinary differential
equations.Consider the system of two ordinary differential equations
dz
dt
= Az
(1.9)
know about
(1.10)
-1
$$Ax = A(x'_1 v_1 + x'_2 v_2) = x'_1 Av_1 + x'_2 Av_2 = x'_1 \lambda_1 v_1 + x'_2 \lambda_2 v_2 = Q\Lambda x'$$
where
$$A = Q\Lambda Q^T$$
As an example of the usefulness of this result, consider the system of equations
$$\frac{dx}{dt} = Ax = Q\Lambda Q^T x$$
$$\frac{dx'}{dt} = \Lambda x'$$
or $dx'_1/dt = x'_1$, $dx'_2/dt = 3x'_2$. In the eigenvector basis, the differential equations are decoupled. They can be solved separately.
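The decoupling can be illustrated numerically. The Python sketch below assumes a specific symmetric matrix, $A = [[2, 1], [1, 2]]$, chosen only because its eigenvalues are 1 and 3 as in the decoupled equations above; the matrix actually used in the text is not recoverable from this passage, and all names are illustrative.

```python
import math

s = 1.0 / math.sqrt(2.0)
Q = [[s, s],
     [-s, s]]            # columns are the orthonormal eigenvectors (assumed matrix)
lam = [1.0, 3.0]         # corresponding eigenvalues

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Rebuild A = Q Lambda Q^T from its eigen-decomposition.
A = [[sum(Q[i][k] * lam[k] * Q[j][k] for k in range(2)) for j in range(2)]
     for i in range(2)]

def solve(x0, t):
    """Solve dx/dt = A x by transforming to x' = Q^T x, where the equations decouple."""
    xp0 = matvec(transpose(Q), x0)
    xp = [xp0[i] * math.exp(lam[i] * t) for i in range(2)]   # each mode evolves alone
    return matvec(Q, xp)

x_t = solve([1.0, 0.0], 0.5)
print(A, x_t)
```

For this assumed matrix and initial condition the closed-form solution is $x_1 = (e^{t} + e^{3t})/2$, $x_2 = (e^{3t} - e^{t})/2$, which the decoupled calculation reproduces.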
The above representation of $A$ can be found for any matrix $A$ that satisfies the self-adjointness condition $A = \bar{A}^T$. We have the following theorem.
Theorem 1.11 (Self-adjoint matrix decomposition). If $A \in \mathbb{C}^{n \times n}$ is self-adjoint, then there exists a unitary $Q \in \mathbb{C}^{n \times n}$ and real, diagonal $\Lambda \in \mathbb{R}^{n \times n}$ such that
$$A = Q\Lambda Q^*$$
pair so that $Av_j = \lambda_j v_j$. Multiplying $Av_i = \lambda_i v_i$ on the left by $v_j^*$, and $Av_j = \lambda_j v_j$ on the left by $v_i^*$ and subtracting gives $(\lambda_i - \lambda_j) v_i^* v_j = 0$. If the eigenvalues are distinct, $\lambda_i \neq \lambda_j$, this equation can hold only if $v_i^* v_j = 0$, and therefore $v_i$ and $v_j$ are orthogonal.
For the case of repeated eigenvalues, since orthogonality holds for eigenvalues that are arbitrarily close together but unequal, we might expect intuitively that it continues to hold when the eigenvalues become equal. This turns out to be true, and we delay the proof until we have introduced the Schur decomposition in Section 1.4.6.
1.4.3 General (Square) Matrices
$$A = S\Lambda S^{-1} \tag{1.11}$$
$$A = MJM^{-1} \tag{1.12}$$
where $J = M^{-1}AM$ is organized as follows: each distinct eigenvalue appears on the diagonal with the nondiagonal elements of the corresponding row and column being zero, just as above. However, repeated eigenvalues appear in JORDAN BLOCKS with this structure (shown here for an eigenvalue of multiplicity three)
$$\begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}$$
The eigenvectors corresponding to the distinct eigenvalues are the corresponding columns
and show that it cannot be put in the form of (1.11),but can be put in
the form of (1.12).
Solution
The characteristic equation for $A$ is $(\lambda - 3)^2 = 0$, so $A$ has the repeated eigenvalue $\lambda = 3$. The eigenvector is determined from
satisfying
bf-I AM = J =
-t
The first of these is simply the equation determining the true eigenvector $v_1$, while the second will give us the generalized eigenvector $v_2$. For the present problem this equation is
0 1/2
onecan show that
o 2
and that
J = M -I AM =
for any
result.
0,
V x e Rn
O,
e eig(A)
0), if
42
Linear Algebra
2. A 0
R 20, Reeig(A)
3.
B TABO
VB
BTAB > 0
5. A > 0 and B full column rank
BTAB > O
7. A > O
8. For A 0,
A = Al + A2 > O
Az > 0 V nonzero z e
Ax = O
If symmetric matrix $A$ is not positive semidefinite nor negative semidefinite, then it is termed indefinite. In this case $A$ has both positive and negative eigenvalues.
$$y = Ax \tag{1.13}$$
all the components of the vector $y$ are coupled to all the components of the vector $x$ via the elements of $A$, all of which are generally nonzero. We can always rewrite this transformation using the eigenvalue decomposition as
$$y = MJM^{-1}x$$
Now consider the coordinate transformation $x' = M^{-1}x$ and $y' = M^{-1}y$. In this new coordinate system, the linear transformation, (1.13) becomes
$$y' = Jx'$$
Now $x = T^{-1}x'$, and thus
$$AT^{-1}x' = \lambda T^{-1}x'$$
Multiplying both sides by $T$ to eliminate $T^{-1}$ on the right-hand side yields
$$TAT^{-1}x' = \lambda x'$$
Recall that we have done nothing to the eigenvalues; they are the same in the last equation of this sequence as the first. Thus the eigenvalues of $TAT^{-1}$ are the same as the eigenvalues of $A$. Therefore, if two matrices are related by a transformation $B = TAT^{-1}$, which is called a SIMILARITY TRANSFORMATION, their eigenvalues are the same. In other words, eigenvalues of a matrix are INVARIANT under similarity transformations.
In many situations, invariants other than the eigenvalues are used.
$$\operatorname{tr} A = \sum_{i=1}^{n} A_{ii}$$
$$\det A = (-1)^m \prod_{i=1}^{n} U_{ii}$$
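Invariance under a similarity transformation is easy to test on a $2 \times 2$ case, where the trace and determinant fix the characteristic polynomial $\lambda^2 - (\operatorname{tr} A)\lambda + \det A = 0$. In the Python sketch below the matrices `A` and `T` and all helper names are illustrative assumptions, and the eigenvalue formula assumes real eigenvalues.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(T):
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    if det == 0.0:
        raise ZeroDivisionError("T is singular")
    return [[T[1][1] / det, -T[0][1] / det],
            [-T[1][0] / det, T[0][0] / det]]

def trace(A):
    return A[0][0] + A[1][1]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) = 0 (assumes they are real)."""
    disc = math.sqrt(trace(A) ** 2 - 4.0 * det(A))
    return sorted([(trace(A) - disc) / 2.0, (trace(A) + disc) / 2.0])

A = [[4.0, 1.0],
     [2.0, 3.0]]
T = [[1.0, 2.0],
     [0.0, 1.0]]
B = matmul(matmul(T, A), inverse_2x2(T))     # B = T A T^{-1}
print(B, eigenvalues_2x2(A), eigenvalues_2x2(B))
```

The entries of `B` differ from those of `A`, yet the trace, determinant, and eigenvalues agree.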
to characterize the molecule by spectroscopy and are important in determining many of its properties, such as heat capacity and reactivity. We examine here a simple model of a molecule to illustrate the origin and nature of these vibrations.
Let the $\alpha$th atom of a molecule be at position $x_\alpha = [x_\alpha, y_\alpha, z_\alpha]^T$ and have mass $m_\alpha$. The bond energy of the molecule is $U(x_1, x_2, x_3, \ldots, x_N)$ where $N$ is the number of atoms in the molecule. Newton's second law for each atom is
$$m_\alpha \frac{d^2 x_\alpha}{dt^2} = -\frac{\partial U(x_1, \ldots, x_N)}{\partial x_\alpha}$$
Let $X = [x_1^T, x_2^T, \ldots, x_N^T]^T$ and $M$ be a $3N \times 3N$ diagonal matrix of the atomic masses, so that
$$M_{ij} \frac{d^2 X_j}{dt^2} = -\frac{\partial U}{\partial X_i}$$
Writing $X = X_{eq} + \tilde{X}$ and expanding for small $\tilde{X}$ gives
$$\frac{\partial U(X_{eq} + \tilde{X})}{\partial X_i} \approx H_{ik} \tilde{X}_k$$
where
$$H_{ik} = \left.\frac{\partial^2 U}{\partial X_i \partial X_k}\right|_{X_{eq}}$$
$$M_{ij} \frac{d^2 \tilde{X}_j}{dt^2} = -H_{ik} \tilde{X}_k$$
By definition, $H$ is symmetric. Furthermore, rigidly translating the entire molecule does not change its bond energy, so $H$ has three zero eigenvalues, with eigenvectors
$$[1, 0, 0, 1, 0, 0, \ldots, 1, 0, 0]^T, \quad [0, 1, 0, 0, 1, 0, \ldots, 0, 1, 0]^T, \quad [0, 0, 1, 0, 0, 1, \ldots, 0, 0, 1]^T$$
corresponding to translation in the $x$, $y$, and $z$ directions, respectively. Furthermore, because $X_{eq}$ is a minimum of the bond energy, $H$ is also positive semidefinite.
We expect the molecule to vibrate, so we will seek oscillatory solutions. A convenient way to do so is to let
$$\tilde{X}(t) = Z e^{i\omega t} + \bar{Z} e^{-i\omega t}$$
recalling that for real $\omega$, $e^{i\omega t} = \cos \omega t + i \sin \omega t$. Substituting into the governing equation yields
$$-\omega^2 M_{ij} (Z_j e^{i\omega t} + \bar{Z}_j e^{-i\omega t}) = -H_{ik} (Z_k e^{i\omega t} + \bar{Z}_k e^{-i\omega t})$$
Gathering the terms proportional to $e^{i\omega t}$ gives
$$\omega^2 M Z = H Z \tag{1.14}$$
We can learn more about this problem by considering the properties of $M$ and $H$. Since $M$ is diagonal and the atomic masses are positive, $M$ is clearly positive definite. Also recall that $H$ is symmetric positive semidefinite. Writing $M = L^2$, where $L$ is diagonal and its diagonal entries are the square roots of the masses, we can write that $\omega^2 L^2 Z = HZ$. Multiplying by $L^{-1}$ on the left yields $\omega^2 LZ = L^{-1}HZ$ and letting $\tilde{Z} = LZ$ results in $\omega^2 \tilde{Z} = L^{-1}HL^{-1}\tilde{Z}$. This has the form of an eigenvalue problem $\tilde{H}\tilde{Z} = \omega^2 \tilde{Z}$, where $\tilde{H} = L^{-1}HL^{-1}$. Solving this eigenvalue problem gives the frequencies $\omega$ at which the molecule vibrates. The corresponding eigenvectors $\tilde{Z}$, when transformed back into the original coordinates via $Z = L^{-1}\tilde{Z}$, give the so-called "normal modes." Each frequency is associated with a mode of vibration that in general involves different atoms of the molecule in different ways. Because $\tilde{H}$ is symmetric, these modes form an orthogonal basis in which to describe the motions of the molecule. A further result can be obtained by multiplying (1.14) on the left by $Z^T$, yielding
$$\omega^2 Z^T M Z = Z^T H Z$$
Because $M$ is positive definite and $H$ is positive semidefinite, $\omega^2 \geq 0$; we conclude that the frequencies $\omega$ are real and thus that the dynamics are purely oscillatory.
Observe that the quantity $Z^T M Z$ arises naturally in this problem as a weighted inner product. In the current case, the eigenvectors $\tilde{Z}$ are orthogonal under the usual "unweighted" inner product, in which case the vectors $Z = L^{-1}\tilde{Z}$ are orthogonal under the weighted inner product with $W = M$.
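The mass-weighted eigenvalue problem above can be tried on the smallest possible case. The sketch below assumes a hypothetical one-dimensional "diatomic molecule": two masses joined by a harmonic spring, with illustrative values for `m1`, `m2`, and the spring constant `k` (none of these come from the text). It forms $\tilde{H} = L^{-1}HL^{-1}$ and finds the two values of $\omega^2$ from the trace and determinant of the symmetric 2x2 matrix.

```python
import math

# Sketch: normal-mode frequencies of a hypothetical 1D diatomic molecule
# (two masses joined by a harmonic spring). m1, m2, k are illustrative values.
m1, m2, k = 1.0, 3.0, 2.0

H = [[k, -k], [-k, k]]                               # Hessian of U = (k/2)(x2 - x1)^2
Linv = [1.0 / math.sqrt(m1), 1.0 / math.sqrt(m2)]    # diagonal of L^{-1}, with M = L^2

# Mass-weighted Hessian  H~ = L^{-1} H L^{-1}  (symmetric)
Ht = [[Linv[i] * H[i][j] * Linv[j] for j in range(2)] for i in range(2)]

# Eigenvalues of a symmetric 2x2 matrix from its trace and determinant
tr = Ht[0][0] + Ht[1][1]
det = Ht[0][0] * Ht[1][1] - Ht[0][1] * Ht[1][0]
disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
omega_sq = [(tr - disc) / 2.0, (tr + disc) / 2.0]    # the two values of w^2

# w^2 = 0 is rigid translation; the other is the stretch, k (1/m1 + 1/m2)
omega_vib = math.sqrt(omega_sq[1])
```

The zero eigenvalue corresponds to rigid translation of both atoms, in line with the zero eigenvalues of $H$ discussed above; the nonzero one is the familiar result $\omega^2 = k(1/m_1 + 1/m_2)$ for a two-body oscillator.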
1.4.6 Schur Decomposition
A major problem with using the Jordan form when doing calculations on matrices that have repeated eigenvalues is that the Jordan form is numerically unstable. For matrices with repeated eigenvalues, if diagonalization is not possible, it is usually better to use the Schur form instead of the Jordan form. The Schur form only triangularizes the matrix. Triangularizing a matrix, even one with repeated eigenvalues, is numerically well conditioned. Golub and Van Loan (1996, p.313) provide the following theorem.
Theorem 1.15 (Schur decomposition). If $A \in \mathbb{C}^{n\times n}$ then there exists a unitary $Q \in \mathbb{C}^{n\times n}$ such that
$$Q^*AQ = T$$
in which $T$ is upper triangular.
(b) QQ* = I
(c) Q* is unitary
(d) The rows of Q form an orthonormal set
(e) The columns of Q form an orthonormal set
If $A$ is self-adjoint, then by taking adjoints of both sides of the Schur decomposition equality, we have that $T$ is real and diagonal, and the columns of $Q$ are the eigenvectors of $A$, which is one way to show that the eigenvectors of a self-adjoint matrix are orthogonal, regardless of whether the eigenvalues are distinct. Recall that we delayed the proof of this assertion in Section 1.4.2 until we had introduced the Schur decomposition.
If $A$ is real and symmetric, then not only is $T$ real and diagonal, but $Q$ can be chosen real and orthogonal. This fact can be established by noting that if complex-valued $q = a + bi$ is an eigenvector of $A$, then so are both real-valued vectors $a$ and $b$. And if complex eigenvector $q_j$ is orthogonal to $q_k$, then real eigenvectors $a_j$ and $b_j$ are orthogonal to real eigenvectors $a_k$ and $b_k$, respectively. The theorem summarizing this case is the following (Golub and Van Loan, 1996, p.393), where, again, it does not matter if the eigenvalues of $A$ are repeated.
Theorem 1.16 (Symmetric Schur decomposition). If $A \in \mathbb{R}^{n\times n}$ is symmetric, then there exists a real, orthogonal $Q$ such that
$$Q^TAQ = \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$$
in which $\Lambda$ is a real diagonal matrix with the eigenvalues of $A$ on the diagonal.
R22
R2m
Rmm
in which each Rii is either a real scalar or a 2x2 real matrix having com-
ofA.
in which U e
and V e
V*V = VV* = In
and S - Rtnxnis partitioned as
O(mr)xr
= i
2(A) i = 1
-1
= VS 2V* = QAQ
Real matrix with full row rank. Consider a real matrix $A$ with more columns than rows (wide matrix, $m < n$) and full row rank, $r = m$. In this case both $U$ and $V$ are real and orthogonal, and the SVD takes the form
Ax b
R(At)
RCA)
(ui)l-l
N(AT)
spanned by
Real matrix with full column rank. Next consider the case in which real matrix $A$ has more rows than columns (tall matrix, $m > n$) and full column rank. In this case the SVD takes the form
$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^T$$
in which $U_1$ contains the first $n$ columns of $U$, and $U_2$ contains the remaining $m - n$ columns. Multiplying the partitioned matrices gives
$$A = U_1\Sigma V^T$$
U2
VIT
and we have $\{u_{r+1}, \ldots, u_m\}$ span the null space of $A^T$. Because the columns of $U_1$ are orthogonal to this set, they span the range of $A$.
b11
is given by
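$$x_{ls} = (A^TA)^{-1}A^Tb$$
$$x_{ls} = A^\dagger b$$
$$A^\dagger = (A^TA)^{-1}A^T$$
$$A = U_1\Sigma V^T$$
$$A^TA = V\Sigma U_1^T U_1 \Sigma V^T = V\Sigma^2V^T$$
$$A^\dagger = V\Sigma^{-2}V^T V\Sigma U_1^T$$
$$A^\dagger = V\Sigma^{-1}U_1^T$$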
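The least-squares formula $x_{ls} = (A^TA)^{-1}A^Tb$ can be checked on a small example. The sketch below fits a straight line to four made-up data points (an illustrative choice, not data from the text) and verifies the defining property that the residual is orthogonal to the columns of $A$.

```python
# Sketch: least-squares line fit via x_ls = (A^T A)^{-1} A^T b.
# The data points are made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
b = [1.0, 2.0, 2.0, 4.0]
A = [[1.0, x] for x in xs]          # tall matrix with independent columns

# Form the 2x2 matrix A^T A and the 2-vector A^T b
AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]

# Solve the normal equations with the explicit 2x2 inverse
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_ls = [( AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1]) / det,
        (-AtA[1][0] * Atb[0] + AtA[0][0] * Atb[1]) / det]

# The residual A x_ls - b is orthogonal to the columns of A
resid = [row[0] * x_ls[0] + row[1] * x_ls[1] - bi for row, bi in zip(A, b)]
At_resid = [sum(row[i] * r for row, r in zip(A, resid)) for i in range(2)]
```

For this data the intercept and slope both come out as 0.9. Forming $A^TA$ explicitly is fine for a tiny, well-conditioned problem like this one; for larger or nearly rank-deficient problems, QR- or SVD-based solvers are generally preferred because forming $A^TA$ squares the condition number.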
+ aol
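with $A \in \mathbb{C}^{n\times n}$, $\alpha_i \in \mathbb{C}$, $i = 1, \ldots, n$. We wish to expand this set of functions so that we have convenient ways to express solutions to coupled sets of differential equations, for example. Probably the most important function for use in applications is the matrix exponential. The standard exponential of a scalar can be defined in terms of its Taylor series
$$e^a = 1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots$$
The matrix exponential is defined analogously
$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$$
and this series converges for all $A \in \mathbb{C}^{n\times n}$. Let's see why the matrix exponential is so useful. Consider first the scalar first-order linear differential equation
$$\frac{dx}{dt} = ax \qquad x(0) = x_0 \qquad x \in \mathbb{R},\ a \in \mathbb{R}$$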
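which arises in the simplest chemical kinetics models. The solution is given by
$$x(t) = x_0e^{at}$$
and this is probably the first and most important differential equation that is discussed in the introductory differential equations course. By defining the matrix exponential we have the solution to all coupled sets of linear first-order differential equations. Consider the coupled set of linear first-order differential equations
$$\frac{d}{dt}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \qquad \begin{bmatrix} x_1(0) \\ x_2(0) \\ \vdots \\ x_n(0) \end{bmatrix} = \begin{bmatrix} x_{10} \\ x_{20} \\ \vdots \\ x_{n0} \end{bmatrix}$$
$$\frac{dx}{dt} = Ax \qquad x(0) = x_0 \qquad x \in \mathbb{R}^n,\ A \in \mathbb{R}^{n\times n} \tag{1.15}$$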
The payoff for knowing the solution to the scalar version is that we also know the solution to the matrix version. We propose as the solution
$$x(t) = e^{At}x_0 \tag{1.16}$$
Notice that we must put the $x_0$ after the $e^{At}$ so that the matrix multiplication on the right-hand side is defined and gives the required $n \times 1$ column vector for $x(t)$. Let's establish that this proposed solution is indeed the solution to (1.15). Substituting $t = 0$ to check the initial condition gives
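$$x(0) = e^{A \cdot 0}x_0 = Ix_0 = x_0$$
so the initial condition is satisfied. Differentiating the series for $e^{At}$ term by term gives
$$\frac{d}{dt}e^{At} = \frac{d}{dt}\left(I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots\right) = A\left(I + \frac{t}{1!}A + \frac{t^2}{2!}A^2 + \cdots\right) = Ae^{At}$$
so $dx/dt = Ae^{At}x_0 = Ax$ and the differential equation is satisfied as well. If $A$ has a complete set of eigenvectors so that $A = S\Lambda S^{-1}$, powers of $A$ are
$$A^p = \underbrace{(S\Lambda S^{-1})(S\Lambda S^{-1})\cdots(S\Lambda S^{-1})}_{p\ \text{times}} = S\Lambda^pS^{-1}$$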
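Substituting the eigenvalue decomposition into the definition of the matrix exponential gives
$$\begin{aligned} e^{At} &= I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots \\ &= SS^{-1} + tS\Lambda S^{-1} + \frac{t^2}{2!}S\Lambda^2S^{-1} + \frac{t^3}{3!}S\Lambda^3S^{-1} + \cdots \\ &= S\left(I + t\Lambda + \frac{t^2}{2!}\Lambda^2 + \frac{t^3}{3!}\Lambda^3 + \cdots\right)S^{-1} \\ e^{At} &= Se^{\Lambda t}S^{-1} \end{aligned}$$
Therefore, we can determine the time behavior of $e^{At}$ by examining the behavior of
$$e^{\Lambda t} = \begin{bmatrix} e^{\lambda_1 t} & & & \\ & e^{\lambda_2 t} & & \\ & & \ddots & \\ & & & e^{\lambda_n t} \end{bmatrix}$$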
matrix $Q$. Moreover, for any linear scalar differential equation having solutions consisting of these scalar functions, coupled sets of the corresponding linear differential equations are solved by the matrix version of the function.
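The two routes to the matrix exponential described above, the Taylor series and the eigenvalue decomposition $e^{At} = Se^{\Lambda t}S^{-1}$, can be compared numerically. The following is a minimal sketch: the matrix `A`, its eigenvector matrix `S`, the time `t`, and the helper name `expm_taylor` are all illustrative assumptions. A truncated Taylor series is used here only to mirror the definition; it is not a recommended general-purpose algorithm.

```python
import math

# Sketch: e^{At} from the Taylor series and from S e^{Lambda t} S^{-1}.
# A, S, and t are illustrative choices; A has eigenvalues -1 and -2.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def expm_taylor(A, t, terms=30):
    """Sum I + tA + (tA)^2/2! + ... truncated after `terms` terms."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]          # current term (tA)^p / p!
    for p in range(1, terms):
        term = [[t * v / p for v in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[0.0, 1.0], [-2.0, -3.0]]
S = [[1.0, 1.0], [-1.0, -2.0]]      # columns are eigenvectors of A
S_inv = [[2.0, 1.0], [-1.0, -1.0]]
lam = [-1.0, -2.0]
t = 0.5

exp_Lt = [[math.exp(lam[0] * t), 0.0], [0.0, math.exp(lam[1] * t)]]
E_eig = matmul(matmul(S, exp_Lt), S_inv)
E_series = expm_taylor(A, t)
max_diff = max(abs(E_eig[i][j] - E_series[i][j])
               for i in range(2) for j in range(2))
```

Both eigenvalues are negative, so every entry of $e^{At}$ decays as $t$ grows, which is exactly what examining $e^{\Lambda t}$ predicts.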
Re(z)+lm(z)i
Re(z)
Re(z)
diagonal
Similarly,ifwe have a
of eD is
then the matrix norm
have that
t 20
For any nonsingular $S$, the product $\|S\|\|S^{-1}\|$ is defined as the condition number of $S$, denoted $\kappa(S)$. A bound on the norm of $e^{At}$ is therefore
IleAtll
A diagonalizable
Schur form A = QTQ*, with T upper triangular. Van Loan (1977) shows
that3
IINtlIk
(1.17)
t0
Y
IleAtII
k!
here.
constant
thereis a
t 20
IleAt
(1.18)
Scalar argument. The reader is undoubtedly familiar with finding the maximum and minimum of scalar functions by taking the first derivative and setting it to zero. For conciseness, we restrict attention to (unconstrained) minimization, and we are interested in the problem
$$\min_x f(x)$$
What do we expect of a solution to this problem? A point $x^0$ is termed
$$f(x) = (1/2)ax^2 + bx + c$$
$$x^0 = -b/a$$
This last result for $x^0$ is at least well defined provided $a \neq 0$. But if we are interested in minimization, we require more: $a > 0$ is required for a unique solution to the problem $\min_x f(x)$. Indeed, taking a second derivative of $f(\cdot)$ gives $d^2/dx^2 f(x) = a$. The condition $a > 0$ is
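$$f(\alpha x + (1-\alpha)y) \leq \alpha f(x) + (1-\alpha)f(y) \qquad \text{for all } x, y \text{ and } \alpha \in [0, 1]$$
Figure 1.5 shows a convex function. Notice that if you draw a straight line between two points on the function curve, if the function is convex the straight line lies above the function. We say the function $f(\cdot)$ is STRICTLY CONVEX if the inequality is strict for all $x \neq y$ and $\alpha \in (0, 1)$
$$f(\alpha x + (1-\alpha)y) < \alpha f(x) + (1-\alpha)f(y)$$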
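The convexity inequality is easy to probe numerically for the scalar quadratic. The sketch below samples the inequality on a small grid; the coefficients, sample points, and the helper names `make_quadratic` and `looks_convex` are illustrative assumptions. A sampled check like this can expose a violation but cannot prove convexity.

```python
# Sketch: sample-based check of f(al*x + (1-al)*y) <= al*f(x) + (1-al)*f(y)
# for the scalar quadratic f(x) = (1/2) a x^2 + b x + c.
# The coefficients and sample grid are arbitrary illustrative choices.

def make_quadratic(a, b, c):
    """Return the function x -> (1/2) a x^2 + b x + c."""
    return lambda x: 0.5 * a * x * x + b * x + c

def looks_convex(f, points, alphas, tol=1e-12):
    """True if the convexity inequality holds on every sampled triple."""
    return all(
        f(al * x + (1 - al) * y) <= al * f(x) + (1 - al) * f(y) + tol
        for x in points for y in points for al in alphas
    )

points = [-2.0, -1.0, 0.0, 0.5, 3.0]
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]

convex_case = looks_convex(make_quadratic(2.0, -1.0, 0.5), points, alphas)    # a > 0
concave_case = looks_convex(make_quadratic(-2.0, -1.0, 0.5), points, alphas)  # a < 0
```

With $a > 0$ the inequality holds at every sampled triple, while with $a < 0$ the chord falls below the curve and the check fails.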
Figure 1.5: Convex function. The straight line connecting two points on the function curve lies above the function; $\alpha f(x) + (1-\alpha)f(y) \geq f(\alpha x + (1-\alpha)y)$.
= f (xo)
Figure 1.6: Contours of constant $f(x) = x^TAx$; (a) $A > 0$ (or $A < 0$), ellipses; (b) $A \geq 0$ (or $A \leq 0$), straight lines; (c) $A$ indefinite, hyperbolas.
origin, so they are not important to the shape of the contours of the quadratic function
For indefinite $A$, Figure 1.6 shows that the contours are hyperbolas. The origin is termed a saddle point in this case, because the function resembles a horse's saddle, or a mountain pass if one prefers to maintain the topography metaphor. Note that $f(\cdot)$ increases without bound in some directions and decreases without bound in others.
maxminf (x)
minmaxf(x)
(1.19)
            Scalar                               Vector
$f(x)$      $(1/2)ax^2 + bx + c$                 $(1/2)x^TAx + b^Tx + c$
$df/dx$     $ax + b$                             $Ax + b$
$x^0$       $-b/a$                               $-A^{-1}b$
$f^0$       $-(1/2)b^2/a + c$                    $-(1/2)b^TA^{-1}b + c$
$f(x)$      $(1/2)a(x - x^0)^2 + f^0$            $(1/2)(x - x^0)^TA(x - x^0) + f^0$
Finally, to complete the vector minimization problem, we restrict attention to the case $A > 0$. Taking two derivatives in this case produces
$$f(x) = (1/2)x^TAx + b^Tx + c$$
$$\frac{df}{dx}(x) = (1/2)(Ax + A^Tx) + b = Ax + b$$
$$\frac{d^2f}{dx^2}(x) = A$$
Setting $df/dx = 0$ and solving for $x^0$, and then evaluating $f(x^0)$ gives
$$x^0 = -A^{-1}b \qquad f^0 = -(1/2)b^TA^{-1}b + c$$
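A quick numerical check of these formulas is sketched below for a 2x2 positive definite $A$. The values of `A`, `b`, and `c` are illustrative assumptions, not taken from the text.

```python
# Sketch: minimize f(x) = (1/2) x^T A x + b^T x + c for a 2x2 A > 0.
# A, b, c are illustrative values, not from the text.
A = [[2.0, 1.0], [1.0, 3.0]]     # symmetric positive definite
b = [-1.0, 2.0]
c = 4.0

def f(x):
    """Evaluate the quadratic at the 2-vector x."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) + b[0] * x[0] + b[1] * x[1] + c

# x0 = -A^{-1} b using the explicit 2x2 inverse
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv_b = [( A[1][1] * b[0] - A[0][1] * b[1]) / det,
          (-A[1][0] * b[0] + A[0][0] * b[1]) / det]
x0 = [-Ainv_b[0], -Ainv_b[1]]

# f0 = -(1/2) b^T A^{-1} b + c should equal f(x0)
f0 = -0.5 * (b[0] * Ainv_b[0] + b[1] * Ainv_b[1]) + c

# Any nearby point should give a larger function value
nearby = [f([x0[0] + dx, x0[1] + dy]) for dx in (-0.1, 0.1) for dy in (-0.1, 0.1)]
```

For these values the minimizer is $x^0 = [1, -1]^T$ with $f^0 = 2.5$, and each perturbed point gives a strictly larger value, as expected when $A > 0$.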
ObviouslyA is symmetric in the least-squares problem. We have alreadyderived the fact that T > 0 if the columns of A are independent
in the discussion of the SVDin Section 1.4.7. So independent columns
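$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{bmatrix} \qquad \operatorname{vec}A = \begin{bmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \\ A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \\ \vdots \\ A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{bmatrix}$$
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \qquad \operatorname{vec}A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$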
Matrix Kronecker product. For $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times q}$, the Kronecker product of $A$ and $B$, denoted $A \otimes B$, is defined as
$$A \otimes B = \begin{bmatrix} A_{11}B & A_{12}B & \cdots & A_{1n}B \\ A_{21}B & A_{22}B & \cdots & A_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1}B & A_{m2}B & \cdots & A_{mn}B \end{bmatrix} \tag{1.20}$$
Note that the Kronecker product is defined for all matrices $A$ and $B$, and the matrices do not have to conform as in normal matrix multiplication. By counting the number of rows and columns in the definition above, we see that matrix $A \otimes B \in \mathbb{R}^{mp\times nq}$. Notice also that the vector outer product, defined in Section 1.2.5, is a special case of this more general matrix Kronecker product.
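$$\operatorname{vec}(ABC) = (C^T \otimes A)\operatorname{vec}B \tag{1.21}$$
$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD) \tag{1.22}$$
$$(A \otimes B)^T = A^T \otimes B^T \tag{1.23}$$
$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \qquad A \text{ and } B \text{ invertible} \tag{1.24}$$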
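Establishing (1.21). Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n\times p}$, and $C \in \mathbb{R}^{p\times q}$. Let the column partitions of matrices $B$ and $C$ be given by
$$B = \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix} \qquad C = \begin{bmatrix} c_1 & c_2 & \cdots & c_q \end{bmatrix}$$
The $j$th column of $ABC$ is $ABc_j$, so stacking the columns gives
$$\operatorname{vec}(ABC) = \begin{bmatrix} ABc_1 \\ ABc_2 \\ \vdots \\ ABc_q \end{bmatrix}$$
Now we examine the right-hand side of (1.21). We have from the definitions of vec operator and Kronecker product
$$(C^T \otimes A)\operatorname{vec}B = \begin{bmatrix} c_{11}A & c_{21}A & \cdots & c_{p1}A \\ c_{12}A & c_{22}A & \cdots & c_{p2}A \\ \vdots & \vdots & \ddots & \vdots \\ c_{1q}A & c_{2q}A & \cdots & c_{pq}A \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}$$
The $j$th block row of this product is
$$c_{1j}Ab_1 + c_{2j}Ab_2 + \cdots + c_{pj}Ab_p = A\begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix}\begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{pj} \end{bmatrix} = ABc_j$$
Therefore
$$(C^T \otimes A)\operatorname{vec}B = \begin{bmatrix} ABc_1 \\ ABc_2 \\ \vdots \\ ABc_q \end{bmatrix}$$
which agrees with $\operatorname{vec}(ABC)$.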
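The identity (1.21) can also be confirmed numerically. The sketch below implements the vec operator and Kronecker product directly from their definitions for matrices stored as lists of rows; the helper names `vec` and `kron` and the small integer matrices are illustrative choices, not from the text.

```python
# Sketch: numerical check of vec(ABC) = (C^T kron A) vec(B).
# Matrices are lists of rows; the entries are arbitrary small integers.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def vec(X):
    """Stack the columns of X into one long list."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]

def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]

A = [[1, 2, 0], [0, 1, 3]]        # 2 x 3
B = [[1, 0], [2, 1], [0, 4]]      # 3 x 2
C = [[1, 2], [3, 5]]              # 2 x 2

lhs = vec(matmul(matmul(A, B), C))
K = kron(transpose(C), A)         # (2*2) x (2*3) = 4 x 6
vB = vec(B)
rhs = [sum(K[i][j] * vB[j] for j in range(len(vB))) for i in range(len(K))]
```

The dimensions follow the rule stated after (1.20): with $C^T \in \mathbb{R}^{2\times 2}$ and $A \in \mathbb{R}^{2\times 3}$, the product $C^T \otimes A$ is $4 \times 6$, which conforms with the 6-vector $\operatorname{vec}B$.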
Establishing(1.22).Herewelet A e
B e vxq, C e
AllB
AlnB
CljD
AllB
AmnB
cnjD
Am1B
AlnB Clj
cnj
= (Acj
= (Acj) (BD)
Since this is the jth (block)column of (A B) (C D), the entire matrix
is
(AC2) (BD)
ACI AC2
Acrl
(BD)
(Acr) (BD)I
A2nB
A22B
AmnB
A21B T
A22B T
Am2BT
AlnB T A2nB T
AmnBT
AllB T
A12BT
-1
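$$\operatorname{eig}(A \otimes B) = \operatorname{eig}(A)\operatorname{eig}(B)$$
$$\operatorname{rank}(A \otimes B) = \operatorname{rank}(A)\operatorname{rank}(B)$$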
(1.25)
(1.26)
(1.27)
(and ATA).We then have for (A) and (A) denoting nonzero s'
and eigenvalues, respectively
(AeB) =
(BBT))
Solving linear matrix equations. We shall find the properties (1.21)-(1.27) highly useful when dealing with complex maximum-likelihood estimation problems in Chapter 4. But to provide here a small illustration of their utility, consider the following linear matrix equation for the unknown matrix $X$
$$AXB = C$$
ATS + SA = -Q
= vecQ
1.6 Exercises
in R2
Exercise1.1: Inner product and angle
Considerthe two vectors a,b e R2
las
llall =
(a,b) = E aibi
i
cos e =
Il all
b2
al
Il b Il
bl
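Consider the vector $x \in \mathbb{R}^2$, whose elements are the temperature (in K) and pressure (in Pa) in a reactor. A typical value of $x$ would be $\begin{bmatrix} 300 \\ 1.0 \times 10^6 \end{bmatrix}$.
(a) Let $y_1 = \begin{bmatrix} 310 \\ 1.0 \times 10^6 \end{bmatrix}$ and $y_2 = \begin{bmatrix} 300 \\ 1.2 \times 10^6 \end{bmatrix}$ be two measurements of the state of the reactor. Use the Euclidean norm to calculate the error $\|y - x\|$ for the two values of $y$. Do you think that the calculated errors give a meaningful idea of the difference between $y_1$ and $y_2$?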
n
=
12 Wi
(c) Propose a weight vector that is appropriate for the example in part
(a). justify
Exercise1.3:linear independence
Verify that the followingsets are u.
= (2,0, 11T,e3 = [1, 1, Il T.
(a) el = [O,
=
= 11 + 2i,1 (b) el = + i, 1 Hint: boress alel + a2e2 + a3e3 O. Taking inner products of this equation
with
el, Q, and e3 yields three linear equations for the O(i.
because we do not know a priori which vector(s) in a linearly dependent set is(are)
expressible as a linear combination of the others.
The following statement is a more precise variation on this theme. Giventhe vectors
k,Xi e Rn are linearly independent and the vectors {Xi,a} are linearly
Someproperties of subspaces
1.7:
Exercise
properties
following
Establishthe
line in R2
1
1
line
s' = yly= 1
1
1
Exercise 1.9: Permutation matrices
(a) Given the matrix
$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$
show that $PA$ interchanges the second and third rows of $A$ for any $3 \times 3$ matrix. What does $AP$ do?
(a) Construct a matrix operator that multiplies the horizontal ($x_1$) component of a vector by 2, but leaves its vertical component ($x_2$) unchanged.
(b) Construct a matrix operator $B$ that rotates a vector counterclockwise by an angle of $2\pi/3$.
(d) Show that $B^3 = I$. With drawings, show how this makes geometric sense.
k(t,s)x(s)ds
of the operator.
where k(t,s) is a known function called the KERNEL
(a) Show that K is a linear operator.
(b) Read Section 2.4.1. Use the usual (i.e., unweighted) inner product on
the interval
Ka {x(iAt)) = E
= I,N
(a) (nn T)(nn T)u for any vector u. Recalling that nn T is the projectionoperator,
what is the geometricinterpretation of this result?
(b) (I 2nn T)2. What is the geometric interpretation of this result?
Someone in your research group wrote a computer program that takes an $n$-vector input, $x \in \mathbb{R}^n$, and returns an $m$-vector output, $y \in \mathbb{R}^m$.
All we know about the function $f$ is that it is linear.
The code was compiled and now the source code has been lost; the author has graduated and won't respond to our email. We need to create the source code for function $f$ so we can compile it for our newly purchased hardware, which no longer runs the old compiled code. To help us accomplish this task, all we can do is execute the function on the old hardware.
(a) How many function calls do you need to make before you can write the source code for this function?
(b) What inputs do you choose, and how do you construct the linear function $f$ from the resulting outputs?
(c) To make matters worse, your advisor has a hot new project idea that requires you to write a program to evaluate the inverse of this linear function,
Exercise 1.16: Rank of a dyad
What is the rank of the $n \times n$ dyad $uv^T$?
as
Findbasesfor the four fundamental subspaces associated with the following matrices
1100
0101
Exercise
1.19:Zero is orthogonal to many vectors
Provethat if
x z = y z
for allz e
then
or,equivalently,prove that if
(d) 1, independent of b.
x    0.67    0.89    1.11    1.33    1.56    1.78     2.00
y    3.82    4.87    6.28    8.23    9.47    12.01    15.26
Not having a good theory to determine the form of this expression, your friend has chosen a polynomial to fit the data.
(a) Consider the polynomial model
Express the normal equations for finding the coefficients $a_i$ that minimize the sum of squares of errors in $y$.
(b) Using the x- and y-data shown above and plotted in Figure 1.8, solve the leastsquares problem and find the a that minimize
2
and plot
versus n.
(d) Plot the data along with your fitted polynomial curves for each value of $n$. In particular, plot the data and fits for $n = 2$ and $n = 9$ on one plot. Use the range $0.25 \leq x \leq 2.25$ to get an idea about how well the models extrapolate.
(e) Based on the values of the sum of squared errors and the appearance of your plots, what degree polynomial would you choose to fit these data? Why not choose $n = 9$ so that the polynomial can pass through every point and the sum of squared errors is zero?
T      k
300    1.82
325    1.89
350    2.02
375    2.14
400    2.12
425    2.17
450    2.15
475    2.21
500    2.26
(a) Take logarithms of (1.28) and write a model that is linear in the parameters $\ln(k_0)$ and $E/R$. Summarize the data and model with the linear algebra problem
$$Ax = b$$
(d) Plot the data and least-squares fit in the original variables $k$ versus $T$. Do you have a good fit to the data?
Exercise123:
that
(f) Is the solution for these b unique? If not, given one solution Xl, such
AXI = b, specify all solutions.
02
C02
H2 + 02 =H20
+ 202
C02 + 2H20
co + 2H20
(a) Given the species list, $A = [\mathrm{CO}\ \ \mathrm{O_2}\ \ \mathrm{CO_2}\ \ \mathrm{H_2}\ \ \mathrm{H_2O}\ \ \mathrm{CH_4}]$, write out the stoichiometric matrix, $\nu$, for the reactions relating the four reaction rates to the six production rates
(1.29)
meas
-11T
Is there a set of reaction rates rex that satisfies (1.29) exactly? If not, how do you
know? If so, find an rex that satisfies Rmeas= VTrex.
(d) If there is an rex, is it unique? If so, how do you know? If not, characterize all
solutions.
co + -02
C02
H2 + -02 =H20
CH4+ 202
+ 2H20
-4.5
Is there a set of reaction rates rex in this second model that satisfies (1.29) exactly? If so, find an rex that satisfies Rmeas= VTrex. If not, how do you know?
(c) If there is not an exact solution, find the least-squares solution, rest. What is the
least-squares objective value?
(d) Is this solution unique? If so, how do you know? If not, characterize all solutions
that achieve this value of the objective function.
(1.30)
in which $x(k)$, an $n$-vector, is the state of the system, and $u(k)$, an $m$-vector, is the manipulatable input at time $k$. The goal of the controller is to choose a sequence of inputs that force the state to follow some desirable trajectory.
(a) What are the dimensions of the $A$ and $B$ matrices?
x(n) = An xo + C
(1.31)
u(n 1)
What is the C matrix and what are its dimensions?
(c) What must be true of the rank of $C$ for a system to be controllable, i.e., for there to be a solution to (1.31) for every $x(0)$ and $x(n)$?
(d) Consider the following two systems with 2 states ($n = 2$) and 1 input ($m = 1$)
x(k+ 1) =
x(k)+
1 u(k)
x(k+ 1) =
x(k) +
Notice that the input only directly affects one of the states in both of these systems. Are either of these two systems controllable? If not, show which $x(n)$ cannot be reached with $n$ input moves starting from $x(0) = 0$.
xx
T b)
Cij= (AxxTb)i i =
dxj
j = 1,...,n
products
rank(ABC) = rank(B)
LU decomposition
-1
1 -1
+ 2XIX2 + 2X3)
)T minimizes
+ X2
2 -1
-1
2 -1
2
(b) If all the diagonal elements of $D$ are positive, the matrix can be further factorized into $LL^T$; this is called the CHOLESKY DECOMPOSITION of $A$. Find $L$ for the matrix of part (a).
(a) Find the value of q for which elimination fails (i.e., no solution to
If you are thoughtful, you won't need to perform the eliminationAx b exists).
to findout.
(b) For this value of q what happens to the first geometrical interpretation
ofthe
(b) If b = (1, p,q) T, find a necessary and sufficient condition on p and q so that
Ax = b has a solution.
from
(c) Givenvalues of p and q for which a solution exists, will the algorithm
Section1.32 solve it? If not, pinpoint the difficulty.
(d) Find the LU factorization of AT.
(e) Use this factorization to find two LI solutions of ATx = b, where b = (2,5)T.
Since there are fewer equations than unknowns in this case, there are infinitely
this
many solutions, forming a line in R2. Are there any values of b for which
problem has no solution?
Herea is an
Under what conditions on u and v does (I auv T) (I + auvT)-l?
arbitrary nonzero scalar.
A' =
this pair
Write a program that uses the Newton-Raphsonmethod to solve
y (x
=0
(y +
tanx O
of equations
findat
program,
this
Exercise 1.38; The QR decomposition
ai
cni
in the linear combination are the elements of the $i$th column of matrix $C$. This result will be helpful in solving the following problem. Let $A$ be an $m \times n$ matrix whose columns $a_i$ are linearly independent (thus $m \geq n$). We know that using the Gram-Schmidt procedure allows us to construct an ON set of vectors from the $a_i$. Define a matrix $Q$ whose columns are these basis vectors, $q_i$, where $q_i \cdot q_j = \delta_{ij}$.
(a) Express each $a_i$ in the basis formed by the $q_i$. Hint: because the set of $q_i$ are constructed from the set of $a_i$ by Gram-Schmidt, $a_1$ has a component only in the $q_1$ direction, $a_2$ has components only in the $q_1$ and $q_2$ directions, etc.
(b) Use the above result to write $A = QR$, i.e., find a square matrix $R$ such that each column of $A$ is
upper triangular.
For $A \in \mathbb{R}^{m\times n}$ with independent columns we have used in the text what is sometimes called the "thin" QR with $Q_1 \in \mathbb{R}^{m\times n}$ and $R_1 \in \mathbb{R}^{n\times n}$ satisfying
$$A = Q_1R_1$$
Prove the following proposition
Note that this proof requires our first use of the fundamental theorem of linear algebra. Since most undergraduate engineers have limited experience doing proofs, we provide a few hints.
1. The "if and only if" statement requires proof of two statements: (i) $A^TA$ having full rank implies $A$ has linearly independent columns and (ii) $A$ having linearly independent columns implies $A^TA$ has full rank.
2. The statement that S implies T is logically equivalent to the statement that not T implies not S. So one could prove this proposition by showing (ii) and then showing: (i') $A$ not having linearly independent columns implies that $A^TA$ is not full rank.
linearly independent rows and columns. Think about what that tells you about the null space of $B$ and $B^T$. See also Figure 1.1.
rank(X) = p
np
Tll
T12
T22
in which eig(T11) = eig(B), and eig(T22) = eig(A) \ eig(B), i.e., the eigenvalues of T22
are the eigenvalues of A that are not eigenvalues of B. Also show that eig(B) g eig(A).
Hint: use the QR decomposition of X.
0.46287 0.11526
0.53244 0.34359
invoking [u , s , v)
-0.59540 -0.80343
-0.80343 0.59540
0.78328
0.00000
0.00000
0.12469
-0.89798
-O. 44004
-0.44004
0.89798
(d) Denotethe columns of v by VI and v2. Draw a sketch of the unit circle traced
by x as it travels from x = VI to x = v2 and the corresponding curve traced by
Ax.
in which
0.798
-0.715
0.051
1.088
1
xo = 0
(a) Compute the eigenvalues and singular values of $A$. See the Octave or MATLAB commands eig and svd. Are the magnitudes of the eigenvalues of $A$ less than one? Are the singular values less than one?
(b) What is the steady state of this system? Is the steady state asymptotically stable?
(c) Make a two-dimensional plot of the two components of $x(k)$ (phase portrait) as
(d) When the largest eigenvalue of $A$ is less than one but the largest singular value of $A$ is greater than one, what happens to the evolution of $x(k)$?
(e) Nowplot the values of x for 50 points uniformly distributed on a unit circle and
A = USV*
mark ul, 112,VI, v2, Sl, and s2 on your plot. Figure 1.10 gives you an idea of the
appearanceof the set of points for x and Ax to make sure you are on track.
GivenA e
with rank(A) = r and the SVDof A = UEV*,if we partition the first
r columnsof U and V and call them UI and VI we have
-0.5
0.5
1.5
and A =
x = A+b
If we form the residual for this "solution"
= AA+b-b
b-b
= UI
= Imb-b
1m
b we have
Consider the process depicted in Figure 1.11 in which $u$ is a manipulatable input and $d$ is a disturbance. At steady state, the effects of these two variables combine at the measurement $y$ in a linear relationship
The steady-state goal of the control system is to minimize the effect of $d$ at the measurement $y$ by adjusting $u$.
For this problem we have 3 inputs, $u \in \mathbb{R}^3$, 2 disturbances, $d \in \mathbb{R}^2$, and 2 measurements, $y \in \mathbb{R}^2$, and $G$ and $D$ are matrices of appropriate dimensions. We have the following two singular value decompositions available
[X] [E z T
075 -0.66
-0.66
-0.98
-0.19
0.75
-0.19
0.98
1.57
0.00
0.71
0.00
0.00
0.21
0.00
0.13
-0.89
0.37
-0.085
0.46
045 -0.81
094 -0.33
-0.33
0.94
(a) Can you exactly cancel the effect of d on y using u for all d? Why or why not?
u = Kd
What is K in terms of U, S, VI , X, E, Z? Give the symbolic and numerical results.
(c) Whatis the worst d of unit norm, i.e., what d requires the largest response in u?
What is the response u to this worst d?
and y, u, d are in deviation variables from the steady state at which the system was
linearized. Experimentaltests on the system have produced the followingmodel parameters
2.857 3.125
0.991 2.134
If we have measurements of the disturbance d available, we would like to find the input
u that exactlycancels d's effect on y, and we would like to know ahead of time what
u
Figure 1.11: Manipulatedinput u and disturbance d combine to affect output y,
-1
-1
sus
1
1
(1.32)
vmat is the size of the disturbance so that all disturbances less than this size
words
can be rejected by the input without violating these constraints? In other
find the largest scalar a such that
if lldll
facts
Use the Schur decomposition of matrix A e cnxn to prove the following
n
(1.33)
(1.34)
detA=
trA= E 'Ni
87
1.6 Exercises
e
inwhich
= 1,2,
,n.
1.50:Repeated eigenvalues
matrix
Theself-adjoint
011
1
matrix
Thenon-self-adjoint
0
013
002
000 21
0
v2 and v3 can be
EIGENVECTORS
other eigenvectoras v4. The GENERALIZED
found by solving
(c) Determinethe set, construct the transformation matrix M, and show that J =
M-I AM is indeed in Jordan form.
(b) In the problem Ax = b, use the eigenvectors to determine necessary and sufficient conditions on b for existence of a solution.
123
123
123
to
establish
that an = O. With an = Owhat can you do next to show that an-I = 0? Continue
this
process.
(c) Assume that A has a complete set of eigenvectors. Showthat the eigenvectors
of A and AT are biorthogonal.
trices are called NORMAL(The converse is also true (Horn and Johnson,1985).)
eigenvectors
(e) Show that the eigenvalues of A = ATare imaginary and that its
are orthogonal.
andeigenLet u and v be unit vectors in VI, with uTv * 0. What are the eigenvalues
vectors of uv T?
A=
001
001
111
Xi+l AXi.
(a) Let xo = (1,O,
and consider the iteration procedureresult.
several steps of this procedure by hand and observe the
Perform
where $T$ is called a transition matrix and the elements of $w_n$ are the probabilities of having a certain type of weather on that day. For example, if $w_5 = [0.2, 0.1, 0.7]^T$, then the probability of snow five days from now is 70%. The sequence of probability vectors on subsequent days, $\{w_0, w_1, w_2, \ldots\}$ is called a MARKOV CHAIN. Because $w$ is a vector
+ iw,
-2 -2
2
s -I AS =
DA
dx2
d2CB
equations
+ k-1CB= O
+ klCA k1CB
k2CB+ k2Cc= 0
d2cc
where the ki,i = (1,2) are rate constants and the Dj,j = (A,B,
C) are the species
diffusivities.
The boundary conditions are
CA= 1
dCA
dCB
dx = dx
atx=l
dcc
dx
Consider a nonsquare $m \times n$ matrix $A$. Show that $A^TA$ is symmetric positive semidefinite. If $A$ were square we could determine its nullspace from the eigenvectors corresponding to zero eigenvalues. How can we determine the nullspace of a nonsquare matrix $A$? What about the nullspace of $A^T$?
(b) What conditions must the eigenvalues satisfy for this iteration to converge to a steady state, i.e., so that $x(i) \to x(i+1)$ as $i \to \infty$?
(a) Show
that
An
anIAn I
=O
is known as
theorem.
(b) Let
combinations of A and I.
Usethe theorem to express A2, A3, and A-I as linear
least-squares problem
Exercise1.64:Solvingthe nonunique
by
(a) Showthat all solutions to the least-squares problem are given
Xls = VIE-I UTb +
in which is an arbitrary vector.
0 0 73 r
inwhichTi,i = I
4 are arbitrary triangular matrices, T5is triangular, and * representsarbitrary(full)matrices. This result is useful in proving the Cayley-Hamilton
theoremin the next exercise.
k.
= 1 x + x 2 x 3 +
= etr(ln)
d2y
dt
dt
your guide
equation can be written as a single first-order differenand show that the above
tial equation
dz
dt
that for
Usethis result to show that for any E > 0, N e c nxn , there exists c > Osuch
all t 20
IINtllk
k!
0.
Consider again the quadratic function f(x) = (1/2)xTAx and the two games given
in
to the A matrix
(1.19). Confirm that Figure 1.6 (c) corresponds
Xl
minmaxf(x)
This inequality verifies that the player who goes first, i.e., the inner optimizer, has the
AXB = C
in whichA
vnxn X
we consider fixed matrices and $X$ is the unknown matrix. The number of equations is the number of elements in $C$. The number of unknowns is the number of elements of $X$. Taking the vec of both sides gives
$$(B^T \otimes A)\operatorname{vec}X = \operatorname{vec}C \tag{1.37}$$
We wish to explore how to solve this equation for $\operatorname{vec}X$.
(a) For the solution to exist for all $\operatorname{vec}C$, and be unique, we require that $(B^T \otimes A)$ has linearly independent rows and columns, i.e., it is square and full rank. Using the rank result (1.27) show that this is equivalent to $A$ and $B$ being square and full rank.
(b) For this case, show that the solution is equivalent to that obtained by multiplying (1.36) by $A^{-1}$ on the left and $B^{-1}$ on the right,
$$X = A^{-1}CB^{-1}$$
(c) If we have more equations than unknowns, we can solve (1.37) for $\operatorname{vec}X$ as a least-squares problem. The least-squares solution is unique if and only if $B^T \otimes A$ has linearly independent columns. Again, use the rank result to show that this is equivalent to: (i) $A$ has linearly independent columns, and (ii) $B$ has linearly independent rows.
(d) We know that $A$ has linearly independent columns if and only if $A^TA$ has full rank, and $B$ has linearly independent rows if and only if $BB^T$ has full rank (see Proposition 1.19 in Exercise 1.41). In this case, show that the least-squares solution of (1.37)
$$\operatorname{vec}X_{ls} = (B^T \otimes A)^\dagger\operatorname{vec}C$$
$$X_{ls} = A^\dagger CB^\dagger$$
Exercise 1.78: Solving the matrix Lyapunov equation
Write a function S = yourlyap(A, Q) using the Kronecker product to solve the matrix Lyapunov equation
$$A^TS + SA = -Q$$
comparing to the function lyap in Octave or MATLAB.
Bibliography
R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, second edition, 2002.
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, third edition, 1996.
N. J. Higham. Functions of Matrices: Theory and Computation. SIAM, Philadelphia, 2008.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
C. C. Lin and L. A. Segel. Mathematics Applied to Deterministic Problems in the Natural Sciences. 1974.
J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics.
Analysis. Springer-Verlag, 1998.
S. M. Selby. CRC Standard Mathematical Tables. CRC Press, twenty-first edition, 1973.
G. Strang. Linear Algebra and its Applications. Academic Press, second edition, 1980.
L. N. Trefethen and D. Bau III. Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.
C. F. Van Loan. The sensitivity of the matrix exponential. SIAM J. Numer. Anal., 14:971-981, 1977.
J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton and Oxford, 1944.
2.1 Introduction
Differential equations arise in all areas of chemical engineering. In this chapter we consider ORDINARY differential equations (ODEs), that is, equations that have only one independent variable. For example, for reactions in a stirred-tank reactor the independent variable is time, while in a simple steady-state model of a plug-flow reactor, the independent variable is position along the reactor. Typically, ODEs appear in one of two forms
dx
dt
(2.1)
or
dy +
dx
y e R
(2.2)
2.2 First-order Linear Systems
2.2.1 Superposition Principle for Linear Differential Equations
An arbitrary linear differential equation can be written
$$Lu = g$$
where $L$ is a linear differential operator (e.g., $L = d/dt - A$, where $A$ is a matrix), $u$ is the solution to be determined, and $g$ is a known function. We now restate the following general properties of linear operators, introduced in Section 1.2, in terms of $L$
$$L(u + v) = Lu + Lv$$
$$L(\alpha u) = \alpha(Lu)$$
3. Letul be a solutionto Lu =
a constantfor a DIRICHLET
boundary condition, a first derivative
d/dx for a NEUMANN
boundary condition, or a combination B =
dx
dt
(2.3)
= F(ul)
More generally, a single high-order differential equation can always be written as a system of first-order equations.
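As a concrete illustration of this rewriting, the sketch below converts a constant-coefficient linear equation $y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_0 y = 0$ into $dz/dt = Az$ with state $z = [y, y', \ldots, y^{(n-1)}]^T$, and then integrates the system with a classical fourth-order Runge-Kutta step. The function names `companion` and `rk4_step`, the test equation $y'' + y = 0$, and the step size are all illustrative assumptions, not taken from the text.

```python
import math

# Sketch: rewrite y^(n) + a[n-1] y^(n-1) + ... + a[0] y = 0 as dz/dt = A z.
# Function names, test problem, and step size are illustrative choices.

def companion(a):
    """System matrix for the state z = [y, y', ..., y^(n-1)]."""
    n = len(a)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0            # d/dt y^(i) = y^(i+1)
    A[n - 1] = [-ai for ai in a]     # highest derivative from the ODE itself
    return A

def rk4_step(A, z, h):
    """One classical Runge-Kutta step for the linear system dz/dt = A z."""
    f = lambda v: [sum(A[i][j] * v[j] for j in range(len(v)))
                   for i in range(len(v))]
    k1 = f(z)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)])
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)])
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)])
    return [zi + h / 6.0 * (p + 2 * q + 2 * r + s)
            for zi, p, q, r, s in zip(z, k1, k2, k3, k4)]

A = companion([1.0, 0.0])   # y'' + 0*y' + 1*y = 0
z = [1.0, 0.0]              # y(0) = 1, y'(0) = 0, so y(t) = cos t
h, steps = 0.01, 100        # integrate to t = 1
for _ in range(steps):
    z = rk4_step(A, z, h)
err = abs(z[0] - math.cos(1.0))
```

The first component of `z` tracks $y(t) = \cos t$ and the second tracks $y'(t) = -\sin t$, so the second-order equation has been traded for two coupled first-order equations.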
UnlessA is diagonal, all of the individual scalar equations in the
system(2.4)are coupled. The only practical way to find a solution to
the system is to try to decouple it. But we already know how to do
thisweuse the eigenvector decomposition A = MJM-I , where J is
theJordan form for A (Section 1.4). Letting y = M-I x be the solution
vectorin the eigenvector coordinate system, we write
=Jy
If A can be completely diagonalized, then J = A = diag(1,112,... , An)
and the equations in the y coordinates are completely decoupled. The
solutionis
Yi(t) = e ttCi
or
Y = eAtc
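The decoupling argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the matrix $A$, its eigenpairs, and the initial condition are hypothetical values chosen so that $M$ and $M^{-1}$ can be written by hand. It evaluates $x(t) = M e^{\Lambda t} M^{-1} x_0$ and compares the result with a direct fourth-order Runge-Kutta integration of $\dot{x} = Ax$.

```python
import math

# Hypothetical example (not from the text): A has eigenvalues -1 and -3
# with eigenvectors (1, 1) and (1, -1).
A = [[-2.0, 1.0], [1.0, -2.0]]
lam = [-1.0, -3.0]
M = [[1.0, 1.0], [1.0, -1.0]]        # columns are the eigenvectors
Minv = [[0.5, 0.5], [0.5, -0.5]]     # inverse of M, written by hand


def matvec(B, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [B[0][0]*v[0] + B[0][1]*v[1], B[1][0]*v[0] + B[1][1]*v[1]]


def solve_decoupled(x0, t):
    """x(t) = M exp(Lambda t) M^{-1} x0, using the decoupled y coordinates."""
    c = matvec(Minv, x0)                              # c = y(0) = M^{-1} x0
    y = [math.exp(lam[i]*t)*c[i] for i in range(2)]   # y_i(t) = exp(lam_i t) c_i
    return matvec(M, y)


def solve_rk4(x0, t, steps=2000):
    """Integrate dx/dt = A x directly with classical Runge-Kutta."""
    h = t/steps
    x = list(x0)
    for _ in range(steps):
        k1 = matvec(A, x)
        k2 = matvec(A, [x[i] + 0.5*h*k1[i] for i in range(2)])
        k3 = matvec(A, [x[i] + 0.5*h*k2[i] for i in range(2)])
        k4 = matvec(A, [x[i] + h*k3[i] for i in range(2)])
        x = [x[i] + h*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i])/6 for i in range(2)]
    return x


x0 = [1.0, 0.0]
x_exact = solve_decoupled(x0, 1.0)
x_num = solve_rk4(x0, 1.0)
print(x_exact, x_num)
```

For a larger or defective $A$ the eigenvectors would have to be computed numerically and the Jordan form handled separately; this sketch covers only the diagonalizable case.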
an initial-valUe problem
constants. For
Vi
general consequence
of
defined by
conditionxo that lies on the line
an initial
that starts in an invariant subspace never leaves it. Similarly, each pair
that
2.2
First-order Linear
form
real
ill
x(t)
Systems
101
as
sin wt)vr +
(2.5)
where $v_1$ is the eigenvector corresponding to $\lambda$ and $v_2$ is the generalized eigenvector; compare with Example 1.13. The line defined by the
eigenvector $v_1$ is an invariant subspace, as is the plane defined by $v_1$
and $v_2$. However, the line defined by the generalized eigenvector is
not invariant.
this term allows solutions to grow initially even when all of the eigenval-
constant-coefficient
problem $\dot{x} = Ax$ can be rewritten as $\dot{y} = Jy$,
where $J$ has a block diagonal structure exemplified by the following
others.
= Ax =
the characteristic equation for $A$ is
Figure 2.1: Dynamical regimes for the planar system $dx/dt = Ax$, $A \in \mathbb{R}^{2\times 2}$, parametrized in the determinant and trace of $A$; see also Strang (1986, Fig. 6.7). Regions labeled in the figure: stable spiral, unstable spiral, stable node, unstable node, and unstable saddle.
plane in that regime; the axes correspond to the eigenvectors (or real
and imaginary parts of the eigenvectors in the case of complex conjugates) and trajectories $x(t)$ on this plane are shown with time as
2.2.3 Qualitative Dynamics of Planar Systems
In a general $n$-dimensional system, there is a large range of possible combinations of eigenvalues, real and complex, with positive or negative real parts. For $n = 2$, a simple and general classification of the possible dynamics is possible. Such systems are called PLANAR, because all of the dynamics occur on a simple plane (sometimes called the PHASE PLANE) defined by two eigenvectors (or an eigenvector and generalized eigenvector, if $A$ is defective). Writing
$$\dot{x} = Ax$$
decays exponentially to the origin: the origin is ASYMPTOTICALLY STABLE.
Figure: determinant-trace diagram for the borderline cases, with regions labeled neutral center ($\lambda = \pm i\omega$), stable node or stable star, and unstable node or unstable star.
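The trace-determinant classification of planar systems can be expressed as a short routine. The following is a minimal sketch, not the authors' procedure: the function name `classify_planar` and the returned labels are our own, and borderline cases (zero determinant, zero trace, zero discriminant) are lumped into coarse categories.

```python
def classify_planar(a, b, c, d):
    """Classify the origin of dx/dt = [[a, b], [c, d]] x from trace and determinant."""
    tr = a + d
    det = a*d - b*c
    disc = tr*tr - 4.0*det      # sign separates real from complex eigenvalues
    if det < 0:
        return "saddle"                        # real eigenvalues of opposite sign
    if det == 0:
        return "degenerate (zero eigenvalue)"
    if tr == 0:
        return "center"                        # purely imaginary eigenvalues
    kind = "spiral" if disc < 0 else "node"    # disc == 0 (star or defective node) counted as node
    return ("stable " if tr < 0 else "unstable ") + kind


print(classify_planar(-2.0, 1.0, 1.0, -2.0))
print(classify_planar(-1.0, -2.0, 2.0, -1.0))
print(classify_planar(1.0, 0.0, 0.0, -1.0))
```

Exact comparisons with zero are used only because the example entries are exact; with computed matrices a tolerance would be needed near the boundaries between regimes.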
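The Laplace transform is defined for $s$ such that $\operatorname{Re}(s) > c$
$$L(f(t)) = \bar{f}(s) = \int_0^\infty e^{-st} f(t)\,dt \qquad \operatorname{Re}(s) > c \tag{2.6}$$
The inverse formula is given by
$$f(t) = L^{-1}(\bar{f}(s)) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{st}\bar{f}(s)\,ds \tag{2.7}$$
The transform has the following properties.
1. Linearity. The inverse transform is also linear
$$L^{-1}(\alpha\bar{f}(s) + \beta\bar{g}(s)) = \alpha f(t) + \beta g(t)$$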
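2. Transform of derivatives
$$L\left(\frac{df(t)}{dt}\right) = s\bar{f}(s) - f(0)$$
$$L\left(\frac{d^2f(t)}{dt^2}\right) = s^2\bar{f}(s) - sf(0) - f'(0)$$
$$L\left(\frac{d^nf(t)}{dt^n}\right) = s^n\bar{f}(s) - s^{n-1}f(0) - s^{n-2}f'(0) - \cdots - f^{(n-1)}(0)$$
3. Transform of integral
$$L\left(\int_0^t f(t')\,dt'\right) = \frac{1}{s}\bar{f}(s)$$
4. Derivative of transform with respect to $s$
$$L(t^n f(t)) = (-1)^n\frac{d^n\bar{f}(s)}{ds^n}$$
5. Time delay
$$L(f(t - a)H(t - a)) = e^{-as}\bar{f}(s)$$
in which $H(t)$ is the unit step function.
6. Laplace convolution theorem
$$L\left(\int_0^t f(t')g(t - t')\,dt'\right) = L\left(\int_0^t f(t - t')g(t')\,dt'\right) = \bar{f}(s)\bar{g}(s)$$
7. Final value theorem
$$\lim_{s\to 0} s\bar{f}(s) = \lim_{t\to\infty} f(t)$$
if and only if $s\bar{f}(s)$ is bounded for all $\operatorname{Re}(s) \ge 0$
8. Initial-value theorem
$$\lim_{s\to\infty} s\bar{f}(s) = \lim_{t\to 0^+} f(t)$$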
$$m\frac{d^2y}{dt^2} = F - Ky$$
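Table 2.1: Laplace transform pairs.
$f(t)$              $\bar{f}(s)$
$t$                 $1/s^2$
$t^n$               $n!/s^{n+1}$
$\cos\omega t$      $s/(s^2 + \omega^2)$
$\sin\omega t$      $\omega/(s^2 + \omega^2)$
$\sinh\omega t$     $\omega/(s^2 - \omega^2)$
$\cosh\omega t$     $s/(s^2 - \omega^2)$
$e^{at}$            $1/(s - a)$
$te^{at}$           $1/(s - a)^2$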
We require two boundary conditions for this second-order equation. If we assume the particle is initially at rest at the origin, then both boundary conditions are specified at $t = 0$ and these initial conditions are
$$y(0) = 0 \qquad \frac{dy}{dt}(0) = 0$$
If we divide by the mass $m$, the model becomes
$$\frac{d^2y}{dt^2} + k^2y = f$$
in which $k^2 = K/m$ and $f = F/m$. Take the Laplace transform of the model and find the position of the particle versus time $y(t)$, for an arbitrary forcing function $f(t)$.
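Solution
Taking the Laplace transform of the equation of motion and substituting in the two initial conditions gives
$$s^2\bar{y}(s) - sy(0) - y'(0) + k^2\bar{y}(s) = \bar{f}(s)$$
$$s^2\bar{y}(s) + k^2\bar{y}(s) = \bar{f}(s)$$
Solving this equation for $\bar{y}(s)$ gives
$$\bar{y}(s) = \frac{\bar{f}(s)}{s^2 + k^2}$$
The two factors have the inverse transforms
$$L^{-1}(\bar{f}(s)) = f(t) \qquad L^{-1}\left(\frac{1}{s^2 + k^2}\right) = \frac{1}{k}\sin kt$$
The first follows by the definition of $\bar{f}(s)$ and the second follows from Table 2.1. Using the convolution theorem then gives
$$y(t) = \frac{1}{k}\int_0^t f(t')\sin k(t - t')\,dt'$$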
$$\frac{dx}{dt} = ax + bu(t) \qquad x(0) = x_0$$
$$\bar{x}(s) = \frac{x_0}{s - a} + \frac{b\,\bar{u}(s)}{s - a}$$
We can invert the first term directly using Table 2.1, and the second
term using the table and the convolution theorem giving
$$x(t) = x_0e^{at} + b\int_0^t e^{a(t - t')}u(t')\,dt'$$
We see the effect of the initial condition $x_0$ and the forcing term $u(t)$.
If $a < 0$ so the system is asymptotically stable, the effect of the initial
condition decays exponentially with time. The forcing term affects the
solution through the convolution of $u$ with the time-shifted exponential.
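The convolution form of the scalar solution is easy to test numerically. The sketch below is an illustration only: the parameter values $a = -1$, $b = 2$, $x_0 = 3$ and the input $u(t) = \cos t$ are hypothetical choices, and the integral is approximated with the trapezoid rule. The result is compared with a Runge-Kutta integration of $dx/dt = ax + bu(t)$.

```python
import math


def x_convolution(t, a, b, x0, u, n=2000):
    """x(t) = x0 e^{a t} + b * integral_0^t e^{a (t - s)} u(s) ds (trapezoid rule)."""
    h = t/n
    total = 0.0
    for i in range(n + 1):
        s = i*h
        w = 0.5 if i in (0, n) else 1.0   # trapezoid end-point weights
        total += w*math.exp(a*(t - s))*u(s)
    return x0*math.exp(a*t) + b*h*total


def x_rk4(t, a, b, x0, u, n=2000):
    """Direct integration of dx/dt = a x + b u(t) with classical Runge-Kutta."""
    h = t/n
    x, s = x0, 0.0

    def f(s, x):
        return a*x + b*u(s)

    for _ in range(n):
        k1 = f(s, x)
        k2 = f(s + 0.5*h, x + 0.5*h*k1)
        k3 = f(s + 0.5*h, x + 0.5*h*k2)
        k4 = f(s + h, x + h*k3)
        x += h*(k1 + 2*k2 + 2*k3 + k4)/6
        s += h
    return x


a, b, x0 = -1.0, 2.0, 3.0          # hypothetical parameter values
xc = x_convolution(2.0, a, b, x0, math.cos)
xr = x_rk4(2.0, a, b, x0, math.cos)
print(xc, xr)
```

With a constant input $u = 1$ the integral can be done by hand, $x(t) = x_0e^{at} + (b/a)(e^{at} - 1)$, which gives a second check on the quadrature.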
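$$\frac{dx}{dt} = Ax + Bu(t) \qquad x(0) = x_0$$
using the transform pair
$f(t) = e^{At}$, $A \in \mathbb{R}^{n\times n}$ \qquad $\bar{f}(s) = (sI - A)^{-1}$
$$x(t) = e^{At}x_0 + \int_0^t e^{A(t - t')}Bu(t')\,dt'$$
in which $x(t)$ and $x_0$ are $n \times 1$ and $e^{At}$ is $n \times n$.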
$$g_\alpha(x) = \frac{1}{\sqrt{4\pi\alpha}}\,e^{-x^2/4\alpha} \tag{2.8}$$
(2.9)
where $\alpha > 0$. Setting $x = 0$ and then taking the limit $\alpha \to 0$ shows that
$g_\alpha(0) \to \infty$, while setting $x = x_0 \neq 0$ and taking the same limit
shows that for any nonzero $x_0$, $g_\alpha(x_0) \to 0$. These functions
become infinitely high and infinitely narrow. Furthermore, they both
have unit area
$$\int_{-\infty}^{\infty} g_\alpha(x)\,dx = 1$$
(2.10)
dH(x)
dx
Also note that the interval of integration in (2.10) does not have to be
$(-\infty, \infty)$. The integral over any interval containing the point of singularity for the delta function produces the value of $f(x)$ at the point of
singularity. For example
for all
for all $a \in \mathbb{R}$
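The limiting behavior of the Gaussian family (2.8) can be observed numerically. The sketch below is a hedged illustration: the test function $f(x) = \cos x$, the singularity location $a = 0.3$, the integration interval, and the quadrature resolution are our own assumptions. It checks that the area stays equal to one and that $\int f(x)\,g_\alpha(x - a)\,dx$ approaches $f(a)$ as $\alpha$ decreases.

```python
import math


def g(x, alpha):
    """Gaussian approximation to the delta function, as in (2.8)."""
    return math.exp(-x*x/(4.0*alpha))/math.sqrt(4.0*math.pi*alpha)


def integrate(func, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo)/n
    s = 0.5*(func(lo) + func(hi))
    for i in range(1, n):
        s += func(lo + i*h)
    return s*h


a = 0.3                      # hypothetical location of the singularity
results = {}
for alpha in (1e-1, 1e-2, 1e-3):
    area = integrate(lambda x: g(x, alpha), -5.0, 5.0)
    sift = integrate(lambda x: math.cos(x)*g(x - a, alpha), -5.0, 5.0)
    results[alpha] = (area, sift)
    print(alpha, area, sift, math.cos(a))
```

The interval $(-5, 5)$ stands in for $(-\infty, \infty)$; any interval that comfortably contains the peak gives the same result, which mirrors the remark about the interval of integration.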
Derivatives of the Delta Function
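Doublet. The delta function is also differentiable. The first derivative, usually denoted $\delta'(x)$, is
$$\delta'(x) = \frac{d\delta(x)}{dx}$$
$$\int_{-\infty}^{\infty} f(x)\,\delta'(x)\,dx = -f'(0)$$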
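Higher-order derivatives. Repeated integration by parts produces the following higher-order formulas for triplets, quadruplets, etc.
$$\int_{-\infty}^{\infty} f(x)\,\delta^{(n)}(x)\,dx = (-1)^nf^{(n)}(0) \qquad n \ge 0$$
As with the singlet and doublet, we can change the variable of integration and shift the location of the singularity to obtain the general
formula
$$\int_{-\infty}^{\infty} f(x)\,\delta^{(n)}(x - a)\,dx = (-1)^nf^{(n)}(a) \qquad n \ge 0, \quad a \in \mathbb{R}$$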
x 2d2y
dy
dx
(2.12)
tion, has a simple exact solution that illustrates many important features of variable-coefficient problems and arises during the solution of
many problems. The second-order Cauchy-Euler equation has the form
$$a_0x^2y'' + a_1xy' + a_2y = 0 \tag{2.13}$$
where $y' = dy/dx$. Its defining feature is that the term containing the
and
$$a_0\alpha(\alpha - 1) + a_1\alpha + a_2 = 0$$
solution is found
$$y = c_1x^{\alpha_1} + c_2x^{\alpha_2} \tag{2.14}$$
For example, let $a_0 = 1$, $a_1 = 1$, $a_2 = -9$, yielding the equation $\alpha^2 - 9 = 0$, which has solutions $\alpha = \pm 3$. Thus the equation has two solutions, $y = x^3$ and $y = x^{-3}$.
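The Cauchy-Euler recipe reduces to solving a quadratic for the exponent. As a minimal sketch (the function names are ours, and only the distinct-root case is treated), the code below computes the exponents for the example $a_0 = 1$, $a_1 = 1$, $a_2 = -9$ and substitutes $y = x^\alpha$ back into (2.13) to confirm that the residual vanishes.

```python
import cmath


def cauchy_euler_exponents(a0, a1, a2):
    """Roots of a0*alpha*(alpha - 1) + a1*alpha + a2 = 0."""
    # Expanded form: a0*alpha^2 + (a1 - a0)*alpha + a2 = 0
    bq = a1 - a0
    disc = cmath.sqrt(bq*bq - 4.0*a0*a2)
    return ((-bq + disc)/(2.0*a0), (-bq - disc)/(2.0*a0))


def residual(alpha, x, a0, a1, a2):
    """Substitute y = x**alpha into a0 x^2 y'' + a1 x y' + a2 y, using exact derivatives."""
    y = x**alpha
    dy = alpha*x**(alpha - 1)
    d2y = alpha*(alpha - 1)*x**(alpha - 2)
    return a0*x*x*d2y + a1*x*dy + a2*y


r1, r2 = cauchy_euler_exponents(1.0, 1.0, -9.0)
print(r1, r2, residual(r1, 2.0, 1.0, 1.0, -9.0), residual(r2, 2.0, 1.0, 1.0, -9.0))
```

The roots are returned as complex numbers so that oscillatory cases (complex exponents) are handled by the same routine; a repeated root would need the second, logarithmic solution, which is not covered here.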
linear problem
(x), the
yields
Ax + Ax = O
which simplifies to
constants. Thus the general solution for this problem can be written
y(x) =
(2.15)
q(x)
r(x)
p(x)
p(x)
(2.16)
(2.17)
series expansion, at some point $x = a$, then $a$ is an ORDINARY POINT.
Otherwise, $x = a$ is a SINGULAR POINT.
2.3 Linear
power series
115
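$$\sum_{n=2}^{\infty} n(n - 1)c_nx^{n-2} + k^2\sum_{n=0}^{\infty} c_nx^n = 0$$
The two sums can be combined if we can make their lower limits the same.
$$\sum_{m=0}^{\infty}\left[(m + 2)(m + 1)c_{m+2} + k^2c_m\right]x^m = 0$$
This can only hold if the term inside the square brackets is zero for all
$m$, requiring that
$$c_{n+2} = -\frac{c_nk^2}{(n + 2)(n + 1)}$$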
IA full understanding of convergence of power series requires knowledge of functions of complex variables, see, e.g., Ablowitz and Fokas (2003).
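(where we have now reverted to the index $n$). Noting that $c_0$ and $c_1$ are arbitrary, we find
$$c_2 = -\frac{c_0k^2}{2} \qquad c_3 = -\frac{c_1k^2}{3\cdot 2} = -\frac{c_1k^2}{3!}$$
$$c_4 = -\frac{c_2k^2}{4\cdot 3} = \frac{c_0k^4}{4!} \qquad c_5 = -\frac{c_3k^2}{5\cdot 4} = \frac{c_1k^4}{5!}$$
so that
$$y(x) = c_0\left(1 - \frac{k^2x^2}{2!} + \frac{k^4x^4}{4!} - \cdots\right) + \frac{c_1}{k}\left(kx - \frac{k^3x^3}{3!} + \frac{k^5x^5}{5!} - \cdots\right)$$
The two series in parentheses are the Taylor expansions of two familiar functions, and we can thus rewrite the general solution as
$$y(x) = c_0\cos kx + c_1\sin kx$$
in which the constant factor $1/k$ has been absorbed into the arbitrary constant $c_1$.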
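The two-term recursion for the power series coefficients can be run directly. The sketch below is illustrative only: the values of $k$, $x$, $c_0$, $c_1$ and the number of terms are hypothetical. It generates the coefficients from $c_{n+2} = -c_nk^2/((n+2)(n+1))$ and compares the partial sum with the closed form, in which the series coefficient $c_1$ multiplies $(1/k)\sin kx$.

```python
import math


def series_solution(x, k, c0, c1, nterms=40):
    """Partial sum of y = sum c_n x^n with c_{n+2} = -k^2 c_n / ((n+2)(n+1))."""
    c = [c0, c1]
    for n in range(nterms - 2):
        c.append(-k*k*c[n]/((n + 2)*(n + 1)))
    return sum(cn*x**n for n, cn in enumerate(c))


k, x = 2.0, 0.7                     # hypothetical values
y_series = series_solution(x, k, c0=1.5, c1=-0.4)
# Closed form with the same c0 = y(0) and c1 = y'(0)
y_closed = 1.5*math.cos(k*x) + (-0.4/k)*math.sin(k*x)
print(y_series, y_closed)
```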
set $a = 0$ from now on for convenience. Now $q(x)/p(x)$ and $r(x)/p(x)$
are not analytic and $x = 0$ is called a SINGULAR POINT. If $x\,(q(x)/p(x))$
$$y(x) = x^{\alpha}\sum_{n=0}^{\infty} c_nx^n \tag{2.19}$$
2.3 Linear
117
Here X
Ilius.
Solution
n=0
+ CnX
(n+a)cnx n+cxl
Since $x$ can vary, the equality can only hold if the terms in the brackets
are all zero. This is the recursion formula for the coefficients $c_n$. The
first term ($n = -2$) picks out the Cauchy-Euler behavior and is called the
INDICIAL EQUATION. Since $c_{-2} = 0$, it reduces to $\alpha^2 = 0$.
As we anticipated above with the corresponding Cauchy-Euler equation,
this has the repeated root $\alpha = 0$. The general recursion relation for the
coefficients reads
$$c_{n+2} = -\frac{c_n}{(n + 2)^2}$$
Since $c_{-1} = 0$, all the coefficients with $n$ odd are zero. Therefore, only
one of the two solutions to the problem has the form of (2.19), again,
in parallel with the Cauchy-Euler analysis. With some rearrangements,
this solution becomes
n=0
In the previous example, the indicial equation yielded a single repeated root and one solution of Frobenius form. Other cases are possible. Here are the possibilities and their consequences.
Cauchy-Euler case.
Table 2.3 layout: columns for rectangular, cylindrical, and spherical coordinates, with plots labeled $\cos x$ and $\sin x$; $J_0(r)$ and $Y_0(r)$; $\cos r$ and $\sin r$; $I_0(r)$ and $K_0(r)$.
Table 2.3: The linear differential equations arising from the radial
part of $\nabla^2y \pm y = 0$ in rectangular, cylindrical, and spherical coordinates. Bessel functions ($J_0$, $Y_0$) and modified
Bessel functions ($I_0$, $K_0$) are two linearly independent so-
(u(x),v(x)) =
b, a natural
analog
dx
u(x)(x) dx
dx
(u(x),
definitions, a
is onethat
2 < 00
dx = llu11w
ei
The same is true in a Hilbert space, except that each basis vector is now a function $\phi_i(x)$ and the sum is infinite,2 e.g.,
(2.22)
$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx \tag{2.23}$$
$$f_K(x) = \sum_{k=-K}^{K} c_ke^{ikx}$$
The question of convergence of this series to the function $f$ is nontrivial; we state without proof that for functions in $L_2(-\pi, \pi)$
$$\|f(x) - f_K(x)\| \to 0 \quad \text{as } K \to \infty$$
The rate of convergence of $f_K$ to $f$ depends on the behavior of the
by parts
(2.24)
(2.25)
(2.26)
Therefore $|c_k|$ decays at least as fast as $k^{-1}$ as $k \to \infty$. This is often written as $c_k = O(k^{-1})$: "$c_k$ is order $k^{-1}$." If, additionally, $f(\pi) = f(-\pi)$
Figure 2.4: plot of the truncated Fourier series $f_K(x)$ versus $x$; the legend includes $K = 10$.
The case just discussed, in which $c_k = O(k^{-2})$, corresponds to $j = 2$. For infinitely smooth periodic functions, this argument implies that the Fourier coefficients decay faster than any finite negative
power of $k$. This is called exponential or SPECTRAL convergence. Figure 2.4 shows truncated Fourier series approximations to the function
$f(x) = \exp(-8(x/\pi)^2)$ with several values of $K$. Although this function
is not exactly periodic, its function values and derivatives at $x = \pm\pi$
are extremely small, so convergence is rapid.
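The rapid convergence described here can be reproduced with a few lines of code. The sketch below rests on assumptions that should be stated plainly: the test function is taken to be $f(x) = \exp(-8(x/\pi)^2)$ on $(-\pi, \pi)$, the coefficients $c_k$ are approximated with an $N = 256$ point periodic trapezoid rule rather than computed exactly, and the error is measured only on a subset of the grid points.

```python
import cmath
import math


def f(x):
    """Assumed test function on (-pi, pi)."""
    return math.exp(-8.0*(x/math.pi)**2)


N = 256
xs = [-math.pi + 2.0*math.pi*j/N for j in range(N)]


def coeff(k):
    """c_k = (1/2pi) * integral of f(x) exp(-i k x) over (-pi, pi), periodic trapezoid rule."""
    return sum(f(x)*cmath.exp(-1j*k*x) for x in xs)/N


def truncated(x, K, c):
    """f_K(x) = sum over k = -K..K of c_k exp(i k x); real part taken since f is real."""
    return sum(c[k]*cmath.exp(1j*k*x) for k in range(-K, K + 1)).real


c = {k: coeff(k) for k in range(-20, 21)}
errors = {K: max(abs(truncated(x, K, c) - f(x)) for x in xs[::4])
          for K in (2, 5, 10, 20)}
for K in sorted(errors):
    print(K, errors[K])
```

Because the function is not exactly periodic, the error eventually stops falling exponentially and is limited by the small mismatch in the derivative at $x = \pm\pi$; the printed errors show this floor for the larger values of $K$.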
k0
fK(x) = co + 2
Fourier
fK(x) =
sinkx
This series contains only sines, not cosines, reflecting the fact
that the
nonperiodic function.
The plot remains essentially the same if the discontinuity is in the
interior rather than on the boundary. For example, the function
$f(x) =$
is periodic (along with all its derivatives) but has a discontinuity at the
origin. The Fourier series of this function is the same as that for the
previous, except shifted by $\pi$
fK(x) =
sink(x + 7T)=
-2
sin kx
Figure 2.5: Truncated trigonometric Fourier series approximation to
as $K$ increases.
(2.27)
(2.28)
(2.29)
$$P_0(x) = 1 \qquad P_1(x) = x$$
$$P_{j+1}(x) = \frac{2j + 1}{j + 1}\,xP_j(x) - \frac{j}{j + 1}\,P_{j-1}(x) \tag{2.30}$$
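The three-term recursion gives a convenient way to evaluate Legendre polynomials. The sketch below assumes the standard normalization $P_j(1) = 1$ and the standard coefficients $(2j+1)/(j+1)$ and $j/(j+1)$ in the recursion; the quadrature routine and its resolution are our own choices. It evaluates $P_j(x)$ and checks orthogonality on $[-1, 1]$ with the trapezoid rule.

```python
def legendre(j, x):
    """P_j(x) from (i+1) P_{i+1} = (2i+1) x P_i - i P_{i-1}, with P_0 = 1, P_1 = x."""
    p_prev, p = 1.0, x
    if j == 0:
        return p_prev
    for i in range(1, j):
        p_prev, p = p, ((2*i + 1)*x*p - i*p_prev)/(i + 1)
    return p


def inner(m, n, npts=4000):
    """Trapezoid approximation of the integral of P_m(x) P_n(x) over [-1, 1]."""
    h = 2.0/npts
    s = 0.0
    for i in range(npts + 1):
        x = -1.0 + i*h
        w = 0.5 if i in (0, npts) else 1.0   # trapezoid end-point weights
        s += w*legendre(m, x)*legendre(n, x)
    return s*h


print(legendre(2, 0.5), legendre(3, 0.5))
print(inner(1, 3), inner(2, 2))
```

The diagonal inner products equal $2/(2j + 1)$, so $(P_2, P_2) = 2/5$; the off-diagonal ones vanish to within quadrature error.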
Figure 2.6: Function $f(x) = \exp(-8x^2)$ and truncated Legendre-Fourier series approximations with $n = 2, 5, 10$.
u(l)v(l)
u(x)v'(x)dx
(Lu,v) =
u(x)v'(x)dx
(2.31)
w (x) dx
yu(b) + u'(b) = 0
domain.
129
(u,v)w
u(x)v(x)w(x) dx
+ r(x)u v w dx
dx
(x)
aw
p(b) (u'(b)v(b) u(b)v'(b))
p(a) (u'(a)v(a) u(a)v'(a))
1
(Lu,
a w (x) dx
w dx
(2.32)
(2.33)
If the boundary terms vanish, then this expression satisfies the self-adjointness condition $(Lu, v) = (u, Lv)$. This is the case if the above boundary conditions apply on both $u$ and $v$. The restriction on the boundary conditions can be relaxed if $p(x)$ vanishes at one or both boundaries, in which case only boundedness of the function and its derivative is required at that boundary. The latter case is called a singular Sturm-Liouville operator, because it has a singular point at the boundary or boundaries where $p$ vanishes. Finally, the boundary terms also vanish if $p(a) = p(b)$ and PERIODIC BOUNDARY CONDITIONS are imposed: $u(a) = u(b)$, $u'(a) = u'(b)$ and likewise for $v$.
Next consider the eigenvalue problem associated with the Sturm-Liouville operator4
$$Lu + \lambda u = 0$$
As with all self-adjoint operators, the eigenvalues are real and the eigenvectors (now called eigenfunctions because they are elements of a function space) are orthogonal with respect to the inner product weighted by $w(x)$. Furthermore, and very importantly, there are an infinite number of eigenfunctions and they form a complete basis for $L_{2,w}(a, b)$. We next consider three Sturm-Liouville operators that produce some famous eigenfunctions that are popular choices for use as basis functions.
4This is the conventional form for writing differential eigenvalue problems. Unfortunately, it is different from the convention for algebraic problems.
Ordinary Differential
130
+ c2 cos
$\lambda > 0$: a negative value of $\lambda$ would lead to a general solution consisting of growing and decaying exponentials, which
un(x) sin
1
The result that Sturm-Liouville eigenfunctions
with (um, un) = 2mn.
f (x) =
where
(f
(sm
n=l
sin
sin
cn(x) sin
nTTX
1
=
21(f (x), sin nrx)
1
131
(2.34) is
u = exp iMx
un expi
2.4.1.
Example 2.8: Bessel's equation revisited
The operator
$$L = \frac{1}{x}\frac{d}{dx}\left(x\frac{d}{dx}\right)$$
or, multiplying through by $x^2$, as
$$x^2u'' + xu' + \lambda x^2u = 0$$
$x = 0$, so we can seek solutions by the method of Frobenius. Alternately, in the present case we can make the substitution $z = x\sqrt{\lambda}$, thus
rewriting the equation as
$$z^2\frac{d^2u}{dz^2} + z\frac{du}{dz} + z^2u = 0$$
132
Ordinary Differential
Equations
Consider
d; this is
all that
Boundednessrequires
that
Satisfaction
The top center plot of Table 2.3 shows $J_0(x)$; the positions of its zeros determine the eigenvalues $\lambda$. The first several of these are at approximately $x = 2.4, 5.5, 8.7, 11.8, \ldots$ and are tabulated in many places, including Abramowitz and Stegun (1970). Thus $\lambda_1 = (2.4/l)^2$, etc.
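The quoted zeros of $J_0$ can be recomputed without tables. The sketch below is an illustration under stated assumptions: $J_0$ is evaluated from its power series $\sum_m (-1)^m (x/2)^{2m}/(m!)^2$, which is adequate in double precision only for moderate $x$; the bracketing intervals are chosen by hand; and a domain length $l = 1$ is assumed when converting zeros to eigenvalues.

```python
import math


def j0(x, nterms=60):
    """Power series J0(x) = sum_m (-1)^m (x/2)^(2m) / (m!)^2 (moderate x only)."""
    term, total = 1.0, 1.0
    q = -(x*x)/4.0
    for m in range(1, nterms):
        term *= q/(m*m)        # ratio of successive terms
        total += term
    return total


def bisect(func, lo, hi, tol=1e-12):
    """Bisection for a root of func in [lo, hi]; assumes a sign change on the interval."""
    flo = func(lo)
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        fmid = func(mid)
        if flo*fmid <= 0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return 0.5*(lo + hi)


# Hand-picked brackets around the first four zeros
brackets = [(2.0, 3.0), (5.0, 6.0), (8.0, 9.0), (11.0, 12.5)]
zeros = [bisect(j0, lo, hi) for lo, hi in brackets]
eigenvalues = [z*z for z in zeros]   # lambda_n = (z_n / l)^2 with l = 1 assumed
print(zeros)
print(eigenvalues)
```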
The functions
$u_n(x) = J_0(\sqrt{\lambda_n}\,x)$
It has regular singular points at $x = \pm 1$ while the origin is an ordinary point. Because $p(x) = 0$ at $x = \pm 1$, only boundedness at these
Solution reveals that, if $\lambda = l(l + 1)$ with $l \ge 0$ an integer, then one of the solutions is a Legendre polynomial of degree $l$ (Exercise 2.35) and using the method of Frobenius one can learn that the other has logarithmic singularities at $x = \pm 1$. Otherwise, there is no solution that is bounded at both $x = 1$ and $x = -1$, because the radius of convergence of a power series solution is given by the distance to the nearest singular point (Ablowitz and Fokas, 2003). Therefore, the eigenvalues of (2.9) are $\lambda = l(l + 1)$ with $l = 0, 1, 2, \ldots$ and the corresponding eigenfunctions are the Legendre polynomials $P_l$.
Legendre polynomials are the simplest of a broad class of ORTHOGONAL POLYNOMIALS that come from Sturm-Liouville eigenvalue problems and are orthogonal with respect to various weighted inner products. Some
Solutions
Homogeneous Boundary Conditions
Consider the nonhomogeneous second-order differential equation with
the homogeneous boundary conditions
$$Lu = f \qquad B_1u = 0 \qquad B_2u = 0 \tag{2.35}$$
$$N(L) = \{u \mid Lu = 0,\ B_1u = 0,\ B_2u = 0\}$$
(b) Or $N(L)$ contains $n$ linearly independent functions, in which case $N(L^*)$ also contains $n$ linearly independent functions
$N(L) = \{u_1, u_2, \ldots, u_n\}$
and (2.35) has a
and the general solution is
$$u(x) = u_p(x) + \sum_{k=1}^{n}\alpha_ku_k(x)$$
are arbitrary scalars.
Next we present two heat-conduction problems that display the two alternatives.
peratures
Apply the alternative theorem to the steady-state heat-conduction problem with heat generation (x) and specified end-temperature boundary
conditions
-k
d2T(x)
dX2
What can you conclude about existence and uniqueness of the steady-state temperature profile?
Solution
Bill -O
=O
135
= /k and
inwhichf
Blu = u(0)
B2u= u(l)
Blu
u(0) = b = 0
B2u = u(l) = a = 0
and we see that $u = 0$ is the only element of $N(L)$. We can therefore conclude that $N(L^*)$ also contains only the zero element, and the steady-state temperature profile exists and is unique for any heat-removal rate
B2T= o
in whichf = /k and
d2
BIT = Tx(0)
B2T= Tx(l)
Ordinary Differential
136
Setting LT = 0 gives
T(x) = Tp(x)
where $T_p$ is any particular solution. Since $f$ corresponds to a rate of
heat removal (or addition when $f < 0$) to the domain, the restriction on
B2u = Y2
(2.36)
The null spaces of the operator and the adjoint are defined as in the
2,
Function
Spa ces
define
WIIe11
137
(u, L*v) =
Evaluating $J(u, v_k)$ for $u$ satisfying $B_1u = \gamma_1$ and $B_2u = \gamma_2$, and $v_k$ satisfying $B_1^*v_k = 0$, $B_2^*v_k = 0$, gives the solvability conditions for the nonhomogeneous problem. The next example and Exercise 2.40 derive the solvability conditions for problems with nonhomogeneous boundary conditions.
$$LT = f \qquad B_1T = \gamma_1 \qquad B_2T = \gamma_2$$
Ordinary Differential
138
and
in whichf = /k
d2
dx2
B2T= Tx(l)
BIT = Tx(0)
and
by parts gives
(Lu,v)
B2T= Tx(l)
BIT = Tx(0)
- dV1
B2*V1
dx
dV1
BI*VI- dx
=O
dt'l
dV1
dx
$$\int_0^1 f(x)\,dx = \gamma_2 - \gamma_1$$
$$T(x) = T_p(x) + \alpha$$
The restriction on $f$ now stipulates that the net heat generation must
exactly balance the heat removed through the two ends. Again, for $f$
satisfying this restriction, a constant can be added to any steady-state
solution to provide another steady-state solution.
Figure 2.8: $u(t)$, $du(t)/dt$, $\gamma\delta(t)$, $f(t)$, and $\tilde{f}(t) = f(t) + \gamma\delta(t)$ plotted versus $t$.
geneous ones, but then compensate for this change by adding appropriate impulsive terms to the forcing term of the differential equation.
In this way, we have to recall only how to solve problems with homogeneous boundary conditions.
It is perhaps easiest to introduce the approach with an example.
Let's say we are interested in solving the first-order nonhomogeneous differential equation
$$\frac{du}{dt} = f(t)$$
with boundary (initial) condition $u(0) = \gamma$. The solution is sketched in Figure 2.8. Imagine instead that we solve the problem with the homogeneous boundary condition $u(0^-) = 0$, and we push the boundary at $t = 0$ slightly to the left of zero. Now we wish to make the solution jump just after time 0 to value $u(0) = \gamma$ so that it agrees with the solution to the problem with nonhomogeneous boundary condition at $t = 0$. This idea is also sketched in Figure 2.8. To make $u(t)$ jump discontinuously by amount $\gamma$ at $t = 0$, we require $du/dt$ to have an impulse of strength $\gamma$ at $t = 0$, which is $\gamma\delta(t)$. Since $du/dt = f(t)$, we introduce a modified forcing term and choose it to be
$$\tilde{f}(t) = f(t) + \gamma\delta(t)$$
We conjecture that solving the problem with this modified forcing term
and homogeneous boundary condition should give us the solution to
the problem with the original $f$ and nonhomogeneous boundary condition. Let's check this conjecture. By inspection, the solution to the
differential equation is obtained by integration
$$\frac{du}{dt} = \tilde{f}(t)$$
$$\int_{u(0^-)}^{u(t)} du = \int_{0^-}^{t}\tilde{f}(\tau)\,d\tau$$
$$u(t) - u(0^-) = \int_{0^-}^{t}\tilde{f}(\tau)\,d\tau$$
$$u(t) = \int_{0^-}^{t}\tilde{f}(\tau)\,d\tau$$
Note that this solution satisfies the homogeneous boundary condition
$u(0^-) = 0$ as desired. Now we substitute the definition of $\tilde{f}$ to obtain
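$$u(t) = \int_{0^-}^{t}\left(f(\tau) + \gamma\delta(\tau)\right)d\tau = \gamma\int_{0^-}^{t}\delta(\tau)\,d\tau + \int_{0^-}^{t}f(\tau)\,d\tau$$
$$u(t) = \gamma + \int_0^t f(\tau)\,d\tau \qquad t > 0$$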
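The impulsive-forcing idea can be tried numerically by replacing the delta function with a narrow Gaussian. The sketch below is only an illustration under assumptions of our own: the delta function is approximated by a Gaussian of the form $g_\alpha$ with $\alpha = 10^{-6}$, the boundary is pushed to $t = -0.02$, the forcing is $f(t) = \cos t$ switched on at $t = 0$, and $\gamma = 2$. The homogeneous problem with modified forcing should then reproduce $u(t) = \gamma + \int_0^t f(\tau)\,d\tau$.

```python
import math


def g(t, alpha):
    """Narrow Gaussian standing in for the delta function."""
    return math.exp(-t*t/(4.0*alpha))/math.sqrt(4.0*math.pi*alpha)


def u_impulse(t, f, gamma, alpha=1e-6, t_start=-0.02, n=40000):
    """Integrate du/dt = f(t) H(t) + gamma g_alpha(t) from u(t_start) = 0 (trapezoid rule)."""
    h = (t - t_start)/n

    def rhs(s):
        # original forcing acts only for s >= 0; the impulse is centered at s = 0
        return (f(s) if s >= 0.0 else 0.0) + gamma*g(s, alpha)

    total = 0.5*(rhs(t_start) + rhs(t))
    for i in range(1, n):
        total += rhs(t_start + i*h)
    return total*h


gamma = 2.0                              # hypothetical initial value
u_delta = u_impulse(1.0, math.cos, gamma)
u_direct = gamma + math.sin(1.0)         # gamma + integral of cos from 0 to 1
print(u_delta, u_direct)
```

The small remaining difference comes from the finite width of the Gaussian and the quadrature step, and shrinks as both are reduced.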
Solution
We replace the nonhomogeneous boundary conditions of Example 2.13
with the homogeneous version
$$B_1T = T_x(0^-) = 0 \qquad B_2T = T_x(1^+) = 0$$
) (x)
Ordinary Differential
142
Equations
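gives
$$\int_{0^-}^{1^+}\left(f(x) + \gamma_1\delta(x) - \gamma_2\delta(x - 1)\right)dx = 0$$
$$\int_0^1 f(x)\,dx + \gamma_1 - \gamma_2 = 0$$
for $f$ satisfying
$$\int_0^1 f(x)\,dx = \gamma_2 - \gamma_1$$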
T(x) = Tp(x) + (x
We see that we have reached the same solvability condition found
in Example 2.13. By introducing $\tilde{f}$ and using homogeneous boundary
conditions, we avoid the additional complication of introducing and
evaluating $J(u, v)$ as explained in Section 2.4.3. Evaluating $J(u, v)$ is
about the same work as determining the appropriate $\tilde{f}$. But using delta
$x$ variable playing the role of time. Note that the value of $u(0)$ and $u_x(0)$ shows up in the transform. Evaluate $u(0)$ and leave $u_x(0)$ as an unknown constant.
$u(x) =$
The function $G(x, \xi)$ is known as the Green's function for the nonhomogeneous problem.6 Write out the Green's function $G(x, \xi)$.
Solution
s2(s)
= f + ux(0)
ux(0)
6The Green's function concept is explored in greater detail in Chapter 3, Section 3.3.5.
Ordinary Differential
144
(b) Usingthe
transform pair
u(x)
Equations
Sinhkx
theorem gives
sinh(k(x
ux(0)
k
Sinh loc
-1
Sinh k
ux(0) sinhk o
Substituting ux (0) into the previous solution gives
u(x)
Sinh kx
k sinhk
u(x) =
with
1
sinh(k(x
kx
sinh(k(l
k Sinh k
Sinh
; ) )
Sinh kx Sinh
k Sinh k
9)
E<x
E)
(e) We work on the first part of G(x, ; ) using the Sinh difference
formula
Sinhkx
k Sinh k
1
k Sinh k
sinh(k(l 9)
and Stability
145
formula on
the term in
E)) sinhkx
Sinh k(sinh
sinhk
kx Cosh
-- Coshkx
Sinh kx(sinh
k Cosh
terms gives
parentheses
E)) =
Sinh ICE)cosh k Sinh
- E)) sinhkx
- E))=
Sinh k Cosh
kx Sinh ICE+
Sinh kx Cosh k Sinh
the differenceformula
Sinh
sinh(k(l x))
sinhkx sinhk(l
k Sinh k
dt
Figure 2.9: Solution behavior; stability (left) and asymptotic stability (right).
di dx = f (x) = f (R + xs)
dt dt
dt
So we assume without loss of generality that $x_s = 0$, i.e., the origin is
the steady state of interest. Unlike a linear system, when dealing with
a nonlinear system, stability depends on the solution of interest, and
we may have some solutions that are stable, while others are unstable.
For a given linear system, the stability of all solutions is identical, and
to reflect this special situation, we often refer to stability of the system,
rather than stability of a solution.
There are several aspects to stability, and we define these next. The
first and most basic characteristic of interest is whether a small perturbation to $x$ away from the steady-state solution results in a small subsequent deviation for all future times. The general term stability is commonly reserved for this most basic notion; we use the more precise
lim
t 00
>
=0
Definition 2.18 (Asymptotic stability). The origin is asymptotically stable if it is (i) stable and (ii) attractive.
The right side of Figure 2.9 shows a representative solution trajectory when the origin is asymptotically stable.7 One might wonder
answer is yes, the origin in a nonlinear system may be globally attractive and still not Lyapunov stable. The problem with these systems is
that there exist starting points, arbitrarily close to the origin, for which
the resulting trajectories become large before they asymptotically approach zero as time tends to infinity. Because we cannot bound how
large the solution transient becomes by constraining the size of its initial value, we classify the origin as unstable.8 Note that the system must
7Asymptotic stability is probably the most common notion of stability that people
have in mind, and sometimes it is referred to simply as stability. Of course, this usage
may cause confusion because now the term stability is being used in two ways: as
Lyapunov stability and as asymptotic stability; and one is a subset of the other.
8One is obviously free to define words as one pleases, but defining asymptotic stability in this way precludes a possible solution behavior that is not expected of "nice"
or "stable" solutions. Regardless of terminology, the important point is to be aware
that solutions can be globally attractive and not Lyapunov stable.
Ordinary Differential
148
Exercise
2.60.
for all t
fashion, we conclude that the marble at rest at the bottom of the track
is an asymptotically stable steady state, and we do not have to solve
the complicated equations of motion of the system to deduce this fact.
We wish to generalize this concept, and the key idea is to define $V(x)$
to be a nonnegative scalar function $V: \mathbb{R}^n \to \mathbb{R}_{\ge 0}$, with a negative time
derivative $\dot{V}(x(t)) \le 0$. To compute the time derivative of $V(x(t))$, we
apply the chain rule giving10
$$\dot{V} = \left(\frac{\partial V}{\partial x}\right)^T\frac{dx}{dt} = \left(\frac{\partial V}{\partial x}\right)^Tf(x) \tag{2.39}$$
10See Appendix A for various notations for derivatives with respect to vectors. Some
readers may be more familiar with this equation in the form $\dot{V}(x) = \nabla V \cdot f$
or $\dot{V}(x) = (\nabla V)^Tf$.
Ordinary Differential
150
tion V : [Vt
Equations
(2.40)
(2.41)
Then $V(\cdot)$ is a Lyapunov function for the system $\dot{x} = f(x)$.
The big payoff for having a Lyapunov function for a system is the
immediate stability analysis that it provides. We present next a few representative theorems stating these results. We mainly follow Khalil (2002)
is called
\0ththisptoperty
to indicate tha
SET,
Nextno
{X e
I llxll
an INV;
0. Therefore we
Therefore,
hiRlonllt0
gD
min
xeD,llxllr
V(x)
are not elements of B, or, equivalently, the elements of A remaining after removing the
elements of B.
origin
ts
otiCQtl
151
VCO
Figure 2.11: The origin and sets $D$, $B_r$, $V$ (shaded), and $B$.
Note that $\alpha$ is well defined because it is the minimization of a continuous function on a compact set, and $\alpha > 0$ because of (2.40). Choose
$\beta \in (0, \alpha)$ and consider the sublevel set
since(x(t))
V(x(0))
for all t
0.
Ilxll
xeV
(2.42)
Ordinary Differential
Equations
152
to some
V(x(t)) converges
{x I V(x) = c}. This level set does not
=
Vc
set
level
Considerthe
choose d > 0 such that maXllxllsdV(x) c
can
we
so
tain the origin, nonincreasingand approaches c as t
00,wehave
is
V(x(t))
Since
Bd for all t 0. Next define y as
max V(x)
V(x(t)) =
+J
V(x(0)) yt
0 andallx
in a level set Vcontained in Bn also implies that the level set Vis
connected.This followsbecause every point x e Vis then connected
ing result.
function
yapunov
L
Fu nctions
and Stability
153
proof,
inwhichc
dx
dt
= Ax
x(0) = xo
(2.45)
in which $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$. We have already discussed in Section 2.2.2 the stability
of this system and shown that $x(t) = 0$ is an
$$\frac{d}{dt}V(x(t)) = \frac{d}{dt}x^TSx = \frac{dx^T}{dt}Sx + x^TS\frac{dx}{dt} = x^TA^TSx + x^TSAx = x^T(A^TS + SA)x$$
$$A^TS + SA = -Q \tag{2.46}$$
so that
$$\frac{d}{dt}V = -x^TQx$$
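For a planar system the matrix equation $A^TS + SA = -Q$ is just three linear equations in the entries of the symmetric matrix $S$. The sketch below is a hedged illustration rather than a general-purpose solver: the matrix $A$ (eigenvalues $-1$ and $-2$) and the choice $Q = I$ are hypothetical, and the routine is written out for the $2 \times 2$ case only. It solves for $S$ and checks that $S$ is positive definite, so that $V(x) = x^TSx$ is a Lyapunov function.

```python
def solve3(Mx, rhs):
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    a = [row[:] + [r] for row, r in zip(Mx, rhs)]   # augmented matrix
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            fac = a[r][col]/a[col][col]
            for cc in range(col, n + 1):
                a[r][cc] -= fac*a[col][cc]
    x = [0.0]*n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][cc]*x[cc] for cc in range(r + 1, n)))/a[r][r]
    return x


def lyapunov_2x2(A, Q):
    """Solve A^T S + S A = -Q for symmetric S = [[s11, s12], [s12, s22]]."""
    (a11, a12), (a21, a22) = A
    # Rows are the (1,1), (1,2), and (2,2) entries of A^T S + S A
    Mx = [[2*a11, 2*a21, 0.0],
          [a12, a11 + a22, a21],
          [0.0, 2*a12, 2*a22]]
    s11, s12, s22 = solve3(Mx, [-Q[0][0], -Q[0][1], -Q[1][1]])
    return [[s11, s12], [s12, s22]]


A = [[0.0, 1.0], [-2.0, -3.0]]     # hypothetical stable matrix
Q = [[1.0, 0.0], [0.0, 1.0]]
S = lyapunov_2x2(A, Q)
# Sylvester's criterion for a 2x2 symmetric matrix
is_pos_def = S[0][0] > 0 and S[0][0]*S[1][1] - S[0][1]**2 > 0
print(S, is_pos_def)
```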
155
ATS +
= -Q
x(0) = xo
in which the sample time $k$ is an integer $k = 0, 1, 2, \ldots$. To streamline
the presentation we assume throughout that $f(\cdot)$ is continuous on its
domain of definition. Steady states are now given by solutions to the
equation $x_s = f(x_s)$, and we again assume without loss of generality
that $f(0) = 0$ so that the origin is a steady state of the discrete time
model. Discrete time models arise when time is discretized, as in digital
control systems for chemical plants. But discrete time models also
arise when representing the behavior of an iterative algorithm, such as
the Newton-Raphson method for solving nonlinear algebraic equations
discussed in Chapter 1. In these cases, the integer $k$ represents the
algorithm iteration number rather than time. We compress notation by
defining the superscript $+$ operator to denote the variable at the next
sample time
$$x^+ = f(x) \qquad x(0) = x_0 \tag{2.47}$$
Notice that this notation also emphasizes the similarity with the continuous time model $\dot{x} = f(x)$ in (2.38). We again denote solutions to
(2.47) by $\phi(k; x)$ with $k \ge 0$ that start at state $x$ at $k = 0$. The discrete
time definitions of stability, attractivity, and asymptotic stability of the
origin are then identical to their continuous time counterparts given in
156
Ordinary Differential
Equations
(5implies
that
solutiondecay
AV(x) =
(2.48)
(2.49)
domain of definition
since both $V(\cdot)$ and $f(\cdot)$ are assumed continuous.
Theorem 2.27 (Lyapunov stability (discrete time)).
Let $V(\cdot)$ be a Lyapunov function for the system $x^+ = f(x)$. Then the
origin is (Lyapunov) stable.
(2.50)
Theorem 2.29 (Exponential stability (discrete time)). Let $V(\cdot)$ be a Lyapunov function for the system $x^+ = f(x)$. Moreover, let $V(\cdot)$ satisfy for
all $x \in D$
a llxll V (x)
AV(x)
b llxII
(2.51)
c llxll
(2.52)
The proofs of Theorems 2.27, 2.28, and 2.29 are essentially identical to their continuous time counterparts, Theorems 2.21, 2.22, and
2.23, respectively, with integer $k$ replacing real $t$ and the difference $\Delta V$
replacing the derivative $\dot{V}$. An essential difference between continuous and discrete time cases is that the solution of the continuous time
model $\phi(t; x)$ is continuous in $t$, and the solution of the discrete time
model $\phi(k; x)$ has no continuity with index $k$ since $k$ takes on discrete
values. Notice that in the proofs of the continuous time results, we did
not follow the common practice of appealing to continuity of $\phi(t; x)$ in
$t$, so the supplied arguments are valid for both continuous and discrete
cases.
Linear systems.
x(0) = xo
$x$ gives
$$\Delta V(x) = x^TA^TSAx - x^TSx$$
Choosing a positive definite $Q > 0$
satisfies
$$A^TSA - S = -Q \tag{2.53}$$
problem is large or small. In chemical engineering problems, small parameters often arise as ratios of time or length scales. Important lim-
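$a = b$: $a$ is equal to $b$
$a \propto b$: $a$ is proportional to $b$
It is important to note that $\sim$ implies a limit process, while $\approx$ does
not. In this section we will be careful to use the symbol $\sim$ in the
precise manner defined here, though one must be aware that it often
$$f(\epsilon) = O(g(\epsilon)) \text{ as } \epsilon \to 0 \quad \text{if } \lim_{\epsilon\to 0}\left|\frac{f(\epsilon)}{g(\epsilon)}\right| < \infty$$
$$f(\epsilon) = o(g(\epsilon)) \text{ as } \epsilon \to 0 \quad \text{if } \lim_{\epsilon\to 0}\frac{f(\epsilon)}{g(\epsilon)} = 0$$
$$f(\epsilon) = \operatorname{ord}(g(\epsilon)) \text{ as } \epsilon \to 0 \quad \text{if } f(\epsilon) = O(g(\epsilon)) \text{ but not } o(g(\epsilon))$$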
n=0
fn(x)
CONVERGES
at a particular value of x if and only if, for every e > 0,
n=0
Ordinary Differential
160
Equations
satisfies
= 0 for each M N
fM(c)
f (E) 's'
n=0
fn() as c 0
n
where
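$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}dt$$
This function has the convergent power series
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\sum_{n=0}^{\infty}\frac{(-1)^nx^{2n+1}}{(2n + 1)\,n!}$$
On the other hand, for $x \gg 1$, an asymptotic series for the function
may be constructed by repeated integration by parts (a common trick
for the asymptotic approximation of integrals). This approximation is
$$\operatorname{erf}(x) = 1 - \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2}dt$$
$$= 1 - \frac{2}{\sqrt{\pi}}\left(\frac{e^{-x^2}}{2x} - \int_x^\infty\frac{e^{-t^2}}{2t^2}\,dt\right)$$
$$= 1 - \frac{2}{\sqrt{\pi}}\left(\frac{e^{-x^2}}{2x} - \frac{e^{-x^2}}{4x^3} + \int_x^\infty\frac{3e^{-t^2}}{4t^4}\,dt\right)$$
$$\operatorname{erf}(x) \sim 1 - \frac{e^{-x^2}}{x\sqrt{\pi}}\left(1 - \frac{1}{2x^2} + \frac{1\cdot 3}{(2x^2)^2} + O(x^{-6})\right)$$
If continued indefinitely, this series would diverge. The truncated series, however, is useful. In particular, the "leading order" term $1 - e^{-x^2}/(x\sqrt{\pi})$,
the expression that includes the first correction for finite but large $x$,
precisely indicates the behavior of $\operatorname{erf}(x)$ for large values of $x$. Furthermore, the truncated series can be used to provide accurate numerical
values and is the basis of modern algorithms for doing so (Cody, 1969).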
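The contrast between the convergent and the asymptotic series for the error function is easy to see numerically. The sketch below uses Python's built-in `math.erf` as the reference; the function names, the number of terms, and the sample points are our own choices. It shows the Taylor series working well for small $x$, the three-term asymptotic series working well for large $x$, and the asymptotic series diverging when too many terms are kept at a fixed moderate $x$.

```python
import math


def erf_taylor(x, nterms=30):
    """erf(x) = (2/sqrt(pi)) * sum_n (-1)^n x^(2n+1) / ((2n+1) n!)."""
    s = 0.0
    for n in range(nterms):
        s += (-1)**n * x**(2*n + 1)/((2*n + 1)*math.factorial(n))
    return 2.0*s/math.sqrt(math.pi)


def erf_asymptotic(x, nterms=3):
    """erf(x) ~ 1 - exp(-x^2)/(x sqrt(pi)) * [1 - 1/(2x^2) + 1*3/(2x^2)^2 - ...]."""
    bracket, term = 0.0, 1.0
    for n in range(nterms):
        bracket += term
        term *= -(2*n + 1)/(2.0*x*x)     # next term of the bracketed series
    return 1.0 - math.exp(-x*x)/(x*math.sqrt(math.pi))*bracket


print(erf_taylor(0.5), math.erf(0.5))
print(erf_asymptotic(3.0), math.erf(3.0))
print(erf_asymptotic(1.0, nterms=30), math.erf(1.0))   # divergence at fixed x
```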
Nowconsider a function f of E and some other parameter or vari-
able,x
f (X,) Eann(e)
Ordinary Differentia/
162
the interval
lu(x, E) ou(x,
Nonuniformity
is a feature of many
x 2 + exI = 0, e < 1
(2.55)
(2.54)
+ o(2)
(2.56)
(e) remain
to be determined.
Substituting
(E)XI +2
= 0 (2.57)
163
0, the solution is x = xo =
1. Sowe let (50
consider
the
root
1 and for the
xo
= 1. Now (2.57)
moment
At
becomes
= o (2.58)
observe that all but the first two terms are
0() or o(l). Neglecting
these,we would find that
(2.59)
+ 2(52x2+
=0
(2.60)
get
4 + 22X2= O
(2.61)
Thus we have
(2.62)
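A regular perturbation expansion of this kind can be checked against the exact root. The sketch below assumes that the equation under study is $x^2 + \epsilon x - 1 = 0$ with $\epsilon \ll 1$ and that the expansion about $x_0 = 1$ is $x = 1 - \epsilon/2 + \epsilon^2/8$; both are our reading of the garbled source and should be treated as assumptions. The error of the truncated expansion should shrink rapidly as $\epsilon$ decreases.

```python
import math


def root_exact(eps):
    """Positive root of x**2 + eps*x - 1 = 0 from the quadratic formula."""
    return (-eps + math.sqrt(eps*eps + 4.0))/2.0


def root_perturbation(eps):
    """Truncated expansion about x0 = 1: x ~ 1 - eps/2 + eps**2/8."""
    return 1.0 - eps/2.0 + eps*eps/8.0


errs = {eps: abs(root_exact(eps) - root_perturbation(eps)) for eps in (0.1, 0.01)}
for eps in sorted(errs):
    print(eps, root_exact(eps), root_perturbation(eps), errs[eps])
```

For this particular equation the $O(\epsilon^3)$ coefficient happens to vanish, so the observed error scales like $\epsilon^4$ rather than $\epsilon^3$.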
Observe that to determine $\delta_1(\epsilon)$ and $\delta_2(\epsilon)$, we have found a DOMINANT
size to the largest term not containing a $\delta_k$ and where all the terms
containing $\delta_k$s and $\epsilon$s are smaller as $\epsilon \to 0$.
To find how the second root $x_0 = -1$ depends on $\epsilon$, we use the
lessons learned in the previous paragraph to streamline the solution
process. That analysis suggests that $\delta_k(\epsilon) = \epsilon^k$ so we seek a solution
$$x = -1 + \epsilon x_1 + \epsilon^2x_2 + O(\epsilon^3)$$
(2.62) and
solutions
0 both
(2.63)
REGULAR
PERTURBAT10N
this are called
as
such
when E 0. Cases
when c O is qualitatively
interesting
problems.
more
much
SINGULAR
PERTURBAThe situation is
like this are called
Cases
1.
differentfrom
Consider the equation
problems.
TION
In the limit c
(2.64)
X xo +
+ -X2= ord(l)
2x 2 + X-1
=O
Perturbation Methods
165
2e1X1
22X2 + ...
1/ 2 +0(2)
(2.66)
r 2dr
dr
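Let $\epsilon = \mathrm{Da}$. We seek a solution of the form $c(r) = c_0 + \epsilon c_1 + \epsilon^2c_2 + O(\epsilon^3)$. Substituting into (2.67) and equating like powers yields
$$\frac{1}{r^2}\frac{d}{dr}r^2\frac{dc_0}{dr} = 0 \qquad c_0(1) = 1$$
$$\frac{1}{r^2}\frac{d}{dr}r^2\frac{dc_1}{dr} = c_0^2 \qquad c_1(1) = 0$$
$$\frac{1}{r^2}\frac{d}{dr}r^2\frac{dc_2}{dr} = 2c_1c_0 \qquad c_2(1) = 0$$
we have $c_0 = 1$ for all $r$. At $O(\epsilon^1)$,
$$\frac{1}{r^2}\frac{d}{dr}r^2\frac{dc_1}{dr} = 1$$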
2.6.5 Matched Asymptotic Expansions
The regular perturbation approach above provided an approximate solution for $\mathrm{Da} \ll 1$. We can also pursue a perturbation solution in the
opposite limit, $\mathrm{Da} \gg 1$. Now letting $\epsilon = \mathrm{Da}^{-1}$ we have
$$\epsilon\,\frac{1}{r^2}\frac{d}{dr}r^2\frac{dc}{dr} = c^2 \tag{2.68}$$
large gradient
we can use physical intuition to guess where the gradients are large. At
high Da the reaction occurs rapidly, so we expect the concentration to
be small in most of the catalyst particle. Near $r = 1$, however, reactant
from the surroundings and indeed right at $r = 1$ the
4 2 (1 r7)2 dn
(2.69)
(1 E l /2 17)2 dc
(2.70)
Now we seek a perturbation solution of this rescaled problem: $c(\eta) = c_0 + \epsilon^{1/2}c_1 + O(\epsilon)$. The choice of $\epsilon^{1/2}$ comes from the observation
El /217) = 1
d2 co
dn2
that
(2.71)
co
13
co = w
13
Ordinary Differentia/
168
As
dil
dn
fl , H
co
co = w
H
cotv
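$$H = \frac{1}{2}w^2 - \frac{1}{3}c_0^3 = K$$
where $K$ is a constant and $w = dc_0/d\eta$. Therefore, curves of constant $H$ are solutions. As $\eta$ becomes large, i.e., at positions much larger than distance $\epsilon^{1/2}$ from the interface, we expect the concentration and its gradient to go to zero; we therefore take $K = 0$, so
$$\frac{dc_0}{d\eta} = -\sqrt{\frac{2}{3}}\,c_0^{3/2}$$
The negative sign must be chosen so that $c_0$ decays with increasing $\eta$.
This equation can be integrated and the boundary condition $c_0(\eta = 0) = 1$ applied to yield
$$c_0(\eta) = \left(1 + \frac{\eta}{\sqrt{6}}\right)^{-2}$$
In terms of the original variables this becomes
$$c_0(r) = \left(1 + \frac{1 - r}{\sqrt{6\epsilon}}\right)^{-2} \tag{2.72}$$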
very small, going to zero as - 0. One can carry this analysis to higher
order terms. For example,
result appearat 0(1/2)but it should be clear that the primarystructure of the solution behavior has been captured by this leading-order
solution.
This example is a simple instance of a singular perturbation method.
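The leading-order boundary-layer profile can be checked numerically. The sketch below assumes the rescaled leading-order problem is $d^2c_0/d\eta^2 = c_0^2$ with $c_0(0) = 1$ and $\eta = (1 - r)/\epsilon^{1/2}$, which is what the result (2.72) implies; the function names are illustrative only.

```python
import math

SQRT6 = math.sqrt(6.0)

def c0_inner(eta):
    """Leading-order boundary-layer profile c0(eta) = (1 + eta/sqrt(6))**-2."""
    return (1.0 + eta / SQRT6) ** -2

def c0_original(r, eps):
    """The same profile in the original radius, using eta = (1 - r)/sqrt(eps)."""
    return c0_inner((1.0 - r) / math.sqrt(eps))

def residual(eta, h=1e-3):
    """Central-difference estimate of c0'' - c0**2; it should be close to zero."""
    second = (c0_inner(eta + h) - 2.0 * c0_inner(eta) + c0_inner(eta - h)) / h**2
    return second - c0_inner(eta) ** 2

for eta in (0.5, 1.0, 5.0):
    print(eta, c0_inner(eta), residual(eta))
```

Evaluating `c0_original` at a fixed interior radius for decreasing `eps` shows the concentration collapsing toward zero away from the surface, consistent with a boundary layer of thickness of order $\epsilon^{1/2}$.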
B,
k2
\[ \frac{dc_A}{dt} = -k_1 c_A + k_{-1} c_B \]
\[ \frac{dc_B}{dt} = k_1 c_A - k_{-1} c_B - k_2 c_B \]
\[ \frac{dc_C}{dt} = k_2 c_B \]
The reaction equilibrium assumption takes $c_A$ and $c_B$ to be in equilibrium so that $c_B = Kc_A$ where $K = k_1/k_{-1}$. Further assume that $k_{-1}$ is the largest rate constant. Initial concentrations in the reactor are $c_A(0) = c_{A0}$, $c_B(0) = c_C(0) = 0$.
on thefast
Solution
(a) Letu =
du
dts
dts
dw
dts
du
dts
= Ku+ v
dv
dts
u(ts) = uo(ts) +
v(ts) = vo(ts) +
(ts) + O(E2)
(ts) + O(e2)
Substituting and considering only the terms of $O(\epsilon^0)$ yields
\[ Ku_0 = v_0 \]
for both of these equations. This is the reaction equilibrium assumption in dimensionless form. Although physically reasonable, observe that this assumption is not consistent with the initial conditions $u(0) = 1$, $v(0) = 0$. Similarly, because the time derivatives are multiplied by $\epsilon$, they do not appear in the leading-order outer problem, so we do not have differential equations whose solutions include the arbitrary constants that are determined by the initial conditions. Keeping this issue in mind, we collect $O(\epsilon^1)$ terms to yield
duo
Ku 1 + VI
dts
dvo
dts
(c) The problem with the outer solution can be traced to the loss of the time-derivative terms. Recognizing that the derivatives can be large at short times because $k_1$ and $k_{-1}$ are much larger than
du
dtf
= Ku + v
dv
dtf
= Ku V
u(tf)
dtf
dtf
pair of
= 0. This coupled
transforms or
Laplace
by
with initial condition
solved, for example,
be
could
equations
Uo(0)
dtf
which has solution
-(1+K)tf +
-(1+K)tf
Usingthis, we obtain
-(1+K)tf
space.
With inner and outer solutions in hand, we can use (2.73)to match
lim
lim Vo
CIO
(0) =
vo(0) =
duo
dts
dvo
= vo
dts
duo
dts
dvo
dts
uo
dvo
dts
vo
vo =
Figure 2.12: Leading-order inner $U_0$, outer $u_0$, and composite solutions $u_{0c}$, for Example 2.30 with $\epsilon = 0.2$, $K = 1$, and
e 1+K k2t
CB(t)
e 1+K k2t
CB(t)
e 1+K k2t+
-(1+K)k2t/E
e 1+K k2t
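A numerical comparison helps show how well a leading-order composite solution tracks the full dynamics. The sketch below is hedged: it assumes the dimensionless variables $u = c_A/c_{A0}$, $v = c_B/c_{A0}$, slow time $t_s = k_2 t$, and $\epsilon = k_2/k_{-1}$, so that $\epsilon\,du/dt_s = -Ku + v$ and $\epsilon\,dv/dt_s = Ku - v - \epsilon v$. Under that assumption the composite solution is $u_{0c} = [e^{-Kt_s/(1+K)} + Ke^{-(1+K)t_s/\epsilon}]/(1+K)$. The helper names are hypothetical.

```python
import math

def rhs(u, v, K, eps):
    """Assumed dimensionless balances on the slow time ts = k2*t."""
    du = (-K * u + v) / eps        # fast A <-> B exchange
    dv = (K * u - v) / eps - v     # exchange plus slow loss B -> C
    return du, dv

def integrate_rk4(K, eps, ts_end, dt=1e-3):
    """Integrate from u(0) = 1, v(0) = 0 with classical RK4; return (u, v)."""
    u, v = 1.0, 0.0
    for _ in range(int(round(ts_end / dt))):
        a1, b1 = rhs(u, v, K, eps)
        a2, b2 = rhs(u + 0.5 * dt * a1, v + 0.5 * dt * b1, K, eps)
        a3, b3 = rhs(u + 0.5 * dt * a2, v + 0.5 * dt * b2, K, eps)
        a4, b4 = rhs(u + dt * a3, v + dt * b3, K, eps)
        u += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        v += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return u, v

def composite(ts, K, eps):
    """Leading-order composite (outer + inner - common part) for u and v."""
    slow = math.exp(-K * ts / (1.0 + K))
    fast = math.exp(-(1.0 + K) * ts / eps)
    return (slow + K * fast) / (1.0 + K), K * (slow - fast) / (1.0 + K)

for ts in (0.01, 0.1, 1.0):
    print(ts, integrate_rk4(1.0, 0.02, ts), composite(ts, 1.0, 0.02))
```

The two results should differ by an amount of order $\epsilon$ at all times, both inside and outside the initial layer.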
+ (k)2x = 0, X(0) =
+ (02xo = 0,
xo(0) =
+ Xl XO, XI(O) =
=o
=O
tcos wt
with solution
x(t)
this term destroys the asymptoticness of the expansion; the approximation is not uniformly valid, failing at large times. The method of
dx
dt
x
tl
14 For extensive application of the method in this context, see: Nayfeh and Mook (1979).
look for
and we
xo(to,tl) +
x(to,
(to,tl)
Doxo+
=O
+ (02x1=
sin
sin
DoxI(O) = o
dtl
is therefore
xo = e
coswt
dx
dt
(0
where $\sigma = \epsilon\mu$, with $\epsilon \ll 1$, while $\mu$ and $\omega$ are ord(1). The steady state $x = 0$ of this system is very weakly stable or unstable, depending on the sign of $\mu$. Since the problem is nonlinear, finding the proper scaling of $x$ is an important part of the solution procedure. The oscillatory nature of the linearized equation suggests that a solution can be found in terms of amplitude $\|x\|$ and phase
Solution
the
Althoughwe consider here a specificform for the nonlinearity,
multiple-scales solution will lead to equations for r and whosegen-
u ox
and N3(X,X, X) are the quadratic and cubic terms writconvenient for perturbation expansions. For general vecten in a form
the nonlinear terms
-- [111, 112] , v = [VI,V2] , w =
tors u
problem are
this
for
(X, X)
where N2
N2(U,V)
Doxo-
xo=o
and
xo - DI xo + N2(X0)
DOXI
Thesolution at 0(0) is
xo(to, h) = r(tl)
cos(wto
sin(wto
resonance because it is quadratic and thus contains no terms with frequency $\omega$. The solvability condition for this choice of scaling is thus
xo
This equation is linear, leading to exponential growth on the time scale
where
(Xo,Xo)
As noted above, M
solution can be found
Xl (to, tl)
LX2=
XO
sin (0 to
cos (Oto
2TT/(O
dto
wethus
cos (Oto
sin coto
dto = 0
Omitting the detailed calculation, which involves elementary but extensive trigonometric manipulations, we find that
\[ \frac{dr}{dt_1} = \mu r + ar^3 \]
\[ \frac{d\phi}{dt_1} = br^2 \]
This is a remarkably general result. These simple differential equa-
structure of N2 and N3 is distilled into the constants a and b. Furthermore, even for a more general nonlinearity containing higher powers'
2.7 Qualitative Dynamics of Nonlinear Initial-Value Problems
2.7.1 Introduction
equations can be extremely comThedynamicsof nonlinear differential
in
= Ax in terms of the
stable subspace
unstable subspace
center subspace
or
zj z=0
= Jz
f (x) as
1
2fi
zjZk + O(llz11 3)
2 zjZk z=0
+ O(llz11 3)
u
(a)
(2)
(b)
llz112 0
dt
for some values of z, in which case the quadratic term N2(z, z) appears
at leading order.
(b)
(a)
Figure 2.14: Invariant subspaces of the linearized system (a) and invariant manifolds of the nonlinear system (b).
Let $[x_1, x_2] = (0, 0)$ be a stable steady state. Furthermore, assume that we have written the equation in coordinates where
\[ J = \begin{bmatrix} -\epsilon^{-1} & 0 \\ 0 & -1 \end{bmatrix} \]
Thus the eigenvalues are $-\epsilon^{-1}$ and $-1$, with corresponding eigenvectors $u_f = [1, 0]^T$ and $u_s = [0, 1]^T$. The "s" and "f" stand for "slow" and "fast" respectively, because the dynamics in the $u_s$ direction occur on
-1
-2
1
-1
+ 10X?2.
$U$ on a trajectory is
\[ \frac{dU}{dt} = \sum_i \frac{\partial U}{\partial x_i}\frac{dx_i}{dt} = -\sum_i \frac{\partial U}{\partial x_i}\frac{\partial U}{\partial x_i} = -\|\nabla U\|^2 \le 0 \]
showing that the rate of decrease of potential energy is the square of the magnitude of the gradient of the potential. Trajectories roll downhill until they reach a minimum in $U$. In a high-dimensional problem, the potential surface can be very complex, with many minima, and saddle points where there are many "downhill" directions for the system to choose from.
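The downhill property can be illustrated with a small numerical experiment. The sketch below uses a hypothetical double-well potential $U = (x_1^2 - 1)^2 + x_2^2$ (not taken from the text) and integrates $\dot x = -\nabla U$ with explicit Euler steps, recording $U$ along the way.

```python
def potential(x1, x2):
    """Hypothetical double-well potential with minima at (+1, 0) and (-1, 0)."""
    return (x1**2 - 1.0) ** 2 + x2**2

def gradient(x1, x2):
    """Analytical gradient of the potential above."""
    return 4.0 * x1 * (x1**2 - 1.0), 2.0 * x2

def gradient_flow(x1, x2, dt=0.01, steps=2000):
    """Explicit Euler integration of dx/dt = -grad U; returns end point and U history."""
    history = [potential(x1, x2)]
    for _ in range(steps):
        g1, g2 = gradient(x1, x2)
        x1 -= dt * g1
        x2 -= dt * g2
        history.append(potential(x1, x2))
    return (x1, x2), history

end_point, u_values = gradient_flow(0.5, 1.0)
print(end_point, u_values[0], u_values[-1])
```

Starting points with $x_1 < 0$ roll into the other well, which is a simple instance of the basin structure that becomes complicated in high-dimensional landscapes.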
Hamiltonian Systems
consider again the landscape of
Figure 2.15.
rather than V. Now imagine a
dynamical call the energy
H,
but
to
normal
are
system
not
function
tangent to
them. so where trajectories
we modify
are
the gradient
X2
X1
(2.76)
above for U
along trajectories,
We
dH(x, t)
dt
dil
dt
H dpi
Pi dt
H dqi
qi dt
Pi
qi
These equations are called HAMILTON'S EQUATIONS.
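Conservation of $H$ along trajectories is easy to observe numerically. The sketch below assumes the standard form of Hamilton's equations, $\dot q = \partial H/\partial p$ and $\dot p = -\partial H/\partial q$, and uses a pendulum-type Hamiltonian $H = p^2/2 - K\cos q$ as an illustration; the integrator and names are illustrative choices.

```python
import math

def hamiltonian(q, p, K=1.0):
    """Pendulum-type Hamiltonian H = p**2/2 - K*cos(q)."""
    return 0.5 * p**2 - K * math.cos(q)

def hamilton_rhs(q, p, K=1.0):
    """dq/dt = dH/dp = p and dp/dt = -dH/dq = -K*sin(q)."""
    return p, -K * math.sin(q)

def integrate(q, p, t_end, dt=0.01, K=1.0):
    """Classical RK4 integration of Hamilton's equations; returns (q, p) at t_end."""
    for _ in range(int(round(t_end / dt))):
        a1, b1 = hamilton_rhs(q, p, K)
        a2, b2 = hamilton_rhs(q + 0.5 * dt * a1, p + 0.5 * dt * b1, K)
        a3, b3 = hamilton_rhs(q + 0.5 * dt * a2, p + 0.5 * dt * b2, K)
        a4, b4 = hamilton_rhs(q + dt * a3, p + dt * b3, K)
        q += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        p += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return q, p

q_end, p_end = integrate(1.0, 0.0, t_end=20.0)
print(hamiltonian(1.0, 0.0), hamiltonian(q_end, p_end))
```

RK4 is not a symplectic scheme, so a very small energy drift remains; it shrinks rapidly as the step size is reduced.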
A system whose
Pi
Cli
plane, illustrates some of the important features of nonlinear differential equations. In this case $p$ and $q$ are scalars. Often the Hamiltonian can be written in this simple form
\[ H = \frac{1}{2}p^2 + V(q) \]
Trajectories in phase space are thus symmetric across $p = 0$. Furthermore, this formula can be used to construct the energy landscape, the curves of $H$ = constant on the $(q, p)$ plane. From Hamilton's equations
See Section 3.2.
formalism
modelsOf
for continuum
materials, in which the vector field is
simply a sum of a Hamiltonian part and a gradient
part.
particular point. Hamiltonian systems are not structurally stable; physically we can understand this by noting that any dissipation of energy
dx
Figure 2.16: Energy landscape for a pendulum; $H = \frac{1}{2}p^2 - K\cos q$;
Not every two-dimensional vector field can be written as the gradient of a potential, so two-dimensional (or PLANAR$^{19}$) systems are not quite as restricted as one-dimensional ones. Nevertheless, they are still fairly constrained by the topology of the plane. Let us write a two-dimensional system as
intersections of the curves $f_1(x_1, x_2) = 0$ and $f_2(x_1, x_2) = 0$. Near these steady states, the behavior is described by the linearizations, if the eigenvalues have nonzero real parts. In addition to steady states, we know that closed trajectories (oscillations) can arise, as we saw in the Hamiltonian examples described previously. Can anything else happen as $t \to \infty$? Can, for example, a periodic orbit have a figure-eight shape? The answer to this is no; for trajectories in phase space to cross would require two values of the vector field $(f_1, f_2)^T$ for the same point
+ 2y2)
= x y x(x2
coordinates
Transformingto polar
gives
e) = 1 + Yr 2 sin 2 0 sin 20
= r(l
the trajectories
Furthermore
< 0 for all r > 1 so
and
2/6
rl
r =
0 on
0 for all $\theta$ on the circle $r = 1$. So all trajectories entering this annulus (let us call it $D$) never leave it. This region is the area between the two gray circles on Figure 2.18. Since $\dot\theta > 0$ throughout $D$, there can be no steady states in this region. The Poincaré-Bendixson theorem thus requires that there be at least one closed path (periodic orbit) in this region. Numerical integration reveals that for this problem there is one asymptotically stable periodic orbit, which is also known as a LIMIT CYCLE. Part of a trajectory that starts near the origin, as well as the limit cycle it approaches, are shown in Figure 2.18.
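The trapping-region argument can be checked by direct integration. The sketch below assumes the planar system $\dot x = x - y - x(x^2 + 2y^2)$, $\dot y = x + y - y(x^2 + y^2)$; the second equation is a reconstruction chosen so that the polar form gives $\dot\theta = 1 + \tfrac{1}{2}r^2\sin^2\theta\,\sin 2\theta$, and it should be treated as an assumption. For this system $\dot r = r - r^3(1 + \tfrac{1}{4}\sin^2 2\theta)$, so any closed orbit must lie in $2/\sqrt{5} \le r \le 1$.

```python
import math

def field(x, y):
    """Assumed planar vector field (the dy/dt component is a reconstruction)."""
    return x - y - x * (x**2 + 2.0 * y**2), x + y - y * (x**2 + y**2)

def trajectory_radii(x, y, t_end=60.0, dt=0.01):
    """Integrate with classical RK4 and return the radius after every step."""
    radii = []
    for _ in range(int(round(t_end / dt))):
        a1, b1 = field(x, y)
        a2, b2 = field(x + 0.5 * dt * a1, y + 0.5 * dt * b1)
        a3, b3 = field(x + 0.5 * dt * a2, y + 0.5 * dt * b2)
        a4, b4 = field(x + dt * a3, y + dt * b3)
        x += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        y += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
        radii.append(math.hypot(x, y))
    return radii

radii = trajectory_radii(0.05, 0.0)
print(min(radii[-2000:]), max(radii[-2000:]))
```

A trajectory started near the origin spirals outward and then stays inside the annulus, as the Poincaré-Bendixson argument requires.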
may tend to as $t \to \infty$: a steady state and a limit cycle. These are simple examples of ATTRACTORS. A good working definition of an attractor is the following.
only attractors are steady states and limit cycles. Note that the two-dimensional Hamiltonian systems discussed above also have periodic orbits; these are not attractors because an initial condition close to one such orbit does not approach it as $t \to \infty$. The fact that trajectories of Hamiltonian systems lie on constant energy surfaces precludes them from having attractors.
20 See, for example, Guckenheimer and Holmes (1983) for a discussion of various definitions of attractors and the difficulties in developing a satisfactory general definition.
Figure 2.18: A limit cycle (thick dashed curve) and a trajectory (thin
Three Dimensions
topologically constrained than are those in one or two dimensions. There is no three-dimensional analog of the Poincaré-Bendixson theorem and thus no restriction that all attractors be either steady states or periodic orbits. We look first at a simple, geometrically defined example. Consider a torus (a donut-shaped surface) floating in three dimensions and assume that all trajectories asymptotically approach the surface of the torus, so that we only need consider what happens on the torus itself. Further assume that there are no steady states on the
2
2
1
1
Tr
of these variables
2.19 shows trajectoriesfor the cases p/q = 9/7 (left) and p/q = u
(right).The qualitativedistinction should be clear. From this example
we see a new type of dynamical behavior, the quasiperiodic orbit.
Consider the system
\[ \dot x = -y - z \]
\[ \dot y = x + ay \]
\[ \dot z = b + z(x - c) \]
known as the RÖSSLER system. If $a = b = 0.2$, $c = 1$, the system displays a limit cycle, as shown in Figure 2.20. For a different value of $c$ the system has the attractor shown in Figure 2.21. This attractor is neither periodic nor quasiperiodic; in fact, nearby initial conditions will follow similar paths but will eventually diverge from one another. This property is known as SENSITIVITY TO INITIAL CONDITIONS and is characteristic of CHAOTIC dynamics. Loosely speaking, an attractor on which the dynamics are chaotic is called a STRANGE ATTRACTOR (Guckenheimer and Holmes, 1983; Strogatz, 1994).
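Sensitivity to initial conditions can be demonstrated by following two trajectories that start a tiny distance apart. The sketch below assumes the standard Rössler equations with $a = b = 0.2$ and compares $c = 1$ with $c = 5.7$; the latter is a commonly used chaotic value and is an assumption here, not a value taken from the text. The starting point and function names are likewise illustrative.

```python
import math

def rossler(state, a, b, c):
    """Right-hand side of the Rossler system."""
    x, y, z = state
    return (-y - z, x + a * y, b + z * (x - c))

def rk4_step(state, dt, a, b, c):
    """One classical RK4 step for the three-component state tuple."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = rossler(state, a, b, c)
    k2 = rossler(shifted(state, k1, 0.5 * dt), a, b, c)
    k3 = rossler(shifted(state, k2, 0.5 * dt), a, b, c)
    k4 = rossler(shifted(state, k3, dt), a, b, c)
    return tuple(s + dt * (p + 2 * q + 2 * r + w) / 6.0
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def separation_after(c, t_end=400.0, dt=0.02, delta=1e-8, a=0.2, b=0.2):
    """Distance at t_end between two trajectories started a distance delta apart."""
    s1 = (0.0, 0.0, 0.2)
    s2 = (delta, 0.0, 0.2)
    for _ in range(int(round(t_end / dt))):
        s1 = rk4_step(s1, dt, a, b, c)
        s2 = rk4_step(s2, dt, a, b, c)
    return math.dist(s1, s2)

print(separation_after(1.0), separation_after(5.7))
```

On the limit cycle the separation stays comparable to its initial size, while on the chaotic attractor it grows by many orders of magnitude until it saturates at the size of the attractor.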
new
xs(po). Taylor
definea
0, p = 0 and using the facts that f = fx = 0 give expanding
j'
+
fpl + 72(fxxy2 2fxpuy + fuuu2) +
1
(2.77)
important cases.
Saddle-NodeBifurcation
Webeginwith the most general case: the partial derivativesoff (other
5" =
Thishas steady states
+ fxxy 2
(2.78)
-2fuu
xx
for $\mu > 0$ and none for $\mu < 0$ or vice versa. The point $\mu = 0$ is thus quite special in that on one side of it there are no steady states near $y = 0$ and on the other there are two. This type of bifurcation point is called variously a LIMIT POINT, TURNING POINT, or SADDLE-NODE
(2.79)
vertical and/or horizontal reflection across $y = 0$, $\mu = 0$. The branch of stable solutions is the solid curve, the unstable branch is dashed.
which is stable from the right but not the left, and when $\mu < 0$ there is no steady state, although trajectories that pass close to $y = 0$ move very slowly through that region. The time spent in the interval $[-1, 1]$
is Tr The BIFURCATION
DIAGRAMassociated with the saddle-node
Figure 2.23:
0.5
states
Thishas steady
1
fxxfuu) u
so the steady states are (locally) lines in the $(y, \mu)$ space, which cross at $(y, \mu) = (0, 0)$. Since steady states persist on both sides of the bifurcation point, this scenario is called TRANSCRITICAL bifurcation. It arises when the conditions $f_x = 0$, $f_\mu = 0$, $f_{x\mu} \ne 0$, $f_{xx} \ne 0$ are satisfied. We can make the presentation simpler without loss of generality by setting $f_{\mu\mu} = 0$ and rescaling, which gives us the normal form for the transcritical bifurcation
\[ \dot y = y(\mu + ay), \qquad a = \pm 1 \tag{2.80} \]
The trivial steady state $y = 0$ is stable when $\mu < 0$ and unstable when $\mu > 0$, and the nontrivial steady state $y = -\mu/a$ has the opposite stability characteristics. The solutions are sometimes said to "exchange stability" at the bifurcation point. The bifurcation diagram for the transcritical bifurcation is shown for $a < 0$ in Figure 2.23.
Pitchfork Bifurcation
f (x xs;u) = f((x
2.1
Qualitative
Dynamics of Nonlinear
Initial-Value
Problems
199
becomes
tus
5' = y (u + ay2), a =
y = o, y = u/a.
has steady states
(2.81)
The steady
states and
this bifurcation are shown in Figure 2.24.
for
stability
For obvious reais
called
scenario
PITCHFORK
BIFURCATION.
sons,this
It arises when
=
O,
f
fx
(y;
u)
=
f
(
y;
conditions
the
* O,fxx = O,fxxx O
1,
then
=
a
the
If
nontrivial
steady-state branch exists
aresatisfied.
is
and
stable;
this
case is
>0
said to be SUPERCRITICAL.
onlyfor u
If
branch
nontrivial
exists
for u < 0 and is unstable;
+1, the
this is
case. Note that in the latter case,
sUBCRITICAL
the
the linearlystable
not
be
will
approached
by
branch
initial conditions
trMaI
> u/a; so although small perturbationswith magnitude
from the
decay, larger ones grow.
0
=
y
state
steady
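The steady states of the pitchfork normal form and their linear stability can be tabulated directly. The sketch below assumes the normal form $\dot y = y(\mu + ay^2)$ with $a = \pm 1$; the function names are illustrative.

```python
import math

def steady_states(mu, a):
    """Steady states of dy/dt = y*(mu + a*y**2) for the given mu and a."""
    states = [0.0]
    if -mu / a > 0.0:
        root = math.sqrt(-mu / a)
        states += [root, -root]
    return states

def eigenvalue(y, mu, a):
    """Linearization d/dy [y*(mu + a*y**2)] = mu + 3*a*y**2; negative means stable."""
    return mu + 3.0 * a * y**2

for a, mu in ((-1.0, 0.25), (1.0, -0.25)):
    for y in steady_states(mu, a):
        print(a, mu, y, eigenvalue(y, mu, a))
```

For $a = -1$ the nontrivial branch has a negative eigenvalue (supercritical), while for $a = +1$ it has a positive one (subcritical).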
Hopf Bifurcation
In all of the above scenarios, solutions either monotonically approach a steady state or go off to $\infty$ (or more precisely, to where higher-order terms in the Taylor expansion are important). We now consider the
+N3(x,x,x)
+ O(lx14)
x(t) = e l /
2r(t)
= epr + aer
2
d) = ber
(2.82)
(2.83)
and b are
a
constants
The
amplitude is El 27"
ue/a. For the
suba stable limit cycle whose
but
duced to find actual solutions. The situation is worse in general, because no simple quantitative theory exists for nonlinear systems. Most of them need to be treated numerically right from the start. Therefore it is important to understand how numerical solutions of ODEs are constructed. Here we consider initial-value problems (IVPs). We focus on the solution of a single first-order equation, because the generalization to a system is usually apparent.
The equation $\dot x = f(x, t)$ can be integrated to read
\[ x(t + \Delta t) = x(t) + \int_t^{t+\Delta t} f(x(t'), t')\,dt' \tag{2.84} \]
The central issue in the numerical solution of IVPs is the approximate evaluation of the integral on the right-hand side of this equation. With a good approximation and a small enough time step $\Delta t$, the above formula can be applied repeatedly for as long a time interval as we like, i.e., $x(\Delta t)$ is obtained from $x(0)$, $x(2\Delta t)$ is obtained from $x(\Delta t)$, etc. We use the shorthand notation $x^{(k)} = x(k\Delta t)$.
x
x
x( k) +
(2.85)
t(k+l)
(2.86)
x (k) +
t( k)) + eAt
31
yielding
(k)At+X
of
we find
and
cancel,
this equation
that
Side
+ o(At 2)
Euler
Thus
Thus the
is simpler, is there any reason to use
scaling.
method
same
the explicit
and arises when we look
rate." sincemethod? The answer is yes
at the
stability.
the implicitmentioned above,
equation = Ax, so f (x, t)
third issue
Ax. If
linear
single
not
asking
is
It
too
00.
Considera
much that
-.-0 as t
x(t)
then
property.
0,
same
The Eulerap_
Re(A)< approximation maintain the
a numerical
special case are
this
for
proximations
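\[ x^{(k+1)} = x^{(k)} + \lambda\Delta t\, x^{(k)} \]
\[ x^{(k+1)} = x^{(k)} + \lambda\Delta t\, x^{(k+1)} \]
Both can be written $x^{(k+1)} = Gx^{(k)}$, with $G = 1 + \lambda\Delta t$ for explicit Euler and $G = 1/(1 - \lambda\Delta t)$ for implicit Euler. We call $G$ the GROWTH FACTOR for the approximation. By applying this equation recursively from $k = 0$, we see that $x^{(k)} = G^k x^{(0)}$, so if $|G| > 1$, then $x^{(k)} \to \infty$ as $k \to \infty$. Conversely, if $|G| < 1$, then $x^{(k)} \to 0$ as $k \to \infty$. Thus there is a NUMERICAL STABILITY CRITERION: $|G| < 1$. This is equivalent to $G_R^2 + G_I^2 < 1$, where subscripts $R$ and $I$ denote real and imaginary parts, respectively. For explicit Euler, $G_R = 1 + \lambda_R\Delta t$, $G_I = \lambda_I\Delta t$, yielding stability when
\[ (1 + \lambda_R\Delta t)^2 + (\lambda_I\Delta t)^2 < 1 \]
On a plane with axes $\lambda_R\Delta t$ and $\lambda_I\Delta t$, this region is the interior of a circle of radius 1 centered at $(-1, 0)$. If $\lambda$ is real, instability occurs if $\lambda > 0$, in which case the exact solution also blows up. But it also happens if $\lambda < 0$ but $\Delta t > -2/\lambda$, which leads to $G < -1$. This is pathological, because the exact solution decays. This situation is known as NUMERICAL INSTABILITY.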
For implicit Euler, $|G| = 1/|1 - \lambda\Delta t| < 1$ whenever $\lambda_R < 0$. That is, if the exact solution decays, so does the approximation. The stability of this method is independent of $\Delta t$, so it is said to be ABSOLUTELY STABLE or A-stable.
Figure 2.25 shows plots of $x(t)$ for the case $\lambda = -1$ starting from initial condition $x_0 = 1$ using explicit and implicit Euler methods with $\Delta t = 2.1$, along with the exact solution. Although not a faithful approximation of the exact solution at this step size, the implicit Euler solution decays, while the explicit Euler solution shows numerical instability.
Figure 2.25: Explicit and implicit Euler methods with $\Delta t = 2.1$, along with the exact solution.
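The behavior described for $\lambda = -1$ and $\Delta t = 2.1$ can be reproduced in a few lines. This is a minimal sketch for the scalar linear test equation $\dot x = \lambda x$ only; the function names are illustrative.

```python
def explicit_euler(lam, dt, x0, n_steps):
    """Explicit Euler for dx/dt = lam*x: each step multiplies by G = 1 + lam*dt."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * lam * xs[-1])
    return xs

def implicit_euler(lam, dt, x0, n_steps):
    """Implicit Euler for dx/dt = lam*x: each step multiplies by G = 1/(1 - lam*dt)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] / (1.0 - dt * lam))
    return xs

explicit = explicit_euler(-1.0, 2.1, 1.0, 10)
implicit = implicit_euler(-1.0, 2.1, 1.0, 10)
print(explicit[-1], implicit[-1])
```

With this step size the explicit growth factor is $-1.1$, so the iterates alternate in sign and grow, while the implicit growth factor is $1/3.1$ and the iterates decay monotonically.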
Systems
whilethe
and Stiff
Accuracy,
model whose shortest time
Stability,
equation
time step At that
2.8.2
is
cannot choose a
we
a differential
jump right over the
will
have
Obviously,
solution
we
Say
is tint, our approximate
that At < tint. But if we
or
of interest
accuracyrequires
So
small
STIFF.Implicit requireunreasonably
explicitmethodsthe problem
for
In general,
wecan
= Ax
a single-step
For example,consider
scheme as
(k)
the system
= Ax
-100
onlya very short time. The explicit Euler method must capture this time
3At)and 1/(1 + 100At), which are both always less than one. Again,
the implicit Euler method is always stable.
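The step-size restriction that stiffness imposes on explicit methods can be seen with two decoupled modes. The sketch below assumes a hypothetical diagonal system with eigenvalues $-1$ and $-100$ (chosen for illustration, not the matrix of the example above) and compares the Euler growth factors mode by mode.

```python
LAMBDAS = (-1.0, -100.0)  # hypothetical slow and fast eigenvalues

def growth_factors(dt):
    """Per-mode growth factors for explicit and implicit Euler at step size dt."""
    explicit = [1.0 + lam * dt for lam in LAMBDAS]
    implicit = [1.0 / (1.0 - lam * dt) for lam in LAMBDAS]
    return explicit, implicit

def march(factors, x0, n_steps):
    """Apply the per-mode growth factors n_steps times to the initial state."""
    x = list(x0)
    for _ in range(n_steps):
        x = [g * xi for g, xi in zip(factors, x)]
    return x

explicit, implicit = growth_factors(0.05)
print(march(explicit, (1.0, 1.0), 20))
print(march(implicit, (1.0, 1.0), 20))
```

For these eigenvalues explicit Euler needs $\Delta t < 0.02$ to remain stable, even though the fast mode has decayed to a negligible level after a very short time.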
2.8.3 Higher-Order Methods
The Euler methods are simple to implement and convenient for introducing the concepts of simplicity, accuracy, and stability, but they are
Initial-Value
Problems
dicting"x
the Adams-Moulton formula
\[ x^{(k+1)} = x^{(k)} + \frac{\Delta t}{12}\left(5f^{(k+1)} + 8f^{(k)} - f^{(k-1)}\right) \]
where
\[ f^{(k+1)} = f(x^{(k+1)}, t^{(k+1)}) \]
ods also have higher-degree accuracy than Euler, but are one-step meth-
[Figure: curves labeled AB2 and AB3 on axes $\mathrm{Re}(\lambda\Delta t)$, $\mathrm{Im}(\lambda\Delta t)$.]
in which
+ Atk2, t( k) + At)
+ Atk3, t( k) + At)
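As an illustration of a one-step Runge-Kutta scheme, the sketch below implements the classical fourth-order method. It assumes the usual stage formulas with half-step evaluations for $k_2$ and $k_3$, and it checks the convergence rate on the test problem $\dot x = -x$; the function names are illustrative.

```python
import math

def rk4_step(f, x, t, dt):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(x, t)
    k2 = f(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(x + dt * k3, t + dt)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def solve(f, x0, t_end, dt):
    """March from t = 0 to t_end with fixed step dt; returns x(t_end)."""
    x, t = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        x = rk4_step(f, x, t, dt)
        t += dt
    return x

def decay(x, t):
    """Test problem dx/dt = -x with exact solution exp(-t)."""
    return -x

err_coarse = abs(solve(decay, 1.0, 1.0, 0.1) - math.exp(-1.0))
err_fine = abs(solve(decay, 1.0, 1.0, 0.05) - math.exp(-1.0))
print(err_coarse, err_fine, err_coarse / err_fine)
```

Halving the step size should reduce the error by a factor close to 16, which is the signature of fourth-order accuracy.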
[Figure: curves labeled APC2, APC3, and APC4 on axes $\mathrm{Re}(\lambda\Delta t)$, $\mathrm{Im}(\lambda\Delta t)$.]
4.7)
case, the solution at any point is coupled to the solution at all other points in the interval because the boundary conditions are imposed at both ends of the interval (think of a diffusion problem). So if the
[Figure: curves labeled RK2, RK3, and RK4 on axes $\mathrm{Re}(\lambda\Delta t)$, $\mathrm{Im}(\lambda\Delta t)$.]
\[ Lu = f(x), \qquad x \in [a, b] \]
We choose a set of TRIAL FUNCTIONS $\{\phi_j(x)\}$ with which to approximate the solution $u(x)$ and let $u_n(x)$ be the approximate solution
\[ u_n(x) = \sum_{j=1}^{n} c_j\phi_j(x) \]
For the moment we require that the solution $u(x)$ and the trial functions satisfy homogeneous boundary conditions, though it is easy to
Ordinary
210
Differential
R = Lun -f
Obviously,if un = u, then Lun = f, the equation is solved
In any case, we want R to be as small as possible. In what andR
sensedo
Ctlons,
{(/'i(x)}, and require that the WEIGHT
or TESTFUNCTIONS
FUNCTIONS
(2.87)
Setting
and
AiJ
(Lcj(x),
bt = (f (x), (Vi(x))
(2.88)
(2.89)
of
Boundary-Value
Problems
We introduce a number
of specific
modelproblem
211
MWR
implementations
Sincethe boundary
conditions are
not
1). Now u(0) = u(l) = O
homogeneous,
and the
let u
equation
becomes
using the
(2.90)
(2.91)
Galerkin Method
o,
o,
otherwise
h, xj < X xj+l
otherwise
h' XN-I S X
otherwise
212
Ordinary
Differential
40 (x)
0.5
0
0.5
= 2.
where n = N 1. Thus
o,
(W"'i + "i)dX
otherwise
and
= ih 2
special structure: only the diagonal elements and those just aboveand
below the diagonal are nonzero. Such a matrix is calledTRIDIAGONAL
and can be LU decomposed quickly, i.e., in O (n) operations, sincemost
2.9 Numerical
Solutions
of
Boundary-value
Problems
213
0.06
Exact
- 12
UN(X) 0.04
0.02
0.2
0.6
0.8
methods, which instead of expanding solutions in basis functions, consider function values at distinct grid points in a domain and replace derivatives by difference formulas (Press, Teukolsky, Vetterling, and Flannery, 1992). For example, $u'(x)$ can be approximated as
\[ u'_f(x_j) \approx \frac{u_{j+1} - u_j}{h} \tag{2.92} \]
or
\[ u'_b(x_j) \approx \frac{u_j - u_{j-1}}{h} \tag{2.93} \]
where $x_j$ and $h$ are defined as above. These two equations are known as FORWARD and BACKWARD difference formulas, respectively.
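The accuracy of these difference formulas is easy to probe on a function with a known derivative. The sketch below applies the forward and backward formulas to $\sin x$ and, for comparison, the standard centered formula $(u_{j+1} - u_{j-1})/2h$; the function names are illustrative.

```python
import math

def forward_diff(u, x, h):
    """Forward difference (u(x + h) - u(x))/h, first-order accurate."""
    return (u(x + h) - u(x)) / h

def backward_diff(u, x, h):
    """Backward difference (u(x) - u(x - h))/h, first-order accurate."""
    return (u(x) - u(x - h)) / h

def central_diff(u, x, h):
    """Centered difference (u(x + h) - u(x - h))/(2h), second-order accurate."""
    return (u(x + h) - u(x - h)) / (2.0 * h)

x0, h = 1.0, 1e-3
exact = math.cos(x0)
for rule in (forward_diff, backward_diff, central_diff):
    print(rule.__name__, abs(rule(math.sin, x0, h) - exact))
```

The one-sided formulas have errors proportional to $h$, while the centered formula has an error proportional to $h^2$.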
(2.94)
u'c'(xj)
112
(2.95)
112
- -jh j
Observethat the term corresponding to the second derivativeis identical in the two cases, as is the right-hand side. In many situations,
J 2 Tt2
(1)i-1
i7T
215
d2u
=
Welet (j(z)
Pj-1(Z), so
Cj+1Pj(z)
conditions,so
boundary
the
satisfy
TAU
polynomials do not approach, calledthe GALERKIN
The Legendre a slightly modified
use
we need to
i 1,2,... ,n-2
for
only
conditions
method:
X.
unknownscj.
n
the
for
2 equation
expressionsfor
equations
these
2. Supplement
on
conditions
ary
Ordinary
216
Differential
Equ
Qti0hs
polynomials
-1
(2k +
k=0,jk odd
-(2k+
41 1(
+k+
jk even
ik +
jk even
for i = 0, n 3, j = 0, n 1 and
bi+l
-1 +
(z
dz
-1
(Po(z) + PI
dz
1
- -i0 -il
3
conditions leadto
for i = 0, n 3. The expressions for the boundary
bn-l
bn = 0
and exact
Wedo not plot the comparison between approximate visually
inare
two
the
5,
=
n
for
even
lutionsfor this case, because
j for n
distinguishable.Rather, Figure 2.31 shows lcjl versus plot,indicatFor j 4, the plot is nearly a straight line on a semilog or spectral
ing that cj decays exponentiallywith j. This exponential
Numerical Solutions of
Boundary-Value
Problems
10
-1
217
o
o
10
-3
10
10
10
10
-11
10
Ordinary Differential
218
Equations
(2(x1) (3(x1)
(h1(X2)
vector of
solution
('2(X2)
(X2)
uln(X2)
u'n(X3)
or $S_d c = U'$. Using the fact that $c = S^{-1}U$, we can write $U' = S_d S^{-1} U$ or $U' = D_n U$, where $D_n = S_d S^{-1}$ is called the COLLOCATION DIFFERENTIATION MATRIX. With this formula, we can compute the derivative of the function $u_n$ (evaluated at the collocation points) directly from the function values at the collocation points. All of the information about what basis functions have been used is absorbed into the operator $D_n$. Similarly, the second derivative matrix is simply $D_n^2$. Note that within
u(x)(x)w(x) dx
can be approximated as a sum
(2.96)
u(x)=
n/2-1
Problems
219
functionsthat
canbe
keikx
written
n/2-1
i'keikx
et al. (2006)
provide a detailed
dis-
cussion.
Chebyshev collocation. Chebyshev polynomials are a particularly popular choice as trial functions for the collocation method. These polynomials satisfy
\[ T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{\nu+1}(x) = 2xT_\nu(x) - T_{\nu-1}(x) \]
Aswith Legendre polynomials, Chebyshev polynomials also arise from
Gram-Schmidt orthogonalization of the set {1,x, x2,... }, but now us-
d2y
d 02
in the domain $-\pi < \theta < \pi$. In this domain, the optimal collocation points are uniformly spaced, which in the original domain $-1 < x < 1$ results in the points
\[ x_j = \cos\frac{j\pi}{n}, \qquad j = 0, \ldots, n \]
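The differentiation operator is given by
\[ D_{n,lj} = \begin{cases} \dfrac{c_l}{c_j}\dfrac{(-1)^{l+j}}{x_l - x_j}, & l \ne j \\[2ex] -\dfrac{x_j}{2(1 - x_j^2)}, & 1 \le l = j \le n - 1 \\[2ex] \dfrac{2n^2 + 1}{6}, & l = j = 0 \\[2ex] -\dfrac{2n^2 + 1}{6}, & l = j = n \end{cases} \]
where $c_j = 1 + \delta_{j0} + \delta_{jn}$.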
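A short implementation makes the structure of this operator concrete. The sketch below builds the Chebyshev differentiation matrix on the points $x_j = \cos(j\pi/n)$ using the standard closed-form entries (assumed to match the formula above) and applies it to a polynomial, for which the result should be exact up to round-off. The function names are illustrative.

```python
import math

def cheb_matrix(n):
    """Chebyshev differentiation matrix and points x_j = cos(j*pi/n), j = 0..n."""
    x = [math.cos(math.pi * j / n) for j in range(n + 1)]
    c = [2.0 if j in (0, n) else 1.0 for j in range(n + 1)]
    D = [[0.0] * (n + 1) for _ in range(n + 1)]
    for l in range(n + 1):
        for j in range(n + 1):
            if l != j:
                D[l][j] = (c[l] / c[j]) * (-1) ** (l + j) / (x[l] - x[j])
            elif l == 0:
                D[l][j] = (2.0 * n * n + 1.0) / 6.0
            elif l == n:
                D[l][j] = -(2.0 * n * n + 1.0) / 6.0
            else:
                D[l][j] = -x[j] / (2.0 * (1.0 - x[j] ** 2))
    return D, x

def apply(D, u):
    """Matrix-vector product D u, giving derivative values at the points."""
    return [sum(dij * uj for dij, uj in zip(row, u)) for row in D]

D, x = cheb_matrix(8)
du = apply(D, [xi**3 for xi in x])
print(max(abs(d - 3.0 * xi**2) for d, xi in zip(du, x)))
```

Applying the matrix twice gives the second derivative at the collocation points, which is how an operator such as $d^2/dx^2$ would be discretized in this approach.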
As with the Legendre-Galerkin method, the natural setting for Chebyshev collocation is the domain $(-1, 1)$. For our example problem, (2.91) transformed into this domain, the equations of the Chebyshev collocation approximation are
$4(D_n^2)U + U$
2.10 Exercises
Exercise 2.1: A linear constant coefficient problem
Find the general solution to
where
= Ax
-1 -1
1 -1 -1
o
Express it so that only the arbitrary constants are (possibly) complex. You should be able to solve the problem without explicitly performing any similarity transformations, i.e., you should not need to invert any matrices.
= Ax
where
14 -16
sketch the dynamics on the phase
Plane
to showthe invariant directions and in the original
the stability
coordinate system,
along those
being careful
directions.
1. cos
2. cosx(cosx + sin x)
3. 1 + sin2 x
4. 1 + cosx
(x,y)w x(t)(t)w(t)dt
andw(t)
= t2.
(b) From the set {1, t, t2 t3, t 4}, construct a set of ONbasis functions
for L2,w(0, 1).
These are the first five Jacobi polynomials (Abramowitzand Stegun,
1970).
(c) Find a five-term approximation to 11t with this inner product and basis. Plot
the exact function and five-term approximation. Computethe error betweenthe
exact and approximate solutions using the inner product aboveto define a norm.
f(x) = 2TTX
ifxr
27T
Cke ikx,
show that
for this function using the basis functions
coefficients
Findthe Fourieras 1/1<2 as Ikl 00.
decay
that they
if X < TT
1ifxr
= 0,
' ..,N.1s
series:f (x) = E
E i=0
where
mightbe
fM(x) =
n=l
an sin(nrrx)
(mx + b)
sin(nrx)
x e [0, 11
= mx + b cos(nrx)
nrr
1
= -nm,
sin(nTtx)
+ (TITT)2
n, m = 1,2,..
Use the Leibniz rule for differentiating integrals to solve the following two problems.
dy +
dt
= q(t)
Remember to show the solution satisfies both the differential equation and initial condition.
f(t) =
h(t, p, s)dsdp
a(t) c(t,p)
Your answer should not contain the derivatives of any integrals.
f(t')g(t t')dt'l =
(b) Use the definition of the inverse Laplace transform to derive the convolution
theoremgoing in the other direction
- t')dt'
Which direction do you prefer and why?
and
(a) The conditions on $sf(s)$ for the final-value theorem are crucial. For the functions below, state which satisfy the conditions and give their final values.
1.
2.
3.
4.
1
1
s(s a)
1
s(s + a)
Re(a) > O
Re(a) > 0
(c) Invert each of the transforms to get f (t) and check your results.
CA
CC CD CE
(a) Write the mass balance for the well-mixed, batch reactor of constant volume
dc
dt
(b) What is the solution of this mass balance for initial condition c(0) = co?What
calculation do you do to find out if this solution is stable?
(c) Determine the rank of matrix $K$. Hint: focus on the rows of $K$. Justify your answer. From the fundamental theorem of linear algebra, what is the dimension of the null space of $K$?
(d) What is the condition for a steady-state solution of the model? Is the steady state unique? Why or why not?
2.10 Exercises
Exercise2.15: Network of
consider the generalization first-order
of Exercise
chemical
reactions
2.14 to
the
following
set of n
batch reactor.
The reaction
rate for the
ith
dt
reactionis
of constant
volume
df(s)
ds
This formula proves useful in Exercise 3.19.
Exercise 2.17: ODE review
Solve the following ODEs: unless boundary conditions are given, find the general solution:
(b)
= y 2,y(0) = 1 (separable)
oil
x 10
Ordinary Differential
226
A linear
Exercise2.19:
system
Considerthe
-1
1 -1
x + h(t)
of a freely
Exercise2.20: Dynamics
equations
Consider the system of
-13)
(13 11)
(a) If( = ((l, )2,(3) is a steady state of this system, find the linearizedequation
= ((l, (2,()3)from the steady state.
for
(b) Find three steady states of the system that satisfy (01 + c02 + (03 = 1. Whichare
linearly stable?
(c) Sketch,in the ((l, (2, (3) phase space, the qualitative behavior of trajectories
that begin near each of the steady states, using the linearized equationsasyour
guide.
0
wherex is proportionalto the
beam. When >
displacement
the
of
the
middle
Of
the beambuckles:the "unbuckled"
state x = = 0, is unstable.
(a) The two nontrivial steady
states are x =
linearizations around those states.
Ofthe
= 0. Find the eigenvalueS
2.10 Exercises
(b) In this model there is
no friction
so the
total energy
(kineticPlus
elastic)is
x4
givethe correct
In this model,
> O and
> 1, and
system: species
1 eats the
grass
and
X2)
and
model.Findthem.
states. Since
(a) For A = 0, there are four steady states in the domain0 <
and determine which ones are linearly stable.
2TT.Findthem
(b) Draw the trajectories in phase space for A = 0, along with the steady states.
Here phase space is simply the line, and since e is periodic can alternatelybe
considered to be just a circle with unit radius.
Q = so(l +
- To))
228
Ordinary
Differential
Equ
Qti0hs
(f) Plot
[1, 3, 5, 7, 7.5]
profile
Exercise 2.25: Existence of a positive steady-state temperature
Consider Exercise 2.24 again.
and = 7.5,
(a) Plot and compare the solution 9) if you set K = 0.8
What happens as you increase in this problem?
for
assuming
you
are
What
(b) Lookagain at how you solve for the constants , c2.
this solution to exist?
2, 10
229
Exercises
ranging from O to 0.99, find and plot the critical value of such
exist. If you exceed this critical value of F,
solution for , c2 does nottransient
(c) For
the
that the
in
heat-conduction problem?
think happens
whatdo you
es
K valu
2.26: Flow
Exercise modificationof Darcy's law for flow in porous media is
an's
Brillkrn
be containing
flowin a tu
axial
For
1 d
r dr
dr
in which
A'2R2
and rewrite the differential equation and the boundary conditions in terms of
the dimensionless variables. How many dimensionless parameters does the new differential equation contain?
(b) Obtain a particular solution of the differential equation obtained in (a) by inspection.
(c) Obtain the solution of the homogeneous equation; it should contain two constants. One constant can be immediately evaluated from the boundary condition at p = 0. Why?
m(p) dp
Jolpdp
1-
11(+)
K/R2 102
Ordinary
230
(f) Showthat in the limit of
Differential
00,
(which is exactly
in (e)
the result
for flow
in
xy" + (1 x)y' + Ay = 0
whereAis a constant, is called Laguerre's equation. It arises in determinin
function for the electrons of a hydrogen atomthe orbitals that you
lear:
quantum mechanics (and thus the structure of the periodic table) emerge
in partfrom
(a) Showthat x = 0 is a regular singular point.
(b) Determinethe roots of the indicial equation and one solution to this
problem
0, where
form of a Sturm-Liouvilleoperator, Lu
w(x) = e-x2. Whatare p(x) and r (x)?
(a,
= lim
a(x)b(x)w(x)dx
2.10 Exercises
231
5x2y"
+ (x3
=O
zero is
eigenvalue problem
u" + Au = 0,
x y +3x2y" 3xy' = 0
Exercise2.34: A fourth-order variable coefficient ODE
The following differential equation arises in the analysis of time-dependent flow of a
polymericliquid
x 2D 2 -x 2 +2-2xD)
-2iD
whereD = a-. This equation has solutions of Frobenius form. Find the roots of the
indicialequation.
Ordinary Differential
232
Legendre's
Exercise2.35:
Legendre's equation
is
equation
| (1+1)
Y2(x) = x
4!
5!
3!
equation
Exercise2.36:Chebyshev's
Chebyshev's equation is
problem Lu + Au = 0 in the
(a) Put this in the form of a Sturm-Liouville What
domain
(1
boundary
=
w(x)
v2,
=
conditions
with
1],
[1,
mustu
and u' satisfyat x = 1 for self-adjointness to hold?
(c) The points x = 1 are regular singular points for this equation. As a firststep
toward findingthe behavior of the solution near these points, find the rootsof
the indicialequation for a solution in Frobenius form expanded aroundx = 1.
(b) What are two linearly independent solutions for each coordinate system?
d2u
(a) Howmanylinearlyindependent
u(0) = u(l)
condition
2. 10 Exercises
(b) How many
lin early
in
deendent
(c) How many
solutions
lin early
ZZ(O)
in
deenaent
Exercise 2.39:
Heat
solutions
QRist
for the
Oun
twob
conditions
ound
conditions
conduction
With
for
heat
conduction
Supposewe set up
heat
the body at the samethe problem
Witha
temperature.
(a) Identify the
temperature
controller
tional Bl, so appropriate
that keeps
this problemdifferential
the ends
operator, L,
can be written
of
and
as
associated
boundary func-
BIT = O
(b) Notice that we do not have enough boundary conditions to expect to be able to solve (2.98) uniquely. Define the adjoint operator and adjoint boundary functionals so that
$$(Lu, v) = (u, L^* v)$$
for every admissible $u(x)$ and $v(x)$. Notice that since you are missing a boundary condition on $u(x)$, you will require three boundary conditions on $v(x)$. What
(e) What is the Green's function for this problem, i.e., identify the function $g$ appearing in
$$\int g(x, \xi) f(\xi) \, d\xi + \text{terms not involving } f$$
differential operator
second-order
Considerthe
Lu --
dX2
Blu = u(TT)
du
dx
B2udx
nonhomogeneous problem
Lu = f (x)
Bl(u)
(d) Forf
B2(u) = Y2
cosx) = Y2
answer:(f, sinx) = Yl (f,
what is the general solution?
satisfyingthis solvability condition,
answer:u(x) =
Txx= f
Txx=f
Tx(0) = O
-k d2T(x)
dx2 = (x)
d2T
dX2
=f
f = -/k
(a) Write
this problem as
BIT = Yl
What are D, Bl and B2, and
B2T =
and )'2?
Show that two eigenfunctions $u_1$ and $u_2$ of a Sturm-Liouville problem $(pu')' + ru + \lambda w u = 0$ are orthogonal if the inner product weighted with $w$ is used. Consider only zero boundary conditions $u(a) = u(b) = 0$. Multiply the equation for $u_1$ (setting $\lambda = \lambda_1$) by $u_2$, multiply the equation for $u_2$ (setting $\lambda = \lambda_2 \neq \lambda_1$) by $u_1$, subtract and integrate over the interval. Use the boundary conditions and integration by parts to prove orthogonality.
d2u
du
dx2 + pe
dx
boundary condition $u(0) = u(l) = 0$. Pe is the PÉCLET number, measuring the
(a) Find the adjoint of this operator, first with an inner product with a constant
(b) Solve the eigenvalue problem $Lu + \lambda u = 0$ for arbitrary Pe. Hint: since the equation has constant coefficients, express the solution as $y(x) = e^{i\omega x}$. Plot the
r = kCA= koe-EITCA
dt
kCA
dt - VRPp
Tf-T kCA
pep
Ordinary
236
Parameter
E
f
CA
110
Differential
Value
7550
298
Units
kmol/m3
0
-2.09 x 10 8
4.48 x 106
4.19 x 103
103
J/kmol
vs
J/(kg K)
kg/m3
18 x 10-3
m3
60 x 10-6
m3/s
Exercise 2.46: Choosing an ODE solver
(d) Modify your RK code to use variable time steps. Use the criterion that $\Delta t < t_{\min}/5$. Estimate $t_{\min}$ from the values of $y/\dot{y}$ at each time step.
When examining the numerical stability of integration schemes, as well as in many other situations, we run across the linear constant-coefficient difference equation
$$a_M y_{n+M} + a_{M-1} y_{n+M-1} + \cdots + a_0 y_n = 0 \qquad (2.99)$$
For example, $y_n$ could be the value of $y$ at the $n$th time step of some process.
(a) Show that this equation can be written in vector form
$$x_{n+1} = G x_n \qquad (2.100)$$
What are $x$ and $G$ in terms of $y$ and $a$ coefficients?
(b) Given the initial condition $x_0$, find the solution to this equation (i.e., $x_n$ in terms of $n$ and $x_0$) in the situation where $G$ has distinct eigenvalues $\lambda$.
the
case
for
where
Repeat
(c)
x = O?
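The vector form of a linear difference equation can be illustrated numerically. The sketch below is only one possible choice, assuming the state vector $x_n = (y_n, \ldots, y_{n+M-1})$ and a companion-type matrix; the function names `companion_matrix` and `solve_by_eigen` and the Fibonacci test recurrence are illustrative assumptions, not taken from the text.

```python
import numpy as np

def companion_matrix(a):
    """Return G with x_{n+1} = G x_n for sum_k a[k] * y_{n+k} = 0.

    a = [a_0, a_1, ..., a_M] with a_M != 0; the state is assumed to be
    x_n = (y_n, ..., y_{n+M-1}).
    """
    a = np.asarray(a, dtype=float)
    M = len(a) - 1
    G = np.zeros((M, M))
    G[:-1, 1:] = np.eye(M - 1)      # shift rows: x_{n+1}[i] = x_n[i+1]
    G[-1, :] = -a[:-1] / a[-1]      # last row solves the recurrence for y_{n+M}
    return G

def solve_by_eigen(G, x0, n):
    """x_n = S diag(lambda^n) S^{-1} x_0, valid for distinct eigenvalues."""
    lam, S = np.linalg.eig(G)
    return (S @ (lam**n * np.linalg.solve(S, x0))).real

# Assumed test case: y_{n+2} - y_{n+1} - y_n = 0 with (y_0, y_1) = (0, 1)
G = companion_matrix([-1.0, -1.0, 1.0])
x0 = np.array([0.0, 1.0])
x10 = solve_by_eigen(G, x0, 10)     # approximates (y_10, y_11)
```

The eigenvalue route and repeated multiplication by $G$ should agree whenever the eigenvalues are distinct; the repeated-eigenvalue case needs a different treatment.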
For $f(u) = -q^2 u$ find a quadratic equation for the growth factor $G$ for this method, i.e., look for solutions of the form $u_{n+1} = G u_n$. Up to what threshold $(q \Delta t)^2$ are the numerical solutions stable?
(d) By expanding all terms in Taylor series around time step $n$, find the local truncation error $p$ of this formula (the first power of $\Delta t$ that does not cancel).
$$\dot{x} = v$$
$$\dot{v} = a x$$
where $a \in \mathbb{R}$.
(a) What property must $a$ satisfy so that the true solution $x = 0$, $v = 0$ is stable?
(b) For this problem, the velocity Verlet algorithm becomes
wherex = (x, v)
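A numerical experiment can help when studying the stability of this scheme. The sketch below assumes the standard velocity Verlet update (position step with the current acceleration, then velocity step with the average of old and new accelerations) applied to $\dot{x} = v$, $\dot{v} = ax$; the one-step matrix is derived here from that assumption rather than copied from the text, and the values of `a` and the step sizes are arbitrary.

```python
import numpy as np

def verlet_matrix(a, dt):
    """One-step matrix G with (x, v)_{n+1} = G (x, v)_n.

    Assumed update: x_{n+1} = x_n + dt v_n + dt^2 a x_n / 2,
                    v_{n+1} = v_n + dt a (x_n + x_{n+1}) / 2.
    """
    h2a = dt**2 * a
    return np.array([
        [1.0 + 0.5 * h2a, dt],
        [dt * a * (1.0 + 0.25 * h2a), 1.0 + 0.5 * h2a],
    ])

def spectral_radius(G):
    """Largest eigenvalue magnitude of G."""
    return max(abs(np.linalg.eigvals(G)))

a = -1.0                                             # assumed value; a < 0 is an oscillator
rho_small = spectral_radius(verlet_matrix(a, 1.9))   # dt^2 |a| below 4
rho_large = spectral_radius(verlet_matrix(a, 2.1))   # dt^2 |a| above 4
det_G = np.linalg.det(verlet_matrix(a, 1.9))         # equals 1 for this update
```

Because the determinant is one, the eigenvalues either sit on the unit circle or split into a pair with one outside it; comparing `rho_small` and `rho_large` shows on which side of the threshold each step size falls.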
(c) Find the criteria that
fourth-order) predictor-corrector
Denote the general (up to
x
formulas for
the
(nl)
n
= x (n ) + W PIX ( ) + P2X
x (n) + W
x 01+1)
+ p4x(n-3)
+ C2X(n)+ c3x(n-l)
predictors
c{l}=
p{3} -
10, 0]
-59, 37,-9]
19,-5, 1]
x (n ) (1 + WCI + w(cltvpl
gion, consider $\omega = e^{i\theta}$ for $0 \le \theta \le 2\pi$, so $\omega$ has unit magnitude, and solve the single algebraic equation $\det(G(w) - \omega I) = 0$ for the complex value $w$ as a function of parameter $\theta$. The stability boundary of the APC method then comprises these values of $w$. That is how Figure 2.27 was prepared, for example.
Now consider the class of predictor-corrector methods that use the same order in the predictor and corrector. Recall the methods in Figure 2.27 used a predictor with order one less than the corrector. Find the stability boundaries for first-order through fourth-order methods. Compare your calculated results to Figure 2.34. Contrast the stability results displayed in Figures 2.27 and 2.34. From a stability standpoint, which class of methods do you prefer and why?
You will need to increase the $\theta$ interval to $[0, 4\pi]$ to close the stability boundary. Why do you suppose this increased interval is required? Consider mapping out the square root function on the unit circle using $\theta \in [0, 2\pi]$. Does this boundary close?
You will need to clip off some unstable regions made by loops in the boundary to
[Figure 2.34: stability boundaries for the APC2', APC3', and APC4' methods; horizontal axis $\mathrm{Re}(\lambda \Delta t)$.]
Exercise 2.54: Airy's equation
The equation
arises in optics, quantum mechanics, and hydrodynamics, and is known as Airy's equation.
=0
using
(a) The Legendre-Galerkin method.
ermic
occurs.
reaction
(b) Use the Galerkin tau method to construct an approximate solution. Use the Legendre polynomial basis set: $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = (3x^2 - 1)/2$.
Sketch the solution and look at $u'(0)$ and $u(1)$ to compare the approximate and exact solutions.
Solve the above problem again, using the finite element method with the "hat functions" described in Section 2.9.1. Study how the approximation converges as the number of node points $N$ increases. Also look at the computation time as a function of $N$.
Using the Chebyshev collocation technique, write an Octave or MATLAB program to solve the boundary-value problem (a steady-state reaction-diffusion problem)
Exercise 2.61: Stability and asymptotic stability for linear systems
Show that asymptotic stability and attractivity are identical for linear systems, $dx/dt = Ax$.
For real, square matrix $S$, consider redefining $S > 0$ to mean that $x^T S x > 0$ for all $x \in \mathbb{R}^n$, $x \neq 0$. We are removing the usual requirement that $S$ is symmetric in the definition of positive definite in Section 1.4.4.
(a) Define the matrix $B = (S + S^T)/2$. Show that $B$ is symmetric and $x^T B x = x^T S x$ for all $x \in \mathbb{R}^n$. Therefore $S > 0$ (new definition) if and only if $B$ is positive definite (standard definition).
(b) What happens to the connection between this new definition of $S > 0$ and the eigenvalues of $S$? Consider statement 1. from Section 1.4.4
$$S > 0 \quad \text{if and only if} \quad \lambda > 0, \ \lambda \in \operatorname{eig}(S)$$
Does this statement remain valid? If so prove it. If not, provide a counterexample.
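Part (a) can be checked numerically before attempting a proof. The following is a small sketch, assuming NumPy and an arbitrary random test matrix; the helper names `symmetric_part` and `is_positive_new_definition` are hypothetical and only illustrate the relation between $S$ and $B$.

```python
import numpy as np

def symmetric_part(S):
    """B = (S + S^T) / 2."""
    return 0.5 * (S + S.T)

def is_positive_new_definition(S):
    """Test x^T S x > 0 for all nonzero x through the eigenvalues of B."""
    return bool(np.all(np.linalg.eigvalsh(symmetric_part(S)) > 0))

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))     # arbitrary nonsymmetric test matrix
x = rng.standard_normal(4)
B = symmetric_part(S)
quad_S = x @ S @ x                  # quadratic form with S
quad_B = x @ B @ x                  # quadratic form with its symmetric part

# A nonsymmetric matrix whose symmetric part is 2 I
S_rot = np.array([[2.0, 1.0], [-1.0, 2.0]])
positive = is_positive_new_definition(S_rot)
```

The skew-symmetric part of $S$ contributes nothing to the quadratic form, which is why the two values agree to rounding error.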
stable.
(d) Exponentially
(e) which of
stability are
these forms of
Equations
system?
the
perturbation
For the regular
c2(r).
next term in the series,
a two-time-scalesingularper.
turbation
simple reaction
Consider the following
volume, batch reactor
(b) The B species has two-time-scale behavior. On the fast time scale, it changes rapidly from initial concentration $c_{B0}$ to the quasi-steady-state value for which $R_B = 0$. Divide B's material balance by $k_2$, define the fast time-scale time as $\tau = k_2 t$, and obtain for B's material balance
dCB
= k1CA
CB
dT
We wish to find an asymptotic solution for small $\epsilon$. We try a series expansion in powers of $\epsilon$ for the inner solution (fast time scale)
$$c_{Bi} = Y_0 + \epsilon Y_1 + \epsilon^2 Y_2 + \cdots$$
The initial condition, $c_{Bi} = c_{B0}$, must be valid for all $\epsilon$, which gives for the initial conditions of the $Y_n$
$$Y_0(0) = c_{B0}, \qquad Y_n(0) = 0, \quad n \ge 1$$
Substitute the series expansion into B's material balance, collect like powers of $\epsilon$, and show the following differential equations govern the $Y_n$
o.
1.
dT
dY1
dT
dT
n 22
(c) Solve these differential equations and show
Yo =
kl
kl/k2 -1
Yn=O n2
-- e-k1T1k2
= ek1CA -s CB
BO +
dBo
1.
dt klCA Bl
dBn
dt = -Btl+l
Bl = klCA
So we see the zero-order outer solution is $c_{B0} = 0$, which is appropriate for a QSSA species, but a rather rough approximation.
(e) Show that the classic QSSA analysis is the first-order outer solution.
(f) To obtain a uniform solution valid for both short and long times, we add the inner and outer solution and subtract any common terms. Plot the uniform zeroth-order and first-order solutions for the following parameter values
= 10
Compare to the exact solution and the first-order outer solution (QSSA solution).
(g) Show that the infinite-order uniform solution is also the exact solution.
(a) Apply the QSSA on species B and show
CA() + CBO
CBs
kl
1+1<2
1+1<2
CA() + CBO
1
1+1<2
) e-ft#2t
in which $K_2 = k_2/k_{-1}$.
(b) With this mechanism, both the A and B species have two-time-scale behavior; we use a series expansion for both $c_A$ and $c_B$. Let the inner solution be given by
+
2X2
C
XI
+
+
CAi = XO
axo
dT
dT
1.
dXl = klX()+ Yl
dT
dXn -klXn-1 +
=
dT
dY1
dT
dYn
dT
= klXn-1 -(1+K2)Yn
XO = CA() + CBO
1+1<2 k
-(1+K2)T
YO=
-(1+K2)T
(d) Next we construct the outer solution valid for large times. Postulate a series expansion of the form
Bo=o
1.
dt
dBn-l
dt
dt
= k 1AnI-(1+K2)Bn
(e) Solve these and show for zero order
$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \overset{k_2}{\longrightarrow} P + E$$
in which the free enzyme E binds with substrate S to form bound substrate ES in the first reaction, and the bound substrate is converted to product P and releases free enzyme in the second reaction. This mechanism has become known as Michaelis-Menten kinetics (Michaelis and Menten, 1913), but it was proposed earlier by Henri (1901). If the rates of these two reactions are such that either the free or bound enzyme is present in small concentration, the mechanism is a candidate for model reduction with the QSSA.
Assume $k_1 \gg k_{-1}$, so E is present in small concentration. Apply the QSSA and show that the slow time scale model reduces to a first-order, irreversible decomposition
(a) For a well-stirred batch reactor, show the total enzyme concentration satisfies
$$c_E(t) + c_{ES}(t) = c_E(0) + c_{ES}(0)$$
(b) Find an expression for the QSS concentration of E. What is the corresponding concentration of ES?
(c) Show the rate expression for the reduced model's single reaction is
$$r = \frac{k c_S}{1 + K c_S}, \qquad k = k_2 K E_0, \qquad K = \frac{k_1}{k_{-1} + k_2}, \qquad E_0 = c_E(0) + c_{ES}(0) \qquad (2.101)$$
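The algebra behind this reduced rate can be checked with a few lines of code. The sketch below assumes mass-action rates $r_1 = k_1 c_E c_S - k_{-1} c_{ES}$ and $r_2 = k_2 c_{ES}$ and sets the net production rate of E to zero; the function name `qss_enzyme` and all numerical values are illustrative assumptions, not the parameter set of the exercise.

```python
def qss_enzyme(cS, k1, km1, k2, E0):
    """QSS concentrations of E and ES and the reduced rate for
    E + S <-> ES -> P + E, using c_E + c_ES = E0."""
    K = k1 / (km1 + k2)
    cE = E0 / (1.0 + K * cS)
    cES = K * cS * E0 / (1.0 + K * cS)
    rate = k2 * cES                    # production rate of P
    return cE, cES, rate

# Assumed illustrative values
k1, km1, k2, E0, cS = 5.0, 1.0, 10.0, 1.0, 50.0
cE, cES, r = qss_enzyme(cS, k1, km1, k2, E0)

# Net production rate of E; the QSSA sets this to zero
residual_E = -k1 * cE * cS + (km1 + k2) * cES

# Same rate written in the form k cS / (1 + K cS) with k = k2 K E0
K = k1 / (km1 + k2)
r_formula = (k2 * K * E0) * cS / (1.0 + K * cS)
```

Comparing `r` with `r_formula` confirms that the two ways of writing the reduced rate agree for these values.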
$R_P = r$
(d) Plot the concentrations versus time for the full model and QSSA model for the following values of the rate constants and initial conditions.
kl = 5
CE(O)= I
-1
CES(O)= O
10
cs(0) = 50
th
cp(0) = 0
E+ S
kl
ES
k2
Now assume the rate constants satisfy $k_1, k_{-1} \gg k_2$ so the first reaction is at equilibrium on the time scale of the second reaction.
$$R_P = \frac{k c_S}{1 + K_1 c_S}, \qquad k = k_2 K_1 E_0, \qquad K_1 = k_1/k_{-1} \qquad (2.102)$$
early with substrate at low substrate concentration and (ii) production rate of P is
The reader should be aware that either approximation may be appropriate depending on the values of the rate constants and initial conditions. Although both
tri
the following rate expressions
tri =
tr2 =
tr2
kcs
k = k2KIEo
kcs
= kl/k-l
CE(O)= 20
k 2 = 0.5
CES(O) = 10
cs(0) = so
cp(0) = O
Recall that you must modify the initial conditions for the slow-time-scale model
f(x) =
for large positive values of $x$. Use repeated integration by parts. Show that the approximation is asymptotic as $x \to \infty$.
xe X= E
for
x =
1. Seek solutions of the form
where (E)
1.
1 and one where () >
one
find two dominantbalances:
Exercise 2.73: Perturbed eigenvalue problems
Consider the eigenvalue problem
whereA
Hint:
matrix and
is an n x n
Ax + B(x) = Ax
and uniqueness
existence
the
roiew
of equations
1/2 2
cy
e l / 2 xy
Assume that $x$ and $y$ are both ord(1). (They have already been scaled by $\epsilon^{1/2}$.) Perform a multiple-scales expansion, letting $t_0 = t$, $t_1 = \epsilon^{1/2} t$, $t_2 = \epsilon t$. Show that the solvability conditions require that
yo
to
yo
tl
dyo = RYO+ Yo
dt2
wheref (x; u)
and fxxx
the correct symmetry to display a pitchfork bifurcation, (2.81) does not hold because
$f_{xxx} = 0$.
(a) Derive the correct normal form in this case and draw the corresponding bifurcation diagram(s).
(b) Now let $f_{xxx}$ be nonzero, but very small. How are the above bifurcation diagrams modified?
Consider the stability of a periodic orbit of a nonlinear system. Let $x_p(t) = x_p(t + T)$ be a time-periodic solution of the differential equation
Nowlet x = xp(t) +
1.
2.10
Exercises
the linearized
equation for
z
249
takes the
A(t)z
rix operator
With
form
time-periodic
coefficients.
It is (writtenparticular case
as a single of a linear
+
second-orderequation With
+ (0 2 +
equation)
ecos2t)x =
< < 1,u =
Letting =
0
determine the stability of the ord
point z = O.
1/2. (Although this equation
Showthat
put
in
in second-order
the form z Ois stable
a solution of the form x(to, to form.) Use time scales
easiertowork
= A(tl) COs
to t,
to +
andassume
+
Exercise 2.77: Oscillator with slowly varying frequency
=
to find
slowlyvarying the leading-ordergeneral
frequency
Usetimescales to = t,
+ x = 0,
x(0) = 1,
= 0
= ct.
whose natural frequencies $\omega_1$ and $\omega_2$ are close but not identical can be synchronized ("phase locked") if they are coupled to one another. Such synchronization has since been observed in a diverse range of applications, including coupled chemical reactors. A simple model for a pair of coupled oscillators is
$$\dot{\theta}_1 = \omega_1 + K_1 \sin(\theta_2 - \theta_1)$$
$$\dot{\theta}_2 = \omega_2 + K_2 \sin(\theta_1 - \theta_2)$$
to theunsynchronized state.
Bibliography
John Wiley
Theory. Springer
_
1969.
C. Gasquet and P. Witomski. Fourier Analysis and Applications. Springer-Verlag, New York, 1999.
sey, 1978.
Dynamics
fluids. 1.Development of a
and
general
formalism. thermodynamics
of complex
Phys.Rev.
E,
J. Guckenheimer and P. Holmes.
Nonlinear
and Bifurcations of vector
Oscillations,
Fields.
Springer
Dynamical
Verlag, New
systems
York, New
York,
M.V. Henri. Thorie
gnrale de
l'action de
quelques
Cambridge
diastases.
University Press,
Equations,
Comptes
Cambridge,
Dynamical Systems
and
49:333-369, 1913.
ess
Cambridge, 1992.
Analysis and
Ekerdt. Chemical Reactor
Design Fund
G.
J.
and
second
J. B. Rawlings
WI,
edition,
Madison,
2012.
Q.
Publishing,
mentals. Nob Hill
Control: Theory
Q. Mayne. Model Predictive
and Design
J. B. Rawlings and D.
2009.
WI,
Madison,
Nob Hill Publishing,
J.-B.Wets. VariationalAnalysis. springer-Verlag,
R. T. Rockafellar and R.
1998
3
Vector Calculus and Partial Differential Equations
u; = uati.
a vector is Ilull =
of
length
The
vector. The
two vectors is determined by the dot
degree
between
of alignment
UiVj(ei ej)
U V = UV
Usingsome elementary
(llu112
+ llv112
tors, also called the DIRECT PRODUCT or DYADIC PRODUCT. The outer product between vectors $u$ and $v$ is the DYAD $uv$. A dyad is a SECOND-ORDER TENSOR: a quantity that incorporates information regarding two directions. (A vector, which has one magnitude and one direction, is a
$$(uv) \cdot w = u (v \cdot w)$$
Similarly,
$$w \cdot (uv) = (w \cdot u) v$$
Note that $uv \neq vu$. Based on this definition, we can write $uv$ out, including basis vectors
$$uv = u_i v_j e_i e_j$$
the dyad $uv$ is denoted by $uv^T$ or $u \otimes v$.
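The dyad relations above can be tried out with arrays. This is a minimal sketch assuming NumPy, where `np.outer` plays the role of $uv^T$; the three test vectors are arbitrary.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 0.5, 2.0])
w = np.array([4.0, -2.0, 1.0])

uv = np.outer(u, v)              # components (uv)_ij = u_i v_j

lhs_right = uv @ w               # (uv) . w
rhs_right = u * (v @ w)          # u (v . w)

lhs_left = w @ uv                # w . (uv)
rhs_left = (w @ u) * v           # (w . u) v

vu = np.outer(v, u)              # the dyad vu, the transpose of uv
```

Dotting from the right returns a vector along $u$, dotting from the left returns a vector along $v$, and `uv` differs from `vu` unless the two vectors are parallel.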
3,1 vector
255
of the
0011
Tijetej
v
between
a
=
u
second-order
tensor and a vector is
dotproduct
=
TijVj.
Similarly,
u = v T is, in Cartesian
anothervector:
coordiThe second-order identity tensor is
vjTji.
=
nates:
denoted and
property a = a
satisfiesthe
dinateS,the ij component of
113
VI
-112
Ejk -1,
0,
UNIT TENSOR or PERMUTATION TENSOR $\epsilon$. As with the cross-product itself, $\epsilon_{ijk}$ is not actually a tensor, but rather a pseudotensor, because its definition is based on the use of right-handed Cartesian coordinates. Now the operator $(u \times)$ can be written $\epsilon_{ijk} u_j$. This quantity has two free indices, so it is a
Write the cross product as
A useful identity involving $\epsilon_{ijk}$ is
$$\epsilon_{ijk}\epsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$$
n v dS
n v dS
div v = lim
Vol
vol-o
s
(3.1)
volumethat
Thus the divergenceof v measures the amount per unit
coordinates,so
of
independent
leaves the point xo. This definition is
the divergence is a tensor.
GRADIENT
For a scalar field (x) there is an analogous quantity, the
of 4, defined by
(3.2)
grad
= lim
vol-o Vol S
n dS
xo
arounda pointxo.
of the maximumchange
a
magnitude
is
the
whose
magnitude
of that change.
in 4)and
important
operation,
the CURL,measures the
The final
rotation ot
v at a point. It is defined
by
vectorfield
curl v
lim
n x v dS
is a
pseudovector.
The above definitions of div, grad, and curl are independent of coordinate system and illustrate the concepts underlying them, but to actually work with these operators we need coordinate systems. All three of the above operations can be expressed in terms of the GRADIENT operator, $\nabla$, also called "nabla" or "del." It is also sometimes denoted $\partial/\partial x$. In Cartesian coordinates, it is given by
3
et is
vector
basis
the
The presence of
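unrepeated index $i$. The divergence, gradient, and curl operators are then given by
$$\operatorname{div} v = \nabla \cdot v = \frac{\partial v_i}{\partial x_i}$$
$$\operatorname{grad} \phi = \nabla \phi = \frac{\partial \phi}{\partial x_i}$$
$$\operatorname{curl} v = \nabla \times v = \epsilon_{ijk} \frac{\partial v_k}{\partial x_j}$$
Another extremely important operator is the LAPLACIAN operator div grad, given by
$$\nabla^2 = \nabla \cdot \nabla = \frac{\partial^2}{\partial x_i \partial x_i}$$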
Non-Cartesian Coordinates
3.22 The GradientOperator in
In manyapplications,Cartesiancoordinates are not the most practical
for solvinga problem.2 We are familiar with cylindrical and spherical
coordinatesystems,but there are many others, including bipolarand
parabolicsystems. We consider here only orthogonal coordinate systems;the basis vectors may change from point to point, but at each
point they are mutually orthogonal. We denote an arbitrary set of orthogonal coordinates by 111, 112, and the (orthonormal) base vectors
by eul , eu2, eu3. The most important distinction between Cartesian and
from one coordinateline to another. For example, in Cartesian coordinates (Xl,x2,x3) = (x, y, z), the distance between the coordinatelines
y = 1 and y = 2, keeping x and z fixed, is always 1. But in cylindrical
coordinates,
x2
u i) 2
2 Appendix A of Bird, Stewart, and Lightfoot (2002) contains a great deal of useful information about this topic. Tensor analysis is not restricted to orthogonal coordinate systems; if you want to learn about tensor analysis in general coordinates, some good references are Aris (1962); Block (1978); Simmonds (1994); Bird, Armstrong, and Hassager (1987).
distancecovered
in moving
1
h2
vectors in terms
xj
u ej
i
ut
(summation implied). In general, the $g_i$ depend on position. The importance of this fact becomes clear when we consider operators like the Laplacian
v.v=gt ut
gJ
hihj
ui uJ
The second term in this expression does not appear in Cartesian coordinates, where the base vectors are independent of position. In terms of the scale factors, the derivative of a basis vector with respect to position can be written as follows
gJ
hJ
uk
hk hk
hJ
i=l
ut
gj
[Figure 3.2: differential of position, with labels $dr\,e_r$, $r\,d\theta\,e_\theta$, and $x = r e_r$.]
Solution
(a) As shown in Figure 3.2 we have for the differential of position
$$dx = dr\, e_r + r\, d\theta\, e_\theta$$
From the definition of partial derivative, we have the formula for the total differential of an arbitrary function $\phi(r, \theta)$
3.2 Differential
Operators and
Integral
Theorems
261
a2
r
(b) Next we use the definition of the Laplacian to obtain
$$\nabla^2 = \nabla \cdot \nabla \qquad (3.4)$$
re
producter ee = 0
because the unit vectors are orthogonal, gives
er
r r
1 eo
r2 r r
er
r
1 ee
r 2 r r
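$$\frac{\partial e_r}{\partial r} = 0, \qquad \frac{\partial e_r}{\partial \theta} = e_\theta, \qquad \frac{\partial e_\theta}{\partial r} = 0, \qquad \frac{\partial e_\theta}{\partial \theta} = -e_r \qquad (3.5)$$
Substituting these derivatives into the previous result and collecting the nonzero terms gives
$$\nabla^2 = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2}$$
Note that we can combine the first two terms for an equivalent form
$$\nabla^2 = \frac{1}{r}\frac{\partial}{\partial r}\left(r \frac{\partial}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2} \qquad (3.6)$$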
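The polar form of the Laplacian can be sanity-checked with finite differences. The sketch below assumes simple central differences with a hypothetical step size `h`; the two test functions $r^2 \cos 2\theta$ and $r^3 \cos\theta$ are chosen here because their Laplacians are known in closed form, namely $0$ and $8 r \cos\theta$.

```python
import numpy as np

def polar_laplacian(f, r, theta, h=1e-4):
    """Central-difference estimate of
    (1/r) d/dr (r df/dr) + (1/r^2) d^2 f / dtheta^2."""
    # radial part: difference of the flux r * df/dr evaluated at r +/- h/2
    flux_plus = (r + h / 2) * (f(r + h, theta) - f(r, theta)) / h
    flux_minus = (r - h / 2) * (f(r, theta) - f(r - h, theta)) / h
    radial = (flux_plus - flux_minus) / (h * r)
    # angular part: second difference in theta
    angular = (f(r, theta + h) - 2 * f(r, theta) + f(r, theta - h)) / (h**2 * r**2)
    return radial + angular

def harmonic(r, th):
    return r**2 * np.cos(2 * th)     # equals x^2 - y^2, a harmonic function

def cubic(r, th):
    return r**3 * np.cos(th)         # Laplacian is (9 - 1) r cos(theta)

lap_harmonic = polar_laplacian(harmonic, 1.3, 0.7)
lap_cubic = polar_laplacian(cubic, 1.3, 0.7)
```

The radial part is discretized in the combined form $(1/r)\,\partial_r(r\,\partial_r)$, which mirrors the second expression above.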
hi = 1
cos
andg
1
g2 r2 - r sin
g2 = eo
= er
We then have
gl
h 2 02
r r
g2
r )
gl
g2
r2
r2 2 r r
r2 2
and
are
in spherical coordinates
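Gradient and Laplacian operators in Cartesian, cylindrical, and spherical coordinates.
Cartesian $(x, y, z)$:
$\nabla = e_x \frac{\partial}{\partial x} + e_y \frac{\partial}{\partial y} + e_z \frac{\partial}{\partial z}$
$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$
Cylindrical $(r, \theta, z)$:
$\nabla = e_r \frac{\partial}{\partial r} + e_\theta \frac{1}{r}\frac{\partial}{\partial \theta} + e_z \frac{\partial}{\partial z}$
$\nabla^2 = \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2} + \frac{\partial^2}{\partial z^2}$
Spherical $(r, \theta, \phi)$:
$\nabla = e_r \frac{\partial}{\partial r} + e_\theta \frac{1}{r}\frac{\partial}{\partial \theta} + e_\phi \frac{1}{r \sin\theta}\frac{\partial}{\partial \phi}$
$\nabla^2 = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{r^2 \sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial}{\partial \theta}\right) + \frac{1}{r^2 \sin^2\theta}\frac{\partial^2}{\partial \phi^2}$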
3.2.3 The Divergence Theorem
The divergence theorem concerns the integral of the divergence of a vector field $v(x)$ in a region $V$. It is central to many aspects
VA
V.vdV= VA X
vy
Yl xc(y) vx
Yc(x ) v
dx dy +
ax
dy
y dy dx
(vy(x,yc) vy(x, 0) dx
Vy(X,0) dx +
dy
= eyand S2,
Since on Sl,
expression can be simplified
V.vdV=
V dS +
dx +
V dS
(3.7)
To simplify the remaining two terms, observe that they both correspond to integrals along the surface (a curve in two dimensions). They can be combined into one by converting the second term into an integral over $x$, noting that $dy = \frac{dy_c}{dx} dx$ and changing the limits of integration appropriately
dX+
dx +
dyc
dy
xl Yc=Y1
dyc
dx dx
xly=o
ex + ey vdx
dyc
dx
1 + dyc2
and dS =
v dS
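Combining this result with (3.7), we find that
$$\int_{V_A} \nabla \cdot v \, dV = \int_{S_1} n \cdot v \, dS + \int_{S_2} n \cdot v \, dS + \int_{S_3} n \cdot v \, dS \qquad (3.8)$$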
Qti0hs
V dS+
V dS+
VB
V dS
(3.9)
The two domains $V_A$ and $V_B$ share one side $S_2$; on this side $n_A = -n_B$. Adding (3.8) and (3.9) and recognizing that the integrals over
$$\int_V \nabla \cdot v \, dV = \int_S n \cdot v \, dS \qquad (3.10)$$
on the plane.
Equation (3.10) is the divergence theorem. By extending these elementary arguments it can be shown to be valid for arbitrary bounded
it reduces to the FUNDAMENTAL THEOREM OF INTEGRAL CALCULUS:
its boundary. Finally, one can see that the definition of the divergence operator mirrors this result for an infinitesimal domain.
In Cartesian tensor notation, the divergence theorem is
$$\int_V \frac{\partial v_i}{\partial x_i} \, dV = \int_S n_i v_i \, dS$$
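A quick numerical check of the divergence theorem on a simple domain can make the statement concrete. The sketch below assumes the unit square and an arbitrary test field $v = (x^2, xy)$, for which $\nabla \cdot v = 3x$; both the field and the midpoint quadrature are choices made here for illustration.

```python
import numpy as np

def v(x, y):
    """Assumed test field v = (x^2, x y); its divergence is 2x + x = 3x."""
    return x**2, x * y

n = 400
s = (np.arange(n) + 0.5) / n                  # midpoint nodes on [0, 1]
X, Y = np.meshgrid(s, s, indexing="ij")

# Integral of div v over the unit square
volume_integral = np.sum(3.0 * X) / n**2

# Outward flux n . v through each of the four sides
right = np.sum(v(np.ones(n), s)[0]) / n       # n = +e_x on x = 1
left = -np.sum(v(np.zeros(n), s)[0]) / n      # n = -e_x on x = 0
top = np.sum(v(s, np.ones(n))[1]) / n         # n = +e_y on y = 1
bottom = -np.sum(v(s, np.zeros(n))[1]) / n    # n = -e_y on y = 0
surface_integral = right + left + top + bottom
```

Both integrals evaluate to 3/2 for this field, since the midpoint rule is exact for the linear integrands involved.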
n T dS
(3.12)
A closely related result is the multidimensional version of LEIBNIZ'S RULE, which we state without proof here. Consider the time derivative of an integral over a volume that is moving with time, e.g., a fluid element in a velocity field. If a point on the boundary is moving with a velocity $q(x)$, then Leibniz's rule states that
$$\frac{d}{dt} \int_{V(t)} m(x, t) \, dV = \int_{V(t)} \frac{\partial m}{\partial t} \, dV + \int_{S(t)} m \, n \cdot q \, dS$$
or
time and represents the net amount that is swept into $V$ because of the motion of its boundaries.
dt v
n.
as + S RAdv
(3.13)
the lone
allows
theorem
Thedivergence
integral
a volume
n FAdS -
surface integral to be
dV
independent
V is time
because
Furthermore,
d PA dV =
v t
v
dt
equations
Substitutingthese two
PAdV = V
v t
equation
Sinceall terms in this
bined
dV+
RAdV
+ V.FAdV-RA av
QA = -V FA+RA
(3.14)
can be written FA =
CA
= -V
(CAV) +DAV
2cA + RA
(3.15)
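$$\int_V (\nabla u \cdot \nabla v + u \nabla^2 v) \, dV = \int_S u \nabla v \cdot n \, dS \qquad (3.16)$$
$$\int_V (u \nabla^2 v - v \nabla^2 u) \, dV = \int_S (u \nabla v - v \nabla u) \cdot n \, dS \qquad (3.17)$$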
pression
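$$\int_V (\nabla u \cdot v + u \nabla \cdot v) \, dV = \int_S u v \cdot n \, dS \qquad (3.18)$$
$$\int_A \left(\frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y}\right) dA = \oint_C (v_x \, dx + v_y \, dy)$$
The proof of this result closely follows what we did above with the
$$\int_A n \cdot (\nabla \times v) \, dA = \oint_C v \cdot t \, dC$$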
$$(u, v) = \int_V u v \, dV$$
$$(u, v) = \int_V u \cdot v \, dV$$
if they are vectors. In our earlier discussion of adjoints, we used integration by parts to help us compute them; in multiple dimensions, Green's formula and identities are the appropriate replacements. For example, using Green's formula, (3.18), we can easily find that, with $u(S) = 0$ (Dirichlet boundary conditions)
$$(\nabla u, v) = -(u, \nabla \cdot v)$$
Thus the adjoint of grad (with Dirichlet boundary conditions) is $-$div. Similarly, rearranging Green's second identity we find that
$$(\nabla^2 u, v) = (u, \nabla^2 v) + \int_S (v \nabla u - u \nabla v) \cdot n \, dS$$
If we impose the same boundary conditions on $u$ and $v$, then $u \nabla v = v \nabla u$ on the boundary. Thus the boundary term vanishes, leaving
Equations:
Properties
Equations:
and
solution
Propertiesand
Forms for
Second-Order
Partial
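Many general properties of partial differential equations can be introduced with this second-order equation in two dimensions
$$a u_{xx} + 2b u_{xy} + c u_{yy} = f \qquad (3.19)$$
where $x, y \in \mathbb{R}$ and $u_x = \partial u/\partial x$ etc. For the moment $x$ and $y$ are not necessarily position variables; they are simply the independent variables for the problem. The coefficients $a$, $b$, and $c$ are real and constant, though the latter restriction can be relaxed.
Now consider the question of whether there exists a change of independent variables
$$\xi = \xi_x x + \xi_y y, \qquad \eta = \eta_x x + \eta_y y$$
that can simplify the left-hand-side of this equation. Here $\xi_x$, $\xi_y$, $\eta_x$, $\eta_y$ are constants and $\xi_x \eta_y - \xi_y \eta_x$ must be nonzero for the coordinate transformation to be invertible. Applying the chain rule yields that
$$(a\xi_x^2 + 2b\xi_x\xi_y + c\xi_y^2) u_{\xi\xi} + 2(a\xi_x\eta_x + b(\xi_x\eta_y + \xi_y\eta_x) + c\xi_y\eta_y) u_{\xi\eta} + (a\eta_x^2 + 2b\eta_x\eta_y + c\eta_y^2) u_{\eta\eta} = f \qquad (3.20)$$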
tiplying $u_{\xi\xi}$ and $u_{\eta\eta}$ in (3.20) vanish, leaving the simpler differential equation
(3.21)
This is the canonical, or simplest, form for a hyperbolic partial differential equation. Lines $\xi$ = constant and $\eta$ = constant are called CHARACTERISTICS for the equation. The WAVE EQUATION
$$u_{tt} - c^2 u_{xx} = 0 \qquad (3.22)$$
coefficients
Ex,
of u;; and
coefficients
the
vanish.
make
exist that will
InsteaY hi
+ il,
characteristics =
conjugate
complex
= FOR , one
finds
and
n'
=
Using
as new
All
is not lost, however.
coordina
to
made
vanish,
leading
be
can
to the
the coefficientof
form
(3.23)
urn = g
(3.24)
$$u_t + v u_x = D u_{xx} + R_A$$
The Schrödinger equation is also parabolic. Elliptic and parabolic equations are treated extensively in the sections below.
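The three categories can be summarized in a few lines of code. The sketch below assumes the standard discriminant test for $a u_{xx} + 2b u_{xy} + c u_{yy}$, namely the sign of $b^2 - ac$; that criterion is stated here as an assumption because the corresponding passage is not fully legible in this copy, and the function name `classify` is hypothetical.

```python
def classify(a, b, c):
    """Classify a u_xx + 2 b u_xy + c u_yy by the sign of b^2 - a c
    (standard discriminant criterion, assumed here)."""
    disc = b * b - a * c
    if disc > 0:
        return "hyperbolic"   # two real families of characteristics
    if disc < 0:
        return "elliptic"     # complex conjugate characteristics
    return "parabolic"        # one repeated family

# Wave equation u_tt - 4 u_xx = 0 with (x, y) -> (t, x): a = 1, b = 0, c = -4
wave = classify(1.0, 0.0, -4.0)
# Laplace equation u_xx + u_yy = 0
laplace = classify(1.0, 0.0, 1.0)
# Second-order part of the diffusion equation u_t = D u_xx with D = 1
diffusion = classify(0.0, 0.0, -1.0)
```

Only the second-derivative terms enter the test, so the first-order terms in the transient convection-diffusion equation do not change its type.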
The classification of partial differential equations into these categories plays an important role in the mathematical theory of existence of solutions for given boundary conditions. Fortunately, the physical
posed mathematical problems for which we do not need to worry about these more abstract issues. Therefore we now proceed to the presentation of classical solution approaches, many of which are insensitive to the type of equation encountered.
3.3.2 Separation of Variables and Eigenfunction Expansion with Equations involving $\nabla^2$
The technique of SEPARATION OF VARIABLES is perhaps the most familiar classical technique for solving linear partial differential equations
mechanicS,
representing the solutions. Consider a problem with three independent variables and two of them, say $x_2$ and $x_3$, lead to
= E EXkL(X1)
We illustrate the method with several examples.
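$$\frac{1}{r}\frac{\partial}{\partial r}\left(r \frac{\partial u}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2} = 0 \qquad (3.25)$$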
us (9).
Asdes
Solution
Plugging into the equation and simplifying yields
r (rR')' -
=O
(3.26)
(3.27)
the direction.
Now consider the equation for $R(r)$, setting $c = k^2$. A little manipulation puts the equation in this form
principle to write
$$u(r, \theta) = \sum_{k=-\infty}^{\infty} a_k r^{|k|} e^{ik\theta}$$
tion. At r = 1,
We can extract the coefficients $a_k$ from this formula by using the orthogonality of the Sturm-Liouville basis functions: take inner products
$$\sum_{k=-\infty}^{\infty} a_k \left(e^{ik\theta}, e^{il\theta}\right) = \left(u_s, e^{il\theta}\right)$$
Letting $c_l = (u_s, e^{il\theta})/2\pi$, this process simply gives us that $a_k = c_k$. That is, the (known) Fourier coefficients of the boundary temperature determine the Fourier coefficients in the cylinder, so
$$u(r, \theta) = \sum_{k=-\infty}^{\infty} c_k r^{|k|} e^{ik\theta}$$
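The series for the temperature inside the cylinder can be evaluated directly once the boundary data are given. The sketch below assumes a unit-radius domain, computes the Fourier coefficients by equally spaced quadrature, and truncates the sum at a hypothetical `kmax`; the boundary temperature $1 + \cos\theta$ is an arbitrary test case whose interior solution is $1 + r\cos\theta$.

```python
import numpy as np

def disk_solution(us, r, theta, kmax=20, nquad=256):
    """Evaluate u(r, theta) = sum_k c_k r^|k| exp(i k theta), where
    c_k = (u_s, exp(i k theta)) / (2 pi) is computed by quadrature."""
    phi = 2 * np.pi * np.arange(nquad) / nquad
    vals = us(phi)
    u = 0.0
    for k in range(-kmax, kmax + 1):
        ck = np.mean(vals * np.exp(-1j * k * phi))   # Fourier coefficient of u_s
        u = u + ck * r**abs(k) * np.exp(1j * k * theta)
    return u.real

def boundary_temperature(t):
    return 1.0 + np.cos(t)           # assumed boundary data at r = 1

u_inside = disk_solution(boundary_temperature, 0.5, 0.3)
u_center = disk_solution(boundary_temperature, 0.0, 0.0)
```

The value at the center equals the mean of the boundary temperature, since only the $k = 0$ term survives at $r = 0$.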
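$$\frac{\partial u}{\partial t} = D \frac{\partial^2 u}{\partial x^2} \qquad (3.28)$$
$$X T' = D X'' T$$
where again $'$ denotes the derivative of a function with respect to its independent variable. Rearranging yields
$$\frac{1}{D}\frac{T'}{T} = \frac{X''}{X}$$
Observing that this expression equates a function of $t$ to a function of
$$T' = cDT \qquad (3.29)$$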
(3.30)
A simple change of
tions
v (C,t) = O. A particularly convenientbound.
ary conditions v (0, t) =
choice
the steady-state solution to this
problem.Thus
is us = ue, whichis
steady
the
state.
from
Substituting
into (3.28)
v (x, t) is the deviation
=
0
2us/x2
yields
=
us/t
that
and observing
2v
X2
0) = 0. Nowletting
$X(x) T(t)$ and repeating the above steps we find that the problem for $X$ is a true Sturm-Liouville problem, including the homogeneous boundary conditions $X(0) = X(\ell) = 0$. The eigenvalues are $c = -k^2$, where now $k = n\pi/\ell$ for positive integer $n$ and the eigenfunctions are $\sin(n\pi x/\ell)$. Equation (3.29) is an initial-value problem. Its solutions, parametrized by $n$, are
$$T_n(t) = T_n(0) e^{-n^2 \pi^2 D t/\ell^2}$$
$$v(x, t) = \sum_{n=1}^{\infty} T_n(0) e^{-n^2 \pi^2 D t/\ell^2} \sin\frac{n\pi x}{\ell} \qquad (3.31)$$
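A truncated version of this sine series is straightforward to evaluate. The sketch below assumes homogeneous Dirichlet conditions at $x = 0$ and $x = \ell$, obtains $T_n(0)$ by projecting an initial profile onto each sine mode with midpoint quadrature, and uses hypothetical defaults for `D`, `ell`, and the truncation `nmax`; the single-mode initial condition is an arbitrary test case.

```python
import numpy as np

def heat_series(v0, x, t, D=1.0, ell=1.0, nmax=50, nquad=2000):
    """Sum T_n(0) exp(-n^2 pi^2 D t / ell^2) sin(n pi x / ell) for n <= nmax,
    with T_n(0) = (2 / ell) * integral of v0(x) sin(n pi x / ell)."""
    xi = (np.arange(nquad) + 0.5) * ell / nquad      # midpoint quadrature nodes
    x = np.asarray(x, dtype=float)
    v = np.zeros_like(x)
    for n in range(1, nmax + 1):
        mode = np.sin(n * np.pi * xi / ell)
        Tn0 = (2.0 / ell) * np.sum(v0(xi) * mode) * (ell / nquad)
        decay = np.exp(-D * (n * np.pi / ell) ** 2 * t)
        v = v + Tn0 * decay * np.sin(n * np.pi * x / ell)
    return v

def initial_profile(x):
    return np.sin(np.pi * x)         # assumed initial condition, a single mode

x = np.array([0.3, 0.5])
v_num = heat_series(initial_profile, x, t=0.1)
v_exact = np.exp(-np.pi**2 * 0.1) * np.sin(np.pi * x)
```

For initial data with a jump or a mismatch at the boundary, many more terms are needed at short times, which is the slow convergence discussed in the text that follows.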
The initial conditions $T_n(0)$ are determined from the initial condition $v(x, 0) = -u_s$ by setting $t = 0$ in (3.31) and taking its inner product with basis function $\sin(m\pi x/\ell)$
. mrrx
uev, sm
, sin
n=l
Thus
Sin
Tm(0) = C mrrx
Sin
Sln
2uc
dX
ntTT
dX
u(x,t)
n 2ue
=
n=l
sin
nrx
(3.32)
At short times $t \ll \ell^2/D$, this series converges very slowly because of the $n^{-1}$ decay of the Fourier coefficients $T_n(0)$ of the initial condition. In this situation, alternate approaches that approximate the domain as semi-infinite are more appropriate because the heat or solute has only had time to spread over a short diffusion distance from the boundary, see Exercises 3.23 and 3.36. As $t$ increases, the exponential decay term becomes smaller and the series converges more rapidly.
As stated, there are no homogeneous directions. Now we split the solution into three pieces: $u(x, y) = U(x, y) + V(x, y) + W(x, y)$, where $U$, $V$, and $W$ all satisfy Laplace's equation, but with conveniently chosen boundary conditions that sum to the boundary conditions for the original problem, as illustrated in Figure 3.6(b). The problem for $U$ is trivial because all the boundaries have the same value of 200; thus $U = 200$. The problem for $V$ has homogeneous boundary conditions at $y = 0$ and $x = 1$, while that for $W$ has homogeneous boundary conditions at $x = 0$ and $x = 1$; aside from a multiplicative constant, it is just a $\pi/2$ rotation of the problem for $V$. The solution to the $W$ problem (to within a multiplicative constant) is Exercise 3.32. From it the solution to the $V$ problem can be found so the solution for $u = U + V + W$ is complete.
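[Figure 3.6: (a) the original problem with its boundary values; (b) the decomposition into three problems, $\nabla^2 U = 0$, $\nabla^2 V = 0$, and $\nabla^2 W = 0$.]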
$$u_{xx} + u_{yy} = f(x, y)$$
in a unit square with Dirichlet boundary conditions, which models a steady-state distribution given a source $f(x, y)$ distributed within the domain.
Solution
Separation of variables does not work for this problem (try it), but a version of eigenfunction expansion does. Think of this problem as a linear algebra problem $Lu = f$. Here $L$ is self-adjoint, so the solutions to the eigenvalue problem $Lw + \lambda w = 0$ form an orthogonal basis and allow us to diagonalize $L$. We can express $u$ and $f$ in this basis, and since $L$ becomes diagonal we can easily solve for $u$.
To perform this procedure in the present case, we need to solve
$$w_{xx} + w_{yy} + \lambda w = 0$$
We can solve this problem by separation of variables: it gives Sturm-Liouville problems in both $x$ and $y$, and yields eigenfunctions
$$w_{mn}(x, y) = \sin m\pi x \, \sin n\pi y$$
with (real) eigenvalues $\lambda_{mn} = \pi^2(m^2 + n^2)$, for all positive integer pairs $mn$. Now to solve the Poisson equation, we express $u$ and $f$ in terms of the eigenfunctions
$$u(x, y) = \sum_{m,n} u_{mn} w_{mn}(x, y), \qquad f(x, y) = \sum_{m,n} f_{mn} w_{mn}(x, y)$$
Now, since $\nabla^2 w_{mn} = -\lambda_{mn} w_{mn}$, we can write $-\lambda_{mn} u_{mn} = f_{mn}$, which we can solve immediately to give $u_{mn} = -f_{mn}/\lambda_{mn}$. So
$$u(x, y) = -\sum_{m,n} \frac{f_{mn}}{\lambda_{mn}} \sin m\pi x \, \sin n\pi y$$
where
$$f_{mn} = \frac{(f, w_{mn})}{(w_{mn}, w_{mn})}$$
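The eigenfunction expansion for the Poisson problem translates almost line by line into code. The sketch below assumes homogeneous Dirichlet conditions on the unit square, the sign convention $u_{xx} + u_{yy} = f$ with $\nabla^2 w_{mn} = -\lambda_{mn} w_{mn}$, midpoint quadrature for the coefficients, and hypothetical truncation parameters; the source term is an arbitrary test case with known solution $\sin \pi x \sin \pi y$.

```python
import numpy as np

def poisson_sine_series(f, x, y, nmodes=8, nquad=64):
    """Approximate the solution of u_xx + u_yy = f on the unit square with
    u = 0 on the boundary, using w_mn = sin(m pi x) sin(n pi y)."""
    s = (np.arange(nquad) + 0.5) / nquad             # midpoint nodes
    X, Y = np.meshgrid(s, s, indexing="ij")
    F = f(X, Y)
    u = 0.0
    for m in range(1, nmodes + 1):
        for n in range(1, nmodes + 1):
            w = np.sin(m * np.pi * X) * np.sin(n * np.pi * Y)
            fmn = 4.0 * np.sum(F * w) / nquad**2     # (f, w) / (w, w), with (w, w) = 1/4
            lam = np.pi**2 * (m**2 + n**2)           # eigenvalue lambda_mn
            u = u - fmn / lam * np.sin(m * np.pi * x) * np.sin(n * np.pi * y)
    return u

def source(x, y):
    # assumed test source; the exact solution is sin(pi x) sin(pi y)
    return -2.0 * np.pi**2 * np.sin(np.pi * x) * np.sin(np.pi * y)

u_num = poisson_sine_series(source, 0.3, 0.6)
u_exact = np.sin(np.pi * 0.3) * np.sin(np.pi * 0.6)
```

Each coefficient is obtained independently of the others, which is the practical payoff of diagonalizing $L$ in its eigenfunction basis.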
r r
02
Z2
are Sturm-Liouville operators, so depending on the boundary conditions, there may be the possibility of more than one method of solution. The following example illustrates this situation.
(a) Using basis functions that depend on $r$.
(b) Using basis functions that depend on $z$.
Solution
u(r,z) =
where
d2un
dz2
k 2un = 0
un (z) = an cosh
so
u(r, z) =
(3.33)
At $z = 0$, $u = 1$. Taking the inner product, i.e., weighted integral from $r = 0$ to $r = 1$ of (3.33), evaluated at $z = 0$, with $J_0(k_m r)$
an =
(JO
JO (km r )
(3.34)
$$\frac{d}{dx}\left[x^n J_n(kx)\right] = x^n k J_{n-1}(kx) \qquad (3.35)$$
$$\frac{d}{dx}\left[x^n Y_n(kx)\right] = x^n k Y_{n-1}(kx) \qquad (3.36)$$
(3.37)
+ nkyn(kx) = xkYn-1(kx)
(3.38)
(3.39)
(3.40)
$$J_{-n}(kx) = (-1)^n J_n(kx) \qquad (3.41)$$
$$Y_{-n}(kx) = (-1)^n Y_n(kx) \qquad (3.42)$$
dx
0
-1
(k) k =
LJ2
2 71+1
(3.43)
Using the first and last of these expressions, one can find that
1
dr
(3.45)
an cosh kn
Sinhkn
282
EquQti0hs
boundary
Condi?
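$$v(r, z) = \sum_{n=1}^{\infty} v_n(r) \sin n\pi z$$
Substituting this solution form leads to
$$\sum_{n=1}^{\infty} \left[\frac{1}{r}\frac{d}{dr}\left(r \frac{dv_n}{dr}\right) - n^2 \pi^2 v_n(r)\right] \sin n\pi z = 0$$
$$\frac{1}{r}\frac{d}{dr}\left(r \frac{dv_n}{dr}\right) - n^2 \pi^2 v_n(r) = 0$$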
general solution is
order zero; they are shown in Table 2.3. The function $K_0$ has a logarithmic singularity at the origin, so for boundedness we require that $b_n = 0$. The coefficients $a_n$ are found by imposing the boundary condition at $r = 1$ and again taking the inner product with an eigenfunction
z),sinnrz)
-2
(n27T2)
nnIo(n27T2)
v(r,z) =
sinnrz
an10(n27T2r)
ItI
Properties
and solution
operator can
283
be written
where
sino OSin
Lyf =
r r2(r f)
variable g = Yf
often is useful
withu(r,0)=
= DLru
> O) = O.
Solution
dx
dx + (m 2x 2
+ 1)) y = 0
sinmx
x
cosmx
x
with boundary
we are solving
2
O=LrT+ ALnT
conditions
VT Gez,
00
Solution
= cos O, Lbecomes
d
and the eigenvalueproblem can be written as
1 z ) dz2
dz
This is Legendre's differential equation, see Example 2.9. Its eigenvalues are $\lambda = n(n + 1)$ for nonnegative integers $n$ and its eigenfunctions are the Legendre polynomials $P_n(\eta)$. Substituting the solution form
(3.47)
r 2 LrTn -
+ 1)Tn = O
3 Linear
echrliques
285
as
ewritingthis
2d2Tn + 2r dTn
dr2
it
recognize
we
(3.48)
(3.49)
1.
0 for n >
dr
Pn(n)
nanRn-l - (n
Because of the orthogonality of the $P_n(\eta)$, this sum must vanish term by term
$$n a_n R^{n-1} - (n + 1) b_n R^{-(n+1)-1} = 0$$
known values
the
Using
of an
bo
al G - 2b1R-3
bl -
GR3
(n + 1)bnR
is
Thefinalresult
(3.50)
Too G
R3 cose
2r2
in spherical
$$r = R(\theta) = 1 + \epsilon P_2(\cos\theta)$$
where $P_2(x) = (3x^2 - 1)/2$ is the quadratic Legendre polynomial. This shape is slightly elongated at the poles and narrower at the equator than is a sphere, but has the same surface area. Use a regular perturbation approach based on the smallness of the deviation of the surface from spherical.
Solution
This example illustrates the technique of DOMAIN PERTURBATION. This
uniformly small.
$$\nabla^2 T = 0, \qquad T(r = R) = 1, \qquad T \to 0 \text{ as } r \to \infty$$
Becausethe boundary is not a constant-coordinate surface, separation
1 2T
2 r2 r=l
1
2T
O()
EP2(cos)+
E2P2(COS +
2 r2 r=l
Equations:
Properties
and
solution
is imposed at
use of separation of variables. There is no indication that a singular perturbation approach is necessary, so we posit a regular perturbation expansion $T(r, \theta) = T_0(r, \theta) + \epsilon T_1(r, \theta) + \epsilon^2 T_2(r, \theta)$. The governing equation at each order is simply Laplace's equation, with boundary conditions
ri+l
i=0
P2(cos O)
21
r3
0)
36
0)
35 r5
Given these solutions, we can find that the dimensionless heat flux from the object is
27T
O r
sine dO = 4TT(1+ 2)
change in heat flux from the sphere is proportional to the square of the deviation of the surface from spherical. Notice that the entire solution procedure is valid, and the heat flux the same, if $\epsilon < 0$, so the object is actually a slightly flattened sphere. Therefore both prolate and oblate deviations from a spherical shape increase the heat flux.
spherically symmetric
the case of a later, so it is natural potential
consider
to Workin
Wewill
we will specify
form
whose
equation
have
this
of
V(r),
a very richSh
solutions
The
ical coordinates.
many features of systems with
Thus
where E is a constant.
dt
(3.52)
(3.53)
-iEt
f (t) = foe
Equation (3.53) has the form of an eigenvalue problem where the eigenvalue $E$ is a dimensionless energy. This must be real so that $\Psi$ does not vanish at past or future times. Now, since $L_\phi = \partial^2/\partial\phi^2$ with periodic boundary conditions, we let $\psi(r, \eta, \phi) = u(r, \eta)e^{im\phi}$ for any integer $m$. As above, we have let $\eta = \cos\theta$. Equation (3.53) becomes
2
(3.54)
r2 (1 172)
Therefore
r 2LrR + r 2(E -
- CR
2
I 172
(3.55)
(3.56)
3.3 Linear
Partial Differential
Equations:
Properties and
Solution
knowthat
* O it is
I DIFFERENTIAL
the ASSOCIATED
EQUATION. Seeking
a
LEGENpower series
vealsthat this equation has bounded solutions
solution
rein 1<
number. For
. , l. These
solutions are the
if
ASSOCIATEDLEGENDRE POLYNOMIALS
2)
dnmPl(n),
(3.57)
Pint.
Denoted $Y_{lm}(\theta, \phi)$, these products are called SURFACE SPHERICAL HARMONICS, or sometimes just SPHERICAL HARMONICS; they are the eigenfunctions of the
(3.58)
21+1
(3.59)
O,
(3.60)
= 0.
almrl + bimr
-(1+1)
= 0, expressed
(3.6
290
eqt4Qtio
ecsoe
m=2
'77=1
"77=0
Figure 3.7: From left to right, real parts of the surface spherical
har_
dr2
Equations:
Properties
and
n'
+
1
=
as
n
solution
pefining
the PRINCIPAL
QUANTUM
NUMBER
this
becomes
n2
This expression determines the eigenvalues of (3.53) and describes very well the energy levels of a hydrogen atom. The eigenfunctions $\psi(\mathbf{x}) = Y_{lm}(\theta, \phi)R(r)$ of (3.53) are characterized by the QUANTUM NUMBERS $l$, called the ANGULAR MOMENTUM QUANTUM NUMBER, $n'$, called the RADIAL QUANTUM NUMBER, and $m$. Since the eigenvalues $E$ depend only on $l + n'$, all eigenfunctions with various combinations of $l$ and $n'$ that give the same sum have the same energy. The same is true for $m$. The $s$, $p$, $d$, and $f$ atomic orbitals correspond to $l = 0, 1, 2$, and $3$, respectively. When $n = 1$ only $l = 0$ states, $s$ orbitals, can exist. This is the ground state or lowest-energy state of the hydrogen atom. When $n = 2$, $l = 1$ states ($p$ orbitals) can exist, and so on. Thus we see in this analysis the basic features of the electronic
couldbe rep-
(f eikX)
(eikx
eikX)
f(x)e
tkXdx = F {f (x)}
This is the analogue of the expression for $c_k$ in a bounded domain; because periodicity is no longer required over a finite interval, $k$ can be
1
f(x) = 2TT
-00
These operations are mappings from "x-space" to "k-space" and vice versa. Here are some useful properties of Fourier transforms, which are easily derived from the definition:
1. Derivative property
$$\mathcal{F}\left\{\frac{df(x)}{dx}\right\} = ik\hat f(k)$$
2. Integral
property
xo
on the
where c depends
3. Shift in x
$$\mathcal{F}\{f(x - a)\} = e^{-ika}\hat f(k) \qquad (3.64)$$
4. Shift in k
(3.65)
5. Scaling
$$\mathcal{F}\{f(ax)\} = \frac{1}{|a|}\hat f\left(\frac{k}{a}\right) \qquad (3.66)$$
u(x) =
This is often written $u = G * h$. The CONVOLUTION THEOREM states that
(3.67)
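A discrete analogue of the convolution theorem can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses periodic (circular) convolution on a finite grid and NumPy's FFT in place of the continuous transform, and the helper name `circular_convolution` is hypothetical.

```python
import numpy as np

def circular_convolution(g, h):
    """Direct O(N^2) periodic convolution: u[j] = sum_m g[m] * h[(j - m) mod N]."""
    n = len(g)
    return np.array([sum(g[m] * h[(j - m) % n] for m in range(n))
                     for j in range(n)])

rng = np.random.default_rng(0)
g = rng.standard_normal(16)
h = rng.standard_normal(16)

u_direct = circular_convolution(g, h)
# Convolution theorem (discrete form): transform of g * h = product of transforms.
u_fft = np.fft.ifft(np.fft.fft(g) * np.fft.fft(h)).real
max_diff = np.max(np.abs(u_direct - u_fft))
```

The two results should agree to rounding error, which is why convolutions are usually evaluated in transform space.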
2. $\hat f(k) = 2\pi\delta(k) \leftrightarrow f(x) = 1$. Conversely, a spike located at wavenumber zero is smeared all over space.
3. $\hat f(k) = 2\pi\delta(k - l) \leftrightarrow f(x) = e^{ilx}$. A spike at $k = l$ corresponds to a sinusoid of wavenumber $l$.
4. $f(x) = 1$ for $-L < x < L$ and zero elsewhere $\leftrightarrow \hat f(k) = (2\sin kL)/k$.
4.
1
5.
6.
blx\
2b
7.
ex214a
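Example 3.11: Derivation of a Fourier transform formula
Let $f(x) = e^{-b|x|}$ with $b > 0$. Find its Fourier transform.
Solution
$$\hat f(k) = \int_{-\infty}^{\infty} e^{-b|x|}e^{-ikx}\,dx = \int_{-\infty}^{0} e^{(b-ik)x}\,dx + \int_{0}^{\infty} e^{-(b+ik)x}\,dx = \frac{1}{b-ik} + \frac{1}{b+ik} = \frac{2b}{b^2+k^2}$$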
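The closed-form result $2b/(b^2 + k^2)$ can be checked by direct quadrature. This is a minimal sketch assuming the sign convention $\hat f(k) = \int f(x)e^{-ikx}dx$; the function name, the truncation length `x_max`, and the grid size are illustrative choices.

```python
import numpy as np

def fourier_transform_exp_abs(k, b, x_max=60.0, n=400001):
    """Trapezoid-rule estimate of the transform of exp(-b|x|) on [-x_max, x_max]."""
    x = np.linspace(-x_max, x_max, n)      # n odd, so x = 0 is a grid point
    dx = x[1] - x[0]
    integrand = np.exp(-b * np.abs(x)) * np.exp(-1j * k * x)
    return dx * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

b, k = 1.5, 2.0
numeric = fourier_transform_exp_abs(k, b)
exact = 2 * b / (b**2 + k**2)
```

Because $f$ is even, the imaginary part of the numerical transform should vanish to rounding error.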
j (k) = 1.
We illustrate the use of Fourier transforms to solve PDEs with examples.
$$u_t = Du_{xx} \qquad (3.68)$$
in the one-dimensional infinite domain $(-\infty, \infty)$ with initial condition $u(x, 0) = u_0(x)$, where $u_0(x)$ is known but otherwise arbitrary. Use the Fourier transform in $x$ to find the solution.
(b) Extend this result to three directions, with initial condition $u(\mathbf{x}, 0) = u_0(\mathbf{x})$. Do so by first considering a $\delta$-function initial condition $u_0(\mathbf{x}) = \delta(x)\delta(y)\delta(z)$ and noting that it can be incorporated into the governing equation as a point source in space and time
) (t)(x)
(3.69)
Solution
Taking the Fourier transform of the equation, $\mathcal{F}\{u_t = Du_{xx}\}$, gives
$$\hat u_t(k, t) = -k^2 D\,\hat u(k, t)$$
This gives us an ODE for each value of $k$, with initial condition $\hat u_0(k)$. The solution is simply $\hat u(k, t) = \hat u_0(k)e^{-k^2Dt}$. The inverse Fourier transform puts this back in physical space. Consider the evolution of a delta function, whose Fourier transform is simply $\hat\delta(k) = 1$. Now
$$u(x, t) = \mathcal{F}^{-1}\{e^{-Dtk^2}\} = \frac{e^{-x^2/4Dt}}{2\sqrt{\pi Dt}} \qquad (3.70)$$
Thus at any time, the temperature field that starts as a delta function is a Gaussian distribution, with height $1/(2\sqrt{\pi Dt})$ and width $\sqrt{4Dt}$. An important extension of this result comes from the
of delta functions
uo(x) =
(3.71)
3.3 Linear
Propertiesand
solution
superposition
s of the
2 TTDt
(3.72)
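The superposition idea can be illustrated numerically: the solution is the integral of the initial condition against the Gaussian kernel of (3.70). The sketch below assumes a Gaussian initial condition (chosen only because the exact answer is then another Gaussian); the names `heat_kernel` and `evolve` and all parameter values are hypothetical.

```python
import numpy as np

def heat_kernel(x, t, D):
    """Response to a delta-function initial condition, as in (3.70)."""
    return np.exp(-x**2 / (4 * D * t)) / (2 * np.sqrt(np.pi * D * t))

def evolve(u0, x, t, D, xi):
    """u(x,t) = integral of u0(xi) * kernel(x - xi, t) d(xi), by a Riemann sum."""
    dxi = xi[1] - xi[0]
    return np.array([np.sum(u0(xi) * heat_kernel(xv - xi, t, D)) * dxi
                     for xv in x])

D, t, sigma = 0.5, 0.3, 0.4
u0 = lambda s: np.exp(-s**2 / (2 * sigma**2))
xi = np.linspace(-10.0, 10.0, 4001)        # quadrature grid
x = np.linspace(-2.0, 2.0, 9)              # evaluation points
u_num = evolve(u0, x, t, D, xi)

# Exact result: a Gaussian whose variance has grown from sigma^2 to sigma^2 + 2Dt.
var = sigma**2 + 2 * D * t
u_exact = sigma / np.sqrt(var) * np.exp(-x**2 / (2 * var))
```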
f (x, y, Z)e-ikXXe-ikyye-ikzz
dx dy dz
(k) =
dx
Similarly
1
(2TT)3
F3D{Vf} = ik(k)
F3D{V v} = ik (k)
2),
F3D i V 2f I 1<
F3D
k2D = (t)
vector
296
formula
results
dk
k2Dteik.x
(27T)
2 TT
transform
three-dimensional
the inverse
Applying
u(x,t)
Partial Differential
Calculusand
-kDteikxXdkx
co
27T 00
x
2/4Dt
2 TTDt
e -y2/4Dt
2 TTDt
2 7TDt
$$u(\mathbf{x}, t) = \frac{e^{-r^2/4Dt}}{(2\sqrt{\pi Dt})^3}$$
where $r = |\mathbf{x}|$. Using this result, (3.72) generalizes to an arbitrary three-dimensional initial condition $u_0(\mathbf{x})$
u(x)
(2 TTDt)
tration profile
uxx + uyy
00.
Solution
sidering
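the boundary condition $u(x, 0) = \delta(x)$. Taking the Fourier transform of the equation and boundary condition in the $x$-direction gives
$$-k^2\hat u(y) + \hat u_{yy}(y) = 0, \qquad \hat u(0) = \mathcal{F}\{\delta(x)\} = 1$$
Requiring that the solution be bounded as $y \to \infty$ gives
$$\hat u(y) = e^{-|k|y}$$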
Now the inverse transform of this solution must be found. Recall that from the point of view of the Fourier transform and its inverse, the variable $y$ is a constant (we are only considering transforms involving the $x$-coordinate). Therefore we can combine the transform pair $\mathcal{F}^{-1}\{e^{-|k|}\} = 1/(\pi(1 + x^2))$ and the scaling property $\mathcal{F}\{f(\alpha x)\} = (1/\alpha)\hat f(k/\alpha)$. Letting $\alpha = 1/y$ we have that
$$u(x, y) = \frac{1}{\pi}\frac{y}{x^2 + y^2}$$
Observethat (3.73) has the form of a
convolutionG(k)h(k),
and
(k)
uo(k)
=
with
,
i.e,
(k)
it is a convolution.
Thus,the
solutionarises directly from the convolution theorem
Lu = f (x)
(3.74)
for a transientproblem
In quantum mechanics in particular, a Green's function
condition
a -function initial
298
The discussion
within the domain
xo
position
arbitrary
boundary conditions G should satisfy. For the
what
reveals
low
for self-adjoint problems present
functions
Green's
and
we will consider
Sturm-Liouville
consider
will
operators. Recall
specificinitial example
(2.33)from Section2.42
1
+r(x)u
a w(x) dx
v w dx
= p(b) (u'(b)v(b)
+ r(x)v
w (x) dx
w dx
+ p(b) (u'(b)G(b,xo)
p(a) (u' (a)G(a,xo)
Applying(3.74)and (3.75)in the two inner products gives us that
+ p(b) (u'(b)G(b,xo)
p(a)
xo) u(a)G'(a,xo))
The inner product (u, (x
ranging leads to
u(xo) =
(xo), so rear-
evaluates to
dx
w (xo)
p(b)
xo)
+ p(a) (u'
xo)
(a, xo)) ]
Properties and
solution
u(xo) w (xo)
f(x)G(x,
dX+
(3.76)
problem LG =
xo),
solution
to
Lu = f for any
$f$ through (3.76). Note that (3.76) is closely analogous to the solution $x = A^{-1}b$ of the algebraic problem $Ax = b$, with $G$ playing the role of $A^{-1}$. Example 2.15 shows a derivation of this formula for a specific problem. Because that example already imposes homogeneous Dirichlet boundary conditions, reworking it with $u(x) = G(x, x_0)$ and $f(x) = \delta(x - x_0)$ would directly yield the Green's function for the Dirichlet problem.
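The analogy between $G$ and $A^{-1}$ can be made concrete with a finite-difference sketch. The example assumes the specific operator $L = -d^2/dx^2$ on $(0, 1)$ with homogeneous Dirichlet conditions (an illustrative choice, not the general Sturm-Liouville operator of the text), for which $G(x, x_0) = x(1 - x_0)$ when $x \le x_0$ and $x_0(1 - x)$ otherwise.

```python
import numpy as np

n = 49                                   # interior mesh points (assumed value)
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)

# Second-difference matrix for -d^2/dx^2 with u(0) = u(1) = 0.
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Column j of inv(A) is the response to a unit load at node j; dividing by h
# converts the unit load into a discrete delta function of unit area.
G_discrete = np.linalg.inv(A) / h

X, X0 = np.meshgrid(x, x, indexing="ij")
G_exact = np.where(X <= X0, X * (1 - X0), X0 * (1 - X))

# Solving Lu = f then amounts to a quadrature of G against f.
f = np.sin(np.pi * x)
u = h * G_discrete @ f                   # approximates sin(pi x) / pi^2
```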
Xl)
(3.77)
((x xo),
(x,xl)) =
Xl))
Thisreduces to simply
(xo,xl) =
(3.78)
Vector Ca/cu/us
300
Thisresult
of the
is the analog
-v2u = f(x)
= (x
(3.17),
Green's second identity,
xo)
(3.79)
with v replaced by G, is
u(xo) =
G(x,
dV(x)
ndS(x)
(3.80)
If $u$ satisfies Dirichlet boundary conditions $u = u_S$ on $S$, then requiring that $G = 0$ on $S$ yields a solution for $u$
u(xo) =
G(x,
dV(x) us
5 We put a negative sign in front of the Laplacian here so that physically, the term $f(x)$ represents a source of heat, chemical species, etc., and thus the Green's function represents a point source. Some authors do not use the negative sign.
Equations:
Properties
and
solution
where $\partial G/\partial n = \mathbf{n}\cdot\nabla G$. A Green's function satisfying homogeneous Dirichlet boundary conditions is sometimes called a GREEN'S FUNCTION OF THE FIRST KIND. If $u$ satisfies Neumann boundary conditions $\partial u/\partial n = j_S$, then we apply homogeneous Neumann boundary conditions $\partial G/\partial n = 0$ to the Green's function, in which case the solution for $u$ is
u(xo) G(x,
G(x,
dS(x)
(3.82)
(3.82) requires us to determine the solution to $-\nabla^2 G = \delta(\mathbf{x} - \mathbf{x}_0)$ with the appropriate boundary conditions. To do this, it is useful to let $G$ be written as the sum of two parts: $G = G_\infty + G_B$. In this sum, $G_\infty$ is called the FREE-SPACE GREEN'S FUNCTION. It is a solution to the equation $Lu = \delta$ in an unbounded domain, and contains the singular behavior induced by the presence of the point source. The boundary correction $G_B$ satisfies $LG_B = 0$ (the singular behavior is contained in $G_\infty$), and is determined by the requirement that $G$ satisfy specific boundary conditions on $S$. We will find $G_\infty$ and $G_B$ for $L = -\nabla^2$ in two dimensions.
r dr dr
The solution to this is simple
$$G_\infty(r) = c_1\ln r + c_2$$
Recallingthat V2
theorem to the
V V and applying the divergence
s n
dS = 1
302
14Qtio
surrounding
radius
s n
Therefore
1
27T
Letting r = Ix - xo l, the free-space
becomes
tion for V2 in two dimensions
Goo(x XO)=
-1 In
Green's
fun
Ix xol
(3.83)
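The normalization of (3.83) can be sanity-checked numerically: for a unit point source, the outward flux $-\oint \partial G_\infty/\partial n\, dS$ through any circle around the source should equal 1. The sketch below assumes the source sits at the origin and uses a centered finite difference for the radial derivative; the function names are hypothetical.

```python
import numpy as np

def G_inf(x, y, x0=0.0, y0=0.0):
    """Two-dimensional free-space Green's function of -Laplacian, as in (3.83)."""
    return -np.log(np.hypot(x - x0, y - y0)) / (2 * np.pi)

def outward_flux(radius, n_theta=2000, dr=1e-6):
    """Integral of -dG/dr around a circle of the given radius centered on the source."""
    theta = 2 * np.pi * np.arange(n_theta) / n_theta
    c, s = np.cos(theta), np.sin(theta)
    dG_dr = (G_inf((radius + dr) * c, (radius + dr) * s)
             - G_inf((radius - dr) * c, (radius - dr) * s)) / (2 * dr)
    return np.sum(-dG_dr) * radius * (2 * np.pi / n_theta)

fluxes = [outward_flux(r) for r in (0.1, 1.0, 5.0)]
```

The flux is independent of the radius, as expected when there are no sources away from the origin.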
-1
In Ix
= = In Ix xo/ = In Ix
27T
2m
27T
In
XOII
Ix D
XO/ I
Yo
Ix xo I
us (x)
(x xo) 2 + Yo
f (x,y) dx dy
dx
Equations:
Properties
and
solution
303
on
than a
be compensatedby another
imagepoint,
fieldat y
is available in closed form but often it is not. To address this situation, we step back to (3.80). In developing this equation, the boundary conditions on $G$ have not yet been specified. For example, it is valid if we let $G = G_\infty$, which has a simple closed-form solution, (3.83). Using this choice and letting $f(x) = 0$ so that we are considering the Laplace equation, (3.80) becomes
u(xo) =
Cx,xo)
u(x)
dS(x)
- u(x)
CAx,xo)
n
n
(3.84)
304
EquQti0hs
from the fact that $\partial G_\infty/\partial n$ changes sign as $\mathbf{x}_0$ crosses from one side of the boundary to the other. Consider a vertical boundary defined by the line $x = 0$ (with the outward normal pointing to the right) and let $\mathbf{x}_0 = (x_0, y)$. Taking the limit $x_0 \to 0$ corresponds to approaching the point $(0, y)$ on the boundary
G00(x, xo)
xo-0
lim
G00(x, xo)
1
xo-0
xo
27Txo + Y2
sgn(xo)(y)
2
Ix I
as a delta
lim u(x)
xo-s s
dS(x) =
u(x)
dS(x) u(xo)
(3.85)
G00(x, xo)
lim
dS(x)
(x, xo)
u(x)
u(x)
(x, xo)
GOO
dS(x)
(3.86)
2
n
n
If Dirichlet boundary conditions $u = g$ are imposed, then the left-hand side and the second integral are known and the boundary values of $\partial u/\partial n$ are determined by the solution of this equation. If Neumann conditions are imposed, then the values of $u$ are the unknowns. If $u$ is imposed on some part of the boundary, and $\partial u/\partial n$ on the remainder, then $\partial u/\partial n$ is an unknown on the part of the boundary where $u$ is imposed and vice versa.
Closed form solutions to (3.86) can be obtained in special cases, but its importance goes beyond these. On a fundamental level, it shows that Laplace's equation, a partial differential equation, can be reformulated as an integral equation whose domain is the boundary of the original domain. On a practical level, it forms the basis of an important computational approach to solving the Laplace equation and related problems, the BOUNDARY ELEMENT METHOD. In this approach, the integrals in (3.86) are discretized, leading to a system of linear algebraic equations whose unknowns are values of $u$ and $\partial u/\partial n$ at points on the boundary.
3.3.6 Characteristics and D'Alembert's Solution to the Wave Equation
The wave equation
$$u_{tt} = c^2\nabla^2 u \qquad (3.87)$$
governs wave propagation in many physical contexts, including electromagnetic waves (light), vibrations of strings and membranes, and sound propagation. In one spatial dimension, the equation is
$$u_{tt} = c^2 u_{xx} \qquad (3.88)$$
which was introduced earlier. Using $\xi = x - ct$ and $\eta = x + ct$ yields (3.21) with $g = 0$.
We can easily integrate this twice to find the general solution of the wave equation
$$u(x, t) = F_1(\xi) + F_2(\eta) = F_1(x - ct) + F_2(x + ct)$$
It says that any solution is a superposition of a right-moving and a left-moving wave. Usually, we want to understand the wave equation as an initial-value problem, so we look at two cases of initial conditions and superpose the result.
306
Equations
to
0 the above general solution
and its . n'
= O= El (x) +F2(x)
ut(x, 0) vo(x) = CFI'+ cF
This tells us that Fl
find that
F2and that vo
1
x+ct
2c o
Similarly
Fl(x ct) =
xct
2c o
x+ct
20' x ct
partial
Linear
3.3
Differential Equations:
Techniques
Properties and
Image domain
solution
307
xo
$$u(x, t) = \frac{1}{2}\left[u_0(x - ct) + u_0(x + ct)\right] + \frac{1}{2c}\int_{x-ct}^{x+ct} v_0(s)\,ds$$
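D'Alembert's formula is straightforward to evaluate numerically. The following is a minimal sketch on the unbounded domain, assuming smooth illustrative initial data $u_0(x) = e^{-x^2}$ and $v_0(x) = \sin x$ (for which the integral term equals $\sin x\,\sin ct / c$); the function name and quadrature size are hypothetical.

```python
import numpy as np

def dalembert(x, t, c, u0, v0, n_quad=2001):
    """u = [u0(x-ct) + u0(x+ct)]/2 + (1/2c) * integral of v0 over [x-ct, x+ct]."""
    s = np.linspace(x - c * t, x + c * t, n_quad)
    vals = v0(s)
    ds = s[1] - s[0]
    integral = ds * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # trapezoid rule
    return 0.5 * (u0(x - c * t) + u0(x + c * t)) + integral / (2 * c)

c, x, t = 2.0, 0.3, 0.7
u0 = lambda s: np.exp(-s**2)
v0 = lambda s: np.sin(s)

u_num = dalembert(x, t, c, u0, v0)
u_exact = (0.5 * (u0(x - c * t) + u0(x + c * t))
           + np.sin(x) * np.sin(c * t) / c)
```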
3.3.7 Laplace Transform Methods
Next we illustrate the solution of several linear PDEs with Laplace transforms. For a user with some experience, Laplace transforms are probably the most powerful method for solving linear, low-dimensional PDEs in closed form. After taking the Laplace transform of a PDE, usually with respect to the time variable, the result is a linear ODE in the transform function. We can often solve this ODE. To perform the inverse transform, we then require some inverse formulas for transforms with singularities. We develop these inverse formulas next and then solve several examples.
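Let the transform function be $\bar f(s) = p(s)/q(s)$, in which $q(s)$ has simple zeros $s_n$, $q(s_n) = 0$. Then
$$f(t) = \sum_{n} a_n e^{s_n t}, \qquad a_n = \frac{p(s_n)}{q'(s_n)} \qquad (3.89)$$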
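For a rational transform with finitely many simple zeros, the expansion in (3.89) can be evaluated directly. The sketch below is only an illustration for polynomial $p$ and $q$ with $\deg p < \deg q$ (the examples in the text involve infinitely many zeros); the helper `heaviside_inverse` is hypothetical and uses NumPy's polynomial utilities, with coefficients listed in increasing powers of $s$.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def heaviside_inverse(p_coeffs, q_coeffs, t):
    """f(t) = sum_n p(s_n)/q'(s_n) * exp(s_n t) over the simple zeros s_n of q."""
    zeros = P.polyroots(q_coeffs)
    dq = P.polyder(q_coeffs)
    a = P.polyval(zeros, p_coeffs) / P.polyval(zeros, dq)
    return np.real(sum(a_n * np.exp(s_n * t) for a_n, s_n in zip(a, zeros)))

t = np.linspace(0.0, 3.0, 7)
# Transform 1/((s + 1)(s + 2)) = 1/(2 + 3s + s^2), whose inverse is e^{-t} - e^{-2t}.
f_num = heaviside_inverse([1.0], [2.0, 3.0, 1.0], t)
f_exact = np.exp(-t) - np.exp(-2 * t)
```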
6The
singularities of complex-valued
a(s)
company. Heaviside also used the
expansion for the case Of
sinhxs (Vallarta,
1926)(Heaviside, 1899, p. 88).
partial
3.3 Linear
Techniques
coefficientsani, for i
Tile
309
1,
ill which
2CA
CA
DA x2 - KcA
BCI
=O
BC2
cm
2c
z 2
kc
BCI
BC2
c(z,0) = 0
KL2
310
Equations
The dimensionless parameter $k = KL^2/D_A$ appearing in the problem is known as the Thiele number or Thiele modulus in the chemical engineering literature (Rawlings and Ekerdt, 2012, p. 363). It indicates the ratio of the reaction rate to the diffusion rate.
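(b) Take the Laplace transform of the equation and boundary conditions and show that
$$\bar c(z, s) = \frac{\sinh\left(\sqrt{s + k}\,(1 - z)\right)}{s\,\sinh\sqrt{s + k}}$$
(c) Apply the final-value theorem to $\bar c(z, s)$ to find the steady-state solution $c_s(z)$.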
s+
and find
(g) Invert the transform and find $c(z, \tau)$. Check that the solution satisfies the PDE and boundary conditions.
Solution
(a) Inserting the defined dimensionless variables in the PDE gives
2c
L2
2c
z 2
2c
Z2
KL2
c
DA
kc
3.3Linear
OKzLKL,
311
and
C(Z,T)
C(Z,T) O
C(Z,T) O
0 KZ KI,
T 0
d 2 k
z
CII, s) 0
C(z,s)
acosh s +
z) +
z)
and we use the two BCs to find the constants $a$ and $b$. We have
1
bsinh s + k
so we have
s sinh s + k
z)
sinh s +
s sinh s + k
(c) Applying the final-value theorem gives
$$c_s(z) = \lim_{s\to 0} s\,\bar c(z, s) = \lim_{s\to 0} s\,\frac{\sinh\left(\sqrt{s + k}\,(1 - z)\right)}{s\,\sinh\sqrt{s + k}} = \frac{\sinh\left(\sqrt{k}\,(1 - z)\right)}{\sinh\sqrt{k}}$$
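A quick numerical check of the steady-state profile is sketched below. It assumes, consistent with the transform above, that the steady problem is $c_s'' = k c_s$ with $c_s(0) = 1$ and $c_s(1) = 0$; the value of $k$ and the finite-difference step are illustrative.

```python
import numpy as np

def c_steady(z, k):
    """Steady-state profile sinh(sqrt(k)(1 - z)) / sinh(sqrt(k))."""
    return np.sinh(np.sqrt(k) * (1 - z)) / np.sinh(np.sqrt(k))

k, h = 10.0, 1e-4
z = np.linspace(0.1, 0.9, 9)

# Centered second difference approximates d^2 c_s / dz^2.
c_zz = (c_steady(z + h, k) - 2 * c_steady(z, k) + c_steady(z - h, k)) / h**2
residual = np.max(np.abs(c_zz - k * c_steady(z, k)))

left_bc, right_bc = c_steady(0.0, k), c_steady(1.0, k)
```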
sinhx
(d) Usingthe fact that
limcs(z)
diffusion
boundary conditions
d2cs(z)
equation and
cs(0)
-1
(e) The concentration profile $c_s(z)$ versus $z$ for a variety of rate constants $k$ is given in Figure 3.10. We see that a large reaction rate constant prevents species A from diffusing very far into the membrane.
nr, n
0, 1, 2
the zerosof
These are simple zeros so the inversion formula in (3.89) is applicable. Differentiating $q(s)$ and evaluating $q'(s)$ at the zeros
8 See Exercise 3.48 for a proof that these are the only zeros of $\sin u$ for $u \in \mathbb{C}$.
[Figure 3.10: Steady-state concentration profile versus $z$ for several values of the rate constant $k$ (curve labels 2, 10, 30, 100).]
gives
Q'(0) = sinhNQ
Q' ((n 2TT2 + k)) =
(n 27T2 +
2nTTi
+ k)) = isinnr(l z)
C(Z,T) =
sinhVk(1 z)
Sinh vfk
n=0
2rn
n 2TT2+ k
(n2TT2+k)T
Noticingthe n
C(Z, T)
vanishes,
0 term
z)
Sinhk(l --Sinh
(1)n+1Trn
Compare also to entry 34 in Table A.1.
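The transient solution can be checked numerically. The series used below is an assumption: it is what the residue formula (3.89) gives when applied to $\bar c(z,s) = \sinh(\sqrt{s+k}(1-z))/(s\sinh\sqrt{s+k})$ with zeros at $s = 0$ and $s_n = -(n^2\pi^2 + k)$, written with $\sin(n\pi z)$; readers should compare it against the form printed in the text. The sketch tests it against the PDE $c_\tau = c_{zz} - kc$ by finite differences.

```python
import numpy as np

def c_transient(z, tau, k, n_terms=200):
    """Steady profile minus decaying modes exp(-(n^2 pi^2 + k) tau) (assumed form)."""
    n = np.arange(1, n_terms + 1)
    lam = n**2 * np.pi**2 + k
    steady = np.sinh(np.sqrt(k) * (1 - z)) / np.sinh(np.sqrt(k))
    series = np.sum(n * np.sin(n * np.pi * z) * np.exp(-lam * tau) / lam)
    return steady - 2 * np.pi * series

k, z, tau = 2.0, 0.3, 0.05
ht, hz = 1e-4, 1e-3
c_t = (c_transient(z, tau + ht, k) - c_transient(z, tau - ht, k)) / (2 * ht)
c_zz = (c_transient(z + hz, tau, k) - 2 * c_transient(z, tau, k)
        + c_transient(z - hz, tau, k)) / hz**2
pde_residual = abs(c_t - (c_zz - k * c_transient(z, tau, k)))

# At long times only the steady-state term survives.
late = c_transient(z, 5.0, k)
steady = np.sinh(np.sqrt(k) * (1 - z)) / np.sinh(np.sqrt(k))
```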
Example 3.15: Solving the wave equation
Revisit the wave equation $u_{tt} = c^2u_{xx}$ on $x \in [0, 1]$ for a string with fixed ends $u(0, t) = u(1, t) = 0$, and the plucked string initial condition, $u(x, 0) = u_0(x)$, $u_t(x, 0) = 0$. Solve this equation using the Laplace transform. Compare the solution to D'Alembert's solution. Which form do you prefer and why?
Solution
$$u_{\tau\tau} = u_{xx}, \qquad x \in (0, 1), \quad \tau \ge 0$$
$$u(x, 0) = u_0(x), \qquad u_\tau(x, 0) = 0$$
$$u(0, \tau) = 0, \qquad u(1, \tau) = 0$$
u(x,s)xx
= suo(x)
in which
sinh(sE)sinh(s(l x))
sinhs
sinh(sx)sinh(s(l 9)
sinhs
Notice that, as we expect, $G(x, \xi, s)$ is symmetric in $(x, \xi)$ because the second-order boundary-value problem is self-adjoint. Next we require a Laplace inverse for the following form
$$\frac{\sinh(as)\sinh(bs)}{\sinh s}$$
Notice that $\sinh s$ has simple zeros at $s_n = n\pi i$ with $n = 0, \pm 1, \pm 2, \ldots$, and use the formula given in (3.89) to obtain
$$p(s_n) = \sinh(n\pi a i)\sinh(n\pi b i) = -\sin(n\pi a)\sin(n\pi b)$$
$$q'(s_n) = \cosh(n\pi i) = (-1)^n$$
Therefore the inverse is
$$f(\tau) = \sum_{n=-\infty}^{\infty} (-1)^{n+1}\sin(n\pi a)\sin(n\pi b)\,e^{in\pi\tau}$$
Substituting $e^{in\pi\tau} = \cos(n\pi\tau) + i\sin(n\pi\tau)$ gives
$$f(\tau) = 2\sum_{n=1}^{\infty} (-1)^{n+1}\sin(n\pi a)\sin(n\pi b)\cos(n\pi\tau)$$
(1P
E)) = (1P +1
Butnoticingthat
G(x,
reduces this to
sin(nTtx) COS(nTTT)
=2
an = 2
uo)
316
we have finally
$$u(x, \tau) = \sum_{n=1}^{\infty} a_n\sin(n\pi x)\cos(n\pi\tau)$$
Returning to the original time variable $\tau = ct$ gives
$$u(x, t) = \sum_{n=1}^{\infty} a_n\sin(n\pi x)\cos(n\pi ct) \qquad (3.91)$$
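The Fourier series solution can be compared with D'Alembert's solution extended by the method of images. The sketch below assumes a hypothetical triangular "pluck" $u_0$ with its peak at $x = 1/2$, computes $a_n = 2\int_0^1 u_0(x)\sin(n\pi x)\,dx$ by the midpoint rule, and builds the image solution from the odd, period-2 extension of $u_0$.

```python
import numpy as np

def u0(x):
    """Hypothetical plucked-string shape: triangle with peak at x = 1/2."""
    return np.where(x < 0.5, 2 * x, 2 * (1 - x))

def series_solution(x, t, c, n_terms=400, m=4000):
    xq = (np.arange(m) + 0.5) / m                        # midpoint quadrature nodes
    n = np.arange(1, n_terms + 1)
    a = 2 * np.mean(u0(xq) * np.sin(np.outer(n, np.pi * xq)), axis=1)
    return np.sum(a * np.sin(n * np.pi * x) * np.cos(n * np.pi * c * t))

def dalembert_images(x, t, c):
    def odd_periodic(y):
        y = np.mod(y, 2.0)                               # period-2 extension
        return u0(y) if y <= 1.0 else -u0(2.0 - y)       # odd about x = 0 and x = 1
    return 0.5 * (odd_periodic(x - c * t) + odd_periodic(x + c * t))

x, t, c = 0.3, 0.9, 1.5
u_series = series_solution(x, t, c)
u_images = dalembert_images(x, t, c)
```

The series converges slowly here because the triangular shape has a corner, so only agreement to a few digits should be expected with 400 terms.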
Taking two time derivatives of each term gives $-c^2(n\pi)^2$ times that term; similarly, taking two $x$ derivatives gives $-(n\pi)^2$ times that term, so $u_{tt} = c^2u_{xx}$ and the solution satisfies the wave equation. The zero boundary conditions are satisfied because all the sine terms vanish at $x = 0, 1$. The initial condition is satisfied because of the Fourier series representation of $u_0(x)$. We see immediately that the solution is periodic (in time) with period $T = 2/c$ since all the cosine terms have this period. The Fourier series solution is also convenient if we wish to analyze the frequency content of the solution, which is often a quantity of interest when modeling sound propagation.
D'Alembert's solution, on the other hand, provides the nice structural insight that the solution splits into two waves traveling in opposite directions. But then we also require the additional insight from the method of images to enforce zero boundary conditions and extend the solution to the $(x, t)$ values where $x - ct < 0$ or $x + ct > 1$, for which
u
+ Lu = f (x), Bu(S, t) = h, u(x, 0) = uo(x)
t
9 If we knew enough about the problem to propose a solution of this form, we could arrive at this answer more quickly. The value of the Laplace transform here is that it is prescriptive. You do not have to know (or guess) the structure of the solution to apply the method.
3.4 Numerical Solution of Initial-Boundary-Value Problems
IN
+ LuN f (x)
t
The residual is now forced to be orthogonal to the set of $N$ test functions $\phi_i$; that is, $(R, \phi_i) = 0$, $i = 1, 2, \ldots, N$. In the Galerkin method the test functions equal the trial functions so this condition becomes
and $b_i = (f, \phi_i)$, we can write the weighted residual conditions as
$$\frac{dc}{dt} = -M^{-1}Ac + M^{-1}b$$
This is a set of linear ODEs (an initial-value problem) for the vector of coefficients $c$ in the series for $u_N$. We have reduced a partial differential equation to a set of ODEs.
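A small Galerkin sketch may help make the matrices concrete. Everything specific here is an assumption made for illustration: the operator is $L = -d^2/dx^2$ on $(0, 1)$ with $u = 0$ at both ends, $f = 0$, the trial functions are the polynomials $\phi_i = x^i(1 - x)$, $M_{ij} = (\phi_i, \phi_j)$, and $A_{ij} = (\phi_i', \phi_j')$ (the weak form after one integration by parts). The slowest decay rate of $dc/dt = -M^{-1}Ac$ should then approximate the exact value $\pi^2$.

```python
import numpy as np

def galerkin_matrices(n_basis, n_quad=20):
    """Mass and stiffness matrices for phi_i = x^i (1 - x), i = 1..n_basis."""
    t, w = np.polynomial.legendre.leggauss(n_quad)     # Gauss nodes on [-1, 1]
    x, w = 0.5 * (t + 1.0), 0.5 * w                    # mapped to [0, 1]
    i = np.arange(1, n_basis + 1)[:, None]
    phi = x**i * (1.0 - x)
    dphi = i * x**(i - 1) - (i + 1) * x**i
    M = (phi * w) @ phi.T                              # M_ij = (phi_i, phi_j)
    A = (dphi * w) @ dphi.T                            # A_ij = (phi_i', phi_j')
    return M, A

M, A = galerkin_matrices(4)
# Decay rates of the ODE system dc/dt = -M^{-1} A c.
rates = np.sort(np.linalg.eigvals(np.linalg.solve(M, A)).real)
slowest = rates[0]
```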
318
ary conditions. The explicitly solved for the last NL,c valuesequations
of c and
Typically, these can be
ODEs.
the
into
the formulas substituted if we use the collocation approach.
Nowwe
A similarresult arises operators in L by their matrix
approxima_
replace the spatial derivative
operators. This yields
differentiation
tions, the collocation
< 2r2)
This result shows that, to within a numerical constant, the time step for explicit Euler must be shorter than the time scale for diffusion over
a distance Cmm.
A similar result holds when finite element or finite difference methods are applied. For simplicity, we will consider a finite difference approximation to the diffusion equation, using the central difference formula
$$\frac{du_j}{dt} = D\,\frac{u_{j-1} - 2u_j + u_{j+1}}{h^2}$$
where $h$ is the spacing between mesh points $x_j$ and $u_j = u(x_j)$. Recall from Chapter 2 that the finite element discretization using hat functions leads to an identical form for the second derivative. The forward Euler approximation to this ODE is
$$u_j^{(n+1)} = u_j^{(n)} + \frac{D\Delta t}{h^2}\left(u_{j-1}^{(n)} - 2u_j^{(n)} + u_{j+1}^{(n)}\right)$$
2DAt
2DAt
coskh eikjh
(n)
2DAt
2DAt
+ hQ
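Here $G = 1 - \frac{2D\Delta t}{h^2} + \frac{2D\Delta t}{h^2}\cos kh$ is the growth factor, which for numerical stability must satisfy $|G| \le 1$. When $k = 0$, $G = 1$, which makes physical sense because $k = 0$ corresponds to a constant function, which does not decay by diffusion (there are no gradients). As $k$ increases, $G$ decreases, reaching its minimum value $1 - 4D\Delta t/h^2$ at $kh = \pi$. To maintain stability, the time step $\Delta t$ must be smaller than
$$\Delta t < \frac{h^2}{2D} \qquad (3.92)$$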
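The stability limit can be observed directly. This is a minimal sketch, assuming zero Dirichlet values at both ends and a random initial condition (chosen so that all wavenumbers are present); the function name and parameter values are hypothetical, and `r` denotes $D\Delta t/h^2$.

```python
import numpy as np

def ftcs_diffusion(r, n_points=51, n_steps=400):
    """Forward Euler / central difference; returns max |u| after n_steps."""
    rng = np.random.default_rng(1)
    u = np.zeros(n_points)
    u[1:-1] = rng.uniform(-1.0, 1.0, n_points - 2)     # end values stay at zero
    for _ in range(n_steps):
        u[1:-1] = u[1:-1] + r * (u[:-2] - 2 * u[1:-1] + u[2:])
    return np.max(np.abs(u))

stable = ftcs_diffusion(0.4)     # r < 1/2: every mode decays
unstable = ftcs_diffusion(0.6)   # r > 1/2: modes near kh = pi grow without bound
```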
IDAt
(n)
2u
2 h 2 k j-l
This is called the CRANK-NICHOLSON method. The linear system that must be solved at each time step is tridiagonal so it can be factored quickly.
U(Xj+1)U(Xj-1)
duJ
vAt (n)
dt
2h uj+l
The same right-hand side arises in the finite element approximation
(3.94)
Initial-Boundary-Va1ue
Problems,
eikxj
the Fourier
the convection
analysis
operator
(n) vat
vAt
Defining $C = v\Delta t/h$ as the COURANT NUMBER, the stability condition becomes
$$C < C_{\max} \qquad (3.95)$$
where in this case $C_{\max} = 1$. This is the COURANT-FRIEDRICHS-LEWY CONDITION, often simply called the COURANT CONDITION. Physically, it tells us that the time step must be smaller than the time it takes for convection at speed $v$ over one mesh unit $h$. By replacing the central difference with an upwind difference, we have lost an order in spatial accuracy but have gained stability. And anyway, the method is still first order in time. One small complication of this method is that for problems where the velocity can
322
Equations
difference is used.
approximahon
unstable.
is always
Stability also can be gained without use of upwind differences. The LAX-FRIEDRICHS method is a simple modification to the FTCS discretization where the present value at point $x_j$ is replaced by the average of the values at points $j + 1$ and $j - 1$
$$u_j^{(n+1)} = \frac{1}{2}\left(u_{j+1}^{(n)} + u_{j-1}^{(n)}\right) - \frac{v\Delta t}{2h}\left(u_{j+1}^{(n)} - u_{j-1}^{(n)}\right) \qquad (3.96)$$
vAt
(n)
u j _1)
(.n)
2u(.n) + u J+I)
(3.97)
uxx
ut + VuX
2At
This diffusion term is enough to stabilize the method: using the von Neumann analysis the stability criterion is found to be very similar to what we found for the upwind scheme but is now insensitive to the sign of $v$
$$|C| = \frac{|v|\Delta t}{h} \le 1$$
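The behavior of the three convection schemes discussed so far can be compared on a small periodic test problem. The sketch below is an illustration under assumptions: a square-pulse initial condition, $v > 0$, periodic boundaries via `np.roll`, and the standard textbook update formulas for FTCS, upwind, and Lax-Friedrichs; the function name `advect` is hypothetical.

```python
import numpy as np

def advect(scheme, C, n_points=100, n_steps=200):
    """Advance u_t + v u_x = 0 at Courant number C; returns max |u| at the end."""
    x = np.arange(n_points) / n_points
    u = np.where((x > 0.25) & (x < 0.5), 1.0, 0.0)      # square pulse
    for _ in range(n_steps):
        up, um = np.roll(u, -1), np.roll(u, 1)           # u_{j+1}, u_{j-1}
        if scheme == "ftcs":
            u = u - 0.5 * C * (up - um)
        elif scheme == "upwind":
            u = u - C * (u - um)
        elif scheme == "lax-friedrichs":
            u = 0.5 * (up + um) - 0.5 * C * (up - um)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return np.max(np.abs(u))

C = 0.5
results = {s: advect(s, C) for s in ("ftcs", "upwind", "lax-friedrichs")}
```

For $0 \le C \le 1$ the upwind and Lax-Friedrichs updates are weighted averages of neighboring values, so the maximum cannot grow, whereas FTCS amplifies every nonconstant mode.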
All the methods developedso far for the convectionequationare
j1/2
211 kujl
Initial-BoundQFYVQ1ue
Problems
FTCSstep
to
generate
the
j +1/2
solution
u(+1/2)
Eliminatingthe intermediate values,
this can
be rewritten
in the
more
n
211
-u(
(VAt)2
j-l
2u (n)
made
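always be chosen to be much smaller than the length scales over which the true solution varies. Taylor-expanding $|G|^2$ around $kh = 0$ yields
$$|G|^2 = 1 - (1 - C^2)(kh)^2 + \cdots \quad\text{and}\quad |G|^2 = 1 - \frac{C^2(1 - C^2)}{4}(kh)^4 + \cdots$$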
for Lax-Friedrichs and Lax-Wendroff, respectively. The latter is substantially better, since the deviation from $|G|^2 = 1$ scales as $(kh)^4$ rather than $(kh)^2$.
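The two scalings can be verified numerically. The growth factors below are the standard von Neumann results for these schemes, stated here as assumptions since the corresponding expressions are not legible in this copy: $G = \cos kh - iC\sin kh$ for Lax-Friedrichs and $G = 1 - C^2(1 - \cos kh) - iC\sin kh$ for Lax-Wendroff.

```python
import numpy as np

def growth_lax_friedrichs(C, kh):
    return np.cos(kh) - 1j * C * np.sin(kh)

def growth_lax_wendroff(C, kh):
    return 1 - C**2 * (1 - np.cos(kh)) - 1j * C * np.sin(kh)

C, kh = 0.5, 0.1
g2_lf = abs(growth_lax_friedrichs(C, kh))**2
g2_lw = abs(growth_lax_wendroff(C, kh))**2

# Leading-order small-kh approximations.
lf_taylor = 1 - (1 - C**2) * kh**2
lw_taylor = 1 - C**2 * (1 - C**2) * kh**4 / 4
```

At $kh = 0.1$ the Lax-Wendroff amplitude error is several orders of magnitude smaller than the Lax-Friedrichs one.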
The cases above represent the low and high Peclet number limits of the general convection-diffusion equation
$$u_t + vu_x = Du_{xx}$$
(n)\
(n) vAt ( (n) uj-l)
(vAt)2 u (n)
2h2 k j-l
2. Crank-Nicholson is applied, using the intermediate values instead of the values at step $n$
IDAt
2 h2 \uj-l
IDAt (
2 112\uj-l
+ uj+l
(n+l)
In methods like this, because the diffusion terms are evaluated implicitly, the stability limit is set by a Courant condition on the convective terms. In fact, one might also get away with an unstable (e.g., FTCS)
3.5 Exercises
consider
definition
corner at
the origin
(X,y, z) = (Ax, Ay, AZ). In this case the
and
integral
definitionthe
opposite
of the
corner
gradientgradat
dS
Becausewe are going to shrink the volume to
zero, we
can make
the truncated
Taylor-
y
z
wherethe derivatives are evaluated at the origin.
Combine these
to derive
the formula
arguments
hold
same
for
The
the
er
-Q=o
eo
1 qe
(rqr) +
r r
(hereu, v, and
UsingCartesian tensor notation, derive the following identities
vectorsand
is a scalar).
(b)
(C)V (uu) = U Vu + UV u
w are
326
identities
vu
(v w)u
(d) (u x v) XW = (u
= (vu uv)
(f) V X (V X V) =
iljm
VV.u-V.
(b) v x v x u =
(e) (u x v) x w
Equations
llv112) - (v.
Derive Leibniz's rule for the special case where the volume $V$ is a cube whose size is constant but is moving with velocity $\mathbf{q}$. In other words, explicitly show that the contribution from the motion of $V$ becomes $\int_S m\,\mathbf{q}\cdot\mathbf{n}\,dA$.
(c) Use this result and the divergence theorem to derive a formula for the total volume $\mathcal{V} = \int_V dV$ of a region $V$ in terms of an integral over the surface $S$ of the volume.
VxvdV=
n x v dS
v 2u = f
T' n
in a volume T with (no-flux)boundary
condition n Vu = O on the boundary S Of
boundary.
3.5
Exercises
(b) Iff
necessary
condition
3.12:Helmholtz decomposition
Exercise
condition
for the
must v
existence
satisfyfor
the
resultof
termsof
p in a viscous
flowdriven
bya
then
(AU,V) = (U,AV)
wherethe inner product is given by
u v dV+
pq dV
dt
scalar functionf().
in which g() =
is the usual derivative of the
For the special case of f (A) = In A for A nonsingular, we obtain
dt
dt
Partial Differential
and
Calculus
vector
328
nonsingular
A,
differentiate
to
(b) For previouspart
the
of
3.15: Euler
show that
det(A) tr A-l A
d detA =
dt
expansion
continuum.
of the
given by
coordinates is
(3.99)
formula
reference
representthe
u
Letcoordinates
Exercise
Equations
)dVx
dVu = det
of the
is the determinant
establish that
viousexerciseto
detCux
expansion (or dilation) formula.
Euler
the
as
whichis knoval
ReadExample12.2-2
Equation 12.2-21 into 12.2-23. Then exchange the orderof
(a) SubstituteY from
integral can be performed. Then,makea
integration,and show that the inner
change of variable to obtain
t-1/3 e-t dt
2
x 2
(c) Verify that the temperature profile in (a) satisfies the differential equation in Equation 12.2-13. Use the chain rule and the results from (b).
(d) What is the numerical value of $\Gamma(2/3)$?
erf (z) =
e t2dt
Notethat
e t2dt=
The complementary
error function defined
by
00
e t2 dt
3.5
Exercises
329
cos(2tx)dt
Differentiatef(x) and then integrate by parts to show
dx + 2xf(x)
that f satisfies
the dif-
(d)Lett
(e) Integrate
Tr
exp
2a
cos(bu)du
the order of
-a2u2sin(u)
m. Change
du
e2aberf (ax + ) + e-2aberf (ax andderivethe indefinite integral (Abramowitz and Stegun, 1970, p. 304)
2x2_
4a
+ const. a *0
b
e-2aberfc(
bO
(3.101)
330
useful Laplace
Some
3.19:
Exercise
transforms
f(t)
k>0
e kv/
erfc
f (t) = erfc
Use the definition of the Laplace transform, switch the order of integration, and use Equation 3.101.
(b) Establish the second entry by differentiating the first $f(t)$ with respect to $t$.
(c) Establish the third entry by differentiating the second $\bar f(s)$ with respect to $s$.
f(t) = 1 at e-kerfc
'Xerfc
Derive this result by using the convolution theorem and the last entry in the table in
20
(3.102)
fo(x) =
1
20
x dx
dx
= 2x2
3.5
Exercises
331
equationand
showthat
fo
satisfiesthe
fo(x) = allo(x)
+ a2K0(x)
with some constants al , 612.
(c) Given the integral defining f)
what value
does
approachfor
largex?
lim fo(x)
In(x)
(x) -In(x) as x
It is known that
O(see
p. 375)),so we conclude that
= 1 and fo(x) =(Abramowitzand stegun,1970,
Ko(x).
of the modified
Bessel function Ko
derived in Exercise3.21 to derive the following Laplace transform pairs.
f(s)
f(t)
1
e-
1
1<1(kv), k > o
e-TE
332
energy equation
Show that the
reduces to
x 2
with boundary
Equations
conditions
T>0
9(0, T) = 1
e-xv
What assumptions did you make?
using Exercise 3.19 to obtain
(c) Take the inverse transform
(x, T) = erfc
Plot 6(x, T) as a function of x on 0
(d) Show that the proposed solution satisfies the PDE and BCS.
(s
b)(s c)
(s
b)(s c)
(3.103)
and the coefficients $A$, $B$, and $C$ are determined. Then the inverse is simply
$$f(t) = Ae^{at} + Be^{bt} + Ce^{ct}$$
3.5 Exercises
Exercise3.25: Transient heat
333
conduction in
a finite
slab
.
we have a one-dimensional slab Withends (-kVT)
at uniform temperature To. Just after t
located
= 0, the at x
temperatureTl and held at this
two ends
are
we Wish
to findimmediately
the transientraisedto
(a) Write the PDE and (three)
boundary
solution
at x = L, x -L, and t = O.
conditions for
How many
this situation,
parameters appear
i.e.,
(b) Choose nondimensional
in this conditions
temperature,
problem?
spatial
temperature.
position, and
TI-T()
tk
time variables
as fol-
variables. How
manyparam-
cosh(vGz)
S cosh
conditions,and show
(3.104)
as T 00.
in this
conditionsat r = R, r = 0, and t = 0. Howmanyparametersappear
problem?
time variablesas
temperature, radial position, and
tk
T-To
pCpR2
fol-
Howmanyparamvariables.
nondimensional
Express the PDE and BCSin these
and
[Figure: three panels labeled slab, cylinder, and sphere; both axes run from 0 to 1. Caption not recoverable.]
(c) Take the Laplace transform of the PDE, apply the boundary conditions and find
(d) Write the PDE and (three) boundary conditions for the spherical body, i.e., conditions
(e) Choose the same nondimensional temperature, radial position, and time variables as follows
T-TO
Tl-To
tk
pCpR2
Express the PDE and BCs in these nondimensional variables. How many parameters appear in this problem?
Exercise 3.27: Transient solutions for slab, cylinder, and sphere
We wish to plot and compare
at different $\tau$
Exercises
335
(a) The
o,s) =
slo(v')
(3.105)
(c) The
9,s) = sinh(vf9
Sinhvr
Find the zeros of the denominator $\sinh\sqrt{s}$. Note that the denominator has a double zero at $s = 0$ because both $s$ and $\sinh\sqrt{s}$ vanish at $s = 0$.
g(t) = f(t')dt'
Therefore define
= s,s) = sinh(v'9
Sinh
n=l
(TITT)
sin(nrr9e -n
Notice that the following series is the Fourier sine series of the linear function (Selby, 1973, p. 480)
(1)71+1
so we have
n=l
sin(nTT9 =
(l)tt+l
sin(nTTE) e-
n2Tt2T
(3.106)
Exercise 3.28: Transient diffusion in a sphere by separation of variables
Solve the transient diffusion problem in the spherical geometry described in Exercises 3.26 and 3.27 by separation of variables, using the information in Example 3.8.
Exercise 3.29: Fourier series
Find the Fourier series coefficients for the function $f(x) = 1$ on the interval $x \in [-\pi/2, \pi/2]$ using the odd cosine terms $\{\cos x, \cos 3x, \cos 5x, \ldots\}$
$$f(x) = \sum_{n=0}^{\infty} a_n\cos(2n + 1)x$$
oati(
If (x) 12 dx =
dk
d4G
d2G
Exercise 3.32: A square with one heated wall
Solve $\nabla^2 u = 0$ in a unit square domain $0 < x < 1$, $0 < y < 1$, with boundary conditions $u = 0$ on $x = 0$, $x = 1$ and $y = 0$, and $u = 1$ on $y = 1$.
tat
tkk
3.5
Exercises
337
variables to solve
separationof
the
utt = c2uxx
followingboundary conditions
u(x,()) 1 -x,
that your
u(O,t) =
x > 1/2'
= O,
Use separation of variables to find the steady-state temperature distribution in a sphere whose bottom half surface temperature is kept at $T = 0$, and whose top half is at $T = 1$. Use the transformation $X = \cos\phi$ to convert the equation in the polar angle direction to Legendre's equation. Note that the eigenvalues of Legendre's equation are $n(n + 1)$ for positive integers $n$. The corresponding eigenfunction is the Legendre polynomial $P_n(X)$. Explicitly find the first four terms in the expansion.
Laplace's equation in spherical coordinates $(r, \theta, \phi)$ is
1
r 2 sinc
sin 4)
Y2 sin2
2T
02
$$u_{tt} = c^2\nabla^2 u$$
in the domain $y > 0$, $-\infty < x < \infty$, with boundary condition $u(x, y = 0, t) = f(x)e^{i\omega t}$. This equation governs sound emanating from a vibrating wall.
, showthat the
c0 2 v c 2 V 2 v = O
Gt = DGxx
has solution
e -(x-9214D(t-T)
We have changed notation here to emphasize that this solution is the Green's function
"(t
- T).
(a) Use this result, along with a result analogous to (3.71), to find the solution to the initial-value problem
$$u_t = Du_{xx} + f(x, t)$$
in the domain $-\infty < x < \infty$ and initial condition $u(x, 0) = u_0(x)$.
(b) Now consider the case $f = 0$ in the domain $x > 0$ with boundary condition $u(0, t) = 0$ and initial condition $u(x > 0, 0) = 1$. Use an image or symmetry argument to convert this into a problem in the unbounded domain, where you can apply (3.72). The information in Exercise 3.17 may be useful. This solution is found by Laplace transforms in Exercise 3.23.
(c) Solve this problem again by the method of SIMILARITY SOLUTION. That is, observe that the only length scale in the problem is the combination $2\sqrt{Dt}$ (the factor of 2 is arbitrary but convenient), and seek a solution $u(x, t) = u(\eta)$ with $\eta = x/(2\sqrt{Dt})$, which reduces the problem to an ordinary differential equation.
Exercise 3.37: Schrödinger equation in a circular domain
w _ v2q,
t
Use separation of variables to find the general (bounded) axisymmetric solution to this problem in a circular domain with $\psi = 0$ at $r = 1$. Hint: if you assume exponential growth or decay in time, the spatial dependence will be determined by the so-called modified Bessel equation. Use the properties of solutions to this equation to show that there are no nontrivial solutions that are exponentially growing or decaying in time, thus concluding that the time dependence must be oscillatory.
uxx + uyy
Exercise 3.39: Domain perturbation analysis of diffusion in a wavy-walled slab
Solve $\nabla^2 T = 0$ in the wavy-walled domain shown in Figure 3.12. The top surface is at $y = 1$, the left and right boundaries are $x = 0$ and $x = L$, respectively, and the bottom surface is $y = \epsilon\cos 2\pi x/L$, where $\epsilon \ll 1$. Find the solution to $O(\epsilon)$ using domain perturbation.
3.5 Exercises
339
=O
Use separation of variables to solve Laplace's equation in the wedge $0 < \theta < \alpha$, $0 < r < 1$, with boundary conditions $u(r, 0) = 0$, $u(r, \alpha) = 50$, $u(1, \theta) = 0$.
Use the method of images to find the Green's function for this domain: where should
the images be, and what should their signs be? A well-drawn picture showing the positions
and signs of the images is sufficient. The first two images are shown. They don't
completely solve the problem because each messes up the boundary condition on the
side of the wedge further from it.
form of
Exercise3.43: D'Alembert
X2
is
condition u(x, 0) =
find the solution with initial
Iv(x),
(b) Use this solution to unbounded domain. Pick a shape for w (x) and sketch
the
u/t(x, O) = 0 in an
solutionu(x, t).
semi-infinite domain
ut = Duxx
uo (x) is
subject to the initial condition u(x, 0) =
u(x,t) =
find the analogous solution for
Use this solution and an argument based on images to
condition u(0, t) = 0 and
boundary
the same problem,but in the domain x > 0, with
with initial conditionu(x > 0, t = 0) = (x).
where K > 0. Use separationof variablesto find u(x, t) subject to initial condi= 0 and boundaryconditionsu(0,t) = u(L,t) = o,
tion u(x,0) =
for
dx = au+ (x)
where a > 0. Recall that F -1
1 ealxl
(b) show
a
that if u is solution to Laplace's equation, then
so is vu, as
(c) showthat
Eij
constant tensor
as
E.
We
ex +iy = ex (COSy
z e C as follows
i Siny)
iz
sin z = etz
eiz
(a) The function $\sin z$.
Hint: using the definition of sine, convert the zeros of $\sin z$ to solutions of the
equation $e^{2iz} = 1$. Substitute $z = x + iy$ and find all solutions $x, y \in R$.
Notice that all the zeros in the complex plane are only the usual ones on the real axis.
(b) The function $\cos z$.
(Answer: only the usual ones on the real axis.)
The Laplace inverse for the following transform has been used in solving the wave equation
sinh(as) sinh(bs)
sinhs
a, be R
Find $f(t)$, and note that your solution should be real valued, i.e., the imaginary number i
uo(x),
velocity.
sinh(as) sinh(bs)
s sinhs
Showthat this inverse is given by
n=l
nTT
(3.107)
ti
342
(c) Denote
coefficients
the Fourier
as
for the initial velocity v (x)
v(x) =
sin(nmx)
n=l
(x) and
0)
(x, 0)
initial condition u(x,
mixed
the
Example
in
as
(x)
uo
3.15.
consider
of
Showthat
coefficients
(d) Next
the Fourier
Let an denote the mixed initial condition is
the solution for
cos(rtTtT) + sin(nTTT))
=
11=1
Exercise3.51: Wave
115:
initial condition
equation with triangle wave
LITT= uxx
u(x,0)
= uo(x),
CIT (x, 0) = 0
xe
(0, 1)
(b) Consider the solution (3.91) given in Example 3.15. Establish that the solution
$u(x, \tau)$ satisfies the wave equation, both boundary conditions, and the initial
period?
3. T = 0.50,
4. T = 0.90, 0.95, 1.00, 1.05, 1.10
5. T = 1.90,
x = 0, 1?
3531
-1
the
soluti(
Chebyshev
(and
withboundary
conditions
collocation to
ut = uxx
u(0,t) = 0,
equa-
=1
(b) Howmany
using = 0.05. Perform simulations for a long enough time that the solution reaches a
steady state (and perform convergence checks to verify that your spatial and temporal
discretizations are adequate, i.e., that the solution does not change much when the
resolution is increased).
Exercise 3.54
Use von Neumann stability analysis to find the growth factor and the (Courant) stability
condition for the Lax-Wendroff method, (3.98).
Bibliography
M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. National Bureau of Standards, Washington, D.C., 1970.
R. Aris. Vectors, Tensors, and the Basic Equations of Fluid Mechanics. Dover Publications Inc., New York, 1962.
R. B. Bird, R. C. Armstrong, and O. Hassager. Dynamics of Polymeric Liquids. Vol. 1, Fluid Dynamics. Wiley, New York, second edition, 1987.
R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons.
T. A. Osswald and J. P. Hernandez-Ortiz. Polymer Processing: Modeling and Simulation. Hanser, Munich, 2006.
C. Pozrikidis. Introduction to Theoretical and Computational Fluid Dynamics. Oxford University Press, New York, 1997.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.
A. Prosperetti. Advanced Mathematics for Applications. Cambridge University Press.
New
Springer-Verlag,
probability,Random Variables,
and
Estimation
347
intuition
the samehuman
Estimation
I. (Nonnegativity)Pr(A)
0 for all
e 1.
satisfying n B = .
These three axioms, due to Kolmogorov (1933), are the source from
which all probabilistic deductions follow. It may seem surprising at
first that these three axioms are sufficient. In fact, we'll see soon that
we do require a modified third axiom to handle infinitely many sets.
First we state a few immediate consequences of these axioms. Exercise
and 'B are mutually exclusive,or pairwise disjoint. We use the symbol
set A is then defined to be 1 \ A, i.e., 31 is the set of all events that are
not events in $A$. We say that two events $A$ and $B$ are independent if
$\Pr(A \cap B) = \Pr(A)\Pr(B)$.
Some of the important immediate consequences of the axioms are
the following
$\Pr(\emptyset) = 0$
$\Pr(A) + \Pr(\bar{A}) = 1$
$\Pr(A) \le 1$
If $B \subseteq A$, then $\Pr(B) \le \Pr(A)$
$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
Proof. To establish the first result, note that $A \cup \emptyset = A$ and $A \cap \emptyset = \emptyset$
for all events $A$, and apply the third axiom to
obtain $\Pr(A \cup \emptyset) = \Pr(A) = \Pr(A) + \Pr(\emptyset)$. Rearranging this last equality
gives the first result.
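The consequences listed above can be checked exhaustively on a small finite experiment. The sketch below assumes a fair six-sided die with equal weights on the outcomes; the helper names `pr` and `all_events` are illustrative and not from the text.

```python
from fractions import Fraction
from itertools import combinations

# Sample space for one roll of a fair die (an illustrative assumption).
outcomes = frozenset(range(1, 7))

def pr(event):
    """Probability of an event (a subset of outcomes) under equal weights."""
    return Fraction(len(event), len(outcomes))

def all_events(space):
    """Every subset of a finite sample space."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(outcomes)

# Check each listed consequence over every event (or pair of events).
empty_ok = pr(frozenset()) == 0
complement_ok = all(pr(a) + pr(outcomes - a) == 1 for a in events)
bounded_ok = all(pr(a) <= 1 for a in events)
monotone_ok = all(pr(b) <= pr(a) for a in events for b in events if b <= a)
union_ok = all(pr(a | b) == pr(a) + pr(b) - pr(a & b)
               for a in events for b in events)

# Independence: "even" and "at most 4" satisfy Pr(A n B) = Pr(A) Pr(B).
even = frozenset({2, 4, 6})
at_most_4 = frozenset({1, 2, 3, 4})
independent = pr(even & at_most_4) == pr(even) * pr(at_most_4)
```

Exact rational arithmetic is used so that the equalities hold without floating-point tolerance.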
42
Random
Density
Function
349
B) = Pr(A) + Pr(A n B)
Next we introduce the concept of an experiment and a random variable. An experiment is the set of all outcomes, the subsets of outcomes $f$
that are the events of interest, and the probabilities assigned to these
events. A random variable is a function that assigns a number to the
possible outcomes of the experiment, $X(\omega)$, with $\omega$ an outcome. For an experiment with
a finite number of outcomes, such as rolling a die, the situation is
and the events, $f$, can be taken as all subsets of the set of outcomes. The set $f$ obviously contains the six different possible outcomes of the die roll,
corresponding
to an even number showing on the die, and 0 to the
events corresponding to an odd number showing on the die. In the
first case we have the simple assignment
1Notice
that we have used all three axioms to reach this point.
and in the
second
the
case, we have
assignment
experimentwith
random variables. For example, if Wemeasure
real-valued
the reactor as a random
werequire
reactor, and want to model
a
the temperature in variable of interest X (w) assigns a (positive, real)
process, the random
experiment (0 e 'I. If we let '1 =
R
of the
valueto eachoutcome
should
we
allow for the
immediatelyclear what
for example,it's not
individual points on the real number
subsetsf. If we allowonly the
enough set of events to be useful, i.e.,the
line,we do not obtaina rich
If we try to allow all subsets of the real number line, however, we obtain
=
Wecan then assign probabilities to these
be a count-
+Pr(A2) +
events,
satisfying the axioms. The random variable $X(\omega)$ is then a mapping from
the outcomes $\omega$ to $R$, and we have
well-defined probabilities for the events
$\{\omega : X(\omega) \le x\}$ for all $x \in R$. At this point we have
all the foundational elements that we require to develop the stochastic
methods of most use in science and engineering.
The interested reader may wish to consult Papoulis (1984, pp. 22-27)
and Thomasian (1969, pp. 320-322) for further discussion of these issues.
is a nonnegative,
function and has the followingproperties
due to the
probability
value
asiomsof
if
lim
00
Wenext define
suchthat
=0
< X2
lim
=1
the PROBABILITYDENSITYFUNCTION,
denoted
00< X < 00
(x) ,
(4.1)
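The MEAN or EXPECTATION of a random variable $\xi$ is defined as
$$\mathcal{E}(\xi) = \int_{-\infty}^{\infty} x\, p_\xi(x)\, dx \qquad (4.2)$$
The MOMENTS of a random variable are defined by
$$\mathcal{E}(\xi^n) = \int_{-\infty}^{\infty} x^n p_\xi(x)\, dx$$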
the
3052d
first
meanis the
moment.
Moments of
definedby
second
as the
defined
is
-TheVARIANCE
var)
+ 2 (E)
($) ---22(E)
square
deviationis the
standard
The
=
(F3)
by
pdx)
2TT2
(4.3)
exp
(4.4)
(4.5)
x 2e -x2 dx
(4.6)
The first formula may also be familiar from the error function in transport phenomena
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-u^2}\,du \qquad \mathrm{erf}(\infty) = 1$$
e -t dt
x2 gives
x2e-x2dx
ti/2 e-tdt
r(3/2)
calculatethe integral of the normal density as
follows
exp 1 (x m)2
0-2
change of variable
the
pefine
dx
whichgives
p; (x)dx =
du = 1
V-
from (4.4) and the proposed normal density does have unit area. Computing the mean gives
1
27T2
x exp
dx
7T -00
(vQu + m)e-U2du
1
2Tr 2
-00
(x
exp
Changing
the variable of integration as before gives
var) =
u2e-U2du
dx
354
2
var() = (T
a more
In order to collect
having a normal
distris
(T2)
of
changing the variable
e {x2/adX
27Tv/
{x2/adX
x2e-gx2/adX
2na 3/2
Figure 4.1 shows the normal distribution with a mean of one and variances
of 1/2, 1, and 2. Notice that a large variance implies that the random variable is more likely to take on values far from the mean. As the variance shrinks to
zero, the probability density becomes a delta function and the random
variable approaches a deterministic value.
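The unit area, mean, and variance of the normal density can also be confirmed numerically. The following is a minimal sketch using a midpoint rule for a mean of one and the three variances quoted for Figure 4.1; the function names, integration width, and grid size are illustrative choices.

```python
import math

def normal_density(x, m, var):
    """Scalar normal density with mean m and variance var."""
    return math.exp(-0.5 * (x - m) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def moments(m, var, half_width=12.0, n=48000):
    """Midpoint-rule estimates of the area, mean, and variance."""
    sd = math.sqrt(var)
    a = m - half_width * sd              # integrate over m +/- 12 standard deviations
    h = 2.0 * half_width * sd / n
    area = mean = second = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        w = normal_density(x, m, var) * h
        area += w
        mean += x * w
        second += (x - m) ** 2 * w
    return area, mean, second

# Mean of one and variances 1/2, 1, 2, as in the figure description.
results = {var: moments(1.0, var) for var in (0.5, 1.0, 2.0)}
```

Each entry of `results` should be close to `(1, 1, var)`, which is the numerical counterpart of the integrals evaluated analytically in the text.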
Characteristic function. It is often convenient to handle the algebra
of density functions, particularly normal densities, by using a close
Wk(t) = f(eit)
27T
= (e ita =
(4.7)
gives
(Pn(t) =
e
e itX1
wn(t) =
(Xl ) dX1
(x2)dx2
oo
(4.8)
We next compute the characteristic function of the normal distribution.
show the
characteristic
= exp
itm
t 2
1
00
integration to
of
variable
Changingthe
itm
z = x m gives
27T
COStzdz
e -(1/2)z2/2
itm
2TT2
itmt22/2
definite integral
2a
the
gral. Note also that the integral with
In applications we usually do not have a single random variable but a
collection of them. We group these variables together in a vector and let
random variable $\xi$ now take on values in $R^n$. Proceeding analogously
to the single variable case, the JOINT DISTRIBUTION FUNCTION $F_\xi(x)$ is
defined so that
$$F_\xi(x) = \Pr(\xi \le x)$$
in which the vector inequality is defined to be the $n$ corresponding
scalar inequalities for the components. Note that $F_\xi(x)$ remains a
scalar-valued function taking values in the interval $[0, 1]$.
Density Functions
Multivariate
4.3
357
variable case, we
single
the
define the
as in
JOINT
DENSITY
FUNCTION,
derivatives exist,
providedthe
or,
(4.9)
the scalar case, the probability that the
on values between a and b is n-dimensionalrandom
givenby
variable takes
b) =
Pr(a
dxn
al
...dxn
J f
...dxn
...dxn
J J
cov,n)
is definedas
(n - (n)))
components
4, i = 1
n is defined as
Cij = cov(Ei, 9)
varl)
cov2,
covl,
var2)
cov(En, E2)
covl, 41)
cov2,
varn)
358
EstithQti0h
---f
product
x TX,
Which is
two
of random variables, e
and e Rm. We can consider vectors
the joint
distributionof both of these random variables
(x, y) or we
may
onlybe interested in the variables, in which case we can
integrate
out
the m variablesto obtain the marginal density of
dym
we use
dxn
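The multivariate normal density for the random variable $\xi \in R^n$ is
$$p_\xi(x) = \frac{1}{(2\pi)^{n/2}(\det P)^{1/2}} \exp\left[-\frac{1}{2}(x-m)^T P^{-1}(x-m)\right] \qquad (4.12)$$
in which $m \in R^n$ is the mean and $P \in R^{n\times n}$ is a real, symmetric, positive definite matrix. We
show subsequently that $P$ is the covariance matrix of $\xi$. The notation
$\det P$ denotes the determinant of $P$. The multivariate normal density
is well defined only for $P > 0$. The singular, or degenerate, case $P \ge 0$ is
discussed subsequently. Shorthand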
variable having a
normaldistribution
for the random P is
covariance
and
we
also
find it convenient
P)
QCx --
that we can
so
P
variance
(x m)
(4.13)
3.5
2.5
2.5 4.0
normalare
variate
(x m) TP 1(x m)
360
0.6
0.4
0.2
2. The contour
x TAx = b
Avi
bA22
bA11
Figure 4.3: The
geometry
= b.
eStith
MultivariateDensity Functions
36)
axes and the eigenvalues scale the
that is tangent to the lengths. The
ellipse
desof the box
ellipse are
lengthsof
si
the
to the
The mean and covariance
4.2:
of the
mple
variable
is distributed multivariate
the random
normallyas normal
in (4.12)
exp
E (x
withA e
1
z T A -I z dz
1
z T A -I z dz
(scalar)
zz T exp
z T A -Iz dz
(n-vector)
(27T)n12 (detA) 112A
(n x Il-matrix)
(4.14)
(4.15)
(4.16)
random variable g.
Solution
A = QAQT
A-I = QA IQT
of integration
exp
1
xTA-I x
IdetQl dx
2
T -1 z dz
Iz A
exp
XL2/i]dx
x /dxt
2Tr
(27TH/ 2
i=l
zexp zTA Iz dz = Q
1
x exp xTA Ix dx
[2
xie-B Xf/ L
dxi
eXi/kdXk = 0
Thisintegralvanishes
because of the first term in the product
Sincethe integral
vanishes for each element i, the vector of integrals is
therefore zero.
4.3
nation
zz T exp
of integration
in (4.16)
-- zTA-I z dz =
2
1
xx T exp
xTA-I x det
Qdx QT
2
xxT exp
= QVQT (4.17)
xjex;
dxj
e-Xklkdxk
Vii =
1
zT A I z dz = QVQ T
2
probability give
the
Using
the mean
2.
of
definition
xp; (x)dx
1
()
x exp
-00
1
2
Changing the variable of integration to $z = x - m$ gives
(m + z) exp zTp-1z dz
2
in which the integral with $m$ produces unity by (4.14) and the
integral involving $z$ vanishes because the integrand is odd.
Next using the probability density of the multivariate normal, the
definition of the covariance, and changing the variable of integration give
(x)dx
(x ()) (x
1
(x
1
zz T exp
1
(2n)
dx
1
z TP -1 z dz
2
(detP) l /2P
Characteristic
functionofan functionof multivariate
as
m)
Il-dimensional
density. The
characteristic
multivariate
random variable
sz,is defined
WE(t) =
eitTx
00
p (x)dx
43
Multivaria
te Density
t is now
Functions
365
is
it T x
over to
t vector
cp(tx,ty)
exp i t tl
t}
x
Y
y)dxdy
ty)
x
exp i 0 tT
eityy
(x,y)dxdy
Qn(ty)
4.3: Characteristic function of the multivariate normal
Example
that the characteristic function of the multivariatenormal
Show
N(m,P)is given by
Solution
(xm)
dx
Changing
the variable of integration to z = x m gives
e itTm
(2TT)n/2(detP)1/2
00
---+1
det(Q)
that
after noting
lin gives
div
n 00
J j dw
I] F 2Aje
exp ( (1/2)
=
to
in which we used (4.95)
b2J)
b22t
gives
t TQAQ Tt = t TPt
(2 T) n/2(detp)
(4.18)
and
are jointly,normally
covariance
x
rameters
PX
Pxy
Pyx Py
(X)
n/2 (detP)1/2
(2TT)
PX pxy -1
exp
exp i
Y] m
- (112) tx
ty
ty
= exp i [tl
-(112)
e (112)
when deriving
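4.3.2 Functions of random variables
In many applications we need to know how the density of a random
variable is related to the density of a function of that random variable.
Let $f : R^n \to R^n$ be a mapping of the random variable $\xi$ into the random
variable $\eta$, and assume that the inverse mapping also exists
$$\eta = f(\xi), \qquad \xi = f^{-1}(\eta)$$
Given the density of $\xi$, $p_\xi(x)$, we wish to compute the density of $\eta$,
$p_\eta(y)$, induced by the function $f$. Let $X$ denote an arbitrary region of
the field of the random variable $\xi$, and define the set $Y$ as the transform
of this set under the function $f$
$$Y = \{y \mid y = f(x),\ x \in X\}$$
Then we seek a function $p_\eta(y)$ such that
$$\int_X p_\xi(x)\,dx = \int_Y p_\eta(y)\,dy \qquad (4.20)$$
for every admissible set $X$. Using the rules of calculus for transforming
a variable of integration we can write
$$\int_X p_\xi(x)\,dx = \int_Y p_\xi(f^{-1}(y)) \left|\det \frac{\partial f^{-1}(y)}{\partial y}\right| dy \qquad (4.21)$$
in which $|\det(\partial f^{-1}(y)/\partial y)|$ is the absolute value of the determinant of the Jacobian matrix of the transformation from $\eta$ to $\xi$.
Subtracting (4.21) from (4.20) gives
$$\int_Y \left(p_\eta(y) - p_\xi(f^{-1}(y))\left|\det \frac{\partial f^{-1}(y)}{\partial y}\right|\right) dy = 0 \qquad (4.22)$$
for every admissible set $Y$. If the integrand is continuous, it must therefore vanish everywhere (otherwise a contradiction is immediate), which gives
$$p_\eta(y) = p_\xi(f^{-1}(y))\left|\det \frac{\partial f^{-1}(y)}{\partial y}\right| \qquad (4.23)$$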
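The transformation rule for densities can be checked numerically in the scalar case. The sketch below assumes the cubic map $\eta = \xi^3$ with $\xi \sim N(m, \sigma^2)$; the values $m = 2$, $\sigma = 0.5$, the interval $[1, 27]$, and all function names are illustrative assumptions. It integrates the transformed density over the interval and compares the result with the probability that $\xi$ lies in the preimage $[1, 3]$.

```python
import math

def normal_pdf(x, m, sigma):
    return math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, m, sigma):
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

def cube_density(y, m, sigma):
    """Density of eta = xi**3 from the change-of-variable rule (y != 0)."""
    x = math.copysign(abs(y) ** (1.0 / 3.0), y)      # inverse map f^{-1}(y)
    jacobian = 1.0 / (3.0 * abs(y) ** (2.0 / 3.0))   # |d f^{-1}(y) / dy|
    return normal_pdf(x, m, sigma) * jacobian

def midpoint_integral(g, a, b, n=200000):
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

m, sigma = 2.0, 0.5          # assumed parameters for illustration
a, b = 1.0, 27.0             # interval for eta; the preimage for xi is [1, 3]

prob_eta = midpoint_integral(lambda y: cube_density(y, m, sigma), a, b)
prob_xi = normal_cdf(3.0, m, sigma) - normal_cdf(1.0, m, sigma)
```

The two probabilities agree to within the quadrature error, which is the integral statement that the transformed density is required to satisfy.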
probability density.
Multivariate
4.3
d tilen
Density
Functions
369
Nonlinear transformation
4.5:
function of the random variable under the
transfornormally distributed
N(m,
02).
for
f %ioll
defl
pn(y) 3 21T
2/3 exp (
= n1/3.Takingthe
m)2/2)
We
= fk)
to find pn
vector
bythe
inequality
X(c) = {xlf(x)
c}
Fn(y) -
then satisfies
p; (x)dx
(4.24)
Pn(y)=
(y)
(x)dx +
(y) J
(x)dx
X(c)
X(c) for y =
c.
Solution
c is
p; (Xl ,
)dX1 dX2
which has a clear physical interpretation. It says the probability that
the maximum of two independent random variables is less than some
value is equal to the probability that both random variables are less
than that value. To obtain the density, we differentiate
$$p_\eta(y) = p_{\xi_1}(y)\int_{-\infty}^{y} p_{\xi_2}(x)\,dx + p_{\xi_2}(y)\int_{-\infty}^{y} p_{\xi_1}(x)\,dx$$
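A quick sampling experiment illustrates the statement about the maximum of two independent random variables. The sketch assumes both variables are uniform on $[0, 1]$, so each distribution function is $F(c) = c$ and the product rule predicts $\Pr(\max \le c) = c^2$; the seed, sample size, and threshold are arbitrary choices.

```python
import random

random.seed(0)               # fixed seed so the run is repeatable
n_samples = 200_000
c = 0.7                      # threshold at which the distribution is evaluated

count = 0
for _ in range(n_samples):
    # Two independent U(0, 1) draws; keep the larger one.
    if max(random.random(), random.random()) <= c:
        count += 1

empirical_cdf = count / n_samples
product_cdf = c * c          # F1(c) * F2(c) for two U(0, 1) variables
```

Differentiating $c^2$ gives the density $2c$, which matches the displayed formula with both marginal densities equal to one on the unit interval.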
y. Bythe definitions
of joint
distribution, these events have probabilities:and marginal
x and
Aroility
Pr(31n 6B)=
all x, y
are
UNCORRELATEDif
(4.26)
randomvariables,
cov(E, n) = 0
solution
0.5
0.25
-1
uncorrelated random
density function for the two
Figure 4.5: A joint
variables in Example 4.8.
and
indepen-
dent?
uncorrelated?
and uncorrelated?
Solution
density produces
Pn(y) = E, Iyl<l
performing the
term gives
expectationof
xy +
Y2)dxdy
-1
and the
covariance of
and
cov(E,n) =
the product
is therefore
and $\xi$ and $\eta$ are uncorrelated.
(c) We know that independent implies uncorrelated. This example
does not contradict that relationship. This example shows that uncorrelated does not imply independent.
Px Pxy
my ' Pyx
so
can
the density
be wTitten
1
(nx+ny)(detpx detPy
(2TT)2
eXP
For anyjoint
know that
normal,we
so we have
2 S?
py
(4.27)
x TPx I x
exp
exp y TPy 1y
and combining
product
the
Forming
S,
(detPx)1/2
= (27T)nx/2
pn(y) =
-1
terms gives
(27T)(nx+ny)(detpx detPy
exp
that
diagonalmatrix,we have shown
dent.
4.4 Sampling
Let scalar random variable $\xi$ have density $p_\xi$ with mean $m$ and variance
$P$, and consider $n$ independent samples of $\xi$, denoted $x_1, x_2, \ldots, x_n$. By
independent samples, we mean that the joint density of the samples is
the product of the marginals, which all are identical and equal to $p_\xi$
(Zl,
,zn) =
ampling
5
4
4,
375
Transformation
Linear
4.4.1
about the linear transformations
facts
following
of random
random variable e
variRn with density
variance of random variable
mean and
= A;
var(n) = Avar)AT
(4.28)
have
tation,we
that
(Ax (Ax))(Ax
AJ
(x (x))(x
= Avar)A T
With normals, we often wish to check if the variance is positive definite after a linear transformation. Let $P \in R^{n\times n}$ be positive definite
and $A \in R^{m\times n}$ be an arbitrary matrix. The following result is often
useful: $P > 0$ and $A$'s rows linearly independent $\implies APA^T > 0$. See
also statement 5 in Section 1.4.4.
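A small worked case makes the row-independence condition concrete. The matrices below are illustrative choices, not taken from the text: $P$ is a $3 \times 3$ positive definite matrix, `A_indep` has two independent rows, and `A_dep` has a second row equal to twice the first.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def is_positive_definite_2x2(s):
    """Leading-principal-minor test for a symmetric 2 x 2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return s[0][0] > 0 and det > 0

P = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]                      # leading minors 2, 3, 4, so P > 0

A_indep = [[1, 0, 1],
           [0, 1, 1]]                # rows linearly independent
A_dep = [[1, 0, 1],
         [2, 0, 2]]                  # second row is twice the first

S_indep = matmul(matmul(A_indep, P), transpose(A_indep))
S_dep = matmul(matmul(A_dep, P), transpose(A_dep))
```

With independent rows the transformed covariance stays positive definite; with dependent rows it drops to a singular, positive semidefinite matrix, which is the situation that leads to the singular normal.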
Toseehow the singular normal arises, let the scalar random variable
bedistributed normally with zero mean and positive definitecovari-
p(x)
exp
+
1/2(27.2xf
+ 73.8x;
0.75
0.5
0.25
ance,
nents
and
to ex
theappe
*stcompute
the
sampling
density
377
of random variables l,
e Rn2,is
denoted
(21T) n1 / 2 (detA1)1/2
p; (Xl,X2)
exp --(XI
In this limit,
-ml)
(4.29)
becomes
(2TT)n1/2 (detA1)1/2eXP
Q(XI
ml)
= (X2nt2)
domvariable
N(O,
andobtain
Py = QAQT
1 1-1 -1
1
pn(y)
singular normal
The
4.7:
Figure
deficientA.
variable transformation
invertible
the
Nextwe define
= QTn
the covariance
and we can write
of C, Pz, as
(Z2)
this q
samplesincreases,
using
1
(Yl + Y2)
Y2)
ofmdomvariab
the differe
and
towar
Ofsqu
44
sampling
unit variance
=
to the
379
res
Tile
anddefer
Theorem 4.12 (Normal distributions under linear transformation). Consider a normally distributed random variable $\xi \in R^n$, $\xi \sim N(m_x, P_x)$, with semidefinite covariance $P_x \ge 0$ and an arbitrary linear transformation $A \in R^{m\times n}$ and transformed random variable $\eta = A\xi$. Then $\eta \sim N(m_y, P_y)$ with $m_y = Am_x$ and $P_y = AP_xA^T \ge 0$.
'E(Xi) =
=m
Probability,
380
can
Rn)2,which
-- Sit)
Random
as follows
rearranged
be
Sn
m))
2)
2 2(Xi
--
2n(Rn --
2
+ n(kn m)
2
2 n(Rn ---m)
expectation
Takingthe
gives
(Sn) =
var(x ---nvar(Rn)
var(Rn)= Avar(X)A
nP
45
central
as sti
Limit Theorems
381
Sn/(n 1) to obtain
Sn
OF THE MEANis
therefore
SE(Rn) =
. . . + Xn
Px(x') =
U (O,I), which
means
0 otherwise
(4.30)
var(x) =
(x
(1/2)) 2dx 1
12
X10
random variable as
Y = Ax
A = [1 |
are given by
var(y) = Avar(x)AT 5
=
6
With only 10 random variables, the sum $y$ is expected to be approximately distributed as $N(5, 5/6)$. The random variables $x$ and $y$ are shown in Figures 4.8 and 4.9: even 10 uniformly distributed $x$ produce nearly a normal distribution for their sum $y$.
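The claim about the sum of 10 uniform variables is easy to reproduce by sampling. This is a minimal sketch, with the seed and sample count chosen arbitrarily; it compares the sample mean and variance with 5 and 5/6, and the fraction of samples within one standard deviation of the mean with the normal value of about 0.683.

```python
import math
import random
import statistics

random.seed(1)                       # fixed seed for repeatability
n_terms = 10                         # number of U(0, 1) variables in each sum
n_samples = 100_000

sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]

sample_mean = statistics.fmean(sums)          # theory: 10 * (1/2) = 5
sample_var = statistics.pvariance(sums)       # theory: 10 * (1/12) = 5/6

sd = math.sqrt(5.0 / 6.0)
frac_within_one_sd = sum(abs(y - 5.0) <= sd for y in sums) / n_samples
```

The small remaining gap between `frac_within_one_sd` and the normal value reflects that 10 terms give a close but not exact normal distribution.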
i=l
n
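$$\mathrm{var}(S_n) = \sum_{i=1}^{n} \mathrm{var}(X_i) = n\sigma^2$$
Since we want to take the limit as $n \to \infty$, we first rescale the sum
to keep the mean and variance finite. Given the formulas for shifting
mean and variance we choose $Z_n = (S_n - n\mu)/(\sqrt{n}\,\sigma)$
and obtain
$$\mathcal{E}(Z_n) = 0 \qquad \mathrm{var}(Z_n) = \frac{1}{n\sigma^2}\mathrm{var}(S_n) = 1$$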
,n be independent
and
characteristic
functions to establish this result. Weshallfinduseful
of
thefollowing
bound on the error in the Taylor series approximation
(ix)tn
m=0
(4.31)
[Figures 4.8 and 4.9: histograms of the samples of x (counts n_x) and of their sum y (counts n_y).]
(4.32)
denotes
that
the
size
o(lx13)
of the error
13.
time
we first show
term in (4.31)
some constant
that the
is
characteristic
t variance.
for Yi's characteristic function
and obtain
a series
tpansion
(x)dx
(1 + itx (1/2)t2x2 + Ix13
= 1 + i(Yi) +
- 1- (1/2)t 2 + 0(lt1 3)
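Notice that here we have assumed $\mathcal{E}(|Y_i|^3)$ is finite, so that it can be
absorbed into the $O(|t|^3)$ term. Next, since $Z_n = (1/\sqrt{n})\sum_{i=1}^{n} Y_i$, we
have from (4.7) and (4.8) that
$$\varphi_{Z_n}(t) = \left(\varphi_{Y}(t/\sqrt{n})\right)^n = \left(1 - (1/2)t^2/n + O(|t|^3/n^{3/2})\right)^n$$
In taking the limit as $n \to \infty$, the $O(|t|^3/n^{3/2})$ term can be dropped to obtain
$$\lim_{n\to\infty}\varphi_{Z_n}(t) = \lim_{n\to\infty}\left(1 - (1/2)t^2/n\right)^n$$
Using the calculus result that $\lim_{x\to 0}(1 + ax)^{1/x} = e^a$ with $n = 1/x$ gives
$$\lim_{n\to\infty}\varphi_{Z_n}(t) = e^{-(1/2)t^2} \qquad \lim_{n\to\infty} Z_n \sim N(0, 1)$$
and the result is established.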
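The limit of the characteristic function can be observed directly for a specific distribution. The sketch below assumes $Y$ is uniform on $[-\sqrt{3}, \sqrt{3}]$, which has zero mean and unit variance and characteristic function $\sin(\sqrt{3}t)/(\sqrt{3}t)$; the choice of distribution, the value $t = 1.5$, and the function names are illustrative.

```python
import math

def phi_y(t):
    """Characteristic function of Y uniform on [-sqrt(3), sqrt(3)]."""
    s = math.sqrt(3.0) * t
    return 1.0 if s == 0.0 else math.sin(s) / s

def phi_zn(t, n):
    """Characteristic function of Z_n = (1/sqrt(n)) * sum of n independent Y_i."""
    return phi_y(t / math.sqrt(n)) ** n

t = 1.5
target = math.exp(-0.5 * t * t)      # characteristic function of N(0, 1) at t

# Gap between the finite-n characteristic function and the normal limit.
errors = {n: abs(phi_zn(t, n) - target) for n in (1, 10, 100, 1000, 10000)}
```

The entries of `errors` shrink as `n` grows, which is the numerical counterpart of dropping the higher-order term in the limit.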
,0fO
4.5.2 Random variables with different distributions
The central limit theorem of de Moivre and Laplace is already a spectacular
mathematical result. But as it stands, it is not a compelling reason
to assume that unmodeled noise in a physical system would be well
represented by a normal distribution. After all, how would we deduce
that some unmodeled random effect in a physical system is the result
of many different independent random causes, all of which have identical distributions?
the
lion to
Assumption 4.15 (Lindeberg conditions). Consider independent random variables $X_i$, $i = 1, 2, \ldots, n$ satisfying $\mathcal{E}(X_i) = 0$ and $\mathrm{var}(X_i) = \sigma_i^2$
and let $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$. The following two conditions hold as $n \to \infty$
(a) $s_n \to \infty$
ago
CLTI
f(x 2;
nt
variab
random
flu
> a) =
central
4.5
Limit Theorems
theorem
limit
theorem).
satisfying Assumption
4.15. The
With
denoting
consider in-
and
normalized
sum Z
=0
45.3
result.
following
definite.Wehave
the
Zn = (l/vl)
distribution
in
to
the
normal
converges
N(O,E).
u)
(Xt-
(b)Forevery > 0,
Thenthesum Zn =
See van der Vaart (1998, pp. 20-21) for further discussion of this
case. Theorem 4.18 is the mathematical basis for the common physical
assumption that noise in process measurements is often well modeled
by a zero mean
(2
fid
EstihQti0h
often
can b
mined by examining samples of the measurement, which is an important
part of the process modeling task that is often overlooked.
Finally, the history of the term "central limit theorem" is also interesting.
Apparently coined by Pólya in 1920 (in German
z
entrap;
eac
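Let $\xi$ and $\eta$ be jointly distributed random variables with probability density $p_{\xi,\eta}(x, y)$. We seek the density function of $\xi$ given that a specific realization $y$ of $\eta$ has been observed. We define the conditional density function as
$$p_{\xi|\eta}(x|y) = \frac{p_{\xi,\eta}(x, y)}{p_\eta(y)}$$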
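The definition of the conditional density can be exercised on a single roll of a fair die. In the sketch below, `xi` records whether the outcome is even (`'E'`) or odd (`'O'`) and `eta` is the integer face value; the assignment of names to the two variables and the helper functions are assumptions made for illustration.

```python
from fractions import Fraction

# Joint density of (xi, eta): each face has probability 1/6 and fixes the parity.
joint = {(('E' if face % 2 == 0 else 'O'), face): Fraction(1, 6)
         for face in range(1, 7)}

def marginal_xi(x):
    """Sum the joint density over all face values."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_eta(y):
    """Sum the joint density over both parities."""
    return sum(p for (_, eta), p in joint.items() if eta == y)

def cond_xi_given_eta(x, y):
    """Conditional density of the parity given the observed face value."""
    return joint.get((x, y), Fraction(0)) / marginal_eta(y)

def cond_eta_given_xi(y, x):
    """Conditional density of the face value given the observed parity."""
    return joint.get((x, y), Fraction(0)) / marginal_xi(x)
```

For example, `cond_xi_given_eta('E', 2)` evaluates to 1 because a face value of 2 determines the parity, while `cond_eta_given_xi(1, 'O')` evaluates to 1/3 because three odd faces are equally likely.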
Weexp
Considerarollofa single
die in which
whethertheoutcome
is
even or odd takes on values E or
die. The12
and is
O to
valuesof thejoint
density function the integer value denote
of the
are simply
computed
1/6
p,o (3, E)
1/6
pe,o (5, E)
p,o (6,
E)
1/6
Themarginal
densitiesare
then easily
whichgives
by
g across
p(x)
Pm;(011)
1/6
(4.33)
tecondional
density
This
factleads to a
computed;
rows of
(4.33)
we have
biddensity,
which
for
con
we have
SIOjlarlY
389
for n
pn(y)
summing
givesby
112
= 116
pn(011)=
or
Because
anddeduce
pn(y)
390
my ' Pyx Py
are
$$m = m_x + P_{xy}P_y^{-1}(y - m_y) \qquad (4.35)$$
$$P = P_x - P_{xy}P_y^{-1}P_{yx} \qquad (4.36)$$
Solution
Thedefinitionof
conditional
density gives
PEIQ(xly) = PE,t7(x, y)
Because(E,7)is
jointly normal,
PEIQ(xly) =
in which
the
from Example
4.4
and therefore
Substitutingin
the
we know
definitionof
(2n)n/2
argument of
det
the
x
the normal
xy
density from
(4.13) gives
1/2 exp
(1/2)a)
(4.37)
exponent is
xy -1
(4.38)
in
Theorem
P PX pxyPIPyx as defined
use
If
in (4.36)
inversion formula
artitjonedmatrix
to express
then we
can use3thle
the matrix
{4.38)as
inversein
Pxy
-PilPyxp-1
Pyx
substituting this
multiplying
out terms
yields
P-1 [(x -
mx)
PX Pxy
detpy detp
Y ' my
or
= n(x,m,P)
(4.39)
= n(x,m,P)
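The conditional mean and covariance formulas for jointly normal variables can be verified numerically in the scalar case, where the matrices reduce to numbers. The parameter values below are arbitrary illustrative choices; the check compares the ratio of the joint density to the marginal density of $y$ with a normal density in $x$ having the conditional mean and variance.

```python
import math

def normal_pdf(x, m, var):
    return math.exp(-0.5 * (x - m) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def joint_pdf(x, y, mx, my, px, pxy, py):
    """Bivariate normal density with covariance [[px, pxy], [pxy, py]]."""
    det = px * py - pxy * pxy
    dx, dy = x - mx, y - my
    quad = (py * dx * dx - 2.0 * pxy * dx * dy + px * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Illustrative parameters (assumptions, not from the text).
mx, my = 1.0, -2.0
px, pxy, py = 2.0, 0.6, 0.5
y_obs = -1.2

# Scalar versions of the conditional mean and variance formulas.
m_cond = mx + pxy / py * (y_obs - my)
p_cond = px - pxy * pxy / py

max_gap = max(
    abs(joint_pdf(x, y_obs, mx, my, px, pxy, py) / normal_pdf(y_obs, my, py)
        - normal_pdf(x, m_cond, p_cond))
    for x in (-1.0, 0.5, 2.0, 3.5)
)
```

Note that the conditional variance does not depend on the observed value, and it is never larger than the prior variance `px`.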
Example
4.20: More normal conditional densities
Letthe joint conditional
distribution
with the following mean and variance
ma
(4.40)
J get;
the
showthat
with mean
conditional
density of A
c) = n(a, m, P)
given
and variance
by
Solution
b,c) pc(c)
Pc(c)
or
PBlc(bIc)
a ma
Pa Pab
'
b
' Pba Pb
n(b, mb,Pb)
m = ma + PabPJl (b
and the result is
= n(a, m, P)
nib)
P = Pa PabPJ lPba
established.
4.7
Maximum-Likelihood
Estimation
we now
turn to one
of the most
terminemodel
basic problems
in modeling: how to deparameters
methodsto
from
experimental measurement. Finding
solveparameter
estimation problems
has had a significant
41
Maximum-Likelihood Estimation
development of
mathematics
acton the
393
estimated.
problems.
determining
confidence intervals. The first five estimationproblems
andthe
nUmerical
solution for both the optimal parameter estimate
measurement
error covariance estimate.
394
Probability, Random
Variables
, and
Esti hQ
Measurement
Variance
ith,
(27T)n/2nexp
In 27T + n Ino +
2
the
in a vector
and
giving
1
Inp(ylO, a) = U ln2TT+ n Ina + G(y
-XO)
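We define the log of the likelihood as a function of the parameters
$\theta$ and $\sigma$ with the data $y$ regarded as fixed values
$$L(\theta, \sigma) = -\frac{n}{2}\ln 2\pi - n\ln\sigma - \frac{1}{2\sigma^2}(y - X\theta)^T(y - X\theta) \qquad (4.43)$$
Because we assume that we know the measurement error variance $\sigma^2$,
the only unknown in this first estimation problem is $\theta$. Therefore, to
find the maximum-likelihood estimate, we maximize $L(\theta, \sigma)$ by differentiating with respect to $\theta$ and set the result to zero
$$\frac{\partial L}{\partial \theta} = \frac{1}{\sigma^2}X^T(y - X\theta) \qquad 0 = \frac{1}{\sigma^2}X^T(y - X\hat\theta) \qquad (4.44)$$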
$$\hat\theta = (X^TX)^{-1}X^Ty$$
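For a straight-line model the least-squares estimate can be written out by hand. The sketch below assumes $X$ has a column of ones and a column of $x$ values, so that $X^TX$ is $2 \times 2$ and the normal equations can be solved in closed form; the data values and the function name are illustrative.

```python
def least_squares_line(xs, ys):
    """Solve (X^T X) theta = X^T y for X = [1, x] using the 2 x 2 inverse."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx                 # det(X^T X)
    theta0 = (sxx * sy - sx * sxy) / det    # intercept
    theta1 = (n * sxy - sx * sy) / det      # slope
    return theta0, theta1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]

# Noise-free data generated from theta = (1, 2) should be recovered exactly.
exact = least_squares_line(xs, [1.0 + 2.0 * x for x in xs])

# With noisy data the residual must be orthogonal to the columns of X.
ys_noisy = [1.1, 2.9, 5.2, 6.8, 9.1]
t0, t1 = least_squares_line(xs, ys_noisy)
residuals = [y - (t0 + t1 * x) for x, y in zip(xs, ys_noisy)]
normal_eq = (sum(residuals), sum(x * r for x, r in zip(xs, residuals)))
```

The pair `normal_eq` is $X^T(y - X\hat\theta)$, which the stationarity condition requires to vanish. For larger problems a factorization-based solver is usually preferable to forming the inverse explicitly.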
1.6(b).
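Probability density of parameters and parameter confidence interval. The next item of interest is the probability density of the parameter estimates. Let $\theta_0$ be the parameter generating the measurements, so the model is $y = X\theta_0 + e$. Then we have
$$\hat\theta = (X^TX)^{-1}X^Ty = \theta_0 + (X^TX)^{-1}X^Te \qquad e \sim N(0, \sigma^2 I_n)$$
and the result on linear transformation of a normal gives
$$\hat\theta \sim N(\theta_0, \sigma^2(X^TX)^{-1}) \qquad (4.46)$$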
(x m) TP 1(x m) < b
is given by
y(np/2,b/2)
r(np/2)
(Abramowitz
and Stegun, 1970, p.255-260)
F(np) =
y(np,x) -
tnp-le-tdt
for
defining
into the equation
(x --
the ellipse
m)
x2(np, cx)
mean and
in the values of the region covariancegives
substituting
for the
Finally,
elliptical confidence
maximum.
a-level
the following
likelihoodestimate
xTx
T
( 00)
(O 00)
x 2(np, a)
(4.47)
the
bersometo present. In
box
bounding
that
smallest
the
contains
regionwith
the
confidence
4.15, this box is given by
ellipse. As shown in
Exercise
|-oo/
1/2
(x 2(np,
= (x2(np,
1/2
0(-levelconfidence levels
compute
on each of the
then
np unigiving
normals
O = 00
srariate
bich
= (X 2 (1,
that the
and
boundmultivariate
0(-level
true
confidence
region;the secfor the
of the cx-level
tervalin a
The important point is to know and communicate what you are reporting.
The bounding box certainly contains more than the $\alpha$-level probability
since it contains the true region in its interior.
The marginal box does not have this property. The interpretation of
the marginal box is the same as the interpretation of any marginal density.
If you obtained many samples of the parameter estimates from
many datasets, the ith interval of the marginal box would contain an
$\alpha$-level fraction of all the different samples of the ith parameter estimate.
No statement about the probability of the jointly distributed parameter
estimate follows from this characterization. We include the following
example to help clarify these distinctions.
Example 4.21: The confidence region, bounding box, and marginal box
Assume that the two-dimensional random variable is distributed as $N(m, P)$ with
them
- m)
= 5.99
4.7
Maximum-Likelihood Estimation
(b) The
399
(1/2)
and
are
shown
respectively. The in the bottom right
intervalfor the
$\chi^2(1, 0.95) = 3.84$: marginal intervals $x_1 \in [-1.77, 3.77]$, $x_2 \in [0.614, 3.39]$
$\chi^2(2, 0.95) = 5.99$: bounding box intervals $x_1 \in [-2.46, 4.46]$, $x_2 \in [0.269, 3.73]$
ellipse = 0.956
bounding box = 0.981 marginal box = 0.920
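The three regions can be compared by sampling. In the sketch below, the mean $(1, 2)$ and the diagonal entries $P_{11} = 2$, $P_{22} = 0.5$ are inferred from the printed intervals, while the off-diagonal entry $P_{12} = 0.75$ is an assumption made only for illustration, so the fractions will not reproduce the quoted values exactly.

```python
import math
import random

random.seed(2)
m = (1.0, 2.0)
p11, p12, p22 = 2.0, 0.75, 0.5       # p12 is an assumed value
chi2_2, chi2_1 = 5.99, 3.84          # 95% chi-squared values quoted in the text

# Cholesky factor of P, used to turn independent N(0,1) draws into N(0, P) draws.
l11 = math.sqrt(p11)
l21 = p12 / l11
l22 = math.sqrt(p22 - l21 * l21)
det = p11 * p22 - p12 * p12

n = 100_000
in_ellipse = in_bbox = in_mbox = 0
for _ in range(n):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    d1 = l11 * z1                    # deviation x1 - m1
    d2 = l21 * z1 + l22 * z2         # deviation x2 - m2
    quad = (p22 * d1 * d1 - 2.0 * p12 * d1 * d2 + p11 * d2 * d2) / det
    in_ellipse += quad <= chi2_2
    in_bbox += (abs(d1) <= math.sqrt(chi2_2 * p11)
                and abs(d2) <= math.sqrt(chi2_2 * p22))
    in_mbox += (abs(d1) <= math.sqrt(chi2_1 * p11)
                and abs(d2) <= math.sqrt(chi2_1 * p22))

ellipse_frac, bbox_frac, mbox_frac = in_ellipse / n, in_bbox / n, in_mbox / n
```

The ellipse captures about 95% of the samples, the bounding box captures more because it contains the ellipse, and the marginal box captures less than 95% of the jointly distributed samples.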
N(O, 0 2)
400
differentiating(4.43)
L(O,(T)
I xT(y -XO)
L(0, (T) = _ Y + -3 (y
to
Equatingthe derivatives
XO)
(xTx)-1xTy
1
= (y
n
(4.48)
-XO) (4.49)
the square
the samples.
not equal to
n np
2
n np
We show subsequently that the sample variance is an unbiased estimate
of $\sigma^2$ so the maximum-likelihood estimate of $\sigma^2$ is biased. But this
bias is small for a large number of samples compared to parameters
$n \gg n_p$.
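The bias of the maximum-likelihood variance estimate shows up clearly in a small simulation. The sketch assumes the simplest linear model, a constant mean with $n_p = 1$, with $n = 5$ samples and a true variance of 4; these values, the seed, and the number of trials are illustrative choices.

```python
import random

random.seed(3)
n = 5                    # samples per dataset
n_p = 1                  # one parameter: the mean
sigma2 = 4.0             # true measurement variance
trials = 20_000

ml_sum = s2_sum = 0.0
for _ in range(trials):
    y = [random.gauss(10.0, sigma2 ** 0.5) for _ in range(n)]
    ybar = sum(y) / n                          # least-squares estimate of the mean
    ss = sum((v - ybar) ** 2 for v in y)       # sum of squared residuals
    ml_sum += ss / n                           # maximum-likelihood estimate
    s2_sum += ss / (n - n_p)                   # sample variance

ml_mean = ml_sum / trials     # expected value sigma2 * (n - n_p) / n = 3.2
s2_mean = s2_sum / trials     # expected value sigma2 = 4
```

Raising `n` while holding `n_p` fixed brings the two averages together, consistent with the bias being small when there are many more samples than parameters.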
Given the same result for $\hat\theta$ as in the previous problem, the probability density of $\hat\theta$ is unchanged from the previous problem. We next
determine the probability density of $\hat\sigma^2$. For this it is convenient to
first consider the singular value decomposition of the $X$ matrix. We
assume that this $n \times n_p$ matrix has independent columns so the rank
is $n_p$. As discussed in Chapter 1, a real $n \times n_p$ matrix with independent
columns can be written as the product of orthogonal $n \times n$ matrix $U$
and orthogonal $n_p \times n_p$ matrix $V$, and diagonal $n_p \times n_p$ matrix $\Sigma$
x = [UI U2
0
relationships result
the
ill
U! UI
40)
from
orthogonality
In np
UITU2
0 npxnn
UIUIT + U2U2T
= 1n
VT
Onnpxnp
V V VT
value
singular
decomposition
Usingthe
(SVD)for X,
we find by
(x Tx) -1 x T =
1
substi-
Vs -I UT
= UIUT
= (J2U2
Y -XO = U2UTe
(4.50)
- XO) = eT U2U2Te
These relations provide an essential insight. The error $e$ obviously affects both quadratic terms, but its effect in the sum of the squares of
the residual (the sample variance) is through $U_2$ and its effect in the
parameter estimate's distance from the true value is through $U_1$. Because these two matrices are orthogonal to each other, the effect of
the measurement error is independently distributed in these two quadratic terms. We make this statement precise subsequently. First it is
helpful to establish that the following two random variables, $z_1$, $z_2$ are
statistically independent
= UT e
Giventhat e
= U2Te
transformation of a
linear
on
result
the
and
N(O, 21)
normal,the pair
is distributed as
0
O
Intip
402
Z2 Z2
Xnp
Zl
Xn-np
f,
ti
of
shows that the mean
deduce quickly the earlier claim that the sarn_
Fromthat fact we can estimate. Summarizing our results
on
ple varianceis an unbiased
ple variance thus far
1
n -Tip
n np
(zT2Z2)
f(s 2) - n np
2
n np
2
confli
Z(451)
xTx
n np
np)
np
hi*thesamplev
expre
r ( nt2m ) 1
(zn)nmm
(zn + m) n+m
41
Maximum-Lik
elihood Estimation
403
the
00)
Oand
2
(O
(O 00)
give
x2n-n
00) T (x T x ) (O 00)
YIP)
(4.51)
(4.52)
(4.53)
Xnnp
(4.54)
xTx
F(np,n
(O 00)
xTx
(O
00)
up
Random
Probability,
we
the
can obtain
bounding
in the previous
as was done
intervals
box
section
1/2
TIP,
in which
previous
denceas the
samples n
confidence interval
the
case,
number of
tbe
Measurement Covariof
Measurements, Known
4.73 VectorDifferent
ing to
ance R
vector of parameters
1
sRisknown,W
Itis
perhaps
Rewritinf
describing it
1
el
(4.56)
weuse
dices,Taking th!
Yp i
givesa ma
ei
Theenvironmental
variableXi is assumed to have q components, Xie
Rq,and e RPxq,and we assume q < n. In this model we have
np = pq modelparametersto estimate. Notice that this model is not
restrictedto onlyp independent versions of the model given by (4.42),
Thegeneralizationallowedhere comes from the covariance matrix R.
Toreducethis case to the (4.42),
we would add the further restriction
that R = 21.Wewill see
that allowing the different measurements
measurements
1
(2TT)np/ 2
taking
or,by
(det R)n/2
logarithm
nP In 27T+ IndetR +
n
IndetR + E (Yi o
xi)TR-1 (Yi
(Yis
(4.57)
in which we use the Einstein summation convention for repeated indices. Taking the derivative of scalar-valued function $L$ with respect to
the matrix $\Theta$ gives a matrix derivative
(+)mn
1
(YirPerforming
the sums over the deltas, noting R is symmetric,and collectingterms gives
R ms (Yis slXil)Xni
statement we have
= yx T xx T) -1
as
(4.58)
Y = 0X+E
E = [el
= 98 + (xx T)
Wefind the transpose
convenient because we now
matrix 9 Tin a vector
wish to stack the
giving
01
0 1 02
02
fromthe
form of the
+ (1
definition of E we see
el,l
el,2
el,n
vecE
ep,l
ep,2
RII
RIP
RII
RIP
RPI
Rpp
RPI
Rpp
N(O, S)
S = Re (XXT) 1
(4.59)
plify this
Kronecker
follows
covariance as
Sims
in Section
can be found as
S -1 (vecT vec(T)
(vecT
x2(np,
a)
(4.60)
Interlude
orthogonality
Let's put the tools of
independent.
n
ThenX is distributed as N(u, (1/n)E) and independently of E, and ny is
distributedas
the fundamental
theorem
of linear algebra,
that is an n I dimensional
space. Collect an orthonormal basis in the
aximum-Lik
M
(11
elihood Estimation
409
atri$
BnT 1
IT
BT _
BBT
e the
BTB = 1
pefill
XB T
vecX
more compact
in
or
xn
notation
vecX N(FLI
transformationgives for Z
The
vecz = (B 1)vecX
vecZ
inwhich
P = (BBT
vecZ =
zn-l
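From the covariance we conclude that the variables $z_1, \ldots, z_n$ are statistically independent. Computing $\hat{\Sigma}$ gives
$$\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (x_i x_i^T - \bar{x}\bar{x}^T) = \frac{1}{n}(X X^T - n\bar{x}\bar{x}^T) = \frac{1}{n}(Z B B^T Z^T - z_n z_n^T) = \frac{1}{n}(Z Z^T - z_n z_n^T) = \frac{1}{n}\sum_{i=1}^{n-1} z_i z_i^T$$
Since $\bar{x} = z_n/\sqrt{n}$, we have that $\bar{x} \sim N(\mu, (1/n)\Sigma)$, and the result is proved.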
$$\operatorname{tr}(AB) = \operatorname{tr}(BA)$$
of a
matrix
$$\frac{d\operatorname{tr}(f(A))}{dA} = g(A^T) \qquad g(x) = \frac{df(x)}{dx}$$
in which $g$ is the usual scalar derivative of the scalar function $f$. See Exercise 4.4 for a derivation of this fact. Applying this result and using the fact
= -R -2 C
d In detA
detA
= (A
+ -R -2
= -R
2
- oxi)T
E(Yi
=
pw( W)
(4.61)
in which $\Gamma_p$ is the multivariate gamma function defined by
$$\Gamma_p(z) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\left(z - \tfrac{1}{2}(i - 1)\right)$$
ments, Known
Next we consider the case in which the different measurement types are affected by the same set of parameters. The model is
el
e2
ep
exp(-
i=l
-Xi)
np
2
i=l
-Xi0) (4.62)
raking
(Yi- Xi0)
Ixt)
(4.63)
= (X TR -1 x)-
X TR -1 y
(4.64)
$$y_i = X_i\theta + e_i$$
ExTR -1et
el
+ (EXTR -1Xi)
[XTR-I
en
(4.65)
in which
XTR-I
s = (ExTR -l Xi
R-IXI
-1
R-l xn
-1
(0
(4.66)
The bounding box intervals follow as in Section 4.7.1. Notice that whenever the variance of the measurement errors is known, the maximum-likelihood estimate is normally distributed and the elliptical confidence intervals are given by $\chi^2(n_p, \alpha)$.
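The known-covariance case can be illustrated numerically. The following is a minimal sketch, assuming the model $y_i = X_i\theta + e_i$ with $e_i \sim N(0, R)$ and the weighted least-squares form $\hat{\theta} = (\sum_i X_i^T R^{-1} X_i)^{-1} \sum_i X_i^T R^{-1} y_i$ with estimate covariance $S = (\sum_i X_i^T R^{-1} X_i)^{-1}$. The function name `gls_known_R`, the synthetic data, and all numerical values are illustrative assumptions and do not come from the text.

```python
import numpy as np

def gls_known_R(X_list, y_list, R):
    """Weighted least-squares estimate for y_i = X_i theta + e_i, e_i ~ N(0, R).

    Returns the estimate and the covariance S of the estimate (hypothetical helper).
    """
    Rinv = np.linalg.inv(R)
    H = sum(X.T @ Rinv @ X for X in X_list)                   # sum_i X_i^T R^-1 X_i
    g = sum(X.T @ Rinv @ y for X, y in zip(X_list, y_list))   # sum_i X_i^T R^-1 y_i
    S = np.linalg.inv(H)
    return S @ g, S

# Synthetic example (all values are assumptions made for illustration)
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0])
R = np.array([[0.04, 0.01], [0.01, 0.09]])
L_chol = np.linalg.cholesky(R)          # used to draw e_i ~ N(0, R)
n = 500
X_list = [rng.normal(size=(2, 2)) for _ in range(n)]
y_list = [X @ theta_true + L_chol @ rng.normal(size=2) for X in X_list]

theta_hat, S = gls_known_R(X_list, y_list, R)
print("estimate:", theta_hat)
print("standard deviations of the estimate:", np.sqrt(np.diag(S)))
```

Given a value of $\chi^2(n_p, \alpha)$ from a table or library, the confidence ellipse is the set of $\theta$ with $(\theta - \hat{\theta})^T S^{-1} (\theta - \hat{\theta}) \le \chi^2(n_p, \alpha)$.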
4.7.6 Vector of Measurements y, Same Parameters for all Measurements, Unknown Measurement Covariance R
The final case is the one that arises most often in mechanistic modeling
n
2
Setting this result to zero and using the result of the previous section gives the following set of necessary conditions for the maximum-likelihood estimates
(ExTk -1Xi)
(4.68)
tion
covariance such as $R_0 = I$. One then estimates the iterate by solving a sequence of standard estimation problems. But there is no guarantee that this procedure converges. One may find that a crude initial guess like $R_0 = I$ lies outside the region of convergence.
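An iteration of this kind can be sketched as follows. This is a hedged illustration, assuming the necessary conditions take the standard form in which $\hat{\theta}$ is the weighted least-squares estimate for the current covariance and $\hat{R} = (1/n)\sum_i (y_i - X_i\hat{\theta})(y_i - X_i\hat{\theta})^T$ is the sample covariance of the residuals. The function name `iterate_theta_R`, the stopping rule, and the synthetic data are assumptions made for this example; as the text warns, convergence is not guaranteed in general.

```python
import numpy as np

def iterate_theta_R(X_list, y_list, R0, max_iter=50, tol=1e-10):
    """Alternate between a weighted least-squares step and a covariance update.

    Hypothetical helper: returns (theta, R, iterations_used, converged_flag).
    """
    R = R0.copy()
    n = len(y_list)
    theta = None
    for it in range(1, max_iter + 1):
        Rinv = np.linalg.inv(R)
        H = sum(X.T @ Rinv @ X for X in X_list)
        g = sum(X.T @ Rinv @ y for X, y in zip(X_list, y_list))
        theta_new = np.linalg.solve(H, g)            # standard estimation problem for fixed R
        resid = np.array([y - X @ theta_new for X, y in zip(X_list, y_list)])
        R_new = resid.T @ resid / n                  # sample covariance of the residuals
        converged = theta is not None and np.linalg.norm(theta_new - theta) < tol
        theta, R = theta_new, R_new
        if converged:
            return theta, R, it, True
    return theta, R, max_iter, False

# Synthetic example (all values are assumptions made for illustration)
rng = np.random.default_rng(5)
theta_true = np.array([0.5, 1.5])
R_true = np.array([[0.04, 0.01], [0.01, 0.09]])
L_chol = np.linalg.cholesky(R_true)
X_list = [rng.normal(size=(2, 2)) for _ in range(2000)]
y_list = [X @ theta_true + L_chol @ rng.normal(size=2) for X in X_list]

theta_hat, R_hat, n_iter, ok = iterate_theta_R(X_list, y_list, np.eye(2))
print(theta_hat, n_iter, ok)
print(R_hat)
```

Starting from the crude guess $R_0 = I$ works on this well-conditioned synthetic problem; a harder problem may need a better initial covariance.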
Maximum-Likelihood and Bayesian Estimation
With this background in maximum-likelihood estimation, we would like to compare the approach to another class of popular methods known as Bayesian estimation. As we saw in the previous sections, in the maximum-likelihood approach we maximize the probability of the measurement
$$\hat{\theta}_{MLE} = \arg\max_{\theta}\, p(y; \theta) \qquad (4.69)$$
Although the notation in the MLE sections indicated only that $\theta$ was a parameter, here we use instead $p(y; \theta)$ to emphasize that $\theta$ is an unknown parameter, not a random variable. In the MLE approach, the measurement $y$ is a random variable, not $\theta$, and we assess the confidence intervals for $\theta$. In Bayesian estimation, on the other hand, $\theta$ itself is modeled as a random variable. The information that we have about $\theta$ before the experiment is denoted by $p(\theta)$. In the experiment,
we imaginedrawing
well
as
e
as
of
the
a value
= argmaxp(\y)
The conditional density $p(\theta|y)$ is known as the POSTERIOR density, i.e., the density for $\theta$ after the experiment, and the density $p(\theta)$ is known as the PRIOR, i.e., the density before the experiment. In Bayesian estimation, we assess how much the measurement of $y$ has changed our knowledge about $\theta$. From Bayes's theorem we can express the posterior as
that we have more equations than unknowns, which is necessary for a well-conditioned estimation problem. It is customary to define data
1
XT
to
model
the linear
we let B =
Y=XB+E
e Rqxp+ E. In order
,
and we have
$$\min_B \|Y - XB\|_F^2$$
It is not difficult to show that the solution to this problem is the following
$$B_{ls} = (X^T X)^{-1} X^T Y = X^{\dagger} Y$$
Also, we already know that $X^T X$ has an inverse if and only if the columns of $X$ are linearly independent; see Proposition 1.19. Since we may not have control over the experimental conditions, we often must contend with datasets in which $X$ has dependent or nearly dependent columns, i.e., we have near collinearity in the columns of $X$. In such cases, the maximum-likelihood estimate is sensitive to small changes in the data or small errors in the assumed model structure.
E VT
E = diag(1, ,q),
andti
have
tom
that
$$X = U_c \Sigma_c V_c^T + U_q \Sigma_q V_q^T$$
$$X \approx U_c \Sigma_c V_c^T \qquad (4.71)$$
int(
of
(4.72)
s
estimate is less sensitive to errors in the data than the least-squares or maximum-likelihood estimate. Realize also that only the maximum-likelihood estimate is unbiased. By suppressing the small singular values, we introduce a small bias in $B_{SVD}$, but greatly reduce the variance in the estimate.
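The effect of suppressing small singular values can be sketched in a few lines. This is an illustrative example only, assuming the linear model $Y = XB + E$ with samples in the rows of $X$ and a truncated-SVD estimate of the form $V_c \Sigma_c^{-1} U_c^T Y$. The function name `truncated_svd_regression` and the synthetic, nearly collinear data are assumptions made for this sketch.

```python
import numpy as np

def truncated_svd_regression(X, Y, c):
    """Estimate B in Y = X B + E using only the c largest singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uc, sc, Vc = U[:, :c], s[:c], Vt[:c, :].T
    return Vc @ ((Uc.T @ Y) / sc[:, None])      # V_c Sigma_c^{-1} U_c^T Y

# Synthetic data with one nearly dependent column (values are assumptions)
rng = np.random.default_rng(0)
n, q, p = 100, 3, 2
X = rng.normal(size=(n, q))
X[:, 2] = X[:, 0] + X[:, 1] + 1e-3 * rng.normal(size=n)
B_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
Y = X @ B_true + 0.1 * rng.normal(size=(n, p))

B_ls = truncated_svd_regression(X, Y, q)        # keeps all singular values
B_svd = truncated_svd_regression(X, Y, q - 1)   # drops the smallest one

print("singular values of X:", np.linalg.svd(X, compute_uv=False))
print("norm of full estimate:     ", np.linalg.norm(B_ls))
print("norm of truncated estimate:", np.linalg.norm(B_svd))
```

Keeping all singular values reproduces the ordinary least-squares solution; dropping the smallest one removes the component of the estimate that is divided by a near-zero singular value, which is where the noise amplification comes from.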
orthogonal matrices $T$, known as the scores, and $P$, known as the loadings. Only the first $C$ principal components are retained as the columns of $T$ and $P$, respectively.
The
Principal
component
BPCR= PI
regression
showsthat
B PCR = BSVD
PCR.
PLSR. A potential drawback of the PCR approach is that only the predictor variables are evaluated. The principal components are selected to maximize the information about matrix $X$. But there is no guarantee that these components can represent the responses $Y$. To improve the predictive capability of the model, the PLS regression (PLSR) adds a very interesting wrinkle. In this approach, one does not start with the SVD of $X$ but with the SVD of $X^T Y$, which includes information about both $X$ and $Y$.
WI = YVI = FIVI
in which the matrices $E_1$ and $F_1$ are initialized as $X$ and $Y$, respectively. The $X$ scores are then usually normalized
$$t_1 = t_1/\sqrt{t_1^T t_1}$$
We now define
PI = El ti
Next the data matrices are deflated by subtracting the information in the current latent variable via
$$E_{i+1} = E_i - t_i p_i^T$$
Fi+l = Fi titli
The next iterate starts with the SVD of $E_{i+1}^T F_{i+1}$ in place of $X^T Y$
and the
matrix using the fitting dataset $(X, Y)$ and the chosen number of principal components or latent variables, $C$. To determine the best value of $C$ to use for estimating $B$, one finds the $C$ that minimizes $\|E_v\|_F$. This value of $C$ is large enough that the model fits the data accurately, but not so large that the model has been fit to the noise in the data. We demonstrate the cross validation technique with the following example.
[Figure 4.11: fitting error $\|E_{PCR}\|_F$ (top) and validation error $\|E_{PCRv}\|_F$ (bottom) versus the number of principal components]
First we divide the 200 samples into two sets, and use the first 100 samples for estimating the parameter matrix $B$, and the second 100 samples for cross validation. For principal component analysis, we compute the SVD of the $5 \times 100$ $X$ matrix. The five singular values are
$$\Sigma = \operatorname{diag}(15.1, 3.26, 2.72, 2.67, 0.0226)$$
Figure 4.12: The sum of squares validation error for PCR and PLSR versus the number of principal components/latent variables C; note that only two latent variables are required versus four principal components.
We see that $X$ has four large singular values and one near zero, indicating that the rank of $X$ is nearly four. Next we estimate $B_{PCR}$ using (4.72) for $C = 1, 2, 3, 4, 5$ and calculate the sum of squares of the fitting error, $\|Y - XB_{PCR}\|_F$. The results are shown in the top of Figure 4.11. It is not surprising that the fitting error decreases with increasing number of principal components. As we see, the fitting error contains little information about how many principal components to use. After estimating the parameters, we then compute the output responses for the validation data and compute $\|Y_v - X_v B_{PCR}\|_F$, in which $X_v$, $Y_v$ are the predictor and response variables in the validation dataset. This validation error is plotted in the bottom of Figure 4.11. Here we see that we should use four principal components in the model, in agreement with the SVD analysis of $X$. Using the unreliable smallest singular value in the regression causes a large error when trying to predict response data that have not been used in the fitting process.
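The cross validation procedure of this example can be sketched as follows. This is a hedged illustration on synthetic data, not the dataset used in the text: the samples are placed in the rows of $X$, the data are generated from four latent variables so that $X$ has rank nearly four, and the helper name `pcr_fit` is an assumption made for the sketch.

```python
import numpy as np

def pcr_fit(X, Y, c):
    """Principal component regression estimate using the first c singular triples of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:c, :].T @ ((U[:, :c].T @ Y) / s[:c, None])

# Synthetic data: 200 samples, 5 predictors with rank nearly 4 (assumed values)
rng = np.random.default_rng(1)
n, q, p = 200, 5, 2
latent = rng.normal(size=(n, 4))
mixing = rng.normal(size=(4, q))
X = latent @ mixing + 1e-3 * rng.normal(size=(n, q))
B_true = rng.normal(size=(q, p))
Y = latent @ mixing @ B_true + 0.1 * rng.normal(size=(n, p))

# First 100 samples for fitting, second 100 for validation
X_fit, Y_fit = X[:100], Y[:100]
X_val, Y_val = X[100:], Y[100:]

fit_err, val_err = [], []
for c in range(1, q + 1):
    B = pcr_fit(X_fit, Y_fit, c)
    fit_err.append(np.linalg.norm(Y_fit - X_fit @ B, "fro"))
    val_err.append(np.linalg.norm(Y_val - X_val @ B, "fro"))

best_c = 1 + int(np.argmin(val_err))   # C with the smallest validation error
print("fitting errors:   ", np.round(fit_err, 3))
print("validation errors:", np.round(val_err, 3))
print("selected C:", best_c)
```

The fitting error can only decrease as components are added, so it cannot be used to select $C$; the validation error is the quantity to inspect.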
[Figure: two panels, labeled PCR and PLSR]
latent
to obtain the same error as four principal components. This reduction in model order is the primary benefit of
By evaluating
provides.
capability
Variableshas roughly the same predictive
Figure 4.14: Effect of undermodeling. Top: PCR using three principal components. Bottom: PLSR using one latent variable.
several
established. But until some of the optimality properties of PLSR are uncovered, research on the PLSR approach will likely continue. In any field, a valuable technique that also defies easy explanation is a prime target for further research.
the text.
section in
proof.we
start
two
by considering
2)
define Rk as
Rk EXJ+EYJ,
so that
RI
+ Xnl
and
expectations
this ineqU?
Establishing
Fsnand FTn,
toachievethisgoal.
$$\sup_{x \in \mathbb{R}} |r(x, h)| \le K_f \min(h^2, |h|^3) \qquad (4.73)$$
fixed
The term $|h|^3$ is expected from the standard Taylor expansion, but
Indicates
th
4.9 Appendix: Proof of the Central Limit Theorem
continuous,
bounded
derivatives
f" (Rk) +
Y(Rk,Xk)
performing a similar expansion
for f (Yk
+ Rk),
taking
expectations,
+ Rk)
(112)
f(Yk + Rk)) =
((Xk)
(Rk)) +
wherewe have used the fact that
(AB)
dependent random variables. Noting =
that the first
+ Rk) - f(Yk + Rk))
I
for A and B
two terms incancel,
+ g(Yk))
and
(4.74)
to compress the
g (X) = min(X2 ,
and defined
notation.
Next comes the reason for introducing the $R_k$ variables. Notice that differencing the sum of $f(R_k + X_k)$ and $f(R_k + Y_k)$ leaves only two terms
n
f(Rk +
+g(Yk))
(4.75)
But we wish to bound the distance between the two cumulative distributions $F_{S_n}$ and $F_{T_n}$, so we next choose an appropriate function $f(\cdot)$ to achieve this goal. Consider the step function $f_1(w; x)$ depicted in Figure 4.15, in which $w$ is the argument to the function and $x$ is considered a fixed parameter. Using $f_1(w; x)$ we have immediately
$$\mathcal{E}(f_1(S_n)) = F_{S_n}(x)$$
The function $f_1(\cdot)$ is known as an indicator function, because it indicates when the random variable $S_n$ satisfies $S_n \le x$. So this is
4Sincef (x)
density px(x)
If (X) I for all x, multiply by the
and integrate.
the kind of function we seek, but, of course, $f_1$ does not have even
shortly.
Computing ff(Sn) gives
Psn(w)f(w;x)dw =
psn (w)dw +
Psn
and, subtracting the
Fsn(x)-FTn(x) =
x)dw
analogous expression
for Tn and rearranging
-fan))-f(Tn)) +
f (Tn)) +
bnL
(Psn(w)
gives
(w))f(w;x)dw
(w)f(w;x)dw
choose
\ FOX) --
Kf
this
L small,
side
+ bnL
is the second
step. Notethat
, and
variable Zn = Snlsn =
n for this choice of Yk.
independent 01
also
scaled Xk, and also has zero mean
is a sum of
Applying (4.76) to these variablesgives
n.
sariance for
side, we
right-hand
evaluatethe
discussedbetore
as
g (Xkl
\Xkl
>
sn\
the sum
gives
enoughn itis
smaller
of n. So as n
26
46
+ g(Yk/sn))
(4.78)
+ g(Yk/sn))
L= (
Kf = 20L-3
(20
3/4
Substituting these values for $K_f$ and $L$ into (4.77) and using (4.78) gives
sup IFzn(x)
x
ce l / 4
with c (1/2)5-3/4 + 1/ F
0.71. Since this bound holds for all
0,we have established that
=0
4.10 Exercises
Exercise 4.1: Consequences of the axioms of probability
(a) If $B \subseteq A$, show that $\Pr(A \setminus B) = \Pr(A) - \Pr(B)$.
(b) From the definition, the events $A$ and $B$ are independent if $\Pr(A \cap B) = \Pr(A)\Pr(B)$. If $A$ and $B$ are independent, show that $A$ and $\bar{B}$ are independent.
Exercise 4.2: Statistical independence condition in densities
Show that two random variables $\xi$ and $\eta$ are statistically independent if and only if
$$p_{\xi,\eta}(x, y) = p_{\xi}(x)\, p_{\eta}(y) \quad \text{all } x, y \qquad (4.79)$$
Exercise 4.3: Statistical independence of functions of random variables
Consider statistically independent random variables $\xi \in \mathbb{R}^m$ and $\eta \in \mathbb{R}^n$. Define random variables $\alpha \in \mathbb{R}^p$ and $\beta \in \mathbb{R}^q$ as $\alpha = f(\xi)$, $\beta = g(\eta)$. Show that $\alpha$ and $\beta$ are statistically independent for all functions $f(\cdot)$ and $g(\cdot)$.
Summarizing: statistical independence of random variables $(\xi, \eta)$ implies statistical independence of random variables $(f(\xi), g(\eta))$ for all $f(\cdot)$ and $g(\cdot)$.
function of a square matrix
$$\frac{d\operatorname{tr}(f(A))}{dA} = g(A^T) \qquad g(x) = \frac{df(x)}{dx} \qquad (4.80)$$
in which $g$ is the usual scalar derivative of the scalar function $f$.
Consider again the estimation problem for the model given in (4.56), but this time
$$y_i^T = x_i^T \Theta^T + e_i^T \qquad (4.81)$$
(a) Derive the maximum-likelihood estimate for this case. Show all steps in the derivation and show the maximum-likelihood estimate can be expressed as
$$\hat{\Theta}^T = (X X^T)^{-1} X Y^T$$
Expressing the model this way gives an estimate formula that is analogous to what other problem?
(b) Find the resulting probability density for the estimate and give the analogous result corresponding to (4.59).
(c) Which form of the model do you prefer and why?
covariance
mean, and
and n. Calculate the joint density
marginal,
variables,
Joint,
random
the means,
4.7:
Exercise
discrete-valued
densities,
and covari'_
two
marginal
each die.
we consider
both
the values on
(x, y),
are
and and
dice,
two
and is the sum of the two values
die
throw
one
on
(a) we
is the value
dice,
throwtwo
inverse function
(b)cwe
the
of
density
the random variable be definedby
Probability
let
and
R
Exercise4.8: random variable e
Considera scalar
function
the inverse
11
uniformly on [a,
distributed
your answer.
(a) If is
allow a = 0? Explain
we
if
well defined
(b) Is n's density
operator
as a linear
Expectation
defined as a linear combination of the
Exercise4.9:
variable x to be
the random
(a) Consider
variables a and b
random
Show
(b)
(x) = cx(a)
are scalars.
(a) prove
433
= R,
all N
Make the standard assumptions: the probability density for each die is uniform over the integer values from one to six, and the outcome of each die is independent of the other die.
Given random variable $x$ has mean $m$ and covariance $P$, show that the expected sum of squares is given by the formula (Selby, 1973, p.138)
$$\mathcal{E}(x^T Q x) = m^T Q m + \operatorname{tr}(QP)$$
Recall that the trace of a square matrix $A$, written $\operatorname{tr}(A)$, is defined to be the sum of the diagonal elements
$$\operatorname{tr}(A) = \sum_i A_{ii}$$
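The expected sum of squares formula can be checked numerically before attempting the proof. The sketch below is only a sanity check, not a derivation: it samples from a normal distribution for convenience (the formula itself requires only the mean and covariance), and the particular $m$, $P$, and $Q$ are arbitrary assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary mean, covariance, and weighting matrix (assumed values)
m = np.array([1.0, -0.5, 2.0])
A = rng.normal(size=(3, 3))
P = A @ A.T + np.eye(3)                      # symmetric positive definite covariance
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 3.0]])

# Monte Carlo estimate of the expectation of x^T Q x
x = rng.multivariate_normal(m, P, size=200_000)
mc = np.mean(np.einsum("ni,ij,nj->n", x, Q, x))

# Value predicted by the formula
formula = m @ Q @ m + np.trace(Q @ P)

print("Monte Carlo:", mc)
print("formula:    ", formula)
```

The two printed values should agree to within sampling error, which shrinks as the number of samples grows.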
exp
21T2
var) = 0 2
(4.82)
R = arg max
in whicharg returns the solution to the optimization problem.
and
17
mx
PX
Pxy
my ' Pyx P
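then the conditional density of $x$ given $y$ is normal
$$p(x|y) = N(m_{x|y}, P_{x|y})$$
in which the conditional mean is
$$m_{x|y} = m_x + P_{xy} P_y^{-1} (y - m_y)$$
and the conditional covariance is
$$P_{x|y} = P_x - P_{xy} P_y^{-1} P_{yx}$$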
Exercise 4.17: Fourier transform of the multivariate normal density
Show the Fourier transform of the multivariate normal density is
$$\varphi(u) = \exp\left(i u^T m - \frac{1}{2} u^T P u\right)$$
randomvari-
ables
random variables T 1 and T2 are statistically independent and
identically distributed
exponential density
withthe
pefinethe
(d) Integrate
in
T2.
densitypy,z.
space, n
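$$S_3(r) = 4\pi r^2 \qquad V_3(r) = \frac{4}{3}\pi r^3$$
If we define $s_n$ and $v_n$ as the constants such that
$$S_n(r) = s_n r^{n-1} \qquad V_n(r) = v_n r^n$$
we have
$$s_2 = 2\pi \qquad s_3 = 4\pi \qquad v_2 = \pi \qquad v_3 = \frac{4}{3}\pi$$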
We seek the generalization of these results to the $n$-dimensional case. Compute formulas for $s_n$ and $v_n$ and show
Vn
Ttn12
pot
10
of an ellipsoid in n dimensions
and volume
area
surface
sphere in n dimensions can be
volume of a
extended
Exercise4.20:
and
area
ellipse (ellipsoid,
hyperellipsoid) in
for surface
volume of an
is
defined
ellipse
an
The results surface area and
by the
The surface of
equation
the
to obtain Let x be an It-vector.
dimensions.
and R2 is the
square of the
positive definite matrix
symmetric,
denoted
be
a
R
by
is
size
the
of
set
vtxn
ellipse
e
interior of the
in whichA
the
Let
ellipse"radius."
= {x I x TAx R2}
which is defined by the following
of the ellipse,
volume
the
to compute
we
dx
=
vf(R)
such that
for stet and
(a) Deriveformulas
e n- 1
e
= snR
sn(R)
(detIS1
r 12
rxr,
ellipse given by
=1
V = rabc
4.25: Use
Exercise
the follo
Establish
r(n/2)
(4.84)
(c) The $\chi^2(n, \alpha)$ function is defined to invert this relationship and give the size of the ellipse that contains total probability $\alpha$
$$\chi^2(n, \alpha) = b \qquad (4.85)$$
Plot $\gamma(n/2, x/2)/\Gamma(n/2)$ and $\chi^2(n, x)$ versus $x$ for various $n$ (try $n = 1, 4$), and display the inverse relationship given by (4.84) and (4.85).
Exercise
4.26: Le
Considera model
inwhichy e
RP
the
measurement
estimate
of giv
assume
is dis
4.10 Exercises
Exercise4.22: Normal distributions under
linear
RPI,obtained by the linear transformation
437
transformations
, consider
the random
variable
Exercise
with singular
covariance
value
(22]
- mx)]
mx))
QT
2
$$(A^{-1} + C^T B^{-1} C)^{-1} = A - A C^T (B + C A C^T)^{-1} C A$$
$$(A^{-1} + C^T B^{-1} C)^{-1} C^T B^{-1} = A C^T (B + C A C^T)^{-1} \qquad (4.86)$$
Probability,
438
to be sampled
measurements
formula,
Considerthe using the least-squares
00. Showthat
tributedas
(x Tx)
N (00, P)
Exercise 4.26
x TR -1 x)-
X T R -1 y
4.10
Exercises
439
can
problem into
we wish to estimate x from
which
stages.
in
case
measurements
consider the
an
intermediate
variable, Y,
x and
of
z,
following
and the
but we
model
have the
betweenY
and z model
cov(el)
QI
z = By + e2
cov(Q) =
(22
the optimal least-squares
(a) write
problem to
surements and the second model. Given
solvefor
y, write
giventhe z
in
terms
R
downthe
of S'. Combine
problem for
meaoptimal
these
two
least-squares
results
resulting estimate ofR given measurements
of z. call this togetherand writethe
the two-stage
estimate
(b) Combine the two models together into a single model and show the relationship
$$z = BAx + e_3 \qquad \operatorname{cov}(e_3) = Q_3$$
Express $Q_3$ in terms of $Q_1$, $Q_2$ and the models $A$, $B$. What is the optimal least-squares estimate of $x$ given measurements of $z$ and the one-stage model? Call
(c) Are the one-stage and two-stage estimates of $x$ the same? If yes, prove it. If no, provide a counterexample. Do you have to make any assumptions about the
such as an all-expenses-paid vacation to Hawaii or a new car. Behind the other two doors are goats and donkeys. The contestant selects a door, say door number one. The game show host, Monty Hall, then says, "Before I show you what is behind your door, let's reveal what is behind door number three!" Monty always chooses a door that has one of the booby prizes behind it. As the goat or donkey is revealed, the audience howls with laughter. Then Monty asks innocently,
"Before I show you what is behind your door, I will allow you one chance to change
max
(4.88)
(a) Calculate this conditional density and give the probability that the prize is behind door $i$, your original choice, and door $j \ne i$.
(b) You need to specify a model of Monty's behavior. Please state the one that is appropriate to Let's Make a Deal.
(c) For what other model of Monty's behavior is the answer that it does not matter if you switch doors. Why is this a poor model for the game show?
textbook
The author of a famous
random
x and u,
confidence intervals
$$\ln(k) = \ln(k_0) - E/T + e \qquad e \sim N(0, 0.001)$$
Choose true values of the parameters to be
$$\ln(k_0) = 1 \qquad E = 100$$
(a) Generate a set of experimental data for this problem. Estimate the parameters from these data using least squares. Plot the data and the model fit using both $(T, k)$ and $(1/T, \ln k)$ as the $(x, y)$ axes.
A
Exercise4.32: fourth moment of
the normal
established the following
Youhave
matrix
distribution
integral
result
involving
the second
m04me4ntl
(2Tr)/2
following
matrix result
(detP)l/2p
Establishthe
involving a
fourth
moment
xx Txx Texp xTp-1
X dx =
(27T)tt/2
00
Firstyou may
(detP)1/2
want to establish the
[2pp +
following result
for scalar
x
I x2
XPexp
2 T2 dx
0
p odd
2 51
p even
is given by
pt(z;n) =
rC t +2
1) z
v/' r (t)
71+1
+1)
t-distribution(density)
(4.90)
This distribution is known as Student's t-distribution after its discoverer, the chemist W. S. Gosset (Gosset, 1908), writing under the name Student.
Y
variables X and Define the random variable F as the ratio,
random
Given
respectively.
degrees of freedom,
Y/m
probability
Show that F's
density is
(zn)nmm
PF(z; n, m) =
tt+m
(zn + m)
Beta function
in which B is the complete
by
z 20 n,ml
B(n,m) =
the F-distribution
(density).
t- and F-distributions
distributed
Showthat the random variable T is
= pt(z;m)
Consider two random variables $A$, $B$ with joint density $p_{AB}(a, b)$, and well-defined marginals $p_A(a)$ and $p_B(b)$ and conditional $p_{A|B}(a|b)$. Show that $A$ and $B$ are statistically independent if and only if the conditional of $A$ given $B$ is independent of $b$
$$p_{A|B}(a|b) \ne f(b)$$
result to solve the n-sample problem
Yp i
stack the
443
givenby
the
following
model
ep
sampleS in an enlarged
vector Y,
and define
the
Ytt
en
corresponding
et N(O,R)
(b) what
for e in terms
of for this
problem?
e N(O,R)
in which $y$, $\theta$, $e$ are vectors and $A$, $R$ are matrices. If the constraints are not active, the code produces the well-known solution
$$\hat{\theta} = (A^T R^{-1} A)^{-1} A^T R^{-1} y \qquad (4.91)$$
,n, and
you have $n$ statistically independent samples. Your colleague suggests you stack your problem into a vector and find the solution with the existing code. So you arrange your measurements as
E = tel
Figure 4.16: Typical strain versus time data from a molecular dynamics simulation from data file rohit.dat on the website
You looked up the answer to your estimation problem when the constraints are not active and find the formula
$$\hat{\Theta} = Y X^T (X X^T)^{-1} \qquad (4.93)$$
You do not see how this answer can come from your colleague's code because the answer in (4.91) obviously depends on $R$ but your answer above clearly does not depend on $R$. Let's get to the bottom of this apparent contradiction, and see if we can use Vector
(a) What vector equation do you obtain if you apply the vec operator to both sides
of the matrix model equation, (4.92)?
(b) What is the covariance of the vector vecE appearing in your answer above?
(c) Apply (4.91) to your result in (a) and obtain the estimate $\operatorname{vec}\hat{\Theta}$.
(d) Apply the vec operator to the matrix solution, (4.93), and obtain another expression for $\operatorname{vec}\hat{\Theta}$.
(e) Compare your two results for $\operatorname{vec}\hat{\Theta}$. Are they identical or different? Explain any
differences. Does the parameter estimate depend on R? Explain why or why not.
Exercise 4.41: Estimating a material's storage and loss moduli from molecular simulation
Consider the following strain response model
to estimate GI
loss
and cr2from modulus(Gl
measurementsof
The strain "measurement" in this case actually comes from a molecular dynamics simulation. The simulation computes a noisy realization of the strain for the given material of interest. A representative simulation data set is provided in Figure 4.16. These data are given in file rohit.dat on the website www.che.wisc.edu/~jbraw/principles so you can download them.
(a) Without knowing any details of the molecular dynamics simulation, suggest a reasonable least-squares estimation procedure for $G_1$ and $G_2$.
and G2 are
Find the optimal estimates and 95% confidence intervals for your recommended estimation procedure. Plot your best-fit model as a smooth time function along with the data. Are the confidence intervals approximate or exact in this case? Why?
(b) Examining the data shown in Figure 4.16, suggest an improved estimation procedure. What traditional least-squares assumption is violated by these data? How would you implement your improved procedure if you had access to the molecular dynamics simulation so you could generate as many replicate "measurements" as you would like at almost no cost?
Probability,
variable
that the X
ill
told
havebeen error ey
youmeasurement
file errvbl
are given in
and
Figure 4.17
shown in braw/pri nci pl es.
are
as
Thedata wi sc.edu/-j
the model
che.
www.
assumptions,
these
(a) Given
and intercept
of the slope
estimate
find the best
variable has
confidence ellipse,
probability
and the
estimates.
the parameter
these data.
of best fit to
line
the
data, and
(b) Plot the
you are told later that actually y is known
lab,
confusionin thevariablehas measurement error ex distributed as
x
(c) Dueto some
the
highaccuracyand
ex
in a transformed parameter vector (l)
so that it is linear
model
the
Transform
+ exi
(f) Can you tell from the estimates and the fitted lines which of these two proposed models is more appropriate for these data? Discuss why or why not.
enl
Con
so that
tneragain
an!
Etntio
Y = Be.
and
ellipses
10 Exercises
ercise4.44:The multivariate
-distribution
Xe
and
random
variable
ko are
statistically
defined
as
in
deendent
with m e RP a constant
is given
by
nn 02
RP
+ 00
nnp
(b) Show that lines of constant probability of the multivariate t-distribution are ellipses in $\theta$ as in the normal distribution.
Probability, Random
448
xTx (- 00) n np
2
( 00)
in agreement
with (4.55).
Exercise 4.47: Adding two uniformly distributed random variables
Given two independent, uniformly distributed random variables, $X \sim U[0, 1]$ and $Y \sim U[4, 5]$, find the density for $Z = X + Y$. Note that the transformation from $(X, Y)$ to $Z$ is not an invertible transformation.
Exercise 4.48: Product of two unit variance normals
Let $X$ and $Y$ be independent scalar random variables distributed identically as $N(0, 1)$. Find the density for $Z = XY$. Is $p_Z(z)$ well defined for all $z$? If not, explain why not.
integral
Derivethe definite
used
on (00,00). We wish to
exponential version of the integral
the
first
consider
Hint:
show that
eibxdX
a2x2
(4.95)
e
a2x2sinbxdx = 0
a*0
Then perform the integral by noticing that integrating the normal distribution gives
= Q7r
even when $m' = im$ is complex valued instead of real valued. This last statement can be established by a simple contour integration in the complex plane and noting that the exponential function is an entire function, i.e., has no singularities in the complex plane.
Yi =
Cijxj
N(mi, R).
4 10
Exercises
matrix C is orthogonal.
are independently
bich
the
that
d deduce the
Let
449
distributed
as Yi -v
in which
problem
statement.
4.51: Estimated variance and the Wishart
, en e RP be n independent
distribution
el , '2,
vectors
randomvariable
samples of a
zero mean and identical
normally
variance,
distributed
N(o, R). Define
the matrix
eieT
Wishartdistribution,
EET Wp(R,n).
q).
butions)if they produce the same integral for all test functions
(x/)2
-+0.
Probability, Random
450
of the exponential
the Taylor series
for
bound
limit theorem for sums
Exercise4.53: Error used in establishing the central
Of
(4.31)
Derivethe bound
random variables
identically distributed
I-MIn + 1
f(x)
r(x,h) = f (x + h)
Kf min(h 2, lh13 )
sup Ir(x,
xeR
is valid for any b > 0
Showthat the followingKf
Kf = max
(4.96)
(i
with M/ ) = SUPxeRf ) (X).
(a) The de Moivre-Laplace central limit theorem assumption that the $X_i$ are independent and identically distributed with mean zero and variance $\sigma^2$.
(b) The Lyapunov central limit theorem assumption that there exists $\delta > 0$ such that as $n \to \infty$
$$\frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} \mathcal{E}\left(|X_k|^{2+\delta}\right) \to 0$$
Note that the Lyapunov assumption implies only part (b) of Assumption 4.15.
(c) The bounded random variable assumption, i.e., there exists $B > 0$ such that $|X_i| <$
to a unit
stepfunction,
H(z-l).
and variance
normals with mean
op. Let
zero
and
satisfy the Lindeberg conditions listed
in Assumptionvariance
Hint: using the
variables, show that
4.15, then soShowthat if the
for
do the Yi.
n sufficiently
vQ for all i. This result shows that
large
no
and
a significant fraction of the sum's variance as single random variablecan any O,
accountfor
condition for the Yi variables, and use n becomes large. Next
evaluate
the fact that
the Lin-
(maxiq)
so that all
p(l) = a4 + a4 = 1/2
p" (1) =
+ 20a5= o
3/4
= 0,to
is therefore
f(z)
function
candidate
(b) The
(3/4)z5,
1K z < 2
(5/4)Z4
f(z)
uous at Z
given by
Mf
also
these values
check
and
and
three derivatives,
(2) = 20/9
on your plots.
(w -x)/L)) = f(u,).
andf(z) =
(1-z/2)L+x
of Figure 4.15. Showthat
rescale.Letu, has the required properties
(c)
f(w) now
Thefunction bounds are scaled by
the derivative
M CI) =
PI-SRalgorithm
Exercise4.58: Properties of
described in Section 4.8, show the following properties.
Giventhe PLSRalgorithm
(a) T=XR
(b) TTT=1q
000
000
Bibliography
National
John
Wiley
T. Bayes. An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc., 53:370-418, 1763. Reprinted in Biometrika, 35:293-315
and
G.E.p. Box
E.A. Cornish.
The multivariate
t-distribution
Analysis.
associated with
Addison-
a set of
normal
Examples. Cambridge
UniversityPress,
china.Acta,
1986.
decompositions.
A. N. Kolmogorov. Foundations
453
Bibliography
454
J. W. Lindeberg. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Z., 15:211-225, 1922.
J. F. MacGregor, T. E. Marlin, J. Kresta, and B. Skagerberg. Multivariate statistical methods in process analysis and control. In Y. Arkun and W. H. Ray, editors, Chemical Process Control-CPCIV. CACHE, 1991.
B.-H. Mevik and R. Wehrens. The pls package: Principal component and partial least squares regression in R. J. Stat. Softw., 18:1-24, 2007.
A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., second edition, 1984.
D. Pollard. Comment on: The central limit theorem around 1935. Statist. Sci., 1986.
Eng.,
S. M. Selby. CRC Standard Mathematical Tables. CRC, 1973.
Student. The probable error of a mean. Biometrika, 6:1-25, 1908.
A. J. Thomasian. The structure of probability theory with applications. McGraw-Hill, 1969.
5.1 Introduction
incorporate
and (iii) the Kalman filter for reducing the effects of noise in process measurements, a fundamental task in systems engineering. By covering representative examples from transport phenomena, chemical kinetics, and systems engineering, we hope to both introduce random models and processes, as well as demonstrate their wide range of applicability in modern chemical and biological engineering.
5.2 Stochastic Processes for Continuous Random Variables
5.2.1 Discrete Time Stochastic Processes
Our target in this part of the chapter is an understanding of the structure and dynamics of continuous time stochastic processes: the stochastic analogs of deterministic differential equations. In building up to these, it is instructive to start with the conceptually simpler stochastic difference equation. Consider the following example
$$x(k + 1) = Ax(k) + Gw(k) \qquad (5.1)$$
in which $k \in \mathbb{I}_{\ge 0}$ is the sample number in discrete time, $w$ is a random variable, assumed to have some fixed and known probability density, and $w(k)$, $k = 0, 1, 2, \ldots$ are independent, identically distributed samples of $w$. If we define a sampling interval $\Delta t$, then $t = k\Delta t$. Because of the influence of the random variable $w$, the variable $x$ is also a random variable. In general it can take any value, so we call it a continuous random variable in contrast to the integer-valued or discrete random variables we encounter in Section 5.3.
(5.2)
Thereis no difficultyexpressing the solution to the stochastic differenceequation;in fact we cannot determine by looking at the form of
lyits
probability den
understanding
manyimportant aspects of stochastic processes. Considera systemwith scalarx,
=
A = 1, zero initial condition and
zy,fweletG
=
x(t)N
p(x,t)
wiS
to
tile
find
x(0) = 0
+ Gw(k)
(5.3)
+ 1) = x(k)
density of x(k) versus time for this
the probability
x(l)
p we sequence
w(k)
tile
N(O,1)
x(l) + Gw(l)
w(l)
that
Noting
J w(l)
and using Theorem 4.12 on the linear transformation of a normal we have that
$$x(2) \sim N(0, 2G^2)$$
$$x(k) \sim N(0, kG^2)$$
I x2
2TTt
Similarly, if we let $G = \sqrt{2D\Delta t}$ where $D$ is a constant, then
$$x(t) \sim N(0, 2Dt)$$
or
$$p(x, t) = \frac{1}{2\sqrt{\pi D t}} \exp\left(-\frac{x^2}{4Dt}\right)$$
This is precisely (3.70) from Chapter 3, which describes the transient spread by diffusion of a delta-function initial condition. Thus we see
Stochastic
sign
first
the
already
of what
turns
the mean
a deep
be
to
out
square
then
variable,
position
(x2(t))
processes,
diffusion
For
and important
by
displacement is given
square
the mean
x(k+
Gm + Gw(k)
+
x(k)
1) =
this becomes
Gm/At
v
Defining
vAt + Gw(k)
+
x(k)
=
x(k+ 1)
G=a
a velocity v
The particle drifts with
early sNithtime, while
also diffusing.
becomes
That is, the solution becomes a sum of independent identically distributed (IID) random variables. In Section 4.5 we learned the remarkable fact that sums of IID random variables converge to a normal distribution. Thus as $k \to \infty$, $x(k)$ becomes normally distributed even if the noise that drives it is not. So, for example, if we can only observe the process $x(t)$ at time intervals that are infrequent compared to $\Delta t$, it will be virtually impossible to know whether the underlying noise was Gaussian or not; the resulting process $x(k)$ will be. This result is one reason why, in the absence of further information, taking the noise in a system to be normally distributed is often a good approximation.
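A short simulation illustrates this point. The sketch below is an assumption-laden example rather than anything from the text: it drives the random walk $x(k+1) = x(k) + Gw(k)$ with uniformly distributed (non-Gaussian) unit-variance noise, then checks that the variance of $x(k)$ is close to $kG^2$ and that the excess kurtosis, which is zero for a normal, is small. The values of `G`, `k`, and `n_paths` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

G, k, n_paths = 0.5, 200, 20_000          # assumed illustration values

# Uniform noise on [-sqrt(3), sqrt(3)] has zero mean and unit variance
w = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_paths, k))

# With x(0) = 0, x(k) is G times the sum of the first k noise samples
x_k = G * w.sum(axis=1)

var_ratio = x_k.var() / (k * G**2)        # should be close to one
z = (x_k - x_k.mean()) / x_k.std()
excess_kurtosis = np.mean(z**4) - 3.0     # zero for a normal distribution

print("variance / (k G^2):", var_ratio)
print("excess kurtosis:   ", excess_kurtosis)
```

A single uniform sample has excess kurtosis of $-1.2$, so a value near zero after 200 steps shows how quickly the sum loses the signature of the driving noise.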
x(t) = N/w(t)
(5.4)
s0, ts
(5.5)
(AW(ti)
= B 2AtiiJ
(5.7)
= 0 for n odd
(5.8)
(5.9)
where $t_N = t$ and the only restriction on $t_i$ is that $t_i > t_{i-1}$. Accordingly a diffusion (Brownian motion) process can be written as a sum of Wiener increments multiplied by $\sqrt{2D}$
$$x(t) - x(t_0) = \sum_{i=1}^{N} \sqrt{2D}\, \Delta W(t_i)$$
Furthermore,
(5.11)
(5.12)
$$x((k+1)\Delta t) = x(k\Delta t) + \sqrt{2D\Delta t}\, w(k), \qquad x(0) = 0$$
with $w(k) \sim N(0, 1)$. Figure 5.1 shows this process for sample time $\Delta t = 10^{-6}$ and diffusivity $D = 5 \times 10^{5}$. Notice that the roughness is quite apparent in the top row of Figure 5.1. But by looking at finer time scales, we can see the effect of the finite step size in the discrete time approximation. The continuous
[Figure 5.1: $W(t)$ versus $t$ at successively finer time scales centered on $t = 0.5$.]
Figure 5.2: Sampling faster on the last plot in Figure 5.1; the sample time is decreased to $\Delta t = 10^{-9}$ and the roughness is restored on this time scale. Thought question: how did we generate a random walk that passes exactly through the solid sample points taken from Figure 5.1? Hint: certainly not by trial and error! Such a process is called a Brownian bridge (Bhattacharya and Waymire, 2009).
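One possible way to build such a pinned path, sketched under assumptions: draw an unconstrained Wiener path on the fine grid and subtract a linear ramp so that the endpoint lands on the required value. This is a standard Brownian bridge construction; it is not necessarily how the figure was produced. The function name `brownian_bridge`, the endpoint values, and the grid size are hypothetical.

```python
import numpy as np

def brownian_bridge(t0, t1, a, b, n, rng):
    """Sample a Wiener path on n+1 grid points pinned to W(t0) = a and W(t1) = b."""
    t = np.linspace(t0, t1, n + 1)
    dW = rng.normal(0.0, np.sqrt((t1 - t0) / n), size=n)
    free = np.concatenate(([0.0], np.cumsum(dW)))  # unpinned path starting at 0
    # Subtract a linear ramp so the path ends exactly at b.
    ramp = (t - t0) / (t1 - t0) * (free[-1] - (b - a))
    return t, a + free - ramp

rng = np.random.default_rng(1)
t, w = brownian_bridge(0.0, 1.0, -228.0, -232.0, 1000, rng)
print(w[0], w[-1])
```

To pass through several coarse sample points, the same function can be applied to each consecutive pair of points and the pieces concatenated.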
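$$\mathcal{E}\left( \left| \frac{\Delta W}{\Delta t} \right| \right) = \frac{1}{\Delta t}\, \mathcal{E}(|\Delta W|) = \frac{1}{\Delta t} \frac{1}{\sqrt{2\pi \Delta t}} \int_{-\infty}^{\infty} |x| \exp\left( -\frac{x^2}{2\Delta t} \right) dx = \frac{1}{\Delta t} \sqrt{\frac{2\Delta t}{\pi}} = \sqrt{\frac{2}{\pi \Delta t}}$$
This diverges as $\Delta t^{-1/2}$ as $\Delta t \to 0$.
1 The results of Exercise 5.8 were applied in this derivation.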
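The divergence can be checked by sampling, as a rough sketch: for Wiener increments $\Delta W \sim N(0, \Delta t)$, the sample mean of $|\Delta W|/\Delta t$ should track $\sqrt{2/(\pi \Delta t)}$ as $\Delta t$ shrinks. The helper name `mean_abs_slope` and the sample sizes are assumptions made for illustration.

```python
import numpy as np

def mean_abs_slope(dt, n_samples, rng):
    """Sample estimate of E(|dW|)/dt for Wiener increments dW ~ N(0, dt)."""
    dW = rng.normal(0.0, np.sqrt(dt), size=n_samples)
    return np.abs(dW).mean() / dt

rng = np.random.default_rng(2)
results = {}
for dt in (1e-2, 1e-4, 1e-6):
    estimate = mean_abs_slope(dt, 200_000, rng)
    exact = np.sqrt(2.0 / (np.pi * dt))
    results[dt] = (estimate, exact)
    print(f"dt = {dt:g}: sample {estimate:.4g}, sqrt(2/(pi dt)) = {exact:.4g}")
```

Each hundredfold reduction in the step size raises the mean absolute slope by about a factor of ten, which is the $\Delta t^{-1/2}$ scaling.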
Now let us return for the moment to the discrete time integrated white-noise process, (5.3). Considering a sampling interval $\Delta t$, we can rewrite this as
$$\Delta x = B\, \Delta W$$
Under other circumstances we could divide by $\Delta t$ and let it shrink to $dt$, giving
$$\frac{dx}{dt} = B \frac{dW}{dt}$$
We have just found, however, that $dW/dt$ does not exist. Nevertheless, we can define a differential of the Wiener process
$$dW(t) = W(t + \Delta t) - W(t)$$
as the Wiener increment when $\Delta t$ becomes the infinitesimal $dt$, and write
$$dx = B\, dW \qquad (5.14)$$
This is the most elementary STOCHASTIC DIFFERENTIAL EQUATION. With initial condition $x(0) = 0$, its solution is (5.4).
5.2.3 Stochastic Differential Equations
Basic ideas
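To motivate and introduce stochastic differential equations, consider first the deterministic differential equation
$$\frac{dx}{dt} = A(x, t) \qquad (5.15)$$
With a Wiener-process forcing term added, this becomes
$$dx = A(x, t)\, dt + B(x, t)\, dW \qquad (5.16)$$
Formally, we can integrate this to yield
$$x(t) = x(0) + \int_0^t A(x(t'), t')\, dt' + \int_0^t B(x(t'), t')\, dW(t') \qquad (5.17)$$
The first integral is classical. The second would be as well if $dW/dt$ existed, in which case we would just write that
$$\int_0^t B(x(t'), t')\, dW(t') = \int_0^t B(x(t'), t') \frac{dW}{dt'}\, dt'$$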
by the sum
sn = E
- W(ti-l))
mathematician.
rule, one practical reason for this choiceis that is the onemost
applied in
val.
straigY1tforWardly
+ St) = x(t) +
(t + At)
SW
where
+ At)
G(t') dW(t')
Tilisis
makes
This
O.
=
because 'E (W(ti)matters. if were
the choice of Ti(ti-l)) would notbe independent and
and (W(ti) - W
necessarilybe zero.
of the form
integrals
By considering
one can
expression for Sn
show
$$dW_i\, dW_j = \delta_{ij}\, dt$$
$$df(x(t)) = f(x(t + dt)) - f(x(t)) = f'(x(t))\, dx(t) + \tfrac{1}{2} f''(x(t))\, (dx(t))^2 \qquad (5.21)$$
systems
nates as
$$dx = B\, dW_x \qquad (5.22)$$
$$dy = B\, dW_y \qquad (5.23)$$
ses for
where Wx and W
How would we write
As a brief prelude,
Continuous
Random
the same
observe
ariables
467
Process
in
that for
a Particle
starting
(2 (vv2)
and1B
at the
origin,
the'
4Dt
Returning to the
specific
motion in
any number
d of diquestion at
ordinate first and keep
in mind
hand, consider
that we
the
may need
to keep radialcoterms up to
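$$dr = \frac{\partial r}{\partial x} dx + \frac{\partial r}{\partial y} dy + \frac{1}{2} \frac{\partial^2 r}{\partial x^2} dx^2 + \frac{\partial^2 r}{\partial x \partial y} dx\, dy + \frac{1}{2} \frac{\partial^2 r}{\partial y^2} dy^2$$
Here all the partials can be evaluated from the formula $r = \sqrt{x^2 + y^2}$. Now using the SDEs and noting that $dx^2 = dy^2 = B^2\, dt$, $dx\, dy = 0$, we have that
$$dr = \cos\theta\, B\, dW_x + \sin\theta\, B\, dW_y + \frac{B^2}{2r}\, dt = \frac{B^2}{2r}\, dt + B\, dW_r \qquad (5.24)$$
Letting $B^2 = 2D$ we find that $\langle r^2 \rangle = 4Dt$ in two dimensions, as we should.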
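$$d\theta = \frac{\partial \theta}{\partial x} dx + \frac{\partial \theta}{\partial y} dy + \frac{1}{2} \frac{\partial^2 \theta}{\partial x^2} dx^2 + \frac{\partial^2 \theta}{\partial x \partial y} dx\, dy + \frac{1}{2} \frac{\partial^2 \theta}{\partial y^2} dy^2$$
Evaluating derivatives we find
$$d\theta = \frac{B}{r^2} (-y\, dW_x + x\, dW_y)$$
By symmetry there cannot be any drift: changes in $\theta$ of either sign must be equally likely. Using (5.12) again we can replace $-y\, dW_x + x\, dW_y$ with $r\, dW_\theta$, giving
$$d\theta = \frac{B}{r}\, dW_\theta \qquad (5.25)$$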
Example 5.2: Average properties from sampling
Often we are interested in an "average" property of the model rather than a single realization of the stochastic equation. Consider again the random walk model of the diffusion process on the plane, (5.22)-(5.23). Simulate the process and compute an estimate of the mean square displacement versus time.
Solution
(5.27)
[Figure 5.4: The mean square displacement versus time $k$.]
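The squared displacement of the $i$th particle is given by
$$r_i^2(k) = x_i^2(k) + y_i^2(k) \qquad (5.28)$$
The mean square displacement over $n$ particles is given by
$$\langle r^2 \rangle(k) = \frac{1}{n} \sum_{i=1}^{n} r_i^2(k) \qquad (5.29)$$
For $n$ large, the estimate above approaches
$$\langle r^2 \rangle(k) = 4Dk$$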
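The sampling estimate described in this example can be sketched as follows. This is a minimal illustration rather than the authors' code: it applies the Euler-Maruyama step to (5.22)-(5.23) with $B^2 = 2D$, and the diffusivity, step size, particle count, and function name are assumptions.

```python
import numpy as np

def mean_square_displacement(n_particles, n_steps, D, dt, rng):
    """Euler-Maruyama for dx = B dWx, dy = B dWy with B**2 = 2D; returns <r^2>(k)."""
    B = np.sqrt(2.0 * D)
    steps = B * np.sqrt(dt) * rng.normal(size=(n_steps, n_particles, 2))
    pos = np.cumsum(steps, axis=0)   # positions after steps 1..n_steps
    r2 = (pos**2).sum(axis=2)        # squared displacement of each particle
    return r2.mean(axis=1)           # average over particles

rng = np.random.default_rng(3)
D, dt, n_steps = 2.0, 1.0, 1000
msd = mean_square_displacement(500, n_steps, D, dt, rng)

k = np.arange(1, n_steps + 1)
# Least-squares slope of a line through the origin, to compare with 4D.
slope = np.sum(k * msd) / np.sum(k * k) / dt
print(f"fitted slope {slope:.2f} vs 4D = {4 * D:.2f}")
```

With a few hundred particles the fitted slope typically lands within several percent of $4D$; the scatter shrinks as the number of particles grows.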
5.2.4 Fokker-Planck Equation
There are two ways to think about solving an SDE. We can find particular trajectories; this is what the Euler-Maruyama scheme above will do. We can also consider the evolution of the probability density $p(x, t)$. In considering the integrated white-noise and Wiener processes, we observed the connection between the evolution of $p(x, t)$ and the diffusion equation. The Wiener process is the solution to $dx = dW$. Because its trajectories satisfy $x(t) - x(0) \sim N(0, t)$, the density $p(x, t)$ for a trajectory starting at $x = x_0$ is a solution to the transient diffusion equation
$$\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2}, \qquad p(x, 0) = \delta(x - x_0) \qquad (5.30)$$
This can be rewritten as
f(x) p(x,t)
t dx
2 x 2
dx
p(x, t)
2 1
x2 2
(5.31)
canbe
p(x,
t)
A(x, t)p(x,t) -
(5.32)
2
d(x x')
dt
t=t'
(5.33)
deterministic case)
d(x x')
dt
The probability density must integrate to unity
$$\int_{-\infty}^{\infty} p(x, t)\, dx = 1 \qquad (5.34)$$
(5.35)
is
there is an important
A = v, but
differs
with
The
the FPE,
for
gousto that
not equivalent to the (gradient) diffus
is
it
position,
transport equation. Exercise 5.2
varies
the
FPE
the
appears in
detail.
D that
in further
differences
analysis to an Il-vector random process
the
these
generalize
... , n. The SDE and FPEs for this
Wealso can
Xi, L -- 1, 2,
in
first term
X, with
the flux
components
are
dXi = Ai(x,
Xixj
t
of all components
function
a
is
Herep
(Dij(x, t)p)
(5.36)
(5.37)
FORMULA
MENSIONALIT
+ BLkBJk
df(x)= At
Xixj
ijdwj
(5.38)
As in the scalar case, probability is conserved
(5.39)
dx = A x(t)dt + B(x, t) dw
(5.40)
(5.41)
with
(5.42)
This result indicates that $D$ is symmetric positive semidefinite. For numerical integration of multidimensional SDEs, the Euler-Maruyama scheme extends straightforwardly.
Example 5.3: Transport of many
particles
large number of particles, each
suspended
obeying the
in a fluid
equation
dx
are
vdt + N/bdw
in a fluid. How do
we
describe the
Solution
The probability density for an individual
For the many-particle
evolutionof
the
particle
vpx + Dpxx
concen-
evolvesas
system, we define
an n-particle
joint density
func-
through xn,
xj)fldxt
i=l
(5.44)
p(xl,...
Performingthe integral in (5.44)gives
c(x,t) =
which indicates that the linear superposition of each particle's probability of being at location $x$ produces the total concentration at $x$. If the particles are identical, $p_j(x, t) = p(x, t)$, $j = 1, \ldots, n$, this reduces to
$$c(x, t) = n\, p(x, t)$$
and therefore
$$\frac{\partial c}{\partial t} = -v\, c_x(x, t) + D\, c_{xx}(x, t)$$
The conclusion is that the concentration profile created by many noninteracting, identical particles obeys the same evolution equation as the probability density of a single particle. Averaging the behavior of many particles does not "average away" the diffusion term in the evolution equation of the total concentration $c(x, t)$. See Deen (1998, pp. 59-63).
Example 5.4: Fokker-Planck equations for diffusion on a plane
Example 5.1 introduced the stochastic differential equations for diffusion on a plane in Cartesian and polar coordinate representations. For the Cartesian representation, (5.22) and (5.23) have probability density
' eld
ser
not
factoj
x2
e dic
xY
on the
e ands
condition
- r,X2 message
p(x,y) dx dy = 1
If we rewrite this equation in polar
only a
here,
we wish
Finally,
fac
by the
Motivated
coordinates we get
guess that
wemight
p (r, O)
and
rr(
not?
(5.45)
substit
g this
cc
equationin polar
2n 00
o
of the stochastic
r2
pi
p(r,
dr dO = 1
polar coordinateform
differential equations,
(5.24) and (5.25)? Why or
why
and
Solution
Equations(5.24)and
(5.25)can be
B O
dt +
O
dwr
dwe
5.3 Stochastic
Kinetics
1 BOB
probability
pp(r, O)
2
t
r2
Y2 02 P P (r, O) (5.46)
This is not the transient diffusion equation in polar coordinates. We begin to understand this difference by writing the normalization condition, (5.39)
$$\int\!\!\int p_P(r, \theta)\, dr\, d\theta = 1$$
This differs by a factor of $r$ in the integrand from the conventional area integral in polar coordinates. The reason is simple: in going from the SDE to the FPE, we did not tell Itô's formula about the geometry of area
at $t = 0$, the normalized solutions (Exercise 5.9) are
$$p = \frac{1}{4\pi D t}\, e^{-r^2/(4Dt)}$$
and
$$p_P(r, \theta, t) = \frac{r}{4\pi D t}\, e^{-r^2/(4Dt)}$$
5.3.1 Introduction, and Length and Time Scales
Our next application of interest is reaction networks and chemical kinetics taking place at small numbers of molecules. First we start with a
c = c A CB CC
We define the species vector of concentrations
,and
network with
the stoichio_
-1
-1
-1
IQ = k2CB
C = VTr (c) =
dt
ViYi(c)
(5.47)
of the atoms, we can model the electron bonds deforming continuously in time from reactants through transition states to products. We
choose instead a larger time and length scale so that each reaction that
[Figure: concentrations $c_A(t)$, $c_B(t)$, $c_C(t)$ versus time.]
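Taking the limit $\Delta t \to 0$ gives
$$\Pr(T > t) = e^{-\lambda t}$$
The Poisson process axioms and the definition of $T$'s probability distribution then give $\Pr(T \le t) = F_T(t) = 1 - e^{-\lambda t}$. Differentiating to obtain the density gives the exponential density
$$p_T(t) = \lambda e^{-\lambda t} \qquad (5.48)$$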
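As a hedged illustration of how the exponential waiting time generates a Poisson process, the sketch below accumulates exponentially distributed waiting times and counts the events that fall in $[0, t]$. The rate, time horizon, number of paths, and function name are assumed values chosen for the example. The fraction of paths with no events should be close to $e^{-\lambda t}$, and the mean count close to $\lambda t$.

```python
import numpy as np

def count_events(rate, t_end, n_paths, rng):
    """Count events of a homogeneous Poisson process on [0, t_end] for each path."""
    counts = np.zeros(n_paths, dtype=int)
    for i in range(n_paths):
        t = rng.exponential(1.0 / rate)       # time of the first event
        while t <= t_end:
            counts[i] += 1
            t += rng.exponential(1.0 / rate)  # independent waiting time to the next event
    return counts

rng = np.random.default_rng(4)
rate, t_end = 2.0, 1.5
counts = count_events(rate, t_end, 20_000, rng)

p0 = np.mean(counts == 0)
print(f"Pr(Y(t) = 0): sample {p0:.4f}, exp(-rate*t) = {np.exp(-rate * t_end):.4f}")
print(f"mean count: sample {counts.mean():.3f}, rate*t = {rate * t_end:.3f}")
```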
[Figure: sample path of a unit Poisson process.]
is the same as the probability that the first event has occurred at some time greater than $t$, or $\Pr(Y(t) = 0) = \Pr(T > t) = 1 - \Pr(T \le t)$. Therefore we have the relationships
$$\Pr(Y(t) = 0) = 1 - F_T(t) = e^{-\lambda t}$$
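We next generalize the discussion to find the probability density for the time of the second and subsequent events. Let random variable $T_2$ denote the time of the second event. We wish to compute $p_{T_2, T}(t_2, t)$. Because of the independent increments property, $p_{T_2|T}(t_2|t) = \lambda e^{-\lambda (t_2 - t)}$, and
$$p_{T_2}(t_2) = \int_0^{t_2} p_{T_2|T}(t_2|t)\, p_T(t)\, dt$$
or $p_{T_2}(t) = \lambda^2 t e^{-\lambda t}$. We can then use induction to obtain the density of the time for the $n$th event, $n > 2$. Assuming that $T_{n-1}$ has density $\lambda^{n-1} t^{n-2} e^{-\lambda t}/(n-2)!$, we have for $T_n$
$$p_{T_n}(t_n) = \int_0^{t_n} \lambda e^{-\lambda (t_n - t)}\, \frac{\lambda^{n-1} t^{n-2} e^{-\lambda t}}{(n-2)!}\, dt = \frac{\lambda^n t_n^{n-1} e^{-\lambda t_n}}{(n-1)!} \qquad (5.49)$$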
From here we can work out $\Pr(Y(t) = n)$ for any $n$. For $Y(t)$ to be $n$ at time $t$, we must have time $T_n \le t$ and time $T_{n+1} > t$, i.e., $n$ events have occurred by time $t$ but $n + 1$ have not. In terms of the joint density, we have
$$\Pr(Y(t) = n) = \int_t^{\infty} \int_0^{t} p_{T_{n+1}, T_n}(t', t'')\, dt''\, dt'$$
$$\Pr(Y(t) = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!} \qquad (5.50)$$
follows. we
as
is
justification
is
abilityof an event during time interval [t, t + At]
for At small. We can express the nonhomogeneous process
alsoin terms of a unit Poisson process with the relation
Y(t)
Pr(t ST St + At) =
+ At)
= Pr(Y(z(t + At)
YR(t) > 0)
z(t)) > 0)
(s)ds) = 0)
(s)ds
(s)ds
process $Y_i$ with intensity $r_i$. Note that this assignment gives $n_r$ nonhomogeneous Poisson processes because the species numbers change with time, i.e., $r_i = r_i(X(t))$. The Poisson processes then count the number of times that each reaction fires as a function of time. Thus the Poisson process provides the extents of the reactions versus time. From these extents, it is a simple matter to compute the species numbers from the stoichiometry. We have that
i=l
(5.51)
pearing on both sides of the equation. This integral equation representation of the solution leads to many useful solution properties and
simulation algorithms. We can express the analogous integral equation
5.3.3 Stochastic Simulation
The random time change representation suggests a natural simulation or sampling strategy for the species numbers $X(t)$. We start with a chosen or known initial condition, $X(0)$. We then select, based on each reaction, exponentially distributed proposed times for the next reactions, $\tau_i$, $i = 1, 2, \ldots, n_r$. These exponential distributions have intensities equal to the different reaction rates, $r_i$. As mentioned previously, we obtain a sample of an exponential, $F_{\tau_i}(t) = 1 - e^{-r_i t}$, by drawing a sample of a uniformly distributed RV on $[0, 1]$, $u$, and rescaling the logarithm
$$\tau_i = -(1/r_i) \ln u_i$$
We then select the reaction with the smallest time to fire, giving
$$t_1 = \min_{i \in [1, n_r]} \tau_i \qquad i_1 = \arg\min_{i \in [1, n_r]} \tau_i$$
$$X(t_1) = X(0) + \nu_{i_1}$$
P2 s u
ik arg
-rt.
Pi = E rd r,
Note that $0 = p_0 \le p_1 \le p_2 \le \cdots \le p_{n_r} = 1$, so the set of $p_i$ are a partition of $[0, 1]$ as shown in Figure 5.8 for $n_r = 3$ reactions. The length of each interval indicates the relative rate of each of the $n_r$ reactions.
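The partition idea can be sketched in code as follows. This is a minimal illustration of a direct-method simulation, assuming the reversible network A + B ⇌ C with hypothetical rate constants `k1` and `km1` and a hypothetical initial condition; the function and variable names are not from the text. One uniform sample sets the exponential waiting time from the total rate, and a second uniform sample picks the reaction by locating its interval in the partition of $[0, 1]$.

```python
import numpy as np

def gillespie_direct(x0, nu, rate_fn, t_end, rng):
    """Direct-method stochastic simulation; nu[i] is the stoichiometric vector of reaction i."""
    t, x = 0.0, np.array(x0, dtype=int)
    times, states = [t], [x.copy()]
    while True:
        r = rate_fn(x)
        r_tot = r.sum()
        if r_tot <= 0.0:
            break                                   # no reaction can fire
        t += -np.log(1.0 - rng.random()) / r_tot    # exponential waiting time
        if t > t_end:
            break
        p = np.cumsum(r) / r_tot                    # partition points p_1, ..., p_nr
        m = min(int(np.searchsorted(p, rng.random(), side="right")), len(r) - 1)
        x = x + nu[m]                               # fire reaction m
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Hypothetical example: A + B <-> C, state ordered as (A, B, C).
nu = np.array([[-1, -1, 1], [1, 1, -1]])
k1, km1 = 0.05, 1.0

def rates(x):
    a, b, c = x
    return np.array([k1 * a * b, km1 * c], dtype=float)

rng = np.random.default_rng(5)
times, states = gillespie_direct([40, 30, 0], nu, rates, 5.0, rng)
print(len(times), states[-1])
```

Each row of `states` is the species count immediately after the event time stored in `times`, so plotting `states` against `times` as a step function gives a sample trajectory.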
pmI u pm
fires m and the time of the reaction T, we then
rli Y
falls
that
reaction umbers in the standard way
n
te
tne
Gj%ante
LIP
species
or simply the
wn as Gillespie's DIRECTMETHOD
(SSA).we
this method
$$\text{A} + \text{B} \rightleftharpoons \text{C} \qquad (5.52)$$
$t$. We seek an evolution equation governing $p(a, b, c, t)$. The probability density evolves due to the chemical reactions given in (5.52). Consider the system state $(a, b, c, t)$; if the forward event takes place,
required for
rates
the
But
system IS
p(E,t)
t
(5.54)
488
Stochastic Models
and P
forward
(30
(31
P2
the
Yo
Yl
(32 Y2
dt
pn-l
pn
Yn-l
Pn-l
(5.5S)
pn
dt
(5.56)
solution to the deterministic problem as the number of molecules increases. Figure 5.11 displays the solution to (5.55) starting with 20 A
p(E, t)
-1
time is
as the state of the system rather than the reaction extents. Given
a
system in state x e ons, reaction i with stoichiometricvectorVican
reach state x from only state x Vi,and can leave this state to reach
state x + Vi. We then have for the evolution of the probabilitydensity
dt
(5.57)
starting with
Solution to master equation for A +
5.12:
Figure
200 A molecules, 1000 B molecules and 0 C molecules,
-- 1/200, 1<-1=
nc
andmaster equation
dt
(rl (x, t) +
(x, t))px(x, t)
Stochastic Models
and
Processes
nc
, t kl (nA + l)px
k -1 (nc + 1) PX
t)
+1
nc 1
(1<1
n A + k-1nc)Px
nc
5.3.5 Microscopic, Mesoscopic, and Macroscopic Kinetic Models
Next we would like to explore how the discrete stochastic kinetic model of a microscopic system transforms into the deterministic kinetic model of a macroscopic system that is familiar to undergraduate chemical and biological engineers. Along the way, we derive a model for the regime bridging the microscopic and macroscopic levels, which is sometimes called the mesoscopic regime. Our goal is to start with the microscopic chemical master equation and take the limit as the system size becomes large. We use the system volume $\Omega$ for the size parameter. The procedure we follow is given by van Kampen (1992, pp. 244-263) and is known as the OMEGA EXPANSION. The essentials of the approach are perhaps best explained by taking a concrete (and nonlinear) example. Consider the bimolecular reaction
variable, and the rate constant $\hat{k}$ has units of $l^3/(\text{mol}\; t)$, so the rate has units of $\text{mol}/(t\; l^3)$, a rate of reaction per volume, which is also intensive. The mole balance for species A in a well-mixed system is the familiar
$$\frac{dc}{dt} = -2\hat{k}c^2 \qquad c(0) = c_0 \qquad (5.58)$$
For these same kinetics, at the small scale, we have the microscopic
chemical master equation
n(n 1)
dt
20
20
(5.59)
[Figure: panels for $\Omega = 20$, $\text{var}(\xi) = 0.0114$ and $\Omega = 200$, $\text{var}(\xi) = 0.00176$.]
(5.60)
var).
We are neglecting terms of order $\Omega^0$ and lower in the expansion of $n$ in (5.60). Thus we are expressing $n/\Omega$ as a perturbation solution in increasing powers of small parameter $\Omega^{-1/2}$. The additional complication in this case, compared to our previous perturbation examples in Chapters 2 and 3, is that we are also changing from a discrete variable $n$ to continuous variables $c$ and $\xi$.
The master equation describes the density of random variable $n$, $P(n, t)$, and we wish to deduce an evolution equation for the density of random variable $\xi$, which we denote $\Pi(\xi, t)$. And we also expect the analysis to show that the familiar differential equation (5.58) describes the deterministic variable $c$. As a transformation of random variables we are considering the two densities to be related by
$$\Pi(\xi, t) = P(c\Omega + \Omega^{1/2}\xi, t)$$
in which we suppress the dependence of $n$ on $c$. Consider $c$ to be some known function of time when expressing the transformation between the two random variables $n$ and $\xi$. Given this transformation, the partial derivatives are related by $P_t = \Pi_t + \Pi_\xi\, \xi_t$, and $\xi_t$ is found by differentiating (5.60) holding $n$ constant, which yields
$$P_t = \Pi_t - \Omega^{1/2}\, \dot{c}\, \Pi_\xi$$
hand side.
+ 2), t) in terms of
and
(5.61)
E(n+ 2) --to
n +2
its
-11 +
80-3/2
3!
2!
(5.62)
n(n 1)
(11+
+ 1) - c2Q +
+ (3C+
+ 20-1
-[4C +
+
k[c2Q1/2
+ 20 -1 ] 11+
+ (3c +
+ 30-1 + 20-3/2111;+
+ 30-3/2 + 20-2]
order $\Omega^{1/2}$. Collecting the terms of order $\Omega^{1/2}$ gives $(\dot{c} + kc^2)\Pi_\xi = 0$ and, since $\Pi_\xi \ne 0$, we deduce
$$\frac{dc}{dt} = -kc^2 \qquad (5.63)$$
which is the macroscopic equation (5.58) after noting that the usual macroscopic convention absorbs a factor of one-half into the definition of the rate constant, i.e., $\hat{k} = k/2$.
= -2kc;dt+
dw
tuation term
axis to more clearly show the evolution at early times. These two simulations display quite similar character. To compare them more quantitatively, we could compute several low-order moments of the densities
by computing sample averages over many simulations.
As a more comprehensive alternative, we compute the correspond-
no
We can obtain the density for the omega expansion by solving the PDE for $\Pi(\xi, t)$, shifting the mean by the deterministic $c(t)$, and integrating for the cumulative distribution. Or we can instead derive a corresponding evolution equation for $\xi$'s cumulative density
Il(,
Figure5.14: Simulationof 2 A
compared
to $c$, and we have the familiar deterministic macroscopic description, (5.63) or (5.58). In Figure 5.15, this limit would be observed by the two cumulative distributions approaching a step function at $x = c(t)$. See also Exercise 5.22.

Figure 5.15: Cumulative distribution $F(x, t)$ for 2A → B at $t = 1$ with $n_0 = 500$, $\Omega = 500$. Discrete master equation (steps) versus omega expansion (smooth).
There is now an extensive and rapidly growing literature on stochastic kinetics. The current and comprehensive book chapter by Anderson and Kurtz (2011) is highly recommended for those interested in an overview of most of the topics covered here as well as more advanced topics on: relevant central limit theorems for Poisson processes, martingales, and scaling and model reduction.
provide us with our first exposure to sensors, i.e., the type built in by nature. Since humans are very curious about the world, people have been hard at work for a long time augmenting the natural senses by constructing artificial or man-made sensors. Some of mankind's biggest advances in science and engineering were precipitated by a breakthrough in sensor technology, e.g., the telescope, the microscope, detectors for electromagnetic radiation outside the visible range, etc. One of the important things that we know about sensors is that they are limited and imperfect indicators of the world around us. They
We may decide in some situation that a change in a sensor's signal indicates that the system has changed. But we may decide in a different situation that the sensor signal change is due to a random effect or disturbance to the sensor itself, and the system is completely unchanged. Optimally combining these two sources of information: what the sensor tells us and the other knowledge that we have about the system's behavior, is the task of state estimation. To make these concepts precise, we consider a linear system. Let $x$ be an $n$-vector containing all the relevant information about a system of interest
$$x^+ = Ax + Bu$$
variance $P(0)$ reflects our confidence in the initial state. If we know how the system starts off, $P(0)$ is small. If we have little knowledge, we take $P(0)$ large. Recall the noninformative prior is a uniform distribution, which we can approximate by taking $P(0)$ very large. In industrial applications, the initial condition may be known with high accuracy for batch processes. But the initial condition is usually considered largely unknown when analyzing a dataset taken from a continuous process.
We require three main results concerning normals, conditional normals, and linear transformation. These follow directly from the properties of the normal established in Chapter 4, but see Exercise 5.24 for some hints if you have difficulty deriving any of these. Recall also the
exp
(x
m)
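Joint independent normals. If $p_{x|z}(x|z)$ is normal, and $y$ is statistically independent of $x$ and $z$ and normally distributed
$$p_{x|z}(x|z) = n(x, m_x, P_x) \qquad y \sim N(m_y, P_y) \quad y \text{ independent of } x \text{ and } z$$
then the conditional joint density of $(x, y)$ given $z$ is
$$p_{x,y|z}\left( \begin{bmatrix} x \\ y \end{bmatrix} \Big| z \right) = n\left( \begin{bmatrix} x \\ y \end{bmatrix}, \begin{bmatrix} m_x \\ m_y \end{bmatrix}, \begin{bmatrix} P_x & 0 \\ 0 & P_y \end{bmatrix} \right) \qquad (5.64)$$
Linear transformation of a normal. If $p_{x|z}(x|z) = n(x, m, P)$ and $y = Ax$, then
$$p_{y|z}(y|z) = n(y, Am, APA^T) \qquad (5.65)$$
Conditional of a joint normal. If the joint density of $(x, y)$ given $z$ is normal with means $m_x$, $m_y$ and covariance $\begin{bmatrix} P_x & P_{xy} \\ P_{yx} & P_y \end{bmatrix}$, then the conditional density of $x$ given $y$ and $z$ is normal
$$p_{x|y,z}(x|y, z) = n(x, m, P) \qquad (5.66)$$
in which
$$m = m_x + P_{xy} P_y^{-1} (y - m_y)$$
$$P = P_x - P_{xy} P_y^{-1} P_{yx}$$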
(5.67)
(5.68)
withknown densities
N (0, Qw)
We next derive the optimal estimator for this process. As part of this derivation, we will derive the probability densities of the state as a function of time. This is the same pattern that we followed in the first two sections on Brownian motion and stochastic kinetics. We started with the random process (Wiener and Poisson processes), and then we derived their probability density equations (Fokker-Planck and master equations). Because we have assumed a prior, the density of $x(0)$, we are using Bayesian estimation. The overall game plan is as follows. The initial state $x(0)$ is assumed normal. Our optimal estimate before measurement is denoted $\hat{x}^-(0)$. The minus sign indicates estimate before measurement. We obtain from the sensor measurement $y(0)$. We then compute the conditional density of $x(0)|y(0)$.
ment y (0) , we next obtain the conditional density PX(0)b, (0)(x (0) Iy (0)).
x(0)
y(0)
We assume that the noise v (0) is statisticallyindependent of x(0),
and use the independent joint normal result (5.64)to express the joint
3Because we have linear transformations of normals at each step of the procedure,
every density in sight will be normal,
X(0)
xo
o
(2(0) o
tne
+R
CQ(O)
Y(O)
ill
which
P = (2(0)
density
the conditional
x(0)
x(l) = [A 11
in which $w(0) \sim N(0, Q)$ is the process noise. We next calculate the conditional density
Now we require the conditional version
504
Processes
w(0)
of the linear transformation of a
We then use the conditional version of the linear transformation of a normal (5.65) to obtain the conditional density of $x(1)$ given $y(0)$, in which the mean and variance are
$$\hat{x}^-(1) = A\hat{x}(0) \qquad P^-(1) = AP(0)A^T + Q$$
We see that forecasting forward one time step may increase or decrease the conditional variance of the state. The term $AP(0)A^T$ may be smaller or larger than $P(0)$, but the process noise $Q$ always makes a positive contribution.
Given that $p_{x(1)|y(0)}$ is also a normal, we are situated to add measurement $y(1)$ and continue the process of adding measurements followed by forecasting forward one time step until we have processed all the available data. Because this process is recursive, the storage requirements are small. We need to store only the current state estimate and variance, and can discard the measurements as they are processed. The required online calculation is minor. These features make the optimal linear estimator an ideal candidate for rapid online application.
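The recursion of measurement update followed by one-step forecast can be sketched as below. This is an illustrative implementation of the standard linear optimal estimator for $x^+ = Ax + w$, $y = Cx + v$ with $w \sim N(0, Q)$, $v \sim N(0, R)$; the system matrices, noise levels, and function name are hypothetical and are not taken from the text.

```python
import numpy as np

def kalman_filter(A, C, Q, R, xhat0_minus, P0_minus, y_seq):
    """Alternate measurement update and one-step forecast over the data y_seq."""
    xm, Pm = xhat0_minus, P0_minus
    estimates = []
    for y in y_seq:
        # Measurement update: condition the prior n(x, xm, Pm) on y.
        L = Pm @ C.T @ np.linalg.inv(C @ Pm @ C.T + R)
        xh = xm + L @ (y - C @ xm)
        P = Pm - L @ C @ Pm
        estimates.append(xh)
        # Forecast: push the estimate and variance through the model.
        xm = A @ xh
        Pm = A @ P @ A.T + Q
    return np.array(estimates), P, Pm

# Hypothetical two-state system with one measured state.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.1]])

rng = np.random.default_rng(6)
x, y_seq = np.array([1.0, -1.0]), []
for _ in range(50):
    y_seq.append(C @ x + rng.normal(0.0, np.sqrt(R[0, 0]), size=1))
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)

P0 = 10.0 * np.eye(2)
estimates, P, Pm = kalman_filter(A, C, Q, R, np.zeros(2), P0, y_seq)
print(estimates[-1], np.trace(P))
```

Only the current estimate and variance are carried from one step to the next, which is the small storage requirement noted above.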
We next summarize the state estimation recursion.
y(k) = {y(0),y(1),...y(k)}
At time k the conditionaldensity with data y(k 1) is normal
and we denote the mean and variance with a superscript minus to in-
x(k)
tuna/
Estimation
State
Linear
505
1)
n dependent
flie
11
x(k) N
oise
y(k)
note
reslilt
ill
(5.66)
which
gives
x(k)
x(k+ 1) = [A I]J w(k)
x(k)
w(k)
n(x(k +
in which
AR(k)
andtherecursion is complete.
+ 1))
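$$\hat{x}(k) = \hat{x}^-(k) + L(k)\,(y(k) - C\hat{x}^-(k)) \qquad (5.69)$$
$$L(k) = P^-(k)C^T (CP^-(k)C^T + R)^{-1} \qquad (5.70)$$
$$P(k) = P^-(k) - L(k)CP^-(k) \qquad (5.71)$$
$$\hat{x}^-(k+1) = A\hat{x}(k) \qquad (5.72)$$
$$P^-(k+1) = AP(k)A^T + Q \qquad (5.73)$$
The conditional densities are
$$p_{x(k)|\mathbf{y}(k)}(x(k)|\mathbf{y}(k)) = n(x(k), \hat{x}(k), P(k))$$
$$p_{x(k+1)|\mathbf{y}(k)}(x(k+1)|\mathbf{y}(k)) = n(x(k+1), \hat{x}^-(k+1), P^-(k+1))$$
At steady state the variances satisfy
$$P_s = P_s^- - P_s^- C^T (CP_s^- C^T + R)^{-1} CP_s^-$$
$$P_s^- = AP_sA^T + Q$$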
Linear State
ti1a/
tile
flie
steady-s
Estimation
507
ps from
tate filter
(5.72)giving
and
(5.69)
Jfli11g s-(k + 1) = AR-(k) +
- CR-(k))
many,
allyhave
sometimes
objectives. Optimality is
observability.
508
Stochastic
Models
and P
ocess
states
th
ances
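$$x^+ = Ax$$
$$y = Cx$$
with initial condition $x(0) = x_0$. The solution for the state is $x(k) = A^k x_0$, and the output is therefore
$$y(k) = CA^k x_0 \qquad (5.74)$$
The system is observable if there exists a finite $N$, such that for every $x_0$, $N$ measurements $\{y(0), y(1), \ldots, y(N-1)\}$ distinguish uniquely the initial state $x_0$. As shown in Exercise 5.26, if we cannot determine the initial state using $n$ measurements, we cannot determine it using $N > n$ measurements. We can therefore develop a convenient test for observability as follows. For $n$ measurements, the system model gives
$$\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(n-1) \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} x_0 \qquad (5.75)$$
$$\mathcal{O} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \qquad (5.76)$$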
the system (A,
Therefore,
C) is
we
1981, P.58).
present
'til
first-order
liquid-P
volumetric
f is the manipulatedvariable,
of A in the feed CA
and
concentration
0. Let
down the mass balances for species A and B and show that
Write
(a)
= Acx + Bcu
and Bc for this problem?
Whatare matrices Ac
Solution
dt CB
-(F/V+k2)
(F/v
+ kl)
CB
CAf
dx
giving
dt
x + = Ax
+ 1) x(k)
At
y
Ccx
0
(At)kl
(At)kl 1-
1<2)
which has rank two for all sample times $\Delta t > 0$. Since $\text{rank}(\mathcal{O}) = n$, the system is observable.
The answers are different because measuring A tells us how much total B we have produced, but we have no information about how much B was present initially nor how much was consumed to produce C. Therefore we cannot reconstruct the B concentration from the model and the A concentration. Measuring species B, however, provides information about how much A is in the reactor, because the A concentration affects the production rate of B. The B measurement information plus the mass balances enable us to reconstruct the A concentration. The value of the rank condition of the observability matrix is that it makes rigorous this kind of physical intuition and reasoning.
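The rank test can be checked numerically with a short sketch. The model below assumes a well-mixed reactor with series reactions A → B → C, state $x = (c_A, c_B)$, and a forward-Euler discretization $A = I + \Delta t\, A_c$; this form is inferred from the example, and the parameter values `F_over_V`, `k1`, `k2`, and `dt` are hypothetical.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) for an n-state system."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

# Assumed parameter values, for illustration only.
F_over_V, k1, k2, dt = 1.0, 2.0, 0.5, 0.1
Ac = np.array([[-(F_over_V + k1), 0.0],
               [k1, -(F_over_V + k2)]])
A = np.eye(2) + dt * Ac                 # forward-Euler discretization

C_measure_A = np.array([[1.0, 0.0]])    # sensor reads the A concentration
C_measure_B = np.array([[0.0, 1.0]])    # sensor reads the B concentration

rank_A = np.linalg.matrix_rank(observability_matrix(A, C_measure_A))
rank_B = np.linalg.matrix_rank(observability_matrix(A, C_measure_B))
print(rank_A, rank_B)
```

Measuring only A leaves a zero second column in the observability matrix, so its rank is one; measuring only B gives rank two, in line with the physical argument above.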
4 Improving the numerical approximation does not change the observability analysis that follows.
opt/
al
Linear
State
51 J
Estimation
Estimator
Optimal
of an
desirable characteristic,but systems engiStability
one
is
filter
such as stability. Stability
a
ut other characteristics
of
to illustrate
AR-(k)
w(k)
+
1) Ax(k)
y(k)
system measurement
the
substitutinggives
terms
billing
- CR-(k))
Cx(k) + v (k) and com-
$$\tilde{x}(k+1) = (A - AL_sC)\,\tilde{x}(k) + w(k) - AL_s v(k)$$
Estimator stability is the question of whether $(A - AL_sC)$ is a stable matrix, i.e., has all its eigenvalues inside the unit circle. We have the following theorem covering the stability of the steady-state estimator.
Theorem 5.8 (Riccati iteration and estimator stability). Given $(A, C)$ observable, $Q > 0$, $R > 0$, $P^-(0) \ge 0$, and the discrete Riccati equation
Then
1-+00
is a stable matrix.
Bertsekas (1987, pp. 59-64) provides a proof of the "dual" of this theorem, which can be readily translated to this case.
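The behavior described by the theorem can be explored numerically. The sketch below, under assumed values, iterates the discrete Riccati recursion from two different initial variances for a hypothetical observable pair $(A, C)$ with an unstable $A$, then checks that both runs reach the same limit and that $A - AL_sC$ has its eigenvalues inside the unit circle. The matrices and function name are illustrative.

```python
import numpy as np

def riccati_iterate(A, C, Q, R, P0_minus, n_iter=500):
    """Iterate P-(k+1) = A P(k) A' + Q with P(k) = P-(k) - L(k) C P-(k)."""
    Pm = P0_minus
    for _ in range(n_iter):
        L = Pm @ C.T @ np.linalg.inv(C @ Pm @ C.T + R)
        Pm = A @ (Pm - L @ C @ Pm) @ A.T + Q
    L = Pm @ C.T @ np.linalg.inv(C @ Pm @ C.T + R)   # gain at the limit
    return Pm, L

# Hypothetical observable system; A has an eigenvalue outside the unit circle.
A = np.array([[1.1, 0.2], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
Q, R = 0.1 * np.eye(2), np.array([[1.0]])

Pm_small, L_small = riccati_iterate(A, C, Q, R, np.zeros((2, 2)))
Pm_large, L_large = riccati_iterate(A, C, Q, R, 100.0 * np.eye(2))

spectral_radius = np.max(np.abs(np.linalg.eigvals(A - A @ L_small @ C)))
print(np.allclose(Pm_small, Pm_large), spectral_radius)
```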
So what is the payoff for knowing how to design a stable, optimal estimator? Assume we have developed a linear empirical model for a chemical process describing its normal operation around some nominal steady state. After some significant unmeasured process disturbance, we have little knowledge of the state. So we take initial variance $P^-(0)$ to be large (the noninformative prior). Figure 5.16 shows the evolution of our 95% confidence intervals for the state as time increases and we obtain more measurements. We see that the optimal estimator's confidence interval returns to its steady-state value after about only
513
state estimation
control
optimal
Ho, 1975; Stengel, 1994). The moving
Many
problem (Bryson and
estimation
estimati0n
horizonsystem nonlinearity and constraints, is presented by Rawlings
address
ch. 4).
and
(2009,
5.5 Exercises
Exercise 5.1: Random walk with the uniform distribution
Consider again a discrete-time random walk simulation
(5.77)
x(k+l)
otherwise
(a) Calculate a trajectory for this random walk in the plane and compare to Figure 5.3 for the normal distribution.
(b) Calculate the mean square displacement for 500 trajectories and compare to Figure 5.4 for the normal distribution.
(x t)
steps?
distribute
d
and
= (v(x,t)c) +
(D(x, t)c)
x
in which we consider x, v, and D scalars. The first is derived from conservation
with a flux law defined by N = Dc/x. The second is the Fokker-Planck of mass
equation
dx = v(x,t)dt+
2D(x,t) dw
(a) Show that when the diffusivity D (x, t) does not depend on x, these two
models
are equivalent and D(t) = D(t).
x
and find the expression for i' (x, t).
and
c
515
5,
ox
r(s)l s E)
Multidimensional
5.5:
formula to derive
(5.33)and (5.34).
s
UseIto
dimensional form of It's formula, (5.38), for an SDEin the form
multi
the
perivge
dXi = Ai(x, t)dt + Bij(x, t)dWj
(b)
Recall
(c) Use
(5.20).
Lila
this form
d(Xi
X; )
dt
t=t'
d(Xi x'i)
dt
t=t'
- 2Dij(X',t')
= Ai(x', t')
consider the
Wewishto
diffusion
= DV 2c
t
to an impulse source term at t = 0, c(x, 0) =
calculatethe response c(x, t)
(x).
(5.78)
Breakthe problem into two parts and solve the differential equation for x > 0
andx < 0. You have four unknown constants at this point.
Stochastic Modelsa
tid
516
d(x = 0+,s)
dx
(e) Use this jump condition to find
(x,s) valid for all x.
(t) Invert this transform and show
1
Processes
d(x = 0-,s)
dx
-x2/(4Dt)
0 < t,
e
c(x,t) = 2 7TDt
full transfo
rth
00< x < 00
(5.79)
(g) Compute the mean square displacement for this concentration profile.
p(x,t) =c(x,t)
(c) Plot the histogram of particle locations at $t = 1000$ for 1000 particles. On the same plot, compare this histogram to the analytical result given in (5.79). Describe any differences.
(5.80)
c
-
= Y rD
t rr
r 0 < t, 0 < r < 00
We wish to calculate the response c(r,t) to an impulse source term at t
erases
5
tile
1 d dC(r,s)
(3)Rdes0W
-(r)
dr
D r dr
(5.81)
d(r, s)
jump
Usethis
a condition on
d(r s)
r
lim
r-0-
(r,s) = 2TTD
5.10: Diffusion
Exercise
transform
the diffusion
consider
coordinates
c
= A @r 2D
t r 2 r
0 < t,
0 < r < 00
r2dr
s(r,s) = (r)
4Trr2
(5.82)
lim r2
lim r2
518
Stochastic Models
and P
rocesses
(e) Use this jump condition to find the second constant and obt
ain the
valid for all r.
full
transfo
rttl
-Y2/(4Dt)
Consideragain
2p
2p
y2
X2
p(x,y, t) = (x)(y)
t =0
47TDt
(5.83)
x
1
47TDt
519
erases
Nest
ave
define
random
new
a
variable,
for
tan-I (y/x)
17
Y
df-l(r7)
cos O
sin O
r sin O
r cos O
r cos O
r sin O
f-l(n)
7
21)t
are both well-defined probability densities (positive, normalized).
these
Notethat the probability density of the pair of random variables (r, O),and the
Thefirst is marginal density of the random variable r for particles undergoing
secondis the motion.
the
Brownian
that discrete
Given
parameter
0 1,.. ., and
py(n) =
a e R
0, show that
var(Y) = a
(t)dt
(t)dt PTn+1
Substitute
(5.49)and use integration by parts to show (5.50)
(t)7t At
Pn(y)=
(y))
l. see (4.23).
Stochastic Models
and
520
Processes
Let random variable $u$ be distributed uniformly on $[0, 1]$. Define random variable $\tau$ by the transformation
$$\tau = -\frac{1}{\lambda} \ln u$$
Show that $\tau$ has density
$$p_\tau(t) = \lambda e^{-\lambda t}$$
Thus uniformly distributed random samples can easily be transformed to exponentially distributed random samples as required for simulating Poisson processes.
A+B2C
(5.84)
(b) What is the dimension of the state vector in terms of the initial
numbersof
aA+bB=cC+dD
(a) Write out the two reaction probabilities $h_i(n_j)$, $i = 1, -1$, considering the forward and reverse reactions as separate events.
(b) Compare these to the deterministic rate laws $r_i(c_j)$, $i = 1, -1$, for the forward and reverse reactions considered as elementary reactions. Why are these expressions different? When do they become close to being the same?
521
5,
5.1B
of }ter
ea
ise
reaction
irreversible
simple
A h B
Te
C tile
of
evolution
r = kitA
e the
A's
mean of
probability density by
$\langle n_A(t) \rangle = \sum_{n_A} n_A \, p(n_A, t)$
[...] definition and the evolution of the probability density, write an evolution equation for $\langle n_A(t) \rangle$. The probability density itself should not appear in the evolution equation for the mean.
the usual mass action kinetics
Exercise 5.20: Stochastic simulation
Consider the reversible, second-order reaction
$A + B \rightleftharpoons C \qquad r = k_1 c_A c_B - k_{-1} c_C$
(a) Solve the deterministic [...] with
$k_1 = 1$ L/mol·min, $k_{-1} = 1$ min$^{-1}$
$c_A(0) = 1$ mol/L, $c_B(0) = 0.9$ mol/L, $c_C(0) = 0$ mol/L
Plot the A, B, and C concentrations out to t = 5 min.
(b) Compare the result to a stochastic simulation using an initial condition of 400 A,
360 B and zero C molecules. Notice from the units of the rate constants that $k_1$
should be divided by 400 to compare simulations. Figure 5.17 is a representative
comparison for one sequence of pseudorandom numbers.
(c) Repeat the stochastic simulation for an initial condition of 4000 A, 3600 B, zero C
molecules. Remember to scale $k_1$ appropriately. Are the fluctuations noticeable
with this many starting molecules?
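One way to set up the stochastic simulation in part (b) is sketched below. This is a minimal illustration of a reaction-by-reaction (Gillespie-type) simulation, not a reference solution: the function name gillespie_reversible, its arguments, and the random seed are assumptions made for the example. The forward rate constant is divided by the initial number of A molecules, following the scaling noted in part (b).

```python
import math
import random

def gillespie_reversible(nA, nB, nC, k1, km1, t_final, rng):
    """Simulate A + B <-> C one reaction event at a time.

    k1 is the forward constant already scaled for molecule counts;
    km1 is the first-order reverse constant.
    Returns the event times and the (nA, nB, nC) state after each event.
    """
    t = 0.0
    times, states = [t], [(nA, nB, nC)]
    while True:
        h_fwd = k1 * nA * nB            # forward reaction probability
        h_rev = km1 * nC                # reverse reaction probability
        h_tot = h_fwd + h_rev
        if h_tot == 0.0:
            break                       # no reaction can occur
        # Exponentially distributed waiting time to the next event.
        t += -math.log(1.0 - rng.random()) / h_tot
        if t > t_final:
            break
        # Choose which reaction fires, in proportion to its probability.
        if rng.random() * h_tot < h_fwd:
            nA, nB, nC = nA - 1, nB - 1, nC + 1
        else:
            nA, nB, nC = nA + 1, nB + 1, nC - 1
        times.append(t)
        states.append((nA, nB, nC))
    return times, states

rng = random.Random(2)                  # fixed seed for repeatability
n0 = 400                                # molecules corresponding to 1 mol/L
times, states = gillespie_reversible(400, 360, 0, k1=1.0 / n0, km1=1.0,
                                     t_final=5.0, rng=rng)

# Convert the final molecule count of C back to a concentration in mol/L.
cC_final = states[-1][2] / n0
print("number of reaction events:", len(times) - 1)
print("final C concentration (mol/L):", cC_final)
```

For part (c), the same function can be called with 4000, 3600 and zero molecules and k1 divided by 4000. Plotting the entries of states divided by n0 against times gives trajectories that can be compared with the deterministic solution of part (a).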
6 See also Exercise 4.17 in (Rawlings and Ekerdt, 2012).
[Figure 5.17: plot of concentration ($c_C$ labeled) versus time (min).]
2
rl = klCA
(5.85)
r = 3kc2A0
But if we erase the distinctions between A and B completely and relabel the B
molecules in Figure 5.18 as A molecules, we obtain the new concentrations of A
[Figure 5.18]
r = 4kCA0
two total
Whyare these
(c) performa
parameters
nAO= 50
=
nB()= 60 nco =
kl = = = k = 10sec-1
=0
Make a plot of all species versus time. Print the plot and the simulation code.
2kcE Il) +
(kc2 n)
VAi
moleculesof
Denote the A species mean velocity (drift term) as $v_A$. The A molecule velocities are
then distributed as
t'Ai
kBT
VA,1
all i
(5.86)
Starting from the distribution (5.86), derive the following expectations in terms of the
mean species velocity $v_A$ and $k_B$, $T$, $m_A$.
1. (vAi)
2. (VAiVAi)
2 in which v
3. 'E(vAi)
4. (VAiVAi)
tor.
(a) For
(5.64), use
linear estima-
both
and divide
px,z(x,z) = n
Consider the linear transformation
Y
z
and show
that
conditional
Now use the
Amx
mz
A ox
01
z
APxAT APxz
pzxAT
with N measurements
Exercise 5.26: Observability
$x^+ = Ax$
$y = Cx$
Bibliography
Of
eti%
Academic
Inc.,
Englewood
Cliffs
Philadelphia, Applications
2009
A. E. Bryson and Y. Ho. Applied Optimal Control. Hemisphere Publishing, New York, 1975.
engineering.
A. Einstein. Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen.
C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences. Springer-Verlag, Berlin, Germany, second edition, 1990.
Theory,
March 1974.
quences.
R.
N. G. van Kampen.
Mathematical Tables
stated.
f(t)
1
2
3
af(t) + g(t)
Page
cxf(s) + (s)
105
df(t)
dt
d2f(t)
105
s 2f(s)
s f (0)
dttf(t)
n- t
-f(sj
6 tnf(t)
e-aSf(s)
eatf (t)
i
f( - l ) (0)
105
105
105
dsn
7
8
in
TableA
they are
derived or 1
first
105, 225
106
106
t')dt'
106, 223
from previous
continued
10
11
12
13
Transform Table
f(t)    $\bar{f}(s)$    page
$\lim_{t \to 0^+} f(t)$    $\lim_{s \to \infty} s\bar{f}(s)$    106, 224
$\lim_{t \to \infty} f(t)$    $\lim_{s \to 0} s\bar{f}(s)$ †    106, 224
107
113
fl(t)
(t)
113
1
107
107
s?t+l
16
18
page
107
-1
eAt A e
1
109
107
19
teat
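20    $\sin \omega t$    $\dfrac{\omega}{s^2 + \omega^2}$    107
21    $\cos \omega t$    $\dfrac{s}{s^2 + \omega^2}$    107
22    $\sinh \omega t$    $\dfrac{\omega}{s^2 - \omega^2}$
23    $\cosh \omega t$    $\dfrac{s}{s^2 - \omega^2}$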
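24    $e^{at} \sin \omega t$    $\dfrac{\omega}{(s-a)^2 + \omega^2}$
25    $e^{at} \cos \omega t$    $\dfrac{s-a}{(s-a)^2 + \omega^2}$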
P(s)
a(s)
26
p(sn)
n=l
27 E
E anit
28
e-qr
n=l
2
7Tt 3
107
107
+ (02 107
+ (02 107
a(s)
e-k$ k > 0
308
308
330
f(t)
1
29
30
e-T
erfc (2ka
ekv/
31
k (N
erfc
ekvaerfc
32
33
34
e-.f
k2
331
sinh(xv'k)
Sinh
35 1-2 E
37
e-
n=l
36
Ko(kv)
sinh(xvk)
(1 ) tl + l rn
n2Tr2t
sin(nrx)
e
nrrx
(l)tt+l
1-2 E
n (n + 1/2)TT
1-2 E
anJ1 (an)
38 2
39
n=l
sin(nrx)
e -(n27T2+k)t
s sinh s + k
333
sinh(xv)
xs Sinh
335
+ 1/2)7TX)
e
cosh(xx/)
S cosh
333
e-ant
10 (x VS)
slo(v)
sinh(as) sinh(bs)
sinhs
sinh(as) sinh(bs)
s sinhs
ani =
331
335
314
341
Statistical Distributions
Distribution    Density    Page
uniform    $p(x) = 1/(b-a), \quad x \in [a, b]$    382
I (x-m) 2
p(x)
normal
multivariate
352
p(x) = -------r-rexp
358
normal
exponential
Axxo,
p(x)
11>0
478
481
Poisson
12
441
rot/2)
chi
chi-squared
p(x) =
n/2-1e-X/2
rot/2) x
(xn)nmm
(xtt+tn)l+tn
p(x)
441
x 20, nl
x0, n,ml
442
n+l
Student'st
multivariate t
x
n
p(x) =
p(x) = (nr)P/2
Wishart
441
m))
1211/2
IR12rp(2)
e-ltr(R-lX) X > 0
412
12
Maxwell
X
p(x) = x 2e-2
Maxwell-
524
+19
Boltzmann
Table A.2: Statistical distributions defined and used in the text and
exercises.
447
524
[...] derivative [...]ed functions with respect to vectors and matrices, produce tensors having more than two indices [...] formulas that can be expressed in matrix/vector calculus. To state how the derivatives are arranged into vectors and matrices, we require a more precise notation than we used in the text. Moreover, several different and conflicting notations are in use in different fields; these are briefly described in Section A.3.1. So we state here the main results in a descriptive notation, and expect the reader can translate these results into the conventions of
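$\frac{ds}{dx} = \begin{bmatrix} \partial s/\partial x_1 \\ \partial s/\partial x_2 \\ \vdots \\ \partial s/\partial x_n \end{bmatrix} \qquad \frac{ds}{dx^T} = \begin{bmatrix} \partial s/\partial x_1 & \partial s/\partial x_2 & \cdots & \partial s/\partial x_n \end{bmatrix}$
scalar-vector derivative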
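$\frac{ds}{dA} = \begin{bmatrix} \partial s/\partial A_{11} & \cdots & \partial s/\partial A_{1n} \\ \partial s/\partial A_{21} & \cdots & \partial s/\partial A_{2n} \\ \vdots & & \vdots \\ \partial s/\partial A_{m1} & \cdots & \partial s/\partial A_{mn} \end{bmatrix}$
scalar-matrix derivative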
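$\frac{df}{dx^T} = \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\ \vdots & \vdots & & \vdots \\ \partial f_m/\partial x_1 & \partial f_m/\partial x_2 & \cdots & \partial f_m/\partial x_n \end{bmatrix}$
vector-vector derivative
(Jacobian matrix)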
xn
X2
df
dx
$(a, b) = a^T b = \sum_{i=1}^{n} a_i b_i, \qquad a, b \in \mathbb{R}^n$
atrices as
follows
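$ds = \left( \frac{ds}{dx}, dx \right) = \frac{ds}{dx^T}\, dx$    scalar-vector
$ds = \left( \frac{ds}{dA}, dA \right) = \mathrm{tr}\left( \frac{ds}{dA^T}\, dA \right)$    scalar-matrix
$df = \frac{df}{dx^T}\, dx$    vector-vector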
with
But notice the similarity of the vector chain rule with the second equalities of the two scalar chain rules. Because of this similarity, all three
important versions of the chain rule are easy to remember using this
notation. There is no chain rule for matrix-valued functions that does
not involve tensors.
Finally, we collect here the different matrix and vector differentiation formulas that have been used in the text and exercises. These are
summarized in Table A.3, with a reference to the page in the text where
they are first mentioned or derived.
vector
and
Derivative
cis
(chainrule l)
dx
ds
--dA)
ds tr
df
Page
Formula
ds dxT
535
Derivatives
trix
Ma
(chainrule 3)
----dx
dxT
g
lix
(chain rule 2)
dgT
(product rule)
+ f
dx
5    $\frac{d}{dx}\, x^T b = b$
6    $\frac{d}{dx^T}\, b^T x = b^T$
$\frac{d}{dx}\, x^T A x = Ax + A^T x$
dx
10
p(A) =
dt
q() = p()
q() =
14
In(detA)
15
$\frac{d}{dA}\,\mathrm{tr}(AB) = \frac{d}{dA}\,\mathrm{tr}(BA) = B^T$
16
= (A-I ) T, detA
328
431
detA
13
328
431
431
continued from previous page
Derivative Formula
Page
18    $\frac{d}{dA}\,\mathrm{tr}(ABA^T) = A(B^T + B)$
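A differentiation formula of this kind can be checked numerically by comparing it with finite differences. The sketch below does this for the derivative of tr(A B A^T) with respect to A, arranging the numerical derivative so that entry (i, j) is the derivative with respect to A_ij. The helper names and the test matrices are illustrative assumptions; only the Python standard library is used.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def s_of_A(A, B):
    """Scalar function s(A) = tr(A B A^T)."""
    M = matmul(matmul(A, B), transpose(A))
    return sum(M[i][i] for i in range(len(M)))

def fd_derivative(A, B, h=1e-4):
    """Central-difference estimate of ds/dA, arranged with the shape of A."""
    G = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += h
            Am[i][j] -= h
            G[i][j] = (s_of_A(Ap, B) - s_of_A(Am, B)) / (2.0 * h)
    return G

# Illustrative matrices: A is 2 x 3, B is 3 x 3 and deliberately nonsymmetric.
A = [[1.0, 2.0, 0.5],
     [-1.0, 0.0, 3.0]]
B = [[2.0, 1.0, 0.0],
     [0.0, 1.0, -1.0],
     [4.0, 0.5, 1.0]]

B_plus_BT = [[B[i][j] + B[j][i] for j in range(3)] for i in range(3)]
formula = matmul(A, B_plus_BT)          # A (B^T + B)
numeric = fd_derivative(A, B)

max_err = max(abs(formula[i][j] - numeric[i][j])
              for i in range(2) for j in range(3))
print("largest difference between formula and finite differences:", max_err)
```

Because the scalar function is quadratic in A, the central difference is exact apart from rounding, so the two results should agree closely. The same pattern can be reused for the other entries by changing s_of_A and the formula line.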
denoted df/dx.
literature
dx
dx
optimizationconvention
Given that $ds/dx$ is a row vector in the optimization notation, the first
scalar chain rule reads
ds =
ddSx)T
ds
-
dx
dx
optimization convention
test
So th
optimization convention
ds dA)
Notice the inconsistency: the scalar-matrix chain rule contains a transpose and the scalar-vector and vector-vector chain rules do not. The burden rests on the reader to recall these forms of the chain rule and remember which ones require the transpose. The advantage of the notation used in this section is that all chain rules appear with a transpose, which is what one might anticipate due to the chain rule's required summation over an index. Also, in the notation used in this section, the $\nabla$ operator is identical to $d/dx$ and neither implies a transpose should be taken. Finally, there is no hint
optimization
vs or x
In Cartesian
coordinates
x i
co
Xi
or
where
( Vf)ij =
x ij
Xi
so $\nabla f$ is the transpose of the Jacobian. Therefore, the chain rule becomes
$df = (\nabla f)^T dx = dx \cdot \nabla f$
Consistent with this notation, one can write the Taylor-series expansion of a vector field $f$ around the origin as
$f(x) = f(0) + x \cdot \nabla f + \frac{1}{2}\, xx : \nabla\nabla f + \cdots$
f (x) = f (O)
x + K: xx
where
XJ
Kijk
XjXk
A.4 Exercises
Exercise A.1: Simple and repeated zeros
Assume all the zeros of $q(s)$ are first-order zeros, $r_n = 1$, $n = 1, 2, \ldots, m$, in entry 27
of Table A.1, and show that it reduces to entry 26.
Exercise A.2: Deriving the Heaviside expansion theorem for repeated roots
Establish the Heaviside expansion theorem for repeated roots, entry 27 in Table A.1.
Hints: Close the contour of the inverse transform Bromwich integral in (2.7) to the
left side of the complex plane. Show that the integral along the closed contour except
for the Bromwich line goes to zero, leaving only the residues at the singularities, i.e.,
the residues are the coefficients $a_{in}$ given in the expansion formula. Note that this
procedure remains valid if there are an infinite number of poles, such as the case with
a transcendental function for $q(s)$.
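For the simple-zero case, the expansion can be evaluated with a few lines of code. The sketch below assumes the simple-zero expansion takes the standard form f(t) = sum over n of p(s_n)/q'(s_n) exp(s_n t), with real, distinct zeros s_n of q(s). The helper names and the example polynomials are illustrative choices.

```python
import math

def polyval(coeffs, s):
    """Evaluate a polynomial (coefficients listed highest power first) at s."""
    result = 0.0
    for c in coeffs:
        result = result * s + c
    return result

def polyder(coeffs):
    """Coefficients of the derivative of a polynomial."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def heaviside_simple(p, q, zeros, t):
    """Inverse transform of p(s)/q(s) at time t, assuming simple real zeros of q."""
    dq = polyder(q)
    return sum(polyval(p, sn) / polyval(dq, sn) * math.exp(sn * t)
               for sn in zeros)

# Illustrative example: p(s) = s + 3, q(s) = (s + 1)(s + 2) = s^2 + 3 s + 2.
p = [1.0, 3.0]
q = [1.0, 3.0, 2.0]
zeros = [-1.0, -2.0]

t = 0.7
f_formula = heaviside_simple(p, q, zeros, t)

# Partial fractions give (s + 3)/((s + 1)(s + 2)) = 2/(s + 1) - 1/(s + 2).
f_partial = 2.0 * math.exp(-t) - math.exp(-2.0 * t)

print("expansion formula:", f_formula)
print("partial fractions:", f_partial)
```

The two values should coincide, since each coefficient p(s_n)/q'(s_n) is the residue of p(s)/q(s) at the simple pole s_n.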
Exercise A.3: Laplace transform relations
[...] limit k [...] in entry 34 of Table A.1 and show that it produces entry 35.
--bTx = b
dx
x Tb
bT
[...] transposing the scalar numerators in entries 5 and 6, respectively [...] do not find companion forms for these listed in Table A.3 [...] and show that simply replacing b with general matrix B above does not generate correct formulas
d BTx
ax
xTB
(b) On the other hand, show that Formulas 17 and 18 are equivalent by taking transposes of one of them to produce the other one.