Determinant
Norms
[Figure: floating-point number format with labeled sign, mantissa, and base.]
The relative error in $\hat{x}$ approximating $x$ is $\frac{|\hat{x} - x|}{|x|}$.
= mach |x+y|
|xy|
A problem instance is ill conditioned if the solution is sensitive to perturbations of the data. For example, $\sin 1$ is well conditioned, but $\sin 12392193$ is ill conditioned.
Suppose we perturb $Ax = b$ by $(A + E)\hat{x} = b + e$ where $\frac{\|E\|}{\|A\|} \le \delta$ and $\frac{\|e\|}{\|b\|} \le \delta$. Then $\frac{\|\hat{x} - x\|}{\|x\|} \le 2\delta\kappa(A) + O(\delta^2)$, where $\kappa(A) = \|A\|\,\|A^{-1}\|$ is the condition number of $A$.
1. $\forall A \in \mathbb{R}^{n \times n}$, $\kappa(A) \ge 1$.
2. $\kappa(I) = 1$.
3. For $\gamma \ne 0$, $\kappa(\gamma A) = \kappa(A)$.
4. For diagonal $D$ and all $p$, $\|D\|_p = \max_{i=1..n} |d_{ii}|$. So, $\kappa(D) = \frac{\max_{i=1..n} |d_{ii}|}{\min_{i=1..n} |d_{ii}|}$.
If $\kappa(A) \ge \frac{1}{\epsilon_{mach}}$, $A$ may as well be singular.
An algorithm is backwards stable if in the presence of roundoff error it returns the exact solution to a nearby problem instance.
GEPP solves $Ax = b$ by returning $\hat{x}$ where $(A + E)\hat{x} = b$, with $\frac{\|E\|}{\|A\|} \le c_n \epsilon_{mach} + O(\epsilon_{mach}^2)$, where $c_n$ is worst case exponential in $n$, but in practice almost always low order polynomial.
Combining stability and conditioning analysis yields $\frac{\|\hat{x} - x\|}{\|x\|} \le c_n \kappa(A)\epsilon_{mach} + O(\epsilon_{mach}^2)$.
Gaussian Elimination
GE produces a factorization $A = LU$, GEPP $PA = LU$.
Plain GE
  for k = 1 to n-1 do
    if a(k,k) = 0 then stop
    l(k+1:n, k) = a(k+1:n, k) / a(k,k)
    a(k+1:n, k:n) = a(k+1:n, k:n) - l(k+1:n, k) a(k, k:n)
  end for
GEPP
  for k = 1 to n-1 do
    gamma = argmax over i in {k,...,n} of |a(i,k)|
    a([gamma,k], k:n) = a([k,gamma], k:n)
    l([gamma,k], 1:k-1) = l([k,gamma], 1:k-1)
    p(k) = gamma
    l(k:n, k) = a(k:n, k) / a(k,k)
    a(k+1:n, k:n) = a(k+1:n, k:n) - l(k+1:n, k) a(k, k:n)
  end for
Backward Substitution
  x = zeros(n, 1)
  for j = n to 1 do
    x(j) = (w(j) - u(j, j+1:n) x(j+1:n)) / u(j,j)
  end for
To solve $Ax = b$, factor $A = LU$ (or $PA = LU$), solve $Lw = b$ (or $Lw = \hat{b}$ where $\hat{b} = Pb$) for $w$ using forward substitution, then solve $Ux = w$ for $x$ using backward substitution. The complexity of GE and GEPP is $\frac{2}{3}n^3 + O(n^2)$.
GEPP encounters an exact 0 pivot iff $A$ is singular.
For banded $A$, $L + U$ has the same bandwidths as $A$.
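The GEPP and backward substitution steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `gepp_solve` is hypothetical, and instead of storing $L$ and $P$ explicitly it eliminates on an augmented matrix $[A \mid b]$, which applies the same row operations to the right-hand side.

```python
def gepp_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (sketch)."""
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are left untouched.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n - 1):
        # Pivot row: the row at or below k with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("exact zero pivot: A is singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            l_ik = M[i][k] / M[k][k]          # multiplier l(i,k)
            for j in range(k, n + 1):
                M[i][j] -= l_ik * M[k][j]
    if M[n - 1][n - 1] == 0.0:
        raise ValueError("exact zero pivot: A is singular")
    # Backward substitution on the resulting upper triangular system.
    x = [0.0] * n
    for j in range(n - 1, -1, -1):
        s = sum(M[j][c] * x[c] for c in range(j + 1, n))
        x[j] = (M[j][n] - s) / M[j][j]
    return x


# The (1,1) entry is zero, so plain GE would stop; pivoting swaps rows first.
x = gepp_solve([[0, 2, 1], [1, 1, 1], [2, 1, 3]], [5, 6, 14])
```

The example system was built from the solution (2, 1, 3), so `x` should agree with it up to roundoff.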
Orthogonal Matrices
For $Q \in \mathbb{R}^{n \times n}$, these statements are equivalent:
1. $Q^T Q = QQ^T = I$ (i.e., $Q$ is orthogonal)
2. $\|\cdot\|_2 = 1$ for each row and column of $Q$. The inner product of any row (or column) with another is 0.
3. For all $x \in \mathbb{R}^n$, $\|Qx\|_2 = \|x\|_2$.
A matrix $Q \in \mathbb{R}^{m \times n}$ with $m > n$ has orthonormal columns if the columns are orthonormal, and $Q^T Q = I$.
The product of orthogonal matrices is orthogonal.
For orthogonal $Q$, $\|QA\|_2 = \|A\|_2$ and $\|AQ\|_2 = \|A\|_2$.
QR-factorization
For any $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, we can factor $A = QR$, where $Q \in \mathbb{R}^{m \times m}$ is orthogonal, and $R = [\, R_1 \;\; 0 \,]^T \in \mathbb{R}^{m \times n}$ is upper triangular. $\mathrm{rank}(A) = n$ iff $R_1$ is invertible.
$Q$'s first $n$ (or last $m - n$) columns form an orthonormal basis for $\mathrm{span}(A)$ (or $\mathrm{nullspace}(A^T)$).
A Householder reflection is $H = I - \frac{2vv^T}{v^T v}$. $H$ is symmetric and orthogonal. Explicit H.H. QR-factorization is:
  for k = 1 : n do
    v = A(k:m, k) ± ||A(k:m, k)||_2 e_1
    A(k:m, k:n) = (I - 2 v v^T / (v^T v)) A(k:m, k:n)
  end for
We get $H_n H_{n-1} \cdots H_1 A = R$, so then, $Q = H_1 H_2 \cdots H_n$.
This takes $2mn^2 - \frac{2}{3}n^3 + O(mn)$ flops.
Givens requires 50% more flops. Preferable for sparse $A$.
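As a rough illustration of the Householder loop above, the following pure-Python sketch forms both $Q$ and $R$ explicitly. The function name `householder_qr` is hypothetical, and the sign in $v$ is chosen as $\mathrm{sign}(x_1)$, a common choice to avoid cancellation that the sheet itself only writes as ±.

```python
import math


def householder_qr(A):
    """Return (Q, R) with A = Q R, Q m-by-m orthogonal, R m-by-n upper triangular."""
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        # v = x + sign(x_1) ||x||_2 e_1
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)
        # R <- H R on rows k..m-1, where H = I - 2 v v^T / (v^T v).
        for j in range(k, n):
            s = 2.0 * sum(v[i] * R[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                R[k + i][j] -= s * v[i]
        # Q <- Q H on columns k..m-1, so that Q = H_1 H_2 ... H_n at the end.
        for r in range(m):
            s = 2.0 * sum(Q[r][k + i] * v[i] for i in range(len(v))) / vtv
            for i in range(len(v)):
                Q[r][k + i] -= s * v[i]
    return Q, R


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Q, R = householder_qr(A)
```

Entries of `R` below the diagonal come out as roundoff-sized values rather than exact zeros, since the reflections are applied in floating point.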
The Gram-Schmidt produces a skinny/reduced QR-factorization $A = Q_1 R_1$, where $Q_1 \in \mathbb{R}^{m \times n}$ has orthonormal columns. The Gram-Schmidt algorithm is:
Right Looking
  Q = A
  for k = 1 : n do
    R(k,k) = ||q_k||_2
    q_k = q_k / R(k,k)
    for j = k+1 : n do
      R(k,j) = q_k^T q_j
      q_j = q_j - R(k,j) q_k
    end for
  end for
Left Looking
  for k = 1 : n do
    q_k = a_k
    for j = 1 : k-1 do
      R(j,k) = q_j^T a_k
      q_k = q_k - R(j,k) q_j
    end for
    R(k,k) = ||q_k||_2
    q_k = q_k / R(k,k)
  end for
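The right-looking variant translates almost line for line into code. The sketch below is a minimal pure-Python version; the name `gram_schmidt_right` is hypothetical, and it assumes $A$ has full column rank so that no `R[k][k]` is zero.

```python
import math


def gram_schmidt_right(A):
    """Right-looking Gram-Schmidt: A = Q1 R1 with Q1 m-by-n, R1 n-by-n."""
    m, n = len(A), len(A[0])
    # q[k] holds column k of Q, initialised to column k of A (Q = A).
    q = [[float(A[i][k]) for i in range(m)] for k in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(t * t for t in q[k]))
        q[k] = [t / R[k][k] for t in q[k]]
        # Immediately remove the q_k component from all later columns.
        for j in range(k + 1, n):
            R[k][j] = sum(a * b for a, b in zip(q[k], q[j]))
            q[j] = [b - R[k][j] * a for a, b in zip(q[k], q[j])]
    Q1 = [[q[k][i] for k in range(n)] for i in range(m)]
    return Q1, R


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Q1, R1 = gram_schmidt_right(A)
```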
solve for
$$\begin{bmatrix} u_1^2 & u_1 & 1 \\ \vdots & \vdots & \vdots \\ u_5^2 & u_5 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} v_1 \\ \vdots \\ v_5 \end{bmatrix}$$
This is overdetermined so an exact solution is out. Instead, find the least squares solution $x$ that minimizes $\|Ax - b\|_2$.
For the method of normal equations, solve for $x$ in $A^T A x = A^T b$ by using Cholesky factorization. This takes $mn^2 + \frac{n^3}{3} + O(mn)$ flops. It is conditionally but not backwards stable: forming $A^T A$ squares the condition number.
Alternatively, factor $A = QR$. Let $c = [\, c_1 \;\; c_2 \,]^T = Q^T b$. The least squares solution is $x = R_1^{-1} c_1$.
If $\mathrm{rank}(A) = r$ and $r < n$ (rank deficient), factor $A = U \Sigma V^T$, let $y = V^T x$ and $c = U^T b$. Then, $\min \|Ax - b\|_2^2 = \min \sum_{i=1}^{r} (\sigma_i y_i - c_i)^2 + \sum_{i=r+1}^{m} c_i^2$, so $y_i = \frac{c_i}{\sigma_i}$. For $i = r+1 : n$, $y_i$ is arbitrary.
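To make the normal-equations route concrete, here is a small pure-Python sketch that fits a quadratic $v = au^2 + bu + c$ to five points. The function name and the sample data are made up for illustration; the data were generated from $a = 2$, $b = -3$, $c = 1$ with no noise, so the fit should recover those values.

```python
import math


def lstsq_normal_equations(A, b):
    """Least squares via A^T A x = A^T b and a Cholesky factorization (sketch)."""
    m, n = len(A), len(A[0])
    # Normal equations N x = r with N = A^T A and r = A^T b.
    N = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in range(n)] for p in range(n)]
    r = [sum(A[i][p] * b[i] for i in range(m)) for p in range(n)]
    # Cholesky factorization N = G G^T with G lower triangular.
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = N[j][j] - sum(G[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("A^T A is not numerically positive definite")
        G[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            G[i][j] = (N[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    # Forward substitution: G w = r.
    w = [0.0] * n
    for i in range(n):
        w[i] = (r[i] - sum(G[i][k] * w[k] for k in range(i))) / G[i][i]
    # Backward substitution: G^T x = w.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (w[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x


u = [-2.0, -1.0, 0.0, 1.0, 2.0]
v = [15.0, 6.0, 1.0, 0.0, 3.0]          # values of 2u^2 - 3u + 1
A = [[ui * ui, ui, 1.0] for ui in u]
a, b_coef, c = lstsq_normal_equations(A, v)
```

Because the condition number is squared when $A^T A$ is formed, a QR-based solve is usually the safer choice for poorly conditioned $A$.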
Solve $Gw = b$ by forward substitution, then solve $G^T x = w$ with backwards substitution, which takes $\frac{n^3}{3} + O(n^2)$ flops.
$\hat{A} = \Sigma_{1:k,1:k} V_{:,1:k}^T$, $\hat{q} = U_{:,1:k}^T q$.
Optimization
In continuous optimization, minimize $f(x)$ subject to $g(x) = 0$ and a componentwise inequality between $h(x)$ and $0$, where $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function, $g : \mathbb{R}^n \to \mathbb{R}^m$ holds equality constraints, and $h : \mathbb{R}^n \to \mathbb{R}^p$ holds inequality constraints.
We did unrestricted optimization $\min f(x)$ in the course.
A ball is a set $B(x, r) = \{ y \in \mathbb{R}^n : \|x - y\| < r \}$.
We have local minimizers $x^*$ which are the best in a region, i.e., $\exists r > 0$ such that $f(x^*) \le f(x)$ for all $x \in B(x^*, r)$. A global minimizer is the best local minimizer.
Assume $f$ is $C^2$. If $x^*$ is a local minimizer, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is PSD. Semi-conversely, if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is PD, then $x^*$ is a local minimizer.
$y' = [\, v \;\; -\tfrac{k}{m}x \,]^T$ with $y_0$ as initial position and velocity.
For stability of an ODE, let $\frac{dy}{dt} = Ay$ for $A \in \mathbb{C}^{n \times n}$. The stable or neutrally stable or unstable case is where $\max_i \Re(\lambda_i(A)) < 0$ or $= 0$ or $> 0$ respectively.
In finite difference methods, approximate $y(t)$ by discrete points $y_0$ (given), $y_1, y_2, \ldots$ so $y_k \approx y(t_k)$ for increasing $t_k$.
For many IVPs and FDMs, if the local truncation error (error at each step) is $O(h^{p+1})$, the global truncation error (error overall) is $O(h^p)$. Call $p$ the order of accuracy.
To find $p$, substitute the exact solution into the FDM formula, insert a remainder term $+R$ on the RHS, use a Taylor series expansion, solve for $R$, keep only the leading term.
In Euler's method, let $y_{k+1} = y_k + f(y_k, t_k) h_k$ where $h_k = t_{k+1} - t_k$ is the step size, and $y' = f(y, t)$ is perhaps computed by finite difference. $p = 1$, very low. Explicit!
A stiff problem has widely ranging time scales in the solution, e.g., a transient initial velocity that in the true solution disappears immediately, chemical reaction rate variability over temperature, transients in electrical circuits. An explicit method requires $h_k$ to be on the smallest scale!
Steepest Descent
Go where the function (locally) decreases most rapidly via $x^{(k+1)} = x^{(k)} - \alpha_k \nabla f(x^{(k)})$. $\alpha_k$ is explained later. SD is stateless: depends only on the current point. Too slow.
Newton's Method for Unconstrained Min.
Iterate by $x^{(k+1)} = x^{(k)} - (\nabla^2 f(x^{(k)}))^{-1} \nabla f(x^{(k)})$, derived by solving for where $\nabla f(x) = 0$. If $\nabla^2 f(x^{(k)})$ is PD and $\nabla f(x^{(k)}) \ne 0$, the step is a descent direction.
What if the Hessian isn't PD? Use
(a) secant method,
(b) direction of negative curvature where $h^T \nabla^2 f(x^{(k)}) h < 0$, using $h$ or $-h$ (doesn't work well in practice),
(c) trust region idea so $h = -(\nabla^2 f(x^{(k)}) + tI)^{-1} \nabla f(x^{(k)})$ (interpolation of NMUM and SD),
(d) factor $\nabla^2 f(x^{(k)})$ by Cholesky when checking for PD, detect 0 pivots, modify that diagonal in $\nabla^2 f(x^{(k)})$ and keep going (unjustified by theory, but works in practice).
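The Newton iteration, safeguarded by the Armijo halving rule from the Line Search section of this sheet, can be sketched on a small example. Everything specific here is an assumption for illustration: the test function $f(x) = e^{x_1 + x_2} - x_1 - x_2 + (x_1 - x_2)^2$ (convex, minimizer at the origin, Hessian PD everywhere), the name `newton_armijo`, and the 2-by-2 solve via Cramer's rule.

```python
import math


def f(x):
    return math.exp(x[0] + x[1]) - x[0] - x[1] + (x[0] - x[1]) ** 2


def grad(x):
    e, d = math.exp(x[0] + x[1]), x[0] - x[1]
    return [e - 1.0 + 2.0 * d, e - 1.0 - 2.0 * d]


def hess(x):
    e = math.exp(x[0] + x[1])
    return [[e + 2.0, e - 2.0], [e - 2.0, e + 2.0]]


def newton_armijo(x, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        g = grad(x)
        if math.hypot(g[0], g[1]) < tol:
            break
        H = hess(x)
        # Newton step h solves H h = -g (2x2 system via Cramer's rule).
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        h = [(-g[0] * H[1][1] + g[1] * H[0][1]) / det,
             (-g[1] * H[0][0] + g[0] * H[1][0]) / det]
        # Armijo backtracking: halve alpha until sufficient decrease holds.
        alpha, slope = 1.0, g[0] * h[0] + g[1] * h[1]
        while f([x[0] + alpha * h[0], x[1] + alpha * h[1]]) > f(x) + 0.1 * alpha * slope:
            alpha *= 0.5
        x = [x[0] + alpha * h[0], x[1] + alpha * h[1]]
    return x


x_min = newton_armijo([1.0, -0.5])
```

Since the Hessian here is always PD, the Newton step is a descent direction and the backtracking loop accepts the full step near the minimizer.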
Line Search
Line search, given $x^{(k)}$ and step $h$ (perhaps derived from SD or NMUM), finds an $\alpha > 0$ for $x^{(k+1)} = x^{(k)} + \alpha h$.
In exact line search, optimize $\min f(x^{(k)} + \alpha h)$ over $\alpha$. Frowned upon because it's computationally expensive.
In Armijo or backtrack line search, initialize $\alpha$. While $f(x^{(k)} + \alpha h) > f(x^{(k)}) + 0.1\,\alpha \nabla f(x^{(k)})^T h$, halve $\alpha$.
Secant/quasi Newton methods use an approximate always PD $\nabla^2 f$. In Broyden-Fletcher-Goldfarb-Shanno:
Backward Euler has $y_{k+1} = y_k + h f(y_{k+1}, t_{k+1})$. BE is implicit ($y_{k+1}$ on the RHS). If the original problem is stable, any $h$ will work!
Miscellaneous
For constant $p$, $\sum_{k=1}^{n} k^p = \frac{n^{p+1}}{p+1} + O(n^p)$.
$ax^2 + bx + c = 0$. $r_1, r_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. $r_1 r_2 = \frac{c}{a}$.
Exact arithmetic is slow, futile for inexact observations, and NA relies on approximate algorithms.
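The contrast between explicit Euler and backward Euler on a stiff problem is easy to see numerically. The sketch below uses the scalar test equation $y' = \lambda y$ with $\lambda = -50$ as an assumed stand-in for a stiff system; for this linear $f$ the implicit backward Euler equation can be solved in closed form, which would not be true in general. Function names are hypothetical.

```python
def forward_euler(lam, y0, h, steps):
    y = y0
    for _ in range(steps):
        y = y + h * lam * y            # y_{k+1} = y_k + h f(y_k, t_k)
    return y


def backward_euler(lam, y0, h, steps):
    y = y0
    for _ in range(steps):
        y = y / (1.0 - h * lam)        # solves y_{k+1} = y_k + h * lam * y_{k+1}
    return y


lam, y0, h, steps = -50.0, 1.0, 0.1, 20
fe = forward_euler(lam, y0, h, steps)   # growth factor |1 + h*lam| = 4 per step
be = backward_euler(lam, y0, h, steps)  # decay factor 1/|1 - h*lam| = 1/6 per step
```

The true solution decays to essentially zero by $t = 2$. With this step size the explicit iterate blows up while the implicit one decays, matching the remark that an explicit method needs $h$ on the smallest time scale.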
$\partial f_m/\partial x_1 \;\cdots\; \partial f_m/\partial x_n$
choose $\sigma^{(k)}$ as eigenvalues of submatrices of $A$.
$f$'s Taylor expansion is $f(x + h) = f(x) + \nabla f(x) h + O(\|h\|^2)$.
In the Ando-Lee analysis, for a corpus with $k$ topics, for $t \in 1:k$ and $d \in 1:n$, let $R_{t,d} \ge 0$ be document $d$'s relevance to topic $t$. $\|R_{:,d}\|_2 = 1$. True document similarity is $R^T R \in \mathbb{R}^{n \times n}$, where entry $(i, j)$ is relevance of $i$ to $j$. Using LSI, if $A$ contains information about $R^T R$, then $\hat{A}^T \hat{A}$ will approximate $R^T R$ well. LSI depends on even distribution of topics, where distribution is the ratio $\frac{\max_t \|R_{t,:}\|_2}{\min_t \|R_{t,:}\|_2}$. LSI works well when this ratio is near 1, but does worse if it is $\gg 1$.
Complex Numbers
Complex numbers are written $z = x + iy \in \mathbb{C}$ for $i = \sqrt{-1}$.
The real part is $x = \Re(z)$. The imaginary part is $y = \Im(z)$.
The conjugate of $z$ is $\bar{z} = x - iy$. $\bar{A}\bar{x} = \overline{(Ax)}$, $\bar{A}\bar{B} = \overline{(AB)}$.
The absolute value of $z$ is $|z| = \sqrt{x^2 + y^2}$.
The conjugate transpose of $x$ is $x^H = (\bar{x})^T$. $A \in \mathbb{C}^{n \times n}$ is Hermitian or self-adjoint if $A = A^H$.
If $Q^H Q = I$, $Q$ is unitary.
Eigenvalues & Eigenvectors
For $A \in \mathbb{C}^{n \times n}$, if $Ax = \lambda x$ where $x \ne 0$, $x$ is an eigenvector of $A$ and $\lambda$ is the corresponding eigenvalue.
Remember, $A - \lambda I$ is singular iff $\det(A - \lambda I) = 0$. With $\lambda$ as a variable, $\det(A - \lambda I)$ is $A$'s characteristic polynomial.
Arnoldi and Lanczos
Given $A \in \mathbb{R}^{n \times n}$ and unit length $q_1 \in \mathbb{R}^n$, output $Q, H$ such that $A = QHQ^T$. Use Lanczos for symmetric $A$.
Arnoldi
  for k = 1 : n-1 do
    q_{k+1} = A q_k
    for l = 1 : k do
      H(l,k) = q_l^T q_{k+1}
      q_{k+1} = q_{k+1} - H(l,k) q_l
    end for
    H(k+1,k) = ||q_{k+1}||_2
    q_{k+1} = q_{k+1} / H(k+1,k)
  end for
Lanczos
  beta_0 = ||w_0||_2
  for k = 1, 2, ... do
    q_k = w_{k-1} / beta_{k-1}
    u_k = A q_k
    v_k = u_k - beta_{k-1} q_{k-1}
    alpha_k = q_k^T v_k
    w_k = v_k - alpha_k q_k
    beta_k = ||w_k||_2
  end for
For Lanczos, the $\alpha_k$ and $\beta_k$ are diagonal and subdiagonal entries of the Hermitian tridiagonal $T_k$, and we have $H$ in Arnoldi. After very few iterations of either method, the eigenvalues of $T_k$ and $H$ will be excellent approximations to the extreme eigenvalues of $A$.
For $k$ iterations, Arnoldi is $O(nk^2)$ time and $O(nk)$ space, Lanczos is $O(nk) + k \cdot M$ time ($M$ is time for matrix-vector multiplication) and $O(nk)$ space, or $O(n + k)$ space if old $q_k$'s are discarded.
For nonsingular $T \in \mathbb{C}^{n \times n}$, $T^{-1}AT$ (the similarity transformation) is similar to $A$. Similar matrices have the same characteristic polynomial and hence the same eigenvalues (though probably different eigenvectors). This relationship is reflexive, transitive, and symmetric.
$A$ is diagonalizable if $A$ is similar to a diagonal matrix $D = T^{-1}AT$. $A$'s eigenvalues are $D$'s diagonals, and the eigenvectors are columns of $T$ since $AT_{:,i} = D_{i,i}T_{:,i}$. $A$ is diagonalizable iff it has $n$ linearly independent eigenvectors.
For symmetric $A \in \mathbb{R}^{n \times n}$, $A$ is diagonalizable, has all real eigenvalues, and the eigenvectors may be chosen as the columns of an orthogonal matrix $Q$. $A = QDQ^T$ is the eigendecomposition of $A$. Further for symmetric $A$:
1. The singular values are absolute values of eigenvalues.
2. Is SPD (or SPSD) iff eigenvalues $> 0$ (or $\ge 0$).
3. For SPD, singular values equal eigenvalues.
4. For $B \in \mathbb{R}^{m \times n}$, $m \ge n$, singular values of $B$ are the square roots of $B^T B$'s eigenvalues.
For any $A \in \mathbb{C}^{n \times n}$, the Schur form of $A$ is $A = QTQ^H$ with unitary $Q \in \mathbb{C}^{n \times n}$ and upper triangular $T \in \mathbb{C}^{n \times n}$.
In this sheet I denote $\lambda_{|\max|} = \max_{\lambda \in \{\lambda_1, \ldots, \lambda_n\}} |\lambda|$.
For $B \in \mathbb{C}^{n \times n}$, then $\lim_{k \to \infty} B^k = 0$ if $\lambda_{|\max|}(B) < 1$.
Power Methods for Eigenvalues
$x^{(k+1)} = Ax^{(k)}$ converges to $\lambda_{|\max|}(A)$'s eigenvector.
Iterative Methods for Ax = b
Useful for sparse $A$ where GE would cause fill-in.
In the splitting method, $A = M - N$ and $Mv = c$ is easily solvable. Then, $x^{(k+1)} = M^{-1}(N x^{(k)} + b)$. If it converges, the limit point $x^*$ is a solution to $Ax = b$.
The error is $e^{(k)} = (M^{-1}N)^k e_0$, so splitting methods converge if $\lambda_{|\max|}(M^{-1}N) < 1$.
In the Jacobi method, consider $M$ as the diagonals of $A$. This will fail if $A$ has any zero diagonals.
Conjugate Gradient
Conjugate gradient iteratively solves $Ax = b$ for SPD $A$. It is derived from Lanczos and takes advantage of the fact that if $A$ is SPD then $T$ is SPD. It produces the exact solution after $n$ iterations. Time per iteration is $O(n) + M$.
  x^(0) = arbitrary (0 is okay)
  r_0 = b - A x^(0)
  p_0 = r_0
  for k = 0, 1, 2, ... do
    alpha_k = (r_k^T r_k) / (p_k^T A p_k)
    x^(k+1) = x^(k) + alpha_k p_k
    r_{k+1} = r_k - alpha_k A p_k
    beta_{k+1} = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
    p_{k+1} = r_{k+1} + beta_{k+1} p_k
  end for
Error is reduced by $(\sqrt{\kappa(A)} - 1)/(\sqrt{\kappa(A)} + 1)$ per iteration. Thus, for $\kappa(A) = 1$, CG converges after 1 iteration. To speed up CG, use a preconditioner $M$ such that $\kappa(MA) \ll \kappa(A)$ and solve $MAx = Mb$ instead.
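The conjugate gradient recurrence above maps directly onto code. The following is a minimal pure-Python sketch with a dense matrix-vector product; the function name, the residual tolerance `tol`, and the default cap of $n$ iterations are choices made for this example, not part of the sheet.

```python
def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve Ax = b for SPD A, starting from x = 0 (sketch)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(s * t for s, t in zip(u, v))

    x = [0.0] * n
    r = list(map(float, b))          # r_0 = b - A x_0 with x_0 = 0
    p = r[:]
    rr = dot(r, r)
    for _ in range(max_iter or n):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]                 # chosen so that the solution is (1, 2, 3)
x = conjugate_gradient(A, b)
```

For this 3-by-3 SPD system the loop runs at most three times, in line with the statement that CG reaches the exact solution after $n$ iterations in exact arithmetic.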
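Once you find an eigenvector $x$, find the associated eigenvalue through the Rayleigh quotient $\lambda = \frac{x^{(k)T} A x^{(k)}}{x^{(k)T} x^{(k)}}$.
The inverse shifted power method is $x^{(k+1)} = (A - \sigma I)^{-1} x^{(k)}$. If $A$ has eigenpairs $(\lambda_1, u_1), \ldots, (\lambda_n, u_n)$, then $(A - \sigma I)^{-1}$ has eigenpairs $\left(\frac{1}{\lambda_1 - \sigma}, u_1\right), \ldots, \left(\frac{1}{\lambda_n - \sigma}, u_n\right)$.
Factor $A = QHQ^T$ where $H$ is upper Hessenberg.
To factor $A = QHQ^T$, find successive Householder reflections $H_1, H_2, \ldots$ that zero out rows 3 and lower of column 1, rows 4 and lower of column 2, etc. Then $Q = H_1^T \cdots H_{n-2}^T$.
  A^(0) = A
  for k = 0, 1, 2, ... do
    Set A^(k) - sigma^(k) I = Q^(k) R^(k)
$A^{(k)}$ is similar to $A$ by orthog. trans. $U^{(k)} = Q^{(0)} \cdots Q^{(k-1)}$. Perhaps
Multivariate Calculus
Provided $f : \mathbb{R}^n \to \mathbb{R}$, the gradient and Hessian are
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}, \quad \nabla^2 f = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
If $f$ is $C^2$ (2nd partials are all continuous), $\nabla^2 f$ is symmetric.
The Taylor expansion for $f$ is
$$f(x + h) = f(x) + h^T \nabla f(x) + \tfrac{1}{2} h^T \nabla^2 f(x) h + O(\|h\|^3)$$
Provided $f : \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is
$$\nabla f = \begin{bmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & \ddots & \vdots \end{bmatrix}$$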
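The $O(\|h\|^3)$ remainder in the second-order Taylor expansion for scalar-valued $f$ can be checked numerically: halving $h$ should shrink the error by roughly a factor of 8. The sketch below does this for an arbitrarily chosen test function $f(x) = \sin(x_1)\,e^{x_2}$ with hand-coded gradient and Hessian; the function, the point, and the step are all assumptions picked for illustration.

```python
import math


def f(x):
    return math.sin(x[0]) * math.exp(x[1])


def grad(x):
    return [math.cos(x[0]) * math.exp(x[1]), math.sin(x[0]) * math.exp(x[1])]


def hess(x):
    s, c, e = math.sin(x[0]), math.cos(x[0]), math.exp(x[1])
    return [[-s * e, c * e], [c * e, s * e]]   # symmetric, since f is C^2


def taylor_error(x, h):
    """|f(x+h) - (f(x) + h^T grad f(x) + 0.5 h^T hess f(x) h)|."""
    g, H = grad(x), hess(x)
    linear = sum(hi * gi for hi, gi in zip(h, g))
    quad = 0.5 * sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))
    return abs(f([x[0] + h[0], x[1] + h[1]]) - (f(x) + linear + quad))


x = [0.3, -0.2]
e1 = taylor_error(x, [1e-2, 2e-2])
e2 = taylor_error(x, [5e-3, 1e-2])    # same direction, half the length
ratio = e1 / e2                       # should be close to 2**3
```

A ratio near 8 is a quick sanity check that a hand-derived gradient and Hessian are consistent with the function.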