Linear Algebra Cheat Sheet

Thomas Finley, tomf@cs.cornell.edu
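Norms
A vector norm function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ satisfies:
1. $\|x\| \ge 0$, and $\|x\| = 0 \Leftrightarrow x = 0$.
2. $\|\gamma x\| = |\gamma| \cdot \|x\|$ for all $\gamma \in \mathbb{R}$, and all $x \in \mathbb{R}^n$.
3. $\|x + y\| \le \|x\| + \|y\|$, for all $x, y \in \mathbb{R}^n$.
Common norms include:
1. $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$
2. $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$
3. $\|x\|_\infty = \lim_{p \to \infty} (|x_1|^p + \cdots + |x_n|^p)^{1/p} = \max_{i=1..n} |x_i|$
An induced matrix norm is $\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}$. It satisfies the three properties of norms.
$\forall x \in \mathbb{R}^n, A \in \mathbb{R}^{m \times n}$, $\|Ax\| \le \|A\| \, \|x\|$.
$\|AB\| \le \|A\| \, \|B\|$, called submultiplicativity.
$a^T b \le \|a\|_2 \|b\|_2$, called Cauchy-Schwarz inequality.
1. $\|A\|_\infty = \max_{i=1,\dots,m} \sum_{j=1}^{n} |a_{i,j}|$ (max row sum).
2. $\|A\|_1 = \max_{j=1,\dots,n} \sum_{i=1}^{m} |a_{i,j}|$ (max column sum).
3. $\|A\|_2$ is hard: it takes $O(n^3)$, not $O(n^2)$ operations.
4. $\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j}^2}$. $\|\cdot\|_F$ often replaces $\|\cdot\|_2$.

Linear Algebra
A subspace is a set $S \subseteq \mathbb{R}^n$ such that $0 \in S$ and $\forall x, y \in S$, $\alpha, \beta \in \mathbb{R}$: $\alpha x + \beta y \in S$.
The span of $\{v_1, \dots, v_k\}$ is the set of all vectors in $\mathbb{R}^n$ that are linear combinations of $v_1, \dots, v_k$.
A basis $B$ of subspace $S$, $B = \{v_1, \dots, v_k\} \subset S$, has $\mathrm{Span}(B) = S$ and all $v_i$ linearly independent.
The dimension of $S$ is $|B|$ for a basis $B$ of $S$.
For subspaces $S, T$ with $S \subseteq T$, $\dim(S) \le \dim(T)$, and further if $\dim(S) = \dim(T)$, then $S = T$.
A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ has $\forall x, y \in \mathbb{R}^n$, $\alpha, \beta \in \mathbb{R}$: $T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$. Further, $\exists A \in \mathbb{R}^{m \times n}$ such that $\forall x$: $T(x) \equiv Ax$.
For two linear transformations $T : \mathbb{R}^n \to \mathbb{R}^m$, $S : \mathbb{R}^m \to \mathbb{R}^p$, $S \circ T \equiv S(T(x))$ is a linear transformation. $(T(x) \equiv Ax) \wedge (S(y) \equiv By) \Rightarrow (S \circ T)(x) \equiv BAx$.
The matrix's row space is the span of its rows, its column space or range is the span of its columns, and its rank is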
the dimension of either of these spaces.
For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) \le \min(m, n)$. $A$ has full row (or column) rank if $\mathrm{rank}(A) = m$ (or $n$).
A diagonal matrix $D \in \mathbb{R}^{n \times n}$ has $d_{j,k} = 0$ for $j \ne k$. The diagonal identity matrix $I$ has $i_{j,j} = 1$.
The upper (or lower) bandwidth of $A$ is $\max |i - j|$ among $i, j$ where $i \le j$ (or $i \ge j$) such that $A_{i,j} \ne 0$.
A matrix with lower bandwidth 1 is upper Hessenberg.
For $A, B \in \mathbb{R}^{n \times n}$, $B$ is $A$'s inverse if $AB = BA = I$. If such a $B$ exists, $A$ is invertible or nonsingular. $B = A^{-1}$.
The inverse of $A$ is $A^{-1} = [x_1, \cdots, x_n]$ where $Ax_i = e_i$.
For $A \in \mathbb{R}^{n \times n}$ the following are equivalent: $A$ is nonsingular, $\mathrm{rank}(A) = n$, $Ax = b$ has a solution $x$ for any $b$, if $Ax = 0$ then $x = 0$.
The nullspace of $A \in \mathbb{R}^{m \times n}$ is $\{x \in \mathbb{R}^n : Ax = 0\}$.
For $A \in \mathbb{R}^{m \times n}$, $\mathrm{Range}(A)$ and $\mathrm{Nullspace}(A^T)$ are orthogonal complements, i.e., $x \in \mathrm{Range}(A)$, $y \in \mathrm{Nullspace}(A^T) \Rightarrow x^T y = 0$, and for all $p \in \mathbb{R}^m$, $p = x + y$ for unique $x$ and $y$.
For a permutation matrix $P \in \mathbb{R}^{n \times n}$, $PA$ permutes the rows of $A$, $AP$ the columns of $A$. $P^{-1} = P^T$.

Numerical Stability
Six sources of error in scientific computing: modeling errors, measurement or data errors, blunders, discretization or truncation errors, convergence tolerance, and rounding errors.
A floating point number consists of a sign, a mantissa, a base, and an exponent: $\pm\, d_1.d_2 d_3 \cdots d_t \times \beta^e$. For single and double:
single: $t = 24$, $e \in \{-126, \dots, 127\}$
double: $t = 53$, $e \in \{-1022, \dots, 1023\}$
The relative error in $\hat{x}$ approximating $x$ is $\frac{|\hat{x} - x|}{|x|}$.
Unit roundoff or machine epsilon is $\epsilon_{mach} = \beta^{-t+1}$.
Arithmetic operations have relative error bounded by $\epsilon_{mach}$.
E.g., consider $z = x - y$ with input $x, y$. This program has three roundoff errors. $\hat{z} = ((1 + \delta_1)x - (1 + \delta_2)y)(1 + \delta_3)$, where $\delta_1, \delta_2, \delta_3 \in [-\epsilon_{mach}, \epsilon_{mach}]$.
$$\frac{|z - \hat{z}|}{|z|} = \frac{|(\delta_1 + \delta_3)x - (\delta_2 + \delta_3)y + O(\epsilon_{mach}^2)|}{|x - y|}$$
The bad case is where $\delta_1 = \epsilon_{mach}$, $\delta_2 = -\epsilon_{mach}$, $\delta_3 = 0$:
$$\frac{|z - \hat{z}|}{|z|} = \epsilon_{mach} \frac{|x + y|}{|x - y|}$$
Inaccuracy if $|x + y| \gg |x - y|$ is called catastrophic cancellation.
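A minimal Python sketch of catastrophic cancellation, assuming IEEE double precision (so $\beta = 2$, $t = 53$). The variable names are illustrative only. It subtracts two nearly equal numbers and compares the observed relative error with the $\epsilon_{mach} |x + y| / |x - y|$ amplification factor from the analysis above.

```python
import sys

# Machine epsilon for IEEE double: beta**(-t + 1) with beta = 2, t = 53.
eps_mach = sys.float_info.epsilon

# Intended inputs: x = 1 + 1e-15 and y = 1, so the exact difference is 1e-15.
x = 1.0 + 1e-15   # x is rounded to the nearest double here (the delta_1 error)
y = 1.0
z_exact = 1e-15

# The subtraction itself is exact (delta_3 = 0); it only exposes the
# rounding error already present in x.
z_hat = x - y
rel_error = abs(z_hat - z_exact) / abs(z_exact)

# Amplification factor |x + y| / |x - y| for the intended inputs.
amplification = abs(1.0 + 1.0) / abs(z_exact)

print("machine epsilon:", eps_mach)
print("relative error of x - y:", rel_error)
print("bound eps_mach * |x+y|/|x-y|:", eps_mach * amplification)
```

The inputs are accurate to roughly 16 digits, yet the difference is accurate to only about one digit, because $|x + y|$ is about $2 \times 10^{15}$ times larger than $|x - y|$.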
Conditioning & Backwards Stability
A problem instance is ill conditioned if the solution is sensitive to perturbations of the data. For example, $\sin 1$ is well conditioned, but $\sin 12392193$ is ill conditioned.
Suppose we perturb $Ax = b$ by $(A + E)\hat{x} = b + e$ where $\frac{\|E\|}{\|A\|} \le \delta$, $\frac{\|e\|}{\|b\|} \le \delta$. Then $\frac{\|\hat{x} - x\|}{\|x\|} \le 2\delta\kappa(A) + O(\delta^2)$, where $\kappa(A) = \|A\| \, \|A^{-1}\|$ is the condition number of $A$.
1. $\forall A \in \mathbb{R}^{n \times n}$, $\kappa(A) \ge 1$.
2. $\kappa(I) = 1$.
3. For $\gamma \ne 0$, $\kappa(\gamma A) = \kappa(A)$.
4. For diagonal $D$ and all $p$, $\|D\|_p = \max_{i=1..n} |d_{ii}|$. So, $\kappa(D) = \frac{\max_{i=1..n} |d_{ii}|}{\min_{i=1..n} |d_{ii}|}$.
If $\kappa(A) \ge \frac{1}{\epsilon_{mach}}$, $A$ may as well be singular.
An algorithm is backwards stable if in the presence of roundoff error it returns the exact solution to a nearby problem instance.
GEPP solves $Ax = b$ by returning $\hat{x}$ where $(A + E)\hat{x} = b$. It is backwards stable if $\frac{\|E\|}{\|A\|} \le O(\epsilon_{mach})$. With GEPP, $\frac{\|E\|}{\|A\|} \le c_n \epsilon_{mach} + O(\epsilon_{mach}^2)$, where $c_n$ is worst case exponential in $n$, but in practice almost always low order polynomial.
Combining stability and conditioning analysis yields $\frac{\|\hat{x} - x\|}{\|x\|} \le c_n \kappa(A) \epsilon_{mach} + O(\epsilon_{mach}^2)$.

Gaussian Elimination
GE produces a factorization $A = LU$, GEPP $PA = LU$.
Plain GE
  for k = 1 to n - 1 do
    if $a_{kk} = 0$ then stop
    $\ell_{k+1:n,k} = a_{k+1:n,k} / a_{kk}$
    $a_{k+1:n,k:n} = a_{k+1:n,k:n} - \ell_{k+1:n,k} \, a_{k,k:n}$
  end for
GEPP
  for k = 1 to n - 1 do
    $\gamma = \mathrm{argmax}_{i \in \{k,\dots,n\}} |a_{ik}|$
    $a_{[\gamma,k],k:n} = a_{[k,\gamma],k:n}$
    $\ell_{[\gamma,k],1:k-1} = \ell_{[k,\gamma],1:k-1}$
    $p_k = \gamma$
    $\ell_{k:n,k} = a_{k:n,k} / a_{kk}$
    $a_{k+1:n,k:n} = a_{k+1:n,k:n} - \ell_{k+1:n,k} \, a_{k,k:n}$
  end for
Backward Substitution
  x = zeros(n, 1)
  for j = n to 1 do
    $x_j = \frac{w_j - u_{j,j+1:n} \, x_{j+1:n}}{u_{j,j}}$
  end for
To solve $Ax = b$, factor $A = LU$ (or $A = P^T LU$), solve $Lw = b$ (or $Lw = \hat{b}$ where $\hat{b} = Pb$) for $w$ using forward substitution, then solve $Ux = w$ for $x$ using backward substitution. The complexity of GE and GEPP is $\frac{2}{3} n^3 + O(n^2)$.
GEPP encounters an exact 0 pivot iff $A$ is singular.
For banded $A$, $L + U$ has the same bandwidths as $A$.
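The GEPP and backward substitution steps above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not a tuned implementation: the function name `gepp_solve` is hypothetical, matrices are lists of rows, and instead of storing $L$ and $P$ the elimination is applied to $b$ as it goes, which is equivalent to the forward substitution $Lw = Pb$.

```python
def gepp_solve(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    a is a list of n rows of n floats, b a list of n floats.
    Inputs are copied, so the caller's data is left untouched.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n - 1):
        # Partial pivoting: pick the row with the largest |a[i][k]|, i >= k.
        g = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[g][k] == 0.0:
            raise ValueError("exact zero pivot: matrix is singular")
        a[k], a[g] = a[g], a[k]
        b[k], b[g] = b[g], b[k]
        # Eliminate column k below the pivot; apply the same row
        # operations to b (this plays the role of solving L w = P b).
        for i in range(k + 1, n):
            multiplier = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= multiplier * a[k][j]
            b[i] -= multiplier * b[k]
    if a[n - 1][n - 1] == 0.0:
        raise ValueError("exact zero pivot: matrix is singular")
    # Backward substitution on the upper triangular system U x = w.
    x = [0.0] * n
    for j in range(n - 1, -1, -1):
        s = sum(a[j][c] * x[c] for c in range(j + 1, n))
        x[j] = (b[j] - s) / a[j][j]
    return x


solution = gepp_solve(
    [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]],
    [8.0, -11.0, -3.0],
)
print(solution)
```

For the example system the exact solution is $x = (2, 3, -1)$; the computed values agree up to rounding.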
Determinant
The determinant $\det : \mathbb{R}^{n \times n} \to \mathbb{R}$ satisfies:
1. $\det(AB) = \det(A) \det(B)$.
2. $\det(A) = 0$ iff $A$ is singular.
3. $\det(L) = \ell_{1,1} \ell_{2,2} \cdots \ell_{n,n}$ for triangular $L$.
4. $\det(A) = \det(A^T)$.
To compute $\det(A)$ factor $A = P^T LU$. $\det(P) = (-1)^s$ where $s$ is the number of swaps, $\det(L) = 1$. When computing $\det(U)$ watch out for overflow!
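One common way to sidestep the overflow mentioned above is to return the sign of the determinant and the logarithm of its absolute value instead of the product itself. The sketch below does this with partial pivoting in pure Python; the function name `sign_and_logdet` is hypothetical, and the log-sum approach is an assumption of this example rather than something the sheet prescribes.

```python
import math


def sign_and_logdet(a):
    """Return (sign, log|det A|) using elimination with partial pivoting.

    For singular A the result is (0.0, -inf).
    """
    n = len(a)
    a = [row[:] for row in a]
    sign = 1.0
    log_abs = 0.0
    for k in range(n):
        g = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[g][k] == 0.0:
            return 0.0, float("-inf")
        if g != k:
            a[k], a[g] = a[g], a[k]
            sign = -sign            # each row swap flips det(P)
        pivot = a[k][k]
        if pivot < 0.0:
            sign = -sign            # track the sign of u_kk separately
        log_abs += math.log(abs(pivot))  # sum of logs replaces the product
        for i in range(k + 1, n):
            multiplier = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= multiplier * a[k][j]
    return sign, log_abs


# det would be 1e400, which overflows a double, but its log is harmless.
big = [[1e10 if i == j else 0.0 for j in range(40)] for i in range(40)]
big_sign, big_log = sign_and_logdet(big)
print(big_sign, big_log)
```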
Orthogonal Matrices
For $Q \in \mathbb{R}^{n \times n}$, these statements are equivalent:
1. $Q^T Q = QQ^T = I$ (i.e., $Q$ is orthogonal)
2. $\|\cdot\|_2 = 1$ for each row and column of $Q$. The inner product of any row (or column) with another is 0.
3. For all $x \in \mathbb{R}^n$, $\|Qx\|_2 = \|x\|_2$.
A matrix $Q \in \mathbb{R}^{m \times n}$ with $m > n$ has orthonormal columns if the columns are orthonormal, and $Q^T Q = I$.
The product of orthogonal matrices is orthogonal.
For orthogonal $Q$, $\|QA\|_2 = \|A\|_2$ and $\|AQ\|_2 = \|A\|_2$.
QR-factorization
For any $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, we can factor $A = QR$, where $Q \in \mathbb{R}^{m \times m}$ is orthogonal, and $R = [\, R_1 \;\; 0 \,]^T \in \mathbb{R}^{m \times n}$ is upper triangular. $\mathrm{rank}(A) = n$ iff $R_1$ is invertible.
$Q$'s first $n$ (or last $m - n$) columns form an orthonormal basis for $\mathrm{span}(A)$ (or $\mathrm{nullspace}(A^T)$).
A Householder reflection is $H = I - \frac{2vv^T}{v^T v}$. $H$ is symmetric and orthogonal. Explicit H.H. QR-factorization is:
  for k = 1 : n do
    $v = A(k:m, k) \pm \|A(k:m, k)\|_2 \, e_1$
    $A(k:m, k:n) = \left(I - \frac{2vv^T}{v^T v}\right) A(k:m, k:n)$
  end for
We get $H_n H_{n-1} \cdots H_1 A = R$, so then, $Q = H_1 H_2 \cdots H_n$. This takes $2mn^2 - \frac{2}{3} n^3 + O(mn)$ flops.
Givens requires 50% more flops. Preferable for sparse $A$.
The Gram-Schmidt produces a skinny/reduced QR-factorization $A = Q_1 R_1$, where $Q_1 \in \mathbb{R}^{m \times n}$ has orthonormal columns. The Gram-Schmidt algorithm is:
Left Looking
  1: for k = 1 : n do
  2:   $q_k = a_k$
  3:   for j = 1 : k - 1 do
  4:     $R(j, k) = q_j^T a_k$
  5:     $q_k = q_k - R(j, k) q_j$
  6:   end for
  7:   $R(k, k) = \|q_k\|_2$
  8:   $q_k = q_k / R(k, k)$
  9: end for
Right Looking
  1: $Q = A$
  2: for k = 1 : n do
  3:   $R(k, k) = \|q_k\|_2$
  4:   $q_k = q_k / R(k, k)$
  5:   for j = k + 1 : n do
  6:     $R(k, j) = q_k^T q_j$
  7:     $q_j = q_j - R(k, j) q_k$
  8:   end for
  9: end for
For $A \in \mathbb{R}^{m \times n}$, if $\mathrm{rank}(A) = n$, then $A^T A$ is SPD.
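The right-looking Gram-Schmidt listing in the QR-factorization section translates almost line by line into Python. The sketch below is illustrative: the name `gram_schmidt_qr` and the choice to return $Q_1$ as a list of columns are assumptions of this example, and it expects a matrix with full column rank (otherwise it divides by zero).

```python
import math


def gram_schmidt_qr(a):
    """Right-looking Gram-Schmidt on an m-by-n matrix given as a list of rows.

    Returns (q, r): q is a list of n orthonormal columns (each of length m),
    r is the n-by-n upper triangular factor as a list of rows.
    Assumes rank(A) = n.
    """
    m, n = len(a), len(a[0])
    # Q = A, stored column by column.
    q = [[a[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(v * v for v in q[k]))
        q[k] = [v / r[k][k] for v in q[k]]
        # Immediately remove the q_k component from all later columns.
        for j in range(k + 1, n):
            r[k][j] = sum(x * y for x, y in zip(q[k], q[j]))
            q[j] = [y - r[k][j] * x for x, y in zip(q[k], q[j])]
    return q, r


a_example = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
q_cols, r_factor = gram_schmidt_qr(a_example)
print(q_cols)
print(r_factor)
```

Multiplying the columns in `q_cols` by `r_factor` reproduces `a_example` up to rounding, and the columns are orthonormal.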
Basic Linear Algebra Subroutines
0. Scalar ops, like $\sqrt{x^2 + y^2}$. $O(1)$ flops, $O(1)$ data.
1. Vector ops, like $y = ax + y$. $O(n)$ flops, $O(n)$ data.
2. Matrix-vector ops, like rank-one update $A = A + xy^T$. $O(n^2)$ flops, $O(n^2)$ data.
3. Matrix-matrix ops, like $C = C + AB$. $O(n^2)$ data, $O(n^3)$ flops.
Use the highest BLAS level possible. Operators are architecture tuned, e.g., data processed in cache-sized bites.
Linear Least Squares
Suppose we have points $(u_1, v_1), \dots, (u_5, v_5)$ that we want to fit a quadratic curve $au^2 + bu + c$ through. We want to solve for
$$\begin{bmatrix} u_1^2 & u_1 & 1 \\ \vdots & \vdots & \vdots \\ u_5^2 & u_5 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} v_1 \\ \vdots \\ v_5 \end{bmatrix}$$
This is overdetermined so an exact solution is out. Instead, find the least squares solution $x$ that minimizes $\|Ax - b\|_2$.
For the method of normal equations, solve for $x$ in $A^T A x = A^T b$ by using Cholesky factorization. This takes $mn^2 + \frac{n^3}{3} + O(mn)$ flops. It is conditionally but not backwards stable: $A^T A$ squares the condition number.
Alternatively, factor $A = QR$. Let $c = [\, c_1 \;\; c_2 \,]^T = Q^T b$. The least squares solution is $x = R_1^{-1} c_1$.
If $\mathrm{rank}(A) = r$ and $r < n$ (rank deficient), factor $A = U\Sigma V^T$, let $y = V^T x$ and $c = U^T b$. Then, $\min \|Ax - b\|_2^2 = \min \sum_{i=1}^{r} (\sigma_i y_i - c_i)^2 + \sum_{i=r+1}^{m} c_i^2$, so $y_i = \frac{c_i}{\sigma_i}$. For $i = r + 1 : n$, $y_i$ is arbitrary.
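As a small worked instance of the normal-equations method described above, the sketch below fits $au^2 + bu + c$ to five points by forming $A^T A$ and $A^T b$, factoring $A^T A = GG^T$ by Cholesky, and doing forward then backward substitution. The function names `cholesky` and `fit_quadratic` and the sample data are assumptions made for this example.

```python
import math


def cholesky(a):
    """Return lower triangular G with A = G G^T for an SPD matrix A (list of rows)."""
    n = len(a)
    g = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(g[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        g[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(g[i][k] * g[j][k] for k in range(j))
            g[i][j] = (a[i][j] - s) / g[j][j]
    return g


def fit_quadratic(us, vs):
    """Least squares fit of a*u**2 + b*u + c via the normal equations."""
    rows = [[u * u, u, 1.0] for u in us]          # the matrix A
    n = 3
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * v for r, v in zip(rows, vs)) for i in range(n)]
    g = cholesky(ata)
    # Forward substitution: G w = A^T b.
    w = [0.0] * n
    for i in range(n):
        w[i] = (atb[i] - sum(g[i][k] * w[k] for k in range(i))) / g[i][i]
    # Backward substitution: G^T x = w.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(g[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (w[i] - s) / g[i][i]
    return x


# Points sampled exactly from 2u^2 - 3u + 1, so the fit should recover it.
sample_u = [-2.0, -1.0, 0.0, 1.0, 2.0]
sample_v = [15.0, 6.0, 1.0, 0.0, 3.0]
coeffs = fit_quadratic(sample_u, sample_v)
print(coeffs)
```

Because this route works with $A^T A$, it inherits the squared condition number; for badly conditioned data the QR route is the safer choice.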
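Singular Value Decomposition
For any $A \in \mathbb{R}^{m \times n}$, we can express $A = U\Sigma V^T$ such that $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal, and $\Sigma = \mathrm{diag}(\sigma_1, \cdots, \sigma_p) \in \mathbb{R}^{m \times n}$ where $p = \min(m, n)$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$. The $\sigma_i$ are singular values.
1. Matrix 2-norm, where $\|A\|_2 = \sigma_1$.
2. The condition number $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n}$, or rectangular condition number $\kappa_2(A) = \frac{\sigma_1}{\sigma_{\min(m,n)}}$. Note that $\kappa_2(A^T A) = \kappa_2(A)^2$.
3. For a rank $k$ approximation to $A$, let $\Sigma_k = \mathrm{diag}(\sigma_1, \cdots, \sigma_k, 0^T)$. Then $A_k = U\Sigma_k V^T$. $\mathrm{rank}(A_k) \le k$ and $\mathrm{rank}(A_k) = k$ iff $\sigma_k > 0$. Among rank $k$ or lower matrices, $A_k$ minimizes $\|A - A_k\|_2 = \sigma_{k+1}$.
4. Rank determination, since $\mathrm{rank}(A) = r$ equals the number of nonzero $\sigma$, or in machine arithmetic, perhaps the number of $\sigma \ge \epsilon_{mach} \times \sigma_1$.
$$A = U\Sigma V^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma(1:r, 1:r) & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}$$
See that $\mathrm{range}(U_1) = \mathrm{range}(A)$. The SVD gives an orthonormal basis for the range and nullspace of $A$ and $A^T$.
Compute the SVD by using shifted QR on $A^T A$.

In the left-looking Gram-Schmidt algorithm (see QR-factorization), compute $R(j, k) = q_j^T q_k$, using the partially orthogonalized $q_k$ in place of $a_k$, for modified G.S. to make it backwards stable.

Positive Definite, $A = LDL^T$
$A \in \mathbb{R}^{n \times n}$ is positive definite (PD) (or semidefinite (PSD)) if $x^T A x > 0$ (or $x^T A x \ge 0$) for all $x \ne 0$.
When LU-factorizing symmetric $A$, the result is $A = LDL^T$; $L$ is unit lower triangular, $D$ is diagonal. $A$ is SPD iff $D$ has all positive entries. The Cholesky factorization is $A = LDL^T = LD^{1/2} D^{1/2} L^T = GG^T$. Can be done directly in $\frac{n^3}{3} + O(n^2)$ flops. If $G$ has all positive diagonal $A$ is SPD.
To solve $Ax = b$ for SPD $A$, factor $A = GG^T$, solve $Gw = b$ by forward substitution, then solve $G^T x = w$ with backwards substitution, which takes $\frac{n^3}{3} + O(n^2)$ flops.

Information Retrieval & LSI
In the bag of words model, $w_d \in \mathbb{R}^m$, where $w_d(i)$ is the (perhaps weighted) frequency of term $i$ in document $d$. The corpus matrix is $A = [w_1, \cdots, w_n] \in \mathbb{R}^{m \times n}$. For a query $q \in \mathbb{R}^m$, rank documents according to a $\frac{q^T w_d}{\|w_d\|_2}$ score.
In latent semantic indexing, you do the same, but in a $k$ dimensional subspace. Factor $A = U\Sigma V^T$, then define $A^* = \Sigma_{1:k,1:k} V_{:,1:k}^T \in \mathbb{R}^{k \times n}$. Each $\hat{w}_d = A^*_{:,d} = U_{:,1:k}^T w_d$, and $\hat{q} = U_{:,1:k}^T q$.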
Nonlinear Equation Solving
Given $f : \mathbb{R}^n \to \mathbb{R}^m$, we want $x$ such that $f(x) = 0$.
In fixed point iteration, we choose $g : \mathbb{R}^n \to \mathbb{R}^n$ such that $x^{(k+1)} = g(x^{(k)})$. If it converges to $x^*$, $g(x^*) - x^* = 0$.
$g(x^{(k)}) = g(x^*) + \nabla g(x^*)(x^{(k)} - x^*) + O(\|x^{(k)} - x^*\|^2)$. For small $e^{(k)} = x^{(k)} - x^*$, ignore the last term. If $\nabla g(x^*)$ has $|\lambda_{\max}| < 1$, then $x^{(k)} \to x^*$ as $\|e^{(k)}\| \le c^k \|e^{(0)}\|$ for large $k$, where $c = |\lambda_{\max}| + \epsilon$, where $\epsilon$ is the influence of the ignored last term. This indicates a linear rate of convergence.
Suppose for $\nabla g(x^*) = QTQ^H$, $T$ is non-normal, i.e., $T$'s superdiagonal portion is large relative to the diagonal. Then this may not converge as $\|(\nabla g(x^*))^k\|$ initially grows!
In Newton's method, $x^{(k+1)} = x^{(k)} - (\nabla f(x^{(k)}))^{-1} f(x^{(k)})$. This converges quadratically, i.e., $\|e^{(k+1)}\| \le c \|e^{(k)}\|^2$.
Automatic differentiation takes advantage of the notion that a computer program is nothing but arithmetic operations, and one can apply the chain rule to get the derivative. This may be used to compute Jacobians and determinants.
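To make the quadratic convergence of Newton's method concrete, here is a minimal scalar sketch ($n = m = 1$, so the Jacobian is just $f'$). The function name `newton` and the test problem $f(x) = x^2 - 2$ are assumptions of this example.

```python
import math


def newton(f, df, x0, steps):
    """Run a fixed number of scalar Newton steps and return all iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))   # x_{k+1} = x_k - f(x_k) / f'(x_k)
    return xs


# Solve x^2 - 2 = 0 starting from x0 = 1; the root is sqrt(2).
iterates = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, 6)
errors = [abs(x - math.sqrt(2.0)) for x in iterates]
for k, e in enumerate(errors):
    print(k, e)
```

In the printed errors the number of correct digits roughly doubles at each step until rounding error takes over, which is the behavior $\|e^{(k+1)}\| \le c \|e^{(k)}\|^2$ predicts.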
Non-linear Least Squares
For $g : \mathbb{R}^n \to \mathbb{R}^m$, $m \ge n$, we want the $x$ for $\min \|g(x)\|_2$.
In the Gauss-Newton method, $x^{(k+1)} = x^{(k)} - h$ where $h = (\nabla g(x)^T \nabla g(x))^{-1} \nabla g(x)^T g(x)$. Note that $h$ is a solution to a linear least squares problem $\min \|\nabla g(x^{(k)}) h - g(x^{(k)})\|$!
GN is derived by applying Newton's method for unconstrained minimization (NMUM) to $g(x)^T g(x)$, and dropping a resulting tensor (derivative of Jacobian). You keep the quadratic convergence when $g(x^*) = 0$, since the tensor $\to 0$ as $k \to \infty$.

Ordinary Differential Equations
ODE (or PDE) has one (or multiple) independent variables.
In initial value problems, given $\frac{dy}{dt} = f(y, t)$, $y(t) \in \mathbb{R}^n$, and $y(0) = y_0$, we want $y(t)$ for $t > 0$. Examples include:
1. Exponential growth/decay with $\frac{dy}{dt} = ay$, with closed form $y(t) = y_0 e^{at}$. Growth if $a > 0$, decay if $a < 0$.
2. Ecological models, $\frac{dy_i}{dt} = f_i(y_1, \dots, y_n, t)$ for species $i = 1, \dots, n$. $y_i$ is population size, $f_i$ encodes species relationships.
3. Mechanics, e.g. wall-spring-block models for $F = ma$ ($a = \frac{d^2 x}{dt^2}$) and $F = -kx$, so $\frac{d^2 x}{dt^2} = \frac{-kx}{m}$. Yields $\frac{d[x, v]^T}{dt} = \left[ v, \frac{-kx}{m} \right]^T$ with $y_0$ as initial position and velocity.
For stability of an ODE, let $\frac{dy}{dt} = Ay$ for $A \in \mathbb{C}^{n \times n}$. The stable or neutrally stable or unstable case is where $\max_i \Re(\lambda_i(A)) < 0$ or $= 0$ or $> 0$ respectively.
In finite difference methods, approximate $y(t)$ by discrete points $y_0$ (given), $y_1, y_2, \dots$ so $y_k \approx y(t_k)$ for increasing $t_k$.
For many IVPs and FDMs, if the local truncation error (error at each step) is $O(h^{p+1})$, the global truncation error (error overall) is $O(h^p)$. Call $p$ the order of accuracy.
To find $p$, substitute the exact solution into the FDM formula, insert a remainder term $+R$ on RHS, use a Taylor series expansion, solve for $R$, keep only the leading term.
In Euler's method, let $y_{k+1} = y_k + f(y_k, t_k) h_k$ where $h_k = t_{k+1} - t_k$ is the step size, and $y' = f(y, t)$ is perhaps computed by finite difference. $p = 1$, very low. Explicit!
A stiff problem has widely ranging time scales in the solution, e.g., a transient initial velocity that in the true solution disappears immediately, chemical reaction rate variability over temperature, transients in electrical circuits. An explicit method requires $h_k$ to be on the smallest scale!
Backward Euler has $y_{k+1} = y_k + h f(y_{k+1}, t_{k+1})$. BE is implicit ($y_{k+1}$ on the RHS). If the original problem is stable, any $h$ will work!

Miscellaneous
For constant $p$, $\sum_{k=1}^{n} k^p = \frac{n^{p+1}}{p+1} + O(n^p)$.
$ax^2 + bx + c = 0$. $r_1, r_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. $r_1 r_2 = \frac{c}{a}$.
Exact arithmetic is slow, futile for inexact observations, and NA relies on approximate algorithms.

Optimization
In continuous optimization, $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function, $g : \mathbb{R}^n \to \mathbb{R}^m$ holds equality constraints, $h : \mathbb{R}^n \to \mathbb{R}^p$ holds inequality constraints.
$$\min f(x) \quad \text{s.t.} \quad g(x) = 0, \; h(x) \le 0$$
We did unrestricted optimization $\min f(x)$ in the course.
A ball is a set $B(x, r) = \{y \in \mathbb{R}^n : \|x - y\| < r\}$.
We have local minimizers $x^*$ which are the best in a region, i.e., $\exists r > 0$ such that $f(x^*) \le f(x)$ for all $x \in B(x^*, r)$. A global minimizer is the best local minimizer.
Assume $f$ is $C^2$. If $x^*$ is a local minimizer, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is PSD. Semi-conversely, if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is PD, then $x^*$ is a local minimizer.

Steepest Descent
Go where the function (locally) decreases most rapidly via $x^{(k+1)} = x^{(k)} - \alpha_k \nabla f(x^{(k)})$. $\alpha_k$ is explained later. SD is stateless: depends only on the current point. Too slow.

Newton's Method for Unconstrained Min.
Iterate by $x^{(k+1)} = x^{(k)} - (\nabla^2 f(x^{(k)}))^{-1} \nabla f(x^{(k)})$, derived by solving for where $\nabla f(x) = 0$. If $\nabla^2 f(x^{(k)})$ is PD and $\nabla f(x^{(k)}) \ne 0$, the step is a descent direction.
What if the Hessian isn't PD? Use (a) secant method, (b) direction of negative curvature where $h^T \nabla^2 f(x^{(k)}) h < 0$ where $h$ or $-h$ (doesn't work well in practice), (c) trust region idea so $h = -(\nabla^2 f(x^{(k)}) + tI)^{-1} \nabla f(x^{(k)})$ (interpolation of NMUM and SD), (d) factor $\nabla^2 f(x^{(k)})$ by Cholesky when checking for PD, detect 0 pivots, modify that diagonal in $\nabla^2 f(x^{(k)})$ and keep going (unjustified by theory, but works in practice).

Line Search
Line search, given $x^{(k)}$ and step $h$ (perhaps derived from SD or NMUM), finds an $\alpha > 0$ for $x^{(k+1)} = x^{(k)} + \alpha h$.
In exact line search, optimize $\min f(x^{(k)} + \alpha h)$ over $\alpha$. Frowned upon because it's computationally expensive.
In Armijo or backtrack line search, initialize $\alpha$. While $f(x^{(k)} + \alpha h) > f(x^{(k)}) + 0.1 \alpha \nabla f(x^{(k)})^T h$, halve $\alpha$.
Secant/quasi Newton methods use an approximate always PD $\nabla^2 f$. In Broyden-Fletcher-Goldfarb-Shanno:
  $B_0$ = initial approximate Hessian {OK to use $I$.}
  for k = 0, 1, 2, . . . do
    $s_k = -B_k^{-1} \nabla f(x^{(k)})$
    $x^{(k+1)} = x^{(k)} + \alpha_k s_k$ {Use special line search for $\alpha_k$!}
    $y_k = \nabla f(x^{(k+1)}) - \nabla f(x^{(k)})$
    $B_{k+1} = B_k + \frac{y_k y_k^T}{\alpha_k y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}$
  end for
By maintaining $B_k$ in factored form, can iterate in $O(n^2)$ flops. $B_k$ is SPD provided $s_k^T y_k > 0$ (use line search to increase $\alpha_k$ if needed). The secant condition $\alpha_k B_{k+1} s_k = y_k$ holds. If BFGS converges, it converges superlinearly.

LSI: Ando-Lee Analysis
In the Ando-Lee analysis of latent semantic indexing (LSI), for a corpus with $k$ topics, for $t \in 1 : k$ and $d \in 1 : n$, let $R_{t,d} \ge 0$ be document $d$'s relevance to topic $t$. $\|R_{:,d}\|_2 = 1$. True document similarity is $R^T R \in \mathbb{R}^{n \times n}$, where entry $(i, j)$ is relevance of $i$ to $j$. Using LSI, if $A$ contains information about $R^T R$, then $(A^*)^T A^*$ will approximate $R^T R$ well. LSI depends on even distribution of topics, where distribution is $\rho = \frac{\max_t \|R_{t,:}\|_2}{\min_t \|R_{t,:}\|_2}$. Great if $\rho$ is near 1, but if $\rho \gg 1$, LSI does worse.

Complex Numbers
Complex numbers are written $z = x + iy \in \mathbb{C}$ for $i = \sqrt{-1}$.
The real part is $x = \Re(z)$. The imaginary part is $y = \Im(z)$.
The conjugate of $z$ is $\bar{z} = x - iy$. $\bar{A}\bar{x} = \overline{(Ax)}$, $\bar{A}\bar{B} = \overline{(AB)}$.
The absolute value of $z$ is $|z| = \sqrt{x^2 + y^2}$.
The conjugate transpose of $x$ is $x^H = (\bar{x})^T$. $A \in \mathbb{C}^{n \times n}$ is Hermitian or self-adjoint if $A = A^H$.
If $Q^H Q = I$, $Q$ is unitary.

Eigenvalues & Eigenvectors
For $A \in \mathbb{C}^{n \times n}$, if $Ax = \lambda x$ where $x \ne 0$, $x$ is an eigenvector of $A$ and $\lambda$ is the corresponding eigenvalue.
Remember, $A - \lambda I$ is singular iff $\det(A - \lambda I) = 0$. With $\lambda$ as a variable, $\det(A - \lambda I)$ is $A$'s characteristic polynomial.
For nonsingular $T \in \mathbb{C}^{n \times n}$, $T^{-1} A T$ (the similarity transformation) is similar to $A$. Similar matrices have the same characteristic polynomial and hence the same eigenvalues (though probably different eigenvectors). This relationship is reflexive, transitive, and symmetric.
$A$ is diagonalizable if $A$ is similar to a diagonal matrix $D = T^{-1} A T$. $A$'s eigenvalues are $D$'s diagonals, and the eigenvectors are columns of $T$ since $AT_{:,i} = D_{i,i} T_{:,i}$. $A$ is diagonalizable iff it has $n$ linearly independent eigenvectors.
For symmetric $A \in \mathbb{R}^{n \times n}$, $A$ is diagonalizable, has all real eigenvalues, and the eigenvectors may be chosen as the columns of an orthogonal matrix $Q$. $A = QDQ^T$ is the eigendecomposition of $A$. Further for symmetric $A$:
1. The singular values are absolute values of eigenvalues.
2. Is SPD (or SPSD) iff eigenvalues $> 0$ (or $\ge 0$).
3. For SPD, singular values equal eigenvalues.
4. For $B \in \mathbb{R}^{m \times n}$, $m \ge n$, singular values of $B$ are the square roots of $B^T B$'s eigenvalues.
For any $A \in \mathbb{C}^{n \times n}$, the Schur form of $A$ is $A = QTQ^H$ with unitary $Q \in \mathbb{C}^{n \times n}$ and upper triangular $T \in \mathbb{C}^{n \times n}$.
In this sheet I denote $|\lambda_{\max}| = \max_{\lambda \in \{\lambda_1, \dots, \lambda_n\}} |\lambda|$.
For $B \in \mathbb{C}^{n \times n}$, then $\lim_{k \to \infty} B^k = 0$ if $|\lambda_{\max}|(B) < 1$.

Power Methods for Eigenvalues
$x^{(k+1)} = Ax^{(k)}$ converges to the eigenvector of $|\lambda_{\max}|(A)$.
Once you find an eigenvector $x$, find the associated eigenvalue through the Rayleigh quotient $\lambda = \frac{x^{(k)T} A x^{(k)}}{x^{(k)T} x^{(k)}}$.
The inverse shifted power method is $x^{(k+1)} = (A - \sigma I)^{-1} x^{(k)}$. If $A$ has eigenpairs $(\lambda_1, u_1), \dots, (\lambda_n, u_n)$, then $(A - \sigma I)^{-1}$ has eigenpairs $\left(\frac{1}{\lambda_1 - \sigma}, u_1\right), \dots, \left(\frac{1}{\lambda_n - \sigma}, u_n\right)$.
Factor $A = QHQ^T$ where $H$ is upper Hessenberg.
To factor $A = QHQ^T$, find successive Householder reflections $H_1, H_2, \dots$ that zero out rows 3 and lower of column 1, rows 4 and lower of column 2, etc. Then $Q = H_1^T \cdots H_{n-2}^T$.
  $A^{(0)} = A$
  for k = 0, 1, 2, . . . do
    Set $A^{(k)} - \mu^{(k)} I = Q^{(k)} R^{(k)}$
    $A^{(k+1)} = R^{(k)} Q^{(k)} + \mu^{(k)} I$
  end for
$A^{(k)}$ is similar to $A$ by orthog. trans. $U^{(k)} = Q^{(0)} \cdots Q^{(k-1)}$. Perhaps choose $\mu^{(k)}$ as eigenvalues of submatrices of $A$.

Arnoldi and Lanczos
Given $A \in \mathbb{R}^{n \times n}$ and unit length $q_1 \in \mathbb{R}^n$, output $Q, H$ such that $A = QHQ^T$. Use Lanczos for symmetric $A$.
Arnoldi
  for k = 1 : n - 1 do
    $q_{k+1} = Aq_k$
    for $\ell$ = 1 : k do
      $H(\ell, k) = q_\ell^T q_{k+1}$
      $q_{k+1} = q_{k+1} - H(\ell, k) q_\ell$
    end for
    $H(k + 1, k) = \|q_{k+1}\|_2$
    $q_{k+1} = \frac{q_{k+1}}{H(k+1, k)}$
  end for
Lanczos
  $\beta_0 = \|w_0\|_2$
  for k = 1, 2, . . . do
    $q_k = \frac{w_{k-1}}{\beta_{k-1}}$
    $u_k = Aq_k$
    $v_k = u_k - \beta_{k-1} q_{k-1}$
    $\alpha_k = q_k^T v_k$
    $w_k = v_k - \alpha_k q_k$
    $\beta_k = \|w_k\|_2$
  end for
For Lanczos, the $\alpha_k$ and $\beta_k$ are diagonal and subdiagonal entries of the Hermitian tridiagonal $T_k$, and we have $H$ in Arnoldi. After very few iterations of either method, the eigenvalues of $T_k$ and $H$ will be excellent approximations to the extreme eigenvalues of $A$.
For $k$ iterations, Arnoldi is $O(nk^2)$ time and $O(nk)$ space, Lanczos is $O(nk) + k \cdot M$ time ($M$ is time for matrix-vector multiplication) and $O(nk)$ space, or $O(n + k)$ space if old $q_k$'s are discarded.

Multivariate Calculus
Provided $f : \mathbb{R}^n \to \mathbb{R}$, the gradient and Hessian are
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}, \quad \nabla^2 f = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
If $f$ is $C^2$ (2nd partials are all continuous), $\nabla^2 f$ is symmetric.
The Taylor expansion for $f$ is $f(x + h) = f(x) + h^T \nabla f(x) + \frac{1}{2} h^T \nabla^2 f(x) h + O(\|h\|^3)$.
Provided $f : \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is
$$\nabla f = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$
$f$'s Taylor expansion is $f(x + h) = f(x) + \nabla f(x) h + O(\|h\|^2)$.
A linear (or quadratic) model approximates a function $f$ by the first two (or three) terms of $f$'s Taylor expansion.

Iterative Methods for Ax = b
Useful for sparse $A$ where GE would cause fill-in.
In the splitting method, $A = M - N$ and $Mv = c$ is easily solvable. Then, $x^{(k+1)} = M^{-1}\left(N x^{(k)} + b\right)$. If it converges, the limit point $x^*$ is a solution to $Ax = b$.
The error is $e^{(k)} = (M^{-1} N)^k e^{(0)}$, so splitting methods converge if $|\lambda_{\max}|(M^{-1} N) < 1$.
In the Jacobi method, consider $M$ as the diagonals of $A$. This will fail if $A$ has any zero diagonals.

Conjugate Gradient
Conjugate gradient iteratively solves $Ax = b$ for SPD $A$. It is derived from Lanczos and takes advantage of the fact that if $A$ is SPD then $T$ is SPD. It produces the exact solution after $n$ iterations. Time per iteration is $O(n) + M$.
  $x^{(0)}$ = arbitrary (0 is okay)
  $r_0 = b - Ax^{(0)}$
  $p_0 = r_0$
  for k = 0, 1, 2, . . . do
    $\alpha_k = (r_k^T r_k) / (p_k^T A p_k)$
    $x^{(k+1)} = x^{(k)} + \alpha_k p_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    $\beta_{k+1} = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)$
    $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
  end for
Error is reduced by $(\sqrt{\kappa(A)} - 1) / (\sqrt{\kappa(A)} + 1)$ per iteration. Thus, for $\kappa(A) = 1$, CG converges after 1 iteration. To speed up CG, use a preconditioner $M$ such that $\kappa(MA) \ll \kappa(A)$ and solve $MAx = Mb$ instead.
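The conjugate gradient iteration for SPD $A$ can be sketched in pure Python as below. The function name `conjugate_gradient`, the dense list-of-rows storage, the residual tolerance, and the default cap of $2n$ iterations are all assumptions made for this illustration; a practical solver would use a sparse matrix-vector product.

```python
def conjugate_gradient(a, b, tol=1e-12, max_iter=None):
    """Solve Ax = b for symmetric positive definite A (list of rows)."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n            # x^(0) = 0
    r = b[:]                 # r_0 = b - A x^(0) = b
    p = r[:]                 # p_0 = r_0
    rr = dot(r, r)
    for _ in range(max_iter if max_iter is not None else 2 * n):
        if rr ** 0.5 <= tol:
            break            # residual is small enough
        ap = matvec(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_next = dot(r, r)
        beta = rr_next / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_next
    return x


cg_solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(cg_solution)
```

For this 2-by-2 example the exact solution is $x = (1/11, 7/11)$, and in exact arithmetic CG reaches it in at most $n = 2$ iterations.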