
EE263 Autumn 2010-11 Stephen Boyd

Lecture 1
Overview
course mechanics
outline & topics
what is a linear dynamical system?
why study linear systems?
some examples
Course mechanics
all class info, lectures, homeworks, announcements on class web page:
www.stanford.edu/class/ee263
course requirements:
weekly homework
take-home midterm exam (date TBD)
take-home final exam (date TBD)
Prerequisites
exposure to linear algebra (e.g., Math 103)
exposure to Laplace transform, differential equations
not needed, but might increase appreciation:
control systems
circuits & systems
dynamics
Major topics & outline
linear algebra & applications
autonomous linear dynamical systems
linear dynamical systems with inputs & outputs
basic quadratic control & estimation
Linear dynamical system
continuous-time linear dynamical system (CT LDS) has the form
dx/dt = A(t)x(t) + B(t)u(t),    y(t) = C(t)x(t) + D(t)u(t)
where:
t ∈ R denotes time
x(t) ∈ R^n is the state (vector)
u(t) ∈ R^m is the input or control
y(t) ∈ R^p is the output
A(t) ∈ R^(n×n) is the dynamics matrix
B(t) ∈ R^(n×m) is the input matrix
C(t) ∈ R^(p×n) is the output or sensor matrix
D(t) ∈ R^(p×m) is the feedthrough matrix
for lighter appearance, equations are often written
ẋ = Ax + Bu,    y = Cx + Du
CT LDS is a first-order vector differential equation
also called state equations, or m-input, n-state, p-output LDS
Some LDS terminology
most linear systems encountered are time-invariant: A, B, C, D are
constant, i.e., don't depend on t
when there is no input u (hence, no B or D) system is called
autonomous
very often there is no feedthrough, i.e., D = 0
when u(t) and y(t) are scalar, system is called single-input,
single-output (SISO); when input & output signal dimensions are more
than one, MIMO
Discrete-time linear dynamical system
discrete-time linear dynamical system (DT LDS) has the form
x(t + 1) = A(t)x(t) +B(t)u(t), y(t) = C(t)x(t) +D(t)u(t)
where
t ∈ Z_+ = {0, 1, 2, . . .}
(vector) signals x, u, y are sequences
DT LDS is a first-order vector recursion
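As an aside, the DT recursion is easy to run numerically. Below is a minimal NumPy sketch (not part of the original slides) simulating a time-invariant DT LDS; the matrices A, B, C, D and the step input are arbitrary placeholders. A CT LDS can be handled similarly after discretizing dx/dt.

```python
# Sketch: simulate a time-invariant DT LDS
#   x(t+1) = A x(t) + B u(t),  y(t) = C x(t) + D u(t)
# A, B, C, D and the input sequence below are made-up placeholders.
import numpy as np

def simulate_dt_lds(A, B, C, D, x0, u_seq):
    """Return the output sequence y(0), ..., y(T-1)."""
    x = x0
    ys = []
    for u in u_seq:
        ys.append(C @ x + D @ u)
        x = A @ x + B @ u          # state recursion
    return np.array(ys)

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # dynamics matrix (n = 2)
B = np.array([[0.0], [1.0]])             # input matrix (m = 1)
C = np.array([[1.0, 0.0]])               # output matrix (p = 1)
D = np.zeros((1, 1))                     # no feedthrough
u_seq = np.ones((50, 1))                 # step input
y = simulate_dt_lds(A, B, C, D, np.zeros(2), u_seq)
```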
Why study linear systems?
applications arise in many areas, e.g.
automatic control systems
signal processing
communications
economics, nance
circuit analysis, simulation, design
mechanical and civil engineering
aeronautics
navigation, guidance
Usefulness of LDS
depends on availability of computing power, which is large &
increasing exponentially
used for
analysis & design
implementation, embedded in real-time systems
like DSP, was a specialized topic & technology 30 years ago
Origins and history
parts of LDS theory can be traced to 19th century
builds on classical circuits & systems (1920s on) (transfer functions
. . . ) but with more emphasis on linear algebra
first engineering application: aerospace, 1960s
transitioned from specialized topic to ubiquitous in 1980s
(just like digital signal processing, information theory, . . . )
Nonlinear dynamical systems
many dynamical systems are nonlinear (a fascinating topic) so why study
linear systems?
most techniques for nonlinear systems are based on linear methods
methods for linear systems often work unreasonably well, in practice, for
nonlinear systems
if you don't understand linear dynamical systems you certainly can't
understand nonlinear dynamical systems
Examples (ideas only, no details)
let's consider a specific system
ẋ = Ax,    y = Cx
with x(t) ∈ R^16, y(t) ∈ R (a 16-state single-output system)
model of a lightly damped mechanical system, but it doesn't matter
typical output:
[plot: output y versus t, shown over 0 ≤ t ≤ 350 and over 0 ≤ t ≤ 1000]
output waveform is very complicated; looks almost random and unpredictable
we'll see that such a solution can be decomposed into much simpler (modal) components
[plot: the same output decomposed into eight much simpler modal components, each versus t]
(idea probably familiar from poles)
Input design
add two inputs, two outputs to system:
ẋ = Ax + Bu,    y = Cx,    x(0) = 0
where B ∈ R^(16×2), C ∈ R^(2×16) (same A as before)
problem: find appropriate u : R_+ → R^2 so that y(t) → y_des = (1, 2)
simple approach: consider static conditions (u, x, y constant):
ẋ = 0 = Ax + Bu_static,    y = y_des = Cx
solve for u to get:
u_static = −(CA^(−1)B)^(−1) y_des = (0.63, 0.36)
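A quick sketch of the static-input calculation above (not from the slides): the matrices here are random stand-ins for the 16-state model, so the numerical result will differ from the values quoted above.

```python
# Sketch of the static calculation u_static = -(C A^{-1} B)^{-1} y_des.
# A, B, C below are random placeholders, not the lightly damped example above.
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 16, 2, 2
A = rng.standard_normal((n, n)) - 3 * np.eye(n)   # some well-conditioned dynamics matrix
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
y_des = np.array([1.0, 2.0])

# static conditions: 0 = A x + B u  =>  x = -A^{-1} B u  =>  y = -C A^{-1} B u
G = -C @ np.linalg.solve(A, B)        # DC gain from u to y
u_static = np.linalg.solve(G, y_des)  # input that yields y = y_des in steady state
```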
let's apply u = u_static and just wait for things to settle:
[plots: u_1, u_2, y_1, y_2 versus t, over −200 ≤ t ≤ 1800]
. . . takes about 1500 sec for y(t) to converge to y_des
using very clever input waveforms (EE263) we can do much better, e.g.
[plots: u_1, u_2, y_1, y_2 versus t, over 0 ≤ t ≤ 60]
. . . here y converges exactly in 50 sec
in fact by using larger inputs we do still better, e.g.
[plots: u_1, u_2, y_1, y_2 versus t, over −5 ≤ t ≤ 25]
. . . here we have (exact) convergence in 20 sec
in this course we'll study
how to synthesize or design such inputs
the trade-off between size of u and convergence time
Estimation / filtering
[block diagram: u → H(s) → w → A/D → y]
signal u is piecewise constant (period 1 sec)
filtered by 2nd-order system H(s), step response s(t)
A/D runs at 10Hz, with 3-bit quantizer
[plots: s(t), u(t), w(t), y(t) versus t, 0 ≤ t ≤ 10]
problem: estimate original signal u, given quantized, filtered signal y
simple approach:
ignore quantization
design equalizer G(s) for H(s) (i.e., GH ≈ 1)
approximate u as G(s)y
. . . yields terrible results
formulate as estimation problem (EE263) . . .
[plot: u(t) (solid) and its estimate û(t) (dotted) versus t, 0 ≤ t ≤ 10]
RMS error 0.03, well below quantization error (!)
EE263 Autumn 2010-11 Stephen Boyd
Lecture 2
Linear functions and examples
linear equations and functions
engineering examples
interpretations
Linear equations
consider system of linear equations
y_1 = a_11 x_1 + a_12 x_2 + · · · + a_1n x_n
y_2 = a_21 x_1 + a_22 x_2 + · · · + a_2n x_n
    ...
y_m = a_m1 x_1 + a_m2 x_2 + · · · + a_mn x_n
can be written in matrix form as y = Ax, where
y = [y_1; y_2; . . . ; y_m],    A = [a_11 a_12 · · · a_1n; a_21 a_22 · · · a_2n; . . . ; a_m1 a_m2 · · · a_mn],    x = [x_1; x_2; . . . ; x_n]
Linear functions
a function f : R^n → R^m is linear if
f(x + y) = f(x) + f(y), ∀ x, y ∈ R^n
f(αx) = αf(x), ∀ x ∈ R^n, ∀ α ∈ R
i.e., superposition holds
[figure: vectors x, y, x + y and their images f(x), f(y), f(x + y)]
Matrix multiplication function
consider function f : R^n → R^m given by f(x) = Ax, where A ∈ R^(m×n)
matrix multiplication function f is linear
converse is true: any linear function f : R^n → R^m can be written as f(x) = Ax for some A ∈ R^(m×n)
representation via matrix multiplication is unique: for any linear function f there is only one matrix A for which f(x) = Ax for all x
y = Ax is a concrete representation of a generic linear function
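A small numerical aside (not from the slides): the matrix of a linear function can be recovered column by column by applying f to the unit vectors, a fact used again later in this lecture (Ae_j = a_j). The function f below is a made-up example.

```python
# Sketch: recover the matrix of a linear function by applying it to unit vectors.
# f is a made-up linear function from R^3 to R^2.
import numpy as np

def f(x):
    return np.array([2 * x[0] - x[2], x[1] + 4 * x[2]])

n = 3
A = np.column_stack([f(e) for e in np.eye(n)])   # jth column is f(e_j)

x = np.array([1.0, -2.0, 0.5])
assert np.allclose(f(x), A @ x)                  # f(x) = Ax for all x
```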
Interpretations of y = Ax
y is measurement or observation; x is unknown to be determined
x is input or action; y is output or result
y = Ax defines a function or transformation that maps x ∈ R^n into y ∈ R^m
Interpretation of a_ij
y_i = Σ_{j=1}^n a_ij x_j
a_ij is gain factor from jth input (x_j) to ith output (y_i)
thus, e.g.,
ith row of A concerns ith output
jth column of A concerns jth input
a_27 = 0 means 2nd output (y_2) doesn't depend on 7th input (x_7)
|a_31| ≫ |a_3j| for j ≠ 1 means y_3 depends mainly on x_1
|a_52| ≫ |a_i2| for i ≠ 5 means x_2 affects mainly y_5
A is lower triangular, i.e., a_ij = 0 for i < j, means y_i only depends on x_1, . . . , x_i
A is diagonal, i.e., a_ij = 0 for i ≠ j, means ith output depends only on ith input
more generally, sparsity pattern of A, i.e., list of zero/nonzero entries of A, shows which x_j affect which y_i
Linear elastic structure
x_j is external force applied at some node, in some fixed direction
y_i is (small) deflection of some node, in some fixed direction
[figure: structure with applied forces x_1, x_2, x_3, x_4]
(provided x, y are small) we have y ≈ Ax
A is called the compliance matrix
a_ij gives deflection i per unit force at j (in m/N)
Total force/torque on rigid body
[figure: rigid body with center of gravity CG and applied forces/torques x_1, x_2, x_3, x_4]
x_j is external force/torque applied at some point/direction/axis
y ∈ R^6 is resulting total force & torque on body
(y_1, y_2, y_3 are x-, y-, z-components of total force, y_4, y_5, y_6 are x-, y-, z-components of total torque)
we have y = Ax
A depends on geometry (of applied forces and torques with respect to center of gravity CG)
jth column gives resulting force & torque for unit force/torque j
Linear static circuit
interconnection of resistors, linear dependent (controlled) sources, and
independent sources
[figure: example circuit with independent sources x_1, x_2, a dependent source controlled by i_b, and circuit variables y_1, y_2, y_3]
x_j is value of independent source j
y_i is some circuit variable (voltage, current)
we have y = Ax
if x_j are currents and y_i are voltages, A is called the impedance or resistance matrix
Final position/velocity of mass due to applied forces
[figure: unit mass subject to applied force f]
unit mass, zero position/velocity at t = 0, subject to force f(t) for 0 ≤ t ≤ n
f(t) = x_j for j − 1 ≤ t < j, j = 1, . . . , n
(x is the sequence of applied forces, constant in each interval)
y_1, y_2 are final position and velocity (i.e., at t = n)
we have y = Ax
a_1j gives influence of applied force during j − 1 ≤ t < j on final position
a_2j gives influence of applied force during j − 1 ≤ t < j on final velocity
Gravimeter prospecting
[figure: earth divided into voxels with densities ρ_j; gravimeters at the surface measure g_i − g_avg]
x_j = ρ_j − ρ_avg is (excess) mass density of earth in voxel j;
y_i is measured gravity anomaly at location i, i.e., some component (typically vertical) of g_i − g_avg
y = Ax
A comes from physics and geometry
jth column of A shows sensor readings caused by unit density anomaly at voxel j
ith row of A shows sensitivity pattern of sensor i
Thermal system
[figure: room with heating elements x_1, . . . , x_5 (e.g., heating element 5) and sensing locations (e.g., location 4)]
x_j is power of jth heating element or heat source
y_i is change in steady-state temperature at location i
thermal transport via conduction
y = Ax
a_ij gives influence of heater j at location i (in °C/W)
jth column of A gives pattern of steady-state temperature rise due to 1W at heater j
ith row shows how heaters affect location i
Illumination with multiple lamps
[figure: lamps with powers x_j illuminating patches with illumination levels y_i; r_ij is the distance and θ_ij the angle of incidence from lamp j to patch i]
n lamps illuminating m (small, flat) patches, no shadows
x_j is power of jth lamp; y_i is illumination level of patch i
y = Ax, where a_ij = r_ij^(−2) max{cos θ_ij, 0}
(cos θ_ij < 0 means patch i is shaded from lamp j)
jth column of A shows illumination pattern from lamp j
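A minimal sketch (not from the slides) of how the illumination matrix above could be assembled from geometry; the lamp and patch coordinates are arbitrary placeholders and the patches are assumed to face straight up.

```python
# Sketch: build a_ij = r_ij^{-2} max{cos(theta_ij), 0} from made-up geometry.
import numpy as np

lamps = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 2.5]])      # lamp positions (placeholders)
patches = np.array([[0.2, 0.1, 0.0], [0.8, 0.9, 0.0]])    # patch centers (placeholders)
normal = np.array([0.0, 0.0, 1.0])                        # patches face straight up

m, n = len(patches), len(lamps)
A = np.zeros((m, n))
for i, p in enumerate(patches):
    for j, l in enumerate(lamps):
        d = l - p                          # vector from patch i to lamp j
        r = np.linalg.norm(d)
        cos_theta = d @ normal / r
        A[i, j] = max(cos_theta, 0.0) / r**2

x = np.array([10.0, 5.0])     # lamp powers
y = A @ x                     # illumination levels
```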
Signal and interference power in wireless system
n transmitter/receiver pairs
transmitter j transmits to receiver j (and, inadvertently, to the other receivers)
p_j is power of jth transmitter
s_i is received signal power of ith receiver
z_i is received interference power of ith receiver
G_ij is path gain from transmitter j to receiver i
we have s = Ap, z = Bp, where
a_ij = G_ii for i = j, 0 for i ≠ j;    b_ij = 0 for i = j, G_ij for i ≠ j
A is diagonal; B has zero diagonal (ideally, A is 'large', B is 'small')
Cost of production
production inputs (materials, parts, labor, . . . ) are combined to make a
number of products
x_j is price per unit of production input j
a_ij is units of production input j required to manufacture one unit of product i
y_i is production cost per unit of product i
we have y = Ax
ith row of A is bill of materials for unit of product i
production inputs needed:
q_i is quantity of product i to be produced
r_j is total quantity of production input j needed
we have r = A^T q
total production cost is r^T x = (A^T q)^T x = q^T Ax
Network traffic and flows
n flows with rates f_1, . . . , f_n pass from their source nodes to their destination nodes over fixed routes in a network
t_i, traffic on link i, is sum of rates of flows passing through it
flow routes given by flow-link incidence matrix
A_ij = 1 if flow j goes over link i, 0 otherwise
traffic and flow rates related by t = Af
link delays and flow latency:
let d_1, . . . , d_m be link delays, and l_1, . . . , l_n be latency (total travel time) of flows
l = A^T d
f^T l = f^T A^T d = (Af)^T d = t^T d, total # of packets in network
Linearization
if f : R^n → R^m is differentiable at x_0 ∈ R^n, then
x near x_0  =⇒  f(x) very near f(x_0) + Df(x_0)(x − x_0)
where
Df(x_0)_ij = ∂f_i/∂x_j evaluated at x_0
is derivative (Jacobian) matrix
with y = f(x), y_0 = f(x_0), define input deviation δx := x − x_0, output deviation δy := y − y_0
then we have δy ≈ Df(x_0)δx
when deviations are small, they are (approximately) related by a linear function
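A short numerical sketch (not from the slides) of the linearization idea: form the Jacobian by finite differences for a made-up f and check that small deviations are nearly related by it.

```python
# Sketch: verify f(x) ≈ f(x0) + Df(x0)(x - x0) for a made-up f : R^2 -> R^2.
import numpy as np

def f(x):
    return np.array([np.sin(x[0]) + x[0] * x[1], x[1] ** 2 - x[0]])

def jacobian(f, x0, h=1e-6):
    """Central finite-difference approximation of Df(x0)."""
    cols = [(f(x0 + h * e) - f(x0 - h * e)) / (2 * h) for e in np.eye(len(x0))]
    return np.column_stack(cols)

x0 = np.array([0.3, -0.5])
Df = jacobian(f, x0)
dx = np.array([0.01, -0.02])                  # small input deviation
print(f(x0 + dx) - (f(x0) + Df @ dx))         # should be tiny (second order in dx)
```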
Navigation by range measurement
(x, y) unknown coordinates in plane
(p_i, q_i) known coordinates of beacons for i = 1, 2, 3, 4
ρ_i measured (known) distance or range from beacon i
[figure: unknown position (x, y), beacons at (p_1, q_1), . . . , (p_4, q_4), and ranges ρ_1, . . . , ρ_4]
ρ ∈ R^4 is a nonlinear function of (x, y) ∈ R^2:
ρ_i(x, y) = sqrt( (x − p_i)^2 + (y − q_i)^2 )
linearize around (x_0, y_0): δρ ≈ A (δx, δy), where
a_i1 = (x_0 − p_i) / sqrt( (x_0 − p_i)^2 + (y_0 − q_i)^2 ),    a_i2 = (y_0 − q_i) / sqrt( (x_0 − p_i)^2 + (y_0 − q_i)^2 )
ith row of A shows (approximate) change in ith range measurement for (small) shift in (x, y) from (x_0, y_0)
first column of A shows sensitivity of range measurements to (small) change in x from x_0
obvious application: (x_0, y_0) is last navigation fix; (x, y) is current position, a short time later
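A small sketch (not from the slides) that builds the linearized range matrix above for made-up beacon locations and compares the predicted range changes with the exact ones.

```python
# Sketch: linearized range-measurement matrix A for made-up beacons and (x0, y0).
import numpy as np

beacons = np.array([[10.0, 0.0], [0.0, 10.0], [-8.0, 3.0], [5.0, -7.0]])  # (p_i, q_i)
x0 = np.array([1.0, 2.0])                                                 # (x0, y0)

diff = x0 - beacons                          # rows: (x0 - p_i, y0 - q_i)
rho0 = np.linalg.norm(diff, axis=1)          # ranges at the linearization point
A = diff / rho0[:, None]                     # entries a_i1, a_i2 as above

dxy = np.array([0.1, -0.05])                 # small shift in position
drho_lin = A @ dxy                           # linearized change in ranges
drho_true = np.linalg.norm(x0 + dxy - beacons, axis=1) - rho0
print(drho_lin - drho_true)                  # small: linearization is accurate
```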
Broad categories of applications
linear model or function y = Ax
some broad categories of applications:
estimation or inversion
control or design
mapping or transformation
(this list is not exclusive; can have combinations . . . )
Estimation or inversion
y = Ax
y_i is ith measurement or sensor reading (which we know)
x_j is jth parameter to be estimated or determined
a_ij is sensitivity of ith sensor to jth parameter
sample problems:
find x, given y
find all x's that result in y (i.e., all x's consistent with measurements)
if there is no x such that y = Ax, find x s.t. y ≈ Ax (i.e., if the sensor readings are inconsistent, find x which is almost consistent)
Control or design
y = Ax
x is vector of design parameters or inputs (which we can choose)
y is vector of results, or outcomes
A describes how input choices affect results
sample problems:
find x so that y = y_des
find all x's that result in y = y_des (i.e., find all designs that meet specifications)
among x's that satisfy y = y_des, find a small one (i.e., find a small or efficient x that meets specifications)
Mapping or transformation
x is mapped or transformed to y by linear function y = Ax
sample problems:
determine if there is an x that maps to a given y
(if possible) find an x that maps to y
find all x's that map to a given y
if there is only one x that maps to y, find it (i.e., decode or undo the mapping)
Matrix multiplication as mixture of columns
write A ∈ R^(m×n) in terms of its columns:
A = [a_1 a_2 · · · a_n]
where a_j ∈ R^m
then y = Ax can be written as
y = x_1 a_1 + x_2 a_2 + · · · + x_n a_n
(x_j's are scalars, a_j's are m-vectors)
y is a (linear) combination or mixture of the columns of A
coefficients of x give coefficients of mixture
an important example: x = e_j, the jth unit vector
e_1 = [1; 0; . . . ; 0],   e_2 = [0; 1; . . . ; 0],   . . . ,   e_n = [0; 0; . . . ; 1]
then Ae_j = a_j, the jth column of A
(e_j corresponds to a pure mixture, giving only column j)
Matrix multiplication as inner product with rows
write A in terms of its rows:
A = [ã_1^T; ã_2^T; . . . ; ã_m^T]
where ã_i ∈ R^n
then y = Ax can be written as
y = [ã_1^T x; ã_2^T x; . . . ; ã_m^T x]
thus y_i = ⟨ã_i, x⟩, i.e., y_i is inner product of ith row of A with x
geometric interpretation:
y_i = ã_i^T x = α is a hyperplane in R^n (normal to ã_i)
[figure: parallel hyperplanes y_i = ⟨ã_i, x⟩ = 0, 1, 2, 3, all normal to ã_i]
Block diagram representation
y = Ax can be represented by a signal flow graph or block diagram
e.g. for m = n = 2, we represent
[y_1; y_2] = [a_11 a_12; a_21 a_22] [x_1; x_2]
as
[signal flow graph: inputs x_1, x_2 and outputs y_1, y_2 connected by branches with gains a_11, a_21, a_12, a_22]
a_ij is the gain along the path from jth input to ith output
(by not drawing paths with zero gain) shows sparsity structure of A (e.g., diagonal, block upper triangular, arrow . . . )
example: block upper triangular, i.e.,
A = [A_11 A_12; 0 A_22]
where A_11 ∈ R^(m_1×n_1), A_12 ∈ R^(m_1×n_2), A_21 ∈ R^(m_2×n_1), A_22 ∈ R^(m_2×n_2)
partition x and y conformably as
x = [x_1; x_2],   y = [y_1; y_2]
(x_1 ∈ R^(n_1), x_2 ∈ R^(n_2), y_1 ∈ R^(m_1), y_2 ∈ R^(m_2)) so
y_1 = A_11 x_1 + A_12 x_2,    y_2 = A_22 x_2,
i.e., y_2 doesn't depend on x_1
block diagram:
[block diagram: x_1 → A_11 → y_1, x_2 → A_12 → y_1, x_2 → A_22 → y_2]
. . . no path from x_1 to y_2, so y_2 doesn't depend on x_1
Matrix multiplication as composition
for A ∈ R^(m×n) and B ∈ R^(n×p), C = AB ∈ R^(m×p) where
c_ij = Σ_{k=1}^n a_ik b_kj
composition interpretation: y = Cz represents composition of y = Ax and x = Bz
[block diagram: z ∈ R^p → B → x ∈ R^n → A → y ∈ R^m, equivalent to z → AB → y]
(note that B is on left in block diagram)
Column and row interpretations
can write product C = AB as
C = [c_1 · · · c_p] = AB = [Ab_1 · · · Ab_p]
i.e., ith column of C is A acting on ith column of B
similarly we can write
C = [c̃_1^T; . . . ; c̃_m^T] = AB = [ã_1^T B; . . . ; ã_m^T B]
i.e., ith row of C is ith row of A acting (on left) on B
Inner product interpretation
inner product interpretation:
c_ij = ã_i^T b_j = ⟨ã_i, b_j⟩
i.e., entries of C are inner products of rows of A and columns of B
c_ij = 0 means ith row of A is orthogonal to jth column of B
Gram matrix of vectors f_1, . . . , f_n defined as G_ij = f_i^T f_j
(gives inner product of each vector with the others)
G = [f_1 · · · f_n]^T [f_1 · · · f_n]
Matrix multiplication interpretation via paths
[signal flow graph: inputs z_1, z_2 feed x_1, x_2 through gains b_11, b_21, b_12, b_22, which feed outputs y_1, y_2 through gains a_11, a_21, a_12, a_22; e.g., path gain = a_22 b_21]
a_ik b_kj is gain of path from input j to output i via k
c_ij is sum of gains over all paths from input j to output i
EE263 Autumn 2010-11 Stephen Boyd
Lecture 3
Linear algebra review
vector space, subspaces
independence, basis, dimension
range, nullspace, rank
change of coordinates
norm, angle, inner product
Vector spaces
a vector space or linear space (over the reals) consists of
a set V
a vector sum + : V × V → V
a scalar multiplication : R × V → V
a distinguished element 0 ∈ V
which satisfy a list of properties
x + y = y + x, ∀ x, y ∈ V (+ is commutative)
(x + y) + z = x + (y + z), ∀ x, y, z ∈ V (+ is associative)
0 + x = x, ∀ x ∈ V (0 is additive identity)
∀ x ∈ V ∃ (−x) ∈ V s.t. x + (−x) = 0 (existence of additive inverse)
(αβ)x = α(βx), ∀ α, β ∈ R, ∀ x ∈ V (scalar mult. is associative)
α(x + y) = αx + αy, ∀ α ∈ R, ∀ x, y ∈ V (right distributive rule)
(α + β)x = αx + βx, ∀ α, β ∈ R, ∀ x ∈ V (left distributive rule)
1x = x, ∀ x ∈ V
Examples
V_1 = R^n, with standard (componentwise) vector addition and scalar multiplication
V_2 = {0} (where 0 ∈ R^n)
V_3 = span(v_1, v_2, . . . , v_k) where
span(v_1, v_2, . . . , v_k) = { α_1 v_1 + · · · + α_k v_k | α_i ∈ R }
and v_1, . . . , v_k ∈ R^n
Subspaces
a subspace of a vector space is a subset of a vector space which is itself
a vector space
roughly speaking, a subspace is closed under vector addition and scalar
multiplication
examples V_1, V_2, V_3 above are subspaces of R^n
Vector spaces of functions
V_4 = { x : R_+ → R^n | x is differentiable }, where vector sum is sum of functions:
(x + z)(t) = x(t) + z(t)
and scalar multiplication is defined by
(αx)(t) = αx(t)
(a point in V_4 is a trajectory in R^n)
V_5 = { x ∈ V_4 | ẋ = Ax }
(points in V_5 are trajectories of the linear system ẋ = Ax)
V_5 is a subspace of V_4
Independent set of vectors
a set of vectors {v_1, v_2, . . . , v_k} is independent if
α_1 v_1 + α_2 v_2 + · · · + α_k v_k = 0  =⇒  α_1 = α_2 = · · · = 0
some equivalent conditions:
coefficients of α_1 v_1 + α_2 v_2 + · · · + α_k v_k are uniquely determined, i.e.,
α_1 v_1 + α_2 v_2 + · · · + α_k v_k = β_1 v_1 + β_2 v_2 + · · · + β_k v_k
implies α_1 = β_1, α_2 = β_2, . . . , α_k = β_k
no vector v_i can be expressed as a linear combination of the other vectors v_1, . . . , v_(i−1), v_(i+1), . . . , v_k
Basis and dimension
set of vectors {v_1, v_2, . . . , v_k} is a basis for a vector space V if
v_1, v_2, . . . , v_k span V, i.e., V = span(v_1, v_2, . . . , v_k)
{v_1, v_2, . . . , v_k} is independent
equivalent: every v ∈ V can be uniquely expressed as
v = α_1 v_1 + · · · + α_k v_k
fact: for a given vector space V, the number of vectors in any basis is the same
number of vectors in any basis is called the dimension of V, denoted dim V
(we assign dim{0} = 0, and dim V = ∞ if there is no basis)
Nullspace of a matrix
the nullspace of A ∈ R^(m×n) is defined as
N(A) = { x ∈ R^n | Ax = 0 }
N(A) is set of vectors mapped to zero by y = Ax
N(A) is set of vectors orthogonal to all rows of A
N(A) gives ambiguity in x given y = Ax:
if y = Ax and z ∈ N(A), then y = A(x + z)
conversely, if y = Ax and y = Ax̃, then x̃ = x + z for some z ∈ N(A)
Zero nullspace
A is called one-to-one if 0 is the only element of its nullspace: N(A) = {0} ⇐⇒
x can always be uniquely determined from y = Ax (i.e., the linear transformation y = Ax doesn't 'lose' information)
mapping from x to Ax is one-to-one: different x's map to different y's
columns of A are independent (hence, a basis for their span)
A has a left inverse, i.e., there is a matrix B ∈ R^(n×m) s.t. BA = I
det(A^T A) ≠ 0
(we'll establish these later)
Interpretations of nullspace
suppose z ∈ N(A)
y = Ax represents measurement of x
z is undetectable from sensors: get zero sensor readings
x and x + z are indistinguishable from sensors: Ax = A(x + z)
N(A) characterizes ambiguity in x from measurement y = Ax
y = Ax represents output resulting from input x
z is an input with no result
x and x + z have same result
N(A) characterizes freedom of input choice for given result
Range of a matrix
the range of A ∈ R^(m×n) is defined as
R(A) = { Ax | x ∈ R^n } ⊆ R^m
R(A) can be interpreted as
the set of vectors that can be 'hit' by linear mapping y = Ax
the span of columns of A
the set of vectors y for which Ax = y has a solution
Onto matrices
A is called onto if R(A) = R^m ⇐⇒
Ax = y can be solved in x for any y
columns of A span R^m
A has a right inverse, i.e., there is a matrix B ∈ R^(n×m) s.t. AB = I
rows of A are independent
N(A^T) = {0}
det(AA^T) ≠ 0
(some of these are not obvious; we'll establish them later)
Interpretations of range
suppose v ∈ R(A), w ∉ R(A)
y = Ax represents measurement of x
y = v is a possible or consistent sensor signal
y = w is impossible or inconsistent; sensors have failed or model is wrong
y = Ax represents output resulting from input x
v is a possible result or output
w cannot be a result or output
R(A) characterizes the possible results or achievable outputs
Inverse
A ∈ R^(n×n) is invertible or nonsingular if det A ≠ 0
equivalent conditions:
columns of A are a basis for R^n
rows of A are a basis for R^n
y = Ax has a unique solution x for every y ∈ R^n
A has a (left and right) inverse denoted A^(−1) ∈ R^(n×n), with AA^(−1) = A^(−1)A = I
N(A) = {0}
R(A) = R^n
det A^T A = det AA^T ≠ 0
Interpretations of inverse
suppose A ∈ R^(n×n) has inverse B = A^(−1)
mapping associated with B undoes mapping associated with A (applied either before or after!)
x = By is a perfect (pre- or post-) equalizer for the channel y = Ax
x = By is unique solution of Ax = y
Dual basis interpretation
let a_i be columns of A, and b̃_i^T be rows of B = A^(−1)
from y = x_1 a_1 + · · · + x_n a_n and x_i = b̃_i^T y, we get
y = Σ_{i=1}^n (b̃_i^T y) a_i
thus, inner product with rows of inverse matrix gives the coefficients in the expansion of a vector in the columns of the matrix
b̃_1, . . . , b̃_n and a_1, . . . , a_n are called dual bases
Rank of a matrix
we define the rank of A ∈ R^(m×n) as
rank(A) = dim R(A)
(nontrivial) facts:
rank(A) = rank(A^T)
rank(A) is maximum number of independent columns (or rows) of A, hence rank(A) ≤ min(m, n)
rank(A) + dim N(A) = n
Conservation of dimension
interpretation of rank(A) + dim N(A) = n:
rank(A) is dimension of set 'hit' by the mapping y = Ax
dim N(A) is dimension of set of x 'crushed' to zero by y = Ax
'conservation of dimension': each dimension of input is either crushed to zero or ends up in output
roughly speaking:
n is number of degrees of freedom in input x
dim N(A) is number of degrees of freedom lost in the mapping from x to y = Ax
rank(A) is number of degrees of freedom in output y
Coding interpretation of rank
rank of product: rank(BC) ≤ min{rank(B), rank(C)}
hence if A = BC with B ∈ R^(m×r), C ∈ R^(r×n), then rank(A) ≤ r
conversely: if rank(A) = r then A ∈ R^(m×n) can be factored as A = BC with B ∈ R^(m×r), C ∈ R^(r×n):
[block diagram: x ∈ R^n → C → (rank(A) = r lines) → B → y ∈ R^m, equivalent to x → A → y]
rank(A) = r is minimum size of vector needed to faithfully reconstruct y from x
Application: fast matrix-vector multiplication
need to compute matrix-vector product y = Ax, A ∈ R^(m×n)
A has known factorization A = BC, B ∈ R^(m×r)
computing y = Ax directly: mn operations
computing y = Ax as y = B(Cx) (compute z = Cx first, then y = Bz): rn + mr = (m + n)r operations
savings can be considerable if r ≪ min{m, n}
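A tiny sketch (not from the slides) of the factored multiply above; sizes are arbitrary and actual timings depend on the machine and BLAS library.

```python
# Sketch: computing y = B(Cx) instead of forming A = BC and computing Ax.
import numpy as np

m, n, r = 2000, 2000, 20
rng = np.random.default_rng(1)
B = rng.standard_normal((m, r))
C = rng.standard_normal((r, n))
x = rng.standard_normal(n)

A = B @ C                       # m*n storage; a product Ax costs ~ m*n operations
y_direct = A @ x
y_factored = B @ (C @ x)        # ~ (m + n)*r operations
assert np.allclose(y_direct, y_factored)
```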
Full rank matrices
for A ∈ R^(m×n) we always have rank(A) ≤ min(m, n)
we say A is full rank if rank(A) = min(m, n)
for square matrices, full rank means nonsingular
for skinny matrices (m ≥ n), full rank means columns are independent
for fat matrices (m ≤ n), full rank means rows are independent
Change of coordinates
'standard' basis vectors in R^n: (e_1, e_2, . . . , e_n) where
e_i = [0; . . . ; 1; . . . ; 0]   (1 in ith component)
obviously we have
x = x_1 e_1 + x_2 e_2 + · · · + x_n e_n
x_i are called the coordinates of x (in the standard basis)
if (t_1, t_2, . . . , t_n) is another basis for R^n, we have
x = x̃_1 t_1 + x̃_2 t_2 + · · · + x̃_n t_n
where x̃_i are the coordinates of x in the basis (t_1, t_2, . . . , t_n)
define T = [t_1 t_2 · · · t_n] so x = T x̃, hence
x̃ = T^(−1) x
(T is invertible since t_i are a basis)
T^(−1) transforms (standard basis) coordinates of x into t_i-coordinates
inner product of ith row of T^(−1) with x extracts the t_i-coordinate of x
consider linear transformation y = Ax, A ∈ R^(n×n)
express y and x in terms of t_1, t_2, . . . , t_n:
x = T x̃,   y = T ỹ
so
ỹ = (T^(−1)AT) x̃
A → T^(−1)AT is called similarity transformation
similarity transformation by T expresses linear transformation y = Ax in coordinates t_1, t_2, . . . , t_n
(Euclidean) norm
for x ∈ R^n we define the (Euclidean) norm as
‖x‖ = sqrt( x_1^2 + x_2^2 + · · · + x_n^2 ) = sqrt( x^T x )
‖x‖ measures length of vector (from origin)
important properties:
‖αx‖ = |α| ‖x‖ (homogeneity)
‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)
‖x‖ ≥ 0 (nonnegativity)
‖x‖ = 0 ⇐⇒ x = 0 (definiteness)
RMS value and (Euclidean) distance
root-mean-square (RMS) value of vector x ∈ R^n:
rms(x) = ( (1/n) Σ_{i=1}^n x_i^2 )^(1/2) = ‖x‖ / sqrt(n)
norm defines distance between vectors: dist(x, y) = ‖x − y‖
[figure: vectors x, y, and the difference x − y]
Inner product
⟨x, y⟩ := x_1 y_1 + x_2 y_2 + · · · + x_n y_n = x^T y
important properties:
⟨αx, y⟩ = α⟨x, y⟩
⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
⟨x, y⟩ = ⟨y, x⟩
⟨x, x⟩ ≥ 0
⟨x, x⟩ = 0 ⇐⇒ x = 0
f(y) = ⟨x, y⟩ is linear function : R^n → R, with linear map defined by row vector x^T
Cauchy-Schwartz inequality and angle between vectors
for any x, y ∈ R^n, |x^T y| ≤ ‖x‖ ‖y‖
(unsigned) angle between vectors in R^n defined as
θ = ∠(x, y) = cos^(−1) ( x^T y / (‖x‖ ‖y‖) )
[figure: vectors x and y, the angle θ, and the projection ( x^T y / ‖y‖^2 ) y of x onto the line through y]
thus x^T y = ‖x‖ ‖y‖ cos θ
special cases:
x and y are aligned: θ = 0; x^T y = ‖x‖ ‖y‖; (if x ≠ 0) y = αx for some α ≥ 0
x and y are opposed: θ = π; x^T y = −‖x‖ ‖y‖; (if x ≠ 0) y = −αx for some α ≥ 0
x and y are orthogonal: θ = π/2 or −π/2; x^T y = 0; denoted x ⊥ y
interpretation of x^T y > 0 and x^T y < 0:
x^T y > 0 means ∠(x, y) is acute
x^T y < 0 means ∠(x, y) is obtuse
[figure: the acute and obtuse cases]
{ x | x^T y ≤ 0 } defines a halfspace with outward normal vector y, and boundary passing through 0
[figure: halfspace { x | x^T y ≤ 0 } with outward normal y]
EE263 Autumn 2010-11 Stephen Boyd
Lecture 4
Orthonormal sets of vectors and QR
factorization
orthonormal set of vectors
Gram-Schmidt procedure, QR factorization
orthogonal decomposition induced by a matrix
Orthonormal set of vectors
set of vectors u_1, . . . , u_k ∈ R^n is
normalized if ‖u_i‖ = 1, i = 1, . . . , k (u_i are called unit vectors or direction vectors)
orthogonal if u_i ⊥ u_j for i ≠ j
orthonormal if both
slang: we say 'u_1, . . . , u_k are orthonormal vectors' but orthonormality (like independence) is a property of a set of vectors, not vectors individually
in terms of U = [u_1 · · · u_k], orthonormal means U^T U = I_k
an orthonormal set of vectors is independent
(multiply α_1 u_1 + α_2 u_2 + · · · + α_k u_k = 0 by u_i^T)
hence u_1, . . . , u_k is an orthonormal basis for span(u_1, . . . , u_k) = R(U)
warning: if k < n then UU^T ≠ I (since its rank is at most k)
(more on this matrix later . . . )
Geometric properties
suppose columns of U = [u_1 · · · u_k] are orthonormal
if w = Uz, then ‖w‖ = ‖z‖
multiplication by U does not change norm
mapping w = Uz is isometric: it preserves distances
simple derivation using matrices:
‖w‖^2 = ‖Uz‖^2 = (Uz)^T(Uz) = z^T U^T U z = z^T z = ‖z‖^2
inner products are also preserved: ⟨Uz, Uz̃⟩ = ⟨z, z̃⟩
if w = Uz and w̃ = Uz̃ then
⟨w, w̃⟩ = ⟨Uz, Uz̃⟩ = (Uz)^T(Uz̃) = z^T U^T U z̃ = ⟨z, z̃⟩
norms and inner products preserved, so angles are preserved: ∠(Uz, Uz̃) = ∠(z, z̃)
thus, multiplication by U preserves inner products, angles, and distances

Orthonormal basis for R^n
suppose u_1, . . . , u_n is an orthonormal basis for R^n
then U = [u_1 · · · u_n] is called orthogonal: it is square and satisfies U^T U = I
(you'd think such matrices would be called orthonormal, not orthogonal)
it follows that U^(−1) = U^T, and hence also UU^T = I, i.e.,
Σ_{i=1}^n u_i u_i^T = I
Expansion in orthonormal basis
suppose U is orthogonal, so x = UU^T x, i.e.,
x = Σ_{i=1}^n (u_i^T x) u_i
u_i^T x is called the component of x in the direction u_i
a = U^T x resolves x into the vector of its u_i components
x = Ua reconstitutes x from its u_i components
x = Ua = Σ_{i=1}^n a_i u_i is called the (u_i-) expansion of x
the identity I = UU^T = Σ_{i=1}^n u_i u_i^T is sometimes written (in physics) as
I = Σ_{i=1}^n |u_i⟩⟨u_i|
since
x = Σ_{i=1}^n |u_i⟩⟨u_i|x⟩
(but we won't use this notation)
Geometric interpretation
if U is orthogonal, then transformation w = Uz
preserves norm of vectors, i.e., ‖Uz‖ = ‖z‖
preserves angles between vectors, i.e., ∠(Uz, Uz̃) = ∠(z, z̃)
examples:
rotations (about some axis)
reflections (through some plane)

Example: rotation by θ in R^2 is given by
y = U_θ x,   U_θ = [cos θ  −sin θ; sin θ  cos θ]
since e_1 → (cos θ, sin θ), e_2 → (−sin θ, cos θ)
reflection across line x_2 = x_1 tan(θ/2) is given by
y = R_θ x,   R_θ = [cos θ  sin θ; sin θ  −cos θ]
since e_1 → (cos θ, sin θ), e_2 → (sin θ, −cos θ)
[figure: images of e_1 and e_2 under the rotation and under the reflection]
can check that U_θ and R_θ are orthogonal
Gram-Schmidt procedure
given independent vectors a_1, . . . , a_k ∈ R^n, G-S procedure finds orthonormal vectors q_1, . . . , q_k s.t.
span(a_1, . . . , a_r) = span(q_1, . . . , q_r) for r ≤ k
thus, q_1, . . . , q_r is an orthonormal basis for span(a_1, . . . , a_r)
rough idea of method: first orthogonalize each vector w.r.t. previous ones; then normalize result to have norm one
Gram-Schmidt procedure
step 1a. q̃_1 := a_1
step 1b. q_1 := q̃_1 / ‖q̃_1‖ (normalize)
step 2a. q̃_2 := a_2 − (q_1^T a_2) q_1 (remove q_1 component from a_2)
step 2b. q_2 := q̃_2 / ‖q̃_2‖ (normalize)
step 3a. q̃_3 := a_3 − (q_1^T a_3) q_1 − (q_2^T a_3) q_2 (remove q_1, q_2 components)
step 3b. q_3 := q̃_3 / ‖q̃_3‖ (normalize)
etc.
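A direct NumPy transcription of the steps above (not from the slides); it assumes the columns of A are independent, as in the basic procedure.

```python
# Sketch of the basic Gram-Schmidt procedure (assumes independent columns).
import numpy as np

def gram_schmidt(A):
    """Columns of A are a_1, ..., a_k; returns Q with orthonormal columns q_1, ..., q_k."""
    n, k = A.shape
    Q = np.zeros((n, k))
    for i in range(k):
        q = A[:, i].copy()
        for j in range(i):                      # remove q_1, ..., q_{i-1} components
            q -= (Q[:, j] @ A[:, i]) * Q[:, j]
        Q[:, i] = q / np.linalg.norm(q)         # normalize
    return Q

A = np.random.default_rng(0).standard_normal((5, 3))
Q = gram_schmidt(A)
assert np.allclose(Q.T @ Q, np.eye(3))          # columns are orthonormal
```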
[figure: geometric picture of G-S for two vectors: q̃_1 = a_1 and q_1; a_2 with its q_1 component removed gives q̃_2 = a_2 − (q_1^T a_2)q_1, then q_2]
for i = 1, 2, . . . , k we have
a_i = (q_1^T a_i) q_1 + (q_2^T a_i) q_2 + · · · + (q_(i−1)^T a_i) q_(i−1) + ‖q̃_i‖ q_i
    = r_1i q_1 + r_2i q_2 + · · · + r_ii q_i
(note that the r_ij's come right out of the G-S procedure, and r_ii ≠ 0)
QR decomposition
written in matrix form: A = QR, where A ∈ R^(n×k), Q ∈ R^(n×k), R ∈ R^(k×k):
[a_1 a_2 · · · a_k] = [q_1 q_2 · · · q_k] [r_11 r_12 · · · r_1k; 0 r_22 · · · r_2k; . . . ; 0 0 · · · r_kk]
Q^T Q = I_k, and R is upper triangular & invertible
called QR decomposition (or factorization) of A
usually computed using a variation on Gram-Schmidt procedure which is less sensitive to numerical (rounding) errors
columns of Q are orthonormal basis for R(A)
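In practice the factorization is computed with a library routine rather than the textbook G-S recursion; a quick NumPy check (not from the slides):

```python
# Reduced QR factorization of a random matrix via NumPy (LAPACK under the hood).
import numpy as np

A = np.random.default_rng(0).standard_normal((6, 4))
Q, R = np.linalg.qr(A)          # Q is 6x4 with orthonormal columns, R is 4x4 upper triangular
assert np.allclose(Q.T @ Q, np.eye(4))
assert np.allclose(Q @ R, A)
```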
General Gram-Schmidt procedure
in basic G-S we assume a_1, . . . , a_k ∈ R^n are independent
if a_1, . . . , a_k are dependent, we find q̃_j = 0 for some j, which means a_j is linearly dependent on a_1, . . . , a_(j−1)
modified algorithm: when we encounter q̃_j = 0, skip to next vector a_(j+1) and continue:
    r = 0;
    for i = 1, . . . , k
    {
        ã = a_i − Σ_{j=1}^r q_j q_j^T a_i;
        if ã ≠ 0 { r = r + 1; q_r = ã/‖ã‖; }
    }
on exit,
q_1, . . . , q_r is an orthonormal basis for R(A) (hence r = Rank(A))
each a_i is linear combination of previously generated q_j's
in matrix notation we have A = QR with Q^T Q = I_r and R ∈ R^(r×k) in upper staircase form:
[figure: upper staircase form, with zero entries below the staircase, possibly nonzero entries above, and nonzero corner entries]
can permute columns with corner entries to front of matrix:
A = Q[R̃ S]P
where:
Q^T Q = I_r
R̃ ∈ R^(r×r) is upper triangular and invertible
P ∈ R^(k×k) is a permutation matrix
(which moves forward the columns of A which generated a new q)
Applications
directly yields orthonormal basis for R(A)
yields factorization A = BC with B ∈ R^(n×r), C ∈ R^(r×k), r = Rank(A)
to check if b ∈ span(a_1, . . . , a_k): apply Gram-Schmidt to [a_1 · · · a_k b]
staircase pattern in R shows which columns of A are dependent on previous ones
works incrementally: one 'G-S procedure' yields QR factorizations of [a_1 · · · a_p] for p = 1, . . . , k:
[a_1 · · · a_p] = [q_1 · · · q_s] R_p
where s = Rank([a_1 · · · a_p]) and R_p is leading s × p submatrix of R
Full QR factorization
with A = Q_1 R_1 the QR factorization as above, write
A = [Q_1 Q_2] [R_1; 0]
where [Q_1 Q_2] is orthogonal, i.e., columns of Q_2 ∈ R^(n×(n−r)) are orthonormal, orthogonal to Q_1
to find Q_2:
find any matrix Ã s.t. [A Ã] is full rank (e.g., Ã = I)
apply general Gram-Schmidt to [A Ã]
Q_1 are orthonormal vectors obtained from columns of A
Q_2 are orthonormal vectors obtained from extra columns (Ã)
i.e., any set of orthonormal vectors can be extended to an orthonormal basis for R^n
R(Q_1) and R(Q_2) are called complementary subspaces since
they are orthogonal (i.e., every vector in the first subspace is orthogonal to every vector in the second subspace)
their sum is R^n (i.e., every vector in R^n can be expressed as a sum of two vectors, one from each subspace)
this is written
R(Q_1) ⊕ R(Q_2) = R^n
R(Q_2) = R(Q_1)^⊥ (and R(Q_1) = R(Q_2)^⊥)
(each subspace is the orthogonal complement of the other)
we know R(Q_1) = R(A); but what is its orthogonal complement R(Q_2)?
Orthogonal decomposition induced by A
from A^T = [R_1^T 0] [Q_1^T; Q_2^T] we see that
A^T z = 0  ⇐⇒  Q_1^T z = 0  ⇐⇒  z ∈ R(Q_2)
so R(Q_2) = N(A^T)
(in fact the columns of Q_2 are an orthonormal basis for N(A^T))
we conclude: R(A) and N(A^T) are complementary subspaces:
R(A) ⊕ N(A^T) = R^n (recall A ∈ R^(n×k))
R(A)^⊥ = N(A^T) (and N(A^T)^⊥ = R(A))
called orthogonal decomposition (of R^n) induced by A ∈ R^(n×k)
every y ∈ R^n can be written uniquely as y = z + w, with z ∈ R(A), w ∈ N(A^T) (we'll soon see what the vector z is . . . )
can now prove most of the assertions from the linear algebra review lecture
switching A ∈ R^(n×k) to A^T ∈ R^(k×n) gives decomposition of R^k:
N(A) ⊕ R(A^T) = R^k
EE263 Autumn 2010-11 Stephen Boyd
Lecture 5
Least-squares
least-squares (approximate) solution of overdetermined equations
projection and orthogonality principle
least-squares estimation
BLUE property
Overdetermined linear equations
consider y = Ax where A ∈ R^(m×n) is (strictly) skinny, i.e., m > n
called overdetermined set of linear equations (more equations than unknowns)
for most y, cannot solve for x
one approach to approximately solve y = Ax:
define residual or error r = Ax − y
find x = x_ls that minimizes ‖r‖
x_ls called least-squares (approximate) solution of y = Ax
Geometric interpretation
Ax_ls is point in R(A) closest to y (Ax_ls is projection of y onto R(A))
[figure: plane R(A), point y, its projection Ax_ls, and the residual r]
Least-squares (approximate) solution
assume A is full rank, skinny
to find x_ls, we'll minimize norm of residual squared,
‖r‖^2 = x^T A^T Ax − 2y^T Ax + y^T y
set gradient w.r.t. x to zero:
∇_x ‖r‖^2 = 2A^T Ax − 2A^T y = 0
yields the normal equations: A^T Ax = A^T y
assumptions imply A^T A invertible, so we have
x_ls = (A^T A)^(−1) A^T y
. . . a very famous formula
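A quick numerical check of the formula (not from the slides); np.linalg.lstsq is the preferred call in practice, and the explicit normal-equations solve is shown only to mirror the formula above (it is less accurate numerically).

```python
# Sketch: least-squares solution for a random skinny, full-rank A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))      # skinny, full rank with probability 1
y = rng.standard_normal(100)

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
x_normal = np.linalg.solve(A.T @ A, A.T @ y)    # (A^T A)^{-1} A^T y
assert np.allclose(x_ls, x_normal)
```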
x_ls is linear function of y
x_ls = A^(−1)y if A is square
x_ls solves y = Ax_ls if y ∈ R(A)
A^† = (A^T A)^(−1) A^T is called the pseudo-inverse of A
A^† is a left inverse of (full rank, skinny) A:
A^† A = (A^T A)^(−1) A^T A = I
Projection on R(A)
Ax_ls is (by definition) the point in R(A) that is closest to y, i.e., it is the projection of y onto R(A)
Ax_ls = P_R(A)(y)
the projection function P_R(A) is linear, and given by
P_R(A)(y) = Ax_ls = A(A^T A)^(−1) A^T y
A(A^T A)^(−1) A^T is called the projection matrix (associated with R(A))
Orthogonality principle
optimal residual
r = Ax_ls − y = (A(A^T A)^(−1) A^T − I) y
is orthogonal to R(A):
⟨r, Az⟩ = y^T (A(A^T A)^(−1) A^T − I)^T Az = 0
for all z ∈ R^n
[figure: y, its projection Ax_ls onto R(A), and the residual r orthogonal to R(A)]
Completion of squares
since r = Ax_ls − y ⊥ A(x − x_ls) for any x, we have
‖Ax − y‖^2 = ‖(Ax_ls − y) + A(x − x_ls)‖^2 = ‖Ax_ls − y‖^2 + ‖A(x − x_ls)‖^2
this shows that for x ≠ x_ls, ‖Ax − y‖ > ‖Ax_ls − y‖
Least-squares via QR factorization
A ∈ R^(m×n) skinny, full rank
factor as A = QR with Q^T Q = I_n, R ∈ R^(n×n) upper triangular, invertible
pseudo-inverse is
(A^T A)^(−1) A^T = (R^T Q^T QR)^(−1) R^T Q^T = R^(−1) Q^T
so x_ls = R^(−1) Q^T y
projection on R(A) given by matrix
A(A^T A)^(−1) A^T = AR^(−1) Q^T = QQ^T
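A minimal sketch of the QR route above (not from the slides): factor once, then a matrix-vector product and a triangular solve give x_ls.

```python
# Sketch: x_ls = R^{-1} Q^T y via reduced QR and a triangular solve.
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

Q, R = np.linalg.qr(A)                     # reduced QR: Q is 50x4, R is 4x4 upper triangular
x_ls = solve_triangular(R, Q.T @ y)        # solves R x = Q^T y
assert np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0])
```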
Least-squares via full QR factorization
full QR factorization:
A = [Q_1 Q_2] [R_1; 0]
with [Q_1 Q_2] ∈ R^(m×m) orthogonal, R_1 ∈ R^(n×n) upper triangular, invertible
multiplication by orthogonal matrix doesn't change norm, so
‖Ax − y‖^2 = ‖ [Q_1 Q_2][R_1; 0]x − y ‖^2
           = ‖ [Q_1 Q_2]^T [Q_1 Q_2][R_1; 0]x − [Q_1 Q_2]^T y ‖^2
           = ‖ [R_1 x − Q_1^T y; −Q_2^T y] ‖^2
           = ‖R_1 x − Q_1^T y‖^2 + ‖Q_2^T y‖^2
this is evidently minimized by choice x_ls = R_1^(−1) Q_1^T y
(which makes the first term zero)
residual with optimal x is
Ax_ls − y = −Q_2 Q_2^T y
Q_1 Q_1^T gives projection onto R(A)
Q_2 Q_2^T gives projection onto R(A)^⊥
Least-squares estimation
many applications in inversion, estimation, and reconstruction problems have form
y = Ax + v
x is what we want to estimate or reconstruct
y is our sensor measurement(s)
v is an unknown noise or measurement error (assumed small)
ith row of A characterizes ith sensor
least-squares estimation: choose as estimate x̂ that minimizes
‖Ax̂ − y‖
i.e., deviation between
what we actually observed (y), and
what we would observe if x = x̂, and there were no noise (v = 0)
least-squares estimate is just x̂ = (A^T A)^(−1) A^T y
BLUE property
linear measurement with noise:
y = Ax + v
with A full rank, skinny
consider a linear estimator of form x̂ = By
called unbiased if x̂ = x whenever v = 0
(i.e., no estimation error when there is no noise)
same as BA = I, i.e., B is left inverse of A
estimation error of unbiased linear estimator is
x̃ = x − x̂ = x − B(Ax + v) = −Bv
obviously, then, we'd like B 'small' (and BA = I)
fact: A^† = (A^T A)^(−1) A^T is the smallest left inverse of A, in the following sense:
for any B with BA = I, we have
Σ_{i,j} B_ij^2 ≥ Σ_{i,j} (A^†)_ij^2
i.e., least-squares provides the best linear unbiased estimator (BLUE)
Navigation from range measurements
navigation using range measurements from distant beacons
[figure: unknown position x and four distant beacons, with unit vectors k_1, . . . , k_4 from the origin to the beacons]
beacons far from unknown position x ∈ R^2, so linearization around x = 0 (say) nearly exact
ranges y ∈ R^4 measured, with measurement noise v:
y = [k_1^T; k_2^T; k_3^T; k_4^T] x + v
where k_i is unit vector from 0 to beacon i
measurement errors are independent, Gaussian, with standard deviation 2 (details not important)
problem: estimate x ∈ R^2, given y ∈ R^4
(roughly speaking, a 2:1 measurement redundancy ratio)
actual position is x = (5.59, 10.58); measurement is y = (11.95, 2.84, 9.81, 2.81)
Just enough measurements method
y_1 and y_2 suffice to find x (when v = 0)
compute estimate x̂ by inverting top (2 × 2) half of A:
x̂ = B_je y = [0 1.0 0 0; 1.12 0.5 0 0] y = (2.84, 11.9)
(norm of error: 3.07)
Least-squares method
compute estimate x̂ by least-squares:
x̂ = A^† y = [0.23 0.48 0.04 0.44; 0.47 0.02 0.51 0.18] y = (4.95, 10.26)
(norm of error: 0.72)
B_je and A^† are both left inverses of A
larger entries in B lead to larger estimation error
Example from overview lecture
[block diagram: u → H(s) → w → A/D → y]
signal u is piecewise constant, period 1 sec, 0 ≤ t ≤ 10:
u(t) = x_j, j − 1 ≤ t < j, j = 1, . . . , 10
filtered by system with impulse response h(t):
w(t) = ∫_0^t h(t − τ)u(τ) dτ
sample at 10Hz: ỹ_i = w(0.1i), i = 1, . . . , 100
3-bit quantization: y_i = Q(ỹ_i), i = 1, . . . , 100, where Q is 3-bit quantizer characteristic
Q(a) = (1/4)(round(4a + 1/2) − 1/2)
problem: estimate x ∈ R^10 given y ∈ R^100
example:
[plots: s(t), u(t), w(t), y(t) versus t, 0 ≤ t ≤ 10]
we have y = Ax + v, where
A ∈ R^(100×10) is given by A_ij = ∫_(j−1)^j h(0.1i − τ) dτ
v ∈ R^100 is quantization error: v_i = Q(ỹ_i) − ỹ_i (so |v_i| ≤ 0.125)
least-squares estimate: x_ls = (A^T A)^(−1) A^T y
[plot: u(t) (solid) and its estimate û(t) (dotted) versus t]
RMS error is ‖x − x_ls‖ / sqrt(10) = 0.03
better than if we had no filtering! (RMS error 0.07)
more on this later . . .
some rows of B_ls = (A^T A)^(−1) A^T:
[plots: rows 2, 5, and 8 of B_ls versus t]
rows show how sampled measurements of y are used to form estimate of x_i for i = 2, 5, 8
to estimate x_5, which is the original input signal for 4 ≤ t < 5, we mostly use y(t) for 3 ≤ t ≤ 7
EE263 Autumn 2010-11 Stephen Boyd
Lecture 6
Least-squares applications
least-squares data fitting
growing sets of regressors
system identification
growing sets of measurements and recursive least-squares
Least-squares data fitting
we are given:
functions f_1, . . . , f_n : S → R, called regressors or basis functions
data or measurements (s_i, g_i), i = 1, . . . , m, where s_i ∈ S and (usually) m ≫ n
problem: find coefficients x_1, . . . , x_n ∈ R so that
x_1 f_1(s_i) + · · · + x_n f_n(s_i) ≈ g_i,   i = 1, . . . , m
i.e., find linear combination of functions that fits data
least-squares fit: choose x to minimize total square fitting error:
Σ_{i=1}^m ( x_1 f_1(s_i) + · · · + x_n f_n(s_i) − g_i )^2
using matrix notation, total square fitting error is ‖Ax − g‖^2, where A_ij = f_j(s_i)
hence, least-squares fit is given by
x = (A^T A)^(−1) A^T g
(assuming A is skinny, full rank)
corresponding function is
f_lsfit(s) = x_1 f_1(s) + · · · + x_n f_n(s)
applications:
interpolation, extrapolation, smoothing of data
developing simple, approximate model of data
Least-squares polynomial fitting
problem: fit polynomial of degree < n,
p(t) = a_0 + a_1 t + · · · + a_(n−1) t^(n−1),
to data (t_i, y_i), i = 1, . . . , m
basis functions are f_j(t) = t^(j−1), j = 1, . . . , n
matrix A has form A_ij = t_i^(j−1):
A = [1 t_1 t_1^2 · · · t_1^(n−1); 1 t_2 t_2^2 · · · t_2^(n−1); . . . ; 1 t_m t_m^2 · · · t_m^(n−1)]
(called a Vandermonde matrix)
assuming t_k ≠ t_l for k ≠ l and m ≥ n, A is full rank:
suppose Aa = 0
corresponding polynomial p(t) = a_0 + · · · + a_(n−1)t^(n−1) vanishes at m points t_1, . . . , t_m
by fundamental theorem of algebra p can have no more than n − 1 zeros, so p is identically zero, and a = 0
columns of A are independent, i.e., A full rank
Example
fit g(t) = 4t/(1 + 10t^2) with polynomial
m = 100 points between t = 0 & t = 1
least-squares fits for degrees 1, 2, 3, 4 have RMS errors .135, .076, .025, .005, respectively
[plots: least-squares polynomial fits p_1(t), p_2(t), p_3(t), p_4(t) to the data, versus t, 0 ≤ t ≤ 1]
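A minimal sketch of this example (not from the slides): build the Vandermonde matrix on 100 points and solve the least-squares problem for one polynomial degree. The exact RMS value depends on how the sample points are placed, so it may differ slightly from the table above.

```python
# Sketch: least-squares polynomial fit to g(t) = 4t/(1 + 10 t^2) on m = 100 points.
import numpy as np

m, deg = 100, 3                               # fit a cubic (degree < n = 4)
t = np.linspace(0, 1, m)
g = 4 * t / (1 + 10 * t**2)

A = np.vander(t, deg + 1, increasing=True)    # columns 1, t, t^2, t^3 (Vandermonde)
a, *_ = np.linalg.lstsq(A, g, rcond=None)     # coefficients a_0, ..., a_3
rms = np.sqrt(np.mean((A @ a - g) ** 2))      # RMS fitting error
```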
Growing sets of regressors
consider family of least-squares problems
minimize ‖ Σ_{i=1}^p x_i a_i − y ‖
for p = 1, . . . , n
(a_1, . . . , a_p are called regressors)
approximate y by linear combination of a_1, . . . , a_p
project y onto span{a_1, . . . , a_p}
regress y on a_1, . . . , a_p
as p increases, get better fit, so optimal residual decreases
solution for each p ≤ n is given by
x_ls^(p) = (A_p^T A_p)^(−1) A_p^T y = R_p^(−1) Q_p^T y
where
A_p = [a_1 · · · a_p] ∈ R^(m×p) is the first p columns of A
A_p = Q_p R_p is the QR factorization of A_p
R_p ∈ R^(p×p) is the leading p × p submatrix of R
Q_p = [q_1 · · · q_p] is the first p columns of Q
Norm of optimal residual versus p
plot of optimal residual versus p shows how well y can be matched by linear combination of a_1, . . . , a_p, as function of p
[figure: optimal residual versus p, from ‖y‖ at p = 0 and min_{x_1} ‖x_1 a_1 − y‖ at p = 1 down to min_{x_1,...,x_7} ‖Σ_{i=1}^7 x_i a_i − y‖ at p = 7]
Least-squares system identification
we measure input u(t) and output y(t) for t = 0, . . . , N of unknown system
[block diagram: u(t) → unknown system → y(t)]
system identification problem: find reasonable model for system based on measured I/O data u, y
example with scalar u, y (vector u, y readily handled): fit I/O data with moving-average (MA) model with n delays
y(t) = h_0 u(t) + h_1 u(t − 1) + · · · + h_n u(t − n)
where h_0, . . . , h_n ∈ R
we can write model or predicted output as
[ŷ(n); ŷ(n + 1); . . . ; ŷ(N)] = [u(n) u(n − 1) · · · u(0); u(n + 1) u(n) · · · u(1); . . . ; u(N) u(N − 1) · · · u(N − n)] [h_0; h_1; . . . ; h_n]
model prediction error is
e = (y(n) − ŷ(n), . . . , y(N) − ŷ(N))
least-squares identification: choose model (i.e., h) that minimizes norm of model prediction error ‖e‖
. . . a least-squares problem (with variables h)
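A compact sketch of the identification step above (not from the slides): the I/O data here is synthetic, generated from a made-up MA model plus noise, and h is recovered by least squares.

```python
# Sketch: least-squares MA model identification from synthetic I/O data.
import numpy as np

rng = np.random.default_rng(2)
N, n = 200, 7
u = rng.standard_normal(N + 1)
h_true = rng.standard_normal(n + 1)
y = np.convolve(u, h_true)[: N + 1] + 0.1 * rng.standard_normal(N + 1)  # noisy output

# rows t = n, ..., N;  columns u(t), u(t-1), ..., u(t-n)
U = np.column_stack([u[n - k : N + 1 - k] for k in range(n + 1)])
h_hat, *_ = np.linalg.lstsq(U, y[n:], rcond=None)
rel_err = np.linalg.norm(U @ h_hat - y[n:]) / np.linalg.norm(y[n:])   # relative prediction error
```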
Example
[plots: I/O data, input u(t) and output y(t) versus t, 0 ≤ t ≤ 70]
for n = 7 we obtain MA model with
(h_0, . . . , h_7) = (.024, .282, .418, .354, .243, .487, .208, .441)
with relative prediction error ‖e‖/‖y‖ = 0.37
[plot: actual output y(t) (solid) and model prediction ŷ(t) (dashed) versus t]
Model order selection
question: how large should n be?
obviously the larger n, the smaller the prediction error on the data used
to form the model
suggests using largest possible model order for smallest prediction error
[plot: relative prediction error versus model order n, 0 ≤ n ≤ 50]
difficulty: for n too large the predictive ability of the model on other I/O data (from the same system) becomes worse
Out of sample validation
evaluate model predictive performance on another I/O data set not used to
develop model
model validation data set:
[plots: validation input u(t) and output y(t) versus t, 0 ≤ t ≤ 70]
now check prediction error of models (developed using modeling data) on
validation data:
[plot: relative prediction error versus n, for the modeling data and for the validation data]
plot suggests n = 10 is a good choice
for n = 50 the actual and predicted outputs on system identification and model validation data are:
[plots: y(t) (solid) and predicted ŷ(t) (dashed), on the identification data and on the validation data]
loss of predictive ability when n too large is called model overfit or overmodeling
Growing sets of measurements
least-squares problem in 'row' form:
minimize ‖Ax − y‖^2 = Σ_{i=1}^m (ã_i^T x − y_i)^2
where ã_i^T are the rows of A (ã_i ∈ R^n)
x ∈ R^n is some vector to be estimated
each pair ã_i, y_i corresponds to one measurement
solution is
x_ls = ( Σ_{i=1}^m ã_i ã_i^T )^(−1) Σ_{i=1}^m y_i ã_i
suppose that ã_i and y_i become available sequentially, i.e., m increases with time
Recursive least-squares
we can compute x_ls(m) = ( Σ_{i=1}^m ã_i ã_i^T )^(−1) Σ_{i=1}^m y_i ã_i recursively
initialize P(0) = 0 ∈ R^(n×n), q(0) = 0 ∈ R^n
for m = 0, 1, . . . ,
P(m + 1) = P(m) + ã_(m+1) ã_(m+1)^T,    q(m + 1) = q(m) + y_(m+1) ã_(m+1)
if P(m) is invertible, we have x_ls(m) = P(m)^(−1) q(m)
P(m) is invertible ⇐⇒ ã_1, . . . , ã_m span R^n
(so, once P(m) becomes invertible, it stays invertible)
Fast update for recursive least-squares
we can calculate
P(m + 1)^(−1) = ( P(m) + ã_(m+1) ã_(m+1)^T )^(−1)
efficiently from P(m)^(−1) using the rank one update formula
( P + ã ã^T )^(−1) = P^(−1) − (1 / (1 + ã^T P^(−1) ã)) (P^(−1) ã)(P^(−1) ã)^T
valid when P = P^T, and P and P + ã ã^T are both invertible
gives an O(n^2) method for computing P(m + 1)^(−1) from P(m)^(−1)
standard methods for computing P(m + 1)^(−1) from P(m + 1) are O(n^3)
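A minimal sketch of recursive least-squares with the rank-one update above (not from the slides): the measurement stream is synthetic, and P^(−1) is propagated directly once P becomes invertible.

```python
# Sketch: recursive least-squares using the rank-one update of P^{-1}.
import numpy as np

rng = np.random.default_rng(3)
n = 4
x_true = rng.standard_normal(n)

P = np.zeros((n, n))
q = np.zeros(n)
Pinv = None                              # P(m)^{-1}, available once P is invertible
for m in range(50):
    a = rng.standard_normal(n)                       # new measurement row
    y = a @ x_true + 0.01 * rng.standard_normal()    # new measurement value
    q += y * a
    if Pinv is None:
        P += np.outer(a, a)
        if np.linalg.matrix_rank(P) == n:
            Pinv = np.linalg.inv(P)      # one O(n^3) inversion, then only updates
    else:
        Pa = Pinv @ a                    # rank-one update, O(n^2) per measurement
        Pinv -= np.outer(Pa, Pa) / (1 + a @ Pa)

x_hat = Pinv @ q                         # current least-squares estimate
```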
Verication of rank one update formula
(P + ã ã^T) ( P^(−1) − (1 / (1 + ã^T P^(−1) ã)) (P^(−1) ã)(P^(−1) ã)^T )
  = I + ã ã^T P^(−1) − (1 / (1 + ã^T P^(−1) ã)) P (P^(−1) ã)(P^(−1) ã)^T − (1 / (1 + ã^T P^(−1) ã)) ã ã^T (P^(−1) ã)(P^(−1) ã)^T
  = I + ã ã^T P^(−1) − (1 / (1 + ã^T P^(−1) ã)) ã ã^T P^(−1) − (ã^T P^(−1) ã / (1 + ã^T P^(−1) ã)) ã ã^T P^(−1)
  = I
EE263 Autumn 2010-11 Stephen Boyd
Lecture 7
Regularized least-squares and Gauss-Newton
method
multi-objective least-squares
regularized least-squares
nonlinear least-squares
Gauss-Newton method
Multi-objective least-squares
in many problems we have two (or more) objectives
we want J_1 = ‖Ax − y‖^2 small
and also J_2 = ‖Fx − g‖^2 small
(x ∈ R^n is the variable)
usually the objectives are competing
we can make one smaller, at the expense of making the other larger
common example: F = I, g = 0; we want ‖Ax − y‖ small, with small x
Plot of achievable objective pairs
plot (J_2, J_1) for every x:
[figure: region of achievable (J_2, J_1) pairs, with example points x^(1), x^(2), x^(3)]
note that x ∈ R^n, but this plot is in R^2; point labeled x^(1) is really (J_2(x^(1)), J_1(x^(1)))
shaded area shows (J_2, J_1) achieved by some x ∈ R^n
clear area shows (J_2, J_1) not achieved by any x ∈ R^n
boundary of region is called optimal trade-off curve
corresponding x are called Pareto optimal (for the two objectives ‖Ax − y‖^2, ‖Fx − g‖^2)
three example choices of x: x^(1), x^(2), x^(3)
x^(3) is worse than x^(2) on both counts (J_2 and J_1)
x^(1) is better than x^(2) in J_2, but worse in J_1
Weighted-sum objective
to find Pareto optimal points, i.e., x's on optimal trade-off curve, we minimize weighted-sum objective
J_1 + μJ_2 = ‖Ax − y‖^2 + μ‖Fx − g‖^2
parameter μ ≥ 0 gives relative weight between J_1 and J_2
points where weighted sum is constant, J_1 + μJ_2 = α, correspond to line with slope −μ on (J_2, J_1) plot
[figure: trade-off curve with the line J_1 + μJ_2 = α touching it at x^(2)]
x^(2) minimizes weighted-sum objective for μ shown
by varying μ from 0 to +∞, can sweep out entire optimal trade-off curve
Minimizing weighted-sum objective
can express weighted-sum objective as ordinary least-squares objective:
‖Ax − y‖^2 + μ‖Fx − g‖^2 = ‖ [A; sqrt(μ)F] x − [y; sqrt(μ)g] ‖^2 = ‖Ãx − ỹ‖^2
where
Ã = [A; sqrt(μ)F],    ỹ = [y; sqrt(μ)g]
hence solution is (assuming Ã full rank)
x = (Ã^T Ã)^(−1) Ã^T ỹ = (A^T A + μF^T F)^(−1) (A^T y + μF^T g)
Example
[figure: unit mass subject to applied force f]
unit mass at rest subject to forces x_i for i − 1 < t ≤ i, i = 1, . . . , 10
y ∈ R is position at t = 10; y = a^T x where a ∈ R^10
J_1 = (y − 1)^2 (final position error squared)
J_2 = ‖x‖^2 (sum of squares of forces)
weighted-sum objective: (a^T x − 1)^2 + μ‖x‖^2
optimal x: x = (aa^T + μI)^(−1) a
optimal trade-off curve:
[plot: J_1 = (y − 1)^2 versus J_2 = ‖x‖^2]
upper left corner of optimal trade-off curve corresponds to x = 0
bottom right corresponds to input that yields y = 1, i.e., J_1 = 0
Regularized least-squares
when F = I, g = 0 the objectives are
J_1 = ‖Ax − y‖^2,    J_2 = ‖x‖^2
minimizer of weighted-sum objective,
x = (A^T A + μI)^(−1) A^T y,
is called regularized least-squares (approximate) solution of Ax ≈ y
also called Tychonov regularization
for μ > 0, works for any A (no restrictions on shape, rank . . . )
estimation/inversion application:
Ax − y is sensor residual
prior information: x small
or, model only accurate for x small
regularized solution trades off sensor fit, size of x
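A small sketch of regularized least-squares (not from the slides): the closed-form solution above agrees with solving the equivalent stacked ordinary least-squares problem. The data here is random.

```python
# Sketch: regularized least-squares, x = (A^T A + mu I)^{-1} A^T y.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 10))
y = rng.standard_normal(30)
mu = 0.5

x_reg = np.linalg.solve(A.T @ A + mu * np.eye(10), A.T @ y)

A_tilde = np.vstack([A, np.sqrt(mu) * np.eye(10)])     # stacked formulation
y_tilde = np.concatenate([y, np.zeros(10)])
x_stacked, *_ = np.linalg.lstsq(A_tilde, y_tilde, rcond=None)
assert np.allclose(x_reg, x_stacked)
```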
Nonlinear least-squares
nonlinear least-squares (NLLS) problem: find x ∈ R^n that minimizes
‖r(x)‖^2 = Σ_{i=1}^m r_i(x)^2,
where r : R^n → R^m
r(x) is a vector of 'residuals'
reduces to (linear) least-squares if r(x) = Ax − y
Position estimation from ranges
estimate position x ∈ R^2 from approximate distances to beacons at locations b_1, . . . , b_m ∈ R^2 without linearizing
we measure ρ_i = ‖x − b_i‖ + v_i
(v_i is range error, unknown but assumed small)
NLLS estimate: choose x̂ to minimize
Σ_{i=1}^m r_i(x̂)^2 = Σ_{i=1}^m ( ρ_i − ‖x̂ − b_i‖ )^2
Gauss-Newton method for NLLS

NLLS: find x ∈ R^n that minimizes ‖r(x)‖² = Σ_{i=1}^m r_i(x)², where r : R^n → R^m

in general, very hard to solve exactly

many good heuristics to compute locally optimal solution

Gauss-Newton method:

    given starting guess for x
    repeat
        linearize r near current guess
        new guess is linear LS solution, using linearized r
    until convergence

Gauss-Newton method (more detail):

linearize r near current iterate x^(k):

    r(x) ≈ r(x^(k)) + Dr(x^(k))(x − x^(k))

where Dr is the Jacobian: (Dr)_ij = ∂r_i/∂x_j

write linearized approximation as

    r(x^(k)) + Dr(x^(k))(x − x^(k)) = A^(k) x − b^(k)

    A^(k) = Dr(x^(k)),  b^(k) = Dr(x^(k)) x^(k) − r(x^(k))

at kth iteration, we approximate NLLS problem by linear LS problem:

    ‖r(x)‖² ≈ ‖A^(k) x − b^(k)‖²

next iterate solves this linearized LS problem:

    x^(k+1) = (A^(k)T A^(k))^{-1} A^(k)T b^(k)

repeat until convergence (which isn't guaranteed)
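a compact sketch of this iteration for the range-measurement problem above; the beacon locations, true position, initial guess, and noise level below are invented for illustration, and numpy.linalg.lstsq stands in for the explicit normal-equations solve:

```python
import numpy as np

np.random.seed(1)
m = 10
beacons = 5 * (2 * np.random.rand(m, 2) - 1)     # assumed beacon locations
x_true = np.array([3.6, 3.2])                     # assumed true position
rho = np.linalg.norm(x_true - beacons, axis=1) + 0.5 * (2 * np.random.rand(m) - 1)

def residual(x):
    # r_i(x) = rho_i - ||x - b_i||
    return rho - np.linalg.norm(x - beacons, axis=1)

def jacobian(x):
    # rows are d r_i / d x = -(x - b_i)/||x - b_i||
    d = np.linalg.norm(x - beacons, axis=1)
    return -(x - beacons) / d[:, None]

x = np.array([1.2, 1.2])                          # initial guess (made up)
for k in range(10):
    A_k = jacobian(x)
    b_k = A_k @ x - residual(x)
    x, *_ = np.linalg.lstsq(A_k, b_k, rcond=None) # linearized LS step
print(x, np.linalg.norm(residual(x))**2)
```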
Gauss-Newton example

10 beacons

true position (3.6, 3.2); initial guess (1.2, 1.2)

range estimates accurate to ±0.5

[figure: beacon locations (plus marks), true position, and initial guess in the square [−5, 5] × [−5, 5]]

NLLS objective ‖r(x)‖² versus x:

[figure: surface plot of ‖r(x)‖² over [−5, 5] × [−5, 5], with values from about 0 to 16]

for a linear LS problem, objective would be nice quadratic bowl

bumps in objective due to strong nonlinearity of r

objective of Gauss-Newton iterates:

[figure: ‖r(x^(k))‖² versus iteration number k = 1, ..., 10]

x^(k) converges to (in this case, global) minimum of ‖r(x)‖²

convergence takes only five or so steps

final estimate is x̂ = (3.3, 3.3)

estimation error is ‖x̂ − x‖ = 0.31
(substantially smaller than range accuracy!)

convergence of Gauss-Newton iterates:

[figure: iterates 1 through 6 plotted in the plane, converging toward the true position]
useful variation on Gauss-Newton: add regularization term

    ‖A^(k) x − b^(k)‖² + μ‖x − x^(k)‖²

so that next iterate is not too far from previous one (hence, linearized model still pretty accurate)
EE263 Autumn 2010-11 Stephen Boyd

Lecture 8
Least-norm solutions of underdetermined equations

least-norm solution of underdetermined equations
minimum norm solutions via QR factorization
derivation via Lagrange multipliers
relation to regularized least-squares
general norm minimization with equality constraints

Underdetermined linear equations

we consider y = Ax where A ∈ R^{m×n} is fat (m < n), i.e.,

there are more variables than equations

x is underspecified, i.e., many choices of x lead to the same y

we'll assume that A is full rank (m), so for each y ∈ R^m, there is a solution

set of all solutions has form

    { x | Ax = y } = { x_p + z | z ∈ N(A) }

where x_p is any (particular) solution, i.e., Ax_p = y

z characterizes available choices in solution

solution has dim N(A) = n − m degrees of freedom

can choose z to satisfy other specs or optimize among solutions
Least-norm solution

one particular solution is

    x_ln = A^T (AA^T)^{-1} y

(AA^T is invertible since A is full rank)

in fact, x_ln is the solution of y = Ax that minimizes ‖x‖

i.e., x_ln is solution of optimization problem

    minimize ‖x‖
    subject to Ax = y

(with variable x ∈ R^n)

suppose Ax = y, so A(x − x_ln) = 0 and

    (x − x_ln)^T x_ln = (x − x_ln)^T A^T (AA^T)^{-1} y = (A(x − x_ln))^T (AA^T)^{-1} y = 0

i.e., (x − x_ln) ⊥ x_ln, so

    ‖x‖² = ‖x_ln + x − x_ln‖² = ‖x_ln‖² + ‖x − x_ln‖² ≥ ‖x_ln‖²

i.e., x_ln has smallest norm of any solution
[figure: the solution set { x | Ax = y }, a translate of N(A) = { x | Ax = 0 }, with x_ln the point of the set closest to 0]

orthogonality condition: x_ln ⊥ N(A)

projection interpretation: x_ln is projection of 0 on solution set { x | Ax = y }

A† = A^T (AA^T)^{-1} is called the pseudo-inverse of full rank, fat A

A^T (AA^T)^{-1} is a right inverse of A

I − A^T (AA^T)^{-1} A gives projection onto N(A)

cf. analogous formulas for full rank, skinny matrix A:

A† = (A^T A)^{-1} A^T

(A^T A)^{-1} A^T is a left inverse of A

A(A^T A)^{-1} A^T gives projection onto R(A)
Least-norm solution via QR factorization

find QR factorization of A^T, i.e., A^T = QR, with

Q ∈ R^{n×m}, Q^T Q = I_m

R ∈ R^{m×m} upper triangular, nonsingular

then

    x_ln = A^T (AA^T)^{-1} y = Q R^{-T} y

    ‖x_ln‖ = ‖R^{-T} y‖
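a short numerical sketch of both expressions, on a random fat matrix used as a stand-in problem (scipy.linalg.qr in economic mode gives the thin factorization used here):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

np.random.seed(2)
A = np.random.randn(3, 7)          # fat, full rank (generically)
y = np.random.randn(3)

# least-norm solution via the explicit formula
x_ln = A.T @ np.linalg.solve(A @ A.T, y)

# same solution via QR factorization of A^T
Q, R = qr(A.T, mode='economic')    # A^T = Q R, with Q 7x3, R 3x3
x_qr = Q @ solve_triangular(R, y, trans='T')   # Q R^{-T} y
assert np.allclose(x_ln, x_qr)
assert np.allclose(A @ x_ln, y)    # it is indeed a solution of Ax = y
```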
Derivation via Lagrange multipliers

least-norm solution solves optimization problem

    minimize x^T x
    subject to Ax = y

introduce Lagrange multipliers: L(x, λ) = x^T x + λ^T (Ax − y)

optimality conditions are

    ∇_x L = 2x + A^T λ = 0,   ∇_λ L = Ax − y = 0

from first condition, x = −A^T λ/2

substitute into second to get λ = −2(AA^T)^{-1} y

hence x = A^T (AA^T)^{-1} y
Example: transferring mass unit distance

[figure: unit mass with applied force f]

unit mass at rest subject to forces x_i for i − 1 < t ≤ i, i = 1, ..., 10

y_1 is position at t = 10, y_2 is velocity at t = 10

y = Ax where A ∈ R^{2×10} (A is fat)

find least norm force that transfers mass unit distance with zero final velocity, i.e., y = (1, 0)

[figure: least-norm force x_ln versus t, and the resulting position and velocity of the mass versus t, for 0 ≤ t ≤ 10]
Relation to regularized least-squares

suppose A ∈ R^{m×n} is fat, full rank

define J_1 = ‖Ax − y‖², J_2 = ‖x‖²

least-norm solution minimizes J_2 with J_1 = 0

minimizer of weighted-sum objective J_1 + μJ_2 = ‖Ax − y‖² + μ‖x‖² is

    x_μ = (A^T A + μI)^{-1} A^T y

fact: x_μ → x_ln as μ → 0, i.e., regularized solution converges to least-norm solution as μ → 0

in matrix terms: as μ → 0,

    (A^T A + μI)^{-1} A^T → A^T (AA^T)^{-1}

(for full rank, fat A)
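a quick numerical check of this limit, on a random fat matrix with arbitrary sizes and μ values:

```python
import numpy as np

np.random.seed(3)
A = np.random.randn(3, 6)
pinv_fat = A.T @ np.linalg.inv(A @ A.T)          # A^T (A A^T)^{-1}

for mu in [1.0, 1e-3, 1e-6, 1e-9]:
    reg = np.linalg.solve(A.T @ A + mu * np.eye(6), A.T)   # (A^T A + mu I)^{-1} A^T
    print(mu, np.linalg.norm(reg - pinv_fat))    # error shrinks as mu -> 0
```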
General norm minimization with equality constraints

consider problem

    minimize ‖Ax − b‖
    subject to Cx = d

with variable x

includes least-squares and least-norm problems as special cases

equivalent to

    minimize (1/2)‖Ax − b‖²
    subject to Cx = d

Lagrangian is

    L(x, λ) = (1/2)‖Ax − b‖² + λ^T (Cx − d)
            = (1/2)x^T A^T Ax − b^T Ax + (1/2)b^T b + λ^T Cx − λ^T d

optimality conditions are

    ∇_x L = A^T Ax − A^T b + C^T λ = 0,   ∇_λ L = Cx − d = 0

write in block matrix form as

    [ A^T A   C^T ] [ x ]   [ A^T b ]
    [ C       0   ] [ λ ] = [ d     ]

if the block matrix is invertible, we have

    [ x ]   [ A^T A   C^T ]^{-1} [ A^T b ]
    [ λ ] = [ C       0   ]      [ d     ]
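a minimal sketch of solving this KKT (block) system directly; the problem data below are random placeholders:

```python
import numpy as np

np.random.seed(4)
n, m, p = 6, 8, 2
A = np.random.randn(m, n)
b = np.random.randn(m)
C = np.random.randn(p, n)
d = np.random.randn(p)

# assemble and solve the block system
KKT = np.block([[A.T @ A, C.T],
                [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]

print(np.linalg.norm(C @ x - d))   # ~0: equality constraint satisfied
```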
if A^T A is invertible, we can derive a more explicit (and complicated) formula for x

from first block equation we get

    x = (A^T A)^{-1} (A^T b − C^T λ)

substitute into Cx = d to get

    C(A^T A)^{-1} (A^T b − C^T λ) = d

so

    λ = ( C(A^T A)^{-1} C^T )^{-1} ( C(A^T A)^{-1} A^T b − d )

recover x from equation above (not pretty):

    x = (A^T A)^{-1} ( A^T b − C^T ( C(A^T A)^{-1} C^T )^{-1} ( C(A^T A)^{-1} A^T b − d ) )
EE263 Autumn 2010-11 Stephen Boyd

Lecture 9
Autonomous linear dynamical systems

autonomous linear dynamical systems
examples
higher order systems
linearization near equilibrium point
linearization along trajectory

Autonomous linear dynamical systems

continuous-time autonomous LDS has form

    ẋ = Ax

x(t) ∈ R^n is called the state

n is the state dimension or (informally) the number of states

A is the dynamics matrix

(system is time-invariant if A doesn't depend on t)

picture (phase plane):

[figure: phase plane (x_1, x_2), showing a trajectory x(t) with the velocity vector ẋ(t) = Ax(t) attached at a point]
example 1:  ẋ = [ −1 0; 2 1 ] x

[figure: phase-plane vector field and trajectories on [−2, 2] × [−2, 2]]

example 2:  ẋ = [ −0.5 1; −1 0.5 ] x

[figure: phase-plane vector field and trajectories on [−2, 2] × [−2, 2]]
Block diagram

block diagram representation of ẋ = Ax:

[figure: an n-vector integrator block 1/s in a feedback loop with gain block A; the integrator input is ẋ(t), its output is x(t)]

1/s block represents n parallel scalar integrators

coupling comes from dynamics matrix A

useful when A has structure, e.g., block upper triangular:

    ẋ = [ A_11 A_12; 0 A_22 ] x

[figure: block diagram with separate integrator blocks for x_1 and x_2 and gains A_11, A_12, A_22; there is no path from x_1 into x_2]

here x_1 doesn't affect x_2 at all
Linear circuit

[figure: a linear static circuit terminated by capacitors C_1, ..., C_p (voltages v_c1, ..., v_cp, currents i_c1, ..., i_cp) and inductors L_1, ..., L_r (voltages v_l1, ..., v_lr, currents i_l1, ..., i_lr)]

circuit equations are

    C dv_c/dt = i_c,   L di_l/dt = v_l,   [ i_c; v_l ] = F [ v_c; i_l ]

    C = diag(C_1, ..., C_p),   L = diag(L_1, ..., L_r)

with state x = [ v_c; i_l ], we have

    ẋ = [ C^{-1} 0; 0 L^{-1} ] F x
Chemical reactions

reaction involving n chemicals; x_i is concentration of chemical i

linear model of reaction kinetics

    dx_i/dt = a_i1 x_1 + ··· + a_in x_n

good model for some reactions; A is usually sparse

Example: series reaction A → B → C, with rate constants k_1, k_2, with linear dynamics

    ẋ = [ −k_1 0 0; k_1 −k_2 0; 0 k_2 0 ] x

plot for k_1 = k_2 = 1, initial x(0) = (1, 0, 0)

[figure: concentrations x_1(t), x_2(t), x_3(t) versus t, 0 ≤ t ≤ 10]
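a small simulation sketch of this series reaction, with k_1 = k_2 = 1 as in the plot; it uses the matrix-exponential solution x(t) = e^{tA}x(0), which is developed in Lecture 10:

```python
import numpy as np
from scipy.linalg import expm

k1, k2 = 1.0, 1.0
A = np.array([[-k1, 0.0, 0.0],
              [k1, -k2, 0.0],
              [0.0, k2, 0.0]])
x0 = np.array([1.0, 0.0, 0.0])

# evaluate x(t) = e^{tA} x(0) on a grid of times
for t in np.linspace(0.0, 10.0, 6):
    x = expm(t * A) @ x0
    print(f"t = {t:4.1f}   x = {np.round(x, 3)}")
```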
Finite-state discrete-time Markov chain

z(t) ∈ {1, ..., n} is a random sequence with

    Prob( z(t + 1) = i | z(t) = j ) = P_ij

where P ∈ R^{n×n} is the matrix of transition probabilities

can represent probability distribution of z(t) as n-vector

    p(t) = [ Prob(z(t) = 1); ... ; Prob(z(t) = n) ]

(so, e.g., Prob(z(t) = 1, 2, or 3) = [1 1 1 0 ··· 0] p(t))

then we have p(t + 1) = Pp(t)

P is often sparse; Markov chain is depicted graphically

nodes are states

edges show transition probabilities

example:

[figure: three-state Markov chain graph with transition probabilities 0.9, 0.1, 0.7, 0.1, 0.2, 1.0]

state 1 is "system OK"

state 2 is "system down"

state 3 is "system being repaired"

    p(t + 1) = [ 0.9 0.7 1.0; 0.1 0.1 0; 0 0.2 0 ] p(t)
Numerical integration of continuous system

compute approximate solution of ẋ = Ax, x(0) = x_0

suppose h is small time step (x doesn't change much in h seconds)

simple (forward Euler) approximation:

    x(t + h) ≈ x(t) + h ẋ(t) = (I + hA)x(t)

by carrying out this recursion (discrete-time LDS), starting at x(0) = x_0, we get approximation

    x(kh) ≈ (I + hA)^k x(0)

(forward Euler is never used in practice)
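a tiny sketch comparing the forward Euler recursion to the exact matrix-exponential solution; the matrix and step size below are chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, -0.2]])      # a lightly damped oscillator (made up)
x0 = np.array([1.0, 0.0])
h, T = 0.01, 10.0
k = int(T / h)

x_euler = np.linalg.matrix_power(np.eye(2) + h * A, k) @ x0   # (I + hA)^k x(0)
x_exact = expm(T * A) @ x0                                    # e^{TA} x(0)
print(x_euler, x_exact, np.linalg.norm(x_euler - x_exact))
```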
Higher order linear dynamical systems

    x^(k) = A_{k−1} x^(k−1) + ··· + A_1 x^(1) + A_0 x,   x(t) ∈ R^n

where x^(m) denotes the mth derivative

define new variable z = [ x; x^(1); ... ; x^(k−1) ] ∈ R^{nk}, so

    ż = [ x^(1); ... ; x^(k) ] =
    [ 0    I    0   ···  0
      0    0    I   ···  0
      ⋮                  ⋮
      0    0    0   ···  I
      A_0  A_1  A_2 ···  A_{k−1} ] z

a (first order) LDS (with bigger state)

block diagram:

[figure: a chain of integrators 1/s producing x^(k−1), x^(k−2), ..., x, with feedback gains A_{k−1}, A_{k−2}, ..., A_0 summed to form x^(k)]
Mechanical systems

mechanical system with k degrees of freedom undergoing small motions:

    M q̈ + D q̇ + K q = 0

q(t) ∈ R^k is the vector of generalized displacements

M is the mass matrix

K is the stiffness matrix

D is the damping matrix

with state x = [ q; q̇ ] we have

    ẋ = [ q̇; q̈ ] = [ 0 I; −M^{-1}K −M^{-1}D ] x
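a sketch of assembling this first-order state matrix from M, D, K; the two-degree-of-freedom matrices below are made up:

```python
import numpy as np

# made-up 2-DOF mass, damping, and stiffness matrices
M = np.diag([1.0, 2.0])
D = np.array([[0.2, -0.1],
              [-0.1, 0.2]])
K = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

Minv = np.linalg.inv(M)
k = M.shape[0]
A = np.block([[np.zeros((k, k)), np.eye(k)],
              [-Minv @ K, -Minv @ D]])     # state x = (q, qdot)
print(np.linalg.eigvals(A))                # lightly damped complex eigenvalues
```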
Linearization near equilibrium point

nonlinear, time-invariant differential equation (DE):

    ẋ = f(x)

where f : R^n → R^n

suppose x_e is an equilibrium point, i.e., f(x_e) = 0
(so x(t) = x_e satisfies DE)

now suppose x(t) is near x_e, so

    ẋ(t) = f(x(t)) ≈ f(x_e) + Df(x_e)(x(t) − x_e)

with δx(t) = x(t) − x_e, rewrite as

    δẋ(t) ≈ Df(x_e) δx(t)

replacing ≈ with = yields linearized approximation of DE near x_e

we hope solution of δẋ = Df(x_e) δx is a good approximation of x − x_e
(more later)

example: pendulum

[figure: pendulum of length l and mass m, angle θ from vertical, gravitational force mg]

2nd order nonlinear DE:  m l² θ̈ = −l m g sin θ

rewrite as first order DE with state x = [ θ; θ̇ ]:

    ẋ = [ x_2; −(g/l) sin x_1 ]

equilibrium point (pendulum down): x_e = 0

linearized system near x_e = 0:

    δẋ = [ 0 1; −g/l 0 ] δx
Does linearization work?

the linearized system usually, but not always, gives a good idea of the system behavior near x_e

example 1:  ẋ = −x³ near x_e = 0

for x(0) > 0 solutions have form x(t) = ( x(0)^{−2} + 2t )^{−1/2}

linearized system is δẋ = 0; solutions are constant

example 2:  ż = z³ near z_e = 0

for z(0) > 0 solutions have form z(t) = ( z(0)^{−2} − 2t )^{−1/2}
(finite escape time at t = z(0)^{−2}/2)

linearized system is δż = 0; solutions are constant

[figure: x(t) (decaying) and z(t) (blowing up in finite time) versus t, starting from the same initial value]

systems with very different behavior have same linearized system

linearized systems do not predict qualitative behavior of either system
Linearization along trajectory

suppose x_traj : R_+ → R^n satisfies ẋ_traj(t) = f(x_traj(t), t)

suppose x(t) is another trajectory, i.e., ẋ(t) = f(x(t), t), and is near x_traj(t)

then

    d/dt (x − x_traj) = f(x, t) − f(x_traj, t) ≈ D_x f(x_traj, t)(x − x_traj)

(time-varying) LDS

    δẋ = D_x f(x_traj, t) δx

is called linearized or variational system along trajectory x_traj

example: linearized oscillator

suppose x_traj(t) is T-periodic solution of nonlinear DE:

    ẋ_traj(t) = f(x_traj(t)),   x_traj(t + T) = x_traj(t)

linearized system is

    δẋ = A(t) δx

where A(t) = Df(x_traj(t))

A(t) is T-periodic, so linearized system is called T-periodic linear system.

used to study:

startup dynamics of clock and oscillator circuits

effects of power supply and other disturbances on clock behavior
EE263 Autumn 2010-11 Stephen Boyd

Lecture 10
Solution via Laplace transform and matrix exponential

Laplace transform
solving ẋ = Ax via Laplace transform
state transition matrix
matrix exponential
qualitative behavior and stability

Laplace transform of matrix valued function

suppose z : R_+ → R^{p×q}

Laplace transform: Z = L(z), where Z : D ⊆ C → C^{p×q} is defined by

    Z(s) = ∫_0^∞ e^{−st} z(t) dt

integral of matrix is done term-by-term

convention: upper case denotes Laplace transform

D is the domain or region of convergence of Z

D includes at least {s | ℜs > a}, where a satisfies |z_ij(t)| ≤ α e^{at} for t ≥ 0, i = 1, ..., p, j = 1, ..., q

Derivative property

    L(ż) = sZ(s) − z(0)

to derive, integrate by parts:

    L(ż)(s) = ∫_0^∞ e^{−st} ż(t) dt = e^{−st} z(t) |_{t=0}^{t→∞} + s ∫_0^∞ e^{−st} z(t) dt = sZ(s) − z(0)
Laplace transform solution of ẋ = Ax

consider continuous-time time-invariant (TI) LDS

    ẋ = Ax

for t ≥ 0, where x(t) ∈ R^n

take Laplace transform: sX(s) − x(0) = AX(s)

rewrite as (sI − A)X(s) = x(0)

hence X(s) = (sI − A)^{-1} x(0)

take inverse transform

    x(t) = L^{-1}( (sI − A)^{-1} ) x(0)

Resolvent and state transition matrix

(sI − A)^{-1} is called the resolvent of A

resolvent defined for s ∈ C except eigenvalues of A, i.e., s such that det(sI − A) = 0

Φ(t) = L^{-1}( (sI − A)^{-1} ) is called the state-transition matrix; it maps the initial state to the state at time t:

    x(t) = Φ(t)x(0)

(in particular, state x(t) is a linear function of initial state x(0))
Example 1: Harmonic oscillator

    ẋ = [ 0 1; −1 0 ] x

[figure: phase-plane trajectories (circles) on [−2, 2] × [−2, 2]]

sI − A = [ s −1; 1 s ], so resolvent is

    (sI − A)^{-1} = [ s/(s²+1)   1/(s²+1); −1/(s²+1)   s/(s²+1) ]

(eigenvalues are ±i)

state transition matrix is

    Φ(t) = L^{-1}( (sI − A)^{-1} ) = [ cos t  sin t; −sin t  cos t ]

a rotation matrix (−t radians)

so we have x(t) = [ cos t  sin t; −sin t  cos t ] x(0)
Example 2: Double integrator

    ẋ = [ 0 1; 0 0 ] x

[figure: phase-plane trajectories (horizontal lines) on [−2, 2] × [−2, 2]]

sI − A = [ s −1; 0 s ], so resolvent is

    (sI − A)^{-1} = [ 1/s   1/s²; 0   1/s ]

(eigenvalues are 0, 0)

state transition matrix is

    Φ(t) = L^{-1}( [ 1/s  1/s²; 0  1/s ] ) = [ 1 t; 0 1 ]

so we have x(t) = [ 1 t; 0 1 ] x(0)
Characteristic polynomial

X(s) = det(sI − A) is called the characteristic polynomial of A

X(s) is a polynomial of degree n, with leading (i.e., s^n) coefficient one

roots of X are the eigenvalues of A

X has real coefficients, so eigenvalues are either real or occur in conjugate pairs

there are n eigenvalues (if we count multiplicity as roots of X)

Eigenvalues of A and poles of resolvent

i, j entry of resolvent can be expressed via Cramer's rule as

    (−1)^{i+j} det Δ_ij / det(sI − A)

where Δ_ij is sI − A with jth row and ith column deleted

det Δ_ij is a polynomial of degree less than n, so i, j entry of resolvent has form f_ij(s)/X(s) where f_ij is a polynomial with degree less than n

poles of entries of resolvent must be eigenvalues of A

but not all eigenvalues of A show up as poles of each entry
(when there are cancellations between det Δ_ij and X(s))
Matrix exponential

(I − C)^{-1} = I + C + C² + C³ + ··· (if series converges)

series expansion of resolvent:

    (sI − A)^{-1} = (1/s)(I − A/s)^{-1} = I/s + A/s² + A²/s³ + ···

(valid for |s| large enough) so

    Φ(t) = L^{-1}( (sI − A)^{-1} ) = I + tA + (tA)²/2! + ···

looks like ordinary power series

    e^{at} = 1 + ta + (ta)²/2! + ···

with square matrices instead of scalars ...

define matrix exponential as

    e^M = I + M + M²/2! + ···

for M ∈ R^{n×n} (which in fact converges for all M)

with this definition, state-transition matrix is

    Φ(t) = L^{-1}( (sI − A)^{-1} ) = e^{tA}
Matrix exponential solution of autonomous LDS

solution of ẋ = Ax, with A ∈ R^{n×n} and constant, is

    x(t) = e^{tA} x(0)

generalizes scalar case: solution of ẋ = ax, with a ∈ R and constant, is

    x(t) = e^{ta} x(0)

matrix exponential is meant to look like scalar exponential

some things you'd guess hold for the matrix exponential (by analogy with the scalar exponential) do in fact hold

but many things you'd guess are wrong

example: you might guess that e^{A+B} = e^A e^B, but it's false (in general)

    A = [ 0 1; −1 0 ],   B = [ 0 1; 0 0 ]

    e^A = [ 0.54 0.84; −0.84 0.54 ],   e^B = [ 1 1; 0 1 ]

    e^{A+B} = [ 0.16 1.40; −0.70 0.16 ]  ≠  e^A e^B = [ 0.54 1.38; −0.84 −0.30 ]

however, we do have e^{A+B} = e^A e^B if AB = BA, i.e., A and B commute

thus for t, s ∈ R,  e^{(tA+sA)} = e^{tA} e^{sA}

with s = −t we get

    e^{tA} e^{−tA} = e^{tA−tA} = e^0 = I

so e^{tA} is nonsingular, with inverse (e^{tA})^{-1} = e^{−tA}
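a quick numerical check of these claims with scipy (expm is scipy's matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 0.0]])

print(np.round(expm(A + B), 2))          # differs from expm(A) @ expm(B)
print(np.round(expm(A) @ expm(B), 2))

t = 0.7
print(np.allclose(expm(t * A) @ expm(-t * A), np.eye(2)))   # inverse of e^{tA} is e^{-tA}
```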
example: let's find e^A, where A = [ 0 1; 0 0 ]

we already found

    e^{tA} = L^{-1}( (sI − A)^{-1} ) = [ 1 t; 0 1 ]

so, plugging in t = 1, we get e^A = [ 1 1; 0 1 ]

let's check power series:

    e^A = I + A + A²/2! + ··· = I + A

since A² = A³ = ··· = 0
Time transfer property

for ẋ = Ax we know

    x(t) = Φ(t)x(0) = e^{tA} x(0)

interpretation: the matrix e^{tA} propagates initial condition into state at time t

more generally we have, for any t and τ,

    x(τ + t) = e^{tA} x(τ)

(to see this, apply result above to z(t) = x(t + τ))

interpretation: the matrix e^{tA} propagates state t seconds forward in time
(backward if t < 0)

recall first order (forward Euler) approximate state update, for small t:

    x(τ + t) ≈ x(τ) + t ẋ(τ) = (I + tA)x(τ)

exact solution is

    x(τ + t) = e^{tA} x(τ) = (I + tA + (tA)²/2! + ···)x(τ)

forward Euler is just first two terms in series

Sampling a continuous-time system

suppose ẋ = Ax

sample x at times t_1 ≤ t_2 ≤ ···: define z(k) = x(t_k)

then z(k + 1) = e^{(t_{k+1} − t_k)A} z(k)

for uniform sampling t_{k+1} − t_k = h, so

    z(k + 1) = e^{hA} z(k),

a discrete-time LDS (called discretized version of continuous-time system)
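a minimal sketch of sampling a continuous-time system this way; the dynamics matrix and sample interval below are placeholders:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, -0.2]])   # made-up continuous-time dynamics
h = 0.1                                    # sample interval
Ad = expm(h * A)                           # discrete-time dynamics: z(k+1) = Ad z(k)

x0 = np.array([1.0, 0.0])
z = x0.copy()
for k in range(50):                        # z(k) equals x(kh) exactly
    z = Ad @ z
print(np.allclose(z, expm(50 * h * A) @ x0))
```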
Piecewise constant system

consider time-varying LDS ẋ = A(t)x, with

    A(t) = A_0 for 0 ≤ t < t_1,  A_1 for t_1 ≤ t < t_2,  ...

where 0 < t_1 < t_2 < ··· (sometimes called jump linear system)

for t ∈ [t_i, t_{i+1}] we have

    x(t) = e^{(t − t_i)A_i} ··· e^{(t_3 − t_2)A_2} e^{(t_2 − t_1)A_1} e^{t_1 A_0} x(0)

(matrix on righthand side is called state transition matrix for system, and denoted Φ(t))
Qualitative behavior of x(t)

suppose ẋ = Ax, x(t) ∈ R^n

then x(t) = e^{tA} x(0);  X(s) = (sI − A)^{-1} x(0)

ith component X_i(s) has form

    X_i(s) = a_i(s)/X(s)

where a_i is a polynomial of degree < n

thus the poles of X_i are all eigenvalues of A (but not necessarily the other way around)

first assume eigenvalues λ_i are distinct, so X_i(s) cannot have repeated poles

then x_i(t) has form

    x_i(t) = Σ_{j=1}^n β_ij e^{λ_j t}

where β_ij depend on x(0) (linearly)

eigenvalues determine (possible) qualitative behavior of x:

eigenvalues give exponents that can occur in exponentials

real eigenvalue λ corresponds to an exponentially decaying or growing term e^{λt} in solution

complex eigenvalue λ = σ + jω corresponds to decaying or growing sinusoidal term e^{σt} cos(ωt + φ) in solution

ℜλ_j gives exponential growth rate (if > 0), or exponential decay rate (if < 0) of term

ℑλ_j gives frequency of oscillatory term (if ≠ 0)

[figure: eigenvalue locations in the complex plane (ℜs, ℑs) and the corresponding qualitative behavior of the associated terms]
now suppose A has repeated eigenvalues, so X_i can have repeated poles

express eigenvalues as λ_1, ..., λ_r (distinct) with multiplicities n_1, ..., n_r, respectively (n_1 + ··· + n_r = n)

then x_i(t) has form

    x_i(t) = Σ_{j=1}^r p_ij(t) e^{λ_j t}

where p_ij(t) is a polynomial of degree < n_j (that depends linearly on x(0))
Stability

we say system ẋ = Ax is stable if e^{tA} → 0 as t → ∞

meaning:

state x(t) converges to 0, as t → ∞, no matter what x(0) is

all trajectories of ẋ = Ax converge to 0 as t → ∞

fact: ẋ = Ax is stable if and only if all eigenvalues of A have negative real part:

    ℜλ_i < 0,  i = 1, ..., n

the 'if' part is clear since

    lim_{t→∞} p(t) e^{λt} = 0

for any polynomial, if ℜλ < 0

we'll see the 'only if' part next lecture

more generally, max_i ℜλ_i determines the maximum asymptotic logarithmic growth rate of x(t) (or decay, if < 0)
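a one-line style check of this stability condition, for any square matrix you care to supply:

```python
import numpy as np

def is_stable(A):
    """Continuous-time stability: all eigenvalues of A have negative real part."""
    return np.max(np.linalg.eigvals(A).real) < 0

print(is_stable(np.array([[0.0, 1.0], [-1.0, -0.2]])))   # True (damped oscillator)
print(is_stable(np.array([[0.0, 1.0], [-1.0, 0.0]])))    # False (undamped: Re(lambda) = 0)
```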
EE263 Autumn 2010-11 Stephen Boyd

Lecture 11
Eigenvectors and diagonalization

eigenvectors
dynamic interpretation: invariant sets
complex eigenvectors & invariant planes
left eigenvectors
diagonalization
modal form
discrete-time stability

Eigenvectors and eigenvalues

λ ∈ C is an eigenvalue of A ∈ C^{n×n} if

    X(λ) = det(λI − A) = 0

equivalent to:

there exists nonzero v ∈ C^n s.t. (λI − A)v = 0, i.e., Av = λv
any such v is called an eigenvector of A (associated with eigenvalue λ)

there exists nonzero w ∈ C^n s.t. w^T(λI − A) = 0, i.e., w^T A = λw^T
any such w is called a left eigenvector of A

if v is an eigenvector of A with eigenvalue λ, then so is αv, for any α ∈ C, α ≠ 0

even when A is real, eigenvalue λ and eigenvector v can be complex

when A and λ are real, we can always find a real eigenvector v associated with λ: if Av = λv, with A ∈ R^{n×n}, λ ∈ R, and v ∈ C^n, then

    A ℜv = λ ℜv,   A ℑv = λ ℑv

so ℜv and ℑv are real eigenvectors, if they are nonzero (and at least one is)

conjugate symmetry: if A is real and v ∈ C^n is an eigenvector associated with λ ∈ C, then v̄ is an eigenvector associated with λ̄: taking conjugate of Av = λv we get Av̄ = λ̄v̄

we'll assume A is real from now on ...
Scaling interpretation

(assume λ ∈ R for now; we'll consider λ ∈ C later)

if v is an eigenvector, effect of A on v is very simple: scaling by λ

[figure: vectors x, Ax and v, Av = λv; what is λ here?]

λ ∈ R, λ > 0: v and Av point in same direction

λ ∈ R, λ < 0: v and Av point in opposite directions

λ ∈ R, |λ| < 1: Av smaller than v

λ ∈ R, |λ| > 1: Av larger than v

(we'll see later how this relates to stability of continuous- and discrete-time systems ...)
Dynamic interpretation

suppose Av = λv, v ≠ 0

if ẋ = Ax and x(0) = v, then x(t) = e^{λt} v

several ways to see this, e.g.,

    x(t) = e^{tA} v = (I + tA + (tA)²/2! + ···)v = v + λtv + (λt)²/2! v + ··· = e^{λt} v

(since (tA)^k v = (λt)^k v)

for λ ∈ C, solution is complex (we'll interpret later); for now, assume λ ∈ R

if initial state is an eigenvector v, resulting motion is very simple: always on the line spanned by v

solution x(t) = e^{λt} v is called mode of system ẋ = Ax (associated with eigenvalue λ)

for λ ∈ R, λ < 0, mode contracts or shrinks as t ↑

for λ ∈ R, λ > 0, mode expands or grows as t ↑
Invariant sets

a set S ⊆ R^n is invariant under ẋ = Ax if whenever x(t) ∈ S, then x(τ) ∈ S for all τ ≥ t

i.e.: once trajectory enters S, it stays in S

[figure: a set S with a trajectory that enters and remains in it]

vector field interpretation: trajectories only cut into S, never out

suppose Av = λv, v ≠ 0, λ ∈ R

line { tv | t ∈ R } is invariant
(in fact, ray { tv | t > 0 } is invariant)

if λ < 0, line segment { tv | 0 ≤ t ≤ a } is invariant
Complex eigenvectors

suppose Av = λv, v ≠ 0, λ is complex

for a ∈ C, (complex) trajectory a e^{λt} v satisfies ẋ = Ax

hence so does (real) trajectory

    x(t) = ℜ( a e^{λt} v ) = e^{σt} [ v_re  v_im ] [ cos ωt  sin ωt; −sin ωt  cos ωt ] [ α; −β ]

where

    v = v_re + j v_im,   λ = σ + jω,   a = α + jβ

trajectory stays in invariant plane span{v_re, v_im}

σ gives logarithmic growth/decay factor

ω gives angular velocity of rotation in plane
Dynamic interpretation: left eigenvectors

suppose w^T A = λw^T, w ≠ 0

then

    d/dt (w^T x) = w^T ẋ = w^T Ax = λ(w^T x)

i.e., w^T x satisfies the DE d(w^T x)/dt = λ(w^T x)

hence w^T x(t) = e^{λt} w^T x(0)

even if trajectory x is complicated, w^T x is simple

if, e.g., λ ∈ R, λ < 0, halfspace { z | w^T z ≤ a } is invariant (for a ≥ 0)

for λ = σ + jω ∈ C, (ℜw)^T x and (ℑw)^T x both have form

    e^{σt} ( α cos(ωt) + β sin(ωt) )

Summary

right eigenvectors are initial conditions from which resulting motion is simple (i.e., remains on line or in plane)

left eigenvectors give linear functions of state that are simple, for any initial condition
example 1:  ẋ = [ −1 −10 −10; 1 0 0; 0 1 0 ] x

block diagram:

[figure: chain of three integrators producing x_1, x_2, x_3, with feedback gains −1, −10, −10]

    X(s) = s³ + s² + 10s + 10 = (s + 1)(s² + 10)

eigenvalues are −1, ± j√10

trajectory with x(0) = (0, −1, 1):

[figure: x_1(t), x_2(t), x_3(t) versus t, 0 ≤ t ≤ 5]
left eigenvector associated with eigenvalue −1 is

    g = [ 0.1; 0; 1 ]

let's check g^T x(t) when x(0) = (0, −1, 1) (as above):

[figure: g^T x(t) versus t, decaying smoothly from 1 toward 0 like e^{−t}]
eigenvector associated with eigenvalue j√10 is

    v = [ −0.554 + j0.771; 0.244 + j0.175; 0.055 − j0.077 ]

so an invariant plane is spanned by

    v_re = [ −0.554; 0.244; 0.055 ],   v_im = [ 0.771; 0.175; −0.077 ]

for example, with x(0) = v_re we have

[figure: x_1(t), x_2(t), x_3(t) versus t, an undamped oscillation at frequency √10]
Example 2: Markov chain

probability distribution satisfies p(t + 1) = Pp(t)

p_i(t) = Prob( z(t) = i ), so Σ_{i=1}^n p_i(t) = 1

P_ij = Prob( z(t + 1) = i | z(t) = j ), so Σ_{i=1}^n P_ij = 1
(such matrices are called stochastic)

rewrite as:

    [1 1 ··· 1]P = [1 1 ··· 1]

i.e., [1 1 ··· 1] is a left eigenvector of P with e.v. 1

hence det(I − P) = 0, so there is a right eigenvector v ≠ 0 with Pv = v

it can be shown that v can be chosen so that v_i ≥ 0, hence we can normalize v so that Σ_{i=1}^n v_i = 1

interpretation: v is an equilibrium distribution; i.e., if p(0) = v then p(t) = v for all t ≥ 0
(if v is unique it is called the steady-state distribution of the Markov chain)
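a sketch of computing such an equilibrium distribution for the three-state repair-model chain shown earlier (column-stochastic P, as in these notes):

```python
import numpy as np

# transition matrix from the earlier example (columns sum to one)
P = np.array([[0.9, 0.7, 1.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.2, 0.0]])

w, V = np.linalg.eig(P)
i = np.argmin(np.abs(w - 1.0))        # pick the eigenvalue 1
v = np.real(V[:, i])
v = v / v.sum()                        # normalize to a probability distribution
print(v, np.allclose(P @ v, v))        # equilibrium: P v = v
```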
Diagonalization

suppose v_1, ..., v_n is a linearly independent set of eigenvectors of A ∈ R^{n×n}:

    Av_i = λ_i v_i,  i = 1, ..., n

express as

    A [ v_1 ··· v_n ] = [ v_1 ··· v_n ] diag(λ_1, ..., λ_n)

define T = [ v_1 ··· v_n ] and Λ = diag(λ_1, ..., λ_n), so

    AT = TΛ

and finally

    T^{-1}AT = Λ

T invertible since v_1, ..., v_n linearly independent

similarity transformation by T diagonalizes A

conversely if there is a T = [v_1 ··· v_n] s.t.

    T^{-1}AT = Λ = diag(λ_1, ..., λ_n)

then AT = TΛ, i.e.,

    Av_i = λ_i v_i,  i = 1, ..., n

so v_1, ..., v_n is a linearly independent set of n eigenvectors of A

we say A is diagonalizable if

there exists T s.t. T^{-1}AT = Λ is diagonal

A has a set of n linearly independent eigenvectors

(if A is not diagonalizable, it is sometimes called defective)
Not all matrices are diagonalizable

example: A = [ 0 1; 0 0 ]

characteristic polynomial is X(s) = s², so λ = 0 is only eigenvalue

eigenvectors satisfy Av = 0v = 0, i.e.

    [ 0 1; 0 0 ] [ v_1; v_2 ] = 0

so all eigenvectors have form v = [ v_1; 0 ] where v_1 ≠ 0

thus, A cannot have two independent eigenvectors

Distinct eigenvalues

fact: if A has distinct eigenvalues, i.e., λ_i ≠ λ_j for i ≠ j, then A is diagonalizable

(the converse is false: A can have repeated eigenvalues but still be diagonalizable)
Diagonalization and left eigenvectors

rewrite T^{-1}AT = Λ as T^{-1}A = ΛT^{-1}, or

    [ w_1^T; ... ; w_n^T ] A = Λ [ w_1^T; ... ; w_n^T ]

where w_1^T, ..., w_n^T are the rows of T^{-1}

thus

    w_i^T A = λ_i w_i^T

i.e., the rows of T^{-1} are (lin. indep.) left eigenvectors, normalized so that

    w_i^T v_j = δ_ij

(i.e., left & right eigenvectors chosen this way are dual bases)
Modal form

suppose A is diagonalizable by T

define new coordinates by x = T x̃, so

    T dx̃/dt = AT x̃   ⇔   dx̃/dt = T^{-1}AT x̃   ⇔   dx̃/dt = Λ x̃

in new coordinate system, system is diagonal (decoupled):

[figure: n parallel scalar blocks, each an integrator 1/s with feedback gain λ_i, generating state x̃_i]

trajectories consist of n independent modes, i.e.,

    x̃_i(t) = e^{λ_i t} x̃_i(0)

hence the name modal form
Real modal form

when eigenvalues (hence T) are complex, system can be put in real modal form:

    S^{-1}AS = diag( Λ_r, M_{r+1}, M_{r+3}, ..., M_{n−1} )

where Λ_r = diag(λ_1, ..., λ_r) are the real eigenvalues, and

    M_i = [ σ_i  ω_i; −ω_i  σ_i ],   λ_i = σ_i + jω_i,   i = r + 1, r + 3, ..., n

where λ_i are the complex eigenvalues (one from each conjugate pair)

block diagram of complex mode:

[figure: two coupled scalar integrators realizing the 2 × 2 block, with gains σ and ±ω]
diagonalization simplifies many matrix expressions

e.g., resolvent:

    (sI − A)^{-1} = ( sTT^{-1} − TΛT^{-1} )^{-1} = ( T(sI − Λ)T^{-1} )^{-1} = T(sI − Λ)^{-1}T^{-1}
                  = T diag( 1/(s − λ_1), ..., 1/(s − λ_n) ) T^{-1}

powers (i.e., discrete-time solution):

    A^k = (TΛT^{-1})^k = (TΛT^{-1}) ··· (TΛT^{-1}) = TΛ^k T^{-1} = T diag(λ_1^k, ..., λ_n^k) T^{-1}

(for k < 0 only if A invertible, i.e., all λ_i ≠ 0)

exponential (i.e., continuous-time solution):

    e^A = I + A + A²/2! + ···
        = I + TΛT^{-1} + (TΛT^{-1})²/2! + ···
        = T(I + Λ + Λ²/2! + ···)T^{-1}
        = Te^Λ T^{-1} = T diag(e^{λ_1}, ..., e^{λ_n}) T^{-1}
Analytic function of a matrix

for any analytic function f : R → R, i.e., given by power series

    f(a) = β_0 + β_1 a + β_2 a² + β_3 a³ + ···

we can define f(A) for A ∈ R^{n×n} (i.e., overload f) as

    f(A) = β_0 I + β_1 A + β_2 A² + β_3 A³ + ···

substituting A = TΛT^{-1}, we have

    f(A) = β_0 I + β_1 A + β_2 A² + β_3 A³ + ···
         = β_0 TT^{-1} + β_1 TΛT^{-1} + β_2 (TΛT^{-1})² + ···
         = T( β_0 I + β_1 Λ + β_2 Λ² + ··· )T^{-1}
         = T diag( f(λ_1), ..., f(λ_n) ) T^{-1}
Solution via diagonalization

assume A is diagonalizable

consider LDS ẋ = Ax, with T^{-1}AT = Λ

then

    x(t) = e^{tA} x(0) = Te^{tΛ}T^{-1} x(0) = Σ_{i=1}^n e^{λ_i t} (w_i^T x(0)) v_i

thus: any trajectory can be expressed as linear combination of modes

interpretation:

(left eigenvectors) decompose initial state x(0) into modal components w_i^T x(0)

e^{λ_i t} term propagates ith mode forward t seconds

reconstruct state as linear combination of (right) eigenvectors

application: for what x(0) do we have x(t) → 0 as t → ∞?

divide eigenvalues into those with negative real parts

    ℜλ_1 < 0, ..., ℜλ_s < 0,

and the others,

    ℜλ_{s+1} ≥ 0, ..., ℜλ_n ≥ 0

from

    x(t) = Σ_{i=1}^n e^{λ_i t} (w_i^T x(0)) v_i

condition for x(t) → 0 is:

    x(0) ∈ span{ v_1, ..., v_s },

or equivalently,

    w_i^T x(0) = 0,  i = s + 1, ..., n

(can you prove this?)
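a numerical sketch of this modal decomposition of a trajectory, for a random (hence generically diagonalizable) A; the left eigenvectors are taken as the rows of T^{-1}:

```python
import numpy as np
from scipy.linalg import expm

np.random.seed(5)
A = np.random.randn(4, 4)
lam, T = np.linalg.eig(A)            # columns of T are right eigenvectors
W = np.linalg.inv(T)                 # rows of W are the dual left eigenvectors

x0 = np.random.randn(4)
t = 0.8

# sum of modes: x(t) = sum_i e^{lam_i t} (w_i^T x0) v_i
x_modes = sum(np.exp(lam[i] * t) * (W[i, :] @ x0) * T[:, i] for i in range(4))
x_exact = expm(t * A) @ x0
print(np.allclose(x_modes.real, x_exact), np.linalg.norm(x_modes.imag))
```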
Stability of discrete-time systems

suppose A diagonalizable

consider discrete-time LDS x(t + 1) = Ax(t)

if A = TΛT^{-1}, then A^k = TΛ^k T^{-1}

then

    x(t) = A^t x(0) = Σ_{i=1}^n λ_i^t (w_i^T x(0)) v_i → 0 as t → ∞

for all x(0) if and only if

    |λ_i| < 1,  i = 1, ..., n.

we will see later that this is true even when A is not diagonalizable, so we have

fact: x(t + 1) = Ax(t) is stable if and only if all eigenvalues of A have magnitude less than one
EE263 Autumn 2010-11 Stephen Boyd

Lecture 12
Jordan canonical form

Jordan canonical form
generalized modes
Cayley-Hamilton theorem

Jordan canonical form

what if A cannot be diagonalized?

any matrix A ∈ R^{n×n} can be put in Jordan canonical form by a similarity transformation, i.e.

    T^{-1}AT = J = diag(J_1, ..., J_q)

where J_i ∈ C^{n_i × n_i} is the upper bidiagonal matrix with λ_i on the diagonal and ones on the superdiagonal; it is called a Jordan block of size n_i with eigenvalue λ_i (so n = Σ_{i=1}^q n_i)

J is upper bidiagonal

J diagonal is the special case of n Jordan blocks of size n_i = 1

Jordan form is unique (up to permutations of the blocks)

can have multiple blocks with same eigenvalue
note: JCF is a conceptual tool, never used in numerical computations!

    X(s) = det(sI − A) = (s − λ_1)^{n_1} ··· (s − λ_q)^{n_q}

hence distinct eigenvalues ⇒ n_i = 1 ⇒ A diagonalizable

dim N(λI − A) is the number of Jordan blocks with eigenvalue λ

more generally,

    dim N(λI − A)^k = Σ_{λ_i = λ} min{k, n_i}

so from dim N(λI − A)^k for k = 1, 2, ... we can determine the sizes of the Jordan blocks associated with λ

factor out T and T^{-1}:  λI − A = T(λI − J)T^{-1}

for, say, a block of size 3:

    λ_i I − J_i = [ 0 −1 0; 0 0 −1; 0 0 0 ],   (λ_i I − J_i)² = [ 0 0 1; 0 0 0; 0 0 0 ],   (λ_i I − J_i)³ = 0

for other blocks (say, of size 3, for k ≥ 2), (λ_i I − J_j)^k is upper triangular with diagonal entries (λ_i − λ_j)^k, superdiagonal entries −k(λ_i − λ_j)^{k−1}, and corner entry (k(k − 1)/2)(λ_i − λ_j)^{k−2}
Generalized eigenvectors

suppose T^{-1}AT = J = diag(J_1, ..., J_q)

express T as

    T = [ T_1 T_2 ··· T_q ]

where T_i ∈ C^{n×n_i} are the columns of T associated with ith Jordan block J_i

we have AT_i = T_i J_i

let T_i = [ v_{i1} v_{i2} ··· v_{i n_i} ]

then we have:

    Av_{i1} = λ_i v_{i1},

i.e., the first column of each T_i is an eigenvector associated with e.v. λ_i

for j = 2, ..., n_i,

    Av_{ij} = v_{i,j−1} + λ_i v_{ij}

the vectors v_{i1}, ..., v_{i n_i} are sometimes called generalized eigenvectors

Jordan form LDS

consider LDS ẋ = Ax

by change of coordinates x = T x̃, can put into form dx̃/dt = J x̃

system is decomposed into independent Jordan block systems dx̃_i/dt = J_i x̃_i

[figure: a Jordan block realized as a chain of integrators 1/s, each with feedback gain λ, the output of each stage feeding the next]

Jordan blocks are sometimes called Jordan chains
(block diagram shows why)
Resolvent, exponential of Jordan block

resolvent of k × k Jordan block with eigenvalue λ: inverting the upper bidiagonal matrix sI − J_λ gives an upper triangular matrix whose (i, j) entry, for j ≥ i, is (s − λ)^{−(j−i+1)}, i.e.

    (sI − J_λ)^{-1} = (s − λ)^{-1} I + (s − λ)^{-2} F_1 + ··· + (s − λ)^{-k} F_{k−1}

where F_i is the matrix with ones on the ith upper diagonal

by inverse Laplace transform, exponential is:

    e^{tJ_λ} = e^{λt} ( I + tF_1 + ··· + (t^{k−1}/(k − 1)!) F_{k−1} )

i.e., upper triangular with (i, j) entry e^{λt} t^{j−i}/(j − i)! for j ≥ i

Jordan blocks yield:

repeated poles in resolvent

terms of form t^p e^{λt} in e^{tA}
Generalized modes

consider ẋ = Ax, with

    x(0) = a_1 v_{i1} + ··· + a_{n_i} v_{i n_i} = T_i a

then x(t) = Te^{Jt} x(0) = T_i e^{J_i t} a

trajectory stays in span of generalized eigenvectors

coefficients have form p(t)e^{λt}, where p is polynomial

such solutions are called generalized modes of the system

with general x(0) we can write

    x(t) = e^{tA} x(0) = Te^{tJ}T^{-1}x(0) = Σ_{i=1}^q T_i e^{tJ_i} (S_i^T x(0))

where

    T^{-1} = [ S_1^T; ... ; S_q^T ]

hence: all solutions of ẋ = Ax are linear combinations of (generalized) modes
Cayley-Hamilton theorem

if p(s) = a_0 + a_1 s + ··· + a_k s^k is a polynomial and A ∈ R^{n×n}, we define

    p(A) = a_0 I + a_1 A + ··· + a_k A^k

Cayley-Hamilton theorem: for any A ∈ R^{n×n} we have X(A) = 0, where X(s) = det(sI − A)

example: with A = [ 1 2; 3 4 ] we have X(s) = s² − 5s − 2, so

    X(A) = A² − 5A − 2I = [ 7 10; 15 22 ] − 5 [ 1 2; 3 4 ] − 2I = 0
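a short numerical confirmation of this identity for a random matrix (numpy.poly returns the characteristic polynomial coefficients of a square matrix):

```python
import numpy as np

np.random.seed(6)
n = 4
A = np.random.randn(n, n)
coeffs = np.poly(A)                 # [1, a_{n-1}, ..., a_0] of det(sI - A)

# evaluate X(A) = A^n + a_{n-1} A^{n-1} + ... + a_0 I
X_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.linalg.norm(X_of_A))       # ~0, up to roundoff
```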
corollary: for every p ∈ Z_+, we have

    A^p ∈ span{ I, A, A², ..., A^{n−1} }

(and if A is invertible, also for p ∈ Z)

i.e., every power of A can be expressed as linear combination of I, A, ..., A^{n−1}

proof: divide X(s) into s^p to get s^p = q(s)X(s) + r(s)

r = α_0 + α_1 s + ··· + α_{n−1} s^{n−1} is remainder polynomial

then

    A^p = q(A)X(A) + r(A) = r(A) = α_0 I + α_1 A + ··· + α_{n−1} A^{n−1}

for p = −1: rewrite C-H theorem

    X(A) = A^n + a_{n−1}A^{n−1} + ··· + a_0 I = 0

as

    I = A ( −(a_1/a_0)I − (a_2/a_0)A − ··· − (1/a_0)A^{n−1} )

(A is invertible ⇔ a_0 ≠ 0) so

    A^{-1} = −(a_1/a_0)I − (a_2/a_0)A − ··· − (1/a_0)A^{n−1}

i.e., inverse is linear combination of A^k, k = 0, ..., n − 1
Proof of C-H theorem

first assume A is diagonalizable: T^{-1}AT = Λ

    X(s) = (s − λ_1) ··· (s − λ_n)

since

    X(A) = X(TΛT^{-1}) = TX(Λ)T^{-1}

it suffices to show X(Λ) = 0

    X(Λ) = (Λ − λ_1 I) ··· (Λ − λ_n I)
         = diag(0, λ_2 − λ_1, ..., λ_n − λ_1) ··· diag(λ_1 − λ_n, ..., λ_{n−1} − λ_n, 0)
         = 0

now let's do general case: T^{-1}AT = J

    X(s) = (s − λ_1)^{n_1} ··· (s − λ_q)^{n_q}

suffices to show X(J_i) = 0

    X(J_i) = (J_i − λ_1 I)^{n_1} ··· (J_i − λ_i I)^{n_i} ··· (J_i − λ_q I)^{n_q} = 0

since the factor (J_i − λ_i I)^{n_i} is the n_i × n_i matrix with ones on the superdiagonal raised to the power n_i, which is zero
EE263 Autumn 2010-11 Stephen Boyd

Lecture 13
Linear dynamical systems with inputs & outputs

inputs & outputs: interpretations
transfer function
impulse and step responses
examples

Inputs & outputs

recall continuous-time time-invariant LDS has form

    ẋ = Ax + Bu,   y = Cx + Du

Ax is called the drift term (of ẋ)

Bu is called the input term (of ẋ)

picture, with B ∈ R^{2×1}:

[figure: vector field picture showing ẋ(t) as Ax(t) shifted along B, for u(t) = 1 and u(t) = −1.5]

Interpretations

write ẋ = Ax + b_1 u_1 + ··· + b_m u_m, where B = [ b_1 ··· b_m ]

state derivative is sum of autonomous term (Ax) and one term per input (b_i u_i)

each input u_i gives another degree of freedom for ẋ (assuming columns of B independent)

write ẋ = Ax + Bu as ẋ_i = ã_i^T x + b̃_i^T u, where ã_i^T, b̃_i^T are the rows of A, B

ith state derivative is linear function of state x and input u
Block diagram

[figure: standard block diagram of ẋ = Ax + Bu, y = Cx + Du: u enters through B, an integrator 1/s produces x from ẋ, A feeds x back, C produces y, and D is a direct feedthrough from u to y]

A_ij is gain factor from state x_j into integrator i

B_ij is gain factor from input u_j into integrator i

C_ij is gain factor from state x_j into output y_i

D_ij is gain factor from input u_j into output y_i

interesting when there is structure, e.g., with x_1 ∈ R^{n_1}, x_2 ∈ R^{n_2}:

    d/dt [ x_1; x_2 ] = [ A_11 A_12; 0 A_22 ] [ x_1; x_2 ] + [ B_1; 0 ] u,   y = [ C_1 C_2 ] [ x_1; x_2 ]

[figure: block diagram with two integrator blocks; u enters only the x_1 block through B_1, x_2 feeds x_1 through A_12, and both feed y through C_1, C_2]

x_2 is not affected by input u, i.e., x_2 propagates autonomously

x_2 affects y directly and through x_1
Transfer function

take Laplace transform of ẋ = Ax + Bu:

    sX(s) − x(0) = AX(s) + BU(s)

hence

    X(s) = (sI − A)^{-1}x(0) + (sI − A)^{-1}BU(s)

so

    x(t) = e^{tA}x(0) + ∫_0^t e^{(t−τ)A}Bu(τ) dτ

e^{tA}x(0) is the unforced or autonomous response

e^{tA}B is called the input-to-state impulse response or impulse matrix

(sI − A)^{-1}B is called the input-to-state transfer function or transfer matrix

with y = Cx + Du we have:

    Y(s) = C(sI − A)^{-1}x(0) + ( C(sI − A)^{-1}B + D ) U(s)

so

    y(t) = Ce^{tA}x(0) + ∫_0^t Ce^{(t−τ)A}Bu(τ) dτ + Du(t)

output term Ce^{tA}x(0) due to initial condition

H(s) = C(sI − A)^{-1}B + D is called the transfer function or transfer matrix

h(t) = Ce^{tA}B + Dδ(t) is called the impulse response or impulse matrix
(δ is the Dirac delta function)

with zero initial condition we have:

    Y(s) = H(s)U(s),   y = h ∗ u

where ∗ is convolution (of matrix valued functions)

interpretation: H_ij is transfer function from input u_j to output y_i
Impulse response

impulse response h(t) = Ce^{tA}B + Dδ(t)

with x(0) = 0, y = h ∗ u, i.e.,

    y_i(t) = Σ_{j=1}^m ∫_0^t h_ij(t − τ) u_j(τ) dτ

interpretations:

h_ij(t) is impulse response from jth input to ith output

h_ij(t) gives y_i when u(t) = e_j δ(t)

h_ij(τ) shows how dependent output i is, on what input j was, τ seconds ago

i indexes output; j indexes input; τ indexes time lag

Step response

the step response or step matrix is given by

    s(t) = ∫_0^t h(τ) dτ

interpretations:

s_ij(t) is step response from jth input to ith output

s_ij(t) gives y_i when u = e_j for t ≥ 0

for invertible A, we have

    s(t) = CA^{-1}( e^{tA} − I )B + D
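a sketch evaluating this step-response formula on a toy stable SISO system; the matrices A, B, C, D below are invented placeholders:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -0.5]])   # made-up stable dynamics
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

Ainv = np.linalg.inv(A)
def step(t):
    # s(t) = C A^{-1} (e^{tA} - I) B + D
    return C @ Ainv @ (expm(t * A) - np.eye(2)) @ B + D

for t in [0.5, 2.0, 10.0, 50.0]:
    print(t, step(t).item())
# for stable A the step response approaches the DC gain -C A^{-1} B + D
print((-C @ Ainv @ B + D).item())
```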
Example 1

[figure: three unit masses connected in a line by springs and dampers, with tension u_1 applied between masses 1 and 2 and tension u_2 between masses 2 and 3]

unit masses, springs, dampers

u_1 is tension between 1st & 2nd masses

u_2 is tension between 2nd & 3rd masses

y ∈ R³ is displacement of masses 1, 2, 3

x = [ y; ẏ ]

system is:

    ẋ = [ 0 0 0 1 0 0;
          0 0 0 0 1 0;
          0 0 0 0 0 1;
          −2 1 0 −2 1 0;
          1 −2 1 1 −2 1;
          0 1 −2 0 1 −2 ] x
       + [ 0 0; 0 0; 0 0; 1 0; −1 1; 0 −1 ] [ u_1; u_2 ]

eigenvalues of A are

    −1.71 ± j0.71,  −1.00 ± j1.00,  −0.29 ± j0.71

impulse response:

[figure: impulse responses h_11, h_21, h_31 (from u_1) and h_12, h_22, h_32 (from u_2) versus t, 0 ≤ t ≤ 15]

roughly speaking:

impulse at u_1 affects third mass less than other two

impulse at u_2 affects first mass later than other two
Example 2

interconnect circuit:

[figure: RC interconnect tree driven by voltage u, with node capacitors C_1, ..., C_4 and unit resistors between nodes]

u(t) ∈ R is input (drive) voltage

x_i is voltage across C_i

output is state: y = x

unit resistors, unit capacitors

step response matrix shows delay to each node

system is

    ẋ = [ −3 1 1 0; 1 −1 0 0; 1 0 −2 1; 0 0 1 −1 ] x + [ 1; 0; 0; 0 ] u,   y = x

eigenvalues of A are

    −0.17, −0.66, −2.21, −3.96

step response matrix s(t) ∈ R^{4×1}:

[figure: step responses s_1, ..., s_4 versus t, 0 ≤ t ≤ 15, each rising toward 1]

shortest delay to x_1; longest delay to x_4

delays ≈ 10, consistent with slowest (i.e., dominant) eigenvalue −0.17
DC or static gain matrix

transfer function at s = 0 is H(0) = −CA^{-1}B + D ∈ R^{p×m}

DC transfer function describes system under static conditions, i.e., x, u, y constant:

    0 = ẋ = Ax + Bu,   y = Cx + Du

eliminate x to get y = H(0)u

if system is stable,

    H(0) = ∫_0^∞ h(t) dt = lim_{t→∞} s(t)

(recall: H(s) = ∫_0^∞ e^{−st}h(t) dt,  s(t) = ∫_0^t h(τ) dτ)

if u(t) → u_∞ ∈ R^m, then y(t) → y_∞ ∈ R^p where y_∞ = H(0)u_∞

DC gain matrix for example 1 (springs):

    H(0) = [ 1/4 1/4; −1/2 1/2; −1/4 −1/4 ]

DC gain matrix for example 2 (RC circuit):

    H(0) = [ 1; 1; 1; 1 ]

(do these make sense?)
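a quick check of the RC-circuit DC gain using the state-space data given above:

```python
import numpy as np

A = np.array([[-3.0, 1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 0.0],
              [1.0, 0.0, -2.0, 1.0],
              [0.0, 0.0, 1.0, -1.0]])
B = np.array([[1.0], [0.0], [0.0], [0.0]])
C = np.eye(4)
D = np.zeros((4, 1))

H0 = -C @ np.linalg.solve(A, B) + D    # H(0) = -C A^{-1} B + D
print(H0.ravel())                       # all ones: every node settles to the drive voltage
```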
Discretization with piecewise constant inputs

linear system ẋ = Ax + Bu, y = Cx + Du

suppose u_d : Z_+ → R^m is a sequence, and

    u(t) = u_d(k)  for kh ≤ t < (k + 1)h,  k = 0, 1, ...

define sequences

    x_d(k) = x(kh),   y_d(k) = y(kh),   k = 0, 1, ...

h > 0 is called the sample interval (for x and y) or update interval (for u)

u is piecewise constant (called zero-order-hold)

x_d, y_d are sampled versions of x, y

    x_d(k + 1) = x((k + 1)h) = e^{hA}x(kh) + ∫_0^h e^{τA}Bu((k + 1)h − τ) dτ
               = e^{hA}x_d(k) + ( ∫_0^h e^{τA} dτ ) B u_d(k)

x_d, u_d, and y_d satisfy discrete-time LDS equations

    x_d(k + 1) = A_d x_d(k) + B_d u_d(k),   y_d(k) = C_d x_d(k) + D_d u_d(k)

where

    A_d = e^{hA},   B_d = ( ∫_0^h e^{τA} dτ ) B,   C_d = C,   D_d = D
called discretized system

if A is invertible, we can express integral as

    ∫_0^h e^{τA} dτ = A^{-1}( e^{hA} − I )

stability: if eigenvalues of A are λ_1, ..., λ_n, then eigenvalues of A_d are e^{hλ_1}, ..., e^{hλ_n}

discretization preserves stability properties since

    ℜλ_i < 0   ⇔   |e^{hλ_i}| < 1

for h > 0

extensions/variations:

offsets: updates for u and sampling of x, y are offset in time

multirate: u_i updated, y_i sampled at different intervals
(usually integer multiples of a common interval h)

both very common in practice
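a sketch of computing (A_d, B_d) for a given sample interval, using the invertible-A expression above; the continuous-time system matrices are placeholders:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -0.5]])   # made-up continuous-time system
B = np.array([[0.0], [1.0]])
h = 0.05

Ad = expm(h * A)
Bd = np.linalg.solve(A, Ad - np.eye(2)) @ B   # A^{-1}(e^{hA} - I) B, valid since A invertible

# discrete-time eigenvalues are e^{h lambda_i}
print(np.sort(np.linalg.eigvals(Ad)))
print(np.sort(np.exp(h * np.linalg.eigvals(A))))
```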
Dual system

the dual system associated with system

    ẋ = Ax + Bu,   y = Cx + Du

is given by

    ż = A^T z + C^T v,   w = B^T z + D^T v

all matrices are transposed

roles of B and C are swapped

transfer function of dual system:

    (B^T)(sI − A^T)^{-1}(C^T) + D^T = H(s)^T

where H(s) = C(sI − A)^{-1}B + D

(for SISO case, TF of dual is same as original)

eigenvalues (hence stability properties) are the same

Dual via block diagram

in terms of block diagrams, dual is formed by:

transpose all matrices

swap inputs and outputs on all boxes

reverse directions of signal flow arrows

swap solder joints and summing junctions

original system:

[figure: standard block diagram of ẋ = Ax + Bu, y = Cx + Du]

dual system:

[figure: the same diagram with A^T, B^T, C^T, D^T, signal flow reversed, input v and output w]
Causality

interpretation of

    x(t) = e^{tA}x(0) + ∫_0^t e^{(t−τ)A}Bu(τ) dτ

    y(t) = Ce^{tA}x(0) + ∫_0^t Ce^{(t−τ)A}Bu(τ) dτ + Du(t)

for t ≥ 0:

current state (x(t)) and output (y(t)) depend on past input (u(τ) for τ ≤ t)

i.e., mapping from input to state and output is causal (with fixed initial state)

now consider fixed final state x(T): for t ≤ T,

    x(t) = e^{(t−T)A}x(T) + ∫_T^t e^{(t−τ)A}Bu(τ) dτ,

i.e., current state (and output) depend on future input!

so for fixed final condition, same system is anti-causal

Idea of state

x(t) is called state of system at time t since:

future output depends only on current state and future input

future output depends on past input only through current state

state summarizes effect of past inputs on future output

state is bridge between past inputs and future outputs
Change of coordinates

start with LDS ẋ = Ax + Bu, y = Cx + Du

change coordinates in R^n to x̃, with x = T x̃

then

    dx̃/dt = T^{-1}ẋ = T^{-1}(Ax + Bu) = T^{-1}AT x̃ + T^{-1}Bu

hence LDS can be expressed as

    dx̃/dt = Ã x̃ + B̃ u,   y = C̃ x̃ + D̃ u

where

    Ã = T^{-1}AT,   B̃ = T^{-1}B,   C̃ = CT,   D̃ = D

TF is same (since u, y aren't affected):

    C̃(sI − Ã)^{-1}B̃ + D̃ = C(sI − A)^{-1}B + D
Standard forms for LDS

can change coordinates to put A in various forms (diagonal, real modal, Jordan ...)

e.g., to put LDS in diagonal form, find T s.t.

    T^{-1}AT = diag(λ_1, ..., λ_n)

write

    T^{-1}B = [ b̃_1^T; ... ; b̃_n^T ],   CT = [ c̃_1 ··· c̃_n ]

so

    dx̃_i/dt = λ_i x̃_i + b̃_i^T u,   y = Σ_{i=1}^n c̃_i x̃_i

[figure: block diagram of the diagonalized system: n decoupled scalar integrators with feedback gains λ_i, driven by b̃_i^T u, outputs combined through the c̃_i]

(here we assume D = 0)
Discrete-time systems

discrete-time LDS:

    x(t + 1) = Ax(t) + Bu(t),   y(t) = Cx(t) + Du(t)

[figure: block diagram identical to the continuous-time one, with the integrator 1/s replaced by a unit delay 1/z]

only difference with continuous time: z instead of s

interpretation of z^{-1} block:

unit delayor (shifts sequence back in time one epoch)

latch (plus small delay to avoid race condition)

we have:

    x(1) = Ax(0) + Bu(0),
    x(2) = Ax(1) + Bu(1) = A²x(0) + ABu(0) + Bu(1),

and in general, for t ∈ Z_+,

    x(t) = A^t x(0) + Σ_{τ=0}^{t−1} A^{t−1−τ} B u(τ)

hence

    y(t) = CA^t x(0) + h ∗ u

where ∗ is discrete-time convolution and

    h(t) = D for t = 0,  CA^{t−1}B for t > 0

is the impulse response
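a tiny sketch checking this impulse-response formula against direct simulation, with random system data:

```python
import numpy as np

np.random.seed(7)
n, m = 3, 1
A, B = 0.5 * np.random.randn(n, n), np.random.randn(n, m)
C, D = np.random.randn(1, n), np.zeros((1, m))

def impulse(t):
    return D if t == 0 else C @ np.linalg.matrix_power(A, t - 1) @ B

# simulate x(t+1) = Ax + Bu from x(0) = 0 with u = unit impulse; output equals h(t)
x = np.zeros(n)
for t in range(6):
    u = np.ones(m) if t == 0 else np.zeros(m)
    y = C @ x + D @ u
    print(t, y.item(), impulse(t).item())
    x = A @ x + B @ u
```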
Z-transform

suppose w ∈ R^{p×q} is a sequence (discrete-time signal), i.e.,

    w : Z_+ → R^{p×q}

recall Z-transform W = Z(w):

    W(z) = Σ_{t=0}^∞ z^{−t} w(t)

where W : D ⊆ C → C^{p×q} (D is domain of W)

time-advanced or shifted signal v:

    v(t) = w(t + 1),  t = 0, 1, ...

Z-transform of time-advanced signal:

    V(z) = Σ_{t=0}^∞ z^{−t} w(t + 1) = z Σ_{t=1}^∞ z^{−t} w(t) = zW(z) − zw(0)

Discrete-time transfer function

take Z-transform of system equations

    x(t + 1) = Ax(t) + Bu(t),   y(t) = Cx(t) + Du(t)

yields

    zX(z) − zx(0) = AX(z) + BU(z),   Y(z) = CX(z) + DU(z)

solve for X(z) to get

    X(z) = (zI − A)^{-1}zx(0) + (zI − A)^{-1}BU(z)

(note extra z in first term!)

hence

    Y(z) = H(z)U(z) + C(zI − A)^{-1}zx(0)

where H(z) = C(zI − A)^{-1}B + D is the discrete-time transfer function

note power series expansion of resolvent:

    (zI − A)^{-1} = z^{-1}I + z^{-2}A + z^{-3}A² + ···
EE263 Autumn 2010-11 Stephen Boyd

Lecture 14
Example: Aircraft dynamics

longitudinal aircraft dynamics
wind gust & control inputs
linearized dynamics
steady-state analysis
eigenvalues & modes
impulse matrices

Longitudinal aircraft dynamics

[figure: aircraft with body axis at angle θ above the horizontal]

variables are (small) deviations from operating point or trim conditions

state (components):

u: velocity of aircraft along body axis

v: velocity of aircraft perpendicular to body axis (down is positive)

θ: angle between body axis and horizontal (up is positive)

q = θ̇: angular velocity of aircraft (pitch rate)

Inputs

disturbance inputs:

u_w: velocity of wind along body axis

v_w: velocity of wind perpendicular to body axis

control or actuator inputs:

δ_e: elevator angle (δ_e > 0 is down)

δ_t: thrust

Linearized dynamics

for 747, level flight, 40000 ft, 774 ft/sec,

    d/dt [ u; v; q; θ ] =
    [ −.003  .039   0     −.322
      −.065 −.319   7.74   0
       .020 −.101  −.429   0
       0     0      1      0 ] [ u − u_w; v − v_w; q; θ ]
    + [  .01    1
        −.18   −.04
        −1.16   .598
         0      0 ] [ δ_e; δ_t ]

units: ft, sec, crad (= 0.01 rad ≈ 0.57°)

matrix coefficients are called stability derivatives

outputs of interest:

aircraft speed u (deviation from trim)

climb rate ḣ = −v + 7.74 θ
Steady-state analysis

DC gain from (u_w, v_w, δ_e, δ_t) to (u, ḣ):

    H(0) = −CA^{-1}B + D = [ 1  0  27.2  −15.0;  0  −1  −1.34  24.9 ]

gives steady-state change in speed & climb rate due to wind, elevator & thrust changes

solve for control variables in terms of wind velocities, desired speed & climb rate:

    [ δ_e; δ_t ] = [ .0379 .0229; .0020 .0413 ] [ u − u_w; ḣ + v_w ]

level flight, increase in speed is obtained mostly by increasing elevator (i.e., downwards)

constant speed, increase in climb rate is obtained by increasing thrust and increasing elevator (i.e., downwards)

(thrust on 747 gives strong pitch up torque)
Eigenvalues and modes

eigenvalues are

    −0.3750 ± 0.8818j,   −0.0005 ± 0.0674j

two complex modes, called short-period and phugoid, respectively

system is stable (but lightly damped)

hence step responses converge (eventually) to DC gain matrix

eigenvectors are

    x_short ≈ (0.0005, 0.5433, 0.0899, 0.0283) + j(0.0135, 0.8235, 0.0677, 0.1140)

    x_phug ≈ (0.7510, 0.0962, 0.0111, 0.1225) + j(0.6130, 0.0941, 0.0082, 0.1637)

Short-period mode

y(t) = Ce^{tA} ℜ(x_short) (pure short-period mode motion)

[figure: u(t) and ḣ(t) versus t, 0 ≤ t ≤ 20]

only small effect on speed u

period ≈ 7 sec, decays in ≈ 10 sec
Phugoid mode

y(t) = Ce^{tA} ℜ(x_phug) (pure phugoid mode motion)

[figure: u(t) and ḣ(t) versus t, 0 ≤ t ≤ 2000]

affects both speed and climb rate

period ≈ 100 sec; decays in ≈ 5000 sec

Dynamic response to wind gusts

impulse response matrix from (u_w, v_w) to (u, ḣ) (gives response to short wind bursts)

over time period [0, 20]:

[figure: 2 × 2 array of impulse responses h_11, h_12, h_21, h_22 on [0, 20]]

over time period [0, 600]:

[figure: the same four impulse responses on [0, 600], showing the slow phugoid oscillation]
Dynamic response to actuators

impulse response matrix from (δ_e, δ_t) to (u, ḣ)

over time period [0, 20]:

[figure: 2 × 2 array of impulse responses h_11, h_12, h_21, h_22 on [0, 20]]

over time period [0, 600]:

[figure: the same four impulse responses on [0, 600]]
EE263 Autumn 2010-11 Stephen Boyd

Lecture 15
Symmetric matrices, quadratic forms, matrix norm, and SVD

eigenvectors of symmetric matrices
quadratic forms
inequalities for quadratic forms
positive semidefinite matrices
norm of a matrix
singular value decomposition

Eigenvalues of symmetric matrices

suppose A ∈ R^{n×n} is symmetric, i.e., A = A^T

fact: the eigenvalues of A are real

to see this, suppose Av = λv, v ≠ 0, v ∈ C^n

then

    v̄^T A v = v̄^T (Av) = λ v̄^T v = λ Σ_{i=1}^n |v_i|²

but also

    v̄^T A v = (A v̄)^T v = (λ̄ v̄)^T v = λ̄ Σ_{i=1}^n |v_i|²

so we have λ = λ̄, i.e., λ ∈ R (hence, can assume v ∈ R^n)

Eigenvectors of symmetric matrices

fact: there is a set of orthonormal eigenvectors of A, i.e., q_1, ..., q_n s.t.

    Aq_i = λ_i q_i,   q_i^T q_j = δ_ij

in matrix form: there is an orthogonal Q s.t.

    Q^{-1}AQ = Q^T AQ = Λ

hence we can express A as

    A = QΛQ^T = Σ_{i=1}^n λ_i q_i q_i^T

in particular, q_i are both left and right eigenvectors
Interpretations

A = QΛQ^T

[figure: signal-flow picture of y = Ax: x is mapped by Q^T (resolved into q_i coordinates), then scaled by Λ, then mapped by Q (reconstituted)]

linear mapping y = Ax can be decomposed as

resolve into q_i coordinates

scale coordinates by λ_i

reconstitute with basis q_i

or, geometrically,

rotate by Q^T

diagonal real scale (dilation) by Λ

rotate back by Q

decomposition

    A = Σ_{i=1}^n λ_i q_i q_i^T

expresses A as linear combination of 1-dimensional projections

example:

    A = [ −1/2 3/2; 3/2 −1/2 ] = ( (1/√2)[ 1 1; 1 −1 ] ) [ 1 0; 0 −2 ] ( (1/√2)[ 1 1; 1 −1 ] )^T

[figure: the action of A on a vector x, decomposed into the projections q_1 q_1^T x and q_2 q_2^T x, scaled by λ_1 = 1 and λ_2 = −2, then summed to give Ax]
proof (case of λ_i distinct)

since λ_i distinct, can find v_1, ..., v_n, a set of linearly independent eigenvectors of A:

    Av_i = λ_i v_i,   ‖v_i‖ = 1

then we have

    v_i^T (Av_j) = λ_j v_i^T v_j = (Av_i)^T v_j = λ_i v_i^T v_j

so (λ_i − λ_j) v_i^T v_j = 0

for i ≠ j, λ_i ≠ λ_j, hence v_i^T v_j = 0

in this case we can say: eigenvectors are orthogonal

in general case (λ_i not distinct) we must say: eigenvectors can be chosen to be orthogonal
Example: RC circuit

[figure: a resistive circuit with capacitors c_1, ..., c_n attached at nodes with voltages v_1, ..., v_n and currents i_1, ..., i_n]

    c_k dv_k/dt = −i_k,   i = Gv

G = G^T ∈ R^{n×n} is conductance matrix of resistive circuit

thus v̇ = −C^{-1}Gv where C = diag(c_1, ..., c_n)

note −C^{-1}G is not symmetric

use state x_i = √(c_i) v_i, so

    ẋ = C^{1/2} v̇ = −C^{-1/2} G C^{-1/2} x

where C^{1/2} = diag(√c_1, ..., √c_n)

we conclude:

eigenvalues λ_1, ..., λ_n of −C^{-1/2}GC^{-1/2} (hence, of −C^{-1}G) are real

eigenvectors q_i (in x_i coordinates) can be chosen orthogonal

eigenvectors in voltage coordinates, s_i = C^{-1/2}q_i, satisfy

    −C^{-1}G s_i = λ_i s_i,   s_i^T C s_j = δ_ij
Quadratic forms

a function f : R^n → R of the form

    f(x) = x^T Ax = Σ_{i,j=1}^n A_ij x_i x_j

is called a quadratic form

in a quadratic form we may as well assume A = A^T since

    x^T Ax = x^T ((A + A^T)/2) x

((A + A^T)/2 is called the symmetric part of A)

uniqueness: if x^T Ax = x^T Bx for all x ∈ R^n and A = A^T, B = B^T, then A = B

Examples

‖Bx‖² = x^T B^T Bx

Σ_{i=1}^{n−1} (x_{i+1} − x_i)²

‖Fx‖² − ‖Gx‖²

sets defined by quadratic forms:

{ x | f(x) = a } is called a quadratic surface

{ x | f(x) ≤ a } is called a quadratic region
Inequalities for quadratic forms

suppose A = A^T, A = QΛQ^T with eigenvalues sorted so λ_1 ≥ ··· ≥ λ_n

    x^T Ax = x^T QΛQ^T x = (Q^T x)^T Λ (Q^T x)
           = Σ_{i=1}^n λ_i (q_i^T x)²
           ≤ λ_1 Σ_{i=1}^n (q_i^T x)²
           = λ_1 ‖x‖²

i.e., we have x^T Ax ≤ λ_1 x^T x

similar argument shows x^T Ax ≥ λ_n ‖x‖², so we have

    λ_n x^T x ≤ x^T Ax ≤ λ_1 x^T x

sometimes λ_1 is called λ_max, λ_n is called λ_min

note also that

    q_1^T A q_1 = λ_1 ‖q_1‖²,   q_n^T A q_n = λ_n ‖q_n‖²,

so the inequalities are tight
Positive semidefinite and positive definite matrices
suppose A = A^T ∈ R^{n×n}
we say A is positive semidefinite if x^T A x ≥ 0 for all x
denoted A ≥ 0 (and sometimes A ⪰ 0)
A ≥ 0 if and only if λ_min(A) ≥ 0, i.e., all eigenvalues are nonnegative
not the same as A_ij ≥ 0 for all i, j
we say A is positive definite if x^T A x > 0 for all x ≠ 0
denoted A > 0
A > 0 if and only if λ_min(A) > 0, i.e., all eigenvalues are positive
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–14
Matrix inequalities
we say A is negative semidefinite if −A ≥ 0
we say A is negative definite if −A > 0
otherwise, we say A is indefinite
matrix inequality: if B = B^T ∈ R^{n×n} we say A ≥ B if A − B ≥ 0, A < B if B − A > 0, etc.
for example:
A ≥ 0 means A is positive semidefinite
A > B means x^T A x > x^T B x for all x ≠ 0
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–15
many properties that you'd guess hold actually do, e.g.,
if A ≥ B and C ≥ D, then A + C ≥ B + D
if B ≤ 0 then A + B ≤ A
if A ≥ 0 and α ≥ 0, then αA ≥ 0
A^2 ≥ 0
if A > 0, then A^{−1} > 0
matrix inequality is only a partial order: we can have
A ≱ B,   B ≱ A
(such matrices are called incomparable)
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–16
Ellipsoids
if A = A^T > 0, the set
E = { x | x^T A x ≤ 1 }
is an ellipsoid in R^n, centered at 0
(figure: ellipsoid E with semi-axes s_1, s_2)
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–17
semi-axes are given by s_i = λ_i^{−1/2} q_i, i.e.:
eigenvectors determine directions of semiaxes
eigenvalues determine lengths of semiaxes
note:
in direction q_1, x^T A x is large, hence ellipsoid is thin in direction q_1
in direction q_n, x^T A x is small, hence ellipsoid is fat in direction q_n
λ_max/λ_min gives maximum eccentricity
if Ẽ = { x | x^T B x ≤ 1 }, where B > 0, then E ⊆ Ẽ ⟺ A ≥ B
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–18
Gain of a matrix in a direction
suppose A ∈ R^{m×n} (not necessarily square or symmetric)
for x ∈ R^n, ‖Ax‖/‖x‖ gives the amplification factor or gain of A in the direction x
obviously, gain varies with direction of input x
questions:
what is maximum gain of A (and corresponding maximum gain direction)?
what is minimum gain of A (and corresponding minimum gain direction)?
how does gain of A vary with direction?
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–19
Matrix norm
the maximum gain
max_{x≠0} ‖Ax‖/‖x‖
is called the matrix norm or spectral norm of A and is denoted ‖A‖
max_{x≠0} ‖Ax‖^2/‖x‖^2 = max_{x≠0} (x^T A^T A x)/‖x‖^2 = λ_max(A^T A)
so we have ‖A‖ = √(λ_max(A^T A))
similarly the minimum gain is given by
min_{x≠0} ‖Ax‖/‖x‖ = √(λ_min(A^T A))
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–20
note that
A^T A ∈ R^{n×n} is symmetric and A^T A ≥ 0, so λ_min, λ_max ≥ 0
max gain input direction is x = q_1, eigenvector of A^T A associated with λ_max
min gain input direction is x = q_n, eigenvector of A^T A associated with λ_min
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–21
example: A = [ 1 2 ]
             [ 3 4 ]
             [ 5 6 ]
A^T A = [ 35 44 ] = [ 0.620  0.785 ] [ 90.7  0     ] [ 0.620  0.785 ]^T
        [ 44 56 ]   [ 0.785 −0.620 ] [ 0     0.265 ] [ 0.785 −0.620 ]
then ‖A‖ = √(λ_max(A^T A)) = 9.53:
‖(0.620, 0.785)‖ = 1,   ‖A(0.620, 0.785)‖ = ‖(2.18, 4.99, 7.78)‖ = 9.53
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–22
min gain is √(λ_min(A^T A)) = 0.514:
‖(0.785, −0.620)‖ = 1,   ‖A(0.785, −0.620)‖ = ‖(−0.46, −0.14, 0.18)‖ = 0.514
for all x ≠ 0, we have
0.514 ≤ ‖Ax‖/‖x‖ ≤ 9.53
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–23
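These gain bounds can be reproduced numerically; the following NumPy sketch (not part of the original slides) computes them from the eigenvalues of A^T A for the 3×2 example above.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
lam, Q = np.linalg.eigh(A.T @ A)           # ascending: lam[0]=lam_min, lam[1]=lam_max
max_gain, min_gain = np.sqrt(lam[1]), np.sqrt(lam[0])
print(max_gain, min_gain)                  # approx 9.526 and 0.514
print(np.linalg.norm(A, 2))                # spectral norm agrees with max_gain

# gains in the max- and min-gain input directions
q_max, q_min = Q[:, 1], Q[:, 0]
print(np.linalg.norm(A @ q_max) / np.linalg.norm(q_max))   # approx 9.526
print(np.linalg.norm(A @ q_min) / np.linalg.norm(q_min))   # approx 0.514
```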
Properties of matrix norm
consistent with vector norm: matrix norm of a ∈ R^{n×1} is √(λ_max(a^T a)) = √(a^T a)
for any x, ‖Ax‖ ≤ ‖A‖ ‖x‖
scaling: ‖aA‖ = |a| ‖A‖
triangle inequality: ‖A + B‖ ≤ ‖A‖ + ‖B‖
definiteness: ‖A‖ = 0 ⟺ A = 0
norm of product: ‖AB‖ ≤ ‖A‖ ‖B‖
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–24
Singular value decomposition
more complete picture of gain properties of A given by singular value decomposition (SVD) of A:
A = U Σ V^T
where
A ∈ R^{m×n}, Rank(A) = r
U ∈ R^{m×r}, U^T U = I
V ∈ R^{n×r}, V^T V = I
Σ = diag(σ_1, ..., σ_r), where σ_1 ≥ ··· ≥ σ_r > 0
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–25
with U = [u_1 ··· u_r], V = [v_1 ··· v_r],
A = U Σ V^T = Σ_{i=1}^r σ_i u_i v_i^T
σ_i are the (nonzero) singular values of A
v_i are the right or input singular vectors of A
u_i are the left or output singular vectors of A
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–26
A^T A = (UΣV^T)^T (UΣV^T) = V Σ^2 V^T
hence:
v_i are eigenvectors of A^T A (corresponding to nonzero eigenvalues)
σ_i = √(λ_i(A^T A)) (and λ_i(A^T A) = 0 for i > r)
‖A‖ = σ_1
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–27
similarly,
A A^T = (UΣV^T)(UΣV^T)^T = U Σ^2 U^T
hence:
u_i are eigenvectors of A A^T (corresponding to nonzero eigenvalues)
σ_i = √(λ_i(A A^T)) (and λ_i(A A^T) = 0 for i > r)
u_1, ..., u_r are orthonormal basis for range(A)
v_1, ..., v_r are orthonormal basis for N(A)^⊥
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–28
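The facts above are easy to verify numerically. The sketch below (not part of the original slides) builds a random rank-r matrix and checks the compact SVD, the eigenvector relations, and ‖A‖ = σ_1; the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

U, s, Vt = np.linalg.svd(A)                    # s holds min(m, n) singular values
Ur, sr, Vr = U[:, :r], s[:r], Vt[:r, :].T      # keep the r nonzero singular values

print(np.allclose(Ur @ np.diag(sr) @ Vr.T, A))          # A = U Sigma V^T (compact)
print(np.allclose(A.T @ A @ Vr, Vr @ np.diag(sr**2)))   # v_i eigenvectors of A^T A
print(np.allclose(A @ A.T @ Ur, Ur @ np.diag(sr**2)))   # u_i eigenvectors of A A^T
print(np.isclose(np.linalg.norm(A, 2), sr[0]))          # ||A|| = sigma_1
```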
Interpretations
A = U Σ V^T = Σ_{i=1}^r σ_i u_i v_i^T
(block diagram: x → V^T → Σ → U → Ax)
linear mapping y = Ax can be decomposed as
compute coefficients of x along input directions v_1, ..., v_r
scale coefficients by σ_i
reconstitute along output directions u_1, ..., u_r
difference with eigenvalue decomposition for symmetric A: input and output directions are different
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–29
v_1 is most sensitive (highest gain) input direction
u_1 is highest gain output direction
Av_1 = σ_1 u_1
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–30
SVD gives clearer picture of gain as function of input/output directions
example: consider A ∈ R^{4×4} with Σ = diag(10, 7, 0.1, 0.05)
input components along directions v_1 and v_2 are amplified (by about 10) and come out mostly along plane spanned by u_1, u_2
input components along directions v_3 and v_4 are attenuated (by about 10)
‖Ax‖/‖x‖ can range between 10 and 0.05
A is nonsingular
for some applications you might say A is effectively rank 2
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–31
example: A ∈ R^{2×2}, with σ_1 = 1, σ_2 = 0.5
resolve x along v_1, v_2: v_1^T x = 0.5, v_2^T x = 0.6, i.e., x = 0.5 v_1 + 0.6 v_2
now form Ax = (v_1^T x) σ_1 u_1 + (v_2^T x) σ_2 u_2 = (0.5)(1) u_1 + (0.6)(0.5) u_2
(figure: x resolved along v_1, v_2; Ax formed along u_1, u_2)
Symmetric matrices, quadratic forms, matrix norm, and SVD 15–32
EE263 Autumn 2010-11 Stephen Boyd
Lecture 16
SVD Applications
general pseudo-inverse
full SVD
image of unit ball under linear transformation
SVD in estimation/inversion
sensitivity of linear equations to data error
low rank approximation via SVD
16–1
General pseudo-inverse
if A ≠ 0 has SVD A = UΣV^T,
A^† = V Σ^{−1} U^T
is the pseudo-inverse or Moore-Penrose inverse of A
if A is skinny and full rank,
A^† = (A^T A)^{−1} A^T
gives the least-squares approximate solution x_ls = A^† y
if A is fat and full rank,
A^† = A^T (A A^T)^{−1}
gives the least-norm solution x_ln = A^† y
SVD Applications 16–2
in general case:
X_ls = { z | ‖Az − y‖ = min_w ‖Aw − y‖ }
is set of least-squares approximate solutions
x_pinv = A^† y ∈ X_ls has minimum norm on X_ls, i.e., x_pinv is the minimum-norm, least-squares approximate solution
SVD Applications 16–3
Pseudo-inverse via regularization
for μ > 0, let x_μ be (unique) minimizer of
‖Ax − y‖^2 + μ ‖x‖^2
i.e.,
x_μ = (A^T A + μI)^{−1} A^T y
here, A^T A + μI > 0 and so is invertible
then we have lim_{μ→0} x_μ = A^† y
in fact, we have lim_{μ→0} (A^T A + μI)^{−1} A^T = A^† (check this!)
SVD Applications 16–4
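A quick numerical check of this limit (not part of the original slides); the matrix and the sequence of μ values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))            # skinny, full rank with probability 1
Apinv = np.linalg.pinv(A)

for mu in (1.0, 1e-2, 1e-4, 1e-6):
    # (A^T A + mu I)^{-1} A^T, computed by solving rather than inverting
    Amu = np.linalg.solve(A.T @ A + mu * np.eye(3), A.T)
    print(mu, np.linalg.norm(Amu - Apinv))   # error shrinks as mu -> 0
```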
Full SVD
SVD of A ∈ R^{m×n} with Rank(A) = r:
A = U_1 Σ_1 V_1^T = [ u_1 ··· u_r ] diag(σ_1, ..., σ_r) [ v_1^T ; ... ; v_r^T ]
find U_2 ∈ R^{m×(m−r)}, V_2 ∈ R^{n×(n−r)} s.t. U = [U_1 U_2] ∈ R^{m×m} and V = [V_1 V_2] ∈ R^{n×n} are orthogonal
add zero rows/cols to Σ_1 to form Σ ∈ R^{m×n}:
Σ = [ Σ_1            0_{r×(n−r)}     ]
    [ 0_{(m−r)×r}    0_{(m−r)×(n−r)} ]
SVD Applications 16–5
then we have
A = U_1 Σ_1 V_1^T = [ U_1 U_2 ] [ Σ_1            0_{r×(n−r)}     ] [ V_1^T ]
                                [ 0_{(m−r)×r}    0_{(m−r)×(n−r)} ] [ V_2^T ]
i.e.:
A = U Σ V^T
called full SVD of A
(SVD with positive singular values only called compact SVD)
SVD Applications 16–6
Image of unit ball under linear transformation
full SVD:
A = U Σ V^T
gives interpretation of y = Ax:
rotate (by V^T)
stretch along axes by σ_i (σ_i = 0 for i > r)
zero-pad (if m > n) or truncate (if m < n) to get m-vector
rotate (by U)
SVD Applications 16–7
Image of unit ball under A
(figure: the unit ball is rotated by V^T, stretched by Σ = diag(2, 0.5), then rotated by U; the result is an ellipse with semi-axes 2u_1 and 0.5u_2)
{ Ax | ‖x‖ ≤ 1 } is ellipsoid with principal axes σ_i u_i
SVD Applications 16–8
SVD in estimation/inversion
suppose y = Ax + v, where
y ∈ R^m is measurement
x ∈ R^n is vector to be estimated
v is a measurement noise or error
norm-bound model of noise: we assume ‖v‖ ≤ α but otherwise know nothing about v (α gives max norm of noise)
SVD Applications 16–9
consider estimator x̂ = By, with BA = I (i.e., unbiased)
estimation or inversion error is x̃ = x̂ − x = Bv
set of possible estimation errors is ellipsoid
x̃ ∈ E_unc = { Bv | ‖v‖ ≤ α }
x = x̂ − x̃ ∈ x̂ − E_unc = x̂ + E_unc, i.e.:
true x lies in uncertainty ellipsoid E_unc, centered at estimate x̂
good estimator has small E_unc (with BA = I, of course)
SVD Applications 16–10
semiaxes of E_unc are ασ_i u_i (singular values & vectors of B)
e.g., maximum norm of error is α‖B‖, i.e., ‖x̂ − x‖ ≤ α‖B‖
optimality of least-squares: suppose BA = I is any estimator, and B_ls = A^† is the least-squares estimator
then:
B_ls B_ls^T ≤ B B^T
E_ls ⊆ E
in particular ‖B_ls‖ ≤ ‖B‖
i.e., the least-squares estimator gives the smallest uncertainty ellipsoid
SVD Applications 16–11
Example: navigation using range measurements (lect. 4)
we have
y = [ k_1^T ; k_2^T ; k_3^T ; k_4^T ] x
where k_i ∈ R^2
using first two measurements and inverting:
x̂ = [ [ k_1^T ; k_2^T ]^{−1}   0_{2×2} ] y
using all four measurements and least-squares:
x̂ = A^† y
SVD Applications 16–12
uncertainty regions (with α = 1):
(figure: top plot shows uncertainty region for x using inversion; bottom plot shows uncertainty region for x using least-squares; axes x_1, x_2)
SVD Applications 16–13
Proof of optimality property
suppose A ∈ R^{m×n}, m > n, is full rank
SVD: A = U Σ V^T, with V orthogonal
B_ls = A^† = V Σ^{−1} U^T, and B satisfies BA = I
define Z = B − B_ls, so B = B_ls + Z
then ZA = Z U Σ V^T = 0, so ZU = 0 (multiply by V Σ^{−1} on right)
therefore
B B^T = (B_ls + Z)(B_ls + Z)^T
      = B_ls B_ls^T + B_ls Z^T + Z B_ls^T + Z Z^T
      = B_ls B_ls^T + Z Z^T
      ≥ B_ls B_ls^T
using Z B_ls^T = (ZU) Σ^{−1} V^T = 0
SVD Applications 16–14
Sensitivity of linear equations to data error
consider y = Ax, A ∈ R^{n×n} invertible; of course x = A^{−1} y
suppose we have an error or noise in y, i.e., y becomes y + δy
then x becomes x + δx with δx = A^{−1} δy
hence we have ‖δx‖ = ‖A^{−1} δy‖ ≤ ‖A^{−1}‖ ‖δy‖
if ‖A^{−1}‖ is large,
small errors in y can lead to large errors in x
can't solve for x given y (with small errors)
hence, A can be considered singular in practice
SVD Applications 16–15
a more refined analysis uses relative instead of absolute errors in x and y
since y = Ax, we also have ‖y‖ ≤ ‖A‖ ‖x‖, hence
‖δx‖/‖x‖ ≤ ‖A‖ ‖A^{−1}‖ · ‖δy‖/‖y‖
κ(A) = ‖A‖ ‖A^{−1}‖ = σ_max(A)/σ_min(A)
is called the condition number of A
we have:
relative error in solution x ≤ condition number · relative error in data y
or, in terms of # bits of guaranteed accuracy:
# bits accuracy in solution ≈ # bits accuracy in data − log_2 κ
SVD Applications 16–16
we say
A is well conditioned if κ is small
A is poorly conditioned if κ is large
(definition of small and large depend on application)
same analysis holds for least-squares approximate solutions with A nonsquare, κ = σ_max(A)/σ_min(A)
SVD Applications 16–17
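The condition-number bound is easy to illustrate numerically. The sketch below (not part of the original slides) constructs a matrix with prescribed singular values, so κ is known, and compares the relative error amplification to κ; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([10.0, 1.0, 0.1, 1e-4]) @ V.T    # sigma_max/sigma_min = 1e5

print(np.linalg.cond(A))                 # approx 1e5 = sigma_max/sigma_min

x = rng.standard_normal(4)
y = A @ x
dy = 1e-6 * rng.standard_normal(4)       # small error in the data y
dx = np.linalg.solve(A, dy)              # resulting error in the solution x
rel_in = np.linalg.norm(dy) / np.linalg.norm(y)
rel_out = np.linalg.norm(dx) / np.linalg.norm(x)
print(rel_out / rel_in, "<=", np.linalg.cond(A))   # amplification bounded by kappa
```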
Low rank approximations
suppose A ∈ R^{m×n}, Rank(A) = r, with SVD
A = U Σ V^T = Σ_{i=1}^r σ_i u_i v_i^T
we seek matrix Â, Rank(Â) ≤ p < r, s.t. Â ≈ A in the sense that ‖A − Â‖ is minimized
solution: optimal rank p approximator is
Â = Σ_{i=1}^p σ_i u_i v_i^T
hence ‖A − Â‖ = ‖ Σ_{i=p+1}^r σ_i u_i v_i^T ‖ = σ_{p+1}
interpretation: SVD dyads u_i v_i^T are ranked in order of importance; take p to get rank p approximant
SVD Applications 16–18
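A minimal numerical sketch of the optimal rank-p approximation (not part of the original slides): truncate the SVD and check that the approximation error equals σ_{p+1}; the matrix size, p, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

p = 2
Ahat = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]     # sum of the first p dyads
print(np.linalg.matrix_rank(Ahat))               # p
print(np.linalg.norm(A - Ahat, 2), s[p])         # equal: error is sigma_{p+1}
```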
proof: suppose Rank(B) ≤ p
then dim N(B) ≥ n − p
also, dim span{v_1, ..., v_{p+1}} = p + 1
hence, the two subspaces intersect, i.e., there is a unit vector z ∈ R^n s.t.
Bz = 0,   z ∈ span{v_1, ..., v_{p+1}}
(A − B)z = Az = Σ_{i=1}^{p+1} σ_i u_i v_i^T z
‖(A − B)z‖^2 = Σ_{i=1}^{p+1} σ_i^2 (v_i^T z)^2 ≥ σ_{p+1}^2 ‖z‖^2
hence ‖A − B‖ ≥ σ_{p+1} = ‖A − Â‖
SVD Applications 16–19
Distance to singularity
another interpretation of σ_i:
σ_i = min{ ‖A − B‖ | Rank(B) ≤ i − 1 }
i.e., the distance (measured by matrix norm) to the nearest rank i − 1 matrix
for example, if A ∈ R^{n×n}, σ_n = σ_min is distance to nearest singular matrix
hence, small σ_min means A is near to a singular matrix
SVD Applications 16–20
application: model simplification
suppose y = Ax + v, where
A ∈ R^{100×30} has SVs 10, 7, 2, 0.5, 0.01, ..., 0.0001
‖x‖ is on the order of 1
unknown error or noise v has norm on the order of 0.1
then the terms σ_i u_i v_i^T x, for i = 5, ..., 30, are substantially smaller than the noise term v
simplified model:
y = Σ_{i=1}^4 σ_i u_i v_i^T x + v
SVD Applications 16–21
EE263 Autumn 2010-11 Stephen Boyd
Lecture 17
Example: Quantum mechanics
wave function and Schrodinger equation
discretization
preservation of probability
eigenvalues & eigenstates
example
17–1
Quantum mechanics
single particle in interval [0, 1], mass m
potential V : [0, 1] → R
Ψ : [0, 1] × R_+ → C is (complex-valued) wave function
interpretation: |Ψ(x, t)|^2 is probability density of particle at position x, time t
(so ∫_0^1 |Ψ(x, t)|^2 dx = 1 for all t)
evolution of Ψ governed by Schrodinger equation:
iℏ Ψ̇ = ( V − (ℏ^2/2m) ∂^2/∂x^2 ) Ψ = HΨ
where H is Hamiltonian operator, i = √−1
Example: Quantum mechanics 17–2
Discretization
let's discretize position x into N discrete points, k/N, k = 1, ..., N
wave function is approximated as vector Ψ(t) ∈ C^N
∂^2/∂x^2 operator is approximated as matrix
∇^2 = N^2 [ −2   1              ]
          [  1  −2   1          ]
          [      1  −2   1      ]
          [         ...         ]
          [              1  −2  ]
so w = ∇^2 v means
w_k = ( (v_{k+1} − v_k)/(1/N) − (v_k − v_{k−1})/(1/N) ) / (1/N)
(which approximates w = ∂^2 v/∂x^2)
Example: Quantum mechanics 17–3
discretized Schrodinger equation is (complex) linear dynamical system
Ψ̇ = (−i/ℏ)( V − (ℏ^2/2m) ∇^2 ) Ψ = (−i/ℏ) H Ψ
where V is a diagonal matrix with V_kk = V(k/N)
hence we analyze using linear dynamical system theory (with complex vectors & matrices):
Ψ̇ = (−i/ℏ) H Ψ
solution of Schrodinger equation: Ψ(t) = e^{(−i/ℏ)tH} Ψ(0)
matrix e^{(−i/ℏ)tH} propagates wave function forward in time t seconds (backward if t < 0)
Example: Quantum mechanics 17–4
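The discretization above is straightforward to set up numerically. The sketch below (not part of the original slides) builds ∇², forms H for an illustrative bump potential, propagates Ψ with the matrix exponential, and checks that probability is preserved; N, ℏ, m, V and the initial wave function are illustrative choices, not the values used for the plots later in this lecture.

```python
import numpy as np
from scipy.linalg import expm

N, hbar, m = 100, 1.0, 1.0
xgrid = np.arange(1, N + 1) / N
Vdiag = np.where(np.abs(xgrid - 0.5) < 0.1, 100.0, 0.0)   # bump in the middle (illustrative)

# tridiagonal approximation of d^2/dx^2
del2 = N**2 * (np.diag(-2.0 * np.ones(N))
               + np.diag(np.ones(N - 1), 1)
               + np.diag(np.ones(N - 1), -1))
H = np.diag(Vdiag) - (hbar**2 / (2 * m)) * del2

# propagator over t seconds: e^{(-i/hbar) t H}
t = 0.01
U = expm((-1j / hbar) * t * H)

Psi0 = np.exp(-((xgrid - 0.2) ** 2) / 0.005).astype(complex)   # particle near x = 0.2
Psi0 /= np.linalg.norm(Psi0)
Psi = U @ Psi0
print(np.linalg.norm(Psi))                        # stays 1: probability preserved
print(np.allclose(U.conj().T @ U, np.eye(N)))     # U is unitary
```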
Preservation of probability
d/dt ‖Ψ‖^2 = d/dt Ψ*Ψ
           = Ψ̇*Ψ + Ψ*Ψ̇
           = ((−i/ℏ)HΨ)*Ψ + Ψ*((−i/ℏ)HΨ)
           = (i/ℏ)Ψ*HΨ + (−i/ℏ)Ψ*HΨ
           = 0
(using H = H^T ∈ R^{N×N})
hence, ‖Ψ(t)‖^2 is constant; our discretization preserves probability exactly
Example: Quantum mechanics 17–5
U = e^{(−i/ℏ)tH} is unitary, meaning U*U = I
unitary is extension of orthogonal for complex matrix: if U ∈ C^{N×N} is unitary and z ∈ C^N, then
‖Uz‖^2 = (Uz)*(Uz) = z*U*Uz = z*z = ‖z‖^2
Example: Quantum mechanics 17–6
Eigenvalues & eigenstates
H is symmetric, so
its eigenvalues λ_1, ..., λ_N are real (λ_1 ≤ ··· ≤ λ_N)
its eigenvectors v_1, ..., v_N can be chosen to be orthogonal (and real)
from Hv = λv ⟺ (−i/ℏ)Hv = (−i/ℏ)λv we see:
eigenvectors of (−i/ℏ)H are same as eigenvectors of H, i.e., v_1, ..., v_N
eigenvalues of (−i/ℏ)H are (−i/ℏ)λ_1, ..., (−i/ℏ)λ_N (which are pure imaginary)
Example: Quantum mechanics 17–7
eigenvectors v_k are called eigenstates of system
eigenvalue λ_k is energy of eigenstate v_k
for mode Ψ(t) = e^{(−i/ℏ)λ_k t} v_k, probability density
|Ψ_m(t)|^2 = | e^{(−i/ℏ)λ_k t} v_{mk} |^2 = |v_{mk}|^2
doesn't change with time (v_{mk} is mth entry of v_k)
Example: Quantum mechanics 17–8
Example
(figure: potential function V(x); a potential bump in the middle of an infinite potential well)
(for this example, we set ℏ = 1, m = 1, ...)
Example: Quantum mechanics 17–9
(figure: lowest energy eigenfunctions versus x)
potential V shown as dotted line (scaled to fit plot)
four eigenstates with lowest energy shown (i.e., v_1, v_2, v_3, v_4)
Example: Quantum mechanics 17–10
now let's look at a trajectory of Ψ, with initial wave function Ψ(0)
particle near x = 0.2
with momentum to right (can't see in plot of |Ψ|^2)
(expected) kinetic energy half potential bump height
Example: Quantum mechanics 17–11
(figure: two plots)
top plot shows initial probability density |Ψ(0)|^2
bottom plot shows |v_k^* Ψ(0)|^2, i.e., resolution of Ψ(0) into eigenstates
Example: Quantum mechanics 17–12
time evolution, for t = 0, 40, 80, ..., 320:
(figure: snapshots of |Ψ(t)|^2)
Example: Quantum mechanics 17–13
cf. classical solution:
particle rolls half way up potential bump, stops, then rolls back down
reverses velocity when it hits the wall at left (perfectly elastic collision)
then repeats
Example: Quantum mechanics 17–14
(figure) plot shows probability that particle is in left half of well, i.e.,
Σ_{k=1}^{N/2} |Ψ_k(t)|^2,
versus time t
Example: Quantum mechanics 17–15
EE263 Autumn 2010-11 Stephen Boyd
Lecture 18
Controllability and state transfer
state transfer
reachable set, controllability matrix
minimum norm inputs
infinite-horizon minimum norm transfer
18–1
State transfer
consider ẋ = Ax + Bu (or x(t + 1) = Ax(t) + Bu(t)) over time interval [t_i, t_f]
we say input u : [t_i, t_f] → R^m steers or transfers state from x(t_i) to x(t_f) (over time interval [t_i, t_f])
(subscripts stand for initial and final)
questions:
where can x(t_i) be transferred to at t = t_f?
how quickly can x(t_i) be transferred to some x_target?
how do we find a u that transfers x(t_i) to x(t_f)?
how do we find a small or efficient u that transfers x(t_i) to x(t_f)?
Controllability and state transfer 18–2
Reachability
consider state transfer from x(0) = 0 to x(t)
we say x(t) is reachable (in t seconds or epochs)
we define R_t ⊆ R^n as the set of points reachable in t seconds or epochs
for CT system ẋ = Ax + Bu,
R_t = { ∫_0^t e^{(t−τ)A} B u(τ) dτ | u : [0, t] → R^m }
and for DT system x(t + 1) = Ax(t) + Bu(t),
R_t = { Σ_{τ=0}^{t−1} A^{t−1−τ} B u(τ) | u(0), ..., u(t − 1) ∈ R^m }
Controllability and state transfer 18–3
R_t is a subspace of R^n
R_t ⊆ R_s if t ≤ s (i.e., can reach more points given more time)
we define the reachable set R as the set of points reachable for some t:
R = ∪_{t≥0} R_t
Controllability and state transfer 18–4
Reachability for discrete-time LDS
DT system x(t + 1) = Ax(t) + Bu(t), x(t) ∈ R^n
x(t) = C_t [ u(t − 1) ; ... ; u(0) ]
where C_t = [ B  AB  ···  A^{t−1}B ]
so reachable set at t is R_t = range(C_t)
by C-H theorem, we can express each A^k for k ≥ n as linear combination of A^0, ..., A^{n−1}
hence for t ≥ n, range(C_t) = range(C_n)
Controllability and state transfer 18–5
thus we have
R_t = range(C_t) for t < n,   R_t = range(C) for t ≥ n
where C = C_n is called the controllability matrix
any state that can be reached can be reached by t = n
the reachable set is R = range(C)
Controllability and state transfer 18–6
Controllable system
system is called reachable or controllable if all states are reachable (i.e., R = R^n)
system is reachable if and only if Rank(C) = n
example: x(t + 1) = [ 0 1 ] x(t) + [ 1 ] u(t)
                    [ 1 0 ]        [ 1 ]
controllability matrix is C = [ 1 1 ]
                              [ 1 1 ]
hence system is not controllable; reachable set is
R = range(C) = { x | x_1 = x_2 }
Controllability and state transfer 18–7
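A small NumPy check of this example (not part of the original slides): form the controllability matrix and its rank.

```python
import numpy as np

A = np.array([[0., 1.], [1., 0.]])
B = np.array([[1.], [1.]])
n = A.shape[0]

# C = [B  AB  ...  A^{n-1}B]
Ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print(Ctrb)                              # [[1. 1.], [1. 1.]]
print(np.linalg.matrix_rank(Ctrb))       # 1 < 2, so not controllable
```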
General state transfer
with t_f > t_i,
x(t_f) = A^{t_f − t_i} x(t_i) + C_{t_f − t_i} [ u(t_f − 1) ; ... ; u(t_i) ]
hence can transfer x(t_i) to x(t_f) = x_des
⟺ x_des − A^{t_f − t_i} x(t_i) ∈ R_{t_f − t_i}
general state transfer reduces to reachability problem
if system is controllable any state transfer can be achieved in ≤ n steps
important special case: driving state to zero (sometimes called regulating or controlling state)
Controllability and state transfer 18–8
Least-norm input for reachability
assume system is reachable, Rank(C_t) = n
to steer x(0) = 0 to x(t) = x_des, inputs u(0), ..., u(t − 1) must satisfy
x_des = C_t [ u(t − 1) ; ... ; u(0) ]
among all u that steer x(0) = 0 to x(t) = x_des, the one that minimizes
Σ_{τ=0}^{t−1} ‖u(τ)‖^2
Controllability and state transfer 18–9
is given by
[ u_ln(t − 1) ; ... ; u_ln(0) ] = C_t^T (C_t C_t^T)^{−1} x_des
u_ln is called least-norm or minimum energy input that effects state transfer
can express as
u_ln(τ) = B^T (A^T)^{t−1−τ} ( Σ_{s=0}^{t−1} A^s B B^T (A^T)^s )^{−1} x_des,
for τ = 0, ..., t − 1
Controllability and state transfer 18–10
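The least-norm input is just a least-norm solution of x_des = C_t [u(t−1); ...; u(0)], so it can be computed with a pseudo-inverse. The sketch below (not part of the original slides) does this for an illustrative controllable pair (A, B), not one of the systems in these slides, and verifies the transfer by simulation.

```python
import numpy as np

# illustrative reachable pair (A, B)
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
n, m, t = 2, 1, 6
x_des = np.array([3.0, -1.0])

# C_t = [B  AB  ...  A^{t-1}B]; its k-th block multiplies u(t-1-k)
Ct = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(t)])
u_stack = Ct.T @ np.linalg.solve(Ct @ Ct.T, x_des)   # least-norm solution

# simulate forward to verify the transfer and report the minimum energy
x = np.zeros(n)
for k in range(t):                                   # k plays the role of tau
    u_k = u_stack[(t - 1 - k) * m:(t - k) * m]
    x = A @ x + B @ u_k
print(np.allclose(x, x_des))                         # True
print(np.sum(u_stack ** 2))                          # E_min
```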
E_min, the minimum value of Σ_{τ=0}^{t−1} ‖u(τ)‖^2 required to reach x(t) = x_des, is sometimes called minimum energy required to reach x(t) = x_des
E_min = Σ_{τ=0}^{t−1} ‖u_ln(τ)‖^2
      = ( C_t^T (C_t C_t^T)^{−1} x_des )^T ( C_t^T (C_t C_t^T)^{−1} x_des )
      = x_des^T (C_t C_t^T)^{−1} x_des
      = x_des^T ( Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ )^{−1} x_des
Controllability and state transfer 18–11
E_min(x_des, t) gives measure of how hard it is to reach x(t) = x_des from x(0) = 0 (i.e., how large a u is required)
E_min(x_des, t) gives practical measure of controllability/reachability (as function of x_des, t)
ellipsoid { z | E_min(z, t) ≤ 1 } shows points in state space reachable at t with one unit of energy
(shows directions that can be reached with small inputs, and directions that can be reached only with large inputs)
Controllability and state transfer 18–12
E_min as function of t:
if t ≥ s then
Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ ≥ Σ_{τ=0}^{s−1} A^τ B B^T (A^T)^τ
hence
( Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ )^{−1} ≤ ( Σ_{τ=0}^{s−1} A^τ B B^T (A^T)^τ )^{−1}
so E_min(x_des, t) ≤ E_min(x_des, s)
i.e.: takes less energy to get somewhere more leisurely
Controllability and state transfer 18–13
example: x(t + 1) = [ 1.75   0.8 ] x(t) + [ 1 ] u(t)
                    [ −0.95  0   ]        [ 0 ]
E_min(z, t) for z = [1 1]^T:
(figure: E_min versus t)
Controllability and state transfer 18–14
ellipsoids E_min ≤ 1 for t = 3 and t = 10:
(figure: two plots in the (x_1, x_2) plane showing E_min(x, 3) ≤ 1 and E_min(x, 10) ≤ 1)
Controllability and state transfer 18–15
Minimum energy over infinite horizon
the matrix
P = lim_{t→∞} ( Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ )^{−1}
always exists, and gives the minimum energy required to reach a point x_des (with no limit on t):
min { Σ_{τ=0}^{t−1} ‖u(τ)‖^2 | x(0) = 0, x(t) = x_des } = x_des^T P x_des
if A is stable, P > 0 (i.e., can't get anywhere for free)
if A is not stable, then P can have nonzero nullspace
Controllability and state transfer 18–16
Pz = 0, z ≠ 0 means can get to z using u's with energy as small as you like
(u just gives a little kick to the state; the instability carries it out to z efficiently)
basis of highly maneuverable, unstable aircraft
Controllability and state transfer 18–17
Continuous-time reachability
consider now ẋ = Ax + Bu with x(t) ∈ R^n
reachable set at time t is
R_t = { ∫_0^t e^{(t−τ)A} B u(τ) dτ | u : [0, t] → R^m }
fact: for t > 0, R_t = R = range(C), where
C = [ B  AB  ···  A^{n−1}B ]
is the controllability matrix of (A, B)
same R as discrete-time system
for continuous-time system, any reachable point can be reached as fast as you like (with large enough u)
Controllability and state transfer 18–18
first let's show for any u (and x(0) = 0) we have x(t) ∈ range(C)
write e^{tA} as power series:
e^{tA} = I + (t/1!) A + (t^2/2!) A^2 + ···
by C-H, express A^n, A^{n+1}, ... in terms of A^0, ..., A^{n−1} and collect powers of A:
e^{tA} = α_0(t) I + α_1(t) A + ··· + α_{n−1}(t) A^{n−1}
therefore
x(t) = ∫_0^t e^{τA} B u(t − τ) dτ
     = ∫_0^t ( Σ_{i=0}^{n−1} α_i(τ) A^i ) B u(t − τ) dτ
Controllability and state transfer 18–19
     = Σ_{i=0}^{n−1} A^i B ∫_0^t α_i(τ) u(t − τ) dτ
     = Cz
where z_i = ∫_0^t α_i(τ) u(t − τ) dτ
hence, x(t) is always in range(C)
need to show converse: every point in range(C) can be reached
Controllability and state transfer 18–20
Impulsive inputs
suppose x(0^−) = 0 and we apply input u(t) = δ^{(k)}(t) f, where δ^{(k)} denotes kth derivative of δ and f ∈ R^m
then U(s) = s^k f, so
X(s) = (sI − A)^{−1} B s^k f
     = ( s^{−1} I + s^{−2} A + ··· ) B s^k f
     = ( s^{k−1} I + ··· + s A^{k−2} + A^{k−1} ) Bf   (impulsive terms)
       + ( s^{−1} A^k + s^{−2} A^{k+1} + ··· ) Bf
hence
x(t) = impulsive terms + A^k Bf + A^{k+1} Bf (t/1!) + A^{k+2} Bf (t^2/2!) + ···
in particular, x(0^+) = A^k Bf
Controllability and state transfer 18–21
thus, input u = δ^{(k)} f transfers state from x(0^−) = 0 to x(0^+) = A^k Bf
now consider input of form
u(t) = δ(t) f_0 + ··· + δ^{(n−1)}(t) f_{n−1}
where f_i ∈ R^m
by linearity we have
x(0^+) = B f_0 + ··· + A^{n−1} B f_{n−1} = C [ f_0 ; ... ; f_{n−1} ]
hence we can reach any point in range(C) (at least, using impulse inputs)
Controllability and state transfer 18–22
can also be shown that any point in range(C) can be reached for any t > 0 using nonimpulsive inputs
fact: if x(0) ∈ R, then x(t) ∈ R for all t (no matter what u is)
to show this, need to show e^{tA} x(0) ∈ R if x(0) ∈ R ...
Controllability and state transfer 18–23
Example
unit masses at y_1, y_2, connected by unit springs, dampers
input is tension between masses
state is x = [ y^T  ẏ^T ]^T
(figure: two masses with opposing tension forces u(t))
system is
ẋ = [  0   0   1   0 ]     [  0 ]
    [  0   0   0   1 ] x + [  0 ] u
    [ −1   1  −1   1 ]     [  1 ]
    [  1  −1   1  −1 ]     [ −1 ]
can we maneuver state anywhere, starting from x(0) = 0?
if not, where can we maneuver state?
Controllability and state transfer 18–24
controllability matrix is
C = [ B  AB  A^2B  A^3B ] = [  0   1  −2   2 ]
                            [  0  −1   2  −2 ]
                            [  1  −2   2   0 ]
                            [ −1   2  −2   0 ]
hence reachable set is
R = span{ (1, −1, 0, 0), (0, 0, 1, −1) }
we can reach states with y_1 = −y_2, ẏ_1 = −ẏ_2, i.e., precisely the differential motions
it's obvious: internal force does not affect center of mass position or total momentum!
Controllability and state transfer 18–25
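A numerical check of this example (not part of the original slides), using the signs as reconstructed above: the controllability matrix has rank 2 and its range contains exactly the two spanning directions listed.

```python
import numpy as np

A = np.array([[ 0.,  0.,  1.,  0.],
              [ 0.,  0.,  0.,  1.],
              [-1.,  1., -1.,  1.],
              [ 1., -1.,  1., -1.]])
B = np.array([[0.], [0.], [1.], [-1.]])

Ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
print(Ctrb.astype(int))
print(np.linalg.matrix_rank(Ctrb))       # 2: only differential motions reachable

# the two spanning directions given above lie in range(Ctrb)
for z in (np.array([1., -1., 0., 0.]), np.array([0., 0., 1., -1.])):
    coeff, res, *_ = np.linalg.lstsq(Ctrb, z, rcond=None)
    print(np.allclose(Ctrb @ coeff, z))  # True
```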
Least-norm input for reachability
(also called minimum energy input)
assume that ẋ = Ax + Bu is reachable
we seek u that steers x(0) = 0 to x(t) = x_des and minimizes
∫_0^t ‖u(τ)‖^2 dτ
let's discretize system with interval h = t/N (we'll let N → ∞ later)
thus u is piecewise constant:
u(τ) = u_d(k) for kh ≤ τ < (k + 1)h, k = 0, ..., N − 1
Controllability and state transfer 18–26
so
x(t) = [ B_d  A_d B_d  ···  A_d^{N−1} B_d ] [ u_d(N − 1) ; ... ; u_d(0) ]
where
A_d = e^{hA},   B_d = ∫_0^h e^{τA} dτ B
least-norm u_d that yields x(t) = x_des is
u_dln(k) = B_d^T (A_d^T)^{N−1−k} ( Σ_{i=0}^{N−1} A_d^i B_d B_d^T (A_d^T)^i )^{−1} x_des
let's express in terms of A:
B_d^T (A_d^T)^{N−1−k} = B_d^T e^{(t−τ)A^T}
Controllability and state transfer 18–27
where τ = t(k + 1)/N
for N large, B_d ≈ (t/N)B, so this is approximately
(t/N) B^T e^{(t−τ)A^T}
similarly
Σ_{i=0}^{N−1} A_d^i B_d B_d^T (A_d^T)^i = Σ_{i=0}^{N−1} e^{(ti/N)A} B_d B_d^T e^{(ti/N)A^T}
≈ (t/N) ∫_0^t e^{sA} B B^T e^{sA^T} ds
for large N
Controllability and state transfer 18–28
hence least-norm discretized input is approximately
u_ln(τ) = B^T e^{(t−τ)A^T} ( ∫_0^t e^{sA} B B^T e^{sA^T} ds )^{−1} x_des,   0 ≤ τ ≤ t
for large N
hence, this is the least-norm continuous input
can make t small, but get larger u
cf. DT solution: sum becomes integral
Controllability and state transfer 18–29
min energy is
∫_0^t ‖u_ln(τ)‖^2 dτ = x_des^T Q(t)^{−1} x_des
where
Q(t) = ∫_0^t e^{τA} B B^T e^{τA^T} dτ
can show
(A, B) controllable ⟺ Q(t) > 0 for all t > 0 ⟺ Q(s) > 0 for some s > 0
in fact, range(Q(t)) = R for any t > 0
Controllability and state transfer 18–30
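Q(t) can be approximated numerically by a simple Riemann sum, as in the sketch below (not part of the original slides); the pair (A, B), the horizon t, and the number of grid points are illustrative.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.], [-2., -3.]])     # illustrative controllable pair
B = np.array([[0.], [1.]])
t, K = 5.0, 2000
taus = np.linspace(0.0, t, K)

Q = np.zeros((2, 2))
for tau in taus:                         # Riemann-sum approximation of the integral
    E = expm(tau * A)
    Q += (t / K) * E @ B @ B.T @ E.T

print(np.linalg.eigvalsh(Q))             # all positive: Q(t) > 0
x_des = np.array([1.0, 0.0])
print(x_des @ np.linalg.solve(Q, x_des)) # minimum energy to reach x_des
```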
Minimum energy over infinite horizon
the matrix
P = lim_{t→∞} ( ∫_0^t e^{τA} B B^T e^{τA^T} dτ )^{−1}
always exists, and gives minimum energy required to reach a point x_des (with no limit on t):
min { ∫_0^t ‖u(τ)‖^2 dτ | x(0) = 0, x(t) = x_des } = x_des^T P x_des
if A is stable, P > 0 (i.e., can't get anywhere for free)
if A is not stable, then P can have nonzero nullspace
Pz = 0, z ≠ 0 means can get to z using u's with energy as small as you like (u just gives a little kick to the state; the instability carries it out to z efficiently)
Controllability and state transfer 18–31
General state transfer
consider state transfer from x(t_i) to x(t_f) = x_des, t_f > t_i
since
x(t_f) = e^{(t_f − t_i)A} x(t_i) + ∫_{t_i}^{t_f} e^{(t_f − τ)A} B u(τ) dτ
u steers x(t_i) to x(t_f) = x_des
⟺ u (shifted by t_i) steers x(0) = 0 to x(t_f − t_i) = x_des − e^{(t_f − t_i)A} x(t_i)
general state transfer reduces to reachability problem
if system is controllable, any state transfer can be effected
in zero time with impulsive inputs
in any positive time with non-impulsive inputs
Controllability and state transfer 18–32
Example
(figure: three unit masses connected by springs and dampers, with force u_1 between masses 1 & 2 and force u_2 between masses 2 & 3)
unit masses, springs, dampers
u_1 is force between 1st & 2nd masses
u_2 is force between 2nd & 3rd masses
y ∈ R^3 is displacement of masses 1, 2, 3
x = [ y ; ẏ ]
Controllability and state transfer 18–33
system is:
ẋ = [  0   0   0   1   0   0 ]     [  0   0 ]
    [  0   0   0   0   1   0 ]     [  0   0 ]
    [  0   0   0   0   0   1 ] x + [  0   0 ] [ u_1 ]
    [ −2   1   0  −2   1   0 ]     [  1   0 ] [ u_2 ]
    [  1  −2   1   1  −2   1 ]     [ −1   1 ]
    [  0   1  −2   0   1  −2 ]     [  0  −1 ]
Controllability and state transfer 18–34
steer state from x(0) = e_1 to x(t_f) = 0
i.e., control initial state e_1 to zero at t = t_f
E_min = ∫_0^{t_f} ‖u_ln(τ)‖^2 dτ vs. t_f:
(figure: E_min versus t_f)
Controllability and state transfer 18–35
for t_f = 3, u = u_ln is:
(figure: u_1(t) and u_2(t) versus t)
Controllability and state transfer 18–36
and for t_f = 4:
(figure: u_1(t) and u_2(t) versus t)
Controllability and state transfer 18–37
output y_1 for u = 0:
(figure: y_1(t) versus t)
Controllability and state transfer 18–38
output y_1 for u = u_ln with t_f = 3:
(figure: y_1(t) versus t)
Controllability and state transfer 18–39
output y_1 for u = u_ln with t_f = 4:
(figure: y_1(t) versus t)
Controllability and state transfer 18–40
EE263 Autumn 2010-11 Stephen Boyd
Lecture 19
Observability and state estimation
state estimation
discrete-time observability
observability controllability duality
observers for noiseless case
continuous-time observability
least-squares observers
example
19–1
State estimation set up
we consider the discrete-time system
x(t + 1) = Ax(t) + Bu(t) + w(t),   y(t) = Cx(t) + Du(t) + v(t)
w is state disturbance or noise
v is sensor noise or error
A, B, C, and D are known
u and y are observed over time interval [0, t − 1]
w and v are not known, but can be described statistically, or assumed small (e.g., in RMS value)
Observability and state estimation 19–2
State estimation problem
state estimation problem: estimate x(s) from
u(0), ..., u(t − 1), y(0), ..., y(t − 1)
s = 0: estimate initial state
s = t − 1: estimate current state
s = t: estimate (i.e., predict) next state
an algorithm or system that yields an estimate x̂(s) is called an observer or state estimator
x̂(s) is denoted x̂(s|t − 1) to show what information estimate is based on (read, "x̂(s) given t − 1")
Observability and state estimation 19–3
Noiseless case
let's look at finding x(0), with no state or measurement noise:
x(t + 1) = Ax(t) + Bu(t),   y(t) = Cx(t) + Du(t)
with x(t) ∈ R^n, u(t) ∈ R^m, y(t) ∈ R^p
then we have
[ y(0) ; ... ; y(t − 1) ] = O_t x(0) + T_t [ u(0) ; ... ; u(t − 1) ]
Observability and state estimation 19–4
where
O_t = [ C ; CA ; ... ; CA^{t−1} ],
T_t = [ D                                   ]
      [ CB         D                        ]
      [ ...                 ...             ]
      [ CA^{t−2}B  CA^{t−3}B  ···  CB   D   ]
O_t maps initial state into resulting output over [0, t − 1]
T_t maps input to output over [0, t − 1]
hence we have
O_t x(0) = [ y(0) ; ... ; y(t − 1) ] − T_t [ u(0) ; ... ; u(t − 1) ]
RHS is known, x(0) is to be determined
Observability and state estimation 19–5
hence:
can uniquely determine x(0) if and only if N(O_t) = {0}
N(O_t) gives ambiguity in determining x(0)
if x(0) ∈ N(O_t) and u = 0, output is zero over interval [0, t − 1]
input u does not affect ability to determine x(0); its effect can be subtracted out
Observability and state estimation 19–6
Observability matrix
by C-H theorem, each A^k is linear combination of A^0, ..., A^{n−1}
hence for t ≥ n, N(O_t) = N(O) where
O = O_n = [ C ; CA ; ... ; CA^{n−1} ]
is called the observability matrix
if x(0) can be deduced from u and y over [0, t − 1] for any t, then x(0) can be deduced from u and y over [0, n − 1]
N(O) is called unobservable subspace; describes ambiguity in determining state from input and output
system is called observable if N(O) = {0}, i.e., Rank(O) = n
Observability and state estimation 19–7
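A small NumPy sketch (not part of the original slides): build O for an illustrative pair (A, C) and check observability via its rank.

```python
import numpy as np

A = np.array([[1., 1.], [0., 1.]])        # illustrative system
C = np.array([[1., 0.]])                  # measure only the first state
n = A.shape[0]

# O = [C; CA; ...; CA^{n-1}]
Obs = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
print(Obs)                                # [[1. 0.], [1. 1.]]
print(np.linalg.matrix_rank(Obs) == n)    # True: system is observable
```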
Observability – controllability duality
let (Ã, B̃, C̃, D̃) be dual of system (A, B, C, D), i.e.,
Ã = A^T,   B̃ = C^T,   C̃ = B^T,   D̃ = D^T
controllability matrix of dual system is
C̃ = [ B̃  ÃB̃  ···  Ã^{n−1}B̃ ] = [ C^T  A^TC^T  ···  (A^T)^{n−1}C^T ] = O^T,
transpose of observability matrix
similarly we have Õ = C^T
Observability and state estimation 19–8
thus, system is observable (controllable) if and only if dual system is controllable (observable)
in fact,
N(O) = range(O^T)^⊥ = range(C̃)^⊥
i.e., unobservable subspace is orthogonal complement of controllable subspace of dual
Observability and state estimation 19–9
Observers for noiseless case
suppose Rank(O_t) = n (i.e., system is observable) and let F be any left inverse of O_t, i.e., FO_t = I
then we have the observer
x(0) = F ( [ y(0) ; ... ; y(t − 1) ] − T_t [ u(0) ; ... ; u(t − 1) ] )
which deduces x(0) (exactly) from u, y over [0, t − 1]
in fact we have
x(τ − t + 1) = F ( [ y(τ − t + 1) ; ... ; y(τ) ] − T_t [ u(τ − t + 1) ; ... ; u(τ) ] )
Observability and state estimation 19–10
i.e., our observer estimates what state was t − 1 epochs ago, given past t − 1 inputs & outputs
observer is (multi-input, multi-output) finite impulse response (FIR) filter, with inputs u and y, and output x̂
Observability and state estimation 19–11
Invariance of unobservable set
fact: the unobservable subspace N(O) is invariant, i.e., if z ∈ N(O), then Az ∈ N(O)
proof: suppose z ∈ N(O), i.e., CA^k z = 0 for k = 0, ..., n − 1
evidently CA^k (Az) = 0 for k = 0, ..., n − 2;
CA^{n−1}(Az) = CA^n z = − Σ_{i=0}^{n−1} α_i CA^i z = 0
(by C-H) where
det(sI − A) = s^n + α_{n−1} s^{n−1} + ··· + α_0
Observability and state estimation 19–12
Continuous-time observability
continuous-time system with no sensor or state noise:
ẋ = Ax + Bu,   y = Cx + Du
can we deduce state x from u and y?
let's look at derivatives of y:
y = Cx + Du
ẏ = Cẋ + Du̇ = CAx + CBu + Du̇
ÿ = CA^2 x + CABu + CBu̇ + Dü
and so on
Observability and state estimation 19–13
hence we have
[ y ; ẏ ; ... ; y^{(n−1)} ] = Ox + T [ u ; u̇ ; ... ; u^{(n−1)} ]
where O is the observability matrix and
T = [ D                                   ]
    [ CB         D                        ]
    [ ...                 ...             ]
    [ CA^{n−2}B  CA^{n−3}B  ···  CB   D   ]
(same matrices we encountered in discrete-time case!)
Observability and state estimation 19–14
rewrite as
Ox = [ y ; ẏ ; ... ; y^{(n−1)} ] − T [ u ; u̇ ; ... ; u^{(n−1)} ]
RHS is known; x is to be determined
hence if N(O) = {0} we can deduce x(t) from derivatives of u(t), y(t) up to order n − 1
in this case we say system is observable
can construct an observer using any left inverse F of O:
x = F ( [ y ; ẏ ; ... ; y^{(n−1)} ] − T [ u ; u̇ ; ... ; u^{(n−1)} ] )
Observability and state estimation 19–15
reconstructs x(t) (exactly and instantaneously) from
u(t), ..., u^{(n−1)}(t), y(t), ..., y^{(n−1)}(t)
derivative-based state reconstruction is dual of state transfer using impulsive inputs
Observability and state estimation 19–16
A converse
suppose z ∈ N(O) (the unobservable subspace), and u is any input, with x, y the corresponding state and output, i.e.,
ẋ = Ax + Bu,   y = Cx + Du
then state trajectory x̃ = x + e^{tA} z satisfies
d/dt x̃ = Ax̃ + Bu,   y = Cx̃ + Du
i.e., input/output signals u, y consistent with both state trajectories x, x̃
hence if system is unobservable, no signal processing of any kind applied to u and y can deduce x
unobservable subspace N(O) gives fundamental ambiguity in deducing x from u, y
Observability and state estimation 19–17
Least-squares observers
discrete-time system, with sensor noise:
x(t + 1) = Ax(t) + Bu(t),   y(t) = Cx(t) + Du(t) + v(t)
we assume Rank(O_t) = n (hence, system is observable)
least-squares observer uses pseudo-inverse:
x̂(0) = O_t^† ( [ y(0) ; ... ; y(t − 1) ] − T_t [ u(0) ; ... ; u(t − 1) ] )
where O_t^† = (O_t^T O_t)^{−1} O_t^T
Observability and state estimation 19–18
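The least-squares observer is one line with a pseudo-inverse. The sketch below (not part of the original slides) simulates noisy outputs for an illustrative system with u = 0 (so the T_t term vanishes) and recovers x(0); the system, horizon, noise level, and seed are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0.9, 0.3], [-0.3, 0.9]])
C = np.array([[1.0, 0.0]])
t = 20
x0_true = np.array([1.0, -2.0])

# simulate y(tau) = C A^tau x(0) + v(tau), stacked over tau = 0, ..., t-1
Ot = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(t)])
y = Ot @ x0_true + 0.1 * rng.standard_normal(t)

x0_hat = np.linalg.pinv(Ot) @ y          # O_t^dagger (y - T_t u), here u = 0
print(x0_hat)                            # close to [1, -2]
print(np.linalg.norm(x0_hat - x0_true))
```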
interpretation: x̂_ls(0) minimizes discrepancy between
output ŷ that would be observed, with input u and initial state x(0) (and no sensor noise), and
output y that was observed,
measured as
Σ_{τ=0}^{t−1} ‖ŷ(τ) − y(τ)‖^2
can express least-squares initial state estimate as
x̂_ls(0) = ( Σ_{τ=0}^{t−1} (A^T)^τ C^T C A^τ )^{−1} Σ_{τ=0}^{t−1} (A^T)^τ C^T ỹ(τ)
where ỹ is observed output with portion due to input subtracted:
ỹ = y − h ∗ u where h is impulse response
Observability and state estimation 19–19
Least-squares observer uncertainty ellipsoid
since O_t^† O_t = I, we have
x̃(0) = x̂_ls(0) − x(0) = O_t^† [ v(0) ; ... ; v(t − 1) ]
where x̃(0) is the estimation error of the initial state
in particular, x̂_ls(0) = x(0) if sensor noise is zero (i.e., observer recovers exact state in noiseless case)
now assume sensor noise is unknown, but has RMS value ≤ α,
(1/t) Σ_{τ=0}^{t−1} ‖v(τ)‖^2 ≤ α^2
Observability and state estimation 19–20
set of possible estimation errors is ellipsoid
x̃(0) ∈ E_unc = { O_t^† [ v(0) ; ... ; v(t − 1) ] | (1/t) Σ_{τ=0}^{t−1} ‖v(τ)‖^2 ≤ α^2 }
E_unc is uncertainty ellipsoid for x(0) (least-squares gives best E_unc)
shape of uncertainty ellipsoid determined by matrix
( O_t^T O_t )^{−1} = ( Σ_{τ=0}^{t−1} (A^T)^τ C^T C A^τ )^{−1}
maximum norm of error is
‖x̂_ls(0) − x(0)‖ ≤ α √t ‖O_t^†‖
Observability and state estimation 19–21
Infinite horizon uncertainty ellipsoid
the matrix
P = lim_{t→∞} ( Σ_{τ=0}^{t−1} (A^T)^τ C^T C A^τ )^{−1}
always exists, and gives the limiting uncertainty in estimating x(0) from u, y over longer and longer periods:
if A is stable, P > 0
i.e., can't estimate initial state perfectly even with infinite number of measurements u(t), y(t), t = 0, ... (since memory of x(0) fades ...)
if A is not stable, then P can have nonzero nullspace
i.e., initial state estimation error gets arbitrarily small (at least in some directions) as more and more of signals u and y are observed
Observability and state estimation 19–22
Example
particle in R^2 moves with uniform velocity
(linear, noisy) range measurements from directions 15°, 0°, 20°, 30°, once per second
range noises IID N(0, 1); can assume RMS value of v is not much more than 2
no assumptions about initial position & velocity
(figure: four range sensors and the particle)
problem: estimate initial position & velocity from range measurements
Observability and state estimation 19–23
express as linear system
x(t + 1) = [ 1 0 1 0 ]
           [ 0 1 0 1 ] x(t),
           [ 0 0 1 0 ]
           [ 0 0 0 1 ]
y(t) = [ k_1^T ; ... ; k_4^T ] x(t) + v(t)
(x_1(t), x_2(t)) is position of particle
(x_3(t), x_4(t)) is velocity of particle
can assume RMS value of v is around 2
k_i is unit vector from sensor i to origin
true initial position & velocities: x(0) = (1, 3, 0.04, 0.03)
Observability and state estimation 19–24
range measurements (& noiseless versions):
(figure: measurements from sensors 1–4, range versus t)
Observability and state estimation 19–25
estimate based on (y(0), ..., y(t)) is x̂(0|t)
actual RMS position error is
√( (x̂_1(0|t) − x_1(0))^2 + (x̂_2(0|t) − x_2(0))^2 )
(similarly for actual RMS velocity error)
Observability and state estimation 19–26
(figure: RMS position error and RMS velocity error versus t)
Observability and state estimation 19–27
Continuous-time least-squares state estimation
assume ẋ = Ax + Bu, y = Cx + Du + v is observable
least-squares estimate of initial state x(0), given u(τ), y(τ), 0 ≤ τ ≤ t:
choose x̂_ls(0) to minimize integral square residual
J = ∫_0^t ‖ ỹ(τ) − C e^{τA} x(0) ‖^2 dτ
where ỹ = y − h ∗ u is observed output minus part due to input
let's expand as J = x(0)^T Q x(0) − 2 r^T x(0) + s,
Q = ∫_0^t e^{τA^T} C^T C e^{τA} dτ,   r = ∫_0^t e^{τA^T} C^T ỹ(τ) dτ,
s = ∫_0^t ỹ(τ)^T ỹ(τ) dτ
Observability and state estimation 19–28
setting ∇_{x(0)} J to zero, we obtain the least-squares observer
x̂_ls(0) = Q^{−1} r = ( ∫_0^t e^{τA^T} C^T C e^{τA} dτ )^{−1} ∫_0^t e^{τA^T} C^T ỹ(τ) dτ
estimation error is
x̃(0) = x̂_ls(0) − x(0) = ( ∫_0^t e^{τA^T} C^T C e^{τA} dτ )^{−1} ∫_0^t e^{τA^T} C^T v(τ) dτ
therefore if v = 0 then x̂_ls(0) = x(0)
Observability and state estimation 19–29
EE263 Autumn 2010-11 Stephen Boyd
Lecture 20
Some parting thoughts . . .
linear algebra
levels of understanding
what's next?
20–1
Linear algebra
comes up in many practical contexts (EE, ME, CE, AA, OR, Econ, ...)
nowadays is readily done
cf. 10 yrs ago (when it was mostly talked about)
Matlab or equiv for fooling around
real codes (e.g., LAPACK) widely available
current level of linear algebra technology:
500–1000 vbles: easy with general purpose codes
much more possible with special structure, special codes (e.g., sparse, convolution, banded, ...)
Some parting thoughts . . . 20–2
Levels of understanding
Simple, intuitive view:
17 vbles, 17 eqns: usually has unique solution
80 vbles, 60 eqns: 20 extra degrees of freedom
Platonic view:
singular, rank, range, nullspace, Jordan form, controllability
everything is precise & unambiguous
gives insight & deeper understanding
sometimes misleading in practice
Some parting thoughts . . . 20–3
Quantitative view:
based on ideas like least-squares, SVD
gives numerical measures for ideas like singularity, rank, etc.
interpretation depends on (practical) context
very useful in practice
Some parting thoughts . . . 20–4
must have understanding at one level before moving to next
never forget which level you are operating in
Some parting thoughts . . . 20–5
What's next?
EE364a convex optimization I (Win 10-11)
EE364b convex optimization II (Spr 10-11)
(plus lots of other EE, CS, ICME, MS&E, Stat, ME, AA courses on signal
processing, control, graphics & vision, machine learning, computational
geometry, numerical linear algebra, . . . )
Some parting thoughts . . . 20–6