
EE263 Autumn 2015, S. Boyd and S. Lall

Least-squares

- least-squares (approximate) solution of overdetermined equations
- projection and orthogonality principle
- least-squares estimation
- BLUE property

1
Overdetermined linear equations

consider y = Ax where A ∈ R^{m×n} is (strictly) skinny, i.e., m > n

- called an overdetermined set of linear equations (more equations than unknowns)
- for most y, we cannot solve for x

one approach to approximately solve y = Ax:

- define the residual or error r = Ax − y
- find x = x_ls that minimizes ‖r‖

x_ls is called the least-squares (approximate) solution of y = Ax
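As a concrete illustration (not part of the original slides; the sizes and data below are arbitrary), a minimal NumPy sketch that builds a skinny A, for which y = Ax has no exact solution for generic y, and computes the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3                              # more equations than unknowns (skinny A)
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# y = Ax generally has no exact solution; find x minimizing ||Ax - y|| instead
x_ls, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

r = A @ x_ls - y                         # residual of the approximate solution
print("||A x_ls - y|| =", np.linalg.norm(r))
```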

2
Geometric interpretation

Given y ∈ R^m, find x ∈ R^n to minimize ‖Ax − y‖

Ax_ls is the point in range(A) closest to y (Ax_ls is the projection of y onto range(A))

[Figure: Ax_ls, the projection of y onto the subspace range(A)]

3
Least-squares (approximate) solution

- assume A is full rank and skinny
- to find x_ls, we'll minimize the norm of the residual squared,

      ‖r‖^2 = x^T A^T A x − 2 y^T A x + y^T y

- set the gradient with respect to x to zero:

      ∇_x ‖r‖^2 = 2 A^T A x − 2 A^T y = 0

- this yields the normal equations: A^T A x = A^T y
- the assumptions imply A^T A is invertible, so we have

      x_ls = (A^T A)^{-1} A^T y

. . . a very famous formula
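A quick numerical check of this formula, assuming NumPy is available (the matrix below is random and purely illustrative): solve the normal equations A^T A x = A^T y and compare with np.linalg.lstsq. The system is solved directly rather than by explicitly forming (A^T A)^{-1}, which mirrors the usual practice.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))             # skinny, full rank (with probability 1)
y = rng.standard_normal(10)

# normal equations: A^T A x = A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

print(np.allclose(x_normal, x_lstsq))        # True, up to numerical error
```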

4
Least-squares (approximate) solution

- x_ls is a linear function of y
- x_ls = A^{-1} y if A is square
- x_ls solves y = A x_ls if y ∈ range(A)

5
Least-squares (approximate) solution

for A skinny and full rank, the pseudo-inverse of A is

      A† = (A^T A)^{-1} A^T

- for A skinny and full rank, A† is a left inverse of A:

      A† A = (A^T A)^{-1} A^T A = I

- if A is not skinny and full rank, then A† has a different definition
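A small sketch (NumPy, illustrative sizes) checking that for skinny, full-rank A the formula above agrees with np.linalg.pinv and is indeed a left inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))

A_dag = np.linalg.inv(A.T @ A) @ A.T             # (A^T A)^{-1} A^T
print(np.allclose(A_dag, np.linalg.pinv(A)))     # matches the SVD-based pseudo-inverse
print(np.allclose(A_dag @ A, np.eye(3)))         # left inverse: A† A = I
```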

6
Projection on range(A)

Ax_ls is (by definition) the point in range(A) that is closest to y, i.e., it is the
projection of y onto range(A):

      Ax_ls = P_range(A)(y)

- the projection function P_range(A) is linear, and given by

      P_range(A)(y) = Ax_ls = A(A^T A)^{-1} A^T y

- A(A^T A)^{-1} A^T is called the projection matrix (associated with range(A))
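To make the projection interpretation concrete, a NumPy sketch (sizes arbitrary) verifying that P = A(A^T A)^{-1} A^T is symmetric and idempotent, and that P y lies in range(A):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2))
y = rng.standard_normal(6)

P = A @ np.linalg.inv(A.T @ A) @ A.T    # projection matrix onto range(A)

print(np.allclose(P, P.T))              # symmetric
print(np.allclose(P @ P, P))            # idempotent: projecting twice changes nothing
# P y lies in range(A): it can be written as A times some coefficient vector
coeff = np.linalg.lstsq(A, P @ y, rcond=None)[0]
print(np.allclose(A @ coeff, P @ y))
```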

7
Orthogonality principle

the optimal residual

      r = Ax_ls − y = (A(A^T A)^{-1} A^T − I) y

is orthogonal to range(A):

      ⟨r, Az⟩ = y^T (A(A^T A)^{-1} A^T − I)^T A z = 0

for all z ∈ R^n

[Figure: the residual r = Ax_ls − y is orthogonal to range(A)]
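A numerical check (NumPy, illustrative data) that the optimal residual is orthogonal to every vector in range(A), which is equivalent to A^T r = 0:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((9, 4))
y = rng.standard_normal(9)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
r = A @ x_ls - y

# r orthogonal to range(A): its inner product with every column of A is ~0
print(np.allclose(A.T @ r, np.zeros(4), atol=1e-10))
```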

8
Completion of squares

since r = Ax_ls − y ⊥ A(x − x_ls) for any x, we have

      ‖Ax − y‖^2 = ‖(Ax_ls − y) + A(x − x_ls)‖^2
                 = ‖Ax_ls − y‖^2 + ‖A(x − x_ls)‖^2

this shows that for x ≠ x_ls, ‖Ax − y‖ > ‖Ax_ls − y‖
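A one-off numerical verification (NumPy, random data) of the completion-of-squares identity for an arbitrary candidate x:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))
y = rng.standard_normal(8)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
x = rng.standard_normal(3)               # any other candidate x

lhs = np.linalg.norm(A @ x - y) ** 2
rhs = np.linalg.norm(A @ x_ls - y) ** 2 + np.linalg.norm(A @ (x - x_ls)) ** 2
print(np.isclose(lhs, rhs))              # the cross term vanishes by orthogonality
```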

9
Least-squares via QR factorization

- A ∈ R^{m×n} skinny, full rank
- factor as A = QR with Q^T Q = I_n, R ∈ R^{n×n} upper triangular, invertible
- the pseudo-inverse is

      A† = (A^T A)^{-1} A^T = (R^T Q^T Q R)^{-1} R^T Q^T = R^{-1} Q^T

  so x_ls = R^{-1} Q^T y

- the projection onto range(A) is given by the matrix

      A(A^T A)^{-1} A^T = A R^{-1} Q^T = Q Q^T
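The same route in NumPy (a sketch; np.linalg.qr with its default 'reduced' mode returns Q ∈ R^{m×n} and R ∈ R^{n×n}). Solving R x = Q^T y avoids forming A^T A explicitly:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 4))
y = rng.standard_normal(10)

Q, R = np.linalg.qr(A)                   # reduced QR: A = QR, Q^T Q = I_n
x_ls = np.linalg.solve(R, Q.T @ y)       # R is upper triangular and invertible

print(np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0]))
# Q Q^T is the projection matrix onto range(A)
print(np.allclose(Q @ Q.T, A @ np.linalg.inv(A.T @ A) @ A.T))
```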

10
Least-squares via full QR factorization

- full QR factorization:

      A = [ Q1  Q2 ] [ R1 ]
                     [ 0  ]

  with [ Q1  Q2 ] ∈ R^{m×m} orthogonal, R1 ∈ R^{n×n} upper triangular, invertible

- multiplication by an orthogonal matrix doesn't change the norm, so

      ‖Ax − y‖^2 = ‖ [ Q1  Q2 ] [ R1 ; 0 ] x − y ‖^2
                 = ‖ [ Q1  Q2 ]^T [ Q1  Q2 ] [ R1 ; 0 ] x − [ Q1  Q2 ]^T y ‖^2
                 = ‖ [ R1 x − Q1^T y ; −Q2^T y ] ‖^2
                 = ‖ R1 x − Q1^T y ‖^2 + ‖ Q2^T y ‖^2

  (here [ u ; v ] denotes vertical stacking)

11
Least-squares via full QR factorization

so for any y,

      ‖Ax − y‖^2 = ‖R1 x − Q1^T y‖^2 + ‖Q2^T y‖^2

- this is evidently minimized by the choice x_ls = R1^{-1} Q1^T y (which makes the first term zero)

- the residual with the optimal x is

      Ax_ls − y = −Q2 Q2^T y

- Q1 Q1^T gives the projection onto range(A)

- Q2 Q2^T gives the projection onto range(A)^⊥
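A sketch of the full-QR route (NumPy's mode='complete'; data is illustrative), splitting Q into Q1 and Q2 and checking that the optimal residual is −Q2 Q2^T y with norm ‖Q2^T y‖:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

Q, R_full = np.linalg.qr(A, mode='complete')   # Q is m x m orthogonal, R_full is m x n
Q1, Q2 = Q[:, :n], Q[:, n:]
R1 = R_full[:n, :]

x_ls = np.linalg.solve(R1, Q1.T @ y)
print(np.allclose(A @ x_ls - y, -Q2 @ (Q2.T @ y)))    # optimal residual
print(np.isclose(np.linalg.norm(A @ x_ls - y), np.linalg.norm(Q2.T @ y)))
print(np.allclose(Q1 @ Q1.T + Q2 @ Q2.T, np.eye(m)))  # complementary projections
```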

12
Least-squares estimation

many applications in inversion, estimation, and reconstruction problems have the form

      y = Ax + v

- x is what we want to estimate or reconstruct
- y is our sensor measurement(s)
- v is an unknown noise or measurement error (assumed small)
- the ith row of A characterizes the ith sensor

13
Least-squares estimation

least-squares estimation: choose as estimate the x̂ that minimizes

      ‖Ax̂ − y‖

i.e., the deviation between

- what we actually observed (y), and
- what we would observe if x = x̂ and there were no noise (v = 0)

the least-squares estimate is just x̂ = (A^T A)^{-1} A^T y
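A small simulation of this setup (NumPy; the sensor matrix, true x, and noise level below are made up for illustration): generate y = A x_true + v and recover x̂ = (A^T A)^{-1} A^T y.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 50, 3                              # 50 sensor readings, 3 unknowns
A = rng.standard_normal((m, n))           # row i characterizes sensor i
x_true = np.array([1.0, -2.0, 0.5])
v = 0.01 * rng.standard_normal(m)         # small, unknown measurement noise

y = A @ x_true + v
x_hat = np.linalg.lstsq(A, y, rcond=None)[0]

print("estimation error:", np.linalg.norm(x_hat - x_true))   # roughly the noise level
```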

14
BLUE property

suppose A is full rank and skinny, and we have a linear measurement with noise

      y = Ax + v

consider a linear estimator of the form x̂ = By

- B is called unbiased if x̂ = x whenever v = 0
  - no estimation error when there is no noise
  - equivalent to the left-inverse property BA = I
- the estimation error of an unbiased linear estimator is

      x − x̂ = x − B(Ax + v) = −Bv

- so we'd like B 'small' and BA = I

15
BLUE property

fact: A† = (A^T A)^{-1} A^T is the smallest left inverse of A, in the following sense:
for any B with BA = I, we have

      Σ_{i,j} B_ij^2  ≥  Σ_{i,j} (A†)_ij^2

i.e., least-squares provides the best linear unbiased estimator (BLUE)
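To illustrate the claim numerically, a NumPy sketch; the construction of an alternative left inverse uses the fact (not stated on the slide) that any left inverse can be written B = A† + Z Q2^T, where the columns of Q2 span range(A)^⊥ and Z is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 8, 3
A = rng.standard_normal((m, n))

A_dag = np.linalg.pinv(A)                 # the least-squares left inverse (A^T A)^{-1} A^T
Q, _ = np.linalg.qr(A, mode='complete')
Q2 = Q[:, n:]                             # columns span range(A)^⊥

# any B = A_dag + Z Q2^T is also a left inverse of A ...
Z = rng.standard_normal((n, m - n))
B = A_dag + Z @ Q2.T
print(np.allclose(B @ A, np.eye(n)))      # B A = I

# ... but its entries are never smaller in the sum-of-squares sense
print(np.sum(B**2) >= np.sum(A_dag**2))   # True
```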

16
