
EE263 Autumn 2015, S. Boyd and S. Lall

Least-squares

- least-squares (approximate) solution of overdetermined equations
- projection and orthogonality principle
- least-squares estimation
- BLUE property

1
Overdetermined linear equations

consider y = Ax where A ∈ R^{m×n} is (strictly) skinny, i.e., m > n

- called an overdetermined set of linear equations (more equations than unknowns)
- for most y, we cannot solve for x

one approach to approximately solve y = Ax:

- define the residual or error r = Ax − y
- find x = x_ls that minimizes ‖r‖

x_ls is called the least-squares (approximate) solution of y = Ax
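As a concrete illustration (not part of the original slides; the sizes and data below are arbitrary), a minimal NumPy sketch that builds a skinny A, for which y = Ax has no exact solution for generic y, and computes the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3                              # more equations than unknowns (skinny A)
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# y = Ax generally has no exact solution; find x minimizing ||Ax - y|| instead
x_ls, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

r = A @ x_ls - y                         # residual of the approximate solution
print("||A x_ls - y|| =", np.linalg.norm(r))
```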

2
Geometric interpretation

Given y ∈ R^m, find x ∈ R^n to minimize ‖Ax − y‖

Ax_ls is the point in range(A) closest to y (Ax_ls is the projection of y onto range(A))

[Figure: Ax_ls, the projection of y onto the subspace range(A)]

3
Least-squares (approximate) solution

- assume A is full rank and skinny
- to find x_ls, we'll minimize the norm of the residual squared,

      ‖r‖^2 = x^T A^T A x − 2 y^T A x + y^T y

- set the gradient with respect to x to zero:

      ∇_x ‖r‖^2 = 2 A^T A x − 2 A^T y = 0

- this yields the normal equations: A^T A x = A^T y
- the assumptions imply A^T A is invertible, so we have

      x_ls = (A^T A)^{-1} A^T y

. . . a very famous formula
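A quick numerical check of this formula, assuming NumPy is available (the matrix below is random and purely illustrative): solve the normal equations A^T A x = A^T y and compare with np.linalg.lstsq. The system is solved directly rather than by explicitly forming (A^T A)^{-1}, which mirrors the usual practice.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))             # skinny, full rank (with probability 1)
y = rng.standard_normal(10)

# normal equations: A^T A x = A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

print(np.allclose(x_normal, x_lstsq))        # True, up to numerical error
```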

4
Least-squares (approximate) solution

- x_ls is a linear function of y
- x_ls = A^{-1} y if A is square
- x_ls solves y = A x_ls if y ∈ range(A)

5
Least-squares (approximate) solution

for A skinny and full rank, the pseudo-inverse of A is

      A† = (A^T A)^{-1} A^T

- for A skinny and full rank, A† is a left inverse of A:

      A† A = (A^T A)^{-1} A^T A = I

- if A is not skinny and full rank, then A† has a different definition
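A small sketch (NumPy, illustrative sizes) checking that for skinny, full-rank A the formula above agrees with np.linalg.pinv and is indeed a left inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))

A_dag = np.linalg.inv(A.T @ A) @ A.T             # (A^T A)^{-1} A^T
print(np.allclose(A_dag, np.linalg.pinv(A)))     # matches the SVD-based pseudo-inverse
print(np.allclose(A_dag @ A, np.eye(3)))         # left inverse: A† A = I
```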

6
Projection on range(A)

Ax_ls is (by definition) the point in range(A) that is closest to y, i.e., it is the
projection of y onto range(A):

      Ax_ls = P_range(A)(y)

- the projection function P_range(A) is linear, and given by

      P_range(A)(y) = Ax_ls = A(A^T A)^{-1} A^T y

- A(A^T A)^{-1} A^T is called the projection matrix (associated with range(A))
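To make the projection interpretation concrete, a NumPy sketch (sizes arbitrary) verifying that P = A(A^T A)^{-1} A^T is symmetric and idempotent, and that P y lies in range(A):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2))
y = rng.standard_normal(6)

P = A @ np.linalg.inv(A.T @ A) @ A.T    # projection matrix onto range(A)

print(np.allclose(P, P.T))              # symmetric
print(np.allclose(P @ P, P))            # idempotent: projecting twice changes nothing
# P y lies in range(A): it can be written as A times some coefficient vector
coeff = np.linalg.lstsq(A, P @ y, rcond=None)[0]
print(np.allclose(A @ coeff, P @ y))
```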

7
Orthogonality principle

the optimal residual

      r = Ax_ls − y = (A(A^T A)^{-1} A^T − I) y

is orthogonal to range(A):

      ⟨r, Az⟩ = y^T (A(A^T A)^{-1} A^T − I)^T A z = 0

for all z ∈ R^n

[Figure: the residual r = Ax_ls − y is orthogonal to range(A)]
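A numerical check (NumPy, illustrative data) that the optimal residual is orthogonal to every vector in range(A), which is equivalent to A^T r = 0:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((9, 4))
y = rng.standard_normal(9)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
r = A @ x_ls - y

# r orthogonal to range(A): its inner product with every column of A is ~0
print(np.allclose(A.T @ r, np.zeros(4), atol=1e-10))
```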

8
Completion of squares

since r = Ax_ls − y ⊥ A(x − x_ls) for any x, we have

      ‖Ax − y‖^2 = ‖(Ax_ls − y) + A(x − x_ls)‖^2
                 = ‖Ax_ls − y‖^2 + ‖A(x − x_ls)‖^2

this shows that for x ≠ x_ls, ‖Ax − y‖ > ‖Ax_ls − y‖
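A one-off numerical verification (NumPy, random data) of the completion-of-squares identity for an arbitrary candidate x:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))
y = rng.standard_normal(8)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
x = rng.standard_normal(3)               # any other candidate x

lhs = np.linalg.norm(A @ x - y) ** 2
rhs = np.linalg.norm(A @ x_ls - y) ** 2 + np.linalg.norm(A @ (x - x_ls)) ** 2
print(np.isclose(lhs, rhs))              # the cross term vanishes by orthogonality
```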

9
Least-squares via QR factorization

- A ∈ R^{m×n} skinny, full rank
- factor as A = QR with Q^T Q = I_n, R ∈ R^{n×n} upper triangular, invertible
- the pseudo-inverse is

      A† = (A^T A)^{-1} A^T = (R^T Q^T Q R)^{-1} R^T Q^T = R^{-1} Q^T

  so x_ls = R^{-1} Q^T y

- the projection onto range(A) is given by the matrix

      A(A^T A)^{-1} A^T = A R^{-1} Q^T = Q Q^T
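The same route in NumPy (a sketch; np.linalg.qr with its default 'reduced' mode returns Q ∈ R^{m×n} and R ∈ R^{n×n}). Solving R x = Q^T y avoids forming A^T A explicitly:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 4))
y = rng.standard_normal(10)

Q, R = np.linalg.qr(A)                   # reduced QR: A = QR, Q^T Q = I_n
x_ls = np.linalg.solve(R, Q.T @ y)       # R is upper triangular and invertible

print(np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0]))
# Q Q^T is the projection matrix onto range(A)
print(np.allclose(Q @ Q.T, A @ np.linalg.inv(A.T @ A) @ A.T))
```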

10
Least-squares via full QR factorization

- full QR factorization:

      A = [ Q1  Q2 ] [ R1 ]
                     [ 0  ]

  with [ Q1  Q2 ] ∈ R^{m×m} orthogonal, R1 ∈ R^{n×n} upper triangular, invertible

- multiplication by an orthogonal matrix doesn't change the norm, so

      ‖Ax − y‖^2 = ‖ [ Q1  Q2 ] [ R1 ; 0 ] x − y ‖^2
                 = ‖ [ Q1  Q2 ]^T [ Q1  Q2 ] [ R1 ; 0 ] x − [ Q1  Q2 ]^T y ‖^2
                 = ‖ [ R1 x − Q1^T y ; −Q2^T y ] ‖^2
                 = ‖ R1 x − Q1^T y ‖^2 + ‖ Q2^T y ‖^2

  (here [ u ; v ] denotes vertical stacking)

11
Least-squares via full QR factorization

so for any y,

      ‖Ax − y‖^2 = ‖R1 x − Q1^T y‖^2 + ‖Q2^T y‖^2

- this is evidently minimized by the choice x_ls = R1^{-1} Q1^T y (which makes the first term zero)

- the residual with the optimal x is

      Ax_ls − y = −Q2 Q2^T y

- Q1 Q1^T gives the projection onto range(A)

- Q2 Q2^T gives the projection onto range(A)^⊥
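A sketch of the full-QR route (NumPy's mode='complete'; data is illustrative), splitting Q into Q1 and Q2 and checking that the optimal residual is −Q2 Q2^T y with norm ‖Q2^T y‖:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

Q, R_full = np.linalg.qr(A, mode='complete')   # Q is m x m orthogonal, R_full is m x n
Q1, Q2 = Q[:, :n], Q[:, n:]
R1 = R_full[:n, :]

x_ls = np.linalg.solve(R1, Q1.T @ y)
print(np.allclose(A @ x_ls - y, -Q2 @ (Q2.T @ y)))    # optimal residual
print(np.isclose(np.linalg.norm(A @ x_ls - y), np.linalg.norm(Q2.T @ y)))
print(np.allclose(Q1 @ Q1.T + Q2 @ Q2.T, np.eye(m)))  # complementary projections
```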

12
Least-squares estimation

many applications in inversion, estimation, and reconstruction problems have the form

      y = Ax + v

- x is what we want to estimate or reconstruct
- y is our sensor measurement(s)
- v is an unknown noise or measurement error (assumed small)
- the ith row of A characterizes the ith sensor

13
Least-squares estimation

least-squares estimation: choose as estimate the x̂ that minimizes

      ‖Ax̂ − y‖

i.e., the deviation between

- what we actually observed (y), and
- what we would observe if x = x̂ and there were no noise (v = 0)

the least-squares estimate is just x̂ = (A^T A)^{-1} A^T y
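A small simulation of this setup (NumPy; the sensor matrix, true x, and noise level below are made up for illustration): generate y = A x_true + v and recover x̂ = (A^T A)^{-1} A^T y.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 50, 3                              # 50 sensor readings, 3 unknowns
A = rng.standard_normal((m, n))           # row i characterizes sensor i
x_true = np.array([1.0, -2.0, 0.5])
v = 0.01 * rng.standard_normal(m)         # small, unknown measurement noise

y = A @ x_true + v
x_hat = np.linalg.lstsq(A, y, rcond=None)[0]

print("estimation error:", np.linalg.norm(x_hat - x_true))   # roughly the noise level
```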

14
BLUE property

suppose A is full rank and skinny, and we have a linear measurement with noise

      y = Ax + v

consider a linear estimator of the form x̂ = By

- B is called unbiased if x̂ = x whenever v = 0
  - no estimation error when there is no noise
  - equivalent to the left-inverse property BA = I
- the estimation error of an unbiased linear estimator is

      x − x̂ = x − B(Ax + v) = −Bv

- so we'd like B 'small' and BA = I

15
BLUE property

fact: A† = (A^T A)^{-1} A^T is the smallest left inverse of A, in the following sense:
for any B with BA = I, we have

      Σ_{i,j} B_ij^2  ≥  Σ_{i,j} (A†)_ij^2

i.e., least-squares provides the best linear unbiased estimator (BLUE)
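To illustrate the claim numerically, a NumPy sketch; the construction of an alternative left inverse uses the fact (not stated on the slide) that any left inverse can be written B = A† + Z Q2^T, where the columns of Q2 span range(A)^⊥ and Z is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 8, 3
A = rng.standard_normal((m, n))

A_dag = np.linalg.pinv(A)                 # the least-squares left inverse (A^T A)^{-1} A^T
Q, _ = np.linalg.qr(A, mode='complete')
Q2 = Q[:, n:]                             # columns span range(A)^⊥

# any B = A_dag + Z Q2^T is also a left inverse of A ...
Z = rng.standard_normal((n, m - n))
B = A_dag + Z @ Q2.T
print(np.allclose(B @ A, np.eye(n)))      # B A = I

# ... but its entries are never smaller in the sum-of-squares sense
print(np.sum(B**2) >= np.sum(A_dag**2))   # True
```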

16
