
BGHiggins/UCDavis/ECH256/Jan_2012

Least Squares Solution and Pseudo-Inverse
Introduction
In this notebook we will explore how to solve linear systems of equations given by
A·x = b        (1)

In particular we will be interested in the case when A is not a square matrix, and instead has size m rows and n columns, where m ≠ n. Here is an example when m = 5, n = 3:

A = ( A11  ⋯  A13 )
    ( A21  ⋯  A23 )
    (  ⋮   ⋱   ⋮  )
    ( A41  ⋯  A43 )
    ( A51  ⋯  A53 )
If we have more rows than columns (m > n), then we have more equations than unknowns, and the system is sometimes referred to as overdetermined. If m < n, the opposite is true and now we have more unknowns than equations. This would be the case if m = 3 and n = 5:
A = ( A11  A12  ⋯  ⋯  A15 )
    (  ⋮    ⋮   ⋱  ⋯   ⋮  )
    ( A31  A32  ⋯  ⋯  A35 )
The system of equations is then sometimes referred to as under-determined.
If m > n, then A·A^T is an m×m matrix, whereas A^T·A is an n×n matrix.
We will give precise mathematical descriptions for these terms shortly.
The topics that we will discuss in these notes are:
(i) Rank of a matrix
(ii) Range and null-space of a matrix
(iii) Eigenvalues and eigenvectors of a matrix
(iv) Singular value decomposition (SVD) of a matrix
(v) Least squares solution of a system of equations
(vi) Pseudo-inverse of a matrix
As we will show below, the above topics are all crucial to gaining a clear understanding of what it
means for Eqn. (1) to have a "solution". We will also show how to use Mathematica's built-in functions to
do various computations when we are faced with equations that are not "square".

A Simple Example
In the following example we will do all the calculations by hand. In this way we can fully appreciate the
mathematics that follows. We begin with the following system of linear equations (written in matrix notation or vector notation):
ECh256LeastSquaresSolution.nb
( 1  3   2 )   ( x )   ( -1 )
( 2  1  -1 ) · ( y ) = (  3 )     ⟺   A·x = b        (2)
               ( z )
Let us write these equations out:
x + 3y + 2z = -1
2x + y - z = 3        (3)
Next we solve the system to find x and y in terms of z. To do this we multiply the first equation by 2
and then subtract the second equation to eliminate x to get
5y + 5z = -5   ⟺   y = -z - 1        (4)

Substituting this result into the first equation to eliminate y we get


x = z+2 (5)

As a check, substituting x = z + 2 into the second equation again gives

y = -z - 1
Thus for each value of z we have a solution for x and y. Clearly we have an infinite number of possibilities. We can write the solution as
( x )   (  2 )   (  1 )
( y ) = ( -1 ) + ( -1 ) ξ        (6)
( z )   (  0 )   (  1 )

where ξ is a parameter that takes on all values along the real line (assuming that x, y and z are real variables). The solution as represented by Eqn. (6) can be written in vector notation as

x = x_p + ν ξ        (7)

where x_p is called a particular solution to A·x = b, and ν is called a null vector that satisfies A·ν = 0. Clearly the null vector multiplied by any constant ξ can be added to a particular solution to give another solution.
Let us now compute the magnitude of our solution given by Eqn. (7)

||x|| = √(x² + y² + z²) = √((2 + ξ)² + (-1 - ξ)² + ξ²)        (8)

Next we ask, what is the value of ξ that gives a vector with the smallest magnitude? We can find this value by simply solving

d||x||/dξ = 0   ⟺   (1/2) (3ξ² + 6ξ + 5)^(-1/2) (6ξ + 6) = 0   ⟺   ξ = -1        (9)
Substituting this value into Eqn. (6) we get

       (  1 )
x_ML = (  0 )        (10)
       ( -1 )
This is called the minimum length solution for Eqn. (2). Note also that the null vector ν is orthogonal to x_ML:

x_ML · ν = 0        (11)
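The hand calculation above can be cross-checked numerically. The notes use Mathematica; the sketch below uses Python with NumPy instead (assuming NumPy is available; the names xp, nu, x_ml are illustrative):

```python
import numpy as np

# The 2x3 system of Eqn. (2)
A = np.array([[1.0, 3.0, 2.0],
              [2.0, 1.0, -1.0]])
b = np.array([-1.0, 3.0])

# Particular solution x_p = (2, -1, 0) and null vector nu = (1, -1, 1) from Eqn. (6)
xp = np.array([2.0, -1.0, 0.0])
nu = np.array([1.0, -1.0, 1.0])
assert np.allclose(A @ xp, b)      # A.x_p = b
assert np.allclose(A @ nu, 0.0)    # A.nu = 0

# ||x||^2 = 3 xi^2 + 6 xi + 5 is minimized at xi = -1, giving x_ML = (1, 0, -1)
x_ml = xp + (-1.0) * nu
print(x_ml)        # [ 1.  0. -1.]
print(x_ml @ nu)   # 0.0, i.e. x_ML is orthogonal to the null vector

# lstsq returns the same minimum-norm solution for this underdetermined system
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_lstsq, x_ml)
```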

In the rest of these notes we formalize these ideas in terms of a general theory for solving systems of linear equations that are in general not square. That is, there are either more equations than unknowns or more unknowns than equations.

Rank of a Matrix
In this section we want to be precise about what we mean by the rank of a matrix.
The column rank of a matrix is the dimension of its column space: in plain words this is the number of
linearly independent columns of the matrix.
Likewise, the row rank of a matrix is the number of linearly independent rows of a matrix. Row rank is
always equal to column rank; thus one normally uses the term rank of a matrix to identify the number of
linearly independent rows or columns of a matrix.
The usual way to determine the rank of a matrix is to find the row rank by transforming the matrix by elementary row operations into row reduced echelon form (RREF). The number of non-zero rows is then
the rank r of the matrix.
Consider the following 2×4 matrix

A = ( 1  5   3  -8 )
    ( 2  8  -2   1 )
We can use RowReduce to put the matrix in row reduced echelon form (RREF), from which we can determine the rank of the matrix by inspection: count the number of non-zero rows.

RowReduce[A] // MatrixForm

( 1  0  -17   69/2 )
( 0  1    4  -17/2 )

In RREF form A has 2 non-zero rows and thus we can conclude that the rank of A is r = 2. Since A has 4 columns, this means that not all the columns are linearly independent. For example, column 1 is a linear combination of columns 2 and 4. We can check this possibility by solving the following system of equations:

( 1 )      ( 5 )      ( -8 )
( 2 ) = c1 ( 8 ) + c2 (  1 )

where c1 and c2 are non-zero constants. We can use Solve to find these constants:

Solve[{5 c1 - 8 c2 == 1, 8 c1 + c2 == 2}]

{{c1 → 17/69, c2 → 2/69}}

Thus

( 1 )   17 ( 5 )   2  ( -8 )
( 2 ) = -- ( 8 ) + -- (  1 )
        69         69

The matrix A is said to be of full rank if r = min(m, n). Thus if m = 5 and n = 3, as shown below,

A = ( A11  A12  A13 )
    ( A21  A22  A23 )
    ( A31  A32  A33 )
    ( A41  A42  A43 )
    ( A51  A52  A53 )
then for matrix A to have full rank, it must have 3 linearly independent columns. Of course, this also
means that the 5 rows of A are not linearly independent, only 3 are.

If n ≥ m, then for the matrix to have full rank it must have m linearly independent rows. In the following example m = 2, n = 4:
A = ( 1  5   3  -8 )
    ( 2  8  -2   1 )
Thus A has full rank if r = min(2, 4) = 2. This is the case, as the previous calculation showed. Of course, the columns of A are not linearly independent. Matrix A is also referred to as having full row rank. As we noted earlier, the products A^T·A and A·A^T are both defined. These are square matrices, but their determinants are not both non-zero. In fact

Det(A·A^T) ≠ 0, where A·A^T is a 2×2 matrix
Det(A^T·A) = 0, where A^T·A is a 4×4 matrix        (12)
A matrix is said to be rank deficient if

r < min(m, n)        (13)

Here is an example of a rank deficient matrix.


A = (  1   0  -1   2 )
    (  1   1   1  -1 )
    (  0  -1  -2   3 )
    (  5   2  -1   4 )
    ( -1   2   5  -8 )
In this case m=5 and n=4. The rank is found using RowReduce
RowReduce[A] // MatrixForm

( 1  0  -1   2 )
( 0  1   2  -3 )
( 0  0   0   0 )
( 0  0   0   0 )
( 0  0   0   0 )

Thus the rank of A is r = 2. Since r < min(m, n), we say A is rank deficient. Furthermore, even though A·A^T and A^T·A are defined, the following is true

Det(A·A^T) = 0, where A·A^T is a 5×5 matrix
Det(A^T·A) = 0, where A^T·A is a 4×4 matrix        (14)
The implications of Eqn. (12) and Eqn. (14) for determining a solution to A·x = b will be discussed below.
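The determinant statements in Eqns. (12) and (14) are easy to check numerically. A NumPy sketch for the rank-deficient 5×4 matrix above (NumPy assumed available):

```python
import numpy as np

A = np.array([[ 1,  0, -1,  2],
              [ 1,  1,  1, -1],
              [ 0, -1, -2,  3],
              [ 5,  2, -1,  4],
              [-1,  2,  5, -8]], dtype=float)

# rank 2 < min(5, 4): A is rank deficient
assert np.linalg.matrix_rank(A) == 2

# Both Gram matrices exist, but both are singular, as in Eqn. (14)
det_AAT = np.linalg.det(A @ A.T)   # 5x5 of rank 2 -> determinant 0 (up to round-off)
det_ATA = np.linalg.det(A.T @ A)   # 4x4 of rank 2 -> determinant 0 (up to round-off)
assert abs(det_AAT) < 1e-4
assert abs(det_ATA) < 1e-4
```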

Least Squares Solution: Overdetermined Systems


Suppose we have the following linear system of equations
A·x = b        (15)

where A is an m×n matrix (m > n) of rank n. This means we have more equations than unknowns, and since m > n, we say the system is overdetermined. Let us suppose that the matrix in Eqn. (15) has full column rank. Then the solution to Eqn. (15) can also be interpreted in terms of the range of A, denoted by R(A). In particular, we want to find a suitable x such that b lies in the range of A. But since b is an m×1 vector, whereas R(A) has dimension n (or less if the rank of A is less than n), it means we are trying to express b as a linear combination of vectors that span the column space of A:

( b1 )        ( A11 )        ( A12 )              ( A1n )
( b2 )        ( A21 )        ( A22 )              ( A2n )
(  ⋮ )  =  x1 (  ⋮  )  +  x2 (  ⋮  )  +  ⋯  +  xn (  ⋮  )
( bm )        ( Am1 )        ( Am2 )              ( Amn )
Such a construction is true only for very special choices of b. Nevertheless, we can always seek a vector x̂ such that the residual

r = b - A·x̂        (16)

is as small as possible. One measure of smallness of r is to choose x̂ such that the sum of squares of the residual S is as small as possible:

S = min(r^T·r)        (17)

Then the vector x̂ is called the least squares solution of the overdetermined linear system (15). We prove this result next. The expression for S is

S = (b - A·x̂)^T · (b - A·x̂) = ||A·x̂ - b||²        (18)

Expanding the RHS of (18) we get

S = b^T·b - x̂^T·(A^T·b) - (b^T·A)·x̂ + x̂^T·(A^T·A)·x̂        (19)

To minimize S with respect to x̂, we compute the derivatives ∂S/∂x̂_j = 0, j = 1, 2, …, n. The result is

(A^T·A)·x̂ = A^T·b        (20)

Recall that the matrix A has rank n, and therefore A^T also has rank n. One can then prove that A^T·A is also of rank n (Noble, 1969, p. 139). Since A^T·A is an n×n matrix, it therefore has full rank and its inverse exists. Thus the least squares solution to Eqn. (15) is

x̂ = (A^T·A)^{-1}·A^T·b        (21)

Note that A·A^T is an m×m matrix, but its inverse does not exist!
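The normal-equations route of Eqns. (20)-(21) can be sketched numerically. NumPy is assumed available; the random tall matrix below has full column rank with probability 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))   # tall matrix, full column rank (almost surely)
b = rng.standard_normal(m)

# Least squares solution from the normal equations, Eqn. (21)
x_hat = np.linalg.inv(A.T @ A) @ A.T @ b

# Agrees with numpy's least squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_ref)

# The residual is orthogonal to the columns of A: A^T.r = 0,
# which is exactly the normal equations (20)
r = b - A @ x_hat
assert np.allclose(A.T @ r, 0.0)
```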

Example 1: Solution of linear system when matrix A has full rank.


Consider the following system of linear equations
A·x = b        (22)

where A is a 5×3 matrix given by

A = (  1   1  1 )
    (  2  -1  2 )
    ( -1   4  3 )
    (  4   2  1 )
    (  3  -3  4 )
For this system of equations we have more equations than unknowns. If we use RowReduce to determine the row echelon form, we can deduce by inspection that the rank of A is r = 3:
RowReduce[A] // MatrixForm

( 1  0  0 )
( 0  1  0 )
( 0  0  1 )
( 0  0  0 )
( 0  0  0 )

Since r = n, the matrix A has full rank. By this we mean that the rank of A is equal to the number of columns of A. This means that the dimension of the null space is zero, and we confirm this with the function NullSpace

NullSpace[A]

{}

Next we consider the vector b


b = {1, 2, -1, 4, 8};

Since we have more equations than unknowns, the system of equations is overdetermined, and in general no exact solution exists. We can seek a solution x̂ to Eqn. (22) that minimizes the sum of squares of the residual S = r^T·r. As shown above the solution is then given as

x̂ = (A^T·A)^{-1}·A^T·b        (23)

We can use Mathematica to evaluate the RHS of (23) to obtain

x̂ = N[Inverse[Transpose[A].A].Transpose[A].b]

{1.00326, -0.504515, 0.659541}
For the calculated value of x̂, the residual is then

r = A.x̂ - b

{0.15829, 1.83012, -0.0427002, -0.336434, -0.838501}

and the norm S of the residual for the calculated value of x̂ is

S = Sqrt[r.r]

2.04756

The quantity (A^T·A)^{-1}·A^T appearing in the solution (23) is called the pseudo-inverse of A and denoted by A⁺. We will discuss the pseudo-inverse in more detail later in these notes. In Mathematica there is a built-in function called PseudoInverse. Let us use this function to compute A⁺

MatrixForm[A⁺ = N[PseudoInverse[A]]]

( 0.0326371   0.0319843  -0.0998695   0.200392    0.000652742 )
( 0.0381854  -0.0275783   0.104819    0.106125   -0.100903    )
( 0.0120757   0.0468342   0.151382   -0.0875218   0.131908    )

Thus the least squares solution is

x̂ = A⁺.b

{1.00326, -0.504515, 0.659541}

This is the same solution we got using HAT AL-1 AT ·b The residual as expected is then

r = A.x̂ - b

{0.15829, 1.83012, -0.0427002, -0.336434, -0.838501}

and the norm of the residual is

Sqrt[r.r]

2.04756

Of all the possible "solutions" to Eqn. (22), the one that we have found has the smallest residual. Thus by using the pseudo-inverse we can readily determine a solution to Eqn. (22). What is important to recognize at this stage is that since the matrix A is not rank deficient (A has full rank), we can find the least squares solution without ever using the matrix construct called the pseudo-inverse. In the next section we look at the case when the matrix A has full row rank.
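Example 1 can be reproduced outside Mathematica as well. A NumPy sketch (NumPy assumed available; the printed values should match those above to the digits shown):

```python
import numpy as np

A = np.array([[ 1,  1, 1],
              [ 2, -1, 2],
              [-1,  4, 3],
              [ 4,  2, 1],
              [ 3, -3, 4]], dtype=float)
b = np.array([1.0, 2.0, -1.0, 4.0, 8.0])

# Full column rank, so (A^T.A)^{-1}.A^T equals the pseudo-inverse
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b
x_pinv = np.linalg.pinv(A) @ b
assert np.allclose(x_normal, x_pinv)
print(x_pinv)   # approximately [1.00326, -0.504515, 0.659541]

# Residual and its norm, as in the notes
r = A @ x_pinv - b
print(np.linalg.norm(r))   # approximately 2.04756
```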

Minimal Length Solution: Underdetermined Systems


In this section we consider a linear system defined by Eqn. (1) that is under-determined. By this we mean the matrix A does not have full column rank, that is r < n, but does have full row rank, r = m. As n > m we have more unknowns than equations. This means practically that there is more than one solution. Our goal then is to pick a solution that has a minimum squared norm (see the simple example above). Thus we want to find the vector x such that

A·x = b, where x minimizes x^T·x        (24)

We can formulate this constrained optimization problem in terms of Lagrange multipliers:

F = x^T·x - λ^T·(A·x - b)        (25)

where λ is a vector of Lagrange multipliers. The function F has an extremum when

∂F/∂x = 0        (26)

To determine x such that (26) holds we have

2 x^T - λ^T·A = 0   ⟹   x = (1/2) A^T·λ        (27)

Now suppose A has full row rank, that is r = m. Then A·A^T is an m×m matrix with full rank m, and consequently is invertible. Substituting Eqn. (27) into Eqn. (24) gives

(1/2) A·A^T·λ = b        (28)

Solving for λ we get

λ = 2 (A·A^T)^{-1}·b        (29)

Substituting this value into Eqn. (27) we obtain the minimal-length solution

x = A^T·(A·A^T)^{-1}·b

The n×m matrix A^T·(A·A^T)^{-1} is called the right pseudo-inverse, and is also represented by

A⁺ = A^T·(A·A^T)^{-1}        (30)

Again note that A^T·A and A·A^T are both defined, but only A·A^T is invertible, i.e. its determinant is non-zero.
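The right pseudo-inverse and the minimal length property can be sketched numerically (NumPy assumed available; the wide random matrix has full row rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))   # wide matrix: full row rank (almost surely)
b = rng.standard_normal(m)

# Right pseudo-inverse, Eqn. (30), and the minimal length solution
A_plus = A.T @ np.linalg.inv(A @ A.T)
x = A_plus @ b
assert np.allclose(A @ x, b)               # x solves the system exactly
assert np.allclose(A @ A_plus, np.eye(m))  # A.A+ = I for full row rank

# x lies in the row space, so it is orthogonal to the null space of A;
# adding any unit null vector strictly increases the norm
for nu in np.linalg.svd(A)[2][m:]:         # rows of V^T spanning the null space
    assert abs(x @ nu) < 1e-10
    assert np.linalg.norm(x + nu) > np.linalg.norm(x)
```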

Example 2: Matrix A has full row rank


In this example we consider an m×n matrix A where m < n. That is, we have a system of equations with more unknowns than equations. Suppose A has full row rank, that is r = m. This matrix is not rank deficient, as r = min(m, n). Here is our matrix A
A = {{1, 5, 3, -8}, {2, 8, -2, 1}}; MatrixForm[A]

( 1  5   3  -8 )
( 2  8  -2   1 )

The RHS vector for our linear system is given by


b = 8- 2, 1<;

From the row echelon form shown below we see that the matrix A has full row rank, that is r = 2 = m

RowReduce[A] // MatrixForm

( 1  0  -17   69/2 )
( 0  1    4  -17/2 )

We can solve this system of 2 equations and 4 unknowns in terms of two of the four variables. The solution defines constraints on what values x1 and x2 can take for arbitrary values of x3 and x4:

Solve[Thread[A.{x1, x2, x3, x4} == b]]

{{x1 → 21/2 + 17 x3 - (69 x4)/2, x2 → -5/2 - 4 x3 + (17 x4)/2}}

Since we have an under-determined system (more unknowns than equations), we do not get a unique solution. In the above example we find a solution for x1 and x2 in terms of x3 and x4. Thus there is a doubly infinite number of solutions for different choices of x3 and x4. For each set of x3 and x4 we get a particular solution to A·x = b.
For example, a particular solution (with x3 = x4 = 0) is

x1 = 21/2, x2 = -5/2, x3 = 0, x4 = 0        (31)
Since the row rank of A is less than n, there are non-trivial solutions of A·x = 0. These are called the null space solutions and can be found using Mathematica's NullSpace function

NullSpace[A] // MatrixForm

( -69  17  0  2 )
(  17  -4  1  0 )

The basis vectors for the null space of A are given below

Transpose[NullSpace[A]] // MatrixForm

( -69  17 )
(  17  -4 )
(   0   1 )
(   2   0 )

Since A·ν_i = 0 for all vectors in the null space of A, this means we can write the solution to our under-determined system as

x = x* + Σ_{i=1}^{n-r} ν_i ξ_i        (32)

where x* is any solution to our linear system. Taking x* to be Eqn. (31) we can write our solution in matrix notation as

( x1 )   ( 21/2 )        ( -69 )        ( 17 )
( x2 ) = ( -5/2 )  +  ξ1 (  17 )  +  ξ2 ( -4 )        (33)
( x3 )   (   0  )        (   0 )        (  1 )
( x4 )   (   0  )        (   2 )        (  0 )
The above calculation clearly shows that the solution to our linear system in which A has full row rank is not unique. The key question then is whether there is a "best" solution. One possibility is to look for the solution that has the minimum norm. We showed previously that such a solution does exist; it is called the minimal length solution and is given by

x̂ = A^T·(A·A^T)^{-1}·b ≡ A⁺·b        (34)

where A⁺ is called the right pseudo-inverse of A. It has the property that A·A⁺ = I. As the next calculation shows, the built-in function PseudoInverse is equal to A^T·(A·A^T)^{-1}
Transpose[A].Inverse[A.Transpose[A]] == PseudoInverse[A]

True

Let us now compute the minimal length solution to our problem by calculating the pseudo-inverse of A

x̂ = N[PseudoInverse[A].b]

{0.0211082, 0.0574267, -0.129132, 0.240106}

Thus the general solution to this problem in matrix notation is

( x1 )   (  0.0211 )        ( -69 )        ( 17 )
( x2 ) = (  0.0574 )  +  ξ1 (  17 )  +  ξ2 ( -4 )
( x3 )   ( -0.1291 )        (   0 )        (  1 )
( x4 )   (  0.2401 )        (   2 )        (  0 )
The general solution based on the minimal length solution x̂ gives in this example the following residual R:

x = x̂ + {-69, 17, 0, 2} ξ1 + {17, -4, 1, 0} ξ2;
R = b - A.x // Simplify

{2.22045×10^-16, 0.}
Thus the solution satisfies the linear system. The norm of x̂ is

Sqrt[x̂.x̂]

0.279409
Hence of all possible particular solutions to our under-determined system, our x̂ has the minimum norm. For this reason it is called the minimal length solution.
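Example 2 can likewise be checked with NumPy (assumed available):

```python
import numpy as np

A = np.array([[1, 5, 3, -8],
              [2, 8, -2, 1]], dtype=float)
b = np.array([-2.0, 1.0])

# The particular solution (31), with x3 = x4 = 0
x_star = np.array([21/2, -5/2, 0.0, 0.0])
assert np.allclose(A @ x_star, b)

# The minimal length solution via the pseudo-inverse
x_hat = np.linalg.pinv(A) @ b
print(x_hat)                  # approximately [0.0211, 0.0574, -0.1291, 0.2401]
assert np.allclose(A @ x_hat, b)

# It is shorter than the particular solution above
print(np.linalg.norm(x_hat))  # approximately 0.279409
assert np.linalg.norm(x_hat) < np.linalg.norm(x_star)
```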

Matrix is Rank Deficient: the Pseudo-Inverse


If the matrix A is rank deficient, neither (A^T·A)^{-1} nor (A·A^T)^{-1} exists. There is an equivalent matrix that solves A·x = b in a least squares sense; it is called the pseudo-inverse of A and is given by

A⁺ = V·S^{-1}·U^T
where the matrices V, S and U come from the singular value decomposition (SVD) of A. See the additional notes on SVD. The general solution of A·x = b where A is rank deficient (r < min(m, n)) is

x = A⁺·b + Σ_{i=1}^{n-r} ν_i ξ_i        (35)

where ν_i is a vector in the null space of A, i.e.

A·ν_i = 0

The solution with all ξ_i = 0 is called the minimal length solution. Note: if the matrix is rank deficient the solution is non-unique.

Mathematica Computation of Pseudo-inverse


Consider the following matrix

A = (  2  2  0 )
    ( -1  1  1 )
A quick check shows that this matrix is not rank deficient, as the rank r = min(m, n) = 2
RowReduce[A] // MatrixForm

( 1  0  -1/2 )
( 0  1   1/2 )

Thus we can determine the pseudo-inverse using our previous formula A⁺ = A^T·(A·A^T)^{-1} (see Eqn. (30)):

Transpose[A].Inverse[A.Transpose[A]] // N // MatrixForm

( 0.25  -0.333333 )
( 0.25   0.333333 )
( 0.     0.333333 )

The SVD of A is

{U, S, V} = SingularValues[N[A]]

{{{-1., 0.}, {0., -1.}}, {2.82843, 1.73205},
 {{-0.707107, -0.707107, 0.}, {0.57735, -0.57735, -0.57735}}}

The pseudo-inverse is then given by

A⁺ = Transpose[V].DiagonalMatrix[1/S].U; A⁺ // MatrixForm

( 0.25  -0.333333 )
( 0.25   0.333333 )
( 0.     0.333333 )

We can also use the built-in function PseudoInverse, which agrees with our previous calculation

PseudoInverse[A] == A⁺

True
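The same construction can be done in NumPy (assumed available); note that np.linalg.svd uses the convention A = U·diag(s)·Vᵀ, so the factors are arranged slightly differently than in SingularValues:

```python
import numpy as np

A = np.array([[ 2.0, 2.0, 0.0],
              [-1.0, 1.0, 1.0]])

# Reduced SVD: A = U.diag(s).Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)   # approximately [2.82843 1.73205], i.e. sqrt(8) and sqrt(3)

# Pseudo-inverse assembled from the SVD factors
A_plus = Vt.T @ np.diag(1.0 / s) @ U.T

# Matches both the right pseudo-inverse formula and the built-in pinv
assert np.allclose(A_plus, A.T @ np.linalg.inv(A @ A.T))
assert np.allclose(A_plus, np.linalg.pinv(A))
```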

Example 3: Least Squares solution when matrix A is rank deficient


In this section we consider the linear system defined by the matrix A and the vector b

A = (  1   0  -1   2 )
    (  1   1   1  -1 )
    (  0  -1  -2   3 )      b = {-1, 2, -3, 1, 7};
    (  5   2  -1   4 )
    ( -1   2   5  -8 )
In this system we have more equations than unknowns, but the rank r of A is not equal to 4. Thus the inverse of (A^T·A) does not exist.
A simple check shows that the rank of this matrix is r = 2

RowReduce[A] // MatrixForm

( 1  0  -1   2 )
( 0  1   2  -3 )
( 0  0   0   0 )
( 0  0   0   0 )
( 0  0   0   0 )

This means also that the column rank of A is r = 2 ≠ min(m, n) = 4, and thus A is rank deficient. Further, as A^T has rank r = 2, A^T·A will have rank r = 2. But A^T·A is a square matrix of size 4×4. Hence A^T·A is singular. This means that we cannot solve for x using the least squares approach described earlier, as (A^T·A)^{-1} does not exist.
These ideas can be extended to the general case when A is an m×n matrix with rank k. The pseudo-inverse of A is then defined in terms of the U and V matrices from the SVD of A

A⁺ = V·S⁺·U^T        (36)

where S⁺ is a diagonal matrix with entries 1/σ1, 1/σ2, …, 1/σk for the non-zero singular values σ1, σ2, …, σk of A (entries corresponding to zero singular values are set to zero).
Since r < n, this means that there are non-trivial solutions that lie in the null space of A. Let us inspect these solutions

Transpose[NullSpace[A]] // MatrixForm

( -2   1 )
(  3  -2 )
(  0   1 )
(  1   0 )

Again we can use Solve to find the family of particular solutions to our linear system

Solve[Thread[A.{x1, x2, x3, x4} == b]]

{{x1 → -1 + x3 - 2 x4, x2 → 3 - 2 x3 + 3 x4}}

A particular solution with x3 = x4 = 0 is

x1 = -1, x2 = 3, x3 = 0, x4 = 0        (37)

Thus using the above particular solution, we have the following general solution

( x1 )   ( -1 )        ( -2 )        (  1 )
( x2 ) = (  3 )  +  ξ1 (  3 )  +  ξ2 ( -2 )        (38)
( x3 )   (  0 )        (  0 )        (  1 )
( x4 )   (  0 )        (  1 )        (  0 )
Let us now compute the least squares solution to our problem by calculating the pseudo-inverse of A

x̂ = N[PseudoInverse[A].b]

{0.5, 0.5, 0.5, -0.5}

Thus the general solution to this problem in matrix notation is

( x1 )   (  0.5 )        ( -2 )        (  1 )
( x2 ) = (  0.5 )  +  ξ1 (  3 )  +  ξ2 ( -2 )
( x3 )   (  0.5 )        (  0 )        (  1 )
( x4 )   ( -0.5 )        (  1 )        (  0 )

Of all the solutions contained in the general solution, x̂ is the one with the smallest norm, i.e., the minimal length solution:

Sqrt[x̂.x̂]

1.
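Example 3 can be verified with NumPy (assumed available); np.linalg.pinv computes the SVD-based pseudo-inverse of Eqn. (36):

```python
import numpy as np

A = np.array([[ 1,  0, -1,  2],
              [ 1,  1,  1, -1],
              [ 0, -1, -2,  3],
              [ 5,  2, -1,  4],
              [-1,  2,  5, -8]], dtype=float)
b = np.array([-1.0, 2.0, -3.0, 1.0, 7.0])

# Minimal length least squares solution for the rank deficient system
x_hat = np.linalg.pinv(A) @ b
print(x_hat)   # approximately [0.5, 0.5, 0.5, -0.5]
assert np.allclose(A @ x_hat, b)   # here b happens to lie in the range of A

# x_hat is orthogonal to both null space vectors, hence has minimal norm
for nu in ([-2.0, 3.0, 0.0, 1.0], [1.0, -2.0, 1.0, 0.0]):
    assert abs(x_hat @ np.array(nu)) < 1e-10
print(np.linalg.norm(x_hat))   # approximately 1.0
```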

Summary
These notes have explored the various options for solving linear systems of equations which are either overdetermined (more equations than unknowns) or under-determined (more unknowns than equations). These definitions can be made precise by invoking the rank of the matrix A.
We have considered 3 cases for a matrix A with size m×n:
(i) Matrix A has full column rank: r = min(m, n) = n
(ii) Matrix A has full row rank: r = min(m, n) = m
(iii) Matrix A is rank deficient: r < min(m, n)
In each case we have shown how to compute the least squares solution to the linear system A·x = b. This involves using the pseudo-inverse of A. We also indicated how the pseudo-inverse of A is related to the singular value decomposition of A.

References
Much of the material given in these notes can be found in a good linear algebra textbook. The following
texts were helpful in preparing these notes.
1. Noble, B., Applied Linear Algebra, Prentice Hall, 1969
2. Trefethen, L. N. & Bau III, D., Numerical Linear Algebra, SIAM, 1997
3. Demmel, J. W., Applied Numerical Linear Algebra, SIAM, 1997
