In particular we will be interested in the case when A is not a square matrix, and instead has size m rows and n columns, where m ≠ n. Here is an example when m = 5, n = 3:
    [ A11  A12  A13 ]
    [ A21  A22  A23 ]
A = [ A31  A32  A33 ]
    [ A41  A42  A43 ]
    [ A51  A52  A53 ]
If we have more rows than columns ( m > n), then we have more equations than unknowns, and the
system is sometimes referred to as overdetermined. If m < n, the opposite is true and now we have
more unknowns than equations. This would be the case if m=3 and n=5
A = [ A11  A12  A13  A14  A15 ]
    [ A21  A22  A23  A24  A25 ]
    [ A31  A32  A33  A34  A35 ]
The system of equations is then sometimes referred to as under-determined.
If m > n, then A·Aᵀ is an m×m matrix, whereas Aᵀ·A is an n×n matrix.
We will give precise mathematical descriptions for these terms shortly.
The topics that we will discuss in these notes are:
(i) Rank of a matrix
(ii) Range and null-space of a matrix
(iii) Eigenvalues and eigenvectors of a matrix
(iv) Singular value decomposition (SVD) of a matrix
(v) Least squares solution of a system of equations
(vi) Pseudo-inverse of a matrix
As we will show below, the above topics are all crucial to gaining a clear understanding of what it
means for Eqn. (1) to have a "solution". We will also show how to use Mathematica's built-in functions to
do various computations when we are faced with equations that are not "square".
A Simple Example
In the following example we will do all the calculations by hand. In this way we can fully appreciate the
mathematics that follows. We begin with the following system of linear equations ( written in matrix
notation or vector notation)
2 ECh256LeastSquaresSolution.nb
                  [ x ]
[ 1  3   2 ]      [   ]   [ -1 ]
[ 2  1  -1 ]  ·   [ y ] = [  3 ]        A·x = b    (2)
                  [ z ]
Let us write these equations out:
x + 3y + 2z = -1
2x + y - z = 3        (3)
Next we solve the system to find x and y in terms of z. To do this we multiply the first equation by 2
and then subtract the second equation to eliminate x to get
5y + 5z = -5  ⇒  y = -z - 1    (4)
where x_p is called a particular solution to A·x = b, and ν is called a null vector that satisfies A·ν = 0. Clearly, any null vector multiplied by a constant ξ can be added to x_p and the result is still a solution.
Let us now compute the magnitude of our solution given by Eqn. (7):

‖x‖ = (3 ξ² + 6 ξ + 5)^(1/2)

Next we ask: what value of ξ gives the vector with the smallest magnitude? We can find this value by simply solving

d‖x‖/dξ = 0  ⇒  (1/2) (3 ξ² + 6 ξ + 5)^(-1/2) (6 ξ + 6) = 0  ⇒  ξ = -1    (9)
Substituting this value into Eqn. (6) we get

x_ML = (1, 0, -1)ᵀ    (10)
This is called the minimum length solution for Eqn. (2). Note also that the null solution ν is orthogonal to x_ML:

x_ML · ν = 0    (11)
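The hand calculation above is easy to check numerically. These notes use Mathematica; as an aside, the same check can be sketched in Python with NumPy (the variable names are ours, and the null vector is the one implied by Eqn. (4)):

```python
import numpy as np

# The 2x3 system of Eqn. (2): A.x = b
A = np.array([[1.0, 3.0, 2.0],
              [2.0, 1.0, -1.0]])
b = np.array([-1.0, 3.0])

# A has full row rank, so pinv(A) = A^T.(A.A^T)^-1 and pinv(A).b is the
# minimum-length solution
x_ml = np.linalg.pinv(A) @ b

# A null vector consistent with Eqn. (4): (x, y, z) = (1, -1, 1)
nu = np.array([1.0, -1.0, 1.0])

assert np.allclose(A @ x_ml, b)    # x_ml solves the system
assert np.allclose(A @ nu, 0.0)    # nu is a null vector
assert np.isclose(x_ml @ nu, 0.0)  # Eqn. (11): x_ml is orthogonal to nu
```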
In the rest of these notes we formalize these ideas in terms of a general theory for solving systems of linear equations that are in general not square. That is, there are either more equations than unknowns or more unknowns than equations.
Rank of a Matrix
In this section we want to be precise about what we mean by the rank of a matrix.
The column rank of a matrix is the dimension of its column space: in plain words this is the number of
linearly independent columns of the matrix.
Likewise, the row rank of a matrix is the number of linearly independent rows of a matrix. Row rank is
always equal to column rank; thus one normally uses the term rank of a matrix to identify the number of
linearly independent rows or columns of a matrix.
The usual way to determine the rank of a matrix is to find the row rank by transforming the matrix by elementary row operations into row reduced echelon form (RREF). The number of non-zero rows is then the rank r of the matrix.
Consider the following 2×4 matrix
A = {{1, 5, 3, -8}, {2, 8, -2, 1}};
We can use RowReduce to put the equation in row reduced echelon form (RREF) from which we can
determine the rank of the matrix by inspection: count the number of non-zero rows.
RowReduce[A] // MatrixForm

( 1  0  -17   69/2 )
( 0  1    4  -17/2 )
In RREF form A has 2 non-zero rows and thus we can conclude that the rank of A is r = 2. Since A has 4 columns, this means that not all the columns are linearly independent. For example, column 1 is a linear combination of columns 2 and 4. We can check this possibility by solving the following system of equations:
[ 1 ]      [ 5 ]      [ -8 ]
[ 2 ] = c1 [ 8 ] + c2 [  1 ]
where c1 and c2 are non-zero constants. We can use Solve to find these constants:
Solve[{5 c1 - 8 c2 == 1, 8 c1 + c2 == 2}]

{{c1 → 17/69, c2 → 2/69}}
Thus
[ 1 ]   17 [ 5 ]    2 [ -8 ]
[ 2 ] = -- [ 8 ] + -- [  1 ]
        69         69
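As a numerical aside (the notes themselves use Mathematica), the rank and the column relation just derived can be verified with NumPy:

```python
import numpy as np

A = np.array([[1.0, 5.0, 3.0, -8.0],
              [2.0, 8.0, -2.0, 1.0]])

# The RREF has 2 non-zero rows, so the rank is 2
assert np.linalg.matrix_rank(A) == 2

# Column 1 as a combination of columns 2 and 4: solve [col2 col4].c = col1
c1, c2 = np.linalg.solve(A[:, [1, 3]], A[:, 0])

assert np.isclose(c1, 17 / 69)
assert np.isclose(c2, 2 / 69)
```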
The matrix A is said to be of full rank if r = min(m, n). Thus if m = 5 and n = 3, as shown below,
    [ A11  A12  A13 ]
    [ A21  A22  A23 ]
A = [ A31  A32  A33 ]
    [ A41  A42  A43 ]
    [ A51  A52  A53 ]
then for matrix A to have full rank, it must have 3 linearly independent columns. Of course, this also
means that the 5 rows of A are not linearly independent, only 3 are.
If n ≥ m, then for the matrix to have full rank it must have m linearly independent rows. In the following example m = 2, n = 4:
A = {{1, 5, 3, -8}, {2, 8, -2, 1}};
Thus A has full rank if r = min(2, 4) = 2. This is the case, as the previous calculation showed. Of course, the columns of A are not linearly independent. Matrix A is also referred to as having full row rank. As we noted earlier, the following matrices are defined: Aᵀ·A and A·Aᵀ. These are square matrices, but their determinants are not the same. In fact

Det(A·Aᵀ) ≠ 0, where A·Aᵀ is a 2×2 matrix
Det(Aᵀ·A) = 0, where Aᵀ·A is a 4×4 matrix    (12)
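A quick NumPy aside confirms Eqn. (12) for this A (the 4×4 determinant is zero only up to floating-point round-off):

```python
import numpy as np

A = np.array([[1.0, 5.0, 3.0, -8.0],
              [2.0, 8.0, -2.0, 1.0]])

det_small = np.linalg.det(A @ A.T)  # 2x2 product: invertible
det_big = np.linalg.det(A.T @ A)    # 4x4 product of rank 2: singular

assert abs(det_small) > 1.0         # in fact det(A.A^T) = 6443
assert abs(det_big) < 1e-5          # zero up to round-off
```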
A matrix is said to be rank deficient if

r < min(m, n)    (13)
Thus the rank of A is r = 2. Since r < min(m, n), we say A is rank deficient. Furthermore, even though A·Aᵀ and Aᵀ·A are defined, the following is true:

Det(A·Aᵀ) = 0, where A·Aᵀ is a 5×5 matrix
Det(Aᵀ·A) = 0, where Aᵀ·A is a 4×4 matrix    (14)
The implications of Eqn. (12) and Eqn. (14) will be discussed below in determining a solution to A·x = b, where A is an m×n matrix (m > n) of rank n. This means we have more equations than unknowns, and since m > n we say the system is overdetermined. Let us suppose that the matrix A in Eqn. (8) has full column rank. Then the solution to Eqn. (8) can also be interpreted in terms of the range of A, denoted by R(A). In particular, we want to find a suitable x such that b lies in the range of A. But since b is an m×1 vector, whereas R(A) has dimension n (or less if the rank of A is less than n), it means we are trying to express b as a linear combination of vectors that span the column space of A:
[ b1 ]      [ A11 ]      [ A12 ]          [ A1n ]
[ b2 ]      [ A21 ]      [ A22 ]          [ A2n ]
[  ⋮ ] = x1 [  ⋮  ] + x2 [  ⋮  ] + ⋯ + xn [  ⋮  ]     (15)
[ bm ]      [ Am1 ]      [ Am2 ]          [ Amn ]
Such a construction is possible only for very special choices of b. Nevertheless, we can always seek a vector x̂ such that the residual

r = b - A·x̂    (16)

is as small as possible. One measure of the smallness of r is to choose x̂ such that the sum of squares of the residual S is as small as possible:

S = min (rᵀ·r)    (17)

The vector x̂ is then called the least squares solution of the overdetermined linear system (15). We prove this result next. The expression for S is

S = (b - A·x̂)ᵀ·(b - A·x̂) = ‖A·x̂ - b‖²    (18)
Recall that the matrix A has rank n, and therefore Aᵀ also has rank n. One can then prove that Aᵀ·A is also of rank n (Noble, p. 139, 1969). Since Aᵀ·A is an n×n matrix, it therefore has full rank and its inverse exists. Thus the least squares solution to Eqn. (15) is

x̂ = (Aᵀ·A)⁻¹·Aᵀ·b    (21)
Note that A·Aᵀ is an m×m matrix, but its inverse does not exist!
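The numerical values of the 5×3 matrix A used in the example that follows are not reproduced in this text, so as a hedged illustration the NumPy sketch below uses a random full-column-rank A to show that the normal-equations solution of Eqn. (21) agrees with the pseudo-inverse and with a library least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # generic 5x3 matrix: full column rank
b = rng.standard_normal(5)

# Least squares solution via the normal equations, Eqn. (21)
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

assert np.allclose(x_ls, np.linalg.pinv(A) @ b)
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])

# The residual is orthogonal to the columns of A, which is exactly
# what minimizing S = r^T.r requires
r = b - A @ x_ls
assert np.allclose(A.T @ r, 0.0)
```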
RowReduce[A] // MatrixForm
1 0 0
0 1 0
0 0 1
0 0 0
0 0 0
Since r = n, the matrix A has full rank. By this we mean that the rank of A is equal to the number of
columns of A . This means that the dimension of the NullSpace is zero, and we confirm this with the
function NullSpace
NullSpace[A]

{}
Since we have more equations than unknowns, the system of equations is overdetermined, and so we say the problem is ill-posed. We can seek a solution x̂ to Eqn. (22) that minimizes the sum of squares of the residual S = rᵀ·r. As shown above, the solution is then given as
x̂ = (Aᵀ·A)⁻¹·Aᵀ·b    (23)
S= r.r
2.04756
The quantity (AᵀA)⁻¹Aᵀ appearing in the solution Eqn. (23) is called the pseudo-inverse of A and is denoted by A⁺. We will discuss the pseudo-inverse in more detail later in these notes. In Mathematica there is a built-in function called PseudoInverse. Let us use this function to compute A⁺:
Aplus = N[PseudoInverse[A]]; MatrixForm[Aplus]

(  0.0326371    0.0319843   -0.0998695    0.200392     0.000652742 )
(  0.0381854   -0.0275783    0.104819     0.106125    -0.100903    )
(  0.0120757    0.0468342    0.151382    -0.0875218    0.131908    )
This is the same solution we got using (AᵀA)⁻¹Aᵀ·b. The residual, as expected, is then
r = A.x̂ - b
{0.15829, 1.83012, -0.0427002, -0.336434, -0.838501}
r.r
2.04756
For all the possible "solutions" to Eqn. (22), the one that we have found has the smallest residual. Thus by using the pseudo-inverse we can readily determine a solution to Eqn. (22). What is important to recognize at this stage is that since the matrix A is not rank deficient (A has full rank), we can find the least squares solution without ever using the matrix construct called the pseudo-inverse. In the next section we look at the case when the matrix A has full row rank.
Substituting this value into Eqn. (24) we obtain the minimal-length solution

x = Aᵀ·(A·Aᵀ)⁻¹·b

The n×m matrix Aᵀ·(A·Aᵀ)⁻¹ is called the right pseudo-inverse, and is also represented by

A⁺ = Aᵀ·(A·Aᵀ)⁻¹    (30)
Again note that Aᵀ·A and A·Aᵀ are both defined, but only A·Aᵀ is invertible, i.e., its determinant is non-zero.
From the row echelon form shown below we see that the matrix A has full row rank. That is, r = 2 = m.
RowReduce[A] // MatrixForm

( 1  0  -17   69/2 )
( 0  1    4  -17/2 )
We can solve this system of 2 equations and 4 unknowns in terms of two of the four variables. The solution defines constraints on what values x1 and x2 can take for arbitrary values of x3 and x4:
Solve[Thread[A.{x1, x2, x3, x4} == b]]

{{x1 → 21/2 + 17 x3 - (69 x4)/2, x2 → -5/2 - 4 x3 + (17 x4)/2}}
Since we have an under-determined system (more unknowns than equations), we do not get a unique
solution. In the above example we find a solution for x1 and x2 in terms of x3 and x4 . Thus there is a
doubly infinite number of solutions for different choices of x3 and x4 . For each set of x3 and x4 we get a
particular solution to A× x = b.
For example, a particular solution (with x3 = x4 = 0) is
x1 = 21/2, x2 = -5/2, x3 = 0, x4 = 0    (31)
Since the row rank of A is less than n, there are non-trivial solutions of A·x = 0. These are called the null space solutions and can be found using Mathematica's NullSpace function:
NullSpace[A] // MatrixForm

( -69  17  0  2 )
(  17  -4  1  0 )
Since A·νi = 0 for every vector νi in the null space of A, we can write the solution to our under-determined system as
x = x* + Σ_{i=1}^{n-r} ξi νi    (32)
where x* is any solution to our linear system. Taking x* to be Eqn. (31) we can write our solution in
matrix notation as
[ x1 ]   [ 21/2 ]      [ -69 ]      [ 17 ]
[ x2 ] = [ -5/2 ] + ξ1 [  17 ] + ξ2 [ -4 ]     (33)
[ x3 ]   [   0  ]      [   0 ]      [  1 ]
[ x4 ]   [   0  ]      [   2 ]      [  0 ]
The above calculation clearly shows that the solution to our linear system, in which A has full row rank, is not unique. The key question then is whether there is a "best" solution. One possibility is to look for the solution x* that has the minimum norm. We showed previously that such a solution does exist; it is called the minimal-length solution and is given by
x̂ = Aᵀ·(A·Aᵀ)⁻¹·b ≡ A⁺·b    (34)
where A⁺ is called the right pseudo-inverse of A. It has the property that A·A⁺ = I. As the next calculation shows, the built-in function PseudoInverse agrees with Aᵀ·(A·Aᵀ)⁻¹:
Transpose[A].Inverse[A.Transpose[A]] == PseudoInverse[A]

True
Let us now compute the least squares solution to our problem by calculating the pseudo-inverse of A:

x̂ = N[PseudoInverse[A].b]
{0.0211082, 0.0574267, -0.129132, 0.240106}

x̂.x̂
0.279409
Hence, of all possible particular solutions to our under-determined system, x̂ has the minimum norm. For this reason it is called the minimal-length solution.
If the matrix A is rank deficient, neither (Aᵀ·A)⁻¹ nor (A·Aᵀ)⁻¹ exists. There is an equivalent matrix that solves A·x = b in a least squares sense; it is called the pseudo-inverse of A and is given by
A⁺ = V·S⁺·Uᵀ
where the matrices V, S and U are related to the singular value decomposition (SVD) of A; see the additional notes on SVD. The general solution of A·x = b where A is rank deficient (r < min(m, n)) is
x = A⁺·b + Σ_{i=1}^{n-r} ξi νi    (35)
The solution with all ξi = 0 is called the minimal length solution. Note: if the matrix is rank deficient, the solution is non-unique.
Thus we can determine the pseudo-inverse using our previous formula A⁺ = Aᵀ·(A·Aᵀ)⁻¹ (see Eqn. (34)):
Transpose[A].Inverse[A.Transpose[A]] // N // MatrixForm

(  0.25  -0.333333 )
(  0.25   0.333333 )
(  0.     0.333333 )
The SVD of A is
{U, S, V} = SingularValues[N[A]]

{{{-1., 0.}, {0., -1.}}, {2.82843, 1.73205},
 {{-0.707107, -0.707107, 0.}, {0.57735, -0.57735, -0.57735}}}
We can also use the built-in function PseudoInverse, which agrees with our previous calculation:

PseudoInverse[A] == Transpose[A].Inverse[A.Transpose[A]]

True
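The matrix A for this example is likewise not shown in this chunk; A = {{2, 2, 0}, {-1, 1, 1}} reproduces both the singular values and the pseudo-inverse displayed above, so we assume it here and verify the calculation in NumPy:

```python
import numpy as np

# Assumed matrix: it reproduces the singular values (2.82843, 1.73205)
# and the pseudo-inverse entries shown above
A = np.array([[2.0, 2.0, 0.0],
              [-1.0, 1.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)
assert np.allclose(s, [2.0 * np.sqrt(2.0), np.sqrt(3.0)])  # 2.82843, 1.73205

A_plus = A.T @ np.linalg.inv(A @ A.T)  # right pseudo-inverse formula
assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus, [[0.25, -1.0 / 3.0],
                            [0.25, 1.0 / 3.0],
                            [0.0, 1.0 / 3.0]])
```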
This also means that the column rank of A is r = 2 ≠ min(m, n) = 4, and thus A is rank deficient. Further, as Aᵀ has rank r = 2, AᵀA will also have rank r = 2. But AᵀA is a square matrix of size 4×4. Hence AᵀA is singular. This means that we cannot solve for x using the least squares approach described earlier by computing (AᵀA)⁻¹, as (AᵀA)⁻¹ does not exist.
These ideas can be extended to the general case when A is an m×n matrix with rank k. The pseudo-inverse of A is then defined in terms of the U and V matrices from the SVD of A:

A⁺ = V·S⁺·Uᵀ    (36)

where S⁺ is a diagonal matrix with entries 1/σ1, 1/σ2, etc., where σ1, σ2, etc., are the non-zero singular values of A.
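Eqn. (36) can be sketched directly in NumPy (the function name pinv_svd is ours); note that only the non-zero singular values are inverted, which is exactly what makes the construction work when A is rank deficient:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    # Eqn. (36): A+ = V . S+ . U^T, with S+ inverting only the
    # non-zero singular values of A
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.array([1.0 / si if si > tol * s[0] else 0.0 for si in s])
    return Vt.T @ np.diag(s_plus) @ U.T

# A rank-deficient example: (A^T.A)^-1 does not exist, but A+ does
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])  # rank 1

assert np.allclose(pinv_svd(A), np.linalg.pinv(A))
```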
Since r < n, there are non-trivial solutions that lie in the null space of A. Let us inspect these solutions:
Transpose[NullSpace[A]] // MatrixForm

( -2   1 )
(  3  -2 )
(  0   1 )
(  1   0 )
Again we can use Solve to find the family of particular solutions to our linear system
Solve[Thread[A.{x1, x2, x3, x4} == b]]

{{x1 → -1 + x3 - 2 x4, x2 → 3 - 2 x3 + 3 x4}}
Thus using the above particular solution, we have the following general solution
[ x1 ]   [ -1 ]      [ -2 ]      [  1 ]
[ x2 ] = [  3 ] + ξ1 [  3 ] + ξ2 [ -2 ]     (38)
[ x3 ]   [  0 ]      [  0 ]      [  1 ]
[ x4 ]   [  0 ]      [  1 ]      [  0 ]
Let us now compute the least squares solution to our problem by calculating the pseudo-inverse of A
x̂ = N[PseudoInverse[A].b]
{0.5, 0.5, 0.5, -0.5}
Of all the solutions in this general family, x̂ is the particular solution with the smallest norm, i.e., the minimal-length solution:
x̂.x̂
1.
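Although the matrix A for this last example is not reproduced above, the minimal-length property can be checked directly from the vectors in Eqn. (38); a NumPy aside:

```python
import numpy as np

x_star = np.array([-1.0, 3.0, 0.0, 0.0])  # particular solution in Eqn. (38)
nu1 = np.array([-2.0, 3.0, 0.0, 1.0])     # null-space vectors
nu2 = np.array([1.0, -2.0, 1.0, 0.0])
x_hat = np.array([0.5, 0.5, 0.5, -0.5])   # the PseudoInverse solution

# x_hat belongs to the family (38): x_hat = x_star - 0.5*nu1 + 0.5*nu2
assert np.allclose(x_hat, x_star - 0.5 * nu1 + 0.5 * nu2)

# x_hat is orthogonal to the null space; hence for any xi1, xi2 the norm of
# x_star + xi1*nu1 + xi2*nu2 is at least |x_hat|: the minimal length
assert np.isclose(x_hat @ nu1, 0.0)
assert np.isclose(x_hat @ nu2, 0.0)
assert np.isclose(x_hat @ x_hat, 1.0)
```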
Summary
These notes have explored the various options for solving linear systems of equations which are either overdetermined (more equations than unknowns) or under-determined (more unknowns than equations). These definitions can be made precise by invoking the rank of the matrix A.
We have considered 3 cases for a matrix A with size m×n:
(i) Matrix A has full column rank: r = min(m, n) = n
(ii) Matrix A has full row rank: r = min(m, n) = m
(iii) Matrix A is rank deficient: r < min(m, n)
In each case we have shown how to compute the least squares solution to the linear system A·x = b. This involves using the pseudo-inverse of A. We also indicated how the pseudo-inverse of A is related to the singular value decomposition of A.
References
Much of the material given in these notes can be found in a good linear algebra textbook. The following
texts were helpful in preparing these notes.
1. Noble, B., Applied Linear Algebra, Prentice-Hall, 1969.
2. Trefethen, L. N. & Bau III, D., Numerical Linear Algebra, SIAM, 1997.
3. Demmel, J. W., Applied Numerical Linear Algebra, SIAM, 1997.