A single element, value, or quantity is referred to as a scalar. For example, the number 3 is a scalar, as is the covariance between x1 and x2, denoted by Cov(x1, x2). Finally, the estimated parameter b̂1 is also a scalar.
Two or more scalars written together in a row or a column form a row or a column vector.
The order of a row vector is 1 × c, where 1 indicates one row and c is the number of columns. The order of a column vector is r × 1. Geometrically, when vectors are in two dimensions, it is possible to interpret them in physical space. Consider four points, a, b, c and d, in two dimensions. The location of each point is generally specified relative to some reference point and reference axes. The following figure shows these four points when the reference point is o(0, 0).
[Figure: points a(2, 3), b(−4, 2), c(−4, −1) and d(2.5, −2) plotted in the (x1, x2) plane relative to the origin o(0, 0).]
A matrix is defined as a rectangular array of elements (vectors or scalars) arranged in rows and columns, as in

$$A = \begin{bmatrix} 2 & 3 \\ -4 & 2 \\ -4 & -1 \\ 2.5 & -2 \end{bmatrix}$$
and in general,

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1c} \\ a_{21} & a_{22} & \cdots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{bmatrix}$$
There are rc elements arranged in r rows and c columns; the order of the matrix is r by c, written as r × c. The element in the ith row and jth column is denoted by a_ij. The matrix is designated by a capital letter, A.
Special forms of Matrices
• Vector: a matrix with only a single row, of order 1 × c, is called a row vector, and a matrix with only a single column, of order r × 1, is called a column vector. Vectors are designated by a lowercase letter, b. For example, a row vector is

$$b = \begin{bmatrix} b_1 & b_2 & \cdots & b_c \end{bmatrix}$$

while a column vector, a matrix of order r × 1, is

$$c = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_r \end{bmatrix}$$
• Square matrix: a matrix with the same number of rows and columns, that is, r = c, or a matrix of size r × r or c × c.
• Identity matrix is a square matrix with ones along the diagonal and zeros everywhere else:

$$I_r = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
• Scalar matrix is similar to the identity matrix but with a common constant (scalar) along the diagonal and zeros everywhere else. For example,

$$kI = \begin{bmatrix} k & 0 & \cdots & 0 \\ 0 & k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & k \end{bmatrix}$$
• Diagonal matrix is also similar to the identity matrix, but the elements along the diagonal differ, with zeros everywhere else. For example,

$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_r \end{bmatrix}$$
• Square symmetric matrix is a square matrix with a_ij = a_ji for all i ≠ j:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1c} \\ a_{12} & a_{22} & \cdots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1c} & a_{2c} & \cdots & a_{cc} \end{bmatrix}$$
• Null matrix is a matrix whose elements are all zero, denoted by 0.
Matrix Operations
• Equality of two matrices: two matrices A and B are said to be equal when they are of the same order and a_ij = b_ij for all i and j.
• Addition and subtraction of two matrices is possible if A and B are of the same order; then C = A + B. Consider the following example:

$$A = \begin{bmatrix} -4 & 2 \\ -4 & -1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 3 \\ 2.5 & -2 \end{bmatrix}$$

then

$$C = A + B = \begin{bmatrix} -4 + 2 & 2 + 3 \\ -4 + 2.5 & -1 - 2 \end{bmatrix} = \begin{bmatrix} -2 & 5 \\ -1.5 & -3 \end{bmatrix}$$
[Figure: the rows of A, B and C = A + B plotted as points a1(−4, 2), a2(−4, −1), b1(2, 3), b2(2.5, −2) and c2(−1.5, −3) in the (x1, x2) plane.]
• Scalar multiplication of a matrix A by a scalar k multiplies every element of A by k. For example, with k = 2 and

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 0 \end{bmatrix},$$

then

$$kA = \begin{bmatrix} 2 \times 3 & 2 \times 2 \\ 2 \times 2 & 2 \times 0 \end{bmatrix} = \begin{bmatrix} 6 & 4 \\ 4 & 0 \end{bmatrix}$$
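These element-by-element operations are easy to verify numerically. The following minimal Python/numpy sketch (numpy is an assumed tool here, not part of the original notes) reproduces the addition and scalar multiplication examples above.

```python
import numpy as np

# Matrices from the addition example
A = np.array([[-4.0,  2.0],
              [-4.0, -1.0]])
B = np.array([[ 2.0,  3.0],
              [ 2.5, -2.0]])
print(A + B)        # [[-2.   5. ]
                    #  [-1.5 -3. ]]

# Scalar multiplication: every element is multiplied by k
k = 2
M = np.array([[3.0, 2.0],
              [2.0, 0.0]])
print(k * M)        # [[6. 4.]
                    #  [4. 0.]]
```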
• Matrix multiplication is possible if the two matrices are conformable, that is, the number of elements in a row of the first matrix equals the number of elements in a column of the second matrix. If A is of order m × n and B is of order n × p, then the product AB is defined to be a matrix of order m × p whose ijth element is

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}.$$
For example, if A is of order 2 × 3 and B of order 3 × 2, then AB is 2 × 2:

$$AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} \end{bmatrix}$$

while BA is 3 × 3:

$$BA = \begin{bmatrix} b_{11}a_{11} + b_{12}a_{21} & b_{11}a_{12} + b_{12}a_{22} & b_{11}a_{13} + b_{12}a_{23} \\ b_{21}a_{11} + b_{22}a_{21} & b_{21}a_{12} + b_{22}a_{22} & b_{21}a_{13} + b_{22}a_{23} \\ b_{31}a_{11} + b_{32}a_{21} & b_{31}a_{12} + b_{32}a_{22} & b_{31}a_{13} + b_{32}a_{23} \end{bmatrix}$$
As a numerical example, let

$$A = \begin{bmatrix} 2 & 3 & 5 \\ 1 & -1 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & -1 \\ 1 & 0 \\ 0 & 2 \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} 2 \times 3 + 3 \times 1 + 5 \times 0 & 2 \times (-1) + 3 \times 0 + 5 \times 2 \\ 1 \times 3 + (-1) \times 1 + 2 \times 0 & 1 \times (-1) + (-1) \times 0 + 2 \times 2 \end{bmatrix} = \begin{bmatrix} 9 & 8 \\ 2 & 3 \end{bmatrix}$$
• Rules for matrix addition and multiplication are not the same as those that apply to scalars. In particular, matrix multiplication is generally not commutative. Continuing the example, with A′ denoting the transpose of A,

$$AA' = \begin{bmatrix} 2 \times 2 + 3 \times 3 + 5 \times 5 & 2 \times 1 + 3 \times (-1) + 5 \times 2 \\ 1 \times 2 + (-1) \times 3 + 2 \times 5 & 1 \times 1 + (-1) \times (-1) + 2 \times 2 \end{bmatrix} = \begin{bmatrix} 38 & 9 \\ 9 & 6 \end{bmatrix}$$

while

$$A'A = \begin{bmatrix} 5 & 5 & 12 \\ 5 & 10 & 13 \\ 12 & 13 & 29 \end{bmatrix}$$

so AA′ and A′A are not even of the same order.
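A short numpy sketch (an assumption, not part of the notes) confirms the products above and shows that the order of multiplication matters:

```python
import numpy as np

A = np.array([[2.0,  3.0, 5.0],
              [1.0, -1.0, 2.0]])
B = np.array([[3.0, -1.0],
              [1.0,  0.0],
              [0.0,  2.0]])

print(A @ B)     # 2 x 2: [[9. 8.] [2. 3.]]
print(B @ A)     # 3 x 3: a different matrix, so AB != BA
print(A @ A.T)   # [[38.  9.] [ 9.  6.]]
print(A.T @ A)   # [[ 5.  5. 12.] [ 5. 10. 13.] [12. 13. 29.]]
```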
A very important function is obtained when we consider the general form x′Wx, without the restriction that W is a diagonal matrix, postulating simply that W is symmetric. Note that for a 2 × 2 W, x′Wx would be

$$x'Wx = w_{11}x_1^2 + w_{22}x_2^2 + 2w_{12}x_1x_2.$$

This quadratic form also has application in the portfolio selection model, where W is the variance-covariance matrix of returns and x_i reflects the fraction of the portfolio invested in the ith investment instrument. If r_i is the expected return on the ith investment instrument, then the portfolio manager is expected to maximize r′x subject to $\sum_{i=1}^{n} x_i = 1$ and x′Wx ≤ T, where T is the risk threshold, that is, the largest variation in returns acceptable to the portfolio manager.
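As a sketch of this portfolio use of the quadratic form, the risk of an equally weighted two-asset portfolio can be computed directly (numpy assumed; the matrix W below reuses the variance-covariance matrix S from the eigenvalue example later in these notes, and the weights are illustrative):

```python
import numpy as np

W = np.array([[ 2.5, -1.5],
              [-1.5,  2.0]])   # variance-covariance matrix of returns
x = np.array([0.5, 0.5])       # portfolio fractions, summing to one

risk = x @ W @ x               # w11*x1^2 + w22*x2^2 + 2*w12*x1*x2
print(risk)                    # 0.375
```

A manager would compare this value of x′Wx against the threshold T when choosing x.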
• Trace of a square matrix is the sum of the elements on the principal diagonal; that is, for an n × n matrix A, $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$. Note that the trace of A′A, denoted by tr(A′A), is equal to tr(AA′).
For matrices A and B of order m × n and p × q, respectively, the following properties apply.
• Determinant is a scalar quantity associated with any square matrix A and denoted by det(A) or |A|. This scalar is obtained by summing various signed products of the elements of A. For example, if matrix A is 2 × 2, then

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$
• Properties of Determinants
1. If D is a diagonal matrix with n rows and columns, then $|D| = \prod_{i=1}^{n} d_{ii}$.
2. If I_r is an identity matrix with r rows, then |I_r| = 1.
3. |A′| = |A|; that is, changing rows to columns does not alter the value of the determinant.
4. |AB| = |A||B|.
5. If each element in a single row (or column) of A is multiplied by k, the determinant of A is multiplied by k.
6. If every element of A is multiplied by k, the determinant of A is multiplied by k^n, where n is the number of rows of A.
7. If |A| = 0, then A is said to be a singular matrix; that is, at least one row or one column is a linear combination of some other row(s) or column(s).
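Properties 3, 4 and 6 are easy to check numerically; the sketch below (numpy assumed) uses the coefficient matrix from the linear-equations example that follows.

```python
import numpy as np

A = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0,  1.0],
              [2.0,  1.0, -1.0]])
B = np.random.rand(3, 3)
k, n = 2.0, 3                           # n = number of rows of A

det = np.linalg.det
print(np.isclose(det(A.T), det(A)))             # property 3
print(np.isclose(det(A @ B), det(A) * det(B)))  # property 4
print(np.isclose(det(k * A), k**n * det(A)))    # property 6
```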
¹ Note that the area enclosed by the two points and the origin is two rectangles, plus a triangle, minus two triangles. That is, Area = x1y1 + (x2 − x1)y2 + ½(x2 − x1)(y1 − y2) − ½x1y1 − ½x2y2. After simplifying, we get the desired result.
• Properties of Inverses
6. $$|A^{-1}| = \frac{1}{|A|}.$$
• Finding the solution to linear equations can be performed using inverses and determinants. Consider the following three equations with three unknowns (x1, x2 and x3):

$$x_1 + x_2 + x_3 = 6$$
$$x_1 - x_2 + x_3 = 2$$
$$2x_1 + x_2 - x_3 = 1$$

or, in matrix form, Ax = h. We can see that to obtain a unique solution to the above simultaneous equations, we must be able to compute x = A^{-1}h. We know that to compute A^{-1}, we need the determinant (|A|) to be non-zero. We also know that

$$A^{-1} = \frac{1}{|A|}\,\mathrm{adj}(A) = \frac{1}{|A|}\begin{bmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{bmatrix}$$

where the matrix of c_ij's is the transposed cofactor matrix (c_ij being the cofactor of a_ij). Using these results, we can write the solution to the simultaneous equations as

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \frac{1}{|A|}\begin{bmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{bmatrix}\begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix}$$
$$x_1 = \frac{h_1c_{11} + h_2c_{21} + h_3c_{31}}{|A|}, \quad x_2 = \frac{h_1c_{12} + h_2c_{22} + h_3c_{32}}{|A|}, \quad x_3 = \frac{h_1c_{13} + h_2c_{23} + h_3c_{33}}{|A|}$$

The above solution can also be written with the original matrix elements. That is,

$$x_1 = \frac{\begin{vmatrix} h_1 & a_{12} & a_{13} \\ h_2 & a_{22} & a_{23} \\ h_3 & a_{32} & a_{33} \end{vmatrix}}{|A|}, \quad x_2 = \frac{\begin{vmatrix} a_{11} & h_1 & a_{13} \\ a_{21} & h_2 & a_{23} \\ a_{31} & h_3 & a_{33} \end{vmatrix}}{|A|}, \quad x_3 = \frac{\begin{vmatrix} a_{11} & a_{12} & h_1 \\ a_{21} & a_{22} & h_2 \\ a_{31} & a_{32} & h_3 \end{vmatrix}}{|A|}$$
For example, with |A| = 6,

$$x_2 = \frac{\begin{vmatrix} 1 & 6 & 1 \\ 1 & 2 & 1 \\ 2 & 1 & -1 \end{vmatrix}}{6} = \frac{1 \times (-2 - 1) - 6 \times (-1 - 2) + 1 \times (1 - 4)}{6} = 2$$

$$x_3 = \frac{\begin{vmatrix} 1 & 1 & 6 \\ 1 & -1 & 2 \\ 2 & 1 & 1 \end{vmatrix}}{6} = \frac{1 \times (-1 - 2) - 1 \times (1 - 4) + 6 \times (1 + 2)}{6} = 3$$
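In practice, the whole system is solved in one call; the sketch below (numpy assumed) reproduces the example and confirms x1 = 1, x2 = 2 and x3 = 3.

```python
import numpy as np

A = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0,  1.0],
              [2.0,  1.0, -1.0]])
h = np.array([6.0, 2.0, 1.0])

print(np.linalg.det(A))       # 6.0: non-zero, so a unique solution exists
print(np.linalg.solve(A, h))  # [1. 2. 3.]
print(np.linalg.inv(A) @ h)   # same answer via x = A^{-1} h
```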
There are two things to note. First, we can solve a system of equations containing a large set of variables without much difficulty. Second, in most instances, we can proceed to solve these equations provided the determinant of the coefficients is non-zero, that is, |A| ≠ 0. One easy way to detect that the determinant is zero is by examining whether one or more rows or columns are linear combinations of other rows or columns. For example, consider

$$A = \begin{bmatrix} 1 & 3 & -2 \\ 2 & -1 & 4 \\ 2 & 6 & -4 \end{bmatrix}.$$

In this instance, the last row is twice the first row, and |A| = 1 × (4 − 24) − 3 × (−8 − 8) − 2 × (12 + 2), which is zero. There are, however, more complex forms of this situation where the determinant is zero. Consider

$$A = \begin{bmatrix} 1 & 3 & -2 \\ 2 & -1 & 4 \\ 1 & -11 & 14 \end{bmatrix}$$

$$|A| = 1 \times (-14 + 44) - 3 \times (28 - 4) - 2 \times (-22 + 1) = 30 - 72 + 42 = 0.$$

In such situations, matrix A may not provide a solution for all of the unknowns. We will revisit this issue when we discuss the problem of finding eigenvalues and eigenvectors below.
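Both singular matrices above can be detected numerically: the determinant is (numerically) zero and the rank drops below 3 (numpy assumed):

```python
import numpy as np

A1 = np.array([[1.0,   3.0, -2.0],
               [2.0,  -1.0,  4.0],
               [2.0,   6.0, -4.0]])   # row 3 = 2 * row 1
A2 = np.array([[1.0,   3.0, -2.0],
               [2.0,  -1.0,  4.0],
               [1.0, -11.0, 14.0]])   # a less obvious dependency

for A in (A1, A2):
    print(np.linalg.det(A), np.linalg.matrix_rank(A))  # ~0.0 and rank 2
```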
Rank of a Matrix
Linear Dependence
Consider the set of homogeneous equations

$$Ax = 0$$

where A is m × n. Such a system always has the trivial solution x = 0. If this is the only solution, matrix A is termed linearly independent, while in the second case, when a non-trivial solution exists, matrix A is termed linearly dependent. To put it differently, suppose the columns of matrix A are denoted by a1, a2, ..., an. If there exists a set of scalars x1, x2, ..., xn, not all zero, that satisfy

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$$

then the matrix A is linearly dependent. Note that to decide which possibility holds, we need to evaluate the determinant of A, or |A|. Consider first the situation when m = n, that is, when matrix A is a square matrix. If the determinant is non-zero (positive or negative), then we have the first case, and if the determinant is zero, then we have the second case. To illustrate this further, consider the following set of equations:
$$\begin{bmatrix} 1 & 2 & -3 \\ 2 & 0 & 2 \\ 4 & 1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Note that

$$|A| = 1\begin{vmatrix} 0 & 2 \\ 1 & 3 \end{vmatrix} - 2\begin{vmatrix} 2 & 2 \\ 4 & 3 \end{vmatrix} - 3\begin{vmatrix} 2 & 0 \\ 4 & 1 \end{vmatrix} = -2 + 4 - 6 = -4$$

In this case, we may conclude that the only solution is x1 = x2 = x3 = 0. Suppose we change matrix A to

$$\begin{bmatrix} 1 & 2 & -3 \\ 2 & 0 & 2 \\ 4 & 1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

In this case,

$$|A| = 1\begin{vmatrix} 0 & 2 \\ 1 & 2 \end{vmatrix} - 2\begin{vmatrix} 2 & 2 \\ 4 & 2 \end{vmatrix} - 3\begin{vmatrix} 2 & 0 \\ 4 & 1 \end{vmatrix} = -2 + 8 - 6 = 0$$
This means that it may be possible to find a set of x's, not all zero, such that

$$x_1\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} + x_2\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} -3 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Suppose we assume that x3 = 1 and use the first two equations to solve for x1 and x2. Then we get x1 = −1 and x2 = 2, and any scalar multiple of these values will also satisfy the above equation.
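The sketch below (numpy assumed) confirms that [−1, 2, 1] and its scalar multiples solve the homogeneous system:

```python
import numpy as np

A = np.array([[1.0, 2.0, -3.0],
              [2.0, 0.0,  2.0],
              [4.0, 1.0,  2.0]])
x = np.array([-1.0, 2.0, 1.0])   # the non-trivial solution found above

print(np.linalg.det(A))          # ~0, so non-trivial solutions exist
print(A @ x)                     # [0. 0. 0.]
print(A @ (5 * x))               # any scalar multiple also works
```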
This leads to the definition of rank and some properties of ranks.
The rank of a matrix, denoted by ρ(A), is defined as the maximum number of linearly independent columns in A. That is, the rank of a matrix A is said to be r if
• every square submatrix of order r + 1 is singular, and
• there is at least one square submatrix of order r which is non-singular.
2. If there are two matrices, A and B, and their product exists, then ρ(AB) ≤ min[ρ(A), ρ(B)].
Two alternatives for determining the rank of a matrix are provided. The first method is based on the minors of a matrix, while the other is based on the echelon (normal) form of a matrix. Both methods give identical results, but the method based on minors becomes far more elaborate and time-consuming for larger matrices. Recall that the rank of a matrix is the maximum number of independent columns or rows. For example, the identity matrix

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

has three independent columns or rows. Thus, the methods summarized below either reduce a matrix to an identity or infer the structure of the matrix from the characteristics of some other matrices.
If we retain any three rows and any three columns of A and compute the determinants of the resulting square sub-matrices, these determinants are called the minors of order 3. For a 3 × 4 matrix, we might get

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{12} & a_{14} \\ a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \end{vmatrix} \quad \text{and} \quad \begin{vmatrix} a_{12} & a_{13} & a_{14} \\ a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \end{vmatrix}.$$

Smaller square sub-matrices give minors of lower order; for a matrix of size 3 × 4 there are, for example, 18 sub-matrices of order 2 and hence 18 minors of order 2.
Theorem
The rank of a given matrix A is said to be r if every square sub-matrix of order r + 1 is singular and there is at least one square sub-matrix of order r that is non-singular.
Note that if a minor of A is zero, the corresponding sub-matrix is singular, and the converse is also true.
Illustration: Consider the following 2 × 3 matrix A:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}.$$

The rank of this matrix can be 0, 1 or 2. When all elements of matrix A are zero, the rank of A is zero. If at least one element of the matrix is non-zero, the rank is at least 1. Finally, there are three 2 × 2 sub-matrices:

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \begin{bmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{bmatrix}.$$

Note that to demonstrate that the rank of A is 2, we need to show that at least one of the three sub-matrices has a non-zero minor. As one would imagine, such a process is tedious for larger matrices. For example, when a matrix is 3 × 4, we may need to compute up to 18 such minors of order 2 to demonstrate that the rank of the matrix is 2.
i. The rank of the transpose of a matrix is the same as that of the original matrix.
ii. The rank of a given matrix A remains unchanged by the operations of elementary row and
column transformations.
We will use the above two theorems for finding the rank of a matrix using the normal form of a matrix.
Every m × n matrix A of rank r can be reduced to one of the following forms

$$\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} I_r & 0 \end{bmatrix}, \quad \begin{bmatrix} I_r \\ 0 \end{bmatrix}, \quad I_r$$

where I_r denotes an identity matrix of size r. These are called normal forms and are obtained by elementary row and column operations.
Rank of Matrix Based on Normal Form
If a matrix A is reduced to the form
" #
Ir 0
0 0
by a chain of elementary row and column operations, then the rank of A is the order of identity
matrix Ir .
Illustration Suppose
1 2 3
A = 2 3 4 .
3 5 7
Consider the following steps to convert matrix A to normal form:
Step 1. Multiply column 1 by 2 and subtract it from column 2. In addition, multiply column 1 by 3 and subtract it from column 3.
Step 2. Multiply row 1 by 2 and subtract it from row 2; and also multiply row 1 by 3 and subtract it from row 3.
Step 3. Subtract row 2 from row 3, then multiply row 2 by −1.
Step 4. Multiply column 2 by 2 and subtract it from column 3.
The above four steps are summarized below.

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 5 & 7 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 2 & -1 & -2 \\ 3 & -1 & -2 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & -2 \\ 0 & -1 & -2 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
That is, by using elementary row and column operations, we have reduced matrix A to

$$\begin{bmatrix} I_2 & 0 \\ 0 & 0 \end{bmatrix}.$$

Hence, we conclude that the rank of matrix A is 2.
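The same conclusion follows from a one-line numerical check (numpy assumed):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 5.0, 7.0]])
print(np.linalg.matrix_rank(A))  # 2, matching the normal-form reduction
```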
To find the linearly independent rows of a matrix A, and hence to determine its rank, we may apply successive row operations to transform matrix A into a so-called echelon form. Suppose we obtain matrix B by applying such row and column operations. That is,

$$B_{m \times n} = \begin{bmatrix} 1 & b_{12} & b_{13} & \cdots & b_{1r} & \cdots & b_{1n} \\ 0 & 1 & b_{23} & \cdots & b_{2r} & \cdots & b_{2n} \\ 0 & 0 & 1 & \cdots & b_{3r} & \cdots & b_{3n} \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \cdots & b_{rn} \\ 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}$$

That is, the echelon form consists of an r × r matrix in the leading position in which the elements below the main diagonal are identically zero and the elements on the main diagonal are unity. The remaining elements in the first r rows are in general non-zero, while all elements in the remaining m − r rows are identically zero. The rank of A is equal to the number of rows in the echelon form in which there is at least one non-zero element. As an example, let us reduce

$$A = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 2 & 3 & 5 & 1 \\ 1 & 3 & 4 & 5 \end{bmatrix}.$$
Multiply row one by 2 and subtract it from row two, and subtract row one from row three. Then A is transformed to

$$A_0 = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 0 & -1 & -1 & -3 \\ 0 & 1 & 1 & 3 \end{bmatrix}.$$

In the next step, add row two to row three, and multiply row two by −1, to obtain matrix B:

$$B = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Consequently, we conclude that the rank of A is 2. As you may have noted, transforming a matrix into echelon form requires fewer numerical operations than the approach based on minors, especially when matrices are large.
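For symbolic row reduction, sympy (an assumed tool) produces an echelon form and the rank directly:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 2],
            [2, 3, 5, 1],
            [1, 3, 4, 5]])

print(A.echelon_form())  # two non-zero rows (pivots not normalized to 1)
print(A.rank())          # 2
```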
The characteristic value problem is defined as that of finding values of a scalar λ and an associated vector x ≠ 0 which satisfy

$$Ax = \lambda x$$

where A is some n × n matrix; λ is called a characteristic root of A and x a characteristic vector. Alternative names are latent roots and vectors, and eigenvalues and eigenvectors. Note that the above equation may be written as

$$(A - \lambda I)x = 0$$
3. Eigenvalues are conventionally ordered to be decreasing in magnitude; the eigenvalues are the scalars associated with each eigenvector.
• Illustrative Example: Suppose we ask five respondents two questions, attitude towards family and attitude towards church, each on a ten-point scale. The actual and the mean-subtracted responses were

$$\text{Responses} = \begin{bmatrix} 6 & 7 \\ 5 & 9 \\ 8 & 6 \\ 4 & 9 \\ 7 & 9 \end{bmatrix} \qquad (\text{Response} - \text{Mean}) = \begin{bmatrix} 0 & -1 \\ -1 & 1 \\ 2 & -2 \\ -2 & 1 \\ 1 & 1 \end{bmatrix}.$$

Let us call the first column y1 (with mean of 6) and the second y2 (with mean of 8). We may construct the variance-covariance matrix S:

$$S = \begin{bmatrix} 2.5 & -1.5 \\ -1.5 & 2.0 \end{bmatrix}$$
We would like to obtain two sets of two numbers, say x11, x12 and x21, x22, such that when we construct the transformed variables yt1 and yt2, the transformed variables will be independent of each other. In addition, we would like each vector to be of unit length; that means

$$\sqrt{x_{11}^2 + x_{12}^2} = 1.$$

Suppose λ is a scalar number, which we will call an eigenvalue. Then finding an eigenvector is equivalent to finding a solution to the following two linear equations (the rows of Sx = λx):

$$2.5x_{11} - 1.5x_{12} = \lambda x_{11}$$
$$-1.5x_{11} + 2x_{12} = \lambda x_{12}$$
Note that using the first equation, we can solve for x11. That is, we may write

$$x_{11} = -\frac{-1.5x_{12}}{2.5 - \lambda} = \frac{1.5x_{12}}{2.5 - \lambda}.$$

Substituting into the second equation,

$$(-1.5) \times \frac{1.5x_{12}}{2.5 - \lambda} + 2x_{12} = \lambda x_{12} \quad \text{or}$$

$$\frac{-(1.5)^2x_{12}}{2.5 - \lambda} + (2 - \lambda)x_{12} = 0 \quad \text{or}$$

$$-(1.5)^2x_{12} + (2.5 - \lambda)(2 - \lambda)x_{12} = 0 \quad \text{or}$$

$$\left[(2.5 - \lambda)(2 - \lambda) - (1.5)^2\right]x_{12} = 0$$
Note that a solution to the above equation could be found by setting x12 = 0, which would mean x11 = 0. This solution is not consistent with our second requirement of unit length. The other possibility entails finding a solution for the term inside the square brackets, or

$$(2.5 - \lambda)(2 - \lambda) - (1.5)^2 = 0.$$

This is a quadratic equation with λ as the unknown. Two solutions can be obtained, where

$$\lambda = 2.25 \pm \frac{1}{2}\sqrt{0.25 + 4 \times (1.5)^2}.$$

Thus, we get λ = 3.771 or λ = 0.729. Since these are the roots of a quadratic equation (in general, a polynomial), they are often called characteristic roots or eigenvalues.
Let us substitute λ = 3.771 into the equation for x11. That is,

$$x_{11} = \frac{1.5x_{12}}{2.5 - \lambda} = \frac{1.5x_{12}}{2.5 - 3.771} = -1.18x_{12}.$$
We also know that x11² + x12² = 1. Using these two equations, we may write

$$(-1.18x_{12})^2 + x_{12}^2 = 1, \quad \text{so} \quad x_{12} = \pm 0.646.$$

This would mean that x11 = −0.763 when x12 = 0.646, and x11 = 0.763 if we take x12 = −0.646. If we repeat the computational steps with λ = 0.729, we get x12 = ±0.763 and x11 = ±0.646. Note that the first vector, [−0.763, 0.646], is called the first eigenvector², and the other one is [0.646, 0.763]. Note a couple of observations. First, if we multiply our data matrix by these two eigenvectors, we get two new transformed variables, and these transformed vectors are independent of each other. To verify this, I multiplied 0 × (−0.763) + (−1) × 0.646 and got −0.65. I continued this for each cell in the data matrix and obtained the following transformed and mean-corrected observations.
$$\text{Mean-Corrected Transformed Responses} = \begin{bmatrix} -0.65 & -0.76 \\ 1.41 & 0.12 \\ -2.82 & -0.24 \\ 2.17 & -0.53 \\ -0.12 & 1.41 \end{bmatrix}$$
² Here we have chosen one of the two solutions, but much of the subsequent analysis does not depend on which solution is used.
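The full eigen decomposition of S, and the independence of the transformed variables, can be verified numerically (numpy assumed; eigenvector signs may differ from the text, which is exactly the choice discussed in the footnote above):

```python
import numpy as np

S = np.array([[ 2.5, -1.5],
              [-1.5,  2.0]])
vals, vecs = np.linalg.eigh(S)   # eigh: for symmetric matrices
print(vals)                      # [0.729 3.771], in ascending order
print(vecs)                      # columns are unit-length eigenvectors

# Transforming the mean-corrected responses decorrelates them
Y = np.array([[ 0.0, -1.0], [-1.0, 1.0], [ 2.0, -2.0],
              [-2.0,  1.0], [ 1.0, 1.0]])
T = Y @ vecs
print(np.cov(T, rowvar=False))   # diagonal, with the eigenvalues on it
```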
[Figure: the mean-corrected transformed responses plotted as points in the transformed (y1, y2) plane.]
Given a symmetric 2 × 2 matrix such as S, what are its eigenvalues? It turns out that for a simple matrix like a 2 × 2, we can write a formula for its eigenvalues. That is,

$$\lambda = \frac{1}{2}(a_{11} + a_{22}) \pm \frac{1}{2}\sqrt{(a_{11} - a_{22})^2 + 4a_{12}^2}.$$
Note several interesting observations; a numerical check follows the list.
1. When the covariance is zero (a12 = 0), the two eigenvalues are a11 and a22.
2. If we substitute a11 = a22 = 1, that is, standardize the variables to variance 1, then a12 is the correlation between the two variables. In such instances, the eigenvalues are λ = 1 ± r12. This implies that when two variables are perfectly correlated, one eigenvalue is 2 and the other is zero.
3. Note that the sum of the variances is equal to the sum of the eigenvalues.
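A small sketch comparing the closed-form 2 × 2 formula with a numerical eigenvalue routine (numpy assumed; the function name is ours):

```python
import numpy as np

def eig2x2_symmetric(a11, a22, a12):
    """Closed-form eigenvalues of a symmetric 2x2 matrix."""
    mid = 0.5 * (a11 + a22)
    half = 0.5 * np.sqrt((a11 - a22) ** 2 + 4.0 * a12 ** 2)
    return mid + half, mid - half

print(eig2x2_symmetric(2.5, 2.0, -1.5))      # (3.771..., 0.729...)
print(np.linalg.eigvalsh([[2.5, -1.5],
                          [-1.5,  2.0]]))    # same roots
```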
Partitioned Matrices
A matrix may be divided by means of horizontal and vertical lines into smaller submatrices. Partitioned matrices are often easier to work with for computational purposes than the whole matrix. Consider a matrix A with dimensions 3 × 5,

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \end{bmatrix}$$

which may be partitioned by horizontal and vertical lines to provide four submatrices:

$$A_{11} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \qquad A_{12} = \begin{bmatrix} a_{14} & a_{15} \\ a_{24} & a_{25} \end{bmatrix}$$

$$A_{21} = \begin{bmatrix} a_{31} & a_{32} & a_{33} \end{bmatrix} \qquad A_{22} = \begin{bmatrix} a_{34} & a_{35} \end{bmatrix}$$
The inverse of a partitioned matrix sometimes provides a useful starting point for analysis. Suppose A is a nonsingular matrix, partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where A11 is p × p, A12 is p × q, A21 is q × p and A22 is q × q. Note that for the first result to be valid, we must have A22 nonsingular, while for the second result to be valid, we must have A11 nonsingular. Furthermore, if either off-diagonal matrix, A12 or A21, is zero, then the determinant is equal to the product of the determinants of the diagonal matrices, |A11| · |A22|.
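The determinant property for a block-triangular partition is easy to confirm numerically (numpy assumed; the block values are illustrative):

```python
import numpy as np

A11 = np.array([[2.0, 1.0], [1.0, 3.0]])
A12 = np.array([[1.0, 7.0], [0.0, 5.0]])
A22 = np.array([[4.0, 0.0], [1.0, 2.0]])
Z   = np.zeros((2, 2))

A = np.block([[A11, A12],
              [Z,   A22]])                       # A21 = 0
print(np.linalg.det(A))                          # 40.0
print(np.linalg.det(A11) * np.linalg.det(A22))   # 5 * 8 = 40.0
```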
While it is possible to write any vector or matrix in extended form and apply the rules of scalar differentiation element by element, there is an advantage in defining rules that apply directly to vectors and matrices. Scalar rules are briefly reviewed first, and then analogous matrix-based rules are provided.
Scalar Differentiation Rules
In the following examples, u and v are real-valued differentiable functions of x, and y is a function of u and v. Furthermore, a is a real constant. Then the following rules apply.
• If y = a, then ∂y/∂x = 0.
• If y = u(x) + v(x), then

$$\frac{\partial y}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}.$$

• If y = u(x)v(x), then

$$\frac{\partial y}{\partial x} = v(x)\frac{\partial u}{\partial x} + u(x)\frac{\partial v}{\partial x}.$$

• If y = u(x)/v(x), then

$$\frac{\partial y}{\partial x} = \frac{v(x)\dfrac{\partial u}{\partial x} - u(x)\dfrac{\partial v}{\partial x}}{v(x)^2}$$

provided v(x) ≠ 0.
• If y = u(x)^a, then

$$\frac{\partial y}{\partial x} = a\,u(x)^{a-1}\frac{\partial u}{\partial x}.$$

• If y = log[u(x)], then

$$\frac{\partial y}{\partial x} = \frac{1}{u(x)}\frac{\partial u}{\partial x}.$$

• If y = exp[u(x)], then

$$\frac{\partial y}{\partial x} = \exp[u(x)]\frac{\partial u}{\partial x}.$$

• If y = a^{u(x)}, then

$$\frac{\partial y}{\partial x} = a^{u(x)} \log(a)\,\frac{\partial u}{\partial x},$$

provided a > 0.
Differentiation of a matrix with respect to a scalar proceeds element by element. For example, for a matrix such as

$$U = \begin{bmatrix} 2x & x^2 & 3\exp(x) \\ c & \log x & \sqrt{x} \end{bmatrix}$$

(a matrix consistent with the derivative below, where c denotes a constant), then

$$\frac{\partial U}{\partial x} = \begin{bmatrix} 2 & 2x & 3\exp(x) \\ 0 & \dfrac{1}{x} & \dfrac{1}{2\sqrt{x}} \end{bmatrix}.$$
Suppose that there is another matrix V, of size p × q, whose elements are also differentiable functions of the scalar variable x. Then the following rules apply:

$$\frac{\partial(U + V)}{\partial x} = \frac{\partial U}{\partial x} + \frac{\partial V}{\partial x}$$

provided m = p and n = q, and

$$\frac{\partial(UV)}{\partial x} = U\frac{\partial V}{\partial x} + \frac{\partial U}{\partial x}V$$

provided n = p; the resulting matrix is of size m × q. Finally, the derivative of U^{-1} exists only if m = n and |U| ≠ 0:
$$\frac{\partial U^{-1}}{\partial x} = -U^{-1}\frac{\partial U}{\partial x}U^{-1}$$

which is more complicated to remember and manipulate than the scalar quotient rule. Note that

$$\frac{\partial \log|U|}{\partial x} = \mathrm{tr}\!\left(U^{-1}\frac{\partial U}{\partial x}\right)$$

provided m = n. Moreover,

$$\frac{\partial\,\mathrm{tr}(UA)}{\partial x} = \mathrm{tr}\!\left(\frac{\partial U}{\partial x}A\right)$$

provided m = n, where A is a matrix of real-valued constants of size m × m.
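The rule for the derivative of log |U| can be verified symbolically; the sketch below (sympy assumed, with an illustrative U of our own choosing) shows the two sides agree:

```python
from sympy import Matrix, symbols, log, simplify, exp

x = symbols('x', positive=True)
U = Matrix([[x**2, 1],
            [exp(x), x]])        # an illustrative matrix function of x

dU = U.diff(x)                   # element-by-element derivative
lhs = log(U.det()).diff(x)
rhs = (U.inv() * dU).trace()     # tr(U^{-1} dU/dx)
print(simplify(lhs - rhs))       # 0
```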
Differentiation of a Scalar Function of a Matrix with respect to a Matrix or Vector
Let y be a scalar function of a matrix X; then the derivative of y with respect to X is the matrix

$$\frac{\partial y}{\partial X} = \left[\frac{\partial y}{\partial x_{ij}}\right].$$

The vector form of this differentiation appears in regression analysis, while the matrix form appears in multivariate regression. Since determinants and traces are always scalars, these differentiations are summarized below. It is assumed that matrix X is of order m × n. Note that for differentiation of the determinant,
$$\frac{\partial|X|}{\partial X} = \mathrm{adj}(X')$$

provided m = n, and

$$\frac{\partial \log|X|}{\partial X} = \frac{\mathrm{adj}(X')}{|X|} = (X^{-1})'$$
provided that m = n and |X| ≠ 0. To illustrate the usefulness of these rules, consider the following example. Let

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$
Note that |X| = x11x22 − x12x21. Then it follows that

$$\frac{\partial|X|}{\partial x_{11}} = x_{22}, \qquad \frac{\partial|X|}{\partial x_{12}} = -x_{21}, \qquad \frac{\partial|X|}{\partial x_{21}} = -x_{12}, \qquad \frac{\partial|X|}{\partial x_{22}} = x_{11}$$

and, thus,

$$\frac{\partial|X|}{\partial X} = \begin{bmatrix} x_{22} & -x_{21} \\ -x_{12} & x_{11} \end{bmatrix}.$$
To obtain the same result using the rules of differentiation, only a transpose and an adjoint must be calculated. That is,

$$X' = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix} \qquad \mathrm{adj}(X') = \begin{bmatrix} x_{22} & -x_{21} \\ -x_{12} & x_{11} \end{bmatrix}$$

which is easier than actually calculating the derivatives. Verify that for the above X,

$$\frac{\partial \log|X|}{\partial X} = \frac{1}{x_{11}x_{22} - x_{12}x_{21}}\begin{bmatrix} x_{22} & -x_{21} \\ -x_{12} & x_{11} \end{bmatrix}.$$
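A symbolic check of ∂|X|/∂X = adj(X′) (sympy assumed):

```python
from sympy import Matrix, symbols

x11, x12, x21, x22 = symbols('x11 x12 x21 x22')
X = Matrix([[x11, x12],
            [x21, x22]])

d = X.det()
grad = Matrix(2, 2, lambda i, j: d.diff(X[i, j]))  # element-wise d|X|/dx_ij
print(grad)                  # Matrix([[x22, -x21], [-x12, x11]])
print(X.T.adjugate())        # identical, as the rule predicts
```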
A related result is

$$\frac{\partial|AXB|}{\partial X} = |AXB|\,(X')^{-1} = |AXB|\,(X^{-1})'$$

where A and B are matrices of constants of the same size as matrix X. Suppose

$$A = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}, \qquad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}.$$
For X of order m × n and a constant matrix A of order p × q, the derivatives of traces are summarized below:

$$\frac{\partial\,\mathrm{tr}(A)}{\partial X} = 0, \qquad p = q$$

$$\frac{\partial\,\mathrm{tr}(X)}{\partial X} = I, \qquad m = n$$

$$\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = \frac{\partial\,\mathrm{tr}(XA)}{\partial X} = A', \qquad p = m,\; q = n$$

$$\frac{\partial\,\mathrm{tr}(AX')}{\partial X} = \frac{\partial\,\mathrm{tr}(X'A)}{\partial X} = A, \qquad p = m,\; q = n$$

$$\frac{\partial\,\mathrm{tr}(X'AX)}{\partial X} = (A + A')X, \qquad p = m,\; q = n$$

$$\frac{\partial\,\mathrm{tr}(X^{-1}A)}{\partial X} = -X^{-1}A'X^{-1}, \qquad p = m,\; q = n$$
Suppose

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$

This results in

$$AX = \begin{bmatrix} a_{11}x_{11} + a_{12}x_{21} & a_{11}x_{12} + a_{12}x_{22} \\ a_{21}x_{11} + a_{22}x_{21} & a_{21}x_{12} + a_{22}x_{22} \end{bmatrix}$$

and

$$\mathrm{tr}(AX) = a_{11}x_{11} + a_{12}x_{21} + a_{21}x_{12} + a_{22}x_{22}.$$
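Differentiating this trace element by element reproduces A′, as the table of rules predicts; a symbolic check (sympy assumed):

```python
from sympy import Matrix, symbols

A = Matrix(2, 2, list(symbols('a11 a12 a21 a22')))
X = Matrix(2, 2, list(symbols('x11 x12 x21 x22')))

t = (A * X).trace()          # a11*x11 + a12*x21 + a21*x12 + a22*x22
grad = Matrix(2, 2, lambda i, j: t.diff(X[i, j]))
print(grad == A.T)           # True
```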
In conclusion, it might be noted that the use of matrix algebra for differentiation is more compact than element-by-element differentiation.