Matrix Calculus

Appendix F: MATRIX CALCULUS
TABLE OF CONTENTS
F.1. Introduction
F.2. The Derivatives of Vector Functions
  F.2.1. Derivative of Vector with Respect to Vector
  F.2.2. Derivative of a Scalar with Respect to Vector
  F.2.3. Derivative of Vector with Respect to Scalar
  F.2.4. Jacobian of a Variable Transformation
F.3. The Chain Rule for Vector Functions
F.4. The Derivative of Scalar Functions of a Matrix
  F.4.1. Functions of a Matrix Determinant
F.5. The Matrix Differential
F.1. Introduction
In this Appendix we collect some useful formulas of matrix calculus that often appear in finite
element derivations.
F.2. The Derivatives of Vector Functions
Let x and y be vectors of orders n and m respectively:
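$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \tag{F.1}
$$
where each component $y_i$ may be a function of all the $x_j$, a fact represented by saying that $\mathbf{y}$ is a
function of $\mathbf{x}$, or
$$
\mathbf{y} = \mathbf{y}(\mathbf{x}). \tag{F.2}
$$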
If n = 1, x reduces to a scalar, which we call x. If m = 1, y reduces to a scalar, which we call y.
Various applications are studied in the following subsections.
F.2.1. Derivative of Vector with Respect to Vector
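The derivative of the vector $\mathbf{y}$ with respect to vector $\mathbf{x}$ is the $n \times m$ matrix
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix} \tag{F.3}
$$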
F.2.2. Derivative of a Scalar with Respect to Vector
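If $y$ is a scalar,
$$
\frac{\partial y}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
\dfrac{\partial y}{\partial x_1} \\
\dfrac{\partial y}{\partial x_2} \\
\vdots \\
\dfrac{\partial y}{\partial x_n}
\end{bmatrix}. \tag{F.4}
$$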
F.2.3. Derivative of Vector with Respect to Scalar
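If $x$ is a scalar,
$$
\frac{\partial \mathbf{y}}{\partial x} \overset{\text{def}}{=}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x}
\end{bmatrix} \tag{F.5}
$$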
Remark F.1. Many authors, notably in statistics and economics, define the derivatives as the transposes of
those given above.[1] This has the advantage of better agreement of matrix products with composition schemes
such as the chain rule. Evidently the notation is not yet stable.
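Example F.1. Given
$$
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \tag{F.6}
$$
and
$$
y_1 = x_1^2 - x_2, \qquad y_2 = x_3^2 + 3x_2, \tag{F.7}
$$
the partial derivative matrix $\partial\mathbf{y}/\partial\mathbf{x}$ is computed as follows:
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} \\
\dfrac{\partial y_1}{\partial x_3} & \dfrac{\partial y_2}{\partial x_3}
\end{bmatrix}
=
\begin{bmatrix}
2x_1 & 0 \\
-1 & 3 \\
0 & 2x_3
\end{bmatrix}. \tag{F.8}
$$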
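The result of Example F.1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `vector_derivative`, `y_example` and `exact_derivative` are hypothetical and chosen only for illustration. It approximates the n × m derivative matrix of (F.3) by central differences and compares it with the closed form (F.8).

```python
import numpy as np

def vector_derivative(f, x, h=1e-6):
    """Approximate dy/dx in the n-by-m layout of (F.3): entry (i, j) is dy_j/dx_i."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(f(x)).size
    D = np.zeros((x.size, m))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        # A central difference with respect to x_i fills row i.
        D[i, :] = (f(x + step) - f(x - step)) / (2.0 * h)
    return D

def y_example(x):
    # Example F.1: y1 = x1^2 - x2, y2 = x3^2 + 3*x2
    return np.array([x[0] ** 2 - x[1], x[2] ** 2 + 3.0 * x[1]])

def exact_derivative(x):
    # Closed form (F.8), a 3-by-2 matrix.
    return np.array([[2.0 * x[0], 0.0],
                     [-1.0, 3.0],
                     [0.0, 2.0 * x[2]]])

x0 = np.array([1.5, -0.7, 2.0])  # arbitrary evaluation point (assumption)
print(np.allclose(vector_derivative(y_example, x0), exact_derivative(x0), atol=1e-6))
```

Note that the rows are indexed by the components of x and the columns by the components of y, which is the transpose of the layout used by many numerical libraries.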
F.2.4. Jacobian of a Variable Transformation
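In multivariate analysis, if $\mathbf{x}$ and $\mathbf{y}$ are of the same order, the determinant of the square matrix $\partial\mathbf{x}/\partial\mathbf{y}$,
that is
$$
J = \left| \frac{\partial \mathbf{x}}{\partial \mathbf{y}} \right| \tag{F.9}
$$
is called the Jacobian of the transformation determined by $\mathbf{y} = \mathbf{y}(\mathbf{x})$. The inverse determinant is
$$
J^{-1} = \left| \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right|. \tag{F.10}
$$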
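Example F.2. The transformation from spherical to Cartesian coordinates is defined by
$$
x = r \sin\theta \cos\psi, \qquad y = r \sin\theta \sin\psi, \qquad z = r \cos\theta \tag{F.11}
$$
where $r > 0$, $0 < \theta < \pi$ and $0 \le \psi < 2\pi$. To obtain the Jacobian of the transformation, let
$$
x \equiv x_1, \quad y \equiv x_2, \quad z \equiv x_3, \qquad
r \equiv y_1, \quad \theta \equiv y_2, \quad \psi \equiv y_3. \tag{F.12}
$$
Then
$$
J = \left| \frac{\partial \mathbf{x}}{\partial \mathbf{y}} \right| =
\begin{vmatrix}
\sin y_2 \cos y_3 & \sin y_2 \sin y_3 & \cos y_2 \\
y_1 \cos y_2 \cos y_3 & y_1 \cos y_2 \sin y_3 & -y_1 \sin y_2 \\
-y_1 \sin y_2 \sin y_3 & y_1 \sin y_2 \cos y_3 & 0
\end{vmatrix}
= y_1^2 \sin y_2 = r^2 \sin\theta. \tag{F.13}
$$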
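As a sanity check on (F.13), the Jacobian can be evaluated numerically at a sample point. This is a sketch only, assuming NumPy; the function names and the angle names `theta` and `psi` are illustrative choices, and the sample point is arbitrary.

```python
import numpy as np

def spherical_to_cartesian(y):
    # y = (r, theta, psi) maps to (x1, x2, x3) as in (F.11)-(F.12).
    r, theta, psi = y
    return np.array([
        r * np.sin(theta) * np.cos(psi),
        r * np.sin(theta) * np.sin(psi),
        r * np.cos(theta),
    ])

def jacobian_determinant(y, h=1e-6):
    y = np.asarray(y, dtype=float)
    D = np.zeros((3, 3))
    for i in range(3):
        step = np.zeros(3)
        step[i] = h
        # Row i holds d(x1, x2, x3)/dy_i, matching the layout of (F.3).
        D[i, :] = (spherical_to_cartesian(y + step)
                   - spherical_to_cartesian(y - step)) / (2.0 * h)
    return np.linalg.det(D)

r, theta, psi = 2.0, 0.8, 1.3          # arbitrary sample point (assumption)
J_numeric = jacobian_determinant([r, theta, psi])
J_exact = r ** 2 * np.sin(theta)       # closed form from (F.13)
print(abs(J_numeric - J_exact) < 1e-6)
```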
The foregoing definitions can be used to obtain derivatives of many frequently used expressions,
including quadratic and bilinear forms.
[1] One author puts it this way: "When one does matrix calculus, one quickly finds that there are two kinds of people in this
world: those who think the gradient is a row vector, and those who think it is a column vector."
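Example F.3. Consider the quadratic form
$$
y = \mathbf{x}^T \mathbf{A} \mathbf{x} \tag{F.14}
$$
where $\mathbf{A}$ is a square matrix of order $n$. Using the definition (F.3) one obtains
$$
\frac{\partial y}{\partial \mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{A}^T \mathbf{x} \tag{F.15}
$$
and if $\mathbf{A}$ is symmetric,
$$
\frac{\partial y}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}. \tag{F.16}
$$
We can of course continue the differentiation process:
$$
\frac{\partial^2 y}{\partial \mathbf{x}^2} = \frac{\partial}{\partial \mathbf{x}} \left( \frac{\partial y}{\partial \mathbf{x}} \right) = \mathbf{A} + \mathbf{A}^T, \tag{F.17}
$$
and if $\mathbf{A}$ is symmetric,
$$
\frac{\partial^2 y}{\partial \mathbf{x}^2} = 2\mathbf{A}. \tag{F.18}
$$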
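The following table collects several useful vector derivative formulas.

  $\mathbf{y}$                             $\partial\mathbf{y}/\partial\mathbf{x}$
  ---------------------------------------  ---------------------------------------------
  $\mathbf{A}\mathbf{x}$                   $\mathbf{A}^T$
  $\mathbf{x}^T\mathbf{A}$                 $\mathbf{A}$
  $\mathbf{x}^T\mathbf{x}$                 $2\mathbf{x}$
  $\mathbf{x}^T\mathbf{A}\mathbf{x}$       $\mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x}$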
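The four table entries can be spot-checked with finite differences. The sketch below assumes NumPy; `vector_derivative` is a hypothetical helper that follows the n × m layout of (F.3), and the matrix A and point x0 are random test data, not values from the text.

```python
import numpy as np

def vector_derivative(f, x, h=1e-6):
    # n-by-m layout of (F.3): row i holds the derivatives with respect to x_i.
    x = np.asarray(x, dtype=float)
    cols = np.atleast_1d(f(x)).size
    D = np.zeros((x.size, cols))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        D[i, :] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2.0 * h)
    return D

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # generic (nonsymmetric) test matrix
x0 = rng.standard_normal(n)

checks = {
    "Ax": np.allclose(vector_derivative(lambda x: A @ x, x0), A.T, atol=1e-6),
    "x^T A": np.allclose(vector_derivative(lambda x: x @ A, x0), A, atol=1e-6),
    "x^T x": np.allclose(vector_derivative(lambda x: x @ x, x0)[:, 0],
                         2.0 * x0, atol=1e-6),
    "x^T A x": np.allclose(vector_derivative(lambda x: x @ A @ x, x0)[:, 0],
                           A @ x0 + A.T @ x0, atol=1e-6),
}
print(checks)
```

Using a nonsymmetric A matters here: with a symmetric matrix the first two rows of the table would be indistinguishable.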
F.3. The Chain Rule for Vector Functions
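Let
$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{bmatrix} \qquad \text{and} \qquad
\mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix} \tag{F.19}
$$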
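where $\mathbf{z}$ is a function of $\mathbf{y}$, which is in turn a function of $\mathbf{x}$. Using the definition (F.3), we can write
$$
\left( \frac{\partial \mathbf{z}}{\partial \mathbf{x}} \right)^T =
\begin{bmatrix}
\dfrac{\partial z_1}{\partial x_1} & \dfrac{\partial z_1}{\partial x_2} & \cdots & \dfrac{\partial z_1}{\partial x_n} \\
\dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2} & \cdots & \dfrac{\partial z_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial z_m}{\partial x_1} & \dfrac{\partial z_m}{\partial x_2} & \cdots & \dfrac{\partial z_m}{\partial x_n}
\end{bmatrix} \tag{F.20}
$$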
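Each entry of this matrix may be expanded as
$$
\frac{\partial z_i}{\partial x_j} = \sum_{q=1}^{r} \frac{\partial z_i}{\partial y_q} \frac{\partial y_q}{\partial x_j},
\qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n. \tag{F.21}
$$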
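Then
$$
\begin{aligned}
\left( \frac{\partial \mathbf{z}}{\partial \mathbf{x}} \right)^T
&=
\begin{bmatrix}
\sum_q \dfrac{\partial z_1}{\partial y_q} \dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_1}{\partial y_q} \dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_1}{\partial y_q} \dfrac{\partial y_q}{\partial x_n} \\
\sum_q \dfrac{\partial z_2}{\partial y_q} \dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_2}{\partial y_q} \dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_2}{\partial y_q} \dfrac{\partial y_q}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_q \dfrac{\partial z_m}{\partial y_q} \dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_m}{\partial y_q} \dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_m}{\partial y_q} \dfrac{\partial y_q}{\partial x_n}
\end{bmatrix} \\
&=
\begin{bmatrix}
\dfrac{\partial z_1}{\partial y_1} & \dfrac{\partial z_1}{\partial y_2} & \cdots & \dfrac{\partial z_1}{\partial y_r} \\
\dfrac{\partial z_2}{\partial y_1} & \dfrac{\partial z_2}{\partial y_2} & \cdots & \dfrac{\partial z_2}{\partial y_r} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial z_m}{\partial y_1} & \dfrac{\partial z_m}{\partial y_2} & \cdots & \dfrac{\partial z_m}{\partial y_r}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_r}{\partial x_1} & \dfrac{\partial y_r}{\partial x_2} & \cdots & \dfrac{\partial y_r}{\partial x_n}
\end{bmatrix} \\
&= \left( \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \right)^T \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right)^T
= \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \right)^T.
\end{aligned} \tag{F.22}
$$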
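On transposing both sides, we finally obtain
$$
\frac{\partial \mathbf{z}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \frac{\partial \mathbf{z}}{\partial \mathbf{y}}, \tag{F.23}
$$
which is the chain rule for vectors. If all vectors reduce to scalars,
$$
\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x} \frac{\partial z}{\partial y} = \frac{\partial z}{\partial y} \frac{\partial y}{\partial x}, \tag{F.24}
$$
which is the conventional chain rule of calculus. Note, however, that when we are dealing with
vectors, the chain of matrices builds toward the left. For example, if $\mathbf{w}$ is a function of $\mathbf{z}$, which
is a function of $\mathbf{y}$, which is a function of $\mathbf{x}$,
$$
\frac{\partial \mathbf{w}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \frac{\partial \mathbf{w}}{\partial \mathbf{z}}. \tag{F.25}
$$
On the other hand, in the ordinary chain rule one can build the product either to the right or to
the left because scalar multiplication is commutative.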
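The ordering in (F.23) can be illustrated numerically. The following sketch assumes NumPy; the maps `y_of_x` and `z_of_y`, the matrices B and C, and the helper `vector_derivative` are all hypothetical test constructs, chosen only so that the three vectors have different orders (n = 4, r = 3, m = 2).

```python
import numpy as np

def vector_derivative(f, x, h=1e-6):
    # n-by-m layout of (F.3): row i holds the derivatives with respect to x_i.
    x = np.asarray(x, dtype=float)
    D = np.zeros((x.size, np.asarray(f(x)).size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        D[i, :] = (f(x + step) - f(x - step)) / (2.0 * h)
    return D

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 4))   # maps x (order n = 4) to y (order r = 3)
C = rng.standard_normal((2, 3))   # maps y (order r = 3) to z (order m = 2)

def y_of_x(x):
    return np.tanh(B @ x)

def z_of_y(y):
    return C @ (y ** 2)

x0 = rng.standard_normal(4)
dy_dx = vector_derivative(y_of_x, x0)                        # n-by-r
dz_dy = vector_derivative(z_of_y, y_of_x(x0))                # r-by-m
dz_dx = vector_derivative(lambda x: z_of_y(y_of_x(x)), x0)   # n-by-m

# (F.23): the product builds toward the left, dz/dx = (dy/dx)(dz/dy).
print(np.allclose(dz_dx, dy_dx @ dz_dy, atol=1e-6))
```

With these orders the reversed product `dz_dy @ dy_dx` is not even conforming (3 × 2 times 4 × 3), which makes the left-building order easy to remember.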
F.4. The Derivative of Scalar Functions of a Matrix
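Let $\mathbf{X} = (x_{ij})$ be a matrix of order $(m \times n)$ and let
$$
y = f(\mathbf{X}), \tag{F.26}
$$
be a scalar function of $\mathbf{X}$. The derivative of $y$ with respect to $\mathbf{X}$, denoted by
$$
\frac{\partial y}{\partial \mathbf{X}}, \tag{F.27}
$$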
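is defined as the following matrix of order $(m \times n)$:
$$
\mathbf{G} = \frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{12}} & \cdots & \dfrac{\partial y}{\partial x_{1n}} \\
\dfrac{\partial y}{\partial x_{21}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{m1}} & \dfrac{\partial y}{\partial x_{m2}} & \cdots & \dfrac{\partial y}{\partial x_{mn}}
\end{bmatrix}
= \left[ \frac{\partial y}{\partial x_{ij}} \right]
= \sum_{i,j} \mathbf{E}_{ij} \frac{\partial y}{\partial x_{ij}}, \tag{F.28}
$$
where $\mathbf{E}_{ij}$ denotes the elementary matrix* of order $(m \times n)$. This matrix $\mathbf{G}$ is also known as a
gradient matrix.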
Example F.4. Find the gradient matrix if $y$ is the trace of a square matrix $\mathbf{X}$ of order $n$, that is
$$
y = \operatorname{tr}(\mathbf{X}) = \sum_{i=1}^{n} x_{ii}. \tag{F.29}
$$
All off-diagonal partials vanish whereas the diagonal partials equal one, thus
$$
\mathbf{G} = \frac{\partial y}{\partial \mathbf{X}} = \mathbf{I}, \tag{F.30}
$$
where $\mathbf{I}$ denotes the identity matrix of order $n$.
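A gradient matrix in the sense of (F.28) can be approximated entry by entry, perturbing X along each elementary matrix in turn. The sketch below assumes NumPy; `gradient_matrix` is a hypothetical helper and the test matrix is random. It reproduces the result of Example F.4.

```python
import numpy as np

def gradient_matrix(f, X, h=1e-6):
    """Approximate G = dy/dX of (F.28), one entry at a time."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = 1.0   # elementary matrix E_ij
            G[i, j] = (f(X + h * E) - f(X - h * E)) / (2.0 * h)
    return G

X0 = np.random.default_rng(1).standard_normal((4, 4))  # arbitrary square matrix
G_trace = gradient_matrix(np.trace, X0)
print(np.allclose(G_trace, np.eye(4), atol=1e-6))
```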
F.4.1. Functions of a Matrix Determinant
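An important family of derivatives with respect to a matrix involves functions of the determinant
of a matrix, for example $y = |\mathbf{X}|$ or $y = |\mathbf{A}\mathbf{X}|$. Suppose that we have a matrix $\mathbf{Y} = [y_{ij}]$ whose
components are functions of a matrix $\mathbf{X} = [x_{rs}]$, that is $y_{ij} = f_{ij}(x_{rs})$, and set out to build the
matrix
$$
\frac{\partial |\mathbf{Y}|}{\partial \mathbf{X}}. \tag{F.31}
$$
Using the chain rule we can write
$$
\frac{\partial |\mathbf{Y}|}{\partial x_{rs}} = \sum_i \sum_j \frac{\partial |\mathbf{Y}|}{\partial y_{ij}} \frac{\partial y_{ij}}{\partial x_{rs}}. \tag{F.32}
$$
But, expanding the determinant along row $i$,
$$
|\mathbf{Y}| = \sum_j y_{ij} Y_{ij}, \tag{F.33}
$$
where $Y_{ij}$ is the cofactor of the element $y_{ij}$ in $|\mathbf{Y}|$. Since the cofactors $Y_{i1}, Y_{i2}, \ldots$ are independent
of the element $y_{ij}$, we have
$$
\frac{\partial |\mathbf{Y}|}{\partial y_{ij}} = Y_{ij}. \tag{F.34}
$$
It follows that
$$
\frac{\partial |\mathbf{Y}|}{\partial x_{rs}} = \sum_i \sum_j Y_{ij} \frac{\partial y_{ij}}{\partial x_{rs}}. \tag{F.35}
$$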
* The elementary matrix $\mathbf{E}_{ij}$ of order $m \times n$ has all zero entries except for the $(i, j)$ entry, which is one.
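There is an alternative form of this result which is occasionally useful. Define
$$
a_{ij} = Y_{ij}, \quad \mathbf{A} = [a_{ij}], \qquad b_{ij} = \frac{\partial y_{ij}}{\partial x_{rs}}, \quad \mathbf{B} = [b_{ij}]. \tag{F.36}
$$
Then it can be shown that
$$
\frac{\partial |\mathbf{Y}|}{\partial x_{rs}} = \operatorname{tr}(\mathbf{A}\mathbf{B}^T) = \operatorname{tr}(\mathbf{B}^T\mathbf{A}). \tag{F.37}
$$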
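The trace form (F.37) can be tried on a concrete case. The sketch below is illustrative only and assumes NumPy: it takes Y = MX with a fixed matrix M (a hypothetical name, used to avoid clashing with the A of (F.36)), for which the partial of y_ij with respect to x_rs equals M_ir when j = s and zero otherwise. The test matrices and the index pair (r, s) are arbitrary.

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])   # det = 7
X0 = np.array([[4.0, 1.0, 2.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])  # det = 19
n = 3
r, s = 0, 2   # differentiate with respect to x_rs (zero-based indices)

Y = M @ X0
# A holds the cofactors Y_ij; for nonsingular Y this is |Y| times inverse-transpose.
A = np.linalg.det(Y) * np.linalg.inv(Y).T
# B holds dy_ij/dx_rs: column s equals column r of M, all other entries vanish.
B = np.zeros((n, n))
B[:, s] = M[:, r]

# Reference value by central differences on |M X|.
h = 1e-6
E = np.zeros((n, n))
E[r, s] = 1.0
numeric = (np.linalg.det(M @ (X0 + h * E)) - np.linalg.det(M @ (X0 - h * E))) / (2.0 * h)

print(np.isclose(np.trace(A @ B.T), numeric, atol=1e-5),
      np.isclose(np.trace(B.T @ A), numeric, atol=1e-5))
```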
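Example F.5. If $\mathbf{X}$ is a nonsingular square matrix and $\mathbf{Z} = |\mathbf{X}|\,\mathbf{X}^{-1}$ its adjugate (the transpose of its cofactor matrix),
$$
\mathbf{G} = \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{Z}^T. \tag{F.38}
$$
If $\mathbf{X}$ is also symmetric,
$$
\mathbf{G} = \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = 2\mathbf{Z}^T - \operatorname{diag}(\mathbf{Z}^T). \tag{F.39}
$$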
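Both results of Example F.5 can be checked by finite differences. The sketch below assumes NumPy and uses hypothetical names throughout. For the symmetric case it assumes that (F.39) treats x_ij and x_ji as a single independent variable, so both entries are perturbed together; that reading is an assumption, not something the text states explicitly.

```python
import numpy as np

def gradient_matrix(f, X, h=1e-6, symmetric=False):
    """Central-difference estimate of dy/dX, entry by entry, as in (F.28).

    With symmetric=True the entries x_ij and x_ji are perturbed together,
    i.e. they are treated as one independent variable.
    """
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = 1.0
            if symmetric:
                E[j, i] = 1.0
            G[i, j] = (f(X + h * E) - f(X - h * E)) / (2.0 * h)
    return G

# General nonsingular matrix (det = 19).
X0 = np.array([[4.0, 1.0, 2.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])
Z = np.linalg.det(X0) * np.linalg.inv(X0)
G = gradient_matrix(np.linalg.det, X0)

# Symmetric nonsingular matrix (det = 18).
S0 = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
Zs = np.linalg.det(S0) * np.linalg.inv(S0)
Gs = gradient_matrix(np.linalg.det, S0, symmetric=True)

print(np.allclose(G, Z.T, atol=1e-6))                                     # (F.38)
print(np.allclose(Gs, 2.0 * Zs.T - np.diag(np.diag(Zs.T)), atol=1e-6))    # (F.39)
```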
F.5. The Matrix Differential
For a scalar function $f(\mathbf{x})$, where $\mathbf{x}$ is an $n$-vector, the ordinary differential of multivariate calculus
is defined as
$$
df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i. \tag{F.40}
$$
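In harmony with this formula, we define the differential of an $m \times n$ matrix $\mathbf{X} = [x_{ij}]$ to be
$$
d\mathbf{X} \overset{\text{def}}{=}
\begin{bmatrix}
dx_{11} & dx_{12} & \cdots & dx_{1n} \\
dx_{21} & dx_{22} & \cdots & dx_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
dx_{m1} & dx_{m2} & \cdots & dx_{mn}
\end{bmatrix}. \tag{F.41}
$$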
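This definition complies with the multiplicative and associative rules
$$
d(\alpha\mathbf{X}) = \alpha\, d\mathbf{X}, \qquad d(\mathbf{X} + \mathbf{Y}) = d\mathbf{X} + d\mathbf{Y}. \tag{F.42}
$$
If $\mathbf{X}$ and $\mathbf{Y}$ are product-conforming matrices, it can be verified that the differential of their product
is
$$
d(\mathbf{X}\mathbf{Y}) = (d\mathbf{X})\mathbf{Y} + \mathbf{X}(d\mathbf{Y}), \tag{F.43}
$$
which is an extension of the well-known rule $d(xy) = y\,dx + x\,dy$ for scalar functions.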
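The product rule (F.43) is a first-order statement: for small but finite increments, the exact change in XY differs from the right-hand side by the second-order term (dX)(dY). A minimal sketch assuming NumPy, with random test matrices and an arbitrary increment size:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 4))
Y = rng.standard_normal((4, 2))           # product-conforming with X
dX = 1e-5 * rng.standard_normal((3, 4))   # small finite increments
dY = 1e-5 * rng.standard_normal((4, 2))

exact_change = (X + dX) @ (Y + dY) - X @ Y
first_order = dX @ Y + X @ dY             # right-hand side of (F.43)
remainder = exact_change - first_order    # equals dX @ dY up to rounding

print(np.max(np.abs(remainder)))
```

The printed remainder is many orders of magnitude smaller than the increments themselves, and it shrinks quadratically as the increments are scaled down.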
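Example F.6. Let $\mathbf{X} = [x_{ij}]$ be a square nonsingular matrix of order $n$, and denote $\mathbf{Z} = |\mathbf{X}|\,\mathbf{X}^{-1}$. Find the
differential of the determinant of $\mathbf{X}$:
$$
d|\mathbf{X}| = \sum_{i,j} \frac{\partial |\mathbf{X}|}{\partial x_{ij}}\, dx_{ij} = \sum_{i,j} X_{ij}\, dx_{ij} = \operatorname{tr}(|\mathbf{X}|\,\mathbf{X}^{-1}\, d\mathbf{X}) = \operatorname{tr}(\mathbf{Z}\, d\mathbf{X}), \tag{F.44}
$$
where $X_{ij}$ denotes the cofactor of $x_{ij}$ in $\mathbf{X}$. The trace form follows because the matrix of cofactors $[X_{ij}]$ equals $\mathbf{Z}^T$, so that $\sum_{i,j} X_{ij}\, dx_{ij} = \operatorname{tr}((\mathbf{Z}^T)^T d\mathbf{X})$.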
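The differential of the determinant can be compared against an actual small change in |X|. The sketch below assumes NumPy and uses a deliberately nonsymmetric test matrix with an arbitrary small increment. It evaluates the cofactor sum directly, using the fact that for nonsingular X the cofactor matrix is |X| times the inverse-transpose, and also the equivalent trace tr(Z dX) with Z = |X| X^{-1}.

```python
import numpy as np

X = np.array([[4.0, 1.0, 2.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])   # det = 19
dX = 1e-6 * np.array([[1.0, -2.0, 0.5], [0.3, 1.0, -1.0], [2.0, 0.7, -0.4]])

det_X = np.linalg.det(X)
Z = det_X * np.linalg.inv(X)              # Z = |X| X^{-1}
cofactors = Z.T                           # matrix of cofactors X_ij

actual = np.linalg.det(X + dX) - det_X    # finite change in the determinant
cofactor_sum = np.sum(cofactors * dX)     # sum over i, j of X_ij dx_ij
trace_form = np.trace(Z @ dX)             # the same quantity written as a trace

print(np.isclose(cofactor_sum, trace_form),
      np.isclose(actual, trace_form, rtol=1e-4, atol=1e-9))
```

The agreement with the finite change holds only to first order in dX, which is why a tolerance is used in the second comparison.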
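Example F.7. With the same assumptions as above, find $d(\mathbf{X}^{-1})$. The quickest derivation follows by differentiating
both sides of the identity $\mathbf{X}^{-1}\mathbf{X} = \mathbf{I}$:
$$
d(\mathbf{X}^{-1})\,\mathbf{X} + \mathbf{X}^{-1}\, d\mathbf{X} = \mathbf{0}, \tag{F.45}
$$
from which
$$
d(\mathbf{X}^{-1}) = -\mathbf{X}^{-1}\, d\mathbf{X}\, \mathbf{X}^{-1}. \tag{F.46}
$$
If $\mathbf{X}$ reduces to the scalar $x$ we have
$$
d\left( \frac{1}{x} \right) = -\frac{dx}{x^2}. \tag{F.47}
$$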
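Formula (F.46) can be checked against the actual change in the inverse under a small perturbation. This is a sketch assuming NumPy, with an arbitrary nonsingular test matrix and increment; the match is expected only to first order in dX.

```python
import numpy as np

X = np.array([[4.0, 1.0, 2.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])   # det = 19
dX = 1e-6 * np.array([[1.0, -2.0, 0.5], [0.3, 1.0, -1.0], [2.0, 0.7, -0.4]])

X_inv = np.linalg.inv(X)
actual = np.linalg.inv(X + dX) - X_inv    # finite change in the inverse
predicted = -X_inv @ dX @ X_inv           # right-hand side of (F.46)

print(np.allclose(actual, predicted, atol=1e-9))

# Scalar special case (F.47): d(1/x) = -dx / x^2.
x, dx = 2.0, 1e-6
print(np.isclose(1.0 / (x + dx) - 1.0 / x, -dx / x ** 2, atol=1e-11))
```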