Contents

1 Basics 5
  1.1 Trace and Determinants 5
  1.2 The Special Case 2x2 5
2 Derivatives 7
  2.1 Derivatives of a Determinant 7
  2.2 Derivatives of an Inverse 8
  2.3 Derivatives of Matrices, Vectors and Scalar Forms 9
  2.4 Derivatives of Traces 11
  2.5 Derivatives of Structured Matrices 12
3 Inverses 15
  3.1 Basic 15
  3.2 Exact Relations 16
  3.3 Implication on Inverses 17
  3.4 Approximations 17
  3.5 Generalized Inverse 17
  3.6 Pseudo Inverse 17
4 Complex Matrices 19
  4.1 Complex Derivatives 19
5 Decompositions 22
  5.1 Eigenvalues and Eigenvectors 22
  5.2 Singular Value Decomposition 22
  5.3 Triangular Decomposition 23
6 Statistics and Probability 24
  6.1 Definition of Moments 24
  6.2 Expectation of Linear Combinations 25
  6.3 Weighted Scalar Variable 26
7 Gaussians 27
  7.1 Basics 27
  7.2 Moments 29
  7.3 Miscellaneous 31
  7.4 Mixture of Gaussians 32
8 Special Matrices 33
  8.1 Units, Permutation and Shift 33
  8.2 The Singleentry Matrix 34
  8.3 Symmetric and Antisymmetric 36
  8.4 Toeplitz Matrices 36
  8.5 Positive Definite and Semi-definite Matrices 38
  8.6 Block matrices 39
9 Functions and Operators 41
  9.1 Functions and Series 41
  9.2 Kronecker and Vec Operator 42
  9.3 Solutions to Systems of Equations 43
  9.4 Matrix Norms 45
  9.5 Rank 46
  9.6 Integral Involving Dirac Delta Functions 46
  9.7 Miscellaneous 46
A One-dimensional Results 47
  A.1 Gaussian 47
  A.2 One Dimensional Mixture of Gaussians 48
B Proofs and Details 50
  B.1 Misc Proofs 50

Petersen & Pedersen, The Matrix Cookbook, Version: October 3, 2005
Notation and Nomenclature

A           Matrix
A_ij        Matrix indexed for some purpose
A_i         Matrix indexed for some purpose
A^{ij}      Matrix indexed for some purpose
A^{(n)}     Matrix indexed for some purpose or the n.th power of a square matrix
A^{-1}      The inverse matrix of the matrix A
A^+         The pseudo inverse matrix of the matrix A (see Sec. 3.6)
A^{1/2}     The square root of a matrix (if unique), not elementwise
(A)_ij      The (i, j).th entry of the matrix A
A_ij        The (i, j).th entry of the matrix A
[A]_ij      The ij-submatrix, i.e. A with i.th row and j.th column deleted
a           Vector
a_i         Vector indexed for some purpose
a_i         The i.th element of the vector a
a           Scalar
Re(z)       Real part of a scalar
Re(z)       Real part of a vector
Re(Z)       Real part of a matrix
Im(z)       Imaginary part of a scalar
Im(z)       Imaginary part of a vector
Im(Z)       Imaginary part of a matrix
det(A)      Determinant of A
Tr(A)       Trace of the matrix A
diag(A)     Diagonal matrix of the matrix A, i.e. (diag(A))_ij = δ_ij A_ij
vec(A)      The vector-version of the matrix A (see Sec. 9.2.2)
||A||       Matrix norm (subscript if any denotes what norm)
A^T         Transposed matrix
A^*         Complex conjugated matrix
A^H         Transposed and complex conjugated matrix (Hermitian)
A ∘ B       Hadamard (elementwise) product
A ⊗ B       Kronecker product
0           The null matrix, zero in all entries
I           The identity matrix
J^ij        The single-entry matrix, 1 at (i, j) and zero elsewhere
1 Basics

(AB)^{-1} = B^{-1} A^{-1}
(ABC...)^{-1} = ...C^{-1} B^{-1} A^{-1}
(A^T)^{-1} = (A^{-1})^T
(A + B)^T = A^T + B^T
(AB)^T = B^T A^T
(ABC...)^T = ...C^T B^T A^T
(A^H)^{-1} = (A^{-1})^H
(A + B)^H = A^H + B^H
(AB)^H = B^H A^H
(ABC...)^H = ...C^H B^H A^H

1.1 Trace and Determinants

Tr(A) = Σ_i A_ii
Tr(A) = Σ_i λ_i,   λ_i = eig(A)
Tr(A) = Tr(A^T)
Tr(AB) = Tr(BA)
Tr(A + B) = Tr(A) + Tr(B)
Tr(ABC) = Tr(BCA) = Tr(CAB)

det(A) = Π_i λ_i,   λ_i = eig(A)
det(AB) = det(A) det(B)
det(A^{-1}) = 1/det(A)

1.2 The Special Case 2x2

Consider the matrix A,

    A = [ A_11  A_12 ]
        [ A_21  A_22 ]

Determinant and trace,

    det(A) = A_11 A_22 − A_12 A_21
    Tr(A) = A_11 + A_22

Eigenvalues: λ² − λ·Tr(A) + det(A) = 0

    λ_1 = (Tr(A) + √(Tr(A)² − 4 det(A))) / 2
    λ_2 = (Tr(A) − √(Tr(A)² − 4 det(A))) / 2

    λ_1 + λ_2 = Tr(A)
    λ_1 λ_2 = det(A)

Eigenvectors:

    v_1 ∝ [ A_12 ; λ_1 − A_11 ]
    v_2 ∝ [ A_12 ; λ_2 − A_11 ]

Inverse:

    A^{-1} = (1/det(A)) [  A_22  −A_12 ]
                        [ −A_21   A_11 ]
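The 2x2 formulas above are easy to sanity-check numerically. The following is a minimal numpy sketch (the example matrix is hypothetical, not taken from the text) that verifies the eigenvalue and adjugate-inverse expressions:

```python
import numpy as np

# Hypothetical 2x2 example matrix (not from the text).
A = np.array([[3.0, 1.0],
              [2.0, 4.0]])

tr = np.trace(A)
det = np.linalg.det(A)
disc = np.sqrt(tr**2 - 4*det)

# Roots of lambda^2 - lambda*Tr(A) + det(A) = 0
lam1 = (tr + disc) / 2
lam2 = (tr - disc) / 2

# Inverse via the explicit 2x2 adjugate formula
A_inv = (1/det) * np.array([[ A[1, 1], -A[0, 1]],
                            [-A[1, 0],  A[0, 0]]])
```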
2 Derivatives

The derivative of a vector y with respect to a scalar x is arranged as (∂y/∂x)_i = ∂y_i/∂x, and the derivative of a vector y with respect to a vector x as (∂y/∂x)_ij = ∂y_j/∂x_i.

The following rules are general and very useful when deriving the differential of an expression ([12]):

∂A = 0                              (A is a constant)    (1)
∂(αX) = α ∂X                                             (2)
∂(X + Y) = ∂X + ∂Y                                       (3)
∂(Tr(X)) = Tr(∂X)                                        (4)
∂(XY) = (∂X)Y + X(∂Y)                                    (5)
∂(X ∘ Y) = (∂X) ∘ Y + X ∘ (∂Y)                           (6)
∂(X ⊗ Y) = (∂X) ⊗ Y + X ⊗ (∂Y)                           (7)
∂(X^{-1}) = −X^{-1}(∂X)X^{-1}                            (8)
∂(det(X)) = det(X) Tr(X^{-1} ∂X)                         (9)
∂(ln(det(X))) = Tr(X^{-1} ∂X)                            (10)
∂X^T = (∂X)^T                                            (11)
∂X^H = (∂X)^H                                            (12)

2.1 Derivatives of a Determinant

2.1.1 General form

∂det(Y)/∂x = det(Y) Tr[Y^{-1} ∂Y/∂x]

2.1.2 Linear forms

∂det(X)/∂X = det(X) (X^{-1})^T
∂det(AXB)/∂X = det(AXB) (X^{-1})^T = det(AXB) (X^T)^{-1}
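The linear-form identity ∂det(X)/∂X = det(X)(X^{-1})^T can be checked entry by entry against a finite-difference gradient. A minimal sketch with a hypothetical example matrix:

```python
import numpy as np

# Hypothetical invertible example matrix (not from the text).
X = np.array([[2.0, 0.5],
              [0.3, 1.5]])
eps = 1e-6

# Numerical gradient of det at X, one entry at a time (central differences)
num = np.zeros_like(X)
for i in range(2):
    for j in range(2):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * eps)

# Closed form from the text: det(X) * (X^{-1})^T
closed = np.linalg.det(X) * np.linalg.inv(X).T
```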
2.1.3 Square forms

If X is square and invertible, then

∂det(X^T A X)/∂X = 2 det(X^T A X) X^{-T}    (13)

2.1.4 Other nonlinear forms

∂ln det(X^T X)/∂X = 2 (X^+)^T
∂ln det(X^T X)/∂X^+ = −2 X^T
∂ln |det(X)|/∂X = (X^{-1})^T = (X^T)^{-1}
∂det(X^k)/∂X = k det(X^k) X^{-T}

2.2 Derivatives of an Inverse

The basic identity is

∂Y^{-1}/∂x = −Y^{-1} (∂Y/∂x) Y^{-1}

from which it follows

∂(a^T X^{-1} b)/∂X = −X^{-T} a b^T X^{-T}
∂det(X^{-1})/∂X = −det(X^{-1}) (X^{-1})^T
∂Tr(A X^{-1} B)/∂X = −(X^{-1} B A X^{-1})^T
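The identity for the derivative of an inverse can be verified numerically for a matrix depending on a scalar. A minimal sketch, using the hypothetical parametrization Y(x) = A + xB (so that ∂Y/∂x = B):

```python
import numpy as np

# Hypothetical example matrices (not from the text).
A = np.array([[2.0, 0.2], [0.1, 3.0]])
B = np.array([[0.5, 1.0], [0.0, 0.7]])
Y = lambda x: A + x * B          # dY/dx = B
x0, eps = 0.4, 1e-6

# Central finite difference of Y(x)^{-1} at x0
num = (np.linalg.inv(Y(x0 + eps)) - np.linalg.inv(Y(x0 - eps))) / (2 * eps)

# Closed form: -Y^{-1} (dY/dx) Y^{-1}
Yinv = np.linalg.inv(Y(x0))
closed = -Yinv @ B @ Yinv
```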
2.3 Derivatives of Matrices, Vectors and Scalar Forms

2.3.1 First Order

∂(x^T a)/∂x = ∂(a^T x)/∂x = a
∂(a^T X b)/∂X = a b^T
∂(a^T X^T b)/∂X = b a^T
∂(a^T X a)/∂X = ∂(a^T X^T a)/∂X = a a^T
∂X/∂X_ij = J^ij
∂(XA)_ij/∂X_mn = δ_im (A)_nj = (J^mn A)_ij
∂(X^T A)_ij/∂X_mn = δ_in (A)_mj = (J^nm A)_ij

2.3.2 Second Order

∂/∂X_ij Σ_klmn X_kl X_mn = 2 Σ_kl X_kl
∂(b^T X^T X c)/∂X = X(b c^T + c b^T)
∂((Bx + b)^T C (Dx + d))/∂x = B^T C (Dx + d) + D^T C^T (Bx + b)
∂(X^T B X)_kl/∂X_ij = δ_lj (X^T B)_ki + δ_kj (BX)_il
∂(X^T B X)/∂X_ij = X^T B J^ij + J^ji B X,   (J^ij)_kl = δ_ik δ_jl

See Sec 8.2 for useful properties of the single-entry matrix J^ij.

∂(x^T B x)/∂x = (B + B^T) x
∂(b^T X^T D X c)/∂X = D^T X b c^T + D X c b^T
∂/∂X (Xb + c)^T D (Xb + c) = (D + D^T)(Xb + c) b^T
Assume W is symmetric, then

∂/∂s (x − As)^T W (x − As) = −2 A^T W (x − As)
∂/∂x (x − s)^T W (x − s) = 2 W (x − s)
∂/∂s (x − s)^T W (x − s) = −2 W (x − s)
∂/∂x (x − As)^T W (x − As) = 2 W (x − As)
∂/∂A (x − As)^T W (x − As) = −2 W (x − As) s^T

2.3.3 Higher order and non-linear

∂(a^T X^n b)/∂X = Σ_{r=0}^{n−1} (X^r a b^T X^{n−1−r})^T    (14)

∂(a^T (X^n)^T X^n b)/∂X = Σ_{r=0}^{n−1} [ X^{n−1−r} a b^T (X^n)^T X^r + (X^r)^T X^n a b^T (X^{n−1−r})^T ]    (15)

For vector functions r = r(x) and s = s(x) it holds that

∂(s^T A s)/∂x = [∂s/∂x]^T (A + A^T) s
∂(r^T A s)/∂x = [∂r/∂x]^T A s + [∂s/∂x]^T A^T r

2.3.4 Gradient and Hessian

Using the above we have for the gradient and the Hessian of f = x^T A x + b^T x

∇_x f = ∂f/∂x = (A + A^T) x + b
∂²f/∂x∂x^T = A + A^T
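The gradient expression for the quadratic form f = x^T A x + b^T x can be checked against finite differences. A minimal sketch (hypothetical A, b, and test point, not from the text; A is deliberately non-symmetric so the (A + A^T) factor matters):

```python
import numpy as np

# Hypothetical example values (not from the text).
A = np.array([[1.0, 2.0], [0.5, 3.0]])   # non-symmetric on purpose
b = np.array([1.0, -2.0])
f = lambda x: x @ A @ x + b @ x
x0, eps = np.array([0.3, -0.7]), 1e-5

# Central-difference gradient
grad_num = np.array([(f(x0 + eps*e) - f(x0 - eps*e)) / (2*eps)
                     for e in np.eye(2)])

# Closed forms from the text
grad_closed = (A + A.T) @ x0 + b
hess_closed = A + A.T
```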
2.4 Derivatives of Traces

2.4.1 First Order

∂Tr(X)/∂X = I
∂Tr(XA)/∂X = A^T    (16)
∂Tr(AXB)/∂X = A^T B^T
∂Tr(AX^T B)/∂X = B A
∂Tr(X^T A)/∂X = A
∂Tr(AX^T)/∂X = A

2.4.2 Second Order

∂Tr(X²)/∂X = 2 X^T
∂Tr(X^T B X)/∂X = B X + B^T X
∂Tr(X B X^T)/∂X = X B^T + X B
∂Tr(A X B X)/∂X = A^T X^T B^T + B^T X^T A^T
∂Tr(X^T X)/∂X = 2 X
∂Tr(B X X^T)/∂X = (B + B^T) X
∂Tr(B^T X^T C X B)/∂X = C^T X B B^T + C X B B^T
∂Tr(X^T B X C)/∂X = B X C + B^T X C^T
∂Tr(A X B X^T C)/∂X = A^T C^T X B^T + C A X B

See [7].
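Any of the trace derivatives above can be verified the same way as the determinant derivatives: compare the closed form against an entrywise finite difference. A minimal sketch for ∂Tr(AXB)/∂X = A^T B^T, with hypothetical example matrices:

```python
import numpy as np

# Hypothetical example matrices (not from the text).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.5, -1.0], [2.0, 0.0]])
X = np.array([[0.2, 0.7], [-0.3, 1.1]])
eps = 1e-6

# Central-difference gradient of Tr(A X B) with respect to X
num = np.zeros_like(X)
for i in range(2):
    for j in range(2):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (np.trace(A @ (X + E) @ B) - np.trace(A @ (X - E) @ B)) / (2 * eps)

# Closed form from the text
closed = A.T @ B.T
```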
2.4.3 Higher Order

∂Tr(X^k)/∂X = k (X^{k−1})^T
∂Tr(A X^k)/∂X = Σ_{r=0}^{k−1} (X^r A X^{k−r−1})^T
∂Tr(B^T X^T C X X^T C X B)/∂X = C X X^T C X B B^T + C^T X B B^T X^T C^T X + C X B B^T X^T C X + C^T X X^T C^T X B B^T

2.4.4 Other

2.5 Derivatives of Structured Matrices

Assume that the matrix A has some structure, i.e. it is symmetric, Toeplitz, etc. In that case the derivatives of the previous sections do not apply in general. Instead, consider the following general rule for differentiating a scalar function f(A):

df/dA_ij = Σ_kl (∂f/∂A_kl)(∂A_kl/∂A_ij) = Tr[ (∂f/∂A)^T ∂A/∂A_ij ]    (17)
If A depends on a scalar parameter x, the chain rule can then be written the following way:

df/dx = Σ_{k=1}^{M} Σ_{l=1}^{N} (∂f/∂A_kl)(∂A_kl/∂x)    (18)

2.5.1 Symmetric

If A is symmetric, then A_ij = A_ji, and the derivative of a scalar function becomes

df/dA = [∂f/∂A] + [∂f/∂A]^T − diag[∂f/∂A]

That is, e.g. ([5], [18]):

∂Tr(AX)/∂X = A + A^T − (A ∘ I)    (20)
∂det(X)/∂X = det(X)(2X^{-1} − (X^{-1} ∘ I))    (21)
∂ln det(X)/∂X = 2X^{-1} − (X^{-1} ∘ I)    (22)

2.5.2 Diagonal

If X is diagonal, then

∂Tr(AX)/∂X = A ∘ I    (23)

2.5.3 Toeplitz

Like symmetric and diagonal matrices, Toeplitz matrices have a special structure which should be taken into account when the derivative with respect to a matrix with Toeplitz structure is computed.
For the unconstrained Toeplitz matrix T the derivative is

∂Tr(AT)/∂T = ∂Tr(TA)/∂T =

  [ Tr(A)            Tr([A^T]_{n1})   ...            A_{1n}         ]
  [ Tr([A^T]_{1n})   Tr(A)            ...             ...           ]
  [  ...              ...             ...            Tr([A^T]_{n1}) ]
  [ A_{n1}            ...            Tr([A^T]_{1n})  Tr(A)          ]

  ≡ α(A)    (24)

As can be seen, the derivative α(A) also has a Toeplitz structure. Each value on the main diagonal is the sum of all the diagonal values in A; the values on the diagonals next to the main diagonal equal the sum of the corresponding diagonal of A^T. This result is only valid for the unconstrained Toeplitz matrix. If the Toeplitz matrix also is symmetric, the same derivative yields

∂Tr(AT)/∂T = ∂Tr(TA)/∂T = α(A) + α(A)^T − α(A) ∘ I    (25)
3 Inverses

3.1 Basic

3.1.1 Definition

The inverse A^{-1} of a matrix A ∈ R^{n×n} is defined such that

A A^{-1} = A^{-1} A = I    (26)

where I is the n×n identity matrix.

3.1.2 Cofactors and Adjoint

The submatrix [A]_ij is the (n−1)×(n−1) matrix obtained by deleting the i.th row and the j.th column of A. The (i, j) cofactor of a matrix is defined as

cof(A, i, j) = (−1)^{i+j} det([A]_ij)    (27)

The matrix of cofactors is

cof(A) = [ cof(A, 1, 1)  ...  cof(A, 1, n) ; ... cof(A, i, j) ... ; cof(A, n, 1)  ...  cof(A, n, n) ]    (28)

and the adjoint matrix is its transpose,

adj(A) = (cof(A))^T    (29)

3.1.3 Determinant

The determinant can be expanded along the first row as

det(A) = Σ_{j=1}^{n} (−1)^{j+1} A_1j det([A]_1j) = Σ_{j=1}^{n} A_1j cof(A, 1, j)    (30)

3.1.4 Construction

The inverse matrix can be constructed, given the adjoint matrix, as

A^{-1} = (1/det(A)) · adj(A)    (31)
3.1.5 Condition number

The condition number of a matrix c(A) is the ratio between the largest and the smallest singular value of the matrix (see Section 5.2 on singular values),

c(A) = d_+ / d_−

The condition number can be used to measure how singular a matrix is. If the condition number is large, it indicates that the matrix is nearly singular. The condition number can also be estimated from the matrix norms. Here

c(A) = ||A|| · ||A^{-1}||,    (32)

where || · || is a norm such as e.g. the 1-norm, the 2-norm, the ∞-norm or the Frobenius norm (see Sec 9.4 for more on matrix norms).

3.2 Exact Relations

3.2.1 The Woodbury identity

(A + C B C^T)^{-1} = A^{-1} − A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1}
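The Woodbury identity is easy to verify numerically. A minimal sketch with hypothetical example matrices (the identity requires A and B invertible):

```python
import numpy as np

# Hypothetical example matrices (not from the text): A 3x3, B 2x2, C 3x2.
A = np.diag([2.0, 3.0, 4.0])
B = np.array([[1.0, 0.2], [0.2, 2.0]])
C = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])

# Left-hand side: direct inverse of the rank-updated matrix
lhs = np.linalg.inv(A + C @ B @ C.T)

# Right-hand side: Woodbury identity
Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
rhs = Ai - Ai @ C @ np.linalg.inv(Bi + C.T @ Ai @ C) @ C.T @ Ai
```

The identity is the basis of efficient low-rank updates: when A^{-1} is cheap (e.g. diagonal) and B is small, the right-hand side only ever inverts small matrices.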
3.3 Implication on Inverses

If (A + B)^{-1} = A^{-1} + B^{-1}, then

A B^{-1} A = B A^{-1} B

See [15].

3.3.1 A PosDef identity

3.4 Approximations

(I + A)^{-1} = I − A + A² − A³ + ...

A − A(I + A)^{-1} A ≅ I − A^{-1}    (if A is large and symmetric)

If σ² is small then

(Q + σ² M)^{-1} ≅ Q^{-1} − σ² Q^{-1} M Q^{-1}

3.5 Generalized Inverse

3.5.1 Definition

A generalized inverse matrix of the matrix A is any matrix A^− such that (see [16])

A A^− A = A

The matrix A^− is not unique.

3.6 Pseudo Inverse

3.6.1 Definition

The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A^+ that fulfils

I)   A A^+ A = A
II)  A^+ A A^+ = A^+
III) A A^+ symmetric
IV)  A^+ A symmetric
3.6.2 Properties

(A^+)^+ = A
(A^T)^+ = (A^+)^T
(cA)^+ = (1/c) A^+
(A^T A)^+ = A^+ (A^T)^+
(A A^T)^+ = (A^T)^+ A^+

Assume A to have full rank, then

(A A^+)(A A^+) = A A^+
(A^+ A)(A^+ A) = A^+ A
Tr(A A^+) = rank(A A^+)    (See [16])
Tr(A^+ A) = rank(A^+ A)    (See [16])

3.6.3 Construction

Assume that A has full rank, then

A n×n square,  rank(A) = n  ⇒  A^+ = A^{-1}
A n×m broad,   rank(A) = n  ⇒  A^+ = A^T (A A^T)^{-1}
A n×m tall,    rank(A) = m  ⇒  A^+ = (A^T A)^{-1} A^T

Assume A does not have full rank, i.e. A is n×m and rank(A) = r < min(n, m). The pseudo inverse A^+ can be constructed from the singular value decomposition A = U D V^T by

A^+ = V D^+ U^T

A different way is this: there always exist two matrices C (n×r) and D (r×m) of rank r, such that A = C D. Using these matrices it holds that

A^+ = D^T (D D^T)^{-1} (C^T C)^{-1} C^T

See [3].
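The rank-factorization construction of the pseudo inverse can be checked against numpy's SVD-based pinv. A minimal sketch with a hypothetical rank-2 factorization A = CD:

```python
import numpy as np

# Hypothetical rank-2 factors (not from the text): C is 3x2, D is 2x4.
C = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
D = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
A = C @ D                     # 3x4 matrix of rank 2 (not full rank)

# Construction from the text: A^+ = D^T (D D^T)^{-1} (C^T C)^{-1} C^T
Ap = D.T @ np.linalg.inv(D @ D.T) @ np.linalg.inv(C.T @ C) @ C.T
```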
4 Complex Matrices

4.1 Complex Derivatives

In order to differentiate an expression f(z) with respect to a complex z, the Cauchy-Riemann equations have to be satisfied ([7]):

df(z)/dz = ∂Re(f(z))/∂Re z + i ∂Im(f(z))/∂Re z    (33)

and

df(z)/dz = −i ∂Re(f(z))/∂Im z + ∂Im(f(z))/∂Im z    (34)

or in a more compact form:

∂f(z)/∂Im z = i ∂f(z)/∂Re z    (35)

A complex function that satisfies the Cauchy-Riemann equations for all values z is said to be analytic. If the Cauchy-Riemann equations are not satisfied, the usual derivative does not exist. For such non-analytic functions the generalized complex derivative can be used,

df(z)/dz = (1/2)(∂f(z)/∂Re z − i ∂f(z)/∂Im z)    (36)

together with the conjugate complex derivative

df(z)/dz* = (1/2)(∂f(z)/∂Re z + i ∂f(z)/∂Im z)    (37)

The complex gradient vector is then given by

∇f(z) = 2 df(z)/dz* = ∂f(z)/∂Re z + i ∂f(z)/∂Im z    (39)
Similarly, the complex gradient matrix of a scalar function f(Z) of a complex matrix Z is

∇f(Z) = 2 df(Z)/dZ* = ∂f(Z)/∂Re Z + i ∂f(Z)/∂Im Z    (40)

4.1.1 The Chain Rule for Complex Numbers

The chain rule is a little more complicated when the function of a complex variable u = f(x) is non-analytic. For a non-analytic function, the following chain rule can be applied ([7])

∂g(u)/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂u*)(∂u*/∂x)    (41)

Notice, if the function is analytic, the second term reduces to zero, and the expression reduces to the normal well-known chain rule. For the matrix derivative of a scalar function g(U), the chain rule can be written the following way:

∂g(U)/∂X = ∂Tr[(∂g(U)/∂U)^T U]/∂X + ∂Tr[(∂g(U)/∂U*)^T U*]/∂X    (42)

4.1.2 Complex Derivatives of Traces

If the derivatives involve complex numbers, the conjugate transpose is often involved. The most useful way to show a complex derivative is to show the derivative with respect to the real and the imaginary part separately. An easy example is:

∂Tr(X*)/∂Re X = ∂Tr(X^H)/∂Re X = I    (43)
i ∂Tr(X*)/∂Im X = i ∂Tr(X^H)/∂Im X = I    (44)

Since the two results have the same sign, the conjugate complex derivative (37) should be used.

∂Tr(X)/∂Re X = ∂Tr(X^T)/∂Re X = I    (45)
i ∂Tr(X)/∂Im X = i ∂Tr(X^T)/∂Im X = −I    (46)

Here, the two results have different signs, and the generalized complex derivative (36) should be used. Hereby, it can be seen that (16) holds even if X is a complex matrix.
∂Tr(A X^H)/∂Re X = A    (47)
i ∂Tr(A X^H)/∂Im X = A    (48)

∂Tr(A X)/∂Re X = A^T    (49)
i ∂Tr(A X)/∂Im X = −A^T    (50)

∂Tr(X X^H)/∂Re X = ∂Tr(X^H X)/∂Re X = 2 Re X    (51)
i ∂Tr(X X^H)/∂Im X = i ∂Tr(X^H X)/∂Im X = i 2 Im X    (52)

By inserting (51) and (52) in (36) and (37), it can be seen that

dTr(X X^H)/dX = X*    (53)
dTr(X X^H)/dX* = X    (54)

Since the function Tr(X X^H) is a real function of the complex matrix X, the complex gradient matrix (40) is given by

∇Tr(X X^H) = 2 dTr(X X^H)/dX* = 2X    (55)

4.1.3 Complex Derivative Involving Determinants

For the determinant of a quadratic form one finds

d det(X^H A X)/dX = (1/2)[∂det(X^H A X)/∂Re X − i ∂det(X^H A X)/∂Im X]
                  = det(X^H A X) [(X^H A X)^{-1} X^H A]^T    (56)

d det(X^H A X)/dX* = (1/2)[∂det(X^H A X)/∂Re X + i ∂det(X^H A X)/∂Im X]
                   = det(X^H A X) A X (X^H A X)^{-1}    (57)
5 Decompositions

5.1 Eigenvalues and Eigenvectors

5.1.1 Definition

The eigenvectors v and eigenvalues λ are defined by

A v = λ v

Collecting the eigenvectors as the columns of V and the eigenvalues in the diagonal matrix D, this can be written

A V = V D,   (D)_ij = δ_ij λ_i

5.1.2 General Properties

eig(AB) = eig(BA)
A is n×m  ⇒  at most min(n, m) distinct λ_i
rank(A) = r  ⇒  at most r non-zero λ_i

5.1.3 Symmetric

Assume A is symmetric, then

V V^T = I            (i.e. V is orthogonal)
λ_i ∈ R              (i.e. λ_i is real)
Tr(A^p) = Σ_i λ_i^p
eig(I + cA) = 1 + c λ_i
eig(A^{-1}) = λ_i^{-1}

5.2 Singular Value Decomposition

Any n×m matrix A can be written as

A = U D V^T

where

U = eigenvectors of A A^T         (n×n)
D = √(diag(eig(A A^T)))           (n×m)
V = eigenvectors of A^T A         (m×m)

5.2.1 Symmetric Square decomposed into squares

Assume A to be n×n and symmetric. Then

A = V D V^T

where D is diagonal with the eigenvalues of A and V is orthogonal and the eigenvectors of A.
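The symmetric eigendecomposition A = V D V^T is directly available in numpy. A minimal sketch with a hypothetical symmetric example matrix:

```python
import numpy as np

# Hypothetical symmetric example matrix (not from the text).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, V = np.linalg.eigh(A)   # real eigenvalues, orthonormal eigenvector columns
D = np.diag(lam)
```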
5.2.2 Square decomposed into squares

Assume A ∈ R^{n×n}. Then

A = V D U^T

where D is diagonal with the square root of the eigenvalues of AA^T, V is the eigenvectors of AA^T and U^T is the eigenvectors of A^T A.

5.2.3 Square decomposed into rectangular

Assume ΔV ΔD ΔU^T = 0. Then we can expand the SVD of A into

A = [ V  ΔV ] [ D  0 ; 0  ΔD ] [ U  ΔU ]^T

where the SVD of A is A = V D U^T.

5.2.4 Rectangular decomposition I

Assume A is n×m. Then

A = V D U^T

where D is diagonal with the square root of the eigenvalues of AA^T, V is the eigenvectors of AA^T and U^T is the eigenvectors of A^T A.

5.2.5 Rectangular decomposition II

Assume A is n×m. Then A = V D U^T.

5.2.6 Rectangular decomposition III

Assume A is n×m. Then

A = V D U^T

where D is diagonal with the square root of the eigenvalues of AA^T, V is the eigenvectors of AA^T and U^T is the eigenvectors of A^T A.

5.3 Triangular Decomposition

5.3.1 Cholesky-decomposition
6 Statistics and Probability

6.1 Definition of Moments

Assume x ∈ R^{n×1} to be a random variable.

6.1.1 Mean

The vector of means m is defined by (m)_i = ⟨x_i⟩.

6.1.2 Covariance

The matrix of covariance M is defined by (M)_ij = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩, or alternatively as M = ⟨(x − m)(x − m)^T⟩.

6.1.3 Third moments

The third central moments can be collected in the n×n² matrix

M_3 = [ m^(3)_{::1}  m^(3)_{::2}  ...  m^(3)_{::n} ]

where : denotes all elements within the given index. M_3 can alternatively be expressed as

M_3 = ⟨(x − m)(x − m)^T ⊗ (x − m)^T⟩

6.1.4 Fourth moments

The fourth central moments

m^(4)_{ijkl} = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)(x_k − ⟨x_k⟩)(x_l − ⟨x_l⟩)⟩

can be collected as

M_4 = [ m^(4)_{::11} m^(4)_{::21} ... m^(4)_{::n1} | m^(4)_{::12} m^(4)_{::22} ... m^(4)_{::n2} | ... | m^(4)_{::1n} m^(4)_{::2n} ... m^(4)_{::nn} ]

or alternatively as

M_4 = ⟨(x − m)(x − m)^T ⊗ (x − m)^T ⊗ (x − m)^T⟩
6.2 Expectation of Linear Combinations

6.2.1 Linear Forms

Assume x to be a stochastic vector with mean m. Then

E[Ax + b] = A m + b
E[Ax] = A m
E[x + b] = m + b

6.2.2 Quadratic Forms

Assume x to be a stochastic vector with mean m and covariance M. Then

E[(Ax + a)(Bx + b)^T] = A M B^T + (A m + a)(B m + b)^T
E[x x^T] = M + m m^T
E[x a^T x] = (M + m m^T) a
E[x^T a x^T] = a^T (M + m m^T)
E[(Ax)(Ax)^T] = A (M + m m^T) A^T
E[(x + a)(x + a)^T] = M + (m + a)(m + a)^T
E[(Ax + a)^T (Bx + b)] = Tr(A M B^T) + (A m + a)^T (B m + b)
E[x^T x] = Tr(M) + m^T m
E[x^T A x] = Tr(A M) + m^T A m
E[(Ax)^T (Ax)] = Tr(A M A^T) + (A m)^T (A m)
E[(x + a)^T (x + a)] = Tr(M) + (m + a)^T (m + a)

See [7].

6.2.3 Cubic Forms

Assume x to be a stochastic vector with mean m, covariance M and third central moment v_3 = E[(x − m)³] (elementwise). Then (see [7])

E[(Ax + a)(Bx + b)^T (Cx + c)] = A diag(B^T C) v_3 + Tr(B M C^T)(A m + a) + A M C^T (B m + b) + (A M B^T + (A m + a)(B m + b)^T)(C m + c)

E[x x^T x] = v_3 + 2 M m + (Tr(M) + m^T m) m

E[(Ax + a)(Ax + a)^T (Ax + a)] = A diag(A^T A) v_3 + [2 A M A^T + (A m + a)(A m + a)^T](A m + a) + Tr(A M A^T)(A m + a)

6.3 Weighted Scalar Variable

Assume x ∈ R^{n×1} to be a random variable, w ∈ R^{n×1} to be a vector of constants and y to be the linear combination y = w^T x. Assume further that m, M_2, M_3 and M_4 denote the mean, covariance, and central third and fourth moment matrices of the variable x. Then it holds that

⟨y⟩ = w^T m
⟨(y − ⟨y⟩)²⟩ = w^T M_2 w
⟨(y − ⟨y⟩)³⟩ = w^T M_3 (w ⊗ w)
⟨(y − ⟨y⟩)⁴⟩ = w^T M_4 (w ⊗ w ⊗ w)
7 Gaussians

7.1 Basics

7.1.1 Density and normalization

The density of x ∼ N(m, Σ) is

p(x) = (1/√det(2πΣ)) exp[ −(1/2)(x − m)^T Σ^{-1} (x − m) ]

Integration and normalization:

∫ exp[ −(1/2)(x − m)^T Σ^{-1} (x − m) ] dx = √det(2πΣ)

∫ exp[ −(1/2) x^T A x + b^T x ] dx = √det(2πA^{-1}) exp[ (1/2) b^T A^{-1} b ]

∫ exp[ −(1/2) Tr(S^T A S) + Tr(B^T S) ] dS = √det(2πA^{-1}) exp[ (1/2) Tr(B^T A^{-1} B) ]

The derivatives of the density are

∂p(x)/∂x = −p(x) Σ^{-1} (x − m)

∂²p/∂x∂x^T = p(x) ( Σ^{-1} (x − m)(x − m)^T Σ^{-1} − Σ^{-1} )

7.1.2 Marginal Distribution

Assume x ∼ N_x(μ, Σ) where

x = [ x_a ; x_b ],   μ = [ μ_a ; μ_b ],   Σ = [ Σ_a  Σ_c ; Σ_c^T  Σ_b ]

then

p(x_a) = N_{x_a}(μ_a, Σ_a)
p(x_b) = N_{x_b}(μ_b, Σ_b)

7.1.3 Conditional Distribution

Assume x ∼ N_x(μ, Σ) where

x = [ x_a ; x_b ],   μ = [ μ_a ; μ_b ],   Σ = [ Σ_a  Σ_c ; Σ_c^T  Σ_b ]
then

p(x_a | x_b) = N_{x_a}(μ̂_a, Σ̂_a),   μ̂_a = μ_a + Σ_c Σ_b^{-1} (x_b − μ_b),   Σ̂_a = Σ_a − Σ_c Σ_b^{-1} Σ_c^T
p(x_b | x_a) = N_{x_b}(μ̂_b, Σ̂_b),   μ̂_b = μ_b + Σ_c^T Σ_a^{-1} (x_a − μ_a),   Σ̂_b = Σ_b − Σ_c^T Σ_a^{-1} Σ_c

7.1.4 Linear combination

Assume x ∼ N(m_x, Σ_x) and y ∼ N(m_y, Σ_y) to be independent, then

z = A x + B y + c ∼ N(A m_x + B m_y + c, A Σ_x A^T + B Σ_y B^T)

7.1.5 Rearranging Means

N_{Ax}[m, Σ] = ( √det(2π(A^T Σ^{-1} A)^{-1}) / √det(2πΣ) ) N_x[A^{-1} m, (A^T Σ^{-1} A)^{-1}]

7.1.6 Rearranging into squared form

If A is symmetric, then

−(1/2) x^T A x + b^T x = −(1/2)(x − A^{-1} b)^T A (x − A^{-1} b) + (1/2) b^T A^{-1} b

−(1/2) Tr(X^T A X) + Tr(B^T X) = −(1/2) Tr[(X − A^{-1} B)^T A (X − A^{-1} B)] + (1/2) Tr(B^T A^{-1} B)

7.1.7 Sum of two squared forms

In vector formulation (assuming Σ_1, Σ_2 are symmetric)

−(1/2)(x − m_1)^T Σ_1^{-1} (x − m_1) − (1/2)(x − m_2)^T Σ_2^{-1} (x − m_2)
    = −(1/2)(x − m_c)^T Σ_c^{-1} (x − m_c) + C

Σ_c^{-1} = Σ_1^{-1} + Σ_2^{-1}
m_c = (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} m_1 + Σ_2^{-1} m_2)
C = (1/2)(m_1^T Σ_1^{-1} + m_2^T Σ_2^{-1})(Σ_1^{-1} + Σ_2^{-1})^{-1}(Σ_1^{-1} m_1 + Σ_2^{-1} m_2)
    − (1/2)(m_1^T Σ_1^{-1} m_1 + m_2^T Σ_2^{-1} m_2)
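The sum-of-two-squared-forms identity (Sec 7.1.7) can be checked exactly at an arbitrary test point. A minimal sketch with hypothetical means, covariances and test point:

```python
import numpy as np

# Hypothetical symmetric positive definite covariances and means (not from the text).
S1 = np.array([[2.0, 0.3], [0.3, 1.0]])
S2 = np.array([[1.5, -0.2], [-0.2, 0.8]])
m1 = np.array([1.0, -1.0])
m2 = np.array([0.5, 2.0])
S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)

# Combined precision, mean, and constant from the identity
Sci = S1i + S2i
Sc = np.linalg.inv(Sci)
u = S1i @ m1 + S2i @ m2
mc = Sc @ u
Cc = 0.5 * u @ Sc @ u - 0.5 * (m1 @ S1i @ m1 + m2 @ S2i @ m2)

# Evaluate both sides at an arbitrary test point
x = np.array([0.7, 0.1])
lhs = -0.5*(x - m1) @ S1i @ (x - m1) - 0.5*(x - m2) @ S2i @ (x - m2)
rhs = -0.5*(x - mc) @ Sci @ (x - mc) + Cc
```

This is the computation behind the product of two Gaussian densities (Sec 7.1.8): the product is again Gaussian with precision Σ_c^{-1} and mean m_c.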
In a trace formulation (assuming Σ_1, Σ_2 are symmetric)

−(1/2) Tr((X − M_1)^T Σ_1^{-1} (X − M_1)) − (1/2) Tr((X − M_2)^T Σ_2^{-1} (X − M_2))
    = −(1/2) Tr[(X − M_c)^T Σ_c^{-1} (X − M_c)] + C

Σ_c^{-1} = Σ_1^{-1} + Σ_2^{-1}
M_c = (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} M_1 + Σ_2^{-1} M_2)
C = (1/2) Tr[(Σ_1^{-1} M_1 + Σ_2^{-1} M_2)^T (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} M_1 + Σ_2^{-1} M_2)]
    − (1/2) Tr(M_1^T Σ_1^{-1} M_1 + M_2^T Σ_2^{-1} M_2)

7.1.8 Product of Gaussian densities

Let N_x(m, Σ) denote a density of x, then

N_x(m_1, Σ_1) · N_x(m_2, Σ_2) = c_c N_x(m_c, Σ_c)

where

c_c = N_{m_1}(m_2, (Σ_1 + Σ_2)) = (1/√det(2π(Σ_1 + Σ_2))) exp[ −(1/2)(m_1 − m_2)^T (Σ_1 + Σ_2)^{-1} (m_1 − m_2) ]
m_c = (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} m_1 + Σ_2^{-1} m_2)
Σ_c = (Σ_1^{-1} + Σ_2^{-1})^{-1}

7.2 Moments

7.2.1 Mean and covariance of linear forms

7.2.2 Mean and variance of square forms

Assume x ∼ N(m, Σ). If Σ = σ²I and A is symmetric, then

Var[x^T A x] = 2 σ⁴ Tr(A²) + 4 σ² m^T A² m

Also,

E[(x − m')^T A (x − m')] = (m − m')^T A (m − m') + Tr(A Σ)

7.2.3 Cubic forms

E[x b^T x x^T] = m b^T (Σ + m m^T) + (Σ + m m^T) b m^T + b^T m (Σ − m m^T)

7.2.4 Mean of Quartic Forms

E[x x^T x x^T] = 2(Σ + m m^T)² + m^T m (Σ − m m^T) + Tr(Σ)(Σ + m m^T)

E[x x^T A x x^T] = (Σ + m m^T)(A + A^T)(Σ + m m^T) + m^T A m (Σ − m m^T) + Tr(A Σ)(Σ + m m^T)

E[x^T x x^T x] = 2 Tr(Σ²) + 4 m^T Σ m + (Tr(Σ) + m^T m)²

E[x^T A x x^T B x] = Tr[A Σ (B + B^T) Σ] + m^T (A + A^T) Σ (B + B^T) m + (Tr(A Σ) + m^T A m)(Tr(B Σ) + m^T B m)

E[a^T x b^T x c^T x d^T x] = (a^T(Σ + m m^T) b)(c^T(Σ + m m^T) d)
    + (a^T(Σ + m m^T) c)(b^T(Σ + m m^T) d)
    + (a^T(Σ + m m^T) d)(b^T(Σ + m m^T) c) − 2 a^T m b^T m c^T m d^T m

Moments of a mixture of Gaussians (see Sec 7.4):

E[x] = Σ_k ρ_k m_k
Cov(x) = Σ_k Σ_k' ρ_k ρ_k' (Σ_k + m_k m_k^T − m_k m_k'^T)

7.3 Miscellaneous

7.3.1 Whitening

7.3.2 Entropy
7.4 Mixture of Gaussians

7.4.1 Density

The variable x is distributed as a mixture of Gaussians if it has the density

p(x) = Σ_{k=1}^{K} ρ_k (1/√det(2πΣ_k)) exp[ −(1/2)(x − m_k)^T Σ_k^{-1} (x − m_k) ]

where the ρ_k sum to 1 and the Σ_k all are positive definite.

7.4.2 Derivatives

Defining p(s) = Σ_k ρ_k N_s(μ_k, Σ_k) one gets

∂ln p(s)/∂ρ_j = ( ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ) · ∂ln[ρ_j N_s(μ_j, Σ_j)]/∂ρ_j
              = ( ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ) · (1/ρ_j)

∂ln p(s)/∂μ_j = ( ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ) · ∂ln[ρ_j N_s(μ_j, Σ_j)]/∂μ_j
              = ( ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ) · Σ_j^{-1}(s − μ_j)

∂ln p(s)/∂Σ_j = ( ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ) · ∂ln[ρ_j N_s(μ_j, Σ_j)]/∂Σ_j
              = ( ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ) · (1/2)( −Σ_j^{-T} + Σ_j^{-T}(s − μ_j)(s − μ_j)^T Σ_j^{-T} )
8 Special Matrices

8.1 Units, Permutation and Shift

8.1.1 Unit vector

Let e_i ∈ R^{n×1} be the i.th unit vector, i.e. the vector which is zero in all entries except the i.th at which it is 1.

8.1.2 Rows and Columns

i.th row of A:     e_i^T A
j.th column of A:  A e_j

8.1.3 Permutations

Let P be some permutation matrix, e.g.

P = [ 0 1 0 ; 1 0 0 ; 0 0 1 ] = [ e_2  e_1  e_3 ] = [ e_2^T ; e_1^T ; e_3^T ]

then

P A = [ e_2^T A ; e_1^T A ; e_3^T A ],    A P = [ A e_2   A e_1   A e_3 ]

That is, the first is a matrix which has the rows of A in permuted sequence and the second is a matrix which has the columns of A in permuted sequence.

8.1.4 Translation, Shift or Lag Operators

Let L denote the lag (or "shift") operator defined on a 4x4 example by

L = [ 0 0 0 0 ; 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ]

i.e. a matrix of zeros with ones on the sub-diagonal, (L)_ij = δ_{i,j+1}. With some signal x_t for t = 1, ..., N, the n.th power of the lag operator shifts the indices, i.e.

(L^n x)_t = 0          for t = 1, ..., n
(L^n x)_t = x_{t−n}    for t = n+1, ..., N
A related but slightly different matrix is the "recurrent shifted" operator defined on a 4x4 example by

L̂ = [ 0 0 0 1 ; 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ]

i.e. a matrix defined by (L̂)_ij = δ_{i,j+1} + δ_{i,1} δ_{j,dim(L)}. On a signal x it has the effect

(L̂^n x)_t = x_{t'},   t' = [(t − n) mod N] + 1

That is, L̂ is like the shift operator L except that it "wraps" the signal as if it was periodic and shifted (substituting the zeros with the rear end of the signal). Note that L̂ is invertible and orthogonal, i.e.

L̂^{-1} = L̂^T

8.2 The Single-entry Matrix

8.2.1 Definition

The single-entry matrix J^ij ∈ R^{n×n} is defined as the matrix which is zero everywhere except in the entry (i, j) in which it is 1. In a 4x4 example one might have

J^23 = [ 0 0 0 0 ; 0 0 1 0 ; 0 0 0 0 ; 0 0 0 0 ]

The single-entry matrix is very useful when working with derivatives of expressions involving matrices.

8.2.2 Swap and Zeros

Assume A to be n×m and J^ij to be m×p, then

A J^ij = [ 0  0  ...  A_i  ...  0 ]

i.e. an n×p matrix of zeros with the i.th column of A in place of the j.th column. Assume A to be n×m and J^ij to be p×n, then

J^ij A = [ 0 ; ... ; A_j ; 0 ; ... ; 0 ]

i.e. a p×m matrix of zeros with the j.th row of A in the place of the i.th row.

8.2.3 Rewriting product of elements

8.2.4 Properties of the Single-entry Matrix

If i = j:   J^ij J^ij = J^ij
If i ≠ j:   J^ij J^ij = 0
8.3 Symmetric and Antisymmetric

8.3.1 Antisymmetric

The antisymmetric matrix is also known as the skew symmetric matrix. It has the following property from which it is defined

A = −A^T

Hereby, it can be seen that the antisymmetric matrices always have a zero diagonal. The n×n antisymmetric matrices also have the following properties

det(A^T) = det(−A) = (−1)^n det(A)
−det(A) = det(−A) = 0,   if n is odd

8.4 Toeplitz Matrices

A Toeplitz matrix T is a matrix where the elements of each diagonal are the same. In the n×n square case, it has the following structure:

T = [ t_11     t_12   ...   t_1n ]     [ t_0        t_1   ...   t_{n−1} ]
    [ t_21     t_11   ...    .   ]  =  [ t_{−1}     t_0   ...     .     ]    (58)
    [  .        .     ...   t_12 ]     [  .          .    ...    t_1    ]
    [ t_n1     ...   t_21   t_11 ]     [ t_{−(n−1)} ...  t_{−1}  t_0    ]
A Toeplitz matrix is persymmetric. If a matrix is persymmetric (or orthosymmetric), it means that the matrix is symmetric about its northeast-southwest diagonal (anti-diagonal) [9]. Persymmetric matrices form a larger class of matrices, since a persymmetric matrix does not necessarily have a Toeplitz structure. There are some special cases of Toeplitz matrices. The symmetric Toeplitz matrix is given by:

T = [ t_0      t_1   ...   t_{n−1} ]
    [ t_1      t_0   ...     .     ]    (59)
    [  .        .    ...    t_1    ]
    [ t_{n−1}  ...    t_1   t_0    ]

The circular Toeplitz matrix, with (T_C)_ij = t_{(i−j) mod n}:

T_C = [ t_0      t_{n−1}  ...   t_1 ]
      [ t_1      t_0      ...   t_2 ]    (60)
      [  .        .       ...    .  ]
      [ t_{n−1}  ...      t_1   t_0 ]

The upper triangular Toeplitz matrix:

T_U = [ t_0  t_1  ...  t_{n−1} ; 0  t_0  ...  . ; .  .  ...  t_1 ; 0  ...  0  t_0 ]    (61)

and the lower triangular Toeplitz matrix:

T_L = [ t_0  0  ...  0 ; t_{−1}  t_0  ...  . ; .  .  ...  0 ; t_{−(n−1)}  ...  t_{−1}  t_0 ]    (62)

8.4.1 Properties of the Toeplitz matrix

The Toeplitz matrix has some computational advantages. The addition of two Toeplitz matrices can be done with O(n) flops, multiplication of two Toeplitz matrices can be done in O(n ln n) flops. Toeplitz equation systems can be solved in O(n²) flops. The inverse of a positive definite Toeplitz matrix can be found in O(n²) flops too. The inverse of a Toeplitz matrix is persymmetric. The product of two lower triangular Toeplitz matrices is a Toeplitz matrix. The following important relation between the circulant matrix and the discrete Fourier transform (DFT) exists

T_C = F_n^{-1} (I ∘ (F_n t)) F_n,    (63)

where t = (t_0, t_1, ..., t_{n−1})^T is the first column of T_C, I ∘ (F_n t) denotes the diagonal matrix with the vector F_n t on the diagonal, and F_n = DFT(I) is the DFT matrix. Likewise, F_n^{-1} is the inverse DFT matrix defined as

F_n^{-1} = IDFT(I) = (DFT(I))^{-1}.    (65)
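The circulant-DFT relation (63) can be verified with numpy's FFT, which applies the DFT matrix F_n. A minimal sketch with a hypothetical first column t (reading I ∘ (F_n t) as the diagonal matrix with F_n t on the diagonal):

```python
import numpy as np

n = 4
t = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical first column of T_C

# Build the circulant matrix (T_C)_{ij} = t[(i - j) mod n]
TC = np.array([[t[(i - j) % n] for j in range(n)] for i in range(n)])

# DFT matrix F_n (satisfies F @ x == np.fft.fft(x)) and its inverse
F = np.fft.fft(np.eye(n))
Finv = np.linalg.inv(F)

# T_C = F^{-1} diag(F t) F, where diag(F t) plays the role of I o (F_n t)
rebuilt = Finv @ (np.eye(n) * np.fft.fft(t)) @ F
```

This is why circulant systems can be solved in O(n log n): multiplying by T_C or its inverse reduces to FFTs and an elementwise scaling by the eigenvalues F_n t.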
8.5 Positive Definite and Semi-definite Matrices

8.5.1 Definitions

A matrix A is positive definite if and only if x^T A x > 0 for all x ≠ 0, and positive semi-definite if and only if x^T A x ≥ 0 for all x.

8.5.2 Eigenvalues

A pos. def.       ⇔  eig(A) > 0
A pos. semi-def.  ⇔  eig(A) ≥ 0

8.5.3 Trace

A pos. def.       ⇒  Tr(A) > 0
A pos. semi-def.  ⇒  Tr(A) ≥ 0

8.5.4 Inverse

8.5.5 Diagonal
8.5.6 Decomposition I

8.5.7 Decomposition II

8.5.8 Rank of product

8.5.9 Outer Product

8.5.10 Small perturbations

8.6 Block matrices

Let A_ij denote the ij.th block of A.

8.6.1 Multiplication

Assuming the dimensions of the blocks match, we have

[ A_11  A_12 ] [ B_11  B_12 ]   [ A_11 B_11 + A_12 B_21    A_11 B_12 + A_12 B_22 ]
[ A_21  A_22 ] [ B_21  B_22 ] = [ A_21 B_11 + A_22 B_21    A_21 B_12 + A_22 B_22 ]
8.6.2 The Determinant

The determinant can be expressed by the use of

C_1 = A_11 − A_12 A_22^{-1} A_21
C_2 = A_22 − A_21 A_11^{-1} A_12

as

det( [ A_11  A_12 ; A_21  A_22 ] ) = det(A_22) · det(C_1) = det(A_11) · det(C_2)

8.6.3 The Inverse

The inverse can be expressed by the use of C_1 and C_2 as

[ A_11  A_12 ]^{-1}   [ C_1^{-1}                   −A_11^{-1} A_12 C_2^{-1} ]
[ A_21  A_22 ]      = [ −C_2^{-1} A_21 A_11^{-1}    C_2^{-1}                ]

                      [ A_11^{-1} + A_11^{-1} A_12 C_2^{-1} A_21 A_11^{-1}    −C_1^{-1} A_12 A_22^{-1}                  ]
                    = [ −A_22^{-1} A_21 C_1^{-1}     A_22^{-1} + A_22^{-1} A_21 C_1^{-1} A_12 A_22^{-1} ]

8.6.4 Block diagonal

For block diagonal matrices we have

[ A_11  0 ; 0  A_22 ]^{-1} = [ (A_11)^{-1}  0 ; 0  (A_22)^{-1} ]

det( [ A_11  0 ; 0  A_22 ] ) = det(A_11) · det(A_22)
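The Schur-complement formulas for the block determinant and the block inverse can be checked numerically. A minimal sketch with hypothetical 2x2 blocks:

```python
import numpy as np

# Hypothetical example blocks (not from the text); chosen well-conditioned.
A11 = np.array([[4.0, 1.0], [1.0, 3.0]])
A12 = np.array([[0.5, 0.0], [0.2, 0.1]])
A21 = np.array([[0.3, 0.4], [0.0, 0.6]])
A22 = np.array([[5.0, 0.2], [0.2, 2.0]])
A = np.block([[A11, A12], [A21, A22]])

A11i, A22i = np.linalg.inv(A11), np.linalg.inv(A22)
C1 = A11 - A12 @ A22i @ A21      # Schur complement of A22
C2 = A22 - A21 @ A11i @ A12      # Schur complement of A11
C1i, C2i = np.linalg.inv(C1), np.linalg.inv(C2)

# Block inverse built from the Schur complements
Ainv_blocks = np.block([[C1i,               -A11i @ A12 @ C2i],
                        [-C2i @ A21 @ A11i,  C2i]])
```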
9 Functions and Operators

9.1 Functions and Series

9.1.1 Finite Series

9.1.2 Taylor Expansion of Scalar Function

Consider some scalar function f(x) which takes the vector x as an argument. This we can Taylor expand around x_0

f(x) ≅ f(x_0) + g(x_0)^T (x − x_0) + (1/2)(x − x_0)^T H(x_0)(x − x_0)

where

g(x_0) = ∂f(x)/∂x |_{x_0},    H(x_0) = ∂²f(x)/∂x∂x^T |_{x_0}

9.1.3 Matrix Functions by Infinite Series

As for analytical functions in one dimension, one can define a matrix function for square matrices X by an infinite series

f(X) = Σ_{n=0}^{∞} c_n X^n

assuming the limit exists and is finite. If the coefficients c_n fulfil Σ_n |c_n| x^n < ∞, then one can prove that the above series exists and is finite, see [1]. Thus for any analytical function f(x) there exists a corresponding matrix function f(X) constructed by the Taylor expansion. Using this one can prove the following results:

1) A matrix A is a zero of its own characteristic polynomial [1]:

p(λ) = det(Iλ − A) = Σ_n c_n λ^n   ⇒   p(A) = 0

2) If A is square it holds that [1]

A = U B U^{-1}   ⇒   f(A) = U f(B) U^{-1}

3) A useful fact when using power series is

A^n → 0 for n → ∞,   if |A| < 1
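As a small illustration of such matrix power series, the following sketch sums the geometric series Σ_n A^n, which converges to (I − A)^{-1} when the spectral radius of A is below one (a standard fact, used here as an assumption; the example matrix is hypothetical):

```python
import numpy as np

# Hypothetical example matrix with spectral radius < 1 (eigenvalues 0.2, 0.3).
A = np.array([[0.2, 0.1],
              [0.0, 0.3]])

# Partial sum of the geometric (Neumann) series I + A + A^2 + ...
S = np.zeros_like(A)
P = np.eye(2)
for _ in range(200):
    S = S + P
    P = P @ A
```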
9.1.4 Exponential Matrix Function

In analogy to the ordinary scalar exponential function, one can define exponential and logarithmic matrix functions:

e^A ≡ Σ_{n=0}^{∞} (1/n!) A^n = I + A + (1/2) A² + ...
e^{−A} ≡ Σ_{n=0}^{∞} (1/n!) (−1)^n A^n = I − A + (1/2) A² − ...
e^{tA} ≡ Σ_{n=0}^{∞} (1/n!) (tA)^n = I + tA + (1/2) t² A² + ...
ln(I + A) ≡ Σ_{n=1}^{∞} ((−1)^{n−1}/n) A^n = A − (1/2) A² + (1/3) A³ − ...

Some of the properties of the exponential function are [1]

e^A e^B = e^{A+B}   if   A B = B A
(e^A)^{-1} = e^{−A}
(d/dt) e^{tA} = A e^{tA},   t ∈ R

9.1.5 Trigonometric Functions

sin(A) ≡ Σ_{n=0}^{∞} ((−1)^n A^{2n+1}) / (2n+1)! = A − (1/3!) A³ + (1/5!) A⁵ − ...
cos(A) ≡ Σ_{n=0}^{∞} ((−1)^n A^{2n}) / (2n)! = I − (1/2!) A² + (1/4!) A⁴ − ...

9.2 Kronecker and Vec Operator

9.2.1 The Kronecker Product

The Kronecker product of an m×n matrix A and an r×q matrix B is an mr×nq matrix A ⊗ B defined as

A ⊗ B = [ A_11 B   A_12 B   ...   A_1n B ]
        [   .        .      ...     .    ]
        [ A_m1 B   A_m2 B   ...   A_mn B ]
9.2.2 The Vec Operator

The vec-operator applied on a matrix A stacks the columns into a vector, i.e. for a 2x2 matrix

A = [ A_11  A_12 ; A_21  A_22 ]   ⇒   vec(A) = [ A_11 ; A_21 ; A_12 ; A_22 ]

Properties of the vec-operator include (see [12])

vec(A X B) = (B^T ⊗ A) vec(X)
Tr(A^T B) = vec(A)^T vec(B)
vec(A + B) = vec(A) + vec(B)
vec(αA) = α · vec(A)

9.3 Solutions to Systems of Equations

9.3.1 Existence in Linear Systems

Assume A is n×m and consider the system

A x = b

Construct the augmented matrix B = [A b], then

rank(A) = rank(B) = m   ⇒   unique solution x
rank(A) = rank(B) < m   ⇒   many solutions x
rank(A) < rank(B)       ⇒   no solutions x
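The key vec identity vec(AXB) = (B^T ⊗ A) vec(X) can be checked directly; note that vec stacks columns, which corresponds to Fortran ("F") ordering in numpy. A minimal sketch with hypothetical example matrices:

```python
import numpy as np

# Hypothetical example matrices (not from the text).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
X = np.array([[0.5, -1.0], [2.0, 0.3]])
B = np.array([[1.0, 0.0], [0.5, 2.0]])

# Column-stacking vec operator
vec = lambda M: M.reshape(-1, order='F')

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
```

This identity is what turns matrix equations like AX + XB = C (Sec 9.3) into ordinary linear systems in vec(X).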
9.3.2 Standard Square

Assume A is square and invertible, then

A x = b   ⇒   x = A^{-1} b

9.3.3 Degenerated Square

9.3.4 Over-determined Rectangular

Assume A to be n×m with n > m (tall) and rank(A) = m, then

A x = b   ⇒   x_min = (A^T A)^{-1} A^T b = A^+ b

Now x_min is the vector x which minimizes ||Ax − b||², i.e. the vector which is "least wrong". The matrix A^+ is the pseudo-inverse of A. See [3].

9.3.5 Under-determined Rectangular

Assume A is n×m with n < m ("broad"), then

A x = b   ⇒   x_min = A^T (A A^T)^{-1} b

The equation has many solutions x. But x_min is the solution which minimizes ||Ax − b||² and also the solution with the smallest norm ||x||². The same holds for a matrix version: Assume A is n×m, X is m×n and B is n×n, then

A X = B   ⇒   X_min = A^+ B

The equation has many solutions X. But X_min is the solution which minimizes ||AX − B||² and also the solution with the smallest norm ||X||². See [3].

Similar but different: Assume A is square n×n and the matrices B_0, B_1 are n×N, where N > n. Then, if B_0 has maximal rank,

A B_0 = B_1   ⇒   A_min = B_1 B_0^T (B_0 B_0^T)^{-1}

where A_min denotes the matrix which is optimal in a least square sense. An interpretation is that A is the linear approximation which maps the column vectors of B_0 into the column vectors of B_1.

9.3.6 Linear form and zeros

A x = 0,  ∀x   ⇒   A = 0
9.3.7 Square form and zeros

If A is symmetric, then

x^T A x = 0,  ∀x   ⇒   A = 0

9.3.8 The Lyapunov Equation

A X + X B = C   ⇒   vec(X) = (I ⊗ A + B^T ⊗ I)^{-1} vec(C)

See Sec 9.2.1 and 9.2.2 for details on the Kronecker product and the vec operator.

9.3.9 Encapsulating Sum

Σ_n A_n X B_n = C   ⇒   vec(X) = ( Σ_n B_n^T ⊗ A_n )^{-1} vec(C)

See Sec 9.2.1 and 9.2.2 for details on the Kronecker product and the vec operator.
9.4 Matrix Norms

9.4.1 Definitions

A matrix norm is a mapping which fulfils

||A|| ≥ 0
||A|| = 0 ⇔ A = 0
||cA|| = |c| ||A||,   c ∈ R
||A + B|| ≤ ||A|| + ||B||

9.4.2 Examples

||A||_1    = max_j Σ_i |A_ij|
||A||_2    = √(max eig(A^T A))
||A||_p    = max_{||x||_p = 1} ||A x||_p
||A||_∞    = max_i Σ_j |A_ij|
||A||_F    = √(Σ_ij |A_ij|²) = √(Tr(A A^H))    (Frobenius)
||A||_max  = max_ij |A_ij|
||A||_KF   = ||sing(A)||_1    (Ky Fan)

where sing(A) is the vector of singular values of the matrix A.

9.4.3 Inequalities

The following table collects inequalities between the norms, assuming A is an m×n matrix and d = rank(A). The entry in row r, column c means ||A||_r ≤ entry · ||A||_c:

             ||A||_max   ||A||_1   ||A||_∞   ||A||_2   ||A||_F   ||A||_KF
||A||_max        1          1         1         1         1          1
||A||_1          m          1         m        √m        √m         √m
||A||_∞          n          n         1        √n        √n         √n
||A||_2        √(mn)       √n        √m         1         1          1
||A||_F        √(mn)       √n        √m        √d         1          1
||A||_KF       √(mnd)     √(nd)     √(md)       d        √d          1
9.5 Rank

9.5.1 Sylvester's Inequality

If A is m×n and B is n×r, then

rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}

9.6 Integral Involving Dirac Delta Functions

Assuming A to be square, then

∫ p(s) δ(x − A s) ds = (1/det(A)) p(A^{-1} x)

9.7 Miscellaneous

Orthogonal matrix

A One-dimensional Results

A.1 Gaussian

A.1.1 Density
p(x) = (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) )

A.1.2 Normalization

∫ e^{−(s−μ)²/(2σ²)} ds = √(2πσ²)

∫ e^{−(ax²+bx+c)} dx = √(π/a) exp[ (b² − 4ac) / (4a) ]

∫ e^{c₂x²+c₁x+c₀} dx = √(π/−c₂) exp[ (c₁² − 4c₂c₀) / (−4c₂) ]

A.1.3 Derivatives

∂p(x)/∂μ = p(x) (x − μ)/σ²
∂ln p(x)/∂μ = (x − μ)/σ²
∂p(x)/∂σ = p(x) (1/σ) [ (x − μ)²/σ² − 1 ]
∂ln p(x)/∂σ = (1/σ) [ (x − μ)²/σ² − 1 ]

A.1.4 Completing the Squares

c₂x² + c₁x + c₀ = −a(x − b)² + w

with

−a = c₂,   b = −(1/2)(c₁/c₂),   w = −(1/4)(c₁²/c₂) + c₀

or

c₂x² + c₁x + c₀ = −(1/(2σ²))(x − μ)² + d

with

μ = −c₁/(2c₂),   σ² = −1/(2c₂),   d = c₀ − c₁²/(4c₂)
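The normalization integral with the completed square can be checked by simple numerical integration. A minimal sketch with hypothetical coefficients a, b, c:

```python
import numpy as np

# Hypothetical coefficients (not from the text); a > 0 so the integral converges.
a, b, c = 0.5, 1.2, -0.3

# Riemann-sum approximation of the integral over a wide interval
x = np.linspace(-40.0, 40.0, 400001)
dx = x[1] - x[0]
integrand = np.exp(-(a*x**2 + b*x + c))
numeric = np.sum(integrand) * dx

# Closed form: sqrt(pi/a) * exp((b^2 - 4ac) / (4a))
closed = np.sqrt(np.pi / a) * np.exp((b**2 - 4*a*c) / (4*a))
```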
A.1.5 Moments

If the density is expressed by

p(x) = (1/√(2πσ²)) exp[ −(s − μ)²/(2σ²) ]   or   p(x) = C exp(c₂x² + c₁x)

then the first few basic moments are

⟨x⟩  = μ              = −c₁/(2c₂)
⟨x²⟩ = σ² + μ²        = −1/(2c₂) + (c₁/(2c₂))²
⟨x³⟩ = 3σ²μ + μ³      = (c₁/(2c₂)²)(3 − c₁²/(2c₂))
⟨x⁴⟩ = μ⁴ + 6μ²σ² + 3σ⁴ = (c₁/(2c₂))⁴ + 6(c₁/(2c₂))²(−1/(2c₂)) + 3(1/(2c₂))²

and the central moments are

⟨(x − μ)⟩  = 0
⟨(x − μ)²⟩ = σ² = −1/(2c₂)
⟨(x − μ)³⟩ = 0
⟨(x − μ)⁴⟩ = 3σ⁴ = 3(1/(2c₂))²

A kind of pseudo moments (un-normalized integrals) can easily be derived as

∫ exp(c₂x² + c₁x) xⁿ dx = Z⟨xⁿ⟩ = √(π/−c₂) exp( −c₁²/(4c₂) ) ⟨xⁿ⟩

From the un-centralized moments one can derive other entities like

⟨x²⟩ − ⟨x⟩²    = σ²        = −1/(2c₂)
⟨x³⟩ − ⟨x²⟩⟨x⟩ = 2σ²μ      = 2c₁/(2c₂)²
⟨x⁴⟩ − ⟨x²⟩²   = 2σ⁴ + 4μ²σ² = (2/(2c₂)²)(1 − c₁²/c₂)

A.2 One Dimensional Mixture of Gaussians

A.2.1 Density and Normalization

p(s) = Σ_{k}^{K} ( ρ_k / √(2πσ_k²) ) exp[ −(1/2)(s − μ_k)²/σ_k² ]

A.2.2 Moments

A useful fact of MoG is that

⟨xⁿ⟩ = Σ_k ρ_k ⟨xⁿ⟩_k
where ⟨·⟩_k denotes average with respect to the k.th component. We can calculate the first four moments from the densities

p(x) = Σ_k ρ_k (1/√(2πσ_k²)) exp[ −(1/2)(x − μ_k)²/σ_k² ]
p(x) = Σ_k ρ_k C_k exp[ c_k2 x² + c_k1 x ]

as

⟨x⟩  = Σ_k ρ_k μ_k                      = Σ_k ρ_k ( −c_k1/(2c_k2) )
⟨x²⟩ = Σ_k ρ_k (σ_k² + μ_k²)            = Σ_k ρ_k [ −1/(2c_k2) + (c_k1/(2c_k2))² ]
⟨x³⟩ = Σ_k ρ_k (3σ_k²μ_k + μ_k³)        = Σ_k ρ_k [ (c_k1/(2c_k2)²)(3 − c_k1²/(2c_k2)) ]
⟨x⁴⟩ = Σ_k ρ_k (μ_k⁴ + 6μ_k²σ_k² + 3σ_k⁴) = Σ_k ρ_k [ (1/(2c_k2))²( (c_k1²/(2c_k2))² − 6(c_k1²/(2c_k2)) + 3 ) ]

If all the Gaussians are centered, i.e. μ_k = 0 for all k, then

⟨x⟩  = 0
⟨x²⟩ = Σ_k ρ_k σ_k²   = Σ_k ρ_k ( −1/(2c_k2) )
⟨x³⟩ = 0
⟨x⁴⟩ = Σ_k ρ_k 3σ_k⁴  = Σ_k ρ_k 3 (1/(2c_k2))²

From the un-centralized moments one can derive other entities like

⟨x²⟩ − ⟨x⟩²    = Σ_{k,k'} ρ_k ρ_k' [ σ_k² + μ_k² − μ_k μ_k' ]
⟨x³⟩ − ⟨x²⟩⟨x⟩ = Σ_{k,k'} ρ_k ρ_k' [ 3σ_k²μ_k + μ_k³ − (σ_k² + μ_k²)μ_k' ]
⟨x⁴⟩ − ⟨x²⟩²   = Σ_{k,k'} ρ_k ρ_k' [ μ_k⁴ + 6μ_k²σ_k² + 3σ_k⁴ − (σ_k² + μ_k²)(σ_k'² + μ_k'²) ]
A.2.3 Derivatives

Defining p(s) = Σ_k ρ_k N_s(μ_k, σ_k²) we get for a parameter λ_j of the j.th component

∂ln p(s)/∂λ_j = ( ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ) · ∂ln(ρ_j N_s(μ_j, σ_j²))/∂λ_j

that is,

∂ln p(s)/∂ρ_j = ( ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ) · (1/ρ_j)
∂ln p(s)/∂μ_j = ( ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ) · (s − μ_j)/σ_j²
∂ln p(s)/∂σ_j = ( ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ) · (1/σ_j)[ (s − μ_j)²/σ_j² − 1 ]

Note that the ρ_k must be constrained to be proper ratios. Defining the ratios by ρ_j = e^{r_j} / Σ_k e^{r_k}, we obtain

∂ln p(s)/∂r_j = Σ_l (∂ln p(s)/∂ρ_l)(∂ρ_l/∂r_j)   where   ∂ρ_l/∂r_j = ρ_l (δ_lj − ρ_j)

B Proofs and Details

B.1 Misc Proofs

B.1.1 Proof of Equation (14)

Write out (X^n)_kl as the sum over index chains u_1, ..., u_{n−1} of products X_{k,u_1} X_{u_1,u_2} ... X_{u_{n−1},l}, differentiate term by term with respect to X_ij, and collect the resulting terms as the sum Σ_{r=0}^{n−1} (...). Using the properties of the single-entry matrix found in Sec. 8.2.4, the result follows easily.

B.1.2 Details on Eq. 67
Through the calculations, (16) and (47) were used. In addition, by use of (48), the derivative is found with respect to the imaginary part of X. Inserting the two partial derivatives into (36) and (37) yields

d det(X^H A X)/dX = (1/2)[ ∂det(X^H A X)/∂Re X − i ∂det(X^H A X)/∂Im X ]
                  = det(X^H A X) [(X^H A X)^{-1} X^H A]^T

d det(X^H A X)/dX* = (1/2)[ ∂det(X^H A X)/∂Re X + i ∂det(X^H A X)/∂Im X ]
                   = det(X^H A X) A X (X^H A X)^{-1}

Notice, for real X, A, the sum of (56) and (57) is reduced to (13).

Similar calculations yield

d det(X A X^H)/dX = (1/2)[ ∂det(X A X^H)/∂Re X − i ∂det(X A X^H)/∂Im X ]
                  = det(X A X^H) [A X^H (X A X^H)^{-1}]^T    (66)

and

d det(X A X^H)/dX* = (1/2)[ ∂det(X A X^H)/∂Re X + i ∂det(X A X^H)/∂Im X ]
                   = det(X A X^H) (X A X^H)^{-1} X A    (67)
References

[1] Karl Gustav Andersson and Lars-Christer Boiers. Ordinära differentialekvationer. Studenterlitteratur, 1992.

[2] Jörn Anemüller, Terrence J. Sejnowski, and Scott Makeig. Complex independent component analysis of frequency-domain electroencephalographic data. Neural Networks, 16(9):1311-1323, November 2003.

[3] S. Barnet. Matrices. Methods and Applications. Oxford Applied Mathematics and Computing Science Series. Clarendon Press, 1990.

[4] Christopher Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[5] Robert J. Boik. Lecture notes: Statistics 550. Online, April 22, 2002. Notes.

[6] D. H. Brandwood. A complex gradient operator and its application in adaptive array theory. IEE Proceedings, 130(1):11-16, February 1983. Pts. F and H.

[7] M. Brookes. Matrix Reference Manual, 2004. Website May 20, 2004.

[8] Mads Dyrholm. Some matrix results, 2004. Website August 23, 2004.

[9] Gene H. Golub and Charles F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.

[10] Robert M. Gray. Toeplitz and circulant matrices: A review. Technical report, Information Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, California 94305, August 2002.

[11] Simon Haykin. Adaptive Filter Theory. Prentice Hall, Upper Saddle River, NJ, 4th edition, 2002.

[12] Thomas P. Minka. Old and new matrix algebra useful for statistics, December 2000. Notes.

[13] L. Parra and C. Spence. Convolutive blind separation of non-stationary sources. In IEEE Transactions Speech and Audio Processing, pages 320-327, May 2000.

[14] Laurent Schwartz. Cours d'Analyse, volume II. Hermann, Paris, 1967. As referenced in [11].

[15] Shayle R. Searle. Matrix Algebra Useful for Statistics. John Wiley and Sons, 1982.

[16] G. Seber and A. Lee. Linear Regression Analysis. John Wiley and Sons, 2002.

[17] S. M. Selby. Standard Mathematical Tables. CRC Press, 1974.