
Linear Algebra via Exterior Products

This book is a pedagogical introduction to the
coordinate-free approach in finite-dimensional linear
algebra, at the undergraduate level. Throughout this
book, extensive use is made of the exterior (wedge)
product of vectors. In this approach, the book derives,
without matrix calculations, the standard properties of
determinants, the formulas of Jacobi and Liouville, the
Cayley-Hamilton theorem, properties of Pfaffians, the
Jordan canonical form, as well as some generalizations
of these results. Every concept is logically motivated
and discussed; exercises with some hints are provided.

Sergei Winitzki received a PhD in theoretical physics
from Tufts University, USA (1997) and has been a
researcher and part-time lecturer at universities in the
USA, UK, and Germany. Dr. Winitzki has authored a
number of research articles and two books on his
main professional interest, theoretical physics. He is
presently employed as a senior academic fellow at the
Ludwig-Maximilians-University, Munich (Germany).

Linear Algebra via Exterior Products

Sergei Winitzki, Ph.D.


Contents

Preface

0 Introduction and summary
  0.1 Notation
  0.2 Sample quiz problems
  0.3 A list of results

1 Linear algebra without coordinates
  1.1 Vector spaces
    1.1.1 Three-dimensional Euclidean geometry
    1.1.2 From three-dimensional vectors to abstract vectors
    1.1.3 Examples of vector spaces
    1.1.4 Dimensionality and bases
    1.1.5 All bases have equally many vectors
  1.2 Linear maps in vector spaces
    1.2.1 Abstract definition of linear maps
    1.2.2 Examples of linear maps
    1.2.3 Vector space of all linear maps
    1.2.4 Eigenvectors and eigenvalues
  1.3 Subspaces
    1.3.1 Projectors and subspaces
    1.3.2 Eigenspaces
  1.4 Isomorphisms of vector spaces
  1.5 Direct sum of vector spaces
    1.5.1 $V$ and $W$ as subspaces of $V \oplus W$; canonical projections
  1.6 Dual (conjugate) vector space
    1.6.1 Dual basis
    1.6.2 Hyperplanes
  1.7 Tensor product of vector spaces
    1.7.1 First examples
    1.7.2 Example: $\mathbb{R}^m \otimes \mathbb{R}^n$
    1.7.3 Dimension of tensor product is the product of dimensions
    1.7.4 Higher-rank tensor products
    1.7.5 * Distributivity of tensor product
  1.8 Linear maps and tensors
    1.8.1 Tensors as linear operators
    1.8.2 Linear operators as tensors
    1.8.3 Examples and exercises
    1.8.4 Linear maps between different spaces
  1.9 Index notation for tensors
    1.9.1 Definition of index notation
    1.9.2 Advantages and disadvantages of index notation
  1.10 Dirac notation for vectors and covectors
    1.10.1 Definition of Dirac notation
    1.10.2 Advantages and disadvantages of Dirac notation

2 Exterior product
  2.1 Motivation
    2.1.1 Two-dimensional oriented area
    2.1.2 Parallelograms in $\mathbb{R}^3$ and in $\mathbb{R}^n$
  2.2 Exterior product
    2.2.1 Definition of exterior product
    2.2.2 * Symmetric tensor product
  2.3 Properties of spaces $\wedge^k V$
    2.3.1 Linear maps between spaces $\wedge^k V$
    2.3.2 Exterior product and linear dependence
    2.3.3 Computing the dual basis
    2.3.4 Gaussian elimination
    2.3.5 Rank of a set of vectors
    2.3.6 Exterior product in index notation
    2.3.7 * Exterior algebra (Grassmann algebra)

3 Basic applications
  3.1 Determinants through permutations: the hard way
  3.2 The space $\wedge^N V$ and oriented volume
  3.3 Determinants of operators
    3.3.1 Examples: computing determinants
  3.4 Determinants of square tables
    3.4.1 * Index notation for $\wedge^N V$ and determinants
  3.5 Solving linear equations
    3.5.1 Existence of solutions
    3.5.2 Kramer's rule and beyond
  3.6 Vandermonde matrix
    3.6.1 Linear independence of eigenvectors
    3.6.2 Polynomial interpolation
  3.7 Multilinear actions in exterior powers
    3.7.1 * Index notation
  3.8 Trace
  3.9 Characteristic polynomial
    3.9.1 Nilpotent operators

4 Advanced applications
  4.1 The space $\wedge^{N-1} V$
    4.1.1 Exterior transposition of operators
    4.1.2 * Index notation
  4.2 Algebraic complement (adjoint) and beyond
    4.2.1 Definition of algebraic complement
    4.2.2 Algebraic complement of a matrix
    4.2.3 Further properties and generalizations
  4.3 Cayley-Hamilton theorem and beyond
  4.4 Functions of operators
    4.4.1 Definitions. Formal power series
    4.4.2 Computations: Sylvester's method
    4.4.3 * Square roots of operators
  4.5 Formulas of Jacobi and Liouville
    4.5.1 Derivative of characteristic polynomial
    4.5.2 Derivative of a simple eigenvalue
    4.5.3 General trace relations
  4.6 Jordan canonical form
    4.6.1 Minimal polynomial
  4.7 * Construction of projectors onto Jordan cells

5 Scalar product
  5.1 Vector spaces with scalar product
    5.1.1 Orthonormal bases
    5.1.2 Correspondence between vectors and covectors
    5.1.3 * Example: bilinear forms on $V \oplus V^*$
    5.1.4 Scalar product in index notation
  5.2 Orthogonal subspaces
    5.2.1 Affine hyperplanes
  5.3 Orthogonal transformations
    5.3.1 Examples and properties
    5.3.2 Transposition
  5.4 Applications of exterior product
    5.4.1 Orthonormal bases, volume, and $\wedge^N V$
    5.4.2 Vector product in $\mathbb{R}^3$ and Levi-Civita symbol
    5.4.3 Hodge star and Levi-Civita symbol in $N$ dimensions
    5.4.4 Reciprocal basis
  5.5 Scalar product in $\wedge^k V$
    5.5.1 Scalar product in $\wedge^N V$
    5.5.2 Volumes of $k$-dimensional parallelepipeds
  5.6 Scalar product for complex spaces
    5.6.1 Symmetric and Hermitian operators
    5.6.2 Unitary transformations
  5.7 Antisymmetric operators
  5.8 * Pfaffians
    5.8.1 Determinants are Pfaffians squared
    5.8.2 Further properties

A Complex numbers
  A.1 Basic definitions
  A.2 Geometric representation
  A.3 Analytic functions
  A.4 Exponent and logarithm

B Permutations

C Matrices
  C.1 Definitions
  C.2 Matrix multiplication
  C.3 Linear equations
  C.4 Inverse matrix
  C.5 Determinants
  C.6 Tensor product

D Distribution of this text
  D.1 Motivation
  D.2 GNU Free Documentation License
    D.2.1 Preamble
    D.2.2 Applicability and definitions
    D.2.3 Verbatim copying
    D.2.4 Copying in quantity
    D.2.5 Modifications
    D.2.6 Combining documents
    D.2.7 Collections of documents
    D.2.8 Aggregation with independent works
    D.2.9 Translation
    D.2.10 Termination
    D.2.11 Future revisions of this license
    D.2.12 Addendum: How to use this License for your documents
    D.2.13 Copyright

Index

Preface

In a first course of linear algebra, one learns the various uses of
matrices, for instance the properties of determinants, eigenvectors
and eigenvalues, and methods for solving linear equations. The
required calculations are straightforward (because, conceptually,
vectors and matrices are merely arrays of numbers) if cumbersome.
However, there is a more abstract and more powerful approach: Vectors
are elements of abstract vector spaces, and matrices represent linear
transformations of vectors. This invariant or coordinate-free approach
is important in algebra and has found many applications in science.

The purpose of this book is to help the reader make a transition to
the abstract coordinate-free approach, and also to give a hands-on
introduction to exterior products, a powerful tool of linear algebra.
I show how the coordinate-free approach together with exterior
products can be used to clarify the basic results of matrix algebra,
at the same time avoiding all the laborious matrix calculations.

Here is a simple theorem that illustrates the advantages of the
exterior product approach. A triangle is oriented arbitrarily in
three-dimensional space; the three orthogonal projections of this
triangle are triangles in the three coordinate planes. Let $S$ be the
area of the initial triangle, and let $A$, $B$, $C$ be the areas of
the three projections. Then

\[ S^2 = A^2 + B^2 + C^2 . \]

If one uses bivectors to represent the oriented areas of the triangle
and of its three projections, the statement above is equivalent to the
Pythagoras theorem in the space of bivectors, and the proof requires
only a few straightforward definitions and checks. A generalization of
this result to volumes of $k$-dimensional bodies embedded in
$N$-dimensional spaces is then obtained with no extra work. I hope
that the readers will appreciate the beauty of an approach to linear
algebra that allows us to obtain such results quickly and almost
without calculations.

The exterior product is widely used in connection with $n$-forms,
which are exterior products of covectors. In this book I do not use
$n$-forms; instead I use vectors, $n$-vectors, and their exterior
products. This approach allows a more straightforward geometric
interpretation and also simplifies calculations and proofs.

To make the book logically self-contained, I present a proof of every
basic result of linear algebra. The emphasis is not on computational
techniques, although the coordinate-free approach does make many
computations easier and more elegant.[1]

The main topics covered are tensor products; exterior products;
coordinate-free definitions of the determinant $\det \hat{A}$, the
trace $\operatorname{Tr} \hat{A}$, and the characteristic polynomial
$Q_{\hat{A}}(\lambda)$; basic properties of determinants; solution of
linear equations, including over-determined or under-determined
systems, using Kramer's rule; the Liouville formula
$\det \exp \hat{A} = \exp \operatorname{Tr} \hat{A}$ as an identity of
formal series; the algebraic complement (cofactor) matrix; Jacobi's
formula for the variation of the determinant; variation of the
characteristic polynomial and of eigenvalue; the Cayley-Hamilton
theorem; analytic functions of operators; Jordan canonical form;
construction of projectors onto Jordan cells; Hodge star and the
computation of $k$-dimensional volumes through $k$-vectors; definition
and properties of the Pfaffian $\operatorname{Pf} \hat{A}$ for
antisymmetric operators $\hat{A}$. All these standard results are
derived without matrix calculations; instead, the exterior product is
used as a main computational tool.

This book is largely pedagogical, meaning that the results are long
known, and the emphasis is on a clear and self-contained, logically
motivated presentation aimed at students. Therefore, some exercises
with hints and partial solutions are included, but not references to
literature.[2] I have tried to avoid being overly pedantic while
keeping the exposition mathematically rigorous.

Sections marked with a star * are not especially difficult but contain
material that may be skipped at first reading. (Exercises marked with
a star are more difficult.)

The first chapter is an introduction to the invariant approach to
vector spaces. I assume that readers are familiar with elementary
linear algebra in the language of row/column vectors and matrices;
Appendix C contains a brief overview of that material. Good
introductory books (which I did not read in detail but which have a
certain overlap with the present notes) are "Finite-dimensional Vector
Spaces" by P. Halmos and "Linear Algebra" by J. Hefferon (the latter
is a free book).

I started thinking about the approach to linear algebra based on
exterior products while still a student. I am especially grateful to
Sergei Arkhipov, Leonid Positselsky, and Arkady Vaintrob who have
stimulated my interest at that time and taught me much of what I could
not otherwise learn about algebra. Thanks are also due to Prof. Howard
Haber (UCSC) for constructive feedback on an earlier version of this
text.

[1] "Elegant" means shorter and easier to remember. Usually, elegant
derivations are those in which some powerful basic idea is exploited
to obtain the result quickly.

[2] The approach to determinants via exterior products has been known
since at least 1880 but does not seem especially popular in textbooks,
perhaps due to the somewhat abstract nature of the tensor product. I
believe that this approach to determinants and to other results in
linear algebra deserves to be more widely appreciated.

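The triangle theorem stated in the Preface, $S^2 = A^2 + B^2 + C^2$, can be checked numerically. The following is a minimal sketch rather than the book's bivector proof: it assumes the standard three-dimensional identification of a bivector's components with the components of a cross product, and the helper names (`cross`, `triangle_area`, `project`, `projected_areas`) are illustrative choices, not notation from the text.

```python
import math
import random


def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))


def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r in three dimensions."""
    n = cross(sub(q, p), sub(r, p))
    return 0.5 * math.sqrt(sum(c * c for c in n))


def project(point, axis):
    """Orthogonal projection onto the coordinate plane perpendicular to `axis`."""
    return tuple(0.0 if i == axis else c for i, c in enumerate(point))


def projected_areas(p, q, r):
    """Areas of the three projected triangles (one per coordinate plane)."""
    return [triangle_area(project(p, k), project(q, k), project(r, k))
            for k in range(3)]


# A randomly oriented triangle (fixed seed so the run is reproducible).
random.seed(0)
p, q, r = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(3)]

S = triangle_area(p, q, r)
A, B, C = projected_areas(p, q, r)

# The two printed numbers should agree up to rounding error.
print(S ** 2, A ** 2 + B ** 2 + C ** 2)
```

Each projected triangle keeps exactly one component of the cross product of the two edge vectors, so the sum of the squared projected areas reproduces the squared length of that vector, which is the Pythagorean statement in the space of bivectors.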
0 Introduction and summary
All the notions mentioned in this section will be explained below. If
you already know the definition of tensor and exterior products and
are familiar with statements such as
$\operatorname{End} V \cong V \otimes V^*$, you may skip to Chapter 2.

0.1 Notation

The following conventions are used throughout this text.

I use the bold emphasis to define a new word, term, or notion, and the
definition always appears near the boldface text (whether or not I
write the word "Definition").

Ordered sets are denoted by round parentheses, e.g. $(1, 2, 3)$.
Unordered sets are denoted using the curly parentheses,
e.g. $\{a, b, c\}$.

The symbol $\equiv$ means "is now being defined as" or "equals by a
previously given definition."

The symbol $\overset{!}{=}$ means "as we already know, equals."

A set consisting of all elements $x$ satisfying some property $P(x)$
is denoted by $\{ x \mid P(x) \text{ is true} \}$.

A map $f$ from a set $V$ to $W$ is denoted by $f : V \to W$. An
element $v \in V$ is then mapped to an element $w \in W$, which is
written as $f : v \mapsto w$ or $f(v) = w$.

The sets of rational numbers, real numbers, and complex numbers are
denoted respectively by $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$.

Statements, Lemmas, Theorems, Examples, and Exercises are numbered
only within a single subsection, so references are always to a certain
statement in a certain subsection.[1] A reference to "Theorem 1.1.4"
means the unnumbered theorem in Sec. 1.1.4.

Proofs, solutions, examples, and exercises are separated from the rest
by the symbol $\blacksquare$. More precisely, this symbol means "I
have finished with this; now we look at something else."

$V$ is a finite-dimensional vector space over a field $\mathbb{K}$.
Vectors from $V$ are denoted by boldface lowercase letters,
e.g. $\mathbf{v} \in V$. The dimension of $V$ is $N \equiv \dim V$.

The standard $N$-dimensional space over real numbers (the space
consisting of $N$-tuples of real numbers) is denoted by
$\mathbb{R}^N$.

The subspace spanned by a given set of vectors
$\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ is denoted by
$\operatorname{Span} \{\mathbf{v}_1, ..., \mathbf{v}_n\}$.

The vector space dual to $V$ is $V^*$. Elements of $V^*$ (covectors)
are denoted by starred letters, e.g. $\mathbf{f}^* \in V^*$. A
covector $\mathbf{f}^*$ acts on a vector $\mathbf{v}$ and produces a
number $\mathbf{f}^*(\mathbf{v})$.

The space of linear maps (homomorphisms) $V \to W$ is
$\operatorname{Hom}(V, W)$. The space of linear operators (also called
endomorphisms) of a vector space $V$, i.e. the space of all linear
maps $V \to V$, is $\operatorname{End} V$. Operators are denoted by
the circumflex accent, e.g. $\hat{A}$. The identity operator on $V$ is
$\hat{1}_V \in \operatorname{End} V$ (sometimes also denoted $\hat{1}$
for brevity).

The direct sum of spaces $V$ and $W$ is $V \oplus W$. The tensor
product of spaces $V$ and $W$ is $V \otimes W$. The exterior
(anti-commutative) product of $V$ and $V$ is $V \wedge V$. The
exterior product of $n$ copies of $V$ is $\wedge^n V$. Canonical
isomorphisms of vector spaces are denoted by the symbol $\cong$; for
example, $\operatorname{End} V \cong V \otimes V^*$.

The scalar product of vectors is denoted by
$\langle \mathbf{u}, \mathbf{v} \rangle$. The notation
$\mathbf{a} \times \mathbf{b}$ is used only for the traditional vector
product (also called cross product) in 3-dimensional space. Otherwise,
the product symbol $\times$ is used to denote the continuation of a
long expression that is being split between lines.

The exterior (wedge) product of vectors is denoted by
$\mathbf{a} \wedge \mathbf{b} \in \wedge^2 V$.

Any two nonzero tensors $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_N$
and $\mathbf{b}_1 \wedge ... \wedge \mathbf{b}_N$ in an
$N$-dimensional space are proportional to each other, say

\[ \mathbf{a}_1 \wedge ... \wedge \mathbf{a}_N = \lambda\, \mathbf{b}_1 \wedge ... \wedge \mathbf{b}_N . \]

It is then convenient to denote $\lambda$ by the "tensor ratio"

\[ \lambda \equiv \frac{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_N}{\mathbf{b}_1 \wedge ... \wedge \mathbf{b}_N} . \]

The number of unordered choices of $k$ items from $n$ is denoted by

\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} . \]

The $k$-linear action of a linear operator $\hat{A}$ in the space
$\wedge^n V$ is denoted by $\wedge^n \hat{A}^k$. (Here
$0 \le k \le n \le N$.) For example,

\[ (\wedge^3 \hat{A}^2)\, \mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} \equiv \hat{A}\mathbf{a} \wedge \hat{A}\mathbf{b} \wedge \mathbf{c} + \hat{A}\mathbf{a} \wedge \mathbf{b} \wedge \hat{A}\mathbf{c} + \mathbf{a} \wedge \hat{A}\mathbf{b} \wedge \hat{A}\mathbf{c} . \]

The imaginary unit ($\sqrt{-1}$) is denoted by a roman "i", while the
base of natural logarithms is written as an italic "e." For example, I
would write $e^{\mathrm{i}\pi} = -1$. This convention is designed to
avoid conflicts with the much used index $i$ and with labeled vectors
such as $\mathbf{e}_i$.

I write an italic $d$ in the derivatives, such as $df/dx$, and in
integrals, such as $\int f(x)\,dx$, because in these cases the symbols
$dx$ do not refer to a separate well-defined object "dx" but are a
part of the traditional symbolic notation used in calculus.
Differential forms (or, for that matter, nonstandard calculus) do make
"dx" into a well-defined object; in that case I write a roman "d" in
"$\mathrm{d}x$." Neither calculus nor differential forms are actually
used in this book; the only exception is the occasional use of the
derivative $d/dx$ applied to polynomials in $x$. I will not need to
make a distinction between $d/dx$ and $\partial/\partial x$; the
derivative of a function $f$ with respect to $x$ is denoted by
$\partial_x f$.

[1] I was too lazy to implement a comprehensive system of numbering
for all these items.

0.2 Sample quiz problems

The following problems can be solved using techniques explained in
this book. (These problems are of varying difficulty.) In these
problems $V$ is an $N$-dimensional vector space (with a scalar product
if indicated).

Exterior multiplication: If two tensors
$\omega_1, \omega_2 \in \wedge^k V$ (with $1 \le k \le N-1$) are such
that $\omega_1 \wedge \mathbf{v} = \omega_2 \wedge \mathbf{v}$ for all
vectors $\mathbf{v} \in V$, show that $\omega_1 = \omega_2$.

0 Introduction and summary

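Insertions: a) It is given that $\psi \in \wedge^k V$ (with $1 \le k \le N-1$) and $\psi \wedge \mathbf{a} = 0$, where $\mathbf{a} \in V$ and $\mathbf{a} \neq 0$. Further, a covector $\mathbf{f}^* \in V^*$ is given such that $\mathbf{f}^*(\mathbf{a}) \neq 0$. Show that

$$\psi = \frac{1}{\mathbf{f}^*(\mathbf{a})}\, \mathbf{a} \wedge (\iota_{\mathbf{f}^*} \psi).$$

b) It is given that $\psi \wedge \mathbf{a} = 0$ and $\psi \wedge \mathbf{b} = 0$, where $\psi \in \wedge^k V$ (with $2 \le k \le N-1$) and $\mathbf{a}, \mathbf{b} \in V$ such that $\mathbf{a} \wedge \mathbf{b} \neq 0$. Show that there exists $\chi \in \wedge^{k-2} V$ such that $\psi = \mathbf{a} \wedge \mathbf{b} \wedge \chi$.

c) It is given that $\psi \wedge \mathbf{a} \wedge \mathbf{b} = 0$, where $\psi \in \wedge^k V$ (with $2 \le k \le N-2$) and $\mathbf{a}, \mathbf{b} \in V$ such that $\mathbf{a} \wedge \mathbf{b} \neq 0$. Is it always true that $\psi = \mathbf{a} \wedge \mathbf{b} \wedge \chi$ for some $\chi \in \wedge^{k-2} V$?

Determinants: a) Suppose $\hat{A}$ is a linear operator defined by $\hat{A} = \sum_{i=1}^{N} \mathbf{a}_i \otimes \mathbf{b}_i^*$, where $\mathbf{a}_i \in V$ are given vectors and $\mathbf{b}_i^* \in V^*$ are given covectors; $N = \dim V$. Show that

$$\det \hat{A} = \frac{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_N}{\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N}\; \frac{\mathbf{b}_1^* \wedge ... \wedge \mathbf{b}_N^*}{\mathbf{e}_1^* \wedge ... \wedge \mathbf{e}_N^*},$$

where $\{\mathbf{e}_j\}$ is an arbitrary basis and $\{\mathbf{e}_j^*\}$ is the corresponding dual basis. Show that the expression above is independent of the choice of the basis $\{\mathbf{e}_j\}$.

b) Suppose that a scalar product is given in $V$, and an operator $\hat{A}$ is defined by

$$\hat{A}\mathbf{x} \equiv \sum_{i=1}^{N} \mathbf{a}_i \langle \mathbf{b}_i, \mathbf{x} \rangle.$$

Further, suppose that $\{\mathbf{e}_j\}$ is an orthonormal basis in $V$. Show that

$$\det \hat{A} = \frac{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_N}{\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N}\; \frac{\mathbf{b}_1 \wedge ... \wedge \mathbf{b}_N}{\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N},$$

and that this expression is independent of the choice of the orthonormal basis $\{\mathbf{e}_j\}$ and of the orientation of the basis.

Hyperplanes: a) Let us suppose that the "price" of the vector $\mathbf{x} \in V$ is given by the formula

$$\mathrm{Cost}(\mathbf{x}) \equiv C(\mathbf{x}, \mathbf{x}),$$

where $C(\mathbf{a}, \mathbf{b})$ is a known, positive-definite bilinear form. Determine the "cheapest" vector $\mathbf{x}$ belonging to the affine hyperplane $\mathbf{a}^*(\mathbf{x}) = \alpha$, where $\mathbf{a}^* \in V^*$ is a nonzero covector and $\alpha$ is a number.

b) We are now working in a vector space with a scalar product, and the "price" of a vector $\mathbf{x}$ is $\langle \mathbf{x}, \mathbf{x} \rangle$. Two affine hyperplanes are given by equations $\langle \mathbf{a}, \mathbf{x} \rangle = \alpha$ and $\langle \mathbf{b}, \mathbf{x} \rangle = \beta$, where $\mathbf{a}$ and $\mathbf{b}$ are given vectors, $\alpha$ and $\beta$ are numbers, and $\mathbf{x} \in V$. (It is assured that $\mathbf{a}$ and $\mathbf{b}$ are nonzero and not parallel to each other.) Determine the "cheapest" vector $\mathbf{x}$ belonging to the intersection of the two hyperplanes.

Too few equations: A linear operator $\hat{A}$ is defined by $\hat{A} = \sum_{i=1}^{k} \mathbf{a}_i \otimes \mathbf{b}_i^*$, where $\mathbf{a}_i \in V$ are given vectors and $\mathbf{b}_i^* \in V^*$ are given covectors, and $k < N = \dim V$. Show that the vector equation $\hat{A}\mathbf{x} = \mathbf{c}$ has no solutions if $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_k \wedge \mathbf{c} \neq 0$. In case $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_k \wedge \mathbf{c} = 0$, show that solutions $\mathbf{x}$ surely exist when $\mathbf{b}_1^* \wedge ... \wedge \mathbf{b}_k^* \neq 0$ but may not exist otherwise.

Operator functions: It is known that the operator $\hat{A}$ satisfies the operator equation $\hat{A}^2 = -\hat{1}$. Simplify the operator-valued functions $\frac{1+\hat{A}}{3-\hat{A}}$, $\cos(\lambda \hat{A})$, and $\sqrt{\hat{A}+2}$ to linear formulas involving $\hat{A}$. (Here $\lambda$ is a number, while the numbers 1, 2, 3 stand for multiples of the identity operator.) Compare the results with the complex numbers $\frac{1+i}{3-i}$, $\cos(\lambda i)$, $\sqrt{i+2}$ and generalize the conclusion to a theorem about computing analytic functions $f(\hat{A})$.

Inverse operator: It is known that $\hat{A}\hat{B} = \lambda \hat{1}_V$, where $\lambda \neq 0$ is a number. Prove that also $\hat{B}\hat{A} = \lambda \hat{1}_V$. (Both $\hat{A}$ and $\hat{B}$ are linear operators in a finite-dimensional space $V$.)

Trace and determinant: Consider the space of polynomials in the variables $x$ and $y$, where we admit only polynomials of the form $a_0 + a_1 x + a_2 y + a_3 xy$ (with $a_j \in \mathbb{R}$). An operator $\hat{A}$ is defined by

$$\hat{A} \equiv x \frac{\partial}{\partial x} - \frac{\partial}{\partial y}.$$

Show that $\hat{A}$ is a linear operator in this space. Compute the trace and the determinant of $\hat{A}$. If $\hat{A}$ is invertible, compute $\hat{A}^{-1}(x+y)$.

Cayley-Hamilton theorem: Express $\det \hat{A}$ through $\mathrm{Tr}\hat{A}$ and $\mathrm{Tr}(\hat{A}^2)$ for an arbitrary operator $\hat{A}$ in a two-dimensional space.

Algebraic complement: Let $\hat{A}$ be a linear operator and $\tilde{\hat{A}}$ its algebraic complement.

a) Show that

$$\mathrm{Tr}\,\tilde{\hat{A}} = \wedge^N \hat{A}^{N-1}.$$

Here $\wedge^N \hat{A}^{N-1}$ is the coefficient at $(-\lambda)$ in the characteristic polynomial of $\hat{A}$ (that is, minus the coefficient preceding the determinant).

b) For $t$-independent operators $\hat{A}$ and $\hat{B}$, show that

$$\frac{\partial}{\partial t} \det(\hat{A} + t\hat{B}) = \mathrm{Tr}(\tilde{\hat{A}}\hat{B}).$$

Liouville formula: Suppose $\hat{X}(t)$ is defined as a solution of the differential equation

$$\partial_t \hat{X}(t) = \hat{A}(t)\hat{X}(t) - \hat{X}(t)\hat{A}(t),$$

where $\hat{A}(t)$ is a given operator. (Operators that are functions of $t$ can be understood as operator-valued formal power series.)

a) Show that the determinant of $\hat{X}(t)$ is independent of $t$.

b) Show that all the coefficients of the characteristic polynomial of $\hat{X}(t)$ are independent of $t$.

Hodge star: Suppose $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is a basis in $V$, not necessarily orthonormal, while $\{\mathbf{e}_j\}$ is a positively oriented orthonormal basis. Show that

$$*(\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N) = \frac{\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N}{\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N}.$$

Volume in space: Consider the space of polynomials of degree at most 4 in the variable $x$. The scalar product of two polynomials $p_1(x)$ and $p_2(x)$ is defined by

$$\langle p_1, p_2 \rangle \equiv \frac{1}{2} \int_{-1}^{1} p_1(x) p_2(x)\, dx.$$

Determine the three-dimensional volume of the tetrahedron with vertices at the "points" $0$, $1+x$, $x^2+x^3$, $x^4$ in this five-dimensional space.

0.3 A list of results

Here is a list of some results explained in this book. If you already know all these results and their derivations, you may not need to read any further.

Vector spaces may be defined over an abstract number field, without specifying the number of dimensions or a basis.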


The set $\{a + b\sqrt{41} \mid a, b \in \mathbb{Q}\}$ is a number field.

Any vector can be represented as a linear combination of basis vectors. All bases have equally many vectors.

The set of all linear maps from one vector space to another is denoted $\mathrm{Hom}(V, W)$ and is a vector space.

The zero vector is not an eigenvector (by definition).

An operator having in some basis the matrix representation $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ cannot be diagonalized.

The dual vector space $V^*$ has the same dimension as $V$ (for finite-dimensional spaces).

Given a nonzero covector $\mathbf{f}^* \in V^*$, the set of vectors $\mathbf{v} \in V$ such that $\mathbf{f}^*(\mathbf{v}) = 0$ is a subspace of codimension 1 (a hyperplane).

The tensor product of $\mathbb{R}^m$ and $\mathbb{R}^n$ has dimension $mn$.

Any linear map $\hat{A} : V \to W$ can be represented by a tensor of the form $\sum_{i=1}^{k} \mathbf{v}_i^* \otimes \mathbf{w}_i \in V^* \otimes W$. The rank of $\hat{A}$ is equal to the smallest number of simple tensor product terms $\mathbf{v}_i^* \otimes \mathbf{w}_i$ required for this representation.

The identity map $\hat{1}_V : V \to V$ is represented as the tensor $\sum_{i=1}^{N} \mathbf{e}_i \otimes \mathbf{e}_i^* \in V \otimes V^*$, where $\{\mathbf{e}_i\}$ is any basis and $\{\mathbf{e}_i^*\}$ its dual basis. This tensor does not depend on the choice of the basis $\{\mathbf{e}_i\}$.

A set of vectors $\{\mathbf{v}_1, ..., \mathbf{v}_k\}$ is linearly independent if and only if $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k \neq 0$. If $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k \neq 0$ but $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k \wedge \mathbf{x} = 0$ then the vector $\mathbf{x}$ belongs to the subspace $\mathrm{Span}\,\{\mathbf{v}_1, ..., \mathbf{v}_k\}$.

The dimension of the space $\wedge^k V$ is $\binom{N}{k}$, where $N \equiv \dim V$.

Insertion $\iota_{\mathbf{a}^*}\omega$ of a covector $\mathbf{a}^* \in V^*$ into an antisymmetric tensor $\omega \in \wedge^k V$ has the property

$$\mathbf{v} \wedge (\iota_{\mathbf{a}^*}\omega) + \iota_{\mathbf{a}^*}(\mathbf{v} \wedge \omega) = \mathbf{a}^*(\mathbf{v})\,\omega.$$

Given a basis $\{\mathbf{e}_i\}$, the dual basis $\{\mathbf{e}_i^*\}$ may be computed as

$$\mathbf{e}_i^*(\mathbf{x}) = \frac{\mathbf{e}_1 \wedge ... \wedge \mathbf{x} \wedge ... \wedge \mathbf{e}_N}{\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N},$$

where $\mathbf{x}$ replaces $\mathbf{e}_i$ in the numerator.

The subspace spanned by a set of vectors $\{\mathbf{v}_1, ..., \mathbf{v}_k\}$, not necessarily linearly independent, can be characterized by a certain antisymmetric tensor $\omega$, which is the exterior product of the largest number of $\mathbf{v}_i$'s such that $\omega \neq 0$. The tensor $\omega$, computed in this way, is unique up to a constant factor.

The $n$-vector (antisymmetric tensor) $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_n$ represents geometrically the oriented $n$-dimensional volume of the parallelepiped spanned by the vectors $\mathbf{v}_i$.

The determinant of a linear operator $\hat{A}$ is the coefficient that multiplies the oriented volume of any parallelepiped transformed by $\hat{A}$. In our notation, the operator $\wedge^N \hat{A}^N$ acts in $\wedge^N V$ as multiplication by $\det \hat{A}$.

If each of the given vectors $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is expressed through a basis $\{\mathbf{e}_i\}$ as $\mathbf{v}_j = \sum_{i=1}^{N} v_{ij} \mathbf{e}_i$, the determinant of the matrix $v_{ij}$ is found as

$$\det(v_{ij}) = \det(v_{ji}) = \frac{\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N}{\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N}.$$

A linear operator $\hat{A} : V \to V$ and its canonically defined transpose $\hat{A}^T : V^* \to V^*$ have the same characteristic polynomials.

If $\det \hat{A} \neq 0$ then the inverse operator $\hat{A}^{-1}$ exists, and a linear equation $\hat{A}\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = \hat{A}^{-1}\mathbf{b}$. Otherwise, solutions exist if $\mathbf{b}$ belongs to the image of $\hat{A}$. Explicit solutions may be constructed using Cramer's rule: If a vector $\mathbf{b}$ belongs to the subspace spanned by vectors $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ then $\mathbf{b} = \sum_{i=1}^{n} b_i \mathbf{v}_i$, where the coefficients $b_i$ may be found (assuming $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_n \neq 0$) as

$$b_i = \frac{\mathbf{v}_1 \wedge ... \wedge \mathbf{b} \wedge ... \wedge \mathbf{v}_n}{\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_n}$$

(here $\mathbf{b}$ replaces $\mathbf{v}_i$ in the exterior product in the numerator).

Eigenvalues of a linear operator are roots of its characteristic polynomial. For each root $\lambda_i$, there exists at least one eigenvector corresponding to the eigenvalue $\lambda_i$.

If $\{\mathbf{v}_1, ..., \mathbf{v}_k\}$ are eigenvectors corresponding to all different eigenvalues $\lambda_1, ..., \lambda_k$ of some operator, then the set $\{\mathbf{v}_1, ..., \mathbf{v}_k\}$ is linearly independent.

The dimension of the eigenspace corresponding to $\lambda_i$ is not larger than the algebraic multiplicity of the root $\lambda_i$ in the characteristic polynomial.

(Below in this section we always denote by $N$ the dimension of the space $V$.)

The trace of an operator $\hat{A}$ can be expressed as $\wedge^N \hat{A}^1$.

We have $\mathrm{Tr}(\hat{A}\hat{B}) = \mathrm{Tr}(\hat{B}\hat{A})$. This holds even if $\hat{A}, \hat{B}$ are maps between different spaces, i.e. $\hat{A} : V \to W$ and $\hat{B} : W \to V$.

If an operator $\hat{A}$ is nilpotent, its characteristic polynomial is $(-\lambda)^N$, i.e. the same as the characteristic polynomial of a zero operator.

The $j$-th coefficient of the characteristic polynomial of $\hat{A}$ is $(-1)^j (\wedge^N \hat{A}^j)$.

Each coefficient of the characteristic polynomial of $\hat{A}$ can be expressed as a polynomial function of $N$ traces of the form $\mathrm{Tr}(\hat{A}^k)$, $k = 1, ..., N$.

The space $\wedge^{N-1} V$ is $N$-dimensional like $V$ itself, and there is a canonical isomorphism between $\mathrm{End}(\wedge^{N-1} V)$ and $\mathrm{End}(V)$. This isomorphism, called exterior transposition, is denoted by $(...)^{\wedge T}$. The exterior transpose of an operator $\hat{X} \in \mathrm{End}\, V$ is defined by

$$(\hat{X}^{\wedge T} \omega) \wedge \mathbf{v} \equiv \omega \wedge \hat{X}\mathbf{v}, \quad \forall \omega \in \wedge^{N-1} V,\ \mathbf{v} \in V.$$

Similarly, one defines the exterior transposition map between $\mathrm{End}(\wedge^{N-k} V)$ and $\mathrm{End}(\wedge^k V)$ for all $k = 1, ..., N$.

The algebraic complement operator (normally defined as a matrix consisting of minors) is canonically defined through exterior transposition as $\tilde{\hat{A}} \equiv (\wedge^{N-1} \hat{A}^{N-1})^{\wedge T}$. It can be expressed as a polynomial in $\hat{A}$ and satisfies the identity $\hat{A}\tilde{\hat{A}} = (\det \hat{A})\hat{1}_V$. Also, all other operators

$$\hat{A}_{(k)} \equiv \left(\wedge^{N-1} \hat{A}^{N-k}\right)^{\wedge T}, \quad k = 1, ..., N$$

can be expressed as polynomials in $\hat{A}$ with known coefficients.

The characteristic polynomial of $\hat{A}$ gives the zero operator if applied to the operator $\hat{A}$ (the Cayley-Hamilton theorem). A similar theorem holds for each of the operators $\wedge^k \hat{A}^1$, $2 \le k \le N-1$ (with different polynomials).

A formal power series $f(t)$ can be applied to the operator $t\hat{A}$; the result is an operator-valued formal series $f(t\hat{A})$ that has the usual properties, e.g.

$$\partial_t f(t\hat{A}) = \hat{A} f'(t\hat{A}).$$


If $\hat{A}$ is diagonalized with eigenvalues $\{\lambda_i\}$ in the eigenbasis $\{\mathbf{e}_i\}$, then a formal power series $f(t\hat{A})$ is diagonalized in the same basis with eigenvalues $f(t\lambda_i)$.

If an operator $\hat{A}$ satisfies a polynomial equation such as $p(\hat{A}) = 0$, where $p(x)$ is a known polynomial of degree $k$ (not necessarily, but possibly, the characteristic polynomial of $\hat{A}$) then any formal power series $f(t\hat{A})$ is reduced to a polynomial in $t\hat{A}$ of degree not larger than $k-1$. This polynomial can be computed as the interpolating polynomial for the function $f(tx)$ at points $x = x_i$ where $x_i$ are the (all different) roots of $p(x)$. Suitable modifications are available when not all roots are different. So one can compute any analytic function $f(\hat{A})$ of the operator $\hat{A}$ as long as one knows a polynomial equation satisfied by $\hat{A}$.

A square root of an operator $\hat{A}$ (i.e. a linear operator $\hat{B}$ such that $\hat{B}\hat{B} = \hat{A}$) is not unique and does not always exist. In two and three dimensions, one can either obtain all square roots explicitly as polynomials in $\hat{A}$, or determine that some square roots are not expressible as polynomials in $\hat{A}$ or that square roots of $\hat{A}$ do not exist at all.

If an operator $\hat{A}$ depends on a parameter $t$, one can express the derivative of the determinant of $\hat{A}$ through the algebraic complement $\tilde{\hat{A}}$ (Jacobi's formula),

$$\partial_t \det \hat{A}(t) = \mathrm{Tr}(\tilde{\hat{A}}\, \partial_t \hat{A}).$$

Derivatives of other coefficients $q_k \equiv \wedge^N \hat{A}^{N-k}$ of the characteristic polynomial are given by similar formulas,

$$\partial_t q_k = \mathrm{Tr}\left[ (\wedge^{N-1} \hat{A}^{N-k-1})^{\wedge T}\, \partial_t \hat{A} \right].$$

The Liouville formula holds: $\det \exp \hat{A} = \exp \mathrm{Tr}\hat{A}$.

Any operator (not necessarily diagonalizable) can be reduced to a Jordan canonical form in a Jordan basis. The Jordan basis consists of eigenvectors and root vectors for each eigenvalue.

Given an operator $\hat{A}$ whose characteristic polynomial is known (hence all roots $\lambda_i$ and their algebraic multiplicities $m_i$ are known), one can construct explicitly a projector $\hat{P}_{\lambda_i}$ onto a Jordan cell for any chosen eigenvalue $\lambda_i$. The projector is found as a polynomial in $\hat{A}$ with known coefficients.

(Below in this section we assume that a scalar product is fixed in $V$.)

A nondegenerate scalar product provides a one-to-one correspondence between vectors and covectors. Then the canonically transposed operator $\hat{A}^T : V^* \to V^*$ can be mapped into an operator in $V$, denoted also by $\hat{A}^T$. (This operator is represented by the transposed matrix only in an orthonormal basis.) We have $(\hat{A}\hat{B})^T = \hat{B}^T \hat{A}^T$ and $\det(\hat{A}^T) = \det \hat{A}$.

Orthogonal transformations have determinants equal to $\pm 1$. Mirror reflections are orthogonal transformations and have determinant equal to $-1$.

Given an orthonormal basis $\{\mathbf{e}_i\}$, one can define the unit volume tensor $\omega = \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N$. The tensor $\omega$ is then independent of the choice of $\{\mathbf{e}_i\}$ up to a factor $\pm 1$ due to the orientation of the basis (i.e. the ordering of the vectors of the basis), as long as the scalar product is kept fixed.

Given a fixed scalar product $\langle \cdot, \cdot \rangle$ and a fixed orientation of space, the Hodge star operation is uniquely defined as a linear map (isomorphism) $\wedge^k V \to \wedge^{N-k} V$ for each $k = 0, ..., N$. For instance,

$$*\mathbf{e}_1 = \mathbf{e}_2 \wedge \mathbf{e}_3 \wedge ... \wedge \mathbf{e}_N; \quad *(\mathbf{e}_1 \wedge \mathbf{e}_2) = \mathbf{e}_3 \wedge ... \wedge \mathbf{e}_N,$$

if $\{\mathbf{e}_i\}$ is any positively oriented, orthonormal basis.

The Hodge star map satisfies

$$\langle \mathbf{a}, \mathbf{b} \rangle = *(\mathbf{a} \wedge *\mathbf{b}) = *(\mathbf{b} \wedge *\mathbf{a}), \quad \mathbf{a}, \mathbf{b} \in V.$$

In a three-dimensional space, the usual vector product and triple product can be expressed through the Hodge star as

$$\mathbf{a} \times \mathbf{b} = *(\mathbf{a} \wedge \mathbf{b}), \quad \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = *(\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}).$$

The volume of an $N$-dimensional parallelepiped spanned by $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is equal to $\sqrt{\det(G_{ij})}$, where $G_{ij} \equiv \langle \mathbf{v}_i, \mathbf{v}_j \rangle$ is the matrix of the pairwise scalar products.

Given a scalar product in $V$, a scalar product is canonically defined also in the spaces $\wedge^k V$ for all $k = 2, ..., N$. This scalar product can be defined by

$$\langle \omega_1, \omega_2 \rangle = *(\omega_1 \wedge *\omega_2) = *(\omega_2 \wedge *\omega_1) = \langle \omega_2, \omega_1 \rangle,$$

where $\omega_{1,2} \in \wedge^k V$. Alternatively, this scalar product is defined by choosing an orthonormal basis $\{\mathbf{e}_j\}$ and postulating that $\mathbf{e}_{i_1} \wedge ... \wedge \mathbf{e}_{i_k}$ is normalized and orthogonal to any other such tensor with different indices $\{i_j \mid j = 1, ..., k\}$. The $k$-dimensional volume of a parallelepiped spanned by vectors $\{\mathbf{v}_1, ..., \mathbf{v}_k\}$ is found as $\sqrt{\langle \psi, \psi \rangle}$ with $\psi \equiv \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k \in \wedge^k V$.

The insertion $\iota_{\mathbf{v}}\psi$ of a vector $\mathbf{v}$ into a $k$-vector $\psi \in \wedge^k V$ (or the "interior product") can be expressed as

$$\iota_{\mathbf{v}}\psi = *(\mathbf{v} \wedge *\psi).$$

If $\omega \equiv \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N$ is the unit volume tensor, we have $\iota_{\mathbf{v}}\omega = *\mathbf{v}$.

Symmetric, antisymmetric, Hermitian, and anti-Hermitian operators are always diagonalizable (if we allow complex eigenvalues and eigenvectors). Eigenvectors of these operators can be chosen orthogonal to each other.

Antisymmetric operators are representable as elements of $\wedge^2 V$ of the form $\sum_{i=1}^{n} \mathbf{a}_i \wedge \mathbf{b}_i$, where one needs no more than $N/2$ terms, and the vectors $\mathbf{a}_i$, $\mathbf{b}_i$ can be chosen mutually orthogonal to each other. (For this, we do not need complex vectors.)

The Pfaffian of an antisymmetric operator $\hat{A}$ in even-dimensional space is the number $\mathrm{Pf}\,\hat{A}$ defined as

$$\frac{1}{(N/2)!} \underbrace{A \wedge ... \wedge A}_{N/2} = (\mathrm{Pf}\,\hat{A})\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N,$$

where $\{\mathbf{e}_i\}$ is an orthonormal basis. Some basic properties of the Pfaffian are

$$(\mathrm{Pf}\,\hat{A})^2 = \det \hat{A},$$

$$\mathrm{Pf}(\hat{B}\hat{A}\hat{B}^T) = (\det \hat{B})(\mathrm{Pf}\,\hat{A}),$$

where $\hat{A}$ is an antisymmetric operator ($\hat{A}^T = -\hat{A}$) and $\hat{B}$ is an arbitrary operator.
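A few of the identities in this list can be spot-checked numerically. The following is only a minimal sketch in plain Python and is not taken from the book: the helper names (`det`, `matmul`, `transpose`, `trace`, `pfaffian4`) and the sample matrices are made up for illustration, and `pfaffian4` assumes the standard closed formula $a_{12}a_{34} - a_{13}a_{24} + a_{14}a_{23}$ for a 4×4 antisymmetric matrix rather than the exterior-product definition.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def matmul(a, b):
    """Matrix product of lists of rows; shapes must be compatible."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def pfaffian4(a):
    """Pfaffian of a 4x4 antisymmetric matrix (assumed standard formula)."""
    return a[0][1] * a[2][3] - a[0][2] * a[1][3] + a[0][3] * a[1][2]

# Hypothetical sample data: A is antisymmetric, B is arbitrary.
A = [[0, 1, 2, 3],
     [-1, 0, 4, 5],
     [-2, -4, 0, 6],
     [-3, -5, -6, 0]]
B = [[1, 2, 0, 1],
     [0, 1, 3, 0],
     [2, 0, 1, 1],
     [1, 1, 0, 2]]

# (Pf A)^2 = det A
print(pfaffian4(A) ** 2 == det(A))

# Pf(B A B^T) = (det B)(Pf A)
BABt = matmul(matmul(B, A), transpose(B))
print(pfaffian4(BABt) == det(B) * pfaffian4(A))

# Tr(PQ) = Tr(QP) even for maps between different spaces (2x3 and 3x2).
P = [[1, 2, 3], [4, 5, 6]]
Q = [[1, 0], [2, 1], [0, 3]]
print(trace(matmul(P, Q)) == trace(matmul(Q, P)))
```

Integer entries are used so that the comparisons are exact; a check of this kind illustrates the statements for one sample and is of course no substitute for the coordinate-free proofs.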

1 Linear algebra without coordinates
1.1 Vector spaces

Abstract vector spaces are developed as a generalization of the familiar vectors in Euclidean space.

1.1.1 Three-dimensional Euclidean geometry

Let us begin with something you already know. Three-dimensional vectors are specified by triples of coordinates, $\mathbf{r} \equiv (x, y, z)$. The operations of vector sum and vector product of such vectors are defined by

$$(x_1, y_1, z_1) + (x_2, y_2, z_2) \equiv (x_1 + x_2,\ y_1 + y_2,\ z_1 + z_2); \tag{1.1}$$

$$(x_1, y_1, z_1) \times (x_2, y_2, z_2) \equiv (y_1 z_2 - z_1 y_2,\ z_1 x_2 - x_1 z_2,\ x_1 y_2 - y_1 x_2). \tag{1.2}$$

(I assume that these definitions are familiar to you.) Vectors can be rescaled by multiplying them with real numbers,

$$c\mathbf{r} = c\,(x, y, z) \equiv (cx, cy, cz). \tag{1.3}$$

A rescaled vector is parallel to the original vector and points either in the same or in the opposite direction. In addition, a scalar product of two vectors is defined,

$$(x_1, y_1, z_1) \cdot (x_2, y_2, z_2) \equiv x_1 x_2 + y_1 y_2 + z_1 z_2. \tag{1.4}$$

These operations encapsulate all of Euclidean geometry in a purely algebraic language. For example, the length of a vector $\mathbf{r}$ is

$$|\mathbf{r}| \equiv \sqrt{\mathbf{r} \cdot \mathbf{r}} = \sqrt{x^2 + y^2 + z^2}, \tag{1.5}$$

the angle $\alpha$ between vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ is found from the relation (the cosine theorem)

$$|\mathbf{r}_1|\, |\mathbf{r}_2| \cos\alpha = \mathbf{r}_1 \cdot \mathbf{r}_2,$$

while the area of a triangle spanned by vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ is

$$S = \frac{1}{2} |\mathbf{r}_1 \times \mathbf{r}_2|.$$

Using these definitions, one can reformulate every geometric statement (such as, "a triangle having two equal sides has also two equal angles") in terms of relations between vectors, which are ultimately reducible to algebraic equations involving a set of numbers. The replacement of geometric constructions by algebraic relations is useful because it allows us to free ourselves from the confines of our three-dimensional intuition; we are then able to solve problems in higher-dimensional spaces. The price is a greater complication of the algebraic equations and inequalities that need to be solved. To make these equations more transparent and easier to handle, the theory of linear algebra is developed. The first step is to realize what features of vectors are essential and what are just accidental facts of our familiar three-dimensional Euclidean space.

1.1.2 From three-dimensional vectors to abstract vectors

Abstract vector spaces retain the essential properties of the familiar Euclidean geometry but generalize it in two ways: First, the dimension of space is not 3 but an arbitrary integer number (or even infinity); second, the coordinates are "abstract numbers" (see below) instead of real numbers. Let us first pass to higher-dimensional vectors.

Generalizing the notion of a three-dimensional vector to a higher (still finite) dimension is straightforward: instead of triples $(x, y, z)$ one considers sets of $n$ coordinates $(x_1, ..., x_n)$. The definitions of the vector sum (1.1), scaling (1.3) and scalar product (1.4) are straightforwardly generalized to $n$-tuples of coordinates. In this way we can describe $n$-dimensional Euclidean geometry. All theorems of linear algebra are proved in the same way regardless of the number of components in vectors, so the generalization to $n$-dimensional spaces is a natural thing to do.

Question: The scalar product can be generalized to $n$-dimensional spaces,

$$(x_1, ..., x_n) \cdot (y_1, ..., y_n) \equiv x_1 y_1 + ... + x_n y_n,$$

but what about the vector product? The formula (1.2) seems to be complicated, and it is hard to guess what should be written, say, in four dimensions.

Answer: It turns out that the vector product (1.2) cannot be generalized to arbitrary $n$-dimensional spaces.¹ At this point we will not require the vector spaces to have either a vector or a scalar product; instead we will concentrate on the basic algebraic properties of vectors. Later we will see that there is an algebraic construction (the exterior product) that replaces the vector product in higher dimensions.

Abstract numbers

The motivation to replace the real coordinates $x$, $y$, $z$ by complex coordinates, rational coordinates, or by some other, more abstract numbers comes from many branches of physics and mathematics. In any case, the statements of linear algebra almost never rely on the fact that coordinates of vectors are real numbers. Only certain properties of real numbers are actually used, namely that one can add or multiply or divide numbers. So one can easily replace real numbers by complex numbers or by some other kind of numbers as long as one can add, multiply and divide them as usual. (The use of the square root as in Eq. (1.5) can be avoided if one considers only squared lengths of vectors.)

Instead of specifying each time that one works with real numbers or with complex numbers, one says that one is working with some "abstract numbers" that have all the needed properties of numbers. The required properties of such abstract numbers are summarized by the axioms of a number field.

¹ A vector product exists only in some cases, e.g. $n = 3$ and $n = 7$. This is a theorem of higher algebra which we will not prove here.
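The operations (1.1)-(1.5) translate directly into a few lines of code. The sketch below is an illustration only, with hypothetical function names (`add`, `cross`, `scale`, `dot`, `length`, `triangle_area`) that are not part of the text; vectors are represented as plain Python tuples of coordinates. Note that `add`, `scale`, and `dot` work for tuples of any length, while `cross` is written for triples only, in line with the remark that the vector product does not generalize to arbitrary dimensions.

```python
import math

def add(r1, r2):
    """Vector sum, Eq. (1.1); works for n-tuples of equal length."""
    return tuple(a + b for a, b in zip(r1, r2))

def cross(r1, r2):
    """Vector product, Eq. (1.2); defined here for triples only."""
    x1, y1, z1 = r1
    x2, y2, z2 = r2
    return (y1 * z2 - z1 * y2, z1 * x2 - x1 * z2, x1 * y2 - y1 * x2)

def scale(c, r):
    """Rescaling by a number c, Eq. (1.3)."""
    return tuple(c * a for a in r)

def dot(r1, r2):
    """Scalar product, Eq. (1.4); works for n-tuples of equal length."""
    return sum(a * b for a, b in zip(r1, r2))

def length(r):
    """Length of a vector, Eq. (1.5)."""
    return math.sqrt(dot(r, r))

def triangle_area(r1, r2):
    """Area of the triangle spanned by r1 and r2: S = |r1 x r2| / 2."""
    return 0.5 * length(cross(r1, r2))

r1 = (1.0, 0.0, 0.0)
r2 = (0.0, 2.0, 0.0)
print(add(r1, r2))            # (1.0, 2.0, 0.0)
print(cross(r1, r2))          # (0.0, 0.0, 2.0)
print(triangle_area(r1, r2))  # 1.0
print(dot((1, 2, 3, 4), (1, 1, 1, 1)))  # 10, a four-dimensional scalar product
```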


Definition: A number field (also called simply a field) is a set $K$ which is an abelian group with respect to addition and (after excluding the element 0) with respect to multiplication, such that the distributive law holds. More precisely: There exist elements 0 and 1, and the operations $+$, $-$, $*$, and $/$ are defined such that $a + b = b + a$, $a * b = b * a$, $0 + a = a$, $1 * a = a$, $0 * a = 0$, and for every $a \in K$ the numbers $-a$ and $1/a$ (for $a \neq 0$) exist such that $a + (-a) = 0$, $a * (1/a) = 1$, and also $a * (b + c) = a * b + a * c$. The operations $-$ and $/$ are defined by $a - b \equiv a + (-b)$ and $a/b = a * (1/b)$.

In a more visual language: A field is a set of elements on which the operations $+$, $-$, $*$, and $/$ are defined, the elements 0 and 1 exist, and the familiar arithmetic properties such as $a + b = b + a$, $a + 0 = a$, $a - a = 0$, $a * 1 = a$, $a/b * b = a$ (for $b \neq 0$), etc. are satisfied. Elements of a field can be visualized as "abstract numbers" because they can be added, subtracted, multiplied, and divided, with the usual arithmetic rules. (For instance, division by zero is still undefined, even with abstract numbers!) I will call elements of a number field simply numbers when (in my view) it does not cause confusion.

Examples of number fields

Real numbers $\mathbb{R}$ are a field, as are rational numbers $\mathbb{Q}$ and complex numbers $\mathbb{C}$, with all arithmetic operations defined as usual. Integer numbers $\mathbb{Z}$ with the usual arithmetic are not a field because e.g. the division of 1 by a nonzero number 2 cannot be an integer.

Another interesting example is the set of numbers of the form $a + b\sqrt{3}$, where $a, b \in \mathbb{Q}$ are rational numbers. It is easy to see that sums, products, and ratios of such numbers are again numbers from the same set, for example

$$(a_1 + b_1\sqrt{3})(a_2 + b_2\sqrt{3}) = (a_1 a_2 + 3 b_1 b_2) + (a_1 b_2 + a_2 b_1)\sqrt{3}.$$

Let's check the division property:

$$\frac{1}{a + b\sqrt{3}} = \frac{a - b\sqrt{3}}{a - b\sqrt{3}}\, \frac{1}{a + b\sqrt{3}} = \frac{a - b\sqrt{3}}{a^2 - 3b^2}.$$

Note that $\sqrt{3}$ is irrational, so the denominator $a^2 - 3b^2$ is never zero as long as $a$ and $b$ are rational and at least one of $a$, $b$ is nonzero. Therefore, we can divide numbers of the form $a + b\sqrt{3}$ and again get numbers of the same kind. It follows that the set $\{a + b\sqrt{3} \mid a, b \in \mathbb{Q}\}$ is indeed a number field. This field is usually denoted by $\mathbb{Q}[\sqrt{3}]$ and called an extension of rational numbers by $\sqrt{3}$. Fields of this form are useful in algebraic number theory.

A field might even consist of a finite set of numbers (in which case it is called a finite field). For example, the set of three numbers $\{0, 1, 2\}$ can be made a field if we define the arithmetic operations as

$$1 + 2 \equiv 0, \quad 2 + 2 \equiv 1, \quad 2 * 2 \equiv 1, \quad 1/2 \equiv 2,$$

with all other operations as in usual arithmetic. This is the field of integers modulo 3 and is denoted by $\mathbb{F}_3$. Fields of this form are useful, for instance, in cryptography.

Any field must contain elements that play the role of the numbers 0 and 1; we denote these elements simply by 0 and 1. Therefore the smallest possible field is the set $\{0, 1\}$ with the usual relations $0 + 1 = 1$, $1 \cdot 1 = 1$ etc. This field is denoted by $\mathbb{F}_2$.

Most of the time we will not need to specify the number field; it is all right to imagine that we always use $\mathbb{R}$ or $\mathbb{C}$ as the field. (See Appendix A for a brief introduction to complex numbers.)

Exercise: Which of the following sets are number fields:

a) $\{x + iy\sqrt{2} \mid x, y \in \mathbb{Q}\}$, where $i$ is the imaginary unit.

b) $\{x + y\sqrt{2} \mid x, y \in \mathbb{Z}\}$.

Abstract vector spaces

After a generalization of the three-dimensional vector geometry to $n$-dimensional spaces and real numbers $\mathbb{R}$ to abstract number fields, we arrive at the following definition of a vector space.

Definition V1: An $n$-dimensional vector space over a field $K$ is the set of all $n$-tuples $(x_1, ..., x_n)$, where $x_i \in K$; the numbers $x_i$ are called components of the vector (in older books they were called coordinates). The operations of vector sum and the scaling of vectors by numbers are given by the formulas

$$(x_1, ..., x_n) + (y_1, ..., y_n) \equiv (x_1 + y_1, ..., x_n + y_n), \quad x_i, y_i \in K;$$

$$\lambda\, (x_1, ..., x_n) \equiv (\lambda x_1, ..., \lambda x_n), \quad \lambda \in K.$$

This vector space is denoted by $K^n$.

Most problems in physics involve vector spaces over the field of real numbers $K = \mathbb{R}$ or complex numbers $K = \mathbb{C}$. However, most results of basic linear algebra hold for arbitrary number fields, and for now we will consider vector spaces over an arbitrary number field $K$.

Definition V1 is adequate for applications involving finite-dimensional vector spaces. However, it turns out that further abstraction is necessary when one considers infinite-dimensional spaces. Namely, one needs to do away with coordinates and define the vector space by the basic requirements on the vector sum and scaling operations.

We will adopt the following coordinate-free definition of a vector space.

Definition V2: A set $V$ is a vector space over a number field $K$ if the following conditions are met:

1. $V$ is an abelian group; the sum of two vectors is denoted by the "+" sign, the zero element is the vector $\mathbf{0}$. So for any $\mathbf{u}, \mathbf{v} \in V$ the vector $\mathbf{u} + \mathbf{v} \in V$ exists, $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$, and in particular $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for any $\mathbf{v} \in V$.

2. An operation of multiplication by numbers is defined, such that for each $\lambda \in K$, $\mathbf{v} \in V$ the vector $\lambda\mathbf{v} \in V$ is determined.

3. The following properties hold, for all vectors $\mathbf{u}, \mathbf{v} \in V$ and all numbers $\lambda, \mu \in K$:

$$(\lambda + \mu)\,\mathbf{v} = \lambda\mathbf{v} + \mu\mathbf{v}, \quad \lambda\,(\mathbf{v} + \mathbf{u}) = \lambda\mathbf{v} + \lambda\mathbf{u},$$

$$1\mathbf{v} = \mathbf{v}, \quad 0\mathbf{v} = \mathbf{0}.$$

These properties guarantee that the multiplication by numbers is compatible with the vector sum, so that usual rules of arithmetic and algebra are applicable.

Below I will not be so pedantic as to write the boldface $\mathbf{0}$ for the zero vector $\mathbf{0} \in V$; denoting the zero vector simply by 0 never creates confusion in practice.

Elements of a vector space are called vectors; in contrast, numbers from the field $K$ are called scalars. For clarity, since


this is an introductory text, I will print all vectors in boldface font so that $\mathbf{v}$, $\mathbf{a}$, $\mathbf{x}$ are vectors but $v$, $a$, $x$ are scalars (i.e. numbers). Sometimes, for additional clarity, one uses Greek letters such as $\alpha$, $\lambda$, $\mu$ to denote scalars and Latin letters to denote vectors. For example, one writes expressions of the form $\lambda_1 \mathbf{v}_1 + \lambda_2 \mathbf{v}_2 + ... + \lambda_n \mathbf{v}_n$; these are called linear combinations of vectors $\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n$.

The definition V2 is standard in abstract algebra. As we will see below, the coordinate-free language is well suited to proving theorems about general properties of vectors.

Question: I do not understand how to work with abstract vectors in abstract vector spaces. According to the vector space axioms (definition V2), I should be able to add vectors together and multiply them by scalars. It is clear how to add the $n$-tuples $(v_1, ..., v_n)$, but how can I compute anything with an abstract vector $\mathbf{v}$ that does not seem to have any components?

Answer: Definition V2 is "abstract" in the sense that it does not explain how to add particular kinds of vectors; instead it merely lists the set of properties any vector space must satisfy. To define a particular vector space, we of course need to specify a particular set of vectors and a rule for adding its elements in an explicit fashion (see examples below in Sec. 1.1.3). Definition V2 is used in the following way: Suppose someone claims that a certain set $X$ of particular mathematical objects is a vector space over some number field; then we only need to check that the sum of vectors and the multiplication of a vector by a number are well-defined and conform to the properties listed in Definition V2. If every property holds, then the set $X$ is a vector space, and all the theorems of linear algebra will automatically hold for the elements of the set $X$. Viewed from this perspective, Definition V1 specifies a particular vector space: the space of rows of numbers $(v_1, ..., v_n)$. In some cases the vector space at hand is exactly that of Definition V1, and then it is convenient to work with components $v_j$ when performing calculations with specific vectors. However, components are not needed for proving general theorems. In this book, when I say that "a vector $\mathbf{v} \in V$ is given," I imagine that enough concrete information about $\mathbf{v}$ will be available when it is actually needed.

1.1.3 Examples of vector spaces

Example 0. The familiar example is the three-dimensional Euclidean space. This space is denoted by $\mathbb{R}^3$ and is the set of all triples $(x_1, x_2, x_3)$, where $x_i$ are real numbers. This is a vector space over $\mathbb{R}$.

Example 1. The set of complex numbers $\mathbb{C}$ is a vector space over the field of real numbers $\mathbb{R}$. Indeed, complex numbers can be added and multiplied by real numbers.

Example 2. Consider the set of all three-dimensional vectors $\mathbf{v} \in \mathbb{R}^3$ which are orthogonal to a given vector $\mathbf{a} \neq 0$; here we use the standard scalar product (1.4); vectors $\mathbf{a}$ and $\mathbf{b}$ are called orthogonal to each other if $\mathbf{a} \cdot \mathbf{b} = 0$. This set is closed under vector sum and scalar multiplication because if $\mathbf{u} \cdot \mathbf{a} = 0$ and $\mathbf{v} \cdot \mathbf{a} = 0$, then for any $\lambda \in \mathbb{R}$ we have $(\mathbf{u} + \lambda\mathbf{v}) \cdot \mathbf{a} = 0$. Thus we obtain a vector space (a certain subset of $\mathbb{R}^3$) which is defined not in terms of components but through geometric relations between vectors of another (previously defined) space.

Example 3. Consider the set of all real-valued continuous functions $f(x)$ defined for $x \in [0, 1]$ and such that $f(0) = 0$ and $f(1) = 0$. This set is a vector space over $\mathbb{R}$. Indeed, the definition of a vector space is satisfied if we define the sum of two functions $f$ and $g$ as $f(x) + g(x)$ and the multiplication by scalars, $\lambda f(x)$, in the natural way. It is easy to see that the axioms of the vector space are satisfied: If $h(x) = f(x) + \lambda g(x)$, where $f(x)$ and $g(x)$ are vectors from this space, then the function $h(x)$ is continuous on $[0, 1]$ and satisfies $h(0) = h(1) = 0$, i.e. the function $h(x)$ is also an element of the same space.

Example 4. To represent the fact that there are $\lambda_1$ gallons of water and $\lambda_2$ gallons of oil, we may write the expression $\lambda_1 \mathbf{X} + \lambda_2 \mathbf{Y}$, where $\mathbf{X}$ and $\mathbf{Y}$ are formal symbols and $\lambda_{1,2}$ are numbers. The set of all such expressions is a vector space. This space is called the space of formal linear combinations of the symbols $\mathbf{X}$ and $\mathbf{Y}$. The operations of sum and scalar multiplication are defined in the natural way, so that we can perform calculations such as

$$\frac{1}{2}(2\mathbf{X} + 3\mathbf{Y}) - \frac{1}{2}(2\mathbf{X} - 3\mathbf{Y}) = 3\mathbf{Y}.$$

For the purpose of manipulating such expressions, it is unimportant that $\mathbf{X}$ and $\mathbf{Y}$ stand for water and oil. We may simply work with formal expressions such as $2\mathbf{X} + 3\mathbf{Y}$, where $\mathbf{X}$ and $\mathbf{Y}$ and "+" are symbols that do not mean anything by themselves except that they can appear in such linear combinations and have familiar properties of algebraic objects (the operation "+" is commutative and associative, etc.). Such formal constructions are often encountered in mathematics.

Question: It seems that such "formal" constructions are absurd and/or useless. I know how to add numbers or vectors, but how can I add $\mathbf{X} + \mathbf{Y}$ if $\mathbf{X}$ and $\mathbf{Y}$ are, as you say, "meaningless symbols"?

Answer: Usually when we write "$a + b$" we imply that the operation "+" is already defined, so $a + b$ is another number if $a$ and $b$ are numbers. However, in the case of formal expressions described in Example 4, the "+" sign is actually going to acquire a new definition. So $\mathbf{X} + \mathbf{Y}$ is not equal to a new symbol $\mathbf{Z}$; instead $\mathbf{X} + \mathbf{Y}$ is just an expression that we can manipulate. Consider the analogy with complex numbers: the number $1 + 2i$ is an expression that we manipulate, and the imaginary unit, $i$, is a symbol that is never "equal to something else." According to its definition, the expression $\mathbf{X} + \mathbf{Y}$ cannot be simplified to anything else, just like $1 + 2i$ cannot be simplified. The symbols $\mathbf{X}$, $\mathbf{Y}$, $i$ are not meaningless: their meaning comes from the rules of computations with these symbols.

Maybe it helps to change notation. Let us begin by writing a pair $(a, b)$ instead of $a\mathbf{X} + b\mathbf{Y}$. We can define the sum of such pairs in the natural way, e.g.

$$(2, 3) + (-2, 1) = (0, 4).$$

It is clear that these pairs form a vector space. Now, to remind ourselves that the numbers of the pair stand for, say, quantities of water and oil, we write $(2\mathbf{X}, 3\mathbf{Y})$ instead of $(2, 3)$. The symbols $\mathbf{X}$ and $\mathbf{Y}$ are merely part of the notation. Now it is natural to change the notation further and to write simply $2\mathbf{X}$ instead of $(2\mathbf{X}, 0\mathbf{Y})$ and $a\mathbf{X} + b\mathbf{Y}$ instead of $(a\mathbf{X}, b\mathbf{Y})$. It is clear that we do not introduce anything new when we write $a\mathbf{X} + b\mathbf{Y}$ instead of $(a\mathbf{X}, b\mathbf{Y})$: We merely change the notation so that computations appear easier. Similarly, complex numbers can be understood as pairs of real numbers, such as $(3, 2)$, for which $3 + 2i$ is merely a more convenient notation that helps remember the rules of computation.
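The formal linear combinations of Example 4 can be modeled quite literally on a computer, which may help to see that the symbols acquire their meaning only from the rules of computation. The following sketch is an illustration under assumptions of my own choosing: the class name `Formal`, the storage of a combination as a dictionary from symbol names to coefficients, and the use of `fractions.Fraction` as the number field $\mathbb{Q}$ are not part of the text.

```python
from fractions import Fraction

class Formal:
    """A formal linear combination such as 2X + 3Y, stored as {symbol: coefficient}."""

    def __init__(self, coeffs):
        # Drop zero coefficients so that equal combinations compare equal.
        self.coeffs = {s: Fraction(c) for s, c in coeffs.items() if c != 0}

    def __add__(self, other):
        symbols = set(self.coeffs) | set(other.coeffs)
        return Formal({s: self.coeffs.get(s, 0) + other.coeffs.get(s, 0)
                       for s in symbols})

    def __rmul__(self, scalar):
        # Multiplication by a number written on the left, e.g. 2 * X.
        return Formal({s: Fraction(scalar) * c for s, c in self.coeffs.items()})

    def __sub__(self, other):
        return self + (-1) * other

    def __eq__(self, other):
        return self.coeffs == other.coeffs

    def __repr__(self):
        terms = [f"{c}{s}" for s, c in sorted(self.coeffs.items())]
        return " + ".join(terms) if terms else "0"

X = Formal({"X": 1})
Y = Formal({"Y": 1})
half = Fraction(1, 2)

# The calculation from Example 4: (1/2)(2X + 3Y) - (1/2)(2X - 3Y) = 3Y.
result = half * (2 * X + 3 * Y) - half * (2 * X - 3 * Y)
print(result)           # 3Y
print(result == 3 * Y)  # True

# The same space in "pair" notation: (2, 3) + (-2, 1) = (0, 4).
print(Formal({"X": 2, "Y": 3}) + Formal({"X": -2, "Y": 1}))  # 4Y
```

The dictionary plays the role of the pair $(a, b)$: nothing is ever "computed" from the symbols themselves, and only the coefficients are added and scaled.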


Example 5. The set of all polynomials of degree at most $n$ in the variable $x$ with complex coefficients is a vector space over $\mathbb{C}$. Such polynomials are expressions of the form $p(x)=p_{0}+p_{1}x+...+p_{n}x^{n}$, where $x$ is a formal variable (i.e. no value is assigned to $x$), $n$ is an integer, and $p_{i}$ are complex numbers.
Example 6. Consider now the set of all polynomials in the variables $x$, $y$, and $z$, with complex coefficients, and such that the combined degree in $x$, in $y$, and in $z$ is at most 2. For instance, the polynomial $1+2ix-yz-\sqrt{3}x^{2}$ is an element of that vector space (while $x^{2}y$ is not because its combined degree is 3). It is clear that the degree will never increase above 2 when any two such polynomials are added together, so these polynomials indeed form a vector space over the field $\mathbb{C}$.
Exercise. Which of the following are vector spaces over $\mathbb{R}$?
1. The set of all complex numbers $z$ whose real part is equal to 0. The complex numbers are added and multiplied by real constants as usual.
2. The set of all complex numbers $z$ whose imaginary part is equal to 3. The complex numbers are added and multiplied by real constants as usual.
3. The set of pairs of the form (apples, \$3.1415926), where the first element is always the word "apples" and the second element is a price in dollars (the price may be an arbitrary real number, not necessarily positive or with an integer number of cents). Addition and multiplication by real constants are defined as follows:
\[ (\text{apples},\$x)+(\text{apples},\$y)\equiv(\text{apples},\$(x+y)) \]
\[ \lambda\cdot(\text{apples},\$x)\equiv(\text{apples},\$(\lambda\cdot x)) \]
4. The set of pairs of the form either (apples, \$x) or (chocolate, \$y), where $x$ and $y$ are real numbers. The pairs are added as follows,
\[ (\text{apples},\$x)+(\text{apples},\$y)\equiv(\text{apples},\$(x+y)) \]
\[ (\text{chocolate},\$x)+(\text{chocolate},\$y)\equiv(\text{chocolate},\$(x+y)) \]
\[ (\text{chocolate},\$x)+(\text{apples},\$y)\equiv(\text{chocolate},\$(x+y)) \]
(that is, chocolate "takes precedence" over apples). The multiplication by a number is defined as in the previous question.
5. The set of "bracketed complex numbers," denoted $[z]$, where $z$ is a complex number such that $|z|=1$. For example: $[i]$, $\left[\frac{1}{2}-\frac{1}{2}i\sqrt{3}\right]$, $[-1]$. Addition and multiplication by real constants are defined as follows,
\[ [z_{1}]+[z_{2}]=[z_{1}z_{2}],\qquad \lambda\cdot[z]=\left[ze^{i\lambda}\right]. \]
6. The set of infinite arrays $(a_{1},a_{2},...)$ of arbitrary real numbers. Addition and multiplication are defined term-by-term.
7. The set of polynomials in the variable $x$ with real coefficients and of arbitrary (but finite) degree. Addition and multiplication are defined as usual in algebra.
Question: All these abstract definitions notwithstanding, would it be all right if I always keep in the back of my mind that a vector $\mathbf{v}$ is a row of components $(v_{1},...,v_{n})$?
Answer: It will be perfectly all right as long as you work with finite-dimensional vector spaces. (This intuition often fails when working with infinite-dimensional spaces!) Even if all we need is finite-dimensional vectors, there is another argument in favor of the coordinate-free thinking. Suppose I persist in visualizing vectors as rows $(v_{1},...,v_{n})$; let us see what happens. First, I introduce the vector notation and write $\mathbf{u}+\mathbf{v}$ instead of $(u_{1}+v_{1},...,u_{n}+v_{n})$; this is just for convenience and to save time. Then I check the axioms of the vector space (see the definition V2 above); row vectors of course obey these axioms. Suppose I somehow manage to produce all proofs and calculations using only the vector notation and the axioms of the abstract vector space, and suppose I never use the coordinates $v_{j}$ explicitly, even though I keep them in the back of my mind. Then all my results will be valid not only for collections of components $(v_{1},...,v_{n})$ but also for any mathematical objects that obey the axioms of the abstract vector space. In fact I would then realize that I have been working with abstract vectors all along while carrying the image of a row vector $(v_{1},...,v_{n})$ in the back of my mind.

1.1.4 Dimensionality and bases

Unlike the definition V1, the definition V2 does not include any information about the dimensionality of the vector space. So, on the one hand, this definition treats finite- and infinite-dimensional spaces on the same footing; the definition V2 lets us establish that a certain set is a vector space without knowing its dimensionality in advance. On the other hand, once a particular vector space is given, we may need some additional work to figure out the number of dimensions in it. The key notion used for that purpose is linear independence.
We say, for example, the vector $\mathbf{w}\equiv2\mathbf{u}-3\mathbf{v}$ is linearly dependent on $\mathbf{u}$ and $\mathbf{v}$. A vector $\mathbf{x}$ is linearly independent of vectors $\mathbf{u}$ and $\mathbf{v}$ if $\mathbf{x}$ cannot be expressed as a linear combination $\lambda_{1}\mathbf{u}+\lambda_{2}\mathbf{v}$.
A set of vectors is linearly dependent if one of the vectors is a linear combination of others. This property can be formulated more elegantly:
Definition: The set of vectors $\{\mathbf{v}_{1},...,\mathbf{v}_{n}\}$ is a linearly dependent set if there exist numbers $\lambda_{1},...,\lambda_{n}\in\mathbb{K}$, not all equal to zero, such that
\[ \lambda_{1}\mathbf{v}_{1}+...+\lambda_{n}\mathbf{v}_{n}=0. \tag{1.6} \]
If no such numbers exist, i.e. if Eq. (1.6) holds only with all $\lambda_{i}=0$, the vectors $\{\mathbf{v}_{i}\}$ constitute a linearly independent set.
Interpretation: As a first example, consider the set $\{\mathbf{v}\}$ consisting of a single nonzero vector $\mathbf{v}\neq0$. The set $\{\mathbf{v}\}$ is a linearly independent set because $\lambda\mathbf{v}=0$ only if $\lambda=0$. Now consider the set $\{\mathbf{u},\mathbf{v},\mathbf{w}\}$, where $\mathbf{u}=2\mathbf{v}$ and $\mathbf{w}$ is any vector. This set is linearly dependent because there exists a nontrivial linear combination (i.e. a linear combination with some nonzero coefficients) which is equal to zero,
\[ \mathbf{u}-2\mathbf{v}=1\mathbf{u}+(-2)\mathbf{v}+0\mathbf{w}=0. \]
More generally: If a set $\{\mathbf{v}_{1},...,\mathbf{v}_{n}\}$ is linearly dependent, then there exists at least one vector equal to a linear combination of other vectors. Indeed, by definition there must be at least one nonzero number among the numbers $\lambda_{i}$ involved in Eq. (1.6); suppose $\lambda_{1}\neq0$, then we can divide Eq. (1.6) by $\lambda_{1}$ and express $\mathbf{v}_{1}$ through other vectors,
\[ \mathbf{v}_{1}=-\frac{1}{\lambda_{1}}\left(\lambda_{2}\mathbf{v}_{2}+...+\lambda_{n}\mathbf{v}_{n}\right). \]
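The definition (1.6) can be tried out on concrete vectors. The sketch below is an illustration under stated assumptions rather than part of the text: vectors are taken to be tuples of rational numbers, the function name `find_dependence` is hypothetical, and the search for the numbers $\lambda_{i}$ uses Gaussian elimination, a technique not discussed in this section.

```python
from fractions import Fraction

def find_dependence(vectors):
    """Return coefficients c (not all zero) with sum_i c[i]*vectors[i] == 0,
    or None if the set is linearly independent.
    Assumes a non-empty list of equal-length tuples of rational numbers."""
    n = len(vectors)
    dim = len(vectors[0])
    # Row i of m is the i-th component equation; column j belongs to vectors[j].
    m = [[Fraction(vectors[j][i]) for j in range(n)] for i in range(dim)]
    pivots = []  # (row, column) positions of the leading 1s
    row = 0
    for col in range(n):
        if row == dim:
            break
        # Look for a row at or below `row` with a nonzero entry in this column.
        pr = next((r for r in range(row, dim) if m[r][col] != 0), None)
        if pr is None:
            continue  # no pivot here: this column is "free"
        m[row], m[pr] = m[pr], m[row]
        pivot = m[row][col]
        m[row] = [x / pivot for x in m[row]]
        for r in range(dim):
            if r != row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[row])]
        pivots.append((row, col))
        row += 1
    pivot_cols = {c for _, c in pivots}
    free = [c for c in range(n) if c not in pivot_cols]
    if not free:
        return None  # only the trivial combination vanishes
    f = free[0]
    coeffs = [Fraction(0)] * n
    coeffs[f] = Fraction(1)           # choose the free coefficient to be 1
    for r, c in pivots:
        coeffs[c] = -m[r][f]          # pivot coefficients follow from row r
    return coeffs

# The set {u, v, w} with u = 2v from the Interpretation paragraph.
v, u, w = (1, 0, 1), (2, 0, 2), (0, 1, 0)
print(find_dependence([u, v, w]))
```

Any coefficient list returned in this way is nonzero in at least one position, so the corresponding vector can be expressed through the others exactly as in the formula for $\mathbf{v}_{1}$ above.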


In other words, the existence of numbers $\lambda_{i}$, not all equal to zero, is indeed the formal statement of the idea that at least some vector in the set $\{\mathbf{v}_{i}\}$ is a linear combination of other vectors. By writing a linear combination $\sum_{i}\lambda_{i}\mathbf{v}_{i}=0$ and by saying that "not all $\lambda_{i}$ are zero" we avoid specifying which vector is equal to a linear combination of others.
Remark: Often instead of saying "a linearly independent set of vectors" one says "a set of linearly independent vectors." This is intended to mean the same thing but might be confusing because, taken literally, the phrase "a set of independent vectors" means a set in which each vector is "independent" by itself. Keep in mind that linear independence is a property of a set of vectors; this property depends on the relationships between all the vectors in the set and is not a property of each vector taken separately. It would be more consistent to say e.g. "a set of mutually independent vectors." In this text, I will pedantically stick to the phrase "linearly independent set."
Example 1: Consider the vectors $\mathbf{a}=(0,1)$, $\mathbf{b}=(1,1)$ in $\mathbb{R}^{2}$. Is the set $\{\mathbf{a},\mathbf{b}\}$ linearly independent? Suppose there exists a linear combination $\alpha\mathbf{a}+\beta\mathbf{b}=0$ with at least one of $\alpha,\beta\neq0$. Then we would have
\[ \alpha\mathbf{a}+\beta\mathbf{b}=(0,\alpha)+(\beta,\beta)=(\beta,\alpha+\beta)\overset{!}{=}0. \]
This is possible only if $\beta=0$ and $\alpha=0$. Therefore, $\{\mathbf{a},\mathbf{b}\}$ is linearly independent.
Exercise 1: a) A set $\{\mathbf{v}_{1},...,\mathbf{v}_{n}\}$ is linearly independent. Prove that any subset, say $\{\mathbf{v}_{1},...,\mathbf{v}_{k}\}$, where $k<n$, is also a linearly independent set.
b) Decide whether the given sets $\{\mathbf{a},\mathbf{b}\}$ or $\{\mathbf{a},\mathbf{b},\mathbf{c}\}$ are linearly independent sets of vectors from $\mathbb{R}^{2}$ or other spaces as indicated. For linearly dependent sets, find a linear combination showing this.
1. $\mathbf{a}=\left(2,\sqrt{2}\right)$, $\mathbf{b}=\left(\frac{1}{\sqrt{2}},\frac{1}{2}\right)$ in $\mathbb{R}^{2}$
2. $\mathbf{a}=(2,3)$, $\mathbf{b}=(6,9)$ in $\mathbb{R}^{2}$
3. $\mathbf{a}=(1+2i,10,20)$, $\mathbf{b}=(1-2i,10,20)$ in $\mathbb{C}^{3}$
4. $\mathbf{a}=(0,10i,20i,30i)$, $\mathbf{b}=(0,20i,40i,60i)$, $\mathbf{c}=(0,30i,60i,90i)$ in $\mathbb{C}^{4}$
5. $\mathbf{a}=(3,1,2)$, $\mathbf{b}=(1,0,1)$, $\mathbf{c}=(0,1,2)$ in $\mathbb{R}^{3}$
The number of dimensions (or simply the dimension) of a vector space is the maximum possible number of vectors in a linearly independent set. The formal definition is the following.
Definition: A vector space is n-dimensional if linearly independent sets of $n$ vectors can be found in it, but no linearly independent sets of $n+1$ vectors. The dimension of a vector space $V$ is then denoted by $\dim V\equiv n$. A vector space is infinite-dimensional if linearly independent sets having arbitrarily many vectors can be found in it.
By this definition, in an n-dimensional vector space there exists at least one linearly independent set of $n$ vectors $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$. Linearly independent sets containing exactly $n=\dim V$ vectors have useful properties, to which we now turn.
Definition: A basis in the space $V$ is a linearly independent set of vectors $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$ such that for any vector $\mathbf{v}\in V$ there exist numbers $v_{k}\in\mathbb{K}$ such that $\mathbf{v}=\sum_{k=1}^{n}v_{k}\mathbf{e}_{k}$. (In other words, every other vector $\mathbf{v}$ is a linear combination of basis vectors.) The numbers $v_{k}$ are called the components (or coordinates) of the vector $\mathbf{v}$ with respect to the basis $\{\mathbf{e}_{i}\}$.
Example 2: In the three-dimensional Euclidean space $\mathbb{R}^{3}$, the set of three triples $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$ is a basis because every vector $\mathbf{x}=(x,y,z)$ can be expressed as
\[ \mathbf{x}=(x,y,z)=x\,(1,0,0)+y\,(0,1,0)+z\,(0,0,1). \]
This basis is called the standard basis. Analogously one defines the standard basis in $\mathbb{R}^{n}$. ■
The following statement is standard, and I write out its full proof here as an example of an argument based on the abstract definition of vectors.
Theorem: (1) If a set $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$ is linearly independent and $n=\dim V$, then the set $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$ is a basis in $V$. (2) For a given vector $\mathbf{v}\in V$ and a given basis $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$, the coefficients $v_{k}$ involved in the decomposition $\mathbf{v}=\sum_{k=1}^{n}v_{k}\mathbf{e}_{k}$ are uniquely determined.
Proof: (1) By definition of dimension, the set $\{\mathbf{v},\mathbf{e}_{1},...,\mathbf{e}_{n}\}$ must be linearly dependent. By definition of linear dependence, there exist numbers $\lambda_{0},...,\lambda_{n}$, not all equal to zero, such that
\[ \lambda_{0}\mathbf{v}+\lambda_{1}\mathbf{e}_{1}+...+\lambda_{n}\mathbf{e}_{n}=0. \tag{1.7} \]
Now if we had $\lambda_{0}=0$, it would mean that not all numbers in the smaller set $\{\lambda_{1},...,\lambda_{n}\}$ are zero; however, in that case Eq. (1.7) would contradict the linear independence of the set $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$. Therefore $\lambda_{0}\neq0$ and Eq. (1.7) shows that the vector $\mathbf{v}$ can be expressed through the basis, $\mathbf{v}=\sum_{k=1}^{n}v_{k}\mathbf{e}_{k}$ with the coefficients $v_{k}\equiv-\lambda_{k}/\lambda_{0}$.
(2) To show that the set of coefficients $\{v_{k}\}$ is unique, we assume that there are two such sets, $\{v_{k}\}$ and $\{v'_{k}\}$. Then
\[ 0=\mathbf{v}-\mathbf{v}=\sum_{k=1}^{n}v_{k}\mathbf{e}_{k}-\sum_{k=1}^{n}v'_{k}\mathbf{e}_{k}=\sum_{k=1}^{n}\left(v_{k}-v'_{k}\right)\mathbf{e}_{k}. \]
Since the set $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$ is linearly independent, all coefficients in this linear combination must vanish, so $v_{k}=v'_{k}$ for all $k$. ■
If we fix a basis $\{\mathbf{e}_{i}\}$ in a finite-dimensional vector space $V$ then all vectors $\mathbf{v}\in V$ are uniquely represented by n-tuples $\{v_{1},...,v_{n}\}$ of their components. Thus we recover the original picture of a vector space as a set of n-tuples of numbers. (Below we will prove that every basis in an n-dimensional space has the same number of vectors, namely $n$.) Now, if we choose another basis $\{\mathbf{e}'_{i}\}$, the same vector $\mathbf{v}$ will have different components $v'_{k}$:
\[ \mathbf{v}=\sum_{k=1}^{n}v_{k}\mathbf{e}_{k}=\sum_{k=1}^{n}v'_{k}\mathbf{e}'_{k}. \]
Remark: One sometimes reads that "the components are transformed" or that "vectors are sets of numbers that transform under a change of basis." I do not use this language because it suggests that the components $v_{k}$, which are numbers such as $\frac{1}{3}$ or $\sqrt{2}$, are somehow not simply numbers but "know how to transform." I prefer to say that the components $v_{k}$ of a vector $\mathbf{v}$ in a particular basis $\{\mathbf{e}_{k}\}$ express the relationship of $\mathbf{v}$ to that basis and are therefore functions of the vector $\mathbf{v}$ and of all basis vectors $\mathbf{e}_{j}$. ■
For many purposes it is better to think about a vector $\mathbf{v}$ not as a set of its components $\{v_{1},...,v_{n}\}$ in some basis, but as a geometric object; a "directed magnitude" is a useful heuristic idea. Geometric objects exist in the vector space independently of a choice of basis. In linear algebra, one is typically interested in problems involving relations between vectors, for example


$\mathbf{u}=a\mathbf{v}+b\mathbf{w}$, where $a,b\in\mathbb{K}$ are numbers. No choice of basis is necessary to describe such relations between vectors; I will call such relations coordinate-free or geometric. As I will demonstrate later in this text, many statements of linear algebra are more transparent and easier to prove in the coordinate-free language. Of course, in many practical applications one absolutely needs to perform specific calculations with components in an appropriately chosen basis, and facility with such calculations is important. But I find it helpful to keep a coordinate-free (geometric) picture in the back of my mind even when I am doing calculations in coordinates.
Question: I am not sure how to determine the number of dimensions in a vector space. According to the definition, I should figure out whether there exist certain linearly independent sets of vectors. But surely it is impossible to go over all sets of $n$ vectors checking the linear independence of each set?
Answer: Of course it is impossible when there are infinitely many vectors. This is simply not the way to go. We can determine the dimensionality of a given vector space by proving that the space has a basis consisting of a certain number of vectors. A particular vector space must be specified in concrete terms (see Sec. 1.1.3 for examples), and in each case we should manage to find a general proof that covers all sets of $n$ vectors at once.
Exercise 2: For each vector space in the examples in Sec. 1.1.3, find the dimension or show that the dimension is infinite.
Solution for Example 1: The set $\mathbb{C}$ of complex numbers is a two-dimensional vector space over $\mathbb{R}$ because every complex number $a+ib$ can be represented as a linear combination of two basis vectors ($1$ and $i$) with real coefficients $a,b$. The set $\{1,i\}$ is linearly independent because $a+ib=0$ only when both $a=b=0$.
Solution for Example 2: The space $V$ is defined as the set of triples $(x,y,z)$ such that $ax+by+cz=0$, where at least one of $a,b,c$ is nonzero. Suppose, without loss of generality, that $a\neq0$; then we can express
\[ x=-\frac{b}{a}y-\frac{c}{a}z. \]
Now the two parameters $y$ and $z$ are arbitrary while $x$ is determined. Hence it appears plausible that the space $V$ is two-dimensional. Let us prove this formally. Choose as the possible basis vectors $\mathbf{e}_{1}=\left(-\frac{b}{a},1,0\right)$ and $\mathbf{e}_{2}=\left(-\frac{c}{a},0,1\right)$. These vectors belong to $V$, and the set $\{\mathbf{e}_{1},\mathbf{e}_{2}\}$ is linearly independent (straightforward checks). It remains to show that every vector $\mathbf{x}\in V$ is expressed as a linear combination of $\mathbf{e}_{1}$ and $\mathbf{e}_{2}$. Indeed, any such $\mathbf{x}$ must have components $x,y,z$ that satisfy $x=-\frac{b}{a}y-\frac{c}{a}z$. Hence, $\mathbf{x}=y\mathbf{e}_{1}+z\mathbf{e}_{2}$.
Exercise 3: Describe a vector space that has dimension zero.
Solution: If there are no linearly independent sets in a space $V$, it means that all sets consisting of just one vector $\{\mathbf{v}\}$ are already linearly dependent. More formally, $\forall\mathbf{v}\in V$: $\exists\lambda\neq0$ such that $\lambda\mathbf{v}=0$. Thus $\mathbf{v}=0$, that is, all vectors $\mathbf{v}\in V$ are equal to the zero vector. Therefore a zero-dimensional space is a space that consists of only one vector: the zero vector.
Exercise 4: Usually a vector space admits infinitely many choices of a basis. However, above I cautiously wrote that a vector space "has at least one basis." Is there an example of a vector space that has only one basis?
Hints: The answer is positive. Try to build a new basis from an existing one and see where that might fail. This has to do with finite number fields (try $\mathbb{F}_{2}$), and the only available example is rather dull.

1.1.5 All bases have equally many vectors

We have seen that any linearly independent set of $n$ vectors in an n-dimensional space is a basis. The following statement shows that a basis cannot have fewer than $n$ vectors. The proof is somewhat long and can be skipped unless you would like to gain more facility with coordinate-free manipulations.
Theorem: In a finite-dimensional vector space, all bases have equally many vectors.
Proof: Suppose that $\{\mathbf{e}_{1},...,\mathbf{e}_{m}\}$ and $\{\mathbf{f}_{1},...,\mathbf{f}_{n}\}$ are two bases in a vector space $V$ and $m\neq n$. I will show that this assumption leads to a contradiction, and then it will follow that any two bases must have equally many vectors.
Assume (without loss of generality) that $m>n$. The idea of the proof is to take the larger set $\{\mathbf{e}_{1},...,\mathbf{e}_{m}\}$ and to replace one of its vectors, say $\mathbf{e}_{s}$, by $\mathbf{f}_{1}$, so that the resulting set of $m$ vectors
\[ \{\mathbf{e}_{1},...,\mathbf{e}_{s-1},\mathbf{f}_{1},\mathbf{e}_{s+1},...,\mathbf{e}_{m}\} \tag{1.8} \]
is still linearly independent. I will prove shortly that such a replacement is possible, assuming only that the initial set is linearly independent. Then I will continue to replace other vectors $\mathbf{e}_{k}$ by $\mathbf{f}_{2}$, $\mathbf{f}_{3}$, etc., always keeping the resulting set linearly independent. Finally, I will arrive at the linearly independent set
\[ \left\{\mathbf{f}_{1},...,\mathbf{f}_{n},\mathbf{e}_{k_{1}},\mathbf{e}_{k_{2}},...,\mathbf{e}_{k_{m-n}}\right\}, \]
which contains all $\mathbf{f}_{j}$ as well as $(m-n)$ vectors $\mathbf{e}_{k_{1}},\mathbf{e}_{k_{2}},...,\mathbf{e}_{k_{m-n}}$ left over from the original set; there must be at least one such vector left over because (by assumption) there are more vectors in the basis $\{\mathbf{e}_{j}\}$ than in the basis $\{\mathbf{f}_{j}\}$, in other words, because $m-n\geq1$. Since the set $\{\mathbf{f}_{j}\}$ is a basis, the vector $\mathbf{e}_{k_{1}}$ is a linear combination of $\{\mathbf{f}_{1},...,\mathbf{f}_{n}\}$, so the set $\{\mathbf{f}_{1},...,\mathbf{f}_{n},\mathbf{e}_{k_{1}},...\}$ cannot be linearly independent. This contradiction proves the theorem.
It remains to show that it is possible to find the index $s$ such that the set (1.8) is linearly independent. The required statement is the following: If $\{\mathbf{e}_{j}\,|\,1\leq j\leq m\}$ and $\{\mathbf{f}_{j}\,|\,1\leq j\leq n\}$ are two bases in the space $V$, and if the set $S\equiv\{\mathbf{e}_{1},...,\mathbf{e}_{k},\mathbf{f}_{1},...,\mathbf{f}_{l}\}$ (where $l<n$) is linearly independent then there exists an index $s$ such that $\mathbf{e}_{s}$ in $S$ can be replaced by $\mathbf{f}_{l+1}$ and the new set
\[ T\equiv\{\mathbf{e}_{1},...,\mathbf{e}_{s-1},\mathbf{f}_{l+1},\mathbf{e}_{s+1},...,\mathbf{e}_{k},\mathbf{f}_{1},...,\mathbf{f}_{l}\} \tag{1.9} \]
is still linearly independent. To find a suitable index $s$, we try to decompose $\mathbf{f}_{l+1}$ into a linear combination of vectors from $S$. In other words, we ask whether the set
\[ S'\equiv S\cup\{\mathbf{f}_{l+1}\}=\{\mathbf{e}_{1},...,\mathbf{e}_{k},\mathbf{f}_{1},...,\mathbf{f}_{l+1}\} \]
is linearly independent. There are two possibilities: First, if $S'$ is linearly independent, we can remove any $\mathbf{e}_{s}$, say $\mathbf{e}_{1}$, from it, and the resulting set
\[ T=\{\mathbf{e}_{2},...,\mathbf{e}_{k},\mathbf{f}_{1},...,\mathbf{f}_{l+1}\} \]
will be again linearly independent. This set $T$ is obtained from $S$ by replacing $\mathbf{e}_{1}$ with $\mathbf{f}_{l+1}$, so now there is nothing left to prove. Now consider the second possibility: $S'$ is linearly dependent. In that case, $\mathbf{f}_{l+1}$ can be decomposed as
\[ \mathbf{f}_{l+1}=\sum_{j=1}^{k}\alpha_{j}\mathbf{e}_{j}+\sum_{j=1}^{l}\beta_{j}\mathbf{f}_{j}, \tag{1.10} \]


where $\alpha_{j},\beta_{j}$ are some constants, not all equal to zero. Suppose all $\alpha_{j}$ are zero; then $\mathbf{f}_{l+1}$ would be a linear combination of other $\mathbf{f}_{j}$; but this cannot happen for a basis $\{\mathbf{f}_{j}\}$. Therefore not all $\alpha_{j}$, $1\leq j\leq k$ are zero; for example, $\alpha_{s}\neq0$. This gives us the index $s$. Now we can replace $\mathbf{e}_{s}$ in the set $S$ by $\mathbf{f}_{l+1}$; it remains to prove that the resulting set $T$ defined by Eq. (1.9) is linearly independent.
This last proof is again by contradiction: if $T$ is linearly dependent, there exists a vanishing linear combination of the form
\[ \sum_{j=1}^{s-1}\lambda_{j}\mathbf{e}_{j}+\kappa_{l+1}\mathbf{f}_{l+1}+\sum_{j=s+1}^{k}\lambda_{j}\mathbf{e}_{j}+\sum_{j=1}^{l}\kappa_{j}\mathbf{f}_{j}=0, \tag{1.11} \]
where $\lambda_{j},\kappa_{j}$ are not all zero. In particular, $\kappa_{l+1}\neq0$ because otherwise the initial set $S$ would be linearly dependent,
\[ \sum_{j=1}^{s-1}\lambda_{j}\mathbf{e}_{j}+\sum_{j=s+1}^{k}\lambda_{j}\mathbf{e}_{j}+\sum_{j=1}^{l}\kappa_{j}\mathbf{f}_{j}=0. \]
If we now substitute Eq. (1.10) into Eq. (1.11), we will obtain a vanishing linear combination that contains only vectors from the initial set $S$ in which the coefficient at the vector $\mathbf{e}_{s}$ is $\kappa_{l+1}\alpha_{s}\neq0$. This contradicts the linear independence of the set $S$. Therefore the set $T$ is linearly independent. ■
Exercise 1: Completing a basis. If a set $\{\mathbf{v}_{1},...,\mathbf{v}_{k}\}$, $\mathbf{v}_{j}\in V$ is linearly independent and $k<n\equiv\dim V$, the theorem says that the set $\{\mathbf{v}_{j}\}$ is not a basis in $V$. Prove that there exist $(n-k)$ additional vectors $\mathbf{v}_{k+1},...,\mathbf{v}_{n}\in V$ such that the set $\{\mathbf{v}_{1},...,\mathbf{v}_{n}\}$ is a basis in $V$.
Outline of proof: If $\{\mathbf{v}_{j}\}$ is not yet a basis, it means that there exists at least one vector $\mathbf{v}\in V$ which cannot be represented by a linear combination of $\{\mathbf{v}_{j}\}$. Add it to the set $\{\mathbf{v}_{j}\}$; prove that the resulting set is still linearly independent. Repeat these steps until a basis is built; by the above Theorem, the basis will contain exactly $n$ vectors.
Exercise 2: Eliminating unnecessary vectors. Suppose that a set of vectors $\{\mathbf{e}_{1},...,\mathbf{e}_{s}\}$ spans the space $V$, i.e. every vector $\mathbf{v}\in V$ can be represented by a linear combination of $\{\mathbf{e}_{j}\}$; and suppose that $s>n\equiv\dim V$. By definition of dimension, the set $\{\mathbf{e}_{j}\}$ must be linearly dependent, so it is not a basis in $V$. Prove that one can remove certain vectors from this set so that the remaining vectors are a basis in $V$.
Hint: The set has too many vectors. Consider a nontrivial linear combination of vectors $\{\mathbf{e}_{1},...,\mathbf{e}_{s}\}$ that is equal to zero. Show that one can remove some vector $\mathbf{e}_{k}$ from the set $\{\mathbf{e}_{1},...,\mathbf{e}_{s}\}$ such that the remaining set still spans $V$. The procedure can be repeated until a basis in $V$ remains.
Exercise 3: Finding a basis. Consider the vector space of polynomials of degree at most 2 in the variable $x$, with real coefficients. Determine whether the following four sets of vectors are linearly independent, and which of them can serve as a basis in that space. The sets are $\{1+x,1-x\}$; $\{1,1+x,1-x\}$; $\{1,1+x-x^{2}\}$; $\{1,1+x,1+x+x^{2}\}$.
Exercise 4: Not a basis. Suppose that a set $\{\mathbf{v}_{1},...,\mathbf{v}_{n}\}$ in an n-dimensional space $V$ is not a basis; show that this set must be linearly dependent.

1.2 Linear maps in vector spaces

An important role in linear algebra is played by matrices, which usually represent linear transformations of vectors. Namely, with the definition V1 of vectors as n-tuples $v_{i}$, one defines matrices as square tables of numbers, $A_{ij}$, that describe transformations of vectors according to the formula
\[ u_{i}\equiv\sum_{j=1}^{n}A_{ij}v_{j}. \tag{1.12} \]
This transformation takes a vector $\mathbf{v}$ into a new vector $\mathbf{u}=A\mathbf{v}$ in the same vector space. For example, in two dimensions one writes the transformation of column vectors as
\[ \begin{pmatrix}u_{1}\\u_{2}\end{pmatrix}=\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix}\begin{pmatrix}v_{1}\\v_{2}\end{pmatrix}\equiv\begin{pmatrix}A_{11}v_{1}+A_{12}v_{2}\\A_{21}v_{1}+A_{22}v_{2}\end{pmatrix}. \]
The composition of two transformations $A_{ij}$ and $B_{ij}$ is a transformation described by the matrix
\[ C_{ij}=\sum_{k=1}^{n}A_{ik}B_{kj}. \tag{1.13} \]
This is the law of matrix multiplication. (I assume that all this is familiar to you.)
More generally, a map from an m-dimensional space $V$ to an n-dimensional space $W$ is described by a rectangular $n\times m$ matrix (with $n$ rows and $m$ columns) that transforms m-tuples into n-tuples in an analogous way. Most of the time we will be working with transformations within one vector space (described by square matrices).
This picture of matrix transformations is straightforward but relies on the coordinate representation of vectors and so has two drawbacks: (i) The calculations with matrix components are often unnecessarily cumbersome. (ii) Definitions and calculations cannot be easily generalized to infinite-dimensional spaces. Nevertheless, many of the results have nothing to do with components and do apply to infinite-dimensional spaces. We need a different approach to characterizing linear transformations of vectors.
The way out is to concentrate on the linearity of the transformations, i.e. on the properties
\[ A(\lambda\mathbf{v})=\lambda A(\mathbf{v}), \]
\[ A(\mathbf{v}_{1}+\mathbf{v}_{2})=A(\mathbf{v}_{1})+A(\mathbf{v}_{2}), \]
which are easy to check directly. In fact it turns out that the multiplication law and the matrix representation of transformations can be derived from the above requirements of linearity. Below we will see how this is done.

1.2.1 Abstract definition of linear maps

First, we define an abstract linear map as follows.
Definition: A map $A:V\rightarrow W$ between two vector spaces $V$, $W$ is linear if for any $\lambda\in\mathbb{K}$ and $\mathbf{u},\mathbf{v}\in V$,
\[ A(\mathbf{u}+\lambda\mathbf{v})=A\mathbf{u}+\lambda A\mathbf{v}. \tag{1.14} \]
(Note, pedantically, that the "+" in the left side of Eq. (1.14) is the vector sum in the space $V$, while in the right side it is the vector sum in the space $W$.)
Linear maps are also called homomorphisms of vector spaces. Linear maps acting from a space $V$ to the same space are called linear operators or endomorphisms of the space $V$.
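The formulas (1.12)-(1.14) can be spot-checked numerically. The sketch below assumes that matrices are stored as nested Python lists and vectors as lists of integers; the function names `apply`, `compose`, and `is_linear_on` are hypothetical, and the shift map at the end is an ad hoc example of a map that fails the test. A check on a few sample vectors is of course not a proof of linearity.

```python
def apply(A, v):
    """Eq. (1.12): u_i = sum_j A[i][j] * v[j]."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def compose(A, B):
    """Eq. (1.13): C[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def is_linear_on(f, u, v, lam):
    """Check Eq. (1.14) for one particular choice of u, v, lam."""
    lhs = f([a + lam * b for a, b in zip(u, v)])
    rhs = [a + lam * b for a, b in zip(f(u), f(v))]
    return lhs == rhs

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
u, v = [1, 7], [5, -2]

# Transforming first with B and then with A agrees with the single matrix C.
C = compose(A, B)
print(apply(A, apply(B, v)) == apply(C, v))

# A matrix transformation passes the linearity check ...
print(is_linear_on(lambda x: apply(A, x), u, v, 3))
# ... while a shift by a fixed nonzero vector does not.
print(is_linear_on(lambda x: [x[0] + 1, x[1]], u, v, 3))
```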


At first sight it might appear that the abstract definition of a linear transformation offers much less information than the definition in terms of matrices. This is true: the abstract definition does not specify any particular linear map; it only gives conditions for a map to be linear. If the vector space is finite-dimensional and a basis $\{\mathbf{e}_{i}\}$ is selected then the familiar matrix picture is immediately recovered from the abstract definition. Let us first, for simplicity, consider a linear map $A:V\rightarrow V$.
Statement 1: If $A$ is a linear map $V\rightarrow V$ and $\{\mathbf{e}_{j}\}$ is a basis then there exist numbers $A_{jk}$ ($j,k=1,...,n$) such that the vector $A\mathbf{v}$ has components $\sum_{k}A_{jk}v_{k}$ if a vector $\mathbf{v}$ has components $v_{k}$ in the basis $\{\mathbf{e}_{j}\}$.
Proof: For any vector $\mathbf{v}$ we have a decomposition $\mathbf{v}=\sum_{k=1}^{n}v_{k}\mathbf{e}_{k}$ with some components $v_{k}$. By linearity, the result of application of the map $A$ to the vector $\mathbf{v}$ is
\[ A\mathbf{v}=A\left(\sum_{k=1}^{n}v_{k}\mathbf{e}_{k}\right)=\sum_{k=1}^{n}v_{k}\left(A\mathbf{e}_{k}\right). \]
Therefore, it is sufficient to know how the map $A$ transforms the basis vectors $\mathbf{e}_{k}$, $k=1,...,n$. Each of the vectors $A\mathbf{e}_{k}$ has (in the basis $\{\mathbf{e}_{i}\}$) a decomposition
\[ A\mathbf{e}_{k}=\sum_{j=1}^{n}A_{jk}\mathbf{e}_{j},\qquad k=1,...,n, \]
where $A_{jk}$ with $1\leq j,k\leq n$ are some coefficients; these $A_{jk}$ are just some numbers that we can calculate for a specific given linear transformation and a specific basis. It is convenient to arrange these numbers into a square table (matrix) $A_{jk}$. Finally, we compute $A\mathbf{v}$ as
\[ A\mathbf{v}=\sum_{k=1}^{n}v_{k}\sum_{j=1}^{n}A_{jk}\mathbf{e}_{j}=\sum_{j=1}^{n}u_{j}\mathbf{e}_{j}, \]
where the components $u_{j}$ of the vector $\mathbf{u}\equiv A\mathbf{v}$ are
\[ u_{j}\equiv\sum_{k=1}^{n}A_{jk}v_{k}. \]
This is exactly the law (1.12) of multiplication of the matrix $A_{jk}$ by a column vector $v_{k}$. Therefore the formula of the matrix representation (1.12) is a necessary consequence of the linearity of a transformation. ■
The analogous matrix representation holds for linear maps $A:V\rightarrow W$ between different vector spaces.
It is helpful to imagine that the linear transformation $A$ somehow exists as a geometric object (an object that "knows how to transform vectors"), while the matrix representation $A_{jk}$ is merely a set of coefficients needed to describe that transformation in a particular basis. The matrix $A_{jk}$ depends on the choice of the basis, but there are many properties of the linear transformation $A$ that do not depend on the basis; these properties can be thought of as the "geometric" properties of the transformation.[2] Below we will be concerned only with geometric properties of objects.
[2] Example: the properties $A_{11}=0$, $A_{11}>A_{12}$, and $A_{ij}=-2A_{ji}$ are not geometric properties of the linear transformation $A$ because they may hold in one basis but not in another basis. However, the number $\sum_{i=1}^{n}A_{ii}$ turns out to be geometric (independent of the basis), as we will see below.
Definition: Two linear maps $A$, $B$ are equal if $A\mathbf{v}=B\mathbf{v}$ for all $\mathbf{v}\in V$. The composition of linear maps $A$, $B$ is the map $AB$ which acts on vectors $\mathbf{v}$ as $(AB)\mathbf{v}\equiv A(B\mathbf{v})$.
Statement 2: The composition of two linear transformations is again a linear transformation.
Proof: I give two proofs to contrast the coordinate-free language with the language of matrices, and also to show the derivation of the matrix multiplication law.
(Coordinate-free proof:) We need to demonstrate the property (1.14). If $A$ and $B$ are linear transformations then we have, by definition,
\[ AB(\mathbf{u}+\lambda\mathbf{v})=A(B\mathbf{u}+\lambda B\mathbf{v})=AB\mathbf{u}+\lambda AB\mathbf{v}. \]
Therefore the composition $AB$ is a linear map.
(Proof using matrices:) We need to show that for any vector $\mathbf{v}$ with components $v_{i}$ and for any two transformation matrices $A_{ij}$ and $B_{ij}$, the result of first transforming with $B_{ij}$ and then with $A_{ij}$ is equivalent to transforming $\mathbf{v}$ with some other matrix. We calculate the components $v'_{i}$ of the transformed vector,
\[ v'_{i}=\sum_{j=1}^{n}A_{ij}\sum_{k=1}^{n}B_{jk}v_{k}=\sum_{k=1}^{n}\left(\sum_{j=1}^{n}A_{ij}B_{jk}\right)v_{k}\equiv\sum_{k=1}^{n}C_{ik}v_{k}, \]
where $C_{ik}$ is the matrix of the new transformation. ■
Note that we need to work more in the second proof because matrices are defined through their components, as "tables of numbers." So we cannot prove linearity without also finding an explicit formula for the matrix product in terms of matrix components. The first proof does not use such a formula.

1.2.2 Examples of linear maps

The easiest example of a linear map is the identity operator $1_{V}$. This is a map $V\rightarrow V$ defined by $1_{V}\mathbf{v}=\mathbf{v}$. It is clear that this map is linear, and that its matrix elements in any basis are given by the Kronecker delta symbol
\[ \delta_{ij}\equiv\begin{cases}1, & i=j;\\ 0, & i\neq j.\end{cases} \]
We can also define a map which multiplies all vectors $\mathbf{v}\in V$ by a fixed number $\lambda$. This is also obviously a linear map, and we denote it by $\lambda1_{V}$. If $\lambda=0$, we may write $0_{V}$ to denote the map that transforms all vectors into the zero vector.
Another example of a linear transformation is the following. Suppose that the set $\{\mathbf{e}_{1},...,\mathbf{e}_{n}\}$ is a basis in the space $V$; then any vector $\mathbf{v}\in V$ is uniquely expressed as a linear combination $\mathbf{v}=\sum_{j=1}^{n}v_{j}\mathbf{e}_{j}$. We denote by $\mathbf{e}_{1}^{*}(\mathbf{v})$ the function that gives the component $v_{1}$ of a vector $\mathbf{v}$ in the basis $\{\mathbf{e}_{j}\}$. Then we define the map $M$ by the formula
\[ M\mathbf{v}\equiv v_{1}\mathbf{e}_{2}=\mathbf{e}_{1}^{*}(\mathbf{v})\,\mathbf{e}_{2}. \]
In other words, the new vector $M\mathbf{v}$ is always parallel to $\mathbf{e}_{2}$ but has the coefficient $v_{1}$. It is easy to prove that this map is linear (you need to check that the first component of a sum of vectors is equal to the sum of their first components). The matrix corresponding to $M$ in the basis $\{\mathbf{e}_{j}\}$ is
\[ M_{ij}=\begin{pmatrix}0 & 0 & 0 & \dots\\ 1 & 0 & 0 & \dots\\ 0 & 0 & 0 & \dots\\ \dots & \dots & \dots & \dots\end{pmatrix}. \]
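The recipe of Statement 1 (the column $k$ of the matrix holds the components of $A\mathbf{e}_{k}$) can be sketched in code. The example assumes $V=\mathbb{R}^{n}$ with the standard basis and represents a linear map as a Python function on lists; the names `matrix_of`, `M`, and `identity` are hypothetical and serve only this illustration.

```python
def matrix_of(f, n):
    """Matrix A[j][k] of a linear map f in the standard basis of R^n.
    Column k holds the components of f(e_k), as in Statement 1."""
    basis = [[1 if i == k else 0 for i in range(n)] for k in range(n)]
    images = [f(e) for e in basis]            # images[k] = f(e_k)
    return [[images[k][j] for k in range(n)] for j in range(n)]

def M(v):
    """The map M v = v_1 e_2 from the text (v_1 is v[0], requires n >= 2)."""
    return [0, v[0]] + [0] * (len(v) - 2)

def identity(v):
    """The identity operator: returns an unchanged copy of v."""
    return list(v)

for row in matrix_of(M, 3):
    print(row)
for row in matrix_of(identity, 3):
    print(row)
```

For `M` the only nonzero entry sits in the second row and first column, in agreement with the matrix $M_{ij}$ above; for `identity` the entries follow the Kronecker delta.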


The map that shifts all vectors by a fixed vector, $S_{\mathbf{a}}\mathbf{v}\equiv\mathbf{v}+\mathbf{a}$, is not linear because
\[ S_{\mathbf{a}}(\mathbf{u}+\mathbf{v})=\mathbf{u}+\mathbf{v}+\mathbf{a}\neq S_{\mathbf{a}}(\mathbf{u})+S_{\mathbf{a}}(\mathbf{v})=\mathbf{u}+\mathbf{v}+2\mathbf{a}. \]
Question: I understand how to work with a linear transformation specified by its matrix $A_{jk}$. But how can I work with an abstract "linear map" $A$ if the only thing I know about $A$ is that it is linear? It seems that I cannot specify linear transformations or perform calculations with them unless I use matrices.
Answer: It is true that the abstract definition of a linear map does not include a specification of a particular transformation, unlike the concrete definition in terms of a matrix. However, it does not mean that matrices are always needed. For a particular problem in linear algebra, a particular transformation is always specified either as a certain matrix in a given basis, or in a geometric, i.e. basis-free manner, e.g. "the transformation $B$ multiplies a vector by 3/2 and then projects onto the plane orthogonal to the fixed vector $\mathbf{a}$." In this book I concentrate on general properties of linear transformations, which are best formulated and studied in the geometric (coordinate-free) language rather than in the matrix language. Below we will see many coordinate-free calculations with linear maps. In Sec. 1.8 we will also see how to specify arbitrary linear transformations in a coordinate-free manner, although it will then be quite similar to the matrix notation.
Exercise 1: If $V$ is a one-dimensional vector space over a field $\mathbb{K}$, prove that any linear operator $A$ on $V$ must act simply as a multiplication by a number.
Solution: Let $\mathbf{e}\neq0$ be a basis vector; note that any nonzero vector $\mathbf{e}$ is a basis in $V$, and that every vector $\mathbf{v}\in V$ is proportional to $\mathbf{e}$. Consider the action of $A$ on the vector $\mathbf{e}$: the vector $A\mathbf{e}$ must also be proportional to $\mathbf{e}$, say $A\mathbf{e}=a\mathbf{e}$ where $a\in\mathbb{K}$ is some constant. Then by linearity of $A$, for any vector $\mathbf{v}=v\mathbf{e}$ we get $A\mathbf{v}=Av\mathbf{e}=av\mathbf{e}=a\mathbf{v}$, so the operator $A$ multiplies all vectors by the same number $a$. ■
Exercise 2: If $\{\mathbf{e}_{1},...,\mathbf{e}_{N}\}$ is a basis in $V$ and $\{\mathbf{v}_{1},...,\mathbf{v}_{N}\}$ is a set of $N$ arbitrary vectors, does there exist a linear map $A$ such that $A\mathbf{e}_{j}=\mathbf{v}_{j}$ for $j=1,...,N$? If so, is this map unique?
Solution: For any $\mathbf{x}\in V$ there exists a unique set of $N$ numbers $x_{1},...,x_{N}$ such that $\mathbf{x}=\sum_{i=1}^{N}x_{i}\mathbf{e}_{i}$. Since $A$ must be linear, the action of $A$ on $\mathbf{x}$ must be given by the formula $A\mathbf{x}=\sum_{i=1}^{N}x_{i}\mathbf{v}_{i}$. This formula defines $A\mathbf{x}$ for all $\mathbf{x}$. Hence, the map $A$ exists and is unique. ■

1.2.3 Vector space of all linear maps

Suppose that $V$ and $W$ are two vector spaces and consider all linear maps $A:V\rightarrow W$. The set of all such maps is itself a vector space because we can add two linear maps and multiply linear maps by scalars, getting again a linear map. More formally, if $A$ and $B$ are linear maps from $V$ to $W$ and $\lambda\in\mathbb{K}$ is a number (a scalar) then we define $\lambda A$ and $A+B$ in the natural way:

$A+B$ acts on a vector $\mathbf{v}$ by adding the vectors $A\mathbf{v}$ and $B\mathbf{v}$. It is straightforward to check that the maps $\lambda A$ and $A+B$ defined in this way are linear maps $V\rightarrow W$. Therefore, the set of all linear maps $V\rightarrow W$ is a vector space. This vector space is denoted $\text{Hom}(V,W)$, meaning the "space of homomorphisms" from $V$ to $W$.
The space of linear maps from $V$ to itself is called the space of endomorphisms of $V$ and is denoted $\text{End}\,V$. Endomorphisms of $V$ are also called linear operators in the space $V$. (We have been talking about linear operators all along, but we did not call them endomorphisms until now.)

1.2.4 Eigenvectors and eigenvalues

Definition 1: Suppose $A:V\rightarrow V$ is a linear operator, and a vector $\mathbf{v}\neq0$ is such that $A\mathbf{v}=\lambda\mathbf{v}$ where $\lambda\in\mathbb{K}$ is some number. Then $\mathbf{v}$ is called the eigenvector of $A$ with the eigenvalue $\lambda$.
The geometric interpretation is that $\mathbf{v}$ is a special direction for the transformation $A$ such that $A$ acts simply as a scaling by a certain number $\lambda$ in that direction.
Remark: Without the condition $\mathbf{v}\neq0$ in the definition, it would follow that the zero vector is an eigenvector for any operator with any eigenvalue, which would not be very useful, so we exclude the trivial case $\mathbf{v}=0$.
Example 1: Suppose $A$ is the transformation that rotates vectors around some fixed axis by a fixed angle. Then any vector $\mathbf{v}$ parallel to the axis is unchanged by the rotation, so it is an eigenvector of $A$ with eigenvalue 1.
Example 2: Suppose $A$ is the operator of multiplication by a number $\alpha$, i.e. we define $A\mathbf{x}\equiv\alpha\mathbf{x}$ for all $\mathbf{x}$. Then all nonzero vectors $\mathbf{x}\neq0$ are eigenvectors of $A$ with eigenvalue $\alpha$.
Exercise 1: Suppose $\mathbf{v}$ is an eigenvector of $A$ with eigenvalue $\lambda$. Show that $c\mathbf{v}$ for any $c\in\mathbb{K}$, $c\neq0$, is also an eigenvector with the same eigenvalue.
Solution: $A(c\mathbf{v})=cA\mathbf{v}=c\lambda\mathbf{v}=\lambda(c\mathbf{v})$.
Example 3: Suppose that an operator $A\in\text{End}\,V$ is such that it has $N=\dim V$ eigenvectors $\mathbf{v}_{1},...,\mathbf{v}_{N}$ that constitute a basis in $V$. Suppose that $\lambda_{1},...,\lambda_{N}$ are the corresponding eigenvalues (not necessarily different). Then the matrix representation of $A$ in the basis $\{\mathbf{v}_{j}\}$ is a diagonal matrix
\[ A_{ij}=\text{diag}(\lambda_{1},...,\lambda_{N})\equiv\begin{pmatrix}\lambda_{1} & 0 & \dots & 0\\ 0 & \lambda_{2} & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \lambda_{N}\end{pmatrix}. \]
Thus a basis consisting of eigenvectors (the eigenbasis), if it exists, is a particularly convenient choice of basis for a given operator.
Remark: The task of determining the eigenbasis (also called the diagonalization of an operator) is a standard, well-studied problem for which efficient numerical methods exist. (This book is not about these methods.) However, it is important to know that not all operators can be diagonalized. The simplest example of a non-diagonalizable operator is one with the matrix repre-
 
0 1
(A)v (Av), sentation in R2 . This operator has only one eigenvec-
0 0

(A + B)v Av + Bv, v V. tor, 10 , so we have no hope of finding an eigenbasis. The the-
ory of the Jordan canonical form (see Sec. 4.6) explains how
In words: the map A acts on a vector v by first acting on it to choose the basis for a non-diagonalizable operator so that its
with A and then multiplying the result by the scalar ; the map matrix in that basis becomes as simple as possible.
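The non-diagonalizable example can be checked by direct computation. The following is only a minimal sketch in Python, not a method used in this book: operators on $\mathbb{R}^2$ are represented by nested lists, and the helper names `apply` and `is_eigenvector` are made up for this illustration. It tests whether $Av$ is proportional to $v$ for a few sample directions.

```python
from fractions import Fraction

def apply(matrix, v):
    """Apply a square matrix (nested lists) to a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def is_eigenvector(matrix, v):
    """Return (True, eigenvalue) if matrix*v is proportional to a nonzero v,
    otherwise (False, None)."""
    if all(x == 0 for x in v):
        return (False, None)          # the zero vector is excluded by definition
    w = apply(matrix, v)
    # Read off the candidate eigenvalue from the first nonzero component of v.
    i = next(k for k, x in enumerate(v) if x != 0)
    lam = Fraction(w[i], v[i])
    ok = all(w[k] == lam * v[k] for k in range(len(v)))
    return (ok, lam if ok else None)

# Hypothetical name for the matrix ((0, 1), (0, 0)) from the remark.
N = [[0, 1], [0, 0]]
print(is_eigenvector(N, [1, 0]))      # an eigenvector, eigenvalue 0
print(is_eigenvector(N, [0, 1]))      # not an eigenvector
print(is_eigenvector(N, [1, 1]))      # not an eigenvector
```

Checking a few directions is of course not a proof; it merely illustrates that directions other than $(1, 0)$ fail the condition $Av = \lambda v$.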


Definition 2: A map $A : V \to W$ is invertible if there exists a map $A^{-1} : W \to V$ such that $AA^{-1} = 1_W$ and $A^{-1}A = 1_V$. The map $A^{-1}$ is called the inverse of $A$.
Exercise 2: Suppose that an operator $A \in \mathrm{End}\, V$ has an eigenvector with eigenvalue 0. Show that $A$ describes a non-invertible transformation.
Outline of the solution: Show that the inverse of a linear operator (if the inverse exists) is again a linear operator. A linear operator must transform the zero vector into the zero vector. We have $Av = 0$ and yet we must have $A^{-1}0 = 0$ if $A^{-1}$ exists. $\blacksquare$
Exercise 3: Suppose that an operator $A \in \mathrm{End}\, V$ in an $n$-dimensional vector space $V$ describes a non-invertible transformation. Show that the operator $A$ has at least one eigenvector $v$ with eigenvalue 0.
Outline of the solution: Let $\{e_1, ..., e_n\}$ be a basis; consider the set of vectors $\{Ae_1, ..., Ae_n\}$ and show that it is not a basis, hence linearly dependent (otherwise $A$ would be invertible). Then there exists a linear combination $\sum_j c_j (Ae_j) = 0$ where not all $c_j$ are zero; $v \equiv \sum_j c_j e_j$ is then nonzero, and is the desired eigenvector. $\blacksquare$

1.3 Subspaces
Definition: A subspace of a vector space $V$ is a subset $S \subset V$ such that $S$ is itself a vector space.
A subspace is not just any subset of $V$. For example, if $v \in V$ is a nonzero vector then the subset $S$ consisting of the single vector, $S = \{v\}$, is not a subspace: for instance, $v + v = 2v$, but $2v \notin S$.
Example 1. The set $\{\lambda v \mid \forall \lambda \in K\}$ is called the subspace spanned by the vector $v$. This set is a subspace because we can add vectors from this set to each other and obtain again vectors from the same set. More generally, if $v_1, ..., v_n \in V$ are some vectors, we define the subspace spanned by $\{v_j\}$ as the set of all linear combinations
\[ \mathrm{Span}\, \{v_1, ..., v_n\} \equiv \{\lambda_1 v_1 + ... + \lambda_n v_n \mid \forall \lambda_i \in K\}. \]
It is obvious that $\mathrm{Span}\, \{v_1, ..., v_n\}$ is a subspace of $V$.
If $\{e_j\}$ is a basis in the space $V$ then the subspace spanned by the vectors $\{e_j\}$ is equal to $V$ itself.
Exercise 1: Show that the intersection of two subspaces is also a subspace.
Example 2: Kernel of an operator. Suppose $A \in \mathrm{End}\, V$ is a linear operator. The set of all vectors $v$ such that $Av = 0$ is called the kernel of the operator $A$ and is denoted by $\ker A$. In formal notation,
\[ \ker A \equiv \{u \in V \mid Au = 0\}. \]
This set is a subspace of $V$ because if $u, v \in \ker A$ then
\[ A(u + \lambda v) = Au + \lambda Av = 0, \]
and so $u + \lambda v \in \ker A$.
Example 3: Image of an operator. Suppose $A : V \to V$ is a linear operator. The image of the operator $A$, denoted $\mathrm{im}\, A$, is by definition the set of all vectors $v$ obtained by acting with $A$ on some other vectors $u \in V$. In formal notation,
\[ \mathrm{im}\, A \equiv \{Au \mid \forall u \in V\}. \]
This set is also a subspace of $V$ (prove this!).
Exercise 2: In a vector space $V$, let us choose a vector $v \neq 0$. Consider the set $S_0$ of all linear operators $A \in \mathrm{End}\, V$ such that $Av = 0$. Is $S_0$ a subspace? Same question for the set $S_3$ of operators $A$ such that $Av = 3v$. Same question for the set $S$ of all operators $A$ for which there exists some $\lambda \in K$ such that $Av = \lambda v$, where $\lambda$ may be different for each $A$.

1.3.1 Projectors and subspaces
Definition: A linear operator $P : V \to V$ is called a projector if $PP = P$.
Projectors are useful for defining subspaces: The result of a projection remains invariant under further projections, $P(Pv) = Pv$, so a projector $P$ defines a subspace $\mathrm{im}\, P$, which consists of all vectors invariant under $P$.
As an example, consider the transformation of $\mathbb{R}^3$ given by the matrix
\[ P = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 0 \end{pmatrix}, \]
where $a$, $b$ are arbitrary numbers. It is easy to check that $PP = P$ for any $a$, $b$. This transformation is a projector onto the subspace spanned by the vectors $(1, 0, 0)$ and $(0, 1, 0)$. (Note that $a$ and $b$ can be chosen at will; there are many projectors onto the same subspace.)
Statement: Eigenvalues of a projector can be only the numbers 0 and 1.
Proof: If $v \in V$ is an eigenvector of a projector $P$ with the eigenvalue $\lambda$ then
\[ \lambda v = Pv = PPv = P\lambda v = \lambda^2 v \;\Rightarrow\; \lambda (\lambda - 1) v = 0. \]
Since $v \neq 0$, we must have either $\lambda = 0$ or $\lambda = 1$. $\blacksquare$

1.3.2 Eigenspaces
Another way to specify a subspace is through eigenvectors of some operator.
Exercise 1: For a linear operator $A$ and a fixed number $\lambda \in K$, the set of all vectors $v \in V$ such that $Av = \lambda v$ is a subspace of $V$.
The subspace of all such vectors is called the eigenspace of $A$ with the eigenvalue $\lambda$. Any nonzero vector from that subspace is an eigenvector of $A$ with eigenvalue $\lambda$.
Example: If $P$ is a projector then $\mathrm{im}\, P$ is the eigenspace of $P$ with eigenvalue 1.
Exercise 2: Show that eigenspaces $V_\lambda$ and $V_\mu$ corresponding to different eigenvalues, $\lambda \neq \mu$, have only one common vector: the zero vector. ($V_\lambda \cap V_\mu = \{0\}$.)
By definition, a subspace $U \subset V$ is invariant under the action of some operator $A$ if $Au \in U$ for all $u \in U$.
Exercise 3: Show that the eigenspace of $A$ with eigenvalue $\lambda$ is invariant under $A$.
Exercise 4: In a space of polynomials in the variable $x$ of any (finite) degree, consider the subspace $U$ of polynomials of degree not more than 2 and the operator $A \equiv x \frac{d}{dx}$, that is,
\[ A : p(x) \mapsto x \frac{dp(x)}{dx}. \]
Show that $U$ is invariant under $A$.
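To get a feeling for the operator $x \frac{d}{dx}$ before attempting the proof, one may experiment with it on coefficient lists. This is only a sketch under stated assumptions: a polynomial is stored as a Python list whose entry number $k$ is the coefficient at $x^k$, and the names `x_ddx` and `degree` are hypothetical helpers introduced here.

```python
def x_ddx(coeffs):
    """Apply x d/dx to a polynomial given by its coefficient list.

    coeffs[k] is the coefficient at x**k, and x * d/dx (c x**k) = k c x**k,
    so every coefficient is simply multiplied by its power.
    """
    return [k * c for k, c in enumerate(coeffs)]

def degree(coeffs):
    """Degree of the polynomial; the zero polynomial gets -1 by convention here."""
    nonzero = [k for k, c in enumerate(coeffs) if c != 0]
    return max(nonzero) if nonzero else -1

p = [5, -3, 2]                 # the polynomial 5 - 3x + 2x^2, an element of U
q = x_ddx(p)                   # the polynomial -3x + 4x^2
print(q, degree(q) <= 2)
```

The experiment suggests why the degree cannot grow under this operator; it also shows that each monomial $x^k$ is simply rescaled by the factor $k$. A numerical check on samples does not replace the argument asked for in the exercise.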

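The projector example of Sec. 1.3.1 can also be verified mechanically. Below is a minimal Python sketch with exact rational arithmetic; the function names `matmul`, `apply`, and `projector` are assumptions of this illustration, and the observation that the vector $(-a, -b, 1)$ is sent to zero is an addition made here, easily confirmed by hand.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(A, v):
    """Apply a matrix to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def projector(a, b):
    """The matrix P of the example, for arbitrary numbers a, b."""
    return [[1, 0, a], [0, 1, b], [0, 0, 0]]

samples = [(0, 0), (2, -5), (Fraction(1, 3), Fraction(7, 2))]
for a, b in samples:
    P = projector(a, b)
    assert matmul(P, P) == P                       # P P = P
    assert apply(P, [1, 0, 0]) == [1, 0, 0]        # eigenvalue 1
    assert apply(P, [0, 1, 0]) == [0, 1, 0]        # eigenvalue 1
    assert apply(P, [-a, -b, 1]) == [0, 0, 0]      # eigenvalue 0
print("all checks passed")
```

For every sample pair $(a, b)$ the only scaling factors that occur are 1 and 0, in agreement with the Statement about eigenvalues of a projector.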

1.4 Isomorphisms of vector spaces
Two vector spaces are isomorphic if there exists a one-to-one (i.e. bijective) linear map between them. This linear map is called the isomorphism.
Exercise 1: If $\{v_1, ..., v_N\}$ is a linearly independent set of vectors ($v_j \in V$) and $M : V \to W$ is an isomorphism then the set $\{Mv_1, ..., Mv_N\}$ is also linearly independent. In particular, $M$ maps a basis in $V$ into a basis in $W$.
Hint: First show that $Mv = 0$ if and only if $v = 0$. Then consider the result of $M(\lambda_1 v_1 + ... + \lambda_N v_N)$.
Statement 1: Any vector space $V$ of dimension $n$ is isomorphic to the space $K^n$ of $n$-tuples.
Proof: To demonstrate this, it is sufficient to present some isomorphism. We can always choose a basis $\{e_i\}$ in $V$, so that any vector $v \in V$ is decomposed as $v = \sum_{i=1}^n \lambda_i e_i$. Then we define the isomorphism map $M$ between $V$ and the space $K^n$ as
\[ Mv \equiv (\lambda_1, ..., \lambda_n). \]
It is easy to see that $M$ is linear and one-to-one. $\blacksquare$
Vector spaces $K^m$ and $K^n$ are isomorphic only if they have equal dimension, $m = n$. The reason they are not isomorphic for $m \neq n$ is that they have different numbers of vectors in a basis, while one-to-one linear maps must preserve linear independence and map a basis to a basis. (For $m \neq n$, there are plenty of linear maps from $K^m$ to $K^n$ but none of them is a one-to-one map. It also follows that a one-to-one map between $K^m$ and $K^n$ cannot be linear.)
Note that the isomorphism $M$ constructed in the proof of Statement 1 will depend on the choice of the basis: a different basis $\{e'_i\}$ yields a different map $M'$. For this reason, the isomorphism $M$ is not canonical.
Definition: A linear map between two vector spaces $V$ and $W$ is canonically defined or canonical if it is defined independently of a choice of bases in $V$ and $W$. (We are of course allowed to choose a basis while constructing a canonical map, but at the end we need to prove that the resulting map does not depend on that choice.) Vector spaces $V$ and $W$ are canonically isomorphic if there exists a canonically defined isomorphism between them; I write $V \cong W$ in this case.
Examples of canonical isomorphisms:
1. Any vector space $V$ is canonically isomorphic to itself, $V \cong V$; the isomorphism is the identity map $v \to v$ which is defined regardless of any basis. (This is trivial but still, a valid example.)
2. If $V$ is a one-dimensional vector space then $\mathrm{End}\, V \cong K$. You have seen the map $\mathrm{End}\, V \to K$ in Exercise 1.2.2, where you had to show that any linear operator in $V$ is a multiplication by a number; this number is the element of $K$ corresponding to the given operator. Note that $V \not\cong K$ unless there is a "preferred" vector $e \in V$, $e \neq 0$ which would be mapped into the number $1 \in K$. Usually vector spaces do not have any special vectors, so there is no canonical isomorphism. (However, $\mathrm{End}\, V$ does have a special element, the identity $1_V$.)
At this point I cannot give more interesting examples of canonical maps, but I will show many of them later. My intuitive picture is that canonically isomorphic spaces have a fundamental structural similarity. An isomorphism that depends on the choice of basis, as in Statement 1 above, is unsatisfactory if we are interested in properties that can be formulated geometrically (independently of any basis).

1.5 Direct sum of vector spaces
If $V$ and $W$ are two given vector spaces over a field $K$, we define a new vector space $V \oplus W$ as the space of pairs $(v, w)$, where $v \in V$ and $w \in W$. The operations of vector sum and scalar multiplication are defined in the natural way,
\[ (v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2), \]
\[ \lambda (v_1, w_1) = (\lambda v_1, \lambda w_1). \]
The new vector space is called the direct sum of the spaces $V$ and $W$.
Statement: The dimension of the direct sum is $\dim (V \oplus W) = \dim V + \dim W$.
Proof: If $v_1, ..., v_m$ and $w_1, ..., w_n$ are bases in $V$ and $W$ respectively, consider the set of $m + n$ vectors
\[ (v_1, 0), ..., (v_m, 0), (0, w_1), ..., (0, w_n). \]
It is easy to prove that this set is linearly independent. Then it is clear that any vector $(v, w) \in V \oplus W$ can be represented as a linear combination of the vectors from the above set, therefore that set is a basis and the dimension of $V \oplus W$ is $m + n$. (This proof is sketchy but the material is standard and straightforward.) $\blacksquare$
Exercise 1: Complete the proof.
Hint: If $(v, w) = 0$ then $v = 0$ and $w = 0$ separately.

1.5.1 V and W as subspaces of V ⊕ W; canonical projections
If $V$ and $W$ are two vector spaces then the space $V \oplus W$ has a certain subspace which is canonically isomorphic to $V$. This subspace is the set of all vectors from $V \oplus W$ of the form $(v, 0)$, where $v \in V$. It is obvious that this set forms a subspace (it is closed under linear operations) and is isomorphic to $V$. To demonstrate this, we present a canonical isomorphism which we denote $P_V : V \oplus W \to V$. The isomorphism $P_V$ is the canonical projection defined by
\[ P_V (v, w) \equiv v. \]
It is easy to check that this is a linear and one-to-one map of the subspace $\{(v, 0) \mid v \in V\}$ to $V$, and that $P_V$ is a projector. This projector is canonical because we have defined it without reference to any basis. The relation is so simple that it is convenient to write $v \in V \oplus W$ instead of $(v, 0) \in V \oplus W$.
Similarly, we define the subspace isomorphic to $W$ and the corresponding canonical projection.
It is usually convenient to denote vectors from $V \oplus W$ by formal linear combinations, e.g. $v + w$, instead of the pair notation $(v, w)$. A pair $(v, 0)$ is denoted simply by $v \in V \oplus W$.
Exercise 1: Show that the space $\mathbb{R}^n \oplus \mathbb{R}^m$ is isomorphic to $\mathbb{R}^{n+m}$, but not canonically.
Hint: The image of $\mathbb{R}^n \subset \mathbb{R}^n \oplus \mathbb{R}^m$ under the isomorphism is a subspace of $\mathbb{R}^{n+m}$, but there are no canonically defined subspaces in that space.
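The pair construction of the direct sum is concrete enough to be modelled directly. The sketch below is an illustration only: it assumes $V = \mathbb{R}^2$ and $W = \mathbb{R}^3$ with their standard bases, stores vectors as tuples, and uses hypothetical helper names (`add`, `scale`, `direct_sum_basis`, `project_V`).

```python
def add(p, q):
    """(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2)."""
    (v1, w1), (v2, w2) = p, q
    return (tuple(a + b for a, b in zip(v1, v2)),
            tuple(a + b for a, b in zip(w1, w2)))

def scale(lam, p):
    """lam (v, w) = (lam v, lam w)."""
    v, w = p
    return (tuple(lam * a for a in v), tuple(lam * a for a in w))

def unit(size, i):
    """Standard basis vector number i of R^size."""
    return tuple(1 if k == i else 0 for k in range(size))

def direct_sum_basis(m, n):
    """The m + n pairs (v_i, 0) and (0, w_j) from the proof of the Statement."""
    zero_m, zero_n = (0,) * m, (0,) * n
    return ([(unit(m, i), zero_n) for i in range(m)] +
            [(zero_m, unit(n, j)) for j in range(n)])

def project_V(p):
    """Canonical projection P_V (v, w) = v."""
    return p[0]

pair = ((2, -1), (4, 0, 7))              # an element of the direct sum
basis = direct_sum_basis(2, 3)
coeffs = pair[0] + pair[1]               # its coefficients in that basis
total = ((0, 0), (0, 0, 0))
for c, b in zip(coeffs, basis):
    total = add(total, scale(c, b))
print(len(basis), total == pair, project_V(pair))
```

The decomposition recovers the original pair from $m + n = 5$ coefficients, which is the counting behind $\dim (V \oplus W) = \dim V + \dim W$.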


1.6 Dual (conjugate) vector space
Given a vector space $V$, we define another vector space $V^*$ called the dual or the conjugate to $V$. The elements of $V^*$ are linear functions on $V$, that is to say, maps $f^* : V \to K$ having the property
\[ f^*(u + \lambda v) = f^*(u) + \lambda f^*(v), \quad \forall u, v \in V, \ \forall \lambda \in K. \]
The elements of $V^*$ are called dual vectors, covectors or linear forms; I will say "covectors" to save space.
Definition: A covector is a linear map $V \to K$. The set of all covectors is the dual space to the vector space $V$. The zero covector is the linear function that maps all vectors into zero. Covectors $f^*$ and $g^*$ are equal if
\[ f^*(v) = g^*(v), \quad \forall v \in V. \]
It is clear that the set of all linear functions is a vector space because e.g. the sum of linear functions is again a linear function. This space of all linear functions is the space we denote by $V^*$. In our earlier notation, this space is the same as $\mathrm{Hom}(V, K)$.
Example 1: For the space $\mathbb{R}^2$ with vectors $v \equiv (x, y)$, we may define the functions $f^*(v) \equiv 2x$, $g^*(v) \equiv y - x$. It is straightforward to check that these functions are linear.
Example 2: Let $V$ be the space of polynomials of degree not more than 2 in the variable $x$ with real coefficients. This space $V$ is three-dimensional and contains elements such as $p \equiv p(x) = a + bx + cx^2$. A linear function $f^*$ on $V$ could be defined in a way that might appear nontrivial, such as
\[ f^*(p) = \int_0^\infty e^{-x} p(x)\, dx. \]
Nevertheless, it is clear that this is a linear function mapping $V$ into $\mathbb{R}$. Similarly,
\[ g^*(p) = \left. \frac{d}{dx} \right|_{x=1} p(x) \]
is a linear function. Hence, $f^*$ and $g^*$ belong to $V^*$.
Remark: One says that a covector $f^*$ is applied to a vector $v$ and yields a number $f^*(v)$, or alternatively that a covector acts on a vector. This is similar to writing $\cos(0) = 1$ and saying that the cosine function is applied to the number 0, or "acts on the number 0," and then yields the number 1. Other notations for a covector acting on a vector are $\langle f^*, v \rangle$ and $f^* \cdot v$, and also $\iota_v f^*$ or $\iota_{f^*} v$ (here the symbol $\iota$ stands for "insert"). However, in this text I will always use the notation $f^*(v)$ for clarity. The notation $\langle x, y \rangle$ will be used for scalar products.
Question: It is unclear how to visualize the dual space when it is defined in such abstract terms, as the set of all functions having some property. How do I know which functions are there, and how can I describe this space in more concrete terms?
Answer: Indeed, we need some work to characterize $V^*$ more explicitly. We will do this in the next subsection by constructing a basis in $V^*$.

1.6.1 Dual basis
Suppose $\{e_1, ..., e_n\}$ is a basis in $V$; then any vector $v \in V$ is uniquely expressed as a linear combination
\[ v = \sum_{j=1}^n v_j e_j. \]
The coefficient $v_1$, understood as a function of the vector $v$, is a linear function of $v$ because
\[ u + \lambda v = \sum_{j=1}^n u_j e_j + \lambda \sum_{j=1}^n v_j e_j = \sum_{j=1}^n (u_j + \lambda v_j) e_j, \]
therefore the first coefficient of the vector $u + \lambda v$ is $u_1 + \lambda v_1$. So the coefficients $v_k$, $1 \leq k \leq n$, are linear functions of the vector $v$; therefore they are covectors, i.e. elements of $V^*$. Let us denote these covectors by $e_1^*, ..., e_n^*$. Please note that $e_1^*$ depends on the entire basis $\{e_j\}$ and not only on $e_1$, as it might appear from the notation $e_1^*$. In other words, $e_1^*$ is not a result of some "star" operation applied only to $e_1$. The covector $e_1^*$ will change if we change $e_2$ or any other basis vector. This is so because the component $v_1$ of a fixed vector $v$ depends not only on $e_1$ but also on every other basis vector $e_j$.
Theorem: The set of $n$ covectors $e_1^*, ..., e_n^*$ is a basis in $V^*$. Thus, the dimension of the dual space $V^*$ is equal to that of $V$.
Proof: First, we show by an explicit calculation that any covector $f^*$ is a linear combination of $\{e_j^*\}$. Namely, for any $f^* \in V^*$ and $v \in V$ we have
\[ f^*(v) = f^*\Big( \sum_{j=1}^n v_j e_j \Big) = \sum_{j=1}^n v_j f^*(e_j) = \sum_{j=1}^n e_j^*(v) f^*(e_j). \]
Note that in the last line the quantities $f^*(e_j)$ are some numbers that do not depend on $v$. Let us denote $\phi_j \equiv f^*(e_j)$ for brevity; then we obtain the following linear decomposition of $f^*$ through the covectors $\{e_j^*\}$,
\[ f^*(v) = \sum_{j=1}^n \phi_j e_j^*(v) \;\Rightarrow\; f^* = \sum_{j=1}^n \phi_j e_j^*. \]
So indeed all covectors $f^*$ are linear combinations of $e_j^*$.
It remains to prove that the set $\{e_j^*\}$ is linearly independent. If this were not so, we would have $\sum_i \lambda_i e_i^* = 0$ where not all $\lambda_i$ are zero. Act on a vector $e_k$ ($k = 1, ..., n$) with this linear combination and get
\[ 0 = \Big( \sum_{i=1}^n \lambda_i e_i^* \Big)(e_k) = \lambda_k, \quad k = 1, ..., n. \]
Hence all $\lambda_k$ are zero. $\blacksquare$
Remark: The theorem holds only for finite-dimensional spaces! For infinite-dimensional spaces $V$, the dual space $V^*$ may be "larger" or "smaller" than $V$. Infinite-dimensional spaces are subtle, and one should not think that they are simply "spaces with infinitely many basis vectors." More detail (much more detail!) can be found in standard textbooks on functional analysis. $\blacksquare$
The set of covectors $\{e_j^*\}$ is called the dual basis to the basis $\{e_j\}$. The covectors $e_j^*$ of the dual basis have the useful property
\[ e_i^*(e_j) = \delta_{ij} \]
(please check this!). Here $\delta_{ij}$ is the Kronecker symbol: $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ii} = 1$. For instance, $e_1^*(e_1) = 1$ and $e_1^*(e_k) = 0$ for $k \geq 2$.
Question: I would like to see a concrete calculation. How do I compute $f^*(v)$ if a vector $v \in V$ and a covector $f^* \in V^*$ are given?


Answer: Vectors are usually given by listing their components in some basis. Suppose $\{e_1, ..., e_N\}$ is a basis in $V$ and $\{e_1^*, ..., e_N^*\}$ is its dual basis. If the vector $v$ has components $v_k$ in a basis $\{e_k\}$ and the covector $f^* \in V^*$ has components $f_k$ in the dual basis $\{e_k^*\}$, then
\[ f^*(v) = \sum_{k=1}^N f_k e_k^* \Big( \sum_{l=1}^N v_l e_l \Big) = \sum_{k=1}^N f_k v_k. \quad (1.15) \]
Question: The formula (1.15) looks like the scalar product (1.4). How come?
Answer: Yes, it does look like that, but Eq. (1.15) does not describe a scalar product because for one thing, $f^*$ and $v$ are from different vector spaces. I would rather say that the scalar product resembles Eq. (1.15), and this happens only for a special choice of basis (an orthonormal basis) in $V$. This will be explained in more detail in Sec. 5.1.
Question: The dual basis still seems too abstract to me. Suppose $V$ is the three-dimensional space of polynomials in the variable $x$ with real coefficients and degree no more than 2. The three polynomials $\{1, x, x^2\}$ are a basis in $V$. How can I compute explicitly the dual basis to this basis?
Answer: An arbitrary vector from this space is a polynomial $a + bx + cx^2$. The basis dual to $\{1, x, x^2\}$ consists of three covectors. Let us denote the set of these covectors by $\{e_1^*, e_2^*, e_3^*\}$. These covectors are linear functions defined like this:
\[ e_1^*(a + bx + cx^2) = a, \]
\[ e_2^*(a + bx + cx^2) = b, \]
\[ e_3^*(a + bx + cx^2) = c. \]
If you like, you can visualize them as differential operators acting on the polynomials $p(x)$ like this:
\[ e_1^*(p) = p(x)\big|_{x=0}; \quad e_2^*(p) = \left. \frac{dp}{dx} \right|_{x=0}; \quad e_3^*(p) = \frac{1}{2} \left. \frac{d^2 p}{dx^2} \right|_{x=0}. \]
However, this is a bit too complicated; the covector $e_3^*$ just extracts the coefficient of the polynomial $p(x)$ at $x^2$. To make it clear that, say, $e_2^*$ and $e_3^*$ can be evaluated without taking derivatives or limits, we may write the formulas for $e_j^*(p)$ in another equivalent way, e.g.
\[ e_2^*(p) = \frac{p(1) - p(-1)}{2}, \quad e_3^*(p) = \frac{p(1) - 2p(0) + p(-1)}{2}. \]
It is straightforward to check that these formulas are indeed equivalent by substituting $p(x) = a + bx + cx^2$.
Exercise 1: Compute $f^*$ and $g^*$ from Example 2 in terms of the basis $\{e_i^*\}$ defined above.
Question: I'm still not sure what to do in the general case. For example, the set $\{1, 1 + x, 1 + x + \frac{1}{2} x^2\}$ is also a basis in the space $V$ of quadratic polynomials. How do I explicitly compute the dual basis now? The previous trick with derivatives does not work.
Answer: Let's denote this basis by $\{f_1, f_2, f_3\}$; we are looking for the dual basis $\{f_1^*, f_2^*, f_3^*\}$. It will certainly be sufficiently explicit if we manage to express the covectors $f_j^*$ through the covectors $\{e_1^*, e_2^*, e_3^*\}$ that we just found previously. Since the set of covectors $\{e_1^*, e_2^*, e_3^*\}$ is a basis in $V^*$, we expect that $f_1^*$ is a linear combination of $\{e_1^*, e_2^*, e_3^*\}$ with some constant coefficients, and similarly $f_2^*$ and $f_3^*$. Let us, for instance, determine $f_1^*$. We write
\[ f_1^* = A e_1^* + B e_2^* + C e_3^* \]
with unknown coefficients $A$, $B$, $C$. By definition, $f_1^*$ acting on an arbitrary vector $v = c_1 f_1 + c_2 f_2 + c_3 f_3$ must yield $c_1$. Recall that $e_i^*$, $i = 1, 2, 3$ yield the coefficients of the polynomial at $1$, $x$, and $x^2$. Therefore
\[ c_1 = f_1^*(v) = f_1^*(c_1 f_1 + c_2 f_2 + c_3 f_3) \]
\[ = (A e_1^* + B e_2^* + C e_3^*)(c_1 f_1 + c_2 f_2 + c_3 f_3) \]
\[ = (A e_1^* + B e_2^* + C e_3^*)\big( c_1 + c_2 (1 + x) + c_3 (1 + x + \tfrac{1}{2} x^2) \big) \]
\[ = A c_1 + A c_2 + A c_3 + B c_2 + B c_3 + \tfrac{1}{2} C c_3. \]
Since this must hold for every $c_1$, $c_2$, $c_3$, we obtain a system of equations for the unknown constants $A$, $B$, $C$:
\[ A = 1; \]
\[ A + B = 0; \]
\[ A + B + \tfrac{1}{2} C = 0. \]
The solution is $A = 1$, $B = -1$, $C = 0$. Therefore $f_1^* = e_1^* - e_2^*$. In the same way we can determine $f_2^*$ and $f_3^*$. $\blacksquare$
Here are some useful properties of covectors.
Statement: (1) If $f^* \neq 0$ is a given covector, there exists a basis $\{v_1, ..., v_N\}$ of $V$ such that $f^*(v_1) = 1$ while $f^*(v_i) = 0$ for $2 \leq i \leq N$.
(2) Once such a basis is found, the set $\{a, v_2, ..., v_N\}$ will still be a basis in $V$ for any vector $a$ such that $f^*(a) \neq 0$.
Proof: (1) By definition, the property $f^* \neq 0$ means that there exists at least one vector $u \in V$ such that $f^*(u) \neq 0$. Given the vector $u$, we define the vector $v_1$ by
\[ v_1 \equiv \frac{1}{f^*(u)} u. \]
It follows (using the linearity of $f^*$) that $f^*(v_1) = 1$. Then by Exercise 1 in Sec. 1.1.5 the vector $v_1$ can be completed to some basis $\{v_1, w_2, ..., w_N\}$. Thereafter we define the vectors $v_2$, ..., $v_N$ by the formula
\[ v_i \equiv w_i - f^*(w_i)\, v_1, \quad 2 \leq i \leq N, \]
and obtain a set of vectors $\{v_1, ..., v_N\}$ such that $f^*(v_1) = 1$ and $f^*(v_i) = 0$ for $2 \leq i \leq N$. This set is linearly independent because a linear dependence among $\{v_j\}$,
\[ 0 = \sum_{i=1}^N \lambda_i v_i = \Big( \lambda_1 - \sum_{i=2}^N \lambda_i f^*(w_i) \Big) v_1 + \sum_{i=2}^N \lambda_i w_i, \]
together with the linear independence of the basis $\{v_1, w_2, ..., w_N\}$, forces $\lambda_i = 0$ for all $i \geq 2$ and hence also $\lambda_1 = 0$. Therefore, the set $\{v_1, ..., v_N\}$ is the required basis.
(2) If the set $\{a, v_2, ..., v_N\}$ were linearly dependent,
\[ \lambda a + \sum_{j=2}^N \mu_j v_j = 0, \]
with $\lambda$, $\mu_j$ not all zero, then we would have
\[ f^*\Big( \lambda a + \sum_{j=2}^N \mu_j v_j \Big) = \lambda f^*(a) = 0, \]


which forces $\lambda = 0$ since by assumption $f^*(a) \neq 0$. However, $\lambda = 0$ entails
\[ \sum_{j=2}^N \mu_j v_j = 0, \]
with $\mu_j$ not all zero, which contradicts the linear independence of the set $\{v_2, ..., v_N\}$. $\blacksquare$
Exercise 2: Suppose that $\{v_1, ..., v_k\}$, $v_j \in V$ is a linearly independent set (not necessarily a basis). Prove that there exists at least one covector $f^* \in V^*$ such that
\[ f^*(v_1) = 1, \quad \text{while} \quad f^*(v_2) = ... = f^*(v_k) = 0. \]
Outline of proof: The set $\{v_1, ..., v_k\}$ can be completed to a basis in $V$, see Exercise 1 in Sec. 1.1.5. Then $f^*$ is the covector dual to $v_1$ in that basis.
Exercise 3: Prove that the space dual to $V^*$ is canonically isomorphic to $V$, i.e. $V^{**} \cong V$ (for finite-dimensional $V$).
Hint: Vectors $v \in V$ can be thought of as linear functions on $V^*$, defined by $v(f^*) \equiv f^*(v)$. This provides a map $V \to V^{**}$, so the space $V$ is a subspace of $V^{**}$. Show that this map is injective. The dimensions of the spaces $V$, $V^*$, and $V^{**}$ are the same; deduce that $V$ as a subspace of $V^{**}$ coincides with the whole space $V^{**}$.

1.6.2 Hyperplanes
Covectors are convenient for characterizing hyperplanes.
Let us begin with a familiar example: In three dimensions, the set of points with coordinate $x = 0$ is a plane. The set of points whose coordinates satisfy the linear equation $x + 2y - z = 0$ is another plane.
Instead of writing a linear equation with coordinates, one can write a covector applied to the vector of coordinates. For example, the equation $x + 2y - z = 0$ can be rewritten as $f^*(\mathbf{x}) = 0$, where $\mathbf{x} \equiv (x, y, z) \in \mathbb{R}^3$, while the covector $f^* \in (\mathbb{R}^3)^*$ is expressed through the dual basis $\{e_j^*\}$ as
\[ f^* \equiv e_1^* + 2 e_2^* - e_3^*. \]
The generalization of this to $N$ dimensions is as follows.
Definition 1: The hyperplane (i.e. subspace of codimension 1) annihilated by a covector $f^* \in V^*$ is the set of all vectors $x \in V$ such that $f^*(x) = 0$. (Note that the zero vector, $x = 0$, belongs to the hyperplane.)
Statement: The hyperplane annihilated by a nonzero covector $f^*$ is a subspace of $V$ of dimension $N - 1$ (where $N \equiv \dim V$).
Proof: It is clear that the hyperplane is a subspace of $V$ because for any $x_1$ and $x_2$ in the hyperplane we have
\[ f^*(x_1 + \lambda x_2) = f^*(x_1) + \lambda f^*(x_2) = 0. \]
Hence any linear combination of $x_1$ and $x_2$ also belongs to the hyperplane, so the hyperplane is a subspace.
To determine the dimension of this subspace, we would like to construct a basis for the hyperplane. Since $f^* \in V^*$ is a nonzero covector, there exists some vector $u \in V$ such that $f^*(u) \neq 0$. (This vector does not belong to the hyperplane.) The idea is to complete $u$ to a basis $\{u, v_1, ..., v_{N-1}\}$ in $V$, such that $f^*(u) \neq 0$ but $f^*(v_i) = 0$; then $\{v_1, ..., v_{N-1}\}$ will be a basis in the hyperplane. To find such a basis $\{u, v_1, ..., v_{N-1}\}$, let us first complete $u$ to some basis $\{u, u_1, ..., u_{N-1}\}$. Then we define $v_i = u_i - c_i u$ with appropriately chosen $c_i$. To achieve $f^*(v_i) = 0$, we set
\[ c_i = \frac{f^*(u_i)}{f^*(u)}. \]
It remains to prove that $\{u, v_1, ..., v_{N-1}\}$ is again a basis. Applying $f^*$ to a supposedly existing vanishing linear combination,
\[ \lambda u + \sum_{i=1}^{N-1} \lambda_i v_i = 0, \]
we obtain $\lambda = 0$. Expressing $v_i$ through $u$ and $u_i$, we obtain a vanishing linear combination of vectors $\{u, u_1, ..., u_{N-1}\}$ with coefficients $\lambda_i$ at $u_i$. Hence, all $\lambda_i$ are zero, and so the set $\{u, v_1, ..., v_{N-1}\}$ is linearly independent and thus a basis in $V$.
Finally, we show that $\{v_1, ..., v_{N-1}\}$ is a basis in the hyperplane. By construction, every $v_i$ belongs to the hyperplane, and so does every linear combination of the $v_i$'s. It remains to show that every $x$ such that $f^*(x) = 0$ can be expressed as a linear combination of the $\{v_j\}$. For any such $x$ we have the decomposition in the basis $\{u, v_1, ..., v_{N-1}\}$,
\[ x = \lambda u + \sum_{i=1}^{N-1} \lambda_i v_i. \]
Applying $f^*$ to this, we find $\lambda = 0$. Hence, $x$ is a linear combination only of the $\{v_j\}$. This shows that the set $\{v_j\}$ spans the hyperplane. The set $\{v_j\}$ is linearly independent since it is a subset of a basis in $V$. Hence, $\{v_j\}$ is a basis in the hyperplane. Therefore, the hyperplane has dimension $N - 1$. $\blacksquare$
Hyperplanes considered so far always contain the zero vector. Another useful construction is that of an affine hyperplane: Geometrically speaking, this is a hyperplane that has been shifted away from the origin.
Definition 2: An affine hyperplane is the set of all vectors $x \in V$ such that $f^*(x) = \alpha$, where $f^* \in V^*$ is nonzero, and $\alpha$ is a number.
Remark: An affine hyperplane with $\alpha \neq 0$ is not a subspace of $V$ and may be described more constructively as follows. We first obtain a basis $\{v_1, ..., v_{N-1}\}$ of the hyperplane $f^*(x) = 0$, as described above. We then choose some vector $u$ such that $f^*(u) \neq 0$; such a vector exists since $f^* \neq 0$. We can then multiply $u$ by a constant $\lambda$ such that $f^*(\lambda u) = \alpha$, that is, the vector $\lambda u$ belongs to the affine hyperplane. Now, every vector $x$ of the form
\[ x = \lambda u + \sum_{i=1}^{N-1} \lambda_i v_i, \]
with arbitrary $\lambda_i$, belongs to the hyperplane since $f^*(x) = \alpha$ by construction. Thus, the set $\{x \mid f^*(x) = \alpha\}$ is a hyperplane drawn through $\lambda u$ parallel to the vectors $\{v_i\}$. Affine hyperplanes described by the same covector $f^*$ but with different values of $\alpha$ will differ only in the choice of the initial vector $\lambda u$ and thus are parallel to each other, in the geometric sense.
Exercise: Intersection of many hyperplanes. a) Suppose $f_1^*, ..., f_k^* \in V^*$. Show that the set of all vectors $x \in V$ such that $f_i^*(x) = 0$ ($i = 1, ..., k$) is a subspace of $V$.
b)* Show that the dimension of that subspace is equal to $N - k$ (where $N \equiv \dim V$) if the set $\{f_1^*, ..., f_k^*\}$ is linearly independent.
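The construction used in the proof of the Statement on hyperplanes is an explicit algorithm, and it may help to see it run. The following Python sketch is an illustration under assumptions that are not part of the text: $V$ is taken to be $\mathbb{R}^N$ with its standard basis, the covector is given by its list of coefficients in the dual basis, $u$ is chosen among the standard basis vectors, and the names `covector` and `hyperplane_basis` are hypothetical.

```python
from fractions import Fraction

def covector(coeffs):
    """The covector f* = sum_j coeffs[j] e_j^*, as a Python function of x."""
    return lambda x: sum(c * xj for c, xj in zip(coeffs, x))

def hyperplane_basis(coeffs):
    """Return (u, [v_1, ..., v_{N-1}]) with f*(u) != 0 and f*(v_i) = 0.

    Requires a nonzero covector; otherwise no suitable u exists.
    """
    n = len(coeffs)
    f = covector(coeffs)
    std = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    # Choose u among the standard basis vectors such that f*(u) != 0.
    k = next(i for i in range(n) if f(std[i]) != 0)
    u = std[k]
    others = [std[i] for i in range(n) if i != k]   # completes u to a basis
    basis = []
    for ui in others:
        c = f(ui) / f(u)                            # c_i = f*(u_i) / f*(u)
        basis.append([a - c * b for a, b in zip(ui, u)])   # v_i = u_i - c_i u
    return u, basis

# The plane x + 2y - z = 0 from the beginning of this subsection.
f = covector([1, 2, -1])
u, vs = hyperplane_basis([1, 2, -1])
print(u, vs, [f(v) for v in vs])
```

For this plane the sketch picks $u = (1, 0, 0)$ and returns the two vectors $(-2, 1, 0)$ and $(1, 0, 1)$, both of which satisfy $x + 2y - z = 0$. A different choice of $u$ or of the completing vectors would give a different basis of the same hyperplane.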

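Returning briefly to the dual basis of Sec. 1.6.1: the system of equations solved there for $f_1^*$ can be set up for all three covectors at once. If the components of $f_1, f_2, f_3$ in the basis $\{1, x, x^2\}$ are written as the columns of a matrix, then the condition $f_i^*(f_j) = \delta_{ij}$ says that the coefficients of the $f_i^*$ at $e_1^*, e_2^*, e_3^*$ form the rows of the inverse matrix. The Python sketch below illustrates this reformulation; the matrix point of view, the Gauss-Jordan helper `inverse`, and the variable names are choices made for this illustration and do not come from the text.

```python
from fractions import Fraction

def inverse(M):
    """Gauss-Jordan inverse of an invertible square matrix, in exact arithmetic."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from all other rows.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

half = Fraction(1, 2)
# Column j holds the coefficients of f_j at 1, x, x^2 for the basis
# f_1 = 1, f_2 = 1 + x, f_3 = 1 + x + x^2/2.
B = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, half]]
C = inverse(B)   # row i holds the coefficients of f_i^* at e_1^*, e_2^*, e_3^*
for row in C:
    print(row)
```

The first row of the result reproduces the coefficients $A = 1$, $B = -1$, $C = 0$ found by hand, i.e. $f_1^* = e_1^* - e_2^*$; the remaining rows give $f_2^*$ and $f_3^*$ in the same way.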

1.7 Tensor product of vector spaces
The tensor product is an abstract construction which is important in many applications. The motivation is that we would like to define a product of vectors, $u \otimes v$, which behaves as we expect a product to behave, e.g.
\[ (a + \lambda b) \otimes c = a \otimes c + \lambda b \otimes c, \quad \forall \lambda \in K, \ \forall a, b, c \in V, \]
and the same with respect to the second vector. This property is called bilinearity. A "trivial" product would be $a \otimes b = 0$ for all $a$, $b$; of course, this product has the bilinearity property but is useless. It turns out to be impossible to define a nontrivial product of vectors in a general vector space, such that the result is again a vector in the same space.$^3$ The solution is to define a product of vectors so that the resulting object $u \otimes v$ is not a vector from $V$ but an element of another space. This space is constructed in the following definition.
Definition: Suppose $V$ and $W$ are two vector spaces over a field $K$; then one defines a new vector space, which is called the tensor product of $V$ and $W$ and denoted by $V \otimes W$. This is the space of expressions of the form
\[ v_1 \otimes w_1 + ... + v_n \otimes w_n, \quad (1.16) \]
where $v_i \in V$, $w_i \in W$. The plus sign behaves as usual (commutative and associative). The symbol $\otimes$ is a special separator symbol. Further, we postulate that the following combinations are equal,
\[ \lambda (v \otimes w) = (\lambda v) \otimes w = v \otimes (\lambda w), \quad (1.17) \]
\[ (v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w, \quad (1.18) \]
\[ v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2, \quad (1.19) \]
for any vectors $v$, $w$, $v_{1,2}$, $w_{1,2}$ and for any constant $\lambda$. (One could say that the symbol $\otimes$ behaves as a noncommutative product sign.) The expression $v \otimes w$, which is by definition an element of $V \otimes W$, is called the tensor product of vectors $v$ and $w$. In the space $V \otimes W$, the operations of addition and multiplication by scalars are defined in the natural way. Elements of the tensor product space are called tensors.
Question: The set $V \otimes W$ is a vector space. What is the zero vector in that space?
Answer: Since $V \otimes W$ is a vector space, the zero element $0 \in V \otimes W$ can be obtained by multiplying any other element of $V \otimes W$ by the number 0. So, according to Eq. (1.17), we have $0 = 0 (v \otimes w) = (0v) \otimes w = 0 \otimes w = 0 \otimes (0w) = 0 \otimes 0$. In other words, the zero element is represented by the tensor $0 \otimes 0$. It will not cause confusion if we simply write 0 for this zero tensor. $\blacksquare$
Generally, one calls something a tensor if it belongs to a space that was previously defined as a tensor product of some other vector spaces.
According to the above definition, we may perform calculations with the tensor product expressions by expanding brackets or moving scalar factors, as if $\otimes$ is a kind of multiplication. For example, if $v_i \in V$ and $w_i \in W$ then
\[ \frac{1}{3} (v_1 - v_2) \otimes (w_1 - 2 w_2) = \frac{1}{3} v_1 \otimes w_1 - \frac{1}{3} v_2 \otimes w_1 - \frac{2}{3} v_1 \otimes w_2 + \frac{2}{3} v_2 \otimes w_2. \]
$^3$ The impossibility of this is proved in abstract algebra but I do not know the
Note that we cannot simplify this expression any further, because by definition no other combinations of tensor products are equal except those specified in Eqs. (1.17)-(1.19). This calculation illustrates that $\otimes$ is a formal symbol, so in particular $v \otimes w$ is not a new vector from $V$ or from $W$ but is a new entity, an element of a new vector space that we just defined.
Question: The logic behind the operation $\otimes$ is still unclear. How could we write the properties (1.17)-(1.19) if the operation $\otimes$ was not yet defined?
Answer: We actually define the operation $\otimes$ through these properties. In other words, the object $a \otimes b$ is defined as an expression with which one may perform certain manipulations. Here is a more formal definition of the tensor product space. We first consider the space of all formal linear combinations
\[ \lambda_1 v_1 \otimes w_1 + ... + \lambda_n v_n \otimes w_n, \]
which is a very large vector space. Then we introduce equivalence relations expressed by Eqs. (1.17)-(1.19). The space $V \otimes W$ is, by definition, the set of equivalence classes of linear combinations with respect to these relations. Representatives of these equivalence classes may be written in the form (1.16) and calculations can be performed using only the axioms (1.17)-(1.19). $\blacksquare$
Note that $v \otimes w$ is generally different from $w \otimes v$ because the vectors $v$ and $w$ can belong to different vector spaces. Pedantically, one can also define the tensor product space $W \otimes V$ and then demonstrate a canonical isomorphism $V \otimes W \cong W \otimes V$.
Exercise: Prove that the spaces $V \otimes W$ and $W \otimes V$ are canonically isomorphic.
Answer: A canonical isomorphism will map the expression $v \otimes w \in V \otimes W$ into $w \otimes v \in W \otimes V$. $\blacksquare$
The representation of a tensor $A \in V \otimes W$ in the form (1.16) is not unique, i.e. there may be many possible choices of the vectors $v_j$ and $w_j$ that give the same tensor $A$. For example,
\[ A \equiv v_1 \otimes w_1 + v_2 \otimes w_2 = (v_1 - v_2) \otimes w_1 + v_2 \otimes (w_1 + w_2). \]
This is quite similar to the identity $2 + 3 = (2 - 1) + (3 + 1)$, except that in this case we can simplify $2 + 3 = 5$ while in the tensor product space no such simplification is possible. I stress that two tensor expressions $\sum_k v_k \otimes w_k$ and $\sum_k v'_k \otimes w'_k$ are equal only if they can be related by a chain of identities of the form (1.17)-(1.19); such are the axioms of the tensor product.

1.7.1 First examples
Example 1: polynomials. Let $V$ be the space of polynomials having a degree $\leq 2$ in the variable $x$, and let $W$ be the space of polynomials of degree $\leq 2$ in the variable $y$. We consider the tensor product of the elements $p(x) = 1 + x$ and $q(y) = y^2 - 2y$. Expanding the tensor product according to the axioms, we find
\[ (1 + x) \otimes (y^2 - 2y) = 1 \otimes y^2 - 1 \otimes 2y + x \otimes y^2 - x \otimes 2y. \]
Let us compare this with the formula we would obtain by multiplying the polynomials in the conventional way,
\[ (1 + x)(y^2 - 2y) = y^2 - 2y + x y^2 - 2xy. \]
Note that $1 \otimes 2y = 2 \otimes y$ and $x \otimes 2y = 2x \otimes y$ according to the axioms of the tensor product. So we can see that the tensor
proof. product space V W has a natural interpretation through the


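algebra of polynomials. The space $V \otimes W$ can be visualized as the space of polynomials in both $x$ and $y$ of degree at most 2 in each variable. To make this interpretation precise, we can construct a canonical isomorphism between the space $V \otimes W$ and the space of polynomials in $x$ and $y$ of degree at most 2 in each variable. The isomorphism maps the tensor $p(x) \otimes q(y)$ to the polynomial $p(x) q(y)$.

Example 2: $\mathbb{R}^3 \otimes \mathbb{C}$. Let $V$ be the three-dimensional space $\mathbb{R}^3$, and let $W$ be the set of all complex numbers $\mathbb{C}$ considered as a vector space over $\mathbb{R}$. Then the tensor product of $V$ and $W$ is, by definition, the space of combinations of the form
\[
(x_1, y_1, z_1) \otimes (a_1 + b_1 \mathrm{i}) + (x_2, y_2, z_2) \otimes (a_2 + b_2 \mathrm{i}) + ...
\]
Here "$\mathrm{i}$" can be treated as a formal symbol; of course we know that $\mathrm{i}^2 = -1$, but our vector spaces are over $\mathbb{R}$ and so we will not need to multiply complex numbers when we perform calculations in these spaces. Since
\[
(x, y, z) \otimes (a + b \mathrm{i}) = (ax, ay, az) \otimes 1 + (bx, by, bz) \otimes \mathrm{i},
\]
any element of $\mathbb{R}^3 \otimes \mathbb{C}$ can be represented by the expression $\mathbf{v}_1 \otimes 1 + \mathbf{v}_2 \otimes \mathrm{i}$, where $\mathbf{v}_{1,2} \in \mathbb{R}^3$. For brevity one can write such expressions as $\mathbf{v}_1 + \mathbf{v}_2 \mathrm{i}$. One also writes $\mathbb{R}^3 \otimes_{\mathbb{R}} \mathbb{C}$ to emphasize the fact that it is a space over $\mathbb{R}$. In other words, $\mathbb{R}^3 \otimes_{\mathbb{R}} \mathbb{C}$ is the space of three-dimensional vectors "with complex coefficients." This space is six-dimensional.

Exercise: We can consider $\mathbb{R}^3 \otimes_{\mathbb{R}} \mathbb{C}$ as a vector space over $\mathbb{C}$ if we define the multiplication by a complex number $\lambda$ by $\lambda (\mathbf{v} \otimes z) \equiv \mathbf{v} \otimes (\lambda z)$ for $\mathbf{v} \in V$ and $\lambda, z \in \mathbb{C}$. Compute explicitly
\[
\lambda (\mathbf{v}_1 \otimes 1 + \mathbf{v}_2 \otimes \mathrm{i}) = ?
\]
Determine the dimension of the space $\mathbb{R}^3 \otimes_{\mathbb{R}} \mathbb{C}$ when viewed as a vector space over $\mathbb{C}$ in this way.

Example 3: $V \otimes \mathbb{K}$ is isomorphic to $V$. Since $\mathbb{K}$ is a vector space over itself, we can consider the tensor product of $V$ and $\mathbb{K}$. However, nothing is gained: the space $V \otimes \mathbb{K}$ is canonically isomorphic to $V$. This can be easily verified: an element $x$ of $V \otimes \mathbb{K}$ is by definition an expression of the form $x = \mathbf{v}_1 \otimes \lambda_1 + ... + \mathbf{v}_n \otimes \lambda_n$; however, it follows from the axiom (1.17) that $\mathbf{v}_1 \otimes \lambda_1 = (\lambda_1 \mathbf{v}_1) \otimes 1$, therefore $x = (\lambda_1 \mathbf{v}_1 + ... + \lambda_n \mathbf{v}_n) \otimes 1$. Thus for any $x \in V \otimes \mathbb{K}$ there exists a unique $\mathbf{v} \in V$ such that $x = \mathbf{v} \otimes 1$. In other words, there is a canonical isomorphism $V \to V \otimes \mathbb{K}$ which maps $\mathbf{v}$ into $\mathbf{v} \otimes 1$.

1.7.2 Example: $\mathbb{R}^m \otimes \mathbb{R}^n$

Let $\{\mathbf{e}_1, ..., \mathbf{e}_m\}$ and $\{\mathbf{f}_1, ..., \mathbf{f}_n\}$ be the standard bases in $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively. The vector space $\mathbb{R}^m \otimes \mathbb{R}^n$ consists, by definition, of expressions of the form
\[
\mathbf{v}_1 \otimes \mathbf{w}_1 + ... + \mathbf{v}_k \otimes \mathbf{w}_k = \sum_{i=1}^{k} \mathbf{v}_i \otimes \mathbf{w}_i, \quad \mathbf{v}_i \in \mathbb{R}^m, \; \mathbf{w}_i \in \mathbb{R}^n.
\]
The vectors $\mathbf{v}_i$, $\mathbf{w}_i$ can be decomposed as follows,
\[
\mathbf{v}_i = \sum_{j=1}^{m} \lambda_{ij} \mathbf{e}_j, \quad \mathbf{w}_i = \sum_{l=1}^{n} \mu_{il} \mathbf{f}_l, \tag{1.20}
\]
where $\lambda_{ij}$ and $\mu_{il}$ are some coefficients. Then
\[
\begin{aligned}
\sum_{i=1}^{k} \mathbf{v}_i \otimes \mathbf{w}_i
&= \sum_{i=1}^{k} \Big( \sum_{j=1}^{m} \lambda_{ij} \mathbf{e}_j \Big) \otimes \Big( \sum_{l=1}^{n} \mu_{il} \mathbf{f}_l \Big) \\
&= \sum_{j=1}^{m} \sum_{l=1}^{n} \Big( \sum_{i=1}^{k} \lambda_{ij} \mu_{il} \Big) (\mathbf{e}_j \otimes \mathbf{f}_l) \\
&= \sum_{j=1}^{m} \sum_{l=1}^{n} C_{jl} \, \mathbf{e}_j \otimes \mathbf{f}_l,
\end{aligned}
\]
where $C_{jl} \equiv \sum_{i=1}^{k} \lambda_{ij} \mu_{il}$ is a certain set of numbers. In other words, an arbitrary element of $\mathbb{R}^m \otimes \mathbb{R}^n$ can be expressed as a linear combination of $\mathbf{e}_j \otimes \mathbf{f}_l$. In Sec. 1.7.3 (after some preparatory work) we will prove that the set of tensors
\[
\{ \mathbf{e}_j \otimes \mathbf{f}_l \mid 1 \leq j \leq m, \; 1 \leq l \leq n \}
\]
is linearly independent and therefore is a basis in the space $\mathbb{R}^m \otimes \mathbb{R}^n$. It follows that the space $\mathbb{R}^m \otimes \mathbb{R}^n$ has dimension $mn$ and that elements of $\mathbb{R}^m \otimes \mathbb{R}^n$ can be represented by rectangular tables of components $C_{jl}$, where $1 \leq j \leq m$, $1 \leq l \leq n$. In other words, the space $\mathbb{R}^m \otimes \mathbb{R}^n$ is isomorphic to the linear space of rectangular $m \times n$ matrices with coefficients from $\mathbb{R}$. This isomorphism is not canonical because the components $C_{jl}$ depend on the choice of the bases $\{\mathbf{e}_j\}$ and $\{\mathbf{f}_l\}$.

1.7.3 Dimension of tensor product is the product of dimensions

We have seen above that the dimension of a direct sum $V \oplus W$ is the sum of dimensions of $V$ and of $W$. Now the analogous statement: The dimension of a tensor product space $V \otimes W$ is equal to $\dim V \cdot \dim W$.

To prove this statement, we will explicitly construct a basis in $V \otimes W$ out of two given bases in $V$ and in $W$. Throughout this section, we consider finite-dimensional vector spaces $V$ and $W$ and vectors $\mathbf{v}_j \in V$, $\mathbf{w}_j \in W$.

Lemma 1: a) If $\{\mathbf{v}_1, ..., \mathbf{v}_m\}$ and $\{\mathbf{w}_1, ..., \mathbf{w}_n\}$ are two bases in their respective spaces then any element $A \in V \otimes W$ can be expressed as a linear combination of the form
\[
A = \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{jk} \mathbf{v}_j \otimes \mathbf{w}_k
\]
with some coefficients $\lambda_{jk}$.
b) Any tensor $A \in V \otimes W$ can be written as a linear combination $A = \sum_k \mathbf{a}_k \otimes \mathbf{b}_k$, where $\mathbf{a}_k \in V$ and $\mathbf{b}_k \in W$, with at most $\min(m, n)$ terms in the sum.

Proof: a) The required decomposition was given in Example 1.7.2.
b) We can group the $n$ terms $\lambda_{jk} \mathbf{w}_k$ into new vectors $\mathbf{b}_j$ and obtain the required formula with $m$ terms:
\[
A = \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{jk} \mathbf{v}_j \otimes \mathbf{w}_k = \sum_{j=1}^{m} \mathbf{v}_j \otimes \mathbf{b}_j, \quad \mathbf{b}_j \equiv \sum_{k=1}^{n} \lambda_{jk} \mathbf{w}_k.
\]
I will call this formula the decomposition of the tensor $A$ in the basis $\{\mathbf{v}_j\}$. Since a similar decomposition with $n$ terms exists for the basis $\{\mathbf{w}_k\}$, it follows that $A$ has a decomposition with at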


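most $\min(m, n)$ terms (not all terms in the decomposition need to be nonzero). ■

We have proved that the set $\{\mathbf{v}_j \otimes \mathbf{w}_k\}$ allows us to express any tensor $A$ as a linear combination; in other words, the set
\[
\{ \mathbf{v}_j \otimes \mathbf{w}_k \mid 1 \leq j \leq m, \; 1 \leq k \leq n \}
\]
spans the space $V \otimes W$. This set will be a basis in $V \otimes W$ if it is linearly independent, which we have not yet proved. This is a somewhat subtle point; indeed, how do we show that there exists no linear dependence, say, of the form
\[
\lambda_1 \mathbf{v}_1 \otimes \mathbf{w}_1 + \lambda_2 \mathbf{v}_2 \otimes \mathbf{w}_2 = 0
\]
with some nonzero coefficients $\lambda_i$? Is it perhaps possible to juggle tensor products to obtain such a relation? The answer is negative, but the proof is a bit roundabout. We will use covectors from $V^*$ in a nontraditional way, namely not as linear maps $V \to \mathbb{K}$ but as maps $V \otimes W \to W$.

Lemma 2: If $\mathbf{f}^* \in V^*$ is any covector, we define the map $\mathbf{f}^* : V \otimes W \to W$ (tensors into vectors) by the formula
\[
\mathbf{f}^* \Big( \sum_k \mathbf{v}_k \otimes \mathbf{w}_k \Big) \equiv \sum_k \mathbf{f}^*(\mathbf{v}_k) \, \mathbf{w}_k. \tag{1.21}
\]
Then this map is a linear map $V \otimes W \to W$.

Proof: The formula (1.21) defines the map explicitly (and canonically!). It is easy to see that any linear combinations of tensors are mapped into the corresponding linear combinations of vectors,
\[
\mathbf{f}^* (\mathbf{v}_k \otimes \mathbf{w}_k + \lambda \mathbf{v}'_k \otimes \mathbf{w}'_k) = \mathbf{f}^*(\mathbf{v}_k) \, \mathbf{w}_k + \lambda \mathbf{f}^*(\mathbf{v}'_k) \, \mathbf{w}'_k.
\]
This follows from the definition (1.21) and the linearity of the map $\mathbf{f}^*$. However, there is one potential problem: there exist many representations of an element $A \in V \otimes W$ as an expression of the form $\sum_k \mathbf{v}_k \otimes \mathbf{w}_k$ with different choices of $\mathbf{v}_k$, $\mathbf{w}_k$. Thus we need to show that the map $\mathbf{f}^*$ is well-defined by Eq. (1.21), i.e. that $\mathbf{f}^*(A)$ is always the same vector regardless of the choice of the vectors $\mathbf{v}_k$ and $\mathbf{w}_k$ used to represent $A$ as $A = \sum_k \mathbf{v}_k \otimes \mathbf{w}_k$. Recall that different expressions of the form $\sum_k \mathbf{v}_k \otimes \mathbf{w}_k$ can be equal as a consequence of the axioms (1.17)–(1.19).

In other words, we need to prove that a tensor equality
\[
\sum_k \mathbf{v}_k \otimes \mathbf{w}_k = \sum_k \mathbf{v}'_k \otimes \mathbf{w}'_k \tag{1.22}
\]
entails
\[
\mathbf{f}^* \Big( \sum_k \mathbf{v}_k \otimes \mathbf{w}_k \Big) = \mathbf{f}^* \Big( \sum_k \mathbf{v}'_k \otimes \mathbf{w}'_k \Big).
\]
To prove this, we need to use the definition of the tensor product. Two expressions in Eq. (1.22) can be equal only if they are related by a chain of identities of the form (1.17)–(1.19), therefore it is sufficient to prove that the map $\mathbf{f}^*$ transforms both sides of each of those identities into the same vector. This is verified by explicit calculations, for example we need to check that
\[
\begin{aligned}
\mathbf{f}^* (\lambda \mathbf{v} \otimes \mathbf{w}) &= \lambda \mathbf{f}^* (\mathbf{v} \otimes \mathbf{w}), \\
\mathbf{f}^* [(\mathbf{v}_1 + \mathbf{v}_2) \otimes \mathbf{w}] &= \mathbf{f}^* (\mathbf{v}_1 \otimes \mathbf{w}) + \mathbf{f}^* (\mathbf{v}_2 \otimes \mathbf{w}), \\
\mathbf{f}^* [\mathbf{v} \otimes (\mathbf{w}_1 + \mathbf{w}_2)] &= \mathbf{f}^* (\mathbf{v} \otimes \mathbf{w}_1) + \mathbf{f}^* (\mathbf{v} \otimes \mathbf{w}_2).
\end{aligned}
\]
These simple calculations look tautological, so please check that you can do them and explain why they are necessary for this proof. ■

Lemma 3: If $\{\mathbf{v}_1, ..., \mathbf{v}_m\}$ and $\{\mathbf{w}_1, ..., \mathbf{w}_n\}$ are two linearly independent sets in their respective spaces then the set
\[
\{ \mathbf{v}_j \otimes \mathbf{w}_k \} \equiv \{ \mathbf{v}_1 \otimes \mathbf{w}_1, \, \mathbf{v}_1 \otimes \mathbf{w}_2, \, ..., \, \mathbf{v}_m \otimes \mathbf{w}_{n-1}, \, \mathbf{v}_m \otimes \mathbf{w}_n \}
\]
is linearly independent in the space $V \otimes W$.

Proof: We need to prove that a vanishing linear combination
\[
\sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{jk} \mathbf{v}_j \otimes \mathbf{w}_k = 0 \tag{1.23}
\]
is possible only if all $\lambda_{jk} = 0$. Let us choose some fixed value $j_1$; we will now prove that $\lambda_{j_1 k} = 0$ for all $k$. By the result of Exercise 1 in Sec. 1.6 there exists a covector $\mathbf{f}^* \in V^*$ such that $\mathbf{f}^*(\mathbf{v}_j) = \delta_{j_1 j}$ for $j = 1, ..., m$. Then we apply the map $\mathbf{f}^* : V \otimes W \to W$ defined in Lemma 2 to Eq. (1.23). On the one hand, it follows from Eq. (1.23) that
\[
\mathbf{f}^* \Big( \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{jk} \mathbf{v}_j \otimes \mathbf{w}_k \Big) = \mathbf{f}^*(0) = 0.
\]
On the other hand, by definition of the map $\mathbf{f}^*$ we have
\[
\begin{aligned}
\mathbf{f}^* \Big( \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{jk} \mathbf{v}_j \otimes \mathbf{w}_k \Big)
&= \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{jk} \mathbf{f}^*(\mathbf{v}_j) \, \mathbf{w}_k \\
&= \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{jk} \delta_{j_1 j} \mathbf{w}_k = \sum_{k=1}^{n} \lambda_{j_1 k} \mathbf{w}_k.
\end{aligned}
\]
Therefore $\sum_k \lambda_{j_1 k} \mathbf{w}_k = 0$. Since the set $\{\mathbf{w}_k\}$ is linearly independent, we must have $\lambda_{j_1 k} = 0$ for all $k = 1, ..., n$. ■

Now we are ready to prove the main statement of this section.

Theorem: If $V$ and $W$ are finite-dimensional vector spaces then
\[
\dim (V \otimes W) = \dim V \cdot \dim W.
\]
Proof: By definition of dimension, there exist linearly independent sets of $m \equiv \dim V$ vectors in $V$ and of $n \equiv \dim W$ vectors in $W$, and by the basis theorem these sets are bases in $V$ and $W$ respectively. By Lemma 1 the set of $mn$ elements $\{\mathbf{v}_j \otimes \mathbf{w}_k\}$ spans the space $V \otimes W$, and by Lemma 3 this set is linearly independent. Therefore this set is a basis. Hence, there are no linearly independent sets of $mn + 1$ elements in $V \otimes W$, so $\dim (V \otimes W) = mn$. ■

1.7.4 Higher-rank tensor products

The tensor product of several spaces is defined similarly, e.g. $U \otimes V \otimes W$ is the space of expressions of the form
\[
\mathbf{u}_1 \otimes \mathbf{v}_1 \otimes \mathbf{w}_1 + ... + \mathbf{u}_n \otimes \mathbf{v}_n \otimes \mathbf{w}_n, \quad \mathbf{u}_i \in U, \; \mathbf{v}_i \in V, \; \mathbf{w}_i \in W.
\]
Alternatively (and equivalently) one can define the space $U \otimes V \otimes W$ as the tensor product of the spaces $U \otimes V$ and $W$.

Exercise: Prove that $(U \otimes V) \otimes W \cong U \otimes (V \otimes W)$.

Definition: If we only work with one space $V$ and if all other spaces are constructed out of $V$ and $V^*$ using the tensor product, then we only need spaces of the form
\[
\underbrace{V \otimes ... \otimes V}_{m} \otimes \underbrace{V^* \otimes ... \otimes V^*}_{n}.
\]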


Elements of such spaces are called tensors of rank $(m, n)$. For example, vectors $\mathbf{v} \in V$ have rank $(1, 0)$, covectors $\mathbf{f}^* \in V^*$ have rank $(0, 1)$, tensors from $V \otimes V^*$ have rank $(1, 1)$, tensors from $V \otimes V$ have rank $(2, 0)$, and so on. Scalars from $\mathbb{K}$ have rank $(0, 0)$.

In many applications, the spaces $V$ and $V^*$ are identified (e.g. using a scalar product; see below). In that case, the rank is reduced to a single number, namely the sum of $m$ and $n$. Thus, in this simplified counting, tensors from $V \otimes V^*$ as well as tensors from $V \otimes V$ have rank 2.

1.7.5 * Distributivity of tensor product

We have two operations that build new vector spaces out of old ones: the direct sum $V \oplus W$ and the tensor product $V \otimes W$. Is there something like the formula $(U \oplus V) \otimes W \cong (U \otimes W) \oplus (V \otimes W)$? The answer is positive. I will not need this construction below; this is just another example of how different spaces are related by a canonical isomorphism.

Statement: The spaces $(U \oplus V) \otimes W$ and $(U \otimes W) \oplus (V \otimes W)$ are canonically isomorphic.
Proof: An element $(\mathbf{u}, \mathbf{v}) \otimes \mathbf{w} \in (U \oplus V) \otimes W$ is mapped into the pair $(\mathbf{u} \otimes \mathbf{w}, \mathbf{v} \otimes \mathbf{w}) \in (U \otimes W) \oplus (V \otimes W)$. It is easy to see that this map is a canonical isomorphism. I leave the details to you. ■

Exercise: Let $U$, $V$, and $W$ be some vector spaces. Demonstrate the following canonical isomorphisms:
\[
(U \oplus V)^* \cong U^* \oplus V^*,
\]
\[
(U \otimes V)^* \cong U^* \otimes V^*.
\]

1.8 Linear maps and tensors

The tensor product construction may appear an abstract plaything at this point, but in fact it is a universal tool to describe linear maps.

We have seen that the set of all linear operators $\hat{A} : V \to V$ is a vector space because one can naturally define the sum of two operators and the product of a number and an operator. This vector space is called the space of endomorphisms of $V$ and denoted by $\mathrm{End}\, V$.

In this section I will show that linear operators can be thought of as elements of the space $V \otimes V^*$. This gives a convenient way to represent a linear operator by a coordinate-free formula. Later we will see that the space $\mathrm{Hom}(V, W)$ of linear maps $V \to W$ is canonically isomorphic to $W \otimes V^*$.

1.8.1 Tensors as linear operators

First, we will show that any tensor from the space $V \otimes V^*$ acts as a linear map $V \to V$.

Lemma: A tensor $A \in V \otimes V^*$ expressed as
\[
A \equiv \sum_{j=1}^{k} \mathbf{v}_j \otimes \mathbf{f}_j^*
\]
defines a linear operator $\hat{A} : V \to V$ according to the formula
\[
\hat{A} \mathbf{x} \equiv \sum_{j=1}^{k} \mathbf{f}_j^*(\mathbf{x}) \, \mathbf{v}_j. \tag{1.24}
\]

Proof: Compare this linear map with the linear map defined in Eq. (1.21), Lemma 2 of Sec. 1.7.3. We need to prove two statements:
(1) The transformation is linear, $\hat{A}(\mathbf{x} + \lambda \mathbf{y}) = \hat{A}\mathbf{x} + \lambda \hat{A}\mathbf{y}$.
(2) The operator $\hat{A}$ does not depend on the decomposition of the tensor $A$ using particular vectors $\mathbf{v}_j$ and covectors $\mathbf{f}_j^*$: two decompositions of the tensor $A$,
\[
A = \sum_{j=1}^{k} \mathbf{v}_j \otimes \mathbf{f}_j^* = \sum_{j=1}^{l} \mathbf{w}_j \otimes \mathbf{g}_j^*,
\]
yield the same operator,
\[
\hat{A} \mathbf{x} = \sum_{j=1}^{k} \mathbf{f}_j^*(\mathbf{x}) \, \mathbf{v}_j = \sum_{j=1}^{l} \mathbf{g}_j^*(\mathbf{x}) \, \mathbf{w}_j, \quad \forall \mathbf{x}.
\]
The first statement, $\hat{A}(\mathbf{x} + \lambda \mathbf{y}) = \hat{A}\mathbf{x} + \lambda \hat{A}\mathbf{y}$, follows from the linearity of $\mathbf{f}_j^*$ as a map $V \to \mathbb{K}$ and is easy to verify by explicit calculation:
\[
\begin{aligned}
\hat{A}(\mathbf{x} + \lambda \mathbf{y}) &= \sum_{j=1}^{k} \mathbf{f}_j^*(\mathbf{x} + \lambda \mathbf{y}) \, \mathbf{v}_j \\
&= \sum_{j=1}^{k} \mathbf{f}_j^*(\mathbf{x}) \, \mathbf{v}_j + \lambda \sum_{j=1}^{k} \mathbf{f}_j^*(\mathbf{y}) \, \mathbf{v}_j \\
&= \hat{A}\mathbf{x} + \lambda \hat{A}\mathbf{y}.
\end{aligned}
\]
The second statement is proved using the axioms (1.17)–(1.19) of the tensor product. Two different expressions for the tensor $A$ can be equal only if they are related through the axioms (1.17)–(1.19). So it suffices to check that the operator $\hat{A}$ remains unchanged when we use each of the three axioms to replace $\sum_{j=1}^{k} \mathbf{v}_j \otimes \mathbf{f}_j^*$ by an equivalent tensor expression. Let us check the distributivity axiom (1.18): We need to compare the action of $\sum_j (\mathbf{u}_j + \mathbf{v}_j) \otimes \mathbf{f}_j^*$ on a vector $\mathbf{x} \in V$ and the action of the sum of $\sum_j \mathbf{u}_j \otimes \mathbf{f}_j^*$ and $\sum_j \mathbf{v}_j \otimes \mathbf{f}_j^*$ on the same vector:
\[
\begin{aligned}
\hat{A} \mathbf{x} &= \Big( \sum_j (\mathbf{u}_j + \mathbf{v}_j) \otimes \mathbf{f}_j^* \Big) \mathbf{x} \\
&= \sum_j \mathbf{f}_j^*(\mathbf{x}) \, (\mathbf{u}_j + \mathbf{v}_j) \\
&= \Big( \sum_j \mathbf{u}_j \otimes \mathbf{f}_j^* \Big) \mathbf{x} + \Big( \sum_j \mathbf{v}_j \otimes \mathbf{f}_j^* \Big) \mathbf{x}.
\end{aligned}
\]
The action of $\hat{A}$ on $\mathbf{x}$ remains unchanged for every $\mathbf{x}$, which means that the operator $\hat{A}$ itself is unchanged. Similarly, we (more precisely, you) can check directly that the other two axioms also leave $\hat{A}$ unchanged. It follows that the action of $\hat{A}$ on a vector $\mathbf{x}$, as defined by Eq. (1.24), is independent of the choice of representation of the tensor $A$ through vectors $\mathbf{v}_j$ and covectors $\mathbf{f}_j^*$. ■

Question: I am wondering what kind of operators correspond to tensor expressions. For example, take the single-term tensor $A = \mathbf{v} \otimes \mathbf{w}^*$. What is the geometric meaning of the corresponding operator $\hat{A}$?
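A small numerical experiment can make this question concrete. The sketch below is only an illustration under stated assumptions: it takes $V = \mathbb{R}^3$, stores vectors as lists of numbers, and represents a covector by its list of components so that its value on a vector is a dot product. The helper names `apply_covector` and `apply_tensor` are hypothetical and do not come from the text. A tensor is stored as a list of pairs $(\mathbf{v}_j, \mathbf{f}_j^*)$ and is applied to a vector through Eq. (1.24).

```python
def apply_covector(f, x):
    """Value f(x) of a covector with components f on a vector x (assumed coordinate model)."""
    return sum(fi * xi for fi, xi in zip(f, x))


def apply_tensor(terms, x):
    """Apply the tensor sum_j v_j (x) f_j to x using Eq. (1.24): sum_j f_j(x) v_j."""
    result = [0] * len(x)
    for v, f in terms:
        c = apply_covector(f, x)
        result = [r + c * vi for r, vi in zip(result, v)]
    return result


# Single-term tensor v (x) w*: every output is a multiple of v.
v = [1, 2, 0]
w_star = [0, 1, 3]
single = [(v, w_star)]
for x in ([1, 0, 0], [0, 1, 0], [2, -1, 5]):
    print(x, "->", apply_tensor(single, x))

# Two decompositions related by the axioms (1.17)-(1.19) give the same operator:
# v1 (x) f1 + v2 (x) f2  =  (v1 - v2) (x) f1 + v2 (x) (f1 + f2).
v1, v2 = [1, 0, 2], [0, 1, 1]
f1, f2 = [1, 1, 0], [0, 2, -1]
terms_a = [(v1, f1), (v2, f2)]
terms_b = [
    ([a - b for a, b in zip(v1, v2)], f1),
    (v2, [a + b for a, b in zip(f1, f2)]),
]
x = [2, -1, 5]
print(apply_tensor(terms_a, x) == apply_tensor(terms_b, x))
```

In the first loop each printed vector is $\mathbf{w}^*(\mathbf{x})$ times $\mathbf{v}$, and the final comparison illustrates, for one particular pair of decompositions, the representation independence proved in the Lemma.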


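Answer: Let us calculate: $\hat{A}\mathbf{x} = \mathbf{w}^*(\mathbf{x}) \, \mathbf{v}$, i.e. the operator $\hat{A}$ acts on any vector $\mathbf{x} \in V$ and produces a vector that is always proportional to the fixed vector $\mathbf{v}$. Hence, the image of the operator $\hat{A}$ is the one-dimensional subspace spanned by $\mathbf{v}$. However, $\hat{A}$ is not necessarily a projector because in general $\hat{A}\hat{A} \neq \hat{A}$:
\[
\hat{A}(\hat{A}\mathbf{x}) = \mathbf{w}^*(\mathbf{v}) \, \mathbf{w}^*(\mathbf{x}) \, \mathbf{v} \neq \mathbf{w}^*(\mathbf{x}) \, \mathbf{v}, \quad \text{unless } \mathbf{w}^*(\mathbf{v}) = 1.
\]

Exercise 1: An operator $\hat{A}$ is given by the formula
\[
\hat{A} = \hat{1}_V + \lambda \mathbf{v} \otimes \mathbf{w}^*,
\]
where $\lambda \in \mathbb{K}$, $\mathbf{v} \in V$, $\mathbf{w}^* \in V^*$. Compute $\hat{A}\mathbf{x}$ for any $\mathbf{x} \in V$.
Answer: $\hat{A}\mathbf{x} = \mathbf{x} + \lambda \mathbf{w}^*(\mathbf{x}) \, \mathbf{v}$.

Exercise 2: Let $\mathbf{n} \in V$ and $\mathbf{f}^* \in V^*$ such that $\mathbf{f}^*(\mathbf{n}) = 1$. Show that the operator $\hat{P} \equiv \hat{1}_V - \mathbf{n} \otimes \mathbf{f}^*$ is a projector onto the subspace annihilated by $\mathbf{f}^*$.
Hint: You need to show that $\hat{P}\hat{P} = \hat{P}$; that any vector $\mathbf{x}$ annihilated by $\mathbf{f}^*$ is invariant under $\hat{P}$ (i.e. if $\mathbf{f}^*(\mathbf{x}) = 0$ then $\hat{P}\mathbf{x} = \mathbf{x}$); and that for any vector $\mathbf{x}$, $\mathbf{f}^*(\hat{P}\mathbf{x}) = 0$.

1.8.2 Linear operators as tensors

We have seen that any tensor $A \in V \otimes V^*$ has a corresponding linear map in $\mathrm{End}\, V$. Now conversely, let $\hat{A} \in \mathrm{End}\, V$ be a linear operator and let $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ be a basis in $V$. We will now find such covectors $\mathbf{f}_k^* \in V^*$ that the tensor $\sum_k \mathbf{v}_k \otimes \mathbf{f}_k^*$ corresponds to $\hat{A}$. The required covectors $\mathbf{f}_k^* \in V^*$ can be defined by the formula
\[
\mathbf{f}_k^*(\mathbf{x}) \equiv \mathbf{v}_k^*(\hat{A}\mathbf{x}), \quad \forall \mathbf{x} \in V,
\]
where $\{\mathbf{v}_k^*\}$ is the dual basis. With this definition, we have
\[
\Big( \sum_{k=1}^{n} \mathbf{v}_k \otimes \mathbf{f}_k^* \Big) \mathbf{x} = \sum_{k=1}^{n} \mathbf{f}_k^*(\mathbf{x}) \, \mathbf{v}_k = \sum_{k=1}^{n} \mathbf{v}_k^*(\hat{A}\mathbf{x}) \, \mathbf{v}_k = \hat{A}\mathbf{x}.
\]
The last equality is based on the formula
\[
\sum_{k=1}^{n} \mathbf{v}_k^*(\mathbf{y}) \, \mathbf{v}_k = \mathbf{y},
\]
which holds because the components of a vector $\mathbf{y}$ in the basis $\{\mathbf{v}_k\}$ are $\mathbf{v}_k^*(\mathbf{y})$. Then it follows from the definition (1.24) that $\big( \sum_k \mathbf{v}_k \otimes \mathbf{f}_k^* \big) \mathbf{x} = \hat{A}\mathbf{x}$.

Let us look at this construction in another way: we have defined a map $\hat{\ } : V \otimes V^* \to \mathrm{End}\, V$ whereby any tensor $A \in V \otimes V^*$ is transformed into a linear operator $\hat{A} \in \mathrm{End}\, V$.

Theorem: (1) There is a canonical isomorphism $A \to \hat{A}$ between the spaces $V \otimes V^*$ and $\mathrm{End}\, V$. In other words, linear operators are canonically (without choosing a basis) and uniquely mapped into tensors of the form
\[
\mathbf{v}_1 \otimes \mathbf{f}_1^* + ... + \mathbf{v}_n \otimes \mathbf{f}_n^*.
\]
Conversely, a tensor $\sum_{k=1}^{n} \mathbf{v}_k \otimes \mathbf{f}_k^*$ is mapped into the operator $\hat{A}$ defined by Eq. (1.24).
(2) It is possible to write a tensor $A$ as a sum of not more than $N \equiv \dim V$ terms,
\[
A = \sum_{k=1}^{n} \mathbf{v}_k \otimes \mathbf{f}_k^*, \quad n \leq N.
\]

Proof: (1) To prove that a map is an isomorphism of vector spaces, we need to show that this map is linear and bijective (one-to-one and onto). Linearity easily follows from the definition of the map $\hat{\ }$: if $A, B \in V \otimes V^*$ are two tensors then $A + \lambda B \in V \otimes V^*$ is mapped into $\hat{A} + \lambda \hat{B}$. To prove the bijectivity, we need to show that for any operator $\hat{A}$ there exists a corresponding tensor $A = \sum_k \mathbf{v}_k \otimes \mathbf{f}_k^*$ (this we have already shown above), and that two different tensors $A \neq B$ cannot be mapped into the same operator $\hat{A} = \hat{B}$. If two different tensors $A \neq B$ were mapped into the same operator $\hat{A} = \hat{B}$, it would follow from the linearity of $\hat{\ }$ that $\widehat{A - B} = \hat{A} - \hat{B} = 0$, in other words, that a nonzero tensor $C \equiv A - B \neq 0$ is mapped into the zero operator, $\hat{C} = 0$. We will now arrive at a contradiction. The tensor $C$ has a decomposition $C = \sum_k \mathbf{v}_k \otimes \mathbf{c}_k^*$ in the basis $\{\mathbf{v}_k\}$. Since $C \neq 0$, it follows that at least one covector $\mathbf{c}_k^*$ is nonzero. Suppose $\mathbf{c}_1^* \neq 0$; then there exists at least one vector $\mathbf{x} \in V$ such that $\mathbf{c}_1^*(\mathbf{x}) \neq 0$. We now act on $\mathbf{x}$ with the operator $\hat{C}$: by assumption, $\hat{C} = \hat{A} - \hat{B} = 0$, but at the same time
\[
0 = \hat{C}\mathbf{x} \equiv \sum_k \mathbf{v}_k \, \mathbf{c}_k^*(\mathbf{x}) = \mathbf{v}_1 \mathbf{c}_1^*(\mathbf{x}) + ...
\]
This is a contradiction because a linear combination of vectors $\mathbf{v}_k$ with at least one nonzero coefficient cannot vanish (the vectors $\{\mathbf{v}_k\}$ are a basis).

Note that we did use a basis $\{\mathbf{v}_k\}$ in the construction of the map $\mathrm{End}\, V \to V \otimes V^*$, when we defined the covectors $\mathbf{f}_k^*$. However, this map is canonical because it is the same map for all choices of the basis. Indeed, if we choose another basis $\{\mathbf{v}'_k\}$ then of course the covectors $\mathbf{f}'^*_k$ will be different from $\mathbf{f}_k^*$, but the tensor $A$ will remain the same,
\[
A = \sum_{k=1}^{n} \mathbf{v}_k \otimes \mathbf{f}_k^* = A' = \sum_{k=1}^{n} \mathbf{v}'_k \otimes \mathbf{f}'^*_k \in V \otimes V^*,
\]
because (as we just proved) different tensors are always mapped into different operators.
(2) This follows from Lemma 1 of Sec. 1.7.3. ■

From now on, I will not use the map $\hat{\ }$ explicitly. Rather, I will simply not distinguish between the spaces $\mathrm{End}\, V$ and $V \otimes V^*$. I will write things like $\mathbf{v} \otimes \mathbf{w}^* \in \mathrm{End}\, V$ or $\hat{A} = \mathbf{x} \otimes \mathbf{y}^*$. The space implied in each case will be clear from the context.

1.8.3 Examples and exercises

Example 1: The identity operator. How to represent the identity operator $\hat{1}_V$ by a tensor $A \in V \otimes V^*$?

Choose a basis $\{\mathbf{v}_k\}$ in $V$; this choice defines the dual basis $\{\mathbf{v}_k^*\}$ in $V^*$ (see Sec. 1.6) such that $\mathbf{v}_j^*(\mathbf{v}_k) = \delta_{jk}$. Now apply the construction of Sec. 1.8.2 to find
\[
A = \sum_{k=1}^{n} \mathbf{v}_k \otimes \mathbf{f}_k^*, \quad \mathbf{f}_k^*(\mathbf{x}) = \mathbf{v}_k^*(\hat{1}_V \mathbf{x}) = \mathbf{v}_k^*(\mathbf{x}) \; \Rightarrow \; \mathbf{f}_k^* = \mathbf{v}_k^*.
\]
Therefore
\[
\hat{1}_V = \sum_{k=1}^{n} \mathbf{v}_k \otimes \mathbf{v}_k^*. \tag{1.25}
\]

Question: The identity operator $\hat{1}_V$ is defined canonically, i.e. independently of a basis in $V$; it is simply the transformation that does not change any vectors. However, the tensor representation (1.25) seems to depend on the choice of a basis $\{\mathbf{v}_k\}$. What is going on? Is the tensor $\hat{1} \in V \otimes V^*$ defined canonically?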
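This question can also be probed numerically before reading on. The following sketch is an illustration only, under these assumptions: $V = \mathbb{R}^2$, a covector is stored as its list of components in the standard basis, and a tensor $\sum_k \mathbf{v}_k \otimes \mathbf{f}_k^*$ is identified with its table of components $M_{ij} = \sum_k (\mathbf{v}_k)_i (\mathbf{f}_k^*)_j$ in the standard basis. The function names `dual_basis_2d`, `tensor_components`, and `identity_tensor` are hypothetical. Exact rational arithmetic from Python's `fractions` module is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def dual_basis_2d(b1, b2):
    """Dual basis {c1, c2} of a basis {b1, b2} of R^2, so that c_j(b_k) = delta_jk."""
    det = Fraction(b1[0] * b2[1] - b2[0] * b1[1])
    if det == 0:
        raise ValueError("the given vectors are linearly dependent")
    c1 = [b2[1] / det, -b2[0] / det]
    c2 = [-b1[1] / det, b1[0] / det]
    return c1, c2


def tensor_components(terms):
    """Components M[i][j] of sum_k v_k (x) f_k in the standard basis of R^2."""
    return [[sum(v[i] * f[j] for v, f in terms) for j in range(2)] for i in range(2)]


def identity_tensor(b1, b2):
    """Components of b1 (x) b1* + b2 (x) b2*, i.e. Eq. (1.25) for the basis {b1, b2}."""
    c1, c2 = dual_basis_2d(b1, b2)
    return tensor_components([(b1, c1), (b2, c2)])


unit = [[1, 0], [0, 1]]
for basis in (([1, 0], [0, 1]), ([2, 1], [1, 1]), ([3, -1], [5, 2])):
    print(basis, identity_tensor(*basis) == unit)
```

Each basis tried here yields the same table of components, which is consistent with the tensor (1.25) not depending on the basis. A finite number of trials is of course only a sanity check and not a proof.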


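Answer: Yes. The tensor $\sum_k \mathbf{v}_k \otimes \mathbf{v}_k^*$ is the same tensor regardless of which basis $\{\mathbf{v}_k\}$ we choose; of course the correct dual basis $\{\mathbf{v}_k^*\}$ must be used. In other words, for any two bases $\{\mathbf{v}_k\}$ and $\{\mathbf{v}'_k\}$, and with $\{\mathbf{v}_k^*\}$ and $\{\mathbf{v}'^*_k\}$ being the corresponding dual bases, we have the tensor equality
\[
\sum_k \mathbf{v}_k \otimes \mathbf{v}_k^* = \sum_k \mathbf{v}'_k \otimes \mathbf{v}'^*_k.
\]
We have proved this in Theorem 1.8.2 when we established that two different tensors are always mapped into different operators by the map $\hat{\ }$. One can say that $\sum_k \mathbf{v}_k \otimes \mathbf{v}_k^*$ is a canonically defined tensor in $V \otimes V^*$ since it is the unique tensor corresponding to the canonically defined identity operator $\hat{1}_V$. Recall that a given tensor can be written as a linear combination of tensor products in many different ways! Here is a worked-out example:

Let $\{\mathbf{v}_1, \mathbf{v}_2\}$ be a basis in a two-dimensional space; let $\{\mathbf{v}_1^*, \mathbf{v}_2^*\}$ be the corresponding dual basis. We can choose another basis, e.g.
\[
\{\mathbf{w}_1, \mathbf{w}_2\} \equiv \{\mathbf{v}_1 + \mathbf{v}_2, \, \mathbf{v}_1 - \mathbf{v}_2\}.
\]
Its dual basis is (verify this!)
\[
\mathbf{w}_1^* = \frac{1}{2} (\mathbf{v}_1^* + \mathbf{v}_2^*), \quad \mathbf{w}_2^* = \frac{1}{2} (\mathbf{v}_1^* - \mathbf{v}_2^*).
\]
Then we compute the identity tensor:
\[
\begin{aligned}
\hat{1} = \mathbf{w}_1 \otimes \mathbf{w}_1^* + \mathbf{w}_2 \otimes \mathbf{w}_2^*
&= (\mathbf{v}_1 + \mathbf{v}_2) \otimes \frac{1}{2} (\mathbf{v}_1^* + \mathbf{v}_2^*) + (\mathbf{v}_1 - \mathbf{v}_2) \otimes \frac{1}{2} (\mathbf{v}_1^* - \mathbf{v}_2^*) \\
&= \mathbf{v}_1 \otimes \mathbf{v}_1^* + \mathbf{v}_2 \otimes \mathbf{v}_2^*.
\end{aligned}
\]
The tensor expressions $\mathbf{w}_1 \otimes \mathbf{w}_1^* + \mathbf{w}_2 \otimes \mathbf{w}_2^*$ and $\mathbf{v}_1 \otimes \mathbf{v}_1^* + \mathbf{v}_2 \otimes \mathbf{v}_2^*$ are equal because of distributivity and linearity of tensor product, i.e. due to the axioms of the tensor product.

Exercise 1: Matrices as tensors. Now suppose we have a matrix $A_{jk}$ that specifies the linear operator $\hat{A}$ in a basis $\{\mathbf{e}_k\}$. Which tensor $A \in V \otimes V^*$ corresponds to this operator?
Answer: $A = \sum_{j,k=1}^{n} A_{jk} \, \mathbf{e}_j \otimes \mathbf{e}_k^*$.

Exercise 2: Product of linear operators. Suppose $\hat{A} = \sum_{k=1}^{n} \mathbf{v}_k \otimes \mathbf{f}_k^*$ and $\hat{B} = \sum_{l=1}^{n} \mathbf{w}_l \otimes \mathbf{g}_l^*$ are two operators. Obtain the tensor representation of the product $\hat{A}\hat{B}$.
Answer: $\hat{A}\hat{B} = \sum_{k=1}^{n} \sum_{l=1}^{n} \mathbf{f}_k^*(\mathbf{w}_l) \, \mathbf{v}_k \otimes \mathbf{g}_l^*$.

Exercise 3: Verify that $\hat{1}_V \hat{1}_V = \hat{1}_V$ by explicit computation using the tensor representation (1.25).
Hint: Use the formula $\mathbf{v}_j^*(\mathbf{v}_k) = \delta_{jk}$.

Exercise 4: Eigenvalues. Suppose $\hat{A} = \alpha \hat{1}_V + \mathbf{u} \otimes \mathbf{f}^*$ and $\hat{B} = \mathbf{u} \otimes \mathbf{f}^* + \mathbf{v} \otimes \mathbf{g}^*$, where $\mathbf{u}, \mathbf{v} \in V$ are a linearly independent set, $\alpha \in \mathbb{K}$, and $\mathbf{f}^*, \mathbf{g}^* \in V^*$ are nonzero but such that $\mathbf{f}^*(\mathbf{v}) = 0$ and $\mathbf{g}^*(\mathbf{u}) = 0$ while $\mathbf{f}^*(\mathbf{u}) \neq 0$ and $\mathbf{g}^*(\mathbf{v}) \neq 0$. Determine the eigenvalues and eigenvectors of the operators $\hat{A}$ and $\hat{B}$.

Solution: (I give a solution because it is an instructive calculation showing how to handle tensors in the index-free approach. Note that the vectors $\mathbf{u}, \mathbf{v}$ and the covectors $\mathbf{f}^*, \mathbf{g}^*$ are given, which means that numbers such as $\mathbf{f}^*(\mathbf{u})$ are known constants.)

For the operator $\hat{A}$, the eigenvalue equation $\hat{A}\mathbf{x} = \lambda \mathbf{x}$ yields
\[
\alpha \mathbf{x} + \mathbf{u} \, \mathbf{f}^*(\mathbf{x}) = \lambda \mathbf{x}.
\]
Either $\lambda = \alpha$ and then $\mathbf{f}^*(\mathbf{x}) = 0$, or $\lambda \neq \alpha$ and then $\mathbf{x}$ is proportional to $\mathbf{u}$; substituting $\mathbf{x} = \mathbf{u}$ into the above equation, we find $\lambda = \alpha + \mathbf{f}^*(\mathbf{u})$. Therefore the operator $\hat{A}$ has two eigenvalues, $\lambda = \alpha$ and $\lambda = \alpha + \mathbf{f}^*(\mathbf{u})$. The eigenspace with the eigenvalue $\lambda = \alpha$ is the set of all $\mathbf{x} \in V$ such that $\mathbf{f}^*(\mathbf{x}) = 0$. The eigenspace with the eigenvalue $\lambda = \alpha + \mathbf{f}^*(\mathbf{u})$ is the set of vectors proportional to $\mathbf{u}$. (It might happen that $\mathbf{f}^*(\mathbf{u}) = 0$; then there is only one eigenvalue, $\lambda = \alpha$, and no second eigenspace.)

For the operator $\hat{B}$, the calculations are longer. Since $\{\mathbf{u}, \mathbf{v}\}$ is a linearly independent set, we may add some vectors $\mathbf{e}_k$ to that set in order to complete it to a basis $\{\mathbf{u}, \mathbf{v}, \mathbf{e}_3, ..., \mathbf{e}_N\}$. It is convenient to adapt this basis to the given covectors $\mathbf{f}^*$ and $\mathbf{g}^*$; namely, it is possible to choose this basis such that $\mathbf{f}^*(\mathbf{e}_k) = 0$ and $\mathbf{g}^*(\mathbf{e}_k) = 0$ for $k = 3, ..., N$. (We may replace $\mathbf{e}_k \mapsto \mathbf{e}_k - a_k \mathbf{u} - b_k \mathbf{v}$ with some suitable constants $a_k, b_k$ to achieve this, using the given properties $\mathbf{f}^*(\mathbf{v}) = 0$, $\mathbf{g}^*(\mathbf{u}) = 0$, $\mathbf{f}^*(\mathbf{u}) \neq 0$, and $\mathbf{g}^*(\mathbf{v}) \neq 0$.) Suppose $\mathbf{x}$ is an unknown eigenvector with the eigenvalue $\lambda$; then $\mathbf{x}$ can be expressed as $\mathbf{x} = \alpha \mathbf{u} + \beta \mathbf{v} + \sum_{k=3}^{N} y_k \mathbf{e}_k$ in this basis, where $\alpha$, $\beta$, and $y_k$ are unknown constants (from here on, $\alpha$ denotes this coefficient and not the constant in the definition of $\hat{A}$). Our goal is therefore to determine $\alpha$, $\beta$, $y_k$, and $\lambda$. Denote $\mathbf{y} \equiv \sum_{k=3}^{N} y_k \mathbf{e}_k$ and transform the eigenvalue equation using the given conditions $\mathbf{f}^*(\mathbf{v}) = \mathbf{g}^*(\mathbf{u}) = 0$ as well as the properties $\mathbf{f}^*(\mathbf{y}) = \mathbf{g}^*(\mathbf{y}) = 0$,
\[
\begin{aligned}
\hat{B}\mathbf{x} - \lambda \mathbf{x}
&= \mathbf{u} \, (\alpha \mathbf{f}^*(\mathbf{u}) + \beta \mathbf{f}^*(\mathbf{v}) + \mathbf{f}^*(\mathbf{y}) - \alpha \lambda) \\
&\quad + \mathbf{v} \, (\alpha \mathbf{g}^*(\mathbf{u}) + \beta \mathbf{g}^*(\mathbf{v}) + \mathbf{g}^*(\mathbf{y}) - \beta \lambda) - \lambda \mathbf{y} \\
&= \mathbf{u} \, (\alpha \mathbf{f}^*(\mathbf{u}) - \alpha \lambda) + \mathbf{v} \, (\beta \mathbf{g}^*(\mathbf{v}) - \beta \lambda) - \lambda \mathbf{y} = 0.
\end{aligned}
\]
The above equation says that a certain linear combination of the vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{y}$ is zero. If $\mathbf{y} \neq 0$, the set $\{\mathbf{u}, \mathbf{v}, \mathbf{y}\}$ is linearly independent since $\{\mathbf{u}, \mathbf{v}, \mathbf{e}_3, ..., \mathbf{e}_N\}$ is a basis (see Exercise 1 in Sec. 1.1.4). Then the linear combination of the three vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{y}$ can be zero only if all three coefficients are zero. On the other hand, if $\mathbf{y} = 0$ then we are left only with two coefficients that must vanish. Thus, we can proceed by considering separately the two possible cases, $\mathbf{y} \neq 0$ and $\mathbf{y} = 0$.

We begin with the case $\mathbf{y} = 0$. In this case, $\hat{B}\mathbf{x} - \lambda \mathbf{x} = 0$ is equivalent to the vanishing of the linear combination
\[
\mathbf{u} \, (\alpha \mathbf{f}^*(\mathbf{u}) - \alpha \lambda) + \mathbf{v} \, (\beta \mathbf{g}^*(\mathbf{v}) - \beta \lambda) = 0.
\]
Since $\{\mathbf{u}, \mathbf{v}\}$ is linearly independent, this linear combination can vanish only when both coefficients vanish:
\[
\begin{aligned}
\alpha \, (\mathbf{f}^*(\mathbf{u}) - \lambda) &= 0, \\
\beta \, (\mathbf{g}^*(\mathbf{v}) - \lambda) &= 0.
\end{aligned}
\]
This is a system of two linear equations for the two unknowns $\alpha$ and $\beta$; when we solve it, we will determine the possible eigenvectors $\mathbf{x} = \alpha \mathbf{u} + \beta \mathbf{v}$ and the corresponding eigenvalues $\lambda$. Note that we are looking for nonzero solutions, so $\alpha$ and $\beta$ cannot be both zero. If $\alpha \neq 0$, we must have $\lambda = \mathbf{f}^*(\mathbf{u})$. If $\mathbf{f}^*(\mathbf{u}) \neq \mathbf{g}^*(\mathbf{v})$, the second equation forces $\beta = 0$. Otherwise, any $\beta$ is a solution. Likewise, if $\beta \neq 0$ then we must have $\lambda = \mathbf{g}^*(\mathbf{v})$. Therefore we obtain the following possibilities:
a) $\mathbf{f}^*(\mathbf{u}) \neq \mathbf{g}^*(\mathbf{v})$, two nonzero eigenvalues $\lambda_1 = \mathbf{f}^*(\mathbf{u})$ with eigenvector $\mathbf{x}_1 = \alpha \mathbf{u}$ (with any $\alpha \neq 0$) and $\lambda_2 = \mathbf{g}^*(\mathbf{v})$ with eigenvector $\mathbf{x}_2 = \beta \mathbf{v}$ (with any $\beta \neq 0$).
b) $\mathbf{f}^*(\mathbf{u}) = \mathbf{g}^*(\mathbf{v})$, one nonzero eigenvalue $\lambda = \mathbf{f}^*(\mathbf{u}) = \mathbf{g}^*(\mathbf{v})$, two-dimensional eigenspace with eigenvectors $\mathbf{x} = \alpha \mathbf{u} + \beta \mathbf{v}$ where at least one of $\alpha$, $\beta$ is nonzero.

Now we consider the case $\mathbf{y} \neq 0$ (recall that $\mathbf{y}$ is an unknown vector from the subspace $\mathrm{Span}\, \{\mathbf{e}_3, ..., \mathbf{e}_N\}$). In this case,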


we obtain a system of linear equations for the set of unknowns $(\alpha, \beta, \lambda, \mathbf{y})$:
\[
\begin{aligned}
\alpha \mathbf{f}^*(\mathbf{u}) - \alpha \lambda &= 0, \\
\beta \mathbf{g}^*(\mathbf{v}) - \beta \lambda &= 0, \\
-\lambda &= 0.
\end{aligned}
\]
This system is simplified, using $\lambda = 0$, to
\[
\begin{aligned}
\alpha \mathbf{f}^*(\mathbf{u}) &= 0, \\
\beta \mathbf{g}^*(\mathbf{v}) &= 0.
\end{aligned}
\]
Since $\mathbf{f}^*(\mathbf{u}) \neq 0$ and $\mathbf{g}^*(\mathbf{v}) \neq 0$, the only solution is $\alpha = \beta = 0$. Hence, the eigenvector is $\mathbf{x} = \mathbf{y}$ for any nonzero $\mathbf{y} \in \mathrm{Span}\, \{\mathbf{e}_3, ..., \mathbf{e}_N\}$. In other words, there is an $(N - 2)$-dimensional eigenspace corresponding to the eigenvalue $\lambda = 0$. ■

Remark: The preceding exercise serves to show that calculations in the coordinate-free approach are not always short! (I even specified some additional constraints on $\mathbf{u}, \mathbf{v}, \mathbf{f}^*, \mathbf{g}^*$ in order to make the solution shorter. Without these constraints, there are many more cases to be considered.) The coordinate-free approach does not necessarily provide a shorter way to find eigenvalues of matrices than the usual methods based on the evaluation of determinants. However, the coordinate-free method is efficient for the operator $\hat{A}$. The end result is that we are able to determine eigenvalues and eigenspaces of operators such as $\hat{A}$ and $\hat{B}$, regardless of the number of dimensions in the space, by using the special structure of these operators, which is specified in a purely geometric way.

Exercise 5: Find the inverse operator to $\hat{A} = \hat{1}_V + \mathbf{u} \otimes \mathbf{f}^*$, where $\mathbf{u} \in V$, $\mathbf{f}^* \in V^*$. Determine when $\hat{A}^{-1}$ exists.
Answer: The inverse operator exists only if $\mathbf{f}^*(\mathbf{u}) \neq -1$: then
\[
\hat{A}^{-1} = \hat{1}_V - \frac{1}{1 + \mathbf{f}^*(\mathbf{u})} \, \mathbf{u} \otimes \mathbf{f}^*.
\]
When $\mathbf{f}^*(\mathbf{u}) = -1$, the operator $\hat{A}$ has an eigenvector $\mathbf{u}$ with eigenvalue 0, so $\hat{A}^{-1}$ cannot exist.

1.8.4 Linear maps between different spaces

So far we have been dealing with linear operators that map a space $V$ into itself; what about linear maps $V \to W$ between different spaces? If we replace $V$ by $W$ in many of our definitions and proofs, we will obtain a parallel set of results for linear maps $V \to W$.

Theorem 1: Any tensor $A \equiv \sum_{j=1}^{k} \mathbf{w}_j \otimes \mathbf{f}_j^* \in W \otimes V^*$ acts as a linear map $V \to W$ according to the formula
\[
\hat{A}\mathbf{x} \equiv \sum_{j=1}^{k} \mathbf{f}_j^*(\mathbf{x}) \, \mathbf{w}_j.
\]
The space $\mathrm{Hom}(V, W)$ of all linear operators $V \to W$ is canonically isomorphic to the space $W \otimes V^*$.
Proof: Left as an exercise since it is fully analogous to previous proofs.

Example 1: Covectors as tensors. We know that the number field $\mathbb{K}$ is a vector space over itself and $V \cong V \otimes \mathbb{K}$. Therefore linear maps $V \to \mathbb{K}$ are tensors from $V^* \otimes \mathbb{K} \cong V^*$, i.e. covectors, in agreement with the definition of $V^*$.

Example 2: If $V$ and $W$ are vector spaces, what are tensors from $V^* \otimes W^*$?
They can be viewed as (1) linear maps from $V$ into $W^*$, (2) linear maps from $W$ into $V^*$, (3) linear maps from $V \otimes W$ into $\mathbb{K}$. These possibilities can be written as canonical isomorphisms:
\[
V^* \otimes W^* \cong \mathrm{Hom}(V, W^*) \cong \mathrm{Hom}(W, V^*) \cong \mathrm{Hom}(V \otimes W, \mathbb{K}).
\]

Exercise 1: How can we interpret the space $V \otimes V \otimes V^*$? Same question for the space $V \otimes V^* \otimes V^* \otimes V$.
Answer: In many different ways:
\[
V \otimes V \otimes V^* \cong \mathrm{Hom}(V, V \otimes V) \cong \mathrm{Hom}(\mathrm{End}\, V, V) \cong \mathrm{Hom}(V^*, \mathrm{End}\, V) \cong ...
\]
and
\[
V \otimes V^* \otimes V^* \otimes V \cong \mathrm{Hom}(V, V^* \otimes V \otimes V) \cong \mathrm{Hom}(V \otimes V, V \otimes V) \cong \mathrm{Hom}(\mathrm{End}\, V, \mathrm{End}\, V) \cong ...
\]
For example, $V \otimes V \otimes V^*$ can be visualized as the space of linear maps from $V^*$ to linear operators in $V$. The action of a tensor $\mathbf{u} \otimes \mathbf{v} \otimes \mathbf{w}^* \in V \otimes V \otimes V^*$ on a covector $\mathbf{f}^* \in V^*$ may be defined either as $\mathbf{f}^*(\mathbf{u}) \, \mathbf{v} \otimes \mathbf{w}^* \in V \otimes V^*$ or alternatively as $\mathbf{f}^*(\mathbf{v}) \, \mathbf{u} \otimes \mathbf{w}^* \in V \otimes V^*$. Note that these two definitions are not equivalent, i.e. the same tensors are mapped to different operators. In each case, one of the copies of $V$ (from $V \otimes V \otimes V^*$) is "paired up" with $V^*$.

Question: We have seen in Lemma 2 of Sec. 1.7.3 that covectors $\mathbf{f}^* \in V^*$ act as linear maps $V \otimes W \to W$. However, I am now sufficiently illuminated to know that linear maps $V \otimes W \to W$ are elements of the space $W \otimes W^* \otimes V^*$ and not elements of $V^*$. How can this be reconciled?
Answer: There is an injection map $V^* \to W \otimes W^* \otimes V^*$ defined by the formula $\mathbf{f}^* \to \hat{1}_W \otimes \mathbf{f}^*$, where $\hat{1}_W \in W \otimes W^*$ is the identity operator. Since $\hat{1}_W$ is a canonically defined element of $W \otimes W^*$, the map is canonical (defined without choice of basis, i.e. geometrically). Thus covectors $\mathbf{f}^* \in V^*$ can be naturally considered as elements of the space $\mathrm{Hom}(V \otimes W, W)$.

Question: The space $V \otimes V^*$ can be interpreted as $\mathrm{End}\, V$, as $\mathrm{End}\, V^*$, or as $\mathrm{Hom}(V \otimes V^*, \mathbb{K})$. This means that one tensor $A \in V \otimes V^*$ represents an operator in $V$, an operator in $V^*$, or a map from operators into numbers. What is the relation between all these different interpretations of the tensor $A$? For example, what is the interpretation of the identity operator $\hat{1}_V \in V \otimes V^*$ as an element of $\mathrm{Hom}(V \otimes V^*, \mathbb{K})$?
Answer: The identity tensor $\hat{1}_V$ represents the identity operator in $V$ and in $V^*$. It also represents the following map $V \otimes V^* \to \mathbb{K}$,
\[
\hat{1}_V : \mathbf{v} \otimes \mathbf{f}^* \mapsto \mathbf{f}^*(\mathbf{v}).
\]
This map applied to an operator $\hat{A} \in V \otimes V^*$ yields the trace of that operator (see Sec. 3.8).

The definition below explains the relation between operators in $V$ and operators in $V^*$ represented by the same tensor.

Definition: If $\hat{A} : V \to W$ is a linear map then the transposed operator $\hat{A}^T : W^* \to V^*$ is the map defined by
\[
(\hat{A}^T \mathbf{f}^*)(\mathbf{v}) \equiv \mathbf{f}^*(\hat{A}\mathbf{v}), \quad \forall \mathbf{v} \in V, \; \forall \mathbf{f}^* \in W^*. \tag{1.26}
\]
In particular, this defines the transposed operator $\hat{A}^T : V^* \to V^*$ given an operator $\hat{A} : V \to V$.
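To see the definition (1.26) at work in coordinates, here is a minimal sketch under stated assumptions: $V = \mathbb{R}^3$ and $W = \mathbb{R}^2$, the map $\hat{A}$ is given by a $2 \times 3$ table of numbers `A` (a list of rows), and a covector is stored as its list of components, so that its value on a vector is a dot product. The function names are hypothetical. In this coordinate model the components of $\hat{A}^T \mathbf{f}^*$ are $\sum_i f_i A_{ij}$, and the code checks that both sides of Eq. (1.26) agree on a sample vector.

```python
def apply_operator(A, v):
    """Components of A v, where A is a list of rows (assumed coordinate model)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def apply_covector(f, v):
    """Value f(v) of a covector with components f on a vector v."""
    return sum(fi * vi for fi, vi in zip(f, v))


def transpose_on_covector(A, f):
    """Components of the covector A^T f on V, given a covector f on W."""
    rows, cols = len(A), len(A[0])
    return [sum(f[i] * A[i][j] for i in range(rows)) for j in range(cols)]


A = [[1, 2, 0],
     [0, -1, 4]]          # a linear map from R^3 to R^2
f = [3, 5]                # a covector on W = R^2
v = [2, 1, -1]            # a vector in V = R^3

lhs = apply_covector(transpose_on_covector(A, f), v)   # (A^T f)(v)
rhs = apply_covector(f, apply_operator(A, v))          # f(A v)
print(transpose_on_covector(A, f), lhs, rhs)
```

The two numbers `lhs` and `rhs` coincide for every choice of `f` and `v`, because both are the same double sum $\sum_{i,j} f_i A_{ij} v_j$ evaluated in a different order.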


Remark: The above definition is an example of "mathematical style": I just wrote formula (1.26) and left it for you to digest. In case you have trouble with this formula, let me translate: The operator $\hat{A}^T$ is by definition such that it will transform an arbitrary covector $f^* \in W^*$ into a new covector $(\hat{A}^T f^*) \in V^*$, which is a linear function defined by its action on vectors $v \in V$. The formula says that the value of that linear function applied to an arbitrary vector $v$ should be equal to the number $f^*(\hat{A} v)$; thus we defined the action of the covector $\hat{A}^T f^*$ on any vector $v$. Note how in the formula $(\hat{A}^T f^*)(v)$ the parentheses are used to show that the first object is acting on the second.
Since we have defined the covector $\hat{A}^T f^*$ for any $f^* \in W^*$, it follows that we have thereby defined the operator $\hat{A}^T$ acting in the space $W^*$ and yielding a covector from $V^*$. Please read the formula again and check that you can understand it. The difficulty of understanding equations such as Eq. (1.26) is that one needs to keep in mind all the mathematical notations introduced previously and used here, and one also needs to guess the argument implied by the formula. In this case, the implied argument is that we will define a new operator $\hat{A}^T$ if we show, for any $f^* \in W^*$, how the new covector $(\hat{A}^T f^*) \in V^*$ works on any vector $v \in V$. Only after some practice with such arguments will it become easier to read mathematical definitions.
Note that the transpose map $\hat{A}^T$ is defined canonically (i.e. without choosing a basis) through the original map $\hat{A}$.
Question: How to use this definition when the operator $\hat{A}$ is given? Eq. (1.26) is not a formula that gives $\hat{A}^T f^*$ directly; rather, it is an identity connecting some values for arbitrary $v$ and $f^*$.
Answer: In order to use this definition, we need to apply $\hat{A}^T f^*$ to an arbitrary vector $v$ and transform the resulting expression. We could also compute the coefficients of the operator $\hat{A}^T$ in some basis.
Exercise 2: If $\hat{A} = \sum_k w_k \otimes f_k^* \in W \otimes V^*$ is a linear map $V \to W$, what is the tensor representation of its transpose $\hat{A}^T$? What is its matrix representation in a suitable basis?
Answer: The transpose operator $\hat{A}^T$ maps $W^* \to V^*$, so the corresponding tensor is $\hat{A}^T = \sum_k f_k^* \otimes w_k \in V^* \otimes W$. Its tensor representation consists of the same vectors $w_k \in W$ and covectors $f_k^* \in V^*$ as the tensor representation of $\hat{A}$. The matrix representation of $\hat{A}^T$ is the transposed matrix of $\hat{A}$ if we use the same basis $\{e_j\}$ and its dual basis $\{e_j^*\}$.
An important characteristic of linear operators is the rank. (Note that we have already used the word "rank" to denote the degree of a tensor product; the following definition presents a different meaning of the word "rank.")
Definition: The rank of a linear map $\hat{A} : V \to W$ is the dimension of the image subspace $\mathrm{im}\, \hat{A} \subset W$. (Recall that $\mathrm{im}\, \hat{A}$ is a linear subspace of $W$ that contains all vectors $w \in W$ expressed as $w = \hat{A} v$ with some $v \in V$.) The rank may be denoted by $\mathrm{rank}\, \hat{A} \equiv \dim(\mathrm{im}\, \hat{A})$.
Theorem 2: The rank of $\hat{A}$ is the smallest number of terms necessary to write an operator $\hat{A} : V \to W$ as a sum of single-term tensor products. In other words, the operator $\hat{A}$ can be expressed as
$$\hat{A} = \sum_{k=1}^{\mathrm{rank}\, \hat{A}} w_k \otimes f_k^* \in W \otimes V^* ,$$
with suitably chosen $w_k \in W$ and $f_k^* \in V^*$, but not as a sum of fewer terms.
Proof: We know that $\hat{A}$ can be written as a sum of tensor product terms,
$$\hat{A} = \sum_{k=1}^{n} w_k \otimes f_k^* , \qquad (1.27)$$
where $w_k \in W$, $f_k^* \in V^*$ are some vectors and covectors, and $n$ is some integer. There are many possible choices of these vectors and the covectors. Let us suppose that Eq. (1.27) represents a choice such that $n$ is the smallest possible number of terms. We will first show that $n$ is not smaller than the rank of $\hat{A}$; then we will show that $n$ is not larger than the rank of $\hat{A}$.
If $n$ is the smallest number of terms, the set $\{w_1, ..., w_n\}$ must be linearly independent, or else we can reduce the number of terms in the sum (1.27). To show this, suppose that $w_1$ is equal to a linear combination of other $w_k$,
$$w_1 = \sum_{k=2}^{n} \lambda_k w_k ,$$
then we can rewrite $\hat{A}$ as
$$\hat{A} = w_1 \otimes f_1^* + \sum_{k=2}^{n} w_k \otimes f_k^* = \sum_{k=2}^{n} w_k \otimes (f_k^* + \lambda_k f_1^*) ,$$
reducing the number of terms from $n$ to $n - 1$. Since by assumption the number of terms cannot be made less than $n$, the set $\{w_k\}$ must be linearly independent. In particular, the subspace spanned by $\{w_k\}$ is $n$-dimensional. (The same reasoning shows that the set $\{f_k^*\}$ must be also linearly independent, but we will not need to use this.)
The rank of $\hat{A}$ is the dimension of the image of $\hat{A}$; let us denote $m \equiv \mathrm{rank}\, \hat{A}$. It follows from the definition of the map $\hat{A}$ that for any $v \in V$, the image $\hat{A} v$ is a linear combination of the vectors $w_k$,
$$\hat{A} v = \sum_{k=1}^{n} f_k^*(v)\, w_k .$$
Therefore, the $m$-dimensional subspace $\mathrm{im}\, \hat{A}$ is contained within the $n$-dimensional subspace $\mathrm{Span}\{w_1, ..., w_n\}$, so $m \le n$.
Now, we may choose a basis $\{b_1, ..., b_m\}$ in the subspace $\mathrm{im}\, \hat{A}$; then for every $v \in V$ we have
$$\hat{A} v = \sum_{i=1}^{m} \beta_i b_i$$
with some coefficients $\beta_i$ that are uniquely determined for each vector $v$; in other words, $\beta_i$ are functions of $v$. It is easy to see that the coefficients $\beta_i$ are linear functions of the vector $v$ since
$$\hat{A}(v + \lambda u) = \sum_{i=1}^{m} (\beta_i + \lambda \alpha_i)\, b_i$$
if $\hat{A} u = \sum_{i=1}^{m} \alpha_i b_i$. Hence there exist some covectors $g_i^*$ such that $\beta_i = g_i^*(v)$. It follows that we are able to express $\hat{A}$ as the tensor $\sum_{i=1}^{m} b_i \otimes g_i^*$ using $m$ terms. Since the smallest possible number of terms is $n$, we must have $m \ge n$.
We have shown that $m \le n$ and $m \ge n$, therefore $n = m = \mathrm{rank}\, \hat{A}$.
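The reduction step used in the proof of Theorem 2 can be tried on a small example. The sketch below is illustrative only and assumes that tensors from $W \otimes V^*$ are stored as matrices of components. The functions `outer`, `add` and `rank` are hypothetical helpers; `rank` uses ordinary Gaussian elimination over exact fractions.

```python
from fractions import Fraction

def outer(w, f):
    """Matrix of the single-term tensor product w (x) f."""
    return [[wi * fj for fj in f] for wi in w]

def add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rank(M):
    """Rank of a matrix, computed by Gaussian elimination over fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Three terms, but w1 = 2*w2 + 3*w3 is linearly dependent on the others.
w2, w3 = [1, 0, 0], [0, 1, 0]
w1 = [2 * p + 3 * q for p, q in zip(w2, w3)]
f1, f2, f3 = [1, 0, 1], [0, 1, 0], [1, 1, 1]

A_three = add(add(outer(w1, f1), outer(w2, f2)), outer(w3, f3))

# Reduction from the proof: A = w2 (x) (f2 + 2 f1) + w3 (x) (f3 + 3 f1).
g2 = [b + 2 * a for a, b in zip(f1, f2)]
g3 = [b + 3 * a for a, b in zip(f1, f3)]
A_two = add(outer(w2, g2), outer(w3, g3))

print(A_three == A_two, rank(A_three))
```

The two-term and three-term expressions give the same matrix, and its rank equals the smaller number of terms, as the theorem states.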


Corollary: The rank of a map $\hat{A} : V \to W$ is equal to the rank of its transpose $\hat{A}^T : W^* \to V^*$.
Proof: The maps $\hat{A}$ and $\hat{A}^T$ are represented by the same tensor from the space $W \otimes V^*$. Since the rank is equal to the minimum number of terms necessary to express that tensor, the ranks of $\hat{A}$ and $\hat{A}^T$ always coincide.
We conclude that tensor product is a general construction that represents the space of linear maps between various previously defined spaces. For example, matrices are representations of linear maps from vectors to vectors; tensors from $V^* \otimes V \otimes V$ can be viewed as linear maps from matrices to vectors, etc.
Exercise 3: Prove that the tensor equality $a \otimes a + b \otimes b = v \otimes w$ where $a \ne 0$ and $b \ne 0$ can hold only when $a = \lambda b$ for some scalar $\lambda$.
Hint: If $a \ne \lambda b$ then there exists a covector $f^*$ such that $f^*(a) = 1$ and $f^*(b) = 0$. Define the map $f^* : V \otimes V \to V$ as $f^*(x \otimes y) = f^*(x)\, y$. Compute
$$f^*(a \otimes a + b \otimes b) = a = f^*(v)\, w ,$$
hence $w$ is proportional to $a$. Similarly you can show that $w$ is proportional to $b$.

1.9 Index notation for tensors

So far we have used a purely coordinate-free formalism to define and describe tensors from spaces such as $V \otimes V^*$. However, in many calculations a basis in $V$ is fixed, and one needs to compute the components of tensors in that basis. Also, the coordinate-free notation becomes cumbersome for computations in higher-rank tensor spaces such as $V \otimes V \otimes V^*$ because there is no direct means of referring to an individual component in the tensor product. The index notation makes such calculations easier.
Suppose a basis $\{e_1, ..., e_N\}$ in $V$ is fixed; then the dual basis $\{e_k^*\}$ is also fixed. Any vector $v \in V$ is decomposed as $v = \sum_k v_k e_k$ and any covector as $f^* = \sum_k f_k e_k^*$. Any tensor from $V \otimes V$ is decomposed as
$$A = \sum_{j,k} A_{jk}\, e_j \otimes e_k \in V \otimes V$$
and so on. The action of a covector on a vector is $f^*(v) = \sum_k f_k v_k$, and the action of an operator on a vector is $\sum_{j,k} A_{jk} v_k e_j$. However, it is cumbersome to keep writing these sums. In the index notation, one writes only the components $v_k$ or $A_{jk}$ of vectors and tensors.

1.9.1 Definition of index notation

The rules are as follows:
- Basis vectors $e_k$ and basis tensors $e_k \otimes e_l^*$ are never written explicitly. (It is assumed that the basis is fixed and known.)
- Instead of a vector $v \in V$, one writes its array of components $v^k$ with the superscript index. Covectors $f^* \in V^*$ are written $f_k$ with the subscript index. The index $k$ runs over integers from 1 to $N$. Components of vectors and tensors may be thought of as numbers (e.g. elements of the number field $\mathbb{K}$).
- Tensors are written as multidimensional arrays of components with superscript or subscript indices as necessary, for example $A_{jk} \in V^* \otimes V^*$ or $B^{lm}_{k} \in V \otimes V \otimes V^*$. Thus e.g. the Kronecker delta symbol is written as $\delta^j_k$ when it represents the identity operator $\hat{1}_V$.
- The choice of indices must be consistent; each index corresponds to a particular copy of $V$ or $V^*$. Thus it is wrong to write $v_j = u_k$ or $v_i + u^i = 0$. Correct equations are $v_j = u_j$ and $v^i + u^i = 0$. This disallows meaningless expressions such as $v^* + u$ (one cannot add vectors from different spaces).
- Sums over indices such as $\sum_{k=1}^{N} a_k b_k$ are not written explicitly, the $\sum$ symbol is omitted, and the Einstein summation convention is used instead: Summation over all values of an index is always implied when that index letter appears once as a subscript and once as a superscript. In this case the letter is called a dummy (or mute) index. Thus one writes $f_k v^k$ instead of $\sum_k f_k v_k$ and $A^j_k v^k$ instead of $\sum_k A_{jk} v_k$.
- Summation is allowed only over one subscript and one superscript but never over two subscripts or two superscripts and never over three or more coincident indices. This corresponds to requiring that we are only allowed to compute the canonical pairing of $V$ and $V^*$ [see Eq. (1.15)] but no other pairing. The expression $v^k v^k$ is not allowed because there is no canonical pairing of $V$ and $V$, so, for instance, the sum $\sum_{k=1}^{N} v^k v^k$ depends on the choice of the basis. For the same reason (dependence on the basis), expressions such as $u_i v^i w_i$ or $A_{ii} B^{ii}$ are not allowed. Correct expressions are $u_i v^i w_k$ and $A_{ik} B^{ik}$.
- One needs to pay close attention to the choice and the position of the letters such as $j, k, l, ...$ used as indices. Indices that are not repeated are free indices. The rank of a tensor expression is equal to the number of free subscript and superscript indices. Thus $A^j_k v^k$ is a rank 1 tensor (i.e. a vector) because the expression $A^j_k v^k$ has a single free index, $j$, and a summation over $k$ is implied.
- The tensor product symbol $\otimes$ is never written. For example, if $v \otimes f^* = \sum_{j,k} v_j f_k\, e_j \otimes e_k^*$, one writes $v^k f_j$ to represent the tensor $v \otimes f^*$. The index letters in the expression $v^k f_j$ are intentionally chosen to be different (in this case, $k$ and $j$) so that no summation would be implied. In other words, a tensor product is written simply as a product of components, and the index letters are chosen appropriately. Then one can interpret $v^k f_j$ as simply the product of numbers. In particular, it makes no difference whether one writes $f_j v^k$ or $v^k f_j$. The position of the indices (rather than the ordering of vectors) shows in every case how the tensor product is formed. Note that it is not possible to distinguish $V \otimes V^*$ from $V^* \otimes V$ in the index notation.
Example 1: It follows from the definition of $\delta^i_j$ that $\delta^i_j v^j = v^i$. This is the index representation of $\hat{1} v = v$.
Example 2: Suppose $w$, $x$, $y$, and $z$ are vectors from $V$ whose components are $w^i$, $x^i$, $y^i$, $z^i$. What are the components of the tensor $w \otimes x + 2 y \otimes z \in V \otimes V$?
Answer: $w^i x^k + 2 y^i z^k$. (We need to choose another letter for the second free index, $k$, which corresponds to the second copy of $V$ in $V \otimes V$.)
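The rules above translate directly into loops over component arrays: a dummy index becomes a summation variable, and a free index becomes an index of the result. The following sketch is only an illustration under the assumption that a basis is fixed and components are stored as nested lists. The array names are arbitrary, and indices run from 0 rather than 1, as is usual in Python.

```python
N = 3
v = [1, 2, 3]          # components v^k of a vector
f = [4, 0, -1]         # components f_k of a covector
A = [[1, 0, 2],
     [0, 1, 0],
     [3, 0, 1]]        # components A^j_k of an operator, stored as A[j][k]

# f_k v^k: k is a dummy index, no free index remains, so the result is a number.
f_of_v = sum(f[k] * v[k] for k in range(N))

# A^j_k v^k: k is a dummy index and j is free, so the result has rank 1.
Av = [sum(A[j][k] * v[k] for k in range(N)) for j in range(N)]

# v^k f_j: two different index letters, no summation, the result has rank 2.
vf = [[v[k] * f[j] for j in range(N)] for k in range(N)]

# delta^i_j v^j = v^i: the Kronecker delta represents the identity operator.
delta = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
delta_v = [sum(delta[i][j] * v[j] for j in range(N)) for i in range(N)]

print(f_of_v, Av, delta_v)
```

Setting the two free indices of `vf` equal and summing reproduces `f_of_v`, which is the canonical pairing of $V$ and $V^*$ mentioned in the rules.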


Example 3: The operator $\hat{A} \equiv \hat{1}_V + \lambda v \otimes u^* \in V \otimes V^*$ acts on a vector $x \in V$. Calculate the resulting vector $y \equiv \hat{A} x$.
In the index-free notation, the calculation is
$$y = \hat{A} x = \left( \hat{1}_V + \lambda v \otimes u^* \right) x = x + \lambda u^*(x)\, v .$$
In the index notation, the calculation looks like this:
$$y^k = \left( \delta^k_j + \lambda v^k u_j \right) x^j = x^k + \lambda v^k u_j x^j .$$
In this formula, $j$ is a dummy index and $k$ is a free index. We could have also written $\lambda x^j v^k u_j$ instead of $\lambda v^k u_j x^j$ since the ordering of components makes no difference in the index notation.
Exercise: In a physics book you find the following formula,
$$H^{\alpha}_{\mu\nu} = \frac{1}{2} \left( h_{\beta\mu\nu} + h_{\beta\nu\mu} - h_{\mu\nu\beta} \right) g^{\alpha\beta} .$$
To what spaces do the tensors $H$, $g$, $h$ belong (assuming these quantities represent tensors)? Rewrite this formula in the coordinate-free notation.
Answer: $H \in V \otimes V^* \otimes V^*$, $h \in V^* \otimes V^* \otimes V^*$, $g \in V \otimes V$. Assuming the simplest case,
$$h = h_1^* \otimes h_2^* \otimes h_3^* , \quad g = g_1 \otimes g_2 ,$$
the coordinate-free formula is
$$H = \frac{1}{2}\, g_1 \otimes \left( h_1^*(g_2)\, h_2^* \otimes h_3^* + h_1^*(g_2)\, h_3^* \otimes h_2^* - h_3^*(g_2)\, h_1^* \otimes h_2^* \right) .$$
Question: I would like to decompose a vector $v$ in the basis $\{e_j\}$ using the index notation, $v = v^j e_j$. Is it okay to write the lower index $j$ on the basis vectors $e_j$? I also want to write $v^j = e_j^*(v)$ using the dual basis $\{e_j^*\}$, but then the index $j$ is not correctly matched at both sides.
Answer: The index notation is designed so that you never use the basis vectors $e_j$ or $e_j^*$; you only use components such as $v^j$ or $f_j$. The only way to keep the upper and the lower indices consistent (i.e. having the summation always over one upper and one lower index) when you want to use both the components $v^j$ and the basis vectors $e_j$ is to use upper indices on the dual basis, i.e. writing $\{e^{*j}\}$. Then a covector will have components with lower indices, $f^* = f_j e^{*j}$, and the index notation remains consistent. A further problem occurs when you have a scalar product and you would like to express the component $v^j$ as $v^j = \langle v, e_j \rangle$. In this case, the only way to keep the notation consistent is to use explicitly a suitable matrix, say $g^{ij}$, in order to represent the scalar product. Then one would be able to write $v^j = g^{jk} \langle v, e_k \rangle$ and keep the index notation consistent.

1.9.2 Advantages and disadvantages of index notation

Index notation is conceptually easier than the index-free notation because one can imagine manipulating "merely" some tables of numbers, rather than "abstract vectors." In other words, we are working with less abstract objects. The price is that we obscure the geometric interpretation of what we are doing, and proofs of general theorems become more difficult to understand.
The main advantage of the index notation is that it makes computations with complicated tensors quicker. Consider, for example, the space $V \otimes V \otimes V^* \otimes V^*$ whose elements can be interpreted as operators from $\mathrm{Hom}(V \otimes V, V \otimes V)$. The action of such an operator on a tensor $a^{jk} \in V \otimes V$ is expressed in the index notation as
$$b^{lm} = A^{lm}_{jk}\, a^{jk} ,$$
where $a^{lm}$ and $b^{lm}$ represent tensors from $V \otimes V$ and $A^{lm}_{jk}$ is a tensor from $V \otimes V \otimes V^* \otimes V^*$, while the summation over the indices $j$ and $k$ is implied. Each index letter refers unambiguously to one tensor product factor. Note that the formula
$$b^{lm} = A^{lm}_{kj}\, a^{jk}$$
describes another (inequivalent) way to define the isomorphism between the spaces $V \otimes V \otimes V^* \otimes V^*$ and $\mathrm{Hom}(V \otimes V, V \otimes V)$. The index notation expresses this difference in a concise way; of course, one needs to pay close attention to the position and the order of indices.
Note that in the coordinate-free notation it is much more cumbersome to describe and manipulate such tensors. Without the index notation, it is cumbersome to perform calculations with a tensor such as
$$B^{ik}_{jl} \equiv \delta^i_j \delta^k_l - \delta^k_j \delta^i_l \in V \otimes V \otimes V^* \otimes V^*$$
which acts as an operator in $V \otimes V$, exchanging the two vector factors:
$$\left( \delta^i_j \delta^k_l - \delta^k_j \delta^i_l \right) a^{jl} = a^{ik} - a^{ki} .$$
The index-free definition of this operator is simple with single-term tensor products,
$$\hat{B}(u \otimes v) \equiv u \otimes v - v \otimes u .$$
Having defined $\hat{B}$ on single-term tensor products, we require linearity and so define the operator $\hat{B}$ on the entire space $V \otimes V$. However, practical calculations are cumbersome if we are applying $\hat{B}$ to a complicated tensor $X \in V \otimes V$ rather than to a single-term product $u \otimes v$, because, in particular, we are obliged to decompose $X$ into single-term tensor products in order to perform such a calculation.
Some disadvantages of the index notation are as follows: (1) If the basis is changed, all components need to be recomputed. In textbooks that use the index notation, quite some time is spent studying the transformation laws of tensor components under a change of basis. If different bases are used simultaneously, confusion may result as to which basis is implied in a particular formula. (2) If we are using unrelated vector spaces $V$ and $W$, we need to choose a basis in each of them and always remember which index belongs to which space. The index notation does not show this explicitly. To alleviate this problem, one may use e.g. Greek and Latin indices to distinguish different spaces, but this is not always convenient or sufficient. (3) The geometrical meaning of many calculations appears hidden behind a mass of indices. It is sometimes unclear whether a long expression with indices can be simplified and how to proceed with calculations. (Do we need to try all possible relabellings of indices and see what happens?)
Despite these disadvantages, the index notation enables one to perform practical calculations with high-rank tensor spaces, such as those required in field theory and in general relativity. For this reason, and also for historical reasons (Einstein used the index notation when developing the theory of relativity), most


physics textbooks use the index notation. In some cases, calculations can be performed equally quickly using index and index-free notations. In other cases, especially when deriving general properties of tensors, the index-free notation is superior.[4] I use the index-free notation in this book because calculations in coordinates are not essential for this book's central topics. However, I will occasionally show how to do some calculations also in the index notation.

[4] I have developed an advanced textbook on general relativity entirely in the index-free notation and displayed the infrequent cases where the index notation is easier to use.

1.10 Dirac notation for vectors and covectors

The Dirac notation was developed for quantum mechanics where one needs to perform many computations with operators, vectors and covectors (but not with higher-rank tensors!). The Dirac notation is index-free.

1.10.1 Definition of Dirac notation

The rules are as follows:
- One writes the symbol $|v\rangle$ for a vector $v \in V$ and $\langle f|$ for a covector $f^* \in V^*$. The labels inside the special brackets $|\ \rangle$ and $\langle\ |$ are chosen according to the problem at hand, e.g. one can denote specific vectors by $|0\rangle$, $|1\rangle$, $|x\rangle$, $|v_1\rangle$, or even $|{}^{(0)}a_{ij};\, l, m\rangle$ if that helps. (Note that $|0\rangle$ is normally not the zero vector; the latter is denoted simply by 0, as usual.)
- Linear combinations of vectors are written like this: $2|v\rangle - 3|u\rangle$ instead of $2v - 3u$.
- The action of a covector on a vector is written as $\langle f|v\rangle$; the result is a number. The mnemonic for this is "bra-ket", so $\langle f|$ is a "bra vector" and $|v\rangle$ is a "ket vector." The action of an operator $\hat{A}$ on a vector $|v\rangle$ is written $\hat{A}|v\rangle$.
- The action of the transposed operator $\hat{A}^T$ on a covector $\langle f|$ is written $\langle f|\hat{A}$. Note that the transposition label ($T$) is not used. This is consistent within the Dirac notation: The covector $\langle f|\hat{A}$ acts on a vector $|v\rangle$ as $\langle f|\hat{A}|v\rangle$, which is the same (by definition of $\hat{A}^T$) as the covector $\langle f|$ acting on $\hat{A}|v\rangle$.
- The tensor product symbol $\otimes$ is omitted. Instead of $v \otimes f^* \in V \otimes V^*$ or $a \otimes b \in V \otimes V$, one writes $|v\rangle\langle f|$ and $|a\rangle|b\rangle$ respectively. The tensor space to which a tensor belongs will be clear from the notation or from explanations in the text. Note that one cannot write $f^* \otimes v$ as $\langle f|\,|v\rangle$ since $\langle f|\,|v\rangle$ already means $f^*(v)$ in the Dirac notation. Instead, one always writes $|v\rangle\langle f|$ and does not distinguish between $f^* \otimes v$ and $v \otimes f^*$.
Example 1: The action of an operator $a \otimes b^* \in V \otimes V^*$ on a vector $v \in V$ has been defined by $(a \otimes b^*)\, v = b^*(v)\, a$. In the Dirac notation, this is very easy to express: one acts with $|a\rangle\langle b|$ on a vector $|v\rangle$ by writing
$$\left( |a\rangle\langle b| \right) |v\rangle = |a\rangle\langle b|\,|v\rangle = |a\rangle \langle b|v\rangle .$$
In other words, we mentally remove one vertical line and get the vector $|a\rangle$ times the number $\langle b|v\rangle$. This is entirely consistent with the definition of the operator $a \otimes b^* \in \mathrm{End}\, V$.
Example 2: The action of $\hat{A} \equiv \hat{1}_V + \frac{1}{2} v \otimes u^* \in V \otimes V^*$ on a vector $x \in V$ is written as follows:
$$|y\rangle = \hat{A}|x\rangle = \left( \hat{1} + \tfrac{1}{2} |v\rangle\langle u| \right) |x\rangle = |x\rangle + \tfrac{1}{2} |v\rangle\langle u|\,|x\rangle = |x\rangle + \frac{\langle u|x\rangle}{2}\, |v\rangle .$$
Note that we have again "simplified" $\langle u|\,|x\rangle$ to $\langle u|x\rangle$, and the result is correct. Compare this notation with the same calculation written in the index-free notation:
$$y = \hat{A} x = \left( \hat{1} + \tfrac{1}{2} v \otimes u^* \right) x = x + \frac{u^*(x)}{2}\, v .$$
Example 3: If $|e_1\rangle, ..., |e_N\rangle$ is a basis, we denote by $\langle e_k|$ the covectors from the dual basis, so that $\langle e_j|e_k\rangle = \delta_{jk}$. A vector $|v\rangle$ is expressed through the basis vectors as
$$|v\rangle = \sum_k v_k |e_k\rangle ,$$
where the coefficients $v_k$ can be computed as $v_k = \langle e_k|v\rangle$. An arbitrary operator $\hat{A}$ is decomposed as
$$\hat{A} = \sum_{j,k} A_{jk}\, |e_j\rangle\langle e_k| .$$
The matrix elements $A_{jk}$ of the operator $\hat{A}$ in this basis are found as
$$A_{jk} = \langle e_j|\hat{A}|e_k\rangle .$$
The identity operator is decomposed as follows,
$$\hat{1} = \sum_k |e_k\rangle\langle e_k| .$$
Expressions of this sort abound in quantum mechanics textbooks.

1.10.2 Advantages and disadvantages of Dirac notation

The Dirac notation is convenient when many calculations with vectors and covectors are required. But calculations become cumbersome if we need many tensor powers. For example, suppose we would like to apply a covector $\langle f|$ to the second vector in the tensor product $|a\rangle|b\rangle|c\rangle$, so that the answer is $|a\rangle\langle f|b\rangle|c\rangle$. Now one cannot simply write $\langle f|X$ with $X = |a\rangle|b\rangle|c\rangle$ because $\langle f|X$ is ambiguous in this case. The desired kind of action of covectors on tensors is difficult to express using the Dirac notation. Only the index notation allows one to write and to carry out arbitrary operations with this kind of tensor product. In the example just mentioned, one writes $f_j a^i b^j c^k$ to indicate that the covector $f_j$ acts on the vector $b^j$ but not on the other vectors. Of course, the resulting expression is harder to read because one needs to pay close attention to every index.
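The Dirac-notation rules of Sec. 1.10.1 can be mirrored in a few lines of code once a basis is fixed. This is a rough sketch under simplifying assumptions: components are real numbers, kets and bras are plain lists of components in a basis and its dual basis, and the helper names `braket`, `ketbra` and `apply` are hypothetical.

```python
def braket(f, v):
    """The number <f|v> for a bra f and a ket v given by components."""
    return sum(fk * vk for fk, vk in zip(f, v))

def ketbra(a, b):
    """Matrix of the operator |a><b|."""
    return [[ai * bj for bj in b] for ai in a]

def apply(A, v):
    """Components of A|v> for an operator given by its matrix."""
    return [braket(row, v) for row in A]

a, b, v = [1, 2], [3, -1], [2, 5]

# (|a><b|)|v> = |a> <b|v>: the vector |a> times the number <b|v>.
left = apply(ketbra(a, b), v)
right = [braket(b, v) * ai for ai in a]

# The identity operator decomposed as the sum of |e_k><e_k| over the basis.
basis = [[1, 0], [0, 1]]
identity = [[sum(ketbra(e, e)[j][k] for e in basis) for k in range(2)]
            for j in range(2)]

# Matrix elements A_jk = <e_j| A |e_k> of an arbitrary operator.
A = [[0, 7], [4, -2]]
elements = [[braket(ej, apply(A, ek)) for ek in basis] for ej in basis]

print(left, right, identity, elements)
```

The sketch only covers operators, vectors and covectors; it does not attempt the action of a covector on one factor of a higher tensor power, which is where the text notes that the Dirac notation becomes awkward.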

2 Exterior product
In this chapter I introduce one of the most useful constructions in basic linear algebra: the exterior product, denoted by $a \wedge b$, where $a$ and $b$ are vectors from a space $V$. The basic idea of the exterior product is that we would like to define an antisymmetric and bilinear product of vectors. In other words, we would like to have the properties $a \wedge b = -b \wedge a$ and $a \wedge (b + \lambda c) = a \wedge b + \lambda\, a \wedge c$.

2.1 Motivation

Here I discuss, at some length, the motivation for introducing the exterior product. The motivation is geometrical and comes from considering the properties of areas and volumes in the framework of elementary Euclidean geometry. I will proceed with a formal definition of the exterior product in Sec. 2.2. In order to understand the definition explained there, it is not necessary to use this geometric motivation because the definition will be purely algebraic. Nevertheless, I feel that this motivation will be helpful for some readers.

2.1.1 Two-dimensional oriented area

We work in a two-dimensional Euclidean space, such as that considered in elementary geometry. We assume that the usual geometrical definition of the area of a parallelogram is known.
Consider the area $Ar(a, b)$ of a parallelogram spanned by vectors $a$ and $b$. It is known from elementary geometry that $Ar(a, b) = |a| \cdot |b| \cdot \sin\alpha$ where $\alpha$ is the angle between the two vectors, which is always between 0 and $\pi$ (we do not take into account the orientation of this angle). Thus defined, the area $Ar$ is always non-negative.
Let us investigate $Ar(a, b)$ as a function of the vectors $a$ and $b$. If we stretch the vector $a$, say, by factor 2, the area is also increased by factor 2. However, if we multiply $a$ by the number $-2$, the area will be multiplied by 2 rather than by $-2$:
$$Ar(a, 2b) = Ar(a, -2b) = 2 Ar(a, b) .$$
Similarly, for some vectors $a$, $b$, $c$ such as shown in Fig. 2.2, we have $Ar(a, b + c) = Ar(a, b) + Ar(a, c)$. However, if we consider $b = -c$ then we obtain
$$Ar(a, b + c) = Ar(a, 0) = 0 \ne Ar(a, b) + Ar(a, -b) = 2 Ar(a, b) .$$
Hence, the area $Ar(a, b)$ is, strictly speaking, not a linear function of the vectors $a$ and $b$:
$$Ar(\lambda a, b) = |\lambda|\, Ar(a, b) \ne \lambda\, Ar(a, b) ,$$
$$Ar(a, b + c) \ne Ar(a, b) + Ar(a, c) .$$
Nevertheless, as we have seen, the properties of linearity hold in some cases. If we look closely at those cases, we find that linearity holds precisely when we do not change the orientation of the vectors. It would be more convenient if the linearity properties held in all cases.

[Figure 2.1: The area of the parallelogram $0ACB$ spanned by $a$ and $b$ is equal to the area of the parallelogram $0ADE$ spanned by $a$ and $b + \alpha a$ due to the equality of areas $ACD$ and $0BE$.]

The trick is to replace the area function $Ar$ with the oriented area function $A(a, b)$. Namely, we define the function $A(a, b)$ by
$$A(a, b) = \pm |a| \cdot |b| \cdot \sin\alpha ,$$
where the sign is chosen positive when the angle $\alpha$ is measured from the vector $a$ to the vector $b$ in the counterclockwise direction, and negative otherwise.
Statement: The oriented area $A(a, b)$ of a parallelogram spanned by the vectors $a$ and $b$ in the two-dimensional Euclidean space is an antisymmetric and bilinear function of the vectors $a$ and $b$:
$$A(a, b) = -A(b, a) ,$$
$$A(\lambda a, b) = \lambda\, A(a, b) ,$$
$$A(a, b + c) = A(a, b) + A(a, c) . \quad \text{(the sum law)}$$
Proof: The first property is a straightforward consequence of the sign rule in the definition of $A$.
Proving the second property requires considering the cases $\lambda > 0$ and $\lambda < 0$ separately. If $\lambda > 0$ then the orientation of the pair $(a, b)$ remains the same and then it is clear that the property holds: When we rescale $a$ by $\lambda$, the parallelogram is stretched and its area increases by factor $\lambda$. If $\lambda < 0$ then the orientation of the parallelogram is reversed and the oriented area changes sign.
To prove the sum law, we consider two cases: either $c$ is parallel to $a$ or it is not. If $c$ is parallel to $a$, say $c = \alpha a$, we use Fig. 2.1 to show that $A(a, b + \alpha a) = A(a, b)$, which yields the desired statement since $A(a, \alpha a) = 0$. If $c$ is not parallel to $a$, we use Fig. 2.2 to show that $A(a, b + c) = A(a, b) + A(a, c)$. Analogous geometric constructions can be made for different possible orientations of the vectors $a$, $b$, $c$.
It is relatively easy to compute the oriented area because of its algebraic properties. Suppose the vectors $a$ and $b$ are given


through their components in a standard basis $\{e_1, e_2\}$, for instance
$$a = \alpha_1 e_1 + \alpha_2 e_2 , \quad b = \beta_1 e_1 + \beta_2 e_2 .$$
We assume, of course, that the vectors $e_1$ and $e_2$ are orthogonal to each other and have unit length, as is appropriate in a Euclidean space. We also assume that the right angle is measured from $e_1$ to $e_2$ in the counter-clockwise direction, so that $A(e_1, e_2) = +1$. Then we use the Statement and the properties $A(e_1, e_1) = 0$, $A(e_1, e_2) = 1$, $A(e_2, e_2) = 0$ to compute
$$A(a, b) = A(\alpha_1 e_1 + \alpha_2 e_2,\ \beta_1 e_1 + \beta_2 e_2) = \alpha_1 \beta_2 A(e_1, e_2) + \alpha_2 \beta_1 A(e_2, e_1) = \alpha_1 \beta_2 - \alpha_2 \beta_1 .$$
The ordinary (unoriented) area is then obtained as the absolute value of the oriented area, $Ar(a, b) = |A(a, b)|$. It turns out that the oriented area, due to its strict linearity properties, is a much more convenient and powerful construction than the unoriented area.

[Figure 2.2: The area of the parallelogram spanned by $a$ and $b$ (equal to the area of $CEFD$) plus the area of the parallelogram spanned by $a$ and $c$ (the area of $ACDB$) equals the area of the parallelogram spanned by $a$ and $b + c$ (the area of $AEFB$) because of the equality of the areas of $ACE$ and $BDF$.]

2.1.2 Parallelograms in $\mathbb{R}^3$ and in $\mathbb{R}^n$

Let us now work in the Euclidean space $\mathbb{R}^3$ with a standard basis $\{e_1, e_2, e_3\}$. We can similarly try to characterize the area of a parallelogram spanned by two vectors $a$, $b$. It is, however, not possible to characterize the orientation of the area simply by a sign. We also cannot use a geometric construction such as that in Fig. 2.2; in fact it is not true in three dimensions that the area spanned by $a$ and $b + c$ is equal to the sum of $Ar(a, b)$ and $Ar(a, c)$. Can we still define some kind of "oriented area" that obeys the sum law?
Let us consider Fig. 2.2 as a figure showing the projection of the areas of the three parallelograms onto some coordinate plane, say, the plane of the basis vectors $\{e_1, e_2\}$. It is straightforward to see that the projections of the areas obey the sum law as oriented areas.
Statement: Let $a$, $b$ be two vectors in $\mathbb{R}^3$, and let $P(a, b)$ be the parallelogram spanned by these vectors. Denote by $P(a, b)_{e_1, e_2}$ the parallelogram within the coordinate plane $\mathrm{Span}\{e_1, e_2\}$ obtained by projecting $P(a, b)$ onto that coordinate plane, and similarly for the other two coordinate planes. Denote by $A(a, b)_{e_1, e_2}$ the oriented area of $P(a, b)_{e_1, e_2}$. Then $A(a, b)_{e_1, e_2}$ is a bilinear, antisymmetric function of $a$ and $b$.
Proof: The projection onto the coordinate plane of $e_1$, $e_2$ is a linear transformation. Hence, the vector $a + \lambda b$ is projected onto the sum of the projections of $a$ and $\lambda b$. Then we apply the arguments in the proof of Statement 2.1.1 to the projections of the vectors; in particular, Figs. 2.1 and 2.2 are interpreted as showing the projections of all vectors onto the coordinate plane $e_1$, $e_2$. It is then straightforward to see that all the properties of the oriented area hold for the projected oriented areas. Details left as exercise.
It is therefore convenient to consider the oriented areas of the three projections, $A(a, b)_{e_1, e_2}$, $A(a, b)_{e_2, e_3}$, $A(a, b)_{e_3, e_1}$, as three components of a vector-valued area $A(a, b)$ of the parallelogram spanned by $a$, $b$. Indeed, it can be shown that these three projected areas coincide with the three Euclidean components of the vector product $a \times b$. The vector product is the traditional way such areas are represented in geometry: the vector $a \times b$ represents at once the magnitude of the area and the orientation of the parallelogram. One computes the unoriented area of a parallelogram as the length of the vector $a \times b$ representing the oriented area,
$$Ar(a, b) = \left[ A(a, b)^2_{e_1, e_2} + A(a, b)^2_{e_2, e_3} + A(a, b)^2_{e_3, e_1} \right]^{1/2} .$$
However, the vector product cannot be generalized to all higher-dimensional spaces. Luckily, the vector product does not play an essential role in the construction of the oriented area.
Instead of working with the vector product, we will generalize the idea of projecting the parallelogram onto coordinate planes. Consider a parallelogram spanned by vectors $a$, $b$ in an $n$-dimensional Euclidean space $V$ with the standard basis $\{e_1, ..., e_n\}$. While in three-dimensional space we had just three projections (onto the coordinate planes $xy$, $xz$, $yz$), in an $n$-dimensional space we have $\frac{1}{2} n(n-1)$ coordinate planes, which can be denoted by $\mathrm{Span}\{e_i, e_j\}$ (with $1 \le i < j \le n$). We may construct the $\frac{1}{2} n(n-1)$ projections of the parallelogram onto these coordinate planes. Each of these projections has an oriented area; that area is a bilinear, antisymmetric number-valued function of the vectors $a$, $b$. (The proof of the Statement above does not use the fact that the space is three-dimensional!) We may then regard these $\frac{1}{2} n(n-1)$ numbers as the components of a vector representing the oriented area of the parallelogram. It is clear that all these components are needed in order to describe the actual geometric orientation of the parallelogram in the $n$-dimensional space.
We arrived at the idea that the oriented area of the parallelogram spanned by $a$, $b$ is an antisymmetric, bilinear function $A(a, b)$ whose value is a vector with $\frac{1}{2} n(n-1)$ components, i.e. a vector in a new space, the "space of oriented areas," as it were. This space is $\frac{1}{2} n(n-1)$-dimensional. We will construct this space explicitly below; it is the space of bivectors, to be denoted by $\wedge^2 V$.
We will see that the unoriented area of the parallelogram is computed as the length of the vector $A(a, b)$, i.e. as the square root of the sum of squares of the areas of the projections of the parallelogram onto the coordinate planes. This is a generalization of the Pythagoras theorem to areas in higher-dimensional spaces.
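The formulas of this section are easy to try out numerically. The sketch below is an illustration only, assuming vectors are given by their components in an orthonormal basis. `oriented_area_2d` implements $\alpha_1 \beta_2 - \alpha_2 \beta_1$, and `projected_areas` lists the oriented areas of the projections onto the coordinate planes $\mathrm{Span}\{e_i, e_j\}$ with $i < j$. The function `gram_area` is an independent cross-check that I add here; it computes the area from lengths and the scalar product as $\sqrt{|a|^2 |b|^2 - \langle a, b \rangle^2}$, and all function names are hypothetical.

```python
import math
from itertools import combinations

def oriented_area_2d(a, b):
    """Oriented area alpha_1*beta_2 - alpha_2*beta_1 of the parallelogram (a, b)."""
    return a[0] * b[1] - a[1] * b[0]

def projected_areas(a, b):
    """Oriented areas of the projections onto the planes Span{e_i, e_j}, i < j."""
    return {(i, j): a[i] * b[j] - a[j] * b[i]
            for i, j in combinations(range(len(a)), 2)}

def unoriented_area(a, b):
    """Square root of the sum of squared projected areas."""
    return math.sqrt(sum(x * x for x in projected_areas(a, b).values()))

def gram_area(a, b):
    """Cross-check: area from lengths and the scalar product."""
    dot = lambda x, y: sum(p * q for p, q in zip(x, y))
    return math.sqrt(dot(a, a) * dot(b, b) - dot(a, b) ** 2)

# Two dimensions: antisymmetry and the sum law hold for the oriented area.
a, b, c = (3, 1), (1, 2), (-2, 5)
b_plus_c = (b[0] + c[0], b[1] + c[1])
print(oriented_area_2d(a, b), oriented_area_2d(b, a))
print(oriented_area_2d(a, b_plus_c), oriented_area_2d(a, b) + oriented_area_2d(a, c))

# Four dimensions: six coordinate planes, and the Pythagoras-like formula.
p, q = (1, 2, 0, 1), (0, 1, 3, -1)
print(len(projected_areas(p, q)), unoriented_area(p, q), gram_area(p, q))
```

For $n = 4$ the dictionary has $\frac{1}{2} n(n-1) = 6$ entries, and the two ways of computing the unoriented area agree up to rounding.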


The analogy between ordinary vectors and vector-valued areas can be understood visually as follows. A straight line segment in an $n$-dimensional space is represented by a vector whose $n$ components (in an orthonormal basis) are the signed lengths of the $n$ projections of the line segment onto the coordinate axes. (The components are signed, or oriented, i.e. taken with a negative sign if the orientation of the vector is opposite to the orientation of the axis.) The length of a straight line segment, i.e. the length of the vector $v$, is then computed as $\sqrt{\langle v, v\rangle}$. The scalar product $\langle v, v\rangle$ is equal to the sum of squared lengths of the projections because we are using an orthonormal basis.

A parallelogram in space is represented by a vector $\psi$ whose $\binom{n}{2}$ components are the oriented areas of the $\binom{n}{2}$ projections of the parallelogram onto the coordinate planes. (The vector $\psi$ belongs to the space of oriented areas, not to the original $n$-dimensional space.) The numerical value of the area of the parallelogram is then computed as $\sqrt{\langle \psi, \psi\rangle}$. The scalar product $\langle \psi, \psi\rangle$ in the space of oriented areas is equal to the sum of squared areas of the projections because the $\binom{n}{2}$ unit areas in the coordinate planes are an orthonormal basis (according to the definition of the scalar product in the space of oriented areas).

The generalization of the Pythagoras theorem holds not only for areas but also for higher-dimensional volumes. A general proof of this theorem will be given in Sec. 5.5.2, using the exterior product and several other constructions to be developed below.

2.2 Exterior product

In the previous section I motivated the introduction of the antisymmetric product by showing its connection to areas and volumes. In this section I will give the definition and work out the properties of the exterior product in a purely algebraic manner, without using any geometric intuition. This will enable us to work with vectors in arbitrary dimensions, to obtain many useful results, and eventually also to appreciate more fully the geometric significance of the exterior product.

As explained in Sec. 2.1.2, it is possible to represent the oriented area of a parallelogram by a vector in some auxiliary space. The oriented area is much more convenient to work with because it is a bilinear function of the vectors $a$ and $b$ (this is explained in detail in Sec. 2.1). "Product" is another word for "bilinear function." We have also seen that the oriented area is an antisymmetric function of the vectors $a$ and $b$.

In three dimensions, an oriented area is represented by the cross product $a \times b$, which is indeed an antisymmetric and bilinear product. So we expect that the oriented area in higher dimensions can be represented by some kind of new antisymmetric product of $a$ and $b$; let us denote this product (to be defined below) by $a \wedge b$, pronounced "a wedge b." The value of $a \wedge b$ will be a vector in a new vector space. We will also construct this new space explicitly.

2.2.1 Definition of exterior product

Like the tensor product space, the space of exterior products can be defined solely by its algebraic properties. We can consider the space of formal expressions like $a \wedge b$, $3a \wedge b + 2c \wedge d$, etc., and require the properties of an antisymmetric, bilinear product to hold.

Here is a more formal definition of the exterior product space: We will construct an antisymmetric product "by hand," using the tensor product space.

Definition 1: Given a vector space $V$, we define a new vector space $V \wedge V$ called the exterior product (or antisymmetric tensor product, or alternating product, or wedge product) of two copies of $V$. The space $V \wedge V$ is the subspace in $V \otimes V$ consisting of all antisymmetric tensors, i.e. tensors of the form

\[ v_1 \otimes v_2 - v_2 \otimes v_1, \qquad v_{1,2} \in V, \]

and all linear combinations of such tensors. The exterior product of two vectors $v_1$ and $v_2$ is the expression shown above; it is obviously an antisymmetric and bilinear function of $v_1$ and $v_2$.

For example, here is one particular element from $V \wedge V$, which we write in two different ways using the axioms of the tensor product:

\[ (u + v) \otimes (v + w) - (v + w) \otimes (u + v) = u \otimes v - v \otimes u + u \otimes w - w \otimes u + v \otimes w - w \otimes v \in V \wedge V. \qquad (2.1) \]

Remark: A tensor $v_1 \otimes v_2 \in V \otimes V$ is in general not equal to the tensor $v_2 \otimes v_1$ if $v_1 \neq v_2$ (the two tensors coincide only when $v_1$ and $v_2$ are proportional). This is so because there is no identity among the axioms of the tensor product that would allow us to exchange the factors $v_1$ and $v_2$ in the expression $v_1 \otimes v_2$.

Exercise 1: Prove that the "exchange map" $T(v_1 \otimes v_2) \equiv v_2 \otimes v_1$ is a canonically defined, linear map of $V \otimes V$ into itself. Show that $T$ has only two eigenvalues, which are $\pm 1$. Give examples of eigenvectors with eigenvalues $+1$ and $-1$. Show that the subspace $V \wedge V \subset V \otimes V$ is the eigenspace of the exchange operator $T$ with eigenvalue $-1$.

Hint: $TT = 1_{V \otimes V}$. Consider tensors of the form $u \otimes v \pm v \otimes u$ as candidate eigenvectors of $T$. ■

It is quite cumbersome to perform calculations in the tensor product notation as we did in Eq. (2.1). So let us write the exterior product as $u \wedge v$ instead of $u \otimes v - v \otimes u$. It is then straightforward to see that the "wedge" symbol $\wedge$ indeed works like an anti-commutative multiplication, as we intended. The rules of computation are summarized in the following statement.

Statement 1: One may save time and write $u \otimes v - v \otimes u \equiv u \wedge v \in V \wedge V$, and the result of any calculation will be correct, as long as one follows the rules:

\[ u \wedge v = -v \wedge u, \qquad (2.2) \]
\[ (\lambda u) \wedge v = \lambda\,(u \wedge v), \qquad (2.3) \]
\[ (u + v) \wedge x = u \wedge x + v \wedge x. \qquad (2.4) \]

It follows also that $u \wedge (\lambda v) = \lambda\,(u \wedge v)$ and that $v \wedge v = 0$. (These identities hold for any vectors $u, v \in V$ and any scalars $\lambda \in K$.)

Proof: These properties are direct consequences of the axioms of the tensor product when applied to antisymmetric tensors. For example, the calculation (2.1) now requires a simple expansion of brackets,

\[ (u + v) \wedge (v + w) = u \wedge v + u \wedge w + v \wedge w. \]

Here we removed the term $v \wedge v$, which vanishes due to the antisymmetry of $\wedge$. Details left as exercise. ■

Elements of the space $V \wedge V$, such as $a \wedge b + c \wedge d$, are sometimes called bivectors. (It is important to note that a bivector is not necessarily expressible as a single-term product of two vectors; see the Exercise at the end of Sec. 2.3.2.) We will also want to define the exterior


product of more than two vectors. To define the exterior product of three vectors, we consider the subspace of $V \otimes V \otimes V$ that consists of antisymmetric tensors of the form

\[ a \otimes b \otimes c - b \otimes a \otimes c + c \otimes a \otimes b - c \otimes b \otimes a + b \otimes c \otimes a - a \otimes c \otimes b \qquad (2.5) \]

and linear combinations of such tensors. These tensors are called totally antisymmetric because they can be viewed as (tensor-valued) functions of the vectors $a, b, c$ that change sign under exchange of any two vectors. The expression in Eq. (2.5) will be denoted for brevity by $a \wedge b \wedge c$, similarly to the exterior product of two vectors, $a \otimes b - b \otimes a$, which is denoted for brevity by $a \wedge b$. Here is a general definition.

Definition 2: The exterior product of $k$ copies of $V$ (also called the $k$-th exterior power of $V$) is denoted by $\wedge^k V$ and is defined as the subspace of totally antisymmetric tensors within $V \otimes \dots \otimes V$. In the concise notation, this is the space spanned by expressions of the form

\[ v_1 \wedge v_2 \wedge \dots \wedge v_k, \qquad v_j \in V, \]

assuming that the properties of the wedge product (linearity and antisymmetry) hold as given by Statement 1. For instance,

\[ u \wedge v_1 \wedge \dots \wedge v_k = (-1)^k\, v_1 \wedge \dots \wedge v_k \wedge u \qquad (2.6) \]

("pulling a vector through $k$ other vectors changes sign $k$ times"). ■

The previously defined space of bivectors is in this notation $V \wedge V \equiv \wedge^2 V$. A natural extension of this notation is $\wedge^0 V = K$ and $\wedge^1 V = V$. I will also use the following "wedge product" notation,

\[ \bigwedge_{k=1}^{n} v_k \equiv v_1 \wedge v_2 \wedge \dots \wedge v_n. \]

Tensors from the space $\wedge^n V$ are also called $n$-vectors or antisymmetric tensors of rank $n$.

Question: How to compute expressions containing multiple products such as $a \wedge b \wedge c$?

Answer: Apply the rules shown in Statement 1. For example, one can permute adjacent vectors and change sign,

\[ a \wedge b \wedge c = -b \wedge a \wedge c = b \wedge c \wedge a, \]

one can expand brackets,

\[ a \wedge (x + 4y) \wedge b = a \wedge x \wedge b + 4\, a \wedge y \wedge b, \]

and so on. If the vectors $a, b, c$ are given as linear combinations of some basis vectors $\{e_j\}$, we can thus reduce $a \wedge b \wedge c$ to a linear combination of exterior products of basis vectors, such as $e_1 \wedge e_2 \wedge e_3$, $e_1 \wedge e_2 \wedge e_4$, etc.

Question: The notation $a \wedge b \wedge c$ suggests that the exterior product is associative,

\[ a \wedge b \wedge c = (a \wedge b) \wedge c = a \wedge (b \wedge c). \]

How can we make sense of this?

Answer: If we want to be pedantic, we need to define the exterior product operation $\wedge$ between a single-term bivector $a \wedge b$ and a vector $c$, such that the result is by definition the 3-vector $a \wedge b \wedge c$. We then define the same operation on linear combinations of single-term bivectors,

\[ (a \wedge b + x \wedge y) \wedge c \equiv a \wedge b \wedge c + x \wedge y \wedge c. \]

Thus we have defined the exterior product between $\wedge^2 V$ and $V$, the result being a 3-vector from $\wedge^3 V$. We then need to verify that the results do not depend on the choice of the vectors such as $a, b, x, y$ in the representation of a bivector: A different representation can be achieved only by using the properties of the exterior product (i.e. the axioms of the tensor product), e.g. we may replace $a \wedge b$ by $-b \wedge (a + \lambda b)$. It is easy to verify that any such replacements will not modify the resulting 3-vector, e.g.

\[ a \wedge b \wedge c = -b \wedge (a + \lambda b) \wedge c, \]

again due to the properties of the exterior product. This consideration shows that calculations with exterior products are consistent with our algebraic intuition. We may indeed compute $a \wedge b \wedge c$ as $(a \wedge b) \wedge c$ or as $a \wedge (b \wedge c)$.

Example 1: Suppose we work in $\mathbb{R}^3$ and have vectors $a = \left(0, \tfrac{1}{2}, -\tfrac{1}{2}\right)$, $b = (2, -2, 0)$, $c = (-2, 5, -3)$. Let us compute various exterior products. Calculations are easier if we introduce the basis $\{e_1, e_2, e_3\}$ explicitly:

\[ a = \tfrac{1}{2}(e_2 - e_3), \qquad b = 2(e_1 - e_2), \qquad c = -2e_1 + 5e_2 - 3e_3. \]

We compute the 2-vector $a \wedge b$ by using the properties of the exterior product, such as $x \wedge x = 0$ and $x \wedge y = -y \wedge x$, and simply expanding the brackets as usual in algebra:

\[ a \wedge b = \tfrac{1}{2}(e_2 - e_3) \wedge 2(e_1 - e_2) \]
\[ = (e_2 - e_3) \wedge (e_1 - e_2) \]
\[ = e_2 \wedge e_1 - e_3 \wedge e_1 - e_2 \wedge e_2 + e_3 \wedge e_2 \]
\[ = -e_1 \wedge e_2 + e_1 \wedge e_3 - e_2 \wedge e_3. \]

The last expression is the result; note that now there is nothing more to compute or to simplify. The expressions such as $e_1 \wedge e_2$ are the basic expressions out of which the space $\mathbb{R}^3 \wedge \mathbb{R}^3$ is built. Below (Sec. 2.3.2) we will show formally that the set of these expressions is a basis in the space $\mathbb{R}^3 \wedge \mathbb{R}^3$.

Let us also compute the 3-vector $a \wedge b \wedge c$,

\[ a \wedge b \wedge c = (a \wedge b) \wedge c = (-e_1 \wedge e_2 + e_1 \wedge e_3 - e_2 \wedge e_3) \wedge (-2e_1 + 5e_2 - 3e_3). \]

When we expand the brackets here, terms such as $e_1 \wedge e_2 \wedge e_1$ will vanish because

\[ e_1 \wedge e_2 \wedge e_1 = -e_2 \wedge e_1 \wedge e_1 = 0, \]

so only terms containing all different vectors need to be kept, and we find

\[ a \wedge b \wedge c = 3\, e_1 \wedge e_2 \wedge e_3 + 5\, e_1 \wedge e_3 \wedge e_2 + 2\, e_2 \wedge e_3 \wedge e_1 = (3 - 5 + 2)\, e_1 \wedge e_2 \wedge e_3 = 0. \]

We note that all the terms are proportional to the 3-vector $e_1 \wedge e_2 \wedge e_3$, so only the coefficient in front of $e_1 \wedge e_2 \wedge e_3$ was needed; then, by coincidence, that coefficient turned out to be zero. So the result is the zero 3-vector. ■
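The arithmetic of Example 1 can be checked mechanically. The following is a minimal Python sketch, not taken from the book. It assumes that a $k$-vector is stored as a dictionary mapping an increasing index tuple $(i_1 < \dots < i_k)$ to the coefficient of $e_{i_1} \wedge \dots \wedge e_{i_k}$; the helper names `sort_sign`, `wedge`, and `vec` are hypothetical and chosen only for illustration. The code uses nothing but the rules $x \wedge x = 0$ and $x \wedge y = -y \wedge x$ together with bilinearity.

```python
from fractions import Fraction

def sort_sign(indices):
    """Sort basis indices by adjacent swaps.

    Returns (sign, sorted_tuple); sign is 0 if an index repeats,
    because e_i ^ e_i = 0, and otherwise (-1)**(number of swaps).
    """
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign  # each exchange of neighbours flips the sign
    return sign, tuple(idx)

def wedge(p, q):
    """Exterior product of two multivectors, expanded by bilinearity."""
    out = {}
    for ip, cp in p.items():
        for iq, cq in q.items():
            sign, key = sort_sign(ip + iq)
            if sign:
                out[key] = out.get(key, 0) + sign * cp * cq
    return {k: c for k, c in out.items() if c != 0}

def vec(*coords):
    """A vector given by its components in the basis e_1, e_2, ..."""
    return {(i + 1,): c for i, c in enumerate(coords) if c != 0}

a = vec(0, Fraction(1, 2), Fraction(-1, 2))
b = vec(2, -2, 0)
c = vec(-2, 5, -3)

ab = wedge(a, b)    # coefficients of e1^e2, e1^e3, e2^e3 are -1, 1, -1
abc = wedge(ab, c)  # empty dictionary: the zero 3-vector
print(ab)
print(abc)
```

Computing `wedge(a, wedge(b, c))` instead gives the same empty dictionary, in line with the remark that $a \wedge b \wedge c$ may be evaluated as $(a \wedge b) \wedge c$ or as $a \wedge (b \wedge c)$. Exact fractions are used so that the vanishing coefficient is exactly zero rather than a rounding residue.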


Question: Our original goal was to introduce a bilinear, antisymmetric product of vectors in order to obtain a geometric representation of oriented areas. Instead, $a \wedge b$ was defined algebraically, through tensor products. It is clear that $a \wedge b$ is antisymmetric and bilinear, but why does it represent an oriented area?

Answer: Indeed, it may not be immediately clear why oriented areas should be elements of $V \wedge V$. We have seen that the oriented area $A(x, y)$ is an antisymmetric and bilinear function of the two vectors $x$ and $y$. Right now we have constructed the space $V \wedge V$ simply as the space of antisymmetric products. By constructing that space merely out of the axioms of the antisymmetric product, we already covered every possible bilinear antisymmetric product. This means that any antisymmetric and bilinear function of the two vectors $x$ and $y$ is proportional to $x \wedge y$ or, more generally, is a linear function of $x \wedge y$ (perhaps with values in a different space). Therefore, the space of oriented areas (that is, the space of linear combinations of $A(x, y)$ for various $x$ and $y$) is in any case mapped to a subspace of $V \wedge V$. We have also seen that oriented areas in $N$ dimensions can be represented through $\binom{N}{2}$ projections, which indicates that they are vectors in some $\binom{N}{2}$-dimensional space. We will see below that the space $V \wedge V$ has exactly this dimension (Theorem 2 in Sec. 2.3.2). Therefore, we can expect that the space of oriented areas coincides with $V \wedge V$. Below we will be working in a space $V$ with a scalar product, where the notions of area and volume are well defined. Then we will see (Sec. 5.5.2) that tensors from $V \wedge V$ and the higher exterior powers of $V$ indeed correspond in a natural way to oriented areas, or more generally to oriented volumes of a certain dimension.

Remark: Origin of the name "exterior." The construction of the exterior product is a modern formulation of the ideas dating back to H. Grassmann (1844). A 2-vector $a \wedge b$ is interpreted geometrically as the oriented area of the parallelogram spanned by the vectors $a$ and $b$. Similarly, a 3-vector $a \wedge b \wedge c$ represents the oriented 3-volume of a parallelepiped spanned by $\{a, b, c\}$. Due to the antisymmetry of the exterior product, we have $(a \wedge b) \wedge (a \wedge c) = 0$, $(a \wedge b \wedge c) \wedge (b \wedge d) = 0$, etc. We can interpret this geometrically by saying that the "product" of two volumes is zero if these volumes have a vector in common. This motivated Grassmann to call his antisymmetric product "exterior." In his reasoning, the product of two "extensive quantities" (such as lines, areas, or volumes) is nonzero only when each of the two quantities is geometrically "to the exterior" (outside) of the other.

Exercise 2: Show that in a two-dimensional space $V$, any 3-vector such as $a \wedge b \wedge c$ can be simplified to the zero 3-vector. Prove the same for $n$-vectors in $N$-dimensional spaces when $n > N$. ■

One can also consider the exterior powers of the dual space $V^*$. Tensors from $\wedge^n V^*$ are usually (for historical reasons) called $n$-forms (rather than "$n$-covectors").

Question: Where is the star here, really? Is the space $\wedge^n (V^*)$ different from $(\wedge^n V)^*$?

Answer: Good that you asked. These spaces are canonically isomorphic, but there is a subtle technical issue worth mentioning. Consider an example: $a^* \wedge b^* \in \wedge^2 (V^*)$ can act upon $u \wedge v \in \wedge^2 V$ by the standard tensor product rule, namely $a^* \otimes b^*$ acts on $u \otimes v$ as

\[ (a^* \otimes b^*)(u \otimes v) = a^*(u)\, b^*(v), \]

so by using the definition of $a^* \wedge b^*$ and $u \wedge v$ through the tensor product, we find

\[ (a^* \wedge b^*)(u \wedge v) = (a^* \otimes b^* - b^* \otimes a^*)(u \otimes v - v \otimes u) = 2a^*(u)\, b^*(v) - 2b^*(u)\, a^*(v). \]

We got a combinatorial factor 2, that is, a factor that arises because we have two permutations of the set $(a, b)$. With $\wedge^n (V^*)$ and $(\wedge^n V)^*$ we get a factor $n!$. It is not always convenient to have this combinatorial factor. For example, in a finite number field the number $n!$ might be equal to zero for large enough $n$. In these cases we could redefine the action of $a^* \wedge b^*$ on $u \wedge v$ as

\[ (a^* \wedge b^*)(u \wedge v) \equiv a^*(u)\, b^*(v) - b^*(u)\, a^*(v). \]

If we are not working in a finite number field, we are able to divide by any integer, so we may keep combinatorial factors in the denominators of expressions where such factors appear. For example, if $\{e_j\}$ is a basis in $V$ and $\omega = e_1 \wedge \dots \wedge e_N$ is the corresponding basis tensor in the one-dimensional space $\wedge^N V$, the dual basis tensor in $(\wedge^N V)^*$ could be defined by

\[ \omega^* = \frac{1}{N!}\, e_1^* \wedge \dots \wedge e_N^*, \quad \text{so that } \omega^*(\omega) = 1. \]

The need for such combinatorial factors is a minor technical inconvenience that does not arise too often. We may give the following definition that avoids dividing by combinatorial factors (but now we use permutations; see Appendix B).

Definition 3: The action of a $k$-form $f_1^* \wedge \dots \wedge f_k^*$ on a $k$-vector $v_1 \wedge \dots \wedge v_k$ is defined by

\[ \sum_{\sigma} (-1)^{|\sigma|}\, f_1^*(v_{\sigma(1)}) \cdots f_k^*(v_{\sigma(k)}), \]

where the summation is performed over all permutations $\sigma$ of the ordered set $(1, \dots, k)$.

Example 2: With $k = 3$ we have

\[ (p^* \wedge q^* \wedge r^*)(a \wedge b \wedge c) = p^*(a)q^*(b)r^*(c) - p^*(b)q^*(a)r^*(c) + p^*(b)q^*(c)r^*(a) - p^*(c)q^*(b)r^*(a) + p^*(c)q^*(a)r^*(b) - p^*(a)q^*(c)r^*(b). \]

Exercise 3: a) Show that $a \wedge b \wedge \omega = \omega \wedge a \wedge b$ where $\omega$ is any antisymmetric tensor (e.g. $\omega = x \wedge y \wedge z$).

b) Show that

\[ \omega_1 \wedge a \wedge \omega_2 \wedge b \wedge \omega_3 = -\omega_1 \wedge b \wedge \omega_2 \wedge a \wedge \omega_3, \]

where $\omega_1, \omega_2, \omega_3$ are arbitrary antisymmetric tensors and $a, b$ are vectors.

c) Due to antisymmetry, $a \wedge a = 0$ for any vector $a \in V$. Is it also true that $\omega \wedge \omega = 0$ for any bivector $\omega \in \wedge^2 V$?

2.2.2 * Symmetric tensor product

Question: At this point it is still unclear why the antisymmetric definition is at all useful. Perhaps we could define something else, say the symmetric product, instead of the exterior product? We could try to define a product, say $a \odot b$, with some other property, such as

\[ a \odot b = 2\, b \odot a. \]


Answer: This does not work because, for example, we would have

\[ b \odot a = 2\, a \odot b = 4\, b \odot a, \]

so all the "products" would have to vanish.

We can define the symmetric tensor product, $\otimes_S$, with the property

\[ a \otimes_S b = b \otimes_S a, \]

but it is impossible to define anything else in a similar fashion. (This is a theorem due to Grassmann (1862).)

The antisymmetric tensor product is the eigenspace (within $V \otimes V$) of the exchange operator $T$ with eigenvalue $-1$. That operator has only eigenvectors with eigenvalues $\pm 1$, so the only other possibility is to consider the eigenspace with eigenvalue $+1$. This eigenspace is spanned by symmetric tensors of the form $u \otimes v + v \otimes u$, and can be considered as the space of symmetric tensor products. We could write

\[ a \otimes_S b \equiv a \otimes b + b \otimes a \]

and develop the properties of this product. However, it turns out that the symmetric tensor product is much less useful for the purposes of linear algebra than the antisymmetric subspace. This book derives most of the results of linear algebra using the antisymmetric product as the main tool!

2.3 Properties of spaces $\wedge^k V$

As we have seen, tensors from the space $V \otimes V$ are representable by linear combinations of the form $a \otimes b + c \otimes d + \dots$, but not uniquely representable because one can transform one such linear combination into another by using the axioms of the tensor product. Similarly, $n$-vectors are not uniquely representable by linear combinations of exterior products. For example,

\[ a \wedge b + a \wedge c + b \wedge c = (a + b) \wedge (b + c) \]

since $b \wedge b = 0$. In other words, the 2-vector $\omega \equiv a \wedge b + a \wedge c + b \wedge c$ has an alternative representation containing only a single-term exterior product, $\omega = r \wedge s$ where $r = a + b$ and $s = b + c$.

Exercise: Show that any 2-vector in a three-dimensional space is representable by a single-term exterior product, i.e. is equal to a 2-vector of the form $a \wedge b$.

Hint: Choose a basis $\{e_1, e_2, e_3\}$ and show that $\alpha\, e_1 \wedge e_2 + \beta\, e_1 \wedge e_3 + \gamma\, e_2 \wedge e_3$ is equal to a single-term product. ■

What about higher-dimensional spaces? We will show (see the Exercise at the end of Sec. 2.3.2) that $n$-vectors cannot be in general reduced to a single-term product. This is, however, always possible for $(N-1)$-vectors in an $N$-dimensional space. (You showed this for $N = 3$ in the exercise above.)

Statement: Any $(N-1)$-vector in an $N$-dimensional space can be written as a single-term exterior product of the form $a_1 \wedge \dots \wedge a_{N-1}$.

Proof: We prove this by using induction in $N$. The basis of induction is $N = 2$, where there is nothing to prove. The induction step: Suppose that the statement is proved for $(N-1)$-vectors in $N$-dimensional spaces; we need to prove it for $N$-vectors in $(N+1)$-dimensional spaces. Choose a basis $\{e_1, \dots, e_{N+1}\}$ in the space. Any $N$-vector $\omega$ can be written as a linear combination of exterior product terms,

\[ \omega = \alpha_1\, e_2 \wedge \dots \wedge e_{N+1} + \alpha_2\, e_1 \wedge e_3 \wedge \dots \wedge e_{N+1} + \dots + \alpha_N\, e_1 \wedge \dots \wedge e_{N-1} \wedge e_{N+1} + \alpha_{N+1}\, e_1 \wedge \dots \wedge e_N, \]

where $\{\alpha_i\}$ are some constants.

Note that any tensor $\omega \in \wedge^N V$ can be written in this way simply by expressing every vector through the basis and by expanding the exterior products. The result will be a linear combination of the form shown above, containing at most $N + 1$ single-term exterior products of the form $e_1 \wedge \dots \wedge e_N$, $e_2 \wedge \dots \wedge e_{N+1}$, and so on. We do not yet know whether these single-term exterior products constitute a linearly independent set; this will be established in Sec. 2.3.2. Presently, we will not need this property.

Now we would like to transform the expression above to a single term. We move $e_{N+1}$ outside brackets in the first $N$ terms:

\[ \omega = \left( \alpha_1\, e_2 \wedge \dots \wedge e_N + \dots + \alpha_N\, e_1 \wedge \dots \wedge e_{N-1} \right) \wedge e_{N+1} + \alpha_{N+1}\, e_1 \wedge \dots \wedge e_N \equiv \psi \wedge e_{N+1} + \alpha_{N+1}\, e_1 \wedge \dots \wedge e_N, \]

where in the last line we have introduced an auxiliary $(N-1)$-vector $\psi$. If it happens that $\psi = 0$, there is nothing left to prove. Otherwise, at least one of the $\alpha_i$ must be nonzero; without loss of generality, suppose that $\alpha_N \neq 0$ and rewrite $\omega$ as

\[ \omega = \psi \wedge e_{N+1} + \alpha_{N+1}\, e_1 \wedge \dots \wedge e_N = \psi \wedge \left( e_{N+1} + \frac{\alpha_{N+1}}{\alpha_N}\, e_N \right). \]

Now we note that $\psi$ belongs to the space of $(N-1)$-vectors over the $N$-dimensional subspace spanned by $\{e_1, \dots, e_N\}$. By the inductive assumption, $\psi$ can be written as a single-term exterior product, $\psi = a_1 \wedge \dots \wedge a_{N-1}$, of some vectors $\{a_i\}$. Denoting

\[ a_N \equiv e_{N+1} + \frac{\alpha_{N+1}}{\alpha_N}\, e_N, \]

we obtain

\[ \omega = a_1 \wedge \dots \wedge a_{N-1} \wedge a_N, \]

i.e. $\omega$ can be represented as a single-term exterior product. ■

2.3.1 Linear maps between spaces $\wedge^k V$

Since the spaces $\wedge^k V$ are vector spaces, we may consider linear maps between them.

The simplest example is the map

\[ L_a : \omega \mapsto a \wedge \omega, \]

mapping $\wedge^k V \to \wedge^{k+1} V$; here the vector $a$ is fixed. It is important to check that $L_a$ is a linear map between these spaces. How do we check this? We need to check that $L_a$ maps a linear combination of tensors into linear combinations; this is easy to see,

\[ L_a(\omega + \lambda \omega') = a \wedge (\omega + \lambda \omega') = a \wedge \omega + \lambda\, a \wedge \omega' = L_a \omega + \lambda L_a \omega'. \]

Let us now fix a covector $a^*$. A covector is a map $V \to K$. In Lemma 2 of Sec. 1.7.3 we have used covectors to define linear maps $a^* : V \otimes W \to W$ according to Eq. (1.21), mapping $v \otimes w \mapsto a^*(v)\, w$. Now we will apply the analogous construction


to exterior powers and construct a map $V \wedge V \to V$. Let us denote this map by $\iota_{a^*}$.

It would be incorrect to define the map $\iota_{a^*}$ by the formula $\iota_{a^*}(v \wedge w) = a^*(v)\, w$ because such a definition does not respect the antisymmetry of the wedge product and thus violates the linearity condition,

\[ \iota_{a^*}(w \wedge v) \overset{!}{=} \iota_{a^*}((-1)\, v \wedge w) = -\iota_{a^*}(v \wedge w) = -a^*(v)\, w \neq a^*(w)\, v. \]

So we need to act with $a^*$ on each of the vectors in a wedge product and make sure that the correct minus sign comes out. An acceptable formula for the map $\iota_{a^*} : \wedge^2 V \to V$ is

\[ \iota_{a^*}(v \wedge w) \equiv a^*(v)\, w - a^*(w)\, v. \]

(Please check that the linearity condition now holds!) This is how we will define the map $\iota_{a^*}$ on $\wedge^2 V$.

Let us now extend $\iota_{a^*} : \wedge^2 V \to V$ to a map

\[ \iota_{a^*} : \wedge^k V \to \wedge^{k-1} V, \]

defined as follows:

\[ \iota_{a^*} v \equiv a^*(v), \]
\[ \iota_{a^*}(v \wedge \omega) \equiv a^*(v)\, \omega - v \wedge (\iota_{a^*} \omega). \qquad (2.7) \]

This definition is inductive, i.e. it shows how to define $\iota_{a^*}$ on $\wedge^k V$ if we know how to define it on $\wedge^{k-1} V$. The action of $\iota_{a^*}$ on a sum of terms is defined by requiring linearity,

\[ \iota_{a^*}(A + \lambda B) \equiv \iota_{a^*}(A) + \lambda\, \iota_{a^*}(B), \qquad A, B \in \wedge^k V. \]

We can convert this inductive definition into a more explicit formula: if $\omega = v_1 \wedge \dots \wedge v_k \in \wedge^k V$ then

\[ \iota_{a^*}(v_1 \wedge \dots \wedge v_k) \equiv a^*(v_1)\, v_2 \wedge \dots \wedge v_k - a^*(v_2)\, v_1 \wedge v_3 \wedge \dots \wedge v_k + \dots + (-1)^{k-1} a^*(v_k)\, v_1 \wedge \dots \wedge v_{k-1}. \]

This map is called the interior product or the insertion map. This is a useful operation in linear algebra. The insertion map $\iota_{a^*} \omega$ "inserts" the covector $a^*$ into the tensor $\omega \in \wedge^k V$ by acting with $a^*$ on each of the vectors in the exterior product that makes up $\omega$.

Let us check formally that the insertion map is linear.

Statement: The map $\iota_{a^*} : \wedge^k V \to \wedge^{k-1} V$ for $1 \le k \le N$ is a well-defined linear map, according to the inductive definition.

Proof: First, we need to check that it maps linear combinations into linear combinations; this is quite easy to see by induction, using the fact that $a^* : V \to K$ is linear. However, this type of linearity is not sufficient; we also need to check that the result of the map, i.e. the tensor $\iota_{a^*}(\omega)$, is defined independently of the representation of $\omega$ through vectors such as $v_i$. The problem is, there are many such representations, for example some tensor $\omega \in \wedge^3 V$ might be written using different vectors as

\[ \omega = v_1 \wedge v_2 \wedge v_3 = v_2 \wedge (v_3 - v_1) \wedge (v_3 + v_2) \equiv \tilde v_1 \wedge \tilde v_2 \wedge \tilde v_3. \]

We need to verify that any such equivalent representation yields the same resulting tensor $\iota_{a^*}(\omega)$, despite the fact that the definition of $\iota_{a^*}$ appears to depend on the choice of the vectors $v_i$. Only then will it be proved that $\iota_{a^*}$ is a linear map $\wedge^k V \to \wedge^{k-1} V$.

An equivalent representation of a tensor $\omega$ can be obtained only by using the properties of the exterior product, namely linearity and antisymmetry. Therefore, we need to verify that $\iota_{a^*}(\omega)$ does not change when we change the representation of $\omega$ in these two ways: 1) expanding a linear combination,

\[ (x + \lambda y) \wedge \dots \mapsto x \wedge \dots + \lambda\, y \wedge \dots; \qquad (2.8) \]

2) interchanging the order of two vectors in the exterior product and changing the sign,

\[ x \wedge y \wedge \dots \mapsto -y \wedge x \wedge \dots \qquad (2.9) \]

It is clear that $a^*(x + \lambda y) = a^*(x) + \lambda a^*(y)$; it follows by induction that $\iota_{a^*} \omega$ does not change under a change of representation of the type (2.8). Now we consider the change of representation of the type (2.9). We have, by definition of $\iota_{a^*}$,

\[ \iota_{a^*}(v_1 \wedge v_2 \wedge \chi) = a^*(v_1)\, v_2 \wedge \chi - a^*(v_2)\, v_1 \wedge \chi + v_1 \wedge v_2 \wedge \iota_{a^*}(\chi), \]

where we have denoted by $\chi$ the rest of the exterior product. It is clear from the above expression that

\[ \iota_{a^*}(v_1 \wedge v_2 \wedge \chi) = -\iota_{a^*}(v_2 \wedge v_1 \wedge \chi) = \iota_{a^*}(-v_2 \wedge v_1 \wedge \chi). \]

This proves that $\iota_{a^*}(\omega)$ does not change under a change of representation of $\omega$ of the type (2.9). This concludes the proof. ■

Remark: It is apparent from the proof that the minus sign in the inductive definition (2.7) is crucial for the linearity of the map $\iota_{a^*}$. Indeed, if we attempt to define a map by a formula such as

\[ v_1 \wedge v_2 \mapsto a^*(v_1)\, v_2 + a^*(v_2)\, v_1, \]

the result will not be a linear map $\wedge^2 V \to V$ despite the appearance of linearity. The correct formula must take into account the fact that $v_1 \wedge v_2 = -v_2 \wedge v_1$.

Exercise: Show by induction in $k$ that

\[ L_x \iota_{a^*} \omega + \iota_{a^*} L_x \omega = a^*(x)\, \omega, \qquad \forall \omega \in \wedge^k V. \]

In other words, the linear operator $L_x \iota_{a^*} + \iota_{a^*} L_x : \wedge^k V \to \wedge^k V$ is simply the multiplication by the number $a^*(x)$.

2.3.2 Exterior product and linear dependence

The exterior product is useful in many ways. One powerful property of the exterior product is its close relation to linear independence of sets of vectors. For example, if $u = \lambda v$ then $u \wedge v = 0$. More generally:

Theorem 1: A set $\{v_1, \dots, v_k\}$ of vectors from $V$ is linearly independent if and only if $(v_1 \wedge v_2 \wedge \dots \wedge v_k) \neq 0$, i.e. it is a nonzero tensor from $\wedge^k V$.

Proof: If $\{v_j\}$ is linearly dependent then without loss of generality we may assume that $v_1$ is a linear combination of other vectors, $v_1 = \sum_{j=2}^{k} \lambda_j v_j$. Then

\[ v_1 \wedge v_2 \wedge \dots \wedge v_k = \sum_{j=2}^{k} \lambda_j\, v_j \wedge v_2 \wedge \dots \wedge v_j \wedge \dots \wedge v_k = \sum_{j=2}^{k} (-1)^{j} \lambda_j\, v_2 \wedge \dots \wedge v_j \wedge v_j \wedge \dots \wedge v_k = 0. \]

Conversely, we need to prove that the tensor $v_1 \wedge \dots \wedge v_k \neq 0$ if $\{v_j\}$ is linearly independent. The proof is by induction in $k$. The


basis of induction is $k = 1$: if $\{v_1\}$ is linearly independent then clearly $v_1 \neq 0$. The induction step: Assume that the statement is proved for $k - 1$ and that $\{v_1, \dots, v_k\}$ is a linearly independent set. By Exercise 1 in Sec. 1.6 there exists a covector $f^* \in V^*$ such that $f^*(v_1) = 1$ and $f^*(v_i) = 0$ for $2 \le i \le k$. Now we apply the interior product map $\iota_{f^*} : \wedge^k V \to \wedge^{k-1} V$ constructed in Sec. 2.3.1 to the tensor $v_1 \wedge \dots \wedge v_k$ and find

\[ \iota_{f^*}(v_1 \wedge \dots \wedge v_k) = v_2 \wedge \dots \wedge v_k. \]

By the induction assumption, the linear independence of the $k - 1$ vectors $\{v_2, \dots, v_k\}$ entails $v_2 \wedge \dots \wedge v_k \neq 0$. The map $\iota_{f^*}$ is linear and cannot map a zero tensor into a nonzero tensor, therefore $v_1 \wedge \dots \wedge v_k \neq 0$. ■

It is also important to know that any tensor from the highest exterior power $\wedge^N V$ can be represented as just a single-term exterior product of $N$ vectors. (Note that the same property for $\wedge^{N-1} V$ was already established in Sec. 2.3.)

Lemma 1: For any tensor $\omega \in \wedge^N V$ there exist vectors $\{v_1, \dots, v_N\}$ such that $\omega = v_1 \wedge \dots \wedge v_N$.

Proof: If $\omega = 0$ then there is nothing to prove, so we assume $\omega \neq 0$. By definition, the tensor $\omega$ has a representation as a sum of several exterior products, say

\[ \omega = v_1 \wedge \dots \wedge v_N + v'_1 \wedge \dots \wedge v'_N + \dots \]

Let us simplify this expression to just one exterior product. First, let us omit any zero terms in this expression (for instance, $a \wedge a \wedge b \wedge \dots = 0$). Then by Theorem 1 the set $\{v_1, \dots, v_N\}$ is linearly independent (or else the term $v_1 \wedge \dots \wedge v_N$ would be zero). Hence, $\{v_1, \dots, v_N\}$ is a basis in $V$. All other vectors such as $v'_i$ can be decomposed as linear combinations of vectors in that basis. Let us denote $\psi \equiv v_1 \wedge \dots \wedge v_N$. By expanding the brackets in exterior products such as $v'_1 \wedge \dots \wedge v'_N$, we will obtain every time the tensor $\psi$ with different coefficients. Therefore, the final result of simplification will be that $\omega$ equals $\psi$ multiplied with some coefficient. This is sufficient to prove Lemma 1. ■

Now we would like to build a basis in the space $\wedge^m V$. For this we need to determine which sets of tensors from $\wedge^m V$ are linearly independent within that space.

Lemma 2: If $\{e_1, \dots, e_N\}$ is a basis in $V$ then any tensor $A \in \wedge^m V$ can be decomposed as a linear combination of the tensors $e_{k_1} \wedge e_{k_2} \wedge \dots \wedge e_{k_m}$ with some indices $k_j$, $1 \le j \le m$.

Proof: The tensor $A$ is a linear combination of expressions of the form $v_1 \wedge \dots \wedge v_m$, and each vector $v_i \in V$ can be decomposed in the basis $\{e_j\}$. Expanding the brackets around the wedges $\wedge$ using the rules (2.2)–(2.4), we obtain a decomposition of an arbitrary tensor through the basis tensors. For example,

\[ (e_1 + 2e_2) \wedge (e_1 - e_2 + e_3) - 2\,(e_2 - e_3) \wedge (e_1 - e_3) = -e_1 \wedge e_2 - e_1 \wedge e_3 + 4\, e_2 \wedge e_3 \]

(please verify this yourself!). ■

By Theorem 1, all tensors $e_{k_1} \wedge e_{k_2} \wedge \dots \wedge e_{k_m}$ constructed out of subsets of vectors from the basis $\{e_1, \dots, e_N\}$ are nonzero, and by Lemma 2 any tensor can be decomposed into a linear combination of these tensors. But are these tensors a basis in the space $\wedge^m V$? Yes:

Lemma 3: If $\{v_1, \dots, v_n\}$ is a linearly independent set of vectors (not necessarily a basis in $V$ since $n \le N$), then:

(1) The set of $\binom{n}{2}$ tensors

\[ \{v_j \wedge v_k,\ 1 \le j < k \le n\} \equiv \{v_1 \wedge v_2,\ v_1 \wedge v_3,\ \dots,\ v_{n-1} \wedge v_n\} \]

is linearly independent in the space $\wedge^2 V$.

(2) The set of $\binom{n}{m}$ tensors

\[ \{v_{k_1} \wedge v_{k_2} \wedge \dots \wedge v_{k_m},\ 1 \le k_1 < k_2 < \dots < k_m \le n\} \]

is linearly independent in the space $\wedge^m V$ for $2 \le m \le n$.

Proof: (1) The proof is similar to that of Lemma 3 in Sec. 1.7.3. Suppose the set $\{v_j\}$ is linearly independent but the set $\{v_j \wedge v_k\}$ is linearly dependent, so that there exists a linear combination

\[ \sum_{1 \le j < k \le n} \lambda_{jk}\, v_j \wedge v_k = 0 \]

with at least some $\lambda_{jk} \neq 0$. Without loss of generality, $\lambda_{12} \neq 0$ (or else we can renumber the vectors $v_j$). There exists a covector $f^* \in V^*$ such that $f^*(v_1) = 1$ and $f^*(v_i) = 0$ for $2 \le i \le n$. Apply the interior product with this covector to the above tensor,

\[ 0 = \iota_{f^*} \Big[ \sum_{1 \le j < k \le n} \lambda_{jk}\, v_j \wedge v_k \Big] = \sum_{k=2}^{n} \lambda_{1k}\, v_k, \]

therefore by linear independence of $\{v_k\}$ all $\lambda_{1k} = 0$, contradicting the assumption $\lambda_{12} \neq 0$.

(2) The proof of part (1) is straightforwardly generalized to the space $\wedge^m V$, using induction in $m$. We have just proved the basis of induction, $m = 2$. Now the induction step: assume that the statement is proved for $m - 1$ and consider a set $\{v_{k_1} \wedge \dots \wedge v_{k_m}\}$ of tensors of rank $m$, where $\{v_j\}$ is a linearly independent set. Suppose that this set of tensors is linearly dependent; then there is a linear combination

\[ \omega \equiv \sum_{k_1, \dots, k_m} \lambda_{k_1 \dots k_m}\, v_{k_1} \wedge \dots \wedge v_{k_m} = 0 \]

with some nonzero coefficients, e.g. $\lambda_{12 \dots m} \neq 0$. There exists a covector $f^*$ such that $f^*(v_1) = 1$ and $f^*(v_i) = 0$ for $2 \le i \le n$. Apply this covector to the tensor $\omega$ and obtain $\iota_{f^*} \omega = 0$, which yields a vanishing linear combination of tensors $v_{k_1} \wedge \dots \wedge v_{k_{m-1}}$ of rank $m - 1$ with some nonzero coefficients. But this contradicts the induction assumption, which says that any set of tensors $v_{k_1} \wedge \dots \wedge v_{k_{m-1}}$ of rank $m - 1$ is linearly independent. ■

Now we are ready to compute the dimension of $\wedge^m V$.

Theorem 2: The dimension of the space $\wedge^m V$ is

\[ \dim \wedge^m V = \binom{N}{m} = \frac{N!}{m!\,(N - m)!}, \]

where $N \equiv \dim V$. For $m > N$ we have $\dim \wedge^m V = 0$, i.e. the spaces $\wedge^m V$ for $m > N$ consist solely of the zero tensor.

Proof: We will explicitly construct a basis in the space $\wedge^m V$. First choose a basis $\{e_1, \dots, e_N\}$ in $V$. By Lemma 3, the set of $\binom{N}{m}$ tensors

\[ \{e_{k_1} \wedge e_{k_2} \wedge \dots \wedge e_{k_m},\ 1 \le k_1 < k_2 < \dots < k_m \le N\} \]

is linearly independent, and by Lemma 2 any tensor $A \in \wedge^m V$ is a linear combination of these tensors. Therefore the set $\{e_{k_1} \wedge e_{k_2} \wedge \dots \wedge e_{k_m}\}$ is a basis in $\wedge^m V$. By Theorem 1.1.5, the dimension of a space is equal to the number of vectors in any basis, therefore $\dim \wedge^m V = \binom{N}{m}$.

For $m > N$, the existence of a nonzero tensor $v_1 \wedge \dots \wedge v_m$ contradicts Theorem 1: The set $\{v_1, \dots, v_m\}$ cannot be linearly independent since it has more vectors than the dimension of the space. Therefore all such tensors are equal to zero (more pedantically, to the zero tensor), which is thus the only element of $\wedge^m V$ for every $m > N$. ■
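Theorem 1 and Theorem 2 lend themselves to a quick numerical illustration. The following Python sketch is not from the book. It assumes that a $k$-vector is stored as a dictionary from increasing index tuples $(k_1 < \dots < k_m)$ to coefficients, so that the admissible keys are exactly the labels of the basis tensors $e_{k_1} \wedge \dots \wedge e_{k_m}$; the names `sort_sign`, `wedge`, `vec`, and `basis_labels` are hypothetical helpers introduced only for this illustration, and the sample vectors are arbitrary.

```python
from itertools import combinations
from math import comb

def sort_sign(indices):
    """Return (sign, sorted tuple); sign is 0 when an index repeats."""
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign  # antisymmetry: swapping neighbours flips the sign
    return sign, tuple(idx)

def wedge(p, q):
    """Exterior product of two multivectors, expanded by bilinearity."""
    out = {}
    for ip, cp in p.items():
        for iq, cq in q.items():
            sign, key = sort_sign(ip + iq)
            if sign:
                out[key] = out.get(key, 0) + sign * cp * cq
    return {k: c for k, c in out.items() if c != 0}

def vec(*coords):
    """A vector given by its components in the basis e_1, ..., e_N."""
    return {(i + 1,): c for i, c in enumerate(coords) if c != 0}

def basis_labels(N, m):
    """Index tuples k1 < ... < km labelling the basis tensors of the m-th exterior power."""
    return list(combinations(range(1, N + 1), m))

# Theorem 2: count the basis tensors for N = 4 and m = 0, ..., 5.
N = 4
dims = [len(basis_labels(N, m)) for m in range(N + 2)]
print(dims)  # [1, 4, 6, 4, 1, 0]

# Theorem 1: the wedge product vanishes exactly for dependent sets.
v1, v2 = vec(1, 0, 2, 0), vec(0, 1, 1, 0)
v3 = vec(1, 1, 3, 0)   # v3 = v1 + v2, so {v1, v2, v3} is dependent
v4 = vec(0, 0, 0, 1)   # {v1, v2, v4} is independent
dependent = wedge(wedge(v1, v2), v3)
independent = wedge(wedge(v1, v2), v4)
print(dependent)    # {} (the zero 3-vector)
print(independent)  # a nonzero 3-vector
```

The counts agree with $\binom{N}{m}$, including the value 0 for $m > N$. Storing only increasing index tuples is a design choice that builds the conclusion of Lemma 3 into the data structure: two dictionaries are equal exactly when the corresponding tensors have the same coefficients in this basis.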


Exercise 1: It is given that the set of four vectors $\{a, b, c, d\}$ is linearly independent. Show that the tensor $\omega \equiv a \wedge b + c \wedge d \in \wedge^2 V$ cannot be equal to a single-term exterior product of the form $x \wedge y$.

Outline of solution:

1. Constructive solution. There exists $f^* \in V^*$ such that $f^*(a) = 1$ and $f^*(b) = 0$, $f^*(c) = 0$, $f^*(d) = 0$. Compute $\iota_{f^*} \omega = b$. If $\omega = x \wedge y$, it will follow that a linear combination of $x$ and $y$ is equal to $b$, i.e. $b$ belongs to the two-dimensional space $\mathrm{Span}\{x, y\}$. Repeat this argument for the remaining three vectors ($a$, $c$, $d$) and obtain a contradiction.

2. Non-constructive solution. Compute $\omega \wedge \omega = 2\, a \wedge b \wedge c \wedge d \neq 0$ by linear independence of $\{a, b, c, d\}$. If we could express $\omega = x \wedge y$ then we would have $\omega \wedge \omega = 0$. ■

Remark: While $a \wedge b$ is interpreted geometrically as the oriented area of a parallelogram spanned by $a$ and $b$, a general linear combination such as $a \wedge b + c \wedge d + e \wedge f$ does not have this interpretation (unless it can be reduced to a single-term product $x \wedge y$). If not reducible to a single-term product, $a \wedge b + c \wedge d$ can be interpreted only as a formal linear combination of two areas.

Exercise 2: Suppose that $\psi \in \wedge^k V$ and $x \in V$ are such that $x \wedge \psi = 0$ while $x \neq 0$. Show that there exists $\chi \in \wedge^{k-1} V$ such that $\psi = x \wedge \chi$. Give an example where $\psi$ and $\chi$ are not representable as a single-term exterior product.

Outline of solution: There exists $f^* \in V^*$ such that $f^*(x) = 1$. Apply $\iota_{f^*}$ to the given equality $x \wedge \psi = 0$:

\[ 0 \overset{!}{=} \iota_{f^*}(x \wedge \psi) = \psi - x \wedge \iota_{f^*} \psi, \]

which means that $\psi = x \wedge \chi$ with $\chi \equiv \iota_{f^*} \psi$. An example can be found with $\chi = a \wedge b + c \wedge d$ as in Exercise 1, and $x$ such that the set $\{a, b, c, d, x\}$ is linearly independent; then $\psi \equiv x \wedge \chi$ is also not reducible to a single-term product.

2.3.3 Computing the dual basis

The exterior product allows us to compute explicitly the dual basis for a given basis.

We begin with some motivation. Suppose $\{v_1, \dots, v_N\}$ is a given basis; we would like to compute its dual basis. For instance, the covector $v_1^*$ of the dual basis is the linear function such that $v_1^*(x)$ is equal to the coefficient at $v_1$ in the decomposition of $x$ in the basis $\{v_j\}$,

\[ x = \sum_{i=1}^{N} x_i v_i; \qquad v_1^*(x) = x_1. \]

We start from the observation that the tensor $\omega \equiv v_1 \wedge \dots \wedge v_N$ is nonzero since $\{v_j\}$ is a basis. The exterior product $x \wedge v_2 \wedge \dots \wedge v_N$ is equal to zero if $x$ is a linear combination only of $v_2, \dots, v_N$,

Then we can write $v_1^*(x)\, \omega = x \wedge *(v_1)$. This equation can be used for computing $v_1^*$: namely, for any $x \in V$ the number $v_1^*(x)$ is equal to the constant $\lambda$ in the equation $x \wedge *(v_1) = \lambda \omega$. To make this kind of equation more convenient, let us write

\[ \lambda \equiv v_1^*(x) = \frac{x \wedge v_2 \wedge \dots \wedge v_N}{v_1 \wedge v_2 \wedge \dots \wedge v_N} = \frac{x \wedge *(v_1)}{\omega}, \]

where the "division" of one tensor by another is to be understood as follows: We first compute the tensor $x \wedge *(v_1)$; this tensor is proportional to the tensor $\omega$ since both belong to the one-dimensional space $\wedge^N V$, so we can determine the number $\lambda$ such that $x \wedge *(v_1) = \lambda \omega$; the proportionality coefficient $\lambda$ is then the result of the division of $x \wedge *(v_1)$ by $\omega$.

For $v_2$ we have

\[ v_1 \wedge x \wedge v_3 \wedge \dots \wedge v_N = x_2\, \omega = v_2^*(x)\, \omega. \]

If we would like to have $x_2\, \omega = x \wedge *(v_2)$, we need to add an extra minus sign and define

\[ *(v_2) \equiv -v_1 \wedge v_3 \wedge \dots \wedge v_N. \]

Then we indeed obtain $v_2^*(x)\, \omega = x \wedge *(v_2)$.

It is then clear that we can define the tensors $*(v_i)$ for $i = 1, \dots, N$ in this way. The tensor $*(v_i)$ is obtained from $\omega$ by removing the vector $v_i$ and by adding a sign that corresponds to shifting the vector $v_i$ to the left position in the exterior product. The "complement" map, $* : V \to \wedge^{N-1} V$, satisfies $v_j \wedge *(v_j) = \omega$ for each basis vector $v_j$. (Once defined on the basis vectors, the complement map can be then extended to all vectors from $V$ by requiring linearity. However, we will apply the complement operation only to basis vectors right now.)

With these definitions, we may express the dual basis as

\[ v_i^*(x)\, \omega = x \wedge *(v_i), \qquad x \in V, \quad i = 1, \dots, N. \]

Remark: The notation $*(v_i)$ suggests that e.g. $*(v_1)$ is some operation applied to $v_1$ and is a function only of the vector $v_1$, but this is not so: The "complement" of a vector depends on the entire basis and not merely on the single vector! Also, the property $v_1 \wedge *(v_1) = \omega$ is not sufficient to define the tensor $*(v_1)$. The proper definition of $*(v_i)$ is the tensor obtained from $\omega$ by removing $v_i$ as just explained.

Example: In the space $\mathbb{R}^2$, let us compute the dual basis to the basis $\{v_1, v_2\}$ where $v_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $v_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}$.

Denote by $e_1$ and $e_2$ the standard basis vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. We first compute the 2-vector

\[ \omega = v_1 \wedge v_2 = (2e_1 + e_2) \wedge (-e_1 + e_2) = 3\, e_1 \wedge e_2. \]
The complement operation for the basis {v1 , v2 } gives (v1 ) =
with a zero coefficient x1 . This suggests that the exterior product
v2 and (v2 ) = v1 . We now define the covectors v1,2 by their
of x with the (N 1)-vector v2 ... vN is quite similar to the
action on arbitrary vector x x1 e1 + x2 e2 ,
covector v1 we are looking for. Indeed, let us compute

x v2 ... vN = x1 v1 v2 ... vN = x1 . v1 (x) = x v2 = (x1 e1 + x2 e2 ) (e1 + e2 )


x1 + x2
Therefore, exterior multiplication with v2 ... vN acts quite = (x1 + x2 ) e1 e2 = ,
3
similarly to v1 . To make the notation more concise, let us intro- v2 (x) = x v1 = (x1 e1 + x2 e2 ) (2e1 + e2 )
duce a special complement operation3 denoted by a star:
x1 + 2x2
= (x1 + 2x2 ) e1 e2 = .
(v1 ) v2 ... vN . 3
3 The complement operation was introduced by H. Grassmann (1844). Therefore, v1 = 31 e1 + 31 e2 and v2 = 13 e1 + 32 e2 .
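The two-dimensional example can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `wedge2` and `dual_basis_2d` are hypothetical, and integer components are assumed so that exact fractions can be used. It evaluates each dual covector as the ratio of $\mathbf{x} \wedge *(\mathbf{v}_i)$ to $\omega$, both expressed as multiples of $\mathbf{e}_1 \wedge \mathbf{e}_2$.

```python
from fractions import Fraction

def wedge2(a, b):
    """Coefficient of e1 ^ e2 in a ^ b for two vectors of R^2."""
    return a[0] * b[1] - a[1] * b[0]

def dual_basis_2d(v1, v2):
    """Return the covectors v1*, v2* as Python functions (integer inputs assumed)."""
    omega = wedge2(v1, v2)  # v1 ^ v2 = omega * (e1 ^ e2)
    if omega == 0:
        raise ValueError("v1 and v2 do not form a basis")

    def v1_star(x):
        # *(v1) = v2, so v1*(x) = (x ^ v2) / omega
        return Fraction(wedge2(x, v2), omega)

    def v2_star(x):
        # *(v2) = -v1, so v2*(x) = -(x ^ v1) / omega
        return Fraction(-wedge2(x, v1), omega)

    return v1_star, v2_star

v1, v2 = (2, 1), (-1, 1)
v1_star, v2_star = dual_basis_2d(v1, v2)

# Components of the covectors in the basis dual to {e1, e2}
print(v1_star((1, 0)), v1_star((0, 1)))
print(v2_star((1, 0)), v2_star((0, 1)))
```

Applying the returned functions to $\mathbf{v}_1$ and $\mathbf{v}_2$ themselves is a quick way to confirm the defining property $\mathbf{v}_i^*(\mathbf{v}_j) = \delta_{ij}$.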


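Question: Can we define the complement operation for all $\mathbf{x} \in V$ by the equation $\mathbf{x} \wedge *(\mathbf{x}) = \omega$ where $\omega \in \wedge^N V$ is a fixed tensor? Does the complement really depend on the entire basis? Or perhaps a choice of $\omega$ is sufficient?

Answer: No, yes, no. Firstly, $*(\mathbf{x})$ is not uniquely specified by that equation alone, since $\mathbf{x} \wedge A = \omega$ defines $A$ only up to tensors of the form $\mathbf{x} \wedge ...$; secondly, the equation $\mathbf{x} \wedge *(\mathbf{x}) = \omega$ indicates that $*(\lambda\mathbf{x}) = \frac{1}{\lambda} *(\mathbf{x})$, so the complement map would not be linear if defined like that. It is important to keep in mind that the complement map requires an entire basis for its definition and depends not only on the choice of a tensor $\omega$, but also on the choice of all the basis vectors. For example, in two dimensions we have $*(\mathbf{e}_1) = \mathbf{e}_2$; it is clear that $*(\mathbf{e}_1)$ depends on the choice of $\mathbf{e}_2$!

Remark: The situation is different when the vector space is equipped with a scalar product (see Sec. 5.4.2 below). In that case, one usually chooses an orthonormal basis to define the complement map; then the complement map is called the Hodge star. It turns out that the Hodge star is independent of the choice of the basis as long as the basis is orthonormal with respect to the given scalar product, and as long as the orientation of the basis is unchanged (i.e. as long as the tensor $\omega$ does not change sign). In other words, the Hodge star operation is invariant under orthogonal and orientation-preserving transformations of the basis; these transformations preserve the tensor $\omega$. So the Hodge star operation depends not quite on the detailed choice of the basis, but rather on the choice of the scalar product and on the orientation of the basis (the sign of $\omega$). However, right now we are working with a general space without a scalar product. In this case, the complement map depends on the entire basis.

2.3.4 Gaussian elimination

Question: How much computational effort is actually needed to compute the exterior product of $n$ vectors? It looks easy in two or three dimensions, but in $N$ dimensions the product of $n$ vectors $\{\mathbf{x}_1, ..., \mathbf{x}_n\}$ gives expressions such as

$$\bigwedge_{i=1}^{n} \mathbf{x}_i = (x_{11}\mathbf{e}_1 + ... + x_{1N}\mathbf{e}_N) \wedge ... \wedge (x_{n1}\mathbf{e}_1 + ... + x_{nN}\mathbf{e}_N),$$

which will be reduced to an exponentially large number (of order $N^n$) of elementary tensor products when we expand all brackets.

Answer: Of course, expanding all brackets is not the best way to compute long exterior products. We can instead use a procedure similar to the Gaussian elimination for computing determinants. The key observation is that

$$\mathbf{x}_1 \wedge \mathbf{x}_2 \wedge ... = \mathbf{x}_1 \wedge (\mathbf{x}_2 - \lambda\mathbf{x}_1) \wedge ...$$

for any number $\lambda$, and that it is easy to compute an exterior product of the form

$$(\alpha_1\mathbf{e}_1 + \alpha_2\mathbf{e}_2 + \alpha_3\mathbf{e}_3) \wedge (\beta_2\mathbf{e}_2 + \beta_3\mathbf{e}_3) \wedge \mathbf{e}_3 = \alpha_1\beta_2\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3.$$

It is easy to compute this exterior product because the second vector $(\beta_2\mathbf{e}_2 + \beta_3\mathbf{e}_3)$ does not contain the basis vector $\mathbf{e}_1$ and the third vector does not contain $\mathbf{e}_1$ or $\mathbf{e}_2$. So we can simplify the computation of a long exterior product if we rewrite

$$\bigwedge_{i=1}^{n} \mathbf{x}_i = \mathbf{x}_1 \wedge \tilde{\mathbf{x}}_2 \wedge ... \wedge \tilde{\mathbf{x}}_n \equiv \mathbf{x}_1 \wedge (\mathbf{x}_2 - \lambda_{11}\mathbf{x}_1) \wedge ... \wedge (\mathbf{x}_n - \lambda_{n-1,1}\mathbf{x}_1 - ... - \lambda_{n-1,n-1}\mathbf{x}_{n-1}),$$

where the coefficients $\{\lambda_{ij} \mid 1 \le i \le n-1,\ 1 \le j \le i\}$ are chosen appropriately such that the vector $\tilde{\mathbf{x}}_2 \equiv \mathbf{x}_2 - \lambda_{11}\mathbf{x}_1$ does not contain the basis vector $\mathbf{e}_1$, and generally the vector

$$\tilde{\mathbf{x}}_k \equiv \mathbf{x}_k - \lambda_{k-1,1}\mathbf{x}_1 - ... - \lambda_{k-1,k-1}\mathbf{x}_{k-1}$$

does not contain the basis vectors $\mathbf{e}_1, ..., \mathbf{e}_{k-1}$. (That is, these basis vectors have been "eliminated" from the vector $\mathbf{x}_k$, hence the name of the method.) Eliminating $\mathbf{e}_1$ from $\mathbf{x}_2$ can be done with $\lambda_{11} = x_{21}/x_{11}$, which is possible provided that $x_{11} \neq 0$; if $x_{11} = 0$, we need to renumber the vectors $\{\mathbf{x}_j\}$. If none of them contains $\mathbf{e}_1$, we skip $\mathbf{e}_1$ and proceed with $\mathbf{e}_2$ instead. Elimination of other basis vectors proceeds similarly. After performing this algorithm, we will either find that some vector $\tilde{\mathbf{x}}_k$ is itself zero, which means that the entire exterior product vanishes, or we will find the product of vectors of the form

$$\tilde{\mathbf{x}}_1 \wedge ... \wedge \tilde{\mathbf{x}}_n,$$

where the vectors $\tilde{\mathbf{x}}_i$ are linear combinations of $\mathbf{e}_i, ..., \mathbf{e}_N$ (not containing $\mathbf{e}_1, ..., \mathbf{e}_{i-1}$).

If $n = N$, the product can be evaluated immediately since the last vector, $\tilde{\mathbf{x}}_N$, is proportional to $\mathbf{e}_N$, so

$$\tilde{\mathbf{x}}_1 \wedge ... \wedge \tilde{\mathbf{x}}_n = (c_{11}\mathbf{e}_1 + ...) \wedge ... \wedge (c_{nn}\mathbf{e}_N) = c_{11}c_{22}...c_{nn}\,\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N.$$

The computation is somewhat longer if $n < N$, so that

$$\tilde{\mathbf{x}}_n = c_{nn}\mathbf{e}_n + ... + c_{nN}\mathbf{e}_N.$$

In that case, we may eliminate, say, $\mathbf{e}_n$ from $\tilde{\mathbf{x}}_1, ..., \tilde{\mathbf{x}}_{n-1}$ by subtracting a multiple of $\tilde{\mathbf{x}}_n$ from them, but we cannot simplify the product any more; at that point we need to expand the last bracket (containing $\tilde{\mathbf{x}}_n$) and write out the terms.

Example 1: We will calculate the exterior product

$$\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} \equiv (7\mathbf{e}_1 - 8\mathbf{e}_2 + \mathbf{e}_3) \wedge (\mathbf{e}_1 - 2\mathbf{e}_2 - 15\mathbf{e}_3) \wedge (2\mathbf{e}_1 - 5\mathbf{e}_2 - \mathbf{e}_3).$$

We will eliminate $\mathbf{e}_1$ from $\mathbf{a}$ and $\mathbf{c}$ (just to keep the coefficients simpler):

$$\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} = (\mathbf{a} - 7\mathbf{b}) \wedge \mathbf{b} \wedge (\mathbf{c} - 2\mathbf{b}) = (6\mathbf{e}_2 + 106\mathbf{e}_3) \wedge \mathbf{b} \wedge (-\mathbf{e}_2 + 29\mathbf{e}_3) \equiv \mathbf{a}_1 \wedge \mathbf{b} \wedge \mathbf{c}_1.$$

Now we eliminate $\mathbf{e}_2$ from $\mathbf{a}_1$, and then the product can be evaluated quickly:

$$\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} = \mathbf{a}_1 \wedge \mathbf{b} \wedge \mathbf{c}_1 = (\mathbf{a}_1 + 6\mathbf{c}_1) \wedge \mathbf{b} \wedge \mathbf{c}_1 = (280\mathbf{e}_3) \wedge (\mathbf{e}_1 - 2\mathbf{e}_2 - 15\mathbf{e}_3) \wedge (-\mathbf{e}_2 + 29\mathbf{e}_3)$$

$$= 280\,\mathbf{e}_3 \wedge \mathbf{e}_1 \wedge (-\mathbf{e}_2) = -280\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3.$$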
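The elimination procedure for the case $n = N$ can be sketched in a few lines of code. This is an illustrative implementation under stated assumptions, not the book's own: the function name `top_wedge_coefficient` is hypothetical, vectors are given as lists of rational components in the basis $\{\mathbf{e}_j\}$, and the "renumbering" of vectors is realized as a swap of two factors, which changes the sign of the product.

```python
from fractions import Fraction

def top_wedge_coefficient(vectors):
    """Return c such that x_1 ^ ... ^ x_N = c * e_1 ^ ... ^ e_N (case n = N)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n = len(rows)
    if any(len(r) != n for r in rows):
        raise ValueError("expected N vectors with N components each")
    coeff = Fraction(1)
    for k in range(n):
        # Renumber: find a remaining vector that still contains e_k.
        pivot = next((i for i in range(k, n) if rows[i][k] != 0), None)
        if pivot is None:
            # The remaining vectors lie in a space of too small a dimension,
            # so they are linearly dependent and the product vanishes.
            return Fraction(0)
        if pivot != k:
            rows[k], rows[pivot] = rows[pivot], rows[k]
            coeff = -coeff  # exchanging two factors flips the sign
        # Eliminate e_k from the later vectors: x_i -> x_i - lambda * x_k.
        for i in range(k + 1, n):
            lam = rows[i][k] / rows[k][k]
            rows[i] = [a - lam * b for a, b in zip(rows[i], rows[k])]
        coeff *= rows[k][k]  # accumulate the product c_11 c_22 ... c_NN
    return coeff

# The three vectors a, b, c as stated in Example 1
a = [7, -8, 1]
b = [1, -2, -15]
c = [2, -5, -1]
print(top_wedge_coefficient([a, b, c]))  # coefficient of e1 ^ e2 ^ e3
```

Exact `Fraction` arithmetic is used so that a vanishing product is detected reliably; with floating-point numbers the test `!= 0` would need a tolerance.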


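Example 2: Consider

$$\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} \equiv (\mathbf{e}_1 + 2\mathbf{e}_2 - \mathbf{e}_3 + \mathbf{e}_4) \wedge (2\mathbf{e}_1 + \mathbf{e}_2 - \mathbf{e}_3 + 3\mathbf{e}_4) \wedge (-\mathbf{e}_1 - \mathbf{e}_2 + \mathbf{e}_4).$$

We eliminate $\mathbf{e}_1$ and $\mathbf{e}_2$:

$$\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} = \mathbf{a} \wedge (\mathbf{b} - 2\mathbf{a}) \wedge (\mathbf{c} + \mathbf{a}) = \mathbf{a} \wedge (-3\mathbf{e}_2 + \mathbf{e}_3 + \mathbf{e}_4) \wedge (\mathbf{e}_2 - \mathbf{e}_3 + 2\mathbf{e}_4)$$

$$\equiv \mathbf{a} \wedge \mathbf{b}_1 \wedge \mathbf{c}_1 = \mathbf{a} \wedge \mathbf{b}_1 \wedge (\mathbf{c}_1 + \tfrac{1}{3}\mathbf{b}_1) = \mathbf{a} \wedge \mathbf{b}_1 \wedge (-\tfrac{2}{3}\mathbf{e}_3 + \tfrac{7}{3}\mathbf{e}_4) \equiv \mathbf{a} \wedge \mathbf{b}_1 \wedge \mathbf{c}_2.$$

We can now eliminate $\mathbf{e}_3$ from $\mathbf{a}$ and $\mathbf{b}_1$:

$$\mathbf{a} \wedge \mathbf{b}_1 \wedge \mathbf{c}_2 = (\mathbf{a} - \tfrac{3}{2}\mathbf{c}_2) \wedge (\mathbf{b}_1 + \tfrac{3}{2}\mathbf{c}_2) \wedge \mathbf{c}_2 \equiv \mathbf{a}_2 \wedge \mathbf{b}_2 \wedge \mathbf{c}_2$$

$$= (\mathbf{e}_1 + 2\mathbf{e}_2 - \tfrac{5}{2}\mathbf{e}_4) \wedge (-3\mathbf{e}_2 + \tfrac{9}{2}\mathbf{e}_4) \wedge (-\tfrac{2}{3}\mathbf{e}_3 + \tfrac{7}{3}\mathbf{e}_4).$$

Now we cannot eliminate any more vectors, so we expand the last bracket and simplify the result by omitting the products of equal vectors:

$$\mathbf{a}_2 \wedge \mathbf{b}_2 \wedge \mathbf{c}_2 = -\tfrac{2}{3}\,\mathbf{a}_2 \wedge \mathbf{b}_2 \wedge \mathbf{e}_3 + \tfrac{7}{3}\,\mathbf{a}_2 \wedge \mathbf{b}_2 \wedge \mathbf{e}_4$$

$$= -\tfrac{2}{3}\,(\mathbf{e}_1 + 2\mathbf{e}_2 - \tfrac{5}{2}\mathbf{e}_4) \wedge (-3\mathbf{e}_2 + \tfrac{9}{2}\mathbf{e}_4) \wedge \mathbf{e}_3 + \tfrac{7}{3}\,\mathbf{e}_1 \wedge (-3\mathbf{e}_2) \wedge \mathbf{e}_4$$

$$= 2\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 + 3\,\mathbf{e}_1 \wedge \mathbf{e}_3 \wedge \mathbf{e}_4 + \mathbf{e}_2 \wedge \mathbf{e}_3 \wedge \mathbf{e}_4 - 7\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_4.$$

2.3.5 Rank of a set of vectors

We have defined the rank of a map (Sec. 1.8.4) as the dimension of the image of the map, and we have seen that the rank is equal to the minimum number of tensor product terms needed to represent the map as a tensor. An analogous concept can be introduced for sets of vectors.

Definition: If $S = \{\mathbf{v}_1, ..., \mathbf{v}_n\}$ is a set of vectors (where $n$ is not necessarily smaller than the dimension $N$ of space), the rank of the set $S$ is the dimension of the subspace spanned by the vectors $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$. Written as a formula,

$$\operatorname{rank}(S) = \dim \operatorname{Span} S.$$

The rank of a set $S$ is equal to the maximum number of vectors in any linearly independent subset of $S$. For example, consider the set $\{0, \mathbf{v}, 2\mathbf{v}, 3\mathbf{v}\}$ where $\mathbf{v} \neq 0$. The rank of this set is 1 since these four vectors span a one-dimensional subspace,

$$\operatorname{Span}\{0, \mathbf{v}, 2\mathbf{v}, 3\mathbf{v}\} = \operatorname{Span}\{\mathbf{v}\}.$$

Any subset of $S$ having two or more vectors is linearly dependent.

We will now show how to use the exterior product for computing the rank of a given (finite) set $S = \{\mathbf{v}_1, ..., \mathbf{v}_n\}$.

According to Theorem 1 in Sec. 2.3.2, the set $S$ is linearly independent if and only if $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_n \neq 0$. So we first compute the tensor $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_n$. If this tensor is nonzero then the set $S$ is linearly independent, and the rank of $S$ is equal to $n$. If, on the other hand, $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_n = 0$, the rank is less than $n$. We can determine the rank of $S$ by the following procedure. First, we assume that all $\mathbf{v}_j \neq 0$ (any zero vectors can be omitted without changing the rank of $S$). Then we compute $\mathbf{v}_1 \wedge \mathbf{v}_2$; if the result is zero, we may omit $\mathbf{v}_2$ since $\mathbf{v}_2$ is proportional to $\mathbf{v}_1$ and try $\mathbf{v}_1 \wedge \mathbf{v}_3$. If $\mathbf{v}_1 \wedge \mathbf{v}_2 \neq 0$, we try $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3$, and so on. The procedure can be formulated using induction in the obvious way. Eventually we will arrive at a subset $\{\mathbf{v}_{i_1}, ..., \mathbf{v}_{i_k}\} \subset S$ such that $\mathbf{v}_{i_1} \wedge ... \wedge \mathbf{v}_{i_k} \neq 0$ but $\mathbf{v}_{i_1} \wedge ... \wedge \mathbf{v}_{i_k} \wedge \mathbf{v}_j = 0$ for any other $\mathbf{v}_j$. Thus, there are no linearly independent subsets of $S$ having $k + 1$ or more vectors. Then the rank of $S$ is equal to $k$.

The subset $\{\mathbf{v}_{i_1}, ..., \mathbf{v}_{i_k}\}$ is built by a procedure that depends on the order in which the vectors $\mathbf{v}_j$ are selected. However, the next statement says that the resulting subspace spanned by $\{\mathbf{v}_{i_1}, ..., \mathbf{v}_{i_k}\}$ is the same regardless of the order of vectors $\mathbf{v}_j$. Hence, the subset $\{\mathbf{v}_{i_1}, ..., \mathbf{v}_{i_k}\}$ yields a basis in $\operatorname{Span} S$.

Statement: Suppose a set $S$ of vectors has rank $k$ and contains two different linearly independent subsets, say $S_1 = \{\mathbf{v}_1, ..., \mathbf{v}_k\}$ and $S_2 = \{\mathbf{u}_1, ..., \mathbf{u}_k\}$, both having $k$ vectors (but no linearly independent subsets having $k + 1$ or more vectors). Then the tensors $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k$ and $\mathbf{u}_1 \wedge ... \wedge \mathbf{u}_k$ are proportional to each other (as tensors from $\wedge^k V$).

Proof: The tensors $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k$ and $\mathbf{u}_1 \wedge ... \wedge \mathbf{u}_k$ are both nonzero by Theorem 1 in Sec. 2.3.2. We will now show that it is possible to replace $\mathbf{v}_1$ by one of the vectors from the set $S_2$, say $\mathbf{u}_l$, such that the new tensor $\mathbf{u}_l \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_k$ is nonzero and proportional to the original tensor $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k$. It will follow that this procedure can be repeated for every other vector $\mathbf{v}_i$, until we replace all $\mathbf{v}_i$'s by some $\mathbf{u}_i$'s and thus prove that the tensors $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k$ and $\mathbf{u}_1 \wedge ... \wedge \mathbf{u}_k$ are proportional to each other.

It remains to prove that the vector $\mathbf{v}_1$ can be replaced. We need to find a suitable vector $\mathbf{u}_l$. Let $\mathbf{u}_l$ be one of the vectors from $S_2$, and let us check whether $\mathbf{v}_1$ could be replaced by $\mathbf{u}_l$. We first note that $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k \wedge \mathbf{u}_l = 0$ since there are no linearly independent subsets of $S$ having $k + 1$ vectors. Hence the set $\{\mathbf{v}_1, ..., \mathbf{v}_k, \mathbf{u}_l\}$ is linearly dependent. It follows (since the set $\{\mathbf{v}_i \mid i = 1, ..., k\}$ was linearly independent before we added $\mathbf{u}_l$ to it) that $\mathbf{u}_l$ can be expressed as a linear combination of the $\mathbf{v}_i$'s with some coefficients $\alpha_i$:

$$\mathbf{u}_l = \alpha_1\mathbf{v}_1 + ... + \alpha_k\mathbf{v}_k.$$

If $\alpha_1 \neq 0$ then we will have

$$\mathbf{u}_l \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_k = \alpha_1\,\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_k.$$

The new tensor is nonzero and proportional to the old tensor, so we can replace $\mathbf{v}_1$ by $\mathbf{u}_l$.

However, it could also happen that $\alpha_1 = 0$. In that case we need to choose a different vector $\mathbf{u}_{l'} \in S_2$ such that the corresponding coefficient $\alpha_1$ is nonzero. It remains to prove that such a choice is possible. If this were impossible then all $\mathbf{u}_i$'s would have been expressible as linear combinations of $\mathbf{v}_i$'s with zero coefficients at the vector $\mathbf{v}_1$. In that case, the exterior product $\mathbf{u}_1 \wedge ... \wedge \mathbf{u}_k$ would be equal to a linear combination of exterior products of vectors $\mathbf{v}_i$ with $i = 2, ..., k$. These exterior products contain $k$ vectors among which only $(k - 1)$ vectors are different. Such exterior products are all equal to zero. However, this contradicts the assumption $\mathbf{u}_1 \wedge ... \wedge \mathbf{u}_k \neq 0$. Therefore, at least one vector $\mathbf{u}_l$ exists such that $\alpha_1 \neq 0$, and the required replacement is always possible. ■

Remark: It follows from the above Statement that the subspace spanned by $S$ can be uniquely characterized by a nonzero tensor such as $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k$ in which the constituents (the vectors $\mathbf{v}_1, ..., \mathbf{v}_k$) form a basis in the subspace $\operatorname{Span} S$. It does not matter which linearly independent subset we choose for this purpose. We also have a computational procedure for determining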


the subspace $\operatorname{Span} S$ together with its dimension. Thus, we find that a $k$-dimensional subspace is adequately specified by selecting a nonzero tensor $\omega \in \wedge^k V$ of the form $\omega = \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k$. For a given subspace, this tensor $\omega$ is unique up to a nonzero constant factor. Of course, the decomposition of $\omega$ into an exterior product of vectors $\{\mathbf{v}_i \mid i = 1, ..., k\}$ is not unique, but any such decomposition yields a set $\{\mathbf{v}_i \mid i = 1, ..., k\}$ spanning the same subspace.

Exercise 1: Let $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ be a linearly independent set of vectors, $\omega \equiv \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_n \neq 0$, and $\mathbf{x}$ be a given vector such that $\omega \wedge \mathbf{x} = 0$. Show that $\mathbf{x}$ belongs to the subspace $\operatorname{Span}\{\mathbf{v}_1, ..., \mathbf{v}_n\}$.

Exercise 2: Given a nonzero covector $\mathbf{f}^*$ and a vector $\mathbf{n}$ such that $\mathbf{f}^*(\mathbf{n}) \neq 0$, show that the operator $\hat{P}$ defined by

$$\hat{P}\mathbf{x} = \mathbf{x} - \mathbf{n}\,\frac{\mathbf{f}^*(\mathbf{x})}{\mathbf{f}^*(\mathbf{n})}$$

is a projector onto the subspace $\mathbf{f}^{*\perp}$, i.e. that $\mathbf{f}^*(\hat{P}\mathbf{x}) = 0$ for all $\mathbf{x} \in V$. Show that

$$(\hat{P}\mathbf{x}) \wedge \mathbf{n} = \mathbf{x} \wedge \mathbf{n}, \quad \forall \mathbf{x} \in V.$$

2.3.6 Exterior product in index notation

Here I show how to perform calculations with the exterior product using the index notation (see Sec. 1.9), although I will not use this later because the index-free notation is more suitable for the purposes of this book.

Let us choose a basis $\{\mathbf{e}_j\}$ in $V$; then the dual basis $\{\mathbf{e}_j^*\}$ in $V^*$ and the basis $\{\mathbf{e}_{k_1} \wedge ... \wedge \mathbf{e}_{k_m}\}$ in $\wedge^m V$ are fixed. By definition, the exterior product of two vectors $\mathbf{u}$ and $\mathbf{v}$ is

$$A \equiv \mathbf{u} \wedge \mathbf{v} = \mathbf{u} \otimes \mathbf{v} - \mathbf{v} \otimes \mathbf{u},$$

therefore it is written in the index notation as $A^{ij} = u^i v^j - u^j v^i$. Note that the matrix $A^{ij}$ is antisymmetric: $A^{ij} = -A^{ji}$.

Another example: The 3-vector $\mathbf{u} \wedge \mathbf{v} \wedge \mathbf{w}$ can be expanded in the basis as

$$\mathbf{u} \wedge \mathbf{v} \wedge \mathbf{w} = \sum_{i,j,k=1}^{N} B^{ijk}\,\mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{e}_k.$$

What is the relation between the components $u^i$, $v^i$, $w^i$ of the vectors and the components $B^{ijk}$? A direct calculation yields

$$B^{ijk} = u^i v^j w^k - u^i v^k w^j + u^k v^i w^j - u^k v^j w^i + u^j v^k w^i - u^j v^i w^k. \quad (2.10)$$

In other words, every permutation of the set $(i, j, k)$ of indices enters with the sign corresponding to the parity of that permutation.

Remark: Readers familiar with the standard definition of the matrix determinant will recognize a formula quite similar to the determinant of a $3 \times 3$ matrix. The connection between determinants and exterior products will be fully elucidated in Chapter 3.

Remark: The three-dimensional array $B^{ijk}$ is antisymmetric with respect to any pair of indices:

$$B^{ijk} = -B^{jik} = -B^{ikj} = ...$$

Such arrays are called totally antisymmetric. ■

The formula (2.10) for the components $B^{ijk}$ of $\mathbf{u} \wedge \mathbf{v} \wedge \mathbf{w}$ is not particularly convenient and cannot be easily generalized. We will now rewrite Eq. (2.10) in a different form that will be more suitable for expressing exterior products of arbitrary tensors.

Let us first consider the exterior product of three vectors as a map $\hat{E} : V \otimes V \otimes V \to \wedge^3 V$. This map is linear and can be represented, in the index notation, in the following way:

$$u^i v^j w^k \mapsto (\mathbf{u} \wedge \mathbf{v} \wedge \mathbf{w})^{ijk} = \sum_{l,m,n} E^{ijk}_{lmn}\, u^l v^m w^n,$$

where the array $E^{ijk}_{lmn}$ is the component representation of the map $\hat{E}$. Comparing with the formula (2.10), we find that $E^{ijk}_{lmn}$ can be expressed through the Kronecker $\delta$-symbol as

$$E^{ijk}_{lmn} = \delta^i_l \delta^j_m \delta^k_n - \delta^i_l \delta^k_m \delta^j_n + \delta^k_l \delta^i_m \delta^j_n - \delta^k_l \delta^j_m \delta^i_n + \delta^j_l \delta^k_m \delta^i_n - \delta^j_l \delta^i_m \delta^k_n.$$

It is now clear that the exterior product of two vectors can be also written as

$$(\mathbf{u} \wedge \mathbf{v})^{ij} = \sum_{l,m} E^{ij}_{lm}\, u^l v^m,$$

where

$$E^{ij}_{lm} = \delta^i_l \delta^j_m - \delta^j_l \delta^i_m.$$

By analogy, the map $\hat{E} : V \otimes ... \otimes V \to \wedge^n V$ (for $2 \le n \le N$) can be represented in the index notation by the array of components $E^{i_1...i_n}_{j_1...j_n}$. This array is totally antisymmetric with respect to all the indices $\{i_s\}$ and separately with respect to all $\{j_s\}$. Using this array, the exterior product of two general antisymmetric tensors, say $\phi \in \wedge^m V$ and $\psi \in \wedge^n V$, such that $m + n \le N$, can be represented in the index notation by

$$(\phi \wedge \psi)^{i_1...i_{m+n}} = \frac{1}{m!\,n!} \sum_{(j_s, k_s)} E^{i_1...i_{m+n}}_{j_1...j_m k_1...k_n}\, \phi^{j_1...j_m} \psi^{k_1...k_n}.$$

The combinatorial factor $m!\,n!$ is needed to compensate for the $m!$ equal terms arising from the summation over $(j_1, ..., j_m)$ due to the fact that $\phi^{j_1...j_m}$ is totally antisymmetric, and similarly for the $n!$ equal terms arising from the summation over $(k_1, ..., k_n)$.

It is useful to have a general formula for the array $E^{i_1...i_n}_{j_1...j_n}$. One way to define it is

$$E^{i_1...i_n}_{j_1...j_n} = \begin{cases} (-1)^{|\sigma|} & \text{if } (i_1, ..., i_n) \text{ is a permutation } \sigma \text{ of } (j_1, ..., j_n); \\ 0 & \text{otherwise.} \end{cases}$$

We will now show how one can express $E^{i_1...i_n}_{j_1...j_n}$ through the Levi-Civita symbol $\varepsilon$.

The Levi-Civita symbol is defined as a totally antisymmetric array with $N$ indices, whose values are $0$ or $\pm 1$ according to the formula

$$\varepsilon^{i_1...i_N} = \begin{cases} (-1)^{|\sigma|} & \text{if } (i_1, ..., i_N) \text{ is a permutation } \sigma \text{ of } (1, ..., N); \\ 0 & \text{otherwise.} \end{cases}$$

Comparing this with the definition of $E^{i_1...i_n}_{j_1...j_n}$, we notice that

$$\varepsilon^{i_1...i_N} = E^{i_1...i_N}_{1...N}.$$

Depending on convenience, we may write $\varepsilon$ with upper or lower indices since $\varepsilon$ is just an array of numbers in this calculation.

In order to express $E^{i_1...i_n}_{j_1...j_n}$ through $\varepsilon^{i_1...i_N}$, we obviously need to use at least two copies of $\varepsilon$, one with upper and one with lower indices. Let us therefore consider the expression

$$\tilde{E}^{i_1...i_n}_{j_1...j_n} \equiv \sum_{k_1, ..., k_{N-n}} \varepsilon^{i_1...i_n k_1...k_{N-n}}\, \varepsilon_{j_1...j_n k_1...k_{N-n}}, \quad (2.11)$$


where the summation is performed only over the $N - n$ indices $\{k_s\}$. This expression has $2n$ free indices $i_1, ..., i_n$ and $j_1, ..., j_n$, and is totally antisymmetric in these free indices (since $\varepsilon$ is totally antisymmetric in all indices).

Statement: The exterior product operator $E^{i_1...i_n}_{j_1...j_n}$ is expressed through the Levi-Civita symbol as

$$E^{i_1...i_n}_{j_1...j_n} = \frac{1}{(N-n)!}\, \tilde{E}^{i_1...i_n}_{j_1...j_n}, \quad (2.12)$$

where $\tilde{E}$ is defined by Eq. (2.11).

Proof: Let us compare the values of $E^{i_1...i_n}_{j_1...j_n}$ and $\tilde{E}^{i_1...i_n}_{j_1...j_n}$, where the indices $\{i_s\}$ and $\{j_s\}$ have some fixed values. There are two cases: either the set $(i_1, ..., i_n)$ is a permutation of the set $(j_1, ..., j_n)$, in which case we may denote this permutation by $\sigma$; or $(i_1, ..., i_n)$ is not a permutation of $(j_1, ..., j_n)$.

Considering the case when a permutation $\sigma$ brings $(j_1, ..., j_n)$ into $(i_1, ..., i_n)$, we find that the symbols $\varepsilon$ in Eq. (2.11) will be nonzero only if the indices $(k_1, ..., k_{N-n})$ are a permutation of the complement of the set $(i_1, ..., i_n)$. There are $(N-n)!$ such permutations, each contributing the same value to the sum in Eq. (2.11). Hence, we may write (footnote 4) the sum as

$$\tilde{E}^{i_1...i_n}_{j_1...j_n} = (N-n)!\; \varepsilon^{i_1...i_n k_1...k_{N-n}}\, \varepsilon_{j_1...j_n k_1...k_{N-n}} \quad \text{(no sums!)},$$

where the indices $\{k_s\}$ are chosen such that the values of $\varepsilon$ are nonzero. Since

$$\sigma(j_1, ..., j_n) = (i_1, ..., i_n),$$

we may permute the first $n$ indices in $\varepsilon_{j_1...j_n k_1...k_{N-n}}$:

$$\tilde{E}^{i_1...i_n}_{j_1...j_n} = (N-n)!\,(-1)^{|\sigma|}\, \varepsilon^{i_1...i_n k_1...k_{N-n}}\, \varepsilon_{i_1...i_n k_1...k_{N-n}} \quad \text{(no sums!)}$$

$$= (N-n)!\,(-1)^{|\sigma|}.$$

(In the last line, we replaced the squared $\varepsilon$ by 1.) Thus, the required formula for $\tilde{E}$ is valid in the first case.

In the case when $\sigma$ does not exist, we note that

$$\tilde{E}^{i_1...i_n}_{j_1...j_n} = 0,$$

because in that case one of the $\varepsilon$'s in Eq. (2.11) will have at least some indices equal and thus will be zero. Therefore $\tilde{E}$ and $E$ are equal to zero for the same sets of indices. ■

Footnote 4: In the equation above, I have put the warning "no sums" for clarity: A summation over all repeated indices is often implicitly assumed in the index notation.

Note that the formula for the top exterior power ($n = N$) is simple and involves no summations and no combinatorial factors:

$$E^{i_1...i_N}_{j_1...j_N} = \varepsilon^{i_1...i_N}\, \varepsilon_{j_1...j_N}.$$

Exercise: The operator $\hat{E} : V \otimes V \otimes V \to \wedge^3 V$ can be considered within the subspace $\wedge^3 V \subset V \otimes V \otimes V$, which yields an operator $\hat{E} : \wedge^3 V \to \wedge^3 V$. Show that in this subspace,

$$\hat{E} = 3!\; \hat{1}_{\wedge^3 V}.$$

Generalize to $\wedge^n V$ in the natural way.

Hint: Act with $\hat{E}$ on $\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}$.

Remark: As a rule, a summation of the Levi-Civita symbol $\varepsilon$ with any antisymmetric tensor (e.g. another $\varepsilon$) gives rise to a combinatorial factor $n!$ when the summation goes over $n$ indices.

2.3.7 * Exterior algebra (Grassmann algebra)

The formalism of exterior algebra is used e.g. in physical theories of quantum fermionic fields and supersymmetry.

Definition: An algebra is a vector space with a distributive multiplication. In other words, $A$ is an algebra if it is a vector space over a field $\mathbb{K}$ and if for any $a, b \in A$ their product $ab \in A$ is defined, such that $a(b + c) = ab + ac$ and $(a + b)c = ac + bc$ and $\lambda(ab) = (\lambda a)b = a(\lambda b)$ for $\lambda \in \mathbb{K}$. An algebra is called commutative if $ab = ba$ for all $a, b$.

The properties of the multiplication in an algebra can be summarized by saying that for any fixed element $a \in A$, the transformations $x \mapsto ax$ and $x \mapsto xa$ are linear maps of the algebra into itself.

Examples of algebras:

1. All $N \times N$ matrices with coefficients from $\mathbb{K}$ are an $N^2$-dimensional algebra. The multiplication is defined by the usual matrix multiplication formula. This algebra is not commutative because not all matrices commute.

2. The field $\mathbb{K}$ is a one-dimensional algebra over itself. (Not a very exciting example.) This algebra is commutative.

Statement: If $\omega \in \wedge^m V$ then we can define the map $L_\omega : \wedge^k V \to \wedge^{k+m} V$ by the formula

$$L_\omega(\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k) \equiv \omega \wedge \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k.$$

For elements of $\wedge^0 V \equiv \mathbb{K}$, we define $L_\lambda \omega \equiv \lambda\omega$ and also $L_\omega \lambda \equiv \lambda\omega$ for any $\omega \in \wedge^k V$, $\lambda \in \mathbb{K}$. Then the map $L_\omega$ is linear for any $\omega \in \wedge^m V$, $0 \le m \le N$.

Proof: Left as exercise. ■

Definition: The exterior algebra (also called the Grassmann algebra) based on a vector space $V$ is the space $\wedge V$ defined as the direct sum,

$$\wedge V \equiv \mathbb{K} \oplus V \oplus \wedge^2 V \oplus ... \oplus \wedge^N V,$$

with the multiplication defined by the map $L$, which is extended to the whole of $\wedge V$ by linearity.

For example, if $\mathbf{u}, \mathbf{v} \in V$ then $1 + \mathbf{u} \in \wedge V$,

$$A \equiv 3 - \mathbf{v} + \mathbf{u} - 2\mathbf{v} \wedge \mathbf{u} \in \wedge V,$$

and

$$L_{1+\mathbf{u}} A = (1 + \mathbf{u}) \wedge (3 - \mathbf{v} + \mathbf{u} - 2\mathbf{v} \wedge \mathbf{u}) = 3 - \mathbf{v} + 4\mathbf{u} - \mathbf{v} \wedge \mathbf{u}.$$

Note that we still write the symbol $\wedge$ to denote multiplication in $\wedge V$ although now it is not necessarily anticommutative; for instance, $1 \wedge x = x \wedge 1 = x$ for any $x$ in this algebra.

Remark: The summation in expressions such as $1 + \mathbf{u}$ above is formal in the usual sense: $1 + \mathbf{u}$ is not a new vector or a new tensor, but an element of a new space. The exterior algebra is thus the space of formal linear combinations of numbers, vectors, 2-vectors, etc., all the way to $N$-vectors. ■

Since $\wedge V$ is a direct sum of $\wedge^0 V$, $\wedge^1 V$, etc., the elements of $\wedge V$ are sums of scalars, vectors, bivectors, etc., i.e. of objects having a definite "grade" (scalars being of grade 0, vectors of grade 1, and generally $k$-vectors being of grade $k$). It is easy to see


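that $k$-vectors and $l$-vectors either commute or anticommute, for instance

$$(\mathbf{a} \wedge \mathbf{b}) \wedge \mathbf{c} = \mathbf{c} \wedge (\mathbf{a} \wedge \mathbf{b}),$$
$$(\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}) \wedge 1 = 1 \wedge (\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}),$$
$$(\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}) \wedge \mathbf{d} = -\mathbf{d} \wedge (\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}).$$

The general law of commutation and anticommutation can be written as

$$\omega_k \wedge \omega_l = (-1)^{kl}\, \omega_l \wedge \omega_k,$$

where $\omega_k \in \wedge^k V$ and $\omega_l \in \wedge^l V$. However, it is important to note that sums of elements having different grades, such as $1 + \mathbf{a}$, are elements of $\wedge V$ that do not have a definite grade, because they do not belong to any single subspace $\wedge^k V \subset \wedge V$. Elements that do not have a definite grade can of course still be multiplied within $\wedge V$, but they neither commute nor anticommute, for example:

$$(1 + \mathbf{a}) \wedge (1 + \mathbf{b}) = 1 + \mathbf{a} + \mathbf{b} + \mathbf{a} \wedge \mathbf{b},$$
$$(1 + \mathbf{b}) \wedge (1 + \mathbf{a}) = 1 + \mathbf{a} + \mathbf{b} - \mathbf{a} \wedge \mathbf{b}.$$

So $\wedge V$ is a noncommutative (but associative) algebra. Nevertheless, the fact that elements of $\wedge V$ having a pure grade either commute or anticommute is important, so this kind of algebra is called a graded algebra.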
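The multiplication rules above can be tried out in a small computer model of $\wedge V$. The sketch below is only an illustration under stated assumptions: an element is stored as a dictionary from increasing tuples of basis indices to coefficients, the names `wedge` and `add` are hypothetical, and the vectors $\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}$ are taken to be the basis vectors $\mathbf{e}_1, ..., \mathbf{e}_4$ (so they are linearly independent).

```python
from itertools import product

def wedge(x, y):
    """Product in the exterior algebra.

    An element is a dict {increasing tuple of basis indices: coefficient};
    the empty tuple () labels the scalar (grade 0) part.
    """
    result = {}
    for (bx, cx), (by, cy) in product(x.items(), y.items()):
        if set(bx) & set(by):
            continue  # a repeated basis vector makes the term vanish
        merged = bx + by
        # Sign of the permutation that sorts the indices (parity of inversions).
        inversions = sum(
            1
            for i in range(len(merged))
            for j in range(i + 1, len(merged))
            if merged[i] > merged[j]
        )
        key = tuple(sorted(merged))
        result[key] = result.get(key, 0) + (-1) ** inversions * cx * cy
    return {k: c for k, c in result.items() if c != 0}

def add(x, y):
    """Formal sum of two elements of the exterior algebra."""
    out = dict(x)
    for k, c in y.items():
        out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

one = {(): 1}
a, b, c, d = ({(i,): 1} for i in (1, 2, 3, 4))  # assumed: a..d are e1..e4

left = wedge(add(one, a), add(one, b))   # (1 + a) ^ (1 + b)
right = wedge(add(one, b), add(one, a))  # (1 + b) ^ (1 + a)
print(left)
print(right)
```

The two printed elements differ only in the sign of the grade-2 term, which is the noncommutativity noted above; applying `wedge` to pure-grade elements such as `wedge(a, b)` and `c` in both orders illustrates the rule $(-1)^{kl}$.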
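Exercise 1: Compute the dimension of the algebra $\wedge V$ as a vector space, if $\dim V = N$.

Answer: $\dim(\wedge V) = \sum_{i=0}^{N} \binom{N}{i} = 2^N$.

Exercise 2: Suppose that an element $x \in \wedge V$ is a sum of elements of pure even grade, e.g. $x = 1 + \mathbf{a} \wedge \mathbf{b}$. Show that $x$ commutes with any other element of $\wedge V$.

Exercise 3: Compute $\exp(\mathbf{a})$ and $\exp(\mathbf{a} \wedge \mathbf{b} + \mathbf{c} \wedge \mathbf{d})$ by writing the Taylor series using the multiplication within the algebra $\wedge V$.

Hint: Simplify the expression $\exp(x) = 1 + x + \frac{1}{2} x \wedge x + ...$ for the particular $x$ as given.

Answer: $\exp(\mathbf{a}) = 1 + \mathbf{a}$;

$$\exp(\mathbf{a} \wedge \mathbf{b} + \mathbf{c} \wedge \mathbf{d}) = 1 + \mathbf{a} \wedge \mathbf{b} + \mathbf{c} \wedge \mathbf{d} + \mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} \wedge \mathbf{d}.$$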
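The answer to Exercise 3 follows from a short computation, sketched here as one possible route (it uses only $\mathbf{v} \wedge \mathbf{v} = 0$ for vectors and the fact that 2-vectors commute with each other):

```latex
% exp(a): every power beyond the first contains a ^ a = 0
\exp(\mathbf{a}) = 1 + \mathbf{a} + \tfrac{1}{2}\,\mathbf{a}\wedge\mathbf{a} + \dots = 1 + \mathbf{a}.

% exp(x) for x = a ^ b + c ^ d
x \wedge x
  = \underbrace{\mathbf{a}\wedge\mathbf{b}\wedge\mathbf{a}\wedge\mathbf{b}}_{=0}
  + \mathbf{a}\wedge\mathbf{b}\wedge\mathbf{c}\wedge\mathbf{d}
  + \mathbf{c}\wedge\mathbf{d}\wedge\mathbf{a}\wedge\mathbf{b}
  + \underbrace{\mathbf{c}\wedge\mathbf{d}\wedge\mathbf{c}\wedge\mathbf{d}}_{=0}
  = 2\,\mathbf{a}\wedge\mathbf{b}\wedge\mathbf{c}\wedge\mathbf{d},

x \wedge x \wedge x
  = 2\,\mathbf{a}\wedge\mathbf{b}\wedge\mathbf{c}\wedge\mathbf{d}
    \wedge(\mathbf{a}\wedge\mathbf{b} + \mathbf{c}\wedge\mathbf{d}) = 0,

\exp(x) = 1 + x + \tfrac{1}{2}\,x\wedge x
  = 1 + \mathbf{a}\wedge\mathbf{b} + \mathbf{c}\wedge\mathbf{d}
    + \mathbf{a}\wedge\mathbf{b}\wedge\mathbf{c}\wedge\mathbf{d}.
```

The Taylor series terminates because every term of third or higher order in $x$ repeats at least one of the four vectors.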

3 Basic applications
In this section we will consider finite-dimensional vector spaces $V$ without a scalar product. We will denote by $N$ the dimensionality of $V$, i.e. $N = \dim V$.

3.1 Determinants through permutations: the hard way

In textbooks on linear algebra, the following definition is found.

Definition D0: The determinant of a square $N \times N$ matrix $A_{ij}$ is the number

$$\det(A_{ij}) \equiv \sum_{\sigma} (-1)^{|\sigma|} A_{\sigma(1)1} ... A_{\sigma(N)N}, \quad (3.1)$$

where the summation goes over all permutations $\sigma : (1, ..., N) \mapsto (k_1, ..., k_N)$ of the ordered set $(1, ..., N)$, and the parity function $|\sigma|$ is equal to 0 if the permutation $\sigma$ is even and to 1 if it is odd. (An even permutation is reducible to an even number of elementary exchanges of adjacent numbers; for instance, the permutation $(1, 3, 2)$ is odd while $(3, 1, 2)$ is even. See Appendix B if you need to refresh your knowledge of permutations.)

Let us illustrate Eq. (3.1) with $2 \times 2$ and $3 \times 3$ matrices. Since there are only two permutations of the set $(1, 2)$, namely

$$(1, 2) \mapsto (1, 2) \quad \text{and} \quad (1, 2) \mapsto (2, 1),$$

and six permutations of the set $(1, 2, 3)$, namely

$$(1, 2, 3),\ (1, 3, 2),\ (2, 1, 3),\ (2, 3, 1),\ (3, 1, 2),\ (3, 2, 1),$$

we can write explicit formulas for these determinants:

$$\det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{21}a_{12};$$

$$\det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = a_{11}a_{22}a_{33} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13}.$$

We note that the determinant of an $N \times N$ matrix has $N!$ terms in this type of formula, because there are $N!$ different permutations of the set $(1, ..., N)$. A numerical evaluation of the determinant of a large matrix using this formula is prohibitively long.

Using the definition D0 and the properties of permutations, one can directly prove various properties of determinants, for instance their antisymmetry with respect to exchanges of matrix rows or columns, and finally the relevance of $\det(A_{ij})$ to linear equations $\sum_j A_{ij} x_j = a_i$, as well as the important property

$$\det(AB) = (\det A)(\det B).$$

Deriving these properties in this way will require long calculations.

Question: To me, definition D0 seems unmotivated and strange. It is not clear why this complicated combination of matrix elements has any useful properties at all. Even if so then maybe there exists another complicated combination of matrix elements that is even more useful?

Answer: Yes, indeed: There exist other complicated combinations that are also useful. All this is best understood if we do not begin by studying the definition (3.1). Instead, we will proceed in a coordinate-free manner and build upon geometric intuition.

We will interpret the matrix $A_{jk}$ not as a "table of numbers" but as a coordinate representation of a linear transformation $\hat{A}$ in some vector space $V$ with respect to some given basis. We will define an action of the operator $\hat{A}$ on the exterior product space $\wedge^N V$ in a certain way. That action will allow us to understand the properties and the uses of determinants without long calculations.

Another useful interpretation of the matrix $A_{jk}$ is to regard it as a table of components of a set of $N$ vectors $\mathbf{v}_1, ..., \mathbf{v}_N$ in a given basis $\{\mathbf{e}_j\}$, that is,

$$\mathbf{v}_j = \sum_{k=1}^{N} A_{jk}\,\mathbf{e}_k, \quad j = 1, ..., N.$$

The determinant of the matrix $A_{jk}$ is then naturally related to the exterior product $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N$. This construction is especially useful for solving linear equations.

These constructions and related results occupy the present chapter. Most of the derivations are straightforward and short but require some facility with calculations involving the exterior product. I recommend that you repeat all the calculations yourself.

Exercise: If $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ are $N$ vectors and $\sigma$ is a permutation of the ordered set $(1, ..., N)$, show that

$$\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = (-1)^{|\sigma|}\, \mathbf{v}_{\sigma(1)} \wedge ... \wedge \mathbf{v}_{\sigma(N)}.$$

3.2 The space $\wedge^N V$ and oriented volume

Of all the exterior power spaces $\wedge^k V$ ($k = 1, 2, ...$), the last nontrivial space is $\wedge^N V$ where $N \equiv \dim V$, for it is impossible to have a nonzero exterior product of $(N + 1)$ or more vectors. In other words, the spaces $\wedge^{N+1} V$, $\wedge^{N+2} V$ etc. are all zero-dimensional and thus do not contain any nonzero tensors.

By Theorem 2 from Sec. 2.3.2, the space $\wedge^N V$ is one-dimensional. Therefore, all nonzero tensors from $\wedge^N V$ are proportional to each other. Hence, any nonzero tensor $\omega_1 \in \wedge^N V$ can serve as a basis tensor in $\wedge^N V$.

The space $\wedge^N V$ is extremely useful because it is so simple and yet is directly related to determinants and volumes; this idea will be developed now. We begin by considering an example.

Example: In a two-dimensional space $V$, let us choose a basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ and consider two arbitrary vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. These vectors can be decomposed in the basis as

$$\mathbf{v}_1 = a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2, \quad \mathbf{v}_2 = a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2,$$


where {aij } are some coefficients. Let us now compute the 2- D


vector v1 v2 2 V : C
E
v1 v2 = (a11 e1 + a12 e2 ) (a21 e1 + a22 e2 ) v1 + v2 B
= a11 a22 e1 e2 + a12 a21 e2 e1
= (a11 a22 a12 a21 ) e1 e2 . v1
A
We may observe that firstly, the 2-vector v1 v2 is proportional v2
to e1 e2 , and secondly, the proportionality coefficient is equal 0
to the determinant of the matrix aij .
If we compute the exterior product $\mathbf{v}_1\wedge\mathbf{v}_2\wedge\mathbf{v}_3$ of three vectors in a 3-dimensional space, we will similarly notice that the result is proportional to $\mathbf{e}_1\wedge\mathbf{e}_2\wedge\mathbf{e}_3$, and the proportionality coefficient is again equal to the determinant of the matrix $a_{ij}$. ∎

Let us return to considering a general, $N$-dimensional space $V$. The examples just given motivate us to study $N$-vectors (i.e. tensors from the top exterior power space $\wedge^N V$) and their relationships of the form $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N = \lambda\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N$.

By Lemma 1 from Sec. 2.3.2, every nonzero element of $\wedge^N V$ must be of the form $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$, where the set $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is linearly independent and thus a basis in $V$. Conversely, each basis $\{\mathbf{v}_j\}$ in $V$ yields a nonzero tensor $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N \in \wedge^N V$. This tensor has a useful geometric interpretation because, in some sense, it represents the volume of the $N$-dimensional parallelepiped spanned by the vectors $\{\mathbf{v}_j\}$. I will now explain this idea.

A rigorous definition of "volume" in $N$-dimensional space requires much background work in geometry and measure theory; I am not prepared to explain all this here. However, we can motivate the interpretation of the tensor $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$ as the "volume" by appealing to the visual notion of the volume of a parallelepiped.[1]

Statement: Consider an $N$-dimensional space $V$ where the ($N$-dimensional) volume of solid bodies can be computed through some reasonable[2] geometric procedure. Then:

(1) Two parallelepipeds spanned by the sets of vectors $\{\mathbf{u}_1, \mathbf{u}_2, ..., \mathbf{u}_N\}$ and $\{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_N\}$ have equal volumes if and only if the corresponding tensors from $\wedge^N V$ are equal up to a sign,

$$\mathbf{u}_1\wedge...\wedge\mathbf{u}_N = \pm\,\mathbf{v}_1\wedge...\wedge\mathbf{v}_N. \tag{3.2}$$

Here "two bodies have equal volumes" means (in the style of ancient Greek geometry) that the bodies can be cut into suitable pieces, such that the volumes are found to be identical by inspection after a rearrangement of the pieces.

(2) If $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N = \lambda\,\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$, where $\lambda\in\mathbb{K}$ is a number, $\lambda\neq 0$, then the volumes of the two parallelepipeds differ by a factor of $|\lambda|$.

To prove these statements, we will use the following lemma.

Lemma: In an $N$-dimensional space:

(1) The volume of a parallelepiped spanned by $\{\lambda\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_N\}$ is $|\lambda|$ times greater than that of $\{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_N\}$.

(2) Two parallelepipeds spanned by the sets of vectors $\{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_N\}$ and $\{\mathbf{v}_1 + \lambda\mathbf{v}_2, \mathbf{v}_2, ..., \mathbf{v}_N\}$ have equal volume.

[1] In this text, we do not actually need a mathematically rigorous notion of "volume"; it is used purely to develop geometrical intuition. All formulations and proofs in this text are completely algebraic.

[2] Here by "reasonable" I mean that the volume has the usual properties: for instance, the volume of a body consisting of two parts equals the sum of the volumes of the parts. An example of such a procedure would be the $N$-fold integral $\int dx_1 ... \int dx_N$, where $x_j$ are coordinates of points in an orthonormal basis.

Figure 3.1: The area of the parallelogram 0ACB spanned by $\{\mathbf{v}_1, \mathbf{v}_2\}$ is equal to the area of the parallelogram 0ADE spanned by $\{\mathbf{v}_1 + \lambda\mathbf{v}_2, \mathbf{v}_2\}$.

Figure 3.2: Parallelepipeds spanned by $\{\mathbf{a}, \mathbf{b}, \mathbf{c}\}$ and by $\{\mathbf{a} + \lambda\mathbf{b}, \mathbf{b}, \mathbf{c}\}$ have equal volume since the volumes of the shaded regions are equal.

Proof of Lemma: (1) This is clear from geometric considerations: When a parallelepiped is stretched by a factor $\lambda$ in one direction, its volume must change by the factor $|\lambda|$. (2) First, we ignore the vectors $\mathbf{v}_3, ..., \mathbf{v}_N$ and consider the two-dimensional plane containing $\mathbf{v}_1$ and $\mathbf{v}_2$. In Fig. 3.1 one can see that the parallelograms spanned by $\{\mathbf{v}_1, \mathbf{v}_2\}$ and by $\{\mathbf{v}_1 + \lambda\mathbf{v}_2, \mathbf{v}_2\}$ can be cut into appropriate pieces to demonstrate the equality of their area. Now, we consider the $N$-dimensional volume (a three-dimensional example is shown in Fig. 3.2). Similarly to the two-dimensional case, we find that the $N$-dimensional parallelepipeds spanned by $\{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_N\}$ and by $\{\mathbf{v}_1 + \lambda\mathbf{v}_2, \mathbf{v}_2, ..., \mathbf{v}_N\}$ have equal $N$-dimensional volume. ∎

Proof of Statement: (1) To prove that the volumes are equal when the tensors are equal, we will transform the first basis $\{\mathbf{u}_1, \mathbf{u}_2, ..., \mathbf{u}_N\}$ into the second basis $\{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_N\}$ by a sequence of transformations of two types: either we will multiply one of the vectors $\mathbf{u}_j$ by a number $\lambda$, or add $\lambda\mathbf{u}_j$ to another vector $\mathbf{u}_k$. We first need to demonstrate that any basis can be transformed into any other basis by this procedure. To demonstrate this, recall the proof of Theorem 1.1.5 in which vectors from the first basis were systematically replaced by vectors of the second one. Each replacement can be implemented by a certain sequence of replacements of the kind $\mathbf{u}_j \to \lambda\mathbf{u}_j$ or $\mathbf{u}_j \to \mathbf{u}_j + \lambda\mathbf{u}_i$. Note that the tensor $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$ changes in the same way as the volume under these replacements: The tensor $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$ gets multiplied by $\lambda$ after $\mathbf{u}_j \to \lambda\mathbf{u}_j$ and remains unchanged after $\mathbf{u}_j \to \mathbf{u}_j + \lambda\mathbf{u}_i$. At the end of the replacement procedure, the basis $\{\mathbf{u}_j\}$ becomes the basis $\{\mathbf{v}_j\}$ (up to the ordering of vectors), while the volume is multiplied by the same factor as the tensor $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$. The ordering of the vectors in the set $\{\mathbf{v}_j\}$ can be changed with possibly a sign change in the tensor $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$. Therefore the statement (3.2) is equivalent to the assumption that the volumes of $\{\mathbf{v}_j\}$ and $\{\mathbf{u}_j\}$ are equal. (2) A transformation $\mathbf{v}_1 \to \lambda\mathbf{v}_1$ increases the volume by a factor of $|\lambda|$ and makes the two tensors equal, therefore the volumes differ by a factor of $|\lambda|$. ∎

Let us now consider the interpretation of the above Statement. Suppose we somehow know that the parallelepiped spanned by the vectors $\{\mathbf{u}_1, ..., \mathbf{u}_N\}$ has unit volume. Given this knowledge, the volume of any other parallelepiped spanned by some other vectors $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is easy to compute. Indeed, we can compute the tensors $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$ and $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$. Since the space $\wedge^N V$ is one-dimensional, these two tensors must be proportional to each other. By expanding the vectors $\mathbf{v}_j$ in the basis $\{\mathbf{u}_j\}$, it is straightforward to compute the coefficient $\lambda$ in the relationship

$$\mathbf{v}_1\wedge...\wedge\mathbf{v}_N = \lambda\,\mathbf{u}_1\wedge...\wedge\mathbf{u}_N.$$

The Statement now says that the volume of a parallelepiped spanned by the vectors $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is equal to $|\lambda|$.

Exercise 1: The volume of a parallelepiped spanned by vectors $\mathbf{a}, \mathbf{b}, \mathbf{c}$ is equal to 19. Compute the volume of a parallelepiped spanned by the vectors $2\mathbf{a} - \mathbf{b}$, $\mathbf{c} + 3\mathbf{a}$, $\mathbf{b}$.

Solution: Since $(2\mathbf{a} - \mathbf{b})\wedge(\mathbf{c} + 3\mathbf{a})\wedge\mathbf{b} = 2\mathbf{a}\wedge\mathbf{c}\wedge\mathbf{b} = -2\,\mathbf{a}\wedge\mathbf{b}\wedge\mathbf{c}$, the volume is 38 (twice 19; we ignored the minus sign since we are interested only in the absolute value of the volume). ∎

It is also clear that the tensor $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$ allows us only to compare the volumes of two parallelepipeds; we cannot determine the volume of one parallelepiped taken by itself. A tensor such as $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$ can be used to determine the numerical value of the volume only if we can compare it with another given tensor, $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$, which (by assumption) corresponds to a parallelepiped of unit volume. A choice of a "reference" tensor $\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$ can be made, for instance, if we are given a basis in $V$; without this choice, there is no natural map from $\wedge^N V$ to numbers ($\mathbb{K}$). In other words, the space $\wedge^N V$ is not canonically isomorphic to the space $\mathbb{K}$ (even though both $\wedge^N V$ and $\mathbb{K}$ are one-dimensional vector spaces). Indeed, a canonical isomorphism between $\wedge^N V$ and $\mathbb{K}$ would imply that the element $1\in\mathbb{K}$ has a corresponding canonically defined tensor $\omega_1\in\wedge^N V$. In that case there would be some basis $\{\mathbf{e}_j\}$ in $V$ such that $\mathbf{e}_1\wedge...\wedge\mathbf{e}_N = \omega_1$, which indicates that the basis $\{\mathbf{e}_j\}$ is in some sense "preferred" or "natural." However, there is no "natural" or "preferred" choice of basis in a vector space $V$, unless some additional structure is given (such as a scalar product). Hence, no canonical choice of $\omega_1\in\wedge^N V$ is possible.

Remark: When a scalar product is defined in $V$, there is a preferred choice of basis, namely an orthonormal basis $\{\mathbf{e}_j\}$ such that $\langle\mathbf{e}_i, \mathbf{e}_j\rangle = \delta_{ij}$ (see Sec. 5.1). Since the length of each of the basis vectors is 1, and the basis vectors are orthogonal to each other, the volume of the parallelepiped spanned by $\{\mathbf{e}_j\}$ is equal to 1. (This is the usual Euclidean definition of volume.) Then the tensor $\omega_1 \equiv \bigwedge_{j=1}^N \mathbf{e}_j$ can be computed using this basis and used as a unit volume tensor. We will see below (Sec. 5.5.2) that this tensor does not depend on the choice of the orthonormal basis, up to the orientation. The isomorphism between $\wedge^N V$ and $\mathbb{K}$ is then fixed (up to the sign), thanks to the scalar product. ∎

In the absence of a scalar product, one can say that the value of the volume in an abstract vector space is not a number but a tensor from the space $\wedge^N V$. It is sufficient to regard the element $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N\in\wedge^N V$ as the definition of the "$\wedge^N V$-valued volume" of the parallelepiped spanned by $\{\mathbf{v}_j\}$. The space $\wedge^N V$ is one-dimensional, so the "tensor-valued volume" has the familiar properties we expect (it is "almost a number"). One thing is unusual about this volume: It is oriented, that is, it changes sign if we exchange the order of two vectors from the set $\{\mathbf{v}_j\}$.

Exercise 2: Suppose $\{\mathbf{u}_1, ..., \mathbf{u}_N\}$ is a basis in $V$. Let $\mathbf{x}$ be some vector whose components in the basis $\{\mathbf{u}_j\}$ are given, $\mathbf{x} = \sum_j \alpha_j\mathbf{u}_j$. Compute the (tensor-valued) volume of the parallelepiped spanned by $\{\mathbf{u}_1 + \mathbf{x}, ..., \mathbf{u}_N + \mathbf{x}\}$.

Hints: Use the linearity property, $(\mathbf{a} + \mathbf{x})\wedge... = \mathbf{a}\wedge... + \mathbf{x}\wedge...$, and notice the simplification

$$\mathbf{x}\wedge(\mathbf{a} + \mathbf{x})\wedge(\mathbf{b} + \mathbf{x})\wedge...\wedge(\mathbf{c} + \mathbf{x}) = \mathbf{x}\wedge\mathbf{a}\wedge\mathbf{b}\wedge...\wedge\mathbf{c}.$$

Answer: The volume tensor is

$$(\mathbf{u}_1 + \mathbf{x})\wedge...\wedge(\mathbf{u}_N + \mathbf{x}) = (1 + \alpha_1 + ... + \alpha_N)\,\mathbf{u}_1\wedge...\wedge\mathbf{u}_N.$$

Remark: tensor-valued area. The idea that the volume is oriented can be understood perhaps more intuitively by considering the area of the parallelogram spanned by two vectors $\mathbf{a}, \mathbf{b}$ in the familiar 3-dimensional space. It is customary to draw the vector product $\mathbf{a}\times\mathbf{b}$ as the representation of this area, since the length $|\mathbf{a}\times\mathbf{b}|$ is equal to the area, and the direction of $\mathbf{a}\times\mathbf{b}$ is normal to the area. Thus, the vector $\mathbf{a}\times\mathbf{b}$ can be understood as the "oriented area" of the parallelogram. However, note that the direction of the vector $\mathbf{a}\times\mathbf{b}$ depends not only on the angular orientation of the parallelogram in space, but also on the order of the vectors $\mathbf{a}, \mathbf{b}$. The 2-vector $\mathbf{a}\wedge\mathbf{b}$ is the natural analogue of the vector product $\mathbf{a}\times\mathbf{b}$ in higher-dimensional spaces. Hence, it is algebraically natural to regard the tensor $\mathbf{a}\wedge\mathbf{b}\in\wedge^2 V$ as the "tensor-valued" representation of the area of the parallelogram spanned by $\{\mathbf{a}, \mathbf{b}\}$.

Consider now a parallelogram spanned by $\mathbf{a}, \mathbf{b}$ in a two-dimensional plane. We can still represent the oriented area of this parallelogram by the vector product $\mathbf{a}\times\mathbf{b}$, where we imagine that the plane is embedded in a three-dimensional space. The area of the parallelogram does not have a nontrivial angular orientation any more since the vector product $\mathbf{a}\times\mathbf{b}$ is always orthogonal to the plane; the only feature left from the orientation is the positive or negative sign of $\mathbf{a}\times\mathbf{b}$ relative to an arbitrarily chosen vector $\mathbf{n}$ normal to the plane. Hence, we may say that the sign of the oriented volume of a parallelepiped is the only remnant of the angular orientation of the parallelepiped in space when the dimension of the parallelepiped is equal to the dimension of space. (See Sec. 2.1 for more explanations about the geometrical interpretation of volume in terms of exterior product.)
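The coefficient $\lambda$ in a relation such as $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N = \lambda\,\mathbf{u}_1\wedge...\wedge\mathbf{u}_N$ can be computed mechanically once the vectors $\mathbf{v}_j$ are expanded in the basis $\{\mathbf{u}_j\}$. The following Python sketch is only an illustration: the helper names `parity_sign` and `wedge_coefficient` are hypothetical and introduced here, and the sample numbers $\alpha_j$ are arbitrary. It expands the exterior product using antisymmetry, which leaves a signed sum over permutations, and reproduces the answers of Exercises 1 and 2 of this section.

```python
from itertools import permutations


def parity_sign(perm):
    """Return +1 for an even permutation of (0, ..., n-1) and -1 for an odd one."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        # Sort the permutation by transpositions; each swap flips the sign.
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign


def wedge_coefficient(components):
    """Coefficient lam in v_1 ^ ... ^ v_N = lam * (u_1 ^ ... ^ u_N).

    components[i][j] is the j-th component of v_i in the basis {u_j}.
    Terms with a repeated basis vector vanish, so only permutations survive.
    """
    n = len(components)
    total = 0
    for perm in permutations(range(n)):
        term = parity_sign(perm)
        for i in range(n):
            term *= components[i][perm[i]]
        total += term
    return total


# Exercise 1: vectors 2a - b, c + 3a, b written in the basis (a, b, c).
lam1 = wedge_coefficient([[2, -1, 0], [3, 0, 1], [0, 1, 0]])
volume1 = 19 * abs(lam1)  # the reference parallelepiped {a, b, c} has volume 19

# Exercise 2 with N = 3 and sample components alpha_j of x (arbitrary numbers).
alphas = [2, -3, 5]
shifted = [[(1 if i == j else 0) + alphas[j] for j in range(3)] for i in range(3)]
lam2 = wedge_coefficient(shifted)  # should equal 1 + alpha_1 + alpha_2 + alpha_3

# Orientation: exchanging two vectors flips the sign of the coefficient.
lam1_swapped = wedge_coefficient([[3, 0, 1], [2, -1, 0], [0, 1, 0]])
```

The brute-force sum has $N!$ terms, so this is meant for small checks only, not as an efficient method.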


3.3 Determinants of operators

Let $\hat{A}\in\text{End}\,V$ be a linear operator. Consider its action on tensors from the space $\wedge^N V$ defined in the following way, $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N \mapsto \hat{A}\mathbf{v}_1\wedge...\wedge\hat{A}\mathbf{v}_N$. I denote this operation by $\wedge^N\hat{A}^N$, so

$$\wedge^N\hat{A}^N(\mathbf{v}_1\wedge...\wedge\mathbf{v}_N) \equiv (\hat{A}\mathbf{v}_1)\wedge...\wedge(\hat{A}\mathbf{v}_N).$$

The notation $\wedge^N\hat{A}^N$ underscores the fact that there are $N$ copies of $\hat{A}$ acting simultaneously.

We have just defined $\wedge^N\hat{A}^N$ on single-term products $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$; the action of $\wedge^N\hat{A}^N$ on linear combinations of such products is obtained by requiring linearity.

Let us verify that $\wedge^N\hat{A}^N$ is a linear map; it is sufficient to check that it is compatible with the exterior product axioms:

$$\hat{A}(\mathbf{v} + \lambda\mathbf{u})\wedge\hat{A}\mathbf{v}_2\wedge...\wedge\hat{A}\mathbf{v}_N = \hat{A}\mathbf{v}\wedge\hat{A}\mathbf{v}_2\wedge...\wedge\hat{A}\mathbf{v}_N + \lambda\hat{A}\mathbf{u}\wedge\hat{A}\mathbf{v}_2\wedge...\wedge\hat{A}\mathbf{v}_N;$$

$$\hat{A}\mathbf{v}_1\wedge\hat{A}\mathbf{v}_2\wedge...\wedge\hat{A}\mathbf{v}_N = -\hat{A}\mathbf{v}_2\wedge\hat{A}\mathbf{v}_1\wedge...\wedge\hat{A}\mathbf{v}_N.$$

Therefore, $\wedge^N\hat{A}^N$ is now defined as a linear operator $\wedge^N V \to \wedge^N V$.

By Theorem 2 in Sec. 2.3.2, the space $\wedge^N V$ is one-dimensional. So $\wedge^N\hat{A}^N$, being a linear operator in a one-dimensional space, must act simply as multiplication by a number. (Every linear operator in a one-dimensional space must act as multiplication by a number!) Thus we can write

$$\wedge^N\hat{A}^N = \alpha\,\hat{1}_{\wedge^N V},$$

where $\alpha\in\mathbb{K}$ is a number which is somehow associated with the operator $\hat{A}$. What is the significance of this number $\alpha$? This number is actually equal to the determinant of the operator $\hat{A}$ as given by Definition D0. But let us pretend that we do not know anything about determinants; it is very convenient to use this construction to define the determinant and to derive its properties.

Definition D1: The determinant $\det\hat{A}$ of an operator $\hat{A}\in\text{End}\,V$ is the number by which any nonzero tensor $\omega\in\wedge^N V$ is multiplied when $\wedge^N\hat{A}^N$ acts on it:

$$(\wedge^N\hat{A}^N)\,\omega = (\det\hat{A})\,\omega. \tag{3.3}$$

In other words, $\wedge^N\hat{A}^N = (\det\hat{A})\,\hat{1}_{\wedge^N V}$.

We can immediately put this definition to use; here are the first results.

Statement 1: The determinant of a product is the product of determinants: $\det(\hat{A}\hat{B}) = (\det\hat{A})(\det\hat{B})$.

Proof: Act with $\wedge^N\hat{B}^N$ and then with $\wedge^N\hat{A}^N$ on a nonzero tensor $\omega\in\wedge^N V$. Since these operators act as multiplication by a number, the result is the multiplication by the product of these numbers. We thus have

$$(\wedge^N\hat{A}^N)(\wedge^N\hat{B}^N)\,\omega = (\wedge^N\hat{A}^N)(\det\hat{B})\,\omega = (\det\hat{A})(\det\hat{B})\,\omega.$$

On the other hand, for $\omega = \mathbf{v}_1\wedge...\wedge\mathbf{v}_N$ we have

$$(\wedge^N\hat{A}^N)(\wedge^N\hat{B}^N)\,\omega = (\wedge^N\hat{A}^N)\,\hat{B}\mathbf{v}_1\wedge...\wedge\hat{B}\mathbf{v}_N = \hat{A}\hat{B}\mathbf{v}_1\wedge...\wedge\hat{A}\hat{B}\mathbf{v}_N = \wedge^N(\hat{A}\hat{B})^N\omega = (\det(\hat{A}\hat{B}))\,\omega.$$

Therefore, $\det(\hat{A}\hat{B}) = (\det\hat{A})(\det\hat{B})$. ∎

Exercise 1: Prove that $\det(\lambda\hat{A}) = \lambda^N\det\hat{A}$ for any $\lambda\in\mathbb{K}$ and $\hat{A}\in\text{End}\,V$.

Now let us clarify the relation between the determinant and the volume. We will prove that the determinant of a transformation $\hat{A}$ is the coefficient by which the volume of parallelepipeds will grow when we act with $\hat{A}$ on the vector space. After proving this, I will derive the relation (3.1) for the determinant through the matrix coefficients of $\hat{A}$ in some basis; it will follow that the formula (3.1) gives the same results in any basis.

Statement 2: When a parallelepiped spanned by the vectors $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is transformed by a linear operator $\hat{A}$, so that $\mathbf{v}_j \mapsto \hat{A}\mathbf{v}_j$, the volume of the parallelepiped grows by the factor $|\det\hat{A}|$.

Proof: Suppose the volume of the parallelepiped spanned by the vectors $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ is $v$. The transformed parallelepiped is spanned by vectors $\{\hat{A}\mathbf{v}_1, ..., \hat{A}\mathbf{v}_N\}$. According to the definition of the determinant, $\det\hat{A}$ is a number such that

$$\hat{A}\mathbf{v}_1\wedge...\wedge\hat{A}\mathbf{v}_N = (\det\hat{A})\,\mathbf{v}_1\wedge...\wedge\mathbf{v}_N.$$

By Statement 3.2, the volume of the transformed parallelepiped is $|\det\hat{A}|$ times the volume of the original parallelepiped. ∎

If we consider the oriented (i.e. tensor-valued) volume, we find that it grows by the factor $\det\hat{A}$ (without the absolute value). Therefore we could define the determinant also in the following way:

Definition D2: The determinant $\det\hat{A}$ of a linear transformation $\hat{A}$ is the number by which the oriented volume of any parallelepiped grows after the transformation. (One is then obliged to prove that this number does not depend on the choice of the initial parallelepiped! We just proved this in Statement 1 using an algebraic definition D1 of the determinant.)

With this definition of the determinant, the property

$$\det(\hat{A}\hat{B}) = (\det\hat{A})(\det\hat{B})$$

is easy to understand: The composition of the transformations $\hat{A}$ and $\hat{B}$ multiplies the volume by the product of the individual volume growth factors $\det\hat{A}$ and $\det\hat{B}$.

Finally, here is a derivation of the formula (3.1) from Definition D1.

Statement 3: If $\{\mathbf{e}_j\}$ is any basis in $V$, $\{\mathbf{e}^*_j\}$ is the dual basis, and a linear operator $\hat{A}$ is represented by a tensor,

$$\hat{A} = \sum_{j,k=1}^N A_{jk}\,\mathbf{e}_j\otimes\mathbf{e}^*_k, \tag{3.4}$$

then the determinant of $\hat{A}$ is given by the formula (3.1).

Proof: The operator $\hat{A}$ defined by Eq. (3.4) acts on the basis vectors $\{\mathbf{e}_j\}$ as follows,

$$\hat{A}\mathbf{e}_k = \sum_{j=1}^N A_{jk}\,\mathbf{e}_j.$$

A straightforward calculation is all that is needed to obtain the formula for the determinant. I first consider the case $N = 2$ as an illustration:

$$\begin{aligned}
\wedge^2\hat{A}^2(\mathbf{e}_1\wedge\mathbf{e}_2) &= \hat{A}\mathbf{e}_1\wedge\hat{A}\mathbf{e}_2 \\
&= (A_{11}\mathbf{e}_1 + A_{21}\mathbf{e}_2)\wedge(A_{12}\mathbf{e}_1 + A_{22}\mathbf{e}_2) \\
&= A_{11}A_{22}\,\mathbf{e}_1\wedge\mathbf{e}_2 + A_{21}A_{12}\,\mathbf{e}_2\wedge\mathbf{e}_1 \\
&= (A_{11}A_{22} - A_{12}A_{21})\,\mathbf{e}_1\wedge\mathbf{e}_2.
\end{aligned}$$


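Hence $\det\hat{A} = A_{11}A_{22} - A_{12}A_{21}$, in agreement with the usual formula.

Now I consider the general case. The action of $\wedge^N\hat{A}^N$ on the basis element $\mathbf{e}_1\wedge...\wedge\mathbf{e}_N\in\wedge^N V$ is

$$\begin{aligned}
\wedge^N\hat{A}^N(\mathbf{e}_1\wedge...\wedge\mathbf{e}_N) &= \hat{A}\mathbf{e}_1\wedge...\wedge\hat{A}\mathbf{e}_N \\
&= \Big(\sum_{j_1=1}^N A_{j_1 1}\mathbf{e}_{j_1}\Big)\wedge...\wedge\Big(\sum_{j_N=1}^N A_{j_N N}\mathbf{e}_{j_N}\Big) \\
&= \sum_{j_1=1}^N ... \sum_{j_N=1}^N A_{j_1 1}\mathbf{e}_{j_1}\wedge...\wedge A_{j_N N}\mathbf{e}_{j_N} \\
&= \sum_{j_1=1}^N ... \sum_{j_N=1}^N (A_{j_1 1}...A_{j_N N})\,\mathbf{e}_{j_1}\wedge...\wedge\mathbf{e}_{j_N}.
\end{aligned} \tag{3.5}$$

In the last sum, the only nonzero terms are those in which the indices $j_1, ..., j_N$ do not repeat; in other words, $(j_1, ..., j_N)$ is a permutation of the set $(1, ..., N)$. Let us therefore denote this permutation by $\sigma$ and write $\sigma(1)\equiv j_1$, ..., $\sigma(N)\equiv j_N$. Using the antisymmetry of the exterior product and the definition of the parity $|\sigma|$ of the permutation $\sigma$, we can express

$$\mathbf{e}_{j_1}\wedge...\wedge\mathbf{e}_{j_N} = \mathbf{e}_{\sigma(1)}\wedge...\wedge\mathbf{e}_{\sigma(N)} = (-1)^{|\sigma|}\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N.$$

Now we can rewrite the last line in Eq. (3.5) in terms of sums over all permutations $\sigma$ instead of sums over all $\{j_1, ..., j_N\}$:

$$\begin{aligned}
\wedge^N\hat{A}^N(\mathbf{e}_1\wedge...\wedge\mathbf{e}_N) &= \sum_\sigma A_{\sigma(1)1}...A_{\sigma(N)N}\,\mathbf{e}_{\sigma(1)}\wedge...\wedge\mathbf{e}_{\sigma(N)} \\
&= \sum_\sigma A_{\sigma(1)1}...A_{\sigma(N)N}\,(-1)^{|\sigma|}\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N.
\end{aligned}$$

Thus we have reproduced the formula (3.1). ∎

We have seen three equivalent definitions of the determinant, each with its own advantages: first, a direct but complicated definition (3.1) in terms of matrix coefficients; second, an elegant but abstract definition (3.3) that depends on the construction of the exterior product; third, an intuitive and visual definition in terms of the volume which, however, is based on the geometric notion of "volume of an $N$-dimensional domain" rather than on purely algebraic constructions. All three definitions are equivalent when applied to linear operators in finite-dimensional spaces.

3.3.1 Examples: computing determinants

Question: We have been working with operators more or less in the same way as with matrices, like in Eq. (3.4). What is the advantage of the coordinate-free approach if we are again computing with the elements of matrices?

Answer: In some cases, there is no other way except to represent an operator in some basis through a matrix such as $A_{ij}$. However, in many cases an interesting operator can be represented geometrically, i.e. without choosing a basis. It is often useful to express an operator in a basis-free manner because this yields some nontrivial information that would otherwise be obscured by an unnecessary (or wrong) choice of basis. It is useful to be able to employ both the basis-free and the component-based techniques. Here are some examples where we compute determinants of operators defined without a basis.

Example 1: Operators of the form $\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*$ are useful in geometry because they can represent reflections or projections with respect to an axis or a plane if $\mathbf{a}$ and $\mathbf{b}^*$ are chosen appropriately. For instance, if $\mathbf{b}^*\neq 0$, we can define a hyperplane $H_{\mathbf{b}^*}\subset V$ as the subspace annihilated by the covector $\mathbf{b}^*$, i.e. the subspace consisting of vectors $\mathbf{v}\in V$ such that $\mathbf{b}^*(\mathbf{v}) = 0$. If a vector $\mathbf{a}\in V$ is such that $\mathbf{b}^*(\mathbf{a})\neq 0$, i.e. $\mathbf{a}\notin H_{\mathbf{b}^*}$, then

$$\hat{P} \equiv \hat{1}_V - \frac{1}{\mathbf{b}^*(\mathbf{a})}\,\mathbf{a}\otimes\mathbf{b}^*$$

is a projector onto $H_{\mathbf{b}^*}$, while the operator

$$\hat{R} \equiv \hat{1}_V - \frac{2}{\mathbf{b}^*(\mathbf{a})}\,\mathbf{a}\otimes\mathbf{b}^*$$

describes a mirror reflection with respect to the hyperplane $H_{\mathbf{b}^*}$, in the sense that $\mathbf{v} + \hat{R}\mathbf{v}\in H_{\mathbf{b}^*}$ for any $\mathbf{v}\in V$. ∎

The following statement shows how to calculate determinants of such operators. For instance, with the above definitions we would find $\det\hat{P} = 0$ and $\det\hat{R} = -1$ by a direct application of Eq. (3.6).

Statement: Let $\mathbf{a}\in V$ and $\mathbf{b}^*\in V^*$. Then

$$\det\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big) = 1 + \mathbf{b}^*(\mathbf{a}). \tag{3.6}$$

Proof: If $\mathbf{b}^* = 0$, the formula is trivial, so we assume that $\mathbf{b}^*\neq 0$. Then we need to consider two cases: $\mathbf{b}^*(\mathbf{a})\neq 0$ or $\mathbf{b}^*(\mathbf{a}) = 0$; however, the final formula (3.6) is the same in both cases.

Case 1. By Statement 1.6, if $\mathbf{b}^*(\mathbf{a})\neq 0$ there exists a basis $\{\mathbf{a}, \mathbf{v}_2, ..., \mathbf{v}_N\}$ such that $\mathbf{b}^*(\mathbf{v}_i) = 0$ for $2\le i\le N$, where $N = \dim V$. Then we compute the determinant by applying the operator $\wedge^N\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)^N$ to the tensor $\mathbf{a}\wedge\mathbf{v}_2\wedge...\wedge\mathbf{v}_N$: since

$$\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)\,\mathbf{a} = (1 + \mathbf{b}^*(\mathbf{a}))\,\mathbf{a},$$
$$\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)\,\mathbf{v}_i = \mathbf{v}_i, \quad i = 2, ..., N,$$

we get

$$\wedge^N\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)^N\,\mathbf{a}\wedge\mathbf{v}_2\wedge...\wedge\mathbf{v}_N = (1 + \mathbf{b}^*(\mathbf{a}))\,\mathbf{a}\wedge\mathbf{v}_2\wedge...\wedge\mathbf{v}_N.$$

Therefore $\det\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big) = 1 + \mathbf{b}^*(\mathbf{a})$, as required.

Case 2. If $\mathbf{b}^*(\mathbf{a}) = 0$, we will show that $\det\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big) = 1$. (We may assume $\mathbf{a}\neq 0$, since for $\mathbf{a} = 0$ the operator is the identity and the formula is trivial.) We cannot choose the basis $\{\mathbf{a}, \mathbf{v}_2, ..., \mathbf{v}_N\}$ as in case 1, so we need to choose another basis. There exists some vector $\mathbf{w}\in V$ such that $\mathbf{b}^*(\mathbf{w})\neq 0$ because by assumption $\mathbf{b}^*\neq 0$. It is clear that $\{\mathbf{w}, \mathbf{a}\}$ is a linearly independent set: otherwise we would have $\mathbf{b}^*(\mathbf{w}) = 0$. Therefore, we can complete this set to a basis $\{\mathbf{w}, \mathbf{a}, \mathbf{v}_3, ..., \mathbf{v}_N\}$. Further, the vectors $\mathbf{v}_3, ..., \mathbf{v}_N$ can be chosen such that $\mathbf{b}^*(\mathbf{v}_i) = 0$ for $3\le i\le N$. Now we compute the determinant by acting with the operator $\wedge^N\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)^N$ on the tensor $\mathbf{a}\wedge\mathbf{w}\wedge\mathbf{v}_3\wedge...\wedge\mathbf{v}_N$: since

$$\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)\,\mathbf{a} = \mathbf{a},$$
$$\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)\,\mathbf{w} = \mathbf{w} + \mathbf{b}^*(\mathbf{w})\,\mathbf{a},$$
$$\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)\,\mathbf{v}_i = \mathbf{v}_i, \quad i = 3, ..., N,$$

we get

$$\begin{aligned}
\wedge^N\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big)^N\,\mathbf{a}\wedge\mathbf{w}\wedge\mathbf{v}_3\wedge...\wedge\mathbf{v}_N &= \mathbf{a}\wedge(\mathbf{w} + \mathbf{b}^*(\mathbf{w})\,\mathbf{a})\wedge\mathbf{v}_3\wedge...\wedge\mathbf{v}_N \\
&= \mathbf{a}\wedge\mathbf{w}\wedge\mathbf{v}_3\wedge...\wedge\mathbf{v}_N.
\end{aligned}$$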


Therefore $\det\big(\hat{1}_V + \mathbf{a}\otimes\mathbf{b}^*\big) = 1$. ∎

Exercise 1: In a similar way, prove the following statement: If $\mathbf{a}_i\in V$ and $\mathbf{b}^*_i\in V^*$ for $1\le i\le n < N$ are such that $\mathbf{b}^*_i(\mathbf{a}_j) = 0$ for all $i > j$, then

$$\det\Big(\hat{1}_V + \sum_{i=1}^n \mathbf{a}_i\otimes\mathbf{b}^*_i\Big) = \prod_{i=1}^n \big(1 + \mathbf{b}^*_i(\mathbf{a}_i)\big).$$

Exercise 2: Consider the three-dimensional space of polynomials $p(x)$ in the variable $x$ of degree at most 2 with real coefficients. The operators $\hat{A}$ and $\hat{B}$ are defined by

$$(\hat{A}p)(x) \equiv p(x) + x\frac{dp(x)}{dx},$$
$$(\hat{B}p)(x) \equiv x^2 p(1) + 2p(x).$$

Check that these operators are linear. Compute the determinants of $\hat{A}$ and $\hat{B}$.

Solution: The operators are linear because they are expressed as formulas containing $p(x)$ linearly. Let us use the underbar to distinguish the polynomials $\underline{1}$, $\underline{x}$ from numbers such as 1. A convenient basis tensor of the 3rd exterior power is $\underline{1}\wedge\underline{x}\wedge\underline{x^2}$, so we perform the calculation,

$$(\det\hat{A})(\underline{1}\wedge\underline{x}\wedge\underline{x^2}) = (\hat{A}\underline{1})\wedge(\hat{A}\underline{x})\wedge(\hat{A}\underline{x^2}) = \underline{1}\wedge(2\underline{x})\wedge(3\underline{x^2}) = 6\,(\underline{1}\wedge\underline{x}\wedge\underline{x^2}),$$

and find that $\det\hat{A} = 6$. Similarly we find $\det\hat{B} = 12$. ∎

Exercise 3: Suppose the space $V$ is decomposed into a direct sum of $U$ and $W$, and an operator $\hat{A}$ is such that $U$ and $W$ are invariant subspaces ($\hat{A}\mathbf{x}\in U$ for all $\mathbf{x}\in U$, and the same for $W$). Denote by $\hat{A}_U$ the restriction of the operator $\hat{A}$ to the subspace $U$. Show that

$$\det\hat{A} = (\det\hat{A}_U)(\det\hat{A}_W).$$

Hint: Choose a basis in $V$ as the union of a basis in $U$ and a basis in $W$. In this basis, the operator $\hat{A}$ is represented by a block-diagonal matrix.

3.4 Determinants of square tables

Note that the determinant formula (3.1) applies to any square matrix, without referring to any transformations in any vector spaces. Sometimes it is useful to compute the determinants of matrices that do not represent linear transformations. Such matrices are really just tables of numbers. The properties of determinants of course remain the same whether or not the matrix represents a linear transformation in the context of the problem we are solving. The geometric construction of the determinant through the space $\wedge^N V$ is useful because it helps us understand heuristically where the properties of the determinant come from.

Given just a square table of numbers, it is often useful to introduce a linear transformation corresponding to the matrix in some (conveniently chosen) basis; this often helps solve problems. An example frequently used in linear algebra is a matrix consisting of the components of some vectors in a basis. Suppose $\{\mathbf{e}_j \mid j = 1, ..., N\}$ is a basis and $\{\mathbf{v}_j \mid j = 1, ..., N\}$ are some vectors. Since each of the $\mathbf{v}_j$ can be decomposed through the basis $\{\mathbf{e}_j\}$, say

$$\mathbf{v}_i = \sum_{j=1}^N v_{ij}\,\mathbf{e}_j, \quad i = 1, ..., N,$$

we may consider the coefficients $v_{ij}$ as a square matrix. This matrix, at first glance, does not represent a linear transformation; it is just a square-shaped table of the coefficients $v_{ij}$. However, let us define a linear operator $\hat{A}$ by the condition that $\hat{A}\mathbf{e}_i = \mathbf{v}_i$ for all $i = 1, ..., N$. This condition defines $\hat{A}\mathbf{x}$ for any vector $\mathbf{x}$ if we assume the linearity of $\hat{A}$ (see Exercise 2 in Sec. 1.2.2). The operator $\hat{A}$ has the following matrix representation with respect to the basis $\{\mathbf{e}_i\}$ and the dual basis $\{\mathbf{e}^*_i\}$:

$$\hat{A} = \sum_{i=1}^N \mathbf{v}_i\otimes\mathbf{e}^*_i = \sum_{i=1}^N\sum_{j=1}^N v_{ij}\,\mathbf{e}_j\otimes\mathbf{e}^*_i.$$

So the matrix $v_{ji}$ (the transpose of $v_{ij}$) is the matrix representing the transformation $\hat{A}$. Let us consider the determinant of this transformation:

$$(\det\hat{A})\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N = \hat{A}\mathbf{e}_1\wedge...\wedge\hat{A}\mathbf{e}_N = \mathbf{v}_1\wedge...\wedge\mathbf{v}_N.$$

The determinant of the matrix $v_{ji}$ is thus equal to the determinant of the transformation $\hat{A}$. Hence, the computation of the determinant of the matrix $v_{ji}$ is equivalent to the computation of the tensor $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N\in\wedge^N V$ and its comparison with the basis tensor $\mathbf{e}_1\wedge...\wedge\mathbf{e}_N$. We have thus proved the following statement.

Statement 1: The determinant of the matrix $v_{ji}$ made up by the components of the vectors $\{\mathbf{v}_j\}$ in a basis $\{\mathbf{e}_j\}$ ($j = 1, ..., N$) is the number $C$ defined as the coefficient in the tensor equality

$$\mathbf{v}_1\wedge...\wedge\mathbf{v}_N = C\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N.$$

Corollary: The determinant of a matrix does not change when a multiple of one row is added to another row. The determinant is linear as a function of each row. The determinant changes sign when two rows are exchanged.

Proof: We consider the matrix $v_{ij}$ as the table of coefficients of vectors $\{\mathbf{v}_j\}$ in a basis $\{\mathbf{e}_j\}$, as explained above. Since

$$(\det v_{ji})\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N = \mathbf{v}_1\wedge...\wedge\mathbf{v}_N,$$

we need only to examine the properties of the tensor $\omega\equiv\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$ under various replacements. When a multiple of row $k$ is added to another row $j$, we replace $\mathbf{v}_j \mapsto \mathbf{v}_j + \lambda\mathbf{v}_k$ for fixed $j, k$; then the tensor $\omega$ does not change,

$$\mathbf{v}_1\wedge...\wedge\mathbf{v}_j\wedge...\wedge\mathbf{v}_N = \mathbf{v}_1\wedge...\wedge(\mathbf{v}_j + \lambda\mathbf{v}_k)\wedge...\wedge\mathbf{v}_N,$$

hence the determinant of $v_{ij}$ does not change. To show that the determinant is linear as a function of each row, we consider the replacement $\mathbf{v}_j \mapsto \mathbf{u} + \lambda\mathbf{v}$ for fixed $j$; the tensor $\omega$ is then equal to the sum of the tensors $\mathbf{v}_1\wedge...\wedge\mathbf{u}\wedge...\wedge\mathbf{v}_N$ and $\lambda\,\mathbf{v}_1\wedge...\wedge\mathbf{v}\wedge...\wedge\mathbf{v}_N$. Finally, exchanging the rows $k$ and $l$ in the matrix $v_{ij}$ corresponds to exchanging the vectors $\mathbf{v}_k$ and $\mathbf{v}_l$, and then the tensor $\omega$ changes sign. ∎

It is an important property that matrix transposition leaves the determinant unchanged.
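The row properties stated in the Corollary, as well as the transposition property, can be checked on a small numerical example by building the tensor $\mathbf{v}_1\wedge...\wedge\mathbf{v}_N$ one factor at a time and reading off the coefficient $C$ of $\mathbf{e}_1\wedge...\wedge\mathbf{e}_N$. The Python sketch below is an illustration under stated assumptions, not a method taken from the text: the helper names `wedge_with_vector` and `top_coefficient` are hypothetical, a tensor is stored as a dictionary mapping increasing index tuples to coefficients, and the sample table of numbers is arbitrary.

```python
def wedge_with_vector(multivector, vector):
    """Return (multivector ^ vector) for a vector given by its components.

    A multivector is a dict {sorted index tuple: coefficient}, where the tuple
    (i1, ..., ik) stands for e_i1 ^ ... ^ e_ik with i1 < ... < ik.
    """
    result = {}
    for indices, coeff in multivector.items():
        for k, component in enumerate(vector):
            if component == 0 or k in indices:
                continue  # e_k ^ e_k = 0, so repeated indices drop out
            # e_k is appended on the right and moved left past every larger
            # index; each such exchange contributes a factor of -1.
            swaps = sum(1 for i in indices if i > k)
            key = tuple(sorted(indices + (k,)))
            result[key] = result.get(key, 0) + (-1) ** swaps * coeff * component
    return result


def top_coefficient(vectors):
    """Coefficient C in v_1 ^ ... ^ v_N = C * (e_1 ^ ... ^ e_N)."""
    multivector = {(): 1}
    for v in vectors:
        multivector = wedge_with_vector(multivector, v)
    return multivector.get(tuple(range(len(vectors))), 0)


rows = [[2, 1, 0], [1, 3, 1], [0, 2, 5]]  # arbitrary sample components v_ij
C = top_coefficient(rows)

# Adding a multiple of one row to another leaves C unchanged.
added = [[a + 4 * b for a, b in zip(rows[0], rows[2])], rows[1], rows[2]]
C_added = top_coefficient(added)

# Exchanging two rows changes the sign of C.
C_swapped = top_coefficient([rows[1], rows[0], rows[2]])

# Linearity in one row: replace row 1 by u + lam * w.
u, w, lam = [1, 3, 1], [4, 0, -1], 3
C_sum = top_coefficient([rows[0], [p + lam * q for p, q in zip(u, w)], rows[2]])
C_u = top_coefficient([rows[0], u, rows[2]])
C_w = top_coefficient([rows[0], w, rows[2]])

# Transposition leaves C unchanged.
C_transposed = top_coefficient([list(col) for col in zip(*rows)])
```

Storing only increasing index tuples keeps each intermediate tensor in a canonical form, so like terms are merged automatically.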


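Statement 2: The determinant of the transposed operator is unchanged:

$$\det\hat{A}^T = \det\hat{A}.$$

Proof: I give two proofs, one based on Definition D0 and the properties of permutations, another entirely coordinate-free, based on Definition D1 of the determinant and definition 1.8.4 of the transposed operator.

First proof: According to Definition D0, the determinant of the transposed matrix $A_{ji}$ is given by the formula

$$\det(A_{ji}) \equiv \sum_\sigma (-1)^{|\sigma|} A_{1,\sigma(1)}...A_{N,\sigma(N)}, \tag{3.7}$$

so the only difference between $\det(A_{ij})$ and $\det(A_{ji})$ is the order of indices in the products of matrix elements, namely $A_{\sigma(i),i}$ instead of $A_{i,\sigma(i)}$. We can show that the sum in Eq. (3.7) consists of exactly the same terms as the sum in Eq. (3.1), only the terms occur in a different order. This is sufficient to prove that $\det(A_{ij}) = \det(A_{ji})$.

The sum in Eq. (3.7) consists of terms of the form $A_{1,\sigma(1)}...A_{N,\sigma(N)}$, where $\sigma$ is some permutation. We may reorder factors in this term,

$$A_{1,\sigma(1)}...A_{N,\sigma(N)} = A_{\sigma'(1),1}...A_{\sigma'(N),N},$$

where $\sigma'$ is another permutation such that $A_{i,\sigma(i)} = A_{\sigma'(i),i}$ for $i = 1, ..., N$. This is achieved when $\sigma'$ is the permutation inverse to $\sigma$, i.e. we need to use $\sigma'\equiv\sigma^{-1}$. Since there exists precisely one inverse permutation $\sigma^{-1}$ for each permutation $\sigma$, we may transform the sum in Eq. (3.7) into a sum over all inverse permutations $\sigma'$; each permutation will still enter exactly once into the new sum. Since the parity of the inverse permutation $\sigma^{-1}$ is the same as the parity of $\sigma$ (see Statement 3 in Appendix B), the factor $(-1)^{|\sigma|}$ will remain unchanged. Therefore, the sum will remain the same.

Second proof: The transposed operator is defined as

$$(\hat{A}^T\mathbf{f}^*)(\mathbf{x}) = \mathbf{f}^*(\hat{A}\mathbf{x}), \quad \forall\,\mathbf{f}^*\in V^*,\ \mathbf{x}\in V.$$

In order to compare the determinants $\det\hat{A}$ and $\det(\hat{A}^T)$ according to Definition D1, we need to compare the numbers $\wedge^N\hat{A}^N$ and $\wedge^N(\hat{A}^T)^N$.

Let us choose nonzero tensors $\omega\in\wedge^N V$ and $\omega^*\in\wedge^N V^*$. By Lemma 1 in Sec. 2.3.2, these tensors have representations of the form $\omega = \mathbf{v}_1\wedge...\wedge\mathbf{v}_N$ and $\omega^* = \mathbf{f}^*_1\wedge...\wedge\mathbf{f}^*_N$. We have

$$(\det\hat{A})\,\mathbf{v}_1\wedge...\wedge\mathbf{v}_N = \hat{A}\mathbf{v}_1\wedge...\wedge\hat{A}\mathbf{v}_N.$$

Now we would like to relate this expression with the analogous expression for $\hat{A}^T$. In order to use the definition of $\hat{A}^T$, we need to act on the vectors $\hat{A}\mathbf{v}_i$ by the covectors $\mathbf{f}^*_j$. Therefore, we act with the $N$-form $\omega^*\in\wedge^N V^*\cong(\wedge^N V)^*$ on the $N$-vector $\wedge^N\hat{A}^N\omega\in\wedge^N V$ (this canonical action was defined by Definition 3 in Sec. 2.2). Since this action is linear, we find

$$\omega^*(\wedge^N\hat{A}^N\omega) = (\det\hat{A})\,\omega^*(\omega).$$

(Note that $\omega^*(\omega)\neq 0$ since by assumption the tensors $\omega$ and $\omega^*$ are nonzero.) On the other hand,

$$\begin{aligned}
\omega^*\big(\wedge^N\hat{A}^N\omega\big) &= \sum_\sigma (-1)^{|\sigma|}\,\mathbf{f}^*_1(\hat{A}\mathbf{v}_{\sigma(1)})...\mathbf{f}^*_N(\hat{A}\mathbf{v}_{\sigma(N)}) \\
&= \sum_\sigma (-1)^{|\sigma|}\,(\hat{A}^T\mathbf{f}^*_1)(\mathbf{v}_{\sigma(1)})...(\hat{A}^T\mathbf{f}^*_N)(\mathbf{v}_{\sigma(N)}) \\
&= \big(\wedge^N(\hat{A}^T)^N\omega^*\big)(\omega) = (\det\hat{A}^T)\,\omega^*(\omega).
\end{aligned}$$

Hence $\det\hat{A}^T = \det\hat{A}$. ∎

Exercise* (Laplace expansion): As shown in the Corollary above, the determinant of the matrix $v_{ij}$ is a linear function of each of the vectors $\{\mathbf{v}_i\}$. Consider $\det(v_{ij})$ as a linear function of the first vector, $\mathbf{v}_1$; this function is a covector that we may temporarily denote by $\mathbf{f}^*_1$. Show that $\mathbf{f}^*_1$ can be represented in the dual basis $\{\mathbf{e}^*_j\}$ as

$$\mathbf{f}^*_1 = \sum_{i=1}^N (-1)^{i-1} B_{1i}\,\mathbf{e}^*_i,$$

where the coefficients $B_{1i}$ are minors of the matrix $v_{ij}$, that is, determinants of the matrix $v_{ij}$ from which row 1 and column $i$ have been deleted.

Solution: Consider one of the coefficients, for example $B_{11}\equiv\mathbf{f}^*_1(\mathbf{e}_1)$. This coefficient can be determined from the tensor equality

$$\mathbf{e}_1\wedge\mathbf{v}_2\wedge...\wedge\mathbf{v}_N = B_{11}\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N. \tag{3.8}$$

We could reduce $B_{11}$ to a determinant of an $(N-1)\times(N-1)$ matrix if we could cancel $\mathbf{e}_1$ on both sides of Eq. (3.8). We would be able to cancel $\mathbf{e}_1$ if we had a tensor equality of the form

$$\mathbf{e}_1\wedge\psi = B_{11}\,\mathbf{e}_1\wedge\mathbf{e}_2\wedge...\wedge\mathbf{e}_N,$$

where the $(N-1)$-vector $\psi$ were proportional to $\mathbf{e}_2\wedge...\wedge\mathbf{e}_N$. However, $\mathbf{v}_2\wedge...\wedge\mathbf{v}_N$ in Eq. (3.8) is not necessarily proportional to $\mathbf{e}_2\wedge...\wedge\mathbf{e}_N$; so we need to transform Eq. (3.8) to a suitable form. In order to do this, we transform the vectors $\mathbf{v}_i$ into vectors that belong to the subspace spanned by $\{\mathbf{e}_2, ..., \mathbf{e}_N\}$. We subtract from each $\mathbf{v}_i$ ($i = 2, ..., N$) a suitable multiple of $\mathbf{e}_1$ and define the vectors $\tilde{\mathbf{v}}_i$ ($i = 2, ..., N$) such that $\mathbf{e}^*_1(\tilde{\mathbf{v}}_i) = 0$:

$$\tilde{\mathbf{v}}_i \equiv \mathbf{v}_i - \mathbf{e}^*_1(\mathbf{v}_i)\,\mathbf{e}_1, \quad i = 2, ..., N.$$

Then $\tilde{\mathbf{v}}_i\in\text{Span}\{\mathbf{e}_2, ..., \mathbf{e}_N\}$ and also

$$\mathbf{e}_1\wedge\mathbf{v}_2\wedge...\wedge\mathbf{v}_N = \mathbf{e}_1\wedge\tilde{\mathbf{v}}_2\wedge...\wedge\tilde{\mathbf{v}}_N.$$

Now Eq. (3.8) is rewritten as

$$\mathbf{e}_1\wedge\tilde{\mathbf{v}}_2\wedge...\wedge\tilde{\mathbf{v}}_N = B_{11}\,\mathbf{e}_1\wedge\mathbf{e}_2\wedge...\wedge\mathbf{e}_N.$$

Since $\tilde{\mathbf{v}}_i\in\text{Span}\{\mathbf{e}_2, ..., \mathbf{e}_N\}$, the tensors $\tilde{\mathbf{v}}_2\wedge...\wedge\tilde{\mathbf{v}}_N$ and $\mathbf{e}_2\wedge...\wedge\mathbf{e}_N$ are proportional to each other. Now we are allowed to cancel $\mathbf{e}_1$ and obtain

$$\tilde{\mathbf{v}}_2\wedge...\wedge\tilde{\mathbf{v}}_N = B_{11}\,\mathbf{e}_2\wedge...\wedge\mathbf{e}_N.$$

Note that the vectors $\tilde{\mathbf{v}}_i$ have the first components equal to zero. In other words, $B_{11}$ is equal to the determinant of the matrix $v_{ij}$ from which row 1 (i.e. the vector $\mathbf{v}_1$) and column 1 (the coefficients at $\mathbf{e}_1$) have been deleted. The coefficients $B_{1j}$ for $j = 2, ..., N$ are calculated similarly. ∎

3.4.1 * Index notation for $\wedge^N V$ and determinants

Let us see how determinants are written in the index notation. In order to use the index notation, we need to fix a basis $\{\mathbf{e}_j\}$ and represent each vector and each tensor by their components in that basis. Determinants are related to the space $\wedge^N V$. Let us consider a set of vectors $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ and the tensor

$$\psi \equiv \mathbf{v}_1\wedge...\wedge\mathbf{v}_N\in\wedge^N V.$$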


Since the space $\wedge^N V$ is one-dimensional and its basis consists of the single tensor $\mathbf{e}_1\wedge...\wedge\mathbf{e}_N$, the index representation of $\psi$ consists, in principle, of the single number $C$ in a formula such as

$$\psi = C\,\mathbf{e}_1\wedge...\wedge\mathbf{e}_N.$$

However, it is more convenient to use a totally antisymmetric array of numbers having $N$ indices, $\psi^{i_1...i_N}$, so that

$$\psi = \frac{1}{N!}\sum_{i_1,...,i_N=1}^N \psi^{i_1...i_N}\,\mathbf{e}_{i_1}\wedge...\wedge\mathbf{e}_{i_N}.$$

Then the coefficient $C$ is $C\equiv\psi^{12...N}$. In the formula above, the combinatorial factor $N!$ compensates for the fact that we are summing an antisymmetric product of vectors with a totally antisymmetric array of coefficients.

To write such arrays more conveniently, one can use the Levi-Civita symbol $\varepsilon^{i_1...i_N}$ (see Sec. 2.3.6). It is clear that any other totally antisymmetric array of numbers with $N$ indices, such as $\psi^{i_1...i_N}$, is proportional to $\varepsilon^{i_1...i_N}$: For indices $\{i_1, ..., i_N\}$ that correspond to a permutation $\sigma$ we have

$$\psi^{i_1...i_N} = \psi^{12...N}(-1)^{|\sigma|},$$

and hence

$$\psi^{i_1...i_N} = (\psi^{12...N})\,\varepsilon^{i_1...i_N}.$$

How to compute the index representation of $\psi$ given the array $v^k_j$ of the components of the vectors $\{\mathbf{v}_j\}$? We need to represent the tensor

$$\psi \equiv \sum_\sigma (-1)^{|\sigma|}\,\mathbf{v}_{\sigma(1)}\otimes\mathbf{v}_{\sigma(2)}\otimes...\otimes\mathbf{v}_{\sigma(N)}.$$

Hence, we can use the Levi-Civita symbol and write

$$\psi^{12...N} = \sum_\sigma (-1)^{|\sigma|}\,v^1_{\sigma(1)}v^2_{\sigma(2)}...v^N_{\sigma(N)} = \sum_{i_1,...,i_N=1}^N \varepsilon^{i_1...i_N}\,v^1_{i_1}...v^N_{i_N}.$$

The component $\psi^{12...N}$ is the only number we need to represent $\psi$ in the basis $\{\mathbf{e}_j\}$.

The Levi-Civita symbol itself can be seen as the index representation of the tensor

$$\omega \equiv \mathbf{e}_1\wedge...\wedge\mathbf{e}_N$$

in the basis $\{\mathbf{e}_j\}$. (The components of $\omega$ in a different basis will, of course, differ from $\varepsilon^{i_1...i_N}$ by a constant factor.)

Now let us construct the index representation of the determinant of an operator $\hat{A}$. The operator is given by its matrix $A^i_j$ and acts on a vector $\mathbf{v}$ with components $v^i$ yielding a vector $\mathbf{u}\equiv\hat{A}\mathbf{v}$ with components

$$u^k = \sum_{i=1}^N A^k_i v^i.$$

Hence, the operator $\wedge^N\hat{A}^N$ acting on $\psi$ yields an antisymmetric tensor whose component with the indices $k_1...k_N$ is

$$\big[(\wedge^N\hat{A}^N)\psi\big]^{k_1...k_N} = \big[\hat{A}\mathbf{v}_1\wedge...\wedge\hat{A}\mathbf{v}_N\big]^{k_1...k_N} = \sum_{i_s,j_s}\varepsilon^{i_1...i_N}A^{k_1}_{j_1}v^{j_1}_{i_1}...A^{k_N}_{j_N}v^{j_N}_{i_N}.$$

Since the tensor $\wedge^N\hat{A}^N\psi$ is proportional to $\psi$ with the coefficient $\det\hat{A}$, the same proportionality holds for the components of these tensors:

$$\sum_{i_s,j_s}\varepsilon^{i_1...i_N}A^{k_1}_{j_1}v^{j_1}_{i_1}...A^{k_N}_{j_N}v^{j_N}_{i_N} = (\det\hat{A})\,\psi^{k_1...k_N} = (\det\hat{A})\sum_{i_s}\varepsilon^{i_1...i_N}v^{k_1}_{i_1}...v^{k_N}_{i_N}.$$

The relation above must hold for arbitrary vectors $\{\mathbf{v}_j\}$. This is sufficient to derive a formula for $\det\hat{A}$. Since $\{\mathbf{v}_j\}$ are arbitrary, we may select $\{\mathbf{v}_j\}$ as the basis vectors $\{\mathbf{e}_j\}$, so that $v^k_i = \delta^k_i$. Substituting this into the equation above, we find

$$\sum_{i_s}\varepsilon^{i_1...i_N}A^{k_1}_{i_1}...A^{k_N}_{i_N} = (\det\hat{A})\,\varepsilon^{k_1...k_N}.$$

We can now solve for $\det\hat{A}$ by multiplying with another Levi-Civita symbol $\varepsilon_{k_1...k_N}$, written this time with lower indices to comply with the summation convention, and summing over all $k_s$. By elementary combinatorics (there are $N!$ possibilities to choose the indices $k_1, ..., k_N$ such that they are all different), we have

$$\sum_{k_1,...,k_N}\varepsilon_{k_1...k_N}\varepsilon^{k_1...k_N} = N!,$$

and therefore

$$\det(\hat{A}) = \frac{1}{N!}\sum_{i_s,k_s}\varepsilon_{k_1...k_N}\varepsilon^{i_1...i_N}A^{k_1}_{i_1}...A^{k_N}_{i_N}.$$

This formula can be seen as the index representation of

$$\det\hat{A} = \omega^*(\wedge^N\hat{A}^N\omega),$$

where $\omega^*\in(\wedge^N V)^*$ is the tensor dual to $\omega$ and such that $\omega^*(\omega) = 1$. The components of $\omega^*$ are

$$\frac{1}{N!}\,\varepsilon_{k_1...k_N}.$$

We have shown how the index notation can express calculations with determinants and tensors in the space $\wedge^N V$. Such calculations in the index notation are almost always more cumbersome than in the index-free notation.

3.5 Solving linear equations

Determinants allow us to "determine" whether a system of linear equations has solutions. I will now explain this using exterior products. I will also show how to use exterior products for actually finding the solutions of linear equations when they exist.

A system of $N$ linear equations for $N$ unknowns $x_1, ..., x_N$ can be written in the matrix form,

$$\sum_{j=1}^N A_{ij}x_j = b_i, \quad i = 1, ..., N. \tag{3.9}$$

Here $A_{ij}$ is a given matrix of coefficients, and the $N$ numbers $b_i$ are also given.

The first step in studying Eq. (3.9) is to interpret it in a geometric way, so that $A_{ij}$ is not merely a "table of numbers" but a


geometric object. We introduce an $N$-dimensional vector space $V = \mathbb{R}^N$, in which a basis $\{\mathbf{e}_i\}$ is fixed. There are two options (both will turn out to be useful). The first option is to interpret $A_{ij}$, $b_j$, and $x_j$ as the coefficients representing some linear operator $\hat{A}$ and some vectors $\mathbf{b}$, $\mathbf{x}$ in the basis $\{\mathbf{e}_j\}$:
\[ \hat{A} \equiv \sum_{i,j=1}^{N} A_{ij}\, \mathbf{e}_i \otimes \mathbf{e}^{*}_j, \quad \mathbf{b} \equiv \sum_{j=1}^{N} b_j \mathbf{e}_j, \quad \mathbf{x} \equiv \sum_{j=1}^{N} x_j \mathbf{e}_j. \]
Then we reformulate Eq. (3.9) as the vector equation
\[ \hat{A}\mathbf{x} = \mathbf{b}, \qquad (3.10) \]
from which we would like to find the unknown vector $\mathbf{x}$.
The second option is to interpret $A_{ij}$ as the components of a set of $N$ vectors $\{\mathbf{a}_1, ..., \mathbf{a}_N\}$ with respect to the basis,
\[ \mathbf{a}_j \equiv \sum_{i=1}^{N} A_{ij} \mathbf{e}_i, \quad j = 1, ..., N, \]
to define $\mathbf{b}$ as before,
\[ \mathbf{b} \equiv \sum_{j=1}^{N} b_j \mathbf{e}_j, \]
and to rewrite Eq. (3.9) as an equation expressing $\mathbf{b}$ as a linear combination of $\{\mathbf{a}_j\}$ with unknown coefficients $\{x_j\}$,
\[ \sum_{j=1}^{N} x_j \mathbf{a}_j = \mathbf{b}. \qquad (3.11) \]
In this interpretation, $\{x_j\}$ is just a set of $N$ unknown numbers. These numbers could be interpreted as the set of components of the vector $\mathbf{b}$ in the basis $\{\mathbf{a}_j\}$ if $\{\mathbf{a}_j\}$ were actually a basis, which is not necessarily the case.

3.5.1 Existence of solutions

Let us begin with the first interpretation, Eq. (3.10). When does Eq. (3.10) have solutions? The solution certainly exists when the operator $\hat{A}$ is invertible, i.e. the inverse operator $\hat{A}^{-1}$ exists such that $\hat{A}\hat{A}^{-1} = \hat{A}^{-1}\hat{A} = \hat{1}_V$; then the solution is found as $\mathbf{x} = \hat{A}^{-1}\mathbf{b}$. The condition for the existence of $\hat{A}^{-1}$ is that the determinant of $\hat{A}$ is nonzero. When the determinant of $\hat{A}$ is zero, the solution may or may not exist, and the solution is more complicated. I will give a proof of these statements based on the new definition D1 of the determinant.
Theorem 1: If $\det \hat{A} \neq 0$, the equation $\hat{A}\mathbf{x} = \mathbf{b}$ has a unique solution $\mathbf{x}$ for any $\mathbf{b} \in V$. There exists a linear operator $\hat{A}^{-1}$ such that the solution $\mathbf{x}$ is expressed as $\mathbf{x} = \hat{A}^{-1}\mathbf{b}$.
Proof: Suppose $\{\mathbf{e}_i \,|\, i = 1, ..., N\}$ is a basis in $V$. It follows from $\det \hat{A} \neq 0$ that
\[ \wedge^N \hat{A}^N (\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N) = (\hat{A}\mathbf{e}_1) \wedge ... \wedge (\hat{A}\mathbf{e}_N) \neq 0. \]
By Theorem 1 of Sec. 2.3.2, the set of vectors $\{\hat{A}\mathbf{e}_1, ..., \hat{A}\mathbf{e}_N\}$ is linearly independent and therefore is a basis in $V$. Thus there exists a unique set of coefficients $\{c_i\}$ such that
\[ \mathbf{b} = \sum_{i=1}^{N} c_i (\hat{A}\mathbf{e}_i). \]
Then due to linearity of $\hat{A}$ we have
\[ \mathbf{b} = \hat{A} \sum_{i=1}^{N} c_i \mathbf{e}_i; \]
in other words, the solution of the equation $\hat{A}\mathbf{x} = \mathbf{b}$ is $\mathbf{x} \equiv \sum_{i=1}^{N} c_i \mathbf{e}_i$. Since the coefficients $\{c_i\}$ are determined uniquely, the solution $\mathbf{x}$ is unique.
The solution $\mathbf{x}$ can be expressed as a function of $\mathbf{b}$ as follows. Since $\{\hat{A}\mathbf{e}_i\}$ is a basis, there exists the corresponding dual basis, which we may denote by $\{\mathbf{v}^{*}_j\}$. Then the coefficients $c_i$ can be expressed as $c_i = \mathbf{v}^{*}_i(\mathbf{b})$, and the vector $\mathbf{x}$ as
\[ \mathbf{x} = \sum_{i=1}^{N} c_i \mathbf{e}_i = \sum_{i=1}^{N} \mathbf{e}_i \mathbf{v}^{*}_i(\mathbf{b}) = \Big( \sum_{i=1}^{N} \mathbf{e}_i \otimes \mathbf{v}^{*}_i \Big) \mathbf{b} \equiv \hat{A}^{-1} \mathbf{b}. \]
This shows explicitly that the operator $\hat{A}^{-1}$ exists and is linear. ∎
Corollary: If $\det \hat{A} \neq 0$, the equation $\hat{A}\mathbf{v} = 0$ has only the (trivial) solution $\mathbf{v} = 0$.
Proof: The zero vector $\mathbf{v} = 0$ is a solution of $\hat{A}\mathbf{v} = 0$. By the above theorem the solution of that equation is unique, thus there are no other solutions. ∎
Theorem 2 (existence of eigenvectors): If $\det \hat{A} = 0$, there exists at least one eigenvector with eigenvalue 0, that is, at least one nonzero vector $\mathbf{v}$ such that $\hat{A}\mathbf{v} = 0$.
Proof: Choose a basis $\{\mathbf{e}_j\}$ and consider the set $\{\hat{A}\mathbf{e}_1, ..., \hat{A}\mathbf{e}_N\}$. This set must be linearly dependent since
\[ \hat{A}\mathbf{e}_1 \wedge ... \wedge \hat{A}\mathbf{e}_N = (\det \hat{A})\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = 0. \]
Hence, there must exist at least one linear combination $\sum_{i=1}^{N} \lambda_i \hat{A}\mathbf{e}_i = 0$ with $\lambda_i$ not all zero. Then the vector $\mathbf{v} \equiv \sum_{i=1}^{N} \lambda_i \mathbf{e}_i$ is nonzero and satisfies $\hat{A}\mathbf{v} = 0$. ∎
Remark: If $\det \hat{A} = 0$, there may exist more than one eigenvector $\mathbf{v}$ such that $\hat{A}\mathbf{v} = 0$; more detailed analysis is needed to fully determine the eigenspace of zero eigenvalue, but we found that at least one eigenvector $\mathbf{v}$ exists. If $\det \hat{A} = 0$ then the equation $\hat{A}\mathbf{x} = \mathbf{b}$ with $\mathbf{b} \neq 0$ may still have solutions, although not for every $\mathbf{b}$. Moreover, when a solution $\mathbf{x}$ exists it will not be unique because $\mathbf{x} + \lambda\mathbf{v}$ is another solution if $\mathbf{x}$ is one. The full analysis of solvability of the equation $\hat{A}\mathbf{x} = \mathbf{b}$ when $\det \hat{A} = 0$ is more complicated (see the end of Sec. 3.5.2). ∎
Once the inverse operator $\hat{A}^{-1}$ is determined, it is easy to compute solutions of any number of equations $\hat{A}\mathbf{x} = \mathbf{b}_1$, $\hat{A}\mathbf{x} = \mathbf{b}_2$, etc., for any number of vectors $\mathbf{b}_1$, $\mathbf{b}_2$, etc. However, if we only need to solve one such equation, $\hat{A}\mathbf{x} = \mathbf{b}$, then computing the full inverse operator is too much work: We have to determine the entire dual basis $\{\mathbf{v}^{*}_j\}$ and construct the operator $\hat{A}^{-1} = \sum_{i=1}^{N} \mathbf{e}_i \otimes \mathbf{v}^{*}_i$. An easier method is then provided by Cramer's rule.

3.5.2 Cramer's rule and beyond

We will now use the second interpretation, Eq. (3.11), of a linear system. This equation claims that $\mathbf{b}$ is a linear combination of the $N$ vectors of the set $\{\mathbf{a}_1, ..., \mathbf{a}_N\}$. Clearly, this is true for any $\mathbf{b}$ if $\{\mathbf{a}_1, ..., \mathbf{a}_N\}$ is a basis in $V$; in that case, the solution $\{x_j\}$ exists


and is unique because the dual basis, $\{\mathbf{a}^{*}_j\}$, exists and allows us to write the solution as
\[ x_j = \mathbf{a}^{*}_j(\mathbf{b}). \]
On the other hand, when $\{\mathbf{a}_1, ..., \mathbf{a}_N\}$ is not a basis in $V$ it is not certain that some given vector $\mathbf{b}$ is a linear combination of $\mathbf{a}_j$. In that case, the solution $\{x_j\}$ may or may not exist, and when it exists it will not be unique.
We first consider the case where $\{\mathbf{a}_j\}$ is a basis in $V$. In this case, the solution $\{x_j\}$ exists, and we would like to determine it more explicitly. We recall that an explicit computation of the dual basis was shown in Sec. 2.3.3. Motivated by the constructions given in that section, we consider the tensor
\[ \omega \equiv \mathbf{a}_1 \wedge ... \wedge \mathbf{a}_N \in \wedge^N V \]
and additionally the $N$ tensors $\{\omega_j \,|\, j = 1, ..., N\}$, defined by
\[ \omega_j \equiv \mathbf{a}_1 \wedge ... \wedge \mathbf{a}_{j-1} \wedge \mathbf{b} \wedge \mathbf{a}_{j+1} \wedge ... \wedge \mathbf{a}_N \in \wedge^N V. \qquad (3.12) \]
The tensor $\omega_j$ is the exterior product of all the vectors $\mathbf{a}_1$ to $\mathbf{a}_N$ except that $\mathbf{a}_j$ is replaced by $\mathbf{b}$. Since we know that the solution $x_j$ exists, we can substitute $\mathbf{b} = \sum_{i=1}^{N} x_i \mathbf{a}_i$ into Eq. (3.12) and find
\[ \omega_j = \mathbf{a}_1 \wedge ... \wedge x_j \mathbf{a}_j \wedge ... \wedge \mathbf{a}_N = x_j \omega. \]
Since $\{\mathbf{a}_j\}$ is a basis, the tensor $\omega \in \wedge^N V$ is nonzero (Theorem 1 in Sec. 2.3.2). Hence $x_j$ ($j = 1, ..., N$) can be computed as the coefficient of proportionality between $\omega_j$ and $\omega$:
\[ x_j = \frac{\omega_j}{\omega} = \frac{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_{j-1} \wedge \mathbf{b} \wedge \mathbf{a}_{j+1} \wedge ... \wedge \mathbf{a}_N}{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_N}. \]
As before, the "division" of tensors means that the nonzero tensor $\omega$ is to be factored out of the numerator and canceled with the denominator, leaving a number.
This formula represents Cramer's rule, which yields explicitly the coefficients $x_j$ necessary to represent a vector $\mathbf{b}$ through vectors $\{\mathbf{a}_1, ..., \mathbf{a}_N\}$. In its matrix formulation, Cramer's rule says that $x_j$ is equal to the determinant of the modified matrix $A_{ij}$ where the $j$-th column has been replaced by the column $(b_1, ..., b_N)$, divided by the determinant of the unmodified $A_{ij}$.
It remains to consider the case where $\{\mathbf{a}_j\}$ is not a basis in $V$. We have seen in Statement 2.3.5 that there exists a maximal nonzero exterior product of some linearly independent subset of $\{\mathbf{a}_j\}$; this subset can be found by trying various exterior products of the $\mathbf{a}_j$'s. Let us now denote by $\omega$ this maximal exterior product. Without loss of generality, we may renumber the $\mathbf{a}_j$'s so that $\omega = \mathbf{a}_1 \wedge ... \wedge \mathbf{a}_r$, where $r$ is the rank of the set $\{\mathbf{a}_j\}$. If the equation $\sum_{j=1}^{n} x_j \mathbf{a}_j = \mathbf{b}$ has a solution then $\mathbf{b}$ is expressible as a linear combination of the $\mathbf{a}_j$'s; thus we must have $\omega \wedge \mathbf{b} = 0$. We can check whether $\omega \wedge \mathbf{b} = 0$ since we have already computed $\omega$. If we find that $\omega \wedge \mathbf{b} \neq 0$ we know that the equation $\sum_{j=1}^{n} x_j \mathbf{a}_j = \mathbf{b}$ has no solutions.
If we find that $\omega \wedge \mathbf{b} = 0$ then we can conclude that the vector $\mathbf{b}$ belongs to the subspace $\mathrm{Span}\,\{\mathbf{a}_1, ..., \mathbf{a}_r\}$, and so the equation $\sum_{j=1}^{n} x_j \mathbf{a}_j = \mathbf{b}$ has solutions, in fact infinitely many of them. To determine all solutions, we will note that the set $\{\mathbf{a}_1, ..., \mathbf{a}_r\}$ is linearly independent, so $\mathbf{b}$ is uniquely represented as a linear combination of the vectors $\mathbf{a}_1, ..., \mathbf{a}_r$. In other words, there is a unique solution of the form
\[ x^{(1)}_i = (x^{(1)}_1, ..., x^{(1)}_r, 0, ..., 0) \]
that may have nonzero coefficients $x^{(1)}_1, ..., x^{(1)}_r$ only up to the component number $r$, after which $x^{(1)}_i = 0$ ($r + 1 \le i \le n$). To obtain the coefficients $x^{(1)}_i$, we use Cramer's rule for the subspace $\mathrm{Span}\,\{\mathbf{a}_1, ..., \mathbf{a}_r\}$:
\[ x^{(1)}_i = \frac{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_{i-1} \wedge \mathbf{b} \wedge \mathbf{a}_{i+1} \wedge ... \wedge \mathbf{a}_r}{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_r}, \quad i = 1, ..., r. \]
We can now obtain the general solution of the equation $\sum_{j=1}^{n} x_j \mathbf{a}_j = \mathbf{b}$ by adding to the solution $x^{(1)}_i$ an arbitrary solution $x^{(0)}_i$ of the homogeneous equation, $\sum_{j=1}^{n} x^{(0)}_j \mathbf{a}_j = 0$. The solutions of the homogeneous equation build a subspace that can be determined as an eigenspace of the operator $\hat{A}$ as considered in the previous subsection. We can also determine the homogeneous solutions using the method of this section, as follows.
We decompose the vectors $\mathbf{a}_{r+1}, ..., \mathbf{a}_n$ into linear combinations of $\mathbf{a}_1, ..., \mathbf{a}_r$ again by using Cramer's rule:
\[ \mathbf{a}_k = \sum_{j=1}^{r} \alpha_{kj} \mathbf{a}_j, \quad k = r + 1, ..., n, \]
\[ \alpha_{kj} \equiv \frac{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_{j-1} \wedge \mathbf{a}_k \wedge \mathbf{a}_{j+1} \wedge ... \wedge \mathbf{a}_r}{\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_r}. \]
Having computed the coefficients $\alpha_{kj}$, we determine the $(n - r)$-dimensional space of homogeneous solutions. This space is spanned by the $(n - r)$ solutions that can be chosen, for example, as follows:
\[ x^{(0)(r+1)}_i = (\alpha_{(r+1)1}, ..., \alpha_{(r+1)r}, -1, 0, ..., 0), \]
\[ x^{(0)(r+2)}_i = (\alpha_{(r+2)1}, ..., \alpha_{(r+2)r}, 0, -1, ..., 0), \]
\[ ... \]
\[ x^{(0)(n)}_i = (\alpha_{n1}, ..., \alpha_{nr}, 0, 0, ..., -1). \]
Finally, the solution of the equation $\sum_{j=1}^{n} x_j \mathbf{a}_j = \mathbf{b}$ can be written as
\[ x_i = x^{(1)}_i + \sum_{k=r+1}^{n} \lambda_k x^{(0)(k)}_i, \quad i = 1, ..., n, \]
where $\{\lambda_k \,|\, k = r + 1, ..., n\}$ are arbitrary coefficients. The formula above explicitly contains $(n - r)$ arbitrary constants and is called the general solution of $\sum_{i=1}^{n} x_i \mathbf{a}_i = \mathbf{b}$. (The general solution of something is a formula with arbitrary constants that describes all solutions.)
Example: Consider the linear system
\[ 2x + y = 1, \]
\[ 2x + 2y + z = 4, \]
\[ y + z = 3. \]
Let us apply the procedure above to this system. We interpret this system as the vector equation $x\mathbf{a} + y\mathbf{b} + z\mathbf{c} = \mathbf{p}$ where $\mathbf{a} = (2, 2, 0)$, $\mathbf{b} = (1, 2, 1)$, $\mathbf{c} = (0, 1, 1)$, and $\mathbf{p} = (1, 4, 3)$ are given vectors. Introducing an explicit basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$, we compute (using elimination)
\[ \mathbf{a} \wedge \mathbf{b} = (2\mathbf{e}_1 + 2\mathbf{e}_2) \wedge (\mathbf{e}_1 + 2\mathbf{e}_2 + \mathbf{e}_3) = 2(\mathbf{e}_1 + \mathbf{e}_2) \wedge (\mathbf{e}_1 + 2\mathbf{e}_2 + \mathbf{e}_3) = 2(\mathbf{e}_1 + \mathbf{e}_2) \wedge (\mathbf{e}_2 + \mathbf{e}_3) = \mathbf{a} \wedge \mathbf{c}. \]
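The exterior products in this example, and the Cramer's rule formula $x_j = \omega_j / \omega$ for the case where the vectors form a basis, are easy to check by machine. The following is only a minimal sketch restricted to $\mathbb{R}^3$: the helper names `wedge2`, `wedge3` and `cramer3` are hypothetical and chosen for illustration. A 2-vector is stored by its coefficients at $\mathbf{e}_1 \wedge \mathbf{e}_2$, $\mathbf{e}_1 \wedge \mathbf{e}_3$, $\mathbf{e}_2 \wedge \mathbf{e}_3$, and a 3-vector by its single coefficient at $\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3$.

```python
from fractions import Fraction
from itertools import combinations


def wedge2(u, v):
    """Coefficients of u ∧ v (u, v in R^3) at e1∧e2, e1∧e3, e2∧e3."""
    return tuple(u[i] * v[j] - u[j] * v[i]
                 for i, j in combinations(range(3), 2))


def wedge3(u, v, w):
    """Coefficient of u ∧ v ∧ w at e1∧e2∧e3 (a 3x3 determinant)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def cramer3(a1, a2, a3, b):
    """Solve x1*a1 + x2*a2 + x3*a3 = b when {a1, a2, a3} is a basis of R^3.

    Each x_j is the ratio omega_j / omega, where omega_j is omega with
    a_j replaced by b.
    """
    omega = wedge3(a1, a2, a3)
    if omega == 0:
        raise ValueError("vectors are linearly dependent; not a basis")
    return (Fraction(wedge3(b, a2, a3), omega),
            Fraction(wedge3(a1, b, a3), omega),
            Fraction(wedge3(a1, a2, b), omega))


# Vectors of the example system.
a, b, c = (2, 2, 0), (1, 2, 1), (0, 1, 1)
print(wedge2(a, b), wedge2(a, c))  # the two 2-vectors coincide
print(wedge3(a, b, c))             # top-degree product of a, b, c

# An unrelated basis (an illustrative assumption, not from the text).
print(cramer3((1, 0, 0), (1, 1, 0), (1, 1, 1), (6, 5, 3)))
```

Exact integer and rational arithmetic is used so that the test "is this exterior product zero?" is not affected by rounding.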


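Therefore $\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} = 0$, and the maximal nonzero exterior product can be chosen as $\omega \equiv \mathbf{a} \wedge \mathbf{b}$. Now we check whether the vector $\mathbf{p}$ belongs to the subspace $\mathrm{Span}\,\{\mathbf{a}, \mathbf{b}\}$:
\[ \omega \wedge \mathbf{p} = 2(\mathbf{e}_1 + \mathbf{e}_2) \wedge (\mathbf{e}_2 + \mathbf{e}_3) \wedge (\mathbf{e}_1 + 4\mathbf{e}_2 + 3\mathbf{e}_3) = 2(\mathbf{e}_1 + \mathbf{e}_2) \wedge (\mathbf{e}_2 + \mathbf{e}_3) \wedge 3(\mathbf{e}_2 + \mathbf{e}_3) = 0. \]
Therefore, $\mathbf{p}$ can be represented as a linear combination of $\mathbf{a}$ and $\mathbf{b}$. To determine the coefficients, we use Cramer's rule: $\mathbf{p} = \alpha\mathbf{a} + \beta\mathbf{b}$ where
\[ \alpha = \frac{\mathbf{p} \wedge \mathbf{b}}{\mathbf{a} \wedge \mathbf{b}} = \frac{(\mathbf{e}_1 + 4\mathbf{e}_2 + 3\mathbf{e}_3) \wedge (\mathbf{e}_1 + 2\mathbf{e}_2 + \mathbf{e}_3)}{2(\mathbf{e}_1 + \mathbf{e}_2) \wedge (\mathbf{e}_2 + \mathbf{e}_3)} = \frac{-2\mathbf{e}_1 \wedge \mathbf{e}_2 - 2\mathbf{e}_1 \wedge \mathbf{e}_3 - 2\mathbf{e}_2 \wedge \mathbf{e}_3}{2(\mathbf{e}_1 \wedge \mathbf{e}_2 + \mathbf{e}_1 \wedge \mathbf{e}_3 + \mathbf{e}_2 \wedge \mathbf{e}_3)} = -1; \]
\[ \beta = \frac{\mathbf{a} \wedge \mathbf{p}}{\mathbf{a} \wedge \mathbf{b}} = \frac{2(\mathbf{e}_1 + \mathbf{e}_2) \wedge (\mathbf{e}_1 + 4\mathbf{e}_2 + 3\mathbf{e}_3)}{2(\mathbf{e}_1 + \mathbf{e}_2) \wedge (\mathbf{e}_2 + \mathbf{e}_3)} = \frac{3\mathbf{e}_1 \wedge \mathbf{e}_2 + 3\mathbf{e}_1 \wedge \mathbf{e}_3 + 3\mathbf{e}_2 \wedge \mathbf{e}_3}{\mathbf{e}_1 \wedge \mathbf{e}_2 + \mathbf{e}_1 \wedge \mathbf{e}_3 + \mathbf{e}_2 \wedge \mathbf{e}_3} = 3. \]
Therefore, $\mathbf{p} = -\mathbf{a} + 3\mathbf{b}$; thus the inhomogeneous solution is $x^{(1)} = (-1, 3, 0)$.
To determine the space of homogeneous solutions, we decompose $\mathbf{c}$ into a linear combination of $\mathbf{a}$ and $\mathbf{b}$ by the same method; the result is $\mathbf{c} = -\frac{1}{2}\mathbf{a} + \mathbf{b}$. So the space of homogeneous solutions is spanned by the single solution
\[ x^{(0)(1)}_i = \left( -\tfrac{1}{2}, 1, -1 \right). \]
Finally, we write the general solution as
\[ x_i = x^{(1)}_i + \lambda x^{(0)(1)}_i = \left( -1 - \tfrac{1}{2}\lambda,\; 3 + \lambda,\; -\lambda \right), \]
where $\lambda$ is an arbitrary constant. ∎
Remark: In the calculations of the coefficients according to Cramer's rule the numerators and the denominators always contain the same tensor, such as $\mathbf{e}_1 \wedge \mathbf{e}_2 + \mathbf{e}_1 \wedge \mathbf{e}_3 + \mathbf{e}_2 \wedge \mathbf{e}_3$, multiplied by a constant factor. We have seen this in the above examples. This is guaranteed to happen in every case; it is impossible that a numerator should contain $\mathbf{e}_1 \wedge \mathbf{e}_2 + \mathbf{e}_1 \wedge \mathbf{e}_3 + 2\mathbf{e}_2 \wedge \mathbf{e}_3$ or some other tensor not proportional to $\omega$. Therefore, in practical calculations it is sufficient to compute just one coefficient, say at $\mathbf{e}_1 \wedge \mathbf{e}_2$, in both the numerator and the denominator.
Exercise: Techniques based on Cramer's rule can be applied also to non-square systems. Consider the system
\[ x + y = 1, \]
\[ y + z = 1. \]
This system has infinitely many solutions. Determine the general solution.
Answer: For example, the general solution can be written as
\[ x_i = (1, 0, 1) + \alpha (1, -1, 1), \]
where $\alpha$ is an arbitrary number.

3.6 Vandermonde matrix

The Vandermonde matrix is defined by
\[ \mathrm{Vand}(x_1, ..., x_N) \equiv \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_N \\ x_1^2 & x_2^2 & \cdots & x_N^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{N-1} & x_2^{N-1} & \cdots & x_N^{N-1} \end{pmatrix}. \]
It is a curious matrix that is useful in several ways. A classic result is an explicit formula for the determinant of this matrix. Let us first compute the determinant for a Vandermonde matrix of small size.
Exercise 1: Verify that the Vandermonde determinants for $N = 2$ and $N = 3$ are as follows,
\[ \begin{vmatrix} 1 & 1 \\ x & y \end{vmatrix} = y - x; \qquad \begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^2 & y^2 & z^2 \end{vmatrix} = (y - x)(z - x)(z - y). \]
It now appears plausible from these examples that the determinant that we denote by $\det(\mathrm{Vand}(x_1, ..., x_N))$ is equal to the product of the pairwise differences between all the $x_i$'s.
Statement 1: The determinant of the Vandermonde matrix is given by
\[ \det(\mathrm{Vand}(x_1, ..., x_N)) = (x_2 - x_1)(x_3 - x_1) ... (x_N - x_{N-1}) = \prod_{1 \le i < j \le N} (x_j - x_i). \qquad (3.13) \]
Proof: Let us represent the Vandermonde matrix as a table of the components of a set of $N$ vectors $\{\mathbf{v}_j\}$ with respect to some basis $\{\mathbf{e}_j\}$. Looking at the Vandermonde matrix, we find that the components of the vector $\mathbf{v}_1$ are $(1, 1, ..., 1)$, so
\[ \mathbf{v}_1 = \mathbf{e}_1 + ... + \mathbf{e}_N. \]
The components of the vector $\mathbf{v}_2$ are $(x_1, x_2, ..., x_N)$; the components of the vector $\mathbf{v}_3$ are $(x_1^2, x_2^2, ..., x_N^2)$. Generally, the vector $\mathbf{v}_j$ ($j = 1, ..., N$) has components $(x_1^{j-1}, ..., x_N^{j-1})$. It is convenient to introduce a linear operator $\hat{A}$ such that $\hat{A}\mathbf{e}_1 = x_1\mathbf{e}_1$, ..., $\hat{A}\mathbf{e}_N = x_N\mathbf{e}_N$; in other words, the operator $\hat{A}$ is diagonal in the basis $\{\mathbf{e}_j\}$, and $\mathbf{e}_j$ is an eigenvector of $\hat{A}$ with the eigenvalue $x_j$. A tensor representation of $\hat{A}$ is
\[ \hat{A} = \sum_{j=1}^{N} x_j\, \mathbf{e}_j \otimes \mathbf{e}^{*}_j. \]
Then we have a short formula for $\mathbf{v}_j$:
\[ \mathbf{v}_j = \hat{A}^{j-1}\mathbf{u}, \quad j = 1, ..., N; \qquad \mathbf{u} \equiv \mathbf{v}_1 = \mathbf{e}_1 + ... + \mathbf{e}_N. \]
According to Statement 1 of Sec. 3.4, the determinant of the Vandermonde matrix is equal to the coefficient $C$ in the equation
\[ \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = C\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N. \]
So our purpose now is to determine $C$. Let us use the formula for $\mathbf{v}_j$ to rewrite
\[ \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = \mathbf{u} \wedge \hat{A}\mathbf{u} \wedge \hat{A}^2\mathbf{u} \wedge ... \wedge \hat{A}^{N-1}\mathbf{u}. \qquad (3.14) \]
Now we use the following trick: since $\mathbf{a} \wedge \mathbf{b} = \mathbf{a} \wedge (\mathbf{b} + \lambda\mathbf{a})$ for any $\lambda$, we may replace
\[ \mathbf{u} \wedge \hat{A}\mathbf{u} = \mathbf{u} \wedge (\hat{A}\mathbf{u} + \lambda\mathbf{u}) = \mathbf{u} \wedge (\hat{A} + \lambda\hat{1})\mathbf{u}. \]
Similarly, we may replace the factor $\hat{A}^2\mathbf{u}$ by $(\hat{A}^2 + \lambda_1\hat{A} + \lambda_2)\mathbf{u}$, with arbitrary coefficients $\lambda_1$ and $\lambda_2$. We may pull this trick in every factor in the tensor product (3.14) starting from the second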


factor. In effect, we may replace $\hat{A}^k$ by an arbitrary polynomial $p_k(\hat{A})$ of degree $k$ as long as the coefficient at $\hat{A}^k$ remains 1. (Such polynomials are called monic polynomials.) So we obtain
\[ \mathbf{u} \wedge \hat{A}\mathbf{u} \wedge \hat{A}^2\mathbf{u} \wedge ... \wedge \hat{A}^{N-1}\mathbf{u} = \mathbf{u} \wedge p_1(\hat{A})\mathbf{u} \wedge p_2(\hat{A})\mathbf{u} \wedge ... \wedge p_{N-1}(\hat{A})\mathbf{u}. \]
Since we may choose the monic polynomials $p_j(\hat{A})$ arbitrarily, we would like to choose them such that the formula is simplified as much as possible.
Let us first choose the polynomial $p_{N-1}$ because that polynomial has the highest degree ($N - 1$) and so affords us the most freedom. Here comes another trick: If we choose
\[ p_{N-1}(x) \equiv (x - x_1)(x - x_2) ... (x - x_{N-1}), \]
then the operator $p_{N-1}(\hat{A})$ will be much simplified:
\[ p_{N-1}(\hat{A})\mathbf{e}_N = p_{N-1}(x_N)\mathbf{e}_N; \qquad p_{N-1}(\hat{A})\mathbf{e}_j = 0, \quad j = 1, ..., N - 1. \]
Therefore $p_{N-1}(\hat{A})\mathbf{u} = p_{N-1}(x_N)\mathbf{e}_N$. Now we repeat this trick for the polynomial $p_{N-2}$, choosing
\[ p_{N-2}(x) \equiv (x - x_1) ... (x - x_{N-2}) \]
and finding
\[ p_{N-2}(\hat{A})\mathbf{u} = p_{N-2}(x_{N-1})\mathbf{e}_{N-1} + p_{N-2}(x_N)\mathbf{e}_N. \]
We need to compute the exterior product, which simplifies:
\[ p_{N-2}(\hat{A})\mathbf{u} \wedge p_{N-1}(\hat{A})\mathbf{u} = \left( p_{N-2}(x_{N-1})\mathbf{e}_{N-1} + p_{N-2}(x_N)\mathbf{e}_N \right) \wedge p_{N-1}(x_N)\mathbf{e}_N = p_{N-2}(x_{N-1})\mathbf{e}_{N-1} \wedge p_{N-1}(x_N)\mathbf{e}_N. \]
Proceeding inductively in this fashion, we find
\[ \mathbf{u} \wedge p_1(\hat{A})\mathbf{u} \wedge ... \wedge p_{N-1}(\hat{A})\mathbf{u} = \mathbf{u} \wedge p_1(x_2)\mathbf{e}_2 \wedge ... \wedge p_{N-1}(x_N)\mathbf{e}_N = p_1(x_2) ... p_{N-1}(x_N)\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N, \]
where we defined each monic polynomial $p_j(x)$ as
\[ p_j(x) \equiv (x - x_1) ... (x - x_j), \quad j = 1, ..., N - 1. \]
For instance, $p_1(x) = x - x_1$. The product of the polynomials,
\[ p_1(x_2)\, p_2(x_3) ... p_{N-1}(x_N) = (x_2 - x_1)(x_3 - x_1)(x_3 - x_2) ... (x_N - x_{N-1}) = \prod_{1 \le i < j \le N} (x_j - x_i), \]
yields the required formula (3.13). ∎
Remark: This somewhat long argument explains the procedure of subtracting various rows of the Vandermonde matrix from each other in order to simplify the determinant. (The calculation appears long because I have motivated every step, rather than just go through the equations.) One can observe that the determinant of the Vandermonde matrix is nonzero if and only if all the values $x_j$ are different. This property allows one to prove the Vandermonde formula in a much more elegant way.[3] Namely, one can notice that the expression $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N$ is a polynomial in $x_j$ of degree not more than $\frac{1}{2}N(N - 1)$; that this polynomial is equal to zero unless every $x_j$ is different; therefore this polynomial must be equal to Eq. (3.13) times a constant. To find that constant, one computes explicitly the coefficient at the term $x_2 x_3^2 ... x_N^{N-1}$, which is equal to 1, hence the constant is 1. ∎
[3] I picked this up from a paper by C. Krattenthaler (see online arxiv.org/abs/math.co/9902004) where many other special determinants are evaluated using similar techniques.
In the next two subsections we will look at two interesting applications of the Vandermonde matrix.

3.6.1 Linear independence of eigenvectors

Statement: Suppose that the vectors $\mathbf{e}_1, ..., \mathbf{e}_n$ are nonzero and are eigenvectors of an operator $\hat{A}$ with all different eigenvalues $\lambda_1, ..., \lambda_n$. Then the set $\{\mathbf{e}_1, ..., \mathbf{e}_n\}$ is linearly independent. (The number $n$ may be less than the dimension $N$ of the vector space $V$; the statement holds also for infinite-dimensional spaces).
Proof. Let us show that the set $\{\mathbf{e}_j \,|\, j = 1, ..., n\}$ is linearly independent. By definition of linear independence, we need to show that $\sum_{j=1}^{n} c_j \mathbf{e}_j = 0$ is possible only if all the coefficients $c_j$ are equal to zero. Let us denote $\mathbf{u} = \sum_{j=1}^{n} c_j \mathbf{e}_j$ and assume that $\mathbf{u} = 0$. Consider the vectors $\mathbf{u}, \hat{A}\mathbf{u}, ..., \hat{A}^{n-1}\mathbf{u}$; by assumption all these vectors are equal to zero. The condition that these vectors are equal to zero is a system of vector equations that looks like this,
\[ c_1 \mathbf{e}_1 + ... + c_n \mathbf{e}_n = 0, \]
\[ c_1 \lambda_1 \mathbf{e}_1 + ... + c_n \lambda_n \mathbf{e}_n = 0, \]
\[ ... \]
\[ c_1 \lambda_1^{n-1} \mathbf{e}_1 + ... + c_n \lambda_n^{n-1} \mathbf{e}_n = 0. \]
This system of equations can be written in a matrix form with the Vandermonde matrix,
\[ \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \end{pmatrix} \begin{pmatrix} c_1 \mathbf{e}_1 \\ c_2 \mathbf{e}_2 \\ \vdots \\ c_n \mathbf{e}_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Since the eigenvalues $\lambda_j$ are (by assumption) all different, the determinant of the Vandermonde matrix is nonzero. Therefore, this system of equations has only the trivial solution, $c_j \mathbf{e}_j = 0$ for all $j$. Since $\mathbf{e}_j \neq 0$, it is necessary that all $c_j = 0$, $j = 1, ..., n$. ∎
Exercise: Show that we are justified in using the matrix method for solving a system of equations with vector-valued unknowns $c_i \mathbf{e}_i$.
Hint: Act with an arbitrary covector $\mathbf{f}^{*}$ on all the equations.

3.6.2 Polynomial interpolation

The task of polynomial interpolation consists of finding a polynomial that passes through specified points.
Statement: If the numbers $x_1, ..., x_N$ are all different and numbers $y_1, ..., y_N$ are arbitrary then there exists a unique polynomial $p(x)$ of degree at most $N - 1$ that has values $y_j$ at the points $x_j$ ($j = 1, ..., N$).
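Both applications in this section rest on the Vandermonde determinant formula (3.13), so a quick numerical sanity check may be reassuring. The sketch below is an illustration only: the function names `det`, `vandermonde` and `pairwise_product` are hypothetical, and the determinant is computed by ordinary Gaussian elimination over exact rationals rather than by the exterior-product argument of the text.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(matrix):
    """Determinant via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result


def vandermonde(xs):
    """Matrix whose k-th row (k = 0, ..., N-1) is (x_1^k, ..., x_N^k)."""
    return [[x ** k for x in xs] for k in range(len(xs))]


def pairwise_product(xs):
    """Right-hand side of (3.13): product of (x_j - x_i) over i < j."""
    return prod(xs[j] - xs[i] for i, j in combinations(range(len(xs)), 2))


xs = [2, 3, 5, 7]
print(det(vandermonde(xs)), pairwise_product(xs))  # the two values agree
print(det(vandermonde([1, 4, 4])))                 # repeated value gives zero
```

The second call illustrates the observation that the determinant vanishes as soon as two of the $x_j$ coincide.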


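Proof. Let us try to determine the coefficients of the polynomial $p(x)$. We write a polynomial with unknown coefficients,
\[ p(x) = p_0 + p_1 x + ... + p_{N-1} x^{N-1}, \]
and obtain a system of $N$ linear equations, $p(x_j) = y_j$ ($j = 1, ..., N$), for the $N$ unknowns $p_j$. The crucial observation is that this system of equations has the Vandermonde matrix. For example, with $N = 3$ we have three equations,
\[ p(x_1) = p_0 + p_1 x_1 + p_2 x_1^2 = y_1, \]
\[ p(x_2) = p_0 + p_1 x_2 + p_2 x_2^2 = y_2, \]
\[ p(x_3) = p_0 + p_1 x_3 + p_2 x_3^2 = y_3, \]
which can be rewritten in the matrix form as
\[ \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}. \]
Since the determinant of the Vandermonde matrix is nonzero as long as all $x_j$ are different, these equations always have a unique solution $\{p_j\}$. Therefore the required polynomial always exists and is unique. ∎
Question: The polynomial $p(x)$ exists, but how can I write it explicitly?
Answer: One possibility is the Lagrange interpolating polynomial; let us illustrate the idea on an example with three points:
\[ p(x) = y_1 \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} + y_2 \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} + y_3 \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}. \]
It is easy to check directly that this polynomial indeed has values $p(x_i) = y_i$ for $i = 1, 2, 3$. However, other (equivalent, but computationally more efficient) formulas are used in numerical calculations.

3.7 Multilinear actions in exterior powers

As we have seen, the action of $\hat{A}$ on the exterior power $\wedge^N V$ by
\[ \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N \mapsto \hat{A}\mathbf{v}_1 \wedge ... \wedge \hat{A}\mathbf{v}_N \]
has been very useful. However, this is not the only way $\hat{A}$ can act on an $N$-vector. Let us explore other possibilities; we will later see that they have their uses as well.
A straightforward generalization is to promote an operator $\hat{A} \in \mathrm{End}\, V$ to a linear operator in the space $\wedge^k V$, $k < N$ (rather than in the top exterior power $\wedge^N V$). We denote this by $\wedge^k \hat{A}^k$:
\[ (\wedge^k \hat{A}^k)\, \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k = \hat{A}\mathbf{v}_1 \wedge ... \wedge \hat{A}\mathbf{v}_k. \]
This is, of course, a linear map of $\wedge^k V$ to itself (but not any more a mere multiplication by a scalar!). For instance, in $\wedge^2 V$ we have
\[ (\wedge^2 \hat{A}^2)\, \mathbf{u} \wedge \mathbf{v} = \hat{A}\mathbf{u} \wedge \hat{A}\mathbf{v}. \]
However, this is not the only possibility. We could, for instance, define another map of $\wedge^2 V$ to itself like this,
\[ \mathbf{u} \wedge \mathbf{v} \mapsto (\hat{A}\mathbf{u}) \wedge \mathbf{v} + \mathbf{u} \wedge (\hat{A}\mathbf{v}). \]
This map is linear in $\hat{A}$ (as well as being a linear map of $\wedge^2 V$ to itself), so I denote this map by $\wedge^2 \hat{A}^1$ to emphasize that it contains $\hat{A}$ only linearly. I call such maps extensions of $\hat{A}$ to the exterior power space $\wedge^2 V$ (this is not a standard terminology).
It turns out that operators of this kind play an important role in many results related to determinants. Let us now generalize the examples given above. We denote by $\wedge^m \hat{A}^k$ a linear map $\wedge^m V \to \wedge^m V$ that acts on $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m$ by producing a sum of terms with $k$ copies of $\hat{A}$ in each term. For instance,
\[ \wedge^2 \hat{A}^1 (\mathbf{a} \wedge \mathbf{b}) \equiv \hat{A}\mathbf{a} \wedge \mathbf{b} + \mathbf{a} \wedge \hat{A}\mathbf{b}; \]
\[ \wedge^3 \hat{A}^3 (\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}) \equiv \hat{A}\mathbf{a} \wedge \hat{A}\mathbf{b} \wedge \hat{A}\mathbf{c}; \]
\[ \wedge^3 \hat{A}^2 (\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}) \equiv \hat{A}\mathbf{a} \wedge \hat{A}\mathbf{b} \wedge \mathbf{c} + \hat{A}\mathbf{a} \wedge \mathbf{b} \wedge \hat{A}\mathbf{c} + \mathbf{a} \wedge \hat{A}\mathbf{b} \wedge \hat{A}\mathbf{c}. \]
More generally, we can write
\[ \wedge^k \hat{A}^k (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k) = \hat{A}\mathbf{v}_1 \wedge ... \wedge \hat{A}\mathbf{v}_k; \]
\[ \wedge^k \hat{A}^1 (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k) = \sum_{j=1}^{k} \mathbf{v}_1 \wedge ... \wedge \hat{A}\mathbf{v}_j \wedge ... \wedge \mathbf{v}_k; \]
\[ \wedge^k \hat{A}^m (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k) = \sum_{\substack{s_1, ..., s_k = 0, 1 \\ \sum_j s_j = m}} \hat{A}^{s_1}\mathbf{v}_1 \wedge ... \wedge \hat{A}^{s_k}\mathbf{v}_k. \]
In the last line, the sum is over all integers $s_j$, each being either 0 or 1, so that $\hat{A}^{s_j}$ is either $\hat{1}$ or $\hat{A}$, and the total power of $\hat{A}$ is $m$.
So far we defined the action of $\wedge^m \hat{A}^k$ only on tensors of the form $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m \in \wedge^m V$. Since an arbitrary element of $\wedge^m V$ is a linear combination of such "elementary" tensors, and since we intend $\wedge^m \hat{A}^k$ to be a linear map, we define the action of $\wedge^m \hat{A}^k$ on every element of $\wedge^m V$ using linearity. For example,
\[ \wedge^2 \hat{A}^2 (\mathbf{a} \wedge \mathbf{b} + \mathbf{c} \wedge \mathbf{d}) \equiv \hat{A}\mathbf{a} \wedge \hat{A}\mathbf{b} + \hat{A}\mathbf{c} \wedge \hat{A}\mathbf{d}. \]
By now it should be clear that the extension $\wedge^m \hat{A}^k$ is indeed a linear map $\wedge^m V \to \wedge^m V$. Here is a formal definition.
Definition: For a linear operator $\hat{A}$ in $V$, the $k$-linear extension of $\hat{A}$ to the space $\wedge^m V$ is a linear transformation $\wedge^m V \to \wedge^m V$ denoted by $\wedge^m \hat{A}^k$ and defined by the formula
\[ \wedge^m \hat{A}^k \Big( \bigwedge_{j=1}^{m} \mathbf{v}_j \Big) = \sum_{(s_1, ..., s_m)} \bigwedge_{j=1}^{m} \hat{A}^{s_j} \mathbf{v}_j, \quad s_j = 0 \text{ or } 1, \quad \sum_{j=1}^{m} s_j = k. \qquad (3.15) \]
In words: To describe the action of $\wedge^m \hat{A}^k$ on a term $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m \in \wedge^m V$, we sum over all possible ways to act with $\hat{A}$ on the various vectors $\mathbf{v}_j$ from the term $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m$, where $\hat{A}$ appears exactly $k$ times. The action of $\wedge^m \hat{A}^k$ on a linear combination of terms is by definition the linear combination of the actions on each term. Also by definition we set $\wedge^m \hat{A}^0 \equiv \hat{1}_{\wedge^m V}$ and $\wedge^m \hat{A}^k \equiv \hat{0}_{\wedge^m V}$ for $k < 0$ or $k > m$ or $m > N$. The meaningful values of $m$ and $k$ for $\wedge^m \hat{A}^k$ are thus $0 \le k \le m \le N$.
Example: Let the operator $\hat{A}$ and the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ be such that $\hat{A}\mathbf{a} = 0$, $\hat{A}\mathbf{b} = 2\mathbf{b}$, $\hat{A}\mathbf{c} = \mathbf{b} + \mathbf{c}$. We can then apply the various extensions of the operator $\hat{A}$ to various tensors. For instance,
\[ \wedge^2 \hat{A}^1 (\mathbf{a} \wedge \mathbf{b}) = \hat{A}\mathbf{a} \wedge \mathbf{b} + \mathbf{a} \wedge \hat{A}\mathbf{b} = 2\mathbf{a} \wedge \mathbf{b}, \]
\[ \wedge^2 \hat{A}^2 (\mathbf{a} \wedge \mathbf{b}) = \hat{A}\mathbf{a} \wedge \hat{A}\mathbf{b} = 0, \]
\[ \wedge^3 \hat{A}^2 (\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}) = \mathbf{a} \wedge \hat{A}\mathbf{b} \wedge \hat{A}\mathbf{c} = \mathbf{a} \wedge 2\mathbf{b} \wedge \mathbf{c} = 2(\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}) \]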


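(in the last line, we dropped terms containing $\hat{A}\mathbf{a}$).
Before we move on to see why the operators $\wedge^m \hat{A}^k$ are useful, let us obtain some basic properties of these operators.
Statement 1: The $k$-linear extension of $\hat{A}$ is a linear operator in the space $\wedge^m V$.
Proof: To prove the linearity of the map, we need to demonstrate not only that $\wedge^m \hat{A}^k$ maps linear combinations into linear combinations (this is obvious), but also that the result of the action of $\wedge^m \hat{A}^k$ on a tensor $\omega \in \wedge^m V$ does not depend on the particular representation of $\omega$ through terms of the form $\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m$. Thus we need to check that
\[ \wedge^m \hat{A}^k (\omega \wedge \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \omega') = -\wedge^m \hat{A}^k (\omega \wedge \mathbf{v}_2 \wedge \mathbf{v}_1 \wedge \omega'), \]
where $\omega$ and $\omega'$ are arbitrary tensors such that $\omega \wedge \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \omega' \in \wedge^m V$. But this property is a simple consequence of the definition of $\wedge^m \hat{A}^k$ which can be verified by explicit computation. ∎
Statement 2: For any two operators $\hat{A}, \hat{B} \in \mathrm{End}\, V$, we have
\[ \wedge^m (\hat{A}\hat{B})^m = \left( \wedge^m \hat{A}^m \right) \left( \wedge^m \hat{B}^m \right). \]
For example,
\[ \wedge^2 (\hat{A}\hat{B})^2 (\mathbf{u} \wedge \mathbf{v}) = \hat{A}\hat{B}\mathbf{u} \wedge \hat{A}\hat{B}\mathbf{v} = \wedge^2 \hat{A}^2 (\hat{B}\mathbf{u} \wedge \hat{B}\mathbf{v}) = \wedge^2 \hat{A}^2 \left( \wedge^2 \hat{B}^2 (\mathbf{u} \wedge \mathbf{v}) \right). \]
Proof: This property is a direct consequence of the definition of the operator $\wedge^k \hat{A}^k$:
\[ \wedge^k \hat{A}^k (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_k) = \hat{A}\mathbf{v}_1 \wedge \hat{A}\mathbf{v}_2 \wedge ... \wedge \hat{A}\mathbf{v}_k = \bigwedge_{j=1}^{k} \hat{A}\mathbf{v}_j, \]
therefore
\[ \wedge^m (\hat{A}\hat{B})^m \Big( \bigwedge_{j=1}^{m} \mathbf{v}_j \Big) = \bigwedge_{j=1}^{m} \hat{A}\hat{B}\mathbf{v}_j, \]
\[ \wedge^m \hat{A}^m \wedge^m \hat{B}^m \Big( \bigwedge_{j=1}^{m} \mathbf{v}_j \Big) = \wedge^m \hat{A}^m \Big( \bigwedge_{j=1}^{m} \hat{B}\mathbf{v}_j \Big) = \bigwedge_{j=1}^{m} \hat{A}\hat{B}\mathbf{v}_j. \]
∎
Statement 3: The operator $\wedge^m \hat{A}^k$ is $k$-linear in $\hat{A}$,
\[ \wedge^m (\lambda\hat{A})^k = \lambda^k (\wedge^m \hat{A}^k). \]
For this reason, $\wedge^m \hat{A}^k$ is called a $k$-linear extension.
Proof: This follows directly from the definition of the operator $\wedge^m \hat{A}^k$. ∎
Finally, a formula that will be useful later (you can skip to Sec. 3.8 if you would rather see how $\wedge^m \hat{A}^k$ is used).
Statement 4: The following identity holds for any $\hat{A} \in \mathrm{End}\, V$ and for any vectors $\{\mathbf{v}_j \,|\, 1 \le j \le m\}$ and $\mathbf{u}$,
\[ \left[ \wedge^m \hat{A}^k (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m) \right] \wedge \mathbf{u} + \left[ \wedge^m \hat{A}^{k-1} (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m) \right] \wedge (\hat{A}\mathbf{u}) = \wedge^{m+1} \hat{A}^k (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m \wedge \mathbf{u}). \]
For example,
\[ \wedge^2 \hat{A}^2 (\mathbf{u} \wedge \mathbf{v}) \wedge \mathbf{w} + \wedge^2 \hat{A}^1 (\mathbf{u} \wedge \mathbf{v}) \wedge \hat{A}\mathbf{w} = \wedge^3 \hat{A}^2 (\mathbf{u} \wedge \mathbf{v} \wedge \mathbf{w}). \qquad (3.16) \]
Proof: By definition, $\wedge^{m+1} \hat{A}^k (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m \wedge \mathbf{u})$ is a sum of terms where $\hat{A}$ acts $k$ times on the vectors $\mathbf{v}_j$ and $\mathbf{u}$. We can gather all terms containing $\hat{A}\mathbf{u}$ and separately all terms containing $\mathbf{u}$, and we will get the required expressions. Here is an explicit calculation for the given example:
\[ \wedge^2 \hat{A}^2 (\mathbf{u} \wedge \mathbf{v}) \wedge \mathbf{w} = \hat{A}\mathbf{u} \wedge \hat{A}\mathbf{v} \wedge \mathbf{w}; \]
\[ \wedge^2 \hat{A}^1 (\mathbf{u} \wedge \mathbf{v}) \wedge \hat{A}\mathbf{w} = \left( \hat{A}\mathbf{u} \wedge \mathbf{v} + \mathbf{u} \wedge \hat{A}\mathbf{v} \right) \wedge \hat{A}\mathbf{w}. \]
The formula (3.16) follows.
It should now be clear how the proof proceeds in the general case. A formal proof using Eq. (3.15) is as follows. Applying Eq. (3.15), we need to sum over $s_1, ..., s_{m+1}$. We can consider terms where $s_{m+1} = 0$ separately from terms where $s_{m+1} = 1$:
\[ \wedge^{m+1} \hat{A}^k (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m \wedge \mathbf{u}) = \sum_{(s_1, ..., s_m);\, \sum s_j = k} \Big( \bigwedge_{j=1}^{m} \hat{A}^{s_j} \mathbf{v}_j \Big) \wedge \mathbf{u} + \sum_{(s_1, ..., s_m);\, \sum s_j = k-1} \Big( \bigwedge_{j=1}^{m} \hat{A}^{s_j} \mathbf{v}_j \Big) \wedge \hat{A}\mathbf{u} \]
\[ = \left[ \wedge^m \hat{A}^k (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m) \right] \wedge \mathbf{u} + \left[ \wedge^m \hat{A}^{k-1} (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_m) \right] \wedge \hat{A}\mathbf{u}. \]
∎

3.7.1 * Index notation

Let us briefly note how the multilinear action such as $\wedge^m \hat{A}^k$ can be expressed in the index notation.
Suppose that the operator $\hat{A}$ has the index representation $A^j_i$ in a fixed basis. The operator $\wedge^m \hat{A}^k$ acts in the space $\wedge^m V$; tensors $\psi$ in that space are represented in the index notation by totally antisymmetric arrays with $m$ indices, such as $\psi^{i_1...i_m}$. An operator $\hat{B} \in \mathrm{End}\,(\wedge^m V)$ must be therefore represented by an array with $2m$ indices, $B^{j_1...j_m}_{i_1...i_m}$, which is totally antisymmetric with respect to the indices $\{i_s\}$ and separately with respect to $\{j_s\}$.
Let us begin with $\wedge^m \hat{A}^m$ as the simplest case. The action of $\wedge^m \hat{A}^m$ on $\psi$ is written in the index notation as
\[ [\wedge^m \hat{A}^m \psi]^{i_1...i_m} = \sum_{j_1, ..., j_m = 1}^{N} A^{i_1}_{j_1} ... A^{i_m}_{j_m}\, \psi^{j_1...j_m}. \]
This array is totally antisymmetric in $i_1, ..., i_m$ as usual.
Another example is the action of $\wedge^m \hat{A}^1$ on $\psi$:
\[ [\wedge^m \hat{A}^1 \psi]^{i_1...i_m} = \sum_{s=1}^{m} \sum_{j=1}^{N} A^{i_s}_{j}\, \psi^{i_1...i_{s-1} j i_{s+1}...i_m}. \]
In other words, $\hat{A}$ acts only on the $s$th index of $\psi$, and we sum over all $s$.
In this way, every $\wedge^m \hat{A}^k$ can be written in the index notation, although the expressions become cumbersome.

3.8 Trace

The trace of a square matrix $A_{jk}$ is defined as the sum of its diagonal elements, $\mathrm{Tr}A \equiv \sum_{j=1}^{n} A_{jj}$. This definition is quite simple at first sight. However, if this definition is taken as fundamental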


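then one is left with many questions. Suppose $A_{jk}$ is the representation of a linear transformation in a basis; is the number $\mathrm{Tr}A$ independent of the basis? Why is this particular combination of the matrix elements useful? (Why not compute the sum of the elements of $A_{jk}$ along the other diagonal of the square, $\sum_{j=1}^{n} A_{(n+1-j)j}$?)
To clarify the significance of the trace, I will give two other definitions of the trace: one through the canonical linear map $V \otimes V^{*} \to K$, and another using the exterior powers construction, quite similar to the definition of the determinant in Sec. 3.3.
Definition Tr1: The trace $\mathrm{Tr}A$ of a tensor $A \equiv \sum_k \mathbf{v}_k \otimes \mathbf{f}^{*}_k \in V \otimes V^{*}$ is the number canonically defined by the formula
\[ \mathrm{Tr}A = \sum_k \mathbf{f}^{*}_k(\mathbf{v}_k). \qquad (3.17) \]
If we represent the tensor $A$ through the basis tensors $\mathbf{e}_j \otimes \mathbf{e}^{*}_k$, where $\{\mathbf{e}_j\}$ is some basis and $\{\mathbf{e}^{*}_k\}$ is its dual basis,
\[ A = \sum_{j=1}^{N} \sum_{k=1}^{N} A_{jk}\, \mathbf{e}_j \otimes \mathbf{e}^{*}_k, \]
then $\mathbf{e}^{*}_k(\mathbf{e}_j) = \delta_{kj}$, and it follows that
\[ \mathrm{Tr}A = \sum_{j,k=1}^{N} A_{jk}\, \mathbf{e}^{*}_k(\mathbf{e}_j) = \sum_{j,k=1}^{N} A_{jk} \delta_{kj} = \sum_{j=1}^{N} A_{jj}, \]
in agreement with the traditional definition.
Exercise 1: Show that the trace (according to Definition Tr1) does not depend on the choice of the tensor decomposition $A = \sum_k \mathbf{v}_k \otimes \mathbf{f}^{*}_k$. ∎
Here is another definition of the trace.
Definition Tr2: The trace $\mathrm{Tr}\hat{A}$ of an operator $\hat{A} \in \mathrm{End}\, V$ is the number by which any nonzero tensor $\omega \in \wedge^N V$ is multiplied when $\wedge^N \hat{A}^1$ acts on it:
\[ (\wedge^N \hat{A}^1)\,\omega = (\mathrm{Tr}\hat{A})\,\omega, \quad \forall \omega \in \wedge^N V. \qquad (3.18) \]
Alternatively written,
\[ \wedge^N \hat{A}^1 = (\mathrm{Tr}\hat{A})\, \hat{1}_{\wedge^N V}. \]
First we will show that the definition Tr2 is equivalent to the traditional definition of the trace. Recall that, according to the definition of $\wedge^N \hat{A}^1$,
\[ \wedge^N \hat{A}^1 (\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N) = \hat{A}\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_N + ... + \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_{N-1} \wedge \hat{A}\mathbf{v}_N. \]
Statement 1: If $\{\mathbf{e}_j\}$ is any basis in $V$, $\{\mathbf{e}^{*}_j\}$ is the dual basis, and a linear operator $\hat{A}$ is represented by a tensor $\hat{A} = \sum_{j,k=1}^{N} A_{jk}\, \mathbf{e}_j \otimes \mathbf{e}^{*}_k$, then the trace of $\hat{A}$ computed according to Eq. (3.18) will agree with the formula $\mathrm{Tr}\hat{A} = \sum_{j=1}^{N} A_{jj}$.
Proof: The operator $\hat{A}$ acts on the basis vectors $\{\mathbf{e}_j\}$ as follows,
\[ \hat{A}\mathbf{e}_k = \sum_{j=1}^{N} A_{jk} \mathbf{e}_j. \]
Therefore $\mathbf{e}_1 \wedge ... \wedge \hat{A}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N = A_{jj}\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N$, and definition (3.18) gives
\[ (\mathrm{Tr}\hat{A})\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = \sum_{j=1}^{N} \mathbf{e}_1 \wedge ... \wedge \hat{A}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N = \Big( \sum_{j=1}^{N} A_{jj} \Big)\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N. \]
Thus $\mathrm{Tr}\hat{A} = \sum_{j=1}^{N} A_{jj}$. ∎
Now we prove some standard properties of the trace.
Statement 2: For any operators $\hat{A}, \hat{B} \in \mathrm{End}\, V$:
(1) $\mathrm{Tr}(\hat{A} + \hat{B}) = \mathrm{Tr}\hat{A} + \mathrm{Tr}\hat{B}$.
(2) $\mathrm{Tr}(\hat{A}\hat{B}) = \mathrm{Tr}(\hat{B}\hat{A})$.
Proof: The formula (3.17) allows one to derive these properties more easily, but I will give proofs using the definition (3.18).
(1) Since
\[ \mathbf{e}_1 \wedge ... \wedge (\hat{A} + \hat{B})\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N = \mathbf{e}_1 \wedge ... \wedge \hat{A}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N + \mathbf{e}_1 \wedge ... \wedge \hat{B}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N, \]
from the definition of $\wedge^N \hat{A}^1$ we easily obtain $\wedge^N (\hat{A} + \hat{B})^1 = \wedge^N \hat{A}^1 + \wedge^N \hat{B}^1$.
(2) Since $\wedge^N \hat{A}^1$ and $\wedge^N \hat{B}^1$ are operators in one-dimensional space $\wedge^N V$, they commute, that is
\[ (\wedge^N \hat{A}^1)(\wedge^N \hat{B}^1) = (\wedge^N \hat{B}^1)(\wedge^N \hat{A}^1) = (\mathrm{Tr}\hat{A})(\mathrm{Tr}\hat{B})\, \hat{1}_{\wedge^N V}. \]
Now we explicitly compute the composition $(\wedge^N \hat{A}^1)(\wedge^N \hat{B}^1)$ acting on $\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N$. First, an example with $N = 2$,
\[ (\wedge^N \hat{A}^1)(\wedge^N \hat{B}^1)(\mathbf{e}_1 \wedge \mathbf{e}_2) = \wedge^N \hat{A}^1 (\hat{B}\mathbf{e}_1 \wedge \mathbf{e}_2 + \mathbf{e}_1 \wedge \hat{B}\mathbf{e}_2) \]
\[ = \hat{A}\hat{B}\mathbf{e}_1 \wedge \mathbf{e}_2 + \hat{B}\mathbf{e}_1 \wedge \hat{A}\mathbf{e}_2 + \hat{A}\mathbf{e}_1 \wedge \hat{B}\mathbf{e}_2 + \mathbf{e}_1 \wedge \hat{A}\hat{B}\mathbf{e}_2 \]
\[ = \wedge^N (\hat{A}\hat{B})^1\, \mathbf{e}_1 \wedge \mathbf{e}_2 + \hat{A}\mathbf{e}_1 \wedge \hat{B}\mathbf{e}_2 + \hat{B}\mathbf{e}_1 \wedge \hat{A}\mathbf{e}_2. \]
Now the general calculation:
\[ (\wedge^N \hat{A}^1)(\wedge^N \hat{B}^1)\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = \sum_{j=1}^{N} \mathbf{e}_1 \wedge ... \wedge \hat{A}\hat{B}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N + \sum_{j=1}^{N} \sum_{\substack{k=1 \\ (k \neq j)}}^{N} \mathbf{e}_1 \wedge ... \wedge \hat{A}\mathbf{e}_j \wedge ... \wedge \hat{B}\mathbf{e}_k \wedge ... \wedge \mathbf{e}_N. \]
The second sum is symmetric in $\hat{A}$ and $\hat{B}$, therefore the identity
\[ (\wedge^N \hat{A}^1)(\wedge^N \hat{B}^1)\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = (\wedge^N \hat{B}^1)(\wedge^N \hat{A}^1)\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N \]
entails
\[ \sum_{j=1}^{N} \mathbf{e}_1 \wedge ... \wedge \hat{A}\hat{B}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N = \sum_{j=1}^{N} \mathbf{e}_1 \wedge ... \wedge \hat{B}\hat{A}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N, \]
that is $\mathrm{Tr}(\hat{A}\hat{B}) = \mathrm{Tr}(\hat{B}\hat{A})$. ∎
Exercise 2: The operator $L_{\mathbf{b}}$ acts on the entire exterior algebra $\wedge V$ and is defined by $L_{\mathbf{b}} : \omega \mapsto \mathbf{b} \wedge \omega$, where $\omega \in \wedge V$ and $\mathbf{b} \in V$. Compute the trace of this operator. Hint: Use Definition Tr1 of the trace.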
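As a numerical aside on Definition Tr2, Statement 1 and Statement 2(2) above: in the basis $\{\mathbf{e}_j\}$, the coefficient of $\mathbf{e}_1 \wedge ... \wedge \hat{A}\mathbf{e}_j \wedge ... \wedge \mathbf{e}_N$ at $\mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N$ is the determinant of the identity matrix with its $j$-th column replaced by the components of $\hat{A}\mathbf{e}_j$. The sketch below sums these determinants and compares the result with the sum of diagonal elements. It is an illustration under stated assumptions, not the book's method: the names `det`, `trace_via_wedge` and `matmul` are hypothetical, matrices are lists of rows with `A[j][k]` standing for $A_{jk}$, and the determinant uses a naive Laplace expansion suitable only for small $N$.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small N only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def trace_via_wedge(A):
    """Number by which the extension of A with one copy of A multiplies e1∧...∧eN."""
    n = len(A)
    total = 0
    for j in range(n):
        # Columns are e_1, ..., A e_j, ..., e_N: identity with column j taken from A.
        cols = [[A[i][j] if k == j else int(i == k) for k in range(n)]
                for i in range(n)]
        total += det(cols)
    return total


def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Sample matrices (arbitrary illustrative values).
A = [[1, 2, 0], [3, 4, 5], [6, 7, 8]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]

print(trace_via_wedge(A), sum(A[j][j] for j in range(3)))  # both give the trace
print(trace_via_wedge(matmul(A, B)) == trace_via_wedge(matmul(B, A)))
```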


Answer: $\mathrm{Tr}\,\hat L_b = 0$.

Exercise 3: Suppose $\hat A \hat A = 0$; show that $\mathrm{Tr}\,\hat A = 0$ and $\det \hat A = 0$.

Solution: We see that $\det \hat A = 0$ because $0 = \det(\hat A \hat A) = (\det \hat A)^2$. Now we apply the operator $\wedge^N \hat A^1$ to a nonzero tensor $\omega = v_1 \wedge ... \wedge v_N \in \wedge^N V$ twice in a row:
$$\begin{aligned}
(\wedge^N \hat A^1)(\wedge^N \hat A^1)\,\omega &= (\mathrm{Tr}\,\hat A)^2 \omega \\
&= (\wedge^N \hat A^1) \sum_{j=1}^N v_1 \wedge ... \wedge \hat A v_j \wedge ... \wedge v_N \\
&= \sum_{i=1}^N \sum_{j=1}^N v_1 \wedge ... \wedge \hat A v_i \wedge ... \wedge \hat A v_j \wedge ... \wedge v_N \\
&= 2(\wedge^N \hat A^2)\,\omega.
\end{aligned}$$
(In this calculation, we omitted the terms containing $\hat A \hat A v_i$ since $\hat A \hat A = 0$.) Using this trick, we can prove by induction that for $1 \le k \le N$
$$(\mathrm{Tr}\,\hat A)^k \omega = (\wedge^N \hat A^1)^k \omega = k!\,(\wedge^N \hat A^k)\,\omega.$$
Note that $\wedge^N \hat A^N$ multiplies by the determinant of $\hat A$, which is zero. Therefore $(\mathrm{Tr}\,\hat A)^N = N!\,(\det \hat A) = 0$ and so $\mathrm{Tr}\,\hat A = 0$. ■

3.9 Characteristic polynomial

Definition: The characteristic polynomial $Q_{\hat A}(x)$ of an operator $\hat A \in \mathrm{End}\,V$ is defined as
$$Q_{\hat A}(x) \equiv \det(\hat A - x \hat 1_V).$$
This is a polynomial of degree $N$ in the variable $x$.

Example 1: The characteristic polynomial of the operator $a \hat 1_V$, where $a \in K$, is
$$Q_{a \hat 1_V}(x) = (a - x)^N.$$
Setting $a = 0$, we find that the characteristic polynomial of the zero operator $\hat 0_V$ is simply $(-x)^N$.

Example 2: Consider a diagonalizable operator $\hat A$, i.e. an operator having a basis $\{v_1, ..., v_N\}$ of eigenvectors with eigenvalues $\lambda_1, ..., \lambda_N$ (the eigenvalues are not necessarily all different). This operator can be then written in a tensor form as
$$\hat A = \sum_{i=1}^N \lambda_i\, v_i \otimes v_i^*,$$
where $\{v_i^*\}$ is the basis dual to $\{v_i\}$. The characteristic polynomial of this operator is found from
$$\begin{aligned}
\det(\hat A - x \hat 1)\, v_1 \wedge ... \wedge v_N &= (\hat A v_1 - x v_1) \wedge ... \wedge (\hat A v_N - x v_N) \\
&= (\lambda_1 - x)\, v_1 \wedge ... \wedge (\lambda_N - x)\, v_N.
\end{aligned}$$
Hence
$$Q_{\hat A}(x) = (\lambda_1 - x) ... (\lambda_N - x).$$
Note also that the trace of a diagonalizable operator is equal to the sum of the eigenvalues, $\mathrm{Tr}\,\hat A = \lambda_1 + ... + \lambda_N$, and the determinant is equal to the product of the eigenvalues, $\det \hat A = \lambda_1 \lambda_2 ... \lambda_N$. This can be easily verified by direct calculations in the eigenbasis of $\hat A$.

Exercise 1: If an operator $\hat A$ has the characteristic polynomial $Q_{\hat A}(x)$ then what is the characteristic polynomial of the operator $a \hat A$, where $a \in K$ is a scalar?

Answer:
$$Q_{a \hat A}(x) = a^N Q_{\hat A}(a^{-1} x).$$
Note that the right side of the above formula does not actually contain $a$ in the denominator because of the prefactor $a^N$. ■

The principal use of the characteristic polynomial is to determine the eigenvalues of linear operators. We remind the reader that a polynomial $p(x)$ of degree $N$ has $N$ roots if we count each root with its algebraic multiplicity; the number of different roots may be smaller than $N$. A root $\lambda$ has algebraic multiplicity $k$ if $p(x)$ contains a factor $(x - \lambda)^k$ but not a factor $(x - \lambda)^{k+1}$. For example, the polynomial
$$p(x) = (x - 3)^2 (x - 1) = x^3 - 7x^2 + 15x - 9$$
has two distinct roots, $x = 1$ and $x = 3$, and the root $x = 3$ has multiplicity 2. If we count each root with its multiplicity, we will find that the polynomial $p(x)$ has 3 roots ("not all of them different" as we would say in this case).

Theorem 1: a) The set of all the roots of the characteristic polynomial $Q_{\hat A}(x)$ is the same as the set of all the eigenvalues of the operator $\hat A$.

b) The geometric multiplicity of an eigenvalue $\lambda$ (i.e. the dimension of the space of all eigenvectors with the given eigenvalue $\lambda$) is at least 1 but not larger than the algebraic multiplicity of a root $\lambda$ in the characteristic polynomial.

Proof: a) By definition, an eigenvalue of an operator $\hat A$ is such a number $\lambda \in K$ that there exists at least one vector $v \in V$, $v \neq 0$, such that $\hat A v = \lambda v$. This equation is equivalent to $(\hat A - \lambda \hat 1_V) v = 0$. By Corollary 3.5, there would be no solutions $v \neq 0$ unless $\det(\hat A - \lambda \hat 1_V) = 0$. It follows that all eigenvalues $\lambda$ must be roots of the characteristic polynomial. Conversely, if $\lambda$ is a root then $\det(\hat A - \lambda \hat 1_V) = 0$ and hence the vector equation $(\hat A - \lambda \hat 1_V) v = 0$ will have at least one nonzero solution $v$ (see Theorem 2 in Sec. 3.5).

b) Suppose $\{v_1, ..., v_k\}$ is a basis in the eigenspace of eigenvalue $\lambda_0$. We need to show that $\lambda_0$ is a root of $Q_{\hat A}(x)$ with multiplicity at least $k$. We may obtain a basis in the space $V$ as $\{v_1, ..., v_k, e_{k+1}, ..., e_N\}$ by adding suitable new vectors $\{e_j\}$, $j = k+1, ..., N$. Now compute the characteristic polynomial:
$$\begin{aligned}
Q_{\hat A}(x)\,(v_1 \wedge ... \wedge v_k \wedge e_{k+1} \wedge ... \wedge e_N)
&= (\hat A - x \hat 1) v_1 \wedge ... \wedge (\hat A - x \hat 1) v_k \wedge (\hat A - x \hat 1) e_{k+1} \wedge ... \wedge (\hat A - x \hat 1) e_N \\
&= (\lambda_0 - x)^k\, v_1 \wedge ... \wedge v_k \wedge (\hat A - x \hat 1) e_{k+1} \wedge ... \wedge (\hat A - x \hat 1) e_N.
\end{aligned}$$
It follows that $Q_{\hat A}(x)$ contains the factor $(\lambda_0 - x)^k$, which means that $\lambda_0$ is a root of $Q_{\hat A}(x)$ of multiplicity at least $k$. ■

Remark: If an operator's characteristic polynomial has a root $\lambda_0$ of algebraic multiplicity $k$, it may or may not have a $k$-dimensional eigenspace for the eigenvalue $\lambda_0$. We only know that $\lambda_0$ is an eigenvalue, i.e. that the eigenspace is at least one-dimensional. ■

Theorem 1 shows that all the eigenvalues $\lambda$ of an operator $\hat A$ can be computed as roots of the equation $Q_{\hat A}(\lambda) = 0$, which is called the characteristic equation for the operator $\hat A$.
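As a small numerical illustration of the definition of the characteristic polynomial and of Theorem 1a, the following sketch evaluates $\det(\hat A - x \hat 1)$ for a sample $2 \times 2$ matrix. The matrix, its eigenvalues, and the helper names `det` and `char_poly_value` are assumptions chosen for this example only; they do not come from the text.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def char_poly_value(a, x):
    """Value of Q_A(x) = det(A - x*1) at the number x."""
    n = len(a)
    shifted = [[a[i][j] - (x if i == j else 0) for j in range(n)] for i in range(n)]
    return det(shifted)

# Hypothetical sample operator; its eigenvectors (1, -1) and (1, 1)
# have eigenvalues 1 and 3 respectively.
A = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(2)]]
eigenvalues = [Fraction(1), Fraction(3)]

# Every eigenvalue is a root of the characteristic polynomial.
roots_ok = all(char_poly_value(A, lam) == 0 for lam in eigenvalues)
# Trace = sum of eigenvalues, determinant = product of eigenvalues.
trace_ok = A[0][0] + A[1][1] == sum(eigenvalues)
det_ok = det(A) == eigenvalues[0] * eigenvalues[1]
print(roots_ok, trace_ok, det_ok)
```

Exact rational arithmetic (`Fraction`) is used so that the roots are detected without rounding error; with floating-point numbers one would compare against a small tolerance instead.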


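Now we will demonstrate that the coefficients of the characteristic polynomial $Q_{\hat A}(x)$ are related in a simple way to the operators $\wedge^N \hat A^k$. First we need an auxiliary calculation to derive an explicit formula for determinants of operators of the form $\hat A - \lambda \hat 1_V$.

Lemma 1: For any $\hat A \in \mathrm{End}\,V$, we have
$$\wedge^N (\hat A + \hat 1_V)^N = \sum_{r=0}^N (\wedge^N \hat A^r).$$
More generally, for $0 \le q \le p \le N$, we have
$$\wedge^p (\hat A + \hat 1_V)^q = \sum_{r=0}^q \binom{p-r}{p-q} (\wedge^p \hat A^r). \tag{3.19}$$

Proof: I first give some examples, then prove the most useful case $p = q$, and then show a proof of Eq. (3.19) for arbitrary $p$ and $q$.

For $p = q = 2$, we compute
$$\begin{aligned}
\wedge^2 (\hat A + \hat 1_V)^2\, a \wedge b &= (\hat A + \hat 1_V) a \wedge (\hat A + \hat 1_V) b \\
&= \hat A a \wedge \hat A b + \hat A a \wedge b + a \wedge \hat A b + a \wedge b \\
&= [\wedge^2 \hat A^2 + \wedge^2 \hat A^1 + \wedge^2 \hat A^0]\,(a \wedge b).
\end{aligned}$$
This can be easily generalized to arbitrary $p = q$: The action of the operator $\wedge^p (\hat A + \hat 1_V)^p$ on $e_1 \wedge ... \wedge e_p$ is
$$\wedge^p (\hat A + \hat 1_V)^p\, e_1 \wedge ... \wedge e_p = (\hat A + \hat 1_V) e_1 \wedge ... \wedge (\hat A + \hat 1_V) e_p,$$
and we can expand the brackets to find first one term with $p$ operators $\hat A$, then $p$ terms with $(p-1)$ operators $\hat A$, etc., and finally one term with no operators $\hat A$ acting on the vectors $e_j$. All terms which contain $r$ operators $\hat A$ (with $0 \le r \le p$) are those appearing in the definition of the operator $\wedge^p \hat A^r$. Therefore
$$\wedge^p (\hat A + \hat 1_V)^p = \sum_{r=0}^p (\wedge^p \hat A^r).$$
This is precisely the formula (3.19) because in the particular case $p = q$ the combinatorial coefficient is trivial,
$$\binom{p-r}{p-q} = \binom{p-r}{0} = 1.$$

Now we consider the general case $0 \le q \le p$. First an example: for $p = 2$ and $q = 1$, we compute
$$\begin{aligned}
\wedge^2 (\hat A + \hat 1_V)^1\, a \wedge b &= (\hat A + \hat 1_V) a \wedge b + a \wedge (\hat A + \hat 1_V) b \\
&= 2 a \wedge b + \hat A a \wedge b + a \wedge \hat A b \\
&= \left[ \binom{2}{1} (\wedge^2 \hat A^0) + \binom{1}{1} (\wedge^2 \hat A^1) \right] a \wedge b,
\end{aligned}$$
since $\binom{2}{1} = 2$ and $\binom{1}{1} = 1$; these are the coefficients $\binom{p-r}{p-q}$ of Eq. (3.19) for $r = 0$ and $r = 1$.

To prove the formula (3.19) in the general case, we use induction. The basis of induction consists of the trivial case ($p \ge 0$, $q = 0$) where all operators $\wedge^0 \hat A^p$ with $p \ge 1$ are zero operators, and of the case $p = q$, which was already proved. Now we will prove the induction step $(p, q)\ \&\ (p, q+1) \Rightarrow (p+1, q+1)$. Figure 3.3 indicates why this induction step is sufficient to prove the statement for all $0 \le q \le p \le N$.

Figure 3.3: Deriving Lemma 1 by induction (horizontal axis $p$, vertical axis $q$, each running from 0 to 4). White circles correspond to the basis of induction. Black circles are reached by induction steps.

Let $v \in V$ be an arbitrary vector and $\omega \in \wedge^p V$ be an arbitrary tensor. The induction step is proved by the following chain of equations,
$$\begin{aligned}
\wedge^{p+1} (\hat A + \hat 1_V)^{q+1} (v \wedge \omega)
&\overset{(1)}{=} (\hat A + \hat 1_V) v \wedge \left[ \wedge^p (\hat A + \hat 1_V)^q \omega \right] + v \wedge \left[ \wedge^p (\hat A + \hat 1_V)^{q+1} \omega \right] \\
&\overset{(2)}{=} \hat A v \wedge \sum_{r=0}^q \binom{p-r}{p-q} (\wedge^p \hat A^r)\, \omega + v \wedge \sum_{r=0}^q \binom{p-r}{p-q} (\wedge^p \hat A^r)\, \omega \\
&\qquad + v \wedge \sum_{r=0}^{q+1} \binom{p-r}{p-q-1} (\wedge^p \hat A^r)\, \omega \\
&\overset{(3)}{=} \hat A v \wedge \sum_{k=1}^{q+1} \binom{p-k+1}{p-q} (\wedge^p \hat A^{k-1})\, \omega \\
&\qquad + v \wedge \sum_{r=0}^{q+1} \left[ \binom{p-r}{p-q-1} + \binom{p-r}{p-q} \right] (\wedge^p \hat A^r)\, \omega \\
&\overset{(4)}{=} \sum_{k=0}^{q+1} \binom{p-k+1}{p-q} \left\{ \hat A v \wedge \left[ \wedge^p \hat A^{k-1} \omega \right] + v \wedge \left[ \wedge^p \hat A^k \omega \right] \right\} \\
&\overset{(1)}{=} \sum_{k=0}^{q+1} \binom{p-k+1}{p-q} (\wedge^{p+1} \hat A^k)\,(v \wedge \omega),
\end{aligned}$$
where (1) is Statement 4 of Sec. 3.7, (2) uses the induction step assumptions for $(p, q)$ and $(p, q+1)$, (3) is the relabeling $r = k - 1$ and rearranging terms (note that the summation over $0 \le r \le q$ was formally extended to $0 \le r \le q + 1$ because the term with $r = q + 1$ vanishes), and (4) is by the binomial identity
$$\binom{n}{m-1} + \binom{n}{m} = \binom{n+1}{m}$$
and a further relabeling $r \to k$ in the preceding summation. ■

Corollary: For any $\hat A \in \mathrm{End}\,V$ and $\alpha \in K$,
$$\wedge^p (\hat A + \alpha \hat 1_V)^q = \sum_{r=0}^q \alpha^{q-r} \binom{p-r}{p-q} (\wedge^p \hat A^r).$$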


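Proof: By Statement 3 of Sec. 3.7, $\wedge^p (\alpha \hat A)^q = \alpha^q (\wedge^p \hat A^q)$. Set $\hat A = \alpha \hat B$, where $\hat B$ is an auxiliary operator, and compute
$$\begin{aligned}
\wedge^p (\alpha \hat B + \alpha \hat 1_V)^q = \alpha^q\, \wedge^p (\hat B + \hat 1_V)^q
&= \alpha^q \sum_{r=0}^q \binom{p-r}{p-q} (\wedge^p \hat B^r) \\
&= \sum_{r=0}^q \alpha^{q-r} \binom{p-r}{p-q} (\wedge^p (\alpha \hat B)^r) \\
&= \sum_{r=0}^q \alpha^{q-r} \binom{p-r}{p-q} (\wedge^p \hat A^r).
\end{aligned}$$
■

Theorem 2: The coefficients $q_m(\hat A)$, $1 \le m \le N$ of the characteristic polynomial, defined by
$$Q_{\hat A}(\lambda) = (-\lambda)^N + \sum_{k=0}^{N-1} (-1)^k q_{N-k}(\hat A)\, \lambda^k,$$
are the numbers corresponding to the operators $\wedge^N \hat A^m \in \mathrm{End}(\wedge^N V)$:
$$q_m(\hat A)\, \hat 1_{\wedge^N V} = \wedge^N \hat A^m.$$
In particular, $q_N(\hat A) = \det \hat A$ and $q_1(\hat A) = \mathrm{Tr}\,\hat A$. More compactly, the statement can be written as
$$Q_{\hat A}(\lambda)\, \hat 1_{\wedge^N V} = \sum_{k=0}^N (-\lambda)^{N-k} (\wedge^N \hat A^k).$$

Proof: This is now a consequence of Lemma 1 and its Corollary, where we set $p = q = N$ and obtain
$$\wedge^N (\hat A - \lambda \hat 1_V)^N = \sum_{r=0}^N (-\lambda)^{N-r} (\wedge^N \hat A^r).$$
■

Exercise 1: Show that the characteristic polynomial of an operator $\hat A$ in a three-dimensional space $V$ can be written as
$$Q_{\hat A}(\lambda) = \det \hat A - \tfrac{1}{2} \left[ (\mathrm{Tr}\,\hat A)^2 - \mathrm{Tr}(\hat A^2) \right] \lambda + (\mathrm{Tr}\,\hat A)\, \lambda^2 - \lambda^3.$$

Solution: The first and the third coefficients of $Q_{\hat A}(\lambda)$ are, as usual, the determinant and the trace of $\hat A$. The second coefficient is equal to $-\wedge^3 \hat A^2$, so we need to show that
$$\wedge^3 \hat A^2 = \tfrac{1}{2} \left[ (\mathrm{Tr}\,\hat A)^2 - \mathrm{Tr}(\hat A^2) \right].$$
We apply the operator $\wedge^3 \hat A^1$ twice to a tensor $a \wedge b \wedge c$ and calculate:
$$\begin{aligned}
(\mathrm{Tr}\,\hat A)^2\, a \wedge b \wedge c &= (\wedge^3 \hat A^1)(\wedge^3 \hat A^1)(a \wedge b \wedge c) \\
&= (\wedge^3 \hat A^1)(\hat A a \wedge b \wedge c + a \wedge \hat A b \wedge c + a \wedge b \wedge \hat A c) \\
&= \hat A^2 a \wedge b \wedge c + 2 \hat A a \wedge \hat A b \wedge c + a \wedge \hat A^2 b \wedge c \\
&\qquad + 2 \hat A a \wedge b \wedge \hat A c + 2 a \wedge \hat A b \wedge \hat A c + a \wedge b \wedge \hat A^2 c \\
&= \left[ \mathrm{Tr}(\hat A^2) + 2 \wedge^3 \hat A^2 \right] a \wedge b \wedge c.
\end{aligned}$$
Then the desired formula follows. ■

Exercise 2 (general trace relations): Generalize the result of Exercise 1 to $N$ dimensions:

a) Show that
$$\wedge^N \hat A^2 = \tfrac{1}{2} \left[ (\mathrm{Tr}\,\hat A)^2 - \mathrm{Tr}(\hat A^2) \right].$$

b)* Show that all coefficients $\wedge^N \hat A^k$ ($k = 1, ..., N$) can be expressed as polynomials in $\mathrm{Tr}\,\hat A$, $\mathrm{Tr}(\hat A^2)$, ..., $\mathrm{Tr}(\hat A^N)$.

Hint: Define a "mixed" operator $\wedge^N (\hat A^n)^j \hat A^k$ as a sum of exterior products containing $j$ times $\hat A^n$ and $k$ times $\hat A$; for example,
$$\left[ \wedge^3 (\hat A^2)^1 \hat A^1 \right] a \wedge b \wedge c \equiv \hat A^2 a \wedge (\hat A b \wedge c + b \wedge \hat A c) + \hat A a \wedge (\hat A^2 b \wedge c + b \wedge \hat A^2 c) + a \wedge (\hat A^2 b \wedge \hat A c + \hat A b \wedge \hat A^2 c).$$
By applying several operators $\wedge^N \hat A^k$ and $\mathrm{Tr}(\hat A^k)$ to an exterior product, derive identities connecting these operators and $\wedge^N \hat A^k$:
$$\begin{aligned}
(\wedge^N \hat A^1)(\wedge^N \hat A^k) &= (k+1) \wedge^N \hat A^{k+1} + \wedge^N (\hat A^2)^1 \hat A^{k-1}, \\
\mathrm{Tr}(\hat A^k)\, \mathrm{Tr}(\hat A) &= \mathrm{Tr}(\hat A^{k+1}) + \wedge^N (\hat A^k)^1 \hat A^1,
\end{aligned}$$
for $k = 2, ..., N-1$. Using these identities, show by induction that operators of the form $\wedge^N \hat A^k$ ($k = 1, ..., N$) can be all expressed through $\mathrm{Tr}\,\hat A$, $\mathrm{Tr}(\hat A^2)$, ..., $\mathrm{Tr}(\hat A^N)$ as polynomials.

As an example, here is the trace relation for $\wedge^N \hat A^3$:
$$\wedge^N \hat A^3 = \tfrac{1}{6} (\mathrm{Tr}\,\hat A)^3 - \tfrac{1}{2} (\mathrm{Tr}\,\hat A)\, \mathrm{Tr}(\hat A^2) + \tfrac{1}{3} \mathrm{Tr}(\hat A^3).$$
Note that in three dimensions this formula directly yields the determinant of $\hat A$ expressed through traces of powers of $\hat A$. Below (Sec. 4.5.3) we will derive a formula for the general trace relation. ■

Since operators in $\wedge^N V$ act as multiplication by a number, it is convenient to omit $\hat 1_{\wedge^N V}$ and regard expressions such as $\wedge^N \hat A^k$ as simply numbers. More formally, there is a canonical isomorphism between $\mathrm{End}(\wedge^N V)$ and $K$ (even though there is no canonical isomorphism between $\wedge^N V$ and $K$).

Exercise 3: Give an explicit formula for the canonical isomorphism: a) between $(\wedge^k V)^*$ and $\wedge^k (V^*)$; b) between $\mathrm{End}(\wedge^N V)$ and $K$.

Answer: a) A tensor $f_1^* \wedge ... \wedge f_k^* \in \wedge^k (V^*)$ acts as a linear function on a tensor $v_1 \wedge ... \wedge v_k \in \wedge^k V$ by the formula
$$(f_1^* \wedge ... \wedge f_k^*)(v_1 \wedge ... \wedge v_k) \equiv \det(A_{jk}),$$
where $A_{jk}$ is the square matrix defined by $A_{jk} \equiv f_j^*(v_k)$.

b) Since $(\wedge^N V)^*$ is canonically isomorphic to $\wedge^N (V^*)$, an operator $\hat N \in \mathrm{End}(\wedge^N V)$ can be represented by a tensor
$$\hat N = (v_1 \wedge ... \wedge v_N) \otimes (f_1^* \wedge ... \wedge f_N^*) \in (\wedge^N V) \otimes (\wedge^N V)^*.$$
The isomorphism maps $\hat N$ into the number $\det(A_{jk})$, where $A_{jk}$ is the square matrix defined by $A_{jk} \equiv f_j^*(v_k)$. ■

Exercise 4: Show that an operator $\hat A \in \mathrm{End}\,V$ and its canonical transpose operator $\hat A^T \in \mathrm{End}\,V^*$ have the same characteristic polynomials.

Hint: Consider the operator $(\hat A - x \hat 1_V)^T$. ■

Exercise 5: Given an operator $\hat A$ of rank $r < N$, show that $\wedge^N \hat A^k = 0$ for $k \ge r + 1$ but $\wedge^N \hat A^r \neq 0$.

Hint: If $\hat A$ has rank $r < N$ then $\hat A v_1 \wedge ... \wedge \hat A v_{r+1} = 0$ for any set of vectors $\{v_1, ..., v_{r+1}\}$.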
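The relations of Lemma 1, Theorem 2 and the trace relations of Exercises 1 and 2 can be checked numerically. The sketch below computes the number corresponding to $\wedge^N \hat A^k$ directly from its definition, applying $\hat A$ to $k$ of the $N$ basis vectors in all possible ways and reading off the coefficient of $e_1 \wedge ... \wedge e_N$ as a determinant. The sample matrix and the helper names (`det`, `matmul`, `trace`, `wedge_top`) are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def wedge_top(a, k):
    """Number corresponding to wedge^N A^k, computed from the definition."""
    n = len(a)
    total = 0
    for chosen in combinations(range(n), k):
        # Vector number j is A e_j if j was chosen, otherwise e_j itself.
        vectors = [[a[i][j] if j in chosen else int(i == j) for i in range(n)]
                   for j in range(n)]
        # Coefficient of e_1 ^ ... ^ e_N; the determinant is the same whether
        # the vectors are written as rows or as columns.
        total += det(vectors)
    return total

# Hypothetical sample operator in a three-dimensional space.
A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
A2 = matmul(A, A)
A3 = matmul(A2, A)
q = [wedge_top(A, k) for k in range(4)]   # wedge^3 A^0, ..., wedge^3 A^3

# Lemma 1 with p = q = N: det(A + 1) equals the sum of all wedge^N A^r.
A_plus_one = [[A[i][j] + int(i == j) for j in range(3)] for i in range(3)]
lemma_ok = det(A_plus_one) == sum(q)

# Trace relations for wedge^N A^2 and wedge^N A^3.
t1, t2, t3 = trace(A), trace(A2), trace(A3)
rel2_ok = q[2] == Fraction(1, 2) * (t1 ** 2 - t2)
rel3_ok = q[3] == Fraction(1, 6) * t1 ** 3 - Fraction(1, 2) * t1 * t2 + Fraction(1, 3) * t3
print(q, lemma_ok, rel2_ok, rel3_ok)
```

In three dimensions `q[3]` is the determinant, so the last check is the statement that the determinant is expressed through traces of powers of the operator.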


3.9.1 Nilpotent operators

There are many operators with the same characteristic polynomial. In particular, there are many operators which have the simplest possible characteristic polynomial, $Q_0(x) = (-x)^N$. Note that the zero operator has this characteristic polynomial. We will now see how to describe all such operators $\hat A$ that $Q_{\hat A}(x) = (-x)^N$.

Definition: An operator $\hat A \in \mathrm{End}\,V$ is nilpotent if there exists an integer $p \ge 1$ such that $(\hat A)^p = \hat 0$, where $\hat 0$ is the zero operator and $(\hat A)^p$ is the $p$-th power of the operator $\hat A$.

Examples: a) The operator defined by the matrix $\begin{pmatrix} 0 & \alpha \\ 0 & 0 \end{pmatrix}$ in some basis $\{e_1, e_2\}$ is nilpotent for any number $\alpha$. This operator can be expressed in tensor form as $\alpha\, e_1 \otimes e_2^*$.

b) In the space of polynomials of degree at most $n$ in the variable $x$, the linear operator $\frac{d}{dx}$ is nilpotent because the $(n+1)$-th power of this operator will evaluate the $(n+1)$-th derivative, which is zero on any polynomial of degree at most $n$. ■

Statement: If $\hat A$ is a nilpotent operator then $Q_{\hat A}(x) = (-x)^N$.

Proof: First an example: suppose that $N = 2$ and that $\hat A^3 = 0$. By Theorem 2, the coefficients of the characteristic polynomial of the operator $\hat A$ correspond to the operators $\wedge^N \hat A^k$. We need to show that all these operators are equal to zero.

Consider, for instance, $\wedge^2 \hat A^2 = q_2 \hat 1_{\wedge^2 V}$. This operator raised to the power 3 acts on a tensor $a \wedge b \in \wedge^2 V$ as
$$\left( \wedge^2 \hat A^2 \right)^3 a \wedge b = \hat A^3 a \wedge \hat A^3 b = 0$$
since $\hat A^3 = 0$. On the other hand,
$$\left( \wedge^2 \hat A^2 \right)^3 a \wedge b = (q_2)^3\, a \wedge b.$$
Therefore $q_2 = 0$. Now consider $\wedge^2 \hat A^1$ to the power 3,
$$\left( \wedge^2 \hat A^1 \right)^3 a \wedge b = 3 \hat A^2 a \wedge \hat A b + 3 \hat A a \wedge \hat A^2 b$$
(all other terms vanish because $\hat A^3 = 0$). It is clear that the operator $\wedge^2 \hat A^1$ to the power 6 vanishes because every term will contain at least a third power of $\hat A$ acting on one of the two vectors. Therefore $q_1 = 0$ as well.

Now a general argument. Let $p$ be a positive integer such that $\hat A^p = 0$, and consider the $(pN)$-th power of the operator $\wedge^N \hat A^k$ for some $k \ge 1$. We will prove that $(\wedge^N \hat A^k)^{pN} = 0$. Since $\wedge^N \hat A^k$ is a multiplication by a number, from $(\wedge^N \hat A^k)^{pN} = 0$ it will follow that $\wedge^N \hat A^k$ is a zero operator in $\wedge^N V$ for all $k \ge 1$. If all the coefficients $q_k$ of the characteristic polynomial vanish, we will have $Q_{\hat A}(x) = (-x)^N$.

To prove that $(\wedge^N \hat A^k)^{pN} = 0$, consider the action of the operator $(\wedge^N \hat A^k)^{pN}$ on a tensor $e_1 \wedge ... \wedge e_N \in \wedge^N V$. By definition of $\wedge^N \hat A^k$, this operator acting on $e_1 \wedge ... \wedge e_N$ yields a sum of terms of the form
$$\hat A^{s_1} e_1 \wedge ... \wedge \hat A^{s_N} e_N,$$
where $s_j = 0$ or $s_j = 1$ are chosen such that $\sum_{j=1}^N s_j = k$. Therefore, the same operator raised to the power $pN$ is expressed as
$$(\wedge^N \hat A^k)^{pN}\, e_1 \wedge ... \wedge e_N = \sum_{(s_1, ..., s_N)} \hat A^{s_1} e_1 \wedge ... \wedge \hat A^{s_N} e_N, \tag{3.20}$$
where now $s_j$ are non-negative integers, $0 \le s_j \le pN$, such that $\sum_{j=1}^N s_j = kpN$. It is impossible that all $s_j$ in Eq. (3.20) are less than $p$, because then we would have $\sum_{j=1}^N s_j < Np$, which would contradict the condition $\sum_{j=1}^N s_j = kpN$ (since $k \ge 1$ by construction). So each term of the sum in Eq. (3.20) contains at least a $p$-th power of $\hat A$. Since $(\hat A)^p = 0$, each term in the sum in Eq. (3.20) vanishes. Hence $(\wedge^N \hat A^k)^{pN} = 0$ as required. ■

Remark: The converse statement is also true: If the characteristic polynomial of an operator $\hat A$ is $Q_{\hat A}(x) = (-x)^N$ then $\hat A$ is nilpotent. This follows easily from the Cayley-Hamilton theorem (see below), which states that $Q_{\hat A}(\hat A) = 0$, so we obtain immediately $(\hat A)^N = 0$, i.e. the operator $\hat A$ is nilpotent. We find that one cannot distinguish a nilpotent operator from the zero operator by looking only at the characteristic polynomial.
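The statement about nilpotent operators can be illustrated with a concrete matrix. The sketch below takes a strictly upper triangular $3 \times 3$ matrix (a hypothetical example, nilpotent with $\hat A^3 = 0$) and checks that all the numbers $\wedge^N \hat A^k$ with $k \ge 1$ vanish, so that $\det(\hat A - x \hat 1) = (-x)^N$. The helper names are assumptions made for this illustration.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def wedge_top(a, k):
    """Number corresponding to wedge^N A^k: A is applied to k of the N basis vectors."""
    n = len(a)
    return sum(det([[a[i][j] if j in chosen else int(i == j) for i in range(n)]
                    for j in range(n)])
               for chosen in combinations(range(n), k))

# Hypothetical nilpotent operator: strictly upper triangular, so its cube is zero.
NIL = [[0, 1, 2],
       [0, 0, 3],
       [0, 0, 0]]
cube = matmul(matmul(NIL, NIL), NIL)
is_nilpotent = all(entry == 0 for row in cube for entry in row)

# All coefficients wedge^N A^k with k >= 1 vanish ...
coefficients_vanish = all(wedge_top(NIL, k) == 0 for k in (1, 2, 3))

# ... hence Q(x) = det(A - x*1) = (-x)^3; checked here at a few sample points.
def char_poly_value(a, x):
    n = len(a)
    return det([[a[i][j] - (x if i == j else 0) for j in range(n)] for i in range(n)])

same_as_zero_operator = all(char_poly_value(NIL, x) == (-x) ** 3 for x in (-2, 1, 2, 5))
print(is_nilpotent, coefficients_vanish, same_as_zero_operator)
```

The matrix `NIL` is not the zero matrix, yet its characteristic polynomial coincides with that of the zero operator, in agreement with the Remark above.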

4 Advanced applications

In this chapter we work in an $N$-dimensional vector space over a number field $K$.

4.1 The space $\wedge^{N-1} V$

So far we have been using only the top exterior power, $\wedge^N V$. The next-to-top exterior power space, $\wedge^{N-1} V$, has the same dimension as $V$ and is therefore quite useful since it is a space, in some special sense, associated with $V$. We will now find several important uses of this space.

4.1.1 Exterior transposition of operators

We have seen that a linear operator in the space $\wedge^N V$ is equivalent to multiplication by a number. We can reformulate this statement by saying that the space of linear operators in $\wedge^N V$ is canonically isomorphic to $K$. Similarly, the space of linear operators in $\wedge^{N-1} V$ is canonically isomorphic to $\mathrm{End}\,V$, the space of linear operators in $V$. The isomorphism map will be denoted by the superscript $^{\wedge T}$. We will begin by defining this map explicitly.

Question: What is a nontrivial example of a linear operator in $\wedge^{N-1} V$?

Answer: Any operator of the form $\wedge^{N-1} \hat A^p$ with $1 \le p \le N - 1$ and $\hat A \in \mathrm{End}\,V$. In this book, operators constructed in this way will be the only instance of operators in $\wedge^{N-1} V$.

Definition: If $\hat X \in \mathrm{End}\,V$ is a given linear operator then the exterior transpose operator
$$\hat X^{\wedge T} \in \mathrm{End}(\wedge^{N-1} V)$$
is canonically defined by the formula
$$(\hat X^{\wedge T} \omega) \wedge v \equiv \omega \wedge \hat X v,$$
which must hold for all $\omega \in \wedge^{N-1} V$ and all $v \in V$. If $\hat Y \in \mathrm{End}(\wedge^{N-1} V)$ is a linear operator then its exterior transpose $\hat Y^{\wedge T} \in \mathrm{End}\,V$ is defined by the formula
$$\omega \wedge (\hat Y^{\wedge T} v) \equiv (\hat Y \omega) \wedge v, \quad \forall \omega \in \wedge^{N-1} V,\ v \in V.$$
We need to check that the definition makes sense, i.e. that the operators defined by these formulas exist and are uniquely defined.

Statement 1: The exterior transpose operators are well-defined, i.e. they exist, are unique, and are linear operators in the respective spaces. The exterior transposition has the linearity property
$$(\hat A + \lambda \hat B)^{\wedge T} = \hat A^{\wedge T} + \lambda \hat B^{\wedge T}.$$
If $\hat X \in \mathrm{End}\,V$ is an exterior transpose of $\hat Y \in \mathrm{End}(\wedge^{N-1} V)$, i.e. $\hat X = \hat Y^{\wedge T}$, then also conversely $\hat Y = \hat X^{\wedge T}$.

Proof: We need to show that the formula
$$(\hat X^{\wedge T} \omega) \wedge v \equiv \omega \wedge \hat X v$$
actually defines an operator $\hat X^{\wedge T}$ uniquely when $\hat X \in \mathrm{End}\,V$ is a given operator. Let us fix a tensor $\omega \in \wedge^{N-1} V$; to find $\hat X^{\wedge T} \omega$ we need to determine a tensor $\psi \in \wedge^{N-1} V$ such that $\psi \wedge v = \omega \wedge \hat X v$ for all $v \in V$. When we find such a $\psi$, we will also show that it is unique; then we will have shown that $\hat X^{\wedge T} \omega \equiv \psi$ is well-defined.

An explicit computation of the tensor $\psi$ can be performed in terms of a basis $\{e_1, ..., e_N\}$ in $V$. A basis in the space $\wedge^{N-1} V$ is formed by the set of $N$ tensors of the form $\omega_i \equiv e_1 \wedge ... \wedge e_{i-1} \wedge e_{i+1} \wedge ... \wedge e_N$, that is, $\omega_i$ is the exterior product of the basis vectors without the vector $e_i$ ($1 \le i \le N$). In the notation of Sec. 2.3.3, we have $\omega_i = *(e_i)(-1)^{i-1}$. It is sufficient to determine the components of $\psi$ in this basis,
$$\psi = \sum_{i=1}^N c_i \omega_i.$$
Taking the exterior product of $\psi$ with $e_i$, we find that only the term with $c_i$ survives,
$$\psi \wedge e_i = (-1)^{N-i} c_i\, e_1 \wedge ... \wedge e_N.$$
Therefore, the coefficient $c_i$ is uniquely determined from the condition
$$c_i\, e_1 \wedge ... \wedge e_N = (-1)^{N-i} \psi \wedge e_i \overset{!}{=} (-1)^{N-i} \omega \wedge \hat X e_i.$$
Since the operator $\hat X$ is given, we know all $\hat X e_i$ and can compute $\omega \wedge \hat X e_i \in \wedge^N V$. So we find that every coefficient $c_i$ is uniquely determined.

It is seen from the above formula that each coefficient $c_i$ depends linearly on the operator $\hat X$. Therefore the linearity property holds,
$$(\hat A + \lambda \hat B)^{\wedge T} = \hat A^{\wedge T} + \lambda \hat B^{\wedge T}.$$
The linearity of the operator $\hat X^{\wedge T}$ follows straightforwardly from the identity
$$\begin{aligned}
\left( \hat X^{\wedge T} (\omega + \lambda \omega') \right) \wedge v &\overset{!}{=} (\omega + \lambda \omega') \wedge \hat X v \\
&= \omega \wedge \hat X v + \lambda \omega' \wedge \hat X v \\
&\overset{!}{=} (\hat X^{\wedge T} \omega) \wedge v + \lambda (\hat X^{\wedge T} \omega') \wedge v.
\end{aligned}$$
In the same way we prove the existence, the uniqueness, and the linearity of the exterior transpose of an operator from $\mathrm{End}(\wedge^{N-1} V)$. It is then clear that the transpose of the transpose is again the original operator. Details left as exercise. ■

Remark: Note that the space $\wedge^{N-1} V$ has the same dimension as $V$ but is not canonically isomorphic to $V$. Rather, an element $\psi \in \wedge^{N-1} V$ naturally acts by exterior multiplication on a vector $v \in V$ and yields a tensor from $\wedge^N V$, i.e. $\psi$ is a linear map


$V \to \wedge^N V$, and we may express this as $\wedge^{N-1} V \cong V^* \otimes \wedge^N V$. Nevertheless, as we will now show, the exterior transpose map allows us to establish that the space of linear operators in $\wedge^{N-1} V$ is canonically isomorphic to the space of linear operators in $V$. We will use this isomorphism extensively in the following sections. A formal statement follows.

Statement 2: The spaces $\mathrm{End}(\wedge^{N-1} V)$ and $\mathrm{End}\,V$ are canonically isomorphic.

Proof: The map $^{\wedge T}$ between these spaces is one-to-one since no two different operators are mapped to the same operator. If two different operators $\hat A$, $\hat B$ had the same exterior transpose, we would have $(\hat A - \hat B)^{\wedge T} = 0$ and yet $\hat A - \hat B \neq 0$. There exists at least one $\omega \in \wedge^{N-1} V$ and $v \in V$ such that $\omega \wedge (\hat A - \hat B) v \neq 0$, and then
$$0 = \left( (\hat A - \hat B)^{\wedge T} \omega \right) \wedge v = \omega \wedge (\hat A - \hat B) v \neq 0,$$
which is a contradiction. The map $^{\wedge T}$ is linear (Statement 1). Therefore, it is an isomorphism between the vector spaces $\mathrm{End}(\wedge^{N-1} V)$ and $\mathrm{End}\,V$. ■

A generalization of Statement 1 is the following.

Exercise 1: Show that the spaces $\mathrm{End}(\wedge^k V)$ and $\mathrm{End}(\wedge^{N-k} V)$ are canonically isomorphic ($1 \le k < N$). Specifically, if $\hat X \in \mathrm{End}(\wedge^k V)$ then the linear operator $\hat X^{\wedge T} \in \mathrm{End}(\wedge^{N-k} V)$ is uniquely defined by the formula
$$(\hat X^{\wedge T} \omega_{N-k}) \wedge \omega_k \equiv \omega_{N-k} \wedge \hat X \omega_k,$$
which must hold for arbitrary tensors $\omega_k \in \wedge^k V$, $\omega_{N-k} \in \wedge^{N-k} V$.

Remark: It follows that the exterior transpose of $\wedge^N \hat A^N \in \mathrm{End}(\wedge^N V)$ is mapped by the canonical isomorphism to an element of $\mathrm{End}\,K$, that is, a multiplication by a number. This is precisely the map we have been using in the previous section to define the determinant. In this notation, we have
$$\det \hat A \equiv \left( \wedge^N \hat A^N \right)^{\wedge T}.$$
Here we identify $\mathrm{End}\,K$ with $K$.

Exercise 2: For any operators $\hat A, \hat B \in \mathrm{End}(\wedge^k V)$, show that
$$(\hat A \hat B)^{\wedge T} = \hat B^{\wedge T} \hat A^{\wedge T}.$$

4.1.2 * Index notation

Let us see how the exterior transposition is expressed in the index notation. (Below we will not use the resulting formulas.)

If an operator $\hat A \in \mathrm{End}\,V$ is given in the index notation by a matrix $A^j_i$, the exterior transpose $\hat A^{\wedge T} \in \mathrm{End}(\wedge^{N-1} V)$ is represented by an array $B^{j_1 ... j_{N-1}}_{i_1 ... i_{N-1}}$, which is totally antisymmetric with respect to its $N - 1$ lower and upper indices separately. The action of the operator $\hat B \equiv \hat A^{\wedge T}$ on a tensor $\psi \in \wedge^{N-1} V$ is written in the index notation as
$$\sum_{i_s} B^{j_1 ... j_{N-1}}_{i_1 ... i_{N-1}} \psi^{i_1 ... i_{N-1}}.$$
(Here we did not introduce any combinatorial factors; the factor $\frac{1}{(N-1)!}$ will therefore appear at the end of the calculation.)

By definition of the exterior transpose, for any vector $v \in V$ and for any $\psi \in \wedge^{N-1} V$ we must have
$$(\hat B \psi) \wedge v = \psi \wedge (\hat A v).$$
Using the index representation of the exterior product through the projection operators $\hat E$ (see Sec. 2.3.6), we represent the equation above in the index notation as
$$\sum_{i, i_s, j_s} E^{k_1 ... k_N}_{j_1 ... j_{N-1} i} \left( B^{j_1 ... j_{N-1}}_{i_1 ... i_{N-1}} \psi^{i_1 ... i_{N-1}} \right) v^i = \sum_{j_s, i, j} E^{k_1 ... k_N}_{j_1 ... j_{N-1} j}\, \psi^{j_1 ... j_{N-1}} \left( A^j_i v^i \right).$$
We may simplify this to
$$\sum_{i, i_s, j_s} \varepsilon_{j_1 ... j_{N-1} i} \left( B^{j_1 ... j_{N-1}}_{i_1 ... i_{N-1}} \psi^{i_1 ... i_{N-1}} \right) v^i = \sum_{i_s, i, j} \varepsilon_{i_1 ... i_{N-1} j}\, \psi^{i_1 ... i_{N-1}} \left( A^j_i v^i \right),$$
because $E^{k_1 ... k_N}_{j_1 ... j_N} = \varepsilon_{j_1 ... j_N} \varepsilon^{k_1 ... k_N}$, and we may cancel the common factor $\varepsilon^{k_1 ... k_N}$ whose indices are not being summed over.

Since the equation above should hold for arbitrary $\psi^{i_1 ... i_{N-1}}$ and $v^i$, the equation with the corresponding free indices $i_s$ and $i$ should hold:
$$\sum_{j_s} \varepsilon_{j_1 ... j_{N-1} i}\, B^{j_1 ... j_{N-1}}_{i_1 ... i_{N-1}} = \sum_j \varepsilon_{i_1 ... i_{N-1} j}\, A^j_i. \tag{4.1}$$
This equation can be solved for $\hat B$ as follows. We note that the $\varepsilon$ symbol in the left-hand side of Eq. (4.1) has one free index, $i$. Let us therefore multiply with an additional $\varepsilon$ and sum over that index; this will yield the projection operator $\hat E$ (see Sec. 2.3.6). Namely, we multiply both sides of Eq. (4.1) with $\varepsilon^{k_1 ... k_{N-1} i}$ and sum over $i$:
$$\begin{aligned}
\sum_{j, i} \varepsilon^{k_1 ... k_{N-1} i}\, \varepsilon_{i_1 ... i_{N-1} j}\, A^j_i &= \sum_{j_s, i} \varepsilon^{k_1 ... k_{N-1} i}\, \varepsilon_{j_1 ... j_{N-1} i}\, B^{j_1 ... j_{N-1}}_{i_1 ... i_{N-1}} \\
&= \sum_{j_s} E^{k_1 ... k_{N-1}}_{j_1 ... j_{N-1}}\, B^{j_1 ... j_{N-1}}_{i_1 ... i_{N-1}},
\end{aligned}$$
where in the last line we used the definition (2.11)-(2.12) of the operator $\hat E$. Now we note that the right-hand side is the index representation of the product of the operators $\hat E$ and $\hat B$ (both operators act in $\wedge^{N-1} V$). The left-hand side is also an operator in $\wedge^{N-1} V$; denoting this operator for brevity by $\hat X$, we rewrite the equation as
$$\hat E \hat B = \hat X \in \mathrm{End}(\wedge^{N-1} V).$$
Using the property
$$\hat E = (N-1)!\, \hat 1_{\wedge^{N-1} V}$$
(see Exercise in Sec. 2.3.6), we may solve the equation $\hat E \hat B = \hat X$ for $\hat B$ as
$$\hat B = \frac{1}{(N-1)!} \hat X.$$
Hence, the components of $\hat B \equiv \hat A^{\wedge T}$ are expressed as
$$B^{k_1 ... k_{N-1}}_{i_1 ... i_{N-1}} = \frac{1}{(N-1)!} \sum_{j, i} \varepsilon^{k_1 ... k_{N-1} i}\, \varepsilon_{i_1 ... i_{N-1} j}\, A^j_i.$$
An analogous formula holds for the exterior transpose of an operator in $\wedge^n V$, for any $n = 2, ..., N$. I give the formula without proof and illustrate it by an example.


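Statement: If $\hat A \in \mathrm{End}(\wedge^n V)$ is given by its components $A^{j_1 ... j_n}_{i_1 ... i_n}$ then the components of $\hat A^{\wedge T}$ are
$$\left( \hat A^{\wedge T} \right)^{k_1 ... k_{N-n}}_{l_1 ... l_{N-n}} = \frac{1}{n!\,(N-n)!} \sum_{j_s, i_s} \varepsilon^{k_1 ... k_{N-n} i_1 ... i_n}\, \varepsilon_{l_1 ... l_{N-n} j_1 ... j_n}\, A^{j_1 ... j_n}_{i_1 ... i_n}.$$

Example: Consider the exterior transposition $\hat A^{\wedge T}$ of the identity operator $\hat A \equiv \hat 1_{\wedge^2 V}$. The components of the identity operator are given by
$$A^{j_1 j_2}_{i_1 i_2} = \delta^{j_1}_{i_1} \delta^{j_2}_{i_2},$$
so the components of $\hat A^{\wedge T}$ are
$$\begin{aligned}
\left( \hat A^{\wedge T} \right)^{k_1 ... k_{N-2}}_{l_1 ... l_{N-2}} &= \frac{1}{2!\,(N-2)!} \sum_{j_s, i_s} \varepsilon^{k_1 ... k_{N-2} i_1 i_2}\, \varepsilon_{l_1 ... l_{N-2} j_1 j_2}\, A^{j_1 j_2}_{i_1 i_2} \\
&= \frac{1}{2!\,(N-2)!} \sum_{i_1, i_2} \varepsilon^{k_1 ... k_{N-2} i_1 i_2}\, \varepsilon_{l_1 ... l_{N-2} i_1 i_2}.
\end{aligned}$$
Let us check that this array of components is the same as that representing the operator $\hat 1_{\wedge^{N-2} V}$. We note that the expression above is the same as
$$\frac{1}{(N-2)!} E^{k_1 ... k_{N-2}}_{l_1 ... l_{N-2}},$$
where the numbers $E^{k_1 ... k_n}_{l_1 ... l_n}$ are defined by Eqs. (2.11)-(2.12). Since the operator $\hat E$ in $\wedge^{N-2} V$ is equal to $(N-2)!\, \hat 1_{\wedge^{N-2} V}$, we obtain that
$$\hat A^{\wedge T} = \hat 1_{\wedge^{N-2} V}$$
as required.

4.2 Algebraic complement (adjoint) and beyond

In Sec. 3.3 we defined the determinant and derived various useful properties by considering, essentially, the exterior transpose of $\wedge^N \hat A^p$ with $1 \le p \le N$ (although we did not introduce this terminology back then). We have just seen that the exterior transposition can be defined more generally, as a map from $\mathrm{End}(\wedge^k V)$ to $\mathrm{End}(\wedge^{N-k} V)$. We will see in this section that the exterior transposition of the operators $\wedge^{N-1} \hat A^p$ with $1 \le p \le N - 1$ yields operators acting in $V$ that are quite useful as well.

4.2.1 Definition of algebraic complement

While we proved that operators like $(\wedge^{N-1} \hat A^p)^{\wedge T}$ are well-defined, we still have not obtained any explicit formulas for these operators. We will now compute these operators explicitly because they play an important role in the further development of the theory. It will turn out that every operator of the form $(\wedge^{N-1} \hat A^p)^{\wedge T}$ is a polynomial in $\hat A$ with coefficients that are known if we know the characteristic polynomial of $\hat A$.

Example 1: Let us compute $(\wedge^{N-1} \hat A^1)^{\wedge T}$. We consider, as a first example, a three-dimensional ($N = 3$) vector space $V$ and a linear operator $\hat A \in \mathrm{End}\,V$. We are interested in the operator $(\wedge^2 \hat A^1)^{\wedge T}$. By definition of the exterior transpose,
$$\begin{aligned}
a \wedge b \wedge (\wedge^2 \hat A^1)^{\wedge T} c &= \left( (\wedge^2 \hat A^1)(a \wedge b) \right) \wedge c \\
&= \hat A a \wedge b \wedge c + a \wedge \hat A b \wedge c.
\end{aligned}$$
We recognize a fragment of the operator $\wedge^3 \hat A^1$ and write
$$\begin{aligned}
(\wedge^3 \hat A^1)(a \wedge b \wedge c) &= \hat A a \wedge b \wedge c + a \wedge \hat A b \wedge c + a \wedge b \wedge \hat A c \\
&= (\mathrm{Tr}\,\hat A)\, a \wedge b \wedge c,
\end{aligned}$$
since this operator acts as multiplication by the trace of $\hat A$ (Section 3.8). It follows that
$$\begin{aligned}
a \wedge b \wedge (\wedge^2 \hat A^1)^{\wedge T} c &= (\mathrm{Tr}\,\hat A)\, a \wedge b \wedge c - a \wedge b \wedge \hat A c \\
&= a \wedge b \wedge \left( (\mathrm{Tr}\,\hat A) c - \hat A c \right).
\end{aligned}$$
Since this must hold for arbitrary $a, b, c \in V$, it follows that
$$(\wedge^2 \hat A^1)^{\wedge T} = (\mathrm{Tr}\,\hat A) \hat 1_V - \hat A.$$
Thus we have computed the operator $(\wedge^2 \hat A^1)^{\wedge T}$ in terms of $\hat A$ and the trace of $\hat A$.

Example 2: Let us now consider the operator $(\wedge^2 \hat A^2)^{\wedge T}$. We have
$$a \wedge b \wedge (\wedge^2 \hat A^2)^{\wedge T} c = \left( (\wedge^2 \hat A^2)(a \wedge b) \right) \wedge c = \hat A a \wedge \hat A b \wedge c.$$
We recognize a fragment of the operator $\wedge^3 \hat A^2$ and write
$$(\wedge^3 \hat A^2)(a \wedge b \wedge c) = \hat A a \wedge \hat A b \wedge c + a \wedge \hat A b \wedge \hat A c + \hat A a \wedge b \wedge \hat A c.$$
Therefore,
$$\begin{aligned}
a \wedge b \wedge (\wedge^2 \hat A^2)^{\wedge T} c &= (\wedge^3 \hat A^2)(a \wedge b \wedge c) - (a \wedge \hat A b + \hat A a \wedge b) \wedge \hat A c \\
&\overset{(1)}{=} (\wedge^3 \hat A^2)(a \wedge b \wedge c) - a \wedge b \wedge (\wedge^2 \hat A^1)^{\wedge T} \hat A c \\
&= a \wedge b \wedge \left( \wedge^3 \hat A^2 - (\wedge^2 \hat A^1)^{\wedge T} \hat A \right) c,
\end{aligned}$$
where (1) used the definition of the operator $(\wedge^2 \hat A^1)^{\wedge T}$. It follows that
$$\begin{aligned}
(\wedge^2 \hat A^2)^{\wedge T} &= (\wedge^3 \hat A^2) \hat 1_V - (\wedge^2 \hat A^1)^{\wedge T} \hat A \\
&= (\wedge^3 \hat A^2) \hat 1_V - (\mathrm{Tr}\,\hat A) \hat A + \hat A \hat A.
\end{aligned}$$
Thus we have expressed the operator $(\wedge^2 \hat A^2)^{\wedge T}$ as a polynomial in $\hat A$. Note that $\wedge^3 \hat A^2$ is the second coefficient of the characteristic polynomial of $\hat A$.

Exercise 1: Consider a three-dimensional space $V$, a linear operator $\hat A$, and show that
$$(\wedge^2 \hat A^2)^{\wedge T} \hat A v = (\det \hat A)\, v, \quad \forall v \in V.$$
Hint: Consider $a \wedge b \wedge (\wedge^2 \hat A^2)^{\wedge T} \hat A c = \hat A a \wedge \hat A b \wedge \hat A c$. ■

These examples are straightforwardly generalized. We will now express every operator of the form $(\wedge^{N-1} \hat A^p)^{\wedge T}$ as a polynomial in $\hat A$. For brevity, we introduce the notation
$$\hat A_{(k)} \equiv (\wedge^{N-1} \hat A^{N-k})^{\wedge T}, \quad 1 \le k \le N - 1.$$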


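Lemma 1: For any operator $\hat A \in \mathrm{End}\,V$ and for an integer $p$, $1 \le p \le N$, the following formula holds as an identity of operators in $V$:
$$\left( \wedge^{N-1} \hat A^{p-1} \right)^{\wedge T} \hat A + \left( \wedge^{N-1} \hat A^p \right)^{\wedge T} = (\wedge^N \hat A^p) \hat 1_V.$$
Here, in order to provide a meaning for this formula in cases $p = 1$ and $p = N$, we define $\wedge^{N-1} \hat A^N \equiv \hat 0$ and $\wedge^{N-1} \hat A^0 \equiv \hat 1$. In the shorter notation, this is
$$\hat A_{(k)} \hat A + \hat A_{(k-1)} = (\wedge^N \hat A^{N-k+1}) \hat 1_V.$$
Note that $\wedge^N \hat A^{N-k+1} \equiv q_{k-1}$, where $q_j$ are the coefficients of the characteristic polynomial of $\hat A$ (see Sec. 3.9).

Proof: We use Statement 4 in Sec. 3.7 with $\omega \equiv v_1 \wedge ... \wedge v_{N-1}$, $m \equiv N - 1$ and $k \equiv p$:
$$\left( \wedge^{N-1} \hat A^p \omega \right) \wedge u + \left( \wedge^{N-1} \hat A^{p-1} \omega \right) \wedge (\hat A u) = \wedge^N \hat A^p (\omega \wedge u).$$
This holds for $1 \le p \le N - 1$. Applying the definition of the exterior transpose, we find
$$\omega \wedge \left( \wedge^{N-1} \hat A^p \right)^{\wedge T} u + \omega \wedge \left( \wedge^{N-1} \hat A^{p-1} \right)^{\wedge T} \hat A u = (\wedge^N \hat A^p)\, \omega \wedge u.$$
Since this holds for all $\omega \in \wedge^{N-1} V$ and $u \in V$, we obtain the required formula,
$$\left( \wedge^{N-1} \hat A^p \right)^{\wedge T} + \left( \wedge^{N-1} \hat A^{p-1} \right)^{\wedge T} \hat A = (\wedge^N \hat A^p) \hat 1_V.$$
It remains to verify the case $p = N$. In that case we compute directly,
$$\left( \wedge^{N-1} \hat A^{N-1} \omega \right) \wedge (\hat A u) = \hat A v_1 \wedge ... \wedge \hat A v_{N-1} \wedge \hat A u = \wedge^N \hat A^N (\omega \wedge u).$$
Hence,
$$\left( \wedge^{N-1} \hat A^{N-1} \right)^{\wedge T} \hat A = (\wedge^N \hat A^N) \hat 1_V \equiv (\det \hat A) \hat 1_V.$$
■

Remark: In these formulas we interpret the operators $\wedge^N \hat A^p \in \mathrm{End}(\wedge^N V)$ as simply numbers multiplying some operators. This is justified since $\wedge^N V$ is one-dimensional, and linear operators in it act as multiplication by numbers. In other words, we implicitly use the canonical isomorphism $\mathrm{End}(\wedge^N V) \cong K$.

Exercise 2: Use induction in $p$ (for $1 \le p \le N - 1$) and Lemma 1 to express $\hat A_{(k)}$ explicitly as polynomials in $\hat A$:
$$\hat A_{(N-p)} \equiv \left( \wedge^{N-1} \hat A^p \right)^{\wedge T} = \sum_{k=0}^p (-1)^k (\wedge^N \hat A^{p-k}) (\hat A)^k.$$
Hint: Start applying Lemma 1 with $p = 1$ and $\hat A_{(N)} \equiv \hat 1$. ■

Using the coefficients $q_k \equiv \wedge^N \hat A^{N-k}$ of the characteristic polynomial, the result of Exercise 2 can be rewritten as
$$\begin{aligned}
\left( \wedge^{N-1} \hat A^1 \right)^{\wedge T} \equiv \hat A_{(N-1)} &= q_{N-1} \hat 1_V - \hat A, \\
\left( \wedge^{N-1} \hat A^2 \right)^{\wedge T} \equiv \hat A_{(N-2)} &= q_{N-2} \hat 1_V - q_{N-1} \hat A + (-\hat A)^2, \\
&\ \ ......, \\
\left( \wedge^{N-1} \hat A^{N-1} \right)^{\wedge T} \equiv \hat A_{(1)} &= q_1 \hat 1_V + q_2 (-\hat A) + ... + q_{N-1} (-\hat A)^{N-2} + (-\hat A)^{N-1}.
\end{aligned}$$
Note that the characteristic polynomial of $\hat A$ is
$$Q_{\hat A}(\lambda) = q_0 + q_1 (-\lambda) + ... + q_{N-1} (-\lambda)^{N-1} + (-\lambda)^N.$$
Thus the operators denoted by $\hat A_{(k)}$ are computed as suitable "fragments" of the characteristic polynomial into which $\hat A$ is substituted instead of $\lambda$.

Exercise 3:* Using the definition of exterior transpose for general exterior powers (Exercise 1 in Sec. 4.1.1), show that for $1 \le k \le N - 1$ and $1 \le p \le k$ the following identity holds,
$$\sum_{q=0}^p \left( \wedge^{N-k} \hat A^{p-q} \right)^{\wedge T} (\wedge^k \hat A^q) = (\wedge^N \hat A^p) \hat 1_{\wedge^k V}.$$
Deduce that the operators $\left( \wedge^{N-k} \hat A^p \right)^{\wedge T}$ can be expressed as polynomials in the (mutually commuting) operators $\wedge^k \hat A^j$ ($1 \le j \le k$).

Hints: Follow the proof of Statement 4 in Sec. 3.7. The idea is to apply both sides to $\omega_k \wedge \omega_{N-k}$, where $\omega_k \equiv v_1 \wedge ... \wedge v_k$ and $\omega_{N-k} = v_{k+1} \wedge ... \wedge v_N$. Since $\wedge^N \hat A^p$ acts on $\omega_k \wedge \omega_{N-k}$ by distributing $p$ copies of $\hat A$ among the $N$ vectors $v_j$, one needs to show that the same terms will occur when one first distributes $q$ copies of $\hat A$ among the first $k$ vectors and $p - q$ copies of $\hat A$ among the last $N - k$ vectors, and then sums over all $q$ from 0 to $p$. Once the identity is proved, one can use induction to express the operators $\left( \wedge^{N-k} \hat A^p \right)^{\wedge T}$. For instance, the identity with $k = 2$ and $p = 1$ yields
$$\left( \wedge^{N-2} \hat A^0 \right)^{\wedge T} (\wedge^2 \hat A^1) + \left( \wedge^{N-2} \hat A^1 \right)^{\wedge T} (\wedge^2 \hat A^0) = (\wedge^N \hat A^1) \hat 1_{\wedge^k V}.$$
Therefore
$$\left( \wedge^{N-2} \hat A^1 \right)^{\wedge T} = (\mathrm{Tr}\,\hat A) \hat 1_{\wedge^k V} - \wedge^2 \hat A^1.$$
Similarly, with $k = 2$ and $p = 2$ we find
$$\begin{aligned}
\left( \wedge^{N-2} \hat A^2 \right)^{\wedge T} &= (\wedge^N \hat A^2) \hat 1_{\wedge^k V} - \left( \wedge^{N-2} \hat A^1 \right)^{\wedge T} (\wedge^2 \hat A^1) - \wedge^2 \hat A^2 \\
&= (\wedge^N \hat A^2) \hat 1_{\wedge^k V} - (\mathrm{Tr}\,\hat A)(\wedge^2 \hat A^1) + (\wedge^2 \hat A^1)^2 - \wedge^2 \hat A^2.
\end{aligned}$$
It follows by induction that all the operators $\left( \wedge^{N-k} \hat A^p \right)^{\wedge T}$ are expressed as polynomials in $\wedge^k \hat A^j$. ■

At the end of the proof of Lemma 1 we have obtained a curious relation,
$$\left( \wedge^{N-1} \hat A^{N-1} \right)^{\wedge T} \hat A = (\det \hat A) \hat 1_V.$$
If $\det \hat A \neq 0$, we may divide by it and immediately find the following result.

Lemma 2: If $\det \hat A \neq 0$, the inverse operator satisfies
$$\hat A^{-1} = \frac{1}{\det \hat A} \left( \wedge^{N-1} \hat A^{N-1} \right)^{\wedge T}.$$
Thus we are able to express the inverse operator $\hat A^{-1}$ as a polynomial in $\hat A$. If $\det \hat A = 0$ then the operator $\hat A$ has no inverse, but the operator $\left( \wedge^{N-1} \hat A^{N-1} \right)^{\wedge T}$ is still well-defined and sufficiently useful to deserve a special name.

Definition: The algebraic complement (also called the adjoint) of $\hat A$ is the operator
$$\tilde{\hat A} \equiv \left( \wedge^{N-1} \hat A^{N-1} \right)^{\wedge T} \in \mathrm{End}\,V.$$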
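The recursion of Lemma 1 gives a direct way to compute the algebraic complement as a polynomial in the operator. The sketch below starts from $\hat A_{(N)} \equiv \hat 1$ and repeatedly applies $\hat A_{(k-1)} = q_{k-1} \hat 1 - \hat A_{(k)} \hat A$ until it reaches $\hat A_{(1)}$, then checks the relation $\hat A_{(1)} \hat A = (\det \hat A) \hat 1$ used in Lemma 2. The sample matrix and the helper names (`det`, `matmul`, `wedge_top`, `algebraic_complement`) are assumptions made for this illustration.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def wedge_top(a, k):
    """Number corresponding to wedge^N A^k: A is applied to k of the N basis vectors."""
    n = len(a)
    return sum(det([[a[i][j] if j in chosen else int(i == j) for i in range(n)]
                    for j in range(n)])
               for chosen in combinations(range(n), k))

def algebraic_complement(a):
    """A_(1) obtained from A_(N) = 1 via A_(k-1) = q_(k-1) * 1 - A_(k) A."""
    n = len(a)
    current = [[int(i == j) for j in range(n)] for i in range(n)]   # A_(N)
    for k in range(n, 1, -1):
        q = wedge_top(a, n - k + 1)          # q_(k-1) = wedge^N A^(N-k+1)
        product = matmul(current, a)
        current = [[(q if i == j else 0) - product[i][j] for j in range(n)]
                   for i in range(n)]
    return current

# Hypothetical sample operator with nonzero determinant.
A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
adj = algebraic_complement(A)
d = det(A)
identity_times_det = [[d if i == j else 0 for j in range(3)] for i in range(3)]
print(adj, matmul(adj, A) == identity_times_det)
```

For this three-dimensional example the loop reproduces the polynomial of Example 2, $(\wedge^3 \hat A^2) \hat 1 - (\mathrm{Tr}\,\hat A) \hat A + \hat A \hat A$; dividing `adj` by `d` would give the inverse matrix, as in Lemma 2.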


Exercise 4: Compute the algebraic complement of the operator matrix Xij the k-th column to the first column and the l-th row to
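$\hat A = \mathbf{a} \otimes \mathbf{b}^*$, where $\mathbf{a} \in V$ and $\mathbf{b}^* \in V^*$, and $V$ is an $N$-dimensional space ($N \ge 2$).

Answer: Zero if $N \ge 3$. For $N = 2$ we use Example 1 to compute
\[ (\wedge^1 \hat A^1)^{\wedge T} = (\mathrm{Tr}\,\hat A)\hat 1 - \hat A = \mathbf{b}^*(\mathbf{a})\hat 1 - \mathbf{a} \otimes \mathbf{b}^*. \]

Exercise 5: For the operator $\hat A = \mathbf{a} \otimes \mathbf{b}^*$ in $N$-dimensional space, as in Exercise 4, show that $(\wedge^{N-1} \hat A^p)^{\wedge T} = 0$ for $p \ge 2$.

4.2.2 Algebraic complement of a matrix

The algebraic complement is usually introduced in terms of matrix determinants. Namely, one takes a matrix $A_{ij}$ and deletes the column number $k$ and the row number $l$. Then one computes the determinant of the resulting matrix and multiplies by $(-1)^{k+l}$. The result is the element $B_{kl}$ of the matrix that is the algebraic complement of $A_{ij}$. I will now show that our definition is equivalent to this one, if we interpret matrices as coefficients of linear operators in a basis.

Statement: Let $\hat A \in \mathrm{End}\,V$ and let $\{\mathbf{e}_j\}$ be a basis in $V$. Let $A_{ij}$ be the matrix of the operator $\hat A$ in this basis. Let $\hat B = (\wedge^{N-1} \hat A^{N-1})^{\wedge T}$ and let $B_{kl}$ be the matrix of $\hat B$ in the same basis. Then $B_{kl}$ is equal to $(-1)^{k+l}$ times the determinant of the matrix obtained from $A_{ij}$ by deleting the column number $k$ and the row number $l$.

Proof: Given an operator $\hat B$, the matrix element $B_{kl}$ in the basis $\{\mathbf{e}_j\}$ can be computed as the coefficient in the following relation (see Sec. 2.3.3),
\[ B_{kl}\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_{k-1} \wedge (\hat B \mathbf{e}_l) \wedge \mathbf{e}_{k+1} \wedge ... \wedge \mathbf{e}_N. \]
Since $\hat B = (\wedge^{N-1} \hat A^{N-1})^{\wedge T}$, we have
\[ B_{kl}\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = \hat A \mathbf{e}_1 \wedge ... \wedge \hat A \mathbf{e}_{k-1} \wedge \mathbf{e}_l \wedge \hat A \mathbf{e}_{k+1} \wedge ... \wedge \hat A \mathbf{e}_N. \]
Now the right side can be expressed as the determinant of another operator, call it $\hat X$,
\[ B_{kl}\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = (\det \hat X)\, \mathbf{e}_1 \wedge ... \wedge \mathbf{e}_N = \hat X \mathbf{e}_1 \wedge ... \wedge \hat X \mathbf{e}_{k-1} \wedge \hat X \mathbf{e}_k \wedge \hat X \mathbf{e}_{k+1} \wedge ... \wedge \hat X \mathbf{e}_N, \]
if we define $\hat X$ as an operator such that $\hat X \mathbf{e}_k \equiv \mathbf{e}_l$ while on other basis vectors $\hat X \mathbf{e}_j \equiv \hat A \mathbf{e}_j$ ($j \ne k$). Having defined $\hat X$ in this way, we have $B_{kl} = \det \hat X$.

We can now determine the matrix $X_{ij}$ representing $\hat X$ in the basis $\{\mathbf{e}_j\}$. By the definition of the matrix representation of operators,
\[ \hat A \mathbf{e}_j = \sum_{i=1}^{N} A_{ij} \mathbf{e}_i, \qquad \hat X \mathbf{e}_j = \sum_{i=1}^{N} X_{ij} \mathbf{e}_i, \qquad 1 \le j \le N. \]
It follows that $X_{ij} = A_{ij}$ for $j \ne k$ while $X_{ik} = \delta_{il}$ ($1 \le i \le N$), which means that the entire $k$-th column in the matrix $A_{ij}$ has been replaced by a column containing zeros except for a single nonzero element $X_{lk} = 1$.

It remains to show that the determinant of the matrix $X_{ij}$ is equal to $(-1)^{k+l}$ times the determinant of the matrix obtained from $A_{ij}$ by deleting column $k$ and row $l$. We may move in the matrix $X_{ij}$ the $k$-th column to the first column and the $l$-th row to the first row, without changing the order of any other rows and columns. This produces the sign factor $(-1)^{k+l}$ but otherwise does not change the determinant. The result is
\[ B_{kl} = \det \hat X = (-1)^{k+l} \det \begin{pmatrix} 1 & X_{12} & \dots & X_{1N} \\ 0 & * & * & * \\ \vdots & * & * & * \\ 0 & * & * & * \end{pmatrix} = (-1)^{k+l} \det \begin{pmatrix} * & * & * \\ * & * & * \\ * & * & * \end{pmatrix}, \]
where the stars represent the matrix obtained from $A_{ij}$ by deleting column $k$ and row $l$, and the numbers $X_{12}, ..., X_{1N}$ do not enter the determinant. This is the result we needed. ∎

Exercise 5:* Show that the matrix representation of the algebraic complement can be written through the Levi-Civita symbol as
\[ \tilde A^{i}_{k} = \frac{1}{(N-1)!} \sum_{i_2,...,i_N} \sum_{k_2,...,k_N} \varepsilon_{k k_2 ... k_N}\, \varepsilon^{i i_2 ... i_N} A^{k_2}_{i_2} ... A^{k_N}_{i_N}. \]
Hint: See Sections 3.4.1 and 4.1.2.

4.2.3 Further properties and generalizations

In our approach, the algebraic complement $\tilde{\hat A}$ of an operator $\hat A$ comes from considering the set of $N-1$ operators
\[ \hat A_{(k)} \equiv (\wedge^{N-1} \hat A^{N-k})^{\wedge T}, \qquad 1 \le k \le N-1. \]
(For convenience we might define $\hat A_{(N)} \equiv \hat 1_V$.)

The operators $\hat A_{(k)}$ can be expressed as polynomials in $\hat A$ through the identity (Lemma 1 in Sec. 4.2.1)
\[ \hat A_{(k)} \hat A + \hat A_{(k-1)} = q_{k-1} \hat 1, \qquad q_j \equiv \wedge^N \hat A^{N-j}. \]
The numbers $q_j$ introduced here are the coefficients of the characteristic polynomial of $\hat A$; for instance, $\det \hat A \equiv q_0$ and $\mathrm{Tr}\,\hat A \equiv q_{N-1}$. It follows by induction (Exercise 2 in Sec. 4.2.1) that
\[ \hat A_{(N-k)} = q_{N-k} \hat 1 - q_{N-k+1} \hat A + ... + q_{N-1} (-\hat A)^{k-1} + (-\hat A)^{k}. \]
The algebraic complement is $\tilde{\hat A} \equiv \hat A_{(1)}$, but it appears natural to study the properties of all the operators $\hat A_{(k)}$. (The operators $\hat A_{(k)}$ do not seem to have an established name for $k \ge 2$.)

Statement 1: The coefficients of the characteristic polynomial of the algebraic complement, $\tilde{\hat A}$, are
\[ \wedge^N \tilde{\hat A}^{k} = (\det \hat A)^{k-1} (\wedge^N \hat A^{N-k}) \equiv q_0^{k-1} q_k. \]
For instance,
\[ \mathrm{Tr}\,\tilde{\hat A} = \wedge^N \tilde{\hat A}^{1} = q_1 = \wedge^N \hat A^{N-1}, \]
\[ \det \tilde{\hat A} = \wedge^N \tilde{\hat A}^{N} = q_0^{N-1} q_N = (\det \hat A)^{N-1}. \]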
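The cofactor description of Sec. 4.2.2 and the coefficients in Statement 1 can be checked numerically. The following is only a sketch under stated assumptions: NumPy is assumed, the operator is represented by a random real matrix, and the helper names `algebraic_complement` and `char_coeffs` are hypothetical (they do not come from the text). Following the Statement of Sec. 4.2.2, the element $B_{kl}$ is built from the minor with column $k$ and row $l$ deleted.

```python
import numpy as np

def algebraic_complement(A):
    """B[k, l] = (-1)**(k+l) * det(A with row l and column k deleted)."""
    N = A.shape[0]
    B = np.empty((N, N))
    for k in range(N):
        for l in range(N):
            minor = np.delete(np.delete(A, l, axis=0), k, axis=1)
            B[k, l] = (-1) ** (k + l) * np.linalg.det(minor)
    return B

def char_coeffs(A):
    """Return [q_0, ..., q_N] such that det(A - x*1) = sum_k q_k * (-x)**k."""
    N = A.shape[0]
    # numpy.poly gives det(x*1 - A) = sum_i c[i] * x**(N - i), with c[0] = 1
    c = np.real(np.poly(A))
    return np.array([(-1) ** (N - k) * c[N - k] for k in range(N + 1)])

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))
B = algebraic_complement(A)
q = char_coeffs(A)       # q[k] multiplies (-x)**k for A
qt = char_coeffs(B)      # same convention for the algebraic complement

# The complement satisfies A B = (det A) 1
print(np.allclose(A @ B, np.linalg.det(A) * np.eye(N)))
# Statement 1: the coefficient of (-x)**(N-k) for B equals q_0**(k-1) * q_k
print(all(np.isclose(qt[N - k], q[0] ** (k - 1) * q[k]) for k in range(1, N + 1)))
```

The index shift `qt[N - k]` reflects that $\wedge^N \tilde{\hat A}^{k}$ multiplies $(-\lambda)^{N-k}$ in the characteristic polynomial of the complement.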


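Proof: Let us first assume that $\det \hat A \equiv q_0 \ne 0$. We use the property $\hat A \tilde{\hat A} = q_0 \hat 1$ (Lemma 2 in Sec. 4.2.1) and the multiplicativity of determinants to find
\[ \det(\tilde{\hat A} - \lambda \hat 1)\, q_0 = \det(q_0 \hat 1 - \lambda \hat A) = (-\lambda)^N \det\!\left(\hat A - \frac{q_0}{\lambda} \hat 1\right) = (-\lambda)^N Q_A\!\left(\frac{q_0}{\lambda}\right), \]
hence the characteristic polynomial of $\tilde{\hat A}$ is
\[ Q_{\tilde A}(\lambda) \equiv \det(\tilde{\hat A} - \lambda \hat 1) = \frac{(-\lambda)^N}{q_0} Q_A\!\left(\frac{q_0}{\lambda}\right) \]
\[ = \frac{(-\lambda)^N}{q_0} \left[ \left(-\frac{q_0}{\lambda}\right)^{N} + q_{N-1} \left(-\frac{q_0}{\lambda}\right)^{N-1} + ... + q_0 \right] \]
\[ = (-\lambda)^N + q_1 (-\lambda)^{N-1} + q_2 q_0 (-\lambda)^{N-2} + ... + q_0^{N-1}. \]
This agrees with the required formula.

It remains to prove the case $q_0 \equiv \det \hat A = 0$. Although this result could be achieved as a limit of nonzero $q_0$ with $q_0 \to 0$, it is instructive to see a direct proof without using the assumption $q_0 \ne 0$ or taking limits.

Consider a basis $\{\mathbf{v}_j\}$ in $V$ and the expression
\[ (\wedge^N \tilde{\hat A}^{k})\, \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N. \]
This expression contains $\binom{N}{k}$ terms of the form
\[ \tilde{\hat A} \mathbf{v}_1 \wedge ... \wedge \tilde{\hat A} \mathbf{v}_k \wedge \mathbf{v}_{k+1} \wedge ... \wedge \mathbf{v}_N, \]
where $\tilde{\hat A}$ is applied only to $k$ vectors. Using the definition of $\tilde{\hat A}$, we can rewrite such a term as follows. First, we use the definition of $\tilde{\hat A}$ to write
\[ \tilde{\hat A} \mathbf{v}_1 \wedge \psi = \mathbf{v}_1 \wedge (\wedge^{N-1} \hat A^{N-1}) \psi, \]
for any $\psi \in \wedge^{N-1} V$. In our case, we use
\[ \psi \equiv \tilde{\hat A} \mathbf{v}_2 \wedge ... \wedge \tilde{\hat A} \mathbf{v}_k \wedge \mathbf{v}_{k+1} \wedge ... \wedge \mathbf{v}_N \]
and find
\[ \tilde{\hat A} \mathbf{v}_1 \wedge \psi = \mathbf{v}_1 \wedge \hat A \tilde{\hat A} \mathbf{v}_2 \wedge ... \wedge \hat A \tilde{\hat A} \mathbf{v}_k \wedge \hat A \mathbf{v}_{k+1} \wedge ... \wedge \hat A \mathbf{v}_N. \]
By assumption $q_0 = 0$, hence $\hat A \tilde{\hat A} = 0 = \tilde{\hat A} \hat A$ (since $\tilde{\hat A}$, being a polynomial in $\hat A$, commutes with $\hat A$) and thus
\[ (\wedge^N \tilde{\hat A}^{k})\, \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = 0, \qquad k \ge 2. \]
For $k = 1$ we find
\[ \tilde{\hat A} \mathbf{v}_1 \wedge \psi = \mathbf{v}_1 \wedge \hat A \mathbf{v}_2 \wedge ... \wedge \hat A \mathbf{v}_N. \]
Summing $N$ such terms, we obtain the same expression as that in the definition of $\wedge^N \hat A^{N-1}$, hence
\[ (\wedge^N \tilde{\hat A}^{1})\, \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = \wedge^N \hat A^{N-1}\, \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N. \]
This concludes the proof for the case $\det \hat A = 0$. ∎

Exercise:* Suppose that $\hat A$ has the simple eigenvalue $\lambda = 0$ (i.e. this eigenvalue has multiplicity 1). Show that the algebraic complement, $\tilde{\hat A}$, has rank 1, and that the image of $\tilde{\hat A}$ is the one-dimensional subspace $\mathrm{Span}\,\{\mathbf{v}\}$.

Hint: An operator has rank 1 if its image is one-dimensional. The eigenvalue $\lambda = 0$ has multiplicity 1 if $\wedge^N \hat A^{N-1} \ne 0$. Choose a basis consisting of the eigenvector $\mathbf{v}$ and $N-1$ other vectors $\mathbf{u}_2, ..., \mathbf{u}_N$. Show that
\[ \tilde{\hat A} \mathbf{v} \wedge \mathbf{u}_2 \wedge ... \wedge \mathbf{u}_N = \wedge^N \hat A^{N-1} (\mathbf{v} \wedge \mathbf{u}_2 \wedge ... \wedge \mathbf{u}_N) \ne 0, \]
while
\[ \mathbf{v} \wedge \mathbf{u}_2 \wedge ... \wedge \tilde{\hat A} \mathbf{u}_j \wedge ... \wedge \mathbf{u}_N = 0, \qquad 2 \le j \le N. \]
Consider other expressions, such as
\[ \tilde{\hat A} \mathbf{v} \wedge \mathbf{v} \wedge \mathbf{u}_3 \wedge ... \wedge \mathbf{u}_N \quad \text{or} \quad \tilde{\hat A} \mathbf{u}_j \wedge \mathbf{v} \wedge \mathbf{u}_3 \wedge ... \wedge \mathbf{u}_N, \]
and finally deduce that the image of $\tilde{\hat A}$ is precisely the one-dimensional subspace $\mathrm{Span}\,\{\mathbf{v}\}$. ∎

Now we will demonstrate a useful property of the operators $\hat A_{(k)}$.

Statement 2: The trace of $\hat A_{(k)}$ satisfies
\[ \frac{\mathrm{Tr}\,\hat A_{(k)}}{k} = \wedge^N \hat A^{N-k} \equiv q_k. \]

Proof: Consider the action of $\wedge^N \hat A^{N-k}$ on a basis tensor $\omega \equiv \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N$; the result is a sum of $\binom{N}{N-k}$ terms,
\[ \wedge^N \hat A^{N-k} \omega = \hat A \mathbf{v}_1 \wedge ... \wedge \hat A \mathbf{v}_{N-k} \wedge \mathbf{v}_{N-k+1} \wedge ... \wedge \mathbf{v}_N + (\text{permutations}). \]
Consider now the action of $\mathrm{Tr}\,\hat A_{(k)}$ on $\omega$,
\[ \mathrm{Tr}\,\hat A_{(k)}\, \omega = \wedge^N [\hat A_{(k)}]^1 \omega = \sum_{j=1}^{N} \mathbf{v}_1 \wedge ... \wedge \hat A_{(k)} \mathbf{v}_j \wedge ... \wedge \mathbf{v}_N. \]
Using the definition of $\hat A_{(k)}$, we rewrite
\[ \mathbf{v}_1 \wedge ... \wedge \hat A_{(k)} \mathbf{v}_j \wedge ... \wedge \mathbf{v}_N = \hat A \mathbf{v}_1 \wedge ... \wedge \hat A \mathbf{v}_{N-k} \wedge \mathbf{v}_{N-k+1} \wedge ... \wedge \mathbf{v}_j \wedge ... \wedge \mathbf{v}_N + (\text{permutations not including } \hat A \mathbf{v}_j). \]
After summing over $j$, we will obtain all the same terms as were present in the expression for $\wedge^N \hat A^{N-k} \omega$, but each term will occur several times. We can show that each term will occur exactly $k$ times. For instance, the term
\[ \hat A \mathbf{v}_1 \wedge ... \wedge \hat A \mathbf{v}_{N-k} \wedge \mathbf{v}_{N-k+1} \wedge ... \wedge \mathbf{v}_j \wedge ... \wedge \mathbf{v}_N \]
will occur $k$ times in the expression for $\mathrm{Tr}\,\hat A_{(k)}\, \omega$ because it will be generated once by each of the terms
\[ \mathbf{v}_1 \wedge ... \wedge \hat A_{(k)} \mathbf{v}_j \wedge ... \wedge \mathbf{v}_N \]
with $N-k+1 \le j \le N$. The same argument holds for every other term. Therefore
\[ \mathrm{Tr}\,\hat A_{(k)}\, \omega = k\, (\wedge^N \hat A^{N-k})\, \omega = k q_k\, \omega. \]
Since this holds for any $\omega \in \wedge^N V$, we obtain the required statement. ∎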
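Statement 2 lends itself to a numerical check. The sketch below is an illustration under stated assumptions, not part of the original text: NumPy is assumed, the operator is represented by a random real matrix, and the function name `complement_chain` is hypothetical. It generates the numbers $q_j$ and the operators $\hat A_{(j)}$ by the descending recurrence of Eq. (4.2) (Leverrier's algorithm, Statement 3), then compares $q_j$ with coefficients obtained independently from `numpy.poly` and tests $\mathrm{Tr}\,\hat A_{(k)} = k q_k$.

```python
import numpy as np

def complement_chain(A):
    """Return (q, ops) with q[j] = q_j and ops[j] = matrix of A_(j), 0 <= j <= N."""
    N = A.shape[0]
    q = np.zeros(N + 1)
    ops = [None] * (N + 1)
    q[N] = 1.0                    # coefficient of (-x)**N in det(A - x*1)
    ops[N] = np.eye(N)            # A_(N) is defined as the identity
    for j in range(N - 1, -1, -1):
        AA = A @ ops[j + 1]
        q[j] = np.trace(AA) / (N - j)     # q_j = Tr[A A_(j+1)] / (N - j)
        ops[j] = q[j] * np.eye(N) - AA    # A_(j) = q_j 1 - A A_(j+1)
    return q, ops

rng = np.random.default_rng(1)
N = 5
A = rng.standard_normal((N, N))
q, ops = complement_chain(A)

# Independent reference: det(x*1 - A) = sum_i c[i] * x**(N - i),
# so that q_k = (-1)**(N - k) * c[N - k].
c = np.real(np.poly(A))
q_ref = np.array([(-1) ** (N - k) * c[N - k] for k in range(N + 1)])

print(np.allclose(q, q_ref))                                        # coefficients agree
print(all(np.isclose(np.trace(ops[k]), k * q[k]) for k in range(1, N)))  # Statement 2
print(np.allclose(ops[0], 0))                                       # A_(0) vanishes
```

Only matrix products and traces are used inside `complement_chain`; `numpy.poly` appears solely as an independent reference for the comparison.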


Remark: We have thus computed the trace of every operator $\hat A_{(k)}$, as well as the characteristic polynomial of $\hat A_{(1)} \equiv \tilde{\hat A}$. Computing the entire characteristic polynomial of each $\hat A_{(k)}$ is certainly possible but will perhaps lead to cumbersome expressions. ∎

An interesting application of Statement 2 is the following algorithm for computing the characteristic polynomial of an operator.[1] This algorithm is more economical compared with the computation of $\det(\hat A - \lambda \hat 1)$ via permutations, and requires only operator (or matrix) multiplications and the computation of a trace.

[1] I found this algorithm in an online note by W. Kahan, "Jordan's normal form" (downloaded from http://www.cs.berkeley.edu/~wkahan/MathH110/jordan.pdf on October 6, 2009). Kahan attributes this algorithm to Leverrier, Souriau, Frame, and Faddeev.

Statement 3: (Leverrier's algorithm) The coefficients $\wedge^N \hat A^{k} \equiv q_{N-k}$ ($1 \le k \le N$) of the characteristic polynomial of an operator $\hat A$ can be computed together with the operators $\hat A_{(j)}$ by starting with $\hat A_{(N)} \equiv \hat 1_V$ and using the descending recurrence relation for $j = N-1, ..., 0$:
\[ q_j = \frac{1}{N-j} \mathrm{Tr}\,[\hat A \hat A_{(j+1)}], \]
\[ \hat A_{(j)} = q_j \hat 1 - \hat A \hat A_{(j+1)}. \qquad (4.2) \]
At the end of the calculation, we will have
\[ q_0 = \det \hat A, \qquad \hat A_{(1)} = \tilde{\hat A}, \qquad \hat A_{(0)} = 0. \]

Proof: At the beginning of the recurrence, we have
\[ j = N-1, \qquad q_{N-1} = \frac{1}{N-j} \mathrm{Tr}\,[\hat A \hat A_{(j+1)}] = \mathrm{Tr}\,\hat A, \]
which is correct. The recurrence relation (4.2) for $\hat A_{(j)}$ coincides with the result of Lemma 1 in Sec. 4.2.1 and thus yields at each step $j$ the correct operator $\hat A_{(j)}$, as long as $q_j$ was computed correctly at that step. So it remains to verify that $q_j$ is computed correctly. Taking the trace of Eq. (4.2) and using $\mathrm{Tr}\,\hat 1 = N$, we get
\[ \mathrm{Tr}\,[\hat A \hat A_{(j+1)}] = N q_j - \mathrm{Tr}\,\hat A_{(j)}. \]
We now substitute for $\mathrm{Tr}\,\hat A_{(j)}$ the result of Statement 2 and find
\[ \mathrm{Tr}\,[\hat A \hat A_{(j+1)}] = N q_j - j q_j = (N-j)\, q_j. \]
Thus $q_j$ is also computed correctly from the previously known $\hat A_{(j+1)}$ at each step $j$. ∎

Remark: This algorithm provides another illustration for the "trace relations" (see Exercises 1 and 2 in Sec. 3.9), i.e. for the fact that the coefficients $q_j$ of the characteristic polynomial of $\hat A$ can be expressed as polynomials in the traces of $\hat A$ and its powers. These expressions will be obtained in Sec. 4.5.3.

4.3 Cayley-Hamilton theorem and beyond

The characteristic polynomial of an operator $\hat A$ has roots $\lambda$ that are eigenvalues of $\hat A$. It turns out that we can substitute $\hat A$ as an operator into the characteristic polynomial, and the result is the zero operator, as if $\hat A$ were one of its eigenvalues. In other words, $\hat A$ satisfies (as an operator) its own characteristic equation.

Theorem 1 (Cayley-Hamilton): If $Q_A(\lambda) \equiv \det(\hat A - \lambda \hat 1_V)$ is the characteristic polynomial of the operator $\hat A$ then $Q_A(\hat A) = \hat 0_V$.

Proof: The coefficients of the characteristic polynomial are $\wedge^N \hat A^m$. When we substitute the operator $\hat A$ into $Q_A(\lambda)$, we obtain the operator
\[ Q_A(\hat A) = (\det \hat A) \hat 1_V + (\wedge^N \hat A^{N-1})(-\hat A) + ... + (-\hat A)^N. \]
We note that this expression is similar to that for the algebraic complement of $\hat A$ (see Exercise 2 in Sec. 4.2.1), so
\[ Q_A(\hat A) = (\det \hat A) \hat 1_V + \left[ \wedge^N \hat A^{N-1} + ... + (-\hat A)^{N-1} \right] (-\hat A) = (\det \hat A) \hat 1_V - (\wedge^{N-1} \hat A^{N-1})^{\wedge T} \hat A = \hat 0_V \]
by Lemma 1 in Sec. 4.2.1. Hence $Q_A(\hat A) = \hat 0_V$ for any operator $\hat A$. ∎

Remark: While it is true that the characteristic polynomial vanishes on $\hat A$, it is not necessarily the simplest such polynomial. A polynomial of a lower degree may vanish on $\hat A$. A trivial example of this is given by an operator $\hat A = \alpha \hat 1$, that is, the identity operator times a constant $\alpha$. The characteristic polynomial of $\hat A$ is $Q_A(\lambda) = (\alpha - \lambda)^N$. In agreement with the Cayley-Hamilton theorem, $(\alpha \hat 1 - \hat A)^N = \hat 0$. However, the simpler polynomial $p(\lambda) = \lambda - \alpha$ also has the property $p(\hat A) = \hat 0$. We will look into this at the end of Sec. 4.6. ∎

We have derived the Cayley-Hamilton theorem by considering the exterior transpose of $\wedge^{N-1} \hat A^{N-1}$. A generalization is found if we similarly use the operators of the form $(\wedge^{a} \hat A^{b})^{\wedge T}$.

Theorem 2 (Cayley-Hamilton in $\wedge^k V$): For any operator $\hat A$ in $V$ and for $1 \le k \le N$, $1 \le p \le N$, the following identity holds,
\[ \sum_{q=0}^{p} (\wedge^{N-k} \hat A^{p-q})^{\wedge T} (\wedge^{k} \hat A^{q}) = (\wedge^N \hat A^{p})\, \hat 1_{\wedge^k V}. \qquad (4.3) \]
In this identity, we set $\wedge^k \hat A^0 \equiv \hat 1_{\wedge^k V}$ and $\wedge^k \hat A^r \equiv 0$ for $r > k$. Explicit expressions can be derived for all operators $(\wedge^{N-k} \hat A^{p})^{\wedge T}$ as polynomials in the (mutually commuting) operators $\wedge^k \hat A^j$, $1 \le j \le k$. (See Exercise 3 in Sec. 4.2.1.) Hence, there exist $k$ identically vanishing operator-valued polynomials involving $\wedge^k \hat A^j$. (In the ordinary Cayley-Hamilton theorem, we have $k = 1$ and a single polynomial $Q_A(\hat A)$ that identically vanishes as an operator in $V \equiv \wedge^1 V$.) The coefficients of those polynomials will be known functions of $\hat A$. One can also obtain an identically vanishing polynomial in $\wedge^k \hat A^1$.

Proof: Let us fix $k$ and first write Eq. (4.3) for $1 \le p \le N-k$. These $N-k$ equations are all of the form
\[ (\wedge^{N-k} \hat A^{p})^{\wedge T} + [...] = (\wedge^N \hat A^{p})\, \hat 1_{\wedge^k V}, \qquad 1 \le p \le N-k. \]
In the $p$-th equation, the omitted terms in square brackets contain only the operators $(\wedge^{N-k} \hat A^{r})^{\wedge T}$ with $r < p$ and $\wedge^k \hat A^q$ with $1 \le q \le k$. Therefore, these equations can be used to express $(\wedge^{N-k} \hat A^{p})^{\wedge T}$ for $1 \le p \le N-k$ through the operators $\wedge^k \hat A^q$ explicitly as polynomials. Substituting these expressions into Eq. (4.3), we obtain $k$ identically vanishing polynomials in the $k$ operators $\wedge^k \hat A^q$ (with $1 \le q \le k$). These polynomials can be considered as a system of polynomial equations in the variables


$\hat\alpha_q \equiv \wedge^k \hat A^q$. (As an exercise, you may verify that all the operators $\hat\alpha_q$ commute.) A system of polynomial equations may be reduced to a single polynomial equation in one of the variables, say $\hat\alpha_1$. (The technique for doing this in practice, called the "Gröbner basis," is complicated and beyond the scope of this book.) ∎

The following two examples illustrate Theorem 2 in three and four dimensions.

Example 1: Suppose $V$ is a three-dimensional space ($N = 3$) and an operator $\hat A$ is given. The ordinary Cayley-Hamilton theorem is obtained from Theorem 2 with $k = 1$,
\[ q_0 - q_1 \hat A + q_2 \hat A^2 - \hat A^3 = 0, \]
where $q_j \equiv \wedge^N \hat A^{N-j}$ are the coefficients of the characteristic polynomial of $\hat A$. The generalization of the Cayley-Hamilton theorem is obtained with $k = 2$ (the only remaining case $k = 3$ will not yield interesting results).

We write the identity (4.3) for $k = 2$ and $p = 1, 2, 3$. Using the properties $\wedge^k \hat A^{k+j} = 0$ (with $j > 0$) and $\wedge^k \hat A^0 = \hat 1$, we get the following three identities of operators in $\wedge^2 V$:
\[ (\wedge^1 \hat A^1)^{\wedge T} + \wedge^2 \hat A^1 = q_2 \hat 1_{\wedge^2 V}, \]
\[ (\wedge^1 \hat A^1)^{\wedge T} (\wedge^2 \hat A^1) + \wedge^2 \hat A^2 = q_1 \hat 1_{\wedge^2 V}, \]
\[ (\wedge^1 \hat A^1)^{\wedge T} (\wedge^2 \hat A^2) = q_0 \hat 1_{\wedge^2 V}. \]
Let us denote for brevity $\hat\alpha_1 \equiv \wedge^2 \hat A^1$ and $\hat\alpha_2 \equiv \wedge^2 \hat A^2$. Expressing $(\wedge^1 \hat A^1)^{\wedge T}$ through $\hat\alpha_1$ from the first line above and substituting into the last two lines, we find
\[ \hat\alpha_2 = q_1 \hat 1 - q_2 \hat\alpha_1 + \hat\alpha_1^2, \]
\[ (q_2 \hat 1 - \hat\alpha_1)\, \hat\alpha_2 = q_0 \hat 1. \]
We can now express $\hat\alpha_2$ through $\hat\alpha_1$ and substitute into the last equation to find
\[ \hat\alpha_1^3 - 2 q_2 \hat\alpha_1^2 + (q_1 + q_2^2)\, \hat\alpha_1 - (q_1 q_2 - q_0)\, \hat 1 = 0. \]
Thus, the generalization of the Cayley-Hamilton theorem in $\wedge^2 V$ yields an identically vanishing polynomial in $\wedge^2 \hat A^1 \equiv \hat\alpha_1$ with coefficients that are expressed through $q_j$.

Question: Is this the characteristic polynomial of $\hat\alpha_1$?

Answer: I do not know! It could be since it has the correct degree. However, not every polynomial $p(x)$ such that $p(\hat\alpha) = 0$ for some operator $\hat\alpha$ is the characteristic polynomial of $\hat\alpha$.

Example 2: Let us now consider the case $N = 4$ and $k = 2$. We use Eq. (4.3) with $p = 1, 2, 3, 4$ and obtain the following four equations,
\[ (\wedge^2 \hat A^1)^{\wedge T} + \wedge^2 \hat A^1 = (\wedge^4 \hat A^1)\, \hat 1_{\wedge^2 V}, \]
\[ (\wedge^2 \hat A^2)^{\wedge T} + (\wedge^2 \hat A^1)^{\wedge T} (\wedge^2 \hat A^1) + \wedge^2 \hat A^2 = (\wedge^4 \hat A^2)\, \hat 1_{\wedge^2 V}, \]
\[ (\wedge^2 \hat A^2)^{\wedge T} (\wedge^2 \hat A^1) + (\wedge^2 \hat A^1)^{\wedge T} (\wedge^2 \hat A^2) = (\wedge^4 \hat A^3)\, \hat 1_{\wedge^2 V}, \]
\[ (\wedge^2 \hat A^2)^{\wedge T} (\wedge^2 \hat A^2) = (\wedge^4 \hat A^4)\, \hat 1_{\wedge^2 V}. \]
Let us denote, as before, $q_j = \wedge^4 \hat A^{4-j}$ (with $0 \le j \le 3$) and $\hat\alpha_r \equiv \wedge^2 \hat A^r$ (with $r = 1, 2$). Using the first two equations above, we can then express $(\wedge^2 \hat A^r)^{\wedge T}$ through $\hat\alpha_r$ and substitute into the last two equations. We obtain
\[ (\wedge^2 \hat A^1)^{\wedge T} = q_3 \hat 1 - \hat\alpha_1, \]
\[ (\wedge^2 \hat A^2)^{\wedge T} = q_2 \hat 1 + \hat\alpha_1^2 - q_3 \hat\alpha_1 - \hat\alpha_2, \]
and finally
\[ (q_2 \hat 1 + \hat\alpha_1^2 - q_3 \hat\alpha_1 - \hat\alpha_2)\, \hat\alpha_1 + (q_3 \hat 1 - \hat\alpha_1)\, \hat\alpha_2 = q_1 \hat 1, \]
\[ (q_2 \hat 1 + \hat\alpha_1^2 - q_3 \hat\alpha_1 - \hat\alpha_2)\, \hat\alpha_2 = q_0 \hat 1. \]
One cannot express $\hat\alpha_2$ directly through $\hat\alpha_1$ using these last equations. However, one can show (for instance, using a computer algebra program[2]) that there exists an identically vanishing polynomial of degree 6 in $\hat\alpha_1$, namely $p(\hat\alpha_1) = 0$ with
\[ p(x) \equiv x^6 - 3 q_3 x^5 + (2 q_2 + 3 q_3^2)\, x^4 - (4 q_2 q_3 + q_3^3)\, x^3 + (q_2^2 - 4 q_0 + q_1 q_3 + 2 q_2 q_3^2)\, x^2 - (q_1 q_3^2 + q_2^2 q_3 - 4 q_0 q_3)\, x + q_1 q_2 q_3 - q_0 q_3^2 - q_1^2. \]
The coefficients of $p(x)$ are known functions of the coefficients $q_j$ of the characteristic polynomial of $\hat A$. Note that the space $\wedge^2 V$ has dimension 6 in this example; the polynomial $p(x)$ has the same degree.

[2] This can surely be done by hand, but I have not yet learned the Gröbner basis technique necessary to do this, so I cannot show the calculation here.

Question: In both examples we found an identically vanishing polynomial in $\wedge^k \hat A^1$. Is there a general formula for the coefficients of this polynomial?

Answer: I do not know!

4.4 Functions of operators

We will now consider some calculations with operators.

Let $\hat A \in \mathrm{End}\,V$. Since linear operators can be multiplied, it is straightforward to evaluate $\hat A \hat A \equiv \hat A^2$ and other powers of $\hat A$, as well as arbitrary polynomials in $\hat A$. For example, the operator $\hat A$ can be substituted instead of $x$ into the polynomial $p(x) = 2 + 3x + 4x^2$; the result is the operator $\hat 2 + 3 \hat A + 4 \hat A^2 \equiv p(\hat A)$.

Exercise: For a linear operator $\hat A$ and an arbitrary polynomial $p(x)$, show that $p(\hat A)$ has the same eigenvectors as $\hat A$ (although perhaps with different eigenvalues). ∎

Another familiar function of $\hat A$ is the inverse operator, $\hat A^{-1}$. Clearly, we can evaluate a polynomial in $\hat A^{-1}$ as well (if $\hat A^{-1}$ exists). It is interesting to ask whether we can evaluate an arbitrary function of $\hat A$; for instance, whether we can raise $\hat A$ to a non-integer power, or compute $\exp(\hat A)$, $\ln(\hat A)$, $\cos(\hat A)$. Generally, can we substitute $\hat A$ instead of $x$ in an arbitrary function $f(x)$ and evaluate an operator-valued function $f(\hat A)$? If so, how to do this in practice?

4.4.1 Definitions. Formal power series

The answer is that sometimes we can. There are two situations when $f(\hat A)$ makes sense, i.e. can be defined and has reasonable properties.

The first situation is when $\hat A$ is diagonalizable, i.e. there exists a basis $\{\mathbf{e}_i\}$ such that every basis vector is an eigenvector of $\hat A$,
\[ \hat A \mathbf{e}_i = \lambda_i \mathbf{e}_i. \]
In this case, we simply define $f(\hat A)$ as the linear operator that acts on the basis vectors as follows,
\[ f(\hat A)\, \mathbf{e}_i \equiv f(\lambda_i)\, \mathbf{e}_i. \]
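As an illustration of this definition, the following sketch evaluates $f(\hat A)$ for a diagonalizable matrix by applying $f$ to the eigenvalues in an eigenbasis. It rests on assumptions that are not in the text: NumPy is used, the helper name `apply_function` is hypothetical, the matrix is assumed to be diagonalizable, and $f$ is assumed to be defined at every eigenvalue. The columns of `V` play the role of the eigenvectors $\mathbf{e}_i$, and the rows of its inverse play the role of the dual basis.

```python
import numpy as np

def apply_function(f, A):
    """f(A) = sum_i f(lambda_i) e_i (x) e_i^*, for a diagonalizable matrix A."""
    lam, V = np.linalg.eig(A)       # columns of V are eigenvectors e_i
    V_dual = np.linalg.inv(V)       # rows of V^{-1} are the dual covectors e_i^*
    return (V * f(lam)) @ V_dual    # column i of V is scaled by f(lambda_i)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric, hence diagonalizable

# For a polynomial, the definition must agree with direct substitution.
p = lambda x: 2 + 3 * x + 4 * x ** 2
direct = 2 * np.eye(2) + 3 * A + 4 * (A @ A)
print(np.allclose(apply_function(p, A), direct))

# A non-polynomial function: a square root of A.
root = apply_function(np.sqrt, A)
print(np.allclose(root @ root, A))
```

The polynomial check uses the same $p(x) = 2 + 3x + 4x^2$ as the text; the square root is included only to show that the same recipe covers functions that are not polynomials.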


Definition 1: Given a function $f(x)$ and a diagonalizable linear operator
\[ \hat A = \sum_{i=1}^{N} \lambda_i\, \mathbf{e}_i \otimes \mathbf{e}_i^*, \]
the function $f(\hat A)$ is the linear operator defined by
\[ f(\hat A) \equiv \sum_{i=1}^{N} f(\lambda_i)\, \mathbf{e}_i \otimes \mathbf{e}_i^*, \]
provided that $f(x)$ is well-defined at the points $x = \lambda_i$, $i = 1, ..., N$.

This definition might appear to be "cheating" since we simply substituted the eigenvalues into $f(x)$, rather than evaluate the operator $f(\hat A)$ in some "natural" way. However, the result is reasonable since we, in effect, define $f(\hat A)$ separately in each eigenspace $\mathrm{Span}\,\{\mathbf{e}_i\}$ where $\hat A$ acts as multiplication by $\lambda_i$. It is natural to define $f(\hat A)$ in each eigenspace as multiplication by $f(\lambda_i)$.

The second situation is when $f(x)$ is an analytic function, that is, a function represented by a power series
\[ f(x) = \sum_{n=0}^{\infty} c_n x^n, \]
such that the series converges to the value $f(x)$ for some $x$. Further, we need this series to converge for a sufficiently wide range of values of $x$ such that all eigenvalues of $\hat A$ are within that range. Then one can show that the operator-valued series
\[ f(\hat A) = \sum_{n=0}^{\infty} c_n (\hat A)^n \]
converges. The technical details of this proof are beyond the scope of this book; one needs to define the limit of a sequence of operators and other notions studied in functional analysis. Here is a simple argument that gives a condition for convergence. Suppose that the operator $\hat A$ is diagonalizable and has eigenvalues $\lambda_i$ and the corresponding eigenvectors $\mathbf{v}_i$ ($i = 1, ..., N$) such that $\{\mathbf{v}_i\}$ is a basis and $\hat A$ has a tensor representation
\[ \hat A = \sum_{i=1}^{N} \lambda_i\, \mathbf{v}_i \otimes \mathbf{v}_i^*. \]
Note that
\[ \hat A^n = \left[ \sum_{i=1}^{N} \lambda_i\, \mathbf{v}_i \otimes \mathbf{v}_i^* \right]^n = \sum_{i=1}^{N} \lambda_i^n\, \mathbf{v}_i \otimes \mathbf{v}_i^* \]
due to the property of the dual basis, $\mathbf{v}_i^*(\mathbf{v}_j) = \delta_{ij}$. So if the series $\sum_{n=0}^{\infty} c_n x^n$ converges for every eigenvalue $x = \lambda_i$ of the operator $\hat A$ then the tensor-valued series also converges and yields a new tensor
\[ \sum_{n=0}^{\infty} c_n (\hat A)^n = \sum_{n=0}^{\infty} c_n \sum_{i=1}^{N} \lambda_i^n\, \mathbf{v}_i \otimes \mathbf{v}_i^* = \sum_{i=1}^{N} \left[ \sum_{n=0}^{\infty} c_n \lambda_i^n \right] \mathbf{v}_i \otimes \mathbf{v}_i^*. \]
This argument indicates at least one case where the operator-valued power series surely converges.

Instead of performing an in-depth study of operator-valued power series, I will restrict myself to considering "formal power series" containing a parameter $t$, that is, infinite power series in $t$ considered without regard for convergence. Let us discuss this idea in more detail.

By definition, a formal power series (FPS) is an infinite sequence of numbers $(c_0, c_1, c_2, ...)$. This sequence, however, is written as if it were a power series in a parameter $t$,
\[ c_0 + c_1 t + c_2 t^2 + ... = \sum_{n=0}^{\infty} c_n t^n. \]
It appears that we need to calculate the sum of the above series. However, while we manipulate an FPS, we do not assign any value to $t$ and thus do not have to consider the issue of convergence of the resulting infinite series. Hence, we work with an FPS as with an algebraic expression containing a variable $t$, an expression that we do not evaluate (although we may simplify it). These expressions can be manipulated term by term, so that, for example, the sum and the product of two FPS are always defined; the result is another FPS. Thus, the notation for FPS should be understood as a convenient shorthand that simplifies working with FPS, rather than an actual sum of an infinite series. At the same time, the notation for FPS makes it easy to evaluate the actual infinite series when the need arises. Therefore, any results obtained using FPS will hold whenever the series converges.

Now I will use the formal power series to define $f(t\hat A)$.

Definition 2: Given an analytic function $f(x)$ shown above and a linear operator $\hat A$, the function $f(t\hat A)$ denotes the operator-valued formal power series
\[ f(t\hat A) \equiv \sum_{n=0}^{\infty} c_n (\hat A)^n t^n. \]
(According to the definition of formal power series, the variable $t$ is a parameter that does not have a value and serves only to label the terms of the series.)

One can define the derivative of a formal power series, without using the notion of a limit (and without discussing convergence).

Definition 3: The derivative $\partial_t$ of a formal power series $\sum_k a_k t^k$ is another formal power series defined by
\[ \partial_t \left( \sum_{k=0}^{\infty} a_k t^k \right) \equiv \sum_{k=0}^{\infty} (k+1)\, a_{k+1} t^k. \]
This definition gives us the usual properties of the derivative. For instance, it is obvious that $\partial_t$ is a linear operator in the space of formal power series. Further, we have the important distributive property:

Statement 1: The Leibniz rule,
\[ \partial_t [f(t) g(t)] = [\partial_t f(t)]\, g(t) + f(t)\, [\partial_t g(t)], \]
holds for formal power series.

Proof: Since $\partial_t$ is a linear operation, it is sufficient to check that the Leibniz rule holds for single terms, $f(t) = t^a$ and $g(t) = t^b$. Details left as exercise. ∎

This definition of $f(t\hat A)$ has reasonable and expected properties, such as:


Exercise: For an analytic function $f(x)$, show that
\[ f(\hat A) \hat A = \hat A f(\hat A) \]
and that
\[ \frac{d}{dt} f(t\hat A) = \hat A f'(t\hat A). \]
Here both sides are interpreted as formal power series. Deduce that $f(\hat A) g(\hat A) = g(\hat A) f(\hat A)$ for any two analytic functions $f(x)$ and $g(x)$.

Hint: Linear operations with formal power series must be performed term by term (by definition). So it is sufficient to consider a single term in $f(x)$, such as $f(x) = x^a$. ∎

Now we can show that the two definitions of the operator-valued function $f(\hat A)$ agree when both are applicable.

Statement 2: If $f(x)$ is an analytic function and $\hat A$ is a diagonalizable operator then the two definitions agree, i.e. for $f(x) = \sum_{n=0}^{\infty} c_n x^n$ and $\hat A = \sum_{i=1}^{N} \lambda_i\, \mathbf{e}_i \otimes \mathbf{e}_i^*$ we have the equality of formal power series,
\[ \sum_{n=0}^{\infty} c_n (t\hat A)^n = \sum_{i=1}^{N} f(t\lambda_i)\, \mathbf{e}_i \otimes \mathbf{e}_i^*. \qquad (4.4) \]

Proof: It is sufficient to prove that the terms multiplying $t^n$ coincide for each $n$. We note that the square of $\hat A$ is
\[ \left( \sum_{i=1}^{N} \lambda_i\, \mathbf{e}_i \otimes \mathbf{e}_i^* \right)^2 = \left( \sum_{i=1}^{N} \lambda_i\, \mathbf{e}_i \otimes \mathbf{e}_i^* \right) \left( \sum_{j=1}^{N} \lambda_j\, \mathbf{e}_j \otimes \mathbf{e}_j^* \right) = \sum_{i=1}^{N} \lambda_i^2\, \mathbf{e}_i \otimes \mathbf{e}_i^* \]
because $\mathbf{e}_i^*(\mathbf{e}_j) = \delta_{ij}$. In this way we can compute any power of $\hat A$. Therefore, the term in the left side of Eq. (4.4) is
\[ c_n t^n (\hat A)^n = c_n t^n \left( \sum_{i=1}^{N} \lambda_i\, \mathbf{e}_i \otimes \mathbf{e}_i^* \right)^n = c_n t^n \sum_{i=1}^{N} \lambda_i^n\, \mathbf{e}_i \otimes \mathbf{e}_i^*, \]
which coincides with the term at $t^n$ in the right side. ∎

4.4.2 Computations: Sylvester's method

Now that we know when an operator-valued function $f(\hat A)$ is defined, how can we actually compute the operator $f(\hat A)$? The first definition requires us to diagonalize $\hat A$ (this is already a lot of work since we need to determine every eigenvector). Moreover, Definition 1 does not apply when $\hat A$ is non-diagonalizable. On the other hand, Definition 2 requires us to evaluate infinitely many terms of a power series. Is there a simpler way?

There is a situation when $f(\hat A)$ can be computed without such effort. Let us first consider a simple example where the operator $\hat A$ happens to be a projector, $(\hat A)^2 = \hat A$. In this case, any power of $\hat A$ is again equal to $\hat A$. It is then easy to compute a power series in $\hat A$:
\[ \sum_{n=0}^{\infty} c_n (\hat A)^n = c_0 \hat 1 + \left( \sum_{n=1}^{\infty} c_n \right) \hat A. \]
In this way we can compute any analytic function of $\hat A$ (as long as the series $\sum_{n=1}^{\infty} c_n$ converges). For example,
\[ \cos \hat A = \hat 1 - \frac{1}{2!} (\hat A)^2 + \frac{1}{4!} (\hat A)^4 - ... = \hat 1 - \frac{1}{2!} \hat A + \frac{1}{4!} \hat A - ... \]
\[ = \left( 1 - \frac{1}{2!} + \frac{1}{4!} - ... \right) \hat A + \hat 1 - \hat A = [(\cos 1) - 1]\, \hat A + \hat 1. \]

Remark: In the above computation, we obtained a formula that expresses the end result through $\hat A$. We have that formula even though we do not know an explicit form of the operator $\hat A$: not even the dimension of the space where $\hat A$ acts or whether $\hat A$ is diagonalizable. We do not need to know any eigenvectors of $\hat A$. We only use the given fact that $\hat A^2 = \hat A$, and we are still able to find a useful result. If such an operator $\hat A$ is given explicitly, we can substitute it into the formula
\[ \cos \hat A = [(\cos 1) - 1]\, \hat A + \hat 1 \]
to obtain an explicit expression for $\cos \hat A$. Note also that the result is a formula linear in $\hat A$.

Exercise 1: a) Given that $(\hat P)^2 = \hat P$, express $(\lambda \hat 1 - \hat P)^{-1}$ and $\exp \hat P$ through $\hat P$. Assume that $|\lambda| > 1$ so that the Taylor series for $f(x) = (\lambda - x)^{-1}$ converges for $x = 1$.

b) It is known only that $(\hat A)^2 = \hat A + 2$. Determine the possible eigenvalues of $\hat A$. Show that any analytic function of $\hat A$ can be reduced to the form $\alpha \hat 1 + \beta \hat A$ with some suitable coefficients $\alpha$ and $\beta$. Express $(\hat A)^3$, $(\hat A)^4$, and $\hat A^{-1}$ as linear functions of $\hat A$.

Hint: Write $\hat A^{-1} = \alpha \hat 1 + \beta \hat A$ with unknown $\alpha$, $\beta$. Write $\hat A \hat A^{-1} = \hat 1$ and simplify to determine $\alpha$ and $\beta$.

Exercise 2: The operator $\hat A$ is such that $\hat A^3 + \hat A = 0$. Compute $\exp(\lambda \hat A)$ as a quadratic polynomial of $\hat A$ (here $\lambda$ is a fixed number). ∎

Let us now consider a more general situation. Suppose we know the characteristic polynomial $Q_A(\lambda)$ of $\hat A$. The characteristic polynomial has the form
\[ Q_A(\lambda) = (-\lambda)^N + \sum_{k=0}^{N-1} (-1)^k q_{N-k} \lambda^k, \]
where $q_i$ ($i = 1, ..., N$) are known coefficients. The Cayley-Hamilton theorem indicates that $\hat A$ satisfies the polynomial identity,
\[ (\hat A)^N = - \sum_{k=0}^{N-1} q_{N-k} (-1)^{N-k} (\hat A)^k. \]
It follows that any power of $\hat A$ larger than $N-1$ can be expressed as a linear combination of smaller powers of $\hat A$. Therefore, a power series in $\hat A$ can be reduced to a polynomial $p(\hat A)$ of degree not larger than $N-1$. The task of computing an arbitrary function $f(\hat A)$ is then reduced to the task of determining the $N$ coefficients of $p(x) \equiv p_0 + ... + p_{N-1} x^{N-1}$. Once the coefficients of that polynomial are found, the function can be evaluated as $f(\hat A) = p(\hat A)$ for any operator $\hat A$ that has the given characteristic polynomial.

Determining the coefficients of the polynomial $p(\hat A)$ might appear to be difficult because one can get rather complicated formulas when one converts an arbitrary power of $\hat A$ to smaller


powers. This work can be avoided if the eigenvalues of $\hat A$ are known, by using the method of Sylvester, which I will now explain.

The present task is to calculate $f(\hat A)$ (equivalently, the polynomial $p(\hat A)$) when the characteristic polynomial $Q_A(\lambda)$ is known. The characteristic polynomial has order $N$ and hence has $N$ (complex) roots, counting each root with its multiplicity. The eigenvalues $\lambda_i$ of the operator $\hat A$ are roots of its characteristic polynomial, and there exists at least one eigenvector $\mathbf{v}_i$ for each $\lambda_i$ (Theorem 1 in Sec. 3.9). Knowing the characteristic polynomial $Q_A(\lambda)$, we may determine its roots $\lambda_i$.

Let us first assume that the roots $\lambda_i$ ($i = 1, ..., N$) are all different. Then we have $N$ different eigenvectors $\mathbf{v}_i$. The set $\{\mathbf{v}_i \,|\, i = 1, ..., N\}$ is linearly independent (Statement 1 in Sec. 3.6.1) and hence is a basis in $V$; that is, $\hat A$ is diagonalizable. We will not actually need to determine the eigenvectors $\mathbf{v}_i$; it will be sufficient that they exist. Let us now apply the function $f(\hat A)$ to each of these $N$ eigenvectors: we must have
\[ f(\hat A)\, \mathbf{v}_i = f(\lambda_i)\, \mathbf{v}_i. \]
On the other hand, we may express
\[ f(\hat A)\, \mathbf{v}_i = p(\hat A)\, \mathbf{v}_i = p(\lambda_i)\, \mathbf{v}_i. \]
Since the set $\{\mathbf{v}_i\}$ is linearly independent, the vanishing linear combination
\[ \sum_{i=1}^{N} [f(\lambda_i) - p(\lambda_i)]\, \mathbf{v}_i = 0 \]
must have all vanishing coefficients; hence we obtain a system of $N$ equations for $N$ unknowns $\{p_0, ..., p_{N-1}\}$:
\[ p_0 + p_1 \lambda_i + ... + p_{N-1} \lambda_i^{N-1} = f(\lambda_i), \qquad i = 1, ..., N. \]
Note that this system of equations has the Vandermonde matrix (Sec. 3.6). Since by assumption all $\lambda_i$'s are different, the determinant of this matrix is nonzero, therefore the solution $\{p_0, ..., p_{N-1}\}$ exists and is unique. The polynomial $p(x)$ is the interpolating polynomial for $f(x)$ at the points $x = \lambda_i$ ($i = 1, ..., N$).

We have proved the following theorem:

Theorem 1: If the roots $\{\lambda_1, ..., \lambda_N\}$ of the characteristic polynomial of $\hat A$ are all different, a function of $\hat A$ can be computed as $f(\hat A) = p(\hat A)$, where $p(x)$ is the interpolating polynomial for $f(x)$ at the $N$ points $\{\lambda_1, ..., \lambda_N\}$.

Exercise 3: It is given that the operator $\hat A$ has the characteristic polynomial $Q_A(\lambda) = \lambda^2 - \lambda + 6$. Determine the eigenvalues of $\hat A$ and calculate $\exp(\hat A)$ as a linear expression in $\hat A$.

If we know that an operator $\hat A$ satisfies a certain operator equation, say $(\hat A)^2 - \hat A + 6 = 0$, then it is not necessary to know the characteristic polynomial in order to compute functions $f(\hat A)$. It can be that the characteristic polynomial has a high order due to many repeated eigenvalues; however, as far as analytic functions are concerned, all that matters is the possibility to reduce high powers of $\hat A$ to low powers. This possibility can be provided by a polynomial of a lower degree than the characteristic polynomial.

In the following theorem, we will determine $f(\hat A)$ knowing only some polynomial $Q(x)$ for which $Q(\hat A) = 0$.

Theorem 2: Suppose that a linear operator $\hat A$ and a polynomial $Q(x)$ are such that $Q(\hat A) = 0$, and assume that the equation $Q(\lambda) = 0$ has all distinct roots $\lambda_i$ ($i = 1, ..., n$), where $n$ is not necessarily equal to the dimension $N$ of the vector space. Then an analytic function $f(\hat A)$ can be computed as
\[ f(\hat A) = p(\hat A), \]
where $p(x)$ is the interpolating polynomial for the function $f(x)$ at the points $x = \lambda_i$ ($i = 1, ..., n$).

Proof: The polynomial $p(x)$ is defined uniquely by substituting $x^k$ with $k \ge n$ through lower powers of $x$ in the series for $f(x)$, using the equation $Q(x) = 0$. Consider the operator $\hat A_1$ that acts as multiplication by $\lambda_1$. This operator satisfies $Q(\hat A_1) = 0$, and so $f(\hat A_1)$ is simplified to the same polynomial $p(\hat A_1)$. Hence we must have $f(\hat A_1) = p(\hat A_1)$. However, $f(\hat A_1)$ is simply the operator of multiplication by $f(\lambda_1)$. Hence, $p(x)$ must be equal to $f(x)$ when evaluated at $x = \lambda_1$. Similarly, we find that $p(\lambda_i) = f(\lambda_i)$ for $i = 1, ..., n$. The interpolating polynomial for $f(x)$ at the points $x = \lambda_i$ ($i = 1, ..., n$) is unique and has degree $n - 1$. Therefore, this polynomial must be equal to $p(x)$. ∎

It remains to develop a procedure for the case when not all roots $\lambda_i$ of the polynomial $Q(\lambda)$ are different. To be specific, let us assume that $\lambda_1 = \lambda_2$ and that all other eigenvalues are different. In this case we will first solve an auxiliary problem where $\lambda_2 = \lambda_1 + \varepsilon$ and then take the limit $\varepsilon \to 0$. The equations determining the coefficients of the polynomial $p(x)$ are
\[ p(\lambda_1) = f(\lambda_1), \quad p(\lambda_1 + \varepsilon) = f(\lambda_1 + \varepsilon), \quad p(\lambda_3) = f(\lambda_3), \ ... \]
Subtracting the first equation from the second and dividing by $\varepsilon$, we find
\[ \frac{p(\lambda_1 + \varepsilon) - p(\lambda_1)}{\varepsilon} = \frac{f(\lambda_1 + \varepsilon) - f(\lambda_1)}{\varepsilon}. \]
In the limit $\varepsilon \to 0$ this becomes
\[ p'(\lambda_1) = f'(\lambda_1). \]
Therefore, the polynomial $p(x)$ is determined by the requirements that
\[ p(\lambda_1) = f(\lambda_1), \quad p'(\lambda_1) = f'(\lambda_1), \quad p(\lambda_3) = f(\lambda_3), \ ... \]
If three roots coincide, say $\lambda_1 = \lambda_2 = \lambda_3$, we introduce two auxiliary parameters $\varepsilon_2$ and $\varepsilon_3$ and first obtain the three equations
\[ p(\lambda_1) = f(\lambda_1), \quad p(\lambda_1 + \varepsilon_2) = f(\lambda_1 + \varepsilon_2), \quad p(\lambda_1 + \varepsilon_2 + \varepsilon_3) = f(\lambda_1 + \varepsilon_2 + \varepsilon_3). \]
Subtracting the equations and taking the limit $\varepsilon_2 \to 0$ as before, we find
\[ p(\lambda_1) = f(\lambda_1), \quad p'(\lambda_1) = f'(\lambda_1), \quad p'(\lambda_1 + \varepsilon_3) = f'(\lambda_1 + \varepsilon_3). \]
Subtracting now the second equation from the third and taking the limit $\varepsilon_3 \to 0$, we find $p''(\lambda_1) = f''(\lambda_1)$. Thus we have proved the following.

Theorem 3: If a linear operator $\hat A$ satisfies a polynomial operator equation $Q(\hat A) = 0$, such that the equation $Q(\lambda) = 0$ has roots $\lambda_i$ ($i = 1, ..., n$) with multiplicities $m_i$,
\[ Q(\lambda) = \mathrm{const} \cdot (\lambda - \lambda_1)^{m_1} ... (\lambda - \lambda_n)^{m_n}, \]


an analytic function $f(\hat A)$ can be computed as
\[ f(\hat A) = p(\hat A), \]
where $p(x)$ is the polynomial determined by the conditions
\[ p(\lambda_i) = f(\lambda_i), \quad p'(\lambda_i) = f'(\lambda_i), \ ..., \]
\[ \left. \frac{d^{m_i - 1} p(x)}{dx^{m_i - 1}} \right|_{x = \lambda_i} = \left. \frac{d^{m_i - 1} f(x)}{dx^{m_i - 1}} \right|_{x = \lambda_i}, \qquad i = 1, ..., n. \]

Theorems 1 to 3, which comprise Sylvester's method, allow us to compute functions of an operator when only the eigenvalues are known, without determining any eigenvectors and without assuming that the operator is diagonalizable.

4.4.3 * Square roots of operators

In the previous section we have seen that functions of operators can be sometimes computed explicitly. However, our methods work either for diagonalizable operators $\hat A$ or for functions $f(x)$ given by a power series that converges for every eigenvalue of the operator $\hat A$. If these conditions are not met, functions of operators may not exist or may not be uniquely defined. As an example where these problems arise, we will briefly consider the task of computing the square root of a given operator.

Given an operator $\hat A$ we would like to define its square root as an operator $\hat B$ such that $\hat B^2 = \hat A$. For a diagonalizable operator $\hat A = \sum_{i=1}^{N} \lambda_i\, \mathbf{e}_i \otimes \mathbf{e}_i^*$ (where $\{\mathbf{e}_i\}$ is an eigenbasis and $\{\mathbf{e}_i^*\}$ is the dual basis) we can easily find a suitable $\hat B$ by writing
\[ \hat B \equiv \sum_{i=1}^{N} \sqrt{\lambda_i}\, \mathbf{e}_i \otimes \mathbf{e}_i^*. \]
Note that the numeric square root $\sqrt{\lambda_i}$ has an ambiguous sign; so with each possible choice of sign for each $\sqrt{\lambda_i}$, we obtain a possible choice of $\hat B$. (Depending on the problem at hand, there might be a natural way of fixing the signs; for instance, if all $\lambda_i$ are positive then it might be useful to choose also all $\sqrt{\lambda_i}$ as positive.) The ambiguity of signs is expected; what is unexpected is that there could be many other operators $\hat B$ satisfying $\hat B^2 = \hat A$, as the following example shows.

Example 1: Let us compute the square root of the identity operator in a two-dimensional space. We look for $\hat B$ such that $\hat B^2 = \hat 1$. Straightforward solutions are $\hat B = \pm \hat 1$. However, consider the following operator,
\[ \hat B \equiv \begin{pmatrix} a & b \\ c & -a \end{pmatrix}, \qquad \hat B^2 = \begin{pmatrix} a^2 + bc & 0 \\ 0 & a^2 + bc \end{pmatrix} = (a^2 + bc)\, \hat 1. \]
This $\hat B$ satisfies $\hat B^2 = \hat 1$ for any $a, b, c \in \mathbb{C}$ as long as $a^2 + bc = 1$. The square root is quite ambiguous for the identity operator! ∎

Taking the trace of this equation, we can express the determinant as
\[ \det \hat B = \frac{1}{2} (\mathrm{Tr}\,\hat B)^2 - \frac{1}{2} \mathrm{Tr}\,(\hat B^2) \]
and hence
\[ b \hat B = \hat A + \frac{b^2 - a}{2}\, \hat 1. \qquad (4.5) \]
This equation will yield an explicit formula for $\hat B$ through $\hat A$ if we only determine the value of the constant $b$ such that $b \ne 0$. Squaring the above equation and taking the trace, we find
\[ b^4 - 2 b^2 a + c = 0, \qquad c \equiv 2\, \mathrm{Tr}\,(\hat A^2) - a^2 = a^2 - 4 \det \hat A. \]
Hence, we obtain up to four possible solutions for $b$,
\[ b = \pm \sqrt{a \pm \sqrt{a^2 - c}} = \pm \sqrt{\mathrm{Tr}\,\hat A \pm 2 \sqrt{\det \hat A}}. \qquad (4.6) \]
Each value of $b$ such that $b \ne 0$ yields possible operators $\hat B$ through Eq. (4.5). Denoting by $s_1 = \pm 1$ and $s_2 = \pm 1$ the two free choices of signs in Eq. (4.6), we may write the general solution (assuming $b \ne 0$) as
\[ \hat B = s_1 \frac{\hat A + s_2 \sqrt{\det \hat A}\, \hat 1}{\sqrt{\mathrm{Tr}\,\hat A + 2 s_2 \sqrt{\det \hat A}}}. \qquad (4.7) \]
It is straightforward to verify (using the Cayley-Hamilton theorem for $\hat A$) that every such $\hat B$ indeed satisfies $\hat B^2 = \hat A$.

Note also that $\hat B$ is expressed as a linear polynomial in $\hat A$. Due to the Cayley-Hamilton theorem, any analytic function of $\hat A$ reduces to a linear polynomial in the two-dimensional case. Hence, we can view Eq. (4.7) as a formula yielding the analytic solutions of the equation $\hat B^2 = \hat A$.

If $b = 0$ is a solution of Eq. (4.6) then we must consider the possibility that solutions $\hat B$ with $b \equiv \mathrm{Tr}\,\hat B = 0$ may exist. In that case, Eq. (4.5) indicates that $\hat A$ plus a multiple of $\hat 1$ must be equal to the zero operator. Note that Eq. (4.5) is a necessary consequence of $\hat B^2 = \hat A$, obtained only by assuming that $\hat B$ exists. Hence, when $\hat A$ is not proportional to the identity operator, no solutions $\hat B$ with $\mathrm{Tr}\,\hat B = 0$ can exist. On the other hand, if $\hat A$ is proportional to $\hat 1$, solutions with $\mathrm{Tr}\,\hat B = 0$ exist but the present method does not yield these solutions. (Note that this method can only yield solutions $\hat B$ that are linear combinations of the operator $\hat A$ and the identity operator!) It is easy to see that the operators from Example 1 fall into this category, with $\mathrm{Tr}\,\hat B = 0$. There are no other solutions except those shown in Example 1 because in that example we have obtained all possible traceless solutions.
We will now perform a simple analysis of square roots of op- Another interesting example is found when A is a nilpotent
erators in two- and three-dimensional spaces using the Cayley- (but nonzero).
Hamilton theorem.  
Let us assume that B 2 = A, where A is a given operator, and 0 1
Example 2: Consider a nilpotent operator A1 = . In
denote for brevity a TrA and b TrB (where a is given but 0 0
b is still unknown). In two dimensions, any operator B satisfies that case, both the trace and the determinant of A1 are equal
the characteristic equation to zero; it follows that b = 0 is the only solution of Eq. (4.6).
However, A1 is not proportional to the identity operator. Hence,
B 2 (TrB)B + (det B)1 = 0. a square root of A1 does not exist.
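The two-dimensional procedure can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names `sqrt_2x2` and `matmul2`, the tolerance `tol`, and the sample matrices are hypothetical choices and do not come from the text. It evaluates Eq. (4.6) for the candidate values of $b$ and, for each nonzero $b$, builds $\hat B$ from Eq. (4.7).

```python
import cmath


def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def sqrt_2x2(A, tol=1e-12):
    """Square roots of a 2x2 matrix A obtained from Eqs. (4.6)-(4.7).

    Only roots that are linear combinations of A and the identity are
    produced. An empty list means that b = 0 is the only solution of
    Eq. (4.6), so this method yields nothing.
    """
    (a11, a12), (a21, a22) = A
    trace = a11 + a22
    root_det = cmath.sqrt(a11 * a22 - a12 * a21)
    # If det A = 0 the two signs s2 give the same b, so keep only one.
    s2_choices = (1, -1) if abs(root_det) > tol else (1,)
    roots = []
    for s2 in s2_choices:
        b_squared = trace + 2 * s2 * root_det  # Eq. (4.6)
        if abs(b_squared) < tol:
            continue  # b = 0: Eq. (4.7) is not applicable
        for s1 in (1, -1):
            b = s1 * cmath.sqrt(b_squared)
            d = s2 * root_det  # the value of det B for this branch
            roots.append((((a11 + d) / b, a12 / b),
                          (a21 / b, (a22 + d) / b)))
    return roots


sample = ((1, 3), (0, 4))        # hypothetical sample matrix
nilpotent = ((0, 1), (0, 0))     # the matrix of Example 2
sample_roots = sqrt_2x2(sample)
nilpotent_roots = sqrt_2x2(nilpotent)
```

For the sample matrix the sketch produces four matrices whose squares reproduce the input up to rounding, while for the nilpotent matrix of Example 2 the returned list is empty, in agreement with the discussion above. For a multiple of the identity it returns only $\pm$ the scalar root and misses the traceless solutions of Example 1, as the text explains.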

4 Advanced applications
Remark: This problem with the nonexistence of the square root is not the same as the nonexistence of $\sqrt{-1}$ within real numbers; the square root of $\hat A_1$ does not exist even if we allow complex numbers! The reason is that the existence of $\sqrt{\hat A_1}$ would be algebraically inconsistent (because it would contradict the Cayley-Hamilton theorem). ■

Let us summarize our results so far. In two dimensions, the general calculation of a square root of a given operator $\hat A$ proceeds as follows: If $\hat A$ is proportional to the identity operator, we have various solutions of the form shown in Example 1. (Not every one of these solutions may be relevant for the problem at hand, but they exist.) If $\hat A$ is not proportional to the identity operator, we solve Eq. (4.6) and obtain up to four possible values of $b$. If the only solution is $b = 0$, the square root of $\hat A$ does not exist. Otherwise, every nonzero value of $b$ yields a solution $\hat B$ according to Eq. (4.5), and there are no other solutions.

Example 3: We would like to determine a square root of the operator
$$\hat A = \begin{pmatrix} 1 & 3 \\ 0 & 4 \end{pmatrix} .$$
We compute $\det\hat A = 4$ and $a = \mathrm{Tr}\hat A = 5$. Hence Eq. (4.6) gives four nonzero values,
$$b = \pm\sqrt{5 \pm 4} = \{\pm 1, \pm 3\} .$$
Substituting these values of $b$ into Eq. (4.5) and solving for $\hat B$, we compute the four possible square roots
$$\hat B = \pm\begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad \hat B = \pm\begin{pmatrix} -1 & 3 \\ 0 & 2 \end{pmatrix} .$$
Since $b = 0$ is not a solution, while $\hat A \neq \lambda\hat 1$, there are no other square roots.

Exercise 1: Consider a diagonalizable operator represented in a certain basis by the matrix
$$\hat A = \begin{pmatrix} \lambda^2 & 0 \\ 0 & \mu^2 \end{pmatrix},$$
where $\lambda$ and $\mu$ are any complex numbers, possibly zero, such that $\lambda^2 \neq \mu^2$. Use Eqs. (4.5)-(4.6) to show that the possible square roots are
$$\hat B = \begin{pmatrix} \pm\lambda & 0 \\ 0 & \pm\mu \end{pmatrix},$$
and that there are no other square roots. ■

Exercise 2: Obtain all possible square roots of the zero operator in two dimensions. ■

Let us now consider a given operator $\hat A$ in a three-dimensional space and assume that there exists $\hat B$ such that $\hat B^2 = \hat A$. We will be looking for a formula expressing $\hat B$ as a polynomial in $\hat A$. As we have seen, this will certainly not give every possible solution $\hat B$, but we do expect to get the interesting solutions that can be expressed as analytic functions of $\hat A$.

As before, we denote $a \equiv \mathrm{Tr}\hat A$ and $b \equiv \mathrm{Tr}\hat B$. The Cayley-Hamilton theorem for $\hat B$ together with Exercise 1 in Sec. 3.9 (page 61) yields a simplified equation,
$$0 = \hat B^3 - b\hat B^2 + s\hat B - (\det\hat B)\hat 1 = (\hat A + s\hat 1)\hat B - b\hat A - (\det\hat B)\hat 1, \qquad s \equiv \frac{b^2 - a}{2} . \tag{4.8}$$
Note that $\det\hat B = \pm\sqrt{\det\hat A}$ and hence can be considered known. Moving $\hat B$ to another side in Eq. (4.8) and squaring the resulting equation, we find
$$(\hat A^2 + 2s\hat A + s^2\hat 1)\hat A = (b\hat A + (\det\hat B)\hat 1)^2 .$$
Expanding the brackets and using the Cayley-Hamilton theorem for $\hat A$ in the form
$$\hat A^3 - a\hat A^2 + p\hat A - (\det\hat A)\hat 1 = 0,$$
where the coefficient $p$ can be expressed as
$$p = \frac{1}{2}\left(a^2 - \mathrm{Tr}(\hat A^2)\right),$$
we obtain after simplifications
$$(s^2 - p - 2b\det\hat B)\hat A = 0 .$$
This yields a fourth-order polynomial equation for $b$,
$$\left(\frac{b^2 - a}{2}\right)^2 - p - 2b\det\hat B = 0 .$$
This equation can be solved, in principle. Since $\det\hat B$ has up to two possible values, $\det\hat B = \pm\sqrt{\det\hat A}$, we can then determine up to eight possible values of $b$ (and the corresponding values of $s$).

Now we use a trick to express $\hat B$ as a function of $\hat A$. We rewrite Eq. (4.8) as
$$\hat A\hat B = -s\hat B + b\hat A + (\det\hat B)\hat 1$$
and multiply both sides by $\hat B$, substituting $\hat A\hat B$ back into the equation,
$$\hat A^2 + s\hat A = b\hat A\hat B + (\det\hat B)\hat B = b\left[-s\hat B + b\hat A + (\det\hat B)\hat 1\right] + (\det\hat B)\hat B .$$
The last line yields
$$\hat B = \frac{1}{(\det\hat B) - sb}\left[\hat A^2 + (s - b^2)\hat A - b(\det\hat B)\hat 1\right] .$$
This is the final result, provided that the denominator $(\det\hat B - sb)$ does not vanish. In case this denominator vanishes, the present method cannot yield a formula for $\hat B$ in terms of $\hat A$.

Exercise 3:* Verify that the square root of a diagonalizable operator,
$$\hat A = \begin{pmatrix} p^2 & 0 & 0 \\ 0 & q^2 & 0 \\ 0 & 0 & r^2 \end{pmatrix},$$
where $p^2, q^2, r^2 \in \mathbb{C}$ are all different, can be determined using this approach, which yields the eight possibilities
$$\hat B = \begin{pmatrix} \pm p & 0 & 0 \\ 0 & \pm q & 0 \\ 0 & 0 & \pm r \end{pmatrix} .$$
Hint: Rather than trying to solve the fourth-order equation for $b$ directly (a cumbersome task), one can just verify, by substituting into the equation, that the eight values $b = \pm p \pm q \pm r$ (with all the possible choices of signs) are roots of that equation.
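The three-dimensional formula for $\hat B$ can be checked with exact rational arithmetic on the diagonal operator of Exercise 3. The sketch below is an illustration only: the sample values $p, q, r = 1, 2, 3$ and the helper names `matmul` and `sqrt_from_trace` are assumptions of this example, not part of the text. For every choice of signs it takes $b = \pm p \pm q \pm r$ and the matching value of $\det\hat B$, and evaluates the final formula for $\hat B$.

```python
from fractions import Fraction
from itertools import product


def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def sqrt_from_trace(A, b, det_B):
    """Evaluate B = [A^2 + (s - b^2) A - b det(B) 1] / (det(B) - s b).

    Here s = (b^2 - Tr A) / 2 as in Eq. (4.8). The values b and det_B
    must be supplied by the caller.
    """
    a = sum(A[i][i] for i in range(3))
    s = Fraction(b * b - a, 2)
    denom = det_B - s * b
    if denom == 0:
        raise ZeroDivisionError("the denominator det B - s b vanishes")
    A2 = matmul(A, A)
    return [[(A2[i][j] + (s - b * b) * A[i][j]
              - (b * det_B if i == j else 0)) / denom
             for j in range(3)] for i in range(3)]


p, q, r = 1, 2, 3  # sample values with p^2, q^2, r^2 all different
A = [[p * p, 0, 0], [0, q * q, 0], [0, 0, r * r]]
roots = []
for sp, sq, sr in product((1, -1), repeat=3):
    b = sp * p + sq * q + sr * r           # candidate trace of B
    det_B = sp * sq * sr * p * q * r       # matching determinant of B
    roots.append(sqrt_from_trace(A, b, det_B))
```

Each of the eight resulting matrices is diagonal with entries $\pm 1, \pm 2, \pm 3$ and squares exactly to `A`. Exact fractions are used so that the comparison does not depend on rounding.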


Exercise 4:*³ It is given that a three-dimensional operator $\hat A$ satisfies
$$\mathrm{Tr}(\hat A^2) = \frac{1}{2}(\mathrm{Tr}\hat A)^2, \qquad \det\hat A \neq 0 .$$
Show that there exists $\hat B$, unique up to a sign, such that $\mathrm{Tr}\hat B = 0$ and $\hat B^2 = \hat A$.
Answer:
$$\hat B = \pm\frac{1}{\sqrt{\det\hat A}}\left[\hat A^2 - \frac{1}{2}(\mathrm{Tr}\hat A)\hat A\right] .$$

³ This is motivated by the article by R. Capovilla, J. Dell, and T. Jacobson, Classical and Quantum Gravity 8 (1991), pp. 59-73; see p. 63 in that article.

4.5 Formulas of Jacobi and Liouville

Definition: The Liouville formula is the identity
$$\det(\exp\hat A) = \exp(\mathrm{Tr}\hat A), \tag{4.9}$$
where $\hat A$ is a linear operator and $\exp\hat A$ is defined by the power series,
$$\exp\hat A \equiv \sum_{n=0}^{\infty} \frac{1}{n!}(\hat A)^n .$$

Example: Consider a diagonalizable operator $\hat A$ (an operator such that there exists an eigenbasis $\{\mathbf{e}_i \mid i = 1, ..., N\}$) and denote by $\lambda_i$ the eigenvalues, so that $\hat A\mathbf{e}_i = \lambda_i\mathbf{e}_i$. (The eigenvalues $\lambda_i$ are not necessarily all different.) Then we have $(\hat A)^n\mathbf{e}_i = \lambda_i^n\mathbf{e}_i$ and therefore
$$(\exp\hat A)\mathbf{e}_i = \sum_{n=0}^{\infty} \frac{1}{n!}(\hat A)^n\mathbf{e}_i = \sum_{n=0}^{\infty} \frac{1}{n!}\lambda_i^n\mathbf{e}_i = e^{\lambda_i}\mathbf{e}_i .$$
The trace of $\hat A$ is $\mathrm{Tr}\hat A = \sum_{i=1}^N \lambda_i$ and the determinant is $\det\hat A = \prod_{i=1}^N \lambda_i$. Hence we can easily verify the Liouville formula,
$$\det(\exp\hat A) = e^{\lambda_1} \cdots e^{\lambda_N} = \exp(\lambda_1 + ... + \lambda_N) = \exp(\mathrm{Tr}\hat A) .$$
However, the Liouville formula is valid also for non-diagonalizable operators. ■

The formula (4.9) is useful in several areas of mathematics and physics. A proof of Eq. (4.9) for matrices can be given through the use of the Jordan canonical form of the matrix, which is a powerful but complicated construction that actually is not needed to derive the Liouville formula. We will derive it using operator-valued differential equations for power series. A useful by-product is a formula for the derivative of the determinant.

Theorem 1 (Liouville's formula): For an operator $\hat A$ in a finite-dimensional space $V$,
$$\det\exp(t\hat A) = \exp(t\,\mathrm{Tr}\hat A) . \tag{4.10}$$
Here both sides are understood as formal power series in the variable $t$, e.g.
$$\exp(t\hat A) \equiv \sum_{n=0}^{\infty} \frac{t^n}{n!}(\hat A)^n ,$$
i.e. an infinite series considered without regard for convergence (Sec. 4.4).

Remark: Although we establish Theorem 1 only in the sense of equality of formal power series, the result is useful because both sides of Eq. (4.10) will be equal whenever both series converge. Since the series for $\exp(x)$ converges for all $x$, one expects that Eq. (4.10) has a wide range of applicability. In particular, it holds for any operator in finite dimensions. ■

The idea of the proof will be to represent both sides of Eq. (4.10) as power series in $t$ satisfying some differential equation. First we figure out how to solve differential equations for formal power series. Then we will guess a suitable differential equation that will enable us to prove the theorem.

Lemma 1: The operator-valued function $\hat F(t) \equiv \exp(t\hat A)$ is the unique solution of the differential equation
$$\partial_t\hat F(t) = \hat F(t)\hat A, \qquad \hat F(t = 0) = \hat 1_V ,$$
where both sides of the equation are understood as formal power series.
Proof: The initial condition means that
$$\hat F(t) = \hat 1 + \hat F_1 t + \hat F_2 t^2 + ... ,$$
where $\hat F_1, \hat F_2, ...$ are some operators. Then we equate terms with equal powers of $t$ in the differential equation, which yields $\hat F_{j+1} = \frac{1}{j+1}\hat F_j\hat A$ for $j = 0, 1, 2, ...$ (with $\hat F_0 \equiv \hat 1$), and so we obtain the desired exponential series. ■

Lemma 2: If $\phi(t)$ and $\psi(t)$ are power series in $t$ with coefficients from $\wedge^m V$ and $\wedge^n V$ respectively, then the Leibniz rule holds,
$$\partial_t(\phi \wedge \psi) = (\partial_t\phi) \wedge \psi + \phi \wedge (\partial_t\psi) .$$
Proof: Since the derivative of formal power series, as defined above, is a linear operation, it is sufficient to verify the statement in the case when $\phi = t^a\omega_1$ and $\psi = t^b\omega_2$. Then we find
$$\partial_t(\phi \wedge \psi) = (a + b)\,t^{a+b-1}\omega_1 \wedge \omega_2 ,$$
$$(\partial_t\phi) \wedge \psi + \phi \wedge (\partial_t\psi) = a t^{a-1}\omega_1 \wedge t^b\omega_2 + t^a\omega_1 \wedge b t^{b-1}\omega_2 .$$
■

Lemma 3: The inverse to a formal power series $\phi(t)$ exists (as a formal power series) if and only if $\phi(0) \neq 0$.
Proof: The condition $\phi(0) \neq 0$ means that we can express $\phi(t) = \phi(0) + t\psi(t)$ where $\psi(t)$ is another power series. Then we can use the identity of formal power series,
$$1 = (1 + x)\left[\sum_{n=0}^{\infty} (-1)^n x^n\right],$$
to express $1/\phi(t)$ as a formal power series,
$$\frac{1}{\phi(t)} = \frac{1}{\phi(0) + t\psi(t)} = \sum_{n=0}^{\infty} (-1)^n [\phi(0)]^{-n-1} [t\psi(t)]^n .$$
Since each term $[t\psi(t)]^n$ is expanded into a series that starts with $t^n$, we can compute each term of $1/\phi(t)$ by adding finitely many other terms, i.e. the above equation does specify a well-defined formal power series. ■

Corollary: If $\hat A(t)$ is an operator-valued formal power series, the inverse to $\hat A(t)$ exists (as a formal power series) if and only if $\det\hat A(0) \neq 0$.

The next step towards guessing the differential equation is to compute the derivative of a determinant.
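As a sanity check of Theorem 1 in the sense of formal power series, one can compare the two sides of Eq. (4.10) coefficient by coefficient for a small matrix. The sketch below is illustrative only: the truncation order `ORDER`, the sample matrix, and the helper names are assumptions of this example. It uses exact fractions and a non-diagonalizable $2 \times 2$ matrix, so the check does not rely on an eigenbasis.

```python
from fractions import Fraction
from math import factorial

ORDER = 8  # keep the coefficients of t^0 .. t^ORDER


def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def series_mul(f, g):
    """Product of two truncated power series (lists of coefficients)."""
    h = [Fraction(0)] * (ORDER + 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j <= ORDER:
                h[i + j] += fi * gj
    return h


def exp_tA_series(A):
    """Entries of exp(tA) as series in t: F[i][j][n] = (A^n)_ij / n!."""
    F = [[[Fraction(0)] * (ORDER + 1) for _ in range(2)] for _ in range(2)]
    power = [[1, 0], [0, 1]]  # A^0
    for n in range(ORDER + 1):
        for i in range(2):
            for j in range(2):
                F[i][j][n] = Fraction(power[i][j], factorial(n))
        power = matmul(power, A)
    return F


A = [[2, 1], [0, 2]]  # sample non-diagonalizable matrix
F = exp_tA_series(A)
# det exp(tA) = F11 F22 - F12 F21, computed as a truncated series in t.
det_series = [x - y for x, y in zip(series_mul(F[0][0], F[1][1]),
                                    series_mul(F[0][1], F[1][0]))]
trace = A[0][0] + A[1][1]
# Coefficients of exp(t Tr A) are (Tr A)^n / n!.
expected = [Fraction(trace ** n, factorial(n)) for n in range(ORDER + 1)]
```

The lists `det_series` and `expected` agree term by term up to the chosen order. Truncation does not affect the comparison because the coefficient of $t^k$ in a product of series depends only on coefficients of order at most $k$.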


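Lemma 4 (Jacobi's formula): If $\hat A(t)$ is an operator-valued formal power series such that the inverse $\hat A^{-1}(t)$ exists, we have
$$\partial_t\det\hat A(t) = (\det\hat A)\,\mathrm{Tr}[\hat A^{-1}\partial_t\hat A] = \mathrm{Tr}[(\det\hat A)\hat A^{-1}\partial_t\hat A] . \tag{4.11}$$
If the inverse does not exist, we need to replace $\det\hat A \cdot \hat A^{-1}$ in Eq. (4.11) by the algebraic complement,
$$\tilde{\hat A} \equiv \left(\wedge^{N-1}\hat A^{N-1}\right)^{\wedge T}$$
(see Sec. 4.2.1), so that we obtain the formula of Jacobi,
$$\partial_t\det\hat A = \mathrm{Tr}[\tilde{\hat A}\,\partial_t\hat A] .$$
Proof of Lemma 4: A straightforward calculation using Lemma 2 gives
$$\left(\partial_t\det\hat A(t)\right)\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = \partial_t[\hat A\mathbf{v}_1 \wedge ... \wedge \hat A\mathbf{v}_N] = \sum_{k=1}^{N} \hat A\mathbf{v}_1 \wedge ... \wedge (\partial_t\hat A)\mathbf{v}_k \wedge ... \wedge \hat A\mathbf{v}_N .$$
Now we use the definition of the algebraic complement operator to rewrite
$$\hat A\mathbf{v}_1 \wedge ... \wedge (\partial_t\hat A)\mathbf{v}_k \wedge ... \wedge \hat A\mathbf{v}_N = \mathbf{v}_1 \wedge ... \wedge (\tilde{\hat A}\,\partial_t\hat A\mathbf{v}_k) \wedge ... \wedge \mathbf{v}_N .$$
Hence
$$(\partial_t\det\hat A)\,\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = \sum_{k=1}^{N} \mathbf{v}_1 \wedge ... \wedge (\tilde{\hat A}\,\partial_t\hat A\mathbf{v}_k) \wedge ... \wedge \mathbf{v}_N = \wedge^N(\tilde{\hat A}\,\partial_t\hat A)^1\,\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N = \mathrm{Tr}[\tilde{\hat A}\,\partial_t\hat A]\,\mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N .$$
Therefore $\partial_t\det\hat A = \mathrm{Tr}[\tilde{\hat A}\,\partial_t\hat A]$. When $\hat A^{-1}$ exists, we may express $\tilde{\hat A}$ through the inverse matrix, $\tilde{\hat A} = (\det\hat A)\hat A^{-1}$, and obtain Eq. (4.11). ■

Proof of Theorem 1: It follows from Lemma 3 that $\hat F^{-1}(t)$ exists since $\hat F(0) = \hat 1$, and it follows from Lemma 4 that the operator-valued function $\hat F(t) = \exp(t\hat A)$ satisfies the differential equation
$$\partial_t\det\hat F(t) = \det\hat F(t) \cdot \mathrm{Tr}[\hat F^{-1}\partial_t\hat F] .$$
From Lemma 1, we have $\hat F^{-1}\partial_t\hat F = \hat F^{-1}\hat F\hat A = \hat A$, therefore
$$\partial_t\det\hat F(t) = \det\hat F(t) \cdot \mathrm{Tr}\hat A .$$
This is a differential equation for the number-valued formal power series $f(t) \equiv \det\hat F(t)$, with the initial condition $f(0) = 1$. The solution (which we may still regard as a formal power series) is
$$f(t) = \exp(t\,\mathrm{Tr}\hat A) .$$
Therefore
$$\det\hat F(t) \equiv \det\exp(t\hat A) = \exp(t\,\mathrm{Tr}\hat A) .$$
■

Exercise 1: (generalized Liouville's formula) If $\hat A \in \mathrm{End}\,V$ and $p \leq N \equiv \dim V$, show that
$$\wedge^p(\exp t\hat A)^p = \exp\left[t(\wedge^p\hat A^1)\right],$$
where both sides are understood as formal power series of operators in $\wedge^p V$. (The Liouville formula is a special case with $p = N$.)

Exercise 2:* (Sylvester's theorem) For any two linear maps $\hat A : V \to W$ and $\hat B : W \to V$, we have well-defined composition maps $\hat A\hat B \in \mathrm{End}\,W$ and $\hat B\hat A \in \mathrm{End}\,V$. Then
$$\det(\hat 1_V + \hat B\hat A) = \det(\hat 1_W + \hat A\hat B) .$$
Note that the operators at both sides act in different spaces.
Hint: Introduce a real parameter $t$ and consider the functions $f(t) \equiv \det(1 + t\hat A\hat B)$, $g(t) \equiv \det(1 + t\hat B\hat A)$. These functions are polynomials of finite degree in $t$. Consider the differential equation for these functions; show that $f(t)$ satisfies
$$\frac{df}{dt} = f(t)\,\mathrm{Tr}[\hat A\hat B(1 + t\hat A\hat B)^{-1}],$$
and similarly for $g$. Expand in series in $t$ and use the identities $\mathrm{Tr}(\hat A\hat B) = \mathrm{Tr}(\hat B\hat A)$, $\mathrm{Tr}(\hat A\hat B\hat A\hat B) = \mathrm{Tr}(\hat B\hat A\hat B\hat A)$, etc. Then show that $f$ and $g$ are solutions of the same differential equation, with the same conditions at $t = 0$. Therefore, show that these functions are identical as formal power series. Since $f$ and $g$ are actually polynomials in $t$, they must be equal.

4.5.1 Derivative of characteristic polynomial

Jacobi's formula expresses the derivative of the determinant, $\partial_t\det\hat A$, in terms of the derivative $\partial_t\hat A$ of the operator $\hat A$. The determinant is the last coefficient $q_0$ of the characteristic polynomial of $\hat A$. It is possible to obtain similar formulas for the derivatives of all other coefficients of the characteristic polynomial.

Statement: The derivative of the coefficient
$$q_k \equiv \wedge^N\hat A^{N-k}$$
of the characteristic polynomial of $\hat A$ is expressed (for $0 \leq k \leq N - 1$) as
$$\partial_t q_k = \mathrm{Tr}\left[(\wedge^{N-1}\hat A^{N-k-1})^{\wedge T}\partial_t\hat A\right] .$$
Note that the first operator in the brackets is the one we denoted by $\hat A_{(k+1)}$ in Sec. 4.2.3, so we can write
$$\partial_t q_k = \mathrm{Tr}[\hat A_{(k+1)}\partial_t\hat A] .$$
Proof: We apply the operator $\partial_t(\wedge^N\hat A^{N-k})$ to the tensor $\omega \equiv \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_N$, where $\{\mathbf{v}_j\}$ is a basis. We assume that the vectors $\mathbf{v}_j$ do not depend on $t$, so we can compute
$$\left[\partial_t(\wedge^N\hat A^{N-k})\right]\omega = \partial_t\left[\wedge^N\hat A^{N-k}\omega\right] .$$
The result is a sum of terms such as
$$\hat A\mathbf{v}_1 \wedge ... \wedge \hat A\mathbf{v}_{N-k-1} \wedge \partial_t\hat A\mathbf{v}_{N-k} \wedge \mathbf{v}_{N-k+1} \wedge ... \wedge \mathbf{v}_N$$
and other terms obtained by permuting the vectors $\mathbf{v}_j$ (without introducing any minus signs!). The total number of these terms is equal to $N\binom{N-1}{N-k-1}$, since we need to choose a single vector to which $\partial_t\hat A$ will apply, and then $(N - k - 1)$ vectors to which $\hat A$ will apply, among the $(N - 1)$ remaining vectors. Now consider the expression
$$\mathrm{Tr}\left[(\wedge^{N-1}\hat A^{N-k-1})^{\wedge T}\partial_t\hat A\right]\omega .$$
This expression is the sum of terms such as
$$\hat A_{(k+1)}\partial_t\hat A\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_N$$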


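and other terms with permuted vectors $\mathbf{v}_j$. There will be $N$ such terms, since we choose one vector out of $N$ to apply the operator $\hat A_{(k+1)}\partial_t\hat A$. Using the definition of $\hat A_{(k+1)}$, we write
$$\hat A_{(k+1)}\partial_t\hat A\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_N = \partial_t\hat A\mathbf{v}_1 \wedge \left[\wedge^{N-1}\hat A^{N-k-1}\right](\mathbf{v}_2 \wedge ... \wedge \mathbf{v}_N)$$
$$= \partial_t\hat A\mathbf{v}_1 \wedge \hat A\mathbf{v}_2 \wedge ... \wedge \hat A\mathbf{v}_{N-k} \wedge \mathbf{v}_{N-k+1} \wedge ... \wedge \mathbf{v}_N + ... ,$$
where in the last line we omitted all other permutations of the vectors. (There will be $\binom{N-1}{N-k-1}$ such permutations.) It follows that the tensor expressions
$$\partial_t q_k\,\omega \equiv \partial_t(\wedge^N\hat A^{N-k})\omega$$
and $\mathrm{Tr}[\hat A_{(k+1)}\partial_t\hat A]\omega$ consist of the same terms; thus they are equal,
$$\partial_t q_k\,\omega = \mathrm{Tr}[\hat A_{(k+1)}\partial_t\hat A]\,\omega .$$
Since this holds for any $\omega \in \wedge^N V$, we obtain the required statement. ■

Exercise: Assuming that $\hat A(t)$ is invertible, derive a formula for the derivative of the algebraic complement, $\partial_t\tilde{\hat A}$.
Hint: Compute $\partial_t$ of both sides of the identity $\tilde{\hat A}\hat A = (\det\hat A)\hat 1$.
Answer:
$$\partial_t\tilde{\hat A} = \frac{\mathrm{Tr}[\tilde{\hat A}\,\partial_t\hat A]\,\tilde{\hat A} - \tilde{\hat A}(\partial_t\hat A)\tilde{\hat A}}{\det\hat A} .$$

Remark: Since $\tilde{\hat A}$ is a polynomial in $\hat A$,
$$\tilde{\hat A} = q_1 - q_2\hat A + ... + q_{N-1}(-\hat A)^{N-2} + (-\hat A)^{N-1},$$
all derivatives of $\tilde{\hat A}$ may be expressed directly as polynomials in $\hat A$ and derivatives of $\hat A$, even when $\hat A$ is not invertible. Explicit expressions not involving $\hat A^{-1}$ are cumbersome; for instance, the derivative of a polynomial in $\hat A$ will contain expressions like
$$\partial_t(\hat A^3) = (\partial_t\hat A)\hat A^2 + \hat A(\partial_t\hat A)\hat A + \hat A^2\partial_t\hat A .$$
Nevertheless, these expressions can be derived using the known formulas for $\partial_t q_k$ and $\hat A_{(k)}$. ■

4.5.2 Derivative of a simple eigenvalue

Suppose an operator $\hat A$ is a function of a parameter $t$; we will consider $\hat A(t)$ as a formal power series (FPS). Then the eigenvectors and the eigenvalues of $\hat A$ are also functions of $t$. We can obtain a simple formula for the derivative of an eigenvalue $\lambda$ if it is an eigenvalue of multiplicity 1. It will be sufficient to know the eigenvalue $\lambda$ and the algebraic complement of $\hat A - \lambda\hat 1$; we do not need to know any eigenvectors of $\hat A$ explicitly, nor the other eigenvalues.

Statement: Suppose $\hat A(t)$ is an operator-valued formal power series and $\lambda(0)$ is a simple eigenvalue, i.e. an eigenvalue of $\hat A(0)$ having multiplicity 1. We also assume that there exists an FPS $\lambda(t)$ and a vector-valued FPS $\mathbf{v}(t)$ such that $\hat A\mathbf{v} = \lambda\mathbf{v}$ in the sense of formal power series. Then the following identity of FPS holds,
$$\partial_t\lambda = \frac{\mathrm{Tr}(\tilde{\hat B}\,\partial_t\hat A)}{\wedge^N\hat B^{N-1}} = \frac{\mathrm{Tr}(\tilde{\hat B}\,\partial_t\hat A)}{\mathrm{Tr}\,\tilde{\hat B}}, \qquad \hat B(t) \equiv \hat A(t) - \lambda(t)\hat 1_V .$$
The number
$$\mathrm{Tr}\,\tilde{\hat B}(0) \equiv \left.\wedge^N\hat B^{N-1}\right|_{t=0} \neq 0$$
if and only if $\lambda(0)$ is a simple eigenvalue.
Proof: We consider the derivative $\partial_t$ of the identity $\det\hat B = 0$:
$$0 = \partial_t\det\hat B = \mathrm{Tr}(\tilde{\hat B}\,\partial_t\hat B) = \mathrm{Tr}[\tilde{\hat B}(\partial_t\hat A - \hat 1\,\partial_t\lambda)] = \mathrm{Tr}(\tilde{\hat B}\,\partial_t\hat A) - (\mathrm{Tr}\,\tilde{\hat B})\,\partial_t\lambda .$$
We have from Statement 1 in Sec. 4.2.3 the relation
$$\mathrm{Tr}\,\tilde{\hat B} = \wedge^N\hat B^{N-1}$$
for any operator $\hat B$. Since (by assumption) $\mathrm{Tr}\,\tilde{\hat B}(t) \neq 0$ at $t = 0$, we may divide by $\mathrm{Tr}\,\tilde{\hat B}(t)$ because $1/\mathrm{Tr}\,\tilde{\hat B}(t)$ is a well-defined FPS (Lemma 3 in Sec. 4.5). Hence, we have
$$\partial_t\lambda = \frac{\mathrm{Tr}(\tilde{\hat B}\,\partial_t\hat A)}{\mathrm{Tr}\,\tilde{\hat B}} = \frac{\mathrm{Tr}(\tilde{\hat B}\,\partial_t\hat A)}{\wedge^N\hat B^{N-1}} .$$
The condition $\wedge^N\hat B^{N-1} \neq 0$ is equivalent to
$$\frac{\partial}{\partial\mu}Q_{\hat B}(\mu) \neq 0 \quad \text{at } \mu = 0,$$
which is the same as the condition that $\mu = 0$ is a simple zero of the characteristic polynomial of $\hat B \equiv \hat A - \lambda\hat 1$. ■

Remark: If $\hat A(t)$, say, at $t = 0$ has an eigenvalue $\lambda(0)$ of multiplicity higher than 1, the formula derived in Statement 1 does not apply, and the analysis requires knowledge of the eigenvectors. For example, the eigenvalue $\lambda(0)$ could have multiplicity 2 because there are two eigenvalues $\lambda_1(t)$ and $\lambda_2(t)$, corresponding to different eigenvectors, which are accidentally equal at $t = 0$. One cannot compute $\partial_t\lambda$ without specifying which of the two eigenvalues, $\lambda_1(t)$ or $\lambda_2(t)$, needs to be considered, i.e. without specifying the corresponding eigenvectors $\mathbf{v}_1(t)$ or $\mathbf{v}_2(t)$. Here I do not consider these more complicated situations but restrict attention to the case of a simple eigenvalue.

4.5.3 General trace relations

We have seen in Sec. 3.9 (Exercises 1 and 2) that the coefficients of the characteristic polynomial of an operator $\hat A$ can be expressed by algebraic formulas through the $N$ traces $\mathrm{Tr}\hat A$, ..., $\mathrm{Tr}(\hat A^N)$, and we called these formulas "trace relations." We will now compute the coefficients in the trace relations in the general case.

We are working with a given operator $\hat A$ in an $N$-dimensional space.

Statement: We denote for brevity $q_k \equiv \wedge^N\hat A^k$ and $t_k \equiv \mathrm{Tr}(\hat A^k)$, where $k = 1, 2, ...$, and set $q_k \equiv 0$ for $k > N$. Then all $q_k$ can be expressed as polynomials in $t_k$, and these polynomials are equal to the coefficients at $x^k$ of the formal power series
$$G(x) = \exp\left[t_1 x - t_2\frac{x^2}{2} + ... + (-1)^{n-1} t_n\frac{x^n}{n} + ...\right] \equiv 1 + \sum_{k=1}^{\infty} x^k q_k ,$$
obtained by collecting the powers of the formal variable $x$ up to the desired order.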
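The Statement can be tried out on a small matrix. The sketch below is only an illustration: the function names, the sample matrix, and the use of the recurrence $n g_n = \sum_{k=1}^{n} (-1)^{k-1} t_k g_{n-k}$ are assumptions of this example. The recurrence is not taken from the text; it follows from differentiating $G = \exp L$ as $G' = L'G$, where $L$ is the series in the exponent, and it avoids expanding the exponential by hand.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power_traces(A, count):
    """Return [t_1, ..., t_count] with t_k = Tr(A^k)."""
    traces, power = [], A
    for _ in range(count):
        traces.append(sum(power[i][i] for i in range(len(A))))
        power = matmul(power, A)
    return traces


def coefficients_from_traces(t, order):
    """Coefficients g_0 .. g_order of exp(t_1 x - t_2 x^2/2 + t_3 x^3/3 - ...).

    Uses n g_n = sum_{k=1}^{n} (-1)^(k-1) t_k g_(n-k), with g_0 = 1.
    The list t holds t_1 at index 0, t_2 at index 1, and so on.
    """
    g = [Fraction(1)]
    for n in range(1, order + 1):
        total = sum((-1) ** (k - 1) * t[k - 1] * g[n - k]
                    for k in range(1, n + 1))
        g.append(Fraction(total) / n)
    return g


# Sample 3x3 matrix with eigenvalues 2, 2, 3 (not diagonalizable).
A = [[2, 1, 0], [0, 2, 0], [0, 0, 3]]
t = power_traces(A, 5)
g = coefficients_from_traces(t, 5)
```

For this matrix the coefficients at $x^1, x^2, x^3$ come out as 7, 16, 12, which are the elementary symmetric combinations of the eigenvalues 2, 2, 3, and the coefficients at $x^4$ and $x^5$ vanish, as expected for $N = 3$.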


Proof: Consider the expression $\det(\hat 1 + x\hat A)$ as a formal power series in $x$. By the Liouville formula, we have the following identity of formal power series,
$$\ln\det(\hat 1 + x\hat A) = \mathrm{Tr}\left[\ln(\hat 1 + x\hat A)\right] = \mathrm{Tr}\left[x\hat A - \frac{x^2}{2}\hat A^2 + ... + (-1)^{n-1}\frac{x^n}{n}\hat A^n + ...\right]$$
$$= x t_1 - \frac{x^2}{2}t_2 + ... + (-1)^{n-1}\frac{x^n}{n}t_n + ... ,$$
where we substituted the power series for the logarithm function and used the notation $t_k \equiv \mathrm{Tr}(\hat A^k)$. Since $G(x)$ is defined as the exponential of this series, we have
$$\det(\hat 1 + x\hat A) = G(x)$$
as the identity of formal power series. On the other hand, $\det(\hat 1 + x\hat A)$ is actually a polynomial of degree $N$ in $x$, i.e. a formal power series that has all zero coefficients from $x^{N+1}$ onwards. The coefficients of this polynomial are found by using $x\hat A$ instead of $\hat A$ in Lemma 1 of Sec. 3.9:
$$\det(\hat 1 + x\hat A) = 1 + q_1 x + ... + q_N x^N .$$
Therefore, the coefficient at $x^k$ in the formal power series $G(x)$ is indeed equal to $q_k$ for $k = 1, ..., N$. (The coefficients at $x^k$ for $k > N$ are all zero!) ■

Example: Expanding the given series up to terms of order $x^4$, we find after some straightforward calculations
$$G(x) = 1 + t_1 x + \frac{t_1^2 - t_2}{2}x^2 + \left[\frac{t_1^3}{6} - \frac{t_1 t_2}{2} + \frac{t_3}{3}\right]x^3 + \left[\frac{t_1^4}{24} - \frac{t_1^2 t_2}{4} + \frac{t_2^2}{8} + \frac{t_1 t_3}{3} - \frac{t_4}{4}\right]x^4 + O(x^5) .$$
Replacing $t_j$ with $\mathrm{Tr}(\hat A^j)$ and collecting the terms at the $k$-th power of $x$, we obtain the $k$-th trace relation. For example, the trace relation for $k = 4$ is
$$\wedge^N\hat A^4 = \frac{1}{24}(\mathrm{Tr}\hat A)^4 - \frac{1}{4}\mathrm{Tr}(\hat A^2)(\mathrm{Tr}\hat A)^2 + \frac{1}{8}\left[\mathrm{Tr}(\hat A^2)\right]^2 + \frac{1}{3}\mathrm{Tr}(\hat A^3)\,\mathrm{Tr}\hat A - \frac{1}{4}\mathrm{Tr}(\hat A^4) .$$
Note that this formula is valid for all $N$, even for $N < 4$; in the latter case, $\wedge^N\hat A^4 = 0$.

4.6 Jordan canonical form

We have seen in Sec. 3.9 that the eigenvalues of a linear operator are the roots of the characteristic polynomial, and that there exists at least one eigenvector corresponding to each eigenvalue. In this section we will assume that the total number of roots of the characteristic polynomial, counting the algebraic multiplicity, is equal to $N$ (the dimension of the space). This is the case, for instance, when the field $\mathbb{K}$ is that of the complex numbers ($\mathbb{C}$); otherwise not all polynomials will have roots belonging to $\mathbb{K}$.

The dimension of the eigenspace corresponding to an eigenvalue $\lambda$ (the geometric multiplicity) is not larger than the algebraic multiplicity of the root $\lambda$ in the characteristic polynomial (Theorem 1 in Sec. 3.9). The geometric multiplicity is in any case not less than 1 because at least one eigenvector exists (Theorem 2 in Sec. 3.5.1). However, it may happen that the algebraic multiplicity of an eigenvalue $\lambda$ is larger than 1 but the geometric multiplicity is strictly smaller than the algebraic multiplicity. For example, an operator given in some basis by the matrix
$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
has only one eigenvector corresponding to the eigenvalue $\lambda = 0$ of algebraic multiplicity 2. Note that this has nothing to do with missing real roots of algebraic equations; this operator has only one eigenvector even if we allow complex eigenvectors. In this case, the operator is not diagonalizable because there are not enough eigenvectors to build a basis. The theory of the Jordan canonical form explains the structure of the operator in this case and finds a suitable basis that contains all the eigenvectors and also some additional vectors (called the root vectors), such that the given operator has a particularly simple form when expressed through that basis. This form is block-diagonal and consists of Jordan cells, which are square matrices such as
$$\begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix},$$
and similarly built matrices of higher dimension.

To perform the required analysis, it is convenient to consider each eigenvalue of a given operator separately and build the required basis gradually. Since the procedure is somewhat long, we will organize it by steps. The result of the procedure will be a construction of a basis (the Jordan basis) in which the operator $\hat A$ has the Jordan canonical form.

Step 0: Set up the initial basis. Let $\hat A \in \mathrm{End}\,V$ be a linear operator having the eigenvalues $\lambda_1, ..., \lambda_n$, and let us consider the first eigenvalue $\lambda_1$; suppose $\lambda_1$ has algebraic multiplicity $m$. If the geometric multiplicity of $\lambda_1$ is also equal to $m$, we can choose a linearly independent set of $m$ basis eigenvectors $\{\mathbf{v}_1, ..., \mathbf{v}_m\}$ and continue to work with the next eigenvalue $\lambda_2$. If the geometric multiplicity of $\lambda_1$ is less than $m$, we can only choose a set of $r < m$ basis eigenvectors $\{\mathbf{v}_1, ..., \mathbf{v}_r\}$.

In either case, we have found a set of eigenvectors with eigenvalue $\lambda_1$ that spans the entire eigenspace. We can repeat Step 0 for every eigenvalue $\lambda_i$ and obtain the spanning sets of eigenvectors. The resulting set of eigenvectors can be completed to a basis in $V$. At the end of Step 0, we have a basis $\{\mathbf{v}_1, ..., \mathbf{v}_k, \mathbf{u}_{k+1}, ..., \mathbf{u}_N\}$, where the vectors $\mathbf{v}_i$ are eigenvectors of $\hat A$ and the vectors $\mathbf{u}_i$ are chosen arbitrarily, as long as the result is a basis in $V$. By construction, any eigenvector of $\hat A$ is a linear combination of the $\mathbf{v}_i$'s. If the eigenvectors $\mathbf{v}_i$ are sufficiently numerous as to make a basis in $V$ without any $\mathbf{u}_i$'s, the operator $\hat A$ is diagonalizable and its Jordan basis is the eigenbasis; the procedure is finished. We need to proceed with the next steps only in the case when the eigenvectors $\mathbf{v}_i$ do not yet span the entire space $V$, so the Jordan basis is not yet determined.

Step 1: Determine a root vector. We will now concentrate on an eigenvalue $\lambda_1$ for which the geometric multiplicity $r$ is less than the algebraic multiplicity $m$. At the previous step, we have found a basis containing all the eigenvectors needed to span every eigenspace. The basis presently has the form $\{\mathbf{v}_1, ..., \mathbf{v}_r, \mathbf{u}_{r+1}, ..., \mathbf{u}_N\}$, where $\{\mathbf{v}_i \mid 1 \leq i \leq r\}$ span the eigenspace of the eigenvalue $\lambda_1$, and $\{\mathbf{u}_i \mid r + 1 \leq i \leq N\}$ are either


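eigenvectors of $\hat A$ corresponding to other eigenvalues, or other basis vectors. Without loss of generality, we may assume that $\lambda_1 = 0$ (otherwise we need to consider temporarily the operator $\hat A - \lambda_1\hat 1_V$, which has all the same eigenvectors as $\hat A$). Since the operator $\hat A$ has eigenvalue 0 with algebraic multiplicity $m$, the characteristic polynomial has the form $Q_{\hat A}(\lambda) = \lambda^m q(\lambda)$, where $q(\lambda)$ is some other polynomial. Since the coefficients of the characteristic polynomial are proportional to the operators $\wedge^N\hat A^k$ for $1 \leq k \leq N$, we find that
$$\wedge^N\hat A^{N-m} \neq 0, \quad \text{while} \quad \wedge^N\hat A^{N-k} = 0, \quad 0 \leq k < m .$$
In other words, we have found that several operators of the form $\wedge^N\hat A^{N-k}$ vanish. Let us now try to obtain some information about the vectors $\mathbf{u}_i$ by considering the action of these operators on the $N$-vector
$$\omega \equiv \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_r \wedge \mathbf{u}_{r+1} \wedge ... \wedge \mathbf{u}_N .$$
The result must be zero; for instance, we have
$$(\wedge^N\hat A^N)\omega = \hat A\mathbf{v}_1 \wedge ... = 0$$
since $\hat A\mathbf{v}_1 = 0$. We do not obtain any new information by considering the operator $\wedge^N\hat A^N$ because the application of $\wedge^N\hat A^N$ on $\omega$ acts with $\hat A$ on $\mathbf{v}_i$, which immediately yields zero. A nontrivial result can be obtained only if we do not act with $\hat A$ on any of the $r$ eigenvectors $\mathbf{v}_i$. Thus, we turn to considering the operators $\wedge^N\hat A^{N-k}$ with $k \geq r$; these operators involve sufficiently few powers of $\hat A$ so that $\wedge^N\hat A^{N-k}\omega$ may avoid containing any terms $\hat A\mathbf{v}_i$.

The first such operator is
$$0 \overset{!}{=} (\wedge^N\hat A^{N-r})\omega = \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_r \wedge \hat A\mathbf{u}_{r+1} \wedge ... \wedge \hat A\mathbf{u}_N .$$
It follows that the set $\{\mathbf{v}_1, ..., \mathbf{v}_r, \hat A\mathbf{u}_{r+1}, ..., \hat A\mathbf{u}_N\}$ is linearly dependent, so there exists a vanishing linear combination
$$\sum_{i=1}^{r} c_i\mathbf{v}_i + \sum_{i=r+1}^{N} c_i\hat A\mathbf{u}_i = 0 \tag{4.12}$$
with at least some $c_i \neq 0$. Let us define the vectors
$$\tilde{\mathbf{v}} \equiv \sum_{i=1}^{r} c_i\mathbf{v}_i, \qquad \mathbf{x} \equiv -\sum_{i=r+1}^{N} c_i\mathbf{u}_i ,$$
so that Eq. (4.12) is rewritten as $\hat A\mathbf{x} = \tilde{\mathbf{v}}$. Note that $\mathbf{x} \neq 0$, for otherwise we would have $\sum_{i=1}^{r} c_i\mathbf{v}_i = 0$, which contradicts the linear independence of the set $\{\mathbf{v}_1, ..., \mathbf{v}_r\}$. Further, the vector $\tilde{\mathbf{v}}$ cannot be equal to zero, for otherwise we would have $\hat A\mathbf{x} = 0$, so there would exist an additional eigenvector $\mathbf{x} \neq 0$ that is not a linear combination of $\mathbf{v}_i$, which is impossible since (by assumption) the set $\{\mathbf{v}_1, ..., \mathbf{v}_r\}$ spans the entire subspace of all eigenvectors with eigenvalue 0. Therefore, $\tilde{\mathbf{v}} \neq 0$, so at least one of the coefficients $\{c_i \mid 1 \leq i \leq r\}$ is nonzero. Without loss of generality, we assume that $c_1 \neq 0$. Then we can replace $\mathbf{v}_1$ by $\tilde{\mathbf{v}}$ in the basis; the set $\{\tilde{\mathbf{v}}, \mathbf{v}_2, ..., \mathbf{v}_r, \mathbf{u}_{r+1}, ..., \mathbf{u}_N\}$ is still a basis because
$$\tilde{\mathbf{v}} \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_r = (c_1\mathbf{v}_1 + ...) \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_r = c_1\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_r \neq 0 .$$
Similarly, at least one of the coefficients $\{c_i \mid r + 1 \leq i \leq N\}$ is nonzero. We would like to replace one of the $\mathbf{u}_i$'s in the basis by $\mathbf{x}$; it is possible to replace $\mathbf{u}_i$ by $\mathbf{x}$ as long as $c_i \neq 0$. However, we do not wish to remove from the basis any of the eigenvectors corresponding to other eigenvalues; so we need to choose the index $i$ such that $\mathbf{u}_i$ is not one of the other eigenvectors and at the same time $c_i \neq 0$. This choice is possible: if it were impossible, the vector $\mathbf{x}$ would be a linear combination of other eigenvectors of $\hat A$ (all having nonzero eigenvalues), so $\hat A\mathbf{x}$ would again be a linear combination of those eigenvectors, which contradicts the equations $\hat A\mathbf{x} = \tilde{\mathbf{v}}$ and $\hat A\tilde{\mathbf{v}} = 0$ because $\tilde{\mathbf{v}}$ is linearly independent of all other eigenvectors. Therefore, we can choose a vector $\mathbf{u}_i$ that is not an eigenvector and such that $\mathbf{u}_i$ can be replaced by $\mathbf{x}$. Without loss of generality, we may assume that this vector is $\mathbf{u}_{r+1}$. The new basis, $\{\tilde{\mathbf{v}}, \mathbf{v}_2, ..., \mathbf{v}_r, \mathbf{x}, \mathbf{u}_{r+2}, ..., \mathbf{u}_N\}$, is still linearly independent because
$$\tilde\omega \equiv \tilde{\mathbf{v}} \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_r \wedge \mathbf{x} \wedge \mathbf{u}_{r+2} \wedge ... \wedge \mathbf{u}_N \neq 0$$
due to $c_{r+1} \neq 0$. Renaming now $\tilde{\mathbf{v}} \to \mathbf{v}_1$, $\mathbf{x} \to \mathbf{x}_1$, and $\tilde\omega \to \omega$, we obtain a new basis $\{\mathbf{v}_1, ..., \mathbf{v}_r, \mathbf{x}_1, \mathbf{u}_{r+2}, ..., \mathbf{u}_N\}$ such that $\mathbf{v}_i$ are eigenvectors ($\hat A\mathbf{v}_i = 0$) and $\hat A\mathbf{x}_1 = \mathbf{v}_1$. The vector $\mathbf{x}_1$ is called a root vector of order 1 corresponding to the given eigenvalue $\lambda_1 = 0$. Eventually the Jordan basis will contain all the root vectors as well as all the eigenvectors for each eigenvalue. So our goal is to determine all the root vectors.

Example 1: The operator $\hat A = \mathbf{e}_1 \otimes \mathbf{e}_2^*$ in a two-dimensional space has an eigenvector $\mathbf{e}_1$ with eigenvalue 0 and a root vector $\mathbf{e}_2$ (of order 1) so that $\hat A\mathbf{e}_2 = \mathbf{e}_1$ and $\hat A\mathbf{e}_1 = 0$. The matrix representation of $\hat A$ in the basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ is
$$\hat A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} .$$

Step 2: Determine other root vectors. If $r + 1 = m$ then we are finished with the eigenvalue $\lambda_1$; there are no more operators $\wedge^N\hat A^{N-k}$ that vanish, and we cannot extract any more information. Otherwise $r + 1 < m$, and we will continue by considering the operator $\wedge^N\hat A^{N-r-1}$, which vanishes as well:
$$0 = (\wedge^N\hat A^{N-r-1})\omega = \mathbf{v}_1 \wedge ... \wedge \mathbf{v}_r \wedge \mathbf{x}_1 \wedge \hat A\mathbf{u}_{r+2} \wedge ... \wedge \hat A\mathbf{u}_N .$$
(Note that $\mathbf{v}_1 \wedge \hat A\mathbf{x}_1 = 0$, so in writing $(\wedge^N\hat A^{N-r-1})\omega$ we omit the terms where $\hat A$ acts on $\mathbf{v}_i$ or on $\mathbf{x}_1$ and write only the term where the operators $\hat A$ act on the $N - r - 1$ vectors $\mathbf{u}_i$.) As before, it follows that there exists a vanishing linear combination
$$\sum_{i=1}^{r} c_i\mathbf{v}_i + c_{r+1}\mathbf{x}_1 + \sum_{i=r+2}^{N} c_i\hat A\mathbf{u}_i = 0 . \tag{4.13}$$
We introduce the auxiliary vectors
$$\tilde{\mathbf{v}} \equiv \sum_{i=1}^{r} c_i\mathbf{v}_i, \qquad \mathbf{x} \equiv -\sum_{i=r+2}^{N} c_i\mathbf{u}_i ,$$
and rewrite Eq. (4.13) as
$$\hat A\mathbf{x} = c_{r+1}\mathbf{x}_1 + \tilde{\mathbf{v}} . \tag{4.14}$$
As before, we find that $\mathbf{x} \neq 0$. There are now two possibilities: either $c_{r+1} = 0$ or $c_{r+1} \neq 0$. If $c_{r+1} = 0$ then $\mathbf{x}$ is another root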


vector of order 1. As before, we show that one of the vectors $\mathbf{v}_i$ (but not $\mathbf{v}_1$) may be replaced by $\mathbf{v}$, and one of the vectors $\mathbf{u}_i$ (but not one of the other eigenvectors or root vectors) may be replaced by $\mathbf{x}$. After renaming the vectors ($\mathbf{v} \to \mathbf{v}_i$ and $\mathbf{x} \to \mathbf{x}_2$), the result is a new basis

$$\{\mathbf{v}_1, ..., \mathbf{v}_r, \mathbf{x}_1, \mathbf{x}_2, \mathbf{u}_{r+3}, ..., \mathbf{u}_N\}, \tag{4.15}$$

such that $\hat A\mathbf{x}_1 = \mathbf{v}_1$ and $\hat A\mathbf{x}_2 = \mathbf{v}_2$. It is important to keep the information that $\mathbf{x}_1$ and $\mathbf{x}_2$ are root vectors of order 1.

The other possibility is that $c_{r+1} \neq 0$. Without loss of generality, we may assume that $c_{r+1} = 1$ (otherwise we divide Eq. (4.14) by $c_{r+1}$ and redefine $\mathbf{x}$ and $\mathbf{v}$). In this case $\mathbf{x}$ is a root vector of order 2; according to Eq. (4.14), acting with $\hat A$ on $\mathbf{x}$ yields a root vector of order 1 and a linear combination of some eigenvectors. We will modify the basis again in order to simplify the action of $\hat A$; namely, we redefine $\tilde{\mathbf{x}}_1 \equiv \mathbf{x}_1 + \mathbf{v}$ so that $\hat A\mathbf{x} = \tilde{\mathbf{x}}_1$. The new vector $\tilde{\mathbf{x}}_1$ is still a root vector of order 1 because it satisfies $\hat A\tilde{\mathbf{x}}_1 = \mathbf{v}_1$, and the vector $\mathbf{x}_1$ in the basis may be replaced by $\tilde{\mathbf{x}}_1$. As before, one of the $\mathbf{u}_i$'s can be replaced by $\mathbf{x}$. Renaming $\tilde{\mathbf{x}}_1 \to \mathbf{x}_1$ and $\mathbf{x} \to \mathbf{x}_2$, we obtain the basis

$$\{\mathbf{v}_1, ..., \mathbf{v}_r, \mathbf{x}_1, \mathbf{x}_2, \mathbf{u}_{r+3}, ..., \mathbf{u}_N\},$$

where now we record that $\mathbf{x}_2$ is a root vector of order 2.

The procedure of determining the root vectors can be continued in this fashion until all the root vectors corresponding to the eigenvalue 0 are found. The end result will be a basis of the form

$$\{\mathbf{v}_1, ..., \mathbf{v}_r, \mathbf{x}_1, ..., \mathbf{x}_{m-r}, \mathbf{u}_{m+1}, ..., \mathbf{u}_N\},$$

where $\{\mathbf{v}_i\}$ are eigenvectors, $\{\mathbf{x}_i\}$ are root vectors of various orders, and $\{\mathbf{u}_i\}$ are the vectors that do not belong to this eigenvalue.

Generally, a root vector of order $k$ for the eigenvalue $\lambda_1 = 0$ is a vector $\mathbf{x}$ such that $\hat A^{k+1}\mathbf{x} = 0$. However, we have constructed the root vectors such that they come in "chains," for example $\hat A\mathbf{x}_2 = \mathbf{x}_1$, $\hat A\mathbf{x}_1 = \mathbf{v}_1$, $\hat A\mathbf{v}_1 = 0$. Clearly, this is the simplest possible arrangement of basis vectors. There are at most $r$ chains for a given eigenvalue because each eigenvector $\mathbf{v}_i$ ($i = 1, ..., r$) may have an associated chain of root vectors. Note that the root chains for an eigenvalue $\lambda \neq 0$ have the form $\hat A\mathbf{v}_1 = \lambda\mathbf{v}_1$, $\hat A\mathbf{x}_1 = \lambda\mathbf{x}_1 + \mathbf{v}_1$, $\hat A\mathbf{x}_2 = \lambda\mathbf{x}_2 + \mathbf{x}_1$, etc.

Example 2: An operator given by the matrix

$$\hat A = \begin{pmatrix} 20 & 1 & 0 \\ 0 & 20 & 1 \\ 0 & 0 & 20 \end{pmatrix}$$

has an eigenvector $\mathbf{e}_1$ with eigenvalue $\lambda = 20$ and the root vectors $\mathbf{e}_2$ (of order 1) and $\mathbf{e}_3$ (of order 2) since $\hat A\mathbf{e}_1 = 20\mathbf{e}_1$, $\hat A\mathbf{e}_2 = 20\mathbf{e}_2 + \mathbf{e}_1$, and $\hat A\mathbf{e}_3 = 20\mathbf{e}_3 + \mathbf{e}_2$. A tensor representation of $\hat A$ is

$$\hat A = \mathbf{e}_1 \otimes (20\mathbf{e}_1^* + \mathbf{e}_2^*) + \mathbf{e}_2 \otimes (20\mathbf{e}_2^* + \mathbf{e}_3^*) + 20\,\mathbf{e}_3 \otimes \mathbf{e}_3^*.$$

Step 3: Proceed to other eigenvalues. At Step 2, we determined all the root vectors for one eigenvalue $\lambda_1$. The eigenvectors and the root vectors belonging to a given eigenvalue $\lambda_1$ span a subspace called the Jordan cell for that eigenvalue. We then repeat the same analysis (Steps 1 and 2) for another eigenvalue and determine the corresponding Jordan cell. Note that it is impossible that a root vector for one eigenvalue is at the same time an eigenvector or a root vector for another eigenvalue; the Jordan cells have zero intersection. During the construction, we guarantee that we are not replacing any root vectors or eigenvectors found for the previous eigenvalues. Therefore, the final result is a basis of the form

$$\{\mathbf{v}_1, ..., \mathbf{v}_r, \mathbf{x}_1, ..., \mathbf{x}_{N-r}\}, \tag{4.16}$$

where $\{\mathbf{v}_i\}$ are the various eigenvectors and $\{\mathbf{x}_i\}$ are the corresponding root vectors of various orders.

Definition: The Jordan basis of an operator $\hat A$ is a basis of the form (4.16) such that $\mathbf{v}_i$ are eigenvectors and $\mathbf{x}_i$ are root vectors. For each root vector $\mathbf{x}$ corresponding to an eigenvalue $\lambda$ we have $\hat A\mathbf{x} = \lambda\mathbf{x} + \mathbf{y}$, where $\mathbf{y}$ is either an eigenvector or a root vector belonging to the same eigenvalue.

The construction in this section constitutes a proof of the following statement.

Theorem 1: Any linear operator $\hat A$ in a vector space over $\mathbb{C}$ admits a Jordan basis.

Remark: The assumption that the vector space is over complex numbers $\mathbb{C}$ is necessary in order to be sure that every polynomial has as many roots (counting with the algebraic multiplicity) as its degree. If we work in a vector space over $\mathbb{R}$, the construction of the Jordan basis will be complete only for operators whose characteristic polynomial has only real roots. Otherwise we will be able to construct Jordan cells only for real eigenvalues.

Example 3: An operator $\hat A$ defined by the matrix

$$\hat A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

in a basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ can be also written in the tensor notation as

$$\hat A = \mathbf{e}_1 \otimes \mathbf{e}_2^* + \mathbf{e}_2 \otimes \mathbf{e}_3^*.$$

The characteristic polynomial of $\hat A$ is $Q_{\hat A}(\lambda) = (-\lambda)^3$, so there is only one eigenvalue, $\lambda_1 = 0$. The algebraic multiplicity of $\lambda_1$ is 3. However, there is only one eigenvector, namely $\mathbf{e}_1$. The vectors $\mathbf{e}_2$ and $\mathbf{e}_3$ are root vectors since $\hat A\mathbf{e}_3 = \mathbf{e}_2$ and $\hat A\mathbf{e}_2 = \mathbf{e}_1$. Note also that the operator $\hat A$ is nilpotent, $\hat A^3 = 0$.

Example 4: An operator $\hat A$ defined by the matrix

$$\hat A = \begin{pmatrix} 6 & 1 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 & 0 \\ 0 & 0 & 6 & 0 & 0 \\ 0 & 0 & 0 & 7 & 0 \\ 0 & 0 & 0 & 0 & 7 \end{pmatrix}$$

has the characteristic polynomial $Q_{\hat A}(\lambda) = (6 - \lambda)^3 (7 - \lambda)^2$ and two eigenvalues, $\lambda_1 = 6$ and $\lambda_2 = 7$. The algebraic multiplicity of $\lambda_1$ is 3. However, there are only two eigenvectors for the eigenvalue $\lambda_1$, namely $\mathbf{e}_1$ and $\mathbf{e}_3$. The vector $\mathbf{e}_2$ is a root vector of order 1 for the eigenvalue $\lambda_1$ since

$$\hat A\mathbf{e}_2 = \begin{pmatrix} 6 & 1 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 & 0 \\ 0 & 0 & 6 & 0 & 0 \\ 0 & 0 & 0 & 7 & 0 \\ 0 & 0 & 0 & 0 & 7 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 6 \\ 0 \\ 0 \\ 0 \end{pmatrix} = 6\mathbf{e}_2 + \mathbf{e}_1.$$

The algebraic multiplicity of $\lambda_2$ is 2, and there are two eigenvectors for $\lambda_2$, namely $\mathbf{e}_4$ and $\mathbf{e}_5$. The vectors $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ span the Jordan cell for the eigenvalue $\lambda_1$, and the vectors $\{\mathbf{e}_4, \mathbf{e}_5\}$ span the Jordan cell for the eigenvalue $\lambda_2$.
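The claims of Example 4 can be checked by direct matrix arithmetic. The following is a minimal sketch using only the Python standard library; the helper names (`mat_vec`, `shifted`, `basis`) are illustrative and do not come from the text.

```python
# Check the Jordan structure stated in Example 4 by direct computation.
# Helper names are hypothetical; only integer arithmetic is used.

A = [
    [6, 1, 0, 0, 0],
    [0, 6, 0, 0, 0],
    [0, 0, 6, 0, 0],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 7],
]

def mat_vec(m, v):
    """Apply the matrix m to the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def shifted(m, lam):
    """Return the matrix of (A - lam * 1)."""
    n = len(m)
    return [[m[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def basis(k, n=5):
    """Standard basis vector e_k (1-based index)."""
    return [1 if i == k - 1 else 0 for i in range(n)]

e1, e2, e3, e4, e5 = (basis(k) for k in range(1, 6))
A6 = shifted(A, 6)
A7 = shifted(A, 7)
zero = [0] * 5

# e1 and e3 are eigenvectors for eigenvalue 6; e4 and e5 for eigenvalue 7.
eigen_6 = [mat_vec(A6, v) == zero for v in (e1, e3)]
eigen_7 = [mat_vec(A7, v) == zero for v in (e4, e5)]

# e2 is a root vector of order 1: (A - 6) e2 = e1 and (A - 6)^2 e2 = 0.
chain_step = mat_vec(A6, e2)
chain_end = mat_vec(A6, chain_step)

print(eigen_6, eigen_7, chain_step, chain_end)
```

The chain $\mathbf{e}_2 \to \mathbf{e}_1 \to 0$ under $\hat A - 6\cdot\hat 1$ is exactly the root chain described in the construction above.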


Exercise 1: Show that root vectors of order $k$ (with $k \geq 1$) belonging to eigenvalue $\lambda$ are at the same time eigenvectors of the operator $(\hat A - \lambda\hat 1)^{k+1}$ with eigenvalue 0. (This gives another constructive procedure for determining the root vectors.)

4.6.1 Minimal polynomial

Recalling the Cayley-Hamilton theorem, we note that the characteristic polynomial for the operator $\hat A$ in Example 4 in the previous subsection vanishes on $\hat A$:

$$(6 - \hat A)^3 (7 - \hat A)^2 = 0.$$

However, there is a polynomial of a lower degree that also vanishes on $\hat A$, namely $p(x) = (6 - x)^2 (7 - x)$.

Let us consider the operator $\hat A$ in Example 3 in the previous subsection. Its characteristic polynomial is $(-\lambda)^3$, and it is clear that $(\hat A)^2 \neq 0$ but $(\hat A)^3 = 0$. Hence there is no lower-degree polynomial $p(x)$ that makes $\hat A$ vanish; the minimal polynomial is $\lambda^3$.

Let us also consider the operator

$$\hat B = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

The characteristic polynomial of this operator is $(2 - \lambda)^2 (1 - \lambda)^3$, but it is clear that the following simpler polynomial, $p(x) = (2 - x)(1 - x)$, also vanishes on $\hat B$. If we are interested in the lowest-degree polynomial that vanishes on $\hat B$, we do not need to keep higher powers of the factors $(2 - \lambda)$ and $(1 - \lambda)$ that appear in the characteristic polynomial.

We may ask: what is the polynomial $p(x)$ of a smallest degree such that $p(\hat A) = 0$? Is this polynomial unique?

Definition: The minimal polynomial for an operator $\hat A$ is a monic polynomial $p(x)$ such that $p(\hat A) = 0$ and that no polynomial $\tilde p(x)$ of lower degree satisfies $\tilde p(\hat A) = 0$.

Exercise 1: Suppose that the characteristic polynomial of $\hat A$ is given as

$$Q_{\hat A}(\lambda) = (\lambda_1 - \lambda)^{n_1} (\lambda_2 - \lambda)^{n_2} ... (\lambda_s - \lambda)^{n_s}.$$

Suppose that the Jordan canonical form of $\hat A$ includes Jordan cells for eigenvalues $\lambda_1, ..., \lambda_s$ such that the largest-order root vector for $\lambda_i$ has order $r_i$ ($i = 1, ..., s$). Show that the polynomial of degree $r_1 + ... + r_s$ defined by

$$p(x) \equiv (-1)^{r_1 + ... + r_s} (\lambda_1 - x)^{r_1} ... (\lambda_s - x)^{r_s}$$

is monic and satisfies $p(\hat A) = 0$. If $\tilde p(x)$ is another polynomial of the same degree as $p(x)$ such that $\tilde p(\hat A) = 0$, show that $\tilde p(x)$ is proportional to $p(x)$. Show that no polynomial $q(x)$ of lower degree can satisfy $q(\hat A) = 0$. Hence, $p(x)$ is the minimal polynomial for $\hat A$.

Hint: It suffices to prove these statements for a single Jordan cell. ■

We now formulate a criterion that shows whether a given operator $\hat A$ is diagonalizable.

Definition: A polynomial $p(x)$ of degree $n$ is square-free if all $n$ roots of $p(x)$ have algebraic multiplicity 1, in other words,

$$p(x) = c\,(x - x_1) ... (x - x_n)$$

where all $x_i$ ($i = 1, ..., n$) are different. If a polynomial

$$q(x) = c\,(x - x_1)^{s_1} ... (x - x_m)^{s_m}$$

is not square-free (i.e. some $s_i \neq 1$), its square-free reduction is the polynomial

$$\tilde q(x) = c\,(x - x_1) ... (x - x_m).$$

Remark: In order to compute the square-free reduction of a given polynomial $q(x)$, one does not need to obtain the roots $x_i$ of $q(x)$. Instead, it suffices to consider the derivative $q'(x)$ and to note that $q'(x)$ and $q(x)$ have common factors only if $q(x)$ is not square-free, and moreover, the common factors are exactly the factors that we need to remove from $q(x)$ to make it square-free. Therefore, one computes the greatest common divisor of $q(x)$ and $q'(x)$ using the Euclidean algorithm and then divides $q(x)$ by $\gcd(q, q')$ to obtain the square-free reduction $\tilde q(x)$.

Theorem 2: An operator $\hat A$ is diagonalizable if and only if $p(\hat A) = 0$ where $p(\lambda)$ is the square-free reduction of the characteristic polynomial $Q_{\hat A}(\lambda)$.

Proof: The Jordan canonical form of $\hat A$ may contain several Jordan cells corresponding to different eigenvalues. Suppose that the set of the eigenvalues of $\hat A$ is $\{\lambda_i \mid i = 1, ..., n\}$, where $\lambda_i$ are all different and have algebraic multiplicities $s_i$; then the characteristic polynomial of $\hat A$ is

$$Q_{\hat A}(x) = (\lambda_1 - x)^{s_1} ... (\lambda_n - x)^{s_n},$$

and its square-free reduction is the polynomial

$$p(x) = (\lambda_1 - x) ... (\lambda_n - x).$$

If the operator $\hat A$ is diagonalizable, its eigenvectors $\{\mathbf{v}_j \mid j = 1, ..., N\}$ are a basis in $V$. Then $p(\hat A)\mathbf{v}_j = 0$ for all $j = 1, ..., N$. It follows that $p(\hat A) = \hat 0$ as an operator. If the operator $\hat A$ is not diagonalizable, there exists at least one nontrivial Jordan cell with root vectors. Without loss of generality, let us assume that this Jordan cell corresponds to $\lambda_1$. Then there exists a root vector $\mathbf{x}$ such that $\hat A\mathbf{x} = \lambda_1\mathbf{x} + \mathbf{v}_1$ while $\hat A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$. Then we can compute $(\lambda_1 - \hat A)\mathbf{x} = -\mathbf{v}_1$ and

$$p(\hat A)\mathbf{x} = (\lambda_1 - \hat A) ... (\lambda_n - \hat A)\mathbf{x} \overset{(1)}{=} (\lambda_n - \hat A) ... (\lambda_2 - \hat A)(\lambda_1 - \hat A)\mathbf{x} \overset{(2)}{=} -(\lambda_n - \lambda_1) ... (\lambda_2 - \lambda_1)\mathbf{v}_1 \neq 0,$$

where in $\overset{(1)}{=}$ we used the fact that operators $(\lambda_i - \hat A)$ all commute with each other, and in $\overset{(2)}{=}$ we used the property of an eigenvector, $q(\hat A)\mathbf{v}_1 = q(\lambda_1)\mathbf{v}_1$ for any polynomial $q(x)$. Thus we have shown that $p(\hat A)$ gives a nonzero vector on $\mathbf{x}$, which means that $p(\hat A)$ is a nonzero operator. ■

Exercise 2: a) It is given that the characteristic polynomial of an operator $\hat A$ (in a complex vector space) is $\lambda^3 + 1$. Prove that the operator $\hat A$ is invertible and diagonalizable.

b) It is given that the operator $\hat A$ satisfies the equation $\hat A^3 = \hat A^2$. Is $\hat A$ invertible? Is $\hat A$ diagonalizable? (If not, give explicit counterexamples, e.g., in a 2-dimensional space.)
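The remark on computing the square-free reduction through $\gcd(q, q')$ can be sketched in code. This is an illustration only, not the author's implementation: polynomials are stored as coefficient lists (index $i$ holds the coefficient of $x^i$), exact rational arithmetic comes from the standard `fractions` module, and all function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Convert to Fractions and drop zero coefficients of the highest powers."""
    p = [Fraction(c) for c in p]
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    """Derivative of the polynomial p."""
    return trim([i * c for i, c in enumerate(p)][1:] or [0])

def poly_mul(a, b):
    """Product of two polynomials."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def poly_divmod(a, b):
    """Long division: return (quotient, remainder) with a = quotient * b + remainder."""
    a, b = trim(a), trim(b)
    quotient = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    remainder = a
    while len(remainder) >= len(b) and remainder != [0]:
        shift = len(remainder) - len(b)
        factor = remainder[-1] / b[-1]
        quotient[shift] = factor
        for i, c in enumerate(b):
            remainder[i + shift] -= factor * c
        remainder = trim(remainder)  # the leading term has cancelled exactly
    return trim(quotient), remainder

def poly_gcd(a, b):
    """Euclidean algorithm; the result is normalized to be monic."""
    a, b = trim(a), trim(b)
    while b != [0]:
        a, b = b, poly_divmod(a, b)[1]
    return [c / a[-1] for c in a]

def square_free_reduction(q):
    """Divide q by gcd(q, q') as described in the remark."""
    g = poly_gcd(q, derivative(q))
    quotient, remainder = poly_divmod(q, g)
    assert remainder == [0]
    return quotient

# q(x) = (2 - x)^2 (1 - x)^3, the characteristic polynomial of the operator B above.
two_minus_x = [2, -1]
one_minus_x = [1, -1]
q = poly_mul(poly_mul(two_minus_x, two_minus_x),
             poly_mul(poly_mul(one_minus_x, one_minus_x), one_minus_x))
reduced = square_free_reduction(q)
print(reduced)
```

For this input the quotient is $-(x - 2)(x - 1)$, which agrees with $p(x) = (2 - x)(1 - x)$ up to an overall sign; the sign does not affect whether the polynomial vanishes on the operator.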


Exercise 3: A given operator $\hat A$ has a Jordan cell $\text{Span}\{\mathbf{v}_1, ..., \mathbf{v}_k\}$ with eigenvalue $\lambda$. Let

$$p(x) = p_0 + p_1 x + ... + p_s x^s$$

be an arbitrary, fixed polynomial, and consider the operator $\hat B \equiv p(\hat A)$. Show that $\text{Span}\{\mathbf{v}_1, ..., \mathbf{v}_k\}$ is a subspace of some Jordan cell of the operator $\hat B$ (although the eigenvalue of that cell may be different). Show that the orders of the root vectors of $\hat B$ are not larger than those of $\hat A$.

Hint: Consider for simplicity $\lambda = 0$. The vectors $\mathbf{v}_j$ belong to the eigenvalue $p_0 \equiv p(0)$ of the operator $\hat B$. The statement that $\{\mathbf{v}_j\}$ are within a Jordan cell for $\hat B$ is equivalent to

$$\mathbf{v}_1 \wedge ... \wedge (\hat B - p_0\hat 1)\mathbf{v}_i \wedge ... \wedge \mathbf{v}_k = 0 \quad \text{for } i = 1, ..., k.$$

If $\mathbf{v}_1$ is an eigenvector of $\hat A$ with eigenvalue $\lambda = 0$ then it is also an eigenvector of $\hat B$ with eigenvalue $p_0$. If $\mathbf{x}$ is a root vector of order 1 such that $\hat A\mathbf{x} = \mathbf{v}_1$ then $\hat B\mathbf{x} = p_0\mathbf{x} + p_1\mathbf{v}_1$, which means that $\mathbf{x}$ could be a root vector of order 1 or an eigenvector of $\hat B$ depending on whether $p_1 = 0$. Similarly, one can show that the root chains of $\hat B$ are sub-chains of the root chains of $\hat A$ (i.e. the root chains can only get shorter).

Example 5: A nonzero nilpotent operator $\hat A$ such that $\hat A^{1000} = 0$ may have root vectors of orders up to 999. The operator $\hat B \equiv \hat A^{500}$ satisfies $\hat B^2 = 0$ and thus can have root vectors only up to order 1. More precisely, the root vectors of $\hat A$ of orders 1 through 499 are eigenvectors of $\hat B$, while root vectors of $\hat A$ of orders 500 through 999 are root vectors of $\hat B$ of order 1. However, the Jordan cells of these operators are the same (the entire space $V$ is a Jordan cell with eigenvalue 0). Also, $\hat A$ is not expressible as a polynomial in $\hat B$. ■

Exercise 3 gives a necessary condition for being able to express an operator $\hat B$ as a polynomial in $\hat A$: It is necessary to determine whether the Jordan cells of $\hat A$ and $\hat B$ are "compatible" in the sense of Exercise 3. If $\hat A$'s Jordan cells cannot be embedded as subspaces within $\hat B$'s Jordan cells, or if $\hat B$ has a root chain that is not a sub-chain of some root chain of $\hat A$, then $\hat B$ cannot be a polynomial in $\hat A$.

Determining a sufficient condition for the existence of $p(x)$ for arbitrary $\hat A$ and $\hat B$ is a complicated task, and I do not consider it here. The following exercise shows how to do this in a particularly simple case.

Exercise 4: Two operators $\hat A$ and $\hat B$ are diagonalizable in the same eigenbasis $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ with eigenvalues $\lambda_1, ..., \lambda_N$ and $\mu_1, ..., \mu_N$ that all have multiplicity 1. Show that $\hat B = p(\hat A)$ for some polynomial $p(x)$ of degree at most $N - 1$.

Hint: We need to map the eigenvalues $\{\lambda_j\}$ into $\{\mu_j\}$. Choose the polynomial $p(x)$ that maps $p(\lambda_j) = \mu_j$ for $j = 1, ..., N$. Such a polynomial surely exists and is unique if we restrict to polynomials of degree not more than $N - 1$. ■

4.7 * Construction of projectors onto Jordan cells

We now consider the problem of determining the Jordan cells. It turns out that we can write a general expression for a projector onto a single Jordan cell of an operator $\hat A$. The projector is expressed as a polynomial in $\hat A$ with known coefficients. (Note that $\hat A$ may or may not be diagonalizable.)

The required projector $\hat P$ can be viewed as an operator that has the same Jordan cells as $\hat A$ but the eigenvalues are 1 for a single chosen Jordan cell and 0 for all other Jordan cells. One way to construct the projector $\hat P$ is to look for a polynomial in $\hat A$ such that the eigenvalues and the Jordan cells are mapped as desired. Some examples of this were discussed at the end of the previous subsection; however, the construction required a complete knowledge of the Jordan canonical form of $\hat A$ with all eigenvectors and root vectors. We will consider a different method of computing the projector $\hat P$. With this method, we only need to know the characteristic polynomial of $\hat A$, a single eigenvalue, and the algebraic multiplicity of the chosen eigenvalue. We will develop this method beginning with the simplest case.

Statement 1: If the characteristic polynomial $Q(\lambda)$ of an operator $\hat A$ has a zero $\lambda = \lambda_0$ of multiplicity 1, i.e. if $Q(\lambda_0) = 0$ and $Q'(\lambda_0) \neq 0$, then the operator $\hat P_{\lambda_0}$ defined by

$$\hat P_{\lambda_0} \equiv -\frac{1}{Q'(\lambda_0)} \left[\wedge^{N-1}(\hat A - \lambda_0\hat 1_V)^{N-1}\right]^{\wedge T}$$

is a projector onto the one-dimensional eigenspace of the eigenvalue $\lambda_0$. The prefactor can be computed also as $-Q'(\lambda_0) = \wedge^N(\hat A - \lambda_0\hat 1_V)^{N-1}$.

Proof: We denote $\hat P \equiv \hat P_{\lambda_0}$ for brevity. We will first show that for any vector $\mathbf{x}$, the vector $\hat P\mathbf{x}$ is an eigenvector of $\hat A$ with eigenvalue $\lambda_0$, i.e. that the image of $\hat P$ is a subspace of the $\lambda_0$-eigenspace. Then it will be sufficient to show that $\hat P\mathbf{v}_0 = \mathbf{v}_0$ for an eigenvector $\mathbf{v}_0$; it will follow that $\hat P\hat P = \hat P$ and so it will be proved that $\hat P$ is a projector onto the eigenspace.

Without loss of generality, we may set $\lambda_0 = 0$ (or else we consider the operator $\hat A - \lambda_0\hat 1_V$ instead of $\hat A$). Then we have $\det\hat A = 0$, while the number $\wedge^N\hat A^{N-1}$ is equal to the last-but-one coefficient in the characteristic polynomial, which is the same as $-Q'(\lambda_0)$ and is nonzero. Thus we set

$$\hat P = \frac{1}{\wedge^N\hat A^{N-1}} \left(\wedge^{N-1}\hat A^{N-1}\right)^{\wedge T} = \frac{1}{\wedge^N\hat A^{N-1}}\,\tilde{\hat A}$$

and note that by Lemma 1 in Sec. 4.2.1

$$\hat P\hat A = \frac{1}{\wedge^N\hat A^{N-1}} (\det\hat A)\hat 1_V = \hat 0_V.$$

Since $\hat P$ is a polynomial in $\hat A$, we have $\hat P\hat A = \hat A\hat P = 0$. Therefore $\hat A(\hat P\mathbf{x}) = 0$ for all $\mathbf{x} \in V$, so $\text{im}\,\hat P$ is indeed a subspace of the eigenspace of the eigenvalue $\lambda_0 = 0$.

It remains to show that $\hat P\mathbf{v}_0 = \mathbf{v}_0$ for an eigenvector $\mathbf{v}_0$ such that $\hat A\mathbf{v}_0 = 0$. This is verified by a calculation: We use Lemma 1 in Sec. 4.2.1, which is the identity

$$\left(\wedge^{N-1}\hat A^{N-n}\right)^{\wedge T}\hat A + \left(\wedge^{N-1}\hat A^{N-n+1}\right)^{\wedge T} = (\wedge^N\hat A^{N-n+1})\hat 1_V$$

valid for all $n = 1, ..., N$, and apply both sides to the vector $\mathbf{v}_0$ with $n = 2$:

$$\left(\wedge^{N-1}\hat A^{N-2}\right)^{\wedge T}\hat A\mathbf{v}_0 + \left(\wedge^{N-1}\hat A^{N-1}\right)^{\wedge T}\mathbf{v}_0 = (\wedge^N\hat A^{N-1})\mathbf{v}_0,$$

which yields the required formula,

$$\frac{\left(\wedge^{N-1}\hat A^{N-1}\right)^{\wedge T}\mathbf{v}_0}{\wedge^N\hat A^{N-1}} = \mathbf{v}_0,$$

since $\hat A\mathbf{v}_0 = 0$. Therefore, $\hat P\mathbf{v}_0 = \mathbf{v}_0$ as required. ■
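Statement 1 can be tried out numerically. The sketch below rests on an assumption that is not spelled out in this section: in matrix language, the algebraic complement $\left(\wedge^{N-1}\hat A^{N-1}\right)^{\wedge T}$ is taken to be the classical adjugate matrix (this matches the property $\tilde{\hat A}\hat A = (\det\hat A)\hat 1$ quoted in the proof), and the number $\wedge^N\hat A^{N-1}$ is taken to be its trace. The function names and the test matrix are illustrative.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    n = len(m)
    return [[m[r][c] for c in range(n) if c != j] for r in range(n) if r != i]

def det(m):
    """Determinant by cofactor expansion along the first row (fine for small N)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def adjugate(m):
    """Transposed cofactor matrix, so that adjugate(m) * m = det(m) * identity."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def simple_eigenvalue_projector(a, lam):
    """Projector of Statement 1 for an eigenvalue lam of algebraic multiplicity 1."""
    n = len(a)
    shifted = [[Fraction(a[i][j]) - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    adj = adjugate(shifted)
    # Assumed identification: trace of the adjugate equals -Q'(lam), nonzero for a simple root.
    trace = sum(adj[i][i] for i in range(n))
    return [[entry / trace for entry in row] for row in adj]

# Illustrative operator with a simple eigenvalue 2 and a double eigenvalue 3.
A = [[2, 1, 0],
     [0, 3, 0],
     [0, 0, 3]]
P = simple_eigenvalue_projector(A, 2)

print(P)
print(mat_mul(P, P) == P)                                    # P is idempotent
print(mat_mul(A, P) == [[2 * x for x in row] for row in P])  # image lies in the eigenspace
```

Only the characteristic data of the chosen eigenvalue enter the computation; no eigenvectors are computed, which is the point of the construction.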


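Remark: The projector $\hat P_{\lambda_0}$ is a polynomial in $\hat A$ with coefficients that are known if the characteristic polynomial $Q(\lambda)$ is known. The quantity $Q'(\lambda_0)$ is also an algebraically constructed object that can be calculated without taking derivatives. More precisely, the following formula holds.

Exercise 1: If $\hat A$ is any operator in $V$, prove that

$$(-1)^k \frac{\partial^k}{\partial\lambda^k} Q_{\hat A}(\lambda) \equiv (-1)^k \frac{\partial^k}{\partial\lambda^k} \wedge^N(\hat A - \lambda\hat 1_V)^N = k!\,\wedge^N(\hat A - \lambda\hat 1_V)^{N-k}. \tag{4.17}$$

Solution: An easy calculation. For example, with $k = 2$ and $N = 2$,

$$\frac{\partial^2}{\partial\lambda^2} \wedge^2(\hat A - \lambda\hat 1_V)^2\, \mathbf{u} \wedge \mathbf{v} = \frac{\partial^2}{\partial\lambda^2} \left[(\hat A - \lambda\hat 1_V)\mathbf{u} \wedge (\hat A - \lambda\hat 1_V)\mathbf{v}\right] = 2\,\mathbf{u} \wedge \mathbf{v}.$$

The formula (4.17) shows that the derivatives of the characteristic polynomial are algebraically defined quantities with a polynomial dependence on the operator $\hat A$. ■

Example 1: We illustrate this construction of the projector in a two-dimensional space for simplicity. Let $V$ be a space of polynomials in $x$ of degree at most 1, i.e. polynomials of the form $\alpha + \beta x$ with $\alpha, \beta \in \mathbb{C}$, and consider the linear operator $\hat A = x\frac{d}{dx}$ in this space. The basis in $V$ is $\{\underline{1}, \underline{x}\}$, where we use an underbar to distinguish the polynomials $\underline{1}$ and $\underline{x}$ from numbers such as 1. We first determine the characteristic polynomial,

$$Q_{\hat A}(\lambda) = \det(\hat A - \lambda\hat 1) = \frac{(\hat A - \lambda)\underline{1} \wedge (\hat A - \lambda)\underline{x}}{\underline{1} \wedge \underline{x}} = -\lambda(1 - \lambda).$$

Let us determine the projector onto the eigenspace of $\lambda = 0$. We have $\wedge^2\hat A^1 = -Q'(0) = 1$ and

$$\hat P_0 = -\frac{1}{Q'(0)} \left(\wedge^1\hat A^1\right)^{\wedge T} = (\wedge^2\hat A^1)\hat 1 - \hat A = \hat 1 - x\frac{d}{dx}.$$

Since $\hat P_0\underline{1} = \underline{1}$ while $\hat P_0\underline{x} = 0$, the image of $\hat P_0$ is the subspace spanned by $\underline{1}$. Hence, the eigenspace of $\lambda = 0$ is $\text{Span}\{\underline{1}\}$. ■

What if the eigenvalue $\lambda_0$ has an algebraic multiplicity larger than 1? Let us first consider the easier case when the geometric multiplicity is equal to the algebraic multiplicity.

Statement 2: If $\lambda_0$ is an eigenvalue of both geometric and algebraic multiplicity $n$ then the operator $\hat P^{(n)}_{\lambda_0}$ defined by

$$\hat P^{(n)}_{\lambda_0} \equiv \left[\wedge^N(\hat A - \lambda_0\hat 1_V)^{N-n}\right]^{-1} \left[\wedge^{N-1}(\hat A - \lambda_0\hat 1_V)^{N-n}\right]^{\wedge T} \tag{4.18}$$

is a projector onto the subspace of eigenvectors with eigenvalue $\lambda_0$.

Proof: As in the proof of Statement 1, we first show that the image ($\text{im}\,\hat P^{(n)}_{\lambda_0}$) is a subspace of the $\lambda_0$-eigenspace of $\hat A$, and then show that any eigenvector $\mathbf{v}_0$ of $\hat A$ with eigenvalue $\lambda_0$ satisfies $\hat P^{(n)}_{\lambda_0}\mathbf{v}_0 = \mathbf{v}_0$. Let us write $\hat P \equiv \hat P^{(n)}_{\lambda_0}$ for brevity.

We first need to show that $(\hat A - \lambda_0\hat 1)\hat P = 0$. Since by assumption $\lambda_0$ has algebraic multiplicity $n$, the characteristic polynomial is of the form $Q_{\hat A}(\lambda) = (\lambda_0 - \lambda)^n p(\lambda)$, where $p(\lambda)$ is another polynomial such that $p(\lambda_0) \neq 0$. Without loss of generality we set $\lambda_0 = 0$. With $\lambda_0 = 0$, the factor $(-\lambda)^n$ in the characteristic polynomial means that many of its coefficients $q_k \equiv \wedge^N\hat A^{N-k}$ are equal to zero: $q_k = 0$ for $k = 0, ..., n - 1$ but $q_n \neq 0$. (Thus the denominator in Eq. (4.18) is nonzero.)

By Lemma 1 in Sec. 4.2.1, for every $k = 1, ..., N$ we have the identity

$$\left(\wedge^{N-1}\hat A^{N-k}\right)^{\wedge T}\hat A + \left(\wedge^{N-1}\hat A^{N-k+1}\right)^{\wedge T} = (\wedge^N\hat A^{N-k+1})\hat 1_V.$$

We can rewrite this as

$$\hat A_{(k)}\hat A + \hat A_{(k-1)} = q_{k-1}\hat 1, \tag{4.19}$$

where we denoted, as before,

$$\hat A_{(k)} \equiv \left(\wedge^{N-1}\hat A^{N-k}\right)^{\wedge T}.$$

Setting $k = n$, we find

$$\hat A_{(n)}\hat A = q_n\hat P^{(n)}\hat A = 0.$$

Since $q_n \neq 0$, we find $\hat P\hat A = 0$. Since $\hat P$ is a polynomial in $\hat A$, it commutes with $\hat A$, so $\hat P\hat A = \hat A\hat P = 0$. Hence the image of $\hat P$ is a subspace of the eigenspace of $\hat A$ with $\lambda_0 = 0$.

Now it remains to show that all $\mathbf{v}_i$'s are eigenvectors of $\hat P$ with eigenvalue 1. We set $k = n + 1$ in Eq. (4.19) and obtain

$$\hat A_{(n+1)}\hat A\mathbf{v}_i + \hat A_{(n)}\mathbf{v}_i = q_n\mathbf{v}_i.$$

Since $\hat A\mathbf{v}_i = 0$, it follows that $\hat A_{(n)}\mathbf{v}_i = q_n\mathbf{v}_i$. Therefore $\hat P\mathbf{v}_i = \mathbf{v}_i$. ■

It remains to consider the case when the geometric multiplicity of $\lambda_0$ is less than the algebraic multiplicity, i.e. if there exist some root vectors.

Statement 3: We work with an operator $\hat A$ whose characteristic polynomial is known,

$$Q_{\hat A}(\lambda) = q_0 + (-\lambda)q_1 + ... + (-\lambda)^{N-1}q_{N-1} + (-\lambda)^N.$$

Without loss of generality, we assume that $\hat A$ has an eigenvalue $\lambda_0 = 0$ of algebraic multiplicity $n \geq 1$. The geometric multiplicity of $\lambda_0$ may be less than or equal to $n$. (For nonzero eigenvalues $\lambda_0$, we consider the operator $\hat A - \lambda_0\hat 1$ instead of $\hat A$.)

(1) A projector onto the Jordan cell of dimension $n$ belonging to eigenvalue $\lambda_0$ is given by the operator

$$\hat P_{\lambda_0} \equiv \sum_{k=1}^{n} c_k\hat A_{(k)} = \hat 1 + \sum_{k=1}^{n}\sum_{i=n}^{N-k} c_k q_{i+k}(-\hat A)^i, \tag{4.20}$$

where

$$\hat A_{(k)} \equiv \left(\wedge^{N-1}\hat A^{N-k}\right)^{\wedge T}, \quad 1 \leq k \leq N - 1,$$

and $c_1, ..., c_n$ are the numbers that solve the system of equations

$$\begin{pmatrix} q_n & q_{n+1} & q_{n+2} & \cdots & q_{2n-1} \\ 0 & q_n & q_{n+1} & \cdots & q_{2n-2} \\ \vdots & 0 & \ddots & \ddots & \vdots \\ 0 & 0 & \ddots & q_n & q_{n+1} \\ 0 & 0 & \cdots & 0 & q_n \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_{n-1} \\ c_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$

For convenience, we have set $q_N \equiv 1$ and $q_i \equiv 0$ for $i > N$.

(2) No polynomial in $\hat A$ can be a projector onto the subspace of eigenvectors within the Jordan cell (rather than a projector onto the entire Jordan cell) when the geometric multiplicity is strictly less than the algebraic.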


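Proof: (1) The Jordan cell consists of all vectors $\mathbf{x}$ such that $\hat A^n\mathbf{x} = 0$. We proceed as in the proof of Statement 2, starting from Eq. (4.19). By induction in $k$, starting from $k = 1$ until $k = n$, we obtain

$$\hat A\hat A_{(1)} = q_0\hat 1 = 0,$$
$$\hat A^2\hat A_{(2)} + \hat A\hat A_{(1)} = \hat A q_1\hat 1 = 0 \;\Rightarrow\; \hat A^2\hat A_{(2)} = 0,$$
$$..., \;\Rightarrow\; \hat A^n\hat A_{(n)} = 0.$$

So we find $\hat A^n\hat A_{(k)} = 0$ for all $k$ ($1 \leq k \leq n$). Since $\hat P_{\lambda_0}$ is by construction equal to a linear combination of these $\hat A_{(k)}$, we have $\hat A^n\hat P_{\lambda_0} = 0$, i.e. the image of $\hat P_{\lambda_0}$ is contained in the Jordan cell.

It remains to prove that the Jordan cell is also contained in the image of $\hat P_{\lambda_0}$, that is, to show that $\hat A^n\mathbf{x} = 0$ implies $\hat P_{\lambda_0}\mathbf{x} = \mathbf{x}$. We use the explicit formulas for $\hat A_{(k)}$ that can be obtained by induction from Eq. (4.19) starting with $k = N$: we have $\hat A_{(N)} = \hat 1$, $\hat A_{(N-1)} = q_{N-1}\hat 1 - \hat A$, and finally

$$\hat A_{(k)} = q_k\hat 1 - q_{k+1}\hat A + ... + q_N(-\hat A)^{N-k} = \sum_{i=0}^{N-k} q_{k+i}(-\hat A)^i, \quad k \geq 1. \tag{4.21}$$

The operator $\hat P_{\lambda_0}$ is a linear combination of $\hat A_{(k)}$ with $1 \leq k \leq n$. The Jordan cell of dimension $n$ consists of all $\mathbf{x} \in V$ such that $\hat A^n\mathbf{x} = 0$. Therefore, while computing $\hat P_{\lambda_0}\mathbf{x}$ for any $\mathbf{x}$ such that $\hat A^n\mathbf{x} = 0$, we can restrict the summation over $i$ to $0 \leq i \leq n - 1$,

$$\hat P_{\lambda_0}\mathbf{x} = \sum_{k=1}^{n} c_k \sum_{i=0}^{N-k} q_{k+i}(-\hat A)^i\mathbf{x} = \sum_{k=1}^{n}\sum_{i=0}^{n-1} c_k q_{k+i}(-\hat A)^i\mathbf{x}.$$

We would like to choose the coefficients $c_k$ such that the sum above contains only the term $(-\hat A)^0\mathbf{x} = \mathbf{x}$ with coefficient 1, while all other powers of $\hat A$ will enter with zero coefficient. In other words, we require that

$$\sum_{k=1}^{n}\sum_{i=0}^{n-1} c_k q_{k+i}(-\hat A)^i = \hat 1 \tag{4.22}$$

identically as polynomial in $\hat A$. This will happen if the coefficients $c_k$ satisfy

$$\sum_{k=1}^{n} c_k q_k = 1,$$
$$\sum_{k=1}^{n} c_k q_{k+i} = 0, \quad i = 1, ..., n - 1.$$

This system of equations for the unknown coefficients $c_k$ can be rewritten in matrix form as

$$\begin{pmatrix} q_n & q_{n+1} & q_{n+2} & \cdots & q_{2n-1} \\ q_{n-1} & q_n & q_{n+1} & \cdots & q_{2n-2} \\ \vdots & q_{n-1} & \ddots & \ddots & \vdots \\ q_2 & \cdots & \ddots & q_n & q_{n+1} \\ q_1 & q_2 & \cdots & q_{n-1} & q_n \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_{n-1} \\ c_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$

However, it is given that $\lambda_0 = 0$ is a root of multiplicity $n$, therefore $q_0 = ... = q_{n-1} = 0$ while $q_n \neq 0$. Therefore, the system of equations has the triangular form as given in Statement 3. Its solution is unique since $q_n \neq 0$. Thus, we are able to choose $c_k$ such that $\hat P_{\lambda_0}\mathbf{x} = \mathbf{x}$ for any $\mathbf{x}$ within the Jordan cell.

The formula for $\hat P_{\lambda_0}$ can be simplified by writing

$$\hat P_{\lambda_0} = \sum_{k=1}^{n} \left[\sum_{i=0}^{n-1} c_k q_{k+i}(-\hat A)^i + \sum_{i=n}^{N-k} c_k q_{k+i}(-\hat A)^i\right].$$

The first sum yields $\hat 1$ by Eq. (4.22), and so we obtain Eq. (4.20).

(2) A simple counterexample is the (non-diagonalizable) operator

$$\hat A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \mathbf{e}_1 \otimes \mathbf{e}_2^*.$$

This operator has a Jordan cell with eigenvalue 0 spanned by the basis vectors $\mathbf{e}_1$ and $\mathbf{e}_2$. The eigenvector with eigenvalue 0 is $\mathbf{e}_1$, and a possible projector onto this eigenvector is $\hat P = \mathbf{e}_1 \otimes \mathbf{e}_1^*$. However, no polynomial in $\hat A$ can yield $\hat P$ or any other projector only onto $\mathbf{e}_1$. This can be seen as follows. We note that $\hat A\hat A = 0$, and thus any polynomial in $\hat A$ can be rewritten as $a_0\hat 1_V + a_1\hat A$. However, if an operator of the form $a_0\hat 1_V + a_1\hat A$ is a projector, and $\hat A\hat A = 0$, then we can derive that $a_0^2 = a_0$ and $a_1 = 2a_0 a_1$, which (for a nonzero projector) forces $a_0 = 1$ and $a_1 = 0$. Therefore the only result of a polynomial formula can be the projector $\mathbf{e}_1 \otimes \mathbf{e}_1^* + \mathbf{e}_2 \otimes \mathbf{e}_2^*$ onto the entire Jordan cell. ■

Example 2: Consider the space of polynomials in $x$ and $y$ of degree at most 1, i.e. the space spanned by $\{\underline{1}, \underline{x}, \underline{y}\}$, and the operator

$$\hat A = x\frac{\partial}{\partial x} + \frac{\partial}{\partial y}.$$

The characteristic polynomial of $\hat A$ is found as

$$Q_{\hat A}(\lambda) = \frac{(\hat A - \lambda)\underline{1} \wedge (\hat A - \lambda)\underline{x} \wedge (\hat A - \lambda)\underline{y}}{\underline{1} \wedge \underline{x} \wedge \underline{y}} = \lambda^2 - \lambda^3 \equiv q_0 - q_1\lambda + q_2\lambda^2 - q_3\lambda^3.$$

Hence $\lambda = 0$ is an eigenvalue of algebraic multiplicity 2. It is easy to guess the eigenvectors, $\mathbf{v}_1 = \underline{1}$ ($\lambda = 0$) and $\mathbf{v}_2 = \underline{x}$ ($\lambda = 1$), as well as the root vector $\mathbf{v}_3 = \underline{y}$ ($\lambda = 0$). However, let us pretend that we do not know the Jordan basis, and instead determine the projector $\hat P_0$ onto the Jordan cell belonging to the eigenvalue $\lambda_0 = 0$ using Statement 3 with $n = 2$ and $N = 3$.

We have $q_0 = q_1 = 0$, $q_2 = q_3 = 1$. The system of equations for the coefficients $c_k$ is

$$q_2 c_1 + q_3 c_2 = 0,$$
$$q_2 c_2 = 1,$$

and the solution is $c_1 = -1$ and $c_2 = 1$. We note that in our example,

$$\hat A^2 = x\frac{\partial}{\partial x}.$$

So we can compute the projector $\hat P_0$ by using Eq. (4.20):

$$\hat P_0 = \hat 1 + \sum_{k=1}^{2}\sum_{i=2}^{3-k} c_k q_{i+k}(-\hat A)^i = \hat 1 + c_1 q_3\hat A^2 = \hat 1 - x\frac{\partial}{\partial x}.$$

(The summation over $k$ and $i$ collapses to a single term $k = 1$, $i = 2$.) The image of $\hat P_0$ is $\text{Span}\{\underline{1}, \underline{y}\}$, and we have $\hat P_0\hat P_0 = \hat P_0$. Hence $\hat P_0$ is indeed a projector onto the Jordan cell $\text{Span}\{\underline{1}, \underline{y}\}$ that belongs to the eigenvalue $\lambda = 0$.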
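The recipe of Statement 3 can be sketched as a short program and checked against Example 2. This is a hedged illustration rather than the author's algorithm: it assumes the coefficients $q_k$ are already known and supplied as a list, it solves the triangular system by back-substitution, and it builds the projector from the sum $\sum_k c_k \hat A_{(k)}$ with $\hat A_{(k)}$ taken from Eq. (4.21). The function name `jordan_cell_projector` and the matrix representation of $\hat A$ in the basis $\{\underline{1}, \underline{x}, \underline{y}\}$ are choices made for this example.

```python
from fractions import Fraction

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def jordan_cell_projector(a, q, n):
    """Projector onto the Jordan cell of eigenvalue 0 with algebraic multiplicity n.

    q[k] are the coefficients in Q_A(lam) = sum_k q[k] * (-lam)**k with q[N] = 1;
    it is assumed that q[0] = ... = q[n-1] = 0 and q[n] != 0.
    """
    N = len(a)

    def coeff(k):
        # Convention of Statement 3: q_i = 0 for i > N.
        return Fraction(q[k]) if k <= N else Fraction(0)

    # Back-substitution for sum_k c_k q_{k+i} = delta_{i,0}, i = 0, ..., n-1.
    c = {}
    for i in range(n):
        k0 = n - i
        rhs = Fraction(int(i == 0))
        rhs -= sum(c[k] * coeff(k + i) for k in range(k0 + 1, n + 1))
        c[k0] = rhs / coeff(n)

    # Powers of (-A): powers[i] = (-A)**i.
    minus_a = [[-Fraction(x) for x in row] for row in a]
    powers = [identity(N)]
    for _ in range(N):
        powers.append(mat_mul(powers[-1], minus_a))

    # P = sum_k c_k A_(k), where A_(k) = sum_{i=0}^{N-k} q_{k+i} (-A)**i.
    P = [[Fraction(0)] * N for _ in range(N)]
    for k in range(1, n + 1):
        for i in range(N - k + 1):
            w = c[k] * coeff(k + i)
            for r in range(N):
                for s in range(N):
                    P[r][s] += w * powers[i][r][s]
    return P, c

# Example 2: A = x d/dx + d/dy in the basis (1, x, y); columns are images of basis vectors.
A = [[0, 0, 1],
     [0, 1, 0],
     [0, 0, 0]]
P0, c = jordan_cell_projector(A, q=[0, 0, 1, 1], n=2)
print(c, P0)
```

In this basis the result is the diagonal matrix with entries 1, 0, 1, i.e. the projector onto $\text{Span}\{\underline{1}, \underline{y}\}$, in agreement with $\hat 1 - x\,\partial/\partial x$.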


Exercise 2: Suppose the operator $\hat A$ has eigenvalue $\lambda_0$ with algebraic multiplicity $n$. Show that one can choose a basis $\{\mathbf{v}_1, ..., \mathbf{v}_n, \mathbf{e}_{n+1}, ..., \mathbf{e}_N\}$ such that $\mathbf{v}_i$ are eigenvectors or root vectors belonging to the eigenvalue $\lambda_0$, and $\mathbf{e}_j$ are such that the vectors $(\hat A - \lambda_0\hat 1)\mathbf{e}_j$ (with $j = n + 1, ..., N$) belong to the subspace $\text{Span}\{\mathbf{e}_{n+1}, ..., \mathbf{e}_N\}$. Deduce that the subspace $\text{Span}\{\mathbf{e}_{n+1}, ..., \mathbf{e}_N\}$ is mapped one-to-one onto itself by the operator $\hat A - \lambda_0\hat 1$.

Hint: Assume that the Jordan canonical form of $\hat A$ is known. Show that

$$\wedge^{N-n}(\hat A - \lambda_0\hat 1)^{N-n}(\mathbf{e}_{n+1} \wedge ... \wedge \mathbf{e}_N) \neq 0.$$

(Otherwise, a linear combination of $\mathbf{e}_j$ is an eigenvector with eigenvalue $\lambda_0$.)
Remark: Operators of the form

$$\hat R_k \equiv \left[\wedge^{N-1}(\hat A - \lambda_0\hat 1_V)^{N-k}\right]^{\wedge T} \tag{4.23}$$

with $k \leq n$ are used in the construction of projectors onto the Jordan cell. What if we use Eq. (4.23) with other values of $k$? It turns out that the resulting operators are not projectors. If $k > n$, the operator $\hat R_k$ does not map into the Jordan cell. If $k < n$, the operator $\hat R_k$ does not map onto the entire Jordan cell but rather onto a subspace of the Jordan cell; the image of $\hat R_k$ contains eigenvectors or root vectors of a certain order. An example of this property will be shown in Exercise 3.
Exercise 3: Suppose an operator $\hat A$ has an eigenvalue $\lambda_0$ with algebraic multiplicity $n$ and geometric multiplicity $n - 1$. This means (according to the theory of the Jordan canonical form) that there exist $n - 1$ eigenvectors and one root vector of order 1. Let us denote that root vector by $\mathbf{x}_1$ and let $\mathbf{v}_2, ..., \mathbf{v}_n$ be the $(n - 1)$ eigenvectors with eigenvalue $\lambda_0$. Moreover, let us choose $\mathbf{v}_2$ such that $\hat A\mathbf{x}_1 = \lambda_0\mathbf{x}_1 + \mathbf{v}_2$ (i.e. the vectors $\mathbf{x}_1$, $\mathbf{v}_2$ are a root chain). Show that the operator $\hat R_k$ given by the formula (4.23), with $k = n - 1$, satisfies

$$\hat R_{n-1}\mathbf{x}_1 = \text{const} \cdot \mathbf{v}_2; \quad \hat R_{n-1}\mathbf{v}_j = 0, \quad j = 2, ..., n;$$
$$\hat R_{n-1}\mathbf{e}_j = 0, \quad j = n + 1, ..., N.$$

In other words, the image of the operator $\hat R_{n-1}$ contains only the eigenvector $\mathbf{v}_2$; that is, the image contains the eigenvector related to a root vector of order 1.

Hint: Use a basis of the form $\{\mathbf{x}_1, \mathbf{v}_2, ..., \mathbf{v}_n, \mathbf{e}_{n+1}, ..., \mathbf{e}_N\}$ as in Exercise 2.

5 Scalar product
Until now we did not use any scalar product in our vector spaces. In this chapter we explore the properties of spaces with a scalar product. The exterior product techniques are especially powerful when used together with a scalar product.

5.1 Vector spaces with scalar product

As you already know, the scalar product of vectors is related to the geometric notions of angle and length. These notions are most useful in vector spaces over real numbers, so in most of this chapter I will assume that $\mathbb{K}$ is a field where it makes sense to compare numbers (i.e. the comparison $x > y$ is defined and has the usual properties) and where statements such as $\lambda^2 \geq 0$ ($\forall\lambda \in \mathbb{K}$) hold. (Scalar products in complex spaces are defined in a different way and will be considered in Sec. 5.6.)

In order to understand the properties of spaces with a scalar product, it is helpful to define the scalar product in a purely algebraic way, without any geometric constructions. The geometric interpretation will be developed subsequently.

The scalar product of two vectors is a number, i.e. the scalar product maps a pair of vectors into a number. We will denote the scalar product by $\langle\mathbf{u}, \mathbf{v}\rangle$, or sometimes by writing it in a functional form, $S(\mathbf{u}, \mathbf{v})$.

A scalar product must be compatible with the linear structure of the vector space, so it cannot be an arbitrary map. The precise definition is the following.

Definition: A map $B : V \times V \to \mathbb{K}$ is a bilinear form in a vector space $V$ if for any vectors $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and for any $\lambda \in \mathbb{K}$,

$$B(\mathbf{u}, \mathbf{v} + \lambda\mathbf{w}) = B(\mathbf{u}, \mathbf{v}) + \lambda B(\mathbf{u}, \mathbf{w}),$$
$$B(\mathbf{v} + \lambda\mathbf{w}, \mathbf{u}) = B(\mathbf{v}, \mathbf{u}) + \lambda B(\mathbf{w}, \mathbf{u}).$$

A bilinear form $B$ is symmetric if $B(\mathbf{v}, \mathbf{w}) = B(\mathbf{w}, \mathbf{v})$ for any $\mathbf{v}$, $\mathbf{w}$. A bilinear form is nondegenerate if for any nonzero vector $\mathbf{v} \neq 0$ there exists another vector $\mathbf{w}$ such that $B(\mathbf{v}, \mathbf{w}) \neq 0$. A bilinear form is positive-definite if $B(\mathbf{v}, \mathbf{v}) > 0$ for all nonzero vectors $\mathbf{v} \neq 0$.

A scalar product in $V$ is a nondegenerate, positive-definite, symmetric bilinear form $S : V \times V \to \mathbb{K}$. The action of the scalar product on pairs of vectors is also denoted by $\langle\mathbf{v}, \mathbf{w}\rangle \equiv S(\mathbf{v}, \mathbf{w})$. A finite-dimensional vector space over $\mathbb{R}$ with a scalar product is called a Euclidean space. The length of a vector $\mathbf{v}$ is the non-negative number $\sqrt{\langle\mathbf{v}, \mathbf{v}\rangle}$. (This number is also called the norm of $\mathbf{v}$.) ■

Verifying that a map $S : V \times V \to \mathbb{K}$ is a scalar product in $V$ requires proving that $S$ is a bilinear form satisfying certain properties. For instance, the zero function $B(\mathbf{v}, \mathbf{w}) = 0$ is symmetric but is not a scalar product because it is degenerate.

Remark: The above definition of the scalar product is an "abstract definition" because it does not specify any particular scalar product in a given vector space. To specify a scalar product, one usually gives an explicit formula for computing $\langle\mathbf{a}, \mathbf{b}\rangle$. In the same space $V$, one could consider different scalar products.

Example 1: In the space $\mathbb{R}^N$, the standard scalar product is

$$\langle(x_1, ..., x_N), (y_1, ..., y_N)\rangle \equiv \sum_{j=1}^{N} x_j y_j. \tag{5.1}$$

Let us verify that this defines a symmetric, nondegenerate, and positive-definite bilinear form. This is a bilinear form because it depends linearly on each $x_j$ and on each $y_j$. This form is symmetric because it is invariant under the interchange of $x_j$ with $y_j$. This form is nondegenerate because for any $\mathbf{x} \neq 0$ at least one of $x_j$, say $x_1$, is nonzero; then the scalar product of $\mathbf{x}$ with the vector $\mathbf{w} \equiv (1, 0, 0, ..., 0)$ is nonzero. So for any $\mathbf{x} \neq 0$ there exists $\mathbf{w}$ such that $\langle\mathbf{x}, \mathbf{w}\rangle \neq 0$, which is the nondegeneracy property. Finally, the scalar product is positive-definite because for any nonzero $\mathbf{x}$ there is at least one nonzero $x_j$ and thus

$$\langle\mathbf{x}, \mathbf{x}\rangle = \langle(x_1, ..., x_N), (x_1, ..., x_N)\rangle \equiv \sum_{j=1}^{N} x_j^2 > 0.$$

Remark: The fact that a bilinear form is nondegenerate does not mean that it must always be nonzero on any two vectors. It is perfectly possible that $\langle\mathbf{a}, \mathbf{b}\rangle = 0$ while $\mathbf{a} \neq 0$ and $\mathbf{b} \neq 0$. In the usual Euclidean space, this would mean that $\mathbf{a}$ and $\mathbf{b}$ are orthogonal to each other. Nondegeneracy means that no vector is orthogonal to every other vector. It is also impossible that $\langle\mathbf{a}, \mathbf{a}\rangle = 0$ while $\mathbf{a} \neq 0$ (this contradicts the positive-definiteness).

Example 2: Consider the space $\text{End}\,V$ of linear operators in $V$. We can define a bilinear form in the space $\text{End}\,V$ as follows: For any two operators $\hat A, \hat B \in \text{End}\,V$ we set $\langle\hat A, \hat B\rangle \equiv \text{Tr}(\hat A\hat B)$. This bilinear form is not positive-definite. For example, if there is an operator $\hat J$ such that $\hat J^2 = -\hat 1_V$ then $\text{Tr}(\hat J\hat J) = -N < 0$ while $\text{Tr}(\hat 1\hat 1) = N > 0$, so neither $\text{Tr}(\hat A\hat B)$ nor $-\text{Tr}(\hat A\hat B)$ can be positive-definite. (See Exercise 4 in Sec. 5.1.2 below for more information.)

Remark: Bilinear forms that are not positive-definite (or even degenerate) are sometimes useful as "pseudo-scalar products." We will not discuss these cases here.

Exercise 1: Prove that two vectors are equal, $\mathbf{u} = \mathbf{v}$, if and only if $\langle\mathbf{u}, \mathbf{x}\rangle = \langle\mathbf{v}, \mathbf{x}\rangle$ for all vectors $\mathbf{x} \in V$.

Hint: Consider the vector $\mathbf{u} - \mathbf{v}$ and the definition of nondegeneracy of the scalar product.

Solution: If $\mathbf{u} - \mathbf{v} = 0$ then by the linearity of the scalar product $\langle\mathbf{u} - \mathbf{v}, \mathbf{x}\rangle = 0 = \langle\mathbf{u}, \mathbf{x}\rangle - \langle\mathbf{v}, \mathbf{x}\rangle$. Conversely, suppose that $\mathbf{u} \neq \mathbf{v}$; then $\mathbf{u} - \mathbf{v} \neq 0$, and (by definition of nondegeneracy of the scalar product) there exists a vector $\mathbf{x}$ such that $\langle\mathbf{u} - \mathbf{v}, \mathbf{x}\rangle \neq 0$. ■

Exercise 2: Prove that two linear operators $\hat A$ and $\hat B$ are equal as operators, $\hat A = \hat B$, if and only if $\langle\hat A\mathbf{x}, \mathbf{y}\rangle = \langle\hat B\mathbf{x}, \mathbf{y}\rangle$ for all vectors $\mathbf{x}, \mathbf{y} \in V$.

Hint: Consider the vector $\hat A\mathbf{x} - \hat B\mathbf{x}$. ■

5.1.1 Orthonormal bases

A scalar product defines an important property of a basis in $V$.


Definition: A set of vectors $\{\mathbf{e}_1, ..., \mathbf{e}_k\}$ in a space $V$ is orthonormal with respect to the scalar product if
\[ \langle \mathbf{e}_i, \mathbf{e}_j \rangle = \delta_{ij}, \quad 1 \le i, j \le k. \]
If an orthonormal set $\{\mathbf{e}_j\}$ is a basis in $V$, it is called an orthonormal basis.

Example 2: In the space $\mathbb{R}^N$ of $N$-tuples of real numbers $(x_1, ..., x_N)$, the natural scalar product is defined by the formula (5.1). Then the standard basis in $\mathbb{R}^N$, i.e. the basis consisting of vectors $(1, 0, ..., 0)$, $(0, 1, 0, ..., 0)$, ..., $(0, ..., 0, 1)$, is orthonormal with respect to this scalar product. ■

The standard properties of orthonormal bases are summarized in the following theorems.

Statement: Any orthonormal set of vectors is linearly independent.

Proof: If an orthonormal set $\{\mathbf{e}_1, ..., \mathbf{e}_k\}$ is linearly dependent, there exist numbers $\lambda_j$, not all equal to zero, such that
\[ \sum_{j=1}^{k} \lambda_j \mathbf{e}_j = 0. \]
By assumption, there exists an index $s$ such that $\lambda_s \neq 0$; then the scalar product of the above sum with $\mathbf{e}_s$ yields a contradiction,
\[ 0 = \langle 0, \mathbf{e}_s \rangle = \Big\langle \sum_{j=1}^{k} \lambda_j \mathbf{e}_j, \mathbf{e}_s \Big\rangle = \sum_{j=1}^{k} \delta_{js} \lambda_j = \lambda_s \neq 0. \]
Hence, any orthonormal set is linearly independent (although it is not necessarily a basis). ■

Theorem 1: Assume that $V$ is a finite-dimensional vector space with a scalar product and $\mathbb{K}$ is a field where one can compute square roots (i.e. for any $\lambda \in \mathbb{K}$, $\lambda > 0$ there exists another number $\mu \equiv \sqrt{\lambda} \in \mathbb{K}$ such that $\lambda = \mu^2$). Then there exists an orthonormal basis in $V$.

Proof: We can build a basis by the standard orthogonalization procedure (the Gram-Schmidt procedure). This procedure uses induction to determine a sequence of orthonormal sets $\{\mathbf{e}_1, ..., \mathbf{e}_k\}$ for $k = 1, ..., N$.

Basis of induction: Choose any nonzero vector $\mathbf{v} \in V$ and compute $\langle \mathbf{v}, \mathbf{v} \rangle$; since $\mathbf{v} \neq 0$, we have $\langle \mathbf{v}, \mathbf{v} \rangle > 0$, so $\sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$ exists, and we can define $\mathbf{e}_1$ by
\[ \mathbf{e}_1 \equiv \frac{\mathbf{v}}{\sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}}. \]
It follows that $\langle \mathbf{e}_1, \mathbf{e}_1 \rangle = 1$.

Induction step: If $\{\mathbf{e}_1, ..., \mathbf{e}_k\}$ is an orthonormal set, we need to find a vector $\mathbf{e}_{k+1}$ such that $\{\mathbf{e}_1, ..., \mathbf{e}_k, \mathbf{e}_{k+1}\}$ is again an orthonormal set. To find a suitable vector $\mathbf{e}_{k+1}$, we first take any vector $\mathbf{v}$ such that the set $\{\mathbf{e}_1, ..., \mathbf{e}_k, \mathbf{v}\}$ is linearly independent; such $\mathbf{v}$ exists if $k < N$, while for $k = N$ there is nothing left to prove. Then we define a new vector
\[ \mathbf{w} \equiv \mathbf{v} - \sum_{j=1}^{k} \langle \mathbf{e}_j, \mathbf{v} \rangle \mathbf{e}_j. \]
This vector has the property $\langle \mathbf{e}_j, \mathbf{w} \rangle = 0$ for $1 \le j \le k$. We have $\mathbf{w} \neq 0$ because (by construction) $\mathbf{v}$ is not a linear combination of $\mathbf{e}_1, ..., \mathbf{e}_k$; therefore $\langle \mathbf{w}, \mathbf{w} \rangle > 0$. Finally, we define
\[ \mathbf{e}_{k+1} \equiv \frac{\mathbf{w}}{\sqrt{\langle \mathbf{w}, \mathbf{w} \rangle}}, \]
so that $\langle \mathbf{e}_{k+1}, \mathbf{e}_{k+1} \rangle = 1$; then the set $\{\mathbf{e}_1, ..., \mathbf{e}_k, \mathbf{e}_{k+1}\}$ is orthonormal. So the required set $\{\mathbf{e}_1, ..., \mathbf{e}_{k+1}\}$ is now constructed. ■

Question: What about number fields $\mathbb{K}$ where the square root does not exist, for example the field of rational numbers $\mathbb{Q}$?

Answer: In that case, an orthonormal basis may or may not exist. For example, suppose that we consider vectors in $\mathbb{Q}^2$ and the scalar product
\[ \langle (x_1, x_2), (y_1, y_2) \rangle = x_1 y_1 + 5 x_2 y_2. \]
Then not every vector can be normalized: for instance, no rational multiple $t\,(0, 1)$ has unit length, since $5t^2 = 1$ has no solution $t \in \mathbb{Q}$. The proof of this is similar to the ancient proof of the irrationality of $\sqrt{2}$. Moreover, if $\mathbf{e}_1 = (a, b)$ and $\mathbf{e}_2 = (c, d)$ were orthonormal, the identity
\[ (a^2 + 5b^2)(c^2 + 5d^2) - (ac + 5bd)^2 = 5(ad - bc)^2 \]
would require $5(ad - bc)^2 = 1$, which is again impossible for rational $a, b, c, d$. Thus, there exists no orthonormal basis in this space with this scalar product.

Theorem 2: If $\{\mathbf{e}_j\}$ is an orthonormal basis then any vector $\mathbf{v} \in V$ is expanded according to the formula
\[ \mathbf{v} = \sum_{j=1}^{N} v_j \mathbf{e}_j, \quad v_j \equiv \langle \mathbf{e}_j, \mathbf{v} \rangle. \]
In other words, the $j$-th component of the vector $\mathbf{v}$ in the basis $\{\mathbf{e}_1, ..., \mathbf{e}_N\}$ is equal to the scalar product $\langle \mathbf{e}_j, \mathbf{v} \rangle$.

Proof: Compute the scalar product $\langle \mathbf{e}_j, \mathbf{v} \rangle$ and obtain $v_j \equiv \langle \mathbf{e}_j, \mathbf{v} \rangle$. ■

Remark: Theorem 2 shows that the components of a vector in an orthonormal basis can be computed quickly. As we have seen before, the component $v_j$ of a vector $\mathbf{v}$ in the basis $\{\mathbf{e}_j\}$ is given by the covector $\mathbf{e}_j^*$ from the dual basis, $v_j = \mathbf{e}_j^*(\mathbf{v})$. Hence, the dual basis $\{\mathbf{e}_j^*\}$ consists of linear functions
\[ \mathbf{e}_j^* : \mathbf{x} \mapsto \langle \mathbf{e}_j, \mathbf{x} \rangle. \qquad (5.2) \]
In contrast, determining the dual basis for a general (non-orthonormal) basis requires a complicated construction, such as that given in Sec. 2.3.3.

Corollary: If $\{\mathbf{e}_1, ..., \mathbf{e}_N\}$ is an arbitrary basis in $V$, there exists a scalar product with respect to which $\{\mathbf{e}_j\}$ is an orthonormal basis.

Proof: Let $\{\mathbf{e}_1^*, ..., \mathbf{e}_N^*\}$ be the dual basis in $V^*$. The required scalar product is defined by the bilinear form
\[ S(\mathbf{u}, \mathbf{v}) = \sum_{j=1}^{N} \mathbf{e}_j^*(\mathbf{u})\, \mathbf{e}_j^*(\mathbf{v}). \]
It is easy to show that the basis $\{\mathbf{e}_j\}$ is orthonormal with respect to the bilinear form $S$, namely $S(\mathbf{e}_i, \mathbf{e}_j) = \delta_{ij}$ (where $\delta_{ij}$ is the Kronecker symbol). It remains to prove that $S$ is nondegenerate and positive-definite. To prove the nondegeneracy: Suppose that $\mathbf{u} \neq 0$; then we can decompose $\mathbf{u}$ in the basis $\{\mathbf{e}_j\}$,
\[ \mathbf{u} = \sum_{j=1}^{N} u_j \mathbf{e}_j. \]
There will be at least one nonzero coefficient $u_s$, thus $S(\mathbf{e}_s, \mathbf{u}) = u_s \neq 0$. To prove that $S$ is positive-definite, compute
\[ S(\mathbf{u}, \mathbf{u}) = \sum_{j=1}^{N} u_j^2 > 0 \]
as long as at least one coefficient $u_j$ is nonzero. ■
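The Gram-Schmidt procedure from the proof of Theorem 1 can be sketched in code. The following is a minimal illustration only, assuming $V = \mathbb{R}^N$ with the natural scalar product of Example 2 and floating-point arithmetic; the names `dot` and `gram_schmidt` and the tolerance `tol` are chosen here for illustration and do not come from the text.

```python
from math import sqrt

def dot(u, v):
    """Natural scalar product on R^N (an assumption made for this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, inner=dot, tol=1e-12):
    """Return an orthonormal list spanning the same subspace as `vectors`."""
    basis = []
    for v in vectors:
        # Induction step: w = v - sum_j <e_j, v> e_j
        w = list(v)
        for e in basis:
            c = inner(e, v)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm2 = inner(w, w)
        if norm2 <= tol:
            continue  # v is (numerically) a linear combination of earlier vectors
        norm = sqrt(norm2)
        basis.append([wi / norm for wi in w])  # e_{k+1} = w / sqrt(<w, w>)
    return basis

es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

# Theorem 2: components in an orthonormal basis are v_j = <e_j, v>.
v = [2.0, -1.0, 3.0]
components = [dot(e, v) for e in es]
rebuilt = [sum(c * e[i] for c, e in zip(components, es)) for i in range(3)]
```

Passing a different `inner` function would correspond to a different scalar product; the square root is exactly the step that fails over a field such as $\mathbb{Q}$.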


Exercise 1: Let $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ be a basis in $V$, and let $\{\mathbf{e}_1, ..., \mathbf{e}_N\}$ be an orthonormal basis. Show that the linear operator
\[ \hat{A}\mathbf{x} \equiv \sum_{i=1}^{N} \langle \mathbf{e}_i, \mathbf{x} \rangle \mathbf{v}_i \]
maps the basis $\{\mathbf{e}_i\}$ into the basis $\{\mathbf{v}_i\}$.

Exercise 2: Let $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ with $n < N$ be a linearly independent set (not necessarily orthonormal). Show that this set can be completed to a basis $\{\mathbf{v}_1, ..., \mathbf{v}_n, \mathbf{e}_{n+1}, ..., \mathbf{e}_N\}$ in $V$, such that every vector $\mathbf{e}_j$ ($j = n+1, ..., N$) is orthogonal to every vector $\mathbf{v}_i$ ($i = 1, ..., n$).

Hint: Follow the proof of Theorem 1 but begin the Gram-Schmidt procedure at step $n$, without orthogonalizing the vectors $\mathbf{v}_i$.

Exercise 3: Let $\{\mathbf{e}_1, ..., \mathbf{e}_N\}$ be an orthonormal basis, and let $v_i \equiv \langle \mathbf{v}, \mathbf{e}_i \rangle$. Show that
\[ \langle \mathbf{v}, \mathbf{v} \rangle = \sum_{i=1}^{N} |v_i|^2. \]

Exercise 4: Consider the space of polynomials of degree at most 2 in the variable $x$. Let us define the scalar product of two polynomials $p_1(x)$ and $p_2(x)$ by the formula
\[ \langle p_1, p_2 \rangle = \frac{1}{2} \int_{-1}^{1} p_1(x) p_2(x)\, dx. \]
Find a linear polynomial $q_1(x)$ and a quadratic polynomial $q_2(x)$ such that $\{1, q_1, q_2\}$ is an orthonormal basis in this space.

Remark: Some of the properties of the scalar product are related in an essential way to the assumption that we are working with real numbers. As an example of what could go wrong if we naively extended the same results to complex vector spaces, let us consider a vector $\mathbf{x} = (1, i) \in \mathbb{C}^2$ and compute its scalar product with itself by the formula
\[ \langle \mathbf{x}, \mathbf{x} \rangle = x_1^2 + x_2^2 = 1^2 + i^2 = 0. \]
Hence we have a nonzero vector whose "length" is zero. To correct this problem when working with complex numbers, one usually considers a different kind of scalar product designed for complex vector spaces. For instance, the scalar product in $\mathbb{C}^n$ is defined by the formula
\[ \langle (x_1, ..., x_n), (y_1, ..., y_n) \rangle = \sum_{j=1}^{n} x_j^* y_j, \]
where $x_j^*$ is the complex conjugate of the component $x_j$. This scalar product is called Hermitian and has the property
\[ \langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle^*, \]
that is, it is not symmetric but becomes complex-conjugated when the order of vectors is interchanged. According to this scalar product, we have for the vector $\mathbf{x} = (1, i) \in \mathbb{C}^2$ a sensible result,
\[ \langle \mathbf{x}, \mathbf{x} \rangle = x_1^* x_1 + x_2^* x_2 = |1|^2 + |i|^2 = 2. \]
More generally, for $\mathbf{x} \neq 0$
\[ \langle \mathbf{x}, \mathbf{x} \rangle = \sum_{i=1}^{N} |x_i|^2 > 0. \]
In this text, I will use this kind of scalar product only once (Sec. 5.6).

5.1.2 Correspondence between vectors and covectors

Let us temporarily consider the scalar product $\langle \mathbf{v}, \mathbf{x} \rangle$ as a function of $\mathbf{x}$ for a fixed $\mathbf{v}$. We may denote this function by $\mathbf{f}^*$. So $\mathbf{f}^* : \mathbf{x} \mapsto \langle \mathbf{v}, \mathbf{x} \rangle$ is a linear map $V \to \mathbb{K}$, i.e. (by definition) an element of $V^*$. Thus, a covector $\mathbf{f}^* \in V^*$ is determined for every $\mathbf{v}$. Therefore we have defined a map $V \to V^*$ whereby a vector $\mathbf{v}$ is mapped to the covector $\mathbf{f}^*$, which is defined by its action on vectors $\mathbf{x}$ as follows,
\[ \mathbf{v} \mapsto \mathbf{f}^*; \quad \mathbf{f}^*(\mathbf{x}) \equiv \langle \mathbf{v}, \mathbf{x} \rangle, \quad \forall \mathbf{x} \in V. \qquad (5.3) \]
This map is an isomorphism between $V$ and $V^*$ (not a canonical one, since it depends on the choice of the scalar product), as the following statement shows.

Statement 1: A nondegenerate bilinear form $B : V \otimes V \to \mathbb{K}$ defines an isomorphism $V \to V^*$ by the formula $\mathbf{v} \mapsto \mathbf{f}^*$, $\mathbf{f}^*(\mathbf{x}) \equiv B(\mathbf{v}, \mathbf{x})$.

Proof: We need to show that the map $\hat{B} : V \to V^*$ is a linear one-to-one (bijective) map. Linearity easily follows from the bilinearity of $B$. Bijectivity requires that no two different vectors are mapped into one and the same covector, and that any covector is an image of some vector. If two vectors $\mathbf{u} \neq \mathbf{v}$ are mapped into one covector $\mathbf{f}^*$ then $\hat{B}(\mathbf{u} - \mathbf{v}) = \mathbf{f}^* - \mathbf{f}^* = 0 \in V^*$, in other words, $B(\mathbf{u} - \mathbf{v}, \mathbf{x}) = 0$ for all $\mathbf{x}$. However, from the nondegeneracy of $B$ it follows that there exists $\mathbf{x} \in V$ such that $B(\mathbf{u} - \mathbf{v}, \mathbf{x}) \neq 0$, which gives a contradiction. Finally, consider a basis $\{\mathbf{v}_j\}$ in $V$. Its image $\{\hat{B}\mathbf{v}_1, ..., \hat{B}\mathbf{v}_N\}$ must be a linearly independent set in $V^*$ because a vanishing linear combination
\[ \sum_k \lambda_k \hat{B}\mathbf{v}_k = 0 = \hat{B}\Big( \sum_k \lambda_k \mathbf{v}_k \Big) \]
entails $\sum_k \lambda_k \mathbf{v}_k = 0$ (we just proved that a nonzero vector cannot be mapped into the zero covector), and hence all $\lambda_k = 0$. Therefore $\{\hat{B}\mathbf{v}_1, ..., \hat{B}\mathbf{v}_N\}$ is a basis in $V^*$, and any covector $\mathbf{f}^*$ is a linear combination
\[ \mathbf{f}^* = \sum_k f_k^* \hat{B}\mathbf{v}_k = \hat{B}\Big( \sum_k f_k^* \mathbf{v}_k \Big). \]
It follows that any covector $\mathbf{f}^*$ is an image of some vector from $V$. Thus $\hat{B}$ is a one-to-one map. ■

Let us show explicitly how to use the scalar product in order to map vectors to covectors and vice versa.

Example: We use the scalar product as the bilinear form $B$, so $B(\mathbf{x}, \mathbf{y}) \equiv \langle \mathbf{x}, \mathbf{y} \rangle$. Suppose $\{\mathbf{e}_j\}$ is an orthonormal basis. What is the covector $\hat{B}\mathbf{e}_1$? By Eq. (5.3), this covector acts on an arbitrary vector $\mathbf{x}$ as
\[ \hat{B}\mathbf{e}_1(\mathbf{x}) = \langle \mathbf{e}_1, \mathbf{x} \rangle \equiv x_1, \]
where $x_1$ is the first component of the vector $\mathbf{x}$ in the basis $\{\mathbf{e}_j\}$, i.e. $\mathbf{x} = \sum_{i=1}^{N} x_i \mathbf{e}_i$. We find that $\hat{B}\mathbf{e}_1$ is the same as the covector $\mathbf{e}_1^*$ from the basis $\{\mathbf{e}_j^*\}$ dual to $\{\mathbf{e}_j\}$.

Suppose $\mathbf{f}^* \in V^*$ is a given covector. What is its pre-image $\hat{B}^{-1}\mathbf{f}^* \in V$? It is a vector $\mathbf{v}$ such that $\mathbf{f}^*(\mathbf{x}) = \langle \mathbf{v}, \mathbf{x} \rangle$ for any $\mathbf{x} \in V$. In order to determine $\mathbf{v}$, let us substitute the basis vectors $\mathbf{e}_j$ instead of $\mathbf{x}$; we then obtain
\[ \mathbf{f}^*(\mathbf{e}_j) = \langle \mathbf{v}, \mathbf{e}_j \rangle. \]


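Since the covector $\mathbf{f}^*$ is given, the numbers $\mathbf{f}^*(\mathbf{e}_j)$ are known, and hence
\[ \mathbf{v} = \sum_{j=1}^{N} \mathbf{e}_j \langle \mathbf{v}, \mathbf{e}_j \rangle = \sum_{j=1}^{N} \mathbf{e}_j\, \mathbf{f}^*(\mathbf{e}_j). \]
■

Bilinear forms can be viewed as elements of the space $V^* \otimes V^*$.

Statement 2: All bilinear forms in $V$ constitute a vector space canonically isomorphic to $V^* \otimes V^*$. A basis $\{\mathbf{e}_j\}$ is orthonormal with respect to the bilinear form
\[ B \equiv \sum_{j=1}^{N} \mathbf{e}_j^* \otimes \mathbf{e}_j^*. \]
Proof: Left as exercise. ■

Exercise 1: Let $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ be a basis in $V$ (not necessarily orthonormal), and denote by $\{\mathbf{v}_i^*\}$ the dual basis to $\{\mathbf{v}_i\}$. The dual basis is a basis in $V^*$. Now, we can map $\{\mathbf{v}_i^*\}$ into a basis $\{\mathbf{u}_i\}$ in $V$ using the covector-vector correspondence. Show that $\langle \mathbf{v}_i, \mathbf{u}_j \rangle = \delta_{ij}$. Use this formula to show that this construction, applied to an orthonormal basis $\{\mathbf{e}_i\}$, yields again the same basis $\{\mathbf{e}_i\}$.

Hint: If vectors $\mathbf{x}$ and $\mathbf{y}$ have the same scalar products $\langle \mathbf{v}_i, \mathbf{x} \rangle = \langle \mathbf{v}_i, \mathbf{y} \rangle$ (for $i = 1, ..., N$) then $\mathbf{x} = \mathbf{y}$.

Exercise 2: Let $\{\mathbf{v}_1, ..., \mathbf{v}_N\}$ be a given (not necessarily orthonormal) basis in $V$, and denote by $\{\mathbf{v}_i^*\}$ the dual basis to $\{\mathbf{v}_i\}$. Due to the vector-covector correspondence, $\{\mathbf{v}_i^*\}$ is mapped into a basis $\{\mathbf{u}_j\}$ in $V$, so the tensor
\[ \hat{1}_V \equiv \sum_{i=1}^{N} \mathbf{v}_i \otimes \mathbf{v}_i^* \]
is mapped into a bilinear form $B$ acting as
\[ B(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{N} \langle \mathbf{v}_i, \mathbf{x} \rangle \langle \mathbf{u}_i, \mathbf{y} \rangle. \]
Show that this bilinear form coincides with the scalar product, i.e.
\[ B(\mathbf{x}, \mathbf{y}) = \langle \mathbf{x}, \mathbf{y} \rangle, \quad \forall \mathbf{x}, \mathbf{y} \in V. \]
Hint: Since $\sum_{i=1}^{N} \mathbf{v}_i \otimes \mathbf{v}_i^* = \hat{1}_V$, we have $\sum_{i=1}^{N} \mathbf{v}_i \langle \mathbf{u}_i, \mathbf{y} \rangle = \mathbf{y}$.

Exercise 3: If a scalar product $\langle \cdot, \cdot \rangle$ is given in $V$, a scalar product $\langle \cdot, \cdot \rangle_*$ can be constructed also in $V^*$ as follows: Given any two covectors $\mathbf{f}^*, \mathbf{g}^* \in V^*$, we map them into vectors $\mathbf{u}, \mathbf{v} \in V$ and then define
\[ \langle \mathbf{f}^*, \mathbf{g}^* \rangle_* \equiv \langle \mathbf{u}, \mathbf{v} \rangle. \]
Show that this scalar product is bilinear and positive-definite if $\langle \cdot, \cdot \rangle$ is. For an orthonormal basis $\{\mathbf{e}_j\}$, show that the dual basis $\{\mathbf{e}_j^*\}$ in $V^*$ is also orthonormal with respect to this scalar product.

Exercise 4:* Consider the space $\mathrm{End}\, V$ of linear operators in a vector space $V$ with $\dim V \ge 2$. A bilinear form in the space $\mathrm{End}\, V$ is defined as follows: for any two operators $\hat{A}, \hat{B} \in \mathrm{End}\, V$ we set $\langle \hat{A}, \hat{B} \rangle \equiv \mathrm{Tr}(\hat{A}\hat{B})$. Show that $\langle \hat{A}, \hat{B} \rangle$ is bilinear, symmetric, and nondegenerate, but not positive-definite.

Hint: To show nondegeneracy, consider a nonzero operator $\hat{A}$; there exists $\mathbf{v} \in V$ such that $\hat{A}\mathbf{v} \neq 0$, and then one can choose $\mathbf{f}^* \in V^*$ such that $\mathbf{f}^*(\hat{A}\mathbf{v}) \neq 0$; then define $\hat{B} \equiv \mathbf{v} \otimes \mathbf{f}^*$ and verify that $\langle \hat{A}, \hat{B} \rangle$ is nonzero. To show that the scalar product is not positive-definite, consider $\hat{C} = \mathbf{v} \otimes \mathbf{f}^* + \mathbf{w} \otimes \mathbf{g}^*$ and choose the vectors and the covectors appropriately so that $\mathrm{Tr}(\hat{C}^2) < 0$.

5.1.3 * Example: bilinear forms on $V \oplus V^*$

If $V$ is a vector space then the space $V \oplus V^*$ has two canonically defined bilinear forms that could be useful under certain circumstances (when positive-definiteness is not required). This construction is used in abstract algebra, and I mention it here as an example of a purely algebraic and basis-free definition of a bilinear form.

If $(\mathbf{u}, \mathbf{f}^*)$ and $(\mathbf{v}, \mathbf{g}^*)$ are two elements of $V \oplus V^*$, a canonical bilinear form is defined by the formula
\[ \langle (\mathbf{u}, \mathbf{f}^*), (\mathbf{v}, \mathbf{g}^*) \rangle = \mathbf{f}^*(\mathbf{v}) + \mathbf{g}^*(\mathbf{u}). \qquad (5.4) \]
This formula does not define a positive-definite bilinear form because
\[ \langle (\mathbf{u}, \mathbf{f}^*), (\mathbf{u}, \mathbf{f}^*) \rangle = 2\mathbf{f}^*(\mathbf{u}), \]
which can be positive, negative, or zero for some $(\mathbf{u}, \mathbf{f}^*) \in V \oplus V^*$.

Statement: The bilinear form defined by Eq. (5.4) is symmetric and nondegenerate.

Proof: The symmetry is obvious from Eq. (5.4). Then for any nonzero vector $(\mathbf{u}, \mathbf{f}^*)$ we need to find a vector $(\mathbf{v}, \mathbf{g}^*)$ such that $\langle (\mathbf{u}, \mathbf{f}^*), (\mathbf{v}, \mathbf{g}^*) \rangle \neq 0$. By assumption, either $\mathbf{u} \neq 0$ or $\mathbf{f}^* \neq 0$ or both. If $\mathbf{u} \neq 0$, there exists a covector $\mathbf{g}^*$ such that $\mathbf{g}^*(\mathbf{u}) \neq 0$; then we choose $\mathbf{v} = 0$. If $\mathbf{f}^* \neq 0$, there exists a vector $\mathbf{v}$ such that $\mathbf{f}^*(\mathbf{v}) \neq 0$, and then we choose $\mathbf{g}^* = 0$. Thus the nondegeneracy is proved. ■

Alternatively, there is a canonically defined antisymmetric bilinear form (or 2-form),
\[ \langle (\mathbf{u}, \mathbf{f}^*), (\mathbf{v}, \mathbf{g}^*) \rangle = \mathbf{f}^*(\mathbf{v}) - \mathbf{g}^*(\mathbf{u}). \]
This bilinear form is also nondegenerate (the same proof goes through as for the symmetric bilinear form above). Nevertheless, neither of the two bilinear forms can serve as a scalar product: the former lacks positive-definiteness, the latter is antisymmetric rather than symmetric.

5.1.4 Scalar product in index notation

In the index notation, the scalar product tensor $S \in V^* \otimes V^*$ is represented by a matrix $S_{ij}$ (with lower indices), and so the scalar product of two vectors is written as
\[ \langle \mathbf{u}, \mathbf{v} \rangle = u^i v^j S_{ij}. \]
Alternatively, one uses the vector-to-covector map $\hat{S} : V \to V^*$ and writes
\[ \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^*(\mathbf{v}) = u_i v^i, \]
where the covector $\mathbf{u}^*$ is defined by
\[ \mathbf{u}^* \equiv \hat{S}\mathbf{u} \;\Rightarrow\; u_i \equiv S_{ij} u^j. \]
Typically, in the index notation one uses the same symbol to denote a vector, $u^i$, and the corresponding covector, $u_i$. This is unambiguous as long as the scalar product is fixed.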
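The index formulas of Sec. 5.1.4 can be checked numerically. The sketch below assumes a two-dimensional space and a sample symmetric positive-definite matrix `S` picked purely for illustration; the function names `scalar_product` and `lower_index` are hypothetical and do not come from the text. It compares $u^i v^j S_{ij}$ with $u_i v^i$, where $u_i = S_{ij} u^j$.

```python
# Sample components S_ij of a scalar product (an assumption for illustration):
# symmetric and positive-definite.
S = [[2.0, 1.0],
     [1.0, 3.0]]

def scalar_product(u, v, S):
    """<u, v> = u^i v^j S_ij, summed over both indices."""
    n = len(S)
    return sum(u[i] * v[j] * S[i][j] for i in range(n) for j in range(n))

def lower_index(u, S):
    """Covector components u_i = S_ij u^j."""
    n = len(S)
    return [sum(S[i][j] * u[j] for j in range(n)) for i in range(n)]

u = [1.0, 2.0]    # components u^i
v = [3.0, -1.0]   # components v^i

direct = scalar_product(u, v, S)                         # u^i v^j S_ij
u_low = lower_index(u, S)                                # u_i
via_covector = sum(ui * vi for ui, vi in zip(u_low, v))  # u_i v^i
```

Both routes give the same number, which is the sense in which the symbols $u^i$ and $u_i$ can share one letter once $S_{ij}$ is fixed.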


5.2 Orthogonal subspaces

From now on, we work in a real, $N$-dimensional vector space $V$ equipped with a scalar product.

We call two subspaces $V_1 \subset V$ and $V_2 \subset V$ orthogonal if every vector from $V_1$ is orthogonal to every vector from $V_2$. An important example of orthogonal subspaces is given by the construction of the orthogonal complement.

Definition: The set of vectors orthogonal to a given vector $\mathbf{v}$ is denoted by $\mathbf{v}^\perp$ and is called the orthogonal complement of the vector $\mathbf{v}$. Written as a formula:
\[ \mathbf{v}^\perp = \{ \mathbf{x} \mid \mathbf{x} \in V,\ \langle \mathbf{x}, \mathbf{v} \rangle = 0 \}. \]
Similarly, the set of vectors orthogonal to each of the vectors $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ is denoted by $\{\mathbf{v}_1, ..., \mathbf{v}_n\}^\perp$.

Examples: If $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4\}$ is an orthonormal basis in $V$ then the subspace $\mathrm{Span}\{\mathbf{e}_1, \mathbf{e}_3\}$ is orthogonal to the subspace $\mathrm{Span}\{\mathbf{e}_2, \mathbf{e}_4\}$ because any linear combination of $\mathbf{e}_1$ and $\mathbf{e}_3$ is orthogonal to any linear combination of $\mathbf{e}_2$ and $\mathbf{e}_4$. The orthogonal complement of $\mathbf{e}_1$ is
\[ \mathbf{e}_1^\perp = \mathrm{Span}\{\mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4\}. \]

Statement 1: (1) The orthogonal complement $\{\mathbf{v}_1, ..., \mathbf{v}_n\}^\perp$ is a subspace of $V$.
(2) Every vector from the subspace $\mathrm{Span}\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ is orthogonal to every vector from $\{\mathbf{v}_1, ..., \mathbf{v}_n\}^\perp$.

Proof: (1) If two vectors $\mathbf{x}, \mathbf{y}$ belong to $\{\mathbf{v}_1, ..., \mathbf{v}_n\}^\perp$, it means that $\langle \mathbf{v}_i, \mathbf{x} \rangle = 0$ and $\langle \mathbf{v}_i, \mathbf{y} \rangle = 0$ for $i = 1, ..., n$. Since the scalar product is linear, it follows that
\[ \langle \mathbf{v}_i, \mathbf{x} + \lambda \mathbf{y} \rangle = 0, \quad i = 1, ..., n. \]
Therefore, any linear combination of $\mathbf{x}$ and $\mathbf{y}$ also belongs to $\{\mathbf{v}_1, ..., \mathbf{v}_n\}^\perp$. This is the same as to say that $\{\mathbf{v}_1, ..., \mathbf{v}_n\}^\perp$ is a subspace of $V$.
(2) Suppose $\mathbf{x} \in \mathrm{Span}\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ and $\mathbf{y} \in \{\mathbf{v}_1, ..., \mathbf{v}_n\}^\perp$; then we may express $\mathbf{x} = \sum_{i=1}^{n} \lambda_i \mathbf{v}_i$ with some coefficients $\lambda_i$, while $\langle \mathbf{v}_i, \mathbf{y} \rangle = 0$ for $i = 1, ..., n$. It follows from the linearity of the scalar product that
\[ \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} \langle \lambda_i \mathbf{v}_i, \mathbf{y} \rangle = 0. \]
Hence, every such $\mathbf{x}$ is orthogonal to every such $\mathbf{y}$. ■

Definition: If $U \subset V$ is a given subspace, the orthogonal complement $U^\perp$ is defined as the subspace of vectors that are orthogonal to every vector from $U$. (It is easy to see that all these vectors form a subspace.)

Exercise 1: Given a subspace $U \subset V$, we may choose a basis $\{\mathbf{u}_1, ..., \mathbf{u}_n\}$ in $U$ and then construct the orthogonal complement $\{\mathbf{u}_1, ..., \mathbf{u}_n\}^\perp$ as defined above. Show that the subspace $\{\mathbf{u}_1, ..., \mathbf{u}_n\}^\perp$ is the same as $U^\perp$ independently of the choice of the basis $\{\mathbf{u}_j\}$ in $U$. ■

The space $V$ can be decomposed into a direct sum of orthogonal subspaces.

Statement 2: Given a subspace $U \subset V$, we can construct its orthogonal complement $U^\perp \subset V$. Then $V = U \oplus U^\perp$; in other words, every vector $\mathbf{x} \in V$ can be uniquely decomposed as $\mathbf{x} = \mathbf{u} + \mathbf{w}$ where $\mathbf{u} \in U$ and $\mathbf{w} \in U^\perp$.

Proof: Choose a basis $\{\mathbf{u}_1, ..., \mathbf{u}_n\}$ of $U$. If $n = N$, the orthogonal complement $U^\perp$ is the zero-dimensional subspace, so there is nothing left to prove. If $n < N$, we may choose some additional vectors $\mathbf{e}_{n+1}, ..., \mathbf{e}_N$ such that the set $\{\mathbf{u}_1, ..., \mathbf{u}_n, \mathbf{e}_{n+1}, ..., \mathbf{e}_N\}$ is a basis in $V$ and every vector $\mathbf{e}_j$ is orthogonal to every vector $\mathbf{u}_i$. Such a basis exists (see Exercise 2 in Sec. 5.1.1). Then every vector $\mathbf{x} \in V$ can be decomposed as
\[ \mathbf{x} = \sum_{i=1}^{n} \lambda_i \mathbf{u}_i + \sum_{i=n+1}^{N} \mu_i \mathbf{e}_i \equiv \mathbf{u} + \mathbf{w}. \]
This decomposition provides the required decomposition of $\mathbf{x}$ into two vectors.

It remains to show that this decomposition is unique (in particular, independent of the choice of bases). If there were two different such decompositions, say $\mathbf{x} = \mathbf{u} + \mathbf{w} = \mathbf{u}' + \mathbf{w}'$, we would have
\[ 0 = \langle \mathbf{u} - \mathbf{u}' + \mathbf{w} - \mathbf{w}', \mathbf{y} \rangle, \quad \forall \mathbf{y} \in V. \]
Let us now show that $\mathbf{u} = \mathbf{u}'$ and $\mathbf{w} = \mathbf{w}'$: Taking an arbitrary $\mathbf{y} \in U$, we have $\langle \mathbf{w} - \mathbf{w}', \mathbf{y} \rangle = 0$ and hence find that $\mathbf{u} - \mathbf{u}'$ is orthogonal to $\mathbf{y}$. It means that the vector $\mathbf{u} - \mathbf{u}' \in U$ is orthogonal to every vector $\mathbf{y} \in U$, e.g. to $\mathbf{y} \equiv \mathbf{u} - \mathbf{u}'$; since the scalar product of a nonzero vector with itself cannot be equal to zero, we must have $\mathbf{u} - \mathbf{u}' = 0$. Similarly, by taking an arbitrary $\mathbf{z} \in U^\perp$, we find that $\mathbf{w} - \mathbf{w}'$ is orthogonal to $\mathbf{z}$, hence we must have $\mathbf{w} - \mathbf{w}' = 0$. ■

An important operation is the orthogonal projection onto a subspace.

Statement 3: There are many projectors onto a given subspace $U \subset V$, but only one projector $\hat{P}_U$ that preserves the scalar product with vectors from $U$. Namely, there exists a unique linear operator $\hat{P}_U$, called the orthogonal projector onto the subspace $U$, such that
\[ \hat{P}_U \hat{P}_U = \hat{P}_U; \quad (\hat{P}_U \mathbf{x}) \in U \ \text{for all } \mathbf{x} \in V \quad \text{(projection property)}; \]
\[ \langle \hat{P}_U \mathbf{x}, \mathbf{a} \rangle = \langle \mathbf{x}, \mathbf{a} \rangle, \quad \forall \mathbf{x} \in V,\ \mathbf{a} \in U \quad \text{(preserves } \langle \cdot, \cdot \rangle \text{)}. \]

Remark: The name "orthogonal projections" (this is quite different from "orthogonal transformations" defined in the next section!) comes from a geometric analogy: Projecting a three-dimensional vector orthogonally onto a plane means that the projection does not add to the vector any components parallel to the plane. The vector is "cast down" in the direction normal to the plane. The projection modifies a vector $\mathbf{x}$ by adding to it some vector orthogonal to the plane; this modification preserves the scalar products of $\mathbf{x}$ with vectors in the plane. Perhaps a better word would be "normal projection."

Proof: Suppose $\{\mathbf{u}_1, ..., \mathbf{u}_n\}$ is a basis in the subspace $U$, and assume that $n < N$ (or else $U = V$ and there exists only one projector onto $U$, namely the identity operator, which preserves the scalar product, so there is nothing left to prove). We may complete the basis $\{\mathbf{u}_1, ..., \mathbf{u}_n\}$ of $U$ to a basis $\{\mathbf{u}_1, ..., \mathbf{u}_n, \mathbf{e}_{n+1}, ..., \mathbf{e}_N\}$ in the entire space $V$. Let $\{\mathbf{u}_1^*, ..., \mathbf{u}_n^*, \mathbf{e}_{n+1}^*, ..., \mathbf{e}_N^*\}$ be the corresponding dual basis. Then a projector onto $U$ can be defined by
\[ \hat{P} = \sum_{i=1}^{n} \mathbf{u}_i \otimes \mathbf{u}_i^*, \]


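that is, $\hat{P}\mathbf{x}$ simply omits the components of the vector $\mathbf{x}$ parallel to any $\mathbf{e}_j$ ($j = n+1, ..., N$). For example, the operator $\hat{P}$ maps the linear combination $\lambda \mathbf{u}_1 + \mu \mathbf{e}_{n+1}$ to $\lambda \mathbf{u}_1$, omitting the component parallel to $\mathbf{e}_{n+1}$. There are infinitely many ways of choosing $\{\mathbf{e}_j \mid j = n+1, ..., N\}$; for instance, one can add to $\mathbf{e}_{n+1}$ an arbitrary linear combination of $\{\mathbf{u}_j\}$ and obtain another possible choice of $\mathbf{e}_{n+1}$. Hence there are infinitely many possible projectors onto $U$.

While all these projectors satisfy the projection property, not all of them preserve the scalar product. The orthogonal projector is the one obtained from a particular completion of the basis, namely such that every vector $\mathbf{e}_j$ is orthogonal to every vector $\mathbf{u}_i$. Such a basis exists (see Exercise 2 in Sec. 5.1.1). Using the construction shown above, we obtain a projector that we will denote $\hat{P}_U$. We will now show that this projector is unique and satisfies the scalar product preservation property.

The scalar product is preserved for the following reason. For any $\mathbf{x} \in V$, we have a unique decomposition $\mathbf{x} = \mathbf{u} + \mathbf{w}$, where $\mathbf{u} \in U$ and $\mathbf{w} \in U^\perp$. The definition of $\hat{P}_U$ guarantees that $\hat{P}_U \mathbf{x} = \mathbf{u}$. Hence
\[ \langle \mathbf{x}, \mathbf{a} \rangle = \langle \mathbf{u} + \mathbf{w}, \mathbf{a} \rangle = \langle \mathbf{u}, \mathbf{a} \rangle = \langle \hat{P}_U \mathbf{x}, \mathbf{a} \rangle, \quad \forall \mathbf{x} \in V,\ \mathbf{a} \in U. \]
Now the uniqueness: If there were two projectors $\hat{P}_U$ and $\hat{P}'_U$, both satisfying the scalar product preservation property, then
\[ \langle (\hat{P}_U - \hat{P}'_U)\mathbf{x}, \mathbf{u} \rangle = 0 \quad \forall \mathbf{x} \in V,\ \mathbf{u} \in U. \]
For a given $\mathbf{x} \in V$, the vector $\mathbf{y} \equiv (\hat{P}_U - \hat{P}'_U)\mathbf{x}$ belongs to $U$ and is orthogonal to every vector in $U$. Therefore $\mathbf{y} = 0$. It follows that $(\hat{P}_U - \hat{P}'_U)\mathbf{x} = 0$ for any $\mathbf{x} \in V$, i.e. the operator $(\hat{P}_U - \hat{P}'_U)$ is equal to zero. ■

Example: Given a nonzero vector $\mathbf{v} \in V$, let us construct the orthogonal projector onto the subspace $\mathbf{v}^\perp$. It seems (judging from the proof of Statement 3) that we need to choose a basis in $\mathbf{v}^\perp$. However, the projector (as we know) is in fact independent of the choice of the basis and can be constructed as follows:
\[ \hat{P}_{\mathbf{v}^\perp} \mathbf{x} \equiv \mathbf{x} - \mathbf{v}\, \frac{\langle \mathbf{v}, \mathbf{x} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle}. \]
It is easy to check that this is indeed a projector onto $\mathbf{v}^\perp$, namely we can check that $\langle \hat{P}_{\mathbf{v}^\perp} \mathbf{x}, \mathbf{v} \rangle = 0$ for all $\mathbf{x} \in V$, and that $\mathbf{v}^\perp$ is an invariant subspace under $\hat{P}_{\mathbf{v}^\perp}$.

Exercise 2: Construct an orthogonal projector $\hat{P}_{\mathbf{v}}$ onto the space spanned by the vector $\mathbf{v}$.

Answer: $\hat{P}_{\mathbf{v}} \mathbf{x} = \mathbf{v}\, \dfrac{\langle \mathbf{v}, \mathbf{x} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle}$.

5.2.1 Affine hyperplanes

Suppose $\mathbf{n} \in V$ is a given nonzero vector and $\alpha$ a given number. The set of vectors $\mathbf{x}$ satisfying the equation
\[ \langle \mathbf{n}, \mathbf{x} \rangle = \alpha \]
is called an affine hyperplane. Note that an affine hyperplane is not necessarily a subspace of $V$ because $\mathbf{x} = 0$ does not belong to the hyperplane when $\alpha \neq 0$.

The geometric interpretation of a hyperplane follows from the fact that the difference of any two vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, both belonging to the hyperplane, satisfies
\[ \langle \mathbf{n}, \mathbf{x}_1 - \mathbf{x}_2 \rangle = 0. \]
Hence, all vectors in the hyperplane can be represented as a sum of one such vector, say $\mathbf{x}_0$, and an arbitrary vector orthogonal to $\mathbf{n}$. Geometrically, this means that the hyperplane is orthogonal to the vector $\mathbf{n}$ and may be shifted from the origin.

Example: Let us consider an affine hyperplane given by the equation $\langle \mathbf{n}, \mathbf{x} \rangle = 1$, and let us compute the shortest vector belonging to the hyperplane. Any vector $\mathbf{x} \in V$ can be written as
\[ \mathbf{x} = \lambda \mathbf{n} + \mathbf{b}, \]
where $\mathbf{b}$ is some vector such that $\langle \mathbf{n}, \mathbf{b} \rangle = 0$. If $\mathbf{x}$ belongs to the hyperplane, we have
\[ 1 = \langle \mathbf{n}, \mathbf{x} \rangle = \langle \mathbf{n}, \lambda \mathbf{n} + \mathbf{b} \rangle = \lambda \langle \mathbf{n}, \mathbf{n} \rangle. \]
Hence, we must have
\[ \lambda = \frac{1}{\langle \mathbf{n}, \mathbf{n} \rangle}. \]
The squared length of $\mathbf{x}$ is then computed as
\[ \langle \mathbf{x}, \mathbf{x} \rangle = \lambda^2 \langle \mathbf{n}, \mathbf{n} \rangle + \langle \mathbf{b}, \mathbf{b} \rangle = \frac{1}{\langle \mathbf{n}, \mathbf{n} \rangle} + \langle \mathbf{b}, \mathbf{b} \rangle \ge \frac{1}{\langle \mathbf{n}, \mathbf{n} \rangle}. \]
The inequality becomes an equality when $\mathbf{b} = 0$, i.e. when $\mathbf{x} = \lambda \mathbf{n}$. Therefore, the smallest possible length of $\mathbf{x}$ is equal to $\sqrt{\lambda}$, which is equal to the inverse length of $\mathbf{n}$.

Exercise: Compute the shortest distance between two parallel hyperplanes defined by equations $\langle \mathbf{n}, \mathbf{x} \rangle = \alpha$ and $\langle \mathbf{n}, \mathbf{x} \rangle = \beta$.

Answer:
\[ \frac{|\alpha - \beta|}{\sqrt{\langle \mathbf{n}, \mathbf{n} \rangle}}. \]

5.3 Orthogonal transformations

Definition: An operator $\hat{A}$ is called an orthogonal transformation with respect to the scalar product $\langle \cdot, \cdot \rangle$ if
\[ \langle \hat{A}\mathbf{v}, \hat{A}\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle, \quad \forall \mathbf{v}, \mathbf{w} \in V. \]
(We use the words "transformation" and "operator" interchangeably since we are always working within the same vector space $V$.)

5.3.1 Examples and properties

Example 1: Rotation by a fixed angle is an orthogonal transformation in a Euclidean plane. It is easy to see that such a rotation preserves scalar products (angles and lengths are preserved by a rotation). Let us define this transformation by a formula. If $\{\mathbf{e}_1, \mathbf{e}_2\}$ is a positively oriented orthonormal basis in the Euclidean plane, then we define the rotation $\hat{R}_\alpha$ of the plane by angle $\alpha$ in the counter-clockwise direction by
\[ \hat{R}_\alpha \mathbf{e}_1 \equiv \mathbf{e}_1 \cos\alpha - \mathbf{e}_2 \sin\alpha, \]
\[ \hat{R}_\alpha \mathbf{e}_2 \equiv \mathbf{e}_1 \sin\alpha + \mathbf{e}_2 \cos\alpha. \]
One can quickly verify that the transformed basis $\{\hat{R}_\alpha \mathbf{e}_1, \hat{R}_\alpha \mathbf{e}_2\}$ is also an orthonormal basis; for example,
\[ \langle \hat{R}_\alpha \mathbf{e}_1, \hat{R}_\alpha \mathbf{e}_1 \rangle = \langle \mathbf{e}_1, \mathbf{e}_1 \rangle \cos^2\alpha + \langle \mathbf{e}_2, \mathbf{e}_2 \rangle \sin^2\alpha = 1. \]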
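The orthogonality of the rotation in Example 1 can be verified numerically. The following is a minimal sketch, assuming the sign convention $\hat{R}_\alpha \mathbf{e}_1 = \mathbf{e}_1 \cos\alpha - \mathbf{e}_2 \sin\alpha$, $\hat{R}_\alpha \mathbf{e}_2 = \mathbf{e}_1 \sin\alpha + \mathbf{e}_2 \cos\alpha$ and vectors given by their components in the orthonormal basis $\{\mathbf{e}_1, \mathbf{e}_2\}$; the names `rotate` and `dot` and the sample angle are illustrative choices, not part of the text.

```python
from math import cos, sin, pi, isclose

def rotate(x, alpha):
    """Apply R_alpha to x = x1*e1 + x2*e2 by linearity, using
    R e1 = e1 cos(a) - e2 sin(a) and R e2 = e1 sin(a) + e2 cos(a)."""
    x1, x2 = x
    return (x1 * cos(alpha) + x2 * sin(alpha),
            -x1 * sin(alpha) + x2 * cos(alpha))

def dot(u, v):
    """Scalar product in the orthonormal basis {e1, e2}."""
    return u[0] * v[0] + u[1] * v[1]

alpha = pi / 5                      # arbitrary sample angle
e1, e2 = (1.0, 0.0), (0.0, 1.0)
Re1, Re2 = rotate(e1, alpha), rotate(e2, alpha)

# Table of scalar products of the transformed basis; it should be the identity.
gram = [[dot(a, b) for b in (Re1, Re2)] for a in (Re1, Re2)]

# Scalar products of arbitrary vectors are preserved as well.
v, w = (2.0, -1.0), (0.5, 3.0)
before = dot(v, w)
after = dot(rotate(v, alpha), rotate(w, alpha))
```

Checking the transformed basis alone is already sufficient for a linear map; the second check on `v` and `w` simply confirms the defining property directly.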


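Example 2: Mirror reflections are also orthogonal transformations. A mirror reflection with respect to the basis vector $\mathbf{e}_1$ maps a vector $\mathbf{x} = \lambda_1 \mathbf{e}_1 + \lambda_2 \mathbf{e}_2 + ... + \lambda_N \mathbf{e}_N$ into $\hat{M}_{\mathbf{e}_1} \mathbf{x} = -\lambda_1 \mathbf{e}_1 + \lambda_2 \mathbf{e}_2 + ... + \lambda_N \mathbf{e}_N$, i.e. only the first coefficient changes sign. A mirror reflection with respect to an arbitrary axis $\mathbf{n}$ (where $\mathbf{n}$ is a unit vector, i.e. $\langle \mathbf{n}, \mathbf{n} \rangle = 1$) can be defined as the transformation
\[ \hat{M}_{\mathbf{n}} \mathbf{x} \equiv \mathbf{x} - 2 \langle \mathbf{n}, \mathbf{x} \rangle \mathbf{n}. \]
This transformation is interpreted geometrically as mirror reflection with respect to the hyperplane $\mathbf{n}^\perp$. ■

An interesting fact is that orthogonality entails linearity.

Statement 1: If a map $\hat{A} : V \to V$ is orthogonal then it is a linear map, $\hat{A}(\mathbf{u} + \lambda \mathbf{v}) = \hat{A}\mathbf{u} + \lambda \hat{A}\mathbf{v}$.

Proof: Consider an orthonormal basis $\{\mathbf{e}_1, ..., \mathbf{e}_N\}$. The set $\{\hat{A}\mathbf{e}_1, ..., \hat{A}\mathbf{e}_N\}$ is orthonormal because
\[ \langle \hat{A}\mathbf{e}_i, \hat{A}\mathbf{e}_j \rangle = \langle \mathbf{e}_i, \mathbf{e}_j \rangle = \delta_{ij}. \]
By the Statement in Sec. 5.1.1 (any orthonormal set is linearly independent), the set $\{\hat{A}\mathbf{e}_1, ..., \hat{A}\mathbf{e}_N\}$ is linearly independent and is therefore an orthonormal basis in $V$. Consider an arbitrary vector $\mathbf{v} \in V$ and its image $\hat{A}\mathbf{v}$ after the transformation $\hat{A}$. By Theorem 2 of Sec. 5.1.1, we can decompose $\mathbf{v}$ in the basis $\{\mathbf{e}_j\}$ and $\hat{A}\mathbf{v}$ in the basis $\{\hat{A}\mathbf{e}_j\}$ as follows,
\[ \mathbf{v} = \sum_{j=1}^{N} \langle \mathbf{e}_j, \mathbf{v} \rangle \mathbf{e}_j, \]
\[ \hat{A}\mathbf{v} = \sum_{j=1}^{N} \langle \hat{A}\mathbf{e}_j, \hat{A}\mathbf{v} \rangle \hat{A}\mathbf{e}_j = \sum_{j=1}^{N} \langle \mathbf{e}_j, \mathbf{v} \rangle \hat{A}\mathbf{e}_j. \]
Any other vector $\mathbf{u} \in V$ can be similarly decomposed, and so we obtain
\[ \hat{A}(\mathbf{u} + \lambda \mathbf{v}) = \sum_{j=1}^{N} \langle \mathbf{e}_j, \mathbf{u} + \lambda \mathbf{v} \rangle \hat{A}\mathbf{e}_j = \sum_{j=1}^{N} \langle \mathbf{e}_j, \mathbf{u} \rangle \hat{A}\mathbf{e}_j + \lambda \sum_{j=1}^{N} \langle \mathbf{e}_j, \mathbf{v} \rangle \hat{A}\mathbf{e}_j = \hat{A}\mathbf{u} + \lambda \hat{A}\mathbf{v}, \quad \forall \mathbf{u}, \mathbf{v} \in V,\ \lambda \in \mathbb{K}, \]
showing that the map $\hat{A}$ is linear. ■

An orthogonal operator always maps an orthonormal basis into another orthonormal basis (this was shown in the proof of Statement 1). The following exercise shows that the converse is also true.

Exercise 1: Prove that a transformation is orthogonal if and only if it maps some orthonormal basis into another orthonormal basis. Deduce that any orthogonal transformation is invertible.

Exercise 2: If a linear transformation $\hat{A}$ satisfies $\langle \hat{A}\mathbf{x}, \hat{A}\mathbf{x} \rangle = \langle \mathbf{x}, \mathbf{x} \rangle$ for all $\mathbf{x} \in V$, show that $\hat{A}$ is an orthogonal transformation. (This shows how to check more easily whether a given linear transformation is orthogonal.)

Hint: Substitute $\mathbf{x} = \mathbf{y} + \mathbf{z}$.

Exercise 3: Show that for any two orthonormal bases $\{\mathbf{e}_j \mid j = 1, ..., N\}$ and $\{\mathbf{f}_j \mid j = 1, ..., N\}$, there exists an orthogonal operator $\hat{R}$ that maps the basis $\{\mathbf{e}_j\}$ into the basis $\{\mathbf{f}_j\}$, i.e. $\hat{R}\mathbf{e}_j = \mathbf{f}_j$ for $j = 1, ..., N$.

Hint: A linear operator mapping $\{\mathbf{e}_j\}$ into $\{\mathbf{f}_j\}$ exists; show that this operator is orthogonal.

Exercise 4: Prove that $\hat{M}_{\mathbf{n}}$ (as defined in Example 2) is an orthogonal transformation by showing that $\langle \hat{M}_{\mathbf{n}} \mathbf{x}, \hat{M}_{\mathbf{n}} \mathbf{x} \rangle = \langle \mathbf{x}, \mathbf{x} \rangle$ for any $\mathbf{x}$.

Exercise 5: Consider the orthogonal transformations $\hat{R}_\alpha$ and $\hat{M}_{\mathbf{n}}$ and an orthonormal basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ as defined in Examples 1 and 2. Show by a direct calculation that
\[ (\hat{R}_\alpha \mathbf{e}_1) \wedge (\hat{R}_\alpha \mathbf{e}_2) = \mathbf{e}_1 \wedge \mathbf{e}_2 \]
and that
\[ (\hat{M}_{\mathbf{n}} \mathbf{e}_1) \wedge (\hat{M}_{\mathbf{n}} \mathbf{e}_2) = -\mathbf{e}_1 \wedge \mathbf{e}_2. \]
This is the same as to say that $\det \hat{R}_\alpha = 1$ and $\det \hat{M}_{\mathbf{n}} = -1$. This indicates that rotations preserve orientation while mirror reflections reverse orientation. ■

5.3.2 Transposition

Another way to characterize orthogonal transformations is by using transposed operators. Recall that the canonically defined transpose to $\hat{A}$ is $\hat{A}^T : V^* \to V^*$ (see Sec. 1.8.4, p. 25 for a definition). In a (finite-dimensional) space with a scalar product, the one-to-one correspondence between $V$ and $V^*$ means that $\hat{A}^T$ can be identified with some operator acting in $V$ (rather than in $V^*$). Let us also denote that operator by $\hat{A}^T$ and call it the transposed to $\hat{A}$. (This transposition is not canonical but depends on the scalar product.) We can formulate the definition of $\hat{A}^T$ as follows.

Definition 1: In a finite-dimensional space with a scalar product, the transposed operator $\hat{A}^T : V \to V$ is defined by
\[ \langle \hat{A}^T \mathbf{x}, \mathbf{y} \rangle \equiv \langle \mathbf{x}, \hat{A}\mathbf{y} \rangle, \quad \forall \mathbf{x}, \mathbf{y} \in V. \]

Exercise 1: Show that $(\hat{A}\hat{B})^T = \hat{B}^T \hat{A}^T$.

Statement 1: If $\hat{A}$ is orthogonal then $\hat{A}^T \hat{A} = \hat{1}_V$.

Proof: By definition of orthogonal transformation, $\langle \hat{A}\mathbf{x}, \hat{A}\mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle$ for all $\mathbf{x}, \mathbf{y} \in V$. Then we use the definition of $\hat{A}^T$ and obtain
\[ \langle \mathbf{x}, \mathbf{y} \rangle = \langle \hat{A}\mathbf{x}, \hat{A}\mathbf{y} \rangle = \langle \hat{A}^T \hat{A}\mathbf{x}, \mathbf{y} \rangle. \]
Since this holds for all $\mathbf{x}, \mathbf{y} \in V$, we conclude that $\hat{A}^T \hat{A} = \hat{1}_V$ (see Exercise 2 in Sec. 5.1). ■

Let us now see how transposed operators appear in matrix form. Suppose $\{\mathbf{e}_j\}$ is an orthonormal basis in $V$; then the operator $\hat{A}$ can be represented by some matrix $A_{ij}$ in this basis. Then the operator $\hat{A}^T$ is represented by the matrix $A_{ji}$ in the same basis (i.e. by the matrix transpose of $A_{ij}$), as shown in the following exercise. (Note that the operator $\hat{A}^T$ is not represented by the transposed matrix when the basis is not orthonormal.)

Exercise 2: Show that the operator $\hat{A}^T$ is represented by the transposed matrix $A_{ji}$ in the same (orthonormal) basis in which the operator $\hat{A}$ has the matrix $A_{ij}$. Deduce that $\det \hat{A} = \det(\hat{A}^T)$.

Solution: The matrix element $A_{ij}$ with respect to an orthonormal basis $\{\mathbf{e}_j\}$ is the coefficient in the tensor decomposition $\hat{A} = \sum_{i,j=1}^{N} A_{ij}\, \mathbf{e}_i \otimes \mathbf{e}_j^*$ and can be computed using the scalar product as
\[ A_{ij} = \langle \mathbf{e}_i, \hat{A}\mathbf{e}_j \rangle. \]
The transposed operator satisfies
\[ \langle \mathbf{e}_i, \hat{A}^T \mathbf{e}_j \rangle = \langle \hat{A}\mathbf{e}_i, \mathbf{e}_j \rangle = A_{ji}. \]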


Hence, the matrix elements of AT are Aji , i.e. the matrix el- Statement: Given two orthonormal bases {ej } and {fj }, let us
ements of the transposed matrix. We know that det(Aji ) = define two tensors e1 ... eN and f1 ... fN . Then
det(Aij ). If the basis {ej } is not orthonormal, the property = .
Aij = hei , Aej i does not hold and the argument fails.  Proof: There exists an orthogonal transformation R that maps
We have seen in Exercise 5 (Sec. 5.3.1) that the determinants of the basis {ej } into the basis {fj }, i.e. Rej = fj for j = 1, ..., N .
some orthogonal transformations were equal to +1 or 1. This Then det R = 1 and thus
is, in fact, a general property.
Statement 2: The determinant of an orthogonal transformation = Re1 ... ReN = (det R) = .
is equal to 1 or to 1.
Proof: An orthogonal transformation A satisfies AT A = 1V . 
Compute the determinant of both sides; since the determinant of The sign factor 1 in the definition of the unit-volume tensor
is an essential ambiguity that cannot be avoided; instead, one
the transposed operator is equal to that of the original operator,
we have (det A)2 = 1. simply chooses some orthonormal basis {ej }, computes e1

... eN , and declares this to be positively oriented. Any
other nonzero N -vector N V can then be compared with
as = C, yielding a constant C 6= 0. If C > 0 then is
5.4 Applications of exterior product also positively oriented, otherwise is negatively oriented.
Similarly, any given basis {vj } is then deemed to be positively
We will now apply the exterior product techniques to spaces
oriented if Eq. (5.5) holds with C > 0. Choosing is therefore
with a scalar product and obtain several important results.
called fixing the orientation of space.
Remark: right-hand rule. To fix the orientation of the basis
5.4.1 Orthonormal bases, volume, and N V in the 3-dimensional space, frequently the right-hand rule is
If an orthonormal basis {ej } is chosen, we can consider a special used: The thumb, the index finger, and the middle finger of a
tensor in N V , namely relaxed right hand are considered the positively oriented ba-
sis vectors {e1 , e2 , e3 }. However, this is not really a definition in
e1 ... eN . the mathematical sense because the concept of fingers of a right
hand is undefined and actually cannot be defined in geometric
Since 6= 0, the tensor can be considered a basis tensor in the terms. In other words, it is impossible to give a purely algebraic
one-dimensional space N V . This choice allows one to identify or geometric definition of a positively oriented basis in terms
the space N V with scalars (the one-dimensional space of num- of any properties of the vectors {ej } alone! (Not to mention that
bers, K). Namely, any tensor N V must be proportional to there is no human hand in N dimensions.) However, once an
(since N V is one-dimensional), so = t where t K is some arbitrary basis {ej } is selected and declared to be positively ori-
number. The number $t$ corresponds uniquely to each tensor in $\wedge^N V$.

As we have seen before, tensors from $\wedge^N V$ have the interpretation of oriented volumes. In this interpretation, $\omega$ represents the volume of a parallelepiped spanned by the unit basis vectors $\{\mathbf{e}_j\}$. Since the vectors $\{\mathbf{e}_j\}$ are orthonormal and have unit length, it is reasonable to assume that they span a unit volume. Hence, the oriented volume represented by $\omega$ is equal to $\pm 1$ depending on the orientation of the basis $\{\mathbf{e}_j\}$. The tensor $\omega$ is called the unit volume tensor.

Once $\omega$ is fixed, the (oriented) volume of a parallelepiped spanned by arbitrary vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_N\}$ is equal to the constant $C$ in the equality

$$\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N = C\omega. \qquad (5.5)$$

In our notation of "tensor division," we can also write

$$\mathrm{Vol}\,\{\mathbf{v}_1, \dots, \mathbf{v}_N\} \equiv C = \frac{\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N}{\omega}.$$

It might appear that $\omega$ is arbitrarily chosen and will change when we select another orthonormal basis. However, it turns out that the basis tensor $\omega$ does not actually depend on the choice of the orthonormal basis, up to a sign. (The sign of $\omega$ is necessarily ambiguous because one can always interchange, say, $\mathbf{e}_1$ and $\mathbf{e}_2$ in the orthonormal basis, and then the sign of $\omega$ will be flipped.) We will now prove that a different orthonormal basis yields again either $\omega$ or $-\omega$, depending on the order of vectors. In other words, $\omega$ depends on the choice of the scalar product but not on the choice of an orthonormal basis, up to a sign.

ented, we may look at any other basis $\{\mathbf{v}_j\}$, compute

$$C \equiv \frac{\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N}{\omega} = \frac{\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N}{\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N},$$

and examine the sign of $C$. We will have $C \neq 0$ since $\{\mathbf{v}_j\}$ is a basis. If $C > 0$, the basis $\{\mathbf{v}_j\}$ is positively oriented. If $C < 0$, we need to change the ordering of vectors in $\{\mathbf{v}_j\}$; for instance, we may swap the first two vectors and use $\{\mathbf{v}_2, \mathbf{v}_1, \mathbf{v}_3, \dots, \mathbf{v}_N\}$ as the positively oriented basis. In other words, a positive orientation of space simply means choosing a certain ordering of vectors in each basis. As we have seen, it suffices to choose the unit volume tensor $\omega$ (rather than a basis) to fix the orientation of space. The choice of sign of $\omega$ is quite arbitrary and does not influence the results of any calculations because the tensor $\omega$ always appears on both sides of equations or in a quadratic combination. ■

5.4.2 Vector product in $\mathbb{R}^3$ and Levi-Civita symbol

In the familiar three-dimensional Euclidean space, $V = \mathbb{R}^3$, there is a vector product $\mathbf{a} \times \mathbf{b}$ and a scalar product $\mathbf{a} \cdot \mathbf{b}$. We will now show how the vector product can be expressed through the exterior product.

A positively oriented orthonormal basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ defines the unit volume tensor $\omega \equiv \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3$ in $\wedge^3 V$. Due to the presence of the scalar product, $V$ can be identified with $V^*$, as we have seen.
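As a brief numerical aside on the oriented volume $C$ of Eq. (5.5): when the vectors are given by their components in a positively oriented orthonormal basis, $C$ is the determinant of the component table, and its sign decides the orientation. The following is a minimal sketch under that assumption; the function names `permutation_sign`, `oriented_volume` and `positively_oriented` are hypothetical and introduced only for illustration.

```python
from itertools import permutations


def permutation_sign(perm):
    """Return +1 or -1 for a permutation of 0..n-1 (parity of its inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def oriented_volume(vectors):
    """Constant C in v1 ^ ... ^ vN = C e1 ^ ... ^ eN (Leibniz expansion).

    `vectors` holds the components of each v_j in the orthonormal basis {e_j}.
    """
    n = len(vectors)
    total = 0
    for perm in permutations(range(n)):
        term = permutation_sign(perm)
        for row, col in enumerate(perm):
            term *= vectors[row][col]
        total += term
    return total


def positively_oriented(vectors):
    """Reorder a basis (N >= 2) so that C > 0, swapping the first two vectors if needed."""
    c = oriented_volume(vectors)
    if c == 0:
        raise ValueError("the vectors are linearly dependent, so they are not a basis")
    if c > 0:
        return list(vectors)
    return [vectors[1], vectors[0], *vectors[2:]]


if __name__ == "__main__":
    basis = [(0, 1, 0), (1, 0, 0), (0, 0, 2)]
    print(oriented_volume(basis))                       # C for the given ordering
    print(oriented_volume(positively_oriented(basis)))  # C after reordering
```

Swapping two vectors changes the sign of $C$ but not its magnitude, which is the statement that an orientation amounts to a choice of ordering.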

94
5 Scalar product

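Further, the space $\wedge^2 V$ can be identified with $V$ by the following construction. A 2-vector $A \in \wedge^2 V$ generates a covector $\mathbf{f}^*$ by the formula

$$\mathbf{f}^*(\mathbf{x}) \equiv \frac{\mathbf{x} \wedge A}{\omega}, \quad \forall \mathbf{x} \in V.$$

Now the identification of vectors and covectors shows that $\mathbf{f}^*$ corresponds to a certain vector $\mathbf{c}$. Thus, a 2-vector $A \in \wedge^2 V$ is mapped to a vector $\mathbf{c} \in V$. Let us denote this map by the "star" symbol and write $\mathbf{c} = \ast A$. This map is called the Hodge star; it is a linear map $\wedge^2 V \to V$.

Example 1: Let us compute $\ast(\mathbf{e}_2 \wedge \mathbf{e}_3)$. The 2-vector $\mathbf{e}_2 \wedge \mathbf{e}_3$ is mapped to the covector $\mathbf{f}^*$ defined by

$$\mathbf{f}^*(\mathbf{x})\, \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 \equiv \mathbf{x} \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 = x_1\, \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3,$$

where $\mathbf{x}$ is an arbitrary vector and $x_1 \equiv \mathbf{e}_1^*(\mathbf{x})$ is the first component of $\mathbf{x}$ in the basis. Therefore $\mathbf{f}^* = \mathbf{e}_1^*$. By the vector-covector correspondence, $\mathbf{f}^*$ is mapped to the vector $\mathbf{e}_1$ since

$$x_1 = \mathbf{e}_1^*(\mathbf{x}) = \langle \mathbf{e}_1, \mathbf{x} \rangle.$$

Therefore $\ast(\mathbf{e}_2 \wedge \mathbf{e}_3) = \mathbf{e}_1$.

Similarly we compute $\ast(\mathbf{e}_1 \wedge \mathbf{e}_3) = -\mathbf{e}_2$ and $\ast(\mathbf{e}_1 \wedge \mathbf{e}_2) = \mathbf{e}_3$. ■

Generalizing Example 1 to a single-term product $\mathbf{a} \wedge \mathbf{b}$, where $\mathbf{a}$ and $\mathbf{b}$ are vectors from $V$, we find that the vector $\mathbf{c} = \ast(\mathbf{a} \wedge \mathbf{b})$ is equal to the usually defined vector product or "cross product" $\mathbf{c} = \mathbf{a} \times \mathbf{b}$. We note that the vector product depends on the choice of the orientation of the basis; exchanging the order of any two basis vectors will change the sign of the tensor $\omega$ and hence will change the sign of the vector product.

Exercise 1: The vector product in $\mathbb{R}^3$ is usually defined through the components of vectors in an orthogonal basis, as in Eq. (1.2). Show that the definition

$$\mathbf{a} \times \mathbf{b} \equiv \ast(\mathbf{a} \wedge \mathbf{b})$$

is equivalent to that.

Hint: Since the vector product is bilinear, it is sufficient to show that $\ast(\mathbf{a} \wedge \mathbf{b})$ is linear in both $\mathbf{a}$ and $\mathbf{b}$, and then to consider the pairwise vector products $\mathbf{e}_1 \times \mathbf{e}_2$, $\mathbf{e}_2 \times \mathbf{e}_3$, $\mathbf{e}_3 \times \mathbf{e}_1$ for an orthonormal basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$. Some of these calculations were performed in Example 1. ■

The Hodge star is a one-to-one map because $\ast(\mathbf{a} \wedge \mathbf{b}) = 0$ if and only if $\mathbf{a} \wedge \mathbf{b} = 0$. Hence, the inverse map $V \to \wedge^2 V$ exists. It is convenient to denote the inverse map also by the same "star" symbol, so that we have the map $\ast : V \to \wedge^2 V$. For example,

$$\ast(\mathbf{e}_1) = \mathbf{e}_2 \wedge \mathbf{e}_3, \quad \ast(\mathbf{e}_2) = -\mathbf{e}_1 \wedge \mathbf{e}_3,$$

$$\ast\ast(\mathbf{e}_1) = \ast(\mathbf{e}_2 \wedge \mathbf{e}_3) = \mathbf{e}_1.$$

We may then write symbolically $\ast\ast = \hat{1}$; here one of the stars stands for the map $V \to \wedge^2 V$, and the other star is the map $\wedge^2 V \to V$.

The triple product is defined by the formula

$$(\mathbf{a}, \mathbf{b}, \mathbf{c}) \equiv \langle \mathbf{a}, \mathbf{b} \times \mathbf{c} \rangle.$$

The triple product is fully antisymmetric,

$$(\mathbf{a}, \mathbf{b}, \mathbf{c}) = -(\mathbf{b}, \mathbf{a}, \mathbf{c}) = -(\mathbf{a}, \mathbf{c}, \mathbf{b}) = +(\mathbf{c}, \mathbf{a}, \mathbf{b}) = \dots$$

The geometric interpretation of the triple product is that of the oriented volume of the parallelepiped spanned by the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$. This suggests a connection with the exterior power $\wedge^3(\mathbb{R}^3)$.

Indeed, the triple product can be expressed through the exterior product. We again use the tensor $\omega = \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3$. Since $\{\mathbf{e}_j\}$ is an orthonormal basis, the volume of the parallelepiped spanned by $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ is equal to 1. Then we can express $\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}$ as

$$\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c} = \langle \mathbf{a}, \ast(\mathbf{b} \wedge \mathbf{c}) \rangle\, \omega = \langle \mathbf{a}, \mathbf{b} \times \mathbf{c} \rangle\, \omega = (\mathbf{a}, \mathbf{b}, \mathbf{c})\, \omega.$$

Therefore we may write

$$(\mathbf{a}, \mathbf{b}, \mathbf{c}) = \frac{\mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}}{\omega}.$$

In the index notation, the triple product is written as

$$(\mathbf{a}, \mathbf{b}, \mathbf{c}) \equiv \varepsilon_{jkl}\, a^j b^k c^l.$$

Here the symbol $\varepsilon_{jkl}$ (the Levi-Civita symbol) is by definition $\varepsilon_{123} = 1$ and $\varepsilon_{ijk} = -\varepsilon_{jik} = -\varepsilon_{ikj}$. This antisymmetric array of numbers, $\varepsilon_{ijk}$, can be also thought of as the index representation of the unit volume tensor $\omega = \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3$ because

$$\omega = \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 = \frac{1}{3!} \sum_{i,j,k=1}^{3} \varepsilon_{ijk}\, \mathbf{e}_i \wedge \mathbf{e}_j \wedge \mathbf{e}_k.$$

Remark: Geometric interpretation. The Hodge star is useful in conjunction with the interpretation of bivectors as oriented areas. If a bivector $\mathbf{a} \wedge \mathbf{b}$ represents the oriented area of a parallelogram spanned by the vectors $\mathbf{a}$ and $\mathbf{b}$, then $\ast(\mathbf{a} \wedge \mathbf{b})$ is the vector $\mathbf{a} \times \mathbf{b}$, i.e. the vector orthogonal to the plane of the parallelogram whose length is numerically equal to the area of the parallelogram. Conversely, if $\mathbf{n}$ is a vector then $\ast(\mathbf{n})$ is a bivector that may represent some parallelogram orthogonal to $\mathbf{n}$ with the appropriate area.

Another geometric example is the computation of the intersection of two planes: If $\mathbf{a} \wedge \mathbf{b}$ and $\mathbf{c} \wedge \mathbf{d}$ represent two parallelograms in space then

$$\ast\big( [\ast(\mathbf{a} \wedge \mathbf{b})] \wedge [\ast(\mathbf{c} \wedge \mathbf{d})] \big) = (\mathbf{a} \times \mathbf{b}) \times (\mathbf{c} \times \mathbf{d})$$

is a vector parallel to the line of intersection of the two planes containing the two parallelograms. While in three dimensions the Hodge star yields the same results as the cross product, the advantage of the Hodge star is that it is defined in any dimensions, as the next section shows. ■

5.4.3 Hodge star and Levi-Civita symbol in $N$ dimensions

We would like to generalize our results to an $N$-dimensional space. We begin by defining the unit volume tensor $\omega = \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N$, where $\{\mathbf{e}_j\}$ is a positively oriented orthonormal basis. As we have seen, the tensor $\omega$ is independent of the choice of the orthonormal basis $\{\mathbf{e}_j\}$ and depends only on the scalar product and on the choice of the orientation of space. (Alternatively, the choice of $\omega$ rather than $-\omega$ as the unit volume tensor defines the fact that the basis $\{\mathbf{e}_j\}$ is positively oriented.) Below we will always assume that the orthonormal basis $\{\mathbf{e}_j\}$ is chosen to be positively oriented.

The Hodge star is now defined as a linear map $V \to \wedge^{N-1} V$ through its action on the basis vectors,

$$\ast(\mathbf{e}_j) \equiv (-1)^{j-1}\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_{j-1} \wedge \mathbf{e}_{j+1} \wedge \dots \wedge \mathbf{e}_N,$$

where we write the exterior product of all the basis vectors except $\mathbf{e}_j$. To check the sign, we note the identity

$$\mathbf{e}_j \wedge \ast(\mathbf{e}_j) = \omega, \quad 1 \leq j \leq N.$$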
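The three-dimensional formulas of Sec. 5.4.2 can be checked directly in components. The sketch below assumes a positively oriented orthonormal basis and uses 0-based indices 0, 1, 2 in place of 1, 2, 3; the helper names `levi_civita`, `hodge_of_wedge` and `triple_product` are hypothetical.

```python
def levi_civita(i, j, k):
    """epsilon_{ijk} for indices 0, 1, 2 (0-based stand-ins for 1, 2, 3)."""
    return (i - j) * (j - k) * (k - i) // 2


def hodge_of_wedge(a, b):
    """Components of *(a ^ b), i.e. c_i = sum_{j,k} epsilon_{ijk} a_j b_k."""
    return [
        sum(levi_civita(i, j, k) * a[j] * b[k] for j in range(3) for k in range(3))
        for i in range(3)
    ]


def triple_product(a, b, c):
    """(a, b, c) = sum_{j,k,l} epsilon_{jkl} a_j b_k c_l."""
    return sum(
        levi_civita(j, k, l) * a[j] * b[k] * c[l]
        for j in range(3)
        for k in range(3)
        for l in range(3)
    )


e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

if __name__ == "__main__":
    print(hodge_of_wedge(e2, e3))  # components of *(e2 ^ e3)
    print(hodge_of_wedge(e1, e3))  # components of *(e1 ^ e3)
    # Direction of the line where the planes Span{e1, e2} and Span{e1, e3} meet:
    print(hodge_of_wedge(hodge_of_wedge(e1, e2), hodge_of_wedge(e1, e3)))
```

For the two coordinate planes in the last line, the result is proportional to $\mathbf{e}_1$, the only direction common to both planes.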

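The sign rule for the Hodge star of basis vectors in $N$ dimensions can be tested mechanically. In the sketch below a basis tensor $\mathbf{e}_{i_1} \wedge \dots \wedge \mathbf{e}_{i_k}$ is represented by its increasing index tuple, and the star is fixed by the requirement that the tensor wedged with its star gives $\omega$. This representation and the names `permutation_sign` and `hodge_basis` are assumptions made for illustration only.

```python
from itertools import combinations


def permutation_sign(seq):
    """Sign of the permutation that sorts `seq` (parity of its inversions)."""
    inversions = sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )
    return -1 if inversions % 2 else 1


def hodge_basis(indices, n):
    """Return (sign, complement) such that *(e_I) = sign * e_complement.

    `indices` is an increasing tuple of 1-based indices I; the sign is chosen
    so that e_I ^ (sign * e_complement) = e_1 ^ ... ^ e_n.
    """
    complement = tuple(j for j in range(1, n + 1) if j not in indices)
    return permutation_sign(tuple(indices) + complement), complement


if __name__ == "__main__":
    print(hodge_basis((1, 3), 4))  # sign and complement for *(e1 ^ e3) in N = 4
    for j in range(1, 5):
        print(j, hodge_basis((j,), 4))  # sign should alternate as (-1)**(j - 1)
```

Applying `hodge_basis` twice returns the original index tuple, and the product of the two signs depends only on $k$ and $N$.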

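Remark: The Hodge star map depends on the scalar product and on the choice of the orientation of the space $V$, i.e. on the choice of the sign in the basis tensor $\omega \equiv \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N$, but not on the choice of the vectors $\{\mathbf{e}_j\}$ in a positively oriented orthonormal basis. This is in contrast with the "complement" operation defined in Sec. 2.3.3, where the scalar product was not available: the "complement" operation depends on the choice of every vector in the basis. The "complement" operation is equivalent to the Hodge star only if we use an orthonormal basis.

Alternatively, given some basis $\{\mathbf{v}_j\}$, we may temporarily introduce a new scalar product such that $\{\mathbf{v}_j\}$ is orthonormal. The "complement" operation is then the same as the Hodge star defined with respect to the new scalar product. The "complement" operation was introduced by H. Grassmann (1844) long before the now standard definitions of vector space and scalar product were developed. ■

The Hodge star can be also defined more generally as a map of $\wedge^k V$ to $\wedge^{N-k} V$. The construction of the Hodge star map is as follows. We require that it be a linear map. So it suffices to define the Hodge star on single-term products of the form $\mathbf{a}_1 \wedge \dots \wedge \mathbf{a}_k$. The vectors $\{\mathbf{a}_i \mid i = 1, \dots, k\}$ define a subspace of $V$, which we temporarily denote by $U \equiv \mathrm{Span}\,\{\mathbf{a}_i\}$. Through the scalar product, we can construct the orthogonal complement subspace $U^\perp$; this subspace consists of all vectors that are orthogonal to every $\mathbf{a}_i$. Thus, $U^\perp$ is an $(N-k)$-dimensional subspace of $V$. We can find a basis $\{\mathbf{b}_i \mid i = k+1, \dots, N\}$ in $U^\perp$ such that

$$\mathbf{a}_1 \wedge \dots \wedge \mathbf{a}_k \wedge \mathbf{b}_{k+1} \wedge \dots \wedge \mathbf{b}_N = \omega. \qquad (5.6)$$

Then we define

$$\ast(\mathbf{a}_1 \wedge \dots \wedge \mathbf{a}_k) \equiv \mathbf{b}_{k+1} \wedge \dots \wedge \mathbf{b}_N \in \wedge^{N-k} V.$$

Examples:

$$\ast(\mathbf{e}_1 \wedge \mathbf{e}_3) = -\mathbf{e}_2 \wedge \mathbf{e}_4 \wedge \dots \wedge \mathbf{e}_N;$$

$$\ast(1) = \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N; \quad \ast(\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N) = 1.$$

The fact that we denote different maps by the same star symbol will not cause confusion because in each case we will write the tensor to which the Hodge star is applied. ■

Even though (by definition) $\mathbf{e}_j \wedge \ast(\mathbf{e}_j) = \omega$ for the basis vectors $\mathbf{e}_j$, it is not true that $\mathbf{x} \wedge \ast(\mathbf{x}) = \omega$ for any $\mathbf{x} \in V$.

Exercise 1: Show that $\mathbf{x} \wedge (\ast\mathbf{x}) = \langle \mathbf{x}, \mathbf{x} \rangle\, \omega$ for any $\mathbf{x} \in V$. Then set $\mathbf{x} = \mathbf{a} + \mathbf{b}$ and show (using $\ast\omega = 1$) that

$$\langle \mathbf{a}, \mathbf{b} \rangle = \ast(\mathbf{a} \wedge \ast\mathbf{b}) = \ast(\mathbf{b} \wedge \ast\mathbf{a}), \quad \forall \mathbf{a}, \mathbf{b} \in V.$$

Statement: The Hodge star map $\ast : \wedge^k V \to \wedge^{N-k} V$, as defined above, is independent of the choice of the basis in $U^\perp$.

Proof: A different choice of basis in $U^\perp$, say $\{\mathbf{b}'_i\}$ instead of $\{\mathbf{b}_i\}$, will yield a tensor $\mathbf{b}'_{k+1} \wedge \dots \wedge \mathbf{b}'_N$ that is proportional to $\mathbf{b}_{k+1} \wedge \dots \wedge \mathbf{b}_N$. The coefficient of proportionality is fixed by Eq. (5.6). Therefore, no ambiguity remains. ■

The insertion map $\iota_{\mathbf{a}^*}$ was defined in Sec. 2.3.1 for covectors $\mathbf{a}^*$. Due to the correspondence between vectors and covectors, we may now use the insertion map with vectors. Namely, we define

$$\iota_{\mathbf{x}} \psi \equiv \iota_{\mathbf{x}^*} \psi,$$

where the covector $\mathbf{x}^*$ is defined by

$$\mathbf{x}^*(\mathbf{v}) \equiv \langle \mathbf{x}, \mathbf{v} \rangle, \quad \forall \mathbf{v} \in V.$$

For example, we then have

$$\iota_{\mathbf{x}}(\mathbf{a} \wedge \mathbf{b}) = \langle \mathbf{x}, \mathbf{a} \rangle\, \mathbf{b} - \langle \mathbf{x}, \mathbf{b} \rangle\, \mathbf{a}.$$

Exercise 2: Show that $\ast(\mathbf{e}_i) = \iota_{\mathbf{e}_i} \omega$ for basis vectors $\mathbf{e}_i$. Deduce that $\ast\mathbf{x} = \iota_{\mathbf{x}} \omega$ for any $\mathbf{x} \in V$.

Exercise 3: Show that

$$\ast\mathbf{x} = \sum_{i=1}^{N} \langle \mathbf{x}, \mathbf{e}_i \rangle\, \iota_{\mathbf{e}_i} \omega = \sum_{i=1}^{N} (\iota_{\mathbf{e}_i} \mathbf{x})(\iota_{\mathbf{e}_i} \omega).$$

Here $\iota_{\mathbf{a}} \mathbf{b} \equiv \langle \mathbf{a}, \mathbf{b} \rangle$. ■

In the previous section, we saw that $\ast\ast\mathbf{e}_1 = \mathbf{e}_1$ (in three dimensions). The following exercise shows what happens in $N$ dimensions: we may get a minus sign.

Exercise 4: a) Given a vector $\mathbf{x} \in V$, define $\psi \in \wedge^{N-1} V$ as $\psi \equiv \ast\mathbf{x}$. Then show that

$$\ast\psi \equiv \ast(\ast\mathbf{x}) = (-1)^{N-1}\, \mathbf{x}.$$

b) Show that $\ast\ast = (-1)^{k(N-k)}\, \hat{1}$ when applied to the space $\wedge^k V$ or $\wedge^{N-k} V$.

Hint: Since $\ast$ is a linear map, it is sufficient to consider its action on a basis vector, say $\mathbf{e}_1$, or a basis tensor $\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_k \in \wedge^k V$, where $\{\mathbf{e}_j\}$ is an orthonormal basis.

Exercise 5: Suppose that $\mathbf{a}_1, \dots, \mathbf{a}_k, \mathbf{x} \in V$ are such that $\langle \mathbf{x}, \mathbf{a}_i \rangle = 0$ for all $i = 1, \dots, k$ while $\langle \mathbf{x}, \mathbf{x} \rangle = 1$. The $k$-vector $\psi \in \wedge^k V$ is then defined as a function of $t$ by

$$\psi(t) \equiv (\mathbf{a}_1 + t\mathbf{x}) \wedge \dots \wedge (\mathbf{a}_k + t\mathbf{x}).$$

Show that $t\, \partial_t \psi = \mathbf{x} \wedge \iota_{\mathbf{x}} \psi$.

Exercise 6: For $\mathbf{x} \in V$ and $\psi \in \wedge^k V$ ($1 \leq k \leq N$), the tensor $\iota_{\mathbf{x}} \psi \in \wedge^{k-1} V$ is called the interior product of $\mathbf{x}$ and $\psi$. Show that

$$\iota_{\mathbf{x}} \psi = \ast(\mathbf{x} \wedge \ast\psi).$$

(Note however that $\psi \wedge \ast\mathbf{x} = 0$ for $k \geq 2$.)

Exercise 7: a) Suppose $\mathbf{x} \in V$ and $\psi \in \wedge^k V$ are such that $\mathbf{x} \wedge \psi = 0$ while $\langle \mathbf{x}, \mathbf{x} \rangle = 1$. Show that

$$\psi = \mathbf{x} \wedge \iota_{\mathbf{x}} \psi.$$

Hint: Use Exercise 2 in Sec. 2.3.2 with a suitable $\mathbf{f}^*$.

b) For any $\psi \in \wedge^k V$, show that

$$\psi = \frac{1}{k} \sum_{j=1}^{N} \mathbf{e}_j \wedge \iota_{\mathbf{e}_j} \psi,$$

where $\{\mathbf{e}_j\}$ is an orthonormal basis.

Hint: It suffices to consider $\psi = \mathbf{e}_{i_1} \wedge \dots \wedge \mathbf{e}_{i_k}$. ■

The Levi-Civita symbol $\varepsilon_{i_1 \dots i_N}$ is defined in an $N$-dimensional space as the coordinate representation of the unit volume tensor $\omega \equiv \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N \in \wedge^N V$ (see also Sections 2.3.6 and 3.4.1). When a scalar product is fixed, the tensor $\omega$ is unique up to a sign; if we assume that $\omega$ corresponds to a positively oriented basis, the Levi-Civita symbol is the index representation of $\omega$ in any positively oriented orthonormal basis. It is instructive to see how one writes the Hodge star in the index notation using the Levi-Civita symbol. (I will write the summations explicitly here, but keep in mind that in the physics literature the summations are implicit.)

Given an orthonormal basis $\{\mathbf{e}_j\}$, the natural basis in $\wedge^k V$ is the set of tensors $\{\mathbf{e}_{i_1} \wedge \dots \wedge \mathbf{e}_{i_k}\}$ where all indices $i_1, \dots, i_k$ are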
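different (or else the exterior product vanishes). Therefore, an arbitrary tensor $\psi \in \wedge^k V$ can be expanded in this basis as

$$\psi = \frac{1}{k!} \sum_{i_1, \dots, i_k = 1}^{N} A^{i_1 \dots i_k}\, \mathbf{e}_{i_1} \wedge \dots \wedge \mathbf{e}_{i_k},$$

where $A^{i_1 \dots i_k}$ are some scalar coefficients. I have included the prefactor $1/k!$ in order to cancel the combinatorial factor $k!$ that appears due to the summation over all the indices $i_1, \dots, i_k$.

Let us write the tensor $\psi \equiv \ast(\mathbf{e}_1)$ in this way. The corresponding coefficients $A^{i_1 \dots i_{N-1}}$ are zero unless the set of indices $(i_1, \dots, i_{N-1})$ is a permutation of the set $(2, 3, \dots, N)$. This statement can be written more concisely as

$$(\ast\mathbf{e}_1)^{i_1 \dots i_{N-1}} \equiv A^{i_1 \dots i_{N-1}} = \varepsilon^{1 i_1 \dots i_{N-1}}.$$

Generalizing to an arbitrary vector $\mathbf{x} = \sum_{j=1}^{N} x^j \mathbf{e}_j$, we find

$$(\ast\mathbf{x})^{i_1 \dots i_{N-1}} \equiv \sum_{j=1}^{N} x^j (\ast\mathbf{e}_j)^{i_1 \dots i_{N-1}} = \sum_{i,j=1}^{N} x^j \delta_{ji}\, \varepsilon^{i i_1 \dots i_{N-1}}.$$

Remark: The extra Kronecker symbol above is introduced for consistency of the notation (summing only over a pair of opposite indices). However, this Kronecker symbol can be interpreted as the coordinate representation of the scalar product in the orthonormal basis. This formula then shows how to write the Hodge star in another basis: replace $\delta_{ji}$ with the matrix representation of the scalar product. ■

Similarly, we can write the Hodge star of an arbitrary $k$-vector in the index notation through the $\varepsilon$ symbol. For example, in a four-dimensional space one maps a 2-vector $\sum_{i,j} A^{ij}\, \mathbf{e}_i \wedge \mathbf{e}_j$ into

$$\ast\Big( \sum_{i,j} A^{ij}\, \mathbf{e}_i \wedge \mathbf{e}_j \Big) = \sum_{k,l} B^{kl}\, \mathbf{e}_k \wedge \mathbf{e}_l,$$

where

$$B^{kl} \equiv \frac{1}{2!} \sum_{i,j,m,n} \delta^{km} \delta^{ln}\, \varepsilon_{ijmn} A^{ij}.$$

A vector $\mathbf{v} = \sum_i v^i \mathbf{e}_i$ is mapped into

$$\ast(\mathbf{v}) = \ast\Big( \sum_i v^i \mathbf{e}_i \Big) = \frac{1}{3!} \sum_{i,j,k,l} \varepsilon_{ijkl}\, v^i\, \mathbf{e}_j \wedge \mathbf{e}_k \wedge \mathbf{e}_l.$$

Note the combinatorial factors $2!$ and $3!$ appearing in these formulas, according to the number of indices in $\varepsilon$ that are being summed over.

5.4.4 Reciprocal basis

Suppose $\{\mathbf{v}_1, \dots, \mathbf{v}_N\}$ is a basis in $V$, not necessarily orthonormal. For any $\mathbf{x} \in V$, we can compute the components of $\mathbf{x}$ in the basis $\{\mathbf{v}_j\}$ by first computing the dual basis, $\{\mathbf{v}_j^*\}$, as in Sec. 2.3.3, and then writing

$$\mathbf{x} = \sum_{i=1}^{N} x_i \mathbf{v}_i, \quad x_i \equiv \mathbf{v}_i^*(\mathbf{x}).$$

The scalar product in $V$ provides a vector-covector correspondence. Hence, each $\mathbf{v}_i^*$ has a corresponding vector; let us denote that vector temporarily by $\mathbf{u}_i$. We then obtain a set of $N$ vectors, $\{\mathbf{u}_1, \dots, \mathbf{u}_N\}$. By definition of the vector-covector correspondence, the vector $\mathbf{u}_i$ is such that

$$\langle \mathbf{u}_i, \mathbf{x} \rangle = \mathbf{v}_i^*(\mathbf{x}) \equiv x_i, \quad \forall \mathbf{x} \in V.$$

We will now show that the set $\{\mathbf{u}_1, \dots, \mathbf{u}_N\}$ is a basis in $V$. It is called the reciprocal basis for the basis $\{\mathbf{v}_j\}$. The reciprocal basis is useful, in particular, because the components of a vector $\mathbf{x}$ in the basis $\{\mathbf{v}_j\}$ are computed conveniently through scalar products with the vectors $\{\mathbf{u}_j\}$, as shown by the formula above.

Statement 1: The set $\{\mathbf{u}_1, \dots, \mathbf{u}_N\}$ is a basis in $V$.

Proof: We first note that

$$\langle \mathbf{u}_i, \mathbf{v}_j \rangle \equiv \mathbf{v}_i^*(\mathbf{v}_j) = \delta_{ij}.$$

We need to show that the set $\{\mathbf{u}_1, \dots, \mathbf{u}_N\}$ is linearly independent. Suppose a vanishing linear combination exists,

$$\sum_{i=1}^{N} \lambda_i \mathbf{u}_i = 0,$$

and take its scalar product with the vector $\mathbf{v}_1$,

$$0 = \Big\langle \mathbf{v}_1, \sum_{i=1}^{N} \lambda_i \mathbf{u}_i \Big\rangle = \sum_{i=1}^{N} \lambda_i \delta_{1i} = \lambda_1.$$

In the same way we show that all $\lambda_i$ are zero. A linearly independent set of $N$ vectors in an $N$-dimensional space is always a basis, hence $\{\mathbf{u}_j\}$ is a basis. ■

Exercise 1: Show that computing the reciprocal basis to an orthonormal basis $\{\mathbf{e}_j\}$ gives again the same basis $\{\mathbf{e}_j\}$. ■

The following statement shows that, in some sense, the reciprocal basis is the "inverse" of the basis $\{\mathbf{v}_j\}$.

Statement 2: The oriented volume of the parallelepiped spanned by $\{\mathbf{u}_j\}$ is the inverse of that spanned by $\{\mathbf{v}_j\}$.

Proof: The volume of the parallelepiped spanned by $\{\mathbf{u}_j\}$ is found as

$$\mathrm{Vol}\,\{\mathbf{u}_j\} = \frac{\mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_N}{\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N},$$

where $\{\mathbf{e}_j\}$ is a positively oriented orthonormal basis. Let us introduce an auxiliary transformation $\hat{M}$ that maps $\{\mathbf{e}_j\}$ into $\{\mathbf{v}_j\}$; such a transformation surely exists and is invertible. Since $\hat{M}\mathbf{e}_j = \mathbf{v}_j$ ($j = 1, \dots, N$), we have

$$\det \hat{M} = \frac{\hat{M}\mathbf{e}_1 \wedge \dots \wedge \hat{M}\mathbf{e}_N}{\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N} = \frac{\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N}{\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N} = \mathrm{Vol}\,\{\mathbf{v}_j\}.$$

Consider the transposed operator $\hat{M}^T$ (the transposition is performed using the scalar product, see Definition 1 in Sec. 5.3.1). We can now show that $\hat{M}^T$ maps the dual basis $\{\mathbf{u}_j\}$ into $\{\mathbf{e}_j\}$. To show this, we consider the scalar products

$$\langle \mathbf{e}_i, \hat{M}^T \mathbf{u}_j \rangle = \langle \hat{M}\mathbf{e}_i, \mathbf{u}_j \rangle = \langle \mathbf{v}_i, \mathbf{u}_j \rangle = \delta_{ij}.$$

Since the above is true for any $i, j = 1, \dots, N$, it follows that $\hat{M}^T \mathbf{u}_j = \mathbf{e}_j$ as desired.

Since $\det \hat{M}^T = \det \hat{M}$, we have

$$\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N = \hat{M}^T \mathbf{u}_1 \wedge \dots \wedge \hat{M}^T \mathbf{u}_N = (\det \hat{M})\, \mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_N.$$

It follows that

$$\mathrm{Vol}\,\{\mathbf{u}_j\} = \frac{\mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_N}{\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N} = \frac{1}{\det \hat{M}} = \frac{1}{\mathrm{Vol}\,\{\mathbf{v}_j\}}.$$

■

The vectors of the reciprocal basis can be also computed using the Hodge star, as follows.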
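Before turning to the Hodge-star formula, the defining property $\langle \mathbf{u}_i, \mathbf{v}_j \rangle = \delta_{ij}$ and Statement 2 can be checked numerically. The sketch below does not use the Hodge star: it assumes the vectors are given by components in an orthonormal basis and obtains the reciprocal basis as $\mathbf{u}_i = \sum_j (G^{-1})_{ij} \mathbf{v}_j$, where $G_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$. This is a standard linear-algebra route swapped in for illustration, and the names `dot`, `determinant` and `reciprocal_basis` are hypothetical.

```python
from fractions import Fraction


def dot(x, y):
    """Scalar product of two vectors given by orthonormal-basis components."""
    return sum(a * b for a, b in zip(x, y))


def determinant(rows):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result


def reciprocal_basis(vectors):
    """Solve G U = V by Gauss-Jordan elimination, where G_ij = <v_i, v_j>.

    Row i of the result is u_i. Raises StopIteration if the vectors are
    linearly dependent (no nonzero pivot can be found).
    """
    n = len(vectors)
    aug = [
        [Fraction(dot(vi, vj)) for vj in vectors] + [Fraction(c) for c in vi]
        for vi in vectors
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


if __name__ == "__main__":
    vs = [(1, 0, 0), (1, 1, 0), (1, 1, 2)]
    us = reciprocal_basis(vs)
    print(us)
    print(determinant(vs), determinant(us))  # the two oriented volumes
```

Exact fractions are used so that the relation $\mathrm{Vol}\,\{\mathbf{u}_j\} \cdot \mathrm{Vol}\,\{\mathbf{v}_j\} = 1$ can be tested without rounding error.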


Exercise 2: Suppose that $\{\mathbf{v}_j\}$ is a basis (not necessarily orthonormal) and $\{\mathbf{u}_j\}$ is its reciprocal basis. Show that

$$\mathbf{u}_1 = \ast(\mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_N)\, \frac{\omega}{\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N},$$

where $\omega \equiv \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N$, $\{\mathbf{e}_j\}$ is a positively oriented orthonormal basis, and we use the Hodge star as a map from $\wedge^{N-1} V$ to $V$.

Hint: Use the formula for the dual basis (Sec. 2.3.3),

$$\mathbf{v}_1^*(\mathbf{x}) = \frac{\mathbf{x} \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_N}{\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_N},$$

and the property

$$\langle \mathbf{x}, \mathbf{u} \rangle\, \omega = \mathbf{x} \wedge \ast\mathbf{u}.$$

5.5 Scalar product in $\wedge^k V$

In this section we will apply the techniques developed until now to the problem of computing $k$-dimensional volumes.

If a scalar product is given in $V$, one can naturally define a scalar product also in each of the spaces $\wedge^k V$ ($k = 2, \dots, N$). We will show that this scalar product allows one to compute the ordinary (number-valued) volumes represented by tensors from $\wedge^k V$. This is fully analogous to computing the lengths of vectors through the scalar product in $V$. A vector $\mathbf{v}$ in a Euclidean space represents at once the orientation and the length of a straight line segment between two points; the length is found as $\sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$ using the scalar product in $V$. Similarly, a tensor $\psi = \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_k \in \wedge^k V$ represents at once the orientation and the volume of a parallelepiped spanned by the vectors $\{\mathbf{v}_j\}$; the unoriented volume of the parallelepiped will be found as $\sqrt{\langle \psi, \psi \rangle}$ using the scalar product in $\wedge^k V$.

We begin by considering the space $\wedge^N V$.

5.5.1 Scalar product in $\wedge^N V$

Suppose $\{\mathbf{u}_j\}$ and $\{\mathbf{v}_j\}$ are two bases in $V$, not necessarily orthonormal, and consider the pairwise scalar products

$$G_{jk} \equiv \langle \mathbf{u}_j, \mathbf{v}_k \rangle, \quad j, k = 1, \dots, N.$$

The coefficients $G_{jk}$ can be arranged into a square-shaped table, i.e. into a matrix. The determinant of this matrix, $\det(G_{jk})$, can be computed using Eq. (3.1). Now consider two tensors $\omega_1, \omega_2 \in \wedge^N V$ defined as

$$\omega_1 \equiv \mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_N, \quad \omega_2 \equiv \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N.$$

Then $\det(G_{jk})$, understood as a function of the tensors $\omega_1$ and $\omega_2$, is bilinear and symmetric, and thus can be interpreted as the scalar product of $\omega_1$ and $\omega_2$. After some work proving the necessary properties, we obtain a scalar product in the space $\wedge^N V$, given a scalar product in $V$.

Exercise 1: We try to define the scalar product in the space $\wedge^N V$ as follows: Given a scalar product $\langle \cdot, \cdot \rangle$ in $V$ and given two tensors $\omega_1, \omega_2 \in \wedge^N V$, we first represent these tensors in some way as products

$$\omega_1 \equiv \mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_N, \quad \omega_2 \equiv \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N,$$

where $\{\mathbf{u}_i\}$ and $\{\mathbf{v}_i\}$ are some suitable sets of vectors, then consider the matrix of pairwise scalar products $\langle \mathbf{u}_i, \mathbf{v}_j \rangle$, and finally define the scalar product $\langle \omega_1, \omega_2 \rangle$ as the determinant of that matrix:

$$\langle \omega_1, \omega_2 \rangle \equiv \det \langle \mathbf{u}_i, \mathbf{v}_j \rangle.$$

Prove that this definition really yields a symmetric bilinear form in $\wedge^N V$, independently of the particular representation of $\omega_1, \omega_2$ through vectors.

Hint: The known properties of the determinant show that $\langle \omega_1, \omega_2 \rangle$ is an antisymmetric and multilinear function of every $\mathbf{u}_i$ and $\mathbf{v}_j$. A linear transformation of the vectors $\{\mathbf{u}_i\}$ that leaves $\omega_1$ constant will also leave $\langle \omega_1, \omega_2 \rangle$ constant. Therefore, it can be considered as a linear function of the tensors $\omega_1$ and $\omega_2$. Symmetry follows from $\det(G_{ij}) = \det(G_{ji})$.

Exercise 2: Given an orthonormal basis $\{\mathbf{e}_j \mid j = 1, \dots, N\}$, let us consider the unit volume tensor $\omega \equiv \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N \in \wedge^N V$.

a) Show that $\langle \omega, \omega \rangle = 1$, where the scalar product in $\wedge^N V$ is chosen according to the definition in Exercise 1.

b) Given a linear operator $\hat{A}$, show that $\det \hat{A} = \langle \omega, \wedge^N \hat{A}^N \omega \rangle$.

Exercise 3: For any $\varphi, \psi \in \wedge^N V$, show that

$$\langle \varphi, \psi \rangle = \frac{\varphi}{\omega}\, \frac{\psi}{\omega},$$

where $\omega$ is the unit volume tensor. Deduce that $\langle \varphi, \psi \rangle$ is a positive-definite bilinear form.

Statement: The volume of a parallelepiped spanned by vectors $\mathbf{v}_1, \dots, \mathbf{v}_N$ is equal to $\sqrt{\det(G_{ij})}$, where $G_{ij} \equiv \langle \mathbf{v}_i, \mathbf{v}_j \rangle$ is the matrix of the pairwise scalar products.

Proof: If $\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N \neq 0$, the set of vectors $\{\mathbf{v}_j \mid j = 1, \dots, N\}$ is a basis in $V$. Let us also choose some orthonormal basis $\{\mathbf{e}_j \mid j = 1, \dots, N\}$. There exists a linear transformation $\hat{A}$ that maps the basis $\{\mathbf{e}_j\}$ into the basis $\{\mathbf{v}_j\}$. Then we have $\hat{A}\mathbf{e}_j = \mathbf{v}_j$ and hence

$$G_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \langle \hat{A}\mathbf{e}_i, \hat{A}\mathbf{e}_j \rangle = \langle \hat{A}^T \hat{A}\mathbf{e}_i, \mathbf{e}_j \rangle.$$

It follows that the matrix $G_{ij}$ is equal to the matrix representation of the operator $\hat{A}^T \hat{A}$ in the basis $\{\mathbf{e}_j\}$. Therefore,

$$\det(G_{ij}) = \det(\hat{A}^T \hat{A}) = (\det \hat{A})^2.$$

Finally, we note that the volume $v$ of the parallelepiped spanned by $\{\mathbf{v}_j\}$ is the coefficient in the tensor equality

$$v\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N = \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N = (\det \hat{A})\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_N.$$

Hence $v^2 = (\det \hat{A})^2 = \det(G_{ij})$. ■

We have found that the (unoriented, i.e. number-valued) $N$-dimensional volume of a parallelepiped spanned by a set of $N$ vectors $\{\mathbf{v}_j\}$ is expressed as $v = \sqrt{\langle \psi, \psi \rangle}$, where $\psi \equiv \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_N$ is the tensor representing the oriented volume of the parallelepiped, and $\langle \psi, \psi \rangle$ is the scalar product in the space $\wedge^N V$. The expression $|\psi| \equiv \sqrt{\langle \psi, \psi \rangle}$ is naturally interpreted as the "length" of the tensor $\psi$. In this way, we obtain a geometric interpretation of tensors $\psi \in \wedge^N V$ as oriented volumes of parallelepipeds: The tensor $\psi$ represents at once the orientation of the parallelepiped and the magnitude of the volume.

5.5.2 Volumes of $k$-dimensional parallelepipeds

In a similar way we treat $k$-dimensional volumes.
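As a concrete check of the volume formula, the squared volume of a parallelepiped spanned by $k$ vectors can be computed in two ways: as the determinant of the matrix of pairwise scalar products, and as the sum of the squared $k \times k$ minors of the component table (the squared volumes of the projections onto coordinate hyperplanes). The sketch below assumes components in an orthonormal basis and uses the three vectors of Example 1 of this subsection; the helper names `determinant`, `gram_determinant` and `projected_volumes` are hypothetical.

```python
from itertools import combinations, permutations


def determinant(matrix):
    """Leibniz expansion; adequate for the small matrices used here."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def gram_determinant(vectors):
    """det <v_i, v_j>, the squared k-dimensional volume."""
    gram = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]
    return determinant(gram)


def projected_volumes(vectors):
    """Oriented volumes of the projections onto all coordinate hyperplanes.

    Keys are 1-based index tuples (i1 < ... < ik); values are the coefficients
    of e_{i1} ^ ... ^ e_{ik} in v1 ^ ... ^ vk, i.e. the k x k minors.
    """
    k, n = len(vectors), len(vectors[0])
    return {
        tuple(c + 1 for c in cols): determinant([[v[c] for c in cols] for v in vectors])
        for cols in combinations(range(n), k)
    }


# Vectors of Example 1: a = e1 + 2 e2, b = e3 - e1, c = e2 + e3 + e4 in R^4.
a, b, c = (1, 2, 0, 0), (-1, 0, 1, 0), (0, 1, 1, 1)

if __name__ == "__main__":
    minors = projected_volumes([a, b, c])
    print(minors)
    print(sum(m * m for m in minors.values()), gram_determinant([a, b, c]))
```

When $k = N$ there is a single minor, the full determinant, and the two computations reduce to $\det(G_{ij}) = (\det \hat{A})^2$.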


We begin by defining a scalar product in the spaces $\wedge^k V$ for $2 \leq k \leq N$. Let us choose an orthonormal basis $\{\mathbf{e}_j\}$ in $V$ and consider the set of $\binom{N}{k}$ tensors

$$\omega_{i_1 \dots i_k} \equiv \mathbf{e}_{i_1} \wedge \dots \wedge \mathbf{e}_{i_k} \in \wedge^k V.$$

Since the set of these tensors (for all admissible sets of indices) is a basis in $\wedge^k V$, it is sufficient to define the scalar product of any two tensors $\omega_{i_1 \dots i_k}$. It is natural to define the scalar product such that $\omega_{i_1 \dots i_k}$ are orthonormal:

$$\langle \omega_{i_1 \dots i_k}, \omega_{i_1 \dots i_k} \rangle = 1,$$

$$\langle \omega_{i_1 \dots i_k}, \omega_{j_1 \dots j_k} \rangle = 0 \quad \text{if } \omega_{i_1 \dots i_k} \neq \pm\omega_{j_1 \dots j_k}.$$

For any two tensors $\psi_1, \psi_2 \in \wedge^k V$, we then define $\langle \psi_1, \psi_2 \rangle$ by expressing $\psi_1, \psi_2$ through the basis tensors $\omega_{i_1 \dots i_k}$ and requiring the bilinearity of the scalar product.

In the following exercise, we derive an explicit formula for the scalar product $\langle \psi_1, \psi_2 \rangle$ through scalar products of the constituent vectors.

Exercise 1: Use the definition above to prove that

$$\langle \mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_k, \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_k \rangle = \det \langle \mathbf{u}_i, \mathbf{v}_j \rangle. \qquad (5.7)$$

Hints: The right side of Eq. (5.7) is a totally antisymmetric, linear function of every $\mathbf{u}_i$ due to the known properties of the determinant. Also, the function is invariant under the interchange of $\mathbf{u}_j$ with $\mathbf{v}_j$. The left side of Eq. (5.7) has the same symmetry and linearity properties. Therefore, it is sufficient to verify Eq. (5.7) when vectors $\mathbf{u}_i$ and $\mathbf{v}_j$ are chosen from the set of orthonormal basis vectors $\{\mathbf{e}_j\}$. Then $\mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_k$ and $\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_k$ are among the basis tensors $\omega_{i_1 \dots i_k}$. Show that the matrix $\langle \mathbf{u}_i, \mathbf{v}_j \rangle$ has at least one row or one column of zeros unless the sets $\{\mathbf{u}_i\}$ and $\{\mathbf{v}_j\}$ coincide as unordered sets of vectors, i.e. unless

$$\mathbf{u}_1 \wedge \dots \wedge \mathbf{u}_k = \pm\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_k.$$

If the above does not hold, both sides of Eq. (5.7) are zero. It remains to verify that both sides of Eq. (5.7) are equal to 1 when we choose identical vectors $\mathbf{u}_i = \mathbf{v}_i$ from the orthonormal basis, for instance if $\mathbf{u}_j = \mathbf{v}_j = \mathbf{e}_j$ for $j = 1, \dots, k$. ■

We now come back to the problem of computing the volume of a $k$-dimensional parallelepiped spanned by vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ in an $N$-dimensional Euclidean space $\mathbb{R}^N$. In Sec. 2.1.2 we considered a parallelogram (i.e. we had $k = 2$), and we projected the parallelogram onto the $\binom{N}{2}$ coordinate planes to define a "vector-valued" area. We now generalize that construction to $k$-dimensional parallelepipeds. We project the given parallelepiped onto each of the $k$-dimensional coordinate hyperplanes in the space, which are the subspaces $\mathrm{Span}\,\{\mathbf{e}_{i_1}, \dots, \mathbf{e}_{i_k}\}$ (with $1 \leq i_1 < \dots < i_k \leq N$). There will be $\binom{N}{k}$ such coordinate hyperplanes and, accordingly, we may determine the $\binom{N}{k}$ oriented $k$-dimensional volumes of these projections. It is natural to view these numbers as the components of the oriented volume of the $k$-dimensional parallelepiped in some basis in the $\binom{N}{k}$-dimensional "space of oriented volumes." As we have shown before, oriented volumes are antisymmetric in the vectors $\mathbf{v}_j$. The space of all antisymmetric combinations of $k$ vectors is, in our present notation, $\wedge^k V$. Thus the oriented volume of the $k$-dimensional parallelepiped is represented by the tensor $\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_k \in \wedge^k V$. The unoriented volume is computed as the "length" of the oriented volume, defined via the scalar product in $\wedge^k V$.

Statement: The unoriented $k$-dimensional volume $v$ of a parallelepiped spanned by $k$ vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ is equal to $\sqrt{\langle \psi, \psi \rangle}$, where $\psi \equiv \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_k$ and $\langle \psi, \psi \rangle$ is the scalar product defined above.

Proof: Consider the orthogonal projection of the given $k$-dimensional parallelepiped onto some $k$-dimensional coordinate hyperplane, e.g. onto the hyperplane $\mathrm{Span}\,\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$. Each vector $\mathbf{v}_i$ is projected orthogonally, i.e. by omitting the components of $\mathbf{v}_i$ at $\mathbf{e}_{k+1}, \dots, \mathbf{e}_N$. Let us denote the projected vectors by $\tilde{\mathbf{v}}_i$ ($i = 1, \dots, k$). The projection is a $k$-dimensional parallelepiped spanned by $\{\tilde{\mathbf{v}}_i\}$ in the coordinate hyperplane. Let us now restrict attention to the subspace $\mathrm{Span}\,\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$. In this subspace, the oriented $k$-dimensional volume of the projected parallelepiped is represented by the tensor $\tilde{\psi} \equiv \tilde{\mathbf{v}}_1 \wedge \dots \wedge \tilde{\mathbf{v}}_k$. By construction, $\tilde{\psi}$ is proportional to the unit volume tensor in the subspace, $\tilde{\psi} = \lambda\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_k$ for some $\lambda$. Therefore, the oriented $k$-dimensional volume of the projected parallelepiped is equal to $\lambda$.

Let us now decompose the tensor $\psi$ into the basis tensors in $\wedge^k V$,

$$\psi = \sum_{1 \leq i_1 < \dots < i_k \leq N} c_{i_1 \dots i_k}\, \omega_{i_1 \dots i_k} = c_{1 \dots k}\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_k + c_{13 \dots (k+1)}\, \mathbf{e}_1 \wedge \mathbf{e}_3 \wedge \dots \wedge \mathbf{e}_{k+1} + \dots,$$

where we have only written down the first two of the $\binom{N}{k}$ possible terms of the expansion. The projection of $\{\mathbf{v}_i\}$ onto the hyperplane $\mathrm{Span}\,\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$ removes the components proportional to $\mathbf{e}_{k+1}, \dots, \mathbf{e}_N$, hence $\tilde{\psi}$ is equal to the first term $c_{1 \dots k}\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_k$. Therefore, the oriented volume of the projection onto the hyperplane $\mathrm{Span}\,\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$ is equal to $c_{1 \dots k}$.

By definition of the scalar product in $\wedge^k V$, all the basis tensors $\omega_{i_1 \dots i_k}$ are orthonormal. Hence, the coefficients $c_{i_1 \dots i_k}$ can be computed as

$$c_{i_1 \dots i_k} = \langle \psi, \mathbf{e}_{i_1} \wedge \dots \wedge \mathbf{e}_{i_k} \rangle \equiv \langle \psi, \omega_{i_1 \dots i_k} \rangle.$$

For brevity, we may introduce the multi-index $I \equiv \{i_1, \dots, i_k\}$ and rewrite the above as

$$c_I = \langle \psi, \omega_I \rangle.$$

Then the value $\langle \psi, \psi \rangle$ can be computed as

$$\langle \psi, \psi \rangle = \Big\langle \sum_I c_I \omega_I, \sum_J c_J \omega_J \Big\rangle = \sum_{I,J} c_I c_J \langle \omega_I, \omega_J \rangle = \sum_{I,J} c_I c_J \delta_{IJ} = \sum_I |c_I|^2.$$

In other words, we have shown that $\langle \psi, \psi \rangle$ is equal to the sum of all $\binom{N}{k}$ squared projected volumes,

$$\langle \psi, \psi \rangle = \sum_{1 \leq i_1 < \dots < i_k \leq N} |c_{i_1 \dots i_k}|^2.$$

It remains to show that $\sqrt{\langle \psi, \psi \rangle}$ is actually equal to the unoriented volume $v$ of the parallelepiped. To this end, let us choose a new orthonormal basis $\{\tilde{\mathbf{e}}_j\}$ ($j = 1, \dots, N$) such that every vector $\mathbf{v}_i$ ($i = 1, \dots, k$) lies entirely within the hyperplane spanned by the first $k$ basis vectors. (This choice of basis is certainly possible, for instance, by choosing an orthonormal basis in $\mathrm{Span}\,\{\mathbf{v}_i\}$ and
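then completing it to an orthonormal basis in $V$.) Then we will have $\psi = \tilde{\lambda}\, \tilde{\mathbf{e}}_1 \wedge \dots \wedge \tilde{\mathbf{e}}_k$, i.e. with zero coefficients for all other basis tensors. Restricting attention to the subspace $\mathrm{Span}\,\{\tilde{\mathbf{e}}_1, \dots, \tilde{\mathbf{e}}_k\}$, we can use the results of Sec. 5.5.1 to find that the volume $v$ is equal to $|\tilde{\lambda}|$. It remains to show that $\sqrt{\langle \psi, \psi \rangle} = |\tilde{\lambda}|$.

The transformation from the old basis $\{\mathbf{e}_j\}$ to $\{\tilde{\mathbf{e}}_j\}$ can be performed using a certain orthogonal transformation $\hat{R}$ such that $\hat{R}\mathbf{e}_j = \tilde{\mathbf{e}}_j$ ($j = 1, \dots, N$). Since the scalar product in $\wedge^k V$ is defined directly through scalar products of vectors in $V$ (Exercise 1) and since $\hat{R}$ is orthogonal, we have for any $\{\mathbf{a}_i\}$ and $\{\mathbf{b}_i\}$ that

$$\langle \hat{R}\mathbf{a}_1 \wedge \dots \wedge \hat{R}\mathbf{a}_k, \hat{R}\mathbf{b}_1 \wedge \dots \wedge \hat{R}\mathbf{b}_k \rangle = \det \langle \hat{R}\mathbf{a}_i, \hat{R}\mathbf{b}_j \rangle = \det \langle \mathbf{a}_i, \mathbf{b}_j \rangle = \langle \mathbf{a}_1 \wedge \dots \wedge \mathbf{a}_k, \mathbf{b}_1 \wedge \dots \wedge \mathbf{b}_k \rangle.$$

In other words, the operator $\wedge^k \hat{R}^k$ is an orthogonal transformation in $\wedge^k V$. Therefore,

$$\psi = \tilde{\lambda}\, \tilde{\mathbf{e}}_1 \wedge \dots \wedge \tilde{\mathbf{e}}_k = \tilde{\lambda}\, \hat{R}\mathbf{e}_1 \wedge \dots \wedge \hat{R}\mathbf{e}_k = \tilde{\lambda}\, \wedge^k \hat{R}^k \omega_{1 \dots k};$$

$$\langle \psi, \psi \rangle = \tilde{\lambda}^2 \langle \wedge^k \hat{R}^k \omega_{1 \dots k}, \wedge^k \hat{R}^k \omega_{1 \dots k} \rangle = \tilde{\lambda}^2 \langle \omega_{1 \dots k}, \omega_{1 \dots k} \rangle = \tilde{\lambda}^2.$$

Therefore, $\sqrt{\langle \psi, \psi \rangle} = |\tilde{\lambda}| = v$ as required. ■

Remark: The scalar product in the space $\wedge^k V$ is related to the $k$-dimensional volume of a body embedded in the space $V$, in the same way as the scalar product in $V$ is related to the length of a straight line segment embedded in $V$. The tensor $\psi = \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_k$ fully represents the orientation of the $k$-dimensional parallelepiped spanned by the vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$, while the "length" $\sqrt{\langle \psi, \psi \rangle}$ of this tensor gives the numerical value of the volume of the parallelepiped. This is a multidimensional generalization of the Pythagoras theorem that is not easy to visualize! The techniques of exterior algebra enable us to calculate these quantities without visualizing them.

Example 1: In a Euclidean space $\mathbb{R}^4$ with a standard orthonormal basis $\{\mathbf{e}_j\}$, a three-dimensional parallelepiped is spanned by the given vectors

$$\mathbf{a} = \mathbf{e}_1 + 2\mathbf{e}_2, \quad \mathbf{b} = \mathbf{e}_3 - \mathbf{e}_1, \quad \mathbf{c} = \mathbf{e}_2 + \mathbf{e}_3 + \mathbf{e}_4.$$

We would like to determine the volume of the parallelepiped. We compute the wedge product $\psi \equiv \mathbf{a} \wedge \mathbf{b} \wedge \mathbf{c}$ using Gaussian elimination,

$$\begin{aligned}
\psi &= (\mathbf{e}_1 + 2\mathbf{e}_2) \wedge (\mathbf{e}_3 - \mathbf{e}_1) \wedge (\mathbf{e}_2 + \mathbf{e}_3 + \mathbf{e}_4) \\
&= (\mathbf{e}_1 + 2\mathbf{e}_2) \wedge (\mathbf{e}_3 + 2\mathbf{e}_2) \wedge (\mathbf{e}_2 + \mathbf{e}_3 + \mathbf{e}_4) \\
&= \left[ (\mathbf{e}_1 + 2\mathbf{e}_2) \wedge \mathbf{e}_3 + 2\mathbf{e}_1 \wedge \mathbf{e}_2 \right] \wedge \left( \tfrac{1}{2}\mathbf{e}_3 + \mathbf{e}_4 \right) \\
&= \mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 + \mathbf{e}_1 \wedge \mathbf{e}_3 \wedge \mathbf{e}_4 + 2\mathbf{e}_2 \wedge \mathbf{e}_3 \wedge \mathbf{e}_4 + 2\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_4.
\end{aligned}$$

We see that the volumes of the projections onto the four coordinate hyperplanes are 1, 1, 2, 2. Therefore the numerical value of the volume is

$$v = \sqrt{\langle \psi, \psi \rangle} = \sqrt{1 + 1 + 4 + 4} = \sqrt{10}.$$

Exercise 2: Show that the scalar product of two tensors $\psi_1, \psi_2 \in \wedge^k V$ can be expressed through the Hodge star as

$$\langle \psi_1, \psi_2 \rangle = \ast\big( \psi_1 \wedge \ast\psi_2 \big) \quad \text{or as} \quad \langle \psi_1, \psi_2 \rangle = \ast\big( \psi_2 \wedge \ast\psi_1 \big),$$

depending on whether $2k \leq N$ or $2k \geq N$.

Hint: Since both sides are linear in $\psi_1$ and $\psi_2$, it is sufficient to show that the relationship holds for basis tensors $\omega_{i_1 \dots i_k} \equiv \mathbf{e}_{i_1} \wedge \dots \wedge \mathbf{e}_{i_k}$.

Exercise 3: Intersection of hyperplanes. Suppose $U_1, \dots, U_{N-1} \subset V$ are some $(N-1)$-dimensional subspaces (hyperplanes) in $V$. Each $U_i$ can be represented by a tensor $\psi_i \in \wedge^{N-1} V$, e.g. by choosing $\psi_i$ as the exterior product of all vectors in a basis in $U_i$. Define the vector

$$\mathbf{v} \equiv \ast\big[ (\ast\psi_1) \wedge \dots \wedge (\ast\psi_{N-1}) \big].$$

If $\mathbf{v} \neq 0$, show that $\mathbf{v}$ belongs to the intersection of all the $(N-1)$-dimensional hyperplanes.

Hint: Show that $\mathbf{v} \wedge \psi_i = 0$ for each $i = 1, \dots, N-1$. Use Exercise 2.

Exercise 4: Show that $\langle \mathbf{v}, \mathbf{v} \rangle = \langle \ast\mathbf{v}, \ast\mathbf{v} \rangle$ for $\mathbf{v} \in V$ (noting that $\ast\mathbf{v} \in \wedge^{N-1} V$ and using the scalar product in that space). Show more generally that

$$\langle \psi_1, \psi_2 \rangle = \langle \ast\psi_1, \ast\psi_2 \rangle,$$

where $\psi_1, \psi_2 \in \wedge^k V$ and thus $\ast\psi_1$ and $\ast\psi_2$ belong to $\wedge^{N-k} V$. Deduce that the Hodge star is an orthogonal transformation in $\wedge^{N/2} V$ (if $N$ is even).

Hint: Use Exercise 2.

5.6 Scalar product for complex spaces

In complex spaces, one can get useful results if one defines the scalar product in a different way. In this section we work in a complex vector space $V$.

A Hermitian scalar product is a complex function of two vectors $\mathbf{a}, \mathbf{b} \in V$ with the properties

$$\langle \mathbf{a}, \lambda\mathbf{b} \rangle = \lambda \langle \mathbf{a}, \mathbf{b} \rangle, \quad \langle \lambda\mathbf{a}, \mathbf{b} \rangle = \lambda^* \langle \mathbf{a}, \mathbf{b} \rangle,$$

$$\langle \mathbf{a} + \mathbf{b}, \mathbf{c} \rangle = \langle \mathbf{a}, \mathbf{c} \rangle + \langle \mathbf{b}, \mathbf{c} \rangle, \quad \langle \mathbf{b}, \mathbf{a} \rangle = \langle \mathbf{a}, \mathbf{b} \rangle^*,$$

and nondegeneracy ($\forall \mathbf{a} \in V$, $\exists \mathbf{b} \in V$ such that $\langle \mathbf{a}, \mathbf{b} \rangle \neq 0$). (Note that $\lambda^*$ in the formula above means the complex conjugate to $\lambda$.) It follows that $\langle \mathbf{x}, \mathbf{x} \rangle$ is real-valued. One usually also imposes the property $\langle \mathbf{x}, \mathbf{x} \rangle > 0$ for $\mathbf{x} \neq 0$, which is positive-definiteness.

Remark: Note that the scalar product is not linear in the first argument because we have the factor $\lambda^*$ instead of $\lambda$; one says that it is antilinear. One can also define a Hermitian scalar product that is linear in the first argument but antilinear in the second argument, i.e. $\langle \mathbf{a}, \lambda\mathbf{b} \rangle = \lambda^* \langle \mathbf{a}, \mathbf{b} \rangle$ and $\langle \lambda\mathbf{a}, \mathbf{b} \rangle = \lambda \langle \mathbf{a}, \mathbf{b} \rangle$. Here we follow the definition used in the physics literature. This definition is designed to be compatible with the Dirac notation for complex spaces (see Example 3 below).

Example 1: In the vector space $\mathbb{C}^n$, vectors are $n$-tuples of complex numbers, $\mathbf{x} = (x_1, \dots, x_n)$. A Hermitian scalar product is defined by the formula

$$\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i^* y_i.$$

This scalar product is nondegenerate and positive-definite.

Example 2: Suppose we have a real, $N$-dimensional vector space $V$ with an ordinary (real) scalar product $\langle \cdot, \cdot \rangle$. We can construct a complex vector space out of $V$ by the following construction (called the complexification of $V$). First we consider the space $\mathbb{C}$ as a real, two-dimensional vector space over $\mathbb{R}$. Then we consider the tensor product $V \otimes \mathbb{C}$, still a vector space over $\mathbb{R}$. Elements of $V \otimes \mathbb{C}$ are linear combinations of terms of the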
form $v \otimes \lambda$, where $v \in V$ and $\lambda \in \mathbb{C}$. However, the ($2N$-dimensional, real) vector space $V \otimes \mathbb{C}$ can also be viewed as a vector space over $\mathbb{C}$: the multiplication of $v \otimes \lambda$ by a complex number $z$ yields $v \otimes (\lambda z)$. Then $V \otimes \mathbb{C}$ is interpreted as an $N$-dimensional, complex vector space. A Hermitian scalar product in this space is defined by
\[ \langle a \otimes \lambda, b \otimes \mu \rangle \equiv \langle a, b \rangle \lambda^{*} \mu . \]
Here $\langle a, b \rangle$ is the ordinary (real) scalar product in $V$. It is easy to verify that the properties of a Hermitian scalar product are satisfied by the above definition. $\blacksquare$

Using the Hermitian scalar product, one defines an orthonormal basis and other constructions analogous to those defined using the ordinary (real) scalar product. For instance, the Hermitian scalar product allows one to identify vectors and covectors.

Example 3: The vector-covector correspondence in complex spaces is slightly different from that in real spaces. Consider a vector $v \in V$; the corresponding covector $f^{*} : V \to \mathbb{C}$ may be defined as
\[ f^{*}(x) \equiv \langle v, x \rangle \in \mathbb{C}. \]
We denote the map $v \mapsto f^{*}$ by a dagger symbol, called Hermitian conjugation, so that $(v)^{\dagger} = f^{*}$. Due to the antilinearity of the scalar product, we have the property
\[ (\lambda v)^{\dagger} = \lambda^{*} (v)^{\dagger}. \]
In the Dirac notation, one denotes covectors by the "bra" symbols such as $\langle v|$. One then may write
\[ (|v\rangle)^{\dagger} = \langle v| , \]
i.e. one uses the same label "$v$" inside the special brackets. We then have
\[ (\lambda |v\rangle)^{\dagger} = \lambda^{*} \langle v| . \]
The Hermitian scalar product of vectors $|a\rangle$ and $|b\rangle$ is equal to the action of $(|a\rangle)^{\dagger}$ on $|b\rangle$ and denoted $\langle a|b\rangle$. Thus, the scalar product of $|a\rangle$ and $\lambda|b\rangle$ is equal to $\langle a| \lambda |b\rangle = \lambda \langle a|b\rangle$, while the scalar product of $\lambda|a\rangle$ and $|b\rangle$ is equal to $\lambda^{*} \langle a|b\rangle$. $\blacksquare$

Similarly to the transposed operator $\hat{A}^{T}$, the Hermitian conjugate operator $\hat{A}^{\dagger}$ is defined by
\[ \langle \hat{A}^{\dagger} x, y \rangle \equiv \langle x, \hat{A} y \rangle, \quad \forall x, y \in V. \]
In an orthonormal basis, the matrix describing the Hermitian conjugate operator $\hat{A}^{\dagger}$ is obtained from the matrix of $\hat{A}$ by transposing and complex conjugating each matrix element.

Example 4: In the space of linear operators $\operatorname{End} V$, a bilinear form can be defined by
\[ \langle \hat{A}, \hat{B} \rangle \equiv \operatorname{Tr}(\hat{A}^{\dagger} \hat{B}). \]
As we will see in the next section (Exercise 2), this bilinear form is a positive-definite scalar product in the space $\operatorname{End} V$. $\blacksquare$

In the following sections, we consider some applications of the Hermitian scalar product.

5.6.1 Symmetric and Hermitian operators

An operator $\hat{A}$ is symmetric with respect to the scalar product if
\[ \langle u, \hat{A} v \rangle = \langle \hat{A} u, v \rangle, \quad \forall u, v \in V. \]
According to the definition of the transposed operator, the above property is the same as $\hat{A}^{T} = \hat{A}$.

The notion of a symmetric operator is suitable for a real vector space. In a complex vector space, one uses Hermitian conjugation instead of transposition: An operator $\hat{A}$ is called Hermitian if $\hat{A}^{\dagger} = \hat{A}$.

Symmetric as well as Hermitian operators often occur in applications and have useful properties.

Statement 1: a) All eigenvalues of a Hermitian operator are real (have zero imaginary part).
b) If $\hat{A}$ is a symmetric or Hermitian operator and $v_1$, $v_2$ are eigenvectors of $\hat{A}$ corresponding to different eigenvalues $\lambda_1 \neq \lambda_2$, then $v_1$ and $v_2$ are orthogonal to each other: $\langle v_1, v_2 \rangle = 0$.

Proof: a) If $v$ is an eigenvector of a Hermitian operator $\hat{A}$ with eigenvalue $\lambda$, we have
\[ \langle v, \hat{A} v \rangle = \langle v, \lambda v \rangle = \lambda \langle v, v \rangle = \langle \hat{A} v, v \rangle = \langle \lambda v, v \rangle = \lambda^{*} \langle v, v \rangle . \]
Since $\langle v, v \rangle \neq 0$, we have $\lambda = \lambda^{*}$, i.e. $\lambda$ is purely real.
b) We compute
\[ \langle v_1, \hat{A} v_2 \rangle = \lambda_2 \langle v_1, v_2 \rangle \overset{!}{=} \langle \hat{A} v_1, v_2 \rangle = \lambda_1 \langle v_1, v_2 \rangle . \]
(In the case of Hermitian operators, we have used the fact that $\lambda_1$ is real.) Hence, either $\lambda_1 = \lambda_2$ or $\langle v_1, v_2 \rangle = 0$. $\blacksquare$

Statement 2: If $\hat{A}$ is either symmetric or Hermitian and has an eigenvector $v$, the subspace orthogonal to $v$ is invariant under $\hat{A}$.

Proof: We need to show that $\langle x, v \rangle = 0$ entails $\langle \hat{A} x, v \rangle = 0$. We compute
\[ \langle \hat{A} x, v \rangle = \langle x, \hat{A} v \rangle = \lambda \langle x, v \rangle = 0. \]
Hence, $\hat{A} x$ also belongs to the subspace orthogonal to $v$. $\blacksquare$

Statement 3: A Hermitian operator is diagonalizable.

Proof: We work in an $N$-dimensional space $V$. The characteristic polynomial of an operator $\hat{A}$ has at least one (perhaps complex-valued) root $\lambda$, which is an eigenvalue of $\hat{A}$, and thus there exists at least one eigenvector $v$ corresponding to $\lambda$. By Statement 2, the subspace $v^{\perp}$ (the orthogonal complement of $v$) is invariant under $\hat{A}$. The space $V$ splits into a direct sum of $\operatorname{Span}\{v\}$ and the subspace $v^{\perp}$. We may consider the operator $\hat{A}$ in that subspace; again we find that there exists at least one eigenvector in $v^{\perp}$. Continuing this argument, we split the entire space into a direct sum of $N$ orthogonal eigenspaces. Hence, there exist $N$ eigenvectors of $\hat{A}$. $\blacksquare$

Statement 4: A symmetric operator in a real $N$-dimensional vector space is diagonalizable, i.e. it has $N$ real eigenvectors with real eigenvalues.
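As a numerical illustration of Statement 1 (an addition for the reader, not part of the argument), the following Python sketch computes the eigenvalues and eigenvectors of one sample $2 \times 2$ Hermitian matrix and checks that the eigenvalues are real and the eigenvectors are orthogonal. The sample matrix and the helper names `hermitian_inner` and `eigen_2x2` are assumptions chosen for this sketch; the scalar product is taken antilinear in its first argument, as in the Dirac convention above.

```python
import cmath


def hermitian_inner(u, v):
    """Hermitian scalar product <u, v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def eigen_2x2(a):
    """Eigenvalues and eigenvectors of a 2x2 matrix with a[0][1] != 0."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    # Roots of the characteristic polynomial lam^2 - trace*lam + det = 0.
    root = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    # The first row (a11 - lam)*x + a12*y = 0 is solved by (x, y) = (a12, lam - a11).
    eigenvectors = [(a12, lam - a11) for lam in eigenvalues]
    return eigenvalues, eigenvectors


# A sample Hermitian matrix (equal to its own conjugate transpose).
A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]

(lam1, lam2), (v1, v2) = eigen_2x2(A)
print(lam1, lam2)               # both eigenvalues have zero imaginary part
print(hermitian_inner(v1, v2))  # eigenvectors for distinct eigenvalues are orthogonal
```

For this particular matrix the eigenvalues are 4 and 1; a general matrix would require a general-purpose eigenvalue routine rather than the quadratic formula used here.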


Proof: We cannot repeat the proof of Statement 3 literally, since we do not know a priori that the characteristic polynomial of a symmetric operator has all real roots; this is something we need to prove. Therefore we complexify the space $V$, i.e. we consider the space $V \otimes \mathbb{C}$ as a vector space over $\mathbb{C}$. In this space, we introduce a Hermitian scalar product as in Example 2 in Sec. 5.6. In the space $V \otimes \mathbb{C}$ there is a special notion of "real" vectors; these are vectors of the form $v \otimes c$ with real $c$.

The operator $\hat{A}$ is extended to the space $V \otimes \mathbb{C}$ by
\[ \hat{A}(v \otimes c) \equiv (\hat{A} v) \otimes c. \]
It is important to observe that the operator $\hat{A}$ transforms real vectors into real vectors, and moreover that $\hat{A}$ is Hermitian in $V \otimes \mathbb{C}$ if $\hat{A}$ is symmetric in $V$. Therefore, $\hat{A}$ is diagonalizable in $V \otimes \mathbb{C}$ with real eigenvalues.

It remains to show that all the eigenvectors of $\hat{A}$ can be chosen real; this will prove that $\hat{A}$ is also diagonalizable in the original space $V$. So far we only know that $\hat{A}$ has $N$ eigenvectors in $V \otimes \mathbb{C}$. Any vector from $V \otimes \mathbb{C}$ can be transformed into the expression $u \otimes 1 + v \otimes i$ with $u, v \in V$. Let us assume that $u \otimes 1 + v \otimes i$ is an eigenvector of $\hat{A}$ with eigenvalue $\lambda$. If $v = 0$, the eigenvector is real, and there is nothing left to prove; so we assume $v \neq 0$. Since $\lambda$ is real, we have
\[ \hat{A}(u \otimes 1 + v \otimes i) = (\hat{A} u) \otimes 1 + (\hat{A} v) \otimes i \overset{!}{=} \lambda u \otimes 1 + \lambda v \otimes i. \]
If both $u \neq 0$ and $v \neq 0$, it follows that $u$ and $v$ are both eigenvectors of $\hat{A}$ with eigenvalue $\lambda$. Hence, the operator $\hat{A}$ in $V \otimes \mathbb{C}$ can be diagonalized by choosing the real eigenvectors as $u \otimes 1$ and $v \otimes 1$ instead of the complex eigenvector $u \otimes 1 + v \otimes i$. If $u = 0$, we only need to replace the complex eigenvector $v \otimes i$ by the equivalent real eigenvector $v \otimes 1$. We have thus shown that the eigenvectors of $\hat{A}$ in $V \otimes \mathbb{C}$ can be chosen real. $\blacksquare$

Exercise 1: If an operator $\hat{A}$ satisfies $\hat{A}^{\dagger} = -\hat{A}$, it is called anti-Hermitian. Show that all eigenvalues of $\hat{A}$ are pure imaginary or zero, that eigenvectors of $\hat{A}$ are orthogonal to each other, and that $\hat{A}$ is diagonalizable.
Hint: The operator $\hat{B} \equiv i\hat{A}$ is Hermitian; use the properties of Hermitian operators (Statements 1, 2, 3).

Exercise 2: Show that $\operatorname{Tr}(\hat{A}^{T} \hat{A}) > 0$ for operators in a real space with a scalar product, and $\operatorname{Tr}(\hat{A}^{\dagger} \hat{A}) > 0$ for operators in a complex space with a Hermitian scalar product. Deduce that $\langle \hat{A}, \hat{B} \rangle \equiv \operatorname{Tr}(\hat{A}^{T} \hat{B})$ and $\langle \hat{A}, \hat{B} \rangle \equiv \operatorname{Tr}(\hat{A}^{\dagger} \hat{B})$ are positive-definite scalar products in the spaces of operators (assuming real or, respectively, complex space $V$ with a scalar product).
Hint: Compute $\operatorname{Tr}(\hat{A}^{T} \hat{A})$ or $\operatorname{Tr}(\hat{A}^{\dagger} \hat{A})$ directly through components of $\hat{A}$ in an orthonormal basis.

Exercise 3: Show that the set of all Hermitian operators is a subspace of $\operatorname{End} V$, and the same for anti-Hermitian operators. Then show that these two subspaces are orthogonal to each other with respect to the scalar product of Exercise 2.

Exercise 4: Consider the space $\operatorname{End} V$ of linear operators and two of its subspaces: the subspace of traceless operators (i.e. operators $\hat{A}$ with $\operatorname{Tr} \hat{A} = 0$) and the subspace of operators proportional to the identity (i.e. operators $\lambda \hat{1}_V$ for $\lambda \in \mathbb{R}$). Show that these two subspaces are orthogonal with respect to the scalar products $\langle \hat{A}, \hat{B} \rangle \equiv \operatorname{Tr}(\hat{A}^{T} \hat{B})$ or $\langle \hat{A}, \hat{B} \rangle \equiv \operatorname{Tr}(\hat{A}^{\dagger} \hat{B})$.

5.6.2 Unitary transformations

In complex spaces, the notion analogous to orthogonal transformations is unitary transformations.

Definition: An operator is called unitary if it preserves the Hermitian scalar product:
\[ \langle \hat{A} x, \hat{A} y \rangle = \langle x, y \rangle, \quad \forall x, y \in V. \]
It follows that a unitary operator $\hat{A}$ satisfies $\hat{A}^{\dagger} \hat{A} = \hat{1}$.

Exercise 2: If $\hat{A}$ is Hermitian, show that the operators $(1 + i\hat{A})^{-1} (1 - i\hat{A})$ and $\exp(i\hat{A})$ are unitary.
Hint: The Hermitian conjugate of $f(i\hat{A})$ is $f(-i\hat{A}^{\dagger})$ if $f(z)$ is an analytic function. This can be shown by considering each term in the power series for $f(z)$.

Exercise 3: Show that the determinant of a unitary operator is a complex number $c$ such that $|c| = 1$.
Hint: First show that $\det(\hat{A}^{\dagger})$ is the complex conjugate of $\det \hat{A}$.

5.7 Antisymmetric operators

In this and the following sections we work in a real vector space $V$ in which a scalar product $\langle \cdot, \cdot \rangle$ is defined. The dimension of $V$ is $N \equiv \dim V$.

An operator $\hat{A}$ is antisymmetric with respect to the scalar product if
\[ \langle u, \hat{A} v \rangle + \langle \hat{A} u, v \rangle = 0, \quad \forall u, v \in V. \]

Exercise 1: Show that the set of all antisymmetric operators is a subspace of $V \otimes V^{*}$.

Exercise 2: Show that $\hat{A}^{T} + \hat{A} = 0$ if and only if the operator $\hat{A}$ is antisymmetric.

Remark: Exercise 2 shows that antisymmetric operators are represented by antisymmetric matrices in an orthonormal basis. However, the matrix of an operator in some other basis does not have to be antisymmetric. An operator can be antisymmetric with respect to one scalar product and not antisymmetric with respect to another.

Question: Surely an antisymmetric matrix has rather special properties. Why is it that the corresponding operator is only antisymmetric with respect to some scalar product? Is it not true that the corresponding operator has by itself special properties, regardless of any scalar product?

Answer: Yes, it is true. It is a special property of an operator that there exists a scalar product with respect to which the operator is antisymmetric. If we know that this is true, we can derive some useful properties of the given operator by using that scalar product. $\blacksquare$

Statement 1: A 2-vector $a \wedge b \in \wedge^{2} V$ can be mapped to an operator in $V$ by
\[ a \wedge b \mapsto \hat{A}; \quad \hat{A} x \equiv a \langle b, x \rangle - b \langle a, x \rangle, \quad \forall x \in V. \]
This formula defines a canonical isomorphism between the space of antisymmetric operators (with respect to the given scalar product) and $\wedge^{2} V$. In other words, any antisymmetric operator $\hat{A}$ can be represented by a 2-vector $A \in \wedge^{2} V$ and vice versa.

Proof: Left as exercise.
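The map of Statement 1 can be tried out numerically. The sketch below is an illustration only: it assumes that vectors are given by their components in an orthonormal basis (so that the scalar product is the ordinary dot product), and the names `dot` and `operator_from_2vector` as well as the sample vectors are hypothetical choices. It builds the operator $x \mapsto a \langle b, x \rangle - b \langle a, x \rangle$ and checks the antisymmetry condition on one pair of vectors.

```python
def dot(u, v):
    """Ordinary scalar product in an orthonormal basis."""
    return sum(x * y for x, y in zip(u, v))


def operator_from_2vector(a, b):
    """Return the operator x -> a<b,x> - b<a,x> assigned to the 2-vector a ^ b."""
    def apply(x):
        bx, ax = dot(b, x), dot(a, x)
        return [ai * bx - bi * ax for ai, bi in zip(a, b)]
    return apply


a = [1, 2, 0]
b = [0, 1, 3]
A = operator_from_2vector(a, b)

u = [2, -1, 5]
v = [4, 0, -3]
# Antisymmetry: <u, A v> + <A u, v> vanishes.
print(dot(u, A(v)) + dot(A(u), v))
```

Exchanging `a` and `b` flips the sign of the resulting operator, in agreement with $b \wedge a = -a \wedge b$.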


Statement 2: Any 2-vector $A \in \wedge^{2} V$ can be written as a sum $\sum_{j=1}^{n} a_j \wedge b_j$ using $n$ terms, where $n$ is some number such that $n \leq \frac{1}{2} N$ (here $N \equiv \dim V$), and the set of vectors $\{a_1, b_1, ..., a_n, b_n\}$ is linearly independent.

Proof: By definition, a 2-vector $A$ is representable as a linear combination of the form
\[ A = \sum_{j=1}^{n} a_j \wedge b_j, \]
with some vectors $a_j, b_j \in V$ and some value of $n$. We will begin with this representation and transform it in order to minimize the number of terms.

The idea is to make sure that the set of vectors $\{a_1, b_1, ..., a_n, b_n\}$ is linearly independent. If this is not so, there exists a linear relation, say
\[ a_1 = \beta_1 b_1 + \sum_{j=2}^{n} (\alpha_j a_j + \beta_j b_j), \]
with some coefficients $\alpha_j$ and $\beta_j$. Using this relation, the term $a_1 \wedge b_1$ can be rewritten as
\[ a_1 \wedge b_1 = \sum_{j=2}^{n} (\alpha_j a_j + \beta_j b_j) \wedge b_1 . \]
These terms can be absorbed by other terms $a_j \wedge b_j$ ($j = 2, ..., n$). For example, by rewriting
\[ a_2 \wedge b_2 + \alpha_2 a_2 \wedge b_1 + \beta_2 b_2 \wedge b_1 = (a_2 - \beta_2 b_1) \wedge (b_2 + \alpha_2 b_1) \equiv \tilde{a}_2 \wedge \tilde{b}_2 \]
we can absorb the term $(\alpha_j a_j + \beta_j b_j) \wedge b_1$ with $j = 2$ into $a_2 \wedge b_2$, replacing the vectors $a_2$ and $b_2$ by new vectors $\tilde{a}_2$ and $\tilde{b}_2$. In this way, we can redefine the vectors $a_j, b_j$ ($j = 2, ..., n$) so that the term $a_1 \wedge b_1$ is eliminated from the expression for $A$. We continue this procedure until the set of all the vectors $a_j, b_j$ is linearly independent. We now denote again by $\{a_1, b_1, ..., a_n, b_n\}$ the resulting linearly independent set of vectors such that the representation $A = \sum_{j=1}^{n} a_j \wedge b_j$ still holds. Note that the final number $n$ may be smaller than the initial number. Since the number of vectors ($2n$) in the final, linearly independent set $\{a_1, b_1, ..., a_n, b_n\}$ cannot be greater than $N$, the dimension of the space $V$, we have $2n \leq N$ and so $n \leq \frac{1}{2} N$. $\blacksquare$

Exercise 3: A 2-vector $A \in \wedge^{2} V$ satisfies $A \wedge A = 0$. Show that $A$ can be expressed as a single-term exterior product, $A = a \wedge b$.
Hint: Express $A$ as a sum of smallest number of single-term products, $A = \sum_{j=1}^{n} a_j \wedge b_j$, and show that $A \wedge A = 0$ implies $n = 1$: By Statement 2, the set $\{a_i, b_i\}$ is linearly independent. If $n > 1$, the expression $A \wedge A$ will contain terms such as $a_1 \wedge b_1 \wedge a_2 \wedge b_2$; a linear combination of these terms cannot vanish, since they are all linearly independent of each other. To show that rigorously, apply suitably chosen covectors $a_i^{*}$ and $b_i^{*}$. $\blacksquare$

Antisymmetric operators have the following properties.

Exercise 4: Show that the trace of an antisymmetric operator is equal to zero.
Hint: Use the property $\operatorname{Tr}(\hat{A}^{T}) = \operatorname{Tr}(\hat{A})$.

Exercise 5: Show that the determinant of the antisymmetric operator is equal to zero in an odd-dimensional space.

Remark: Note that the property of being antisymmetric is defined only with respect to a chosen scalar product. (An operator may be represented by an antisymmetric matrix in some basis, but not in another basis. An antisymmetric operator is represented by an antisymmetric matrix only in an orthonormal basis.) The properties shown in Exercises 3 and 4 will hold for any operator $\hat{A}$ such that some scalar product exists with respect to which $\hat{A}$ is antisymmetric. If $\hat{A}$ is represented by an antisymmetric matrix in a given basis $\{e_j\}$, we may define the scalar product by requiring that $\{e_j\}$ be an orthonormal basis; then $\hat{A}$ will be antisymmetric with respect to that scalar product.

Exercise 6: Show that the canonical scalar product $\langle A, B \rangle$ in the space $\wedge^{2} V$ (see Sec. 5.5.2) coincides with the scalar product $\langle \hat{A}, \hat{B} \rangle \equiv \operatorname{Tr}(\hat{A}^{T} \hat{B})$ when the 2-vectors $A$ and $B$ are mapped into antisymmetric operators $\hat{A}$ and $\hat{B}$.
Hint: It is sufficient to consider the basis tensors $e_i \wedge e_j$ as operators $\hat{A}$ and $\hat{B}$.

Exercise 7:* Show that any 2-vector $A$ can be written as $A = \sum_{i=1}^{n} \lambda_i a_i \wedge b_i$, where the set $\{a_1, b_1, ..., a_n, b_n\}$ is orthonormal.
Outline of solution: Consider the complexified vector space $V \otimes \mathbb{C}$ in which a Hermitian scalar product is defined; extend $\hat{A}$ into that space, and show that $\hat{A}$ is anti-Hermitian. Then $\hat{A}$ is diagonalizable and has all imaginary eigenvalues. However, the operator $\hat{A}$ is real; therefore, its eigenvalues come in pairs of complex conjugate imaginary values $\{i\lambda_1, -i\lambda_1, ..., i\lambda_n, -i\lambda_n\}$. The corresponding eigenvectors $\{v_1, \bar{v}_1, ..., v_n, \bar{v}_n\}$ are orthogonal and can be rescaled so that they are orthonormal. Further, we may choose these vectors such that $\bar{v}_i$ is the vector complex conjugate to $v_i$. The tensor representation of $\hat{A}$ is
\[ \hat{A} = \sum_{i=1}^{n} i\lambda_i \left( v_i \otimes v_i^{*} - \bar{v}_i \otimes \bar{v}_i^{*} \right), \]
where $\{v_i^{*}, \bar{v}_i^{*}\}$ is the basis dual to $\{v_i, \bar{v}_i\}$. We now define the vectors
\[ a_i \equiv \frac{v_i + \bar{v}_i}{\sqrt{2}}, \quad b_i \equiv \frac{v_i - \bar{v}_i}{i \sqrt{2}}, \]
and verify that
\[ \hat{A} a_i = -\lambda_i b_i, \quad \hat{A} b_i = \lambda_i a_i \quad (i = 1, ..., n). \]
Furthermore, the set of vectors $\{a_1, b_1, ..., a_n, b_n\}$ is orthonormal, and all the vectors $a_i$, $b_i$ are real. Therefore, we can represent $\hat{A}$ in the original space $V$ by the 2-vector
\[ A \equiv \sum_{i=1}^{n} \lambda_i (a_i \wedge b_i). \]
The set $\{a_1, b_1, ..., a_n, b_n\}$ yields the solution to the problem.

5.8 * Pfaffians

The Pfaffian is a construction analogous to the determinant, except that it applies only to antisymmetric operators in even-dimensional spaces with a scalar product.


Definition: If $\hat{A}$ is an antisymmetric operator in $V$ and $N \equiv \dim V$ is even, the Pfaffian of $\hat{A}$ is the number $\operatorname{Pf} \hat{A}$ defined (up to a sign) as the constant factor in the tensor equality
\[ (\operatorname{Pf} \hat{A})\, e_1 \wedge ... \wedge e_N = \frac{1}{(N/2)!} \underbrace{A \wedge ... \wedge A}_{N/2} = \frac{1}{(N/2)!} \bigwedge_{k=1}^{N/2} A, \]
where $\{e_1, ..., e_N\}$ is an orthonormal basis in $V$ and $A \in \wedge^{2} V$ is the tensor corresponding to the operator $\hat{A}$. (Note that both sides in the equation above are tensors from $\wedge^{N} V$.)

Remark: The sign of the Pfaffian depends on the orientation of the orthonormal basis. Other than that, the Pfaffian does not depend on the choice of the orthonormal basis $\{e_j\}$. If this ambiguity is not desired, one could consider a tensor-valued Pfaffian, $A \wedge ... \wedge A \in \wedge^{N} V$; this tensor does not depend on the choice of the orientation of the orthonormal basis. This is quite similar to the ambiguity of the definition of volume and to the possibility of defining an unambiguous but tensor-valued "oriented volume." However, it is important to note that $\{e_j\}$ must be a positively oriented orthonormal basis; if we change to an arbitrary basis, the tensor $e_1 \wedge ... \wedge e_N$ will be multiplied by some number not equal to $\pm 1$, which will make the definition of $\operatorname{Pf} \hat{A}$ impossible.

Question: Can we define the Pfaffian of an operator if we do not have a scalar product in $V$? Can we define the Pfaffian of an antisymmetric matrix?

Answer: We need a scalar product in order to map an operator $\hat{A} \in \operatorname{End} V$ to a bivector $A \in \wedge^{2} V$; this is central in the construction of the Pfaffian. If we know that an operator $\hat{A}$ is antisymmetric with respect to some scalar product (i.e. if we know that such a scalar product exists) then we can use that scalar product in order to define the Pfaffian of $\hat{A}$. In the language of matrices: If an antisymmetric matrix is given, we can postulate that this matrix represents an operator in some basis; then we can introduce a scalar product such that this basis is orthonormal, so that this operator is an antisymmetric operator with respect to this scalar product; and then the Pfaffian can be defined. $\blacksquare$

To make the correspondence between operators and bivectors more visual, let us represent operators by their matrices in an orthonormal basis. Antisymmetric operators are then represented by antisymmetric matrices.

Examples: First we consider a two-dimensional space $V$. Any $2 \times 2$ antisymmetric matrix $A$ is necessarily of the form
\[ A = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}, \]
where $a$ is some number; the determinant of $A$ is then $a^{2}$. Let us compute the Pfaffian of $A$. We find the representation of $A$ as an element of $\wedge^{2} V$ as follows, $A = a\, e_1 \wedge e_2$, and hence $\operatorname{Pf} A = a$. We note that the determinant is equal to the square of the Pfaffian.

Let us now consider a four-dimensional space $V$ and a $4 \times 4$ antisymmetric matrix; such a matrix must be of the form
\[ B = \begin{pmatrix} 0 & a & b & c \\ -a & 0 & x & y \\ -b & -x & 0 & z \\ -c & -y & -z & 0 \end{pmatrix}, \]
where the numbers $a, b, c, x, y, z$ are arbitrary. Let us compute the Pfaffian and the determinant of the operator represented by this matrix. We find the representation of $B$ as an element of $\wedge^{2} V$ as follows,
\[ B = a\, e_1 \wedge e_2 + b\, e_1 \wedge e_3 + c\, e_1 \wedge e_4 + x\, e_2 \wedge e_3 + y\, e_2 \wedge e_4 + z\, e_3 \wedge e_4 . \]
Therefore,
\[ \frac{1}{2!} B \wedge B = (az - by + cx)\, e_1 \wedge e_2 \wedge e_3 \wedge e_4 . \]
(Note that the factor $\frac{1}{2!}$ cancels the combinatorial factor 2 resulting from the antisymmetry of the exterior product.) Hence, $\operatorname{Pf} B = az - by + cx$.

Exercise: Compute the determinant of $B$ in the example above; show that
\[ \det B = a^{2} z^{2} - 2abyz + b^{2} y^{2} - 2bcxy + c^{2} x^{2} + 2acxz. \]
We see that, again, the determinant is equal to the square of the Pfaffian (which is easier to compute).

Remark: The factor $1/(N/2)!$ used in the definition of the Pfaffian is a combinatorial factor. This factor could be inconvenient if we were calculating in a finite number field where one cannot divide by $(N/2)!$. This inconvenience can be avoided if we define the Pfaffian of a tensor $A = v_1 \wedge v_2 + ... + v_{n-1} \wedge v_n$ as zero if $n < N$ and as the coefficient in the tensor equality
\[ v_1 \wedge ... \wedge v_N \overset{!}{=} (\operatorname{Pf} \hat{A})\, e_1 \wedge ... \wedge e_N \]
if $n = N$. For example, consider the tensor
\[ A = a \wedge b + c \wedge d \]
in a four-dimensional space ($N = 4$). We compute
\[ A \wedge A = (a \wedge b + c \wedge d) \wedge (a \wedge b + c \wedge d) = 0 + a \wedge b \wedge c \wedge d + c \wedge d \wedge a \wedge b + 0 = 2\, a \wedge b \wedge c \wedge d. \]
It is clear that the factor $2 = (N/2)!$ arises due to the presence of 2 possible permutations of the two tensors $a \wedge b$ and $c \wedge d$ and is therefore a combinatorial factor. We can avoid the division by 2 in the definition of the Pfaffian if we consider the tensor $a \wedge b \wedge c \wedge d$ right away, instead of dividing $A \wedge A$ by 2. $\blacksquare$

5.8.1 Determinants are Pfaffians squared

In the examples in the previous section, we have seen that the determinant turned out to be equal to the square of the Pfaffian of the same operator. We will now prove this correspondence in the general case.

Theorem: Given a linear operator $\hat{A}$ in an even-dimensional space $V$ where a scalar product is defined, and given that the operator $\hat{A}$ is antisymmetric with respect to that scalar product, we have
\[ (\operatorname{Pf} \hat{A})^{2} = \det \hat{A}. \]
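Before turning to the proof, the four-dimensional example can be checked by a short computation. The following Python sketch is only an illustration under stated assumptions: the determinant is computed by a plain Laplace expansion (a standard method, not the exterior-product technique of this book), the Pfaffian is taken from the formula $\operatorname{Pf} B = az - by + cx$ found in the example, and the function names and sample numbers are hypothetical.

```python
def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def example_matrix(a, b, c, x, y, z):
    """The 4x4 antisymmetric matrix B of the example."""
    return [[0, a, b, c],
            [-a, 0, x, y],
            [-b, -x, 0, z],
            [-c, -y, -z, 0]]


def pfaffian_4(a, b, c, x, y, z):
    """Pf B = az - by + cx, as read off from (1/2!) B ^ B."""
    return a * z - b * y + c * x


values = (1, 2, 3, 4, 5, 6)
print(pfaffian_4(*values), det(example_matrix(*values)))  # prints: 8 64
```

With integer entries the comparison is exact, so the equality $\det B = (\operatorname{Pf} B)^{2}$ can be tested for any choice of the six numbers.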


Proof: We know that the tensor $A \in \wedge^{2} V$ corresponding to the operator $\hat{A}$ can be written in the form
\[ A = v_1 \wedge v_2 + ... + v_{k-1} \wedge v_k, \]
where the set of vectors $\{v_1, ..., v_k\}$ is linearly independent (Statement 2 in Sec. 5.7) and $k \leq N$ is an even number.

We begin by considering the case $k < N$. In this case the exterior product $A \wedge ... \wedge A$ (where $A$ is taken $N/2$ times) will be equal to zero because there are only $k$ different vectors in that exterior product, while the total number of vectors is $N$, so at least two vectors $v_i$ must be repeated. Also $\det \hat{A} = 0$ in this case; this can be shown explicitly by completing $\{v_1, ..., v_k\}$ to a basis $\{v_1, ..., v_k, e_{k+1}, ..., e_N\}$ such that all $e_j$ are orthogonal to all $v_i$. (This can be done by first completing $\{v_1, ..., v_k\}$ to a basis and then applying the Gram-Schmidt orthogonalization procedure to the vectors $e_j$, $j = k+1, ..., N$.) Then we will have $\hat{A} e_j = 0$ ($j = k+1, ..., N$). Acting with $\wedge^{N} \hat{A}^{N}$ on the tensor $v_1 \wedge ... \wedge v_k \wedge e_{k+1} \wedge ... \wedge e_N$, we find
\[ (\wedge^{N} \hat{A}^{N})(v_1 \wedge ... \wedge v_k \wedge e_{k+1} \wedge ... \wedge e_N) = ... \wedge \hat{A} e_N = 0 \]
and hence $\det \hat{A} = 0$. Thus $(\operatorname{Pf} \hat{A})^{2} = 0 = \det \hat{A}$, and there is nothing left to prove in case $k < N$.

It remains to consider the interesting case $k = N$. In this case, the set $\{v_1, ..., v_N\}$ is a basis in $V$. The Pfaffian $\operatorname{Pf} \hat{A}$ is the coefficient in the tensor equality
\[ \frac{1}{(N/2)!} \bigwedge_{k=1}^{N/2} A = v_1 \wedge ... \wedge v_N \overset{!}{=} (\operatorname{Pf} \hat{A})\, e_1 \wedge ... \wedge e_N, \]
where $\{e_j\}$ is an orthonormal basis. In other words, $\operatorname{Pf} \hat{A}$ is the (oriented) volume of the parallelepiped spanned by the vectors $\{v_j \mid j = 1, ..., N\}$, if we assume that the vectors $\{e_j\}$ span a unit volume. Now it is clear that $\operatorname{Pf} \hat{A} \neq 0$.

Let us denote by $\{v_j^{*}\}$ the dual basis to $\{v_j\}$. Due to the one-to-one correspondence between vectors and covectors, we map $\{v_j^{*}\}$ into the reciprocal basis $\{u_j\}$. We now apply the operator $\hat{A}$ to the reciprocal basis $\{u_j\}$ and find by a direct calculation (using the property $\langle v_i, u_j \rangle = \delta_{ij}$) that $\hat{A} u_1 = -v_2$, $\hat{A} u_2 = v_1$, and so on. Hence
\[ \hat{A} u_1 \wedge ... \wedge \hat{A} u_N = (-v_2) \wedge v_1 \wedge ... \wedge (-v_N) \wedge v_{N-1} = v_1 \wedge v_2 \wedge ... \wedge v_N . \]
It follows that $\det \hat{A}$ is the coefficient in the tensor equality
\[ \hat{A} u_1 \wedge ... \wedge \hat{A} u_N = v_1 \wedge ... \wedge v_N \overset{!}{=} (\det \hat{A})\, u_1 \wedge ... \wedge u_N . \quad (5.8) \]
In particular, $\det \hat{A} \neq 0$.

In order to prove the desired relationship between the determinant and the Pfaffian, it remains to compute the volume spanned by the dual basis $\{u_j\}$, so that the tensor $u_1 \wedge ... \wedge u_N$ can be related to $e_1 \wedge ... \wedge e_N$. By Statement 2 in Sec. 5.4.4, the volume spanned by $\{u_j\}$ is the inverse of the volume spanned by $\{v_j\}$. Therefore the volume spanned by $\{u_j\}$ is equal to $1/\operatorname{Pf} \hat{A}$. Now we can compute the Pfaffian of $\hat{A}$ using
\[ u_1 \wedge ... \wedge u_N = (\operatorname{Pf} \hat{A})^{-1} e_1 \wedge ... \wedge e_N \]
together with Eq. (5.8):
\[ \operatorname{Pf} \hat{A} = \frac{v_1 \wedge ... \wedge v_N}{e_1 \wedge ... \wedge e_N} = \frac{(\det \hat{A})(\operatorname{Pf} \hat{A})^{-1} e_1 \wedge ... \wedge e_N}{e_1 \wedge ... \wedge e_N} = (\det \hat{A})(\operatorname{Pf} \hat{A})^{-1} . \]
Hence $\det \hat{A} = (\operatorname{Pf} \hat{A})^{2}$. $\blacksquare$

5.8.2 Further properties

Having demonstrated the techniques of working with antisymmetric operators and Pfaffians, I propose to you the following exercises that demonstrate some other properties of Pfaffians. These exercises conclude this book.

Exercise 1: Let $\hat{A}$ be an antisymmetric operator; let $\hat{B}$ be an arbitrary operator. Prove that $\operatorname{Pf}(\hat{B} \hat{A} \hat{B}^{T}) = \det(\hat{B}) \operatorname{Pf} \hat{A}$.
Hint: If $\hat{A}$ corresponds to the bivector $A = v_1 \wedge v_2 + ... + v_{k-1} \wedge v_k$, show that $\hat{B} \hat{A} \hat{B}^{T}$ corresponds to the bivector $\hat{B} v_1 \wedge \hat{B} v_2 + ... + \hat{B} v_{k-1} \wedge \hat{B} v_k$.

Exercise 2: Let $\hat{A}$ be an antisymmetric operator such that $\det \hat{A} \neq 0$; let $\{e_i \mid i = 1, ..., 2n\}$ be a given orthonormal basis. Prove that there exists an operator $\hat{B}$ such that the operator $\hat{B} \hat{A} \hat{B}^{T}$ is represented by the bivector $e_1 \wedge e_2 + ... + e_{2n-1} \wedge e_{2n}$. Deduce that $\det \hat{A} = (\operatorname{Pf} \hat{A})^{2}$.
Hint: This is a paraphrase of the proof of Theorem 5.8.1. Use the previous exercise and represent $\hat{A}$ by the bivector $v_1 \wedge v_2 + ... + v_{2n-1} \wedge v_{2n}$, where the set $\{v_i\}$ is a basis. Define $\hat{B}$ as a map $e_i \mapsto v_i$; then $\hat{B}^{-1}$ exists and maps $v_i \mapsto e_i$. Show that $\operatorname{Pf} \hat{A} = 1/(\det \hat{B})$.

Exercise 3: Use the result of Exercise 5 in Sec. 5.7 to prove that $\det \hat{A} = (\operatorname{Pf} \hat{A})^{2}$.
Hint: For an operator $\hat{A} = \sum_{i=1}^{n} \lambda_i a_i \wedge b_i$, where $\{a_1, b_1, ..., a_n, b_n\}$ is a positively oriented orthonormal basis and $2n \equiv N$, show that $\operatorname{Pf} \hat{A} = \lambda_1 ... \lambda_n$ and $\det \hat{A} = \lambda_1^{2} ... \lambda_n^{2}$.

Exercise 4:* An operator $\hat{A}$ is antisymmetric and is represented in some orthonormal basis by a block matrix of the form
\[ \hat{A} = \begin{pmatrix} 0 & \hat{M} \\ -\hat{M}^{T} & 0 \end{pmatrix}, \]
where $\hat{M}$ is an arbitrary $n$-dimensional matrix. Show that
\[ \operatorname{Pf} \hat{A} = (-1)^{\frac{1}{2} n(n-1)} \det \hat{M}. \]
Solution: We need to represent $\hat{A}$ by a bivector from $\wedge^{2} V$. The given form of the matrix $\hat{A}$ suggests that we consider the splitting of the space $V$ into a direct sum of two orthogonal $n$-dimensional subspaces, $V = U_1 \oplus U_2$, where $U_1$ and $U_2$ are two copies of the same $n$-dimensional space $U$. A scalar product in $U$ is defined naturally (by restriction), given the scalar product in $V$. We will denote by $\langle \cdot, \cdot \rangle$ the scalar product in $U$. The given matrix form of $\hat{A}$ means that we have a given operator $\hat{M} \in \operatorname{End} U$ such that $\hat{A}$ acts on vectors from $V$ as
\[ \hat{A}(v_1 \oplus v_2) = (\hat{M} v_2) \oplus (-\hat{M}^{T} v_1), \quad v_1, v_2 \in U. \quad (5.9) \]
We can choose an orthonormal basis $\{c_i \mid i = 1, ..., n\}$ in $U$ and represent the operator $\hat{M}$ through some suitable vectors $\{m_i \mid i = 1, ..., n\}$ (not necessarily orthogonal) such that
\[ \hat{M} u = \sum_{i=1}^{n} m_i \langle c_i, u \rangle, \quad u \in U. \]


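Note that the vectors $m_i$ are found from $\hat{M} c_i = m_i$. It follows that $\hat{M}^{T} u = \sum_{i=1}^{n} c_i \langle m_i, u \rangle$. Using Eq. (5.9), we can then write the tensor representation of $\hat{A}$ as
\[ \hat{A} = \sum_{i=1}^{n} \left[ (m_i \oplus 0) \otimes (0 \oplus c_i)^{*} - (0 \oplus c_i) \otimes (m_i \oplus 0)^{*} \right]. \]
Hence, $\hat{A}$ can be represented by the 2-vector
\[ A = \sum_{i=1}^{n} (m_i \oplus 0) \wedge (0 \oplus c_i) \in \wedge^{2} V. \]
The Pfaffian of $\hat{A}$ is then found from
\[ \operatorname{Pf} \hat{A} = \frac{(m_1 \oplus 0) \wedge (0 \oplus c_1) \wedge ... \wedge (m_n \oplus 0) \wedge (0 \oplus c_n)}{e_1 \wedge ... \wedge e_{2n}}, \]
where $\{e_i \mid i = 1, ..., 2n\}$ is an orthonormal basis in $V$. We can choose this basis as $e_i = c_i \oplus 0$, $e_{n+i} = 0 \oplus c_i$ (for $i = 1, ..., n$). By introducing the sign factor $(-1)^{\frac{1}{2} n(n-1)}$, we may rearrange the exterior products so that all $m_i$ are together. Hence
\[ \operatorname{Pf} \hat{A} = (-1)^{\frac{1}{2} n(n-1)} \frac{(m_1 \oplus 0) \wedge ... \wedge (m_n \oplus 0) \wedge (0 \oplus c_1) \wedge ... \wedge (0 \oplus c_n)}{(c_1 \oplus 0) \wedge ... \wedge (c_n \oplus 0) \wedge (0 \oplus c_1) \wedge ... \wedge (0 \oplus c_n)} . \]
Vectors corresponding to different subspaces can be factorized, and then the factors containing $0 \oplus c_i$ can be canceled:
\[ \operatorname{Pf} \hat{A} = (-1)^{\frac{1}{2} n(n-1)} \frac{m_1 \wedge ... \wedge m_n}{c_1 \wedge ... \wedge c_n} \cdot \frac{c_1 \wedge ... \wedge c_n}{c_1 \wedge ... \wedge c_n} = (-1)^{\frac{1}{2} n(n-1)} \frac{m_1 \wedge ... \wedge m_n}{c_1 \wedge ... \wedge c_n} . \]
Finally, we have
\[ \frac{m_1 \wedge ... \wedge m_n}{c_1 \wedge ... \wedge c_n} = \frac{\hat{M} c_1 \wedge ... \wedge \hat{M} c_n}{c_1 \wedge ... \wedge c_n} = \det \hat{M}. \]
This concludes the calculation. $\blacksquare$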
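The result of Exercise 4 lends itself to a quick numerical check. The sketch below is an illustration under stated assumptions: it evaluates the Pfaffian of a matrix by the standard recursive expansion along the first row (a matrix-level formula that is not derived in this book), uses integer matrices so that all comparisons are exact, and the names `pfaffian`, `det`, and `block_matrix` are hypothetical helpers.

```python
def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def pfaffian(m):
    """Pfaffian of an antisymmetric matrix by expansion along the first row."""
    size = len(m)
    if size == 0:
        return 1
    if size % 2 == 1:
        return 0
    total = 0
    for j in range(1, size):
        # Delete rows and columns 0 and j, then recurse on the smaller matrix.
        keep = [k for k in range(1, size) if k != j]
        minor = [[m[r][c] for c in keep] for r in keep]
        total += (-1) ** (j - 1) * m[0][j] * pfaffian(minor)
    return total


def block_matrix(M):
    """Build the 2n x 2n matrix [[0, M], [-M^T, 0]] from an n x n matrix M."""
    n = len(M)
    top = [[0] * n + list(row) for row in M]
    bottom = [[-M[r][c] for r in range(n)] + [0] * n for c in range(n)]
    return top + bottom


M = [[1, 2],
     [3, 4]]
n = len(M)
sign = (-1) ** (n * (n - 1) // 2)
print(pfaffian(block_matrix(M)), sign * det(M))  # prints: 2 2
```

For $n = 2$ the sign factor is $-1$ and $\det \hat{M} = -2$, so both numbers equal 2; the same comparison can be repeated for other sizes and entries.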

A Complex numbers
This appendix is a crash course on complex numbers.

A.1 Basic definitions

A complex number is a formal expression $a + ib$, where $a, b$ are real numbers. In other words, a complex number is simply a pair $(a, b)$ of real numbers, written in a more convenient notation as $a + ib$. One writes, for example, $2 + i3$ or $2 + 3i$ or $3 + i$ or $5i - 8$, etc. The imaginary unit, denoted "$i$", is not a real number; it is a symbol which has the property $i^{2} = -1$. Using this property, we can apply the usual algebraic rules to complex numbers; this is emphasized by the algebraic notation $a + ib$. For instance, we can add and multiply complex numbers,
\[ (1 + i) + 5i = 1 + 6i; \]
\[ (1 - i)(2 + i) = 2 - 2i + i - i^{2} = 3 - i; \]
\[ i^{3} = i i^{2} = -i. \]
It is straightforward to see that the result of any arithmetic operation on complex numbers turns out to be again a complex number. In other words, one can multiply, divide, add, subtract complex numbers just as directly as real numbers.

The set of all complex numbers is denoted by $\mathbb{C}$. The set of all real numbers is $\mathbb{R}$.

Exercise: Using directly the definition of the imaginary unit, compute the following complex numbers.
\[ \frac{1}{i} = ? \quad i^{4} = ? \quad i^{5} = ? \quad \left( \frac{1}{2} + \frac{i \sqrt{3}}{2} \right)^{3} = ? \]

The complex number $a - ib$ is called complex conjugate to $a + ib$. Conjugation is denoted either with an overbar or with a star superscript,
\[ z = a + ib, \quad \bar{z} = z^{*} = a - ib, \]
according to convenience. Note that
\[ z z^{*} = (a + ib)(a - ib) = a^{2} + b^{2} \in \mathbb{R}. \]
In order to divide by a complex number more easily, one multiplies the numerator and the denominator by the complex conjugate number, e.g.
\[ \frac{1}{3 + i} = ? = \frac{1}{3 + i} \cdot \frac{3 - i}{3 - i} = \frac{3 - i}{9 - i^{2}} = \frac{3 - i}{10} = \frac{3}{10} - \frac{1}{10} i. \]

Exercise: Compute the following complex numbers,
\[ \frac{1 - i}{1 + i} = ? \quad \frac{1 - i}{4 + i} - \frac{1 + i}{4 - i} = ? \quad \frac{1}{a + ib} = ? \]
where $a, b \in \mathbb{R}$. $\blacksquare$

Another view of complex numbers is that they are linear polynomials in the formal variable "$i$." Since we may replace $i^{2}$ by $-1$ and $i^{-1}$ by $-i$ wherever any power of "$i$" appears, we can reduce any power series in $i$ and/or in $i^{-1}$ to a linear combination of $1$ and $i$.

If $z = a + ib$ where $a, b \in \mathbb{R}$ then $a$ is called the real part, $\operatorname{Re} z$, and $b$ is the imaginary part, $\operatorname{Im} z$. In other words,
\[ \operatorname{Re}(a + ib) = a, \quad \operatorname{Im}(a + ib) = b. \]
The absolute value or modulus of $z = a + ib$ is the real number $|z| \equiv \sqrt{a^{2} + b^{2}}$.

Exercise: Compute
\[ \operatorname{Re}\left[ (2 + i)^{2} \right] = ? \quad |3 + 4i| = ? \]
Prove that
\[ \operatorname{Re} z = \frac{z + \bar{z}}{2}; \quad \operatorname{Im} z = \frac{z - \bar{z}}{2i}; \quad |z|^{2} = z \bar{z}; \]
\[ |z| = |\bar{z}|; \quad |z_1 z_2| = |z_1| |z_2|; \quad (z_1 z_2)^{*} = z_1^{*} z_2^{*} \]
for any complex numbers $z, z_1, z_2 \in \mathbb{C}$.

A.2 Geometric representation

Let us draw a complex number $z = x + iy$ as a point with coordinates $(x, y)$ in the Euclidean plane, or a vector with real components $(x, y)$. You can check that the sum $z_1 + z_2$ and the product of $z$ with a real number $\lambda$, that is $z \mapsto z\lambda$, correspond to the familiar operations of adding two vectors and multiplying a vector by a scalar. Also, the absolute value $|z|$ is equal to the length of the two-dimensional vector $(x, y)$ as computed in the usual Euclidean space.

Exercise: Show that the multiplication of $z = x + iy$ by a complex number $r \equiv \cos \phi + i \sin \phi$ corresponds to rotating the vector $(x, y)$ by angle $\phi$ counterclockwise (assuming that the $x$ axis is horizontal and points to the right, and the $y$ axis points vertically upwards). Show that $|rz| = |z|$, which corresponds to the fact that the length of a vector does not change after a rotation.

A.3 Analytic functions

Analytic functions are such functions $f(x)$ that can be represented by a power series $f(x) = \sum_{n=0}^{\infty} c_n x^{n}$ with some coefficients $c_n$ such that the series converges at least for some real $x$. In that case, the series will converge also for some complex $x$. In this sense, analytic functions are naturally extended from real to complex numbers. For example, $f(x) = x^{2} + 1$ is an analytic function; it can be computed just as well for any complex $x$ as for real $x$.

An example of a non-analytic function is the Heaviside step function
\[ \theta(x) = \begin{cases} 0, & x < 0; \\ 1, & x \geq 0. \end{cases} \]


This function cannot be represented by a power series and thus cannot be naturally extended to complex numbers. In other words, there is no useful way to define the value of, say, $\theta(2i)$. On the other hand, functions such as $\cos x$, $\sqrt{x}$, $x / \ln x$, $\int_{0}^{x} e^{-t^{2}} dt$, and so on, are analytic and can be evaluated for complex $x$.

Exercise: Compute $(1 + 2i)(1 + 3i)$ and $(1 - 2i)(1 - 3i)$. What did you notice? Prove that $f(z^{*}) = [f(z)]^{*}$ for any analytic function $f(z)$.

Remark: Although $\sqrt{x}$ has no power series expansion at $x = 0$, it has a Taylor expansion at $x = 1$, which is sufficient for analyticity; one can also define $\sqrt{z}$ for complex $z$ through the property $(\sqrt{z})^{2} = z$.

Exercise: Derive an explicit formula for the square root of a complex number, $\sqrt{a + ib}$, where $a, b \in \mathbb{R}$.
Hint: Write $\sqrt{a + ib} = x + iy$, square both sides, and solve for $x$ and $y$.
Answer:
\[ \sqrt{a + ib} = \pm \left[ \sqrt{\frac{\sqrt{a^{2} + b^{2}} + a}{2}} + i \operatorname{sign}(b) \sqrt{\frac{\sqrt{a^{2} + b^{2}} - a}{2}} \right], \]
where $\operatorname{sign}(b) = 1, 0, -1$ when $b$ is positive, zero, or negative. Note that this formula may be rewritten for quicker calculation as
\[ \sqrt{a + ib} = \pm \left( r + i \frac{b}{2r} \right), \quad r \equiv \sqrt{\frac{\sqrt{a^{2} + b^{2}} + a}{2}} . \]
(In this formula, the square roots in the definition of $r$ are purely real and positive.)

A.4 Exponent and logarithm

The exponential function and the logarithmic function are analytic functions.

The exponential function is defined through the power series
\[ e^{z} \equiv \exp z \equiv 1 + \frac{1}{1!} z + \frac{1}{2!} z^{2} + ... = \sum_{n=0}^{\infty} \frac{z^{n}}{n!} . \]
This series converges for all complex $z$.

Exercise: Verify the Euler formula,
\[ e^{i\phi} = \cos \phi + i \sin \phi, \quad \phi \in \mathbb{R}, \]
by using the known Taylor series for $\sin x$ and $\cos x$. Calculate:
\[ e^{2i} = ? \quad e^{\pi i} = ? \quad e^{\frac{1}{2} \pi i} = ? \quad e^{2\pi i} = ? \]

Exercise: Use the identity $e^{a+b} = e^{a} e^{b}$, which holds also for complex numbers $a, b$, to show that
\[ e^{a + ib} = e^{a} (\cos b + i \sin b), \quad a, b \in \mathbb{R}. \]
Calculate:
\[ \exp\left[ \ln 2 + \frac{\pi}{2} i \right] = ? \quad \exp[1 + \pi i] = ? \quad \cos\left( \frac{1}{2} \pi i \right) = ? \]

The logarithm of a complex number $z$ is a complex number denoted $\ln z$ such that $e^{\ln z} = z$. It is easy to see that
\[ \exp[z + 2\pi i] = \exp z, \quad z \in \mathbb{C}, \]
in other words, the logarithm is defined only up to adding $2\pi i$. So the logarithm (at least in our simple-minded approach here) is not a single-valued function. For example, we have $\ln(-1) = \pi i$ or $3\pi i$ or $-\pi i$, so one can write
\[ \ln(-1) = \{ \pi i + 2\pi n i \mid n \in \mathbb{Z} \} . \]

Exercise: a) Calculate:
\[ \ln i = ? \quad \ln(8i) = ? \]
b) Show that the geometric or polar representation of a complex number $z = x + iy = \rho e^{i\phi}$ can be computed using the logarithm:
\[ \rho = \exp(\operatorname{Re} \ln z) = |z|, \quad \phi = \operatorname{Im} \ln z = \arctan \frac{y}{x} . \]
Determine the polar representation of the following complex numbers: $z_1 = 2 + 2i$, $z_2 = \sqrt{3} + i$. Calculate also $\ln z_1$ and $\ln z_2$.
c) Powers of a complex number can be defined by $z^{x} \equiv \exp[x \ln z]$. Here $x$ can be also a complex number! As a rule, $z^{x}$ is not uniquely defined (unless $x$ is a real integer). Calculate:
\[ \sqrt{i} = ? \quad \sqrt{\frac{1}{2} + \frac{\sqrt{3}}{2} i} = ? \quad \sqrt[6]{1} = ? \quad i^{i} = ? \quad 3^{2\pi i} = ? \]
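The explicit square-root formula of Sec. A.3 can be compared with a library routine. The sketch below is an illustration only: it implements the formula with the "+" sign for the case $b \neq 0$ (the function names `sqrt_formula` and `sqrt_quick` and the sample inputs are assumptions of this sketch) and compares the result with `cmath.sqrt` from the Python standard library, which returns the root with non-negative real part.

```python
import cmath
import math


def sqrt_formula(a, b):
    """One square root of a + ib (the '+' sign in the formula), assuming b != 0."""
    modulus = math.hypot(a, b)  # sqrt(a^2 + b^2)
    sign_b = 1 if b > 0 else -1
    real = math.sqrt((modulus + a) / 2)
    imag = sign_b * math.sqrt((modulus - a) / 2)
    return complex(real, imag)


def sqrt_quick(a, b):
    """The rewritten form r + i b/(2r), usable whenever r != 0."""
    r = math.sqrt((math.hypot(a, b) + a) / 2)
    return complex(r, b / (2 * r))


for a, b in [(3, 4), (-3, -4), (0, 2), (5, -12)]:
    w = sqrt_formula(a, b)
    # Columns: the root, its square, the quick form, and the library value.
    print(w, w * w, sqrt_quick(a, b), cmath.sqrt(complex(a, b)))
```

For example, the input $3 + 4i$ gives the root $2 + i$, whose square is again $3 + 4i$; the other root is obtained by changing the overall sign.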

B Permutations
In this appendix I briefly review some basic properties of permutations.

We consider the ordered set $(1, ..., N)$ of integers. A permutation of the set $(1, ..., N)$ is a map $\sigma : (1, ..., N) \mapsto (k_1, ..., k_N)$ where the $k_j$ are all different and again range from 1 to $N$. In other words, a permutation $\sigma$ is a one-to-one map of the set $(1, ..., N)$ to itself. For example,
\[ \sigma : (1, 2, 3, 4, 5) \mapsto (4, 1, 5, 3, 2) \]
is a permutation of the set of five elements.

We call a permutation elementary if it exchanges only two adjacent numbers, for example $(1, 2, 3, 4) \mapsto (1, 3, 2, 4)$. The identity permutation, denoted by id, does not permute anything. Two permutations $\sigma_1$ and $\sigma_2$ can be executed one after another; the result is also a permutation called the product (composition) of the permutations $\sigma_1$ and $\sigma_2$ and denoted $\sigma_2\sigma_1$ (where $\sigma_1$ is executed first, and then $\sigma_2$). For example, the product of $(1, 2, 3) \mapsto (1, 3, 2)$ and $(1, 2, 3) \mapsto (2, 1, 3)$ is $(1, 2, 3) \mapsto (3, 1, 2)$. The effect of this (non-elementary) permutation is to move 3 through 1 and 2 into the first place. Note that in this way we can move any number into any other place; for that, we need to use as many elementary permutations as places we are passing through.

The set of all permutations of $N$ elements is a group with respect to the product of permutations. This group is not commutative.

For brevity, let us write EP for elementary permutation. Note that $\sigma\sigma = \mathrm{id}$ when $\sigma$ is an EP. Now we will prove that the permutation group is generated by EPs.

Statement 1: Any permutation can be represented as a product of some finite number of EPs.

Proof: Suppose $\sigma : (1, ..., N) \mapsto (k_1, ..., k_N)$ is a given permutation. Let us try to reduce it to EPs. If $k_1 \neq 1$ then 1 is somewhere among the $k_i$, say at the place $i_1$. We can move 1 from the $i_1$-th place to the first place by executing a product of $i_1 - 1$ EPs (since we pass through $i_1 - 1$ places). Then we repeat the same operation with 2, moving it to the second place, and so on. The result will be that we obtain some (perhaps a large number of) EPs $\sigma_1, ..., \sigma_n$, such that $\sigma_1 ... \sigma_n \sigma = \mathrm{id}$. Using the property $\sigma_i^2 = \mathrm{id}$, we move the $\sigma_i$'s to the right and obtain $\sigma = \sigma_n ... \sigma_1$. ∎

Any given permutation $\sigma$ is thus equal to a product of EPs $\sigma_1$ to $\sigma_n$, but this representation is in any case not unique because, say, we may insert $\sigma_1\sigma_1 = \mathrm{id}$ in any place of the product $\sigma_n ... \sigma_1$ without changing the result. So the number of required EPs can be changed. However, it is very important (and we will prove this now) that the number of required EPs can only be changed by 2, never by 1.

In other words, we are going to prove the following statement: When a given permutation $\sigma$ is represented as a product of EPs, $\sigma = \sigma_n ... \sigma_1$, the number $n$ of these EPs is always either even or odd, depending on $\sigma$ but independent of the choice of the representation $\sigma_n ... \sigma_1$. Since the parity of $n$ (parity is whether $n$ is even or odd) is a property of the permutation $\sigma$ rather than of the representation of $\sigma$ through EPs, it will make sense to say that the permutation $\sigma$ is itself even or odd.

Statement 2: If $\sigma$ is represented as a product of EPs in two different ways, namely by a product of $n_1$ EPs and also by a product of $n_2$ EPs, then the integers $n_1$ and $n_2$ are both even or both odd.

Proof: Let us denote by $|\sigma|$ the smallest number of EPs required to represent a given permutation $\sigma$.[1] We will now show that $|\sigma|$ is equal to the number of order violations in $\sigma$, i.e. the number of instances when some larger number is situated to the left of some smaller number. For example, in the permutation $(1, 2, 3, 4) \mapsto (4, 1, 3, 2)$ there are four order violations: the pairs $(4, 1)$, $(4, 3)$, $(4, 2)$, and $(3, 2)$. It is clear that the correct order can be restored only when each order violation is resolved, which requires one EP for each order violation.

[1] In Definition D0 we used the notation $|\sigma|$ to mean 0 or 1 for even or odd permutations. However, the formula uses only $(-1)^{|\sigma|}$, so the present definition of $|\sigma|$ is still consistent with Definition D0.

The construction in the proof of Statement 1 shows that there exists a choice of exactly $|\sigma|$ EPs whose product equals $\sigma$. Therefore, $|\sigma|$ (the smallest number of EPs required to represent $\sigma$) is indeed equal to the number of order violations in $\sigma$.

Now consider multiplying $\sigma$ by some EP $\sigma_0$; it is clear that the number of order violations changes by $\pm 1$, that is, $|\sigma_0\sigma| = |\sigma| \pm 1$, depending on whether $\sigma_0$ violates the order existing in $\sigma$ at the two adjacent places affected by $\sigma_0$. For example, the permutation $\sigma = (4, 1, 3, 2)$ has four order violations, $|\sigma| = 4$; when we multiply $\sigma$ by $\sigma_0 = (1, 3, 2, 4)$, which is an EP exchanging the second and the third places, we create a new order violation in the pair $(3, 1)$ since $\sigma_0\sigma = (4, 3, 1, 2)$; hence $|\sigma_0\sigma| = 5$. Since $|\sigma|$ is changed by $\pm 1$, we have $(-1)^{|\sigma_0\sigma|} = -(-1)^{|\sigma|}$ in any case. Now we consider two representations of $\sigma$ through $n_1$ and through $n_2$ EPs. If $\sigma = \sigma_{n_1} ... \sigma_1$, where $\sigma_j$ are EPs, we find by induction
\[ (-1)^{|\sigma|} = (-1)^{|\sigma_{n_1} ... \sigma_1|} = (-1)^{n_1}. \]
Similarly for the second representation. So it follows that
\[ (-1)^{|\sigma|} = (-1)^{n_1} = (-1)^{n_2}. \]
Hence, the numbers $n_1$ and $n_2$ are either both even or both odd. ∎

It follows from the proof of Statement 2 that the number $(-1)^{|\sigma|}$ is independent of the representation of $\sigma$ through EPs. This number is called the parity of a permutation $\sigma$. For example, the permutation
\[ \sigma : (1, 2, 3, 4) \mapsto (1, 4, 3, 2) \]
has three order violations, the pairs $(4, 3)$, $(4, 2)$, and $(3, 2)$, so $|\sigma| = 3$, and is therefore an odd permutation with parity $-1$.

Definition: For a permutation $\sigma$, the inverse permutation $\sigma^{-1}$ is defined by $\sigma^{-1}\sigma = \sigma\sigma^{-1} = \mathrm{id}$.

Statement 3: The inverse permutation $\sigma^{-1}$ exists for every permutation $\sigma$, is unique, and the parity of $\sigma^{-1}$ is the same as the parity of $\sigma$.


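Proof: By Statement 1, we have $\sigma = \sigma_1 ... \sigma_n$ where $\sigma_i$ are EPs. Since $\sigma_i\sigma_i = \mathrm{id}$, we can define explicitly the inverse permutation as
\[ \sigma^{-1} \equiv \sigma_n \sigma_{n-1} ... \sigma_1. \]
It is obvious that $\sigma\sigma^{-1} = \sigma^{-1}\sigma = \mathrm{id}$, and so $\sigma^{-1}$ exists. If there were two different inverse permutations, say $\sigma^{-1}$ and $\sigma'$, we would have
\[ \sigma^{-1} = \sigma^{-1}\sigma\sigma' = \sigma'. \]
Therefore, the inverse is unique. Finally, by Statement 2, the parity of $\sigma^{-1}$ is equal to the parity of the number $n$, and thus equal to the parity of $\sigma$. (Alternatively, we may show that $|\sigma^{-1}| = |\sigma|$.) ∎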
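The notions of order violations, parity, and representation through EPs can be explored with a short program. This is a minimal sketch under stated assumptions: permutations are written as tuples $(k_1, ..., k_N)$, the function names are hypothetical, and the decomposition into adjacent exchanges is done by a plain bubble sort, which is one possible choice rather than the construction used in the proof of Statement 1.

```python
from itertools import permutations
from typing import List, Tuple


def order_violations(perm: Tuple[int, ...]) -> int:
    """Count pairs in which a larger number stands to the left of a smaller one."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def parity(perm: Tuple[int, ...]) -> int:
    """Return +1 for an even permutation and -1 for an odd one."""
    return (-1) ** order_violations(perm)


def sort_by_adjacent_swaps(perm: Tuple[int, ...]) -> List[int]:
    """Restore the order (1, ..., N) by exchanging adjacent entries only.

    Returns the 0-based left positions of the exchanges, in the order applied.
    """
    p = list(perm)
    swaps = []
    changed = True
    while changed:
        changed = False
        for i in range(len(p) - 1):
            if p[i] > p[i + 1]:  # an order violation at adjacent places
                p[i], p[i + 1] = p[i + 1], p[i]
                swaps.append(i)
                changed = True
    return swaps


# The example from the proof of Statement 2 has four order violations.
assert order_violations((4, 1, 3, 2)) == 4
assert parity((4, 1, 3, 2)) == 1
assert parity((2, 1, 3)) == -1

# Each adjacent exchange of a violating pair removes exactly one violation,
# so the number of exchanges equals the number of order violations.
for perm in permutations(range(1, 5)):
    assert len(sort_by_adjacent_swaps(perm)) == order_violations(perm)
```

The last loop checks all 24 permutations of four elements; it is a numerical illustration of the statement that $|\sigma|$ equals the number of order violations, not a proof.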


C Matrices
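This appendix is a crash course on vector and matrix algebra.

C.1 Definitions

Matrices are rectangular tables of numbers; here is an example of a $4 \times 4$ matrix:
\[ \begin{pmatrix} 1 & 0 & 0 & 2 \\ 2 & 1 & 0 & 0 \\ 3 & 2 & 1 & 0 \\ 4 & 3 & 2 & 1 \end{pmatrix}. \]
Matrices are used whenever it is convenient to arrange some numbers in a rectangular table.

To write matrices symbolically, one uses two indices, for example $A_{ij}$ is the matrix element in the $i$-th row and the $j$-th column. In this convention, the indices are integers ranging from 1 to each dimension of the matrix. For example, a $3 \times 2$ rectangular matrix can be written as a set of coefficients $\{B_{ij} \,|\, 1 \le i \le 3,\ 1 \le j \le 2\}$ and is displayed as
\[ \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{pmatrix}. \]
A matrix with dimensions $n \times 1$ is called a column since it has the shape
\[ \begin{pmatrix} A_{11} \\ \vdots \\ A_{n1} \end{pmatrix}. \]
A matrix with dimensions $1 \times n$ is called a row since it has the shape
\[ \begin{pmatrix} A_{11} & \ldots & A_{1n} \end{pmatrix}. \]
Rows and columns are sometimes distinguished from other matrices by using square brackets.

C.2 Matrix multiplication

Matrices can be multiplied by a number just like vectors: each matrix element is multiplied by the number. For example,
\[ 2 \begin{pmatrix} u & v \\ w & x \\ y & z \end{pmatrix} = \begin{pmatrix} 2u & 2v \\ 2w & 2x \\ 2y & 2z \end{pmatrix}. \]
Now we will see how to multiply a matrix with another matrix. The easiest is to define the multiplication of a row with a column:
\[ \begin{pmatrix} a_1 & a_2 & a_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = a_1 x_1 + a_2 x_2 + a_3 x_3. \]
So the result of a multiplication of a $1 \times n$ matrix with an $n \times 1$ matrix is simply a number. The general definition is
\[ \begin{pmatrix} a_1 & \ldots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n} a_i x_i. \]
Let us try to guess how to define the multiplication of a column with a matrix consisting of several rows. Start with just two rows:
\[ \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = ? \]
We can multiply each of the two rows with the column $[x_i]$ as before. Then we obtain two numbers, and it is natural to put them into a column:
\[ \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 x_2 + a_3 x_3 \\ b_1 x_1 + b_2 x_2 + b_3 x_3 \end{pmatrix}. \]
In general, we define the product of an $m \times n$ matrix with an $n \times 1$ matrix (a column); the result is an $m \times 1$ matrix (again a column):
\[ \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \ldots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} a_{1i} x_i \\ \vdots \\ \sum_{i=1}^{n} a_{mi} x_i \end{pmatrix}. \]
Exercise: Calculate the following products of matrices and columns:
\[ \begin{pmatrix} 1 & 3 \\ 4 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = ? \]
\[ \begin{pmatrix} \sqrt{5} - 1 & 2 \\ 2 & \sqrt{5} + 1 \end{pmatrix} \begin{pmatrix} \sqrt{5} + 1 \\ \sqrt{5} - 1 \end{pmatrix} = ? \]
\[ \begin{pmatrix} 1 & 9 & 2 \\ 3 & 0 & 3 \\ 6 & 4 & 3 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \\ 4 \end{pmatrix} = ? \]
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = ? \]
\[ \begin{pmatrix} 2 & 1 & 0 & 0 & \cdots & 0 \\ 1 & 2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 2 & 1 & & \vdots \\ 0 & 0 & 1 & 2 & \ddots & 0 \\ \vdots & & \ddots & \ddots & \ddots & 1 \\ 0 & 0 & \cdots & 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \\ 1 \end{pmatrix} = ? \]
Finally, we can extend this definition to products of two matrices of sizes $m \times n$ and $n \times p$. We first multiply the $m \times n$ matrix by each of the $n \times 1$ columns in the $n \times p$ matrix, yielding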


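$p$ columns of size $m \times 1$, and then arrange these $p$ columns into an $m \times p$ matrix. The resulting general definition can be written as a formula for matrix multiplication: if $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix then the product of $A$ and $B$ is an $m \times p$ matrix $C$ whose coefficients are given by
\[ C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}, \quad 1 \le i \le m, \quad 1 \le k \le p. \]
Exercise: Calculate the following matrix products:
\[ \begin{pmatrix} 2 & 3 \end{pmatrix} \begin{pmatrix} 3 & 9 \\ 2 & 6 \end{pmatrix} = ? \]
\[ \begin{pmatrix} 5 & 6 \\ 6 & 5 \end{pmatrix} \begin{pmatrix} 5 & 5 \\ 6 & 6 \end{pmatrix} = ? \]
\[ \begin{pmatrix} \frac{1+\sqrt{2}}{3} & 0 \\ 0 & \frac{1-\sqrt{2}}{3} \end{pmatrix} \begin{pmatrix} \frac{1-\sqrt{2}}{3} & 0 \\ 0 & \frac{1+\sqrt{2}}{3} \end{pmatrix} = ? \]
\[ \begin{pmatrix} 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} = ? \]
\[ \begin{pmatrix} w & x & y & z \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = ? \]
Matrices of size $n \times n$ are called square matrices. They can be multiplied with each other and, according to the rules of matrix multiplication, again give square matrices of the same size.

Exercise 1: If $A$ and $B$ are two square matrices such that $AB = BA$ then one says that the matrices $A$ and $B$ commute with each other. Determine whether the following pairs of matrices commute:
a) $A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & 0 \\ 1 & 2 \end{pmatrix}$.
b) $A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & 1 & 1 \\ 0 & 1 & 2 \\ 2 & 8 & 7 \end{pmatrix}$.
c) $A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 97 & 12 & 55 \\ 8 & 54 & 26 \\ 31 & 53 & 78 \end{pmatrix}$.
What have you noticed?
d) Determine all possible matrices $B = \begin{pmatrix} w & x \\ y & z \end{pmatrix}$ that commute with the given matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}$.

Note that a square matrix having the elements 1 at the diagonal and zeros elsewhere, for example
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]
has the property that it does not modify anything it multiplies. Therefore such matrices are called the identity matrices and denoted by $1$. One has $1A = A$ and $A1 = A$ for any matrix $A$ (for which the product is defined).

Exercise 2: We consider real-valued $2 \times 2$ matrices.
a) The matrix-valued function $A(\phi)$ is defined by
\[ A(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}. \]
Show that $A(\phi_1)A(\phi_2) = A(\phi_1 + \phi_2)$. Deduce that $A(\phi_1)$ commutes with $A(\phi_2)$ for arbitrary $\phi_1$, $\phi_2$.
b) For every complex number $z = x + iy = re^{i\phi}$, let us now define a matrix
\[ C(z) = \begin{pmatrix} r\cos\phi & -r\sin\phi \\ r\sin\phi & r\cos\phi \end{pmatrix} = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}. \]
Show that $C(z_1)$ commutes with $C(z_2)$ for arbitrary complex $z_1$, $z_2$, and that $C(z_1) + C(z_2) = C(z_1 + z_2)$ and $C(z_1)C(z_2) = C(z_1 z_2)$. In this way, complex numbers could be replaced by matrices of the form $C(z)$. The addition and the multiplication of matrices of this form correspond exactly to the addition and the multiplication of complex numbers.

Exercise 3: a) The Pauli matrices $\sigma_1$, $\sigma_2$, $\sigma_3$ are defined as follows,
\[ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]
Verify that $\sigma_1^2 = 1$ (the $2 \times 2$ identity matrix), $\sigma_1\sigma_2 = i\sigma_3$, $\sigma_2\sigma_3 = i\sigma_1$, and in general
\[ \sigma_a \sigma_b = \delta_{ab} 1 + i \sum_{c} \varepsilon_{abc} \sigma_c. \]
b) The expression $AB - BA$ where $A$, $B$ are two matrices is called the commutator of $A$ and $B$ and is denoted by
\[ [A, B] = AB - BA. \]
Using the result of part a), compute $[\sigma_a, \sigma_b]$.

C.3 Linear equations

A system of linear algebraic equations, for example,
\[ 2x + y = 11, \]
\[ 3x - y = 6, \]
can be formulated in the matrix language as follows. One introduces the column vectors $\mathbf{x} \equiv \begin{pmatrix} x \\ y \end{pmatrix}$ and $\mathbf{b} \equiv \begin{pmatrix} 11 \\ 6 \end{pmatrix}$ and the matrix
\[ A \equiv \begin{pmatrix} 2 & 1 \\ 3 & -1 \end{pmatrix}. \]
Then the above system of equations is equivalent to the single matrix equation,
\[ A\mathbf{x} = \mathbf{b}, \]
where $\mathbf{x}$ is understood as the unknown vector.

Exercise: Rewrite the following system of equations in matrix form:
\[ x + y - z = 0, \]
\[ y - x + 2z = 0, \]
\[ 3y = 2. \]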
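The formula $C_{ik} = \sum_j A_{ij} B_{jk}$ from Sec. C.2 translates directly into a short program. The following is a sketch only: matrices are represented as nested Python lists, the names `matmul` and `C` are hypothetical, and the sign convention $C(z) = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}$ for the matrix representing $z = x + iy$ is an assumption of this example.

```python
from typing import List

Matrix = List[List[complex]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Product of an m x n matrix A and an n x p matrix B: C[i][k] = sum_j A[i][j] * B[j][k]."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)] for i in range(m)]


def C(z: complex) -> Matrix:
    """Real 2 x 2 matrix representing z = x + iy (sign convention assumed in this sketch)."""
    return [[z.real, -z.imag], [z.imag, z.real]]


identity = [[1, 0], [0, 1]]
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

# The identity matrix does not modify anything it multiplies.
assert matmul(identity, A) == A and matmul(A, identity) == A

# Matrix multiplication is in general not commutative.
assert matmul(A, B) != matmul(B, A)

# Multiplying matrices of the form C(z) reproduces complex multiplication.
z1, z2 = 1 + 2j, 3 - 1j
assert matmul(C(z1), C(z2)) == C(z1 * z2)
```

A row times a column is covered by the same function: `matmul([[1, 2, 3]], [[4], [5], [6]])` returns the $1 \times 1$ matrix `[[32]]`.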


Remark: In a system of equations, the number of unknowns may differ from the number of equations. In that case we need to use a rectangular (non-square) matrix to rewrite the system in a matrix form.

C.4 Inverse matrix

We consider square matrices $A$ and $B$. If $AB = 1$ and $BA = 1$ then $B$ is called the inverse matrix to $A$ (and vice versa). The inverse matrix to $A$ is denoted by $A^{-1}$, so that one has $AA^{-1} = A^{-1}A = 1$.

Remark: The inverse matrix does not always exist; for instance, the matrix
\[ \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} \]
does not have an inverse. For finite-dimensional square matrices $A$ and $B$, one can derive from $AB = 1$ that also $BA = 1$. ∎

The inverse matrix is useful for solving linear equations. For instance, if a matrix $A$ has an inverse, $A^{-1}$, then any equation $A\mathbf{x} = \mathbf{b}$ can be solved immediately as $\mathbf{x} = A^{-1}\mathbf{b}$.

Exercise 1: a) Show that the inverse to a $2 \times 2$ matrix $A = \begin{pmatrix} w & x \\ y & z \end{pmatrix}$ exists when $wz - xy \neq 0$ and is given explicitly by the formula
\[ A^{-1} = \frac{1}{wz - xy} \begin{pmatrix} z & -x \\ -y & w \end{pmatrix}. \]
b) Compute the inverse matrices $A^{-1}$ and $B^{-1}$ for $A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & 0 \\ 1 & 2 \end{pmatrix}$. Then compute the solutions of the linear systems
\[ \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}; \quad \begin{pmatrix} 3 & 0 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 6 \\ 0 \end{pmatrix}. \]
Exercise 2: Show that $(AB)^{-1} = B^{-1}A^{-1}$, assuming that the inverse matrices to $A$ and $B$ exist.
Hint: Simplify the expression $(AB)(B^{-1}A^{-1})$.

Exercise 3: Show that
\[ (1 + BA)^{-1} = A^{-1}(1 + AB)^{-1}A, \]
assuming that all the needed inverse matrices exist.
Hint: Use the property $A(1 + BA) = A + ABA = (1 + AB)A$. ∎

The inverse matrix to a given $n \times n$ matrix $A$ can be computed by solving $n$ systems of equations,
\[ A\mathbf{x}_1 = \mathbf{e}_1, \quad ..., \quad A\mathbf{x}_n = \mathbf{e}_n, \]
where the vectors $\mathbf{e}_i$ are the standard basis vectors,
\[ \mathbf{e}_1 = (1, 0, ..., 0), \quad \mathbf{e}_2 = (0, 1, 0, ..., 0), \quad ..., \quad \mathbf{e}_n = (0, ..., 0, 1), \]
while the vectors $\mathbf{x}_1, ..., \mathbf{x}_n$ are unknown. When $\{\mathbf{x}_i\}$ are determined, their components $x_{ij}$ form the inverse matrix.

C.5 Determinants

In the construction of the inverse matrix for a given matrix $A_{ij}$, one finds a formula of a peculiar type: Each element of the inverse matrix $A^{-1}$ is equal to some polynomial in $A_{ij}$, divided by a certain function of $A_{ij}$. For example, Exercise 1a in Sec. C.4 gives such a formula for $2 \times 2$ matrices; that formula contains the expression $wz - xy$ in every denominator.

The expression in the denominator is the same for every element of $A^{-1}$. This expression needs to be nonzero in that formula, or else we cannot divide by it (and then the inverse matrix does not exist). In other words, this expression (which is a function of the matrix $A_{ij}$) determines whether the inverse matrix exists. Essentially, this function (after fixing a numerical prefactor) is called the determinant of the matrix $A_{ij}$.

The determinant for a $2 \times 2$ or $3 \times 3$ matrix is given[1] by the formulas
\[ \det \begin{pmatrix} a & b \\ x & y \end{pmatrix} = ay - bx, \]
\[ \det \begin{pmatrix} a & b & c \\ p & q & r \\ x & y & z \end{pmatrix} = aqz + brx + cpy - bpz - cqx - ary. \]
[1] I do not derive this result here; a derivation is given in the main text.

Determinants are also sometimes written as matrices with straight vertical lines at both sides, e.g.
\[ \det \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} \equiv \begin{vmatrix} 1 & 2 \\ 0 & 3 \end{vmatrix} = 3. \]
In this notation, a determinant resembles a matrix, so it requires that we clearly distinguish between a matrix (a table of numbers) and a determinant (which is a single number computed from a matrix).

To compute the determinant of an arbitrary $n \times n$ matrix $A$, one can use the procedure called the Laplace expansion.[2] First one defines the notion of a minor $M_{ij}$ corresponding to some element $A_{ij}$: By definition, $M_{ij}$ is the determinant of a matrix obtained from $A$ by deleting row $i$ and column $j$. For example, the minor corresponding to the element $b$ of the matrix
\[ A = \begin{pmatrix} a & b & c \\ p & q & r \\ x & y & z \end{pmatrix} \]
is the minor corresponding to $A_{12}$, hence we delete row 1 and column 2 from $A$ and obtain
\[ M_{12} = \begin{vmatrix} p & r \\ x & z \end{vmatrix} = pz - rx. \]
[2] Here I will only present the Laplace expansion as a computational procedure without derivation. A derivation is given as an exercise in Sec. 3.4.

Then, one sums over all the elements $A_{1i}$ ($i = 1, ..., n$) in the first row of $A$, multiplied by the corresponding minors and the sign factor $(-1)^{i-1}$. In other words, the Laplace expansion is the formula
\[ \det(A) = \sum_{i=1}^{n} (-1)^{i-1} A_{1i} M_{1i}. \]
A similar formula holds for any other row $j$ instead of the first row; one needs an additional sign factor $(-1)^{j-1}$ in that case.
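The Laplace expansion along the first row and the explicit $2 \times 2$ inverse can be sketched as follows. The function names `minor_matrix`, `det`, and `inverse_2x2` are hypothetical, the matrices used are illustrative examples with integer entries, and exact fractions are used to avoid rounding; the recursion is meant as a demonstration of the formula, not as an efficient method for large matrices.

```python
from fractions import Fraction
from typing import List


def minor_matrix(A: List[list], i: int, j: int) -> List[list]:
    """Matrix obtained from A by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A: List[list]):
    """Determinant by Laplace expansion along the first row.

    With 0-based i the sign factor (-1)**i equals (-1)**(i-1) for 1-based i.
    """
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[0][i] * det(minor_matrix(A, 0, i)) for i in range(len(A)))


def inverse_2x2(A: List[list]) -> List[list]:
    """Inverse of [[w, x], [y, z]] with integer entries: (1/(wz - xy)) [[z, -x], [-y, w]]."""
    (w, x), (y, z) = A
    d = w * z - x * y
    if d == 0:
        raise ZeroDivisionError("the determinant vanishes, so the inverse does not exist")
    return [[Fraction(z, d), Fraction(-x, d)], [Fraction(-y, d), Fraction(w, d)]]


# The 2 x 2 example from the text and the singular matrix from the Remark in Sec. C.4.
assert det([[1, 2], [0, 3]]) == 3
assert det([[1, 1], [2, 2]]) == 0

# The explicit 3 x 3 formula agrees with the Laplace expansion.
a, b, c, p, q, r, x, y, z = 2, 3, 5, 7, 11, 13, 17, 19, 23
assert det([[a, b, c], [p, q, r], [x, y, z]]) == a*q*z + b*r*x + c*p*y - b*p*z - c*q*x - a*r*y

# An illustrative 2 x 2 matrix with determinant 1 and its inverse.
assert inverse_2x2([[2, 1], [5, 3]]) == [[3, -1], [-5, 2]]
```

The recursion calls `det` on every minor of the first row, so its cost grows like $n!$; it is adequate only for the small matrices considered in this appendix.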


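Example: We compute the determinant of the matrix
\[ A = \begin{pmatrix} a & b & c \\ p & q & r \\ x & y & z \end{pmatrix} \]
using the Laplace expansion in the first row. The minors are
\[ M_{11} = \begin{vmatrix} q & r \\ y & z \end{vmatrix} = qz - ry, \]
\[ M_{12} = \begin{vmatrix} p & r \\ x & z \end{vmatrix} = pz - rx, \]
\[ M_{13} = \begin{vmatrix} p & q \\ x & y \end{vmatrix} = py - qx. \]
Hence
\[ \det A = aM_{11} - bM_{12} + cM_{13} = a(qz - ry) - b(pz - rx) + c(py - qx). \]
This agrees with the formula given previously.

Exercise: Compute the following determinants.
a)
\[ \begin{vmatrix} 15 & 12 \\ \frac{1}{2} & \frac{2}{5} \end{vmatrix} = ? \quad \begin{vmatrix} 1 + x^2 & 1 + x^2 \\ 1 + x^2 & 1 + x^4 \end{vmatrix} = ? \]
\[ \begin{vmatrix} 1 & 99 & 99 & 99 \\ 0 & 2 & 99 & 99 \\ 0 & 0 & 3 & 99 \\ 0 & 0 & 0 & 4 \end{vmatrix} = ? \quad \begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix} = ? \]
b)
\[ A_2 = \begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} = ? \quad A_3 = \begin{vmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{vmatrix} = ? \quad A_4 = \begin{vmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{vmatrix} = ? \]
Guess and then prove (using the Laplace expansion) the general formula for determinants $A_n$ of this form for arbitrary $n$,
\[ A_n = \begin{vmatrix} 2 & 1 & 0 & \cdots & 0 \\ 1 & 2 & 1 & \ddots & \vdots \\ 0 & 1 & 2 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & 1 \\ 0 & \cdots & 0 & 1 & 2 \end{vmatrix} = ? \]
Hint: Use the Laplace expansion to prove the recurrence relation
\[ A_{n+1} = 2A_n - A_{n-1}. \]

C.6 Tensor product

A matrix with rows and columns reversed is called the transposed matrix. For example, if
\[ A = \begin{pmatrix} a & b & c \\ x & y & z \end{pmatrix} \]
is a given $2 \times 3$ matrix then the transposed matrix, denoted by $A^T$, is the following $3 \times 2$ matrix:
\[ A^T = \begin{pmatrix} a & x \\ b & y \\ c & z \end{pmatrix}. \]
Note that a row vector becomes a column vector when transposed, and vice versa. In general, an $m \times n$ matrix becomes an $n \times m$ matrix when transposed.

The scalar product of vectors, $\mathbf{q} \cdot \mathbf{r}$, can be represented as a matrix product $\mathbf{q}^T\mathbf{r}$. For example, if $\mathbf{q} = (a, b, c)$ and $\mathbf{r} = (x, y, z)$ then
\[ \mathbf{q} \cdot \mathbf{r} = ax + by + cz = \begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \mathbf{q}^T\mathbf{r} = \mathbf{r}^T\mathbf{q}. \]
A matrix product taken in the opposite order (i.e. a column vector times a row vector) gives a matrix as a result,
\[ \mathbf{q}\mathbf{r}^T = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \begin{pmatrix} x & y & z \end{pmatrix} = \begin{pmatrix} ax & ay & az \\ bx & by & bz \\ cx & cy & cz \end{pmatrix}. \]
This is known as the tensor product of two vectors. An alternative notation is $\mathbf{q} \otimes \mathbf{r}^T$. Note that the result of the tensor product is not a vector but a matrix, i.e. an object of a different kind. (The space of $n \times n$ matrices is also denoted by $\mathbb{R}^n \otimes \mathbb{R}^n$.)

Exercise: Does the tensor product commute? In a three-dimensional space, compute the matrix $\mathbf{q} \otimes \mathbf{r}^T - \mathbf{r} \otimes \mathbf{q}^T$. Compare that matrix with the vector product $\mathbf{q} \times \mathbf{r}$.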
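The difference between the scalar product $\mathbf{q}^T\mathbf{r}$ (a number) and the tensor product $\mathbf{q}\mathbf{r}^T$ (a matrix) can be seen in a small numerical sketch. Column vectors are represented here as $3 \times 1$ nested lists; the helper names `transpose` and `matmul` and the sample vectors are assumptions of this illustration.

```python
from typing import List


def transpose(A: List[list]) -> List[list]:
    """Reverse rows and columns: an m x n matrix becomes an n x m matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A: List[list], B: List[list]) -> List[list]:
    """Matrix product C[i][k] = sum_j A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


q = [[1], [2], [3]]  # column vector q, a 3 x 1 matrix
r = [[4], [5], [6]]  # column vector r, a 3 x 1 matrix

scalar = matmul(transpose(q), r)  # row times column: a 1 x 1 matrix
tensor = matmul(q, transpose(r))  # column times row: a 3 x 3 matrix

assert scalar == [[32]]
assert tensor == [[4, 5, 6], [8, 10, 12], [12, 15, 18]]

# Exchanging q and r transposes the tensor product, while the scalar product is unchanged.
assert matmul(r, transpose(q)) == transpose(tensor)
assert matmul(transpose(r), q) == scalar
```

The same two helper functions suffice for both products; only the order of the factors decides whether the result is a number or a matrix.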

D Distribution of this text
D.1 Motivation

A scientist receives financial support from the society and the freedom to do research in any field. I believe it is a duty of scientists to make the results of their science freely available to the interested public in the form of understandable, clearly written textbooks. This task has been made significantly easier by modern technology. Especially in theoretical sciences where no experimentally obtained photographs or other such significant third-party material need to be displayed, authors are able (if not always willing) to prepare the entire book on a personal computer, typing the text and drawing the diagrams using freely available software. Ubiquitous access to the Internet makes it possible to create texts of high typographic quality in ready-to-print form, such as a PDF file, and to distribute these texts essentially at no cost.

The distribution of texts in today's society is inextricably connected with the problem of intellectual property. One could simply upload PDF files to a Web site and declare these texts to be in public domain, so that everyone would be entitled to download them for free, print them, or distribute further. However, malicious persons might then prepare a slightly modified version and inhibit further distribution of the text by imposing a non-free license on the modified version and by threatening to sue anyone who wants to distribute any version of the text, including the old public-domain version. Merely a threat of a lawsuit suffices for an Internet service provider to take down any web page allegedly violating copyright, even if the actual lawsuit may be unsuccessful.

To protect the freedom of the readers, one thus needs to release the text under a copyright rather than into public domain, and at the same time one needs to make sure that the text, as well as any future revisions thereof, remains freely distributable. I believe that a free license, such as GNU FDL (see the next subsection), is an appropriate way of copyrighting a science textbook.

The present book is released under GNU FDL. According to the license, everyone is allowed to print this book or distribute it in any other way. In particular, any commercial publisher may offer professionally printed and bound copies of the book for sale; the permission to do so is already granted. Since the FDL disallows granting exclusive distribution rights, I (or anybody else) will not be able to sign a standard exclusive-rights contract with a publisher for printing this book (or any further revision of this book). I am happy that lulu.com offers commercial printing of the book at low cost and at the same time adheres to the conditions of a free license (the GNU FDL). The full text of the license follows.

D.2 GNU Free Documentation License

Version 1.2, November 2002
Copyright (c) 2000,2001,2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307, USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

D.2.1 Preamble

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

D.2.2 Applicability and definitions

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the


general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

D.2.3 Verbatim copying

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section D.2.4.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

D.2.4 Copying in quantity

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

D.2.5 Modifications

You may copy and distribute a Modified Version of the Document under the conditions of sections D.2.3 and D.2.4 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.


K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

D.2.8 Aggregation with independent works

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section D.2.4 is applicable to these copies of the Document, then if the Document is less than one half of the
entire aggregate, the Documents Cover Texts may be placed on cov-
You may add a section Entitled Endorsements, provided it con-
ers that bracket the Document within the aggregate, or the electronic
tains nothing but endorsements of your Modified Version by various
equivalent of covers if the Document is in electronic form. Otherwise
partiesfor example, statements of peer review or that the text has
they must appear on printed covers that bracket the whole aggregate.
been approved by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover Text,
D.2.9 Translation
and a passage of up to 25 words as a Back-Cover Text, to the end of the
list of Cover Texts in the Modified Version. Only one passage of Front- Translation is considered a kind of modification, so you may distribute
Cover Text and one of Back-Cover Text may be added by (or through translations of the Document under the terms of section D.2.5. Re-
arrangements made by) any one entity. If the Document already in- placing Invariant Sections with translations requires special permission
cludes a cover text for the same cover, previously added by you or by from their copyright holders, but you may include translations of some
arrangement made by the same entity you are acting on behalf of, you or all Invariant Sections in addition to the original versions of these In-
may not add another; but you may replace the old one, on explicit per- variant Sections. You may include a translation of this License, and all
mission from the previous publisher that added the old one. the license notices in the Document, and any Warrany Disclaimers, pro-
The author(s) and publisher(s) of the Document do not by this Li- vided that you also include the original English version of this License
cense give permission to use their names for publicity for or to assert and the original versions of those notices and disclaimers. In case of a
or imply endorsement of any Modified Version. disagreement between the translation and the original version of this
License or a notice or disclaimer, the original version will prevail.
D.2.6 Combining documents If a section in the Document is Entitled Acknowledgements, Ded-
ications, or History, the requirement (section D.2.5) to Preserve its
You may combine the Document with other documents released under Title (section D.2.2) will typically require changing the actual title.
this License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the In-
variant Sections of all of the original documents, unmodified, and list D.2.10 Termination
them all as Invariant Sections of your combined work in its license no-
You may not copy, modify, sublicense, or distribute the Document ex-
tice, and that you preserve all their Warranty Disclaimers.
cept as expressly provided for under this License. Any other attempt
The combined work need only contain one copy of this License,
to copy, modify, sublicense or distribute the Document is void, and will
and multiple identical Invariant Sections may be replaced with a sin-
automatically terminate your rights under this License. However, par-
gle copy. If there are multiple Invariant Sections with the same name
ties who have received copies, or rights, from you under this License
but different contents, make the title of each such section unique by
will not have their licenses terminated so long as such parties remain
adding at the end of it, in parentheses, the name of the original author
in full compliance.
or publisher of that section if known, or else a unique number. Make
the same adjustment to the section titles in the list of Invariant Sections
in the license notice of the combined work.
D.2.11 Future revisions of this license
In the combination, you must combine any sections Entitled His-
tory in the various original documents, forming one section Enti- The Free Software Foundation may publish new, revised versions
tled History; likewise combine any sections Entitled Acknowledge- of the GNU Free Documentation License from time to time. Such
ments, and any sections Entitled Dedications. You must delete all new versions will be similar in spirit to the present version, but
sections Entitled Endorsements. may differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
D.2.7 Collections of documents Each version of the License is given a distinguishing version num-
ber. If the Document specifies that a particular numbered version of
You may make a collection consisting of the Document and other doc- this License or any later version applies to it, you have the option of
uments released under this License, and replace the individual copies following the terms and conditions either of that specified version or
of this License in the various documents with a single copy that is in- of any later version that has been published (not as a draft) by the Free
cluded in the collection, provided that you follow the rules of this Li- Software Foundation. If the Document does not specify a version num-
cense for verbatim copying of each of the documents in all other re- ber of this License, you may choose any version ever published (not as
spects. a draft) by the Free Software Foundation.


D.2.12 Addendum: How to use this License for your documents
To use this License in a document you have written, include a copy
of the License in the document and put the following copyright and
license notices just after the title page:
Copyright (c) <year> <your name>. Permission is granted to copy,
distribute and/or modify this document under the terms of the GNU
Free Documentation License, Version 1.2 or any later version published
by the Free Software Foundation; with no Invariant Sections, no Front-
Cover Texts, and no Back-Cover Texts. A copy of the license is included
in the section entitled "GNU Free Documentation License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover
Texts, replace the "with...Texts." line with this:
with the Invariant Sections being <list their titles>, with the Front-
Cover Texts being <list>, and with the Back-Cover Texts being <list>.
If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the situ-
ation.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License, to per-
mit their use in free software.

D.2.13 Copyright
Copyright (c) 2000, 2001, 2002 Free Software Foundation, Inc. 59 Tem-
ple Place, Suite 330, Boston, MA 02111-1307, USA
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Index

n-forms, 34
n-vectors, 33

adjoint, 66
affine hyperplane, 18, 92
algebra, 42
algebraic complement, 66
algebraic multiplicity, 59
analytic function, 71
anti-Hermitian operator, 102
antisymmetric operator, 102
antisymmetric tensor, 33

bivector, 32
block-diagonal matrix, 49

canonical isomorphism, 15
canonical projection, 15
Cayley-Hamilton theorem, 69
    generalization, 69
characteristic equation, 59
combinatorial factor, 34, 104
commutator, 112
complexification, 100
components of a vector, 6, 9
coordinate-free approach, iv, 10
covector, 16

decomposition of identity, 23
determinant, 44, 47, 49, 113
diagonalizable operator, 59, 70, 76, 82
dividing by tensor, 38, 53
dual basis, 16, 38
dual space, 16
dummy index, 27

eigenbasis, 13
eigenspace, 14
eigenvector, 13
elegance, iv
elementary permutation, 109
endomorphism, 11, 13
Euclidean space, 87
Euler formula, 108
extensions of operators to $\wedge^k V$, 56
exterior algebra, 42
exterior product, 32, 33
    in index notation, 41
    origin of the name, 34
exterior transposition, 3, 63
    in index notation, 64

formal linear combination, 7
formal power series, 71, 76
free index, 27

Gaussian elimination, 39
general solution, 53
geometric multiplicity, 59, 79
geometric relation, 10
Gröbner basis, 70
graded algebra, 43
Gram-Schmidt procedure, 88
Grassmann algebra, 42
Grassmann's complement, 38, 96

Heaviside step function, 107
Hermitian conjugate, 101
Hermitian operator, 101
Hermitian scalar product, 89, 100
Hodge star, 39, 95, 100
    general definition, 96
homomorphism, 11, 13
hyperplane, 2, 18, 48, 100

identity matrix, 112
insertion map, 36, 96
interior product, 36, 96
invariant subspace, 14
inverse matrix, 113
inverse operator, 52
inverse permutation, 109
invertible operator, 14

Jacobi formula, 77
Jordan basis, 79-81
Jordan canonical form, 13, 76
Jordan cell, 79, 81

Kramer's rule, 53
Kronecker symbol, 12, 16, 88

Lagrange polynomial, 56
Laplace expansion, 50, 113
length of a vector, 87
Leverrier's algorithm, 69
Levi-Civita symbol, 41, 51, 67, 95, 96
linear combination, 7
linear operator, 11
linearity, 11
linearly (in)dependent set, 8
Liouville formula, 76

minimal polynomial, 82
minor, 50, 113
mirror reflection, 48


monic polynomial, 55
multi-index, 99

nilpotent, 62
normal projection, 91
number field, 6

orientation of space, 94
oriented area, 30
orthogonal complement, 91
orthogonal projection, 91
orthonormal basis, 88

parity, 109
Pauli matrices, 112
permutation, 109
    order violations, 109
    parity of, 109
Pfaffian, 4, 104
polynomial interpolation, 55
positively orientated basis, 94
projector, 14, 23, 41, 48, 72, 83

rank of an operator, 26
reciprocal basis, 97
right-hand rule, 94
root vector, 80
rotation, 92

scalar product in $\wedge^k V$, 99
scalar product in $\wedge^N V$, 98
single-term exterior products, 32, 35, 37, 38
square-free polynomial, 82
standard basis, 9
Sylvester's method, 73, 74
symmetric operator, 101

tensor, 19
tensor product, 19, 114
tensor-valued area, 46
totally antisymmetric, 33, 41
trace, 25, 57
trace relations, 61, 69, 78
traceless operator, 102
transposed matrix, 114
transposed operator, 25, 93
triple product, 95

unit volume tensor, 4, 94, 95
unitary operator, 102

Vandermonde matrix, 54
vector product, 95

wedge product, 32

