
Mathematical Foundations of Quantum Mechanics

Judith McGovern

December 16, 2016


Maths of Vector Spaces

This section is designed to be read in conjunction with chapter 1 of Shankar’s Principles of


Quantum Mechanics, which will be the principal course textbook. Other on-line resources are
linked from the course home page.
Another source that covers most of the material at the right level is Griffiths’ Introduction to
Quantum Mechanics, which has an appendix on linear algebra.
Riley’s Mathematical Methods for the Physical Sciences is available as an ebook, and chapter
8 covers much of the material too. This is particularly recommended if Shankar seems initially
intimidating. Unfortunately Riley does not use Dirac notation except for inner products, using
boldface a where we would use |a⟩, but if you understand the concepts from that book, the
notation used here should not be a barrier. Some further comments on Riley’s notation can be
found in section 1.4. Riley (or a similar text such as Boas) should be consulted for revision on
finding the eigenvalues and eigenvectors of matrices.
This outline omits proofs, but inserts the symbol P to indicate where they are missing. In the
early stages the proofs are extremely simple and largely consist of assuming the opposite and
demonstrating a contradiction with the rules of vector spaces or with previous results. Many
are in Shankar but some are left by him as exercises, though usually with hints. By the time
we get on to the properties of operators (existence of inverses, orthogonality of eigenstates) some
of the proofs are more involved. Some of the principal proofs are on the examples sheet. Proofs
from this section are not examinable, but you are advised to tackle some of them to make sure
you understand the ideas.

1.1 Vector Spaces

Definition
Shankar pp 1-3, Riley 8.1, Griffiths A.1

A linear vector space is a set V of elements called vectors, {|v⟩, |w⟩, ...}, for which
(I) an operation, "+", is defined, which for any |v⟩ and |w⟩ specifies how to form |v⟩ + |w⟩, and
(II) multiplication by a scalar is also defined, specifying α|v⟩,
and these operations obey the following rules:

1. The result of these operations is another member of V (closure).

2. |v⟩ + |w⟩ = |w⟩ + |v⟩ (vector addition is commutative)

3. (|u⟩ + |v⟩) + |w⟩ = |u⟩ + (|v⟩ + |w⟩) (vector addition is associative)

4. α(β|v⟩) = (αβ)|v⟩ (scalar multiplication is associative)

5. 1|v⟩ = |v⟩

6. α(|v⟩ + |w⟩) = α|v⟩ + α|w⟩ (distributive rule 1)

7. (α + β)|v⟩ = α|v⟩ + β|v⟩ (distributive rule 2)

8. The null or zero vector is written as |0⟩ (or often, just 0), with |0⟩ + |v⟩ = |v⟩

9. For every vector |v⟩ there is another, denoted |−v⟩, such that |v⟩ + |−v⟩ = |0⟩

Note in the definition of |−v⟩ the minus sign is just part of the name of the inverse vector.
The zero vector is unique, and 0|v⟩ = |0⟩ for any |v⟩ P.
The inverse is unique and given by |−v⟩ = (−1)|v⟩ P.
We use "minus" in the following sense: |v⟩ − |w⟩ = |v⟩ + (−1)|w⟩ = |v⟩ + |−w⟩.
If the scalars α, β, ... are complex (written α, β ∈ C), we have a complex vector space; otherwise
(α, β ∈ R) we have a real one. If we want to distinguish we write V(C) and V(R), but if we
don't specify we assume it is complex. (C or R is called the field of the space.)
These rules just confirm what you do naturally, but:

• You should not assume anything about abstract vectors that is not given in the definition.

• The rules apply to many things apart from traditional "arrow" vectors.

• So far there is no concept of "angle" between vectors, nor any way to measure "length".

Examples
• Ordinary 3D "arrow" vectors belong to a real vector space.¹

¹ "Arrow" vectors have length and direction in 3D, but they do not have a fixed starting point, so two vectors
are added by placing the tail of the second at the tip of the first; multiplication by a scalar changes the length
but not the direction. In physics, displacement vectors are a better picture to keep in mind than positions.
• Real numbers form a (very simple) real vector space.

• The set R^N (C^N) of sequences of N real (complex) numbers, such as |c⟩ = (c_1, c_2, ..., c_N),
forms a real (complex) vector space, where '+' is ordinary matrix addition, |0⟩ = (0, 0, ..., 0)
and the inverse is |−c⟩ = (−c_1, −c_2, ..., −c_N).

• The set of all polynomials such as f(x) = a_0 + a_1 x + a_2 x² + ..., with a_i ∈ C and x ∈ R,
forms a complex vector space; |0⟩ is the polynomial with all coefficients a_i equal to zero.

• The set of 2 × 2 complex matrices

      ⎛ a  b ⎞
      ⎝ c  d ⎠

with a, b, c, d ∈ C, forms a complex vector space under matrix addition (in fact any such set
of n × m matrices gives a vector space).
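As a concrete illustration (added here, not part of the original notes), the C^N example can be
checked numerically; a minimal numpy sketch verifying a few of the axioms for random vectors
in C³:

    import numpy as np

    rng = np.random.default_rng(0)

    # Two random vectors in C^3 and two complex scalars
    v = rng.normal(size=3) + 1j * rng.normal(size=3)
    w = rng.normal(size=3) + 1j * rng.normal(size=3)
    alpha = 2 - 1j

    # Commutativity of addition (rule 2) and distributivity (rule 6)
    assert np.allclose(v + w, w + v)
    assert np.allclose(alpha * (v + w), alpha * v + alpha * w)

    # The zero vector and the additive inverse (rules 8 and 9)
    zero = np.zeros(3, dtype=complex)
    assert np.allclose(v + zero, v)
    assert np.allclose(v + (-1) * v, zero)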

Ket Notation
Shankar p 3, Griffiths A.1

Here we are using the Dirac notation for vectors, with the object |v⟩ also being called a
ket. The text between the "|" and the "⟩" is just a name or label for the ket, which can take
many forms—we will see letters, numbers, symbols (|+⟩, |♥⟩), reminders of how the vector was
formed (|αv⟩ for α|v⟩).... Sensible choices of names can help make the algebra easy to follow.
The notation prevents abstract vectors being confused with simple numbers.

1.2 Linear Independence, bases and dimensions

Linear Independence
Shankar p 4, Riley 8.1.1, Griffiths A.1

Since there are infinitely many scalars, all vector spaces have infinitely many members.
If from V we pick n vectors {|x_1⟩, |x_2⟩, ..., |x_n⟩}, the set is said to be linearly dependent if
it is possible to write ∑_{i=1}^n a_i |x_i⟩ = |0⟩ where the coefficients a_i are not all zero. It follows that
at least one of the vectors can be written as a sum over the others P.
If this is not possible, the set is linearly independent. Any two non-parallel "arrow" vectors
are linearly independent; any three arrow vectors in a plane are linearly dependent.

Dimensions and Bases


Shankar pp 5-7, Riley 8.1.1, Griffiths A.1

A vector space has dimension N if it can accommodate a maximum of N linearly-independent
vectors. It is infinite-dimensional if there is no maximum. We use V^N if we want to specify the
dimension.
A basis in a vector space V is a set {|x_1⟩, |x_2⟩, ..., |x_N⟩} ≡ {|x_i⟩} of linearly-independent
vectors such that every vector in V is a linear combination of the basis vectors |x_i⟩; that is, for
an arbitrary vector |v⟩,

    |v⟩ = ∑_{i=1}^N v_i |x_i⟩

where the v_i are suitable coefficients (or components or coordinates). For a given basis and
vector |v⟩, these components are unique P. However in different bases, a given vector will have
different components.
In general components are complex, but for a real vector space (with a suitable choice of basis)
they are real.
Example: In real 3-D space, using the usual notation, the vectors {i, j, k} form a basis. (These
may also be written {x̂, ŷ, ẑ} or {e_x, e_y, e_z}.) So does any other set of three non-coplanar
vectors.
Every basis in V^N has N elements; conversely any set of N linearly-independent vectors in V^N
forms a basis P.
When we add vectors, the coordinates add: if |w⟩ = α|u⟩ + β|v⟩, with |u⟩ = ∑ u_i |x_i⟩,
|v⟩ = ∑ v_i |x_i⟩ and |w⟩ = ∑ w_i |x_i⟩, then w_i = αu_i + βv_i P.
Any set of at least N vectors which includes a basis as a subset is said to span the space;
obviously a basis spans the space.
For convenience, we will often write a basis as {|i⟩} ≡ {|1⟩, |2⟩, ... |N⟩}. Recall that what is
written inside the ket is just a label. Numbers-as-labels in kets will be widely used, so it is
important to remember they have no other significance: |1⟩ + |2⟩ ≠ |3⟩!
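A quick numerical test for linear independence (an added illustration, not from the notes): stack
the candidate vectors as columns of a matrix and compute its rank; the vectors form a basis of
C^N exactly when the rank equals N.

    import numpy as np

    def is_basis(vectors):
        """True if the given vectors are linearly independent and span C^N."""
        M = np.column_stack(vectors)          # one vector per column
        n_dim = M.shape[0]
        return M.shape[1] == n_dim and np.linalg.matrix_rank(M) == n_dim

    e1, e2 = np.array([1, 0]), np.array([0, 1])
    print(is_basis([e1, e2]))                 # True
    print(is_basis([e1, 2 * e1]))             # False: parallel vectors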

Representations
Shankar pp 10-11, Riley 8.3, Griffiths A.1

For a given basis {|x_i⟩}, and a vector |v⟩ = ∑_{i=1}^N v_i |x_i⟩, the list of components v_i is a repre-
sentation of the abstract vector |v⟩. We write this as a vertical list (or column vector):

              ⎛ v_1 ⎞
    |v⟩ ⟶_x   ⎜ v_2 ⎟ .
              ⎜  ⋮  ⎟
              ⎝ v_N ⎠

The symbol ⟶_x means "is represented by", with the x being a name or label for the basis
(which will be omitted if the basis is obvious).
Note that in their own representation the basis vectors are simple:

                ⎛ 1 ⎞               ⎛ 0 ⎞                  ⎛ 0 ⎞
    |x_1⟩ ⟶_x   ⎜ 0 ⎟,  |x_2⟩ ⟶_x   ⎜ 1 ⎟,  ...  |x_N⟩ ⟶_x ⎜ ⋮ ⎟ .
                ⎜ ⋮ ⎟               ⎜ ⋮ ⎟                  ⎜ 0 ⎟
                ⎝ 0 ⎠               ⎝ 0 ⎠                  ⎝ 1 ⎠

If u_i, v_i and w_i are the components of |u⟩, |v⟩ and |w⟩ in this basis, and |w⟩ = α|u⟩ + β|v⟩,

              ⎛ αu_1 + βv_1 ⎞
    |w⟩ ⟶_x   ⎜ αu_2 + βv_2 ⎟
              ⎜      ⋮      ⎟
              ⎝ αu_N + βv_N ⎠

Hence all manipulations (addition, multiplication by a scalar) of the abstract vectors or kets
are mirrored in corresponding manipulations of the column vectors. A fancy way of saying the
same thing is that all N-dimensional vector spaces are isomorphic to C^N, and hence to one
another. Practical calculations often start by specifying a basis and working with the corre-
sponding representations of vectors in that basis. We will repeatedly find the same calculations
recurring for physically different vector spaces that happen to have the same dimension.
If we have another basis, {|y_i⟩}, |v⟩ will be represented by a different column vector in this
basis, and the |x_i⟩ will have more than one non-zero component.
Example: given a 2-D real vector space with a basis {|x_1⟩, |x_2⟩} and another
{|y_1⟩ = |x_1⟩ + |x_2⟩, |y_2⟩ = |x_1⟩ − |x_2⟩}, and |v⟩ = 2|x_1⟩ + 3|x_2⟩ = (5/2)|y_1⟩ − (1/2)|y_2⟩, we have for
instance

    |v⟩ ⟶_x ( 2 ),   |v⟩ ⟶_y (  5/2 ),   |y_2⟩ ⟶_x (  1 ),   |x_1⟩ ⟶_y ( 1/2 ).
            ( 3 )            ( −1/2 )              ( −1 )              ( 1/2 )
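To make the example concrete (a numerical sketch added here, not part of the notes), the
y-components can be recovered by solving the linear system that expresses |v⟩ in the new basis:

    import numpy as np

    # Components of |y1>, |y2> in the x-basis, stacked as columns
    Y = np.array([[1.0,  1.0],
                  [1.0, -1.0]])
    v_x = np.array([2.0, 3.0])        # |v> in the x-basis

    # Solve Y @ v_y = v_x for the components of |v> in the y-basis
    v_y = np.linalg.solve(Y, v_x)
    print(v_y)                        # [ 2.5 -0.5], i.e. 5/2 and -1/2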

Subspaces and direct sums


Shankar pp 17-18

Given an N-D vector space V^N, a subset of its elements that form a vector space among
themselves is a subspace.
For examples in ordinary 3-D space:
• all vectors along the x axis form a 1-D subspace: V^1_x;
• all vectors in the xy plane which includes the origin form a 2-D subspace: V^2_xy.
Note both of these contain the origin, and the inverse of any vector in the subspace.
Any n-D subset of a basis of V^N will span a subspace V^n P. Of course the subspace contains
all linear combinations of these basis vectors, not just the vectors themselves.
Given two spaces, V^N_a and V^M_b (where a and b are just labels), their so-called direct sum,
written V^N_a ⊕ V^M_b, is the set containing all elements of V^N_a and V^M_b and all possible linear
combinations between them. This makes it closed, and so the direct sum is a new vector space.
A set consisting of N basis vectors from V^N_a and M from V^M_b forms a basis in V^N_a ⊕ V^M_b, which
is an (N + M)-dimensional space. V^N_a and V^M_b are subspaces of this space.
Example: V^1_x ⊕ V^1_y = V^2_xy. Bases for the two 1-D spaces are the 1-element sets {i} and {j}; so
{i, j} is a basis of their direct sum. V^1_x and V^1_y are now subspaces of the new space V^2_xy. Note
that V^2_xy contains points off the x and y axes which are not in either of the component spaces,
but are produced by linear combinations (e.g. 2i − 10j).
Note that for this to work, the two spaces must have only the zero vector in common. The
direct sum of the xy plane and the xz plane is not four-dimensional!

Product spaces
Shankar pp 248-249 (chapter 10)

A different way of combining two spaces is the "tensor direct product", denoted V^N_a ⊗ V^M_b.
Though important in quantum mechanics, it is hard to come up with examples from classical
physics. Product spaces arise when a system has two distinct aspects, both of which are vectors,
and in order to specify the state of the system both vectors have to be given.
If {|a_i⟩} and {|b_j⟩} are basis sets for the two spaces, one possible basis for the product space is
formed by picking one from each—say the ith from the first set and the jth from the second
set. There are N × M possibilities, so the product space has dimension N × M. These states
are written |i, j⟩ ≡ |a_i⟩ ⊗ |b_j⟩. The ⊗ is best regarded simply as a separator; it doesn't indicate
any operation that is carried out.
Note that for |p⟩, |q⟩ ∈ V^N_a and |v⟩, |w⟩ ∈ V^M_b, while all vectors |p⟩ ⊗ |v⟩ are in V^N_a ⊗ V^M_b, not all
vectors in the product space can be written in this way. Those that can are called separable,
i.e. they have a specified vector in each separate space. The vector α|p⟩ ⊗ |v⟩ + β|q⟩ ⊗ |w⟩
is in the product space but is not separable unless |p⟩ ∝ |q⟩ or |v⟩ ∝ |w⟩.² This is where
the distinction between classical and quantum mechanics comes in. In quantum mechanics, a
non-separable state is called an entangled state.
Linearity and associative and distributive laws hold, e.g.

    α|p⟩ ⊗ (β|v⟩ + γ|w⟩) = αβ (|p⟩ ⊗ |v⟩) + αγ (|p⟩ ⊗ |w⟩)

Note |v⟩ ⊗ |0⟩ and |0⟩ ⊗ |w⟩ are the same and equal to the null vector P.
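In a component representation the tensor product of two column vectors is the Kronecker
product. The sketch below (an added illustration, not from the notes) builds a separable state
and the combination α|p⟩ ⊗ |v⟩ + β|q⟩ ⊗ |w⟩ in C² ⊗ C², and tests separability via a rank check:

    import numpy as np

    p, q = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # basis kets of space a
    v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # basis kets of space b

    separable = np.kron(p, v)                 # |p> ⊗ |v>, a 4-component vector
    entangled = (np.kron(p, v) + np.kron(q, w)) / np.sqrt(2)

    # A 4-vector c is separable iff the 2x2 matrix c.reshape(2, 2) has rank 1
    print(np.linalg.matrix_rank(separable.reshape(2, 2)))   # 1
    print(np.linalg.matrix_rank(entangled.reshape(2, 2)))   # 2 -> not separable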

1.3 Inner Products

Definitions
Shankar pp 7-9, Riley 8.1.2, Griffiths A.2

In applications in physics we usually want to define the length or "norm" of a vector, and
the "angle" between two vectors. To be precise, we define the inner product of |v⟩ and |w⟩,
written ⟨v|w⟩, as a complex number that obeys three rules:

(I) ⟨v|w⟩ = ⟨w|v⟩*. (Skew symmetry)

(II) ⟨v|v⟩ ≥ 0, with equality if and only if |v⟩ is the zero vector. (Positive definiteness)

(III) ⟨v|(α|u⟩ + β|w⟩) = α⟨v|u⟩ + β⟨v|w⟩, where α, β ∈ C. (Linearity on the right or ket side)

A vector space with an inner product is called an inner-product space. The term Hilbert
space is also used; in finite-dimensional spaces at least they are equivalent for our purposes.
Examples
• For real vectors in 3-D the usual scalar product satisfies these rules P.
• So does the "sum of products" rule ∑_i v_i w_i for lists of real numbers (R^N).
However the "sum of products" rule does NOT work for lists of complex numbers (C^N); but
∑_i v_i* w_i does.
It follows that for vectors from a complex vector space, if |p⟩ = α|u⟩ + β|v⟩,

    ⟨p|w⟩ = α*⟨u|w⟩ + β*⟨v|w⟩ ;

i.e. inner products are "anti-linear" or "conjugate-linear" on the left P.
Two vectors are orthogonal if their inner product is zero: ⟨v|w⟩ = 0 = ⟨w|v⟩.
We choose the norm or length of a vector |v⟩ to be |v| = √⟨v|v⟩. If |v| = 1, |v⟩ is normalised.

² |p⟩ ∝ |q⟩ means that there is some scalar α such that |p⟩ = α|q⟩.
Orthonormal bases
Shankar pp 9-12, 14-15, Riley 8.1.2, Griffiths A.2

A set of vectors in a vector space V^N, {|i⟩} ≡ {|1⟩, |2⟩, ... |n⟩}, all of unit norm, and all orthogonal
to each other, is called an orthonormal set. By definition they satisfy ⟨i|j⟩ = δ_ij (i.e. 1 if
i = j and 0 otherwise).
(We could equally have denoted the basis {|x_i⟩}. Especially if we are talking about vectors in
real 3-D space we might use the notation {|e_i⟩} instead.)
Vectors in an orthonormal set are linearly independent P, so n ≤ N.
If there are enough vectors in the orthonormal set to make a basis (for finite-dimensional spaces,
n = N), we call it an orthonormal basis or complete orthonormal set.
Every [finite-dimensional] vector space has an orthonormal basis P (actually infinitely many).
(This theorem is actually true even for infinite-dimensional vector spaces but the proof is hard.)
Coordinates in an orthonormal basis have very simple expressions: if |v⟩ = ∑_i v_i |i⟩, then
v_i = ⟨i|v⟩ P.
If v_i and w_i are the coordinates of |v⟩ and |w⟩ respectively, ⟨v|w⟩ = ∑_i v_i* w_i and
⟨v|v⟩ = ∑_i v_i* v_i = ∑_i |v_i|² ≥ 0 P.
(Remember in proving these, you need to use different indices ("dummy indices") for each sum,
and these in turn must be different from any "free" index, which stands for any of 1 ... N. Thus
for example ⟨i|v⟩ = ∑_j v_j ⟨i|j⟩.)
Though coordinates are basis-dependent, the sums that give norms and inner products are
basis-independent, as will be shown later.
Gram-Schmidt orthogonalisation can be used to construct an orthonormal basis {|i⟩} from
a set {|v_i⟩} of N linearly-independent vectors. First, let |1⟩ be |v_1⟩/|v_1|. Then take |v_2⟩, subtract
off the component parallel to |1⟩, and normalise:

    |2⟩ = C_2 (|v_2⟩ − ⟨1|v_2⟩|1⟩),   where |C_2|⁻² = ⟨v_2|v_2⟩ − ⟨v_2|1⟩⟨1|v_2⟩.

Continue taking the remaining |v_j⟩ in turn, subtract off the components parallel to each previ-
ously constructed |i⟩, normalise and call the result |j⟩:

    |j⟩ = C_j (|v_j⟩ − ∑_{i=1}^{j−1} ⟨i|v_j⟩|i⟩)

where C_j is the normalisation constant. The resulting basis is not unique, because it depends
on the ordering of the basis vectors, which is arbitrary; also the normalisation constants are
only defined up to a phase. (This construction proves the existence of an orthonormal basis,
as asserted above.)
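The procedure translates directly into code; here is a minimal numpy sketch (added for
illustration) that orthonormalises a list of linearly-independent complex vectors, following the
construction above:

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthonormal basis built from linearly-independent vectors."""
        basis = []
        for v in vectors:
            u = v.astype(complex)
            for e in basis:                      # subtract components along
                u = u - np.vdot(e, u) * e        # previously constructed |i>
            basis.append(u / np.linalg.norm(u))  # normalise (phase is a choice)
        return basis

    vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]),
          np.array([0.0, 1.0, 1.0])]
    for e in gram_schmidt(vs):
        print(np.round(e, 3))
    # Check by hand: <i|j> = delta_ij up to rounding

Note np.vdot conjugates its first argument, so np.vdot(e, u) is exactly the inner product ⟨e|u⟩
with the convention used in these notes.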

Bras
Shankar pp 11-14, Griffiths 3.6

In Dirac notation, the inner product ⟨v|w⟩ is considered as a bra ⟨v| acting on a ket |w⟩ to form
a (scalar) "bracket". Another way of saying this is that a bra ⟨v| is an object with the property
that it can be combined with any ket |w⟩ from V^N to give the inner product ⟨v|w⟩. For each ket,
there is a corresponding bra and vice versa, so ⟨w|v⟩ = ⟨v|w⟩* will be the result of combining
the bra ⟨w| with the ket |v⟩.
Mathematically, if the ket lives in a vector space V^N, then the bra is an element of another
vector space, called the dual of V^N, but we will not need this distinction. (Students often
stumble over the concept of bras when they first meet them, so the interpretation in terms of
row vectors given below is a very useful picture.)
Given a basis {|i⟩}, the corresponding bras {⟨i|} span the space of bras, and an arbitrary bra
can be expanded ⟨v| = ∑_i v_i* ⟨i|, with ⟨v|i⟩ = v_i*. Thus the coordinates of the bra ⟨v| are v_i* P.
Note that if the ket |w⟩ = α|u⟩ + β|v⟩, the corresponding bra is ⟨w| = α*⟨u| + β*⟨v|.

If we represent a ket |v⟩ as a column matrix of coordinates v:

            ⎛ v_1 ⎞
    |v⟩ →   ⎜ v_2 ⎟ ≡ v,
            ⎜  ⋮  ⎟
            ⎝ v_N ⎠

the corresponding bra is a row matrix:

    ⟨v| → (v_1*, v_2*, ..., v_N*) = (vᵀ)* ≡ v† ,

and the ordinary rules of matrix multiplication make the operation of a bra on a ket give a
single complex number:

                                  ⎛ w_1 ⎞
    ⟨v|w⟩ → (v_1*, ..., v_N*)     ⎜  ⋮  ⎟  = ∑_{i=1}^N v_i* w_i
                                  ⎝ w_N ⎠

just as before.
Note that the basis bras are given by ⟨1| → (1, 0, ..., 0, 0) etc.

Inequalities
Shankar pp 16-17, Riley 8.1.3, Griffiths A.2

The Schwarz inequality: for any vectors |v⟩, |w⟩, |⟨v|w⟩| ≤ |v| |w| P.
The equality holds only if |v⟩ ∝ |w⟩ P.
Notice that the same rule applies to ordinary dot products, since |cos θ| ≤ 1.
The triangle inequality: if |w⟩ = |u⟩ ± |v⟩, then |w| ≤ |u| + |v| P.
Notice that this holds for the lengths of ordinary "arrow" vectors that form a triangle! By
symmetry, the result is cyclic, i.e. |v| ≤ |w| + |u| etc.

Inner products in product spaces


Let |p⟩, |q⟩ ∈ V_a and |v⟩, |w⟩ ∈ V_b, and let an inner product be defined on each space. The
inner product in the product space V_a ⊗ V_b is defined as (⟨p| ⊗ ⟨v|)(|q⟩ ⊗ |w⟩) = ⟨p|q⟩⟨v|w⟩,
which of course is a scalar.
If {|p_i⟩} and {|v_i⟩} are orthonormal bases in each space, then {|p_i⟩ ⊗ |v_j⟩} is an orthonormal
basis in the product space (there are of course others, which need not be separable).
1.4 Operators

Definition
Shankar 18-20, Riley 8.2, 7.2.1, Griffiths A.3

Operators change kets into other kets in the same vector space:

    Â|v⟩ = |w⟩

For the moment we mark operators with a hat, ˆ.
Linear operators (we will not consider others) have the property that

    Â(α|v⟩ + β|w⟩) = αÂ|v⟩ + βÂ|w⟩   and   (αÂ + βB̂)|v⟩ = αÂ|v⟩ + βB̂|v⟩.

Hence any linear operator acting on the zero vector gives zero.
The identity operator Î leaves a ket unchanged: Î|v⟩ = |v⟩.
The product of two operators, say ÂB̂, means "apply B̂ first and then apply Â to the
result". If B̂|v⟩ = |u⟩, ÂB̂|v⟩ = Â|u⟩.
Â and B̂ will not in general commute, in which case this is not the same as B̂Â|v⟩.
If, for all kets in the space, B̂Â|v⟩ = |v⟩, then B̂ is called the inverse of Â and denoted Â⁻¹.
We can write Â⁻¹Â = Î. For finite-dimensional spaces, ÂÂ⁻¹ = Î also P.
Not all operators have inverses. However if the equation Â|v⟩ = |0⟩ has no solutions except
|v⟩ = |0⟩, the inverse Â⁻¹ does exist P.
Inverse of operator products: if Ĉ = ÂB̂, then Ĉ⁻¹ = B̂⁻¹Â⁻¹ P.

Identity and Projection operators


Shankar 22-24, (Riley 8.4), Griffiths 3.6

The object |a⟩⟨b| is in fact an operator since, acting on any ket |v⟩, it gives another ket, ⟨b|v⟩|a⟩.
(Whatever |v⟩ we choose, the resulting ket is always proportional to |a⟩.) This is termed the
outer product of |a⟩ and |b⟩, and is entirely distinct from the inner product ⟨b|a⟩, which is a
scalar.
Using an orthonormal basis {|i⟩}, we can define projection operators, P̂_i = |i⟩⟨i|, which
"pull out" only the part of a vector |v⟩ which is parallel to |i⟩: P̂_i|v⟩ = v_i|i⟩. The product of
two projection operators is zero or equivalent to a single projector P: P̂_i P̂_j = δ_ij P̂_i.
These are examples of operators which do not have an inverse, since P̂_i|v⟩ = 0 will be satisfied
for many non-zero kets |v⟩. The lack of an inverse reflects the fact that when we operate with
P̂_i on a vector, we lose all information about components orthogonal to |i⟩, and no operator
can restore it.
One very useful way of writing the identity operator is as follows P:

    Î = ∑_i P̂_i = ∑_i |i⟩⟨i|

This is called the completeness relation. The sum must be over projectors onto an orthonor-
mal basis.
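Numerically the completeness relation says that the outer products of an orthonormal basis
sum to the identity matrix; a short numpy check (an added illustration):

    import numpy as np

    # An orthonormal basis of C^2 other than the standard one
    basis = [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)]

    # Sum of projectors P_i = |i><i| built with the outer product
    identity = sum(np.outer(e, e.conj()) for e in basis)
    print(np.round(identity, 12))      # the 2x2 identity matrix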
Matrix representation of operators
Shankar 20-22, 25, Riley 8.3, 7.3.1, Griffiths A.3

[Comment on notation in Riley: Riley uses boldface for abstract vectors where we use kets,
and calligraphic letters without "hats" for operators: hence Â|v⟩ = |u⟩ is written Av = u. We
use boldface for column vectors and matrices of components, but Riley uses a sans-serif font,
so A v = u is a matrix equation.]

We can form the inner product of |u⟩ ≡ Â|v⟩ with another vector |w⟩, to get ⟨w|u⟩ = ⟨w|(Â|v⟩).
This is called a matrix element of Â, and is more often written ⟨w|Â|v⟩.
If we have an orthonormal basis {|i⟩}, we can form all possible matrix elements of Â between
vectors of the basis, A_ij = ⟨i|Â|j⟩; these are the coordinates of Â in this basis. Then P

    Â|v⟩ = ∑_ij A_ij v_j |i⟩   and   ⟨w|Â|v⟩ = ∑_ij w_i* A_ij v_j.

The numbers A_ij can be arranged in a matrix A, i labelling the row and j the column, which
gives

                                         ⎛ A_11  A_12  ...  A_1N ⎞ ⎛ v_1 ⎞
    ⟨w|Â|v⟩ = (w_1*, w_2*, ..., w_N*)    ⎜ A_21  A_22  ...  A_2N ⎟ ⎜ v_2 ⎟  = w†Av        (1.1)
                                         ⎜  ⋮     ⋮    ...   ⋮   ⎟ ⎜  ⋮  ⎟
                                         ⎝ A_N1  A_N2  ...  A_NN ⎠ ⎝ v_N ⎠

The ith column of the matrix A is just the coordinates of |Ai⟩ ≡ Â|i⟩, i.e. the transformed basis
ket.
If the determinant of A vanishes, its columns are not linearly independent. That means that
{|Ai⟩} is not a basis, and the vectors |Av⟩ belong to a lower-dimensional subspace of V^N.
Hence det A = 0 means that Â⁻¹ does not exist.
The matrix elements of the product of two operators can be found by inserting the completeness
relation ∑_k |k⟩⟨k| as an identity operator in ÂB̂ = ÂÎB̂:

    (AB)_ij = ⟨i|ÂB̂|j⟩ = ∑_k ⟨i|Â|k⟩⟨k|B̂|j⟩ = ∑_k A_ik B_kj

i.e. the usual matrix multiplication formula.


Examples:
Identity: I_ij = ⟨i|Î|j⟩ = ⟨i|j⟩ = δ_ij. So

         ⎛ 1  0  0  ... ⎞
    Î →  ⎜ 0  1  0      ⎟
         ⎜ 0  0  1      ⎟
         ⎝ ⋮          ⋱ ⎠

Projectors: ⟨i|P̂_k|j⟩ = ⟨i|k⟩⟨k|j⟩ = δ_ik δ_jk = δ_ij δ_ik (note we do not use a summation conven-
tion), e.g.

          ⎛ 0  0  0  0  ... ⎞
    P̂_3 → ⎜ 0  0  0  0      ⎟
          ⎜ 0  0  1  0      ⎟
          ⎜ 0  0  0  0      ⎟
          ⎝ ⋮             ⋱ ⎠

i.e. 1 on the diagonal for the selected row/column.
An outer product: The matrix elements of Ĉ = |v⟩⟨w| are just C_ij = v_i w_j*. We can obtain a
square matrix from a column and a row vector if we multiply them in that order (as opposed
to the opposite order which gives the inner product, a scalar):

    ⎛ v_1 ⎞                              ⎛ v_1 w_1*  v_1 w_2*  ...  v_1 w_N* ⎞
    ⎜ v_2 ⎟  (w_1*, w_2*, ..., w_N*)  =  ⎜ v_2 w_1*  v_2 w_2*          ⋮    ⎟
    ⎜  ⋮  ⎟                              ⎜    ⋮                        ⋮    ⎟
    ⎝ v_N ⎠                              ⎝ v_N w_1*      ...      v_N w_N*  ⎠

Adjoints
Shankar pp 25-27, (Riley 8.6), Griffiths A.3, A.6
 
An operator such as |a⟩⟨b| can clearly act on bras as well as kets: ⟨u|(|a⟩⟨b|) = ⟨u|a⟩ ⟨b|.
In fact all operators can act to the left on bras as well as to the right on kets. This is obvious
from the matrix representation in an orthonormal basis, since a row vector can be multiplied
from the right by a matrix.
Now the ket |u⟩ = Â|v⟩ has a bra equivalent, ⟨u|, but for most operators it is not the same
as ⟨p| = ⟨v|Â. We define the adjoint of Â, Â†, as the operator that, acting to the left, gives
the bra corresponding to the ket which results from Â acting to the right: ⟨u| = ⟨v|Â†. Hence
⟨w|Â|v⟩ = ⟨v|Â†|w⟩*.
Â† has matrix elements (A†)_ij = A_ji* P, i.e. the matrix representation of the adjoint operator
is the transposed complex conjugate of the original matrix, also called the Hermitian
conjugate. It follows that (Â†)† = Â, i.e. the adjoint of the adjoint is the original.
Adjoints of products: (ÂB̂)† = B̂†Â†.
Adjoints of scalars: if B̂ = cÂ, B̂† = c*Â†. Complex numbers go to their complex conjugates
in the adjoint.
The adjoint of |a⟩⟨b| is |b⟩⟨a| P.

Operators in product spaces


Let Ĉ_a be an operator in a vector space V_a and D̂_b one in V_b. Then in the product space
V_a ⊗ V_b we can form product operators Ĉ_a ⊗ D̂_b, which act on the kets as follows:

    (Ĉ_a ⊗ D̂_b)(|p⟩ ⊗ |v⟩) = (Ĉ_a|p⟩) ⊗ (D̂_b|v⟩).

Here it is particularly important to be clear that we are not multiplying Ĉ_a and D̂_b together; they
act in different spaces. Once again ⊗ should be regarded as a separator, not a multiplication.
Denoting the identity operators in each space as Î_a and Î_b respectively, in the product space
the identity operator is Î_a ⊗ Î_b. An operator in which each additive term acts in only one space,
such as Ĉ_a ⊗ Î_b + Î_a ⊗ D̂_b, is called a separable operator. Ĉ_a ⊗ Î_b and Î_a ⊗ D̂_b commute.
The inverse of Ĉ_a ⊗ D̂_b is Ĉ_a⁻¹ ⊗ D̂_b⁻¹ and the adjoint, Ĉ_a† ⊗ D̂_b†. (The order is NOT reversed,
since each still has to act in the correct space.)
Matrix elements work as follows: (⟨p| ⊗ ⟨v|)(Ĉ_a ⊗ D̂_b)(|q⟩ ⊗ |w⟩) = ⟨p|Ĉ_a|q⟩⟨v|D̂_b|w⟩. (This
is the arithmetic product of two scalars.)
The labels a and b are redundant since the order of the operators in the product tells us which
acts in which space. Alternatively if we keep the labels, it is common to write Ĉ_a when we
mean Ĉ_a ⊗ Î_b and Ĉ_a D̂_b (or even, since they commute, D̂_b Ĉ_a) when we mean Ĉ_a ⊗ D̂_b.
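In components, Ĉ_a ⊗ D̂_b is again a Kronecker product, now of matrices; the sketch below (an
added illustration) checks that Ĉ ⊗ Î and Î ⊗ D̂ commute and that their product is Ĉ ⊗ D̂:

    import numpy as np

    rng = np.random.default_rng(1)
    C = rng.normal(size=(2, 2))            # operator in space a
    D = rng.normal(size=(3, 3))            # operator in space b
    Ia, Ib = np.eye(2), np.eye(3)

    CI = np.kron(C, Ib)                    # C acting in space a only
    ID = np.kron(Ia, D)                    # D acting in space b only

    assert np.allclose(CI @ ID, ID @ CI)           # they commute
    assert np.allclose(CI @ ID, np.kron(C, D))     # and compose to C ⊗ D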

1.5 Hermitian and Unitary operators

Definition and Properties of Hermitian operators


Shankar p 27, Riley 8.12.5, Griffiths A.3

An operator Ĥ is Hermitian if Ĥ† = Ĥ, or anti-Hermitian if Ĝ† = −Ĝ. Another term for
Hermitian is self-adjoint.
In real spaces Hermitian operators are represented by symmetric matrices, Hᵀ = H.
For Hermitian operators, if |u⟩ = Ĥ|v⟩ and |z⟩ = Ĥ|w⟩, then ⟨z| = ⟨w|Ĥ, and ⟨w|Ĥ|v⟩ =
⟨w|u⟩ = ⟨z|v⟩ P.
It follows that ⟨v|Ĥ|w⟩ = ⟨w|Ĥ|v⟩* and ⟨v|Ĥ²|v⟩ ≥ 0 P.

Definition and Properties of Unitary operators


Shankar pp 28-29, Riley 8.12.6, Griffiths A.3

An operator Û is unitary if Û† = Û⁻¹. (In infinite-dimensional spaces ÛÛ† = Î and Û†Û = Î
must both be checked.)
In real spaces unitary operators are represented by orthogonal matrices, Uᵀ = U⁻¹.
Unitary operators preserve the inner product, i.e. if Û|v⟩ = |v′⟩ and Û|w⟩ = |w′⟩, then
⟨v|w⟩ = ⟨v′|w′⟩ P. (The use of a "prime", ′, just creates a new label. It has nothing to do with
differentiation!)
The columns of a unitary matrix are orthonormal vectors, as are the rows P.
Since the matrix contains N columns (or rows), where N is the dimension of the vector space,
these orthonormal sets are actually complete bases.
The converse is also true: any matrix whose columns (or rows) form orthonormal vectors is
guaranteed to be unitary.
The determinant of a unitary matrix is a complex number of unit modulus P.
Unitary transformations: Change of basis
Shankar pp 29-30, Riley 8.15, Griffiths A.4

Let us define two orthonormal bases in V^N, {|x_i⟩} and {|y_i⟩}. We will label components in
these bases by superscripts (x) and (y), e.g. v_i^(x) = ⟨x_i|v⟩, A_ij^(y) = ⟨y_i|Â|y_j⟩.
The components in the two bases are related by the matrix S, where S_ij = ⟨x_i|y_j⟩ (and
(S†)_ij = ⟨y_i|x_j⟩) as follows P:

    v_i^(y) = ∑_j S_ji* v_j^(x)  ⇒  v^(y) = S†v^(x);    A_ij^(y) = ∑_kl S_ki* A_kl^(x) S_lj  ⇒  A^(y) = S†A^(x)S.

A simple example of a change of basis in a two-dimensional space is given by
|y_1⟩ = cos θ|x_1⟩ + sin θ|x_2⟩ and |y_2⟩ = cos θ|x_2⟩ − sin θ|x_1⟩. Then

    S = ⎛ cos θ   −sin θ ⎞
        ⎝ sin θ    cos θ ⎠

We often use {|i⟩} and {|i′⟩} for the two bases, with S_ij = ⟨i|j′⟩, v_i = ⟨i|v⟩ and v_i′ = ⟨i′|v⟩.
S is a unitary matrix: (S†S)_ij = ∑_k ⟨y_i|x_k⟩⟨x_k|y_j⟩ = δ_ij. Hence (as we already knew) inner
products (⟨v|w⟩) and matrix elements (⟨v|Â|w⟩) are independent of coordinate system, even if
the individual numbers we sum to get them are different.
In addition, Tr(A^(x)) = Tr(A^(y)) and det(A^(x)) = det(A^(y)), so these also are basis-independent P.
For that reason we can assign these properties to the operators and talk about Tr(Â)
and det(Â).³

³ The trace of a matrix A is the sum of the diagonal elements: Tr(A) = ∑_i A_ii.
The reverse transformation, from the y-basis to the x-basis, is done by interchanging S† and S.
Note that A^(x) and A^(y) are representations of the same abstract operator Â in different bases
(similarly v^(x), v^(y) of the abstract ket |v⟩). Therefore, S is not an operator, since it does not
change the abstract kets. We call this a passive transformation or coordinate change.
However there are also unitary operators which do change the kets. An example is a rotation of
a vector in ordinary 3-D (real) space (an active transformation), which is represented by the
transpose of the (orthogonal) matrix which transforms between rotated coordinate systems.
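A numerical check of the passive transformation, using the rotation example above (an added
sketch, not from the notes): build S column by column from the new basis vectors and verify
that v^(y) = S†v^(x) reproduces the components obtained directly as ⟨y_i|v⟩.

    import numpy as np

    theta = 0.3
    # Columns of S are the y-basis vectors expressed in the x-basis
    S = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    v_x = np.array([2.0, 3.0])             # components of |v> in the x-basis
    v_y = S.conj().T @ v_x                 # passive transformation v(y) = S† v(x)

    # Direct check: v_i(y) = <y_i|v>, i.e. project onto each y-basis column
    print(np.allclose(v_y, [S[:, 0] @ v_x, S[:, 1] @ v_x]))   # True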

1.6 Eigenvectors and Eigenvalues


Note that from now on, we will write the zero vector as 0. We may even use |0⟩ for a non-zero
vector with label 0!

Basic properties
Shankar pp 30-35, Riley 8.13, Griffiths A.5

The eigenvalue equation for a linear operator Ω̂ is

    Ω̂|ω⟩ = ω|ω⟩.

The equation is solved by finding both the allowed values of the scalar number ω, the eigen-
value, and for each eigenvalue the corresponding ket |ω⟩, the eigenvector or eigenket.
The German word "eigen" means "own" or "characteristic"—i.e. the eigenkets are a special
set of vectors for each particular operator which have a very simple behaviour when operated
on: no change in "direction", just a multiplication by a scalar eigenvalue. As we have done
above, we habitually use the eigenvalue ("ω") to label the corresponding eigenket ("|ω⟩").
The zero vector does not count as an eigenvector.
If |ω⟩ is a solution to the eigenvalue equation, so is α|ω⟩ for any α ≠ 0. All such multiples
are considered to be a single eigenvector, and we usually quote the normalised value, with real
elements if that is possible.
We can rewrite the eigenvalue equation as (Ω̂ − ωÎ)|ω⟩ = 0. (We can insert the identity operator
at will as it does nothing. The final zero is of course the zero vector.)
This is an equation that we want to solve for a non-zero |ω⟩, so (Ω̂ − ωÎ) cannot have an inverse,
and its determinant must vanish. This is the characteristic equation:

    det(Ω̂ − ωÎ) = 0.

In any basis this is the determinant of an N × N matrix, which is an Nth-order polynomial in ω.
The fundamental theorem of algebra states that such a polynomial has N roots ω_1, ω_2 ... ω_N,
where some roots may be repeated and roots may be complex even if the coefficients are real.
Therefore any operator on V^N has N eigenvalues, not necessarily all different.
The sum of all eigenvalues of Ω̂ (including repeated ones) is Tr(Ω̂), and their product equals
det(Ω̂) P. Thus if Ω̂ has any zero eigenvalues, its inverse does not exist.
For each non-repeated eigenvalue ω_i we will call the corresponding eigenvector |ω_i⟩. Working
in an orthonormal basis, the equation (Ω̂ − ω_i Î)|ω_i⟩ = 0 will give N − 1 linearly-independent
equations for the components of |ω_i⟩, so—as we knew—we can determine |ω_i⟩ only up to a
multiplicative constant.
A set of eigenvectors corresponding to distinct eigenvalues is linearly independent P.
For an eigenvalue which is repeated n times, there will be at least N − n linearly-independent
equations. These will have up to n linearly-independent solutions. Thus an operator with
repeated eigenvalues will have up to N linearly-independent eigenvectors.

Hermitian and unitary operators


Shankar pp 35-40, Riley 8.13.2 & 18.13.3, 7.12.3, Griffiths A.6

Important results P:
(I) For Hermitian operators, eigenvalues are real.
(II) For unitary operators, eigenvalues have unit modulus, i.e. they can be written e^{iθ}, θ ∈ R.
(III) For both Hermitian and unitary operators, eigenkets with different eigenvalues are orthog-
onal.
(IV) For all Hermitian and unitary operators, the eigenvectors span the space. (The general
proof of this one is more involved, but it follows from (III) if all the eigenvalues are distinct.)
This is called the spectral theorem.
Suppose a Hermitian or unitary operator Ω̂ has a repeated eigenvalue, say ω_1 = ω_2 = ... = ω_n =
λ. By the spectral theorem there are n linearly-independent solutions |λ, m⟩ (where m = 1 ... n
is just a label here). These eigenvectors are said to be degenerate (same eigenvalue). Then
any linear combination ∑_{m=1}^n c_m |λ, m⟩ is also an eigenvector. Therefore any vector in the
subspace spanned by the set {|λ, m⟩} is an eigenvector of Ω̂. We call this an eigenspace. Even
if the first set of degenerate eigenvectors we found was not orthogonal, a new orthogonal basis
in the subspace can always be found (by the Gram-Schmidt method or otherwise). Thus we
can always find a set of N orthonormal eigenvectors of Ω̂.
Any Hermitian or unitary operator can be written in terms of this orthonormal basis as

    Ω̂ = ∑_{i,m} ω_i |ω_i, m⟩⟨ω_i, m|.

This is called the spectral resolution of Ω̂. The first sum is over distinct eigenvalues. The
second sum runs over all the states within each eigenspace; for non-degenerate eigenvalues it is
not needed. We will not always write it explicitly, often just referring to the set of N vectors
{|ω_i⟩}, but if degeneracy is present an orthogonalised basis is always meant.
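numpy's eigh routine diagonalises a Hermitian matrix; the sketch below (an added illustration)
confirms that the eigenvalues come out real and rebuilds the operator from its spectral
resolution Ω̂ = ∑_i ω_i |ω_i⟩⟨ω_i|:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    H = A + A.conj().T                # make it Hermitian: H = A + A†

    w, V = np.linalg.eigh(H)          # real eigenvalues, eigenvectors as columns
    print(w)

    # Spectral resolution: sum_i w_i |w_i><w_i|
    H_rebuilt = sum(w[i] * np.outer(V[:, i], V[:, i].conj()) for i in range(3))
    assert np.allclose(H, H_rebuilt)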

Diagonalisation of Hermitian or unitary operators


Shankar pp 40-43, Riley 8.16, Griffiths A.5

To convert from some orthonormal basis {|x_i⟩} to the eigenvector basis {|ω_i⟩} in which Ω̂
is diagonal, we need the unitary conversion matrix S_ij = ⟨x_i|ω_j⟩. The columns of S are the
eigenvectors of Ω̂ in the original basis, hence it is sometimes called the matrix of eigenvectors.
Using this matrix to change coordinates we get:

    v^(ω) = S†v^(x),    Ω^(ω) = S†Ω^(x)S,

where superscripts in brackets indicate the basis in which |v⟩ and Ω̂ are represented.
However we do not need to perform the operation to know what we will get for Ω^(ω):

             ⎛ ω_1              ⎞
    Ω̂ ⟶_ω    ⎜      ω_2         ⎟
             ⎜           ⋱      ⎟
             ⎝              ω_N ⎠

(all the off-diagonal elements being zero). The order is arbitrary of course, though we often
choose ascending order (since they are, of course, real).

Commuting Hermitian Operators


Shankar pp 43-46, Riley 8.13.5

If the commutator [Ω̂, Λ̂] = 0 (where Ω̂ and Λ̂ are Hermitian), there is at least one basis of
common eigenvectors (therefore both operators are represented by diagonal matrices in this
basis).
Proof outline: by considering [Ω̂, Λ̂]|ω_i⟩ = 0 we can immediately see that Λ̂|ω_i⟩ is also an
eigenvector of Ω̂ with eigenvalue ω_i. In the absence of degeneracy, that can only be the case
if Λ̂|ω_i⟩ is proportional to |ω_i⟩, so the non-degenerate eigenstates of Ω̂ are also those of Λ̂. If
there is degeneracy, though, Λ̂|ω_i⟩ only needs to be another state in the same n-dimensional
eigenspace of Ω̂. However we know we can find n orthogonal eigenvectors of Λ̂ within that
subspace (i.e. we can diagonalise Λ̂ within that subspace) and the resulting eigenvectors of Λ̂
are an equally valid basis of degenerate eigenstates of Ω̂. We can now label the states |ω_i, λ_j⟩,
and λ_j is no longer just an arbitrary label.
There may still be states that have the same ω_i and the same λ_j, but we can repeat with
further commuting operators until we have a complete set of commuting operators defining
a unique orthonormal basis, in which each basis ket can be labelled unambiguously by the
eigenvalues |ω, λ, γ, ...⟩ of the operators {Ω̂, Λ̂, Γ̂, ...}.
Examples of commuting operators are those in a product space of the form Ĉ_a ⊗ Î_b and Î_a ⊗ D̂_b.
If an operator is separable, i.e. it can be written as Ĉ_a ⊗ Î_b + Î_a ⊗ D̂_b, then the eigenvectors are
|c_i⟩ ⊗ |d_j⟩ with eigenvalue c_i + d_j. As already mentioned, the operator is often written Ĉ_a + D̂_b,
where the label makes clear which space each operator acts in; similarly the eigenstates are
often written |c_i, d_j⟩.

1.7 Functions of Operators


Shankar pp 54-57, Riley 8.5, Griffiths A.6
We can add operators, multiply them by scalars, and take products of them. Hence we can
define a power series

    f(Ω̂) = ∑_{n=0}^∞ a_n Ω̂ⁿ.

This will make sense if it converges to a definite limit. In its eigenbasis a Hermitian operator
is diagonal, so the power series acts on each diagonal element separately:

                ⎛ f(ω_1)                ⎞
    f(Ω̂) ⟶_ω    ⎜        f(ω_2)         ⎟
                ⎜               ⋱       ⎟
                ⎝                f(ω_N) ⎠

i.e. the power series converges for the operator if it converges for all its eigenvalues, and the
eigenvalues of f(Ω̂) are just the corresponding functions of the eigenvalues of Ω̂.
A very important operator function is the exponential, which is defined through the power
series

    e^Ω̂ ≡ ∑_{n=0}^∞ Ω̂ⁿ/n!.

Since the corresponding power series for e^ω converges for all finite numbers, this is defined for
all Hermitian operators, and its eigenvalues are e^{ω_i}.
From the definition it is clear that if Ω̂ and Λ̂ do not commute, e^Ω̂ e^Λ̂ ≠ e^{Ω̂+Λ̂}.
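Both routes to e^Ω̂ are easy to try numerically (an added sketch): diagonalise and exponentiate
the eigenvalues, or use scipy's expm, and compare; the last lines illustrate e^A e^B ≠ e^{A+B}
for non-commuting A and B.

    import numpy as np
    from scipy.linalg import expm

    H = np.array([[1.0, 2.0], [2.0, -1.0]])      # a Hermitian matrix
    w, V = np.linalg.eigh(H)

    # exp(H) via the eigenbasis: V diag(e^w) V†
    expH = V @ np.diag(np.exp(w)) @ V.conj().T
    assert np.allclose(expH, expm(H))

    # Non-commuting example: exp(A)exp(B) differs from exp(A+B)
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0, 0.0], [1.0, 0.0]])
    print(np.allclose(expm(A) @ expm(B), expm(A + B)))   # False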

Acknowledgements
This section is based quite closely on Chapter 1 of Shankar, and owes a considerable debt to
the notes prepared by Dr J P Leahy for a precursor to this course. Any mistakes however are
mine.
1.8 Summary
• A real or complex vector space is a set of abstract vectors, written as kets (e.g. |v⟩), which
is closed under both addition and multiplication by scalar real or complex numbers: all vectors
you can reach by any combination of addition and scalar multiplication are elements of the
vector space. There must be a zero vector |0⟩ (or often, just 0), and vectors have inverses:
|v⟩ + |−v⟩ = |0⟩.

• Linearly-independent sets of vectors are sets in which no member can be written as a
linear sum of the others. A basis is a set of linearly-independent vectors big enough to allow
any vector in the space to be written as a sum over the basis vectors. All bases have the same
size, which is the dimension of the space.

• The coordinates of an arbitrary vector in a given basis are the factors that multiply each
basis vector |i⟩ in the linear sum: |v⟩ = ∑ v_i |i⟩. The column vector of these coordinates is the
representation of |v⟩ in this basis. The representation depends on the basis.

• In some vector spaces there exists an inner product of two vectors, ⟨v|w⟩, which gives us
orthogonality and the norm of each vector, and hence allows us to construct orthonormal
bases.

• In an orthonormal basis, coordinates are given by v_i = ⟨i|v⟩, and from coordinates we can
evaluate inner products ⟨v|w⟩ = ∑_i v_i* w_i and norms of arbitrary vectors.

• We can think of the left side of inner products as bras, ⟨a|, represented by row matrices if kets
are column matrices (with elements that are complex conjugates, v_i*). Inner products are then
given by ordinary matrix multiplication.

• Direct tensor product spaces are composite spaces in which kets are obtained by taking a ket
from each of two separate spaces: |p⟩ ⊗ |v⟩ (or taking sums of such terms). Inner products are
taken in each space separately: (⟨p| ⊗ ⟨v|)(|q⟩ ⊗ |w⟩) = ⟨p|q⟩⟨v|w⟩. A basis of the product space
can be formed by taking all possible combinations of basis vectors from each subspace—M × N
for the product of an M- and an N-dimensional space.

• Linear operators change kets to kets: Â|u⟩ = |v⟩, or bras to bras: ⟨u|Â = ⟨w|.

• The adjoint operator Â† is defined by ⟨u|Â† = ⟨v|. For any |v⟩ and |x⟩, we have ⟨v|Â†|x⟩ =
⟨x|Â|v⟩*.

• Operators can be multiplied: ÂB̂ means "do B̂ then Â". They may not commute.

• They may have inverses: ÂÂ⁻¹ = Î = Â⁻¹Â.

• (ÂB̂)† = B̂†Â†; (ÂB̂)⁻¹ = B̂⁻¹Â⁻¹.

• In an orthonormal basis {|i⟩}, Î = ∑_{i=1}^N |i⟩⟨i|; this is the completeness relation.

• Operators in a product space have the form Â ⊗ P̂ (or sums of such terms), with
(Â ⊗ P̂)(|a⟩ ⊗ |v⟩) = (Â|a⟩) ⊗ (P̂|v⟩).

• Operators in N-dimensional vector spaces can be represented as N × N matrices.

• Operator products and inverses correspond to matrix products and inverses. The adjoint is the
transposed complex conjugate matrix or Hermitian conjugate.

• A Hermitian operator satisfies Â = Â† ("self-adjoint") and ⟨w|Â|v⟩ = ⟨v|Â|w⟩*.

• A unitary operator satisfies Û⁻¹ = Û†; like a rotation or change of coordinates.

• Eigenvectors (eigenkets) and eigenvalues satisfy Â|a_i⟩ = a_i|a_i⟩.

• Eigenvectors of Hermitian and unitary operators can form an orthonormal basis (eigenbasis).

• Hermitian operators are diagonal in their eigenbasis {|ω_i⟩}; the diagonal elements are the
eigenvalues and Ω̂ = ∑_{i=1}^N ω_i |ω_i⟩⟨ω_i|.

• Given a complete set of commuting Hermitian operators: each such set defines a unique
eigenbasis, with each vector uniquely labelled by its eigenvalues for the operators in the set.

• Functions of operators are defined through power series; for Hermitian (or unitary) opera-
tors, diagonalise the matrix and apply the function to each diagonal element (eigenvalue).
Functions as vectors

A number of times in the early sections of the course we used functions as examples of vectors.
If we confine ourselves to, say, polynomials of a given order, we have a finite-dimensional space.
But clearly without that restriction the dimension is infinite, and that introduces new issues.
In the first two sections, we will consider functions as vectors. Then in the subsequent sections
we will find a way of mapping abstract vectors on to functions.
Shankar covers this material in a slightly different order. Other textbooks cover the material
but within the context of quantum mechanics from the start; see eg Griffiths Chapter 3.

2.1 Inner product for functions


Shankar p 59
So far we have not defined an inner product in a function space. The definition that we will
find useful is as follows. Given two complex functions of the real variable x ∈ R, f(x) and g(x),
both vectors in the space and so also written |f⟩ and |g⟩, then

    ⟨f|g⟩ = ∫_{−∞}^∞ f*(x)g(x) dx,    |f|² = ⟨f|f⟩ = ∫_{−∞}^∞ f*(x)f(x) dx.

It is easily seen that this satisfies the rules for an inner product; in particular, as f*(x)f(x) ≥ 0,
if ⟨f|f⟩ = 0 then f(x) = 0 for all x—the zero function.
Take careful note that while f and g are functions of x, ⟨f|g⟩ is just a complex number, NOT
a function of x.
However this inner product is not defined for all functions, only those that are square in-
tegrable, that is, for which ⟨f|f⟩ is finite. So the space of square-integrable functions of x ∈ R is an
inner-product or Hilbert space. (We note that the Schwarz inequality ensures ⟨f|g⟩ will be
finite if f and g are square integrable, and the triangle inequality ensures a linear combination
("vector sum") of square-integrable functions is also square integrable P.)
With an eye to the application to quantum mechanics, and with due disregard for mathematical
rigour, we will confine ourselves to the subspace of "well-behaved" continuous functions, for
which square-integrability also implies that f vanishes as x → ±∞. In most cases we will
require f′(x) and xf(x) to be in the space as well; this restriction will be assumed in what
follows. One exception will be if the functions are required to vanish outside a finite range of
x.
Given an inner product we can find sets of functions which are orthogonal; an example is

    {φ_0(x) = N_0 e^{−x²/2},  φ_1(x) = N_1 2x e^{−x²/2},  φ_2(x) = N_2 (4x² − 2) e^{−x²/2},  φ_3(x) = N_3 (8x³ − 12x) e^{−x²/2}}.

(The numbers N_n are conventionally chosen to normalise the functions and make the set or-
thonormal.) Any finite set of course cannot be a basis, but an infinite set can; an example is
the set {φ_n(x) = N_n H_n(x) e^{−x²/2}} where H_n(x) is the nth Hermite polynomial, the first four of
which (n = 0, 1, 2, 3) give the previously listed set.
We will call the nth normalised member of an orthonormal basis φ_n(x) or |n⟩, where by con-
vention and depending on the basis n = 0, 1, 2 ... or n = 1, 2 .... So now |0⟩ may represent a
basis vector, NOT the zero vector, which will be written 0.
Since this set is a basis, any f(x) in the space can be written |f⟩ = ∑_n f_n |n⟩, where the infinite
list of complex components {f_0, f_1, f_2, ...} is the infinite-length column vector which represents
|f⟩ in this basis. As expected, the following results hold P:

    f_n = ⟨n|f⟩ = ∫_{−∞}^∞ φ_n*(x) f(x) dx;    ⟨f|g⟩ = ∑_n f_n* g_n;    ⟨f|f⟩ = ∑_n |f_n|² < ∞.
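The orthonormality of the Hermite-function set can be checked numerically; the sketch below
(an added illustration) evaluates ⟨m|n⟩ by quadrature using scipy's physicists' Hermite
polynomials and the standard normalisation N_n = (2ⁿ n! √π)^{−1/2}.

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import hermite, factorial

    def phi(n, x):
        """Normalised Hermite function: N_n H_n(x) exp(-x^2/2)."""
        N = 1.0 / np.sqrt(2**n * factorial(n) * np.sqrt(np.pi))
        return N * hermite(n)(x) * np.exp(-x**2 / 2)

    for m in range(3):
        for n in range(3):
            val, _ = quad(lambda x: phi(m, x) * phi(n, x), -np.inf, np.inf)
            print(f"<{m}|{n}> = {val:.6f}", end="  ")
        print()
    # Prints 1 on the diagonal, 0 elsewhere (to rounding)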

2.2 Operators in function spaces


Shankar pp 63-64

Somewhat confusingly, the simplest kind of operator in function space is multiplication by
another function! In particular, multiplication by x will give us another function in the space.
The new function xf(x) is written in abstract notation as X̂|f⟩.
Another operator is differentiation: df/dx is another function in the space. In abstract nota-
tion, we write D̂|f⟩.
X̂ is obviously Hermitian, since ⟨f|X̂|g⟩ can be written ∫ f* × (xg) dx, but that is equivalent
to (∫ g* × (xf) dx)*, which is ⟨g|X̂|f⟩*.
What about D̂? Consider

    ⟨f|D̂|g⟩ = ∫_{−∞}^∞ f*(x) (dg/dx) dx = [f*g]_{−∞}^∞ − ∫_{−∞}^∞ (df*/dx) g(x) dx = −⟨g|D̂|f⟩*

where we have integrated by parts and used the fact that f and g vanish at ±∞. So D̂ is
anti-Hermitian, but K̂ ≡ −iD̂ is Hermitian P. (Looking ahead, when we use these ideas in
quantum mechanics we will be using p̂ ≡ −iℏD̂ instead, but the constant is irrelevant just
now.)
By integrating by parts twice we can show that D̂² is Hermitian. So is K̂².
From the fact that the Hermite polynomials are the solutions (with integer n ≥ 0) of the
equation

    d²H_n/dx² − 2x dH_n/dx = −2nH_n

we can show that the set {H_n(x)e^{−x²/2}} are eigenfunctions of the Hermitian operator K̂² + X̂²
with eigenvalues 2n + 1 P. As expected, the eigenvalues of the Hermitian operator are real and
the eigenvectors (eigenfunctions) are orthogonal. In this basis, K̂² + X̂² is represented by an
infinite-dimensional diagonal square matrix with matrix elements ⟨m|K̂² + X̂²|n⟩ = (2n + 1)δ_mn.
The operators X̂ and D̂ do not commute: [X̂, D̂] ≠ 0. If we consider an arbitrary function
f(x), then

    [X̂, D̂]|f⟩ ≡ X̂D̂|f⟩ − D̂X̂|f⟩ ⟶ x df/dx − d(xf)/dx = −f(x)   ⇒   [X̂, D̂] = −1.

Equivalently, [X̂, K̂] = i.
If Q(x) is a polynomial in x, and dQ/dx = R(x), we can also write down the operators
Q̂ = Q(X̂) and R̂ = R(X̂). Then P [Q̂, X̂] = 0 and [Q̂, K̂] = iR̂.
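The commutator calculation is easy to reproduce symbolically; the sketch below (an added
illustration) applies X̂D̂ − D̂X̂ to an arbitrary function with sympy and recovers [X̂, D̂] = −1:

    import sympy as sp

    x = sp.symbols('x')
    f = sp.Function('f')(x)

    XD = x * sp.diff(f, x)            # X D |f> : differentiate, then multiply by x
    DX = sp.diff(x * f, x)            # D X |f> : differentiate x f(x)

    print(sp.simplify(XD - DX))       # -f(x), i.e. [X, D] = -1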

2.3 Eigenstates of X̂ and the x-representation


Shankar pp 57-70

Let us define eigenkets of X̂, denoted |x_0⟩, such that X̂|x_0⟩ = x_0|x_0⟩, where x_0 is a real number.
Clearly x_0 can take any value at all, so there are uncountably many such kets, including |2.5⟩,
|−53.34⟩, |√2⟩, |π⟩ .... Often we don't want to specify the value but keep it general, giving
X̂|x⟩ = x|x⟩ for any x. Different eigenkets are orthogonal: ⟨x|x′⟩ = 0 if x ≠ x′. The set {|x⟩}
is called the x-basis. The completeness relation for the identity now involves a sum over all
these states, but a sum over a continuous variable is an integral, so we have

    ∫_{−∞}^∞ |x⟩⟨x| dx = Î

We will often use x′ or even x″ as the variable of integration.


Now consider ⟨x|f⟩. This is the x component of an abstract vector |f⟩, which is a complex
number that varies with x, i.e. a function of x which we can call f(x). So if we haven't already
specified the type of object that |f⟩ is, the x-basis gives us a way of associating a function f(x)
with it. In this way of looking at things, f(x) is the representation of |f⟩ in the x-basis:
|f⟩ ⟶_x f(x).
It follows (using the expression above for the identity operator) that

    |f⟩ = ∫_{−∞}^∞ |x⟩⟨x|f⟩ dx = ∫_{−∞}^∞ f(x)|x⟩ dx;    ⟨f|g⟩ = ∫_{−∞}^∞ ⟨f|x⟩⟨x|g⟩ dx = ∫_{−∞}^∞ f*(x)g(x) dx;

with the latter equation giving the expected definition of the inner product for functions.

with the latter equation giving the expected definition of the inner product for functions.
But what is the function associated with the ket |x′⟩, i.e. ⟨x|x′⟩? We already know that it is a
rather strange object: somehow it only "knows about" the specific point x = x′. Consider the
following:

    f(x) = ⟨x|f⟩ = ⟨x| (∫_{−∞}^∞ |x′⟩⟨x′| dx′) |f⟩ = ∫_{−∞}^∞ ⟨x|x′⟩ f(x′) dx′

We should recognise this type of expression: for this to work, we must have ⟨x|x′⟩ = δ(x − x′),
the Dirac delta function. The delta function is real and symmetric, δ(x − x′) = δ(x′ − x), so as
required ⟨x′|x⟩ = ⟨x|x′⟩*. This fixes the normalisation of the kets |x⟩, and it is different from
⟨n|n⟩ = 1, which is appropriate for a countable (discrete) basis.
The matrix elements of any operator Â in the x-basis are ⟨x|Â|x′⟩. This is obviously a function
of two variables. However many operators vanish unless x = x′; these are called local. An
example is X̂ itself: ⟨x|X̂|x′⟩ = x′δ(x − x′) (or equivalently xδ(x − x′)).
Finally let us consider D̂. We want the representation of D̂|f⟩ to be |df/dx⟩, i.e.

    ⟨x|D̂|f⟩ = df/dx = (d/dx)⟨x|f⟩.

Then

    ⟨x|D̂|x′⟩ = (d/dx)⟨x|x′⟩ = (d/dx)δ(x − x′).

Note that, as expected since D̂ is anti-Hermitian, (d/dx)δ(x − x′) = −(d/dx′)δ(x − x′).
If the delta function is weird, its derivative is even weirder. But remember, both only have
meaning within an integral (technically speaking, they are distributions rather than func-
tions). So

    ⟨x|D̂|f⟩ = ∫_{−∞}^∞ ⟨x|D̂|x′⟩⟨x′|f⟩ dx′ = ∫_{−∞}^∞ (dδ(x − x′)/dx) f(x′) dx′
             = −∫_{−∞}^∞ (dδ(x − x′)/dx′) f(x′) dx′ = ∫_{−∞}^∞ δ(x − x′) (df/dx′) dx′ = df/dx

as expected.
Note D̂ is also local.
For local operators, it is very common to drop the delta function and just write, say, X̂ ⟶_x x,
D̂ ⟶_x d/dx, K̂ ⟶_x −i d/dx, and we will use this freely in the future.

2.4 Eigenstates of K̂ and the k-representation


Shankar pp 136-137

Recall that we have defined K̂ = −iD̂ to get a Hermitian operator.
We denote eigenkets of K̂ as |k_0⟩, with K̂|k_0⟩ = k_0|k_0⟩ for some specific value k_0, or more
generally K̂|k⟩ = k|k⟩ if we don't want to specify the value. Since K̂ is Hermitian, allowed
values of k must be real.
What is the functional form of |k⟩, i.e. ⟨x|k⟩? It turns out to be confusing to call this k(x), so we
will call it φ_k(x). In the x-basis, the eigenvalue equation is

    ⟨x|K̂|k⟩ = k⟨x|k⟩ = k φ_k(x)

but also, from the x-representation of the operator K̂,

    ⟨x|K̂|k⟩ = −i dφ_k/dx.

Equating the two right-hand sides, we have a familiar differential equation for φ_k(x), whose
solution is

    ⟨x|k⟩ ≡ φ_k(x) = √(1/2π) e^{ikx},

where the choice of normalisation will be justified shortly.
Two states of different k must be orthogonal. In fact

    ⟨k|k′⟩ = ∫_{−∞}^∞ ⟨k|x⟩⟨x|k′⟩ dx = (1/2π) ∫_{−∞}^∞ e^{−ikx} e^{ik′x} dx = (1/2π) ∫_{−∞}^∞ e^{i(k′−k)x} dx = δ(k − k′)

and this justifies the choice of normalisation.


This gives us another version of the identity operator:

    ∫_{−∞}^∞ |k⟩⟨k| dk = Î.

By the same argument as used above, for some arbitrary ket |f⟩, ⟨k|f⟩ is a function of k, which
we will call F(k). Then

    F(k) ≡ ⟨k|f⟩ = ∫_{−∞}^∞ ⟨k|x⟩⟨x|f⟩ dx = √(1/2π) ∫_{−∞}^∞ e^{−ikx} f(x) dx.

Thus F(k) is the Fourier transform of f(x). Both are representations of the same abstract ket
|f⟩.
Similarly, we can show that

    f(x) = √(1/2π) ∫_{−∞}^∞ e^{ikx} F(k) dk,

which is the inverse Fourier transform.


Note that

    ∫_{−∞}^∞ f*(x)g(x) dx = ⟨f|g⟩ = ∫_{−∞}^∞ ⟨f|k⟩⟨k|g⟩ dk = ∫_{−∞}^∞ F*(k)G(k) dk

which is Parseval's theorem. So if f(x) is square-integrable, so is F(k), and if one is normalised
so is the other. The k-basis then maps vectors into an alternative Hilbert space, that of complex
square-integrable functions of the real variable k.
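This is easy to verify for a concrete square-integrable function; the sketch below (an added
illustration, using a Gaussian f(x) = e^{−x²}) computes the Fourier transform by quadrature and
checks that ⟨f|f⟩ = ⟨F|F⟩:

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: np.exp(-x**2)                 # a square-integrable function

    def F(k):
        """F(k) = (1/sqrt(2 pi)) * integral of exp(-ikx) f(x) dx (real by symmetry)."""
        re, _ = quad(lambda x: np.cos(k * x) * f(x), -np.inf, np.inf)
        return re / np.sqrt(2 * np.pi)

    norm_x, _ = quad(lambda x: f(x)**2, -np.inf, np.inf)
    ks = np.linspace(-10, 10, 401)
    norm_k = np.trapz([F(k)**2 for k in ks], ks)
    print(norm_x, norm_k)                       # both sqrt(pi/2) ≈ 1.2533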
We can show that

    ⟨k|K̂|k′⟩ = kδ(k − k′)   and   ⟨k|X̂|k′⟩ = i dδ(k − k′)/dk

so both operators are local in the k-basis (or k-representation) as well, and we often write
K̂ ⟶_k k and X̂ ⟶_k i d/dk.

Note that now we have at least three possible representations of |f⟩: as an infinite list of
coefficients {f_0, f_1, f_2, ...} in a basis such as the one introduced at the start, as f(x), or as F(k).
All encode the same information about |f⟩, and it is natural to think of |f⟩ as primary, rather
than any of the representations.

2.5 Functions in 3-D space


The extension to functions of three coordinates x, y and z is straightforward. There are
operators associated with each, X̂, Ŷ and Ẑ, which commute, and corresponding differential
operators K̂_x, K̂_y and K̂_z, which also commute. Between the two sets the only non-vanishing
commutators are [X̂, K̂_x] = [Ŷ, K̂_y] = [Ẑ, K̂_z] = i.
In a more compact notation, we introduce the position operator in 3-D space, X̂, which will be
X̂e_x + Ŷe_y + Ẑe_z in a particular coordinate system, and similarly K̂. Boldface now indicates a
vector operator, i.e. a triplet of operators. (We have written the 3-D basis vectors {e_x, e_y, e_z}
instead of {i, j, k}.)
The state |x, y, z⟩ ≡ |r⟩ is an eigenstate of position, and similarly |k⟩ of K̂:

    X̂|r⟩ = (X̂e_x + Ŷe_y + Ẑe_z)|r⟩ = (x e_x + y e_y + z e_z)|r⟩ = r|r⟩
    K̂|k⟩ = (K̂_x e_x + K̂_y e_y + K̂_z e_z)|k⟩ = (k_x e_x + k_y e_y + k_z e_z)|k⟩ = k|k⟩
In position space, X̂ ⟶ r and K̂ ⟶ −i∇.
In 3-D, we have

    ⟨f|g⟩ = ∫ f*(r)g(r) d³r;    ⟨r|r′⟩ = δ(r − r′) = δ(x − x′)δ(y − y′)δ(z − z′)

The structure of this space is a direct product space: we could write |x, y, z⟩ = |x⟩ ⊗ |y⟩ ⊗ |z⟩,
and the operator X̂ as X̂ ⊗ Î ⊗ Î. We almost never do, as it is usually not helpful for problems
with spherical symmetry. But it enables us to see that the states {|m, n, p⟩} whose wave
functions are

    ⟨r|m, n, p⟩ = H_m(x)e^{−x²/2} H_n(y)e^{−y²/2} H_p(z)e^{−z²/2} = H_m(x)H_n(y)H_p(z)e^{−r²/2}

are basis functions in the space.
The generalisation of φ_k(x) is

    φ_k(r) = ⟨r|k⟩ = (1/2π)^{3/2} e^{ik·r},

which is a plane wave travelling in the direction of k.

Caveats
Though we glossed over the fact, the states |x⟩ and |k⟩ do not correspond to functions in the
Hilbert space, because they are not square integrable. It is particularly easy to see that φ_k(x),
which is a plane wave of unit magnitude everywhere, is not normalisable, and both ⟨k|k⟩ and
⟨x|x⟩ are infinite. The x- and k-representations, though, are still extremely useful because
they allow us to associate functions and their Fourier transforms with abstract vectors and vice
versa. The identity operators are particularly useful for this purpose.
In physical applications the most usual solution to this problem is to imagine the system of
interest is in a large box, and require the functions either to vanish at the boundaries, or to
be periodic. Then of course only discrete values of the wave vector k are allowed, but these
will be so finely spaced that sums over allowed values can be replaced by integrals, and any
dependence on the size of the box drops out. The density of states in statistical physics uses
these ideas.
A proper mathematical treatment of functional spaces is well beyond the scope of this course.
Griffiths, chapter 3, says a little more about which results of finite-dimensional vector spaces
can safely be carried over to infinite-dimensional ones.
The Fundamentals of Quantum Mechanics

3.1 Postulates of Quantum Mechanics

Summary: All of quantum mechanics follows from a small set


of assumptions, which cannot themselves be derived.

Shankar ch 4
Mandl ch 1
Griffiths ch 3

There is no unique formulation or even number of postulates, but all formulations I’ve seen
have the same basic content. This formulation follows Shankar most closely, though he puts III
and IV together. Nothing significant should be read into my separating them (as many other
authors do), it just seems easier to explore the consequences bit by bit.
I: The state of a particle is given by a vector |ψ(t)⟩ in a Hilbert space. The state is normalised:
⟨ψ(t)|ψ(t)⟩ = 1.
This is as opposed to the classical case where the position and momentum can be specified at
any given time.
This is a pretty abstract statement, but more informally we can say that the wave function
ψ(x, t) contains all possible information about the particle. How we extract that information
is the subject of subsequent postulates.
The really major consequence we get from this postulate is superposition, which is behind most
quantum weirdness such as the two-slit experiment.
II: There is a Hermitian operator corresponding to each observable property of the particle.
Those corresponding to position x̂ and momentum p̂ satisfy [x̂_i, p̂_j] = iℏδ_ij.
Other examples of observable properties are energy and angular momentum. The choice of
these operators may be guided by classical physics (e.g. p̂·p̂/2m for kinetic energy and x̂ × p̂
for orbital angular momentum), but ultimately is verified by experiment (e.g. Pauli matrices for
spin-½ particles).
The commutation relation for x̂ and p̂ is a formal expression of Heisenberg's uncertainty prin-
ciple.
III: Measurement of the observable associated with the operator Ω̂ will result in one of the
eigenvalues ω_i of Ω̂. Immediately after the measurement the particle will be in the corresponding
eigenstate |ω_i⟩.
This postulate ensures reproducibility of measurements. If the particle was not initially in the
state |ωi i the result of the measurement was not predictable in advance, but for the result of
a measurement to be meaningful the result of a subsequent measurement must be predictable.
(“Immediately” reflects the fact that subsequent time evolution of the system will change the
value of ω unless it is a constant of the motion.)
IV: The probability of obtaining the result ωi in the above measurement (at time t0 ) is
|hωi |ψ(t0 )i|2 .
If a particle (or an ensemble of particles) is repeatedly prepared in the same initial state |ψ(t0 )i
and the measurement is performed, the result each time will in general be different (assuming
this state is not an eigenstate of Ω̂; if it is the result will be the corresponding ωi each time).
Only the distribution of results can be predicted. The postulate expressed this way has the same
content as saying that the average value of ω is given by hψ(t0 )|Ω̂|ψ(t0 )i. (Note the distinction
between repeated measurements on freshly-prepared particles, and repeated measurements on
the same particle which will give the same ωi each subsequent time.)
Note that if we expand the state in the (orthonormal) basis {|ωi i}, |ψ(t0 )i = Σi ci |ωi i, the probability of obtaining the result ωi is |ci |², and hΩ̂i = Σi |ci |² ωi .

V: The time evolution of the state |ψ(t)i is given by i~ d/dt |ψ(t)i = Ĥ|ψ(t)i, where Ĥ is the operator corresponding to the classical Hamiltonian.
In most cases the Hamiltonian is just the energy and is expressed as p̂ · p̂/2m + V (x̂). (The two differ in some cases though; see texts on classical mechanics such as Kibble and Berkshire.)
In the presence of non-conservative forces such as magnetism the Hamiltonian is still equal to
the energy, but its expression in terms of p̂ is more complicated.
VI: The Hilbert space for a system of two or more particles is a product space.
This is true whether the particles interact or not, ie if the states |φi i span the space for one
particle, the states |φi i ⊗ |φj i will span the space for two particles. If they do interact though,
the eigenstates of the Hamiltonian will not just be simple products of that form, but will be
linear superpositions of such states.

From the ket to the wavefunction
We have already met the position operator, which we previously called X̂, and its eigenkets |ri.
The wave function of a particle is therefore given by ψ(r, t) = hr|ψ(t)i. Note that position and
time are treated quite differently in non-relativistic quantum mechanics. There is no operator
corresponding to time, and t is just part of the label of the state: ψ(r, t) = hr|ψ(t)i. By the 4th
postulate, the probability of finding the particle in an infinitesimal volume dV at a position r,
ρ(r)dV , is given by ρ(r, t) = |hr|ψ(t)i|2 = |ψ(r, t)|2 . Thus a measurement of position can yield
many answers, and as well as an average x-position hψ|x̂|ψi there will be an uncertainty, ∆x,
where ∆x2 = hψ|x̂2 |ψi − hψ|x̂|ψi2 .
Since we need a momentum operator which (in 1-D) obeys [x̂, p̂] = i~, we have p̂ = ~K̂, with
the representation in the position-basis of −i~∇. (We will use small letters, x̂ and p̂ from now
onwards, to agree with the vast majority of textbooks). The commutators can all be expressed
as
[x̂i , x̂j ] = [p̂i , p̂j ] = 0; [x̂i , p̂j ] = i~δij .
where i, j ∈ {x, y, z}. In position space, p̂ −→ −i~∇.

Though the notation hψ|Â|φi is compact, if Â is a function of position and momentum operators, to calculate it we will usually immediately substitute the integral form ∫ ψ∗(r) Â φ(r) d3 r, integrated over all space.
Eigenstates of p̂ are the same as those of K̂, so we can equally write them as |pi with eigenvalue
p = ~k. However if we want to write
Iˆ = ∫ |pihp| d3 p, and hp|p0 i = δ(p − p0 ) = δ(px − p0x )δ(py − p0y )δ(pz − p0z ),

then the normalisation of the states has to change, since d3 p = ~3 d3 k. (Note that the dimen-
sions of a delta function are the inverse of those of its argument.) Thus
φp (r) ≡ hr|pi = (2π~)^(−3/2) e^(ip·r/~) .
From the time evolution equation i~ d/dt |ψ(t)i = Ĥ|ψ(t)i we obtain in the x-basis

i~ ∂ψ(r, t)/∂t = Hψ(r, t),
which is the Schrödinger equation. Here H is the x-representation of Ĥ, usually −~2 ∇2 /2m +
V (r). (It is common to use Ĥ for this as well).
Together with the probability density, ρ(r) = |ψ(r)|², we also have a probability flux

j(r) = −(i~/2m) (ψ∗(r)∇ψ(r) − ψ(r)∇ψ∗(r)).
The continuity equation ∇ · j = −∂ρ/∂t which ensures local conservation of probability density
follows from the Schrödinger equation.
A two-particle state has a wave function which is a function of the two positions (6 coordinates),
Φ(r1 , r2 ), and the basis kets are direct product states |r1 i ⊗ |r2 i. For states of non-interacting
distinguishable particles where it is possible to say that the first particle is in single-particle
state |ψi and the second in |φi, the state of the system is |ψi ⊗ |φi and the wave function is
Φ(r1 , r2 ) = (hr1 | ⊗ hr2 |)(|ψi ⊗ |φi) = hr1 |ψihr2 |φi = ψ(r1 )φ(r2 ).

The propagator or time-evolution operator
The Schrödinger equation tells us the rate of change of the state at a given time. From that
we can deduce an operator that acts on the state at time t0 to give that at a subsequent time
t: |ψ(t)i = Û (t, t0 )|ψ(t0 )i, which is called the propagator or time-evolution operator. We need
the identity
lim_{N→∞} (1 + x/N)^N = e^x
(to prove it, take the log of the L.H.S. and use the Taylor expansion for ln(1 + x) about the
point x = 0).
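A quick numerical illustration of this identity (a minimal sketch, not from the notes; the value x = 1.7 is arbitrary):

import numpy as np

# (1 + x/N)^N approaches e^x as N grows
x = 1.7
for N in (10, 1000, 100000):
    print(N, (1 + x/N)**N)
print("e^x =", np.exp(x))     # about 5.474

The same limit, with the number x replaced by the matrix −iĤ(t−t0 )/~, is what defines the operator exponential used below.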
An infinitesimal time step Û (t+dt, t) follows immediately from the Schrödinger equation:

i~ d/dt |ψ(t)i = Ĥ|ψ(t)i ⇒ |ψ(t+dt)i − |ψ(t)i = −(i/~)Ĥ dt |ψ(t)i
⇒ |ψ(t+dt)i = (1 − (i/~)Ĥ dt) |ψ(t)i.
For a finite time interval t − t0 , we break it into N small steps and take the limit N → ∞, in which limit every step is infinitesimal and we can use the previous result N times:

|ψ(t)i = lim_{N→∞} (1 − (i/~)Ĥ(t−t0 )/N)^N |ψ(t0 )i = e^(−iĤ(t−t0 )/~) |ψ(t0 )i ≡ Û (t, t0 )|ψ(t0 )i

We note that this is a unitary operator (the exponential of i times a Hermitian operator always is). Thus, importantly, it conserves the norm of the state; there remains a unit probability of finding the particle somewhere!
If |ψ(t0 )i is an eigenfunction |ni of the Hamiltonian with energy En ,
|ψ(t)i = Û (t, t0 )|ni = e−iEn (t−t0 )/~ |ni.
If we are able to decompose |ψ(t0 )i as a sum of such terms, |ψ(t0 )i = Σn cn |ni, then

|ψ(t)i = Σn cn e^(−iEn (t−t0 )/~) |ni;

each term evolves with a different phase and non-trivial time evolution takes place. Note that this implies an alternative form for the propagator:

Û (t, t0 ) = Σn e^(−iEn (t−t0 )/~) |nihn|.
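This spectral form is easy to check numerically. The sketch below is an illustration only: the 4 × 4 Hermitian matrix is randomly generated, standing in for a Hamiltonian, with ~ = 1. It compares the matrix exponential with the sum over eigenstates, and confirms unitarity:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j*rng.normal(size=(4, 4))
H = (A + A.conj().T)/2                    # a random Hermitian matrix standing in for H

t = 0.7
U = expm(-1j*H*t)                         # U(t,0) = exp(-iHt), hbar = 1

E, V = np.linalg.eigh(H)                  # columns of V are the eigenvectors |n>
U_spec = sum(np.exp(-1j*En*t)*np.outer(V[:, n], V[:, n].conj())
             for n, En in enumerate(E))   # sum_n e^{-iE_n t} |n><n|

print(np.allclose(U, U_spec))                     # True
print(np.allclose(U.conj().T @ U, np.eye(4)))     # unitary, so the norm is conserved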

(Aside: If the Hamiltonian depends explicitly on time, we have

Û (t, t0 ) = T exp(−(i/~) ∫_{t0}^{t} Ĥ(t0 ) dt0 ),

where the time-ordered exponential denoted by T exp means that in expanding the exponential, the operators are ordered so that Ĥ(t1 ) always sits to the right of Ĥ(t2 ) (so that it acts first) if t1 < t2 . This will come up in Advanced Quantum Mechanics.)

3.2 Simple examples

3.2.1 Two-state system


Let us introduce a toy system with which to explore some of the ideas from the postulates.
Consider a quantum system in which the states belong to a two-dimensional, rather than
infinite-dimensional, vector space, spanned by the two orthonormal states {|a+i, |a−i} (nota-
tion to be explained shortly). We will need two operators in this space, Â and B̂, and in this
representation (matrices written as rows separated by semicolons)

|a+i −→ [1; 0],   |a−i −→ [0; 1],   Â −→ [1 0; 0 −1],   B̂ −→ [0 1; 1 0].

So |a+i and |a−i are eigenstates of Â with eigenvalues ±1 respectively. The eigenkets of B̂ are

|b±i = √(1/2)(|a+i ± |a−i) −→ √(1/2) [1; ±1]

with eigenvalues ±1.
Measurement

The flow-chart below represents an arbitrary series of measurements on a particle (or series of identically prepared particles) in an unknown initial state. We carry out consecutive measurements “immediately”, that is, quickly compared with the timescale which characterises the evolution of the system in between measurements. We will talk of “measuring A” when we strictly mean “measuring the physical quantity associated with the operator Â”.

[Flow chart: measure A (outcomes a = ±1); after the a = +1 outcome, measure B (outcomes b = ±1); after the b = −1 outcome, measure B again (outcome b = −1); finally measure A again.]

A priori, the possible outcomes on measuring A are the eigenvalues of Â, ±1. In general the particle will not start out in an eigenstate of Â, so either outcome is possible, with probabilities that depend on the initial state.
If we obtain the outcome a = +1 and then measure B, what can we get? We know that the state is now no longer the unknown initial state but |a+i. The possible outcomes are b = +1 with probability |hb+|a+i|² and b = −1 with probability |hb−|a+i|². Both of these probabilities are 1/2: there is a 50:50 chance of getting b = ±1. (Note that the difference between this and the previous measurement of A, where we did not know the probabilities, is that now we know the state before the measurement.)
If we obtain the outcome b = −1 and then measure B again immediately, we can only get b = −1 again. (This is reproducibility.) The particle is in the state |b−i before the measurement, an eigenstate of B̂.
Finally we measure A again. What are the possible outcomes and their probabilities?
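The whole chain can be checked numerically. A minimal sketch, using the column-vector representations above (the projection rule of postulates III and IV; not part of the notes):

import numpy as np

a_plus, a_minus = np.array([1., 0.]), np.array([0., 1.])
b_plus  = np.array([1., 1.])/np.sqrt(2)
b_minus = np.array([1., -1.])/np.sqrt(2)

state = a_plus                                           # after the result a = +1
print(abs(b_plus @ state)**2, abs(b_minus @ state)**2)   # 0.5, 0.5

state = b_minus                                          # collapse after the result b = -1
print(abs(b_minus @ state)**2)                           # 1.0: measuring B again reproduces b = -1
print(abs(a_plus @ state)**2, abs(a_minus @ state)**2)   # probabilities for the final A measurement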

Propagation

First let us consider the time-evolution of this system if the Hamiltonian is Ĥ = ~γ B̂. Assume
we start the evolution at t = 0 with the system in the state |ψ(0)i. Then |ψ(t)i = Û (t, 0)|ψ(0)i
with Û (t, 0) = e^(−iĤt/~) . Now in general the exponentiation of an operator can't be found in closed form, but in this case it can, because B̂² = Iˆ and so B̂³ = B̂. So in the power series that defines the exponential, successive terms will be alternately proportional to B̂ and Iˆ:

Û (t, 0) = e^(−iγtB̂) = Iˆ − iγtB̂ − (1/2)γ²t²B̂² + i(1/3!)γ³t³B̂³ + . . .
= (1 − (γt)²/2 + (γt)⁴/4! − . . .) Iˆ − i (γt − (γt)³/3! + (γt)⁵/5! − . . .) B̂
= cos γt Iˆ − i sin γt B̂ −→ [cos γt, −i sin γt; −i sin γt, cos γt]

So if we start, say, with |ψ(0)i = |b+i, an eigenstate of B̂, as expected we stay in the same
state: |ψ(t)i = Û (t, 0)|b+i = e−iγt |b+i. All that happens is a change of phase. But if we start
with |ψ(0)i = |a+i,
|ψ(t)i = cos γt|a+i − i sin γt|a−i.
Of course we can rewrite this as

|ψ(t)i = √(1/2) (e^(−iγt) |b+i + e^(iγt) |b−i)

as expected. The expectation value of  is not constant: hψ(t)|Â|ψ(t)i = cos 2γt. The system
oscillates between |a+i and |a−i with a frequency 2γ. (This is twice as fast as you might
think—but after time π/γ the state of the system is −|a+i, which is not distinguishable from
|a+i.)
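Both the closed form for Û (t, 0) and the oscillation of hÂi are easy to verify numerically (a sketch, with γ and t chosen arbitrarily and ~ = 1):

import numpy as np
from scipy.linalg import expm

A = np.diag([1., -1.])
B = np.array([[0., 1.], [1., 0.]])
gamma, t = 1.3, 0.4

U = expm(-1j*gamma*t*B)                                      # numerical exponentiation
U_closed = np.cos(gamma*t)*np.eye(2) - 1j*np.sin(gamma*t)*B  # the closed form above
print(np.allclose(U, U_closed))                              # True

psi = U @ np.array([1., 0.])                                 # start in |a+>
print(np.isclose((psi.conj() @ A @ psi).real, np.cos(2*gamma*t)))  # <A> = cos 2(gamma)t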

3.2.2 Propagator in free space


One case where the propagator can be calculated even in function space is the case of a free
particle, in which the Hamiltonian is Ĥ = p̂2 /2m. We want to be able to find ψ(r, t) given
ψ(r, 0), using

ψ(r, t) = hr|ψ(t)i = hr|Û (t, 0)|ψ(0)i = ∫ hr|Û (t, 0)|r0 i ψ(r0 , 0) d3 r0 .

The object U (r, r0 ; t, 0) ≡ hr|Û (t, 0)|r0 i is the position-space matrix element of the propagator. (Some texts call this the propagator, referring to Û only as the time-evolution operator.) This is the probability amplitude for finding the particle at position r at time t, given that at time 0 it was at r0 . To calculate it we will use the fact that momentum eigenstates |pi are eigenstates of Ĥ:
hr|Û (t, 0)|r0 i = ∫∫ hr|pihp|Û (t, 0)|p0 ihp0 |r0 i d3 p d3 p0
= ∫∫ hr|pihp| exp(−ip̂²t/2m~) |p0 ihp0 |r0 i d3 p d3 p0
= (1/(2π~)³) ∫∫ exp(ip · r/~) exp(−ip²t/2m~) δ(p − p0 ) exp(−ip0 · r0 /~) d3 p d3 p0
= (1/(2π~)³) ∫ exp(−ip²t/2m~ + ip · (r − r0 )/~) d3 p
= (m/2iπ~t)^(3/2) exp(im|r − r0 |²/2~t)

In the last stage, to do the three Gaussian integrals (dpx dpy dpz ) we “completed the square”, shifted the variables and used the standard result ∫ e^(−αx²) dx = √(π/α), which is valid even if α is imaginary.
Suppose the initial wave function is a spherically symmetric Gaussian wave packet with width
∆:
ψ(r, 0) = N exp(−|r|2 /(2∆2 )) with N = (π∆2 )−3/4 .
Then the (pretty ghastly) Gaussian integrals give

ψ(r, t) = N (m/2iπ~t)^(3/2) ∫ exp(im|r − r0 |²/2~t) exp(−|r0 |²/2∆²) d3 r0
= N 0 exp(−|r|²/(2∆²(1 + i~t/m∆²)))
where N 0 does preserve the normalisation but we do not display it. This is an odd-looking
function, but the probability density is more revealing:

P (r, t) = |ψ(r, t)|² = π^(−3/2) (∆² + (~t/m∆)²)^(−3/2) exp(−|r|²/(∆² + (~t/m∆)²));
this is a Gaussian wavepacket with width ∆(t) = √(∆² + (~t/m∆)²). The narrower the initial wavepacket (in position space), the faster the subsequent spread, which makes sense as the momentum-space wave function will be wide, built up of high-momentum components. On the other hand for a massive particle with ∆ not too small, the spread will be slow. For m = 1 g and ∆(0) = 1 µm, it would take longer than the age of the universe for ∆(t) to double.
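That last claim is a one-line estimate: ∆(t) = 2∆ requires ~t/(m∆) = √3 ∆, i.e. t = √3 m∆²/~. A sketch of the arithmetic:

import numpy as np

hbar = 1.054571817e-34          # J s
m, Delta = 1e-3, 1e-6           # 1 g and 1 micron, in SI units

t_double = np.sqrt(3)*m*Delta**2/hbar
print(t_double)                 # ~1.6e19 s
print(t_double/3.156e7)         # ~5e11 years, against ~1.4e10 years for the universe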
3.3 Ehrenfest’s Theorem and the Classical Limit

Summary: The form of classical mechanics which inspired Heisenberg's formulation of quantum mechanics allows us to see when particles should behave classically.

Shankar ch 2.7, ch 6; Mandl ch 3.2, (Griffiths ch 3.5.3)

Using i~ d/dt |ψ(t)i = Ĥ|ψ(t)i and hence −i~ d/dt hψ(t)| = hψ(t)|Ĥ, and writing hΩ̂i ≡ hψ(t)|Ω̂|ψ(t)i, we have Ehrenfest's Theorem

d/dt hΩ̂i = (1/i~) h[Ω̂, Ĥ]i + h∂ Ω̂/∂ti
The second term disappears if Ω̂ is a time-independent operator (like momentum, spin...).
Note we are distinguishing between intrinsic time-dependence of an operator, and the time-
dependence of its expectation value in a given state.
This is very reminiscent of a result which follows from Hamilton’s equations in classical me-
chanics, for a function Ω(p, x, t) of position, momentum (and possibly time explicitly)
d/dt Ω(p, x, t) = (∂Ω/∂x)(dx/dt) + (∂Ω/∂p)(dp/dt) + ∂Ω/∂t
= (∂Ω/∂x)(∂H/∂p) − (∂Ω/∂p)(∂H/∂x) + ∂Ω/∂t
≡ {Ω, H} + ∂Ω/∂t
where the notation {Ω, H} is called the Poisson bracket of Ω and H, and is simply defined in
terms of the expression on the line above which it replaced. (For Ω = x and Ω = p we can in
fact recover Hamilton’s equations for ṗ and ẋ from this more general expression.)
In fact for Ĥ = p̂²/2m + V (x̂), we can further show that

d/dt hx̂i = hp̂/mi and d/dt hp̂i = −hdV (x̂)/dx̂i
which looks very close to Newton’s laws. Note though that hdV (x̂)/dx̂i 6= dhV (x̂)i/dhx̂i in
general.
This correspondence is not just a coincidence, in the sense that Heisenberg was influenced by it
in coming up with his formulation of quantum mechanics. It confirms that it is the expectation
value of an operator, rather than the operator itself, which is closer to the classical concept of
the time evolution of some quantity as a particle moves along a trajectory.
A further similarity is that in both quantum and classical mechanics, anything that commutes
with the Hamiltonian (vanishing Poisson bracket in the latter case) is a constant of the motion.
Examples are momentum for a free particle and angular momentum for a particle in a spherically
symmetric potential.
In the QM case, we further see that even if [Ω̂, Ĥ] ≠ 0, if the system is in an eigenstate of Ĥ the expectation value of Ω̂ will not change with time. That's why the eigenstates of the Hamiltonian are also called stationary states.
Similarity of formalism is not the same as identity of concepts though. Ehrenfest's Theorem does not say that the expectation value of a quantity follows a classical trajectory in general. What it does ensure is that if the uncertainty in the quantity is sufficiently small, in other words if ∆x and ∆p are both small (in relative terms), then the quantum motion will approximate the classical path. Of course because of the uncertainty principle, if ∆x is small then ∆p is large, and it can only be relatively small if p itself is really large—ie if the particle's mass is macroscopic. More specifically, we can say that we will be in the classical regime if the de Broglie wavelength is much less than the (experimental) uncertainty in x. (In the Stern-Gerlach experiment the atoms are heavy enough that (for a given component of their magnetic moment) they follow approximately classical trajectories through the inhomogeneous magnetic field.)

3.4 The Harmonic Oscillator Without Tears

Summary: Operator methods lead to a new way of viewing the harmonic oscillator in which quanta of energy are primary.

Shankar pp 202-231, Mandl ch 12.5, Griffiths ch 2.3.1

We are concerned with a particle of mass m in a harmonic oscillator potential (1/2)kx² ≡ (1/2)mω²x², where ω is the classical frequency of oscillation. The Hamiltonian is

Ĥ = p̂²/2m + (1/2)mω²x̂²

and we are going to temporarily forget that we know what the energy levels and wavefunctions are. Before we do, though, we note that if we define the length x0 = √(~/mω), and rescale x̂ → x0 X̂ and p̂ → ~K̂/x0 , in terms of the new operators Ĥ = (1/2)~ω(K̂² + X̂²), the eigenstates of which we have already considered.
If we define

â = (1/√2)(x̂/x0 + i(x0 /~)p̂) and â† = (1/√2)(x̂/x0 − i(x0 /~)p̂)
we can prove the following:
• x̂ = (x0 /√2)(â† + â); p̂ = (i~/√2 x0 )(â† − â)

• [x̂, p̂] = i~ ⇒ [â, ↠] = 1

• Ĥ = ~ω(↠â + 21 )

• [Ĥ, â] = −~ω â and [Ĥ, ↠] = ~ω â†

Without any prior knowledge of this system, we can derive the spectrum and the wave functions of the energy eigenstates. We start by assuming we know one normalised eigenstate of Ĥ, |ni, with energy En . Since

En = hn|Ĥ|ni = ~ωhn|â† â + 1/2|ni = ~ωhn|â† â|ni + (1/2)~ω

and also hn|â† â|ni = hân|âni ≥ 0, we see that En ≥ (1/2)~ω. There must therefore be a lowest-energy state, |0i (not the null state!).
Now consider the state â|ni. Using the commutator [Ĥ, â] above we have

Ĥ â|ni = âĤ|ni − ~ωâ|ni = (En − ~ω)â|ni,

so â|ni is another eigenstate with energy En − ~ω. A similar calculation shows that ↠|ni is
another eigenstate with energy En + ~ω. So starting with |ni it seems that we can generate an
infinite tower of states with energies higher and lower by multiples of ~ω.
However this contradicts the finding that there is a lowest energy state, |0i. Looking more
closely at the argument, though, we see there is a get-out: either â|ni is another energy eigen-
state or it vanishes. Hence â|0i = 0 (where 0 is the null state or vacuum).
The energy of this ground state is E0 = h0|Ĥ|0i = (1/2)~ω. The energy of the state |ni, the nth excited state, obtained by n applications of â† , is therefore (n + 1/2)~ω. Thus

Ĥ|ni ≡ ~ω(â† â + 1/2)|ni = (n + 1/2)~ω|ni

and it follows that â† â is a “number operator”, with â† â|ni = n|ni. The number in question is the number of the excited state (n = 1—first excited state, etc) but also the number of quanta of energy in the oscillator.
Up to a phase, which we chose to be zero, the normalisations of the states |ni are:

â|ni = √n |n−1i and â† |ni = √(n + 1) |n+1i.

As a result we have

|ni = ((â† )ⁿ/√(n!)) |0i.

The operators â† and â are called “raising” and “lowering” operators, or collectively “ladder” operators.
We can also obtain the wavefunctions in this approach. Writing φ0 (x) ≡ hx|0i, from hx|â|0i = 0 we obtain dφ0 /dx = −(x/x0²)φ0 and hence

φ0 = (πx0²)^(−1/4) e^(−x²/2x0²)

(where the normalisation has to be determined separately). This is a much easier differential
equation to solve than the one which comes direct from the Schrödinger equation!
The wave function for the n-th state is

φn (x) = (1/√(2ⁿ n!)) (x/x0 − x0 d/dx)ⁿ φ0 (x) = (1/√(2ⁿ n!)) Hn (x/x0 ) φ0 (x)

where here the definition of the Hermite polynomials is Hn (z) = e^(z²/2) (z − d/dz)ⁿ e^(−z²/2). The equivalence of this formulation and the Schrödinger-equation-based approach means that Hermite polynomials defined this way are indeed solutions of Hermite's equation.
This framework makes almost trivial many calculations which would be very hard in the traditional framework; in particular matrix elements of powers of x̂ and p̂ between general states can be easily found by using x̂ = (â + â† )(x0 /√2) and p̂ = i(â† − â)(~/√2 x0 ). For example, hm|x̂ⁿ |m0 i and hm|p̂ⁿ |m0 i will vanish unless |m − m0 | ≤ n and |m − m0 | and n are either both even or both odd (the last condition being a manifestation of parity, since φn (x) is odd/even if n is odd/even).
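These statements are easy to check in a truncated matrix representation (a sketch; the basis is cut off at N states, so identities involving the top state are spoiled by the truncation):

import numpy as np

N = 8
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # a|n> = sqrt(n)|n-1>
ad = a.T                                      # a-dagger (real matrix, so just transpose)

H = ad @ a + 0.5*np.eye(N)                    # H in units of hbar*omega
print(np.diag(H))                             # [0.5 1.5 2.5 ...]

comm = a @ ad - ad @ a                        # equals 1 except in the last row (truncation)
print(np.allclose(comm[:-1, :-1], np.eye(N-1)))   # True

x = (a + ad)/np.sqrt(2)                       # x-hat in units of x0
print(np.round((x @ x)[0], 3))                # <0|x^2|m'> nonzero only for m' = 0 and 2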
For a particle in a two-dimensional potential (1/2)mωx²x² + (1/2)mωy²y², the Hamiltonian is separable: Ĥ = Ĥx + Ĥy . Defining x0 = √(~/mωx ) and y0 = √(~/mωy ), ladder operators âx and âx† can be constructed from x̂ and p̂x as above, and we can construct a second set of operators ây and ây† from ŷ and p̂y (using y0 as the scale factor) in the same way. It is clear that âx and âx† commute with ây and ây† , and each of Ĥx and Ĥy independently has a set of eigenstates just like the ones discussed above.
In fact the space of solutions to the two-dimensional problem can be thought of as a tensor direct product space of the x and y spaces, with energy eigenstates |nx i ⊗ |ny i, nx and ny being integers, and the Hamiltonian properly being written Ĥ = Ĥx ⊗ Iˆy + Iˆx ⊗ Ĥy , and the eigenvalues being (nx + 1/2)~ωx + (ny + 1/2)~ωy . The ground state is |0i ⊗ |0i and it is annihilated by both âx (= âx ⊗ Iˆy ) and ây (= Iˆx ⊗ ây ).
The direct product notation is clumsy though, and we often write the states as just |nx , ny i.
Then for instance
âx |nx , ny i = √nx |nx −1, ny i and ây† |nx , ny i = √(ny + 1) |nx , ny +1i.

The corresponding wave functions of the particle are given by hr|nx , ny i = hx|nx ihy|ny i:

φ0,0 (x, y) = (πx0 y0 )^(−1/2) e^(−x²/2x0²) e^(−y²/2y0²)
φnx,ny (x, y) = (1/√(2^nx nx !)) (1/√(2^ny ny !)) Hnx (x/x0 ) Hny (y/y0 ) φ0,0 (x, y)

In many cases we are interested in a symmetric potential, in which case ωx = ωy , x0 = y0 , and φ0,0 ∝ exp(−r²/x0²).
This formalism has remarkably little reference to the actual system in question—all the pa-
rameters are buried in x0 . What is highlighted instead is the number of quanta of energy in
the system, with â and ↠annihilating or creating quanta (indeed they are most frequently
termed “creation” and “annihilation” operators). Exactly the same formalism can be used in
a quantum theory of photons, where the oscillator in question is just a mode of the EM field,
and the operators create or destroy photons of the corresponding frequency.
Angular momentum

4.1 A revision of orbital angular momentum


Mandl 2.3, 2.5 , Griffiths 4.1

First, a recap. In position representation, in a spherically symmetric problem such as a particle of mass M moving in a central potential V (r) = V (|r|), we can write the wave function in a form which is separable in spherical polar coordinates: ψ(r) = R(r)Y (θ, φ). Then
−(~²/2M )∇²ψ(r) + V (r)ψ(r) = Eψ(r)
⇒ −~² [(1/sin θ)(∂/∂θ)(sin θ ∂Y/∂θ) + (1/sin²θ)(∂²Y/∂φ²)] = ~² l(l + 1)Y    (4.1)
and −(~²/2M r)(d²(rR)/dr²) + (~² l(l + 1)/2M r²)R + V (r)R = ER
where l(l + 1) is a constant of separation. The radial equation depends on the potential, and
so differs from problem to problem. However the angular equation is universal: its solutions
do not depend on the potential. It is further separable into a function of θ and one of φ with
separation constant m2 (not to be confused with the mass!); the latter has solution eimφ and m
must be an integer if the wave function is to be single valued. Finally the allowable solutions
of the θ equation are restricted to those which are finite for all θ, which is only possible if l
is an integer greater than or equal to |m|; the solutions are associated Legendre polynomials
Plm (cos θ). The combined angular solutions are called spherical harmonics Ylm (θ, φ):
Y00 (θ, φ) = √(1/4π)                    Y1±1 (θ, φ) = ∓√(3/8π) sin θ e^(±iφ)
Y10 (θ, φ) = √(3/4π) cos θ              Y2±2 (θ, φ) = √(15/32π) sin²θ e^(±2iφ)
Y2±1 (θ, φ) = ∓√(15/8π) sin θ cos θ e^(±iφ)    Y20 (θ, φ) = √(5/16π) (3 cos²θ − 1)
These are normalised and orthogonal:

∫ (Yl0 m0 )∗ Ylm dΩ = δll0 δmm0 where dΩ = sin θ dθ dφ

The physical significance of the quantum numbers l and m is not clear from this approach.
However if we look at the radial equation, we see that the potential has been effectively modified
by an extra term ~2 l(l + 1)/(2M r2 ). Recalling classical mechanics, this is reminiscent of the

centrifugal potential which enters the equation for the radial motion of an orbiting particle,
where ~2 l(l + 1) is taking the place of the (conserved) square of the angular momentum. And
indeed, if in quantum mechanics we construct the angular momentum operator

L̂ = r̂ × p̂ = (ŷ p̂z − ẑ p̂y )ex + (ẑ p̂x − x̂p̂z )ey + (x̂p̂y − ŷ p̂x )ez

then the position-space representation of L̂2 = L̂2x + L̂2y + L̂2z is indeed the differential operator
that acts on Y in equation 4.1 above. So ~2 l(l + 1) is the eigenvalue of L̂2 . Since for a system
with a central potential, L̂2 commutes with the Hamiltonian, states may be classified not only
by their energy but also by the square of their angular momentum, indicated by the quantum
number l. What about m? We can rewrite L̂z very simply in spherical polar coordinates (which privilege the z direction): L̂z = −i~ ∂/∂φ. So all the spherical harmonics are eigenstates of L̂z
with eigenvalue ~m. This means that L̂z must commute with L̂2 , something which can be
proved a little lengthily in operator form, but which is obvious in position-space representation
as L̂2 is independent of φ.
The expressions for L̂x and L̂y are rather lengthy, but can be expressed more succinctly as L̂x = (1/2)(L̂+ + L̂− ) and L̂y = (1/2i)(L̂+ − L̂− ), where L̂+ and L̂− are given below together with L̂z and L̂² for reference:

L̂+ = ~e^(iφ) (∂/∂θ + i cot θ ∂/∂φ),   L̂− = L̂+† = ~e^(−iφ) (−∂/∂θ + i cot θ ∂/∂φ)
L̂z = −i~ ∂/∂φ,   L̂² = −~² [(1/sin θ)(∂/∂θ)(sin θ ∂/∂θ) + (1/sin²θ)(∂²/∂φ²)].

We have had to choose particular coordinates in physical space (say x is east, y is north and z is
up!) to define these operators. However there is nothing special about any particular direction.
(L̂x , L̂y , L̂z ) is a vector in the sense used in classical physics; the form of its components will be
basis-dependent but its properties will not be. This is what we mean by a vector operator.
Clearly L̂x and L̂y also commute with L̂2 , but as the three components don’t commute with one
another we can only choose one to complete our set of mutually commuting operators; usually,
L̂z , as has been done with the definitions of the spherical harmonics.

4.2 General properties of angular momentum


Shankar 12.5, Griffiths 4.3, Mandl 5.2

In the case of the harmonic oscillator, we found that an approach which focused on operators
and abstract states rather than differential equations was extremely powerful. We are going
to do something similar with angular momentum, with the added incentive that we know
that orbital angular momentum is not the only possible form, we will need to include spin as
well—and that has no classical analogue or position-space description.
Consider three Hermitian operators, Jˆ1 , Jˆ2 and Jˆ3 , components of the vector operator Ĵ, about
which we will only assume one thing, their commutation relations:

[Jˆ1 , Jˆ2 ] = i~Jˆ3 , [Jˆ2 , Jˆ3 ] = i~Jˆ1 , [Jˆ3 , Jˆ1 ] = i~Jˆ2 (4.2)
or succinctly, [Jˆi , Jˆj ] = i~ Σk εijk Jˆk .⁴ It can be shown that the orbital angular momentum operators defined previously satisfy these rules, but we want to be more general, hence the new name Ĵ, and the use of indices 1-3 rather than x, y, z. Note that Ĵ has the same dimensions (units) as ~.
From these follows the fact that all three commute with Ĵ² = Jˆ1² + Jˆ2² + Jˆ3²:

[Ĵ², Jˆi ] = 0

It follows that we will in general be able to find simultaneous eigenstates of Ĵ2 and only one
of the components Jˆi . We quite arbitrarily choose Jˆ3 . We denote the normalised states |λ, µi
with eigenvalue ~2 λ of Ĵ2 and eigenvalue ~µ of Jˆ3 . (We’ve written these so that λ and µ
are dimensionless.) All we know about µ is that it is real, but recalling that for any state and Hermitian operator, hα|Â²|αi = hÂα|Âαi ≥ 0, we know in addition that λ must be non-negative.
Furthermore

~2 (λ − µ2 ) = hλ, µ|(Ĵ2 − Jˆ32 )|λ, µi = hλ, µ|(Jˆ12 + Jˆ22 )|λ, µi ≥ 0



so |µ| ≤ √λ. The magnitude of a component of a vector can't be bigger than the length of the vector!
Now let us define raising and lowering operators Jˆ± (appropriateness of the names still to be
shown):
Jˆ+ ≡ Jˆ1 + iJˆ2 ; Jˆ− ≡ Jˆ1 − iJˆ2 .
Note these are not Hermitian, but Jˆ− = Jˆ+† . These satisfy the following commutation relations:

[Jˆ+ , Jˆ− ] = 2~Jˆ3 ,   [Jˆ3 , Jˆ+ ] = ~Jˆ+ ,   [Jˆ3 , Jˆ− ] = −~Jˆ−
Ĵ² = (1/2)(Jˆ+ Jˆ− + Jˆ− Jˆ+ ) + Jˆ3² = Jˆ+ Jˆ− + Jˆ3² − ~Jˆ3 = Jˆ− Jˆ+ + Jˆ3² + ~Jˆ3    (4.3)
[Ĵ², Jˆ± ] = 0.

Since Jˆ± commute with Ĵ2 , we see that the states Jˆ± |λ, µi are also eigenstates of Ĵ2 with
eigenvalue ~2 λ.
Why the names? Consider the state Jˆ+ |λ, µi:

Jˆ3 (Jˆ+ |λ, µi) = Jˆ+ Jˆ3 |λ, µi + ~Jˆ+ |λ, µi = ~(µ + 1)(Jˆ+ |λ, µi)

So either Jˆ+ |λ, µi is another eigenstate of Jˆ3 with eigenvalue ~(µ + 1), or it is the zero vector.
Similarly either Jˆ− |λ, µi is another eigenstate of Jˆ3 with eigenvalue ~(µ − 1), or it is the zero
vector. Leaving aside for a moment the case where the raising or lowering operator annihilates
the state, we have Jˆ+ |λ, µi = Cλµ |λ, µ + 1i, where

|Cλµ |² = hλ, µ|Jˆ+† Jˆ+ |λ, µi = hλ, µ|Jˆ− Jˆ+ |λ, µi = hλ, µ|(Ĵ² − Jˆ3² − ~Jˆ3 )|λ, µi = ~²(λ − µ² − µ)

There is an undetermined phase that we can choose to be +1, so Cλµ = ~√(λ − µ² − µ).
We can repeat the process to generate more states with quantum numbers µ ± 2, µ ± 3 . . . unless
we reach states that are annihilated by the raising or lowering operators. All these states are
in the λ-subspace of Ĵ2 .
⁴ εijk is 1 if i, j, k is a cyclic permutation of 1, 2, 3, −1 if an anticyclic permutation such as 2, 1, 3, and 0 if any two indices are the same.
However we saw above that the magnitude of the eigenvalues µ of Jˆ3 must not be greater than √λ. So the process cannot go on indefinitely: there must be a maximum and minimum value µmax and µmin , such that Jˆ+ |λ, µmax i = 0 and Jˆ− |λ, µmin i = 0. Furthermore by repeated action of Jˆ− , we can get from |λ, µmax i to |λ, µmin i in an integer number of steps: µmax − µmin is an integer, call it N .
Now the expectation value of Jˆ− Jˆ+ in the state |λ, µmax i must also be zero, but as we saw above that expectation value, for general µ, is |Cλµ |² = ~²(λ − µ² − µ). Thus

λ − µmax (µmax + 1) = 0.

Similarly, considering the expectation value of Jˆ+ Jˆ− in the state |λ, µmin i gives

λ − µmin (µmin − 1) = 0.

Taking these two equations together with µmin = µmax − N , we find

µmax (µmax + 1) = (µmax − N )(µmax − N − 1) ⇒ (N + 1)(2µmax − N ) = 0 ⇒ µmax = N/2.

Hence µmax is either an integer or a half-integer, µmin = −µmax and there are 2µmax + 1 possible values of µ. Furthermore λ is restricted to values λ = (N/2)(N/2 + 1) for integer N .


Let's compare with what we found for orbital angular momentum. There we found that what we have called λ had to have the form l(l + 1) for integer l, and what we've called µ was an integer m, with −l ≤ m ≤ l. That agrees exactly with the integer case above. From now on we will use m for µ, and j for µmax ; furthermore instead of writing the state |j(j +1), mi we will use |j, mi. We refer to it as “a state with angular momentum j” but this is sloppy—if universally understood; the magnitude of the angular momentum is ~√(j(j + 1)). The component of this along any axis, though, cannot be greater than ~j.
But there is one big difference between the abstract case and the case of orbital angular momentum, and that is that j can be half-integer: 1/2, 3/2 . . .. If these cases are realised in Physics, the source of the angular momentum cannot be orbital, but something without any parallel in classical Physics.
We end this section by rewriting the relations we have already found in terms of j and m, noting m can only take one of the 2j + 1 values −j, −j + 1 . . . j − 1, j:

Ĵ²|j, mi = ~² j(j + 1)|j, mi;   Jˆz |j, mi = ~m |j, mi;
Jˆ± |j, mi = ~√(j(j + 1) − m(m ± 1)) |j, m ± 1i.    (4.4)

In the diagram below, the five cones show the possible locations of the angular momentum vector with length √6 ~ (i.e. j = 2) and z-component ~m. The x- and y-components are not fixed, but must satisfy hJˆx² + Jˆy²i = (6 − m²)~² > 0.
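Eq. (4.4) is all one needs to build explicit matrices for any j. A sketch (the function name is mine, not standard; ~ = 1; basis ordered by descending m as is usual):

import numpy as np

def jmatrices(j):
    """Jx, Jy, Jz in the |j,m> basis ordered m = j, j-1, ..., -j, from Eq. (4.4)."""
    m = np.arange(j - 1, -j - 1, -1)          # the m value of each state being raised
    cp = np.sqrt(j*(j + 1) - m*(m + 1))       # <j,m+1|J+|j,m>, Eq. (4.4) with hbar = 1
    Jp = np.diag(cp, k=1)                     # J+ sits on the superdiagonal
    Jx = (Jp + Jp.T)/2
    Jy = (Jp - Jp.T)/(2*1j)
    Jz = np.diag(np.arange(j, -j - 1, -1))
    return Jx, Jy, Jz

Jx, Jy, Jz = jmatrices(1.5)                   # works for integer or half-integer j
print(np.allclose(Jx @ Jy - Jy @ Jx, 1j*Jz))  # the defining commutator (4.2): True
J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz
print(np.allclose(J2, 1.5*2.5*np.eye(4)))     # J^2 = j(j+1) on the whole multiplet: True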
4.3 Electron spin and the Stern-Gerlach experiment
From classical physics, we know that charged systems with angular momentum have a magnetic
moment µ, which means that they experience a torque µ × B if not aligned with an external
magnetic field B, and their interaction energy with the magnetic field is −µ·B. For an electron
in a circular orbit with angular momentum L, the classical prediction is µ = −(|e|/2m)L =
−(µB /~)L, where µB = |e|~/2m is called the Bohr magneton and has dimensions of a magnetic
moment.
Since the torque is perpendicular to the angular momentum the system is like a gyroscope, and
(classically) the direction of the magnetic moment precesses about B, with Lz being unchanged.
If the field is not uniform, though, there will also be a net force causing the whole atom to
move so as to reduce its energy −µ · B; taking the magnetic field along the +ve z axis the atom
will move to regions of stronger field if µz > 0 but to weaker field regions if µz < 0. If a beam
of atoms enters a region of inhomogeneous magnetic field one (classically) expects the beam to
spread out, each atom having a random value of µz and so being deflected a different amount.
The Stern-Gerlach experiment, in 1922, aimed to test whether silver atoms have a magnetic moment, and found that they do. The figure below (from Wikipedia) shows the apparatus; the shape of the poles of the magnet ensures that the field is stronger near the upper pole than the lower one.

The first run just showed a smearing of the beam, demonstrating that there was a magnetic moment, but further running showed that atoms were actually deflected either up or down by a fixed amount, indicating that µz only had two possible values relative to the magnetic field. The deflection was what would be expected for Lz = ±~. That accorded nicely with Bohr's planetary orbits, and was taken as a confirmation of a prediction of what we now call the “old” quantum theory.
From a post-1926 perspective, however, l = 1 would give three spots (m = −1, 0, 1) not two—and anyway we now know that the electrons in silver atoms have zero net orbital magnetic moment. By that time though other considerations, particularly the so-called anomalous Zeeman splitting of spectroscopic lines in a magnetic field, had caused first Kronig then, in 1925, Goudsmit and Uhlenbeck, to suggest that electrons could have a further source of angular momentum that they called spin, which would have only two possible values (m = −1/2, +1/2) but which couples twice as strongly to a magnetic field as orbital angular momentum (gs = 2)—hence the Stern-Gerlach result (µ̂ = −(gs µB /~)Ŝ). We now know that the electron does indeed carry an intrinsic angular momentum, called spin but not mechanical in origin, which is an example of the j = 1/2 possibility that we deduced above.
Thus the full specification of the state of an electron has two parts, spatial and spin. The
vector space is a tensor direct product space of the space of square-integrable functions of which
the spatial state is a member, states like |ψr (t)i, for which hr|ψr (t)i = ψ(r, t), and spin space,
containing states |ψs (t)i, the nature of which we will explore in more detail in the next section.
While in non-relativistic QM this has to be put in by hand, it emerges naturally from the Dirac
equation, which also predicts gs = 2.
Because this product space is itself a vector space, sums of vectors are in the space, and so not all states of the system are separable (that is, they do not all have the form |ψr (t)i ⊗ |ψs (t)i). We can also have states like |ψr (t)i ⊗ |ψs (t)i + |φr (t)i ⊗ |φs (t)i. As we will see, spin-space is two dimensional (call the basis {|+i, |−i} just now), so including spin doubles the dimension of the state space; as a result we never need more than two terms, and can write

|Ψ(t)i = c1 |ψr (t)i ⊗ |+i + c2 |φr (t)i ⊗ |−i.

But this still means that the electron has two spatial wavefunctions, one for each spin state. In everything we've done so far the spin is assumed not to be affected by the dynamics, in which case we return to a single common spatial state. But that is not general.

4.4 Spin-1/2

Shankar 14, Griffiths 4.4, Mandl 5.3

Whereas with orbital angular momentum we were talking about an infinite-dimensional space which could be considered as a sum of subspaces with l = 0, 1, 2, . . ., when we talk about intrinsic angular momentum—spin—we are confined to a single subspace with fixed j. We also use Ŝ in place of Ĵ, but the operators Ŝi obey the same rules as the Jˆi . The simultaneous eigenstates of Ŝ² and Ŝz are |s, mi, but as ALL states in the space have the same s, we often drop it in the notation. In this case, s = 1/2, m = −1/2, +1/2, so the space is two-dimensional with a basis variously denoted

{|1/2, 1/2i, |1/2, −1/2i} ≡ {|1/2i, |−1/2i} ≡ {|+i, |−i} ≡ {|ẑ+i, |ẑ−i}

In the last case ẑ is a unit vector in the z-direction, so we are making it clear that these are states with spin-up (+) and spin-down (−) in the z-direction. We will also construct states with definite spin in other directions.

In this basis, the matrices representing Ŝz (which is diagonal), Ŝ+ and Ŝ− = Ŝ+† can be written down directly. Recall Jˆ+ |j, mi = ~√(j(j+1)−m(m+1)) |j, m+1i, so

Ŝ+ |1/2, −1/2i = ~√(3/4 + 1/4) |1/2, 1/2i,   Ŝ+ |1/2, 1/2i = 0,

and so h1/2, 1/2|Ŝ+ |1/2, −1/2i = ~ is the only non-vanishing matrix element of Ŝ+ . From these Ŝx = (1/2)(Ŝ+ + Ŝ− ) and Ŝy = −(1/2)i(Ŝ+ − Ŝ− ) can be constructed:
       
|ẑ+i −→(Sz) [1; 0],   |ẑ−i −→(Sz) [0; 1],   Ŝ+ −→(Sz) ~[0 1; 0 0],   Ŝ− −→(Sz) ~[0 0; 1 0],
Ŝz −→(Sz) (~/2)[1 0; 0 −1],   Ŝx −→(Sz) (~/2)[0 1; 1 0],   Ŝy −→(Sz) (~/2)[0 −i; i 0].
The label Sz on the arrows reminds us of the particular basis we are using. It is easily shown
that the matrices representing the Ŝi obey the required commutation relations.
The matrices

σx = [0 1; 1 0],   σy = [0 −i; i 0],   σz = [1 0; 0 −1]

are called the Pauli matrices. They obey σi σj = δij I + i Σk εijk σk , and (a · σ)(b · σ) = a · b I + i(a × b) · σ. Together with the identity matrix they form a basis (with real coefficients) for all Hermitian 2 × 2 matrices.
The component of Ŝ in an arbitrary direction defined by the unit vector n is Ŝ · n. We can parametrise the direction of n by the polar angles θ, φ, so n = sin θ cos φ ex + sin θ sin φ ey + cos θ ez . Then in the basis of eigenstates of Ŝz , Ŝ · n and its eigenstates are

Ŝ·n −→(Sz) (~/2)[cos θ, sin θ e^(−iφ); sin θ e^(iφ), −cos θ],
|n+i −→(Sz) [cos(θ/2) e^(−iφ/2); sin(θ/2) e^(iφ/2)],   |n−i −→(Sz) [−sin(θ/2) e^(−iφ/2); cos(θ/2) e^(iφ/2)]

Note that (from the matrix representation) (2Ŝ · n/~)² = Iˆ for any n. So

exp(i(α/~)Ŝ · n) = cos(α/2) Iˆ + i sin(α/2) (2/~)Ŝ · n.

The lack of higher powers of Ŝi follows from the point about Hermitian operators noted above. Some calculation in the matrix basis reveals the useful fact that hn±|Ŝ|n±i = ±(~/2)n; that is, the expectation value of the vector operator Ŝ is parallel or antiparallel to n.
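Both facts can be verified directly in the matrix representation (a sketch with ~ = 1 and an arbitrarily chosen direction):

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

theta, phi = 0.8, 2.1
n = np.array([np.sin(theta)*np.cos(phi), np.sin(theta)*np.sin(phi), np.cos(theta)])
Sn = 0.5*(n[0]*sx + n[1]*sy + n[2]*sz)                 # S.n with hbar = 1

nplus = np.array([np.cos(theta/2)*np.exp(-1j*phi/2),
                  np.sin(theta/2)*np.exp(1j*phi/2)])
print(np.allclose(Sn @ nplus, 0.5*nplus))              # eigenvector with eigenvalue +1/2

expS = np.array([(nplus.conj() @ (0.5*s) @ nplus).real for s in (sx, sy, sz)])
print(np.allclose(expS, 0.5*n))                        # <n+|S|n+> = (hbar/2) n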

Spin precession

The Hamiltonian of a spin-1/2 electron in a uniform magnetic field is, with gs = 2 and charge −|e|,

Ĥ = −µ · B = (gs µB /~)Ŝ · B −→(Sz) µB σ · B.

Consider the case of a field in the x-direction, so that Ĥ −→(Sz) µB σx B, and a particle initially in the state |ẑ+i. It turns out that we have already done this problem, obtaining, with ω = 2µB B/~ being the frequency corresponding to the energy splitting of the eigenstates of Ĥ,

|ψ(t)i = cos(ωt/2)|ẑ+i − i sin(ωt/2)|ẑ−i,   hψ(t)|Ŝz |ψ(t)i = (~/2) cos ωt.

To this we can now add hψ(t)|Ŝy |ψ(t)i = −(~/2) sin ωt, and hψ(t)|Ŝx |ψ(t)i = 0. So the expectation value of Ŝ is a vector of length ~/2 in the yz plane which rotates with frequency ω = 2µB B/~. This is exactly what we would get from Ehrenfest's theorem.
Alternatively, we can take the magnetic field along ẑ so that the energy eigenstates are |ẑ±i with energies ±µB B ≡ ±~ω/2. If the initial state is spin-up in an arbitrary direction n, that is |n+i, we can decompose this in terms of the energy eigenstates, each with its own energy dependence, and obtain

|ψ(t)i = cos(θ/2) e^(−i(ωt+φ)/2) |ẑ+i + sin(θ/2) e^(i(ωt+φ)/2) |ẑ−i = |n(t)+i

where n(t) is a vector which, like the original n, is oriented at an angle θ to the ẑ (i.e. B) axis, but which rotates about that axis so that the azimuthal angle changes with time: φ(t) = φ(0) + ωt. The expectation value hŜi precesses likewise, following the same behaviour as a classical magnetic moment of −gs µB .
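A sketch of the first case (field along x, with ~ = 1 and µB B = 1 so that ω = 2), checking the three expectation values at several times:

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

H = sx                      # mu_B B sigma_x with mu_B B = 1
omega = 2.0                 # 2 mu_B B / hbar

psi0 = np.array([1, 0], dtype=complex)          # |z+>
for t in np.linspace(0.0, 3.0, 7):
    psi = expm(-1j*H*t) @ psi0
    Sx, Sy, Sz = ((psi.conj() @ (0.5*s) @ psi).real for s in (sx, sy, sz))
    assert np.isclose(Sx, 0.0)
    assert np.isclose(Sy, -0.5*np.sin(omega*t))
    assert np.isclose(Sz, 0.5*np.cos(omega*t))
print("<S>(t) stays in the yz plane and rotates at frequency omega")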
Spin and measurement: Stern-Gerlach revisited

We now understand, in a way that the original experimenters did not, what the Stern-Gerlach experiment does: for each atom that passes through, it measures Ŝ · n, where the magnetic field is along the n direction. Each time, the answer is either up or down, ±~/2. With the initial beam being unpolarised, the numbers of up and down will be equal.
The apparatus also gives us access to a beam of particles which are all spin-up in a particular direction; say the z direction. We can then run that beam through a second copy of the apparatus rotated through some angle θ relative to the first. The particles exiting from this copy will be either spin-up or down along the new magnetic field axis, and the probability of getting each is |hn±|ẑ+i|², that is cos²(θ/2) and sin²(θ/2) respectively. If θ = π/2 (new field along the x axis, assuming a beam in the y-direction), the probabilities are both 50%.
Successive measurements can be schematically represented below; each block being labelled by
the direction of the magnetic field. It should look very familiar.

Higher spins

The Particle Data Group lists spin-1 particles and spin-3/2 particles; gravitons if they exist are spin-2, and nuclei can have much higher spins (at least 9/2 for known ground states of stable nuclei).
Furthermore, since in many situations total angular momentum commutes with the Hamiltonian (see later), even when orbital angular momentum is involved we are often only concerned with a subspace of fixed j (or l or s). All such subspaces are finite dimensional, of dimension N = 2j + 1, and spanned by the basis {|j, mi} with m = j, j − 1 . . . − j + 1, −j. It is most usual (though of course not obligatory) to order the states by descending m.
In this subspace, with this basis, the operators Jˆx , Jˆy , Jˆz are represented by three N × N matrices with matrix elements eg (Jx )m0m = hj, m0 |Jˆx |j, mi. (Because states with different j are orthogonal, and because the Jˆi only change m, not j, hj 0 , m0 |Jˆx |j, mi = 0 if j 0 ≠ j: that's why we can talk about non-overlapping subspaces in the first place.) The matrix representation of Jˆz of course is diagonal, with diagonal elements ~j, ~(j − 1) . . . −~j. As with spin-1/2, it is easiest to construct Jˆ+ first, then Jˆ− as its transpose (the elements of the former having been chosen to be real), then Jˆx = (1/2)(Jˆ+ + Jˆ− ) and Jˆy = −(1/2)i(Jˆ+ − Jˆ− ).
As an example we construct the matrix representation of the operators for spin-1. The three basis states |s, mi are |1, 1i, |1, 0i and |1, −1i. Recall Jˆ+ |j, mi = ~√(j(j+1)−m(m+1)) |j, m+1i, so Ŝ+ |1, −1i = ~√(2 − 0) |1, 0i, Ŝ+ |1, 0i = ~√(2 − 0) |1, 1i and Ŝ+ |1, 1i = 0, and the only non-zero matrix elements of Ŝ+ are

h1, 1|Ŝ+ |1, 0i = h1, 0|Ŝ+ |1, −1i = √2 ~.

So:

|1, 1i −→(Sz) [1; 0; 0],   |1, 0i −→(Sz) [0; 1; 0],   |1, −1i −→(Sz) [0; 0; 1]
Ŝz −→(Sz) ~[1 0 0; 0 0 0; 0 0 −1],   Ŝ+ −→(Sz) √2 ~[0 1 0; 0 0 1; 0 0 0],   Ŝ− −→(Sz) √2 ~[0 0 0; 1 0 0; 0 1 0]
Ŝx −→(Sz) (~/√2)[0 1 0; 1 0 1; 0 1 0],   Ŝy −→(Sz) (~/√2)[0 −i 0; i 0 −i; 0 i 0]

Of course this is equally applicable to any system with j = 1, including the l = 1 spherical
harmonics.
Once all possible values of j and m are allowed, any angular momentum operator is represented in the {|j, mi} = {|0, 0i, |1/2, 1/2i, |1/2, −1/2i, |1, 1i, |1, 0i, |1, −1i . . . } basis by a block-diagonal matrix.
The first block is a single element, zero in fact, since all components of Ĵ in the one-dimensional space of states of j = 0 are zero. The next block is the appropriate 2×2 spin-1/2 matrix, the next a 3 × 3 spin-1 matrix, and so on. This block-diagonal structure reflects the fact that the vector space can be written as a direct sum of spaces with j = 0, j = 1/2, j = 1 . . .: V = V¹ ⊕ V² ⊕ V³ ⊕ . . . (where the superscripts of course are 2j + 1).
In fact, any given physical system can only have integer or half-integer angular momentum. So the picture would be similar, but with only odd- or even-dimensioned blocks. For orbital angular momentum, for instance, the blocks would be 1 × 1, 3 × 3, 5 × 5 . . ..
4.5 Addition of angular momentum
Shankar pp 403-415, Griffiths 4.4, Mandl 4.4

Up till now, we have in general spoken rather loosely as if an electron has either orbital or
spin angular momentum—or more precisely, we’ve considered cases where only one affects the
dynamics, so we can ignore the other. But many cases are not like that. If a hydrogen atom
is placed in a magnetic field, its electron can have both orbital and spin angular momentum,
and both will affect how the energy levels shift, and hence how the spectral lines split. Or the
deuteron (heavy hydrogen nucleus) consists of both a proton and a neutron, and both have
spin; heavier atoms and nuclei have many components all with spin and angular momentum.
Only the total angular momentum of the whole system is guaranteed by rotational symmetry
to be conserved in the absence of external fields. So we need to address the question of the
addition of angular momentum.
Because the notation is clearest, we will start with the spin and orbital angular momentum of
a particle. We consider the case where l as well as s is fixed: electrons in a p-wave orbital,
for instance. These two types of angular momentum are independent and live in different
vector spaces, so this is an example of a tensor direct product space, spanned by the basis
{|l, ml i ⊗ |s, ms i} and hence (2l + 1) × (2s + 1) dimensional.
Now angular momentum is a vector, and we expect the total angular momentum to be the
vector sum of the orbital and spin angular momenta. We can form a new vector operator in
the product space

Ĵ = L̂ ⊗ Iˆ + Iˆ ⊗ Ŝ Ĵ2 = L̂2 ⊗ Iˆ + Iˆ ⊗ Ŝ2 + 2L̂ ⊗ Ŝ

where the last term represents a scalar product as well as a tensor product and would more
clearly be written 2(L̂x ⊗ Ŝx + L̂y ⊗ Ŝy + L̂z ⊗ Ŝz ).
In practice, the tensor product notation for operators proves cumbersome, and we always just
write
Ĵ = L̂ + Ŝ Ĵ2 = L̂2 + Ŝ2 + 2L̂ · Ŝ
We know that the L̂i and Ŝi act on different parts of the state, and we don’t need to stress that
when we act with Ŝi alone we are not changing the orbital state, etc. An alternative form, in
which the tensor product notation is again suppressed, is

Ĵ2 = L̂2 + Ŝ2 + L̂+ Ŝ− + L̂− Ŝ+ + 2L̂z Ŝz .

Now in calling the sum of angular momenta Ĵ, which we previously used for a generic angular
momentum, we are assuming that the Jˆi do indeed obey the defining commutation rules for
angular momentum, and this can easily be demonstrated. For instance

[Jˆx , Jˆy ] = [L̂x + Ŝx , L̂y + Ŝy ] = [L̂x , L̂y ] + [Ŝx , Ŝy ] = i~L̂z + i~Ŝz = i~Jˆz ,

where we have used the fact that [L̂i , Ŝj ] = 0, since they act in different spaces. Hence we
expect that an alternative basis in the product space will be {|j, mj i}, with allowed values of
j not yet determined. The question we want to answer, then, is the connection between the
{|l, ml i ⊗ |s, ms i} and {|j, mj i} bases. Both, we note, must have dimension (2l + 1) × (2s + 1).
We note some other points about the commutator: L̂z , Ŝz and Jˆz all commute; Jˆz commutes
with Ĵ2 (of course) and with L̂2 and with Ŝ2 (because both L̂z and Ŝz do), but L̂z and Ŝz do not
commute with Ĵ2 . Thus we can, as implied when we wrote down the two bases, always specify
l and s, but then either ml and ms (with mj = ml + ms ) or j and mj . (We will sometimes write
|l, s; j, mj i instead of just |j, mj i, if we need a reminder of l and s in the problem.) What this
boils down to is that the state of a given j and mj will be linear superpositions of the states of
given ms and ml that add up to that mj . If there is more than one such state, there must be
more than one allowed value of j for that mj .
Let’s introduce a useful piece of jargon: the state of maximal m in a multiplet, |j, ji, is called
the stretched state.
We start with the state of maximal ml and ms , |l, li ⊗ |s, si, which has mj = l + s. This is clearly the maximal value of mj , and hence of j: jmax = l + s, and since the state is unique, it must be an eigenstate of Ĵ².⁵ If we act on this with Jˆ− = L̂− + Ŝ− , we get a new state with two terms in it; recalling the general rule Jˆ− |j, mi = ~√(j(j + 1) − m(m − 1)) |j, m−1i where j can stand for j or l or s, we have (using j̄ as a shorthand for jmax = l + s)

|j̄, j̄i = |l, li ⊗ |s, si ⇒ Jˆ− |j̄, j̄i = (L̂− |l, li) ⊗ |s, si + |l, li ⊗ (Ŝ− |s, si)
⇒ √(2j̄) |j̄, j̄−1i = √(2l) |l, l−1i ⊗ |s, si + √(2s) |l, li ⊗ |s, s−1i

From this state we can continue operating with Jˆ− ; at the next step there will be three terms on the R.H.S. with {ml , ms } equal to {l−2, s}, {l−1, s−1} and {l, s−2}, then four, but eventually we will reach states which are annihilated by L̂− or Ŝ− and the number of terms will start to shrink again, till we finally reach |j̄, −j̄i = |l, −li ⊗ |s, −si after 2j̄ steps (2j̄ + 1 states in all). Whichever is the smaller of l or s will govern the maximum number of {ml , ms } pairs that can add up to any given mj ; for example if s is smaller, the maximum number is 2s + 1.
Now the state we found with mj = l+s−1 is not unique; there must be another orthogonal combination of the two states with {ml , ms } equal to {l−1, s} and {l, s−1}. This cannot be part of a multiplet with j = j̄ because we've “used up” the only state with mj = j̄. So it must be the highest-mj state (the stretched state) of a multiplet with j = j̄ − 1 (ie l + s − 1):

|j̄−1, j̄−1i = −√(s/(l+s)) |l, l−1i ⊗ |s, si + √(l/(l+s)) |l, li ⊗ |s, s−1i

Successive operations with Jˆ− will generate the rest of the multiplet (2j̄ − 1 states in all); all the states will be orthogonal to the states of the same mj but higher j already found.
However there will be a third linear combination of the states with {ml , ms } equal to {l−2, s}, {l−1, s−1} and {l, s−2}, which cannot have j = j̄ or j̄−1. So it must be the stretched state of a multiplet with j = j̄−2 (2j̄ − 3 states in all).
And so it continues, generating multiplets with successively smaller values of j. However the
process will come to an end. As we saw, the maximum number of terms in any sum is whichever
is smaller of 2l + 1 or 2s + 1, so this is also the maximum number of mutually orthogonal states
of the same mj , and hence the number of different values of j. So j can be between l + s and
the larger of l + s − 2s and l + s − 2l; that is, l + s ≥ j ≥ |l − s|. The size of the {|j, mj i} basis is then Σ_{j=|l−s|}^{l+s} (2j + 1), which is equal to (2l + 1)(2s + 1).

The table below illustrates the process for l = 2, s = 1; we go down a column by applying Jˆ− ,
and start a new column by constructing a state orthogonal to those in the previous columns.
⁵ This can also be seen directly by acting with Ĵ² = Jˆ− Jˆ+ + Jˆz² + ~Jˆz , since |l, li ⊗ |s, si is an eigenstate of Jˆz with eigenvalue ~(l + s), and is annihilated by both L̂+ and Ŝ+ , and hence by Jˆ+ .
The three columns correspond to j = 3, j = 2 and j = 1, and there are 7 + 5 + 3 = 5 × 3 states in total.
mj = 3:   |3, 3i = |2, 2i⊗|1, 1i
mj = 2:   |3, 2i = √(2/3)|2, 1i⊗|1, 1i + √(1/3)|2, 2i⊗|1, 0i
          |2, 2i = −√(1/3)|2, 1i⊗|1, 1i + √(2/3)|2, 2i⊗|1, 0i
mj = 1:   |3, 1i = √(2/5)|2, 0i⊗|1, 1i + √(8/15)|2, 1i⊗|1, 0i + √(1/15)|2, 2i⊗|1, −1i
          |2, 1i = −√(1/2)|2, 0i⊗|1, 1i + √(1/6)|2, 1i⊗|1, 0i + √(1/3)|2, 2i⊗|1, −1i
          |1, 1i = √(1/10)|2, 0i⊗|1, 1i − √(3/10)|2, 1i⊗|1, 0i + √(3/5)|2, 2i⊗|1, −1i
mj = 0:   |3, 0i = √(1/5)|2, −1i⊗|1, 1i + √(3/5)|2, 0i⊗|1, 0i + √(1/5)|2, 1i⊗|1, −1i
          |2, 0i = −√(1/2)|2, −1i⊗|1, 1i + 0 |2, 0i⊗|1, 0i + √(1/2)|2, 1i⊗|1, −1i
          |1, 0i = √(3/10)|2, −1i⊗|1, 1i − √(2/5)|2, 0i⊗|1, 0i + √(3/10)|2, 1i⊗|1, −1i
mj = −1:  |3, −1i = √(1/15)|2, −2i⊗|1, 1i + √(8/15)|2, −1i⊗|1, 0i + √(2/5)|2, 0i⊗|1, −1i
          |2, −1i = −√(1/3)|2, −2i⊗|1, 1i − √(1/6)|2, −1i⊗|1, 0i + √(1/2)|2, 0i⊗|1, −1i
          |1, −1i = √(3/5)|2, −2i⊗|1, 1i − √(3/10)|2, −1i⊗|1, 0i + √(1/10)|2, 0i⊗|1, −1i
mj = −2:  |3, −2i = √(1/3)|2, −2i⊗|1, 0i + √(2/3)|2, −1i⊗|1, −1i
          |2, −2i = −√(2/3)|2, −2i⊗|1, 0i + √(1/3)|2, −1i⊗|1, −1i
mj = −3:  |3, −3i = |2, −2i⊗|1, −1i

The coefficients in the table are called Clebsch-Gordan coefficients. They are the inner prod-
ucts (hl, ml | ⊗ hs, ms |)|j, mj i but that is too cumbersome a notation; with a minimum modifi-
cation Shankar uses hl, ml ; s, ms |j, mj i; Mandl uses C(l, ml , s, ms ; j, mj ), but hl, s, ml , ms |j, mj i
and other minor modifications, including dropping the commas, are common. They are all
totally clear when symbols are being used, but easily confused when numerical values are sub-
stituted! We use the “Condon-Shortley” phase convention, which is the most common; in this
convention Clebsch-Gordan coefficients are real, which is why we won’t write hl, ml ; s, ms |j, mj i∗
in the second equation of Eq. (4.5) below. General formulae for the coefficients are not used
(the case of s = 12 is an exception, see below), instead one consults tables or uses the Math-
ematica function ClebschGordan[{l, ml }, {s, ms }, {j, mj }]. There is also an on-line calculator
at Wolfram Alpha.
Here you will find the PDG tables of Clebsch-Gordan coefficients and here instructions on their
use.
All of this has been written for the addition of orbital and spin angular momenta. But we did
not actually assume at any point that l was integer. So in fact the same formulae apply for
the addition of any two angular momenta of any origin: a very common example is two spin- 21
particles. The more general form for adding two angular momenta j1 and j2 , with J and M
being the quantum numbers corresponding to the total angular momentum of the system, is
|J, M i = Σ_{m1 ,m2} hj1 , m1 ; j2 , m2 |J, M i |j1 , m1 i ⊗ |j2 , m2 i,
|j1 , m1 i ⊗ |j2 , m2 i = Σ_{J,M} hj1 , m1 ; j2 , m2 |J, M i |J, M i.    (4.5)

For the common case of s = 1/2, j = l ± 1/2, we have

|l±1/2, mj i = √((l ∓ mj + 1/2)/(2l+1)) |l, mj +1/2i ⊗ |1/2, −1/2i ± √((l ± mj + 1/2)/(2l+1)) |l, mj −1/2i ⊗ |1/2, 1/2i.

To summarise, the states of a system with two contributions to the angular momentum, j1 and j2 , can be written in a basis in which the total angular momentum J and z-component M are specified; the values of J range from |j1 −j2 | to j1 +j2 in unit steps. In this basis the total angular momentum operators Jˆi and Ĵ² are cast in block-diagonal form, one (2J+1)-square block for each value of J. The vector space, which we started by writing as a product, V^(2j1+1) ⊗ V^(2j2+1) , can instead be written as a direct sum: V^(2(j1+j2)+1) ⊕ . . . ⊕ V^(2|j1−j2|+1) . In particular for some orbital angular momentum l and s = 1/2, V^(2l+1) ⊗ V² = V^(2l+2) ⊕ V^(2l) . The overall dimension of the space is of course unchanged.
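In practice one usually looks the coefficients up, but symbolic tools reproduce them too. A sketch using SymPy's CG class, whose argument order CG(j1, m1, j2, m2, J, M) corresponds to hj1 , m1 ; j2 , m2 |J, M i:

from sympy import Rational, sqrt
from sympy.physics.quantum.cg import CG

# two entries from the l = 2, s = 1 table above
print(CG(2, 1, 1, 1, 3, 2).doit())            # sqrt(6)/3, i.e. sqrt(2/3)
print(CG(2, 1, 1, 1, 2, 2).doit())            # -sqrt(3)/3, i.e. -sqrt(1/3)

# the j = l + 1/2 formula for l = 1, mj = 1/2: coefficient of |1,1> x |1/2,-1/2>
half = Rational(1, 2)
lhs = CG(1, 1, half, -half, Rational(3, 2), half).doit()
print(lhs, sqrt(Rational(1, 3)))              # both print as sqrt(3)/3, as the formula predicts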
Example: Two spin-1/2 particles

Here we will call the operators Ŝ(1) , Ŝ(2) and Ŝ = Ŝ(1) + Ŝ(2) for the individual and total spin operators, and S and M for the total spin quantum numbers. (The use of capitals is standard in a many-particle system.) Because both systems are spin-1/2, we will omit the label from our states, which we will write in the {m1 , m2 } basis as

|1i = |+i ⊗ |+i, |2i = |+i ⊗ |−i, |3i = |−i ⊗ |+i, |4i = |−i ⊗ |−i.

(The 1 . . . 4 are just labels here.) In this basis

Ŝ+ −→ ~[0 1 1 0; 0 0 0 1; 0 0 0 1; 0 0 0 0],   Ŝz −→ ~[1 0 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 −1],   Ŝ² −→ ~²[2 0 0 0; 0 1 1 0; 0 1 1 0; 0 0 0 2]

where we use explicit calculation for the matrix elements, eg

h1|(Ŝ+(1) + Ŝ+(2) )|2i = h+|Ŝ+(1) |+ih+|Iˆ(2) |−i + h+|Iˆ(1) |+ih+|Ŝ+(2) |−i = 0 + ~,

then Ŝ− = (Ŝ+ )† and Ŝ2 = Ŝ+ Ŝ− + Ŝz2 − ~Ŝz .


It is clear that |1i and |4i are eigenstates of Ŝ² with eigenvalue 2~² and hence S = 1. They are also eigenstates of Ŝz with eigenvalues ±~. In the {|2i, |3i} subspace, which has M = 0, Ŝ² is represented by the matrix ~²[1 1; 1 1], which has eigenvalues 2~² and 0 corresponding to states √(1/2)(|2i ± |3i). We label these four simultaneous eigenstates of Ŝ² and Ŝz as |S, M i, and take the ordering for the new basis as {|0, 0i, |1, 1i, |1, 0i, |1, −1i}. Then the matrix of eigenvectors, U, is

U = [0 1 0 0; 1/√2 0 1/√2 0; −1/√2 0 1/√2 0; 0 0 0 1]
and the transformed matrices U†Si U are

Ŝx −→ (~/√2)[0 0 0 0; 0 0 1 0; 0 1 0 1; 0 0 1 0],   Ŝy −→ (~/√2)[0 0 0 0; 0 0 −i 0; 0 i 0 −i; 0 0 i 0],   Ŝz −→ ~[0 0 0 0; 0 1 0 0; 0 0 0 0; 0 0 0 −1]

where the 1 × 1 plus 3 × 3 block-diagonal structure has been emphasised and the 3 × 3 blocks
are just the spin-1 matrices we found previously.
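The whole construction takes a few lines with Kronecker products (a sketch, ~ = 1; np.kron reproduces the |1i . . . |4i ordering used above):

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# total spin: S_i = S_i(1) x I + I x S_i(2), in the basis |++>, |+->, |-+>, |-->
S = [0.5*(np.kron(s, I2) + np.kron(I2, s)) for s in (sx, sy, sz)]
S2 = sum(Si @ Si for Si in S)

E, V = np.linalg.eigh(S2)
print(np.round(E, 10))       # [0. 2. 2. 2.]: one singlet (S=0) and three triplet (S=1) states
print(np.round(V[:, 0], 3))  # singlet = (|2> - |3>)/sqrt(2), up to an overall sign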

Angular Momentum of Atoms and Nuclei

Both atoms and nuclei consist of many spin-1/2 fermions, each of which has both spin and orbital
angular momentum. In the independent-particle model we think of each fermion occupying a
well-defined single-particle orbital which is an eigenstate of a central potential and hence has
well defined orbital angular momentum l. The notation s, p, d, f , g. . . is used for orbitals of
l = 0, 1, 2, 3, 4 . . .. For each fermion there is also a total angular momentum j, and the spin-
orbit splitting (of which more later) splits states of different j. All the angular momenta of all
the fermions can be added in a variety of ways, and the following quantum numbers are defined:
L for the sum of all the orbital angular momenta (that is, the eigenvalues of L̂²tot are ℏ²L(L+1)); S for the sum of all the spin angular momenta; and J for the total angular momentum of the atom or nucleus from all sources. The capital letters here denote quantum numbers and should not be confused with the corresponding operators.
In reality the independent-particle model is only an approximation, and only the total angular momentum J is a conserved quantum number (only Ĵ²tot commutes with the Hamiltonian of the whole system). For light atoms, it is a good starting point to treat L and S as if they were conserved too, and the notation $^{2S+1}L_J$ is used, with L being denoted by S, P, D, F, G . . .. This is termed LS coupling. So $^3S_1$ has L = 0, S = J = 1. For heavy atoms and nuclei, it is a better approximation to sum the individual total angular momenta j (j-j coupling).
Somewhat confusingly, J is often called the spin of the atom or nucleus, even though its origin
is both spin and angular momentum. This composite origin shows up in a magnetic coupling
g which is neither 1 (pure orbital) nor 2 (pure spin). For light atoms g can be calculated from
L, S and J (the Landé g-factor). For nuclei things are further complicated by the fact that
protons and neutrons are not elementary particles, and their “spin” is likewise of composite
origin, something which shows up through their g values of gp = 5.59 and gn = −3.83 rather
than 2 and 0 respectively. Using these the equivalent of the Landé g-factor can be calculated
for individual nucleon orbitals, and hence for those odd-even nuclei for which the single-particle
model works (that is, assuming that only the last unpaired nucleon contributes to the total
angular momentum). Beyond that it gets complicated.

4.6 Vector Operators


Shankar 15.3

This section is not examinable. The take-home message is that vector operators such as x̂
and p̂ can change the angular momentum of the state they act on in the same way as coupling
in another source of angular momentum with l = 1. If the components of the vector operator
are written in a spherical basis analogously to Ĵ± , the dependence of the matrix elements on the
m quantum numbers is given by Clebsch-Gordan coefficients, with the non-trivial dependence residing only in a single “reduced matrix element” for each pair j and j′ of the angular momenta of the initial and final states. This is the Wigner-Eckart theorem.

We have now met a number of vector operators: x̂ = (x̂, ŷ, ẑ), p̂ = (p̂x , p̂y , p̂z ), and of course
L̂, Ŝ and Ĵ. We have seen, either in lectures or examples, that they all satisfy the following
relation: if V̂ stands for the vector operator
$$[\hat J_i, \hat V_j] = i\hbar\sum_k \epsilon_{ijk}\hat V_k,$$

for example, [Jˆx , ŷ] = i~ẑ. (We could have substituted L̂x for Jˆx here as spin and space operators
commute.)
We can take this to be the definition of a vector operator: a triplet of operators makes up a
vector operator if it satisfies these commutation relations.
Just as it was useful to define Jˆ+ and Jˆ− , so it is useful to define
$$\hat V_{+1} = -\sqrt{\tfrac{1}{2}}\,(\hat V_1 + i\hat V_2), \qquad \hat V_{-1} = \sqrt{\tfrac{1}{2}}\,(\hat V_1 - i\hat V_2), \qquad \hat V_0 = \hat V_3$$

where the subscripts are no longer Cartesian coordinates (1 ≡ x etc) but analogous to the m
of the spherical harmonics—and indeed
$$\mp\sqrt{\tfrac{1}{2}}\,(x\pm iy) = \sqrt{\tfrac{4\pi}{3}}\, r\,Y_1^{\pm1}(\theta,\phi), \qquad z = \sqrt{\tfrac{4\pi}{3}}\, r\,Y_1^{0}(\theta,\phi).$$

Note a slight change of normalisation and sign: $\hat J_{\pm1} = \mp\sqrt{\tfrac{1}{2}}\,\hat J_\pm$. In terms of these spherical
components V̂m ,

$$[\hat J_0, \hat V_m] = m\hbar\,\hat V_m, \qquad [\hat J_\pm, \hat V_m] = \hbar\sqrt{(1\mp m)(2\pm m)}\;\hat V_{m\pm1}.$$

If we compare these to the effects on states,

$$\hat J_3|j,m\rangle = \hbar m|j,m\rangle, \qquad \hat J_\pm|j,m\rangle = \hbar\sqrt{(j\mp m)(j\pm m+1)}\;|j,m\pm1\rangle,$$

we see a close parallel, so long as we take j = 1 for the vector operators. (Note that in this section we use the algebraically equivalent (j ∓ m)(j ± m + 1) for j(j + 1) − m(m ± 1) in the normalisation of Ĵ±|j, m⟩.)


Consider the following two calculations. First, we consider matrix elements of the commutator of the components of a vector operator V̂m with Ĵ±, in which l = 1, and p and q are magnetic quantum numbers like m; in the second line we note that ⟨j, m|Ĵ± is the bra associated with Ĵ∓|j, m⟩:

$$\langle j',p|[\hat J_\pm,\hat V_m]|j,q\rangle = \hbar\sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|\hat V_{m\pm1}|j,q\rangle$$

and

$$\langle j',p|\hat J_\pm\hat V_m - \hat V_m\hat J_\pm|j,q\rangle = \hbar\sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|\hat V_m|j,q\rangle - \hbar\sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|\hat V_m|j,q\pm1\rangle$$

$$\Rightarrow\quad \sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|\hat V_{m\pm1}|j,q\rangle = \sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|\hat V_m|j,q\rangle - \sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|\hat V_m|j,q\pm1\rangle$$

Secondly we take matrix elements of Ĵ± = Ĵ±⁽¹⁾ + Ĵ±⁽²⁾, giving us a relation between the Clebsch-Gordan coefficients for l and j coupling up to j′:

$$\langle j',p|\hat J_\pm\bigl(|l,m\rangle\otimes|j,q\rangle\bigr) = \hbar\sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|l,m;j,q\rangle$$

and

$$\langle j',p|(\hat J_\pm^{(1)}+\hat J_\pm^{(2)})\bigl(|l,m\rangle\otimes|j,q\rangle\bigr) = \hbar\sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|l,m\pm1;j,q\rangle + \hbar\sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|l,m;j,q\pm1\rangle$$

$$\Rightarrow\quad \sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|l,m\pm1;j,q\rangle = \sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|l,m;j,q\rangle - \sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|l,m;j,q\pm1\rangle$$

Comparing the two, we see that the coefficients are identical, but in the first they multiply matrix elements of V̂ whereas in the second they multiply Clebsch-Gordan coefficients. This can only be true if the matrix elements are proportional to the Clebsch-Gordan coefficients, with a constant of proportionality which must be independent of magnetic quantum numbers, and which we will write as ⟨j′||V̂||j⟩, the reduced matrix element:

$$\langle j',p|\hat V_m|j,q\rangle = \langle j'||\hat{\mathbf V}||j\rangle\,\langle j',p|l,m;j,q\rangle\big|_{l=1}$$

This is a specific instance of the Wigner-Eckart theorem. It says that acting on a state with a vector operator is like coupling in one unit of angular momentum; only states with |j′ − 1| ≤ j ≤ j′ + 1 and with p = m + q will have non-vanishing matrix elements. It also means that if one calculates one matrix element, whichever is the simplest (so long as it is non-vanishing), then the others can be written down directly.
Since Ĵ is a vector operator, it follows that matrix elements of Ĵq can also be written in terms
of a reduced matrix element hj 0 ||Ĵ||ji, but of course this vanishes unless j 0 = j.
Writing $|j_1,j_2;J,M\rangle = \sum_{m_1 m_2}\langle J,M|j_1,m_1;j_2,m_2\rangle\,|j_1,m_1\rangle\otimes|j_2,m_2\rangle$, and using orthonormality of the states {|J, M⟩}, allows us to show that

$$\sum_{m_1 m_2}\langle J,M|j_1,m_1;j_2,m_2\rangle\langle J',M'|j_1,m_1;j_2,m_2\rangle = \delta_{JJ'}\delta_{MM'} \qquad(4.6)$$

Noting too that a scalar product of vector operators P̂·Q̂ can be written in spherical components as $\sum_q(-1)^q\hat P_{-q}\hat Q_q$, we can show that

$$\langle j,m|\hat{\mathbf P}\cdot\hat{\mathbf J}|j,m\rangle = \sum_{q,j',m'}(-1)^q\langle j,m|\hat P_{-q}|j',m'\rangle\langle j',m'|\hat J_q|j,m\rangle = \sum_{q,m'}\langle j,m'|\hat P_q|j,m\rangle\langle j,m'|\hat J_q|j,m\rangle = \langle j||\hat{\mathbf P}||j\rangle\langle j||\hat{\mathbf J}||j\rangle;$$

(we insert a complete set of states at the first step, then use the Wigner-Eckart theorem and Eq. (4.6)).

Replacing P̂ with Ĵ gives us $\langle j||\hat{\mathbf J}||j\rangle = \hbar\sqrt{j(j+1)}$. Hence we have the extremely useful relation

$$\langle j,m|\hat{\mathbf P}|j,m\rangle = \frac{\langle j,m|\hat{\mathbf P}\cdot\hat{\mathbf J}|j,m\rangle}{\hbar^2\,j(j+1)}\;\langle j,m|\hat{\mathbf J}|j,m\rangle, \qquad(4.7)$$

which we will use in calculating the Landé g-factor in the next section.
Finally, we might guess from the way that we used a general symbol l instead of 1 that there are operators which couple in two or more units of angular momentum. Simple examples are obtained by writing r^l Y_l^m in terms of x, y and z, then setting x → x̂ etc; so (x̂ ± iŷ)², (x̂ ± iŷ)ẑ and 2ẑ² − x̂² − ŷ² are the m = ±2, m = ±1 and m = 0 components of an operator with l = 2 (a rank-two tensor operator, in the jargon). There are six components of x̂ᵢx̂ⱼ, but (ẑ² + x̂² + ŷ²) is a scalar (l = 0). This is an example of the tensor product of two l = 1 operators giving l = 2 and l = 0 operators.
Time-independent perturbation theory

5.1 Approximate methods in Quantum Mechanics


It is often (almost always!) the case that we cannot solve real problems analytically. Only a
very few potentials have analytic solutions, by which I mean one can write down the energy
levels and wave functions in closed form, as for the harmonic oscillator and Coulomb potential.
In fact those are really the only useful ones (along with square wells)... In the last century,
a number of approximate methods have been developed to obtain information about systems
which can’t be solved exactly.
These days, this might not seem very relevant. Computers can solve differential equations very
efficiently. But:

• It is always useful to have a check on numerical methods


• Even supercomputers can't solve the equations for many interacting particles exactly in a reasonable time (where “many” may be as low as four, depending on the complexity of the interaction) — ask a nuclear physicist or quantum chemist.
• Quantum field theories are systems with infinitely many degrees of freedom. All ap-
proaches to QFT must be approximate.
• If the system we are interested in is close to a soluble one, we might obtain more insight
from approximate methods than from numerical ones. This is the realm of perturbation
theory. The most accurate prediction ever made, for the anomalous magnetic moment of
the electron, which is good to one part in 1012 , is a 4th order perturbative calculation.

Examples of approximate methods that we will not cover in this course are:

• The WKB approximation, applicable when the potential varies slowly on the scale of the wavelength of a particle moving in that potential. Among other uses, it gives an approximate expression for the probability of tunnelling through a barrier. Recall that for a square barrier of height V and width L this is proportional to exp(−2kL), where $k = \sqrt{2m(V-E)}/\hbar$. For a slowly varying potential, this is replaced by $\exp(-2\int_0^L k(x)\,\mathrm{d}x)$, where $k(x) = \sqrt{2m(V(x)-E)}/\hbar$. You will meet this in the context of alpha decay; the details are given here.
• The variational method, which sets an upper bound on the ground-state energy E0 of a bound system, by noting that for any appropriately normalised trial state |Ψ⟩, E0 ≤ ⟨Ψ|Ĥ|Ψ⟩, something that can easily be seen by expressing |Ψ⟩ as a sum over the true eigenstates of Ĥ.

By far the most widely-used approximate method, though, is perturbation theory, applicable
where the problem to be solved is “close to” a soluble one.

5.2 Non-degenerate perturbation theory


Shankar 17.1, Mandl 7.1, Griffiths 6.1
Perturbation theory is applicable when the Hamiltonian Ĥ can be split into two parts, with
the first part being exactly solvable and the second part being small in comparison. The first
part is always written Ĥ(0), and we will denote its eigenstates by |n(0)⟩ and energies by En(0) (with wave functions φn(0)). These we know, and for now assume to be non-degenerate. The
eigenstates and energies of the full Hamiltonian are denoted |ni and En , and the aim is to
find successively better approximations to these. The zeroth-order approximation is simply
|ni = |n(0) i and En = En(0) , which is just another way of saying that the perturbation is small
and at a crude enough level of approximation we can ignore it entirely.
Nomenclature for the perturbing Hamiltonian Ĥ − Ĥ (0) varies. δV , Ĥ (1) and λĤ (1) are all
common. It usually is a perturbing potential but we won’t assume so here, so we won’t use the
first. The second and third differ in that the third has explicitly identified a small, dimensionless
parameter (eg α in EM), so that the residual Ĥ (1) isn’t itself small. With the last choice, our
expressions for the eigenstates and energies of the full Hamiltonian will be explicitly power
series in λ, so En = En(0) + λEn(1) + λ2 En(2) + . . . etc. With the second choice the small factor is
hidden in Ĥ (1) , and is implicit in the expansion which then reads En = En(0) +En(1) +En(2) +. . .. In
this case one has to remember that anything with a superscript (1) is first order in this implicit
small factor, or more generally the superscript (m) denotes something which is mth order. For
the derivation of the equations we will retain an explicit λ, but thereafter we will set it equal
to one to revert to the other formulation. We will take λ to be real so that Ĥ(1) is Hermitian.
We start with the master equation
(Ĥ (0) + λĤ (1) )|ni = En |ni.
Then we substitute in En = En(0) + λEn(1) + λ2 En(2) + . . . and |ni = |n(0) i + λ|n(1) i + λ2 |n(2) i + . . .
and expand. Then since λ is a free parameter, we have to match terms on each side with the
same powers of λ, to get
Ĥ (0) |n(0) i = En(0) |n(0) i
Ĥ (0) |n(1) i + Ĥ (1) |n(0) i = En(0) |n(1) i + En(1) |n(0) i
Ĥ (0) |n(2) i + Ĥ (1) |n(1) i = En(0) |n(2) i + En(1) |n(1) i + En(2) |n(0) i
We have to solve these sequentially. The first we assume we have already done. The second
will yield En(1) and |n(1) i. Once we know these, we can use the third equation to yield En(2) and
|n(2) i, and so on. The expressions for the changes in the states, |n(1) i etc, will make use of the
fact that the unperturbed states {|n(0)⟩} form a basis, so we can write

$$|n^{(1)}\rangle = \sum_m c_m|m^{(0)}\rangle = \sum_m \langle m^{(0)}|n^{(1)}\rangle\,|m^{(0)}\rangle.$$

In each case, to solve for the energy we take the inner product with ⟨n(0)| (i.e. the same state), whereas for the wave function we use ⟨m(0)| (another state). We use, of course, ⟨m(0)|Ĥ(0) = Em(0)⟨m(0)| and ⟨m(0)|n(0)⟩ = δmn.
At first order we get

$$E_n^{(1)} = \langle n^{(0)}|\hat H^{(1)}|n^{(0)}\rangle \qquad(5.1)$$

$$\langle m^{(0)}|n^{(1)}\rangle = \frac{\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}} \quad \forall\, m\neq n.$$

The second equation tells us the overlap of |n(1) i with all the other |m(0) i, but not with |n(0) i.
This is obviously not constrained by the eigenvalue equation, because we can add any amount
of |n(0) i and the equations will still be satisfied. However we need the state to continue to be
normalised, and when we expand hn|ni = 1 in powers of λ we find that hn(0) |n(1) i is required
to be imaginary. This is just like a phase rotation of the original state and we can ignore it.
(Recall that an infinitesimal change in a unit vector has to be at right angles to the original.)
Hence
$$|n^{(1)}\rangle = \sum_{m\neq n} \frac{\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}}\;|m^{(0)}\rangle. \qquad(5.2)$$

If the spectrum of Ĥ(0) is degenerate, there may be a problem with this expression because the denominator can vanish. In fact nothing that we have done so far is directly valid in that case, and we have to use “degenerate perturbation theory” instead. For now we assume that for any two states |m(0)⟩ and |n(0)⟩, either En(0) − Em(0) ≠ 0 (non-degenerate) or ⟨m(0)|Ĥ(1)|n(0)⟩ = 0 (the states are not mixed by the perturbation).
Then at second order

$$E_n^{(2)} = \langle n^{(0)}|\hat H^{(1)}|n^{(1)}\rangle = \sum_{m\neq n} \frac{\bigl|\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle\bigr|^2}{E_n^{(0)} - E_m^{(0)}}. \qquad(5.3)$$

The expression for the second-order shift in the wave function |n(2) i can also be found but it
is tedious. The main reason we wanted |n(1) i was to find En(2) anyway, and we’re not planning
to find En(3) ! Note that though the expression for En(1) is generally applicable, those for |n(1) i
and En(2) would need some modification if the Hamiltonian had continuum eigenstates as well
as bound states (eg hydrogen atom). Provided the state |ni is bound, that is just a matter of
integrating rather than summing. This restriction to bound states is why Mandl calls chapter 7
“bound-state perturbation theory”. The perturbation of continuum states (eg scattering states)
is usually dealt with separately.
Note that the equations above hold whether we have identified an explicit small parameter λ
or not. So from now on we will set λ to one, assume that Ĥ (1) has an implicit small parameter
within it, and write En = En(0) + En(1) + En(2) + . . .; the expressions above for E (1,2) and |n(1) i are
still valid.
It can be shown that hψ|Ĥ|ψi ≥ E0 for all normalised states |ψi (with equality implying
|ψi = |0i). Thus for the (non-degenerate) ground state, E0(0) + E0(1) is an upper bound on the
exact energy E0 , since it is obtained by using the unperturbed ground state as a trial state
for the full Hamiltonian. It follows that the sum of all higher corrections E0(2) + . . . must be
negative. We can see indeed that E0(2) will always be negative, since for every term in the sum
the numerator is positive and the denominator negative. (The fact that ⟨ψ|Ĥ|ψ⟩ ≥ E0 is the basis of the variational approach to finding the ground-state energy, where we vary a trial state |ψ⟩ to optimise an upper bound on E0.)
5.2.1 Perturbed infinite square well

Probably the simplest example we can think of is an infinite square well with a low step half way across, so that

$$V(x) = \begin{cases} 0 & 0 < x < a/2,\\ V_0 & a/2 < x < a,\\ \infty & \text{elsewhere.} \end{cases}$$

We treat this as a perturbation on the flat-bottomed well, so H(1) = V0 for a/2 < x < a and zero elsewhere.
The ground-state unperturbed wave function is $\psi_0^{(0)} = \sqrt{2/a}\,\sin(\pi x/a)$, with unperturbed energy $E_0^{(0)} = \pi^2\hbar^2/(2ma^2)$. A “low” step will mean $V_0 \ll E_0^{(0)}$. Then we have

$$E_0^{(1)} = \langle\psi_0^{(0)}|H^{(1)}|\psi_0^{(0)}\rangle = \frac{2}{a}\int_{a/2}^{a} V_0\sin^2\frac{\pi x}{a}\,\mathrm{d}x = \frac{V_0}{2}.$$

This problem can be solved semi-analytically: in both regions the solutions are sinusoids, with wavenumbers $k = \sqrt{2mE}/\hbar$ and $k' = \sqrt{2m(E-V_0)}/\hbar$ respectively; satisfying the boundary conditions and matching the wave functions and derivatives at x = a/2 gives the condition $k\cot(ka/2) = -k'\cot(k'a/2)$, which can be solved numerically for E. Below, the exact solution (green, dotted) and $E_0^{(0)} + E_0^{(1)}$ (blue) are plotted; we can see that they start to diverge when V0 is about 5, which is higher than we might have expected (everything is in units of $\hbar^2/(2ma^2) \approx 0.1E_0^{(0)}$).

[Plot: ground-state energy against step height V0, both in units of ℏ²/(2ma²): exact solution (green, dotted) and E0(0) + E0(1) (blue).]

We can also plot the exact wave functions for different step size, and see that for V0 = 10 (the
middle picture, well beyond the validity of first-order perturbation theory) it is significantly
different from a simple sinusoid.

[Plots: exact ground-state wave functions for three different step heights; V0 = 10 is the middle panel.]
5.2.2 Perturbed harmonic oscillator

Another example is the harmonic oscillator, $\hat H = \frac{\hat p^2}{2m} + \frac{1}{2}m\omega^2\hat x^2$, with a perturbing potential $H^{(1)} = \lambda\hat x^2$. The states of the unperturbed oscillator are denoted |n(0)⟩, with energies $E_n^{(0)} = (n+\frac{1}{2})\hbar\omega$. Recalling that in terms of creation and annihilation operators (see section A.1), $\hat x = (x_0/\sqrt2)(\hat a + \hat a^\dagger)$, with $[\hat a, \hat a^\dagger] = 1$ and $x_0 = \sqrt{\hbar/(m\omega)}$, we have

$$E_n^{(1)} = \langle n^{(0)}|H^{(1)}|n^{(0)}\rangle = \frac{x_0^2\lambda}{2}\,\langle n^{(0)}|(\hat a^\dagger)^2 + \hat a^2 + 2\hat a^\dagger\hat a + 1|n^{(0)}\rangle = \frac{\lambda}{m\omega^2}\,\hbar\omega\bigl(n + \tfrac{1}{2}\bigr).$$

The first-order change in the wave function is also easy to compute, as ⟨m(0)|H(1)|n(0)⟩ = 0 unless m = n ± 2. Thus

$$|n^{(1)}\rangle = \sum_{m\neq n}\frac{\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}}|m^{(0)}\rangle = \frac{\hbar\lambda}{2m\omega}\left(\frac{\sqrt{(n+1)(n+2)}}{-2\hbar\omega}\,|(n+2)^{(0)}\rangle + \frac{\sqrt{n(n-1)}}{2\hbar\omega}\,|(n-2)^{(0)}\rangle\right).$$

We can now also calculate the second-order shift in the energy:

$$E_n^{(2)} = \langle n^{(0)}|\hat H^{(1)}|n^{(1)}\rangle = \sum_{m\neq n}\frac{\bigl|\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle\bigr|^2}{E_n^{(0)} - E_m^{(0)}} = \left(\frac{\hbar\lambda}{2m\omega}\right)^2\left(\frac{(n+1)(n+2)}{-2\hbar\omega} + \frac{n(n-1)}{2\hbar\omega}\right) = -\tfrac{1}{2}\left(\frac{\lambda}{m\omega^2}\right)^2\hbar\omega\bigl(n + \tfrac{1}{2}\bigr).$$

We can see a pattern emerging, and of course this is actually a soluble problem, as all that the perturbation has done is change the frequency. Defining $\omega' = \omega\sqrt{1 + 2\lambda/(m\omega^2)}$, we see that the exact solution is

$$E_n = \bigl(n + \tfrac{1}{2}\bigr)\hbar\omega' = \bigl(n + \tfrac{1}{2}\bigr)\hbar\omega\left(1 + \frac{\lambda}{m\omega^2} - \tfrac{1}{2}\Bigl(\frac{\lambda}{m\omega^2}\Bigr)^2 + \ldots\right),$$

in agreement with the perturbative calculation.
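This too can be checked by brute force in a truncated number basis (a sketch; units ℏ = m = ω = 1, with an 80-state truncation that is ample for the lowest levels):

import numpy as np

N, lam = 80, 0.1
n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), 1)         # annihilation operator, a|n> = sqrt(n)|n-1>
x = (a + a.T) / np.sqrt(2)             # x = (x0/sqrt(2))(a + a^dag), with x0 = 1
H = np.diag(n + 0.5) + lam * x @ x

print(np.linalg.eigvalsh(H)[:3])                    # exact (numerical)
print((n[:3] + 0.5) * (1 + lam - 0.5 * lam**2))     # series through second order
print((n[:3] + 0.5) * np.sqrt(1 + 2 * lam))         # exact omega'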

5.3 Degenerate perturbation theory


Shankar 17.3, Mandl 7.3, Griffiths 6.6

None of the formalism that we have developed so far works if Ĥ(0) has degenerate eigenstates. To be precise, it is still fine for the non-degenerate states, but it fails in a subspace of degenerate states if Ĥ(1) is not also diagonal in this subspace. The reason is simple: we assumed from the start that the shifts in the states due to the perturbation would be small. But suppose |1(0)⟩ and |2(0)⟩ are degenerate eigenstates of Ĥ(0); then so are $\sqrt{\tfrac{1}{2}}\bigl(|1^{(0)}\rangle \pm |2^{(0)}\rangle\bigr)$. Now the eigenstates of the full Hamiltonian, |1⟩ and |2⟩, are not degenerate—but which of the possible choices for the eigenstates of Ĥ(0) are they close to? If for example it is the latter (as is often the case in simple examples) then even a tiny perturbation Ĥ(1) will induce a big change in the eigenstates.
The solution is clear: we need to work with a combination of the unperturbed degenerate states
in which Ĥ (1) is diagonal. Sometimes the right choice is obvious from the outset. To use a
physical example, if Ĥ (0) commutes with both L̂ and Ŝ, we have a choice of quantum numbers
to classify the state: our basis can be {|l, ml ; s, ms i} or {|l, s; j, mj i}. If Ĥ (1) fails to commute
with L̂ or Ŝ (while still commuting with L̂2 and Ŝ2 ), then we avoid all problems by simply
choosing the second basis from the start. (If the states {|ωᵢ⟩} are eigenstates of Ω̂, and [Ĥ(1), Ω̂] = 0, then from ⟨ωⱼ|[Ĥ(1), Ω̂]|ωᵢ⟩ = 0 we immediately have ⟨ωⱼ|Ĥ(1)|ωᵢ⟩ = 0 if ωⱼ ≠ ωᵢ.)
In the absence of physical guidance, we need to write down the matrix which is the representa-
tion of Ĥ (1) in the degenerate subspace of the originally-chosen basis, and diagonalise it. The
eigenstates are still eigenstates of Ĥ (0) and are linear combinations of the old basis states. We
then proceed as in the non-degenerate case, having replaced (say) |1(0)⟩ and |2(0)⟩ with the new linear combinations, which we can call |1′(0)⟩ and |2′(0)⟩. The expressions for the energy and state shifts, using the new basis, are as before, Eqs. (5.1, 5.2, 5.3), except that instead of summing over all states m ≠ n, we sum over all states for which Em(0) ≠ En(0). The first-order energy shifts ⟨n′(0)|Ĥ(1)|n′(0)⟩ of the originally-degenerate states are just the eigenvalues of the representation of Ĥ(1) in the degenerate subspace.
For example, suppose Ĥ(0) has many eigenstates but two, |1(0)⟩ and |2(0)⟩, are degenerate, and that Ĥ(1)|1(0)⟩ = β|2(0)⟩ and Ĥ(1)|2(0)⟩ = β|1(0)⟩, with β real; then in this subspace

$$\hat H^{(1)} \longrightarrow \beta\begin{pmatrix} 0&1\\ 1&0 \end{pmatrix},$$

whose eigenvectors are $\sqrt{\tfrac{1}{2}}\binom{1}{-1}$ and $\sqrt{\tfrac{1}{2}}\binom{1}{1}$, with eigenvalues ∓β. So

$$|1'^{(0)}\rangle = \sqrt{\tfrac{1}{2}}\bigl(|1^{(0)}\rangle - |2^{(0)}\rangle\bigr), \qquad |2'^{(0)}\rangle = \sqrt{\tfrac{1}{2}}\bigl(|1^{(0)}\rangle + |2^{(0)}\rangle\bigr),$$
$$E_{1'}^{(1)} = \langle 1'^{(0)}|\hat H^{(1)}|1'^{(0)}\rangle = -\beta, \qquad E_{2'}^{(1)} = \langle 2'^{(0)}|\hat H^{(1)}|2'^{(0)}\rangle = \beta.$$
The expressions for |1′(1)⟩ and E1′(2) are just given by Eqs. (5.2, 5.3) but with primed states where appropriate; since ⟨2′(0)|Ĥ(1)|1′(0)⟩ = 0 by construction, the state |2′(0)⟩ does not appear in the sum over states and there is no problem with vanishing denominators.

5.3.1 Example of degenerate perturbation theory


Suppose we have a three-state basis and an Ĥ(0) whose eigenstates, |1(0)⟩, |2(0)⟩ and |3(0)⟩, have energies E1(0), E2(0) and E3(0) (all initially assumed to be different). A representation of this system is

$$|1^{(0)}\rangle \longrightarrow \begin{pmatrix}1\\0\\0\end{pmatrix}, \quad |2^{(0)}\rangle \longrightarrow \begin{pmatrix}0\\1\\0\end{pmatrix}, \quad |3^{(0)}\rangle \longrightarrow \begin{pmatrix}0\\0\\1\end{pmatrix}, \quad \hat H^{(0)} \longrightarrow \begin{pmatrix}E_1^{(0)}&0&0\\ 0&E_2^{(0)}&0\\ 0&0&E_3^{(0)}\end{pmatrix}.$$

First, let us take E1(0) = E0, E2(0) = 2E0 and E3(0) = 3E0. Now let's consider the perturbation

$$\hat H^{(1)} \longrightarrow a\begin{pmatrix}1&1&1\\ 1&1&1\\ 1&1&1\end{pmatrix}.$$
Then we can show that, to first order in a,

$$E_1^{(1)} = E_2^{(1)} = E_3^{(1)} = a,$$

$$|1^{(1)}\rangle = -\frac{a}{E_0}|2^{(0)}\rangle - \frac{a}{2E_0}|3^{(0)}\rangle \longrightarrow \frac{a}{2E_0}\begin{pmatrix}0\\-2\\-1\end{pmatrix}, \qquad |2^{(1)}\rangle = \frac{a}{E_0}|1^{(0)}\rangle - \frac{a}{E_0}|3^{(0)}\rangle \longrightarrow \frac{a}{E_0}\begin{pmatrix}1\\0\\-1\end{pmatrix},$$

$$|3^{(1)}\rangle = \frac{a}{2E_0}|1^{(0)}\rangle + \frac{a}{E_0}|2^{(0)}\rangle \longrightarrow \frac{a}{2E_0}\begin{pmatrix}1\\2\\0\end{pmatrix}, \qquad E_1^{(2)} = -\frac{3a^2}{2E_0}, \quad E_2^{(2)} = 0, \quad E_3^{(2)} = \frac{3a^2}{2E_0}.$$

Note that all of these terms are just the changes in the energies and states, which have to be
added to the zeroth order ones to get expressions which are complete to the given order.
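For a definite value of a this is a three-line numerical check (a sketch; units E0 = 1, with a = 0.2 chosen arbitrarily):

import numpy as np

a = 0.2
H0 = np.diag([1.0, 2.0, 3.0])
H1 = a * np.ones((3, 3))

print(np.linalg.eigvalsh(H0 + H1))                     # exact eigenvalues
print(1 + a - 1.5 * a**2, 2 + a, 3 + a + 1.5 * a**2)   # expansion to second order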
In this case the exact eigenvalues of Ĥ (0) + Ĥ (1) can only be found numerically. The left-hand
plot below shows the energies as a function of a, both in units of E0 , with the dashed lines
being the expansion to second order:

The right-hand plot above shows the partially degenerate case which we will now consider.
Let E1(0) = E2(0) = E0 , and E3(0) = 2E0 . We note that |1(0) i and |2(0) i are just two of an
infinite set of eigenstates with the same energy E1(0) , since any linear combination of them is
another eigenstate. We have to make the choice which diagonalises Ĥ(1) in this subspace: in this subspace

$$\hat H^{(1)} \longrightarrow a\begin{pmatrix}1&1\\ 1&1\end{pmatrix},$$

whose eigenvectors are $\sqrt{\tfrac{1}{2}}\binom{1}{-1}$ and $\sqrt{\tfrac{1}{2}}\binom{1}{1}$, with eigenvalues 0 and 2a. So

$$|1'^{(0)}\rangle = \frac{1}{\sqrt2}\bigl(|1^{(0)}\rangle - |2^{(0)}\rangle\bigr) \quad\text{and}\quad |2'^{(0)}\rangle = \frac{1}{\sqrt2}\bigl(|1^{(0)}\rangle + |2^{(0)}\rangle\bigr).$$

These new states don't diagonalise Ĥ(1) completely, of course. We have ⟨3(0)|Ĥ(1)|1′(0)⟩ = 0 and ⟨3(0)|Ĥ(1)|2′(0)⟩ = √2 a. Thus

$$E_{1'}^{(1)} = 0, \qquad E_{2'}^{(1)} = 2a, \qquad E_3^{(1)} = a,$$

$$|1'^{(1)}\rangle = 0, \qquad |2'^{(1)}\rangle = -\frac{\sqrt2\,a}{E_0}|3^{(0)}\rangle \longrightarrow -\frac{\sqrt2\,a}{E_0}\begin{pmatrix}0\\0\\1\end{pmatrix}, \qquad |3^{(1)}\rangle = \frac{\sqrt2\,a}{E_0}|2'^{(0)}\rangle \longrightarrow \frac{a}{E_0}\begin{pmatrix}1\\1\\0\end{pmatrix},$$

$$E_{1'}^{(2)} = 0, \qquad E_{2'}^{(2)} = -\frac{2a^2}{E_0}, \qquad E_3^{(2)} = \frac{2a^2}{E_0}.$$
In this case it is easy to show that |1′(0)⟩ is actually an eigenstate of Ĥ(1), so there will be no change to any order. Here we can check our results against the exact eigenvalues and see that they are correct; for that purpose it is useful to write Ĥ(1) in the new basis (Ĥ(0) of course being unchanged):

$$\hat H^{(1)} \longrightarrow a\begin{pmatrix}0&0&0\\ 0&2&\sqrt2\\ 0&\sqrt2&1\end{pmatrix}.$$

One final comment: we calculated

$$|3^{(1)}\rangle = \frac{\langle 1'^{(0)}|\hat H^{(1)}|3^{(0)}\rangle}{E_0}|1'^{(0)}\rangle + \frac{\langle 2'^{(0)}|\hat H^{(1)}|3^{(0)}\rangle}{E_0}|2'^{(0)}\rangle.$$

But we could equally have used the undiagonalised states |1(0)⟩ and |2(0)⟩. This can be seen if we write

$$|3^{(1)}\rangle = \frac{1}{E_0}\Bigl(|1'^{(0)}\rangle\langle 1'^{(0)}| + |2'^{(0)}\rangle\langle 2'^{(0)}|\Bigr)\hat H^{(1)}|3^{(0)}\rangle$$

and spot that the term in brackets is the identity operator in the degenerate subspace, which can equally well be written $\bigl(|1^{(0)}\rangle\langle 1^{(0)}| + |2^{(0)}\rangle\langle 2^{(0)}|\bigr)$. Of course for a problem in higher dimensions, there would be other terms coming from the non-degenerate states |m(0)⟩ as well.
5.4 The fine structure of hydrogen
Shankar 17.3, Mandl 7.4, Griffiths 6.3, Gasiorowicz 12.1,2,4

5.4.1 Pure Coulomb potential and nomenclature


Although the Schrödinger equation with a Coulomb potential reproduces the Bohr model and
gives an excellent approximation to the energy levels of hydrogen, the true spectrum was known
to be more complicated right from the start. The small deviations are termed “fine structure”
and they are of order 10−4 compared with the ground-state energy (though the equivalent terms
for many-electron atoms can be sizable). Hence perturbation theory is an excellent framework
in which to consider them.
First a reminder of the results of the unperturbed calculation. The Coulomb potential is
V(r) = −ℏcα/r (written in terms of the dimensionless α = e²/(4πε0ℏc) ≈ 1/137); the energies turn out to depend only on the principal quantum number n and not on l: En = −(1/n²)ERy, where ERy = ½α²mc² = 13.6 eV (with m the reduced mass of the electron–proton system). For a given n all values of l < n are allowed, giving $\sum_{l=0}^{n-1}(2l+1) = n^2$ degenerate states. The wave function of the ground state is proportional to e^{−r/a0}, where a0 is the Bohr radius, a0 = ℏc/(mc²α). Results for other hydrogen-like single-electron atoms (with nuclear charge Z) can be obtained by replacing α with Zα and m with the appropriate reduced mass. Lists of wave functions are given in the background section.
The states of the system are tensor-direct-product states of spatial and spin states, written |n, l, ml⟩ ⊗ |1/2, ms⟩ or |n, l, ml, ms⟩ (s = 1/2 is usually suppressed). Since the Hamiltonian has no spin dependence, the spatial state is just the one discussed previously. The alternative basis |n, l; j, mj⟩ is often used for reasons which will become clear; subshells of states of a given {l, s, j} are referred to using the notation $^{(2s+1)}l_j$, with l = s, p, d, f . . ., so for example $^2f_{5/2}$ (or just $f_{5/2}$) has s = 1/2 of course, l = 3 and j = l − s = 5/2. An example of such a state is

$$|n,1;\tfrac{1}{2},\tfrac{1}{2}\rangle = \sqrt{\tfrac{2}{3}}\,|n,1,1\rangle\otimes|\tfrac{1}{2},-\tfrac{1}{2}\rangle - \sqrt{\tfrac{1}{3}}\,|n,1,0\rangle\otimes|\tfrac{1}{2},\tfrac{1}{2}\rangle.$$
(Compared to the section on addition of orbital and spin angular momenta, the only difference
is that in calculating matrix elements of the states |n, l, ml i, there is a radial integral to be done
as well as the angular one.) With the pure Coulomb potential, we can use whichever basis we
like.

5.4.2 Fine structure: the lifting of l degeneracy


There are two effects to be considered. One arises from the use of the non-relativistic expression p²/2m for the kinetic energy, which is only the first term in an expansion of $\sqrt{(mc^2)^2 + (pc)^2} - mc^2$. The first correction term is −p⁴/(8m³c²), and its matrix elements are most easily calculated using the trick of writing it as $-\frac{1}{2mc^2}(\hat H^{(0)} - V_C(r))^2$, where Ĥ(0) is the usual Hamiltonian with a Coulomb potential. Now in principle we need to be careful here, because Ĥ(0) is highly degenerate (energies depend only on n and not on l or m). However we have ⟨n, l′, m′l|(Ĥ(0) − VC(r))²|n, l, ml⟩ = ⟨n, l′, m′l|(En(0) − VC(r))²|n, l, ml⟩, and since in this form the operator is spherically symmetric and spin-independent, it can't link states of different l, ml or ms. So the basis {|n, l, ml, ms⟩} already diagonalises Ĥ(1) in each subspace of states with the same n, and we have no extra work to do here. (We are omitting the superscript (0) on the hydrogenic states, here and below.)
The final result for the kinetic-energy effect is

$$\langle n,l,m_l|\hat H^{(1)}_{\rm KE}|n,l,m_l\rangle = -\frac{1}{2mc^2}\left((E_n^{(0)})^2 + 2E_n^{(0)}\hbar c\alpha\,\langle n,l|\tfrac{1}{r}|n,l\rangle + (\hbar c\alpha)^2\langle n,l|\tfrac{1}{r^2}|n,l\rangle\right) = -\frac{\alpha^2|E_n^{(0)}|}{n}\left(\frac{2}{2l+1} - \frac{3}{4n}\right)$$

In calculating this the relation ERy = ~cα/(2a0 ) is useful. The matrix elements involve radial
integrals only; tricks for calculating these are explained in Shankar qu. 17.3.4; they are tabu-
lated in the background section. Details of the algebra for this and the following calculation
are given here.
The second correction is the spin-orbit interaction:

$$\hat H^{(1)}_{\rm SO} = \frac{1}{2m^2c^2}\,\frac{1}{r}\frac{\mathrm{d}V_C}{\mathrm{d}r}\;\hat{\mathbf L}\cdot\hat{\mathbf S}$$

In this expression L̂ and Ŝ are the vector operators for orbital and spin angular momentum
respectively. The usual (somewhat hand-waving) derivation talks of the electron seeing a mag-
netic field from the proton which appears to orbit it; the magnetic moment of the electron then
prefers to be aligned with this field. This gives an expression which is too large by a factor of
2; an exact derivation requires the Dirac equation.
This time we will run into trouble with the degeneracy of Ĥ (0) unless we do some work first.
Since the Coulomb potential is spherically symmetric, there is no mixing of states of the same
n but different l. However states of different {ml , ms } will mix, since L̂ · Ŝ does not commute
with L̂z and Ŝz . The trick of writing 2L̂ · Ŝ = Ĵ2 − L̂2 − Ŝ2 where Ĵ = L̂ + Ŝ tells us that L̂ · Ŝ
does commute with Ĵ² and Ĵz, so we should work in the basis |n, l; j, mj⟩ instead. (The label s = 1/2 is suppressed.) This basis diagonalises the spin-orbit perturbation (and is an equally acceptable basis, giving the same result, for the relativistic correction term above).
Then

$$\langle n,l;j,m_j|\hat H^{(1)}_{\rm SO}|n,l;j,m_j\rangle = \frac{1}{2m^2c^2}\,\langle n,l|\frac{1}{r}\frac{\mathrm{d}V_C}{\mathrm{d}r}|n,l\rangle\;\langle l;j,m_j|\tfrac{1}{2}\bigl(\hat{\mathbf J}^2 - \hat{\mathbf L}^2 - \hat{\mathbf S}^2\bigr)|l;j,m_j\rangle$$
$$= \frac{\hbar c\alpha}{4m^2c^2}\,\langle n,l|\frac{1}{r^3}|n,l\rangle\;\hbar^2\bigl(j(j+1) - l(l+1) - \tfrac{3}{4}\bigr) = \frac{\alpha^2|E_n^{(0)}|}{n}\left(\frac{2}{2l+1} - \frac{2}{2j+1}\right),$$

where in the first line we have separated the radial integral from the angular-momentum matrix elements, and where a fair amount of algebra links the last two lines. (This expression is only correct for l ≠ 0. However there is another separate effect, the Darwin term, which only affects s-waves and whose expectation value is just the same as above (with l = 0 and j = 1/2), so we can use this for all l. The Darwin term can only be understood in the context of the Dirac equation.)
So finally

$$E_{nj}^{(1)} = \frac{\alpha^2|E_n^{(0)}|}{n}\left(\frac{3}{4n} - \frac{2}{2j+1}\right).$$

The degeneracy of all states with the same n has been broken. States of given j with l = j ± 1/2 are still degenerate, a result that persists to all orders in the Dirac equation (where in any case
are still degenerate, a result that persists to all orders in the Dirac equation (where in any case
orbital angular momentum is no longer a good quantum number.) So the eight n = 2 states
are split by 4.5 × 10−5 eV, with the 2 p3/2 state lying higher that the degerate 2 p1/2 and 2 s1/2
states.
Two other effects should be mentioned here. One is the hyperfine splitting. The proton has a
magnetic moment, and the energy of the atom depends on whether the electon spin is aligned
with it or not— more precisely, whether the total spin of the electon and proton is 0 or 1. The
anti-aligned case has lower energy (since the charges are opposite), and the splitting for the
1s state is 5.9 × 10−6 eV. (It is around a factor of 10 smaller for any of the n = 2 states.)
Transitions between the two hyperfine states of 1s hydrogen give rise to the 21 cm microwave
radiation which is a signal of cold hydrogen gas in the galaxy and beyond.
The final effect is called the Lamb shift. It cannot be accounted for in quantum mechanics, but
only in quantum field theory.

The diagrams above show corrections to the simple Coulomb force which would be represented
by the exchange of a single photon between the proton and the electron. The most notable
effect on the spectrum of hydrogen is to lift the remaining degeneracy between the 2 p1/2 and
2
s1/2 states, so that the latter is higher by 4.4 × 10−6 eV.
Below the various corrections to the energy levels of hydrogen are shown schematically. The
gap between the n = 1 and n = 2 shells is suppressed, and the Lamb and hyperfine shifts are
exaggerated in comparison with the fine-structure. The effect of the last two on the 2 p3/2 level
is not shown.
5.5 The Zeeman effect: hydrogen in an external magnetic field
(Shankar 14.5), Mandl 7.5, Griffiths 6.4, Gasiorowicz 12.3

(Since we will not ignore spin, this whole section is about the so-called anomalous Zeeman
effect. The so-called normal Zeeman effect cannot occur for hydrogen, but is a special case
which pertains in certain multi-electron atoms for which the total spin is zero.)
With an external magnetic field along the z-axis, the perturbing Hamiltonian is Ĥ (1) = −µ·B =
(µB B/~)(L̂z + 2Ŝz ). The factor of 2 multiplying the spin is of course the famous g-factor for
spin, as predicted by the Dirac equation. Clearly this is diagonalised in the {|n, l, ml , ms i}
basis (s = 1/2 suppressed in the labelling as usual). Then $E^{(1)}_{nlm_lm_s} = \mu_B B(m_l + 2m_s)$. If, for example, l = 2, there are 7 possible values of ml + 2ms, between −3 and 3, with −1, 0 and 1 being degenerate (5 × 2 = 10 states in all).
This is fine if the magnetic field is strong enough that we can ignore the fine structure discussed
in the last section. But typically it is not. For a weak field the fine structure effects will be
stronger, so we will consider them part of Ĥ (0) for the Zeeman problem; our basis is then
{|n, l; j, mj i} and states of the same j but different l are degenerate. This degeneracy however
is not a problem, because the operator (L̂z + 2Ŝz ) does not connect states of different l. So we
can use non-degenerate perturbation theory, with

$$E^{(1)}_{nljm_j} = \frac{\mu_B B}{\hbar}\langle n,l;j,m_j|\hat L_z + 2\hat S_z|n,l;j,m_j\rangle = \mu_B B m_j + \frac{\mu_B B}{\hbar}\langle n,l;j,m_j|\hat S_z|n,l;j,m_j\rangle.$$

If Ĵz is conserved but L̂z and Ŝz are not, the expectation values of the latter two might be expected to be proportional to the first, modified by the average degree of alignment: ⟨Ŝz⟩ = ℏmj ⟨Ŝ·Ĵ⟩/⟨Ĵ²⟩. (This falls short of a proof but is in fact correct, and follows from the Wigner-Eckart theorem as explained in section 4.6. A similar expression holds for L̂z.) Using 2Ŝ·Ĵ = Ŝ² + Ĵ² − L̂² gives

$$E^{(1)}_{nljm_j} = \mu_B B m_j\left(1 + \frac{j(j+1) - l(l+1) + s(s+1)}{2j(j+1)}\right) = \mu_B B m_j\, g_{jls}.$$

Of course for hydrogen s(s+1) = 3/4, but the expression above, which defines the Landé g-factor, is actually more general and hence I've left it with an explicit s. For hydrogen, j = l ± 1/2 and so g = 1 ± 1/(2l+1).
Thus states of a given j (already no longer degenerate due to fine-structure effects) are further
split into (2j + 1) equally-spaced levels. Since spectroscopy involves observing transitions
between two states, both split but by different amounts, the number of spectral lines can be
quite large.
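The g-factor formula is trivially coded (a sketch), and for s = 1/2 it reproduces g = 1 ± 1/(2l+1):

def g_lande(j, l, s):
    return 1 + (j*(j + 1) - l*(l + 1) + s*(s + 1)) / (2*j*(j + 1))

for l in (1, 2, 3):
    print(l, g_lande(l - 0.5, l, 0.5), 1 - 1/(2*l + 1))   # j = l - 1/2
    print(l, g_lande(l + 0.5, l, 0.5), 1 + 1/(2*l + 1))   # j = l + 1/2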
5.6 The Stark effect: hydrogen in an external electric field
Shankar 17.2,3, Gasiorowicz 11.3, (Griffiths problems 6.35,36)

In this section we consider the energy shifts of the levels of hydrogen in an external electric field, taken to be along the z-axis: E = 𝓔 eᴢ (we use 𝓔 for the electric field strength to distinguish it from the energy). We will work in the strong-field limit and ignore fine structure; furthermore, the dynamics are then independent of the spin, so we can ignore ms; the unperturbed eigenstates can be taken to be |n, l, ml⟩.
The perturbing Hamiltonian is Ĥ(1) = |e|𝓔z. Now it is immediately obvious that, for any state, ⟨n, l, ml|z|n, l, ml⟩ = 0: the probability density is symmetric under reflection in the xy-plane, but z is antisymmetric. So for the ground state the first-order energy shift vanishes. (We will return to excited states, but think now about why we can't conclude the same for them.)
This is not surprising, because an atom of hydrogen in its ground state has no electric dipole
moment: there is no p · E term to match the µ · B one.
To calculate the second-order energy shift we need ⟨n′, l′, m′l|z|1, 0, 0⟩. We can write z as r cos θ or $\sqrt{4\pi/3}\; rY_1^0(\theta,\phi)$. The lack of dependence on φ means that ml can't change, and in addition l can only change by one unit, so ⟨n′, l′, m′|z|1, 0, 0⟩ = δl′1 δm′0 ⟨n′, 1, 0|z|1, 0, 0⟩. However this isn't the whole story: there are also states in the continuum, which we will denote |k⟩ (though these are not plane waves, since they see the Coulomb potential). So we have
$$E^{(2)}_{100} = (e\mathcal{E})^2\sum_{n>1}\frac{|\langle n,1,0|z|1,0,0\rangle|^2}{E_1^{(0)} - E_n^{(0)}} + (e\mathcal{E})^2\int \mathrm{d}^3k\,\frac{|\langle \mathbf k|z|1,0,0\rangle|^2}{E_1^{(0)} - E_k^{(0)}}$$

(We use E1 for E100 ). This is a compact expression, but it would be very hard to evaluate
directly. We can get a crude estimate of the size of the effect by simply replacing all the
denominators by E1(0) − E2(0) ; this overestimates the magnitude of every term but the first, for
which it is exact, so it will give an upper bound on the magnitude of the shift. Then
$$E_1^{(2)} > \frac{(e\mathcal{E})^2}{E_1^{(0)} - E_2^{(0)}}\left(\sum_{n\geq1}\sum_{l,m_l}\langle 1,0,0|z|n,l,m_l\rangle\langle n,l,m_l|z|1,0,0\rangle + \int \mathrm{d}^3k\,\langle 1,0,0|z|\mathbf k\rangle\langle \mathbf k|z|1,0,0\rangle\right)$$
$$= \frac{(e\mathcal{E})^2}{E_1^{(0)} - E_2^{(0)}}\,\langle 1,0,0|z^2|1,0,0\rangle = -\frac{4(e\mathcal{E}a_0)^2}{3E_{\rm Ry}} = -\frac{8(e\mathcal{E})^2a_0^3}{3\hbar c\alpha},$$

where we have included n = 1 and other values of l and ml in the sum because the matrix elements vanish anyway, and then used the completeness relation involving all the states, bound and unbound, of the hydrogen atom.
There is a trick for evaluating the exact result, which gives 9/4 rather than 8/3 as the constant (see Shankar), so our estimate of the magnitude is fairly good. (For comparison with other ways of writing the shift, note that (e𝓔)²/(ℏcα) = 4πε0𝓔²—or in Gaussian units, just 𝓔²—see the background section.)
Having argued above that the hydrogen atom has no electric dipole, how come we are getting
a finite effect at all? The answer of course is that the field polarises the atom, and the induced
dipole can then interact with the field.
Now for the first excited state. We can’t conclude that the first-order shift vanishes here, of
course, because of degeneracy: there are four states and Ĥ (1) is not diagonal in the usual basis
|2, l, ml i (with l = 0, 1). In fact as we argued above it only connects |2, 0, 0i and |2, 1, 0i, so the
states |2, 1, ±1⟩ decouple and their first-order shifts do vanish. Using ⟨2, 1, 0|z|2, 0, 0⟩ = −3a0, we have in this subspace (with |2, 0, 0⟩ = (1, 0)ᵀ and |2, 1, 0⟩ = (0, 1)ᵀ)

$$\hat H^{(1)} = -3a_0|e|\mathcal{E}\begin{pmatrix}0&1\\ 1&0\end{pmatrix},$$

and the eigenstates are $\sqrt{1/2}\,(|2,0,0\rangle \pm |2,1,0\rangle)$ with eigenvalues ∓3a0|e|𝓔. So the degenerate quartet is split into a triplet of levels (with the unshifted one doubly degenerate).
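Diagonalising the 2×2 Stark matrix numerically (a sketch; energies in units of a0|e|𝓔):

import numpy as np

H1 = -3.0 * np.array([[0.0, 1.0], [1.0, 0.0]])  # basis {|2,0,0>, |2,1,0>}
vals, vecs = np.linalg.eigh(H1)
print(vals)      # [-3. 3.]: shifts of -+3 a0|e|E
print(vecs.T)    # rows ~ (|2,0,0> + |2,1,0>)/sqrt2 and (|2,0,0> - |2,1,0>)/sqrt2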
In reality the degeneracy of the n = 2 states is lifted by the fine-structure splitting; are these results then actually relevant? They will be approximately true if the field is large; at an intermediate strength both fine-structure and Stark effects should be treated together as a perturbation on the pure Coulomb states. For very weak fields degenerate perturbation theory can be applied in the space of j = 1/2 states (2s₁/₂ and 2p₁/₂), which are shifted by ±√3 a0|e|𝓔. The j = 3/2 states though have no first-order shift.
Quantum Measurement

6.1 The Einstein-Podolsky-Rosen “paradox” and Bell's inequalities

Mandl 6.3, Griffiths 12.2, Gasiorowicz 20.3,4

In 1935 Einstein, along with Boris Podolsky and Nathan Rosen, published a paper entitled
“Can quantum-mechanical description of physical reality be considered complete?” By this
stage Einstein had accepted that the uncertainty principle did place fundamental restrictions
on what one could discover about a particle through measurements conducted on it. The
question however was whether the measuring process actually somehow brought the properties
into being, or whether they existed all along but without our being able to determine what
they were. If the latter was the case there would be “hidden variables” (hidden from the
experimenter) and the quantum description—the wave function—would not be a complete
description of reality. Till the EPR paper came out many people dismissed the question as
undecidable, but the EPR paper put it into much sharper focus. Then in 1964 John Bell
presented an analysis of a variant of the EPR paper which showed that the question actually
was decidable. Many experiments have been done subsequently, and they have come down
firmly in favour of a positive answer to the question posed in EPR’s title.
The original EPR paper used position and momentum as the two properties which couldn’t be
simultaneously known (but might still have hidden definite values), but subsequent discussions
have used components of spin instead, and we will do the same. But I will be quite lax about
continuing to refer to “the EPR experiment”.
There is nothing counter-intuitive or unclassical about the fact that we can produce a pair of
particles whose total spin is zero, so that if we find one to be spin-up along some axis, the other
must be spin down. All the variants of the experiment to which we will refer can be considered
like this: such a pair of electrons is created travelling back-to-back at one point, and travel to
distant measuring stations where each passes through a Stern-Gerlach apparatus (an “SG”) of
a certain orientation in the plane perpendicular to the electrons’ momentum.
As I say there is nothing odd about the fact that when the two SGs have the same orientation
the two sequences recorded at the two stations are perfectly anti-correlated (up to measurement
errors). But consider the case where they are orientated at 90◦ with respect to each other as
below: Suppose for a particular pair of electrons, we measure number 1 to be spin up in the
z-direction and number 2 to be spin down in the x-direction. Now let’s think about what would
have happened if we had instead measured the spin in the x-direction of particle 1. Surely, say
EPR, we know the answer. Since particle 2 is spin down in the x-direction, particle 1 would
have been spin up. So now we know that before it reached the detector, particle 1 was spin up

in the z-direction (because that’s what we got when we measured it) and also spin up in the
x-direction (because it is anti-correlated with particle 2 which was spin down). We have beaten
the uncertainty principle, if only retrospectively.
But of course we know we can’t construct a wave function with these properties. So is there
more to reality than the wave function? Bell’s contribution was to show that the assumption
that the electron really has definite values for different spin components—if you like, it has
an instruction set which tells it which way to go through any conceivable SG that it might
encounter—leads to testable predictions.
For Bell’s purposes, we imagine that the two measuring stations have agreed that they will set
their SG to one of 3 possible settings. Setting A is along the z-direction, setting C is along
the x direction, and setting B is at 45◦ to both. In the ideal set-up, the setting is chosen just
before the electron arrives, sufficiently late that no possible causal influence (travelling at not
more than the speed of light) can reach the other lab before the measurements are made. The
labs record their results for a stream of electrons, and then get together to classify each pair
as, for instance, (A ↑, B ↓) or (A ↑, C ↑) or (B ↑, B ↓) (the state of electron 1 being given
first). Then they look at the number of pairs with three particular classifications: (A ↑, B ↑),
(B ↑, C ↑) and (A ↑, C ↑). Bell’s inequality says that, if the way the electrons will go through
any given orientation is set in advance,

N (A ↑, B ↑) + N (B ↑, C ↑) ≥ N (A ↑, C ↑)

where N (A ↑, B ↑) is the number of (A ↑, B ↑) pairs etc.


Now let’s prove that.
Imagine any set of objects (or people!) with three distinct binary properties a, b and c—say blue or brown eyes, right- or left-handedness, and male or female (ignoring messy reality in which there are some people not so easily classified). In each case, let us denote the two possible values as A and Ā etc (Ā being “not A” in the sense it is used in logic, so if A is blue-eyed, Ā is brown-eyed). Then every object is classified by its values for the three properties as, for instance, ABC or ĀBC or ĀB̄C . . .. The various possibilities are shown on a Venn diagram below (sorry that the bars are through rather than over the letters. . .). In any given collection of objects there will be no fewer than zero objects in each subset, obviously: all the Ns are greater than or equal to zero. Now we want to prove that the number of objects which are AB̄ (irrespective of c) plus those that are BC̄ (irrespective of a) is greater than or equal to the number which are AC̄ (irrespective of b):

$$N(A\bar B) + N(B\bar C) \geq N(A\bar C)$$

This is obvious from the diagram below, in which the union of the blue and green sets fully
contains the red set.
A logical proof is as follows:

$$N(A\bar B) + N(B\bar C) = N(A\bar BC) + N(A\bar B\bar C) + N(AB\bar C) + N(\bar AB\bar C) = N(A\bar BC) + N(A\bar C) + N(\bar AB\bar C) \geq N(A\bar C)$$

To apply this to the spins we started with, we identify A with A↑ and Ā with A↓. Now if an electron is A↑B↓ (whatever C might be) then its partner must be A↓B↑, and so the result of a measurement A on the first and B on the second will be (A↑, B↑). Hence the inequality for the spin case is a special case of the general one. We have proved Bell's inequality assuming, remember, that the electrons really do have these three defined properties even if, for a single electron, we can only measure one of them.
Now let’s consider what quantum mechanics would say. A spin-0 state of two identical particles
is q  
|S = 0i = 12 | ↑i ⊗ | ↓i − | ↓i ⊗ | ↑i
and this is true whatever the axis we have chosen to define “up” and “down”. As expected, if we
choose the same measurement direction at the two stations (eg both A), the first measurement
selects one of the two terms and so the second measurement, on the other particle, always gives the opposite result. (Recall this is the meaning of the 2-particle wave function being non-separable, or entangled.)
What about different measurement directions at the two stations (eg A and B)? Recall the relation between the spin-up and spin-down states for two directions in the xz-plane, where θ is the angle between the two directions:

$$|\theta,\uparrow\rangle = \cos\tfrac{\theta}{2}|0,\uparrow\rangle + \sin\tfrac{\theta}{2}|0,\downarrow\rangle, \qquad |0,\uparrow\rangle = \cos\tfrac{\theta}{2}|\theta,\uparrow\rangle - \sin\tfrac{\theta}{2}|\theta,\downarrow\rangle,$$
$$|\theta,\downarrow\rangle = -\sin\tfrac{\theta}{2}|0,\uparrow\rangle + \cos\tfrac{\theta}{2}|0,\downarrow\rangle, \qquad |0,\downarrow\rangle = \sin\tfrac{\theta}{2}|\theta,\uparrow\rangle + \cos\tfrac{\theta}{2}|\theta,\downarrow\rangle.$$

(We previously showed this for the first axis being the z-axis but, up to overall phases, it is true for any pair.) For A and B, or for B and C, θ = 45°; for A and C it is 90°.
Consider randomly oriented spin-zero pairs, with settings A, B and C equally likely. If the first SG is set to A and the second to B (which happens 1 time in 9), there is a probability of 1/2 of getting A↑ at the first station. But then we know that the state of the second electron is |A↓⟩, and the probability that we will measure spin-up in the B direction is |⟨B↑|A↓⟩|² = sin²(π/8). Thus the fraction of pairs which are (A↑, B↑) is ½ sin² 22.5° = 0.073, and similarly for (B↑, C↑). But the fraction which are (A↑, C↑) is ½ sin² 45° = 0.25. So the prediction of quantum mechanics for 9N₀ measurements is

$$N(A\uparrow,B\uparrow) + N(B\uparrow,C\uparrow) = 0.146\,N_0 < N(A\uparrow,C\uparrow) = 0.25\,N_0.$$

So in quantum mechanics, Bell's inequality is violated. The experiment has been done many
times, starting with the pioneering work of Alain Aspect, and every time the predictions of
quantum mechanics are upheld and Bell’s inequality is violated. (Photons rather than electrons
are often used. Early experiments fell short of the ideal in many ways, but as loopholes have
been successively closed the result has become more and more robust.)
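The quantum-mechanical counting above is a one-liner to verify (a sketch):

import numpy as np

def frac_up_up(theta):
    # fraction of singlet pairs giving (up, up) with analysers theta apart
    return 0.5 * np.sin(theta / 2)**2

lhs = frac_up_up(np.pi / 4) + frac_up_up(np.pi / 4)   # (A,B) and (B,C): 45 degrees
rhs = frac_up_up(np.pi / 2)                           # (A,C): 90 degrees
print(lhs, rhs, lhs >= rhs)   # 0.1464..., 0.25, False -- the inequality is violated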
It seems pretty inescapable that the electrons have not “decided in advance” how they will
pass through any given SG. Do we therefore have to conclude that the measurement made at
station 1 is responsible for collapsing the wave function at station 2, even if there is no time
for light to pass between the two? It is worth noting that no-one has shown any way to use
this set-up to send signals between the stations; on their own they both see a totally random
succession of results. It is only in the statistical correlation that the weirdness shows up...

In writing this section I found this document by David Harrison of the University of Toronto
very useful.
As well as the textbook references given at the start, further discussions can be found in N.
David Mermin’s book Boojums all the way through (CUP 1990) and in John S. Bell’s Speakable
and unspeakable in quantum mechanics (CUP 1987).
Mathematical background and revision

A.1 Series solution of Hermite's equation and the harmonic oscillator
Shankar 7.3
Griffiths 2.3.2

We consider a particle moving in a 1D quadratic potential V(x) = ½mω²x², like a mass on a


spring. The Hamiltonian operator is

$$\hat H = \frac{\hat p^2}{2m} + \frac{1}{2}m\omega^2\hat x^2 \qquad(\mathrm{A.1})$$
We will work in rescaled dimensionless coordinates, defining the length scale $x_0 = \sqrt{\hbar/m\omega}$, so x̂ → x0 y and p̂ → (−iℏ/x0) d/dy. The energy scale is ½mω²x0² = ½ℏω; we write the energy as ½ℏωE, with E dimensionless. We are looking for wave functions φ(y) which satisfy

$$-\frac{\mathrm{d}^2\phi}{\mathrm{d}y^2} + y^2\phi = E\phi. \qquad(\mathrm{A.2})$$
If we write φ(y) ≡ f(y)e^{−y²/2}, this can be rewritten as

$$\frac{\mathrm{d}^2f}{\mathrm{d}y^2} - 2y\frac{\mathrm{d}f}{\mathrm{d}y} + (E-1)f = 0. \qquad(\mathrm{A.3})$$
This is Hermite's differential equation. If we look for a series solution of the form $f(y) = \sum_{j=0}^{\infty} c_j y^j$, we get

$$\sum_{j=2}^{\infty} j(j-1)c_j y^{j-2} - 2\sum_{j=1}^{\infty} jc_j y^j + (E-1)\sum_{j=0}^{\infty} c_j y^j = 0$$
$$\Rightarrow\quad \sum_{j=0}^{\infty}\Bigl((j+1)(j+2)c_{j+2} + (E-1-2j)c_j\Bigr)y^j = 0, \qquad(\mathrm{A.4})$$

where we have shifted the summation index in the first sum before relabelling it j. The only way a polynomial can vanish for all y is if all the coefficients vanish, so we have a recurrence relation:

$$(j+1)(j+2)c_{j+2} + (E-1-2j)c_j = 0. \qquad(\mathrm{A.5})$$

Given c0 and c1, we can construct all other coefficients from this equation, for any E. We obtain two independent solutions, as expected for a second-order differential equation: even solutions with c1 = 0 and odd ones with c0 = 0.
However, we need the wave function to be normalisable (square integrable), which means that
it tends to 0 as x → ±∞. In general an infinite polynomial times a Gaussian will not satisfy
this, and these solutions are not physically acceptable. If we look again at equation (A.5),
though, we see that if E = 1 + 2n for some integer n ≥ 0, then cn+2 , cn+4 , cn+6 . . . are all zero.
Thus for E = 1, 5, 9 . . . we have finite even polynomials, and for E = 3, 7, 11 . . . we have finite
odd polynomials. These are called the Hermite polynomials.
Rewriting (A.5) with E = 1 + 2n as

$$c_{j+2} = \frac{2(j-n)}{(j+1)(j+2)}\,c_j, \qquad(\mathrm{A.6})$$

we have, for instance, for n = 5,

$$c_3 = 2(1-5)c_1/(2\cdot3) = -4c_1/3, \quad c_5 = 2(3-5)c_3/(4\cdot5) = -c_3/5 = 4c_1/15, \quad c_7 = c_9 = \ldots = 0, \qquad(\mathrm{A.7})$$

and H5(y) = c1(4y⁵ − 20y³ + 15y)/15. The conventional normalisation uses 2ⁿ for the coefficient of the highest power of y, which would require c1 = 120, giving H5(y) = 32y⁵ − 160y³ + 120y. The first few are:

$$H_0(y) = 1;\quad H_1(y) = 2y;\quad H_2(y) = 4y^2-2;\quad H_3(y) = 8y^3-12y;\quad H_4(y) = 16y^4-48y^2+12. \qquad(\mathrm{A.8})$$
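The recurrence (A.6) is easily coded; the sketch below builds the coefficients, rescales so that the leading coefficient is 2ⁿ, and checks the result against NumPy's Hermite module:

import numpy as np
from numpy.polynomial import hermite

def hermite_coeffs(n):
    c = np.zeros(n + 1)
    c[n % 2] = 1.0                       # seed with c0 (even n) or c1 (odd n)
    for j in range(n % 2, n - 1, 2):     # c_{j+2} = 2(j-n)/((j+1)(j+2)) c_j
        c[j + 2] = 2 * (j - n) / ((j + 1) * (j + 2)) * c[j]
    return c * 2**n / c[n]               # conventional normalisation

for n in range(6):
    ref = hermite.herm2poly([0] * n + [1])   # H_n as power-series coefficients
    assert np.allclose(hermite_coeffs(n), ref)
print(hermite_coeffs(5))                 # [0. 120. 0. -160. 0. 32.]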

The corresponding solutions of the original Hamiltonian, returning to unscaled coordinates, are

$$\phi_n(x) = (2^n n!)^{-1/2}\,H_n\!\left(\tfrac{x}{x_0}\right)\times(\pi x_0^2)^{-1/4}\exp\bigl(-x^2/(2x_0^2)\bigr),$$

with energies En = (n + 1/2)ℏω.


Just as in the square well, the restriction to solutions which satisfy the boundary conditions
has resulted in quantised energy levels.
The wave functions and probability densities are illustrated below (Figure A.1: energy levels and wave functions of the harmonic oscillator, from the Florida State University Physics wiki, http://wiki.physics.fsu.edu/wiki/index.php/Harmonic Oscillator Spectrum and Eigenstates).

A.2 Hydrogen wave functions


The solutions of the Schrödinger equation for the Coulomb potential V(r) = −ℏcα/r have energy En = −(1/n²)ERy, where ERy = ½α²mc² = 13.6 eV (with m the reduced mass of the electron–proton system). (Recall α = e²/(4πε0ℏc) ≈ 1/137.) The spatial wavefunctions are ψnlm(r) = Rn,l(r)Ylm(θ, φ).

The radial wavefunctions are as follows, where a0 = ℏc/(mc²α):

 
$$R_{1,0}(r) = \frac{2}{a_0^{3/2}}\exp\Bigl(-\frac{r}{a_0}\Bigr),$$
$$R_{2,0}(r) = \frac{2}{(2a_0)^{3/2}}\Bigl(1 - \frac{r}{2a_0}\Bigr)\exp\Bigl(-\frac{r}{2a_0}\Bigr),$$
$$R_{2,1}(r) = \frac{1}{\sqrt3\,(2a_0)^{3/2}}\,\frac{r}{a_0}\exp\Bigl(-\frac{r}{2a_0}\Bigr),$$
$$R_{3,0}(r) = \frac{2}{(3a_0)^{3/2}}\Bigl(1 - \frac{2r}{3a_0} + \frac{2r^2}{27a_0^2}\Bigr)\exp\Bigl(-\frac{r}{3a_0}\Bigr),$$
$$R_{3,1}(r) = \frac{4\sqrt2}{9\,(3a_0)^{3/2}}\,\frac{r}{a_0}\Bigl(1 - \frac{r}{6a_0}\Bigr)\exp\Bigl(-\frac{r}{3a_0}\Bigr),$$
$$R_{3,2}(r) = \frac{2\sqrt2}{27\sqrt5\,(3a_0)^{3/2}}\,\Bigl(\frac{r}{a_0}\Bigr)^2\exp\Bigl(-\frac{r}{3a_0}\Bigr).$$
They are normalised, so $\int_0^\infty (R_{n,l}(r))^2 r^2\,\mathrm{d}r = 1$. Radial wavefunctions of the same l but different n are orthogonal (the spherical harmonics take care of orthogonality for different l's).
The following radial integrals can be proved:

$$\langle r^2\rangle = \frac{a_0^2 n^2}{2}\bigl(5n^2 + 1 - 3l(l+1)\bigr),$$
$$\langle r\rangle = \frac{a_0}{2}\bigl(3n^2 - l(l+1)\bigr),$$
$$\Bigl\langle \frac{1}{r}\Bigr\rangle = \frac{1}{n^2 a_0},$$
$$\Bigl\langle \frac{1}{r^2}\Bigr\rangle = \frac{1}{(l+1/2)\,n^3 a_0^2},$$
$$\Bigl\langle \frac{1}{r^3}\Bigr\rangle = \frac{1}{l(l+1/2)(l+1)\,n^3 a_0^3}.$$

For hydrogen-like atoms (single-electron ions with nuclear charge |e| Z) the results are obtained
by substituting α → Zα (and so a0 → a0 /Z).
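These entries are easy to spot-check numerically; a sketch for R₃,₁ with a0 = 1, where the ⟨1/r⟩ entry gives 1/(n²a0) = 1/9:

import numpy as np
from scipy.integrate import quad

def R31(r, a0=1.0):
    return (4*np.sqrt(2)/(9*(3*a0)**1.5)) * (r/a0) * (1 - r/(6*a0)) * np.exp(-r/(3*a0))

norm,  _ = quad(lambda r: R31(r)**2 * r**2, 0, np.inf)
inv_r, _ = quad(lambda r: R31(r)**2 * r,    0, np.inf)
print(norm, inv_r)    # ~1.0 and ~0.1111 = 1/9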

A.3 Properties of δ-functions


The δ-function is defined by its behaviour in integrals:

$$\int_a^b \delta(x-x_0)\,\mathrm{d}x = 1; \qquad \int_a^b f(x)\,\delta(x-x_0)\,\mathrm{d}x = f(x_0),$$

where the limits a and b satisfy a < x0 < b; the integration simply has to span the point on which the δ-function is centred. The second property is called the sifting property because it picks out the value of f at x = x0.
The following equivalences may also be proved by changing variables in the corresponding integral (an appropriate integration range is assumed for compactness of notation):

$$\delta(ax-b) = \frac{1}{|a|}\,\delta\Bigl(x - \frac{b}{a}\Bigr) \quad\text{since}\quad \int f(x)\,\delta(ax-b)\,\mathrm{d}x = \frac{1}{|a|}f\Bigl(\frac{b}{a}\Bigr),$$
$$\delta(g(x)) = \sum_i \frac{\delta(x-x_i)}{|g'(x_i)|} \quad\text{where the } x_i \text{ are the (simple) real roots of } g(x).$$
Note that the dimensions of a δ-function are the inverse of those of its argument, as should be
obvious from the first equation.
Though the δ-function is not well defined as a function (technically it is a distribution rather than a function), it can be considered as the limit of many well-defined functions. For instance the “top-hat” function which vanishes outside a range a and has height 1/a tends to a δ-function as a → 0. Similarly a Gaussian with width and height inversely proportional tends to a δ-function as the width tends to zero. These are shown in the first two frames below.
Two less obvious functions which tend to a δ-function, shown in the next two frames, are the following:

$$\frac{1}{2\pi}\int_{-L}^{L} e^{i(k-k')x}\,\mathrm{d}x = \frac{L}{\pi}\,\mathrm{sinc}\bigl((k-k')L\bigr) \;\xrightarrow{L\to\infty}\; \delta(k-k')$$
$$\frac{L}{\pi}\,\mathrm{sinc}^2\bigl((k-k')L\bigr) \;\xrightarrow{L\to\infty}\; \delta(k-k')$$
The first of these does not actually vanish away from the peak, but it oscillates so rapidly that there will be no contribution to any integral over k′ except from the point k′ = k. This is the integral which gives the orthogonality of two plane waves with different wavelengths: ⟨k|k′⟩ = δ(k − k′). It also ensures that the inverse Fourier transform of a Fourier transform recovers the original function.
That the normalisation (for integration over k) is correct follows from the following two integrals: $\int_{-\infty}^{\infty}\mathrm{sinc}(t)\,\mathrm{d}t = \pi$ and $\int_{-\infty}^{\infty}\mathrm{sinc}^2(t)\,\mathrm{d}t = \pi$. The second of these follows from the first via integration by parts. The integral $\int_{-\infty}^{\infty}\mathrm{sinc}(t)\,\mathrm{d}t = \mathrm{Im}\,I$, where $I = \int_{-\infty}^{\infty}(e^{it}/t)\,\mathrm{d}t$, may be done via the contour integral below:

As no poles are included by the contour, the full contour integral is zero. By Jordan's lemma the integral round the outer circle tends to zero (as R → ∞, e^{iz} decays exponentially in the upper half-plane). So the integral along the real axis is equal and opposite to the integral over the inner circle, namely −πi times the residue at z = 0, so I = iπ. So the imaginary part, the integral of sinc(t), is π.

A.4 Gaussian integrals


The following integrals will be useful:

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,\mathrm{d}x = \sqrt{\frac{\pi}{\alpha}} \qquad\text{and}\qquad \int_{-\infty}^{\infty} x^{2n}e^{-\alpha x^2}\,\mathrm{d}x = (-1)^n\frac{\mathrm{d}^n}{\mathrm{d}\alpha^n}\sqrt{\frac{\pi}{\alpha}}.$$

These work even for complex α, so long as Re[α] ≥ 0.


Often we are faced with a somewhat more complicated integral, which can be cast in Gaussian form by “completing the square” in the exponent and then shifting the integration variable x → x − β/(2α):

$$\int_{-\infty}^{\infty} e^{-\alpha x^2 - \beta x}\,\mathrm{d}x = e^{\beta^2/(4\alpha)}\int_{-\infty}^{\infty} e^{-\alpha(x+\beta/(2\alpha))^2}\,\mathrm{d}x = \sqrt{\frac{\pi}{\alpha}}\,e^{\beta^2/(4\alpha)}.$$

This works even if β is imaginary.
The two contours below illustrate the two results for complex parameters α or β. For the first, in (a), we rewrite αx² as |α|z², where z = x exp(i Arg[α]/2), so the integral we want is along the blue line, with R → ∞. Since there are no poles, by Cauchy's theorem the integral along the blue contour must equal the sum of those along the red and black contours. As R → ∞ the red one gives the known real integral. Since $e^{-|\alpha|z^2}$ tends to zero faster than 1/R as R → ∞ provided |x| > |y|, the contribution from the black paths is zero as R → ∞. Hence the red and blue integrals are the same, provided Arg[α] ≤ π/2.

For the second, in (b), the blue contour is the desired integral after the variable change (for β imaginary). Again the red and black paths together must equal the blue, and again the contribution from the black paths is zero. Hence the two integrals must be the same.

[Figure: contours in the complex z-plane for cases (a) and (b); in (b) the shifted contour runs at height |β|/2α.]

A.5 Units in EM
There are several systems of units in electromagnetism. We are familiar with SI units, but
Gaussian units are still very common and are used, for instance, in Shankar.
In SI units the force between two currents is used to define the unit of current, and hence the
unit of charge. (Currents are much easier to calibrate and manipulate in the lab than charges.)
The constant µ0 is defined as 4π × 10−7 N A−2 , with the magnitude chosen so that the Ampère
is a “sensible” sort of size. Then Coulomb's law reads

$$F = \frac{q_1 q_2}{4\pi\epsilon_0|\mathbf r_1 - \mathbf r_2|^2},$$

and ε0 has to be obtained from experiment. (Or, these days, as the speed of light now has a defined value, ε0 is obtained from 1/(µ0c²).)
However one could in principle equally decide to use Coulomb’s law to define charge. This is
what is done in Gaussian units, where by definition
$$F = \frac{q_1 q_2}{|\mathbf r_1 - \mathbf r_2|^2}.$$

Then there is no separate unit of charge; charges are measured in N^{1/2} m (or the non-SI equivalent): e = 4.803 × 10⁻¹⁰ g^{1/2} cm^{3/2} s⁻¹. (You should never need that!) In these units, µ0 = 4π/c². Electric and magnetic fields are also measured in different units.
The following translation table can be used:

Gauss:   e             E             B
SI:      e/√(4πε0)     √(4πε0) E     √(4π/µ0) B

Note that eE is the same in both systems of units, but eB in SI units is replaced by eB/c in Gaussian units. Thus the Bohr magneton µB is eℏ/2m in SI units but eℏ/2mc in Gaussian units, and µB B has dimensions of energy in both systems.
The fine-structure constant α is a dimensionless combination of fundamental units, and as such takes on the same value (≈ 1/137) in all systems. In SI it is defined as α = e²/(4πε0ℏc), in Gaussian units as α = e²/(ℏc). In all systems, therefore, Coulomb's law between two particles of charge z1e and z2e can be written

$$F = \frac{z_1 z_2\,\hbar c\alpha}{|\mathbf r_1 - \mathbf r_2|^2},$$

and this is the form I prefer.
