

A theoretical physics FAQ



The up-to-date version of the theoretical physics FAQ is at
in a reorganized and clickable version.
The present document is an old ASCII version,
in which the state of the FAQ on January 9, 2010 is frozen.


Consider everything, and keep the good.

(St. Paul, 1 Thess. 5:21)

This document (a simple ASCII file) contains answers to some more or
less frequently asked questions from theoretical physics. Currently,
the FAQ contains 148 topics, grouped into 20 chapters, and filling over
11000 lines of text (about half a megabyte), corresponding to a book
of about 220 pages. Starting in 2004, the topics were edited from my
answers to postings to the moderated newsgroup sci.physics.research
(or, for some, translated from postings to the unmoderated German
newsgroup de.sci.physik).

If you like the FAQ and/or found it useful, please link to it from
your home page to make it more widely known.
If you spot errors or have suggestions for improvements,
please write me (at
If you have questions, please post them to the moderated newsgroup
sci.physics.research (!
If you found this FAQ useful you are likely to benefit also from
reading our book
Arnold Neumaier and Dennis Westra,
Classical and Quantum Mechanics via Lie algebras,
Of course, the FAQ refers only to a tiny part of theoretical physics,
namely to what I happened to discuss on sci.physics.research.
The answers are only as good as my understanding of the matter.
This doesn't mean that they are poor but probably that they are
not perfect. Many topics are discussed quite in detail, but this is
not a book, so don't expect completeness or comprehensiveness in any
On topics where the physics community has not yet reached a consensus,
my point of view is of course only one of the possibilities, and not
always the mainstream view, although I tend to discuss that view, too.
In any case, I try to be accurate, consistent, and intelligible.

Happy Reading!
Arnold Neumaier
University of Vienna
I like to see people grow

Table of Contents
The 21 topics in the initial version, posted there on April 28, 2004,
have grown to 88 by January 1, 2005, to 116 by January 4, 2006,
to 128 by January 3, 2007, to 140 by January 3, 2008, to 147 by
January 30, 2009, and are likely to grow further.
(A * indicates addition of a new topic, or large modification of
an old one, since January 30, 2009. Minor changes or additions to
old topics are not indicated.)
The various topics can usually be read independently of each other;
they are arranged into groups of loosely related topics.
To read a particular entry, grep for its label, e.g., S2e.
The labels may change with time as answers to further questions
will be added and old answers regrouped. So, to quote part of the FAQ,
refer to the title of a section and not only to its label.
QM = quantum mechanics, QFT = quantum field theory,
QED = quantum electrodynamics, CCR = canonical commutation relations,
s.p.r. = sci.physics.research (newsgroup).
Strings like quant-ph/0303047 or arXiv:0810.1019 refer to electronic
documents in the e-Print archive at and mirror sites.
p_0 and \p are the time and space part of a 4-vector p;
the Minkowski inner product is always taken to be p^2=p_0^2-\p^2.
Chapter 1 (20 sections)
S1a. What are bras and kets?
S1b. Projective geometry and quantum mechanics
S1c. What is the meaning of the entries of a density matrix?
S1d. Postulates for the formal core of quantum mechanics
S1e. Open quantum systems
S1f. Interaction with a heat bath
S1g. Quantum-classical mechanics
S1h. Can all quantum states be realized in nature?
S1i. Modes and wave functions of laser beams
S1j. Classical and quantum tunneling
S1k. Quantization in non-Cartesian coordinates
S1l. Second quantization
S1m. When is an object macroscopic?
S1n. The role of the ergodic hypothesis
S1o. Does quantum mechanics apply to single systems?
*S1p. Dissipative dynamics and Lagrangians
*S1q. How can QM be stochastic while the Schroedinger equation is not?
*S1r. Measurement theory for real numbers
*S1s. The classical limit of quantum mechanics
*S1t. The classical limit via coherent states
Chapter 2 (10 sections)
S2a. Lie groups and Lie algebras
S2b. The Galilei group as contraction of the Poincare group
S2c. Representations of the Poincare group, spin and gauge invariance
S2d. Forms of relativistic dynamics
S2e. Is there a multiparticle relativistic quantum mechanics?
S2f. What is a photon?
S2g. Particle positions and the position operator
S2h. Localization and position operators
*S2i. Position operators in relativistic quantum field theory
S2j. Coherent states of light as ensembles

Chapter 3 (6 sections)
S3a. What are 'bare' and 'dressed' particles?
S3b. How meaningful are single Feynman diagrams?
S3c. How real are 'virtual particles'?
S3d. What is the meaning of 'on-shell' and 'off-shell'?
S3e. Virtual particles and Coulomb interaction
S3f. Are virtual particles and decaying particles the same?
Chapter 4 (10 sections)
S4a. How do atoms and molecules look like?
S4b. Why are observable densities state-dependent?
S4c. Are electrons pointlike/structureless?
S4d. How much information is in a particle?
S4e. Entropy and missing information
S4f. How real is the wave function?
S4g. How real are Feynman's paths?
S4h. Can particles go backward in time?
S4i. What about particles faster than light (tachyons)?
S4j. Do free particles exist?
Chapter 5 (9 sections)
S5a. QM pictures and representations
S5b. Inequivalent representations of the CCR/CAR
S5c. Why does QFT look so different from QM?
S5d. Why is QFT based on a classical action?
S5e. Why does the action only contain first derivatives?
S5f. Why normal ordering?
S5g. Why locality and causal commutation relations?
S5h. Creation operators and rigged Hilbert space
S5i. Why Feynman diagrams?
Chapter 6 (9 sections)
S6a. Nonperturbative computations in quantum field theory
S6b. The formal functional integral approach to QFT
S6c. Functional integrals, Wightman functions, and rigorous QFT
S6d. Is there a rigorous interacting QFT in 4 dimensions?
S6e. Constructive field theory
S6f. The classical limit of relativistic QFT
S6g. What are interpolating fields?
S6h. Hilbert space and Hamiltonian in relativistic quantum field theory
*S6i. 2-dimensional quantum field theory
Chapter 7 (3 sections)
S7a. What is the mass gap?
S7b. Why can a bound state of massless quarks be heavy?
S7c. Bound states in relativistic quantum field theory
Chapter 8 (9 sections)
S8a. Why renormalization?
S8b. Renormalization without infinities I
S8c. Renormalization without infinities II
S8d. Renormalization and coarse graining
S8e. Renormalization scale and experimental energy scale
S8f. Dimensional regularization
S8g. Nonrelativistic quantum field theory
S8h. Nonrenormalizable theories as effective theories
S8i. What about infrared divergences?
Chapter 9 (6 sections)
S9a. Summing divergent series
S9b. Is QED consistent?
S9c. What about relativistic QFT at finite times?
S9d. Perturbation theory and instantaneous forces
S9e. QED and relativistic quantum chemistry
S9f. Are protons described by QED?
Chapter 10 (13 sections)
S10a. How are matrices and tensors related?
S10b. Is quantum mechanics compatible with general relativity?
S10c. Difficulties in quantizing gravity
S10d. Renormalization in quantum gravity
S10e. Hadamard states and their Hilbert spaces
S10f. Why do gravitons have spin 2?
S10g. What is the tetrad formalism?
S10h. Energy in general relativity
S10i. What happened to the aether?
S10j. What is time?
S10k. Time in quantum mechanics
S10l. Diffeomorphism invariant classical mechanics
S10m. The concept of ''Now''
Chapter 11 (7 sections)
S11a. A concise formulation of the measurement problem of QM
S11b. The double slit experiment
S11c. The Stern-Gerlach experiment
S11d. The minimal interpretation
S11e. The preferred basis problem
S11f. Master equation and pointer variables
S11g. Does decoherence solve the measurement problem?
Chapter 12 (6 sections)
S12a. Which interpretation of quantum mechanics is most consistent?
S12b. Which textbook of quantum mechanics is best for foundations?
S12c. What is the role of quantum logic?
S12d. Stochastic quantum mechanics
S12e. Is there a relativistic measurement theory?
S12f. Quantum mechanics and dice
Chapter 13 (10 sections)
S13a. Random numbers and other random objects
S13b. What is the meaning of probabilities?
S13c. What about the subjective interpretation of probabilities?
S13d. Are probabilities limits of relative frequencies?
S13e. How meaningful are probabilities of single events?
S13f. Objective probabilities
S13g. How probable are realizations of stochastic processes?
S13h. How do probabilities apply in practice?
S13i. Incomplete knowledge and statistics
S13j. Priors and entropy in probability theory
Chapter 14 (4 sections)
S14a. Theoretical challenges close to experimental data
S14b. Does the standard model predict chemistry?
S14c. Is the result of a measurement a real number?
S14d. Why use complex numbers in physics?
Chapter 15 (5 sections)
S15a. How precise can physical language be?
S15b. Why bother about rigor in physics?
S15c. Justifying the foundations of a theory
S15d. Foundations, theory and experiment
S15e. Theoretical physics as a formal model of reality
Chapter 16 (12 sections)
S16a. On progress in science
S16b. How different are physical sciences and social sciences
S16c. Can good theories be falsified?
S16d. What, then, distinguishes a good theory?
S16e. When is a theory preferred to another one?
S16f. What is a fact?
S16g. Physics and experience
S16h. Modeling reality
S16i. What is a system (e.g., an ideal gas)?
S16j. When is a theory confirmed?
S16k. What is real?
S16l. How many angels fit onto the tip of a needle?

Chapter 17 (8 sections)
S17a. How to get information from sci.physics.research
S17b. How to get your work published
S17c. How to respond to critical referee's reports
S17d. How to sell your revolutionary idea
S17e. Useful background, online lecture notes, etc.
S17f. Stories about physicists
S17g. Other physics FAQs
*S17h. Naming in science

Chapter 18 (5 sections)
S18a. What is the meaning of 'self-consistent'?
S18b. What is a vector?
S18c. Learning quantum mechanics at age 14
S18d. Research at age 16
S18e. Are there indefinite Hilbert spaces?
Chapter 19 (1 section)
S19a. God and physics
Chapter 20 (1 section)
S20a. Acknowledgments

Since March 1, 2005, there is also a related FAQ in German language,

Ein Theoretische Physik FAQ
where I describe some more topics which I have not translated.
(Among other topics, it discusses a new interpretation of quantum
mechanics, which I call the 'consistent experiment interpretation'.
It gives a new meaning to the foundations of physics, less paradoxical
than the conventional interpretations. I expect to have an English
version of it soon.)

S1a. What are bras and kets?
In the language of linear algebra, kets |psi> are just column vectors
psi (for systems with finitely many levels only; each component gives
the amplitude for the corresponding level), and the corresponding
bras <psi| are the complex conjugated transposed row vectors psi^*.
The inner product <phi|psi>, the bra(c)ket, is therefore
<phi|psi> = phi^* psi = sum_k phi_k^* psi_k.
For the basis bra <k|, the unit vector with a single entry 1 at
position k, we find as special case
<k|psi> = psi_k.
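The finite-dimensional formulas above can be tried out directly. The
following is only a minimal sketch in plain Python; the function names
(bra, braket, basis_ket) and the 3-level example vectors are illustrative
assumptions, and indices are 0-based as usual in Python.

```python
# Finite-dimensional kets as plain Python lists of complex numbers.

def bra(psi):
    """Return the bra <psi|: the complex conjugate (row) vector of the ket."""
    return [z.conjugate() for z in psi]

def braket(phi, psi):
    """Inner product <phi|psi> = sum_k phi_k^* psi_k."""
    return sum(a * b for a, b in zip(bra(phi), psi))

def basis_ket(k, n):
    """Unit vector |k> in n dimensions, with a single entry 1 at position k."""
    return [1 if j == k else 0 for j in range(n)]

# Hypothetical 3-level example states (not normalized).
phi = [1 + 1j, 0, 2]
psi = [1j, 3, 1 - 1j]

print(braket(phi, psi))               # <phi|psi>
print(braket(basis_ket(1, 3), psi))   # <k|psi> = psi_k for k = 1
```

Swapping the two arguments gives the complex conjugate, in agreement with
<psi|phi> = <phi|psi>^*.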
In infinite dimensions, the sum becomes an integral, and we get
<phi|psi> = integral dx phi(x)^* psi(x)
and for the basis bra <x|, which is a delta distribution centered at x,
we have
psi(x) = <x|psi>.
Actually, in infinite dimensions, one needs functional analysis
in place of linear algebra to get a concise definition; kets are smooth
functions from some nice function space, and bras are linear
functionals on that space, i.e., elements of the dual space. The dual
space is larger and also contains distributions.
(For those who want to be fully rigorous: kets belong to a
so-called nuclear space H_inf, for example the space of Schwartz
functions; its closure H under the Euclidean norm
gives the conventional Hilbert space, and together with the dual
H_inf^* = H_-inf, these define a Gelfand triple or rigged Hilbert
space, two names for the same concept).
Physicists are less picky, however, and allow kets also to be
less smooth functions and even distributions, so that every bra has
a corresponding ket. Thus they use the ket |x> although this is not a
function but a delta distribution centered at x.
This allows them to write not only psi(x) = <x|psi>, but also
psi(x)^* = <x|psi>^* = <psi|x>.
The price to be paid is that inner products are no longer well-defined
in general; for example, <x|x> is infinite. Physicists say that |x> is
not normalizable, meaning that it is not in the Hilbert space of
well-behaved pure states.
Caution: Physicists often use different bases which may cause confusing
notation. For example <p| is a momentum basis state, while <x| is a
position basis state. But while <x|y> = 0 if x and y are distinct
positions, and <p|q> = 0 if p and q are distinct momenta,
the inner product of a momentum bra <p| and a position ket |x>
(or vice versa) is never zero. (Exercise: Verify this by computing
explicit formulas for <p|x> and <x|p>!) Thus, unlike in mathematics,
the formulas are not invariant under substitution of letters for
the variables!
On the pitfalls that arise without the required care, I recommend reading
F. Gieres,
Mathematical surprises and Dirac's formalism in quantum mechanics,
Rep. Prog. Phys. 63 (2000) 1893-1931.
G. Bonneau, J. Faraut, G. Valent,
Self-adjoint extensions of operators and the teaching of quantum
mechanics,
Amer. J. Phys. 69 (2001) 322-331.

S1b. Projective geometry and quantum mechanics
Projective geometry means that one works with rays instead of vectors
to designate points in a geometry.
Think of the 2-dimensional affine plane. The points are represented by
vectors in R^2. On the other hand, by moving an affine plane lying on
the floor a little upwards into the air (the same amount at every
point), one may think of each point as being represented by the ray
from an origin on the floor to the point on the plane.
(Actually, instead of the ray one should consider the whole line;
strictly speaking, a ray is only a half-line. But in quantum physics,
one customarily calls the 1-dimensional subspaces rays. Since the
coefficient field is complex, the rays are actually rotated complex
number planes.)
Similarly, lines are now 2-spaces through the origin. This gives
projective geometry (or homogeneous coordinates, which is the same in
more algebraic terms).
But now one also has some additional points, corresponding to rays
parallel to the affine plane. These points form the 'line at infinity'
= the 2-space through the origin parallel to the affine plane.
A slightly closer look reveals that the geometry has become more
complete: Now not only do any two points have a unique connecting line,
but any two lines also have a unique intersection - what were before
parallels are now lines intersecting 'at infinity'. Imagine two long,
straight rails of a railway track...
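The railway picture can be checked with homogeneous coordinates. The
following is a minimal sketch for the real projective plane (the quantum
case uses complex coefficients instead); the function names and the two
sample 'rails' y = 0 and y = 1 are illustrative assumptions. An affine
point (x, y) is represented by the ray through (x, y, 1), a line by a
3-vector of coefficients, and both joining and intersecting are given
by the cross product.

```python
# Homogeneous coordinates: points and lines of the projective plane
# are both represented by 3-vectors, defined up to a nonzero factor.

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Coefficient vector of the unique line joining two distinct points."""
    return cross(p, q)

def meet(l, m):
    """The unique intersection point of two distinct lines."""
    return cross(l, m)

def is_at_infinity(p):
    """Points with last coordinate 0 lie on the line at infinity."""
    return p[2] == 0

# Two parallel affine lines, y = 0 and y = 1 (the two rails).
rail1 = line_through((0, 0, 1), (1, 0, 1))
rail2 = line_through((0, 1, 1), (1, 1, 1))
far_point = meet(rail1, rail2)
print(far_point, is_at_infinity(far_point))

# A line crossing the rails, x = 2, meets rail1 in an ordinary point.
crossing = line_through((2, 0, 1), (2, 1, 1))
print(meet(rail1, crossing))
```

The parallel rails meet in a point whose last coordinate vanishes, i.e.,
a point on the line at infinity in the direction of the rails, while the
crossing line meets a rail in an ordinary affine point.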
This can be extended to higher dimensions. n-dimensional affine geometry
can be represented by rays through 0 in (n+1)-dimensional space, and can
be completed there to a projective geometry, in which the vector
subspaces are the geometrical objects. In an infinite-dimensional
Hilbert space one can no longer count dimensions, but otherwise
everything is similar.
Since, in quantum mechanics, state vectors are only defined up to a
phase (even when normalized), they correspond uniquely to rays
= 1-dimensional subspaces in Hilbert space. Hence quantum mechanics is
intrinsically projective.

S1c. What is the meaning of the entries of a density matrix?
Density matrices are a convenient way of describing states of quantum
systems in contact with an environment. (State vectors = wave functions
are appropriate only for isolated systems at zero absolute temperature,
though they can be used in an approximate way in thermally isolated
contexts. But contact with an environment means positive temperature.)
If the quantum system has only a finite number n of levels,
the density matrix is an n x n matrix; otherwise it is
a linear operator on Hilbert space (but nevertheless called a matrix).
The real use for density matrices is to compute expectations
<f> = trace (rho f)
for quantities f of interest. Indeed, rho is just a collection of
numbers enabling one to calculate these expectations.
The fact that the constant 1 must have expectation 1 leads to the
restriction that
sum_k rho_kk = trace rho = 1.
Apart from that, rho must be a Hermitian, positive semidefinite matrix,
to satisfy the requirements of statistics. (See quant-ph/0303047 for
details.) For small systems, all such density matrices can indeed be
approximately realized in practice.
Since diagonal entries of a positive semidefinite matrix are always
nonnegative, the p_k:=rho_kk are nonnegative numbers summing to 1 and
thus look like probabilities. What the components mean depends on the
basis used. In particular, if the basis consists of eigenstates of a
Hamiltonian,
and the eigenvalues E_k are all nondegenerate, a diagonal element
rho_kk can be interpreted as the probability that upon measuring the
energy of the system one will find the value E_k.
If f is a function of the Hamiltonian H, and the basis used consists of
eigenstates |k> of H, with H|k>=E_k|k> then the density matrix rho
has entries rho_jk = <j|rho|k>. If one now calculates the expectation
of a function f(H), the equation f(H)|k>=f(E_k)|k> implies that
<f(H)> = trace (rho f(H)) = sum_k <k|rho f(H)|k>
= sum_k <k|rho f(E_k)|k> = sum_k <k|rho|k> f(E_k)
= sum_k rho_kk f(E_k).
If we average the results f(E) of a number of measurements of the
energy, where the energy E_k is measured with probability p_k,
we get
<f(H)> = sum_k p_k f(E_k).
Thus, to match the expectations no matter which function we are
averaging, we need to take p_k=rho_kk. This gives the claimed
probability interpretation of the diagonal entries.
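The identity trace (rho f(H)) = sum_k rho_kk f(E_k) can be checked
numerically. The following is a minimal sketch in plain Python for a
hypothetical 2-level system written in the energy eigenbasis; the
energies, the density matrix, and all function names are illustrative
assumptions.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[k][k] for k in range(len(a)))

def expectation(rho, E, f):
    """<f(H)> = trace(rho f(H)) in a basis of eigenstates of H."""
    n = len(E)
    f_of_H = [[f(E[k]) if j == k else 0.0 for k in range(n)]
              for j in range(n)]
    return trace(matmul(rho, f_of_H))

def diagonal_average(rho, E, f):
    """sum_k p_k f(E_k) with p_k = rho_kk."""
    return sum(rho[k][k] * f(E[k]) for k in range(len(E)))

# Hypothetical data: energies E_k and a Hermitian, positive semidefinite
# matrix of trace 1 with nonzero off-diagonal entries.
E = [0.0, 1.0]
rho = [[0.75, 0.25j], [-0.25j, 0.25]]

for f in (lambda e: e, lambda e: e * e, lambda e: 1.0):
    print(expectation(rho, E, f), diagonal_average(rho, E, f))
```

The off-diagonal entries of rho drop out of both columns of the output,
which is why only the diagonal carries the probability interpretation
for functions of H.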
Off-diagonal elements have no simple interpretation.
Usually one does not look at off-diagonal elements at all, but they
are important in intermediate steps of calculations.
Close to absolute zero temperature, and assuming the absence of
degeneracy (but also in certain other, well-prepared, nearly
isolated systems), quantum states have the property that all columns
of the density matrix are nearly parallel to a wave function psi
that is conventionally normalized to have norm 1.
(In Dirac language, this says <psi|psi>=1; see the FAQ entry for bras
and kets.) This vector psi, which is clearly determined only up to a
complex number of absolute value 1, is called the wave vector
(or, in infinite dimensions, the wave function) of the state.
Idealizing this situation, one describes such quantum systems by states
in which all columns of the density matrix are exactly parallel to some
nonzero wave vector psi. (Such matrices are called rank 1 matrices;
the wave vector, also referred to as a wave function, is defined
only up to a phase factor.)
Then the k-th column is a multiple c_k psi of psi. The fact that rho
is Hermitian forces each row to be a multiple of psi^*. But this implies
that c_k is a multiple of psi^*_k, so that rho is a multiple of
psi psi^*. Since psi is normalized, the multiplication factor is just
the trace, and since the trace is 1 we find
rho = psi psi^* for any rank 1 density matrix.
If we now calculate the probability of measuring the energy E_k, we find
p_k = rho_kk = <k|rho|k> = <k|psi psi^*|k> = <k|psi> <psi|k>,
and since <psi|k> is just the complex conjugate of <k|psi>,
we end up with
p_k = |<k|psi>|^2.
This is Born's squared amplitude formula for calculating probabilities.
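As a small numerical illustration (a sketch only; the 2-level state psi
and the function name density_from_state are illustrative assumptions),
one can form rho = psi psi^* and compare its diagonal with the squared
amplitudes. The same sketch shows that a phase factor on psi does not
change rho.

```python
import cmath

def density_from_state(psi):
    """Rank 1 density matrix rho = psi psi^* for a normalized vector psi."""
    return [[a * b.conjugate() for b in psi] for a in psi]

# Hypothetical normalized state: |0.6|^2 + |0.8i|^2 = 1.
psi = [0.6, 0.8j]
rho = density_from_state(psi)

p = [rho[k][k].real for k in range(len(psi))]   # p_k = rho_kk
born = [abs(z) ** 2 for z in psi]               # |<k|psi>|^2
print(p, born)

# Multiplying psi by a complex number of absolute value 1 leaves rho unchanged.
phase = cmath.exp(0.3j)
rho_phase = density_from_state([phase * z for z in psi])
print(max(abs(rho[j][k] - rho_phase[j][k])
          for j in range(2) for k in range(2)))
```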
Thus one sees that the traditional wave vector calculus is just a
special case of the density matrix calculus, appropriate (only) for
the study of tiny, well-prepared nearly isolated systems and for
systems close to zero absolute temperature. For the study of ordinary
matter under ordinary conditions, one needs to represent states
by density matrices.
Everything that is done with wave vectors can also be done with
density matrices, or equivalently with the associated expectation
mapping. Indeed, everything becomes simpler that way, much closer
to classical mechanics, and much less weird-looking.
See quant-ph/0303047 for an exposition of the foundations of quantum
mechanics (including the probability interpretation, uncertainty
relations, nonlocality, and Bell's theorem) in terms of expectations.

S1d. Postulates for the formal core of quantum mechanics
Quantum mechanics consists of a formal core that is
universally agreed upon (basically being a piece of mathematics
with a few meager pointers on how to match it with experimental
reality) and an interpretational halo that remains highly disputed
even after 80 years of modern quantum mechanics. The latter is the
subject of the foundations of quantum mechanics; it is addressed
elsewhere in this FAQ. Here I focus on the formal side.
As in any axiomatic setting (necessary for a formal discipline),
there are a number of different but equivalent sets of axioms
or postulates that can be used to define formal quantum mechanics.
Since they are equivalent, their choice is a matter of convenience.
My choice presented here is the formulation which gives most
direct access to statistical mechanics, which is the main tool for
real life applications of quantum mechanics. The relativistic case
is outside the scope of the present axioms. Thus the following
describes nonrelativistic quantum statistical mechanics in the
Schroedinger picture. (The traditional starting point is instead
the special case of this setting where all states are assumed to be
pure.)
There are six basic axioms:
A1. A generic system (e.g., a 'hydrogen molecule')
is defined by specifying a Hilbert space K whose elements
are called state vectors and a (densely defined, self-adjoint)
Hermitian linear operator H called the _Hamiltonian_ or the _energy_.
A2. A particular system (e.g., 'the ion in the ion trap on this
particular desk') is characterized by its _state_ rho(t)
at every time t in R (the set of real numbers). Here rho(t) is a
Hermitian, positive semidefinite (trace class) linear operator on K
satisfying at all times the condition
trace rho(t) = 1. (normalization)
A state is called _pure_ at time t if rho(t) maps K to a 1-dimensional
subspace, and _mixed_ otherwise.
A3. A system is called _closed_ in a time interval [t1,t2]
if it satisfies the evolution equation
d/dt rho(t) = i/hbar [rho(t),H] for t in [t1,t2],
and _open_ otherwise. (hbar is Planck's constant, and is often set
to 1.) If nothing else is apparent from the context,
a system is assumed to be closed.
A4. Besides the energy H, certain other (densely defined, self-adjoint)
Hermitian operators (or vectors of such operators) are distinguished
as _observables_.
(E.g., the observables for an N-particle system conventionally include,
for each particle a involved, several 3-dimensional vectors:
the _position_ x^a, _momentum_ p^a, _orbital_angular_momentum_ L^a
and the _spin_vector_ (or Bloch vector) sigma^a of the particle with
label a. If u is a 3-vector of unit length then u dot p^a, u dot L^a
and u dot sigma^a define the momentum, orbital angular momentum,
and spin of particle a in direction u.)
A5. For any particular system, one associates to every vector X
of observables with commuting components a time-dependent monotone
linear functional <dot>_t defining the _expectation_
<f(X)>_t:=trace rho(t) f(X)
of bounded continuous functions f(X) at time t.
This is equivalent to a multivariate probability measure dmu_t(X)
(on a suitable sigma algebra over the spectrum spec(X) of X)
defined by
integral dmu_t(X) f(X) := trace rho(t) f(X) =<f(X)>_t.
A6. Quantum mechanical predictions amount to predicting properties
(typically expectations or conditional probabilities)
of the measures defined in axiom A5 given reasonable assumptions
about the states (e.g., ground state, equilibrium state, etc.)
Axiom A6 specifies that the formal content of the theory is covered
exactly by what can be deduced from axioms A1-A5 without
anything else added (except for restrictions defining the specific
nature of the state), and hence says that Axioms A1-A5 are complete.
The description of a particular closed system is therefore given by
the specification of a particular Hilbert space in A1, the
specification of the observable quantities in A4, and the
specification of conditions singling out a particular class of
states (in A6). Everything else is determined by the theory and
hence is (in principle) predicted by the theory.
The description of an open system involves, in addition, the
specification of the details of the dynamical law. (For the basics,
see the entry 'Open quantum systems' in this FAQ.)

In addition to these formal axioms one needs a rudimentary
interpretation relating the formal part to experiments.
The following _minimal_interpretation_ seems to be universally
accepted.
MI. Upon measuring at times t_l (l=1,...,n) a vector X of observables
with commuting components, for a large collection of independent
(particular) systems closed for times t<t_l, all in the same state
rho_0 = lim_{t to t_l from below} rho(t)
(one calls such systems _identically_prepared_), the measurement
results are statistically consistent with independent realizations
of a random vector X with measure as defined in axiom A5.

Note that MI is no longer a formal statement since it neither defines
what 'measuring' is, nor what 'measurement results' are and what
'statistically consistent' or 'independent identical system' means.
Thus Axiom MI has no mathematical meaning. That's why it is
part of the interpretation of formal quantum mechanics.
However, the terms 'measuring', 'measurement results', 'statistically
consistent', and 'independent' already have informal meaning
in the reality as perceived by a physicist. Everything stated is
understandable by every trained physicist. Thus statement MI is not
for formal logical reasoning but for informal reasoning in the
traditional cultural setting that defines what a trained physicist
understands by reality.

The lack of precision in statement MI is on purpose, since it allows
the statement to be agreeable to everyone in its vagueness; different
philosophical schools can easily fill it with their own understanding
of the terms in a way consistent with the rest.
Interpretational axioms necessarily have this form, since they must
assume some unexplained common cultural background for perceiving
reality. (This is even true in pure mathematics, since the language
stating the axioms must be assumed to be common cultural background.)

Everything beyond MI seems to be controversial. In particular,
already what constitutes a measurement of X is controversial.
(E.g., reading a pointer, different readers may get marginally
different results. What is the true pointer reading?)
On the other hand there is an informal consensus on how to
perform measurements in practice. Good foundations including a
good measurement theory should be able to properly justify this
informal consensus by defining additional formal concepts that
behave within the theory just as their informal relatives with
the same name behave in reality.
In complete foundations, there would be formal objects in the
mathematical theory corresponding to all informal objects discussed
by physicists, such that talking about the formal objects
and talking about the real objects is essentially isomorphic.
We are currently far from such complete foundations.

Although much of traditional quantum mechanics is phrased in terms of
pure states, this is a very special case; in most actual experiments
the systems are open and the states are mixed states. Pure states
are relevant only if they come from the ground state of a
Hamiltonian in which the first excited state has a large energy gap.
Indeed, assume for simplicity that H has discrete spectrum. In an
orthonormal basis of eigenstates psi_k,
f(H) = sum_k f(E_k) psi_k psi_k^*
for every function f defined on the spectrum. Setting the Boltzmann
constant to 1 to simplify the formulas, the equilibrium density is
the canonical ensemble,
rho(T) = 1/Z(T) exp(-H/T) = sum_k exp(-E_k/T)/Z(T) psi_k psi_k^*.
(Of course, equating this ensemble with equilibrium in a closed system
is an additional step beyond our axiom system, which would require
justification.) Taking the trace (which must be 1) gives
Z(T) = sum_k exp(-E_k/T),
and in the limit T -> 0, all terms exp(-E_k/T)/Z(T) tend to 0 except
for the k corresponding to the states with least energy, whose terms
share the total weight 1 equally.
Thus, if the ground state psi_1 is unique,
lim_{T->0} rho(T) = psi_1 psi_1^*.
This implies that for low enough temperatures, the equilibrium state
is approximately pure. The larger the gap to the second smallest
energy level, the better is the approximation at a given nonzero
temperature. In particular (reinstating the Boltzmann constant kbar),
if the energy gap exceeds a small multiple of E^* := kbar T the
approximation is good.
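The approach to a pure state can be made quantitative in a small
numerical sketch. The three energy levels below are hypothetical, the
Boltzmann constant is set to 1 as in the formulas above, and the function
names are illustrative assumptions. As a measure of purity the sketch
uses trace rho^2, which equals 1 exactly for pure states; this measure is
a standard choice but is not introduced in the text above.

```python
import math

def boltzmann_weights(E, T):
    """Weights exp(-E_k/T)/Z(T) of the canonical ensemble."""
    e0 = min(E)
    # Shifting all energies by e0 leaves the weights unchanged
    # but avoids overflow at small T.
    w = [math.exp(-(e - e0) / T) for e in E]
    Z = sum(w)
    return [x / Z for x in w]

def purity(weights):
    """trace rho^2 for rho diagonal in the energy eigenbasis."""
    return sum(x * x for x in weights)

# Hypothetical spectrum with a unique ground state and gap 1.
E = [0.0, 1.0, 1.5]
for T in (2.0, 0.5, 0.1, 0.02):
    w = boltzmann_weights(E, T)
    print(T, w[0], purity(w))
```

For temperatures well below the gap, the ground state weight and the
purity are both close to 1; for T comparable to or larger than the gap,
the state is visibly mixed.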
States of simple enough systems with a few levels only
can often be prepared in nearly pure states, by realizing a source
governed by a Hamiltonian in which the first excited state has a much
larger energy than the ground state. Dissipation then brings the
system into equilibrium, and as seen above, the resulting equilibrium
state is nearly pure.
To see how the more traditional setting in terms of the
Schroedinger equation arises, we consider the case of a closed
system in a pure state rho(t) at some time t.
If psi(t) is a unit vector in the range of the pure state rho(t)
then psi(t), called the _state_vector_ of the system, is determined
up to a phase, and one easily verifies that
rho(t) = psi(t)psi(t)^*.
Remarkably, under the dynamics specified in the above axioms, this
property persists with time (only) if the system is closed, and the
state vector then satisfies the Schroedinger equation
i hbar d/dt psi(t) = H psi(t).
Thus the state remains pure at all times.
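The compatibility of the Schroedinger equation i hbar d/dt psi = H psi
with the evolution equation of axiom A3 can be checked numerically in the
simplest case. The following is a sketch only, assuming a hypothetical
2-level system with diagonal H (so that the Schroedinger equation can be
solved in closed form), hbar = 1, and illustrative function names; the
time derivative of rho(t) = psi(t)psi(t)^* is approximated by a central
finite difference.

```python
import cmath

HBAR = 1.0
E = [0.3, 1.1]      # hypothetical eigenvalues of a diagonal Hamiltonian H
psi0 = [0.6, 0.8]   # normalized initial state vector

def psi(t):
    """Solution of i hbar d/dt psi = H psi for diagonal H."""
    return [cmath.exp(-1j * e * t / HBAR) * c for e, c in zip(E, psi0)]

def rho(t):
    """Pure state rho(t) = psi(t) psi(t)^*."""
    v = psi(t)
    return [[a * b.conjugate() for b in v] for a in v]

def commutator_side(t):
    """i/hbar [rho(t), H]; for diagonal H, [rho,H]_jk = rho_jk (E_k - E_j)."""
    r = rho(t)
    return [[1j / HBAR * r[j][k] * (E[k] - E[j]) for k in range(2)]
            for j in range(2)]

def derivative_side(t, h=1e-6):
    """Central finite-difference approximation of d/dt rho(t)."""
    rp, rm = rho(t + h), rho(t - h)
    return [[(rp[j][k] - rm[j][k]) / (2 * h) for k in range(2)]
            for j in range(2)]

t = 0.7
lhs, rhs = derivative_side(t), commutator_side(t)
print(max(abs(lhs[j][k] - rhs[j][k]) for j in range(2) for k in range(2)))
```

The printed discrepancy is at the level of the finite-difference error,
and the trace of rho(t) stays equal to 1.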
Moreover, if X is a vector of observables with commuting components
and the spectrum of X is discrete, then the measure from axiom A5
is discrete,
integral dmu(X) f(X) = sum_k p_k f(X_k)
with nonnegative numbers p_k summing to 1, commonly called
probabilities.
Moreover, associated with the p_k are eigenspaces K_k such that
X psi = X_k psi for psi in K_k,
and K is the direct sum of the K_k. Therefore, every state vector psi
can be uniquely decomposed into a sum
psi = sum_k psi_k with psi_k in K_k.
psi_k is called the _projection_ of psi to the eigenspace K_k.
A short calculation using axiom A5 now reveals that for a pure state
rho(t)=psi(t)psi(t)^*, the probabilities p_k are given by the
so-called _Born_rule_
p_k = |psi_k(t)|^2, (*)
where psi_k(t) is the projection of psi(t) to the eigenspace K_k.

Deriving the Born rule (*) from axioms A1-A5 makes it completely
natural, while the traditional approach starting with (*)
makes it an irreducible rule full of mystery and only justifiable
by its agreement with experiment.

S1e. Open quantum systems
Open quantum systems are usually modelled in a stochastic way
to account for the unpredictability of the measurement process.
(Note that a measurement is any non-negligible interaction with the
environment, whether or not it is observed by something deserving
the name 'detector' or 'observer').
In the simplest setting in which states can be assumed to
be pure and measurements occur at definite, a priori known times
and have a negligible duration, an open quantum system is a discrete
stochastic process with values psi(t) in the Hilbert space of state
vectors, normalized to norm 1. Between two consecutive measurements,
the system is assumed to be closed.
Thus between two consecutive measurements at times t' and t''>t',
the normalized state psi(t) evolves according to the Schroedinger
i hbar psidot = H psi,
so that
psi(t''-0)= P psi(t'+0), P = exp (i/hbar (t'-t'')H). (1)
(In the interaction picture, H=0 and psi remains constant between
measurements.)
A measurement at time t is assumed to happen in infinitesimal time
and replaces psi(t-0) independent of other measurements with
probability p_s by
psi(t+0)= P_s psi(t-0)/sqrt(p_s) if p_s>0, (2)
where the P_s are linear operators determined by the experimental
arrangement, satisfying the relation
sum_s P_s^*P_s = 1, (3)
and
p_s=|P_s psi(t-0)|^2 (4)
guarantees that psi(t+0) remains normalized. Clearly the p_s are
nonnegative and by (3), they sum up to 1 (since psi(t-0) is normalized).
(For measurements with more than countably many possible outcomes,
one must replace the probabilities by probability densities and the
sums by integrals.)
Thus this is a well-defined stochastic process.
A von-Neumann measurement of a self-adjoint linear operator A
corresponds to the special case where P_s is an orthogonal projector
to the eigenspace corresponding to the eigenvalue a_s of A
(respective to the set of eigenvalues corresponding to the s-th
interval in a partition of the continuous spectrum of A.)
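
A single measurement step of this process can be sketched in a few lines
of Python. The qubit, the two projectors, and the initial state below
are hypothetical choices for illustration; the update divides
P_s psi(t-0) by its norm sqrt(p_s), so that psi(t+0) has norm 1.

```python
import math
import random

def apply(M, v):
    """Apply a 2x2 matrix M to a 2-vector v."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def norm_sq(v):
    """Squared norm |v|^2 of a complex vector."""
    return sum(abs(c) ** 2 for c in v)

# Illustrative von Neumann measurement of a qubit: orthogonal projectors
# to the two basis vectors. They satisfy sum_s P_s^* P_s = 1, relation (3).
P = [
    [[1, 0], [0, 0]],
    [[0, 0], [0, 1]],
]

def measure(psi, ops, rng):
    """One measurement step. Draws the outcome s with probability
    p_s = |P_s psi|^2 and returns (s, psi_after, probabilities),
    where psi_after = P_s psi / sqrt(p_s)."""
    images = [apply(Ps, psi) for Ps in ops]
    probs = [norm_sq(v) for v in images]
    s = rng.choices(range(len(ops)), weights=probs)[0]
    scale = math.sqrt(probs[s])
    return s, [c / scale for c in images[s]], probs

rng = random.Random(0)                        # seeded for reproducibility
psi = [math.sqrt(0.3), math.sqrt(0.7) * 1j]   # normalized state psi(t-0)
s, psi_after, probs = measure(psi, P, rng)

print("probabilities:", [round(p, 4) for p in probs])
print("outcome:", s, "norm after:", round(norm_sq(psi_after), 4))
```

Repeating measure at the successive measurement times, with a unitary
step (1) in between, gives one sample path of the stochastic process.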

If the measurement at different times has the same (or different)
nature, the P_s at these times are the same (or different).
It is possible to introduce 'empty measurements' at arbitrary
intermediate times with a trivial sum over a singleton s, where P_s=1.
For continuous measurements (where the open system cannot be considered
closed at all but a discrete number of times), one needs to take
a continuum limit of the above description. Depending how one takes
the limit, one gets quantum diffusion processes or quantum jump
processes. In this case, the density matrix for the associated
deterministic expectation evolves according to a Lindblad dynamics.
Realistic measurements (i.e. those taking into account the unavoidable
uncertainty) are not modelled by von-Neumann measurements, but rather
by positive operator valued measures, short POVMs. These are well
explained in
For more on real measurement processes (as opposed to the
von-Neumann measurement caricature treated in typical textbooks
of quantum mechanics), see, e.g.,
V.B. Braginsky and F.Ya. Khalili,
Quantum measurement,
Cambridge Univ. Press, Cambridge 1992

S1f. Interaction with a heat bath
Quantum mechanics in the presence of a heat bath requires the use
of density matrices. Instead of the usual von-Neumann equation
rhodot = rho \lp H
(for \lp see the section on 'Quantum-classical correspondence'),
the dynamics of the density matrix is given by a dissipative version
of it,
rhodot = rho \lp H + L(rho)
usually associated with the name of Lindblad. Here L(rho)
is a linear operator responsible for dissipation of energy to
the heat bath; it is not a simple commutator but can have
a rather complex form.
To get the Lindblad dynamics from a Hamiltonian description of
system plus bath, one uses the projection operator formalism.
The clearest treatment I know of is in
H Grabert,
Projection Operator Techniques in Nonequilibrium
Statistical Mechanics,
Springer Tracts in Modern Physics, 1982.
The final equations for the Lindblad dynamics are (5.4.48/49)
in Grabert's book.
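
To make the structure of such an equation concrete, here is a minimal
Python sketch for a two-level system. It assumes the standard textbook
form of the dissipative term with a single jump operator J,
L(rho) = J rho J^* - (J^*J rho + rho J^*J)/2,
which is not spelled out above; the Hamiltonian, the decay rate gamma,
the units (hbar = 1) and the simple Euler integration are likewise
illustrative assumptions.

```python
import math

def mm(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(A):
    """Adjoint (conjugate transpose) of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def lin(*terms):
    """Linear combination lin((c1, A1), (c2, A2), ...) of 2x2 matrices."""
    return [[sum(c * A[i][j] for c, A in terms) for j in range(2)]
            for i in range(2)]

gamma, omega = 1.0, 2.0                          # assumed decay rate, level splitting
H = [[-omega / 2, 0.0], [0.0, omega / 2]]        # index 0 = ground, 1 = excited
J = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]        # jump operator: excited -> ground

def rhs(rho):
    """rhodot = rho lp H + L(rho), with rho lp H = i [rho, H] for hbar = 1."""
    JdJ = mm(dag(J), J)
    return lin((1j, mm(rho, H)), (-1j, mm(H, rho)),
               (1.0, mm(mm(J, rho), dag(J))),
               (-0.5, mm(JdJ, rho)), (-0.5, mm(rho, JdJ)))

rho = [[0.5, 0.5], [0.5, 0.5]]                   # pure initial state (|g> + |e>)/sqrt(2)
dt, steps = 0.001, 1000                          # Euler integration up to t = 1
for _ in range(steps):
    rho = lin((1.0, rho), (dt, rhs(rho)))

trace = (rho[0][0] + rho[1][1]).real
excited = rho[1][1].real
print("trace:", round(trace, 6))
print("excited population:", round(excited, 4),
      "vs 0.5*exp(-gamma*t):", round(0.5 * math.exp(-gamma), 4))
```

The trace of rho is preserved while the excited-state population
decays; the energy lost in this way is what goes to the heat bath.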

S1g. Quantum-classical mechanics
Quantum mechanics and classical mechanics are very close relatives.
There are analogous objects for everything of relevance in
classical and quantum statistical mechanics.
Observable f:
classical - real phase space function f(x,p)
quantum - Hermitian linear operator or sesquilinear form f
Lie product f \lp g:
read \lp as 'Lie', and visualize it as inverted, stylized L;
Macro for LaTeX:
classical: f \lp g = {g,f} in terms of the Poisson bracket
quantum: f \lp g = i/hbar [f,g] in terms of the commutator
The Lie product is bilinear in the arguments and satisfies
f \lp g = - g \lp f
f \lp gh = (f \lp g)h + g(f \lp h) (Leibniz)
f \lp (g \lp h) = (f \lp g) \lp h + g \lp (f \lp h) (Jacobi)
Invariant measure:
classical - integral f := integral dxdp f(x,p)
quantum - integral f := trace f
Integrability: integral |f| finite
quantum integrable <==> f trace class
Partial integration formula:
integral f \lp g = 0.
Dynamics: df/dt = X_H f := H \lp f with Hermitian H
canonical transformations = mappings exp(tX_H) with Hermitian H
Liouville's theorem says that
integral f = integral exp(tX_H)f
The infinitesimal form of this is the partial integration formula.
State rho:
classical - real integrable phase space function rho(x,p)>=0
quantum - Hermitian positive semidefinite trace class operator rho
both normalized to integral rho = 1.
expectation of f in state rho:
<f> = integral rho f
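
In the quantum case, the rules for the Lie product and the partial
integration formula can be checked numerically. The sketch below uses
units with hbar = 1 and three arbitrarily chosen 2x2 test matrices
(assumptions made only for the example), with integral f = trace f.

```python
def mm(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comb(*terms):
    """Linear combination comb((c1, A1), (c2, A2), ...) of 2x2 matrices."""
    return [[sum(c * A[i][j] for c, A in terms) for j in range(2)]
            for i in range(2)]

def lp(f, g):
    """Quantum Lie product f lp g = (i/hbar) [f, g], with hbar = 1."""
    return comb((1j, mm(f, g)), (-1j, mm(g, f)))

def close(A, B, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(A[i][j] - B[i][j]) < tol
               for i in range(2) for j in range(2))

def trace(A):
    return A[0][0] + A[1][1]

# Arbitrary Hermitian test matrices.
f = [[1, 2 - 1j], [2 + 1j, -3]]
g = [[0, 1j], [-1j, 2]]
h = [[2, 1], [1, 1]]

antisymmetry = close(lp(f, g), comb((-1, lp(g, f))))
leibniz = close(lp(f, mm(g, h)),
                comb((1, mm(lp(f, g), h)), (1, mm(g, lp(f, h)))))
jacobi = close(lp(f, lp(g, h)),
               comb((1, lp(lp(f, g), h)), (1, lp(g, lp(f, h)))))
partial_integration = abs(trace(lp(f, g))) < 1e-12

print(antisymmetry, leibniz, jacobi, partial_integration)
```

The classical case works in the same way with the Poisson bracket in
place of the commutator, but needs symbolic or numerical derivatives.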

S1h. Can all quantum states be realized in Nature?
No. Many mathematically conceivable states do not exist in Nature,
for example, that of water at an absolute temperature of zero.
Quantum mechanics does not demand that all states are realizable.
For a number of tiny systems with a few levels, all states are
realizable with reasonable precision. However, the larger the system
the fewer states are realized.
The number of states realized at a given time of very large systems
such as human beings or galaxy clusters is even so small that it
can be approximately counted!

S1i. Modes and wave functions of laser beams
The physical state described by a typical laser beam is a state with
an indeterminate number of photons, since it is usually not an
eigenstate of the photon number operator. This essentially means that
in a beam, a certain number of photons cannot be meaningfully asserted;
instead, one has a meaningful photon density, referred to as the beam
intensity.
Thus the traditional N-particle picture does not apply.
Instead one has to work in a suitable Fock space.
The Maxwell-Fock space is obtained by 'second quantization' of the mode
space H_photon, consisting of all mode functions, i.e., solutions A(x)
of the free Maxwell equations, describing a classical background
electromagnetic field in vacuum. H_photon may be thought of as the
single photon Hilbert space, in analogy to the single electron Hilbert
space of solutions of the Dirac equation. (However, following up on
this analogy and calling A(x) a wave function leads to confusion later
on, and is best avoided.)
Actually, because of gauge invariance, the situation is slightly more
complicated, and best described in momentum space. The Maxwell
equations reduce in Lorentz gauge, partial dot A(x) = 0, to
partial^2 A(x)=0, whence the Fourier transform of A(x) has the form
delta(p^2) Ahat(p), and Ahat(p) must satisfy the transversality
condition
p dot Ahat(p) = 0.
By gauge invariance, only the coset of Ahat(p) obtained by adding
arbitrary multiples of p has a physical meaning, reflecting the
transversal nature of the free electromagnetic field.
This coset construction is needed to turn the space of modes
into a Hilbert space H_photon with invariant inner product
<A|B>= integral Ahat(p) dot Bhat(p) Dp,
where
Dp = d\p/p_0 = dp_1 dp_2 dp_3/p_0,
is the Lorentz invariant measure on the photon mass shell,
0 < p_0 = |\p| = sqrt(p_1^2+p_2^2+p_3^2)
(negative frequencies are discarded to get an irreducible
representation of the Poincare group).
Indeed, without the coset construction, the inner product is only
positive semidefinite, hence gives only a pre-Hilbert space.

Each (sufficiently nice) mode function A(x) gives rise to a coherent
state ||A>> in the Maxwell-Fock space, to an associated annihilation
operator
a(A) = integral Ahat(p) a(p) Dp,
where a(p) is the QED annihilation operator for a photon with
momentum p, and to the corresponding creation operator a^*(A) = a(A)^*.
The annihilation and creation operators a(A) and a^*(A) produce a
single-mode Fock subspace consisting of all |A,psi>, where psi is the
unnormalized wave function of a harmonic oscillator; |psi|^2 is the
intensity of the beam.
The coherent state itself corresponds to the normalized vacuum state
of the harmonic oscillator, ||A>> = |A,vac>. If psi is a Hermite
polynomial H_k, |A,psi> is an eigenstate of the photon number operator
with eigenvalue k, and one has a k-photon state.
The Maxwell-Fock space is the closure of the space spanned by all
the |A,psi> together (and indeed, already the closure of the space
spanned by all ||A>>). This space is the pure electromagnetic field
sector of QED, describing a physical vacuum, i.e., a region of the
universe where matter is absent though radiation may be present.
In optics experiments, laser beams are often idealized by ignoring
their extension perpendicular to the transmission direction. Then each
beam can be described by some |A,psi>. In particular, for a
monochromatic beam, A is a plane wave, A(x)=A_0 exp(-i p dot x).
Of course, this matches the original approximation that we have a
beam only with a grain of salt, since a plane wave is not normalized.
A coherent pair of laser beams obtained by splitting is described by
a superposition |A_1,psi_1> + |A_2,psi_2> of the two beams.
Beams of thermal light (such as that from the sun) and pairs of
beams created by independent sources, cannot be described by wave
functions alone, but need a density formulation. A single light beam
is then described (in the same idealization) by a mode A and a density
matrix rho in a single-mode Fock space, while k light beams are
described by k modes A and a density matrix rho in a k-mode Fock space.
In many treatments, the modes are left implicit, so that one works
only in the k-mode Fock space. This simplifies the presentation, but
hides the connection to the more fundamental QED picture.
For a thorough study of the latter, see the bible on quantum optics,
L. Mandel and E. Wolf,
Optical Coherence and Quantum Optics,
Cambridge University Press, 1995.

S1j. Classical and quantum tunneling
Consider a particle in an external potential.
Assume the potential is everywhere finite, locally constant and positive
near the origin, and decays to zero far away.
Near the origin there is no force, so a particle at rest there stays at
rest when the motion is deterministic and classical.
In practice, however, the classical, deterministic setting is an
approximation only, and the particle makes random motions.
Thus it moves away from the origin and will sooner or later reach
the nonconstant part of the potential. With low probability p,
it will even escape over any barrier; roughly, log p is proportional
to the negative barrier height. For details, you might
wish to consult my paper
A. Neumaier,
Molecular modeling of proteins and mathematical prediction of
protein structure,
SIAM Rev. 39 (1997), 407-460.
and the references there.
Quantum mechanically, there is always a probability of escaping to
infinity, without assuming any approximations. This is called
tunneling.
In both cases, once the particle is in the infinite region,
the probability that it returns is zero.
Thus a positive potential drives a particle in the long run off to
infinity (though, in case of a high barrier, one has to wait a long
time). In particular, in the classical case one also has a form of
(stochastic) tunneling.
Thus it is justified to refer to a potential such as the above as
repelling. However, no one would object if you call a potential
repelling _only_ in the neighborhood of a strict local minimizer, i.e.,
close to a metastable state.

Of course, a golf ball sitting on top of a flat hill will not move
down the hill; because of friction it remains in a metastable state.
Thus the above is an idealization. But most of physics is idealized,
and the language is also somewhat idealized (and, as actually used by
people, not even completely precise).

S1k. Quantization in non-Cartesian coordinates
Textbook quantization rules assume (often silently, without warning)
Cartesian coordinates. The rules derived there are based on
canonical commutation rules and are invalid for systems
described in other coordinate systems.
In particular, a Hamiltonian alone does not have a physical meaning
since it can be quite arbitrarily transformed by coordinate
transformations. The Hamiltonian needs to be combined with the
correct Poisson bracket to yield the correct dynamical equations.
Only if the classical Poisson bracket satisfies the canonical
commutation rules, the quantum mechanics is obtained by imposing
canonical commutation rules on the commutators.
The standard quantization procedure assumes that the symplectic form
underlying the Hamiltonian description has the standard form
p dq - q dp. Under a coordinate transformation, the symplectic form
changes into something nonstandard, and naive quantization gives
wrong results.
To get correct results, one has to take account of the correct
symplectic structure, more precisely of the Poisson bracket defined
by it. This is most naturally done in a differential geometric
setting, in terms of symplectic manifolds and Poisson manifolds.
To proceed, one must quantize a symplectic (or a Poisson) manifold
together with a Hamiltonian defined on it.
This combination is invariant under coordinate transformations
and hence has a coordinate-independent geometric meaning.
How to quantize Hamiltonians on a symplectic (or a Poisson) manifold
is the subject of geometric quantization, about which there is a
significant literature.

S1l. Second quantization
Second quantization is a way of writing the quantum mechanics of
indistinguishable particles in such a way that it makes statistical
mechanics calculations easy and makes everything look like field theory.
One starts with a distinguished vacuum state |vac> and a family of
annihilation operators a(x) with their adjoints, the creation
operators a^*(x), satisfying the canonical commutation relations (CCR)
[a(x),a(y)] = 0, [a(x),a^*(y)] = delta(x-y).
(This is for Bosons; for Fermions one has instead canonical
anticommutation relations, CAR, and everything below gets additional
minus signs in certain places.)
A pure (permutation symmetric) N-particle state with wave function
psi(x_1:N) is written in 2nd quantization as
psi = integral dx_1:N psi(x_1:N) a^*(x_1:N) |vac>,
hence the corresponding density matrix
rho = psi psi^*
takes the form
rho = integral dx_1:N dy_1:N rho(x_1:N,y_1:N),
where rho(x_1:N,y_1:N) is the rank one operator
rho(x_1:N,y_1:N) = psi(x_1:N) a^*(x_1:N) |vac><vac| a(y_1:N) psi(y_1:N)^*.
Using this correspondence, one can do in second quantization whatever
one can do in first quantization (i.e., wave mechanics),
and match the results.
If f is a 1-particle operator given by an integral operator with
kernel f(x,y) (the general case follows by taking limits), so that
(f psi)(x_1:N)
= sum_a integral dx f(x_a,x) psi(x_{1:a-1},x,x_{a+1:N}),
the formula
<f> = integral dx dy <y|Rho|x> f(x,y)
defines the 1-particle density matrix Rho. The form of f in second
quantization is
f = integral dx dy f(x,y) a^*(x) a(y)
(exercise: check that it has indeed the desired action on an
N-particle state!), hence one has
<f> = integral dx dy f(x,y) <a^*(x)a(y)>,
and comparison with the definition of Rho gives the formula
<x|Rho|y> = <a^*(y)a(x)> = trace a(x) rho a^*(y),
which can therefore be viewed as the definition of the 1-particle
density matrix in second quantization.
Authors who fear integrals write instead similar formulas with
sums in place of integrals and discrete indices in place of the x,y.
Also, one can do the same in momentum space rather than position space,
which amounts to a change of basis but generally leads to
computationally more tractable formulations.
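
With discrete indices, the formula <x|Rho|y> = <a^*(y)a(x)> can be
checked directly. The following Python sketch is only an illustration
under stated assumptions: three discrete 'positions', N = 2 Bosons, an
arbitrarily chosen symmetric wave function, and Fock states stored as
dictionaries from occupation numbers to amplitudes. In first
quantization, Rho is computed from the wave function normalized to
sum |psi|^2 = 1 as <x|Rho|y> = N sum_z psi(x,z) psi(y,z)^*.

```python
import math
from collections import defaultdict

M = 3   # number of discrete 'positions' (hypothetical small example)
N = 2   # number of particles

def create(i, state):
    """a_i^* acting on a Fock state {occupation tuple: amplitude}."""
    out = defaultdict(complex)
    for occ, amp in state.items():
        new = occ[:i] + (occ[i] + 1,) + occ[i + 1:]
        out[new] += math.sqrt(occ[i] + 1) * amp
    return dict(out)

def annihilate(i, state):
    """a_i acting on a Fock state {occupation tuple: amplitude}."""
    out = defaultdict(complex)
    for occ, amp in state.items():
        if occ[i] > 0:
            new = occ[:i] + (occ[i] - 1,) + occ[i + 1:]
            out[new] += math.sqrt(occ[i]) * amp
    return dict(out)

def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(complex(amp).conjugate() * v.get(occ, 0)
               for occ, amp in u.items())

vac = {(0,) * M: 1.0}

# Arbitrary symmetric 2-particle wave function, normalized to sum |psi|^2 = 1.
raw = [[1.0, 0.5j, 0.0],
       [0.5j, 2.0, -1.0],
       [0.0, -1.0, 0.5]]
norm = math.sqrt(sum(abs(raw[i][j]) ** 2 for i in range(M) for j in range(M)))
psi = [[raw[i][j] / norm for j in range(M)] for i in range(M)]

# First quantization: <x|Rho|y> = N sum_z psi(x,z) psi(y,z)^*
rho_first = [[N * sum(psi[x][z] * complex(psi[y][z]).conjugate()
                      for z in range(M))
              for y in range(M)] for x in range(M)]

# Second quantization: Psi = sum psi(x_1,x_2) a^*(x_1) a^*(x_2) |vac>,
# rescaled to norm 1 in Fock space.
Psi = defaultdict(complex)
for i in range(M):
    for j in range(M):
        for occ, amp in create(i, create(j, vac)).items():
            Psi[occ] += psi[i][j] * amp
scale = math.sqrt(inner(Psi, Psi).real)
Psi = {occ: amp / scale for occ, amp in Psi.items()}

# <x|Rho|y> = <a^*(y) a(x)> = <a(y) Psi | a(x) Psi>
rho_second = [[inner(annihilate(y, Psi), annihilate(x, Psi))
               for y in range(M)] for x in range(M)]

max_diff = max(abs(rho_first[x][y] - rho_second[x][y])
               for x in range(M) for y in range(M))
print("max difference:", max_diff)
print("trace:", round(sum(rho_second[x][x] for x in range(M)).real, 6))
```

The trace of Rho equals the particle number N, as it should, since
integral dx a^*(x)a(x) is the number operator.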

S1m. When is an object macroscopic?
One says that thermodynamics and statistical mechanics apply to
macroscopic objects. But when is an object macroscopic?
Thermodynamics and statistical mechanics are approximate, asymptotic
descriptions valid for 'sufficiently large' objects.
The approximations made are better and better the larger the object.
One can place the barrier anywhere; if one puts it too low, the
approximate description will be poor, if one puts it too high it
won't apply to the system of interest.
Thus the loose language accommodates the freedom in modeling the
user has when choosing the description level and the accuracy level.
It is only in the same sense subjective as is the choice of a
system of interest. What is interesting for one person or investigation
may be different from what is interesting for another person or
investigation; nevertheless, both may employ objective tools.
The mathematical meaning underlying this loose language is called the
thermodynamic limit. It makes the term 'macroscopic'
precise in a similar way as the mathematical notion of a limit N->inf
makes the term 'N sufficiently large' precise.
If one accepts the vague terminology to avoid talking always about
limits, one can give the following definition (which reflects the
subjectivity in the qualification about the modeling accuracy):
In statistical mechanics, all macroscopic observables are ensemble
averages. Thus, formally, a "macroscopic observable" is the expectation
of a space-time dependent field operator which remains constant
within the modeling accuracy under changes in space and time
smaller than the modeling accuracy.

S1n. The role of the ergodic hypothesis

Statistical mechanics textbooks often invoke the so-called ergodic
hypothesis (assuming that every phase space trajectory comes
arbitrarily close to every phase space point with the same values of
all conserved variables as the initial point of the trajectory)
to derive thermodynamics from the foundations. However, textbook
statistical mechanics gives only a gross simplification of the
power of thermodynamics. The ergodic hypothesis is not needed to make
thermodynamics valid. Indeed, the ergodic hypothesis is invalid in
many cases - namely always when the system needs additional variables
to be thermodynamically described.
This is the case for fluids near the critical point, for finite objects
at their surfaces, for systems with interfaces, for metastable states,
for molecular systems in the absence of chemical reactions (here the
number of molecules of each species is conserved), etc.
But this does not invalidate thermodynamics - the latter only requires
that a sufficiently large set of macroscopic variables (in the above
sense) is included in the list of thermodynamic variables.
Indeed, traditional thermodynamics accounts for molecules, surface
tension, metastability, etc., without any change to the formalism.

Probably the ergodic hypothesis, restricted to a limited piece of a
submanifold of the phase space with fixed values of the macroscopic
variables (whether conserved or not) is ''roughly'' equivalent to the
completeness of the set of distinguished macroscopic observables,
in the sense that every other macroscopic observable can be defined
in terms of the distinguished ones. But ...
1. It is the latter property (only) which can be checked experimentally:
Completeness holds if and only if the properties of the system under
study are indeed predictable by the thermodynamics of the distinguished
observables. Experiment (or experience), together with simplicity of
the description, decides in _all_ practical situations what is the set
of distinguished observables.
Indeed, we refine a model whenever we discover significant deviations
from the thermodynamical behavior of a previous simpler model.
Thus thermodynamics takes the form of a setting for describing
material properties to which any successful description has to conform
by axiomatic decree.
2. The ergodic hypothesis can be proved only for extremely simple
systems. In particular, these systems must conform to classical
mechanics - there is no simple quantum version of ergodic dynamics.
Moreover, there are many classical systems which are chaotic only in
part of their phase space - they are probably not ergodic, as the
number of conserved quantities depends on where in the phase space one
is.
3. Thermodynamics applies also for nearly conserved quantities, where
the ergodic argument becomes vague; conversely, near ergodicity (up to
the model accuracy) is enough to make a thermodynamic description
valid. In particular, thermodynamics applies near a critical point
where there cannot be an ergodic argument since there is no extra
conserved quantity but an order parameter is needed to give a correct
description. (At which distance from the critical point should one
ignore the order parameter? Ergodic arguments have nothing to say here.)
4. There are studies about the nonergodic behavior of supercooled
liquids, e.g., Phys. Rev. A 43, 1103 - 1106 (1991).

Thus I think it is best to ignore the ergodic hypothesis as a means for
explaining statistical mechanics, except in some simple model cases.
It should have no deeper relevance than the hard sphere model of a
monatomic gas (which has been shown to be ergodic, I believe).

S1o. Does quantum mechanics apply to single systems?
It is clear phenomenologically that statistical mechanics (and hence
quantum mechanics) applies to single systems like a particular cup of
tea, irrespective of what the discussions about the foundations of
physics say (see many other entries in this FAQ). Thus statistical
mechanics and quantum mechanics do not only apply - as is often
claimed - to large ensembles of independently and identically prepared
systems; when the system is large enough (i.e., macroscopic),
a _single_ system is enough.
(For smaller single systems, see the entry
''How do atoms and molecules look like?'' in the present FAQ.)
In classical statistical mechanics, the traditional bridge between
the ensemble view and thermodynamics (which clearly applies to single
systems) is the ergodic hypothesis. But there is not enough time
in the universe to explore more than an extremely tiny region of the
about 10^25-dimensional phase space of the cup of tea to explain the
success of the thermodynamical description by ergodicity.
In quantum mechanics, the situation is even worse - usually it is not
even attempted here to bridge the gap.
The best treatment I know of the foundational problems
involved in classical statistical mechanics is in the book
L. Sklar,
Physics and Chance,
Cambridge Univ. Press, Cambridge 1993.
but it does not present a solution. Other sources are not better in
this respect.
My own solution is the ''thermal interpretation'' of
physics, discussed to some extent in Chapter 7 of the book
Arnold Neumaier and Dennis Westra,
Classical and Quantum Mechanics via Lie algebras,
Cambridge University Press, to appear (2009?).
and in my recent slides
A. Neumaier,
Classical and quantum field aspects of light,
A. Neumaier,
Optical models for quantum mechanics,
and explored in more detail in my German
Ein Theoretische Physik FAQ
under the name ''consistent experiment interpretation''
The key idea is that mathematical expectation has two different
interpretations in physics, one as average over a large number of
cases, and the other as a means of defining observables. That the
two interpretations have the same mathematical properties is the
reason they have been confused in the past. The thermal interpretation
separates them neatly and thus gets rid of most of the confusing
aspects of the foundations of physics.

S1p. Dissipative dynamics and Lagrangians
Any system of ordinary differential equations can be brought
into an artificial Lagrangian form, by first rewriting it in first
order form
F(q,q') = 0,
doubling the degrees of freedom by introducing conjugate variables p,
and then considering the Lagrangian
L(p,q)= p^T F(q,q').
In particular, this provides a Lagrangian formulation of dissipative
systems, such as the damped harmonic oscillator
m q'' + c q' + k q = 0 (m,c,k >0)
Unfortunately, the Hamiltonian in such a formulation has
nothing to do with the physical energy
E = (m q'^2 + k q^2)/2
The same holds for various other representations for the damped
harmonic oscillator found in the literature.
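
That the physical energy E is not conserved by the damped harmonic
oscillator (and hence cannot serve as a time-independent Hamiltonian)
is easily seen numerically. The following Python sketch uses
hypothetical parameter values and a standard Runge-Kutta integrator,
both chosen only for illustration.

```python
m, c, k = 1.0, 0.2, 1.0          # hypothetical parameter values, all > 0

def deriv(q, v):
    """First order form of m q'' + c q' + k q = 0, with v = q'."""
    return v, -(c * v + k * q) / m

def rk4_step(q, v, dt):
    """One classical fourth order Runge-Kutta step."""
    k1q, k1v = deriv(q, v)
    k2q, k2v = deriv(q + 0.5 * dt * k1q, v + 0.5 * dt * k1v)
    k3q, k3v = deriv(q + 0.5 * dt * k2q, v + 0.5 * dt * k2v)
    k4q, k4v = deriv(q + dt * k3q, v + dt * k3v)
    q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return q, v

def energy(q, v):
    """Physical energy E = (m q'^2 + k q^2)/2."""
    return 0.5 * (m * v * v + k * q * q)

q, v, dt = 1.0, 0.0, 0.01
energies = [energy(q, v)]
for _ in range(1000):            # integrate up to t = 10
    q, v = rk4_step(q, v, dt)
    energies.append(energy(q, v))

print("E(0)  =", round(energies[0], 4))
print("E(10) =", round(energies[-1], 4))
```

Along any solution, dE/dt = -c q'^2 <= 0, so E decreases until the
oscillator comes to rest.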
Lagrangians for the damped harmonic oscillator go back to
H. Bateman, Phys. Rev. 38, 815-819 (1931); the treatise
P.M. Morse and H. Feshbach,
Methods of Theoretical Physics
McGraw-Hill, Boston 1953
discusses the procedure in Chapter 3 in terms of 'mirror images'
= additional dynamical variables needed to absorb the missing energy,
and remarks on p 313:
''The introduction of the mirror image ... is probably too artificial
a procedure to expect to obtain much of physical significance from
it.''
And indeed, the book doesn't make use of it anywhere.

Having a formal Lagrangian or Hamiltonian is no virtue in itself.

In particular, for a _quantum_ system, the Hamiltonian _must_ be the
energy. Playing around with alternative Lagrangians and Hamiltonians
may be amusing, but does not produce relevant physics.

Since dissipative equations (like the diffusion equation or the damped
harmonic oscillator) describe open systems (where energy is lost to an
unspecified environment), they cannot be described by a Schroedinger
equation.
Classically, dissipative systems are described by stochastic
differential equations (and their equivalent deterministic
Fokker-Planck equations) or master equations;
the diffusion equation is the particular case of a Fokker-Planck
equation for Brownian motion.
Quantum mechanically, dissipative systems are described by stochastic
Schroedinger equations or, corresponding to the Fokker-Planck level,
by quantum Liouville equations with Lindblad terms. This gives correct
physics in a dissipative environment. Many quantum optical systems
are directly modeled on the Lindblad level, where the terms have an
understandable and experimentally verifiable meaning independent of
any underlying more microscopic model.
An important recent example is that of photons on demand,
M. Keller, B Lange, K Hayasaka, W Lange and H Walther,
A calcium ion in a cavity as a controlled single-photon source,
New Journal of Physics 6 (2004), 95.
There is no trace of a Lagrangian in the modeling, and indeed, a
useful Lagrangian formulation does not exist - unless one extends the
dynamics and explicitly includes the environment.

Of course, in theory, a dissipative system is thought to be a
contracted version of a bigger conservative system which includes
the environment, and in simple situations, this theoretical view can
indeed be substantiated.
If one models the dissipative environment explicitly, one gets a
bigger conservative system, not a dissipative system. Of course,
this conservative system has a Hamiltonian or Lagrangian description,
but it does not describe the dissipative system alone. When one
contracts it to the degrees of freedom of the original system,
one gets an integro-differential equation with memory, which is no
longer described by a physically meaningful Hamiltonian or Lagrangian.
The reduced dynamics takes the exact form
m x''(t) + k x(t) = int_0^t G(s) x(t-s) ds + F(t),
with functions F(t) (the noise caused by the environment) and G(s)
(the memory kernel) that depend on the state of the environment.
If the interaction is of the usual, dissipative nature then both F(t)
and G(s) are extremely oscillating, even for intervals short compared
to the inverse frequency T of the oscillator. But the short time
averages of the memory kernel have an exponentially decaying bound on
their size and become negligible after some relaxation time tau << T.
Thus it suffices in a good approximation to take the integral
from s=0 to s=tau only. This allows us to expand x(t-s) in a second
order Taylor expansion (valid since s<=tau<<T) and to express the
integral in closed form as
int_0^t G(s) x(t-s) ds approx = dk x(t) - c x'(t) + dm x''(t)
with renormalization constants
dk = int_0^tau G(s) ds,
c = int_0^tau G(s) s ds,
dm = (1/2) int_0^tau G(s) s^2 ds,
leading to the memory-free renormalized reduced dynamics
(m-dm) x''(t) + c x'(t) + (k-dk) x(t) = F(t).
Microscopic models of the environment lead in simple cases to explicit
expressions for G(s) from which one can deduce that c>0, recovering the
traditional equation for the damped harmonic oscillator, including a
stochastic force term. (Its size can be related to the damping
coefficient and the temperature of the environment, a relation known
as the fluctuation-dissipation theorem.)
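
The quality of the second order Taylor approximation can be checked
numerically. The Python sketch below is an illustration under assumed
ingredients: a smooth, exponentially decaying model kernel
G(s) = g exp(-s/tc) with tc << T (standing in for the short time
averages of a realistic kernel), a test trajectory x(t) = cos(w t),
and Simpson's rule for the integrals. The expansion used is
x(t-s) = x(t) - s x'(t) + (s^2/2) x''(t), so that the coefficient of
x''(t) carries a factor 1/2.

```python
import math

g, tc = 1.0, 0.01        # assumed kernel strength and decay time
tau = 10 * tc            # cutoff; the kernel is negligible beyond it
w = 1.0                  # oscillator frequency, so that tau << 1/w
t = 0.7                  # time at which both sides are compared

def G(s):
    """Assumed model memory kernel."""
    return g * math.exp(-s / tc)

def x(u):
    return math.cos(w * u)

def xd(u):
    return -w * math.sin(w * u)

def xdd(u):
    return -w * w * math.cos(w * u)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Renormalization constants
dk = simpson(G, 0.0, tau)
c = simpson(lambda s: G(s) * s, 0.0, tau)
dm = 0.5 * simpson(lambda s: G(s) * s * s, 0.0, tau)

# Memory integral (truncated at tau) versus its memory-free approximation
exact = simpson(lambda s: G(s) * x(t - s), 0.0, tau)
approx = dk * x(t) - c * xd(t) + dm * xdd(t)
error = abs(exact - approx)

print("dk, c, dm:", dk, c, dm)
print("memory integral:", exact, " approximation:", approx)
print("error:", error)
```

For this model kernel, c > 0, and the error is of the order of the
neglected third order term, g tc^4 w^3.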
A thorough discussion of the reduction of microscopic conservative
large systems to dissipative subsystems of interest is given in
H Grabert,
Projection Operator Techniques in Nonequilibrium
Statistical Mechanics,
Springer Tracts in Modern Physics, 1982
at a much more general level that also applies for
many other dissipative systems.

There are cases where one needs to model the memory to capture the
essence of the reduced dynamics. But in many cases, a simpler,
memory-free description is possible and adequate. One can remove the
memory by employing a Markov approximation, and gets again a
differential equation, which defines the Lindblad (or, classically,
the Fokker-Planck) dynamics. Again, this is no longer described by a
Hamiltonian or Lagrangian framework.
In the extended formulation with explicit environment or with memory,
already a simple damped harmonic oscillator becomes a huge and
unwieldy dynamical system which is no longer equivalent to the damped
harmonic oscillator, but includes unwanted environment terms or memory
terms. In cases where one really needs to model the memory, the system
therefore is no longer a damped harmonic oscillator. The latter is
described by a simple linear constant coefficient second order
differential equation for a single function, and has no memory.
Its analysis is very simple, and compared to that any more detailed
description is unwieldy.

In practice, the dissipative formulation therefore stands by itself
(apart from lip service paid to a hypothesized more fundamental
conservative description).
The situation is similar to that in fluid dynamics. In theory, the
Navier-Stokes equations (which are dissipative) should be derivable from
a Lagrangian. Indeed, such derivations have been given, but only for
very simple model problems such as an ideal gas. However, there is no
microscopic derivation of the Navier-Stokes equations in the practically
interesting case of water at room temperature...

S1q. How can QM be stochastic while the Schroedinger equation is not?
The Schroedinger equation is a deterministic wave equation.
But when we set up an experiment to measure either position or
momentum, we get uncertain, stochastic outcomes.
So - is quantum mechanics deterministic or stochastic?

One has to be careful in the interpretation of the foundations...

Fortunately, the same apparent paradox already occurs in classical
physics; hence the paradox cannot have anything to do with the
peculiarities of quantum mechanics.
Indeed, a Fokker-Planck equation is a deterministic partial
differential equation. But when measuring a process modelled by it
- such as the position of a grain of pollen in Brownian motion -,
we get only probabilistic results. Now Fokker-Planck equations are
essentially equivalent to classical stochastic differential equations.
So - do they describe a deterministic or a stochastic process?

The point resolving the issue is that, both in stochastic differential
equations and in quantum mechanics, probabilities satisfy deterministic
equations, while the quantities observed to deduce the probabilities
do not.
Thus, in both cases, probabilities are deterministic ''observables''
while the position of a grain of pollen in classical mechanics, or
position and momentum in quantum mechanics, are not.
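
The classical side of this comparison can be made concrete with a
symmetric random walk, used here as an assumed discrete stand-in for
Brownian motion. In the Python sketch below, the probability
distribution is propagated by a deterministic recursion (the discrete
analogue of a Fokker-Planck equation), while individual sample paths
are random.

```python
import random

def evolve_distribution(p):
    """One deterministic step for the distribution of a symmetric
    random walk: p_new(x) = (p(x-1) + p(x+1)) / 2."""
    new = {}
    for x, px in p.items():
        new[x - 1] = new.get(x - 1, 0.0) + 0.5 * px
        new[x + 1] = new.get(x + 1, 0.0) + 0.5 * px
    return new

def sample_path(steps, rng):
    """Endpoint of one random realization of the walk, starting at 0."""
    x = 0
    for _ in range(steps):
        x += rng.choice((-1, 1))
    return x

steps = 20

# Deterministic evolution of the probabilities
p = {0: 1.0}
for _ in range(steps):
    p = evolve_distribution(p)
variance = sum(px * x * x for x, px in p.items())

# Stochastic outcomes of a few individual 'measurements'
rng = random.Random(1)
endpoints = [sample_path(steps, rng) for _ in range(5)]

print("total probability:", sum(p.values()))
print("variance:", variance)
print("sampled endpoints:", endpoints)
```

Rerunning the deterministic part always gives the same distribution,
while the sampled endpoints change with the seed of the random number
generator.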

S1r. Measurement theory for real numbers
The standard textbook measurement theory says that the possible
measurement results in measuring an observable given by a Hermitian
operator A are its possible eigenvalues, with a probability density
depending on the state of the system. This is part of the content of
Born's rule, and counts as one of the cornerstones of the
interpretation of quantum mechanics.
But Born's rule gives only a very idealized account of measurement
theory, and gives no sufficient explanation for what is going on in
many nontrivial measurements.
The spectrum of the Hamiltonian of the electron of a hydrogen atom
has a discrete part, catering for its bound states. According to the
idealized textbook measurement theory, a measurement of the energy
of a bound state should produce an infinitely accurate value agreeing
with one of the values in the (QED-corrected) Balmer (etc.) series.
But this is ridiculous. Repeated preparation and measurement of the
position of the ``same'' spectral lines (which provide these energy
measurements, relative to an appropriate zero of the energy) yields
different results, from which the energies themselves can be obtained
only to a certain accuracy.
Thus Born's rule does not account for the interpretation of a
measurement of the energy of an electron. For similar reasons,
measurements of particle masses or resonance energies do not reveal
the exact values (which they should according to Born's rule) but only
approximations whose quality depends a lot on the way the measurement
is done (an aspect that does not figure at all in Born's rule).
Measurements such as that of a particle lifetime or the integral cross
section of a particular reaction do not even have a natural associated
operator of which the measurement result would be an eigenvalue.
The idealized textbook measurement theory based on Born's rule is
appropriate only for the measurement of spin and related variables
that result in recording decisions of finite information content.
Thus the measurement process as described by von Neumann (and copied
from there to numerous textbooks) is an unrealistic idealization
compared with many (and probably most) real measurements.
The latter are usually much better described by suitable POVMs
(positive operator valued measures) rather than by Born's rule,
which corresponds to PVMs (projection-valued measures), a special case
of POVMs in which the positive operators are in fact projections.
See Sections 7.3-7.5 of the book
A. Neumaier and D. Westra,
Classical and Quantum Mechanics via Lie algebras,
for a realistic account of measurement theory not dependent on
Born's rule. The latter is derived there as a special case, together
with giving the condition in which it is applicable.
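As a hedged illustration of the difference between a PVM and a more
general POVM, the following sketch uses a standard textbook-style toy
model that is not taken from the book cited above: an unsharp spin
measurement with elements E_+- = (1 +- eta sigma_z)/2. The parameter
eta and all function names are assumptions made for this example.
For eta = 1 the elements are projections (the Born rule case); for
eta < 1 they are still positive and sum to the identity, but are no
longer projections.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def unsharp_spin_povm(eta):
    """Two-outcome POVM E_+- = (1 +- eta*sigma_z)/2 for a spin-1/2.

    eta = 1 gives the projection-valued (Born rule) case;
    0 <= eta < 1 models a detector with limited resolution.
    """
    e_plus = [[(1 + eta) / 2, 0.0], [0.0, (1 - eta) / 2]]
    e_minus = [[(1 - eta) / 2, 0.0], [0.0, (1 + eta) / 2]]
    return e_plus, e_minus

def is_projection(e, tol=1e-12):
    """Check E*E == E entrywise."""
    e2 = matmul(e, e)
    return all(abs(e2[i][j] - e[i][j]) < tol
               for i in range(2) for j in range(2))

# Density matrix of a spin prepared "up" along z (a pure state).
rho = [[1.0, 0.0], [0.0, 0.0]]

for eta in (1.0, 0.8):
    e_plus, e_minus = unsharp_spin_povm(eta)
    p_plus = trace(matmul(rho, e_plus))     # probability Tr(rho E_+)
    p_minus = trace(matmul(rho, e_minus))   # probability Tr(rho E_-)
    print(eta, is_projection(e_plus), p_plus, p_minus)
```

In the unsharp case the detector reports "down" with nonzero
probability even though the spin was prepared "up", which is the kind
of measurement imperfection that Born's rule alone does not model.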

S1s. The classical limit of quantum mechanics
Classical mechanics is often seen as the formal limit hbar-->0 of
quantum mechanics. Strictly speaking, this cannot be true since hbar
is a constant of nature, which is often even set to one to have
convenient units. The classical limit really is the limit of large
quantum numbers M (typically of mass, number of particles, or size of
angular momentum), when attention is limited to quantities whose
uncertainties are small compared to their expectations.
In these situations, the effect is similar to taking the limit
hbar --> 0. In these cases the relative uncertainties scale with
sqrt(hbar/M), which becomes small if either hbar is made formally
tiny or if M is large.
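A minimal numerical sketch of this scaling, under an assumption not
made in the text above: the large quantum number M is taken to be the
mean excitation number of a Glauber coherent state of a harmonic
oscillator, whose number distribution is Poissonian. The function
name and the cutoff are illustrative choices. The relative
uncertainty is computed by explicit summation and compared with
1/sqrt(M).

```python
import math

def poisson_relative_uncertainty(mean_n, cutoff_sigmas=12):
    """Relative uncertainty Delta n / <n> of a Poisson distribution,
    computed by explicit summation rather than from the known formula."""
    spread = math.sqrt(mean_n)
    n_max = int(mean_n + cutoff_sigmas * spread + 20)
    total = first = second = 0.0
    for n in range(n_max + 1):
        # log of exp(-mean) * mean^n / n!, to avoid overflow for large n
        log_p = -mean_n + n * math.log(mean_n) - math.lgamma(n + 1)
        p = math.exp(log_p)
        total += p
        first += n * p
        second += n * n * p
    mean = first / total
    var = second / total - mean * mean
    return math.sqrt(var) / mean

for mean_n in (1, 100, 10000):
    print(mean_n, poisson_relative_uncertainty(mean_n), 1 / math.sqrt(mean_n))
```

The two printed columns agree up to rounding, and both shrink as the
quantum number grows.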
Indeed, a quantum system is essentially classical if its relevant
quantities have uncertainties that are small compared to their
expectations.
The relation between classical and quantum mechanics is most easily
seen if -- as in statistical mechanics -- quantum mechanics is
presented in terms of mixed states, which correspond to density
matrices.
(Almost all quantum mechanics applied to real systems not in
the ground state needs density matrices, since pure states are very
difficult to create and propagate unless a system is in the ground
state. Pure states describe only an idealized version of quantum
reality, which in statistical mechanics appears as the approximation
in the cold limit T-->0.)
Density matrices are intrinsically quantum mechanical.
Nevertheless they exhibit very close analogies to classical densities.
Therefore everyone interested in the relations between classical and
quantum mechanics is well-advised to look at both theories in the
statistical mechanics version, where the analogies are obvious, and
the transition from quantum to classical takes the form of a simple
QM in the statistical mechanics version is almost as intuitive as
classical statistical mechanics. The only somewhat nonintuitive part
is in both cases how to interpret probability. (This is already a
severe problem in classical statistical mechanics, as the book by
Lawrence Sklar, Physics and Chance, explains in detail.)

A density matrix describes the stochastic behavior of a quantum system
in the same way as a density function describes the stochastic behavior
of a classical system. In both cases, if the system is nice enough that
the stochastic uncertainties (square roots of variances) in the
quantities of interest are much smaller than the quantities themselves,
one can form a deterministic approximation.
This deterministic approximation is given by a classical dynamical
system for the (expectations of the) quantities of interest.
Thus, in a sense, classical variables are simply expectations of
relevant quantum variables with small uncertainty. Then (and only then)
is a deterministic approximation adequate. The small uncertainty
makes these variables approximately predictable in each individual
event, and hence classical.
Classicality therefore develops whenever the uncertainties of the
quantities of interest become small compared to their expectations.
Of course, there is significant interest in quantum systems where this
does not happen, since these are decidedly non-classical. But quantum
theory shows its strange, counterintuitive features only when one
concentrates on these systems.
For more details, see, e.g., Sections 7.3-7.5 of
A. Neumaier and D. Westra,
Classical and Quantum Mechanics via Lie algebras

S1t. The classical limit via coherent states
One method for producing classical mechanics from a quantum theory is
by looking at coherent states of the quantum theory. The standard
(Glauber) coherent states have a localized probability distribution in
classical phase space, whose center follows the classical equations
of motion when the Hamiltonian is quadratic in positions and momenta.
(For nonquadratic Hamiltonians, this only holds approximately over
short times. For example, for the 2-body problem with a 1/r^2 force
law, Glauber coherent states are not preserved by the dynamics.
In this particular case, there are, however, alternative SO(2,4)-based
coherent states that are preserved by the dynamics, smeared over
Kepler-like orbits. The reason is that the Kepler 2-body problem --
and its quantum version, the hydrogen atom -- are superintegrable
systems with the large dynamical symmetry group SO(2,4).)
In general, roughly, coherent states form a nice orbit of unit vectors
of a Hilbert space H under a dynamical symmetry group G with a
triangular decomposition, such that the linear combinations of
coherent states are dense in H, and the inner product phi^*psi of
coherent states phi and psi can be calculated explicitly in terms of
the highest weight representation theory of G. The diagonal of the
N-th tensor power of H (coding systems with N-fold quantum numbers)
has coherent states phi_N (labelled by the same classical phase space
as the original coherent states, and corresponding to the N-fold highest
weight) with inner product
phi_N^*psi_N=(phi^*psi)^N
and for N --> inf, one gets a good classical limit. For the Heisenberg
group, phi^*psi is a 1/hbar-th power, and the N-th power corresponds
to replacing hbar by hbar/N. Thus one gets the standard classical limit.
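For the Glauber case this can be checked numerically. The sketch
below assumes the convention z = (q + i p)/sqrt(2) for the complex
phase space label, so that the inner product of normalized coherent
states is exp((z^* w - |z|^2/2 - |w|^2/2)/hbar); the function name
and the sample values are illustrative only.

```python
import cmath

def coherent_overlap(z, w, hbar=1.0):
    """Inner product <z|w> of two normalized Glauber coherent states.

    z and w are complex phase-space labels (assumed convention:
    z = (q + i p)/sqrt(2)); the usual dimensionless labels are
    z/sqrt(hbar) and w/sqrt(hbar).
    """
    exponent = (z.conjugate() * w - abs(z) ** 2 / 2 - abs(w) ** 2 / 2) / hbar
    return cmath.exp(exponent)

z, w = 0.3 + 0.1j, -0.2 + 0.4j
N = 50

single = coherent_overlap(z, w, hbar=1.0)
n_fold = single ** N                          # N-th power of phi^* psi
rescaled = coherent_overlap(z, w, hbar=1.0 / N)

print(abs(n_fold - rescaled))    # agreement up to rounding
print(abs(single), abs(n_fold))  # distinct states become nearly orthogonal
```

The modulus of the overlap is exp(-N |z-w|^2/(2 hbar)), so for large
N coherent states at different phase space points become almost
orthogonal, which is what one expects of classical states.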
Basic literature on relations between coherent states and the classical
limit, based on irreducible unitary representations of Lie groups
includes the book
A. M. Perelomov,
Generalized Coherent States and Their Applications,
Springer-Verlag, Berlin, 1986.
and the paper
L. Yaffe,
Large N limits as classical mechanics,
Rev. Mod. Phys. 54, 407--435 (1982)
Both references assume that the Lie group is finite-dimensional and
semisimple. This excludes the Heisenberg group, in terms of which the
standard (Glauber) coherent states are usually defined. However, the
Heisenberg group has a triangular decomposition, and this suffices to
apply Perelomov's theory in spirit. The online book
Arnold Neumaier, Dennis Westra,
Classical and Quantum Mechanics via Lie algebras,
contains a general discussion of the relations between classical
mechanics and quantum mechanics, and discusses in Chapter 16 the
concept of a triangular decomposition of Lie algebras and a summary of
the associated representation theory (though in its present version
not the general relation to coherent states).
For other relevant approaches to a rigorous classical limit, see the
online sources

S2a. Lie groups and Lie algebras
Lie groups can be illustrated by continuous rigid motion of a ball
with painted patterns on it in 3-dimensional space. The Lie group ISO(3)
consists of all rigid transformations.
A rigid transformation is essentially the act of picking up the ball and
placing it somewhere else, ignoring the detailed motion in between and
the location where one started.
Special transformations are for example a translation in northern
direction by 1 meter, or a rotation by a quarter turn around the vertical
axis at some particular point (think of a ball with a string attached).
'Rigid' means that the distances between marked points on the ball
remain the same; the mathematician talks about 'preserving distances',
and the distances are therefore labeled 'invariants'.
One can repeat the same transformation several times, or combine two
different transformations, and get another one. This is called the
product of these transformations. For example, the product of a
translation by 1 meter and another one by 2 meters in the same
direction gives one
of 1+2=3 meters in the same direction. In this case, the distances add,
but if one combines rotations about different axes the result is no
longer intuitive. To make this more tractable for calculations,
one needs to take some kind of logarithms of transformations - these
behave again additively and make up the corresponding Lie algebra
iso(3) [same letters but in lower case]. The elements of the Lie algebra
can be visualized as very small, or 'infinitesimal', motions.
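That products of rotations about different axes depend on the order
can be seen in a few lines. The following sketch uses ordinary 3x3
rotation matrices; the choice of axes, the quarter-turn angle and the
marked point are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

quarter = math.pi / 2
marked_point = [1.0, 0.0, 0.0]       # a painted dot on the ball

# Product "first rotate about x, then about z" versus the reverse order.
x_then_z = apply(matmul(rot_z(quarter), rot_x(quarter)), marked_point)
z_then_x = apply(matmul(rot_x(quarter), rot_z(quarter)), marked_point)

print([round(c, 12) for c in x_then_z])
print([round(c, 12) for c in z_then_x])
```

The painted dot ends up on the y axis in the first case and on the
z axis in the second, so the two products are different rigid
transformations.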

General Lie groups and Lie algebras extend these notions to more
general manifolds. A manifold is just a higher-dimensional version
of space, and transformations are generalized motions preserving
invariants that are important in the manifold. The transformations
preserving these invariants are also called 'symmetries', and the
Lie group consisting of all symmetries is called a 'symmetry group'.
The elements of the corresponding Lie algebra are 'infinitesimal
symmetries'.
For example, physical laws are invariant under rotations and
translations, and hence under all rigid motions. But not only these:
If one includes time explicitly, the resulting 4-dimensional space
has more invariant motions or ''symmetries''.
The Lie group of all these symmetry transformations is called the
Poincar'e group, and plays a basic role in the theory of relativity.
The transformations are now about space-time frames in uniform motion.
Apart from translations and rotations there are symmetries called
'boosts' that accelerate a frame in a certain direction, and
combinations obtained by taking products. All infinitesimal symmetries
together make up a Lie algebra, called the Poincar'e algebra.
Much more on Lie groups and Lie algebras from the perspective of
classical and quantum physics can be found in:
Arnold Neumaier and Dennis Westra,
Classical and Quantum Mechanics via Lie algebras,
Cambridge University Press, to appear (2009?).

S2b. The Galilei group as contraction of the Poincare group
The group of symmetries of special relativity is the Poincare group.
However, before Einstein invented the theory of relativity,
physics was believed to follow Newton's laws, and these have a
different group of symmetries - the Galilei group, and its
infinitesimal symmetries form the Galilei algebra.
Now Newton's physics is just a special case of the theory of relativity
in which all motions are very slow compared to the speed of light.
Physicists speak of the 'nonrelativistic limit'.
Thus one would expect that the Galilei group is a kind of
nonrelativistic limit of the Poincar'e group.
This notion has been made precise by Inonu and Wigner. They looked at
the Poincar'e algebra and 'contracted' it in an ingenious way
to the Galilei algebra. The construction could then be lifted to
the corresponding groups. Not only that, it turned out to be a
general machinery applicable to all Lie algebras and Lie groups,
and therefore has found many applications far beyond that for which
it was originally developed.
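One feature of the contraction can be made explicit in a small sketch.
It assumes that boosts are parametrized by velocity, so that an
infinitesimal boost along the x axis acts as delta t = v x/c^2,
delta x = v t; the matrix representation on (t, x, y, z) and the
function names are illustrative choices. The commutator of two boost
generators is then 1/c^2 times a rotation generator, and it vanishes
in the limit c --> inf, where boosts commute as in the Galilei
algebra.

```python
def boost_generator(axis, c):
    """Infinitesimal boost along spatial axis 1, 2 or 3 acting on (t, x, y, z).

    Parametrized by velocity: delta t = v * x_axis / c^2, delta x_axis = v * t.
    """
    k = [[0.0] * 4 for _ in range(4)]
    k[0][axis] = 1.0 / c ** 2
    k[axis][0] = 1.0
    return k

def commutator(a, b):
    """Matrix commutator a*b - b*a for 4x4 nested lists."""
    def mul(p, q):
        return [[sum(p[i][m] * q[m][j] for m in range(4)) for j in range(4)]
                for i in range(4)]
    ab, ba = mul(a, b), mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(4)] for i in range(4)]

def largest_entry(m):
    return max(abs(e) for row in m for e in row)

for c in (1.0, 10.0, 1000.0):
    kx, ky = boost_generator(1, c), boost_generator(2, c)
    print(c, largest_entry(commutator(kx, ky)))
```

The printed commutator size decreases like 1/c^2; the only nonzero
entries sit in the x-y block, where the generator of rotations about
the z axis lives.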

S2c. Representations of the Poincare group, spin and gauge invariance
Whatever deserves the name ''particle'' must move like a single,
indivisible object. The Poincare group must act on the description of
this single object; so the state space of the object carries a
unitary representation of the Poincare group. This splits into a direct
sum or direct integral of irreducible reps. But splitting means
divisibility; so in the indivisible case, we have an irreducible
representation. Thus particles are described by irreducible unitary
reps of the Poincare group. Additional parameters characterizing the
irreducible representation of an internal symmetry group = gauge
On the other hand, not all irreducible unitary reps of the Poincare
group qualify. Associated with the rep must be a consistent and causal
free field theory. As explained in Volume 1 of Weinberg's book on
quantum field theory, this restricts the rep further to those with
positive mass, or massless reps with quantized helicity.

Weinberg's book on QFT argues for gauge invariance from
causality + masslessness. He discusses massless fields in
Chapter 5, and observes (probably there, or in the beginning
of Chapter 8 on quantum electrodynamics) roughly the following:
Since massless spin 1 fields have only two degrees of freedom,
the 4-vector one can make from them does not transform correctly
but only up to a gauge transformation making up for the missing
longitudinal degree of freedom. Since sufficiently long range
elementary fields (less than exponential decay) are necessarily
massless, they must either have spin <=1/2 or have gauge behavior.
To couple such gauge fields to matter currents, the latter
must be conserved, which means (given the known conservation laws)
that the gauge fields either have spin 1 (coupling to a conserved
vector current), or spin 2 (coupling to the energy-momentum tensor).
[Actually, he does not discuss this for Fermion fields,
so spin 3/2 (gravitinos) is perhaps another special case.]
Spin 1 leads to standard gauge theories, while spin 2 leads
to general covariance (and gravitons) which, in this context,
is best viewed also as a kind of gauge invariance.
There are some assumptions in the derivation, which one can find
out by reading Weinberg's papers
Phys.Rev. 133 (1964), B1318-B1322 any spin (massive)
Phys.Rev. 134 (1964), B882-B896 any spin II (massless)
Phys.Rev. 135 (1964), B1049-B1056 grav. mass = inertial mass
Phys.Rev. 138 (1965), B988-B1002 derivation of Einstein
Phys.Rev. 140 (1965), B516-B524 infrared gravitons
Phys.Rev. 181 (1969), 1893-1899 any spin III (general reps.)
on 'Feynman rules for any spin' and some related questions, which
contain a lot of important information about applying the irreducible
representations of the Poincare group for higher spin to field
theories, and their relation to gauge theories and general relativity.
A perhaps more understandable version of part of the material is in
D.N. Williams,
The Dirac Algebra for Any Spin,
Unpublished Manuscript (2003)
Note that there are plenty of interactions that can be constructed
using the representation theory of the Lorentz group (and Weinberg's
constructions), and there are plenty of (compound) particles with
spin >2. See the tables of the particle data group, e.g., Delta(2950)
(randomly chosen from ).
R.L. Ingraham,
Prog. Theor. Phys. 51 (1974), 249-261,
constructs covariant propagators and complete vertices for spin J
bosons with conserved currents for all J. See also
H Shi-Zhong et al.,
Eur. Phys. J. C 42 (2005), 375-389

S2d. Forms of relativistic dynamics
Relativistic multiparticle mechanics is an intricate subject,
and there are no-go theorems that imply that the most plausible
possibilities cannot be realized. However, these no-go theorems
depend on assumptions that, when questioned, allow meaningful
solutions. The no-go theorems thus show that one needs to be careful
not to introduce plausible but inappropriate intuition into the
formal framework.

To pose the problem, one needs to distinguish between kinematical
and dynamical quantities in the theory. Kinematics answers the
question "What are the general form and properties of objects that
are subject to the dynamics?" Thus it tells one about conceivable
solutions, mapping out the properties of the considered representation
of the phase space (or what remains of it in the quantum case).
Thus kinematics is geometric in nature. But kinematics does not know
of equations of motion, and hence can only tell general (kinematical)
features of solutions.
In contrast, dynamics is based on an equation of motion (or an
associated variational principle) and answers the question 'What
characterizes the actual solution?', given appropriate initial or
boundary conditions. Although the actual solution may not be available
in closed form, one can discuss its detailed properties and devise
numerical approximation schemes.
The difference between kinematical and dynamical is one of convention,
and has nothing to do with the physics. By choosing the representation,
i.e., the geometric setting, one chooses what is kinematical;
everything else is dynamical.
Since something which is up to the choice of the person describing
an experiment can never be distinguished experimentally, the physics
is unaffected. However, the formulas look very different in different
descriptions, and - just as in choosing coordinate systems - choosing
a form adapted to a problem may make a huge difference for actual
computations.

Dirac distinguishes in his seminal paper
Rev. Mod. Phys. 21 (1949), 392-399
three natural forms of relativistic dynamics, the instant form,
the point form, and the front form. They are distinguished by
what they consider to be kinematical quantities and what are the
dynamical quantities.

The familiar form of dynamics is the instant form,
which treats space (hence spatial translations and rotations)
as kinematical and time (and hence time translation and Lorentz boosts)
as dynamical. This is the dynamics from the point of view of a
hypothetical observer (let us call it an 'instant observer')
who has knowledge about all information at some time t (the present),
and asks how this information changes as time proceeds.
Because of causality (the finite bound of c on the speed of material
motion and communication), the resulting differential equations
should be symmetric hyperbolic differential equations for which the
initial-value problem is well-posed.
Because of Lorentz invariance, the time axis can be
any axis along a timelike 4-vector, and (in special relativity)
space is the 3-space orthogonal to it. For a real observer,
the natural timelike vector is the momentum 4-vector of the material
system defining its reference frame (e.g., the solar system).
While very close to the Newtonian view of reality, it involves
an element of fiction in that no real observer can get all the
information needed as initial data. Indeed, causality implies that
it is impossible for a physical observer to know the present anywhere
except at its own position.

A second, natural form of relativistic dynamics is, according to Dirac,
the point form. This is the form of dynamics in which a particular
space-time point x=0 (the here and now) in Minkowski space is
distinguished, and the kinematical object replacing space is,
for fixed L, a hyperboloid x^2=L^2 (and x_0<0) in the past
of the here and now.
The Lorentz transformations, as symmetries of the hyperboloid,
are now kinematical and take the role that space translations and
rotations had in the instant form. On the other hand, _all_ space and
time translations are now dynamical, since they affect the position
of the here-and-now.
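The kinematical role of the Lorentz transformations in the point form
can be checked directly. The sketch below (units with c = 1, metric
signature +---, and sample numbers chosen purely for illustration)
takes a point on the past hyperboloid x^2=L^2, x_0<0, and shows that
a boost maps it to another point of the same hyperboloid, while a
time translation moves it off the hyperboloid.

```python
import math

def minkowski_square(x):
    """x^2 = x_0^2 - x_1^2 - x_2^2 - x_3^2 (units with c = 1)."""
    return x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2

def boost_x(x, rapidity):
    """Lorentz boost along the first spatial axis."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [ch * x[0] + sh * x[1], sh * x[0] + ch * x[1], x[2], x[3]]

def translate(x, a):
    return [xi + ai for xi, ai in zip(x, a)]

L = 2.0
space_part = [0.5, -1.0, 0.3]
x0 = -math.sqrt(L ** 2 + sum(s * s for s in space_part))   # past branch
point = [x0] + space_part

boosted = boost_x(point, 0.7)
shifted = translate(point, [0.1, 0.0, 0.0, 0.0])

# The boost preserves x^2 and the sign of x_0; the translation does not
# preserve x^2.
print(minkowski_square(point), minkowski_square(boosted), boosted[0] < 0)
print(minkowski_square(shifted))
```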
This is the form of dynamics which is manifestly
Lorentz invariant, and in which space and time appear on equal footing.
An observer in the here and now (let us call it a 'point observer')
can - in principle, classically - have arbitrarily accurate
information about the particles and/or fields on the past
hyperboloid; thus causality is naturally accounted for.
Information given on the past hyperboloid of a point can be propagated
to information on any other past hyperboloid using the dynamical
equations that are defined via the momentum 4-vector P, which is a
4-dimensional analogue of the nonrelativistic Hamiltonian.
The Hamiltonian corresponding to motion in a fixed timelike
direction u is given by H=u dot P. The commutativity of the components
of P is the condition for the uniqueness of the resulting state
at a different point x, independent of the path by which x is reached
from 0.

In principle, there are many other forms of relativistic dynamics:
As Dirac mentions on p. 396 of his paper, any 3-dimensional surface
in Minkowski space works as kinematical space if it meets
every world line with timelike tangents exactly once.
In general, those transformations are kinematical which
are also symmetries of the surface one treats as kinematical reference
surface. By choosing a surface without symmetries _all_
transformations become dynamical. For reasons of economy, one wants,
however, a large kinematical symmetry group. The full Poincare group
is possible only for free dynamics.
This leaves as interesting large subgroups two with 6 linearly
independent generators, the Euclidean group ISO(3), leading to the
instant form, and the Lorentz group SO(1,3), leading to the point form,
and one with 7 linearly independent generators, the stabilizer of
a front (or infinite momentum plane), a 3-space with lightlike normal,
leading to the front form. This third natural form of relativistic
dynamics, according to Dirac, has many uses in quantum field theory,
but here I won't discuss it further.

All forms are equivalent, related classically by canonical
transformations preserving algebraic operations and the Poisson bracket,
and quantum mechanically by unitary transformations preserving
algebraic operations and hence the commutator. This means that any
statement about a system in one of the forms can be translated into
an equivalent statement of an equivalent system in any of the other
forms.
Preferences are therefore given to one form over the other depending
solely on the relative simplicity of the computations one wants to do.
This is completely analogous to the choice of coordinate systems
(cartesian, polar, cylindric, etc.) in classical mechanics.

For a multiparticle theory, however, the different forms and the
need to pick a particular one seem to give different pictures of
reality. This invites paradoxes if one is not careful.
This can be seen by considering trajectories of classical relativistic
many-particle systems. There is a famous theorem by
Currie, Jordan and Sudarshan
Rev. Mod. Phys. 35 (1963), 350-375
which asserts that interacting two-particle systems cannot have
Lorentz invariant trajectories in Minkowski space. Traditionally,
this was taken by mainstream physics as an indication that the
multiparticle view of relativistic mechanics is inadequate,
and a field theoretical formulation is essential.
However, as time proceeded, several approaches to valid relativistic
multi-particle (quantum) dynamics were found (see the FAQ entry on
'Is there a multiparticle relativistic quantum mechanics?'),
and the theorem had the same fate as von Neumann's proof that
hidden-variable theories are impossible. Both results are now simply
taken as an indication that the assumptions under which they were
made are too strong.
In particular, once the assumption by Currie, Jordan and Sudarshan
that all observers see the same trajectories of a system of interacting
particles is rejected, their no-go theorem no longer applies.
The question then is how to find a consistent and covariant description
without this at first sight very intuitive property. But once it is
admitted that different observers see the same world but represented
in different personal spaces, the formerly intuitive property becomes
meaningless. For objectivity, it is enough that one can consistently
translate the views of any observer into that of any other observer.
Precisely this is the role of the dynamical Poincare transformations.
Thus nothing forbids an instant observer to observe
particle trajectories in its present space, or a
point observer to observe particle trajectories in its past hyperboloid.
However, the present space (or the past hyperboloid) of two different
observers is related not by kinematical transforms but dynamically,
with the result that trajectories seen by different observers on
their different kinematical 3-surface look different.
Classically, this looks strange on first sight, although
the Poincare group provides well-defined recipes for translating
the trajectories seen by one observer into those seen by another
observer.
Quantum mechanically, trajectories are fuzzy anyway, due to the
uncertainty principle, and as various successful multiparticle
theories show, there is no mathematical obstacle for such a description.
The mathematical reason for this superficially paradoxical situation
lies in the fact that there is no observer-independent definition
of the center of mass of relativistic particles, and the related fact
that there is no observer-independent definition of space-time
coordinates for a multiparticle system.
The best one can do is to define either a covariant position operator
whose components do not commute (thus defining a noncommutative
space-time), or a spatial position operator, the so-called
Newton-Wigner position operator, which has three commuting coordinates
but is observer-dependent.
(See the FAQ entry on 'Localization and position operators'.)

S2e. Is there a multiparticle relativistic quantum mechanics?
In his QFT book, Weinberg says no, arguing that there is no way to
implement the cluster separation property. But in fact there is:
There is a big survey by Keister and Polyzou on the subject
B.D. Keister and W.N. Polyzou,
Relativistic Hamiltonian Dynamics in Nuclear and Particle Physics,
in: Advances in Nuclear Physics, Volume 20,
(J. W. Negele and E.W. Vogt, eds.)
Plenum Press 1991.
that covered everything known at that time. This survey was quoted
at least 116 times;
looking these up will bring you close to the state of the art
on this.
They survey the construction of effective few-particle models.
There are no singular interactions, hence there is no need for
renormalization.
The models are _not_ field theories, only Poincare-invariant few-body
dynamics with cluster decomposition and phenomenological terms
which can be matched to approximate form factors from experiment or
some field theory. (Actually many-body dynamics also works, but the
many particle case is extremely messy.)
They are useful phenomenological models, but somewhat limited;
for example, it is not clear how to incorporate external fields.
The papers by Klink at
and work by Polyzou at
contain lots of multiparticle relativistic quantum mechanics,
applied to real particles. See also the Ph.D. thesis by Krassnigg at
(Other work in this direction includes Dirac's many-time quantum
theory, with a separate time coordinate for each particle; see, e.g.,
Marian Guenther, Phys Rev 94, 1347-1357 (1954)
and references there. Related multi-time work was done under the
name of 'proper time quantum mechanics' or 'manifestly covariant
quantum mechanics', see, e.g.,
L.P. Horwitz and C. Piron, Helv. Phys. Acta 46 (1973) 316,
but it does not reproduce standard physics, and apparently never
reached a stage useful to phenomenology.)
Note that in the working single-time approaches, covariance is always
achieved through a representation of the Poincare group on a
Hilbert space corresponding to a fixed time (or another 3D manifold in
space-time), rather than through multiple times.
Thus the whole theory has a single time only, whose dynamics is
generated by the Hamiltonian, the generator H=P_0 of the Poincare group.
(This is completely analogous to the nonrelativistic case,
where multiparticle systems also have a single time only.)
The natural manifestly covariant picture is that of a vector bundle
on Minkowski space-time, with a standard Fock space attached to each
point. An observer (i.e., formally, an orthonormal frame attached at
some space-time point) moves in space-time via the Poincare group,
and this action extends to the bundle by means of the representation
defining the Fock space.

S2f. What is a photon?
According to quantum electrodynamics, the most accurately verified
theory in physics, a photon is a single-particle excitation of the
free quantum electromagnetic field. More formally, it is a state of
the free electromagnetic field which is an eigenstate of the photon
number operator with eigenvalue 1.
The pure states of the free quantum electromagnetic field
are elements of a Fock space constructed from 1-photon states.
A general n-photon state vector is an arbitrary linear combination
of tensor products of n 1-photon state vectors; and a general pure
state of the free quantum electromagnetic field is a sum of n-photon
state vectors, one for each n. If only the 0-photon term contributes,
we have the dark state, usually called the vacuum; if only the
1-photon term contributes, we have a single photon.
A single photon has the same degrees of freedom as a classical vacuum
radiation field. Its shape is characterized by an arbitrary nonzero
real 4-potential A(x) satisfying the free Maxwell equations, which in
the Lorentz gauge take the form
nabla dot nabla A(x) = 0,
nabla dot A(x) = 0,
expressing the zero mass and the transversality of photons. Thus for
every such A there is a corresponding pure photon state |A>.
Here A(x) is _not_ a field operator but a photon amplitude;
photons whose amplitudes differ by an x-independent phase factor are
the same. For a photon in the normalized state |A>, the observable
electromagnetic field expectations are given by the usual formulas
relating the 4-potential and the fields,
<\E(x)> = <A|\E(x)|A>
= - partial \A(x)/partial x_0 - c nabla_\x A_0(x),
<\B(x)> = <A|\B(x)|A> = nabla_\x x \A(x)
[hmmm. check if this really is the case...]
Here \x (fat x) and x_0 are the space part and the time part of a
relativistic 4-vector, \E(x), \B(x) are the electromagnetic
field operators (related to the operator 4-potential by analogous
formulas), and c is the speed of light. Amplitudes A(x) producing
the same \E(x) and \B(x) are equivalent and related by a gauge
transformation, and describe the same photon.
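For a plane-wave amplitude A(x) = eps exp(-i p dot x), the two
equations of the Lorentz gauge stated above reduce to the algebraic
conditions p dot p = 0 and p dot eps = 0. The following sketch checks
them for one example; the particular wave vector, the polarization
eps and the shift parameter are hypothetical values chosen for
illustration. It also checks that adding a multiple of p to eps, a
gauge transformation of the plane wave, leaves both conditions
intact.

```python
import math

def minkowski_dot(a, b):
    """a.b = a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3."""
    return a[0] * b[0] - sum(ai * bi for ai, bi in zip(a[1:], b[1:]))

# Wave vector of a plane wave travelling along z, with p_0 = |p|.
p_vec = [0.0, 0.0, 3.0]
p = [math.sqrt(sum(c * c for c in p_vec))] + p_vec

# Hypothetical polarization 4-vector, purely transverse (along x).
eps = [0.0, 1.0, 0.0, 0.0]

# Gauge-shifted amplitude: add a multiple of the 4-momentum.
shift = 0.37
eps_shifted = [e + shift * pc for e, pc in zip(eps, p)]

print(minkowski_dot(p, p))            # wave equation: p.p = 0
print(minkowski_dot(p, eps))          # gauge condition: p.eps = 0
print(minkowski_dot(p, eps_shifted))  # still 0 after the shift
```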

In momentum space (frequently but not always the appropriate choice),
single photon states have the form
|A> = integral d\p^3/p_0 A(\p)|\p>,
where |\p> is a single particle state with definite 3-momentum
\p (fat p), p_0=|\p| is the corresponding photon energy divided by c,
and the photon amplitude A(\p) is a polarization 4-vector.
Thus a general photon is a superposition of monochromatic waves with
arbitrary polarizations, frequencies and directions.
(The Fourier transform of A(\p) is the so-called analytic signal
A^(+)(x), and by adding its complex conjugate one gets the real
4-potential A(x) in the Lorentz gauge.)
The photon amplitude A(\p) can be regarded as the photon's
wave function in momentum space. Since photons are not localizable
(though they are localizable approximately), there is no
meaningful photon wave function in coordinate space; see the
next entry in this FAQ. One could regard the 4-potential A(x) as
coordinate space wave function, but because of its gauge dependence,
this is not really useful.
This is second quantized notation, as appropriate for quantum fields.
This is how things always look in second quantization, even for a
harmonic oscillator. The wave function psi(x) or psi(p) in standard
(first quantized) quantum mechanics becomes the state vector
psi = integral dx psi(x) |x> or integral dp psi(p) |p>
in Fock space; the wave function at x or p turns into the coefficient
of |x> or |p>. In quantum field theory, x, A (the photon amplitude), and
E(x) (the electric field operator) correspond to k (a component of the
momentum), x, and p_k. Thus the coordinate index k is inflated to the
spacetime position x, the argument of the wave function is inflated to
a solution of the free Maxwell equations, the momentum operator is
inflated to a field operator, and the integral over x becomes a
functional integral over photon amplitudes,
psi = integral dA psi(A) |A>.
Here psi(A) is the most general state vector in Fock space; for a
single photon, psi depends linearly on A,
psi(A) = integral d\p^3/p_0 A(\p)|\p> = |A>.
Observable electromagnetic fields are obtained as expectation values
of the field operators \E(x) and \B(x) constructed by differentiation of
the textbook field operator A(x). As the observed components of
the mean momentum, say, in ordinary quantum mechanics are
<p_k> = integral dx psi(x)^* p_k psi(x),
so the observed values of the electromagnetic field are
<\E(x)> = <psi|\E(x)|psi> = integral dA psi(A)^* \E(x) psi(A).
<\B(x)> = <psi|\B(x)|psi> = integral dA psi(A)^* \B(x) psi(A).
In a frequently used interpretation (valid only approximately),
the term A(\p)|\p> represents the one-photon part of a monochromatic
beam with frequency nu=cp_0/h, direction \n(\p)=\p/p_0, and
polarization determined by A(\p). Here h = 2 pi hbar, where hbar is
the reduced Planck constant; omega=cp_0/hbar is the angular frequency.
The polarization 4-vector A(\p) is orthogonal to the 4-momentum p
composed of p_0 and \p, obtained by a Fourier transform of the
4-potential A(x) in the Lorentz gauge. (The wave equation translates
into the condition p_0^2=\p^2, causality requires p_0>0, hence
p_0=|\p|, and orthogonality p dot A(\p) = 0 expresses the Lorentz
gauge condition. For massless particles, there remains the additional
gauge freedom to shift A(\p) by a multiple of the 4-momentum p, which
can be used to fix A_0=0.)
A(\p) is usually written (in the gauge with vanishing time component) as
a linear combination of two specific polarization vectors eps^+(p) and
eps^-(p) for circularly polarized light (corresponding to helicities +1
and -1), forming together with the direction vector \n(\p) an
orthonormal basis of complex 3-space. In particular,
eps^+(p) eps^+(p)^* + eps^-(p)eps^-(p)^* + \n(\p)\n(\p)^* = 1
is the 3x3 identity matrix. (This is used in sums over helicities for
Feynman rules.) Specifically, eps^+(p) and eps^-(p) can be obtained by
finding normalized eigenvectors for the eigenvalue problem
[check. The original eigenvalue problem is p dot J eps = lambda eps.]
p x eps = lambda eps
with lambda = +-i|p|. For example, if p is in z-direction then
eps^+(p) = (1, -i, 0)/sqrt(2),
eps^-(p) = (i, -1, 0)/sqrt(2),
and the general case can be obtained by a suitable rotation.
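The example for p in z-direction can be checked numerically. The following is only a minimal sketch in plain Python; the helper names cross and close and the value chosen for |p| are assumptions made for illustration. It tests the eigenvalue equation p x eps = lambda eps with lambda = +-i|p| and the completeness relation stated above.

```python
import math

def cross(a, b):
    """Cross product of two (possibly complex) 3-vectors."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def close(u, v, tol=1e-12):
    """Componentwise comparison of two 3-vectors."""
    return all(abs(x - y) < tol for x, y in zip(u, v))

p_abs = 2.5                      # arbitrary illustrative value of |p|
p = (0.0, 0.0, p_abs)            # p in z-direction
n = (0.0, 0.0, 1.0)              # direction vector n = p/|p|
r = 1 / math.sqrt(2)
eps_plus = (r, -1j*r, 0)         # (1, -i, 0)/sqrt(2)
eps_minus = (1j*r, -r, 0)        # (i, -1, 0)/sqrt(2)

# Eigenvalue equations p x eps = lambda eps with lambda = +i|p| and -i|p|.
plus_ok = close(cross(p, eps_plus), tuple(1j*p_abs*c for c in eps_plus))
minus_ok = close(cross(p, eps_minus), tuple(-1j*p_abs*c for c in eps_minus))

# Completeness: the sum of e_j conj(e_k) over the basis is the 3x3 identity.
basis = (eps_plus, eps_minus, n)
completeness = [[sum(e[j] * complex(e[k]).conjugate() for e in basis)
                 for k in range(3)] for j in range(3)]
complete_ok = all(abs(completeness[j][k] - (1 if j == k else 0)) < 1e-12
                  for j in range(3) for k in range(3))

print(plus_ok, minus_ok, complete_ok)
```

All three flags should come out true; for a general direction one would first rotate the three basis vectors.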
An explicit calculation gives almost everywhere
eps^+(p) = u(p)/p_0
where p_0=|p| and
u_1(p) = p_3 - i p_2 p'/p'',
u_2(p) = -i p_3 - i p_1 p'/p''
u_3(p) = p'
p' = p_1+ip_2,
p''= p_3+p_0.
[what is eps^-(p)?]
These formulas become singular along the negative p_3-axis,
so several charts are needed to cover
For experiments one usually uses nearly monochromatic light bundled
into narrow beams. If one also ignores the directions (which are
usually fixed by the experimental setting, hence carry no extra
information), then only the helicity degrees of freedom remain,
and the 1-photon part of the beam behaves like a 2-level quantum
system ('a single spin').
A general monochromatic beam with fixed direction in a pure state is
given by a second-quantized state vector, which is a superposition of
arbitrary multiphoton states in the Bosonic Fock space generated by
the two helicity degrees of freedom. This is the basis for most
quantum optics experiments probing the foundations of quantum
mechanics.

The simplest state of light (generated for example by lasers)
is a coherent state, with state vector proportional to
e(A) = |vac> + |A> + 1/sqrt(2!) |A> tensor |A>
+ 1/sqrt(3!) |A> tensor |A> tensor |A> + ...
where |A> is a one-photon state. Thus coherent states also have the
same degrees of freedom as classical electromagnetic radiation.
Indeed, light in coherent states behaves classically in most respects.
At low intensity, the higher order terms in the expansion are
negligible, and since the vacuum part is not directly observable,
a low intensity coherent state resembles a single photon state.
On the other hand, true single photon states are very hard to produce
to good accuracy, and were created experimentally only recently:
B.T.H. Varcoe, S. Brattke, M. Weidinger and H. Walther,
Preparing pure photon number states of the radiation field,
Nature 403, 743--746 (2000).
see also
Ordinary light is essentially never, and high-tech light almost never,
describable by single photons.

A good informal discussion of what a photon is from a more practical
perspective was given by Paul Kinsler in
But this does not tell the whole story. An interesting collection of
articles explaining different current views is in
The Nature of Light: What Is a Photon?
Optics and Photonics News, October 2003
Further discussion is given in the section ''Coherent states of light
as ensembles'' of the present FAQ.

The standard reference for quantum optics is
L. Mandel and E. Wolf,
Optical Coherence and Quantum Optics,
Cambridge University Press, 1995.
Mandel and Wolf write (in the context of localizing photons),
about the temptation to associate with the clicks of a photodetector
a concept of photon particles. [If there is interest, I can try to
recover the details.] The wording suggests that one should resist the
temptation, although this advice is usually not heeded. However,
the advice is sound since a photodetector clicks even when it
detects only classical light! This follows from the standard analysis
of a photodetector, which treats the light classically and only
quantizes the detector. Thus the clicks are an artifact of
photodetection caused by the quantum nature of matter, rather than
a proof of photons arriving!!!

A coherent light source (laser) produces a coherent state of light,
which is a superposition of the vacuum state, a 1-photon state,
a 2-photon state, etc, with squared amplitudes given by a Poisson
distribution. At low intensity, this is misinterpreted in practice
as random single photons arriving at the end of the beam in a
random Poisson process, because the photodetector produces clicks
according to this distribution.
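To get a feeling for the numbers, the Poisson weights can be tabulated for a weak beam. This is only an illustrative sketch; the function name photon_number_probabilities and the value 0.01 for the mean photon number are assumptions, not taken from any experiment.

```python
import math

def photon_number_probabilities(nbar, nmax):
    """Poisson weights P(n) = exp(-nbar) nbar^n / n! for n = 0..nmax."""
    return [math.exp(-nbar) * nbar**n / math.factorial(n)
            for n in range(nmax + 1)]

nbar = 0.01                       # assumed mean photon number of a weak beam
probs = photon_number_probabilities(nbar, 20)

p_vacuum = probs[0]               # squared amplitude of the vacuum part
p_single = probs[1]               # squared amplitude of the 1-photon part
p_multi = 1.0 - p_vacuum - p_single   # everything with two or more photons

print(f"P(0)    = {p_vacuum:.6f}")
print(f"P(1)    = {p_single:.6f}")
print(f"P(n>=2) = {p_multi:.2e}")
```

For such a small mean photon number the state is almost entirely vacuum, and what is left is dominated by the 1-photon part; the multiphoton weight is smaller than the 1-photon weight by a factor of roughly nbar/2.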
Incoherent light sources usually consist of thermal mixtures and
produce other distributions, but otherwise the description (and
misinterpretation) is the same.
Nevertheless, one must understand this misinterpretation in order
to follow much of the literature on quantum optics.
Thus the talk about photons is usually done inconsistently;
almost everything said in the literature about photons should be taken
with a grain of salt.
There are even people like the Nobel prize winner Willis E. Lamb
(the discoverer of the Lamb shift) who maintain that photons don't
exist. See towards the end of
The reference mentioned there at the end appeared as
W.E. Lamb, Jr.,
Anti-photon,
Applied Physics B 60 (1995), 77--84
This, together with the other reference mentioned by Lamb, is reprinted in
W.E. Lamb, Jr.,
The interpretation of quantum mechanics,
Rinton Press, Princeton 2001.
I think the most apt interpretation of an 'observed' photon as used
in practice (in contrast to the photon formally defined as above) is
as a low intensity coherent state, cut arbitrarily into time slices
carrying an energy of h*nu = hbar*omega, the energy of a photon at
frequency nu and angular frequency omega.
Such a state consists mostly of the vacuum (which is not directly
observable hence can usually be neglected), and the contributions of
the multiphoton states are negligible compared to the single photon
part.
With such a notion of photon, most of the actual experiments done make
sense, though it does not explain the quantum randomness of the
detection process (which comes from the quantized electrons in the
detector).

A nonclassical description of the electromagnetic field where states of
light other than coherent states are required is necessary mainly for
special experiments involving recombining split beams, squeezed
state amplification, parametric down-conversion, and similar
arrangements where entangled photons make their appearance.
There is a nice booklet on this kind of optics:
U. Leonhardt,
Measuring the Quantum State of Light,
Cambridge, 1997.
Nonclassical electromagnetic fields are also relevant in the
scattering of light, where there are quantum corrections
due to multiphoton scattering. These give rise to important effects
such as the Lamb shift, which very accurately confirm the quantum
nature of the electromagnetic field. They involve no observable
photon states, but only virtual photon states, hence they are unrelated
to experiments involving photons. Indeed, there is no way to observe
virtual particles, and their name was chosen to reflect this.
(Observed particles are always onshell, hence massless for photons,
whereas it is an easy exercise that the virtual photon mediating
electromagnetic interaction of two electrons in the tree approximation
is never onshell.)

S2g. Particle positions and the position operator
The standard probability interpretation for quantum particles
is based on the Schr"odinger wave function psi(x), a square integrable
single- or multicomponent function of position x in R^3.
Indeed, with ^* denoting the conjugate transpose,
rho(x) := psi(x)^*psi(x)
is generally interpreted as the probability density to find (upon
measurement) the particle at position x. Consequently,
Pr(Z) := integral_Z dx |psi(x)|^2
is interpreted as the probability of the particle being in the open
subset Z of position space. Particles in highly localized states
are then given by wave packets which have no appreciable size
|psi(x)| outside some tiny region Z.
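As an illustration of such a localized wave packet, here is a minimal sketch for a Gaussian packet. It makes simplifying assumptions that are not part of the text above: one space dimension instead of three, a density |psi(x)|^2 that is a normal distribution with mean x0 and standard deviation sigma, and an arbitrarily chosen sigma. The function name prob_in_interval is hypothetical.

```python
import math

def prob_in_interval(a, b, x0, sigma):
    """Pr(a < x < b) when |psi(x)|^2 is a normal density N(x0, sigma^2)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - x0) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

sigma = 1e-10                     # assumed position spread in metres
inside = prob_in_interval(-3 * sigma, 3 * sigma, 0.0, sigma)
outside = 1.0 - inside

print(f"Pr(|x| < 3 sigma) = {inside:.4f}")
print(f"Pr(|x| > 3 sigma) = {outside:.4f}")
```

With Z taken as the interval of half-width 3 sigma, about 99.7 percent of the probability lies inside Z, which is the sense in which the packet has no appreciable size outside a tiny region.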
If the position representation in the Schr"odinger picture exists,
there is also a vector-valued position operator x, whose components
act on psi(x) by multiplication with x_j (j=1,2,3). In particular,
the components of x commute, satisfy canonical commutation relations
with the conjugate momentum
p = -i hbar partial_x,
and transform under rotations like a 3-vector, so that the commutation
relations with the angular momentum J take the form
[J_j,x_k] = i eps_{jkl} x_l.
Moreover, in terms of the (unnormalizable) eigenstates |x,m> of the
position operator corresponding to the spectral value x (and a label m
to distinguish multiple eigenstates) we can recover the position
representation from an arbitrary representation by defining psi(x)
to be the vector with components
psi_m(x) := <x,m|psi>.
Therefore, if we have a quantum system defined in an arbitrary
Hilbert space in which a momentum operator is defined, the necessary
and sufficient condition for the existence of a spatial probability
interpretation of the system is the existence of a position operator
with commuting components which satisfy standard commutation
relations with the components of the momentum operator and the
angular momentum operator.
Thus we have reduced the existence of a probability interpretation
for particles in a bounded region of space to the question of the
existence of a position operator with the right properties.

We now investigate this existence problem for elementary particles,
i.e., objects represented by an irreducible representation of the
full Poincare group. We consider first the case of particles of
mass m>0, since the massless case needs additional considerations.

A. Massive case, m>0:

Let M := R^3 be the manifold of 3-momenta p. On the Hilbert space
H_m^d obtained by completion of the space of all C^infty functions
with compact support from M to the space C^d of d-component vectors
with complex entries, with inner product defined by
<phi|psi> := integral d\p/sqrt(p^2+m^2) phi(p)^*psi(p),
we define the position operator
q := i hbar partial_p,
which satisfies the standard commutation relations, the momentum in
time direction,
p_0 := sqrt(m^2+|p|^2),
where m>0 is a fixed mass, and the operators
J := q x p + S,
K := (p_0 q + q p_0)/2 + p x S/(m+p_0),
where S is the spin vector in a unitary representation of so(3) on
the vector space C^d of complex vectors of length d, with the same
commutation relations as J.
This is a unitary representation of the Poincare algebra;
verification of the standard commutation relations (given,
e.g., in Weinberg's Volume 1, p.61) is straightforward.
It is not difficult to show that this representation is irreducible
and extends to a representation of the full Poincare group.
Obviously, this representation carries a position operator.
Since the physical irreducible representations of the Poincare group
are uniquely determined by mass and spin, we see that in the massive
case, a position operator must always exist. An explicit formula in
terms of the Poincare generators is obtained through division by m
in the formula
mq = K - ((K dot p) p/p_0 + J x p)/(m+p_0),
which is straightforward, though a bit tedious to verify from the above.
That there is no other possibility follows from
T.F. Jordan
Simple derivation of the Newton-Wigner position operator
J. Math. Phys. 21 (1980), 2028-2032.
Note that the position operator is always observer-dependent, in the
sense that one must choose a timelike unit vector to distinguish
space and time coordinates in the momentum operator. This is due to
the fact that the above construction is not invariant under Lorentz
boosts (which give rise to equivalent but different representations).
Note also that in case of the Dirac equation, the position operator is
_not_ the operator multiplying a solution psi(x) of the Dirac equation
by the spacelike part of x (which would mix electron and positron
states), but a related operator obtained by first applying a so-called
Foldy-Wouthuysen transformation.
L.L. Foldy and S.A. Wouthuysen,
On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic
Limit,
Phys. Rev. 78 (1950), 29-36.

B. Massless case, m=0:

Let M_0 := R^3\{0} be the manifold of nonzero 3-momenta p, and let
p_0 := |p|, n := p/p_0.
The Hilbert space H_0^d is defined as before, but now with m=0 and with
M_0 in place of M: it is obtained by completion of the space of all
C^infty functions with compact support from M_0 to the space C^d of
d-component vectors with complex entries, with inner product
<phi|psi> := integral d\p/|p| phi(p)^*psi(p).
It carries a natural massless representation of the Poincare algebra,
defined by
J := q x p + S,
K := (p_0 q + q p_0)/2 + n x S,
where q = i hbar partial_p is the position operator, and S is the
spin vector in a unitary representation of so(3) on C^d, with the
same commutation relations as J.
Again, verification of the standard commutation relations is
straightforward. (Indeed, this representation is the limit of the
above massive representation for m --> 0.)
It is easily seen that the helicity
lambda := n dot S
is central in the (suitably completed) universal envelope of the
Lie algebra, and that the possible eigenvalues
of the helicity are s,s-1,...,-s, where s=(d-1)/2. Therefore, the
eigenspaces of the helicity operator carry by restriction unitary
representations of the Poincare algebra, which are easily seen to be
irreducible. They extend to a representation of the connected
Poincare group. Moreover, the invariant subspace H_s formed by the
direct sum of the eigenspaces for helicity s and -s forms a massless
irreducible spin s representation of the full Poincare group.
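For spin 1 (d=3) the statement about the helicity eigenvalues can be checked by hand or numerically. The sketch below assumes the Cartesian vector representation (S_j)_{kl} = -i eps_{jkl} of the spin (one common choice; the text above leaves the representation unspecified) and an arbitrary illustrative momentum. It verifies that lambda = n dot S satisfies lambda^3 = lambda, has trace 0, and that lambda^2 has trace 2, so that the eigenvalues are 1, 0, -1, each occurring once.

```python
import math

def levi_civita(j, k, l):
    """Totally antisymmetric symbol eps_{jkl} for indices 0, 1, 2."""
    return (j - k) * (k - l) * (l - j) // 2

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(3)) for j in range(3)]
            for i in range(3)]

# Spin-1 matrices in the Cartesian basis (assumed convention).
S = [[[-1j * levi_civita(j, k, l) for l in range(3)] for k in range(3)]
     for j in range(3)]

p = (1.0, -2.0, 0.5)              # arbitrary nonzero 3-momentum
p0 = math.sqrt(sum(c * c for c in p))
n = [c / p0 for c in p]

# Helicity matrix lambda = n dot S.
H = [[sum(n[j] * S[j][k][l] for j in range(3)) for l in range(3)]
     for k in range(3)]
H2 = matmul(H, H)
H3 = matmul(H2, H)

cubic_ok = all(abs(H3[k][l] - H[k][l]) < 1e-12
               for k in range(3) for l in range(3))
trace_H = sum(H[k][k] for k in range(3))
trace_H2 = sum(H2[k][k] for k in range(3)).real
# The direction n itself is the helicity-0 (longitudinal) vector.
Hn = [sum(H[k][l] * n[l] for l in range(3)) for k in range(3)]

print(cubic_ok, abs(trace_H) < 1e-12, round(trace_H2, 12))
```

The helicity 0 eigenvector is the longitudinal direction n, which is exactly the component removed by the transversality condition for photons.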
(It is easy to see that changing K to K-t(p_0)p for an arbitrary
differentiable function t of p_0 preserves all commutation relations,
hence gives another representation of the Poincare algebra.
Since the massless irreducible representations of the Poincare group
are uniquely determined by their spin, the resulting representations
are equivalent. This corresponds to the freedom below in choosing a
position operator.)
Now suppose that a Poincare invariant subspace H of L^2(M_0)^d has a
position operator x satisfying the canonical commutation relations
with p and the above commutator relations with J. Then F=q-x commutes
with p, hence its components must be a (possibly matrix-valued)
function F(p) of p. Commutativity of the components of x (and of q)
implies that partial_p x F = 0,
and, since M_0 is simply connected, that F is the gradient of a scalar
function f. Rotation invariance then implies that this function
depends only on p_0=|p|. Thus
F = partial_p f(p_0) = f'(p_0) n.
Thus the position operator takes the form
x = q - f'(p_0) n.
In particular,
x x p = q x p.
Now the algebra of linear operators on the dense subspace of C^infty
functions in H contains the components of p, J, K and x, hence those of
J - x x p = J - q x p = S.
Thus the (p-independent) operators from the spin so(3) act on H.
But this implies that either H=0 (no helicity) or H = L^2(M_0)^d
(all helicities between s and -s).
Since the physical irreducible representations of the Poincare group
are uniquely determined by mass and spin, and for s>1/2, the spin s
Hilbert space H_s is a proper, nontrivial subspace of L^2(M_0)^d,
we proved the following theorem:

An irreducible representation of the full Poincare group with
mass m>=0 and finite spin has a position operator transforming
like a 3-vector and satisfying the canonical commutation relations
if and only if either m>0 or m=0 and s<=1/2 (but s=0 if only
the connected Poincare group is considered).
This theorem was announced without giving details in
T.D. Newton and E.P. Wigner,
Localized states for elementary systems,
Rev. Mod. Phys. 21 (1949), 400-406.
A mathematically rigorous proof was given in
A. S. Wightman,
On the Localizability of Quantum Mechanical Systems,
Rev. Mod. Phys. 34 (1962), 845-872.
See also
T.F. Jordan
Simple proof of no position operator for quanta with zero mass
and nonzero helicity
J. Math. Phys. 19 (1978), 1382-1385.
who also considers the massless representations of continuous spin, and
D Rosewarne and S Sarkar,
Rigorous theory of photon localizability,
Quantum Opt. 4 (1992), 405-413.

For spin 1, the case relevant for photons, we have d=3, and the
subspace of interest is the space H obtained by completion of the
space of all vector-valued C^infty functions A(p) of a nonzero
3-momentum p with compact support satisfying the transversality
condition p dot A(p)=0,
with inner product defined by
<A|A'> := integral dp/|p| A(p)^* A'(p).
It is not difficult to see that one can identify the wave functions
A(p) with the Fourier transform of the vector potential in the
radiation gauge where its 0-component vanishes. This relates the
present discussion to that given in the FAQ entry ''What is a photon?''.

As a consequence of our discussion, photons (m=0, s=1) and gravitons
(m=0, s=2) cannot be given natural probabilities for being in any given
bounded region of space. Chiral spin 1/2 particles also do not have
a position operator and hence have no such probabilities, by the same
argument, applied to the connected Poincare group.
(Note that only frequencies, intensities and S-matrix elements are
measured; these don't need a well-defined position concept
but only a well-defined momentum concept, from which frequencies
can be found via omega=p_0/hbar - since c=1 in the present setting,
and directions via n = p/p_0.)

However, assuming there are scalar massless Higgs particles (s=0),
one could combine such a Higgs, a photon, and a graviton into
a single reducible representation on L^2(M_0)^5, using the above
construction. By our derivation, one can find position eigenstates
which are superpositions of Higgs, photon, and graviton. Thus to
be able to regard photons and gravitons as particles with a proper
probability interpretation, one must consider Higgs, photons, and
gravitons as aspects of the same localizable particle, which we
might call a graphoton. (Without gravity, a phiggs particle would
also do.)

If the concept of an observable is not tied to that of a Hermitian
operator but rather to that of a POVM (positive operator-valued
measure), there is more flexibility, and covariant POVMs for position
measurements can be meaningfully defined, even for photons. See, e.g.,
A. Peres and D.R. Terno,
Quantum Information and Relativity Theory,
Rev. Mod. Phys. 76 (2004), 93.
[see, in particular, (52)]
K. Kraus, Position observable of the photon, in:
The Uncertainty Principle and Foundations of Quantum Mechanics,
Eds. W. C. Price and S. S. Chissick,
John Wiley & Sons, New York, pp. 293-320, 1976.
M. Toller,
Localization of events in space-time,
Phys. Rev. A 59, 960 (1999).
P. Busch, M. Grabowski, P. J. Lahti,
Operational Quantum Physics,
Springer-Verlag, Berlin Heidelberg 1995, pp.92-94.
Note that a POVM describes the statistics of a particular measurement
process rather than some underlying reality. This is reflected in the
fact that there are many possible nonequivalent definitions
of POVMs, pertaining to the possible different ways to get a measured
position.
Therefore, the concept of a photon position is necessarily subjective,
since it depends on the POVM used, hence on the way the
measurement is performed. It does not describe something objective.
The POVM does not allow one to talk about the position of a photon
- which could exist only if the corresponding operator existed -,
but only about the measured position: The photon is somewhere near the
range of values established by the measurement, without any more
definite statement being possible. On the other hand, for observables
corresponding to Hermitian operators, there are states in which
a definite statement is (at least theoretically) possible that the
observable has a value in a given range.

Papers related to position operators:

M.H.L. Pryce,
Commuting Co-ordinates in the new field theory,
Proc. Roy. Soc. London Ser. A 150 (1935), 166-172.
(first construction of position operators in the massive case)
B. Bakamjian and L.H. Thomas,
Relativistic Particle Dynamics. II,
Phys. Rev. 92 (1953), 1300-1310.
(first construction of massive representations along the above
lines)
L.L. Foldy,
Synthesis of Covariant Particle Equations,
Physical Review 102 (1956), 568-581.
(nice and readable version of the Bakamjian-Thomas construction
for massive representations of the Poincare group)
R. Acharya and E. C. G. Sudarshan,
''Front'' Description in Relativistic Quantum Mechanics,
J. Math. Phys. 1 (1960), 532-536.
(a ''most local'' description of the photon by wave fronts)
I. Bialynicki-Birula,
Photon wave function,
(A 53 page recent review article, covering various possibilities
to define photon wave functions without a position operator
acting on them. The best is (3.5), with a nonstandard inner
product (5.8). What is left of the probability interpretation is
(5.28) and its subsequent discussion.)
See also the entry ''Localization and position operators'' in this FAQ.

There are a few papers by M. Hawton, e.g.
on a nonstandard position operator which does not transform like a
3-vector. This is unphysical since it does not give orientation
independent probabilities for observing a photon in a given region of
space. Claims to the contrary in,
supposedly constructing a Lorentz invariant photon number density,
are erroneous; see
Other nonstandard position operators violating the conditions
necessary for a probability interpretation were discussed earlier,
starting with
M.H.L. Pryce,
The Mass-Centre in the Restricted Theory of Relativity and Its
Connexion with the Quantum Theory of Elementary Particles,
Proc. Roy. Soc. London, Ser. A, 195 (1948), 62-81.

S2h. Localization and position operators
Position operators are part of the toolkit of relativistic quantum
mechanics.
In a relativistic setting, one always has a representation of the
Poincare algebra. From the generators of the Poincare algebra
(namely the 4-momentum p, the angular momentum \J, and the
boost generators \K) one can make up (in massive representations)
a nonlinear expression for a 3-dimensional \x (the position operator)
that together with the space part \p of the 4-momentum has canonical
commutation rules and hence gives a Heisenberg algebra.
(The backslash is a convenient ascii notation to indicate bold face
letters, corresponding to 3-vectors.)
The position operator so constructed is unique, once the time coordinate
is fixed, and is usually called the Newton-Wigner position operator,
although it appears already in earlier work of Pryce. Relevant
applications are related to the names Foldy and Wouthuysen
(for their transform of the Dirac equation, widely used in relativistic
quantum chemistry) and Bakamjian and Thomas (for their relativistic
multi-particle theories); both groups rediscovered the Newton-Wigner
results independently, not being aware of their work.
That the time coordinate has to be fixed means that the position
operator is observer-dependent. Each observer splits space-time
into its personal time (in direction of its total 4-momentum) and
personal 3-space (orthogonal to it), and the position operator
relates to this 3-space. By a Lorentz transformation, one can
transform the 4-momentum to the vector (E_obs 0 0 0), which makes time
the 0-component. Most papers on the subject work in the latter setting.
For massless representations of spin >1/2, the construction breaks down.
This is related to the fact that massless particles with spin >1/2
don't have modes of all helicities allowed by the spin
(e.g., photons have spin 1 but no longitudinal modes),
which makes them always spread out, and hence not completely
localizable. For details, see the FAQ entry
''Particle positions and the position operator''.

Here are a few references:
J.P. Costella and B.H.J. McKellar,
The Foldy-Wouthuysen transformation,
* This paper discusses the physical relevance of the Newton-Wigner
representation, and its relation to the Foldy-Wouthuysen transformation
T. D. Newton, E. P. Wigner,
Localized States for Elementary Systems,
Rev. Mod. Phys. 21 (1949) 400-406
* The original paper on localization
L. L. Foldy and S. A. Wouthuysen,
On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic
Limit,
Phys. Rev. 78 (1950), 29-36.
* On the transform of the Dirac equation now carrying the authors' names
B. Bakamjian and L. H. Thomas
Relativistic Particle Dynamics. II
Phys. Rev. 92 (1953), 1300-1310.
and related papers in
Phys. Rev. 85 (1952), 868-872.
Phys. Rev. 121 (1961), 1849-1851.
* First constructive papers on relativistic multiparticle dynamics,
based on a 3D position operator
L. L. Foldy,
Synthesis of Covariant Particle Equations,
Phys. Rev. 102 (1956), 568-581
* A lucid exposition of Poincare representations which start with
a 3D position operator, and a discussion of electron localization
Before eq. (189), he notes that an observer-independent localization
of a Dirac electron (which generally is considered to be a pointlike
particle since it can be exactly localized in a given frame)
necessarily leaves a fuzziness of the order of the Compton wavelength
of the particle. (This is also related to the so-called Zitterbewegung,
see, e.g., the discussion in Chapter 7 of Paul Strange's
"Relativistic Quantum Mechanics".)
A. S. Wightman,
On the Localizability of Quantum Mechanical Systems,
Rev. Mod. Phys. 34 (1962) 845-872
* A group theoretic view in terms of systems of imprimitivity
T. O. Philips,
Lorentz invariant localized states,
Phys. Rev. 136 (1964), B893-B896.
* A covariant coherent state alternative which does not require
to single out a time coordinate
V. S. Varadarajan,
Geometry of Quantum Theory
(second edition), Springer, 1985
* A book discussing some of this stuff
L. Mandel and E. Wolf,
Optical Coherence and Quantum Optics,
Cambridge University Press, 1995.
* The bible on quantum optics, a thick but very useful book.
Relevant here since it contains a good discussion of the
localizability of photons (which can be done only approximately,
in view of the above) from a reasonably practical point of view.
G.N. Fleming,
Reeh-Schlieder meets Newton-Wigner
* This paper gives some relations to quantum field theory

S2i. Position operators in relativistic quantum field theory
In relativistic quantum field theory in its usually given form,
position is promoted to the same status as time, and hence becomes a
parameter in the quantum field, while in quantum mechanics it is an
operator vector.
This poses the question of whether there is a position operator in
relativistic quantum field theory. Many people think that there is none.
But even though there is a parameter called x and referred to as
4-dimensional position, there is also a vector defining a
3-dimensional position operator, provided the relativistic system
under consideration is not massless.
Indeed, any relativistic theory possesses the Poincare group as a
symmetry group, whose infinitesimal generators satisfy the standard
commutation rules of the Poincare algebra. But given these, the
standard construction by Newton and Wigner gives (in each Lorentz
frame) a 3-dimensional position operator with commuting components,
and the associated conjugate momentum operators. (See Section S2g
''Particle positions and the position operator'' of this FAQ.)
These play exactly the same role as the position and momentum
operators in nonrelativistic quantum mechanics.

S2j. Coherent states of light as ensembles
Let us look in some detail at the setting of a weak laser switched on
at time t_0 and switched off again at time t_1. The time T:=t_1-t_0
that the laser is switched on is a variable that we can choose at will.
Conventionally one models the light produced by a laser by coherent
states. If one tests the photon contents at the end of the beam by a
photodetector, one measures a series of clicks indicating (according to
tradition) the presence of single photons. Each click is conventionally
regarded as the measurement of a single photon; hence one measures an
ensemble of photons. Without this interpretation, much of the talk
about photons in quantum optics would not make sense.

Technically, and completely precisely, one has an ensemble of photons
in an indefinite photon number state. (Even a superposition of states
describes an ensemble, in the conventional interpretation.)
In a weak coherent state, the multiparticle content is negligible;
one has essentially a superposition of the vacuum and the single
particle state. Conventionally (as for all somewhat rare events),
the vacuum part is ignored - one just restricts attention to the
times where a particle is present. This leaves a single particle state.
Thus, at least for weak coherent states, it is a good approximation
to say that a coherent state of definite frequency is an ensemble
of single-particle systems.

More formally, in the usual abbreviated form, a weak coherent state of
a stationary monochromatic beam has the form
|psi> = (1-eps)|0> + eps|1> + O(eps^2), (*)
with eps<<1, and
<n> = <a^*a> = <psi|a^*a|psi> = eps^2 + O(eps^3)
is not a mean photon number, but a mean rate - the mean intensity.
More precisely, each coherent state has a mode A=A(p); the modes are in
1-1 correspondence with creation operators a^*(A). They create,
in field theory language, one photon in this mode. So far, these
photons are only constructs on paper, used to be able to write down
multiparticle states, and have not yet an observable meaning.
A normalized N-particle state of mode A (with A normalized to <A|A>=1)
is defined recursively from the vacuum state by
|1,A> := a^*(A)|vac>, |N,A> := a^*(A)|N-1,A>/sqrt(N) for N>1,
and coherent states with mode A have the form
|z,A>> := const* sum_N z^N/sqrt{N!} |N,A>
(where |0,A> := |vac>) with a complex amplitude z; they satisfy
a(A)|z,A>> = z|z,A>>.
The mean photon number associated with the coherent state is
Nbar := <N> = <a^*(A)a(A)> = <<z,A|a^*(A)a(A)|z,A>>
= <<z,A|z^*z|z,A>> = z^*z <<z,A|z,A>> = z^*z,
Nbar = |z|^2,
independent of the time T.
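The eigenvalue property and the mean photon number can be checked numerically in a truncated number basis. The following is only a sketch under stated assumptions: the number states are taken to be normalized, so that a(A) maps the coefficient of |N+1,A> to sqrt(N+1) times itself at level N; the constant is taken as exp(-|z|^2/2); and the cutoff of 40 levels, the value of z, and the function names are illustrative choices.

```python
import math

def coherent_coefficients(z, nmax):
    """Coefficients c_N = exp(-|z|^2/2) z^N / sqrt(N!) for N = 0..nmax."""
    norm = math.exp(-abs(z)**2 / 2)
    return [norm * z**N / math.sqrt(math.factorial(N))
            for N in range(nmax + 1)]

def annihilate(c):
    """Apply a: (a c)_N = sqrt(N+1) c_{N+1}; the top level is cut off."""
    return [math.sqrt(N + 1) * c[N + 1] for N in range(len(c) - 1)]

z = 0.3 + 0.4j                    # illustrative amplitude with |z|^2 = 0.25
c = coherent_coefficients(z, 40)
ac = annihilate(c)

# a|z> = z|z>, checked on all levels below the cutoff.
eigen_error = max(abs(ac[N] - z * c[N]) for N in range(len(ac)))
# Normalization and mean photon number Nbar = sum_N N |c_N|^2.
norm2 = sum(abs(x)**2 for x in c)
nbar = sum(N * abs(c[N])**2 for N in range(len(c)))

print(eigen_error < 1e-12, round(norm2, 12), round(nbar, 12))
```

Up to rounding and the negligible truncation error, the norm is 1 and Nbar agrees with |z|^2.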
The events are the clicks, and there is exactly one click per event
in a weak signal (for strong signals, one cannot separate the events).
But the events happen randomly in time, with a rate proportional to eps^2.
It is conventional to regard each click as evidence for the presence
of a single photon - this more or less defines the experimental notion
of a photon. (See also the discussion in the section
''What is a photon?'' of this FAQ.)
Note that two photons arriving at different times cannot be considered
as being part of an N-particle state with N>1, since states are
considered at a fixed time! Also, the fact that the weak coherent
state has a negligible contribution of doubly excited states
means that N-particle states with N>1 are here completely irrelevant.
Thus one has an ensemble of single photons.
Clearly, the number of observable photons (in the sense
of detector clicks) is proportional to T. This shows that the
formal photon number operator in Fock space, N = a^*(A)a(A), has
nothing to do with the photon number as defined by the number of clicks;
instead its expectation is proportional to the mean rate of clicks
per unit time.
Thus (*) describes an ensemble of O(T*eps^2) single photons, where
T is the duration of the experiment.
In particular, plane monochromatic light in the form of a coherent state
(three mathematical idealizations involved here) is an endless stream
of infinitely many photons passing with the speed of light through a
particular position on the beam. The rate of emission of photons is
proportional to the intensity of the incident beam. But the fact that
the model is an approximation only and that for real preparations,
observations are bounded in space and time does not change the results
of this analysis.
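The statement that the number of clicks grows in proportion to T at a
rate set by eps^2 can be illustrated by simulating a Poisson process.
This is a toy sketch, not a detector model: the proportionality
constant (1000 clicks per unit time per unit eps^2), the seed, and the
durations are hypothetical.

```python
import random

def simulate_clicks(rate, duration, seed=0):
    """Click times of a Poisson process with mean rate `rate` on [0, duration]."""
    rng = random.Random(seed)
    t, clicks = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential waiting time between clicks
        if t > duration:
            return clicks
        clicks.append(t)

eps = 0.1
rate = 1000.0 * eps ** 2  # hypothetical proportionality constant
counts = {T: len(simulate_clicks(rate, T)) for T in (100.0, 1000.0)}
print(counts)
```

The click count divided by T stays near the same rate for both
durations, while the individual click times are random.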

On the other hand, it is clear that a coherent state is not a 1-photon
state but a state with an indefinite number of photons (i.e., not an
eigenstate of the number operator). Thus there seems to be a conflict
in terminology - weak laser light is described by a coherent state
without definite number content, but it behaves experimentally as
an ensemble of single photons.
This shows that the concept of a photon is somewhat ambiguous.
Different people mean different and often quite vague things by
''photon'', if they bother to spell out the meaning in some detail
(which is usually not done). This can be seen from the diverging
explanations given in a recent special issue on this topic:
The Nature of Light: What Is a Photon?
Optics and Photonics News, October 2003
which presents five mutually incompatible views,
* Light reconsidered (Arthur Zajonc)
* What is a photon? (Rodney Loudon)
* What is a photon? (David Finkelstein)
* The concept of the photon - revisited (Ashok Muthukrishnan,
Marlan O. Scully, and M. Suhail Zubairy)
* A photon viewed from Wigner phase space (Holger Mack and
Wolfgang P. Schleich)
In QED, a ''one-photon state'' is a well-defined object, but ''one
photon'' in an experiment is not (unless one identifies it with a
detector click - which leaves unsaid what an undetected photon would
be). The relation between the two is quite indirect, and there is no
agreement in the literature on the precise relation.
My own views (not mainstream, but consistent with experiment) are:
1. that clicks have nothing at all to do with photons; they are just
a stochastic measure of intensity, and arise also if the incident
field is modelled completely classically;
2. that what is typically called a photon is not an arbitrary single
particle state of the electromagnetic field (in particular, never
an approximately plane wave) but a state of the electromagnetic field
that at each time is localized in space, whose energy content is
that of hbar*omega. Otherwise, the idea of producing photons on
demand makes no sense.
3. It is the field of the incident beam that counts; the talk about
photons in the incoming beam is not very meaningful and only blurs
the picture; the right language is that of field theory.

Indeed, a theoretical model of a photo-detector excited by an external
classical monochromatic e/m field contains no photons, but in this
model the detector responds by clicking randomly according to Poisson
statistics; see Chapter 9 of the book
L. Mandel and E. Wolf,
Optical Coherence and Quantum Optics,
Cambridge University Press, 1995.
Thus a precise meaning of ''photon'' is not needed to defend
statement 1.
No matter which view one takes with regard to statement 1., the
question is how one relates a 1-photon state to what one actually
prepares in a beam of light. What does it mean in experimental terms
to have prepared _one_ photon in this state?
Reading the details of preparation schemes for photons on demand
as discussed (with references to the original literature) in
one finds that no clear answer can be given to this question, but
that the evidence points to statement 2. of my view presented above.
In this view, the difference between the preparation of a coherent
state and that of a single photon is that a weak coherent state
generates an infinitely long random sequence of Poisson-distributed
clicks, while a single photon (in the above sense of a space-localized
field) generates (in an ideal detector) a single click only.

The practice seems to be that one silently ignores the vacuum
contribution in (*) and obtains after rescaling to a normalized state
a state
psi' = |1> + O(eps) (*')
which, with perfect right, can be considered to be an approximate
1-photon state. Indeed, most photon states produced in the laboratory
are superpositions with the vacuum, and still people speak of photons.
This also holds for other systems than simple laser light. For example,
entanglement studies are typically made with squeezed states,
which differ from coherent states only in that they have instead
of (*) a representation
psi = (1-eps)|0> + eps|2> + O(eps^2), (**)
and everyone refers to (**) as an ensemble of 2-photon states.
Indeed, parametric down conversion is well-known to produce an
ensemble of 2-photon states, but if one looks closer at the models
one finds that they actually produce states of the form (**)
that produce endless streams of photon pairs.
While photons on demand are based on exciting single atoms,
the only way of reliably creating single photons was for a long time
to use a source in a state of the form (**), where the photon pairs
are entangled pairs of photons with different momentum vectors (hence
located on different beams). Then one observes photons (clicks) on the
left beam with a detector, and knows from general principles that at
the same time a photon is underway in the other beam. Thus one can know
about the presence of single photons without having them observed yet.
This interpretation again explains away the vacuum part of the state
in (**). One restricts attention to the 2-photon sector of (**) by
ignoring the times where nothing but the vacuum part is observed, and
focuses on the times when something - and then by the form of (**)
the 2-photon part - is observed. This is the sense in which one
interprets (**) as an ensemble of 2-photon states.
Then one observes the part of the 2-photon system in one beam, to know
when a photon is present in the other beam. But of course, although
this is the way the situation is talked about, in reality one still has
the superposition with the vacuum, except that one chooses to ignore
the times where nothing happens to get rid of the vacuum.

S3a. What are 'bare' and 'dressed' particles?
A bare electron is the formal entity discussed in textbooks
when they do perturbative quantum electrodynamics. The intuitive
picture generally given is that a bare electron is surrounded
by a cloud of virtual photons and virtual electron-positron pairs
to make up a physical, 'dressed' electron. Only the latter is real
and observable. The former is a formal caricature of the latter,
with paradoxical properties (infinite mass, etc.).
On a more substantial level, the observable electrons are produced
from the bare electrons by a process called renormalization,
which modifies the propagators by self-energy terms
and the currents by form factors. As the name says, the latter define
the 'form' of a particle. (In the above picture, it would correspond
to the shape of the virtual cloud, though it is better to avoid
giving the virtual particles too much meaning.)
The dressed object is the renormalized, physical object,
described perturbatively as the bare object 'clothed' by the
cloud of virtual particles. The dressed interaction is the 'screened'
physical interaction between these dressed objects.
To draw an analogy in nonrelativistic quantum mechanics,
think of nuclei as bare atoms, electrons as virtual particles,
atoms as dressed nuclei and the residual interaction between atoms,
computed in the Born-Oppenheimer approximation, as the dressed
interaction. Thus, for Argon atoms, the dressed interaction is
something close to a Lennard-Jones potential, while the bare
interaction is Coulomb repulsion. This is the situation physicists
had in mind when they invented the notions of bare and dressed particles.
Of course, it is only an analogy, and should not be taken very
seriously. It just explains the intuition about the terminology used.
(For the serious version of renormalization, see Chapter 8.)
The electrons in QM are real, physical electrons that can be isolated.
The reason is that they are good eigenstates of the Hamiltonian.
On the other hand, virtual particles don't have this nice attribute
since the relativistic Hamiltonian H from field theory contains
creation and annihilation operators which mess things up.
The bare particles correspond to 1-particle states in the Hilbert
space (though that is not quite true since there is no good Hilbert
space picture in conventional interacting QFT). Multiplying them
with H introduces terms with other particle numbers, hence a bare
particle can never be an eigenstate of H, and thus never be
observable in the way a nonrelativistic particle is.
The eigenstates of the relativistic Hamiltonian are, instead,
complicated multibody states consisting
of a superposition of states with any number of particles and
antiparticles, just subject to the restriction that the total quantum
numbers come out right. These are the dressed particles.
For the computational side of dressing, see, e.g., nucl-th/0102037,

S3b. How meaningful are single Feynman diagrams?
The standard model is a theory defined in terms of a Lagrangian.
To get computable output, Feynman graph techniques are used.
But individual Feynman graphs are meaningless (often infinite);
only the sum of all terms of a given order can be given - after
a process called renormalization - a well-defined (finite) meaning.
This is well-known; so no-one treats the Feynman graphs as real.
What is taken as real is the final outcome of the calculations,
which can be compared with measurements.

S3c. How real are 'virtual particles'?
Virtual particles are used in perturbation theory with
Feynman diagrams. (See the FAQ entry ''Why Feynman diagrams''
for an explanation of their meaning. They do _not_ describe
processes in space and time, but certain multiple integrals...)
Feynman diagrams change their nature depending on the way
one does perturbation theory and what is resummed.
In their treatise on QED, Landau and Lifshitz discuss virtual particles
in Section 79. They start at the outset with the remark that things
depend on which kind of perturbation theory is used, and contrast
'virtual' explicitly with 'real'. Virtual particles are called that
in contrast to 'real particles' which are observable and hence real.
Unlike the latter, virtual particles occurring in computations _must_
have disappeared from the formulas by the time the calculations lead
to something that can be compared with experiment.
Hence their 'reality', if there is any, is like the reality of
characters in a dream. For example, just as we can fly in a dream,
virtual particles can be faster than light (since they may have
imaginary mass)...
The following is a more detailed discussion of the question how
meaningful it is to ascribe some sort of reality to virtual particles.

All language is only an approximation to reality, which simply is.
But to do science we need to classify the aspects of reality
that appear to have more permanence, and consider them as real.
Nevertheless, all concepts, including 'real', have a fuzziness
about them, unless they are phrased in terms of rigorous mathematical
models (in which case they don't apply to reality itself but only to
a model of reality).
In the informal way I use the notion, 'real' in theoretical physics
means a concept or object that
- is independent of the computational scheme used to
extract information from a theory,
- has a reasonably well-defined and consistent formal basis,
- does not give rise to misleading intuition.
This does not give a clear definition of real, of course.
But it makes for example charge distributions, inputs and outputs of
(theoretical models of) scattering experiments, and quarks something
real, while making bare particles and virtual particles artifacts of
perturbation theory.
Quarks must be considered real because one cannot dispense with them
in any coherent explanation of high energy physics.
Virtual particles must not be considered real since they arise only in
a particular approach to high energy physics - perturbation theory
before renormalization - that does not even survive the modifications
needed to remove the infinities. Moreover, the virtual particle content
of a real state depends so much on the details of the computational
scheme (canonical or light front quantization, standard or
renormalization group enhanced perturbation theory, etc.) that
calling virtual particles real would produce a very weird picture of
reality.
Whenever we observe a system we make a number of idealizations
that serve to identify the objects in reality with the
mathematical concepts we are using to describe them. Then we calculate
something, and at the end we retranslate it into reality. If our initial
idealization was good enough and our theory is good enough, the final
result will match reality well. Because of this idealization,
'real' real particles (moving in the universe) are slightly different
from 'mathematical' real particles (figuring in equations).

Modern quantum electrodynamics and other field theories are based on
the theory developed for modeling scattering events.
Scattering events take a very short time compared to the
lifetime of the objects involved before and after the event. Therefore,
we represent a prepared beam of particles hitting a target as a single
particle hitting another single particle, and whenever this in fact
happens, we observe end products, e.g. in a wire chamber.
Strictly speaking (i.e., in a fuller model of reality), we'd have to
use a multiparticle (statistical mechanics) setting, but this is never
done since it does not give better information and the added
complications are formidable.
As long as we prepare the particles long (compared to the scattering
time) before they scatter and observe them long enough afterwards,
they behave essentially as in and out states, respectively.
(They are not quite free because of the electromagnetic self-field
they generate; this gives rise to the infrared problem in quantum
electrodynamics and can be corrected by using coherent states.)
The preparation and detection of the particles is outside this model,
since it would produce only minute corrections to the scattering event.
But to treat it would require enlarging the system to include source
and detector, which makes the problem completely different.
Therefore at the level appropriate to a scattering event, the 'real'
real particles are modeled by 'mathematical' in/out states, which
therefore are also called 'real'. On the other hand, 'mathematical'
virtual particles have nothing to do with observations, hence have no
counterpart in reality; therefore they are called 'virtual'.
The figurative virtual objects in QFT are there only because of the
well-known limitations of the foundations of QFT. In a nonperturbative
setting they wouldn't occur at all. This can be seen by comparing with
QM. One could also do nonrelativistic QM with virtual objects but
no one does so (except sometimes in motivations for QFT),
because it does not add value to a well-understood theory.
Virtual particles are an artifact of perturbation theory that
gives an intuitive (but if taken too far, misleading) interpretation
for Feynman diagrams. More precisely, a virtual photon, say,
is an internal photon line in one of the Feynman diagrams. But there
is nothing real associated with it. Detectable photons are always
real, 'dressed' photons.
Virtual particles, and the Feynman diagrams they appear in,
are just a visual tool of keeping track of the different terms
in a formal expansion of scattering amplitudes into multi-dimensional
integrals involving multiple propagators - the momenta of the virtual
particles represent the integration variables.
They have no meaning at all outside these integrals.
They go out of mathematical existence once one changes the
formula for computing a scattering amplitude.
Therefore virtual particles are essentially analogous to virtual
integers k obtained by computing
log(1-x) = - sum_k x^k/k
by expansion into a Taylor series. Since we can compute the
logarithm in many other ways, it is ridiculous to attach to
k any intrinsic meaning. But ...
... in QFT, we have no good ways to compute scattering amplitudes
without at least some form of expansion (unless we only use the
lowest order of some approximation method), which makes
virtual particles look a little more real. But the analogy
to the Taylor series shows that it's best not to look at them
that way. (For a very informal view of quantum electrodynamics in
terms of clouds of virtual particles see
and the later mails in this thread.)
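The Taylor series analogy can be made concrete with a short sketch.
It computes log(1-x) once through the partial sums over k and once
through a library routine in which no summation index ever appears;
the function names and the test value x = 0.3 are illustrative
choices.

```python
import math

def log1m_taylor(x, terms):
    """Partial sum of log(1-x) = -sum_{k>=1} x^k/k, valid for |x| < 1."""
    return -sum(x ** k / k for k in range(1, terms + 1))

def log1m_direct(x):
    """The same function, evaluated without any reference to an index k."""
    return math.log1p(-x)

x = 0.3
for terms in (5, 20, 60):
    print(terms, log1m_taylor(x, terms), log1m_direct(x))
```

Both routes give the same number once enough terms are kept; the
integers k exist only inside one particular way of organizing the
computation.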
A sign of the irreality of virtual particles is the fact that
when one does partial resummations of diagrams (which is essential for
renormalization), many of the virtual particles disappear.
A fully nonperturbative theory would sum everything, and no virtual
particles would be present anymore. Thus virtual particles are
entirely a consequence of looking at QFT in a perturbative way
rather than nonperturbatively.
In the standard covariant Feynman approach, energy (cp_0) and
momentum (\p; the backslash indicates 'boldface') are conserved,
and virtual particles are typically off-shell (i.e., they
do not satisfy the equation p^2 = p_0^2 - \p^2 = m^2 for physical
particles). To see this, try to model a vertex in which an electron
(mass m_e) absorbs a photon (mass 0). One cannot keep the incoming
electron and photon and the outgoing electron on-shell (satisfying
p^2 = m^2) without violating the energy-momentum balance.
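The kinematics of this vertex can be checked with a few lines of
arithmetic. The sketch below works in units with c = 1, takes the
incoming electron at rest, and uses an illustrative photon energy of
1 MeV; these specific numbers are assumptions made only for the
example.

```python
def minkowski_sq(p):
    """p^2 = p_0^2 - p_1^2 - p_2^2 - p_3^2 (units with c = 1)."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

m_e = 0.511                           # electron mass in MeV (rounded)
electron_in = (m_e, 0.0, 0.0, 0.0)    # on-shell electron at rest
photon_in = (1.0, 0.0, 0.0, 1.0)      # on-shell photon along z (illustrative)
# Energy-momentum conservation fixes the outgoing 4-momentum
electron_out = tuple(a + b for a, b in zip(electron_in, photon_in))
print(minkowski_sq(electron_in), minkowski_sq(photon_in),
      minkowski_sq(electron_out))
```

With energy and momentum conserved, the outgoing momentum satisfies
p^2 = m_e^2 + 2 m_e E with photon energy E, which exceeds m_e^2
whenever E > 0. So the third particle is forced off-shell.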
However, many physicists work in light front quantization.
There one keeps all particles on-shell, and instead has energy and
momentum nonconservation (removed formally by adding an additional
The effect of this is that the virtual particle structure of the
theory is changed completely: For example, the physical vacuum and
the bare vacuum now agree, while in the standard approach,
the vacuum looks like a highly complicated
medium made up from infinitely many bare particles....
But bare particles must still be dressed to become physical,
though less heavily than in the traditional Feynman approach.
Another group of physicists calculate consequences of the standard
model using quantization on a lattice.
Here virtual particles are completely absent.
Clearly concepts such as virtual particles that depend so much
on the method of quantization cannot be regarded as being real.

Of course, physicists would not talk of virtual particles if the concept
had no relevance at all. One can argue with virtual particles to get an
intuitive idea of 'dressing', and to gain in this way some
understanding of phenomena such as the Casimir effect, Rabi
oscillations, the Lamb shift, anomalous magnetic moments, etc.
From a nonperturbative point of view, these effects all show up as
a consequence of renormalized, effective interactions between
physical (dressed, on-shell) particles.

See also earlier discussions on s.p.r. such as
and followups; maybe
is also of interest.
[For a longwinded alternative view of virtual particles
that I do _not_ share but rather find misleading, see]

S3d. What is the meaning of 'on-shell' and 'off-shell'?
This applies only to relativistic particles.
A particle of mass m is on-shell if its momentum p satisfies
p^2 (= p_0^2-p_1^2-p_2^2-p_3^2) = m^2,
and off-shell otherwise. The 'mass shell' is the manifold of
momenta p with p^2=m^2.
Observable (i.e., physical) particles are asymptotic states
(scattering states) described (modulo unresolved mathematical
difficulties) by free fields based on the dispersion relation p^2=m^2,
and hence are necessarily on-shell. Off-shell particles only
arise in intermediate perturbative calculations; they are necessarily
The situation is muddled by the fact that one has to distinguish
(formal) bare mass and (physical) dressed mass; the above is valid
only for the dressed mass. Moreover, the mass shell loses its meaning
in external fields, where, instead, a so-called 'gap equation'

S3e. Virtual particles and Coulomb interaction
Virtual objects have strange properties. For example,
the Coulomb interaction between two electrons is mediated by
virtual photons faster than the speed of light, with imaginary masses.
(This is often made palatable by invoking a time-energy uncertainty
relation, which would allow particles to go off-shell.
But there is no time operator in QFT, so the analogy to Heisenberg's
uncertainty relation for position and momentum is highly dubious.)
Strictly speaking,
the Coulomb interaction is simply the Fourier transform of the
photon propagator 1/q^2, followed by a nonrelativistic approximation.
It has nothing at all to do with virtual particle exchanges ---
except if one does perturbation theory. But then there is no surprise
that it must influence already the tree level. By a hand-waving
argument (equate the Born approximations) this gives the
nonrelativistic correspondence.
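For orientation, the Fourier transform in question can be written out
explicitly. This is the standard textbook identity for the static
limit, where q is the three-momentum transfer; the overall sign and
the charge factors depend on conventions and are not fixed by the
text.

```latex
\int \frac{d^3q}{(2\pi)^3}\, \frac{e^{i\mathbf{q}\cdot\mathbf{x}}}{\mathbf{q}^2}
  = \frac{1}{2\pi^2 r}\int_0^\infty \frac{\sin(qr)}{q}\,dq
  = \frac{1}{4\pi r}, \qquad r = |\mathbf{x}| .
```

The angular integration produces the factor sin(qr)/(qr), and the
remaining radial integral equals pi/2 for every r > 0, which yields
the 1/r form of the Coulomb potential.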
But to get the Coulomb interaction as part of the Schroedinger equation,
one needs to sum all ladder diagrams with 0,1,2,3,...,n,... exchanged
photons arranged in form of a ladder. Then one needs to approximate
the resulting Bethe-Salpeter equation. These are nonperturbative
techniques. (The computations are still done at few loops only,
which means that questions of convergence never enter.)
Virtual photons mediating the Coulomb repulsion between electrons
have spacelike momenta and hence would proceed faster than light
if there were any reality to them. But there cannot be; one would need
infinitely many of them, and infinitely many virtual electron-positron
pairs (and then superpositions of any numbers of these) to match exactly
a real, dressed object or interaction.

S3f. Are virtual particles and decaying particles the same?
Decaying particles and resonances are used synonymously in the
literature; they are complementary views of the same unstable state.
A very sharp resonance has a long lifetime relative to a scattering
event, hence behaves like a particle in scattering. It is regarded
as a real object if it lives long enough that its trace in a wire
chamber is detectable, or if its decay products are detectable at
places significantly different from the place where it was created.
On the other hand, a very broad resonance has a very short lifetime
and cannot be differentiated well from the scattering event producing
it; so the idealization defining the scattering event is no longer
valid, and one would not regard the resonance as a particle.
Of course, there is an intermediate grey regime where different people
apply different judgment. This can be seen, e.g., in discussions
concerning the tables of the Particle Data Group.
The only difference between a short-lived particle and a stable
particle is the fact that the stable particle has a real rest mass,
while the mass m of the resonance has a small imaginary part.
Note that states with complex masses can be handled well in a rigged
Hilbert space (= Gelfand triple) formulation of quantum mechanics.
Resonances appear as so-called Siegert (or Gamov) states.
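The role of the imaginary part of the mass can be illustrated with a
small sketch. It assumes the common parametrization m = M - i*Gamma/2
and a rest-frame time dependence exp(-i m t) in units with
hbar = c = 1; the numerical values of M and Gamma are illustrative.

```python
import cmath
import math

def survival_probability(mass, width, t):
    """|exp(-i m t)|^2 for complex mass m = mass - i*width/2 (hbar = c = 1)."""
    m_complex = complex(mass, -width / 2.0)
    amplitude = cmath.exp(-1j * m_complex * t)
    return abs(amplitude) ** 2

mass, width = 1.0, 0.01   # illustrative sharp resonance: width << mass
lifetime = 1.0 / width
print(survival_probability(mass, width, lifetime), math.exp(-1.0))
```

Under this assumption the survival probability decays as
exp(-Gamma t), so a small imaginary part corresponds to a long
lifetime 1/Gamma, while a vanishing one gives a stable particle.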
A good reference on resonances (not well covered in textbooks) is
V.I. Kukulin et al.,
Theory of Resonances,
Kluwer, Dordrecht 1989.
For rigged Hilbert spaces (treated in Appendix A of Kukulin), see also
quant-ph/9805063 and for its functional analysis ramifications,
K. Maurin,
General Eigenfunction Expansions and Unitary Representations of
Topological Groups,
PWN Polish Sci. Publ., Warsaw 1968.
But a very short-lived particle is not the same as a virtual
particle. Often it is a complicated, nearly bound state of other
particles. On the other hand, virtual particles are essentially always
elementary. (There are exceptions when deriving Bethe-Salpeter equations
and the like for the approximate calculations of bound states and
resonances, where one creates an effective theory in which the latter
are treated as elementary.)
Even an unstable elementary particle can be distinguished from
a virtual particle. In perturbation theory, unstable elementary
particles are modelled exactly like stable particles,
namely as external lines in a Feynman diagram.
Virtual particles in Feynman diagrams are exactly those parts
of the diagram which are not given by external lines.
In particular, what is real and what is virtual is not affected
by a diagram rotation - this only affects what is input
and what is output.
The difference can also be seen in the mathematical representation.
In an effective theory where the resonance (e.g., the neutron or a
meson) is regarded as an elementary object, the resonance appears
in in/out states as a real particle, with complex on shell momentum
satisfying p^2=m^2, but in internal Feynman diagrams as a virtual
particle with real mass, almost always off-shell, i.e., violating
this equation.
There are also some unstable elementary particles like the weak
gauge bosons. Usually, one observes a 4-fermion interaction and the
gauge bosons are virtual. But at high energy = very short scales,
one can in principle observe the gauge bosons and make them real.
This means that they now appear as external lines in the corresponding
perturbative calculations, which displays their nonvirtual nature.
In any case, from a mathematical point of view, one must choose the
framework. Either one works in a Hilbert space, then masses are real
and there are no unstable particles (since these 'are' poles on the
so-called 'unphysical' sheet); in this case, there are no asymptotic
gauge bosons and all are therefore virtual.
Or one works in a rigged Hilbert space and deforms the inner product;
this makes part of the 'unphysical' sheet visible; then the gauge
bosons have complex masses and there exist unstable particles
corresponding to in/out gauge bosons which are real.
The modeling framework therefore decides which language is appropriate.

S4a. What do atoms and molecules look like?
Today, images of single atoms and molecules can be routinely produced.
M. Herz, F.J. Giessibl and J. Mannhart
Probing the shape of atoms in real space
Phys. Rev. B 68, 045301 (2003)
write in the introduction:
''quantum mechanics specifies the probability of finding an electron
at position x relative to the nucleus. This probability is
determined by |psi(x)|^2, where psi(x) is the wave function of the
electron given by Schroedinger's equation. The product of -e and
|psi(x)|^2 is usually interpreted as charge density, because the
electrons in an atom move so fast that the forces they exert on
other charges are essentially equal to the forces caused by a
static charge distribution -e|psi(x)|^2.''
One of the authors, Jochen Mannhart, is one of the 10 winners of the
Leibniz prize 2008,
among others for the achievement that, for the first time, he made
pictures of atoms with subatomic resolution possible.
The Leibniz prize is the highest German academic prize, endowed with
a research grant of up to 2.5 Million Euro for each winner,
awarded each year to a few excellent younger scientists from all
The orbitals one can look at in physics and chemistry books
are the pictures of the squared absolute values of basis functions
used for representing single electron wave functions.
The actual shape of the wave function of each electron is some linear
combination of such basis functions. These are calculated (in the
simplest realistic approximation) by Hartree-Fock calculations.
The atom shape is the shape of all electrons together, forming
in the Hartree-Fock approximation a Slater determinant formed from the
single-particle wave functions, and in general a linear combination
of such Slater determinants. These live in a multidimensional space
with 3n dimensions for an atom with n electrons.
The shape one can measure is actually a 3-dimensional charge density
rho(x) (x in R^3) formed by integrating the square of the absolute
value of the 3n-dimensional wave function psi over 3n-3 dimensions.
More precisely, it is defined (nonrelativistically) such that
(apart from a constant factor and the charge contribution of the nucleus)
integral dx rho(x) f(x) = psi^* O_1(f) psi (1)
for all nice 3-dimensional functions f(x) of the space coordinate
vector x, where
O_1(f) = integral dx f(x) a^*(x) a(x)
is the 1-particle operator corresponding to f. Here a^* and a denote
creation and annihilation operators. Since rho(x) decays quickly as x
differs more and more from the atom center, the atom looks like a
charge cloud with slightly fuzzy boundary.
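The reduction from the many-particle wave function to a density can
be illustrated in a toy setting. The sketch below is a deliberately
simplified assumption: two spinless particles in one dimension, with
particle-in-a-box orbitals on a grid standing in for Hartree-Fock
orbitals, and a number density instead of a charge density (the two
differ by a constant factor).

```python
import numpy as np

L, M = 1.0, 200                  # box length, grid intervals (assumptions)
x = np.linspace(0.0, L, M + 1)
h = L / M

def orbital(n):
    """Particle-in-a-box orbital, used only as a convenient orthonormal basis."""
    return np.sqrt(2.0 / L) * np.sin(n * np.pi * x / L)

phi1, phi2 = orbital(1), orbital(2)
# Two-particle Slater determinant psi(x, y) on the grid (spin ignored)
psi = (np.outer(phi1, phi2) - np.outer(phi2, phi1)) / np.sqrt(2.0)
# Density: n times the integral of |psi|^2 over all but one coordinate (n = 2)
rho = 2.0 * h * np.sum(np.abs(psi) ** 2, axis=1)
total = h * np.sum(rho)
print(total)
```

For a single Slater determinant of orthonormal orbitals the result is
the sum of the squared orbitals, and the density integrates to the
particle number; in three dimensions the same integration runs over
3n-3 coordinates.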
For isolated atoms in the absence of external fields,
rho is typically spherically symmetric, giving symmetric shapes.
(In case of particles of nonzero spin, this assumes
that we are in a thermal setting where the spin directions average out.
In this case, we have instead of (1) the formula
integral dx rho(x) f(x) = tr O_1(f) rhohat,
where rhohat is the density matrix of the mixed state.)
For molecules, rho is in fact also a function of the coordinates of
all nuclei involved, and there is no longer any reason to have more
symmetry than the symmetry of the configuration of nuclei,
which is very little and often none.
The shape of molecules is therefore mainly determined by the geometry
of the positions of the nuclei. In equilibrium, these arrange
themselves such that the potential energy, i.e., the smallest
eigenvalue of the Hamiltonian operator for the electrons, is minimal
among all positions (or at least a local minimum from which a
deeper lying state is very difficult to reach). The charge density
of molecules can be identified by means of X-ray crystallography or
nuclear magnetic resonance (NMR) spectroscopy; however, for complex
molecules, doing this reliably from the available indirect information
is a highly nontrivial art.
A few years ago,
I wrote a survey of molecular modeling of proteins, the largest
molecules in nature (apart from crystals, which are essentially
molecules of macroscopic size):
A. Neumaier,
Molecular modeling of proteins and mathematical prediction of
protein structure,
SIAM Review 39 (1997), 407-460.

Viewing atoms or molecules with a scanning tunneling microscope (STM)
or an atomic force microscope (AFM)
amounts to scanning the response of the 3-dimensional charge density
to (or, more precisely, the current or force induced by it on)
the scanning device, from which a computer generates a picture.
Thus rho(x) is actually observable, with a resolution of currently
up to 0.6 Angstrom = 0.6 * 10^{-10} m.
For a discussion of the charge density of molecules and the resulting
operative interpretation of atoms in molecules see, e.g., the
encyclopedic article
R.F.W. Bader
Atoms in Molecules
or Bader's web site

On the other hand, whether atomic or molecular substructures such as
orbitals are observable is controversial. See, e.g.,
J.M. Zuo et al.,
Direct Observation of d-orbital holes and Cu-Cu bonding in Cu_2O,
Nature 401 (1999), 49-52.
for discussions in 1999-2001, a discussion presenting a positive
majority vote among 22 textbooks:
and from 2007:
Also, see the nice pictures in
M. Herz, F.J. Giessibl and J. Mannhart
Probing the shape of atoms in real space
Phys. Rev. B 68, 045301 (2003)
Apparently, it is a matter of terminology. Those who use the term
orbital to refer to a charge distribution corresponding to a particular
electronic state (and the ball-, dumbbell-, or ring-shaped pictures of
orbitals in textbooks show just that) find orbitals observable, while
purists restricting the usage of orbitals to denoting particular
single-electron wave functions find them unobservable.
Note that Scerri, who in
defends the unobservability of orbitals, writes explicitly:
''What can be observed, and frequently is observed in experiments, is
electron density. In fact, the observation of electron density is a
major field of research in which several monographs and review
articles have been written.''
and then cites two books and a review article. A more recent review
article of some aspects is
J.M. Zuo
Measurements of electron densities in solids: a real-space view of
electronic structure and bonding in inorganic crystals
Rep. Progr. Phys. 67 (2004), 2053-2103.

S4b. Why are observable densities state-dependent?
In the preceding, the mass and charge density of an n-particle system
(or of a single particle) depends on its quantum state. This is
sometimes regarded as a reason for denying the 'reality' of the
mass and charge density. However, such reasoning is misguided.
Indeed, the phenomenon is already present in classical mechanics.
That mass and charge density depends on the state is no more
surprising than that the trajectory of a classical particle depends
on its classical state (its position and momentum), or that the
density of a cloud in the sky depends on its classical state
(the position and momentum of all its particles, or, in the customary
fluid mechanics approximation, its mass density field and its velocity
field).
Of course it has to, to match a particular real life situation.
What seems strange at first sight is that the above applies already to
a single, indivisible particle. But this is really strange only if one
assumes that the particle is pointlike - which we know is the case only
for unphysical, bare particles, but not for the physical, renormalized
ones. (See the entry ''Are electrons pointlike/structureless?''
elsewhere in this FAQ.) Once one realizes that physical particles are
extended (although they are indivisible), there is enough room to
accommodate the internal structure described by densities.
Thus the only quantum paradox that remains is that particles with
nontrivial internal structure (and shape) can nevertheless be
indivisible, a fact coming from the representation theory of the
fundamental symmetry group of our universe: Indivisibility of an
object just means that this object is described by an irreducible
representation which cannot be decomposed further without violating
a fundamental symmetry.

S4c. Are electrons pointlike/structureless?
Both electrons and neutrinos are considered to be pointlike
as bare particles, because of the way they appear in the standard model.
But physical, relativistic particles are not pointlike.
A pointlike electron would be described exactly by the 1-particle
Dirac equation, which has a degenerate spectrum. But the real electron
is described by a modified Dirac equation, resulting in an anomalous
magnetic moment and a nonzero Lamb shift resolving the degeneracy of
the spectrum. Both are measurable to high accuracy.
The relations between form factors for spin 1/2 particles and
terms in a modified Dirac equation describing the covariant dynamics
of a particle deviating from a point particle are given in
L. L. Foldy
The Electromagnetic Properties of Dirac Particles
Phys. Rev. 87 (1952), 688 - 693.
An intuitive argument for the lack of pointlikeness is the fact that
localizing a particle
to a region significantly smaller than its Compton wavelength
would need energies larger than that needed to create
particle-antiparticle pairs, which changes the nature of the system.
(See also this FAQ about localization, and Foldy's papers quoted there.)
On a more formal, quantitative level, the physical, dressed particles
have nontrivial form factors, due to the renormalization necessary to
give finite results in QFT. The form factor measures the deviation
from the behavior of an ideal point particle, i.e., a particle obeying
exactly the Dirac equation. The form factor can be measured
indirectly, through the anomalous magnetic moment and the Lamb shift.
(A point particle has no anomalous magnetic moment and no Lamb shift
since it satisfies the Dirac equation exactly.)
Nontrivial form factors give rise to a positive charge radius.
In his book
S. Weinberg,
The quantum theory of fields, Vol. I,
Cambridge University Press, 1995,
Weinberg defines and explicitly computes in (11.3.33) a formula for the
'charge radius' of a physical electron. But his formula is not
fully satisfying since it is not fully renormalized (infrared
divergence: the expression contains a fictitious photon mass,
and diverges if this goes to zero).
For electron form factors in light atoms, see
hep-ph/0002158 = Physics Reports 342, 63-126 (2001):
Equation (28) uses a binding energy dependent cutoff,
which makes the electron charge radius depend on its surrounding.
Of course, other particles also have form factors and associated
charge radii. For proton and neutron form factors, see hep-ph/0204239
and hep-ph/030305. Neutrons have a negative mean squared charge radius.
This looks strange but is not since the measure for the mean is
not positive; but it means that a classical interpretation of the
charge radius of neutrons is dubious. In the introduction of
S. Kopecky et al
Phys. Rev. C 56, 2229-2237 (1997)
one can read:
''The charge radius of the neutron <r_n^2> or the mean squared charge
radius is described by the volume integral over the neutron
integral rho(r)r^2dr, where r is the distance to the center of
the neutron and rho(r) is the charge density.
Positive as well as negative values of rho(r) will occur coming
from the distributions of valence quarks and the negative pi-meson
cloud outside.
Since rho(r) is negative for larger r values, caused by the meson
cloud, the r^2 dependence of the integral will lead to a negative
value of <r_n^2>.''
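
The mechanism described in this quotation can be checked on a toy model.
The following sketch is only an illustration under stated assumptions:
the density (a positive core of width a minus a negative cloud of
width b, both exponential and each carrying unit charge) is hypothetical
and not a fit to neutron data. It shows that a neutral charge
distribution whose negative part sits further out has a negative
integral of rho(r) r^2 over the volume.

```python
import math


def normalized_exponential(r, s):
    """Spherically symmetric density exp(-r/s), normalized to unit total charge."""
    return math.exp(-r / s) / (8.0 * math.pi * s**3)


def toy_density(r, a=0.5, b=1.0):
    """Hypothetical density: positive core (width a) minus negative cloud (width b)."""
    return normalized_exponential(r, a) - normalized_exponential(r, b)


def radial_moment(density, power, r_max=60.0, steps=60000):
    """Integrate density(r) * r**power over all space (volume element 4 pi r^2 dr).

    Uses the composite trapezoidal rule on [0, r_max].
    """
    h = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * density(r) * r**power * 4.0 * math.pi * r**2
    return total * h


total_charge = radial_moment(toy_density, 0)        # close to zero: neutral object
mean_square_radius = radial_moment(toy_density, 2)  # negative, since b > a
print(total_charge, mean_square_radius)
```

For this toy density the integral can also be done by hand, giving
12 (a^2 - b^2), which is negative whenever the negative cloud is wider
than the positive core.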
The paper
L.L. Foldy,
Neutron-electron interaction,
Rev. Mod. Phys. 30, 471-481 (1958).
discusses the extendedness of the electron in a phenomenological way.
On the numerical side, I only found values for the charge radius
of the neutrinos, computed from the standard model to 1 loop order.
The values are about (4-6) * 10^-14 cm for the three neutrino species.
See (7.12) in Phys. Rev. D 62, 113012 (2000).
An abstract of a 1982 thesis of Anzhi Lai gives
an electron charge radius of ~ 10^{-16} cm.
(But I haven't seen the thesis.)
The "form" of an elementary particle (considered as a free particle
at rest) is described by its form factor,
which is a well-defined physical function
(though at present computable only in perturbation theory)
describing how the (spin 0, 1/2, or 1) particle's response to an
external classical electromagnetic field deviates from the
Klein-Gordon, Dirac, or Maxwell equations, respectively.
The form factor contains the complete state-independent information
about a free particle, since it determines the (single-particle)
Hamilton operator of the free particle and everything else can be
computed from it.
In Foldy's paper, the form factors are encoded in the infinite sum
in (16). The sum is usually considered in the momentum domain;
then one simply gets two k-dependent form factors, where k represents
the 4-momentum transferred in the interaction. These form factors
can be calculated in a good approximation perturbatively from QFT,
see for example Peskin and Schroeder's book.
An extensive discussion of form factors of Dirac particles
and their relation to the radial density function is in
D. R. Yennie, M. M. Levy and D. G. Ravenhall,
Electromagnetic Structure of Nucleons,
Rev. Mod. Phys. 29, 144-157 (1957).
R. G. Sachs
High-Energy Behavior of Nucleon Electromagnetic Form Factors
Phys. Rev. 126, 2256-2260 (1962)
Yennie et al. write:
''Information about the internal structure of the individual nucleons
is contained in the results of a variety of experiments performed in
recent years. [...] The Lamb shift and the hyperfine splitting also
give such information, [...] The charge-current density of the nucleon
(proton or neutron) includes all of the effects of the internal
structure. [...] The nucleon charge-current density must have the form
<formula involving two form factors F_1 and F_2>
The functions F_1 and F_2 are relativistic generalizations of the form
factors characteristic of finite extension occurring in other
experiments, [...]''
However, the form factor contains nothing at all about
interaction- or state-dependent information since the
interaction-dependent information is coded in an external potential
or a multiparticle formulation, and the state-dependent information
is coded in the wave function or density matrix, which (at any given
time) is independent of the Hamiltonian.
Also, the information contained in the form factor is only about
the free particle in the rest system, defined by a pure state
in which momentum and orbital angular momentum vanish identically.
In an external potential, or in a state where momentum (or orbital
angular momentum) doesn't vanish, the charge density (and the
resulting charge radius) can differ arbitrarily much from the
charge density (and charge radius) at rest.
For example, for a hydrogen electron in the ground state,
the charge density is significant in a region of diameter about
10^-8 cm (a small multiple of the Bohr radius), while the
charge radius at rest is probably (in view of the above partial
results) << 10^-12 cm.
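
The size of the hydrogen ground state can be checked numerically.
The following sketch assumes the textbook nonrelativistic 1s wave
function and rounded CODATA constants; the function names are
hypothetical. It computes the Bohr radius and the radius of the sphere
that contains a given fraction of the electron's charge.

```python
import math

# Rounded CODATA values in SI units.
HBAR = 1.054571817e-34               # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
EPSILON_0 = 8.8541878128e-12         # F/m


def bohr_radius_m():
    """Bohr radius a_0 = 4 pi eps_0 hbar^2 / (m e^2), in meters."""
    return (4.0 * math.pi * EPSILON_0 * HBAR**2
            / (ELECTRON_MASS * ELEMENTARY_CHARGE**2))


def charge_fraction_within(radius_in_bohr):
    """Fraction of the 1s charge inside a sphere of the given radius (units of a_0)."""
    x = 2.0 * radius_in_bohr
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def radius_enclosing(fraction, lo=0.0, hi=50.0):
    """Bisection for the radius (units of a_0) enclosing the given charge fraction."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if charge_fraction_within(mid) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a0_cm = bohr_radius_m() * 100.0
r90 = radius_enclosing(0.90)
print("Bohr radius in cm:", a0_cm)
print("diameter containing 90% of the charge, in cm:", 2.0 * r90 * a0_cm)
```

The 90% threshold is an arbitrary choice; any other cutoff gives a
diameter of the same order of magnitude.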
In all cases, the charge distribution is defined as the
expectation of the charge density operator of the corresponding
quantum field. For molecules, this charge distribution is the
computational target of much of quantum chemistry, and defines the
shape of a molecule. The shape of a particle determined by the form
factor therefore corresponds to the equilibrium shape a molecule
takes in its rest frame in the absence of forces, i.e., in its
ground state, while the state-dependent shape corresponds to the
much less predictable shape of a molecule interacting with its
environment.

S4d. How much information is in a particle?
Knowing a particular electron intimately is infinitely precious.
A pure state of an electron is defined by its wave function
(up to a phase). Thus knowing all about an electron requires, in the
traditional interpretation, knowing all about this wave function -
an infinite amount of information.
The information humans are interested in is however always finite,
since they can hardly remember even 20 decimal digits seen only once.
And the amount of information humans are capable of retrieving
by experiment is still limited, since each experiment has only a finite
accuracy.
Thus they simplify things to the point that all they want to know about
an electron is its mass, charge and its state to a small number
of decimal places.
This is only a few bits. But if you want to tell someone else exactly
where the electron is that you are referring to, you have an
infinitely more difficult task. Of course, any human 'else' will not
be patient enough to hear the whole (infinite) story but will be
satisfied with a crude position and momentum
estimate consistent with the uncertainty relation. But this is not the
best possible statement about the electron, which would be telling
its complete wave function. You can do it only if you force the
electron into a prison where it has to behave in a dull (and hence
completely describable) way, being
restricted in its freedom to at most a few bits of change.
This is indeed done when studying qubits for quantum information
For an N-state system, one needs N^2-1 independent pieces of
information to reconstruct (by quantum tomography) the density matrix
of a finite mixed quantum system, and a fortiori the wave function of
a finite pure quantum system. Most natural systems, unlike those
systems carefully prepared by modern technology, have infinitely many
states, and therefore need an infinite amount of information for their
reconstruction to full accuracy.
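
For the simplest case N=2 (a qubit), the count N^2-1 = 3 can be made
explicit. The following is a minimal sketch, assuming ideal, noise-free
expectation values of the three Pauli matrices; real tomography has to
estimate these from finitely many measurements. All function names are
hypothetical.

```python
PAULI = {
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def independent_parameters(n_states):
    """Number of real parameters of an n-state density matrix: n^2 - 1."""
    return n_states**2 - 1


def density_matrix_from_bloch(x, y, z):
    """rho = (1 + x X + y Y + z Z) / 2 for a qubit."""
    return [[0.5 * (1 + z), 0.5 * (x - 1j * y)],
            [0.5 * (x + 1j * y), 0.5 * (1 - z)]]


def expectation(rho, op):
    """Tr(rho op) for 2x2 matrices; real for Hermitian rho and op."""
    value = sum(rho[i][j] * op[j][i] for i in range(2) for j in range(2))
    return complex(value).real


def reconstruct(measured):
    """Rebuild rho from the three measured Pauli expectation values."""
    return density_matrix_from_bloch(measured["X"], measured["Y"], measured["Z"])


# A hypothetical mixed state, its three expectation values, and the reconstruction.
true_rho = density_matrix_from_bloch(0.3, -0.4, 0.5)
measured = {name: expectation(true_rho, op) for name, op in PAULI.items()}
rebuilt_rho = reconstruct(measured)
print(independent_parameters(2), measured)
```

Three real numbers determine the qubit state completely; for a system
with infinitely many states no finite list of numbers suffices.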

S4e. Entropy and missing information
[This continues the preceding entry.]
How is this notion of information related to information in terms of
entropy?
Informally, entropy is often equated with information, but this is not
correct - entropy is _missing_ information!
More precisely, in the statistical interpretation, the state belongs
not to a single particle but to an ensemble of particles.
Entropy measures the amount of information missing for a complete
probabilistic description of a system.
Entropy is the mean number of binary questions that must be asked in
an optimal decision strategy to determine the state of a particular
realization given the state of the ensemble to which it belongs.
See Appendix A of my paper
A. Neumaier,
On the foundations of thermodynamics,
The formula for the entropy S found in every
statistical mechanics textbook is, for a system in a mixed state
described by the density matrix rho,
S = <-kbar log rho> where <f> = Tr rho f
and kbar is Boltzmann's constant. (I use the bar to be free to use k
as an index.) In any representation where rho is diagonal,
rho = sum_k p_k |k><k|,
this gives
S = - kbar sum_k p_k log p_k;
also, since <1>=1 and rho is positive semidefinite,
sum_k p_k = 1 , all p_k >= 0.
Thus p_k can be consistently interpreted as the probability of the
system to occupy state |k>. This probability interpretation
depends on the orthonormal basis used to represent rho; which basis
to use is a famous and not really solved problem in the foundations of
quantum mechanics.

For a pure state psi, rho has rank 1, and the sum extends only over
the single index k with |k> = psi. Thus in this case, p_k = 1 and
S = - kbar 1 log 1 = 0, as it should be for a state of maximal
information. The amount of missing information is zero.
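
As a small numerical check, the entropy can be evaluated directly from
the eigenvalues p_k of rho. This is only a sketch: it uses the standard
formula S = - kbar sum_k p_k log p_k with the usual convention
0 log 0 = 0, and sets kbar = 1 by default.

```python
import math


def entropy(probabilities, kbar=1.0):
    """S = -kbar * sum_k p_k log p_k, with the convention 0 log 0 = 0."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be nonnegative")
    if abs(sum(probabilities) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return kbar * sum(-p * math.log(p) for p in probabilities if p > 0)


pure_state = entropy([1.0, 0.0])        # rank 1: no information is missing
maximally_mixed = entropy([0.5, 0.5])   # one binary question is missing: log 2
print(pure_state, maximally_mixed)
```

The mixed qubit state misses exactly the answer to one yes/no question,
which in natural units is log 2.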
For more along these lines, and in particular for a way to avoid
the probabilistic issues indicated above, see Sections 6 and 12
and Appendix A of my paper
A. Neumaier,
On the foundations of thermodynamics,

But how does the infinite amount of information in a pure state (wave
function) square with the finiteness of entropy?
Specifying a mixed state _exactly_ provides already an infinite amount
of information, since the density matrix rho must be specified to
infinite precision.
Defining the eigenstates that are of interest in measurement
amounts to specifying a Hamiltonian operator H _exactly_, which again
provides already an infinite amount of information, since the
coefficients of H in an explicit description must be specified to
infinite precision.
Then only a finite amount of information is missing to determine
in which of the eigenstates a particular particle is.
Of course in practice one just _postulates_ rho and H based on a
finite number of measurements, and _pretends_ (i.e., proceeds as if)
they are known exactly, while knowing well that one knows them only
approximately.
In practice, a number of approximations are made. Frequently,
one postulates exact equilibrium, hence a grand canonical ensemble,
which of course is not exactly valid. Deviations from equilibrium are
handled by means of a hydrodynamical approximation, in which entropy
is no longer a number but a field - and specifying the entropy density
again requires an infinite amount of information. Of course, one
also represents this only to some limited accuracy, to keep things
tractable.
Thus finiteness of the entropy in a particular model is enforced by
making simplifying assumptions which are valid only if one doesn't
look too closely.
Indeed, as the Gibbs paradox (discussed, e.g., as Example 9.1 in
my above thermodynamics paper) shows, the amount of entropy depends
on the level of modeling.

An analogy contributed by Gerard Westendorp:

To describe a classical, slightly biased die exactly by a
probability distribution also requires an infinite amount of
information, namely the specification of 5 infinite decimal expansions
of the probabilities p_k for getting k eyes. (The sixth is then
determined since probabilities sum up to 1.) This is much more than
the finite amount of information in saying which particular value k
was obtained in a specific throw of the die. On the other hand, _given_ the
distribution, the entropy S = - sum p_k log p_k is finite.
In general, describing the probabilistic state of an ensemble exactly
requires much more information than the exact description of a
particular realization.
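
The die analogy can be made quantitative. The following sketch assumes
a hypothetical set of probabilities for a biased die. It computes the
entropy in bits and compares it with the mean number of yes/no questions
in a Huffman questioning strategy, which is optimal when each throw is
asked about separately. For a single throw the mean number of questions
lies between the entropy and the entropy plus one; equality with the
entropy is reached only in special cases or when many throws are
handled jointly.

```python
import heapq
import math


def entropy_bits(probabilities):
    """Shannon entropy -sum_k p_k log2 p_k in bits (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def mean_questions_huffman(probabilities):
    """Mean number of binary questions in the Huffman strategy.

    Each merge of the two least probable groups corresponds to one
    question that is asked with probability equal to the merged weight.
    """
    heap = list(probabilities)
    heapq.heapify(heap)
    mean = 0.0
    while len(heap) > 1:
        smallest = heapq.heappop(heap)
        second = heapq.heappop(heap)
        mean += smallest + second
        heapq.heappush(heap, smallest + second)
    return mean


# Hypothetical probabilities p_1, ..., p_6 of a biased die.
biased_die = [0.10, 0.15, 0.15, 0.15, 0.20, 0.25]
die_entropy = entropy_bits(biased_die)
die_questions = mean_questions_huffman(biased_die)
print(die_entropy, die_questions)
```

Both numbers are finite although the probabilities themselves, taken as
exact real numbers, carry an infinite amount of information.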

S4f. How real is the wave function?
In thought experiments one often assigns a state to a single particle.
How defensible is this, and what is the meaning of the state?
In a statistical interpretation - see the section on measurements -,
this would make no sense, since there the state is a property
of the ensemble of particles generated by a given source. But then
it is difficult to visualize what happens in each single case.
Thus many people prefer the 'realistic' language of particles having
definite states. So let us discuss some of its implications.
Suppose that the particle is in the pure state represented by the wave
function psi. It is possible to give the wave function, or rather its
absolute value squared, a geometric interpretation:
m(x) = m |psi(x)|^2 is the mass density and
e(x) = e |psi(x)|^2 the charge density,
where m and e are the mass and charge of the particle.
Thus while the wave function itself has no tangible interpretation,
certain fields computable from it have.

This extends - but not quite in the obvious way - to multiparticle
systems.
For a system of several, say n particles, the wave function is
3n-dimensional psi(x_1,...,x_n), each x_i being an ordinary
3-dimensional position vector, but the correct densities are
still 3-dimensional, obtained by integration:
m(x) = sum_a m_a integral dx_1...dx_n delta(x-x_a)|psi(x_1:n)|^2,
e(x) = sum_a e_a integral dx_1...dx_n delta(x-x_a)|psi(x_1:n)|^2.
This reduces for n=1 to the above, and is consistent with the
definition of mass and charge density in quantum field theory as
m(x) = <Psi_0(x)^* m Psi_0(x)>,
e(x) = <Psi_0(x)^* e Psi_0(x)>,
where Psi_0(x) is the time component of the relevant matter field.
These formulas are the common starting point for the derivation from
first principles of the semiconductor equations in solid state physics.
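
The integration over delta(x-x_a) in the n-particle formula for m(x)
amounts to taking marginals of |psi|^2. The following sketch
illustrates this for n=2 in one space dimension instead of three, on a
finite grid; the correlated wave function and the masses are
hypothetical choices made only for the illustration.

```python
import math


def grid(x_min=-6.0, x_max=6.0, points=121):
    """Equidistant grid and its spacing."""
    h = (x_max - x_min) / (points - 1)
    return [x_min + i * h for i in range(points)], h


def toy_wave_function(x1, x2):
    """Unnormalized, correlated two-particle wave function (hypothetical)."""
    return math.exp(-0.5 * (x1 - 1.0) ** 2 - 0.5 * (x2 + 1.0) ** 2 - 0.3 * x1 * x2)


def mass_density(masses, xs, h):
    """m(x) = sum_a m_a * integral dx_1 dx_2 delta(x - x_a) |psi(x_1, x_2)|^2."""
    m1, m2 = masses
    prob = [[toy_wave_function(x1, x2) ** 2 for x2 in xs] for x1 in xs]
    norm = sum(map(sum, prob)) * h * h  # normalize |psi|^2 on the grid
    n = len(xs)
    density = []
    for i in range(n):
        marginal_1 = sum(prob[i][j] for j in range(n)) * h / norm  # particle 1 at xs[i]
        marginal_2 = sum(prob[j][i] for j in range(n)) * h / norm  # particle 2 at xs[i]
        density.append(m1 * marginal_1 + m2 * marginal_2)
    return density


xs, h = grid()
density = mass_density((1.0, 3.0), xs, h)
total_mass = sum(density) * h  # integrates to m_1 + m_2
print(total_mass)
```

Although the wave function lives on the 2-dimensional configuration
space, the resulting density is a function of a single position x, and
it integrates to the total mass. The charge density e(x) is obtained in
the same way with charges in place of masses.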

It is also what chemists draw as molecular shapes, using a cutoff where
m(x) and e(x) are negligible to delineate the boundary. Indeed,
chemists use such an interpretation all the time when visualizing
molecules in terms of orbitals, and with great success. The charge
distribution of the electron cloud of a molecule is one of the
important outputs of quantum chemistry packages such as
GAUSSIAN (commercial)
MOLPRO (commercial)
GAMESS (free after registration)
In the ground state (but also in definite excited states),
the mass or charge distribution is spread out over an infinite region,
although it becomes negligibly small outside a tiny core region
(or, sometimes, such as in Stern-Gerlach experiments, where the
wave function is multimodal, outside a few disconnected core regions).
The infinite extension invites an apparent paradox in that upon
collapse (e.g., due to hitting a detector screen), the particle
contracts from its infinite extension to a single spot. This seems
to violate the central tenet of relativity that information cannot
flow faster than the speed of light.
However, special relativity only restricts the observational
consequences of a theory. Since most of the wave function of an
individual particle is unobservable, there is no contradiction.
(It is like the nonlocality in tests of Bell's inequalities.
Nonlocality is unavoidable in QM, but the observable consequences
respect the bound relativity puts on the speed of information flow.)
For example, on a TV set, one observes just 3 position degrees of
freedom of each electron reaching the screen, while - in contrast
to the case of a classical particle - the wave function
characterizing a pure state of the electron sits
in a space of functions of 3 variables, which has infinitely many
degrees of freedom. Thus one observes only a tiny little bit about
the electron's state. It is like knowing the velocity of the wind
(a 3-dimensional vector field) in the earth's atmosphere
at a single point (giving a velocity vector with 3 coordinates)!
This unobservability of most of the state causes a problem for
those who require that everything a theory is talking about is
observable. But this requirement is not satisfied anyway in current
microphysics - no one ever observed a quark, but it is generally
believed that they make up most of the matter in our universe.
Thus, while it is reasonable to require that theory has observable
consequences in agreement with Nature, it is not reasonable to
require that everything the theory talks about is observable.
Then the unobservability of most of the state of a single particle
is harmless.

On the other hand, one can probe the state of particles in detail
if one has a large ensemble of identically prepared particles
(to make sure that they have the same state). These are usually created
by a carefully calibrated source, such as a laser. Then one can
subject them to different kinds of measurements from which one can
reconstruct a reasonable approximation of the state by quantum
tomography. In theory, one can make the approximation arbitrarily good.
Similarly a particle bound to a surface in a stationary state will
be measurable repeatedly if after the measurement the particle returns
to its state (which is natural if the bound system is in equilibrium).
Therefore one can measure equilibrium properties quite accurately.

In this sense one can say that the state of a single particle is
indeed real, and objective.

Note that single particles can nowadays be routinely prepared and
studied; see, e.g.,
D. Leibfried, R. Blatt, C. Monroe, and D. Wineland,
Quantum dynamics of single trapped ions,
Reviews of Modern Physics 75 (2003), 281-324.
S.M. Reimann and M. Manninen,
Electronic structure of quantum dots,
Reviews of Modern Physics 74 (2002), 1283-1342.

S4g. How real are Feynman's paths?
In Feynman's version of quantum mechanics, amplitudes are calculated
as a sum over all possible classical paths a particle (or a system)
can take in a classical phase space.
The paths in the Feynman picture of QM should not be regarded as real.
All possible paths are about as real as all possible books that can
be written, or - closer to physics - all possible items in a
statistical ensemble modeling a classical ideal gas. Of course only one
state is realized, not all conceivable ones; all others are just there
to compare to and compute probabilities.
In QM things are slightly more complicated, however, since the 'true'
path is smeared by the uncertainty principle. (Even in the many-worlds
interpretation, quantum objects have no sharp paths, while the paths
integrated over in a path integral must be perfectly accurate.)
The paths are just calculational devices that cease to exist once a
different approach to computation is taken. This is why I don't
ascribe any reality to them. The real objects remain present in
_any_ sensible description; the unreal ones don't.

S4h. Can particles go backward in time?
In the old relativistic QM (e.g., in Volume 1 of Bjorken and Drell)
antiparticles are viewed as particles traveling backward in time.
This is based on a consideration of the solutions of the Dirac equation
and the idea of a filled sea of negative-energy solutions in which
antiparticles appear as holes (though this picture only works for
fermions since it requires an exclusion principle). One can go some way
with this view, but more sophisticated stuff requires the QFT picture
(as in Volume 2 of Bjorken and Drell and most modern treatments).
In relativistic QFT, all particles (and antiparticles) travel forward
in time, corresponding to timelike or lightlike momenta.
(Only 'virtual' particles may have unrestricted momenta; but these are
unobservable artifacts of perturbation theory.)
The need for antiparticles is in QFT instead revealed by the fact that
they are necessary to construct operators with causal (anti)commutation
relations, in connection with the spin-statistics theorem. See, e.g.,
Volume 1 of Weinberg's quantum field theory book.
Thus talking about particles traveling backward in time, the Dirac sea,
and holes as positrons is outdated; today it is more misleading
than helpful.

S4i. What about particles faster than light (tachyons)?
Tachyons are hypothetical particles with speed exceeding the speed of
light. Special relativity demands that such particles have imaginary
rest mass (negative m^2), and hence can never be brought to rest
(or below the speed of light); unlike ordinary particles, they speed
up as they lose energy.
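
That tachyons speed up as they lose energy follows from the usual
relativistic formulas with imaginary rest mass m = i mu. The following
sketch uses units with c = 1 and the standard kinematic expressions
E = mu / sqrt(v^2 - 1) and p = mu v / sqrt(v^2 - 1); the function names
and the sample speeds are hypothetical.

```python
import math


def tachyon_energy(v, mu=1.0):
    """Energy of a tachyon with m^2 = -mu^2 at speed v > 1 (units with c = 1)."""
    if v <= 1.0:
        raise ValueError("a tachyon always moves faster than light (v > 1)")
    return mu / math.sqrt(v * v - 1.0)


def tachyon_momentum(v, mu=1.0):
    """Momentum of a tachyon with m^2 = -mu^2 at speed v > 1 (units with c = 1)."""
    if v <= 1.0:
        raise ValueError("a tachyon always moves faster than light (v > 1)")
    return mu * v / math.sqrt(v * v - 1.0)


speeds = [1.1, 2.0, 10.0]
energies = [tachyon_energy(v) for v in speeds]
print(list(zip(speeds, energies)))  # the energy decreases as the speed grows
```

The energy diverges as v approaches the speed of light from above and
tends to zero as v goes to infinity, while E^2 - p^2 = -mu^2 holds at
every speed.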
Charged tachyons would produce Cerenkov radiation in vacuum, which has
never been observed. However, Cerenkov radiation is indeed observed
when fast particles enter a dense medium in which the speed of light
is smaller than the particle's speed. This is not a problem since
relativity only demands that no particle with real mass is faster
than the speed of light in vacuum.
(Unfortunately, this no longer allows one to discriminate between
massless particles having the vacuum speed of light, and tachyons.)
Neutrinos are uncharged and have a squared mass of zero or very close
to zero, and hence could possibly be tachyons.
Recently observed neutrino oscillations confirmed a small
squared mass difference between at least two species of neutrinos.
This does not yet settle the sign of m^2 for any species.
Direct measurements of m^2 have experimental errors still compatible
with m^2=0. For data see
The initial interest in tachyons stopped around 1980, when it was
clear that the QFT of tachyons would be very different from standard
QFT, and that experiment didn't demand their existence. The publications
of the particle data group, which contain the biennially revised
consensus of the particle physics community, do not even include the
search for tachyons in their reviews of hypothetical particles:
In fact, the theory of symmetry breaking demands that tachyons do
_not_ exist: When a relativistic field theory is deformed in a way
that the square of the mass (pole of the S-matrix) of some physical
particle would cross zero, the old physical vacuum becomes unstable and
induces a phase transition to a new physical vacuum in which all
particles have real nonnegative mass. This would happen already at
tiny negative m^2,
and is believed to be the cause of inflation in the early universe.
(Of course, the exact mechanism is not known since it would require a
nonperturbative definition of QFT. But classical and semiclassical
computations strongly suggest the correctness of this picture.)
Expanding a theory (such as the standard model) around an unstable state
(e.g., the Higgs with a local maximum at vanishing vacuum expectation)
formally produces a bare tachyon. This does not contradict the above
assertion, but only indicates the instability of the bare vacuum.
Asymptotic power series expansions around maxima
(especially those with tiny or vanishing convergence radius)
make meaningless assertions about the behavior of a function near one
of its minima. Since physical particles arise from field excitations
near the global minimum of the effective energy, perturbations around
the maximum are unphysical.
An expansion around an unstable state gives no significant information,
unless one has a system that actually _is_ close to such an unstable state
(as perhaps the very early universe). But in that case there are no
relevant excitations (tachyons), since the whole process of motion
(inflation) towards a more stable state proceeds so rapidly that
excitations do not form and everything can be analyzed semiclassically.
The physical Higgs field is far away from the unstable maximum, and its
particle excitations have a positive real mass, hence are not tachyons.

Below are some references about tachyons;
the more important papers are marked by an asterisk.
* G. Feinberg,
Possibility of Faster-Than-Light Particles,
Phys. Rev. 159, 1089 (1967).
J. Dhar and E. C. G. Sudarshan,
Quantum Field Theory of Interacting Tachyons,
Phys. Rev. 174, 1808-1815 (1968)
M. Glueck,
Note on Causal Tachyon Fields,
Phys. Rev. 183, 1514 (1969).
D. G. Boulware,
Unitarity and Interacting Tachyons,
Phys. Rev. D 1, 2426 (1970).
* B. Schroer,
Quantization of m^2<0 Field Equations,
Phys. Rev. D 3, 1764 (1971).
G. Feinberg
Lorentz invariance of tachyon theories
Phys. Rev. D 17, 1651 (1978)
C. Schwartz
Some improvements in the theory of faster-than-light particles
Phys. Rev. D 25, 356 (1982)
M. B. Davis, M. N. Kreisler, and T. Alvaeger
Search for Faster-Than-Light Particles
Phys. Rev. 183, 1132 (1969)
* L. W. Jones
A review of quark search experiments
Rev. Mod. Phys. 49, 717 (1977)
[Section IIIG reviews the vain search for tachyons.]
The Wikipedia entry for tachyons,
gives some more explanations.
although mainly speculating about connections between tachyons and
inflation, has some links with further useful information.

S4j. Do free particles exist?
Free particles are a convenient mathematical abstraction.
In Nature, there are - strictly speaking - no free particles,
only interacting ones. This holds both for photons and for other
more tangible particles like electrons. However, in sufficiently
localized (and nearly empty) regions of space, particles can be
approximately free. Again, this holds for both photons and other
particles.
It is very convenient to approximate such states by free states.
For example, this allows one to explain much of quantum mechanics
in terms of particle scattering. The S-matrix interpretation
depends crucially on the fact that the ingoing and outgoing
asymptotic states of photons, electrons, quarks, etc. are free.
Thus, in this sense, free photons exist just as much (or just as
little) as free electrons.

S5a. QM pictures and representations
QM exists in different pictures, of which the Schroedinger picture,
the Heisenberg picture, the interaction picture, and Feynman's
path integral representation are frequently invoked. There is also
the algebraic approach using unitary representations of canonical
commutation rules (CCR).
The Schroedinger picture, the Heisenberg picture, and the interaction
picture are equivalent because there are unitary transformations
between them. They all provide different representations of the
same canonical commutation rules
i[p_j,q_k]= hbar delta_jk
between components p_j of momentum p and q_k of position q.
The Stone-von Neumann theorem guarantees that the canonical
commutation relations (or their unitary version, the Weyl relations)
have a unique unitary representation apart from unitary
transformations, and hence suffice to specify the QM of finitely many
degrees of freedom uniquely, no matter which picture is used.
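
It is instructive to see why the representation must be
infinite-dimensional. The following sketch is an illustration only, not
a construction used in this FAQ: it truncates the harmonic oscillator
ladder operators to n levels and evaluates i[p,q]. The result is
hbar times the identity except for the last diagonal entry. Since the
trace of a commutator of finite matrices vanishes, no finite matrices
can satisfy the commutation rules exactly.

```python
import math


def annihilation(n):
    """Annihilation operator truncated to the lowest n oscillator levels."""
    a = [[0.0] * n for _ in range(n)]
    for k in range(1, n):
        a[k - 1][k] = math.sqrt(k)
    return a


def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def combine(alpha, m1, beta, m2):
    """Entrywise alpha * m1 + beta * m2."""
    n = len(m1)
    return [[alpha * m1[i][j] + beta * m2[i][j] for j in range(n)] for i in range(n)]


def matmul(m1, m2):
    """Product of two square matrices."""
    n = len(m1)
    return [[sum(m1[i][k] * m2[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ccr_matrix(n, hbar=1.0):
    """Return i[p,q] for the truncated position and momentum matrices."""
    a = annihilation(n)
    a_dag = dagger(a)
    s = math.sqrt(hbar / 2.0)
    q = combine(s, a, s, a_dag)              # q = sqrt(hbar/2) (a + a^dagger)
    p = combine(-1j * s, a, 1j * s, a_dag)   # p = -i sqrt(hbar/2) (a - a^dagger)
    return combine(1j, matmul(p, q), -1j, matmul(q, p))


commutator = ccr_matrix(5)
diagonal = [commutator[k][k].real for k in range(5)]
print(diagonal)
```

As n grows, the defect moves to ever higher levels but never disappears;
only in the infinite-dimensional limit does i[p,q] = hbar hold on the
whole domain.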
The Stone-von Neumann theorem fails for systems of infinitely many
degrees of freedom (see the FAQ entry on 'Inequivalent
representations of CCR/CAR'), which in a sense 'causes' the
difficulties in quantum field theory.
Nevertheless, QFT still has a Schroedinger picture
and a Heisenberg picture, and these are still equivalent:
The Heisenberg picture can be immediately constructed from the Wightman
fields. Then the canonical procedure - fixing the Heisenberg operators
at time t=0 and instead defining dynamical states
psi(t) := exp(-itH)psi
- produces the Schroedinger picture from it.
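
For a finite-dimensional toy system the equivalence of the two pictures
can be verified directly. The following sketch assumes a two-level
system with a diagonal Hamiltonian (hypothetical energies, hbar = 1) and
an arbitrary observable A. It compares the expectation computed with the
dynamical state psi(t) = exp(-itH) psi against the one computed with the
Heisenberg operator A(t) = exp(itH) A exp(-itH) in the fixed state psi.

```python
import cmath

ENERGIES = (0.3, 1.7)          # hypothetical eigenvalues of H
A = [[0.0, 1.0], [1.0, 0.0]]   # observable, written in the eigenbasis of H


def expectation(state, operator):
    """<state| operator |state> for a 2-component state."""
    value = sum(complex(state[j]).conjugate() * operator[j][k] * state[k]
                for j in range(2) for k in range(2))
    return complex(value).real


def schroedinger_expectation(psi, t):
    """Evolve the state, keep the operator fixed."""
    psi_t = [cmath.exp(-1j * t * e) * c for e, c in zip(ENERGIES, psi)]
    return expectation(psi_t, A)


def heisenberg_expectation(psi, t):
    """Evolve the operator, keep the state fixed."""
    a_t = [[cmath.exp(1j * t * (ENERGIES[j] - ENERGIES[k])) * A[j][k]
            for k in range(2)] for j in range(2)]
    return expectation(psi, a_t)


psi0 = [0.6, 0.8j]  # normalized initial state
for t in (0.0, 0.5, 2.0):
    print(t, schroedinger_expectation(psi0, t), heisenberg_expectation(psi0, t))
```

Both columns agree for every t, since the unitary exp(-itH) can be moved
from the state to the operator without changing the expectation.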
The Feynman path integral is related to the other pictures via the
Feynman-Kac formula, which makes the often only formally stated
equivalence precise, after analytically continuing the time to purely
imaginary times. The Osterwalder-Schrader theory
[see, e.g., math-ph/0001010 or the book by Glimm and Jaffe]
shows how to go back in case of relativistic quantum field theory.
The Feynman path integral only gives time-ordered expectation values;
this suffices to compute S-matrix elements, but is inadequate for
dynamical investigations needed for nonequilibrium quantum mechanics.
The latter can be treated with the so-called closed time path (CTP)
integral within the Schwinger-Keldysh formalism.

S5b. Inequivalent representations of the CCR/CAR
Ordinary quantum mechanics of N particles can be written in terms of
creation and annihilation operators for the 3N modes of an associated
reference harmonic oscillator. The field case, on the other hand,
is characterized by the fact that there are infinitely many modes.
If the creation and annihilation operators are those in the action
or Hamiltonian defining the QFT, the different modes are traditionally
referred to as 'bare particles', though this is not recommended for
reasons discussed elsewhere in this FAQ. If the creation and
annihilation operators are properly renormalized so that they
create and annihilate physical particles from the physical vacuum,
the modes are referred to as 'dressed particles'; only these have
physical relevance.
A state in which k modes are excited is called a k-particle state.
In many states of interest, however, (the most prominent ones being
the coherent states) infinitely many modes are excited (although the
notion of infinitely many particles is strained in this case). Thus one
needs to cater in the formalism for states with arbitrarily many or
even infinitely many modes. This has subtle consequences, which
account for the big difference between quantum field theory and
ordinary quantum mechanics.

The canonical commutation rules (CCR) for creation and annihilation
operators in field theory take in the simplest case (countably many
modes, corresponding to fields confined to a bounded region) the form
[a(k),a^*(l)] = delta_kl, k,l=0,1,2,... (1)
The Stone-von Neumann theorem, which guarantees that the canonical
commutation relations of quantum mechanics (or their unitary version,
the Weyl relations) have a unique unitary representation apart from
unitary transformations, fails for systems of infinitely many degrees
of freedom.

The reason for this is that the natural representation space for
creation and annihilation operators is the vector space consisting
of all formal linear combinations
sum psi(n1,n2,n3,...) |n1,n2,n3,...>
with _arbitrary_ complex coefficients psi(n1,n2,n3,...), on which
a(k) and a^*(l) act as
a(k)|n1,....,n_k,...> = sqrt(n_k)|n1,....,n_k - 1,...>,
a^*(l)|n1,....,n_l,...> = sqrt(1+n_l)|n1,....,1+n_l,...>.
This vector space V has no natural Hilbert space structure.
To provide a definite inner product, one must select a suitable
subspace where this inner product can be defined.
This allows many choices; the choice usually discussed in QFT treatises
is Fock space, where only basis vectors |n1,....,n_k,0,0,...>
with finitely many particles are allowed, and these basis vectors are
declared orthonormal. As a result, Fock space contains only
the linear combinations
sum psi(n1,n2,n3,...,n_k) |n1,n2,n3,...,n_k>
where k is variable and
sum |psi(n1,n2,n3,...,n_k)|^2 is finite.
Unfortunately, if this choice is made for the representation of the
bare creation and annihilation operators, it excludes the states
relevant for the physical, interacting situation. This is the
essential message of Haag's no interaction theorem.
Indeed, the physical states lie in a different, inequivalent unitary
representation, characterized by a different subspace of V. This
subspace is generated by applying to the physical (= renormalized)
vacuum state the dressed (= renormalized) creation operators
an arbitrary number of times, then taking all finite linear
combinations, and finally taking the closure with respect to the
inner product in which all a^*(n_1)...a^*(n_k)|vac> are orthonormal.
In general, this Hilbert space has only the null vector (_not_ the
vacuum) in common with the Fock space, even for the simplest
(i.e., quadratic) Hamiltonians and actions. This case is well understood,
giving rise to the theory of quasiparticles and in particular of
superconductivity. For example (counting modes by signed nonzero
integers for simplicity - they become momenta in the infinite volume
limit), if the bare a(k) and b(k) satisfy CCR then so do the dressed
annihilation operators
alp(k) = A(k) a(k) - B(-k) b^*(-k),
bet(k) = A(k) b(k) - B(-k) a^*(-k),
and their formal adjoints
alp^*(k) = A(-k) a^*(k) - B(k) b(-k),
bet^*(k) = A(-k) b^*(k) - B(k) a(-k),
provided that A(k), B(k) are real numbers satisfying
A(k)^2 - B(k)^2 = 1,
or, equivalently, that
A(k) = cosh(theta(k)), B(k) = sinh(theta(k)).
If there were only finitely many modes, we could define
in Fock space the unitary operator
G = exp [- sum_k theta(k) (a(k)b(-k) - b^*(-k)a^*(k))],
and verify that
alp(k) = G a(k) G^{-1},
bet(k) = G b(k) G^{-1},
showing that we get an equivalent representation of the CCR.
We could deduce that
|vac> := G|>,
where |> is the bare vacuum, is the dressed vacuum on which
alp and bet act naturally. The dressed states would simply be
the images of the bare states under the Bogoliubov operator G.
Unfortunately, if there are infinitely many modes, G can no
longer be consistently defined as an operator in Fock space,
and the infinite-dimensional version of this scenario breaks
down. Ignoring this, one would find all sorts of infinities.
Mathematically, however, one simply changed the unitary
representation - G does not exist although the dressed
representation exists.
Physicists say that the above computations hold 'formally'.
Made precise by a mathematician, this means that they hold in
finite mode approximations but do not survive the limit, although
they are usually formulated in the meaningless limit form.
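How the finite mode approximation fails to survive the limit can be
made visible in a simple special case. The sketch below assumes that all
mode pairs share the same angle theta (an assumption made purely for
illustration; the function names are hypothetical). For a single pair,
the dressed vacuum G|> has the expansion sum_n c_n |n,n> with
c_n = tanh(theta)^n / cosh(theta), so its overlap with the bare vacuum
is 1/cosh(theta). For N pairs the overlaps multiply.

```python
import numpy as np

def dressed_vacuum_coefficients(theta, n_max):
    """Coefficients c_n of G|> = sum_n c_n |n,n> for one mode pair,
    truncated at n_max: c_n = tanh(theta)**n / cosh(theta)."""
    n = np.arange(n_max + 1)
    return np.tanh(theta) ** n / np.cosh(theta)

def vacuum_overlap(theta, num_pairs):
    """Overlap <bare vacuum | dressed vacuum> for num_pairs mode pairs
    with a common angle theta (illustrative assumption)."""
    return np.cosh(theta) ** (-float(num_pairs))

theta = 0.3
coeffs = dressed_vacuum_coefficients(theta, n_max=200)
norm_squared = np.sum(coeffs ** 2)   # close to 1: G|> is a unit vector

for num_pairs in (1, 10, 100, 1000, 10000):
    print(num_pairs, vacuum_overlap(theta, num_pairs))
```

For every finite number of pairs the dressed vacuum is a normalized
vector in Fock space, but its overlap with the bare vacuum decays
exponentially with the number of pairs. In this special case the limit
vector has vanishing overlap with the bare vacuum, in line with the
statement that G does not exist for infinitely many modes.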
The canonical anticommutation rules (CAR) also have the form (1),
except that the commutator is replaced by an anticommutator.
All statements above are valid with appropriate modifications;
the most important one being that occupation numbers are now
restricted to 0 and 1, and the definition of a^*(l) has 1-n_l in
place of 1+n_l.
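For a single mode, the rules for a and a^* can be written as matrices.
The sketch below is an illustration only: the cutoff on the Bosonic
occupation number is an assumption needed to obtain finite matrices, and
the function names are hypothetical. Only one Fermionic mode is treated,
so the extra signs needed for several Fermionic modes do not appear.

```python
import numpy as np

def boson_annihilator(cutoff):
    """Matrix of a on span{|0>,...,|cutoff>}, using a|n> = sqrt(n)|n-1>."""
    return np.diag(np.sqrt(np.arange(1, cutoff + 1)), k=1)

def fermion_annihilator():
    """Matrix of a for one Fermionic mode with occupation numbers 0, 1."""
    return np.array([[0.0, 1.0], [0.0, 0.0]])

a = boson_annihilator(cutoff=6)
# a is real, so its transpose represents a^*.
commutator = a @ a.T - a.T @ a
# The CCR hold on all basis vectors except the last one, where the
# cutoff spoils them: the diagonal is (1, ..., 1, -cutoff).
print(np.diag(commutator))

c = fermion_annihilator()
anticommutator = c @ c.T + c.T @ c
# The CAR hold exactly in two dimensions: the result is the identity.
print(anticommutator)
```

The defect in the last diagonal entry cannot be avoided in finite
dimensions, since the trace of a commutator of matrices vanishes. The
Fermionic relations, by contrast, are satisfied exactly by 2x2 matrices.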
For more details see the book
H. Umezawa, H. Matsumoto, and M. Tachiki,
Thermo Field Dynamics and Condensed States,
North Holland 1982.

S5c. Why does QFT look so different from QM?
This is only because of technical reasons and the power of tradition.
In ordinary quantum mechanics, pure states are described by
wave functions (more precisely by rays) in a Hilbert space,
there is a Hamiltonian H and an associated Schroedinger equation
i hbar psidot = H psi, the time evolution is described by a unitary
operator, the bound states are normalized eigenstates of the
Hamiltonian, etc.
This is also done in traditional quantum field theory, though it
is not directly apparent. But one can see it when studying
constructive field theory. It gives everything in case of 2D quantum
fields. There is a well-defined Hilbert space, a well-defined
Hamiltonian constructed without any use of perturbation theory,
a well-defined unitary dynamics, well-defined bound states that
are eigenstates of the Hamiltonian, and everything is invariant under
the 2D Poincare group ISO(1,1). See the book
J. Glimm and A. Jaffe,
Quantum Physics: A Functional Integral Point of View,
Springer, Berlin 1987.
The only thing wanting is an explicit formula for H in the traditional
nonrelativistic form H=H_0+V. Instead, H is constructed in a more
abstract way, as analytic continuation of an operator in Euclidean
field theory.
That the 4D case is more difficult has to do with obstacles in getting
tight enough bounds for the analytic estimates needed. These are
mathematical difficulties, but not inconsistencies - no one proved that
there are contradictions, and the practice of QFT suggests that there
are indeed none (at least for asymptotically free theories).
On the perturbative level, there is no difficulty at all - see, e.g.
the book
M. Salmhofer,
Renormalization: An Introduction,
Texts and Monographs in Physics,
Springer, Berlin 1999.
which constructs the Euclidean theory for Phi^4 theory in 4 dimensions
perturbatively, i.e., in the formal power series topology, with full
mathematical rigor. If this construction worked nonperturbatively
(i.e., give functions instead of formal power series),
analytic continuation using Osterwalder-Schrader theory would do
the rest. The latter is described, e.g., in Chapter 6 of the above
book by Glimm and Jaffe.

S5d. Why is QFT based on a classical action?
The path integral approach to QFT begins with classical fields
that are varied to produce quantum amplitudes as a 'sum over all
possible paths'. But, with the exception of the electromagnetic field,
the classical fields one meets there are not fields occurring
in classical physics. Nevertheless they are rightfully labelled classical.
Classical physics is the physics of processes slowly varying in space
and time; of course, elementary particles do not belong there.
But classical mechanics can also be considered as an abstract
mathematical framework for dynamics in a general phase space
(described by a Poisson manifold), which has much wider applicability.
The classical fields that figure in the path
integral belong in this sense to classical mechanics.
In QFT, one needs a classical action to be able to implement
unitarity of the S-matrix and the cluster decomposition.
The first is essential for a correct probabilistic interpretation of
QFT, since it amounts to preservation of probability, and the second is
necessary to account for the fact that all our experiments are done
locally, and what is far away does not contribute significantly
except through effectively classical far fields. (What happens with
the stars should be irrelevant to experiments on the earth, except for
the experiments of astronomers. This is the basis of all physics.)
In terms of microphysics, cluster decomposition means that one cannot
scatter particles (clusters of elementary particles) at very distant
particles (clusters).
The arguments why this requires a classical action expressed in terms
of creation and annihilation operators are explained in detail in
Weinberg's quantum field theory book, Volume I, Chapters 3-7.
We need cluster decomposition because it is observed. We need
local fields and microcausality, mainly because it implies
(modulo fine print involving contact terms) at least perturbatively
cluster decomposition, and there is no other known way in QFT to
ensure the latter. But there are covariant N-particle models with
cluster decomposition, discussed, e.g., in
B.D. Keister and W.N. Polyzou,
Relativistic Hamiltonian Dynamics in Nuclear and Particle Physics,
in: Advances in Nuclear Physics, Volume 20,
(J. W. Negele and E.W. Vogt, eds.)
Plenum Press 1991.
(The constructions are quite messy; they have, however, the
advantage that they do not need renormalization, and are useful
phenomenological models.)
The lack of references to cluster decomposition in standard textbooks
of QFT is explained by the fact that local QFT automatically satisfies
cluster decomposition. Most people take QFT as the starting
point, without asking why. Weinberg's treatise is about the only book
that asks this question and answers it in some depth.
But when you look at the literature on phenomenological covariant
multiparticle models, cluster decomposition plays an essential role
in that it is the main hurdle to overcome to get realistic models for
systems made of more than two unconfined particles. For details see
the survey by Keister and Polyzou mentioned above,
and the references there.
Cluster decomposition for field theory is also discussed from a
rigorous point of view in the book by Glimm and Jaffe, where
connections are made to multiparticle scattering.
Indeed, books on (nonrelativistic) scattering theory are the ones
where the cluster decomposition is discussed in detail, since it is
needed to describe the result of the most general multiparticle
scattering experiments, and an understanding of it is essential for
proving the asymptotic completeness of scattering states.
Nonrelativistic theory also shows that the 'correct'
cluster decomposition is always one for bound states,
as can be seen from a more detailed nonrelativistic analysis.
(This is not apparent from Weinberg's argument,
since perturbation theory breaks down in the presence of
bound states. This explains why QCD has no cluster
decomposition for isolated quarks.)

Unfortunately, most physicists tend to work in isolated fragments of the
whole edifice of physics, thus losing connections that may be important
to understanding. Cluster decomposition would perhaps be more prominent
in QFT if it were easier to calculate properties of bound states and
their scattering or breaking up, since that is where one can see the
principle at work. But such calculations are presently out of reach
without severe approximations.

S5e. Why does the action only contain first derivatives?
On the classical level, higher derivatives cause no formal problems,
one can form the variational equations as always. There might be
problems with causality (= symmetric hyperbolicity), however.
These problems become worse (and apparently intractable) in the
quantum case.
In a k derivative theory with k>1, one can always introduce new fields
for the k-1 first derivatives, and add terms to the action that give
as variation their defining equations. Thus one can reduce any theory
to an equivalent one with only first derivatives in the action.
The problems appear when trying to go from the Lagrangian picture to
the Hamiltonian - then one gets similar difficulties as for gauge theories.

S5f. Why normal ordering?
Field theory often deals with polynomial expressions in annihilation
operators a(p) and their adjoint creation operators a^*(p).
While a(p) is a linear operator on a dense subspace H of the
corresponding Fock space, its adjoint isn't. But both are densely
defined sesquilinear forms on Fock space.
A sesquilinear form is a linear mapping f from a space H (the domain;
a dense subspace of the Hilbert space, in the present case of Fock
space) to its dual space H^* (which properly contains H), while
an operator maps H into H. Thus the latter can be iterated
while the former usually cannot.
<phi|f|psi> is always defined when phi,psi in H (since f|psi> is in H^*,
the inner product is defined). Thus Hermitian sesquilinear forms are
satisfactory candidates for 'observables'. However, matrix elements
<phi|fg|psi> of products fg make sense only for
operators f,g, since fg|psi> is not defined if g|psi> is outside H.
In particular, a(p)a^*(p) is a meaningless construct, while
:a(p)a^*(p): = +-a^*(p)a(p)
makes sense as a Hermitian sesquilinear form. But f(p)=a^*(p)a(p) is
no longer an operator in any sense (though good 1-particle
operators can be made by integration with suitable test functions).
That's why f(p)f(q) is meaningless while the permuted form
:f(p)f(q): = +-a^*(p)a^*(q)a(p)a(q)
(+ for Bosons, - for Fermions) is well defined (again as sesquilinear
form only).
More generally, any product O of creation and annihilation operators
which has all its creation terms to the left of all its annihilation
terms (these are called normally ordered products) defines a
sesquilinear form. The reason is that such an O can be written as
O=A^*B where A and B are products of annihilation operators only,
hence <phi|O|psi> = <phi|A^*B|psi> can be interpreted as the inner
product of the two vectors A|phi> and B|psi> obtained from phi and psi
by applying annihilation operators only, which produces vectors in H
for which the inner product is always defined.
Normal ordering just permutes arbitrary products to put them into the
normally ordered and hence well-defined form (and adds a minus sign
if an odd number of transpositions of Fermion operators is needed
to order the product). This is extended by linearity to polynomials
and infinite series in power products. Note that normal ordering is
defined for formal expressions (i.e. strings of letters),
not for operators or forms; only _after_ normal ordering an
expression O does one get a sesquilinear form :O:.
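Since normal ordering acts on strings of letters, it is easy to write
down as a purely symbolic procedure. The following is a minimal sketch
under stated assumptions: the representation of a factor as a pair
(kind, label) and the function name normal_order are hypothetical
choices made for illustration, and only single products (not sums) are
handled.

```python
def normal_order(factors, fermions=False):
    """Normally order a formal product of creation/annihilation symbols.

    `factors` is a list of (kind, label) pairs with kind equal to
    "create" or "annihilate". Returns (sign, ordered_factors).
    Creation symbols are moved to the left, keeping the relative order
    within each kind; for Fermions every transposition of two symbols
    contributes a factor -1.
    """
    transpositions = 0
    annihilators_seen = 0
    for kind, _ in factors:
        if kind == "annihilate":
            annihilators_seen += 1
        elif kind == "create":
            # This creator must hop over every annihilator to its left.
            transpositions += annihilators_seen
        else:
            raise ValueError(f"unknown kind: {kind!r}")
    creators = [f for f in factors if f[0] == "create"]
    annihilators = [f for f in factors if f[0] == "annihilate"]
    sign = -1 if (fermions and transpositions % 2 == 1) else 1
    return sign, creators + annihilators

# The formal product a(p) a^*(p):
product = [("annihilate", "p"), ("create", "p")]
print(normal_order(product))                  # Bosons
print(normal_order(product, fermions=True))   # Fermions
```

In both cases the ordered string is a^*(p)a(p); the sign is +1 for
Bosons and -1 for Fermions. Nothing in the procedure refers to
operators or domains; it only rearranges symbols.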
In Fock spaces over finite-dimensional Hilbert spaces, the situation is
different; there a(p) and a^*(p) are indeed operators on Fock space
(and the index p ranges over finitely many items only). Thus all
products make sense, and the normally ordered version of a product
differs from the original product by terms involving fewer operators.
Normal ordering is usually motivated by starting with a
finite-dimensional discretization where integrals become finite sums;
then one can do all the formal manipulations rigorously. Upon passing
to the continuum limit, most expressions become infinite and hence
meaningless, but the normally ordered expressions happen to have a
well-defined limit and hence are meaningful. So these are the relevant
'operators' or rather sesquilinear forms. Presenting things as above
avoids any infinities.

S5g. Why locality and causal commutation relations?
In measurement terms, locality is the idea that a measurement here
and a simultaneous measurement there can be performed independently,
and in particular don't limit each other in precision. This is encoded
in the requirement that 'local' quantities described by fields
Phi_a(\x,t) here (at \x) and fields Phi_b(\y,t) there (at \y)
commute if the positions \x and \y are distinct.

The covariant form of this locality requirement is that,
with x=(ct,\x) and the +--- norm defined by x^2=x_0^2-\x^2,
[Phi_a(x),Phi_b(y)]=0 if (x-y)^2<0 (*)
Indeed, if x_0=y_0=ct then (x-y)^2=(x_0-y_0)^2-(\x-\y)^2=-(\x-\y)^2<0,
so this commutation relation holds at equal time. But then Lorentz
covariance implies that it must hold whenever (x-y)^2<0, since any
pair (x,y) with (x-y)^2<0 can be transformed into an equal time pair.
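The last step can be made explicit: for a spacelike difference d=x-y,
a boost along the spatial direction of d with velocity
beta = d_0/|\d| (in units of c) removes the time component. The code
below is a small numerical sketch of this; the function names and the
sample vector are assumptions chosen for illustration.

```python
import numpy as np

def interval(d):
    """Minkowski square d^2 = d_0^2 - |d_vec|^2 in the +--- signature."""
    return d[0] ** 2 - np.dot(d[1:], d[1:])

def boost_to_equal_time(d):
    """Boost a spacelike 4-vector d = (c*dt, dx, dy, dz) so that its
    time component vanishes. Returns (beta, boosted vector), where beta
    is the boost velocity in units of c along the direction of d_vec."""
    d = np.asarray(d, dtype=float)
    if interval(d) >= 0:
        raise ValueError("d must be spacelike")
    r = np.linalg.norm(d[1:])
    n = d[1:] / r
    beta = d[0] / r                       # |beta| < 1 since d is spacelike
    gamma = 1.0 / np.sqrt(1.0 - beta ** 2)
    parallel = np.dot(n, d[1:])           # component of d_vec along n
    time_new = gamma * (d[0] - beta * parallel)
    space_new = d[1:] + (gamma * (parallel - beta * d[0]) - parallel) * n
    return beta, np.concatenate(([time_new], space_new))

d = np.array([1.0, 2.0, 0.5, -1.0])       # sample spacelike difference x - y
beta, d_equal_time = boost_to_equal_time(d)
print(beta, d_equal_time, interval(d), interval(d_equal_time))
```

The boosted vector has vanishing time component and the same value of
(x-y)^2, so the pair has become an equal time pair. For timelike or
lightlike d the required beta would not satisfy |beta|<1, which is why
the function rejects such input.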
Thus locality is a property of distinguished fields satisfying (*),
called local fields. This property is completely independent of states,
since it is understood that the property holds independent of the
coincidental properties of the state.

Quantum field theory is physics in the Heisenberg picture, with
states fixed once and for all, and all spacetime dependence in
the fields. The universe is in a definite though largely unknown state,
and apart from the Lagrangian of the standard model plus gravitation,
all the history, present and future of the universe is encoded in
this universal state.
Lacking knowledge of this state, physicists are usually
content with describing tiny portions of this state, namely the
restriction of the state to a subalgebra of accessible quantities
within the lab (or at least close to the solar system).
Since there are many such subsystems of interest, and all these
are in different states even if described by the same algebra
(more precisely by isomorphic ones), all generic properties of
physical systems must be independent of the states.

S5h. Creation operators and rigged Hilbert space
Physicists regard Fock space as the Hilbert space containing the
basis states
|x_1:N> = |x_1,...,x_N>
and their linear combinations. However, there is no Hilbert space
containing these states. The state |x_1:N> = |x_1,...,x_N>
is not in the Hilbert Fock space, for the same reason for which
|x> is not in the 1-particle Hilbert space. It is only a
The Hilbert Fock space is made instead of all wave functions
psi = sum_N integral dx_1:N psi_N(x_1:N) |x_1,...,x_N>
with finite
<psi|psi> = sum_N |psi_N|^2/N!
Physicists also define annihilation operators a(x) and
their adjoints, creation operators a^*(x). However, these are
not operators, but operator-valued distributions. For example,
a^*(x) maps the vacuum state |vac> (with psi_0=1, other psi_N=0)
into a^*(x)|vac> = |x>, which is not in the Hilbert Fock space.
More generally, for every nonzero Hilbert Fock space vector psi,
the vector
psi' = a^*(x) psi
lies outside the Hilbert Fock space.
Thus the domain of a^*(x) is just {0}.
However, the states |x_1:N> = |x_1,...,x_N> lie in the top
layer H^* of the right Gelfand triple = rigged Hilbert space.
This is the name for a triple H in Hbar in H^* of vector spaces,
where Hbar is a Hilbert space, H a dense 'nuclear' subspace
(containing very smooth states with very good behavior at infinity)
and H^* its dual space (containing among others very singular states
and states with very poor behavior at infinity). Observables (in the
weak sense) are bilinear forms, or, which is the same, linear mappings
from H to H^*. The adjoint of such a linear mapping is again an
observable in the weak sense. Annihilation operators a(x) (and their
adjoints a^*(x)) are observables in this weak sense, although they are
not Hermitian (and a fortiori not self-adjoint).

Most physicists have taken this lightly since the time of Dirac.
They don't bother about self-adjointness or any other functional
analytic concept, unless ignoring it brings them into trouble.
Almost everything they do in the nonrelativistic regime
can be made rigorous in the rigged Hilbert space, so they fare well
even when they imagine wrongly that they work in a Hilbert
space. Thus they get away with their bad practices.
What they call 'Hilbert space' _is_ in fact always a
rigged Hilbert space; although most of them just don't know and
don't care.

S5i. Why Feynman diagrams?
Feynman diagrams resemble processes with particles moving in space and
time, and are often figuratively treated as such. But in fact they
do _not_ describe such processes, but certain multiple integrals.
To emphasize this, the particles involved in Feynman diagrams are
called 'virtual particles'. (Still, many people think mistakenly
that virtual particles are somehow also real. See the entries about
virtual particles elsewhere in this FAQ.)
Although it is nowhere said explicitly, Feynman diagrams are just
a mnemonic for nicely picturing the composition of higher order tensors.
Create for each tensor of a theory a different vertex type, draw a
vertex of this type for each occurrence of this tensor in a product
expression in Einstein summation convention, and draw a line between
two such vertices whenever they share an index to be summed over.
The form of the lines defines the value of the coefficient function
in such a product, and the sum over Feynman diagrams simply means that
one considers a linear combination of these products, integrated over
the arguments. Thus this defines a generic representation of an
expansion of a function of the tensors of the theory.
Thus Feynman diagrams can be used whenever one expands a function of one
or more tensors into a linear combination of products of components of
these tensors.
Indeed, for this reason, they are also used in classical statistical
mechanics and in the analysis of stochastic differential equations
by functional integration techniques.
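The recipe 'one vertex per tensor, one line per summed index' can be
tried out on small arrays. The sketch below is only an illustration
with made-up data: the tensors V and P, their dimensions, and the
random entries are assumptions, and sums over finitely many index
values stand in for the integrals of a field theory.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 3
V = rng.normal(size=(dim, dim, dim))   # tensor with three indices
P = rng.normal(size=(dim, dim))        # tensor with two indices

# Diagram with three vertices (V, P, V) and two lines (a and b):
# the shared indices a, b are summed over, and the indices i, j, k, l
# remain free, like the external legs of a diagram.
diagram = np.einsum("ija,ab,bkl->ijkl", V, P, V)

# The same product expression written with explicit sums.
explicit = np.zeros((dim,) * 4)
for a in range(dim):
    for b in range(dim):
        explicit += V[:, :, a, None, None] * P[a, b] * V[None, None, b, :, :]

print(np.allclose(diagram, explicit))
```

The subscript string passed to einsum carries exactly the information
of the picture: each group of letters is a vertex, and each repeated
letter is a line joining two vertices.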

S6a. Nonperturbative computations in quantum field theory
There is a well-defined theory for computing contributions to the
S-matrix in quantum electrodynamics (and other renormalizable field
theories) by perturbation theory.
There is also much more which uses handwaving arguments and appeals
to analogy to compute approximations to nonperturbative effects.
Examples are:
- relating the Coulomb interaction and corrections to scattering
amplitudes and then using the nonrelativistic Schroedinger equation,
- computing Lamb shift contributions (now usually done in what is
called the NRQED expansion),
- Bethe-Salpeter and Schwinger-Dyson equations obtained by resumming
infinitely many diagrams.
The use of 'nonperturbative' and 'expansion' together sounds
paradoxical, but is common terminology in QFT. The term 'perturbative'
refers to results obtained directly from renormalized Feynman graph
evaluations. From such calculations, one can obtain certain information
(tree level interactions, form factors, self energies) that can be
used together with standard QM techniques to study nonperturbative
effects - generally assuming without clear demonstrations that this
transition to quantum mechanics is allowed.
Of course, although usually called 'nonperturbative', these techniques
also use approximations and expansions. The most conspicuous
high accuracy applications (e.g. the Lamb shift) are highly
nonperturbative. But on a rigorous level, so far only the perturbative
results (coefficients of the expansion in coupling constants) have any
Although the perturbation series in QED are believed to be asymptotic
only, one can get highly accurate approximations for quantities like the
Lamb shift. However, the Lamb shift is a nonperturbative
effect of QED. One uses an expansion in the fine structure
constant, in the ratio electron mass/proton mass, and in 1/c
(well, different methods differ somewhat). Starting e.g., with
Phys. Rev. Lett. 91, 113005 (2003)
one should be able to track the literature.
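That a series which is only asymptotic can still give accurate numbers
is easily seen in a toy example unrelated to QED (it is an assumption
chosen for illustration, as are all names in the code): the integral
F(g) = integral_0^inf dt exp(-t)/(1+g t) has the formal expansion
sum_n (-1)^n n! g^n, which diverges for every g>0.

```python
import math
import numpy as np

def exact_value(g, t_max=60.0, num_points=600001):
    """Trapezoidal approximation of F(g) = int_0^inf exp(-t)/(1+g t) dt.
    The tail beyond t_max is negligible because of the factor exp(-t)."""
    t = np.linspace(0.0, t_max, num_points)
    y = np.exp(-t) / (1.0 + g * t)
    dt = t[1] - t[0]
    return dt * (np.sum(y) - 0.5 * (y[0] + y[-1]))

def partial_sums(g, max_order):
    """Partial sums of the formal series sum_n (-1)^n n! g^n."""
    terms = [(-1) ** n * math.factorial(n) * g ** n
             for n in range(max_order + 1)]
    return np.cumsum(terms)

g = 0.1
errors = np.abs(partial_sums(g, 30) - exact_value(g))
best_order = int(np.argmin(errors))

print("best truncation order:", best_order)
print("error there:", errors[best_order])
print("error at order 30:", errors[30])
```

The error first decreases with the order, reaches a minimum at an order
of about 1/g, and then grows without bound. Truncating near the
smallest term gives a good approximation although the series has no
sum; smaller g gives a later optimal order and a smaller best error.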
Perturbative results are also often improved by partial summation of
infinite classes of related diagrams. This is a standard approach to
go some way towards a nonperturbative description. Of course, the
series diverges (in case of a bound state it _must_ diverge, already in
the simplest, nonrelativistic examples!), but the summation is done
on a formal level (as everything in QFT) and only the result
reinterpreted in a numerical way. In this way one can get
in the ladder approximation Schroedinger's equation, and in other
approximations Bethe-Salpeter equations, etc.
See Volume 1 of Weinberg's quantum field theory book.

S6b. The formal functional integral approach to QFT
On a purely formal level (i.e., with power series in place of actual
numbers), 4D QFT is very much alive and useful. It is now almost
always based upon functional integrals.
The path integral is discussed e.g., in Weinberg I, Chapter 9, or
Peskin/Schroeder, also Chapter 9. As one can see there, the
path integral formalism involves no operators at all, only classical
(commuting or anticommuting) fields.
The quantities obtained in the expansion of the path integral in
powers of hbar are time-ordered vacuum expectation values.
Since the original ordering in a time-ordered vacuum expectation value
is immaterial (apart from a sign for fermions), the same must be the
case for the path integral itself, which explains why the fields
in the path integral are classical (i.e., commute or anticommute
at all arguments).

The main strength of the path integral approach is precisely that
it avoids quantum operators and replaces all operator arguments by
averages over classical paths. (The main weakness is that this
averaging process is logically ill-defined.
There exists no prescription how the limit in the ``definition'' of
the path integral is to be taken to yield (in theory - independent
of the difficulty of computing them) numbers that have the properties
commonly ascribed to the path integral.)

The older canonical quantization approach was fraught with difficulties
because of inconsistencies in the operator approach.
For example, the canonical commutation rules (CCR) are
valid only in the free case, and no one knows how they should
be in the interacting case - though one knows that (anti)commutators
must still vanish at spacelike related arguments.
Moreover, the renormalization program plays havoc with operators.
Unfortunately, this means that dynamical issues and bound state
questions, which are comparatively easy to handle in an operator
framework, become almost intractable in the path integral approach.
However, as Weinberg stresses in his QFT book, an understanding of
the relation between path integral and canonical quantization is
essential to get the properties of the latter correct in cases like
the nonlinear sigma model.

S6c. Functional integrals, Wightman functions, and rigorous QFT
QFT assumes the existence of interacting (operator
distribution valued) fields Phi(x) with certain properties, which
imply the existence of distributions
W(x_1,...,x_n) = <0|Phi(x_1)...Phi(x_n)|0>.
But the right hand side makes no rigorous sense in traditional QFT
as found in most text books, except for free fields. Axiomatic QFT
therefore tries to construct the W's - called the Wightman functions -
directly such that they have the properties needed to get an S-matrix
(Haag-Ruelle theory), whose perturbative expansion
can be compared with the nonrigorous mainstream computations.
This can be done successfully for many 2D theories and for some 3D
theories, but not, so far, in the physically relevant case of 4D.
To construct something means to prove its existence as a mathematically
well-defined object. Usually this is done by giving a construction
as a sort of limit, and proving that the limit is well-defined.
(This is different from solving a theory, which means computing
numerical properties, often approximately, occasionally
- for simple problems - in closed analytic form.)
To compare it to something simpler: In mathematics one constructs the
Riemann integral of a continuous function over a finite interval by
some kind of limit, and later the solution of an initial value problem
for ordinary differential equations by using this and a fixed point
theorem. This shows that each (nice enough) initial value problem is
uniquely solvable. But it tells very little of its properties, and
in practice no one uses this construction to calculate anything.
But it is important as a mathematical tool since it shows that
calculus is logically consistent.
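The construction alluded to here is Picard iteration. As a minimal
sketch, assume the simplest initial value problem y'=y, y(0)=1 (this
choice and the function names are assumptions made for illustration).
The fixed point map sends y to 1 + integral_0^t y(s) ds, and on
polynomials it acts on the list of coefficients.

```python
import math

def picard_step(coeffs):
    """One Picard iteration for y' = y, y(0) = 1:
    y_new(t) = 1 + int_0^t y(s) ds, acting on polynomial coefficients
    (coeffs[k] multiplies t**k)."""
    return [1.0] + [c / (k + 1) for k, c in enumerate(coeffs)]

def evaluate(coeffs, t):
    """Evaluate sum_k coeffs[k] * t**k by Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * t + c
    return value

y = [1.0]                 # start from the constant function y_0(t) = 1
for _ in range(15):
    y = picard_step(y)

approximation = evaluate(y, 1.0)
print(approximation, math.exp(1.0))
```

The iterates are the Taylor polynomials of exp(t), so here the
construction happens to be usable for computation. For general
right hand sides it mainly serves to prove that a unique solution
exists, which is the point made in the text.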
Such a logical consistency proof for any 4D interacting QFT is presently
still missing. Since logical consistency of a theory is important,
the first person who finds such a proof will become famous - it means
inventing new conceptual tools that can handle this currently
intractable problem.

Wightman functions are the moments of a linear functional on
some algebra generated by field operators, and just as linear
functionals on ordinary function spaces are treated in terms of
Lebesgue integration theory (and its generalization), so Wightman
linear functionals are naturally treated by functional integration.
The 'only' problem is that the latter behaves much more poorly from
a rigorous point of view than ordinary integration.
Wightman functions are the moments <Phi(x_1)...Phi(x_n)> of a positive
state < . > on noncommutative polynomials in the quantum field Phi,
while time-ordered correlation functions are the moments
<Phi(x_1)...Phi(x_n)> of a complex measure < . > on commutative
polynomials in the classical field Phi.
In both cases, we have a linear functional, and the linearity gives
rise to an interpretation in terms of a functional integral.
The exponential kernel in Feynman's path integral formula for the
time-ordered correlation functions comes from the analogy between
(analytically continued) QFT and statistical mechanics,
and the Wightman functions can also be described in a similar analogy,
though noncommutativity complicates matters. The main formal reason for
this is that a Wick theorem holds both in the commutative and the
noncommutative case.
For rigorous quantum field theory one essentially avoids the
path integral, because it is difficult to give it a rigorous
meaning when the action is not quadratic. Instead, one only keeps
the notion that an integral is a linear functional, and
constructs rigorously useful linear functionals on the relevant
algebras of functions or operators. In particular, one can define
Gaussian functionals (e.g., using the Wick theorem as
a definition, or via coherent states); these correspond exactly
to path integrals with a quadratic action.
If one looks at a Gaussian functional as a functional on the
algebra of fields appearing in the action (without derivatives
of fields), one gets - after time-ordering the fields - the
traditional path integral view and the time-ordered correlation
functions.
If one looks at it as a functional on the bigger algebra of
fields and their derivatives, one gets - after rewriting the
fields in terms of creation and annihilation operators - the
canonical quantum field theory view with Wightman functions.
The algebra is generated by the operators a(f) and a^*(f),
where f has compact support, but normally ordered
expressions of the form
S = integral dx : L(Phi(x), Nabla Phi(x)) :
make sense weakly (i.e., as quadratic forms).
The art and difficulty is to find well-defined functionals
that formally match the properties of the functionals 'defined'
loosely in terms of path integrals.
This requires a lot of functional analysis,
and has been successfully done only in dimensions d<4.
For an overview, see:
A.S. Wightman,
Hilbert's sixth problem:
Mathematical treatment of the axioms of physics,
in: Mathematical Developments Arising From Hilbert Problems,
edited by F. Browder,
(American Mathematical Society, Providence, R.I.) 1976, pp.147-240.

S6d. Is there a rigorous interacting QFT in 4 dimensions?
The Wightman axioms and the Osterwalder-Schrader axioms
[see, e.g., math-ph/0001010 or the book by Glimm and Jaffe]
are currently the basis on which rigorous quantum field theory
(at least for massive particles) is discussed.
In spite of many attempts (and though numerous uncontrolled
approximations are routinely computed), no one has so far succeeded
in rigorously constructing a single QFT in 4D which
has nontrivial scattering. Not even QED is a mathematical object,
although it is the theory that was able to reproduce experiments
(anomalous magnetic moment of the electron; see the entry
''Is QED consistent'' in this FAQ) with an accuracy of 1 in 10^12.
But till today no one knows how to formulate the
theory in such a way that the relevant objects whose approximations
are calculated and compared with experiment are logically well-defined.
See, e.g., the S.P.R. threads
This probably explains the high price tag of 1.000.000 US dollars,
promised for a solution to one of the Clay millennium problems,
that asks to find a valid
construction for d=4 quantum Yang-Mills theories that is strong
enough to prove correlation inequalities corresponding to the
existence of a mass gap. The problem is to explain rigorously
why the mass spectrum for compact Yang-Mills QFT begins at a positive
mass, while the classical version has a continuous spectrum
beginning at 0.
The mass gap is a property of the theory, not of a wave function.
Intuitively, it means that, in the rest frame of the total system,
the ground state (=vacuum) is an isolated eigenstate of the
Hamiltonian H, i.e., that the spectrum of H is a subset of
{0} union [E_1,inf). The largest E_1 with this property defines
the mass gap m_1=E_1/c^2.
This would make proper sense for a nonrelativistic theory.
For a relativistic theory one has to read between the lines and
interpret everything in terms of suitable analogies,
for lack of a consistent mathematical theory.
The millennium problem essentially asks for a rigorous mathematical
setting in which the above can be made precise and proved.
The real problem is the rigorous construction of a Hilbert space with
a unitary representation of the Poincare group, such that a
perturbation argument recovers the traditional renormalized order by
order approximation of quantum field theory.

The state of the art at the time the problem was crowned by
a prize is given in
and the references quoted there. See also
I don't think significant progress has been published since then.
(The paper hep-th/0511173 which claims to have solved the problem
only consists of a bunch of heuristic arguments. That the author calls
it a proof doesn't turn it into a mathematical proof.)
Yang-Mills theories are (perhaps erroneously) believed
to be the simplest (hopefully) tractable case,
being asymptotically complete while not having the
extra difficulties associated with matter fields.
(There are only gluons, no quarks or leptons.)
Of course, one would like to show rigorously that QED is consistent.
But QED has certain problems (the Landau pole, see below) that are
absent in so-called asymptotically free theories, of which
Yang-Mills is the simplest.

Note that rigorous interacting relativistic theories in 2D and 3D exist;
see, e.g.,
J. Glimm and A. Jaffe,
Quantum Physics: A Functional Integral Point of View,
Springer, Berlin 1987.
This book is quite difficult on first reading.
Volume 3 of Thirring's Course in Mathematical Physics
(which only deals with nonrelativistic QM but in a reasonably
rigorous way) might be a good preparation to the functional analysis
needed. A more leisurely introduction of the physical side of the
matter is in
Elcio Abdalla, M. Christina Abdalla, Klaus D. Rothe
Non-Perturbative Methods in 2 Dimensional Quantum Field Theory
World Scientific, 1991, revised 2nd. ed. 2001.
The book is about rigorous results, with a focus on solvable models.
Note that 'solvable' means in this context 'being able to
find a closed analytic expression for all S-matrix elements'.
These solvable models are to QFT what the hydrogen atom is to
quantum mechanics. The helium atom is no longer 'solvable' in the
present sense, though of course very accurate approximate calculations
are possible.
Unfortunately, solvable models appear to be restricted to 2 dimensions.
The deeper reason for the observation that dimension d=2 is special
seems to be that in 2D the light cone is just a pair of lines.
Thus space and time look completely alike, and by a change of variables
(light front quantization), one can disentangle things nicely
and find a good Hamiltonian description.
This is no longer the case in higher dimensions. (But 4D light front
quantization, using a tangent plane to the light cone, is well alive
as an approximate technique, e.g., to get numerical results from QCD.)
Thus, while 2D solvable models pave the way to get some rigorous
understanding of the concepts, they are no substitute for the
functional analytic techniques needed to handle the non-solvable
models such as Phi^4 theory.

S6e. Constructive field theory
Rigorously defined Lorentz-covariant quantum field theories are known
to exist in 2 and 3 dimensions; the standard reference (for d=2)
is the book by
J. Glimm and A. Jaffe,
Quantum physics. A functional integral point of view
New York, 1981
A recent review of the achievements of constructive
quantum field theory in dimensions < 4 is
V. Rivasseau
Constructive Field Theory and Applications:
Perspectives and Open Problems,
J. Math. Phys. 41 (2000), 3764-3775.
The case d=4 is a famous unsolved problem; the special case of 4D
quantum Yang-Mills gauge theory with a compact simple, nonabelian
gauge group is one of the Clay Millennium problems with a 1 million
Dollar prize attached to its solution.

Let me explain some aspects of the construction given in
Glimm and Jaffe.
First one needs to understand that the construction breaks the Lorentz
symmetry. This is (although they don't draw this connection) because in
irreducible Poincare representations, one can construct only three
commuting coordinates, and their construction is observer-dependent,
i.e., dependent on singling out a preferred time. Of course, the final
theory is again Lorentz invariant.
To motivate the construction, one therefore needs to choose a time
coordinate, then one makes analytical continuation to Euclidean time
(i.e. it in place of t), and shows that one gets an SO(4) symmetric
field theory in place of the Lorentz symmetry. The advantage gained is
that the functional calculus over a space with definite metric is
well-defined mathematically (via a limit approach through lattices, or
via Wiener measures) - this is just classical stochastic calculus.
Conversely, and this is the constructive part, given an SO(4) symmetric
field theory, one can choose a direction as Euclidean time and obtain
(via a fairly simple construction detailed in Chapter 7) within that
theory a well-defined Hamiltonian on a suitably constructed Hilbert
space of 3-dimensional fields. This Hamiltonian defines a time
evolution as in ordinary quantum mechanics. The nontrivial part (which
is the Osterwalder-Schrader reconstruction theorem stated in Chapter 7
but proved much later in the book - the forward references in Glimm
and Jaffe are, unfortunately, quite confusing) is to show that the
resulting theory is Lorentz invariant.
Thus the construction reduces to constructing the Euclidean field
theory. This is done via a lattice regularization; indeed, all lattice
field theory and computation is based on the Euclidean formulation
rather than the Minkowski formulation.
In 2D and 3D, the existing analytic error estimation techniques are
sufficient to prove the existence of the limit with suitably
renormalized operators. In 4D, there are additional technical
problems that have not been overcome so far. But neither has it been
proved that any of the 4D field theories cannot exist. There are some
informal arguments suggesting this or that, but none of them is
conclusive in the sense of having paved the way towards a construction
or a no-go theorem.

S6f. The classical limit in relativistic QFT
The classical limit of a quantum field theory is the
theory defined by taking the Lagrangian occurring in the functional
formalism and making the corresponding action stationary.
Note that a functional integral is an integral in which all
fields have classical meaning. The quantum interpretation comes
from taking the functional integral as a generating functional for
S-matrix elements, while the classical interpretation comes from
taking a saddle point approximation. Since the k loop contributions
scale with hbar^k, they disappear in the classical limit hbar to 0,
so only the tree diagrams are left in the expansion, which correspond
to the saddle point approximation in the functional integral.
This needs a slight qualification for fermions, e.g., electrons.
A fermion field Psi(x) itself, being an anticommuting field,
has no direct classical meaning, but has the numerical advantage
that it is a field in 3 instead of 6 variables. Products of two
Psi terms commute with each other, hence have a direct classical
interpretation. Indeed, classically there is an electron density
field W(x,p) given by the Wigner transform of Psi(x)Psi(y)^*,
where Psi(x) is the classical Grassmann field occurring in the
Lagrangian, satisfying a Dirac equation with an electromagnetic
interaction added. This field W(x,p) is measurable and plays a role
in semiconductor modeling. (In the definition of the Wigner transform,
a second hbar appears, a remnant of second quantization. If one
moves this to zero, too, the description in terms of Psi is no longer
possible, and one gets instead a Vlasov equation for W.)
Thus the classical limit of the standard model is a mathematically
well-defined theory, while the quantum version is only perturbatively
defined, which means, it is mathematically undefined - even for QED.
Nevertheless, the renormalization prescription makes at least the
coefficients of the asymptotic series in hbar well-defined, which is
what particle physicists use to extract approximate physical
predictions.
In this relaxed sense, the quantum standard model is also well-defined.
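The saddle point statement made earlier in this entry can be illustrated
numerically in a zero-dimensional caricature, where the 'functional
integral' is an ordinary integral Z(hbar) = integral dx exp(-S(x)/hbar).
The following is only a sketch under stated assumptions: the toy action
S, its minimum at x=1, and the Euclidean sign convention are invented for
the illustration and are not taken from any field theory. It shows that
-hbar log Z(hbar) approaches the stationary value of S as hbar goes to 0.

```python
import math

def action(x):
    # Hypothetical toy action with a single minimum S = 0.3 at x = 1.
    y = x - 1.0
    return 0.3 + 0.5 * y * y + 0.25 * y ** 4

def minus_hbar_log_z(hbar, lo=-9.0, hi=11.0, n=200001):
    """Return -hbar * log(integral of exp(-S(x)/hbar) dx), trapezoidal rule."""
    h = (hi - lo) / (n - 1)
    exponents = [-action(lo + i * h) / hbar for i in range(n)]
    top = max(exponents)  # shift by the largest exponent to avoid underflow
    total = sum(math.exp(a - top) for a in exponents)
    # Trapezoidal rule: the two end points carry weight 1/2.
    total -= 0.5 * (math.exp(exponents[0] - top) + math.exp(exponents[-1] - top))
    return -hbar * (top + math.log(total * h))

saddle_value = action(1.0)
for hbar in (1.0, 0.1, 0.01, 0.001):
    # The second column approaches the third as hbar decreases.
    print(hbar, minus_hbar_log_z(hbar), saddle_value)
```

The remaining difference at small hbar is the analogue of the loop
corrections, which vanish in the limit.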

S6g. What are interpolating fields?
Traditional QFT has rules for computing reasonable approximations
to the S-matrix of a field theory. The S-matrix describes the behavior
of a state of the system under a transition from time t=-inf to time
t=+inf. But in a complete dynamical theory, one would like to be able
to know what happens in-between at finite times. In nonrelativistic QM,
this information is given by the Schroedinger equation. In QFT it is
given by the interpolating field - called interpolating since it
interpolates between the infinite limiting times.
More precisely, the dynamical information about the interpolating
field is represented mathematically in the Wightman functions,
which give the (renormalized) vacuum expectations of field products
at arbitrary combinations of space-time points.
Unfortunately, no one knows how to compute the latter in relativistic
4D quantum field theories. However, Wightman functions have been
constructed rigorously in lower dimension (more precisely
in certain superrenormalizable theories in 2 and 3 dimensions).

S6h. Hilbert space and Hamiltonian in relativistic quantum field theory
Most of current quantum field theory (i.e., everything with exception
of 2D and 3D constructive field theory - which doesn't even cover QED)
does not have a well-defined Hilbert space at all, in which a
time operator would be defined.
Well-defined are only the asymptotic Hilbert spaces of in and out
states for scattering experiments. These are Fock spaces of
free particles, and hence defined on a mass shell.
There is a basic result called Haag's theorem which states that
these asymptotic Fock spaces cannot carry a nontrivial local dynamics,
as would be required for a field theory.
The full dynamics can be defined only indirectly, via CTP (closed
time path) integration, and subject to all interpretation problems
of the renormalization procedures.

Constructing for a relativistic field theory a physical
Hamiltonian which is bounded below is really difficult, and has
been achieved only in less than 4D theories.
The construction is usually based on a preferred time coordinate
which is needed in all cases I am familiar with:
- in the Foldy-Wouthuysen transformation (for the Dirac equation,
where p_0 also fails to have the right properties),
- in the Newton-Wigner construction (for single particles in
an arbitrary massive irreducible representation of the
Poincare group) and
- in the Osterwalder-Schrader reconstruction theorem (for
Lorentz-invariant field theories from Euclidean field theories).
While the Hilbert space and the Hamiltonian depend on the choice of
the time coordinate, the physics is independent of it since all these
Hilbert spaces are isomorphic via isomorphisms that map the
Hamiltonians into each other.

S6i. 2-dimensional quantum field theory
Much of the state of the art in 2-dimensional relativistic quantum
field theories is covered in two books,
Elcio Abdalla, M. Christina Abdalla, Klaus D. Rothe
Non-Perturbative Methods in 2 Dimensional Quantum Field Theory
World Scientific, 1991, revised 2nd. ed. 2001.
J. Glimm and A. Jaffe,
Quantum Physics: A Functional Integral Point of View,
Springer, Berlin 1987.
The first book treats exactly solvable theories, the second book
treats general polynomial interactions. The methods are completely
different in the two cases, and the two books are essentially disjoint.
Unfortunately, both books are somewhat difficult to read.
Abdalla et al. treat those (very special) 2-dimensional quantum field
theories having closed analytic expressions for all S-matrix elements.
These solvable models are to 2-dimensional quantum field theory what
the hydrogen atom is to quantum mechanics. The book gives lots of details
about many solvable models, but I found it too specialized to give me
a feeling of general 2-dimensional quantum field theory.
Glimm and Jaffe assume a lot of measure theory and functional analysis.
This is summarized in Appendix A of their Part I, but working first
through Volume 3 of Thirring's Course in Mathematical Physics (which
only deals with nonrelativistic QM but in a reasonably rigorous way)
would be a good preparation for tackling Glimm and Jaffe.
They construct - rigorously - for 2-dimensional relativistic
Lagrangian scalar field theories with polynomial interaction a Hilbert
space, a well-defined Hamiltonian, a well-defined unitary dynamics,
with well-defined bound states that are eigenstates of the Hamiltonian,
and everything is invariant under the 2D Poincare group ISO(1,1).
Chapter 3 defines a rigorous version of the path integral for ordinary
quantum mechanics, or rather for the Euclidean version of it, with the
i in the Schroedinger equation dropped. This amounts to analytic
continuation to imaginary time, where everything is easy and
respectable. In place of a hyperbolic differential equation one gets
a parabolic one (the heat equation), which makes things tractable
since the heat kernel is positive and hence the measures needed to
make the path integral rigorous are positive Wiener measures, with a
good rigorous theory.
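The role of the positive heat kernel can be made concrete in a small
numerical sketch. It is not the construction of Glimm and Jaffe, only an
illustration under stated assumptions: a harmonic oscillator
H = p^2/2 + x^2/2 in units hbar=m=omega=1, a hypothetical grid on
[-6,6], and a symmetric short-time approximation of exp(-eps H). All
kernel entries are positive, and the largest eigenvalue, found by power
iteration, gives an estimate of the ground state energy (exactly 1/2
for this oscillator).

```python
import math

def ground_energy(eps=0.1, half_width=6.0, dx=0.1, sweeps=120):
    """Estimate E_0 of H = p^2/2 + x^2/2 from the Euclidean short-time kernel."""
    n = int(round(2 * half_width / dx)) + 1
    xs = [-half_width + i * dx for i in range(n)]
    pot = [0.5 * x * x for x in xs]
    norm = dx / math.sqrt(2.0 * math.pi * eps)
    # Heat kernel times symmetric potential factors (all entries are >= 0).
    kernel = [[norm * math.exp(-(xs[i] - xs[j]) ** 2 / (2.0 * eps)
                               - 0.5 * eps * (pot[i] + pot[j]))
               for j in range(n)] for i in range(n)]
    vec = [1.0 / math.sqrt(n)] * n  # unit start vector with positive entries
    lam = 1.0
    for _ in range(sweeps):
        new = [sum(row[j] * vec[j] for j in range(n)) for row in kernel]
        lam = math.sqrt(sum(v * v for v in new))  # tends to the top eigenvalue
        vec = [v / lam for v in new]
    # The top eigenvalue approximates exp(-eps * E_0).
    return -math.log(lam) / eps

print(ground_energy())
```

The estimate differs from 1/2 by a discretization error that shrinks
with eps; in imaginary time no oscillatory integrals appear at any stage.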
Quantum field theory starts in Chapter 6. It is presented in a
Euclidean and a Minkowski version, the former being an analytic
continuation of the latter. Both versions are defined axiomatically,
by the Osterwalder-Schrader axioms and the Wightman axioms,
respectively. Again, the Euclidean version is the tractable one,
in which one can generalize the path integral and perform the
estimates needed for proving the existence of all the tools.
The Osterwalder-Schrader theory then guarantees that, given the
satisfaction of the Euclidean axioms, analytic continuation to
the Minkowski case is indeed possible. This is outlined in Section 6.1;
the remainder of the chapter discusses the (easy) special case of
free fields.
Chapters 7-12 and 19 then define the machinery needed to show how
to satisfy the axioms in the case of 2-dimensional relativistic
Lagrangian scalar field theories with polynomial interaction.
Chapter 7 discusses the Gaussian measures that define the Euclidean
path integral of free fields, Chapter 8 presents a rigorous theory of
perturbation theory for Euclidean path integrals, and the remaining
chapters mentioned provide the estimates needed to make sure that
everything works.

S7a. What is the mass gap?
In a relativistic theory, whenever there is a state with definite
4-momentum p, there is also one with definite momentum p' = Lambda p
obtained by applying a Lorentz transform Lambda. The orbit of
4-momenta obtained in this way forms a hyperboloid in the future
cone (because of causality), characterized by a mass m>=0:
p^2=m^2, p_0>0.
This includes as a limiting case massless states with m=0,
where the orbit consists of the future light cone with 0 excluded.
Therefore the possible values of p are characterized by the possible
values of m, which defines the mass spectrum of the theory. The mass
spectrum is the relativistic analogue of the energy spectrum of the
Hamiltonian in a nonrelativistic theory, shifted such that the ground
state has E=0.
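That the orbit of a 4-momentum under Lorentz transformations stays on a
single mass hyperboloid can be checked with a few lines of code. This is
a minimal sketch, assuming units with c=1, the metric signature
(+,-,-,-), and boosts along the first spatial axis only; the function
names are made up for the illustration.

```python
import math

def minkowski_square(p):
    """p^2 = p_0^2 minus the squared spatial part, for p = (p_0, p_1, p_2, p_3)."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

def boost_x(p, rapidity):
    """Apply a Lorentz boost along the first spatial axis."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return (ch * p[0] + sh * p[1], sh * p[0] + ch * p[1], p[2], p[3])

m = 1.5
p_rest = (m, 0.0, 0.0, 0.0)  # momentum of a state of mass m at rest
for eta in (-2.0, 0.0, 0.5, 3.0):
    p = boost_x(p_rest, eta)
    # p_0 stays positive and p^2 stays equal to m^2 up to rounding.
    print(eta, p, minkowski_square(p))
```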

The only state with zero momentum is the ground state, usually called
the vacuum. If the values of p^2 for the realizable nonzero p are
bounded below by a positive number, the theory is said to have a mass
gap. The largest value of m>0 for which m^2 is such a lower bound
defines the precise value of the mass gap. Usually there is a state
for which p^2=m^2; this is then interpreted as the state of a
single 'dressed' particle.
In general, the mass spectrum consists of a discrete and a continuous
part. The discrete part of the spectrum corresponds to bound states,
the continuous part to scattering states.
The continuous spectrum starts when there is the possibility of
scattering, which means that the energy is large enough that two
asymptotically independent systems can exist. Given a state of mass
m, one expects to have states with two almost independent systems of
mass m and an arbitrary relative momentum, giving a continuous
spectrum of scattering states with all possible squared momenta
exceeding (2m)^2, as a simple calculation reveals:
If p is the sum of two timelike vectors p1,p2 of mass m then
p^2 = (sqrt(\p1^2+m^2)+sqrt(\p2^2+m^2))^2 - (\p1+\p2)^2
= 2m^2 + 2 sqrt((\p1^2+m^2)(\p2^2+m^2)) -2\p1 dot \p2
By making \p2=-\p1 one gets arbitrarily large values of p^2, hence
part of the continuous spectrum. The minimum of p^2 must occur by
Cauchy/Schwarz for \p2=\p1, and is then (2m)^2, independent of the
spatial momentum.
Thus the continuous spectrum extends from mass 2m to infinity,
where m is the mass gap.
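The threshold (2m)^2 can also be checked numerically. The following
sketch samples random spatial momenta (the sampling range, the seed and
the value m=1 are arbitrary assumptions) and evaluates p^2 for the sum
of two on-shell momenta of mass m, in units with c=1.

```python
import math
import random

def invariant_mass_squared(p1, p2, m):
    """p^2 for the sum of two on-shell momenta with spatial parts p1, p2."""
    e1 = math.sqrt(sum(c * c for c in p1) + m * m)
    e2 = math.sqrt(sum(c * c for c in p2) + m * m)
    total = [a + b for a, b in zip(p1, p2)]
    return (e1 + e2) ** 2 - sum(c * c for c in total)

m = 1.0
random.seed(0)
samples = [([random.uniform(-5, 5) for _ in range(3)],
            [random.uniform(-5, 5) for _ in range(3)]) for _ in range(10000)]
lowest = min(invariant_mass_squared(a, b, m) for a, b in samples)

print(lowest, (2 * m) ** 2)  # no sample falls below the threshold (2m)^2
print(invariant_mass_squared([2, 0, 0], [2, 0, 0], m))   # equal momenta: threshold
print(invariant_mass_squared([2, 0, 0], [-2, 0, 0], m))  # opposite momenta: larger
```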
There may be bound states with mass m_b<2m, forming the discrete
spectrum. These are not scattering states, hence not obtained by
simply adding momenta. For bound states of k particles with masses
m_1,...,m_k, one needs to subtract from (m_1+...+m_k)c^2 the binding
energy of the bound particles. There might be bound states
with mass m_b>2m embedded in the continuous spectrum, but these are
possible only if there are selection rules that forbid the decay into
particles with smaller mass.
In particular, the state of minimal mass m, if it exists, is always
a bound state (including the case of a single particle).

If there is no mass gap, one expects massless dressed particles
to be present. This corresponds to the limiting case m --> 0 of the
above discussion.

S7b. Why can a bound state of massless quarks be heavy?
A system has a well-defined mass if it is in an eigenstate of p^2,
where p is the total momentum operator (whatever this is;
relativistically, bound states are very poorly understood).
So to understand, view it from a nonrelativistic perspective.
Because of E=mc^2, the mass shows up as energy, i.e., as eigenvalue
of the Hamiltonian.
Now a bound state at rest defines the rest energy, and by giving
it uniform motion one can increase the energy by an arbitrary amount
of kinetic energy. The rest energy (and hence the rest mass), on the
other hand, is determined by the discrete spectrum of the Hamiltonian
in reduced coordinates, i.e., with center of mass motion separated out.
For forces that decay with distance, a bound state necessarily has
a mass that is less than the sum of the masses of the constituents.
For particles involving quarks, this does not apply since the strong
force increases with distance. Hence the rest mass of a bound state of
quarks could be anything.

S7c. Bound states in relativistic quantum field theory
Bound states are supposed to be poles of the S-matrix, and
Bethe-Salpeter equations for the bound state dynamics can be
obtained approximately from resumming infinite families of
Feynman diagrams. See Chapter 14 of Weinberg's QFT I. But...
Perturbative QED (even in Scharf's rigorous treatment)
has nothing at all to say about how to model bound states.
Bound states don't exist perturbatively: The poles in the S-matrix
can arise only by summing infinitely many Feynman diagrams.
(Sum the geometric series 1+x+x^2+... to see how poles arise by
summation.)
I haven't seen a single rigorous treatment of such an issue in
quantum field theory.
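The geometric series makes the mechanism explicit: every partial sum is
a polynomial and hence finite for all x, while the full sum 1/(1-x)
(valid for |x|<1) has a pole at x=1. A minimal sketch, with function
names chosen only for this illustration:

```python
def partial_sum(x, order):
    """1 + x + ... + x^order: a polynomial, finite for every x."""
    return sum(x ** k for k in range(order + 1))

def resummed(x):
    """Closed form 1/(1-x) of the full series for |x| < 1; pole at x = 1."""
    return 1.0 / (1.0 - x)

for x in (0.5, 0.9, 0.99):
    # Any fixed order stays bounded as x approaches 1; the resummed form blows up.
    print(x, [partial_sum(x, n) for n in (2, 10, 1000)], resummed(x))
```

In the same way, no finite set of Feynman diagrams can produce a pole
that is present only in the sum of infinitely many of them.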
Weinberg states in his QFT book (Vol. I) repeatedly that bound state
problems (and this includes the Lamb shift) are still very poorly
understood (though the Lamb shift is one of the most accurately
predicted physical quantities). On p.564 he says,
'These problems are those involving bound states [...]
such problems necessarily involve a breakdown of ordinary
perturbation theory. [...] The pole therefore can only arise
from a divergence of the sum of all diagrams [...]'
On p.560, he writes,
'It must be said that the theory of relativistic effects
and radiative corrections in bound states is not yet in an
entirely satisfactory shape.'
This remark suggests that he seems to think that, in contrast,
for scattering problems, the theory is in an entirely satisfactory
state, as given in the rest of his book. Thus 'satisfactory'
does not mean 'mathematically rigorous', but only
'well understood from a physical, approximate point of view'.
There are, of course, methods for approximating bound state problems,
based on Bethe-Salpeter equations, Schwinger-Dyson equations, and
some other approaches. See, e.g., the review
H. Grotch and D.A. Owen,
Foundations of Physics 32 (2002), 1419-1457.
or hep-ph/0308280.
But all of this is done in completely uncontrolled approximations,
and to get numerically consistent results is currently more an
art than a science.
This leaves plenty of scope for interesting (but hard)
new work on bound states on both the physical and mathematical side.

S8a. Why renormalization?
Quantum field theory is what particle physicists define it to be, and
this includes many working interacting QFTs. But it is not a theory
in the mathematical sense. This is due to the freedom they take
when discussing the renormalization needed to remove formal
infinities from their theories.

Finite renormalization just refers to the fact that the coefficients
in a Hamiltonian are not directly measurable but only computable as
functions of some key observables. It is simply a consequence of the
historical accident that these coefficients were given names (masses,
charges) that sound like real properties, while they are in fact
indirectly related to them.
Thus in solid state physics one gets bare masses of quasiparticles
from the coefficients of a Hamiltonian, but they are just parameters
and related to the measurable masses by some transformation, which is
dubbed the finite renormalization.
Infinite renormalization is needed in ordinary QM when the potential
gets too singular, for example with delta-function potentials that
model contact interactions. This is hardly ever discussed in textbooks
but is important for understanding. See, e.g., hep-th/9710061, or
Chapter I.3 of
R. Jackiw,
Diverse topics in theoretical and mathematical physics,
World Scientific, Singapore 1995.
A paper by Dimock (Comm. Math. Phys. 57 (1977), 51-66) shows rigorously
that, at least in 2 dimensions, delta-function potentials define
the correct nonrelativistic limit of local scalar field theories.
In mathematical terms, infinite renormalization means that the
interaction is a limit of regularized interactions related to fixed
measurable quantities by finite transformations which, however,
diverge when the regularization is removed. The limiting interaction
remains, however, well-defined as a densely defined operator in
Hilbert space.
For exactly the same reason it is needed in relativistic QFT, since
local fields imply singular interactions. But in 4 dimensions, the
limiting process is not well understood mathematically.
In 1+1 dimensions, everything is well-defined mathematically
in terms of rigorous renormalization theory, for arbitrary polynomial
interactions. (See the book by Glimm and Jaffe).
The 1+2-dimensional case is significantly more difficult and needs
a restriction on the polynomial degree. There is a nontrivial
renormalization theory for Phi^4 theory, which is mathematically
well-defined.
Only the 1+3 dimensional case is at present completely open.

What is loosely called 'infinite' in traditional discussions of
renormalization means, strictly speaking, only that the limit where
a cutoff goes to infinity does not exist. At any finite value of the
cutoff, both the Hamiltonian and the counterterms are finite.
If it were not so, one couldn't do renormalization and get something
finite. The problem solved by Tomonaga, Schwinger and Feynman,
for which they got the Nobel prize, was that they discovered how to
produce a well-defined limiting theory for cutoff to infinity
which allows one to extract finite values for quantities that can be
compared with experiment.
All renormalization until today follows the same pattern.
One does certain formal computations at
finite cutoff and, at some point where it no longer does harm,
moves the cutoff to infinity, being left with approximate
formulas at some (fixed or variable) loop order which no
longer contain a cutoff and have finite values.

S8b. Renormalization without infinities I
Renormalization in QFT is often associated with the need to handle
infinities. This makes everything look like nonsense from a
mathematical point of view. But this is just the sloppiness of
physicists; it is not difficult to get a satisfying view of
renormalization without encountering any weird infinities.
The basic principles can be explained without knowing anything about
quantum mechanics, since renormalization is a much more general
phenomenon associated with idealizations in a theory and the
corresponding limits. As such it is also needed in various classical
situations (classical point electrons, turbulence, etc.)
hep-th/0212049 is a nice paper discussing most of renormalization
without ever mentioning fields (which come in quite late).
In all cases, we want to describe a situation which is a limit of more
complex and often less symmetric situations. This limit is the only
problematic thing, and sometimes generates infinities if done in an
improper way. Just as when trying to compute
s_N = sum_{k=0:N} (-1)^k/(k+1)^s = u_N - v_N
by summing the even and odd contributions u_N and v_N separately.
The limit N to inf is well-defined for s>0, but can be obtained only
for s>1 by going to the limit in u_N and v_N separately.
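For s=1 this can be seen numerically. The sketch below (function name
and the chosen values of N are arbitrary) computes u_N and v_N
separately: both grow without bound, while their difference s_N settles
down to log 2.

```python
import math

def split_sums(n_terms, s):
    """Return (u_N, v_N) with s_N = sum_{k=0..N} (-1)^k/(k+1)^s = u_N - v_N."""
    u = sum(1.0 / (k + 1) ** s for k in range(0, n_terms + 1, 2))  # even k
    v = sum(1.0 / (k + 1) ** s for k in range(1, n_terms + 1, 2))  # odd k
    return u, v

for n in (10, 1000, 100000):
    u, v = split_sums(n, 1.0)
    # u and v keep growing (logarithmically); u - v approaches log(2).
    print(n, u, v, u - v, math.log(2))
```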
One needs to proceed similarly as in techniques to evaluate limits which
give naively inf-inf, by using some transformation that cancels the
infinities analytically. Example:
lim sqrt(n^2+n)-sqrt(n^2+1)
= lim ((n^2+n)-(n^2+1))/(sqrt(n^2+n)+sqrt(n^2+1))
= lim (n-1)/(sqrt(n^2+n)+sqrt(n^2+1)) = 1/2.
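The same example can be tried in floating point arithmetic. The sketch
below compares the naive difference of two large square roots with the
rationalized form; the function names and the sample values of n are
chosen only for illustration. Both agree for moderate n, but only the
rationalized form avoids subtracting two nearly equal large numbers.

```python
import math

def naive(n):
    """Difference of two large numbers, prone to cancellation for big n."""
    return math.sqrt(n * n + n) - math.sqrt(n * n + 1)

def rationalized(n):
    """Analytically identical expression with the large terms cancelled."""
    return (n - 1) / (math.sqrt(n * n + n) + math.sqrt(n * n + 1))

for n in (10.0, 1e4, 1e8, 1e12):
    # Both columns tend to 1/2; the second does so without cancellation.
    print(n, naive(n), rationalized(n))
```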

In quantum physics, the data (the Hamiltonian in QM, the action in QFT)
depends on some parameter vector v of dimension d, say, without direct
physical meaning. For example, v may consist of bare mass,
bare charge, and bare coupling constant.
Without the renormalization conditions we get a family of solutions
parameterized by v from which we can compute measurable quantities
combined into a vector q=q_N(v) of some dimension e>d,
where N is the parameter in which we want to take the limit.
(N might be an energy cutoff at energies beyond observability, and q the
observed particle spectrum.)
Anything we can reliably measure must clearly be essentially independent
of N, once N is large enough. Therefore the equation q=q_N(v) defines a
(generically) d-dimensional manifold in R^e whose limit as a set is also
a well-defined d-dimensional manifold. This is the manifold of interest,
since picking a particular finite value for N is usually subjective.
In a theory with finite renormalization, this limit manifold can still
be parameterized by v, since the limit
q(v)= lim_{N to inf} q_N(v) (*)
exists. Although v is unobservable it can be calculated from the
measurements by solving the equation q=q(v) in the least squares sense.
Rather than doing that (which would be numerically best in case the
measurements are inexact or q(v) is not exactly known) one proceeds
in theoretical work as if a d-dimensional vector mu of key physical
data and a corresponding subset of d equations were known exactly,
and can be solved exactly for v=v(mu).
Then one gets a renormalized parameterization
q=q_ren(mu), with q_ren(mu)= q(v(mu)), (**)
expressing everything in terms of the physical parameters mu.
When the limit (*) does not exist, the situation is more complicated.
Since there is no limiting q, one has to work at finite N. Proceeding
as before, one solves d of the equations in q=q_N(v) for v, getting
v=v_N(mu), but since the limit (*) does not exist, there will also be no
v(mu) = lim_{N to inf} v_N(mu)
which would enable the use of (**). Instead, v_N(mu) diverges.
Loosely speaking, we get infinite bare masses and bare coupling
constants. But this limit will never be used, hence there are no
problems. It is just the loose way of speaking that creates the
impression of weirdness. The 'infinities' are caused by the nature
of the interactions. If they are too singular for a standard treatment
then the limits needed for a finite renormalization simply do not
exist anymore.
But this does not mean that the theory becomes meaningless but only
that one has to be careful in performing the limit only where it is
allowed to do so. This requires a small change in our procedure.
At finite N, we can still define a renormalized
q = q_{N,ren}(mu), with q_{N,ren}(mu)= q_N(v_N(mu)).
For a renormalizable theory, the limit
q_ren(mu) = lim_{N to inf} q_{N,ren}(mu)
exists although neither q_N nor v_N converge.
Once this limit replaces the naive bare recipe (*)-(**) which is
ill-defined, everything behaves properly as it should.
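The pattern can be tried out on a deliberately artificial toy model. In
the sketch below, the functions q_bare, v_of_mu and q_ren and the
logarithmic dependence on N are pure inventions for illustration (d=1
bare parameter, e=2 'observables'); they are not derived from any
physical theory. At fixed v, q_N(v) diverges with N; at fixed mu,
v_N(mu) diverges; but q_{N,ren}(mu) converges.

```python
import math

def q_bare(v, n):
    """Toy q_N(v): two 'observables' depending on one bare parameter v."""
    x = v + math.log(n)
    return (x, math.exp(-x) + 1.0 / n)

def v_of_mu(mu, n):
    """Solve the first equation q_1 = mu for the bare parameter: v_N(mu)."""
    return mu - math.log(n)

def q_ren(mu, n):
    """Renormalized parameterization q_{N,ren}(mu) = q_N(v_N(mu))."""
    return q_bare(v_of_mu(mu, n), n)

mu = 0.7
for n in (1e1, 1e3, 1e6, 1e9):
    # q_bare at fixed v and v_of_mu at fixed mu both diverge with n,
    # while q_ren tends to (mu, exp(-mu)).
    print(n, q_bare(0.2, n), v_of_mu(mu, n), q_ren(mu, n))
```

Here the second component of q_ren is a prediction in terms of the
measured mu, although the bare parameter has no limit.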
The situation may be slightly more complex than indicated above.
Instead of working with directly measurable quantities one often
works with formally more tractable quantities q that are finitely
related to the key measurable quantities mu (such as observed mass
spectra). However, their definition depends on an additional scale
parameter E that fixes the renormalization conditions. (This parameter
should not be mixed up with the cutoff energy, which after
renormalization is always infinite!)
Thus we actually have q=q_N(v,E), solve some of these equations for
v=v_N(mu,E), and get as a result
q = q_{N,ren}(mu,E), with q_{N,ren}(mu,E)= q_N(v_N(mu,E),E),
q_ren(mu,E) = lim_{N to inf} q_{N,ren}(mu,E).
But since the scale E can be chosen arbitrarily, the final renormalized
result of physical predictions P(q,E) must be
independent of E. Thus,
d/dE P(q_ren(mu,E),E) = 0,
which is a form of the renormalization group equations.

To get a renormalized Hamiltonian, one also needs wave function
renormalization, which means using a cutoff-dependent inner
product in the space of wave functions (in the functional
Schr"odinger picture). The limiting Hamiltonian is perturbatively
well-defined in the physical Hilbert space obtained as limit of
renormalized Hilbert spaces at finite cutoff, as the cutoff goes
to infinity.
S8c. Renormalization without infinities II
In bare (divergent) QFT, infinities arise because integrals taken over
unbounded momenta don't exist; so doing it leads to nonsense.
Instead, proper QFT takes regularized integrals, for example by
adding an explicit cutoff Lambda. This simply means that everything is
calculated with an action that depends on Lambda as an additional
parameter. Once this is done, everything is finite, but depends on
the cutoff Lambda.
The only problem with that is that the cutoff destroys Lorentz
covariance - apart from that it would be a completely respectable
field theory in itself. Now Lorentz invariance is violated only
at energies >O(Lambda); hence to have the theory conform to
physics that can be checked it suffices to take Lambda large.
But for aesthetic reasons or since we believe that symmetries are
fundamental, we want to have fully invariant theories. This requires
that we let Lambda go to infinity.
But in order that the results have a finite limit we must at the
same time make the coupling constants g dependent on Lambda.
If this is done in a correct way (and the textbooks on QFT teach
one or more of the known correct ways under the heading of
'renormalization'), one encounters no infinities at all in the
whole process.
Thus renormalized quantities are never infinite.

The essentials of the renormalization process, namely the need for
Lambda-dependent coupling constants for sufficiently singular
Hamiltonians, can be understood nonperturbatively on the
nonrelativistic level.
What happens is that one has a family of Hamiltonians
H(Lambda,g) that depend on a scale parameter Lambda and a coupling
constant g (or several). H(Lambda,g) has a good limit H(g) as Lambda
to inf, with g fixed, but the corresponding limit of the resolvent
G(Lambda,g) does not exist; hence if one tries to do calculations with
H(g) directly (the 1930 way of doing things, which was a dead end),
one gets infinities all over the place.
On the other hand, if one chooses a good parameterization g(Lambda,mu)
then, although H(Lambda,g(Lambda,mu)) has no longer a good limit as
Lambda to inf, its resolvent G(Lambda,g(Lambda,mu)) has a well-defined
limit G(mu). (At least in 1D and 2D field theory, where this can
be proved in certain cases. In 3D and 4D, one probably needs also
a Lambda-dependent inner product defining the Hilbert space
to ensure that one ends up in the right representation,
and Lambda-dependent wave functions to ensure that the limiting
renormalized wave functions remain bounded in the limiting
renormalized inner product.)
Since all dynamical information including scattering information
is in the resolvent, G(mu) defines a good physical model for a
scattering process.
In some simple cases, renormalization can be done nonperturbatively.
For example, standard perturbation theory for a Hamiltonian
p^2/2m +g delta(x) produces infinities. The renormalization of this
particular example is treated nonperturbatively in hep-th/9305052.
Thus, infinities only appear if one takes the limit in a way it
cannot be taken consistently.
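The mechanism can be made explicit in a small sketch. The setting is
an assumption made for illustration only (it is not taken from the
cited paper): two space dimensions, units with hbar = 2m = 1, an
attractive delta potential g delta(x), and a sharp momentum cutoff
|k| < Lambda. The physical parameter mu is taken to be kappa_b, where
E = -kappa_b^2 is the bound state energy. The function names are
hypothetical.

```python
import math

def G0(kappa, Lam):
    """Cutoff free resolvent at the origin for E = -kappa**2:
    integral_{|k|<Lam} d^2k/(2 pi)^2 1/(E - k^2)
      = -log(1 + Lam^2/kappa^2) / (4 pi)."""
    return -math.log1p((Lam / kappa) ** 2) / (4.0 * math.pi)

def g_running(Lam, kappa_b):
    """Coupling g(Lambda, mu) that keeps the bound state at -kappa_b**2."""
    return 1.0 / G0(kappa_b, Lam)

def inv_T(kappa, Lam, g):
    """Inverse T-matrix 1/T(E) = 1/g - G0(E) at E = -kappa**2."""
    return 1.0 / g - G0(kappa, Lam)

for Lam in (1e2, 1e4, 1e8):
    fixed = inv_T(2.0, Lam, -1.0)                   # g held fixed
    tuned = inv_T(2.0, Lam, g_running(Lam, 1.0))    # g = g(Lambda, mu)
    print(Lam, fixed, tuned)
```

With g fixed, 1/T grows like log Lambda and has no nontrivial limit;
with g = g(Lambda,mu), which itself tends to zero, 1/T converges to
-log(kappa^2/kappa_b^2)/(4 pi).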
Of course, the relativistic case is more involved and at present
not understood nonperturbatively, but there is no difference in
principle.
The local interaction of the formal Lorentz invariant action is
replaced by a nonlocal interaction depending on the UV cutoff Lambda.
Thus one has V(g,Lambda) in place of V(g), where g are the coupling
constants (including masses).
To do so, one writes the (Euclidean = Wick rotated) field as
Phi(x) = integral dp exp(-i p dot x) Phihat(p)
and substitutes it into the action. This gives an action in the
momentum representation. Then one regularizes the interaction term by
throwing away the momenta above some cutoff Lambda.
Introducing the cutoff makes the interaction nonlocal, as one can see
by going from the momentum representation of the regularized
interaction term back to the position representation by substituting
Phihat(p) = const * integral dx exp(i p dot x) Phi(x).
Instead of the delta functions which would appear without the
cutoff there are now explicit nonlocal potential terms.
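A one-dimensional illustration, assuming a sharp cutoff |p| < Lambda:
the delta function delta(x) is then replaced by the kernel
(1/2 pi) integral_{-Lambda}^{Lambda} dp exp(i p x) = sin(Lambda x)/(pi x),
which does not vanish for x nonzero. The sketch below (with
hypothetical function names) checks this numerically.

```python
import math

def kernel_numeric(x, Lam, steps=20000):
    """Midpoint rule for (1/2 pi) * integral_{-Lam}^{Lam} dp cos(p x).
    (The imaginary part, an integral of sin(p x), vanishes by symmetry.)"""
    h = 2.0 * Lam / steps
    total = sum(math.cos((-Lam + (i + 0.5) * h) * x) for i in range(steps))
    return total * h / (2.0 * math.pi)

def kernel_exact(x, Lam):
    """Closed form sin(Lam x)/(pi x), with its limit Lam/pi at x = 0."""
    if x == 0.0:
        return Lam / math.pi
    return math.sin(Lam * x) / (math.pi * x)

print(kernel_numeric(0.3, 10.0), kernel_exact(0.3, 10.0))
```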
(Note that Coulomb interaction in nonrelativistic QFT is nonlocal.
See also
H. Ekstein, Phys. Rev. 117, 1590-1595 (1960)
for more on nonlocal interactions and relations to the S-matrix.)
(But actually one does not need to care about locality or not,
since the regularized interaction in the momentum representation
is mathematically ok and one can do everything else in this
representation.)
More precisely, one starts with the smeared Lagrangian interaction
defined by the cutoff, uses the representation of the S-matrix as
a time-ordered exponential to work out the corresponding
Hamiltonian interaction in the interaction picture, and takes this
as definition of the regularized dynamics. (Note that Haag's theorem,
which asserts that a nontrivial Lorentz-invariant theory satisfying
microlocality cannot have an interaction picture, does not apply since
the theory with cutoff is neither Lorentz invariant nor microlocal.)
From here on, one can do standard perturbation theory without
encountering any infinity at all; one gets meaningful
formulas throughout the whole renormalization procedure.
All contributions to the S-matrix elements of this regularized theory
are finite, and give (after analytic continuation to real time)
the S-matrix of the regularized interaction.
The result is an asymptotic series S(g,Lambda) for the S-matrix of
the regularized interaction, with finite, computable coefficients.
This S-matrix is unitary and has all properties one would like to have,
except that, because of the cutoff, it is only approximately Lorentz
invariant.
Of course, for a general nonlocal theory in position representation,
one gets more complicated Feynman rules than those traditionally
written down. In momentum space, the formulas become the standard
formulas, but with explicit cutoff included. Thus to do the suggested
exercise, one should always work in the momentum representation.

To restore Lorentz invariance, one uses a running coupling constant
g=g(Lambda,mu) which, for fixed renormalization point mu (a vector of
the same dimension as g containing the free constants in the matching
of the renormalization conditions), is uniquely determined
(for any fixed renormalization scheme) as the solution of a
renormalization group equation whose coefficients are also defined
as a (presumably even convergent) asymptotic expansion.
Having this, one can take the limit
S(mu) = lim_{Lambda to inf} S(g(Lambda,mu),Lambda)
which is an asymptotic series in hbar with finite, computable
coefficients when the theory is renormalizable, and is Lorentz
invariant and microlocal.
Thus one gets the desired Lorentz invariant, microlocal theory
as a perturbatively well-defined limit of perturbatively well-defined
but not Lorentz invariant or microlocal theories.
At the very end one can pass to the limit, but not earlier.
The only infinity encountered is not worse than the infinity
encountered in defining Riemann integrals over the real line,
where one also gets a finite limit by letting a finite cutoff go to
infinity.

The real mathematical difficulties in QFT are not in the renormalization
procedure but in giving a nonperturbative construction of the S-matrix.

S8d. Renormalization and coarse graining
In QFT, there are two different scales, one on the bare level and one
on the renormalized level, and the meaning of the renormalization
group is slightly different from that in statistical mechanics.

On the statistical mechanics level, there is the cutoff beyond which
one cannot (or does not want to) observe anything. This effective
cutoff is a parameter Lambda in an effective theory defined by coarse
graining.
The effective theory depends on Lambda: For different values of Lambda
you get a _different_ effective theory, though their low energy
predictions are essentially the same. This is expressed by the Wilson
flow, described
by renormalization group equations that relate the parameters
g(Lambda,mu) in the different effective theories such that some key
low energy observables mu keep the same values.
The number of such key observables (i.e., the dimension of mu)
equals the number of parameters in the effective theory
(i.e., the dimension of g); most other observables are different
at different cutoffs (though only slightly if they are observable at
low energy), because of the coarse graining done when lowering
the cutoff scale Lambda.

In QFT, the above is mimicked on the _bare_ level. The cutoff is a
large energy Lambda beyond which the bare interaction is modified to
be able to get a meaningful limit; this corresponds to coarse-graining.
The resulting bare theory with cutoff Lambda is a well-defined
effective theory and behaves precisely as described above.
To define the renormalized theory, one needs, in addition to the
cutoff, renormalization conditions defining the bare parameters in
terms of renormalized parameters q.
These conditions depend on a renormalization scale E figuring in the
equations defining the renormalization conditions. Because of the
dimensional nature of momentum, there always has to be such a
parameter E, no matter which renormalization procedure is followed.
In QFT, one usually refers to a mass scale M, which is the same as
E=Mc^2 in units such that c=1. Then M is the constant needed in the
renormalization conditions to relate certain computable expressions
to the renormalized parameters. This is discussed at length in
the QFT book by Peskin and Schroeder, Section 12.2, for a massless
Phi^4 theory, and in Section 12.5 for the general case. (For an online
source, see, e.g., equations (9)-(11) of hep-th/9804079.
M is introduced there without comment; the role of M is described
later, after (20).) In the following, I continue to use E in place of M.
Thus the bare parameters are functions g(Lambda,q,E) of the cutoff
Lambda, the renormalized parameters q, and the renormalization scale E.
The renormalization group equations in the statistical mechanics
sense (the Wilson flow) would describe how g(Lambda,q,E) changes as
the cutoff Lambda is altered. However, in QFT, this is of no physical
interest. Indeed, Lambda is completely eliminated from considerations:
The renormalized theory is obtained at fixed E by letting the cutoff
Lambda go to infinity. This has the effect that the bare parameters
become meaningless, since the limit
lim_{Lambda to inf} g(Lambda,q,E)
does not exist. At this stage it becomes obvious that all bare objects
are unphysical.
Although nonphysical, the renormalization group equations in
Lambda are an important tool in the _construction_ of QFTs, where the
limit of all correlation functions must be shown to exist in a
suitable topology, and the absence
of divergences shown. In the weakest topology, based on the
ultrametric norm and corresponding to perturbation theory at all
orders, this is shown rigorously in a nice book
M. Salmhofer,
Renormalization: An Introduction,
Springer, Berlin 1999.
Unfortunately, this topology is too weak to give the existence of
the correlation functions as functions; they are only shown to exist
as formal power series.
All expressions of the theory that survive the limit, in particular
all n-point correlation functions, n=1,2,3,...,
describe observable physics. They can therefore be expressed as
functions of q and E only, whose detailed form comes from the
standard theory. However, there is a little twist since the scale E
can be chosen arbitrarily, hence cannot be measurable.
In terms of a fixed set of physical parameters mu (measurable
under well-defined experimental conditions), we can predict mu
by some function of q and E, mu=mu(q,E). Solving for q, we can
express q in terms of mu and E,
q = q_ren(mu,E).
But the exact renormalized result of a physical prediction P(q,E)
must be completely independent of E, uniquely determined by the
physical parameters mu. Thus,
d/dE P(q_ren(mu,E),E) = 0,
which are the Callan-Symanzik equations, the renormalization group
equations of interest in quantum field theory.
In contrast to the Wilson flow, however, the sliding scale in the
Callan-Symanzik flow is the renormalization scale E and _not_ the
cutoff Lambda (which at this stage is already infinite). Moreover,
since observable physics is completely independent of the
renormalization scale E, the latter has no intuitive 'physical'
meaning.
There is no relation between the two flows, except by analogy.
The Wilson flow is needed to _get_ the renormalized theory
at fixed renormalization conditions, the Callan-Symanzik flow
describes what happens when you _change_ these conditions.

S8e. Renormalization scale and experimental energy scale
The picture drawn in the preceding is somewhat incomplete with
regard to the practice of computing, due to the fact that we cannot
compute this renormalized theory at any E, since it is exceedingly
complex.
Thus we need to consider approximations. These approximations are
no longer independent of E, since the approximation errors depend
on it. It turns out that the approximation errors are small only
when the energy scale of the experiment for which a prediction is
made is close to the renormalization scale E, since (see, e.g.,
Weinberg's QFT book, Vol. 2, Chapter 18.1) the perturbative
expansion contains arbitrary powers of log(E_experiment/E) which
therefore must be kept small.
Thus one needs to evaluate the theory near the scale of interest.
However, perturbation theory is valid only near a fixed point E^* of
the renormalization group equations. Therefore, one determines
approximate formulas for the quantities q_ren(mu,E) with E close to
the appropriate fixed point E^*, and then uses (also approximate)
renormalization group equations to transform the result to the
scale of interest.

Thus there are two different scales involved, the energy scale
E_exp where the experiments are done, and the renormalization scale
E_ren (previously denoted by E).
On the experimental side, coupling constants (such as the charge)
are determined with reference to some effective, coarse grained theory
(such as the nonrelativistic Schroedinger equation). This effective
theory depends on E_exp (for QED, the charge is traditionally defined
in the low energy limit E_exp to 0). This effective theory behaves
like any other coarse-grained theory, giving rise to running coupling
constants such as e=e_exp(E_exp). But these depend on the details of
the coarse-graining scheme, and the computed results depend on the
coarse-graining, too, and hence on E_exp.

The experimental running coupling constants are only loosely related
to the running coupling constants such as e=e_ren(E_ren) obtained by
the Callan-Symanzik equation (= the renormalization group equation
in terms of the renormalization scale E_ren). The latter are, in theory,
uniquely defined by the renormalization prescription. There the
coupling constants are defined not by an experimental prescription
but as parameters in the renormalization prescription. For example,
in Phi^4 theory, lambda=lambda(M) is defined by equation (12.30) in
Peskin/Schroeder (and E_ren=Mc^2), and the charge e=e(M) in QED
by (10.39) [but at spacelike momentum p^2=-M^2 as in Chapter 12].
As discussed, the physical predictions at any energy are completely
independent of M if e(M) and the other renormalized parameters slide
with M. At least this would be the case in a fully nonperturbative
calculation (which we cannot do). However, the few-loop approximations
depend heavily on M, and give a reasonable approximation to the
exact theory _only_ at energies close to E_ren=Mc^2. Thus the few-loop
approximation behaves just like an effective theory, provided we choose
E_ren = E_exp (or close). But the analogy is not complete since
in a true effective theory we could choose the coarse-graining scale
anywhere at or above E_exp, while for good few-loop approximations
we need to choose it always close to E_exp.

Thus, if one could solve the equations exactly, the dependence on M
and the Callan-Symanzik equation would be completely irrelevant,
and nothing at all could be extracted from it. But in practice one
can work only at few loops, and then different values of M may give
vastly different results, and the equation is very useful since it
enables one to work with the right M.
The renormalization group equations are used to move from an
M near the fixed point (where one can do perturbation theory and has
reliable few-loop calculations but where the approximation errors =
the higher order terms in the perturbation series are huge)
to an M near the experimental scale (where the approximation error
is small, and the few-loop calculation therefore reasonably accurate).
This is often expressed by saying, loosely, that the renormalization
group approach partially resums the perturbation series.
One gets what is called 'renormalization group improved perturbation
theory', which is predictive about a much larger range of coupling
constants than simple renormalized perturbation theory (which only
works for very weak coupling).
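The interplay between the exact independence of the renormalization
scale and the scale dependence of truncated approximations can be
seen in a toy model. The following is a sketch under loudly stated
assumptions: a single coupling, a hypothetical 'exact' prediction
P(q,E) = q/(1 - B q log(E_exp/E)), and a running coupling q_ren
constructed so that P(q_ren(E),E) does not depend on E. None of these
formulas is taken from QED or from any other real theory.

```python
import math

B = 0.5        # hypothetical coefficient of the logarithm
E_REF = 1.0    # reference scale where the coupling is fixed
Q_REF = 0.2    # coupling at E_REF (plays the role of mu)

def q_ren(E):
    """Running coupling q_ren(mu, E) of the toy model."""
    return Q_REF / (1.0 - B * Q_REF * math.log(E / E_REF))

def P_exact(q, E, E_exp):
    """'Exact' prediction at experimental energy E_exp (all orders in q)."""
    return q / (1.0 - B * q * math.log(E_exp / E))

def P_trunc(q, E, E_exp):
    """Truncation after the first logarithm (a stand-in for 'few loops')."""
    return q + B * q * q * math.log(E_exp / E)

E_exp = 1000.0
for E in (1.0, 30.0, 1000.0):
    q = q_ren(E)
    # P_exact is the same for every E; P_trunc is good only for E near E_exp
    print(E, P_exact(q, E, E_exp), P_trunc(q, E, E_exp))
```

In this toy, the truncated formula is poor when evaluated at E = E_REF,
far from E_exp, but becomes accurate once the coupling has been
transported to E near E_exp by means of q_ren.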

S8f. Dimensional regularization
The neatest way to perform regularization, and the only one which
works well in complicated cases such as nonabelian gauge theories
is dimensional regularization. Unfortunately, it is presented
in most textbooks in a way that looks quite mysterious, involving
unphysical fractional dimensions. This is however just sloppiness
on the side of physical tradition, and a more rigorous approach
removes everything strange.

The rules for dimensional regularization are derived in Euclidean
space rather than Minkowski space. To get the latter, one needs an
additional analytic continuation.
For p in Euclidean d-space (d>0 integral), we put p^2=p^Tp.
If d is a positive integer and f(p^2) is integrable (i.e. decays fast
enough), then standard Lebesgue integration gives the formula
integral dp^d/(2 pi)^d f(p^2)
= C_d integral_0^inf dr r^{d-1}f(r^2), (1)
where C_d is given in terms of the Gamma function as
C_d = 2 pi^{d/2}/((2 pi)^d Gamma(d/2)). (2)
We observe that the formula (2) makes sense for arbitrary complex d
with nonnegative real part, and that therefore for
f(r^2)=r^{2j}/(r^2+m^2)^n, n>j+d/2,
the well-defined right hand side of (1) is an expression I(d,j,n)
which depends analytically on d,j,n.
In particular, the cases j=0 and j=1 lead to the expressions
given in P/S (7.85/86). A similar reasoning produces (7.87) and
more complicated rules analogous to those given in P/S on p.807
(where, however, analytic continuation to Minkowski space has
already been performed). These rules, together with the
Feynman trick stated as (A.39) on p.806 of P/S, can be used to evaluate
integrals of arbitrary rational Lorentz-invariant expressions
provided that they decay fast enough.
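As a numerical sanity check of the radial formula, here is a minimal
sketch. It assumes that C_d includes the factor (2 pi)^{-d} of the
measure on the left hand side of (1), i.e.,
C_d = 2 pi^{d/2}/((2 pi)^d Gamma(d/2)), and it evaluates the radial
integral in closed form through Gamma functions; the function names
are hypothetical.

```python
import math

def C(d):
    """Angular factor including the (2 pi)^(-d) of the measure
    (an assumption about the normalization, see the text above)."""
    return (2.0 * math.pi ** (d / 2)
            / ((2.0 * math.pi) ** d * math.gamma(d / 2)))

def I_closed(d, j, n, m=1.0):
    """Closed form of C_d * integral_0^inf dr r^(d-1) r^(2j)/(r^2+m^2)^n,
    an ordinary integral for n > j + d/2."""
    a = j + d / 2
    return (0.5 * C(d) * m ** (2 * a - 2 * n)
            * math.gamma(a) * math.gamma(n - a) / math.gamma(n))

def I_numeric(d, j, n, m=1.0, steps=2000):
    """Same integral by Simpson's rule after substituting r = m tan(theta)."""
    p = d - 1 + 2 * j            # power of sin(theta)
    q = 2 * n - d - 1 - 2 * j    # power of cos(theta)

    def f(t):
        return math.sin(t) ** p * math.cos(t) ** q

    h = (math.pi / 2) / steps
    s = f(0.0) + f(math.pi / 2)
    s += sum((4 if i % 2 else 2) * f(i * h) for i in range(1, steps))
    return C(d) * m ** (p + 1 - 2 * n) * s * h / 3.0

print(I_closed(3, 0, 3), I_numeric(3, 0, 3))
print(I_closed(3, 1, 4, m=2.0), I_numeric(3, 1, 4, m=2.0))
```

For j=0 the closed form reduces to
Gamma(n-d/2)/((4 pi)^{d/2} Gamma(n)) m^{d-2n}, and it can be evaluated
for noninteger d as well, which is the starting point for the analytic
continuation.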
Note that the resulting formula
integral dp^d/(2 pi)^d f(p^2) = I(d,j,n) (3)
is valid only if d is a positive integer and n>j+d/2; the latter
ensures sufficiently fast decay at infinity to make the Lebesgue
integral well-defined.
For other values the above computations are meaningless, and any
contradiction derived from it is therefore irrelevant.
As irrelevant as the well-known fact that a divergent alternating
infinite sum can be given any value whatsoever by formal rearrangements.
Remarkably, however, I(d,j,n) (and the analogous formulas on
p.807) can be analytically continued to the interesting case
d=4-eps. This allows us to _define_ an _extended_ Lebesgue integral
for d=4-eps by the formula
integral dp^d/(2 pi)^d p^{2j}/(p^2+m^2)^n := I(d,j,n) (4)
and similar expressions for arbitrary rational Lorentz-invariant
expressions. Moreover, if these expressions happen to have
good limits for eps to 0 (which cannot happen for (4) but for
suitable linear combinations) they define the value also for d=4.
The derivation ensures that it gives the correct results in all
cases where the integral makes sense in the traditional (Lebesgue)
sense.
Thus we have defined a consistent extension of the Lebesgue
integral of Lorentz-invariant expressions to the singular case.
This is similar in spirit to Lebesgue's extension of the Riemann
integral to the Lebesgue integral.
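The behavior near d=4 can be inspected numerically. The sketch below
assumes the case j=0, n=2 and the normalization in which the measure
carries the factor (2 pi)^{-d}, so that the continued integral is
Gamma(n-d/2)/((4 pi)^{d/2} Gamma(n)) m^{d-2n}; the name I0 is
hypothetical. A single such expression has a pole at eps=0, while the
difference of two expressions with different masses has a finite
limit.

```python
import math

def I0(d, n, m):
    """Analytic continuation in d of
    integral dp^d/(2 pi)^d (p^2+m^2)^(-n) (case j = 0)."""
    return (math.gamma(n - d / 2)
            / ((4.0 * math.pi) ** (d / 2) * math.gamma(n))
            * m ** (d - 2 * n))

eps = 1e-4
d = 4.0 - eps
single = I0(d, 2, 1.0)                  # grows like 1/(8 pi^2 eps)
combo = I0(d, 2, 1.0) - I0(d, 2, 3.0)   # the pole cancels in the difference

print(eps * single, 1.0 / (8.0 * math.pi ** 2))
print(combo, math.log(9.0) / (16.0 * math.pi ** 2))
```

The limit of the difference is log(m_2^2/m_1^2)/(16 pi^2), here with
m_1=1 and m_2=3.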
A good, mathematically rigorous exposition of d-dimensional integration
theory for general complex dimension d is given in
P. Etingof,
Note on dimensional regularization,
pp. 597-607 in: Pierre Deligne et al.,
Quantum Fields and Strings, A Course for Mathematicians, Vol. 1,
Amer. Math. Soc., Providence, Rhode Island, 1999
See also

The theory of renormalization now shows that all integrals
occurring in the expressions for S-matrix elements in renormalizable
theories have a well-defined _extended_ Lebesgue integral for d=4.
This is all that is required for consistency.

For those who dislike unphysical complex dimensions,
the uniqueness of analytic continuation implies that one can
get completely equivalent results by keeping the physical
dimension d=4. In this case, one must replace the propagator
(p^2+m^2)^{-1} by (p^2+m^2)^{-n} with sufficiently large n,
and continue the result analytically to the physical value n=1.
Then all integrals are (in Euclidean space) ordinary Lebesgue
integrals. The formulas used for the extended Lebesgue integral
defined as above still apply; however, computations are now slightly
more involved.
Those who worry about the appropriateness of analytic continuation
might wish to consider the functions f, g defined by
in the real domain. They are equal for d<2 but f does not make
sense for d>=2. Nevertheless, it makes exceedingly much sense
to extend the definition of f to arguments d>2 by making
a definition. Indeed, g(d) is the unique meromorphic extension
of f to arbitrary complex arguments.
This uniqueness is in the nature of analytic continuation,
and makes the latter an extremely useful device in many applications.
It is the reason why we may use such useful equations
as exp(ix)=cos(x)+i*sin(x), which one would have no right
to use if one did not silently identify analytic functions
defined on part of their domain with the full analytic function
on the associated Riemann surface.

S8g. Nonrelativistic quantum field theory
The right way to understand relativistic QFT is to regard it as
a limit of nonlocal nonrelativistic quantum field theory.
The latter is much better behaved.
Interacting QFT in 3+1 dimensions exists, however, as a rigorous
mathematical theory in the nonrelativistic case, since there only
finite renormalizations are needed and no infinities occur.
In this context, Feynman-Dyson perturbation theory can be given a
rigorous meaning. Note that nonrelativistic QFT is nonlocal
because of the Coulomb potential interaction.
Interacting QFT based on Feynman-Dyson perturbation theory
in 3+1 dimensions exists as a rigorous mathematical theory
in the relativistic case, as a limit of smeared, nonrelativistic
theories. This is done for Phi^4 theory in all details in
Salmhofer's book. For technical reasons, one gets the results
however only in a very weak topology corresponding to power series
in the coupling constant, rather than as true functions of the
coupling constants. Thus perturbative relativistic QFT is rigorously
established in 4D while nonperturbative relativistic QFT in 4D
is still elusive.
However, the infinities that plague 4D relativistic QFT are already
present in 3D, and there rigorous constructions have been given.
Exactly the same kind of renormalization tricks are used in 3D.
Thus our present lack of understanding cannot be blamed on
renormalization, but has to do with the difficulty of getting
the hard analytical estimates needed to justify the constructions.

S8h. Nonrenormalizable theories as effective theories
The difference between renormalizable and unrenormalizable theories is
that the former are specified by a (small) finite number of parameters
while the latter are specified by an infinite number of parameters.
In a renormalizable quantum field theory, only few counterterms
must be added to the action in order to get a consistent
finite perturbative expansion at all orders. This means that a few
parameters suffice to get a consistent theory which will be correct
at the energies of interest (which should be essentially independent
of what happens at the inaccessible large energies).
In a nonrenormalizable quantum field theory, infinitely many
counterterms must be added to the action in order to get a consistent
finite perturbative expansion at all orders. This means that with a few
parameters one can only get an effective low order theory, which may,
however, still be good enough at the energies of interest.
But for better approximation, one needs to determine more and
more parameters...

In both cases, it is possible to extract approximate results from
computations, and the parameters can be tuned to fit the experimental
results. This gives a consistent procedure for predictions. Indeed,
many nonrenormalizable theories are in use as effective field theories.
(See hep-ph/0308266 for a recent survey on effective field theories.)
People who dislike nonrenormalizable theories do this on the basis of
a claim that their predictive value is nil because of the infinitely
many constants. But this is as unfounded as saying that thermodynamics
is not predictive because it depends on a function (the expression for
the free energy, say) that requires an infinite number of degrees of
freedom for its complete specification. Clearly, in the latter case,
the widespread use of finitely parameterized imperfect free energies
does not hamper the usefulness of thermodynamics. The same can be
said about nonrenormalizable field theories. It only implies that to
extract arbitrarily precise predictions one needs correspondingly
much information as input. We know that this is the case already for
many simpler phenomena in physics.
See also
J.Gegelia, G.Japaridze, N.Kiknadze, K.Turashvili
"Renormalization" Of Non Renormalizable Theories
J Gegelia, G Japaridze
Perturbative Approach to Non-renormalizable Theories

S8i. What about infrared divergences?
Renormalization theory deals with the regularization of ultraviolet
divergences, occurring at very high but unobservable energies.
In contrast, infrared divergences arise if there are problems at
very low energies. They are not cured by renormalization and need
completely different techniques.

Theories without massless particles have no infrared problems at all,
since at low energies only few particles can coexist. Indeed,
the sum of the rest masses of physical particles is bounded by the
total energy of the system.
In QED one has infrared problems because the photon is massless,
so a bound on the sum of the rest masses does not limit the number of
possible photons. Indeed, a closer calculation shows that there
may be an arbitrary number of very low energy ('soft') photons.
One can handle the situation in some approximation by giving the
photon a tiny mass mu. But this is an _additional_ parameter, quite
different from the renormalization scale M. And the renormalized
theory at finite mu depends on mu (so that one needs to take
in the end the limit mu to 0 to get physically correct results),
while it is still independent of M.
A better way to handle the infrared divergences is to avoid them
completely by using coherent states. These sum the contributions of
arbitrarily many soft photons in a coherent way.
S9a. Summing divergent series
There is a second kind of divergences, different from those cured
by renormalization.
Most perturbation series in QFT are believed to be asymptotic only,
hence divergent. Strong arguments (which have not lost their
persuasive power in half a century) supporting the view that one should
expect the divergence of the QED (and other relativistic QFT)
power series for S-matrix elements, for all values of
alpha>0 (and independent of energy) are given in
F.J. Dyson,
Divergence of perturbation theory in quantum electrodynamics,
Phys. Rev. 85 (1952), 631--632.
The remarkable fact is that QED is very accurate in spite of this.
It produces verifiable predictions by restricting attention to the
first few terms of a (most probably divergent) asymptotic series,
but it has no way to make sense of the whole series.
This is what Dirac found deficient in the foundations.
An asymptotic series is a series such as
f(x) = sum_{k=0:inf} k! x^k
with radius of convergence zero. For small enough x, the first few
terms give seemingly good approximations, but if one includes - for
any fixed nonzero x - enough terms, the series diverges. Thus, as Dirac
asserts, one neglects arbitrarily large terms to get the approximations
which work so well in QED.
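The typical behavior of the terms can be seen in a few lines. This is
a minimal sketch for the series above with the (arbitrarily chosen)
value x=0.15; the helper name terms is hypothetical. The terms k! x^k
first decrease, reach a smallest term near k=1/x, and then grow
without bound.

```python
import math

def terms(x, kmax):
    """Terms k! x^k of the formal series, k = 0..kmax."""
    return [math.factorial(k) * x ** k for k in range(kmax + 1)]

x = 0.15
t = terms(x, 25)
# index of the smallest term; truncating near it is the best one can do
k_min = min(range(len(t)), key=lambda k: abs(t[k]))

for k in (0, 3, k_min, 15, 25):
    print(k, t[k])
```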
There are infinitely many different ways to assign to an
asymptotic series a function with this series as Taylor expansion.
The problem is to have a way to choose the right one. Borel summation
is often taken as default, but seems to be no cure for QFT in view
of the so-called renormalon problem.
At present, there is no sound mathematical foundation of relativistic
quantum field theory. Whoever finds one will be awarded one of the
1 Million Dollar Clay Millennium prizes...

If we have a well-defined Hamiltonian H(g) depending infinitely
differentiably on a parameter g, it typically has a well-defined
S-matrix S(g), also depending infinitely differentiably on g.
Perturbation theory computes a power series expansion
S(g)=S_0 + S_1 g + ...
which often diverges for all g although each S_k is finite.
This happens already for the anharmonic oscillator with
H(g)= 1/2 (p^2+q^2) +g q^4.
Thus a correct Hamiltonian with a convergent (in the anharmonic
oscillator case even finite, hence trivially convergent) expansion
is quite consistent with a divergent expansion of the S-matrix.
However, one can still extract information by so-called
resummation techniques. One can study these things quite well with
functions which have known asymptotic expansions
(e.g., improper integrals, using Watson's lemma).
In many cases (and under well-defined conditions), the resulting
infinite series is Borel summable in the following sense: To sum
f(x) = sum a_k x^k (1)
if it is divergent or very slowly convergent, one can sum instead
its Borel transform
Bf(z) = sum a_k/k! z^k (2)
which obviously converges much faster (if not yet, one could probably
repeat the procedure). In many cases, f can be reconstructed from Bf
by means of
Sf(x) = integral_0^inf dz/x exp(-z/x) Bf(z)
= integral_0^inf dt exp(-t) Bf(tx).
Sf is called the Borel summation of the asymptotic series (1),
and is defined whenever Bf is convergent. If Bf has singularities,
the integral over t may have to be done along a contour in the complex
plane; see, e.g., physics/0010038.
It is easy to show that BSf=Bf and that Sf has the same asymptotic
expansion as f. Moreover, the identity Sf(x)=f(x) can be easily
verified if (1) has a positive radius of convergence, but also under
other natural assumptions (but stronger than simply asserting that (1)
is an asymptotic expansion for f).
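A worked instance, as a sketch under assumptions: take the alternating
series with a_k = (-1)^k k! (chosen here only for illustration). Its
Borel transform is Bf(z) = sum (-z)^k = 1/(1+z), continued
analytically along the positive real axis, so that
Sf(x) = integral_0^inf dt exp(-t)/(1+tx). The code compares this with
partial sums of the divergent series; the function names are
hypothetical.

```python
import math

def borel_sum(x, t_max=60.0, steps=60000):
    """Simpson's rule for Sf(x) = integral_0^inf dt exp(-t) Bf(t x),
    with Bf(z) = 1/(1+z); the tail beyond t_max is below exp(-t_max)."""
    h = t_max / steps

    def g(t):
        return math.exp(-t) / (1.0 + t * x)

    s = g(0.0) + g(t_max)
    s += sum((4 if i % 2 else 2) * g(i * h) for i in range(1, steps))
    return s * h / 3.0

def partial_sum(x, K):
    """Sum of the first K terms (-1)^k k! x^k, k = 0..K-1."""
    return sum((-1) ** k * math.factorial(k) * x ** k for k in range(K))

x = 0.1
print(borel_sum(x))
for K in (3, 10, 40):
    # good near the smallest term (K around 1/x), useless for large K
    print(K, partial_sum(x, K))
```

For this particular series, the error of a partial sum is bounded by
the first omitted term, so truncation near k=1/x agrees with Sf(x) up
to about 10! x^10, while including many more terms destroys the
agreement.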
The book
J.S. Feldman, T.R. Hurd, L. Rosen and J.D. Wright,
QED: A proof of renormalizability,
Lecture Notes in Physics 312,
Springer, Berlin 1988
claims to prove on p. 112ff that the coefficients in the loop
expansion of the QED S-matrix are bounded by const*(n!)^{1/2}/R^n for
some R>0, which would imply that it is locally Borel summable.
But hep-ph/9701418 seems to make opposite claims.
See also hep-ph/9807443.
Of course, since there are many functions with the same asymptotic
expansion (e.g., one can add arbitrary multiples of terms like
e^{-a/x}, e^{-a/x} log x, etc.),
one has to show that the Borel summed Sf actually has the
properties that the original f was supposed to have (and from which
the asymptotic series was derived). If, in addition, f is
uniquely determined by these properties, we know that f=Sf.
Unfortunately, a proof for such a statement is missing in QED.
In some 2D cases, where nonperturbative QM applies, one can show that
the nonperturbative result satisfies the properties needed to show
that Borel summation of the perturbative expansion reproduces the
nonperturbative result. See also the thread
Re: unsolved problems in QED
starting with
With experimental results one just has numbers, and not infinite
series, so questions of convergence do not occur.
On the other hand, if one knows of an infinite series a finite number
of terms only, the result can be, strictly speaking, anything.
But usually one applies some extrapolation algorithm
(e.g., the epsilon or eta algorithm) to get a meaningful guess for the
limit, and estimates the error by doing the same several times,
keeping a variable number of terms. The difference between
consecutive results can count as a reasonable (though not foolproof)
error estimate of these results. Similarly, given a finite number
of coefficients of a power series, one can use Pade approximation to
find an often excellent approximation of the 'intended' function,
although of course, a finite series says, strictly speaking, nothing
about the limit of the sequence.
But to have reliable bounds one needs to know an exact definition of
what one is approximating, and work from there. Such an exact
definition is, at present, missing for quantum electrodynamics.
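The extrapolation procedure described above can be sketched as
follows. This is a minimal Python version of Wynn's epsilon algorithm,
applied to the slowly convergent series ln 2 = 1 - 1/2 + 1/3 - ...
(the test series and the function names are my choices for
illustration). The error estimate is the difference between the
extrapolations from 7 and from 9 terms; as said above, it is
reasonable but not foolproof.

```python
import math

def wynn_epsilon(partial_sums):
    """Extrapolate the limit of a sequence with Wynn's epsilon algorithm.

    Returns the last entry of the highest even-numbered column of the
    epsilon table; odd columns hold only auxiliary quantities."""
    prev = [0.0] * (len(partial_sums) + 1)   # column eps_{-1}
    curr = [float(s) for s in partial_sums]  # column eps_0
    best, column = curr[-1], 0
    while len(curr) > 1:
        nxt = []
        for n in range(len(curr) - 1):
            diff = curr[n + 1] - curr[n]
            if diff == 0.0:  # degenerate entry: stop with best value so far
                return best
            nxt.append(prev[n + 1] + 1.0 / diff)
        prev, curr, column = curr, nxt, column + 1
        if column % 2 == 0:
            best = curr[-1]
    return best

def ln2_partial_sums(n_terms):
    """Partial sums of 1 - 1/2 + 1/3 - ... with n_terms terms."""
    sums, total = [], 0.0
    for k in range(1, n_terms + 1):
        total += (-1) ** (k + 1) / k
        sums.append(total)
    return sums

estimate_7 = wynn_epsilon(ln2_partial_sums(7))
estimate_9 = wynn_epsilon(ln2_partial_sums(9))
error_estimate = abs(estimate_9 - estimate_7)
print(estimate_9, error_estimate, abs(estimate_9 - math.log(2)))
```

The raw partial sum of 9 terms is still wrong in the second digit,
whereas the extrapolated value is far more accurate. Without
knowing that the limit is ln 2, only the heuristic error estimate
would be available.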

S9b. Is QED consistent?
Quantum electrodynamics (QED) gives the most accurate predictions
quantum physics currently has to offer.
The anomalous magnetic dipole moment matches the experimental data
to 12 significant digits:
M. Passera,
Precise mass-dependent QED contributions to leptonic g-2 at order
alpha^2 and alpha^3,
Phys. Rev. D 75, 013002 (2007).
B. Odom, D. Hanneke, B. D'Urso, and G. Gabrielse,
New Measurement of the Electron Magnetic Moment Using a
One-Electron Quantum Cyclotron,
Phys. Rev. Lett. 97, 030801 (2006)
The Lamb shift, whose prediction made QED and renormalization
respectable, is much more difficult to measure with high precision,
hence offers no such phenomenal test of accuracy:
S.G. Karshenboim,
Precision physics of simple atoms: QED tests, nuclear structure
and fundamental constants,
Phys. Rep. 422 (2005), 1-63
Nevertheless, many physicists think that QED cannot be a consistent
theory. There is a phenomenon called the Landau pole:
It indicates that at extremely large energies (far beyond the range of
physical validity of QED, even far beyond the Planck energy) something
might go wrong with QED. (QED loses its validity already at energies
of about 10^11 eV, where the weak interaction becomes essential.
The Planck energy at about 10^28 eV is the limit where some current
theories try to make predictions. But the Landau pole, if it exists,
has an energy far larger than the latter.)
This is probably why Yang-Mills and not quantum electrodynamics was
chosen as the model theory for the millennium prize.
Since the existence of the Landau pole is confirmed only in low order
perturbation theory and in lattice calculations,
hep-lat/9801004 and hep-th/9712244
the question whether the alleged Landau pole implies limits to the
consistency of QED has currently no rigorous mathematical substance.
The observations about the Landau pole in perturbation theory can be
recast in mathematically rigorous terms using so-called renormalons,
obstructions to Borel summability; see
V. Rivasseau,
From Perturbative to Constructive Renormalization,
Princeton 1991
But the resulting analysis is inconclusive as regards the existence
of the theory.
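To get a feeling for the energy scales involved, here is a rough
numerical sketch in Python. It assumes the textbook one-loop running
coupling with only the electron loop,
   alpha(E) = alpha / (1 - (2 alpha/(3 pi)) log(E/m_e)),
which is not spelled out above; the formula and the rounded constants
are assumptions made for illustration only. In nature, the other
charged particles contribute too, so the numbers are indicative at
best.

```python
import math

ALPHA = 1 / 137.035999   # fine structure constant at zero energy
M_E_EV = 0.51099895e6    # electron mass in eV
PLANCK_EV = 1.22e28      # Planck energy in eV (approximate)

def running_alpha(energy_ev):
    """One-loop running coupling, electron loop only (an assumption)."""
    log_ratio = math.log(energy_ev / M_E_EV)
    return ALPHA / (1 - (2 * ALPHA / (3 * math.pi)) * log_ratio)

def log10_landau_pole_ev():
    """log10 of the energy (in eV) at which the denominator vanishes."""
    return math.log10(M_E_EV) + (3 * math.pi / (2 * ALPHA)) / math.log(10)

inverse_alpha_at_planck = 1 / running_alpha(PLANCK_EV)
pole_exponent = log10_landau_pole_ev()
print(inverse_alpha_at_planck, pole_exponent)
```

In this crude estimate the coupling has barely moved at the Planck
energy, and the pole sits at an energy whose decimal exponent is
roughly ten times that of the Planck energy.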

The quality of the computed approximations to QED is a strong
indication that there should be a consistent mathematical foundation
(for not too high energies), although it hasn't been found yet.
There is no indication at all that at the energies where QED
suffices to describe our world (with electrons and nuclei considered
elementary particles), it should be inconsistent. To show this
rigorously, or to disprove it, therefore remains another unsolved
(and for physics more important) problem.
Perturbative QED is only a rudimentary version of the 'real QED',
as can be seen from the fact that Scharf's results on the external
field case
G. Scharf,
Finite Quantum Electrodynamics: The Causal Approach, 2nd ed.
New York: Springer-Verlag, 1995.
are much stronger (he constructs in his book the S-matrix)
than those for QED proper (where he only shows the existence of
the power series in alpha, but not their convergence).
J.S. Feldman, T.R. Hurd, L. Rosen and J.D. Wright,
QED: A proof of renormalizability,
Lecture Notes in Physics 312,
Springer, Berlin 1988
gives a rigorous proof of perturbative existence of QED at all orders.
This means that a formal power series for the S-matrix is shown to
exist rigorously. This includes renormalization and is sufficient for
actual computations since a few terms in the power series give very
high accuracy.
However, the power series is believed to diverge if enough
(i.e., infinitely many) terms are added, and a consistent
nonperturbative treatment of full QED is presently missing.

The quest for 'existence' of QED is the quest for a framework
where the formulas make sense nonperturbatively, and where the
power series in alpha is a Taylor expansion of a (presumably
nonanalytic) function of alpha that is mathematically well-defined
for alpha around 1/137 and not too high energy. This is still open.
More precisely: Probably QED (and thus the QED S-matrix)
exists nonperturbatively as a 2-parameter theory depending on
the fine structure constant alpha and the electron mass m_e (these
parameters are the zero energy limits of the corresponding renormalized
running coupling constants), and is defined for alpha <= 1/137 and
input energies <= some number E_limit(alpha,m_e) larger
than the range of physical validity of pure QED. What is needed is
a mathematical proof that the QED S-matrix exists for 0<alpha<1/137
(rather than only for infinitesimal alpha, as currently established)
as a unitary operator S(alpha) in the Hilbert space H(E_max) of all
in-states of energy <= E_max=E_limit(alpha), for some reasonable

We know from perturbation theory how to compute in such a range the
coefficients of an asymptotic series in alpha for S(alpha).
We also have a number of nonperturbative approximation schemes that
give certain nonperturbative results (such as the Lamb shift).
But we currently do not have a way to ascertain that some well-defined
object S(alpha) exists that has this asymptotic series. The quest for
proving that QED exists is that of finding a construction for S(alpha)
that makes rigorous sense and has the known asymptotic expansion.

QED is renormalizable at all loops, which means that the power series
expansion of the S-matrix is mathematically well-defined at ordinary
energies. The _only_ thing that is missing is to give its limit a
mathematically well-defined meaning.
Note that the S-matrix S commutes with the Hamiltonian;
hence if P is the orthogonal projector to the space H_limit of
states involving only energies < E_limit(alpha)
then PSP is unitary on H_limit, and my conjecture is that PSP has
some (yet unknown but) rigorous nonperturbative construction.
The Landau pole (if it exists) just gives an upper bound to the allowed
energies. E_limit(alpha) is a function of alpha, which according to
perturbation theory has to satisfy
E_limit(0) < (Landau-pole in lowest order)
(and possibly decreases with increasing alpha); apart from that,
the known approximate results do not restrict the likely mathematical
validity of pure quantum electrodynamics.
A cautious evaluation of the situation is given in Weinberg's QFT book,
Vol. 2, pp.136-138 - all options are left open. On the other hand,
D. Espriu and R. Tarrach,
The case for triviality,
Phys. Lett. B383 (1996) 482,
argue that, because of the Landau pole, quantum electrodynamics is
only an effective field theory.

To summarize:
QED is renormalizable at all loops, which means that the power series
expansion of the S-matrix is mathematically well-defined at ordinary
energies. The _only_ thing that is missing is to give its limit a
mathematically well-defined meaning derived from a formulation of
QED that makes sense also at finite times and not only as a transition
from t=-infinity to t=+infinity.

S9c. What about relativistic QFT at finite times?
Although many time-dependent observable consequences of QED
can be deduced in a nonrigorous way in the Schwinger-Keldysh
= closed time path (CTP) formalism, there is at present no rigorous
relativistic quantum field theory at finite times in 4 dimensions.
In lower dimensions, for all theories where Wightman
functions can be constructed rigorously, there is an associated
Hilbert space on which corresponding (smeared) Wightman fields
and generators of the Poincare group are densely defined.
This implies that there is a well-defined Hamiltonian H=cp_0 that
provides via the Schroedinger equation the dynamics of wave functions
in time.
In particular, if the Wightman functions are constructed via the
Osterwalder-Schrader reconstruction theorem, both the Hilbert space
and the Hamiltonian are available in terms of the probability measure
on the function space of integrable functions of the corresponding
Euclidean fields. For details, see, e.g., Section 6.1 of
J. Glimm and A Jaffe,
Quantum Physics: A Functional Integral Point of View,
Springer, Berlin 1987.
In particular, (6.1.6), (6.1.11) and Theorem 6.1.3 are relevant.
Unfortunately, no Wightman functions have been constructed so far
for interacting 4D quantum field theories; see the FAQ entry on
'Is there a rigorous interacting QFT in 4 dimensions?'.
However, the functional integration measure of Euclidean QED is known
to exist perturbatively at all orders (Tomonaga, Schwinger and Feynman
got the Nobel prize for this), though a nonperturbative construction
is still missing. By analytic continuation as in the
Osterwalder-Schrader reconstruction theorem, one should be able to
obtain a perturbatively valid Hamiltonian for QED (cf. Theorem 6.1.3
in Glimm and Jaffe).

Current 4D QFT in its usual textbook form is based on perturbation
theory for free (i.e., asymptotic in- and out-) states; therefore it
gives only predictions that relate the in- and out-states.
(But see below for the CTP techniques, which are not of the standard
perturbative form and give far more information.)
This information is contained in the S-matrix elements.
From the S-matrix, one can then derive further information, e.g.,
about bound state energies as poles.
In nonrelativistic QM, one has a well-defined dynamics at finite
times, given by the Schroedinger equation. This dynamics can be recast
in terms of Feynman path integrals. Unfortunately, this does not
extend to the relativistic case.
The problem with relativistic path integrals is that they are formal
objects without a clear numerical meaning: whatever one tries to
compute with them turns out to be infinite.
Only selected objects derived from path integrals can be given
meaning by means of the renormalization procedure. The books show how to
give meaning to S-matrix elements between asymptotic in and out states.
The (Minkowski space) path integral is ill-defined as a number,
but, after regularization, well-defined as a formal power series in
hbar (the latter is often set to 1 to simplify typography,
but this makes things more difficult to grasp). The Legendre transform
of the logarithm is then also defined as a formal power series, and
by letting the coupling constants depend on the regularization parameter
eps (or Lambda), one can take the limit eps to 0 (or Lambda to infty)
to get the effective action, again as a formal power series.
From there, one can get the S-matrix, again as a formal power series.
For QED, the first few terms give highly accurate approximations;
for other QFTs, partial resumming of these series gives acceptable
results in agreement with experiment.
Expanding objects of interest as power series is the hallmark of the
so-called perturbative approach. In contrast, nonperturbative methods
try to give meaning to the actual sums, though no one succeeded so far.
Indeed, convergence questions are open, although it is generally
believed that (as most series coming from a saddle point expansion
of an integral) the series is only asymptotic. See the section on
'Summing divergent series' in this FAQ.

But I haven't seen a single article that gives meaning (i.e.,
infrared and ultraviolet finite, renormalization scheme independent
properties) to, say, quantum electrodynamics states at finite t and
their propagation in time.
People don't even know what an initial state should be in a 4D
relativistic QFT (i.e., from which space to take the states at
finite t); so how can they know how to propagate it...
Thus the standard textbook theory gives an S-matrix (or rather an
asymptotic series for it) but not a dynamics at finite times.

This does not mean that there is no dynamical reality underlying
4D relativistic QFT. It only means that no one has been able to find
a working, logically consistent framework for it.
Probably people working in QFT imagine something like a state evolution
in some unspecified Hilbert space underlying their formalism.
After all, this is how one justifies that the functional integral works.
Indeed, one can compute - nonrigorously, in renormalized perturbation
theory - many time-dependent things, namely via the Schwinger-Keldysh
(or closed time path = CTP) formalism; see, e.g.,
For example,
E. Calzetta and B. L. Hu,
Nonequilibrium quantum fields: Closed-time-path effective action,
Wigner function, and Boltzmann equation,
Phys. Rev. D 37 (1988), 2878-2900.
derive finite-time Boltzmann-type kinetic equations from quantum
field theory using the CTP formalism.

There are also successful nonrelativistic approximations with
relativistic corrections, within the framework of NRQED and NRQCD,
which are used to compute bound state properties and spectral shifts.
See, e.g., hep-ph/9209266, hep-ph/9805424, hep-ph/9707481, and
There is also an interesting particle-based approximation to QED
by Barut, which might well turn out to become the germ of an exact
particle interpretation of standard renormalized QED. See
A.O. Barut and J.F. Van Huele, Phys. Rev. A 32 (1985), 3187-3195,
and the discussion in Phys. Rev. A 34 (1986), 3500-3501,3502-3503.

Approximately renormalized Hamiltonians, and with them an approximate
dynamics, can also be constructed via similarity renormalization;
see, e.g.,
S.D. Glazek and K.G. Wilson,
Phys. Rev. D 48 (1993), 5863-5872.
It is usually applied in the front form (cf. the FAQ entry on
'Forms of relativistic dynamics'), as the many references to this
paper in (search for:
author:glazek author:wilson) show. See also hep-ph/0009071.
S.D. Glazek and K.G. Wilson,
Phys. Rev. D 49 (1994), 4214-4218.
gives a proof that, for renormalizable theories without
massless particles, similarity renormalization results in a
perturbatively finite Hamiltonian at all orders of perturbation theory.
A different, more explicit renormalized Hamiltonian framework is
given for quantum electrodynamics (but with a small photon mass
to avoid infrared problems) in the instant form in
E.V. Stefanovich,
Quantum Field Theory without Infinities
Ann. Phys. 292 (2001), 139-156.
Apart from the photon mass, it appears to be equivalent with QED
on the renormalized level, and provides on the perturbative level
a representation of the Poincare group in the instant form, and
therefore a dynamical interpretation. (But many of his foundational
views voiced in are far from
being justified.)
Thus both similarity renormalization and Stefanovich renormalization
give infrared-regularized QED a dynamical content at every order of
perturbation theory by providing approximate but finite,
UV renormalized Hamiltonians to each order that are asymptotic to a
formal Hamiltonian that acts formally as the
generator of translations in the Poincare group.
Convergence questions are not discussed. Also, the infrared
divergences are not addressed but must be removed by assuming a tiny
photon mass, thus spoiling gauge invariance.

While Dyson's argument (see the FAQ entry on 'Summing divergent
series') implies that it is not reasonable to demand a convergent
S-matrix expansion, the limit Hamiltonian in these approaches could
still be convergent. If this could be shown and the massless limit
for the photon performed, it would amount to an existence proof of
quantum electrodynamics.
In general, the correct Hamiltonian is
H = lim_{Lambda to inf} H(Lambda,g(Lambda,E,q_phys)),
where H(Lambda,g) is the canonical Hamiltonian with cutoff Lambda
and parameter vector g (containing the so-called bare mass, charge or
coupling constant, and field renormalization factor), and
g(Lambda,E,q_phys) is the cutoff-dependent parameter vector as
determined by renormalization conditions at energy scale E, which
relate g to a set of physical parameters q_phys (consisting, in case
of QED of the physical electron mass m, the physical electron charge e,
or, equivalently, the observed value of the fine structure constant
This limit probably exists, at least for renormalizable,
asymptotically free theories, at least in 1D and 2D field theory,
where this can be proved in certain cases. In 3D and 4D, one probably
needs also a Lambda-dependent inner product defining the Hilbert space
to ensure that one ends up in the right representation,
and Lambda-dependent wave functions to ensure that the limiting
renormalized wave functions remain bounded in the limiting
renormalized inner product.
The consistency problem in a Hamiltonian approach to quantum field
theory is precisely to show that this limit indeed exists.

The missing consistent dynamical theory in 4D relativistic QFT
may also have consequences for the foundations of quantum mechanics.
Clearly, measurements happen in finite time, hence cannot
be described at present in a fundamental way (i.e., beyond the
nonrelativistic QM approximation). Thus foundational
studies based on nonrelativistic QM are naturally incomplete.
This implies that it is quite possible that a solution of the
unresolved issues in relativistic QFT is related to the unresolved
issues in quantum measurement theory.

S9d. Perturbation theory and instantaneous forces
In classical relativity theory, causality demands that all forces
are retarded. In relativistic quantum theory, this principle is
somewhat obscured, due to the approximations needed to get a
dynamical picture. The general practice is to expand in powers
of v/c, where v is a velocity and c is the speed of light.
When doing this, the resulting formulas look instantaneous at
each order of perturbation theory, which might invite unfounded
However, the same already happens at the classical level, where
the situation is easy to understand. The retarded terms must
reappear when summing terms to all orders.
This is most easily seen by noting that a retarded differential
equation (for simplicity 1D, but the 4D case is similar)
dx(t)/dt = f(x(t-tau)),
when expanded in powers of the small parameter tau, becomes a
higher order ordinary differential equation at fixed order.
To see this, differentiate the original equation k times and
introduce new functions
x_0=x, x_1=dx/dt, ..., x_k=d^kx/dt^k
to get a system of retarded differential equations.
Then expand the equation for dx_k/dt in powers of tau up to order n-k,
where n is the fixed overall order of the approximation.
Then substitute terms on the right hand side.
The approximate equation is manifestly instantaneous, but it
describes the perturbative behavior of the retarded equation.
Thus perturbation theory in v/c cannot be used to decide about the
instantaneous or retarded nature of quantum dynamics.
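For a concrete check, consider the linear special case f(x) = -a x
(my choice for illustration). Then x(t) = exp(lam t) solves the
retarded equation if lam = -a exp(-lam tau), and truncating the
expansion in tau at order n gives an instantaneous equation
dx/dt = lam_n x. The Python sketch below compares the two. It assumes
the standard Taylor series of the Lambert W function to generate the
coefficients, using lam = W(-a tau)/tau; the function names are
hypothetical.

```python
import math

def retarded_rate(a, tau, iterations=200):
    """Rate lam of the solution x(t) = exp(lam*t) of dx/dt = -a*x(t - tau),
    i.e. the root of lam = -a*exp(-lam*tau), found by fixed-point
    iteration (converges for small a*tau)."""
    lam = -a
    for _ in range(iterations):
        lam = -a * math.exp(-lam * tau)
    return lam

def instantaneous_rate(a, tau, order):
    """Rate lam_n of the instantaneous equation dx/dt = lam_n*x obtained
    by keeping powers of tau up to `order`. Uses the series
    W(z) = sum_{n>=1} (-n)^(n-1)/n! z^n and lam = W(-a*tau)/tau."""
    z = -a * tau
    w = sum((-n) ** (n - 1) / math.factorial(n) * z ** n
            for n in range(1, order + 2))
    return w / tau

a, tau = 1.0, 0.1
exact = retarded_rate(a, tau)
errors = [abs(instantaneous_rate(a, tau, n) - exact) for n in range(5)]
print(exact, errors)
```

Order 0 gives lam_0 = -a and order 1 gives lam_1 = -a - a^2 tau. Each
truncation is an instantaneous equation, yet the sequence approaches
the behavior of the retarded one.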

S9e. QED and relativistic quantum chemistry
Relativistic quantum chemistry is needed to predict properties
of heavy atoms. This is usually done by invoking the Dirac-Fock
Hamiltonian, which is an approximation of the QED Hamiltonian
for which the multiparticle bound state problem is tractable.
Here are a few samples of what can be done:
The first is explicitly time-dependent;
the second is about bound states calculations;
the third shows how to add further QED corrections;
the fourth shows how the Dirac-Fock Hamiltonian arises as an
approximation of QED. The last gives a discussion of some
mathematical problems involved.
Electron correlations and spin-orbit interaction in two-photon
ionization of closed-shell atoms: A relativistic time-dependent
Dirac-Fock approach
Phys. Rev. A 42, 3801-3818 (1990)
Bieron et al.
Large-scale multiconfigurational Dirac-Fock calculations of the
hyperfine-structure constants and determination of the nuclear
quadrupole moment of 49Ti
Phys. Rev. A 59, 4295-4299 (1999)
Multiconfiguration Dirac-Fock calculations of transition energies
with QED corrections in three-electron ions
Phys. Rev. A 42, 5139-5149 (1990)
P Chaix and D Iracane
From quantum electrodynamics to mean-field theory.
I. The Bogoliubov-Dirac-Fock formalism
J. Phys. B: At. Mol. Opt. Phys. 22 (1989) 3791-3814
M Defranceschi and C Le Bris
Computing a molecule in its environment: A mathematical viewpoint
Int J Quantum Chemistry 71 (1999) 227-250

S9f. Are protons described by QED?
The traditional field equations of quantum electrodynamics (QED),
which can be found in any textbook on quantum field theory, describe
only electrons, positrons, and photons, but not protons, although
the latter have electromagnetic interactions.
The reason is that, unlike free electrons and positrons, free protons
do not obey the Dirac equation since they have form factors which are
(unlike for electrons and positrons) determined not only by interactions
with photons, but primarily by the inner structure of the proton.
Thus even bare protons cannot be understood as point particles, which
makes standard QED equations inapplicable.
To understand the proton's form factors from first principles needs
quantum chromodynamics (QCD) - and even then they are imperfectly
In the traditional QED treatment of molecules and their interaction
with light, protons and other nuclei are typically treated as classical
sources of electromagnetic fields when determining the structure of
the electron. (The resulting effective potential between the
nuclear positions is quantized afterwards if a full classical treatment
is not adequate). This gives excellent agreement with experiment,
in particular for the hydrogen atom.
Of course, one can treat QED together with a proton field as an
effective (and nonrenormalizable) theory, in which in addition to the
Dirac equation for the bare electrons there is a Dirac-like equation,
modified by the form factors, for the bare protons. To describe atoms
correctly, one needs also fields for neutrons and mesons, and
appropriate interaction terms between them, leading to quantum
hadrodynamics (plus QED). This accounts for all practically
relevant properties of atoms (including nuclear fission and fusion).

S10a. How are matrices and tensors related?
Mathematicians and physicists differ in the notation used for
vectors, tensors, matrices, and multilinear forms. Here is
a dictionary.
T^q = tensor product of q copies of the vector space T;
in particular, T^0=S is the algebra of scalar fields and T^1=T.
T^p_q = space of all linear mappings from T^q to T^p;
elements are (p,q)-tensors with p upper and q lower indices.
T_0^q = T^q
T_p^0 =: T_p = (T^p)^* is the so-called dual space of T^p;
in particular, T_1 = T^* is the dual space of T;
its elements are the linear forms = covectors.
One can associate with every A in T^p_q canonically a multilinear
mapping B: T^q x T_p --> S with
B(s,t) = t(As) for s in T^q, t in T_p,
and conversely; indeed, since the image As of s under A is in T^p,
its image t(As) is a well-defined scalar. Using the B's in place of
the A's gives an alternative way of defining tensors, although one
less convenient for visualization.
Given a basis on T and a dual cobasis on T^*, one can use coordinates.
Then physicists write
- elements of T as vectors = column vectors with an upper index,
- elements of T^* as linear forms = 1-forms = covectors = row vectors
with a lower index,
- elements of T^q as multivectors with q upper indices,
- elements of T_p as multicovectors with p lower indices,
- elements of T_p^q as mixed multi/co/vectors with p lower and q upper
indices.
(There is also a dual version of this, where vectors are considered
as rows and covectors as columns. The remainder then changes
accordingly.)
In particular,
(0,0)-tensor = scalar,
(1,0)-tensor = vector (vector in T=T^1) = column vector,
(0,1)-tensor = covector (vector in the dual space T^*=T_1)
= row vector,
(1,1)-tensor = matrix (linear mapping from T to T).
Clearly, the columns of the matrix A_i^k are column vectors = vectors,
the rows are row vectors = covectors, and the indexing is consistent.
The requirement that basis and cobasis are dual is equivalent to the
statement that for every vector u and covector w (i.e., linear mapping
from vectors to scalars),
w(u) = w_i u^i;
here the Einstein convention is used that formulas involving
pairs of equally labelled indices, one of them a lower index
and the other an upper index must be interpreted as a sum over these

Mathematicians using linear algebra (where no tensors of order
>2 appear) write instead all indices as lower indices, no matter
whether they belong to row vectors, column vectors, or matrices.
They also write all sums explicitly, consider all vectors given
by a single letter as column vectors, and write covectors (1-forms)
explicitly using the transposition sign (^T, but statisticians often
use a prime ' instead, which is also the form used in Matlab).
This has many advantages and allows a simple notation
which increases understandability of otherwise long formulas.
Phys. notation: s = x^k y_k                 x vector, y covector
Math. notation: s = sum_k y_k x_k
                or simply s=y^Tx.
Phys. notation: y^i = A^i_k x^k             x,y vectors, A matrix
Math. notation: y_i = sum_k A_ik x_k
                or simply y=Ax.
Phys. notation: s = A^i_i                   A matrix
Math. notation: s = sum_i A_ii
                or simply s = tr A (trace).
Phys. notation: y^i = A^i_j B^j_k x^k       x,y vectors, A,B matrices
Math. notation: y_i = sum_jk A_ij B_jk x_k
                or simply y=ABx.
Phys. notation: y^i = A^i_j B^j_k C^k_l D^l_m x^m
                x,y vectors, A,B,C,D matrices
Math. notation: y_i = sum_jklm A_ij B_jk C_kl D_lm x_m
                or simply y=ABCDx.
The linear algebra notation is compact and index-free,
in spite of the fact that coordinates are being used.
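The equivalence of the two ways of writing can be checked numerically.
The following Python sketch (plain lists, no libraries; the sample
matrices and helper names are arbitrary choices for illustration)
evaluates y_i = sum_jk A_ij B_jk x_k once with explicit index sums and
once in the index-free form y=ABx.

```python
def matvec(A, x):
    """y_i = sum_k A_ik x_k"""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def matmul(A, B):
    """(AB)_ik = sum_j A_ij B_jk"""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    """s = sum_i A_ii"""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
x = [5, 7]

# explicit index form: y_i = sum_jk A_ij B_jk x_k
y_index = [sum(A[i][j] * B[j][k] * x[k] for j in range(2) for k in range(2))
           for i in range(2)]
# index-free form: y = ABx
y_matrix = matvec(matmul(A, B), x)
print(y_index, y_matrix, trace(A))
```

Both forms give the same vector; the second one hides all summation
indices inside the definitions of the matrix operations.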

For higher order tensors, the advantages of the linear algebra
notation are less pronounced since one has to specify which
pairs of indices must be contracted. However, often, an index-free
notation is still possible:
Phys. notation: A_li = R_ijkl b^j c^k
Math. notation: A(u,v) = R(v,b,c,u)
Phys. notation: A_l^i = R^i_j^k_l b^j c_k
Math. notation: A(u,v^T) = R(v^T,b,c^T,u)
Phys. notation: A_i^j = R_i^k_k^j
Math. notation: A = tr_23 R,
where the subscripts indicate which indices must be contracted.

All this is completely independent of any metric.

If a metric = nondegenerate symmetric (0,2)-tensor g is given on T,
which associates with u,v in T the scalar g(u,v),
one can canonically identify vectors and covectors, at the
expense of some confusion if one is not careful.
This reads in physicists' notation as follows: The metric is
g_ik=g_ki (expressing the symmetry),
and for every vector u^k, the associated covector is
u_i = g_ik u^k.
Conversely, one can reconstruct the vector from the covector using
u^k = g^ik u_i,
where g^ik=g^ki is the inverse metric, a symmetric (2,0)-tensor which
for consistency must satisfy the equations
g_ij g^kj = delta_i^k (*)
with the Kronecker delta
delta_i^k = 1 if i=k, = 0 otherwise,
which is the identity matrix written as a (1,1)-tensor in index
notation. Nondegeneracy is precisely the solvability of (*) for the
dual metric.
Mathematicians find it confusing to label different objects with the
same symbol, and prefer to always distinguish between a vector and its
canonically associated covector. Given a basis of T and the dual
cobasis of T^*, coordinates (row and column vectors) can be used to
define the elements of T and T^*; the metric g in T_2 is represented
in these coordinates by an invertible symmetric matrix = (1,1)-tensor G
such that
g(u,v) = u^TGv for u,v in T.
The canonical pairing induced by the metric therefore associates with
the vector u the covector
w^T = u^TG. (**)
Conversely, one can reconstruct from the covector w^T the canonically
associated vector
u = G^{-1}w.
The dual metric therefore maps u^T, v^T to u^TG^{-1}v, and is
represented by the inverse matrix G^{-1}.
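As a small worked example of (**) and its inverse, the following
Python sketch lowers and raises an index with a 2x2 metric. The matrix
G, the vector u and the helper names are arbitrary choices for
illustration; exact rational arithmetic is used so that the round trip
can be checked exactly.

```python
from fractions import Fraction

# symmetric matrix with determinant 5, hence a nondegenerate metric
G = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(3)]]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

G_inv = inverse_2x2(G)
u = [Fraction(1), Fraction(4)]
w = apply(G, u)            # lowering: w_i = g_ik u^k, i.e. w^T = u^T G
u_back = apply(G_inv, w)   # raising:  u^k = g^ik w_i, i.e. u = G^{-1} w
g_uu = sum(w[i] * u[i] for i in range(2))   # g(u,u) = u^T G u
print(w, u_back, g_uu)
```

Since G is symmetric, w^T = u^T G and w = Gu describe the same
components, and applying G^{-1} recovers u.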

The relation between the physicists' form and the linear algebra form
of writing things can be inferred from (**) - we simply have
Phys. notation: g_ik
Math. notation: G = (g_ik)
Phys. notation: g^ik
Math. notation: G^{-1} = (g^ik)

Again, the linear algebra notation is compact and index-free,
in spite of the fact that coordinates are being used.

S10b. Is quantum mechanics compatible with general relativity?
The difficulty of reconciling quantum mechanics and general relativity
counts as one of the big problems of fundamental physics.
There appears to be a problem because canonical quantum gravity
based on quantizing the Hilbert action is nonrenormalizable.
(See the section on 'Renormalization in quantum gravity' in this
FAQ about how nevertheless to renormalize a nonrenormalizable field

The difference between renormalizable and nonrenormalizable theories is
that the former are specified by a (small) finite number of parameters
while the latter are specified by an infinite number of parameters.
In both cases, it is possible to extract approximate results from
computations, and the parameters can be tuned to fit the experimental
results. This gives a consistent procedure for predictions. Indeed,
many nonrenormalizable theories are in use as effective field theories.
(See hep-ph/0308266 for a recent survey on effective field theories.)
People who dislike nonrenormalizable theories do this on the basis of
a claim that their predictive value is nil because of the infinitely
many constants. But this is as unfounded as saying that thermodynamics
is not predictive because it depends on a function (the expression for
the free energy, say) that requires an infinite number of degrees of
freedom for its complete specification. Clearly, in the latter case,
the widespread use of finitely parameterized imperfect free energies
does not hamper the usefulness of thermodynamics. The same can be
said about nonrenormalizable field theories. It only implies that to
extract arbitrarily precise predictions one needs correspondingly
much information as input. We know that this is the case already for
many simpler phenomena in physics. (For indications that canonical
quantum gravity is nonperturbatively renormalizable see, e.g.,
hep-th/0110021, hep-th/0312114, hep-th/0304222.)
A different matter is the dream of a fundamental theory without any
free parameters, which of course conflicts with a theory in which
infinitely many parameters are needed for its complete specification.
But there is no theorem that says that nature is governed by unique
principles. It is quite likely that the designer of the universe
had some choices besides the constraints imposed by logical consistency.
Thus I think this dream (which also fuels string theory) is misguided,
and the correct quantum version of general relativity is standard,
nonrenormalizable canonical quantum gravity.
This means that, quite likely, general relativity is fully compatible
with quantum mechanics.
Of course this conflicts with the view of powerful groups within
theoretical physics, who maintain that their approach to quantum
gravity (either string theory, or loop quantum gravity) is the road
to success. But from what I have seen (at a somewhat superficial
level of understanding) I trust neither string theory
nor loop quantum gravity to be close to the truth. In any case,
both are completely separated from experimental verification.
If experiments in the near future can probe some features of quantum
gravity, it will be for small quantum systems interacting with
external electromagnetic and gravitational fields. See gr-qc/0408010.

S10c. Difficulties in quantizing gravity
(i) (mathematical) No consistent interacting relativistic quantum
field theory is known in 4 dimensions.
(ii) (theoretical) The accepted ways to avoid divergences in
expressions for scattering amplitudes that work in simpler theories
all fail because of the lack of renormalizability. See, e.g.,
the references in Section 2.2 of
(iii) (theoretical) The theories for which a (perturbatively)
finite scattering theory is available have not been related
quantitatively to the established theories.
A convincing classical limit (to general relativity),
nonrelativistic limit (to a multiparticle Schroedinger equation with
Newtonian interaction), and low energy limit (at currently accessible
energies no new particles apart from the graviton) would be needed.
(iv) (conceptual) The three limits pose severe constraints on possible
quantum gravity theories, and it requires much imagination to come up
with a conceptual basis in which these limits make sense and are
tractable. (But see the preceding entry.)
(v) (experimental) Quantum effects in gravity are so weak that no
experiments sensitive to quantum effects are in reach in the near
future, and the data from astronomy that may cast light on quantum
gravity are scarce. (Quantum gravity is not demanded by unexplained
data but only by the quest for consistency with particle physics.)

S10d. Renormalization in quantum gravity
Renormalization of QFTs is needed to make the coefficients in the
loop expansion (i.e., the expansion in powers of Planck's constant hbar)
of the S-matrix well-defined.
Canonical quantum gravity is the theory obtained by writing down the
Einstein-Hilbert action in a (3+1)-dimensional splitting (ADM formalism)
and either fixing coordinates and solving the constraints (reduced phase
space quantization) or quantizing using Dirac's approach to constrained
systems (Dirac quantization).
Covariant quantum gravity is the theory obtained as follows:
Write down the classical Hilbert action for general relativity,
look at the corresponding functional integral defined perturbatively
as for QED or QCD, and try to compute S-matrix elements using the
usual renormalization prescriptions for the integrals corresponding
to the various Feynman diagrams.
Quantum field theories are nowadays almost always defined in the
covariant way; the covariant approach has the advantage of being
manifestly invariant under the full symmetry group. (The canonical
approach to scalar QED fails in certain versions to preserve
Poincare symmetries, due to term ordering problems; see
gr-qc/9403065.) On the other hand, the canonical approach is
intrinsically nonperturbative, while the covariant approach needs
extra tricks (renormalization group enhancements) to get partial
nonperturbative results.
Covariant quantum gravity only works in the traditional way up to
1 loop (and together with matter not even then); at higher loops
(i.e., for corrections of higher order in the Planck constant hbar)
one needs more and more counterterms to make the resulting combination
of integrals finite. See
S. Deser,
Infinities in Quantum Gravities,
(and references [2,4] there). This is called 'nonrenormalizability',
and is the main blemish of covariant quantum gravity.
(For other potential problems, see, e.g., gr-qc/0108040.)
Note that quantum gravity, though nonrenormalizable in the
established sense, is renormalizable in a weak sense,
where infinitely many counterterms are allowed; see
J. Gomis and S. Weinberg,
Are Nonrenormalizable Gauge Theories Renormalizable?
Most researchers in quantum gravity want a renormalizable theory
in the strong sense (so that finitely many counterterms suffice);
then covariant quantum gravity is out, and people look
for fancy alternatives (loop quantum gravity, superstring
theory, etc.). However, these theories have their own difficulties.
Some online references are:
gr-qc/9803024: Strings, loops and others: a critical survey
of the present approaches to quantum gravity
gr-qc/9710008: Loop quantum gravity
hep-th/9709062: Introduction to superstring theory
astro-ph/0304507: Update on string theory
hep-th/0311044: The nature and status of string theory
physics/0605105: a short review of superstring theories
gr-qc/0410049 shows how gravity derives from string theory;
a more complete derivation is in section 3.7 of Polchinski's book.
Phys. Rev. Lett. 60, 2105-2108 (1988) discusses the lack of Borel
summability of the S-matrix expansion for the bosonic string. tells about the state
in 2003 concerning the claims of (super)string theory to be a
renormalizable quantum theory. Only the 2 loop case seems to be
settled; see arXiv:hep-th/0501197 and hep-th/0211111 (especially
Section 14 of the latter for the unsolved problems at 3 loops and
Others treat covariant quantum gravity just as they treat
nonrenormalizable effective field theories, and fare well with it.
See, for example,
C.P. Burgess,
Quantum Gravity in Everyday Life:
General Relativity as an Effective Field Theory
Living Reviews in Relativity 7 (2004), 5
for 1-loop corrections, and
Donoghue, J.F., and Torma, T.,
Power counting of loop diagrams in general relativity,
Phys. Rev. D, 54, 4963-4972,
for higher-loop behavior.
Section 4.1 discussed recent computational studies showing that
covariant quantum gravity regarded as an effective field theory
predicts quantitative leading quantum corrections to the
Schwarzschild, Kerr-Newman, and Reissner-Nordstroem metrics.
Only a few new parameters arise at each loop order, in particular only
one (the coefficient of curvature^2) at one loop.
In particular, at one loop, Newton's constant of gravitation becomes
a running coupling constant with
G(r) = G - (167/(30 pi)) G^2/r^2 + ...
in terms of a renormalization length scale r.
Here is a quote from Section 4.1:
''Numerically, the quantum corrections are so miniscule as to be
unobservable within the solar system for the forseeable future.
Clearly the quantum-gravitational correction is numerically extremely
small when evaluated for garden-variety gravitational fields in the
solar system, and would remain so right down to the event horizon even
if the sun were a black hole. At face value it is only for separations
comparable to the Planck length that quantum gravity effects become
important. To the extent that these estimates carry over to quantum
effects right down to the event horizon on curved black hole
geometries (more about this below) this makes quantum corrections
irrelevant for physics outside of the event horizon, unless the
black hole mass is as small as the Planck mass''
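To get a feeling for the numbers behind this quote, the following
minimal Python sketch evaluates the size of the one-loop term relative
to G, i.e., (167/(30 pi)) G hbar/(c^3 r^2), with hbar and c restored
(the formula above is in units with hbar = c = 1). The rounded SI
constants, the sample radii, and the function name are assumptions
made here for illustration; they are not taken from the cited review.

```python
import math

# Rounded SI values (assumed here for illustration only)
G = 6.674e-11        # Newton's constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34    # reduced Planck constant, J s
C = 2.998e8          # speed of light, m/s

PLANCK_LENGTH_SQ = G * HBAR / C**3   # squared Planck length in m^2


def relative_correction(r):
    """Magnitude of the one-loop term relative to G: (167/(30 pi)) l_P^2 / r^2."""
    return 167.0 / (30.0 * math.pi) * PLANCK_LENGTH_SQ / r**2


r_sun_surface = 6.96e8   # solar radius in m
r_sun_horizon = 2.95e3   # Schwarzschild radius of a solar-mass black hole in m
r_planck = math.sqrt(PLANCK_LENGTH_SQ)

for label, r in [("solar surface", r_sun_surface),
                 ("solar-mass horizon", r_sun_horizon),
                 ("Planck length", r_planck)]:
    print(f"{label:>20}: {relative_correction(r):.3e}")
```

With these inputs the relative correction stays far below 10^-70 even
at the horizon of a solar-mass black hole, and becomes of order one
only when r is comparable to the Planck length.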

S10e. Hadamard states and their Hilbert spaces
In his book on quantum field theory in curved spacetime
Wald delineates a class of 2-point functions called Hadamard states
that have locally the same kind of singular behavior as the flat
free 2-point functions. This class of states is also natural from
several other points of view, though I cannot give details off-hand
since this is slightly outside my field of knowledge.
Associated to each Hadamard state is a Gaussian state |0>
of the quantum field which is constructed from the 2-point function
via Wick's theorem. This state is often called a 'vacuum state',
though this is not quite appropriate, unless one allows the vacuum
to carry gravitational and electromagnetic fields. A more appropriate
name would be a 'coherent state' since it is the generalization of
coherent states in the Fock spaces considered in optics.
Each Gaussian state produces a Hilbert space of wave functions
consisting of linear combinations of the a*_k1 a*_k2 ...|0>,
weighted by sufficiently smooth functions of the k's to render
their norm finite.
All states in this Hilbert space are also physically reasonable,
but they do not have the same basic (vacuum-like)
status as the Hadamard states since they are no longer Gaussian,
and hence are harder to work with.
But you can evaluate <psi|phi(x)phi(y)|psi> in such a state by
expanding everything in terms of vacuum expectations of expressions
in a's and a^*'s and applying Wick's theorem. Their leading singular
behavior is probably the same as for the Gaussian state itself,
though I haven't tried to check this.
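The way a Gaussian state is built from its 2-point function can be
sketched in a few lines of Python: every even n-point function is the
sum, over all ways of grouping the n arguments into pairs, of products
of 2-point functions. The sketch assumes a vanishing 1-point function
(so that odd n-point functions are zero) and uses a made-up toy 2-point
function; all function names are hypothetical.

```python
def pairings(items):
    """Yield all ways to split an even-length list into pairs (i, j) with i before j."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def gaussian_n_point(two_point, points):
    """n-point function of a Gaussian state as a sum over pairings (Wick's theorem)."""
    if len(points) % 2:
        return 0.0   # assumption: zero 1-point function, so odd moments vanish
    total = 0.0
    for pairing in pairings(list(range(len(points)))):
        term = 1.0
        for i, j in pairing:
            # the original order of the two arguments is kept inside each pair
            term *= two_point(points[i], points[j])
        total += term
    return total


def toy_two_point(x, y):
    """Arbitrary illustrative 2-point function on the real line."""
    return 1.0 / (1.0 + (x - y) ** 2)


print(gaussian_n_point(toy_two_point, [0.0, 0.5, 1.0, 1.5]))
```

A 4-point function consists of 3 pairings and a 6-point function of 15,
so the number of terms grows quickly with n.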

S10f. Why do gravitons have spin 2?
The reason is that gravitation is described by a metric
(symmetric 2-tensor field) modulo general covariance,
which gives locally, in the tangent Minkowski space of any point,
a spin 2 representation of the Poincare group.
Gravitational waves have to be (classically) long range,
which requires (after quantization) massless particles.
Thus gravitons (although never observed) should be massless
spin 2 particles.

S10g. What is the tetrad formalism?
A way of writing general relativity such that it can be
applied to a spinor (e.g. electron) field.
A tetrad is a set of four linearly independent
vector fields e_0, e_1, e_2, e_3.
Considering them orthonormal in the sense that
g(e_j,e_k)=eta_jk (*)
where eta is the Minkowski metric defines the
metric g uniquely; conversely, for any metric one can
choose (on any chart) such an orthonormal basis.
If the manifold is parallelizable then one can choose
the ONB even globally. In 4 dimensions, any manifold
on which spinors can be defined consistently is
parallelizable (by a result of Geroch), hence reality
is most likely described by such a manifold.
Using (*), one can rewrite any formula involving the
metric into one involving instead tetrads, and many
things simplify - using tetrads is closer to the Cartan
formalism of differential geometry than using the metric
directly. E.g.,
sqrt(-det g) = det(e).
One has to be slightly careful not to confuse curved
and flat indices, but this is learnt very quickly.
Then one needs much less index shifting.
For gravitation coupled to a (classical) Dirac field,
the tetrad formalism is indispensable, since spinors
cannot be defined without a flat representation.
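As a small numerical illustration of the relation between tetrad and
metric, the following Python sketch builds the metric from a tetrad and
checks sqrt(-det g) = det(e) for the Schwarzschild geometry. It rests
on several assumptions that are not spelled out above: e is taken to
be the matrix of co-tetrad (dual basis) components e[a][mu], eta is
diag(-1,1,1,1), and the coordinates are (t, r, theta, phi) with
Schwarzschild radius rs. All names are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]   # Minkowski metric diag(-1,1,1,1); sign convention assumed


def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def metric_from_cotetrad(e):
    """g[mu][nu] = sum_a eta_aa e[a][mu] e[a][nu]."""
    n = len(e)
    return [[sum(ETA[a] * e[a][mu] * e[a][nu] for a in range(n))
             for nu in range(n)] for mu in range(n)]


def schwarzschild_cotetrad(r, theta, rs=1.0):
    """Diagonal co-tetrad of the Schwarzschild metric, valid for r > rs."""
    f = 1.0 - rs / r
    diag = [math.sqrt(f), 1.0 / math.sqrt(f), r, r * math.sin(theta)]
    return [[diag[a] if a == mu else 0.0 for mu in range(4)] for a in range(4)]


e = schwarzschild_cotetrad(r=3.0, theta=1.0)
g = metric_from_cotetrad(e)
print(math.sqrt(-det(g)), det(e))   # the two numbers should agree
```

Both numbers equal r^2 sin(theta) here, so the volume element can be
computed from the tetrad without taking a square root.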

S10h. Energy in general relativity
Energy is no absolute concept, but depends on the observer
(in the nonrelativistic case, by choice of a velocity,
in the relativistic case, by choice of time-like unit
vector that defines the direction of time and hence the
time coordinate).
In classical mechanics there is always a (up to rotations)
distinguished center of mass frame where the whole system
is at rest and the center of mass at zero.
The observer is usually (silently) considered to be at rest
with respect to that frame; then there is no ambiguity
left in the energy.
In special relativity things are already more problematic
since there is no natural center of mass. But one can fix
the time direction by taking it to be that of the total
4-momentum of the whole system. This again fixes a frame,
now up to Euclidean motions. On the other hand, this is not
what an observer (who has a slightly different proper time
depending on its 4-momentum) sees, and must be corrected
In general relativity the conserved total 4-momentum is
identically zero, so there is no longer a way to fix a
time direction. But assuming an asymptotically flat
space-time one can take its flat coordinate system
(determined up to a Poincare transformation) and
use it to chart the localized part, and gets a Minkowski
description, to which the preceding applies.
In general relativity, the concept of energy depends on the
choice of a spacelike hypersurface defining a region of space
and a time-like vector field along that hypersurface defining
the direction of time: Then the integral of [part of]
the (0,0)-component of the energy-momentum tensor over this
hypersurface defines the corresponding [part of the]
energy in this region.
This allows one to talk about the (observer-dependent) energy
of a subsystem, or of all matter in the universe, etc.
Observer-independent is the energy-momentum tensor density
as a whole, but not energy.
The weak-field limit defines a preferred coordinate system,
thus reducing the arbitrariness to the choice of the time
direction, and the nonrelativistic limit fixes this choice
to be the direction of the total momentum of the reference
object (e.g., the earth or sun or our galaxy). This makes
everything completely determined and gives us a good
energy for everyday life.

Note that using the concept of energy does not require
a global conservation law.
Even in nonrelativistic classical mechanics, energy is conserved
only for isolated systems, while the concept is used very
profitably in all sorts of nonisolated settings. It just means that
one needs to account in the balance equations for what happens
at the boundary, and (if necessary) include friction terms
(which describe, so to speak, the boundary to the neglected
microscopic degrees of freedom).
Thus, to connect general relativity to what most physicists actually
study, namely systems localized in a small region of space and time
(small may mean, e.g., a laboratory, the earth, the solar system,
or our galaxy - within an hour, a year, a few millennia, etc.)
one needs to make precise what energy means for such pieces of the
whole universe.
This requires that the observer specifies the region of space of
interest, and the length of time of interest, including the way time
is supposed to flow. The observer also has to specify which part of
the energy is of interest, i.e., the terms in the energy-momentum
tensor that define the system (contrasted to the environment -
which make up all the other terms).
After all that is done, energy has a well-defined meaning,
as given above.
On the other hand, the observer-independent notion generalizing
energy is the full energy-momentum tensor; its tensor
nature reflects the need for observer information to extract
from it numerical values, i.e. real numbers that can be compared
with experiment. But apart from energy it also contains the
observer-independent part of the information about momentum
and stress, which themselves are also observer-dependent.

S10i. What happened to the aether?
The aether as supporting substance for electromagnetic waves
was a standard hypothesis in the 19th century but fell out of
favor with the successes of relativity theory.
When in vogue, the aether was the substance filling empty space
- i.e., the physics of the aether is the physics of empty space.
In a way, the classical background field (also termed the 'vacuum',
or more neutrally a 'coherent state' or - in quantum gravity -
a 'Hadamard state') around which the quantum field is expanded into
excitation modes (photons, gravitons, etc.) is the modern equivalent
of the aether. However, nobody uses the term since it is fraught with
misleading connotations, and not really needed.
In modern language, the aether is called the vacuum, and the properties
of the aether are the properties of the vacuum.
While the 19th century aether was thought to be at rest,
the 20th century aether (= the vacuum in a quantum field theory)
is a Poincare invariant state with zero quantum numbers.
(In a putative quantum gravity, it would even be a diffeomorphism
invariant state, should something like that exist. The Unruh effect
indicates, however, that there is probably no objective vacuum,
since emptiness is observer dependent.)
Indeed, Poincare invariance is the modern way of saying
'being at rest' - the momentum of a Poincare invariant state is zero
in every frame of reference, and the mass of a Poincare invariant
state must also be zero, which implies that the vacuum is empty
in terms of mass. (It is however allowed to be filled by a constant
nonzero Higgs field, as required in the standard model.)

Identifying the aether and the vacuum is consistent with the way
Einstein thought about the topic, as the following quotes from
Einstein's lecture at the University of Leyden, 1920 (given in German;
quoted here in English translation), show:
''Since such fields also occur in the vacuum - i.e., in the free
aether - the aether also appears as the bearer of electromagnetic
fields.''
''It may be added that the whole change in the conception of the aether
which the special theory of relativity brought about consisted in
taking away from the aether its last mechanical quality, namely its
immobility.''
''One may assume the existence of an aether; only one must give up
ascribing a definite state of motion to it, i.e., one must by
abstraction take from it the last mechanical characteristic which
Lorentz had still left it.''
''The aether of the general theory of relativity is a medium which is
itself devoid of all mechanical and kinematical properties, but which
helps to determine mechanical (and electromagnetic) events.''
''Thus one may well also say that the aether of the general theory of
relativity has emerged from the Lorentzian aether through
relativization.''
''... To deny the aether ultimately means to assume that empty space
has no physical properties whatsoever...''
For the complete speech in German and in English translation, see
(the part with the above quotes is not freely available online).

Note that the QFT vacuum is considered by many as a very dynamical
entity, being able
1. to have excitations, namely single particles and multiparticle
states; in particular photons = quantized electromagnetic waves,
2. to exhibit spontaneous symmetry breaking, and
3. to generate random particle-antiparticle pairs.
(In some people's imagination, being able to
4. allow whole universes to pop up or disappear!)
Thus the modern vacuum looks much more like the 19th century aether
(whose excitations were the classical electromagnetic waves)
than the classical vacuum to which Einstein was referring.

S10j. What is time?
It is commonly asserted that in general relativity there is no
absolute simultaneity. On the other hand, it is asserted that
we see the Sun as it was 8 minutes ago and the Andromeda nebula
as it was 2.5 million years ago. These two assertions seem to
conflict with each other - apparently we have no diffeomorphism
invariant way of assigning a relative time to a distant object.
Let us take a closer look at the issues involved.
The invariant way of defining the present is to say that
y is present to x if the two points are in a spacelike relation,
and to say that y is earlier (or later) than x if y lies in or on
the past (or future) light cone of x.
Thus the present is well-defined as the complement of the
closed light cone.
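The classification just described can be sketched in Python for the
simplest case. The sketch assumes flat Minkowski space with signature
(-,+,+,+) and units in which c = 1 (times in seconds, distances in
light-seconds); the function names and the round value of 499
light-seconds for the distance to the sun are illustrative assumptions.

```python
def interval_squared(x, y):
    """Squared Minkowski interval between events (t, x1, x2, x3), c = 1."""
    dt = y[0] - x[0]
    dr2 = sum((b - a) ** 2 for a, b in zip(x[1:], y[1:]))
    return -dt * dt + dr2


def relation(x, y):
    """Classify event y relative to event x."""
    s2 = interval_squared(x, y)
    if s2 > 0:
        return "present"        # spacelike separation
    if y[0] < x[0]:
        return "earlier"        # in or on the past light cone of x
    if y[0] > x[0]:
        return "later"          # in or on the future light cone of x
    return "same event"         # zero interval and equal times: y coincides with x


here_now = (0.0, 0.0, 0.0, 0.0)
sun_emission = (-499.0, 499.0, 0.0, 0.0)   # light leaving the sun about 499 s ago
sun_same_t = (0.0, 499.0, 0.0, 0.0)        # the sun at equal coordinate time

print(relation(here_now, sun_emission), interval_squared(here_now, sun_emission))
print(relation(here_now, sun_same_t))
```

The emission event is classified as 'earlier' although its interval to
the observer is zero, while the sun at equal coordinate time is
'present'. With floating-point input, an exactly lightlike separation
is fragile; a tolerance would be needed in practice.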
Now suppose that you look at the sun. If one is really pedantic,
one would have to say that you see the sun in your eye, as a
2D object, and not out there in 3D. But we are accustomed to
interpret our sensations in 3D and hence put the sun far away
but into the here.
In general relativity, one goes a step further.
One thinks in terms of the 4D spacetime manifold and places the sun
there. Calculating the length of the geodesic gives a value of 0,
so the sun is not in your present. Considering the sign of the
time component in an arbitrary proper Lorentz frame, one finds that
the sun is in your past, like everything you observe.
But the amount of invariant time passed, as measured by the metric,
is zero. This looks like a paradox. What happened with the claimed
8 minutes?
The answer is that the metric time is not the right way to measure
time. It is the only time available in a Poincare-invariant flat
universe, or in a diffeomorphism invariant curved universe.
An empty universe where only noninteracting observers
move has no notion of simultaneity.
But a matter-filled, homogeneous and isotropic universe
generally has one, defined by the rest frame of the galactic fluid
with which general relativity models cosmology.
Since the fluid breaks Lorentz symmetry (except in
very special cases, which are ruled out by experiment)
it creates a preferred foliation of spacetime.
This foliation gives a well-defined cosmic time, when
scaled to make the expansion of the universe uniform.
(Actually there are several natural scalings = monotone
transformations of the time parameter;
see Section 27.9 in Misner/Thorne/Wheeler, so cosmic time
without a reference to the scale used is ambiguous.)
This cosmic time figures in all models of cosmology.
The values commonly talked about when quoting times
for cosmological events, such as the date of the big bang
or the time a photon seen now left the Andromeda nebula,
refer to this cosmological time.

S10k. Time in quantum mechanics
In the traditional formulation of quantum mechanics, time is not an
observable. Nevertheless it can be observed...

In the Schroedinger picture, the state is defined at fixed times,
which distinguishes the time. In this picture, time measurement
is difficult to discuss since the time at which a state is considered
is always sharp.
In the Heisenberg picture, time is simply a parameter in the
observables, and therefore also distinguished, but in a different way.
Parameters are in fact just continuous indices and not observables.
As 3 is not an observable while p_3 is one, so t is not an observable
but H(t) is one. Observables have at _each_ time an expected value;
the moment of time (''now'') is not modelled as observable.
But what can be modelled is a clock, i.e., a system with an observable
which changes with time in a predictable way. If the observable u(t)
of a system satisfies
ubar(t) := <u(t)> = u_0 + v (t - t_0) (v nonzero) (*)
with sufficient accuracy, one has a clock and can find out by means of
<u(t)> how much time
T = Delta t
passed between two observed data sets.
This is also the usual way we measure time in classical physics.
Of course, to be a meaningful time measurement, T must be large enough
compared with the intrinsic uncertainty
Sigma_T := |v^{-1}| sigma(u(t)),
where
sigma(u(t)) = sqrt(<(u(t)-ubar(t))^2>)
is the standard deviation in the properly calibrated
(quantum mechanical) state <.>. If (*) has significant errors
then Sigma_T is of course correspondingly larger.
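The arithmetic of such a time measurement can be sketched as follows.
The clock parameters (rate v, spread sigma_u) and the two readings are
invented numbers, and the function names are hypothetical; the sketch
only evaluates T = Delta t from (*) and the intrinsic uncertainty
Sigma_T.

```python
def elapsed_time(ubar_1, ubar_2, v):
    """T = Delta t between two readings of a clock with <u(t)> = u_0 + v (t - t_0)."""
    if v == 0:
        raise ValueError("a clock needs a nonzero rate v")
    return (ubar_2 - ubar_1) / v


def intrinsic_uncertainty(sigma_u, v):
    """Sigma_T = |1/v| * sigma(u(t))."""
    return abs(1.0 / v) * sigma_u


# Hypothetical clock: the pointer advances 2.0 units per second,
# with a standard deviation of 0.01 units
v = 2.0
sigma_u = 0.01

T = elapsed_time(ubar_1=1.0, ubar_2=7.0, v=v)
Sigma_T = intrinsic_uncertainty(sigma_u, v)
print(T, Sigma_T, T > 10 * Sigma_T)
```

Here T is 3 seconds and Sigma_T is 0.005 seconds, so the measurement is
meaningful; for a much shorter interval the same clock would not be.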

In relativistic quantum field theory (which in its covariant version
can only be formulated in the Heisenberg picture), the 1-dimensional
time t turns into the 4-dimensional space-time position x. Now x
is a vector parameter in the observables (fields), and hence is not an
observable. Space and time are now on the same level (allowing a
covariant point of view), but both as non-observables.
The observables are fields; positions and times of particles are
modelled by unsharp 1-dimensional world lines characterized by a high
density of the expectations of the corresponding fields.
(Think of the trace of a particle in a bubble chamber.)
For position and time measurement, one now needs a 4-vector field
u(x) with
<u(x)> = u_0 + V (x - x_0)
and a nonsingular 4x4 matrix V, and the intrinsic uncertainty
takes the form
Sigma_T := sigma(V^{-1}u(x)),
where
sigma(a(x)) = sqrt(<(a(x)-abar(x))^*(a(x)-abar(x))>).

Conclusion: In nonrelativistic quantum mechanics, time is always
measured indirectly via the expectations of distinguished observables
of clocks in calibrated quantum mechanical states. In relativistic
quantum field theory, the same holds for both position and time.

However, this analysis works only when one assigns to single clocks
a well-defined state, hence assumes a version of the Copenhagen
interpretation.
From the point of view of the minimal statistical interpretation,
one needs in contrast a whole ensemble of identically prepared
clocks to measure time...

Note that in relativistic quantum mechanics, a single particle is
described (in the absence of an external field) by an irreducible
representation of the Poincare group. Here only the components of
4-momentum and the 4-angular momentum are observables. From these,
one can reconstruct observer-dependent 3-dimensional (Newton-Wigner)
position operators satisfying canonical commutation rules, but not
a time operator.

S10l. Diffeomorphism invariant classical mechanics
In mechanics, time is a point in a 1-dimensional manifold,
and diffeomorphisms are just smooth reparameterizations of the time.
For any Lagrangian of the form
L(q,qdot,t) := U(q(t)) qdot(t),
where q is an n-dimensional column vector and U an n-dimensional
row vector, the action
S = integral L(q,qdot,t) dt
is diffeomorphism invariant. As a consequence, the Noether energy
(the formal Hamiltonian constructed in the transition from a Lagrangian
to a Hamiltonian formulation) vanishes identically and has no physical
content. For one can bring an arbitrary Hamiltonian system
xdot=H_p(p,x) , pdot=-H_x(p,x),
where H is the physically relevant energy, into the above form by
q^T = (x^T,p^T,s),
U(q) = (p^T,0^T,-H(p,x)).
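The invariance can be checked numerically. The following Python sketch
evaluates the action as a midpoint sum of U(q) dq along one and the
same curve q = (x, p, s), traversed with two different
parameterizations. The harmonic oscillator Hamiltonian, the particular
solution, the reparameterization, and the step count are illustrative
assumptions; all names are hypothetical.

```python
import math


def hamiltonian(p, x):
    """Harmonic oscillator with unit mass and frequency (illustrative choice)."""
    return 0.5 * (p * p + x * x)


def path(s):
    """A solution of xdot = H_p, pdot = -H_x, written as q = (x, p, s)."""
    return (math.cos(s), -math.sin(s), s)


def action(reparam, steps=20000):
    """Midpoint sum of U(q) dq with U(q) = (p, 0, -H(p, x)) for tau in [0, 1]."""
    total = 0.0
    q_old = path(reparam(0.0))
    for i in range(1, steps + 1):
        q_new = path(reparam(i / steps))
        x_mid = 0.5 * (q_old[0] + q_new[0])
        p_mid = 0.5 * (q_old[1] + q_new[1])
        dx = q_new[0] - q_old[0]
        ds = q_new[2] - q_old[2]
        total += p_mid * dx - hamiltonian(p_mid, x_mid) * ds
        q_old = q_new
    return total


s_linear = action(lambda tau: tau)
# a different smooth, strictly increasing parameterization of the same curve
s_warped = action(lambda tau: 0.5 * (tau + tau * tau))
print(s_linear, s_warped)
```

Both sums approximate the same value, -sin(2)/4 for this curve, since
the sum depends only on the points visited and not on the rate at
which the parameter tau runs through them.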
For a careful discussion see Section 4.3 of
PJ Olver,
Applications of Lie groups to differential equations,
Springer, New York 1993.
Those who can read German, can find more in the Section on
''Diffeomorphismeninvariante klassische Mechanik'' in my
German Theoretische-Physik-FAQ at
For diffeomorphism invariant reformulations of arbitrary field
theories, see
C.G. Torre,
Covariant phase space formulation of parameterized field theories,
J. Math. Phys. 33 (1992) 3802-3812

S10m. The concept of ''Now''
Time is passing - what is ''now'' in our subjective experience
changes. But there is no concept of ''now'' in physics.
Classical nonrelativistic mechanics does not know the concept of now.
One declares some time to be ''now'' - but which time one declares to
be ''now'' is completely subjective (i.e., in different situations it
will be declared differently). Similarly, one declares some position
to be ''here'', but which position you declare to be ''here'' is
completely subjective, in the same sense.
Classical relativistic mechanics does not know the concept of now,
either, but things change a little: Here one declares some event
(= spacetime point) to be ''here and now'' - but which event one
declares to be ''here and now'' is completely subjective.
Nonrelativistic quantum mechanics treats time completely differently
from space (time is a parameter, space coordinates are operators),
and introduces stochastic elements into the dynamics.
But with respect to ''here'' and ''now'', the situation is identical
with that in the classical nonrelativistic case.
Relativistic quantum mechanics restores the treatment of space and
time on equal footing (space and time coordinates are parameters),
and introduces stochastic elements into the dynamics.
But with respect to ''here and now'', the situation is identical
with that in the classical relativistic case.
Once one has chosen ''here'' and ''now'', respectively ''here and now'',
it serves as origin of the tangent hyperplane, in which localized, flat
physics can be done, reflecting faithfully what happens in a
neighborhood of the spacetime point. This is the domain of relativistic
quantum field theory.

S11a. A concise formulation of the measurement problem of QM
Quantum mechanics asserts in the Born rule (in this form also called
Lueders' rule) that when a particle prepared in a pure state passes an
ideal measuring instrument characterized by a finite family of mutually
orthogonal projectors P_k (with P_k = P_k^*, P_k P_l = delta_kl P_l
and sum_k P_k = 1), it transforms the pure state psi into the pure
state psi_k = P_k psi/sqrt(p_k) with probability p_k = psi^* P_k psi.
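A minimal Python sketch of this rule for a toy system may help. It
computes the probabilities p_k = psi^* P_k psi and the corresponding
post-measurement states, dividing P_k psi by sqrt(p_k) so that each
resulting state has unit norm. The 3-dimensional state, the choice of
projectors that are diagonal in the chosen basis, and the function
names are assumptions made for illustration.

```python
import math


def apply_projector(diag, psi):
    """Apply a diagonal projector (diagonal entries 0 or 1) to a state vector."""
    return [d * c for d, c in zip(diag, psi)]


def inner(a, b):
    """Inner product a^* b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def measure(projectors, psi):
    """Return a list of (p_k, psi_k) with p_k = psi^* P_k psi."""
    outcomes = []
    for diag in projectors:
        phi = apply_projector(diag, psi)
        p = inner(psi, phi).real
        # normalize P_k psi to unit norm; outcomes with p_k = 0 never occur
        post = [c / math.sqrt(p) for c in phi] if p > 0 else None
        outcomes.append((p, post))
    return outcomes


# Normalized state in a 3-dimensional toy Hilbert space (hypothetical numbers)
psi = [0.6 + 0j, 0.0 + 0.48j, 0.64 + 0j]
# Two mutually orthogonal projectors that sum to the identity
projectors = [[1, 1, 0], [0, 0, 1]]

outcomes = measure(projectors, psi)
for p, post in outcomes:
    print(round(p, 4), post)
```

The probabilities add up to 1 because the projectors sum to the
identity, and which of the states psi_k is realized in a single run is
not determined by the rule.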
This is a consistent rule in a purely statistical interpretation
in which psi is an objective property of a source (describing the
statistical behavior of an ideal - stationary and pure - source of
particles) rather than an objective property of each individual
particle.
The measurement problem arises when (as is commonly informally assumed)
the wave function is regarded as an objective property of a particle.
Then the stochastic transformation demanded by the Born rule, called
the collapse of the wave function, conflicts with the deterministic,
unitary dynamics of the wave function demanded by quantum mechanics
of the joint system consisting of particle+instrument+environment.
The unitary dynamics predicts that the joint system is in a macroscopic
superposition, which is not observed.
Note that a measurement does not need a conscious observer.
A measurement is any permanent record of an event, whether or not
anyone has seen it. Thus the terabytes of collision data collected
by CERN are measurements, although most of them have never been
looked at by anybody. We human beings only look at crude summaries
of such high tech data, but the collapse (which gives rise to
individual particle tracks) is clearly independent of whether or when
we look at them.

S11b. The double slit experiment
The double slit experiment, where a broad beam of particles passes
a screen with two slits, is one of the most fundamental quantum
experiments.
Standard wave function arguments for purely unitary quantum mechanics
predict (at best) that the effect of the screen is to turn a particle
in a pure state psi into a superposition of at least three terms,
one each for being in one of the two beams (for sufficiently wide
slits) or spherical waves (if the slits are narrow enough)
passing the slit and a third (or more) for the particle being stuck
somewhere on the screen.
This conclusion is arrived at as a simple consequence of linearity of
the Schroedinger equation, together with natural assumptions of what
happens for particles prepared in coherent states.
But it is generally believed - and assumed in _all_ discussions of
interference - that a double slit screen projects a particle with
incoming wave function psi with the correct Born probability to a
particle in a superposition of the two beams that pass the slits.
The challenge is to derive this from a quantum model of the situation,
without invoking explicit collapse anywhere in the derivation.
As long as this cannot be done convincingly, I don't consider the
measurement problem solved.
For a precise version of a (slightly different) challenge, see

S11c. The Stern-Gerlach experiment
Another basic quantum experiment is the Stern-Gerlach experiment.
An input beam of silver atoms is passed through an inhomogeneous
magnetic field in a fixed direction, which produces a sideways
classical force on each silver atom proportional to the atom's
magnetic moment. The magnetic field is said to split the input beam
into two separate beams corresponding to atoms of spin up and down,
respectively, which shows in the experiment as silver spots where
the beams hit a screen. If the beam of silver atoms is replaced by
a beam of electrons with very low intensity and the screen is replaced
by a more sensitive detector, one observes single detection events,
each randomly at one of the two spots. Each such event is generally
interpreted as a spin measurement (up or down), which makes sense
only if the wave function actually collapses to |up> or |down>.
(Though this is very questionable since the electron stops existing
as an object separable from the screen.)
If a blocker is put in the way of one of the beams, the corresponding
spot on the screen disappears, but if the blocker is sensitive as well,
single observations are found to occur at the blocker as well.
According to strictly orthodox but purely unitary quantum mechanics,
the situation is the following:
If a single particle leaves the magnetic area, it is in an entangled
state consisting of a bilocal superposition of wave packets somewhere
along the two beams. When it encounters the blocker,
this single electron turns into a still bilocal superposition of wave
packets: One remains stuck where the blocked beam meets the block
and the other continues its motion along the unblocked beam.
A little later, this second wave packet meets the screen, and we end up
with a still bilocal superposition of wave packets, now both sitting
at the end points of the respective beam. Without the blocker,
essentially the same happens, except that the electron ends up
in a superposition of two spots on the screen.
More precisely, what happens is that if one starts with a pure
state |x,p> |left>, where |x,p> denotes an approximately coherent
state with position x and momentum p, and
|left> = (|up> + |down>)/sqrt(2),
one gets approximately a superposition
1/sqrt(2)(|x^+(t),p^+(t)>|up> +|x^-(t),p^-(t)>|down>),
where the parameters in the approximately coherent states follow
classical paths in phase space determined by approximately classical
motion due to the magnetic field, the blocker and the screen.
After hitting blocker and screen, respectively, positions are constant
and momenta vanish, and the particle is in a superposition of two spots.
All this follows without difficulty from the superposition principle,
i.e., from the linearity of the Schroedinger equation.
To match observations in an objective interpretation of the wave
function, one needs a mechanism for changing the unobserved
superposition of spots into the observed definite spot. In an
observer-independent interpretation this has to happen in the split
moment between the particle feeling the presence of blocker or screen
and hitting or passing it. This is the so-called collapse of the wave
function.
According to the old school (von Neumann, London and Bauer, Wigner),
in a purely unitary setting it requires a conscious look at what
really happened to change the superposition of spots into a definite
spot, which gives quantum mechanics an uncomfortable subjective,
human-centered touch.

S11d. The minimal interpretation
The minimal interpretation of quantum mechanics does not model
what really happens - it only claims probabilities.
When quantum mechanics is applied to small systems, one usually asks
only for statistical information. Here a collapse simply means a
change of the point of view resulting in taking conditional
expectations, and all difficulties disappear.
In that case, each particle simply moves in an undeclared and
undeclarable fashion along the experimental setting, the classical
instruments are always in a definite state, and instead of
superpositions one has probabilities of observation of exactly one
of the possible results in the superposition.
Now all objectivity (sources and preparation, detectors and
measurements) is in the classical setting only, which coexists
with the somewhat spooky quantum world, connected by quantum statistics.
The problem here is how to unify what happens classically
and quantum mechanically. This minimal view becomes inconsistent
once one wants to consider the classical system as a large quantum
system - all objectivity disappears since macroscopic superpositions
are possible.
(Generally, nonlinear modifications of the Schroedinger dynamics
are considered a possible way out, but this introduces other problems.)
The main limitation of the minimal interpretation is that it does not
apply to systems that are so large that they are unique.
Today no one disputes that the sun is governed by quantum mechanics.
But one cannot apply statistical reasoning to unique systems, such as
the sun as a whole.
If quantum mechanics is a universal theory of nature, it should also
apply to the sun as a whole. At least we know that it applies to the
extent that it governs the energy generating processes in the sun.
The actual numerical analysis of models of the sun just
treats the nuclear reactions within a classical reaction-diffusion
framework, which (in principle - I don't know whether anyone has
actually done it) should be derivable from quantum mechanics using
statistical mechanics arguments.
A purely statistical interpretation also has a problem with the
notion of probability. (See the discussion on probability elsewhere
in this FAQ.) Probability (and hence the quantum state that predicts it)
is often seen as a subjective view about the experimenter's assumed
knowledge, or the knowledge an experimenter could gain when 100%
attentive. There is the subjectivist difficulty of determining
whose knowledge counts and why unobserved (and hence unknown)
classical processes still make a difference;
but one could imagine an ideal classical observer of the status of
Laplace's demon, for whom these problems would be absent.

S11e. The preferred basis problem
Born's rule, stated in the form that |<phi|psi>|^2 is the probability
that a system prepared in state psi is, upon measurement, found in state
phi, is valid only if a complete set of commuting observables is
measured and phi belongs to the preferred basis determined by the
experimental setting (i.e., the family of projectors).
Given the present state of the universe (which fixes the experimental
setting), there is no choice in the preferred basis. Thus, in a
mathematical model of quantum mechanics in the large, it has to
be deduced from the assumptions about the initial state and the
The preferred basis is fully determined by Nature, and that's why we can
find it out. Given an unknown instrument, one finds it out by
experimenting with the new piece, letting it interact with systems
of known properties, and matching the collected data to trial models
until one fits. This is how things are indeed done in practice.
The process is called model calibration (or parameter estimation if
the model is fixed up to adjustable parameters).
At first, one never knows a new instrument precisely, and has to check
out its properties. After sufficient experience with enough instruments,
one knows reasonably well what to expect of the next, similar one.
Then only fine-tuning is needed, which saves time. And this knowledge
can be used to create new instruments which are likely to behave a
certain way; but one still has to check to which extent they actually
do, since no theoretical design is realized exactly in practice.
Not even in the classical, macroscopic domain!
Nature's choice is systematic, hence after having
seen that a number of screens have a preferred position basis,
we conclude that this is the case generally. As for a spectrometer,
if it is built with a prism to analyze light, it is reduced by theory
to the observation of light or current at certain positions of the
screen, which is done in the preferred position basis. Something
similar can be said about the Stern-Gerlach experiment.
So once one knows _some_ of Nature's preferences and the general laws,
one can deduce other preferences.
The challenge posed in the measurement problem is to deduce
from first principles that a screen made of quantum matter,
with two slits in it, actually has a preferred position basis and
projects the incoming system to the part determined by the slits.

S11f. Master equation and pointer variables
On an approximate level, the preferred basis problem is approached
via quantum master equations.
A quantum master equation is a dynamical equation for the density matrix
of a dissipative quantum system, which approximates a quantum system
weakly coupled to an environment at time scales long compared to the
typical interaction time but short enough to avoid recurrence effects.
More precisely, the dynamics is given by a completely positive
Markovian semigroup in a representation named after Lindblad,
who discovered its general form.
For a classical damped linear system xdot(t)=Ax(t) with a matrix A
whose spectrum is in the left complex half plane, the contribution of x
in the invariant subspace corresponding to eigenvalues which are not
purely imaginary decays to zero, so that at large times t,
x(t) essentially approaches the invariant subspace corresponding to
purely imaginary eigenvalues.
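As a minimal numerical sketch of this classical statement, one may
take a hypothetical 3x3 matrix A (chosen here only for illustration,
and assumed diagonalizable) with one undamped rotation block
(eigenvalues +-i) and one damped mode (eigenvalue -1), and compute
x(t) = exp(tA) x(0) from the eigendecomposition:

```python
import numpy as np

# Hypothetical example matrix: a rotation block (eigenvalues +-i, undamped)
# and a damped mode (eigenvalue -1). Not taken from any particular system.
A = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0]])

def evolve(A, x0, t):
    """Return x(t) = exp(tA) x0 via the eigendecomposition of A.

    Assumes that A is diagonalizable.
    """
    w, V = np.linalg.eig(A)                    # eigenvalues w, eigenvectors V
    c = np.linalg.solve(V, x0.astype(complex)) # coefficients of x0 in eigenbasis
    return (V @ (np.exp(w * t) * c)).real

x0 = np.array([1.0, 0.0, 1.0])
x_late = evolve(A, x0, 20.0)

damped_part = abs(x_late[2])                   # component along the damped mode
oscillating_norm = np.linalg.norm(x_late[:2])  # component in the undamped plane
```

At t=20 the component along the damped mode has shrunk by a factor
exp(-20), while the part in the plane belonging to the purely
imaginary eigenvalues keeps its norm.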
For a quantum master equation, a similar analysis holds and shows that
(under suitable conditions) the density matrix at times much larger
than the so-called decoherence time approaches a block diagonal form
in a suitable basis. Thus it (almost) commutes with a special set
of observables, which define the 'pointer variables' of the system.
These pointer variables therefore behave essentially classically.
If the pointer variables form a complete set of commuting variables,
the density matrix approaches a diagonal matrix, and the basis in
which this happens is called the 'preferred basis'.
For details, see, e.g., cond-mat/0011204 or gr-qc/9406054
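The simplest instance of this mechanism is perhaps pure dephasing of
a single spin. The following sketch assumes a hypothetical model with
Hamiltonian H = (omega/2) sigma_z and a single Lindblad operator
L = sqrt(gamma) sigma_z; the values of omega and gamma, the step
count and the use of a plain Runge-Kutta integrator are illustrative
choices only.

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=complex)
omega, gamma = 1.0, 0.5            # hypothetical frequency and dephasing rate
H = 0.5 * omega * sz               # system Hamiltonian
L = np.sqrt(gamma) * sz            # single Lindblad operator

def rhs(rho):
    """Right-hand side of the Lindblad equation with one operator L."""
    comm = H @ rho - rho @ H
    LdL = L.conj().T @ L
    diss = L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return -1j * comm + diss

def evolve(rho, t, steps=2000):
    """Integrate the master equation from 0 to t with classical RK4."""
    h = t / steps
    for _ in range(steps):
        k1 = rhs(rho)
        k2 = rhs(rho + 0.5 * h * k1)
        k3 = rhs(rho + 0.5 * h * k2)
        k4 = rhs(rho + h * k3)
        rho = rho + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

# Start in the pure state |left> = (|up> + |down>)/sqrt(2).
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho0 = np.outer(psi, psi.conj())
rho_t = evolve(rho0, 10.0)
```

In this toy model the diagonal entries stay at 1/2, while the
off-diagonal entries decay like exp(-2 gamma t), so that rho_t is
nearly diagonal in the sigma_z basis. Here sigma_z plays the role of
the pointer variable and 1/(2 gamma) that of the decoherence time.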

S11g. Does decoherence solve the measurement problem?
Many physicists nowadays think that decoherence provides a fully
satisfying answer to the measurement problem. But this is an illusion.
Decoherence is the (experimentally verified) decay of
off-diagonal contributions in a density matrix (written in a
preferred basis), when information dissipates into unobservable
degrees of freedom in the environment of a system.
In particular, decoherence reduces a pure state to a _mixture_
of eigenstates. This is enough to induce classical features
in many large quantum systems, characterized by a lack of
interference terms.
Thus decoherence is very valuable in understanding the classical
features of a world that is fundamentally quantum.

On the other hand, the 'collapse of the wave function'
selects _one_ of the eigenstates as the observed one.
This ''problem of definite outcomes'' is part of the measurement
problem. It is still a riddle, and not explained by decoherence.
See the excellent survey article
M. Schlosshauer,
Decoherence, the measurement problem, and interpretations of quantum
mechanics,
Rev. Mod. Phys. 76 (2005), 1267-1305.
The champions of the decoherence approach are (not always
but at least sometimes) quite careful to delineate what decoherence
can do and what it leaves open. For example, Erich Joos, coauthor
of the nice book 'Decoherence and the Appearance of a Classical World
in Quantum Theory',
explicitly states in the last paragraph of p.3 in quant-ph/9908008
that (and why) decoherence does not resolve the measurement problem.
If the big crowd has a cruder point of view, it means nothing but
lack of familiarity with the details.

If the quantum mechanical state is taken only as a description
of a large ensemble, as in the Statistical Interpretation
(see next question), there is no problem.
But the riddle is present if one insists that the quantum mechanical
state describes a single quantum system (as seems to be required for
today's experiments on single atoms in an ion trap), which makes the
collapse a necessity.

In spite of all results about decoherence,
Wigner's mathematically rigorous analysis of the incompatibility
of unrestricted unitarity, the unrestricted superposition principle
and collapse, Chapter II.2 in:
J.A. Wheeler and W. H. Zurek (eds.),
Quantum theory and measurement.
Princeton Univ. Press, Princeton 1983,
in particular pp. 285-288, is unassailable.
In a nutshell, Wigner's argument goes as follows:
If a measurement of 'up' turns the complete system
(including the measured system, the detector, and the environment)
into the state
psi_1 = |up> tensor |up-detected> tensor |env_1>
and a measurement of 'down' turns it into
psi_2 = |down> tensor |down-detected> tensor |env_2>
and the projections of these states are stable under repetition
of the measurement (but possibly with different |env> parts),
then, by linearity, measuring the state
|left> = (|up> + |down>)/sqrt(2)
necessarily places the whole system into the superposition
(psi_1 + psi_2)/sqrt(2)
of such states and _not_ (as would be needed to account for the
experimental observations) into a state of the form psi_1 or psi_2,
depending on the result of the measurement.
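The linearity step of this argument can be checked in a toy model.
The following sketch makes simplifying assumptions that are not part
of Wigner's analysis: the environment is left out, the detector has
only three states (ready, up-detected, down-detected), and the
measurement interaction U is a hypothetical permutation matrix chosen
so that it maps |up>|ready> to psi_1 and |down>|ready> to psi_2.

```python
import numpy as np

up, down = np.eye(2)                   # spin basis states
ready, up_det, down_det = np.eye(3)    # toy detector states (an assumption)

def swap(i, j, n=3):
    """Permutation matrix exchanging basis vectors i and j."""
    P = np.eye(n)
    P[[i, j]] = P[[j, i]]
    return P

# U = |up><up| (x) swap(ready, up_det) + |down><down| (x) swap(ready, down_det)
U = (np.kron(np.outer(up, up), swap(0, 1))
     + np.kron(np.outer(down, down), swap(0, 2)))

psi_1 = np.kron(up, up_det)            # result of measuring 'up'
psi_2 = np.kron(down, down_det)        # result of measuring 'down'

left = (up + down) / np.sqrt(2)
out = U @ np.kron(left, ready)         # measuring |left> with the same U
```

Since U is linear, out equals (psi_1 + psi_2)/sqrt(2) and is neither
psi_1 nor psi_2; no choice of unitary U with the two stated
properties can avoid this.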
Wigner's reasoning implies that a
resolution of the measurement problem requires giving up one of
the two holy tenets of traditional quantum mechanics: unrestricted
unitarity or the unrestricted superposition principle.
Von Neumann and with him most textbook authors opted for giving up
unitarity by introducing collapse as a process independent of the
Schroedinger equation. This is no longer adequate since we now know
that there is no dividing line between classical and quantum, so
that a measurement can no longer be idealized in the traditional
fashion. But then there is no longer a clear place for when the
collapse happens, and more specific solutions are called for.
My paper
A. Neumaier,
Collapse challenge for interpretations of quantum mechanics
(see also
contains a collapse challenge for interpretations of quantum mechanics
that brings to a focus the requirements for a good solution of the
measurement problem.
In my opinion, the collapse is no fundamental principle but
the result of _approximating_ the entangled dynamics of a system
with its environment by a Markovian dynamics for the system itself,
resulting in a dissipative master equation of Lindblad type.
Such equations have a built-in collapse. The validity of the Markov
approximation is an _additional_ assumption beyond decoherence,
which is responsible for the collapse. Its nature is similar to
that of the so-called Stosszahlansatz in the derivation of the
Boltzmann equation.
Quantum optics and hence all high quality experiments for
the foundations of quantum mechanics are unthinkable without
the Markov approximation.

S12b. Which textbook of quantum mechanics is best for foundations?
For large ensembles, there seems to be no disagreement about the
interpretation. The book
A. Peres,
Quantum theory - concepts and methods,
Kluwer, Dordrecht 1993
is probably the most useful (i.e., both clear and applicable)
account of foundational aspects on this level. It is not the easiest
book, though, and reading it demands more attention than, say,
Sakurai's book. The latter is much more readable but has sloppy
foundations only; see the discussion in
There are also nice online treatises on certain aspects.
For the basics as related to quantum information theory, see, e.g.,
M. Plenio, Quantum Mechanics
M.B. Plenio and V. Vedral
Entanglement in Quantum Information Theory
M.B. Plenio and P.L. Knight
The Quantum Jump Approach to Dissipative Dynamics in Quantum Optics
Modern experiments appear to need, however, a quantum mechanics
of individual systems, and that's where controversy and confusion
prevail. I find none of the existing interpretations convincing,
and wrote up in Int. J. Mod. Phys. B 17 (2003), 2937-2980
= quant-ph/0303047 my own constructive (but incomplete) view
of the matter.
This paper is completely self-contained and works directly
with the statistical mechanics version of QM, with the
benefit that it avoids many of the traditional obscurities.
It discusses complementarity, ensembles, uncertainty relations,
probability, quantum logic, nonlocality, Bell inequalities,
sharpness of measurements, and rudiments of quantum dynamics.
The German ''Theoretische Physik FAQ'' at
contains a German language exposition of my consistent experiment
interpretation of quantum mechanics, which is a much extended version
of the above and gives a consistent setting for a quantum universe
which explains the nature of quantum chance. A paper on this
(in English) is in preparation.
For the history of the interpretation of QM, see the excellent book
Max Jammer
The philosophy of quantum mechanics.
The interpretations of quantum mechanics in historical perspective
Wiley, New York 1974
and the collection of original papers,
J.A. Wheeler and W. H. Zurek (eds.),
Quantum theory and measurement.
Princeton Univ. Press, Princeton 1983.

S12c. What is the role of quantum logic?
Quantum logic is a variant of logic often thought to be
appropriate for the foundations of quantum mechanics.
A good exposition is given in
K. Svozil,
Quantum Logic,
Springer, Singapore 1998.
The book is nice and useful for its material on hidden-variable
related arguments.
However, all that is commonly argued in textbooks about QM is argued
in terms of classical logic. Even a cursory look at the large
quantum mechanical literature reveals that quantum logic only has
a marginal spectator role in QM, while all proofs of all properties
of quantum systems have always been discussed using the familiar
classical logic. Even in Svozil's book, one can see that quantum
logic is argued in terms of classical logic, and that it has
essentially no role in the analysis of actual physical situations
(apart from those used for testing the foundations).
Beyond a certain point, quantum logic is sterile, which is the reason
it never figures in textbooks (except perhaps in passing).
All one ever needs to know about quantum logic (unless one wants to
specialize in it) is summarized in Sections 6 and 7 of my paper
Int. J. Mod. Phys. B 17 (2003), 2937-2980 = quant-ph/0303047.

S12d. Stochastic quantum mechanics
For certain Hamiltonians, the Schroedinger equation can be interpreted
as a classical diffusion process. This leads to the stochastic
quantum mechanics of Nelson. For an overview, see, e.g.,
While it gives an interesting aspect to quantum mechanics and its
classical limit, Nelson's description has a severe deficiency
in that it cannot handle the situation when the wave function vanishes
at some point. Writing psi = exp(R+iS) with real R and S, at all such
points R has a singularity, and S is entirely undefined. This happens,
e.g., for excited states of hydrogen,
hence is an integral part of standard quantum mechanics.
Even if one argues that such states are idealized and cannot occur,
it seems not to be possible to show that a state that is everywhere
nonzero will preserve this property under time evolution.
Thus Nelson's representations may develop spurious singularities
which are not in the observable part of quantum mechanics.
Also, it is awkward to do scattering calculations in Nelson's
framework. Moreover, Nelson, as quoted on p. 16 of the above paper,
says correctly,
''Quantum mechanics can treat much more general Hamiltonians
for which there is no stochastic theory.''
Thus it is unlikely to be useful as a 'fundamental' description
of nature.
Instead, natural stochastic forms of quantum mechanics are those of
quantum diffusion processes and quantum jump processes, in which the
wave function itself is regarded as a classical random object.
For their use in an experimental context, see, e.g., quant-ph/9805027.

S12e. Is there a relativistic measurement theory?
Real measurements take time, and are not instantaneous.
To treat the collapse as instantaneous is an idealization,
valid for many applications of quantum mechanics.
If relativistic effects play a role, one needs to use
quantum field theory. However, the measurement process in
quantum field theory is very poorly researched.
Thus statements about the conflict of instantaneous collapse
and relativity theory are based on very shaky grounds.
For measurement in the relativistic case (but without
invoking field theory) see quant-ph/9906034 and other papers
by Peres and/or Terno available in the arxiv.
They indicate the absence of problems, as far as such a simplified
analysis can be trusted.

S12f. Quantum mechanics and dice
It is frequently held that quantum mechanics makes only statements
about probabilities and not about single events.
This is very strange for a theory that claims to be the foundation
for everything scientifically observable.
According to the probabilistic view, quantum mechanics is incapable
of making any statement about dice that have been thrown already.
Although we can observe with perfect accuracy the value of the throw,
all that traditional quantum mechanics can give is the probability
distribution of the possible values of the throw, if this value were
not yet known.
Quantum mechanics has similar difficulties coping with other
actual events, since it never ever predicts what must happen or what
must have happened, but only gives probabilities.
This is of little consequence for quantities like the value of a
throw of three dice, but is a severe defect when discussing the
trajectories of the planets of the Solar System (for which we cannot
make meaningful statistics), of air planes, or of cars.
Clearly there must be something objective about these, although
traditional quantum mechanical interpretations - taken seriously -
are unable to account for definite individual events.

S13a. Random numbers and other random objects
In probability theory, a random number is just a random variable x,
i.e., a measurable function on the set Omega of possible experiments,
that assigns to each experiment omega in Omega the value x(omega)
of x in this experiment.
In the important, 'noninformative' case where the measure is invariant
under a group transitive on Omega, so that all experiments are
identical copies of one another, physicists refer to this set Omega
as a (classical) 'ensemble',
although they are usually too vague to express this in formal terms.
The terminology easily extends to the inhomogeneous case if one
allows in ensembles each realization with a different frequency.
Mathematicians prefer to leave the set Omega (which they call the
'sample space') unspecified and talk about 'realizations' in place of
'experiments'. Thus, for each experiment omega in Omega, x(omega) is
a realization of x, i.e., what physicists would call the value found
in this particular experiment.
By giving a specific definition of the sigma algebra of interest,
and specific recipes defining x(omega), one has a model world in which
realizations make perfect sense.
A difficulty is, of course, that we do not have such a model for the
real world, and hence must resort to empirical approximations when
treating real-life problems. (This places physicists at a slight
disadvantage; however, there is the compensating advantage that their
results apply to real life instead of only satisfying one's sense of
beauty and precision....)
The only thing not specified in probability theory (unless one specifies
a particular model as indicated above) is the mechanism that draws
the number, and hence there is no way to know which experiment omega
has been realized. Therefore, probability theory makes only statements
about _all_ realizations simultaneously.
Example. Given the axioms of probability theory, a random number
uniformly distributed between zero and one is defined as a random
variable x such that
<f(x)> = integral_0^1 f(s) ds
for all Lebesgue-integrable functions f on [0,1], and any x(omega) is a
realization of it, i.e., an actual number in [0,1]. (In particular,
random numbers are _not_ numbers!)
Mechanisms to draw numbers that may be used as approximations to a
sequence of independent realizations x(omega) are called random number
generators. They do not produce random numbers (since random numbers
are not numbers but measurable functions). Instead, they produce
sequences that look like typical
realizations of sequences of independent, uniformly distributed random
numbers (in the sense that they usually pass with high confidence level
certain statistical tests valid for such random sequences).
Therefore, the numbers they generate are used in practice as (often
completely adequate) substitutes for random numbers.
(On the other hand, there is no uniformly distributed random natural
number since the uniform measure on natural numbers,
mu(f) = sum_{k>=0} f(k) is not normalizable.)
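As a small illustration of this practical use, the following sketch
takes the output of a standard pseudo-random number generator as a
stand-in for independent realizations x(omega) and compares the
sample mean of f with the defining integral. The test function
f(s) = s^2, the seed and the sample size are arbitrary choices made
here for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)   # arbitrary seed, for reproducibility
samples = rng.random(1_000_000)      # numbers in [0,1) used as realizations

def f(s):
    """Hypothetical test function; its exact expectation is 1/3."""
    return s ** 2

estimate = f(samples).mean()         # empirical approximation of <f(x)>
exact = 1.0 / 3.0                    # integral_0^1 s^2 ds
```

The estimate agrees with the exact value only approximately, with an
error that shrinks roughly like the inverse square root of the
sample size.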

Random numbers are comparatively simple objects. More complicated random
objects need more sophisticated ensembles but otherwise everything
remains analogous.
Let us consider the physically important example of Brownian motion.
Brownian motion (the random walk in space) is modelled by an ensemble
whose realizations (members) are the functions from time to R^3 that are
Hoelder continuous with every exponent < 1/2. The probability of any
particular
realization of a random walk is exactly zero, and statements with
positive probability must hold in uncountably many realizations.
Nevertheless, the ensemble is precisely the set Omega composed of all
such realizations. And the appropriate sigma algebra carrying the
Wiener measure needed to describe the random walk is indeed an algebra
of subsets of Omega.
Repeatedly tossing a fair coin is also a (kind of trivial) stochastic
process. A fair coin that can be thrown an unlimited number of times
with independent outcomes (sampling with replacement) cannot be
modelled by the sigma algebra 2^{0,1} over Omega_1 ={0,1}, since
this has not even two independent bits. Its sigma algebra is based
on the infinite ensemble Omega_inf consisting of all possible
sequences of outcomes, and is the tensor product of infinitely many
copies of 2^{0,1}. This setting is necessary in order to provide
meaning to the concept of 'independent trial' which
underlies most of statistical reasoning.
Because of the assumed independence of the trials, one can reduce all
computations to computations within 2^{0,1}. This is generally done
in elementary probability theory, to simplify the presentation.
But once one looks at binary processes which are even slightly
correlated (history-dependent), one needs the full sigma algebra
over Omega_inf.
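The last point can be made concrete by comparing an independent fair
coin with a hypothetical 'sticky' coin whose first toss is fair and
whose later tosses repeat the previous outcome with probability 0.9
(a value assumed only for this sketch). Both assign probability 1/2
to heads in each single toss, so they differ only in statements
involving several tosses, that is, in the measure on sequences.

```python
from itertools import product

def fair_coin_prob(seq):
    """Probability of a finite outcome sequence for independent fair tosses."""
    return 0.5 ** len(seq)

def sticky_coin_prob(seq, stay=0.9):
    """Probability for a hypothetical history-dependent coin: the first
    toss is fair, each later toss repeats the previous one with
    probability `stay`."""
    p = 0.5
    for prev, cur in zip(seq, seq[1:]):
        p *= stay if cur == prev else 1.0 - stay
    return p

def marginal(prob, n, k):
    """P(k-th toss = 1), summed over all outcome sequences of length n."""
    return sum(prob(s) for s in product((0, 1), repeat=n) if s[k] == 1)

def both_heads(prob):
    """P(first two tosses are both 1)."""
    return sum(prob(s) for s in product((0, 1), repeat=2) if s == (1, 1))

single_fair = marginal(fair_coin_prob, 3, 2)
single_sticky = marginal(sticky_coin_prob, 3, 2)
pair_fair = both_heads(fair_coin_prob)
pair_sticky = both_heads(sticky_coin_prob)
```

The single-toss probabilities coincide, while the probability of two
heads in a row is 0.25 for the fair coin and 0.45 for the sticky one.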

S13b. What is the meaning of probabilities?
To say that
"The probability that someone in risk group A will die of cancer is 1/3"
does _not_ mean that
"10 out of 30 people in risk group A will die of cancer".
It only means that,
"on the average, 10 out of 30 randomly chosen people in risk group A
will die of cancer".
This can be checked (in the limit) by many repeated simulations,
or (directly) by a theoretical computation; both require that the
complete ensemble is available. Of course, in using probabilities for
predictive purposes, an insurance company tacitly assumes
(without any guarantee)
that the group of 30 people of interest is actually well approximated
by a random sample, so that one can expect 10 out of the 30 to die of
cancer. But this tacit assumption may well turn out to be wrong.
Statements about ensembles are in principle exactly checkable:
Operationally, to say that "The probability that someone in
risk group A will die of cancer is 1/3" means nothing more or less
than that exactly 1/3 of _all_ people in risk group A will die of
cancer.
(This assumes that risk group A is finite. For infinite ensembles,
to define the precise meaning of '1/3 of all', one needs to go into
technicalities leading to measure theory. Indeed, measures are the
mathematically rigorous versions of 'classical ensembles' in general.
For quantum ensembles, see quant-ph/0303047.)
Of course, we cannot check this before we have information about
how _all_ people in risk group A died, but once we have this
information, we can check and verify or falsify the statement.
In terms of precise mathematics: A classical ensemble is the set of
elementary events underlying the sigma algebra over which the measure
is defined. For example, in any finite sigma algebra containing random
variables representing a fair coin (realizations 0,1; 1=head,
with probability 50%), one has a finite ensemble of elementary events,
and exactly half of them come out heads. For an infinite sigma algebra,
the ensemble is infinite; but with the natural weighting, again exactly
half of them come out heads.
Usually, however, we only have incomplete knowledge about the ensemble.
For example, 'Tossing 10 fair coins' is just a sloppy way of saying
'Selecting a sample of size 10 from the total ensemble'.
The sigma algebra for modeling this must contain at least 10 independent
random variables representing fair coins. This is the case, e.g., in the
direct product of N>=10 sigma algebras isomorphic to 2^{0,1}. For N>10,
it is obvious that here the number of heads is 5 (=50%) only on
average over many random samples; and it is impossible to infer the
exact probability from a single sample.
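For a small case this can be verified by brute force. The sketch
below assumes N=12 (any N>=10 would do) and takes as ensemble all
2^N outcome sequences with equal weight; the i-th coin is the random
variable that reads off the i-th entry of the elementary event.

```python
from itertools import product

N = 12                                       # hypothetical size, any N >= 10 works
ensemble = list(product((0, 1), repeat=N))   # all elementary events, equal weight

def coin(i, omega):
    """Random variable x_i: outcome of the i-th coin in experiment omega."""
    return omega[i]

# Fraction of the whole ensemble in which coin 0 shows heads.
fraction_heads = sum(coin(0, w) for w in ensemble) / len(ensemble)

def heads_in_ten(omega):
    """Number of heads among the first 10 coins in a single experiment."""
    return sum(coin(i, omega) for i in range(10))

# Average over the complete ensemble, and the values that single
# experiments can actually produce.
average_heads = sum(heads_in_ten(w) for w in ensemble) / len(ensemble)
possible_counts = sorted({heads_in_ten(w) for w in ensemble})
```

Over the complete ensemble the fraction of heads is exactly 1/2 and
the average number of heads among 10 coins is exactly 5, but a single
experiment can give any count between 0 and 10.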
This is why statisticians say that they _estimate_ probabilities
based on _incomplete_ knowledge, collected from a sample.
The resulting estimated probabilities are known to be inherently
inaccurate; but they can be checked approximately by independent data
(cross-validation) providing confidence levels indicating how much
the predictions can be trusted.
On the other hand, they _compute_ probabilities from _assumed_ complete
knowledge about the ensemble, namely the theoretical probability
distribution. Thus if complete information goes in, exact information
comes out, while computations based on incomplete information
naturally only give approximate results inheriting some uncertainty
from the input.
Computed probabilities are powerful, but only if the assumed stochastic
model is correct. Empirical estimates are usually inaccurate but useful.
The two approaches are not contradictory; indeed, they are combined in
practice without any difficulty.
The only subjective aspect in the whole thing is the choice of a
stochastic model when making theoretical predictions; and even this
is made almost objective by the standard rules of statistical
inference and model building.
Indeed, the choice of ensemble is _always_ a subjective act that
determines what the probabilities mean. It encodes what the user is
prepared to assume about the given situation. Once the ensemble is
chosen - either as a theoretical, exactly known ensemble, defined by
specifying a distribution, or as a real life ensemble of which only
a (perhaps growing) sample is available - all probabilities have an
objective meaning.
A chosen ensemble is knowledge precisely if it is close to the correct
ensemble, and we have a good idea of how close it is.
That's why we value highly scientists such as Gibbs who guessed
the right ensembles for statistical mechanics, which turned out to be
a highly accurate description of equilibrium situations.
Only good choices are knowledge.
And what is good is found out only through proper checking,
and not through the principle of insufficient reason.
In case of tossing a coin we know that the fairness assumption is
usually reasonable, being consistent with experience.
In case of taking an exam with a newly appointed professor about whom
no one knows anything, reasoning from the two possible outcomes
(pass or fail) and the principle of insufficient reason to assign
a probability of 50% failure is ridiculous, and dangerous for those
who are not prepared.

S13c. What about the subjective interpretation of probabilities?
People with a preference for subjective interpretations would say
''probabilities depend on someone's knowledge''
instead of
''probabilities are a property of the ensemble under consideration''.
They talk of ''arrival of new information'' or ''learning'' instead of
the objective and unassailable formulation ''restricting the ensemble to
a subset defined by the conditions'' when discussing conditional
probabilities (the classical analogue of the statistical collapse of
the wave packet in quantum mechanics).
But knowledge is an even more poorly defined concept than probability,
which at least has an undisputed axiomatic basis. Thus explaining
probability in terms of knowledge only makes the meaning of probability
more foggy by putting it deep into the psychological realm.
Moreover, the subjective interpretation based on the Bayesian paradigm
of conditional probability has no formal way of coping with
misinformation (the ensemble grows if one learns that some of the
information one believed to know turns out to be false!) while,
on the objective level, the latter is just another change of the
ensemble.
Thus the subjective interpretation of probability is an inadequate
foundation for the use of probabilities in physics.
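The objective formulation of conditional probability as a restriction
of the ensemble can be spelled out in a few lines. The following
sketch assumes a hypothetical finite ensemble, namely all 36 equally
weighted outcomes of throwing two fair dice; the events chosen are
arbitrary examples.

```python
from fractions import Fraction
from itertools import product

# Hypothetical ensemble: all outcomes of two fair dice, equally weighted.
ensemble = list(product(range(1, 7), repeat=2))

def probability(event, ens):
    """Fraction of the ensemble `ens` in which `event` holds."""
    return Fraction(sum(1 for w in ens if event(w)), len(ens))

def restrict(ens, condition):
    """Conditioning: restrict the ensemble to the subset where
    `condition` holds."""
    return [w for w in ens if condition(w)]

def total_is_8(w):
    return w[0] + w[1] == 8

def first_is_even(w):
    return w[0] % 2 == 0

p_prior = probability(total_is_8, ensemble)
p_conditional = probability(total_is_8, restrict(ensemble, first_is_even))
```

Nothing about anyone's knowledge enters: the probability changes from
5/36 to 1/6 simply because it now refers to a different (smaller)
ensemble.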

S13d. Are probabilities limits of relative frequencies?
Sometimes, probabilities are regarded as limits of relative
frequencies as the number of trials becomes arbitrarily large.
But the weak law of large numbers only guarantees that most trial
histories will give a sequence of relative frequencies that converges
to the probability. It might just fail for the one actually tried...
Moreover, in practice we only have partial knowledge of such an infinite
sequence of trials (which cannot be performed). This knowledge about
the sample gives no knowledge at all about the limiting ensemble,
just as the knowledge of the first n items of a sequence gives, in
theory, no knowledge at all about the limit of the sequence.
That we often estimate the limit using a small part of the sequence
is another matter, and is like estimating probabilities from samples.
But the estimate may be completely wrong.
Thus interpreting probability as relative frequency is a philosophically
difficult interpretation step. For a thorough discussion, see the very
informative books by
T.L. Fine,
Theories of probability; an examination of foundations.
Acad. Press, New York 1973.
L. Sklar,
Physics and Chance,
Cambridge Univ. Press, Cambridge 1993.
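
The caveat about the law of large numbers can be made quantitative.
The following is a minimal sketch only (the function name and the
tolerance eps are illustrative assumptions, not part of the argument
above). For a coin with known probability p, it computes the exact
probability that the relative frequency after N trials deviates from p
by more than eps. This probability shrinks with N but is never zero,
so the one history actually tried may be among the exceptions.

```python
from fractions import Fraction
from math import comb


def prob_frequency_off(p, n_trials, eps):
    """Exact probability that |k/N - p| > eps for k successes in N trials."""
    p = Fraction(p)
    eps = Fraction(eps)
    total = Fraction(0)
    for k in range(n_trials + 1):
        # Sum the binomial weights of all histories with a "bad" frequency.
        if abs(Fraction(k, n_trials) - p) > eps:
            total += comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
    return float(total)


# Fair coin, tolerance 1/10 (both chosen for illustration only).
p100 = prob_frequency_off(Fraction(1, 2), 100, Fraction(1, 10))
p1000 = prob_frequency_off(Fraction(1, 2), 1000, Fraction(1, 10))
print(p100, p1000)
```

Exact rational arithmetic is used so that the tiny tail probabilities
are not lost to rounding.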

S13e. How meaningful are probabilities of single events?
(Note: In this FAQ, 'event' is always understood in the ordinary sense
of the word, as 'something specific happening'.
In axiomatic probability theory based on Kolmogorov's axioms,
there is a slightly different, formal meaning of an event as an
element of the underlying sigma algebra.
An axiomatic foundation of probability theory equivalent to that of
Kolmogorov, but not based on sigma algebras, can be found in the book
'Probability via Expectation' by Peter Whittle, and a quantum extension
in quant-ph/0303047.)
Probabilities of single events are not at all meaningful
- at least not in any scientific sense -, although we are
used to scientific-sounding phrases such as
''There is a 60% probability for rain tomorrow''.
Instead, probabilities are properties of ensembles of events.
In the case just cited, the ensemble is the set of all tomorrows
(or rather an infinite idealization of it), and the probability is not
an exact probability, but an estimate computed on the basis of a sample
of former 'tomorrows', together with statistical weather models.

Probability assignments to single events can be neither verified nor
falsified. Indeed, suppose we intend to throw a coin exactly once.
Person A claims 'the probability of the coin coming out head is 50%'.
Person B claims 'the probability of the coin coming out head is 20%'.
Person C claims 'the probability of the coin coming out head is 80%'.
Now we throw the coin and find 'head'. Who was right? It is undecidable.
Thus there cannot be objective content in the statement
'the probability of the coin coming out head is p', when applied to
a single case. Subjectively, of course, every person may feel
(and is entitled to feel) right about their probability assignment.
But for use in science, such a subjective view (where everyone is right,
no matter which statement was made) is completely useless.

What is the probability that a particular person, Mrs. X, will die of
cancer? This is a single event that either will happen, or will not
happen. If one considers this single event only, the probability is 1
or 0, depending on what will actually happen. (But this sort of
probability is not what we talk about in physics.)
On the other hand one may assign a probability based on some facts
about Mrs. X (smoker? age? gender? already ill? etc.).
Each collection of such facts determines an ensemble of people,
from which one can form a statistical estimate of the probability.
Which probability one assigns clearly depends on the sort of ensemble
one regards Mrs. X as belonging to. Mrs. X belongs to many
ensembles, and the answer is different for each of these.
Thus probabilities are meaningful not as a property of the single event
but only as a property of the ensemble under consideration.
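
The dependence on the chosen ensemble can be illustrated with a small
sketch. The records below are invented for illustration only (they are
not real epidemiological data), and the function name and the two
attributes are hypothetical. The same person, a smoker over 60,
receives a different estimate for each ensemble she is counted in.

```python
# Hypothetical records: (smoker, over_60, died_of_cancer).
# Invented numbers, for illustration only.
RECORDS = (
    [(True, True, True)] * 3 + [(True, True, False)] * 1
    + [(True, False, True)] * 1 + [(True, False, False)] * 3
    + [(False, True, True)] * 1 + [(False, True, False)] * 3
    + [(False, False, False)] * 4
)


def estimate(records, smoker=None, over_60=None):
    """Relative frequency of cancer deaths in the selected sub-ensemble.

    A condition left as None does not restrict the ensemble.
    """
    selected = [
        r for r in records
        if (smoker is None or r[0] == smoker)
        and (over_60 is None or r[1] == over_60)
    ]
    return sum(r[2] for r in selected) / len(selected)


print(estimate(RECORDS))                             # everybody
print(estimate(RECORDS, smoker=True))                # smokers
print(estimate(RECORDS, smoker=True, over_60=True))  # smokers over 60
```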
This can also be seen from the mathematical foundations. Classical
probabilities are determined by measures over some sigma algebra.
All statements in measure theory are _only_ about expectations and
probabilities of all possible (often infinitely many) realizations
simultaneously, and say nothing at all about any particular
realization.
For a random sequence consisting of 9 independent bits, with 0 and 1
equally likely, the sequence 111111111 has exactly the same status
and exactly the same probability as the sequences 110100101 or
000000000, although only the second sequence looks random.
(A random sequence is _not_ a sequence of numbers but a sequence of
random numbers = measurable functions. Only the _realizations_ of a
random sequence are sequences of ordinary numbers. Sequences of
ordinary numbers are _never_ random, but they can 'look random',
in a subjective sense.)

S13f. Objective probabilities
Consider a physical die (for simplicity assumed perfectly symmetric)
with six elementary events 1,...,6.
If the die is not thrown, all events are equivalent, and the
probabilities are 1/6 for each event. These probabilities are
associated to the die (_not_ to a throw), and can be determined
uniquely from the knowledge of the geometry and composition of
the die. All of probability theory happens at this level,
since the 'happening' of an event is not formally defined.
If the die is thrown, a given event (say 3) either happens or
does not happen. If the event happens (does not happen), the
statement 'This throw is a 3' is true (false), hence has a
probability of 100% (0%), although before the throw, these
probabilities are not yet known. These probabilities are
associated to each particular throw (_not_ to the die).
Thus a die functions as a potential stationary source of throws,
and hence _defines_ an ensemble of (conceivable) throws.
An actual throw, though a realization of this ensemble,
is determined by the outcome, and cannot be assigned a
probability different from 0 or 1.
[See, e.g., the wikipedia entry
''Omega is a non-empty set, sometimes called the "sample space",
each of whose members is thought of as a potential outcome of a
random experiment.''
'is thought of' signifies the interpretational level.
Probabilities are only about 'potential outcomes' (what I call
conceivable), not about actual ones.]

A stationary source has objective probability distributions
for random vectors computable from observations made on it.
These are given in terms of an objective expectation mapping
and an associated density. In principle, this density can be
measured arbitrarily well, and if the form and composition of
the source is known, can be objectively predicted from
physical theories.
Thus objective probability distributions exist always when the
generating ensemble is completely known, and more generally
whenever it is objectively determined.

Similarly, in quantum theory, a laser is a potential stationary source
of photons, the oven in a Stern-Gerlach experiment is a stationary
source of electrons, etc. The sources are in well-defined,
objective quantum mechanical states, defining ensembles with
objectively predictable properties.

S13g. How probable are realizations of stochastic processes?
In a stochastic setting, _every_ realization of a stochastic process
typically has probability 0; nevertheless, exactly one of them actually
happens.
Taking for simplicity the stochastic process defined by independent
flips of a fair coin, a realization is an infinite binary sequence,
and each of these has probability zero. (Partial realizations of
finite length N each have a probability of 2^-N which is extremely
tiny for large N.)
For discrete stochastic processes having a continuum of allowed values
at each time step, even partial realizations have zero probability,
except in degenerate situations. The same holds for continuous-time
stochastic processes.
The case of measuring electron spin, say, is more difficult to analyze
because as stated, it is not yet a well-defined stochastic process.
If it is taken as a continuous measurement, the flips occur at random
times, and so even a single flip at a definite time has probability
zero.
If it is taken as a discrete process, we need to specify a measuring
protocol that applies at definite, equidistant times. Then it is likely
that there are some correlations, and probabilities even of finite
pieces of a particular realization are hard to come by. Nevertheless,
under reasonably random circumstances (for example, when measuring spins
of independent electrons), the probability of the most likely sequence
of N measurements decreases exponentially with N, and the probability of
a complete realization (infinite sequence) is again zero.

S13h. How do probabilities apply in practice?
If one has a sound probabilistic model of a multitude of N independent
events e_i with the same assigned probability p, one would be surprised
if the frequency of events were not close to p, within a small multiple
of sqrt(p(1-p)/N). Rather than just accepting a rare occurrence
(e.g., a brick going upwards due to fluctuations) as something within
one's probabilistic model, one would probably rather try to explain
it away by assuming a hidden, unobserved cause (someone throwing it).
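
As a minimal sketch of this rule of thumb (the function names and the
threshold of 3 are illustrative assumptions), one can express the
deviation of an observed frequency from p in units of sqrt(p(1-p)/N)
and call the observation surprising if it exceeds a small multiple:

```python
import math


def surprise_in_sigmas(p, n_events, n_observed):
    """Deviation of the observed frequency from p, in units of sqrt(p(1-p)/N)."""
    sigma = math.sqrt(p * (1.0 - p) / n_events)
    return (n_observed / n_events - p) / sigma


def is_surprising(p, n_events, n_observed, threshold=3.0):
    """True if the deviation exceeds `threshold` units (3 is a convention)."""
    return abs(surprise_in_sigmas(p, n_events, n_observed)) > threshold


# 10000 events with assigned probability 1/2 (hypothetical counts).
print(is_surprising(0.5, 10000, 5050))  # one unit away from p
print(is_surprising(0.5, 10000, 6000))  # twenty units away from p
```

The threshold is a matter of convention; it only marks the point where
one starts to look for a hidden cause instead of blaming chance.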
The way probabilities are used in practice is always as informative
guides of what to expect, but not as statements with a 100% exact
meaning. I wrote a paper on surprise:
A. Neumaier,
Fuzzy modeling in terms of surprise,
Fuzzy Sets and Systems 135 (2003), 21-38.
that may help understand the fuzziness inherent in our concepts of

S13i. Incomplete knowledge and statistics
It is often erroneously assumed that incomplete knowledge can
always be described by statistics. But this is by no means the case.
If one knows about a number x only that it is in [0,1], one cannot
apply statistics since one knows nothing at all about the distribution
(except for its support). It is perfectly consistent with
the knowledge that in fact always x=0.75, except that one does not
know it, or that x oscillates regularly, or....
The ignorance is in this case simply deterministic lack of information.
In particular, it would be a mistake to assume that the distribution
is uniform (ignorance interpretation). Using the noninformative prior
of the Bayesian school, which makes this assumption, may be seriously
misleading.
More realistically, in engineering, an uncertainty of 5% in the elastic
modulus of steel bars may be the only information available
to an architect; but 3/4 of the bars used later in the building
may have a deviation of 0.1% and the remaining quarter one of 3.7%.
In general, all one can deduce from information that takes the form of
deterministic bounds on a vector x of variables and/or on expressions
in x are bounds on derived quantities y=f(x) one would like to compute
from it. This leads to global optimization problems, where f(x) is
minimized or maximized subject to the known constraints. See
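
For the steel bar example, a minimal sketch of such a bound might look
as follows. It assumes, purely for illustration, that the quantity of
interest is the tip deflection F L^3/(3 E I) of an end-loaded
cantilever; all numbers and names are hypothetical. Since the
deflection is monotone in the modulus E, the extreme values are
attained at the ends of the interval for E; for a general f(x) one
would need a global optimization method instead.

```python
def tip_deflection(force, length, modulus, inertia):
    """Tip deflection of an end-loaded cantilever: F L^3 / (3 E I)."""
    return force * length**3 / (3.0 * modulus * inertia)


def deflection_bounds(force, length, modulus_nominal, rel_uncertainty, inertia):
    """Guaranteed bounds when only |E - E_nominal| <= rel_uncertainty * E_nominal
    is known. Uses that the deflection decreases monotonically with E."""
    e_lo = modulus_nominal * (1.0 - rel_uncertainty)
    e_hi = modulus_nominal * (1.0 + rel_uncertainty)
    return (tip_deflection(force, length, e_hi, inertia),
            tip_deflection(force, length, e_lo, inertia))


# Hypothetical bar: 1 kN load, 2 m length, E = 200 GPa +- 5%, I = 1e-6 m^4.
force, length, e_nominal, inertia = 1000.0, 2.0, 200e9, 1e-6
lower, upper = deflection_bounds(force, length, e_nominal, 0.05, inertia)
print(lower, upper)
```

No distribution of E inside the interval is assumed; the bounds hold
whether the actual deviation is 0.1%, 3.7%, or anything else within 5%.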
The lack of knowledge that statistics can model is of a different kind.
It assumes that the _maximal_attainable_ knowledge about the system
- at the given level of description - is a probability distribution,
and that this probability distribution is indeed known.
The knowledge of the probability distribution can be replaced by a
qualitative knowledge of it (e.g. 'some Gaussian distribution'),
together with the knowledge of an incomplete sample from the ensemble
of interest; in this case, however, the best statistics can offer are
parameter estimation techniques that give credible probability
distributions compatible at some confidence level with the sample data.
There are also combinations of both kinds of incomplete information,
where one knows the maximal knowledge about a system should be
stochastic, but one lacks complete information on the distribution.
This is handled by the field of 'imprecise probability', although
there is not yet a generally accepted way for analyzing such
situations, and different schools with quite different basic
approaches compete. See, e.g., the links in

Theoretical physics is always concerned about describing the maximal
attainable knowledge about a system (at a given level of description),
irrespective of what anyone actually knows about it. In this way,
and only in this way, it is possible to get close to the objectivity
that science is always striving for.

S13j. Priors and entropy in probability theory
For a probability distribution on a finite set of alternatives,
given by probabilities p_n summing to 1, the Shannon entropy is
defined by
S = - sum p_n log_2 p_n.
The main use of the entropy concept is the maximum entropy principle,
used to define various interesting ensembles by maximizing the entropy
subject to constraints defined by known expectation values
<f> = sum p_n f(n)
for certain key observables f.
If the number of alternatives is infinite, this formula must be
appropriately generalized. In the literature, one finds various
possibilities, the most common being, for random vectors with
probability density p(x), the absolute entropy
S = - k_B integral dx p(x) log p(x)
with the Boltzmann constant k_B and Lebesgue measure dx.
The value of the Boltzmann constant k_B is conventional and has no
effect on the use of entropy in applications.
There is also the relative entropy
S = - k_B integral dx p(x) log (p(x)/p_0(x)),
which involves an arbitrary positive function p_0(x). If p_0(x)
is a probability density then the relative entropy is nonpositive
(by Gibbs' inequality), and vanishes only if p = p_0.
For a probability distribution over an _arbitrary_ sigma algebra
of events, the absolute entropy makes no sense since there is no
distinguished measure and hence no meaningful absolute probability
density. One needs to assume a measure to be able to define a
probability density (namely as the Radon-Nikodym derivative,
assuming it exists). This measure is called the prior (it is often
improper = not normalizable to a probability density).
Once one has specified a prior dmu,
<f(x)> = integral dmu(x) rho(x) f(x)
defines the density rho(x), and then
S(rho)= <-k_B log(rho(x))>
defines the entropy with respect to this prior. Note that the
condition for rho to define a probability density is
integral dmu(x) rho(x) = <1> = 1.
In many cases, symmetry considerations suggest a unique natural prior.
For random variables on a locally compact homogeneous space (such as
the real line, the circle, n-dimensional space or the n-dimensional
sphere), the conventional measure is the invariant Haar measure.
In particular, for probability theory of finitely many alternatives,
it is conventional to consider the symmetric group on the set of
alternatives and take as the (proper) prior the uniform measure, giving
<f(x)> = sum_x rho(x) f(x).
The density rho(x) agrees with the probability p_x, and the
corresponding entropy is the Shannon entropy if one takes k_B=1/log 2.
For random variables whose support is R or R^n, the conventional
symmetry group is the translation group, and the corresponding
(improper) prior is the Lebesgue measure. In this case one obtains
the absolute entropy given above. But one could also take as prior
a noninvariant measure
dmu(x) = dx p_0(x);
then the density becomes rho(x)=p(x)/p_0(x), and one arrives at the
relative entropy.
If there is no natural transitive symmetry group, there is no natural
prior, and one has to make other useful choices. In particular, this
is the case for random natural numbers.
Choice A. Treating the natural numbers as a limiting situation of
finite intervals [0:n] suggests using the measure with
integral dmu(x) phi(x) = sum_n phi(n)
as (improper) prior, making
<f(x)> = sum_n rho(n) f(n)
the definition of the density; in this case, p_n=rho(n) is the
probability of getting n.
Choice B. Statistical mechanics suggests using as (proper) prior
instead a measure with
integral dmu(x) phi(x) = sum_n h^n phi(n)/n!,
where h is Planck's constant, making
<f(x)> = sum_n rho(n) h^n f(n)/n!
the definition of the density; in this case, p_n=h^n rho(n)/n! is the
probability of getting n.
The maximum entropy ensemble defined by given expectations depends on
the prior chosen. In particular, if the mean of a random natural number
is given, choice A leads to a geometric distribution, while
choice B leads to a Poisson distribution. The latter is the one
relevant for statistical mechanics. Indeed, choice B is the prior
needed in statistical mechanics of systems with an indefinite
number n of particles to get the 'correct Boltzmann counting' in the
grand canonical ensemble. With choice A, the maximum entropy
solution is unrelated to the distributions arising in statistical
mechanics.
Thus while the geometric distribution has greater Shannon entropy
than the Poisson distribution, this is irrelevant for classical physics.
In statistical physics with an indeterminate number of particles,
only the relative entropy corresponding to choice B is meaningful.
(In the quantum physics of systems with discrete spectrum, however,
the microcanonical ensemble is the right prior, and then Shannon's
entropy is the correct one.)
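
The dependence of the maximum entropy ensemble on the prior can be
checked numerically. The following is a sketch under stated
assumptions: k_B = 1, natural logarithms, h set to 1 in suitable
(hypothetical) units, mean 2, and the infinite sums truncated at 200
terms. It evaluates -sum p_n log(p_n/w_n) for the prior weights w_n = 1
(choice A) and w_n = h^n/n! (choice B), for a geometric and a Poisson
distribution with the same mean.

```python
import math


def geometric(mean, nmax=200):
    """Geometric distribution on 0, 1, 2, ... with the given mean (truncated)."""
    q = mean / (1.0 + mean)
    return [(1.0 - q) * q**n for n in range(nmax)]


def poisson(mean, nmax=200):
    """Poisson distribution with the given mean (truncated)."""
    return [math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))
            for n in range(nmax)]


def entropy(probs, log_prior_weight):
    """-sum_n p_n log(p_n / w_n), with log w_n supplied as a function of n."""
    return -sum(p * (math.log(p) - log_prior_weight(n))
                for n, p in enumerate(probs) if p > 0.0)


def log_weight_a(n):
    """Choice A: w_n = 1."""
    return 0.0


def log_weight_b(n, h=1.0):
    """Choice B: w_n = h^n / n!  (h = 1 is an assumption of this sketch)."""
    return n * math.log(h) - math.lgamma(n + 1)


geo, poi = geometric(2.0), poisson(2.0)
print("choice A:", entropy(geo, log_weight_a), entropy(poi, log_weight_a))
print("choice B:", entropy(geo, log_weight_b), entropy(poi, log_weight_b))
```

With choice A the geometric distribution has the larger entropy, with
choice B the Poisson distribution, in line with the statements above.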
The identification of 'information' and 'Shannon entropy'
is dubious for situations with infinitely many alternatives.
Shannon assumes in his analysis that without knowledge, all
alternatives are equally likely, which makes no sense in the infinite
case, and may even be debated in the finite case.
(One of the problems of a subjective, Bayesian approach to
probability is that one always needs a prior before information
theoretic arguments make sense. If there is doubt about the former
the results become doubtful, too. Since information theory in
statistical mechanics works out correctly _only_ if one uses the
right prior (choice B) and the right knowledge (expectations of
the additive conserved quantities in the equilibrium case),
both the prior and the knowledge are objectively determined.
But this is strange for a subjective approach such as the information
theoretic one, and casts doubt on the relevance of information
theory in the foundations.)

S14a. Theoretical challenges close to experimental data
Many theoretical physicists seem to think that the only worthwhile
challenges in theoretical physics can be found at >TeV energies.
But, (un?)fortunately, there are challenges, as difficult and
as exciting, in the realm of normal energies, deep in the limits
of the unknown (as regards understanding), and far more relevant
in my opinion.
The manpower and money invested in the exotic realms of nature
at very large energies would be much better spent on these challenges
closer to experimental data.

For example, finding a consistent nonperturbative setting for
QFT, or giving a meaning to the concept of the ground state of
a Helium atom in quantum electrodynamics (extended by a field
describing the nuclei).
I have not seen a single field-theoretic treatment of Helium,
surely a simple system.
Helium is a bound state with well-defined asymptotic behavior,
as well-defined as a dressed electron or photon, but there is no
clear conceptual basis for this in QFT although there should be
such a concept. That's why I think it is a very important
unsolved problem.
There are papers making heuristic approximations
(see hep-ph/9612330) which give accurate predictions - cf.
Phys. Rev. A 65 (2002), 032516 and Phys. Rev. Lett. 84 (2000), 3274 -,
but they don't give a clue what a helium atom 'is' in QFT.
Moreover, they treat two electrons in a classical external
Coulomb field instead of a system of two electrons and a nucleus.

The current treatment of bound states in QFT (see elsewhere in this FAQ)
is a very loose patchwork of techniques borrowed from perturbative
field theory and nonrelativistic quantum mechanics that should make
every theoretician shudder. There are some beginnings in algebraic
QFT of what bound states should be, but nothing convincing on the
quantitative level.

A theory of everything should also be able to answer questions
that are well established experimentally but not understood
from the foundations.
For example, deriving the Navier-Stokes equations
for water from quantum theory is another challenge
that so far remained unmet; it has been done long ago for
dilute gases, but no one extended it to dense fluids.
There are severe difficulties to overcome, but we know both
the final result (to much better accuracy than the parameters
of the standard model) and the supposed underlying microscopic
model (unlike in quantum gravity); and the availability of
a derivation might even have long-term engineering consequences
for predicting properties of fluids under thermodynamic conditions
where experiments are difficult or impossible.
I am not an expert in this topic, but here are some pointers to
what I have seen about the problem.
I have never seen any microscopic derivation of Navier-Stokes
for water, although this is by far the most important application.
The statistical mechanics text of Reichl derives the equations
in Chapter 14F from thermodynamics, and in Chapter 16C-F
(for dilute monatomic gases) from classical statistical physics.
Fujita, Nonequilibrium statistical mechanics, derives Boltzmann
from QM in Chapter 4.2 and Chapter 6; Navier-Stokes would
be roughly analogous (for dilute gases). Similarly for many
other books on nonequilibrium stat. mech.
Mueller/Ruggeri, Extended Thermodynamics, treat relativistic versions,
deriving them from the Boltzmann equation and from thermodynamics.
Volume 9 of Landau/Lifschitz discusses techniques for the condensed
state in general, but no derivation of Navier-Stokes.
J. Math. Phys. 11 (1970), 2481 is a paper summarizing in the
introduction what was known by 1970.
Phys. Rev. D 53, 5799-5809 (1996) derives hydrodynamic equations
from quantum fields but only in a scalar phi^4 theory.
More recent related work includes
Phys. Rev. D 68, 085009 (2003)
Phys. Rev. D 64, 025001 (2001)
Phys. Rev. D 61, 125013 (2000)
Thus there is a well-trodden pathway for the dilute gas case,
and a set of tools for the condensed phase, but no synthesis
of the two.
If you find better references, please let me know.

S14b. Does the standard model predict chemistry?
The standard model is widely believed to be in agreement with
all we know about matter and radiation on earth, within the range of
accessible energies, as long as gravitational effects can be neglected.
But this does not mean that it has a high predictivity, except
on the level of high energy elementary particle scattering.
The reason is that we can compute from it almost nothing at the scales
of interest in nuclear, atomic, or molecular physics.
Lattice gauge calculations show that the standard model implies the
existence of baryons such as proton and neutron with masses that
match the experimental masses with an accuracy of about 5%.
This is far too low to be of use in chemistry or even in nuclear
physics. The accuracy of the effective forces between them is even
lower.
We have very little control over confinement, which is essential to
get useful forces at the energies relevant for nuclear physics.
Thus predictivity of the standard model for nuclear information
is almost nil.
And indeed, nuclear physicists do not use the standard model
(except for paying religious lip service to it), but work with
their own phenomenological models. They just borrow some of the
symmetries. These were of course known long before the standard
model was born, and built into the latter to match reality; so they
cannot count as predictions from the standard model.
If we had only the standard model and the numerical estimates
for the constants of effective actions computed from it,
this would give _very_ poor predictions of properties of protons,
neutrons, and their bound states.
One can show that the effective dynamics of protons and neutrons is
governed by effective field theories whose form can be derived
from the standard model (but also follows from assumed symmetry
principles built into the standard model) but whose coefficients
are derived by fitting calculations to _measured_ data about form
factors of proton and neutron, which have _not_ been calculated
from the standard model but must be put in by hand as additional
information.
From this, one can calculate the energy of the nuclei, using a combined
droplet/shell model. We understand the structure of nuclei, in agreement
with the standard model, but _not_ derived from it.
If we had only the standard model and the numerical estimates computed
from it, this would give _very_ poor predictions of nuclear properties.
There would be neither nuclear energy nor nuclear weapons based on
knowledge derived from the standard model only.
Even knowing the properties of proton and neutron from measurement
and the effective equations (but nothing else) does not allow one to get
highly accurate predictions for the properties of larger nuclei.
At atomic distances from the nucleus (for QED-dominated phenomena),
one can further approximate the theory by Dirac-Fock equations,
or, for light nuclei, by Schroedinger's equation
for electrons and nuclei together with relativistic corrections.
The details of the nuclei become irrelevant for atomic physics and
chemistry, except for their atomic weights. These cannot be derived
accurately enough from lower levels, and must again be supplemented by
additional experimental information.
If we had only the standard model and the numerical estimates computed
from it, this would give _very_ poor predictions of most chemical
properties of everything including the hydrogen spectrum.
Only starting on this level, _assuming_ the properties of the nuclei
and the electron, we are able to predict much of macroscopic physics:
We can solve the Dirac equation exactly for hydrogen, and
compute the radiation corrections from QED and other corrections from
the Standard Model. It agrees with the experimental measurement of
hydrogen spectra to extraordinary accuracy. We can understand why the
periodic table works, and predict the properties of even large
atoms (such as the color of gold) reasonably well using the Dirac-Fock
equations.
From this level on upwards, one has enough experimental data to
calculate chemical information for small molecules that is predictive
in the sense that it may give quantitative information that is
reasonably accurate and not put in by hand.
But already for proteins, one again needs to complement the theoretical
input by measurements to get predictions of reasonable accuracy.
Thus the standard model is a very inaccurate tool for chemistry.
It is useful only for elementary particle scattering experiments.
At each higher level, one needs additional information from
experiment to complement the predictions of the lower levels.

S14c. Is the result of a measurement a real number?
A single measurement (reading from a scale) always gives a rational
number, at least if the scale is in terms of rational units.
(If the scale gives an angle in degrees which is then converted into
arc length, the measurement gives rational multiples of pi instead).
However, this is by convention only, since a pointer position is
just a position in 3-space which must be translated into a number
by a subjective reading or by a digital reading device of limited
resolution. Thus the true position is not determined accurately
enough to associate it with a single number.
Infinitely many rationals (and uncountably many reals) are
compatible with any observable state of the voltmeter.
That's why the error bars are intrinsic to measurement results, even to
single readings. Deleting them and claiming exact measurement results is
just laziness, acceptable when the resolution of an instrument is known.
Therefore, according to the standards of NIST (National Institute
of Standards and Technology), a measurement gives an interval
consisting of a rational number together with an error bar; see
Of course, the error bar is also somewhat uncertain, but one generally
accounts for this uncertainty by rounding it upwards, to make the
whole estimate conservative.
The NIST definition has the advantage that it also applies to indirect
measurements obtained from raw measurements by some computations.
Indeed, most high quality measurements are of this kind.
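
A minimal sketch of this view of a measurement as a rational number
together with an error bar is given below. The class and method names
are hypothetical, and the worst-case propagation rule used for sums and
multiples is a simplifying assumption of this sketch, not the NIST
procedure. The error bar is rounded upwards to keep the estimate
conservative, and an indirect measurement is obtained from a raw one by
computation.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Measurement:
    value: Fraction  # rational reading
    err: Fraction    # half-width of the error bar, >= 0

    def __add__(self, other):
        # Worst case: the error bars of a sum add up.
        return Measurement(self.value + other.value, self.err + other.err)

    def scaled(self, factor):
        # Multiplying by an exact rational factor scales value and error bar.
        factor = Fraction(factor)
        return Measurement(self.value * factor, self.err * abs(factor))

    def with_err_rounded_up(self, digits):
        # Round the error bar upwards to the given number of decimals.
        step = 10**digits
        return Measurement(self.value,
                           Fraction(math.ceil(self.err * step), step))


# Hypothetical raw reading: a stack of 200 sheets is 21 mm +- 0.5 mm thick.
stack = Measurement(Fraction(21), Fraction(1, 2))
sheet = stack.scaled(Fraction(1, 200))  # indirect measurement of one sheet
print(sheet.with_err_rounded_up(3))
```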
Nevertheless there is no contradiction if one assumes that reality is
governed by equations in terms of exact real (or complex) numbers,
and only the measurement abilities are limited.

S14d. Why use complex numbers in physics?
Complex numbers are _the_ natural number system for all but
elementary physics; one needs them to make sense of many advanced
concepts. Avoiding complex numbers would make much
of what is done incomprehensible.
Already Fourier analysis is most natural with complex numbers,
though here it could be avoided by using trigonometric series instead.
The time-independent Schroedinger equation defines the
Fourier components of real, measurable expectations. So it is
very natural that quantum mechanics is based on complex entities, too.
Dispersion relations in optics are natural only in a complex setting.
Spectra of nonhermitian operators, essential for dissipative systems
even in the classical case, are always complex.
Analytic continuation plays a significant role in some physical
theories. For example, lattice gauge theory works in a continuation
of quantum field theory to Euclidean space, and the results must be
continued back to Minkowski space to get physical meaning.

On the other hand, at first sight it seems that only real quantities
are measurable. However this only holds for the most direct measurements
where you read a number from a meter. Most measurements are of a
more indirect kind, and then this restriction no longer applies.
To measure a family of physical quantities x_l (l=1,...,n),
one measures some related real quantities r_1,...,r_m connected to
the x_l by a system of equations F(x,r)=0 (in the absence of
measurement errors). In fact, there will always be measurement errors,
hence one generally uses more equations than unknowns and solves
the least squares problem ||F(x,r)||^2=min (or a more complicated
related problem if a model of measurement errors is available)
to get an estimate of x.
This recipe is universally used for all sorts of measurements and
works whether the x_l are real or complex.
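
As an illustration of this recipe for a complex quantity, consider the
following sketch. The setup is a hypothetical example, not taken from
a specific experiment: a complex amplitude z = a + i b is to be found
from real readings r_k = Re(z exp(i theta_k)) taken at known phases
theta_k. The equations F(x,r)=0 are here linear in (a,b), so the least
squares problem reduces to 2x2 normal equations.

```python
import cmath
import math


def fit_complex_amplitude(phases, readings):
    """Least squares estimate of z from r_k = Re(z * exp(i * theta_k)).

    Model: r_k = a cos(theta_k) - b sin(theta_k), with z = a + i b.
    """
    scc = sum(math.cos(t) ** 2 for t in phases)
    sss = sum(math.sin(t) ** 2 for t in phases)
    scs = sum(math.cos(t) * math.sin(t) for t in phases)
    bc = sum(math.cos(t) * r for t, r in zip(phases, readings))
    bs = sum(math.sin(t) * r for t, r in zip(phases, readings))
    # Normal equations: [[scc, -scs], [-scs, sss]] (a, b) = (bc, -bs).
    det = scc * sss - scs**2
    a = (sss * bc - scs * bs) / det
    b = (scs * bc - scc * bs) / det
    return complex(a, b)


# Eight real readings (more equations than unknowns) of a hypothetical z.
z_true = 2 - 3j
phases = [2 * math.pi * k / 8 for k in range(8)]
readings = [(z_true * cmath.exp(1j * t)).real for t in phases]
z_est = fit_complex_amplitude(phases, readings)
print(z_est)
```

With noisy readings the same function returns the least squares
estimate instead of the exact value.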

S15a. How precise can physical language be?
The relation between theory and reality necessarily uses ordinary
language and is therefore somewhat fuzzy. If one insists on 100%
unambiguous statements, one is on the level of pure mathematics or
mathematical physics (platonic reality), and cannot have any contact
with (physical) reality.
The best one can do is to have completely precise concepts on the
theoretical level and a description in ordinary, informal language
that relates theory to reality. In the formal theory, all concepts
can be precisely defined, and get names corresponding to their intended
use in reality. This ensures that one knows precisely what one talks
about - on the conceptual level.
In this informal language there must be room for linguistic
approximations without specifying their quality more than by
fuzzy words interpreted by the circumstances, since this is the way
we necessarily perceive reality.
When formulating the interface between theory and reality,
one must use the formulations used by the people who use this interface.
They know how 'large' something must be to be taken as 'infinite'.
They estimate limits from finite sequences (most of numerical
analysis would be void if we couldn't...), usually quite successfully
- although this is meaningless mathematically.
A mathematical limit in theory does _not_ translate into a mathematical
limit in reality.
This is necessary since all our observations are finite, and most of
them are noisy. As there are approximate ways of determining the mass
of the Moon, but no exact methods, so there are approximate methods
for determining probabilities, but no exact ones. Exact real numbers
belong to theory, not to reality. (Even counting is not sure to result
in an integer. What about the number of people in a room just when
someone is entering?)
Careful protocols for experimentation and measurement are useful to
achieve a certain amount of objectivity and repeatability, but even the
best protocols cannot reduce the level of fuzziness in the interface
between theory and reality to zero. I recommend
Experimentation and Measurement, by W.J. Youden,
reprinted 1997 by the National Institute of Standards and Technology.
Although a very old paper (from 1961), it is still considered by NIST
to be up to date and exemplary in its lessons about measurements.
Among other things, it discusses on pp. 26ff in greatest detail
how to measure the thickness of a sheet of paper in an ensemble of
sheets typically called a thick book.
If one follows his argument closely, one finds that even classically,
observables such as the 'thickness of a sheet of paper' are
probabilistic only, notwithstanding that probably everything relevant
about paper can be understood by classical mechanics and
Thus there are no exact concepts in observed Nature.
But in a good theory of Nature, all concepts should be exact.
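The approximate nature of a quantity such as the 'thickness of a sheet
of paper' can be made concrete by a minimal sketch. The readings below
are invented for illustration (they are not taken from Youden's text),
and the function name is hypothetical; the sketch only assumes that one
measures a stack repeatedly and divides by the number of sheets.

```python
import math
import statistics

def sheet_thickness(stack_mm, sheets_per_stack):
    """Estimate single-sheet thickness and its standard error
    from repeated measurements of the thickness of a stack."""
    per_sheet = [t / sheets_per_stack for t in stack_mm]
    mean = statistics.mean(per_sheet)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(per_sheet) / math.sqrt(len(per_sheet))
    return mean, sem

# Hypothetical readings (mm) of the same 200-sheet stack, illustration only.
readings_mm = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
mean, sem = sheet_thickness(readings_mm, 200)
print(f"thickness = {mean:.4f} mm +/- {sem:.4f} mm")
```

The result is a number together with an uncertainty, never an exact
real number.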

S15b. Why bother about rigor in physics?
Approximate methods are almost always more efficient than rigorous ones.
You can see this, for example, from the way integrals are calculated in
numerical analysis. No one uses the 'constructive proof' by
Riemann sums or, harder, by measure theory.
But for the logical coherence of a theory, the rigorous approach
is important.
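The remark about integrals can be illustrated by comparing a plain
Riemann sum with a standard approximate rule at the same number of
subintervals. This is a minimal sketch; the integrand exp on [0,1] and
the choice of the composite Simpson rule are assumptions made here for
illustration.

```python
import math

def left_riemann(f, a, b, n):
    """Left Riemann sum with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def simpson(f, a, b, n):
    """Composite Simpson rule; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3.0 * (f(a) + 4.0 * odd + 2.0 * even + f(b))

exact = math.e - 1.0  # integral of exp over [0, 1]
n = 100
riemann_error = abs(left_riemann(math.exp, 0.0, 1.0, n) - exact)
simpson_error = abs(simpson(math.exp, 0.0, 1.0, n) - exact)
print(riemann_error, simpson_error)
```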
To prove that a long, complicated expression in a single variable is
monotone may be quite hard and exceed the capacity of a typical
mathematician or physicist, but to evaluate it at a few hundred points
and look at the plot generated is easy.
If you (the reader) are satisfied with the latter, never try to
understand mathematical physics - it will be a waste of your time.
But if you want to have physics in general look like classical
Hamiltonian mechanics - a beautiful piece of mathematically rich
and powerful theory, then you should not be satisfied with the way
current quantum field theory (say) is done, and keep looking for
a better, more solid, foundation.
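The numerical check mentioned above (evaluating an expression at a few
hundred points) might look as follows. This is a sketch only; the
expression is a hypothetical stand-in for a 'long, complicated' one,
and the function names are invented here.

```python
import math

def looks_monotone(f, a, b, n=300):
    """Sample f at n equally spaced points in [a, b] and report
    whether the sampled values never decrease."""
    xs = [a + (b - a) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    return all(y1 <= y2 for y1, y2 in zip(ys, ys[1:]))

def expression(x):
    # A hypothetical expression chosen only for illustration.
    return x + 0.5 * math.sin(3.0 * x) + math.log1p(x * x)

print(looks_monotone(expression, 0.0, 5.0))
print(looks_monotone(math.sin, 0.0, 5.0))
```

Such a check is evidence, not proof: a narrow dip between two sample
points goes unnoticed, which is why the rigorous argument still matters
for the logical coherence of a theory.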
About the pitfalls of using mathematics ''formally'' (i.e., without
bothering about convergence of the expressions, existence or
interchangeability of limits, etc.), I recommend reading
F. Gieres,
Mathematical surprises and Dirac's formalism in quantum mechanics,
Rep. Prog. Phys. 63 (2000) 1893-1931.
G. Bonneau, J. Faraut, G. Valent,
Self-adjoint extensions of operators and the teaching of quantum
mechanics,
Amer. J. Phys. 69 (2001) 322-331.
See also:
K Davey,
Is Mathematical Rigor Necessary in Physics?
British J. Phil. Science 54 (2003), 439-463.

On the other hand, on the way towards finding out what is true,
nonrigorous first steps are the rule, even for die-hard
mathematicians. The role of intuition and nonrigorous thinking in
mathematics is well depicted in the classics
J. Hadamard,
An essay on the psychology of invention in the mathematical field,
Princeton 1945.
G. Polya,
Mathematics and plausible reasoning,
2 Vols., 1954.
G. Polya,
Mathematical discovery,
John Wiley and Sons, New York, 1962.
More recently, the article
A. Jaffe and F. Quinn,
"Theoretical mathematics": Toward a cultural synthesis of
mathematics and theoretical physics,
Bull. Amer. Math. Soc. (N.S.) 29 (1993) 1-13.
reports on the potential and dangers of nonrigorous approaches
to scientific truth. This paper was commented on in contributions
by a number of influential mathematicians and mathematical physicists in
M. Atiyah et al.,
Responses to ``Theoretical Mathematics: Toward a cultural
synthesis of mathematics and theoretical physics'',
by A. Jaffe and F. Quinn,
Bull. Amer. Math. Soc. 30 (1994) 178-207.
and the response of Jaffe and Quinn is given in
A. Jaffe and F. Quinn,
Response to comments on ``Theoretical mathematics'',
Bull. Amer. Math. Soc. 30 (1994) 208-211.
See also
D. Zeilberger,
Theorems for a Price: Tomorrow's Semi-Rigorous Mathematical Culture,
J. Borwein, P. Borwein, R. Girgensohn and S. Parnes
Experimental Mathematics: A Discussion

S15c. Justifying the foundations of a theory
Quantum mechanics is a somewhat unintuitive theory, and generated
a lot of foundational literature aimed at justification and
explanation of the conceptual basis.
Justification of the basic postulates of any theory is necessarily
circular. If it were not, the postulates would not be basic but derivable.
One must take all the basic postulates as a single foundation
on which everything else rests without circularity.
But the basic postulates themselves can only be motivated, but not
Most people simply trust that tradition selected good foundations.
If you want to probe that trust you can go into studying the sea of
publications on the foundations of quantum mechanics. But unless
you are very dedicated and spend a lot of effort on it,
it is likely that you'll drown there before having found satisfaction...

S15d. Foundations, theory and experiment
Foundations of physics is the quest for getting the mathematical
concepts right to be able to do correct physics and think correctly
about it. Without correct concepts operational statements have no
meaning. The theory defines what a measurement is. Outside the
immediate realm of everyday experience, one needs already the
conceptual basis to even discuss what has operational meaning.
These statements apply both to good and bad theories. Even a bad theory
defines what a measurement is; it just defines it more poorly.
There is in fact a crossfertilization between measurement and
foundations. If one gets better the other profits from it.
On the other hand, fuzzy foundations lead to poor judgment and
ambiguity in measurements, and poor measurements lead to low
discrimination among theoretical alternatives.
One can observe from history that progress in concepts leads to better
investigations of nature, and better experiments lead to higher
demands on the theory, forcing people to look for more stringent
concepts and simpler or more encompassing frameworks.

S15e. Theoretical physics as a formal model of reality
Can the meaning of all terms in a physical model be determined
precisely without an infinite regress? I want to show that the
answer is a clear `yes'.
Look at the question `What is a force?' To answer this, one needs
to consider the concepts of force, mass, acceleration, pressure,
stress, recoil, perhaps the gravitational field, etc., in total
a small number of physical items. If we want to define them in reality,
we don't get an infinite chain but a circular definition -- we can only
define one in terms of another, illustrating the concepts by pointing
to situations where we hope everything is obvious.
In practice (i.e., in teaching physics), this works alright since
each of us knows reality already
and only needs enough context to identify the usage of the concepts --
there is essentially only one fit that works, and once the
light goes on, we understand -- or at least the level of
understanding deepens. (Later, when doing high precision measurements,
we may notice that our understanding is not adequate,
and become more careful and sophisticated, and at some advanced
stage one can probably write a whole book to get definitions
that are really precise...)
But there is another way that is fruitful and neither circular nor
infinite. It is obtained by mimicking how modern logic investigates
its foundations. It assumes that we know at the 'external reality'
level what logic is; then it builds a formal model, a 'formal reality',
in which one can talk about everything one talks about in 'real' logic,
but in completely formal terms.
You don't need to know what truth, propositions, etc. are in reality,
but you declare the rules for manipulating
them -- since this is the heart of the matter.
This is done in exactly the same way as the Greeks declared rules for
manipulating geometric terms. In addition, they had definitions like
'a point is what has no parts'; but in modern geometry, this is
considered to be not a well-defined formal statement
(instead it has the circular character of relating the concept
to reality), and hence is simply dropped from the list of axioms.
So modern geometers define a projective plane by a few simple
''There are points, there are lines, there is a relation which
tells which points are on which lines, through any two distinct
points there is exactly one line, and any two distinct lines
have exactly one common point.''
That's all, and it is enough to do planar projective geometry with
full clarity and completeness. We do not need to know anything about
the objects to analyze a situation
(unless we want to check its impact on external reality).
Of course, it is good to have a few more restrictions and concepts to
go really deep, but this is supposed to be just an example.
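That these axioms suffice to analyze a situation without knowing what
the objects 'are' can be seen in a small sketch. It checks the two
incidence axioms quoted above on the Fano plane (7 points, 7 lines);
the choice of this example and all names in the code are assumptions
made here for illustration.

```python
from itertools import combinations

# The Fano plane: points are just labels 0..6, lines are sets of labels.
POINTS = range(7)
LINES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]

def is_projective_plane(points, lines):
    """Check the two incidence axioms quoted in the text."""
    # Through any two distinct points there is exactly one line.
    for p, q in combinations(points, 2):
        if sum(1 for line in lines if p in line and q in line) != 1:
            return False
    # Any two distinct lines have exactly one common point.
    for a, b in combinations(lines, 2):
        if len(a & b) != 1:
            return False
    return True

print(is_projective_plane(POINTS, LINES))
```

The check never asks what a 'point' is; only the incidence relation
enters.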
In the same way, one can discuss _everything_ about
the real logic in the formal model of logic, and reach clarity.
It is my proposal to do this for physics as well.
Actually it has been nearly achieved in classical physics,
and fully achieved in Hamiltonian mechanics.
You start with a phase space and a Hamiltonian which fall from
heaven. (They are motivated by circular arguments, but these arguments
are not part of the theory in the formal sense.)
Having this, you can build a whole world, with atoms,
dynamics, paths, forces, accelerations, stress, etc.
In fact, you can discuss any question about the classical world
in this mathematical frame, without ever needing any undefined term.
Formal reality is defined by what is expressible in terms of the
concepts already available, and 'true' reality with its circularity
never enters except as a guide to formulating new concepts and to
discuss their consequences.
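How force, velocity and paths become defined terms once a Hamiltonian
is given can be sketched for the simplest case. The harmonic oscillator
Hamiltonian, the finite-difference derivatives and the leapfrog
integrator are all assumptions chosen here for illustration; the
leapfrog scheme as written presupposes a separable Hamiltonian
H = T(p) + V(q).

```python
def hamiltonian(q, p, m=1.0, k=1.0):
    """H(q, p) = p^2/(2m) + k q^2/2, a hypothetical one-particle example."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

def force(q, p, h=1e-6):
    """Force, defined formally as -dH/dq (central difference)."""
    return -(hamiltonian(q + h, p) - hamiltonian(q - h, p)) / (2.0 * h)

def velocity(q, p, h=1e-6):
    """Velocity, defined formally as dH/dp (central difference)."""
    return (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2.0 * h)

def evolve(q, p, dt=0.01, steps=1000):
    """Leapfrog integration of Hamilton's equations.
    Returns the path as a list of (q, p) pairs."""
    path = [(q, p)]
    for _ in range(steps):
        p += 0.5 * dt * force(q, p)   # half step in momentum
        q += dt * velocity(q, p)      # full step in position
        p += 0.5 * dt * force(q, p)   # second half step in momentum
        path.append((q, p))
    return path

path = evolve(1.0, 0.0)
energies = [hamiltonian(q, p) for q, p in path]
print(max(abs(e - energies[0]) for e in energies))
```

Every quantity in the code is defined in terms of H alone; no appeal
to what a force 'really' is enters.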
This is what I think theoretical physics is about.
It builds a formal model of the world, with a 'formal reality',
in which every important concept from experimental physics has
a well-defined formal meaning, and in which every reasonable
question about the physical world can be posed and investigated.
What can be posed and analyzed in such a framework counts
as understood, and understanding of nature increases by bringing
more and more into such a formal model, until everything about
physical nature is representable.
My vision is that the same is possible and desirable for quantum
physics. For me, realizing this vision is
equivalent to having understood quantum physics.
So I want to have a mathematical quantum model of nature,
in which one can talk about all the things physicists talk about
when they talk about nature in the physical sense. In particular,
there will be concepts like particles, fields, detectors, measurement,
probability, memory, etc. but -- unlike in real nature --
they will have a precise and unambiguous formal definition,
of the same formal quality as force, acceleration, etc. are defined
in Hamiltonian mechanics.
Then we can ask about the "meaning" of each term,
and get a well-defined answer within the formalism,
without infinite regress.

S16a. On progress in science
The frontier in science is the frontier because there is no clear
understanding of what is beyond. All that is there is a set of
questions bothering those close to the frontier, and a set of
experiences of more or less failed attempts to push the frontier
Real improvements in difficult matters never come by starting from
scratch - they come from patiently building upon the best of
what already exists, being open-minded but critical about new
possibilities, and trying to integrate what looks most promising.
Those who had the questions and found real answers published them and
advanced the state of the art. The others can only share their
experience and their chart of the uncharted territory. As one can see
from the conflicting opinions, these charts are not reliable.

If you (the reader) want to proceed further, you need to learn to
see with your own eyes, take your own risks, and find out for yourself
what can be trusted. There are no guides beyond a certain point.
And don't count on recognition before you actually succeed!
As long as ideas are tentative and not validated by experiment,
they are always hard to defend. Success comes late -
either with a triumphal experimental verification, or if people
realize that a new way is significantly simpler than the tradition.
If neither happens, people will stick to the tradition,
except for a minority who lives from exploring the consequences of
the idea.
Innovative research is always a risky business - one must be prepared
to continue one's work no matter how much it is criticized, but one must
also learn as much as possible from one's critics. Then - if it is
indeed the right track - success will come sooner or later.
But who knows beforehand what will turn out to be the right track?
So people have a right to be critical...
Critics usually just present a statement, or point to an incoherence in
an opponent's statement. To learn from it is a nontrivial task,
since it means that one has to find out
a) how to make the criticism strongest, in a constructive sense, and
b) how best to defend the original statement.
Finding this out is learning from it.
Everyone starts their journey from where they are, in the direction they
find most promising. The others observe what they do and have to make
up their own mind. If people knew what is the right start and the right
direction, all important unsolved problems would be solved by now.
The journey is a journey to collect understanding of the ill-understood.
To find bugs in a computer code one doesn't go around speculating,
but one carefully compares evidence available and stays as close as
possible to the code. Physics needs to find the bug in its
foundations, and as with computer programs, it will be very subtle
and will be found only by a careful investigator, not by a dreamer.
Of course, a certain amount of creativity is needed. But it must be
guided closely by general knowledge of similar problems already solved
and on the structure of the system, together with the information
turned up by a detailed analysis of the code.
Thus imaginative speculation works only if checked and confirmed by
detailed code analysis. And most of the wild ideas are useless.
Not a procedure I'd recommend for research, though, unfortunately,
it has become fashionable in some quarters of theoretical physics...
Rather, learn as much as you can about how and why the good theories
work, and if you have the calibre to be an innovator, you might be
able to spot what went wrong. But not by searching in the mist; your
search should always be well-directed, or you'll go in circles...

Judging from my own experience, understanding is not something that
springs into one's head without preparation, but is the result of
walking attentively and openminded along many blind alleys, until one
sees one which smells like being the real thing. Then one starts
grinding away in this direction, and in this process discovers what
should have been the guiding principle that would have avoided all
the dead ends, bringing one directly to the goal. Then, and only then,
the right understanding governs the remainder of the search.
This is not only my personal experience but seems to be the general
pattern: See
G. Polya,
Mathematical Discovery,
John Wiley and Sons, New York, 1962.

S16b. How different are physical sciences and social sciences?
From the subject matter treated, a lot. From the modeling side far less.
There is no difference in principle. All science is based on observation
and experiment. All experimental data must be observed according to
well-defined protocols, to be objective (and hence science).
The main difference between physical sciences and social sciences is
that in the former one generally studies systems which are strongly
constrained by the experimental setting, so that they give much more
predictable results.
In both cases, however, the correct mathematical model is that of a
stochastic process, and physical sciences and social sciences only
differ in the size of the noise relative to the signal.
Sometimes to the extent that one can ignore the noise and treat a
physical system as deterministic, while a social system can never
be controlled well enough to make the remaining fluctuations

S16c. Can good theories be falsified?
The philosopher Karl Popper claimed that falsifiability is the
hallmark of scientific theories. But scientific practice speaks
against him.
A correct theory cannot be falsified, and in this sense is not
falsifiable, in spite of Popper. (Falsifiability can be asserted
only in a counterfactual sense, that there are _conceivable_ situations
that, according to the theory, are excluded. But for a correct theory,
these situations will never happen, hence are completely fictitious.)

What happens with good theories is, at worst, that their region of
validity or accuracy gets restricted as new data about more remote
instances come in.
In today's understanding, people are careful to indicate the
limits where a theory is claimed to be valid, and the accuracy
to which its answers are to be trusted.
For example, the Standard Model is claimed to be valid whenever
gravitation is negligible, accuracies conform to present possibilities,
and energies are well below a putative unification scale.
Failures outside this domain are not counted as falsifications.
While limits and accuracy claims are not necessarily part
of the theory proper, they are part of the theory as actually taught
and applied. Indeed, although people try to extrapolate, one can
never be sure whether a theory is correct outside the domain where
the data were collected.
But one can be reasonably sure within the domain where enough data
are available. Good scientific practice requires that a good theory
agrees with the data within the tolerances claimed.
Once this is the case, these theories can never be falsified.
Rather, if people find disagreement in experiments, the
theory falsifies the experimental arrangement or analysis.
All science students who ever did experiments in the lab know
very well that this is common practice.

The degree of caution and care at the highest
level of quality has been increasing through the centuries.
It is now too late to ask Newton whether he believed his theory was
valid without restrictions. (Or are there any hints in the Principia
Mathematica?) Certainly Newton's theory as taught today is taught
(i.e., with the restriction that it is valid at speeds small compared
to c and at distances large compared to the radius of the largest atom).
But we nevertheless believe that it is the 'same' theory, and if
Newton lived today, I think he would agree with that.
And Newton's theory will never be falsified, unless God suddenly
decides to change the physics of the Universe.
(That the observed advance of Mercury's perihelion did not match
Newton's theory was known as a limiting condition already before
relativity was born.)

S16d. What, then, distinguishes a good theory?
We can _know_ whether a theory has been correct in the past,
and we can _trust_ that it will remain so in the future.
There is no other kind of knowledge than that of the past.
Relying on the assumption that ''anything in the future is like in the past'' is an
act of faith. The question is not about faith or not, but about
faith in what is best supported by past experience.
Theories that conform with the past are easy to trust.
But they come in different degrees of stringency.
Theories which are not restrictive at all but accommodate everything
(such as astrology or psychoanalysis) are in vogue (as society shows)
but useless (and probably harmful). These are the ones that Popper calls
Highly restrictive theories (what Popper calls scientific) are preferred
by those who want to control their destiny as far as possible.
Theories like Newton's, general relativity, or QED are extremely
restrictive and in agreement with past experience, hence both
trustworthy and very useful.

What makes a theory good is not its potential falsifiability, but that
it drastically reduces the number of possibilities which are present
without the theory, without eliminating something that can actually
If you have no theory and put two marbles into your empty pocket,
and then another two, you don't know how many marbles you can take out.
If you know arithmetic and the law of conservation of marbles you can
predict that exactly four can be taken out. This is testable, and will
always come out correct. So you have a correct theory. Of course, its
validity is not unlimited, since it assumes that your pocket does not
have a hole; so if some experiment does not conform to your theory
since you can only take out three, you suspect that the domain of
validity was violated; you check for the hole - and surely you'll
find it.
This is exactly analogous to the way Newton's theory works, within
its domain of validity. If it fails, we suspect speed close to c,
or highly accurate measurements, or tiny distances. And surely
we'll find it so.

S16e. When is a theory preferred to another one?
Frequently, Ockham's razor
''frustra fit per plura quod potest fieri per pauciora'',
that we should not use more degrees of freedom than are
necessary to model a phenomenon, is invoked to argue that the theory
with the fewest parameters is the best. But this is true only
when taken with many grains of salt.
Chemists prefer as a starting point of their deepest investigations
the theory based on Dirac-Fock theory or even cruder approximations,
treating the nuclei (for large problems even atoms) as elementary.
This gives them all the information they need, while they can deduce
nothing at all from the standard model which is supposed to be a much
more exact and general theory.
Thus what is preferred depends a lot on which use can be made of it.
Ockham's razor is appropriate only if two theories allow the same
deductions with a similar amount of work, or if the more parsimonious
theory is even superior in allowing one to derive the desired
Nothing in science is against a complicated model if it gives more ready
access to the quantities of interest than a formally simpler but
computationally more difficult or even intractable formulation.
Given only the standard model + classical relativity
(allegedly correctly describing all phenomena of the world at
accessible energies, distances, and accuracy), we'd know very little
about our world, and only very inaccurately. Not even the masses of
the nuclei can be predicted at present with any confidence, let alone
the properties of water or gold.
And given only string theory (a theory without any free parameter),
we'd know essentially nothing about our world.
for further discussion of Ockham's razor.)

S16f. What is a fact?
In discussions on sci.physics.research, one often finds very good
information, but also often poor and misleading information.
How to distinguish the good from the poor?
Everything called knowledge is in fact a set of beliefs of the
person claiming it. And this set of beliefs is more or less close
to the objective truth, depending on the standards of that person.
Calling so-called knowledge a set of beliefs does not contradict the
objectivity of mathematical definitions. When I say that a Banach
space is a normed, complete vector space, I both state my belief
and happen to coincide with the social consensus of the guild of
mathematicians. And when I say that state reduction is a
physical process, I both state my belief and happen to coincide with
famous physicists like von Neumann and many others, and this is good
enough to make this statement honestly, since the community has not
reached an agreement on the matter.

Telling others what one thinks is true in no way manipulates
others any more than feeding others what one thinks is nourishing.
But as we shouldn't accept being fed by those with poor judgment
about food, we shouldn't accept an opinion for the truth if offered
by someone with poor judgment about the relevant areas.
It is obvious that whatever a person claims is first and foremost
his or her personal opinion, and not a fact. Who takes it for a
fact is simply misleading himself or herself. Thus there is no
need to qualify each of one's statements by clumsy phrases like
'in my opinion', or 'according to what I have read/understood', or
'as far as I am informed' or 'since this makes most sense to me'.
These phrases accompany silently any statement by anyone.
It is also obvious that an opinion doesn't become a fact because
it is believed by half the number of people from a particular
ensemble; truth would otherwise become dependent on the choice
of this ensemble.
Thus one needs to check the claims, to listen to different sides
of a controversy, to ask for sources or justification of an opinion.
In this way, anyone who wants to get a clear picture soon notices
which claims are trustworthy, which ones are tenable but somewhat
shaky, and which ones are poorly founded.

On the other hand, in participating in a discussion,
honesty only requires that one asserts what one thinks
is true, and gives one's reasons upon request.
This is the scientific approach, since it lets others check upon
the trustworthiness of a claim.

S16g. Physics and experience
On superficial reasoning, time is only a concept that helps us
to order our experiences. Thus,
''experience exists; time does not''.
By exactly the same argument,
''experience exists; space does not''
''experience exists; mass does not''
''experience exists; charge does not''
''experience exists; gravitation does not''
Physics is exactly about the concepts that are substituted for
experience to make experience quantitatively predictable.
Therefore, in this deeper sense, time, space, mass, charge,
gravitation, etc. exist, and are more fundamental than experience.

S16h. Modeling reality
In describing reality from a physics point of view,
the person modeling a system of interest makes certain
choices. These consist in choosing a mathematical model
of the system, and setting up a correspondence between
informal objects related to the system and formal objects
in the mathematical model.
More specifically, an assertion about reality is modelled
as a mathematical assertion about mathematical objects in the
mathematical model that carry the same names as those
in the reality they are supposed to model.

S16i. What is a system (e.g., an ideal gas)?
Theories of physics do not say what a system (such as an
electron, a star, an ideal gas, a crystal) is in reality.
Nevertheless, it is possible to check the reality contents
of a physical theory. How does this come about?
Let us consider thermodynamics. Thermodynamics does not say which
system is an ideal gas, which is only a van-der-Waals gas,
which is a liquid, or a solid.
Indeed, such questions need not be answered by the theory.
Instead, they are answered by checking how a system behaves:

If a real system behaves as the theory for an ideal gas (a solid,
a crystal, an electron) requires, a physicist will say it 'is'
an ideal gas (a solid, a crystal, an electron); if not, it is not.
While this definition may seem circular, it isn't once it is recognized
that one can check some characterizing properties of systems that have
a particular label (such as 'ideal gas') by a small amount
of measurements, and then deduce many more properties from the theory
that can be checked subsequently.
Engineers call this process 'system identification'.
Thus the task of theory is to provide models with just enough
flexibility that they cover the range of relevant possibilities,
while being still restricted enough so that one can identify
the system with a limited amount of data. Exactly in this case
a theory has predictive value.
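The identification step described above can be sketched for the ideal
gas. The sketch assumes the model P*V = c*T with a single unknown
constant c = n*R, a least-squares fit through the origin, and a 1%
tolerance; the data are invented for illustration and all names are
hypothetical.

```python
def fit_ideal_gas(measurements):
    """Least-squares estimate of c = n*R in P*V = c*T
    from a list of (P, V, T) triples."""
    num = sum(p * v * t for p, v, t in measurements)
    den = sum(t * t for _, _, t in measurements)
    return num / den

def behaves_ideally(measurements, c, tolerance=0.01):
    """True if every measurement satisfies P*V = c*T
    within the given relative tolerance."""
    return all(abs(p * v - c * t) <= tolerance * c * t
               for p, v, t in measurements)

# Hypothetical calibration data (Pa, m^3, K), invented for illustration.
calibration = [
    (101325.0, 0.0246, 300.0),
    (120000.0, 0.0242, 350.0),
    (90000.0, 0.0369, 400.0),
]
c = fit_ideal_gas(calibration)

# Prediction for a state not used in the fit, to be checked later.
predicted_pressure = c * 320.0 / 0.0300
print(c, behaves_ideally(calibration, c), predicted_pressure)
```

A few measurements fix the single parameter; the theory then predicts
the outcome of all further measurements, and the label 'ideal gas' is
kept only as long as these predictions hold within the tolerance.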

S16j. When is a theory confirmed?
Any deviation from a law can only be 'confirmed' by narrowing error
bars for the parameters modeling the deviation. As long as the error
bars contain zero, the law counts as confirmed.
With time, confirmation of the law may be at a higher level of
accuracy, or (as in the case of neutrino masses) confirmation of the
deviation (if the more accurate error bars no longer contain zero).
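The criterion can be written down as a minimal sketch. It assumes that
the error bar is the symmetric interval estimate +/- k*sigma; the
coverage factor k = 2 and the function name are choices made here for
illustration, not part of the argument above.

```python
def assess_deviation(estimate, sigma, k=2.0):
    """Classify a measured deviation parameter by whether its
    error bar (estimate +/- k*sigma) contains zero."""
    low, high = estimate - k * sigma, estimate + k * sigma
    if low <= 0.0 <= high:
        return "law confirmed at this accuracy"
    return "deviation confirmed"

# Hypothetical numbers: the same estimate with wide and narrow error bars.
print(assess_deviation(0.3, 0.5))
print(assess_deviation(0.3, 0.05))
```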
If one disputes any of the established theories because of not enough
confirmation, one can as well dispute Lorentz symmetry, translation
invariance, zero photon mass, general relativity, etc., which are
basic to contemporary physics but all confirmed only to a certain
There are experiments testing the limits of all these assumptions,
but even when one of these experiments succeeds (as in the case of
neutrino masses), the previous theory remains valid to the accuracy
it was known to be valid before. In this sense, older theories don't
die even when they are superseded. A well-known case is Newton's
gravitational theory which is still taught and heavily used
although not completely correct.

S16k. What is real?
All physics is just a handy way of thinking about certain phenomena.
This - a handy way of thinking - is what it means that something
- the concept we find useful - exists.
We say that people exist, because they are a handy way to describe
certain blobs of matter like ourselves. We say that electrons exist,
because they are a handy way to describe ionization phenomena.
We say that photons exist because they are a handy way to describe
quantum optics phenomena.
Photons are objectively real because they are needed in the only
comprehensive coherent theory of microscopic interactions that we
know of.
On the other hand, 'photon' is merely a word that physicists use on
paper and in conversation. But in precisely the same sense that
entropy, energy, or the electromagnetic field are merely words that
physicists use on paper and in conversation.
Even our best concepts are 'merely' words.
If we give up concepts, only an undifferentiated happening in
space-time remains, and even talking about this becomes impossible.

S16l. How many angels fit onto the tip of a needle?
Anton Zeilinger writes in
''the question whether such a description exists or not was therefore
similarly irrelevant as, according to Pauli, the old question
how many angels fit onto the tip of a needle.''
This question has become a well-known metaphor for doing
irrelevant physics.
But how old is this question really?
Who was the person who discussed it seriously?
mentions explicitly Chillingworth's
''Religion of Protestants a Safe Way to Salvation''
(1638, reprinted 1972, 12th unnumbered page of the preface)
accusing unnamed scholars of debating
''Whether a Million of Angels may not fit upon a needles point?''
It seems that, as here, the question has always been used in a derisive
manner only. In the historical essay
E.D. Sylla,
Swester Katrei and Gregory of Rimini:
Angels, God and mathematics in the fourteenth century,
pp. 251-270 in:
Mathematics and the Divine: A Historical Study
(T. Koetsier and L. Bergmans, eds.)
Elsevier 2005,
Sylla conjectures that the question might have been coined by
Thomas Hobbes, who had learnt the scholastic tradition in Oxford
between 1603 and 1608. See also

But similar questions were discussed much earlier. Sylla
mentions an anonymous 14th century mystical treatise
''Swester Katrei'' (= Sister Kate or Sister Catherine) referring to
''a thousand souls in heaven sitting on the point of a needle''.
Cf. also the paper
G.M. Ross,
Philosophy 60 (1985), 495-511.
and the web site
Sister Catherine (Schwester Katrei)

Even earlier and most prominent is the discussion of angels
in Thomas Aquinas' ''Summa Theologica'', published in 1266.
It is surprisingly interesting.
It looks as if Aquinas was the first writer anticipating quantum theory
and the Pauli exclusion principle. Replace 'angel' by 'electron' and he
sounds surprisingly modern; in modern terms, angels are fermions,
according to Thomas Aquinas.

An English translation of the ''Summa Theologica'' is available online.

Part I (
contains the chapter on angels.
Sections 50-53, on their substance, relate to their physical
properties and hence are of scientific interest.
There he discusses the properties of a point particle from
a logical point of view. His 'angels' are not the winged creatures
we might imagine them to be, but incorruptible, indivisible,
extended objects, ''form without matter'', with quite precise
Two angels cannot be in the same place, but they have
virtual (sic!) positions, and can be in an extended place:
''So the entire body to which he is applied by his power,
corresponds as one place to him.''
They may go from one place to another with or without being observable
in between:
''But an angel's substance is not subject to place as contained
thereby, but is above it as containing it: hence it is under
his control to apply himself to a place just as he wills,
either through or without the intervening place.''
Their number roughly matches the number of electrons:
''Hence it must be said that the angels, even inasmuch as they are
immaterial substances, exist in exceeding great number, far beyond
all material multitude.''
(With ''angel'' interpreted as ''electron'', ''immaterial'' could thus
be interpreted as zero baryon number.)

Like early chemists hiding their scientific insights in an alchemist
guise, he might have phrased his speculations in terms of notions
acceptable to his clerical colleagues...

If we attribute to the Greeks the concept of the atom (though they
thought of it in - for modern ears bizarre - terms that have little
to do with our modern view), we should perhaps be as generous towards
Aquinas and attribute to him the exclusion principle.

On a more tongue-in-cheek basis, the Annals of Improbable Research
published an article
A. Sandberg,
Quantum Gravity Treatment of the Angel Density Problem,
Annals of Improbable Research 7 (Issue 3), (2001), 5-8.

S17a. How to get information from sci.physics.research
If you read sci.physics.research out of curiosity, you may find that
the discussions get too specific for you but make you curious to
learn more about the background. But it may be difficult to find out
where to get started.
The right way to find out is to ask on sci.physics.research
for what you need, in response to someone's contribution.
The writers usually know how they got the knowledge, and are happy to
give you hints or recommendations, and others will join in if they
think they have better advice. The more specific your question, the
more likely you'll get an answer, and the more useful it will
be for others, too. By asking good questions you are doing a
service to all.
My Lord Jesus Christ, for whom I live, asserted:
"Ask, and it will be given you; search, and you will find; knock,
and the door will be opened for you. For everyone who asks receives,
and everyone who searches finds, and for everyone who knocks,
the door will be opened." (Matth. 7:7-8)
It took me a while to realize that this was excellent advice.

S17b. How to get your work published
You did some work that you think is great (or at least reasonable),
but it was rejected by the journal you sent it to?
This is disappointing, but not the end of all hope...
Rejection letters usually give some reasons for rejection; if they
don't you may request (in a polite way!) getting reasons so that you
can learn from them. And then _do_ learn from them! Usually the reasons
for rejection are sound and mean at least that you didn't pose your
case well. It also takes some time to learn the standards that
publications should respect, and it is likely that you violated
some of the unspoken rules without realizing it.
If your idea is far from mainstream, you also need to convince people
that your approach is sound and merits spending the time to read
through the new proposal. This is difficult since you need to build
up trust; it requires that you have a high level of frustration
The less mainstream an idea, the stronger its contents must be and the
more carefully it must be argued to be publishable; use the feedback you
get to find out the standards expected and then go and meet them.
The difference between a crank and a serious researcher is that
the latter learns from criticisms and grows through each feedback,
while the former 'knows' (and acts on this assumption) that he is right
and that established physics is just rejecting him or her for no good
If you enter a correspondence with anyone who takes the time to
read your work, stay polite even when the answers you get are not
what you hoped for. Once the tone of your mail gets defensive or
aggressive, you probably lost your case - your partner sees that
you try to replace facts by emotions and your credibility is gone.

Time is precious for active scientists. So keep your article as short
as possible without losing substance. 120 pages of detailed analysis,
say, is too much for most people to read, unless they already have high
confidence that the content is sound. If you really need 120 pages
to make your case, you need to make short versions of your long paper
that allow others to check it for reasonableness with less effort.
You'd then have a 1/2 page abstract, a 3 page introductory essay,
a 7 page outline version, a 20 page version with the key steps, and a
full paper with all the details, and each of these versions should be
self-contained and allow the reader to get a feeling of what you do,
and why you succeed - in terms of background that shows that you are
familiar with the state of the art, and in a language that is both
understandable and concise. Then anyone reading it gets a sense of
high quality work that is informative and inviting.
Note that the most important task is not to present your claim and
praise or defend your work, but to convince others that your claim
deserves trust enough to spend time on checking it.
It is all too easy to make claims that are unsubstantiated but
embedded in a complicated manuscript where one gets easily lost,
loses track of what is important, and therefore misses the mistakes
or gaps in the arguments. It is the responsibility of the innovator
to present the news in a way that makes checking and trusting
Of course one can find many published papers that do not meet these
standards. This is probably because their content is not important
enough to require high standards of checking, or because their
conclusions do not invite suspicion. But innovative work invites
suspicion since it is far from the common, and, if relevant, it
therefore requires higher standards to be accepted.

S17c. How to respond to critical referee reports
[This is taken verbatim from]
What Should I Do When a Referee Criticizes My Paper?
Read the referee report carefully and dispassionately. Approach the
report with an open mind. What may at first seem like a devastating
blow is perhaps a request for more information or for a more detailed
explanation. At other times the referee may indeed have found a fatal
flaw in the research or logic. Put yourself in the position of a
reader, which is exactly the position of the referee. Is the paper
well written? Is the presentation clear, unambiguous, and logical?
Respond to all referee comments, suggestions, and criticisms.
Explain which changes have been made and state your position on points
of disagreement. In our experience, appropriate response to some
referee comments may require more research or even reconsideration
of the research project.

S17d. How to sell your revolutionary idea
Unless you don't care about making a fool of yourself, don't tell
it to others before you worked out enough details to be convincing.
Your audience is very likely to be skeptical (since there are too many
revolutionary ideas around which don't stand the test); so you need
to make the best use of this fact.

The secret is that most people like to answer questions that
fall into their field of expertise, if it does not take too
much effort to reply. But few like to listen to half-baked
(or even fully baked but only outlined) ideas;
too many such offers come from cranks. The devil is always
in the details; and if you can't provide them it is likely
they'll think it is because it does not work or does not offer
any advantage.
So the right approach is to ask them for (and afterwards study!)
information about what is known in the direction you want to go,
rather than proposing the revolutionary way of doing it correctly.
Take heed of the advice of an old saint:
"Let every man be swift to hear, slow to speak, slow to wrath."
(The Bible, James 1:19)
If you really can do it better than others, and you don't find
prior relevant work in the literature, work it out yourself and
show with a nontrivial application that you can do _something_
more efficiently than tradition. Then submit it to a respectable
journal, and people are likely to listen.
If you get negative feedback from referees, take it seriously,
learn from it as much as you can. Raise your standards according
to what you learn, and accommodate the criticism in your future work.
The referees are usually competent and have a point in what they say.
If not, it is likely that your work was presented in a fashion prone
to misunderstanding - in this case formulate your results more
carefully, taking into account accepted tradition. It is an author's
obligation to minimize the chances of misunderstanding by potential
readers. I gained a lot from considering the referees' advice in the
many papers I have written. And it takes a while to learn how to
write good papers...
Even if your work is good but not mainstream, it may take
persistence to publicize it properly; publishing is not enough.
But publicizing does not mean boasting with great claims -
this makes people suspicious and is therefore counterproductive.
Be modest in your claims - claim what you can actually prove,
but not what you only dream of proving one day.
See also: The Crackpot Index (by John Baez) at

S17e. Useful background, online lecture notes, etc.
(incomplete, just some useful references)

The Nobel Prize Winners in Physics
Nobel lectures of the laureates, and their biographies
worth reading - can be regarded as a sort of lively answer
to the question:
What has been important enough in physics to deserve a big prize?
Gerard 't Hooft,
How to Become a _good_ theoretical physicist
A tree (well, almost) of physics fields, subfields, and concepts.
The leaves explain things in some detail. There is an index
but it does not contain references to each node.
Organization seems to be experimental physics oriented;
for example, I have not found nodes with 'statistical mechanics'
or 'quantum field theory'.
Everything You Always Wanted to Know About the Hydrogen Atom
(But Were Afraid to Ask)
Theory of Renormalization and Regularization
contains a very useful set of online notes that may serve
as an introduction to QFT from a mathematical physics point of view.
Lecture Scripts and Online Courses on Quantum Mechanics
and on other physics topics
Introduction to General Relativity (video)
Review articles on Local Quantum Physics
Norbert Dragon,
Remarks on Quantum Mechanics
Lost & Regained Causes in theoretical physics
Selected Classic Papers from the History of Chemistry
Digest of moderated newsgroup sci.physics.research
Historical Physics Lecture Notes
* Freeman J. Dyson, Advanced Quantum Mechanics 1951
* Fritz Rohrlich, Applied Quantum Electrodynamics, 1953
* Green and Sengers Proc. 1965 conference on critical phenomena
* Cyril Domb's brief historical survey on critical phenomena, 1985
* 1993 roundtable, Physics in Transition
Resources for the History of Physics & Allied Fields
Sidney Coleman
Lecture notes on quantum field theory

S17f. Stories about physicists
Memories about Theoretical Physicists (by R.F. Streater)
Short Stories
Parables for Modern Academia (by D. and L. Haarsma)

S17g. Other physics FAQs
Usenet Physics FAQ
(extensive, has also links to further physics-related FAQs)
Physics FAQ
(a list of links)
Plasma FAQ
Quantum Physics FAQ
(current views of Erich Joos)
Physik und das Drumherum
(Physics FAQ in German)

S17h. Naming in science
How do scientific concepts, effects, or inventions get named after
their discoverers?
It is good practice to name important concepts, effects, or inventions
created by esteemed colleagues after them - good names are always hard
to find, and besides names clearly related to the content, names
naturally related to the history stick best. If a naming is successful
(in that others find it appropriate and useful) it will spread,
and soon everyone is using it. Then the name is established.
It is bad practice if authors call something by their own name
before it has been established by others. It suggests both vanity
and a lack of confidence that others will do a good naming job.
And if the self chosen vanity name does not stick, it serves them
right for having made a fool of themselves.
On the other hand, naming is at times unfair. Not rarely in the past,
a concept (or theorem, etc.) got the name of one of its main proponents
rather than that of its creator.
There are several reasons for this.
It takes time (and a certain amount of interest) to find the true
origin of a concept; but a good name is needed once it is used by more
than a few people. But once a name is established, it is nearly
impossible to change it.
A concept may also be rediscovered independently of its first inception.
If the time wasn't ripe for it the first time, it is likely that the
name of the rediscoverer sticks, and the voices of those who had known
the first source come too late.
See also:
List of misnamed theorems

S18a. What is the meaning of 'self-consistent'?
A self-consistent solution (or method, or theory) refers to the fact
that one has two sets of equations relating two sets of unknown
quantities, and wants to solve the equations jointly for the unknowns.
If aspect A of a theory says y=x^2 and aspect B of the theory
(or of another theory) says x=y-2 then self-consistency means that
both equations are assumed to be valid, giving
x^2 = y = x+2,
which leads to the two solutions
x=2, y=4 and x=-1, y=1.
That's all. Of course, the self-consistent Hartree-Fock method,
say, has more variables and is harder to solve, but the principle
is the same.
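As a small illustration of the example above, here is a sketch in
Python (not part of the original answer). It first solves the combined
equation x^2 = x+2, i.e., x^2 - x - 2 = (x-2)(x+1) = 0, directly, and
then finds the solution x=2 by repeated substitution, alternating
between the two aspects until nothing changes any more. All function
names are hypothetical, and the iteration is only a toy analogue of how
self-consistent methods are commonly iterated; it is not the
Hartree-Fock algorithm.

```python
import math


def aspect_a(x):
    """Aspect A of the toy theory: y = x^2."""
    return x * x


def aspect_b(y):
    """Aspect B of the toy theory: x = y - 2."""
    return y - 2.0


def solve_directly():
    """Solve x^2 - x - 2 = 0 with the quadratic formula."""
    disc = 1.0 + 8.0  # discriminant b^2 - 4ac for a=1, b=-1, c=-2
    roots = [(1.0 + math.sqrt(disc)) / 2.0, (1.0 - math.sqrt(disc)) / 2.0]
    return [(x, aspect_a(x)) for x in roots]


def self_consistent_iteration(x0, tol=1e-12, max_iter=200):
    """Alternate between the two aspects until x stops changing.

    Given x, aspect B (read backwards) gives y = x + 2; aspect A
    (read backwards, positive branch) gives the new x = sqrt(y).
    This particular rearrangement converges only to the solution x = 2.
    """
    x = x0
    for _ in range(max_iter):
        y = x + 2.0
        x_new = math.sqrt(y)
        if abs(x_new - x) < tol:
            return x_new, aspect_a(x_new)
        x = x_new
    raise RuntimeError("iteration did not converge")


if __name__ == "__main__":
    for x, y in solve_directly():
        # Both aspects hold simultaneously at each solution.
        print(x, y, aspect_b(y) == x)
    print(self_consistent_iteration(0.0))
```

Note that the iteration only finds one of the two solutions; which
solution (if any) is reached depends on how the equations are
rearranged and on the starting value.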

S18b. What is a vector?
A vector is (for the beginner) a list of numbers written below each
other. An example is the x, y, and z coordinates of a point in a
3-dimensional coordinate system. Physicists write the three
coordinates as x_1, x_2, x_3 and combine them into a vector
simply called x.
      /     \
      | x_1 |
  x = | x_2 |    (The parentheses look a bit awkward in ascii.)
      | x_3 |
      \     /
The same for a list of n numbers. This gives a vector x with n
coordinates x_1,...,x_n, and is thought of as a point in a
space with n dimensions.
Two vectors are added or subtracted just by adding or subtracting
their entries. A vector is multiplied by a number just by multiplying
each entry with the number. Then there is the inner product of two
vectors,
   x dot y = sum_i x_i*y_i
which is a number and not a vector.
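The operations just described can be sketched in a few lines of Python,
with a vector represented as a plain list of numbers. This is only an
illustration; the function names are hypothetical and not taken from
any library.

```python
def add(x, y):
    """Add two vectors entry by entry."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return [a + b for a, b in zip(x, y)]


def subtract(x, y):
    """Subtract two vectors entry by entry."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return [a - b for a, b in zip(x, y)]


def scale(c, x):
    """Multiply every entry of the vector x by the number c."""
    return [c * a for a in x]


def dot(x, y):
    """Inner product x dot y = sum_i x_i*y_i (a number, not a vector)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return sum(a * b for a, b in zip(x, y))


if __name__ == "__main__":
    x = [1, 2, 3]
    y = [4, 5, 6]
    print(add(x, y))
    print(subtract(x, y))
    print(scale(2, x))
    print(dot(x, y))
```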
Once you have mastered vectors you need to understand matrices.
These are rectangular arrays of numbers.
Later you need to enrich the meaning of a vector by learning
the concept of a vector space. Now all sorts of objects might
also deserve the name vector, most prominently functions,
matrices, tensors, operators. They behave in many respects
just like ordinary vectors.

S18c. Learning quantum mechanics at age 14
If you want to learn about quantum physics and really understand it,
you need to learn first how to do calculations with vectors and
matrices. Look in your local library for math books, about
'linear algebra' or 'analytic geometry'. You may have to try
several before you find one suitable at your level.
Linear algebra (i.e., vectors and matrices) is more fundamental
to quantum mechanics than calculus, although the latter is needed
to understand how things change steadily with time.
But one can understand the time-independent part of quantum mechanics
already without calculus, namely everything involving entanglement,
Schroedinger's cat, quantum cryptography, and the like.
This only needs linear algebra, which may be easier.
(On the other hand, calculus is not really difficult either,
once one gets used to it.)

Maybe at first it is better to get math schoolbooks from your
older peers. Good school books are written in a way that they can be
used for self study. If you are motivated it can be very exciting!
If you like math it is much less work than you might think,
and it is fun! Just start with next year's textbook and read it
in your spare time! I started reading math beyond my age when I
was 12, and never regretted it.
With the right motivation, you can learn 10 times as fast
as when you just wait till the subject comes up in school!
And it will be 10 times as interesting!
You don't need to do all the exercises but just enough that you
think you know how it works. Go back to practicing more if you
need it. This speeds up things a lot.
Also, you don't need to read everything in the order it is in
the book - just go where your curiosity leads you, and if you
encounter something you don't know yet, go back to where it was
introduced. In this way you get the idea of what is happening
long before you understand it thoroughly, and it will be a
motivation to learn the missing things.
Learning math and physics is a life-long challenge (so much
interesting stuff accumulated over the centuries...),
and you can't start early enough.
And at any time in life there will be parts you understand well,
parts you understand partly or superficially only, and parts
where you know little more than a few buzz words. So you need
not aim at understanding everything fully on first acquaintance,
but learn whatever you can in whatever order you pick it up.
The stuff to be practiced and learnt well is only the part that
comes up over and over again. When you realize that then you know
what to learn, and you quickly see how to do it!

S18d. Research at age 16
At 16, you should spend your time with learning rather than
with doing research. Lacking ideas means knowing too little...
Once you know enough about what others did and where they
got stuck, you'll have more than enough ideas to work on.
I'd like to suggest that you read the Nobel lectures of the
physics Nobel laureates,
The material spans a whole century, and will occupy you for long!
It will put your mind to themes that have been important enough
to merit the prize; most of them will continue to be important in
the future.
In parallel, use the web to sort out all concepts used in the Nobel
lectures that you don't yet understand; at first it will be a lot,
and you have to search a bit to find out where the basics you need are
well explained. Some items might be explained in this theoretical
physics FAQ, or in the book mentioned at the top of this FAQ.
Doing both will put you on a learning track which will end in a
research career and bear plenty of fruit.

S18e. Are there indefinite Hilbert spaces?
There are no indefinite Hilbert spaces. There are, however,
vector spaces with a distinguished indefinite inner product;
these are called Krein spaces. Their structure is much weaker than
that of Hilbert spaces; there is no natural topology, no completeness,
nothing resembling a Hilbert space except the inner product.
Since there are physical situations where indefinite inner products
arise naturally, some people show their lack of knowledge of the
literature by referring to Krein spaces as indefinite Hilbert spaces.
But if a few people do so, it doesn't mean that the terminology is
For example, quant-ph/0211048 uses this poor terminology.
The ghosts referred to in this paper are nonphysical vectors in a
Krein space which contains a definite subspace of physical vectors
whose completion gives the physical Hilbert space. This is a natural
construction in gauge theories (Gupta-Bleuler formalism) where
the direct construction of a physical Hilbert space would
manifestly break Lorentz and/or gauge invariance, while the
nonphysical, bigger Krein space enjoys all desired invariance
The indefinite metric in relativity, also mentioned in that paper,
has nothing to do with indefinite Hilbert spaces, since the
underlying vector spaces (Minkowski space in special relativity,
the tangent spaces at space-time points in general relativity)
are 4-dimensional spaces with the ordinary Euclidean topology
(although the metric is non-Euclidean).
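To make the word 'indefinite' concrete, here is a minimal Python sketch
(an illustration added under stated assumptions, not taken from the
paper cited above). It uses the two-dimensional real vector space with
the inner product <x,y> = x_0*y_0 - x_1*y_1, chosen here only as the
simplest example; the function names are hypothetical. Unlike the
ordinary (definite) inner product, <x,x> can be positive, negative,
or zero for a nonzero vector x.

```python
def indefinite_inner(x, y):
    """Toy indefinite inner product <x,y> = x_0*y_0 - x_1*y_1 on R^2."""
    return x[0] * y[0] - x[1] * y[1]


def definite_inner(x, y):
    """Ordinary (positive definite) inner product on R^2."""
    return x[0] * y[0] + x[1] * y[1]


if __name__ == "__main__":
    for v in [(1, 0), (0, 1), (1, 1)]:
        # <v,v> is positive, negative, and zero for these three vectors,
        # while the definite inner product is positive in all three cases.
        print(v, indefinite_inner(v, v), definite_inner(v, v))
```

In the terminology of the answer above, vectors with <x,x> < 0 play the
role of the nonphysical ones in this toy setting.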

S19a. God and physics
This is most likely to be controversial; but you might be
interested in how the author of this FAQ sees the issues.
The following links are to some relevant pages from my web site.
How Do We Know Whether God Acts In The World?
''I found the assumption that `God acts in the world' a superior
way of organizing the events that I see or hear happen.''
Knowledge, Chance, and Creation
(On the difficulty to know, and the role of the second law of
thermodynamics as an instrument of creation)
How to study
''When I questioned the bible about the attitude appropriate
to the study of science I found the following instructions.''
How to Create a Universe - Instructions for an Apprentice God.
(A fantasy to be read at leisure time)
Science and Faith
(an extensive collection of links)
''Science is the truth only in matters that can be objectified;
in the spiritual world, where values, goals, authority and purpose
are located, science has nothing to say. It is a poor life that is
restricted to the scientific standard of truth, where you and I are
nothing but a collection of atoms without meaning and purpose.
Realizing the narrow-minded nature of science opens the gate to an
understanding of God that complements the scientific truth and gives
life, love and peace.''

and in German:
Gott - die grosse Unbekannte
Mathematik, Physik und Ewigkeit (mit einem Augenzwinkern betrachtet)

S20a. Acknowledgments
Thanks to the contributors to the newsgroup sci.physics.research
for their more or less challenging questions and comments, without
which this FAQ wouldn't exist.
Thanks also to Steve Carlip, Norbert Dragon,
Hendrik van Hees, Don Koks, Nick Maclaren,
Alejandro Rivero, Joe Rongen, and Gerard Westendorp
for useful comments that led to improvements in the FAQ.
Finally, thanks to God for his wonderful and interesting universe,
and for the gift of being able to understand his wonders.