
Statistical Mechanics:

Entropy, Order Parameters

and Complexity
James P. Sethna, Physics, Cornell University, Ithaca, NY
© January 4, 2005

[Cover figure: a two-phase mix of water and oil.]

Electronic version of text available at

1 Why Study Statistical Mechanics? 3

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1 Quantum Dice. . . . . . . . . . . . . . . . . . . . . 7
1.2 Probability Distributions. . . . . . . . . . . . . . . 8
1.3 Waiting times. . . . . . . . . . . . . . . . . . . . . 8
1.4 Stirling's Approximation and Asymptotic Series. . 9
1.5 Random Matrix Theory. . . . . . . . . . . . . . . . 10

2 Random Walks and Emergent Properties 13

2.1 Random Walk Examples: Universality and Scale Invariance 13
2.2 The Diffusion Equation . . . . . . . . . . . . . . . . . . . 17
2.3 Currents and External Forces. . . . . . . . . . . . . . . . . 19
2.4 Solving the Diffusion Equation . . . . . . . . . . . . . . . 21
2.4.1 Fourier . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.2 Green . . . . . . . . . . . . . . . . . . . . . . . . . 22
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1 Random walks in Grade Space. . . . . . . . . . . . 24
2.2 Photon diffusion in the Sun. . . . . . . . . . . . . . 24
2.3 Ratchet and Molecular Motors. . . . . . . . . . . . 24
2.4 Solving Diffusion: Fourier and Green. . . . . . . . 26
2.5 Solving the Diffusion Equation. . . . . . . . . . . . 26
2.6 Frying Pan . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Thermal Diffusion. . . . . . . . . . . . . . . . . . . 27
2.8 Polymers and Random Walks. . . . . . . . . . . . 27

3 Temperature and Equilibrium 29

3.1 The Microcanonical Ensemble . . . . . . . . . . . . . . . . 29
3.2 The Microcanonical Ideal Gas . . . . . . . . . . . . . . . . 31
3.2.1 Configuration Space . . . . . . . . . . . . . . . . . 32
3.2.2 Momentum Space . . . . . . . . . . . . . . . . . . 33
3.3 What is Temperature? . . . . . . . . . . . . . . . . . . . . 37
3.4 Pressure and Chemical Potential . . . . . . . . . . . . . . 40
3.5 Entropy, the Ideal Gas, and Phase Space Refinements . . 44
3.6 What is Thermodynamics? . . . . . . . . . . . . . . . . . 46
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1 Escape Velocity. . . . . . . . . . . . . . . . . . . . 48

3.2 Hard Sphere Gas . . . . . . . . . . . . . . . . . . . 49

3.3 Connecting Two Macroscopic Systems. . . . . . . . 49
3.4 Gauss and Poisson. . . . . . . . . . . . . . . . . . . 50
3.5 Microcanonical Thermodynamics . . . . . . . . . . 50

4 Phase Space Dynamics and Ergodicity 53

4.1 Liouville's Theorem . . . . . . . . . . . . . . . . . . . . . 53
4.2 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.1 The Damped Pendulum vs. Liouville's Theorem. . 60
4.2 Jupiter! and the KAM Theorem . . . . . . . . . . 60
4.3 Invariant Measures. . . . . . . . . . . . . . . . . . 62

5 Free Energies and Ensembles 65

5.1 Free Energy . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 The Canonical Ensemble . . . . . . . . . . . . . . . . . . . 67
5.3 Non-Interacting Canonical Distributions . . . . . . . . . . 70
5.4 Grand Canonical Ensemble . . . . . . . . . . . . . . . . . 72
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.1 Two-state system. . . . . . . . . . . . . . . . . . . 74
5.2 Barrier Crossing. . . . . . . . . . . . . . . . . . . . 75
5.3 Statistical Mechanics and Statistics. . . . . . . . . 76
5.4 Euler, Gibbs-Duhem, and Clausius-Clapeyron. . . 77
5.5 Negative Temperature. . . . . . . . . . . . . . . . . 78
5.6 Laplace. . . . . . . . . . . . . . . . . . . . . . . . . 78
5.7 Legendre. . . . . . . . . . . . . . . . . . . . . . . . 79
5.8 Molecular Motors: Which Free Energy? . . . . . . 79
5.9 Michaelis-Menten and Hill . . . . . . . . . . . . . . 79

6 Entropy 83
6.1 Entropy as Irreversibility: Engines and Heat Death . . . . 83
6.2 Entropy as Disorder . . . . . . . . . . . . . . . . . . . . . 87
6.2.1 Mixing: Maxwell's Demon and Osmotic Pressure . 87
6.2.2 Residual Entropy of Glasses: The Roads Not Taken 89
6.3 Entropy as Ignorance: Information and Memory . . . . . 92
6.3.1 Nonequilibrium Entropy . . . . . . . . . . . . . . . 92
6.3.2 Information Entropy . . . . . . . . . . . . . . . . . 94
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.1 Life and the Heat Death of the Universe. . . . . . 97
6.2 P-V Diagram. . . . . . . . . . . . . . . . . . . . . . 98
6.3 Carnot Refrigerator. . . . . . . . . . . . . . . . . . 98
6.4 Lagrange. . . . . . . . . . . . . . . . . . . . . . . . 99
6.5 Does Entropy Increase? . . . . . . . . . . . . . . . 99
6.6 Entropy Increases: Diffusion. . . . . . . . . . . . . 100
6.7 Information entropy. . . . . . . . . . . . . . . . . . 101
6.8 Shannon entropy. . . . . . . . . . . . . . . . . . . . 101
6.9 Entropy of Glasses. . . . . . . . . . . . . . . . . . . 102
6.10 Rubber Band. . . . . . . . . . . . . . . . . . . . . . 103
To be pub. Oxford UP, Fall05

6.11 Entropy Measures Ignorance. . . . . . . . . . . . . 104

6.12 Chaos, Lyapunov, and Entropy Increase. . . . . . . 104
6.13 Black Hole Thermodynamics. . . . . . . . . . . . . 105
6.14 Fractal Dimensions. . . . . . . . . . . . . . . . . . 105

7 Quantum Statistical Mechanics 109

7.1 Quantum Ensembles and Density Matrices . . . . . . . . . 109
7.2 Quantum Harmonic Oscillator . . . . . . . . . . . . . . . . 114
7.3 Bose and Fermi Statistics . . . . . . . . . . . . . . . . . . 115
7.4 Non-Interacting Bosons and Fermions . . . . . . . . . . . 116
7.5 Maxwell-Boltzmann Quantum Statistics . . . . . . 119
7.6 Black Body Radiation and Bose Condensation . . . . . . 121
7.6.1 Free Particles in a Periodic Box . . . . . . . . . . . 121
7.6.2 Black Body Radiation . . . . . . . . . . . . . . . . 122
7.6.3 Bose Condensation . . . . . . . . . . . . . . . . . . 123
7.7 Metals and the Fermi Gas . . . . . . . . . . . . . . . . . . 125
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.1 Phase Space Units and the Zero of Entropy. . . . . 126
7.2 Does Entropy Increase in Quantum Systems? . . . 127
7.3 Phonons on a String. . . . . . . . . . . . . . . . . . 128
7.4 Crystal Defects. . . . . . . . . . . . . . . . . . . . 128
7.5 Density Matrices. . . . . . . . . . . . . . . . . . . . 128
7.6 Ensembles and Statistics: 3 Particles, 2 Levels. . . 128
7.7 Bosons are Gregarious: Superfluids and Lasers . . 129
7.8 Einstein's A and B . . . . . . . . . . . . . . . . . . 130
7.9 Phonons and Photons are Bosons. . . . . . . . . . 131
7.10 Bose Condensation in a Band. . . . . . . . . . . . 132
7.11 Bose Condensation in a Parabolic Potential. . . . . 132
7.12 Light Emission and Absorption. . . . . . . . . . . . 133
7.13 Fermions in Semiconductors. . . . . . . . . . . . . 134
7.14 White Dwarves, Neutron Stars, and Black Holes. . 135

8 Computational Stat Mech: Ising and Markov 137

8.1 The Ising Model . . . . . . . . . . . . . . . . . . . . . . . 137
8.1.1 Magnetism . . . . . . . . . . . . . . . . . . . . . . 137
8.1.2 Binary Alloys . . . . . . . . . . . . . . . . . . . . . 138
8.1.3 Lattice Gas and the Critical Point . . . . . . . . . 139
8.1.4 How to Solve the Ising Model. . . . . . . . . . . . 140
8.2 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . 141
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.1 The Ising Model. . . . . . . . . . . . . . . . . . . . 145
8.2 Coin Flips and Markov Chains. . . . . . . . . . . . 146
8.3 Red and Green Bacteria . . . . . . . . . . . . . . . 146
8.4 Detailed Balance. . . . . . . . . . . . . . . . . . . . 147
8.5 Heat Bath, Metropolis, and Wolff. . . . . . . . . . 147
8.6 Stochastic Cells. . . . . . . . . . . . . . . . . . . . 148
8.7 The Repressilator. . . . . . . . . . . . . . . . . . . 150
8.8 Entropy Increases! Markov chains. . . . . . . . . . 152
© J. P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

8.9 Solving ODEs: The Pendulum . . . . . . . . . . . 153

8.10 Small World Networks. . . . . . . . . . . . . . . . 156
8.11 Building a Percolation Network. . . . . . . . . . . 158
8.12 Hysteresis Model: Computational Methods. . . . . 160

9 Order Parameters, Broken Symmetry, and Topology 163

9.1 Identify the Broken Symmetry . . . . . . . . . . . . . . . 164
9.2 Define the Order Parameter . . . . . . . . . . . . . . . . . 164
9.3 Examine the Elementary Excitations . . . . . . . . . . . . 167
9.4 Classify the Topological Defects . . . . . . . . . . . . . . . 170
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.1 Topological Defects in the XY Model. . . . . . . . 175
9.2 Topological Defects in Nematic Liquid Crystals. . 177
9.3 Defect Energetics and Total Divergence Terms. . . 177
9.4 Superfluid Order and Vortices. . . . . . . . . . . . 177

10 Deriving New Laws: Landau Theory 179

10.1 Random Walks from Symmetry . . . . . . . . . . . . . . . 180
10.2 What is a Phase? Perturbation theory. . . . . . . . . . . . 183
10.3 Free Energy Density for the Ideal Gas . . . . . . . . . . . 185
10.4 Landau Theory for Free Energy Densities . . . . . . . . . 188
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
10.1 Deriving New Laws. . . . . . . . . . . . . . . . . . 193
10.2 Symmetries of the Wave Equation. . . . . . . . . . 193
10.3 Bloch walls in Magnets. . . . . . . . . . . . . . . . 193
10.4 Pollen and Hard Squares. . . . . . . . . . . . . . . 194
10.5 Superfluids: Density Matrices and ODLRO. . . . . 194

11 Correlations, Response, and Dissipation 199

11.1 Correlation Functions: Motivation . . . . . . . . . . . . . 199
11.2 Experimental Probes of Correlations . . . . . . . . . . . . 201
11.3 Equal-Time Correlations in the Ideal Gas . . . . . . . . . 202
11.4 Onsager's Regression Hypothesis and Time Correlations . 204
11.5 Susceptibility and the Fluctuation-Dissipation Theorem . 206
11.5.1 Dissipation and the imaginary part χ″(ω) . . . . . 207
11.5.2 Calculating the static susceptibility χ0(k) . . . . . 209
11.5.3 Calculating the dynamic susceptibility χ(r, t) . . . 211
11.6 Causality and Kramers-Kronig . . . . . . . . . . . . . . . 214
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
11.1 Telegraph Noise and RNA Unfolding. . . . . . . . 215
11.2 Telegraph Noise in Nanojunctions. . . . . . . . . . 216
11.3 Coarse-Grained Magnetic Dynamics. . . . . . . . . 217
11.4 Fluctuations, Correlations, and Response: Ising . . 218
11.5 Spin Correlation Functions and Susceptibilities. . . 219

12 Abrupt Phase Transitions 221

12.1 Maxwell Construction. . . . . . . . . . . . . . . . . . . . . 223
12.2 Nucleation: Critical Droplet Theory. . . . . . . . . . . . . 224
12.3 Morphology of abrupt transitions. . . . . . . . . . . . . . 226

12.3.1 Coarsening. . . . . . . . . . . . . . . . . . . . . . . 226

12.3.2 Martensites. . . . . . . . . . . . . . . . . . . . . . . 229
12.3.3 Dendritic Growth. . . . . . . . . . . . . . . . . . . 230
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
12.1 van der Waals Water. . . . . . . . . . . . . . . . . 230
12.2 Nucleation in the Ising Model. . . . . . . . . . . . 231
12.3 Coarsening and Criticality in the Ising Model. . . . 232
12.4 Nucleation of Dislocation Pairs. . . . . . . . . . . . 233
12.5 Origami Microstructure. . . . . . . . . . . . . . . . 234
12.6 Minimizing Sequences and Microstructure. . . . . . 236

13 Continuous Transitions 239

13.1 Universality. . . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.2 Scale Invariance . . . . . . . . . . . . . . . . . . . . . . . . 248
13.3 Examples of Critical Points. . . . . . . . . . . . . . . . . . 255
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
13.1 Scaling: Critical Points and Coarsening. . . . . . . 258
13.2 RG Trajectories and Scaling. . . . . . . . . . . . . 259
13.3 Bifurcation Theory and Phase Transitions. . . . . 259
13.4 Onset of Lasing as a Critical Point. . . . . . . . . . 261
13.5 Superconductivity and the Renormalization Group. 262
13.6 RG and the Central Limit Theorem: Short. . . . . 264
13.7 RG and the Central Limit Theorem: Long. . . . . 264
13.8 Period Doubling. . . . . . . . . . . . . . . . . . . . 266
13.9 Percolation and Universality. . . . . . . . . . . . . 269
13.10 Hysteresis Model: Scaling and Exponent Equalities. 271

A Appendix: Fourier Methods 275

A.1 Fourier Conventions . . . . . . . . . . . . . . . . . . . . . 275
A.2 Derivatives, Correlations, and Convolutions . . . . . . . . 277
A.3 Fourier and Translational Symmetry . . . . . . . . . . . . 278
A.4 Fourier Methods and Function Space . . . . . . . . . . . . 279
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
A.1 Relations between the Fouriers. . . . . . . . . . . . 279



1 Why Study Statistical Mechanics?
Many systems in nature are far too complex to analyze directly. Solving
for the motion of all the atoms in a block of ice, or the boulders in
an earthquake fault, or the nodes on the Internet is simply infeasible.
Despite this, such systems often show simple, striking behavior. We use
statistical mechanics to explain the simple behavior of complex systems.
Statistical mechanics brings together concepts and methods that infiltrate
many fields of science, engineering, and mathematics. Ensembles,
entropy, phases, Monte Carlo, emergent laws, and criticality all
are concepts and methods rooted in the physics and chemistry of gasses
and liquids, but they have become important in mathematics, biology, and
computer science. In turn, these broader applications bring perspective
and insight to our fields.
Let's start by briefly introducing these pervasive concepts and methods.
Ensembles: The trick of statistical mechanics is not to study a single
system, but a large collection or ensemble of systems. Where understanding
a single system is often impossible, calculating the behavior of
an enormous collection of similarly prepared systems often allows one to
answer most questions that science can be expected to address.
For example, consider the random walk (figure 1.1, chapter 2). (You
might imagine it as the trajectory of a particle in a gas, or the configuration
of a polymer in solution.) While the motion of any given walk
is irregular (left) and hard to predict, simple laws describe the distribution
of motions of an infinite ensemble of random walks starting from
the same initial point (right). Introducing and deriving these ensembles
are the themes of chapters 3, 4, and 5.
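This ensemble view is easy to check numerically. Below is a minimal sketch (our own illustration, not code from the text; the function names are invented) that generates an ensemble of one-dimensional unit-step random walks and verifies that the root-mean-square end-to-end distance grows as the square root of the number of steps:

```python
import math
import random

def walk_endpoint(n_steps, rng):
    """One random walk: sum n_steps steps of +1 or -1; return the endpoint."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))

def rms_endpoint(n_steps, n_walks=4000, seed=0):
    """Root-mean-square endpoint over an ensemble of similarly prepared walks."""
    rng = random.Random(seed)
    ends = [walk_endpoint(n_steps, rng) for _ in range(n_walks)]
    return math.sqrt(sum(x * x for x in ends) / n_walks)

# Any single walk is irregular and unpredictable, but the ensemble obeys
# a simple law: the RMS end-to-end distance is close to sqrt(N).
for n in (16, 64, 256):
    print(n, rms_endpoint(n) / math.sqrt(n))  # ratios close to 1
```

The point is exactly the one made above: questions about a single trajectory are hopeless, while averages over the ensemble obey simple laws.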
Entropy: Entropy is the most influential concept arising from statistical
mechanics (chapter 6). Entropy, originally understood as a thermodynamic
property of heat engines that could only increase, has become
science's fundamental measure of disorder and information. Although it
controls the behavior of particular systems, entropy can only be defined
within a statistical ensemble: it is the child of statistical mechanics,
with no correspondence in the underlying microscopic dynamics. Entropy
now underlies our understanding of everything from compression
algorithms for pictures on the Web to the heat death expected at the
end of the universe.
Phases. Statistical mechanics explains the existence and properties of

Fig. 1.1 Random Walks. The motion of molecules in a gas, or bacteria in a

liquid, or photons in the Sun, is described by an irregular trajectory whose velocity
rapidly changes in direction at random. Describing the specific trajectory of any
given random walk (left) is not feasible or even interesting. Describing the statistical
average properties of a large number of random walks is straightforward; at right is
shown the endpoints of random walks all starting at the center. The deep principle
underlying statistical mechanics is that it is often easier to understand the behavior
of ensembles of systems.

phases. The three common phases of matter (solids, liquids, and gasses)
have multiplied into hundreds: from superfluids and liquid crystals, to
vacuum states of the universe just after the Big Bang, to the pinned
and sliding phases of earthquake faults. Phases have an integrity or
stability to small changes in external conditions or composition,1 and
often have a rigidity or stiffness. Understanding what phases are and
how to describe their properties, excitations, and topological defects will
be the themes of chapters 7,2 9, and 10.
2 Chapter 7 focuses on quantum statistical mechanics: quantum statistics, metals,
insulators, superfluids, Bose condensation, . . . To keep the presentation accessible
to a broad audience, the rest of the text is not dependent upon knowing quantum
mechanics.
Computational Methods: Monte Carlo methods use simple rules
to allow the computer to find ensemble averages in systems far too complicated
to allow analytical evaluation. These tools, invented and sharpened
in statistical mechanics, are used everywhere in science and technology,
from simulating the innards of particle accelerators, to studies
of traffic flow, to designing computer circuits. In chapter 8, we introduce
the Markov-chain mathematics that underlies Monte Carlo.
Emergent Laws. Statistical mechanics allows us to derive the new
laws that emerge from the complex microscopic behavior. These laws become
exact only in certain limits. Thermodynamics, the study of heat,
1 Water remains a liquid, with only perturbative changes in its properties, as one
changes the temperature or adds alcohol. Indeed, it is likely that all liquids are
connected to one another, and indeed to the gas phase, through paths in the space
of composition and external conditions.

Fig. 1.2 Temperature: the Ising model at the critical temperature.
Traditional statistical mechanics focuses on understanding phases of matter,
and transitions between phases. These phases (solids, liquids, magnets,
superfluids) are emergent properties of many interacting molecules,
spins, or other degrees of freedom. Pictured here is a simple
two-dimensional model at its magnetic transition temperature Tc. At
higher temperatures, the system is non-magnetic: the magnetization is
on average zero. At the temperature shown, the system is just deciding
whether to magnetize upward (white) or downward (black). While predicting
the time dependence of all these degrees of freedom is not practical or
possible, calculating the average behavior of many such systems (a statistical
ensemble) is the job of statistical mechanics.

temperature, and entropy, becomes exact in the limit of large numbers
of particles. Scaling behavior and power laws, both at phase transitions
and more broadly in complex systems, emerge for large systems tuned
(or self-organized) near critical points. The right side of figure 1.1 illustrates
the simple law (the diffusion equation) that describes the evolution of
the end-to-end lengths of random walks in the limit where the number
of steps becomes large. Developing the machinery to express and derive
these new laws are the themes of chapters 10 (phases) and 13 (critical
points). Chapter 11 systematically studies the fluctuations about these
emergent theories, and how they relate to the response to external forces.
Phase Transitions. Beautiful spatial patterns arise in statistical
mechanics at the transitions between phases. Most of these are abrupt
phase transitions: ice is crystalline and solid until abruptly (at the edge
of the ice cube) it becomes unambiguously liquid. We study nucleation
and the exotic structures that evolve at abrupt phase transitions in chapter 12.
Other phase transitions are continuous. Figure 1.2 shows a snapshot
of the Ising model at its phase transition temperature Tc. The Ising
model is a lattice of sites that can take one of two states. It is used as a
simple model for magnets (spins pointing up or down), two-component
crystalline alloys (A or B atoms), or transitions between liquids and
gasses (occupied and unoccupied sites).3 All of these systems, at their
critical points, share the self-similar, fractal structures seen in the figure:
the system can't decide whether to stay gray or to separate into black

3 The Ising model has more far-flung applications: the three-dimensional Ising
model has been useful in the study of quantum gravity.
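Chapter 8 develops the computational side of this model properly; as a preview, here is a minimal Metropolis simulation (our own sketch, not code from the text; we take units where the coupling J and kB equal 1, so Tc = 2/ln(1 + √2) ≈ 2.27). It shows the magnetized phase below Tc and the disordered phase above:

```python
import math
import random

def metropolis_ising(size=16, temperature=2.0, sweeps=1000, seed=1):
    """Metropolis sampling of a 2D Ising model with periodic boundaries.

    Starts from the all-up state; returns the magnetization per spin,
    averaged over the second half of the run.
    """
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    m_samples = []
    for sweep in range(sweeps):
        for _ in range(size * size):
            i = rng.randrange(size)
            j = rng.randrange(size)
            # Sum of the four neighboring spins (periodic boundaries).
            neighbors = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                         + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
            delta_e = 2 * spins[i][j] * neighbors  # energy cost of a flip
            if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
                spins[i][j] = -spins[i][j]
        if sweep >= sweeps // 2:
            m_samples.append(sum(map(sum, spins)) / size**2)
    return sum(m_samples) / len(m_samples)

print("T = 1.5:", metropolis_ising(temperature=1.5))  # magnetized phase
print("T = 3.0:", metropolis_ising(temperature=3.0))  # disordered phase
```

Below Tc the magnetization per spin stays near one; above Tc it fluctuates around zero, just as the caption of figure 1.2 describes.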


Fig. 1.3 Dynamical Systems and Chaos. The ideas and methods of
statistical mechanics have close ties to many other fields. Many nonlinear
differential equations and mappings, for example, have qualitative
changes of behavior (bifurcations) as parameters are tuned, and can exhibit
chaotic behavior. Here we see the long-time equilibrium dynamics
x*(μ) of a simple mapping of the unit interval into itself as a parameter μ is
tuned. Just as an Ising magnet goes from one unmagnetized state above Tc
to two magnetized states below Tc, so this system goes from a periodic
state below μ1 to a period-two cycle above μ1. Above μ∞, the behavior
is chaotic. The study of chaos has provided us with our fundamental
explanation for the increase of entropy in statistical mechanics. Conversely,
tools developed in statistical mechanics have been central to the
understanding of the onset of chaos.

and white, so it fluctuates on all scales. Another self-similar, fractal
object emerges from random walks (left side of figure 1.1) even without tuning
to a critical point: a blow-up of a small segment of the walk looks
statistically similar to the original path. Chapter 13 develops the scaling
and renormalization-group techniques that we use to understand these
self-similar, fractal properties.
Applications. Science grows through accretion, but becomes potent
through distillation. Each generation expands the knowledge base,
extending the explanatory power of science to new domains. In these
explorations, new unifying principles, perspectives, and insights lead us
to deeper, simpler understanding of our fields.
The period-doubling route to chaos (figure 1.3) is an excellent example
of how statistical mechanics has grown tentacles into disparate
fields, and has been enriched thereby. On the one hand, renormalization-group
methods drawn directly from statistical mechanics (chapter 13)
were used to explain the striking scaling behavior seen at the onset
of chaos (the geometrical branching pattern at the left of the figure).
These methods also predicted that this behavior should be universal:
this same period-doubling cascade, with quantitatively the same scaling
behavior, would be seen in vastly more complex systems. This was
later verified everywhere from fluid mechanics to models of human walking.
Conversely, the study of chaotic dynamics has provided our best
fundamental understanding of the cause for the increase of entropy in
statistical mechanics (chapter 6).
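The branching pattern of figure 1.3 is easy to regenerate yourself. The text does not specify the mapping here, so the sketch below (ours) assumes the standard logistic map f(x) = μ x (1 − x), a common choice exhibiting the same universal period-doubling cascade:

```python
def attractor(mu, transient=10000, keep=512):
    """Long-time orbit of the logistic map x -> mu * x * (1 - x)."""
    x = 0.5
    for _ in range(transient):      # discard the transient
        x = mu * x * (1.0 - x)
    orbit = set()
    for _ in range(keep):           # collect the limiting cycle
        x = mu * x * (1.0 - x)
        orbit.add(round(x, 6))      # round away floating-point noise
    return sorted(orbit)

# The attractor's period doubles as mu crosses the bifurcation
# points near 3 and 3.449, en route to chaos:
for mu in (2.8, 3.2, 3.5):
    print(mu, "period", len(attractor(mu)))
```

Sweeping μ finely and plotting every orbit point against μ reproduces the bifurcation diagram sketched in the figure.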
We provide here the distilled version of statistical mechanics, invigorated
and clarified by the accretion of the last four decades of research.

The text in each chapter will address those topics of fundamental importance
to all who study our field: the exercises will provide in-depth
introductions to the accretion of applications in mesoscopic physics,
astrophysics, dynamical systems, information theory, low-temperature
physics, statistics, biology, lasers, and complexity theory. The goal is to
broaden the presentation to make it useful and comprehensible to sophisticated
biologists, mathematicians, computer scientists, or complex-systems
sociologists, thereby enriching the subject for the physics and
chemistry students, many of whom will likely make excursions in later
life into these disparate fields.

Exercises 1.1-1.3 provide a brief review of probability distributions.
Quantum Dice explores discrete distributions and also acts as a gentle
preview into Bose and Fermi statistics. Probability Distributions introduces
the form and moments for the key distributions for continuous
variables and then introduces convolutions and multidimensional distributions.
Waiting Times shows the paradoxes one can concoct by confusing
different ensemble averages. Stirling part (a) derives the useful approximation
n! ≈ √(2πn) (n/e)^n; more advanced students can continue
in the later parts to explore asymptotic series, which arise in typical
perturbative statistical mechanics calculations. Random Matrix Theory
briefly introduces a huge field, with applications in nuclear physics,
mesoscopic physics, and number theory; part (a) provides a good exercise
in histograms and ensembles, and the remaining more advanced
parts illustrate level repulsion, the Wigner surmise, universality, and
emergent symmetry.

Fig. 1.4 Rolling two dice. [The figure is a grid of the nine possible
rolls, Roll #1 on the horizontal axis and Roll #2 on the vertical.] In
Bosons, one accepts only the rolls in the shaded squares, with equal
probability 1/6. In Fermions, one accepts only the rolls in the darkly
shaded squares (not including the diagonal), with probability 1/3.
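The Stirling approximation quoted above is easy to sanity-check numerically; a short sketch (the helper name `stirling` is our own):

```python
import math

def stirling(n):
    """Stirling's approximation n! ~ sqrt(2 pi n) (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# The relative error shrinks roughly like 1/(12 n):
for n in (5, 20, 100):
    print(n, math.factorial(n) / stirling(n))  # ratios approach 1 from above
```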

(1.1) Quantum Dice. (Quantum) (With Buchan. [15])
You are given several unusual three-sided dice which, when rolled, show
either one, two, or three spots. There are three games played with these
dice: Distinguishable, Bosons, and Fermions. In each turn in these games,
the player rolls one die at a time, starting over if required by the rules,
until a legal combination occurs. In Distinguishable, all rolls are legal.
In Bosons, a roll is legal only if the new number is larger than or equal
to the preceding number. In Fermions, a roll is legal only if the new
number is strictly larger than the preceding number. See figure 1.4 for
a table of possibilities after rolling two dice.
(a) Presume the dice are fair: each of the three numbers of dots shows
up 1/3 of the time. For a legal turn rolling a die twice in Bosons, what is
the probability ρ(4) of rolling a 4? Similarly, among the legal Fermion
turns rolling two dice, what is the probability ρ(4)?
Our dice rules are the same ones that govern the quantum statistics of
identical particles.
(b) For a legal turn rolling three three-sided dice in Fermions, what is
the probability ρ(6) of rolling a 6? (Hint: there's a Fermi exclusion
principle: when playing Fermions, no two dice can have the same number
of dots showing.) Electrons are fermions; no two electrons can be in
exactly the same state.
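Brute-force enumeration is a handy check on this kind of state counting; the sketch below (our own code, with invented helper names) lists the legal turns for each game. Because an illegal roll restarts the whole turn, each legal ordered sequence is equally likely, which the tally exploits:

```python
from fractions import Fraction
from itertools import product

def legal_turns(game, n_dice=2, n_sides=3):
    """All legal ordered roll sequences for a game; restarting on an
    illegal roll makes each legal sequence equally probable."""
    checks = {
        "Distinguishable": lambda rolls: True,
        "Bosons": lambda rolls: all(a <= b for a, b in zip(rolls, rolls[1:])),
        "Fermions": lambda rolls: all(a < b for a, b in zip(rolls, rolls[1:])),
    }
    sides = range(1, n_sides + 1)
    return [r for r in product(sides, repeat=n_dice) if checks[game](r)]

def prob_doubles(game):
    """Probability that a legal two-roll turn is a double."""
    turns = legal_turns(game)
    return Fraction(sum(r[0] == r[1] for r in turns), len(turns))

# Bosons enhance the probability of doubles (printed as a fraction, 3/2):
print(prob_doubles("Bosons") / prob_doubles("Distinguishable"))
# The binomial count of Boson turns: 10 for M = 3 rolls of N = 3 sides.
print(len(legal_turns("Bosons", n_dice=3)))
```

The 1.5-times enhancement of doubles and the ten three-roll Boson turns computed here are exactly the counts discussed in the text.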

legal turns (11), (12), (13), . . . , (33): half of them are (c) Sums of variables. Draw a graph of the probabil-
doubles (both numbers equal), when for plain old Dis- ity distribution of the sum x + y of two random variables
tinguishable turns only one third would be doubles4 : the drawn from a uniform distribution on [0, 1). Argue in gen-
probability of getting doubles is enhanced by 1.5 times eral that the sum z = x + y of random variables with dis-
in two-roll Bosons. When rolling three dice in Bosons, tributions 1 (x) and 2 (y)R will have a distribution given
there are ten dierent legal turns (111), (112), (113), . . . , by the convolution (z) = 1 (x)2 (z x) dx.
(333). When rolling M dice each ` with N sides in Bosons, Multidimensional probability distributions. In sta-
1 1)!
one can show that there are N+M M
= (N+M
M !(N1)!
legal tistical mechanics, we often discuss probability distribu-
turns. tions for many variables at once (for example, all the
(c) In a turn of three rolls, what is the enhancement of components of all the velocities of all the atoms in a
probability of getting triples in Bosons over that in Distin- box). Lets consider just the probability distribution of
guishable? In a turn of M rolls, what is the enhancement one molecules velocities. If vx , vy , and vz of a molecule
of probability for generating an M-tuple (all rolls having are independent and p each distributed with a Gaussian
the same number of dots showing)? distribution with = kT /M (section 3.2.2) then we de-
Notice that the states of the dice tend to cluster together scribe the combined probability distribution as a function
in Bosons. Examples of real bosons clustering into the of three variables as the product of the three Gaussians:
same state include Bose condensation (section 7.6.3) and
lasers (exercise 7.7). (vx , vy , vz ) = 1/(2(kT /M ))3/2 exp(M v2 /2kT )
r ! r 2
M vx M vy
(1.2) Probability Distributions. (Basic) M M
= e 2kT e 2kT
Most people are more familiar with probabilities for dis- 2kT 2kT
r !
crete events (like coin ips and card games), than with M 2
M vz

probability distributions for continuous variables (like hu- e 2kT . (1.1)

man heights and atomic velocities). The three contin-
uous probability distributions most commonly encoun-
(d) Show, using your answer for the standard deviation
tered in physics are: (i) Uniform: uniform (x) = 1 for
of the Gaussian in part (b), that the mean kinetic energy
0 x < 1, (x) = 0 otherwise; produced by ran-
is kT /2 per dimension. Show that the probability that the
dom number generators on computers; (ii) Exponential:
speed is v = |v| is given by a Maxwellian distribution
exponential (t) = et/ / for t 0, familiar from radioac-
tive decay and used in the collision theory p
2 2 of gasses; and Maxwell (v) = 2/(v 2 / 3 ) exp(v 2 /2 2 ). (1.2)
(iii) Gaussian: gaussian (v) = ev /2 /( 2), describ-
ing the probability distribution of velocities in a gas, the (Hint: What is the shape of the region in 3D velocity
distribution of positions at long times in random walks, space where |v| is between v and v + v? The area of a
the sums of random variables, and the solution to the sphere of radius R is 4R2 .)
diusion equation.
(a) Likelihoods. What is the probability that a ran- (1.3) Waiting times. (Basic) (With Brouwer. [14])
dom number uniform on [0, 1) will happen to lie between On a highway, the average numbers of cars and buses go-
x = 0.7 and x = 0.75? That the waiting time for a ra- ing east are equal: each hour, on average, there are 12
dioactive decay of a nucleus will be more than twice the ex- buses and 12 cars passing by. The buses are scheduled:
ponential decay time ? That your score on an exam with each bus appears exactly 5 minutes after the previous one.
Gaussian distribution of scores R will
be greater than 2 On the other hand, the cars appear at random: in a short
above themean? (Note: 2 (1/ 2) exp(v 2 /2) dv = interval dt, the probability that a car comes by is dt/ ,
(1 erf( 2))/2 0.023.) with = 5 minutes. An observer is counting the cars and
(b) Normalization, Mean, and Standard De- buses.
viation. Show thatR these probability distributions (a) Verify that each hour the average number of cars pass-
are normalized: (x)dx = 1. What is the ing the observer is 12.
mean x0qof each distribution? The standard de-
R (b) What is the probability Pbus (n) that n buses pass the
viation (x x0 )2 (x)dx? (You may use
R 2
observer in a randomly chosen 10 minute interval? And
the formulas (1/ 2) exp(x /2) dx = 1 and what is the probability Pcar (n) that n cars pass the ob-
R 2

x (1/ 2) exp(x2 /2) dx = 1.) server in the same time interval? (Hint: For the cars,

4 For Fermions, of course, there are no doubles.

To be pub. Oxford UP, Fall05

one way to proceed is to divide the interval into many small slivers of time dt: in each sliver the probability is dt/τ that a car passes, and 1 − dt/τ ≈ e^(−dt/τ) that no car passes. However you do it, you should get a Poisson distribution, P_car(n) = aⁿ e^(−a)/n!. See also exercise 3.4.)

(c) What is the probability distribution ρ_bus and ρ_car for the time interval Δ between two successive buses and cars, respectively? What are the means of these distributions? (Hint: To answer this for the bus, you'll need to use the Dirac δ-function,⁵ which is zero away from zero and infinite at zero, with integral equal to one: ∫ f(x) δ(x − b) dx = f(b).)

(d) If another observer arrives at the road at a randomly chosen time, what is the probability distribution for the time Δ she has to wait for the first bus to arrive? What is the probability distribution for the time she has to wait for the first car to pass by? (Hint: What would the distribution of waiting times be just after a car passes by? Does the time of the next car depend at all on the previous car?) What are the means of these distributions?

The mean time between cars is 5 minutes. The mean time to the next car should be 5 minutes. A little thought should convince you that the mean time since the last car should also be 5 minutes. But 5 + 5 ≠ 5: how can this be?

The same physical quantity can have different means when averaged in different ensembles! The mean time between cars in part (c) was a gap average: it weighted all gaps between cars equally. The mean time to the next car from part (d) was a time average: the second observer arrives with equal probability at every time, so is twice as likely to arrive during a gap between cars that is twice as long.

(e) In part (c), ρ^gap_car(Δ) was the probability that a randomly chosen gap was of length Δ. Write a formula for ρ^time_car(Δ), the probability that the second observer, arriving at a randomly chosen time, will be in a gap between cars of length Δ. (Hint: Make sure it's normalized.) From ρ^time_car(Δ), calculate the average length of the gaps between cars, using the time-weighted average measured by the second observer.

(1.4) Stirling's Approximation and Asymptotic Series. (Mathematics)
One important approximation useful in statistical mechanics is Stirling's approximation [99] for n!, valid for large n. It's not a traditional Taylor series: rather, it's an asymptotic series. Stirling's formula is extremely useful in this course, and asymptotic series are important in many fields of applied mathematics, statistical mechanics [97], and field theory [98], so let's investigate them in detail.

(a) Show, by converting the sum to an integral, that log(n!) ≈ (n + 1/2) log(n + 1/2) − n − (1/2) log(1/2), where (as always in this book) log represents the natural logarithm, not log₁₀. Show that this is compatible with the more precise and traditional formula n! ≈ (n/e)ⁿ √(2πn); in particular, show that the difference of the logs goes to a constant as n → ∞. Show that the latter is compatible with the first term in the series we use below, n! ≈ (2π/(n + 1))^(1/2) e^(−(n+1)) (n + 1)^(n+1), in that the difference of the logs goes to zero as n → ∞. Related formulæ: ∫ log x dx = x log x − x, and log(n + 1) − log(n) = log(1 + 1/n) ≈ 1/n up to terms of order 1/n².

We want to expand this function for large n: to do this, we need to turn it into a continuous function, interpolating between the integers. This continuous function, with its argument perversely shifted by one, is Γ(z) = (z − 1)!. There are many equivalent formulas for Γ(z): indeed, any formula giving an analytic function satisfying the recursion relation Γ(z + 1) = zΓ(z) and the normalization Γ(1) = 1 is equivalent (by theorems of complex analysis). We won't use it here, but a typical definition is Γ(z) = ∫₀^∞ e^(−t) t^(z−1) dt: one can integrate by parts to show that Γ(z + 1) = zΓ(z).

(b) Show, using the recursion relation Γ(z + 1) = zΓ(z), that Γ(z) is infinite (has a pole) at all the negative integers.

Stirling's formula is extensible [9, p.218] into a nice expansion of Γ(z) in powers of 1/z = z⁻¹:

Γ[z] = (z − 1)!
     ≈ (2π/z)^(1/2) e^(−z) z^z (1 + (1/12)z⁻¹
         + (1/288)z⁻² − (139/51840)z⁻³
         − (571/2488320)z⁻⁴ + (163879/209018880)z⁻⁵
         + (5246819/75246796800)z⁻⁶
         − (534703531/902961561600)z⁻⁷
         − (4483131259/86684309913600)z⁻⁸ + ...)    (1.3)

This looks like a Taylor series in 1/z, but is subtly different. For example, we might ask what the radius of convergence [101] of this series is. The radius of convergence is the distance to the nearest singularity in the complex plane.

(c) Let g(ζ) = Γ(1/ζ); then Stirling's formula is some stuff times a Taylor series in ζ. Plot the poles of g(ζ) in the complex ζ plane. Show that the radius of convergence of Stirling's formula applied to g must be zero, and hence

⁵ Mathematically, this isn't a function, but rather a distribution or a measure.
c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity

no matter how large z is, Stirling's formula eventually diverges.

Indeed, the coefficient of z⁻ʲ eventually grows rapidly; Bender and Orszag [9, p.218] show that the odd coefficients (A₁ = 1/12, A₃ = −139/51840, ...) asymptotically grow as

A_{2j+1} ≈ (−1)ʲ 2(2j)!/(2π)^(2(j+1)).    (1.4)

(d) Show explicitly, using the ratio test applied to formula 1.4, that the radius of convergence of Stirling's formula is indeed zero.⁶

This in no way implies that Stirling's formula isn't valuable! An asymptotic series of length n approaches f(z) as z gets big, but for fixed z it can diverge as n gets larger and larger. In fact, asymptotic series are very common, and often are useful for much larger regions than are Taylor series.

(e) What is 0!? Compute 0! using successive terms in Stirling's formula (summing to A_N for the first few N). Considering that this formula is expanding about infinity, it does pretty well!

Quantum electrodynamics these days produces the most precise predictions in science. Physicists sum enormous numbers of Feynman diagrams to produce predictions of fundamental quantum phenomena. Dyson argued that quantum electrodynamics calculations give an asymptotic series [98]; the most precise calculation in science takes the form of a series which cannot converge!

(1.5) Random Matrix Theory. (Math, Quantum) (With Brouwer. [14])
One of the most active and unusual applications of ensembles is random matrix theory, used to describe phenomena in nuclear physics, mesoscopic quantum mechanics, and wave phenomena. Random matrix theory was invented in a bold attempt to describe the statistics of energy level spectra in nuclei. In many cases, the statistical behavior of systems exhibiting complex wave phenomena (almost any correlations involving eigenvalues and eigenstates) can be quantitatively modeled using simple ensembles of matrices with completely random, uncorrelated entries!

To do this problem, you'll need to find a software environment in which it is easy to (i) make histograms and plot functions on the same graph, (ii) find eigenvalues of matrices, sort them, and collect the differences between neighboring ones, and (iii) generate symmetric random matrices with Gaussian and ±1 entries. Mathematica, Matlab, Octave, and Python are all good choices. For those who are not familiar with one of these packages, I will post hints on how to do these three things on the book Web site [105].

The most commonly explored ensemble of matrices is the Gaussian Orthogonal Ensemble (GOE). Generating a member H of this ensemble of size N × N is easy:

• Generate an N × N matrix whose elements are random numbers with Gaussian distributions of mean zero and standard deviation σ = 1.
• Add each matrix to its transpose to symmetrize it.

As a reminder, the Gaussian or normal probability distribution gives a random number x with probability

ρ(x) = (1/(√(2π)σ)) e^(−x²/2σ²).    (1.5)

One of the simplest and most striking properties that large random matrices share is the distribution of level splittings.

(a) Generate an ensemble with M = 1000 or so GOE matrices of size N = 2, 4, and 10. (More is nice.) Find the eigenvalues λ_n of each matrix, sorted in increasing order. Find the difference between neighboring eigenvalues λ_{n+1} − λ_n, for n, say, equal to⁷ N/2. Plot a histogram of these eigenvalue splittings divided by the mean splitting, with bin-size small enough to see some of the fluctuations. (Hint: debug your work with M = 10, and then change to M = 1000.)

What is this dip in the eigenvalue probability near zero? It's called level repulsion.

For N = 2 the probability distribution for the eigenvalue splitting can be calculated pretty simply. Let our matrix be M = ((a, b), (b, c)).

(b) Show that the eigenvalue difference for M is λ = √((c − a)² + 4b²) = 2√(d² + b²) where d = (c − a)/2.⁸ If

⁶ If you don't remember about radius of convergence, see [101]. Here you'll be using every other term in the series, so the radius of convergence is √(|A_{2j−1}/A_{2j+1}|).
⁷ In the experiments, they typically plot all the eigenvalue splittings. Since the mean splitting between eigenvalues will change slowly, this smears the distributions a bit. So, for example, the splittings between the largest and second-largest eigenvalues will be typically rather larger for the GOE ensemble than for pairs near the middle. If you confine your plots to a small range near the middle, the smearing would be small, but it's so fast to calculate new ones we just keep one pair.
⁸ Note that the eigenvalue difference doesn't depend on the trace of M, a + c, only on the difference c − a = 2d.


the probability distribution ρ_M(d, b) of matrices is continuous and finite at d = b = 0, argue that the probability density ρ(λ) of finding an energy level splitting near zero vanishes at λ = 0, giving us level repulsion. (Both d and b must vanish to make λ = 0.) (Hint: go to polar coordinates, with λ the radius.)

(c) Calculate analytically the standard deviation of a diagonal and an off-diagonal element of the GOE ensemble (made by symmetrizing Gaussian random matrices with σ = 1). You may want to check your answer by plotting your predicted Gaussians over the histograms of H₁₁ and H₁₂ from your ensemble in part (a). Calculate analytically the standard deviation of d = (c − a)/2 of the N = 2 GOE ensemble of part (b), and show that it equals the standard deviation of b.

(d) Calculate a formula for the probability distribution of eigenvalue spacings for the N = 2 GOE, by integrating over the probability density ρ_M(d, b). (Hint: polar coordinates again.)

If you rescale the eigenvalue splitting distribution you found in part (d) to make the mean splitting equal to one, you should find the distribution

ρ_Wigner(s) = (πs/2) e^(−πs²/4).    (1.6)

This is called the Wigner surmise: it is within 2% of the correct answer for larger matrices as well.⁹

(e) Plot equation 1.6 along with your N = 2 results from part (a). Plot the Wigner surmise formula against the plots for N = 4 and N = 10 as well.

Let's define a ±1 ensemble of real symmetric matrices, by generating an N × N matrix whose elements are independent random variables, each ±1 with equal probability.

(f) Generate an ensemble with M = 1000 ±1 symmetric matrices with size N = 2, 4, and 10. Plot the eigenvalue distributions as in part (a). Are they universal (independent of the ensemble up to the mean spacing) for N = 2 and 4? Do they appear to be nearly universal¹⁰ (the same as for the GOE in part (a)) for N = 10? Plot the Wigner surmise along with your histogram for N = 10.

The GOE ensemble has some nice statistical properties. The ensemble is invariant under orthogonal transformations

H → Rᵀ H R  with  Rᵀ = R⁻¹.    (1.7)

(g) Show that Tr[Hᵀ H] is the sum of the squares of all elements of H. Show that this trace is invariant under orthogonal coordinate transformations (that is, H → Rᵀ H R with Rᵀ = R⁻¹). (Hint: Remember, or derive, the cyclic invariance of the trace: Tr[ABC] = Tr[CAB].)

Note that this trace, for a symmetric matrix, is the sum of the squares of the diagonal elements plus twice the squares of the upper triangle of off-diagonal elements. That is convenient, because in our GOE ensemble the variance (squared standard deviation) of the off-diagonal elements is half that of the diagonal elements.

(h) Write the probability density ρ(H) for finding GOE ensemble member H in terms of the trace formula in part (g). Argue, using your formula and the invariance from part (g), that the GOE ensemble is invariant under orthogonal transformations: ρ(Rᵀ H R) = ρ(H).

This is our first example of an emergent symmetry. Many different ensembles of symmetric matrices, as the size N goes to infinity, have eigenvalue and eigenvector distributions that are invariant under orthogonal transformations even though the original matrix ensemble did not have this symmetry. Similarly, rotational symmetry emerges in random walks on the square lattice as the number of steps N goes to infinity, and also emerges on long length scales for Ising models at their critical temperatures.¹¹

⁹ The distribution for large matrices is known and universal, but is much more complicated to calculate.
¹⁰ Note the spike at zero. There is a small probability that two rows or columns of our matrix of ±1 will be the same, but this probability vanishes rapidly for large N.
¹¹ A more exotic emergent symmetry underlies Fermi liquid theory: the effective interactions between electrons disappear near the Fermi energy: the fixed point has an emergent gauge symmetry.
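Part (b)'s closed form makes the N = 2 case easy to check numerically without an eigensolver. A minimal sketch in Python (the ensemble size and all names here are my own choices; the exercise itself suggests Mathematica, Matlab, Octave, or Python for the general-N version):

```python
import math
import random

random.seed(1)
M = 20000   # ensemble size (the exercise suggests 1000; more is nice)

def goe2_splitting():
    """Eigenvalue splitting of a 2x2 GOE matrix H = A + A^T, where A has
    independent N(0,1) entries. Writing H = ((a, b), (b, c)), part (b)
    gives the splitting 2*sqrt(d^2 + b^2) with d = (c - a)/2."""
    a = 2 * random.gauss(0, 1)                   # diagonal: A11 + A11
    c = 2 * random.gauss(0, 1)
    b = random.gauss(0, 1) + random.gauss(0, 1)  # off-diagonal: A12 + A21
    d = (c - a) / 2
    return 2 * math.sqrt(d * d + b * b)

splittings = [goe2_splitting() for _ in range(M)]
mean = sum(splittings) / M
rescaled = [s / mean for s in splittings]        # unit mean splitting

# Compare the histogram with the Wigner surmise (pi s/2) exp(-pi s^2/4),
# whose integral over [lo, hi] is exp(-pi lo^2/4) - exp(-pi hi^2/4).
bins = []
for lo in (0.0, 0.5, 1.0, 1.5):
    hi = lo + 0.5
    frac = sum(lo <= s < hi for s in rescaled) / M
    pred = math.exp(-math.pi * lo**2 / 4) - math.exp(-math.pi * hi**2 / 4)
    bins.append((frac, pred))
    print(f"[{lo:.1f}, {hi:.1f}): observed {frac:.3f}, surmise {pred:.3f}")
```

The small weight in the first bin is the level repulsion of part (b): both d and b must vanish to close the gap.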

2 Random Walks and Emergent Properties

What makes physics possible? Why are humans able to find simple mathematical laws that describe the real world? Our physical laws are not direct statements about the underlying reality of the universe. Rather, our laws emerge out of far more complex microscopic behavior.¹ Statistical mechanics provides powerful tools for understanding simple behavior that emerges from underlying complexity.

In this chapter, we will explore the emergent behavior for random walks. Random walks are paths that take successive steps in random directions. They arise often in statistical mechanics: as partial sums of fluctuating quantities, as trajectories of particles undergoing repeated collisions, and as the shapes for long, linked systems like polymers. They have two kinds of emergent behavior. First, an individual random walk, after a large number of steps, becomes fractal or scale invariant (section 2.1). Secondly, the endpoint of the random walk has a probability distribution that obeys a simple continuum law: the diffusion equation (section 2.2). Both of these behaviors are largely independent of the microscopic details of the walk: they are universal. Random walks in an external field (section 2.3) provide our first examples of conserved currents, linear response, and Boltzmann distributions. Finally, we use the diffusion equation to introduce Fourier and Green's function solution techniques (section 2.4). Random walks encapsulate many of the themes and methods of statistical mechanics.

¹ You may think that Newton's law of gravitation, or Einstein's refinement to it, is more fundamental than the diffusion equation. You would be correct: gravitation applies to everything. But the simple macroscopic law of gravitation emerges, from a quantum exchange of immense numbers of virtual gravitons, just as the diffusion equation emerges from large numbers of long random walks. The diffusion equation and other continuum statistical mechanics laws are special to particular systems, but they emerge from the microscopic theory in much the same way as gravitation and the other fundamental laws of nature do.

2.1 Random Walk Examples: Universality and Scale Invariance

We illustrate random walks with three examples: coin flips, the drunkard's walk, and polymers.

Coin Flips. Statistical mechanics often demands sums or averages of a series of fluctuating quantities: s_N = Σ_{i=1}^{N} ℓ_i. The energy of a material is a sum over the energies of the molecules composing the material; your grade on a statistical mechanics exam is the sum of the scores on many individual questions. Imagine adding up this sum one term at a time: the path s₁, s₂, ... forms an example of a one-dimensional random walk.

For example, consider flipping a coin, recording the difference s_N = h_N − t_N between the number of heads and tails found. Each coin flip

contributes ℓ_i = ±1 to the total. How big a sum s_N = Σ_{i=1}^{N} ℓ_i = (heads − tails) do you expect after N flips? The average of s_N is of course zero, because positive and negative steps are equally likely. A better measure of the characteristic distance moved is the root-mean-square (RMS) number² √⟨s_N²⟩. After one coin flip,

⟨s₁²⟩ = 1 = 1/2 (1)² + 1/2 (−1)²;    (2.1)

after two and three coin flips

⟨s₂²⟩ = 2 = 1/4 (2)² + 1/2 (0)² + 1/4 (−2)²;    (2.2)
⟨s₃²⟩ = 3 = 1/8 (3)² + 3/8 (1)² + 3/8 (−1)² + 1/8 (−3)².

Does this pattern continue? Because ℓ_N = ±1 with equal probability independent of the history, ⟨ℓ_N s_{N−1}⟩ = 1/2 (+1)⟨s_{N−1}⟩ + 1/2 (−1)⟨s_{N−1}⟩ = 0. We know ⟨ℓ_N²⟩ = 1; if we assume ⟨s²_{N−1}⟩ = N − 1 we can prove by induction on N that

⟨s_N²⟩ = ⟨(s_{N−1} + ℓ_N)²⟩ = ⟨s²_{N−1}⟩ + 2⟨ℓ_N s_{N−1}⟩ + ⟨ℓ_N²⟩
       = ⟨s²_{N−1}⟩ + 1 = N.    (2.3)

Hence the RMS average of (heads − tails) for N coin flips is

σ_s = √⟨s_N²⟩ = √N.    (2.4)

Notice that we chose to count the difference between the number of heads and tails. Had we instead just counted the number of heads h_N, then ⟨h_N⟩ would grow proportionately to N: ⟨h_N⟩ = N/2. We would then be interested in the fluctuations of h_N about N/2, measured most easily by squaring the difference between the particular random walks and the average random walk: σ_h² = ⟨(h_N − ⟨h_N⟩)²⟩ = N/4.³ The variable σ_h is the standard deviation of the sum h_N: this is an example of the typical behavior that the standard deviation of the sum of N random variables grows proportionally to √N.

The sum, of course, grows linearly with N, so (if the average isn't zero) the fluctuations become tiny in comparison to the sum. This is why experimentalists often make repeated measurements of the same quantity and take the mean. Suppose we were to measure the mean number of heads per coin toss, a_N = h_N/N. We see immediately that the fluctuations in a_N will also be divided by N, so

σ_a = σ_h/N = 1/(2√N).    (2.5)

The standard deviation of the mean of N measurements is proportional to 1/√N.

² We use angle brackets ⟨X⟩ to denote averages over various ensembles: we'll add subscripts to the brackets where there may be confusion about which ensemble we are using. Here our ensemble contains all 2ᴺ possible sequences of N coin flips.
³ It's N/4 for h instead of N for s because each step changes s_N by ±2, and h_N only by 1: the standard deviation is in general proportional to the step size.
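The √N scaling of equation 2.4 is easy to test by simulation. A quick sketch (Python; the ensemble size M is my own choice):

```python
import random

random.seed(2)
M = 4000                       # number of independent N-flip experiments

rms = {}
for N in (16, 64, 256):
    # s_N = (heads - tails): a sum of N independent +/-1 steps
    samples = [sum(random.choice((-1, 1)) for _ in range(N))
               for _ in range(M)]
    rms[N] = (sum(s * s for s in samples) / M) ** 0.5
    print(N, round(rms[N], 2), "vs sqrt(N) =", round(N ** 0.5, 2))
```

Quadrupling N doubles the RMS excursion, just as the induction leading to equation 2.3 predicts.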
Drunkard's Walk. Random walks in higher dimensions arise as trajectories that undergo successive random collisions or turns: for example, the trajectory of a perfume molecule in a sample of air.⁴ Because the air is dilute and the interactions are short-ranged, the molecule will basically travel in straight lines, with sharp changes in velocity during infrequent collisions. After a few substantial collisions, the molecule's velocity will be uncorrelated with its original velocity. The path taken by the molecule will be a jagged, random walk through three dimensions.

⁴ Real perfume in a real room will primarily be transported by convection; in liquids and gasses, diffusion dominates usually only on short length scales. Solids don't convect, so thermal or electrical conductivity would be more accurate, but less vivid, applications for random walks.
The random walk of a perfume molecule involves random directions, random velocities, and random step sizes. It's more convenient to study steps at regular time intervals, so we'll instead consider the classic problem of a drunkard's walk. The drunkard is presumed to start at a lamppost at x = y = 0. He takes steps ℓ_N, each of length L, at regular time intervals. Because he's drunk, the steps are in completely random directions, each uncorrelated with the previous steps. This lack of correlation says that the average dot product between any two steps ℓ_m and ℓ_n is zero, since all relative angles between the two directions are equally likely: ⟨ℓ_m · ℓ_n⟩ = 0.⁵ This implies that the dot product of ℓ_N with s_{N−1} = Σ_{m=1}^{N−1} ℓ_m is zero. Again, we can use this to work by induction:

⟨s_N²⟩ = ⟨(s_{N−1} + ℓ_N)²⟩ = ⟨s²_{N−1}⟩ + 2⟨s_{N−1} · ℓ_N⟩ + ⟨ℓ_N²⟩
       = ⟨s²_{N−1}⟩ + L² = ... = N L²,    (2.6)

so the RMS distance moved is √N L.

Fig. 2.1 The drunkard takes a series of steps of length L away from the lamppost, but each with a random angle.
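Equation 2.6 can likewise be checked by simulation; a sketch (Python, with L = 1 and an ensemble size of my own choosing):

```python
import math
import random

random.seed(3)
L = 1.0        # step length
M = 2000       # number of independent walks per N

def drunkard_endpoint(N):
    """Sum N steps of length L, each in a uniformly random direction."""
    x = y = 0.0
    for _ in range(N):
        theta = random.uniform(0.0, 2.0 * math.pi)
        x += L * math.cos(theta)
        y += L * math.sin(theta)
    return x, y

rms = {}
for N in (25, 100, 400):
    ends = [drunkard_endpoint(N) for _ in range(M)]
    rms[N] = math.sqrt(sum(x * x + y * y for x, y in ends) / M)
    print(N, round(rms[N], 2), "vs sqrt(N)*L =", round(math.sqrt(N) * L, 2))
```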
Random walks introduce us to the concepts of scale invariance and universality.

Scale Invariance. What kind of path only goes √N total distance in N steps? Random walks form paths which look jagged and scrambled. Indeed, they are so jagged that if you blow up a small corner of one, the blown-up version looks just as jagged (figure 2.2). Clearly each of the blown-up random walks is different, just as any two random walks of the same length are different, but the ensemble of random walks of length N looks much like that of length N/4, until N becomes small enough that the individual steps can be distinguished. Random walks are scale invariant: they look the same on all scales.⁶

Universality. On scales where the individual steps are not distinguishable (and any correlations between steps are likewise too small to see) we find that all random walks look the same. Figure 2.2 depicts a drunkard's walk, but any two-dimensional random walk would give the same behavior (statistically). Coin tosses of two coins (penny sums along x, dime sums along y) would produce, statistically, the same random walk ensemble on lengths large compared to the step sizes. In three dimensions, photons⁷ in the sun (exercise 2.2) or in a glass of milk undergo a random walk with fixed speed c between collisions. Nonetheless, after a few steps their random walks are statistically indistinguishable from that of our variable-speed perfume molecule. This independence of the behavior on the microscopic details is called universality.

Random walks are simple enough that we could probably show that each individual case behaves like the others. In section 2.2 we will generalize our argument that the RMS distance scales as √N to simultaneously cover both coin flips and drunkards; with more work we could

⁵ More generally, if two variables are uncorrelated then the average of their product is the product of their averages: in this case this would imply ⟨ℓ_m · ℓ_n⟩ = ⟨ℓ_m⟩ · ⟨ℓ_n⟩ = 0 · 0 = 0.
⁶ They are also fractal with dimension two, in all spatial dimensions larger than two. This just reflects the fact that a random walk of volume V = N steps roughly fits into a radius R ∼ s_N ∼ N^(1/2). The fractal dimension D of the set, defined by R^D = V, is thus two.
⁷ A photon is a quantum of light or other electromagnetic radiation.

Fig. 2.2 Random Walk: Scale Invariance. Random walks form a jagged, fractal pattern which looks the same when rescaled. Here each succeeding walk is the first quarter of the previous walk, magnified by a factor of two; the shortest random walk is of length 31, the longest of length 128,000 steps. The left side of figure 1.1 is the further evolution of this walk to 512,000 steps.



include variable times between collisions and local correlations to cover the cases of photons and molecules in a gas. We could probably also calculate properties about the jaggedness of paths in these systems, and show that they too agree with one another after many steps. Instead, we'll wait for chapter 13 (and specifically exercise 13.7), where we will give a deep but intuitive explanation of why each of these problems is scale invariant, and why all of these problems share the same behavior on long length scales. Universality and scale invariance will be explained there using renormalization-group methods, originally developed to study continuous phase transitions.

Polymers. Finally, random walks arise as the shapes for polymers. Polymers are long molecules (like DNA, RNA, proteins, and many plastics) made up of many small units (called monomers) attached to one another in a long chain. Temperature can introduce fluctuations in the angle between two adjacent monomers; if these fluctuations dominate over the energy,⁸ the polymer shape can form a random walk. Here the steps are not increasing with time, but with monomers (or groups of monomers) along the chain.

The random walks formed by polymers are not the same as those in our first two examples: they are in a different universality class. This is because the polymer cannot intersect itself: a walk that would cause two monomers to overlap is not allowed. Polymers undergo self-avoiding random walks. In two and three dimensions, it turns out that the effect of these self-intersections is not a small, microscopic detail, but changes the properties of the random walk in an essential way.⁹ One can show that these intersections will often arise on far-separated regions of the polymer, and that in particular they change the dependence of the squared radius ⟨s_N²⟩ on the number of segments N (exercise 2.8). In particular, they change the power law ⟨s_N²⟩ ∼ N^(2ν) from the ordinary random walk value ν = 1/2 to a higher value: ν = 3/4 in two dimensions and ν ≈ 0.59 in three dimensions. Power laws are central to the study of scale-invariant systems: ν is our first example of a universal critical exponent.

Fig. 2.3 S&P 500, normalized. Standard and Poor's 500 stock index daily closing price since its inception, corrected for inflation, divided by the average 6.4% return over this time period. Stock prices are often modeled as a biased random walk. Notice that the fluctuations (risk) in individual stock prices will typically be much higher. By averaging over 500 stocks, the random fluctuations in this index are reduced, while the average return remains the same: see [65] and [66]. For comparison, a one-dimensional multiplicative random walk is also shown.

⁸ Plastics at low temperature can be crystals; functional proteins and RNA are often packed tightly into well-defined shapes. Molten plastics and denatured proteins form self-avoiding random walks. Double-stranded DNA is rather stiff: the step size for the random walk is many nucleic acids long.
⁹ Self-avoidance is said to be a relevant perturbation that changes the universality class. In (unphysical) spatial dimensions higher than four, self-avoidance is irrelevant: hypothetical hyper-polymers in five dimensions would look like regular random walks on long length scales.
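The difference between the ordinary and self-avoiding universality classes can be glimpsed even in a crude experiment: generate square-lattice walks, discard those that revisit a site, and compare mean-square end-to-end distances. A sketch (Python; rejection sampling only works for short walks, and the walk length and sample count here are my own choices, not the book's):

```python
import random

random.seed(4)

def lattice_walk(N):
    """An N-step walk on the square lattice; returns the list of sites."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    path = [(0, 0)]
    for _ in range(N):
        dx, dy = random.choice(steps)
        path.append((path[-1][0] + dx, path[-1][1] + dy))
    return path

N = 12
ordinary, avoiding = [], []
for _ in range(100000):
    path = lattice_walk(N)
    x, y = path[-1]
    r2 = x * x + y * y
    ordinary.append(r2)
    if len(set(path)) == len(path):      # no site visited twice
        avoiding.append(r2)

mean_rw = sum(ordinary) / len(ordinary)    # should be near N (nu = 1/2)
mean_saw = sum(avoiding) / len(avoiding)   # noticeably larger (nu = 3/4)
print(mean_rw, mean_saw, len(avoiding))
```

Even at N = 12 the surviving self-avoiding walks are measurably more extended; measuring the exponent ν = 3/4 itself would require longer walks and smarter sampling than simple rejection.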
2.2 The Diffusion Equation

In the continuum limit of long length and time scales, simple behavior emerges from the ensemble of irregular, jagged random walks: their evolution is described by the diffusion equation:¹⁰

∂ρ/∂t = D∇²ρ = D ∂²ρ/∂x².    (2.7)

The diffusion equation can describe the evolving density ρ(x, t) of a local cloud of perfume as the molecules random-walk through collisions with the air molecules. Alternatively, it can describe the probability density of an individual particle as it random-walks through space: if the particles are non-interacting, the probability distribution of one particle describes the density of all particles.

¹⁰ In the remainder of this chapter we specialize for simplicity to one dimension. We also change variables from the sum s to position x.

Consider a general, uncorrelated random walk where at each time step Δt the particle's position x changes by a step ℓ:

x(t + Δt) = x(t) + ℓ(t).    (2.8)

Let the probability distribution for each step be χ(ℓ).¹¹ We'll assume that χ has mean zero and standard deviation a, so the first few moments of χ are

∫ χ(z) dz = 1,
∫ z χ(z) dz = 0, and    (2.9)
∫ z² χ(z) dz = a².

What is the probability distribution for ρ(x, t + Δt), given the probability distribution ρ(x′, t)?

Clearly, for the particle to go from x′ at time t to x at time t + Δt, the step ℓ(t) must be x − x′. This happens with probability χ(x − x′) times the probability density ρ(x′, t) that it started at x′. Integrating over original positions x′, we have

ρ(x, t + Δt) = ∫ ρ(x′, t) χ(x − x′) dx′
             = ∫ ρ(x − z, t) χ(z) dz    (2.10)

where we change variables to z = x − x′.¹²

Now, suppose ρ is broad: the step size is very small compared to the scales on which ρ varies (figure 2.4). We may then do a Taylor expansion of 2.10 in z:

ρ(x, t + Δt) ≈ ∫ [ρ(x, t) − z ∂ρ/∂x + (z²/2) ∂²ρ/∂x²] χ(z) dz    (2.11)
             = ρ(x, t) ∫ χ(z) dz − (∂ρ/∂x) ∫ z χ(z) dz + 1/2 (∂²ρ/∂x²) ∫ z² χ(z) dz
             = ρ(x, t) + 1/2 (∂²ρ/∂x²) a²

using the moments of χ in 2.9. Now, if we also assume that ρ is slow, so that it changes only slightly during this time step, we can approximate ρ(x, t + Δt) − ρ(x, t) ≈ (∂ρ/∂t) Δt, and we find

∂ρ/∂t = (a²/2Δt) ∂²ρ/∂x².    (2.12)

This is the diffusion equation¹³ (2.7), with

D = a²/2Δt.    (2.13)

The diffusion equation applies to all random walks, so long as the probability distribution is broad and slow compared to the individual steps.

Fig. 2.4 We suppose the step sizes ℓ are small compared to the broad ranges on which ρ(x) varies, so we may do a Taylor expansion in gradients of ρ.

¹¹ In our two examples the distribution χ(ℓ) was discrete: we can write it using the Dirac δ-function. (The function δ(x − x₀) is a probability density which has 100% chance of finding the particle in any box containing x₀: thus δ(x − x₀) is zero unless x = x₀, and ∫ f(x)δ(x − x₀) dx = f(x₀) so long as the domain of integration includes x₀.) In the case of coin flips, a 50/50 chance of ℓ = ±1 can be written as χ(ℓ) = 1/2 δ(ℓ + 1) + 1/2 δ(ℓ − 1). In the case of the drunkard, χ(ℓ) = δ(|ℓ| − L)/(2πL), evenly spaced around the circle.
¹² Notice that although dz = −dx′, the limits of integration are also reversed, canceling the minus sign. This happens often in calculations: watch out for it.
¹³ One can understand this intuitively. Random walks and diffusion tend to even out the hills and valleys in the density. Hills have negative second derivatives ∂²ρ/∂x² < 0 and should flatten, ∂ρ/∂t < 0; valleys have positive second derivatives and fill up.
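The continuum limit can be checked directly: release many walkers, and compare their endpoint histogram with the spreading Gaussian that solves equation 2.7 from a point source (variance 2Dt = a²N after N steps). A sketch (Python, uniform steps on [−1, 1] so a² = 1/3; the sample sizes are my own):

```python
import math
import random

random.seed(5)
M = 50000                  # number of walkers
N = 100                    # steps per walker
a2 = 1.0 / 3.0             # variance of a step uniform on [-1, 1]
sigma = math.sqrt(N * a2)  # width from D = a^2/(2 dt) after time t = N dt

ends = [sum(random.uniform(-1.0, 1.0) for _ in range(N)) for _ in range(M)]

# Compare bin fractions with the Gaussian point-source solution of eq 2.7
results = []
for k in (-2, -1, 0, 1):
    lo, hi = k * sigma, (k + 1) * sigma
    frac = sum(lo <= x < hi for x in ends) / M
    pred = 0.5 * (math.erf(hi / (sigma * math.sqrt(2)))
                  - math.erf(lo / (sigma * math.sqrt(2))))
    results.append((frac, pred))
    print(f"[{k} sigma, {k + 1} sigma): observed {frac:.3f}, Gaussian {pred:.3f}")
```

The agreement illustrates the universality claim: the details of χ have already washed out after a hundred steps.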

2.3 Currents and External Forces.

As the particles in our random walks move around, they never are cre-
ated or destroyed: they are conserved.14 If (x) is the density of a 14
More subtly, the probability density
conserved quantity, we may write its evolution law (see gure 2.5) in (x) of a single particle undergoing a
random walk is also conserved: like particle density, probability density cannot be created or destroyed, it can only slosh around.

terms of the current J(x) passing a given point x:

∂ρ/∂t = −∂J/∂x.   (2.14)

Here the current J is the amount of stuff flowing to the right through the point x; since the stuff is conserved, the only way the density can change is by flowing from one place to another. From equation 2.7 and equation 2.14, the current for the diffusion equation is

J_diffusion = −D ∂ρ/∂x;   (2.15)

particles diffuse (random-walk) on average from regions of high density towards regions of low density.

Fig. 2.5 Let ρ(x, t) be the density of some conserved quantity (number of molecules, mass, energy, probability, etc.) varying in one spatial dimension x, and J(x) be the rate at which ρ is passing a point x. The amount of ρ in a small region (x, x + Δx) is n = ρ(x) Δx. The flow of particles into this region from the left is J(x) and the flow out is J(x + Δx), so ∂n/∂t = J(x) − J(x + Δx) ≈ −(∂J/∂x) Δx, and we derive the conserved current relation

∂ρ/∂t = −(J(x + Δx) − J(x))/Δx = −∂J/∂x.

In many applications one has an average drift term along with a random walk. In some cases (like the total grade in a multiple-choice test, exercise 2.1) there is naturally a non-zero mean for each step in the random walk. In other cases, there is an external force F that is biasing the steps to one side: the mean net drift is F Δt times a mobility γ:

x(t + Δt) = x(t) + γF Δt + ℓ(t).   (2.16)

We can derive formulas for this mobility given a microscopic model. If our air is dilute and the diffusing molecule is small, we can model the trajectory as free acceleration between collisions separated by Δt, and we can assume the collisions completely scramble the velocities. In this case, the net motion due to the external force is half the acceleration F/m times the time squared: ½at² = ½(F/m)(Δt)² = F Δt (Δt/2m), so γ = Δt/2m. Using equation 2.13 (D = a²/2Δt), we find

γ = Δt/2m = (2Δt D/a²)(Δt/2m) = D/(m(a/Δt)²) = D/(m v̄²),   (2.17)

where v̄ = a/Δt is the velocity of the unbiased random walk step. If our air is dense and the diffusing molecule is large, we might treat the air as a viscous fluid of kinematic viscosity η; if we also simply model the molecule as a sphere of radius r, a fluid mechanics calculation tells us that the mobility is γ = 1/(6πηr).
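Equation 2.16 is easy to test numerically. The sketch below (in Python, with illustrative parameter values, not taken from the text) runs many walks with a constant drift per step plus a uniform random step; the mean endpoint grows as drift times N, while the spread about the mean is the same as for the unbiased walk:

```python
import random

def biased_walk(n_steps, drift, a=1.0):
    """One walk obeying x(t+dt) = x(t) + gamma*F*dt + l(t) (equation 2.16),
    with drift = gamma*F*dt per step and l(t) uniform on [-a, a]."""
    x = 0.0
    for _ in range(n_steps):
        x += drift + random.uniform(-a, a)
    return x

random.seed(1)
n_steps, drift = 1000, 0.05
ends = [biased_walk(n_steps, drift) for _ in range(2000)]
mean = sum(ends) / len(ends)
var = sum((e - mean) ** 2 for e in ends) / len(ends)
print(mean, var)  # mean near drift*N = 50; variance near N*a^2/3 = 333
```

With uniform steps on [−a, a], ⟨ℓ²⟩ = a²/3, so the endpoint variance is N a²/3, unaffected by the bias.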
Starting from equation 2.16, we can repeat our analysis of the continuum limit (equations 2.10 through 2.12) to derive the diffusion equation
c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity

in an external force,15

J = γFρ − D ∂ρ/∂x,   (2.18)
∂ρ/∂t = −γF ∂ρ/∂x + D ∂²ρ/∂x².   (2.19)

The new term can be explained intuitively: if ρ is increasing in space (positive slope ∂ρ/∂x) and the force is dragging the particles forward, then ρ will decrease with time because the high-density regions ahead of x are receding and the low-density regions behind x are moving in.
The diffusion equation describes how systems of random-walking particles approach equilibrium (see chapter 3). The diffusion equation in the absence of external force describes the evolution of perfume density in a room. A time-independent equilibrium state ρ* obeying the diffusion equation 2.7 must have ∂²ρ*/∂x² = 0, so ρ*(x) = ρ0 + Bx. If the perfume cannot penetrate the walls, ∂ρ*/∂x = 0 at the boundaries, so B = 0. Thus, as one might expect, the perfume evolves to a rather featureless equilibrium state ρ*(x) = ρ0, evenly distributed throughout the room.

In the presence of a constant external force (like gravitation) the equilibrium state is more interesting. Let x be the height above the ground, and F = −mg be the force due to gravity. By equation 2.19, the equilibrium state ρ* satisfies

0 = ∂ρ*/∂t = γmg ∂ρ*/∂x + D ∂²ρ*/∂x²,   (2.20)

which has general solution ρ*(x) = A exp(−(γ/D) mgx) + B. We assume that the density of perfume B in outer space is zero,16 so the density of perfume decreases exponentially with height:

ρ*(x) = A exp(−(γ/D) mgx).   (2.21)

The perfume molecules are pulled downward by the gravitational force, and remain aloft only because of the random walk. If we generalize from perfume to oxygen molecules (and ignore temperature gradients and weather) this gives the basic explanation for why it becomes harder to breathe as one climbs mountains.17

16 Non-zero B would correspond to a constant-density rain of perfume.
15 Warning: if the force is not constant in space, the evolution also depends on the gradient of the force:

∂ρ/∂t = −∂[γF(x)ρ(x)]/∂x + D ∂²ρ/∂x² = −γ(∂F/∂x)ρ − γF ∂ρ/∂x + D ∂²ρ/∂x²

(see the discussion surrounding note 15 on page 182).

17 In chapter 5 we shall derive the Boltzmann distribution, implying that the probability of having energy mgh = E in an equilibrium system is proportional to exp(−E/kB T), where T is the temperature and kB is Boltzmann's constant. This has just the same form as our solution (equation 2.21), if D/γ = kB T. This is called the Einstein relation. Our rough derivation (equation 2.17) suggested that D/γ = m v̄², which suggests that kB T must equal twice the kinetic energy along x for the Einstein relation to hold: this is also true, and is called the equipartition theorem (section 3.2.2). The constants in the (non-equilibrium) diffusion equation are related to one another, because the density must evolve toward the equilibrium distribution dictated by statistical mechanics.
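One can watch the equilibrium profile of equation 2.21 form in a simulation. In the following hedged sketch (all units and parameters invented for illustration), walkers take uniform random steps with a small downward drift, standing in for γmg, above a reflecting floor; their mean height settles near D/(γmg), up to discreteness corrections:

```python
import random

# Hedged sketch: walkers with a small downward drift (standing in for
# gamma*m*g) above a reflecting floor at x = 0.  For steps uniform on
# [-1, 1], D = <l^2>/2 = 1/6, so equation 2.21 predicts an exponential
# density with mean height D/drift.
random.seed(4)
drift = 0.02
n_walkers, n_steps = 500, 3000
D = 1.0 / 6.0

heights = []
for _ in range(n_walkers):
    x = 5.0
    for _ in range(n_steps):
        x += -drift + random.uniform(-1.0, 1.0)
        if x < 0.0:
            x = -x          # reflect at the ground
    heights.append(x)

mean_h = sum(heights) / n_walkers
print(mean_h)  # near D/drift = 8.3, up to discreteness corrections
```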
To be pub. Oxford UP, Fall05

2.4 Solving the Diffusion Equation

We take a brief mathematical interlude, to review two important methods for solving the diffusion equation: Fourier transforms and Green's functions. Both rely upon the fact that the diffusion equation is linear: if a family of solutions ρ_n(x, t) are known, then any linear combination of these solutions Σ_n a_n ρ_n(x, t) is also a solution. If we can then expand the initial density ρ(x, 0) = Σ_n a_n ρ_n(x, 0), we've formally found the solution.

Fourier methods are wonderfully effective computationally, because of fast Fourier transform (FFT) algorithms for shifting from the real-space density to the solution space. Green's function methods are more important for analytical calculations and as a source of approximate solutions.18

18 One should note that much of quantum field theory and many-body quantum mechanics is framed in terms of something also called Green's functions. These are distant, fancier cousins of the simple methods used in linear differential equations.

2.4.1 Fourier

The Fourier transform method decomposes ρ into a family of plane-wave solutions ρ̃_k(t)e^{ikx}.
The diffusion equation is homogeneous in space: our system is translationally invariant. That is, if we have a solution ρ(x, t), another equally valid solution is given by ρ(x − Δ, t), which describes the evolution of an initial condition translated by Δ in the positive x direction.19 Under very general circumstances, a linear equation describing a translation-invariant system will have solutions given by plane waves ρ(x, t) = ρ̃_k(t)e^{ikx}.

19 Make sure you know that g(x) = f(x − Δ) shifts the function in the positive direction: for example, the new function g(Δ) is at Δ what the old one was at the origin, g(Δ) = f(0).

We argue this important truth in detail in the appendix (section A.3). Here we just try it. Plugging a plane wave into the diffusion equation 2.7, we find

∂ρ/∂t = (dρ̃_k/dt) e^{ikx} = D ∂²ρ/∂x² = −Dk² ρ̃_k e^{ikx},   (2.22)
dρ̃_k/dt = −Dk² ρ̃_k,   (2.23)
ρ̃_k(t) = ρ̃_k(0) e^{−Dk²t}.   (2.24)

Now, these plane wave solutions by themselves are unphysical: we must combine them to get a sensible density. First, they are complex: we must add plane waves at k and −k to form cosine waves, or subtract them and divide by 2i to get sine waves. Cosines and sines are also not by themselves sensible densities (because they go negative), but they in turn can be added to one another (for example, added to a constant background ρ0) to make for sensible densities. Indeed, we can superimpose all different wave-vectors to get the general solution

ρ(x, t) = (1/2π) ∫ ρ̃_k(0) e^{ikx} e^{−Dk²t} dk.   (2.25)

Here the coefficients ρ̃_k(0) we use are just the Fourier transform of

the initial density profile

ρ̃_k(0) = ∫ ρ(x, 0) e^{−ikx} dx,   (2.26)

and we recognize equation 2.25 as the inverse Fourier transform of the solution time-evolved in Fourier space

ρ̃_k(t) = ρ̃_k(0) e^{−Dk²t}.   (2.27)

Thus, by writing ρ as a superposition of plane waves, we find a simple law: the short-wavelength parts of ρ are squelched as time t evolves, with wavevector k being suppressed by a factor e^{−Dk²t}.
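This is precisely the recipe a numerical solver follows: transform the initial density, multiply each mode by e^{−Dk²t}, and transform back. A minimal Python sketch, using a plain O(n²) discrete Fourier transform for clarity (a real FFT routine would do the same job faster); all parameters are illustrative:

```python
import cmath
import math

def evolve_fourier(rho0, dx, D, t):
    """Evolve a periodic density profile under the diffusion equation by
    multiplying each Fourier mode by exp(-D k^2 t) (equation 2.24)."""
    n = len(rho0)
    L = n * dx
    # forward discrete transform
    rho_k = [sum(rho0[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
             for k in range(n)]
    out = []
    for m in range(n):
        s = 0j
        for k in range(n):
            kk = k if k <= n // 2 else k - n   # smallest-magnitude wavevector
            qk = 2 * math.pi * kk / L
            s += rho_k[k] * math.exp(-D * qk * qk * t) * cmath.exp(2j * math.pi * k * m / n)
        out.append(s.real / n)
    return out

# a single cosine mode on a constant background decays by exp(-D q^2 t)
n, dx, D, t = 64, 0.1, 0.5, 0.2
q1 = 2 * math.pi / (n * dx)   # fundamental wavevector
rho0 = [1.0 + 0.5 * math.cos(q1 * m * dx) for m in range(n)]
rho_t = evolve_fourier(rho0, dx, D, t)
print(rho_t[0])  # 1 + 0.5*exp(-D*q1^2*t), about 1.45
```

The constant background (the k = 0 mode) is untouched, as conservation of stuff requires.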

2.4.2 Green

The Green's function method decomposes ρ into a family of solutions G(x − y, t) where all of the diffusing particles start at a particular point y.

Let's first consider the case where all particles start at the origin. Suppose we have one unit of perfume, released at the origin at time t = 0. What is the initial condition ρ(x, t = 0)? Clearly ρ(x, 0) = 0 unless x = 0, and ∫ρ(x, 0) dx = 1, so ρ(0, 0) must be really, really infinite. This is of course the Dirac delta function δ(x), which mathematically (when integrated) is a linear operator on functions returning the value of the function at zero:

∫ f(y) δ(y) dy = f(0).   (2.28)

Let's define the Green's function G(x, t) to be the time evolution of the density G(x, 0) = δ(x) with all the perfume at the origin. Naturally, G(x, t) obeys the diffusion equation ∂G/∂t = D ∂²G/∂x². We can use the Fourier transform methods of the previous section to solve for G(x, t). The Fourier transform at t = 0 is

G̃_k(0) = ∫ G(x, 0) e^{−ikx} dx = ∫ δ(x) e^{−ikx} dx = 1   (2.29)

(independent of k). Hence the time-evolved Fourier transform is G̃_k(t) = e^{−Dk²t}, and the time evolution in real space is

G(x, t) = (1/2π) ∫ e^{ikx} G̃_k(0) e^{−Dk²t} dk = (1/2π) ∫ e^{ikx} e^{−Dk²t} dk.   (2.30)

This last integral is the Fourier transform of a Gaussian. This transform can be performed,20 giving another Gaussian21

G(x, t) = (1/√(4πDt)) e^{−x²/4Dt}.   (2.32)

This is the Green's function for the diffusion equation.

Fig. 2.6 10,000 endpoints of random walks, each 1000 steps long. Notice that after 1000 steps, the distribution of endpoints looks quite Gaussian. Indeed after about five steps the distribution is extraordinarily close to Gaussian, except far in the tails.

20 If we complete the square in the integrand, e^{ikx} e^{−Dk²t} = e^{−Dt(k − ix/2Dt)²} e^{−x²/4Dt}, and change variables to κ = k − ix/2Dt,

G(x, t) = (1/2π) e^{−x²/4Dt} ∫_{−∞ + ix/2Dt}^{+∞ + ix/2Dt} e^{−Dtκ²} dκ.   (2.31)

21 It's useful to remember that the Fourier transform of a normalized Gaussian (1/√(2π)σ) exp(−x²/2σ²) is another Gaussian, exp(−σ²k²/2), of standard deviation 1/σ and with no prefactor.
The Green's function directly tells us the distribution of the endpoints of random walks centered at the origin (figure 2.6). Does it agree with our formula √⟨x²⟩ = a√N for N-step random walks of step size a (section 2.1)? At time t, the Green's function (equation 2.32) is a Gaussian with root-mean-square standard deviation σ(t) = √(2Dt); plugging in our diffusion constant D = a²/2Δt (equation 2.13), we find an RMS distance of σ(t) = a√(t/Δt) = a√N, where N = t/Δt is the number of steps taken in the random walk: our two methods do agree.

Finally, since the diffusion equation has translational symmetry, we can solve for the evolution of random walks centered at any point y: the time evolution of an initial condition δ(x − y) is G(x − y, t). Since we can write any initial condition ρ(x, 0) as a superposition of δ-functions

ρ(x, 0) = ∫ ρ(y, 0) δ(x − y) dy,   (2.33)

we can write a general solution ρ(x, t) to the diffusion equation:

ρ(x, 0) = ∫ ρ(y, 0) δ(x − y) dy = ∫ ρ(y, 0) G(x − y, 0) dy,   (2.34)
ρ(x, t) = ∫ ρ(y, 0) G(x − y, t) dy = ∫ ρ(y, 0) (e^{−(y−x)²/4Dt}/√(4πDt)) dy.   (2.35)

This equation states that the current value of the density is given by the original values of the density in the neighborhood, smeared sideways (convolved) with the function G.

Thus by writing ρ as a superposition of point sources, we find that the diffusion equation smears out all the sharp features, averaging ρ over distances that grow proportionally to the typical random walk distance √(2Dt).
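The agreement illustrated in figure 2.6 can be reproduced in a few lines. This illustrative Python sketch (parameters not from the text) simulates many N-step walks and compares the endpoint spread with the Green's function prediction σ² = 2Dt = N a²/3 for steps uniform on [−a, a]:

```python
import math
import random

random.seed(2)
N, a = 1000, 1.0
ends = []
for _ in range(4000):
    # one unbiased walk: N steps uniform on [-a, a]
    ends.append(sum(random.uniform(-a, a) for _ in range(N)))

# Green's function prediction: Gaussian endpoints with
# sigma^2 = N*<l^2> = N*a^2/3, and about 68.3% of walks within one sigma.
sigma2 = sum(e * e for e in ends) / len(ends)
sigma = math.sqrt(N * a * a / 3.0)
frac_within = sum(abs(e) < sigma for e in ends) / len(ends)
print(sigma2, frac_within)  # near 333 and near 0.68
```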

Exercises 2.1, 2.2, and 2.3 give simple examples of random walks in different contexts. Exercises 2.4 and 2.5 illustrate the qualitative behavior of the Fourier and Green's function approaches to solving the diffusion equation. Exercises 2.6 and 2.7 apply the diffusion equation in the

If we then shift the limits of integration upward to the real axis, we get a familiar integral (exercise 1.2) giving √(π/Dt), and hence equation 2.32. This last step (shifting the limits of integration) is not trivial: we must rely on Cauchy's theorem, which allows one to deform the integration contour in the complex plane.

familiar context of thermal conductivity.22 Exercise 2.8 explores self-avoiding random walks: in two dimensions, we find that the constraint that the walk must avoid itself gives new critical exponents and a new universality class (see also chapter 13).

Random walks also arise in non-equilibrium situations.

• They arise in living systems. Bacteria search for food (chemotaxis) using a biased random walk, randomly switching from a swimming state (random walk step) to a tumbling state (scrambling the velocity); see [10].

• They arise in economics: Black and Scholes [111] analyze the approximate random walks seen in stock prices (figure 2.3) to estimate the price of options: how much you charge a customer who wants a guarantee that they can buy stock X at price Y at time t depends not only on whether the average price will rise past Y, but also on whether a random fluctuation will push it past Y.

• They arise in engineering studies of failure. If a bridge strut has N microcracks each with a failure stress σᵢ, and these stresses have probability density ρ(σ), the engineer is not concerned with the average failure stress ⟨σ⟩, but the minimum. This introduces the study of extreme value statistics: in this case, the failure time distribution is very generally described by the Weibull distribution.

(2.1) Random walks in Grade Space.
Let's make a simple model of the prelim grade distribution. Let's imagine a multiple-choice test of ten problems of ten points each. Each problem is identically difficult, and the mean is 70. How much of the point spread on the exam is just luck, and how much reflects the differences in skill and knowledge of the people taking the exam? To test this, let's imagine that all students are identical, and that each question is answered at random with a probability 0.7 of getting it right.
(a) What is the expected mean and standard deviation for the exam? (Work it out for one question, and then use our theorems for a random walk with ten steps.)
A typical exam with a mean of 70 might have a standard deviation of about 15.
(b) What physical interpretation do you make of the ratio of the random standard deviation and the observed one?

(2.2) Photon diffusion in the Sun. (Easy)
Most of the fusion energy generated by the Sun is produced near its center. The Sun is 7×10⁵ km in radius. Convection probably dominates heat transport in approximately the outer third of the Sun, but it is believed that energy is transported through the inner portions (say, to a radius R = 5×10⁸ m) through a random walk of X-ray photons. (A photon is a quantized package of energy: you may view it as a particle which always moves at the speed of light c. Ignore for this problem the index of refraction of the Sun.) Assume that the mean free path ℓ for the photon is ℓ = 5×10⁻⁵ m.
About how many random steps N will the photon take of length ℓ to get to the radius R where convection becomes important? About how many years t will it take for the photon to get there? (You may assume for this problem that the photon takes steps in random directions, each of equal length given by the mean free path.) Related formulæ: c = 3×10⁸ m/s; ⟨x²⟩ ≈ 2Dt; ⟨s²ₙ⟩ = nσ² = n⟨s²₁⟩. There are 31,556,925.9747 ≈ π×10⁷ ≈ 3×10⁷ seconds in a year.

(2.3) Ratchet and Molecular Motors. (Basic, Biology)
Read Feynman's Ratchet and Pawl discussion in reference [86, I.46] for this problem. Feynman's ratchet and pawl discussion obviously isn't so relevant to machines you can make in your basement shop. The thermal fluctuations which turn the wheel to lift the flea are too small to be noticeable on human length and time scales (you need to look in a microscope to see Brownian motion). On the other hand, his discussion turns out to be surprisingly close to how real cells move things around. Physics professor Michelle Wang studies these molecular motors in the basement of Clark Hall.
Inside your cells, there are several different molecular motors, which move and pull and copy (figure 2.7). There are molecular motors which contract your muscles, there are motors which copy your DNA into RNA and copy your RNA into protein, and there are motors which transport biomolecules around in the cell. All of these motors share some common features: (1) they move along some linear track (microtubule, DNA, ...), hopping forward in discrete jumps between low-energy positions, (2) they consume energy (burning ATP or NTP) as they move, generating an effective force pushing them forward, and (3) their mechanical properties can be studied by seeing how their motion changes as the external force on them is changed (figure 2.8).

22 We haven't derived the law of thermal conductivity from random walks of phonons. We'll give general arguments in chapter 10 that an energy flow linear in the thermal gradient is to be expected on very general grounds.
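A back-of-the-envelope numerical sketch of the estimate exercise 2.2 asks for, taking ℓ = 5×10⁻⁵ m and the constants quoted in the exercise:

```python
# Order-of-magnitude sketch for exercise 2.2: a photon random-walking
# out of the Sun's radiative zone.  Step count uses <s_N^2> = N*l^2,
# i.e. N ~ (R/l)^2; the photon travels a total path length N*l at c.
c = 3e8            # speed of light, m/s
R = 5e8            # radius of the radiative zone, m
mfp = 5e-5         # photon mean free path, m (assumed value)
sec_per_year = 3.15e7

N = (R / mfp) ** 2           # random-walk steps to reach distance R
t = N * mfp / c              # time for the whole walk, seconds
print(N, t / sec_per_year)   # about 1e26 steps; a few hundred thousand years
```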

Fig. 2.7 Cartoon of a motor protein, from reference [48]. As it carries some cargo along the way (or builds an RNA or protein, ...) it moves against an external force fext and consumes ATP molecules, which are hydrolyzed to ADP and phosphate (P).

Fig. 2.8 Cartoon of Cornell professor Michelle Wang's early laser tweezer experiment (reference [119]). (A) The laser beam is focused at a point (the laser trap); the polystyrene bead is pulled (from dielectric effects) into the intense part of the light beam. The track is a DNA molecule attached to the bead, the motor is an RNA polymerase molecule, and the cargo is the glass cover slip to which the motor is attached. (B) As the motor (RNA polymerase) copies DNA onto RNA, it pulls the DNA track toward itself, dragging the bead out of the trap, generating a force resisting the motion. (C) A mechanical equivalent, showing the laser trap as a spring and the DNA (which can stretch) as a second spring.

Fig. 2.9 The effective potential for moving along the DNA (from reference [48]). Ignoring the tilt We, Feynman's energy barrier ε is the difference between the bottom of the wells and the top of the barriers. The experiment changes the tilt by adding an external force pulling to the left. In the absence of the external force, We is the (Gibbs free) energy released when one NTP is burned and one RNA nucleotide is attached.

For transcription of DNA into RNA, the motor moves on average one base pair (A, T, G or C) per step: ℓ is about 0.34 nm. We can think of the triangular grooves in the ratchet as being the low-energy states of the motor when it is resting between steps. The barrier between steps has an asymmetric shape (figure 2.9), just like the energy stored in the pawl is ramped going up and steep going down. Professor Wang showed (in a later paper) that the motor stalls at an external force of about 27 pN.
(a) At that force, what is the energy difference between neighboring wells due to the external force from the bead? (This corresponds to L in Feynman's ratchet.) Let's assume that this force is what's needed to balance the natural force downhill that the motor develops to propel the transcription process. What does this imply about the ratio of the forward rate to the backward rate, in the absence of the external force from the laser tweezers, at a temperature of 300 K (from Feynman's discussion preceding equation 46.1)? (kB = 1.381×10⁻²³ J/K.)
The natural force downhill is coming from the chemical reactions which accompany the motor moving one base pair: the motor burns up an NTP molecule into a PPi molecule, and attaches a nucleotide onto the RNA. The net energy from this reaction depends on details, but varies between about 2 and 5 times 10⁻²⁰ Joule. This is actually a Gibbs free energy difference, but for this problem treat it as just an energy difference.
(b) The motor isn't perfectly efficient: not all the chemical energy is available as motor force. From your answer to part (a), give the efficiency of the motor as the ratio of force-times-distance produced to energy consumed, for the range of consumed energies given.

(2.4) Solving Diffusion: Fourier and Green. (Basic)

Fig. 2.10 Initial profile of density deviation from average: ρ(x, t=0) − ρ0 plotted versus position x from 0 to 20.

An initial density profile ρ(x, t = 0) is perturbed slightly away from a uniform density ρ0, as shown at left. The density obeys the diffusion equation ∂ρ/∂t = D ∂²ρ/∂x², where D = 0.001 m²/s. The lump centered at x = 5 is a Gaussian exp(−x²/2)/√(2π), and the wiggle centered at x = 15 is a smooth envelope function multiplying cos(10x).
(a) Fourier. As a first step in guessing how the pictured density will evolve, let's consider just a cosine wave. If the initial wave were ρcos(x, 0) = cos(10x), what would it be at t = 10s? Related formulæ: ρ̃(k, t) = ρ̃(k, t′) G̃(k, t − t′); G̃(k, t) = exp(−Dk²t).
(b) Green. As a second step, let's check how long it would take to spread out as far as the Gaussian on the left. If the wave at some earlier time −t0 were a δ function at x = 0, ρ(x, −t0) = δ(x), what choice of the time elapsed t0 would yield a Gaussian ρ(x, 0) = exp(−x²/2)/√(2π) for the given diffusion constant D = 0.001 m²/s? Related formulæ: ρ(x, t) = ∫ ρ(y, t′) G(y − x, t − t′) dy; G(x, t) = (1/√(4πDt)) exp(−x²/(4Dt)).
(c) Pictures. Now consider time evolution for the next ten seconds. The initial density profile ρ(x, t = 0) is again shown at left. Which of the choices in figure 2.11 represents the density at t = 10s? (Hint: compare t = 10s to the time t0 from part (b).) Related formulæ: ⟨x²⟩ ≈ 2Dt.

(2.5) Solving the Diffusion Equation. (Basic)
Consider a one-dimensional diffusion equation ∂ρ/∂t = D ∂²ρ/∂x², with initial condition periodic in space with period L, consisting of a δ function at every xn = nL: ρ(x, 0) = Σ_{n=−∞}^{∞} δ(x − nL).
(a) Using the Green's function method, give an approximate expression for the density, valid at short times and for −L/2 < x < L/2, involving only one term (not an infinite sum). (Hint: how many of the Gaussians are important in this region at early times?)
(b) Using the Fourier method,24 give an approximate expression for the density, valid at long times, involving only two terms (not an infinite sum). (Hint: how many of the wavelengths are important at late times?)
(c) Give a characteristic time τ in terms of L and D, such that your answer in (a) is valid for t ≪ τ and your answer in (b) is valid for t ≫ τ.

(2.6) Frying Pan. (Basic)
An iron frying pan is quickly heated on a stove top to 400 degrees Celsius. Roughly how long will it be before the handle is too hot to touch (within, say, a factor of two)? (Adapted from reference [90, p. 40].)
Do this three ways.
(a) Guess the answer from your own experience. If you've always used aluminum pans, consult a friend or parent.
(b) Get a rough answer by a dimensional argument. You need to transport heat cp ρ V ΔT across an area A = V/Δx. How much heat will flow across that area per unit time, if the temperature gradient is roughly assumed to be ΔT/Δx? How long Δt will it take to transport the amount needed to heat up the whole handle?
(c) Roughly model the problem as the time needed for a pulse of heat at x = 0 on an infinite rod to spread out a distance equal to the length of the handle, and use the Green's function for the heat diffusion equation (problems 10.3 and 10.4 below). How long until the pulse spreads out a root-mean-square distance σ(t) equal to the length of the handle?
Note: For iron, the specific heat cp = 450 J/kg·°C, the density ρ = 7900 kg/m³, and the thermal conductivity kt = 80 W/m·°C.

23 Math reference: [68, sec. 8.4].

24 If you use a Fourier transform of ρ(x, 0), you'll need to sum over n to get δ-function contributions at discrete values of k = 2πm/L. If you use a Fourier series, you'll need to unfold the sum over n of partial Gaussians into a single integral over an unbounded Gaussian.
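A numerical sketch of the estimate in part (c) of exercise 2.6, with an assumed handle length of 0.2 m (a guess, not from the exercise) and the iron constants quoted above:

```python
# Rough estimate for exercise 2.6(c): time for a heat pulse to spread a
# root-mean-square distance sigma(t) = sqrt(2*D*t) equal to the handle
# length, using the thermal diffusivity D = k_t/(c_p * rho).
k_t = 80.0      # thermal conductivity, W/(m C)
c_p = 450.0     # specific heat, J/(kg C)
rho = 7900.0    # density, kg/m^3
L = 0.2         # handle length, m (assumed)

D = k_t / (c_p * rho)     # thermal diffusivity, m^2/s
t = L ** 2 / (2.0 * D)    # sigma(t) = L  =>  t = L^2/(2D)
print(D, t / 60.0)        # D ~ 2e-5 m^2/s; t of order ten minutes
```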

Fig. 2.11 Final states of diffusion example: five candidate profiles (A)-(E) of ρ(x, t=10) − ρ0, each plotted versus position x (panels (B)-(E) span x = 0 to 20; panel (A) spans x = 10 to 30).

(2.7) Thermal Diffusion. (Basic)
The rate of energy flow in a material with thermal conductivity kt and a temperature field T(x, y, z, t) = T(r, t) is J = −kt ∇T.25 Energy is locally conserved, so the energy density E satisfies ∂E/∂t = −∇·J.
(a) If the material has constant specific heat cp and density ρ, so E = cp ρ T, show that the temperature T satisfies the diffusion equation ∂T/∂t = (kt/(cp ρ)) ∇²T.
(b) By putting our material in a cavity with microwave standing waves, we heat it with a periodic modulation T = sin(kx) at t = 0, at which time the microwaves are turned off. Show that the amplitude of the temperature modulation decays exponentially in time. How does the amplitude decay rate depend on wavelength λ = 2π/k?

(2.8) Polymers and Random Walks.
Polymers are long molecules, typically made of identical small molecules called monomers that are bonded together in a long, one-dimensional chain. When dissolved in a solvent, the polymer chain configuration often forms a good approximation to a random walk. Typically, neighboring monomers will align at relatively small angles: several monomers are needed to lose memory of the original angle. Instead of modeling all these small angles, we can produce an equivalent problem focusing all the bending in a few hinges: we approximate the polymer by an uncorrelated random walk of straight segments several monomers in length. The equivalent segment size is called the persistence length.26
(a) If the persistence length to bending of DNA is 50 nm, with 3.4 Å per nucleotide base pair, what will the root-mean-square distance √⟨R²⟩ be between the ends of a gene in solution with 100,000 base pairs, if the DNA is accurately represented as a random walk?
Polymers are not accurately represented as random walks, however. Random walks, particularly in low dimensions, often intersect themselves. Polymers are best represented as self-avoiding random walks: the polymer samples all possible configurations that do not cross themselves. (Greg Lawler, in the math department here, is an expert on self-avoiding random walks.)
Let's investigate whether avoiding itself will change the basic nature of the polymer configuration. In particular, does the end-to-end typical distance continue to scale with the square root of the length L of the polymer, R ∝ √L?
(b) Two-dimensional self-avoiding random walk. Give a convincing, short argument explaining whether or not a typical, non-self-avoiding random walk in two dimensions will come back after large numbers of monomers and cross itself. (Hint: how big a radius does it extend to? How many times does it traverse this radius?)
BU java applet. Run the Java applet linked to at reference [69]. (You'll need to find a machine with Java enabled.) They model a 2-dimensional random walk as a connected line between neighboring lattice points on the square lattice of integers. They start random walks at the origin, grow them without allowing backtracking, and discard them when they hit the same lattice point twice. As long as they survive, they average the squared length as a function of number of steps.
(c) Measure for a reasonable length of time, print out the current state, and enclose it. Did the simulation give R ∝ √L? If not, what's the estimate that your simulation gives for the exponent relating R to L? How does it compare with the two-dimensional theoretical exponent given at the Web site?

25 We could have derived this law of thermal conductivity from random walks of phonons, but we haven't. We'll give general arguments in chapter 10 that an energy flow linear in the thermal gradient is to be expected on very general grounds.

26 Some seem to define the persistence length with a different constant factor.
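The growth scheme described for the applet takes only a few lines to reproduce. This illustrative Python sketch (parameters invented) grows non-backtracking lattice walks, discards any that revisit a site, and averages R² over the survivors; the surviving walks are noticeably swollen compared with the ordinary-walk result ⟨R²⟩ = N:

```python
import random

def grow_walk(n_steps):
    """Grow a 2D lattice walk without immediate backtracking; return None
    if it revisits a site (the discard rule described above)."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    pos, visited, last = (0, 0), {(0, 0)}, None
    for _ in range(n_steps):
        dx, dy = random.choice([m for m in moves if m != last])
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in visited:
            return None           # walk crossed itself: discard
        visited.add(pos)
        last = (-dx, -dy)         # forbid stepping straight back
    return pos

random.seed(3)
n_steps, r2_sum, survivors = 40, 0.0, 0
for _ in range(20000):
    end = grow_walk(n_steps)
    if end is not None:
        r2_sum += end[0] ** 2 + end[1] ** 2
        survivors += 1

mean_r2 = r2_sum / survivors
print(survivors, mean_r2)  # <R^2> well above the ordinary-walk value N = 40
```

Every surviving path is generated with the same probability, so the survivors sample self-avoiding walks uniformly.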


3 Temperature and Equilibrium

We now turn to study the equilibrium behavior of matter: the historical origin of statistical mechanics. We will switch in this chapter between discussing the general theory and applying it to a particular system: the ideal gas. The ideal gas provides a tangible example of the formalism, and its solution will provide a preview of material coming in the next few chapters.

A system which is not acted upon by the external world1 is said to approach equilibrium if and when it settles down at long times to a state which is independent of the initial conditions (except for conserved quantities like the total energy). Statistical mechanics describes the equilibrium state as an average over all states consistent with the conservation laws: this microcanonical ensemble is introduced in section 3.1. In section 3.2, we shall calculate the properties of the ideal gas using the microcanonical ensemble. In section 3.3 we shall define entropy and temperature for equilibrium systems, and argue from the microcanonical ensemble that heat flows to maximize the entropy and equalize the temperature. In section 3.4 we will derive the formula for the pressure in terms of the entropy, and define the chemical potential. In section 3.5 we calculate the entropy, temperature, and pressure for the ideal gas, and introduce some refinements to our definitions of phase space volume. Finally, in section 3.6 we discuss the relation between statistical mechanics and thermodynamics.

1 If the system is driven (e.g., there are externally imposed forces or currents) we instead call this final condition the steady state. If the system is large, the equilibrium state will also usually be time independent and calm, hence the name. Small systems will continue to fluctuate substantially even in equilibrium.

3.1 The Microcanonical Ensemble

Statistical mechanics allows us to solve en masse many problems that are impossible to solve individually. In this chapter we address the general equilibrium behavior of N atoms in a box of volume V: any kinds of atoms, in arbitrary external conditions. Let's presume for simplicity that the walls of the box are smooth and rigid, so that energy is conserved when atoms bounce off the walls. This makes our system isolated, independent of the world around it.

How can we solve for the behavior of our atoms? If we ignore quantum mechanics, we can in principle determine the positions2 Q = (x₁, y₁, z₁, x₂, ..., x_N, y_N, z_N) = (q₁ ... q₃N) and momenta P = (p₁, ..., p₃N) of the particles at any future time given their initial positions and momenta, using Newton's laws:

Q̇ = m⁻¹P,   Ṗ = F(Q)   (3.1)

(where F is the 3N dimensional force due to the other particles and the walls, and m is the particle mass).3

2 The 3N dimensional space of positions Q is called configuration space. The 3N dimensional space of momenta P is called momentum space. The 6N dimensional space (P, Q) is called phase space.

3 m is a diagonal matrix if the particles aren't all the same mass.

In general, solving these equations is plainly not feasible.
• Many systems of interest involve far too many particles to allow one to solve for their trajectories.

• Most systems of interest exhibit chaotic motion, where the time evolution depends with ever increasing sensitivity on the initial conditions: you cannot know enough about the current state to predict the future.

• Even if it were possible to evolve our trajectory, knowing the solution would for most purposes be useless: we're far more interested in the typical number of atoms striking a wall of the box, say, than the precise time a particular particle hits.4

How can we extract the simple, important predictions out of the complex trajectories of these atoms? The chaotic time evolution will rapidly scramble5 whatever knowledge we may have about the initial conditions of our system, leaving us effectively knowing only the conserved quantities for our system, just the total energy E.6 Rather than solving for the behavior of a particular set of initial conditions, let us hypothesize that the energy is all we need to describe the equilibrium state. This leads us to a statistical mechanical description of the equilibrium state of our system as an ensemble of all possible initial conditions with energy E: the microcanonical ensemble.

We calculate the properties of our ensemble by averaging over states with energies in a shell (E, E + δE), taking the limit7 δE → 0 (figure 3.1). Let's define the function Ω(E) to be the phase-space volume of this thin shell:

Ω(E) δE = ∫_{E < H(P,Q) < E+δE} dP dQ.   (3.2)

Here H(P, Q) is the Hamiltonian for our system.8 Finding the average ⟨A⟩ of a property A in the microcanonical ensemble is done by averaging A(P, Q) over this same energy shell,9

⟨A⟩_E = (1/(Ω(E) δE)) ∫_{E < H(P,Q) < E+δE} A(P, Q) dP dQ.   (3.9)

Fig. 3.1 The shell of energies between E and E + δE can have an irregular thickness. The volume of this shell in 6N dimensional phase space, divided by δE, is the definition of Ω(E). Notice that the microcanonical average weights the thick regions more heavily. We shall see in section 4.1 that this is the correct way to take the average: just as a water drop in a river spends more time in the deep sections where the water flows slowly, so also a trajectory in phase space spends more time in the thick regions where it moves more slowly.

4 Of course, there are applications where the precise evolution of a particular system is of interest. It would be nice to predict the time at which a particular earthquake fault will yield, so as to warn everyone to go for a picnic outdoors. Statistical mechanics, broadly speaking, is helpless in computing such particulars. The budget of the weather bureau is a good illustration of how hard such system-specific predictions are.

5 This scrambling, of course, is precisely the approach to equilibrium.

6 If our box were spherical, angular momentum would also be conserved.

7 What about quantum mechanics, where the energy levels in a finite system are discrete? In that case (chapter 7), we will need to keep δE large compared to the spacing between energy eigenstates, but small compared to the total energy.

8 The Hamiltonian H is the function of P and Q that gives the energy. For our purposes, this will always be P²/2m + V(Q) = Σ_{α=1}^{3N} p_α²/2m + V(q₁, ..., q₃N), where the force in Newton's laws 3.1 is F = −∇_Q V.

9 It is convenient to write the energy shell E < H(P, Q) < E + δE in terms of the Heaviside step function Θ(x):

Θ(x) = 1 for x ≥ 0;  Θ(x) = 0 for x < 0.   (3.3)
To be pub. Oxford UP, Fall05
Footnote 9 (continued): we see that Θ(E + δE − H) − Θ(E − H) is one precisely inside the energy shell (see figure 3.1). In the limit δE → 0, we can write Ω(E) as a derivative:

    Ω(E) δE = ∫ dP dQ [Θ(E + δE − H) − Θ(E − H)]
            = δE (∂/∂E) ∫ dP dQ Θ(E − H),    (3.4)

and the expectation of a general operator A as

    ⟨A⟩ = (1/(Ω(E) δE)) ∫ dP dQ [Θ(E + δE − H) − Θ(E − H)] A(P,Q)
        = (1/Ω(E)) (∂/∂E) ∫ dP dQ Θ(E − H) A(P,Q).    (3.5)

It will be important later to note that the derivatives in equations 3.4 and 3.5 are at constant N and constant V: ∂/∂E|_{N,V}. Finally, we know that the derivative of the Heaviside function is the Dirac δ-function. (You may think of δ(x) as the limit as ε goes to zero of a function which is 1/ε in the range (0, ε). Mathematicians may think of it as a point mass at the origin.) Hence

    Ω(E) = ∫ dP dQ δ(E − H(P,Q)),    (3.6)

    ⟨A⟩ = (1/Ω(E)) ∫ dP dQ δ(E − H(P,Q)) A(P,Q),    (3.7)

which is of course the integral divided by the volume Ω(E) δE:

    ⟨A⟩ = (1/(Ω(E) δE)) ∫_{E < H(P,Q) < E+δE} dP dQ A(P,Q).    (3.8)

Thus the microcanonical ensemble can be written as a probability density δ(E − H(P,Q))/Ω(E) in phase space.

Notice that, by averaging equally over all states in phase space compatible with our knowledge about the system (that is, the conserved energy), we have made a hidden assumption: all points in phase space (with a given energy) are a priori equally likely, so the average should treat them all with equal weight. In section 3.2, we will see that this assumption leads to sensible behavior, by solving the simple case of an ideal gas. We will fully justify this equal-weighting assumption in chapter 4, where we will also discuss the more challenging question of why so many systems actually reach equilibrium.

The fact that the microcanonical distribution describes equilibrium systems should be amazing to you. The long-time equilibrium behavior of a system is precisely the typical behavior of all systems with the same value of the conserved quantities. This fundamental regression to the mean is the basis of statistical mechanics.

3.2 The Microcanonical Ideal Gas

We can talk about a general collection of atoms, and derive general statistical mechanical truths for them, but to calculate specific properties we must choose a particular system. The simplest statistical mechanical
c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity
system is the monatomic¹⁰ ideal gas. You can think of helium atoms at high temperatures and low densities as a good approximation to this ideal gas: the atoms have very weak long-range interactions and rarely collide. The ideal gas will be the limit when the interactions between particles vanish.¹¹

Footnote 10: Air is a mixture of gasses, but most of the molecules are diatomic: O₂ and N₂, with a small admixture of triatomic CO₂ and monatomic Ar. The properties of diatomic ideal gasses are almost as simple, but one must keep track of the internal rotational degree of freedom (and, at high temperatures, the vibrational degrees of freedom).

3.2.1 Configuration Space

For the ideal gas, the energy does not depend upon the spatial configuration Q of the particles. This allows us to study the positions separately from the momenta (next subsection). Since the energy is independent of the position, our microcanonical ensemble must weight all configurations equally. That is to say, it is precisely as likely that all the particles will be within a distance ε of the middle of the box as it is that they will be within a distance ε of any other particular configuration.

What is the probability density ρ(Q) that the ideal gas particles will be in a particular configuration Q ∈ R^{3N} inside the box of volume V? We know ρ is a constant, independent of the configuration. We know that the gas atoms are in some configuration, so ∫ ρ dQ = 1. The integral over the positions gives a factor of V for each of the N particles, so ρ(Q) = 1/V^N.

It may be counterintuitive that unusual configurations, like all the particles on the right half of the box, have the same probability density as more typical configurations. If there are two non-interacting particles in an L × L × L box centered at the origin, what is the probability that both are on the right (have x > 0)? The probability that two particles are on the right half is the integral of ρ = 1/L⁶ over the six-dimensional volume where both particles have x > 0. The volume of this space is (L/2) × L × L × (L/2) × L × L = L⁶/4, so the probability is 1/4, just as one would calculate by flipping a coin for each particle. The probability that N such particles are on the right is 2^{−N}, just as your intuition would suggest. Don't confuse probability density with probability! The unlikely states for molecules are not those with small probability density. Rather, they are states with small net probability, because their allowed configurations and/or momenta occupy insignificant volumes of the total phase space.
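The coin-flip counting above is easy to verify by direct sampling. A minimal sketch (the function name, box size, and trial count are our choices, not the text's) estimates the chance that every particle sits on the right half of the box:

```python
import random

def prob_all_right(n_particles, trials=200_000, seed=0):
    """Estimate the chance that every particle has x > 0, with each
    x-coordinate uniform in a box centered at the origin."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if all(rng.uniform(-1.0, 1.0) > 0 for _ in range(n_particles))
    )
    return hits / trials

print(prob_all_right(2))   # close to 1/4
print(prob_all_right(4))   # close to 1/16 = 2**-4
```

Only the x-coordinates matter for this question, so the y and z directions need not be sampled at all: each particle contributes an independent factor of 1/2.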
Notice that configuration space typically has dimension equal to several times Avogadro's number.¹² Enormous-dimensional vector spaces have weird properties, which directly lead to important principles in statistical mechanics. For example, most of configuration space has almost exactly half the x-coordinates on the right side of the box.

Footnote 12: A gram of hydrogen has approximately N = 6.02 × 10²³ atoms, known as Avogadro's number. So, a typical 3N will be around 10²⁴.

Footnote 11: With no interactions, how can the ideal gas reach equilibrium? If the particles never collide, they will forever be going with whatever initial velocity we started them with. We imagine delicately taking the long-time limit first, before taking the limit of weak interactions, so we can presume an equilibrium distribution has been established.

If there are 2N non-interacting particles in the box, what is the probability P_m that N + m of them will be on the right half? There are 2^{2N} equally likely ways the distinct particles could sit on the two sides of
the box. Of these, (2N choose N+m) = (2N)!/((N+m)!(N−m)!) have m extra particles on the right half.¹³ So

    P_m = 2^{−2N} (2N choose N+m) = 2^{−2N} (2N)!/((N+m)!(N−m)!).    (3.10)

Footnote 13: (p choose q) is the number of ways of choosing an unordered subset of size q from a set of size p. There are p(p−1)⋯(p−q+1) = p!/(p−q)! ways of choosing an ordered subset, since there are p choices for the first member and p−1 for the second, and so on. There are q! different ordered sets for each disordered one, so (p choose q) = p!/(q!(p−q)!).

We can calculate the fluctuations in the number on the right using Stirling's formula,¹⁴

    n! ≈ (n/e)^n √(2πn) ≈ (n/e)^n.    (3.11)

Footnote 14: Stirling's formula tells us that the average number in the product n(n−1)⋯1 is roughly n/e. See exercise 1.4.

For now, let's use the second, less accurate form: keeping the factor √(2πn) would fix the prefactor in the final formula (exercise 3.4), which we will instead derive by normalizing the total probability to one. Using Stirling's formula, equation 3.10 becomes

    P_m ≈ 2^{−2N} (2N/e)^{2N} / [((N+m)/e)^{N+m} ((N−m)/e)^{N−m}]
        = N^{2N} (N+m)^{−(N+m)} (N−m)^{−(N−m)}
        = (1 + m/N)^{−(N+m)} (1 − m/N)^{−(N−m)}
        = [(1 + m/N)(1 − m/N)]^{−N} (1 + m/N)^{−m} (1 − m/N)^{m}
        = (1 − m²/N²)^{−N} (1 + m/N)^{−m} (1 − m/N)^{m}    (3.12)

and, since m ≪ N, we may substitute 1 + ε ≈ exp(ε), giving us

    P_m ≈ e^{m²/N} e^{−m²/N} e^{−m²/N} ≈ P₀ exp(−m²/N),    (3.13)

where P₀ is the prefactor we missed by not keeping enough terms in Stirling's formula. We know that the probabilities must sum to one, so, again for m ≪ N, 1 = Σ_m P_m ≈ ∫ P₀ exp(−m²/N) dm = P₀ √(πN). Hence

    P_m ≈ (1/√(πN)) exp(−m²/N).    (3.14)

This is a nice result: it says that the number fluctuations are distributed according to a Gaussian or normal distribution,¹⁵ (1/(√(2π) σ)) exp(−x²/2σ²), with a standard deviation σ_m = √(N/2). If we have Avogadro's number of particles, N ≈ 10²⁴, then the fractional fluctuations σ_m/N = 1/√(2N) ≈ 10⁻¹² = 0.0000000001%. In almost all the volume of a box in R^{3N}, almost exactly half of the coordinates are on the right half of their range. In section 3.2.2 we will find another weird property of high-dimensional spaces.

Footnote 15: We derived exactly this result in section 2.4.2 using random walks and a continuum approximation, instead of Stirling's formula: this Gaussian is the Green's function for the number of heads in 2N coin flips. We'll derive it again in exercise 13.7 by deriving the central limit theorem using renormalization-group methods.

We will find that the relative fluctuations of most quantities of interest in equilibrium statistical mechanics go as 1/√N. For many properties of macroscopic systems, statistical mechanical fluctuations about the average value are very small.
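Since equation 3.14 came from dropping the Stirling prefactor and re-normalizing, it is worth checking against the exact binomial probability of equation 3.10. A short script (the function names are ours) compares the two:

```python
import math

def binomial_Pm(N, m):
    """Exact probability (equation 3.10) that N+m of 2N particles are on
    the right half of the box."""
    return math.comb(2 * N, N + m) / 4**N

def gaussian_Pm(N, m):
    """Stirling/Gaussian approximation, equation 3.14."""
    return math.exp(-m * m / N) / math.sqrt(math.pi * N)

N = 10_000
for m in (0, 50, 100, 200):
    print(f"m={m:4d}  exact={binomial_Pm(N, m):.6e}  gaussian={gaussian_Pm(N, m):.6e}")
```

For N = 10,000 the two agree to a fraction of a percent near the peak; the Gaussian tail is least accurate for the (already vanishingly improbable) large m.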
3.2.2 Momentum Space

Working with the microcanonical momentum distribution is more challenging, but more illuminating, than working with the ideal gas configuration space of the last section. Here we must study the geometry of spheres in high dimensions.

Fig. 3.2: The energy surface in momentum space is the 3N−1 sphere of radius R = √(2mE). The condition that the x-component of the momentum of atom #1 is p₁ restricts us to a circle (or rather a 3N−2 sphere) of radius R′ = √(2mE − p₁²). The condition that the energy is in the shell (E, E+δE) leaves us with the annular region shown in the inset.

The kinetic energy for interacting particles is Σ_{α=1}^{3N} ½ m_α v_α² = Σ_{α=1}^{3N} p_α²/2m_α. If we assume all of our atoms have the same mass m, this simplifies to P²/2m. Hence the condition that the particles in our system have energy E is that the system lies on a sphere in 3N-dimensional momentum space of radius R = √(2mE). Mathematicians¹⁶ call this the 3N−1 sphere, S^{3N−1}_R. Specifically, if the energy of the system is known to be in a small range between E and E + δE, what is the corresponding volume of momentum space? The volume of the ℓ−1 sphere (in ℓ dimensions) of radius R is¹⁷

    π^{ℓ/2} R^ℓ / (ℓ/2)!    (3.15)

Footnote 16: Mathematicians like to name surfaces, or manifolds, for the number of dimensions or local coordinates internal to the manifold, rather than the dimension of the space the manifold lives in. After all, one can draw a circle embedded in any number of dimensions (down to two). Thus a basketball is a two-sphere S², the circle is the one-sphere S¹, and the zero-sphere S⁰ consists of the two points ±1.

Footnote 17: Check this in two dimensions. Using (1/2)! = √π/2 and (3/2)! = 3√π/4, check it in one and three dimensions (see exercise 1.4 for n! for non-integer n). Is n! = n (n−1)! valid for n = 3/2?

The volume of the thin shell¹⁸ between E and E + δE is given by

    Momentum Shell Volume/δE
        = [π^{3N/2} (2m(E+δE))^{3N/2} − π^{3N/2} (2mE)^{3N/2}] / [(3N/2)! δE]
        = (d/dE) [π^{3N/2} (2mE)^{3N/2} / (3N/2)!]
        = π^{3N/2} (3Nm) (2mE)^{3N/2 − 1} / (3N/2)!
        = (3N/2E) π^{3N/2} (2mE)^{3N/2} / (3N/2)!    (3.16)

Footnote 18: This is not quite the surface area, since we're taking a shell of energy rather than radius. That's why its volume goes as R^{3N−2}, rather than R^{3N−1}.

Formula 3.16 is the main result of this section. Given our microcanonical ensemble that equally weights all states with energy E, the probability
density for having any particular set of particle momenta P is the inverse of this shell volume.

Let's do a tangible calculation. Let's calculate the probability density ρ(p₁) that the x-component of the momentum of the first atom is p₁.¹⁹

Footnote 19: It is a sloppy physics convention to use ρ to denote probability densities of all sorts. Earlier, we used it to denote the probability density in 3N-dimensional configuration space; here we use it to denote the probability density in one variable. The argument of the function tells us which function we're considering.

The probability density that this momentum is p₁ and the energy is in the range (E, E + δE) is proportional to the area of the annular region (between two 3N−2 spheres) in figure 3.2. The sphere has radius R = √(2mE), so by the Pythagorean theorem, the circle has radius R′ = √(2mE − p₁²). The volume in momentum space of the 3N−2 dimensional annulus is given by using equation 3.15 with ℓ = 3N−1:

    Annular Area/δE
        = (d/dE) [π^{(3N−1)/2} (2mE − p₁²)^{(3N−1)/2} / ((3N−1)/2)!]
        = π^{(3N−1)/2} ((3N−1)m) (2mE − p₁²)^{(3N−3)/2} / ((3N−1)/2)!
        = (3N−1)m π^{(3N−1)/2} R′^{3N−3} / ((3N−1)/2)!
        = [Constants] R′^{3N−3},    (3.17)

where we've dropped multiplicative factors that are independent of p₁ and E. The probability density of being in the annulus is its area divided by the shell volume in equation 3.16; this shell volume can be simplified as well, dropping terms that do not depend on E:

    Momentum Shell Volume/δE = π^{3N/2} (3Nm) (2mE)^{3N/2 − 1} / (3N/2)!
        = 3Nm π^{3N/2} R^{3N−2} / (3N/2)!
        = [Constants] R^{3N−2}.    (3.18)

Our formula for the probability density ρ(p₁) is thus

    ρ(p₁) = Annular Area / Momentum Shell Volume
        = [(3N−1)m π^{(3N−1)/2} R′^{3N−3} / ((3N−1)/2)!] / [3Nm π^{3N/2} R^{3N−2} / (3N/2)!]
        = [Constants] (R²/R′³) (R′/R)^{3N}
        = [Constants] (R²/R′³) (1 − p₁²/2mE)^{3N/2}.    (3.19)

This probability density ρ(p₁) will be essentially zero unless R′/R = √(1 − p₁²/2mE) is nearly equal to one, since this factor is taken to a power 3N/2 of around Avogadro's number. We can thus simplify R²/R′³ ≈ 1/R = 1/√(2mE) and, using 1 − ε ≈ exp(−ε), (1 − p₁²/2mE)^{3N/2} ≈ exp(−(3N/2)(p₁²/2mE)), giving us

    ρ(p₁) ∝ (1/√(2mE)) exp(−(p₁²/2m)(3N/2E)).    (3.20)

The probability density ρ(p₁) is a Gaussian distribution of standard deviation √(2mE/3N); we again can set the constant of proportionality to normalize the Gaussian, leading to

    ρ(p₁) = (1/√(2πm(2E/3N))) exp(−(p₁²/2m)(3N/2E)).    (3.21)

Our ensemble assumption has allowed us to calculate the momentum distribution of our particles explicitly in terms of E, N, and m, without ever considering a particular trajectory: this is what makes statistical mechanics powerful.
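Equation 3.21 is easy to test numerically: a point chosen uniformly on the 3N−1 sphere of radius √(2mE) should have a first coordinate whose spread matches √(2mE/3N). A sketch (the parameter values are arbitrary), using the standard trick of normalizing a vector of Gaussians to sample the sphere uniformly:

```python
import math, random

random.seed(1)
N_atoms, m, E = 500, 1.0, 750.0   # illustrative values, chosen so sqrt(2mE/3N) = 1
d = 3 * N_atoms                    # dimension of momentum space
R = math.sqrt(2 * m * E)           # radius of the energy sphere

def sphere_point(dim, radius):
    """Uniform random point on the (dim-1)-sphere: normalize a vector of
    independent Gaussians (a standard sampling trick)."""
    g = [random.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [radius * x / norm for x in g]

# The x-momentum of atom #1 is just the first coordinate of each microstate:
p1s = [sphere_point(d, R)[0] for _ in range(1000)]
std = math.sqrt(sum(p * p for p in p1s) / len(p1s))
print(std, math.sqrt(2 * m * E / d))   # both near sqrt(2mE/3N) = 1.0
```

Every sampled microstate has total kinetic energy exactly E, yet the single-component statistics come out Gaussian, just as the geometric argument predicts.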
Formula 3.21 tells us that most of the surface area of a large-dimensional sphere is very close to the equator. Think of p₁ as the latitude on the sphere. The range of latitudes containing most of the area is δp₁ = √(2mE/3N), while the total range of latitudes is ±√(2mE): the full range divided by the width of the belt is the square root of Avogadro's number. This is true whatever equator you choose, even intersections of several equators. Geometry is weird in high dimensions.

In the context of statistical mechanics, this seems much less strange: typical configurations of gasses have the kinetic energy divided roughly equally among all the components of momentum; configurations where one atom has most of the kinetic energy are vanishingly rare.

This formula foreshadows four key results that will emerge from our systematic study of equilibrium statistical mechanics in the following few chapters.

(1) Temperature. In our calculation, a single momentum component competed for the available energy with the rest of the ideal gas. In section 3.3 we will study the competition in general between two large subsystems for energy, and will discover that the balance is determined by the temperature. The temperature T for our ideal gas will be given (equation 3.57) by k_B T = 2E/3N.²⁰ Equation 3.21 then gives us the important formula

    ρ(p₁) = (1/√(2πmk_BT)) exp(−p₁²/2mk_BT).    (3.22)

Footnote 20: We shall see that temperature is naturally measured in units of energy. Historically we measure temperature in degrees and energy in various other units (Joules, ergs, calories, eV, foot-pounds, …); Boltzmann's constant k_B is the conversion factor between units of temperature and units of energy.

(2) Boltzmann distribution. The probability of the x-momentum of the first particle having kinetic energy K = p₁²/2m is proportional to exp(−K/k_BT) (equation 3.22). This is our first example of a Boltzmann distribution. We shall see in section 5.2 that the probability of a small subsystem being in a particular state²¹ of energy E will in completely general contexts be proportional to exp(−E/k_BT).

Footnote 21: This is different from the probability of the subsystem having energy E, which is the product of the Boltzmann probability times the number of states with that energy.

(3) Equipartition theorem. The average kinetic energy ⟨p₁²/2m⟩ from equation 3.22 is k_BT/2. This is an example of the equipartition theorem (section 5.3): each harmonic degree of freedom in an equilibrium classical system has average energy k_BT/2.

(4) General classical momentum distribution. Our derivation was in the context of a monatomic ideal gas. But we could have done an analogous calculation for a system with several gasses of different masses: our momentum sphere would become an ellipsoid, but the calculation would still give the same distribution.²² What is more surprising, we shall see when we introduce the canonical ensemble (section 5.2), that interactions don't matter either, so long as the system is classical:²³ the calculation factors, and the probability densities for the momenta are given by equation 3.22, independent of the potential energies.²⁴ The momentum distribution in the form of equation 3.22 is correct for nearly all equilibrium systems of classical particles.

Footnote 22: Molecular gasses will have internal vibration modes that are often not well described by classical mechanics. At low temperatures, these are often frozen out: including rotations and translations but ignoring vibrations leads to the traditional formulas used, for example, for air (see note 10 on page 32).

Footnote 23: Notice that almost all molecular dynamics simulations are done classically: their momentum distributions are given by equation 3.22.

Footnote 24: Quantum mechanics, however, couples the kinetic and potential terms: see chapter 7. Quantum mechanics is important for atomic motions only at low temperatures, so equation 3.22 will be reasonably accurate for all gasses, all liquids but helium, and many solids that are not too cold.

3.3 What is Temperature?

When a hot body is placed beside a cold one, our ordinary experience suggests that heat energy flows from hot to cold until they reach the same temperature. In statistical mechanics, the distribution of heat between the two bodies is determined by the assumption that all possible states of the two bodies at fixed total energy are equally likely. Do these two definitions agree? Can we define the temperature so that two large bodies in equilibrium with one another will have the same temperature?

Consider a general, isolated system of total energy E consisting of two parts, labeled 1 and 2. Each subsystem has fixed volume and number of particles, and is energetically weakly connected to the other subsystem. The connection is weak in that we assume we can neglect the dependence of the energy E₁ of the first subsystem on the state s₂ of the second one, and vice versa.²⁵

Footnote 25: A macroscopic system attached to the external world at its boundaries is usually weakly connected, since the interaction energy is only important at the surfaces, which are a negligible fraction of the total. Also, the momenta and positions of classical particles without magnetic fields are weakly connected in this sense: no terms in the Hamiltonian mix them (although the dynamical evolution certainly does).

Our microcanonical ensemble then asserts that the equilibrium ensemble of the total system is an equal weighting of all possible states of the two subsystems having total energy E. A particular state of the whole system is given by a pair of states (s₁, s₂) with E = E₁ + E₂. This immediately implies that a particular configuration or state s₁ of the first subsystem at energy E₁ will occur with probability density²⁶

    ρ(s₁) ∝ Ω₂(E − E₁),    (3.23)

where Ω₁(E₁) δE₁ and Ω₂(E₂) δE₂ are the phase-space volumes of the energy shell for the two subsystems. The volume of the energy surface for the total system at energy E will be given by adding up the product of the volumes of the subsystems for pairs of energies summing²⁷ to E:

    Ω(E) = ∫ dE₁ Ω₁(E₁) Ω₂(E − E₁).    (3.24)

Footnote 26: That is, if we compare the probabilities of two states of the subsystems with energies Ea and Eb, and if Ω₂(E − Ea) is 50 times larger than Ω₂(E − Eb), then ρ(Ea) = 50 ρ(Eb), because the former has 50 times as many partners that it can pair with to get an allotment of probability.

Footnote 27: Equation 3.24 becomes a sum over states in quantum mechanics, and should be intuitively clear. We can formally derive it in classical mechanics: see exercise 3.3.

Notice that the integrand in equation 3.24, normalized by the total integral, is just the probability density²⁸ of the subsystem having energy E₁:

    ρ(E₁) = Ω₁(E₁) Ω₂(E − E₁)/Ω(E).    (3.25)

Footnote 28: Warning: again we're being sloppy: we use ρ(s₁) in equation 3.23 for the probability that the subsystem is in a particular state s₁, and we use ρ(E₁) in equation 3.25 for the probability that a subsystem is in any of many particular states with energy E₁.
We will show in a moment that if the two subsystems have a large number of particles, then ρ(E₁) is a very sharply peaked function near its maximum E₁*. Hence, the energy in subsystem 1 is given (apart from small fluctuations) by the maximum in the integrand Ω₁(E₁)Ω₂(E − E₁). The maximum is found when the derivative (dΩ₁/dE₁)Ω₂ − Ω₁(dΩ₂/dE₂) is zero, so

    (1/Ω₁)(dΩ₁/dE) = (1/Ω₂)(dΩ₂/dE).    (3.26)

This is the condition for thermal equilibrium between the two subsystems. We can put it in a more convenient form by defining the equilibrium entropy²⁹

    S_equil(E) = k_B log(Ω(E))    (3.27)

for each of our systems.³⁰ Then dS/dE = k_B (1/Ω) dΩ/dE, and the condition 3.26 for thermal equilibrium between two macroscopic bodies is precisely the condition

    (d/dE₁)[S₁(E₁) + S₂(E − E₁)] = dS₁/dE|_{E₁} − dS₂/dE|_{E−E₁} = 0    (3.28)

that entropy is an extremum. Indeed, since the sum of the entropies is the logarithm of the integrand of equation 3.24, which by assumption is expanded about a local maximum, the condition of thermal equilibrium maximizes the entropy.³¹

Footnote 29: This definition depends on the units we pick for the phase-space volume. We will later realize that the natural unit to pick is h^{3N}, where h = 2πℏ is Planck's constant. Note also that in this book we will consistently use log to mean the natural logarithm log_e, and not log₁₀.

Footnote 30: k_B is again Boltzmann's constant, see note 20 on page 36.

Footnote 31: We shall discuss different aspects of entropy and its growth in chapter 6.

We want to define the temperature so that it becomes equal when the two subsystems come to equilibrium. We've seen that

    dS₁/dE = dS₂/dE    (3.29)

in thermal equilibrium. dS/dE decreases upon increasing energy, so we define the temperature in statistical mechanics as

    1/T = dS/dE.    (3.30)

Is the probability density ρ(E₁) in equation 3.25 sharply peaked, as we have assumed? We can expand the numerator about the maximum E₁ = E₁*, and use the fact that the temperatures balance to remove the terms linear in E₁ − E₁*:

    ρ(E₁) ∝ Ω₁(E₁)Ω₂(E − E₁) = exp[S₁(E₁)/k_B + S₂(E − E₁)/k_B]
        ≈ exp{[S₁(E₁*) + ½ (∂²S₁/∂E₁²)(E₁ − E₁*)²
               + S₂(E − E₁*) + ½ (∂²S₂/∂E₂²)(E₁ − E₁*)²]/k_B}
        ∝ exp[(E₁ − E₁*)² (∂²S₁/∂E₁² + ∂²S₂/∂E₂²)/(2k_B)],    (3.31)
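The entropy-maximization condition 3.26 can be illustrated concretely. Using the ideal-gas energy-shell volume of section 3.2, Ω_i(E_i) ∝ E_i^{3N_i/2}, this sketch (the particle numbers and total energy are arbitrary choices) locates the maximum of the integrand of equation 3.24 and checks that the two temperatures k_B T_i = 2E_i/3N_i come out equal there:

```python
import math

# Two weakly coupled ideal-gas subsystems, with Omega_i(E_i) proportional
# to E_i**(3*N_i/2); constant prefactors are dropped, since they do not
# move the location of the maximum.
N1, N2, E_total = 200, 600, 1000.0

def log_integrand(E1):
    """log of Omega1(E1) * Omega2(E_total - E1), the integrand of eq. 3.24."""
    return 1.5 * N1 * math.log(E1) + 1.5 * N2 * math.log(E_total - E1)

# Locate the maximum by a brute-force scan over E1:
E1_star = max((E_total * i / 100000 for i in range(1, 100000)), key=log_integrand)

# At the maximum, the temperatures k_B T_i = 2 E_i / (3 N_i) should balance:
T1 = 2 * E1_star / (3 * N1)
T2 = 2 * (E_total - E1_star) / (3 * N2)
print(E1_star, T1, T2)   # E1* near 250.0, with T1 and T2 nearly equal
```

Equal temperatures here mean equal energy per particle, E₁/N₁ = E₂/N₂, which is exactly what the scan finds: E₁* = E N₁/(N₁+N₂).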
where in the last line we've dropped factors independent of E₁. Thus the energy fluctuations are Gaussian, with variance given by k_B times the inverse of the sum of −∂²S/∂E² for the two subsystems.

How large is ∂²S/∂E² for a macroscopic system? It has units of inverse energy squared, but is the energy involved a typical system energy or an atomic energy? If it is a system-scale energy (scaling like the number of particles N), then the root-mean-square energy fluctuation √⟨(E₁ − E₁*)²⟩ will be comparable to E₁ (enormous fluctuations). If it is an atomic-scale energy (going to a constant as N → ∞), then the energy fluctuations will be independent of system size (microscopic). Quantities like the total energy which scale linearly with the system size are called extensive; quantities like temperature that go to a constant as the system grows large are called intensive. We shall find that the entropy of a system is extensive, so the second derivative ∂²S/∂E² ∼ [S]/[E²] ∼ N/N² ∼ 1/N, and the energy fluctuations will scale as 1/√N of the total energy. (We can also calculate these fluctuations explicitly,³² with a little effort.) Just as for the configurations of the ideal gas, where the number of particles in half the box fluctuated very little, so also the energy E₁ fluctuates very little about the value E₁* of maximum probability.³³ In both cases, the relative fluctuations scale as 1/√N.

Footnote 33: We will discuss fluctuations in detail in section 5.2, and in chapter 11.

The inverse of the temperature is the cost of buying energy from the rest of the world. The lower the temperature, the more strongly the kinetic energy for the momentum component is pushed towards zero. Entropy is the currency being paid. For each unit of energy ΔE bought, we pay ΔE/T = ΔE dS/dE = ΔS in reduced entropy of the world. Inverse temperature is the cost in entropy to buy a unit of energy.

The rest of the world is often called the heat bath; it is a source and sink for heat and fixes the temperature. All heat baths are equivalent, depending only on the temperature. More precisely, the equilibrium behavior of a system weakly coupled to the external world is independent of what the external world is made of; it depends only on the world's temperature. This is a deep truth.

Footnote 32: ∂S/∂E = 1/T, so ∂²S/∂E² = −(1/T²) ∂T/∂E. Because we're holding N and V fixed, we can use equation 3.38 to show

    ∂T/∂E|_{V,N} = 1 / (∂E/∂T|_{V,N}) = 1/(N c_v),    (3.32)

the inverse of the total specific heat at constant volume. (The specific heat c_v is the energy needed per particle to change the temperature by one unit: N c_v = ∂E/∂T|_{V,N}.) Hence

    −(1/k_B) ∂²S/∂E² = (1/(k_B T²)) (1/(N c_v)) = 1/(k_B T · N c_v T).    (3.33)

This last expression is indeed the inverse of a product of two energies. The second term, N c_v T, is a system-scale energy: it is the total energy that would be needed to raise the temperature of the system from absolute zero, if the specific heat per particle c_v were temperature independent. However, the first energy, k_B T, is an atomic-scale energy independent of N. The fluctuations in energy, therefore, scale like the geometric mean of the two, summed over the two subsystems in equation 3.31, and hence scale as √N: the total energy fluctuations per particle thus are roughly 1/√N times a typical energy per particle.
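The 1/√N scaling of relative energy fluctuations can also be seen by direct sampling. As an illustration (not the microcanonical calculation itself), we draw each momentum component from the Boltzmann distribution 3.22 with m = k_B T = 1, as for a subsystem in contact with a bath, and watch the spread of the total kinetic energy shrink as the atom count grows a hundredfold (sample counts are arbitrary choices):

```python
import math, random

random.seed(2)

def relative_energy_fluctuation(n_atoms, samples=300):
    """Sample the total kinetic energy of n_atoms atoms whose momentum
    components follow the Boltzmann distribution 3.22 (m = k_B T = 1),
    and return std(E)/mean(E)."""
    energies = []
    for _ in range(samples):
        energies.append(sum(random.gauss(0, 1)**2 / 2 for _ in range(3 * n_atoms)))
    mean = sum(energies) / samples
    var = sum((e - mean)**2 for e in energies) / samples
    return math.sqrt(var) / mean

r_small, r_large = relative_energy_fluctuation(25), relative_energy_fluctuation(2500)
print(r_small, r_large, r_small / r_large)   # ratio near sqrt(2500/25) = 10
```

Analytically the relative fluctuation here is √(2/3N), so growing N by a factor of 100 should shrink it tenfold, which is what the two estimates show.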
3.4 Pressure and Chemical Potential

The entropy S(E, V, N) is our first example of a thermodynamic potential. In thermodynamics, all the macroscopic properties can be calculated by taking derivatives of thermodynamic potentials with respect to their arguments. It is often useful to think of thermodynamic potentials as surfaces: figure 3.3 shows the surface in S, E, V space (at constant number of particles N). The energy E(S, V, N) is another thermodynamic potential, completely equivalent to S(E, V, N): it's the same surface with a different direction pointing up.

Fig. 3.3: Entropy. The entropy S(E, V, N) as a function of energy E and volume V (at fixed number N). Viewed sideways, this surface also defines the energy E(S, V, N). The three curves are lines at constant S, E, and V; the fact that they must close yields the relation (∂E/∂V)|_{S,N} (∂V/∂S)|_{E,N} (∂S/∂E)|_{V,N} = −1 (see exercise 3.5).

In section 3.3 we defined the temperature using ∂S/∂E. What about the other two first derivatives, ∂S/∂V|_{E,N} and ∂S/∂N|_{E,V}? That is, how does the entropy change when volume or particles are exchanged between two subsystems? The change in the entropy for a tiny shift ΔE, ΔV, and ΔN from subsystem 2 to subsystem 1 (figure 3.4) is

    ΔS = (∂S₁/∂E₁|_{V,N} − ∂S₂/∂E₂|_{V,N}) ΔE
       + (∂S₁/∂V₁|_{E,N} − ∂S₂/∂V₂|_{E,N}) ΔV
       + (∂S₁/∂N₁|_{E,V} − ∂S₂/∂N₂|_{E,V}) ΔN.    (3.34)

Fig. 3.4: Two subsystems. Two subsystems, isolated from the outside world, may exchange energy (open door through the insulation), volume (piston), or particles (tiny uncorked holes).

The first term is of course (1/T₁ − 1/T₂)ΔE; exchanging energy to maximize the entropy sets the temperatures equal. Just as for the energy, if the two subsystems are allowed to exchange volume and number, then the entropy will maximize itself with respect to these variables as well, with small fluctuations.³⁴ Equating the derivatives with respect to volume gives us our statistical mechanics definition of the pressure P:

    P/T = ∂S/∂V|_{E,N},    (3.35)

and equating the derivatives with respect to number gives us the definition of the chemical potential μ:

    −μ/T = ∂S/∂N|_{E,V}.    (3.36)

Footnote 34: If the systems are at different temperatures and the piston is allowed to act, we would expect the pressures to equalize. Showing that this maximizes the entropy is complicated by the fact that the motion of the piston not only exchanges volume ΔV between the two subsystems, but also changes the energy ΔE because of the work done. Equations 3.34 and 3.35 tell us that ΔS = (1/T₁ − 1/T₂)ΔE + (P₁/T₁ − P₂/T₂)ΔV = 0, implying that −ΔE/ΔV = (1 − λ)P₁ + λP₂ with λ = 1/(1 − T₂/T₁). If we hypothesize that the maximum entropy had P₁ ≠ P₂, we would certainly expect that −ΔE/ΔV would lie between these two pressures, corresponding to 0 < λ < 1; but if T₂ and T₁ are both positive and different, then either λ < 0 or λ > 1. Hence the piston must move to equalize the pressures even when the temperatures do not agree.

These definitions are a bit odd: usually we define pressure and chemical potential in terms of the change in energy E, not the change in entropy S. There is an important mathematical identity that we derive in exercise 3.5. If f is a function of x and y, then (see figure 3.3):³⁵

Footnote 35: Notice that this is exactly minus the result you would have derived by cancelling ∂f, ∂x, and ∂y from numerator and denominator.
    (∂f/∂x)|_y (∂x/∂y)|_f (∂y/∂f)|_x = −1.    (3.37)

Also, it's clear that if we keep all but one variable fixed, partial derivatives are like regular derivatives, so

    (∂f/∂x)|_y (∂x/∂f)|_y = 1.    (3.38)
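The counterintuitive −1 in identity 3.37 can be checked with a few finite differences. A sketch using the ideal gas law P = NkT/V with Nk set to 1 (the function and variable names here are ours, for illustration only):

```python
# Finite-difference check of the triple-product identity 3.37, applied to
# the ideal gas equation of state P*V = T (units with N*k_B = 1).
def P_of_T(T, V):  # pressure P(T, V) = T/V
    return T / V

def T_of_V(V, P):  # temperature T(V, P) = P*V
    return P * V

def V_of_P(P, T):  # volume V(P, T) = T/P
    return T / P

def d(f, x, y, h=1e-6):
    """Central-difference partial derivative of f with respect to its
    first argument, holding the second argument fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

P0, V0 = 2.0, 3.0
T0 = T_of_V(V0, P0)
# (dP/dT)_V * (dT/dV)_P * (dV/dP)_T should equal -1, not +1:
product = d(P_of_T, T0, V0) * d(T_of_V, V0, P0) * d(V_of_P, P0, T0)
print(product)  # approximately -1.0
```

Naively "cancelling" the differentials would suggest +1; the chain of three derivatives, each taken with a different variable held fixed, picks up the extra minus sign.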
Using this for S(E, V ) and xing N , we nd

S  V  E  P  E 
1 = = 1  T (3.39)
= P. (3.40)
Thus the pressure is minus the energy cost per unit volume at constant
entropy. Similarly,
S  N  E   E 
1 = = 1  T (3.41)
= (3.42)
the chemical potential is the energy cost of adding a particle at constant
The chemical potential will be unfamiliar to most of those new to sta-
tistical mechanics. We can feel pressure and temperature as our bodies
exchange volume with balloons and heat with coee cups. Most of us
have not had comparable tactile experience with exchanging particles.36 36
Our lungs exchange oxygen and car-
Your intuition will improve as you work with chemical potentials. They bon dioxide, but they dont have nerve
endings that measure the chemical po-
are crucial to the study of chemical reactions (which we will treat only tentials.
lightly in this text): whether a reaction will proceed depends in part
on the relative cost of the products and the reactants, measured by
the dierences in their chemical potentials. The chemical potential is
also central to noninteracting quantum systems, where the number of
particles in each quantum state can vary (chapter 7).
Our familiar notion of pressure is from mechanics: the energy of a subsystem increases as the volume decreases, as ΔE = −P ΔV. What may not be familiar is that this energy change is measured at fixed entropy. With the tools we have now, we can show explicitly that the mechanical definition of pressure is the same as the statistical mechanics definition (equation 3.35): the argument is somewhat technical, but illuminating (footnote 37 at the end of this section).
We can also give a simpler argument, using properties of the entropy that we will discuss more fully in chapter 6. A mechanical measurement of the pressure must not exchange heat with the body. Changing the volume while adding heat to keep the temperature fixed, for example, is a different measurement. The mechanical measurement must also change the volume slowly. If the volume changes fast enough that the subsystem goes out of equilibrium (typically a piston moving near the speed of sound), then the energy needed to change the volume will include the energy for generating the sound and shock waves: energies not appropriate to include in a good measurement of the pressure. We call a process adiabatic if it occurs without heat exchange and sufficiently slowly that the system remains in equilibrium.
Consider the system comprising the subsystem and the mechanical device pushing the piston, under a cycle V → V + ΔV → V. Because the subsystem remains in equilibrium at all times, the process of changing the volume is completely reversible: the entropy of the system at the end is the same as that at the beginning. Since entropy can only increase (chapter 6), the entropy of the system halfway through the cycle, at V + ΔV, must be the same as at V. The mechanical instrument can be made with few moving parts, so its entropy change can be neglected. Hence the entropy of the subsystem must be unchanged under an adiabatic change in volume. Thus a mechanical measurement of pressure is done at constant entropy.
Broadly speaking, the entropy of a system changing adiabatically (slowly and in thermal isolation) will be a constant. Indeed, you may view our detailed calculation (the following footnote) as providing a statistical mechanical derivation of this important truth.[37]
[37] We want to show that our statistical mechanics definition P = T (∂S/∂V)|_{E,N} corresponds to the everyday mechanical definition P = −ΔE/ΔV. We first must use statistical mechanics to find a formula for the mechanical force per unit area P. Consider some general liquid or gas whose volume is changed smoothly from V to V + ΔV, and which is otherwise isolated from the rest of the world. (A solid can support a shear stress. Because of this, it has not just a pressure, but a whole stress tensor, that can vary in space.)
We can find the mechanical pressure if we can find out how much the energy changes as the volume changes. The initial system at t = 0 is an equilibrium ensemble at volume V, uniformly filling phase space in an energy range E < H < E + δE with density 1/Ω(E, V). A member of this volume-expanding ensemble is a trajectory (P(t), Q(t)) that evolves in time under the changing Hamiltonian H(P, Q, V(t)). The amount this particular trajectory changes in energy under the time-dependent Hamiltonian is

    dH(P(t), Q(t), V(t))/dt = (∂H/∂P)·(dP/dt) + (∂H/∂Q)·(dQ/dt) + (∂H/∂V)(dV/dt).   (3.43)

A Hamiltonian for particles of kinetic energy P²/2m and potential energy U(Q) will have ∂H/∂P = P/m = dQ/dt and ∂H/∂Q = ∂U/∂Q = −dP/dt, so the first two terms cancel on the right-hand side of equation 3.43. (You may recognize Hamilton's equations of motion; indeed, the first two terms cancel for any Hamiltonian system.) Hence the energy change for this particular trajectory is

    dH(P(t), Q(t), V(t))/dt = (∂H/∂V)(P, Q) (dV/dt).   (3.44)

That is, the energy change of the evolving trajectory is the same as the expectation value of ∂H/∂t at the static current point in the trajectory: we need not follow the particles as they zoom around.
We still must average this energy change over the equilibrium ensemble of initial conditions. This is in general not possible, until we make the second assumption involved in the adiabatic measurement of pressure: we assume that the potential energy turns on so slowly that the system remains in equilibrium at the current volume V(t) and energy E(t). This allows us to calculate the ensemble average energy change as an equilibrium thermal average:

    ⟨dH/dt⟩ = ⟨∂H/∂V⟩_{E(t),V(t)} (dV/dt).   (3.45)

Since this energy change must equal −P (dV/dt), we find

    P = −⟨∂H/∂V⟩ = −(1/Ω(E)) ∫ dP dQ δ(E − H(P, Q, V)) (∂H/∂V).   (3.46)

We now return to calculating the derivative

    (∂S/∂V)|_E = kB (∂ log Ω/∂V) = (kB/Ω) (∂Ω/∂V).   (3.47)

Using equation 3.4 to write the δ function as a derivative of the Θ (step) function, we can change orders of differentiation:

    ∂Ω/∂V = (∂/∂V) ∫ dP dQ δ(E − H(P, Q, V))
          = ∫ dP dQ (∂/∂V) δ(E − H(P, Q, V))
          = −(∂/∂E) ∫ dP dQ δ(E − H(P, Q, V)) (∂H/∂V).   (3.48)

But the phase-space integral in the last equation is precisely the same integral that appears in our formula for the pressure, equation 3.46: it is −Ω(E)P. Thus

    ∂Ω/∂V = (∂/∂E)(Ω(E)P) = P (∂Ω/∂E) + Ω (∂P/∂E),   (3.49)

so

    (∂S/∂V)|_E = kB (∂ log Ω/∂V) = (kB/Ω)[P (∂Ω/∂E) + Ω (∂P/∂E)]
               = P kB (∂ log Ω/∂E) + kB (∂P/∂E) = P (∂S/∂E) + kB (∂P/∂E)
               = P/T + kB (∂P/∂E).   (3.50)

Now, P and T are both intensive variables, but E is extensive (scales linearly with system size). Hence P/T is of order one for a large system, but kB (∂P/∂E) is of order 1/N, where N is the number of particles. (For example, we shall see that for the ideal gas, P V = (2/3)E = N kB T, so kB (∂P/∂E) = 2kB/(3V) = (2/(3N)) (P/T) ≪ P/T for large N.) Hence the second term, for a large system, may be neglected, giving us the desired

    (∂S/∂V)|_E = P/T.   (3.51)

The derivative of the entropy S(E, V, N) with respect to V at constant E and N is thus indeed the mechanical pressure divided by the temperature. Adiabatic measurements (slow and without heat exchange) keep the entropy unchanged.
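The order-of-magnitude estimate in the parenthetical of footnote 37 can be made concrete. A short symbolic check, assuming the ideal-gas relations P V = (2/3)E and kB T = 2E/(3N) quoted there:

```python
import sympy as sp

E, V, N, kB = sp.symbols('E V N k_B', positive=True)

P = 2*E/(3*V)        # ideal gas: P V = (2/3) E
T = 2*E/(3*N*kB)     # ideal gas: kB T = 2E/(3N)

correction = kB*sp.diff(P, E)           # the kB dP/dE term in equation 3.50
ratio = sp.simplify(correction / (P/T))
print(ratio)  # 2/(3*N): the correction term is down by a factor ~ 1/N
```

For N of order Avogadro's number, the correction is some twenty-three orders of magnitude smaller than P/T.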
3.5 Entropy, the Ideal Gas, and Phase Space Refinements
Let's find the temperature and pressure for the ideal gas, using our microcanonical ensemble. We'll then introduce two subtle refinements to the phase space volume (one from quantum mechanics, and one for indistinguishable particles) which will not affect the temperature or pressure, but will be important for the entropy and chemical potential.
We derived the volume Ω(E) of the energy shell in phase space in section 3.2: it factored[38] into a momentum space volume from equation 3.16 and a configuration space volume V^N. Before our refinements, we have:

    Ω_crude(E) = (3N/2E) π^{3N/2} (2mE)^{3N/2} / (3N/2)! · V^N
               ≈ π^{3N/2} (2mE)^{3N/2} / (3N/2)! · V^N.   (3.52)

[38] It factors only because the potential energy is zero.
Notice that in the second line of 3.52 we have dropped the first term: it divides the phase space volume by a negligible factor (two-thirds the energy per particle).[39] The entropy and its derivatives are (before our refinements)

    S_crude(E) = kB log[ π^{3N/2} (2mE)^{3N/2} / (3N/2)! · V^N ]
               = (3N/2) kB log(2πmE) + N kB log(V) − kB log((3N/2)!),   (3.53)

    1/T = (∂S/∂E)|_{V,N} = 3N kB / 2E,   (3.54)

    P/T = (∂S/∂V)|_{E,N} = N kB / V,   (3.55)

so the temperature and pressure are

    kB T = 2E/3N,   and   (3.57)
    P V = N kB T.   (3.58)

The first line above is the temperature formula we promised in forming equation 3.22: the ideal gas has energy equal to (1/2) kB T per component of the velocity.[40] The second formula is the equation of state[41] for the ideal gas. The equation of state is the relation between the macroscopic variables of an equilibrium system that emerges in the limit of large numbers of particles. The pressure P(T, V, N) in an ideal gas will fluctuate around the value N kB T / V given by the equation of state, with the magnitude of the fluctuations vanishing as the system size gets large.

[39] Multiplying Ω(E) by a factor independent of the number of particles is equivalent to adding a constant to the entropy. The entropy of a typical system is so large (of order Avogadro's number times kB) that adding a number-independent constant to it is irrelevant. Notice that this implies that Ω(E) is so large that multiplying it by a constant doesn't significantly change its value.
[40] Since kB T = 2E/3N, this means each particle on average has its share E/N of the total energy, as it must.
[41] It is rare that the equation of state can be written out as an explicit equation! Only in special cases (e.g., noninteracting systems like the ideal gas) can one solve in closed form for the thermodynamic potentials, equations of state, or other properties.
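Footnote 39's claim, and the harmlessness of the term dropped in equation 3.52, can be checked numerically; a sketch with assumed room-temperature, mole-scale numbers:

```python
import math

# Size of the term dropped in equation 3.52, for a mole-scale system.
# Dropping the factor (3N/2E) changes S by kB*log(3N/2E); compare that with
# the leading terms in S, which are of order N*kB. (The dimensionful argument
# of the log is absorbed into the additive constant, the point of footnote 39.)
N = 6.02e23
E_per_particle = 6e-21           # ~ (3/2) kB T at room temperature, joules (assumed)
E = N * E_per_particle

dropped_over_NkB = math.log(3*N/(2*E)) / N   # (kB log(3N/2E)) / (N kB)
print(dropped_over_NkB)   # ~ 1e-22: utterly negligible next to terms of order 1
```

Any number-independent multiplicative factor in Ω(E) is negligible in exactly the same way.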
In general, our definition for the energy shell volume in phase space needs two refinements. First, the phase space volume has units of ([length][momentum])^{3N}: the volume of the energy shell depends multiplicatively upon the units chosen for length, mass, and time. Changing these units will change the corresponding crude form for the entropy by a constant times 3N. Most physical properties, like temperature and pressure above, are dependent only on derivatives of the entropy, so the overall constant won't matter: indeed, the zero of the entropy is undefined within classical mechanics. It is suggestive that [length][momentum] has units of Planck's constant h, and we shall see in chapter 7 that quantum mechanics in fact does set the zero of the entropy. We shall see in exercise 7.1 that dividing[42] Ω(E) by h^{3N} nicely sets the entropy density to zero in equilibrium quantum systems at absolute zero.

[42] This is equivalent to using units for which h = 1.
Second, there is an important subtlety in quantum physics regarding identical particles. Two electrons, or two helium atoms of the same isotope, are not just hard to tell apart: they really are completely and utterly the same (figure 3.5). We shall see in section 7.3 that the proper quantum treatment of identical particles involves averaging over possible states using Bose and Fermi statistics.

Fig. 3.5 Feynman diagram: indistinguishable particles. In quantum mechanics, two electrons (or two atoms of the same isotope) are fundamentally indistinguishable. We can illustrate this with a peek at an advanced topic mixing quantum field theory and relativity. Here is a scattering event of a photon off an electron, viewed in two reference frames: time is vertical, a spatial coordinate is horizontal. On the left we see two different electrons, one which is created along with an anti-electron or positron e⁺, and the other which later annihilates the positron. At right we see the same event viewed in a different reference frame: here there is only one electron, which scatters two photons. (The electron is virtual, moving faster than light, between the collisions: this is allowed in intermediate states for quantum transitions.) The two electrons on the left are not only indistinguishable, they are the same particle! The antiparticle is also the electron, traveling backward in time.

In classical physics, there is an analogous subtlety regarding indistinguishable[43] particles. For a system of two indistinguishable particles, the phase space points (pA, pB, qA, qB) and (pB, pA, qB, qA) should not both be counted: the volume of phase space Ω(E) should be half that given by a calculation for distinguishable particles. For N indistinguishable particles, the phase space volume should be divided by N!, the total number of ways the labels for the particles can be permuted.

[43] If we have particles that in principle are not identical, but our Hamiltonian and measurement instruments do not distinguish between them, then in classical statistical mechanics we may treat them with Maxwell-Boltzmann statistics as well: they are indistinguishable but not identical.

Unlike the introduction of the factor h^{3N} above, dividing the phase space volume by N! does change the predictions of statistical mechanics in important ways. We will see in subsection 6.2.1 that the entropy increase for joining containers of different kinds of particles should be substantial, while the entropy increase for joining containers filled with indistinguishable particles should be near zero. This result is correctly treated by dividing Ω(E) by N! for each set of N indistinguishable particles. We call the resulting ensemble Maxwell-Boltzmann statistics, to distinguish it from distinguishable statistics and from the quantum-mechanical Bose and Fermi statistics. We shall see in chapter 7 that identical fermions and bosons obey Maxwell-Boltzmann statistics at high temperatures: they become classical, but remain indistinguishable.
Combining these two refinements gives us for the ideal gas

    Ω(E) = (V^N / N!) ( π^{3N/2} (2mE)^{3N/2} / (3N/2)! ) (1/h)^{3N}.   (3.59)
    S(E) = N kB log[ (V/h³) (2πmE)^{3/2} ] − kB log( N! (3N/2)! ).   (3.60)

We can make our equation for the entropy more useful by using Stirling's formula log(N!) ≈ N log N − N, valid at large N:

    S(E, V, N) = (5/2) N kB + N kB log[ (V/(N h³)) (4πmE/3N)^{3/2} ].   (3.61)
This is the standard formula for the entropy of an ideal gas.
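As a sanity check, equation 3.61 can be evaluated numerically. A sketch for helium at assumed room conditions (T = 298.15 K, P = 1 atm); the result can be compared with helium's measured standard molar entropy, about 126 J/(mol K):

```python
import math

kB = 1.380649e-23      # Boltzmann constant, J/K
h  = 6.62607015e-34    # Planck constant, J s
NA = 6.02214076e23     # Avogadro's number, 1/mol

# Helium at assumed room conditions
m = 4.002602e-3 / NA   # mass of one He atom, kg
T = 298.15             # K
P = 101325.0           # Pa

V_per_N = kB * T / P                           # volume per particle (ideal gas law)
lam = h / math.sqrt(2 * math.pi * m * kB * T)  # thermal de Broglie wavelength, m

# Equation 3.61 with E = (3/2) N kB T reduces to S/(N kB) = 5/2 + log(V/(N lam^3))
s_per_particle = 2.5 + math.log(V_per_N / lam**3)
s_molar = s_per_particle * kB * NA
print(s_molar)   # ~126 J/(mol K)
```

The agreement with experiment is evidence that quantum mechanics (through h) really does set the zero of the entropy, as promised in the first refinement.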
3.6 What is Thermodynamics?
Thermodynamics and statistical mechanics historically were closely tied, and often even now they are taught together. What is thermodynamics?
Thermodynamics is the theory that emerges from statistical mechanics in the limit of large systems. Statistical mechanics originated as a derivation of thermodynamics from an atomistic microscopic theory (somewhat before the existence of atoms was universally accepted). Thermodynamics can be viewed as statistical mechanics in the limit[44] as the number of particles N → ∞. When we calculate the relative fluctuations in properties like the energy or the pressure and show that they vanish like 1/√N, we are providing a microscopic justification for thermodynamics. Thermodynamics is the statistical mechanics of near-equilibrium systems when one ignores the fluctuations.

[44] The limit N → ∞ is thus usually called the thermodynamic limit, even for systems like second-order phase transitions where the fluctuations remain important and thermodynamics per se is not applicable.

In this text, we will summarize many of the important methods and results of traditional thermodynamics in the exercises (3.5, 5.4, 5.6, 6.4, and 5.7). Our discussions of order parameters (chapter 9) and deriving new laws (chapter 10) will be providing thermodynamic laws, broadly speaking, for a wide variety of states of matter.
Statistical mechanics has a broader purview than thermodynamics. Particularly in applications to other fields like information theory, dynamical systems, and complexity theory, statistical mechanics describes many systems where the emergent behavior does not have a recognizable relation to thermodynamics.
Thermodynamics is a self-contained theory. Thermodynamics can be developed as an axiomatic system. It rests on the so-called three laws of thermodynamics, which for logical completeness must be supplemented by a zeroth law. Informally, they are:

(0) Transitivity of equilibria: If two subsystems are in equilibrium with a third, they are in equilibrium with one another.
(1) Conservation of energy: The total energy of an isolated system, including the heat energy, is constant.
(2) Entropy always increases: An isolated system may undergo irreversible processes, whose effects can be measured by a state function called the entropy.
(3) Entropy goes to zero at absolute zero: The entropy per particle of any two large equilibrium systems will approach the same value[45] as the temperature approaches absolute zero.

[45] This value is set to zero by dividing Ω(E) by h^{3N}, as in section 3.5.

The zeroth law (transitivity of equilibria) becomes the basis for defining the temperature. Our statistical mechanics derivation of the temperature in section 3.3 provides the microscopic justification of the zeroth law: systems that can only exchange heat energy are in equilibrium with one another when they have a common value of 1/T = (∂S/∂E)|_{V,N}.
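This equalization can be made concrete for two monatomic ideal gases in thermal contact, where Ω_i(E_i) ∝ E_i^{3N_i/2}: maximizing Ω1(E1) Ω2(E − E1) puts the shared energy where the energies per particle (and hence the temperatures) match. A sketch with sympy (the particle numbers are arbitrary choices):

```python
import sympy as sp

E, E1 = sp.symbols('E E_1', positive=True)
N1, N2 = 100, 50    # arbitrary particle numbers for the two subsystems

# log of Omega1(E1)*Omega2(E - E1) for monatomic ideal gases, Omega ~ E^(3N/2)
logOmega = sp.Rational(3*N1, 2)*sp.log(E1) + sp.Rational(3*N2, 2)*sp.log(E - E1)

# Maximize over the energy split:
E1_star = sp.solve(sp.diff(logOmega, E1), E1)[0]
print(E1_star)   # N1*E/(N1 + N2) = 2*E/3: equal energy per particle, equal T
```

Setting d(log Ω1 Ω2)/dE1 = 0 is exactly the statement that (∂S1/∂E1) = (∂S2/∂E2), i.e. T1 = T2.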
The first law (conservation of energy) is now a fundamental principle of physics. Thermodynamics automatically inherits it from the microscopic theory. Historically, the thermodynamic understanding of how work transforms into heat was important in establishing that energy is conserved. Careful arguments about the energy transfer due to heat flow and mechanical work[46] are central to thermodynamics.

[46] We will use this kind of argument in discussing the Carnot cycle in section 6.1.

The second law (entropy always increases) is the heart of thermodynamics.[47] It is responsible for everything from forbidding perpetual motion machines to predicting the heat death of the universe (exercise 6.1). Entropy and its increase is not a part of our microscopic laws of nature, but is the foundation, an axiom, for our macroscopic theory of thermodynamics. The subtleties of how entropy and its growth come out of statistical mechanics will be the theme of chapter 6 and the focus of several exercises (6.5, 6.6, 7.2, and 8.8).

[47] In The Two Cultures, C. P. Snow suggests that being able to describe the Second Law of Thermodynamics is to science as having read a work of Shakespeare is to the arts. (Some in non-English-speaking cultures may wish to object.) That it is law #2 is not of great import, but the concept of entropy and its inevitable increase is indeed central.

The third law (entropy goes to zero at T = 0, also known as Nernst's theorem) basically reflects the fact that quantum systems at absolute zero are in a ground state. Since the number of ground states of a quantum system typically is small[48] and the number of particles is large, systems at absolute zero have zero entropy per particle. Systems like glasses that have not reached complete equilibrium can have non-zero residual entropy as their effective temperature goes to zero (exercise 6.9).

[48] Some systems may have broken symmetry states, or multiple degenerate ground states, but the number of such states is typically independent of the size of the system, or at least does not grow exponentially with the number of particles, so the entropy per particle goes to zero.

The laws of thermodynamics have been written in many equivalent ways.[49] Caratheodory, for example, states the second law as "There are states of a system, differing infinitesimally from a given state, which are unattainable from that state by any quasi-static adiabatic[50] process." The axiomatic form of the subject has attracted the attention of mathematicians: indeed, formulas like dE = T dS − P dV + μ dN have precise meanings in differential geometry.[51]

[49] Occasionally you hear them stated: (1) You can't win, (2) You can't break even, and (3) You can't get out of the game. I don't see that Nernst's theorem relates to quitting.
[50] Caratheodory is using the term adiabatic just to exclude heat flow: we use it to also imply infinitely slow transitions.
[51] The terms dX are differential forms.

In this text, we will not attempt to derive properties axiomatically or otherwise from the laws of thermodynamics: we focus on statistical mechanics.
Thermodynamics is a zoo of partial derivatives, transformations, and relations. More than any other field of science, the thermodynamics literature seems filled with partial derivatives and tricky relations between varieties of physical quantities.
This is in part because there are several alternative thermodynamic potentials or free energies to choose between for a given problem. For studying molecular systems one has not only the entropy (or the internal energy) studied in this chapter, but also the Helmholtz free energy, the Gibbs free energy, the enthalpy, and the grand free energy. Transforming from one free energy to another is done using Legendre transformations (exercise 5.7). There are corresponding free energies for studying magnetic systems, where instead of particles one studies the local magnetization or spin. There appears to be little consensus between textbooks on the symbols or even the names of these various free energies.
Thermodynamics seems cluttered in part also because it is so powerful: almost any macroscopic property of interest can be found by taking derivatives of the thermodynamic potential, as we've seen. First derivatives of the entropy gave the intensive variables temperature, pressure, and chemical potential; second derivatives gave properties like the specific heat. The first derivatives must agree around a tiny triangle, yielding a tricky relation between their products (equation 3.37). The second derivatives must be symmetric (∂²f/∂x∂y = ∂²f/∂y∂x), giving tricky Maxwell relations between what naively seem different susceptibilities (exercise 3.5). There are further tricks involved with taking derivatives in terms of unnatural variables,[52] and there are many inequalities that can be derived from stability criteria.

[52] For example, with work you can take the derivative of S(E, V, N) with respect to P at constant T without re-expressing it in the variables P and T.
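The symmetry of second derivatives can be seen in action for the ideal gas. Inverting equation 3.61 gives E(S, V, N) ∝ V^{−2/3} e^{2S/3NkB}; the Maxwell relation (∂T/∂V)|_S = −(∂P/∂S)|_V then follows because both sides are the mixed partial ∂²E/∂S∂V. A sketch with sympy (the constant C lumps together the m-, h-, and N-dependent factors):

```python
import sympy as sp

S, V, N, kB, C = sp.symbols('S V N k_B C', positive=True)

# Ideal-gas energy E(S, V, N) obtained by inverting equation 3.61;
# C collects the constants that don't depend on S or V.
E = C * V**sp.Rational(-2, 3) * sp.exp(2*S/(3*N*kB))

T = sp.diff(E, S)        # temperature  T = (dE/dS)|_V
P = -sp.diff(E, V)       # pressure     P = -(dE/dV)|_S

# Maxwell relation from the symmetry of second derivatives of E:
lhs = sp.diff(T, V)      # (dT/dV)|_S
rhs = -sp.diff(P, S)     # -(dP/dS)|_V
print(sp.simplify(lhs - rhs))   # 0
```

Here the equality is guaranteed by the commuting of mixed partials; the point of the sketch is to make concrete what the Maxwell relation asserts about two seemingly different susceptibilities.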
Of course, statistical mechanics is not really different from thermodynamics in these regards.[53] For each of the thermodynamic potentials there is a corresponding statistical mechanical ensemble.[54] Almost everything in statistical mechanics can be found from derivatives of the entropy (or, more typically, the partition function, section 5.2). Indeed, statistical mechanics has its own collection of important relations that connect equilibrium fluctuations to transport and response.[55] We've already seen the Einstein relation connecting fluctuations to diffusive transport in section 2.3; chapter 11 will focus on these fluctuation-dissipation and fluctuation-response relations.

[53] Perhaps the partial derivatives are not so daunting when they come at the end of a technically challenging microscopic calculation.
[54] See chapter 5, where we also explain why they are called free energies.
[55] These are extremely useful, for example, in numerical simulations: do the equilibrium simulation, then measure the transport and response properties!

If we knew what system the reader was going to be interested in, we could write down all these various tricky relations in a long appendix. We could also produce a table of the different notations and nomenclatures used in different communities and texts. However, such a tabulation would not address the variety of free energies that arise in systems with other external forces (exercise 5.8) or other broken symmetries. The fraction of young scientists that will devote their careers to the study of fluids and gases (or magnets, or any other particular system) is small. For that reason, we focus on the statistical mechanical principles which enable us to derive these and other new relations.
Exercises

Exercise 3.1 is the classic problem of planetary atmospheres. Exercise 3.2 is a nice generalization of the ideal gas law. Part (a) of exercise 3.3 is a workout in Γ-functions; parts (b) and (c) calculate the energy fluctuations for a mixture of two ideal gasses, and could be assigned separately. Exercise 3.4 extends the calculation of the density fluctuations from two subvolumes to K subvolumes, and introduces the Poisson distribution. Finally, exercise 3.5 introduces some of the tricky partial derivative relations in thermodynamics (the triple product of equation 3.37 and the Maxwell relations) and applies them to the ideal gas.

(3.1) Escape Velocity. (Basic)
Assuming the probability distribution for the z component of momentum given in equation 3.22, ρ(pz) = (1/√(2πm kB T)) exp(−pz²/2m kB T), give the probability density that an N2 molecule will have a vertical component of the velocity equal to the escape velocity from the Earth (about 10 km/sec, if I remember right). Do we need to worry about losing our atmosphere? Optional: Try the same calculation for H2, where you'll find a substantial leakage.
(Hint: You'll want to know that there are about 10⁷ seconds in a year, and molecules collide (and scramble their velocities) many times per second. That's why Jupiter has hydrogen gas in its atmosphere, and Earth does not.)
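For orientation, the Boltzmann factor exp(−mv²/2kB T) at the quoted escape velocity can be estimated numerically; a sketch assuming an atmospheric temperature of 300 K (not specified in the exercise):

```python
import math

kB = 1.380649e-23        # Boltzmann constant, J/K
amu = 1.66053907e-27     # atomic mass unit, kg
T = 300.0                # K (assumed)
v_esc = 1.0e4            # m/s, the escape velocity quoted in the exercise

results = {}
for name, mass_amu in [("N2", 28.0), ("H2", 2.0)]:
    m = mass_amu * amu
    exponent = m * v_esc**2 / (2 * kB * T)   # m v^2 / (2 kB T)
    results[name] = exponent
    print(name, exponent, math.exp(-exponent))
```

Even before multiplying by collision rates and seconds per year, the N2 Boltzmann factor (exponent near 560) is hopelessly small, while the H2 factor (exponent near 40) leaves room for the substantial leakage the exercise mentions.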
(3.2) Hard Sphere Gas. (Basic)
We can improve on the realism of the ideal gas by giving the atoms a small radius. If we make the potential energy infinite inside this radius (hard spheres), the potential energy is simple (zero unless the spheres overlap, which is forbidden). Let's do this in two dimensions.
A two-dimensional L × L box with hard walls contains a gas of N hard disks of radius r ≪ L (figure 3.6). The disks are dilute: the summed area N π r² ≪ L². Let A be the effective volume allowed for the disks in the box: A = (L − 2r)².

Fig. 3.6 Hard Sphere Gas.

(a) The area allowed for the second disk is A − π(2r)² (figure 3.7), ignoring the small correction when the excluded region around the first disk overlaps the excluded region near the walls of the box. What is the allowed 2N-dimensional volume in configuration space, of allowed zero-energy configurations of hard disks, in this dilute limit? Ignore small corrections when the excluded region around one disk overlaps the excluded regions around other disks, or near the walls of the box. Remember the 1/N! correction for identical particles. Leave your answer as a product of N terms.

Fig. 3.7 Excluded volume around a sphere.

(b) What is the configurational entropy for the hard disks? Here, simplify your answer so that it does not involve a sum over N terms, but is valid to first order in the area of the disks πr². Show, for large N, that it is well approximated by S = N kB (1 + log(A/N − b)), with b representing the effective excluded area due to the other disks. (You may want to derive the formula ∑_{n=1}^{N} log(A − (n−1)ε) = N log(A − (N−1)ε/2) + O(ε²).) What is the value of b, in terms of the area of the disk?
(c) Find the pressure for the hard-sphere gas in the large-N approximation of part (b). Does it reduce to the ideal gas law for b = 0?

(3.3) Connecting Two Macroscopic Systems.
An isolated system with energy E is composed of two macroscopic subsystems, each of fixed volume V and number of particles N. The subsystems are weakly coupled, so the sum of their energies is E1 + E2 = E (figure 3.4 with only the energy door open). We can use the Dirac delta function δ(x) to define the volume of the energy surface of a system with Hamiltonian H to be

    Ω(E) = ∫ dP dQ δ(E − H(P, Q))   (3.62)
         = ∫ dP1 dQ1 dP2 dQ2 δ(E − (H1(P1, Q1) + H2(P2, Q2))).   (3.63)

(a) Derive formula 3.24 for the volume of the energy surface of the whole system. (Hint: Insert ∫ δ(E1 − H1(P1, Q1)) dE1 = 1 into equation 3.63.)
Consider a monatomic ideal gas (He) mixed with a diatomic ideal gas (H2). We showed that a monatomic ideal gas of N atoms has Ω1(E1) ∝ E1^{3N/2}. A diatomic molecule has Ω2(E2) ∝ E2^{5N/2}.[56]
(b) Argue that the probability density of system 1 being at energy E1 is the integrand of 3.24 divided by the whole integral, equation 3.25. For these two gasses, which energy E1^max has the maximum probability?

[56] In the range ℏ²/2I ≪ kB T ≪ ℏω, where ω is the vibrational frequency of the stretch mode and I is the moment of inertia. The lower limit makes the rotations classical; the upper limit freezes out the vibrations, leaving us with three classical translation modes and two rotational modes, a total of five degrees of freedom.
(c) Use the saddle-point method [68, sect. 3.6] to approximate the integral 3.63 as the integral over a Gaussian. (That is, put the integrand into the form exp(f(E1)) and Taylor expand f(E1) to second order in E1 − E1^max.) Use the saddle-point integrand as a Gaussian approximation for the probability density ρ(E1) (valid, for large N, whenever ρ(E1) isn't absurdly small). In this approximation, what is the mean energy ⟨E1⟩? What are the energy fluctuations per particle √⟨(E1 − E1^max)²⟩/N?
For subsystems with large numbers of particles N, temperature and energy density are well defined because Ω(E) for each subsystem grows extremely rapidly with increasing energy, in such a way that Ω1(E1) Ω2(E − E1) is sharply peaked near its maximum.

(3.4) Gauss and Poisson. (Basic)
In section 3.2.1, we calculated the probability distribution for having n = N0 + m particles on the right-hand half of a box of volume 2V with 2N0 total particles. In section 11.3 we will want to know the number fluctuations of a small subvolume in an infinite system. Studying this also introduces the Poisson distribution.
Let's calculate the probability of having n particles in a subvolume V, for a box with total volume KV and a total number of particles T = K N0. For K = 2 we will derive our previous result, equation 3.14, including the prefactor. As K → ∞ we will derive the infinite volume result.
(a) Find the exact formula for this probability: n particles in V, with a total of T particles in KV. (Hint: What is the probability that the first n particles fall in the subvolume V, and the remainder T − n fall outside the subvolume, in (K − 1)V? How many ways are there to pick n particles from T total particles?)
The Poisson probability distribution

    ρn = aⁿ e^{−a} / n!   (3.64)

arises in many applications. It arises whenever there is a large number of possible events T, each with a small probability a/T: the number of cars passing a given point during an hour on a mostly empty street, the number of cosmic rays hitting in a given second, etc.
(b) Show that the Poisson distribution is normalized: ∑n ρn = 1. Calculate the mean of the distribution ⟨n⟩ in terms of a. Calculate the standard deviation √⟨(n − ⟨n⟩)²⟩.
(c) As K → ∞, show that the probability that n particles fall in the subvolume V has the Poisson distribution 3.64. What is a? (Hint: You'll need to use the fact that e^{−a} = (e^{−1/K})^{Ka} ≈ (1 − 1/K)^{Ka} as K → ∞, and the fact that n ≪ T. Here don't assume that n is large: the Poisson distribution is valid even if there are only a few events.)
From parts (b) and (c), you should be able to conclude that the standard deviation in the number of particles found in a volume V inside an infinite system should be equal to √N0, where N0 is the expected number of particles in the volume:

    ⟨(n − ⟨n⟩)²⟩ = N0.   (3.65)

This is twice the squared fluctuations we found for the case where the volume V was half of the total volume, equation 3.14. That makes sense, since the particles can fluctuate more freely in an infinite volume than in a doubled volume.
If N0 is large, the probability Pm that N0 + m particles lie inside our volume will be Gaussian for any K. (As a special case, if a is large the Poisson distribution is well approximated as a Gaussian.) Let's derive this distribution for all K. First, as in section 3.2.1, let's use the weak form of Stirling's approximation, equation 3.11 dropping the square root: n! ≈ (n/e)ⁿ.
(d) Using your result from part (a), write the exact formula for log(Pm). Apply the weak form of Stirling's formula. Expand your result around m = 0 to second order in m, and show that log(Pm) ≈ −m²/2σK², giving a Gaussian form

    Pm ≈ e^{−m²/2σK²}.   (3.66)

What is σK? In particular, what is σ2 and σ∞? Your result for σ2 should agree with the calculation in section 3.2.1, and your result for σ∞ should agree with equation 3.65.
Finally, we should address the normalization of the Gaussian. Notice that the ratio of the strong and weak forms of Stirling's formula (equation 3.11) is √(2πn). We need to use this to produce the normalization of our Gaussian.
(e) In terms of T and n, what factor would the square root term have contributed if you had kept it in Stirling's formula going from part (a) to part (d)? (It should look like a ratio involving three terms of the form √(2πX).) Show from equation 3.66 that the fluctuations are small, m = n − N0 ≪ N0, for large N0. Ignoring these fluctuations, set n = N0 in your factor, and give the prefactor multiplying the Gaussian in equation 3.66. (Hint: your answer should be normalized.)

(3.5) Microcanonical Thermodynamics. (Thermodynamics, Chemistry)
Thermodynamics was understood as an almost complete scientific discipline before statistical mechanics was invented. Stat mech can be thought of as the microscopic theory, which yields thermo as the emergent theory on
3.6 What is Thermodynamics? 51

long length and time scales where the uctuations are at (x0 + x | y, y0 + y, f0 ). Draw it at constant x
y f
unimportant. back to y0 , and then at constant y back to (x0 , y0 ). How
The microcanonical stat mech distribution introduced in much must f change to make this a single-valued func-
class studies the properties at xed total energy E, vol- tion?) Applying this formula to S at xed E, derive the
ume V , and number of particles N . We derived the mi- two equations in part (a) again.
croscopic formula S(N, V, E) = kB log (N, V, E). The (c) Ideal Gas Thermodynamics. Using the micro-
principle that entropy is maximal led us to the conclu- scopic formula for the entropy of a monatomic ideal
sion that two weakly-coupled systems in thermal equilib- gas 3.61
rium would exchange energy until their values of E S
|N,V " 3/2 #
agreed, leading us to dene the latter as the inverse of 5 V 4mE
the temperature. By an analogous argument we nd that S(N, V, E) = N kB + N kB log ,
2 N h3 3N
systems that can exchange volume (by a thermally in- (3.68)
sulated movable partition) will shift until V S
|N,E agrees, calculate .
and that systems that can exchange particles (by semiper-
Maxwell Relations. Imagine solving the microcanoni-
meable membranes) will shift until N S
|V,E agrees.
cal equation of state of some material (not necessarily an
How do we connect these statements with the denitions
ideal gas) for the energy E(S, V, N ): its the same surface
of pressure and chemical potential we get from thermo-
in four dimensions, but looked at with a dierent direc-
dynamics? In thermo, one denes the pressure as minus
tion pointing up. One knows that the second deriva-
the change in energy with volume P = V E
|N,S , and the
tives of E are symmetric: at xed N , we get the same
chemical potential as the change in energy with number
answer whichever order we take derivatives with respect
of particles = N |V,S ; the total internal energy satises
to S and V .
dE = T dS P dV + dN. (3.67) (d) Use this to show the Maxwell relation

(a) Show by solving equation 3.67 for dS that V S
|N,E = T P
= . (3.69)
P/T and N |V,E = /T (simple algebra). V S,N S V,N

Ive always been uncomfortable with manipulating dXs.

Lets do this the hard way. Our microcanonical equa- (This should take about two lines of calculus). Generate
tion of state S(N, V, E) can be thought of as a surface two other similar formulas by taking other second partial
embedded in four dimensions. derivatives of E. There are many of these relations.
(b) Show that, if f is a function of x and y, that (e) Stat Mech check of the Maxwell relation.
| y | f | = 1. (Draw a picture of a surface
y f f x x y
Using equation 3.61 repeated above, write formulas for
f (x, y) and a triangular path with three curves at con- E(S, V, N ), T (S, V, N ) and P (S, V, N ) for the ideal gas
stant f , x, and y as in gure 3.3. Specically, draw a (non trivial!). (This is dierent from T and P in part (c),
path that starts at (x0 , y0 , f0 ) and moves along a con- which were functions of N , V , and E.) Show explicitly
tour at constant f to y0 + y. The nal point will be that the Maxwell relation equation 3.69 is satised.

c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity

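The Stat Mech check of the Maxwell relation requested in part (e) of exercise 3.5 can also be done symbolically. Here is a sketch (not part of the text; it assumes sympy is available) that starts from the ideal-gas entropy 3.68, solves for E(S, V, N), and verifies equation 3.69:

```python
import sympy as sp

N, V, E, S, kB, m, h = sp.symbols('N V E S k_B m h', positive=True)

# Entropy of the monatomic ideal gas, equation 3.68
Sexpr = (sp.Rational(5, 2) * N * kB
         + N * kB * sp.log(V / (N * h**3) * (4 * sp.pi * m * E / (3 * N))**sp.Rational(3, 2)))

# Solve S(N, V, E) for E(S, V, N), then T = dE/dS|_{V,N} and P = -dE/dV|_{S,N}
Eexpr = sp.solve(sp.Eq(S, Sexpr), E)[0]
T = sp.diff(Eexpr, S)
P = -sp.diff(Eexpr, V)

# Maxwell relation dT/dV|_{S,N} = -dP/dS|_{V,N}, equation 3.69
assert sp.simplify(sp.diff(T, V) + sp.diff(P, S)) == 0
```

Both sides reduce to −(4/(9 N k_B V)) E(S, V, N), which is why the sum simplifies to zero.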

4 Phase Space Dynamics and Ergodicity
So far, our justification for using the microcanonical ensemble was simple ignorance: all we know about the late-time dynamics is that energy must be conserved, so we average over all states of fixed energy. Here we provide a much more convincing argument for the ensemble, and hence for equilibrium statistical mechanics as a whole. In section 4.1 we'll show for classical systems that averaging over the energy surface is consistent with time evolution: Liouville's theorem tells us that volume in phase space is conserved, so the trajectories only stir the energy surface around; they do not change the relative weights of different parts of the energy surface. In section 4.2 we introduce the concept of ergodicity: an ergodic system has an energy surface which is well stirred. Using Liouville's theorem and assuming ergodicity will allow us to show¹ that the microcanonical ensemble average gives the long-time average behavior that we call equilibrium.

¹ We do not aspire to rigor, but we will provide physical arguments for rigorously known results: see [61].

4.1 Liouville's Theorem

In chapter 3, we saw that treating all states in phase space with a given energy on an equal footing gave sensible predictions for the ideal gas, but we did not show that this democratic treatment was necessarily the correct one. Liouville's theorem, true for all Hamiltonian systems, will tell us that all states are created equal.

Systems of point particles obeying Newton's laws without dissipation are examples of Hamiltonian dynamical systems. Hamiltonian systems conserve energy. The Hamiltonian is the function H(P, Q) that gives the energy for the system for any point in phase space: the equations of motion are given by

    q̇_α = ∂H/∂p_α,    ṗ_α = −∂H/∂q_α,    (4.1)

where as usual Ẋ = ∂X/∂t. The standard example of a Hamiltonian, and the only example we will discuss in this text, is a bunch of particles interacting with a potential energy V:

    H(P, Q) = Σ_α p_α²/2m + V(q₁, ..., q_{3N}).    (4.2)


In this case, one immediately finds the expected equations of motion

    q̇_α = ∂H/∂p_α = p_α/m,    (4.3)
    ṗ_α = −∂H/∂q_α = −∂V/∂q_α = f_α(q₁, ..., q_{3N}),

where f_α is the force on coordinate α. More general Hamiltonians² arise when studying, for example, the motions of rigid bodies or mechanical objects connected by hinges and joints, where the natural variables are angles or relative positions rather than points in space. Hamiltonians also play a central role in quantum mechanics.³

² You'll cover Hamiltonian dynamics in detail in most advanced courses in classical mechanics. For those who don't already know about Hamiltonians, rest assured that we won't use anything other than the special case of Newton's laws for point particles: you can safely ignore the more general case for our purposes.

³ In section 7.1 we discuss the quantum version of Liouville's theorem.

Hamiltonian systems have properties that are quite distinct from general systems of differential equations. Not only do they conserve energy: they also have many other unusual properties.⁴ Liouville's theorem describes the most important of these properties.
Consider the evolution law for a general probability density in phase space,

    ρ(P, Q) = ρ(q₁, ..., q_{3N}, p₁, ..., p_{3N}).    (4.4)

(As a special case, the microcanonical ensemble has ρ equal to a constant in a thin range of energies, and zero outside that range.) This probability density is locally conserved: probability cannot be created or destroyed, it can only flow around in phase space. As an analogy, suppose a fluid of mass density ρ₃D(x) in three dimensions has a velocity v(x). Because mass density is locally conserved, ρ₃D must satisfy the continuity equation ∂ρ₃D/∂t = −∇·J, where J = ρ₃D v is the mass current.⁵ In the same way, the probability density ρ in 6N dimensions has a phase-space probability current ρ(Ṗ, Q̇) and hence satisfies a continuity equation

    ∂ρ/∂t = −Σ_{α=1}^{3N} [ ∂(ρ q̇_α)/∂q_α + ∂(ρ ṗ_α)/∂p_α ]    (4.5)
          = −Σ_α [ (∂ρ/∂q_α) q̇_α + ρ (∂q̇_α/∂q_α) + (∂ρ/∂p_α) ṗ_α + ρ (∂ṗ_α/∂p_α) ].

⁵ Think of the flow in and out of a small volume ΔV in space. The change in the density inside the volume, (∂ρ₃D/∂t) ΔV, must equal minus the flow of material out through the surface, −∮ J · dS, which by Gauss's theorem equals −∫ ∇·J dV ≈ −(∇·J) ΔV.

Now, it's clear what is meant by ∂ρ/∂q_α, since ρ is a function of the q_α's and p_α's. But what is meant by ∂q̇_α/∂q_α? For our example of point particles, q̇_α = p_α/m, which has no dependence on q_α; nor does ṗ_α = f_α(q₁, ..., q_{3N}) have any dependence on the momentum p_α.⁶ Hence these two mysterious terms in equation 4.5 both vanish for Newton's laws for point particles. Indeed, in a general Hamiltonian system, using equation 4.1, we find that they cancel:

    ∂q̇_α/∂q_α = ∂(∂H/∂p_α)/∂q_α = ∂²H/∂p_α∂q_α = ∂²H/∂q_α∂p_α
              = ∂(∂H/∂q_α)/∂p_α = −∂ṗ_α/∂p_α.    (4.6)

⁶ It would typically depend on the coordinate q_α, for example.
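The cancellation in equation 4.6 holds for an arbitrary Hamiltonian, because mixed partial derivatives commute. A one-line symbolic check (a sketch using sympy, which is assumed available; not part of the text):

```python
import sympy as sp

q, p = sp.symbols('q p')
H = sp.Function('H')(q, p)      # an arbitrary Hamiltonian for one (q, p) pair

qdot = sp.diff(H, p)            # Hamilton's equations, equation 4.1
pdot = -sp.diff(H, q)

# Equation 4.6: the phase-space velocity field is divergence-free,
# because the mixed second derivatives of H are equal.
assert sp.simplify(sp.diff(qdot, q) + sp.diff(pdot, p)) == 0
```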

⁴ For the mathematically sophisticated reader: Hamiltonian dynamics preserves a symplectic form ω = dq₁∧dp₁ + ... + dq_{3N}∧dp_{3N}; Liouville's theorem follows because the volume in phase space is ω^{3N}.

This leaves us with the equation

    ∂ρ/∂t + Σ_α [ q̇_α ∂ρ/∂q_α + ṗ_α ∂ρ/∂p_α ] = dρ/dt = 0.    (4.7)

This is Liouville's theorem.

What is dρ/dt, and how is it different from ∂ρ/∂t? The former is called the total derivative of ρ with respect to time: it's the evolution of ρ seen by a particle moving with the flow. In a three-dimensional flow, dρ₃D/dt = ∂ρ/∂t + v·∇ρ = ∂ρ/∂t + Σ_{i=1}^{3} (∂ρ/∂x_i) ẋ_i; the first term is the change in ρ due to the time evolution at fixed position, and the second is the change in ρ that a particle moving with velocity v would see if the ρ field didn't change in time. Equation 4.7 is the same physical situation, but in 6N-dimensional phase space.
What does Liouville's theorem, dρ/dt = 0, tell us about Hamiltonian dynamics?

Flows in phase space are incompressible. In fluid mechanics, if the density dρ₃D/dt = 0 it means that the fluid is incompressible. The density of a small element of fluid doesn't change as it moves around in the fluid: hence the small element is not compressing or expanding. In Liouville's theorem, it means the same thing: a small volume in phase space will evolve into a new shape, perhaps stretched, twisted, or folded, but with exactly the same volume (figure 4.1).

[Fig. 4.1: A small volume in phase space may be stretched and twisted by the flow, but Liouville's theorem shows that the volume stays unchanged.]

There are no attractors. In other dynamical systems, most states of the system are usually transient, and the system settles down onto a small set of states called the attractor. A damped pendulum will stop moving: the attractor has zero velocity and vertical angle (exercise 4.1). A forced, damped pendulum will settle down to oscillate with a particular amplitude: the attractor is a circle in phase space. The decay of these transients would seem closely related to equilibration in statistical mechanics, where at long times all initial states of a system will settle down into boring static equilibrium behavior.⁷ Perversely, we've just proven that equilibration in statistical mechanics happens by a completely different mechanism! In equilibrium statistical mechanics all states are created equal: transient states are temporary only insofar as they are very unusual, so as time evolves they disappear, to arise again only as rare fluctuations.

⁷ We'll return to the question of how irreversibility and damping emerge from statistical mechanics many times in the rest of this book. It will always involve introducing approximations to the microscopic theory.

Microcanonical ensembles are time independent. An initial uniform density in phase space will stay uniform. More generally, since energy is conserved, a uniform density over a small shell of energies (E, E + δE) will stay uniform.

Liouville's theorem tells us that the energy surface may get stirred around, but the relative weights of parts of the surface are given by their phase-space volumes (figure 3.1) and don't change. This is clearly a necessary condition for our microcanonical ensemble to describe the time-independent equilibrium state.

4.2 Ergodicity

By averaging over the energy surface, statistical mechanics is making a hypothesis, first stated by Boltzmann. Roughly speaking, the hypothesis is that the energy surface is thoroughly stirred by the time evolution: it isn't divided into some kind of components that don't intermingle (see figure 4.2). A system which is thoroughly stirred is said to be ergodic.⁸

⁸ Mathematicians distinguish between ergodic (stirred) and mixing (scrambled); we only need to assume ergodicity here. See reference [61] for more information about ergodicity.

The original way of defining ergodicity is due to Boltzmann. Adapting his definition:

Definition 1: In an ergodic system, the trajectory of almost every⁹ point in phase space eventually passes arbitrarily close¹⁰ to every other point (position and momentum) on the surface of constant energy.

⁹ What does "almost every" mean? Technically, it means all but a set of zero volume (measure zero). Basically, it's there to avoid problems with initial conditions like all the particles moving precisely at the same velocity in neat rows.

¹⁰ Why not just assume that every point on the energy surface gets passed through? Boltzmann originally did assume this. However, it can be shown that a smooth curve (our time-trajectory) can't fill up a whole volume (the energy surface).

We say our Hamiltonian is ergodic if the time evolution is ergodic on each energy surface S.

The most important consequence of ergodicity is that time averages are equal to microcanonical averages.¹¹ Intuitively, since the trajectory (P(t), Q(t)) covers the whole energy surface, the average of any property A(P(t), Q(t)) over time is the same as the average of A over the energy surface.

¹¹ If the system equilibrates (i.e., doesn't oscillate forever), the time average behavior will be determined by the equilibrium behavior, and then ergodicity implies that the equilibrium properties are equal to the microcanonical averages.

This turns out to be tricky to prove, though. It's easier mathematically to work with another, equivalent definition of ergodicity. This definition roughly says the energy surface can't be divided into components which don't intermingle. Let's define an ergodic component R of a set S to be a subset that is left invariant under the flow (so r(t) ∈ R for all r(0) ∈ R).

Definition 2: A time evolution in a set S is ergodic if and only if all the ergodic components R in S either have zero volume or have a volume equal to the volume of S.
Why does definition 2 follow from definition 1? A trajectory r(t) of course must lie within a single ergodic component. If r(t) covers the energy surface densely (definition 1), then there is no more room for a second ergodic component with non-zero volume (definition 2).

Using this definition of ergodic, it's easy to argue that time averages must equal microcanonical averages. Let's denote the microcanonical average of an observable A as ⟨A⟩_S, and let's denote the time average starting at initial condition (P, Q) as Ā(P, Q).

Showing that the time average Ā equals the ensemble average ⟨A⟩_S for an ergodic system (using this second definition) has three steps.

(1) Time averages are constant on trajectories. If A is a nice function (e.g. without any infinities on the energy surface), then it's easy to show that

    Ā(P(t), Q(t)) = Ā(P(t + τ), Q(t + τ));    (4.8)

the infinite time average doesn't depend on the values of A during the finite time interval (t, t + τ). Thus the time average Ā is

[Fig. 4.2: KAM tori and non-ergodic motion. This is a (Poincaré) cross section of Earth's motion in the three-body problem (exercise 4.2), with Jupiter's mass set at almost 70 times its actual value. The closed loops correspond to trajectories that form tori in phase space, whose cross sections look like deformed circles in our view. The complex filled region is a single trajectory exhibiting chaotic motion, and represents an ergodic component. The tori, each an ergodic component, can together be shown to occupy non-zero volume in phase space, for small Jovian masses. Note that this system is not ergodic according to either of our definitions. The trajectories on the tori never explore the rest of the energy surface. The region R formed by the chaotic domain is invariant under the time evolution; it has positive volume, and the region outside R also has positive volume.]

constant along the trajectory.¹²

¹² If we could show that Ā had to be a continuous function, we'd now be able to use the first definition of ergodicity to show that it was constant on the energy surface, since our trajectory comes close to every point on the surface. But it will not be continuous for Hamiltonian systems that are not ergodic. In figure 4.2, consider two initial conditions at nearby points, one just inside a chaotic region and the other on a KAM torus. The infinite time averages on the two trajectories for most quantities will be different: Ā will typically have a jump at the boundary.

(2) Time averages are constant on the energy surface. Now consider the subset Ra of the energy surface where Ā < a, for some value a. Since Ā is constant along a trajectory, any point in Ra is sent under the time evolution to another point in Ra, so Ra is an ergodic component. If our dynamics is ergodic on the energy surface, that means the set Ra has either zero volume or the volume of the energy surface. This implies that Ā is a constant on the energy surface (except on a set of zero volume); its value is ā, the lowest value a for which Ra has the whole volume. Thus the equilibrium, time-average value of our observable Ā is independent of initial condition.

(3) Time averages equal microcanonical averages. Is this equilibrium value given by the microcanonical ensemble average over S? We need to show that the trajectories don't dawdle in some regions of the energy surface more than they should (based on the thickness of the energy shell, figure 3.1). Liouville's theorem in section 4.1 told us that the microcanonical ensemble was time independent. This implies that the microcanonical ensemble average of A is time independent: ⟨A⟩_S = ⟨A(t)⟩_S = ⟨A(P(t), Q(t))⟩_S, where the average ⟨·⟩_S integrates over initial conditions (P(0), Q(0)) but evaluates A at (P(t), Q(t)). Averaging over all time, and using the fact that Ā = ā (almost everywhere), tells us

    ⟨A⟩_S = lim_{T→∞} (1/T) ∫₀ᵀ ⟨A(P(t), Q(t))⟩_S dt
          = ⟨ lim_{T→∞} (1/T) ∫₀ᵀ A(P(t), Q(t)) dt ⟩_S
          = ⟨Ā(P, Q)⟩_S = ⟨ā⟩_S = ā.    (4.9)

Thus the microcanonical average equals the time average for an

ergodic Hamiltonian system.
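The equality of time averages and ensemble averages is easy to watch numerically in a toy ergodic system. A minimal sketch (not from the text, and not a Hamiltonian flow): the irrational rotation of the circle is ergodic (though not mixing) with the uniform measure as its invariant measure, so the time average of any smooth observable converges to its integral:

```python
import math

# Irrational rotation of the circle: x -> x + alpha (mod 1).
# For irrational alpha this map is ergodic with respect to the uniform
# (Lebesgue) measure, so time averages equal ensemble averages.
alpha = (math.sqrt(5) - 1) / 2   # golden mean: a "very irrational" rotation
A = lambda x: x * x              # an arbitrary smooth observable

x, total = 0.1, 0.0
N = 200_000
for _ in range(N):
    x = (x + alpha) % 1.0
    total += A(x)

time_average = total / N
ensemble_average = 1.0 / 3.0     # integral of x^2 over [0, 1)
assert abs(time_average - ensemble_average) < 1e-3
```

The same comparison fails for a non-ergodic map, where the time average remembers the initial condition.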

Can we show that our systems are ergodic? Usually not. Ergodicity has been proven for the collisions of hard spheres, and for geodesic motion on finite surfaces with constant negative curvature,¹³ but not for many systems of immediate practical importance. Indeed, many fundamental problems precisely involve systems which are not ergodic.

¹³ Geodesic motion on a sphere would be motion at a constant speed around great circles. Geodesics are the shortest paths between two points. In general relativity, falling bodies travel on geodesics in space-time.

KAM tori and the three-body problem. Generations of mathematicians and physicists have worked on the gravitational three-body problem.¹⁴ The key challenge was showing that the interactions between the planets do not completely mess up their orbits over long times. One must note that messing up their orbits is precisely what an ergodic system must do! (There's just as much phase space at constant energy with Earth and Venus exchanging places, and a whole lot more with Earth flying out into interstellar space.) In the last century¹⁵ the KAM theorem was proven, which showed that (for small interplanetary interactions and a large fraction of initial conditions) the orbits of the planets qualitatively stayed in weakly perturbed ellipses around the Sun (KAM tori, see figure 4.2). Other initial conditions, intricately intermingled with the stable ones, lead to chaotic motion. Exercise 4.2 investigates the KAM tori and chaotic motion in a numerical simulation.

¹⁴ Newton solved the gravitational two-body problem, giving Kepler's ellipse.

¹⁵ That is, the 20th century.

From the KAM theorem and the study of chaos in these systems we learn that Hamiltonian systems with small numbers of particles are often, even usually, not ergodic: there are commonly regions formed by tori of non-zero volume which do not mix with the rest of the energy surface.
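The coexistence of KAM tori and chaotic regions is easiest to play with in the Chirikov standard map, an area-preserving map of the cylinder that serves here as a minimal stand-in (it is not the three-body problem of exercise 4.2): for small kicking strength K, most orbits trace closed curves (tori), with thin chaotic bands between them.

```python
import math

# Chirikov standard map:  p' = p + K sin(theta),  theta' = theta + p'
# (both taken mod 2*pi).  K = 0.9 is an illustrative choice where tori
# and chaotic regions coexist, as in figure 4.2.
def standard_map(theta, p, K):
    p = (p + K * math.sin(theta)) % (2 * math.pi)
    theta = (theta + p) % (2 * math.pi)
    return theta, p

K = 0.9
orbits = []
for p0 in [0.5, 1.5, 2.5, 3.5]:          # a few initial conditions
    theta, p = 0.1, p0
    orbit = []
    for _ in range(1000):
        theta, p = standard_map(theta, p, K)
        orbit.append((theta, p))
    orbits.append(orbit)
# Plotting each orbit as dots in the (theta, p) plane reveals closed
# curves (KAM tori) and filled regions (chaotic seas).
```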
Fermi, Pasta, Ulam, and KdV. You might think that this is a peculiarity of having only a few particles. Surely if there are lots of particles, such funny behavior has to go away? On one of the early computers developed for the Manhattan project, Fermi, Pasta and Ulam tested this [29]. They took a one-dimensional chain of atoms, coupled them with anharmonic potentials, and tried to look for thermalization. To quote them:

    "Let us say here that the results of our computations were, from the beginning, surprising us. Instead of a continuous flow of energy from the first mode to the higher modes, all of the problems show an entirely different behavior. ... Instead of a gradual increase of all the higher modes, the energy is exchanged, essentially, among only a certain few. It is, therefore, very hard to observe the rate of thermalization or mixing in our problem, and this was the initial purpose of the calculation." [29, p. 978]

It turns out that their system, in the continuum limit, gave a partial differential equation (the Korteweg-de Vries equation) that was even weirder than planetary motion: it had an infinite family of conserved quantities, and could be exactly solved using a combination of fronts called solitons.

The kind of non-ergodicity found in the Korteweg-de Vries equation was thought to arise in only rather special one-dimensional systems. The recent discovery of anharmonic localized modes in generic, three-dimensional systems by Sievers and Takeno [91, 85] suggests that non-ergodicity may arise in rather realistic lattice models.
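The Fermi-Pasta-Ulam experiment is easy to re-run today. The sketch below (parameters are illustrative, not those of [29]; numpy is assumed available) integrates a fixed-end chain with quadratic-plus-cubic springs, started with all its energy in the lowest normal mode; the mode energies stay concentrated in the first few modes instead of equipartitioning:

```python
import numpy as np

# A minimal FPU-alpha chain: N oscillators, nearest-neighbor springs
# plus a cubic anharmonic term of strength alpha, fixed ends.
N, alpha, dt, steps = 32, 0.25, 0.05, 20000
x = np.sin(np.pi * np.arange(1, N + 1) / (N + 1))   # all energy in mode 1
v = np.zeros(N)

def force(x):
    y = np.concatenate(([0.0], x, [0.0]))           # fixed ends
    d = np.diff(y)                                   # bond stretches
    g = d + alpha * d**2                             # anharmonic bond force
    return g[1:] - g[:-1]                            # net force on each mass

f = force(x)
for _ in range(steps):                               # velocity Verlet
    v += 0.5 * dt * f
    x += dt * v
    f = force(x)
    v += 0.5 * dt * f

# Project onto the normal modes of the linear chain and compute each
# mode's harmonic energy E_k = (P_k^2 + omega_k^2 Q_k^2) / 2.
k = np.arange(1, N + 1)
modes = np.sqrt(2 / (N + 1)) * np.sin(np.pi * np.outer(k, np.arange(1, N + 1)) / (N + 1))
Q, P = modes @ x, modes @ v
omega2 = (2 * np.sin(np.pi * k / (2 * (N + 1))))**2
E_mode = 0.5 * (P**2 + omega2 * Q**2)
print(E_mode[:4] / E_mode.sum())  # energy stays concentrated in the lowest modes
```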
Phase Transitions. In systems with an infinite number of particles, one can have phase transitions. Often ergodicity breaks down in one of the phases. For example, a liquid may explore all of phase space with a given energy, but an infinite crystal (with a neat grid of atoms aligned in a particular orientation) will never fluctuate to change its orientation, or (in three dimensions) the registry of its grid.¹⁶ The real system will explore only one ergodic component of the phase space (one crystal position and orientation), and we must do the same when making theories of the system.

¹⁶ That is, a 3D crystal has broken orientational and translational symmetries: see chapter 9.
Glasses. There are other kinds of breakdowns of the ergodic hypothesis. For example, glasses fall out of equilibrium as they are cooled: they no longer ergodically explore all configurations, but just oscillate about one of many metastable glassy states. Certain models of glasses and disordered systems can be shown to break ergodicity. It is an open question whether real glasses truly break ergodicity when cooled infinitely slowly, or whether they are just sluggish, frozen liquids.
Should we be concerned that we cannot prove that our systems are ergodic? It is entertaining to point out the gaps in our derivations, especially since they tie into so many central problems in mathematics and physics (above). We emphasize that these gaps are for most purposes purely of academic concern. Statistical mechanics works phenomenally well in systems with large numbers of interacting degrees of freedom. Indeed, the level of rigor here is unusual. In more modern applications¹⁷ of statistical mechanics outside of equilibrium thermal systems

¹⁷ In disordered systems, disorder is heuristically introduced with Gaussian or discrete random variables. In stochastic systems Gaussian or white noise is added. In Bayesian statistics, the user is in charge of determining the prior model probability distribution, analogous to Liouville's theorem determining the measure on phase space.

there is rarely any justification of the choice of the ensemble comparable to that provided by Liouville's theorem and ergodicity.

Exercises

(4.1) The Damped Pendulum vs. Liouville's Theorem. (Basic, Mathematics)

The damped pendulum has a force −γp proportional to the momentum slowing down the pendulum. It satisfies the equations

    ẋ = p/M,    (4.10)
    ṗ = −γp − K sin(x).

At long times, the pendulum will tend to an equilibrium stationary state, zero velocity at x = 0 (or more generally at the equivalent positions x = 2mπ, for m an integer): (p, x) = (0, 0) is an attractor for the damped pendulum. An ensemble of damped pendulums is started with initial conditions distributed with probability ρ(p₀, x₀). At late times, these initial conditions are gathered together near the equilibrium stationary state: Liouville's theorem clearly is not satisfied.

(a) In the steps leading from equation 4.5 to equation 4.7, why does Liouville's theorem not apply to the damped pendulum? More specifically, what are ∂ṗ/∂p and ∂q̇/∂q?

(b) Find an expression for the total derivative dρ/dt in terms of ρ for the damped pendulum. How does the probability density vary with time? If we evolve a region of phase space of initial volume A = Δp Δx, how will its volume depend upon time?

(4.2) Jupiter! and the KAM Theorem (Astrophysics, Mathematics)

See also reference [96].

The foundation of statistical mechanics is the ergodic hypothesis: any large system will explore the entire energy surface. We focus on large systems because it is well known that many systems with a few interacting particles are definitely not ergodic.

The classic example of a non-ergodic system is the Solar system. Jupiter has plenty of energy to send the other planets out of the Solar system. Most of the phase-space volume of the energy surface has eight planets evaporated and Jupiter orbiting the Sun alone: the ergodic hypothesis would doom us to one long harsh winter. So, the big question is: why hasn't the Earth been kicked out into interstellar space?

Mathematical physicists have studied this problem for hundreds of years. For simplicity, they focused on the three-body problem: for example, the Sun, Jupiter, and the Earth. The early (failed) attempts tried to do perturbation theory in the strength of the interaction between planets. Jupiter's gravitational force on the Earth is not tiny, though: if it acted as a constant brake or accelerator, our orbit would be way out of whack in a few thousand years. Jupiter's effects must cancel out over time rather perfectly...

This problem is mostly discussion and exploration: only a few questions need to be answered. Download the program Jupiter from the appropriate link at the bottom of reference [96]. (Go to the directory with the binaries and select Jupiter.) Check that Jupiter doesn't seem to send the Earth out of the Solar system. Try increasing Jupiter's mass to 35000 Earth masses. (If you type in a new value, you need to hit Enter to register it.)

Start the program over again (or reset Jupiter's mass back to 317.83 Earth masses). Shifting View to Earth's trajectory, run for a while, and zoom in with the right mouse button to see the small effects of Jupiter on the Earth. (The left mouse button will launch new trajectories. Clicking with the right button will restore the original view.) Note that the Earth's position shifts depending on whether Jupiter is on the near or far side of the Sun.

(a) Estimate the fraction that the Earth's radius from the Sun changes during the first Jovian year (about 11.9 years). How much does this fractional variation increase over the next hundred Jovian years?

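For exercise 4.1 one can watch Liouville's theorem fail numerically. The divergence of the damped flow is −γ, so a small phase-space area should contract as e^(−γt) (and be conserved in the Hamiltonian case γ = 0). A sketch with illustrative parameters (not part of the text), tracking a tiny triangle of initial conditions:

```python
import math

M, K, gamma = 1.0, 1.0, 0.5          # illustrative parameters

def flow(state):                      # equations 4.10, with damping gamma
    x, p = state
    return (p / M, -gamma * p - K * math.sin(x))

def rk4_step(state, dt):              # one fourth-order Runge-Kutta step
    def add(s, k, h):
        return (s[0] + h * k[0], s[1] + h * k[1])
    k1 = flow(state)
    k2 = flow(add(state, k1, dt / 2))
    k3 = flow(add(state, k2, dt / 2))
    k4 = flow(add(state, k3, dt))
    return (state[0] + dt / 6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            state[1] + dt / 6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def area(pts):                        # area of a triangle in (x, p) space
    (x0, p0), (x1, p1), (x2, p2) = pts
    return abs((x1 - x0) * (p2 - p0) - (x2 - x0) * (p1 - p0)) / 2

eps = 1e-5                            # a tiny triangle of initial conditions
pts = [(1.0, 0.0), (1.0 + eps, 0.0), (1.0, eps)]
A0 = area(pts)

t, dt = 0.0, 0.001
while t < 2.0:
    pts = [rk4_step(s, dt) for s in pts]
    t += dt

# The divergence of the flow is -gamma, so the area shrinks as exp(-gamma t);
# with gamma = 0 (Hamiltonian case) the area would stay constant.
assert abs(area(pts) / A0 - math.exp(-gamma * t)) < 1e-3
```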

Jupiter thus warps Earth's orbit into a kind of spiral around a tube. This orbit in physical three-dimensional space is a projection of the tube in 6N-dimensional phase space. The tube in phase space already exists for massless planets...

Let's start in the non-interacting planet approximation (where Earth and Jupiter are assumed to have zero mass). Both Earth's orbit and Jupiter's orbit then become circles, or more generally ellipses. The field of topology does not distinguish an ellipse from a circle: any stretched, wiggled rubber band is a circle so long as it forms a curve that closes into a loop. Similarly, a torus (the surface of a doughnut) is topologically equivalent to any closed surface with one hole in it (like the surface of a coffee cup, with the handle as the hole). Convince yourself in this non-interacting approximation that Earth's orbit remains topologically a circle in its six-dimensional phase space.¹⁸

¹⁸ Hint: plot the orbit in the (x, y), (x, p_x), and other planes. It should look like the projection of a circle along various axes.

(b) In the non-interacting planet approximation, what topological surface is it in the eighteen-dimensional phase space that contains the trajectory of the three bodies? Choose between (i) sphere, (ii) torus, (iii) Klein bottle, (iv) two-hole torus, (v) complex projective plane.¹⁹ About how many times does Earth wind around this surface during each Jovian year? (This ratio of years is called the winding number.)

¹⁹ Hint: It's a circle cross a circle, parameterized by two independent angles: one representing the month of Earth's year, and one representing the month of the Jovian year. Feel free to look at part (c) before committing yourself, if pure thought isn't enough.

The mathematical understanding of the three-body problem was only solved in the past hundred years or so, by Kolmogorov, Arnold, and Moser. Their proof focuses on the topological integrity of this tube in phase space (now called the KAM torus). They were able to prove stability if the winding number (Jupiter year over Earth year) is sufficiently irrational. More specifically, they could prove in this case that for sufficiently small planetary masses there is a distorted torus in phase space, near the unperturbed one, around which the planets spiral with the same winding number.

(c) About how large can you make Jupiter's mass before Earth's orbit stops looking like a torus? (You can hit Clear and Reset to put the planets back to a standard starting point. Otherwise, your answer will depend upon the location of Jupiter in the sky.) Admire the cool orbits when the mass becomes too heavy.

Thus, for small Jovian masses, the trajectory in phase space is warped and rotated a bit, so that its toroidal shape is visible looking at Earth's position alone. (The circular orbit for zero Jovian mass is looking at the torus on edge.)

The fact that the torus isn't destroyed immediately is a serious problem for statistical mechanics! The orbit does not ergodically explore the entire allowed energy surface. This is a counterexample to Boltzmann's ergodic theorem. That means that time averages are not equal to averages over the energy surface: our climate would be very unpleasant, on the average, if our orbit were ergodic.

Let's use a Poincaré section to explore these tori, and the chaotic regions between them. If a dynamical system keeps looping back in phase space, one can take a cross-section of phase space and look at the mapping from that cross section back into itself (see figure 4.3).

[Fig. 4.3: The Poincaré section of a torus is a circle. The dynamics on the torus becomes a mapping of the circle onto itself.]

The Poincaré section shown in the figure is a planar cross section in a three-dimensional phase space. Can we reduce our problem to an interesting problem with three phase-space coordinates? The original problem has an eighteen-dimensional phase space. In the center of mass frame it has twelve interesting dimensions. If we restrict the motion to a plane, it reduces to eight dimensions. If we assume the mass of the Earth is zero (the restricted planar three-body problem) we have five relevant coordinates (Earth xy positions and velocities, and the location of Jupiter along its orbit). If we remove one more variable

by going to a rotating coordinate system that rotates with Jupiter, the current state of our model can be described with four numbers: two positions and two momenta for the Earth. We can remove another variable by confining ourselves to a fixed energy. The true energy of the Earth isn't conserved (because Earth feels a periodic potential), but there is a conserved quantity which is like the energy in the rotating frame: more details are described under Help or on the Web [96] under Description of the Three Body Problem. This leaves us with a trajectory in three dimensions (so, for small Jovian masses, we have a torus embedded in a three-dimensional space). Finally, we take a Poincare cross section: we plot a point of the trajectory every time Earth passes directly between Jupiter and the Sun. I plot the distance to Jupiter along the horizontal axis, and the velocity component towards Jupiter along the vertical axis; the perpendicular component of the velocity isn't shown (and is determined by the energy).

Set the View to Poincare. (You may need to expand the window a bit: sometimes the dot size is too small to see.) Set Jupiter's mass to 2000, and run for 1000 years. You should see two nice elliptical cross-sections of the torus. As you increase the mass (type in a mass, Enter, Reset and Run, repeat), watch the toroidal cross sections as they break down. Run for a few thousand years at MJ = 22000 Me; notice the torus has broken into three islands.

Fixing the mass at MJ = 22000 Me, let's explore the dependence of the planetary orbits on the initial condition. Select the preset for Chaos (or set MJ to 22000 Me, View to Poincare, and Reset). Clicking on a point on the screen with the left mouse button will launch a trajectory with that initial position and velocity towards Jupiter; it sets the perpendicular component of the velocity to keep the current energy. (If you click on a point where energy cannot be conserved, the program will tell you so.) You can thus view the trajectories on a two-dimensional cross-section of the three-dimensional constant-energy surface.

Notice that many initial conditions slowly fill out closed curves. These are KAM tori that have been squashed and twisted like rubber bands.20 Explore until you find some orbits that seem to fill out whole regions: these represent chaotic orbits.21

(d) If you can do a screen capture, print out a Poincare section with initial conditions both on KAM tori and in chaotic regions: label each.22 See figure 4.2 for a small segment of the picture you should generate.

It turns out that proving that Jupiter's effects cancel out depends on Earth's smoothly averaging over the surface of the torus. If Jupiter's year is a rational multiple of Earth's year, the orbit closes after a few years and you don't average over the whole torus: only a closed spiral. Rational winding numbers, we now know, lead to chaos when the interactions are turned on: the large chaotic region you found above is associated with an unperturbed orbit with a winding ratio of 3:1. Of course, the rational numbers are dense: between any two KAM tori there are chaotic regions, just because between any two irrational numbers there are rational ones. It's even worse: it turns out that numbers which are really, really close to rational (Liouville numbers like 1 + 1/10 + 1/10^10 + 1/10^(10^10) + ...) also may lead to chaos. It was amazingly tricky to prove that lots of tori survive nonetheless. You can imagine why this took hundreds of years to understand (especially without computers).

(4.3) Invariant Measures. (Math, Complexity) (With Myers. [72])

Reading: Reference [47], Roderick V. Jensen and Christopher R. Myers, "Images of the critical points of nonlinear maps", Physical Review A 32, 1222-1224 (1985).

Liouville's theorem tells us that all available points in phase space are equally weighted when a Hamiltonian system is averaged over all times. What happens for systems that evolve according to laws that are not Hamiltonian? Usually, the system does not continue to explore all points in its state space: at long times it is confined to a subset of the original space known as the attractor.

20 You can Continue if the trajectory doesn't run long enough to give you a complete feeling for the cross-section; also, increase the time to run. You can zoom in with the right mouse button, and zoom out by expanding the window or by using the right button and selecting a box which extends outside the window.
21 Notice that the chaotic orbit doesn't throw the Earth out of the Solar system. The chaotic regions near infinity and near our initial condition are not connected. This may be an artifact of our simplified model: in other larger systems it is believed that all chaotic regions (on a connected energy surface) are joined through Arnold diffusion.
22 At least under Linux, the Print feature is broken. Under Linux, try gimp: File Menu, then Acquire, then Screen Shot. Under Windows, alt-Print Screen and then Paste into your favorite graphics program.
To be published by Oxford University Press, Fall 2005

We consider the behavior of the logistic mapping from the unit interval (0, 1) into itself:23

f(x) = 4μx(1 − x).  (4.11)

We talk of the trajectory of an initial point x0 as the sequence of points x0, f(x0), f(f(x0)), ..., f^[n](x0), .... Iteration can be thought of as a time step (one iteration of a Poincare return map of exercise 4.2 or one step Δt in a time-step algorithm as in exercise 8.9).

Attracting Fixed Point: For small μ, our mapping has an attracting fixed point. A fixed point of a mapping is a value x* = f(x*); a fixed point is stable if small perturbations shrink:

|f(x* + ε) − x*| ≈ |f′(x*)|ε < ε,  (4.12)

which happens if the derivative |f′(x*)| < 1.24

(a) Iteration: Set μ = 0.2; iterate f for some initial points 0 < x0 < 1 of your choosing, and convince yourself that they all are attracted to zero. Plot f and the diagonal y = x on the same plot. Are there any fixed points other than x = 0? Repeat for μ = 0.4 and 0.6. What happens?

Analytics: Find the non-zero fixed point x*(μ) of the map 4.11, and show that it exists and is stable for 1/4 < μ < 3/4. If you're ambitious or have a computer algebra program, show that there is a stable period-two cycle for 3/4 < μ < (1 + √6)/4.

An attracting fixed point is the antithesis of Liouville's theorem: all initial conditions are transient except one, and all systems lead eventually to the same, time-independent state. (On the other hand, this is precisely the behavior we expect in statistical mechanics on the macroscopic scale: the system settles down into a time-independent equilibrium state! All microstates are equivalent, but the vast majority of accessible microstates have the same macroscopic behavior in most large systems.) We could define a rather trivial equilibrium ensemble for this system, which consists of the single point x*: any property A(x) will have the long-time average ⟨A⟩ = A(x*).

For larger values of μ, more complicated things happen. At μ = 1, the dynamics can be shown to fill the entire interval: the dynamics is ergodic, and the attractor fills the entire set of available states. However, unlike the case of Hamiltonian systems, not all states are weighted equally. We can find time averages for functions of x in two ways: by averaging over time (many iterates of the map) or by weighting an integral over x by the invariant density ρ(x). The invariant density ρ(x) dx is the probability that a point on a long trajectory will lie between x and x + dx. To find it numerically, we iterate a typical point25 x0 a thousand or so times (Ntransient) to find a point xa on the attractor, and then collect a long trajectory of perhaps a million points (Ncycles). A histogram of this trajectory gives ρ(x). Clearly averaging over this density is the same as averaging over the trajectory of a million points. We call ρ(x) an invariant measure because it's left invariant under the mapping f: iterating our million-point approximation for ρ once under f only removes the first point xa and adds one extra point to the end.

(b) Invariant Density: Set μ = 1; iterate f many times, and form a histogram of values giving the density ρ(x) of points along the trajectory. You should find that points x near the boundaries are approached more often than points near the center.

Analytics: Using the fact that the long-time average ρ(x) must be independent of time, verify for μ = 1 that the density of points is26

ρ(x) = 1/(π√(x(1 − x))).  (4.13)

Plot this theoretical curve with your numerical histogram. (Hint: The points in a range dx of a point x map under f to a range dy = f′(x) dx around the image y = f(x). Each iteration maps two points xa and xb = 1 − xa to y, and thus maps all the density ρ(xa)|dxa| and ρ(xb)|dxb| into dy. Hence the probability ρ(y) dy must equal ρ(xa)|dxa| + ρ(xb)|dxb|, so

ρ(f(xa)) = ρ(xa)/|f′(xa)| + ρ(xb)/|f′(xb)|.  (4.14)

Plug equation 4.13 into equation 4.14. You'll need to factor a polynomial.)

23 We also study this map in exercises 6.12, 6.14, and 13.8.
24 For many-dimensional mappings, a sufficient criterion for stability is that all the eigenvalues of the Jacobian have magnitude smaller than one. A continuous time evolution dy/dt = F(y) will be stable if dF/dy is smaller than zero, or (for multidimensional systems) if the Jacobian DF has eigenvalues whose real parts are all less than zero. This is all precisely analogous to discrete and continuous-time Markov chains; see section 8.2.
25 For example, we must not choose an unstable fixed point or unstable periodic orbit.
26 You need not derive the factor 1/π, which normalizes the probability density to one.
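Parts (a) and (b) can be sketched numerically in a few lines. The following is a minimal illustration, not the course software; the function and variable names (f, trajectory, n_transient) are ours, and the bin count is an arbitrary choice:

```python
import math

def f(x, mu):
    """One step of the logistic map f(x) = 4 mu x (1 - x) of equation 4.11."""
    return 4.0 * mu * x * (1.0 - x)

def trajectory(x0, mu, n_transient, n_cycles):
    """Discard n_transient iterates, then collect n_cycles points on the attractor."""
    x = x0
    for _ in range(n_transient):
        x = f(x, mu)
    points = []
    for _ in range(n_cycles):
        x = f(x, mu)
        points.append(x)
    return points

# (a) For mu = 0.2 every initial condition is attracted to the fixed point x* = 0.
print(trajectory(0.6, 0.2, 1000, 1)[0])   # essentially zero

# (b) For mu = 1 the attractor is the whole interval; histogram the visits.
pts = trajectory(0.6, 1.0, 1000, 100000)
bins = [0] * 10
for x in pts:
    bins[min(int(10 * x), 9)] += 1

# Compare the edge bin with the prediction rho(x) = 1/(pi sqrt(x(1-x))):
# integrating rho over [0, 0.1] gives (2/pi) arcsin(sqrt(0.1)).
predicted = (2 / math.pi) * math.asin(math.sqrt(0.1))
print(bins[0] / len(pts), predicted)   # edge bin is visited far more than center bins
```

The histogram piles up near x = 0 and x = 1, exactly as the arcsine density of equation 4.13 predicts.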


Mathematicians call this probability density ρ(x) dx the invariant measure on the attractor.27 To get the long-term average of any function A(x), one can use

⟨A⟩ = ∫ A(x) ρ(x) dx.  (4.15)

To a mathematician, a measure is a way of weighting different regions in doing integrals: precisely our ρ(x) dx. Notice that, for the case of an attracting fixed point, we would have ρ(x) = δ(x − x*).28

Cusps in the invariant density: At values of μ slightly smaller than one, our mapping has a rather complex invariant density.

(c) Find the invariant density (as described above) for μ = 0.9. Make your trajectory length Ncycles big enough and the bin size small enough to see the interesting structures. Notice that the attractor no longer fills the whole range (0, 1): locate roughly where the edges are. Notice also the cusps in ρ(x) at the edges of the attractor, and also at places inside the attractor (called boundaries, see reprint above). Locate some of the more prominent cusps.

Analytics of cusps: Notice that f′(1/2) = 0, so by equation 4.14 we know that ρ(f(x)) ≥ ρ(x)/|f′(x)| must have a singularity near x = 1/2: all the points near x = 1/2 are squeezed together and folded to one side by f. Further iterates of this singularity produce more cusps: the crease after one fold stays a crease after being further stretched and kneaded.

(d) Set μ = 0.9. Calculate f(1/2), f(f(1/2)), ... and compare these iterates to the locations of the edges and cusps from part (c). (You may wish to include them both on the same plot.)

Bifurcation Diagram: The evolution of the attractor and its invariant density as μ varies is plotted in the bifurcation diagram, which is shown for large μ in figure 4.4. One of the striking features in this plot are the sharp boundaries formed by the cusps.

Fig. 4.4 Bifurcation diagram in the chaotic region. Notice the boundary lines threading through the diagram, images of the crease formed by the folding at x = 1/2 in our map (see reprint above).

(e) Bifurcation Diagram: Plot the attractor (duplicating figure 4.4) as a function of μ, for 0.8 < μ < 1. (Pick regularly spaced δμ, run ntransient steps, record ncycles steps, and plot. After the routine is working, you should be able to push ntransient and ncycles both larger than 100, and δμ < 0.01.) On the same plot, for the same μs, plot the first eight images of x = 1/2, that is, f(1/2), f(f(1/2)), .... Are the boundaries you see just the cusps? What happens in the bifurcation diagram when two boundaries touch? (See the reprint above.)

27 There are actually many possible invariant measures on some attractors; this one is the SRB measure. [39]
28 The case of a fixed point then becomes mathematically a measure with a point mass at x*.
5 Free Energies and Ensembles

In the preceding chapters, we have in principle defined equilibrium statistical mechanics as the microcanonical ensemble average over the energy surface. All that remains is to calculate the properties of particular systems. Computing properties of large systems fixing the total energy, though, turns out to be quite awkward. One is almost always interested either in the properties of a small subsystem or in a system that is coupled in some way with the external world. The calculations are made far easier by using ensembles appropriate for subsystems of larger systems. In this chapter, we will introduce two such ensembles, the canonical ensemble1 and the grand canonical ensemble.

But first, let us motivate the introduction of these ensembles by introducing the concept of free energy.

1 Webster's canonical: reduced to the simplest or clearest schema possible. The canonical ensemble will be simpler to compute with than the microcanonical one.

5.1 Free Energy

A mass M hangs on the end of a spring of spring constant K and unstretched length h0, subject to a gravitational field of strength g. How far does the spring stretch? We have all solved innumerable statics problems of this sort in first-year mechanics courses. The spring stretches to a length h* where Mg = K(h* − h0), where the forces balance and the energy is minimized.

Fig. 5.1 A mass on a spring in equilibrium sits very close to the minimum of the energy.

What principle of physics is this? In physics, energy is conserved, not minimized! Shouldn't we be concluding that the mass will oscillate with a constant amplitude forever?

We have now come to the point in your physics education where we can finally explain why the mass appears to minimize energy. Here our subsystem (the mass and spring)2 is coupled to a very large number N of internal atomic or molecular degrees of freedom. The oscillation of the mass is coupled to these other degrees of freedom (friction) and will share its energy with them. The vibrations of the atoms are heat: the energy of the pendulum is dissipated by friction into heat. Indeed, since the spring potential energy is quadratic we can use the equipartition theorem from section 3.2.2: ⟨1/2 K(h − h*)²⟩ = 1/2 kB T. For a spring with K = 10 N/m at room temperature (kB T ≈ 4 × 10⁻²¹ J), √⟨(h − h*)²⟩ = √(kB T/K) ≈ 2 × 10⁻¹¹ m = 0.2 Å. The energy is indeed minimized up to tiny thermal fluctuations.

2 We think of the subsystem as being just the macroscopic configuration of mass and spring, and the atoms comprising them as being part of the environment, the rest of the system.

Is there a more general principle? Suppose our subsystem is more complex than one degree of freedom, say a piston filled with gas? The entropy of our subsystem can also be important. The second law of thermodynamics states that the entropy of the universe tends to a maximum. What function of the energy E and the entropy S for our system tends to a minimum, when it is coupled to the rest of the universe?

Let's start with a subsystem of energy Es and fixed volume and number of particles, weakly coupled to a world at temperature Tb. In section 5.2 we shall find that small subsystems of large systems in equilibrium are described by the canonical ensemble, and we shall define the Helmholtz free energy A(T, V, N) = E − TS.3 Can we see in advance that minimizing the Helmholtz free energy of the subsystem maximizes the entropy of the universe? If A = Es − Tb Ss(Es) then A is a minimum if its derivative

∂A/∂Es = 1 − Tb ∂Ss/∂Es  (5.1)

equals zero. The factor ∂Ss/∂Es is the inverse temperature of the subsystem 1/Ts, so

∂A/∂Es = 1 − Tb/Ts = 0  (5.2)

once the temperature of our system equals the bath temperature. But equating the temperatures maximizes the total entropy, since ∂(Ss(Es) + Sb(E − Es))/∂Es = 1/Ts − 1/Tb; so minimizing the Helmholtz free energy A of our subsystem maximizes the entropy of the universe as a whole.

3 It's called the free energy because it's the energy available to do work: to run a heat engine you'd need to send energy Q = TS into the cold bath to get rid of the entropy S, leaving A = E − TS free to do work (section 6.1).
Let's write out the Helmholtz free energy in more detail:

A(T, V, N) = E − T S(E, V, N).  (5.3)

The terms on the right-hand side of the equation involve four variables: T, V, N, and E. Why is A only a function of three? These functions are only meaningful for systems in equilibrium. In equilibrium, we have just shown in equations 5.1 and 5.2 that ∂A/∂E = 0 for all values of the four variables. Hence A is independent of E. This is an example of a Legendre transformation. Legendre transformations allow one to change from one type of energy or free energy4 to another, by changing from one set of independent variables (here E, V, and N) to another (T, V, and N).

We introduced in problem 3.5 the thermodynamics nomenclature (equation 3.67)

dE = T dS − P dV + μ dN,  (5.4)

which basically asserts that E(S, V, N) satisfies

∂E/∂S = T,  (5.5)
∂E/∂V = −P, and
∂E/∂N = μ.

4 The entropy, energy, and various free energies are also called thermodynamic potentials.

The corresponding equation for the Helmholtz free energy A(T, V, N) is

dA = d(E − TS) = dE − T dS − S dT = −S dT − P dV + μ dN,  (5.6)

which satisfies

∂A/∂T = −S,  (5.7)
∂A/∂V = −P, and
∂A/∂N = μ.

Similarly, systems at constant temperature and pressure (for example, most biological and chemical systems, see exercise 5.8) minimize the Gibbs free energy

G(T, P, N) = E − TS + PV = Es − Tb Ss + Pb Vs,  (5.8)

dG = d(A + PV) = −S dT + V dP + μ dN.  (5.9)

Systems at constant energy and pressure minimize the enthalpy H = E + PV. If we allow the number of particles N in our system to fluctuate, we will find four more free energies involving the chemical potential μ. The only one we will discuss will be the grand free energy

Φ(T, V, μ) = E − TS − μN  (5.10)

at constant temperature, volume, and chemical potential, with

dΦ = d(A − μN) = −S dT − P dV − N dμ.  (5.11)

The grand free energy will arise naturally in the grand canonical ensemble discussed in section 5.4.
Enough generalities and thermodynamics for now. Let's turn back to statistical mechanics, and derive the ensemble appropriate for subsystems.

5.2 The Canonical Ensemble

Consider a closed system of total energy E composed of a small part (the subsystem) in a particular state and a much larger heat bath5 with entropy SHB(EHB) = kB log(ΩHB(EHB)). We assume the two parts can exchange energy, but are weakly coupled (section 3.3). For now, we assume they cannot exchange volume or particles.

5 It's called the heat bath because it will act as a source and sink for heat energy for our subsystem.

We are interested in how much the probability of our subsystem being in a particular state depends upon its energy. Consider two states of the subsystem s1 and s2 with energies E1 and E2. As we discussed in deriving equation 3.23 (see note 26), the probability that our subsystem

will be in the particular6 state si is proportional to the density of states of our heat bath at E − Ei:

ρ(si) ∝ ΩHB(E − Ei) = exp(SHB(E − Ei)/kB),  (5.12)

since it gets a share of the microcanonical probability for each heat-bath partner it can coexist with at the fixed total energy.

Now, we are assuming that the bath is much larger than the subsystem. We can therefore assume that the inverse temperature 1/THB = ∂SHB/∂E of the heat bath is constant in the range (E − E1, E − E2), so

ρ(s2)/ρ(s1) = ΩHB(E − E2)/ΩHB(E − E1)
 = e^((SHB(E−E2) − SHB(E−E1))/kB) = e^((E1−E2)(∂SHB/∂E)/kB)
 = e^((E1−E2)/kB THB).  (5.13)

This is the general derivation of the Boltzmann or Gibbs distribution: the probability Pn of a particular state of a subsystem of energy En is

Pn ∝ exp(−En/kB T).  (5.14)

We know that the probability is normalized, Σn Pn = 1, so

Pn = e^(−En/kB T) / Σm e^(−Em/kB T) = exp(−En/kB T)/Z,  (5.15)

where the normalization factor

Z(T, N, V) = Σm e^(−Em/kB T) = Σn exp(−βEn)  (5.16)

is the partition function, and where we also introduce the convenient symbol β = 1/kB T, which we'll call the inverse temperature (verbally ignoring the factor kB).

Fig. 5.2 An isolated system is composed of a small subsystem in state si (with energy Ei) surrounded by a larger region we call the heat bath, with ΩHB(EHB) = exp(SHB(EHB)/kB). The total system is isolated from the outside world; the subsystem and heat bath here each have fixed volume, but can exchange energy. The energy of interaction between the subsystem and bath is assumed negligible. The total energy is E, the energy of the subsystem is Ei, and the energy of the bath is EHB = E − Ei.

Equation 5.15 is the definition of the canonical ensemble, appropriate for calculating properties of systems coupled energetically to an external world with temperature T. The partition function Z is simply the normalization factor that keeps the total probability summing to one. It may surprise you to discover that this normalization factor plays a central role in most calculations: a typical application involves finding a microscopic way of calculating Z, and then using Z to calculate everything else of interest. Let's see how this works by using Z to calculate several important quantities.

Internal energy. Let's calculate the average internal energy of our subsystem ⟨E⟩, where the angle brackets represent canonical averages:7

⟨E⟩ = Σn En Pn = (Σn En e^(−βEn))/Z = −(∂Z/∂β)/Z = −∂ log Z/∂β.  (5.17)

6 In this section, it is convenient to assume the energy states are discrete. Hence we will talk about probabilities rather than probability densities, and Σn rather than ∫ dP dQ. Discrete states will be the norm both in quantum systems and for our later work on Ising and other model systems. No complications arise from translating the equations in this section back into integrals over probability density.
7 Throughout this text, all logs are natural logarithms, loge and not log10.
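Equations 5.15–5.17 are easy to check numerically for any finite spectrum. Here is a minimal sketch for a hypothetical three-level system; the energies, the temperature, and the function names are all invented for illustration (units with kB = 1):

```python
import math

def Z(energies, beta):
    """Partition function of equation 5.16 (units with kB = 1)."""
    return sum(math.exp(-beta * E) for E in energies)

def avg_E(energies, beta):
    """Canonical average <E> = sum_n E_n P_n, equation 5.17."""
    z = Z(energies, beta)
    return sum(E * math.exp(-beta * E) for E in energies) / z

levels = [0.0, 1.0, 2.5]   # hypothetical three-level spectrum
beta = 0.7

# Check <E> = -d(log Z)/d(beta) with a centered finite difference:
h = 1e-6
numeric = -(math.log(Z(levels, beta + h)) - math.log(Z(levels, beta - h))) / (2 * h)
print(avg_E(levels, beta), numeric)   # the two expressions agree
```

The same few lines work for any list of energy levels, which is all that the derivation above requires.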

Specific Heat. Let's call cv the specific heat per particle at constant volume.8 Then

N cv = ∂⟨E⟩/∂T = (dβ/dT) ∂⟨E⟩/∂β = −(1/kB T²) ∂/∂β [Σn En e^(−βEn) / Σn e^(−βEn)]  (5.18)
 = −(1/kB T²) [−Σn En² e^(−βEn)/Z + (Σn En e^(−βEn))²/Z²]
 = (⟨E²⟩ − ⟨E⟩²)/kB T²
 = σE²/kB T²,

where σE is the root-mean-square fluctuation9 in the energy of our system at constant temperature. Thus the energy fluctuations per particle,10

σE/N = √(⟨E²⟩ − ⟨E⟩²)/N = √((kB T)(cv T)/N),  (5.20)

become small for large numbers of particles N, as we showed from thermodynamics in section 3.3. For macroscopic systems, the behavior in most regards is the same whether the system is completely isolated (microcanonical) or in thermal contact with the rest of the world (canonical).

Notice that we've derived a formula relating a macroscopic susceptibility (cv, how much the energy changes when the temperature is perturbed) to a microscopic fluctuation (σE, the energy fluctuation in thermal equilibrium). In general, fluctuations can be related to responses in this fashion. These relations are extremely useful, for example, in extracting susceptibilities from numerical simulations. No need to make small changes and try to measure the response: just watch it fluctuate.

8 The specific heat is the energy needed to increase the temperature by one unit. Also, dβ/dT = d(1/kB T)/dT = −1/(kB T²).
9 We've used the standard trick ⟨(E − ⟨E⟩)²⟩ = ⟨E²⟩ − 2⟨E⟩⟨E⟩ + ⟨E⟩² = ⟨E²⟩ − ⟨E⟩², since ⟨E⟩ is just a constant that can be pulled out of the ensemble average.
10 kB T is two-thirds the equipartition kinetic energy per particle. cv T is the energy per particle that it would take to warm the system up from absolute zero, if the specific heat were constant for all temperatures. The fluctuation in the energy per particle of a macroscopic system is the geometric mean of these two divided by √N ∼ 10¹².
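The fluctuation–response relation N cv = (⟨E²⟩ − ⟨E⟩²)/kB T² can be watched in action the same way, again for a made-up discrete spectrum (our choice of levels and temperature; kB = 1):

```python
import math

def moments(energies, beta):
    """Return (<E>, <E^2>) in the canonical ensemble (kB = 1)."""
    weights = [math.exp(-beta * E) for E in energies]
    z = sum(weights)
    e1 = sum(E * w for E, w in zip(energies, weights)) / z
    e2 = sum(E * E * w for E, w in zip(energies, weights)) / z
    return e1, e2

levels = [0.0, 0.3, 1.1, 2.0]   # hypothetical spectrum
T = 1.5

e1, e2 = moments(levels, 1.0 / T)
variance = e2 - e1 ** 2          # equilibrium energy fluctuation <E^2> - <E>^2

# Heat capacity as the response d<E>/dT, by a centered finite difference:
h = 1e-5
dE_dT = (moments(levels, 1 / (T + h))[0] - moments(levels, 1 / (T - h))[0]) / (2 * h)

print(dE_dT, variance / T ** 2)  # equal: watching fluctuations measures the response
```

No small perturbation of the temperature is needed in a simulation: the equilibrium variance already contains the susceptibility.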
Entropy. Using the general formula for the entropy (equation 6.22),

S = −kB Σn Pn log Pn = −kB Σn (exp(−βEn)/Z) log(exp(−βEn)/Z)
 = −kB Σn (exp(−βEn)/Z)(−βEn − log Z)
 = kB β⟨E⟩ + kB log Z = ⟨E⟩/T + kB log Z.  (5.21)

Thus in particular

−kB T log Z = ⟨E⟩ − TS = A(T, V, N);  (5.22)

the Helmholtz free energy A = E − TS we introduced in section 5.1 is −kB T times the log of the partition function. The relationship between the partition function Z and the Helmholtz free energy A = −kB T log Z in the canonical ensemble is quite analogous to the relation between the energy-shell volume Ω and the entropy S = kB log Ω in the microcanonical

ensemble. We also note, using equations 5.17 and 5.22, that

∂A/∂T|N,V = −∂(kB T log Z)/∂T = −kB log Z − kB T (∂ log Z/∂β)(dβ/dT)
 = −kB log Z − kB T (−⟨E⟩)(−1/kB T²) = −kB log Z − ⟨E⟩/T
 = −S,  (5.23)

just as we saw from thermodynamics (equation 5.7).
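Equations 5.21–5.23 can be verified with the same kind of toy spectrum: the entropy computed from the probabilities matches both ⟨E⟩/T + kB log Z and −∂A/∂T. A minimal sketch (invented levels and temperature, kB = 1):

```python
import math

levels = [0.0, 0.5, 1.3]   # hypothetical energy levels, kB = 1

def helmholtz(T):
    """A = -T log Z, equation 5.22 (kB = 1)."""
    return -T * math.log(sum(math.exp(-E / T) for E in levels))

def entropy(T):
    """S = -sum_n P_n log P_n, equation 5.21 (kB = 1)."""
    z = sum(math.exp(-E / T) for E in levels)
    ps = [math.exp(-E / T) / z for E in levels]
    return -sum(p * math.log(p) for p in ps)

T = 0.8
h = 1e-6
dA_dT = (helmholtz(T + h) - helmholtz(T - h)) / (2 * h)
print(entropy(T), -dA_dT)   # the two agree, as in equation 5.23
```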

5.3 Non-Interacting Canonical Distributions

Suppose we have a system with two weakly interacting subsystems L and R, both connected to a heat bath at inverse temperature β. The states for the whole system are pairs of states (siL, sjR) from the two subsystems, with energies EiL and EjR respectively. The partition function for the whole system is

Z = Σij exp(−β(EiL + EjR)) = Σij e^(−βEiL) e^(−βEjR)
 = (Σi e^(−βEiL))(Σj e^(−βEjR))
 = Z^L Z^R.  (5.24)

Thus in the canonical ensemble of non-interacting systems the partition function factors. The Helmholtz free energy adds,

A = −kB T log Z = −kB T log(Z^L Z^R) = A^L + A^R,  (5.25)

as does the entropy, average energy, and other properties that one would expect to scale with the size of the system.11

This is much simpler than the same calculation would be in the microcanonical ensemble! In the microcanonical ensemble, each subsystem competes with the other for the available total energy. Even though the two subsystems are uncoupled (the energy of one is independent of the state of the other), the microcanonical ensemble intermingles them in the calculation. By allowing each to draw energy from a large heat bath, uncoupled subsystems become independent calculations in the canonical ensemble.

We can now immediately do several important cases of non-interacting systems.

11 These properties are termed extensive, as opposed to intensive properties like pressure, temperature, and chemical potential.
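The factorization of equations 5.24–5.25 can be seen directly by brute force, summing over all pair states of two invented level sets (names and numbers are ours):

```python
import math

def Z(levels, beta):
    # Canonical partition function (kB = 1).
    return sum(math.exp(-beta * E) for E in levels)

left = [0.0, 1.0]            # hypothetical subsystem L
right = [0.0, 0.4, 0.9]      # hypothetical subsystem R
beta = 1.2

# Pair states (s_L, s_R) have energy E_L + E_R; sum over all pairs:
Z_total = sum(math.exp(-beta * (EL + ER)) for EL in left for ER in right)

print(Z_total, Z(left, beta) * Z(right, beta))   # identical: Z = Z^L Z^R
```

The double sum over pairs collapses into a product of single sums, which is exactly why uncoupled subsystems become independent calculations.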
Ideal Gas. The partition function for the monatomic ideal gas of classical distinguishable particles of mass m in a cubical box of volume V = L³ factors into a product over each degree of freedom α:

Z_ideal^dist = (1/h^3N) ∏(α=1 to 3N) ∫₀^L dqα ∫ dpα e^(−βpα²/2m) = (L √(2πm/β)/h)^3N
 = (L √(2πm kB T)/h)^3N = (L/λ)^3N,  (5.26)

where

λ = h/√(2πm kB T) = √(2πℏ²/m kB T)

is the thermal de Broglie wavelength.

The internal energy is

⟨E⟩ = −∂ log Z_ideal/∂β = (3N/2) ∂ log β/∂β = 3N/2β = 3N kB T/2,  (5.27)

giving us the equipartition theorem without our needing to find volumes of spheres in 3N dimensions.

For N indistinguishable particles, we have counted each real configuration N! times for the different permutations of particles, so we must divide Z_ideal^dist by N! just as we did for the phase-space volume in section 3.5:

Z_ideal^indist = (L/λ)^3N / N!.  (5.28)

This doesn't change the internal energy, but does change the Helmholtz free energy:

A_ideal^indist = −kB T log[(L/λ)^3N / N!]
 = −N kB T log(V/λ³) + kB T log(N!)
 ≈ −N kB T log(V/λ³) + kB T (N log N − N)
 = −N kB T [log(V/N λ³) + 1]
 = N kB T [log(ρλ³) − 1],  (5.29)

where ρ = N/V is the average density, and we've used Stirling's formula log(N!) ≈ N log N − N.
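As a sanity check on equation 5.26, the thermal de Broglie wavelength for a real monatomic gas is tiny. The sketch below uses argon at room temperature (our choice of gas and temperature), with constants rounded to four figures:

```python
import math

h = 6.626e-34    # Planck's constant, J s
kB = 1.381e-23   # Boltzmann's constant, J/K
m = 6.63e-26     # mass of one argon atom, kg
T = 300.0        # room temperature, K

# Thermal de Broglie wavelength, lambda = h / sqrt(2 pi m kB T):
lam = h / math.sqrt(2 * math.pi * m * kB * T)
print(lam)   # ~1.6e-11 m, i.e. ~0.16 Angstrom
```

The result is far smaller than the interatomic spacing, which is why the classical treatment of the gas is a good approximation at room temperature.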
Classical Harmonic Oscillator and the Equipartition Theorem. A harmonic oscillator of mass m and frequency ωk has a total energy

H(pk, qk) = pk²/2m + mωk² qk²/2.  (5.30)

The partition function for one such oscillator is

Zk = (1/h) ∫ dqk ∫ dpk e^(−β(pk²/2m + mωk² qk²/2)) = (1/h) √(2πm/β) √(2π/βmωk²)
 = 2π/hβωk = 1/βℏωk.  (5.31)

Hence the Helmholtz free energy for the classical oscillator is

Ak(T) = −kB T log Zk = kB T log(ℏωk/kB T),  (5.32)

the internal energy is

⟨Ek⟩(T) = −∂ log Zk/∂β = ∂(log β + log ℏωk)/∂β = 1/β = kB T,  (5.33)

and of course cv = ∂⟨E⟩/∂T = kB. This establishes the equipartition theorem (from section 3.2.2) for systems with harmonic potential energies as well as quadratic kinetic energies: the internal energy is 1/2 kB T

per degree of freedom, where our harmonic oscillator has two degrees of freedom (pk and qk).

Harmonic oscillators are important for the specific heat of molecules and solids. At temperatures low compared to the melting point, a solid or molecule with an arbitrary many-body interaction potential V(Q) typically only makes small excursions about the minimum Q0 of the potential. We expand about this minimum, giving us12

V(Q) ≈ V(Q0) + (Q − Q0)α ∂αV + 1/2 (Q − Q0)α (Q − Q0)β ∂α∂βV + ....  (5.34)

Since the potential is a minimum at Q0, the gradient of the potential must be zero, so the second term on the right-hand side must vanish. The third term is a big 3N × 3N quadratic form, which we may diagonalize by converting to normal modes qk.13 In terms of these normal modes, the Hamiltonian is a set of uncoupled harmonic oscillators:

H = Σk (pk²/2m + mωk² qk²/2).  (5.35)

At high enough temperatures that quantum mechanics can be ignored,14 we can then use equation 5.31 to find the total partition function for our harmonic system:

Z = ∏k Zk = ∏k 1/βℏωk.  (5.36)

12 We use the Einstein summation convention, summing over the repeated indices α and β, and the convention ∂α = ∂/∂Qα.
13 If the masses of the atoms are not all the same, one must change coordinates, rescaling the components of Q − Q0 by the square root of the mass.
14 In section 7.2 we'll do the quantum harmonic oscillator, which then gives the entire statistical mechanics of atomic vibrations well below the melting or disassociation point.
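Equation 5.31 (used again in equation 5.36) can be checked by doing the phase-space integral numerically. A sketch in units where m = ω = kB T = 1 and h = 2π (so ℏ = 1 and Z should come out to 1/βℏω = 1); the grid parameters are arbitrary choices:

```python
import math

beta, m, omega, h = 1.0, 1.0, 1.0, 2 * math.pi   # units with hbar = 1

def boltzmann(p, q):
    # e^{-beta H} for H = p^2/2m + m omega^2 q^2 / 2, equation 5.30.
    return math.exp(-beta * (p * p / (2 * m) + m * omega ** 2 * q * q / 2))

# Riemann sum on a grid wide enough to capture the Gaussian tails.
dx = 0.01
grid = [-8.0 + i * dx for i in range(1601)]
Zq = sum(boltzmann(0.0, q) for q in grid) * dx   # configurational Gaussian
Zp = sum(boltzmann(p, 0.0) for p in grid) * dx   # momentum Gaussian
Z = Zq * Zp / h

hbar = h / (2 * math.pi)
print(Z, 1 / (beta * hbar * omega))   # both ~1, confirming Z = 1/(beta hbar omega)
```

Each Gaussian factor contributes √(2π), and dividing by h = 2π leaves exactly 1/βℏω.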

Classical Kinetic Energies. One will notice both for the ideal gas and for the harmonic oscillator that each component of the momentum contributed a factor √(2πm/β). As we promised in section 3.2.2, this will happen in any classical system where the kinetic energy is of the standard form15 K(P) = Σα pα²/2mα, since each component of the momentum is thus uncoupled from the rest of the system. Thus the partition function for any classical interacting system of non-magnetic particles will be some configurational piece times ∏α √(2πmα/β). This implies that the velocity distribution is always Maxwellian [1.2] independent of what configuration the positions have.16

15 Not all Hamiltonians have this form. For example, charged particles in magnetic fields will have terms that couple momenta and positions.
16 This may be counterintuitive: an atom crossing a barrier has the same velocity distribution as it had in the bottom of the well. It does need to borrow some energy from the rest of the system. The canonical distribution works precisely when the system is large, so that the resulting temperature shift may be neglected.

5.4 Grand Canonical Ensemble

The canonical ensemble allows one to decouple the calculations of subsystems with fixed volume and particle number, but which can exchange energy. There are occasions when one would like to decouple the calculations of problems with varying particle number.
Consider a subsystem in a state s with energy Es and number Ns , in
a system with total energy E and number N . By analogy with equa-
tion 5.14 the probability density that the system will be in state s is
To be pub. Oxford UP, Fall05
5.4 Grand Canonical Ensemble 73

proportional to

(s) HB (E Es , N Ns ) (5.37)
= exp ((SHB (E Es , N Ns )) /kB )
= exp Es Ns /kB
= exp (Es /kB T + Ns /kB T )
= exp ((Es Ns )/kB T ) ,

where

    μ = −T ∂S/∂N                                                 (5.38)

is the chemical potential. Notice the factor of T: this converts the
entropy change into an energy change, so the chemical potential is the
energy gain per particle for accepting particles from the bath. At low
temperatures the subsystem will fill with particles until the energy for
the next particle reaches μ.

Fig. 5.3 An isolated system composed of a small system and a heat bath that can
exchange both energy and particles (porous boundary). Both exchanges are weak,
so that the states of the two subsystems are assumed independent of one another.
The system is in state s_i, with energy E_i and N_i particles; the total energy is E
and total number is N. The probability of the subsystem being in state s_i is
proportional to Ω_HB(E − E_i, N − N_i).

Again, just as for the canonical ensemble, there is a normalization
factor called the grand partition function

    Ξ(T, V, μ) = Σ_n e^(−(E_n − μN_n)/k_B T);                    (5.39)

the probability density of state s_i is ρ(s_i) = e^(−(E_i − μN_i)/k_B T)/Ξ. There
is a grand free energy

    Φ(T, V, μ) = −k_B T log(Ξ) = ⟨E⟩ − TS − μN                   (5.40)

analogous to the Helmholtz free energy A(T, V, N). In problem 5.4 you
shall derive the Euler relation E = TS − PV + μN, and hence show that
Φ(T, μ, V) = −PV.
Partial Traces. Let us note in passing that we can write the grand
canonical partition function as a sum over canonical partition functions.
Let us separate the sum over states n of our system into a double sum:
an inner restricted sum¹⁷ over the states s with a fixed number of
particles M, and an outer sum over M. Letting E_{s,M} be the energy of
state s at particle number M,

    Ξ(T, V, μ) = Σ_M Σ_s e^(−(E_{s,M} − μM)/k_B T)
               = Σ_M e^(μM/k_B T) Σ_s e^(−E_{s,M}/k_B T)
               = Σ_M Z(T, V, M) e^(μM/k_B T)
               = Σ_M e^(−(A(T,V,M) − μM)/k_B T).                 (5.41)

17 This restricted sum is said to integrate over the internal degrees of freedom at
fixed particle number. The process is often called a partial trace, a nomenclature
stemming from quantum mechanics.

Notice that the Helmholtz free energy in the last equation plays exactly
the same role as the energy plays in equation 5.39: exp(−E_m/k_B T)
c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity
is the probability of the system having a particular state m, while
exp(−A(T, V, M)/k_B T) is the probability of the system having any state
with M particles.¹⁸ This type of partial summation is a basic tool
that we will explore further in chapter X: it allows one to write effec-
tive coarse-grained free energies, averaging over many microstates which
share the same macroscopic configuration of interest.

18 This is why we defined A = −k_B T log Z, so that Z = exp(−A/k_B T). This is the
deep reason why the normalization factors like Z are so central to statistical
mechanics: the statistical weight of a set of states with a common property Y is
given by the partial trace Z(Y) = exp(−F(Y)/k_B T), giving us an effective
Hamiltonian, or free energy, F(Y) describing the coarse-grained system.

Using the grand canonical ensemble. The grand canonical en-
semble is particularly useful for non-interacting quantum systems, which
we'll see in chapter 7. There each energy eigenstate can be thought of
as a separate subsystem, independent of the others except for the com-
petition between eigenstates for the particle number.
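Equation 5.41 is easy to verify by machine for a small system: sum the grand-canonical weights over all states directly, then rebuild Ξ from the canonical Z(T, V, M) at each fixed particle number. A minimal sketch in Python; the three-level, fermion-like toy system and its parameters are invented purely for illustration:

```python
import math
from itertools import combinations

# Toy system: three single-particle levels, each holding 0 or 1 particle
# (a fermion-like toy; the energies, kT, and mu are invented choices).
eps = [0.0, 1.0, 2.5]
kT, mu = 1.3, 0.4

# Every many-body state: a particle number M and a total energy E
states = [(M, sum(eps[i] for i in occ))
          for M in range(len(eps) + 1)
          for occ in combinations(range(len(eps)), M)]

# Grand partition function directly, as in equation 5.39
Xi_direct = sum(math.exp(-(E - mu * M) / kT) for M, E in states)

# Partial trace: canonical Z(M) at fixed M, then the outer sum of equation 5.41
def Z(M):
    return sum(math.exp(-E / kT) for m, E in states if m == M)

Xi_partial = sum(Z(M) * math.exp(mu * M / kT) for M in range(len(eps) + 1))
print(abs(Xi_direct - Xi_partial) < 1e-12)  # the two sums agree: True
```

The identity holds term by term, so the agreement is to rounding error; the same partial-trace bookkeeping works for any toy spectrum one substitutes for eps.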
For now, let's see how the grand canonical ensemble works for the
problem of number fluctuations. In general,

    ⟨N⟩ = Σ_m N_m e^(−(E_m − μN_m)/k_B T) / Σ_m e^(−(E_m − μN_m)/k_B T)
        = (k_B T/Ξ) ∂Ξ/∂μ = −∂Φ/∂μ.                              (5.42)

Just as the fluctuations in the energy were related to the specific heat
(the rate of change of energy with temperature, section 5.2), the number
fluctuations are related to the rate of change of particle number with
chemical potential:

    ∂⟨N⟩/∂μ = ∂/∂μ [Σ_m N_m e^(−(E_m − μN_m)/k_B T) / Ξ]
            = −(1/Ξ²)(∂Ξ/∂μ) Σ_m N_m e^(−(E_m − μN_m)/k_B T)
              + (1/Ξ) Σ_m (N_m²/k_B T) e^(−(E_m − μN_m)/k_B T)
            = (⟨N²⟩ − ⟨N⟩²)/k_B T = ⟨(N − ⟨N⟩)²⟩/k_B T.          (5.43)
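Equation 5.43 can be checked numerically: compute ⟨N⟩ and ⟨N²⟩ from the grand-canonical weights, and compare ⟨N²⟩ − ⟨N⟩² with k_B T ∂⟨N⟩/∂μ estimated by a central difference. A sketch, with an invented fermion-like single-particle spectrum:

```python
import math
from itertools import combinations

eps = [0.0, 0.7, 1.1, 2.0]   # invented single-particle energies (fermion-like)
kT = 0.9

def moments(mu):
    """<N> and <N^2> from the grand-canonical weights exp(-(E - mu N)/kT)."""
    w = [(M, math.exp(-(sum(eps[i] for i in occ) - mu * M) / kT))
         for M in range(len(eps) + 1)
         for occ in combinations(range(len(eps)), M)]
    Xi = sum(p for _, p in w)
    return (sum(M * p for M, p in w) / Xi,
            sum(M * M * p for M, p in w) / Xi)

mu, h = 0.5, 1e-5
N1, N2 = moments(mu)
variance = N2 - N1 ** 2                               # <(N - <N>)^2>
dN_dmu = (moments(mu + h)[0] - moments(mu - h)[0]) / (2 * h)
print(abs(variance - kT * dN_dmu) < 1e-6)             # equation 5.43: True
```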
(5.1) Two-state system. (Basic)
Consider the statistical mechanics of a tiny object with only two discrete
states:¹⁹ one of energy E₁ and the other of higher energy E₂ > E₁.
(a) Boltzmann probability ratio. Find the ratio of the equilibrium
probabilities ρ₂/ρ₁ to find our system in the two states, when weakly
coupled to a heat bath of temperature T. What is the limiting probability
as T → ∞? As T → 0? Related formula: Boltzmann probability
ρ = exp(−E/kT)/Z(T) ∝ exp(−E/kT).
(b) Probabilities and averages. Use the normalization of the probability
distribution (the system must be in one or the other state) to find ρ₁ and
ρ₂ separately. (That is, solve for Z(T) in the related formula for part (a).)
What is the average value of the energy E?

19 Visualize this as a tiny biased coin, which can be in the heads or tails state but
has no other internal vibrations or center of mass degrees of freedom. Many systems
are well described by large numbers of these two-state systems: some paramagnets,
carbon monoxide on surfaces, glasses at low temperatures, ...

(5.2) Barrier Crossing. (Basic, Chemistry)
In this problem, we will derive the Arrhenius law

    Γ = Γ₀ exp(−E/k_B T)                                         (5.44)

giving the rate at which systems cross energy barriers. This law governs
not only chemical reaction rates, but many macroscopic rates like
diffusion constants in solids and nucleation rates (section 12.2) that
depend on microscopic thermal activation over barriers.²⁰
The important exponential dependence on the barrier height E is easy
to explain: it is the relative Boltzmann probability that a particle is near
the top of the barrier (and hence able to escape). Here we will do a
relatively careful job of calculating the prefactor Γ₀.
Consider a system described by a coordinate X, with an energy U(X)
with a minimum at X₀ with energy zero and an energy barrier at X_B
with energy U(X_B) = B.²¹ Let the temperature of the system be much
smaller than B/k_B. To do our calculation, we will make some approx-
imations. (1) We assume that the atoms escaping across the barrier to
the right do not scatter back into the well. (2) We assume that the atoms
deep inside the well are in equilibrium. (3) We assume that the particles
crossing to the right across the barrier are given by the equilibrium
distribution inside the well.

Fig. 5.4 Barrier Crossing Potential. A schematic of how many atoms are at each
position. (Actually, of course, the atoms are scattered at different X, not at different
heights in energy.) [The potential has its minimum at X₀ and a barrier of height B
at X_B.]

(a) Let the probability that a particle has position X be ρ(X). What
is the ratio of probability densities ρ(X_B)/ρ(X₀) if the particles near
the top of the barrier are assumed in equilibrium with those deep inside
the well? Related formula: Boltzmann distribution ρ ∝ exp(−E/k_B T).
If the barrier height B ≫ k_B T, then most of the particles in the well
stay near the bottom of the well. Often, the potential near the bottom
is accurately described by a quadratic approximation
U(X) ≈ ½ M ω² (X − X₀)², where M is the mass of our system and ω
is the frequency of small oscillations in the well.

Fig. 5.5 Well Probability Distribution. The approximate probability distribution
for the atoms still trapped inside the well.

(b) In this approximation, what is the probability density ρ(X) near
the bottom of the well? (See figure 5.5.) What is ρ(X₀), the probability
density of having the system at the bottom of the well? Related formula:
Gaussian probability distribution (1/√(2πσ²)) exp(−x²/2σ²).
Hint: Make sure you keep track of the 2πs.

20 There are basically three ways in which slow processes arise in physics. (1) Large
systems can respond slowly to external changes because communication from one end
of the system to the other is sluggish: examples are the slow decay at long wavelengths
in the diffusion equation 2.2 and Goldstone modes 9.3. (2) Systems like radioactive
nuclei can respond slowly, decaying with lifetimes of billions of years, because
of the slow rate of quantum tunneling through barriers. (3) Systems can be slow
because they must thermally activate over barriers (with the Arrhenius rate 5.44).
21 This potential could describe a chemical reaction, with X being a reaction coor-
dinate. It could describe the escape of gas from a moon of Jupiter, with X being the
distance from the moon in Jupiter's direction.
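The sums in exercise 5.1 are short enough to check by machine: a two-term Z(T) gives the Boltzmann ratio, the normalized probabilities, and the average energy, along with both temperature limits. A sketch with arbitrarily chosen energies:

```python
import math

def two_state(E1, E2, kT):
    """Probabilities and average energy for a two-state system at temperature kT."""
    Z = math.exp(-E1 / kT) + math.exp(-E2 / kT)
    rho1, rho2 = math.exp(-E1 / kT) / Z, math.exp(-E2 / kT) / Z
    return rho1, rho2, rho1 * E1 + rho2 * E2

E1, E2 = 0.0, 1.0    # arbitrary energies with E2 > E1
rho1, rho2, Ebar = two_state(E1, E2, kT=1.0)
print(abs(rho2 / rho1 - math.exp(-(E2 - E1) / 1.0)) < 1e-12)  # Boltzmann ratio: True

# Limits: as T -> infinity the two states become equally likely;
# as T -> 0 only the ground state E1 survives.
print(abs(two_state(E1, E2, kT=1e8)[1] - 0.5) < 1e-8)   # True
print(two_state(E1, E2, kT=1e-2)[1] < 1e-40)            # True
```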
Fig. 5.6 Crossing the Barrier. The range of positions for which atoms moving to
the right with velocity v will cross the barrier top in time Δt. [The figure shades a
slab of width v Δt just to the left of the barrier top.]

Knowing the answers from (a) and (b), we know the probability density
ρ(X_B) at the top of the barrier. We need to also know the probability
that particles near the top of the barrier have velocity V, because the
faster-moving parts of the distribution of velocities contribute more to
the flux of probability over the barrier (see figure 5.6). As usual, because
the total energy is the sum of the kinetic and potential energy, the total
Boltzmann probability factors: in equilibrium the particles will always
have a velocity probability distribution
ρ(V) = (1/√(2πk_B T/M)) exp(−½ M V²/k_B T).

(c) First give a formula for the decay rate Γ (the probability per unit
time that our system crosses the barrier towards the right), for an
unknown probability density ρ(X_B)ρ(V) as an integral over the velocity
V. Then, using your formulas from parts (a) and (b), give your estimate
of the decay rate for our system. Related formula:
∫₀^∞ x exp(−x²/2σ²) dx = σ².

How could we go beyond this simple calculation? In the olden days,
Kramers studied other one-dimensional models, changing the ways in
which the system was coupled to the external bath. On the computer,
one can avoid a separate heat bath and directly work with the full mul-
tidimensional configuration space, leading to transition state theory.
The transition-state theory formula is very similar to the one you derived
in part (c), except that the prefactor involves the product of all the
frequencies at the bottom of the well and all the positive frequencies
at the saddlepoint at the top of the barrier. (See reference [42].) Other
generalizations arise when crossing multiple barriers [45] or in
nonequilibrium systems [63].

(5.3) Statistical Mechanics and Statistics. (Mathematics)
Consider the problem of fitting a theoretical model to experimentally
determined data. Let our model M predict a time-dependent function
y^(M)(t). Let there be N experimentally determined data points y_i at
times t_i with errors of standard deviation σ. We assume that the
experimental errors for the data points are independent and Gaussian
distributed, so that the probability that our model actually generated
the observed data points (the probability P(D|M) of the data given the
model) is

    P(D|M) = Π_{i=1}^N (1/√(2πσ²)) exp(−(y^(M)(t_i) − y_i)²/(2σ²)).  (5.45)

(a) True or false: This probability density corresponds to a Boltzmann
distribution with energy H and temperature T, with
H = Σ_{i=1}^N (y^(M)(t_i) − y_i)²/2 and k_B T = σ².

There are two schools of statistics. Among a family of models, the
frequentists will pick the model M with the largest value of P(D|M).
The Bayesians take a different point of view. They argue that there is
no reason to believe that all models have the same likelihood.²² Suppose
the intrinsic probability of the model (the prior) is P(M). They use the
simple theorem

    P(M|D) = P(D|M)P(M)/P(D) = P(D|M)P(M)                        (5.46)

where the last step notes that the probability that you measured the
known data D is presumably one.

The Bayesians often will pick the maximum of P(M|D) as their model
for the experimental data. But, given their perspective, it's even more
natural to consider the entire ensemble of models, weighted by P(M|D),
as the best description of the data. This ensemble average then nat-
urally provides error bars as well as predictions for various quantities.

Consider the simple problem of fitting a line to two data points. Suppose
the experimental data points are at t₁ = 0, y₁ = 1 and t₂ = 1, y₂ = 2,
where both y-values have uncorrelated Gaussian errors with standard
deviation σ = 1/2, as assumed in equation 5.45 above. Our model
M(m, b) is y(t) = mt + b. Our Bayesian statistician knows that m and
b both lie between zero and two, and assumes that the probability
density is otherwise uniform: P(m, b) = 1/4 for 0 < m < 2 and 0 < b < 2.

(b) Which of the contour plots in figure 5.7 accurately represent the
probability distribution P(M|D) for the model, given the observed data?
(The spacing between the contour lines is arbitrary.)

22 There is no analogue of Liouville's theorem (chapter 4) in model space.
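Assembling the pieces of exercise 5.2, the Gaussian well density of part (b), the Boltzmann suppression of part (a), and the velocity flux integral of part (c), reproduces the standard one-dimensional result Γ = (ω/2π) exp(−B/k_B T). A numerical sketch, with arbitrary parameters in units where k_B = 1 and the velocity integral done by midpoint quadrature:

```python
import math

# Arbitrary illustrative parameters, in units with k_B = 1
M, omega, B, kT = 2.0, 3.0, 5.0, 0.5

# Part (b): Gaussian density at the well bottom, sigma^2 = kT/(M omega^2)
rho_X0 = math.sqrt(M * omega**2 / (2 * math.pi * kT))
# Part (a): Boltzmann suppression at the barrier top
rho_XB = rho_X0 * math.exp(-B / kT)

# Part (c): flux integral  int_0^inf V rho(V) dV,  by midpoint quadrature
dv, n = 1e-4, 200000
norm = math.sqrt(2 * math.pi * kT / M)
flux = sum(dv * (i + 0.5) * math.exp(-M * (dv * (i + 0.5))**2 / (2 * kT))
           for i in range(n)) * dv / norm

rate = rho_XB * flux
rate_exact = (omega / (2 * math.pi)) * math.exp(-B / kT)
print(abs(rate - rate_exact) / rate_exact < 1e-3)  # True
```

The numerical flux reproduces the closed form √(k_B T/2πM), so the product ρ(X_B) times the flux lands on (ω/2π) exp(−B/k_B T) to quadrature accuracy.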
Fig. 5.7 Five candidate contour plots, labeled (A) through (E).
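The posterior for the two-point line fit of exercise 5.3(b) can be tabulated directly on a grid over (m, b); comparing its shape against the panels of figure 5.7 identifies the correct contour plot. A minimal sketch (the 50-per-unit grid resolution is an arbitrary choice):

```python
import math

# Data from the exercise: (t1, y1) = (0, 1), (t2, y2) = (1, 2), sigma = 1/2
ts, ys, sigma = [0.0, 1.0], [1.0, 2.0], 0.5

def posterior(m, b):
    """P(D|M) * P(M), unnormalized, with the flat prior on (0,2) x (0,2)."""
    if not (0 < m < 2 and 0 < b < 2):
        return 0.0
    chi2 = sum((m * t + b - y) ** 2 for t, y in zip(ts, ys)) / sigma**2
    return math.exp(-chi2 / 2) * 0.25

# The likelihood peaks at the interpolating line y = t + 1, i.e. (m, b) = (1, 1)
best = max((posterior(m / 50, b / 50), m / 50, b / 50)
           for m in range(1, 100) for b in range(1, 100))
print(best[1:])  # -> (1.0, 1.0)
```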
(5.4) Euler, Gibbs-Duhem, and Clausius-Clapeyron. (Thermodynamics,
Chemistry)
(a) Using the fact that the entropy S(N, V, E) is extensive, show that

    N ∂S/∂N + V ∂S/∂V + E ∂S/∂E = S.                             (5.47)

Show from this that in general

    S = (E + pV − μN)/T                                          (5.48)

and hence E = TS − pV + μN. This is Euler's equation.
As a state function, S is supposed to depend only on E, V, and N. But
equation 5.48 seems to show explicit dependence on T, p, and μ as well:
how can this be?
(b) One answer is to write the latter three as functions of E, V, and N.
Do this explicitly for the ideal gas, using the ideal gas entropy
equation 3.61

    S(N, V, E) = (5/2) N k_B + N k_B log[(V/(N h³)) (4πmE/(3N))^(3/2)],  (5.49)

and your (or the grader's) results for problem 3.5(c), and verify
equation 5.48 in that case.
Another answer is to consider a small shift of all six variables. We know
that dE = T dS − p dV + μ dN, but if we shift all six variables in Euler's
equation we get dE = T dS − p dV + μ dN + S dT − V dp + N dμ. This
implies the Gibbs-Duhem relation

    0 = S dT − V dp + N dμ.                                      (5.50)

It means that the intensive variables T, p, and μ are not all independent.

Fig. 5.8 Generic phase diagram, showing the coexistence curves for solids, liquids,
and gasses. [The diagram marks the critical point and the triple point, with
temperature along the horizontal axis.]

Clausius-Clapeyron equation. Consider the phase diagram 5.8. Along an
equilibrium phase boundary, the temperatures, pressures, and chemical
potentials of the two phases must agree: otherwise a flat interface
between the two phases would transmit heat, shift sideways, or leak
particles, respectively (violating the assumption of equilibrium).
(c) Apply the Gibbs-Duhem relation to both phases, for a small shift
ΔT along the phase boundary. Let s₁, v₁, s₂, and v₂ be the molecular
entropies and volumes (s = S/N, v = V/N for each phase); derive the
Clausius-Clapeyron equation for the slope of the coexistence line on the
phase diagram

    dP/dT = (s₁ − s₂)/(v₁ − v₂).                                 (5.51)

It's hard to experimentally measure the entropies per particle: we don't
have an entropy thermometer. But, as you will remember, the entropy
difference upon a phase transformation ΔS = Q/T is related to the heat
flow Q needed
to induce the phase change. Let the latent heat L be the heat flow per
molecule.
(d) Write a formula for dP/dT not involving the entropy.

(5.5) Negative Temperature. (Quantum)
A system of N atoms can be in the ground state or in an excited state.
For convenience, we set the zero of energy exactly in between, so the
energies of the two states of an atom are ±ε/2. The atoms are isolated
from the outside world. There are only weak couplings between the
atoms, sufficient to bring them into internal equilibrium but without
other effects.

Fig. 5.9 Entropies and energy fluctuations for this problem with N = 50, showing
the microcanonical entropy S_micro(E), the canonical entropy S_c(T(E)), and the
canonical probability distribution ρ(E) for the energy. The canonical probability
distribution for the energy is for ⟨E⟩ = −10ε, and k_B T = 1.207ε. You may wish to
check some of your answers against this plot.

(a) Microcanonical Entropy. If the net energy is E (corresponding to a
number of excited atoms m = E/ε + N/2), what is the microcanonical
entropy S_micro(E) of our system? Simplify your expression using
Stirling's formula, log n! ≈ n log n − n.
(b) Negative Temperature. Find the temperature, using your simplified
expression from part (a). (Why is it tricky to do it without
approximation?) What happens to the temperature when E > 0?
Having the energy E > 0 is a kind of population inversion. Population
inversion is the driving mechanism for lasers. Microcanonical
simulations can lead also to states with negative specific heats.
For many quantities, the thermodynamic derivatives have natural
interpretations when viewed as sums over states. It's easiest to see this
in small systems.
(c) Canonical Ensemble: Explicit traces and thermodynamics. (i) Take
one of our atoms and couple it to a heat bath of temperature
k_B T = 1/β. Write explicit formulas for Z_canon, E_canon, and S_canon
in the canonical ensemble, as a trace (or sum) over the two states of the
atom. (E should be the energy of each state multiplied by the probability
ρ_n of that state; S should be the trace of −ρ_n log ρ_n.) (ii) Compare
the results with what you get by using the thermodynamic relations.
Using Z from the trace over states, calculate the Helmholtz free energy
A, S as a derivative of A, and E from A = E − TS. Do the
thermodynamically derived formulas you get agree with the statistical
traces? (iii) To remove some of the mystery from the thermodynamic
relations, consider the thermodynamically valid formula
E = −∂ log Z/∂β = −(1/Z) ∂Z/∂β. Write out Z as a sum over energy
states, and see that this formula follows naturally.
(d) What happens to E in the canonical ensemble as T → ∞? Can you
get into the regime discussed in part (b)?
(e) Canonical-Microcanonical Correspondence. Find the entropy in the
canonical distribution for N of our atoms coupled to the outside world,
from your answer to part (c). How can you understand the value of
S(T = ∞) − S(T = 0) simply? Using the approximate form of the
entropy from part (a) and the temperature from part (b), show that the
canonical and microcanonical entropies agree,
S_micro(E) = S_canon(T(E)). (Perhaps useful:
arctanh(x) = ½ log((1 + x)/(1 − x)).) Notice that the two are not equal
in the figure above: the form of Stirling's formula we used in part (a) is
not very accurate for N = 50. In a simple way, explain why the
microcanonical entropy is smaller than the canonical entropy.
(f) Fluctuations. Show in general that the root-mean-squared
fluctuations in the energy in the canonical distribution
⟨(E − ⟨E⟩)²⟩ = ⟨E²⟩ − ⟨E⟩² is related to the specific heat C = ∂E/∂T.
(I find it helpful to use the formula from part (c.iii),
E = −∂ log(Z)/∂β.) Calculate the root-mean-square energy fluctuations
for N of our atoms. Evaluate it at T(E) from part (b): it should have
a particularly simple form. For large N, are the fluctuations in E small
compared to E?

(5.6) Laplace. (Thermodynamics)²³
Laplace Transform. The Laplace transform of a function f(t) is a
function of x:

    L{f}(x) = ∫₀^∞ f(t) e^(−xt) dt.                              (5.52)

23 Laplace (1749-1827). Math reference [68, sec. 4.3].
Show that the canonical partition function Z(β) can be written as the
Laplace transform of the microcanonical volume of the energy shell
Ω(E).

(5.7) Legendre. (Thermodynamics)²⁴
Legendre Transforms. The Legendre transform of a function f(x) is
given by minimizing f(x) − xp with respect to x, so that p is the slope
(p = ∂f/∂x):

    g(p) = min_x {f(x) − xp}.                                    (5.53)

We saw in the text that in thermodynamics the Legendre transform of
the energy is the Helmholtz free energy²⁵

    A(T, N, V) = min_S {E(S, V, N) − TS}.                        (5.54)

How do we connect this with the statistical mechanical relation of
part (a), which related Ω = exp(S/k_B) to Z = exp(−A/k_B T)?
Thermodynamics, roughly speaking, is statistical mechanics without the
fluctuations.
Using your Laplace transform of exercise 5.6, find an equation for E_max
where the integrand is maximized. Does this energy equal the energy
which minimizes the Legendre transform 5.54? Approximate Z(β) in
your Laplace transform by the value of the integrand at this maximum
(ignoring the fluctuations). Does it give the Legendre transform
relation 5.54?

(5.8) Molecular Motors: Which Free Energy? (Basic, Biology)
Figure 5.10 shows a study of the molecular motor that transcribes DNA
into RNA. Choosing a good ensemble for this system is a bit involved.
It is under two constant forces (F and pressure), and involves
complicated chemistry and biology. Nonetheless, you know some things
based on fundamental principles. Let us consider the optical trap and
the distant fluid as being part of the external environment, and define
the system as the local region of DNA, the RNA, motor, and the fluid
and local molecules in a region immediately enclosing the region, as
shown in figure 5.10.

Fig. 5.10 An RNA polymerase molecular motor attached to a glass slide is pulling
along a DNA molecule (transcribing it into RNA). The opposite end of the DNA
molecule is attached to a bead which is being pulled by an optical trap with a
constant external force F. Let the distance from the motor to the bead be x: thus
the motor is trying to move to decrease x and the force is trying to increase x.

Without knowing anything further about the chemistry or biology in the
system, which two of the following must be true on average, in all cases,
according to basic laws of thermodynamics?
(T) (F) The total entropy of the universe (the system, bead, trap, laser
beam, ...) must increase or stay unchanged with time.
(T) (F) The entropy S_s of the system must increase with time.
(T) (F) The total energy E_T of the universe must decrease with time.
(T) (F) The energy E_s of the system must decrease with time.
(T) (F) G_s − Fx = E_s − TS_s + PV_s − Fx must decrease with time,
where G_s is the Gibbs free energy of the system. Related formula:
G = E − TS + PV.
Note: F is a force, not the Helmholtz free energy. Precisely two of the
answers are correct.

(5.9) Michaelis-Menten and Hill. (Biology, Computation)
Biological systems often have reaction rates that are saturable: the cell
needs to respond sensitively to the introduction of a new chemical S, but
the response should not keep growing indefinitely as the new chemical
concentration [S] grows.²⁶ Other biological systems act as switches:
they not only saturate, but they change sharply from one state to
another as the concentration of a chemical S is varied. We shall analyze
both of these important

24 Legendre (1752-1833).
25 Actually, [5.3] in the text had E as the independent variable. As usual in ther-
modynamics, we can solve S(E, V, N) for E(S, V, N).
26 [S] is the concentration of S (number per unit volume). S stands for substrate.
biological problems, and at the same time give tangible examples of how
one develops effective dynamical theories by removing degrees of
freedom: here, we remove an enzyme E from the equations to get an
effective reaction rate, rather than coarse-graining some large statistical
mechanical system.
The rate of a chemical reaction

    NS + B → C,                                                  (5.55)

where N molecules of type S combine with a molecule of type B to make
a molecule of type C, will occur with a reaction rate given by a
traditional chemical kinetics form

    d[C]/dt = k[S]^N [B].                                        (5.56)

If the reactants need all to be in a small volume V in order to react,
then [S]^N [B] V^N is the probability that they are in location to
proceed, and the rate constant k divided by V^N is the reaction rate of
the confined molecules.²⁷
Saturation: the Michaelis-Menten equation. Saturation is not seen in
simple chemical reaction kinetics. Notice that the reaction rate goes as
the Nth power of the concentration [S]: far from saturating, the reaction
rate grows linearly or faster with concentration.
The prototype example of saturation in biological systems is the
Michaelis-Menten reaction form. A reaction of this form converting a
chemical S (the substrate) into P (the product) has a rate given by the
formula

    d[P]/dt = v_max [S] / (K_M + [S]),                           (5.57)

where K_M is called the Michaelis constant (figure 5.11). This reaction
at small concentrations acts like an ordinary chemical reaction with
N = 1 and k = v_max/K_M, but the rate saturates at v_max as
[S] → ∞. The Michaelis constant K_M is the concentration [S] at which
the rate is equal to half of its saturation rate.

Fig. 5.11 Michaelis-Menten and Hill equation forms. [The rate d[P]/dt versus
substrate concentration [S]: both forms saturate at v_max, the Michaelis-Menten
form rising smoothly past K_M, while the Hill form with n = 4 switches on sharply
near K_H.]

We can derive the Michaelis-Menten form by hypothesizing the existence
of a catalyst or enzyme E, which is in short supply. The enzyme is
presumed to be partly free and available for binding (concentration [E])
and partly bound to the substrate (concentration [E:S], the colon
denoting the dimer), helping it to turn into the product. The total
concentration [E] + [E:S] = E_tot is fixed. The reactions are as follows:

    E + S ⇌ E:S → E + P                                          (5.58)

(binding and unbinding rates k₁ and k₋₁ for the first reaction, catalytic
rate k_cat for the second).
We must then assume that the supply of substrate is large, so its
concentration changes slowly with time. We can then assume that the
concentration [E:S] is in steady state, and remove it as a degree of
freedom.
(a) Assume the reactions in equation 5.58 (rates k₁, k₋₁, and k_cat) are
of traditional chemical kinetics form (equation 5.56), with N = 1 or
N = 0 as appropriate. Write the equation for d[E:S]/dt, set it to zero,
and use it to eliminate [E] in the equation for d[P]/dt. What are v_max
and K_M in the Michaelis-Menten form (equation 5.57) in terms of the
k's and E_tot?
We can understand this saturation intuitively: when all the enzyme is
busy and bound to the substrate, adding more substrate can't speed up
the reaction.
Cooperativity and sharp switching: the Hill equation. Hemoglobin is
what makes blood red: this iron-containing protein can bind up to four
molecules of oxygen in the lungs, and carries them to the tissues of the

27 The reaction will typically involve crossing an energy barrier E, and the rate will
be given by a Boltzmann probability k = k₀ exp(−E/k_B T). The constant of propor-
tionality k₀ can in principle be calculated using generalizations of the methods we
used in exercise 5.2.
body where it releases them. If the binding of all four oxygens were
independent, the [O₂] concentration dependence of the bound oxygen
concentration would have the Michaelis-Menten form (figure 5.11): to
completely deoxygenate the Hemoglobin (Hb) would demand a very low
oxygen concentration in the tissue.
What happens instead is that the Hb binding of oxygen looks much
more sigmoidal, with a fairly sharp transition between nearly 4 oxygens
bound at high [O₂] (lungs) and nearly none bound at low oxygen
concentrations. This arises because the binding of the oxygens is
enhanced by having other oxygens bound. This is not because the
oxygens somehow stick to one another: instead, each oxygen deforms
the Hb in a nonlocal allosteric²⁸ fashion, changing the configurations
and affinity of the other binding sites.
The Hill equation was introduced for hemoglobin to describe this kind
of cooperative binding. Like the Michaelis-Menten form, it is also used
to describe reaction rates, where instead of the carrier Hb we have an
enzyme, or perhaps a series of transcription binding sites (see
exercise 8.7). In the reaction rate form, the Hill equation is

    d[P]/dt = v_max [S]^n / (K_H^n + [S]^n)                      (5.59)

(see figure 5.11). For Hb, the concentration of the n-fold oxygenated
form is given by the right-hand side of equation 5.59. In both cases, the
transition becomes much more of a switch, with the reaction turning on
(or the Hb accepting or releasing its oxygen) sharply at a particular
concentration (figure 5.11). The transition can be made more or less
sharp by increasing or decreasing n.
The Hill equation can be derived using a simplifying assumption that n
molecules bind in a single reaction:

    E + nS ⇌ E:(nS),                                             (5.60)

where E might stand for hemoglobin and S for the O₂ oxygen molecules.
Again, there is a fixed total amount E_tot = [E] + [E:nS].
(b) Assume that the two reactions in equation 5.60 have the chemical
kinetics form (equation 5.56) with N = 0 or N = n as appropriate.
Write the equilibrium equation for E:(nS), and eliminate [E] using the
fixed total E_tot.
Usually, and in particular for hemoglobin, this cooperativity is not so
rigid: the states with one, two, and three O₂ molecules bound also
compete with the unbound and fully bound states. This is treated in an
approximate way by using the Hill equation, but allowing n to vary as
a fitting parameter: for Hb, n ≈ 2.8.
Both Hill and Michaelis-Menten equations are often used in biological
reaction models even when there are no explicit mechanisms (enzymes,
cooperative binding) known to generate them.

28 Allosteric comes from Allo (other) and steric (structure or space). Allosteric
interactions can be cooperative, as in hemoglobin, or inhibitory.
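The steady-state elimination asked for in exercise 5.9(a) yields the standard identifications v_max = k_cat E_tot and K_M = (k₋₁ + k_cat)/k₁. They can be checked by relaxing d[E:S]/dt to zero numerically at fixed [S] and comparing the resulting d[P]/dt with the Michaelis-Menten form; a sketch with invented rate constants:

```python
k1, km1, kcat, Etot = 2.0, 1.0, 0.5, 0.1    # invented rate constants
KM = (km1 + kcat) / k1                      # predicted Michaelis constant
vmax = kcat * Etot                          # predicted saturation rate

def rate(S, dt=1e-4, steps=200000):
    """d[P]/dt once [E:S] has relaxed to steady state at fixed substrate [S]."""
    ES = 0.0
    for _ in range(steps):   # Euler-relax d[E:S]/dt = k1[E][S] - (k-1 + kcat)[E:S]
        ES += dt * (k1 * (Etot - ES) * S - (km1 + kcat) * ES)
    return kcat * ES         # d[P]/dt = kcat [E:S]

for S in (0.1, KM, 10.0):    # matches vmax [S]/(KM + [S]) at each [S]
    assert abs(rate(S) - vmax * S / (KM + S)) < 1e-6 * vmax
print("saturates near vmax:", abs(rate(1e3) - vmax) < 1e-3 * vmax)  # True
```

At [S] = K_M the computed rate sits at v_max/2, and at large [S] it saturates at v_max, as in figure 5.11.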
6 Entropy

Entropy is the key concept in statistical mechanics. What does it mean?
Can we develop an intuition for it?
We shall see in this chapter that entropy has several related interpre-
tations. Entropy measures the disorder in a system: in section 6.2 we'll
see this using the entropy of mixing and the residual entropy of glasses.
Entropy measures our ignorance about a system: in section 6.3 we'll see
this with examples from nonequilibrium systems and information the-
ory. But we'll start in section 6.1 with the original interpretation, which
grew out of the 19th century study of engines, refrigerators, and the end
of the universe: entropy measures the irreversible changes in a system.
6.1 Entropy as Irreversibility: Engines and Heat Death of the Universe
The early 1800s saw great advances in understanding motors and en-
gines. In particular, scientists asked a fundamental question: How effi-
cient can an engine be? The question was made more difficult because
there were two relevant principles to be discovered: energy is conserved
and entropy always increases.¹
For some kinds of engines, only energy conservation is important.
For example, there are electric motors that convert electricity into me-
chanical work (running an electric train), and generators that convert
mechanical work (a windmill rotating) into electricity.² For these elec-
tromechanical engines, the absolute limitation is given by the conserva-
tion of energy: the motor cannot generate more energy in mechanical
work than is consumed electrically, and the generator cannot generate
more electrical energy than is input mechanically.³ An ideal electrome-
chanical engine can convert all the energy from one form to another.
Electric motors and generators are limited only by the conservation of
energy.
Steam engines are more complicated. Scientists in the early 1800s
were figuring out that heat is a form of energy. A steam engine, running
a power plant or an old-style locomotive, transforms a fraction of the
heat energy from the hot steam (the hot bath) into electrical energy or
work, but some of the heat energy always ends up wasted, dumped
into the air or into the cooling water for the power plant (the cold
bath). In fact, if the only limitation on heat engines was conservation
of energy, one would be able to make a motor using the heat energy from

1 Some would be pedantic, and say only that entropy never decreases, but this
qualification is unnecessary. Systems that remain completely in equilibrium at all
times have constant entropy. But systems only equilibrate completely after an
infinite time; for example, we'll see that Carnot cycles must be run infinitely slowly
to be truly reversible.
2 Electric motors are really the same as generators run in reverse: turning the shaft
of a simple electric motor can generate electricity.
3 Mechanical work (force times distance) is energy; electrical power (current times
voltage) is energy per unit time.
a rock, getting both useful work and a very cold rock.

There is something fundamentally less useful about energy once it becomes heat. Once the energy is spread out among all the atoms in a macroscopic chunk of material, not all of it can be retrieved again to do useful work. The energy is more useful for generating power when divided between hot steam and a cold lake than in the form of water at a uniform, intermediate warm temperature. Indeed, most of the time when we use mechanical or electrical energy, the energy ends up as heat, generated from friction or other dissipative processes.
The equilibration of a hot and a cold body to two warm bodies in an isolated system is irreversible: one cannot return to the original state without inputting some kind of work from outside the system. Carnot, publishing in 1824, realized that the key to producing the most efficient possible engine was to avoid irreversibility. A heat engine run in reverse is a refrigerator: it consumes mechanical work or electricity and uses it to pump heat from a cold bath to a hot one. A reversible heat engine would be able to run forward, generating work by transferring heat from the hot to the cold bath (extracting some of the heat as work), and then run backward, using the same work to pump the heat back into the hot bath. It was by calculating the properties of this reversible engine that Carnot discovered what would later be called the entropy.

Fig. 6.1 How to use an engine which produces more work than the Carnot cycle to build a perpetual motion machine doing work per cycle.

If you had an engine more efficient than a reversible one, you could run it side-by-side with a reversible engine running as a refrigerator (figure 6.1). The pair of engines would generate work by extracting energy from the hot bath (as from our rock, above) without adding heat to the cold one. After we used this work, we could dump the extra heat from friction back into the hot bath, getting a perpetual motion machine that did useful work without consuming anything. In thermodynamics, it is a postulate that such perpetual motion machines are impossible.
Carnot considered a prototype heat engine (figure 6.2): a piston with external pressure P, two heat baths at a hot temperature T1 and a cold temperature T2, and some material inside the piston. During one cycle of his engine, heat Q1 flows out of the hot bath, heat Q2 flows into our cold bath, and net work W = Q1 − Q2 is done by the piston on the outside world. To make his engine reversible, Carnot had to avoid (i) friction, (ii) letting hot things touch cold things, (iii) letting high pressures expand into low pressures, and (iv) moving the walls of the container too quickly (emitting sound or shock waves).

Fig. 6.2 Prototype Heat Engine: A piston with externally exerted pressure P, moving through an insulated cylinder. The cylinder can be put into thermal contact with either of two heat baths: a hot bath at temperature T1 (say, a coal fire in a power plant) and a cold bath at T2 (say, water from a cold lake). During one cycle of the piston in and out, heat energy Q1 flows into the piston, mechanical energy W is done on the external world by the piston, and heat energy Q2 flows out of the piston into the cold bath.

Carnot, a theorist, could ignore the practical difficulties. He imagined a frictionless piston run through a cycle at arbitrarily low velocities. He realized that all reversible heat engines working with the same temperature baths had to produce exactly the same amount of work for a given heat flow from hot to cold (none of them could be more efficient than any other, since they all were the most efficient possible). This allowed him to fill the piston with the simplest possible material (an ideal gas), for which he knew the relation between pressure, volume, and temperature. The piston was used both to extract work from the system and to raise and lower the temperature. Carnot connected the gas thermally to each bath only when its temperature agreed with the bath, so his engine was fully reversible.

To be pub. Oxford UP, Fall05
The Carnot cycle moves the piston in and out in four steps (figure 6.3).
(a→b) The compressed gas is connected to the hot bath, and the piston moves outward at varying pressure; heat Q1 flows in to maintain the gas at temperature T1.
(b→c) The piston expands further at varying pressure, cooling the gas to T2 without heat transfer.
(c→d) The expanded gas in the piston is connected to the cold bath and compressed; heat Q2 flows out, maintaining the temperature at T2.
(d→a) The piston is compressed, warming the gas to T1 without heat transfer, returning it to the original state.
Energy conservation tells us that the net heat energy flowing into the piston, Q1 − Q2, must equal the work W done on the outside world:

    Q1 = Q2 + W.    (6.1)

The work done by the piston is the integral of the force exerted times the distance. The force is the piston surface area times the pressure, and the distance times the piston surface area is the volume change, giving the simple result

    W = ∫ F dx = ∫ (F/A)(A dx) = ∮_cycle P dV = area inside PV loop.    (6.2)

That is, if we plot P versus V for the four steps of our cycle, the area inside the resulting closed loop is the work done by the piston on the outside world (figure 6.3).
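The area rule in equation 6.2 is easy to check numerically. A minimal sketch (not from the text; the rectangular cycle and its units are my own illustrative choices): integrate P dV outward along a high-pressure stroke and back along a low-pressure stroke, and compare the net work to the enclosed rectangular area.

```python
import math

# A rectangular cycle in the P-V plane (illustrative units): expand at
# high pressure, compress back at low pressure.  The net work per cycle
# should equal the enclosed area (P_hi - P_lo) * (V2 - V1), per equation 6.2.
P_hi, P_lo = 2.0, 1.0
V1, V2 = 1.0, 3.0
steps = 1000
dV = (V2 - V1) / steps

W_expand = sum(P_hi * dV for _ in range(steps))    # outward stroke at P_hi
W_compress = sum(P_lo * dV for _ in range(steps))  # return stroke at P_lo
W_net = W_expand - W_compress

print(W_net)   # the loop area, (2.0 - 1.0) * (3.0 - 1.0) = 2, up to rounding
```

For the Carnot loop itself the strokes are isotherms and adiabats rather than constant-pressure lines, but the bookkeeping is identical: forward area minus return area.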
We now fill our system with a monatomic ideal gas. We saw in section 3.5 that the ideal gas equation of state is

    P V = N kB T    (6.3)

and that its total energy is its kinetic energy, given by the equipartition theorem:

    E = 3/2 N kB T = 3/2 P V.    (6.4)

Fig. 6.3 Carnot Cycle PV Diagram: The four steps in the Carnot cycle: a→b, heat in Q1 at constant temperature T1; b→c, expansion without heat flow; c→d, heat out Q2 at constant temperature T2; and d→a, compression without heat flow to the original volume and temperature.
Along a→b, where we add heat Q1 to the system, we have P(V) = N kB T1 / V. Using energy conservation (the first law),

    Q1 = Eb − Ea + Wab = 3/2 Pb Vb − 3/2 Pa Va + ∫_a^b P dV.    (6.5)

But Pa Va = N kB T1 = Pb Vb, so the first two terms cancel, and the last term simplifies:

    Q1 = ∫_a^b (N kB T1 / V) dV = N kB T1 log(Vb/Va).    (6.6)

Similarly,

    Q2 = N kB T2 log(Vc/Vd).    (6.7)
For the other two steps in our cycle we need to know how the ideal gas behaves under expansion without any heat flow in or out. Again using the first law on a small segment of the path, the work done for a small volume change, −P dV, must equal the change in energy dE. Using equation 6.3, P dV = (N kB T / V) dV, and using equation 6.4, dE = 3/2 N kB dT, so dV/V = −3/2 dT/T. Integrating both sides from b to c, we find

    ∫_b^c dV/V = log(Vc/Vb) = −3/2 ∫_b^c dT/T = −3/2 log(T2/T1),    (6.8)

so Vc/Vb = (T1/T2)^(3/2). Similarly, Vd/Va = (T1/T2)^(3/2). Thus Vc/Vb = Vd/Va, and hence

    Vc/Vd = (Vc/Vb)(Vb/Vd) = (Vd/Va)(Vb/Vd) = Vb/Va.    (6.9)
We can use the volume ratios from the insulated expansion and compression (equation 6.9) to substitute into the heat flows (equations 6.6 and 6.7), giving

    Q1/T1 = N kB log(Vb/Va) = N kB log(Vc/Vd) = Q2/T2.    (6.10)

This was Carnot's fundamental result: his cycle, and hence all reversible engines, satisfies the law

    Q1/T1 = Q2/T2.    (6.11)
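Carnot's relation can be verified numerically for the ideal-gas cycle just derived. A sketch with my own illustrative bath temperatures and volumes, in units where kB = N = 1:

```python
import math

# Illustrative Carnot cycle for a monatomic ideal gas (kB = N = 1).
N, kB = 1.0, 1.0
T1, T2 = 400.0, 300.0        # hot and cold bath temperatures
Va, Vb = 1.0, 2.0            # corners a and b on the T1 isotherm

# Adiabats (equation 6.8): V scales as T^(-3/2) when no heat flows
Vc = Vb * (T1 / T2) ** 1.5
Vd = Va * (T1 / T2) ** 1.5

# Heats along the two isotherms (equations 6.6 and 6.7)
Q1 = N * kB * T1 * math.log(Vb / Va)
Q2 = N * kB * T2 * math.log(Vc / Vd)

print(Q1 / T1, Q2 / T2)      # equal entropy flows: equation 6.11

# Net work per cycle; the efficiency W/Q1 = 1 - T2/T1 follows from 6.11
W = Q1 - Q2
print(W / Q1, 1.0 - T2 / T1)   # equal: the Carnot efficiency
```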

Later scientists decided to define[4] the entropy change to be this ratio of heat flow to temperature:

    ΔS_thermo = Q/T.    (6.12)

For a reversible engine, the entropy flow from the hot bath into the piston, Q1/T1, equals the entropy flow from the piston into the cold bath, Q2/T2: no entropy is created or destroyed. Any real engine will create net entropy during a cycle: no engine can reduce the net amount of entropy in the universe.

[4] The thermodynamic entropy is derived with a heat flow ΔE = Q at a fixed temperature T, so our statistical mechanics definition of temperature 1/T = ∂S/∂E (from equation 3.30) is equivalent to the thermodynamic definition of entropy ΔS = Q/T (equation 6.12).
The irreversible increase of entropy is not a property of the microscopic laws of nature. In particular, the microscopic laws of nature are time-reversal invariant: the laws governing the motion of atoms are the same whether time is running backward or forward.[5] The microscopic laws do not tell us the arrow of time. The direction of time in which entropy increases is our definition of the future.[6]

This confusing point may be illustrated by considering the game of pool or billiards. Neglecting friction, the trajectories of the pool balls are also time-reversal invariant. If the velocities of the balls were reversed halfway through a pool shot, they would retrace their motions, building up all the velocity into one ball that then would stop as it hit the cue stick. In pool, the feature that distinguishes forward from backward time is the greater order at early times: all the momentum starts in one ball, and is later distributed among all the balls involved in collisions. Similarly, the only reason we can resolve the arrow of time, distinguishing the future from the past, is that our universe started in an unusual, low-entropy state[7] and is irreversibly moving towards equilibrium.[8]

[5] More correctly, the laws of nature are only invariant under CPT: changing the direction of time (T) along with inverting space (P) and changing matter to antimatter (C). Radioactive beta decay and other weak-interaction forces are not invariant under time reversal. The basic conundrum for statistical mechanics is the same, though: we can't tell if we are matter beings living forward in time or antimatter beings living backward in time in a mirror. Time running backward would appear strange even if we were made of antimatter.
[6] In electromagnetism, the fact that waves radiate away from sources more often than they converge upon sources is a closely related distinction of past from future.
[7] The big bang was hot and probably close to equilibrium, but the volume per particle was small, so the entropy was nonetheless low.
[8] If some miracle produced a low-entropy, ordered state as a spontaneous fluctuation at time t0, then at times t < t0 all our laws of macroscopic physics would appear to run backward.
The cosmic implications of the irreversible increase of entropy were not lost on the intellectuals of the 19th century. In 1854, Helmholtz predicted the heat death of the universe: he suggested that as the universe ages, all energy will become heat, all temperatures will become equal, and everything will be condemned to a state of eternal rest. In 1895, H. G. Wells in The Time Machine [118, Chapter 11] speculated about the state of the Earth in the distant future:

    . . . the sun, red and very large, halted motionless upon the horizon, a vast dome glowing with a dull heat. . . The earth had come to rest with one face to the sun, even as in our own time the moon faces the earth. . . There were no breakers and no waves, for not a breath of wind was stirring. Only a slight oily swell rose and fell like a gentle breathing, and showed that the eternal sea was still moving and living. . . . the life of the old earth ebb[s] away. . .

This gloomy prognosis has been re-examined recently: it appears that the expansion of the universe may provide loopholes. While there is little doubt that the sun and the stars will indeed die, it may be possible, if life can evolve to accommodate the changing environments, that civilization, memory, and thought could continue for an indefinite subjective time (e.g., exercise 6.1).

6.2 Entropy as Disorder

A second intuitive interpretation of entropy is as a measure of the disorder in a system. Scientist mothers tell their children to lower the entropy by tidying their rooms; liquids have higher entropy than crystals, intuitively, because their atomic positions are less orderly.[9] We illustrate this interpretation by first calculating the entropy of mixing, and then discussing the zero-temperature entropy of glasses.

[9] There are interesting examples of systems that appear to develop more order as their entropy (and temperature) rises. These are systems where adding order of one, visible type (say, crystalline or orientational order) allows increased disorder of another type (say, vibrational disorder). Entropy is a precise measure of disorder, but is not the only possible or useful measure.

Fig. 6.4 The pre-mixed state: N/2 white atoms on one side, N/2 black atoms on the other.

Fig. 6.5 The mixed state: N/2 white atoms and N/2 black atoms scattered through the volume 2V.

6.2.1 Entropy of Mixing: Maxwell's Demon and Osmotic Pressure

Scrambling an egg is a standard example of irreversibility: you can't re-separate the yolk from the white. A simple model for scrambling is given in figures 6.4 and 6.5: the mixing of two different types of particles. Here the entropy change upon mixing is a measure of increased disorder.

Consider a volume separated by a partition into two equal volumes of volume V. N/2 indistinguishable ideal gas white atoms are on one side of the partition, and N/2 indistinguishable ideal gas black atoms are on the other side. The configurational entropy of this system (section 3.5, ignoring the momentum-space parts) is

    S_unmixed = 2 kB log(V^(N/2) / (N/2)!),    (6.13)

just twice the configurational entropy of N/2 atoms in a volume V. We assume that the black and white atoms have the same masses and the same total
energy. Now consider the entropy change when the partition is removed, and the two sets of atoms are allowed to mix. Because the temperatures and pressures on both sides are equal, removing the partition does not involve any irreversible sound emission or heat transfer: any entropy change is due to the mixing of the white and black atoms. In the desegregated state,[10] the entropy has increased to

    S_mixed = 2 kB log((2V)^(N/2) / (N/2)!),    (6.14)

twice the entropy of N/2 indistinguishable atoms in a volume 2V. Since log(2^m x) = m log 2 + log x, the change in entropy due to the mixing is

    ΔS_mixing = S_mixed − S_unmixed = kB log 2^N = N kB log 2.    (6.15)

We gain kB log 2 in entropy every time we place an atom into one of two boxes without looking which box we chose. More generally, we might define a counting entropy

    S_counting = kB log(number of undistinguished configurations)    (6.16)

for systems with a discrete number of equally-likely configurations.

[10] No social policy implications are implied by physics: the entropy of mixing for a few billion humans would not provide for an eye blink.
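Equation 6.15 can be confirmed directly from equations 6.13 and 6.14, without Stirling's approximation, using exact log-factorials. A sketch (kB = 1; the values of N and V are my own illustrative choices):

```python
import math

def log_factorial(n: int) -> float:
    """log(n!) via the log-gamma function, exact up to rounding."""
    return math.lgamma(n + 1)

kB = 1.0          # work in units where Boltzmann's constant is 1
N, V = 1000, 1.0  # illustrative particle number and volume

# Configurational entropies from equations 6.13 and 6.14
S_unmixed = 2 * kB * ((N / 2) * math.log(V) - log_factorial(N // 2))
S_mixed = 2 * kB * ((N / 2) * math.log(2 * V) - log_factorial(N // 2))

dS = S_mixed - S_unmixed
print(dS, N * kB * math.log(2))   # both N kB log 2, up to rounding
```

The factorial terms cancel identically, so the mixing entropy is exactly N kB log 2 for any even N.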

This kind of discrete choice arises often in statistical mechanics. In equilibrium quantum mechanics (for a finite system) the states are quantized: adding a new (non-interacting) particle into one of m degenerate states adds kB log m to the entropy. In communications theory (subsection 6.3.2, exercises 6.7 and 6.8), each bit transmitted down your channel can be in one of two states, so a random stream of bits of length N has S = kS N log 2.[11]

[11] Here it is natural to measure entropy not in units of temperature, but rather in base 2, so kS = 1/log 2. This means that S = N for a random string of N bits.

In more general cases, the states available to one particle depend strongly on the configurations of the other particles. Nonetheless, the equilibrium entropy still measures the logarithm of the number of different states that the total system could be in. For example, our equilibrium statistical mechanics entropy S_equil(E) = kB log(Ω(E)) (equation 3.27) is the logarithm of the number of states of energy E, with phase-space volume h^(3N) allocated to each state.
What would happen if we removed a partition separating N/2 black atoms on one side from N/2 indistinguishable black atoms on the other? The initial entropy is the same as above, S_BB^unmixed = 2 kB log(V^(N/2) / (N/2)!), but the final entropy is now S_BB^mixed = kB log((2V)^N / N!). Notice we now have N! rather than the ((N/2)!)^2 from equation 6.14, since all of our particles are now indistinguishable. Now N! = N (N−1) (N−2) (N−3) . . . while ((N/2)!)^2 = (N/2) (N/2) ((N−2)/2) ((N−2)/2) . . . : comparing term by term, they roughly differ by a factor 2^N, canceling the entropy change due to the volume doubling. Indeed, expanding the logarithm using Stirling's formula log n! ≈ n log n − n, we find the entropy per atom is unchanged.[12] This is why we introduced the N! term for indistinguishable particles in section 3.2.1: without it the entropy would decrease by N log 2 whenever we split a container into two pieces.[13]

[12] If you keep Stirling's formula to higher order, you'll see that the entropy increases a bit when you remove the partition. This is due to the number fluctuations on the two sides that are now allowed.
now allowed. How can we intuitively connect this entropy of mixing with the ther-
13 modynamic entropy of pistons and engines in section 6.1? Can we use
This is often called the Gibbs para-
dox. To be pub. Oxford UP, Fall05
our mixing entropy to do work? Clearly to do so we must discriminate between the two kinds of atoms. Suppose that the barrier separating the two sides in figure 6.4 were a membrane that was impermeable to black atoms but allowed white ones to cross. Since both black and white atoms are ideal gases, the white atoms would spread uniformly to fill the entire system, while the black atoms would remain on one side. This would lead to a pressure imbalance: if the semipermeable wall were used as a piston, work could be extracted as the black chamber was enlarged to fill the total volume.[14]

[14] Such semipermeable membranes are quite common, not for gases but for dilute solutions of ions in water: some ions can penetrate and others cannot. The resulting force on the membrane is called osmotic pressure.

Suppose we had a more active form of discrimination. Maxwell introduced the idea of an intelligent finite being (later termed Maxwell's Demon) that would operate a small door between the two containers. When a black atom approaches the door from the left or a white atom approaches from the right, the demon opens the door; in the reverse situations the demon leaves the door closed. As time progresses, this active sorting would re-segregate the system, lowering the entropy. This is not a concern for thermodynamics, since of course running a demon is an entropy-consuming process! Indeed, one can view this thought experiment as giving a fundamental limit on demon efficiency, putting a lower bound on how much entropy an intelligent being must create in order to engage in this kind of sorting process.
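A rough sketch of the work available from the semipermeable piston (my own numbers, in units with kB = 1): isothermally doubling the black-atom volume extracts exactly T times the entropy gained by the black atoms, consistent with the reversible relation ΔS = Q/T.

```python
import math

kB, T = 1.0, 300.0
N = 1000                     # total atoms; N/2 black atoms start in volume V

# Isothermal expansion of the black-atom gas from V to 2V through the
# semipermeable piston: W = integral of P dV = (N/2) kB T log 2
W = (N / 2) * kB * T * math.log(2)

# Entropy gained by the black atoms alone in doubling their volume
dS_black = (N / 2) * kB * math.log(2)

print(W, T * dS_black)       # equal: W = T dS for this reversible process
```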

Fig. 6.6 Ion pump. An implementation of this demon in biology is Na+/K+-ATPase, an enzyme located on the membranes of almost every cell in your body. This enzyme maintains extra potassium (K+) ions inside the cell and extra sodium (Na+) ions outside the cell. The enzyme exchanges two K+ ions from outside for three Na+ ions inside, burning as fuel one ATP (adenosine with three phosphates, the fuel of the cell) into ADP (two phosphates). When you eat too much salt (Na+Cl−), the extra sodium ions in the blood increase the osmotic pressure on the cells, draw more water into the blood, and increase your blood pressure. The figure shows the structure of the related enzyme calcium ATPase [114]: the arrow shows the shape change as the two Ca+ ions are removed.

6.2.2 Residual Entropy of Glasses: The Roads Not Taken

In condensed-matter physics, glasses are the prototype of disordered systems. Unlike a crystal, in which each atom has a set position, a glass will have a completely different configuration of atoms each time it is formed. That is, the glass has a residual entropy: as the temperature goes to absolute zero, the glass entropy does not vanish, but rather equals kB log Ω_glass, where Ω_glass is the number of zero-temperature configurations in which the glass might be trapped.

What is a glass? Glasses are disordered like liquids, but are rigid like crystals. They are not in equilibrium: they are formed when liquids are cooled too fast to form the crystalline equilibrium state.[15] You are aware of glasses made from silica, like window glass[16] and Pyrex™.[17] You also know some molecular glasses, like hard candy (a glass made of sugar). Many other materials (even metals[18]) can form glasses when cooled quickly.

[15] The crystalline state must be nucleated; see section 12.2.
[16] Windows are made from soda-lime glass: silica (SiO2) mixed with sodium and calcium oxides.
[17] Pyrex™ is a borosilicate glass (boron and silicon oxides) with a low thermal expansion, used for making measuring cups that don't shatter when filled with boiling water.
[18] Most metals are polycrystalline: the atoms sit in neat crystalline arrays, but the metal is made up of many grains with different crystalline orientations separated by sharp grain boundaries.
How is the residual glass entropy measured? First, one estimates the entropy of the equilibrium liquid;[19] then one measures the entropy flow Q/T out of the glass as it is cooled from the liquid down to absolute zero. The difference

    S_residual = S_liquid(Tℓ) − ∫_0^Tℓ (1/T) (dQ/dt) dt    (6.17)

gives the residual entropy.

[19] One can measure the entropy of the equilibrium liquid S_liquid(Tℓ) by slowly heating a crystal of the material from absolute zero and measuring ∫_0^Tℓ dQ/T flowing in.

How big is the residual entropy of a typical glass? The residual entropy is on the order of kB per molecular unit of the glass (SiO2 or sugar molecule, for example). This means that the number of glassy configurations e^(S/kB) is enormous (exercise 6.9 part (c)).

How is it possible to measure the number of glass configurations the system didn't choose? The glass is, after all, in one particular configuration. How can measuring the heat flow Q(t) out of the liquid as it freezes into one glassy state be used to measure the number Ω_glass of possible glassy states? Answering this question will neatly tie together the statistical mechanics definition of entropy S_stat = kB log Ω_glass with the thermodynamic definition ΔS_thermo = Q/T, and will occupy the rest of this subsection.
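To get a sense of scale for that "enormous" number of configurations e^(S/kB), a back-of-envelope sketch (my own illustrative arithmetic) for a mole of molecular units with residual entropy of order kB each:

```python
import math

N_A = 6.022e23        # molecular units in a mole of glass
S_over_kB = N_A       # residual entropy ~ kB per unit, so S/kB ~ N_A

# Number of glassy configurations Omega = e^(S/kB); report its log10,
# since Omega itself overflows any floating-point format
log10_omega = S_over_kB / math.log(10)
print(log10_omega)    # ~2.6e23: Omega is a 1 followed by ~10^23 zeros
```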
We first need a simplified model of how a glass might fall out of equilibrium as it is cooled.[20] We view the glass as a collection of independent molecular units. Each unit has a double-well potential energy: along some internal coordinate qi there are two minima with an energy difference εi, separated by an energy barrier Vi (figure 6.7). This internal coordinate might represent a rotation of a sugar molecule, or a shift in the location of an oxygen in a SiO2 network.

Fig. 6.7 Double-well potential. A simple model for the potential energy for one coordinate qi in a glass: two states separated by a barrier Vi and with a small energy difference εi.

[20] The glass transition is not a sharp phase transition: the liquid grows thicker (more viscous) as it is cooled, with slower and slower dynamics, until the cooling rate becomes too fast for the atomic rearrangements needed to maintain equilibrium to keep up. At that point, there is a gradual, smeared-out transition over many degrees Kelvin as the viscosity effectively becomes infinite and the glass becomes bonded together. The fundamental nature of this transition remains controversial, and in particular we do not know why the viscosity diverges so rapidly in so many materials. There are at least three kinds of competing theories for the glass transition: (1) it reflects an underlying equilibrium transition to an ideal, zero-entropy glass state, which would be formed under infinitely slow cooling; (2) it is a purely dynamical transition (where the atoms or molecules jam together) with no thermodynamic signature; (3) it is not a transition at all, but just a crossover where the liquid viscosity jumps rapidly (say, because of the formation of semipermanent covalent bonds). Our simple model is not a good description of the glass transition, but is a rather accurate model for the continuing thermal rearrangements (β-relaxation) at temperatures below the glass transition, and an excellent model for the quantum dynamics (tunneling centers) which dominate many properties of glasses below a few degrees Kelvin.

Consider the behavior of one of these double-well degrees of freedom. As we cool our system, the molecular unit will be thermally excited over its barrier more and more slowly, with a rate (exercise 5.2) given by an Arrhenius factor Γ(T) ≈ Γ0 exp(−Vi/kB T). So long as the cooling rate
Γ_cool is small compared to Γ(T), our atoms will remain in equilibrium. However, at the local glass-transition temperature Ti^g, where the two rates cross,

    Γ_cool = Γ(Ti^g) = Γ0 exp(−Vi/kB Ti^g)
    Ti^g = Vi / (kB log(Γ0/Γ_cool)),    (6.18)

the transitions between the wells will not keep up and our molecular unit will freeze into position. If the cooling rate Γ_cool is very slow compared to the attempt frequency Γ0 (as it almost always is),[21] this transition will be abrupt, and our model glass will freeze into the upper well with the probability given by the equilibrium distribution at Ti^g.

[21] Atomic rates like Γ0 are around 10^12 per second (an atomic vibration frequency); cooling times are typically between seconds and years, so the cooling rate is indeed slow compared to microscopic times.

Our frozen molecular unit has a population in the upper well given by the Boltzmann factor e^(−εi/kB Ti^g) times the population in the lower well. Hence centers with εi ≪ kB Ti^g will have both states roughly equally populated; those with εi ≫ kB Ti^g will be primarily in the ground state.
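Equation 6.18 gives concrete freeze-out temperatures. A sketch with plausible but purely illustrative numbers (a 0.5 eV barrier, Γ0 ≈ 10^12/s, a one-second cooling scale):

```python
import math

kB = 1.380649e-23          # Boltzmann constant, J/K
V_i = 0.5 * 1.602e-19      # barrier height: 0.5 eV in joules (illustrative)
gamma0 = 1e12              # attempt frequency, ~atomic vibration rate (1/s)
gamma_cool = 1.0           # cooling rate scale, ~1/s (illustrative)

# Equation 6.18: the temperature where hopping rate matches cooling rate
T_g = V_i / (kB * math.log(gamma0 / gamma_cool))
print(T_g)                 # about 210 K for these parameters
```

Note the logarithm: changing the cooling rate by an order of magnitude shifts Ti^g only modestly, which is why glass properties depend only weakly on cooling history.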
As a crude approximation, let us pretend that each center goes sharply from equal occupancy to being fully in the ground state at an asymmetry temperature Ti^a = εi/(α kB), for some constant α.[22]

The statistical mechanical entropy contributed by the two states of our molecular unit[23] is kB log 2 for T > Ti^a. If Ti^a > Ti^g, then the unit remains in equilibrium. At T = Ti^a = εi/(α kB) the statistical entropy drops to zero (in our crude approximation), so ΔS_stat = −kB log 2. At the same time, an average energy εi/2 is transmitted to the heat bath,[24] so the thermodynamic entropy changes by ΔS_thermo = Q/T = −εi/(2 Ti^a) = −α kB/2. Thus we can pick α = 2 log 2 to ensure that our two entropy changes agree in our crude approximation.

[22] We haven't yet defined the entropy for ensembles where the probabilities are not uniform: that's in section 6.3. Using those definitions, we would not need this approximation or the fudge factor α needed to equate the two entropies, but the proper calculation is more complicated and less intuitive.
[23] That is, we integrate out the vibrations of the molecular unit in the two wells: our energy barrier in figure 6.7 is properly a free energy barrier.
[24] A 50/50 chance of having energy εi.

Now we can see how the thermodynamic measurement of heat can tell us the number of glassy configurations. Suppose there are N molecular
units in the glass which fall out of equilibrium (Ti^a < Ti^g). As the glass is cooled, one by one these units randomly freeze into one of two states (figure 6.8), leading to Ω_glass = 2^N glassy configurations for this cooling rate, and a statistical mechanical residual entropy

    S_stat^residual = N kB log 2,    (6.19)

roughly kB per molecular unit if the fraction of units with small asymmetries is sizeable. This entropy change is reflected in the thermodynamic measurement at the lower temperatures[25] Ti^a. At these points the energy flow out of the glass is less than that for the equilibrium system, because the unit can no longer hop over its barrier, so the thermodynamic entropy for the glass stays higher than that for the equilibrated, zero-residual-entropy ideal glass state, by

    ΔS_thermal = Σ_{Ti^a < Ti^g} εi/(2 Ti^a) = N kB log 2.    (6.20)

Thus the heat flow into a particular glass configuration counts the number of roads not taken by the glass on its cooling voyage.

[25] This is again an artifact of our crude approximation: the statistical and thermodynamic entropies remain in sync at all temperatures when the calculation is done properly.

Fig. 6.8 Roads Not Taken by the Glass. The branching path of glassy states in our model. The entropy (both statistical and thermodynamic) is proportional to the number of branchings the glass chooses between as it cools. A particular glass will take one trajectory through this tree as it cools; nonetheless the thermodynamic entropy measures the total number of states.
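The bookkeeping behind the choice α = 2 log 2 can be checked in a few lines (kB = 1; the asymmetry energy is an illustrative choice): the statistical entropy drop and the thermodynamic Q/T then agree exactly, one factor of kB log 2 per frozen unit.

```python
import math

kB = 1.0
eps = 1.0                      # asymmetry energy of one two-level center

alpha = 2 * math.log(2)        # the fudge factor chosen in the text
T_a = eps / (alpha * kB)       # asymmetry temperature Ti^a

dS_stat = -kB * math.log(2)    # statistical entropy drop: kB log 2
dS_thermo = -(eps / 2) / T_a   # heat eps/2 released at temperature T_a

print(dS_stat, dS_thermo)      # equal by construction: -kB log 2 each
```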
6.3 Entropy as Ignorance: Information and Memory
The most general interpretation of entropy is as a measure of our ignorance about a system. The equilibrium state of a system maximizes the entropy because we have lost all information about the initial conditions except for the conserved quantities: maximizing the entropy maximizes our ignorance about the details of the system. The entropy of a glass, or of our mixture of black and white atoms, is a measure of the number of arrangements the atoms could be in, given our ignorance.

This interpretation, that entropy is not a property of the system but of our knowledge about the system (represented by the ensemble of possibilities), cleanly resolves many otherwise confusing issues. The atoms in a glass are in a definite configuration, which we could measure using some futuristic X-ray holographic technique. If we did so, our ignorance would disappear, and the residual entropy would become zero for us.[26] We could in principle use our knowledge of the glass atom positions to extract work out of the glass, which would have been impossible before measuring the positions.

[26] Of course, the X-ray holographic process must create at least as much entropy during the measurement as the glass loses.
So far, we have confined ourselves to cases where our ignorance is maximal, where all allowed configurations are equally likely. What about systems where we have partial information, where some configurations are more probable than others? There is a powerful generalization of the definition of entropy to general probability distributions, which we will introduce in subsection 6.3.1 for traditional statistical mechanical systems approaching equilibrium. In section 6.3.2 we will show that this nonequilibrium entropy provides a generally useful measure of our ignorance about a wide variety of systems, with broad applications outside of traditional physics.

6.3.1 Nonequilibrium Entropy

So far, we have defined the entropy only for systems in equilibrium, where the entropy is a constant. But the second law of thermodynamics tells us that entropy increases, presupposing some definition of entropy for non-equilibrium systems. Any non-equilibrium state of a classical Hamiltonian system can be described with a probability density ρ(P, Q) on phase space. We'd like to have a formula for the entropy in terms of this probability density.

In the case of the microcanonical ensemble, where ρ(P, Q) = 1/Ω(E), we certainly want S to agree with our equilibrium formula 3.27:

    S_equil = kB log Ω = −kB log(ρ_microcanonical).    (6.21)

Now, when we are out of equilibrium, ρ won't be a constant. We'll need some kind of average of −kB log ρ over the phase-space volume. Since ρ(P, Q) is the probability of being at a given point in phase space, the average of any observable is ⟨A⟩ = ∫ dP dQ ρ(P, Q) A(P, Q), leading us
to a formula for the entropy valid even when it is not in equilibrium:

    S_nonequil = −kB ⟨log ρ⟩ = −kB ∫ ρ log ρ.    (6.22)

Is this the right formula to use? We'll see in subsection 6.3.2 that it has several important general properties. For now, we need to know that it behaves properly for the two cases of non-uniform probability distributions we've seen so far: the various equilibrium ensembles, and weakly coupled subsystems.

This entropy is maximized for the microcanonical (subsection 6.3.2), canonical, and grand canonical ensembles, under suitable constraints. You can argue this with Lagrange multipliers (exercise 6.4).
This entropy is additive for weakly coupled subsystems. In arguing
for the definition of temperature (section 3.3), we implicitly discussed
non-equilibrium entropies for a system with two weakly coupled parts.
That is, we calculated the entropy of a system with two parts as a
function of the amount of energy in each: S(E) = S_1(E_1) + S_2(E − E_1).27
This is an important property: we want the entropies of weakly
coupled, uncorrelated systems to add.28 Let's check this. The states
of the total system are pairs (s_1, s_2) of states from the two separate
systems. The probability density that the first system is in state s_1 =
(P_1, Q_1) and the second system is in state s_2 = (P_2, Q_2) is ρ((s_1, s_2)) =
ρ_1(P_1, Q_1) ρ_2(P_2, Q_2).29 The total entropy of the combined system, by
formula 6.22 and using ∫ρ_1 = ∫ρ_2 = 1, is

  S = −k_B ∫ dP_1 dQ_1 dP_2 dQ_2 ρ_1(P_1, Q_1) ρ_2(P_2, Q_2) log(ρ_1(P_1, Q_1) ρ_2(P_2, Q_2))   (6.23)
    = −k_B ∫ ρ_1 ρ_2 (log ρ_1 + log ρ_2)
    = −k_B ∫ ρ_1 log ρ_1 − k_B ∫ ρ_2 log ρ_2
    = S_1(E_1) + S_2(E_2).

27 We then argued that energy would flow from one to the other until the
temperatures matched and entropy was maximized.
28 In thermodynamics, one would say that S is an extensive variable, that
grows in proportion to the system size.

The non-equilibrium entropy formula (6.22) appears in various
disguises. For discrete systems, it is written as a sum over the
probabilities p_i of the states i:

  S_discrete = −k_B Σ_i p_i log p_i;   (6.24)

for quantum systems it is written in terms of the density matrix ρ
(section 7.1):

  S_quantum = −k_B Tr(ρ log ρ).   (6.25)

29 This is just what we mean by uncorrelated: the probabilities for system #1 are
independent of those for system #2, so the probability for the pair is the product of
the probabilities.
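The additivity just derived is easy to check numerically for discrete distributions, using the sum form 6.24. A minimal sketch in Python (with k_B set to one; the two example distributions are invented for illustration):

```python
import math

def entropy(p, k=1.0):
    # S = -k * sum_i p_i log p_i, skipping zero-probability states
    # (p log p -> 0 as p -> 0).
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0)

# Two uncorrelated subsystems: the joint density is the product rho1 * rho2.
p1 = [0.5, 0.3, 0.2]
p2 = [0.7, 0.2, 0.1]
joint = [a * b for a in p1 for b in p2]

S1, S2, S12 = entropy(p1), entropy(p2), entropy(joint)
assert abs(S12 - (S1 + S2)) < 1e-12  # entropies of uncorrelated systems add
```

Because the joint probabilities factor, log(ρ_1 ρ_2) splits into a sum and the check passes to machine precision.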
c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity
94 Entropy

Finally, we must notice an important point. S_nonequil and S_quantum are
defined for the microscopic laws of motion. However, in section 6.1 we
argued that the microscopic laws were time-reversal invariant, and the
increase of entropy must be used to define the future. Thus, we can
guess that these microscopic entropies will be time-independent: you
can show this explicitly in exercises 6.5 and 7.2. No information is lost
(in principle) by evolving a closed system in time. Entropy (and our
ignorance) increases only in coarse-grained theories where we ignore or
exclude some degrees of freedom (internal or external).

6.3.2 Information Entropy

Understanding ignorance is central to many elds! Entropy as a mea-
sure of ignorance has been useful in everything from the shuing of
cards to reconstructing noisy images. For these other applications, the
connection with temperature is unimportant, so we dont need to make
use of Boltzmanns constant. Instead, we normalize the entropy with
the constant kS = 1/ log(2):

Snonequil = kS pi log pi . (6.26)
This normalization was introduced by Shannon [108], and the
formula 6.26 is referred to as Shannon entropy in the context of
information theory. Shannon noted that this entropy, applied to the ensemble
of possible messages or images, can be used to put a fundamental limit
on the amount they can be compressed30 to efficiently make use of disk
space or a communications channel (exercises 6.7 and 6.8). A low entropy
data set is highly predictable: given the stream of data so far, we
can predict the next transmission with some confidence. In language,
siblings can often complete sentences for one another. In image
transmission, if the last six pixels were white, the region being depicted is
likely a white background, and the next pixel is also likely white. One
need only transmit or store data that violates our prediction. The entropy
measures our ignorance: how likely the best predictions about the
rest of the message are to be wrong.

30 Lossless compression schemes (files ending in gif, png, zip, and gz) remove
the redundant information in the original files, and their efficiency is limited
by the entropy of the ensemble of files being compressed. Lossy compression
schemes (files ending in jpg, mpg, and mp3) also remove information that is
thought to be unimportant for humans looking at or listening to the files.
Entropy is so useful in these various fields because it is the unique
(continuous) function that satisfies three key properties.31 In this section,
we will first explain what these three properties are and why they
are natural for any function that measures ignorance. We will show
our nonequilibrium Shannon entropy satisfies these properties; in
exercise 6.11 you will show that this entropy is the only function to do so.
Your roommate has lost their keys: they are asking for your advice.
We want to measure the roommate's progress in finding the keys by
measuring your ignorance with some function S.32 Suppose there are
Ω possible sites A_k that they might have left the keys, which you estimate
have probabilities p_k = P(A_k), with Σ_{i=1}^{Ω} p_i = 1.
What are the three key properties we want our ignorance function
S(p_1, ..., p_Ω) to have? The first two are easy.

31 Unique, that is, up to the overall constant k_S or k_B.
32 For now, S is an unknown function: we're showing that the entropy is a
good candidate.

(1) Entropy is maximum for equal probabilities. Without further
information, surely the best plan is for your roommate to look first at the
most likely site, which maximizes p_i. Your ignorance must therefore be
maximal if all sites have equal likelihood:

  S(1/Ω, ..., 1/Ω) > S(p_1, ..., p_Ω) unless p_i = 1/Ω for all i.   (6.27)

In exercise 6.4 you'll show that S is an extremum when all probabilities
are equal. Here we use the convexity of x log x (figure 6.9) to show it
is a maximum. First, we notice that the function f(p) = −p log p is
concave (convex downward, figure 6.9). For a concave function f, the
average value of f(p) over a set of points p_k is less than or equal
to f evaluated at the average:33

  (1/Ω) Σ_k f(p_k) ≤ f((1/Ω) Σ_k p_k).   (6.29)

But this tells us that

  S(p_1, ..., p_Ω) = k_S Σ_k f(p_k) = Ω k_S [(1/Ω) Σ_k f(p_k)]
    ≤ Ω k_S f((1/Ω) Σ_k p_k) = Ω k_S f(1/Ω)   (6.30)
    = S(1/Ω, ..., 1/Ω).

Fig. 6.9 Entropy is Concave. For x ≥ 0, f(x) = −x log x is strictly
convex downward (concave): for 0 < λ < 1, the linear interpolation lies
below the curve,

  λ f(a) + (1 − λ) f(b) ≤ f(λa + (1 − λ)b).   (6.31)

We know f is concave because its second derivative, −1/x, is everywhere
negative.

(2) Entropy is unaffected by extra states of zero probability. If there
is no possibility that the keys are in your shoe (site A_Ω), then your
ignorance is no larger than it would have been if you hadn't included
your shoe in the list of possible sites:

  S(p_1, ..., p_{Ω−1}, 0) = S(p_1, ..., p_{Ω−1}).   (6.32)

33 Equation 6.29 can be proven by induction from the definition of concave
(equation 6.31). For Ω = 2, we use λ = 1/2, a = p_1, and b = p_2 to see that
f((p_1 + p_2)/2) ≥ (1/2)(f(p_1) + f(p_2)). For general Ω, we use
λ = (Ω − 1)/Ω, a = (Σ_{k=1}^{Ω−1} p_k)/(Ω − 1), and b = p_Ω to see

  f((Σ_{k=1}^{Ω} p_k)/Ω) = f(((Ω − 1)/Ω) (Σ_{k=1}^{Ω−1} p_k)/(Ω − 1) + (1/Ω) p_Ω)
    ≥ ((Ω − 1)/Ω) f((Σ_{k=1}^{Ω−1} p_k)/(Ω − 1)) + (1/Ω) f(p_Ω)
    ≥ ((Ω − 1)/Ω) (1/(Ω − 1)) Σ_{k=1}^{Ω−1} f(p_k) + (1/Ω) f(p_Ω)
    = (1/Ω) Σ_{k=1}^{Ω} f(p_k),   (6.28)

where in the third line we have used the truth of equation 6.29 for Ω − 1 to
inductively prove it for Ω.
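The concavity argument can be spot-checked numerically: random distributions never beat the uniform one. A small sketch (with k_S = 1; the number of sites and the random trials are arbitrary choices):

```python
import math
import random

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

Omega = 8
S_uniform = math.log(Omega)  # S(1/Omega, ..., 1/Omega) with k_S = 1

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(Omega)]
    total = sum(w)
    p = [wi / total for wi in w]
    # Property (1), eq. 6.27: the uniform distribution maximizes S.
    assert entropy(p) <= S_uniform + 1e-12
```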

This is true for the Shannon entropy because p log p → 0 as p → 0.
To aid in the search, you'll likely ask the roommate where they were
when they last saw the keys. Suppose there are M locations B_ℓ that
the roommate may have been (opening the apartment door, driving the
car, in the basement laundry room, ...), with probabilities q_ℓ. Surely
the likelihood that the keys are currently in a coat pocket is larger if
the roommate was outdoors when the keys were last seen. Let r_kℓ =
P(A_k and B_ℓ) be the probability that the keys are at site k and were last
seen at location ℓ, and

  P(A_k | B_ℓ) = c_kℓ = r_kℓ / q_ℓ   (6.33)

be the conditional probability, given that they were last seen at B_ℓ, that
the keys are at site A_k.34 Clearly

  Σ_k P(A_k | B_ℓ) = Σ_k c_kℓ = 1:   (6.34)

wherever they were last seen, the keys are now somewhere with
probability one.

34 The conditional probability P(A|B) [read "P of A given B"] times P(B) is
of course the probability of A and B both occurring.
Before you ask your roommate where the keys were last seen, you have
ignorance S(A) = S(p_1, ..., p_Ω) about the site of the keys, and ignorance
S(B) = S(q_1, ..., q_M) about the location they were last seen. You have a
joint ignorance about the two questions given by the ignorance function
applied to all Ω M conditional probabilities:

  S(AB) = S(r_11, r_12, ..., r_1M, r_21, ..., r_ΩM)   (6.35)
        = S(c_11 q_1, c_12 q_2, ..., c_1M q_M, c_21 q_1, ..., c_ΩM q_M).

After the roommate answers your question, your ignorance about the
location last seen is reduced to zero (decreased by S(B)). If the location
last seen was in the laundry room (site B_ℓ), the probability for the keys
being at A_k shifts to c_kℓ and your ignorance about the site of the keys
is now

  S(A|B_ℓ) = S(c_1ℓ, ..., c_Ωℓ).   (6.36)

So, your combined ignorance has decreased from S(AB) to S(A|B_ℓ).
We can measure the usefulness of your question by the expected
amount that it decreases your ignorance about where the keys reside.
The expected ignorance after the question is answered is given by
weighting the ignorance for each answer B_ℓ by the probability q_ℓ of that
answer:

  ⟨S(A|B_ℓ)⟩_B = Σ_ℓ q_ℓ S(A|B_ℓ).   (6.37)
This leads us to the third key property for an ignorance function.
(3) Entropy change for conditional probabilities. How should a good
ignorance function behave for conditional probabilities? If we start with
the joint distribution AB, and then measure B, it would be tidy if, on
average, your joint ignorance declined by your original ignorance of B:

  ⟨S(A|B_ℓ)⟩_B = S(AB) − S(B).   (6.38)


Does the Shannon entropy satisfy property (3)? The conditional
probability S(A|B_ℓ) = −k_S Σ_k c_kℓ log c_kℓ, since the c_kℓ are the probability
distribution for the A_k sites given location ℓ. So,35

  S(AB) = −k_S Σ_kℓ c_kℓ q_ℓ log(c_kℓ q_ℓ)
        = −k_S Σ_kℓ [c_kℓ q_ℓ log(c_kℓ) + c_kℓ q_ℓ log(q_ℓ)]
        = Σ_ℓ q_ℓ [−k_S Σ_k c_kℓ log(c_kℓ)] − k_S Σ_ℓ q_ℓ log(q_ℓ)
        = Σ_ℓ q_ℓ S(A|B_ℓ) + S(B)
        = ⟨S(A|B_ℓ)⟩_B + S(B)   (6.39)

and the Shannon entropy satisfies the third key condition for a measure
of ignorance, equation 6.38.

35 Notice that this argument is almost the same as the proof that entropy is
additive (equation 6.23). There we assumed A and B were uncorrelated, in
which case c_kℓ = p_k and S(AB) = S(A) + S(B).
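Property (3) is an exact identity, so it holds for any joint distribution. A quick numerical check (the joint distribution here is randomly generated, purely for illustration):

```python
import math
import random

def S(p):
    return -sum(x * math.log(x) for x in p if x > 0)

random.seed(1)
Omega, M = 5, 4
# Random joint distribution r[k][l] = P(A_k and B_l), normalized to one.
r = [[random.random() for _ in range(M)] for _ in range(Omega)]
total = sum(sum(row) for row in r)
r = [[x / total for x in row] for row in r]

q = [sum(r[k][l] for k in range(Omega)) for l in range(M)]  # P(B_l)
S_AB = S([r[k][l] for k in range(Omega) for l in range(M)])
S_B = S(q)
# Expected conditional ignorance <S(A|B_l)>_B of eq. 6.37,
# with c_kl = r_kl / q_l as in eq. 6.33.
S_cond = sum(q[l] * S([r[k][l] / q[l] for k in range(Omega)])
             for l in range(M))

assert abs(S_cond - (S_AB - S_B)) < 1e-12  # eq. 6.38
```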

Entropy is an emergent property. Unlike energy conservation,
which is inherited from the microscopic theory, entropy is a constant
for a closed system treated microscopically (6.5(a)). Entropy increases
because information is lost either to the outside world, to unimportant
internal degrees of freedom (diffusion equation, 6.6), or to measurement
inaccuracies in the initial state (Lyapunov exponents 6.12, Poincaré cat
map 6.5(b)).
Entropy is a general measure of ignorance, useful far outside its
traditional applications (6.4) in equilibrium systems. It is the unique
function to have the appropriate properties to measure ignorance (6.11).
It has applications to glasses (6.9) and to defining fractal dimensions
(6.14). It is fascinating that entropy, our ignorance about the system,
can exert real forces (e.g. in rubber bands, 6.10).
Entropy provides fundamental limits on engine efficiency (6.2, 6.3),
data compression (6.7, 6.8), memory storage (to avoid a black hole!
6.13(c)), and to intelligent life at the end of the universe (6.1).

(6.1) Life and the Heat Death of the Universe. (Basic, Astrophysics) [27]
Freeman Dyson discusses how living things might evolve to cope with
the cooling and dimming we expect during the heat death of the universe.
Normally one speaks of living things as beings that consume energy to
survive and proliferate. This is of course not correct: energy is conserved,
and cannot be consumed. Living beings intercept entropy flows: they use
low entropy sources of energy (e.g., high temperature solar radiation for
plants, candy bars for us) and emit high entropy forms of the same
energy (body heat).
Dyson ignores the survival and proliferation issues; he's interested in
getting a lot of thinking in before the universe ends. He presumes that an
intelligent being generates a fixed entropy ΔS per thought. (This
correspondence of information with entropy is a standard idea from
computer science: see problems 6.7 and 6.8.)
Energy needed per thought. Assume that the being draws heat Q
from a hot reservoir at T_1 and radiates it away to a cold reservoir at T_2.
(a) What is the minimum energy Q needed per thought, in terms of
ΔS and T_2? You may take T_1 very large. Related formulæ:
ΔS = Q_2/T_2 − Q_1/T_1; First Law: Q_1 − Q_2 = W (energy is conserved).
Time needed per thought to radiate energy. Dyson shows, using
theory not important here, that the power radiated by our
intelligent-being-as-entropy-producer is

no larger than CT_2^3, a constant times the cube of the cold
temperature.36
(b) Write an expression for the maximum rate of thoughts per unit
time dH/dt (the inverse of the time Δt per thought), in terms of ΔS, C,
and T_2.
Number of thoughts for an ecologically efficient being. Our
universe is expanding: the radius R grows roughly linearly in time t. The
microwave background radiation has a characteristic temperature
Θ(t) ∼ R^{−1} which is getting lower as the universe expands: this redshift
is due to the Doppler effect. An ecologically efficient being would
naturally try to use as little heat as possible, and so wants to choose T_2
as small as possible. It cannot radiate heat at a temperature below
T_2 = Θ(t) = A/t.
(c) How many thoughts H can an ecologically efficient being have
between now and time infinity, in terms of ΔS, C, A, and the current
time t_0?
Time without end: Greedy beings. Dyson would like his beings
to be able to think an infinite number of thoughts before the universe
ends, but consume a finite amount of energy. He proposes that his
beings need to be profligate in order to get their thoughts in before the
world ends: he proposes that they radiate at a temperature
T_2(t) ∼ t^{−3/8} which falls with time, but not as fast as Θ(t) ∼ t^{−1}.
(d) Show that with Dyson's cooling schedule, the total number of
thoughts H is infinite, but the total energy consumed U is finite.

(6.2) P-V Diagram. (Basic, Thermodynamics)
A monatomic ideal gas in a piston is cycled around the path in the
P-V diagram in figure 6.10. Leg a cools at constant volume by
connecting to a heat bath at T_c; leg b heats at constant pressure by
connecting to a heat bath at T_h; leg c compresses at constant temperature
while remaining connected to the bath at T_h.
Which of the following are true?
(T) (F) The cycle is reversible: no net entropy is created in the universe.
(T) (F) The cycle acts as a refrigerator, using work from the piston to
draw energy from the cold bath into the hot bath, cooling the cold bath.
(T) (F) The cycle acts as an engine, transferring heat from the hot bath
to the cold bath and doing positive net work on the outside world.
(T) (F) The work done per cycle has magnitude |W| = P_0 V_0 |4 log 4 − 3|.
(T) (F) The heat transferred into the cold bath, Q_c, has magnitude
|Q_c| = (9/2) P_0 V_0.
(T) (F) The heat transferred from the hot bath Q_h, plus the net work
W done by the piston onto the gas, equals the heat Q_c transferred into
the cold bath.
Related formulæ: PV = N k_B T, U = (3/2) N k_B T, ΔS = Q/T,
W = −∫ P dV, ΔU = Q + W. Notice that the signs of the various terms
depend on convention (heat flow out vs. heat flow in): you should figure
the signs on physical grounds.

Fig. 6.10 P-V diagram. (Axes: volume V from V_0 to 4V_0, pressure P
from P_0 to 4P_0; legs a, b, and the isotherm c are marked, with baths
T_c and T_h.)

(6.3) Carnot Refrigerator. (Basic, Thermodynamics)
Our refrigerator is about 2m × 1m × 1m, and has insulation about
3cm thick. The insulation is probably polyurethane, which has a thermal
conductivity of about 0.02 W/(m K). Assume that the refrigerator
interior is at 270 K, and the room is at 300 K.
(a) How many watts of energy leak from our refrigerator through this
insulation?
Our refrigerator runs at 120 V, and draws a maximum of 4.75 amps.
The compressor motor turns on every once in a while for a few minutes.
(b) Suppose (i) we don't open the refrigerator door, (ii) the thermal
losses are dominated by the leakage through the foam and not through
the seals around the doors, and (iii) the refrigerator runs as a perfectly
efficient Carnot cycle. How much power on average will our refrigerator
need to operate? What fraction of the time will the motor run?

36 The constant scales with the number of electrons in the being, so we can
think of our answer Δt as the time per thought per mole of electrons.
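The arithmetic for exercise 6.3 can be organized as below. This is only a plausibility sketch: it treats all six faces as flat slabs of the stated thickness and ignores the door seals, corners, and motor inefficiency.

```python
# Conduction leak P = kappa * A * dT / d through the insulation,
# then the Carnot refrigerator coefficient of performance
# COP = T_c / (T_h - T_c) sets the minimum electrical power.
kappa = 0.02                    # thermal conductivity, W/(m K)
d = 0.03                        # insulation thickness, m
area = 2 * (2*1 + 2*1 + 1*1)    # surface area of a 2m x 1m x 1m box, m^2
T_c, T_h = 270.0, 300.0         # interior and room temperatures, K

P_leak = kappa * area * (T_h - T_c) / d   # watts leaking inward
cop = T_c / (T_h - T_c)                   # ideal Carnot refrigerator
P_input = P_leak / cop                    # average electrical power needed
duty = P_input / (120 * 4.75)             # fraction of time at maximum draw

print(P_leak, P_input, duty)
```

The leak comes out to a few hundred watts, while the Carnot coefficient of performance cuts the required average motor power down to a few percent of the maximum draw.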


(6.4) Lagrange. (Thermodynamics) 37
Lagrange Multipliers. Lagrange multipliers allow one to find the
extremum of a function f(x) given a constraint g(x) = g_0. One extremizes

  f(x) + λ (g(x) − g_0)   (6.40)

as a function of λ and x. The derivative with respect to λ being zero
enforces the constraint and sets λ. The derivatives with respect to
components of x then include terms involving λ, which act to enforce the
constraint.
Let us use Lagrange multipliers to find the maximum of the
nonequilibrium entropy

  S = −k_B ∫ ρ(P, Q) log ρ(P, Q)
    = −k_B Tr(ρ log ρ)   (6.41)
    = −k_B Σ p_i log p_i

constraining the normalization, energy, and number. You may use
whichever form of the entropy you prefer: the first continuous form will
demand some calculus of variations (see [68, ch. 12]); the last discrete
form is mathematically the most straightforward.
(a) Microcanonical: Using a Lagrange multiplier to enforce the
normalization

  Tr(ρ) = ∫ ρ(P, Q) = 1,   (6.42)

show that the probability distribution that extremizes the entropy is a
constant (the microcanonical distribution).
(b) Canonical: Integrating over all P and Q, use another Lagrange
multiplier to fix the mean energy ⟨E⟩ = ∫ dP dQ H(P, Q) ρ(P, Q). Show
that the canonical distribution maximizes the entropy given the
constraints of normalization and fixed energy.
(c) Grand Canonical: Summing over different numbers of particles N
and adding the constraint that the average number is
⟨N⟩ = Σ_N ∫ dP dQ N ρ_N(P, Q), show that you get the grand canonical
distribution by maximizing the entropy.

(6.5) Does Entropy Increase? (Mathematics)
The second law of thermodynamics says that entropy always increases.
Perversely, it's easy to show that in an isolated system, no matter what
non-equilibrium condition it starts in, entropy as precisely defined stays
constant in time.
Entropy is Constant: Classical. 38 Liouville's theorem tells us that
the total derivative of the probability density is zero: following the
trajectory of a system, the local probability density never changes. The
equilibrium states have probability densities that only depend on energy
and number. Clearly something is wrong: if the density starts
non-uniform, how can it become uniform?
(a) Show for any function f(ρ) that
∂f(ρ)/∂t = −∇·[f(ρ)V] = −Σ_α [∂/∂p_α (f(ρ) ṗ_α) + ∂/∂q_α (f(ρ) q̇_α)],
where V = (Ṗ, Q̇) is the 6N-dimensional velocity in phase space. Hence
(by Gauss's theorem in 6N dimensions), show ∫ ∂f(ρ)/∂t dP dQ = 0,
assuming that the probability density vanishes at large momenta and
positions and f(0) = 0. Show, thus, that the entropy
S = −k_B ∫ ρ log ρ is constant in time.
We will see that the quantum version of the entropy is also constant
for a Hamiltonian system in problem 7.2.
The Arnold Cat. Why do we think entropy increases? First, points
in phase space don't just swirl in circles: they get stretched and twisted
and folded back in complicated patterns, especially in systems where
statistical mechanics seems to hold! Arnold, in a takeoff on Schrödinger's
cat, suggested the following analogy. Instead of a continuous
transformation of phase space onto itself preserving 6N-dimensional volume,
let's think of an area-preserving mapping of an n × n square in the plane
into itself.39 Consider the mapping

  Γ(x, y) = (x + y, x + 2y) mod n.   (6.43)

See the map in figure 6.11.
(b) Check that Γ preserves area. (It's basically multiplication by the
matrix M = [[1, 1], [1, 2]]. What is the determinant of M?) Show that
it takes a square n × n (or a

37 Lagrange (1736-1813).
38 We'll see in problem 7.2 that the non-equilibrium entropy is also constant in
quantum systems.
39 For our purposes, the Arnold cat just shows that volume-preserving
transformations can scramble a small region uniformly over a large one. More
general, nonlinear area-preserving maps of the plane are often studied as simple
Hamiltonian-like dynamical systems. Area-preserving maps come up as Poincaré
sections of Hamiltonian systems 4.2, with the area weighted by the inverse of the
velocity with which the system passes through the cross-section. They come up in
particular in studies of high-energy particle accelerators, where the mapping gives
a snapshot of the particles after one orbit around the ring.
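The area-preservation in part (b) can be checked directly on pixels: because M has integer entries and determinant 1, the map of eq. 6.43 permutes the n² lattice points exactly. A short sketch (the grid size n = 50 is an arbitrary choice):

```python
import math

n = 50
pixels = {(x, y) for x in range(n) for y in range(n)}
# The cat map (x, y) -> (x + y, x + 2y) mod n of eq. 6.43.
mapped = {((x + y) % n, (x + 2 * y) % n) for (x, y) in pixels}
assert mapped == pixels  # a bijection: no pixels (area) created or destroyed

# Eigenvalues of M = [[1, 1], [1, 2]] solve lambda^2 - 3*lambda + 1 = 0.
lam_plus = (3 + math.sqrt(5)) / 2    # ~2.618: stretching direction
lam_minus = (3 - math.sqrt(5)) / 2   # ~0.382: squeezing direction
assert abs(lam_plus * lam_minus - 1.0) < 1e-12  # product equals det M = 1
```

The stretching eigenvalue larger than one, paired with a squeezing eigenvalue smaller than one, is what draws a small circle out into the thin strip described in the text.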

Fig. 6.11 Arnold Cat Transform, from reference [80]; see movie too [89].

picture of n × n pixels) and maps it into itself with periodic boundary
conditions. (With less cutting and pasting, you can view it as a map
from the torus into itself.) As a linear map, find the eigenvalues and
eigenvectors. Argue that a small neighborhood (say a circle in the center
of the picture) will initially be stretched along an irrational direction
into a thin strip (figure 6.12).

Fig. 6.12 A small circular region stretches along an irrational angle
under the Arnold cat map. The center of the figure is the origin
x = 0, y = 0.

When this thin strip hits the boundary, it gets split into two; in the
case of an n × n square, further iterations stretch and chop our original
circle into a thin line uniformly covering the square. In the pixel case,
there are always exactly the same number of pixels that are black,
white, and each shade of gray: they just get so kneaded together that
everything looks a uniform color. So, by putting a limit to the resolution
of our measurement (rounding errors on the computer, for example), or
by introducing any tiny coupling to the external world, the final state
can be seen to rapidly approach equilibrium, proofs to the contrary
notwithstanding!

(6.6) Entropy Increases: Diffusion.
We saw that entropy technically doesn't increase for a closed system,
for any Hamiltonian, either classical or quantum. However, we can show
that entropy increases for most of the coarse-grained effective theories
that we use in practice: when we integrate out degrees of freedom, we
provide a means for the information about the initial condition to be
destroyed. Here you'll show that entropy increases for the diffusion
equation.
Diffusion Equation Entropy. Let ρ(x, t) obey the one-dimensional
diffusion equation ∂ρ/∂t = D ∂²ρ/∂x². Assume that the density ρ and
all its gradients die away rapidly at x = ±∞.40

40 Also, you may assume ∂ⁿρ/∂xⁿ log ρ goes to zero at x = ±∞, even though
log ρ goes to −∞.
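Before doing the integration by parts analytically, one can watch the entropy grow in a discretized version of the diffusion equation. A rough sketch (forward-Euler finite differences; the grid, time step, and initial Gaussian are arbitrary choices, not from the text):

```python
import math

D, dx, dt = 1.0, 0.1, 0.002   # D*dt/dx**2 = 0.2 keeps the scheme stable
nx = 101
x = [dx * i - 5.0 for i in range(nx)]
rho = [math.exp(-xi * xi / 0.5) for xi in x]   # narrow initial Gaussian
norm = sum(rho) * dx
rho = [r / norm for r in rho]

def entropy(rho):
    # discrete version of S = -integral of rho log rho dx (k_B = 1)
    return -sum(r * math.log(r) for r in rho if r > 1e-300) * dx

S_prev = entropy(rho)
for _ in range(200):
    lap = [0.0] + [rho[i + 1] - 2 * rho[i] + rho[i - 1]
                   for i in range(1, nx - 1)] + [0.0]
    rho = [r + D * dt / dx**2 * l for r, l in zip(rho, lap)]
    S_new = entropy(rho)
    assert S_new > S_prev   # the entropy increases at every step
    S_prev = S_new
```

For a spreading Gaussian the exact entropy is (1/2) log(4πDt e) at late times, and the discrete entropy tracks that growth.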

Derive a formula for the time derivative of the entropy
S = −k_B ∫ ρ(x) log ρ(x) dx and show that it strictly increases in time.
(Hint: integrate by parts. You should get an integral of a positive
definite quantity.)

(6.7) Information entropy. (Basic, Computer Science, Mathematics,
Complexity)
Entropy is a measure of your ignorance about a system: it is a measure
of the lack of information. It has important implications in
communication technologies: messages passed across the Ethernet communicate
information, reducing the information entropy for the receiver.
Shannon [108] worked out the use of entropy ideas in communications,
focusing on problems where different messages have different probabilities.
We'll focus on the simpler problem where all N messages are equally
likely. Shannon defines the information entropy of an unread message
as being log_2 N = k_S log N, where k_S = 1/(log_e 2) is analogous to
Boltzmann's constant, and changes from log-base-e to log-base-2 (more
convenient for computers, which think in base two).
Your grandparent has sent you an e-mail message. From the header of
the message, you know it contains 1000 characters. You know each
character is made of 8 bits, which allows 2^8 = 256 different letters or
symbols per character. Assuming all possible messages from your
grandparent are equally likely (a typical message would then look like
G*me!8V[beep]...), how many different messages N could there be? This
(unrealistic) assumption gives an upper bound for the information
entropy S_max.
(a) What is S_max for the unread message?
Your grandparent writes rather dull messages: they all fall into the
same pattern. They have a total of 16 equally likely messages.41 After
you read the message, you forget the details of the wording anyhow, and
only remember these key points of information.
(b) What is the actual information entropy change ΔS_Shannon you
undergo when reading the message? If your grandparent writes one
message per month, what is the minimum number of 8-bit characters per
year that it would take to send your grandparent's messages? (You may
lump multiple messages into a single character.) (Hints: ΔS_Shannon is
the change in entropy from before you read the message to after you
read which of 16 messages it was. The length of 1000 is not important
for this part.)
Remark: This is an extreme form of data compression, like that used
in gif images, zip files (Windows) and gz files (Unix). We are asking for
the number of characters per year for an optimally compressed signal.

(6.8) Shannon entropy. (Computer Science)
Entropy can be viewed as a measure of the lack of information you
have about a system. Claude Shannon [108] realized, back in the 1940s,
that communication over telephone wires amounts to reducing the
listener's uncertainty about the sender's message, and introduced a
definition of an information entropy.
Most natural languages (voice, written English) are highly redundant;
the number of intelligible fifty-letter sentences is many fewer than 26^50,
and the number of ten-second phone conversations is far smaller than
the number of sound signals that could be generated with frequencies
up to 20,000 Hz.42 Shannon, knowing statistical mechanics, defined the
entropy of an ensemble of messages: if there are N possible messages
that can be sent in one package, and message m is being transmitted
with probability p_m, then Shannon's entropy is

  S_I = −k_S Σ_{m=1}^{N} p_m log p_m   (6.44)

where instead of Boltzmann's constant, Shannon picked k_S = 1/log 2.
This immediately suggests a theory for signal compression. If you
can recode the alphabet so that common letters and common sequences
of letters are abbreviated, while infrequent combinations are spelled out
in lengthy fashion, you can dramatically reduce the channel capacity
needed to send the data. (This is lossless compression, like zip and gz
and gif.)
An obscure language Abc! for long-distance communication has only
three sounds: a hoot represented by A, a slap represented by B, and a
click represented by C. In a typical message, hoots and slaps occur
equally often (p = 1/4), but clicks are twice as common (p = 1/2).
Assume the messages are otherwise random.

41 Each message mentions whether they won their bridge hand last week (a
fifty-fifty chance), mentions that they wish you would write more often (every
time), and speculates who will win the women's college basketball tournament
in their region (picking at random one of the eight teams in the league).
42 Real telephones don't span this whole frequency range: they are limited on
the low end at 300-400 Hz, and on the high end at 3000-3500 Hz. You can still
understand the words, so this simple form of data compression is only losing
non-verbal nuances in the communication [34].
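As a sanity check on formula 6.44 (and a peek at how such bounds get saturated), here is the entropy rate for the Abc! frequencies together with one possible prefix-free code. This particular code assignment is an illustration of mine, not necessarily the one the exercise intends:

```python
import math

def shannon_bits(p):
    # Shannon entropy with k_S = 1/log 2, i.e. in bits per letter.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = {"A": 0.25, "B": 0.25, "C": 0.5}   # hoots, slaps, clicks
rate = shannon_bits(p.values())
# One prefix-free binary code: C -> "0", A -> "10", B -> "11".
code = {"A": "10", "B": "11", "C": "0"}
avg_bits = sum(p[s] * len(code[s]) for s in p)

assert abs(rate - 1.5) < 1e-12       # 1.5 bits of entropy per letter
assert abs(avg_bits - rate) < 1e-12  # this code saturates the bound
```

Because no codeword is a prefix of another, a stream of codewords can be decoded unambiguously one letter at a time.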

(a) What is the Shannon entropy in this language? More specifically,
what is the Shannon entropy rate (entropy per sound, or letter,
transmitted)?
(b) Show that a communication channel transmitting bits (ones and
zeros) can transmit no more than one unit of Shannon entropy per bit.
(Hint: this should follow by showing that, for N = 2^n messages,
equation 6.44 is maximized by p_m = 1/N. You needn't prove it's a global
maximum: check that it is a local extremum. You'll need either a
Lagrange multiplier or will need to explicitly set
p_N = 1 − Σ_{m=1}^{N−1} p_m.)
(c) In general, argue that the Shannon entropy gives the minimum
number of bits needed to transmit the ensemble of messages. (Hint:
compare the Shannon entropy of the N original messages with the
Shannon entropy of the N (shorter) encoded messages.) Calculate the
minimum number of bits per letter on average needed to transmit
messages for the particular case of an Abc! communication channel.
(d) Find a compression scheme (a rule that converts a Abc! message
to zeros and ones, that can be inverted to give back the original message)
that is optimal, in the sense that it saturates the bound you derived in
part (b). (Hint: Look for a scheme for encoding the message that
compresses one letter at a time. Not all letters need to compress to the
same number of bits.)
Shannon also developed a measure of the channel capacity of a noisy
wire, and discussed error correction codes...

(6.9) Entropy of Glasses. [59]
Glasses aren't really in equilibrium. In particular they do not obey
the third law of thermodynamics, that the entropy S goes to zero at
zero temperature. Experimentalists measure a residual entropy by
subtracting the entropy change from the known entropy of the equilibrium
liquid at a temperature T_ℓ at or above the crystalline melting
temperature T_c:

  S_residual = S_liquid(T_ℓ) − ∫_0^{T_ℓ} (1/T)(dQ/dT) dT   (6.45)

where Q is the net heat flow out of the bath into the glass.
If you put a glass in an insulated box, it will warm up (very slowly)
because of microscopic atomic rearrangements which lower the potential
energy. So, glasses don't have a well-defined temperature or specific
heat. In particular, the heat flow upon cooling and on heating
(dQ/dT)(T) won't precisely match (although their integrals will agree
by conservation of energy).

Fig. 6.13 Specific heat of B_2O_3 glass measured while heating and
cooling. The glass was first rapidly cooled from the melt (500°C to 50°C
in a half hour), then heated from 33°C to 345°C in 14 hours (solid curve
with squares), cooled from 345°C to room temperature in 18 hours
(dotted curve with diamonds), and finally heated from 35°C to 325°C
(solid curve with crosses). Figure from reference [113], see also [57].

Thomas and Parks in figure 6.13 are making the approximation that
the specific heat of the glass is dQ/dT, the measured heat flow out of
the glass divided by the temperature change of the heat bath. They
find that the specific heat defined in this way measured on cooling and
heating disagree. 43 Consider the second cooling curve and the final
heating curve, from 325°C to room temperature and back. Assume that
the liquid at 325°C is in equilibrium both before cooling and after
heating (and so has the same liquid entropy S_liquid).
(a) Is the residual entropy, equation 6.45, larger on heating or on
cooling? (Hint: Use the fact that the integrals under the curves,
∫_0^T (dQ/dT′) dT′, give the heat flow, which by conservation of energy
must be the same on heating and cooling. The heating curve shifts
weight to higher temperatures: will that increase or decrease the integral
in 6.45?)
(b) By using the second law (entropy can only increase), show that
when cooling and then heating from an equilibrium liquid the residual
entropy measured on cooling must always be less than the residual
entropy measured on heating. (Hint: Consider the entropy flow into the
outside world upon cooling the liquid into the glass, compared to the
entropy flow from the outside world to heat the glass into the liquid
again. The initial and final states of the liquid are both in equilibrium.)

43 The fact that the energy lags the temperature near the glass transition, in linear

response, leads to the study of specic heat spectroscopy [11].

To be pub. Oxford UP, Fall05
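The entropy integral in equation 6.45 is easy to evaluate numerically from a measured (or, here, invented) heating or cooling curve. A minimal sketch of part (a)'s logic; the Gaussian dQ/dT curves, temperature range, and widths below are made up for illustration, not taken from any experiment:

```python
import numpy as np

# Temperature grid (kelvin) from room temperature up to the equilibrium liquid.
T = np.linspace(300.0, 600.0, 3001)
dT = T[1] - T[0]

def integrate(y):
    """Simple Riemann-sum integral over the temperature grid."""
    return float(np.sum(y) * dT)

def bump(center, width):
    """A made-up smooth dQ/dT curve, normalized so the total heat Q is one."""
    g = np.exp(-((T - center) / width) ** 2)
    return g / integrate(g)

# Equal total heat on cooling and heating (conservation of energy),
# but the heating curve shifts its weight to higher temperatures.
dQdT_cooling = bump(center=450.0, width=30.0)
dQdT_heating = bump(center=480.0, width=30.0)

# The entropy integral of eq. 6.45: int (1/T)(dQ/dT) dT.
I_cool = integrate(dQdT_cooling / T)
I_heat = integrate(dQdT_heating / T)

# The 1/T weight penalizes the higher-temperature heating curve, so its
# integral is smaller, and S_residual = S_liquid - I is larger on heating.
print(I_cool > I_heat)   # True
```

This is the answer to part (a) in miniature: shifting the same heat flow to higher temperatures shrinks the 1/T-weighted integral, so the residual entropy comes out larger on heating.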
6.3 Entropy as Ignorance: Information and Memory 103

The residual entropy of a typical glass is about k_B per molecular unit. It's a measure of how many different glassy configurations of atoms the material can freeze into.
(c) In a molecular dynamics simulation with one hundred indistinguishable atoms, and assuming that the residual entropy is k_B log 2 per atom, what is the probability that two coolings to zero energy will arrive at equivalent atomic configurations (up to permutations)? In a system with 10^23 molecular units, with residual entropy k_B log 2 per unit, about how many coolings would be needed to arrive at the original configuration again, with probability 1/2?

(6.10) Rubber Band. (Basic)

[Fig. 6.14: Simple model of a rubber band with N = 100 segments. The beginning of the polymer is at the top; the end is at the bottom; the vertical displacements are added for visualization.]

Figure 6.14 shows a simple one-dimensional model for rubber. Rubber is formed of many long polymeric molecules, which undergo random walks in the undeformed material. When we stretch the rubber, the molecules respond by rearranging their random walk to elongate in the direction of the external stretch. In our simple model, the molecule is represented by a set of N links of length d, which with equal energy point either parallel or antiparallel to the previous link. Let the total change in position to the right from the beginning of the polymer to the end be L.
As the molecule extent L increases, the entropy of our rubber molecule decreases.
(a) Find an exact formula for the entropy of this system in terms of d, N, and L. (Hint: How many ways can one divide N links into M right-pointing links and N − M left-pointing links, so that the total length is L?)
The external world, in equilibrium at temperature T, exerts a force pulling the end of the molecule to the right. The molecule must exert an equal and opposite entropic force F.
(b) Find an expression for the force F exerted by the bath on the molecule in terms of the bath entropy. (Hint: the bath temperature satisfies 1/T = ∂S_bath/∂E, and force times distance is energy.) Using the fact that the length L must maximize the entropy of the universe, write a general expression for F in terms of the internal entropy S of the molecule.
(c) Using our model of the molecule from part (a), the general law of part (b), and Stirling's formula 3.11 (dropping the square root), write the force law F(L) for our molecule for large lengths N. What is the spring constant K in Hooke's law F = −K L for our molecule, for small L?
Our model has no internal energy: this force is entirely entropic.
(d) If we increase the temperature of our rubber band while it is under tension, will it expand or contract? Why?
In a more realistic model of a rubber band, the entropy consists primarily of our configurational random-walk entropy plus a vibrational entropy of the molecules. If we stretch the rubber band without allowing heat to flow in or out of the rubber, the total entropy should stay approximately constant.^44
(e) True or false?
(T) (F) When we stretch the rubber band, it will cool: the configurational entropy of the random walk will decrease, causing the entropy in the vibrations to decrease, causing the temperature to decrease.
(T) (F) When we stretch the rubber band, it will cool: the configurational entropy of the random walk will decrease, causing the entropy in the vibrations to increase, causing the temperature to decrease.
(T) (F) When we let the rubber band relax, it will cool: the configurational entropy of the random walk will increase, causing the entropy in the vibrations to decrease, causing the temperature to decrease.
(T) (F) When we let the rubber band relax, there must be no temperature change, since the entropy is constant.
This more realistic model is much like the ideal gas, which also had no configurational energy.
(T) (F) Like the ideal gas, the temperature changes because of the net work done on the system.
(T) (F) Unlike the ideal gas, the work done on the rubber band is positive when the rubber band expands.
You should check your conclusions experimentally: find a rubber band (thick and stretchy is best), touch it to

^44 Rubber is designed to bounce well: little irreversible entropy is generated in a cycle of stretching and compression, so long as the deformation is not too abrupt.
© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity
your lips (which are very sensitive to temperature), and stretch and relax it.

(6.11) Entropy Measures Ignorance. (Mathematics)
In this exercise, you will show that the unique continuous function (up to the constant k_B) satisfying the three key properties (equations 6.27, 6.32, and 6.38):

    S(1/Ω, ..., 1/Ω) > S(p_1, ..., p_Ω) unless p_i = 1/Ω for all i,    (6.46)

    S(p_1, ..., p_Ω, 0) = S(p_1, ..., p_Ω),    (6.47)

    ⟨S(A|B_ℓ)⟩_B = S(AB) − S(B),    (6.48)

where S(A) = S(p_1, ..., p_Ω), S(B) = S(q_1, ..., q_M), ⟨S(A|B_ℓ)⟩_B = Σ_ℓ q_ℓ S(c_{1ℓ}, ..., c_{Ωℓ}), and S(AB) = S(c_{11}q_1, ..., c_{ΩM}q_M), is the Shannon entropy. The presentation is based on the excellent small book by Khinchin [49].
For convenience, define L(g) = S(1/g, ..., 1/g).
(a) For any rational probabilities p_k, let g be the least common multiple of their denominators, and let p_k = g_k/g for integers g_k. Show that

    S(A) = L(g) − Σ_k p_k L(g_k).    (6.49)

(Hint: consider AB to have g possibilities of probability 1/g, A to measure which group of size g_k, and B to measure which of the g_k members of group k.)
(b) If L(g) = k_S log g, show that equation 6.49 is the Shannon entropy 6.26.
Knowing that S(A) is the Shannon entropy for all rational probabilities, and assuming that S(A) is continuous, makes S(A) the Shannon entropy. So, we've reduced the problem to showing L(g) is the logarithm up to a constant.
(c) Show that L(g) is monotone increasing with g. (Hint: you'll need to use both of the first two properties.)
(d) Show L(g^n) = nL(g). (Hint: consider n independent probability distributions, each of g equally likely events. Use the third property recursively on n.)
(e) If 2^m < s^n < 2^{m+1}, using the results of parts (c) and (d) show

    m/n < L(s)/L(2) < (m+1)/n.    (6.50)

(Hint: how is L(2^m) related to L(s^n) and L(2^{m+1})?) Show also, using the same argument, that m/n < log(s)/log(2) < (m+1)/n. Hence, show that |L(s)/L(2) − log(s)/log(2)| < 1/n, and thus L(s) = k log s for some constant k.
Hence our ignorance function S agrees with the formula for the non-equilibrium entropy, uniquely up to an overall constant.

(6.12) Chaos, Lyapunov, and Entropy Increase. (Math, Complexity) (With Myers. [72])
Let's consider a simple dynamical system, given by a mapping from the unit interval (0, 1) into itself:^45

    f(x) = 4μx(1 − x),    (6.51)

where the time evolution is given by iterating the map:

    x_0, x_1, x_2, ... = x_0, f(x_0), f(f(x_0)), ...    (6.52)

In particular, for μ = 1 it precisely folds the unit interval in half, and stretches it (non-uniformly) to cover the original domain.
The mathematics community lumps together continuous dynamical evolution laws and discrete mappings as both being "dynamical systems". You can motivate the relationship using the Poincaré sections (figure 4.3), which connect a continuous recirculating dynamical system to the once-return map. The mapping 4.11 is not invertible, so it isn't directly given by a Poincaré section of a smooth differential equation,^46 but the general stretching and folding exhibited by our map is often seen in driven physical systems without conservation laws.
In this problem, we will focus on values of μ near one, where the motion is mostly chaotic. Chaos is sometimes defined as motion where the final position depends sensitively on the initial conditions. Two trajectories, starting a distance ε apart, will typically drift apart in time as ε e^{λt}, where λ is the Lyapunov exponent for the chaotic dynamics.
Start with μ = 0.9 and two nearby points x_0 and y_0 = x_0 + ε somewhere between zero and one. Investigate the two trajectories x_0, f(x_0), f(f(x_0)), ..., f^[n](x_0) and y_0, f(y_0), .... How fast do they separate? Estimate the Lyapunov exponent.
Many Hamiltonian systems are also chaotic. Two configurations of classical atoms or billiard balls, with initial positions and velocities that are almost identical, will rapidly diverge as the collisions magnify small initial deviations in angle and velocity into large ones. It is this

^45 We also study this map in exercises 4.3, 6.14, and 13.8.
^46 Remember the existence and uniqueness theorems from math class? The invertibility follows from uniqueness.
chaos that stretches, folds, and kneads phase space (as in the Poincaré cat map of exercise 6.5) that is at root our explanation that entropy increases.^47

(6.13) Black Hole Thermodynamics. (Astrophysics)
Astrophysicists have long studied black holes: the end state of massive stars which are too heavy to support themselves under gravity (see exercise 7.14). As the matter continues to fall into the center, eventually the escape velocity reaches the speed of light. After this point, the in-falling matter cannot ever communicate information back to the outside. A black hole of mass M has radius^48

    R_s = 2GM/c²,    (6.53)

where G = 6.67 × 10⁻⁸ cm³/(g sec²) is the gravitational constant, and c = 3 × 10¹⁰ cm/sec is the speed of light.
Hawking, by combining methods from quantum mechanics and general relativity, calculated the emission of radiation from a black hole.^49 He found a wonderful result: black holes emit perfect black-body radiation at a temperature

    T_bh = ℏc³/(8πGMk_B).    (6.54)

According to Einstein's theory, the energy of the black hole is E = Mc².
(a) Calculate the specific heat of the black hole.
The specific heat of a black hole is negative. That is, it gets cooler as you add energy to it. In a bulk material, this would lead to an instability: the cold regions would suck in more heat and get colder. Indeed, a population of black holes is unstable: the larger ones will eat the smaller ones.^50
(b) Calculate the entropy of the black hole, by using the definition of temperature 1/T = ∂S/∂E and assuming the entropy is zero at mass M = 0. Express your result in terms of the surface area A = 4πR_s², measured in units of the Planck length L_P = √(ℏG/c³) squared.
As it happens, Bekenstein had deduced this formula for the entropy somewhat earlier, by thinking about analogies between thermodynamics, information theory, and statistical mechanics. On the one hand, when black holes interact or change charge and angular momentum, one can prove in classical general relativity that the area can only increase. So it made sense to assume that the entropy was somehow proportional to the area. He then recognized that if you had some waste material of high entropy to dispose of, you could ship it into a black hole and never worry about it again. Indeed, given that the entropy represents your lack of knowledge about a system, once matter goes into a black hole one can say that our knowledge about it completely vanishes.^51 (More specifically, the entropy of a black hole represents the inaccessibility of all information about what it was built out of.) By carefully dropping various physical systems into a black hole (theoretically) and measuring the area increase compared to the entropy increase,^52 he was able to deduce these formulas purely from statistical mechanics.
We can use these results to provide a fundamental bound on memory storage.
(c) Calculate the maximum number of bits that can be stored in a sphere of radius one centimeter.

(6.14) Fractal Dimensions. (Math, Complexity) (With Myers. [72])
There are many strange sets that emerge in science. In statistical mechanics, such sets often arise at continuous phase transitions, where self-similar spatial structures arise (chapter 13). In chaotic dynamical systems, the attractor (the set of points occupied at long times after

^47 There have been speculations by some physicists that entropy increases through information dropping into black holes, either real ones or tiny virtual black-hole fluctuations (see exercise 6.13). Recent work has cast doubt that the information is really lost even then: we're told it's just scrambled, presumably much as in chaotic systems.
^48 This is the Schwarzschild radius of the event horizon for a black hole with no angular momentum or charge.
^49 Nothing can leave a black hole: the radiation comes from vacuum fluctuations just outside the black hole that emit particles.
^50 A thermally insulated glass of ice water also has a negative specific heat! The surface tension at the curved ice surface will decrease the coexistence temperature a slight amount (see section 12.2): the more heat one adds, the smaller the ice cube, the larger the curvature, and the lower the resulting temperature!
^51 Except for the mass, angular momentum, and charge. This suggests that baryon number, for example, isn't conserved in quantum gravity. It has been commented that when the baryons all disappear, it'll be hard for Dyson to build his progeny out of electrons and neutrinos: see 6.1.
^52 In ways that are perhaps too complex to do here.
106 Entropy

the transients have disappeared) is often a fractal (called For each 9, use a histogram to calculate the proba-
a strange attractor. These sets often are tenuous and bility Pn that the points fall in the nth bin
jagged, with holes on all length scales: see gures 13.2, Return the set of vectors Pn [9].
13.3, and 13.14.
You may wish to test your routine by using it for = 1
We often try to characterize these strange sets by a di-
(where the distribution should look like (x) = 1 ,
mension. The dimensions of two extremely dierent sets x(1x)

can be the same: the path exhibited by a random walk exercise 4.3(b)) and = 0.8 (where the distribution
(embedded in three or more dimensions) is arguably a should look like two -functions, each with half of the
twodimensional set (note 6 on page 15), but does not lo- points).
cally look like a surface! However, if two sets have dier- The Capacity Dimension. The denition of the ca-
ent spatial dimensions (measured in the same way) they pacity dimension is motivated by the idea that it takes at
surely are qualitatively dierent. least
There is more than one way to dene a dimension. Ncover = V /9D (6.56)
Roughly speaking, strange sets are often spatially inho- bins of size 9D to cover a D-dimensional set of volume
mogeneous, and what dimension you measure depends V .55 By taking logs of both sides we nd log Ncover
upon how you weight dierent regions of the set. In log V + D log 9. The capacity dimension is dened as the
this exercise, we will calculate the information dimension limit
(closely connected to the non-equilibrium entropy!), and log Ncover
Dcapacity = lim (6.57)
the capacity dimension (originally called the Hausdor 40 log 9
dimension, also sometimes called the fractal dimension). but the convergence is slow (the error goes roughly as
To generate our strange set along with some more or- log V / log 9). Faster convergence is given by calculating
dinary sets we will use the logistic map53 the slope of log N versus log 9:
d log Ncover
f (x) = 4x(1 x) (6.55) Dcapacity = lim (6.58)
40 d log 9
that we also study in exercises 6.12, 4.3, and 13.8. The log Ni+1 log Ni
= lim .
attractor for the logistic map is a periodic orbit (dimen- 40 log 9i+1 log 9i

sion zero) at = 0.8, and a chaotic, cusped density lling

two intervals (dimension one)54 at = 0.9. At the onset (b) Use your routine from part (a), write a routine to
of chaos at = 0.892486418 (exercise 13.8) the calculate N [9] by counting non-empty bins. Plot Dcapacity
dimension becomes intermediate between zero and one: from the fast convergence equation 6.58 versus the mid-
the attractor is strange, selfsimilar set. point 1/2 (log 9i+1 + log 9i ). Does it appear to extrapolate to
D = 1 for = 0.9?56 Does it appear to extrapolate to
Both the information dimension and the capacity dimen-
D = 0 for = 0.8? Plot these two curves together with
sion are dened in terms of the occupation Pn of cells of
the curve for . Does the last one appear to converge to
size 9 in the limit as 9 0.
D1 0.538, the capacity dimension for the Feigenbaum
(a) Write a routine which, given and a set of bin sizes attractor gleaned from the literature? How small a devia-
9, tion from does it take to see the numerical crossover
Iterates f hundreds or thousands of times (to get on to integer dimensions?
the attractor) Entropy and the Information Dimension. The en-
Iterates f many more times, collecting points on the tropy of a statistical mechanical system is given by equa-
attractor. (For , you could just integrate 2n tion 6.22, S = kB Tr( log ). In the chaotic regime this
times for n fairly large.) works ne. Our probabilities Pn (xn )9, so converting

53 We also study this map in exercises 4.3, 6.12, and 13.8.

54 See exercise 4.3. The chaotic region for the logistic map isnt a strange attrac-
tor because its conned to one dimension: period doubling cascades for dynamical
systems in higher spatial dimensions likely will have fractal, strange attractors in the
chaotic region.
55 Imagine covering the surface of a sphere in 3D with tiny cubes: the number of

cubes will go as the surface area [2D-volume] divided by /2 .

56 In the chaotic regions, keep the number of bins small compared to the number of

iterates in your sample, or you start nding empty bins between points and eventually
get a dimension of zero.
To be pub. Oxford UP, Fall05
the entropy integral into a sum (∫ f(x) dx ≈ Σ_n f(x_n)ε) gives

    S = −k_B ∫ ρ(x) log(ρ(x)) dx
      ≈ −Σ_n P_n log(P_n/ε) = −Σ_n P_n log P_n + log ε    (6.59)

(setting the conversion factor k_B = 1 for convenience).
You might imagine that the entropy for a fixed point would be zero, and the entropy for a period-n cycle would be k_B log n. But this is incorrect: when there is a fixed point or a periodic limit cycle, the attractor is on a set of dimension zero (a bunch of points) rather than dimension one. The entropy must go to minus infinity, since we have precise information about where the trajectory sits at long times. To estimate the zero-dimensional entropy k_B log n on the computer, we would take the same bins as above but sum over bins P_n instead of integrating over x:

    S_{d=0} = −Σ_n P_n log(P_n) = S_{d=1} − log(ε).    (6.60)

More generally, the natural measure of the entropy for a set with D dimensions might be defined as

    S_D = −Σ_n P_n log(P_n) + D log(ε).    (6.61)

Instead of using this formula to define the entropy, mathematicians use it to define the information dimension

    D_inf = lim_{ε→0} Σ_n P_n log P_n / log(ε).    (6.62)

The information dimension agrees with the ordinary dimension for sets that locally look like R^D. It's different from the capacity dimension because the information dimension weights each part (bin) of the attractor by the time spent in it. Again, we can speed up the convergence by noting that equation 6.61 says that Σ_n P_n log P_n is a linear function of log ε with slope D and intercept −S_D. Measuring the slope directly, we find

    D_inf = lim_{ε→0} d(Σ_n P_n(ε) log P_n(ε)) / d log ε.    (6.63)

(c) As in part (b), write a routine that plots D_inf from equation 6.63 as a function of the midpoint log ε, as we increase the number of bins. Plot the curves for μ = 0.9, μ = 0.8, and μ∞. Does the information dimension agree with the ordinary one for the first two? Does the last one appear to converge to D₁ ≈ 0.517098, the information dimension for the Feigenbaum attractor from the literature?
Most real-world fractals have a whole spectrum of different characteristic spatial dimensions: they are multifractal.
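Exercise 6.12 asks for a numerical estimate of the Lyapunov exponent of f(x) = 4μx(1−x). Following two explicit nearby trajectories works; a more stable estimator averages log|f′(x)| along a single trajectory, which agrees while the separation stays small. A minimal sketch; the starting point, transient length, and trajectory length below are arbitrary choices:

```python
from math import log

def f(x, mu):
    return 4 * mu * x * (1 - x)

def fprime(x, mu):
    return 4 * mu * (1 - 2 * x)

def lyapunov(mu, x0=0.3, transient=1000, n=100_000):
    """Estimate lambda as the average of log|f'(x)| along a trajectory,
    after discarding a transient so we start on the attractor."""
    x = x0
    for _ in range(transient):
        x = f(x, mu)
    total = 0.0
    for _ in range(n):
        total += log(abs(fprime(x, mu)))
        x = f(x, mu)
    return total / n

# mu = 0.9: chaotic, lambda > 0, nearby trajectories separate as eps e^(lambda t);
# mu = 0.8: a periodic orbit, lambda < 0, nearby trajectories converge.
print(lyapunov(0.9), lyapunov(0.8))
```

A positive exponent at μ = 0.9 and a negative one at μ = 0.8 is the quantitative version of "sensitive dependence on initial conditions" in the exercise.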
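For exercise 6.14, the box-counting recipe of equations 6.56-6.63 can be sketched directly: histogram points on the attractor into bins of size ε, read the capacity dimension off the slope of log N_cover versus log ε, and the information dimension off the slope of Σ_n P_n log P_n versus log ε. The trajectory length and the pair of bin counts below are arbitrary choices:

```python
import numpy as np

def attractor_points(mu, n=200_000, transient=1000, x0=0.3):
    """Iterate f(x) = 4 mu x (1-x) past a transient, then collect n points."""
    x = x0
    for _ in range(transient):
        x = 4 * mu * x * (1 - x)
    pts = np.empty(n)
    for i in range(n):
        x = 4 * mu * x * (1 - x)
        pts[i] = x
    return pts

def dimensions(pts, nbins_pair):
    """Capacity (eq 6.58) and information (eq 6.63) dimension estimates
    from the slopes between two bin sizes eps = 1/nbins."""
    logN, logeps, plogp = [], [], []
    for nb in nbins_pair:
        counts, _ = np.histogram(pts, bins=nb, range=(0.0, 1.0))
        P = counts / counts.sum()
        P = P[P > 0]
        logN.append(np.log(len(P)))          # log N_cover: non-empty bins
        logeps.append(np.log(1.0 / nb))
        plogp.append(np.sum(P * np.log(P)))  # sum_n P_n log P_n
    d_cap = -(logN[1] - logN[0]) / (logeps[1] - logeps[0])
    d_inf = (plogp[1] - plogp[0]) / (logeps[1] - logeps[0])
    return d_cap, d_inf

pts = attractor_points(0.9)
d_cap, d_inf = dimensions(pts, (256, 512))
print(d_cap, d_inf)   # both near one: the chaotic attractor fills two intervals
```

As note 56 warns, the bin count must stay small compared to the number of iterates, or empty bins between sample points drive the estimate toward zero.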

7 Quantum Statistical Mechanics

In this section, we introduce the statistical mechanics of quantum systems. Logically, we proceed from the abstract to the concrete, through a series of simplifications. We begin [7.1] by introducing density matrices, which allow us to incorporate our ensembles into quantum mechanics: here we discover the simplification that equilibrium ensembles have density matrices that are diagonal in the energy basis. This reduces equilibrium statistical mechanics to simple sums over energy eigenstates, which we illustrate [7.2] by solving the finite-temperature quantum harmonic oscillator. We then discuss the statistical mechanics of identical particles [7.3]. We then make the vast simplification of presuming that the particles are non-interacting [7.4], which leads us to the Bose-Einstein and Fermi distributions for the filling of single-particle eigenstates. We briefly relate Bose, Fermi, and Maxwell-Boltzmann statistics [7.5]. We illustrate how amazingly useful the non-interacting particle picture is for quantum systems by solving the classic problems of black-body radiation and Bose condensation for bosons [7.6], and the behavior of simple metals for fermions [7.7].
Sections 7.1 and 7.5 logically belong here, but discuss issues at more depth than is required for the rest of the text. It is suggested that one skim or skip portions of these sections on first reading, and return to the abstractions later, after gaining a broad view of what quantum statistical mechanics predicts in sections 7.6 and 7.7.

7.1 Quantum Ensembles and Density Matrices

How do we generalize the classical ensembles, described by probability densities ρ(P, Q) in phase space, to quantum mechanics? Two problems immediately arise. First, the Heisenberg uncertainty principle tells us that one cannot specify both position and momentum for a quantum system at the same time. The states of our quantum system will not be points in phase space. Second, quantum mechanics already has probability densities: even for systems in a definite state^1 Ψ(Q) the probability is spread among different configurations |Ψ(Q)|² (or momenta |Ψ̃(P)|²). In statistical mechanics, we need to introduce a second level of probability, to discuss an ensemble that has probabilities ρ_n of being in a variety

^1 Quantum systems with many particles have wavefunctions that are functions of all the positions of all the particles (or, in momentum space, all the momenta of all the particles).
of quantum states Ψ_n(Q). Ensembles in quantum mechanics are called mixed states: they are not superpositions of different wave functions, but incoherent mixtures.^2
Suppose we want to compute the ensemble expectation of an operator A. In a particular state Ψ_n, the quantum expectation is

    ⟨A⟩_pure = ∫ Ψ_n*(Q) A Ψ_n(Q) d^{3N}Q.    (7.1)

So, in the ensemble the expectation is

    ⟨A⟩ = Σ_n ρ_n ∫ Ψ_n*(Q) A Ψ_n(Q) d^{3N}Q.    (7.2)

For most purposes, this is enough! Except for selected exercises in this chapter, one or two problems in the rest of the book, and occasional specialized seminars, formulating the ensemble as a sum over states Ψ_n with probabilities ρ_n is perfectly satisfactory. Indeed, for all of the equilibrium ensembles, the Ψ_n may be taken to be the energy eigenstates, and the ρ_n either a constant in a small energy range (for the microcanonical ensemble), or exp(−βE_n)/Z (for the canonical ensemble), or exp(−β(E_n − μN_n))/Ξ (for the grand canonical ensemble). For most practical purposes you may stop reading here.

^2 So, for example, if |R⟩ is a right-circularly polarized photon, and |L⟩ is a left-circularly polarized photon, then the superposition (1/√2)(|R⟩ + |L⟩) is a linearly polarized photon, while the mixture ½(|R⟩⟨R| + |L⟩⟨L|) is an unpolarized photon. The superposition is in both states, the mixture is in perhaps one or perhaps the other. See problem 7.5(a).
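The superposition-versus-mixture distinction of note 2 is easy to check numerically using exactly the ensemble rule of equation 7.2: a mixture's prediction is the probability-weighted sum of pure-state predictions. A small sketch (the two-component vectors standing in for |R⟩ and |L⟩ are a convenient choice for this demo):

```python
import numpy as np

# Right- and left-circular polarization states in a two-component basis.
R = np.array([1.0, 0.0])
L = np.array([0.0, 1.0])
x = (R + L) / np.sqrt(2)          # superposition: a linearly polarized photon

def prob(state, outcome):
    """Probability that a pure `state` is measured in `outcome` (Born rule)."""
    return abs(np.vdot(outcome, state)) ** 2

def mixture_prob(states_weights, outcome):
    """Ensemble probability, eq. 7.2 style: weighted sum over pure states."""
    return sum(w * prob(s, outcome) for s, w in states_weights)

mix = [(R, 0.5), (L, 0.5)]        # the unpolarized 50/50 mixture

# Measured in the circular basis, superposition and mixture look identical:
# each gives probability 1/2 for right-circular polarization.
print(prob(x, R), mixture_prob(mix, R))
# But an x-polarizer passes the superposition always, the mixture half the time.
print(prob(x, x), mixture_prob(mix, x))
```

The first measurement cannot tell the two ensembles apart; the second can, which is why the mixture is "in perhaps one or perhaps the other" state rather than in both.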

Advanced Topic: Density Matrices. What will we gain from going beyond this simple picture? First, there are lots of mixed states that are not mixtures of energy eigenstates. Mixtures of energy eigenstates have time-independent properties, so any time-dependent ensemble will be in this class. Second, although one can define the ensemble in terms of a set of states Ψ_n, the ensemble should be something one can look at in a variety of bases. Indeed, superfluids and superconductors show an energy gap when viewed in the energy basis, but show an exotic off-diagonal long-range order when looked at in position space. Third, we will see that the proper generalization of Liouville's theorem demands the more elegant, operator-based approach.
Our goal is to avoid carrying around the particular states Ψ_n, writing the ensemble average [7.2] in terms of A and some operator ρ, which will be the density matrix. For this section, it is convenient to use Dirac's bra-ket notation, in which the ensemble average can be written^3

    ⟨A⟩ = Σ_n ρ_n ⟨Ψ_n|A|Ψ_n⟩.    (7.3)

Pick any complete orthonormal basis Φ_α. Then the identity operator

    1 = Σ_α |Φ_α⟩⟨Φ_α|    (7.4)

^3 In Dirac's notation, ⟨Φ|M|Ψ⟩ = ∫ Φ* M Ψ. It is particularly useful when expressing operators in a basis Φ_m; if the matrix elements are M_ij = ⟨Φ_i|M|Φ_j⟩ then the operator itself can be written M = Σ_ij M_ij |Φ_i⟩⟨Φ_j|.
and, plugging the identity [7.4] into [7.3], we find

    ⟨A⟩ = Σ_n ρ_n ⟨Ψ_n| (Σ_α |Φ_α⟩⟨Φ_α|) A |Ψ_n⟩
        = Σ_α Σ_n ρ_n ⟨Φ_α|A|Ψ_n⟩⟨Ψ_n|Φ_α⟩
        = Σ_α ⟨Φ_α| A (Σ_n ρ_n |Ψ_n⟩⟨Ψ_n|) |Φ_α⟩
        = Tr(Aρ),    (7.5)

where^4

    ρ = Σ_n ρ_n |Ψ_n⟩⟨Ψ_n|    (7.6)

is the density matrix.

^4 The trace of a matrix is the sum of its diagonal elements, and is independent of what basis you write it in. The same is true of operators: we are summing the diagonal elements Tr(M) = Σ_α ⟨Φ_α|M|Φ_α⟩.
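Equations 7.5 and 7.6 are easy to verify numerically: build ρ from a few random states with probabilities ρ_n, and check that Tr(Aρ) matches the weighted sum of pure-state expectations. The dimension, number of states, and random seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, nstates = 4, 3

# Random normalized states Psi_n and probabilities rho_n summing to one.
psis = rng.normal(size=(nstates, dim)) + 1j * rng.normal(size=(nstates, dim))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)
probs = rng.random(nstates)
probs /= probs.sum()

# A random Hermitian observable A.
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
A = (M + M.conj().T) / 2

# Ensemble average the slow way (eq 7.3) and via the density matrix (eq 7.5).
avg_sum = sum(p * np.vdot(psi, A @ psi) for p, psi in zip(probs, psis)).real
rho = sum(p * np.outer(psi, psi.conj()) for p, psi in zip(probs, psis))
avg_trace = np.trace(A @ rho).real

print(abs(avg_sum - avg_trace) < 1e-10)   # True: Tr(A rho) = sum_n rho_n <A>_n
```

Note that the three random states are neither orthogonal nor eigenstates of anything, yet ρ still summarizes everything the ensemble can predict.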
Some conclusions we can draw about the density matrix:
• Sufficiency. In quantum mechanics, the measurement processes involve expectation values of operators. Our density matrix therefore suffices to embody everything we need to know about our quantum system.
• Pure states. A pure state, with a definite wavefunction Ψ, has ρ_pure = |Ψ⟩⟨Ψ|. In the position basis |Q⟩, this pure-state density matrix has matrix elements ρ_pure(Q, Q′) = ⟨Q|ρ_pure|Q′⟩ = Ψ*(Q′)Ψ(Q). Thus in particular we can reconstruct the wavefunction, up to an overall constant, by fixing one value of Q′ and varying Q. Since the wavefunction is normalized, we can reconstruct Ψ up to an overall phase, which isn't physically measurable: this again confirms the sufficiency of the density matrix to describe our system. Since our wavefunction is normalized, ⟨Ψ|Ψ⟩ = 1, one notes also that the square of the density matrix for a pure state equals itself: ρ_pure² = (|Ψ⟩⟨Ψ|)(|Ψ⟩⟨Ψ|) = ρ_pure.
• Normalization. The trace of a pure-state density matrix is Tr ρ_pure = 1, since we can pick an orthonormal basis with our wavefunction as the first basis element, making the first term in the trace sum one and the others zero. The trace of a general density matrix is hence also one, since it is a sum of pure-state density matrices:

    Tr ρ = Tr(Σ_n ρ_n |Ψ_n⟩⟨Ψ_n|) = Σ_n ρ_n Tr(|Ψ_n⟩⟨Ψ_n|) = Σ_n ρ_n = 1.

• Canonical Distribution. The canonical distribution can be written in terms of the Hamiltonian operator H as^5

    ρ_canon = exp(−βH)/Z = exp(−βH)/Tr exp(−βH).    (7.9)

^5 What is the exponential of a matrix M? We can define it in terms of a power series, exp(M) = 1 + M + M²/2! + M³/3! + ..., but it is usually easier to change
Let |E_n⟩ be the orthonormal many-body energy eigenstates. If we evaluate ρ_canon in the energy basis,

    ρ_mn = ⟨E_m|ρ_canon|E_n⟩
         = ⟨E_m| e^{−βH} |E_n⟩ / Z
         = ⟨E_m| e^{−βE_n} |E_n⟩ / Z
         = e^{−βE_n} ⟨E_m|E_n⟩ / Z
         = e^{−βE_n} δ_mn / Z,    (7.10)

so ρ_canon is diagonal in the energy basis,

    ρ = Σ_n (exp(−βE_n)/Z) |E_n⟩⟨E_n|,    (7.11)

and is given by the canonical weighting of each of the energy eigenstates, just as one would expect. Notice that the states Ψ_n mixed to make the density matrix are not in general eigenstates, or even orthogonal. For equilibrium statistical mechanics, though, life is simple: the Ψ_n can be chosen to be energy eigenstates, and the density matrix is diagonal in that basis.
• Entropy. The entropy for a general density matrix will be

    S = −k_B Tr(ρ log ρ).    (7.12)
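Equations 7.9-7.12 can be checked for any small Hamiltonian matrix: diagonalize H (the trick of note 5), build ρ_canon, and evaluate S = −Tr(ρ log ρ) from the eigenvalues of ρ. A sketch with k_B = 1; the three-level Hamiltonian and temperature are arbitrary choices:

```python
import numpy as np

def canonical_rho(H, beta):
    """rho = exp(-beta H) / Tr exp(-beta H), built by diagonalizing H."""
    E, U = np.linalg.eigh(H)            # H = U diag(E) U^dagger
    w = np.exp(-beta * (E - E.min()))   # shift by E.min() for numerical safety
    w /= w.sum()                        # Boltzmann weights exp(-beta E_n)/Z
    return U @ np.diag(w) @ U.conj().T

def von_neumann_entropy(rho):
    """S = -Tr(rho log rho), from the eigenvalues of rho (k_B = 1)."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-15]                    # drop zero eigenvalues: 0 log 0 = 0
    return -np.sum(p * np.log(p))

H = np.diag([0.0, 1.0, 2.0])            # an arbitrary three-level Hamiltonian
rho = canonical_rho(H, beta=1.0)

print(np.trace(rho).real)               # normalization: Tr(rho) = 1
print(von_neumann_entropy(rho))         # between 0 (T -> 0) and log 3 (T -> infinity)
```

At large β the entropy collapses toward zero as the ground state dominates, and at small β it approaches log 3, the equal-weight (microcanonical-like) maximum for three states.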

• Time evolution for the density matrix. The time evolution for the density matrix is determined by the time evolution of the pure states composing it:^6

    ∂ρ/∂t = Σ_n ρ_n [ (∂|Ψ_n⟩/∂t) ⟨Ψ_n| + |Ψ_n⟩ (∂⟨Ψ_n|/∂t) ].    (7.13)

Now, the time evolution of the ket wavefunction |Ψ_n⟩ is given by operating on it with the Hamiltonian,

    ∂|Ψ_n⟩/∂t = (1/iℏ) H |Ψ_n⟩,    (7.14)

and the time evolution of the bra wavefunction ⟨Ψ_n| is given by the time evolution of Ψ_n*(Q):

    ∂Ψ_n*/∂t = (∂Ψ_n/∂t)* = ((1/iℏ) H Ψ_n)* = −(1/iℏ) (H Ψ_n)*,    (7.15)

^6 The ρ_n are the probability that one started in the state Ψ_n, and thus clearly don't change with time.

^5 (continued) basis to diagonalize M. In that basis, M is diagonal with elements M_11, M_22, ..., and any function f(M) is given by

    f(M) = diag( f(M_11), f(M_22), ... ).    (7.8)

At the end, change back to the original basis.
so, since H is Hermitian,

    ∂⟨Ψ_n|/∂t = −(1/iℏ) ⟨Ψ_n| H.    (7.16)

Hence

    ∂ρ/∂t = Σ_n ρ_n (1/iℏ) ( H |Ψ_n⟩⟨Ψ_n| − |Ψ_n⟩⟨Ψ_n| H ) = (1/iℏ) (Hρ − ρH)
          = (1/iℏ) [H, ρ].    (7.17)
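Equation 7.17 can be checked numerically (with ℏ = 1): evolve ρ(t) = e^{−iHt} ρ(0) e^{iHt} and compare a finite-difference ∂ρ/∂t against (1/iℏ)[H, ρ]. The same sketch also previews the point made below, that a mixture of energy eigenstates commutes with H and so does not move at all. The matrix size, random seed, and eigenstate weights are arbitrary:

```python
import numpy as np

hbar = 1.0
rng = np.random.default_rng(1)

# An arbitrary 3x3 Hermitian Hamiltonian and a pure-state density matrix.
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (M + M.conj().T) / 2
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())

def evolve(rho, t):
    """rho(t) = U rho U^dagger with U = exp(-i H t / hbar), via diagonalization."""
    E, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * E * t / hbar)) @ V.conj().T
    return U @ rho @ U.conj().T

# Central finite difference for drho/dt versus (1/i hbar)[H, rho] (eq 7.17).
dt = 1e-6
lhs = (evolve(rho0, dt) - evolve(rho0, -dt)) / (2 * dt)
rhs = (H @ rho0 - rho0 @ H) / (1j * hbar)
print(np.max(np.abs(lhs - rhs)) < 1e-6)          # True: the two sides agree

# A mixture of energy eigenstates commutes with H, so it is stationary.
E, V = np.linalg.eigh(H)
rho_eq = V @ np.diag([0.5, 0.3, 0.2]) @ V.conj().T
print(np.max(np.abs(H @ rho_eq - rho_eq @ H)) < 1e-12)   # True: [H, rho_eq] = 0
```

The second check is exactly the weak point discussed in the next bullet: any weights whatsoever on the energy eigenstates give a stationary ρ, so stationarity alone does not single out the equilibrium ensembles.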
• Quantum Liouville Equation. This time evolution law is the quantum version of Liouville's theorem. We can see this by using the equations of motion 4.1, q̇_α = ∂H/∂p_α and ṗ_α = −∂H/∂q_α, and the definition of the Poisson bracket,

    {A, B}_P = Σ_α ( ∂A/∂q_α ∂B/∂p_α − ∂A/∂p_α ∂B/∂q_α ),    (7.18)

to rewrite Liouville's theorem, that the total time derivative is zero [4.7], into a statement about the partial time derivative:

    0 = dρ/dt = ∂ρ/∂t + Σ_α ( q̇_α ∂ρ/∂q_α + ṗ_α ∂ρ/∂p_α )
              = ∂ρ/∂t + Σ_α ( ∂H/∂p_α ∂ρ/∂q_α − ∂H/∂q_α ∂ρ/∂p_α ),

so

    ∂ρ/∂t = {H, ρ}_P.    (7.20)

Using the classical-quantum correspondence between the Poisson bracket and the commutator, { , }_P ↔ (1/iℏ)[ , ], the time evolution law 7.17 is precisely the analogue of Liouville's theorem 7.20.
• Quantum Liouville and Statistical Mechanics. The quantum version of Liouville's equation is not nearly as compelling an argument for statistical mechanics as was the classical one. The classical theorem, you remember, stated that dρ/dt = 0. Any equilibrium state must be time independent, ∂ρ/∂t = 0, so this implied that such a state must have ρ constant along the trajectories. If the trajectory covers the energy surface (ergodicity), then the probability density had to be constant on the energy surface, justifying the microcanonical ensemble.
For an isolated quantum system, this argument breaks down. The condition that an equilibrium state must be time independent isn't very stringent! Indeed, ∂ρ/∂t = (1/iℏ)[H, ρ] = 0 for any mixture of many-body energy eigenstates!
In principle, isolated quantum systems are very non-ergodic, and one must couple them to the outside world to induce transitions between the many-body eigenstates to lead to equilibrium. This becomes much less of a concern when one realizes just how peculiar a many-body eigenstate of a large system really is! Consider
an atom in an excited state contained in a large box. We normally think of the atom in an energy eigenstate, which decays after some time into a ground state atom plus some photons. The true eigenstates of the system, however, are weird delicate superpositions of states with photons being absorbed by the atom and the atom emitting photons, carefully crafted to produce a stationary state. When one starts including more atoms and other interactions, the true many-body eigenstates^7 that we formally sum over in producing our ensembles are pretty useless things to work with in most cases.

^7 The low-lying many-body excitations above the ground state are an exception to this.

7.2 Quantum Harmonic Oscillator

The quantum harmonic oscillator is a great example of how statistical mechanics works in quantum systems. Consider a harmonic oscillator of frequency ω. The energy eigenvalues are E_n = (n + 1/2)ℏω. Hence the canonical ensemble for the quantum harmonic oscillator at temperature T = 1/(k_B β) is a geometric series Σ xⁿ, which we can sum to 1/(1 − x):

    Z_qho = Σ_{n=0}^∞ e^{−βE_n} = Σ_{n=0}^∞ e^{−β(n + 1/2)ℏω}
          = e^{−βℏω/2} Σ_{n=0}^∞ (e^{−βℏω})ⁿ = e^{−βℏω/2} / (1 − e^{−βℏω})
          = 1/(e^{βℏω/2} − e^{−βℏω/2}) = 1/(2 sinh(βℏω/2)).    (7.21)

[Fig. 7.1: The quantum states of the harmonic oscillator are at equally spaced energies.]

The average energy is

    ⟨E⟩_qho = −∂ log Z_qho/∂β = ∂/∂β [ βℏω/2 + log(1 − e^{−βℏω}) ]
            = ℏω ( 1/2 + 1/(e^{βℏω} − 1) ),    (7.22)

which corresponds to an average excitation level

    ⟨n⟩_qho = 1/(e^{βℏω} − 1).    (7.23)
00 0.2 0.4 0.6 0.8 1
Temperature kBT / h The specic heat is thus
Fig. 7.2 The specic heat for the quan- E  e/kB T
tum harmonic oscillator. cV = = kB 2 (7.24)
T kB T 1 e/kB T

High temperatures. e/kB T 1 /kB T , so cV kB as

we found for the classical harmonic oscillator (and as given by the
equipartition theorem).
Low temperatures. As T 0, e/kB T becomes exponentially
small, so the specic heat goes rapidly to zero as the energy asymp-
totes to the zero-point energy 1/2 . More specically, there is an
To be pub. Oxford UP, Fall05
7.3 Bose and Fermi Statistics 115

energy gap8  to the rst excitation, so the probability of having

any excitation of the system is suppressed by a factor of e/kB T .
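The two temperature limits of equation 7.24 are easy to check numerically (a sketch in Python; energies are measured in units of the level spacing, so $\hbar\omega = 1$):

```python
import math

def c_v(kT, hw=1.0):
    """Specific heat of one quantum harmonic oscillator, equation 7.24, in units of k_B."""
    x = hw / kT  # hbar*omega / (k_B T)
    return x**2 * math.exp(-x) / (1.0 - math.exp(-x))**2

# High temperatures: c_V approaches the classical equipartition value k_B.
high_T = c_v(kT=100.0)
# Low temperatures: c_V is suppressed by the Boltzmann factor exp(-hbar*omega / k_B T).
low_T = c_v(kT=0.05)
```

At `kT=100` the result is within a fraction of a percent of the classical value $k_B$; at `kT=0.05` it is already below $10^{-6}\,k_B$, illustrating the gap suppression.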
7.3 Bose and Fermi Statistics

In quantum mechanics, indistinguishable particles are not just hard to tell apart: their quantum wavefunctions must be the same, up to an overall phase change,[9] when the coordinates are swapped. In particular, for bosons[10] the wavefunction is unchanged under a swap, so

$$\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = \Psi(\mathbf{r}_2, \mathbf{r}_1, \ldots, \mathbf{r}_N) = \Psi(\mathbf{r}_{P_1}, \mathbf{r}_{P_2}, \ldots, \mathbf{r}_{P_N}) \quad (7.25)$$

for any permutation $P$ of the integers $1, \ldots, N$.[11] For fermions[12]

$$\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = -\Psi(\mathbf{r}_2, \mathbf{r}_1, \ldots, \mathbf{r}_N) = \sigma(P)\,\Psi(\mathbf{r}_{P_1}, \mathbf{r}_{P_2}, \ldots, \mathbf{r}_{P_N}). \quad (7.26)$$

The eigenstates for systems of identical fermions and bosons are a subset of the eigenstates of distinguishable particles with the same Hamiltonian

$$\mathcal{H}\Psi_n = E_n \Psi_n; \quad (7.27)$$

in particular, they are given by the distinguishable eigenstates which obey the proper symmetry properties under permutations. A non-symmetric eigenstate $\Psi$ with energy $E$ may be symmetrized to form a Bose eigenstate

$$\Psi_{sym}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = (\text{Normalization}) \sum_P \Psi(\mathbf{r}_{P_1}, \mathbf{r}_{P_2}, \ldots, \mathbf{r}_{P_N}) \quad (7.28)$$

or antisymmetrized to form a fermion eigenstate

$$\Psi_{asym}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = (\text{Normalization}) \sum_P \sigma(P)\,\Psi(\mathbf{r}_{P_1}, \mathbf{r}_{P_2}, \ldots, \mathbf{r}_{P_N}) \quad (7.29)$$

if the symmetrization or antisymmetrization does not make the sum zero. These remain eigenstates of energy $E$, because they are sums of eigenstates of energy $E$.

Quantum statistical mechanics for identical particles is given by restricting the ensembles to sum over symmetric wavefunctions for bosons (or antisymmetric wavefunctions for fermions). So, for example, the partition function for the canonical ensemble is still

$$Z = \operatorname{Tr} e^{-\beta\mathcal{H}} = \sum_n e^{-\beta E_n} \quad (7.30)$$

but now the trace is over a complete set of many-body symmetric (antisymmetric) states, and the sum is over the symmetric (antisymmetric) many-body energy eigenstates.

[9] In three dimensions, this phase change must be ±1. In two dimensions one can have any phase change, so one can have not only fermions and bosons but anyons. Anyons, with fractional statistics, arise as excitations in the fractional quantized Hall effect.
[10] Examples of bosons include mesons, He⁴, phonons, photons, gluons, W± and Z bosons, and (presumably) gravitons. The last four mediate the fundamental forces: the electromagnetic, strong, weak, and gravitational interactions. The spin-statistics theorem (not discussed here) states that bosons have integer spins. See problem 7.9.
[11] A permutation $\{P_1, P_2, \ldots, P_N\}$ is just a reordering of the integers $\{1, 2, \ldots, N\}$. The sign $\sigma(P)$ of a permutation is $+1$ if $P$ is an even permutation, and $-1$ if $P$ is an odd permutation. Swapping two labels, keeping all the rest unchanged, is an odd permutation. One can show that composing two permutations multiplies their signs, so odd permutations can be made by odd numbers of pair swaps, and even permutations are composed of even numbers of pair swaps.
[12] Most of the common elementary particles are fermions: electrons, protons, neutrons, neutrinos, quarks, etc. Fermions have half-integer spins. Particles made up of even numbers of fermions are bosons.
[8] In solid state physics we call this an energy gap: the minimum energy needed to add an excitation to the system. In quantum field theory, where the excitations are particles, they refer to the minimum excitation as the mass $mc^2$ of the particle.
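The antisymmetrization sum of equation 7.29 can be carried out explicitly for small $N$, using the permutation sign $\sigma(P)$ defined in footnote 11. A sketch in Python (the two single-particle functions are invented purely for illustration):

```python
import itertools
import math

def perm_sign(p):
    """Sign sigma(P): +1 for an even permutation, -1 for an odd one (count inversions)."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def antisymmetrize(phis, rs):
    """Fermion wavefunction of equation 7.29: sum over permutations P of
    sigma(P) * prod_j phi_{k_j}(r_{P_j}), up to normalization."""
    total = 0.0
    for p in itertools.permutations(range(len(rs))):
        term = perm_sign(p)
        for j in range(len(rs)):
            term *= phis[j](rs[p[j]])
        total += term
    return total

# Two made-up one-dimensional single-particle functions:
phi0 = lambda r: math.exp(-r * r)         # "ground-state-like"
phi1 = lambda r: r * math.exp(-r * r)     # "first-excited-like"

distinct = antisymmetrize([phi0, phi1], [0.3, 1.1])  # generally nonzero
pauli = antisymmetrize([phi0, phi0], [0.3, 1.1])     # both particles in phi0: vanishes
```

The second evaluation vanishes identically: antisymmetrizing two particles in the same state gives zero, which is the Pauli exclusion principle discussed in the next section.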
7.4 Non-Interacting Bosons and Fermions

Many-body quantum statistical mechanics is hard. We now make a huge approximation: we'll assume our quantum particles do not interact with one another. Just as for the classical ideal gas, this will make our calculations straightforward.

The non-interacting Hamiltonian is a sum of single-particle Hamiltonians $H$:

$$\mathcal{H}_{NI} = \sum_{j=1}^{N} H(\mathbf{p}_j, \mathbf{r}_j) = \sum_{j=1}^{N} \left(\frac{p_j^2}{2m} + V(\mathbf{r}_j)\right). \quad (7.31)$$

Let $\psi_k$ be the single-particle eigenstates of $H$,

$$H\psi_k(\mathbf{r}) = \varepsilon_k \psi_k(\mathbf{r}). \quad (7.32)$$

For distinguishable particles, the many-body eigenstates can be written as a product of orthonormal single-particle eigenstates

$$\Psi^{dist}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = \prod_{j=1}^{N} \psi_{k_j}(\mathbf{r}_j), \quad (7.33)$$

where we say that particle $j$ is in the single-particle eigenstate $k_j$. The eigenstates for non-interacting bosons are given by symmetrizing over the coordinates $\mathbf{r}_j$,

$$\Psi^{boson}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = (\text{Normalization}) \sum_P \prod_{j=1}^{N} \psi_{k_j}(\mathbf{r}_{P_j}), \quad (7.34)$$

and of course the fermion eigenstates are given by antisymmetrizing:[13]

$$\Psi^{fermion}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = \frac{1}{\sqrt{N!}} \sum_P \sigma(P) \prod_{j=1}^{N} \psi_{k_j}(\mathbf{r}_{P_j}). \quad (7.36)$$

[13] This antisymmetrization can be written as
$$\Psi^{fermion}(\mathbf{r}_1, \ldots, \mathbf{r}_N) = \frac{1}{\sqrt{N!}} \begin{vmatrix} \psi_{k_1}(\mathbf{r}_1) & \psi_{k_1}(\mathbf{r}_2) & \ldots & \psi_{k_1}(\mathbf{r}_N) \\ \psi_{k_2}(\mathbf{r}_1) & \psi_{k_2}(\mathbf{r}_2) & \ldots & \psi_{k_2}(\mathbf{r}_N) \\ \vdots & & & \vdots \\ \psi_{k_N}(\mathbf{r}_1) & \psi_{k_N}(\mathbf{r}_2) & \ldots & \psi_{k_N}(\mathbf{r}_N) \end{vmatrix} \quad (7.35)$$
called the Slater determinant.

Let's consider two particles in orthonormal single-particle energy eigenstates $\psi_k$ and $\psi_\ell$. If the particles are distinguishable, there are two eigenstates $\psi_k(\mathbf{r}_1)\psi_\ell(\mathbf{r}_2)$ and $\psi_k(\mathbf{r}_2)\psi_\ell(\mathbf{r}_1)$. If the particles are bosons, the eigenstate is $\frac{1}{\sqrt{2}}\left(\psi_k(\mathbf{r}_1)\psi_\ell(\mathbf{r}_2) + \psi_k(\mathbf{r}_2)\psi_\ell(\mathbf{r}_1)\right)$. If the particles are fermions, the eigenstate is $\frac{1}{\sqrt{2}}\left(\psi_k(\mathbf{r}_1)\psi_\ell(\mathbf{r}_2) - \psi_k(\mathbf{r}_2)\psi_\ell(\mathbf{r}_1)\right)$.

What if the particles are in the same single-particle eigenstate $\psi_\ell$? For bosons, the eigenstate is already symmetric and normalized, $\psi_\ell(\mathbf{r}_1)\psi_\ell(\mathbf{r}_2)$.[14] For fermions, antisymmetrizing a state where both particles are in the same state gives zero: $\psi_\ell(\mathbf{r}_1)\psi_\ell(\mathbf{r}_2) - \psi_\ell(\mathbf{r}_2)\psi_\ell(\mathbf{r}_1) = 0$. This is the Pauli exclusion principle: you cannot have two fermions in the same quantum state.[15]

[14] Notice that the normalization of the boson wavefunction depends on how many single-particle states are multiply occupied. Check this by squaring and integrating the two-particle boson wavefunctions in the two cases.
[15] Because the spin of the electron can be in two directions ±1/2, this means that two electrons can be placed into each single-particle spatial eigenstate.

How do we do statistical mechanics for non-interacting fermions and bosons? Here it is most convenient to use the grand canonical ensemble [5.4], so we can think of each single-particle eigenstate $\psi_k$ as being filled
independently from the other eigenstates. The grand partition function hence factors:

$$\Xi^{NI} = \prod_k \Xi_k. \quad (7.37)$$

The grand canonical ensemble thus allows us to separately solve the problem one eigenstate at a time, for non-interacting particles.

Bosons. For bosons, all fillings $n_k$ are allowed. Each particle in eigenstate $k$ contributes energy $\varepsilon_k$ and chemical potential $\mu$, so

$$\Xi_k^{boson} = \sum_{n_k=0}^{\infty} e^{-\beta(\varepsilon_k - \mu)n_k} = \sum_{n_k=0}^{\infty} \left(e^{-\beta(\varepsilon_k - \mu)}\right)^{n_k} = \frac{1}{1 - e^{-\beta(\varepsilon_k - \mu)}}, \quad (7.38)$$

so the boson grand partition function is

$$\Xi^{boson}_{NI} = \prod_k \frac{1}{1 - e^{-\beta(\varepsilon_k - \mu)}}. \quad (7.39)$$

The grand free energy is a sum of single-state grand free energies

$$\Phi^{boson}_{NI} = \sum_k \Phi_k^{boson} = \sum_k k_B T \log\left(1 - e^{-\beta(\varepsilon_k - \mu)}\right). \quad (7.40)$$

Because the filling of different states is independent, we can find out the expected number of particles in state $\psi_k$. From equation 5.42,

$$\langle n_k \rangle = -\frac{\partial \Phi_k^{boson}}{\partial \mu} = k_B T\, \beta\, \frac{e^{-\beta(\varepsilon_k - \mu)}}{1 - e^{-\beta(\varepsilon_k - \mu)}} = \frac{1}{e^{\beta(\varepsilon_k - \mu)} - 1}. \quad (7.41)$$

This is called the Bose-Einstein distribution

$$n_{BE} = \frac{1}{e^{\beta(\varepsilon - \mu)} - 1}. \quad (7.42)$$

Fig. 7.3 Bose-Einstein and Maxwell-Boltzmann distributions, $n_{BE}(\varepsilon)$ of equation 7.42 and $n_{MB}(\varepsilon)$ of equation 7.59, plotted against $(\varepsilon - \mu)/k_B T$. The Bose-Einstein distribution diverges as $\mu \to \varepsilon$.

It describes the filling of single-particle eigenstates by non-interacting bosons. For states with low occupancies, where $\langle n \rangle \ll 1$, $n_{BE} \approx e^{-\beta(\varepsilon - \mu)}$, and the boson populations correspond to what we would guess naively from the Boltzmann distribution.[16] The condition for low occupancies is $\varepsilon_k - \mu \gg k_B T$, but perversely this often arises at high temperatures, when $\mu$ gets large and negative. Notice also that $n_{BE} \to \infty$ as $\mu \to \varepsilon_k$, since the denominator vanishes (and becomes negative for $\mu > \varepsilon_k$); systems of non-interacting bosons always have $\mu$ less than or equal to the lowest of the single-particle energy eigenvalues.[17]

[16] We will formally discuss the Maxwell-Boltzmann distribution in [7.5].
[17] When the river level gets up to the height of the fields, your farm gets flooded.

Notice that the average excitation $\langle n \rangle_{qho}$ of the quantum harmonic oscillator 7.23 is given by the Bose-Einstein distribution 7.42 with $\mu = 0$. We'll use this in problem 7.9 to treat the excitations inside harmonic oscillators (vibrations) as particles obeying Bose statistics (phonons).
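The geometric-series sum of equation 7.38 and the occupancy formula 7.41 can be checked against a brute-force truncated sum over fillings (a sketch; the energy and chemical potential values are arbitrary illustrative choices, with $\varepsilon > \mu$ so the series converges):

```python
import math

beta, eps, mu = 1.0, 1.5, 0.2   # arbitrary illustrative values, eps > mu

x = math.exp(-beta * (eps - mu))
# Truncate the sum over fillings n_k = 0, 1, 2, ... far beyond convergence:
Xi_sum = sum(x**n for n in range(400))               # equation 7.38, term by term
Xi_closed = 1.0 / (1.0 - x)                          # closed-form geometric sum

n_avg = sum(n * x**n for n in range(400)) / Xi_sum   # <n_k> from the ensemble
n_BE = 1.0 / (math.exp(beta * (eps - mu)) - 1.0)     # equation 7.42
```

Both comparisons agree to machine precision, since the truncation error of the geometric series is negligible after a few hundred terms.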
Fermions. For fermions, only $n_k = 0$ and $n_k = 1$ are allowed. The single-state fermion grand partition function is

$$\Xi_k^{fermion} = \sum_{n_k=0}^{1} e^{-\beta(\varepsilon_k - \mu)n_k} = 1 + e^{-\beta(\varepsilon_k - \mu)}, \quad (7.43)$$

so the total fermion grand partition function is

$$\Xi^{fermion}_{NI} = \prod_k \left(1 + e^{-\beta(\varepsilon_k - \mu)}\right). \quad (7.44)$$

For summing over only two states, it's hardly worthwhile to work through the grand free energy to calculate the expected number of particles in a state:

$$\langle n_k \rangle = \frac{\sum_{n_k=0}^{1} n_k\, e^{-\beta(\varepsilon_k - \mu)n_k}}{1 + e^{-\beta(\varepsilon_k - \mu)}} = \frac{e^{-\beta(\varepsilon_k - \mu)}}{1 + e^{-\beta(\varepsilon_k - \mu)}} = \frac{1}{e^{\beta(\varepsilon_k - \mu)} + 1}, \quad (7.45)$$

leading us to the Fermi-Dirac distribution

$$f(\varepsilon) = n_{FD} = \frac{1}{e^{\beta(\varepsilon - \mu)} + 1}, \quad (7.46)$$

where $f(\varepsilon)$ is also known as the Fermi function.

Fig. 7.4 The Fermi distribution $f(\varepsilon)$ of equation 7.46, plotted for small temperature $T$. At low temperatures, states below $\mu$ are occupied, states above $\mu$ are unoccupied, and states within around $k_B T$ of $\mu$ are partially occupied.

Again, when the occupancy of state $\psi_k$ is low, it is approximately given by the Boltzmann probability distribution, $e^{-\beta(\varepsilon - \mu)}$. Here the chemical potential $\mu$ can be either greater than or less than any given eigenenergy $\varepsilon_k$. Indeed, at low temperatures the chemical potential $\mu$ separates filled states $\varepsilon_k < \mu$ from empty states $\varepsilon_k > \mu$; only states within roughly $k_B T$ of $\mu$ are partially filled.
The chemical potential $\mu$ is playing a large role in these calculations, and those new to the subject may wonder how one determines it. You will see in the problems that one normally knows the expected number of particles $N$, and must vary $\mu$ until the system reaches that value. Hence $\mu$ very directly plays the role of a particle pressure from the outside world, which is varied until the system is correctly filled.
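This tuning of $\mu$ is easy to carry out numerically: since $\langle N \rangle(\mu) = \sum_k f(\varepsilon_k)$ increases monotonically with $\mu$, bisection works. A sketch for a handful of fermion levels (the level energies and target filling are invented for illustration):

```python
import math

def fermi(eps, mu, kT):
    """Fermi function f(eps) of equation 7.46."""
    return 1.0 / (math.exp((eps - mu) / kT) + 1.0)

def expected_N(mu, levels, kT):
    """<N>(mu): sum of Fermi occupancies over the single-particle levels."""
    return sum(fermi(eps, mu, kT) for eps in levels)

def solve_mu(target_N, levels, kT, lo=-100.0, hi=100.0):
    """Bisect on mu; <N>(mu) is monotonically increasing in mu."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_N(mid, levels, kT) < target_N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [0.0, 0.5, 1.0, 1.5, 2.0]   # made-up single-particle energies
mu = solve_mu(target_N=2.0, levels=levels, kT=0.1)
```

At this low temperature the solution lands between the second and third levels, as expected: $\mu$ separates the two filled states from the empty ones.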
The classical ideal gas has been a great illustration of statistical mechanics, and does a good job of describing many gasses, but nobody would suggest that it captures the main features of solids and liquids. The non-interacting approximation in quantum mechanics turns out to be far more powerful, for quite subtle reasons.

For bosons, the non-interacting approximation is quite accurate in three important cases: photons, phonons, and the dilute Bose gas. In [7.6] we'll study two fundamental problems involving non-interacting bosons: black body radiation and Bose condensation. The behavior of superconductors and superfluids shares some common features with that of the Bose gas.

For fermions, the non-interacting approximation would seem to rarely be useful. Electrons are charged, and the electromagnetic repulsion between the electrons in an atom, molecule, or material would seem to always be a major contribution to the energy. Neutrons interact via the strong interaction, so nuclei and neutron stars would also seem poor candidates for a non-interacting theory. Neutrinos are hard to pack into a box.[18] There are experiments on cold, dilute gasses of fermion atoms,[19] but naively non-interacting fermions would seem a foolish choice to focus on in an introductory course.

[18] Just in case you haven't heard, neutrinos are quite elusive. It is said that if you send neutrinos through a lead shield, more than half will penetrate until the thickness is roughly the distance to the nearest star.
[19] These use the same techniques which led to the observation of Bose condensation.
The truth is that the non-interacting Fermi gas is amazingly impor-
tant, and describes all of these systems (atoms, metals, insulators, nu-
clei, and neutron stars) amazingly well. Interacting Fermi systems under
most common circumstances behave very much like collections of non-
interacting fermions in a modied potential.20 The connection is so
powerful that in most circumstances we ignore the interactions: when-
ever we talk about exciting a 1S electron in an oxygen atom, or an
electron-hole pair in a semiconductor, we are using this eective non-
interacting electron approximation. The explanation for this amazing
fact is called Landau Fermi liquid theory, and lies beyond the purview
of this text. We will discuss the applications of the Fermi gas to metals
in [7.7]; the problems discuss applications to semiconductors [7.13] and
stellar collapse [7.14].
7.5 Maxwell Boltzmann Quantum Statistics

In classical statistical mechanics, we treated indistinguishable particles as distinguishable ones, except that we divided the phase-space volume (or the partition function, in the canonical ensemble) by a factor $N!$:

$$\Omega_N^{MB} = \frac{1}{N!}\,\Omega_N^{dist}, \qquad Z_N^{MB} = \frac{1}{N!}\,Z_N^{dist}. \quad (7.47)$$

This was important to get the entropy to be extensive (section 6.2.1). This approximation is also used in quantum statistical mechanics, although we should emphasize that it does not describe either bosons, fermions, or any physical system. These bogus particles are said to obey Maxwell-Boltzmann statistics.

What is the canonical partition function for the case of $N$ non-interacting distinguishable particles? If the partition function for one particle is

$$Z_1 = \sum_k e^{-\beta\varepsilon_k} \quad (7.48)$$

then the partition function for two non-interacting, distinguishable (but otherwise similar) particles is[21]

[21] Multiply out the product of the sums, and see.
[20] In particular, the low-lying excitations above the ground state look qualitatively like fermions excited from below the Fermi energy to above the Fermi energy (electron-hole pairs in metals and semiconductors). It is not that these electrons don't significantly interact with those under the Fermi sea: it is rather that these interactions act to "dress" the electron with a screening cloud. These dressed electrons and holes, or quasiparticles, are what act so much like non-interacting particles.
$$Z_2^{NI,dist} = \sum_{k_1,k_2} e^{-\beta(\varepsilon_{k_1} + \varepsilon_{k_2})} = \sum_{k_1} e^{-\beta\varepsilon_{k_1}} \sum_{k_2} e^{-\beta\varepsilon_{k_2}} = Z_1^2, \quad (7.49)$$

and the partition function for $N$ such distinguishable, non-interacting particles is

$$Z_N^{NI,dist} = \sum_{k_1,k_2,\ldots,k_N} e^{-\beta(\varepsilon_{k_1} + \varepsilon_{k_2} + \cdots + \varepsilon_{k_N})} = \prod_{j=1}^{N} \sum_{k_j} e^{-\beta\varepsilon_{k_j}} = Z_1^N. \quad (7.50)$$

So, the Maxwell-Boltzmann partition function for non-interacting particles is

$$Z_N^{NI,MB} = Z_1^N / N!. \quad (7.51)$$

Let us illustrate the relation between these three distributions by considering the canonical ensemble of two non-interacting particles in three possible states of energies $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$. The Maxwell-Boltzmann partition function for such a system would be

$$Z_2^{NI,MB} = \frac{1}{2!}\left(e^{-\beta\varepsilon_1} + e^{-\beta\varepsilon_2} + e^{-\beta\varepsilon_3}\right)^2 \quad (7.52)$$
$$= \tfrac{1}{2} e^{-2\beta\varepsilon_1} + \tfrac{1}{2} e^{-2\beta\varepsilon_2} + \tfrac{1}{2} e^{-2\beta\varepsilon_3} + e^{-\beta(\varepsilon_1+\varepsilon_2)} + e^{-\beta(\varepsilon_1+\varepsilon_3)} + e^{-\beta(\varepsilon_2+\varepsilon_3)}.$$

The $1/N!$ fixes the weights of the singly-occupied states[22] nicely: each has weight one in the Maxwell-Boltzmann partition function. But the doubly occupied states, where both particles have the same wavefunction, have an unintuitive suppression by $\frac{1}{2}$ in the sum.

[22] More precisely, we mean those many-body states where the single-particle states are all singly occupied or vacant.

There are basically two ways to fix this. One is to stop discriminating against multiply occupied states, and to treat them all democratically. This gives us non-interacting bosons:

$$Z_2^{NI,boson} = e^{-2\beta\varepsilon_1} + e^{-2\beta\varepsilon_2} + e^{-2\beta\varepsilon_3} + e^{-\beta(\varepsilon_1+\varepsilon_2)} + e^{-\beta(\varepsilon_1+\varepsilon_3)} + e^{-\beta(\varepsilon_2+\varepsilon_3)}. \quad (7.53)$$

The other way is to squelch multiple occupancy altogether. This leads to fermions:

$$Z_2^{NI,fermion} = e^{-\beta(\varepsilon_1+\varepsilon_2)} + e^{-\beta(\varepsilon_1+\varepsilon_3)} + e^{-\beta(\varepsilon_2+\varepsilon_3)}. \quad (7.54)$$
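Equations 7.52–7.54 can be generated by explicit enumeration over the two-particle states (a sketch; $\beta$ and the three energies are arbitrary illustrative values):

```python
import itertools
import math

beta = 1.0
eps = [0.0, 0.3, 0.9]   # made-up energies eps_1, eps_2, eps_3
boltz = lambda E: math.exp(-beta * E)

states = range(len(eps))
# Distinguishable: all ordered pairs (k1, k2); Maxwell-Boltzmann divides by 2!.
Z_dist = sum(boltz(eps[k1] + eps[k2]) for k1 in states for k2 in states)
Z_MB = Z_dist / math.factorial(2)                        # equation 7.52
# Bosons: unordered pairs, repeats allowed, each with weight one (equation 7.53).
Z_boson = sum(boltz(eps[k1] + eps[k2])
              for k1, k2 in itertools.combinations_with_replacement(states, 2))
# Fermions: unordered pairs of distinct states only (equation 7.54).
Z_fermion = sum(boltz(eps[k1] + eps[k2])
                for k1, k2 in itertools.combinations(states, 2))

Z1 = sum(boltz(e) for e in eps)
```

The enumeration reproduces the text exactly: `Z_dist` equals $Z_1^2$, and the three partition functions differ only in the weight given to the doubly occupied states (one for bosons, one half for Maxwell-Boltzmann, zero for fermions).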
We've been working in this section with the canonical distribution, fixing the number of particles to two. This is convenient only for small systems; normally we've used the grand canonical ensemble.[23] How does the grand canonical ensemble apply to particles with Maxwell-Boltzmann statistics? The grand partition function is a geometric series:[24]

[23] See problem 7.6 for more details about the three ensembles and the four types of statistics.
[24] Notice the unusual appearance of $e^{e^x}$ in this formula.
$$\Xi^{NI,MB} = \sum_{M=0}^{\infty} Z_M^{NI,MB}\, e^{\beta\mu M} = \sum_{M=0}^{\infty} \frac{1}{M!}\left(\sum_k e^{-\beta\varepsilon_k}\right)^M e^{\beta\mu M}
= \sum_{M=0}^{\infty} \frac{1}{M!}\left(\sum_k e^{-\beta(\varepsilon_k - \mu)}\right)^M$$
$$= \exp\left(\sum_k e^{-\beta(\varepsilon_k - \mu)}\right) = \prod_k \exp\left(e^{-\beta(\varepsilon_k - \mu)}\right). \quad (7.55)$$

The grand free energy is

$$\Phi^{NI,MB} = -k_B T \log \Xi^{NI,MB} = \sum_k \Phi_k^{MB} \quad (7.56)$$

with the single-particle grand free energy

$$\Phi_k^{MB} = -k_B T\, e^{-\beta(\varepsilon_k - \mu)}. \quad (7.57)$$

Finally, the expected[25] number of particles in a single-particle eigenstate with energy $\varepsilon$ is

$$\langle n \rangle_{MB} = -\frac{\partial \Phi_k^{MB}}{\partial \mu} = e^{-\beta(\varepsilon - \mu)}. \quad (7.59)$$

This is precisely the Boltzmann factor for filling the state that we expect for non-interacting distinguishable particles.
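A numeric sanity check of the sum in equation 7.55, truncated at large $M$ (a sketch; the level energies and chemical potential are arbitrary illustrative values):

```python
import math

beta, mu = 1.0, -0.5
eps = [0.1, 0.7, 1.3]   # made-up single-particle energies

z = sum(math.exp(-beta * (e - mu)) for e in eps)   # sum_k exp(-beta (eps_k - mu))
# Sum over total particle number M of z^M / M!  (the middle form of equation 7.55):
Xi_sum = sum(z**M / math.factorial(M) for M in range(60))
Xi_closed = math.exp(z)                            # the closed form exp(sum_k ...)
```

The truncated sum converges to the closed form $e^{\sum_k e^{-\beta(\varepsilon_k - \mu)}}$ to machine precision, which is the unusual $e^{e^x}$ structure noted in the margin.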
7.6 Black Body Radiation and Bose Condensation

7.6.1 Free Particles in a Periodic Box

For this section and the next section on fermions, we shall simplify even further. We consider particles which are not only non-interacting and identical, but are also free. That is, they are subject to no external potential, apart from being confined in a box of volume $L^3 = V$ with periodic boundary conditions. The single-particle quantum eigenstates of such a system are products of sine and cosine waves along the three directions -- for example, for any three positive integers $n_i$,

$$\psi = (2/L)^{3/2} \sin\left(\frac{2\pi n_1}{L} x\right) \sin\left(\frac{2\pi n_2}{L} y\right) \sin\left(\frac{2\pi n_3}{L} z\right). \quad (7.60)$$

Fig. 7.5 The quantum states of a particle in a one-dimensional box with periodic boundary conditions are sine and cosine waves $\psi_n$ with $n$ wavelengths in the box, $k_n = 2\pi n/L$. With a real box (zero boundary conditions at the walls) one would have only sine waves, but one at every half-wavelength, $k_n = \pi n/L$, giving the same net density of states.

[25] It is amusing to note that non-interacting particles fill single-particle energy states according to the same law
$$\langle n \rangle = \frac{1}{e^{\beta(\varepsilon - \mu)} + c}, \quad (7.58)$$
with $c = +1$ for fermions, $c = -1$ for bosons, and $c = 0$ for Maxwell-Boltzmann statistics.

There are eight such states with the same energy, substituting $\cos$ for $\sin$ in all possible combinations along the three directions. These are more
conveniently organized if we use, instead of sine and cosine, the complex exponential, so

$$\psi_{\mathbf{k}} = (1/L)^{3/2} \exp(i\mathbf{k}\cdot\mathbf{r}) \quad (7.61)$$

with $\mathbf{k} = \frac{2\pi}{L}(n_1, n_2, n_3)$, where the $n_i$ can now be any integer. (The eight degenerate states are now given by the choices of sign for the three $n_i$.) The allowed single-particle eigenstates form a regular square grid in the space of wavevectors $\mathbf{k}$, with an average density $(L/2\pi)^3$ per unit volume of $\mathbf{k}$-space:

$$\text{Density of Plane Waves in } \mathbf{k}\text{-space} = V/8\pi^3. \quad (7.62)$$

For a large box volume $V$, the grid is extremely fine, and one can use a continuum approximation that the number of states falling into a $\mathbf{k}$-space region is given by its volume times the density 7.62.

Fig. 7.6 The allowed $\mathbf{k}$-space points for periodic boundary conditions form a regular grid. The points of equal energy lie on a sphere.

7.6.2 Black Body Radiation

Our first application is to electromagnetic radiation. Electromagnetic radiation has plane-wave modes similar to 7.61. Each plane wave travels at the speed of light $c$, so its frequency is $\omega_{\mathbf{k}} = c|\mathbf{k}|$. There are two modes per wavevector $\mathbf{k}$, one for each polarization. When one quantizes the electromagnetic field, each mode becomes a quantum harmonic oscillator.

Before quantum mechanics, people could not understand how electromagnetic radiation could come to equilibrium. The equipartition theorem suggested that if you could come to equilibrium, each mode would have $k_B T$ of energy. Since there are immensely more wavevectors in the ultraviolet and X-ray ranges than in the infrared and visible, this predicts that when you open your oven door you'd get a sun tan or worse (the so-called ultraviolet catastrophe). Simple experiments looking at radiation emitted from pinholes in otherwise closed boxes held at fixed temperature saw a spectrum which looked compatible with classical statistical mechanics for small frequency radiation, but was cut off at high frequencies. This was called blackbody radiation because a black-walled box led to fast equilibration of photons inside.

Let us calculate the correct energy distribution inside a black-walled box. The number of single-particle plane-wave eigenstates $g(\omega)\,d\omega$ in a small range $d\omega$ is[27]

$$g(\omega)\,d\omega = (4\pi k^2)\left(\frac{d|k|}{d\omega}\,d\omega\right)\frac{2V}{(2\pi)^3} \quad (7.63)$$

where the first term is the surface area of the sphere of radius $k$, the second term is the thickness of the sphere for a small $d\omega$, and the last is the density of single-particle plane-wave eigenstate wavevectors times two (because there are two photon polarizations per wavevector). Knowing $k^2 = \omega^2/c^2$ and $d|k|/d\omega = 1/c$, we find the density of plane-wave eigenstates per unit frequency

$$g(\omega) = \frac{V\omega^2}{\pi^2 c^3}. \quad (7.64)$$

[27] We're going to be sloppy and use $g(\omega)$ for photons to mean eigenstates per unit frequency, and $g(\varepsilon)$ later for single-particle eigenstates per unit energy.
Now, the number of photons is not fixed: they can be created or destroyed, so their chemical potential is zero.[28] Their energy is $\varepsilon_{\mathbf{k}} = \hbar\omega_{\mathbf{k}}$. Finally, they are to an excellent approximation identical, non-interacting bosons, so the number of photons per eigenstate with frequency $\omega$ is $\langle n \rangle = \frac{1}{e^{\hbar\omega/k_B T} - 1}$. This gives us a number of photons

$$(\#\text{ of photons})\,d\omega = \frac{g(\omega)}{e^{\hbar\omega/k_B T} - 1}\,d\omega \quad (7.65)$$

and an electromagnetic (photon) energy per unit volume $u(\omega)$ given by

$$V u(\omega)\,d\omega = \frac{\hbar\omega\, g(\omega)}{e^{\hbar\omega/k_B T} - 1}\,d\omega = \frac{V\hbar}{\pi^2 c^3}\,\frac{\omega^3\,d\omega}{e^{\hbar\omega/k_B T} - 1}. \quad (7.66)$$

This is Planck's famous formula for black-body radiation. At low frequencies, we can approximate $e^{\hbar\omega/k_B T} - 1 \approx \hbar\omega/k_B T$, yielding the Rayleigh-Jeans formula

$$V u_{RJ}(\omega)\,d\omega = V\frac{k_B T}{\pi^2 c^3}\,\omega^2\,d\omega = k_B T\, g(\omega)\,d\omega \quad (7.67)$$

just as one would expect from equipartition: $k_B T$ per classical harmonic oscillator.

Fig. 7.7 The Planck black-body radiation power spectrum (blackbody energy density versus frequency), with the Rayleigh-Jeans equipartition approximation, valid for low frequency $\omega$.

For modes with frequencies high compared to $k_B T/\hbar$, equipartition no longer holds. The energy gap $\hbar\omega$, just as for the low-temperature specific heat from section 7.2, leads to an excitation probability that is suppressed by the exponential Boltzmann factor $e^{-\hbar\omega/k_B T}$, as one can see from equation 7.66 by approximating $\frac{1}{e^{\hbar\omega/k_B T} - 1} \approx e^{-\hbar\omega/k_B T}$. Planck's discovery that quantizing the energy averted the ultraviolet catastrophe led to his name being given to $h$.
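The crossover between the Planck and Rayleigh-Jeans forms, equations 7.66–7.67, is clearest in the dimensionless variable $x = \hbar\omega/k_B T$ (a sketch):

```python
import math

def planck(x):
    """Dimensionless Planck spectrum x^3 / (e^x - 1), with x = hbar*omega/(k_B T)."""
    return x**3 / math.expm1(x)

def rayleigh_jeans(x):
    """Equipartition (Rayleigh-Jeans) form: k_B T per mode gives x^2."""
    return x**2

low = 0.01    # low frequency: Planck approaches Rayleigh-Jeans
high = 30.0   # high frequency: Planck is exponentially suppressed
```

At `x = 0.01` the two spectra agree to better than one percent; at `x = 30` the Planck spectrum is exponentially tiny while the Rayleigh-Jeans form keeps growing -- the ultraviolet catastrophe that quantization cures.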
7.6.3 Bose Condensation

How do the properties of bosons change when they cannot be created and destroyed? What happens if we have $N$ non-interacting free bosons in our box of volume $V$ with periodic boundary conditions?

Let us assume that our bosons are spinless, have mass $m$, and are non-relativistic, so their energy is $\varepsilon = p^2/2m = \hbar^2 k^2/2m$. We'll begin by assuming that we can make the same continuum approximation to the density of states as we did in the case of black-body radiation. In equation 7.62, the number of plane-wave eigenstates per unit volume in $\mathbf{k}$-space is $V/8\pi^3$, so the density in momentum space $\mathbf{p} = \hbar\mathbf{k}$ is $V/(2\pi\hbar)^3$.

[28] We can also see this from the fact that photons are excitations within a harmonic oscillator; in section 7.4 we noted that the excitations in a harmonic oscillator satisfy Bose statistics with $\mu = 0$.
For our massive particles $d\varepsilon/d|p| = |p|/m = \sqrt{2\varepsilon/m}$, so the number of plane-wave eigenstates in a small range of energy $d\varepsilon$ is

$$g(\varepsilon)\,d\varepsilon = (4\pi p^2)\left(\frac{d|p|}{d\varepsilon}\,d\varepsilon\right)\frac{V}{(2\pi\hbar)^3}
= (4\pi \cdot 2m\varepsilon)\left(\sqrt{\frac{m}{2\varepsilon}}\,d\varepsilon\right)\frac{V}{(2\pi\hbar)^3}
= \frac{V m^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\sqrt{\varepsilon}\,d\varepsilon, \quad (7.68)$$

where the first term is the surface area of the sphere in $\mathbf{p}$-space, the second is the thickness of the sphere, and the third is the density of plane-wave eigenstates per unit volume in $\mathbf{p}$-space.

Now, we fill each of these single-particle plane-wave eigenstates with an expected number given by the Bose-Einstein distribution at chemical potential $\mu$, $1/(e^{(\varepsilon-\mu)/k_B T} - 1)$, so the total number of particles $N$ must be given by

$$N(\mu) = \int_0^\infty \frac{g(\varepsilon)}{e^{(\varepsilon-\mu)/k_B T} - 1}\,d\varepsilon. \quad (7.69)$$

We must vary $\mu$ in this equation to give us the correct number of particles $N$. For larger numbers of particles we raise $\mu$, forcing more particles into each of the single-particle states. There is a limit, however, to how hard we can push. As we noted in section 7.4, $\mu$ cannot be larger than the lowest single-particle eigenvalue, because at that point that state gets a diverging number of particles. In our continuum approximation, however, when $\mu = 0$ the integral for $N(\mu)$ converges.[29] Thus the largest number of particles $N_{max}^{cont}$ we can fit into our box within our continuum approximation for the density of states is the value of equation 7.69 at $\mu = 0$:[30]

$$N_{max}^{cont} = \int_0^\infty \frac{g(\varepsilon)}{e^{\varepsilon/k_B T} - 1}\,d\varepsilon
= \frac{V m^{3/2}}{\sqrt{2}\,\pi^2\hbar^3} \int_0^\infty \frac{\sqrt{\varepsilon}}{e^{\varepsilon/k_B T} - 1}\,d\varepsilon
= V\left(\frac{2\pi m k_B T}{h^2}\right)^{3/2} \frac{2}{\sqrt{\pi}} \int_0^\infty \frac{\sqrt{z}}{e^z - 1}\,dz
= \frac{V}{\lambda^3}\,\zeta(3/2), \quad (7.70)$$

where $\zeta$ is the Riemann zeta function, with $\zeta(3/2) \approx 2.612$, and where $\lambda = h/\sqrt{2\pi m k_B T}$ is the thermal de Broglie wavelength we saw first in the canonical ensemble of the ideal gas, equation 5.26.[31] Thus something new has to happen at a critical density

$$\frac{N_{max}}{V} = \frac{\zeta(3/2)}{\lambda^3} = \frac{2.612 \text{ particles}}{\text{de Broglie volume}}. \quad (7.71)$$

[29] At $\mu = 0$, the denominator of the integrand in equation 7.69 is approximately $\varepsilon/k_B T$ for small $\varepsilon$, but the numerator goes as $\sqrt{\varepsilon}$, so the integral converges at the lower end: $\int_0^X \varepsilon^{1/2}\cdot\frac{1}{\varepsilon}\,d\varepsilon = 2\varepsilon^{1/2}\big|_0^X = 2\sqrt{X}$.
[30] The function $\zeta(s) = \frac{1}{\Gamma(s)}\int_0^\infty \frac{z^{s-1}}{e^z - 1}\,dz$ is famous because it is related to the distribution of prime numbers, because it is the subject of the famous unproven Riemann hypothesis (about its zeros in the complex plane), and because the values in certain regions form excellent random numbers. Remember also $\frac{1}{2}! = \Gamma(3/2) = \sqrt{\pi}/2$.
[31] This formula has a simple interpretation: the quantum statistics of the particles begin to dominate the behavior when they are within a thermal de Broglie wavelength of one another.

What happens when we try to cram more particles in? What happens is that our approximation of the distribution of eigenstates as a
continuum breaks down. Figure 7.8 shows a schematic of the first few single-particle eigenvalues. When the distance between $\mu$ and the bottom level $\varepsilon_0$ becomes significantly smaller than the distance between the bottom and the next level, the continuum approximation (which roughly treats the single state $\varepsilon_0$ as the integral halfway to $\varepsilon_1$) becomes qualitatively wrong. This lowest state absorbs all the extra particles added to the system beyond $N_{max}^{cont}$.[32] This is called Bose-Einstein condensation.

[32] The next few states have quantitative corrections, but the continuum approximation is only off by small factors.

Usually, one doesn't add particles at fixed temperature; one lowers the temperature at fixed density $N/V$, where Bose condensation occurs at temperature

$$k_B T_c^{BEC} = \frac{h^2}{2\pi m}\left(\frac{N}{V\,\zeta(3/2)}\right)^{2/3}. \quad (7.72)$$

Bose condensation has recently been observed in ultracold gasses (see problem 7.11).
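A numeric check of equation 7.72 (a sketch; the helium-4 mass and molar volume quoted in the next paragraph are used, with Avogadro's number converting the molar volume to a number density):

```python
import math

h = 6.626e-34        # Planck's constant, J s
k_B = 1.381e-23      # Boltzmann's constant, J/K
N_A = 6.022e23       # Avogadro's number
zeta_32 = 2.612      # Riemann zeta(3/2)

m = 6.65e-27                   # He-4 mass in kg (6.65e-24 g)
n = N_A / 27.6e-6              # number density: 27.6 cm^3/mole = 27.6e-6 m^3/mole

# Equation 7.72: k_B T_c = (h^2 / 2 pi m) (n / zeta(3/2))^(2/3)
T_c = (h**2 / (2 * math.pi * m * k_B)) * (n / zeta_32) ** (2.0 / 3.0)
```

The result comes out near 3.1 K, reproducing the 3.13 K quoted below, close to the 2.176 K superfluid transition.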
Bose condensation has also long been considered the underlying principle behind superfluidity.[33] Liquid He⁴ undergoes an unusual transition at about 2.176 K to a state without viscosity: it will swirl round a circular tube for as long as your refrigeration lasts. The quantitative study of the superfluid transition involves the interactions between the helium atoms and the scaling methods we'll introduce in chapter 13. But it's interesting to note that the Bose condensation temperature for liquid He⁴, with $m = 6.65 \times 10^{-24}$ gm and molar volume 27.6 cm³/mole, is 3.13 K: quite close to the superfluid transition temperature.

[33] The connection is deep. The density matrix of a superfluid has an unusual property, called off-diagonal long-range-order, which is also found in the Bose condensate (see problem 10.5).

Fig. 7.8 Bose condensation: the chemical potential $\mu$ is so close to the ground state energy $\varepsilon_0$ that the continuum approximation to the density of states breaks down. The ground state is macroscopically occupied (that is, filled by a non-zero fraction of the total number of particles $N$).

7.7 Metals and the Fermi Gas
by a model of non-interacting fermions. Lets solve the simplest such
model, N free non-interacting fermions in a box.
Let our particles be non-relativistic and spin 1/2. The singleparticle
eigenstates are the same as those for bosons34 except that there are two 34
You need two particles for bosons
states (spin up, spin down) per plane wave. Hence the density of states and fermions to dier.
is given by twice that of equation 7.68:

2V m3/2
g(/) = /. (7.73)
2 3
So, the number of fermions at chemical potential is given by integrating
g(/) times the expected number of fermions in a state of energy /, given
by the Fermi function f (/) of equation 7.46:
N () = g(/)f (/) d/. (2)/kB T + 1
d/. (7.74)
0 0 e
What chemical potential will give us N fermions? In general, one
must do a self-consistent calculation, but the calculation is easy in the
important limit of zero temperature. In that limit (gure 7.4) the Fermi
function is a step function f (/) = ( /); all states below are lled,
c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity
126 Quantum Statistical Mechanics

and all states above are empty. The zero-temperature value of the
chemical potential is called the Fermi energy /F . We can nd the number
of fermions by integrating up to = /F :
 2F 3/2  2F
2m (2/F m)3/2
N= g(/) d/ = V / d/ = V. (7.75)
0 2 3 0 3 2 3
This formula becomes easier to understand if we realize that we are
filling all states with wavevector k < kF, where the Fermi wavevector kF
is the length of the wavevector whose eigenenergy equals the Fermi
energy: ℏ²kF²/2m = pF²/2m = εF, so kF = √(2 εF m)/ℏ. The resulting sphere
of occupied states at T = 0 is called the Fermi sphere. The number of
fermions inside the Fermi sphere is thus

    N = (4π kF³/3) [2V/(2π)³] = kF³ V/(3π²),                          (7.76)

the k-space volume of the Fermi sphere times the k-space density of
states (including the factor of two for spin).

Fig. 7.9 The Fermi surface for lithium, from [21]. The Fermi energy for
lithium is 4.74 eV, with one conduction electron outside a He closed
shell. Note that for most metals the Fermi energy is much larger than kB
times the melting point (εF = 4.74 eV ≈ 55,000 K, while the melting
point is 453 K). Hence they are well described by the T = 0 Fermi
surfaces shown here, slightly smeared by the Fermi function of figure 7.4.

We mentioned earlier that the independent fermion approximation was
startlingly useful even though the interactions are not small. Ignoring
the Coulomb repulsion between electrons in a metal, or the strong
interaction between neutrons in a neutron star, gives an excellent
description of their actual behavior. Our calculation above, though,
also assumed that the electrons are free particles, experiencing no
external potential. This approximation isn't particularly accurate in
general: the interactions with the atomic nuclei are important, and are
primarily what makes one material different from another. In particular,
the atoms in a crystal form a periodic potential for the electrons. One
can show that the single-particle eigenstates in a periodic potential
can be chosen to be periodic functions times plane waves [7.61] with
exactly the same wave-vectors as in the free-fermion case. A better
approximation is given by incorporating the effects of the inner-shell
electrons into the periodic potential, and filling the Fermi sea with
the remaining conduction electrons. The filling of the Fermi surface in
k-space as described here is changed only insofar as the energies of
these single-particle states are no longer simple. Some metals
(particularly the alkali metals, like lithium in figure 7.9) have
roughly spherical Fermi surfaces; many (like aluminum in figure 7.10)
are quite intricate, with several pieces to them [8, Ch. 9-11].

Fig. 7.10 The Fermi surface for aluminum, also from [21]. Aluminum has a
Fermi energy of 11.7 eV, with three conduction electrons outside a Ne
closed shell.
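The numbers quoted in the captions of figures 7.9 and 7.10 can be cross-checked against equations 7.75 and 7.76. A minimal numerical sketch (not from the text; rounded SI constants):

```python
import math

# Rounded SI constants (assumed values, not from the text)
hbar = 1.0546e-34        # J s
m_e  = 9.1094e-31        # kg
kB   = 1.3807e-23        # J/K
eV   = 1.6022e-19        # J

eps_F = 4.74 * eV                          # lithium Fermi energy (fig. 7.9)
T_F = eps_F / kB                           # Fermi temperature
k_F = math.sqrt(2 * m_e * eps_F) / hbar    # Fermi wavevector
n = k_F**3 / (3 * math.pi**2)              # electron density, eq. 7.76

print(T_F)   # ≈ 5.5e4 K, the "55,000 K" quoted in the caption
print(n)     # ≈ 4.7e28 electrons/m^3, about one per lithium atom
```

Running the same three lines with εF = 11.7 eV gives n ≈ 1.8 × 10²⁹/m³, roughly the three conduction electrons per aluminum atom quoted in figure 7.10.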
To be pub. Oxford UP, Fall05
7.7 Metals and the Fermi Gas 127

(7.1) Phase Space Units and the Zero of Entropy. (Quantum)

In classical mechanics, the entropy S = kB log Ω goes to minus infinity
as the temperature is lowered to zero; in quantum mechanics the entropy
per particle goes to zero,35 because states are quantized and the ground
state is the only one populated. This is Nernst's theorem, the third law
of thermodynamics.

The classical phase-space volume has units of
((momentum)(distance))^{3N}. It's a little perverse to take the
logarithm of a quantity with units. The obvious candidate with these
dimensions is Planck's constant h^{3N}: if we measure phase-space volume
in units of h per dimension, Ω will be dimensionless. Of course, the
correct dimension could be a constant times h, like ℏ . . .

(a) Arbitrary zero of the classical entropy. Show that the width of the
energy shell dE in the definition of the entropy does not change the
classical entropy per particle S/N. Show that the choice of units in
phase space does change the classical entropy per particle.

We need to choose the units of classical phase-space volume so that the
entropy agrees with the high-temperature entropy for the quantum
systems. That is, we need to find out how many quantum eigenstates per
unit volume of classical phase space we should expect at high energies.
We can fix these units by explicitly matching the quantum result to the
classical one for a simple system. Let's start with a free particle.

(b) Phase-space density of states for a particle in a one-dimensional
box. Show, or note, that the quantum momentum-space density of states
for a free quantum particle in a one-dimensional box of length L with
periodic boundary conditions is L/h. Draw a picture of the classical
phase space of this box (p, x), and draw a rectangle of length L for
each quantum eigenstate. Is the phase-space area per eigenstate equal to
h, as we assumed in 3.5?

This works also for N particles in a three-dimensional box.

(c) Phase-space density of states for N particles in a box. Show that
the density of states for N free particles in a cubical box of volume V
with periodic boundary conditions is V^N/h^{3N}, and hence that the
phase-space volume per state is h^{3N}.

Can we be sure that the answer is independent of which simple system we
use to match? Let's see if it also works for the harmonic oscillator.

(d) Phase-space density of states for a harmonic oscillator. Consider a
harmonic oscillator with Hamiltonian H = p²/2m + ½ m ω² q². Draw a
picture of the energy surface with energy E, and find the volume (area)
of phase space enclosed. (Hint: the area of an ellipse is π r1 r2 where
r1 and r2 are the largest and smallest radii, corresponding to the major
and minor axes.) What is the volume per energy state, the volume between
En and En+1, for the eigenenergies En = (n + ½)ℏω?

Why must these two calculations agree? How can we derive this result in
general, even for nasty systems of interacting particles? The two
traditional methods for directly calculating the phase-space units in
general systems, semiclassical quantization [54, ch. 48, p. 170] and the
path-integral formulation of quantum statistical mechanics [30], would
be too distracting to present here. Upon some thought, we realize that
one cannot choose different units for the classical phase-space volume
for different systems. They all must agree, because one can transform
one into another. Consider N interacting particles in a box, at high
temperatures where classical statistical mechanics is valid. Imagine
slowly and reversibly turning off the interactions between the particles
(making them into our ideal gas). We carefully remain at high
temperatures, and measure the entropy flow into or out of the system.
The entropy difference will be given by classical statistical mechanics,
whatever units one wishes to choose for the phase-space volume. The
entropy of the interacting system is thus the entropy of the ideal gas
(with volume h per state) plus the classical entropy change; hence it
also must use the same phase-space units.36

(7.2) Does Entropy Increase in Quantum Systems? (Mathematics, Quantum)

We saw in problem 6.5 that the non-equilibrium entropy
S_nonequil = −kB ∫ ρ log ρ is constant in a classical mechanical
Hamiltonian system. We'll show here that it is constant also in a closed
quantum Hamiltonian system.

A general ensemble in a quantum system is described by the density
matrix ρ. In most of statistical mechanics, ρ is diagonal when we use a
basis of energy eigenstates. Here, since each energy eigenstate is time
independent except for a phase, any mixture of energy eigenstates will
have a constant density matrix, and so will have a constant entropy.
35 If the ground state is degenerate, the entropy doesn't go to zero,
but it typically stays finite as the number of particles N gets big, so
for large N the entropy per particle goes to zero.
36 In particular, if we cool the interacting system to zero temperature
and remain in equilibrium, reaching the ground state, its entropy will
go to zero; the entropy flow out of the system on cooling is given by
our classical formula with phase-space volume measured in units of h.
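For part (d) of exercise 7.1, the energy surface H = E is an ellipse with semi-axes √(2mE) and √(2E/mω²), so the enclosed area is 2πE/ω, and the area between successive eigenenergies En = (n + ½)ℏω is 2πℏ = h. A brute-force numerical check (a sketch, not from the text; units m = ω = ℏ = 1, so h = 2π):

```python
import numpy as np

m, omega, hbar = 1.0, 1.0, 1.0    # units in which h = 2*pi

def area_inside(E, n=2001, span=6.0):
    """Phase-space area with H(p, q) < E, by brute-force cell counting."""
    p = np.linspace(-span, span, n)
    q = np.linspace(-span, span, n)
    P, Q = np.meshgrid(p, q)
    H = P**2 / (2 * m) + 0.5 * m * omega**2 * Q**2
    cell = (p[1] - p[0]) * (q[1] - q[0])
    return np.count_nonzero(H < E) * cell

E0, E1 = 0.5 * hbar * omega, 1.5 * hbar * omega   # En = (n + 1/2) hbar omega
print(area_inside(E0))                    # ≈ 2 pi E0 / omega ≈ 3.14
print(area_inside(E1) - area_inside(E0))  # ≈ 2 pi hbar = h ≈ 6.28
```

The same count works for any confining H(p, q), which is the point of the exercise: the phase-space area per quantum state is h, independent of the system used to calibrate it.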
(a) Entropy is Constant: Mixtures of Energy Eigenstates. Prove that if ρ
is a density matrix diagonal in the basis of energy eigenstates, then ρ
is time independent. Hence, conclude that the entropy S = −Tr ρ log ρ is
time-independent.

Thus, not only are the microcanonical and canonical ensembles time
independent, but mixing energy eigenstates in any ratio would be time
independent. To justify equilibration in quantum systems, one must
couple the system to the outside world and induce transitions between
eigenstates.

In the particular case of the entropy, the entropy is time independent
even for general, time-dependent density matrices.

(b) Entropy is Constant: General Density Matrices. Prove that
S = −Tr(ρ log ρ) is time-independent, where ρ is any density matrix.
(Hint: Show that Tr(ABC) = Tr(CAB) for any matrices A, B, and C. Also
you should know that an operator M commutes with any function f(M).)

(7.3) Phonons on a String. (Quantum)

One-dimensional Phonons. A nano-string of length L with mass per unit
length μ under tension τ has a vertical, transverse displacement
u(x, t). The kinetic energy density is (μ/2)(∂u/∂t)² and the potential
energy density is (τ/2)(∂u/∂x)².

Write the kinetic energy and the potential energy in new variables,
changing from u(x, t) to normal modes qk(t) with
u(x, t) = Σn qkn(t) sin(kn x), kn = nπ/L. Show in these variables that
the system is a sum of decoupled harmonic oscillators. Calculate the
density of states per unit frequency g(ω), the number of normal modes in
a frequency range (ω, ω + δ) divided by δ, keeping δ large compared to
the spacing between modes.37 Calculate the specific heat of the string
c(T) per unit length in the limit L → ∞, treating the oscillators
quantum mechanically. What is the specific heat of the classical string?

(7.4) Crystal Defects. (Quantum, Basic)

Defects in Crystals. A defect in a crystal has one on-center
configuration with energy zero, and M off-center configurations with
energy ε, with no significant quantum tunneling between the states. The
Hamiltonian can be approximated by the (M + 1) × (M + 1) matrix

        ⎛ 0  0  ···  0 ⎞
    H = ⎜ 0  ε  ···  0 ⎟                                              (7.77)
        ⎝ 0  0  ···  ε ⎠

There are N defects in the crystal, which can be assumed stuck in
position (and hence distinguishable) and assumed not to interact with
one another.

Write the canonical partition function Z(T), the mean energy E(T), the
fluctuations in the energy, the entropy S(T), and the specific heat C(T)
as a function of temperature. Plot the specific heat per defect C(T)/N
for M = 6; set the unit of energy equal to ε and kB = 1 for your plot.
Derive a simple relation between M and the change in entropy between
zero and infinite temperature. Check this relation using your formula
for S(T).

(7.5) Density Matrices. (Quantum)

(a) Density matrices for photons. Write the density matrix for a photon
traveling along z and linearly polarized along x, in the basis where
(1, 0) and (0, 1) are polarized along x and y. Write the density matrix
for a right-handed polarized photon, (1/√2, i/√2), and the density
matrix for unpolarized light. Calculate Tr(ρ), Tr(ρ²), and
S = −kB Tr(ρ log ρ). Interpret the values of the three traces
physically: one is a check for pure states, one is a measure of
information, and one is a normalization.

(b) Density matrices for a spin. (Adapted from Halperin's course, 1976.)
Let the Hamiltonian for a spin be

    H = −(ℏ/2) B · σ                                                  (7.78)

where σ = (σx, σy, σz) are the three Pauli spin matrices, and B may be
interpreted as a magnetic field, in units where the gyromagnetic ratio
is unity. Remember that σi σj − σj σi = 2i εijk σk. Show that any 2 × 2
density matrix may be written in the form

    ρ = ½ (1 + p · σ).                                                (7.79)

Show that the equations of motion for the density matrix
iℏ ∂ρ/∂t = [H, ρ] can be written as dp/dt = −B × p.

(7.6) Ensembles and Statistics: 3 Particles, 2 Levels. (Quantum)

A system has two single-particle eigenfunctions, with energies (measured
in degrees Kelvin) E0/kB = −10 and E1/kB = 10. Experiments are performed
by adding three

37 This is the density of single-particle eigenstates per unit
frequency. In problem 7.11 we'll study the density of many-body energy
eigenstates g(E) in a trap with precisely three frequencies (where our
g(ω) would be δ(ω − ω0) + 2δ(ω − ω1)). Don't confuse the two.
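For exercise 7.4 the algebra is compact: the canonical partition function per defect is Z(T) = 1 + M e^{−ε/kB T}, and the entropy change between zero and infinite temperature comes out to kB log(M + 1). A numerical sketch (not from the text; units ε = kB = 1, M = 6 as in the requested plot):

```python
import numpy as np

M, eps = 6, 1.0      # M off-center states at energy eps; units kB = eps = 1

def entropy(T):
    Z = 1 + M * np.exp(-eps / T)          # partition function per defect
    E = M * eps * np.exp(-eps / T) / Z    # mean energy per defect
    F = -T * np.log(Z)                    # free energy per defect
    return (E - F) / T                    # entropy S = (E - F)/T

dS = entropy(1e6) - entropy(1e-2)
print(dS, np.log(M + 1))    # both ≈ 1.9459: Delta S = kB log(M + 1)

# specific heat per defect, C = dE/dT, by numerical differentiation
T = np.linspace(0.05, 3.0, 500)
E = M * eps * np.exp(-eps / T) / (1 + M * np.exp(-eps / T))
C = np.gradient(E, T)       # shows a single Schottky-like peak
```

The log(M + 1) result is just the count of equally likely states at infinite temperature versus the single on-center state at zero temperature.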
non-interacting particles to these two states, either identical spin-1/2
fermions, identical spinless bosons, distinguishable particles, or
spinless identical particles obeying Maxwell-Boltzmann statistics.
Please make a table for this problem, giving your answers for the four
cases (Fermi, Bose, Dist., and MB) for each of the three parts.
Calculations may be needed, but only the answers will be graded.

Fig. 7.11 Entropies for three particles in two states, at constant E
and N. (Curves are labeled A-F; the vertical axis marks S/kB values
log(1/6), log 1, log 2, log 3, and log 8; the horizontal axis is E/kB
from −30 to 30 degrees K.)

(a) The system is first held at constant energy. In figure 7.11 which
curve represents the entropy of the fermions as a function of the
energy? Bosons? Distinguishable particles? Maxwell-Boltzmann particles?

Fig. 7.12 Energies of three particles in two states, at constant T and
N. (Curves are labeled A-E; the vertical axis is the average energy
E/kB from −30 to 20 degrees K; the horizontal axis is T from 0 to 80
degrees K.)

(b) The system is now held at constant temperature. In figure 7.12 which
curve represents the mean energy of the fermions as a function of
temperature? Bosons? Distinguishable particles? Maxwell-Boltzmann
particles?

Fig. 7.13 Chemical potentials for three particles in two states, at
constant T and μ. (Curves are labeled A-F; the vertical axis is the
chemical potential μ/kB from −60 to 60 degrees K; the horizontal axis
is temperature T from 0 to 80 degrees K.)

(c) The system is now held at constant temperature, with chemical
potential set to hold the average number of particles equal to three. In
figure 7.13, which curve represents the chemical potential of the
fermions as a function of temperature? Bosons? Distinguishable?
Maxwell-Boltzmann particles?

(7.7) Bosons are Gregarious: Superfluids and Lasers (Quantum)

Many experiments insert a new particle into a many-body state: new
electrons into a metal, new electrons or electron pairs into a
superconductor, new bosonic atoms into a superfluid, new photons into a
cavity already filled with light. These experiments explore how the bare
inserted particle decomposes into the natural states of the many-body
system. The cases of photons and bosons illustrate a key connection
between laser physics and Bose condensates.

Adding a particle to a Bose condensate. Suppose we have a
non-interacting system of bosonic atoms in a box with single-particle
eigenstates ψn. Suppose the system begins in a Bose-condensed state with
all N bosons in a state ψ0, so

    ΨN(r1, ..., rN) = ψ0(r1) ··· ψ0(rN).                              (7.80)

Suppose a new particle is gently injected into the system, into an equal
superposition of the M lowest single-particle states.38 That is, if it
were injected into an empty box, it would start in state

    φ(rN+1) = (1/√M) [ψ0(rN+1) + ψ1(rN+1) + ··· + ψM−1(rN+1)].        (7.81)

The state Φ(r1, ..., rN+1) after the particle is inserted into the
non-interacting Bose condensate is given by symmetrizing the product
function ΨN(r1, ..., rN) φ(rN+1) (equation 7.28).

(a) Calculate the symmetrized initial state of the system with the
injected particle. Show that the probability that the new boson enters
the ground state (ψ0) is enhanced over that of its entering an empty
state (ψm for 0 < m < M) by a factor N + 1. (Hint: first do it for
N = 1.)

So, if a macroscopic number of bosons are in one single-particle
eigenstate, a new particle will be much more likely to add itself to
this state than to any of the microscopically populated states.

Notice that nothing in your analysis depended on ψ0 being the lowest
energy state. If we started with a macroscopic number of particles in a
single-particle state with wave-vector k (that is, a superfluid with a
supercurrent in direction k), new added particles, or particles
scattered by inhomogeneities, will preferentially enter into that state.
This is an alternative approach to understanding the persistence of
supercurrents, complementary to the topological approach (exercise 9.4).

Adding a photon to a laser beam. In part (a), we saw that adding a boson
to a single-particle eigenstate with N existing bosons has a probability
which is larger by a factor N + 1 than adding a boson to an empty state.
This chummy behavior between bosons is also the principle behind
lasers.39 If we think of an atom in an excited state, the photon it
emits during its decay will prefer to join the laser beam rather than to
go off into one of its other available modes. In this factor N + 1, the
N represents stimulated emission, where the existing electromagnetic
field pulls out the energy from the excited atom, and the 1 represents
spontaneous emission, which occurs even in the absence of existing
photons.

Imagine a single atom in an excited state with excitation energy E and
decay rate Γ, in a cubical box of volume V with periodic boundary
conditions for the photons. By the energy-time uncertainty principle,
ΔE Δt ≥ ℏ/2, the energy of the atom will be uncertain by an amount
ΔE ≈ ℏΓ. Assume for simplicity that, in a cubical box without
preexisting photons, the atom would decay at an equal rate into any mode
in the range E − ℏΓ/2 < ℏω < E + ℏΓ/2.

(b) Assuming a large box and a small decay rate Γ, find a formula for
the number of modes M per unit volume V competing for the photon emitted
from our atom. Evaluate your formula for a laser with wavelength
λ = 619 nm and line-width Γ = 10 kHz. (Hint: use the density of states,
equation 7.64.)

Assume the laser is already in operation, so there are N photons in the
volume V of the lasing material, all in one plane-wave state (a
single-mode laser).

(c) Using your result from part (a), give a formula for the number of
photons per unit volume N/V there must be in the lasing mode for the
atom to have 50% likelihood of emitting into that mode.

The main task in setting up a laser is providing a population of excited
atoms! Amplification can occur if there is a population inversion, where
the number of excited atoms is larger than the number of atoms in the
lower energy state (clearly a non-equilibrium condition). This is made
possible by pumping atoms into the excited state by using one or two
other single-particle eigenstates.

(7.8) Einstein's A and B (Quantum, Mathematics)

Einstein deduced some basic facts about the interaction of light with
matter very early in the development of quantum mechanics, by using
statistical mechanics! In particular, he established that stimulated
emission was demanded for statistical mechanical consistency, and found
formulas determining the relative rates of absorption, spontaneous
emission, and stimulated emission. (See [86, I.42-5].)

Consider a system consisting of non-interacting atoms weakly coupled to
photons (electromagnetic radiation), in equilibrium at temperature
kB T = 1/β. The atoms have two energy eigenstates E1 and E2 with average
populations N1 and N2; the relative population is given as usual by the
Boltzmann distribution

    ⟨N2/N1⟩ = e^{−β(E2−E1)}.                                          (7.82)

38 For free particles in a cubical box of volume V, injecting a particle
at the origin, ψ(r) = δ(r), would be a superposition of all plane-wave
states of equal weight, ψ(r) = (1/√V) Σk e^{ik·r} (appendix A). (In
second-quantized notation, a†(x = 0) = (1/√V) Σk a†k.) So, we gently add
a particle at the origin by restricting this sum to low-energy states.
This is how quantum tunneling into condensed states (say, from Josephson
junctions or scanning tunneling microscopes) is usually modeled.
39 Laser is an acronym for Light Amplification by the Stimulated
Emission of Radiation.
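The N + 1 enhancement of part (a) of exercise 7.7 can be verified by explicit symmetrization in a small discrete Hilbert space. A numerical sketch (not from the text; N = 4 condensed bosons, injection spread over M = 3 states):

```python
import itertools
import math
import numpy as np

def symmetrize(t):
    """Average an n-particle amplitude tensor over all particle permutations."""
    out = np.zeros_like(t)
    for p in itertools.permutations(range(t.ndim)):
        out = out + np.transpose(t, p)
    return out / math.factorial(t.ndim)

N, M = 4, 3                      # N bosons condensed in psi_0; phi over M states
e0 = np.zeros(M); e0[0] = 1.0
phi = np.ones(M) / np.sqrt(M)    # equal superposition of the M lowest states

state = e0
for _ in range(N - 1):                      # build psi_0 (x) ... (x) psi_0
    state = np.multiply.outer(state, e0)
state = np.multiply.outer(state, phi)       # append the injected particle
state = symmetrize(state)                   # project onto the bosonic subspace
state = state / np.linalg.norm(state)

P_ground = abs(state[(0,) * (N + 1)]) ** 2  # all N+1 bosons in psi_0
P_excited = sum(abs(state[idx]) ** 2        # one boson in psi_1, N in psi_0
                for idx in set(itertools.permutations((1,) + (0,) * N)))
print(P_ground / P_excited)                 # -> 5.0, i.e. N + 1
```

Because occupation-number sectors are orthogonal, the probabilities are just sums of squared amplitudes over the corresponding index tuples; the normalization cancels in the ratio, which comes out exactly N + 1.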
The energy density in the electromagnetic field is given by the Planck
distribution (equation 7.66):

    u(ω) = (ℏ/π²c³) ω³/(e^{βℏω} − 1).                                 (7.83)

An atom in the ground state will absorb electromagnetic energy from the
photons at a rate that is proportional to the energy density u(ω) at the
excitation energy ℏω = E2 − E1. Let us define this absorption rate per
atom to be 2πBu(ω).40

An atom in the excited state E2, with no electromagnetic stimulation,
will decay into the ground state with a rate A, emitting a photon.
Einstein noted that neither A nor B should depend upon temperature.

Einstein argued that just these two rates would lead to an
inconsistency.

(a) Compute the long-time average ratio N2/N1 assuming only absorption
and spontaneous emission. Even in the limit of weak coupling (small A
and B), show that this equation is incompatible with the statistical
distributions [7.82] and [7.83]. (Hint: Write a formula for dN1/dt, and
set it equal to zero. Is B/A temperature independent?)

Einstein fixed this by introducing stimulated emission. Roughly
speaking, an atom experiencing an oscillating electromagnetic field is
more likely to emit photons into that mode. Einstein found that the
stimulated emission rate had to be a constant 2πB′ times the energy
density u(ω).

(b) Write the equation for dN1/dt, including absorption (a negative
term) and spontaneous and stimulated emission from the population N2.
Assuming equilibrium, use this equation and equations 7.82 and 7.83 to
solve for B and B′ in terms of A. These are generally termed the
Einstein A and B coefficients.

Let's express the stimulated emission rate in terms of the number of
excited photons per mode (see exercise 7.7b for an alternative
derivation).

(c) Show that the rate of decay of excited atoms A + 2πB′u(ω) is
enhanced by a factor of ⟨n⟩ + 1 over the zero-temperature rate, where
⟨n⟩ is the expected number of photons in a mode at frequency
ℏω = E2 − E1.

(7.9) Phonons and Photons are Bosons. (Quantum)

Phonons and photons are the elementary, harmonic excitations of the
elastic and electromagnetic fields. We've seen in 7.3 that phonons are
decoupled harmonic oscillators, with a distribution of frequencies ω. A
similar analysis shows that the Hamiltonian of the electromagnetic field
can be decomposed into harmonic normal modes called photons.

This problem will explain why we think of phonons and photons as
particles, instead of excitations of harmonic modes.

(a) Show that the canonical partition function for a quantum harmonic
oscillator of frequency ω is the same as the grand canonical partition
function for bosons multiply filling a single state with energy ℏω, with
μ = 0, up to a shift in the arbitrary zero of the total energy of the
system.

The Boltzmann filling of a harmonic oscillator is therefore the same as
the Bose-Einstein filling of bosons into a single quantum state, except
for an extra shift in the energy of ℏω/2. This extra shift is called the
zero-point energy. The excitations within the harmonic oscillator are
thus often considered particles with Bose statistics: the nth excitation
is n bosons occupying the oscillator's quantum state.

This particle analogy becomes even more compelling for systems like
phonons and photons where there are many harmonic oscillator states,
labeled by a wavevector k (see problem 7.3). Real, massive Bose
particles like He4 in free space have single-particle quantum
eigenstates with a dispersion relation εk = ℏ²k²/2m. Phonons and photons
have one harmonic oscillator for every k, with an excitation energy
εk = ℏωk. If we treat them, as in part (a), as bosons filling these
single-particle states, we find that they are completely analogous to
ordinary massive particles. The only difference is that the relation
between energy and wave-vector (called the dispersion relation) is
different: for photons, εk = ℏωk = ℏc|k|.41

(b) Do phonons or photons Bose condense at low temperatures? Can you see
why not? Can you think of a non-equilibrium Bose condensation of
photons, where a macroscopic occupation of a single frequency and
momentum state occurs?

40 The literature uses u_cycles(f), where f = ω/2π is in cycles per
second, and has no factor of 2π. Since u_cycles(f) df = u(ω) dω, the
absorption rate B u_cycles(f) = B u(ω) dω/df = 2πB u(ω).
41 If massive particles are moving fast, their energies are
ε = √(m²c⁴ + p²c²). This formula reduces to
p²/2m + mc² = ℏ²k²/2m + mc² if the kinetic energy is small compared to
the rest mass mc². For massless particles, ε = |p|c = ℏ|k|c, precisely
the relation we find for photons (and for phonons at low frequencies).
So actually even the dispersion relation is the same: photons and
phonons are massless bosons.
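Part (a) of exercise 7.9 can be checked numerically: the canonical sum Z = Σn e^{−βℏω(n+½)} = e^{−βℏω/2}/(1 − e^{−βℏω}) differs from the single-mode, μ = 0 Bose grand partition function Ξ = 1/(1 − e^{−βℏω}) only by the zero-point factor. A quick sketch (ℏω = 1):

```python
import numpy as np

hw = 1.0    # hbar * omega in arbitrary energy units
for beta in (0.3, 1.0, 4.0):
    Z_osc = sum(np.exp(-beta * hw * (n + 0.5)) for n in range(200))  # canonical
    Xi = 1.0 / (1.0 - np.exp(-beta * hw))    # grand canonical, one mode, mu = 0
    print(Z_osc / Xi, np.exp(-beta * hw / 2))  # equal: just the zero-point shift
```

Truncating the canonical sum at n = 200 is harmless for these β values; the neglected tail is exponentially small.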
Be careful not to get confused when we put real, massive bosons into a
harmonic oscillator potential (problem 7.11). There it is best to think
of each harmonic oscillator as being many separate eigenstates being
filled by the atoms.

(7.10) Bose Condensation in a Band. (Basic, Quantum)

The density of states g(E) of a system of non-interacting bosons forms a
band: the single-particle eigenstates are confined to an energy range
Emin < E < Emax, so g(E) is non-zero in this range and zero otherwise.
The system is filled with a finite density of bosons. Which of the
following is necessary for the system to undergo Bose condensation at
low temperatures?

(a) g(E)/(e^{β(E−Emin)} + 1) is finite as E → Emin.
(b) g(E)/(e^{β(E−Emin)} − 1) is finite as E → Emin.
(c) Emin ≥ 0.
(d) ∫_{Emin}^{E} g(E′)/(E′ − Emin) dE′ is a convergent integral at the
lower limit Emin.
(e) Bose condensation cannot occur in a system whose states are confined
to an energy band.

(7.11) Bose Condensation in a Parabolic Potential.

Wieman and Cornell in 1995 were able to get a dilute gas of rubidium-87
atoms to Bose condense [4].

(a) Is rubidium-87 (37 protons and electrons, 50 neutrons) a boson or a
fermion?

(b) At their quoted maximum number density of 2.5 × 10¹²/cm³, at what
temperature Tc^predict do you expect the onset of Bose condensation in
free space? They claim that they found Bose condensation starting at a
temperature of Tc^measured = 170 nK. Is that above or below your
estimate? (Useful constants: h = 6.6262 × 10⁻²⁷ erg sec,
mn ≈ mp = 1.6726 × 10⁻²⁴ gm, kB = 1.3807 × 10⁻¹⁶ erg/K.)

The trap had an effective potential energy that was harmonic in the
three directions, but anisotropic with cylindrical symmetry. The
frequency along the cylindrical axis was f0 = 120 Hz, so ω0 ≈ 750 Hz,
and the two other frequencies were smaller by a factor of √8:
ω1 ≈ 265 Hz. The Bose condensation was observed by abruptly removing the
trap potential,43 and letting the gas atoms spread out: the spreading
cloud was imaged 60 ms later by shining a laser on them and using a CCD
to image the shadow.

For your convenience, the ground state of a particle of mass m in a
one-dimensional harmonic oscillator with frequency ω is
ψ0(x) = (mω/πℏ)^{1/4} e^{−mωx²/2ℏ}, and the momentum-space wave function
is ψ̃0(k) = (ℏ/πmω)^{1/4} e^{−ℏk²/2mω}.

(c) What is the ground-state wave-function for one rubidium-87 atom in
this potential? What is the wave-function in momentum space? The
probability distribution of the momentum? What is the ratio of the
velocity widths along the axis and perpendicular to the axis for the
ground state? For the classical thermal distribution of velocities? If
the potential is abruptly removed, what will the shape of the
distribution of positions look like 60 ms later (ignoring the small
width of the initial distribution in space)? Compare your predicted
anisotropy to the false-color images above. If the x axis goes mostly
right and a bit up, and the y axis goes mostly up and a bit left, which
axis corresponds to the axial frequency and which corresponds to one of
the two lower frequencies?

Their Bose condensation isn't in free space: the atoms are in a harmonic
oscillator potential. In the calculation in free space, we approximated
the quantum states as a continuum density of states g(E). That's only
sensible if kB T is large compared to the level spacing near the ground
state.

(d) Compare ℏω to kB T at the Bose condensation point Tc^measured in
their experiment.

For bosons in a one-dimensional harmonic oscillator of frequency ω0,
it's clear that g(E) = 1/(ℏω0): the number of states in a small range ΔE
is the number of ℏω0's it contains.

(e) Compute the density of states

    g(E) = ∫0^∞ dε1 dε2 dε3 g1(ε1) g2(ε2) g3(ε3) δ(E − (ε1 + ε2 + ε3))  (7.84)

for a three-dimensional harmonic oscillator, with one frequency ω0 and
two of frequency ω1. Show that it's equal to 1/Δ times the number of
states in ε-space between energies E and E + Δ. Why is this triangular
slab not of thickness Δ?

Their experiment has N = 2 × 10⁴ atoms in the trap as it condenses.

42 "Observation of Bose-Einstein Condensation in a Dilute Atomic Vapor",
M.H. Anderson, J.R. Ensher, M.R. Matthews, C.E. Wieman, and E.A.
Cornell, Science 269, 198 (1995).
43 Actually, they first slowly reduced it by a factor of 75 and then
abruptly reduced it from there; I'm not sure why, but let's ignore that
complication.
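For part (e), one can check that g(E) = E²/(2ℏ³ω0ω1²), so the number of levels below E should approach E³/(6ℏ³ω0ω1²) for large E. A sketch comparing this with a direct count of the spectrum ℏ(n0 + ½)ω0 + ℏ(n1 + ½)ω1 + ℏ(n2 + ½)ω1 (not from the text; ℏ = 1, ω1 = ω0/√8 as in the experiment):

```python
import math

w0 = 1.0                      # axial frequency (hbar = 1)
w1 = w0 / math.sqrt(8)        # the two transverse frequencies
E = 60.0

# direct count of eigenvalues (n0+1/2)w0 + (n1+1/2)w1 + (n2+1/2)w1 <= E
count, n0 = 0, 0
while (n0 + 0.5) * w0 + w1 <= E:
    n1 = 0
    while (n0 + 0.5) * w0 + (n1 + 1.0) * w1 <= E:
        rem = E - (n0 + 0.5) * w0 - (n1 + 0.5) * w1
        count += int(rem / w1 + 0.5)      # number of allowed n2 values
        n1 += 1
    n0 += 1

predicted = E**3 / (6 * w0 * w1**2)       # integral of g(E') = E'^2/(2 w0 w1^2)
print(count / predicted)                  # -> 1 as E grows; close to 1 here
```

Including the zero-point energies in the count centers each state in its own cell of ε-space, which is why the agreement is already good at modest E.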
Fig. 7.14 Bose-Einstein Condensation at 400, 200, and 50 nano-Kelvin,
from reference [4]. The pictures are spatial distributions 60 ms after
the potential is removed; the field of view of each image is
200 μm × 270 μm. The left picture is roughly spherically symmetric, and
is taken before Bose condensation; the middle has an elliptical Bose
condensate superimposed on the spherical thermal background; the right
picture is nearly pure condensate. I believe this may not be the same
experiment as described in their original paper.

(f) By working in analogy with the calculation in free space, find the
maximum number of atoms that can occupy the three-dimensional harmonic
oscillator potential in part (e) without Bose condensation at
temperature T. (You'll want to know
∫0^∞ z²/(e^z − 1) dz = 2ζ(3) = 2.40411.) According to your calculation,
at what temperature Tc^HO should the real experimental trap have Bose
condensed?

(7.12) Light Emission and Absorption. (Quantum, Basic)

The experiment that Planck was studying did not directly measure the
energy density per unit frequency, equation 7.66, inside a box. It
measured the energy radiating out of a small hole, of area A. Let us
assume the hole is on the upper face of the cavity, perpendicular to the
z axis.

What is the photon distribution just inside the boundary of the hole?
Clearly there are few photons coming into the hole from the outside, so
the distribution is depleted for those photons with vz < 0. However, the
photons with vz > 0 to an excellent approximation should be unaffected
by the hole, since they were emitted from far distant walls of the
cavity, where the existence of the hole is a negligible perturbation.
So, presuming the relevant photons just inside the hole are distributed
in the same way as in the box as a whole (equation 7.66), how many leave
in a time dt?

Fig. 7.15 The photons leaving a cavity in a time dt are those within
vz dt of the hole.

As one can see geometrically (figure 7.15), those photons within vz dt
of the boundary will escape in time dt. The vertical velocity
vz = c cos(θ), where θ is the photon velocity angle with respect to the
vertical. The Planck distribution is isotropic, so the probability that
a photon will be moving at an angle θ is the perimeter of the θ circle
on the sphere divided by the area of the sphere:
2π sin(θ) dθ / 4π = ½ sin(θ) dθ.

44 We're being sloppy again, using the same name ρ for the probability
densities
(a) Show that the probability density44 ρ(vz) for a particular photon to
have velocity vz is independent of vz in the range (−c, c), and thus is
1/(2c). (Hint: ρ(vz) Δvz = ρ(θ) Δθ.)

Clearly, an upper bound on the energy emitted from a hole of area A is
given by the energy in the box as a whole (eq. 7.66) times the fraction
(Ac dt)/V of the volume within c dt of the hole.

(b) Show that the actual energy emitted is 1/4 of this upper bound.
(Hint: You'll need to integrate ∫0^c ρ(vz) vz dvz.)

Hence the power per unit area emitted from the small hole in equilibrium
is

    Pblack(ω, T) = (c/4) (ℏ/π²c³) ω³/(e^{ℏω/kB T} − 1).               (7.85)

Why is this called black-body radiation? Certainly a small hole in a
large (cold) cavity looks black: any light entering the hole bounces
around inside until it is absorbed by the walls. Suppose we placed a
black object (a material that absorbs radiation at all frequencies and
angles) capping the hole. This object would absorb radiation from the
cavity, rising in temperature until it came to equilibrium with the
cavity, emitting just as much radiation as it absorbs. Thus the overall
power per unit area emitted by our black object in equilibrium at a
given temperature must equal that of the hole. This must also be true if
we place a selective filter between the hole and our black body, passing
through only particular types of photons. Thus the emission and
absorption of our black body must agree with the hole for every photon
mode individually, an example of the principle of detailed balance we
will discuss in more detail in section 8.2.

How much power per unit area Pcolored(ω, T) is emitted in equilibrium at
temperature T by a red or maroon body? A white body? A mirror? These
objects are different in the fraction of incident light they absorb at
different frequencies and angles, a(ω, θ). We can again use the
principle of detailed balance, by placing our colored object next to a
black body and matching the power emitted and absorbed for each angle
and frequency:

    Pcolored(ω, T, θ) = Pblack(ω, T) a(ω, θ).                         (7.86)

Finally, we should calculate Qtot(T), the total power per unit area
emitted from a black body at temperature T, by integrating 7.85 over
frequency.

(c) Using the fact that ∫0^∞ x³/(e^x − 1) dx = π⁴/15, show that

    Qtot(T) = ∫0^∞ Pblack(ω, T) dω = σT⁴                              (7.87)

and give a formula for the Stefan-Boltzmann constant σ. Its value is
σ = 5.67 × 10⁻⁵ erg cm⁻² K⁻⁴ s⁻¹. (Hint: use this to check your answer.)

(7.13) Fermions in Semiconductors. (Quantum)

Let's consider a simple model of a doped semiconductor [8, ch. 28].
Consider a crystal of phosphorous-doped silicon, with N − M atoms of
silicon and M atoms of phosphorous. Each silicon atom contributes one
electron to the system, and has two states at energies ±Δ/2, where
Δ = 1.16 eV is the energy gap. Each phosphorous atom contributes two
electrons and two states, one at −Δ/2 and the other at Δ/2 − ε, where
ε = 0.044 eV is much smaller than the gap. (Our model ignores the
quantum mechanical hopping between atoms that broadens the levels at
±Δ/2 into the conduction band and the valence band. It also ignores spin
and chemistry: each silicon really contributes four electrons and four
levels, and each phosphorous five electrons and four levels.) To
summarize, our system has N + M spinless electrons (maximum of one
electron per state), N valence band states at energy −Δ/2, M impurity
band states at energy Δ/2 − ε, and N − M conduction band states at
energy Δ/2.

(a) Derive a formula for the number of electrons as a function of
temperature T and chemical potential μ for the energy levels of our
system.

(b) What is the limiting occupation probability for the states as
T → ∞, where entropy is maximized and all states are equally likely?
Using this, find a formula for μ(T) valid at large T, not involving Δ or
ε.

(c) Draw an energy level diagram showing the filled and empty states at
T = 0. Find a formula for μ(T) in the low temperature limit T → 0, not
involving the variable T. (Hint: Balance the number of holes in the
impurity band with the number of electrons in the conduction band. Why
can you ignore the valence band?)

(d) In a one centimeter cubed sample, there are M = 10¹⁶ phosphorous
atoms; silicon has about N = 5 × 10²² atoms per cubic centimeter. Find μ
at room temperature (1/40 eV) from the formula you derived in part (a).
(Probably

per unit velocity and per unit angle.
45 The phosphorous atom is neutral when both of its states are filled: the upper state can be thought of as an electron bound to a phosphorous positive ion. The energy shift ε represents the Coulomb attraction of the electron to the phosphorous ion: it's small because the dielectric constant is large (see A&M above).
To be pub. Oxford UP, Fall05

trying various μ is easiest: set up a program on your calculator or computer.) At this temperature, what fraction of the phosphorous atoms are ionized (have their upper energy state empty)? What is the density of holes (empty states at energy −Δ/2)?
Phosphorous is an electron donor, and our sample is doped n-type, since the dominant carriers are electrons; p-type semiconductors are doped with holes.

(7.14) White Dwarves, Neutron Stars, and Black Holes. (Astrophysics, Quantum)
As the energy sources in large stars are consumed, and the temperature approaches zero, the final state is determined by the competition between gravity and the chemical or nuclear energy needed to compress the material.
A simple model of ordinary stellar matter is a Fermi sea of non-interacting electrons, with enough nuclei to balance the charge. Let's model a white dwarf (or black dwarf, since we assume zero temperature) as a uniform density of He⁴ nuclei and a compensating uniform density of electrons. Assume Newtonian gravity. Assume the chemical energy is given solely by the energy of a gas of non-interacting electrons (filling the levels to the Fermi energy).
(a) Assuming non-relativistic electrons, calculate the energy of a sphere with N zero-temperature non-interacting electrons and radius R.46 Calculate the Newtonian gravitational energy of a sphere of He⁴ nuclei of equal and opposite charge density. At what radius is the total energy minimized?
A more detailed version of this model was studied by Chandrasekhar and others as a model for white dwarf stars. Useful numbers: mp = 1.6726 × 10⁻²⁴ gm, mn = 1.6749 × 10⁻²⁴ gm, me = 9.1095 × 10⁻²⁸ gm, ℏ = 1.05459 × 10⁻²⁷ erg sec, G = 6.672 × 10⁻⁸ cm³/(gm s²), 1 eV = 1.60219 × 10⁻¹² erg, kB = 1.3807 × 10⁻¹⁶ erg/K, and c = 3 × 10¹⁰ cm/s.
(b) Using the non-relativistic model in part (a), calculate the Fermi energy of the electrons in a white dwarf star of the mass of the Sun, 2 × 10³³ gm, assuming that it is composed of helium. (i) Compare it to a typical chemical binding energy of an atom. Are we justified in ignoring the electron-electron and electron-nuclear interactions (i.e., chemistry)? (ii) Compare it to the temperature inside the star, say 10⁷ K. Are we justified in assuming that the electron gas is degenerate (roughly zero temperature)? (iii) Compare it to the mass of the electron. Are we roughly justified in using a non-relativistic theory? (iv) Compare it to the mass difference between a proton and a neutron.
The electrons in large white dwarf stars are relativistic. This leads to an energy which grows more slowly with radius, and eventually to an upper bound on their mass.
(c) Assuming extremely relativistic electrons with ε = pc, calculate the energy of a sphere of non-interacting electrons. Notice that this energy cannot balance against the gravitational energy of the nuclei except for a special value of the mass, M0. Calculate M0. How does your M0 compare with the mass of the Sun, above?
A star with mass larger than M0 continues to shrink as it cools. The electrons (note (b.iv) above) combine with the protons, staying at a constant density as the star shrinks into a ball of almost pure neutrons (a neutron star, often forming a pulsar because of trapped magnetic flux). Recent speculations [82] suggest that the neutronium will further transform into a kind of quark soup with many strange quarks, forming a transparent insulating material.
For an even higher mass, the Fermi repulsion between quarks can't survive the gravitational pressure (the quarks become relativistic), and the star collapses into a black hole. At these masses, general relativity is important, going beyond the purview of this course. But the basic competition, between degeneracy pressure and gravity, is the same.
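Part (d) of exercise 7.13 above suggests setting up a program; here is a minimal Python sketch (assuming the usual Fermi occupation, energies in eV, and the three groups of levels in the model; the function names and the bisection bracket are my own choices, not from the text):

```python
import math

def fermi(E, mu, kT):
    # Fermi occupation f = 1/(e^{(E-mu)/kT} + 1), guarded against overflow
    x = (E - mu) / kT
    if x > 500:
        return 0.0
    if x < -500:
        return 1.0
    return 1.0 / (math.exp(x) + 1.0)

def electron_count(mu, kT, N, M, Delta=1.16, eps=0.044):
    # N valence states at -Delta/2, M impurity states at Delta/2 - eps,
    # and N - M conduction states at Delta/2 (energies in eV)
    return (N * fermi(-Delta / 2, mu, kT)
            + M * fermi(Delta / 2 - eps, mu, kT)
            + (N - M) * fermi(Delta / 2, mu, kT))

def solve_mu(kT, N, M, lo=-1.0, hi=1.0, tol=1e-12):
    # Bisection on the constraint: total electron number = N + M.
    # electron_count is monotonically increasing in mu, so this converges.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if electron_count(mid, kT, N, M) < N + M:
            lo = mid      # too few electrons: raise the chemical potential
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = solve_mu(kT=1 / 40, N=5e22, M=1e16)   # room temperature, eV units
```

With the resulting μ one can then read off the ionized fraction of the phosphorous and the hole density from the same occupation factors.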

46 You may assume that the single-particle eigenstates have the same energies and k-space density in a sphere of volume V as they do for a cube of volume V: just like fixed versus periodic boundary conditions, the boundary doesn't matter to bulk properties.
© P. Sethna, January 4, 2005: Entropy, Order Parameters, and Complexity

8 Computational Stat Mech: Ising and Markov
Lattice models are a big industry within statistical mechanics. Placing some degrees of freedom on each site of a regular grid, and forming a Hamiltonian or a dynamical evolution law to equilibrate or evolve the resulting system, forms a centerpiece of computational statistical mechanics (as well as the focus of much of the theoretical work). Critical phenomena and phase transitions [13], lattice QCD and quantum field theories, quantum magnetism and models for high temperature superconductors, phase diagrams for alloys (section 8.1.2), the behavior of systems with dirt or disorder, and nonequilibrium systems exhibiting avalanches and crackling noise [13] all make important use of lattice models.
In this chapter, we will introduce the most studied of these models, the Ising model.1

Fig. 8.1 The 2D square-lattice Ising model. It is traditional to denote the values si = ±1 as up and down, or as two different colors.

1 Ising's name is pronounced "Eesing", but the model is usually pronounced "Eyesing", with a long I sound.

8.1 The Ising Model

The Ising model is a lattice of sites i with a single, two-state degree of freedom si on each site that may take values ±1. This degree of freedom is normally called a "spin".2 We will be primarily interested in the Ising model in two dimensions on a square lattice, see figure 8.1. The Hamiltonian for the Ising model is

H = −J Σ_⟨ij⟩ si sj − H Σ_i si.    (8.1)

Here the sum ⟨ij⟩ is over all pairs of spins on nearest-neighbor sites, and J is the coupling between these neighboring spins. (There are four neighbors per spin on the 2D square lattice.) Usually one refers to H as the external field, and the sum M = Σ_i si as the magnetization, in reference to the Ising model's original application to magnetic systems.3 We'll usually assume the model has N spins forming a square, with periodic boundary conditions.

2 Unlike a true quantum spin-1/2 particle, there are no terms in the Ising Hamiltonian that lead to superpositions of states with different spins.

3 We shall use boldface M to denote the total magnetization, and (especially in the problems) will also refer to M = M/N, the average magnetization per spin.
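To make equation 8.1 concrete, here is a short Python sketch (my own, not code from the text) that evaluates H for a spin configuration with periodic boundary conditions, counting each nearest-neighbor bond once:

```python
import numpy as np

def ising_energy(s, J=1.0, H=0.0):
    # Energy 8.1 for a 2D array s of +/-1 spins, periodic boundaries.
    # Pairing each spin with its right and down neighbor (np.roll wraps
    # around the lattice) counts every nearest-neighbor bond exactly once.
    bonds = s * np.roll(s, 1, axis=0) + s * np.roll(s, 1, axis=1)
    return -J * bonds.sum() - H * s.sum()

s = np.ones((4, 4), dtype=int)    # all spins up: 2 bonds per spin
print(ising_energy(s))            # -> -32.0 (= -J * 2N with N = 16)
```

Flipping a single spin in the all-up state breaks four bonds, raising the energy by 8J, which is a quick check on the bookkeeping.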

8.1.1 Magnetism

As a model for magnetism, our spin si = 2σiz, the z-component of the net spin of a spin-1/2 atom in a crystal. The interaction between neighboring spins is −J si sj = −4J σiz σjz.4 The coupling of the spin to the external magnetic field B is microscopically −H si = −g μB B σiz, where g is the gyromagnetic ratio for the spin (close to two for the electron).

4 The interaction between spins is usually better approximated by the dot product σi · σj = σix σjx + σiy σjy + σiz σjz, used in the more realistic Heisenberg model. Some materials have anisotropic crystal structures which make the Ising model at least approximately valid.

The energy of two spins −J si sj is −J if the spins are parallel, and +J if they are antiparallel. Thus if J > 0 the model favors parallel spins: we say that the interaction is ferromagnetic, because like iron the spins will tend to all align in one direction at low temperatures, into a ferromagnetic phase, where the magnetization per spin will approach M = ±1 as the temperature approaches zero. If J < 0 we call the interaction antiferromagnetic; for our square lattice the spins will tend to align in a checkerboard fashion at low temperatures (an antiferromagnetic phase). At high temperatures, we expect entropy to dominate; the spins will fluctuate wildly, and the magnetization per spin M for a large system will be near zero (the paramagnetic phase).

Fig. 8.2 The Ising model as a binary alloy: a square grid of sites, each occupied by an atom of type A or B. Atoms in crystals naturally sit on a regular grid: alloys have more than one type of element which can sit on the lattice sites (here, types A and B). Indeed, any classical system on a lattice with local interactions can be mapped onto an Ising-like model.

8.1.2 Binary Alloys

The Ising model is quite a convincing model for binary alloys.5 Imagine a square lattice of atoms, which can be either of type A or B (figure 8.2).6 We set the spin values A = +1 and B = −1. Let the number of the two kinds of atoms be NA and NB, with NA + NB = N. Let the interaction energies between two neighboring atoms be −EAA, −EBB, and −EAB; these can be thought of as the bond strength: the energy needed to break the bond. Let the total number of A–A nearest-neighbor bonds be NAA, and similarly for NBB and NAB. Then the Hamiltonian for our binary alloy is

Hbinary = −EAA NAA − EBB NBB − EAB NAB.    (8.2)

6 A realistic alloy might mix roughly half copper and half zinc to make β-brass. At low temperatures, the copper and zinc atoms each sit on a cubic lattice, with the zincs in the middle of the copper cubes, together forming a body-centered cubic (bcc) lattice. At high temperatures, the zincs and coppers freely interchange on the two lattices. The transition temperature is about 733°C.

How is this the Ising model? Let's start by adding a constant −CN to the Ising model, and plugging in our new variables:

Hising = −J Σ_⟨ij⟩ si sj − H Σ_i si − CN
       = −J (NAA + NBB − NAB) − H (NA − NB) − CN,    (8.3)

since NA − NB = M, the sum of the spins; NAA + NBB is the number of parallel neighbors, and NAB is the number of antiparallel neighbors. There are two bonds per spin, so NAA + NBB + NAB = 2N; we substitute N = 1/2 (NAA + NBB + NAB). For every A atom there must be four bonds ending with an A, and similarly for every B atom there must be four bonds ending with a B. Each A–A bond gives half an A atom worth of bond ends, and each A–B bond gives a quarter, so

NA = 1/2 NAA + 1/4 NAB    and similarly
NB = 1/2 NBB + 1/4 NAB.    (8.4)

and we may substitute NA − NB = 1/2 (NAA − NBB). We now find

Hising = −J (NAA + NBB − NAB) − H · 1/2 (NAA − NBB) − C · 1/2 (NAA + NBB + NAB)
       = −(J + 1/2 H + 1/2 C) NAA − (J − 1/2 H + 1/2 C) NBB − (−J + 1/2 C) NAB.    (8.5)

This is just of the form of the binary alloy Hamiltonian 8.2, with J = 1/4 (EAA + EBB − 2EAB), H = EAA − EBB, and C = 1/2 (EAA + EBB + 2EAB).
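This bookkeeping is easy to check numerically. A sketch (my own; the bond energies and the random lattice are arbitrary) comparing the two Hamiltonians on a periodic lattice:

```python
import numpy as np

rng = np.random.default_rng(0)
EAA, EBB, EAB = rng.normal(size=3)            # arbitrary bond energies
J = (EAA + EBB - 2 * EAB) / 4                 # identifications from eq. 8.5
H = EAA - EBB
C = (EAA + EBB + 2 * EAB) / 2

s = rng.choice([-1, 1], size=(6, 6))          # random A(+1)/B(-1) lattice
N = s.size
right, down = np.roll(s, 1, axis=1), np.roll(s, 1, axis=0)

# Bond counts, each nearest-neighbor pair counted once (periodic BC)
NAA = ((s == 1) & (right == 1)).sum() + ((s == 1) & (down == 1)).sum()
NBB = ((s == -1) & (right == -1)).sum() + ((s == -1) & (down == -1)).sum()
NAB = (s != right).sum() + (s != down).sum()

H_binary = -EAA * NAA - EBB * NBB - EAB * NAB
H_ising = -J * (s * right + s * down).sum() - H * s.sum() - C * N

assert np.isclose(H_binary, H_ising)          # the two energies agree
```

Here (s·right + s·down).sum() is NAA + NBB − NAB, and s.sum() is NA − NB, so the assertion tests exactly the substitutions above.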
Now, our model just contains atoms on their lattice sites. Surely if one kind of atom is larger than the other, it'll push neighboring atoms off their sites? We simply include these reshufflings into the energies in our Hamiltonian 8.2.
What about the vibrations of the atoms about their equilibrium positions? We can imagine doing a partial trace, as we discussed in section 5.4. Just as in problem 5.2, one can incorporate the entropy due to the local atomic motions S{si} about their lattice sites into an effective free energy for each atomic configuration7

Fbinary{si} = −kB T log ( ∫_{ri on site si} dP dQ e^{−H(P,Q)/kB T} )
            = Hbinary{si} − T S{si}.    (8.6)

7 This nasty-looking integral over configurations where the atom hasn't shifted too far past its lattice site would normally be approximated by a Gaussian integral over phonon vibrations, similar to that described in problem 5.2(b), figure 5.5.

Again, as in section 5.4, we're doing a partial trace over states. If we ignore the configurations where the atoms are not near lattice sites, we can recover the total partition function by summing over spin configurations:

Z = Σ_{si} e^{−Fbinary{si}/kB T}    (8.7)
  = Σ_{si} ∫_{ri on site si} dP dQ e^{−H(P,Q)/kB T} ≈ ∫ dP dQ e^{−H(P,Q)/kB T}.    (8.8)

Insofar as the entropy in the free energy Fbinary{si} can be approximated as a sum of pair energies,8 we again get an Ising model, but now with temperature-dependent parameters.

8 Or we can incorporate second-neighbor and three-site interactions, as we probably needed to do to get an accurate energy in the first place.

More elaborate Ising models (with three-site and longer-range interactions, for example) are commonly used to compute realistic phase diagrams for alloys (reference [121]). Sometimes, though, the interactions introduced by relaxations and thermal fluctuations off lattice sites have important long-range pieces, which can lead to qualitative changes in the behavior: for example, turning the transition from continuous to abrupt.

8.1.3 Lattice Gas and the Critical Point

The Ising model is also used as a model for the liquid-gas transition. In this lattice gas interpretation, up-spins (si = +1) count as atoms and down-spins count as a site without an atom. The gas is the phase with mostly down spins (negative "magnetization"), with only a few up-spin atoms in the vapor. The liquid phase is mostly atoms (up-spins), with a few vacancies.
On the whole, the gas phase seems fairly realistic, especially compared to the liquid phase. The liquid in particular seems much more like a crystal, with atoms sitting on a regular lattice. Why do we suggest that this model is a good way of studying transitions between the liquid and gas phase?
Unlike the binary alloy problem, the Ising model is not a good way to get quantitative phase diagrams for fluids. What it is good for is to understand the properties near the critical point. As shown in figure 8.3, one can go continuously between the liquid and gas phases: the phase boundary separating them ends at a critical point Tc, Pc, above which the two phases blur together seamlessly, with no jump in the density separating them.

Fig. 8.3 A schematic phase diagram for a typical material. There is a solid phase at high pressures and low temperatures, a gas phase at low pressures and high temperatures, and a liquid phase in a region in between. The solid-liquid phase boundary corresponds to a change in symmetry, and cannot end. The liquid-gas phase boundary typically does end: one can go continuously from the liquid phase to the gas phase by increasing the pressure above Pc, then the temperature above Tc, and then lowering the pressure again.

The Ising model, interpreted as a lattice gas, also has a line H = 0 along which the density (magnetization) jumps, and a temperature Tc above which the properties are smooth as a function of H (the paramagnetic phase). The phase diagram 8.4 looks only topologically like the real liquid-gas coexistence line 8.3, but the behavior near the critical point in the two systems is remarkably similar. Indeed, we will find in chapter 13 that in many ways the behavior at the liquid-gas critical point is described exactly by the three-dimensional Ising model.
Fig. 8.4 The phase diagram for the Ising model. Below the critical temperature Tc, the H = 0 line separates two phases, an up-spin and a down-spin phase. Above Tc the behavior is smooth as a function of H; below Tc there is a jump in the magnetization as one crosses H = 0.

8.1.4 How to Solve the Ising Model

How do we solve for the properties of the Ising model?
(1) Solve the one-dimensional Ising model, as Ising did.9
(2) Have an enormous brain. Onsager solved the two-dimensional Ising model in a bewilderingly complicated way. Since Onsager, many great minds have found simpler, elegant solutions, but all would take at least a chapter of rather technical and unilluminating manipulations to duplicate. Nobody has solved the three-dimensional Ising model.
(3) Do Monte Carlo on the computer.10

9 This is a typical homework problem in a course like ours: with a few hints, you can do it too.
10 Or, do high temperature expansions, low temperature expansions, transfer matrix methods, exact diagonalization of small systems, 1/N expansions in the number of states per site, 4 − ε expansions in the dimension of space, ...
11 Monte Carlo is a gambling center in Monaco. Lots of random numbers are generated there.

The Monte Carlo11 method involves doing a kind of random walk through the space of lattice configurations. We'll study these methods in great generality in section 8.2. For now, let's just outline the Heat Bath Monte Carlo method.

Heat Bath Monte Carlo for the Ising Model
- Pick a site i = (x, y) at random.
- Check how many neighbor spins are pointing up:

mi = Σ_{j: ⟨ij⟩} sj = { 4 (4 neighbors up); 2 (3 neighbors up); 0 (2 neighbors up); −2 (1 neighbor up); −4 (0 neighbors up) }    (8.9)

- Calculate E+ = −J mi − H and E− = +J mi + H, the energy for spin i to be +1 or −1.
- Set spin i up with probability e^{−βE+}/(e^{−βE+} + e^{−βE−}) and down with probability e^{−βE−}/(e^{−βE+} + e^{−βE−}).

The heat-bath algorithm just thermalizes one spin at a time: it sets the spin up or down with probability given by the thermal distribution given that its neighbors are fixed. Using it, and fast modern computers, you can simulate the Ising model fast enough to explore its behavior rather thoroughly, as we will in a variety of exercises.
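The outline above translates almost line-by-line into Python. A minimal sketch (assuming J = kB = 1; the function name and lattice size are my own choices):

```python
import numpy as np

def heat_bath_sweep(s, T, H=0.0, rng=None):
    """One sweep of heat-bath Monte Carlo: L*L single-spin thermalizations."""
    rng = np.random.default_rng() if rng is None else rng
    L = s.shape[0]
    for _ in range(L * L):
        x, y = rng.integers(0, L, size=2)
        # m_i: sum of the four neighboring spins (periodic boundaries)
        m = (s[(x + 1) % L, y] + s[(x - 1) % L, y]
             + s[x, (y + 1) % L] + s[x, (y - 1) % L])
        E_up, E_dn = -m - H, +m + H       # J = 1: energies for s_i = +1, -1
        # p_up = e^{-E+/T} / (e^{-E+/T} + e^{-E-/T})
        p_up = 1.0 / (1.0 + np.exp((E_up - E_dn) / T))
        s[x, y] = 1 if rng.random() < p_up else -1
    return s

rng = np.random.default_rng(0)
s = np.ones((16, 16), dtype=int)          # start fully magnetized
for _ in range(100):
    heat_bath_sweep(s, T=1.0, rng=rng)    # T well below Tc ~ 2.27
print(s.mean())                            # remains close to +1 at this T
```

Each update draws the chosen spin directly from its conditional Boltzmann distribution, so no accept/reject step is needed.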

8.2 Markov Chains

Let's consider a rather general algorithm for equilibrating a lattice model. Our system has a set of states S = {si}; for the Ising model there are 2^N such states. The algorithm has a transition rule, which at each step shifts the current state S to a state S′ with probability P_{S′⇐S}.12 For the heat-bath algorithm, P_{S′⇐S} is equal to zero unless S′ and S are the same except for at most one spin flip. Under what circumstances will an algorithm, defined by our matrix P, take our system into thermal equilibrium?

12 We put the subscripts in this order because we will use P as a matrix, which will take a probability vector from one step to the next.

There are many problems outside of mainstream statistical mechanics that can be formulated in this general way. Exercise 8.3 discusses a model with 1001 states (different numbers of red bacteria), and transition rates P_{n+1⇐n}, P_{n−1⇐n}, and P_{n⇐n}; we want to understand what the long-time behavior is of the probability of finding different states.
These systems are examples of Markov chains. A Markov chain has a finite set of states {α}, through which the system evolves in a discrete series of steps n.13 The probabilities of moving to different new states in a Markov chain depend only on the current state.14 That is, the system has no memory of the past evolution.

13 There are continuous analogues of Markov chains.
14 More generally, systems which lack memory are called Markovian.

Let the probabilities of being in various states α at step n be arranged in a vector ρ(n). Then it is easy to see for a general Markov chain that the probabilities15 Pβα for moving from α to β satisfy:

15 We heretofore leave out the left arrow.

Time evolution: The probability vector at step n + 1 is

ρβ(n + 1) = Σα Pβα ρα(n).    (8.10)

Positivity: The matrix elements are probabilities, so

0 ≤ Pβα ≤ 1.    (8.11)

Conservation of probability: The state α must go somewhere, so

Σβ Pβα = 1.    (8.12)

Not symmetric! Typically Pβα ≠ Pαβ.

This last point isn't a big surprise: high-energy states are more likely to be left than entered into. However, this means that much of our mathematical intuition and many of our tools, carefully developed for symmetric and Hermitian matrices, won't apply to our transition matrix P. In particular, we cannot assume in general that we can diagonalize our matrix.
It is true in great generality that our matrix P will have eigenvalues. Also, it is true that for each distinct eigenvalue there will be at least one right eigenvector16

P ρλ = λ ρλ    (8.13)

and one left eigenvector

(σλ)T P = λ (σλ)T.    (8.14)

16 For example, the matrix (0 1; 0 0) has a double eigenvalue of zero, but only one left and right eigenvector with eigenvalue zero.

However, for degenerate eigenvalues there may not be multiple eigenvectors, and the left and right eigenvectors usually will not be equal to one another.17

17 A general matrix M can be put into Jordan canonical form by a suitable change of basis S: M = SJS⁻¹. The matrix J is block diagonal, with one eigenvalue λ associated with each block (but perhaps multiple blocks per λ). A given (say, 3 × 3) block will be of the form (λ 1 0; 0 λ 1; 0 0 λ), with λ along the diagonal and 1 in the elements immediately above the diagonal. The first column of the block is associated with the right eigenvector for λ; the last row is associated with the left eigenvector. The word canonical here means "simplest form", and doesn't indicate a connection with the canonical ensemble.

For the particular case of our transition matrix P, we can go further. If our Markov chain reaches an equilibrium state at long times, that state must be unchanged under the time evolution P. That is, Pρ* = ρ*, and thus the equilibrium probability density is a right eigenvector with eigenvalue one. We can show that our Markov chain transition matrix P has such a right eigenvector.

Theorem 8.1. P has at least one right eigenvector ρ* with eigenvalue one.

Sneaky proof: P has a left eigenvector σ* with eigenvalue one: the vector all of whose components are one, (σ*)T = (1, 1, 1, . . . , 1):

((σ*)T P)α = Σβ σ*β Pβα = Σβ Pβα = 1 = σ*α.    (8.15)

Hence P must have an eigenvalue equal to one, and hence it must also have a right eigenvector with eigenvalue one.

We can also show that all the other eigenvalues have right eigenvectors that sum to zero, since P conserves probability:18

Theorem 8.2. Any right eigenvector ρλ with eigenvalue λ different from one must have components that sum to zero.

Proof: ρλ is a right eigenvector, P ρλ = λ ρλ. Hence

λ Σβ ρλβ = Σβ λ ρλβ = Σβ Σα Pβα ρλα = Σα (Σβ Pβα) ρλα = Σα ρλα.    (8.16)

This implies that either λ = 1 or Σα ρλα = 0.
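Both theorems are easy to verify numerically. A sketch (the 3-state transition matrix is an arbitrary example of mine, with columns summing to one as in eq. 8.12):

```python
import numpy as np

# Column-stochastic transition matrix: P[b, a] = probability of a -> b
P = np.array([[0.6, 0.1, 0.2],
              [0.3, 0.8, 0.3],
              [0.1, 0.1, 0.5]])
assert np.allclose(P.sum(axis=0), 1.0)        # conservation, eq. 8.12

lam, vecs = np.linalg.eig(P)

# Theorem 8.1: one eigenvalue equals one; its eigenvector, rescaled to
# sum to one, is the stationary distribution rho*.
i = np.argmin(abs(lam - 1.0))
rho_star = np.real(vecs[:, i] / vecs[:, i].sum())
assert np.allclose(P @ rho_star, rho_star)

# Theorem 8.2: eigenvectors with lambda != 1 have components summing to zero.
for j in range(3):
    if j != i:
        assert abs(vecs[:, j].sum()) < 1e-10
```

For this particular matrix the eigenvalues turn out to be 1, 0.5, and 0.4, and the stationary distribution is (1.4, 3.6, 1)/6.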
Markov chains can have more than one stationary probability distribution.19 They can have transient states, which the system eventually leaves never to return.20 They can also have cycles, which are probability distributions which, like a clock 1 → 2 → 3 → · · · → 12 → 1, shift through a finite number of distinct classes of states before returning to the original one. All of these are obstacles in our quest for finding the equilibrium states in statistical mechanics. We can bypass all of them by studying ergodic Markov chains.21 A finite-state Markov chain is ergodic if its transition matrix to some power n has all positive (non-zero) matrix elements: (P^n)βα > 0 for all states α and β.22

19 A continuum example of this is given by the KAM theorem of problem 4.2. There is a probability density smeared over each KAM torus which is time-independent.
20 Transient states are important in dissipative dynamical systems, where they are all the states not on the attractors.
21 We're compromising here between the standard Markov-chain usage in physics and in mathematics. Physicists usually ignore cycles, and call algorithms which can reach every state ergodic (what mathematicians call irreducible). Mathematicians use the term ergodic to exclude cycles and exclude probability running to infinity (not important here, where we have a finite number of states). They also allow ergodic chains to have transient states: only the attractor need be connected. Chains with P^n everywhere positive, that we're calling ergodic, are called by the mathematicians regular Markov chains. In the problems, to prove a system is ergodic just show that it can reach everywhere (irreducible) and doesn't have cycles.
22 That is, after n steps every state has non-zero probability to reach every other state.

We use a famous theorem, without proving it here:

Theorem 8.3 (Perron–Frobenius theorem). Let A be a matrix with all non-negative matrix elements such that A^n has all positive elements. Then A has a positive eigenvalue λ0, of multiplicity one, whose corresponding right and left eigenvectors have all positive components. Furthermore any other eigenvalue λ of A must be smaller, |λ| < λ0.

For an ergodic Markov chain, we can use theorem 8.2 to see that the Perron–Frobenius eigenvector with all positive components must have eigenvalue λ0 = 1. We can rescale this eigenvector to sum to one, proving that an ergodic Markov chain has a unique time-independent probability distribution ρ*.
What's the difference between our definition of ergodic Markov chains and the definition of ergodic we used in section 4.2, in reference to trajectories in phase space? Clearly the two concepts are related: ergodic in phase space meant that we eventually come close to all states on the energy surface, and for finite Markov chains it is the stronger condition that we have non-zero probability of getting between all states in the chain after precisely n steps. Indeed, one can show for finite state Markov chains that if one can get from every state to every other state by a sequence of moves (that is, the chain is irreducible), and if the

18 One can also view this theorem as saying that all the right eigenvectors except ρ* are orthogonal to the left eigenvector σ*.


chain is not cyclic, then it is ergodic (proof not given here). Any algorithm that has a finite probability for each state to remain unchanged (Pαα > 0 for all states α) is automatically free of cycles (clocks which lose time will get out of synchrony).
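This notion of ergodicity is also easy to test numerically: raise P to successive powers and check for all-positive entries. A sketch (the two example chains are mine; the power bound (m − 1)² + 1 is Wielandt's standard bound for regular chains):

```python
import numpy as np

def is_ergodic(P, n_max=None):
    # Ergodic in the text's sense: some power P^n has all positive
    # entries. For an m-state chain, checking up to n = (m - 1)**2 + 1
    # suffices whenever any power works.
    m = P.shape[0]
    n_max = n_max or (m - 1) ** 2 + 1
    Q = np.eye(m)
    for _ in range(n_max):
        Q = Q @ P
        if (Q > 0).all():
            return True
    return False

# A two-state chain that only swaps states is a cycle, hence not ergodic;
# any probability of staying put breaks the cycle.
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
lazy = 0.5 * swap + 0.5 * np.eye(2)
print(is_ergodic(swap), is_ergodic(lazy))   # -> False True
```

The "lazy" chain illustrates the remark above: giving each state a finite probability of remaining unchanged destroys the cycle.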
It is possible to show that an ergodic Markov chain will take any initial probability distribution ρ(0) and converge to equilibrium, but the proof in general is rather involved. We can simplify it by specializing one more time, to Markov chains that satisfy detailed balance.
A Markov chain satisfies detailed balance if there is some probability distribution ρ* such that23

Pαβ ρ*β = Pβα ρ*α    (8.17)

for each state α and β. In words, the probability flux from state α to β (the rate times the probability of being in α) balances the probability flux back, in detail (i.e., for every pair of states).

23 There is an elegant equivalent definition of detailed balance directly in terms of P and not involving the equilibrium probability distribution ρ*: see problem 8.4.

If a physical system is time-reversal invariant (no dissipation, no magnetic fields), and its states are also invariant under time reversal (no states with specified velocities or momenta), then its dynamics automatically satisfy detailed balance. This is easy to see: the equilibrium state is also the equilibrium state under time reversal, so the probability flow from α to β must equal the time-reversed flow from β to α. Quantum systems undergoing transitions between energy eigenstates in perturbation theory usually satisfy detailed balance, since the eigenstates are time-reversal invariant. Most classical models (like the binary alloy in 8.1.2) have states involving only configurational degrees of freedom, which again satisfy detailed balance.
Detailed balance allows us to find a complete set of eigenvectors and right eigenvalues for our transition matrix P. One can see this with a simple transformation. If we divide both sides of equation 8.17 by √(ρ*α ρ*β), we create a symmetric matrix Q:

Qαβ = Pαβ √(ρ*β/ρ*α) = (Pβα ρ*α/ρ*β) √(ρ*β/ρ*α) = Pβα √(ρ*α/ρ*β) = Qβα.    (8.18)

This particular symmetric matrix has a complete set of orthonormal eigenvectors τλ, with Q τλ = λ τλ, which can be turned into right eigenvectors of P when rescaled24 by √ρ*:

ρλα = √(ρ*α) τλα :    (8.19)

Σβ Pαβ ρλβ = Σβ (√(ρ*α)/√(ρ*β)) Qαβ · √(ρ*β) τλβ = √(ρ*α) Σβ Qαβ τλβ
           = √(ρ*α) λ τλα = λ ρλα.    (8.20)

24 This works in reverse to get the eigenvectors of P from those of Q. One multiplies τλα by √(ρ*α) to get the right eigenvectors ρλα, and divides to get the left eigenvectors σλα = τλα/√(ρ*α); so if detailed balance holds, σλα = ρλα/ρ*α. In particular, σ1 = ρ*/ρ* = (1, 1, 1, . . . )T, as we saw in theorem 8.1.
To be pub. Oxford UP, Fall05
8.2 Markov Chains 145

Now for the main theorem underlying the algorithms for equilibrating lattice models in statistical mechanics.

Theorem 8.4 (Main theorem). A system with a finite number of states can be guaranteed to converge to an equilibrium distribution ρ* if the computer algorithm
- is Markovian (has no memory),
- is ergodic (can reach everywhere and is acyclic), and
- satisfies detailed balance.

Proof: Let P be the transition matrix for our algorithm. Since the algorithm satisfies detailed balance, P has a complete set of eigenvectors ρλ. Since our algorithm is ergodic there is only one right eigenvector with eigenvalue one, which we can choose to be the stationary distribution ρ*; all the other eigenvalues λ have |λ| < 1. Decompose the initial condition ρ(0) = a1 ρ* + Σ_{|λ|<1} aλ ρλ. Then25

ρ(n) = P ρ(n − 1) = P^n ρ(0) = a1 ρ* + Σ_{|λ|<1} aλ λ^n ρλ.    (8.21)

25 The eigenvectors with λ closest to one will be the slowest to decay. You can get the slowest characteristic time τ for a Markov chain by finding the largest |λmax| < 1 and setting |λmax|^n = e^{−n/τ}.

Since the (finite) sum in this equation decays to zero, the density converges to a1 ρ*. This implies both that a1 = 1 and that our system converges to ρ* as n → ∞.
Thus, to develop a new equilibration algorithm, one must ensure that it is Markov, ergodic, and satisfies detailed balance.
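The decomposition 8.21 also fixes the convergence rate: the error decays like |λmax|^n. A numerical sketch (the chain below is my own example, built to satisfy detailed balance with its stationary distribution):

```python
import numpy as np

# Column-stochastic chain satisfying detailed balance: with probability
# 0.4 jump straight to rho*, otherwise stay put.
rho_star = np.array([0.5, 0.3, 0.2])
P = 0.6 * np.eye(3) + 0.4 * np.outer(rho_star, np.ones(3))

rho = np.array([1.0, 0.0, 0.0])            # start entirely in state 0
for n in range(50):
    rho = P @ rho
assert np.allclose(rho, rho_star)          # converged to rho*

# The decay rate is the second-largest eigenvalue magnitude, here 0.6
lam = np.sort(abs(np.linalg.eigvals(P)))
assert np.isclose(lam[-1], 1.0) and np.isclose(lam[-2], 0.6)
```

By footnote 25, the slowest characteristic time of this chain is τ = −1/log 0.6 ≈ 2 steps.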

(8.1) The Ising Model. (Computational)
You'll need the program ising, available on the Web [102]. The Ising Hamiltonian is

    H = −J Σ_{⟨ij⟩} Si Sj − H Σ_i Si,   (3.7.1)

where Si = ±1 are spins on a square lattice, and the sum Σ_{⟨ij⟩} is over the four nearest-neighbor bonds (each pair summed once). It's conventional to set the coupling strength J = 1 and Boltzmann's constant kB = 1, which amounts to measuring energies and temperatures in units of J. The constant H is called the external field, and M = Σ_i Si is called the magnetization.

As noted in class, the Ising model can be viewed as an anisotropic magnet with Si being 2σz for the spin at site i, or it can represent the occupancy of a lattice site (atom or no atom for a lattice gas simulation, copper or gold for a binary alloy, ...). As a lattice gas, M gives the net concentration, and H corresponds to a chemical potential. Our simulation doesn't conserve the number of spins up, so it's not a natural simulation for a bulk lattice gas. You can think of it as a grand canonical ensemble, or as a model for a lattice gas on a surface exchanging atoms with the vapor above.

Play with it. At high temperatures, the spins should not be strongly correlated. At low temperatures the spins should align all parallel, giving a large magnetization. Can you roughly locate the phase transition? Can you see growing clumps of aligned spins as T → Tc⁺ (i.e., T approaching Tc from above)?

(a) Phase diagram. Draw a rough phase diagram in the (H, T) plane, showing (i) the spin-up phase where ⟨M⟩ > 0, (ii) the spin-down phase with ⟨M⟩ < 0, (iii) the paramagnetic phase line where M = 0, (iv) the ferromagnetic phase line where ⟨|M|⟩ > 0 for large systems even though H = 0, and (v) the critical point, where at H = 0 the system develops a non-zero magnetization.

Correlations and Susceptibilities: Analytical. The partition function for the Ising model is Z = Σ_n exp(−βEn), where the states n run over all 2^N possible configurations of the spins, and the free energy F = −kT log Z.

(b) Show that the average of the magnetization M equals −(∂F/∂H)|T. Derive the formula writing the susceptibility χ₀ = (∂M/∂H)|T in terms of ⟨(M − ⟨M⟩)²⟩ = ⟨M²⟩ − ⟨M⟩². (Hint: remember our derivation of formula 5.18, ⟨(E − ⟨E⟩)²⟩ = kB T² C?)

Notice that the program outputs, at each temperature and field, averages of several quantities: ⟨|M|⟩, ⟨(M − ⟨M⟩)²⟩, ⟨E⟩, ⟨(E − ⟨E⟩)²⟩. Unfortunately, E and M in these formulas are measured per spin, while the formulas in the class and the problem set are measured for the system as a whole. You'll need to multiply the squared quantities by the number of spins to make a comparison. To make that easier, change the system size to 100×100, using configure. While you're doing that, increase speed to ten or twenty to draw the spin configuration fewer times. To get good values for these averages, equilibrate for a given field and temperature, reset, and then start averaging.

(c) Correlations and Susceptibilities: Numerical. Check the formulas for C and χ from part (b) at H = 0 and T = 3, by measuring the fluctuations and the averages, and then changing by ΔH = 0.02 or ΔT = 0.1 and measuring the averages again. Check them also for T = 2, where ⟨M⟩ ≠ 0.²⁶

²⁶ Be sure to wait until the state is equilibrated before you start! Below Tc this means the state should not have red and black domains, but be all in one ground state. You may need to apply a weak external field for a while to remove stripes at low temperatures.

There are systematic series expansions for the Ising model at high and low temperatures, using Feynman diagrams (see section 10.2). The first terms of these expansions are famous, and easy to understand.

Low Temperature Expansion for the Magnetization. At low temperatures we can assume all spins flip alone, ignoring clusters.

(d) What is the energy for flipping a spin antiparallel to its neighbors? Equilibrate at low temperature T = 1.0, and measure the magnetization. Notice that the primary excitations are single spin flips. In the low temperature approximation that the flipped spins are dilute (so we may ignore the possibility that two flipped spins touch or overlap), write a formula for the magnetization. (Remember, each flipped spin changes the magnetization by 2.) Check your prediction against the simulation. (Hint: see equation 10.14.)

The magnetization (and the specific heat) are exponentially small at low temperatures because there is an energy gap to spin excitations in the Ising model,²⁷ just as there is a gap to charge excitations in a semiconductor or an insulator.

²⁷ Not all real magnets have a gap: if there is a spin rotation symmetry, one can have gapless spin waves, like phonons for spins.

High Temperature Expansion for the Susceptibility. At high temperatures, we can ignore the coupling to the neighboring spins.

(e) Calculate a formula for the susceptibility of a free spin coupled to an external field. Compare it to the susceptibility you measure at high temperature T = 100 for the Ising model (say, ΔM/ΔH with ΔH = 1. Why is ΔH = 1 a small field in this case?)

Your formula for the high-temperature susceptibility is known more generally as Curie's law.

(8.2) Coin Flips and Markov Chains. (Mathematics, Basic)
A physicist, testing the laws of chance, flips a coin repeatedly until it lands tails.
(a) Treat the two states of the physicist ("still flipping" and "done") as states in a Markov process. The current probability vector is then ρn = (ρflipping, ρdone). Write the transition matrix P, giving the time evolution P ρn = ρn+1, assuming that the coin is fair.
(b) Find the eigenvalues and right eigenvectors of P. Which eigenvector is the steady state ρ*? Call the other eigenvector ρ̃. For convenience, normalize ρ̃ so that its first component equals one.
(c) Assume an arbitrary initial state is written ρ0 = A ρ* + B ρ̃. What are the conditions on A and B needed to make ρ0 a valid probability distribution? Write ρn as a function of A and B, ρ* and ρ̃.

© P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity
146 Computational Stat Mech: Ising and Markov

(8.3) Red and Green Bacteria. (Mathematics) (From Princeton. [115])
A growth medium at time t = 0 has 500 red bacteria and 500 green bacteria. Each hour, each bacterium divides
To be pub. Oxford UP, Fall05
8.2 Markov Chains 147

in two. A color-blind predator eats exactly 1000 bacteria per hour.²⁸
(a) After a very long time, what is the probability distribution for the number of red bacteria in the growth medium?
(b) Roughly how long will it take to reach this final state?²⁹
(c) Assume that the predator has a 1% preference for green bacteria (implemented as you choose). Roughly how much will this change the final distribution?

²⁸ This question is purposely open-ended, and rough answers to parts (b) and (c) within a factor of two are perfectly acceptable. Numerical and analytical methods are both feasible.
²⁹ Within the accuracy of this question, you may assume either that one bacterium reproduces and then one is eaten 1000 times per hour, or that at the end of each hour all the bacteria reproduce and then 1000 are consumed. The former method is more convenient for analytical work finding eigenvectors; the latter can be used to motivate approaches using the diffusion of probability with an N-dependent diffusion constant.

(8.4) Detailed Balance. (Basic)
In an equilibrium system, for any two states α and β with equilibrium probabilities ρ*α and ρ*β, detailed balance states (equation 8.17) that

    P_{β⇐α} ρ*α = P_{α⇐β} ρ*β,   (8.22)

that is, the equilibrium flux of probability from α to β is the same as the flux backward from β to α. It's both possible and elegant to reformulate the condition for detailed balance so that it doesn't involve the equilibrium probabilities. Consider three states of the system, α, β, and γ.
(a) Assume that each of the three types of transitions among the three states satisfies detailed balance. Eliminate the equilibrium probability densities to write the unknown rate P_{α⇐β} in terms of the five other rates. (Hint: see equation below for answer.)
If we view the three states α, β, and γ to be around a circle, you've derived a relationship between the rates going clockwise and the rates going counter-clockwise around the circle,

    P_{α⇐β} P_{β⇐γ} P_{γ⇐α} = P_{α⇐γ} P_{γ⇐β} P_{β⇐α}.   (8.23)

It is possible to show conversely that if every triple of states in a Markov chain satisfies the condition you derived then it satisfies detailed balance (i.e., that there is at least one probability density ρ* which makes the probability fluxes between all pairs of states equal). The only complication arises because some of the rates can be zero.
(b) Suppose P is the transition matrix for some Markov process satisfying the condition 8.23 for every triple of states α, β, and γ. Assume for simplicity that there is a state α with non-zero transition rates from all other states δ. Construct a probability density ρ*δ that demonstrates that P satisfies detailed balance (equation 8.22). (Hint: If you assume a value for ρ*α, what must ρ*δ be to ensure detailed balance for the pair? Show that this candidate distribution satisfies detailed balance for any two states.)

(8.5) Heat Bath, Metropolis, and Wolff. (Mathematics, Computation)
There are a number of different methods for equilibrating lattice simulations like the Ising model. They give the model different dynamics, but keep the equilibrium properties unchanged. This is guaranteed by the theorem we asserted in class on Markov processes: if they are ergodic and obey detailed balance, they converge to the equilibrium distribution. We'll first look at the two most common algorithms. We'll then consider the most sophisticated, sneaky use of the theorem I know of.
The simulation ising in problem 8.1 uses the heat-bath algorithm, which thermalizes one spin at a time:

Heat Bath
(a) Pick a spin at random,
(b) Calculate the energies E↑ and E↓ for the spin being up or down given its current environment.
(c) Thermalize it: place it up with probability e^{−βE↑}/(e^{−βE↑} + e^{−βE↓}), down with probability e^{−βE↓}/(e^{−βE↑} + e^{−βE↓}).

Another popular choice is the Metropolis algorithm, which also flips a single spin at a time:
(a) Pick a spin at random,
(b) Calculate the energy ΔE for flipping the spin.
(c) If ΔE < 0 flip it; if ΔE > 0, flip it with probability e^{−βΔE}.

(a) Show that Heat Bath and Metropolis satisfy detailed balance. Note that they are ergodic and Markovian (no memory), and hence argue that they will lead to thermal equilibrium. Is Metropolis more efficient (fewer random numbers needed to get to equilibrium)? Why?
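For a single spin, the detailed-balance property of both update rules can be checked in a few lines. A sketch (my notation, not the book's code): for each rule, the ratio of the up-to-down and down-to-up transition rates equals the Boltzmann ratio of the two states' probabilities.

```python
import math

beta, h = 0.7, 1.3          # assumed inverse temperature and local field on one spin
E_up, E_dn = -h, +h         # energies of the spin in its current environment

# Heat bath: the new state is chosen independent of the old one
p_up = math.exp(-beta * E_up) / (math.exp(-beta * E_up) + math.exp(-beta * E_dn))
p_dn = 1.0 - p_up

# Metropolis: flip with probability min(1, e^{-beta dE})
def metro(E_from, E_to):
    return min(1.0, math.exp(-beta * (E_to - E_from)))

boltzmann = math.exp(-beta * (E_dn - E_up))   # rho_dn / rho_up

# Detailed balance: rate(up->dn)/rate(dn->up) = rho_dn/rho_up for both rules
assert abs(p_dn / p_up - boltzmann) < 1e-12
assert abs(metro(E_up, E_dn) / metro(E_dn, E_up) - boltzmann) < 1e-12
print("both rules satisfy detailed balance")
```

Note that heat bath satisfies detailed balance trivially, since the probability of the new state does not depend on the old one; Metropolis needs the min(1, ·) asymmetry to compensate.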

Near the critical point Tc where the system develops a magnetization, any single-spin-flip dynamics becomes very slow. Wolff (Phys. Rev. Lett. 62, 361 (1989)), improving on ideas of Swendsen and Wang (Phys. Rev. Lett. 58, 86 (1987)), came up with a clever method to flip whole clusters of spins.

Fig. 8.5 Cluster Flip. The region inside the dotted line is flipped in one Wolff move. Let this configuration be A.

Fig. 8.6 Cluster Flip. Let this configuration be B. Let the cluster flipped be C. Notice that the boundary of C has n↑ = 2, n↓ = 6.

Wolff Cluster Flips
(a) Pick a spin at random, remember its direction D = ±1, and flip it.
(b) For each of the four neighboring spins, if it is in the direction D, flip it with probability p.
(c) For each of the new flipped spins, recursively flip their neighbors as in (b).

Because with finite probability you can flip any spin, the Wolff algorithm is ergodic. It's obviously Markovian when viewed as a move which flips a cluster. Let's see that it satisfies detailed balance, when we pick the right value of p for the given temperature.
(b) Show for the two configurations shown above that EB − EA = 2(n↑ − n↓)J. Argue that this will be true for flipping any cluster of up spins to down.
The cluster flip can start at any site α in the cluster C. The ratio of rates Γ_{A→B}/Γ_{B→A} depends upon the number of times the cluster chose not to grow on the boundary. Let P_C^α be the probability that the cluster grows internally from site α to the cluster C (ignoring the moves which try to grow outside the boundary). Then

    Γ_{A→B} = Σ_α P_C^α (1 − p)^{n↑},   (8.24)
    Γ_{B→A} = Σ_α P_C^α (1 − p)^{n↓},   (8.25)

since the cluster must refuse to grow n↑ times when starting from the up-state A, and n↓ times when starting from B.
(c) What value of p lets the Wolff algorithm satisfy detailed balance at temperature T?
Find a Windows machine. Download the Wolff simulation [103]. Using the parameter reset (top left), reset the temperature to 2.3, the algorithm to Heat Bath, and the height and width to 512. Watch the slow growth of the characteristic cluster sizes. Now change to Wolff, and see how much faster the code is. Also notice that each sweep almost completely rearranges the pattern: the correlation time is much smaller for the Wolff algorithm. (See [75, secs. 4.2 and 4.3] for more details on the Wolff algorithm.)

(8.6) Stochastic Cells. (Biology, Computation) (With Myers. [72])
Living cells are amazingly complex mixtures of a variety of complex molecules (RNA, DNA, proteins, lipids, ...) that are constantly undergoing reactions with one another. This complex of reactions has been compared to computation: the cell gets input from external and internal sensors, and through an intricate series of reactions produces an appropriate response. Thus, for example, receptor cells in the retina "listen" for light and respond by triggering a nerve impulse.
The kinetics of chemical reactions are usually described using differential equations for the concentrations of the various chemicals, and rarely are statistical fluctuations considered important. In a cell, the numbers of molecules of a given type can be rather small: indeed, there is (often) only one copy of the relevant part of DNA for a given reaction. It's an important question whether and when we may describe the dynamics inside the cell using continuous concentration variables, even though the actual numbers of molecules are always integers.
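The Wolff move described above fits in a few lines. This is my sketch, not the book's package; it takes as given the standard bond probability p = 1 − e^{−2J/T} (the quantity part (c) asks you to derive), and grows the cluster with an explicit stack rather than recursion.

```python
import math, random

random.seed(0)
L, J, T = 16, 1.0, 2.0
p = 1.0 - math.exp(-2.0 * J / T)          # standard Wolff bond probability (kB = 1)

spins = [[1] * L for _ in range(L)]       # start all up

def wolff_flip():
    i, j = random.randrange(L), random.randrange(L)
    D = spins[i][j]                       # remember the direction, flip the seed spin
    spins[i][j] = -D
    stack, size = [(i, j)], 1
    while stack:                          # grow the cluster bond by bond
        x, y = stack.pop()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = (x + dx) % L, (y + dy) % L     # periodic boundaries
            if spins[nx][ny] == D and random.random() < p:
                spins[nx][ny] = -D        # flip and recurse into the new member
                stack.append((nx, ny))
                size += 1
    return size

sizes = [wolff_flip() for _ in range(100)]
print(max(sizes))   # below Tc, single moves can rearrange much of the lattice
```

Each bond from a cluster site to a like-oriented neighbor is tested exactly once, which is what makes the acceptance ratio work out in equations 8.24 and 8.25.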

Fig. 8.7 Dimerization reaction. A Petri net diagram for a dimerization reaction, with dimerization rate kb and dimer dissociation rate ku. [Figure: nodes M and D; a binding box consuming two M and producing one D at rate kb, and an unbinding box consuming one D and producing two M at rate ku.]

Consider a simple dimerization reaction: a molecule M (called the "monomer") joins up with another monomer and becomes a dimer D: 2M ⇌ D. Proteins in cells often form dimers: sometimes (as here) both proteins are the same (homodimers) and sometimes they are different proteins (heterodimers). Suppose the forward (binding) reaction rate is kb and the backward (unbinding) reaction rate is ku. Figure 8.7 shows this as a Petri net [37] with each reaction shown as a box, with incoming arrows showing species that are consumed by the reaction, and outgoing arrows showing species that are produced by the reaction: the number consumed or produced (the stoichiometry) is given by a label on each arrow.³⁰ There are thus two reactions: the backward unbinding reaction rate per unit volume is ku[D] (each dimer disassociates with rate ku), and the forward binding reaction rate per unit volume is kb[M]² (since each monomer must wait for a collision with another monomer before binding, the rate is proportional to the monomer concentration squared).³¹
The brackets [] denote concentrations. We assume, as does reference [28], that the volume per cell is such that one molecule per cell is 1 nM (10⁻⁹ moles per liter). For convenience, we shall pick nanomoles as our unit of concentration, so [M] is also the number of monomers in the cell. Assume kb = 1 nM⁻¹ s⁻¹ and ku = 2 s⁻¹, and that at t = 0 all N monomers are unbound.
(a) Continuum dimerization. Write the differential equation for dM/dt treating M and D as continuous variables. (Hint: remember that two M molecules are consumed in each reaction.) What are the equilibrium concentrations for [M] and [D] for N = 2 molecules in the cell, assuming these continuous equations and the values above for kb and ku? For N = 90 and N = 10100 molecules? Numerically solve your differential equation for M(t) for N = 2 and N = 90, and verify that your solution settles down to the equilibrium values you found.
For large numbers of molecules in the cell, we expect that the continuum equations may work well, but for just a few molecules there surely will be relatively large fluctuations. These fluctuations are called "shot noise", named in early studies of electrical noise at low currents due to individual electrons in a resistor. We can implement a simple Monte-Carlo algorithm to simulate this shot noise.³² Suppose the reactions have rates Γi, with total rate Γtot = Σi Γi. The idea is that the expected time to the next reaction is 1/Γtot, and the probability that the next reaction will be j is Γj/Γtot. To simulate until a final time tf, the algorithm runs as follows:
(a) Calculate a list of the rates of all reactions in the system.
(b) Find the total rate Γtot.
(c) Pick a random time twait with probability distribution ρ(t) = Γtot exp(−Γtot t).
(d) If the current time t plus twait is bigger than tf, no further reactions will take place: return.
(e) Otherwise:
    - Increment t by twait,
    - Pick a random number r uniformly distributed in the range [0, Γtot),
    - Pick the reaction j for which Σ_{i<j} Γi ≤ r < Σ_{i<j+1} Γi (that is, r lands in the jth interval of the sum forming Γtot),
    - Execute that reaction, by incrementing each chemical involved by its stoichiometry.
(f) Repeat.

³⁰ An enzyme that is necessary but not consumed is shown with an incoming and outgoing arrow.
³¹ In the discrete case, the rate will be proportional to M(M − 1), since a monomer cannot collide with itself.
³² In the context of chemical simulations, this algorithm is named after Gillespie [33]; the same basic approach was used just a bit earlier in the Ising model by Bortz, Kalos and Lebowitz [13], and is called continuous-time Monte Carlo in that context.
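The loop above specializes to a few lines for the dimerization reaction. This is my minimal sketch of the algorithm, not the book's package, using the exercise's rates kb = 1 nM⁻¹ s⁻¹, ku = 2 s⁻¹ and N = 90 (so one molecule per cell is 1 nM, and M, D are integer counts):

```python
import random

random.seed(1)
kb, ku = 1.0, 2.0          # binding and unbinding rates from the exercise
M, D = 90, 0               # all N = 90 monomers unbound at t = 0
t, tf = 0.0, 10.0

while True:
    rates = [kb * M * (M - 1), ku * D]   # discrete binding rate is kb M(M-1), not kb M^2
    total = sum(rates)
    if total == 0.0:
        break
    t += random.expovariate(total)       # waiting time with density total * e^{-total t}
    if t > tf:
        break                            # no further reactions before tf
    if random.random() * total < rates[0]:
        M -= 2; D += 1                   # binding: 2M -> D
    else:
        M += 2; D -= 1                   # unbinding: D -> 2M

print(M, D)   # hovers near the continuum equilibrium, with visible shot noise
```

Note the conservation law M + 2D = N is preserved exactly by both reaction steps, which is a useful sanity check on any stochastic chemistry code.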

As mentioned earlier, the binding reaction rate for M total monomers binding is no longer kb M² for discrete molecules: it's kb M(M − 1) (where again [M] ≈ M for a one nanoliter cell, when using concentrations in nanomolar).³³

³³ Without this change, if you start with an odd number of molecules your concentrations can go negative!

(b) Stochastic dimerization. Implement this algorithm for the dimerization reaction of part (a). Simulate for N = 2, N = 90, and N = 10100 and compare a few stochastic realizations with the continuum solution. How large a value of N do you need for the individual reactions to be well described by the continuum equations (say, fluctuations less than 20% at late times)?
Measuring the concentrations in a single cell is often a challenge. Experiments often average over many cells. Such experiments will measure a smooth time evolution even though the individual cells are noisy. Let's investigate whether this ensemble average is well described by the continuum equations.
(c) Average stochastic dimerization. Find the average of many realizations of your stochastic dimerization in part (b), for N = 2 and N = 90, and compare with your deterministic solution. How much is the long-term average shifted by the stochastic noise? How large a value of N do you need for the ensemble average of M(t) to be well described by the continuum equations (say, shifted by less than 5% at late times)?

(8.7) The Repressilator. (Biology, Computation) (With Myers. [72])
Reading: Reference [28], Michael B. Elowitz and Stanislaw Leibler, "A synthetic oscillatory network of transcriptional regulators", Nature 403, 335-338 (2000).
The central dogma of molecular biology is that the flow of information is from DNA to RNA to proteins: DNA is transcribed into RNA, which then is translated into protein.
Now that the genome is sequenced, it is thought that we have the parts list for the cell. All that remains is to figure out how they work together. The proteins, RNA, and DNA form a complex network of interacting chemical reactions, which governs metabolism, responses to external stimuli, reproduction (proliferation), differentiation into different cell types, and (when the system perceives itself to be breaking down in dangerous ways) programmed cell death, or apoptosis.
Our understanding of the structure of these interacting networks is growing rapidly, but our understanding of the dynamics is still rather primitive. Part of the difficulty is that the cellular networks are not neatly separated into different modules: a given protein may participate in what would seem to be several separate regulatory pathways. In this exercise, we will study a simple model system, the Repressilator. This experimental system involves three proteins, each of which inhibits the formation of the next. They were added to the bacterium E. coli, with hopefully minimal interactions with the rest of the biological machinery of the cell. We will implement the stochastic model that the authors used to describe their experimental system [28], in order to:
- Implement in a tangible system an example both of the central dogma and of transcriptional regulation: the control by proteins of DNA expression into RNA,
- Introduce sophisticated Monte-Carlo techniques for simulations of stochastic reactions,
- Introduce methods for automatically generating continuum descriptions from reaction rates, and
- Illustrate the shot noise fluctuations due to small numbers of molecules and the telegraph noise fluctuations due to finite rates of binding and unbinding of the regulating proteins.
Figure 8.8 shows the biologist's view of the repressilator network. Three proteins (TetR, λCI, and LacI) each repress the formation of the next. We shall see that, under appropriate circumstances, this can lead to spontaneous oscillations: each protein peaks in turn, suppressing the suppressor of its suppressor, leading to its own later decrease.

Fig. 8.8 The biologist's view of the Repressilator network. The T-shapes are blunt arrows, signifying that the protein at the tail (bottom of the T) suppresses the production of the protein at the head. Thus LacI (pronounced "lack-eye") suppresses TetR ("tet-are"), which suppresses λCI ("lambda-see-one"). This simple description summarizes a complex series of interactions (see figure 8.9).
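One of the goals listed above is generating continuum descriptions automatically from reaction rates. A minimal sketch of that idea (all names are mine, not the book's simulation package), tested on the dimerization network of exercise 8.6: each reaction carries its input and output stoichiometries plus a rate function, and dc/dt is assembled by summing stoichiometry × rate.

```python
kb, ku = 1.0, 2.0
M, D = 0, 1    # indices of the two chemicals in the concentration array

# Each reaction: (inputs, outputs, rate function of the concentrations)
reactions = [
    ({M: 2}, {D: 1}, lambda c: kb * c[M] ** 2),   # binding 2M -> D, continuum rate kb [M]^2
    ({D: 1}, {M: 2}, lambda c: ku * c[D]),        # unbinding D -> 2M, rate ku [D]
]

def dcdt(c):
    """Assemble dc/dt from the reaction list: stoichiometry change times rate."""
    deriv = [0.0] * len(c)
    for inputs, outputs, rate_fn in reactions:
        r = rate_fn(c)
        for chem, stoich in inputs.items():
            deriv[chem] -= stoich * r             # consumed species
        for chem, stoich in outputs.items():
            deriv[chem] += stoich * r             # produced species
    return deriv

# Simple Euler integration from N = 90 unbound monomers to late times.
c, dt = [90.0, 0.0], 0.001
for _ in range(int(10.0 / dt)):
    d = dcdt(c)
    c = [ci + dt * di for ci, di in zip(c, d)]

# Continuum equilibrium solves kb [M]^2 = ku [D] with [M] + 2[D] = 90, giving [M] = 9.
print(c)   # approximately [9.0, 40.5]
```

A real integrator (as in exercise 8.9) would replace the Euler loop, but the dcdt construction is the point: it is generated mechanically from the reaction list, with no equations typed in by hand.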

The biologist's notation summarizes a much more complex picture. The LacI protein, for example, can bind to one or both of the transcriptional regulation or "operator" sites ahead of the gene that codes for the tetR mRNA.³⁴ When bound, it largely blocks the transcription of DNA into tetR.³⁵ The level of tetR will gradually decrease as it degrades; hence less TetR protein will be translated from the tetR mRNA. The resulting network of ten reactions is depicted in figure 8.9, showing one third of the total repressilator network. The biologists' shorthand (figure 8.8) does not specify the details of how one protein represses the production of the next. The larger diagram, for example, includes two operator sites for the repressor molecule to bind to, leading to three states (P0, P1, and P2) of the promotor region depending upon how many LacI proteins are bound.

Fig. 8.9 The Petri net version [37] of one-third of the Repressilator network (the LacI repression of TetR). (Thanks to Myers [72].) The solid lighter vertical rectangles represent binding reactions A + B → C, with rate kb[A][B]; the open vertical rectangles represent unbinding C → A + B, with rate ku[C]. The horizontal rectangles represent catalyzed synthesis reactions C → C + P, with rate proportional to [C]; the darker ones represent transcription (formation of mRNA), and the lighter one represents translation (formation of protein). The black vertical rectangles represent degradation reactions, A → nothing, with rate kd[A]. (The stoichiometry of all the arrows is one.) The LacI protein (top) can bind to the DNA in two promoter sites ahead of the gene coding for tetR: when bound, it largely blocks the transcription (formation) of tetR mRNA. P0 represents the promotor without any LacI bound; P1 represents the promotor with one site blocked, and P2 represents the doubly-bound promotor. LacI can bind to one or both of the promotor sites, changing Pi to Pi+1, or correspondingly unbind: the unbinding rate for the protein is modeled in reference [28] to be faster when only one site is occupied. The unbound P0 state transcribes tetR mRNA quickly, and the bound states transcribe it slowly ("leaky" repression). The tetR mRNA then catalyzes the formation of the TetR protein.³⁶

³⁴ Messenger RNA (mRNA) codes for proteins. Other forms of RNA can serve as enzymes or parts of the machinery of the cell.
³⁵ RNA polymerase, the molecular motor responsible for transcribing DNA into RNA, needs to attach to the DNA at a promotor site. By binding to the adjacent operator sites, our repressor protein inhibits this attachment and hence partly blocks transcription. The residual transcription is called "leakiness".
³⁶ Proteins by convention have the same names as their mRNA, but start with capitals where the mRNA start with small letters.

If you are not provided with it, you may retrieve a simulation package for the Repressilator from the book Web site [105].
(a) Run the simulation for at least 6000 seconds and plot the protein, RNA, and promotor states as a function of time. Notice that:
- The protein levels do oscillate, as in figure 1(c) in reference [28],
- There are significant noisy-looking fluctuations,
- There are many more proteins than RNA.
We will study this noise in parts (c) and (d); it will be due to the low numbers of RNA molecules in the cell, and to the discrete fluctuations between the three states of the promotor sites. Before we do this, we should (a) increase the efficiency of the simulation, and (b) compare it to the continuum simulation that would be obtained if there were no fluctuations.
To see how important the fluctuations are, we should compare the stochastic simulation to the solution of the continuum reaction rate equations (as we did in exercise 8.6). In reference [28], the authors write a set of

six differential equations giving a continuum version of the stochastic simulation. These equations are simplified: they both integrate out or coarse-grain away the promotor states from the system, deriving a Hill equation (see exercise 5.9) for the mRNA production, and they also rescale their variables in various ways. Rather than typing in their equations and sorting out these rescalings, it is convenient and illuminating to write a simple routine to generate the continuum differential equations directly from our reaction rates.
(b) Write a DeterministicRepressilator, derived from Repressilator just as StochasticRepressilator was. Write a routine dcdt(c, t) that:
- Sets the chemical amounts in the reaction network to the values in the array c,
- Sets a vector dcdt (of length the number of chemicals) to zero,
- For each reaction:
    - computes its rate,
    - for each chemical whose stoichiometry is changed by the reaction, adds the stoichiometry change times the rate to the corresponding entry of dcdt.
Call a routine to integrate the resulting differential equation (as described in the last part of exercise 8.9, for example), and compare your results to those of the stochastic simulation.
The stochastic simulation has significant fluctuations away from the continuum equation. Part of these fluctuations are due to the fact that the numbers of proteins and mRNAs are small: in particular, the mRNA numbers are significantly smaller than the protein numbers.
(c) Write a routine that creates a stochastic repressilator network that multiplies the mRNA concentrations by RNAFactor without otherwise affecting the continuum equations. (That is, multiply the initial concentrations and the transcription rates by RNAFactor, and divide the translation rate by RNAFactor.) Try boosting the RNAFactor by ten and one hundred. Do the RNA and protein fluctuations become significantly smaller? This noise, due to the discrete, integer values of chemicals in the cell, is analogous to the shot noise seen in electrical circuits due to the discrete quantum of electric charge. It scales, as do most fluctuations, as the square root of the number of molecules.
A continuum description of the binding of the proteins to the operator sites on the DNA seems particularly dubious: a variable that must be zero or one is replaced by a continuous evolution between these extremes. (Such noise in other contexts is called telegraph noise, in analogy to the telegraph, which is either silent or sending as the operator taps the key.) The continuum description is accurate in the limit where the binding and unbinding rates are fast compared to all of the other changes in the system: the protein and mRNA variations then see the average, local equilibrium concentration. On the other hand, if the rates are slow compared to the response of the mRNA and protein, the latter can have a switching appearance.
(d) Incorporate a telegraphFactor into your stochastic repressilator routine, that multiplies the binding and unbinding rates. Run for 1000 seconds with RNAFactor = 10 (to suppress the shot noise) and telegraphFactor = 0.001. Do you observe features in the mRNA curves that appear to switch as the relevant proteins unbind and bind?
Advanced Algorithms: The simulation you will be given implements the Gillespie algorithm discussed in exercise 8.6. At each step, the rates of all possible reactions are calculated, in order to randomly choose when and which the next reaction will be. For a large, loosely connected system of reactions there is no need to recalculate each rate: only the rates which have changed due to the previous reaction need updating. Keeping track of the dependency network (which chemical amounts affect which reactions change the amounts of which chemicals) is relatively simple [71].
(e) Alter the reaction network to store the current reaction rates. Add a function UpdateRates(reac) to the reaction network, which for each chem whose stoichiometry is changed by reac, updates the rates for each reaction affected by the amount of chem. Alter the Step method of the stochastic repressilator simulation to use the stored current reaction rates (rather than recomputing them) and to call UpdateRates with the chosen reaction before returning. Time your new routine, and compare to the speed of the old one. A network of thirty reactions for fifteen chemical components is rather small on biological scales. The dependency network algorithm should be significantly faster for large systems.

(8.8) Entropy Increases! Markov Chains. (Math)

Entropy is Concave (Convex downward) doesnt increase in Hamiltonian systems. Let us show
f( a+(1-)b ) > f(a) + (1-)f(b)
0.4 that it does increase for Markov chains.37
The Markov chain is implicitly exchanging energy with
f( a+(1-)b ) a heat bath at the temperature T . Thus to show that
0.3 the entropy for the world as a whole increases, we must
f(a) show that S E/T increases, where S is the en-
Fig. 8.10 For x ≥ 0, f(x) = −x log x is strictly convex downward (concave) as a function of the probabilities: for 0 < λ < 1, the linear interpolation λf(a) + (1−λ)f(b) lies below the curve.

Convexity arguments are a basic tool in formal statistical mechanics. The function f(x) = −x log x is strictly concave (convex downward) for x ≥ 0 (figure 8.10): this is easily shown by noting that its second derivative is negative in this region.

(a) Convexity for sums of many terms. If Σ_α μ_α = 1, and if for all α both μ_α ≥ 0 and x_α ≥ 0, show by induction on the number of states M that if g(x) is concave for x ≥ 0,

    g(Σ_{α=1}^{M} μ_α x_α) ≥ Σ_{α=1}^{M} μ_α g(x_α).    (8.26)

(Hint: In the definition of concave, f(λa + (1−λ)b) ≥ λf(a) + (1−λ)f(b), take (1−λ) = μ_{M+1} and b = x_{M+1}. Then a is a sum of M terms, rescaled from their original values. Do the coefficients of x_α in a sum to one? Can we apply induction?)

Microcanonical Entropy is Maximum. In problem set 2, you showed that the microcanonical ensemble was an extremum of the entropy, using Lagrange multipliers. We can use the convexity of −x log x to show that it's actually a global maximum.

(b) Using equation 8.26 for g(x) = −x log x and μ_α = 1/M, show that the entropy for a system of M states S = −k_B Σ_α ρ_α log ρ_α ≤ k_B log M, the entropy of the (uniform) microcanonical ensemble.

Markov Chains: Entropy Increases! In problem set 2 you also noticed that, formally speaking, entropy does not increase for Hamiltonian systems. Let us show that it does increase for Markov chains.37 A Markov chain at temperature T implicitly exchanges energy with its heat bath, so we must show that the total entropy change ΔS − ΔE/T is positive, where ΔS is the change in the entropy of our system and ΔE/T is the entropy flow from the heat bath. Hence, showing that entropy increases for our Markov process is equivalent to showing that the free energy E − TS decreases.

Let P_αβ be the transition matrix for a Markov process, satisfying detailed balance with energy E_α at temperature T. The current probability of being in state α is ρ_α. The free energy

    F = E − TS = Σ_α ρ_α E_α + k_B T Σ_α ρ_α log ρ_α.    (8.27)

(c) Show that the free energy decreases for a Markov process. In particular, using equation 8.26, show that the free energy for ρ_α^(n+1) = Σ_β P_αβ ρ_β^(n) is less than or equal to the free energy for ρ^(n). You may use the properties of the Markov transition matrix P (0 ≤ P_αβ ≤ 1 and Σ_α P_αβ = 1), and detailed balance (P_αβ ρ*_β = P_βα ρ*_α, where ρ*_α = exp(−E_α/k_B T)/Z). (Hint: you'll want to use μ_β = P_αβ in equation 8.26, but the entropy will involve P_βα, which is not the same. Use detailed balance to convert from one to the other.)

(8.9) Solving ODEs: The Pendulum. (Computational) (With Myers. [72])
Reading: Numerical Recipes [81], chapter 16.
Physical systems usually evolve continuously in time: their laws of motion are differential equations. Computer simulations must approximate these differential equations using discrete time steps. In this exercise, we will introduce some common methods for simulating differential equations using the simple example of the pendulum:

    d²θ/dt² = θ̈ = −(g/L) sin(θ).    (8.28)

This equation gives the motion of a pendulum with a point mass at the tip of a massless rod38 of length L: rederive it using a free-body diagram.
Go to our Web site [105] and download the pendulum files for the language you'll be using. The animation
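Part (c) can be checked numerically. The sketch below builds a hypothetical three-state Metropolis transition matrix (my choice of rates; any matrix satisfying detailed balance would do) and verifies that the free energy of equation 8.27 never increases along the Markov chain:

```python
import math

# Hypothetical three-state system: energies E_alpha, with k_B T = 1.
E = [0.0, 1.0, 2.0]
kT = 1.0
M = len(E)

# Metropolis rates satisfy detailed balance; P[a][b] is the probability
# of the transition b -> a, with each other state proposed equally often.
P = [[0.0] * M for _ in range(M)]
for b in range(M):
    for a in range(M):
        if a != b:
            P[a][b] = min(1.0, math.exp(-(E[a] - E[b]) / kT)) / (M - 1)
    P[b][b] = 1.0 - sum(P[a][b] for a in range(M) if a != b)

def free_energy(rho):
    """F = sum_a rho_a E_a + kT sum_a rho_a log rho_a (equation 8.27)."""
    return sum(r * e for r, e in zip(rho, E)) + kT * sum(
        r * math.log(r) for r in rho if r > 0)

rho = [1.0, 0.0, 0.0]            # start far from equilibrium
F_old = free_energy(rho)
for n in range(50):
    rho = [sum(P[a][b] * rho[b] for b in range(M)) for a in range(M)]
    F_new = free_energy(rho)
    assert F_new <= F_old + 1e-12    # free energy never increases
    F_old = F_new
```

After a few dozen steps rho should be close to the Boltzmann distribution, where F reaches its minimum −kT log Z.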

37 We know that the Markov chain eventually evolves to the equilibrium state, and we argued that the latter minimizes the free energy. What we're showing here is that the free energy goes continuously downhill for a Markov chain.
38 We'll depict our pendulum emphasizing the rod rather than the mass: the equation for a physical rod without an end mass is similar.

© J. P. Sethna, January 4, 2005    Entropy, Order Parameters, and Complexity
154 Computational Stat Mech: Ising and Markov

should show a pendulum oscillating from an initial condition θ₀ = 2π/3, ω₀ = 0; the equations being solved have g = 9.8 m/s² and L = 1 m.
There are three independent criteria for picking a good algorithm for solving differential equations: fidelity, accuracy, and stability.
Fidelity. Notice that in our time step algorithm, we did not make the straightforward choice of using the current (θ(t), ω(t)) to produce (θ(t + δ), ω(t + δ)). Rather, we used θ(t) to calculate the acceleration and update ω, and then used ω(t + δ) to calculate θ(t + δ):

    ω(t + δ) = ω(t) + θ̈(t) δ
    θ(t + δ) = θ(t) + ω(t + δ) δ    (8.29)

Wouldn't it be simpler and make more sense to update θ and ω simultaneously from their current values, so θ(t + δ) = θ(t) + ω(t) δ? (This simplest of all time-stepping schemes is called the Euler method, and should not be used for ordinary differential equations, although it is sometimes used in partial differential equations.)
(a) Try it. First, see why reversing the order of the updates to θ and ω,

    θ(t + δ) = θ(t) + ω(t) δ
    ω(t + δ) = ω(t) + θ̈(t) δ    (8.30)

in our loop would give us a simultaneous update. Swap these two lines in the code, and watch the pendulum swing for several turns, until it starts looping the loop. Is the new algorithm as good as the old one? (Make sure you switch the two lines back afterwards.)
The simultaneous update scheme is just as accurate as the one we chose, but it is not as faithful to the physics of the problem: its fidelity is not as good. For subtle reasons we won't explain here, updating ω first and then θ allows our algorithm to exactly conserve an approximation to the energy: it's called a symplectic algorithm.39 Improved versions of this algorithm, like the Verlet algorithms below, are often used to simulate systems that conserve energy (like molecular dynamics) because they exactly40 simulate the dynamics for an approximation to the Hamiltonian, preserving important physical features not kept by just approximately solving the dynamics.
Accuracy. Most computational methods for solving differential equations (and many other continuum problems like integrating functions) involve a step size δ, and become more accurate as δ gets smaller. The easy thing to calculate is the error in each time step, but the more important quantity is the accuracy of the answer after a fixed time T, which is the accumulated error after T/δ time steps. If this accumulated error varies as δⁿ, we say that the algorithm has nth order cumulative accuracy. Our algorithm is not very high order!
(b) Plot the pendulum trajectory θ(t) for time steps δ = 0.1, 0.01, and 0.001. Zoom in on the curve at one of the coarse points (say, t = 1) and compare the values from the three time steps. Does it appear that this time is converging41 as δ → 0? From your measurement, what order accuracy is our method?
We can write higher-order symplectic algorithms. The simple approximation to the second derivative

    θ̈(t) ≈ (θ(t + δ) − 2θ(t) + θ(t − δ)) / δ²    (8.31)

(which you can verify with a Taylor expansion is correct to O(δ⁴)) motivates the Verlet Algorithm

    θ(t + δ) = 2θ(t) − θ(t − δ) + θ̈ δ².    (8.32)

This algorithm is a bit awkward to start up since you need to initialize42 θ(t − δ); it's also often convenient to know the velocities as well as the positions. The Velocity Verlet algorithm fixes both of these problems; it is motivated by the constant acceleration formula x(t) = x₀ + v₀t + 1/2 at²:

    θ(t + δ) = θ(t) + ω(t) δ + 1/2 θ̈(t) δ²    (8.33)
    ω(t + δ/2) = ω(t) + δ/2 θ̈(t)
    ω(t + δ) = ω(t + δ/2) + 1/2 θ̈(t + δ) δ.

The trick that makes this algorithm so good is to cleverly split the velocity increment into two pieces, half for the acceleration at the old position and half for the new position.43 (You'll want to initialize θ̈ once before starting the loop.)
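Equation 8.33 in code: a minimal Python sketch of the Velocity Verlet loop (the function and variable names are mine, not the ones in the downloadable pendulum files), checked below against the small-amplitude harmonic limit:

```python
import math

g, L = 9.8, 1.0

def accel(theta):
    """theta-double-dot = -(g/L) sin(theta), equation 8.28."""
    return -(g / L) * math.sin(theta)

def velocity_verlet(theta, omega, dt, n_steps):
    """Advance (theta, omega) by n_steps steps of size dt (equation 8.33)."""
    a = accel(theta)                  # initialize the acceleration once
    for _ in range(n_steps):
        theta += omega * dt + 0.5 * a * dt**2
        omega_half = omega + 0.5 * a * dt
        a = accel(theta)              # acceleration at the new position
        omega = omega_half + 0.5 * a * dt
    return theta, omega

# Small-amplitude check: the pendulum reduces to a harmonic oscillator,
# theta(t) ~ theta0 cos(sqrt(g/L) t).
theta0, dt, n_steps = 0.01, 0.001, 1000    # integrate to T = 1
theta, omega = velocity_verlet(theta0, 0.0, dt, n_steps)
```

Being symplectic, the loop also keeps the energy 1/2 ω² − (g/L) cos θ bounded near its initial value rather than drifting.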

39 It conserves a symplectic form. In non-mathematicians' language, this means our time step perfectly simulates a Hamiltonian system satisfying Liouville's theorem and energy conservation, but with an approximation to the true energy.
40 Up to rounding errors.
41 You may note that it's easy to extrapolate to the correct answer. This is called Richardson extrapolation and is the basis for the Bulirsch-Stoer methods.
42 Since we start with ω = 0, the simulation is symmetric under reversing the sign of time, and you can get away with using θ(t − δ) = θ(t) + 1/2 θ̈(t) δ² + O(δ⁴).
43 You may check that both Verlet algorithms give exactly the same values for θ(t₀ + nδ).
To be pub. Oxford UP, Fall05
8.2 Markov Chains 155

(c) Pick one of the Verlet algorithms, implement it, and plot the trajectory for time steps δ = 0.1, 0.01, and 0.001. You should see a dramatic improvement in convergence. What cumulative order accuracy does Verlet have?44
Stability. In many cases high accuracy is not crucial. What prevents us from taking enormous time steps? In a given problem, there is usually a typical fastest time scale: a vibration or oscillation period (as in our problem) or a growth or decay rate. When our time step becomes a substantial fraction of this fastest time scale, algorithms like ours usually become unstable: the first few time steps may be fairly accurate, but small errors build up until the errors become unacceptable (indeed, often one's first warning of problems is a machine overflow).
(d) Plot the pendulum trajectory θ(t) for time steps δ = 0.1, 0.2, . . . , 0.8, using a small amplitude oscillation θ₀ = 0.01, ω₀ = 0.0, up to tmax = 10. At about what δc does it go unstable? Looking at the first few points of the trajectory, does it seem like sampling the curve at steps much larger than δc would miss the oscillations? At δc/2, how accurate is the amplitude of the oscillation? (You'll need to observe several periods in order to estimate the maximum amplitude of the solution.)
In solving the properties of large, nonlinear systems (e.g., partial differential equations (PDEs) and molecular dynamics), stability tends to be the key difficulty. The maximum stepsize depends on the local configuration, so highly nonlinear regions can send the system unstable before one might expect. The maximum safe stable stepsize often has accuracy far higher than needed; indeed, some algorithms become less stable if the stepsize is decreased!45
ODE packages: higher order, variable stepsize, stiff systems . . .
The Verlet algorithms are fairly simple to code, and we use higher-order symplectic algorithms in Hamiltonian systems mostly in unusual applications (planetary motion) where high accuracy is demanded, because they are typically significantly less stable. In systems of differential equations where there is no conserved energy or Hamiltonian, or even in Hamiltonian systems (like high-energy collisions) where accuracy at short times is more crucial than fidelity at long times, we use general-purpose methods.
The general-purpose solvers come in a variety of basic algorithms (Runge-Kutta, predictor-corrector, . . . ), and methods for maintaining and enhancing accuracy (variable step size, Richardson extrapolation). There are also implicit methods for stiff systems. A system is stiff if there is a large separation between the slowest and fastest relevant time scales: implicit methods often allow one to take time steps much larger than the fastest time scale (unlike the explicit Verlet methods you studied in part (d), which go unstable). Large, sophisticated packages have been developed over many years for solving differential equations, switching between algorithms and varying the time steps to most efficiently maintain a given level of accuracy. They solve dy/dt = dydt(y, t), where for us y = [θ, ω] and dydt = [ω, θ̈]. They typically come in the form of subroutines or functions, which need as arguments
- Initial conditions y₀,
- The right-hand side dydt, a function of the vector y and time t, which returns a vector giving the current rate of change of y, and
- The initial and final times, and perhaps intermediate times, at which the trajectory y(t) is desired.
They often have options that
- Ask for desired accuracy goals, typically a relative (fractional) accuracy and an absolute accuracy, sometimes set separately for each component of y,
- Ask for and return derivative and time step information from the end of the last step (to allow efficient restarts after intermediate points),
- Ask for a routine that computes the derivatives of dydt with respect to the current components of y (for use by the stiff integrator), and
- Return information about the methods, time steps, and performance of the algorithm.
You will be supplied with one of these general-purpose packages, and instructions on how to use it.
(e) Write the function dydt, and use the general-purpose solver to solve for the motion of the pendulum as in parts (a)-(c), and informally check that the trajectory is
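For part (e), the right-hand-side function dydt is short; here is a sketch with a fixed-step fourth-order Runge-Kutta driver standing in for the general-purpose package (a real package would add adaptive step sizes and error control; the names here are mine):

```python
import math

g, L = 9.8, 1.0

def dydt(y, t):
    """Right-hand side for the pendulum: y = [theta, omega]."""
    theta, omega = y
    return [omega, -(g / L) * math.sin(theta)]

def rk4(dydt, y0, t0, t1, n_steps):
    """Classical 4th-order Runge-Kutta with a fixed step size."""
    h, t, y = (t1 - t0) / n_steps, t0, list(y0)
    for _ in range(n_steps):
        k1 = dydt(y, t)
        k2 = dydt([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], t + 0.5 * h)
        k3 = dydt([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], t + 0.5 * h)
        k4 = dydt([yi + h * ki for yi, ki in zip(y, k3)], t + h)
        y = [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

# Small-amplitude run: should track theta0 cos(sqrt(g/L) t) closely.
theta, omega = rk4(dydt, [0.01, 0.0], 0.0, 1.0, 1000)
```

Note that Runge-Kutta is not symplectic: it trades the exact conservation properties of Verlet for higher per-step accuracy.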

44 The error in each time step of the Verlet algorithm is of order δ⁴. It's usually said that the Verlet algorithms have third-order accuracy, naively assuming that running for a time T should have errors bounded by the number of time steps T/δ times the error per time step δ⁴. However, one can check that the errors in successive time steps build up quadratically at short times (i.e., the velocity errors build up linearly with time), so after T/δ time steps the accumulated error is δ⁴ (T/δ)² = T² δ². We'll use the cumulative order of the algorithm to distinguish it from the naive order.
45 For some partial differential equations, decreasing the spacing Δx between points can lead to instabilities unless the time step is also decreased.


(8.10) Small World Networks. (Complexity, Computation) (With Myers. [72])
Many interesting problems arise from studying properties of randomly generated networks. A network is a collection of nodes and edges, with each edge connected to two nodes, but with each node potentially connected to any number of edges. A random network is constructed probabilistically according to some definite rules; studying such a random network usually is done by studying the entire ensemble of networks, each weighted by the probability that it was constructed. Thus these problems naturally fall within the broad purview of statistical mechanics.

Fig. 8.11 A network is a collection of nodes (circles) and edges (lines between the circles).

One of the more popular topics in random network theory is the study of how connected they are. "Six degrees of separation" is the phrase commonly used to describe the interconnected nature of human acquaintances: various somewhat uncontrolled studies have shown that any random pair of people in the world can be connected to one another by a short chain of people (typically around six), each of whom knows the next fairly well. If we represent people as nodes and acquaintanceships as edges, we reduce the problem to the study of the relationship network.
In this problem, we will generate some random networks, and calculate the distribution of distances between pairs of points. We'll study small world networks [117, 73], a simple theoretical model that suggests how a small number of shortcuts (unusual international and intercultural friendships, . . . ) can dramatically shorten the typical chain lengths. Finally, we'll study how a simple, universal scaling behavior emerges for large networks with few shortcuts.
On the Web site for this book [105], you'll find some hint files and graphic routines to facilitate working this problem, for a variety of languages and systems (currently Python under Unix and Windows).
Constructing a small world network. The L nodes in a small world network are arranged around a circle. There are two kinds of edges. Each node has Z short edges connecting it to its nearest neighbors around the circle (up to a distance Z/2). In addition, there are p L Z/2 shortcuts added to the network, which connect nodes at random (see figure 8.12). (This is a simpler version [73] of the original model [117], which rewired a fraction p of the LZ/2 edges.)
(a) Define a network object on the computer. For this problem, the nodes will be represented by integers. Implement a network class, with five functions:
(1) HasNode(node), which checks to see if a node is already in the network,
(2) AddNode(node), which adds a new node to the system (if it's not already there),
(3) AddEdge(node1, node2), which adds a new edge to the system,
(4) GetNodes(), which returns a list of existing nodes,
(5) GetNeighbors(node), which returns the neighbors of an existing node.
Write a routine to construct a small-world network, which (given L, Z, and p) adds the nodes and the short edges, and then randomly adds the shortcuts. Use the software provided to draw this small world graph, and check that you've implemented the periodic boundary conditions correctly (each node i should be connected to nodes (i − Z/2) mod L, . . . , (i + Z/2) mod L).
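A minimal Python sketch of the network class and small-world constructor of part (a), using a dictionary of neighbor sets (the hint files on the Web site will organize this differently):

```python
import random

class UndirectedGraph:
    """Bare-bones network: a dictionary mapping each node to its neighbor set."""
    def __init__(self):
        self.edges = {}
    def HasNode(self, node):
        return node in self.edges
    def AddNode(self, node):
        if node not in self.edges:
            self.edges[node] = set()
    def AddEdge(self, node1, node2):
        self.AddNode(node1)
        self.AddNode(node2)
        self.edges[node1].add(node2)
        self.edges[node2].add(node1)
    def GetNodes(self):
        return list(self.edges)
    def GetNeighbors(self, node):
        return list(self.edges[node])

def MakeSmallWorldNetwork(L, Z, p, seed=None):
    """L nodes on a circle, short edges to the Z nearest neighbors,
    plus roughly p*L*Z/2 random shortcuts (which, as in figure 8.12,
    may occasionally duplicate an edge or connect a node to itself)."""
    rng = random.Random(seed)
    g = UndirectedGraph()
    for i in range(L):
        g.AddNode(i)
    for i in range(L):
        for j in range(1, Z // 2 + 1):    # periodic boundary conditions
            g.AddEdge(i, (i + j) % L)
    for _ in range(int(p * L * Z / 2)):
        g.AddEdge(rng.randrange(L), rng.randrange(L))
    return g
```

For example, MakeSmallWorldNetwork(20, 4, 0.0) gives each node i exactly the four neighbors (i ± 1) mod 20 and (i ± 2) mod 20.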

Fig. 8.12 Small world network, with L = 20, Z = 4, and p = 0.2.46

Measuring the minimum distances between nodes. The most studied property of small world graphs is the distribution of shortest paths between nodes. Without the long edges, the shortest path between i and j will be given by hopping in steps of length Z/2 along the shorter of the two arcs around the circle: there will be no paths of length longer than L/Z (halfway around the circle), and the distribution ρ(ℓ) of path lengths ℓ will be constant for 0 < ℓ < L/Z. When we add shortcuts, we expect that the distribution will be shifted to shorter path lengths.
(b) Write three functions to find and analyze the path length distribution:
(1) FindPathLengthsFromNode(graph, node), which returns for each node2 in the graph the shortest distance from node to node2. An efficient algorithm is a breadth-first traversal of the graph, working outward from node in shells. There will be a currentShell of nodes whose distance will be set to ℓ unless they have already been visited, and a nextShell which will be considered after the current one is finished (looking sideways before forward, breadth-first):
- Initialize ℓ = 0, the distance from node to itself to zero, and currentShell = [node]
- While there are nodes in the new currentShell:
  - Start a new empty nextShell
  - For each neighbor of each node in the current shell, if the distance to neighbor has not been set, add the node to nextShell and set the distance to ℓ + 1
  - Add one to ℓ, and set the current shell to nextShell
- Return the distances
This will sweep outward from node, measuring the shortest distance to every other node in the network. (Hint: Check your code with a network with small N and small p, comparing a few paths to hand calculations from the graph image generated as in part (a).)
(2) FindPathLengthHistogram(graph), which computes the probability ρ(ℓ) that a shortest path will have length ℓ, by using FindPathLengthsFromNode repeatedly to find the mean over all pairs of nodes.
Check your function by testing that the histogram of path lengths at p = 0 is constant for 0 < ℓ < L/Z, as advertised. Generate graphs at L = 1000 and Z = 2 for p = 0.02 and p = 0.2: display the circle graphs and plot the histogram of path lengths. Zoom in on the histogram: how much does it change with p? What value of p would you need to get "six degrees of separation"?
(3) FindAveragePathLength(graph), which similarly computes the mean ⟨ℓ⟩ over all pairs of nodes. Compute ⟨ℓ⟩ for Z = 2, L = 100, and p = 0.1 a few times: your answer should be around ⟨ℓ⟩ = 10. Notice that there are substantial statistical fluctuations in the value from sample to sample. Roughly how many long bonds are there in this system? Would you expect fluctuations in the distances?
(c) Plot the average path length between nodes ⟨ℓ⟩(p) divided by ⟨ℓ⟩(p = 0) for Z = 2, L = 50, with p on a semi-log plot from p = 0.001 to p = 1. Compare with figure 2 of Watts and Strogatz [117]. You should find roughly the same curve, with the values of p shifted by a factor of 100. (They do L = 1000 and Z = 10.)
Large N and the emergence of a continuum limit. We can understand the shift in p of part (c) as a continuum limit of the problem. In the limit where the number of nodes N becomes large and the number of shortcuts pLZ/2 stays fixed, this network problem has a nice limit where distance is measured in radians Δθ around the circle. Dividing ℓ by ⟨ℓ⟩(p = 0) ≈ L/(2Z) essentially does this, since Δθ = πZℓ/L.
(d) Create and display a circle graph of your geometry from part (c) [Z = 2, L = 50] at p = 0.1; create and display circle graphs of the Watts and Strogatz geometry [Z = 10, L = 1000] at p = 0.1 and p = 0.001. Which of their systems looks statistically more similar to yours? Plot (perhaps using the scaling collapse routine provided) the rescaled average path length πZ⟨ℓ⟩/L versus the total number of shortcuts M = pLZ/2, for a range 0.001 < p < 1, for L = 100 and 200 and Z = 2 and 4.
In this limit, the average bond length ⟨Δθ⟩ should be a function only of M. Since reference [117] ran at a value of ZL a factor of 100 larger than ours, our values of p are a factor of 100 larger to get the same value of M = pLZ/2. Newman and Watts [76] derive this continuum limit with a renormalization-group analysis (chapter 13).
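The breadth-first sweep of part (b.1), sketched in Python with the graph reduced to a plain dictionary mapping each node to its neighbor list (a stand-in for the network class of part (a)):

```python
def FindPathLengthsFromNode(neighbors, node):
    """Shortest path length from node to every reachable node2.
    `neighbors` maps each node to a list of its neighbors."""
    distances = {node: 0}
    ell = 0
    currentShell = [node]
    while currentShell:
        nextShell = []
        for n in currentShell:
            for nbr in neighbors[n]:
                if nbr not in distances:    # not yet visited
                    distances[nbr] = ell + 1
                    nextShell.append(nbr)
        ell += 1
        currentShell = nextShell
    return distances

def FindAveragePathLength(neighbors):
    """Mean shortest-path length over all pairs of distinct nodes."""
    total, pairs = 0, 0
    for node in neighbors:
        for d in FindPathLengthsFromNode(neighbors, node).values():
            total += d
            pairs += 1
    # each node contributes a zero-length path to itself; drop those pairs
    return total / (pairs - len(neighbors))
```

On a plain ring (p = 0, Z = 2) with L = 10 this gives distances 1, 1, 2, 2, 3, 3, 4, 4, 5 from any node, so the average path length is 25/9.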

46 There are seven new shortcuts, where pLZ/2 = 8; one of the added edges overlapped an existing edge or connected a node to itself.


(e) Real Networks. From the book Web site [105], or through your own researches, find a real network47 and find the mean distance and histogram of distances between nodes.

Fig. 8.13 Small world network with L = 500, K = 2 and p = 0.1, with node and edge sizes scaled by the square root of their betweenness.

In the small-world network, a few long edges are crucial to efficient transfer through the system (transfer of information in a computer network, transfer of disease in a population model, . . . ). It is often useful to measure how crucial a given node or edge is to these shortest paths. We say a node or edge is between two other nodes if it is along a shortest path between them. We measure the betweenness of a node or edge as the total number of such shortest paths passing through, with (by convention) the initial and final nodes included in the between nodes; see figure 8.13. (If there are K multiple shortest paths of equal length between two nodes, each path adds 1/K to its intermediates.) The efficient algorithm to measure betweenness is a breadth-first traversal quite analogous to the shortest-path-length algorithm discussed above.
(f) Betweenness. (Advanced) Read references [74] and [35], discussing the algorithms for finding the betweenness. Implement them on the small world network, and perhaps the real-world network you analyzed in part (e). Visualize your answers by using the graphics software provided on the book Web site [105].

(8.11) Building a Percolation Network. (Complexity, Computation) (With Myers. [72])
Figure 8.14 shows what a large sheet of paper, held at the edges, would look like if small holes were successively punched out at random locations. Here the ensemble averages over the different choices of random locations for the holes; this figure shows the sheet just before it fell apart. Of course, certain choices of hole positions would cut the sheet in two far earlier (a straight line across the center) or somewhat later (checkerboard patterns), but for the vast majority of members of our ensemble the paper will have the same kinds of hole patterns seen here. Again, it is easier to analyze all the possible patterns of punches than to predict a particular pattern.
Percolation theory is the study of the qualitative change in connectivity of a large system as its components are randomly removed. Outside physics, it has become a prototype of criticality at continuous transitions, presumably because the problem is simple to state and the analysis does not demand a background in equilibrium statistical mechanics.48 In this exercise, we'll study bond percolation (figure 8.14) and site percolation (figure 8.15) in two dimensions.

Fig. 8.14 Bond Percolation network. Each bond on a 10 × 10 square lattice is present with probability p = 0.4. This is below the percolation threshold p = 0.5 for the infinite lattice, and indeed the network breaks up into individual clusters (each shaded separately). Note the periodic boundary conditions. Note there are many small clusters, and only a few large ones; here twelve clusters of size S = 1, three of size S = 2, and one cluster of size S = 29 (black). For a large lattice near the percolation threshold the probability distribution of cluster sizes ρ(S) forms a power law (exercise 13.9).

47 Noteworthy examples include movie-actor costars ("Six Degrees of Kevin Bacon") or baseball players who played on the same team.


On the Web site for this book [105], you'll find some hint files and graphic routines to facilitate working this problem, for a variety of languages and systems (currently Python under Unix and Windows).
Bond percolation on a square lattice.
(a) Define a 2D bond percolation network with periodic boundary conditions on the computer, for size L × L and bond probability p. For this problem, the nodes will be represented by pairs of integers (i, j). You'll need the method GetNeighbors(node), which returns the neighbors of an existing node. Use the bond-drawing software provided to draw your bond percolation network for various p and L, and use it to check that you've implemented the periodic boundary conditions correctly. (There are two basic approaches. You can start with an empty network and use AddNode and AddEdge in loops to generate the nodes, vertical bonds, and horizontal bonds (see exercise 8.10). Alternatively, and more traditionally, you can set up a 2D array of vertical and horizontal bonds, and implement GetNeighbors(node) by constructing the list of neighbors from the bond networks when the site is visited.)
The percolation threshold and duality. In most continuous phase transitions, one of the challenges is to find the location of the transition. We chose bond percolation on the square lattice because there is a simple argument that shows, in the limit of large systems, that the percolation threshold pc = 1/2. The argument makes use of the dual lattice.
The nodes of the dual lattice are the centers of the squares between nodes in the original lattice. The edges of the dual lattice are those which do not cross an edge of the original lattice. Since every potential dual edge crosses exactly one edge of the original lattice, the probability of having bonds on the dual lattice is 1 − p, where p is the probability of bonds for the original lattice. If we can show that the dual lattice percolates if and only if the original lattice does not, then pc = 1/2. This is easiest to see graphically:
(b) Generate and print a small lattice with p = 0.4, picking one where the largest cluster does not span across either the vertical or the horizontal direction (or print figure 8.14). Draw a path on the dual lattice spanning the system from top to bottom and from left to right. (You'll be emulating a rat running through a simple maze.) Is it clear for large systems that the dual lattice will percolate if and only if the original lattice does not?
Finding the clusters. (c) Write two functions that together find the clusters in the percolation network:
(1) FindClusterFromNode(graph, node, visited), which returns the cluster in graph containing node, and marks the sites in the cluster as having been visited. The cluster is of course the union of node, the neighbors, the neighbors of the neighbors, and so on. The trick is to use the set of visited sites to avoid going around in circles. The efficient algorithm is a breadth-first traversal of the graph, working outward from node in shells. There will be a currentShell of nodes whose neighbors have not yet been checked, and a nextShell which will be considered after the current one is finished (breadth-first):
- Initialize visited[node]=True, cluster=[node], and currentShell=graph.GetNeighbors(node).
- While there are nodes in the new currentShell:
  - Start a new empty nextShell
  - For each node in the current shell, if the node has not been visited, add the node to the cluster, mark the node as visited, and add the neighbors of the node to the nextShell
  - Set the current shell to nextShell
- Return the cluster
(2) FindAllClusters(graph), which sets up the visited set to be False for all nodes, and calls FindClusterFromNode(graph, node, visited) on all nodes that haven't been visited, collecting the resulting clusters. Optionally, you may want to order the clusters from largest to smallest, for convenience in graphics (and in finding the largest cluster).
Check your code by running it for small L and using the graphics software provided. Are the clusters, drawn in different colors, correct?
Site percolation on a triangular lattice. Universality states that the statistical behavior of the percolation clusters at long length scales should be independent of the microscopic detail. That is, removing bonds from a square lattice should leave the same fractal patterns of holes, near pc, as punching out circular holes in a sheet just before it falls apart. Nothing about your algorithms from part (c) depended on there being four neighbors of a node, or on there even being nodes at all sites. Let's implement site percolation on a triangular lattice (figure 8.15): nodes are occupied with probability p, with each node connected to any of its six neighbor sites that are also filled (punching out hexagons from a sheet of paper). The triangular site lattice also has a duality transformation, so again pc = 0.5.
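Parts (a) and (c) in miniature: a square-lattice bond network stored as a plain dictionary of neighbor lists, plus the breadth-first cluster sweep. The dictionary stands in for the network class of exercise 8.10; the function names follow the text:

```python
import random

def MakeBondPercolationNetwork(L, p, seed=None):
    """Square L x L lattice with periodic boundary conditions; each
    vertical and horizontal bond is present with probability p."""
    rng = random.Random(seed)
    neighbors = {(i, j): [] for i in range(L) for j in range(L)}
    for i in range(L):
        for j in range(L):
            # one horizontal and one vertical bond per site
            for nbr in (((i + 1) % L, j), (i, (j + 1) % L)):
                if rng.random() < p:
                    neighbors[(i, j)].append(nbr)
                    neighbors[nbr].append((i, j))
    return neighbors

def FindClusterFromNode(neighbors, node, visited):
    """Breadth-first sweep returning the cluster containing node."""
    visited[node] = True
    cluster = [node]
    currentShell = neighbors[node]
    while currentShell:
        nextShell = []
        for n in currentShell:
            if not visited[n]:
                cluster.append(n)
                visited[n] = True
                nextShell.extend(neighbors[n])
        currentShell = nextShell
    return cluster

def FindAllClusters(neighbors):
    """All clusters, ordered from largest to smallest."""
    visited = {node: False for node in neighbors}
    clusters = []
    for node in neighbors:
        if not visited[node]:
            clusters.append(FindClusterFromNode(neighbors, node, visited))
    return sorted(clusters, key=len, reverse=True)
```

At p = 0 every site is its own cluster; at p = 1 the whole lattice is one cluster; in between the cluster sizes should vary sample to sample.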

Fig. 8.15 Site Percolation network. Each site on a 10 × 10 triangular lattice is present with probability p = 0.5, the percolation threshold for the infinite lattice. Note the periodic boundary conditions at the sides, and the shifted periodic boundaries at the top and bottom.

It is computationally convenient to label the site at (x, y) on a triangular lattice by [i, j], where x = i + j/2 and y = (√3/2) j. If we again use periodic boundary conditions with 0 ≤ i < L and 0 ≤ j < L, we cover a region in the shape of a 60° rhombus.49 Each site [i, j] has six neighbors, at [i, j] + e with e = [1, 0], [0, 1], [−1, 1] upward and to the right, and minus the same three downward and to the left.
(d) Generate a site percolation network on a triangular lattice. You can treat the sites one at a time, using AddNode with probability p, and check HasNode(neighbor) to bond to all existing neighbors. Alternatively, you can start by generating a whole matrix of random numbers in one sweep to determine which sites are occupied by nodes, add those nodes, and then fill in the bonds. Check your resulting network by running it for small L and using the graphics software provided. (Notice the shifted periodic boundary conditions at the top and bottom; see figure 8.15.) Use your routine from part (c) to generate the clusters, and check these (particularly at the periodic boundaries) using the graphics software.
(e) Generate a small square-lattice bond percolation cluster, perhaps 30 × 30, and compare with a small triangular-lattice site percolation cluster. They should look rather different in many ways. Now generate a large50 cluster of each, perhaps 1000 × 1000 (or see figure 13.9). Stepping back and blurring your eyes, do the two look substantially similar?
Chapter 13 and exercise 13.9 will discuss percolation theory in more detail.

(8.12) Hysteresis Model: Computational Methods.

Fig. 8.16 Barkhausen noise experiment.

Fig. 8.17 Hysteresis loop with subloops (applied magnetic field H/J versus magnetization M).

Fig. 8.18 Tiny jumps: Barkhausen noise.

49 The graphics software uses the periodic boundary conditions to shift this rhombus back into a rectangle.
50 Your code, if written properly, should run in a time of order N, the number of nodes. If it seems to slow down by more than a factor of 4 when you increase the length of the side by a factor of two, check for inefficiencies.