
Entropy, Order Parameters, and Complexity

James P. Sethna, Physics, Cornell University, Ithaca, NY

© January 4, 2005

[Cover figure: phase diagram of a two-phase alcohol mixture; graphic not recoverable from the scan.]

http://www.physics.cornell.edu/sethna/StatMech/book.pdf

Contents

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1 Quantum Dice. . . . . . . . . . . . . . . . . . . . . 7

1.2 Probability Distributions. . . . . . . . . . . . . . . 8

1.3 Waiting times. . . . . . . . . . . . . . . . . . . . . 8

1.4 Stirling's Approximation and Asymptotic Series. . 9

1.5 Random Matrix Theory. . . . . . . . . . . . . . . . 10

2.1 Random Walk Examples: Universality and Scale Invariance 13

2.2 The Diffusion Equation . . . . . . . . . . . . . . . . . . . 17

2.3 Currents and External Forces. . . . . . . . . . . . . . . . . 19

2.4 Solving the Diffusion Equation . . . . . . . . . . . . . . . 21

2.4.1 Fourier . . . . . . . . . . . . . . . . . . . . . . . . 21

2.4.2 Green . . . . . . . . . . . . . . . . . . . . . . . . . 22

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.1 Random walks in Grade Space. . . . . . . . . . . . 24

2.2 Photon diffusion in the Sun. . . . . . . . . . . . . . 24

2.3 Ratchet and Molecular Motors. . . . . . . . . . . . 24

2.4 Solving Diffusion: Fourier and Green. . . . . . . . 26

2.5 Solving the Diffusion Equation. . . . . . . . . . . . 26

2.6 Frying Pan . . . . . . . . . . . . . . . . . . . . . . 26

2.7 Thermal Diffusion. . . . . . . . . . . . . . . . . . . 27

2.8 Polymers and Random Walks. . . . . . . . . . . . 27

3.1 The Microcanonical Ensemble . . . . . . . . . . . . . . . . 29

3.2 The Microcanonical Ideal Gas . . . . . . . . . . . . . . . . 31

3.2.1 Configuration Space . . . . . . . . . . . . . . . . . 32

3.2.2 Momentum Space . . . . . . . . . . . . . . . . . . 33

3.3 What is Temperature? . . . . . . . . . . . . . . . . . . . . 37

3.4 Pressure and Chemical Potential . . . . . . . . . . . . . . 40

3.5 Entropy, the Ideal Gas, and Phase Space Refinements . . 44

3.6 What is Thermodynamics? . . . . . . . . . . . . . . . . . 46

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.1 Escape Velocity. . . . . . . . . . . . . . . . . . . . 48


3.3 Connecting Two Macroscopic Systems. . . . . . . . 49

3.4 Gauss and Poisson. . . . . . . . . . . . . . . . . . . 50

3.5 Microcanonical Thermodynamics . . . . . . . . . . 50

4.1 Liouville's Theorem . . . . . . . . . . . . . . . . . . . . . 53

4.2 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.1 The Damped Pendulum vs. Liouville's Theorem. . 60

4.2 Jupiter! and the KAM Theorem . . . . . . . . . . 60

4.3 Invariant Measures. . . . . . . . . . . . . . . . . . 62

5.1 Free Energy . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.2 The Canonical Ensemble . . . . . . . . . . . . . . . . . . . 67

5.3 Non-Interacting Canonical Distributions . . . . . . . . . . 70

5.4 Grand Canonical Ensemble . . . . . . . . . . . . . . . . . 72

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.1 Two-state system. . . . . . . . . . . . . . . . . . . 74

5.2 Barrier Crossing. . . . . . . . . . . . . . . . . . . . 75

5.3 Statistical Mechanics and Statistics. . . . . . . . . 76

5.4 Euler, Gibbs-Duhem, and Clausius-Clapeyron. . . 77

5.5 Negative Temperature. . . . . . . . . . . . . . . . . 78

5.6 Laplace. . . . . . . . . . . . . . . . . . . . . . . . . 78

5.7 Legendre. . . . . . . . . . . . . . . . . . . . . . . . 79

5.8 Molecular Motors: Which Free Energy? . . . . . . 79

5.9 Michaelis-Menten and Hill . . . . . . . . . . . . . . 79

6 Entropy 83

6.1 Entropy as Irreversibility: Engines and Heat Death . . . . 83

6.2 Entropy as Disorder . . . . . . . . . . . . . . . . . . . . . 87

6.2.1 Mixing: Maxwell's Demon and Osmotic Pressure . 87

6.2.2 Residual Entropy of Glasses: The Roads Not Taken 89

6.3 Entropy as Ignorance: Information and Memory . . . . . 92

6.3.1 Nonequilibrium Entropy . . . . . . . . . . . . . . . 92

6.3.2 Information Entropy . . . . . . . . . . . . . . . . . 94

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

6.1 Life and the Heat Death of the Universe. . . . . . 97

6.2 P-V Diagram. . . . . . . . . . . . . . . . . . . . . . 98

6.3 Carnot Refrigerator. . . . . . . . . . . . . . . . . . 98

6.4 Lagrange. . . . . . . . . . . . . . . . . . . . . . . . 99

6.5 Does Entropy Increase? . . . . . . . . . . . . . . . 99

6.6 Entropy Increases: Diffusion. . . . . . . . . . . . . 100

6.7 Information entropy. . . . . . . . . . . . . . . . . . 101

6.8 Shannon entropy. . . . . . . . . . . . . . . . . . . . 101

6.9 Entropy of Glasses. . . . . . . . . . . . . . . . . . . 102

6.10 Rubber Band. . . . . . . . . . . . . . . . . . . . . . 103

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/


6.12 Chaos, Lyapunov, and Entropy Increase. . . . . . . 104

6.13 Black Hole Thermodynamics. . . . . . . . . . . . . 105

6.14 Fractal Dimensions. . . . . . . . . . . . . . . . . . 105

7.1 Quantum Ensembles and Density Matrices . . . . . . . . . 109

7.2 Quantum Harmonic Oscillator . . . . . . . . . . . . . . . . 114

7.3 Bose and Fermi Statistics . . . . . . . . . . . . . . . . . . 115

7.4 Non-Interacting Bosons and Fermions . . . . . . . . . . . 116

7.5 Maxwell-Boltzmann Quantum Statistics . . . . . . . . . 119

7.6 Black Body Radiation and Bose Condensation . . . . . . 121

7.6.1 Free Particles in a Periodic Box . . . . . . . . . . . 121

7.6.2 Black Body Radiation . . . . . . . . . . . . . . . . 122

7.6.3 Bose Condensation . . . . . . . . . . . . . . . . . . 123

7.7 Metals and the Fermi Gas . . . . . . . . . . . . . . . . . . 125

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

7.1 Phase Space Units and the Zero of Entropy. . . . . 126

7.2 Does Entropy Increase in Quantum Systems? . . . 127

7.3 Phonons on a String. . . . . . . . . . . . . . . . . . 128

7.4 Crystal Defects. . . . . . . . . . . . . . . . . . . . 128

7.5 Density Matrices. . . . . . . . . . . . . . . . . . . . 128

7.6 Ensembles and Statistics: 3 Particles, 2 Levels. . . 128

7.7 Bosons are Gregarious: Superfluids and Lasers . . 129

7.8 Einstein's A and B . . . . . . . . . . . . . . . . . . 130

7.9 Phonons and Photons are Bosons. . . . . . . . . . 131

7.10 Bose Condensation in a Band. . . . . . . . . . . . 132

7.11 Bose Condensation in a Parabolic Potential. . . . . 132

7.12 Light Emission and Absorption. . . . . . . . . . . . 133

7.13 Fermions in Semiconductors. . . . . . . . . . . . . 134

7.14 White Dwarves, Neutron Stars, and Black Holes. . 135

8.1 The Ising Model . . . . . . . . . . . . . . . . . . . . . . . 137

8.1.1 Magnetism . . . . . . . . . . . . . . . . . . . . . . 137

8.1.2 Binary Alloys . . . . . . . . . . . . . . . . . . . . . 138

8.1.3 Lattice Gas and the Critical Point . . . . . . . . . 139

8.1.4 How to Solve the Ising Model. . . . . . . . . . . . 140

8.2 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . 141

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

8.1 The Ising Model. . . . . . . . . . . . . . . . . . . . 145

8.2 Coin Flips and Markov Chains. . . . . . . . . . . . 146

8.3 Red and Green Bacteria . . . . . . . . . . . . . . . 146

8.4 Detailed Balance. . . . . . . . . . . . . . . . . . . . 147

8.5 Heat Bath, Metropolis, and Wolff. . . . . . . . . . 147

8.6 Stochastic Cells. . . . . . . . . . . . . . . . . . . . 148

8.7 The Repressilator. . . . . . . . . . . . . . . . . . . 150

8.8 Entropy Increases! Markov chains. . . . . . . . . . 152

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

8.10 Small World Networks. . . . . . . . . . . . . . . . 156

8.11 Building a Percolation Network. . . . . . . . . . . 158

8.12 Hysteresis Model: Computational Methods. . . . . 160

9.1 Identify the Broken Symmetry . . . . . . . . . . . . . . . 164

9.2 Define the Order Parameter . . . . . . . . . . . . . . . . . 164

9.3 Examine the Elementary Excitations . . . . . . . . . . . . 167

9.4 Classify the Topological Defects . . . . . . . . . . . . . . . 170

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

9.1 Topological Defects in the XY Model. . . . . . . . 175

9.2 Topological Defects in Nematic Liquid Crystals. . 177

9.3 Defect Energetics and Total Divergence Terms. . . 177

9.4 Superfluid Order and Vortices. . . . . . . . . . . . 177

10.1 Random Walks from Symmetry . . . . . . . . . . . . . . . 180

10.2 What is a Phase? Perturbation theory. . . . . . . . . . . . 183

10.3 Free Energy Density for the Ideal Gas . . . . . . . . . . . 185

10.4 Landau Theory for Free Energy Densities . . . . . . . . . 188

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

10.1 Deriving New Laws. . . . . . . . . . . . . . . . . . 193

10.2 Symmetries of the Wave Equation. . . . . . . . . . 193

10.3 Bloch walls in Magnets. . . . . . . . . . . . . . . . 193

10.4 Pollen and Hard Squares. . . . . . . . . . . . . . . 194

10.5 Superfluids: Density Matrices and ODLRO. . . . . 194

11.1 Correlation Functions: Motivation . . . . . . . . . . . . . 199

11.2 Experimental Probes of Correlations . . . . . . . . . . . . 201

11.3 Equal-Time Correlations in the Ideal Gas . . . . . . . . . 202

11.4 Onsager's Regression Hypothesis and Time Correlations . 204

11.5 Susceptibility and the Fluctuation-Dissipation Theorem . 206

11.5.1 Dissipation and the imaginary part χ″(ω) . . . . . 207

11.5.2 Calculating the static susceptibility χ0(k) . . . . . 209

11.5.3 Calculating the dynamic susceptibility χ(r, t) . . . 211

11.6 Causality and Kramers-Kronig . . . . . . . . . . . . . . . 214

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

11.1 Telegraph Noise and RNA Unfolding. . . . . . . . 215

11.2 Telegraph Noise in Nanojunctions. . . . . . . . . . 216

11.3 Coarse-Grained Magnetic Dynamics. . . . . . . . . 217

11.4 Fluctuations, Correlations, and Response: Ising . . 218

11.5 Spin Correlation Functions and Susceptibilities. . . 219

12.1 Maxwell Construction. . . . . . . . . . . . . . . . . . . . . 223

12.2 Nucleation: Critical Droplet Theory. . . . . . . . . . . . . 224

12.3 Morphology of abrupt transitions. . . . . . . . . . . . . . 226


12.3.2 Martensites. . . . . . . . . . . . . . . . . . . . . . . 229

12.3.3 Dendritic Growth. . . . . . . . . . . . . . . . . . . 230

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

12.1 van der Waals Water. . . . . . . . . . . . . . . . . 230

12.2 Nucleation in the Ising Model. . . . . . . . . . . . 231

12.3 Coarsening and Criticality in the Ising Model. . . . 232

12.4 Nucleation of Dislocation Pairs. . . . . . . . . . . . 233

12.5 Origami Microstructure. . . . . . . . . . . . . . . . 234

12.6 Minimizing Sequences and Microstructure. . . . . . 236

13.1 Universality. . . . . . . . . . . . . . . . . . . . . . . . . . . 241

13.2 Scale Invariance . . . . . . . . . . . . . . . . . . . . . . . . 248

13.3 Examples of Critical Points. . . . . . . . . . . . . . . . . . 255

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

13.1 Scaling: Critical Points and Coarsening. . . . . . . 258

13.2 RG Trajectories and Scaling. . . . . . . . . . . . . 259

13.3 Bifurcation Theory and Phase Transitions. . . . . 259

13.4 Onset of Lasing as a Critical Point. . . . . . . . . . 261

13.5 Superconductivity and the Renormalization Group. 262

13.6 RG and the Central Limit Theorem: Short. . . . . 264

13.7 RG and the Central Limit Theorem: Long. . . . . 264

13.8 Period Doubling. . . . . . . . . . . . . . . . . . . . 266

13.9 Percolation and Universality. . . . . . . . . . . . . 269

13.10 Hysteresis Model: Scaling and Exponent Equalities. 271

A.1 Fourier Conventions . . . . . . . . . . . . . . . . . . . . . 275

A.2 Derivatives, Correlations, and Convolutions . . . . . . . . 277

A.3 Fourier and Translational Symmetry . . . . . . . . . . . . 278

A.4 Fourier Methods and Function Space . . . . . . . . . . . . 279

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

A.1 Relations between the Fouriers. . . . . . . . . . . . 279


1 Why Study Statistical Mechanics?

Many systems in nature are far too complex to analyze directly. Solving for the motion of all the atoms in a block of ice, or the boulders in an earthquake fault, or the nodes on the Internet, is simply infeasible. Despite this, such systems often show simple, striking behavior. We use statistical mechanics to explain the simple behavior of complex systems.

Statistical mechanics brings together concepts and methods that infiltrate many fields of science, engineering, and mathematics. Ensembles, entropy, phases, Monte Carlo, emergent laws, and criticality are all concepts and methods rooted in the physics and chemistry of gasses and liquids, but they have become important in mathematics, biology, and computer science. In turn, these broader applications bring perspective and insight to our fields.

Let's start by briefly introducing these pervasive concepts and methods.

Ensembles: The trick of statistical mechanics is not to study a single system, but a large collection or ensemble of systems. Where understanding a single system is often impossible, calculating the behavior of an enormous collection of similarly prepared systems often allows one to answer most questions that science can be expected to address.

For example, consider the random walk (figure 1.1, chapter 2). (You might imagine it as the trajectory of a particle in a gas, or the configuration of a polymer in solution.) While the motion of any given walk is irregular (left) and hard to predict, simple laws describe the distribution of motions of an infinite ensemble of random walks starting from the same initial point (right). Introducing and deriving these ensembles are the themes of chapters 3, 4, and 5.
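This ensemble strategy is easy to try numerically. The sketch below is a toy illustration (the walk length and ensemble size are arbitrary choices, not from the text): any one ±1 random walk's endpoint is unpredictable, but the ensemble averages obey simple laws, ⟨x⟩ ≈ 0 and ⟨x²⟩ ≈ N.

```python
import random

# Ensemble of unbiased random walks: illustrative parameters only.
random.seed(0)
n_walks, n_steps = 10_000, 1_000

endpoints = []
for _ in range(n_walks):
    # One walk: sum of n_steps independent +1/-1 steps.
    x = sum(random.choice((-1, 1)) for _ in range(n_steps))
    endpoints.append(x)

mean = sum(endpoints) / n_walks
mean_sq = sum(x * x for x in endpoints) / n_walks

# For an unbiased walk, <x> ~ 0 and <x^2> ~ n_steps (diffusive scaling).
print(f"<x>   = {mean:+.2f}  (expect near 0)")
print(f"<x^2> = {mean_sq:.1f}  (expect near {n_steps})")
```

The single-walk trajectories remain irregular; only the ensemble moments are simple.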

Entropy: Entropy is the most influential concept arising from statistical mechanics (chapter 6). Entropy, originally understood as a thermodynamic property of heat engines that could only increase, has become science's fundamental measure of disorder and information. Although it controls the behavior of particular systems, entropy can only be defined within a statistical ensemble: it is the child of statistical mechanics, with no correspondence in the underlying microscopic dynamics. Entropy now underlies our understanding of everything from compression algorithms for pictures on the Web to the heat death expected at the end of the universe.
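The information-theoretic face of entropy fits in one line. The sketch below computes the Shannon entropy S = −Σ p log₂ p in bits (the convention used in compression); the function name and example distributions are illustrative choices, not from the text.

```python
import math

def shannon_entropy(probs):
    """S = -sum(p * log2(p)), in bits; the measure behind data compression."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4                 # maximal ignorance over 4 outcomes
peaked = [0.97, 0.01, 0.01, 0.01]    # one outcome nearly certain

print(shannon_entropy(uniform))   # 2.0 bits: two yes/no questions needed
print(shannon_entropy(peaked))    # far less than 2 bits
```

The uniform distribution maximizes the entropy; concentrating the probability reduces our ignorance and hence S.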

Phases. Statistical mechanics explains the existence and properties of phases. The three common phases of matter (solids, liquids, and gasses) have multiplied into hundreds: from superfluids and liquid crystals, to vacuum states of the universe just after the Big Bang, to the pinned and sliding phases of earthquake faults. Phases have an integrity or stability to small changes in external conditions or composition,¹ and often have a rigidity or stiffness. Understanding what phases are and how to describe their properties, excitations, and topological defects will be the themes of chapters 7,² 9, and 10.

[Fig. 1.1 caption, beginning truncated: "... liquid, or photons in the Sun, is described by an irregular trajectory whose velocity rapidly changes in direction at random. Describing the specific trajectory of any given random walk (left) is not feasible or even interesting. Describing the statistical average properties of a large number of random walks is straightforward; at right is shown the endpoints of random walks all starting at the center. The deep principle underlying statistical mechanics is that it is often easier to understand the behavior of ensembles of systems."]

² Chapter 7 focuses on quantum statistical mechanics: quantum statistics, metals, insulators, superfluids, Bose condensation, . . . To keep the presentation accessible to a broad audience, the rest of the text is not dependent upon knowing quantum mechanics.

Computational Methods: Monte-Carlo methods use simple rules to allow the computer to find ensemble averages in systems far too complicated to allow analytical evaluation. These tools, invented and sharpened in statistical mechanics, are used everywhere in science and technology: from simulating the innards of particle accelerators, to studies of traffic flow, to designing computer circuits. In chapter 8, we introduce the Markov-chain mathematics that underlies Monte-Carlo.
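A minimal illustration of the Monte-Carlo idea (a hypothetical two-state example, in the spirit of exercise 5.1, not taken from the text): a Metropolis Markov chain whose simple accept/reject rule reproduces the equilibrium Boltzmann weights without any analytical calculation.

```python
import math
import random

# Metropolis sampling of a single two-state system with energies
# E = 0 and E = eps, coupled to a bath at temperature T (kT = 1 here).
# The chain's visit frequencies should reproduce the Boltzmann
# ratio exp(-eps / kT): "ensemble averages from simple rules".
random.seed(1)
eps, kT = 1.0, 1.0
state = 0
visits = [0, 0]            # state 0 has E = 0; state 1 has E = eps

for _ in range(200_000):
    trial = 1 - state      # propose hopping to the other state
    dE = eps if trial == 1 else -eps
    # Metropolis rule: always accept downhill, uphill with prob e^(-dE/kT).
    if dE <= 0 or random.random() < math.exp(-dE / kT):
        state = trial
    visits[state] += 1

ratio = visits[1] / visits[0]
print(f"sampled ratio {ratio:.3f} vs Boltzmann {math.exp(-eps / kT):.3f}")
```

The same accept/reject rule, applied spin-by-spin, is what simulates the Ising model of chapter 8.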

Emergent Laws. Statistical mechanics allows us to derive the new laws that emerge from the complex microscopic behavior. These laws become exact only in certain limits. Thermodynamics, the study of heat, becomes exact in the limit of large numbers

¹ Water remains a liquid, with only perturbative changes in its properties, as one changes the temperature or adds alcohol. Indeed, it is likely that all liquids are connected to one another, and indeed to the gas phase, through paths in the space of composition and external conditions.


[Fig. 1.2 caption, beginning truncated: "... model at the critical temperature. Traditional statistical mechanics focuses on understanding phases of matter, and transitions between phases. These phases (solids, liquids, magnets, superfluids) are emergent properties of many interacting molecules, spins, or other degrees of freedom. Pictured here is a simple two-dimensional model at its magnetic transition temperature Tc. At higher temperatures, the system is non-magnetic: the magnetization is on average zero. At the temperature shown, the system is just deciding whether to magnetize upward (white) or downward (black). While predicting the time dependence of all these degrees of freedom is not practical or possible, calculating the average behavior of many such systems (a statistical ensemble) is the job of statistical mechanics."]

of particles. Scaling behavior and power laws, both at phase transitions and more broadly in complex systems, emerge for large systems tuned (or self-organized) near critical points. The right half of figure 1.1 illustrates the simple law (the diffusion equation) that describes the evolution of the end-to-end lengths of random walks in the limit where the number of steps becomes large. Developing the machinery to express and derive these new laws are the themes of chapters 10 (phases) and 13 (critical points). Chapter 11 systematically studies the fluctuations about these emergent theories, and how they relate to the response to external forces.
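The emergence of the diffusion-equation law can be checked directly. The sketch below (illustrative, with an arbitrary walk length) compares the exact binomial endpoint distribution of an N-step ±1 walk against the Gaussian solution of the diffusion equation, 2 exp(−x²/2N)/√(2πN) (the factor of 2 accounts for the parity of the lattice walk).

```python
import math

# Emergent law check: exact binomial endpoint probabilities of an
# N-step unit-step walk vs. the diffusion-equation Gaussian.
N = 1000  # illustrative walk length

def p_exact(x):
    """Exact probability of ending at x after N steps (x has N's parity)."""
    return math.comb(N, (N + x) // 2) / 2**N

def p_diffusion(x):
    """Gaussian solution of the diffusion equation, times 2 for parity."""
    return 2 * math.exp(-x**2 / (2 * N)) / math.sqrt(2 * math.pi * N)

for x in (0, 20, 60):
    print(x, p_exact(x), p_diffusion(x))
```

Already at N = 1000 the two agree to a fraction of a percent: the microscopic details of the steps have been forgotten, leaving only the emergent law.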

Phase Transitions. Beautiful spatial patterns arise in statistical mechanics at the transitions between phases. Most of these are abrupt phase transitions: ice is crystalline and solid until abruptly (at the edge of the ice cube) it becomes unambiguously liquid. We study nucleation and the exotic structures that evolve at abrupt phase transitions in chapter 12.

Other phase transitions are continuous. Figure 1.2 shows a snapshot of the Ising model at its phase transition temperature Tc. The Ising model is a lattice of sites that can take one of two states. It is used as a simple model for magnets (spins pointing up or down), two-component crystalline alloys (A or B atoms), or transitions between liquids and gasses (occupied and unoccupied sites).³ All of these systems, at their critical points, share the self-similar, fractal structures seen in the figure: the system can't decide whether to stay gray or to separate into black

³ The Ising model has more far-flung applications: the three-dimensional Ising

[Fig. 1.3 caption: "Chaos. The ideas and methods of statistical mechanics have close ties to many other fields. Many nonlinear differential equations and mappings, for example, have qualitative changes of behavior (bifurcations) as parameters are tuned, and can exhibit chaotic behavior. Here we see the long-time equilibrium dynamics x*(μ) of a simple mapping of the unit interval into itself as a parameter μ is tuned. Just as an Ising magnet goes from one unmagnetized state above Tc to two magnetized states below Tc, so this system goes from a periodic state below μ₁ to a period-two cycle above μ₁. Above μ∞, the behavior is chaotic. The study of chaos has provided us with our fundamental explanation for the increase of entropy in statistical mechanics. Conversely, tools developed in statistical mechanics have been central to the understanding of the onset of chaos."]

object emerges from random walks (left half of figure 1.1) even without tuning to a critical point: a blowup of a small segment of the walk looks statistically similar to the original path. Chapter 13 develops the scaling and renormalization-group techniques that we use to understand these self-similar, fractal properties.

Applications. Science grows through accretion, but becomes potent through distillation. Each generation expands the knowledge base, extending the explanatory power of science to new domains. In these explorations, new unifying principles, perspectives, and insights lead us to deeper, simpler understanding of our fields.

The period-doubling route to chaos (figure 1.3) is an excellent example of how statistical mechanics has grown tentacles into disparate fields, and has been enriched thereby. On the one hand, renormalization-group methods drawn directly from statistical mechanics (chapter 13) were used to explain the striking scaling behavior seen at the onset of chaos (the geometrical branching pattern at the left of the figure). These methods also predicted that this behavior should be universal: this same period-doubling cascade, with quantitatively the same scaling behavior, would be seen in vastly more complex systems. This was later verified everywhere from fluid mechanics to models of human walking. Conversely, the study of chaotic dynamics has provided our best fundamental understanding of the cause for the increase of entropy in statistical mechanics (chapter 6).
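The first step of the period-doubling cascade fits in a few lines. The sketch below iterates the conventional logistic-type map f(x) = 4μx(1−x) (a standard choice; the map behind this text's figure 1.3 may differ): below the first bifurcation the long-time dynamics settle onto a fixed point, just above it onto a two-cycle.

```python
# Period doubling in the logistic-type map f(x) = 4*mu*x*(1 - x).
# Parameter values are the standard illustrative ones, not from the text:
# the first bifurcation of this map sits at mu = 0.75.
def attractor(mu, x=0.5, transient=2000, keep=16):
    """Return the distinct long-time values visited by the map."""
    for _ in range(transient):          # discard transient behavior
        x = 4 * mu * x * (1 - x)
    points = set()
    for _ in range(keep):               # record the settled cycle
        x = 4 * mu * x * (1 - x)
        points.add(round(x, 6))
    return sorted(points)

print(len(attractor(0.7)))   # 1: a single stable fixed point
print(len(attractor(0.8)))   # 2: a period-two cycle past the bifurcation
```

Repeating this scan of μ toward μ∞ generates the branching diagram whose scaling the renormalization group explains.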

We provide here the distilled version of statistical mechanics, invigorated and clarified by the accretion of the last four decades of research.



The text in each chapter will address those topics of fundamental importance to all who study our field; the exercises will provide in-depth introductions to the accretion of applications in mesoscopic physics, astrophysics, dynamical systems, information theory, low-temperature physics, statistics, biology, lasers, and complexity theory. The goal is to broaden the presentation to make it useful and comprehensible to sophisticated biologists, mathematicians, computer scientists, or complex-systems sociologists, thereby enriching the subject for the physics and chemistry students, many of whom will likely make excursions in later life into these disparate fields.

Exercises

Exercises 1.1-1.3 provide a brief review of probability distributions. Quantum Dice explores discrete distributions and also acts as a gentle preview into Bose and Fermi statistics. Probability Distributions introduces the form and moments for the key distributions for continuous variables and then introduces convolutions and multidimensional distributions. Waiting Times shows the paradoxes one can concoct by confusing different ensemble averages. Stirling part (a) derives the useful approximation n! ≈ √(2πn) (n/e)^n; more advanced students can continue in the later parts to explore asymptotic series, which arise in typical perturbative statistical mechanics calculations. Random Matrix Theory briefly introduces a huge field, with applications in nuclear physics, mesoscopic physics, and number theory; part (a) provides a good exercise in histograms and ensembles, and the remaining more advanced parts illustrate level repulsion, the Wigner surmise, universality, and emergent symmetry.

(1.1) Quantum Dice. (Quantum) (With Buchan. [15])

You are given several unusual three-sided dice which, when rolled, show either one, two, or three spots. There are three games played with these dice: Distinguishable, Bosons, and Fermions. In each turn in these games, the player rolls one die at a time, starting over if required by the rules, until a legal combination occurs. In Distinguishable, all rolls are legal. In Bosons, a roll is legal only if the new number is larger than or equal to the preceding number. In Fermions, a roll is legal only if the new number is strictly larger than the preceding number. See figure 1.4 for a table of possibilities after rolling two dice.

[Fig. 1.4 Rolling two dice. A 3×3 table of the sums for each (Roll #1, Roll #2) pair; grid graphic omitted. In Bosons, one accepts only the rolls in the shaded squares, with equal probability 1/6. In Fermions, one accepts only the rolls in the darkly shaded squares (not including the diagonal), with probability 1/3.]

(a) Presume the dice are fair: each of the three numbers of dots shows up 1/3 of the time. For a legal turn rolling a die twice in Bosons, what is the probability ρ(4) of rolling a 4? Similarly, among the legal Fermion turns rolling two dice, what is the probability ρ(4)?

Our dice rules are the same ones that govern the quantum statistics of identical particles.

(b) For a legal turn rolling three three-sided dice in Fermions, what is the probability ρ(6) of rolling a 6? (Hint: there's a Fermi exclusion principle: when playing Fermions, no two dice can have the same number of dots showing.) Electrons are fermions; no two electrons can be in exactly the same state.

When rolling two dice in Bosons, there are six different


legal turns (11), (12), (13), . . . , (33): half of them are doubles (both numbers equal), while for plain old Distinguishable turns only one third would be doubles⁴: the probability of getting doubles is enhanced by 1.5 times in two-roll Bosons. When rolling three dice in Bosons, there are ten different legal turns (111), (112), (113), . . . , (333). When rolling M dice each with N sides in Bosons, one can show that there are C(N+M−1, M) = (N+M−1)!/(M!(N−1)!) legal turns.

(c) In a turn of three rolls, what is the enhancement of probability of getting triples in Bosons over that in Distinguishable? In a turn of M rolls, what is the enhancement of probability for generating an M-tuple (all rolls having the same number of dots showing)?

Notice that the states of the dice tend to cluster together in Bosons. Examples of real bosons clustering into the same state include Bose condensation (section 7.6.3) and lasers (exercise 7.7).

(1.2) Probability Distributions. (Basic)

Most people are more familiar with probabilities for discrete events (like coin flips and card games) than with probability distributions for continuous variables (like human heights and atomic velocities). The three continuous probability distributions most commonly encountered in physics are: (i) Uniform: ρ_uniform(x) = 1 for 0 ≤ x < 1, ρ(x) = 0 otherwise; produced by random number generators on computers; (ii) Exponential: ρ_exponential(t) = e^(−t/τ)/τ for t ≥ 0, familiar from radioactive decay and used in the collision theory of gasses; and (iii) Gaussian: ρ_gaussian(v) = e^(−v²/2σ²)/(√(2π) σ), describing the probability distribution of velocities in a gas, the distribution of positions at long times in random walks, the sums of random variables, and the solution to the diffusion equation.

(a) Likelihoods. What is the probability that a random number uniform on [0, 1) will happen to lie between x = 0.7 and x = 0.75? That the waiting time for a radioactive decay of a nucleus will be more than twice the exponential decay time τ? That your score on an exam with Gaussian distribution of scores will be greater than 2σ above the mean? (Note: ∫₂^∞ (1/√(2π)) exp(−v²/2) dv = (1 − erf(√2))/2 ≈ 0.023.)

(b) Normalization, Mean, and Standard Deviation. Show that these probability distributions are normalized: ∫ ρ(x) dx = 1. What is the mean x₀ of each distribution? The standard deviation √(∫ (x − x₀)² ρ(x) dx)? (You may use the formulas ∫ (1/√(2π)) exp(−x²/2) dx = 1 and ∫ x² (1/√(2π)) exp(−x²/2) dx = 1.)

(c) Sums of variables. Draw a graph of the probability distribution of the sum x + y of two random variables drawn from a uniform distribution on [0, 1). Argue in general that the sum z = x + y of random variables with distributions ρ1(x) and ρ2(y) will have a distribution given by the convolution ρ(z) = ∫ ρ1(x) ρ2(z − x) dx.

Multidimensional probability distributions. In statistical mechanics, we often discuss probability distributions for many variables at once (for example, all the components of all the velocities of all the atoms in a box). Let's consider just the probability distribution of one molecule's velocities. If vx, vy, and vz of a molecule are independent and each distributed with a Gaussian distribution with σ = √(kT/M) (section 3.2.2), then we describe the combined probability distribution as a function of three variables as the product of the three Gaussians:

ρ(vx, vy, vz) = 1/(2π(kT/M))^(3/2) exp(−M v²/2kT)
             = (√(M/2πkT) e^(−M vx²/2kT)) (√(M/2πkT) e^(−M vy²/2kT)) (√(M/2πkT) e^(−M vz²/2kT)).

(d) Show, using your answer for the standard deviation of the Gaussian in part (b), that the mean kinetic energy is kT/2 per dimension. Show that the probability that the speed is v = |v| is given by a Maxwellian distribution

ρ_Maxwell(v) = √(2/π) (v²/σ³) exp(−v²/2σ²).    (1.2)

(Hint: What is the shape of the region in 3D velocity space where |v| is between v and v + δv? The area of a sphere of radius R is 4πR².)

(1.3) Waiting times. (Basic) (With Brouwer. [14])

On a highway, the average numbers of cars and buses going east are equal: each hour, on average, there are 12 buses and 12 cars passing by. The buses are scheduled: each bus appears exactly 5 minutes after the previous one. On the other hand, the cars appear at random: in a short interval dt, the probability that a car comes by is dt/τ, with τ = 5 minutes. An observer is counting the cars and buses.

(a) Verify that each hour the average number of cars passing the observer is 12.

(b) What is the probability Pbus(n) that n buses pass the observer in a randomly chosen 10 minute interval? And what is the probability Pcar(n) that n cars pass the observer in the same time interval? (Hint: For the cars,


one way to proceed is to divide the interval into many small slivers of time dt: in each sliver the probability is dt/τ that a car passes, and 1 − dt/τ ≈ e^{−dt/τ} that no car passes. However you do it, you should get a Poisson distribution, P_car(n) = aⁿ e^{−a}/n!. See also exercise 3.4.)

(c) What is the probability distribution ρ_bus and ρ_car for the time interval Δ between two successive buses and cars, respectively? What are the means of these distributions? (Hint: To answer this for the bus, you'll need to use the Dirac δ-function, which is zero except at zero and infinite at zero, with integral equal to one: ∫_a^c f(x) δ(x − b) dx = f(b) so long as a < b < c.)

(d) If another observer arrives at the road at a randomly chosen time, what is the probability distribution for the time Δ she has to wait for the first bus to arrive? What is the probability distribution for the time she has to wait for the first car to pass by? (Hint: What would the distribution of waiting times be just after a car passes by? Does the time of the next car depend at all on the previous car?) What are the means of these distributions?

The mean time between cars is 5 minutes. The mean time to the next car should be 5 minutes. A little thought should convince you that the mean time since the last car should also be 5 minutes. But 5 + 5 ≠ 5: how can this be?

The same physical quantity can have different means when averaged in different ensembles! The mean time between cars in part (c) was a gap average: it weighted all gaps between cars equally. The mean time to the next car from part (d) was a time average: the second observer arrives with equal probability at every time, so is twice as likely to arrive during a gap between cars that is twice as long.

(e) In part (c), ρ_car^gap(Δ) was the probability that a randomly chosen gap was of length Δ. Write a formula for ρ_car^time(Δ), the probability that the second observer, arriving at a randomly chosen time, will be in a gap between cars of length Δ. (Hint: Make sure it's normalized.) From ρ_car^time(Δ), calculate the average length of the gaps between cars, using the time-weighted average measured by the second observer.

(1.4) Stirling's Approximation and Asymptotic Series. (Mathematics)

One important approximation useful in statistical mechanics is Stirling's approximation [99] for n!, valid for large n. It's not a traditional Taylor series: rather, it's an asymptotic series. Stirling's formula is extremely useful in this course, and asymptotic series are important in many fields of applied mathematics, statistical mechanics [97], and field theory [98], so let's investigate them in detail.

(a) Show, by converting the sum to an integral, that log(n!) ∼ (n + 1/2) log(n + 1/2) − n − 1/2 log(1/2), where (as always in this book) log represents the natural logarithm, not log₁₀. Show that this is compatible with the more precise and traditional formula n! ≈ (n/e)ⁿ √(2πn); in particular, show that the difference of the logs goes to a constant as n → ∞. Show that the latter is compatible with the first term in the series we use below, n! ∼ (2π/(n + 1))^{1/2} e^{−(n+1)} (n + 1)^{n+1}, in that the difference of the logs goes to zero as n → ∞. Related formulæ: ∫ log x dx = x log x − x, and log(n + 1) − log(n) = log(1 + 1/n) ∼ 1/n up to terms of order 1/n².

We want to expand this function for large n: to do this, we need to turn it into a continuous function, interpolating between the integers. This continuous function, with its argument perversely shifted by one, is Γ(z) = (z − 1)!. There are many equivalent formulas for Γ(z): indeed, any formula giving an analytic function satisfying the recursion relation Γ(z + 1) = zΓ(z) and the normalization Γ(1) = 1 is equivalent (by theorems of complex analysis). We won't use it here, but a typical definition is Γ(z) = ∫_0^∞ e^{−t} t^{z−1} dt: one can integrate by parts to show that Γ(z + 1) = zΓ(z).

(b) Show, using the recursion relation Γ(z + 1) = zΓ(z), that Γ(z) is infinite (has a pole) at all the negative integers.

Stirling's formula is extensible [9, p.218] into a nice expansion of Γ(z) in powers of 1/z = z^{−1}:

Γ[z] = (z − 1)!
  ∼ (2π/z)^{1/2} e^{−z} z^z (1 + (1/12)z^{−1} + (1/288)z^{−2}
    − (139/51840)z^{−3} − (571/2488320)z^{−4}
    + (163879/209018880)z^{−5} + (5246819/75246796800)z^{−6}
    − (534703531/902961561600)z^{−7}
    − (4483131259/86684309913600)z^{−8} + . . . )   (1.3)

This looks like a Taylor series in 1/z, but is subtly different. For example, we might ask what the radius of convergence [101] of this series is. The radius of convergence is the distance to the nearest singularity in the complex plane.

(c) Let g(ζ) = Γ(1/ζ); then Stirling's formula is some stuff times a Taylor series in ζ. Plot the poles of g(ζ) in the complex plane. Show that the radius of convergence of Stirling's formula applied to g must be zero, and hence
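Equation 1.3 above can be tried numerically. As a sketch (taking the printed coefficients at face value), here are the successive partial-sum approximations to Γ(1) = 0! = 1, the computation the exercise's part (e) asks you to do by hand:

```python
import math
from fractions import Fraction

# Coefficients of the 1/z series, copied from equation 1.3 above.
coeffs = [Fraction(1), Fraction(1, 12), Fraction(1, 288),
          Fraction(-139, 51840), Fraction(-571, 2488320),
          Fraction(163879, 209018880), Fraction(5246819, 75246796800),
          Fraction(-534703531, 902961561600),
          Fraction(-4483131259, 86684309913600)]

def stirling_partial(z, n_terms):
    """Partial sum of the asymptotic series for Gamma(z) = (z-1)!."""
    prefactor = math.sqrt(2 * math.pi / z) * math.exp(-z) * z**z
    return prefactor * sum(float(c) / z**k
                           for k, c in enumerate(coeffs[:n_terms]))

# Successive approximations to Gamma(1) = 0! = 1:
approx = [stirling_partial(1.0, k) for k in range(1, len(coeffs) + 1)]
print(approx)
```

The partial sums hover near 1 (the two-term sum is already about 0.999) but never settle down to it exactly: at fixed z only finitely many terms help, the hallmark of an asymptotic series.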

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity.

Why Study Statistical Mechanics?

no matter how large z is, Stirling's formula eventually diverges.

Indeed, the coefficient of z^{−j} eventually grows rapidly; Bender and Orszag [9, p.218] show that the odd coefficients (A₁ = 1/12, A₃ = −139/51840, . . . ) asymptotically grow as

A_{2j+1} ∼ (−1)^j 2(2j)!/(2π)^{2(j+1)}.   (1.4)

(d) Show explicitly, using the ratio test applied to formula 1.4, that the radius of convergence of Stirling's formula is indeed zero.⁶

This in no way implies that Stirling's formula isn't valuable! An asymptotic series of length n approaches f(z) as z gets big, but for fixed z it can diverge as n gets larger and larger. In fact, asymptotic series are very common, and often are useful for much larger regions than are Taylor series.

(e) What is 0!? Compute 0! using successive terms in Stirling's formula (summing to A_N for the first few N). Considering that this formula is expanding about infinity, it does pretty well!

Quantum electrodynamics these days produces the most precise predictions in science. Physicists sum enormous numbers of Feynman diagrams to produce predictions of fundamental quantum phenomena. Dyson argued that quantum electrodynamics calculations give an asymptotic series [98]; the most precise calculation in science takes the form of a series which cannot converge!

(1.5) Random Matrix Theory. (Math, Quantum) (With Brouwer. [14])

One of the most active and unusual applications of ensembles is random matrix theory, used to describe phenomena in nuclear physics, mesoscopic quantum mechanics, and wave phenomena. Random matrix theory was invented in a bold attempt to describe the statistics of energy level spectra in nuclei. In many cases, the statistical behavior of systems exhibiting complex wave phenomena (almost any correlations involving eigenvalues and eigenstates) can be quantitatively modeled using simple ensembles of matrices with completely random, uncorrelated entries!

To do this problem, you'll need to find a software environment in which it is easy to (i) make histograms and plot functions on the same graph, (ii) find eigenvalues of matrices, sort them, and collect the differences between neighboring ones, and (iii) generate symmetric random matrices with Gaussian and integer entries. Mathematica, Matlab, Octave, and Python are all good choices. For those who are not familiar with one of these packages, I will post hints on how to do these three things on the book Web site [105].

The most commonly explored ensemble of matrices is the Gaussian Orthogonal Ensemble. Generating a member H of this ensemble of size N × N is easy:

- Generate an N × N matrix whose elements are random numbers with Gaussian distributions of mean zero and standard deviation σ = 1.
- Add each matrix to its transpose to symmetrize it.

As a reminder, the Gaussian or normal probability distribution gives a random number x with probability

ρ(x) = (1/(√(2π) σ)) e^{−x²/2σ²}.   (1.5)

One of the simplest and most striking properties that large random matrices share is the distribution of level splittings.

(a) Generate an ensemble with M = 1000 or so GOE matrices of size N = 2, 4, and 10. (More is nice.) Find the eigenvalues λₙ of each matrix, sorted in increasing order. Find the difference between neighboring eigenvalues λ_{n+1} − λₙ, for n, say, equal to⁷ N/2. Plot a histogram of these eigenvalue splittings divided by the mean splitting, with binsize small enough to see some of the fluctuations. (Hint: debug your work with M = 10, and then change to M = 1000.)

What is this dip in the eigenvalue probability near zero? It's called level repulsion.

For N = 2 the probability distribution for the eigenvalue splitting can be calculated pretty simply. Let our matrix be M = [[a, b], [b, c]].

(b) Show that the eigenvalue difference for M is λ = √((c − a)² + 4b²) = 2√(d² + b²) where d = (c − a)/2.⁸ If

⁶ If you don't remember about radius of convergence, see [101]. Here you'll be using every other term in the series, so the radius of convergence is lim_{j→∞} √(|A_{2j−1}/A_{2j+1}|).

⁷ In the experiments, they typically plot all the eigenvalue splittings. Since the mean splitting between eigenvalues will change slowly, this smears the distributions a bit. So, for example, the splittings between the largest and second-largest eigenvalues will be typically rather larger for the GOE ensemble than for pairs near the middle. If you confine your plots to a small range near the middle, the smearing would be small, but it's so fast to calculate new ones we just keep one pair.

⁸ Note that the eigenvalue difference doesn't depend on the trace of M, a + c, only on b and on the difference c − a.
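For readers without one of those packages handy, the N = 2 case can even be done with the standard library alone, using the splitting formula λ = 2√(d² + b²) quoted in part (b). This sketch generates GOE members exactly as in the recipe above and exhibits the dip (level repulsion) near zero:

```python
import math
import random

def goe_splitting_2x2(rng):
    """One N=2 GOE member, built as in the recipe above: Gaussian
    entries (sigma = 1), then add the transpose.  The eigenvalue
    splitting is 2*sqrt(d^2 + b^2) with d = (c - a)/2 (part (b))."""
    g = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]
    a, c = 2 * g[0][0], 2 * g[1][1]     # diagonal entries of H
    b = g[0][1] + g[1][0]               # off-diagonal entry of H
    d = (c - a) / 2
    return 2 * math.sqrt(d * d + b * b)

rng = random.Random(1)
splits = [goe_splitting_2x2(rng) for _ in range(20000)]
mean = sum(splits) / len(splits)
scaled = [s / mean for s in splits]     # rescale to unit mean splitting
# Level repulsion: almost no splittings near zero (the Wigner surmise
# below puts only about 0.8% of the probability below s = 0.1).
near_zero = sum(1 for s in scaled if s < 0.1) / len(scaled)
print(mean, near_zero)
```

For a histogram, bin `scaled` with whatever plotting tool you chose; the point of the stdlib version is just that the near-zero dip survives with no linear-algebra machinery at all.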


the probability distribution ρ_M(d, b) of matrices M(d, b) is continuous and finite at d = b = 0, argue that the probability density ρ(λ) of finding an energy level splitting near zero vanishes at λ = 0, giving us level repulsion. (Both d and b must vanish to make λ = 0.) (Hint: go to polar coordinates, with λ the radius.)

(c) Calculate analytically the standard deviation of a diagonal and an off-diagonal element of the GOE ensemble (made by symmetrizing Gaussian random matrices with σ = 1). You may want to check your answer by plotting your predicted Gaussians over the histogram of H₁₁ and H₁₂ from your ensemble in part (a). Calculate analytically the standard deviation of d = (c − a)/2 of the N = 2 GOE ensemble of part (b), and show that it equals the standard deviation of b.

(d) Calculate a formula for the probability distribution of eigenvalue spacings for the N = 2 GOE, by integrating over the probability density ρ_M(d, b). (Hint: polar coordinates again.)

If you rescale the eigenvalue splitting distribution you found in part (d) to make the mean splitting equal to one, you should find the distribution

ρ_Wigner(s) = (πs/2) e^{−πs²/4}.   (1.6)

This is called the Wigner surmise: it is within 2% of the correct answer for larger matrices as well.⁹

(e) Plot equation 1.6 along with your N = 2 results from part (a). Plot the Wigner surmise formula against the plots for N = 4 and N = 10 as well.

Let's define a ±1 ensemble of real symmetric matrices, by generating an N × N matrix whose elements are independent random variables, each ±1 with equal probability.

(f) Generate an ensemble with M = 1000 ±1 symmetric matrices with size N = 2, 4, and 10. Plot the eigenvalue distributions as in part (a). Are they universal (independent of the ensemble up to the mean spacing) for N = 2 and 4? Do they appear to be nearly universal¹⁰ (the same as for the GOE in part (a)) for N = 10? Plot the Wigner surmise along with your histogram for N = 10.

The GOE ensemble has some nice statistical properties. The ensemble is invariant under orthogonal transformations

H → Rᵀ H R with Rᵀ = R⁻¹.   (1.7)

(g) Show that Tr[Hᵀ H] is the sum of the squares of all elements of H. Show that this trace is invariant under orthogonal coordinate transformations (that is, H → Rᵀ H R with Rᵀ = R⁻¹). (Hint: Remember, or derive, the cyclic invariance of the trace: Tr[ABC] = Tr[CAB].)

Note that this trace, for a symmetric matrix, is the sum of the squares of the diagonal elements plus twice the squares of the upper triangle of off-diagonal elements. That is convenient, because in our GOE ensemble the variance (squared standard deviation) of the off-diagonal elements is half that of the diagonal elements.

(h) Write the probability density ρ(H) for finding GOE ensemble member H in terms of the trace formula in part (g). Argue, using your formula and the invariance from part (g), that the GOE ensemble is invariant under orthogonal transformations: ρ(Rᵀ H R) = ρ(H).

This is our first example of an emergent symmetry. Many different ensembles of symmetric matrices, as the size N goes to infinity, have eigenvalue and eigenvector distributions that are invariant under orthogonal transformations even though the original matrix ensemble did not have this symmetry. Similarly, rotational symmetry emerges in random walks on the square lattice as the number of steps N goes to infinity, and also emerges on long length scales for Ising models at their critical temperatures.¹¹

⁹ The distribution for large matrices is known and universal, but is much more complicated to calculate.

¹⁰ Note the spike at zero. There is a small probability that two rows or columns of our matrix of ±1 will be the same, but this probability vanishes rapidly for large N.

¹¹ A more exotic emergent symmetry underlies Fermi liquid theory: the effective interactions between electrons disappear near the Fermi energy: the fixed point has an emergent gauge symmetry.
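Part (g)'s two claims, that Tr[HᵀH] is the sum of the squares of the elements and that it is unchanged by H → RᵀHR, are easy to spot-check numerically. A standard-library sketch, using a 2×2 rotation as the orthogonal matrix R:

```python
import math
import random

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def sum_sq(A):
    return sum(x * x for row in A for x in row)

rng = random.Random(2)
# A symmetric 2x2 matrix H (Gaussian matrix plus its transpose) ...
g = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]
H = [[g[i][j] + g[j][i] for j in range(2)] for i in range(2)]
# ... and a rotation as our orthogonal matrix R (R^T = R^{-1}).
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
Hp = matmul(matmul(transpose(R), H), R)      # H -> R^T H R

HtH = matmul(transpose(H), H)
trace = HtH[0][0] + HtH[1][1]                # Tr[H^T H]
print(trace, sum_sq(H), sum_sq(Hp))          # all three should agree
```

This is of course no substitute for the cyclic-invariance proof the hint suggests; it just confirms the bookkeeping.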


2 Random Walks and Emergent Properties

What makes physics possible? Why are humans able to find simple mathematical laws that describe the real world? Our physical laws are not direct statements about the underlying reality of the universe. Rather, our laws emerge out of far more complex microscopic behavior.¹ Statistical mechanics provides powerful tools for understanding simple behavior that emerges from underlying complexity.

¹ You may think that Newton's law of gravitation, or Einstein's refinement to it, is more fundamental than the diffusion equation. You would be correct: gravitation applies to everything. But the simple macroscopic law of gravitation emerges, from a quantum exchange of immense numbers of virtual gravitons, just as the diffusion equation emerges from large numbers of long random walks. The diffusion equation and other continuum statistical mechanics laws are special to particular systems, but they emerge from the microscopic theory in much the same way as gravitation and the other fundamental laws of nature do.

In this chapter, we will explore the emergent behavior for random walks. Random walks are paths that take successive steps in random directions. They arise often in statistical mechanics: as partial sums of fluctuating quantities, as trajectories of particles undergoing repeated collisions, and as the shapes for long, linked systems like polymers. They have two kinds of emergent behavior. First, an individual random walk, after a large number of steps, becomes fractal or scale invariant (section 2.1). Secondly, the endpoint of the random walk has a probability distribution that obeys a simple continuum law: the diffusion equation (section 2.2). Both of these behaviors are largely independent of the microscopic details of the walk: they are universal. Random walks in an external field (section 2.3) provide our first examples of conserved currents, linear response, and Boltzmann distributions. Finally we use the diffusion equation to introduce Fourier and Green's function solution techniques (section 2.4). Random walks encapsulate many of the themes and methods of statistical mechanics.

2.1 Random Walk Examples: Universality and Scale Invariance

We illustrate random walks with three examples: coin flips, the drunkard's walk, and polymers.

Coin Flips. Statistical mechanics often demands sums or averages of a series of fluctuating quantities: s_N = Σ_{i=1}^{N} ℓᵢ. The energy of a material is a sum over the energies of the molecules composing the material; your grade on a statistical mechanics exam is the sum of the scores on many individual questions. Imagine adding up this sum one term at a time: the path s₁, s₂, . . . forms an example of a one-dimensional random walk.

For example, consider flipping a coin, recording the difference s_N = h_N − t_N between the number of heads and tails found. Each coin flip

contributes ℓᵢ = ±1 to the total. How big a sum s_N = Σ_{i=1}^{N} ℓᵢ = (heads − tails) do you expect after N flips? The average of s_N is of course zero, because positive and negative steps are equally likely. A better measure of the characteristic distance moved is the root-mean-square (RMS) number² √⟨s_N²⟩. After one coin flip,

⟨s₁²⟩ = 1 = ½ (−1)² + ½ (1)²;   (2.1)

after two and three coin flips

⟨s₂²⟩ = 2 = ¼ (−2)² + ½ (0)² + ¼ (2)²;   (2.2)
⟨s₃²⟩ = 3 = ⅛ (−3)² + ⅜ (−1)² + ⅜ (1)² + ⅛ (3)².

² We use angle brackets ⟨X⟩ to denote averages over various ensembles: we'll add subscripts to the brackets where there may be confusion about which ensemble we are using. Here our ensemble contains all 2^N possible sequences of N coin flips.

Does this pattern continue? Because ℓ_N = ±1 with equal probability independent of the history, ⟨ℓ_N s_{N−1}⟩ = ½ (+1) s_{N−1} + ½ (−1) s_{N−1} = 0. We know ⟨ℓ_N²⟩ = 1; if we assume ⟨s²_{N−1}⟩ = N − 1 we can prove by induction on N that

⟨s_N²⟩ = ⟨(s_{N−1} + ℓ_N)²⟩ = ⟨s²_{N−1}⟩ + 2⟨s_{N−1} ℓ_N⟩ + ⟨ℓ_N²⟩
       = ⟨s²_{N−1}⟩ + 1 = N.   (2.3)

Hence the RMS average of (heads − tails) for N coin flips is

σ_s = √⟨s_N²⟩ = √N.   (2.4)

Notice that we chose to count the difference between the number of heads and tails. Had we instead just counted the number of heads h_N, then ⟨h_N⟩ would grow proportionately to N: ⟨h_N⟩ = N/2. We would then be interested in the fluctuations of h_N about N/2, measured most easily by squaring the difference between the particular random walks and the average random walk: σ_h² = ⟨(h_N − ⟨h_N⟩)²⟩ = N/4.³ The variable σ_h is the standard deviation of the sum h_N: this is an example of the typical behavior that the standard deviation of the sum of N random variables grows proportionally to √N.

³ It's N/4 for σ_h² instead of N for σ_s² because each step changes s_N by ±2, and h_N only by ±1: the standard deviation is in general proportional to the step size.

The sum, of course, grows linearly with N, so (if the average isn't zero) the fluctuations become tiny in comparison to the sum. This is why experimentalists often make repeated measurements of the same quantity and take the mean. Suppose we were to measure the mean number of heads per coin toss, a_N = h_N/N. We see immediately that the fluctuations in a_N will also be divided by N, so

σ_a = σ_h/N = 1/(2√N).   (2.5)

The standard deviation of the mean of N measurements is proportional to 1/√N.
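The induction result ⟨s_N²⟩ = N is easy to check by brute force. A quick sketch (not part of the text's argument; the helper name is mine):

```python
import random

def coin_flip_walk(n_flips, rng):
    """s_N = (heads - tails) after n_flips fair coin flips."""
    return sum(rng.choice((-1, 1)) for _ in range(n_flips))

rng = random.Random(3)
N, trials = 100, 5000
walks = [coin_flip_walk(N, rng) for _ in range(trials)]
mean_s = sum(walks) / trials                  # expect near 0
mean_s2 = sum(s * s for s in walks) / trials  # expect near N = 100
print(mean_s, mean_s2)
```

The RMS value √(mean_s2) comes out near √N = 10, equation 2.4's prediction.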

Drunkard's Walk. Random walks in higher dimensions arise as trajectories that undergo successive random collisions or turns: for example, the trajectory of a perfume molecule in a sample of air.⁴ Because

⁴ Real perfume in a real room will primarily be transported by convection; in liquids and gasses, diffusion dominates usually only on short length scales. Solids don't convect, so thermal or electrical conductivity would be more accurate, but less vivid, applications for random walks.


the air is dilute and the interactions are short-ranged, the molecule will basically travel in straight lines, with sharp changes in velocity during infrequent collisions. After a few substantial collisions, the molecule's velocity will be uncorrelated with its original velocity. The path taken by the molecule will be a jagged, random walk through three dimensions.

The random walk of a perfume molecule involves random directions, random velocities, and random step sizes. It's more convenient to study steps at regular time intervals, so we'll instead consider the classic problem of a drunkard's walk. The drunkard is presumed to start at a lamppost at x = y = 0. He takes steps ℓ_N, each of length L, at regular time intervals. Because he's drunk, the steps are in completely random directions, each uncorrelated with the previous steps. This lack of correlation says that the average dot product between any two steps ℓ_m and ℓ_n is zero, since all relative angles θ between the two directions are equally likely: ⟨ℓ_m · ℓ_n⟩ = L²⟨cos(θ)⟩ = 0.⁵ This implies that the dot product of ℓ_N with s_{N−1} = Σ_{m=0}^{N−1} ℓ_m is zero. Again, we can work by induction:

⟨s_N²⟩ = ⟨(s_{N−1} + ℓ_N)²⟩ = ⟨s²_{N−1}⟩ + 2⟨s_{N−1} · ℓ_N⟩ + ⟨ℓ_N²⟩
       = ⟨s²_{N−1}⟩ + L² = · · · = N L²,   (2.6)

so the RMS distance moved is √N L.

Fig. 2.1: The drunkard takes a series of steps of length L away from the lamppost, but each with a random angle.

⁵ More generally, if two variables are uncorrelated then the average of their product is the product of their averages: in this case this would imply ⟨ℓ_m · ℓ_n⟩ = ⟨ℓ_m⟩ · ⟨ℓ_n⟩ = 0 · 0 = 0.

Random walks introduce us to the concepts of scale invariance and universality.

Scale Invariance. What kind of path only goes √N total distance in N steps? Random walks form paths which look jagged and scrambled. Indeed, they are so jagged that if you blow up a small corner of one, the blown up version looks just as jagged (figure 2.2). Clearly each of the blown-up random walks is different, just as any two random walks of the same length are different, but the ensemble of random walks of length N looks much like that of length N/4, until N becomes small enough that the individual steps can be distinguished. Random walks are scale invariant: they look the same on all scales.⁶

⁶ They are also fractal with dimension two, in all spatial dimensions larger than two. This just reflects the fact that a random walk of volume V = N steps roughly fits into a radius R ∼ √⟨s_N²⟩ ∼ N^{1/2}. The fractal dimension D of the set, defined by R^D = V, is thus two.

Universality. On scales where the individual steps are not distinguishable (and any correlation between steps is likewise too small to see) we find that all random walks look the same. Figure 2.2 depicts a drunkard's walk, but any two-dimensional random walk would give the same behavior (statistically). Coin tosses of two coins (penny sums along x, dime sums along y) would produce, statistically, the same random walk ensemble on lengths large compared to the step sizes. In three dimensions, photons⁷ in the sun (exercise 2.2) or in a glass of milk undergo a random walk with fixed speed c between collisions. Nonetheless, after a few steps their random walks are statistically indistinguishable from that of our variable-speed perfume molecule. This independence of the behavior on the microscopic details is called universality.

⁷ A photon is a quantum of light or other electromagnetic radiation.

Random walks are simple enough that we could probably show that each individual case behaves like the others. In section 2.2 we will generalize our argument that the RMS distance scales as √N to simultaneously cover both coin flips and drunkards; with more work we could


Fig. 2.2 Random Walk: Scale Invariance. Random walks form a jagged, fractal pattern which looks the same when rescaled. Here each succeeding walk is the first quarter of the previous walk, magnified by a factor of two; the shortest random walk is of length 31, the longest of length 128,000 steps. The left side of figure 1.1 is the further evolution of this walk to 512,000 steps.


include the cases of photons and molecules in a gas. We could probably also calculate properties about the jaggedness of paths in these systems, and show that they too agree with one another after many steps. Instead, we'll wait for chapter 13 (and specifically exercise 13.7), where we will give a deep but intuitive explanation of why each of these problems is scale invariant, and why all of these problems share the same behavior on long length scales. Universality and scale invariance will be explained there, using methods originally developed to study continuous phase transitions.

Fig. 2.3 S&P 500, normalized. Standard and Poor's 500 stock index daily closing price since its inception, corrected for inflation, divided by the average 6.4% return over this time period. Stock prices are often modeled as a biased random walk. Notice that the fluctuations (risk) in individual stock prices will typically be much higher. By averaging over 500 stocks, the random fluctuations in this index are reduced, while the average return remains the same: see [65] and [66]. For comparison, a one-dimensional multiplicative random walk is also shown.

Polymers are long molecules (like DNA, RNA, proteins, and many plastics) made up of many small units (called monomers) attached to one another in a long chain. Temperature can introduce fluctuations in the angle between two adjacent monomers; if these fluctuations dominate over the energy,⁸ the polymer shape can form a random walk. Here the steps are not increasing with time, but with monomers (or groups of monomers) along the chain.

⁸ Plastics at low temperature can be crystals; functional proteins and RNA are often packed tightly into well-defined shapes. Molten plastics and denatured proteins form self-avoiding random walks. Double-stranded DNA is rather stiff: the step size for the random walk is many nucleic acids long.

The random walks formed by polymers are not the same as those in our first two examples: they are in a different universality class. This is because the polymer cannot intersect itself: a walk that would cause two monomers to overlap is not allowed. Polymers undergo self-avoiding random walks. In two and three dimensions, it turns out that the effect of these self-intersections is not a small, microscopic detail, but changes the properties of the random walk in an essential way.⁹ One can show that these intersections will often arise on far-separated regions of the polymer, and that in particular they change the dependence of the squared radius ⟨s_N²⟩ on the number of segments N (exercise 2.8). In particular, they change the power law √⟨s_N²⟩ ∼ N^ν from the ordinary random walk value ν = 1/2 to a higher value, ν = 3/4 in two dimensions and ν ≈ 0.59 in three dimensions. Power laws are central to the study of scale-invariant systems: ν is our first example of a universal critical exponent.

⁹ Self-avoidance is said to be a relevant perturbation that changes the universality class. In (unphysical) spatial dimensions higher than four, self-avoidance is irrelevant: hypothetical hyper-polymers in five dimensions would look like regular random walks on long length scales.
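Before moving on, the drunkard's-walk result ⟨s_N²⟩ = NL² of equation 2.6 can be spot-checked numerically (a sketch; the function name is mine):

```python
import math
import random

def drunkard_s2(n_steps, L, rng):
    """Squared end-to-end distance after n_steps 2D steps of length L
    in independent, uniformly random directions."""
    x = y = 0.0
    for _ in range(n_steps):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += L * math.cos(theta)
        y += L * math.sin(theta)
    return x * x + y * y

rng = random.Random(4)
N, L, trials = 50, 1.0, 4000
mean_s2 = sum(drunkard_s2(N, L, rng) for _ in range(trials)) / trials
print(mean_s2)    # expect N * L**2 = 50
```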

2.2 The Diffusion Equation

In the continuum limit of long length and time scales, simple behavior emerges from the ensemble of irregular, jagged random walks: their evolution is described by the diffusion equation:¹⁰

∂ρ/∂t = D∇²ρ = D ∂²ρ/∂x².   (2.7)

¹⁰ In the remainder of this chapter we specialize for simplicity to one dimension. We also change variables from the sum s to position x.

The diffusion equation can describe the evolving density ρ(x, t) of a local cloud of perfume as the molecules random-walk through collisions with the air molecules. Alternatively, it can describe the probability density of an individual particle as it random walks through space: if the particles are non-interacting, the probability distribution of one particle describes the density of all particles.
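The diffusion equation can also be integrated numerically. A minimal sketch, using an explicit finite-difference scheme on a ring (the grid spacing, time step, and periodic boundaries are my own choices, not from the text): the total density is conserved, and the variance of an initial spike grows as 2Dt.

```python
# Forward-Euler finite differences for eq. 2.7 on a ring (periodic
# boundaries), with D*dt/dx**2 = 0.2 < 1/2 for stability.
D = 1.0
dx, dt, n = 0.1, 0.002, 100
rho = [0.0] * n
rho[n // 2] = 1.0 / dx        # narrow initial spike with total integral 1

def step(rho):
    """One explicit time step of rho_t = D rho_xx."""
    return [rho[i]
            + D * dt * (rho[(i + 1) % n] - 2 * rho[i] + rho[i - 1]) / dx**2
            for i in range(n)]

n_steps = 500
for _ in range(n_steps):
    rho = step(rho)

total = sum(rho) * dx          # diffusion conserves the total density
t = n_steps * dt
xs = [(i - n // 2) * dx for i in range(n)]
var = sum(x * x * r for x, r in zip(xs, rho)) * dx / total
print(total, var, 2 * D * t)   # the spike's variance grows as 2 D t
```

The stability bound D·dt/dx² < 1/2 is the standard constraint for this explicit scheme; violate it and the hills and valleys grow instead of flattening.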


Consider a general uncorrelated random walk where, at each time step Δt, the particle's position x changes by a step ℓ:

x(t + Δt) = x(t) + ℓ(t).   (2.8)

Let the probability distribution for each step be χ(ℓ).¹¹ We'll assume that χ has mean zero and standard deviation a, so the first few moments of χ are

∫ χ(z) dz = 1,
∫ z χ(z) dz = 0, and
∫ z² χ(z) dz = a².   (2.9)

¹¹ In our two examples the distribution χ(ℓ) was discrete: we can write it using the Dirac δ-function. (The function δ(x − x₀) is a probability density which has 100% chance of finding the particle in any box containing x₀: thus δ(x − x₀) is zero unless x = x₀, and ∫ f(x) δ(x − x₀) dx = f(x₀) so long as the domain of integration includes x₀.) In the case of coin flips, a 50/50 chance of ℓ = ±1 can be written as χ(ℓ) = ½ δ(ℓ + 1) + ½ δ(ℓ − 1). In the case of the drunkard, χ(ℓ) = δ(|ℓ| − L)/(2πL), evenly spaced around the circle.

What is the probability distribution ρ(x, t + Δt), given the probability distribution ρ(x′, t)?

Clearly, for the particle to go from x′ at time t to x at time t + Δt, the step ℓ(t) must be x − x′. This happens with probability χ(x − x′) times the probability density ρ(x′, t) that it started at x′. Integrating over original positions x′, we have

ρ(x, t + Δt) = ∫ ρ(x′, t) χ(x − x′) dx′ = ∫ ρ(x − z, t) χ(z) dz,   (2.10)

where we change variables to z = x − x′.¹²

¹² Notice that although z = x − x′ gives dz = −dx′, the limits of integration flip as well: ∫_{+∞}^{−∞} (−dz) = ∫_{−∞}^{+∞} dz, canceling the minus sign. This happens often in calculations: watch out for it.

Now, suppose ρ is broad: the step size is very small compared to the scales on which ρ varies (figure 2.4). We may then do a Taylor expansion of 2.10 in z:

ρ(x, t + Δt) ≈ ∫ [ρ(x, t) − z ∂ρ/∂x + (z²/2) ∂²ρ/∂x²] χ(z) dz
  = ρ(x, t) ∫ χ(z) dz − (∂ρ/∂x) ∫ z χ(z) dz + ½ (∂²ρ/∂x²) ∫ z² χ(z) dz
  = ρ(x, t) + ½ (∂²ρ/∂x²) a²,   (2.11)

using the moments of χ in 2.9. Now, if we also assume that ρ is slow, so that it changes only slightly during this time step, we can approximate ρ(x, t + Δt) − ρ(x, t) ≈ (∂ρ/∂t) Δt, and we find

∂ρ/∂t = (a²/2Δt) ∂²ρ/∂x².   (2.12)

Fig. 2.4: We suppose the step sizes are small compared to the broad ranges on which ρ(x) varies, so we may do a Taylor expansion in gradients of ρ.

This is the diffusion equation¹³ (2.7), with

D = a²/2Δt.   (2.13)

¹³ One can understand this intuitively. Random walks and diffusion tend to even out the hills and valleys in the density. Hills have negative second derivatives ∂²ρ/∂x² < 0 and should flatten (∂ρ/∂t < 0); valleys have positive second derivatives and fill up.

The diffusion equation applies to all random walks, so long as the probability distribution is broad and slow compared to the individual steps.
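Equation 2.13 makes a quantitative prediction that is easy to test for a step distribution χ different from both of our examples, underlining the universality claim. A sketch with steps uniform on [−1, 1] (mean zero, ⟨z²⟩ = a² = 1/3) and Δt = 1: the endpoint variance after N steps should match 2Dt = N a².

```python
import random

# Steps uniform on [-1, 1]: mean zero, <z^2> = a^2 = 1/3, with dt = 1.
rng = random.Random(6)
a2, dt_step = 1.0 / 3.0, 1.0
D = a2 / (2 * dt_step)        # equation 2.13
n_steps, trials = 300, 2000

finals = []
for _ in range(trials):
    x = 0.0
    for _ in range(n_steps):
        x += rng.uniform(-1.0, 1.0)
    finals.append(x)

mean = sum(finals) / trials
var = sum((x - mean) ** 2 for x in finals) / trials
print(var, 2 * D * n_steps * dt_step)   # both should be near 100
```

Any other broad, slow, zero-mean step distribution with the same a² would give the same spreading rate.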

2.3 Currents and External Forces

As the particles in our random walks move around, they never are cre-

ated or destroyed: they are conserved.14 If (x) is the density of a 14

More subtly, the probability density

conserved quantity, we may write its evolution law (see gure 2.5) in (x) of a single particle undergoing a

random walk is also conserved: like par-

terms of the current J(x) passing a given point x: ticle density, probability density can-

not be created or destroyed, it can only

J slosh around.

= . (2.14)

t x

Here the current J is the amount of stu owing to the right through (x) x

the point x; since the stu is conserved, the only way the density can

change is by owing from one place to another. From equation 2.7 and J(x) J(x+x)

equation 2.14, the current for the diusion equation is

Fig. 2.5 Let (x, t) be the density

of some conserved quantity (# of

molecules, mass, energy, probability,

Jdiusion = D ; (2.15) etc.) varying in one spatial dimension

x

x, and J(x) be the rate at which is

passing a point x. The the amount

particles diuse (randomwalk) on average from regions of high density of in a small region (x, x + x) is

towards regions of low density. n = (x) x. The ow of particles into

this region from the left is J(x) and

In many applications one has an average drift term along with a ran-

the ow out is J(x + x), so n t

=

dom walk. In some cases (like the total grade in a multiple-choice test, J(x) J(x + x) x, and we de-

t

exercise 2.1) there is naturally a non-zero mean for each step in the ran- rive the conserved current relation

dom walk. In other cases, there is an external force F that is biasing J(x + x) J(x) J

= = .

the steps to one side: the mean net drift is F t times a mobility : t x x

If our air is dilute and the diffusing molecule is small, we can model the trajectory as free acceleration between collisions separated by Δt, and we can assume the collisions completely scramble the velocities. In this case, the net motion due to the external force is half the acceleration F/m times the time squared: ½a(Δt)² = ½(F/m)(Δt)² = FΔt (Δt/2m), so γ = Δt/2m. Using equation 2.13, we find

D/γ = (a²/2Δt)(2m/Δt) = m(a/Δt)² = m v̄²,    (2.17)

where v̄ = a/Δt is the velocity of the unbiased random walk step. If our air is dense and the diffusing molecule is large, we might treat the air as a viscous fluid of viscosity η; if we also simply model the molecule as a sphere of radius r, a fluid mechanics calculation tells us that the mobility is γ = 1/(6πηr).

Starting from equation 2.16, we can repeat our analysis of the continuum limit (equations 2.10 through 2.12) to derive the diffusion equation

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

20 Random Walks and Emergent Properties

in an external force,¹⁵

J = γFρ − D ∂ρ/∂x,    (2.18)

∂ρ/∂t = −γF ∂ρ/∂x + D ∂²ρ/∂x².    (2.19)

The new term can be explained intuitively: if ρ is increasing in space (positive slope ∂ρ/∂x) and the force is dragging the particles forward, then ρ will decrease with time because the high-density regions ahead of x are receding and the low-density regions behind x are moving in.
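A biased random walk shows both pieces of equation 2.19 at work: the mean of many walkers drifts at the constant rate while their spread grows diffusively. A minimal sketch (all parameters are illustrative; v stands in for the drift velocity γF):

```python
# Sketch: biased random walk, as in equation 2.19.  Each step of duration dt
# moves by a mean drift v*dt plus a random step of RMS size a; the walkers'
# mean should grow as v*t and their variance as 2*D*t with D = a*a/(2*dt).
# Parameter values are illustrative, not from the book.
import random

random.seed(0)
v, a, dt, nstep, nwalk = 0.5, 1.0, 1.0, 400, 2000
finals = []
for _ in range(nwalk):
    x = 0.0
    for _ in range(nstep):
        x += v * dt + random.choice((-a, a))
    finals.append(x)

t = nstep * dt
mean = sum(finals) / nwalk
var = sum((x - mean) ** 2 for x in finals) / nwalk
D = a * a / (2 * dt)
print(round(mean / (v * t), 2))     # drift:  close to 1
print(round(var / (2 * D * t), 2))  # spread: close to 1
```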

The diffusion equation describes how systems of random-walking particles approach equilibrium (see chapter 3). The diffusion equation in the absence of external force describes the evolution of perfume density in a room. A time-independent equilibrium state obeying the diffusion equation 2.7 must have ∂²ρ/∂x² = 0, so ρ(x) = ρ₀ + Bx. If the perfume cannot penetrate the walls, ∂ρ/∂x = 0 at the boundaries, so B = 0. Thus, as one might expect, the perfume evolves to a rather featureless equilibrium state ρ(x) = ρ₀, evenly distributed throughout the room.

In the presence of a constant external force (like gravitation) the equilibrium state is more interesting. Let x be the height above the ground, and F = −mg be the force due to gravity. By equation 2.19, the equilibrium state satisfies

0 = ∂ρ/∂t = γmg ∂ρ/∂x + D ∂²ρ/∂x²,    (2.20)

which has general solution ρ(x) = A exp(−(γmg/D) x) + B. We assume that the density of perfume in outer space is zero,¹⁶ so the density of perfume decreases exponentially with height:

ρ(x) = A exp(−(γmg/D) x).    (2.21)

The perfume molecules are pulled downward by the gravitational force, and remain aloft only because of the random walk. If we generalize from perfume to oxygen molecules (and ignore temperature gradients and weather) this gives the basic explanation for why it becomes harder to breathe as one climbs mountains.¹⁷

¹⁶ Non-zero B would correspond to a constant-density rain of perfume.

¹⁵ Warning: if the force is not constant in space, the evolution also depends on the gradient of the force: ∂ρ/∂t = −∂(γF(x)ρ(x))/∂x + D ∂²ρ/∂x² = −γF ∂ρ/∂x − γρ ∂F/∂x + D ∂²ρ/∂x² (see the discussion surrounding note 15 on page 182).

¹⁷ In chapter 5 we shall derive the Boltzmann distribution, implying that the equilibrium density at energy E is proportional to exp(−E/k_B T), where T is the temperature and k_B is Boltzmann's constant. With gravitational potential energy E = mgx, this has just the same form as our solution (equation 2.21), if D/γ = k_B T. This is called the Einstein relation. Our rough derivation (equation 2.17) suggested that D/γ = m v̄², which suggests that k_B T must equal twice the kinetic energy along x for the Einstein relation to hold: this is also true, and is called the equipartition theorem (section 3.2.2). The constants in the (non-equilibrium) diffusion equation are related to one another, because the density must evolve toward the equilibrium distribution dictated by statistical mechanics.
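The barometric profile can be verified numerically: plugging the exponential of equation 2.21 into the right-hand side of equation 2.20 with centered finite differences gives zero to discretization accuracy. A sketch, with γmg lumped into a single illustrative constant (values are not from the book):

```python
# Sketch: check that rho(x) = exp(-gamma*m*g*x/D) (equation 2.21) is a
# stationary state of the drift-diffusion equation 2.20, by evaluating
# gamma*m*g*drho/dx + D*d2rho/dx2 with centered finite differences.
import math

gamma_mg, D, h = 2.0, 1.0, 1e-4   # gamma*m*g lumped into one constant

def rho(x):
    return math.exp(-gamma_mg * x / D)

def drho_dt(x):
    d1 = (rho(x + h) - rho(x - h)) / (2 * h)
    d2 = (rho(x + h) - 2 * rho(x) + rho(x - h)) / h**2
    return gamma_mg * d1 + D * d2      # right-hand side of equation 2.20

err = max(abs(drho_dt(x)) for x in (0.1, 0.5, 1.0))
print(err < 1e-5)   # True: the profile is stationary to discretization error
```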


2.4 Solving the Diffusion Equation

We take a brief mathematical interlude, to review two important methods for solving the diffusion equation: Fourier transforms and Green's functions. Both rely upon the fact that the diffusion equation is linear: if a family of solutions ρₙ(x, t) are known, then any linear combination of these solutions Σₙ aₙ ρₙ(x, t) is also a solution. If we can then expand the initial density ρ(x, 0) = Σₙ aₙ ρₙ(x, 0), we've formally found the solution.

Fourier methods are wonderfully effective computationally, because of fast Fourier Transform (FFT) algorithms for shifting between the real-space density and its Fourier coefficients. Green's function methods are more important for analytical calculations and as a source of approximate solutions.¹⁸

¹⁸ One should note that much of quantum field theory and many-body quantum mechanics is framed in terms of something also called Green's functions. These are distant, fancier cousins of the simple methods used in linear differential equations.

2.4.1 Fourier

The Fourier transform method decomposes ρ into a family of plane wave solutions ρ̃ₖ(t)e^{ikx}.

The diffusion equation is homogeneous in space: our system is translationally invariant. That is, if we have a solution ρ(x, t), another equally valid solution is given by ρ(x − Δ, t), which describes the evolution of an initial condition translated by Δ in the positive x direction.¹⁹ Under very general circumstances, a linear equation describing a translation-invariant system will have solutions given by plane waves ρ(x, t) = ρ̃ₖ(t)e^{ikx}.

We argue this important truth in detail in the appendix (section A.3). Here we just try it. Plugging a plane wave into the diffusion equation 2.7, we find

∂ρ/∂t = (dρ̃ₖ/dt) e^{ikx} = D ∂²ρ/∂x² = −Dk² ρ̃ₖ e^{ikx},    (2.22)

dρ̃ₖ/dt = −Dk² ρ̃ₖ,    (2.23)

ρ̃ₖ(t) = ρ̃ₖ(0) e^{−Dk²t}.    (2.24)

¹⁹ Make sure you know that g(x) = f(x − Δ) shifts the function in the positive direction: for example, the new function g(Δ) is at Δ what the old one was at the origin, g(Δ) = f(0).

These plane-wave solutions are not by themselves sensible densities, but we can combine them to get one. First, they are complex: we must add plane waves at k and −k to form cosine waves, or subtract them and divide by 2i to get sine waves. Cosines and sines are also not by themselves sensible densities (because they go negative), but they in turn can be added to one another (for example, added to a constant background ρ₀) to make for sensible densities. Indeed, we can superimpose all different wave-vectors to get the general solution

ρ(x, t) = (1/2π) ∫ ρ̃ₖ(0) e^{ikx} e^{−Dk²t} dk.    (2.25)

Here the coefficients ρ̃ₖ(0) we use are just the Fourier transform of the initial density profile,

ρ̃ₖ(0) = ∫ ρ(x, 0) e^{−ikx} dx,    (2.26)

and the solution is the initial condition time-evolved in Fourier space:

ρ̃ₖ(t) = ρ̃ₖ(0) e^{−Dk²t}.    (2.27)

This is a simple exponential decay law: the short-wavelength parts of ρ are squelched as time t evolves, with wavevector k being suppressed by a factor e^{−Dk²t}.
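The Fourier recipe of equations 2.25–2.27 is easy to implement: transform the initial density, multiply each mode by e^{−Dk²t}, and transform back. In the sketch below a naive discrete Fourier transform on a periodic grid stands in for the FFT, and the grid size, D, and t are illustrative choices, not from the text. It shows the short-wavelength ripple decaying far faster than the long-wavelength one.

```python
# Sketch: solve the diffusion equation by evolving Fourier modes as in
# equations 2.25-2.27 on a periodic grid.  A naive DFT stands in for the FFT.
import cmath, math

N, L, D, t = 64, 2 * math.pi, 0.5, 0.1
xs = [L * i / N for i in range(N)]
rho0 = [1.0 + math.cos(x) + 0.3 * math.cos(8 * x) for x in xs]

def dft(f, sign):
    return [sum(f[n] * cmath.exp(sign * 2j * math.pi * m * n / N)
                for n in range(N)) for m in range(N)]

rho_k = dft(rho0, -1)
ks = [m if m <= N // 2 else m - N for m in range(N)]    # grid wavevectors
rho_k_t = [rho_k[m] * math.exp(-D * (2 * math.pi * ks[m] / L) ** 2 * t)
           for m in range(N)]
rho_t = [z.real / N for z in dft(rho_k_t, +1)]

decay = max(rho_t) - 1.0   # peak height above the background rho_0 = 1
# the k = 1 mode survives (factor e^{-0.05}); the k = 8 ripple is crushed
print(round(decay, 2))     # prints 0.96 = e^{-0.05} + 0.3 e^{-3.2}, rounded
```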

2.4.2 Green

The Green's function method decomposes ρ into a family of solutions G(x − y, t) where all of the diffusing particles start at a particular point y.

Let's first consider the case where all particles start at the origin. Suppose we have one unit of perfume, released at the origin at time t = 0. What is the initial condition ρ(x, t = 0)? Clearly ρ(x, 0) = 0 unless x = 0, and ∫ρ(x, 0) dx = 1, so ρ(0, 0) must be really, really infinite. This is of course the Dirac delta function δ(x), which mathematically (when integrated) is a linear operator on functions returning the value of the function at zero:

∫ f(y) δ(y) dy = f(0).    (2.28)

Let's define the Green's function G(x, t) to be the time evolution of the density G(x, 0) = δ(x) with all the perfume at the origin. Naturally, G(x, t) obeys the diffusion equation ∂G/∂t = D ∂²G/∂x². We can use the Fourier transform methods of the previous section to solve for G(x, t). The Fourier transform at t = 0 is

G̃ₖ(0) = ∫ G(x, 0) e^{−ikx} dx = ∫ δ(x) e^{−ikx} dx = 1.    (2.29)

Fig. 2.6 10,000 endpoints of random walks, each 1000 steps long. Notice that after 1000 steps, the distribution of endpoints looks quite Gaussian. Indeed after about five steps the distribution is extraordinarily close to Gaussian, except far in the tails.

Hence the time evolution in Fourier space is G̃ₖ(t) = G̃ₖ(0) e^{−Dk²t} = e^{−Dk²t}, and the time evolution in real space is

G(x, t) = (1/2π) ∫ G̃ₖ(0) e^{−Dk²t} e^{ikx} dk = (1/2π) ∫ e^{ikx} e^{−Dk²t} dk.    (2.30)

This last integral is the Fourier transform of a Gaussian. This transform can be performed²⁰ giving another Gaussian,²¹

G(x, t) = (1/2π) e^{−x²/4Dt} ∫_{−∞−ix/2Dt}^{+∞−ix/2Dt} e^{−Dt z²} dz.    (2.31)

²⁰ If we complete the square in the integrand, e^{ikx} e^{−Dk²t} = e^{−Dt(k − ix/2Dt)²} e^{−x²/4Dt}, and change variables to z = k − ix/2Dt.

²¹ It's useful to remember that the Fourier transform of a normalized Gaussian (1/√(2π)σ) exp(−x²/2σ²) is another Gaussian, exp(−σ²k²/2), of standard deviation 1/σ and with no prefactor.


G(x, t) = (1/√(4πDt)) e^{−x²/4Dt}.    (2.32)

This is the Green's function for the diffusion equation.

The Green's function directly tells us the distribution of the endpoints of random walks centered at the origin (figure 2.6). Does it agree with our formula √⟨x²⟩ = a√N for N-step random walks of step size a (section 2.1)? At time t, the Green's function (equation 2.32) is a Gaussian with root-mean-square standard deviation σ(t) = √(2Dt); plugging in our diffusion constant D = a²/2Δt (equation 2.13), we find an RMS distance of σ(t) = a√(t/Δt) = a√N, where N = t/Δt is the number of steps taken in the random walk: our two methods do agree.
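Figure 2.6's comparison can be redone numerically: the endpoints of many ±a random walks should match the Green's function Gaussian, with RMS spread a√N and roughly 68% of endpoints within one standard deviation. A sketch with illustrative walk counts (not from the text):

```python
# Sketch: endpoints of N-step random walks of step size a should have RMS
# spread a*sqrt(N), matching the Gaussian Green's function of equation 2.32.
import math, random

random.seed(1)
a, N, walks = 1.0, 400, 2000
ends = [sum(random.choice((-a, a)) for _ in range(N)) for _ in range(walks)]

rms = math.sqrt(sum(x * x for x in ends) / walks)
print(round(rms / (a * math.sqrt(N)), 1))    # close to 1.0
frac = sum(abs(x) < rms for x in ends) / walks
print(round(frac, 2))                        # close to the Gaussian value 0.68
```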

Finally, since the diffusion equation has translational symmetry, we can solve for the evolution of random walks centered at any point y: the time evolution of an initial condition δ(x − y) is G(x − y, t). Since we can write any initial condition ρ(x, 0) as a superposition of δ-functions

ρ(x, 0) = ∫ ρ(y, 0) δ(x − y) dy    (2.33)

= ∫ ρ(y, 0) G(x − y, 0) dy,    (2.34)

the density at later times is given by evolving each point source with the Green's function:

ρ(x, t) = ∫ ρ(y, 0) G(x − y, t) dy = ∫ ρ(y, 0) (1/√(4πDt)) e^{−(y−x)²/4Dt} dy.    (2.35)

This equation states that the current value of the density is given by the original values of the density in the neighborhood, smeared sideways (convolved) with the function G.

Thus by writing ρ as a superposition of point sources, we find that the diffusion equation smears out all the sharp features, averaging ρ over ranges that grow proportionally to the typical random walk distance √(2Dt).
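The convolution picture above can be implemented directly on a grid: smear an initial density against the Gaussian G(x − y, t) and check that a sharp spike acquires width √(2Dt). A sketch with illustrative grid spacing and D (not from the text):

```python
# Sketch: evolve a density by convolving it with the Green's function G,
# then measure the resulting RMS width, which should be sqrt(2 D t).
import math

D, t, dx = 0.5, 2.0, 0.05
xs = [dx * i for i in range(-200, 201)]
rho0 = [1.0 / dx if abs(x) < dx / 2 else 0.0 for x in xs]  # ~ delta function

def G(x):
    return math.exp(-x * x / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

rho_t = [sum(rho0[j] * G(x - xs[j]) * dx for j in range(len(xs))) for x in xs]

sigma = math.sqrt(sum(r * x * x * dx for x, r in zip(xs, rho_t)))
print(round(sigma / math.sqrt(2 * D * t), 2))   # prints 1.0: width sqrt(2Dt)
```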

²⁰ (continued) If we then shift the limits of integration upward to the real axis, we get a familiar integral (exercise 1.2) giving the prefactor 1/√(4πDt). This last step (shifting the limits of integration) is not trivial: we must rely on Cauchy's theorem, which allows one to deform the integration contour in the complex plane.

Exercises

Exercises 2.1, 2.2, and 2.3 give simple examples of random walks in different contexts. Exercises 2.4 and 2.5 illustrate the qualitative behavior of the Fourier and Green's function approaches to solving the diffusion equation. Exercises 2.6 and 2.7 apply the diffusion equation in the


familiar context of thermal conductivity.²² Exercise 2.8 explores self-avoiding random walks: in two dimensions, we find that the constraint that the walk must avoid itself gives new critical exponents and a new universality class (see also chapter 13).

Random walks also arise in non-equilibrium situations.

- They arise in living systems. Bacteria search for food (chemotaxis) using a biased random walk, randomly switching from a swimming state (random walk step) to a tumbling state (scrambling the velocity); see [10].

- They arise in economics: Black and Scholes [111] analyze the approximate random walks seen in stock prices (figure 2.3) to estimate the price of options. How much you charge a customer who wants a guarantee that they can buy stock X at price Y at time t depends not only on whether the average price will rise past Y, but also on whether a random fluctuation will push it past Y.

- They arise in engineering studies of failure. If a bridge strut has N microcracks each with a failure stress σᵢ, and these stresses have probability density ρ(σ), the engineer is not concerned with the average failure stress, but the minimum. This introduces the study of extreme value statistics: in this case, the failure time distribution is very generally described by the Weibull distribution.

(2.1) Random walks in Grade Space.

Let's make a simple model of the prelim grade distribution. Let's imagine a multiple-choice test of ten problems of ten points each. Each problem is identically difficult, and the mean is 70. How much of the point spread on the exam is just luck, and how much reflects the differences in skill and knowledge of the people taking the exam? To test this, let's imagine that all students are identical, and that each question is answered at random with a probability 0.7 of getting it right.

(a) What is the expected mean and standard deviation for the exam? (Work it out for one question, and then use our theorems for a random walk with ten steps.)

A typical exam with a mean of 70 might have a standard deviation of about 15.

(b) What physical interpretation do you make of the ratio of the random standard deviation and the observed one?

(2.2) Photon diffusion in the Sun. (Easy)

Most of the fusion energy generated by the Sun is produced near its center. The Sun is 7 × 10⁵ km in radius. Convection probably dominates heat transport in approximately the outer third of the Sun, but it is believed that energy is transported through the inner portions (say to a radius R = 5 × 10⁸ m) through a random walk of X-ray photons. (A photon is a quantized package of energy: you may view it as a particle which always moves at the speed of light c. Ignore for this problem the index of refraction of the Sun.) Assume that the mean free path ℓ for the photon is ℓ = 5 × 10⁻⁵ m.

About how many random steps N will the photon take of length ℓ to get to the radius R where convection becomes important? About how many years Δt will it take for the photon to get there? (You may assume for this problem that the photon takes steps in random directions, each of equal length given by the mean free path.) Related formulæ: c = 3 × 10⁸ m/s; ⟨x²⟩ ≈ 2Dt; ⟨s²ₙ⟩ = nσ² = n⟨s²₁⟩. There are 31,556,925.9747 ≈ π × 10⁷ ≈ 3 × 10⁷ seconds in a year.

(2.3) Ratchet and Molecular Motors. (Basic, Biology)

Read Feynman's Ratchet and Pawl discussion in reference [86, I.46] for this problem. Feynman's ratchet and pawl discussion obviously isn't so relevant to machines you can make in your basement shop. The thermal fluctuations which turn the wheel to lift the flea are too small to be noticeable on human length and time scales (you need to look in a microscope to see Brownian motion). On the other hand, his discussion turns out to be surprisingly close to how real cells move things around. Physics professor Michelle Wang studies these molecular motors in the basement of Clark Hall.

Inside your cells, there are several different molecular motors, which move and pull and copy (figure 2.7). There are molecular motors which contract your muscles, there are motors which copy your DNA into RNA and copy your RNA into protein, there are motors which transport biomolecules around in the cell. All of these motors share some common features: (1) they move along some linear track (microtubule, DNA, ...), hopping forward in discrete jumps between low-energy positions, (2) they consume energy (burning ATP or NTP) as they move, generating an effective force pushing them forward, and (3) their mechanical properties can be studied by seeing how their motion changes as the external force on them is changed (figure 2.8).

²² We could have derived this law of thermal conductivity from random walks of phonons, but we haven't. We'll give general arguments in chapter 10 that an energy flow linear in the thermal gradient is to be expected on very general grounds.
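The related formula ⟨s²ₙ⟩ = n⟨s²₁⟩ quoted in exercise 2.2 is easy to check by simulation. The sketch below averages many unit-step walks taking random directions in the plane; the walk and sample counts are illustrative choices, not from the text.

```python
# Sketch: verify <s_N^2> = N <s_1^2> for walks whose steps are unit length
# in a uniformly random direction in two dimensions.
import math, random

random.seed(2)
N, walks = 100, 5000
s2 = 0.0
for _ in range(walks):
    x = y = 0.0
    for _ in range(N):
        theta = random.uniform(0.0, 2.0 * math.pi)
        x += math.cos(theta)
        y += math.sin(theta)
    s2 += x * x + y * y
print(round(s2 / walks / N, 1))   # close to 1.0: <s_N^2>/N matches <s_1^2>
```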


Fig. 2.7 Cartoon of a motor protein, from reference [48]. As it carries some cargo along the way (or builds an RNA or protein, ...) it moves against an external force f_ext and consumes ATP molecules, which are hydrolyzed to ADP and phosphate (P).

Fig. 2.9 The effective potential for moving along the DNA (from reference [48]). Ignoring the tilt W_e, Feynman's energy barrier ε is the difference between the bottom of the wells and the top of the barriers. The experiment changes the tilt by adding an external force pulling to the left. In the absence of the external force, W_e is the (Gibbs free) energy released when one NTP is burned and one RNA nucleotide is attached.

The motor moves on average one base pair (A, T, G or C) per step: ℓ is about 0.34 nm. We can think of the triangular grooves in the ratchet as being the low-energy states of the motor when it is resting between steps. The barrier between steps has an asymmetric shape (figure 2.9), just like the energy stored in the pawl is ramped going up and steep going down. Professor Wang showed (in a later paper) that the motor stalls at an external force of about 27 pN (pico-Newtons).

(a) At that force, what is the energy difference between neighboring wells due to the external force from the bead? (This corresponds to L in Feynman's ratchet.) Let's assume that this force is what's needed to balance the natural force downhill that the motor develops to propel the transcription process. What does this imply about the ratio of the forward rate to the backward rate, in the absence of the external force from the laser tweezers, at a temperature of 300 K (from Feynman's discussion preceding equation 46.1)? (k_B = 1.381 × 10⁻²³ J/K.)

The natural force downhill is coming from the chemical reactions which accompany the motor moving one base pair: the motor burns up an NTP molecule into a PPi molecule, and attaches a nucleotide onto the RNA. The net energy from this reaction depends on details, but varies between about 2 and 5 times 10⁻²⁰ Joule. This is actually a Gibbs free energy difference, but for this problem treat it as just an energy difference.

(b) The motor isn't perfectly efficient: not all the chemical energy is available as motor force. From your answer to part (a), give the efficiency of the motor as the ratio of force-times-distance produced to energy consumed, for the range of consumed energies given.

Fig. 2.8 Cartoon of Cornell professor Michelle Wang's early laser tweezer experiment (reference [119]). (A) The laser beam is focused at a point (the laser trap); the polystyrene bead is pulled (from dielectric effects) into the intense part of the light beam. The track is a DNA molecule attached to the bead, the motor is an RNA polymerase molecule, the cargo is the glass cover slip to which the motor is attached. (B) As the motor (RNA polymerase) copies DNA onto RNA, it pulls the DNA track toward itself, dragging the bead out of the trap, generating a force resisting the motion. (C) A mechanical equivalent, showing the laser trap as a spring and the DNA (which can stretch) as a second spring.


(2.4) Solving Diffusion: Fourier and Green.²³ (Basic)

Fig. 2.10 Initial profile of density deviation from average: ρ(x, t = 0) − ρ₀, ranging from −0.4 to 0.4, plotted against position x from 0 to 20.

An initial density profile ρ(x, t = 0) is perturbed slightly away from a uniform density ρ₀, as shown in figure 2.10. The density obeys the diffusion equation ∂ρ/∂t = D ∂²ρ/∂x², where D = 0.001 m²/s. The lump centered at x = 5 is a Gaussian exp(−x²/2)/√(2π), and the wiggle centered at x = 15 is a smooth envelope function multiplying cos(10x).

(a) Fourier. As a first step in guessing how the pictured density will evolve, let's consider just a cosine wave. If the initial wave were ρ_cos(x, 0) = cos(10x), what would it be at t = 10 s? Related formulæ: ρ̃(k, t) = ρ̃(k, t′) G̃(k, t − t′); G̃(k, t) = exp(−Dk²t).

(b) Green. As a second step, let's check how long it would take to spread out as far as the Gaussian on the left. If the wave at some earlier time −t₀ were a δ function at x = 0, ρ(x, −t₀) = δ(x), what choice of the time elapsed t₀ would yield a Gaussian ρ(x, 0) = exp(−x²/2)/√(2π) for the given diffusion constant D = 0.001 m²/s? Related formulæ: ρ(x, t) = ∫ ρ(y, t′) G(y − x, t − t′) dy; G(x, t) = (1/√(4πDt)) exp(−x²/(4Dt)).

(c) Pictures. Now consider time evolution for the next ten seconds. The initial density profile ρ(x, t = 0) is again shown in figure 2.10. Which of the choices in figure 2.11 represents the density at t = 10 s? (Hint: compare t = 10 s to the time t₀ from part (b).) Related formulæ: ⟨x²⟩ ≈ 2Dt.

(2.5) Solving the Diffusion Equation. (Basic)

Consider a one-dimensional diffusion equation ∂ρ/∂t = D ∂²ρ/∂x², with initial condition periodic in space with period L, consisting of a δ function at every xₙ = nL: ρ(x, 0) = Σₙ₌₋∞^∞ δ(x − nL).

(a) Using the Green's function method, give an approximate expression for the density, valid at short times and for −L/2 < x < L/2, involving only one term (not an infinite sum). (Hint: how many of the Gaussians are important in this region at early times?)

(b) Using the Fourier method,²⁴ give an approximate expression for the density, valid at long times, involving only two terms (not an infinite sum). (Hint: how many of the wavelengths are important at late times?)

(c) Give a characteristic time τ in terms of L and D, such that your answer in (a) is valid for t ≪ τ and your answer in (b) is valid for t ≫ τ.

(2.6) Frying Pan. (Basic)

An iron frying pan is quickly heated on a stove top to 400 degrees Celsius. Roughly how long will it be before the handle is too hot to touch (within, say, a factor of two)? (Adapted from reference [90, p. 40].)

Do this three ways.

(a) Guess the answer from your own experience. If you've always used aluminum pans, consult a friend or parent.

(b) Get a rough answer by a dimensional argument. You need to transport heat c_p ρ V ΔT across an area A = V/Δx. How much heat will flow across that area per unit time, if the temperature gradient is roughly assumed to be ΔT/Δx? How long δt will it take to transport the amount needed to heat up the whole handle?

(c) Roughly model the problem as the time needed for a pulse of heat at x = 0 on an infinite rod to spread out a distance equal to the length of the handle, and use the Green's function for the heat diffusion equation (problems 10.3 and 10.4 below). How long until the pulse spreads out a root-mean-square distance σ(t) equal to the length of the handle?

Note: For iron, the specific heat c_p = 450 J/kg·°C, the density ρ = 7900 kg/m³, and the thermal conductivity k_t = 80 W/m·°C.

²⁴ If you use a Fourier transform of ρ(x, 0), you'll need to sum over n to get δ-function contributions at discrete values of k = 2πm/L. If you use a Fourier series, you'll need to unfold the sum over n of partial Gaussians into a single integral over an unbounded Gaussian.
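For a feel for the numbers, the sketch below computes the thermal diffusivity D = k_t/(c_p ρ) implied by the iron constants in exercise 2.6's note, and the diffusion time across a handle; the 0.2 m handle length is a made-up illustrative value, not from the text.

```python
# Sketch: thermal diffusivity of iron from the constants quoted in the note,
# and the diffusion time scale t ~ L^2/(2D) across an assumed handle length.
cp, rho, kt = 450.0, 7900.0, 80.0   # J/kg C, kg/m^3, W/m C
D = kt / (cp * rho)                 # m^2/s
L = 0.2                             # assumed handle length (illustrative), m
t = L * L / (2 * D)                 # time to diffuse a distance L
print(round(D, 7))                  # about 2.3e-05 m^2/s
print(round(t / 60))                # a minutes-scale time
```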


Fig. 2.11 Five candidate density profiles (A)–(E) for ρ(x, t = 10) − ρ₀, plotted against position x. (Panel (A) spans 10 < x < 30; the other panels span 0 < x < 20, with vertical scales from −0.4 to 0.4 where shown.)

(2.7) Thermal Diffusion. (Basic)

The rate of energy flow in a material with thermal conductivity k_t and a temperature field T(x, y, z, t) = T(r, t) is J = −k_t ∇T.²⁵ Energy is locally conserved, so the energy density E satisfies ∂E/∂t = −∇·J.

(a) If the material has constant specific heat c_p and density ρ, so E = c_p ρ T, show that the temperature T satisfies the diffusion equation ∂T/∂t = (k_t / c_p ρ) ∇²T.

(b) By putting our material in a cavity with microwave standing waves, we heat it with a periodic modulation T = sin(kx) at t = 0, at which time the microwaves are turned off. Show that the amplitude of the temperature modulation decays exponentially in time. How does the amplitude decay rate depend on wavelength λ = 2π/k?

(2.8) Polymers and Random Walks.

Polymers are long molecules, typically made of identical small molecules called monomers that are bonded together in a long, one-dimensional chain. When dissolved in a solvent, the polymer chain configuration often forms a good approximation to a random walk. Typically, neighboring monomers will align at relatively small angles: several monomers are needed to lose memory of the original angle. Instead of modeling all these small angles, we can produce an equivalent problem focusing all the bending in a few hinges: we approximate the polymer by an uncorrelated random walk of straight segments several monomers in length. The equivalent segment size is called the persistence length.²⁶

(a) If the persistence length to bending of DNA is 50 nm, with 3.4 Å per nucleotide base pair, what will the root-mean-square distance √⟨R²⟩ be between the ends of a gene in solution with 100,000 base pairs, if the DNA is accurately represented as a random walk?

Polymers are not accurately represented as random walks, however. Random walks, particularly in low dimensions, often intersect themselves. Polymers are best represented as self-avoiding random walks: the polymer samples all possible configurations that do not cross themselves. (Greg Lawler, in the math department here, is an expert on self-avoiding random walks.)

Let's investigate whether avoiding itself will change the basic nature of the polymer configuration. In particular, does the end-to-end typical distance continue to scale with the square root of the length L of the polymer, R ∼ √L?

(b) Two-dimensional self-avoiding random walk. Give a convincing, short argument explaining whether or not a typical, non-self-avoiding random walk in two dimensions will come back after large numbers of monomers and cross itself. (Hint: how big a radius does it extend to? How many times does it traverse this radius?)

BU java applet. Run the Java applet linked to at reference [69]. (You'll need to find a machine with Java enabled.) They model a 2-dimensional random walk as a connected line between nearest-neighbor lattice points on the square lattice of integers. They start random walks at the origin, grow them without allowing backtracking, and discard them when they hit the same lattice point twice. As long as they survive, they average the squared length as a function of number of steps.

(c) Measure for a reasonable length of time, print out the current state, and enclose it. Did the simulation give R ∼ √L? If not, what's the estimate that your simulation gives for the exponent relating R to L? How does it compare with the two-dimensional theoretical exponent given at the Web site?

²⁵ We could have derived this law of thermal conductivity from random walks of phonons, but we haven't. We'll give general arguments in chapter 10 that an energy flow linear in the thermal gradient is to be expected on very general grounds.

²⁶ Some seem to define the persistence length with a different constant factor.
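The applet's growth scheme described in exercise 2.8 is easy to reproduce: grow non-backtracking walks on the square lattice, discard any that revisit a site, and average R² over the survivors. A sketch with illustrative counts (not from the text; survivors become scarce as walks grow, so this brute-force method only reaches modest lengths):

```python
# Sketch: non-backtracking square-lattice walks, discarded if they ever
# revisit a site; survivors' <R^2> swells beyond the ~2N of plain
# non-backtracking walks, hinting at the self-avoiding exponent.
import random

random.seed(3)
STEPS = {(1, 0), (-1, 0), (0, 1), (0, -1)}

def grow(nmax):
    """One non-backtracking walk; |end|^2 per step, or None if it intersects."""
    x, y, last = 0, 0, (0, 0)
    seen, r2 = {(0, 0)}, []
    for _ in range(nmax):
        dx, dy = random.choice([s for s in STEPS if s != (-last[0], -last[1])])
        x, y, last = x + dx, y + dy, (dx, dy)
        if (x, y) in seen:
            return None
        seen.add((x, y))
        r2.append(x * x + y * y)
    return r2

nmax, tries = 40, 20000
sums, counts = [0.0] * nmax, 0
for _ in range(tries):
    walk = grow(nmax)
    if walk is not None:
        sums = [s + r for s, r in zip(sums, walk)]
        counts += 1

mean_r2 = sums[-1] / counts
print(mean_r2 > 2 * nmax)   # True: survivors are swollen relative to ~2N
```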


3 Temperature and Equilibrium

We now turn to study the equilibrium behavior of matter: the historical origin of statistical mechanics. We will switch in this chapter between discussing the general theory and applying it to a particular system: the ideal gas. The ideal gas provides a tangible example of the formalism, and its solution will provide a preview of material coming in the next few chapters.

A system which is not acted upon by the external world¹ is said to approach equilibrium if and when it settles down at long times to a state which is independent of the initial conditions (except for conserved quantities like the total energy). Statistical mechanics describes the equilibrium state as an average over all states consistent with the conservation laws: this microcanonical ensemble is introduced in section 3.1. In section 3.2, we shall calculate the properties of the ideal gas using the microcanonical ensemble. In section 3.3 we shall define entropy and temperature for equilibrium systems, and argue from the microcanonical ensemble that heat flows to maximize the entropy and equalize the temperature. In section 3.4 we will derive the formula for the pressure in terms of the entropy, and define the chemical potential. In section 3.5 we calculate the entropy, temperature, and pressure for the ideal gas, and introduce some refinements to our definitions of phase space volume. Finally, in section 3.6 we discuss the relation between statistical mechanics and thermodynamics.

Statistical mechanics allows us to solve en masse many problems that are impossible to solve individually. In this chapter we address the general equilibrium behavior of N atoms in a box of volume V: any kinds of atoms, in arbitrary external conditions. Let's presume for simplicity that the walls of the box are smooth and rigid, so that energy is conserved when atoms bounce off the walls. This makes our system isolated, independent of the world around it.

How can we solve for the behavior of our atoms? If we ignore quantum mechanics, we can in principle determine the positions² Q = (x₁, y₁, z₁, x₂, ..., x_N, y_N, z_N) = (q₁ ... q₃N) and momenta P = (p₁, ... p₃N) of the particles at any future time given their initial positions and momenta,

¹ If the system is driven (e.g., there are externally imposed forces or currents) we instead call this final condition the steady state. If the system is large, the equilibrium state will also usually be time independent and calm, hence the name. Small systems will continue to fluctuate substantially even in equilibrium.

² The 3N-dimensional space of positions Q is called configuration space. The 3N-dimensional space of momenta P is called momentum space. The 6N-dimensional space (P, Q) is called phase space.


Q̇ = m⁻¹ P,    Ṗ = F(Q)    (3.1)

(where F is the 3N-dimensional force due to the other particles and the walls, and m is the particle mass).³

In general, solving these equations is plainly not feasible.

- Many systems of interest involve far too many particles to allow one to solve for their trajectories.

- Most systems of interest exhibit chaotic motion, where the time evolution depends with ever increasing sensitivity on the initial conditions: you cannot know enough about the current state to predict the future.

- Even if it were possible to evolve our trajectory, knowing the solution would for most purposes be useless: we're far more interested in the typical number of atoms striking a wall of the box, say, than the precise time a particular particle hits.⁴

How can we extract the simple, important predictions out of the complex trajectories of these atoms? The chaotic time evolution will rapidly scramble⁵ whatever knowledge we may have about the initial conditions of our system, leaving us effectively knowing only the conserved quantities for our system: just the total energy E.⁶ Rather than solving for the behavior of a particular set of initial conditions, let us hypothesize that the energy is all we need to describe the equilibrium state. This leads us to a statistical mechanical description of the equilibrium state of our system as an ensemble of all possible initial conditions with energy E: the microcanonical ensemble.

We calculate the properties of our ensemble by averaging over states with energies in a shell (E, E + δE), taking the limit⁷ δE → 0 (figure 3.1). Let's define the function Ω(E) to be the phase-space volume of this thin shell:

Ω(E) δE = ∫_{E < H(P,Q) < E+δE} dP dQ.    (3.2)

Here H(P, Q) is the Hamiltonian for our system.⁸ Finding the average ⟨A⟩ of a property A in the microcanonical ensemble is done by averaging A(P, Q) over this same energy shell,⁹

⟨A⟩_E = (1 / Ω(E)δE) ∫_{E < H(P,Q) < E+δE} A(P, Q) dP dQ.    (3.4)

Fig. 3.1 The shell of energies between E and E + δE can have an irregular thickness. The volume of this shell in 6N-dimensional phase space, divided by δE, is the definition of Ω(E). Notice that the microcanonical average weights the thick regions more heavily. We shall see in section 4.1 that this is the correct way to take the average: just as a water drop in a river spends more time in the deep sections where the water flows slowly, so also a trajectory in phase space spends more time in the thick regions where it moves more slowly.

³ m is a diagonal matrix if the particles aren't all the same mass.

⁴ Of course, there are applications where the precise evolution of a particular system is of interest. It would be nice to predict the time at which a particular earthquake fault will yield, so as to warn everyone to go for a picnic outdoors. Statistical mechanics, broadly speaking, is helpless in computing such particulars. The budget of the weather bureau is a good illustration of how hard such system-specific predictions are.

⁵ This scrambling, of course, is precisely the approach to equilibrium.

⁶ If our box were spherical, angular momentum would also be conserved.

⁷ What about quantum mechanics, where the energy levels in a finite system are discrete? In that case (chapter 7), we will need to keep δE large compared to the spacing between energy eigenstates, but small compared to the total energy.

⁸ The Hamiltonian H is the function of P and Q that gives the energy. For our purposes, this will always be P²/2m + V(Q) = Σ_{α=1}^{3N} p_α²/2m + V(q₁, ..., q₃N), where the force in Newton's laws 3.1 is F = −∇_Q V.

⁹ It is convenient to write the energy shell E < H(P, Q) < E + δE in terms of the Heaviside step function

Θ(x) = 1 for x ≥ 0;  Θ(x) = 0 for x < 0.    (3.3)

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

3.2 The Microcanonical Ideal Gas 31

Notice that, by averaging equally over all states in phase space com-

patible with our knowledge about the system (that is, the conserved

energy), we have made a hidden assumption: all points in phase space

(with a given energy) are a priori equally likely, so the average should

treat them all with equal weight. In section 3.2, we will see that this

assumption leads to sensible behavior, by solving the simple case of an

ideal gas. We will fully justify this equal-weighting assumption in chap-

ter 4, where we will also discuss the more challenging question of why

so many systems actually reach equilibrium.

The fact that the microcanonical distribution describes equilibrium

systems should be amazing to you. The long-time equilibrium behavior

of a system is precisely the typical behavior of all systems with the same

value of the conserved quantities. This fundamental regression to the

mean is the basis of statistical mechanics.

We can talk about a general collection of atoms, and derive general statistical mechanical truths for them, but to calculate specific properties we must choose a particular system. The simplest statistical mechanical

In the limit δE → 0 (figure 3.1), we can write Ω(E) as a derivative:

    Ω(E) δE = ∫_{E < H(P,Q) < E+δE} dP dQ
            = ∫ dP dQ [Θ(E + δE − H) − Θ(E − H)]
            = δE (∂/∂E) ∫ dP dQ Θ(E − H),    (3.4)

and the expectation of a general operator A as

    ⟨A⟩ = (1/(Ω(E) δE)) ∫ dP dQ [Θ(E + δE − H) − Θ(E − H)] A(P,Q)
        = (1/Ω(E)) (∂/∂E) ∫ dP dQ Θ(E − H) A(P,Q).    (3.5)

It will be important later to note that the derivatives in equations 3.4 and 3.5 are at constant N and constant V: ∂/∂E|_{V,N}. Finally, we know that the derivative of the Heaviside function is the Dirac δ-function. (You may think of δ(x) as the limit as ε → 0 of a function which is 1/ε in the range (0, ε). Mathematicians may think of it as a point mass at the origin.) Hence

    Ω(E) = ∫ dP dQ δ(E − H(P,Q)),    (3.6)

    ⟨A⟩ = (1/Ω(E)) ∫ dP dQ δ(E − H(P,Q)) A(P,Q).    (3.7)

Thus the microcanonical ensemble can be written as a probability density δ(E − H(P,Q))/Ω(E) in phase space. The average 3.7 is of course the integral divided by the volume Ω(E) δE:

    ⟨A⟩ = (1/(Ω(E) δE)) ∫_{E < H(P,Q) < E+δE} dP dQ A(P,Q).    (3.8)

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

32 Temperature and Equilibrium

system is the monatomic10 ideal gas. You can think of helium atoms at high temperatures and low densities as a good approximation to this ideal gas: the atoms have very weak long-range interactions and rarely collide. The ideal gas will be the limit when the interactions between particles vanish.11

10 Air is a mixture of gasses, but most of the molecules are diatomic: O2 and N2, with a small admixture of triatomic CO2 and monatomic Ar. The properties of diatomic ideal gasses are almost as simple, but one must keep track of the internal rotational degree of freedom (and, at high temperatures, the vibrational degrees of freedom).

3.2.1 Configuration Space

For the ideal gas, the energy does not depend upon the spatial configuration Q of the particles. This allows us to study the positions separately from the momenta (next subsection). Since the energy is independent of the position, our microcanonical ensemble must weight all configurations equally. That is to say, it is precisely as likely that all the particles will be within a distance ε of the middle of the box as it is that they will be within a distance ε of any other particular configuration.

What is the probability density ρ(Q) that the ideal gas particles will be in a particular configuration Q ∈ R^{3N} inside the box of volume V? We know ρ is a constant, independent of the configuration. We know that the gas atoms are in some configuration, so ∫ ρ dQ = 1. The integral over the positions gives a factor of V for each of the N particles, so ρ(Q) = 1/V^N.

It may be counterintuitive that unusual configurations, like all the particles on the right half of the box, have the same probability density as more typical configurations. If there are two non-interacting particles in an L × L × L box centered at the origin, what is the probability that both are on the right (have x > 0)? The probability that two particles are on the right half is the integral of ρ = 1/L⁶ over the six-dimensional volume where both particles have x > 0. The volume of this space is (L/2) × L × L × (L/2) × L × L = L⁶/4, so the probability is 1/4, just as one would calculate by flipping a coin for each particle. The probability that N such particles are on the right is 2^{−N}, just as your intuition would suggest. Don't confuse probability density with probability! The unlikely states for molecules are not those with small probability density. Rather, they are states with small net probability, because their allowed configurations and/or momenta occupy insignificant volumes of the total phase space.
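The coin-flip result is easy to check by direct sampling. Here is a minimal sketch (mine, not from the text), drawing uniform positions in a box; the box size and trial count are arbitrary choices.

```python
import random

def prob_all_right(n_particles, trials=100_000, L=1.0):
    """Estimate the probability that all n_particles have x > 0,
    with each x drawn uniformly on (-L/2, L/2)."""
    hits = 0
    for _ in range(trials):
        if all(random.uniform(-L / 2, L / 2) > 0 for _ in range(n_particles)):
            hits += 1
    return hits / trials

random.seed(0)
print(prob_all_right(2))   # close to 1/4
print(prob_all_right(5))   # close to 2**-5 = 0.03125
```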

Notice that configuration space typically has dimension equal to several times Avogadro's number.12 Enormous-dimensional vector spaces have weird properties, which directly lead to important principles in statistical mechanics. For example, most of configuration space has almost exactly half the x-coordinates on the right side of the box.

12 A gram of hydrogen has approximately N = 6.02 × 10^23 atoms, known as Avogadro's number. So, a typical 3N will be around 10^24.

If there are 2N non-interacting particles in the box, what is the probability Pm that N+m of them will be on the right half? There are 2^{2N} equally likely ways the distinct particles could sit on the two sides of

11 With no interactions, how can the ideal gas reach equilibrium? If the particles never collide, they will forever travel with whatever initial velocities we started them. We imagine delicately taking the long-time limit first, before taking the limit of weak interactions, so we can presume an equilibrium distribution has been established.


the box. Of these, C(2N, N+m) = (2N)!/((N+m)!(N−m)!) have m extra particles on the right half.13 So,

    Pm = 2^{−2N} C(2N, N+m) = 2^{−2N} (2N)!/((N+m)!(N−m)!).    (3.10)

We can calculate the fluctuations in the number on the right using Stirling's formula,14

    n! ≈ √(2πn) (n/e)^n ≈ (n/e)^n.    (3.11)

For now, let's use the second, less accurate form: keeping the factor √(2πn) would fix the prefactor in the final formula (exercise 3.4), which we will instead derive by normalizing the total probability to one. Using Stirling's formula, equation 3.10 becomes

    Pm ≈ 2^{−2N} (2N/e)^{2N} ((N+m)/e)^{−(N+m)} ((N−m)/e)^{−(N−m)}
       = N^{2N} (N+m)^{−(N+m)} (N−m)^{−(N−m)}    (3.12)
       = (1 + m/N)^{−(N+m)} (1 − m/N)^{−(N−m)}
       = [(1 + m/N)(1 − m/N)]^{−N} (1 + m/N)^{−m} (1 − m/N)^{m}
       = (1 − m²/N²)^{−N} (1 + m/N)^{−m} (1 − m/N)^{m},

and, since m ≪ N, we may substitute 1 + ε ≈ exp(ε), giving us

    Pm ≈ (e^{−m²/N²})^{−N} (e^{m/N})^{−m} (e^{−m/N})^{m} = P0 exp(−m²/N),    (3.13)

where P0 is the prefactor we missed by not keeping enough terms in Stirling's formula. We know that the probabilities must sum to one, so, again for m ≪ N, 1 = Σ_m Pm ≈ P0 ∫ exp(−m²/N) dm = P0 √(πN). Hence

    Pm ≈ (1/√(πN)) exp(−m²/N).    (3.14)

13 C(p, q) = (p choose q) is the number of ways of choosing an unordered subset of size q from a set of size p. There are p(p−1)...(p−q+1) = p!/(p−q)! ways of choosing an ordered subset, since there are p choices for the first member and p−1 for the second... There are q! different ordered sets for each disordered one, so C(p, q) = p!/(q!(p−q)!).

14 Stirling's formula tells us that the average number in the product n! = n(n−1)...1 is roughly n/e. See exercise 1.4.
This is a nice result: it says that the number fluctuations are distributed according to a Gaussian or normal distribution15

    (1/(√(2π) σ)) exp(−x²/2σ²)

with a standard deviation σ_m = √(N/2). If we have Avogadro's number of particles, N ∼ 10^24, then the fractional fluctuations σ_m/N = 1/√(2N) ∼ 10^{−12} = 0.0000000001%. In almost all the volume of a box in R^{3N}, almost exactly half of the coordinates are on the right half of their range. In section 3.2.2 we will find another weird property of high-dimensional spaces.

We will find that the relative fluctuations of most quantities of interest in equilibrium statistical mechanics go as 1/√N. For many properties of macroscopic systems, statistical mechanical fluctuations about the average value are very small.

15 We derived exactly this result in section 2.4.2 using random walks and a continuum approximation, instead of Stirling's formula: this Gaussian is the Green's function for the number of heads in 2N coin flips. We'll derive it again in exercise 13.7 by deriving the central limit theorem using renormalization-group methods.
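Equation 3.14 can be checked against the exact binomial probability 3.10. A quick sketch (my own; log-Gamma avoids the enormous factorials):

```python
from math import lgamma, exp, log, pi, sqrt

def P_exact(N, m):
    """Exact P_m = 2**(-2N) (2N)! / ((N+m)! (N-m)!), via log-Gamma."""
    logp = (lgamma(2 * N + 1) - lgamma(N + m + 1) - lgamma(N - m + 1)
            - 2 * N * log(2))
    return exp(logp)

def P_gauss(N, m):
    """Gaussian approximation, equation 3.14."""
    return exp(-m ** 2 / N) / sqrt(pi * N)

N = 10_000
for m in (0, 50, 100, 200):
    print(m, P_exact(N, m), P_gauss(N, m))   # agree to better than a percent
```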

Working with the microcanonical momentum distribution is more challenging, but more illuminating, than working with the ideal gas configuration space of the last section. Here we must study the geometry of spheres in high dimensions.

Fig. 3.2 The energy surface in momentum space is the 3N−1 sphere of radius R = √(2mE). The condition that the x-component of the momentum of atom #1 is p1 restricts us to a circle (or rather a 3N−2 sphere) of radius R′ = √(2mE − p1²). The condition that the energy is in the shell (E, E+δE) leaves us with the annular region shown in the inset.

The kinetic energy for interacting particles is Σ_{α=1}^{3N} ½ m_α v_α² = Σ_{α=1}^{3N} p_α²/2m_α. If we assume all of our atoms have the same mass m, this simplifies to P²/2m. Hence the condition that the particles in our system have energy E is that the system lies on a sphere in 3N-dimensional momentum space of radius R = √(2mE). Mathematicians16 call this the 3N−1 sphere, S_R^{3N−1}. Specifically, if the energy of the system is known to be in a small range between E and E+δE, what is the corresponding volume of momentum space? The volume μ(S_R^{ℓ−1}) of the ℓ−1 sphere (in ℓ dimensions) of radius R is17

    μ(S_R^{ℓ−1}) = π^{ℓ/2} R^ℓ / (ℓ/2)!    (3.15)

17 Check this in two dimensions. Using (1/2)! = √π/2 and (3/2)! = 3√π/4, check it in one and three dimensions (see exercise 1.4 for n! for non-integer n). Is n! = n (n−1)! valid for n = 3/2?
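Equation 3.15, with (ℓ/2)! = Γ(ℓ/2 + 1), is easy to check against familiar low-dimensional results; the sketch below (mine) uses the standard library's gamma function.

```python
from math import gamma, pi

def sphere_volume(ell, R=1.0):
    """Volume inside the (ell-1)-sphere of radius R in ell dimensions:
    pi**(ell/2) * R**ell / (ell/2)!, equation 3.15."""
    return pi ** (ell / 2) * R ** ell / gamma(ell / 2 + 1)

print(sphere_volume(1))    # segment of length 2R: 2.0
print(sphere_volume(2))    # disk: pi R**2 = 3.14159...
print(sphere_volume(3))    # ball: 4/3 pi R**3 = 4.18879...
print(sphere_volume(60))   # tiny: unit balls lose volume in high dimensions
```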

The volume of the thin shell18 between E and E+δE is given by

    (Momentum shell volume)/δE = [μ(S^{3N−1}_{√(2m(E+δE))}) − μ(S^{3N−1}_{√(2mE)})]/δE
        = (d/dE) μ(S^{3N−1}_{√(2mE)})
        = (d/dE) [π^{3N/2} (2mE)^{3N/2} / (3N/2)!]
        = π^{3N/2} (3Nm) (2mE)^{3N/2 − 1} / (3N/2)!
        = (3N/2E) (2πmE)^{3N/2} / (3N/2)!.    (3.16)

18 This is not quite the surface area, since we're taking a shell of energy rather than radius. That's why its volume goes as R^{3N−2}, rather than R^{3N−1}.

Formula 3.16 is the main result of this section. Given our microcanonical

ensemble that equally weights all states with energy E, the probability

16 Mathematicians like to name surfaces, or manifolds, for the number of dimensions or local coordinates internal to the manifold, rather than the dimension of the space the manifold lives in. After all, one can draw a circle embedded in any number of dimensions (down to two). Thus a basketball is a two-sphere S², the circle is the one-sphere S¹, and the zero-sphere S⁰ consists of the two points ±1.


density for having any particular set of particle momenta P is the inverse of this shell volume.

Let's do a tangible calculation. Let's calculate the probability density ρ(p1) that the x-component of the momentum of the first atom is p1.19 The probability density that this momentum is p1 and the energy is in the range (E, E+δE) is proportional to the area of the annular region (between two 3N−2 spheres) in figure 3.2. The sphere has radius R = √(2mE), so by the Pythagorean theorem, the circle has radius R′ = √(2mE − p1²). The volume in momentum space of the 3N−2 dimensional annulus is given by using equation 3.15 with ℓ = 3N−1:

19 It is a sloppy physics convention to use ρ to denote probability densities of all sorts. Earlier, we used it to denote probability density in 3N-dimensional configuration space; here we use it to denote probability density in one variable. The argument of the function tells us which function we're considering.
dimensional annulus is given by using equation 3.15 with = 3N 1: ing.

3N 2

Annular Area/E = d S dE

2mEp1 2

d
(3N 1)/2

= (2mE p1 2 )(3N 1)/2 / 3N21 !

dE

= (3N 1)/2 ((3N 1)m)(2mE p1 2 )(3N 3)/2 / 3N21 !

= (3N 1)m (3N 1)/2 R3N 3 / 3N21 !

= [Constants]R 3N 3 , (3.17)

where weve dropped multiplicative factors that are independent of p1

and E. The probability density of being in the annulus is its area divided

by the shell volume in equation 3.16; this shell volume can be simplied

as well, dropping terms that do not depend on E:

    (Momentum shell volume)/δE = π^{3N/2} (3Nm) (2mE)^{3N/2 − 1} / (3N/2)!
        = 3Nm π^{3N/2} R^{3N−2} / (3N/2)!
        = [constants] R^{3N−2}.    (3.18)

Our formula for the probability density ρ(p1) is thus

    ρ(p1) = (Annular area)/(Momentum shell volume)
          = [(3N−1)m π^{(3N−1)/2} R′^{3N−3} / ((3N−1)/2)!] / [3Nm π^{3N/2} R^{3N−2} / (3N/2)!]
          = [constants] (R²/R′³) (R′/R)^{3N}    (3.19)
          = [constants] (R²/R′³) (1 − p1²/2mE)^{3N/2}.

The probability density ρ(p1) will be essentially zero unless R′/R = √(1 − p1²/2mE) is nearly equal to one, since this factor is taken to a power comparable to Avogadro's number. We can thus simplify R²/R′³ ≈ 1/R = 1/√(2mE) and, using 1 − ε ≈ exp(−ε), write (1 − p1²/2mE)^{3N/2} ≈ exp(−(p1²/2m)(3N/2E)), giving us

    ρ(p1) ∝ (1/√(2mE)) exp(−(p1²/2m)(3N/2E)).    (3.20)

The probability density ρ(p1) is a Gaussian distribution of standard deviation √(2mE/3N); we again can set the constant of proportionality by normalizing the distribution to one, giving

    ρ(p1) = (1/√(2πm(2E/3N))) exp(−(p1²/2m)(3N/2E)).    (3.21)

This gives the probability distribution of our particles explicitly in terms of E, N, and m, without ever considering a particular trajectory: this is what makes statistical mechanics powerful.
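Equation 3.21 can be tested by sampling. Normalizing a vector of independent Gaussians gives a point uniformly distributed on a sphere (a standard trick, not from the text); rescaling to radius R = √(2mE) and looking at one component should then reproduce a Gaussian of standard deviation √(2mE/3N). The values of N, m, and E below are arbitrary test choices.

```python
import random
from math import sqrt

random.seed(1)
N, mass, E = 500, 1.0, 750.0           # arbitrary test values
dim = 3 * N
R = sqrt(2 * mass * E)                 # radius of the energy sphere

p1_samples = []
for _ in range(2000):
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sqrt(sum(x * x for x in v))
    p1_samples.append(R * v[0] / norm)  # one momentum component

p1_std = sqrt(sum(p * p for p in p1_samples) / len(p1_samples))
print(p1_std, sqrt(2 * mass * E / (3 * N)))   # both close to 1.0
```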

Formula 3.21 tells us that most of the surface area of a large-dimensional sphere is very close to the equator. Think of p1 as the latitude on the sphere: the range of latitudes containing most of the area is δp = ±√(2mE/3N), while the total range of latitudes is ±√(2mE); the full height divided by the belt width is √(3N), around the square root of Avogadro's number. This is true whatever equator you choose, even intersections of several equators. Geometry is weird in high dimensions.

In the context of statistical mechanics, this seems much less strange: typical configurations of gasses have the kinetic energy divided roughly equally among all the components of momentum; configurations where one atom has most of the kinetic energy are vanishingly rare.

This formula foreshadows four key results that will emerge from our systematic study of equilibrium statistical mechanics in the following few chapters.

(1) Temperature. In our calculation, a single momentum component

competed for the available energy with the rest of the ideal gas. In

section 3.3 we will study the competition in general between two

large subsystems for energy, and will discover that the balance is

determined by the temperature. The temperature T for our ideal

gas will be given (equation 3.57) by kB T = 2E/3N.20 Equation 3.21 then gives us the important formula

    ρ(p1) = (1/√(2πm kB T)) exp(−p1²/2m kB T).    (3.22)

(2) Boltzmann distribution. The probability of the x-momentum of the first particle having kinetic energy K = p1²/2m is proportional to exp(−K/kB T) (equation 3.22). This is our first example of a Boltzmann distribution. We shall see in section 5.2 that the probability of a small subsystem being in a particular state21 of energy E will in completely general contexts be proportional to exp(−E/kB T).

(3) Equipartition theorem. The average kinetic energy ⟨p1²/2m⟩ from equation 3.22 is kB T/2. This is an example of the equipartition theorem (section 5.3): each harmonic degree of freedom in an equilibrium classical system has average energy kB T/2.

20 We shall see that temperature is naturally measured in units of energy. Historically we measure temperature in degrees and energy in various other units (Joules, ergs, calories, eV, foot-pounds, ...); Boltzmann's constant kB is the conversion factor between units of temperature and units of energy.

21 This is different from the probability of the subsystem having energy E, which is the product of the Boltzmann probability times the number of states with that energy.
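The equipartition result ⟨p1²/2m⟩ = kB T/2 can be checked by direct numerical integration of distribution 3.22. A sketch (mine, with kB T and m set to arbitrary values, summing on a grid wide enough to capture the Gaussian):

```python
from math import exp, pi, sqrt

kT, mass = 1.7, 2.3          # arbitrary units
sigma = sqrt(mass * kT)      # width of the momentum Gaussian, equation 3.22

def rho(p):
    return exp(-p * p / (2 * mass * kT)) / sqrt(2 * pi * mass * kT)

# Riemann-sum averages over p1 in (-10 sigma, 10 sigma).
dp = sigma / 1000
ps = [i * dp for i in range(-10_000, 10_001)]
norm = sum(rho(p) for p in ps) * dp
avg_KE = sum((p * p / (2 * mass)) * rho(p) for p in ps) * dp

print(norm)                # ~1: the distribution is normalized
print(avg_KE, kT / 2)      # equipartition: both ~0.85
```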

(4) General classical momentum distribution. Our derivation was in the context of a monatomic ideal gas. But we could have done an analogous calculation for a system with several gasses of different masses: our momentum sphere would become an ellipsoid, but the calculation would still give the same distribution.22 More surprisingly, we shall see when we study the canonical ensemble (section 5.2) that interactions don't matter either, so long as the system is classical:23 the calculation factors, and the probability densities for the momenta are given by equation 3.22, independent of the potential energies.24 The momentum distribution in the form equation 3.22 is correct for nearly all equilibrium systems of classical particles.

22 Molecular gasses will have internal vibration modes that are often not well described by classical mechanics. At low temperatures, these are often frozen out: including rotations and translations but ignoring vibrations leads to the traditional formulas used, for example, for air (see note 10 on page 32).

23 Notice that almost all molecular dynamics simulations are done classically: their momentum distributions are given by equation 3.22.

3.3 What is Temperature?

When a hot body is placed beside a cold one, our ordinary experience suggests that heat energy flows from hot to cold until they reach the same temperature. In statistical mechanics, the distribution of heat between the two bodies is determined by the assumption that all possible states of the two bodies at fixed total energy are equally likely. Do these two definitions agree? Can we define the temperature so that two large bodies in equilibrium with one another will have the same temperature?

Consider a general, isolated system of total energy E consisting of two parts, labeled 1 and 2. Each subsystem has fixed volume and number of particles, and is energetically weakly connected to the other subsystem. The connection is weak in that we assume we can neglect the dependence of the energy E1 of the first subsystem on the state s2 of the second one, and vice versa.25

Our microcanonical ensemble then asserts that the equilibrium ensemble of the total system is an equal weighting of all possible states of the two subsystems having total energy E. A particular state of the whole system is given by a pair of states (s1, s2) with E = E1 + E2. This immediately implies that a particular configuration or state s1 of the first subsystem at energy E1 will occur with probability density26

    ρ(s1) ∝ Ω2(E − E1),    (3.23)

where Ω1(E1) δE1 and Ω2(E2) δE2 are the phase-space volumes of the energy shell for the two subsystems. The volume of the energy surface for the total system at energy E will be given by adding up the product of the volumes of the subsystems for pairs of energies summing27 to E:

    Ω(E) = ∫ dE1 Ω1(E1) Ω2(E − E1).    (3.24)

Notice that the integrand in equation 3.24, normalized by the total integral, is just the probability density28 of the subsystem having energy E1:

    ρ(E1) = Ω1(E1) Ω2(E − E1) / Ω(E).    (3.25)

25 A macroscopic system attached to the external world at its boundaries is usually weakly connected, since the interaction energy is only important at the surfaces, which are a negligible fraction of the total. Also, the momenta and positions of classical particles without magnetic fields are weakly connected in this sense: no terms in the Hamiltonian mix them (although the dynamical evolution certainly does).

26 That is, if we compare the probabilities of two states of the subsystems with energies Ea and Eb, and if Ω2(E − Ea) is 50 times larger than Ω2(E − Eb), then ρ(Ea) = 50 ρ(Eb), because the former has 50 times as many partners that it can pair with to get an allotment of probability.

27 Equation 3.24 becomes a sum over states in quantum mechanics, and should be intuitively clear. We can formally derive it in classical mechanics: see exercise 3.3.

28 Warning: again we're being sloppy. We use ρ(s1) in equation 3.23 for the probability that the subsystem is in a particular state s1, and we use ρ(E1) in equation 3.25 for the probability that a subsystem is in any of many particular states with energy E1.

24 Quantum mechanics, however, couples the kinetic and potential terms: see chapter 7. Quantum mechanics is important for atomic motions only at low temperatures, so equation 3.22 will be reasonably accurate for all gasses, all liquids but helium, and many solids that are not too cold.


If the two subsystems have a large number of particles, then ρ(E1) is a very sharply peaked function near its maximum E1*. Hence, the energy in subsystem 1 is given (apart from small fluctuations) by the maximum in the integrand Ω1(E1) Ω2(E − E1). The maximum is found when its derivative (dΩ1/dE1) Ω2 − Ω1 (dΩ2/dE2) is zero, so

    (1/Ω1)(dΩ1/dE1) = (1/Ω2)(dΩ2/dE2).    (3.26)

This is the condition for thermal equilibrium between the two subsystems. We can put it in a more convenient form by defining the equilibrium entropy29

    S_equil(E) = kB log(Ω(E))    (3.27)

for each of our systems.30 Then dS/dE = kB (1/Ω) dΩ/dE, and the condition 3.26 for thermal equilibrium between two macroscopic bodies is precisely the condition

    (d/dE1)(S1(E1) + S2(E − E1)) = (dS1/dE)|_{E1} − (dS2/dE)|_{E−E1} = 0    (3.28)

that entropy is an extremum. Indeed, since the sum of the entropies is the logarithm of the integrand of equation 3.24, which by assumption is expanded about a local maximum, the condition of thermal equilibrium maximizes the entropy.31

We want to define the temperature so that it becomes equal when the two subsystems come to equilibrium. We've seen that

    dS1/dE = dS2/dE    (3.29)

in thermal equilibrium. dS/dE decreases upon increasing energy, so we define the temperature in statistical mechanics as

    1/T = dS/dE.    (3.30)

30 kB is again Boltzmann's constant; see note 20 on page 36.

31 We shall discuss different aspects of entropy and its growth in chapter 6.
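Equations 3.26-3.30 can be illustrated with the ideal gas, for which Ω(E) ∝ E^{3N/2} (equation 3.16), so S = kB log Ω gives 1/T = dS/dE = (3N/2) kB/E. Maximizing S1(E1) + S2(E − E1) numerically over E1 should then split the energy so that the two temperatures agree. A sketch (my own, with kB = 1 and arbitrary N1, N2, E):

```python
from math import log

kB = 1.0
N1, N2, E = 100, 300, 8.0            # arbitrary test values

def S(N, En):
    """Ideal-gas shell entropy up to a constant: S = kB (3N/2) log E,
    from Omega(E) ~ E**(3N/2) (equation 3.16)."""
    return kB * 1.5 * N * log(En)

# Scan E1 and maximize the total entropy S1(E1) + S2(E - E1).
grid = 100_000
E1s = [E * i / grid for i in range(1, grid)]
E1_star = max(E1s, key=lambda E1: S(N1, E1) + S(N2, E - E1))

# 1/T = dS/dE = (3N/2) kB / E, so T = E / (1.5 N kB) for each subsystem.
T1 = E1_star / (1.5 * N1 * kB)
T2 = (E - E1_star) / (1.5 * N2 * kB)
print(E1_star, T1, T2)   # E1* = E N1/(N1+N2) = 2.0, and T1 = T2
```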

Is the probability density ρ(E1) in equation 3.25 sharply peaked, as we have assumed? We can expand the numerator about the maximum E1 = E1*, and use the fact that the temperatures balance to remove the terms linear in E1 − E1*:

    ρ(E1) ∝ Ω1(E1) Ω2(E − E1) = exp[(S1(E1) + S2(E − E1))/kB]
        ≈ exp{[S1(E1*) + ½ (∂²S1/∂E1²)(E1 − E1*)² + S2(E − E1*) + ½ (∂²S2/∂E2²)(E1 − E1*)²]/kB}
        ∝ exp[(E1 − E1*)² (∂²S1/∂E1² + ∂²S2/∂E2²)/(2kB)].    (3.31)

29 This definition depends on the units we pick for the phase-space volume. We will later realize that the natural unit to pick is h^{3N}, where h = 2πℏ is Planck's constant. Note also that in this book we will consistently use log to mean the natural logarithm log_e, and not log10.


Thus the energy fluctuations are Gaussian, with variance given by kB times the inverse of (minus) the sum of ∂²S/∂E² for the two subsystems.

How large is ∂²S/∂E² for a macroscopic system? It has units of inverse energy squared, but is the energy a typical system energy or an atomic energy? If it is a system-scale energy (scaling like the number of particles N), then the root-mean-square energy fluctuation √⟨(E1 − E1*)²⟩ will be comparable to E1 (enormous fluctuations). If it is an atomic-scale energy (going to a constant as N → ∞), then the energy fluctuations will be independent of system size (microscopic). Quantities like the total energy which scale linearly with the system size are called extensive; quantities like temperature that go to a constant as the system grows large are called intensive. We shall find that the entropy of a system is extensive, so the second derivative ∂²S/∂E² ∼ [S]/[E²] ∼ N/N² ∼ 1/N, and the energy fluctuations will scale as 1/√N of the total energy. (We can also calculate these fluctuations explicitly,32 with a little effort.) Just as for the configurations of the ideal gas, where the number of particles in half the box fluctuated very little, so also the energy E1 fluctuates very little from the maximum probability E1*.33 In both cases, the relative fluctuations scale as 1/√N.

33 We will discuss fluctuations in detail in section 5.2, and in chapter 11.
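The 1/√N scaling can be checked explicitly for two equal ideal-gas subsystems, where ρ(E1) ∝ E1^{3N/2} (E − E1)^{3N/2} (equations 3.16 and 3.25, dropping constants). A sketch (mine; the choice of equal subsystems and the grid size are arbitrary):

```python
from math import exp, log, sqrt

def relative_width(N, E=1.0, grid=100_000):
    """Relative rms fluctuation of E1 for two equal ideal-gas subsystems,
    using rho(E1) ~ E1**(3N/2) * (E - E1)**(3N/2)."""
    a = 1.5 * N
    E1s = [E * (i + 0.5) / grid for i in range(grid)]
    logw = [a * log(E1) + a * log(E - E1) for E1 in E1s]
    wmax = max(logw)
    w = [exp(x - wmax) for x in logw]       # shift to avoid underflow
    Z = sum(w)
    mean = sum(E1 * wi for E1, wi in zip(E1s, w)) / Z
    var = sum((E1 - mean) ** 2 * wi for E1, wi in zip(E1s, w)) / Z
    return sqrt(var) / mean

for N in (100, 400, 1600):
    print(N, relative_width(N))   # shrinks roughly as 1/sqrt(N)
```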

The inverse of the temperature is the cost of buying energy from the rest of the world. The lower the temperature, the more strongly the kinetic energy for the momentum component is pushed towards zero. Entropy is the currency being paid. For each unit of energy ΔE bought, we pay ΔE/T = ΔE (dS/dE) = ΔS in reduced entropy of the world. Inverse temperature is the cost in entropy to buy a unit of energy.

The rest of the world is often called the heat bath; it is a source and sink for heat and fixes the temperature. All heat baths are equivalent, depending only on the temperature. More precisely, the equilibrium behavior of a system weakly coupled to the external world is independent of what the external world is made of; it depends only on the world's temperature. This is a deep truth.

32 Note that

    ∂T/∂E|_{V,N} = 1/(∂E/∂T|_{V,N}) = 1/(N cv),    (3.32)

the inverse of the total specific heat at constant volume. (The specific heat cv is the energy needed per particle to change the temperature by one unit: N cv = ∂E/∂T|_{V,N}.) Hence

    (1/kB) ∂²S/∂E² = (1/kB) ∂(1/T)/∂E = −(1/(kB T²)) ∂T/∂E = −1/(kB T² N cv) = −1/((kB T)(N cv T)).    (3.33)

This last line is indeed minus the inverse of a product of two energies. The second, N cv T, is a system-scale energy: it is the total energy that would be needed to raise the temperature of the system from absolute zero, if the specific heat per particle cv were temperature independent. However, the first energy, kB T, is an atomic-scale energy independent of N. The fluctuations in energy, therefore, scale like the geometric mean of the two, summed over the two subsystems in equation 3.31, and hence scale as √N: the total energy fluctuations per particle thus are roughly 1/√N times a typical energy per particle.


The entropy S(E, V, N) is our first example of a thermodynamic potential. In thermodynamics, all the macroscopic properties can be calculated by taking derivatives of thermodynamic potentials with respect to their arguments. It is often useful to think of thermodynamic potentials as surfaces: figure 3.3 shows the surface in S, E, V space (at constant number of particles N). The energy E(S, V, N) is another thermodynamic potential, completely equivalent to S(E, V, N): it's the same surface with a different direction up.

Fig. 3.3 Entropy. The entropy S(E, V, N) as a function of energy E and volume V (at fixed number N). Viewed sideways, this surface also defines the energy E(S, V, N). The three curves are lines at constant S, E, and V; the fact that they must close yields the relation (∂S/∂E)|_{V,N} (∂E/∂V)|_{S,N} (∂V/∂S)|_{E,N} = −1 (see exercise 3.5).

In section 3.3 we defined the temperature using ∂S/∂E|_{V,N}. What about the other two first derivatives, ∂S/∂V|_{E,N} and ∂S/∂N|_{E,V}? That is, how does the entropy change when volume or particles are exchanged between two subsystems? The change in the entropy for a tiny shift ΔE, ΔV, and ΔN from subsystem 2 to subsystem 1 (figure 3.4) is

    ΔS = (∂S1/∂E1|_{V,N} − ∂S2/∂E2|_{V,N}) ΔE + (∂S1/∂V1|_{E,N} − ∂S2/∂V2|_{E,N}) ΔV + (∂S1/∂N1|_{E,V} − ∂S2/∂N2|_{E,V}) ΔN.    (3.34)

The first term is of course (1/T1 − 1/T2) ΔE; exchanging energy to maximize the entropy sets the temperatures equal. Just as for the energy, if the two subsystems are allowed to exchange volume and number, then the entropy will maximize itself with respect to these variables as well, with small fluctuations.34 Equating the derivatives with respect to volume V gives us our statistical mechanics definition of the pressure P:

    P/T = ∂S/∂V|_{E,N},    (3.35)

and equating the derivatives with respect to number gives us the definition of the chemical potential μ:

    −μ/T = ∂S/∂N|_{E,V}.    (3.36)

Fig. 3.4 Two subsystems. Two subsystems, isolated from the outside world, may exchange energy (open door through the insulation), volume (piston), or particles (tiny uncorked holes).

These definitions are a bit odd: usually we define pressure and chemical potential in terms of the change in energy E, not the change in entropy S. There is an important mathematical identity that we derive in exercise 3.5. If f is a function of x and y, then (see figure 3.3):35

34 If the systems are at different temperatures and the piston is allowed to act, we would expect the pressures to equalize. Showing that this maximizes the entropy is complicated by the fact that the motion of a piston not only exchanges volume ΔV between the two subsystems, but also changes the energy ΔE because of the work done. Equations 3.34 and 3.35 tell us that ΔS = (1/T1 − 1/T2) ΔE + (P1/T1 − P2/T2) ΔV = 0, implying that −ΔE/ΔV = (1−λ)P1 + λP2 with λ = 1/(1 − T2/T1). If we hypothesized that the maximum entropy had P1 = P2, we would certainly expect that −ΔE/ΔV would lie between these two pressures, corresponding to 0 < λ < 1; but if T2 and T1 are both positive and different, then either λ < 0 or λ > 1. Hence the piston must move to equalize the pressures even when the temperatures do not agree.

35 Notice that this is exactly minus the result you would have derived by cancelling ∂f, ∂x, and ∂y from numerator and denominator.


3.4 Pressure and Chemical Potential 41

    (∂f/∂x)|_y (∂x/∂y)|_f (∂y/∂f)|_x = −1.    (3.37)

Also, it's clear that if we keep all but one variable fixed, partial derivatives are like regular derivatives, so

    (∂f/∂x)|_y (∂x/∂f)|_y = 1.    (3.38)

Applying these identities to S, V, and E,

    −1 = (∂S/∂V)|_{E,N} (∂V/∂E)|_{S,N} (∂E/∂S)|_{V,N} = (P/T) (∂V/∂E)|_{S,N} T,    (3.39)

so

    (∂E/∂V)|_{S,N} = −P.    (3.40)

Thus the pressure is minus the energy cost per unit volume at constant entropy. Similarly,

    −1 = (∂S/∂N)|_{E,V} (∂N/∂E)|_{S,V} (∂E/∂S)|_{N,V} = (−μ/T) (∂N/∂E)|_{S,V} T,    (3.41)

so

    (∂E/∂N)|_{S,V} = μ:    (3.42)

the chemical potential is the energy cost of adding a particle at constant entropy.
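Equation 3.40 can be verified for the monatomic ideal gas. Solving S = const + N kB log V + (3N/2) kB log E for E at fixed S gives E ∝ V^{−2/3}, and the ideal-gas pressure is P = N kB T/V = 2E/3V. A numerical sketch (my own; the constants and the finite-difference step are arbitrary choices):

```python
from math import exp

kB = 1.0
N = 50
C = 1.0                      # overall constant; drops out of the check

def energy(S, V):
    """Monatomic ideal gas E(S, V) at fixed N:
    E = C * V**(-2/3) * exp(2 S / (3 N kB))."""
    return C * V ** (-2.0 / 3.0) * exp(2 * S / (3 * N * kB))

S0, V0 = 10.0, 2.0
E0 = energy(S0, V0)
P = 2 * E0 / (3 * V0)        # ideal-gas pressure: P = 2E/3V

# Central finite difference of E with respect to V at constant S.
h = 1e-6
dEdV = (energy(S0, V0 + h) - energy(S0, V0 - h)) / (2 * h)
print(dEdV, -P)              # equation 3.40: (dE/dV) at constant S equals -P
```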

The chemical potential will be unfamiliar to most of those new to statistical mechanics. We can feel pressure and temperature as our bodies exchange volume with balloons and heat with coffee cups. Most of us have not had comparable tactile experience with exchanging particles.$^{36}$ Your intuition will improve as you work with chemical potentials. They are crucial to the study of chemical reactions (which we will treat only lightly in this text): whether a reaction will proceed depends in part on the relative cost of the products and the reactants, measured by the differences in their chemical potentials. The chemical potential is also central to noninteracting quantum systems, where the number of particles in each quantum state can vary (chapter 7).

$^{36}$ Our lungs exchange oxygen and carbon dioxide, but they don't have nerve endings that measure the chemical potentials.
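One way to build intuition for $\mu$ is simply to compute it. The sketch below, not part of the text, evaluates $\mu = -T\,(\partial S/\partial N)|_{E,V}$ for the monatomic ideal gas using the entropy formula 3.61 derived in the next section, in assumed units with $k_B = h = m = 1$; the comparison value $k_B T \log(n\lambda^3)$ is the standard ideal-gas chemical potential.

```python
import math

kB = 1.0  # units with kB = h = m = 1 (an assumption of this sketch)

def S(E, V, N):
    """Monatomic ideal gas entropy, equation 3.61."""
    return 2.5 * N * kB + N * kB * math.log(
        (V / N) * (4 * math.pi * E / (3 * N)) ** 1.5)

E, V, N = 1500.0, 2000.0, 1000.0
T = 2 * E / (3 * N * kB)      # equation 3.57

dN = 1e-4                     # treat N as continuous for the derivative
dS_dN = (S(E, V, N + dN) - S(E, V, N - dN)) / (2 * dN)
mu = -T * dS_dN               # chemical potential at constant E, V

# Standard closed form: mu = kB T log(n lambda^3), lambda = h/sqrt(2 pi m kB T)
n = N / V
mu_exact = kB * T * math.log(n * (2 * math.pi * kB * T) ** -1.5)
```

The two values agree, and here $\mu$ comes out negative: in this dilute gas, adding a particle at constant entropy releases energy.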

Our familiar notion of pressure is from mechanics: the energy of a subsystem increases as the volume decreases, as $\Delta E = -P\,\Delta V$. What may not be familiar is that this energy change is measured at fixed entropy. With the tools we have now, we can show explicitly that the mechanical definition of pressure is the same as the statistical mechanics definition (equation 3.35): the argument is somewhat technical, but illuminating (footnote 37 at the end of this section).

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

We can also give a simpler argument, using properties of the entropy that we will discuss more fully in chapter 6. A mechanical measurement of the pressure must not exchange heat with the body. Changing the volume while adding heat to keep the temperature fixed, for example, is a different measurement. The mechanical measurement must also change the volume slowly. If the volume changes fast enough that the subsystem goes out of equilibrium (typically a piston moving near the speed of sound), then the energy needed to change the volume will include the energy for generating the sound and shock waves, energies not appropriate to include in a good measurement of the pressure. We call a process adiabatic if it occurs without heat exchange and sufficiently slowly that the system remains in equilibrium.

Consider the system comprising the subsystem and the mechanical device pushing the piston, under a cycle $V \to V + \Delta V \to V$. Because the subsystem remains in equilibrium at all times, the process of changing the volume is completely reversible: the entropy of the system at the end is the same as that at the beginning. Since entropy can only increase (chapter 6), the entropy of the system halfway through the cycle at $V + \Delta V$ must be the same as at $V$. The mechanical instrument can be made with few moving parts, so its entropy change can be neglected. Hence the entropy of the subsystem must be unchanged under an adiabatic change in volume. Thus a mechanical measurement of pressure is done at constant entropy.

Broadly speaking, the entropy of a system changing adiabatically (slowly and in thermal isolation) will be a constant. Indeed, you may view our detailed calculation (the following footnote) as providing a statistical mechanical derivation for this important truth.$^{37}$

$^{37}$ We want to show that our statistical mechanics definition $P = T (\partial S/\partial V)|_{E,N}$ corresponds to the everyday mechanical definition $P = -\Delta E/\Delta V$. We first must use statistical mechanics to find a formula for the mechanical force per unit area $P$. Consider some general liquid or gas whose volume is changed smoothly from $V$ to $V + \Delta V$, and is otherwise isolated from the rest of the world. (A solid can support a shear stress. Because of this, it has not just a pressure, but a whole stress tensor, that can vary in space ...)

We can find the mechanical pressure if we can find out how much the energy changes as the volume changes. The initial system at $t = 0$ is an equilibrium ensemble at volume $V$, uniformly filling phase space in an energy range $E < H < E + \delta E$ with density $1/\Omega(E, V)$. A member of this volume-expanding ensemble is a trajectory $\big(P(t), Q(t)\big)$ that evolves in time under the changing Hamiltonian $H\big(P, Q, V(t)\big)$. The amount this particular trajectory changes in energy under the time-dependent Hamiltonian is

$\frac{d H\big(P(t), Q(t), V(t)\big)}{dt} = \frac{\partial H}{\partial P}\dot{P} + \frac{\partial H}{\partial Q}\dot{Q} + \frac{\partial H}{\partial V}\frac{dV}{dt}. \qquad (3.43)$

A Hamiltonian for particles of kinetic energy $\frac{1}{2} P^2/m$ and potential energy $U(Q)$ will have $\partial H/\partial P = P/m = \dot{Q}$ and $\partial H/\partial Q = \partial U/\partial Q = -\dot{P}$, so the first two terms cancel on the right-hand side of equation 3.43. (You may recognize Hamilton's equations of motion; indeed, the first two terms cancel for any Hamiltonian system.) Hence the energy change for this particular trajectory is

$\frac{d H\big(P(t), Q(t), V(t)\big)}{dt} = \frac{\partial H}{\partial V}\big(P(t), Q(t)\big)\, \frac{dV}{dt}. \qquad (3.44)$

That is, the energy change of the evolving trajectory is the same as the expectation value of $\partial H/\partial t$ at the static current point in the trajectory: we need not follow the particles as they zoom around.

We still must average this energy change over the equilibrium ensemble of initial conditions. This is in general not possible, until we make the second assumption involved in the adiabatic measurement of pressure: we assume that the potential energy turns on so slowly that the system remains in equilibrium at the current volume $V(t)$ and energy $E(t)$. This allows us to calculate the ensemble average energy change as an equilibrium thermal average:

$\left\langle \frac{dH}{dt} \right\rangle = \left\langle \frac{\partial H}{\partial V} \right\rangle_{E(t), V(t)} \frac{dV}{dt}. \qquad (3.45)$

Since the mechanical pressure is minus the mean energy cost per unit change in volume, $P = -\langle dH/dt \rangle \big/ \frac{dV}{dt}$, we find

$P = -\left\langle \frac{\partial H}{\partial V} \right\rangle = -\frac{1}{\Omega(E)} \int dP\, dQ\; \delta\big(E - H(P, Q, V)\big)\, \frac{\partial H}{\partial V}. \qquad (3.46)$

We now return to calculating the derivative

$\left(\frac{\partial S}{\partial V}\right)_{E,N} = \left(\frac{\partial\, k_B \log \Omega}{\partial V}\right)_{E,N} = \frac{k_B}{\Omega} \left(\frac{\partial \Omega}{\partial V}\right)_{E,N}. \qquad (3.47)$

Changing orders of differentiation,

$\left(\frac{\partial \Omega}{\partial V}\right)_{E,N} = \left(\frac{\partial}{\partial V}\right)_{E,N} \left(\frac{\partial}{\partial E}\right)_{V,N} \int dP\, dQ\; \Theta\big(E - H(P, Q, V)\big)$

$= \left(\frac{\partial}{\partial E}\right)_{V,N} \frac{\partial}{\partial V} \int dP\, dQ\; \Theta\big(E - H(P, Q, V)\big)$

$= -\left(\frac{\partial}{\partial E}\right)_{V,N} \int dP\, dQ\; \delta\big(E - H(P, Q, V)\big)\, \frac{\partial H}{\partial V}. \qquad (3.48)$

But the phase-space integral in the last equation is precisely the same integral that appears in our formula for the pressure, equation 3.46: it is $-\Omega(E)\,P$. Thus

$\left(\frac{\partial \Omega}{\partial V}\right)_{E,N} = \left(\frac{\partial\, (\Omega(E) P)}{\partial E}\right)_{V,N} = \left(\frac{\partial \Omega}{\partial E}\right)_{V,N} P + \Omega \left(\frac{\partial P}{\partial E}\right)_{V,N}, \qquad (3.49)$

so

$\left(\frac{\partial S}{\partial V}\right)_{E,N} = \frac{k_B}{\Omega}\left(\frac{\partial \Omega}{\partial V}\right)_{E,N} = \left(\frac{\partial\, k_B \log \Omega}{\partial E}\right)_{V,N} P + k_B \left(\frac{\partial P}{\partial E}\right)_{V,N} = \left(\frac{\partial S}{\partial E}\right)_{V,N} P + k_B \left(\frac{\partial P}{\partial E}\right)_{V,N} = P/T + k_B \left(\frac{\partial P}{\partial E}\right)_{V,N}. \qquad (3.50)$

Now, $P$ and $T$ are both intensive variables, but $E$ is extensive (scales linearly with system size). Hence $P/T$ is of order one for a large system, but $k_B (\partial P/\partial E)|_{V,N}$ is of order $1/N$, where $N$ is the number of particles. (For example, we shall see that for the ideal gas $PV = \frac{2}{3} E = N k_B T$, so $k_B\, \partial P/\partial E = 2 k_B/3V = 2P/(3NT) = \frac{2}{3N}\,(P/T) \ll P/T$ for large $N$.) Hence the second term, for a large system, may be neglected, giving us the desired relation

$\left(\frac{\partial S}{\partial V}\right)_{E,N} = P/T. \qquad (3.51)$

The derivative of the entropy $S(E, V, N)$ with respect to $V$ at constant $E$ and $N$ is thus indeed the mechanical pressure divided by the temperature. Adiabatic measurements (slow and without heat exchange) keep the entropy unchanged.
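We can see this claim at work for the ideal gas: along an adiabat the entropy formula of section 3.5 stays constant, while letting heat flow in raises it. A sketch, not part of the text, in assumed units $k_B = h = m = 1$; the monatomic adiabat $E\,V^{2/3} = \text{const}$ (equivalently $P V^{5/3} = \text{const}$) is the standard result.

```python
import math

def S(E, V, N):
    """Ideal gas entropy (equation 3.61 of section 3.5, units kB = h = m = 1)."""
    return 2.5 * N + N * math.log((V / N) * (4 * math.pi * E / (3 * N)) ** 1.5)

E0, V0, N = 900.0, 1000.0, 600.0
S0 = S(E0, V0, N)

# Slow, thermally isolated (adiabatic) expansion to twice the volume:
V1 = 2.0 * V0
E1 = E0 * (V0 / V1) ** (2.0 / 3.0)  # monatomic adiabat E V^(2/3) = const
S_adiabatic = S(E1, V1, N)          # unchanged from S0

# Isothermal expansion instead (heat flows in, E stays fixed): entropy grows
S_isothermal = S(E0, V1, N)         # exceeds S0 by N log 2
```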

3.5 Entropy, the Ideal Gas, and Phase Space Refinements

Let's find the temperature and pressure for the ideal gas, using our microcanonical ensemble. We'll then introduce two subtle refinements to the phase space volume (one from quantum mechanics, and one for indistinguishable particles) which will not affect the temperature or pressure, but will be important for the entropy and chemical potential.

We derived the volume $\Omega(E)$ of the energy shell in phase space in section 3.2: it factored$^{38}$ into a momentum space volume from equation 3.16 and a configuration space volume $V^N$. Before our refinements, we have

$\Omega_{\rm crude}(E) = \frac{3N}{2E}\, \pi^{3N/2} (2mE)^{3N/2} \Big/ \left(\tfrac{3N}{2}\right)! \;\left(V^N\right) \approx \pi^{3N/2} (2mE)^{3N/2} \Big/ \left(\tfrac{3N}{2}\right)! \;\left(V^N\right). \qquad (3.52)$

Notice that in the second form of 3.52 we have dropped the first term: it divides the phase space volume by a negligible factor (two-thirds the energy per particle).$^{39}$ The entropy and its derivatives are (before our refinements)

$S_{\rm crude}(E) = k_B \log\!\left[\pi^{3N/2} (2mE)^{3N/2} \Big/ \left(\tfrac{3N}{2}\right)! \;\left(V^N\right)\right] = \tfrac{3}{2} N k_B \log(2\pi m E) + N k_B \log(V) - k_B \log\!\left[\left(\tfrac{3N}{2}\right)!\right], \qquad (3.53)$

$\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V,N} = \frac{3 N k_B}{2E}, \qquad (3.54)$

$\frac{P}{T} = \left(\frac{\partial S}{\partial V}\right)_{E,N} = \frac{N k_B}{V}, \qquad (3.55)$

so

$k_B T = \frac{2E}{3N} \qquad (3.57)$

and

$P V = N k_B T. \qquad (3.58)$

The first formula restates equation 3.22: the ideal gas has energy equal to $\frac{1}{2} k_B T$ per component of the velocity.$^{40}$ The second formula is the equation of state$^{41}$ for the ideal gas. The equation of state is the relation between the macroscopic variables of an equilibrium system that emerges in the limit of large numbers of particles. The pressure $P(T, V, N)$ in an ideal gas will fluctuate around the value $N k_B T/V$ given by the equation of state, with the magnitude of the fluctuations vanishing as the system size gets large.

$^{38}$ It factors only because the potential energy is zero.

$^{39}$ Multiplying $\Omega(E)$ by a factor independent of the number of particles is equivalent to adding a constant to the entropy. The entropy of a typical system is so large (of order Avogadro's number times $k_B$) that adding a number-independent constant to it is irrelevant. Notice that this implies that $\Omega(E)$ is so large that multiplying it by a constant doesn't significantly change its value.

$^{40}$ Since $k_B T = 2E/3N$, this means each particle on average has its share $E/N$ of the total energy, as it must.

$^{41}$ It is rare that the equation of state can be written out as an explicit equation! Only in special cases (e.g., noninteracting systems like the ideal gas) can one solve in closed form for the thermodynamic potentials, equations of state, or other properties.
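Equations 3.54 and 3.55 can be checked by differentiating $S_{\rm crude}$ numerically. A sketch, not part of the text, in assumed units $k_B = m = 1$, with $\log[(3N/2)!]$ evaluated via `math.lgamma`:

```python
import math

def S_crude(E, V, N):
    """Equation 3.53 (units kB = m = 1); lgamma(x + 1) = log(x!)."""
    return (1.5 * N * math.log(2 * math.pi * E)
            + N * math.log(V)
            - math.lgamma(1.5 * N + 1))

E, V, N = 500.0, 300.0, 200.0
h = 1e-5  # finite-difference step (not Planck's constant)
inv_T    = (S_crude(E + h, V, N) - S_crude(E - h, V, N)) / (2 * h)  # eq 3.54
P_over_T = (S_crude(E, V + h, N) - S_crude(E, V - h, N)) / (2 * h)  # eq 3.55
```

Both derivatives land on the predicted values $3Nk_B/2E$ and $Nk_B/V$, from which $k_B T = 2E/3N$ and $PV = N k_B T$ follow.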

In general, our definition for the energy shell volume in phase space needs two refinements. First, the phase space volume has units of $([\text{length}][\text{momentum}])^{3N}$: the volume of the energy shell depends multiplicatively upon the units chosen for length, mass, and time. Changing these units will change the corresponding crude form for the entropy by a constant times $3N$. Most physical properties, like temperature and pressure above, are dependent only on derivatives of the entropy, so the overall constant won't matter: indeed, the zero of the entropy is undefined within classical mechanics. It is suggestive that $[\text{length}][\text{momentum}]$ has units of Planck's constant $h$, and we shall see in chapter 7 that quantum mechanics in fact does set the zero of the entropy. We shall see in exercise 7.1 that dividing$^{42}$ $\Omega(E)$ by $h^{3N}$ nicely sets the entropy density to zero in equilibrium quantum systems at absolute zero.

$^{42}$ This is equivalent to using units for which $h = 1$.

Second, there is an important subtlety in quantum physics regarding identical particles. Two electrons, or two Helium atoms of the same isotope, are not just hard to tell apart: they really are completely and utterly the same (figure 3.5). We shall see in section 7.3 that the proper quantum treatment of identical particles involves averaging over possible states using Bose and Fermi statistics.

In classical physics, there is an analogous subtlety regarding indistinguishable$^{43}$ particles. For a system of two indistinguishable particles, the phase space points $(p_A, p_B, q_A, q_B)$ and $(p_B, p_A, q_B, q_A)$ should not both be counted: the volume of phase space $\Omega(E)$ should be half that given by a calculation for distinguishable particles. For $N$ indistinguishable particles, the phase space volume should be divided by $N!$, the total number of ways the labels for the particles can be permuted.

Unlike the introduction of the factor $h^{3N}$ above, dividing the phase space volume by $N!$ does change the predictions of statistical mechanics in important ways. We will see in subsection 6.2.1 that the entropy increase for joining containers of different kinds of particles should be substantial, while the entropy increase for joining containers filled with indistinguishable particles should be near zero. This result is correctly treated by dividing $\Omega(E)$ by $N!$ for each set of $N$ indistinguishable particles. We call the resulting ensemble Maxwell-Boltzmann statistics, to distinguish it from distinguishable statistics and from the quantum-mechanical Bose and Fermi statistics. We shall see in chapter 7 that identical fermions and bosons obey Maxwell-Boltzmann statistics at high temperatures: they become classical, but remain indistinguishable.

Fig. 3.5 Feynman diagram: indistinguishable particles. In quantum mechanics, two electrons (or two atoms of the same isotope) are fundamentally indistinguishable. We can illustrate this with a peek at an advanced topic mixing quantum field theory and relativity. Here is a scattering event of a photon off an electron, viewed in two reference frames: time is vertical, a spatial coordinate is horizontal. On the left we see two different electrons, one of which is created along with an anti-electron (positron $e^+$), and the other of which later annihilates the positron. At right we see the same event viewed in a different reference frame: here there is only one electron, which scatters two photons. (The electron is virtual, moving faster than light, between the collisions: this is allowed in intermediate states for quantum transitions.) The two electrons on the left are not only indistinguishable, they are the same particle! The antiparticle is also the electron, traveling backward in time.

Combining these two refinements gives us for the ideal gas

$\Omega(E) = \left(V^N/N!\right)\left(\pi^{3N/2} (2mE)^{3N/2} \Big/ \left(\tfrac{3N}{2}\right)!\right)(1/h)^{3N}. \qquad (3.59)$

$S(E) = N k_B \log\!\left[\frac{V}{h^3}\, (2\pi m E)^{3/2}\right] - k_B \log\!\left[N!\, \left(\tfrac{3N}{2}\right)!\right]. \qquad (3.60)$

We can make our equation for the entropy more useful by using Stirling's formula $\log(N!) \approx N \log N - N$, valid at large $N$:

$S(E, V, N) = \tfrac{5}{2} N k_B + N k_B \log\!\left[\frac{V}{N h^3}\left(\frac{4\pi m E}{3N}\right)^{3/2}\right]. \qquad (3.61)$

$^{43}$ If we have particles that in principle are not identical, but our Hamiltonian and measurement instruments do not distinguish between them, then in classical statistical mechanics we may treat them with Maxwell-Boltzmann statistics as well: they are indistinguishable but not identical.
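The Stirling form 3.61 can be compared against the exact expression 3.60 (factorials via `math.lgamma`). A sketch, not part of the text, in assumed units $k_B = h = m = 1$; for $N = 10^6$ the two entropies agree to a few parts in $10^6$, the absolute difference being the dropped $\frac{1}{2}\log$ terms of the strong Stirling formula.

```python
import math

def S_exact(E, V, N):
    """Equation 3.60 with exact factorials (units kB = h = m = 1)."""
    return (N * math.log(V * (2 * math.pi * E) ** 1.5)
            - math.lgamma(N + 1) - math.lgamma(1.5 * N + 1))

def S_stirling(E, V, N):
    """Equation 3.61, obtained from 3.60 via log N! ~ N log N - N."""
    return 2.5 * N + N * math.log((V / N) * (4 * math.pi * E / (3 * N)) ** 1.5)

N = 1.0e6
E, V = 1.5 * N, 1.0 * N      # energy and volume proportional to N
Se = S_exact(E, V, N)
Ss = S_stirling(E, V, N)
rel_diff = abs(Se - Ss) / Se  # tiny: the correction is O(log N), S is O(N)
```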

3.6 What is Thermodynamics?

Thermodynamics and statistical mechanics historically were closely tied, and often even now they are taught together. What is thermodynamics?

Thermodynamics is the theory that emerges from statistical mechanics in the limit of large systems. Statistical mechanics originated as a derivation of thermodynamics from an atomistic microscopic theory (somewhat before the existence of atoms was universally accepted). Thermodynamics can be viewed as statistical mechanics in the limit$^{44}$ as the number of particles $N \to \infty$. When we calculate the relative fluctuations in properties like the energy or the pressure and show that they vanish like $1/\sqrt{N}$, we are providing a microscopic justification for thermodynamics. Thermodynamics is the statistical mechanics of near-equilibrium systems when one ignores the fluctuations.

$^{44}$ The limit $N \to \infty$ is thus usually called the thermodynamic limit, even for systems like second-order phase transitions where the fluctuations remain important and thermodynamics per se is not applicable.

In this text, we will summarize many of the important methods and results of traditional thermodynamics in the exercises (3.5, 5.4, 5.6, 6.4, and 5.7). Our discussions of order parameters (chapter 9) and deriving new laws (chapter 10) will be providing thermodynamic laws, broadly speaking, for a wide variety of states of matter.

Statistical mechanics has a broader purview than thermodynamics. Particularly in applications to other fields like information theory, dynamical systems, and complexity theory, statistical mechanics describes many systems where the emergent behavior does not have a recognizable relation to thermodynamics.
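The $1/\sqrt{N}$ vanishing of relative fluctuations is easy to see in the simplest ensemble of all: $N$ fair coin flips. A sketch, not part of the text; `random.getrandbits(N)` just supplies $N$ random bits per trial, and the heads count is read off from the binary representation.

```python
import random
import statistics

random.seed(42)

def rel_fluct(N, trials=4000):
    """Relative rms fluctuation of the heads count in N fair coin flips."""
    counts = [bin(random.getrandbits(N)).count("1") for _ in range(trials)]
    return statistics.pstdev(counts) / N

r1 = rel_fluct(10_000)   # should be near 0.5 / sqrt(N) = 0.005
r2 = rel_fluct(40_000)   # quadrupling N should roughly halve it
```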

Thermodynamics is a self-contained theory. Thermodynamics can be developed as an axiomatic system. It rests on the so-called three laws of thermodynamics, which for logical completeness must be supplemented by a zeroth law. Informally, they are:

(0) Transitivity of equilibria: If two systems are in equilibrium with a third, they are in equilibrium with one another.

(1) Conservation of energy: The total energy of an isolated system, including the heat energy, is constant.

(2) Entropy always increases: An isolated system may undergo irreversible processes, whose effects can be measured by a state function called the entropy.

(3) Entropy goes to zero at absolute zero: The entropy per particle of any two large equilibrium systems will approach the same value$^{45}$ as the temperature approaches absolute zero.

$^{45}$ This value is set to zero by dividing $\Omega(E)$ by $h^{3N}$, as in section 3.5.

The zeroth law (transitivity of equilibria) becomes the basis for defining the temperature. Our statistical mechanics derivation of the temperature in section 3.3 provides the microscopic justification of the zeroth law: systems that can only exchange heat energy are in equilibrium with one another when they have a common value of $\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V,N}$.

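The zeroth-law statement can be illustrated numerically: split a fixed total energy between two monatomic ideal gases, and the total entropy $S_1 + S_2$ is maximized exactly where the two temperatures $T_i = 2E_i/3N_i k_B$ coincide. A sketch, not part of the text (units $k_B = 1$; entropies kept only up to $E$-independent constants).

```python
import math

N1, N2, E_tot = 300.0, 700.0, 5000.0

def S(E, N):
    # (3N/2) kB log E, dropping E-independent constants
    return 1.5 * N * math.log(E)

# scan the energy split for the entropy maximum
best_E1 = max(
    (k / 1000.0 * E_tot for k in range(1, 1000)),
    key=lambda E1: S(E1, N1) + S(E_tot - E1, N2),
)

T1 = 2 * best_E1 / (3 * N1)            # equation 3.57 with kB = 1
T2 = 2 * (E_tot - best_E1) / (3 * N2)
```

At the maximum, $E_1/N_1 = E_2/N_2$, so $T_1 = T_2$: energy flows until the temperatures agree.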

The first law (conservation of energy) is now a fundamental principle of physics. Thermodynamics automatically inherits it from the microscopic theory. Historically, the thermodynamic understanding of how work transforms into heat was important in establishing that energy is conserved. Careful arguments about the energy transfer due to heat flow and mechanical work$^{46}$ are central to thermodynamics.

$^{46}$ We will use this kind of argument in discussing the Carnot cycle in section 6.1.

The second law (entropy always increases) is the heart of thermodynamics.$^{47}$ It is responsible for everything from forbidding perpetual motion machines to predicting the heat death of the universe (exercise 6.1). Entropy and its increase is not a part of our microscopic laws of nature, but is the foundation, an axiom, for our macroscopic theory of thermodynamics. The subtleties of how entropy and its growth come out of statistical mechanics will be the theme of chapter 6 and the focus of several exercises (6.5, 6.6, 7.2, and 8.8).

$^{47}$ In The Two Cultures, C. P. Snow suggests being able to describe the Second Law of Thermodynamics is to science as having read a work of Shakespeare is to the arts. (Some in non-English-speaking cultures may wish to object.) That it is law #2 is not of great import, but the concept of entropy and its inevitable increase is indeed central.

The third law (entropy goes to zero at $T = 0$, also known as Nernst's theorem) basically reflects the fact that quantum systems at absolute zero are in a ground state. Since the number of ground states of a quantum system typically is small$^{48}$ and the number of particles is large, systems at absolute zero have zero entropy per particle. Systems like glasses that have not reached complete equilibrium can have non-zero residual entropy as their effective temperature goes to zero (exercise 6.9).

$^{48}$ Some systems may have broken symmetry states, or multiple degenerate ground states, but the number of such states is typically independent of the size of the system, or at least does not grow exponentially with the number of particles, so the entropy per particle goes to zero.

The laws of thermodynamics have been written in many equivalent ways.$^{49}$ Caratheodory, for example, states the second law as "There are states of a system, differing infinitesimally from a given state, which are unattainable from that state by any quasi-static adiabatic$^{50}$ process." The axiomatic form of the subject has attracted the attention of mathematicians: indeed, formulas like $dE = T\,dS - P\,dV + \mu\,dN$ have precise meanings in differential geometry.$^{51}$

$^{49}$ Occasionally you hear them stated: (1) You can't win, (2) You can't break even, and (3) You can't get out of the game. I don't see that Nernst's theorem relates to quitting.

$^{50}$ Caratheodory is using the term adiabatic just to exclude heat flow: we use it to also imply infinitely slow transitions.

$^{51}$ The terms $dX$ are differential forms.

In this text, we will not attempt to derive properties axiomatically or otherwise from the laws of thermodynamics: we focus on statistical mechanics.

Thermodynamics is a zoo of partial derivatives, transformations, and relations. More than any other field of science, the thermodynamics literature seems filled with partial derivatives and tricky relations between varieties of physical quantities.

This is in part because there are several alternative thermodynamic potentials or free energies to choose between for a given problem. For studying molecular systems one has not only the entropy (or the internal energy) studied in this chapter, but also the Helmholtz free energy, the Gibbs free energy, the enthalpy, and the grand free energy. Transforming from one free energy to another is done using Legendre transformations (exercise 5.7). There are corresponding free energies for studying magnetic systems, where instead of particles one studies the local magnetization or spin. There appears to be little consensus between textbooks on the symbols or even the names of these various free energies.

Thermodynamics seems cluttered in part also because it is so powerful: almost any macroscopic property of interest can be found by taking derivatives of the thermodynamic potential, as we've seen. First derivatives gave the temperature, pressure, and chemical potential; second derivatives gave properties like the specific heat. The first derivatives must agree around a tiny triangle, yielding a tricky relation between their products (equation 3.37). The second derivatives must be symmetric ($\partial^2/\partial x \partial y = \partial^2/\partial y \partial x$), giving tricky Maxwell relations between what naively seem different susceptibilities (exercise 3.5). There are further tricks involved with taking derivatives in terms of unnatural variables,$^{52}$ and there are many inequalities that can be derived from stability criteria.

$^{52}$ For example, with work you can take the derivative of $S(E, V, N)$ with respect to $P$ at constant $T$, without re-expressing it in the variables $P$ and $T$.

Of course, statistical mechanics is not really different from thermodynamics in these regards.$^{53}$ For each of the thermodynamic potentials there is a corresponding statistical mechanical ensemble.$^{54}$ Almost everything in statistical mechanics can be found from derivatives of the entropy (or, more typically, the partition function, section 5.2). Indeed, statistical mechanics has its own collection of important relations that connect equilibrium fluctuations to transport and response.$^{55}$ We've already seen the Einstein relation connecting fluctuations to diffusive transport in section 2.3; chapter 11 will focus on these fluctuation-dissipation and fluctuation-response relations.

$^{53}$ Perhaps the partial derivatives are not so daunting when they come at the end of a technically challenging microscopic calculation.

$^{54}$ See chapter 5, where we also explain why they are called free energies.

$^{55}$ These are extremely useful, for example, in numerical simulations: simulate the equilibrium state, then measure the transport and response properties!

If we knew what system the reader was going to be interested in, we could write down all these various tricky relations in a long appendix. We could also produce a table of the different notations and nomenclatures used in different communities and texts. However, such a tabulation would not address the variety of free energies that arise in systems with other external forces (exercise 5.8) or other broken symmetries. The fraction of young scientists that will devote their careers to the study of fluids and gasses (or magnets, or any other particular system) is small. For that reason, we focus on the statistical mechanical principles which enable us to derive these and other new relations.
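As one concrete example of a Legendre transformation: the Helmholtz free energy is $A(T, V, N) = E - TS$ evaluated at the energy where $(\partial S/\partial E)|_{V,N} = 1/T$. A sketch, not part of the text, in assumed units $k_B = h = m = 1$; the closed form it is checked against is the standard ideal gas result.

```python
import math

kB = 1.0  # units kB = h = m = 1 (assumption of this sketch)

def S(E, V, N):
    """Ideal gas entropy, equation 3.61."""
    return 2.5 * N + N * math.log((V / N) * (4 * math.pi * E / (3 * N)) ** 1.5)

def helmholtz(T, V, N):
    """Legendre transform A = E - T S(E), at the E where dS/dE = 1/T.
    For the ideal gas that energy is E = 3 N kB T / 2 (equation 3.57)."""
    E = 1.5 * N * kB * T
    return E - T * S(E, V, N)

T, V, N = 2.0, 500.0, 100.0
A = helmholtz(T, V, N)

# Standard closed form: A = -N kB T (1 + log(V / (N lambda^3))),
# with thermal wavelength lambda = h / sqrt(2 pi m kB T):
lam3 = (2 * math.pi * kB * T) ** -1.5
A_exact = -N * kB * T * (1 + math.log(V / (N * lam3)))
```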

Exercises

Exercise 3.1 is the classic problem of planetary atmospheres. Exercise 3.2 is a nice generalization of the ideal gas law. Part (a) of exercise 3.3 is a workout in Γ-functions; parts (b) and (c) calculate the energy fluctuations for a mixture of two ideal gasses, and could be assigned separately. Exercise 3.4 extends the calculation of the density fluctuations from two subvolumes to $K$ subvolumes, and introduces the Poisson distribution. Finally, exercise 3.5 introduces some of the tricky partial derivative relations in thermodynamics (the triple product of equation 3.37 and the Maxwell relations) and applies them to the ideal gas.

(3.1) Escape Velocity. (Basic)

Assuming the probability distribution for the $z$ component of momentum given in equation 3.22, $\rho(p_z) = \frac{1}{\sqrt{2\pi m k_B T}} \exp(-p_z^2/2 m k_B T)$, give the probability density that an N$_2$ molecule will have a vertical component of the velocity equal to the escape velocity from the Earth (about 10 km/sec, if I remember right). Do we need to worry about losing our atmosphere? Optional: Try the same calculation for H$_2$, where you'll find a substantial leakage. (Hint: You'll want to know that there are about $10^7$ seconds in a year, and molecules collide (and scramble … Jupiter has hydrogen gas in its atmosphere, and Earth does not.)
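The punch line of exercise 3.1 can be previewed numerically via the Boltzmann factor $\exp(-m v_{\rm esc}^2/2 k_B T)$ that controls the density at escape velocity. A sketch, not the exercise's full solution; $T = 300\,$K and the 10 km/s escape velocity are the rough values quoted above.

```python
import math

kB = 1.380649e-23        # J/K
amu = 1.66053906660e-27  # kg
T = 300.0                # K, rough atmospheric temperature
v_esc = 1.0e4            # m/s, escape velocity quoted in the exercise

def boltzmann_factor(mass_amu):
    """rho(v_esc)/rho(0) = exp(-m v^2 / (2 kB T)), from equation 3.22."""
    m = mass_amu * amu
    return math.exp(-m * v_esc ** 2 / (2 * kB * T))

f_N2 = boltzmann_factor(28.0)  # nitrogen: absurdly small
f_H2 = boltzmann_factor(2.0)   # hydrogen: small, but hundreds of orders larger
```

Nitrogen's factor is down by over two hundred orders of magnitude relative to hydrogen's, which is why hydrogen leaks while nitrogen stays.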

(3.2) Hard Sphere Gas (Basic)

We can improve on the realism of the ideal gas by giving the atoms a small radius. If we make the potential energy infinite inside this radius (hard spheres), the potential energy is simple (zero unless the spheres overlap, which is forbidden). Let's do this in two dimensions.

A two-dimensional $L \times L$ box with hard walls contains a gas of $N$ hard disks of radius $r \ll L$ (figure 3.6). The disks are dilute: the summed area $N \pi r^2 \ll L^2$. Let $A$ be the effective volume allowed for the disks in the box: $A = (L - 2r)^2$.

Fig. 3.6 Hard Sphere Gas.

Fig. 3.7 Excluded volume around a sphere.

(a) The area allowed for the second disk is $A - \pi (2r)^2$ (figure 3.7), ignoring the small correction when the excluded region around the first disk overlaps the excluded region near the walls of the box. What is the allowed $2N$-dimensional volume in configuration space, of allowed zero-energy configurations of hard disks, in this dilute limit? Ignore small corrections when the excluded region around one disk overlaps the excluded regions around other disks, or near the walls of the box. Remember the $1/N!$ correction for identical particles. Leave your answer as a product of $N$ terms.

(b) What is the configurational entropy for the hard disks? Here, simplify your answer so that it does not involve a sum over $N$ terms, but valid to first order in the area of the disks $\pi r^2$. Show, for large $N$, that it is well approximated by $S_Q = N k_B (1 + \log(A/N - b))$, with $b$ representing the effective excluded area due to the other disks. (You may want to derive the formula $\sum_{n=1}^{N} \log(A - (n-1)\epsilon) = N \log(A - (N-1)\epsilon/2) + O(\epsilon^2)$.) What is the value of $b$, in terms of the area of the disk?

(c) Find the pressure for the hard-sphere gas in the large-$N$ approximation of part (b). Does it reduce to the ideal gas law for $b = 0$?

(3.3)

An isolated system with energy $E$ is composed of two macroscopic subsystems, each of fixed volume $V$ and number of particles $N$. The subsystems are weakly coupled, so the sum of their energies is $E_1 + E_2 = E$ (figure 3.4 with only the energy door open). We can use the Dirac delta function $\delta(x)$ to define the volume of the energy surface of a system with Hamiltonian $H$ to be

$\Omega(E) = \int dP\, dQ\; \delta(E - H(P, Q)) \qquad (3.62)$

$= \int dP_1\, dQ_1\, dP_2\, dQ_2\; \delta\big(E - H_1(P_1, Q_1) - H_2(P_2, Q_2)\big). \qquad (3.63)$

(a) Derive formula 3.24 for the volume of the energy surface of the whole system. (Hint: Insert $\int \delta(E_1 - H_1(P_1, Q_1))\, dE_1 = 1$ into equation 3.63.)

Consider a monatomic ideal gas (He) mixed with a diatomic ideal gas (H$_2$). We showed that a monatomic ideal gas of $N$ atoms has $\Omega_1(E_1) \propto E_1^{3N/2}$. A diatomic molecule has $\Omega_2(E_2) \propto E_2^{5N/2}$.$^{56}$

(b) Argue that the probability density of system 1 being at energy $E_1$ is the integrand of 3.24 divided by the whole integral, equation 3.25. For these two gasses, which energy $E_1^{\max}$ has the maximum probability?

$^{56}$ … the stretch mode and $I$ is the moment of inertia. The lower limit makes the rotations classical; the upper limit freezes out the vibrations, leaving us with three classical translation modes and two rotational modes: a total of five degrees of freedom.
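The series identity hinted at in exercise 3.2(b) can be spot-checked numerically. A sketch, not part of the text, with hypothetical values of $A$, $r$, and $N$ in the dilute regime $N\epsilon \ll A$:

```python
import math

A = 1.0                        # effective area of the box (arbitrary units)
r = 0.001                      # disk radius (hypothetical, dilute limit)
eps = math.pi * (2 * r) ** 2   # area excluded by each earlier disk
N = 100

# sum_{n=1}^{N} log(A - (n-1) eps)  versus  N log(A - (N-1) eps / 2)
exact  = sum(math.log(A - n * eps) for n in range(N))
approx = N * math.log(A - (N - 1) * eps / 2)
```

The two agree to $O(\epsilon^2)$, which is what makes the compact form $S_Q = N k_B (1 + \log(A/N - b))$ work at first order in the disk area.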

50 Temperature and Equilibrium

(c) Use the saddle-point method [68, sect. 3.6] to approx- Poisson distribution is valid even if there are only a few

imate the integral 3.63 as the integral over a Gaussian. events.)

(That is, put the integrand into the form exp(f (E1 )) and From parts (b) and (c), you should be able to conclude

Taylor expand f (E1 ) to second order in E1 E1max .) Use that the standard deviation in the number of particles

the saddle-point integrand as a Gaussian approximation found in a volume V inside an innite system should be

for the probability density (E1 ) (valid, for large N , when- equal to N0 , the expected number of particles in the vol-

ever (E1 ) isnt absurdly small). In this approximation, ume:

what is the mean energy E1 p(easy)? What are the en- (n n)2 = N0 . (3.65)

ergy uctuations per particle (E1 E1max )2 /N ?

This is twice the squared uctuations we found for the

For subsystems with large numbers of particles N , tem- case where the volume V was half of the total volume,

perature and energy density are well dened because equation 3.14. That makes sense, since the particles can

(E) for each subsystem grows extremely rapidly with uctuate more freely in an innite volume than in a dou-

increasing energy, in such a way that 1 (E1 )2 (E E1 ) bled volume.

is sharply peaked near its maximum.

If N0 is large, the probability Pm that N0 + m particles

(3.4) Gauss and Poisson. (Basic) lie inside our volume will be Gaussian for any K. (As a

special case, if a is large the Poisson distribution is well

In section 3.2.1, we calculated the probability distribu-

approximated as a Gaussian.) Lets derive this distribu-

tion for having n = N0 + m particles on the righthand

tion for all K. First, as in section 3.2.1, lets use the weak

half of a box of volume 2V with 2N0 total particles. In

form of Stirling's approximation, equation 3.11, dropping the square root: n! ≈ (n/e)ⁿ. Section 11.3 will want to know the number fluctuations of a small subvolume in an infinite system. Studying this also introduces the Poisson distribution.

Let's calculate the probability of having n particles in a subvolume V, for a box with total volume KV and a total number of particles T = KN₀. For K = 2 we will derive our previous result, equation 3.14, including the prefactor. As K → ∞ we will derive the infinite-volume result.

(a) Find the exact formula for this probability: n particles in V, with a total of T particles in KV. (Hint: What is the probability that the first n particles fall in the subvolume V, and the remainder T − n fall outside the subvolume (K − 1)V? How many ways are there to pick n particles from T total particles?)

The Poisson probability distribution

    ρₙ = aⁿ e⁻ᵃ / n!    (3.64)

arises in many applications. It arises whenever there is a large number of possible events T, each with a small probability a/T: the number of cars passing a given point during an hour on a mostly empty street, the number of cosmic rays hitting in a given second, etc.

(b) Show that the Poisson distribution is normalized: Σₙ ρₙ = 1. Calculate the mean of the distribution ⟨n⟩ in terms of a. Calculate the standard deviation √⟨(n − ⟨n⟩)²⟩.

(c) As K → ∞, show that the probability that n particles fall in the subvolume V has the Poisson distribution 3.64. What is a? (Hint: You'll need to use the fact that e⁻ᵃ = (e^(−1/K))^(Ka) ≈ (1 − 1/K)^(Ka) as K → ∞, and the fact that n ≪ T. Here don't assume that n is large: the …)

(d) Using your result from part (a), write the exact formula for log(P_m). Apply the weak form of Stirling's formula. Expand your result around m = 0 to second order in m, and show that log(P_m) ≈ −m²/2σ_K², giving a Gaussian form

    P_m ≈ e^(−m²/2σ_K²).    (3.66)

What is σ_K? In particular, what is σ₂ and σ∞? Your result for σ₂ should agree with the calculation in section 3.2.1, and your result for σ∞ should agree with equation 3.65.

Finally, we should address the normalization of the Gaussian. Notice that the ratio of the strong and weak forms of Stirling's formula (equation 3.11) is √(2πn). We need to use this to produce the normalization 1/(√(2π) σ_K) of our Gaussian.

(e) In terms of T and n, what factor would the square-root term have contributed if you had kept it in Stirling's formula going from part (a) to part (d)? (It should look like a ratio involving three terms like √(2πX).) Show from equation 3.66 that the fluctuations are small, m = n − N₀ ≪ N₀ for large N₀. Ignoring these fluctuations, set n = N₀ in your factor, and give the prefactor multiplying the Gaussian in equation 3.66. (Hint: your answer should be normalized.)

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

(3.5) Microcanonical Thermodynamics. (Thermodynamics, Chemistry)

Thermodynamics was understood as an almost complete scientific discipline before statistical mechanics was invented. Stat mech can be thought of as the microscopic theory, which yields thermo as the emergent theory on long length and time scales where the fluctuations are unimportant.

The microcanonical stat mech distribution introduced in class studies the properties at fixed total energy E, volume V, and number of particles N. We derived the microscopic formula S(N, V, E) = k_B log Ω(N, V, E). The principle that entropy is maximal led us to the conclusion that two weakly-coupled systems in thermal equilibrium would exchange energy until their values of ∂S/∂E|_{N,V} agreed, leading us to define the latter as the inverse of the temperature. By an analogous argument we find that systems that can exchange volume (by a thermally insulated movable partition) will shift until ∂S/∂V|_{N,E} agrees, and that systems that can exchange particles (by semipermeable membranes) will shift until ∂S/∂N|_{V,E} agrees.

How do we connect these statements with the definitions of pressure and chemical potential we get from thermodynamics? In thermo, one defines the pressure as minus the change in energy with volume, P = −∂E/∂V|_{N,S}, and the chemical potential as the change in energy with number of particles, μ = ∂E/∂N|_{V,S}; the total internal energy satisfies

    dE = T dS − P dV + μ dN.    (3.67)

(a) Show by solving equation 3.67 for dS that ∂S/∂V|_{N,E} = P/T and ∂S/∂N|_{V,E} = −μ/T (simple algebra).

Let's do this the hard way. Our microcanonical equation of state S(N, V, E) can be thought of as a surface embedded in four dimensions.

(b) Show that, if f is a function of x and y, that ∂x/∂y|_f ∂y/∂f|_x ∂f/∂x|_y = −1. (Draw a picture of a surface f(x, y) and a triangular path with three curves at constant f, x, and y, as in figure 3.3. Specifically, draw a path that starts at (x₀, y₀, f₀) and moves along a contour at constant f to y₀ + Δy. The final point will be at (x₀ + ∂x/∂y|_f Δy, y₀ + Δy, f₀). Draw it at constant x back to y₀, and then at constant y back to (x₀, y₀). How much must f change to make this a single-valued function?) Applying this formula to S at fixed E, derive the two equations in part (a) again.

(c) Ideal Gas Thermodynamics. Using the microscopic formula for the entropy of a monatomic ideal gas, 3.61,

    S(N, V, E) = (5/2) N k_B + N k_B log[ (V/(N h³)) (4πmE/3N)^(3/2) ],    (3.68)

calculate μ.

Maxwell Relations. Imagine solving the microcanonical equation of state of some material (not necessarily an ideal gas) for the energy E(S, V, N): it's the same surface in four dimensions, but looked at with a different direction pointing up. One knows that the second derivatives of E are symmetric: at fixed N, we get the same answer whichever order we take derivatives with respect to S and V.

(d) Use this to show the Maxwell relation

    ∂T/∂V|_{S,N} = −∂P/∂S|_{V,N}.    (3.69)

(This should take about two lines of calculus.) Generate two other similar formulas by taking other second partial derivatives of E. There are many of these relations.

(e) Stat Mech Check of the Maxwell Relation. Using equation 3.61 repeated above, write formulas for E(S, V, N), T(S, V, N) and P(S, V, N) for the ideal gas (non-trivial!). (This is different from T and P in part (c), which were functions of N, V, and E.) Show explicitly that the Maxwell relation, equation 3.69, is satisfied.
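The Maxwell relation 3.69 can also be checked numerically from equation 3.68 alone, without doing the inversion by hand. The sketch below works in reduced units k_B = h = m = 1 with arbitrarily chosen N, S, and V (all our choices): it inverts S(E, V) by bisection and compares the two mixed partial derivatives by finite differences.

```python
import math

kB = hP = m = 1.0   # reduced units (an illustrative choice)
N = 2.0

def S_of(E, V):
    """Entropy of the monatomic ideal gas, equation 3.68."""
    return 2.5 * N * kB + N * kB * math.log(
        V / (N * hP**3) * (4 * math.pi * m * E / (3 * N))**1.5)

def E_of(S, V):
    """Invert S(E, V) for E at fixed V: S is monotonic in E, so bisect."""
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = math.sqrt(lo * hi)          # geometric bisection
        if S_of(mid, V) < S:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def T_of(S, V, d=1e-6):
    return (E_of(S + d, V) - E_of(S - d, V)) / (2 * d)    # T = dE/dS|_V

def P_of(S, V, d=1e-6):
    return -(E_of(S, V + d) - E_of(S, V - d)) / (2 * d)   # P = -dE/dV|_S

S0, V0, d = 10.0, 3.0, 1e-4
dT_dV = (T_of(S0, V0 + d) - T_of(S0, V0 - d)) / (2 * d)
dP_dS = (P_of(S0 + d, V0) - P_of(S0 - d, V0)) / (2 * d)
print(dT_dV, -dP_dS)   # Maxwell relation 3.69: these agree
```

The two numbers agree to several digits; the sign (∂T/∂V|_S < 0) is the familiar statement that adiabatic expansion cools the gas.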

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity


4  Phase Space Dynamics and Ergodicity

So far, our justification for using the microcanonical ensemble was simple ignorance: all we know about the late-time dynamics is that energy must be conserved, so we average over all states of fixed energy. Here we provide a much more convincing argument for the ensemble, and hence for equilibrium statistical mechanics as a whole. In section 4.1 we'll show for classical systems that averaging over the energy surface is consistent with time evolution: Liouville's theorem tells us that volume in phase space is conserved, so the trajectories only stir the energy surface around; they do not change the relative weights of different parts of the energy surface. In section 4.2 we introduce the concept of ergodicity: an ergodic system has an energy surface which is well stirred. Using Liouville's theorem and assuming ergodicity will allow us to show¹ that the microcanonical ensemble average gives the long-time average behavior that we call equilibrium.

[¹ We do not aspire to rigor, but we will provide physical arguments for rigorously known results: see [61].]

In chapter 3, we saw that treating all states in phase space with a given energy on an equal footing gave sensible predictions for the ideal gas, but we did not show that this democratic treatment was necessarily the correct one. Liouville's theorem, true for all Hamiltonian systems, will tell us that all states are created equal.

4.1 Liouville's Theorem

Systems of point particles obeying Newton's laws without dissipation are examples of Hamiltonian dynamical systems. Hamiltonian systems conserve energy. The Hamiltonian is the function H(P, Q) that gives the energy for the system for any point in phase space; the equations of motion are given by

    q̇_α = ∂H/∂p_α,
    ṗ_α = −∂H/∂q_α.    (4.1)

The simplest example, and the only example we will discuss in this text, is a bunch of particles interacting with a potential energy V:

    H(P, Q) = Σ_α p_α²/2m + V(q₁, …, q_{3N}).    (4.2)

For this Hamiltonian, the equations of motion 4.1 become

    q̇_α = ∂H/∂p_α = p_α/m,
    ṗ_α = −∂H/∂q_α = −∂V/∂q_α = f_α(q₁, …, q_{3N}),    (4.3)

where f_α is the force on coordinate α. More general Hamiltonians² arise when studying, for example, the motions of rigid bodies or mechanical objects connected by hinges and joints, where the natural variables are angles or relative positions rather than points in space. Hamiltonians also play a central role in quantum mechanics.³

[² You'll cover Hamiltonian dynamics in detail in most advanced courses in classical mechanics. For those who don't already know about Hamiltonians, rest assured that we won't use anything other than the special case of Newton's laws for point particles: you can safely ignore the more general case for our purposes.]
[³ In section 7.1 we discuss the quantum version of Liouville's theorem.]

Hamiltonian systems have properties that are quite distinct from general systems of differential equations. Not only do they conserve energy: they also have many other unusual properties.⁴ Liouville's theorem describes the most important of these properties.

Consider the evolution law for a general probability density in phase space,

    ρ(P, Q) = ρ(q₁, …, q_{3N}, p₁, …, p_{3N}).    (4.4)

(As a special case, the microcanonical ensemble has ρ equal to a constant in a thin range of energies, and zero outside that range.) This probability density is locally conserved: probability cannot be created or destroyed, it can only flow around in phase space. As an analogy, suppose a fluid of mass density ρ₃D(x) in three dimensions has a velocity v(x). Because mass density is locally conserved, ρ₃D must satisfy the continuity equation ∂ρ₃D/∂t = −∇·J, where J = ρ₃D v is the mass current.⁵

[⁵ Think of the flow in and out of a small volume ΔV in space. The change in the density inside the volume, (∂ρ₃D/∂t) ΔV, must equal minus the flow of material out through the surface, −∮ J · dS, which by Gauss's theorem equals −∫ ∇·J dV ≈ −∇·J ΔV.]

In the same way, the probability density in 6N dimensions has a phase-space probability current (ρ Ṗ, ρ Q̇) and hence satisfies a continuity equation

    ∂ρ/∂t = −Σ_{α=1}^{3N} [ ∂(ρ q̇_α)/∂q_α + ∂(ρ ṗ_α)/∂p_α ]    (4.5)
          = −Σ_{α=1}^{3N} [ (∂ρ/∂q_α) q̇_α + ρ ∂q̇_α/∂q_α + (∂ρ/∂p_α) ṗ_α + ρ ∂ṗ_α/∂p_α ],

where the sum runs over all the q's and p's.

But what is meant by ∂q̇_α/∂q_α? For our example of point particles, q̇_α = p_α/m, which has no dependence on q_α; nor does ṗ_α = f_α(q₁, …, q_{3N}) have any dependence on the momentum p_α.⁶ Hence these two mysterious terms in equation 4.5 both vanish for Newton's laws for point particles. Indeed, in a general Hamiltonian system, using equation 4.1, we find that they cancel:

    ∂q̇_α/∂q_α = ∂(∂H/∂p_α)/∂q_α = ∂(∂H/∂q_α)/∂p_α = ∂(−ṗ_α)/∂p_α = −∂ṗ_α/∂p_α.    (4.6)

[⁶ It would typically depend on the coordinate q_α, for example.]
[⁴ Hamiltonian flows preserve a symplectic form ω = dq₁∧dp₁ + ⋯ + dq_{3N}∧dp_{3N}: Liouville's theorem follows because the volume in phase space is ω^{3N}.]


Hence, for Hamiltonian dynamics, equation 4.5 becomes

    ∂ρ/∂t + Σ_{α=1}^{3N} [ (∂ρ/∂q_α) q̇_α + (∂ρ/∂p_α) ṗ_α ] = dρ/dt = 0.    (4.7)

What is dρ/dt, and how is it different from ∂ρ/∂t? The former is called the total derivative of ρ with respect to time: it's the evolution of ρ seen by a particle moving with the flow. In a three-dimensional flow, dρ₃D/dt = ∂ρ₃D/∂t + v·∇ρ₃D = ∂ρ₃D/∂t + Σ_{i=1}^{3} (∂ρ₃D/∂x_i) ẋ_i; the first term is the change in ρ due to the time evolution at fixed position, and the second is the change in ρ that a particle moving with velocity v would see if the ρ field didn't change in time. Equation 4.7 is the same physical situation, but in 6N-dimensional phase space.

What does Liouville's theorem, dρ/dt = 0, tell us about Hamiltonian dynamics?

The flow is incompressible. In fluid mechanics, if dρ₃D/dt = 0 it means that the fluid is incompressible. The density of a small element of fluid doesn't change as it moves around in the fluid: hence the small element is not compressing or expanding. In Liouville's theorem, it means the same thing: a small volume in phase space will evolve into a new shape, perhaps stretched, twisted, or folded, but with exactly the same volume.
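This incompressibility can be watched numerically. The sketch below (our illustration; the pendulum Hamiltonian H = p²/2M − K cos x, the step size, and the patch size are arbitrary choices) evolves a small triangle of initial conditions with a symplectic-Euler step; the triangle shears and rotates, but its phase-space area barely changes:

```python
import math

def evolve(x, p, dt, steps, K=1.0, M=1.0):
    """Symplectic Euler for H = p^2/2M - K cos x: update p, then x.
    Each step is an area-preserving (symplectic) map of the (x, p) plane."""
    for _ in range(steps):
        p -= dt * K * math.sin(x)
        x += dt * p / M
    return x, p

def area(tri):
    """Shoelace area of a triangle of phase-space points (x, p)."""
    (x1, p1), (x2, p2), (x3, p3) = tri
    return abs((x2 - x1) * (p3 - p1) - (x3 - x1) * (p2 - p1)) / 2

d = 1e-4   # a patch small enough that the flow is nearly linear across it
tri0 = [(1.0, 0.0), (1.0 + d, 0.0), (1.0, d)]
tri1 = [evolve(x, p, 0.01, 2000) for x, p in tri0]
print(area(tri0), area(tri1))   # the two areas agree closely
```

Had we used a plain (non-symplectic) Euler step, the area would drift systematically; the symplectic map preserves it by construction, mirroring Liouville's theorem.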

There are no attractors. In other dynamical systems, most states of the system are usually transient, and the system settles down onto a small set of states called the attractor. A damped pendulum will stop moving: the attractor has zero velocity and vertical angle (exercise 4.1). A forced, damped pendulum will settle down to oscillate with a particular amplitude: the attractor is a circle in phase space. The decay of these transients would seem closely related to equilibration in statistical mechanics, where at long times all initial states of a system will settle down into boring static equilibrium behavior.⁷ Perversely, we've just proven that equilibration in statistical mechanics happens by a completely different mechanism! In equilibrium statistical mechanics all states are created equal: transient states are temporary only insofar as they are very unusual, so as time evolves they disappear, to arise again only as rare fluctuations.

[Fig. 4.1: A small volume in phase space may be stretched and twisted by the flow, but Liouville's theorem shows that the volume stays unchanged.]

[⁷ We'll return to the question of how irreversibility and damping emerge from statistical mechanics many times in the rest of this book. It will always involve introducing approximations to the microscopic theory.]

Microcanonical ensembles are time independent. An initial uniform density in phase space will stay uniform. More generally, since energy is conserved, a uniform density over a small shell of energies (E, E + δE) will stay uniform.

Liouville's theorem tells us that the energy surface may get stirred around, but the relative weights of parts of the surface are given by their phase-space volumes (figure 3.1) and don't change. This is clearly a necessary condition for our microcanonical ensemble to describe the time-independent equilibrium state.


4.2 Ergodicity

By averaging over the energy surface, statistical mechanics is making a hypothesis, first stated by Boltzmann. Roughly speaking, the hypothesis is that the energy surface is thoroughly stirred by the time evolution: it isn't divided into some kind of components that don't intermingle (see figure 4.2). A system which is thoroughly stirred is said to be ergodic.⁸ The original way of defining ergodicity is due to Boltzmann. Adapting his definition:

Definition 1: In an ergodic system, the trajectory of almost every⁹ point in phase space eventually passes arbitrarily close¹⁰ to every other point (position and momentum) on the surface of constant energy.

We say our Hamiltonian is ergodic if the time evolution is ergodic on each energy surface S.

[⁸ Mathematicians distinguish between ergodic (stirred) and mixing (scrambled); we only need to assume ergodicity here. See reference [61] for more information about ergodicity.]
[⁹ What does "almost every" mean? Technically, it means all but a set of zero volume (measure zero). Basically, it's there to avoid problems with initial conditions like all the particles moving precisely at the same velocity in neat rows.]
[¹⁰ Why not just assume that every point on the energy surface gets passed through? Boltzmann originally did assume this. However, it can be shown that a smooth curve (our time-trajectory) can't fill up a whole volume (the energy surface).]

The most important consequence of ergodicity is that time averages are equal to microcanonical averages.¹¹ Intuitively, since the trajectory (P(t), Q(t)) covers the whole energy surface, the average of any property A(P(t), Q(t)) over time is the same as the average of A over the energy surface.

[¹¹ If the system equilibrates (i.e., doesn't oscillate forever), the time average behavior will be determined by the equilibrium behavior, and then ergodicity implies that the equilibrium properties are equal to the microcanonical averages.]

This turns out to be tricky to prove, though. It's easier mathematically to work with another, equivalent definition of ergodicity. This definition roughly says the energy surface can't be divided into components which don't intermingle. Let's define an ergodic component R of a set S to be a subset that is left invariant under the flow (so r(t) ∈ R for all r(0) ∈ R).

Definition 2: A time evolution in a set S is ergodic if and only if all the ergodic components R in S either have zero volume or have a volume equal to the volume of S.

Why does definition 2 follow from definition 1? A trajectory r(t) of course must lie within a single ergodic component. If r(t) covers the energy surface densely (definition 1), then there is no more room for a second ergodic component with non-zero volume (definition 2).

Using this definition of ergodic, it's easy to argue that time averages must equal microcanonical averages. Let's denote the microcanonical average of an observable A as ⟨A⟩_S, and let's denote the time average starting at initial condition (P, Q) as A̅(P, Q).

Showing that the time average A̅ equals the ensemble average ⟨A⟩_S for an ergodic system (using this second definition) has three steps.

(1) Time averages are constant on trajectories. If A is a nice function (e.g., without any infinities on the energy surface), then it's easy to show that two starting points on the same trajectory give the same infinite-time average: the two averages differ only by the contribution of the finite time interval (t, t + τ) separating them, which vanishes in the infinite-time limit. Thus the time average A̅ is constant along each trajectory.


[Fig. 4.2: Non-ergodic motion. This is a (Poincaré) cross-section of Earth's motion in the three-body problem (exercise 4.2), with Jupiter's mass set at almost 70 times its actual value. The closed loops correspond to trajectories that form tori in phase space, whose cross-sections look like deformed circles in our view. The complex filled region is a single trajectory exhibiting chaotic motion, and represents an ergodic component. The tori, each an ergodic component, can together be shown to occupy non-zero volume in phase space, for small Jovian masses. Note that this system is not ergodic according to either of our definitions. The trajectories on the tori never explore the rest of the energy surface. The region R formed by the chaotic domain is invariant under the time evolution; it has positive volume, and the region outside R also has positive volume.]

(2) Time averages are constant on the energy surface. Now consider the subset R_a of the energy surface where A̅ < a, for some value a. Since A̅ is constant along a trajectory, any point in R_a is sent under the time evolution to another point in R_a, so R_a is an ergodic component. If our dynamics is ergodic on the energy surface, that means the set R_a has either zero volume or the volume of the energy surface. This implies that A̅ is a constant on the energy surface (except on a set of zero volume);¹² its value is a*, the lowest value where R_{a*} has the whole volume. Thus the equilibrium, time-average value of our observable A is independent of initial condition.

(3) Time averages equal microcanonical averages. Is this equilibrium value given by the microcanonical ensemble average over S? We need to show that the trajectories don't dawdle in some regions of the energy surface more than they should (based on the thickness of the energy shell, figure 3.1). Liouville's theorem in section 4.1 told us that the microcanonical ensemble was time independent. This implies that the microcanonical ensemble average of A is time independent, ⟨A⟩_S = ⟨A(P(t), Q(t))⟩_S,

[¹² If we could show that A̅ had to be a continuous function, we'd now be able to use the first definition of ergodicity to show that it was constant on the energy surface, since our trajectory comes close to every point on the surface. But it will not be continuous for Hamiltonian systems that are not ergodic. In figure 4.2, consider two initial conditions at nearby points, one just inside a chaotic region and the other on a KAM torus. The infinite-time averages on the two trajectories for most quantities will be different: A̅ will typically have a jump at the boundary.]


where the average ⟨·⟩_S integrates over initial conditions (P(0), Q(0)) but evaluates A at (P(t), Q(t)). Averaging over all time, and using the fact that A̅ = a* (almost everywhere), tells us

    ⟨A⟩_S = lim_{T→∞} (1/T) ∫₀ᵀ ⟨A(P(t), Q(t))⟩_S dt
          = ⟨ lim_{T→∞} (1/T) ∫₀ᵀ A(P(t), Q(t)) dt ⟩_S
          = ⟨A̅(P, Q)⟩_S = ⟨a*⟩_S = a*.    (4.9)

Thus the microcanonical average equals the time average for an ergodic Hamiltonian system.
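The simplest system in which "time average = space average" can be seen explicitly is an ergodic rotation of the circle, θ → θ + ω (mod 1) with ω irrational. The sketch below (our illustration, not from the text) compares time averages for an irrational and a rational winding number, anticipating the discussion in exercise 4.2:

```python
import math

def time_average(f, omega, N):
    """Average f along the orbit theta -> theta + omega (mod 1)."""
    theta, total = 0.0, 0.0
    for _ in range(N):
        total += f(theta)
        theta = (theta + omega) % 1.0
    return total / N

f = lambda th: math.cos(2 * math.pi * th)**4   # circle average is 3/8

golden = (math.sqrt(5) - 1) / 2   # irrational winding number: ergodic
print(time_average(f, golden, 100000))   # -> 0.375, the space average

# A rational winding number visits only 4 points: not ergodic, and the
# time average (0.5 here) disagrees with the space average.
print(time_average(f, 0.25, 100000))
```

The irrational rotation is ergodic (though not mixing), so its time average converges to the uniform average over the circle; the rational rotation's orbit is a closed loop of four points, an ergodic component of zero volume.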

Can we show that our systems are ergodic? Usually not. Ergodicity has been proven for the collisions of hard spheres, and for geodesic motion on finite surfaces with constant negative curvature,¹³ but not for many systems of immediate practical importance. Indeed, many fundamental problems precisely involve systems which are not ergodic.

[¹³ Geodesic motion on a sphere would be motion at a constant speed around great circles. Geodesics are the shortest paths between two points. In general relativity, falling bodies travel on geodesics in space-time.]

KAM tori and the three-body problem. Generations of mathematicians and physicists have worked on the gravitational three-body problem.¹⁴ The key challenge was showing that the interactions between the planets do not completely mess up their orbits over long times. One must note that messing up their orbits is precisely what an ergodic system must do! (There's just as much phase space at constant energy with Earth and Venus exchanging places, and a whole lot more with Earth flying out into interstellar space.) In the last century¹⁵ the KAM theorem was proven, which showed that (for small interplanetary interactions and a large fraction of initial conditions) the orbits of the planets qualitatively stayed in weakly perturbed ellipses around the Sun (KAM tori, see figure 4.2). Other initial conditions, intricately intermingled with the stable ones, lead to chaotic motion. Exercise 4.2 investigates the KAM tori and chaotic motion in a numerical simulation.

[¹⁴ Newton solved the gravitational two-body problem, giving Kepler's ellipse.]
[¹⁵ That is, the 20th century.]

From the KAM theorem and the study of chaos in these systems we learn that Hamiltonian systems with small numbers of particles are often, even usually, not ergodic: there are commonly regions formed by tori of non-zero volume which do not mix with the rest of the energy surface.
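The coexistence of KAM tori and chaos is easiest to explore in the Chirikov standard map, a kicked rotor that serves as a minimal stand-in for the Poincaré section of figure 4.2 (our illustration, not the three-body problem itself; the parameter K here is the kick strength, unrelated to the pendulum's spring constant):

```python
import math

TWO_PI = 2 * math.pi

def standard_map(theta, p, K, steps):
    """Chirikov standard map, an area-preserving map of the torus:
    p' = p + K sin(theta), theta' = theta + p' (both mod 2 pi).
    For small K most orbits lie on smooth invariant curves (KAM tori);
    for K of order 1 and larger, large chaotic regions appear."""
    traj = []
    for _ in range(steps):
        p = (p + K * math.sin(theta)) % TWO_PI
        theta = (theta + p) % TWO_PI
        traj.append((theta, p))
    return traj

# Plotting traj for several initial conditions at K = 0.5 shows nested
# smooth curves with thin chaotic bands between them; at K = 5 a single
# orbit fills most of the square, like the filled region in figure 4.2.
orbit = standard_map(0.1, 0.2, 0.5, 2000)
```

Scatter-plotting `orbit` for a grid of initial conditions, at a few values of K, reproduces the qualitative picture of intermingled tori and chaos described above.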

Fermi, Pasta, Ulam, and KdV. You might think that this is a peculiarity of having only a few particles. Surely if there are lots of particles, such funny behavior has to go away? On one of the early computers developed for the Manhattan project, Fermi, Pasta, and Ulam tested this [29]. They took a one-dimensional chain of atoms, coupled them with anharmonic potentials, and tried to look for thermalization. To quote them:

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

4.2 Ergodicity 59

"… the results of our computations were, from the beginning, surprising us. Instead of a continuous flow of energy from the first mode to the higher modes, all of the problems show an entirely different behavior. … Instead of a gradual increase of all the higher modes, the energy is exchanged, essentially, among only a certain few. It is, therefore, very hard to observe the rate of thermalization or mixing in our problem, and this was the initial purpose of the calculation." [29, p. 978]

It turns out that their system, in the continuum limit, gave a partial differential equation (the Korteweg–de Vries equation) that was even weirder than planetary motion: it had an infinite family of conserved quantities, and could be exactly solved using a combination of fronts called solitons.

The kind of non-ergodicity found in the Korteweg–de Vries equation was thought to arise in only rather special one-dimensional systems. The recent discovery of anharmonic localized modes in generic, three-dimensional systems by Sievers and Takeno [91, 85] suggests that non-ergodicity may arise in rather realistic lattice models.
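The FPU experiment is easy to repeat in a few lines. This sketch is our own minimal version (N, α, the amplitude, dt, and the run length are arbitrary choices): it integrates the α-chain with velocity Verlet, then projects (x, v) onto the sine normal modes; the energy stays concentrated in the lowest few modes instead of thermalizing.

```python
import math

N, alpha = 16, 0.25

def force(x):
    """FPU alpha chain, fixed ends: V(r) = r^2/2 + alpha r^3/3 per bond."""
    r = [x[0]] + [x[i] - x[i - 1] for i in range(1, N)] + [-x[N - 1]]
    f = [ri + alpha * ri * ri for ri in r]        # dV/dr for each bond
    return [f[i + 1] - f[i] for i in range(N)]    # net force on each atom

def mode_energy(x, v, k):
    """Energy in normal mode k of the harmonic part of the chain."""
    s = [math.sin(math.pi * k * (j + 1) / (N + 1)) for j in range(N)]
    norm = math.sqrt(2 / (N + 1))
    A = norm * sum(si * xi for si, xi in zip(s, x))
    Adot = norm * sum(si * vi for si, vi in zip(s, v))
    omega = 2 * math.sin(math.pi * k / (2 * (N + 1)))
    return 0.5 * (Adot**2 + (omega * A)**2)

# Start all the energy in the lowest mode.
x = [4.0 * math.sin(math.pi * (j + 1) / (N + 1)) for j in range(N)]
v = [0.0] * N
a = force(x)
dt = 0.05
for _ in range(20000):                            # velocity Verlet
    x = [xi + vi * dt + 0.5 * ai * dt * dt for xi, vi, ai in zip(x, v, a)]
    a_new = force(x)
    v = [vi + 0.5 * (ai + bi) * dt for vi, ai, bi in zip(v, a, a_new)]
    a = a_new

energies = [mode_energy(x, v, k) for k in range(1, N + 1)]
print([round(e, 3) for e in energies])   # almost all energy in low modes
```

Recording `energies` at many intermediate times shows the energy sloshing back and forth among the lowest few modes — the FPU recurrence — rather than spreading toward equipartition.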

Phase Transitions. In systems with an infinite number of particles, one can have phase transitions. Often ergodicity breaks down in one of the phases. For example, a liquid may explore all of phase space with a given energy, but an infinite crystal (with a neat grid of atoms aligned in a particular orientation) will never fluctuate to change its orientation, or (in three dimensions) the registry of its grid.¹⁶ The real system will explore only one ergodic component of the phase space (one crystal position and orientation), and we must do the same when making theories of the system.

[¹⁶ That is, a 3D crystal has broken orientational and translational symmetries: see chapter 9.]

Glasses. There are other kinds of breakdowns of the ergodic hypothesis. For example, glasses fall out of equilibrium as they are cooled: they no longer ergodically explore all configurations, but just oscillate about one of many metastable glassy states. Certain models of glasses and disordered systems can be shown to break ergodicity. It is an open question whether real glasses truly break ergodicity when cooled infinitely slowly, or whether they are just sluggish, frozen liquids.

Should we be concerned that we cannot prove that our systems are ergodic? It is entertaining to point out the gaps in our derivations, especially since they tie into so many central problems in mathematics and physics (above). We emphasize that these gaps are for most purposes purely of academic concern. Statistical mechanics works phenomenally well in systems with large numbers of interacting degrees of freedom. Indeed, the level of rigor here is unusual. In more modern applications¹⁷ of statistical mechanics outside of equilibrium thermal systems, there is rarely a justification comparable to that provided by Liouville's theorem and ergodicity.

[¹⁷ In disordered systems, disorder is heuristically introduced with Gaussian or dis…]

Exercises

(4.1) The Damped Pendulum vs. Liouville's Theorem. (Basic, Mathematics)

The damped pendulum has a force −γp proportional to the momentum slowing down the pendulum. It satisfies the equations

    ẋ = p/M,    (4.10)
    ṗ = −γp − K sin(x).

At long times, the pendulum will tend to an equilibrium stationary state, zero velocity at x = 0 (or more generally at the equivalent positions x = 2mπ, for m an integer): (p, x) = (0, 0) is an attractor for the damped pendulum. An ensemble of damped pendulums is started with initial conditions distributed with probability ρ(p₀, x₀). At late times, these initial conditions are gathered together near the equilibrium stationary state: Liouville's theorem clearly is not satisfied.

(a) In the steps leading from equation 4.5 to equation 4.7, why does Liouville's theorem not apply to the damped pendulum? More specifically, what is ∂ṗ/∂p and ∂q̇/∂q?

(b) Find an expression for the total derivative dρ/dt in terms of ρ for the damped pendulum. How does the probability density vary with time? If we evolve a region of phase space of initial volume A = Δp Δx, how will its volume depend upon time?

(4.2) Jupiter! and the KAM Theorem. (Astrophysics, Mathematics)

See also reference [96].

The foundation of statistical mechanics is the ergodic hypothesis: any large system will explore the entire energy surface. We focus on large systems because it is well known that many systems with a few interacting particles are definitely not ergodic.

The classic example of a non-ergodic system is the Solar system. Jupiter has plenty of energy to send the other planets out of the Solar system. Most of the phase-space volume of the energy surface has eight planets evaporated and Jupiter orbiting the Sun alone: the ergodic hypothesis would doom us to one long harsh winter. So, the big question is: Why hasn't the Earth been kicked out into interstellar space?

Mathematical physicists have studied this problem for hundreds of years. For simplicity, they focused on the three-body problem: for example, the Sun, Jupiter, and the Earth. The early (failed) attempts tried to do perturbation theory in the strength of the interaction between planets. Jupiter's gravitational force on the Earth is not tiny, though: if it acted as a constant brake or accelerator, our orbit would be way out of whack in a few thousand years. Jupiter's effects must cancel out over time rather perfectly...

This problem is mostly discussion and exploration: only a few questions need to be answered. Download the program Jupiter from the appropriate link at the bottom of reference [96]. (Go to the directory with the binaries and select Jupiter.) Check that Jupiter doesn't seem to send the Earth out of the Solar system. Try increasing Jupiter's mass to 35000 Earth masses. (If you type in a new value, you need to hit Enter to register it.)

Start the program over again (or reset Jupiter's mass back to 317.83 Earth masses). Shifting View to Earth's trajectory, run for a while, and zoom in with the right mouse button to see the small effects of Jupiter on the Earth. (The left mouse button will launch new trajectories. Clicking with the right button will restore the original view.) Note that the Earth's position shifts depending on whether Jupiter is on the near or far side of the sun.

(a) Estimate the fraction that the Earth's radius from the Sun changes during the first Jovian year (about 11.9 years). How much does this fractional variation increase over the next hundred Jovian years?
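For exercise 4.1(b), the expected answer — phase-space volumes of the damped pendulum contract as e^(−γt) — can be checked numerically with a small triangle of initial conditions (a sketch; γ, K, M, and the step size are our arbitrary choices):

```python
import math

gamma, K, M, dt = 0.5, 1.0, 1.0, 1e-3

def step(x, p):
    """Euler step for the damped pendulum, equations 4.10."""
    p += dt * (-gamma * p - K * math.sin(x))
    x += dt * p / M
    return x, p

def area(tri):
    """Shoelace area of a triangle of phase-space points (x, p)."""
    (x1, p1), (x2, p2), (x3, p3) = tri
    return abs((x2 - x1) * (p3 - p1) - (x3 - x1) * (p2 - p1)) / 2

d = 1e-4
tri = [(1.0, 0.0), (1.0 + d, 0.0), (1.0, d)]
a0 = area(tri)
steps = 5000
for _ in range(steps):
    tri = [step(x, p) for x, p in tri]
t = steps * dt
print(area(tri) / a0, math.exp(-gamma * t))   # both are about e^(-gamma*t)
```

The divergence of the damped flow is ∂ẋ/∂x + ∂ṗ/∂p = −γ, so volumes shrink exponentially — exactly the violation of Liouville's theorem the exercise asks about.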

[¹⁷, continued: … In Bayesian statistics, the user is in charge of determining the prior model probability distribution, analogous to Liouville's theorem determining the measure on phase space.]


…around a tube. This orbit in physical three-dimensional space is a projection of the tube in 6N-dimensional phase space. The tube in phase space already exists for massless planets…

Let's start in the non-interacting planet approximation (where Earth and Jupiter are assumed to have zero mass). Both Earth's orbit and Jupiter's orbit then become circles, or more generally ellipses. The field of topology does not distinguish an ellipse from a circle: any stretched, wiggled rubber band is a circle so long as it forms a curve that closes into a loop. Similarly, a torus (the surface of a doughnut) is topologically equivalent to any closed surface with one hole in it (like the surface of a coffee cup, with the handle as the hole). Convince yourself in this non-interacting approximation that Earth's orbit remains topologically a circle in its six-dimensional phase space.¹⁸

(b) In the non-interacting planet approximation, what topological surface is it, in the eighteen-dimensional phase space, that contains the trajectory of the three bodies? Choose between (i) sphere, (ii) torus, (iii) Klein bottle, (iv) two-hole torus, (v) complex projective plane.¹⁹ About how many times does Earth wind around this surface during each Jovian year? (This ratio of years is called the winding number.)

The mathematical understanding of the three-body problem came only in the past hundred years or so, from Kolmogorov, Arnold, and Moser. Their proof focuses on the topological integrity of this tube in phase space (now called the KAM torus). They were able to prove stability if the winding number (Jupiter year over Earth year) is sufficiently irrational. More specifically, they could prove in this case that for sufficiently small planetary masses there is a distorted torus in phase space, near the unperturbed one, around which the planets spiral with the same winding number.

(c) About how large can you make Jupiter's mass before Earth's orbit stops looking like a torus? (You can hit Clear and Reset to put the planets back to a standard starting point. Otherwise, your answer will depend upon the location of Jupiter in the sky.) Admire the cool orbits when the mass becomes too heavy.

Thus, for small Jovian masses, the trajectory in phase space is warped and rotated a bit, so that its toroidal shape is visible looking at Earth's position alone. (The circular orbit for zero Jovian mass is looking at the torus …)

[¹⁸ Hint: plot the orbit in the (x, y), (x, p_x), and other planes. It should look like the projection of a circle along various axes.]
[¹⁹ Hint: It's a circle cross a circle, parameterized by two independent angles: one representing the month of Earth's year, and one representing the month of the Jovian year. Feel free to look at part (c) before committing yourself, if pure thought isn't enough.]

The fact that the torus isn't destroyed immediately is a serious problem for statistical mechanics! The orbit does not ergodically explore the entire allowed energy surface. This is a counterexample to Boltzmann's ergodic theorem. That means that time averages are not equal to averages over the energy surface: our climate would be very unpleasant, on the average, if our orbit were ergodic.

Let's use a Poincaré section to explore these tori, and the chaotic regions between them. If a dynamical system keeps looping back in phase space, one can take a cross-section of phase space and look at the mapping from that cross section back into itself (see figure 4.3).

[Fig. 4.3: The Poincaré section of a torus is a circle. The dynamics on the torus becomes a mapping of the circle onto itself.]

The Poincaré section shown in the figure is a planar cross section in a three-dimensional phase space. Can we reduce our problem to an interesting problem with three phase-space coordinates? The original problem has an eighteen-dimensional phase space. In the center of mass frame it has twelve interesting dimensions. If we restrict the motion to a plane, it reduces to eight dimensions. If we assume the mass of the Earth is zero (the restricted planar three-body problem) we have five relevant coordinates (Earth xy positions and velocities, and the location of Jupiter along its orbit). If we remove one more variable

(c) James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity.

Phase Space Dynamics and Ergodicity

by going to a rotating coordinate system that rotates with Jupiter, the current state of our model can be described with four numbers: two positions and two momenta for the Earth. We can remove another variable by confining ourselves to a fixed energy. The true energy of the Earth isn't conserved (because Earth feels a periodic potential), but there is a conserved quantity which is like the energy in the rotating frame: more details are described under Help or on the Web [96] under Description of the Three Body Problem. This leaves us with a trajectory in three dimensions (so, for small Jovian masses, we have a torus embedded in a three-dimensional space). Finally, we take a Poincare cross section: we plot a point of the trajectory every time Earth passes directly between Jupiter and the Sun. I plot the distance to Jupiter along the horizontal axis, and the velocity component towards Jupiter along the vertical axis; the perpendicular component of the velocity isn't shown (and is determined by the energy).

Set the View to Poincare. (You may need to expand the window a bit: sometimes the dot size is too small to see.) Set Jupiter's mass to 2000, and run for 1000 years. You should see two nice elliptical cross-sections of the torus. As you increase the mass (type in a mass, Enter, Reset and Run, repeat), watch the toroidal cross sections as they break down. Run for a few thousand years at M_J = 22000 M_e; notice the torus has broken into three circles.

Fixing the mass at M_J = 22000 M_e, let's explore the dependence of the planetary orbits on the initial condition. Select the preset for Chaos (or set M_J to 22000 M_e, View to Poincare, and Reset). Clicking on a point on the screen with the left mouse button will launch a trajectory with that initial position and velocity towards Jupiter; it sets the perpendicular component of the velocity to keep the current energy. (If you click on a point where energy cannot be conserved, the program will tell you so.) You can thus view the trajectories on a two-dimensional cross-section of the three-dimensional constant-energy surface.

Notice that many initial conditions slowly fill out closed curves. These are KAM tori that have been squashed and twisted like rubber bands.20 Explore until you find some orbits that seem to fill out whole regions: these represent chaotic orbits.21

(d) If you can do a screen capture, print out a Poincare section with initial conditions both on KAM tori and in chaotic regions: label each.22 See figure 4.2 for a small segment of the picture you should generate.

It turns out that proving that Jupiter's effects cancel out depends on Earth's smoothly averaging over the surface of the torus. If Jupiter's year is a rational multiple of Earth's year, the orbit closes after a few years and you don't average over the whole torus: only over a closed spiral. Rational winding numbers, we now know, lead to chaos when the interactions are turned on: the large chaotic region you found above is associated with an unperturbed orbit with a winding ratio of 3:1. Of course, the rational numbers are dense: between any two KAM tori there are chaotic regions, just because between any two irrational numbers there are rational ones. It's even worse: it turns out that numbers which are really, really close to rational (Liouville numbers like 1 + 1/10 + 1/10^10 + 1/10^(10^10) + ...) may also lead to chaos. It was amazingly tricky to prove that lots of tori survive nonetheless. You can imagine why this took hundreds of years to understand (especially without computers).

20 You can Continue if the trajectory doesn't run long enough to give you a complete feeling for the cross-section; also, increase the time to run. You can zoom in with the right mouse button, and zoom out by expanding the window or by using the right button and selecting a box which extends outside the window.

21 Notice that the chaotic orbit doesn't throw the Earth out of the Solar system. The chaotic regions near infinity and near our initial condition are not connected. This may be an artifact of our simplified model: in other larger systems it is believed that all chaotic regions (on a connected energy surface) are joined through Arnold diffusion.

22 At least under Linux, the Print feature is broken. Under Linux, try gimp: File Menu, then Acquire, then Screen Shot. Under Windows, alt-Print Screen and then Paste into your favorite graphics program.
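The breakup of invariant tori as a perturbation grows can also be explored without the planetary simulator. A minimal sketch, using the Chirikov standard map (our own choice of stand-alone illustration, not part of this exercise): an area-preserving map of a kicked rotor whose orbits lie on KAM-like invariant curves for small kick strength K and wander chaotically for large K.

```python
import math

def standard_map(theta, p, K, n):
    """Iterate the Chirikov standard map n times:
       p' = p + K sin(theta),  theta' = theta + p'  (both mod 2 pi)."""
    orbit = []
    for _ in range(n):
        p = (p + K * math.sin(theta)) % (2 * math.pi)
        theta = (theta + p) % (2 * math.pi)
        orbit.append((theta, p))
    return orbit

# Small K: the momentum stays near its initial value, tracing out a
# slightly wrinkled invariant curve (the analogue of a KAM torus).
quiet = standard_map(1.0, 2.0, K=0.1, n=2000)
spread_quiet = max(p for _, p in quiet) - min(p for _, p in quiet)

# Large K: the same initial condition wanders over most of the cylinder.
wild = standard_map(1.0, 2.0, K=5.0, n=2000)
spread_wild = max(p for _, p in wild) - min(p for _, p in wild)

print(spread_quiet, spread_wild)
```

Plotting many such orbits for K between these extremes shows surviving tori threaded between growing chaotic regions, qualitatively like the Jupiter-mass sweep above.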

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

4.2 Ergodicity 63

We consider the behavior of the logistic mapping from the unit interval (0, 1) into itself,23

    f(x) = 4 \mu x (1 - x).  (4.11)

We talk of the trajectory of an initial point x_0 as the sequence of points x_0, f(x_0), f(f(x_0)), ..., f^{[n]}(x_0), .... Iteration can be thought of as a time step (one iteration of a Poincare return map of exercise 4.2 or one step \Delta t in a time-step algorithm as in exercise 8.9).

Attracting Fixed Point: For small \mu, our mapping has an attracting fixed point. A fixed point of a mapping is a value x* = f(x*); a fixed point is stable if small perturbations shrink:

    |f(x* + \epsilon) - x*| \approx |f'(x*)| \epsilon < \epsilon,  (4.12)

which happens if the derivative |f'(x*)| < 1.24

(a) Iteration: Set \mu = 0.2; iterate f for some initial points 0 < x_0 < 1 of your choosing, and convince yourself that they all are attracted to zero. Plot f and the diagonal y = x on the same plot. Are there any fixed points other than x = 0? Repeat for \mu = 0.4 and 0.6. What happens?

Analytics: Find the non-zero fixed point x*(\mu) of the map 4.11, and show that it exists and is stable for 1/4 < \mu < 3/4. If you're ambitious or have a computer algebra program, show that there is a stable period-two cycle for 3/4 < \mu < (1 + \sqrt{6})/4.

An attracting fixed point is the antithesis of Liouville's theorem: all initial conditions are transient except one, and all systems lead eventually to the same, time-independent state. (On the other hand, this is precisely the behavior we expect in statistical mechanics on the macroscopic scale: the system settles down into a time-independent equilibrium state! All microstates are equivalent, but the vast majority of accessible microstates have the same macroscopic behavior in most large systems.) We could define a rather trivial equilibrium ensemble for this system, which consists of the single point x*: any property A(x) will have the long-time average <A> = A(x*).

For larger values of \mu, more complicated things happen. At \mu = 1, the dynamics can be shown to fill the entire interval: the dynamics is ergodic, and the attractor fills the entire set of available states. However, unlike the case of Hamiltonian systems, not all states are weighted equally. We can find time averages for functions of x in two ways: by averaging over time (many iterates of the map) or by weighting an integral over x by the invariant density \rho(x). The invariant density \rho(x) dx is the probability that a point on a long trajectory will lie between x and x + dx. To find it numerically, we iterate a typical point25 x_0 a thousand or so times (N_transient) to find a point x_a on the attractor, and then collect a long trajectory of perhaps a million points (N_cycles). A histogram of this trajectory gives \rho(x). Clearly averaging over this density is the same as averaging over the trajectory of a million points. We call \rho(x) an invariant measure because it's left invariant under the mapping f: iterating our million-point approximation for \rho once under f only removes the first point x_a and adds one extra point to the end.

(b) Invariant Density: Set \mu = 1; iterate f many times, and form a histogram of values giving the density \rho(x) of points along the trajectory. You should find that points x near the boundaries are approached more often than points near the center.

Analytics: Using the fact that the long-time average \rho(x) must be independent of time, verify for \mu = 1 that the density of points is26

    \rho(x) = 1 / (\pi \sqrt{x(1-x)}).  (4.13)

Plot this theoretical curve with your numerical histogram. (Hint: The points in a range dx of a point x map under f to a range dy = f'(x) dx around the image y = f(x). Each iteration maps two points x_a and x_b = 1 - x_a to y, and thus maps all the density \rho(x_a)|dx_a| and \rho(x_b)|dx_b| into dy. Hence the probability \rho(y) dy must equal \rho(x_a)|dx_a| + \rho(x_b)|dx_b|, so

    \rho(f(x_a)) = \rho(x_a)/|f'(x_a)| + \rho(x_b)/|f'(x_b)|.  (4.14)

Plug equation 4.13 into equation 4.14. You'll need to factor a polynomial.)

24 For many-dimensional mappings, a sufficient criterion for stability is that all the eigenvalues of the Jacobian have magnitude smaller than one. A continuous time evolution dy/dt = F(y) will be stable if dF/dy is smaller than zero, or (for multidimensional systems) if the Jacobian DF has eigenvalues whose real parts are all less than zero. This is all precisely analogous to discrete and continuous-time Markov chains; see section 8.2.

25 For example, we must not choose an unstable fixed point or unstable periodic orbit!

26 You need not derive the factor 1/\pi, which normalizes the probability density to one.


Mathematicians call this probability density \rho(x) dx the invariant measure on the attractor.27 To get the long-term average of any function A(x), one can use

    <A> = \int A(x) \rho(x) dx.  (4.15)

The measure gives the weights needed for the different regions in doing integrals: precisely our \rho(x) dx. Notice that, for the case of an attracting fixed point, we would have \rho(x) = \delta(x - x*).28

Cusps in the invariant density: At values of \mu slightly smaller than one, our mapping has a rather complex invariant density.

(c) Find the invariant density (as described above) for \mu = 0.9. Make your trajectory length N_cycles big enough and the bin size small enough to see the interesting structures. Notice that the attractor no longer fills the whole range (0, 1): locate roughly where the edges are. Notice also the cusps in \rho(x) at the edges of the attractor, and also at places inside the attractor (called boundaries, see reprint above). Locate some of the more prominent cusps.

Analytics of cusps: Notice that f'(1/2) = 0, so by equation 4.14 we know that \rho(f(x)) \geq \rho(x)/|f'(x)| must have a singularity near x = 1/2: all the points near x = 1/2 are squeezed together and folded to one side by f. Further iterates of this singularity produce more cusps: the crease after one fold stays a crease after being further stretched and kneaded.

(d) Set \mu = 0.9. Calculate f(1/2), f(f(1/2)), ... and compare these iterates to the locations of the edges and cusps from part (c). (You may wish to include them both on the same plot.)

Bifurcation Diagram: The evolution of the attractor and its invariant density as \mu varies is plotted in the bifurcation diagram, which is shown for large \mu in figure 4.4. One of the striking features in this plot is the sharp boundaries formed by the cusps.

Fig. 4.4 Bifurcation diagram in the chaotic region. Notice the boundary lines threading through the diagram, images of the crease formed by the folding at x = 1/2 in our map (see reprint above).

(e) Bifurcation Diagram: Plot the attractor (duplicating figure 4.4) as a function of \mu, for 0.8 < \mu < 1. (Pick regularly spaced \delta\mu, run n_transient steps, record n_cycles steps, and plot. After the routine is working, you should be able to push n_transient and n_cycles both larger than 100, and \delta\mu < 0.01.) On the same plot, for the same values of \mu, plot the first eight images of x = 1/2, that is, f(1/2), f(f(1/2)), .... Are the boundaries you see just the cusps? What happens in the bifurcation diagram when two boundaries touch? (See the reprint above.)

27 There are actually many possible invariant measures on some attractors: this ...

28 The case of a fixed point then becomes mathematically a measure with a point mass at x*.


5 Free Energies and Ensembles

In the preceding chapters, we have in principle defined equilibrium statistical mechanics as the microcanonical ensemble average over the energy surface. All that remains is to calculate the properties of particular systems. Computing properties of large systems fixing the total energy, though, turns out to be quite awkward. One is almost always interested either in the properties of a small subsystem or in a system that is coupled in some way with the external world. The calculations are made far easier by using ensembles appropriate for subsystems of larger systems. In this chapter, we will introduce two such ensembles, the canonical ensemble1 and the grand canonical ensemble.

But first, let us motivate the introduction of these ensembles by introducing the concept of free energy.

A mass M hangs on the end of a spring of spring constant K and unstretched length h_0, subject to a gravitational field of strength g. How far does the spring stretch? We have all solved innumerable statics problems of this sort in first-year mechanics courses. The spring stretches to a length h* where mg = -K(h* - h_0), where the forces balance and the energy is minimized.

Fig. 5.1 A mass on a spring in equilibrium sits very close to the minimum of the energy.

What principle of physics is this? In physics, energy is conserved, not minimized! Shouldn't we be concluding that the mass will oscillate with a constant amplitude forever?

We have now come to the point in your physics education where we can finally explain why the mass appears to minimize energy. Here our subsystem (the mass and spring)2 is coupled to a very large number N of internal atomic or molecular degrees of freedom. The oscillation of the mass is coupled to these other degrees of freedom (friction) and will share its energy with them. The vibrations of the atoms are heat: the energy of the pendulum is dissipated by friction into heat. Indeed, since the spring potential energy is quadratic we can use the equipartition theorem from section 3.2.2: <1/2 K(h - h*)^2> = 1/2 k_B T. For a spring with K = 10 N/m at room temperature, k_B T = 4 x 10^{-21} J, so \sqrt{<(h - h*)^2>} = \sqrt{k_B T / K} = 2 x 10^{-11} m = 0.2 Angstrom. The energy is indeed minimized up to tiny thermal fluctuations.

1 Webster's canonical: reduced to the simplest or clearest schema possible. The canonical ensemble will be simpler to compute with than the microcanonical one.

2 We think of the subsystem as being just the macroscopic configuration of mass and spring, and the atoms comprising them as being part of the environment, the rest of the system.
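The equipartition estimate above is a two-line computation; the numbers here are the ones used in the text.

```python
import math

k_B_T = 4e-21   # J, room temperature (value used in the text)
K = 10.0        # N/m, spring constant from the text

# Equipartition: (1/2) K <(h - h*)^2> = (1/2) k_B T,
# so the rms displacement is sqrt(k_B T / K).
rms = math.sqrt(k_B_T / K)
print(rms)   # 2e-11 m = 0.2 Angstrom, as quoted in the text
```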


What if our subsystem is more complex than one degree of freedom, say a piston filled with gas? The entropy of our subsystem can also be important. The second law of thermodynamics states that the entropy of the universe tends to a maximum. What function of the energy E and the entropy S for our system tends to a minimum, when it is coupled to the rest of the universe?

Let's start with a subsystem of energy E_s and fixed volume and number of particles, weakly coupled to a world at temperature T_b. In section 5.2 we shall find that small subsystems of large systems in equilibrium are described by the canonical ensemble, and we shall define the Helmholtz free energy A(T, V, N) = E - TS.3 Can we see in advance that minimizing the Helmholtz free energy of the subsystem maximizes the entropy of the universe? If A = E_s - T_b S_s(E_s), then A is a minimum if its derivative

    \partial A/\partial E_s = 1 - T_b \, \partial S_s/\partial E_s  (5.1)

equals zero. The last term contains the inverse temperature of the subsystem, \partial S_s/\partial E_s = 1/T_s, so

    \partial A/\partial E_s = 1 - T_b/T_s = 0  (5.2)

once the temperature of our system equals the bath temperature. But equating the temperatures maximizes the total entropy, since \partial (S_s(E_s) + S_b(E - E_s))/\partial E_s = 1/T_s - 1/T_b: minimizing the Helmholtz free energy A of our subsystem maximizes the entropy of the universe as a whole.

Let's write out the Helmholtz free energy in more detail:

    A(T, V, N) = E - TS.  (5.3)

The terms on the right-hand side of the equation involve four variables: T, V, N, and E. Why is A only a function of three? These functions are only meaningful for systems in equilibrium. In equilibrium, we have just shown in equations 5.1 and 5.2 that \partial A/\partial E = 0 for all values of the four variables. Hence A is independent of E. This is an example of a Legendre transformation. Legendre transformations allow one to change from one type of energy or free energy4 to another, by changing from one set of independent variables (here E, V, and N) to another (T, V, and N).

We introduced in problem 3.5 the thermodynamics nomenclature (equation 3.67)

    dE = T dS - P dV + \mu dN,  (5.4)

which basically asserts that E(S, V, N) satisfies

    (\partial E/\partial S)_{N,V} = T,  (5.5)
    (\partial E/\partial V)_{N,S} = -P, and
    (\partial E/\partial N)_{V,S} = \mu.

3 It's called the free energy because it's the energy available to do work: to run a heat engine you'd need to send energy Q (with Q/T = S) into the cold bath to get rid of the entropy, leaving A = E - TS free to do work (section 6.1).

4 The entropy, energy, and various free energies are also called thermodynamic potentials.


The corresponding differential for the Helmholtz free energy is dA = -S dT - P dV + \mu dN (5.6), which satisfies

    (\partial A/\partial T)_{N,V} = -S,  (5.7)
    (\partial A/\partial V)_{N,T} = -P, and
    (\partial A/\partial N)_{V,T} = \mu.

Systems at constant temperature and pressure (including most biological and chemical systems, see exercise 5.8) minimize the Gibbs free energy

    G(T, P, N) = E - TS + PV = E_s - T_b S_s + P_b V_s,  (5.8)

with

    dG = d(A + PV) = -S dT + V dP + \mu dN.  (5.9)

Systems at constant energy and pressure minimize the enthalpy H = E + PV. If we allow the number of particles N in our system to fluctuate, we will find four more free energies involving the chemical potential \mu. The only one we will discuss will be the grand free energy

    \Phi(T, V, \mu) = E - TS - \mu N  (5.10)

with

    d\Phi = d(A - \mu N) = -S dT - P dV - N d\mu.  (5.11)

The grand free energy will arise naturally in the grand canonical ensemble discussed in section 5.4.

Enough generalities and thermodynamics for now. Let's turn back to statistical mechanics, and derive the ensemble appropriate for subsystems.

5.2 The Canonical Ensemble

Consider a closed system of total energy E composed of a small part (the subsystem) in a particular state, surrounded by a much larger heat bath5 with entropy S_HB(E_HB) = k_B log(\Omega_HB(E_HB)). We assume the two parts can exchange energy, but are weakly coupled (section 3.3). For now, we assume they cannot exchange volume or particles.

We are interested in how much the probability of our subsystem being in a particular state depends upon its energy. Consider two states of the subsystem s_1 and s_2 with energies E_1 and E_2. As we discussed in deriving equation 3.23 (see note 26), the probability that our subsystem

5 It's called the heat bath because it will act as a source and sink for heat energy from our subsystem.


will be in the particular6 state s_i is proportional to the density of states of our heat bath at E - E_i,

    \rho(s_i) \propto \Omega_HB(E - E_i) = \exp(S_HB(E - E_i)/k_B),  (5.12)

since it gets a share of the microcanonical probability for each heat-bath partner it can coexist with at the fixed total energy.

Now, we are assuming that the bath is much larger than the subsystem. We can therefore assume that the inverse temperature 1/T_HB = \partial S_HB/\partial E of the heat bath is constant in the range (E - E_1, E - E_2), so

    \rho(s_2)/\rho(s_1) = e^{(S_HB(E - E_2) - S_HB(E - E_1))/k_B} = e^{(E_1 - E_2)(\partial S_HB/\partial E)/k_B}
                        = e^{(E_1 - E_2)/k_B T_HB}.  (5.13)

Thus the probability P_n of a particular state of a subsystem of energy E_n is proportional to the Boltzmann factor e^{-E_n/k_B T} (5.14). We know that the probability is normalized, \sum_n P_n = 1, so that gives us

    P_n = e^{-E_n/k_B T} / \sum_m e^{-E_m/k_B T} = \exp(-E_n/k_B T)/Z,  (5.15)

where the normalization factor

    Z(T, N, V) = \sum_m e^{-E_m/k_B T} = \sum_n \exp(-\beta E_n)  (5.16)

is the partition function, and where we also introduce the convenient symbol \beta = 1/k_B T, which we'll call the inverse temperature (verbally ignoring the factor k_B).

Fig. 5.2 An isolated system is composed of a small subsystem in state s_i surrounded by a larger region we call the heat bath. The total system is isolated from the outside world; the subsystem and heat bath here each have fixed volume, but can exchange energy. The energy of interaction between the subsystem and bath is assumed negligible. The total energy is E, the energy of the subsystem is E_i, and the energy of the bath is E_HB = E - E_i.

Equation 5.15 is the definition of the canonical ensemble, appropriate for calculating properties of systems coupled energetically to an external world with temperature T. The partition function Z is simply the normalization factor that keeps the total probability summing to one. It may surprise you to discover that this normalization factor plays a central role in most calculations: a typical application involves finding a microscopic way of calculating Z, and then using Z to calculate everything else of interest. Let's see how this works by using Z to calculate several important quantities.

Internal energy. Let's calculate the average internal energy of our subsystem <E>, where the angle brackets represent canonical averages:7

    <E> = \sum_n E_n P_n = \sum_n E_n e^{-\beta E_n}/Z = -(\partial Z/\partial\beta)/Z
        = -\partial \log Z/\partial\beta.  (5.17)

6 In this section, it is convenient to assume the energy states are discrete. Hence we will talk about probabilities rather than probability densities, and write \sum_n rather than \int dP dQ. Discrete states will be the norm both in quantum systems and for our later work on Ising and other model systems. No complications arise from translating the equations in this section back into integrals over probability density.

7 Throughout this text, all logs are natural logarithms, log_e and not log_10.
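Equations 5.15 through 5.17 can be checked directly for any small discrete spectrum. A minimal sketch (the three-level spectrum and the names are our own illustration), verifying <E> = -d(log Z)/d(beta) by a finite difference:

```python
import math

def Z(levels, beta):
    """Partition function, equation 5.16."""
    return sum(math.exp(-beta * E) for E in levels)

def average_E(levels, beta):
    """Canonical average <E> = sum_n E_n P_n, with P_n = exp(-beta E_n)/Z."""
    z = Z(levels, beta)
    return sum(E * math.exp(-beta * E) for E in levels) / z

levels = [0.0, 1.0, 3.0]   # an arbitrary three-level spectrum (energy units k_B = 1)
beta = 1.3

# Equation 5.17: <E> = -d(log Z)/d(beta), checked by a centered difference.
h = 1e-6
dlogZ = (math.log(Z(levels, beta + h)) - math.log(Z(levels, beta - h))) / (2 * h)
print(average_E(levels, beta), -dlogZ)   # the two should agree closely
```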


Specific Heat. Let's call c_v the specific heat per particle at constant volume.8

    N c_v = \partial<E>/\partial T = (d\beta/dT) \, \partial<E>/\partial\beta
          = -\frac{1}{k_B T^2} \frac{\partial}{\partial\beta}\left(\frac{\sum_n E_n e^{-\beta E_n}}{\sum_n e^{-\beta E_n}}\right)  (5.18)
          = \frac{1}{k_B T^2}\left(-\left(\frac{\sum_n E_n e^{-\beta E_n}}{Z}\right)^2 + \frac{\sum_n E_n^2 e^{-\beta E_n}}{Z}\right)
          = \frac{1}{k_B T^2}\left(<E^2> - <E>^2\right)
          = \sigma_E^2 / k_B T^2,  (5.19)

where \sigma_E is the root-mean-square fluctuation9 of the energy of our system at constant temperature.

Thus the energy fluctuations per particle,10

    \sigma_E/N = \sqrt{<E^2> - <E>^2}/N = \sqrt{(k_B T)(c_v T)}/\sqrt{N},  (5.20)

become small for large numbers of particles N, as we showed from thermodynamics in section 3.3. For macroscopic systems, the behavior in most regards is the same whether the system is completely isolated (microcanonical) or in thermal contact with the rest of the world (canonical).

Notice that we've derived a formula relating a macroscopic susceptibility (c_v, how the energy changes when the temperature is perturbed) to a microscopic fluctuation (\sigma_E, the energy fluctuation in thermal equilibrium). In general, fluctuations can be related to responses in this fashion. These relations are extremely useful, for example, in extracting susceptibilities from numerical simulations. No need to make small changes and try to measure the response: just watch it fluctuate.

8 The specific heat is the energy needed to increase the temperature by one unit. Also, d\beta/dT = d(1/k_B T)/dT = -1/(k_B T^2).

9 We've used the standard trick <(E - <E>)^2> = <E^2> - 2<E><E> + <E>^2 = <E^2> - <E>^2, since <E> is just a constant that can be pulled out of the ensemble average.

10 k_B T is two-thirds the equipartition kinetic energy per particle. c_v T is the energy per particle that it would take to warm the system up from absolute zero, if the specific heat were constant for all temperatures. The fluctuation in the energy per particle of a macroscopic system is the geometric mean of these two, divided by \sqrt{N} \sim 10^{12}.
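The fluctuation-response relation of equations 5.18 and 5.19 is easy to verify numerically. A sketch for a two-level system (our own toy choice, with k_B = 1): the heat capacity from a numerical temperature derivative of <E> should equal the energy variance divided by T^2.

```python
import math

def canonical_stats(levels, T):
    """Return <E> and <E^2> in the canonical ensemble (k_B = 1)."""
    weights = [math.exp(-E / T) for E in levels]
    z = sum(weights)
    E1 = sum(E * w for E, w in zip(levels, weights)) / z
    E2 = sum(E * E * w for E, w in zip(levels, weights)) / z
    return E1, E2

levels = [0.0, 1.0]   # a two-level system
T = 0.7

# Response: numerical derivative of <E> with respect to T ...
h = 1e-5
Ep, _ = canonical_stats(levels, T + h)
Em, _ = canonical_stats(levels, T - h)
heat_capacity = (Ep - Em) / (2 * h)

# ... versus fluctuation: (<E^2> - <E>^2) / T^2, as in equation 5.19.
E1, E2 = canonical_stats(levels, T)
fluctuation = (E2 - E1 * E1) / T**2

print(heat_capacity, fluctuation)   # equal up to finite-difference error
```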

Entropy. Using the general formula for the entropy (equation 6.22),

    S = -k_B \sum_n P_n \log P_n = -k_B \sum_n \frac{e^{-\beta E_n}}{Z} \log\left(\frac{e^{-\beta E_n}}{Z}\right)
      = -k_B \sum_n \frac{e^{-\beta E_n}}{Z}(-\beta E_n - \log Z)
      = k_B \beta \sum_n \frac{E_n e^{-\beta E_n}}{Z} + k_B \log Z = <E>/T + k_B \log Z.  (5.21)

Thus in particular A = <E> - TS = -k_B T \log Z (5.22): the Helmholtz free energy is minus k_B T times the log of the partition function. The relationship between the partition function Z and the Helmholtz free energy A = -k_B T \log Z in the canonical ensemble is quite analogous to the relation between the energy-shell volume \Omega and the entropy S = k_B \log \Omega in the microcanonical ensemble:


    (\partial A/\partial T)_{N,V} = -\partial(k_B T \log Z)/\partial T = -k_B \log Z - k_B T \, d\log Z/dT
                                  = -k_B \log Z - k_B T \frac{1}{k_B T^2} <E> = -k_B \log Z - <E>/T
                                  = -S,  (5.23)

just as we saw from thermodynamics (equation 5.7).

5.3 Non-Interacting Canonical Distributions

Suppose we have a system with two weakly interacting subsystems L and R, both connected to a heat bath at inverse temperature \beta. The states for the whole system are pairs of states (s_i^L, s_j^R) from the two subsystems, with energies E_i^L and E_j^R respectively. The partition function for the whole system is

    Z = \sum_{ij} \exp(-\beta(E_i^L + E_j^R)) = \sum_{ij} e^{-\beta E_i^L} e^{-\beta E_j^R}
      = \left(\sum_i e^{-\beta E_i^L}\right)\left(\sum_j e^{-\beta E_j^R}\right)
      = Z^L Z^R.  (5.24)

Thus in the canonical ensemble of non-interacting systems the partition function factors. The Helmholtz free energy therefore adds,

    A = -k_B T \log Z = -k_B T \log(Z^L Z^R) = A^L + A^R,  (5.25)

as do the entropy, average energy, and other properties that one would expect to scale with the size of the system.11

This is much simpler than the same calculation would be in the microcanonical ensemble! In the microcanonical ensemble, each subsystem competes with the other for the available total energy. Even though the two subsystems are uncoupled (the energy of one is independent of the state of the other), the microcanonical ensemble intermingles them in the calculation. By allowing each to draw energy from a large heat bath, uncoupled subsystems become independent calculations in the canonical ensemble.

11 These properties are termed extensive, as opposed to intensive properties like pressure, temperature, and chemical potential.
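Equation 5.24 can be confirmed by brute force: summing the Boltzmann weight over all pair states gives exactly the product of the two single-subsystem partition functions. A sketch with arbitrary example spectra of our own choosing (k_B = 1):

```python
import math
from itertools import product

def Z(levels, beta):
    """Partition function of one subsystem, equation 5.16."""
    return sum(math.exp(-beta * E) for E in levels)

left = [0.0, 0.7, 1.1]    # example spectrum for subsystem L
right = [0.2, 0.9]        # example spectrum for subsystem R
beta = 2.0

# Sum over pair states (s_i^L, s_j^R) with energies E_i^L + E_j^R ...
Z_pair = sum(math.exp(-beta * (EL + ER)) for EL, ER in product(left, right))

# ... equals the product Z^L Z^R (equation 5.24), so free energies add (5.25).
assert abs(Z_pair - Z(left, beta) * Z(right, beta)) < 1e-12

def A(z, beta):
    """Helmholtz free energy A = -k_B T log Z, with k_B T = 1/beta."""
    return -math.log(z) / beta

assert abs(A(Z_pair, beta) - (A(Z(left, beta), beta) + A(Z(right, beta), beta))) < 1e-12
```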

We can now immediately do several important cases of non-interacting systems.

Ideal Gas. The partition function for the monatomic ideal gas of classical distinguishable particles of mass m in a cubical box of volume V = L^3 factors into a product over each degree of freedom \alpha:

    Z_{ideal}^{dist} = \prod_{\alpha=1}^{3N} (1/h) \int_0^L dq_\alpha \int dp_\alpha \, e^{-\beta p_\alpha^2/2m}
                     = \left(\frac{L}{h}\sqrt{\frac{2\pi m}{\beta}}\right)^{3N}
                     = (L\sqrt{2\pi m k_B T}/h)^{3N} = (L/\lambda)^{3N},


where

    \lambda = h/\sqrt{2\pi m k_B T} = \sqrt{2\pi\hbar^2/m k_B T}  (5.26)

is the thermal de Broglie wavelength.
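Equation 5.26 is easy to evaluate; the helium numbers below are our own illustrative choice, not the text's. At room temperature the wavelength is a fraction of an Angstrom, which is why classical statistics works for gases under ordinary conditions.

```python
import math

h = 6.626e-34      # Planck's constant, J s
k_B = 1.381e-23    # Boltzmann's constant, J/K

def thermal_wavelength(m, T):
    """Thermal de Broglie wavelength, equation 5.26."""
    return h / math.sqrt(2 * math.pi * m * k_B * T)

m_He = 6.646e-27   # kg, mass of a helium atom (illustrative example)
lam = thermal_wavelength(m_He, 300.0)
print(lam)   # roughly 5e-11 m, i.e. about half an Angstrom
```

Note that lambda falls as 1/sqrt(T): quadrupling the temperature halves the wavelength.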

The internal energy is

    <E> = -\partial \log Z_{ideal}/\partial\beta = 3N \, \partial \log\lambda/\partial\beta = 3N/2\beta = 3N k_B T/2,  (5.27)

giving us the equipartition theorem without our needing to find volumes of spheres in 3N dimensions.

For N indistinguishable particles, we have counted each real configuration N! times for the different permutations of particles, so we must divide Z_{ideal}^{dist} by N!, just as we did for the phase-space volume in section 3.5:

    Z_{ideal}^{indist} = (L/\lambda)^{3N}/N!.  (5.28)

This doesn't change the internal energy, but does change the Helmholtz free energy:

    A_{ideal}^{indist} = -k_B T \log\left[(L/\lambda)^{3N}/N!\right]
                       = -N k_B T \log(V/\lambda^3) + k_B T \log(N!)
                       \approx -N k_B T \log(V/\lambda^3) + k_B T (N \log N - N)
                       = -N k_B T \left(\log(V/N\lambda^3) + 1\right)
                       = N k_B T \left(\log(\rho\lambda^3) - 1\right),  (5.29)

where \rho = N/V is the average density, and we've used Stirling's formula \log(N!) \approx N \log N - N.
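How good is Stirling's formula in the form used in equation 5.29? A quick numerical check (using the exact log-factorial from the standard library):

```python
import math

def log_factorial(N):
    """Exact log(N!) via the log-gamma function: log N! = lgamma(N + 1)."""
    return math.lgamma(N + 1)

def stirling(N):
    """Stirling's formula as used in equation 5.29."""
    return N * math.log(N) - N

for N in (10, 100, 10000):
    exact, approx = log_factorial(N), stirling(N)
    print(N, exact, approx, (exact - approx) / exact)

# The relative error falls roughly like log(N)/N, so for a macroscopic
# N ~ 10^23 the approximation in the free energy is essentially exact.
```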

Classical Harmonic Oscillator and the Equipartition Theorem. A harmonic oscillator of mass m and frequency \omega_k has a total energy H_k = p_k^2/2m + m\omega_k^2 q_k^2/2 (5.30). Its partition function is

    Z_k = (1/h) \int dq_k \int dp_k \, e^{-\beta(p_k^2/2m + m\omega_k^2 q_k^2/2)}
        = \frac{1}{h}\sqrt{\frac{2\pi m}{\beta}}\sqrt{\frac{2\pi}{\beta m\omega_k^2}}
        = \frac{1}{\beta\hbar\omega_k}.  (5.31)

Hence the Helmholtz free energy for the classical oscillator is A_k = -k_B T \log Z_k = k_B T \log(\beta\hbar\omega_k) (5.32), and the internal energy is

    <E>(T) = -\partial \log Z_k/\partial\beta = \partial(\log\beta + \log\hbar\omega_k)/\partial\beta = 1/\beta = k_B T,  (5.33)

and of course c_v = \partial<E>/\partial T = k_B. This establishes the equipartition theorem (from section 3.2.2) for systems with harmonic potential energies as well as quadratic kinetic energies: the internal energy is 1/2 k_B T


per degree of freedom, where our harmonic oscillator has two degrees of freedom (p_k and q_k).
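Equation 5.33 can also be checked by sampling: in the canonical ensemble of equation 5.15, both p_k and q_k are Gaussian-distributed, and the average energy comes out to k_B T. A minimal sketch in reduced units (k_B = m = omega = 1, our own convention):

```python
import math
import random

random.seed(0)

def sample_energy(T, m=1.0, omega=1.0):
    """Draw (q, p) from the Boltzmann distribution of a harmonic oscillator
    (both factors are Gaussian, k_B = 1) and return the energy."""
    p = random.gauss(0.0, math.sqrt(m * T))                # weight exp(-p^2 / 2mT)
    q = random.gauss(0.0, math.sqrt(T / (m * omega**2)))   # weight exp(-m w^2 q^2 / 2T)
    return p * p / (2 * m) + m * omega**2 * q * q / 2

T = 2.0
n = 200000
E_avg = sum(sample_energy(T) for _ in range(n)) / n
print(E_avg)   # close to k_B T = 2.0: (1/2) k_B T per degree of freedom
```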

Harmonic oscillators are important for the specific heat of molecules and solids. At temperatures low compared to the melting point, a solid or molecule with an arbitrary many-body interaction potential V(Q) typically only makes small excursions about the minimum Q_0 of the potential. We expand about this minimum, giving us12

    V(Q) \approx V(Q_0) + (Q - Q_0)_\alpha \partial_\alpha V + 1/2 (Q - Q_0)_\alpha (Q - Q_0)_\beta \partial_\alpha \partial_\beta V + ...  (5.34)

Since Q_0 is a minimum, the gradient must be zero, so the second term on the right-hand side must vanish. The third term is a big 3N x 3N quadratic form, which we may diagonalize by converting to normal modes q_k.13 In terms of these normal modes, the Hamiltonian is a set of uncoupled harmonic oscillators:

    H = \sum_k p_k^2/2m + m\omega_k^2 q_k^2/2.  (5.35)

At high enough temperatures that quantum mechanics can be ignored,14 we can then use equation 5.31 to find the total partition function for our harmonic system:

    Z = \prod_k Z_k = \prod_k \frac{1}{\beta\hbar\omega_k}.  (5.36)

12 We use the Einstein summation convention, summing over the repeated indices \alpha and \beta, and the convention \partial_\alpha = \partial/\partial Q_\alpha.

13 If the masses of the atoms are not all the same, one must change coordinates, rescaling the components of Q - Q_0 by the square root of the mass.

14 In section 7.2 we'll do the quantum harmonic oscillator, which then gives the entire statistical mechanics of atomic vibrations well below the melting or disassociation point.

Classical Kinetic Energies. One will notice both for the ideal gas and for the harmonic oscillator that each component of the momentum contributed a factor \sqrt{2\pi m/\beta}. As we promised in section 3.2.2, this will happen in any classical system where the kinetic energy is of the standard form15 K(P) = \sum_\alpha p_\alpha^2/2m_\alpha, since each component of the momentum is then uncoupled from the rest of the system. Thus the partition function for any classical interacting system of non-magnetic particles will be some configurational piece times \prod_\alpha \sqrt{2\pi m_\alpha/\beta}. This implies that the velocity distribution is always Maxwellian [1.2], independent of what configuration the positions have.16

15 Not all Hamiltonians have this form. For example, charged particles in magnetic fields will have terms that couple momenta and positions.

16 This may be counterintuitive: an atom crossing a barrier has the same velocity distribution as it had in the bottom of the well. It does need to borrow some energy from the rest of the system. The canonical distribution works precisely when the system is large, so that the resulting temperature shift may be neglected.

5.4 Grand Canonical Ensemble

The canonical ensemble allows one to decouple the calculations of subsystems with fixed volume and particle number, but which can exchange energy. There are occasions when one would like to decouple the calculations of problems with varying particle number.

Consider a subsystem in a state s with energy Es and number Ns , in

a system with total energy E and number N . By analogy with equa-

tion 5.14 the probability density that the system will be in state s is

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/


proportional to

    ρ(s) ∝ Ω_HB(E − Es, N − Ns)    (5.37)
         = exp(S_HB(E − Es, N − Ns)/kB)
         ≈ exp((S_HB − Es ∂S_HB/∂E − Ns ∂S_HB/∂N)/kB)
         ∝ exp(−Es/kB T + μNs/kB T)
         = exp(−(Es − μNs)/kB T),

where

    μ = −T ∂S/∂N    (5.38)

is the chemical potential. Notice the factor of −T: this converts the entropy change into an energy change, so the chemical potential is the energy gain per particle for accepting particles from the bath. At low temperatures the subsystem will fill with particles until the energy for the next particle reaches μ.

[Fig. 5.3: An isolated system composed of a small system and a heat bath that can exchange both energy and particles (porous boundary). The system in state s has energy Es and number Ns; the heat bath has energy E_HB = E − Es and number N_HB = N − Ns, with ρ(s) ∝ Ω_HB(E_HB, N_HB). Both exchanges are weak, so that the states of the two subsystems are assumed independent of one another. The system is in state si, with energy Ei and Ni particles; the total energy is E and the total number is N. The probability of the subsystem being in state si is proportional to Ω_HB(E − Ei, N − Ni).]

Again, just as for the canonical ensemble, there is a normalization factor called the grand partition function

    Ξ(T, V, μ) = ∑_n e^{−(En − μNn)/kB T};    (5.39)

the probability density of state si is ρ(si) = e^{−(Ei − μNi)/kB T}/Ξ. There is a grand free energy

    Φ(T, V, μ) = −kB T log Ξ = ⟨E⟩ − TS − μN    (5.40)

analogous to the Helmholtz free energy A(T, V, N). In problem 5.4 you shall derive the Euler relation E = TS − PV + μN, and hence show that Φ(T, μ, V) = −PV.

Partial Traces. Let us note in passing that we can write the grand canonical partition function as a sum over canonical partition functions. Let us separate the sum over states n of our system into a double sum: an inner restricted sum17 over states of fixed number of particles M in the system, and an outer sum over M. Let s_{M′,M} have energy E_{M′,M}, so

    Ξ(T, V, μ) = ∑_M ∑_{M′} e^{−(E_{M′,M} − μM)/kB T}
               = ∑_M ( ∑_{M′} e^{−E_{M′,M}/kB T} ) e^{μM/kB T}
               = ∑_M Z(T, V, M) e^{μM/kB T}
               = ∑_M e^{−(A(T,V,M) − μM)/kB T}.    (5.41)

17 This restricted sum is said to integrate over the internal degrees of freedom M′. The process is often called a partial trace, a nomenclature stemming from quantum mechanics.

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

74  Free Energies and Ensembles

Notice that the Helmholtz free energy in the last equation plays exactly the same role as the energy plays in equation 5.39: like exp(−Em/kB T) there, exp(−A(T,V,M)/kB T) is the probability of the system having any state with M particles.18 This type of partial summation is a basic tool that we will explore further in chapter X: it allows one to write effective coarse-grained free energies, averaging over many microstates which share the same macroscopic configuration of interest.

18 This is why we defined A = −kB T log Z, so that Z = exp(−A/kB T). This is the deep reason why the normalization factors like Z are so central to statistical mechanics: the statistical weight of a set of states with a common property Y is given by the partial trace Z(Y) = exp(−F(Y)/kB T), giving us an effective free energy F(Y) describing the coarse-grained system.

Using the grand canonical ensemble. The grand canonical ensemble is particularly useful for non-interacting quantum systems, which we'll see in chapter 7. There each energy eigenstate can be thought of as a separate subsystem, independent of the others except for the competition between eigenstates for the particle number.

For now, let's see how the grand canonical ensemble works for the problem of number fluctuations. In general,

    ⟨N⟩ = ∑_m Nm e^{−(Em − μNm)/kB T} / ∑_m e^{−(Em − μNm)/kB T}
        = (kB T/Ξ) ∂Ξ/∂μ = −∂Φ/∂μ.    (5.42)

Just as the fluctuations in the energy were related to the specific heat (the rate of change of energy with temperature, section 5.2), the number fluctuations are related to the rate of change of particle number with chemical potential:

    ∂⟨N⟩/∂μ = ∂/∂μ [ ∑_m Nm e^{−(Em − μNm)/kB T} / Ξ ]
            = −(1/kB T) [ ∑_m Nm e^{−(Em − μNm)/kB T} ]² / Ξ²
              + (1/kB T) ∑_m Nm² e^{−(Em − μNm)/kB T} / Ξ
            = (⟨N²⟩ − ⟨N⟩²)/kB T = ⟨(N − ⟨N⟩)²⟩/kB T.    (5.43)
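The fluctuation relation 5.43 can be checked directly on any small system by enumerating states. The sketch below uses a hypothetical toy system (a single site holding up to three particles, with an invented energy function and kB = 1); the relation itself is exact, so the numerical derivative of ⟨N⟩ with respect to μ should reproduce the variance.

```python
import math

# Hypothetical toy system: a single site holding n = 0..3 particles,
# with an illustrative energy E(n) = 0.3*n**2 (arbitrary units, k_B = 1).
def grand_stats(T, mu):
    """Return (<N>, <N^2> - <N>^2) in the grand canonical ensemble."""
    weights = [math.exp(-(0.3 * n**2 - mu * n) / T) for n in range(4)]
    Xi = sum(weights)                      # grand partition function, eq. 5.39
    N_avg = sum(n * w for n, w in zip(range(4), weights)) / Xi
    N2_avg = sum(n * n * w for n, w in zip(range(4), weights)) / Xi
    return N_avg, N2_avg - N_avg**2

T, mu, h = 0.7, 0.4, 1e-6
N_avg, var = grand_stats(T, mu)
# Numerical derivative d<N>/dmu, to compare with var/(k_B T) (equation 5.43):
dN_dmu = (grand_stats(T, mu + h)[0] - grand_stats(T, mu - h)[0]) / (2 * h)
print(abs(T * dN_dmu - var) < 1e-6)  # the fluctuation relation holds
```

The same check works for any choice of levels and temperature, since equation 5.43 is an identity of the grand canonical weights.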

Exercises

(5.1) Two-state system. (Basic)
Consider the statistical mechanics of a tiny object with only two discrete states:19 one of energy E1 and the other of higher energy E2 > E1.
(a) Boltzmann probability ratio. Find the ratio of the equilibrium probabilities ρ2/ρ1 to find our system in the two states, when weakly coupled to a heat bath of temperature T. What is the limiting probability as T → ∞? As T → 0? Related formula: Boltzmann probability ρ = (1/Z(T)) exp(−E/kT) ∝ exp(−E/kT).
(b) Probabilities and averages. Use the normalization of the probability distribution (the system must be in one or the other state) to find ρ1 and ρ2 separately. (That is, solve for Z(T) in the related formula for part (a).) What is the average value of the energy ⟨E⟩?

19 Visualize this as a tiny biased coin, which can be in the heads or tails state but has no other internal vibrations or center of mass degrees of freedom. Many systems are well described by large numbers of these two-state systems: some paramagnets, carbon monoxide on surfaces, glasses at low temperatures, ...
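The computations asked for in parts (a) and (b) can be sketched numerically. The energies below (E1 = 0, E2 = 1 in arbitrary units) are illustrative choices, not values from the text.

```python
import math

def two_state(E1, E2, kT):
    """Equilibrium probabilities and mean energy of a two-state system
    (exercise 5.1); energies and kT in the same (arbitrary) units."""
    w1, w2 = math.exp(-E1 / kT), math.exp(-E2 / kT)
    Z = w1 + w2                      # normalization Z(T) from part (b)
    p1, p2 = w1 / Z, w2 / Z
    E_avg = p1 * E1 + p2 * E2        # average energy <E>
    return p1, p2, E_avg

# Illustrative energies E1 = 0, E2 = 1:
p1, p2, E = two_state(0.0, 1.0, 1.0)
# ratio p2/p1 = exp(-(E2-E1)/kT); at high T both probabilities approach 1/2
print(round(p2 / p1, 3))   # exp(-1) = 0.368
```

Raising kT in the call shows the T → ∞ limit (both probabilities tend to 1/2), and lowering it shows the T → 0 limit (all weight in the ground state).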

(5.2) Barrier Crossing.
In this problem, we will derive the Arrhenius law

    Γ = Γ0 exp(−E/kB T),    (5.44)

giving the rate at which systems cross energy barriers. This law governs not only chemical reaction rates, but many macroscopic rates like diffusion constants in solids and nucleation rates (section 12.2) that depend on microscopic thermal activation over barriers.20
The important exponential dependence on the barrier height E is easy to explain: it is the relative Boltzmann probability that a particle is near the top of the barrier (and hence able to escape). Here we will do a relatively careful job of calculating the prefactor Γ0.
Consider a system described by a coordinate X, with an energy U(X) with a minimum at X0 with energy zero and an energy barrier at XB with energy U(XB) = B.21 Let the temperature of the system be much smaller than B/kB. To do our calculation, we will make some approximations. (1) We assume that the atoms escaping across the barrier to the right do not scatter back into the well. (2) We assume that the atoms deep inside the well are in equilibrium. (3) We assume that the particles crossing to the right across the barrier are given by the equilibrium distribution inside the well.

[Fig. 5.4: Barrier Crossing Potential. The energy U(X) versus position X, with the minimum at X0 and the barrier top at XB with height B. A schematic of how many atoms are at each position. (Actually, of course, the atoms are scattered at different X, not at different heights in energy.)]

(a) Let the probability that a particle has position X be ρ(X). What is the ratio of probability densities ρ(XB)/ρ(X0) if the particles near the top of the barrier are assumed in equilibrium with those deep inside the well? Related formula: Boltzmann distribution ρ ∝ exp(−E/kB T).
If the barrier height B ≫ kB T, then most of the particles in the well stay near the bottom of the well. Often, the potential near the bottom is accurately described by a quadratic approximation U(X) ≈ 1/2 Mω²(X − X0)², where ω is the frequency of small oscillations in the well.
(b) In this approximation, what is the probability density ρ(X) near the bottom of the well? (See figure 5.5.) What is ρ(X0), the probability density of having the system at the bottom of the well? Related formula: Gaussian probability distribution (1/√(2πσ²)) exp(−x²/2σ²). Hint: Make sure you keep track of the 2πs.

[Fig. 5.5: Well Probability Distribution. The approximate probability distribution for the atoms still trapped inside the well.]

20 There are basically three ways in which slow processes arise in physics. (1) Large systems can respond slowly to external changes because communication from one end of the system to the other is sluggish: examples are the slow decay at long wavelengths in the diffusion equation 2.2 and Goldstone modes 9.3. (2) Systems like radioactive nuclei can respond slowly, decaying with lifetimes of billions of years, because of the slow rate of quantum tunneling through barriers. (3) Systems can be slow because they must thermally activate over barriers (with the Arrhenius rate 5.44).
21 This potential could describe a chemical reaction, with X being a reaction coordinate. It could describe the escape of gas from a moon of Jupiter, with X being the distance from the moon in Jupiter's direction.


Knowing the answers from (a) and (b), we know the probability density ρ(XB) at the top of the barrier. We need to also know the probability that particles near the top of the barrier have velocity V, because the faster-moving parts of the distribution of velocities contribute more to the flux of probability over the barrier (see figure 5.6). As usual, because the total energy is the sum of the kinetic and potential energy, the total Boltzmann probability factors: in equilibrium the particles will always have a velocity probability distribution ρ(V) = (1/√(2πkB T/M)) exp(−1/2 MV²/kB T).
(c) First give a formula for the decay rate Γ (the probability per unit time that our system crosses the barrier towards the right), for an unknown probability density ρ(XB)ρ(V), as an integral over the velocity V. Then, using your formulas from parts (a) and (b), give your estimate of the decay rate for our system. Related formula: ∫0∞ x exp(−x²/2σ²) dx = σ².

[Fig. 5.6: Crossing the Barrier. The range of positions (of width V Δt) for which atoms moving to the right with velocity V will cross the barrier top in time Δt.]

How could we go beyond this simple calculation? In the olden days, Kramers studied other one-dimensional models, changing the ways in which the system was coupled to the external bath. On the computer, one can avoid a separate heat bath and directly work with the full multidimensional configuration space, leading to transition state theory. The transition-state theory formula is very similar to the one you derived in part (c), except that the prefactor involves the product of all the frequencies at the bottom of the well and all the positive frequencies at the saddlepoint at the top of the barrier. (See reference [42].) Other generalizations arise when crossing multiple barriers [45] or in non-equilibrium systems [63].

(5.3) Statistical Mechanics and Statistics. (Mathematics)
Consider the problem of fitting a theoretical model to experimentally determined data. Let our model M predict a curve y^(M)(t), to be compared with N experimentally determined data points yi at times ti with errors of standard deviation σ. We assume that the experimental errors for the data points are independent and Gaussian distributed, so that the probability that our model actually generated the observed data points (the probability P(D|M) of the data given the model) is

    P(D|M) = ∏_{i=1}^{N} (1/√(2πσ²)) exp(−(y^(M)(ti) − yi)²/2σ²).    (5.45)

(a) True or false: This probability density corresponds to a Boltzmann distribution with energy H and temperature T, with H = ∑_{i=1}^{N} (y^(M)(ti) − yi)²/2 and kB T = σ².
There are two schools of statistics. Among a family of models, the frequentists will pick the model M with the largest value of P(D|M). The Bayesians take a different point of view. They argue that there is no reason to believe that all models have the same likelihood.22 Suppose the intrinsic probability of the model (the prior) is P(M). They use the simple theorem

    P(M|D) = P(D|M)P(M)/P(D) ∝ P(D|M)P(M),    (5.46)

where the last step notes that the probability that you measured the known data D is presumably one.
The Bayesians often will pick the maximum of P(M|D) as their model for the experimental data. But, given their perspective, it's even more natural to consider the entire ensemble of models, weighted by P(M|D), as the best description of the data. This ensemble average then naturally provides error bars as well as predictions for various quantities.
Consider the simple problem of fitting a line to two data points. Suppose the experimental data points are at t1 = 0, y1 = 1 and t2 = 1, y2 = 2, where both y-values have uncorrelated Gaussian errors with standard deviation σ = 1/2, as assumed in equation 5.45 above. Our model M(m, b) is y(t) = mt + b. Our Bayesian statistician knows that m and b both lie between zero and two, and assumes that the probability density is otherwise uniform: P(m, b) = 1/4 for 0 < m < 2 and 0 < b < 2.
(b) Which of the contour plots in figure 5.7 accurately represent the probability distribution P(M|D) for the model, given the observed data? (The spacing between the contour lines is arbitrary.)
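The posterior for this two-point line fit can be evaluated numerically; the grid scan below is a minimal sketch (the grid resolution is an arbitrary choice, and the normalization P(D) is dropped since only relative probabilities matter).

```python
import math

# Posterior for the two-point line fit of exercise 5.3:
# data (t, y) = (0, 1), (1, 2), errors sigma = 1/2, uniform prior on
# 0 < m < 2, 0 < b < 2 for the model y(t) = m*t + b.
data = [(0.0, 1.0), (1.0, 2.0)]
sigma = 0.5

def posterior(m, b):
    """Unnormalized P(M|D) = P(D|M) * P(M) (equation 5.46)."""
    if not (0 < m < 2 and 0 < b < 2):
        return 0.0  # the prior vanishes outside the box
    chi2 = sum((m * t + b - y) ** 2 for t, y in data) / sigma**2
    return math.exp(-chi2 / 2) * 0.25

# Scan a grid to locate the maximum a posteriori model:
grid = [i / 100 for i in range(1, 200)]
m_best, b_best = max(((m, b) for m in grid for b in grid),
                     key=lambda mb: posterior(*mb))
print(m_best, b_best)  # the line through both points: m = 1.0, b = 1.0
```

Since the prior is flat inside the box, the maximum of P(M|D) here coincides with the least-squares fit, which passes exactly through both data points.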


[Fig. 5.7: Contour plots (A)-(E) of candidate probability distributions in the (m, b) plane.]

(5.4) Euler, Gibbs-Duhem, and Clausius-Clapeyron. (Thermodynamics, Chemistry)
(a) Using the fact that the entropy S(N, V, E) is extensive, show that

    N (∂S/∂N)|V,E + V (∂S/∂V)|N,E + E (∂S/∂E)|N,V = S.    (5.47)

Show from this that in general

    S = (E + pV − μN)/T,    (5.48)

and hence E = TS − pV + μN. This is Euler's equation.
As a state function, S is supposed to depend only on E, V, and N. But equation 5.48 seems to show explicit dependence on T, p, and μ as well: how can this be?
(b) One answer is to write the latter three as functions of E, V, and N. Do this explicitly for the ideal gas, using the ideal gas entropy equation 3.61

    S(N, V, E) = 5/2 N kB + N kB log[ (V/(N h³)) (4πmE/3N)^{3/2} ],    (5.49)

and your (or the grader's) results for problem 3.5(c), and verify equation 5.48 in that case.
Another answer is to consider a small shift of all six variables. We know that dE = T dS − p dV + μ dN, but if we shift all six variables in Euler's equation we get dE = T dS − p dV + μ dN + S dT − V dp + N dμ. This implies the Gibbs-Duhem relation

    0 = S dT − V dp + N dμ.    (5.50)

It means that the intensive variables T, p, and μ are not all independent.

[Fig. 5.8: Generic phase diagram (pressure versus temperature), showing the coexistence curves for solids, liquids, and gases; the liquid-gas curve ends at the critical point.]

Clausius-Clapeyron equation. Consider the phase diagram 5.8. Along an equilibrium phase boundary, the temperatures, pressures, and chemical potentials of the two phases must agree: otherwise a flat interface between the two phases would transmit heat, shift sideways, or leak particles, respectively (violating the assumption of equilibrium).
(c) Apply the Gibbs-Duhem relation to both phases, for a small shift by ΔT along the phase boundary. Let s1, v1, s2, and v2 be the molecular entropies and volumes (s = S/N, v = V/N for each phase); derive the Clausius-Clapeyron equation for the slope of the coexistence line on the phase diagram

    dP/dT = (s1 − s2)/(v1 − v2).    (5.51)

It's hard to experimentally measure the entropies per particle: we don't have an entropy thermometer. But, as you will remember, the entropy difference upon a phase transformation ΔS = Q/T is related to the heat flow Q needed


to induce the phase change. Let the latent heat L be the heat flow per molecule.
(d) Write a formula for dP/dT not involving the entropy.

(5.5) Negative Temperature. (Quantum)
A system of N atoms can be in the ground state or in an excited state. For convenience, we set the zero of energy exactly in between, so the energies of the two states of an atom are ±ε/2. The atoms are isolated from the outside world. There are only weak couplings between the atoms, sufficient to bring them into internal equilibrium but without other effects.

[Fig. 5.9: Entropies and Energy Fluctuations, Microcanonical and Canonical, for this problem with N = 50: the microcanonical entropy Smicro(E), the canonical entropy Sc(T(E)), and the canonical probability distribution ρ(E), plotted against the energy E = −Nε/2 + mε. The canonical probability distribution for the energy is for ⟨E⟩ = −10ε and kB T = 1.207ε. You may wish to check some of your answers against this plot.]

(a) Microcanonical Entropy. If the net energy is E (corresponding to a number of excited atoms m = E/ε + N/2), what is the microcanonical entropy Smicro(E) of our system? Simplify your expression using Stirling's formula, log n! ≈ n log n − n.
(b) Negative Temperature. Find the temperature, using your simplified expression from part (a). (Why is it tricky to do it without approximation?) What happens to the temperature when E > 0?
Having the energy E > 0 is a kind of population inversion. Population inversion is the driving mechanism for lasers. Microcanonical simulations can lead also to states with negative specific heats.
For many quantities, the thermodynamic derivatives have natural interpretations when viewed as sums over states. It's easiest to see this in small systems.
(c) Canonical Ensemble: Explicit traces and thermodynamics. (i) Take one of our atoms and couple it to a heat bath of temperature kB T = 1/β. Write explicit formulas for Zcanon, Ecanon, and Scanon in the canonical ensemble, as a trace (or sum) over the two states of the atom. (E should be the energy of each state multiplied by the probability ρn of that state; S should be −kB times the trace of ρn log ρn.) (ii) Compare the results with what you get by using the thermodynamic relations. Using Z from the trace over states, calculate the Helmholtz free energy A, S as a derivative of A, and E from A = E − TS. Do the thermodynamically derived formulas you get agree with the statistical traces? (iii) To remove some of the mystery from the thermodynamic relations, consider the thermodynamically valid formula E = −∂ log Z/∂β = −(1/Z) ∂Z/∂β. Write out Z as a sum over energy states, and see that this formula follows naturally.
(d) What happens to ⟨E⟩ in the canonical ensemble as T → ∞? Can you get into the regime discussed in part (b)?
(e) Canonical-Microcanonical Correspondence. Find the entropy in the canonical distribution for N of our atoms coupled to the outside world, from your answer to part (c). How can you understand the value of S(T = ∞) − S(T = 0) simply? Using the approximate form of the entropy from part (a) and the temperature from part (b), show that the canonical and microcanonical entropies agree, Smicro(E) = Scanon(T(E)). (Perhaps useful: arctanh(x) = 1/2 log((1 + x)/(1 − x)).) Notice that the two are not equal in the figure above: the form of Stirling's formula we used in part (a) is not very accurate for N = 50. In a simple way, explain why the microcanonical entropy is smaller than the canonical entropy.
(f) Fluctuations. Show in general that the root-mean-squared fluctuations in the energy in the canonical distribution, ⟨(E − ⟨E⟩)²⟩ = ⟨E²⟩ − ⟨E⟩², are related to the specific heat C = ∂E/∂T. (I find it helpful to use the formula from part (c.iii), E = −∂ log(Z)/∂β.) Calculate the root-mean-square energy fluctuations for N of our atoms. Evaluate it at T(E) from part (b): it should have a particularly simple form. For large N, are the fluctuations in E small compared to E?

(5.6) Laplace. (Thermodynamics)23
Laplace Transform. The Laplace transform of a function f(t) is a function of x:

    L{f}(x) = ∫0∞ f(t) e^{−xt} dt.    (5.52)


(a) Show that the canonical partition function Z(β) can be written as the Laplace transform of the microcanonical volume of the energy shell Ω(E).

(5.7) Legendre. (Thermodynamics)24
Legendre Transforms. The Legendre transform of a function f(x) is given by minimizing f(x) − xp with respect to x, so that at the minimum p is the slope (p = ∂f/∂x):

    g(p) = min_x {f(x) − xp}.    (5.53)

We saw in the text that in thermodynamics the Legendre transform of the energy is the Helmholtz free energy25

    A(T, N, V) = min_S {E(S, V, N) − TS}.    (5.54)

How do we connect this with the statistical mechanical relation of part (a), which related Ω = exp(S/kB) to Z = exp(−A/kB T)? Thermodynamics, roughly speaking, is statistical mechanics without the fluctuations.
Using your Laplace transform of exercise 5.6, find an equation for Emax where the integrand is maximized. Does this energy equal the energy which minimizes the Legendre transform 5.54? Approximate Z(β) in your Laplace transform by the value of the integrand at this maximum (ignoring the fluctuations). Does it give the Legendre transform relation 5.54?

(5.8) Molecular Motors: Which Free Energy? (Basic, Biology)

[Fig. 5.10: An RNA polymerase molecular motor attached to a glass slide is pulling along a DNA molecule (transcribing it into RNA). The opposite end of the DNA molecule is attached to a bead which is being pulled by an optical trap with a constant external force F. Let the distance from the motor to the bead be x: thus the motor is trying to move to decrease x and the force is trying to increase x.]

Figure 5.10 shows a study of the molecular motor that transcribes DNA into RNA. Choosing a good ensemble for this system is a bit involved. It is under two constant forces (F and pressure), and involves complicated chemistry and biology. Nonetheless, you know some things based on fundamental principles. Let us consider the optical trap and the distant fluid as being part of the external environment, and define the system as the local region of DNA, the RNA, motor, and the fluid and local molecules in a region immediately enclosing the region, as shown in figure 5.10.
Without knowing anything further about the chemistry or biology in the system, which two of the following must be true on average, in all cases, according to basic laws of thermodynamics?
(T) (F) The total entropy of the universe (the system, bead, trap, laser beam ...) must increase or stay unchanged with time.
(T) (F) The entropy Ss of the system must increase with time.
(T) (F) The total energy ET of the universe must decrease with time.
(T) (F) The energy Es of the system must decrease with time.
(T) (F) Gs − Fx = Es − T Ss + P Vs − Fx must decrease with time, where Gs is the Gibbs free energy of the system. Related formula: G = E − TS + PV.
Note: F is a force, not the Helmholtz free energy. Precisely two of the answers are correct.

(5.9) Michaelis-Menten and Hill. (Biology, Computation)
Biological systems often have reaction rates that are saturable: the cell needs to respond sensitively to the introduction of a new chemical S, but the response should not keep growing indefinitely as the new chemical concentration [S] grows.26 Other biological systems act as switches: they not only saturate, but they change sharply from one state to another as the concentration of a chemical S is varied. We shall analyze both of these important

24 Legendre (1752-1833).
25 Actually, [5.3] in the text had E as the independent variable. As usual in thermodynamics, we can solve S(E, V, N) for E(S, V, N).
26 [S] is the concentration of S (number per unit volume). S stands for substrate.


examples of how one develops effective dynamical theories by removing degrees of freedom: here, we remove an enzyme E from the equations to get an effective reaction rate, rather than coarse-graining some large statistical mechanical system.
A reaction

    N S + B → C,    (5.55)

where N molecules of type S combine with a molecule of type B to make a molecule of type C, will occur with a reaction rate given by a traditional chemical kinetics formula:

    d[C]/dt = k[S]^N [B].    (5.56)

If the reactants need all to be in a small volume V in order to react, then [S]^N [B] V^N is the probability that they are in location to proceed, and the rate constant k divided by V^N is the reaction rate of the confined molecules.27
Saturation: the Michaelis-Menten equation. Saturation is not seen in simple chemical reaction kinetics. Notice that the reaction rate goes as the Nth power of the concentration [S]: far from saturating, the reaction rate grows linearly or faster with concentration.
The prototype example of saturation in biological systems is the Michaelis-Menten reaction form. A reaction of this form converting a chemical S (the substrate) into P (the product) has a rate given by the formula

    d[P]/dt = vmax [S]/(KM + [S]),    (5.57)

where KM is called the Michaelis constant (figure 5.11). This reaction at small concentrations acts like an ordinary chemical reaction with N = 1 and k = vmax/KM, but the rate saturates at vmax as [S] → ∞. The Michaelis constant KM is the concentration [S] at which the rate is equal to half of its saturation rate.

[Fig. 5.11: Michaelis-Menten and Hill equation forms: the rate d[P]/dt versus the substrate concentration [S], for the Michaelis-Menten form and for the Hill form with n = 4; KM and KH mark the concentrations at half-saturation.]

We can derive the Michaelis-Menten form by hypothesizing the existence of a catalyst or enzyme E, which is in short supply. The enzyme is presumed to be partly free and available for binding (concentration [E]) and partly bound to the substrate (concentration [E:S], the colon denoting the dimer), helping it to turn into the product. The total concentration [E] + [E:S] = Etot is fixed. The reactions are as follows:

    E + S ⇌ E:S → E + P,    (5.58)

with forward and backward binding rates k1 and k−1, and catalytic rate kcat. We must then assume that the supply of substrate is large, so its concentration changes slowly with time. We can then assume that the concentration [E:S] is in steady state, and remove it as a degree of freedom.
(a) Assume the binding reactions (rates k1, k−1, and kcat) in equation 5.58 are of traditional chemical kinetics form (equation 5.56), with N = 1 or N = 0 as appropriate. Write the equation for d[E:S]/dt, set it to zero, and use it to eliminate [E] in the equation for d[P]/dt. What are vmax and KM in the Michaelis-Menten form (equation 5.57) in terms of the k's and Etot?
We can understand this saturation intuitively: when all the enzyme is busy and bound to the substrate, adding more substrate can't speed up the reaction.

27 The reaction will typically involve crossing an energy barrier E, and the rate will be proportional to exp(−E/kB T); the constant of proportionality k0 can in principle be calculated using generalizations of the methods we used in exercise 5.2.
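The steady-state elimination asked for in part (a) gives vmax = kcat Etot and KM = (k−1 + kcat)/k1; the sketch below checks that this reproduces the Michaelis-Menten form. The rate constants are illustrative choices, not values from the text.

```python
# Sketch checking the Michaelis-Menten steady state of exercise 5.9(a):
# with d[E:S]/dt = 0 one finds vmax = kcat*Etot and KM = (k_m1 + kcat)/k1.
# All rate constants below are illustrative (arbitrary units).
k1, k_m1, kcat, Etot = 2.0, 0.5, 1.0, 0.1

def rate_steady_state(S):
    """d[P]/dt from the steady-state [E:S], after eliminating [E]."""
    ES = Etot * S / ((k_m1 + kcat) / k1 + S)   # solves d[E:S]/dt = 0
    return kcat * ES

def rate_mm(S):
    """Michaelis-Menten form, equation 5.57."""
    vmax, KM = kcat * Etot, (k_m1 + kcat) / k1
    return vmax * S / (KM + S)

for S in (0.01, 0.75, 100.0):
    assert abs(rate_steady_state(S) - rate_mm(S)) < 1e-12
print(rate_mm(1e6) / (kcat * Etot))  # approaches 1: the rate saturates at vmax
```

The two functions agree identically, since the steady-state condition is exactly what generates the Michaelis-Menten form.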


Cooperativity and sharp switching: the Hill equation. Hemoglobin is what makes blood red: this iron-containing protein can bind up to four molecules of oxygen in the lungs, and carries them to the tissues of the body where it releases them. If the binding of all four oxygens were independent, the [O2] concentration dependence of the bound oxygen concentration would have the Michaelis-Menten form (figure 5.11): to completely deoxygenate the hemoglobin (Hb) would demand a very low oxygen concentration in the tissue.
What happens instead is that the Hb binding of oxygen looks much more sigmoidal: a fairly sharp transition between nearly 4 oxygens bound at high [O2] (lungs) to nearly none bound at low oxygen concentrations. This arises because the binding of the oxygens is enhanced by having other oxygens bound. This is not because the oxygens somehow stick to one another: instead, each oxygen deforms the Hb in a nonlocal allosteric28 fashion, changing the configurations and affinity of the other binding sites.
The Hill equation was introduced for hemoglobin to describe this kind of cooperative binding. Like the Michaelis-Menten form, it is also used to describe reaction rates, where instead of the carrier Hb we have an enzyme, or perhaps a series of transcription binding sites (see exercise 8.7). In the reaction rate form, the Hill equation is

    d[P]/dt = vmax [S]^n/(KH^n + [S]^n)    (5.59)

(see figure 5.11). For Hb, the concentration of the n-fold oxygenated form is given by the right-hand side of equation 5.59. In both cases, the transition becomes much more of a switch, with the reaction turning on (or the Hb accepting or releasing its oxygen) sharply at a particular concentration (figure 5.11). The transition can be made more or less sharp by increasing or decreasing n.
The Hill equation can be derived using a simplifying assumption that n molecules bind in a single reaction:

    E + nS ⇌ E:(nS),    (5.60)

with binding rate kb and unbinding rate ku, where E might stand for hemoglobin and S for the O2 oxygen molecules. Again, there is a fixed total amount Etot = [E] + [E:nS].
(b) Assume that the two reactions in equation 5.60 have the chemical kinetics form (equation 5.56) with N = 0 or N = n as appropriate. Write the equilibrium equation for E:(nS), and eliminate [E] using the fixed total Etot.
Usually, and in particular for hemoglobin, this cooperativity is not so rigid: the states with one, two, and three O2 molecules bound also compete with the unbound and fully bound states. This is treated in an approximate way by using the Hill equation, but allowing n to vary as a fitting parameter: for Hb, n ≈ 2.8.
Both Hill and Michaelis-Menten equations are often used in biological reaction models even when there are no explicit mechanisms (enzymes, cooperative binding) known to generate them.

28 Allosteric comes from allo (other) and steric (structure or space).
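The switch-like sharpening with n can be quantified: for the rate forms 5.57 and 5.59, the ratio of concentrations between 90% and 10% of saturation is 81 for Michaelis-Menten but 81^(1/n) for the Hill form. A minimal sketch (with K set to 1 in arbitrary units):

```python
# Sharpness of the Michaelis-Menten (n = 1) versus Hill (n = 4) forms,
# equations 5.57 and 5.59: the concentration ratio spanning 10% to 90%
# of saturation shrinks from 81 to 81**(1/n) as n grows.
def s_at_fraction(f, K, n):
    """[S] at which vmax*[S]^n/(K^n + [S]^n) equals f*vmax."""
    return K * (f / (1 - f)) ** (1.0 / n)

K = 1.0   # illustrative half-saturation concentration (arbitrary units)
for n in (1, 4):   # n = 1 is Michaelis-Menten; n = 4 a sharp Hill switch
    ratio = s_at_fraction(0.9, K, n) / s_at_fraction(0.1, K, n)
    print(n, round(ratio, 3))   # 1 -> 81.0 ; 4 -> 3.0
```

For hemoglobin's effective n ≈ 2.8, the same formula gives a transition spanning roughly a factor of five in oxygen concentration, which is what makes Hb an efficient carrier between lungs and tissue.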


6  Entropy

Entropy is the key concept in statistical mechanics. What does it mean? Can we develop an intuition for it?

We shall see in this chapter that entropy has several related interpretations. Entropy measures the disorder in a system: in section 6.2 we'll see this using the entropy of mixing and the residual entropy of glasses. Entropy measures our ignorance about a system: in section 6.3 we'll see this with examples from non-equilibrium systems and information theory. But we'll start in section 6.1 with the original interpretation, that grew out of the 19th century study of engines, refrigerators, and the end of the universe: entropy measures the irreversible changes in a system.

Heat Death of the Universe

The early 1800s saw great advances in understanding motors and engines. In particular, scientists asked a fundamental question: How efficient can an engine be? The question was made more difficult because there were two relevant principles to be discovered: energy is conserved and entropy always increases.1

For some kinds of engines, only energy conservation is important. For example, there are electric motors that convert electricity into mechanical work (running an electric train), and generators that convert mechanical work (a windmill rotating) into electricity.2 For these electromechanical engines, the absolute limitation is given by the conservation of energy: the motor cannot generate more energy in mechanical work than is consumed electrically, and the generator cannot generate more electrical energy than is input mechanically.3 An ideal electromechanical engine can convert all the energy from one form to another. Electric motors and generators are limited only by the conservation of energy.

Steam engines are more complicated. Scientists in the early 1800s were figuring out that heat is a form of energy. A steam engine, running a power plant or an old-style locomotive, transforms a fraction of the heat energy from the hot steam (the hot bath) into electrical energy or work, but some of the heat energy always ends up wasted, dumped into the air or into the cooling water for the power plant (the cold bath). In fact, if the only limitation on heat engines was conservation of energy, one would be able to make a motor using the heat energy from

1 Some would be pedantic, and say only that entropy never decreases, but this qualification is unnecessary. Systems that remain completely in equilibrium at all times have constant entropy. But systems only equilibrate completely after an infinite time; for example, we'll see that Carnot cycles must be run infinitely slowly to be truly reversible.
2 Electric motors are really the same as generators run in reverse: turning the shaft of a simple electric motor can generate electricity.
3 Mechanical work (force times distance) is energy; electrical power (current times voltage) is energy per unit time.

There is something fundamentally less useful about energy once it becomes heat. Once the energy is spread out among all the atoms in a macroscopic chunk of material, not all of it can be retrieved again to do useful work. The energy is more useful for generating power when divided between hot steam and a cold lake than in the form of water at a uniform, intermediate warm temperature. Indeed, most of the time when we use mechanical or electrical energy, the energy ends up as heat, generated from friction or other dissipative processes.

The equilibration of a hot and cold body to two warm bodies in an isolated system is irreversible: one cannot return to the original state without inputting some kind of work from outside the system. Carnot, publishing in 1824, realized that the key to producing the most efficient possible engine was to avoid irreversibility. A heat engine run in reverse is a refrigerator: it consumes mechanical work or electricity and uses it to pump heat from a cold bath to a hot one, dumping both the heat and the work into the hot bath. A reversible heat engine would be able to run forward, generating work by transferring heat from the hot to the cold bath, and then run backward, using the same work to pump the heat back into the hot bath. It was by calculating the properties of this reversible engine that Carnot discovered what would later be called the entropy.

If you had an engine more efficient than a reversible one, you could run it side-by-side with a reversible engine running as a refrigerator (figure 6.1). The pair of engines would generate work by extracting energy from the hot bath (as from our rock, above) without adding heat to the cold one. After we used this work, we could dump the extra heat from friction back into the hot bath, getting a perpetual motion machine that did useful work without consuming anything. In thermodynamics, it is a postulate that such perpetual motion machines are impossible.

[Fig. 6.1 (Carnot refrigerator, power plant, impossible engine, running between baths at T1 and T2): How to use an engine which produces more work than the Carnot cycle to build a perpetual motion machine doing work each cycle.]

Carnot considered a prototype heat engine (figure 6.2): a piston with external pressure P, two heat baths at a hot temperature T1 and a cold temperature T2, and some material inside the piston. During one cycle of his engine, heat Q1 flows out of the hot bath, heat Q2 flows into our cold bath, and net work W = Q1 − Q2 is done by the piston on the outside world. To make his engine reversible, Carnot must avoid (i) friction, (ii) letting hot things touch cold things, (iii) letting high pressures expand into low pressures, and (iv) moving the walls of the container too quickly (emitting sound or shock waves).

[Fig. 6.2: Prototype heat engine. A piston with external exerted pressure P, moving through an insulated cylinder. The cylinder can be put into thermal contact with either of two heat baths: a hot bath at temperature T1 (say, a coal fire in a power plant) and a cold bath at T2 (say, water from a cold lake). During one cycle of the piston in and out, heat energy Q1 flows into the piston, mechanical energy W is done on the external world by the piston, and heat energy Q2 flows out of the piston into the cold bath.]

Carnot, a theorist, could ignore the practical difficulties. He imagined a frictionless piston run through a cycle at arbitrarily low velocities. He realized that all reversible heat engines working with the same temperature baths had to produce exactly the same amount of work for a given heat flow from hot to cold (none of them could be more efficient than any other, since they all were the most efficient possible). This allowed him to fill the piston with the simplest possible material (an ideal gas), for which he knew the relation between pressure, volume, and temperature. The piston was used both to extract work from the system and to raise and lower the temperature. Carnot connected the gas thermally to each bath only when its temperature agreed with the bath, so his engine was fully reversible.

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

The Carnot cycle moves the piston in and out in four steps (figure 6.3).

(a→b) The compressed gas is connected to the hot bath, and the piston moves outward at varying pressure; heat Q1 flows in to maintain the gas at temperature T1.

(b→c) The piston expands further at varying pressure, cooling the gas to T2 without heat transfer.

(c→d) The expanded gas in the piston is connected to the cold bath and compressed; heat Q2 flows out, maintaining the temperature at T2.

(d→a) The piston is compressed, warming the gas to T1 without heat transfer, returning it to the original state.

Energy conservation tells us that the net heat energy flowing into the piston, Q1 − Q2, must equal the work done on the outside world W:

    Q1 = Q2 + W.                                                  (6.1)

The work done by the piston is the integral of the force exerted times the distance. The force is the piston surface area times the pressure, and the distance times the piston surface area is the volume change, giving the simple result

    W = ∫ F dx = ∫ (F/A)(A dx) = ∮ P dV = area inside PV loop.    (6.2)

That is, if we plot P versus V for the four steps of our cycle, the area inside the resulting closed loop is the work done by the piston on the outside world (figure 6.3).
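The geometric statement in equation 6.2 (work per cycle equals the area inside the P-V loop) is easy to check numerically. The sketch below is my own illustration, not from the text: with arbitrary parameter values and units chosen so that N kB = 1, it samples the four branches of an ideal-gas Carnot cycle, integrates P dV around the loop by the trapezoid rule, and compares with the analytic result W = N kB (T1 − T2) log(Vb/Va) that follows from the heat flows computed below.

```python
import math

# Illustrative parameters, in units where N*kB = 1.
NkB = 1.0
T1, T2 = 4.0, 2.0          # hot and cold bath temperatures
Va, Vb = 1.0, 2.0          # volumes bounding the hot isotherm

# Monatomic-ideal-gas adiabats obey V * T^(3/2) = const, which fixes
# the remaining two corners of the cycle.
Vc = Vb * (T1 / T2) ** 1.5
Vd = Va * (T1 / T2) ** 1.5

def branch(Vstart, Vend, Tstart, adiabatic, n=20000):
    """Sample (V, P) points along one branch of the cycle."""
    pts = []
    for i in range(n + 1):
        V = Vstart + (Vend - Vstart) * i / n
        # On an isotherm T is fixed; on an adiabat T ~ V^(-2/3).
        T = Tstart if not adiabatic else Tstart * (Vstart / V) ** (2.0 / 3.0)
        pts.append((V, NkB * T / V))          # ideal gas: P = N kB T / V
    return pts

def integrate_PdV(pts):
    """Trapezoid-rule integral of P dV along a sampled branch."""
    return sum(0.5 * (p0 + p1) * (v1 - v0)
               for (v0, p0), (v1, p1) in zip(pts, pts[1:]))

# Traverse the loop a -> b -> c -> d -> a, accumulating the signed area.
W = (integrate_PdV(branch(Va, Vb, T1, False))    # a->b: isotherm at T1
     + integrate_PdV(branch(Vb, Vc, T1, True))   # b->c: adiabatic expansion
     + integrate_PdV(branch(Vc, Vd, T2, False))  # c->d: isotherm at T2
     + integrate_PdV(branch(Vd, Va, T2, True)))  # d->a: adiabatic compression

W_exact = NkB * (T1 - T2) * math.log(Vb / Va)    # = Q1 - Q2, equations 6.6-6.7
```

The two adiabatic branches contribute equal and opposite work, so only the two isotherms survive in the total, exactly as in the derivation that follows.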

[Fig. 6.3: Carnot cycle PV diagram. The four steps in the Carnot cycle: a→b, heat in Q1 at constant temperature T1; b→c, expansion without heat flow; c→d, heat out Q2 at constant temperature T2; and d→a, compression without heat flow to the original volume and temperature.]

We now fill our system with a monatomic ideal gas. We saw in section 3.5 that the ideal gas equation of state is

    P V = N kB T                                                  (6.3)

and that its total energy is its kinetic energy, given by the equipartition theorem:

    E = (3/2) N kB T = (3/2) P V.                                 (6.4)

Along a→b, where we add heat Q1 to the system, we have P(V) = N kB T1 / V. Using energy conservation (the first law),

    Q1 = Eb − Ea + Wab = (3/2) Pb Vb − (3/2) Pa Va + ∫ab P dV.    (6.5)

But Pa Va = N kB T1 = Pb Vb, so the first two terms cancel, and the last term simplifies:

    Q1 = ∫ab (N kB T1 / V) dV = N kB T1 log(Vb/Va).               (6.6)

Similarly,

    Q2 = N kB T2 log(Vc/Vd).                                      (6.7)

For the other two steps in our cycle we need to know how the ideal gas behaves under expansion without any heat flow in or out. Again using the first law on a small segment of the path, the work done by the gas for a small volume change, P dV, must equal minus the change in energy, −dE. Using equations 6.3 and 6.4, P dV = (N kB T / V) dV = −dE = −(3/2) N kB dT, so dV/V = −(3/2) dT/T.

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

Integrating both sides from b to c, we find

    ∫bc dV/V = log(Vc/Vb) = −(3/2) ∫bc dT/T = −(3/2) log(T2/T1),  (6.8)

so Vc/Vb = (T1/T2)^(3/2). Similarly, Vd/Va = (T1/T2)^(3/2). Thus Vc/Vb = Vd/Va, and hence

    Vc/Vd = (Vc/Vb)(Vb/Vd) = (Vd/Va)(Vb/Vd) = Vb/Va.              (6.9)

We can use the volume ratios from the insulated expansion and compression (equation 6.9) to substitute into the heat flows (equations 6.6 and 6.7) to find

    Q1/T1 = N kB log(Vb/Va) = N kB log(Vc/Vd) = Q2/T2.            (6.10)

This was Carnot's fundamental result: his cycle, and hence all reversible engines, satisfies the law

    Q1/T1 = Q2/T2.                                                (6.11)

Later scientists decided to define[4] the entropy change to be this ratio of heat flow to temperature:

    ΔSthermo = Q/T.                                               (6.12)

For a reversible engine the entropy flow from the hot bath into the piston, Q1/T1, equals the entropy flow from the piston into the cold bath, Q2/T2: no entropy is created or destroyed. Any real engine will create a net entropy during a cycle: no engine can reduce the net amount of entropy in the universe.

[4] The thermodynamic entropy is derived with a heat flow ΔE = Q at a fixed temperature T, so our statistical mechanics definition of temperature, 1/T = ∂S/∂E (from equation 3.30), is equivalent to the thermodynamic definition of entropy ΔS = Q/T (equation 6.12).
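Carnot's result is easy to verify numerically from equations 6.6 to 6.11. The sketch below is illustrative (arbitrary temperatures and volumes, units with N kB = 1, and names of my own choosing): it checks that the entropy lost by the hot bath equals the entropy gained by the cold bath, while the cycle still delivers positive work at the Carnot efficiency 1 − T2/T1.

```python
import math

NkB = 1.0                      # N * kB in our units
T1, T2 = 500.0, 300.0          # hot and cold bath temperatures (arbitrary)
Va, Vb = 1.0, 3.0              # volumes bounding the hot isotherm

# Adiabatic steps give Vc/Vb = Vd/Va = (T1/T2)^(3/2)  (equation 6.8),
# so that Vc/Vd = Vb/Va  (equation 6.9).
ratio = (T1 / T2) ** 1.5
Vc, Vd = Vb * ratio, Va * ratio

Q1 = NkB * T1 * math.log(Vb / Va)      # heat in from the hot bath  (6.6)
Q2 = NkB * T2 * math.log(Vc / Vd)      # heat out to the cold bath  (6.7)
W = Q1 - Q2                            # net work per cycle         (6.1)

dS_hot = -Q1 / T1                      # entropy change of the hot bath
dS_cold = +Q2 / T2                     # entropy change of the cold bath
# For the reversible cycle these cancel: no net entropy is created (6.11).
efficiency = W / Q1                    # equals 1 - T2/T1 for Carnot
```

Any "super-efficient" engine would have to break the cancellation of dS_hot and dS_cold, which is exactly the perpetual-motion construction of figure 6.1.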

The irreversible increase of entropy is not a property of the microscopic laws of nature. In particular, the microscopic laws of nature are time-reversal invariant: the laws governing the motion of atoms are the same whether time is running backwards or forwards.[5] The microscopic laws do not tell us the arrow of time. The direction of time in which entropy increases is our definition of the future.[6]

This confusing point may be illustrated by considering the game of pool or billiards. Neglecting friction, the trajectories of the pool balls are also time-reversal invariant. If the velocities of the balls were reversed halfway through a pool shot, they would retrace their motions, building up all the velocity into one ball that then would stop as it hit the cue stick. In pool, the feature that distinguishes forward from backward time is the greater order at early times: all the momentum starts in one ball, and is later distributed among all the balls involved in collisions. Similarly, the only reason we can resolve the arrow of time (distinguish the future from the past) is that our universe started in an unusual, low-entropy state,[7] and is irreversibly moving towards equilibrium.[8]

[5] More correctly, the laws of nature are only invariant under CPT: changing the direction of time (T) along with inverting space (P) and changing matter to antimatter (C). Radioactive beta decay and other weak interaction forces are not invariant under time reversal. The basic conundrum for statistical mechanics is the same, though: we can't tell if we are matter beings living forward in time or antimatter beings living backward in time in a mirror. Time running backward would appear strange even if we were made of antimatter.

[6] In electromagnetism, the fact that waves radiate away from sources more often than they converge upon sources is a closely related distinction of past from future.

[7] The big bang was hot and probably close to equilibrium, but the volume per particle was small, so the entropy was nonetheless low.

[8] If some miracle produced a low-entropy, ordered state as a spontaneous fluctuation at time t0, then at times t < t0 all our laws of macroscopic physics would appear to run backward.

The implications of the irreversible increase of entropy were not lost on the intellectuals of the 19th century. In 1854, Helmholtz predicted the heat death of the universe: he suggested that as the universe ages all energy will become heat, all temperatures will become equal, and everything will be condemned to a state of eternal rest. In 1895, H.G. Wells in The Time Machine [118, Chapter 11] speculated about the state of the Earth in the distant future:

    ...the sun, red and very large, halted motionless upon the horizon, a vast dome glowing with a dull heat... The earth had come to rest with one face to the sun, even as in our own time the moon faces the earth... There were no breakers and no waves, for not a breath of wind was stirring. Only a slight oily swell rose and fell like a gentle breathing, and showed that the eternal sea was still moving and living. ...the life of the old earth ebb[s] away...

This gloomy prognosis has been re-examined recently: it appears that the expansion of the universe may provide loopholes. While there is little doubt that the sun and the stars will indeed die, it may be possible (if life can evolve to accommodate the changing environments) that civilization, memory, and thought could continue for an indefinite subjective time (e.g., exercise 6.1).

6.2 Entropy as Disorder

A second intuitive interpretation of entropy is as a measure of the disorder in a system. Scientist mothers tell their children to lower the entropy by tidying their rooms; liquids have higher entropy than crystals intuitively because their atomic positions are less orderly.[9] We illustrate this interpretation by first calculating the entropy of mixing, and then discussing the zero-temperature entropy of glasses.

[9] There are interesting examples of systems that appear to develop more order as their entropy (and temperature) rises. These are systems where adding order of one, visible type (say, crystalline or orientational order) allows increased disorder of another type (say, vibrational disorder). Entropy is a precise measure of disorder, but is not the only possible or useful measure.

6.2.1 Entropy of Mixing: Maxwell's Demon and Osmotic Pressure

Scrambling an egg is a standard example of irreversibility: you can't re-separate the yolk from the white. A simple model for scrambling is given in figures 6.4 and 6.5: the mixing of two different types of particles. Here the entropy change upon mixing is a measure of increased disorder.

[Fig. 6.4: The pre-mixed state. N/2 white atoms on one side of a partition, N/2 black atoms on the other, each in a volume V.]

[Fig. 6.5: The mixed state. N/2 white atoms and N/2 black atoms scattered through the volume 2V.]

Consider a box separated by a partition into two equal halves of volume V. N/2 indistinguishable ideal gas white atoms are on one side of the partition, and N/2 indistinguishable ideal gas black atoms are on the other side. The configurational entropy of this system (section 3.5, ignoring the momentum-space parts) is

    Sunmixed = 2 kB log(V^(N/2) / (N/2)!),                        (6.13)

just twice the configurational entropy of N/2 atoms in a volume V. We assume

that the black and white atoms have the same masses and the same total energy. Now consider the entropy change when the partition is removed, and the two sets of atoms are allowed to mix. Because the temperatures and pressures from both sides are equal, removing the partition does not involve any irreversible sound emission or heat transfer: any entropy change is due to the mixing of the white and black atoms. In the desegregated state,[10] the entropy has increased to

    Smixed = 2 kB log((2V)^(N/2) / (N/2)!).                       (6.14)

Since log(2^m x) = m log 2 + log x, the change in entropy due to the mixing is

    ΔSmixing = Smixed − Sunmixed = N kB log 2.                    (6.15)

That is, we gain kB log 2 in entropy for each atom placed into one of two boxes without looking which box we chose. More generally, we might define a counting entropy

    Scounting = kB log(number of configurations)                  (6.16)

for systems with a discrete number of equally likely states.

[10] No social policy implications are implied by physics: the entropy of mixing for a few billion humans would not provide for an eye blink.

This kind of discrete choice arises often in statistical mechanics. In

equilibrium quantum mechanics (for a finite system) the states are quantized: so adding a new (non-interacting) particle into one of m degenerate states adds kB log m to the entropy. In communications theory (subsection 6.3.2, exercises 6.7 and 6.8), each bit transmitted down your channel can be in one of two states, so a random stream of bits of length N has S = kS N log 2: one bit of entropy per bit transmitted.[11]

[11] Here it is natural to measure entropy not in units of temperature, but rather in base 2, so kS = 1/log 2. This means that S = N for a random string of N bits.

In more general cases, the states available to one particle depend strongly on the configurations of the other particles. Nonetheless, the equilibrium entropy still measures the logarithm of the number of different states that the total system could be in. For example, our equilibrium statistical mechanics entropy Sequil(E) = kB log(Ω(E)) (equation 3.27) is the logarithm of the number of states with energy E, with phase-space volume h^(3N) allocated to each state.
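The counting entropy for discrete choices can be exercised directly. The sketch below is my own illustration (not from the text), using the base-2 constant kS = 1/log 2 of the margin note: a random N-bit string carries entropy S = N, and adding a particle into one of m degenerate states adds kS log m.

```python
import math

kS = 1.0 / math.log(2.0)   # entropy unit chosen so one two-state choice = 1

def counting_entropy(num_configurations):
    """S = kS * log(number of equally likely configurations), equation 6.16."""
    return kS * math.log(num_configurations)

N = 100
S_bits = counting_entropy(2 ** N)       # random N-bit string: 2^N configurations

# Adding one (non-interacting) particle into one of m degenerate states
# multiplies the configuration count by m, adding kS*log(m) to the entropy.
m = 3
S_before = counting_entropy(2 ** N)
S_after = counting_entropy(2 ** N * m)
```

With kB in place of kS the same function reproduces the thermodynamic counting entropy; only the unit changes.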

What would happen if we removed a partition separating N/2 black atoms on one side from N/2 indistinguishable black atoms on the other? The initial entropy is the same as above, S^BB_unmixed = 2 kB log(V^(N/2) / (N/2)!), but the final entropy is now S^BB_mixed = kB log((2V)^N / N!). Notice we now have N! rather than the ((N/2)!)^2 from equation 6.14, since all of our particles are now indistinguishable. Now N! = N · (N−1) · (N−2) · (N−3) ··· and ((N/2)!)^2 = (N/2) · (N/2) · ((N−2)/2) · ((N−2)/2) ···: they roughly differ by 2^N, canceling the entropy change due to the volume doubling. Indeed, expanding the logarithm using Stirling's formula log n! ≈ n log n − n, we find the entropy per atom is unchanged.[12] This is why we introduced the N! term for indistinguishable particles in section 3.2.1: without it the entropy would decrease by N kB log 2 whenever we split a container into two pieces.[13]

[12] If you keep Stirling's formula to higher order, you'll see that the entropy increases a bit when you remove the partition. This is due to the number fluctuations on the two sides that are now allowed.

[13] This is often called the Gibbs paradox.

How can we intuitively connect this entropy of mixing with the thermodynamic entropy of pistons and engines in section 6.1? Can we use
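The factorial bookkeeping here can be verified exactly with log-factorials (math.lgamma) rather than Stirling's approximation. The sketch below is my own check, with an arbitrary particle number: distinguishable black and white atoms gain exactly N kB log 2 on mixing, while identical atoms gain only the tiny number-fluctuation correction mentioned in the footnote.

```python
import math

kB = 1.0

def log_factorial(n):
    return math.lgamma(n + 1)          # log(n!) without overflow

def S_half_boxes(N, V):
    """Two boxes of volume V with N/2 atoms each: 2 kB log(V^(N/2)/(N/2)!)."""
    return 2 * kB * ((N / 2) * math.log(V) - log_factorial(N // 2))

def S_mixed_distinct(N, V):
    """Black and white atoms mixed through 2V: 2 kB log((2V)^(N/2)/(N/2)!)."""
    return 2 * kB * ((N / 2) * math.log(2 * V) - log_factorial(N // 2))

def S_mixed_identical(N, V):
    """All N atoms identical, spread through 2V: kB log((2V)^N / N!)."""
    return kB * (N * math.log(2 * V) - log_factorial(N))

N, V = 10**6, 1.0
# Distinguishable species: the mixing entropy is exactly N kB log 2 (6.15).
dS_mixing = S_mixed_distinct(N, V) - S_half_boxes(N, V)
# Identical atoms: the 2^N from N! vs ((N/2)!)^2 cancels the volume doubling,
# leaving only the small positive number-fluctuation term of order log N.
dS_identical = S_mixed_identical(N, V) - S_half_boxes(N, V)
```

The small positive remainder in dS_identical is precisely the higher-order Stirling correction of footnote [12]: per atom it is utterly negligible.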

the mixing entropy to do useful work? To do so, we would need some way of discriminating between the two kinds of atoms. Suppose that the barrier separating the two halves in figure 6.4 was a membrane that was impermeable to black atoms but allowed white ones to cross. Since both black and white atoms are ideal gases, the white atoms would spread uniformly to fill the entire system, while the black atoms would remain on one side. This would lead to a pressure imbalance: if the semipermeable wall were used as a piston, work could be extracted as the black chamber was enlarged to fill the total volume.[14]

[14] Such semipermeable membranes are quite common, not for gases but for dilute solutions of ions in water: some ions can penetrate and others cannot. The resulting force on the membrane is called osmotic pressure.

What if we had a more active discrimination? Maxwell introduced the idea of an intelligent "finite being" (later termed Maxwell's demon) that would operate a small door between the two containers. When a black atom approaches the door from the left or a white atom approaches from the right, the demon would open the door; for the reverse situations the demon would leave the door closed. As time progresses, this active sorting would re-segregate the system, lowering the entropy. This is not a concern for thermodynamics, since of course running a demon is an entropy-consuming process! Indeed, one can view this thought experiment as giving a fundamental limit on demon efficiency, putting a lower bound on how much entropy an intelligent being must create in order to engage in this kind of sorting process.
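The semipermeable-piston argument can be made quantitative. The sketch below is my own rough illustration with assumed values (one mole at room temperature), and it adds a second membrane, permeable only to black atoms, which is my extension of the single-membrane setup described above: expanding each species isothermally from V to 2V against its own membrane extracts exactly T times the entropy of mixing.

```python
import math

kB = 1.380649e-23     # Boltzmann constant, J/K
T = 300.0             # temperature in K (illustrative)
N = 6.022e23          # one mole of atoms total: N/2 white + N/2 black

# One semipermeable piston lets the black-atom "gas" expand isothermally
# from V to 2V, doing work  integral of P_black dV = (N/2) kB T log 2.
W_one_membrane = (N / 2) * kB * T * math.log(2.0)

# A mirror-image piston, permeable only to black atoms, extracts the same
# work from the white gas; the total matches T times the mixing entropy.
W_total = 2 * W_one_membrane
dS_mixing = N * kB * math.log(2.0)      # equation 6.15
```

About 1.7 kJ per mole at room temperature: the entropy of mixing is small but perfectly real, and this reversible unmixing is exactly what the demon would have to beat.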

6.2.2 Residual Entropy of Glasses: The Roads Not Taken

[Fig. 6.6: Ion pump. An implementation of this demon in biology is Na+/K+-ATPase, an enzyme located on the membranes of almost every cell in your body. This enzyme maintains extra potassium (K+) ions inside the cell and extra sodium (Na+) ions outside the cell. The enzyme exchanges two K+ ions from outside for three Na+ ions inside, burning as fuel one ATP (adenosine with three phosphates, the fuel of the cell) into ADP (two phosphates). When you eat too much salt (Na+Cl−), the extra sodium ions in the blood increase the osmotic pressure on the cells, draw more water into the blood, and increase your blood pressure. The figure shows the structure of the related enzyme calcium ATPase [114]: the arrow shows the shape change as the two Ca++ ions are removed.]

In condensed-matter physics, glasses are the prototype of disordered systems. Unlike a crystal, in which each atom has a set position, a glass will have a completely different configuration of atoms each time it is formed. That is, the glass has a residual entropy: as the temperature goes to absolute zero, the glass entropy does not vanish, but rather equals kB log Ωglass, where Ωglass is the number of zero-temperature configurations in which the glass might be trapped.

What is a glass? Glasses are disordered like liquids, but are rigid like crystals. They are not in equilibrium: they are formed when liquids are cooled too fast to form the crystalline equilibrium state.[15] You are aware of glasses made from silica, like window glass[16] and Pyrex(TM).[17] You also know some molecular glasses, like hard candy (a glass made of sugar). Many other materials (even metals)[18] can form glasses when cooled quickly.

[15] The crystalline state must be nucleated; see section 12.2.

[16] Windows are made from soda-lime glass, with silica (SiO2) mixed with sodium and calcium oxides.

[17] Pyrex(TM) is a borosilicate glass (boron and silicon oxides) with a low thermal expansion, used for making measuring cups that don't shatter when filled with boiling water.

[18] Most metals are polycrystalline. That is, the atoms sit in neat crystalline arrays, but the metal is made up of many grains with different crystalline orientations separated by sharp grain boundaries.

How is the residual glass entropy measured? First, one estimates the entropy of the equilibrium liquid;[19] then one measures the entropy flow Q/T out from the glass as it is cooled from the liquid down to absolute zero. The difference

    Sresidual = Sliquid(Tℓ) − ∫ (1/T)(dQ/dt) dt                   (6.17)

(with the integral taken as the glass cools from the liquid temperature Tℓ down to absolute zero) gives the residual entropy.

[19] One can measure the entropy of the equilibrium liquid Sliquid(Tℓ) by slowly heating a crystal of the material from absolute zero and measuring the ∫ dQ/T flowing in.

How big is the residual entropy of a typical glass? The residual entropy is on the order of kB per molecular unit of the glass (SiO2 or sugar molecule, for example). This means that the number of glassy configurations e^(S/kB) is enormous (exercise 6.9 part (c)).

How is it possible to measure the number of glass configurations the system didn't choose? The glass is, after all, in one particular configuration. How can measuring the heat flow Q(t) out of the liquid as it freezes into one glassy state be used to measure the number Ωglass of possible glassy states? Answering this question will neatly tie together the statistical mechanics definition of entropy Sstat = kB log Ωglass with the thermodynamic definition ΔSthermo = Q/T, and will occupy the rest of this subsection.

We need first a simplified model of how a glass might fall out of equilibrium as it is cooled.[20] We view the glass as a collection of independent molecular units. Each unit has a double-well potential energy: along some internal coordinate qi there are two minima with an energy difference εi, separated by an energy barrier Vi (figure 6.7). This internal coordinate might represent a rotation of a sugar molecule, or a shift in the location of an oxygen in a SiO2 network.

[Fig. 6.7: Double-well potential. A simple model for the potential energy for one coordinate qi in a glass: two states separated by a barrier Vi and with a small energy difference εi.]

Consider the behavior of one of these double-well degrees of freedom. As we cool our system, the molecular unit will be thermally excited over its barrier more and more slowly, with a rate (exercise 5.2) given by an Arrhenius factor Γ(T) ≈ Γ0 exp(−Vi/kB T). So long as the cooling rate

[20] The glass transition is not a sharp phase transition: the liquid grows thicker (more viscous) as it is cooled, with slower and slower dynamics, until the cooling rate becomes too fast for the atomic rearrangements needed to maintain equilibrium to keep up. At that point, there is a gradual, smeared-out transition over many degrees Kelvin as the viscosity effectively becomes infinite and the glass becomes bonded together. The fundamental nature of this transition remains controversial, and in particular we do not know why the viscosity diverges so rapidly in so many materials. There are at least three kinds of competing theories for the glass transition: (1) it reflects an underlying equilibrium transition to an ideal, zero-entropy glass state, which would be formed under infinitely slow cooling; (2) it is a purely dynamical transition (where the atoms or molecules jam together) with no thermodynamic signature; (3) it is not a transition at all, but just a crossover where the liquid viscosity jumps rapidly (say, because of the formation of semipermanent covalent bonds). Our simple model is not a good description of the glass transition, but is a rather accurate model for the continuing thermal rearrangements (β-relaxation) at temperatures below the glass transition, and an excellent model for the quantum dynamics (tunneling centers) which dominate many properties of glasses below a few degrees Kelvin.

However, at the local glass-transition temperature Ti^g where the two rates cross,

    Ti^g = Vi / (kB log(Γ0/Γcool)),                               (6.18)

the transitions between the wells will not keep up and our molecular unit will freeze into position. If the cooling rate Γcool is very slow compared to the attempt frequency Γ0 (as it almost always is),[21] this transition will be abrupt, and our model glass will freeze into the upper well with the probability given by the equilibrium distribution at Ti^g.

[21] Atomic rates like Γ0 are around 10^12 per second (an atomic vibration frequency); cooling times are typically between seconds and years, so the cooling rate is indeed slow compared to microscopic times.
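Equation 6.18 has a quantitative consequence worth seeing in numbers: because the cooling rate enters only through a logarithm, the freezing temperature depends remarkably weakly on how fast the glass is cooled. The sketch below is illustrative, not from the text; the 1 eV barrier is an assumed value, and Γ0 ≈ 10^12/s is taken from the margin note.

```python
import math

kB_eV = 8.617e-5          # Boltzmann constant in eV/K
V_barrier = 1.0           # energy barrier Vi in eV (assumed, illustrative)
Gamma0 = 1e12             # attempt frequency in 1/s (from the margin note)

def T_freeze(cooling_rate):
    """Local glass-transition temperature of equation 6.18."""
    return V_barrier / (kB_eV * math.log(Gamma0 / cooling_rate))

T_fast = T_freeze(1.0)           # quenched in about a second
T_slow = T_freeze(1.0 / 3.15e7)  # cooled over about a year
```

Changing the cooling rate by a factor of thirty million changes the freezing temperature by well under a factor of two, which is why glass properties are only mildly sensitive to preparation history.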

Our frozen molecular unit has a population in the upper well given by the Boltzmann factor e^(−εi/kB Ti^g) times the population in the lower well. Hence centers with εi ≪ kB Ti^g will have both states roughly equally populated; those with εi ≫ kB Ti^g will be primarily in the ground state. As a crude approximation, let us pretend that each center goes sharply from equal occupancy to being fully in the ground state at an asymmetry temperature Ti^a = εi/(α kB), for some constant α.[22]

The statistical mechanical entropy contributed by the two states of our molecular unit[23] is kB log 2 for T > Ti^a. If Ti^a > Ti^g, then the unit remains in equilibrium: at T = Ti^a = εi/(α kB) the statistical entropy drops to zero (in our crude approximation), so ΔSstat = −kB log 2. At the same time, an average energy εi/2 is transmitted to the heat bath,[24] so the thermodynamic entropy changes by ΔSthermo = −Q/Ti^a = −εi/(2Ti^a) = −α kB/2. Thus we can pick α = 2 log 2 to ensure that our two entropy changes agree in our crude approximation.

[22] We haven't yet defined the entropy for ensembles where the probabilities are not uniform: that's in section 6.3. Using those definitions, we would not need this approximation or the fudge-factor α needed to equate the two entropies, but the proper calculation is more complicated and less intuitive.

[23] That is, we integrate out the vibrations of the molecular unit in the two wells: our energy barrier in figure 6.7 is properly a free energy barrier.

[24] A 50/50 chance of having energy εi.

Now we can see how the thermodynamic measurement of heat can tell us the number of glassy configurations. Suppose there are N molecular units in the glass which fall out of equilibrium (Ti^a < Ti^g). As the glass is cooled, one by one these units randomly freeze into one of two states (figure 6.8), leading to Ωglass = 2^N glassy configurations for this cooling rate, and a statistical mechanical residual entropy

    Sstat^residual = N kB log 2,                                  (6.19)

roughly kB per molecular unit if the fraction of units with small asymmetries is sizeable. This entropy change is reflected in the thermodynamic measurement at the lower temperatures[25] Ti^a. At these points the energy flow out of the glass is less than that for the equilibrium system, because the unit can no longer hop over its barrier, so the thermodynamic entropy for the glass stays higher than that for the equilibrated, zero-residual-entropy ideal glass state, by

    Sthermal^residual = Σ_{Ti^a < Ti^g} εi/(2Ti^a) = N kB log 2.  (6.20)

Thus the heat flow into a particular glass configuration counts the number of roads not taken by the glass on its cooling voyage.

[Fig. 6.8: Roads not taken by the glass. The branching path of glassy states in our model, with branchings at successive freezing temperatures Tg1, Tg2, ..., Tg7. The entropy (both statistical and thermodynamic) is proportional to the number of branchings the glass chooses between as it cools. A particular glass will take one trajectory through this tree as it cools; nonetheless the thermodynamic entropy measures the total number of states.]

[25] This is again an artifact of our crude approximation: the statistical and thermodynamic entropies remain in sync at all temperatures when the calculation is done properly.
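The bookkeeping behind equations 6.19 and 6.20 can be checked with a toy simulation of the double-well model. The sketch below is mine, not the text's: it uses the crude approximation with α = 2 log 2 and randomly chosen asymmetries εi (all numbers illustrative), and confirms that each frozen unit contributes exactly kB log 2 to both the statistical and the thermodynamic residual entropy.

```python
import math
import random

kB = 1.0
alpha = 2 * math.log(2.0)       # fudge factor chosen in the text
random.seed(0)

N_units = 1000
Tg = 1.0                        # common freezing temperature Ti^g (illustrative)

# Random asymmetries eps_i; a unit freezes into a random well (and keeps
# its residual kB log 2) only if Ti^a = eps_i/(alpha kB) < Ti^g.
eps = [random.uniform(0.0, 3.0 * alpha * kB * Tg) for _ in range(N_units)]
frozen = [e for e in eps if e / (alpha * kB) < Tg]
N_frozen = len(frozen)

# Statistical residual entropy: kB log 2 per frozen unit (equation 6.19).
S_stat = N_frozen * kB * math.log(2.0)

# Thermodynamic residual entropy: sum of eps_i / (2 Ti^a) over frozen
# units (equation 6.20); each term equals alpha*kB/2 = kB log 2.
S_thermo = sum(e / (2 * (e / (alpha * kB))) for e in frozen)
```

The cancellation of εi in each term of S_thermo is the whole point of choosing α = 2 log 2: the heat released at each freezing event is exactly enough to account for one binary road not taken.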

6.3 Entropy as Ignorance: Information and Memory

The most general interpretation of entropy is as a measure of our ignorance about a system. The equilibrium state of a system maximizes the entropy because we have lost all information about the initial conditions except for the conserved quantities: maximizing the entropy maximizes our ignorance about the details of the system. The entropy of a glass, or of our mixture of black and white atoms, is a measure of the number of arrangements the atoms could be in, given our ignorance.

This interpretation (that entropy is not a property of the system, but of our knowledge about the system, represented by the ensemble of possibilities) cleanly resolves many otherwise confusing issues. The atoms in a glass are in a definite configuration, which we could measure using some futuristic X-ray holographic technique. If we did so, our ignorance would disappear, and the residual entropy would become zero for us.[26] We could in principle use our knowledge of the glass atom positions to extract work out of the glass, which would have been impossible before measuring the positions.

[26] Of course, the X-ray holographic process must create at least as much entropy during the measurement as the glass loses.

So far, we have confined ourselves to cases where our ignorance is maximal, where all allowed configurations are equally likely. What about systems where we have partial information, where some configurations are more probable than others? There is a powerful generalization of the definition of entropy to general probability distributions, which we will introduce in subsection 6.3.1 for traditional statistical mechanical systems approaching equilibrium. In section 6.3.2 we will show that this nonequilibrium entropy provides a generally useful measure of our ignorance about a wide variety of systems, with broad applications outside of traditional physics.

6.3.1 Nonequilibrium Entropy

So far, we have defined the entropy only for systems in equilibrium, where entropy is a constant. But the second law of thermodynamics tells us that entropy increases, presupposing some definition of entropy for non-equilibrium systems. Any non-equilibrium state of a classical Hamiltonian system can be described with a probability density ρ(P, Q) on phase space. We'd like to have a formula for the entropy in terms of this probability density.

In the case of the microcanonical ensemble, where ρ(P, Q) = 1/Ω(E), we certainly want S to agree with our equilibrium formula 3.27, S = kB log(Ω(E)) = −kB log(ρ): the nonequilibrium entropy should be some kind of average of −kB log(ρ) over the phase-space volume. Since ρ(P, Q) is the probability of being at a given point in phase space, the average of any observable is ⟨A⟩ = ∫ dP dQ ρ(P, Q) A(P, Q), leading us

to the definition

    Snonequil = −kB ⟨log ρ⟩ = −kB ∫ ρ log ρ dP dQ.                (6.22)

Is this the right formula to use? We'll see in subsection 6.3.2 that it has several generally important properties. For here, we need to know that it behaves properly for the two cases of non-uniform probability distributions we've seen so far: the various equilibrium ensembles, and weakly coupled subsystems.

This entropy is maximized for the microcanonical (subsection 6.3.2), canonical, and grand canonical ensembles, under suitable constraints. You can argue this with Lagrange multipliers (exercise 6.4).
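That maximization can be spot-checked numerically in the discrete form: the Boltzmann weights p_i ∝ exp(−βE_i) should extremize −Σ p_i log p_i among perturbations preserving normalization and mean energy. A minimal Python sketch (the energy levels, β, and perturbation size are illustrative choices, not from the text):

```python
from math import exp, log
import random

def S(p):
    # discrete nonequilibrium entropy, in units of k_B
    return -sum(x * log(x) for x in p if x > 0)

E = [0.0, 1.0, 2.0, 3.0]              # toy energy levels (an assumption)
beta = 1.0
Z = sum(exp(-beta * e) for e in E)
p = [exp(-beta * e) / Z for e in E]   # candidate maximum: Boltzmann weights

def project_out(d, u):
    """Remove from d its component along u (plain dot products)."""
    coef = sum(a * b for a, b in zip(d, u)) / sum(b * b for b in u)
    return [a - coef * b for a, b in zip(d, u)]

ones = [1.0] * len(E)
E_perp = project_out(E, ones)         # energy direction orthogonal to ones

random.seed(2)
for _ in range(200):
    d = [random.uniform(-1, 1) for _ in E]
    d = project_out(d, ones)          # keeps normalization: sum(d) = 0
    d = project_out(d, E_perp)        # keeps mean energy: sum(d_i E_i) = 0
    perturbed = [x + 1e-3 * y for x, y in zip(p, d)]
    assert min(perturbed) > 0
    assert S(perturbed) <= S(p) + 1e-15   # entropy can only decrease
```

Every constraint-respecting perturbation lowers the entropy, as the Lagrange-multiplier argument predicts.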

This entropy is additive, for weakly coupled subsystems. In arguing for the definition of temperature (section 3.3), we implicitly discussed non-equilibrium entropies for a system with two weakly coupled parts. That is, we calculated the entropy of a system with two parts as a function of the amount of energy in each: S(E) = S1(E1) + S2(E − E1).²⁷ This is an important property: we want the entropies of weakly coupled, uncorrelated systems to add.²⁸ Let's check this. The states of the total system are pairs (s1, s2) of states from the two separate systems. The probability density that the first system is in state s1 = (P1, Q1) and the second system is in state s2 = (P2, Q2) is ρ((s1, s2)) = ρ1(P1, Q1) ρ2(P2, Q2).²⁹ The total entropy of the combined system, by formula 6.22 and using ∫ρ1 = ∫ρ2 = 1, is

²⁷ We then argued that energy would flow from one to the other until the temperatures matched and entropy was maximized.
²⁸ In thermodynamics, one would say that S is an extensive variable, one that grows in proportion to the system size.

S = −kB ∫ dP1 dQ1 dP2 dQ2 ρ1(P1, Q1) ρ2(P2, Q2) log(ρ1(P1, Q1) ρ2(P2, Q2))    (6.23)
  = −kB ∫ ρ1 ρ2 (log ρ1 + log ρ2)
  = −kB ∫ ρ1 log ρ1 ∫ ρ2 − kB ∫ ρ1 ∫ ρ2 log ρ2
  = S1(E1) + S2(E2).

This nonequilibrium entropy formula appears in several other guises. For discrete systems, it is written as a sum over the probabilities pi of the states i:

S_discrete = −kB Σ_i pi log pi;    (6.24)

and for quantum systems it is written in terms of the density matrix ρ (section 7.1):

S_quantum = −kB Tr(ρ log ρ).    (6.25)

²⁹ This is just what we mean by uncorrelated: the probabilities for system #1 are independent of those for system #2, so the probability for the pair is the product of the probabilities.
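The additivity of equation 6.23 is easy to check numerically for discrete distributions; a minimal sketch (the two toy distributions are arbitrary choices, not from the text):

```python
from math import log

def shannon(p):
    # -sum p log p, with the convention 0 * log 0 = 0
    return -sum(x * log(x) for x in p if x > 0)

p1 = [0.5, 0.3, 0.2]
p2 = [0.7, 0.1, 0.1, 0.1]
# uncorrelated subsystems: rho(s1, s2) = rho1(s1) * rho2(s2)
joint = [a * b for a in p1 for b in p2]

assert abs(shannon(joint) - (shannon(p1) + shannon(p2))) < 1e-12
```

The entropy of the joint distribution equals the sum of the subsystem entropies, exactly as in the derivation above.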

James

c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity


Our nonequilibrium entropy is defined for the microscopic laws of motion. However, in section 6.1 we argued that the microscopic laws are time-reversal invariant, and the increase of entropy must be used to define the future. Thus we can guess that these microscopic entropies will be time-independent: you can show this explicitly in exercises 6.5 and 7.2. No information is lost (in principle) by evolving a closed system in time. Entropy (and our ignorance) increases only in coarse-grained theories where we ignore or exclude some degrees of freedom (internal or external).

Understanding ignorance is central to many fields! Entropy as a measure of ignorance has been useful in everything from the shuffling of cards to reconstructing noisy images. For these other applications, the connection with temperature is unimportant, so we don't need to make use of Boltzmann's constant. Instead, we normalize the entropy with the constant kS = 1/log(2):

S_nonequil = −kS Σ_i pi log pi.    (6.26)

This normalization was introduced by Shannon [108], and formula 6.26 is referred to as the Shannon entropy in the context of information theory. Shannon noted that this entropy, applied to the ensemble of possible messages or images, can be used to put a fundamental limit on the amount they can be compressed³⁰ to efficiently make use of disk space or a communications channel (exercises 6.7 and 6.8). A low entropy data set is highly predictable: given the stream of data so far, we can predict the next transmission with some confidence. In language, siblings can often complete sentences for one another. In image transmission, if the last six pixels were white, the region being depicted is likely a white background, and the next pixel is also likely white. One need only transmit or store data that violates our prediction. The entropy measures our ignorance: how likely the best predictions about the rest of the message are to be wrong.

³⁰ Lossless compression schemes (files ending in gif, png, zip, and gz) remove the redundant information in the original files, and their efficiency is limited by the entropy of the ensemble of files being compressed. Lossy compression schemes (files ending in jpg, mpg, and mp3) also remove information that is thought to be unimportant for humans looking at or listening to the files.
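The compression limit can be illustrated with the empirical character distribution of a message; a minimal sketch (the sample message and the 8-bits-per-character baseline are illustrative assumptions):

```python
from collections import Counter
from math import log2

def entropy_bits_per_char(text):
    """Shannon entropy (k_S = 1/log 2, i.e. log base 2) of the empirical
    single-character distribution: a lower bound, in bits per character,
    for any lossless code that treats characters independently."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())

message = "aaaaaaaabbbbccdd"   # hypothetical message; 'a' is very common
H = entropy_bits_per_char(message)

# 8 a's (p=1/2), 4 b's (p=1/4), 2 c's and 2 d's (p=1/8 each): H = 1.75 bits
assert abs(H - 1.75) < 1e-12
assert H < 8    # far below a naive 8 bits per character
```

A predictable message has low entropy, so far fewer bits per character suffice than the naive encoding uses.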

Entropy is so useful in these various elds because it is the unique

31

Unique, that is, up to the overall con- (continuous) function that satises three key properties.31 In this sec-

stant kS or kB . tion, we will rst explain what these three properties are and why they

are natural for any function that measures ignorance. We will show

our nonequilibrium Shannon entropy satises these properties; in ex-

ercise 6.11 you will show that this entropy is the only function to do

so.

Your roommate has lost their keys: they are asking for your advice. We want to measure the roommate's progress in finding the keys by measuring your ignorance with some function S.³² Suppose there are Ω possible sites A_k at which they might have left the keys, which you estimate have probabilities p_k = P(A_k), with Σ_{i=1}^{Ω} p_i = 1.

What are the three key properties we want our ignorance function S(p_1, . . . , p_Ω) to have? The first two are easy.

³² For now, S is an unknown function: we're showing that the entropy is a good candidate.


(1) Entropy is maximum for equal probabilities. Without further information, surely the best plan is for your roommate to look first at the most likely site, which maximizes p_i. Your ignorance must therefore be maximal if all sites have equal likelihood:

S(1/Ω, . . . , 1/Ω) > S(p_1, . . . , p_Ω) unless p_i = 1/Ω for all i.    (6.27)

It is not obvious that S is maximized when the probabilities are equal. Here we use the convexity of x log x (figure 6.9) to show it is a maximum. First, we notice that the function f(p) = −p log p is concave (convex downward, figure 6.9). For a concave function f, the average value of f(p) over a set of points p_k is less than or equal to f evaluated at the average:³³

(1/Ω) Σ_k f(p_k) ≤ f((1/Ω) Σ_k p_k).    (6.29)

Applying this to the Ω probabilities p_k,

S(p_1, . . . , p_Ω) = −kS Σ_k p_k log p_k = kS Σ_k f(p_k) = Ω kS (1/Ω) Σ_k f(p_k)    (6.30)
                   ≤ Ω kS f((1/Ω) Σ_k p_k) = Ω kS f(1/Ω) = S(1/Ω, . . . , 1/Ω).

(2) Entropy is unaffected by extra states of zero probability. If there is no possibility that the keys are in your shoe (site A_Ω), then your ignorance is no larger than it would have been if you hadn't included your shoe in the list of possible sites:

S(p_1, . . . , p_{Ω−1}, 0) = S(p_1, . . . , p_{Ω−1}).    (6.32)

Fig. 6.9 Entropy is Concave. For x ≥ 0, f(x) = −x log x is strictly convex downward (concave). That is, for 0 < λ < 1, the linear interpolation lies below the curve:

f(λa + (1 − λ)b) ≥ λ f(a) + (1 − λ) f(b).    (6.31)

We know f is concave because its second derivative, −1/x, is everywhere negative.

³³ Equation 6.29 can be proven by induction from the definition of concave (equation 6.31). For Ω = 2, we use λ = 1/2, a = p_1, and b = p_2 to see that f((p_1 + p_2)/2) ≥ (1/2)(f(p_1) + f(p_2)). For general Ω, we use λ = (Ω − 1)/Ω, a = (Σ_{k=1}^{Ω−1} p_k)/(Ω − 1), and b = p_Ω to see

f((Σ_{k=1}^{Ω} p_k)/Ω) = f( ((Ω − 1)/Ω) (Σ_{k=1}^{Ω−1} p_k)/(Ω − 1) + (1/Ω) p_Ω )
  ≥ ((Ω − 1)/Ω) f((Σ_{k=1}^{Ω−1} p_k)/(Ω − 1)) + (1/Ω) f(p_Ω)
  ≥ ((Ω − 1)/Ω) (1/(Ω − 1)) Σ_{k=1}^{Ω−1} f(p_k) + (1/Ω) f(p_Ω)
  = (1/Ω) Σ_{k=1}^{Ω} f(p_k),    (6.28)

where in the third line we have used the truth of equation 6.29 for Ω − 1 to inductively prove it for Ω.
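Properties (1) and (2) are easy to spot-check numerically; a minimal sketch with an arbitrary Ω = 5:

```python
from math import log
import random

def S(p):
    # candidate ignorance function: -sum p log p (constant k_S omitted)
    return -sum(x * log(x) for x in p if x > 0)

Omega = 5                             # an arbitrary number of key sites
uniform = [1.0 / Omega] * Omega

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(Omega)]
    p = [x / sum(w) for x in w]       # a random distribution on the sites
    assert S(p) <= S(uniform) + 1e-12     # property (1): uniform maximizes S

# property (2): an extra zero-probability site changes nothing
assert abs(S([0.6, 0.4, 0.0]) - S([0.6, 0.4])) < 1e-12
```

No random distribution beats the uniform one, and appending a zero-probability state leaves the entropy untouched.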


To aid in the search, you'll likely ask the roommate where they were when they last saw the keys. Suppose there are M locations B_ℓ that the roommate may have been (opening the apartment door, driving the car, in the basement laundry room, . . . ), with probabilities q_ℓ. Surely the likelihood that the keys are currently in a coat pocket is larger if the roommate was outdoors when the keys were last seen. Let r_kℓ = P(A_k and B_ℓ) be the probability that the keys are at site k and were last seen at location ℓ, and let

c_kℓ = r_kℓ / q_ℓ    (6.33)

be the conditional probability, given that they were last seen at B_ℓ, that the keys are at site A_k.³⁴ Clearly

Σ_k P(A_k | B_ℓ) = Σ_k c_kℓ = 1:    (6.34)

wherever they were last seen, the keys are now somewhere with probability one.

³⁴ The conditional probability P(A|B) [read "P of A given B"] times P(B) is of course the probability of A and B both occurring.

Before you ask your roommate where the keys were last seen, you have ignorance S(A) = S(p_1, . . . , p_Ω) about the site of the keys, and ignorance S(B) = S(q_1, . . . , q_M) about the location they were last seen. You have a joint ignorance about the two questions, given by the ignorance function applied to all ΩM conditional probabilities:

S(AB) = S(r_11, r_12, . . . , r_ΩM)    (6.35)
      = S(c_11 q_1, c_12 q_2, . . . , c_1M q_M, c_21 q_1, . . . , c_ΩM q_M).

After the roommate answers your question, your ignorance about the location last seen is reduced to zero (decreased by S(B)). If the location last seen was in the laundry room (site B_ℓ), the probability for the keys being at A_k shifts to c_kℓ, and your ignorance about the site of the keys is now

S(A | B_ℓ) = S(c_1ℓ, . . . , c_Ωℓ).    (6.36)

So, your combined ignorance has decreased from S(AB) to S(A | B_ℓ).

We can measure the usefulness of your question by the expected amount that it decreases your ignorance about where the keys reside. The expected ignorance after the question is answered is given by weighting the ignorance for each answer B_ℓ by the probability q_ℓ of that answer:

⟨S(A | B_ℓ)⟩_B = Σ_ℓ q_ℓ S(A | B_ℓ).    (6.37)

This leads us to the third key property for an ignorance function.

(3) Entropy change for conditional probabilities. How should a good ignorance function behave for conditional probabilities? If we start with the joint distribution AB and then measure B, it would be tidy if, on average, your joint ignorance declined by your original ignorance of B:

⟨S(A | B_ℓ)⟩_B = S(AB) − S(B).    (6.38)


Does our nonequilibrium Shannon entropy satisfy property (3)? The conditional probability entropy is S(A | B_ℓ) = −kS Σ_k c_kℓ log c_kℓ, since the c_kℓ are the probability distribution for the A_k sites given location ℓ. So,³⁵

S(AB) = −kS Σ_kℓ c_kℓ q_ℓ log(c_kℓ q_ℓ)
      = −kS [ Σ_kℓ c_kℓ q_ℓ log(c_kℓ) + Σ_kℓ c_kℓ q_ℓ log(q_ℓ) ]
      = Σ_ℓ q_ℓ ( −kS Σ_k c_kℓ log(c_kℓ) ) − kS Σ_ℓ q_ℓ log(q_ℓ)    [using Σ_k c_kℓ = 1]
      = Σ_ℓ q_ℓ S(A | B_ℓ) + S(B)
      = ⟨S(A | B_ℓ)⟩_B + S(B),    (6.39)

and the Shannon entropy satisfies the third key condition for a measure of ignorance, equation 6.38.

³⁵ Notice that this argument is almost the same as the proof that entropy is additive (equation 6.23). There we assumed A and B were uncorrelated, in which case c_kℓ = p_k and S(AB) = S(A) + S(B).
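The identity of equation 6.39 can be verified numerically for a toy key-search problem; a minimal sketch (the numbers of sites and locations, and the random conditional probabilities, are arbitrary choices):

```python
from math import log
import random

def S(p):
    return -sum(x * log(x) for x in p if x > 0)

random.seed(1)
Omega, M = 4, 3                       # key sites A_k, locations-last-seen B_l

q = [x / 6.0 for x in (1, 2, 3)]      # q_l: probabilities of each location
# c[l][k]: conditional probability of site k given location l (random toy data)
c = []
for _ in range(M):
    w = [random.random() for _ in range(Omega)]
    c.append([x / sum(w) for x in w])

# joint probabilities r_kl = c_kl q_l, flattened over all Omega*M pairs
r = [c[l][k] * q[l] for l in range(M) for k in range(Omega)]

S_AB = S(r)
S_B = S(q)
S_A_given_B = sum(q[l] * S(c[l]) for l in range(M))   # <S(A|B_l)>_B

assert abs(S_AB - (S_A_given_B + S_B)) < 1e-12
```

The joint ignorance splits exactly into the expected conditional ignorance plus the ignorance about B, as in equation 6.38.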

Exercises

Entropy is an emergent property. Unlike energy conservation, which is inherited from the microscopic theory, entropy is a constant for a closed system treated microscopically (6.5(a)). Entropy increases because information is lost either to the outside world, to unimportant internal degrees of freedom (diffusion equation, 6.6), or to measurement inaccuracies in the initial state (Lyapunov exponents 6.12, Poincaré cat map 6.5(b)).

Entropy is a general measure of ignorance, useful far outside its traditional applications (6.4) in equilibrium systems. It is the unique function to have the appropriate properties to measure ignorance (6.11). It has applications to glasses (6.9) and to defining fractal dimensions (6.14). It is fascinating that entropy, our ignorance about the system, can exert real forces (e.g. in rubber bands, 6.10).

Entropy provides fundamental limits on engine efficiency (6.2, 6.3), data compression (6.7, 6.8), memory storage (to avoid a black hole! 6.13(c)), and to intelligent life at the end of the universe (6.1).

(6.1) Life and the Heat Death of the Universe. (Basic, Astrophysics) [27]

Freeman Dyson discusses how living things might evolve to cope with the cooling and dimming we expect during the heat death of the universe.

Normally one speaks of living things as beings that consume energy to survive and proliferate. This is of course not correct: energy is conserved, and cannot be consumed. Living beings intercept entropy flows: they use low entropy sources of energy (e.g., high temperature solar radiation for plants, candy bars for us) and emit high entropy forms of the same energy (body heat).

Dyson ignores the survival and proliferation issues; he's interested in getting a lot of thinking in before the universe ends. He presumes that an intelligent being generates a fixed entropy ΔS per thought. (This correspondence of information with entropy is a standard idea from computer science: see problems 6.7 and 6.8.)

Energy needed per thought. Assume that the being draws heat Q from a hot reservoir at T1 and radiates it away to a cold reservoir at T2.

(a) What is the minimum energy Q needed per thought, in terms of ΔS and T2? You may take T1 very large. (Related formulae: ΔS = Q2/T2 − Q1/T1; First Law: Q1 − Q2 = W (energy is conserved).)

Time needed per thought to radiate energy. Dyson shows, using theory not important here, that the power radiated by our intelligent-being-as-entropy-producer is


no larger than CT2³, a constant times the cube of the cold temperature.³⁶

(b) Write an expression for the maximum rate of thoughts per unit time dH/dt (the inverse of the time Δt per thought), in terms of ΔS, C, and T2.

Number of thoughts for an ecologically efficient being. Our universe is expanding: the radius R grows roughly linearly in time t. The microwave background radiation has a characteristic temperature Θ(t) ∼ R⁻¹ which is getting lower as the universe expands: this redshift is due to the Doppler effect. An ecologically efficient being would naturally try to use as little heat as possible, and so wants to choose T2 as small as possible. It cannot radiate heat at a temperature below T2 = Θ(t) = A/t.

(c) How many thoughts H can an ecologically efficient being have between now and time infinity, in terms of ΔS, C, A, and the current time t0?

Time without end: Greedy beings. Dyson would like his beings to be able to think an infinite number of thoughts before the universe ends, but consume a finite amount of energy. He proposes that his beings need to be profligate in order to get their thoughts in before the world ends: he proposes that they radiate at a temperature T2(t) ∼ t^(−3/8) which falls with time, but not as fast as Θ(t) ∼ t⁻¹.

(d) Show that with Dyson's cooling schedule, the total number of thoughts H is infinite, but the total energy consumed U is finite.

(6.2) P-V Diagram. (Basic, Thermodynamics)

Fig. 6.10 P-V diagram. (Axes P and V; the cycle runs between pressures P0 and 4P0 and volumes V0 and 4V0, with legs a, b, and c, an isotherm at Th, and the baths Tc and Th marked.)

A monatomic ideal gas in a piston is cycled around the path in the P-V diagram in figure 6.10. Leg a cools at constant volume by connecting to a heat bath at Tc; leg b heats at constant pressure by connecting to a heat bath at Th; leg c compresses at constant temperature while remaining connected to the bath at Th.

Which of the following are true?

(T) (F) The cycle is reversible: no net entropy is created in the universe.
(T) (F) The cycle acts as a refrigerator, using work from the piston to draw energy from the cold bath into the hot bath, cooling the cold bath.
(T) (F) The cycle acts as an engine, transferring heat from the hot bath to the cold bath and doing positive net work on the outside world.
(T) (F) The work done per cycle has magnitude |W| = P0 V0 |4 log 4 − 3|.
(T) (F) The heat transferred into the cold bath, Qc, has magnitude |Qc| = (9/2) P0 V0.
(T) (F) The heat transferred from the hot bath, Qh, plus the net work W done by the piston onto the gas, equals the heat Qc transferred into the cold bath.

Related formulae: PV = N kB T; U = (3/2) N kB T; ΔS = Q/T; W = −∫ P dV; ΔU = Q + W. Notice that the signs of the various terms depend on convention (heat flow out vs. heat flow in): you should figure the signs on physical grounds.

(6.3) Carnot Refrigerator. (Basic, Thermodynamics)

Our refrigerator is about 2m × 1m × 1m, and has insulation about 3cm thick. The insulation is probably polyurethane, which has a thermal conductivity of about 0.02 W/(m K). Assume that the refrigerator interior is at 270 K, and the room is at 300 K.

(a) How many watts of energy leak from our refrigerator through this insulation?

Our refrigerator runs at 120 V, and draws a maximum of 4.75 amps. The compressor motor turns on every once in a while for a few minutes.

(b) Suppose (i) we don't open the refrigerator door, (ii) the thermal losses are dominated by the leakage through the foam and not through the seals around the doors, and (iii) the refrigerator runs as a perfectly efficient Carnot cycle. How much power on average will our refrigerator need to operate? What fraction of the time will the motor run?

36 The constant scales with the number of electrons in the being, so we can think


(6.4) Lagrange Multipliers.³⁷

Lagrange multipliers allow one to find the extremum of a function f(x) given a constraint g(x) = g0. One extremizes

f(x) + λ (g(x) − g0)    (6.40)

as a function of λ and x. The derivative with respect to λ being zero enforces the constraint and sets λ. The derivatives with respect to components of x then include terms involving λ, which act to enforce the constraint.

Let us use Lagrange multipliers to find the maximum of the nonequilibrium entropy

S = −kB ∫ ρ(P, Q) log ρ(P, Q) = −kB Tr(ρ log ρ) = −kB Σ_i pi log pi    (6.41)

constraining the normalization, energy, and number. You may use whichever form of the entropy you prefer: the first continuous form will demand some calculus of variations (see [68, ch. 12]); the last discrete form is mathematically the most straightforward.

(a) Microcanonical: Using a Lagrange multiplier to enforce the normalization

Tr(ρ) = ∫_EnergySurface ρ(P, Q) = 1,    (6.42)

show that the probability distribution that extremizes the entropy is a constant (the microcanonical distribution).

(b) Canonical: Integrating over all P and Q, use another Lagrange multiplier to fix the mean energy ⟨E⟩ = ∫ dP dQ H(P, Q) ρ(P, Q). Show that the canonical distribution maximizes the entropy given the constraints of normalization and fixed energy.

(c) Grand Canonical: Summing over different numbers of particles N and adding the constraint that the average number is ⟨N⟩ = Σ_N ∫ dP dQ N ρ_N(P, Q), show that you get the grand canonical distribution by maximizing the entropy.

(6.5) The second law of thermodynamics says that entropy always increases. Perversely, it's easy to show that in an isolated system, no matter what non-equilibrium condition it starts in, entropy as precisely defined stays constant in time.

Entropy is Constant: Classical.³⁸ Liouville's theorem tells us that the total derivative of the probability density is zero: following the trajectory of a system, the local probability density never changes. The equilibrium states have probability densities that only depend on energy and number. Clearly something is wrong: if the density starts non-uniform, how can it become uniform?

(a) Show for any function f(ρ) that ∂f(ρ)/∂t = −∇·[f(ρ)V] = −Σ_α [∂(f(ρ) ṗ_α)/∂p_α + ∂(f(ρ) q̇_α)/∂q_α], where V = (Ṗ, Q̇) is the 6N-dimensional velocity in phase space. Hence (by Gauss's theorem in 6N dimensions), show ∫ ∂f(ρ)/∂t dP dQ = 0, assuming that the probability density vanishes at large momenta and positions and f(0) = 0. Show, thus, that the entropy S = −kB ∫ ρ log ρ is constant in time.

We will see that the quantum version of the entropy is also constant for a Hamiltonian system in problem 7.2.

The Arnold Cat. Why do we think entropy increases? First, points in phase space don't just swirl in circles: they get stretched and twisted and folded back in complicated patterns, especially in systems where statistical mechanics seems to hold! Arnold, in a takeoff on Schrödinger's cat, suggested the following analogy. Instead of a continuous transformation of phase space onto itself preserving 6N-dimensional volume, let's think of an area-preserving mapping of an n × n square in the plane into itself.³⁹ Consider the mapping

Γ: (x, y) → (x + y, x + 2y) mod n.    (6.43)

See the map in figure 6.11.

(b) Check that Γ preserves area. (It's basically multiplication by the matrix M = ((1, 1), (1, 2)). What is the determinant of M?) Show that it takes a square n × n (or a

³⁷ Lagrange (1736-1813).
³⁸ We'll see in problem 7.2 that the non-equilibrium entropy is also constant in quantum systems.
³⁹ For our purposes, the Arnold cat just shows that volume-preserving transformations can scramble a small region uniformly over a large one. More general, nonlinear area-preserving maps of the plane are often studied as simple Hamiltonian-like dynamical systems. Area-preserving maps come up as Poincaré sections of Hamiltonian systems (section 4.2), with the area weighted by the inverse of the velocity with which the system passes through the cross-section. They come up in particular in studies of high-energy particle accelerators, where the mapping gives a snapshot of the particles after one orbit around the ring.


Fig. 6.11 Arnold Cat Transform, from reference [80]; see movie too [89].

picture of n × n pixels) and maps it into itself with periodic boundary conditions. (With less cutting and pasting, you can view it as a map from the torus into itself.) As a linear map, find the eigenvalues and eigenvectors. Argue that a small neighborhood (say a circle in the center of the picture) will initially be stretched along an irrational direction into a thin strip (figure 6.12).

Fig. 6.12 A small circular region stretches along an irrational angle under the Arnold cat map. The center of the figure is the origin x = 0, y = 0.

When this thin strip hits the boundary, it gets split into two; in the case of an n × n square, further iterations stretch and chop our original circle into a thin line uniformly covering the square. In the pixel case, there are always exactly the same number of pixels that are black, white, and each shade of gray: they just get so kneaded together that everything looks a uniform color. So, by putting a limit to the resolution of our measurement (rounding errors on the computer, for example), or by introducing any tiny coupling to the external world, the final state can be seen to rapidly approach equilibrium, proofs to the contrary notwithstanding!

(6.6) Diffusion Equation Entropy.

We saw that entropy technically doesn't increase for a closed system, for any Hamiltonian, either classical or quantum. However, we can show that entropy increases for most of the coarse-grained effective theories that we use in practice: when we integrate out degrees of freedom, we provide a means for the information about the initial condition to be destroyed. Here you'll show that entropy increases for the diffusion equation.

Let ρ(x, t) obey the one-dimensional diffusion equation ∂ρ/∂t = D ∂²ρ/∂x². Assume that the density ρ and all its gradients die away rapidly at x = ±∞.⁴⁰

⁴⁰ Also, you may assume (∂ⁿρ/∂xⁿ) log ρ goes to zero at x = ±∞, even though log ρ goes to −∞.
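The pixel version of the cat map from exercise 6.5 is easy to play with directly; a minimal sketch (the grid size and the radius of the initial "circle" are arbitrary choices) verifying that the map is a permutation of the pixels, so the "area" of any set never changes even as it gets kneaded across the square:

```python
# Gamma(x, y) = (x + y, x + 2y) mod n, acting on an n x n grid of pixels
def cat_map(n):
    return {(x, y): ((x + y) % n, (x + 2 * y) % n)
            for x in range(n) for y in range(n)}

n = 32                                   # arbitrary grid size
gamma = cat_map(n)

# det M = 1*2 - 1*1 = 1: the map is one-to-one on the grid, a permutation
assert len(set(gamma.values())) == n * n

# iterate a small "circle" of pixels; its pixel count (area) never changes
blob = {(x, y) for x in range(n) for y in range(n)
        if (x - n // 2) ** 2 + (y - n // 2) ** 2 <= 9}
image = blob
for _ in range(5):
    image = {gamma[p] for p in image}
assert len(image) == len(blob)
```

The blob's pixels scatter over the square, but their number is exactly conserved: only our limited resolution makes the picture look like it has equilibrated.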


Derive a formula for the time derivative of the entropy S = −kB ∫ ρ(x) log ρ(x) dx and show that it strictly increases in time. (Hint: integrate by parts. You should get an integral of a positive definite quantity.)

(6.7) Information entropy. (Basic, Computer Science, Mathematics, Complexity)

Entropy is a measure of your ignorance about a system: it is a measure of the lack of information. It has important implications in communication technologies: messages passed across the Ethernet communicate information, reducing the information entropy for the receiver. Shannon [108] worked out the use of entropy ideas in communications, focusing on problems where different messages have different probabilities. We'll focus on the simpler problem where all N messages are equally likely. Shannon defines the information entropy of an unread message as being log₂ N = kS log N, where kS = 1/(log_e 2) is analogous to Boltzmann's constant, and changes from log-base-e to log-base-2 (more convenient for computers, which think in base two).

Your grandparent has sent you an e-mail message. From the header of the message, you know it contains 1000 characters. You know each character is made of 8 bits, which allows 2⁸ = 256 different letters or symbols per character. Assuming all possible messages from your grandparent are equally likely (a typical message would then be random gibberish), how many messages N could there be? This (unrealistic) assumption gives an upper bound for the information entropy S_max.

(a) What is S_max for the unread message?

Your grandparent writes rather dull messages: they all fall into the same pattern. They have a total of 16 equally likely messages.⁴¹ After you read the message, you forget the details of the wording anyhow, and only remember these key points of information.

(b) What is the actual information entropy change ΔS_Shannon you undergo when reading the message? If your grandparent writes one message per month, what is the minimum number of 8-bit characters per year that it would take to send your grandparent's messages? (You may lump multiple messages into a single character.) (Hints: ΔS_Shannon is the change in entropy from before you read the message to after you read which of 16 messages it was. The length of 1000 is not important for this part.)

Remark: This is an extreme form of data compression, like that used in gif images, zip files (Windows) and gz files (Unix). We are asking for the number of characters per year for an optimally compressed signal.

(6.8) Shannon entropy. (Computer Science)

Entropy can be viewed as a measure of the lack of information you have about a system. Claude Shannon [108] realized, back in the 1940s, that communication over telephone wires amounts to reducing the listener's uncertainty about the sender's message, and introduced a definition of an information entropy.

Most natural languages (voice, written English) are highly redundant; the number of intelligible fifty-letter sentences is many fewer than 26⁵⁰, and the number of ten-second phone conversations is far smaller than the number of sound signals that could be generated with frequencies up to 20,000 Hz.⁴² Shannon, knowing statistical mechanics, defined the entropy of an ensemble of messages: if there are N possible messages that can be sent in one package, and message m is being transmitted with probability p_m, then Shannon's entropy is

S_Shannon = −kS Σ_{m=1}^{N} p_m log p_m    (6.44)

where instead of Boltzmann's constant, Shannon picked kS = 1/log 2.

This immediately suggests a theory for signal compression. If you can recode the alphabet so that common letters and common sequences of letters are abbreviated, while infrequent combinations are spelled out in lengthy fashion, you can dramatically reduce the channel capacity needed to send the data. (This is lossless compression, like zip and gz and gif.)

An obscure language Abc! for long-distance communication has only three sounds: a hoot represented by A, a slap represented by B, and a click represented by C. In a typical message, hoots and slaps occur equally often (p = 1/4), but clicks are twice as common (p = 1/2). Assume the messages are otherwise random.

⁴¹ Each message mentions whether they won their bridge hand last week (a fifty-fifty chance), mentions that they wish you would write more often (every time), and speculates who will win the women's college basketball tournament in their region (picking at random one of the eight teams in the league).

⁴² Real telephones don't span this whole frequency range: they are limited on the low end at 300-400 Hz, and on the high end at 3000-3500 Hz. You can still understand the words, so this simple form of data compression is only losing non-verbal nuances in the communication [34].
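Returning briefly to exercise 6.6: the entropy increase you are asked to derive there can be previewed with a crude finite-difference simulation. A minimal sketch (the grid, time step, initial spike, and periodic boundaries are simplifying assumptions, not the exercise's decay-at-infinity boundary conditions):

```python
from math import log

# Explicit finite-difference evolution of drho/dt = D d2rho/dx2, tracking
# S = -sum rho log rho dx (with k_B = 1).
D, dx, dt, N = 1.0, 0.1, 0.002, 200    # dt < dx^2 / (2 D) for stability

rho = [0.0] * N
rho[N // 2] = 1.0 / dx                 # narrow initial spike, total mass 1

def entropy(rho):
    return -sum(r * log(r) * dx for r in rho if r > 0)

S_prev = entropy(rho)
increasing = True
for step in range(500):
    lap = [rho[(i - 1) % N] - 2 * rho[i] + rho[(i + 1) % N] for i in range(N)]
    rho = [r + dt * D * l / dx ** 2 for r, l in zip(rho, lap)]
    S_now = entropy(rho)
    if S_now < S_prev - 1e-12:
        increasing = False
    S_prev = S_now

assert increasing                          # entropy grew monotonically
assert abs(sum(rho) * dx - 1.0) < 1e-9     # mass is conserved
```

Each explicit step is a local averaging of the density, and the entropy rises monotonically as the spike spreads, just as the analytic integration by parts predicts.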


(a) What is the Shannon entropy in this language? More specifically, what is the Shannon entropy rate (entropy per sound, or letter, transmitted)?

(b) Show that a communication channel transmitting bits (ones and zeros) can transmit no more than one unit of Shannon entropy per bit. (Hint: this should follow by showing that, for N = 2ⁿ messages, equation 6.44 is maximized by p_m = 1/N. You needn't prove it's a global maximum: check that it is a local extremum. You'll need either a Lagrange multiplier or will need to explicitly set p_N = 1 − Σ_{m=1}^{N−1} p_m.)

(c) In general, argue that the Shannon entropy gives the minimum number of bits needed to transmit the ensemble of messages. (Hint: compare the Shannon entropy of the N original messages with the Shannon entropy of the N (shorter) encoded messages.) Calculate the minimum number of bits per letter on average needed to transmit messages for the particular case of an Abc! communication channel.

(d) Find a compression scheme (a rule that converts an Abc! message to zeros and ones, that can be inverted to give back the original message) that is optimal, in the sense that it saturates the bound you derived in part (b). (Hint: Look for a scheme for encoding the message that compresses one letter at a time. Not all letters need to compress to the same number of bits.)

Shannon also developed a measure of the channel capacity of a noisy wire, and discussed error correction codes...

(6.9) Entropy of Glasses. [59]

Fig. 6.13 Specific heat of B2O3 glass measured while heating and cooling. The glass was first rapidly cooled from the melt (500°C to 50°C in a half hour), then heated from 33°C to 345°C in 14 hours (solid curve with squares), cooled from 345°C to room temperature in 18 hours (dotted curve with diamonds), and finally heated from 35°C to 325°C (solid curve with crosses). Figure from reference [113], see also [57].

Glasses aren't really in equilibrium. In particular, they do not obey the third law of thermodynamics, that the entropy S goes to zero at zero temperature. Experimentalists measure a "residual entropy" by subtracting the entropy change from the known entropy of the equilibrium liquid at a temperature Tℓ at or above the crystalline melting temperature Tc:

S_residual = S_liquid(Tℓ) − ∫₀^{Tℓ} (1/T) (dQ/dT) dT    (6.45)

where Q is the net heat flow out of the bath into the glass.

If you put a glass in an insulated box, it will warm up (very slowly) because of microscopic atomic rearrangements which lower the potential energy. So, glasses don't have a well-defined temperature or specific heat. In particular, the heat flow upon cooling and on heating, dQ/dT (T), won't precisely match (although their integrals will agree by conservation of energy).

Thomas and Parks in figure 6.13 are making the approximation that the specific heat of the glass is dQ/dT, the measured heat flow out of the glass divided by the temperature change of the heat bath. They find that the specific heats defined in this way measured on cooling and heating disagree.⁴³ Consider the second cooling curve and the final heating curve, from 325°C to room temperature and back. Assume that the liquid at 325°C is in equilibrium both before cooling and after heating (and so has the same liquid entropy S_liquid).

(a) Is the residual entropy, equation 6.45, larger on heating or on cooling? (Hint: Use the fact that the integrals under the curves, ∫₀^{Tℓ} (dQ/dT) dT, give the heat flow, which by conservation of energy must be the same on heating and cooling. The heating curve shifts weight to higher temperatures: will that increase or decrease the integral in 6.45?)

(b) By using the second law (entropy can only increase), show that when cooling and then heating from an equilibrium liquid the residual entropy measured on cooling must always be less than the residual entropy measured on heating. (Hint: Consider the entropy flow into the outside world upon cooling the liquid into the glass, compared to the entropy flow from the outside world to heat the glass into the liquid again. The initial and final states of the liquid are both in equilibrium.)

43 The fact that the energy lags the temperature near the glass transition, in linear

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

6.3 Entropy as Ignorance: Information and Memory

The residual entropy of a typical glass is about k_B per molecular unit. It's a measure of how many different glassy configurations of atoms the material can freeze into.

(c) In a molecular dynamics simulation with one hundred indistinguishable atoms, and assuming that the residual entropy is k_B log 2 per atom, what is the probability that two coolings to zero energy will arrive at equivalent atomic configurations (up to permutations)? In a system with 10^23 molecular units, with residual entropy k_B log 2 per unit, about how many coolings would be needed to arrive at the original configuration again, with probability 1/2?

(6.10) Rubber Band. (Basic)

Fig. 6.14: Simple model of a rubber band with N = 100 segments. The beginning of the polymer is at the top; the end is at the bottom; the vertical displacements are added for visualization.

Figure 6.14 shows a simple one-dimensional model for rubber. Rubber is formed of many long polymeric molecules, which undergo random walks in the undeformed material. When we stretch the rubber, the molecules respond by rearranging their random walk to elongate in the direction of the external stretch. In our simple model, the molecule is represented by a set of N links of length d, which with equal energy point either parallel or antiparallel to the previous link. Let the total change in position to the right from the beginning of the polymer to the end be L. As the molecular extent L increases, the entropy of our rubber molecule decreases.

(a) Find an exact formula for the entropy of this system in terms of d, N, and L. (Hint: how many ways can one divide N links into M right-pointing links and N - M left-pointing links, so that the total length is L?)

The external world, in equilibrium at temperature T, exerts a force pulling the end of the molecule to the right. The molecule must exert an equal and opposite entropic force F.

(b) Find an expression for the force F exerted by the bath on the molecule in terms of the bath entropy. (Hint: the bath temperature satisfies 1/T = dS_bath/dE, and force times distance is energy.) Using the fact that the length L must maximize the entropy of the universe, write a general expression for F in terms of the internal entropy S of the molecule.

(c) Taking our model of the molecule from part (a), the general law of part (b), and Stirling's formula 3.11 (dropping the square root), write the force law F(L) for our molecule for large lengths N. What is the spring constant K in Hooke's law F = -K L for our molecule, for small L?

Our model has no internal energy: this force is entirely entropic.

(d) If we increase the temperature of our rubber band while it is under tension, will it expand or contract? Why?

In a more realistic model of a rubber band, the entropy consists primarily of our configurational random-walk entropy plus a vibrational entropy of the molecules. If we stretch the rubber band without allowing heat to flow in or out of the rubber, the total entropy should stay approximately constant.^44

(e) True or false?
(T) (F) When we stretch the rubber band, it will cool: the configurational entropy of the random walk will decrease, causing the entropy in the vibrations to decrease, causing the temperature to decrease.
(T) (F) When we stretch the rubber band, it will cool: the configurational entropy of the random walk will decrease, causing the entropy in the vibrations to increase, causing the temperature to decrease.
(T) (F) When we let the rubber band relax, it will cool: the configurational entropy of the random walk will increase, causing the entropy in the vibrations to decrease, causing the temperature to decrease.
(T) (F) When we let the rubber band relax, there must be no temperature change, since the entropy is constant.

This more realistic model is much like the ideal gas, which also had no configurational energy.

(T) (F) Like the ideal gas, the temperature changes because of the net work done on the system.
(T) (F) Unlike the ideal gas, the work done on the rubber band is positive when the rubber band expands.

You should check your conclusions experimentally: find a rubber band (thick and stretchy is best), touch it to

44 ... cycle of stretching and compression, so long as the deformation is not too abrupt.

© James P. Sethna, January 4, 2005    Entropy, Order Parameters, and Complexity
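The combinatorics of parts (a)-(c) are easy to check numerically. Below is a minimal sketch (ours, not the text's solution) that counts the link configurations exactly and extracts the entropic force from a discrete derivative F = T dS/dL; in units k_B = d = T = 1 it should approach a Hookean form for small L:

```python
from math import comb, log

def entropy(N, L, d=1.0, kB=1.0):
    # M right-pointing links satisfy M*d - (N - M)*d = L, so M = (N + L/d)/2;
    # the number of configurations is the binomial coefficient C(N, M).
    M = round((N + L / d) / 2)
    return kB * log(comb(N, M))

def force(N, L, d=1.0, T=1.0):
    # Entropic force from the discrete derivative F = T * dS/dL;
    # L changes in steps of 2d when a single link flips.
    dL = 2 * d
    return T * (entropy(N, L + dL, d) - entropy(N, L - dL, d)) / (2 * dL)

N = 1000
for L in [0, 100, 200]:
    print(L, force(N, L), -L / N)   # last column: small-L Hooke's-law guess
```

The two printed columns agree to a few percent for L much less than N d, consistent (if our algebra is right) with a spring constant K = k_B T/(N d^2); the force is negative (restoring) because stretching lowers the entropy.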

your lips (which are very sensitive to temperature), and stretch and relax it.

(6.11) Entropy Measures Ignorance. (Mathematics)

In this exercise, you will show that the unique continuous function (up to the constant k_B) satisfying the three key properties (equations 6.27, 6.32, and 6.38):

    S(1/Ω, ..., 1/Ω) > S(p_1, ..., p_Ω) unless p_i = 1/Ω for all i,    (6.46)

and

    ⟨S(A|B_l)⟩_B = S(AB) - S(B),    (6.48)

where S(A) = S(p_1, ..., p_Ω), S(B) = S(q_1, ..., q_M), ⟨S(A|B_l)⟩_B = Σ_l q_l S(c_{1l}, ..., c_{Ωl}), and S(AB) = S(c_{11} q_1, ..., c_{ΩM} q_M), is the Shannon entropy. The presentation is based on the excellent small book by Khinchin [49].

For convenience, define L(g) = S(1/g, ..., 1/g).

(a) For any rational probabilities p_k, let g be the least common multiple of their denominators, and let p_k = g_k/g for integers g_k. Show that

    S(A) = L(g) - Σ_k p_k L(g_k).    (6.49)

(Hint: consider AB to have g possibilities of probability 1/g, A to measure which group of size g_k, and B to measure which of the g_k members of group k.)

(b) If L(g) = k_S log g, show that equation 6.49 is the Shannon entropy 6.26.

Knowing that S(A) is the Shannon entropy for all rational probabilities, and assuming that S(A) is continuous, makes S(A) the Shannon entropy. So, we've reduced the problem to showing L(g) is the logarithm up to a constant.

(c) Show that L(g) is monotone increasing with g. (Hint: you'll need to use both of the first two properties.)

(d) Show L(g^n) = n L(g). (Hint: consider n independent probability distributions, each of g equally likely events. Use the third property recursively on n.)

(e) If 2^m < s^n < 2^{m+1}, using the results of parts (c) and (d), show

    m/n < L(s)/L(2) < (m+1)/n.    (6.50)

(Hint: how is L(2^m) related to L(s^n) and L(2^{m+1})?) Show also, using the same argument, that m/n < log(s)/log(2) < (m+1)/n. Hence, show that |L(s)/L(2) - log(s)/log(2)| < 1/n, and thus L(s) = k log s for some constant k.

Hence our ignorance function S agrees with the formula for the non-equilibrium entropy, uniquely up to an overall constant.

(6.12) Chaos, Lyapunov, and Entropy Increase. (Math, Complexity) (With Myers. [72])

Let's consider a simple dynamical system, given by a mapping from the unit interval (0, 1) into itself:

    f(x) = 4 μ x (1 - x),    (6.51)

where the time evolution is given by iterating the map:

    x_0, x_1, x_2, ... = x_0, f(x_0), f(f(x_0)), ...    (6.52)

In particular, for μ = 1 it precisely folds the unit interval in half, and stretches it (non-uniformly) to cover the original domain.

The mathematics community lumps together continuous dynamical evolution laws and discrete mappings as both being "dynamical systems". You can motivate the relationship using the Poincaré sections (figure 4.3), which connect a continuous recirculating dynamical system to the once-return map. The mapping 4.11 is not invertible, so it isn't directly given by a Poincaré section of a smooth differential equation,^46 but the general stretching and folding exhibited by our map is often seen in driven physical systems without conservation laws.

In this problem, we will focus on values of μ near one, where the motion is mostly chaotic. Chaos is sometimes defined as motion where the final position depends sensitively on the initial conditions. Two trajectories, starting a distance ε apart, will typically drift apart in time as ε e^{λt}, where λ is the Lyapunov exponent for the chaotic dynamics.

Start with μ = 0.9 and two nearby points x_0 and y_0 = x_0 + ε somewhere between zero and one. Investigate the two trajectories x_0, f(x_0), f(f(x_0)), ..., f^[n](x_0) and y_0, f(y_0), .... How fast do they separate? Estimate the Lyapunov exponent.

Many Hamiltonian systems are also chaotic. Two configurations of classical atoms or billiard balls, with initial positions and velocities that are almost identical, will rapidly diverge as the collisions magnify small initial deviations in angle and velocity into large ones. It is this

46 Remember the existence and uniqueness theorems from math class? The invertibility follows from uniqueness.
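Here is one way to carry out the numerical experiment of exercise 6.12 (the starting point, ε, and iteration counts are our choices; the second estimate uses the standard formula λ = ⟨log |f′(x)|⟩, which the exercise does not ask for but which provides a cross-check):

```python
import math

mu = 0.9
f = lambda x: 4 * mu * x * (1 - x)

# Two nearby trajectories: separation should grow roughly as eps * exp(lambda * n)
x, y = 0.3, 0.3 + 1e-9
logsep = []
for n in range(25):
    x, y = f(x), f(y)
    logsep.append(math.log(abs(y - x)))

# Least-squares slope of log(separation) versus iteration number = Lyapunov estimate
ns = range(25)
nbar = sum(ns) / 25.0
sbar = sum(logsep) / 25.0
lam = (sum((n - nbar) * (s - sbar) for n, s in zip(ns, logsep))
       / sum((n - nbar) ** 2 for n in ns))
print("two-trajectory estimate:", lam)

# Cross-check: average log |f'(x)| = log |4*mu*(1 - 2x)| along a long trajectory
x = 0.3
for _ in range(1000):          # discard the transient
    x = f(x)
acc = 0.0
for _ in range(10000):
    acc += math.log(abs(4 * mu * (1 - 2 * x)))
    x = f(x)
print("<log|f'|> estimate:", acc / 10000)
```

Both estimates should come out positive (chaos) and of comparable size; the first fluctuates more because it averages over only 25 iterations.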

chaos that stretches, folds, and kneads phase space (as in the Poincaré cat map of exercise 6.5) that is at root our explanation that entropy increases.^47

(6.13) Black Hole Thermodynamics. (Astrophysics)

Astrophysicists have long studied black holes: the end state of massive stars which are too heavy to support themselves under gravity (see exercise 7.14). As the matter continues to fall into the center, eventually the escape velocity reaches the speed of light. After this point, the in-falling matter cannot ever communicate information back to the outside. A black hole of mass M has radius^48

    R_s = 2 G M / c^2,    (6.53)

where G = 6.67 × 10^-8 cm^3/(g s^2) is the gravitational constant, and c = 3 × 10^10 cm/s is the speed of light.

Hawking, by combining methods from quantum mechanics and general relativity, calculated the emission of radiation from a black hole.^49 He found a wonderful result: black holes emit perfect blackbody radiation at a temperature

    T_bh = ħ c^3 / (8 π G M k_B).    (6.54)

According to Einstein's theory, the energy of the black hole is E = M c^2.

(a) Calculate the specific heat of the black hole.

The specific heat of a black hole is negative. That is, it gets cooler as you add energy to it. In a bulk material, this would lead to an instability: the cold regions would suck in more heat and get colder. Indeed, a population of black holes is unstable: the larger ones will eat the smaller ones.^50

(b) Calculate the entropy of the black hole, by using the definition of temperature 1/T = ∂S/∂E and assuming the entropy is zero at mass M = 0. Express your result in terms of the surface area A = 4 π R_s^2, measured in units of the Planck length L_P = (ħ G / c^3)^{1/2} squared.

As it happens, Bekenstein had deduced this formula for the entropy somewhat earlier, by thinking about analogies between thermodynamics, information theory, and statistical mechanics. On the one hand, when black holes interact or change charge and angular momentum, one can prove in classical general relativity that the area can only increase. So it made sense to assume that the entropy was somehow proportional to the area. He then recognized that if you had some waste material of high entropy to dispose of, you could ship it into a black hole and never worry about it again. Indeed, given that the entropy represents your lack of knowledge about a system, once matter goes into a black hole one can say that our knowledge about it completely vanishes.^51 (More specifically, the entropy of a black hole represents the inaccessibility of all information about what it was built out of.) By carefully dropping various physical systems into a black hole (theoretically) and measuring the area increase compared to the entropy increase,^52 he was able to deduce these formulas purely from statistical mechanics.

We can use these results to provide a fundamental bound on memory storage.

(c) Calculate the maximum number of bits that can be stored in a sphere of radius one centimeter.

(6.14) Fractal Dimensions. (Math, Complexity) (With Myers. [72])

There are many strange sets that emerge in science. In statistical mechanics, such sets often arise at continuous phase transitions, where self-similar spatial structures arise (chapter 13). In chaotic dynamical systems, the attractor (the set of points occupied at long times after

47 There have been speculations by some physicists that entropy increases through information dropping into black holes, either real ones or tiny virtual black-hole fluctuations (see exercise 6.13). Recent work has cast doubt that the information is really lost even then: we're told it's just scrambled, presumably much as in chaotic systems.

48 This is the Schwarzschild radius of the event horizon for a black hole with no charge or angular momentum.

49 Nothing can leave a black hole: the radiation comes from vacuum fluctuations.

50 A thermally insulated glass of ice water also has a negative specific heat! The surface tension at the curved ice surface will decrease the coexistence temperature a slight amount (see section 12.2): the more heat one adds, the smaller the ice cube, the larger the curvature, and the lower the resulting temperature!

51 Except for the mass, angular momentum, and charge. This suggests that baryon number, for example, isn't conserved in quantum gravity. It has been commented that when the baryons all disappear, it'll be hard for Dyson to build his progeny out of electrons and neutrinos: see 6.1.

52 In ways that are perhaps too complex to do here.

© James P. Sethna, January 4, 2005    Entropy, Order Parameters, and Complexity
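Plugging numbers into equations 6.53 and 6.54 is a short calculation. The sketch below works in the CGS units of the text; ħ, k_B, and the solar mass are constants we have added, and the closed forms for C and S come from differentiating and integrating with E = M c^2 (our working, which is the expected route through parts (a)-(c), not quoted from the text):

```python
import math

G = 6.67e-8        # cm^3 / (g s^2), as quoted in the text
c = 3.0e10         # cm / s, as quoted in the text
hbar = 1.0546e-27  # erg s (not quoted in the text; needed for T_bh and L_P)
kB = 1.381e-16     # erg / K
M_sun = 1.989e33   # g

def T_bh(M):
    """Hawking temperature, equation 6.54."""
    return hbar * c**3 / (8 * math.pi * G * M * kB)

# (a) E = M c^2 and T ~ 1/M give a negative specific heat:
#     C = dE/dT = (dE/dM) / (dT/dM) = -8 pi G kB M^2 / (hbar c)
C = -8 * math.pi * G * kB * M_sun**2 / (hbar * c)
print("solar-mass hole: T = %.2e K, C = %.2e erg/K" % (T_bh(M_sun), C))

# (b) integrating dS = dE/T from M = 0: S = 4 pi G kB M^2/(hbar c) = kB A/(4 L_P^2)
# (c) maximum bits in a 1 cm sphere: S_max / (kB ln 2)
L_P2 = hbar * G / c**3                  # Planck length squared, in cm^2
A = 4 * math.pi * 1.0**2                # surface area of a 1 cm sphere, cm^2
bits = A / (4 * L_P2 * math.log(2))
print("maximum storage: %.2e bits" % bits)
```

A solar-mass black hole comes out far colder than the cosmic microwave background, and the one-centimeter sphere bounds storage at roughly 10^66 bits.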

the transients have disappeared) is often a fractal (called a strange attractor). These sets often are tenuous and jagged, with holes on all length scales: see figures 13.2, 13.3, and 13.14.

We often try to characterize these strange sets by a dimension. The dimensions of two extremely different sets can be the same: the path exhibited by a random walk (embedded in three or more dimensions) is arguably a two-dimensional set (note 6 on page 15), but does not locally look like a surface! However, if two sets have different spatial dimensions (measured in the same way), they surely are qualitatively different.

There is more than one way to define a dimension. Roughly speaking, strange sets are often spatially inhomogeneous, and what dimension you measure depends upon how you weight different regions of the set. In this exercise, we will calculate the information dimension (closely connected to the non-equilibrium entropy!) and the capacity dimension (originally called the Hausdorff dimension, also sometimes called the fractal dimension).

To generate our strange set, along with some more ordinary sets, we will use the logistic map

    f(x) = 4 μ x (1 - x)    (6.55)

that we also study in exercises 6.12, 4.3, and 13.8. The attractor for the logistic map is a periodic orbit (dimension zero) at μ = 0.8, and a chaotic attractor filling two intervals (dimension one)^54 at μ = 0.9. At the onset of chaos at μ = μ∞ ≈ 0.892486418 (exercise 13.8), the dimension becomes intermediate between zero and one: the attractor is a strange, self-similar set.

Both the information dimension and the capacity dimension are defined in terms of the occupation P_n of cells of size ε in the limit ε → 0.

(a) Write a routine which, given μ and a set of bin sizes ε,
- iterates f hundreds or thousands of times (to get onto the attractor),
- iterates f many more times, collecting points on the attractor (for μ ≤ μ∞, you could just iterate 2^n times for n fairly large),
- for each ε, uses a histogram to calculate the probability P_n that the points fall in the nth bin, and
- returns the set of vectors P_n[ε].

You may wish to test your routine by using it for μ = 1 (where the distribution should look like ρ(x) = 1/(π sqrt(x(1-x))), exercise 4.3(b)) and μ = 0.8 (where the distribution should look like two δ-functions, each with half of the points).

The Capacity Dimension. The definition of the capacity dimension is motivated by the idea that it takes at least

    N_cover = V / ε^D    (6.56)

bins of size ε^D to cover a D-dimensional set of volume V.^55 By taking logs of both sides we find log N_cover ≈ log V - D log ε. The capacity dimension is defined as the limit

    D_capacity = - lim_{ε→0} log N_cover / log ε,    (6.57)

but the convergence is slow (the error goes roughly as log V / log ε). Faster convergence is given by calculating the slope of log N versus log ε:

    D_capacity = - lim_{ε→0} d log N_cover / d log ε    (6.58)
               = - lim_{ε→0} (log N_{i+1} - log N_i) / (log ε_{i+1} - log ε_i).

(b) Using your routine from part (a), write a routine to calculate N[ε] by counting non-empty bins. Plot D_capacity from the fast-convergence equation 6.58 versus the midpoint (log ε_{i+1} + log ε_i)/2. Does it appear to extrapolate to D = 1 for μ = 0.9?^56 Does it appear to extrapolate to D = 0 for μ = 0.8? Plot these two curves together with the curve for μ∞. Does the last one appear to converge to D ≈ 0.538, the capacity dimension for the Feigenbaum attractor gleaned from the literature? How small a deviation from μ∞ does it take to see the numerical crossover to integer dimensions?

Entropy and the Information Dimension. The entropy of a statistical mechanical system is given by equation 6.22, S = -k_B Tr(ρ log ρ). In the chaotic regime this works fine. Our probabilities P_n ≈ ρ(x_n) ε, so converting

54 See exercise 4.3. The chaotic region for the logistic map isn't a strange attractor because it's confined to one dimension: period-doubling cascades for dynamical systems in higher spatial dimensions likely will have fractal, strange attractors in the chaotic region.

55 Imagine covering the surface of a sphere in 3D with tiny cubes: the number of cubes needed grows as the area divided by ε^2, giving D = 2.

56 In the chaotic regions, keep the number of bins small compared to the number of iterates in your sample, or you start finding empty bins between points and eventually get a dimension of zero.
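A sketch of parts (a) and (b) of exercise 6.14 follows; the iteration counts, bin sizes, and starting point x0 = 0.3 are our choices, not prescribed by the text:

```python
import math

def attractor_points(mu, n_transient=1000, n_points=100000, x0=0.3):
    """Iterate f(x) = 4*mu*x*(1-x), discard the transient, return trajectory points."""
    x = x0
    for _ in range(n_transient):
        x = 4 * mu * x * (1 - x)
    pts = []
    for _ in range(n_points):
        x = 4 * mu * x * (1 - x)
        pts.append(x)
    return pts

def n_cover(points, nbins):
    """Count non-empty bins of size eps = 1/nbins covering (0, 1)."""
    return len({min(int(p * nbins), nbins - 1) for p in points})

pts = attractor_points(0.9)
for k in range(3, 9):
    n1, n2 = n_cover(pts, 2**k), n_cover(pts, 2**(k + 1))
    # equation 6.58: slope of log N_cover against log eps (eps halves each step)
    D = (math.log(n2) - math.log(n1)) / math.log(2)
    print("eps = 2^-%d .. 2^-%d: D_capacity ~ %.3f" % (k, k + 1, D))
```

At μ = 0.9 the estimates should drift toward D = 1 (the two chaotic bands fill intervals), while at μ = 0.8 the period-2 orbit occupies only two bins at any resolution, giving D = 0.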

the entropy integral into a sum, ∫ f(x) dx ≈ Σ_n f(x_n) ε, gives

    S = -k_B ∫ ρ(x) log(ρ(x)) dx    (6.59)
      ≈ -Σ_n P_n log(P_n/ε) = -Σ_n P_n log P_n + log ε

(setting the conversion factor k_B = 1 for convenience).

You might imagine that the entropy for a fixed point would be zero, and the entropy for a period-n cycle would be k_B log n. But this is incorrect: when there is a fixed point or a periodic limit cycle, the attractor is on a set of dimension zero (a bunch of points) rather than dimension one. The entropy must go to minus infinity, since we have precise information about where the trajectory sits at long times. To estimate the zero-dimensional entropy k_B log n on the computer, we would take the same bins as above but sum over bins P_n instead of integrating over x:

    S_{d=0} = -Σ_n P_n log(P_n) = S_{d=1} - log(ε).    (6.60)

More generally, the natural measure of the entropy for a set with D dimensions might be defined as

    S_D = -Σ_n P_n log(P_n) + D log(ε).    (6.61)

Instead of using this formula to define the entropy, mathematicians use it to define the information dimension

    D_inf = lim_{ε→0} (Σ_n P_n log P_n) / log(ε).    (6.62)

The information dimension agrees with the ordinary dimension for sets that locally look like R^D. It's different from the capacity dimension because the information dimension weights each part (bin) of the attractor by the time spent in it. Again, we can speed up the convergence by noting that equation 6.61 says that Σ_n P_n log P_n is a linear function of log ε with slope D and intercept -S_D. Measuring the slope directly, we find

    D_inf = lim_{ε→0} d(Σ_n P_n(ε) log P_n(ε)) / d log ε.    (6.63)

(c) As in part (b), write a routine that plots D_inf from equation 6.63 as a function of the midpoint log ε, as we increase the number of bins. Plot the curves for μ = 0.9, μ = 0.8, and μ∞. Does the information dimension agree with the ordinary one for the first two? Does the last one appear to converge to D_1 ≈ 0.517098, the information dimension for the Feigenbaum attractor from the literature?

(Most real-world fractals have a whole spectrum of different characteristic spatial dimensions: they are multifractal.)
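The same machinery gives the information dimension of part (c); the sample sizes and bin counts below are again our choices:

```python
import math

def logistic_points(mu, n_transient=1000, n_points=200000, x0=0.3):
    x = x0
    for _ in range(n_transient):
        x = 4 * mu * x * (1 - x)
    pts = []
    for _ in range(n_points):
        x = 4 * mu * x * (1 - x)
        pts.append(x)
    return pts

def plogp(points, nbins):
    """sum_n P_n log P_n for bins of size eps = 1/nbins (the first term in 6.61)."""
    counts = {}
    for p in points:
        b = min(int(p * nbins), nbins - 1)
        counts[b] = counts.get(b, 0) + 1
    N = len(points)
    return sum((c / N) * math.log(c / N) for c in counts.values())

def d_inf(points, nb1, nb2):
    """Equation 6.63: slope of sum P log P versus log eps, with log eps = -log nbins."""
    return (plogp(points, nb2) - plogp(points, nb1)) / (math.log(nb1) - math.log(nb2))

print("D_inf, mu = 0.9:", d_inf(logistic_points(0.9), 256, 512))  # should head toward 1
print("D_inf, mu = 0.8:", d_inf(logistic_points(0.8), 256, 512))  # period 2: toward 0
```

Because the period-2 orbit at μ = 0.8 puts probability 1/2 in each of two bins at any resolution, Σ P_n log P_n is independent of ε there and the slope comes out exactly zero.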

Chapter 7: Quantum Statistical Mechanics

In this chapter, we introduce the statistical mechanics of quantum systems. Logically, we proceed from the abstract to the concrete, through a series of simplifications. We begin [7.1] by introducing density matrices, which allow us to incorporate our ensembles into quantum mechanics: here we discover the simplification that equilibrium ensembles have density matrices that are diagonal in the energy basis. This reduces equilibrium statistical mechanics to simple sums over energy eigenstates, which we illustrate [7.2] by solving the finite-temperature quantum harmonic oscillator. We then discuss the statistical mechanics of identical particles [7.3]. We then make the vast simplification of presuming that the particles are non-interacting [7.4], which leads us to the Bose-Einstein and Fermi distributions for the filling of single-particle eigenstates. We briefly relate Bose, Fermi, and Maxwell-Boltzmann statistics [7.5]. We illustrate how amazingly useful the non-interacting particle picture is for quantum systems by solving the classic problems of black-body radiation and Bose condensation for bosons [7.6], and the behavior of simple metals for fermions [7.7].

Sections 7.1 and 7.5 logically belong here, but discuss issues at more depth than is required for the rest of the text. It is suggested that one skim or skip portions of these sections on first reading, and return to the abstractions later, after gaining a broad view of what quantum statistical mechanics predicts in sections 7.6 and 7.7.

7.1 Quantum Ensembles and Density Matrices

How do we generalize the classical ensembles, described by probability densities ρ(P, Q) in phase space, to quantum mechanics? Two problems immediately arise. First, the Heisenberg uncertainty principle tells us that one cannot specify both position and momentum for a quantum system at the same time. The states of our quantum system will not be points in phase space. Second, quantum mechanics already has probability densities: even for systems in a definite state^1 Ψ(Q), the probability is spread among different configurations |Ψ(Q)|^2 (or momenta |Ψ̃(P)|^2). In statistical mechanics, we need to introduce a second level of probability, to discuss an ensemble that has probabilities ρ_n of being in a variety

1 Quantum systems with many particles have wavefunctions that are functions of all the positions of all the particles (or, in momentum space, all the momenta of all the particles).

of states Ψ_n. Such ensembles are called mixed states: they are not superpositions of different wave functions, but incoherent mixtures.^2

Suppose we want to compute the ensemble expectation of an operator A. In a particular state Ψ_n, the quantum expectation is

    ⟨A⟩_pure = ∫ Ψ_n*(Q) A Ψ_n(Q) d^{3N}Q.    (7.1)

So, in the ensemble the expectation is

    ⟨A⟩ = Σ_n ρ_n ∫ Ψ_n*(Q) A Ψ_n(Q) d^{3N}Q.    (7.2)

For most purposes, this is enough! Except for selected exercises in this chapter, one or two problems in the rest of the book, and occasional specialized seminars, formulating the ensemble as a sum over states Ψ_n with probabilities ρ_n is perfectly satisfactory. Indeed, for all of the equilibrium ensembles, the Ψ_n may be taken to be the energy eigenstates, and the ρ_n either a constant in a small energy range (for the microcanonical ensemble), or exp(-βE_n)/Z (for the canonical ensemble), or exp(-β(E_n - μN_n))/Ξ (for the grand canonical ensemble). For most practical purposes you may stop reading here.

Why, then, go beyond this simple picture? First, there are lots of mixed states that are not mixtures of energy eigenstates. Mixtures of energy eigenstates have time-independent properties, so any time-dependent ensemble will be in this class. Second, although one can define the ensemble in terms of a set of states Ψ_n, the ensemble should be something one can look at in a variety of bases. Indeed, superfluids and superconductors show an energy gap when viewed in the energy basis, but show an exotic off-diagonal long-range order when looked at in position space. Third, we will see that the proper generalization of Liouville's theorem demands the more elegant, operator-based approach.

Our goal is to avoid carrying around the particular states Ψ_n, writing the ensemble average [7.2] in terms of A and some operator ρ, which will be the density matrix. For this section, it is convenient to use Dirac's bra-ket notation, in which the ensemble average can be written^3

    ⟨A⟩ = Σ_n ρ_n ⟨Ψ_n|A|Ψ_n⟩.    (7.3)

Pick any complete orthonormal basis Φ_α. Then the identity operator is

    1 = Σ_α |Φ_α⟩⟨Φ_α|.    (7.4)

2 So, for example, if |R⟩ is a right-circularly polarized photon, and |L⟩ is a left-circularly polarized photon, then the superposition (1/sqrt(2))(|R⟩ + |L⟩) is a linearly polarized photon, while the mixture (1/2)(|R⟩⟨R| + |L⟩⟨L|) is an unpolarized photon. The superposition is in both states; the mixture is in perhaps one or perhaps the other. See problem 7.5(a).

3 In Dirac's notation, ⟨Ψ|M|Φ⟩ = ∫ Ψ* M Φ. It is particularly useful when expressing operators in a basis Φ_m; if the matrix elements are M_ij = ⟨Φ_i|M|Φ_j⟩, then the operator itself can be written M = Σ_ij M_ij |Φ_i⟩⟨Φ_j|.
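Footnote 2's superposition-versus-mixture distinction is easy to make concrete numerically. In the sketch below (our illustration, not the text's), the observable A projects onto the linear polarization; the superposition always passes, while the mixture passes half the time:

```python
import numpy as np

R = np.array([1.0, 0.0])            # right-circularly polarized |R>
L = np.array([0.0, 1.0])            # left-circularly polarized  |L>

psi = (R + L) / np.sqrt(2)                               # superposition: linear polarization
rho_pure  = np.outer(psi, psi)                           # |psi><psi|
rho_mixed = 0.5 * np.outer(R, R) + 0.5 * np.outer(L, L)  # incoherent mixture: unpolarized

A = np.outer(psi, psi)              # projector onto the linear polarization
print(np.trace(A @ rho_pure))       # ~1: the superposition always passes
print(np.trace(A @ rho_mixed))      # ~0.5: unpolarized light passes half the time

# A pure state satisfies rho^2 = rho; the mixture does not:
print(np.allclose(rho_pure @ rho_pure, rho_pure))        # True
print(np.allclose(rho_mixed @ rho_mixed, rho_mixed))     # False
```

The two density matrices have identical diagonals in the circular basis, so only an observable with off-diagonal matrix elements (like this polarizer) can tell them apart.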

Substituting this identity into equation 7.3 gives

    ⟨A⟩ = Σ_n ρ_n ⟨Ψ_n| (Σ_α |Φ_α⟩⟨Φ_α|) A |Ψ_n⟩
        = Σ_α Σ_n ρ_n ⟨Φ_α|A|Ψ_n⟩⟨Ψ_n|Φ_α⟩
        = Σ_α ⟨Φ_α|A (Σ_n ρ_n |Ψ_n⟩⟨Ψ_n|) |Φ_α⟩
        = Tr(Aρ),    (7.5)

where^4

    ρ = Σ_n ρ_n |Ψ_n⟩⟨Ψ_n|    (7.6)

is the density matrix.

Some conclusions we can draw about the density matrix:

- Sufficiency. In quantum mechanics, the measurement processes involve expectation values of operators. Our density matrix therefore suffices to embody everything we need to know about our quantum system.

- Pure states. A pure state, with a definite wavefunction Ψ, has ρ_pure = |Ψ⟩⟨Ψ|. In the position basis |Q⟩, this pure-state density matrix has matrix elements ρ_pure(Q, Q') = ⟨Q'|ρ_pure|Q⟩ = Ψ*(Q')Ψ(Q). Thus in particular we can reconstruct the wavefunction, up to an overall constant, by fixing one value of Q' and varying Q. Since the wavefunction is normalized, we can reconstruct Ψ up to an overall phase, which isn't physically measurable: this again confirms the sufficiency of the density matrix to describe our system. Since our wavefunction is normalized, ⟨Ψ|Ψ⟩ = 1, one notes also that the square of the density matrix for a pure state equals itself: ρ_pure^2 = (|Ψ⟩⟨Ψ|)(|Ψ⟩⟨Ψ|) = ρ_pure.

- Normalization. The trace of a pure-state density matrix is Tr ρ_pure = 1, since we can pick an orthonormal basis with our wavefunction as the first basis element, making the first term in the trace sum one and the others zero. The trace of a general density matrix is hence also one, since it is a sum of pure-state density matrices:

    Tr ρ = Tr(Σ_n ρ_n |Ψ_n⟩⟨Ψ_n|) = Σ_n ρ_n Tr(|Ψ_n⟩⟨Ψ_n|) = Σ_n ρ_n = 1.    (7.7)

- Canonical Distribution. The canonical distribution can be written in terms of the Hamiltonian operator H as^5

    ρ_canon = exp(-βH)/Z = exp(-βH)/Tr exp(-βH).    (7.9)

4 The trace of a matrix is the sum of its diagonal elements, and is independent of what basis you write it in. The same is true of operators: we are summing the diagonal elements Tr(M) = Σ_α ⟨Φ_α|M|Φ_α⟩.

We can evaluate ρ_canon in the energy basis:

    ⟨E_m|ρ_canon|E_n⟩ = ⟨E_m| e^{-βH} |E_n⟩ / Z
                      = ⟨E_m| e^{-βE_n} |E_n⟩ / Z
                      = e^{-βE_n} ⟨E_m|E_n⟩ / Z
                      = e^{-βE_n} δ_mn / Z,    (7.10)

so

    ρ_canon = Σ_n (exp(-βE_n)/Z) |E_n⟩⟨E_n|:    (7.11)

a mixture of energy eigenstates, just as one would expect. Notice that the states Ψ_n mixed to make a general density matrix are not in general eigenstates, or even orthogonal. For equilibrium statistical mechanics, though, life is simple: the Ψ_n can be chosen to be energy eigenstates, and the density matrix is diagonal in that basis.

- Entropy. The entropy for a general density matrix will be

    S = -k_B Tr(ρ log ρ).    (7.12)

- Time evolution. The time evolution for the density matrix is determined by the time evolution of the pure states composing it:^6

    ∂ρ/∂t = Σ_n ρ_n [ (∂|Ψ_n⟩/∂t) ⟨Ψ_n| + |Ψ_n⟩ (∂⟨Ψ_n|/∂t) ].    (7.13)

The time evolution of the ket wavefunction |Ψ_n⟩ is given by operating on it with the Hamiltonian,

    ∂|Ψ_n⟩/∂t = (1/iħ) H |Ψ_n⟩,    (7.14)

and the time evolution of the bra wavefunction ⟨Ψ_n| is given by the time evolution of Ψ_n*(Q):

    ∂Ψ_n*/∂t = (∂Ψ_n/∂t)* = ((1/iħ) H Ψ_n)* = -(1/iħ) (H Ψ_n)*,    (7.15)

5 To apply a function f to a matrix or operator (such as exp(-βH)), work in the basis where the matrix is diagonal:

    f(ρ) = ( f(ρ_11)    0        0     ...
               0      f(ρ_22)    0     ...
              ...                          )    (7.8)

At the end, change back to the original basis.

6 The ρ_n are the probabilities that one started in the state Ψ_n, and thus clearly don't change with time.
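Equations 7.7 and 7.9-7.12 can all be checked in a few lines, using footnote 5's recipe (diagonalize, apply the function, change back); the random four-state Hamiltonian below is, of course, our stand-in for a real system:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
H = (M + M.T) / 2              # a random Hermitian "Hamiltonian" (our stand-in)
beta = 1.3                     # units with hbar = kB = 1

# Footnote 5's recipe for exp(-beta H): diagonalize, exponentiate, transform back
E, U = np.linalg.eigh(H)
rho = U @ np.diag(np.exp(-beta * E)) @ U.T
rho /= np.trace(rho)           # equation 7.9: divide by Z = Tr exp(-beta H)

print(np.isclose(np.trace(rho), 1.0))      # normalization, equation 7.7

# Equations 7.10-7.11: in the energy basis rho is diagonal, entries exp(-beta E_n)/Z
p = np.exp(-beta * E) / np.exp(-beta * E).sum()
print(np.allclose(U.T @ rho @ U, np.diag(p)))

# Equation 7.12: S = -kB Tr(rho log rho) reduces to -sum p_n log p_n
S = -np.sum(p * np.log(p))
print("S/kB =", S, " (between 0 and log 4 =", np.log(4.0), ")")
```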

so, since H is Hermitian,

    ∂⟨Ψ_n|/∂t = -(1/iħ) ⟨Ψ_n| H.    (7.16)

Hence

    ∂ρ/∂t = Σ_n ρ_n (1/iħ) (H |Ψ_n⟩⟨Ψ_n| - |Ψ_n⟩⟨Ψ_n| H) = (1/iħ)(Hρ - ρH)
          = (1/iħ) [H, ρ].    (7.17)

- Quantum Liouville Equation. This time evolution law is the quantum version of Liouville's theorem. We can see this by using the equations of motion 4.1, q̇_α = ∂H/∂p_α and ṗ_α = -∂H/∂q_α, and the definition of Poisson brackets,

    {A, B}_P = Σ_α (∂A/∂q_α ∂B/∂p_α - ∂A/∂p_α ∂B/∂q_α),    (7.18)

to rewrite Liouville's theorem, that the total time derivative is zero [4.7], into a statement about the partial time derivative:

    0 = dρ/dt = ∂ρ/∂t + Σ_α (q̇_α ∂ρ/∂q_α + ṗ_α ∂ρ/∂p_α)
              = ∂ρ/∂t + Σ_α (∂H/∂p_α ∂ρ/∂q_α - ∂H/∂q_α ∂ρ/∂p_α),    (7.19)

so

    ∂ρ/∂t = {H, ρ}_P.    (7.20)

Using the classical-quantum correspondence between the Poisson brackets and the commutator, { , }_P ↔ (1/iħ)[ , ], the time evolution law 7.17 is precisely the analogue of Liouville's theorem 7.20.

- Quantum Liouville and Statistical Mechanics. The quantum version of Liouville's equation is not nearly as compelling an argument for statistical mechanics as was the classical version.

The classical theorem, you remember, stated that dρ/dt = 0. Any equilibrium state must be time independent, ∂ρ/∂t = 0, so this implied that such a state must have ρ constant along the trajectories. If the trajectory covers the energy surface (ergodicity), then the probability density had to be constant on the energy surface, justifying the microcanonical ensemble.

For an isolated quantum system, this argument breaks down. The condition that an equilibrium state must be time independent isn't very stringent! Indeed, ∂ρ/∂t = [H, ρ] = 0 for any mixture of many-body energy eigenstates!

In principle, isolated quantum systems are very non-ergodic, and one must couple them to the outside world to induce transitions between the many-body eigenstates to lead to equilibrium. This becomes much less of a concern when one realizes just how peculiar a many-body eigenstate of a large system really is! Consider
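The quantum Liouville law 7.17 can be verified directly: evolve each |Ψ_n⟩ with exp(-iHt) and compare the finite-difference time derivative of ρ(t) with the commutator. The random three-state example below (ħ = 1) is our construction:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (A + A.conj().T) / 2                      # Hermitian Hamiltonian, hbar = 1

# A mixed state: probabilities rho_n over three orthonormal states |psi_n>
p = np.array([0.5, 0.3, 0.2])
psis = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))[0]
rho0 = sum(p[n] * np.outer(psis[:, n], psis[:, n].conj()) for n in range(3))

E, U = np.linalg.eigh(H)
def rho_t(t):
    # each |psi_n> evolves by exp(-iHt), so rho(t) = exp(-iHt) rho(0) exp(+iHt)
    phase = U @ np.diag(np.exp(-1j * E * t)) @ U.conj().T
    return phase @ rho0 @ phase.conj().T

dt = 1e-6
drho_dt = (rho_t(dt) - rho_t(-dt)) / (2 * dt)  # numerical d(rho)/dt at t = 0
commutator = (H @ rho0 - rho0 @ H) / 1j        # (1/i hbar)[H, rho], equation 7.17
print(np.allclose(drho_dt, commutator, atol=1e-6))   # True
```

Note that if rho0 were instead built from eigenstates of H, the commutator would vanish, illustrating the point above that any mixture of energy eigenstates is stationary.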

114 Quantum Statistical Mechanics

think of the atom in an energy eigenstate, which decays after some

time into a ground state atom plus some photons. The true eigen-

states of the system, however, are weird delicate superpositions

of states with photons being absorbed by the atom and the atom

emitting photons, carefully crafted to produce a stationary state.

When one starts including more atoms and other interactions, the

7

The low-lying manybody excitations true manybody eigenstates7 that we formally sum over in pro-

above the ground state are an exception ducing our ensembles are pretty useless things to work with in

to this.

most cases.

The quantum harmonic oscillator is a great example of how statistical mechanics works in quantum systems. Consider a harmonic oscillator of frequency ω. The energy eigenvalues are E_n = (n + ½)ħω (figure 7.1). Hence the canonical ensemble for the quantum harmonic oscillator at temperature T = 1/(k_B β) is a geometric series ∑ xⁿ, which we can sum to 1/(1 − x):

Z_qho = ∑_{n=0}^∞ e^{−βE_n} = ∑_{n=0}^∞ e^{−β(n+½)ħω} = e^{−βħω/2} ∑_{n=0}^∞ (e^{−βħω})ⁿ
      = e^{−βħω/2}/(1 − e^{−βħω}) = 1/(e^{βħω/2} − e^{−βħω/2}) = 1/(2 sinh(βħω/2)).   (7.21)

Fig. 7.1 The quantum states of the harmonic oscillator are at equally spaced energies.

The average energy is

⟨E⟩_qho = −∂ log Z_qho/∂β = ½ħω + ħω e^{−βħω}/(1 − e^{−βħω}) = ħω (½ + 1/(e^{βħω} − 1)),   (7.22)

which corresponds to an average excitation level

⟨n⟩_qho = 1/(e^{βħω} − 1).   (7.23)

The specific heat is thus

c_V = ∂⟨E⟩/∂T = k_B (ħω/k_B T)² e^{−ħω/k_B T}/(1 − e^{−ħω/k_B T})².   (7.24)

Fig. 7.2 The specific heat c_V for the quantum harmonic oscillator, as a function of temperature k_B T/ħω.

High temperatures. As T → ∞, e^{−ħω/k_B T} ≈ 1 − ħω/k_B T and c_V → k_B, the value we found for the classical harmonic oscillator (and as given by the equipartition theorem).
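As a quick numerical check (not part of the text), equation 7.24 can be evaluated in both limits. The sketch below works in units where k_B = ħω = 1, so the high-temperature specific heat should approach 1:

```python
import math

def cv_qho(T, hbar_omega=1.0, kB=1.0):
    """Specific heat of one quantum harmonic oscillator, equation 7.24."""
    x = hbar_omega / (kB * T)   # dimensionless ratio hbar*omega / kB*T
    return kB * x**2 * math.exp(-x) / (1.0 - math.exp(-x))**2

# High temperatures: cV approaches the classical equipartition value kB.
print(cv_qho(100.0))
# Low temperatures: cV is exponentially suppressed by the energy gap.
print(cv_qho(0.05))
```

The corrections to the high-temperature limit are of order (ħω/k_B T)², so even moderate temperatures come close to equipartition.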

Low temperatures. As T → 0, e^{−ħω/k_B T} becomes exponentially small, so the specific heat goes rapidly to zero as the energy asymptotes to the zero-point energy ½ħω. More specifically, there is an energy gap⁸ ħω to the first excitation, so the probability of having any excitation of the system is suppressed by a factor of e^{−ħω/k_B T}.

To be published by Oxford UP, Fall '05; www.physics.cornell.edu/sethna/StatMech/

7.3 Bose and Fermi Statistics

In quantum mechanics, indistinguishable particles are not just hard to tell apart: their quantum wavefunctions must be the same, up to an overall phase change,⁹ when the coordinates are swapped. In particular, for bosons¹⁰ the wavefunction is unchanged under a swap, so

Ψ(r₁, r₂, ..., r_N) = Ψ(r₂, r₁, ..., r_N) = Ψ(r_{P₁}, r_{P₂}, ..., r_{P_N})   (7.25)

for any permutation P of the integers 1, ..., N.¹¹ For fermions,¹²

Ψ(r₁, r₂, ..., r_N) = −Ψ(r₂, r₁, ..., r_N) = σ(P) Ψ(r_{P₁}, r_{P₂}, ..., r_{P_N}).   (7.26)

The eigenstates for systems of identical fermions and bosons are a subset of the eigenstates of distinguishable particles with the same Hamiltonian,

H Ψ_n = E_n Ψ_n;   (7.27)

in particular, they are given by the distinguishable eigenstates which obey the proper symmetry properties under permutations. A non-symmetric eigenstate Ψ with energy E may be symmetrized to form a Bose eigenstate,

Ψ_sym(r₁, ..., r_N) = (normalization) ∑_P Ψ(r_{P₁}, ..., r_{P_N}),   (7.28)

or antisymmetrized to form a fermion eigenstate,

Ψ_asym(r₁, ..., r_N) = (normalization) ∑_P σ(P) Ψ(r_{P₁}, ..., r_{P_N}),   (7.29)

if the symmetrization or antisymmetrization does not make the sum zero. These remain eigenstates of energy E, because they are sums of eigenstates of energy E.

Quantum statistical mechanics for identical particles is given by restricting the ensembles to sum over symmetric wavefunctions for bosons (or antisymmetric wavefunctions for fermions). So, for example, the partition function for the canonical ensemble is still

Z = Tr e^{−βH} = ∑_n e^{−βE_n},   (7.30)

but now the trace is over a complete set of many-body symmetric (antisymmetric) states, and the sum is over the symmetric (antisymmetric) many-body energy eigenstates.

⁹ In three dimensions, this phase change must be ±1. In two dimensions one can have any phase change, so one can have not only fermions and bosons but anyons. Anyons, with fractional statistics, arise as excitations in the fractional quantized Hall effect.
¹⁰ Examples of bosons include mesons, He⁴, phonons, photons, gluons, W and Z bosons, and (presumably) gravitons. The last four mediate the fundamental forces: the electromagnetic, strong, weak, and gravitational interactions. The spin-statistics theorem (not discussed here) states that bosons have integer spins. See problem 7.9.
¹¹ A permutation {P₁, P₂, ..., P_N} is just a reordering of the integers {1, 2, ..., N}. The sign σ(P) of a permutation is +1 if P is an even permutation, and −1 if P is an odd permutation. Swapping two labels, keeping all the rest unchanged, is an odd permutation. One can show that composing two permutations multiplies their signs, so odd permutations are composed of odd numbers of pair swaps, and even permutations of even numbers of pair swaps.
¹² Most of the common elementary particles are fermions: electrons, protons, neutrons, neutrinos, quarks, etc. Fermions have half-integer spins. Particles made up of even numbers of fermions are bosons.

⁸ In solid state physics we call this an energy gap: the minimum energy needed to add an excitation to the system. In quantum field theory, where the excitations are particles, the minimum excitation energy is called the mass mc² of the particle.

7.4 Non-Interacting Bosons and Fermions

Many-body quantum statistical mechanics is hard. We now make a huge approximation: we will assume our quantum particles do not interact with one another. Just as for the classical ideal gas, this will make our calculations straightforward.

The non-interacting Hamiltonian is a sum of single-particle Hamiltonians H:

H_NI = ∑_{j=1}^N H(p_j, r_j) = ∑_{j=1}^N [ −(ħ²/2m) ∇_j² + V(r_j) ],   (7.31)

where the single-particle eigenstates satisfy H ψ_k = ε_k ψ_k. An eigenstate of N distinguishable non-interacting particles can then be written as a product of orthonormal single-particle eigenstates,

Ψ^NI_dist(r₁, r₂, ..., r_N) = ∏_{j=1}^N ψ_{k_j}(r_j).   (7.33)

The eigenstates for non-interacting bosons are given by symmetrizing over the coordinates r_j,

Ψ^NI_boson(r₁, r₂, ..., r_N) = (normalization) ∑_P ∏_{j=1}^N ψ_{k_j}(r_{P_j}),   (7.34)

and of course the fermion eigenstates are given by antisymmetrizing:¹³

Ψ^NI_fermion(r₁, r₂, ..., r_N) = (1/√N!) ∑_P σ(P) ∏_{j=1}^N ψ_{k_j}(r_{P_j}).   (7.36)

¹³ This antisymmetrization can be written as a determinant,

Ψ^NI_fermion(r₁, ..., r_N) = (1/√N!) det[ ψ_{k_i}(r_j) ],   (7.35)

whose (i, j) entry is ψ_{k_i}(r_j), with rows running over the occupied states k₁, ..., k_N and columns over the positions r₁, ..., r_N. This is called the Slater determinant.

Let's consider two particles in orthonormal single-particle energy eigenstates ψ_k and ψ_ℓ. If the particles are distinguishable, there are two eigenstates ψ_k(r₁)ψ_ℓ(r₂) and ψ_k(r₂)ψ_ℓ(r₁). If the particles are bosons, the eigenstate is (1/√2)(ψ_k(r₁)ψ_ℓ(r₂) + ψ_k(r₂)ψ_ℓ(r₁)). If the particles are fermions, the eigenstate is (1/√2)(ψ_k(r₁)ψ_ℓ(r₂) − ψ_k(r₂)ψ_ℓ(r₁)).

What if both particles are in the same single-particle eigenstate ψ_ℓ? For bosons, the eigenstate ψ_ℓ(r₁)ψ_ℓ(r₂) is already symmetric and normalized.¹⁴ For fermions, antisymmetrizing a state where both particles are in the same state gives zero: ψ_ℓ(r₁)ψ_ℓ(r₂) − ψ_ℓ(r₂)ψ_ℓ(r₁) = 0. This is the Pauli exclusion principle: you cannot have two fermions in the same quantum state.¹⁵

¹⁴ Notice that the normalization of the boson wavefunction depends on how many single-particle states are multiply occupied. Check this by squaring and integrating the two-particle boson wavefunctions in the two cases.
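The Pauli exclusion result can be checked by brute force. The sketch below (not from the text; the toy one-dimensional "orbitals" stand in for real eigenstates) evaluates the unnormalized permutation sum of equation 7.36 and confirms that it vanishes when a state is doubly occupied:

```python
import math
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation given as a tuple of 0-based indices."""
    sign, seen = 1, set()
    for start in range(len(p)):
        if start in seen:
            continue
        # Follow the cycle containing `start`; a cycle of length L
        # contributes a factor (-1)**(L-1) to the sign.
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        if length % 2 == 0:
            sign = -sign
    return sign

def antisymmetrize(orbitals, positions):
    """Unnormalized fermion wavefunction: sum over permutations P of
    sign(P) * prod_j psi_{k_j}(r_{P_j}), as in equation 7.36."""
    n = len(positions)
    total = 0.0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for j in range(n):
            term *= orbitals[j](positions[p[j]])
        total += term
    return total

psi1 = lambda x: math.sin(x)   # toy orbitals (hypothetical, for illustration)
psi2 = lambda x: math.cos(x)
r = (0.3, 1.1)

print(antisymmetrize([psi1, psi2], r))   # distinct orbitals: nonzero
print(antisymmetrize([psi1, psi1], r))   # same orbital twice: exactly zero
```

For the two distinct orbitals above, the permutation sum reduces to sin(r₁)cos(r₂) − cos(r₁)sin(r₂) = sin(r₁ − r₂), a handy analytic cross-check.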

How do we do statistical mechanics for non-interacting fermions and bosons? Here it is most convenient to use the grand canonical ensemble [5.4], so we can think of each single-particle eigenstate ψ_k as being filled independently from the other eigenstates; the grand partition function hence factors:

Ξ^NI = ∏_k Ξ_k.   (7.37)

The grand canonical ensemble thus allows us to separately solve the problem one eigenstate at a time, for non-interacting particles.

¹⁵ Because the spin of the electron can point in two directions (±1/2), two electrons can be placed into each single-particle spatial eigenstate.

Bosons. For bosons, all fillings n_k are allowed. Each particle in eigenstate k contributes energy ε_k and chemical potential μ, so

Ξ_k^boson = ∑_{n_k=0}^∞ e^{−β(ε_k−μ)n_k} = ∑_{n_k=0}^∞ (e^{−β(ε_k−μ)})^{n_k} = 1/(1 − e^{−β(ε_k−μ)}),   (7.38)

so the boson grand partition function is

Ξ^NI_boson = ∏_k 1/(1 − e^{−β(ε_k−μ)}).   (7.39)

The grand free energy is hence a sum of single-state grand free energies:

Φ^NI_boson = ∑_k Φ_k^boson = ∑_k k_B T log(1 − e^{−β(ε_k−μ)}).   (7.40)

Because the grand free energy factors, we can calculate the expected number of particles in state ψ_k. From equation 5.42,

⟨n_k⟩ = −∂Φ_k^boson/∂μ = e^{−β(ε_k−μ)}/(1 − e^{−β(ε_k−μ)}) = 1/(e^{β(ε_k−μ)} − 1).   (7.41)

This is the Bose-Einstein distribution,

⟨n⟩_BE = 1/(e^{β(ε−μ)} − 1).   (7.42)

Fig. 7.3 Bose-Einstein and Maxwell-Boltzmann distributions ⟨n⟩(ε), plotted against (ε − μ)/k_B T: n_BE of equation 7.42 and n_MB of equation 7.59. The Bose-Einstein distribution diverges as μ approaches ε.

It describes the filling of single-particle eigenstates by non-interacting bosons. At low occupancies, ⟨n⟩_BE ≈ e^{−β(ε−μ)}, and the boson populations correspond to what we would guess naively from the Boltzmann distribution.¹⁶ The condition for low occupancies is ε_k − μ ≫ k_B T, but perversely this often arises at high temperatures, when μ gets large and negative. Notice also that ⟨n⟩_BE → ∞ as μ → ε_k, since the denominator vanishes (and becomes negative for μ > ε_k); systems of non-interacting bosons always have μ less than or equal to the lowest of the single-particle energy eigenvalues.¹⁷

Notice that the average excitation ⟨n⟩_qho of the quantum harmonic oscillator (equation 7.23) is given by the Bose-Einstein distribution (equation 7.42) with μ = 0. We'll use this in problem 7.9 to treat the excitations inside harmonic oscillators (vibrations) as particles obeying Bose statistics (phonons).

¹⁶ We will formally discuss the Maxwell-Boltzmann distribution in [7.5].
¹⁷ When the river level gets up to the height of the fields, your farm gets flooded.

Fermions. For fermions, only n_k = 0 and n_k = 1 are allowed. The single-state fermion grand partition function is

Ξ_k^fermion = ∑_{n_k=0}^1 e^{−β(ε_k−μ)n_k} = 1 + e^{−β(ε_k−μ)},   (7.43)

so the total fermion grand partition function is

Ξ^NI_fermion = ∏_k (1 + e^{−β(ε_k−μ)}).   (7.44)

For summing over only two states, it's hardly worthwhile to work through the grand free energy to calculate the expected number of particles in a state:

⟨n_k⟩ = [∑_{n_k=0}^1 n_k e^{−β(ε_k−μ)n_k}] / [1 + e^{−β(ε_k−μ)}] = e^{−β(ε_k−μ)}/(1 + e^{−β(ε_k−μ)}) = 1/(e^{β(ε_k−μ)} + 1),   (7.45)

leading us to the Fermi-Dirac distribution

f(ε) = ⟨n⟩_FD = 1/(e^{β(ε−μ)} + 1),   (7.46)

where f(ε) is also known as the Fermi function.

Fig. 7.4 The Fermi distribution f(ε) of equation 7.46, plotted against ε/μ. At low temperatures, states below μ are occupied, states above μ are unoccupied, and states within around k_B T of μ are partially occupied.

Again, when the occupancy of state k is low, it is approximately given by the Boltzmann probability distribution, e^{−β(ε−μ)}. Here the chemical potential μ can be either greater than or less than any given eigenenergy ε_k. Indeed, at low temperatures the chemical potential μ separates filled states (ε_k < μ) from empty states (ε_k > μ); only states within roughly k_B T of μ are partially filled.

The chemical potential μ plays a large role in these calculations, and those new to the subject may wonder how one determines it. You will see in the problems that one normally knows the expected number of particles N, and must vary μ until the system reaches that value. Hence μ directly plays the role of a particle pressure from the outside world, which is varied until the system is correctly filled.
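This tuning of μ is easy to sketch numerically. The code below (an illustration, not the book's; it assumes an arbitrary toy spectrum and units with k_B = 1) bisects on μ until the expected fermion number from equation 7.46 matches the target. Because the total occupancy is monotonically increasing in μ, bisection is guaranteed to converge:

```python
import math

def n_fd(eps, mu, T):
    """Fermi-Dirac occupation of a level, equation 7.46 (kB = 1)."""
    return 1.0 / (math.exp((eps - mu) / T) + 1.0)

def total_N(mu, levels, T):
    """Expected total particle number at chemical potential mu."""
    return sum(n_fd(e, mu, T) for e in levels)

def find_mu(N_target, levels, T, lo=-50.0, hi=50.0):
    """Bisect on mu until <N> matches N_target."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_N(mid, levels, T) < N_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical single-particle spectrum
mu = find_mu(2.0, levels, T=0.1)
print(mu, total_N(mu, levels, T=0.1))
```

At this low temperature the resulting μ sits between the second and third levels, separating filled from empty states just as described above.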

The classical ideal gas has been a great illustration of statistical mechanics, and does a good job of describing many gasses, but nobody would suggest that it captures the main features of solids and liquids. The non-interacting approximation in quantum mechanics turns out to be far more powerful, for quite subtle reasons.

For bosons, the non-interacting approximation is quite accurate in three important cases: photons, phonons, and the dilute Bose gas. In [7.6] we'll study two fundamental problems involving non-interacting bosons: black-body radiation and Bose condensation. The behavior of superconductors and superfluids shares some common features with that of the Bose gas.

For fermions, the non-interacting approximation would seem to rarely be useful. Electrons are charged, and the electromagnetic repulsion between the electrons in an atom, molecule, or material would seem to always be a major contribution to the energy. Neutrons interact via the strong interaction, so nuclei and neutron stars would also seem poor candidates for a non-interacting theory. Neutrinos are hard to pack into a box.¹⁸ There are experiments on cold, dilute gasses of fermion atoms,¹⁹ a topic we will not expand on in an introductory course.

¹⁸ Just in case you haven't heard, neutrinos are quite elusive. It is said that if you send neutrinos through a lead shield, more than half will penetrate until the thickness is roughly the distance to the nearest star.
¹⁹ These use the same techniques which led to the observation of Bose condensation.

The truth is that the non-interacting Fermi gas is amazingly important, and describes all of these systems (atoms, metals, insulators, nuclei, and neutron stars) amazingly well. Interacting Fermi systems, under most common circumstances, behave very much like collections of non-interacting fermions in a modified potential.²⁰ The connection is so powerful that in most circumstances we ignore the interactions: whenever we talk about exciting a 1s electron in an oxygen atom, or an electron-hole pair in a semiconductor, we are using this effective non-interacting electron approximation. The explanation for this amazing fact is called Landau Fermi liquid theory, and lies beyond the purview of this text. We will discuss the applications of the Fermi gas to metals in [7.7]; the problems discuss applications to semiconductors [7.13] and stellar collapse [7.14].

7.5 Maxwell Boltzmann Quantum Statistics

In classical statistical mechanics, we treated indistinguishable particles as distinguishable ones, except that we divided the phase-space volume (or the partition function, in the canonical ensemble) by a factor of N!:

Ω_N^MB = (1/N!) Ω_N^dist,
Z_N^MB = (1/N!) Z_N^dist.   (7.47)

This approximation is also used in quantum statistical mechanics, although we should emphasize that it describes neither bosons, nor fermions, nor any physical system. These bogus particles are said to obey Maxwell-Boltzmann statistics.

What is the canonical partition function for the case of N non-interacting distinguishable particles? If the partition function for one particle is

Z₁ = ∑_k e^{−βε_k},   (7.48)

then the partition function for two non-interacting distinguishable (but otherwise similar) particles is²¹

Z₂^NI,dist = ∑_{k₁,k₂} e^{−β(ε_{k₁}+ε_{k₂})} = (∑_{k₁} e^{−βε_{k₁}})(∑_{k₂} e^{−βε_{k₂}}) = Z₁²,   (7.49)

and the partition function for N such distinguishable, non-interacting particles is

Z_N^NI,dist = ∑_{k₁,...,k_N} e^{−β(ε_{k₁}+⋯+ε_{k_N})} = ∏_{j=1}^N ∑_{k_j} e^{−βε_{k_j}} = Z₁^N.   (7.50)

So, the Maxwell-Boltzmann partition function for N non-interacting particles is

Z_N^NI,MB = Z₁^N / N!.   (7.51)

²⁰ In particular, the low-lying excitations above the ground state look qualitatively like fermions excited from below the Fermi energy to above the Fermi energy (electron-hole pairs in metals and semiconductors). It is not that these electrons don't significantly interact with those under the Fermi sea: it is rather that these interactions act to dress the electron with a screening cloud. These dressed electrons and holes, or quasiparticles, are what act so much like non-interacting particles.
²¹ Multiply out the product of the sums, and see.
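A small enumeration (a standalone sketch with arbitrary toy energy levels, not from the text) confirms the factorization Z₂^dist = Z₁² of equation 7.49 and exposes how the 1/N! of equation 7.51 weights multiply-occupied states:

```python
import math
from itertools import product

beta = 1.0
eps = [0.0, 0.4, 1.3]   # three hypothetical single-particle levels

# Distinguishable two-particle sum, equation 7.49: it factors as Z1**2.
Z1 = sum(math.exp(-beta * e) for e in eps)
Z2_dist = sum(math.exp(-beta * (e1 + e2)) for e1, e2 in product(eps, eps))
print(Z2_dist, Z1**2)

# Maxwell-Boltzmann: divide by 2!. Each singly-occupied pair of levels
# then carries weight 1, but each doubly-occupied level is suppressed
# by a factor 1/2 (compare equation 7.52 below).
Z2_MB = Z2_dist / math.factorial(2)
weight_doubly = sum(math.exp(-2 * beta * e) for e in eps) / 2.0
weight_singly = Z2_MB - weight_doubly
print(Z2_MB, weight_singly, weight_doubly)
```

The split into `weight_singly` and `weight_doubly` mirrors the term-by-term expansion worked out next in the text.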

We can see what this 1/N! does by considering the canonical ensemble of two non-interacting particles in three possible states of energies ε₁, ε₂, and ε₃. The Maxwell-Boltzmann partition function for such a system would be

Z₂^NI,MB = (1/2!)(e^{−βε₁} + e^{−βε₂} + e^{−βε₃})²
         = ½e^{−2βε₁} + ½e^{−2βε₂} + ½e^{−2βε₃} + e^{−β(ε₁+ε₂)} + e^{−β(ε₁+ε₃)} + e^{−β(ε₂+ε₃)}.   (7.52)

The 1/N! fixes the weights of the singly-occupied states²² nicely: each has weight one in the Maxwell-Boltzmann partition function. But the doubly-occupied states, where both particles have the same wavefunction, have an unintuitive suppression by ½ in the sum.

There are basically two ways to fix this. One is to stop discriminating against multiply-occupied states, and to treat them all democratically. This gives us non-interacting bosons:

Z₂^NI,boson = e^{−2βε₁} + e^{−2βε₂} + e^{−2βε₃} + e^{−β(ε₁+ε₂)} + e^{−β(ε₁+ε₃)} + e^{−β(ε₂+ε₃)}.   (7.53)

The other way is to squelch multiple occupancy altogether. This leads to fermions:

Z₂^NI,fermion = e^{−β(ε₁+ε₂)} + e^{−β(ε₁+ε₃)} + e^{−β(ε₂+ε₃)}.   (7.54)

Here we have been working in the canonical ensemble, fixing the number of particles to two. This is convenient only for small systems; normally we use the grand canonical ensemble.²³ How does the grand canonical ensemble apply to particles with Maxwell-Boltzmann statistics? The sum over particle numbers M forms an exponential series:²⁴

²² More precisely, we mean those many-body states where the single-particle states are all singly occupied or vacant.
²³ See problem 7.6 for more details about the three ensembles and the four types of statistics.
²⁴ Notice the unusual appearance of e^{eˣ} in this formula.

Ξ^NI,MB = ∑_M Z_M^NI,MB e^{βμM} = ∑_M (1/M!)(∑_k e^{−βε_k})^M e^{βμM}
        = ∑_M (1/M!)(∑_k e^{−β(ε_k−μ)})^M = exp(∑_k e^{−β(ε_k−μ)})
        = ∏_k exp(e^{−β(ε_k−μ)}).   (7.55)

The grand free energy is hence a sum of single-state grand free energies,

Φ^NI,MB = −k_B T log Ξ^NI,MB = ∑_k Φ_k^MB,   (7.56)

with

Φ_k^MB = −k_B T e^{−β(ε_k−μ)}.   (7.57)

The expected number of particles in a state with energy ε is then

⟨n⟩_MB = −∂Φ_k^MB/∂μ = e^{−β(ε−μ)}.   (7.59)

This is precisely the Boltzmann factor for filling the state that we expect for non-interacting distinguishable particles.²⁵
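The three occupation laws (equations 7.42, 7.46, and 7.59) can be compared directly through the unified form of equation 7.58; a minimal sketch, with k_B = 1:

```python
import math

def occupancy(eps, mu, T, c):
    """Unified occupation of equation 7.58:
    c = -1 (Bose-Einstein), 0 (Maxwell-Boltzmann), +1 (Fermi-Dirac)."""
    return 1.0 / (math.exp((eps - mu) / T) + c)

T, mu = 1.0, 0.0
for eps in [0.5, 1.0, 3.0]:
    nBE = occupancy(eps, mu, T, -1.0)
    nMB = occupancy(eps, mu, T, 0.0)
    nFD = occupancy(eps, mu, T, +1.0)
    # Bosons bunch and fermions exclude, so n_BE > n_MB > n_FD;
    # at low occupancy (eps - mu >> T) all three nearly coincide.
    print(eps, nBE, nMB, nFD)
```

The printout makes the limits visible: close to ε = μ the three curves differ strongly, while far above μ they collapse onto the Boltzmann factor.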

7.6 Black Body Radiation and Bose Condensation

7.6.1 Free Particles in a Periodic Box

For this section and the next section on fermions, we shall simplify even further. We consider particles which are not only non-interacting and identical, but are also free. That is, they are subject to no external potential, apart from being confined in a box of volume L³ = V with periodic boundary conditions.

Fig. 7.5 The quantum states of a particle in a one-dimensional box with periodic boundary conditions are sine and cosine waves ψ_n with n wavelengths in the box, k_n = 2πn/L. With a real box (zero boundary conditions at the walls) one would have only sine waves, but at every half-wavelength, k_n = πn/L, giving the same net density of states.

The single-particle quantum eigenstates of such a system are products of sine and cosine waves along the three directions; for example, for any three positive integers n_i,

ψ = (2/L)^{3/2} sin(2πn₁x/L) sin(2πn₂y/L) sin(2πn₃z/L).   (7.60)

There are eight such states with the same energy, substituting cos for sin in all possible combinations along the three directions. These are more conveniently organized using the complex exponential, so

ψ_k = (1/L)^{3/2} exp(i k·r)   (7.61)

with k = (2π/L)(n₁, n₂, n₃), where the n_i can now be any integers.²⁶ The allowed single-particle eigenstates form a regular square grid in the space of wavevectors k, with an average density (L/2π)³ per unit volume of k-space:

density of plane waves in k-space = V/8π³.   (7.62)

For a large box volume V, the grid is extremely fine, and one can use a continuum approximation: the number of states falling into a k-space region is given by its volume times the density 7.62.

²⁵ It is amusing to note that non-interacting particles fill single-particle energy states according to the same formula,

⟨n⟩ = 1/(e^{β(ε−μ)} + c),   (7.58)

with c = −1 for bosons, c = +1 for fermions, and c = 0 for Maxwell-Boltzmann statistics.
²⁶ The eight degenerate states are now given by the choices of sign for the three integers.
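The continuum approximation can be tested by direct counting (a standalone numerical sketch, not from the text): enumerate the grid points k = (2π/L)(n₁, n₂, n₃) inside a sphere of radius K and compare with the estimate (V/8π³)·(4/3)πK³:

```python
import math

L = 1.0
K = 40.0 * (2 * math.pi / L)   # k-space cutoff radius (40 grid spacings)
nmax = int(K / (2 * math.pi / L)) + 1

# Count lattice points of the k-grid with |k| < K.
count = 0
for n1 in range(-nmax, nmax + 1):
    for n2 in range(-nmax, nmax + 1):
        for n3 in range(-nmax, nmax + 1):
            k2 = (2 * math.pi / L) ** 2 * (n1 * n1 + n2 * n2 + n3 * n3)
            if k2 < K * K:
                count += 1

V = L**3
continuum = (V / (8 * math.pi**3)) * (4.0 / 3.0) * math.pi * K**3
print(count, continuum, count / continuum)
```

Even at this modest cutoff the discrete count agrees with the continuum estimate to well under a percent; the residual discrepancy is a surface effect that shrinks as the sphere grows.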

7.6.2 Black Body Radiation

Our first application is to electromagnetic radiation. Electromagnetic radiation has plane-wave modes similar to 7.61. Each plane wave travels at the speed of light c, so its frequency is ω_k = c|k|. There are two modes per wavevector k, one for each polarization. When one quantizes the electromagnetic field, each mode becomes a quantum harmonic oscillator.

Fig. 7.6 The allowed k-space points for periodic boundary conditions form a regular grid. The points of equal energy lie on a sphere.

Before quantum mechanics, people could not understand how electromagnetic radiation could come to equilibrium. The equipartition theorem suggested that if you could come to equilibrium, each mode would have k_B T of energy. Since there are immensely more wavevectors in the ultraviolet and X-ray ranges than in the infrared and visible, this predicts that when you open your oven door you'd get a sun tan or worse (the so-called ultraviolet catastrophe). Simple experiments looking at radiation emitted from pinholes in otherwise closed boxes held at fixed temperature saw a spectrum which looked compatible with classical statistical mechanics for small frequency radiation, but was cut off at high frequencies. This was called black-body radiation, because a black-walled box led to fast equilibration of the photons inside.

Let us calculate the correct energy distribution inside a black-walled box. The number of single-particle plane-wave eigenstates g(ω) dω in a small range dω is²⁷

g(ω) dω = (4πk²) (d|k|/dω) dω × 2V/(2π)³,   (7.63)

where the first factor is the surface area of the sphere of radius k, the second is the thickness of the sphere for a small dω, and the last is the density of single-particle plane-wave eigenstate wavevectors times two (because there are two photon polarizations per wavevector). Knowing k² = ω²/c² and d|k|/dω = 1/c, we find the density of plane-wave eigenstates per unit frequency:

g(ω) = V ω²/(π²c³).   (7.64)

²⁷ We're going to be sloppy and use g(ω) for photons to mean eigenstates per unit frequency, and g(ε) later to mean single-particle eigenstates per unit energy.

Now, the number of photons is not fixed: they can be created or destroyed, so their chemical potential is zero.²⁸ Their energy is ε_k = ħω_k. Finally, they are to an excellent approximation identical, non-interacting bosons, so the number of photons per eigenstate with frequency ω is ⟨n⟩ = 1/(e^{ħω/k_B T} − 1). This gives us a number of photons

(# of photons) dω = g(ω) dω/(e^{ħω/k_B T} − 1)   (7.65)

and an electromagnetic (photon) energy per unit volume u(ω) given by

V u(ω) dω = ħω g(ω) dω/(e^{ħω/k_B T} − 1) = (V/π²c³) ħω³ dω/(e^{ħω/k_B T} − 1).   (7.66)

This is Planck's famous formula for black-body radiation. At low frequencies, we can approximate e^{ħω/k_B T} − 1 ≈ ħω/k_B T, yielding the Rayleigh-Jeans formula

V u_RJ(ω) dω = V (k_B T/π²c³) ω² dω = k_B T g(ω) dω,   (7.67)

which is just the equipartition value of k_B T per oscillator.

Fig. 7.7 The Planck black-body radiation power spectrum versus frequency, with the Rayleigh-Jeans (equipartition) approximation, valid at low frequency.

For modes with frequencies high compared to k_B T/ħ, equipartition no longer holds. The energy gap ħω, just as for the low-temperature specific heat from section 7.2, leads to an excitation probability that is suppressed by the exponential Boltzmann factor e^{−ħω/k_B T}, as one can see from equation 7.66 by approximating 1/(e^{ħω/k_B T} − 1) ≈ e^{−ħω/k_B T}. Planck's discovery that quantizing the energy averted the ultraviolet catastrophe led to his name being given to h.
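As a numerical aside (not in the text), integrating the Planck energy density 7.66 over all frequencies reduces, in the dimensionless variable x = ħω/k_B T, to ∫₀^∞ x³/(eˣ − 1) dx; its value π⁴/15, which feeds into the Stefan-Boltzmann law, is a standard result quoted here as an outside fact and checked below:

```python
import math

def planck_integrand(x):
    """Dimensionless Planck spectrum x^3 / (e^x - 1), x = hbar*omega/(kB*T)."""
    return x**3 / math.expm1(x)

# Trapezoidal rule on [a, b]; the integrand vanishes like x^2 at the
# lower end and like x^3 e^(-x) at the upper end, so modest cutoffs suffice.
a, b, n = 1e-6, 50.0, 200000
h = (b - a) / n
total = 0.5 * (planck_integrand(a) + planck_integrand(b))
total += sum(planck_integrand(a + i * h) for i in range(1, n))
total *= h

print(total, math.pi**4 / 15)
```

Using `math.expm1` avoids the loss of precision in e^x − 1 at small x, where the integrand would otherwise be noisy.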

7.6.3 Bose Condensation

How do the properties of bosons change when they cannot be created and destroyed? What happens if we have N non-interacting free bosons in our box of volume V with periodic boundary conditions?

Let us assume that our bosons are spinless, have mass m, and are non-relativistic, so their energy is ε = p²/2m = ħ²k²/2m. We'll begin by assuming that we can make the same continuum approximation to the density of states as we did in the case of black-body radiation. In equation 7.62, the number of plane-wave eigenstates per unit volume in k-space is V/8π³, so the density in momentum space p = ħk is V/(2πħ)³.

²⁸ We can also see this from the fact that photons are excitations within a harmonic oscillator; in section 7.4 we noted that the excitations in a harmonic oscillator satisfy Bose statistics with μ = 0.

For our massive particles dε/d|p| = |p|/m = √(2ε/m), so the number of plane-wave eigenstates in a small range of energy dε is

g(ε) dε = (4πp²) (d|p|/dε) dε × V/(2πħ)³
        = (4π(2mε)) √(m/2ε) dε × V/(2πħ)³
        = (V m^{3/2}/(√2 π²ħ³)) √ε dε,   (7.68)

where the first factor is the surface area of the sphere in p-space, the second is the thickness of the sphere, and the third is the density of plane-wave eigenstates per unit volume in p-space.

Now, we fill each of these single-particle plane-wave eigenstates with an expected number of bosons given by the Bose-Einstein distribution at chemical potential μ, 1/(e^{(ε−μ)/k_B T} − 1), so the total number of particles N must be given by

N(μ) = ∫₀^∞ g(ε) dε/(e^{(ε−μ)/k_B T} − 1).   (7.69)

We must vary μ in this equation to give us the correct number of particles N. For larger numbers of particles we raise μ, forcing more particles into each of the single-particle states. There is a limit, however, to how hard we can push. As we noted in section 7.4, μ cannot be larger than the lowest single-particle eigenvalue, because at that point that state gets a diverging number of particles. In our continuum approximation, however, when μ = 0 the integral for N(μ) converges.²⁹ Thus the largest number of particles N_max^cont we can fit into our box within our continuum approximation for the density of states is the value of equation 7.69 at μ = 0:³⁰

N_max^cont = ∫₀^∞ g(ε) dε/(e^{ε/k_B T} − 1)
           = (V m^{3/2}/(√2 π²ħ³)) ∫₀^∞ √ε dε/(e^{ε/k_B T} − 1)
           = V (2πm k_B T/h²)^{3/2} (2/√π) ∫₀^∞ √z dz/(e^z − 1)
           = (V/λ³) ζ(3/2),   (7.70)

where ζ is the Riemann zeta function, with ζ(3/2) ≈ 2.612, and where λ = h/√(2πm k_B T) is the thermal de Broglie wavelength we first saw in the canonical ensemble of the ideal gas, equation 5.26.³¹ Thus something new has to happen at a critical density

N_max^cont/V = ζ(3/2)/λ³ = 2.612 particles per de Broglie volume.   (7.71)

What happens when we try to cram more particles in? What happens is that our approximation of the distribution of eigenstates as a

²⁹ At μ = 0, the denominator of the integrand in equation 7.69 is approximately ε/k_B T for small ε, but the numerator goes as √ε, so the integral converges at the lower end: ∫₀^X ε^{−1/2} dε = 2X^{1/2}.
³⁰ The function ζ(s) = [1/(s−1)!] ∫₀^∞ z^{s−1}/(e^z − 1) dz is famous because it is related to the distribution of prime numbers, because it is the subject of the famous unproven Riemann hypothesis (about its zeros in the complex plane), and because the values in certain regions form excellent random numbers. Remember also that (1/2)! = Γ(3/2) = √π/2.
³¹ This formula has a simple interpretation: the quantum statistics of the particles begin to dominate the behavior when they are within a thermal de Broglie wavelength of one another.

continuum breaks down. Figure 7.8 shows a schematic of the first few single-particle eigenvalues. When the distance between μ and the bottom level ε₀ becomes significantly smaller than the distance between the bottom and the next level ε₁, the continuum approximation (which roughly treats the single state ε₀ as the integral halfway to ε₁) becomes qualitatively wrong. This lowest state absorbs all the extra particles added to the system beyond N_max^cont.³² This is called Bose-Einstein condensation.

Fig. 7.8 Bose condensation: the chemical potential μ is so close to the ground state energy ε₀ that the continuum approximation to the density of states breaks down. The ground state is macroscopically occupied (that is, filled by a non-zero fraction of the total number of particles N).

Usually, one doesn't add particles at fixed temperature; one lowers the temperature at fixed density N/V, where Bose condensation occurs at temperature

k_B T_c^BEC = (h²/2πm) (N/(V ζ(3/2)))^{2/3}.   (7.72)

Bose condensation has recently been observed in ultracold gasses (see problem 7.11).

Bose condensation has also long been considered the underlying principle behind superfluidity.³³ Liquid He⁴ undergoes an unusual transition at about 2.176 K to a state without viscosity: it will swirl round a circular tube for as long as your refrigeration lasts. The quantitative study of the superfluid transition involves the interactions between the helium atoms and the scaling methods we'll introduce in chapter 13. But it's interesting to note that the Bose condensation temperature for liquid He⁴, with m = 6.65 × 10⁻²⁴ g and molar volume 27.6 cm³/mole, is 3.13 K: quite close to the superfluid transition temperature.

³² The next few states have quantitative corrections, but the continuum approximation is only off by small factors.
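The 3.13 K estimate can be reproduced from equation 7.72 with the numbers quoted above (a sketch using standard constants to four digits, not the book's code):

```python
import math

# Physical constants (SI, four digits) and zeta(3/2) from the text.
h  = 6.626e-34      # Planck's constant, J s
kB = 1.381e-23      # Boltzmann's constant, J/K
NA = 6.022e23       # Avogadro's number
zeta32 = 2.612

m = 6.65e-27                  # He-4 atomic mass in kg (6.65e-24 g)
n = NA / 27.6e-6              # number density for 27.6 cm^3/mole, in 1/m^3

# Equation 7.72: kB * Tc = (h^2 / 2 pi m) * (n / zeta(3/2))^(2/3)
Tc = (h**2 / (2 * math.pi * m * kB)) * (n / zeta32) ** (2.0 / 3.0)
print(Tc)   # close to 3.13 K, compared with the 2.176 K superfluid transition
```

The agreement to within fifty percent, despite ignoring the strong interactions between helium atoms, is what makes the Bose-condensation picture of superfluidity so suggestive.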

7.7 Metals and the Fermi Gas

We claimed in section 7.4 that many systems of strongly-interacting fermions (metals, neutron stars, nuclei) are surprisingly well described by a model of non-interacting fermions. Let's solve the simplest such model: N free non-interacting fermions in a box.

Let our particles be non-relativistic and have spin 1/2. The single-particle eigenstates are the same as those for bosons³⁴ except that there are two states (spin up, spin down) per plane wave. Hence the density of states is given by twice that of equation 7.68:

g(ε) = (√2 V m^{3/2}/π²ħ³) √ε.   (7.73)

³³ The connection is deep. The density matrix of a superfluid has an unusual property, called off-diagonal long-range order, which is also found in the Bose condensate (see problem 10.5).
³⁴ You need at least two particles for bosons and fermions to differ.

So, the number of fermions at chemical potential μ is given by integrating g(ε) times the expected number of fermions in a state of energy ε, given by the Fermi function f(ε) of equation 7.46:

N(μ) = ∫₀^∞ g(ε) f(ε) dε = ∫₀^∞ g(ε) dε/(e^{(ε−μ)/k_B T} + 1).   (7.74)

What chemical potential will give us N fermions? In general one must do a self-consistent calculation, but the calculation is easy in the important limit of zero temperature. In that limit (figure 7.4) the Fermi function is a step function f(ε) = Θ(μ − ε): all states below μ are filled, and all states above μ are empty. The zero-temperature value of the chemical potential is called the Fermi energy ε_F. We can find the number of fermions by integrating up to μ = ε_F:

N = ∫₀^{ε_F} g(ε) dε = (√2 V m^{3/2}/π²ħ³) ∫₀^{ε_F} √ε dε = (2mε_F)^{3/2} V/(3π²ħ³).   (7.75)

This formula becomes easier to understand if we realize that we're filling all states with wavevector k < k_F, where the Fermi wavevector k_F is the length of the wavevector whose eigenenergy equals the Fermi energy: ħ²k_F²/2m = p_F²/2m = ε_F, so k_F = √(2mε_F)/ħ. The resulting sphere of occupied states at T = 0 is called the Fermi sphere. The number of fermions inside the Fermi sphere is thus

N = (2V/(2π)³) (4/3)πk_F³ = k_F³ V/(3π²),   (7.76)

the k-space volume of the Fermi sphere times the k-space density of states.

Fig. 7.9 The Fermi surface for lithium, from [21]. The Fermi energy for lithium is 4.74 eV, with one conduction electron outside a He closed shell. Note that for most metals the Fermi energy is much larger than k_B times the melting point (ε_F = 4.74 eV corresponds to 55,000 K, and the melting point is 453 K). Hence they are well described by the T = 0 Fermi surfaces shown here, slightly smeared by the Fermi function shown in figure 7.4.

We mentioned earlier that the independent fermion approximation was startlingly useful even though the interactions are not small. Ignoring the Coulomb repulsion between electrons in a metal, or the strong interaction between neutrons in a neutron star, gives an excellent description of their actual behavior. Our calculation above, though, also assumed that the electrons are free particles, experiencing no external potential. This approximation isn't particularly accurate in general: the interaction with the atomic nuclei is important, and is primarily what makes one material different from another. In particular, the atoms in a crystal will form a periodic potential for the electrons. One can show that the single-particle eigenstates in a periodic potential can be chosen to be periodic functions times plane waves (7.61) with exactly the same wavevectors as in the free-fermion case. A better approximation is given by incorporating the effects of the inner-shell electrons into the periodic potential, and filling the Fermi sea with the remaining conduction electrons. The filling of the Fermi surface in k-space as described here is changed only insofar as the energies of these single-particle states are no longer simple. Some metals (particularly the alkali metals, like lithium in figure 7.9) have roughly spherical Fermi surfaces; many (like aluminum in figure 7.10) are quite intricate, with several pieces to them [8, Ch. 9-11].

Fig. 7.10 The Fermi surface for aluminum, also from [21]. Aluminum has a Fermi energy of 11.7 eV, with three conduction electrons outside a Ne closed shell.
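The figure-caption numbers can be cross-checked from equations 7.75 and 7.76 (a sketch using standard constants; the electron density computed at the end is for illustration only):

```python
import math

eV   = 1.602e-19     # J per electron-volt
hbar = 1.0546e-34    # reduced Planck's constant, J s
me   = 9.109e-31     # electron mass, kg
kB   = 1.381e-23     # Boltzmann's constant, J/K

eF = 4.74 * eV                          # lithium Fermi energy from the caption
TF = eF / kB                            # Fermi temperature: should be ~55,000 K
kF = math.sqrt(2 * me * eF) / hbar      # from hbar^2 kF^2 / 2m = eF
n  = kF**3 / (3 * math.pi**2)           # conduction-electron density, eq. 7.76

print(TF, kF, n)
```

This confirms the caption's point numerically: the Fermi temperature of a metal dwarfs its melting temperature, so room-temperature metals are essentially at T = 0 as far as the Fermi sea is concerned.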

Exercises

(7.1) Phase Space Units and the Zero of Entropy. (Quantum)

In classical mechanics, the entropy S = k_B log Ω goes to minus infinity as the temperature is lowered to zero;

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

7.7 Metals and the Fermi Gas 127

in quantum mechanics the entropy per particle goes to zero,³⁵ because states are quantized and the ground state is the only one populated. This is Nernst's theorem, the third law of thermodynamics.

The classical phase-space volume has units of ((momentum)(distance))^{3N}. It's a little perverse to take the logarithm of a quantity with units. The obvious candidate with these dimensions is Planck's constant h^{3N}: if we measure phase-space volume in units of h per dimension, Ω will be dimensionless. Of course, the correct dimension could be a constant times h, like ℏ...

(a) Arbitrary zero of the classical entropy. Show that the width of the energy shell dE in the definition of the entropy does not change the classical entropy per particle S/N. Show that the choice of units in phase space does change the classical entropy per particle.

We need to choose the units of classical phase-space volume so that the entropy agrees with the high-temperature entropy for the quantum systems. That is, we need to find out how many quantum eigenstates per unit volume of classical phase space we should expect at high energies. We can fix these units by explicitly matching the quantum result to the classical one for a simple system. Let's start with a free particle.

(b) Phase-space density of states for a particle in a one-dimensional box. Show, or note, that the quantum momentum-space density of states for a free quantum particle in a one-dimensional box of length L with periodic boundary conditions is L/h. Draw a picture of the classical phase space of this box (p, x), and draw a rectangle of length L for each quantum eigenstate. Is the phase-space area per eigenstate equal to h, as we assumed in section 3.5?

This works also for N particles in a three-dimensional box.

(c) Phase-space density of states for N particles in a box. Show that the density of states for N free particles in a cubical box of volume V with periodic boundary conditions is V^N/h^{3N}, and hence that the phase-space volume per state is h^{3N}.

Can we be sure that the answer is independent of which simple system we use to match? Let's see if it also works for the harmonic oscillator.

(d) Phase-space density of states for a harmonic oscillator. Consider a harmonic oscillator with Hamiltonian H = p²/2m + ½ mω²q². Draw a picture of the energy surface with energy E, and find the volume (area) of phase space enclosed. (Hint: the area of an ellipse is πr₁r₂, where r₁ and r₂ are the largest and smallest radii, corresponding to the major and minor axes.) What is the volume per energy state, the volume between E_n and E_{n+1}, for the eigenenergies E_n = (n + ½)ℏω?

Why must these two calculations agree? How can we derive this result in general, even for nasty systems of interacting particles? The two traditional methods for directly calculating the phase-space units in general systems, semiclassical quantization [54, ch. 48, p. 170] and the path-integral formulation of quantum statistical mechanics [30], would be too distracting to present here. Upon some thought, we realize that one cannot choose different units for the classical phase-space volume for different systems. They all must agree, because one can transform one into another. Consider N interacting particles in a box, at high temperatures where classical statistical mechanics is valid. Imagine slowly and reversibly turning off the interactions between the particles (making them into our ideal gas). We carefully remain at high temperatures, and measure the entropy flow into or out of the system. The entropy difference will be given by classical statistical mechanics, whatever units one wishes to choose for the phase-space volume. The entropy of the interacting system is thus the entropy of the ideal gas (with volume h per state) plus the classical entropy change, and hence must use the same phase-space units.³⁶

(7.2) Does Entropy Increase in Quantum Systems? (Mathematics, Quantum)

We saw in problem 6.5 that the non-equilibrium entropy S_nonequil = −k_B ∫ ρ log ρ is constant in a classical mechanical Hamiltonian system. We'll show here that it is constant also in a closed quantum Hamiltonian system.

A general ensemble in a quantum system is described by the density matrix ρ. In most of statistical mechanics, ρ is diagonal when we use a basis of energy eigenstates. Here, since each energy eigenstate is time independent except for a phase, any mixture of energy eigenstates will have a constant density matrix, and so will have a constant entropy.

³⁵ If the ground state is degenerate, the entropy doesn't go to zero, but it typically stays finite as the number of particles N gets big, so for large N the entropy per particle goes to zero.

³⁶ In particular, if we cool the interacting system to zero temperature and remain in equilibrium, reaching the ground state, its entropy will go to zero: the entropy flow out of the system on cooling is given by our classical formula with phase-space volume measured in units of h.
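For part (d) of exercise 7.1, the energy surface is an ellipse with semi-axes √(2mE) in p and √(2E/mω²) in q, so the enclosed area is 2πE/ω, and the area between adjacent levels E_n = (n + ½)ℏω is 2πℏ = h. A minimal numerical sketch (the parameter values are arbitrary illustrative choices, not from the text):

```python
import math

def enclosed_area(E, m, w):
    """Phase-space area inside H = p^2/2m + (1/2) m w^2 q^2 = E:
    an ellipse with semi-axes sqrt(2 m E) and sqrt(2 E / (m w^2))."""
    return math.pi * math.sqrt(2 * m * E) * math.sqrt(2 * E / (m * w**2))

hbar, m, w = 1.3, 2.7, 0.9          # arbitrary units (illustrative choice)
h = 2 * math.pi * hbar

for n in range(5):
    E_n = (n + 0.5) * hbar * w      # eigenenergies E_n = (n + 1/2) hbar w
    E_n1 = (n + 1.5) * hbar * w
    dA = enclosed_area(E_n1, m, w) - enclosed_area(E_n, m, w)
    print(n, dA / h)                # area between adjacent levels, in units of h
```

Each printed ratio is 1: the phase-space volume per eigenstate is h, independent of m, ω, and n, matching the free-particle result of parts (b) and (c).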

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

128 Quantum Statistical Mechanics

(a) Entropy is Constant: Mixtures of Energy Eigenstates. Prove that if ρ is a density matrix diagonal in the basis of energy eigenstates, then ρ is time independent. Hence, conclude that the entropy S = −k_B Tr(ρ log ρ) is time-independent.

Thus, not only are the microcanonical and canonical ensembles time independent, but mixing energy eigenstates in any ratio would be time independent. To justify equilibration in quantum systems, one must couple the system to the outside world and induce transitions between eigenstates.

In the particular case of the entropy, the entropy is time independent even for general, time-dependent density matrices.

(b) Entropy is Constant: General Density Matrices. Prove that S = −k_B Tr(ρ log ρ) is time-independent, where ρ is any density matrix. (Hint: Show that Tr(ABC) = Tr(CAB) for any matrices A, B, and C. Also you should know that an operator M commutes with any function f(M).)

(7.3) Phonons on a String. (Quantum)

One-dimensional Phonons. A nano-string of length L with mass per unit length μ under tension τ has a vertical, transverse displacement u(x, t). The kinetic energy density is (μ/2)(∂u/∂t)² and the potential energy density is (τ/2)(∂u/∂x)².

Write the kinetic energy and the potential energy in new variables, changing from u(x, t) to normal modes q_k(t) with u(x, t) = Σ_n q_{k_n}(t) √(2/L) sin(k_n x), k_n = nπ/L. Show in these variables that the system is a sum of decoupled harmonic oscillators. Calculate the density of states per unit frequency g(ω), the number of normal modes in a frequency range (ω, ω + ε) divided by ε, keeping ε large compared to the spacing between modes.³⁷ Calculate the specific heat of the string c(T) per unit length in the limit L → ∞, treating the oscillators quantum mechanically. What is the specific heat of the classical string?

(7.4) Crystal Defects. (Quantum, Basic)

Defects in Crystals. A defect in a crystal has one on-center configuration with energy zero, and M off-center configurations with energy ε, with no significant quantum tunneling between the states. The Hamiltonian can be approximated by the (M + 1) × (M + 1) matrix

        ( 0  0  0 )
    H = ( 0  ε  0 )        (7.77)
        ( 0  0  ε )

(shown here for M = 2). There are N defects in the crystal, which can be assumed stuck in position (and hence distinguishable) and assumed not to interact with one another.

Write the canonical partition function Z(T), the mean energy E(T), the fluctuations in the energy, the entropy S(T), and the specific heat C(T) as a function of temperature. Plot the specific heat per defect C(T)/N for M = 6; set the unit of energy equal to ε and k_B = 1 for your plot. Derive a simple relation between M and the change in entropy between zero and infinite temperature. Check this relation using your formula for S(T).

(7.5) Density Matrices. (Quantum)

(a) Density matrices for photons. Write the density matrix for a photon traveling along z and linearly polarized along x, in the basis where (1, 0) and (0, 1) are polarized along x and y. Write the density matrix for a right-handed polarized photon, (1/√2, i/√2), and the density matrix for unpolarized light. Calculate Tr(ρ), Tr(ρ²), and S = −k_B Tr(ρ log ρ). Interpret the values of the three traces physically: one is a check for pure states, one is a measure of information, and one is a normalization.

(b) Density matrices for a spin. (Adapted from Halperin's course, 1976.) Let the Hamiltonian for a spin be

    H = −(ℏ/2) B⃗ · σ⃗        (7.78)

where σ⃗ = (σ_x, σ_y, σ_z) are the three Pauli spin matrices, and B⃗ may be interpreted as a magnetic field, in units where the gyromagnetic ratio is unity. Remember that σ_i σ_j − σ_j σ_i = 2iε_{ijk} σ_k. Show that any 2 × 2 density matrix may be written in the form

    ρ = ½(1 + p⃗ · σ⃗).        (7.79)

Show that the equations of motion for the density matrix iℏ ∂ρ/∂t = [H, ρ] can be written as dp⃗/dt = p⃗ × B⃗.

(7.6) Ensembles and Statistics: 3 Particles, 2 Levels. (Quantum)

A system has two single-particle eigenfunctions, with energies (measured in degrees Kelvin) E₀/k_B = −10 and E₁/k_B = 10. Experiments are performed by adding three

³⁷ ... In problem 7.11 we'll study the density of many-body energy eigenstates g(E) in a trap with precisely three frequencies (where our g(ω) would be δ(ω − ω₀) + 2δ(ω − ω₁)). Don't confuse the two.
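The defect model of exercise 7.4 is easy to explore numerically: the single-defect partition function is Z = 1 + M e^{−ε/k_B T}, and the entropy rises from 0 to k_B log(M + 1) between zero and infinite temperature. A minimal sketch with ε = k_B = 1, as the exercise suggests for the plot (the temperature grid is an arbitrary choice):

```python
import math

M, eps = 6, 1.0                     # M off-center states at energy eps; eps = kB = 1

def thermo(T):
    """Per-defect mean energy, entropy, and specific heat at temperature T."""
    b = 1.0 / T
    Z = 1 + M * math.exp(-b * eps)            # partition function
    E = M * eps * math.exp(-b * eps) / Z      # mean energy <E>
    E2 = M * eps**2 * math.exp(-b * eps) / Z  # <E^2>
    C = (E2 - E**2) / T**2                    # fluctuation formula, kB = 1
    S = E / T + math.log(Z)                   # S = (E - F)/T with F = -T log Z
    return E, S, C

for T in (0.1, 0.2, 0.5, 1.0, 2.0, 5.0):
    print(T, thermo(T))

# Entropy change from T = 0 (ground state only) to T = infinity
# (all M + 1 states equally likely) should be log(M + 1):
dS = thermo(1e6)[1] - thermo(1e-3)[1]
print(dS, math.log(M + 1))
```

The specific heat shows a single broad (Schottky-type) peak, and the entropy check reproduces the simple relation ΔS = k_B log(M + 1) that the exercise asks you to derive.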


non-interacting particles to these two states, either identical spin-1/2 fermions, identical spinless bosons, distinguishable particles, or spinless identical particles obeying Maxwell-Boltzmann statistics. Please make a table for this problem, giving your answers for the four cases (Fermi, Bose, Dist., and MB) for each of the three parts. Calculations may be needed, but only the answers will be graded.

Fig. 7.11 Entropies for three particles in two states, at constant E and N. [Plot: five curves A-E; S/k_B marked at log(1/6), log(1/2), log(1), log(2), log(3), log(4), and log(8); E/k_B from −30 to 30 degrees K.]

(a) The system is first held at constant energy. In figure 7.11 which curve represents the entropy of the fermions as a function of the energy? Bosons? Distinguishable particles? Maxwell-Boltzmann particles?

Fig. 7.12 Energies of three particles in two states, at constant T and N. [Plot: five curves A-E; average energy E/k_B from −30 to 20 degrees K versus T from 0 to 80 degrees K.]

(b) The system is now held at constant temperature. In figure 7.12 which curve represents the mean energy of the fermions as a function of temperature? Bosons? Distinguishable particles? Maxwell-Boltzmann particles?

Fig. 7.13 Chemical potentials for three particles in two states, at constant T and μ. [Plot: six curves A-F; μ/k_B from −80 to 80 degrees K versus temperature T from 0 to 80 degrees K.]

(c) The system is now held at constant temperature, with chemical potential set to hold the average number of particles equal to three. In figure 7.13, which curve represents the chemical potential of the fermions as a function of temperature? Bosons? Distinguishable? Maxwell-Boltzmann?

(7.7) Bosons are Gregarious: Superfluids and Lasers. (Quantum)

Many experiments insert a new particle into a many-body state: new electrons into a metal, new electrons or electron pairs into a superconductor, new bosonic atoms into a superfluid, new photons into a cavity already filled with light. These experiments explore how the bare inserted particle decomposes into the natural states of the many-body system. The cases of photons and bosons illustrate a key connection between laser physics and Bose condensates.

Adding a particle to a Bose condensate. Suppose we have a non-interacting system of bosonic atoms in a box with single-particle eigenstates ψ_n. Suppose the system begins in a Bose-condensed state with all N bosons in a state ψ₀, so

    Ψ_N^[0](r₁, ..., r_N) = ψ₀(r₁) ... ψ₀(r_N).        (7.80)

Suppose a new particle is gently injected into the system, into an equal superposition of the M lowest single-particle states.³⁸ That is, if it were injected into an empty

³⁸ For free particles in a cubical box of volume V, injecting a particle at the origin ψ(r) = δ(r) would be a superposition of all plane-wave states of equal weight, ψ(r) =
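The state counting behind figures 7.11-7.13 can be generated by brute force. A sketch for exercise 7.6, enumerating the microstates of three particles in the two levels E₀/k_B = −10 K and E₁/k_B = +10 K (the Maxwell-Boltzmann counts are the distinguishable counts divided by 3! = 6):

```python
import math
from itertools import product, combinations
from collections import Counter

E = (-10.0, 10.0)   # the two single-particle levels, E/kB in degrees K

# Distinguishable particles: each of the 3 independently picks a level.
dist = Counter(sum(E[i] for i in occ) for occ in product((0, 1), repeat=3))

# Spinless bosons: a state is just the occupation (n0, n1) with n0 + n1 = 3.
bose = Counter(n0 * E[0] + (3 - n0) * E[1] for n0 in range(4))

# Spin-1/2 fermions: four modes (level, spin), at most one particle per mode.
modes = [E[0], E[0], E[1], E[1]]
fermi = Counter(sum(c) for c in combinations(modes, 3))

for name, counts in (("dist", dist), ("bose", bose), ("fermi", fermi)):
    print(name, sorted(counts.items()))

# Microcanonical entropies S/kB = log(# of states at that energy);
# dividing the distinguishable counts by 3! gives log(1/6), log(1/2), ...
for en, g in sorted(dist.items()):
    print(en, math.log(g), math.log(g / 6))
```

The distinguishable counts are 1, 3, 3, 1 at E/k_B = −30, −10, 10, 30; the bosons have exactly one state at each of those energies, and the fermions two states at each of ±10; compare the log levels marked in figure 7.11.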


box, it would start in state

    φ(r_{N+1}) = (1/√M) [ψ₀(r_{N+1}) + ψ₁(r_{N+1}) + ... + ψ_{M−1}(r_{N+1})].        (7.81)

The state Φ(r₁, ..., r_{N+1}) after the particle is inserted into the non-interacting Bose condensate is given by symmetrizing the product function Ψ_N^[0](r₁, ..., r_N) φ(r_{N+1}) (equation 7.28).

(a) Calculate the symmetrized initial state of the system with the injected particle. Show that the probability that the new boson enters the ground state (ψ₀) is enhanced over that of its entering an empty state (ψ_m for 0 < m < M) by a factor N + 1. (Hint: first do it for N = 1.)

So, if a macroscopic number of bosons are in one single-particle eigenstate, a new particle will be much more likely to add itself to this state than to any of the microscopically populated states.

Notice that nothing in your analysis depended on ψ₀ being the lowest energy state. If we started with a macroscopic number of particles in a single-particle state with wave-vector k (that is, a superfluid with a supercurrent in direction k), new added particles, or particles scattered by inhomogeneities, will preferentially enter into that state. This is an alternative approach to understanding the persistence of supercurrents, complementary to the topological approach (exercise 9.4).

Adding a photon to a laser beam. In part (a), we saw that adding a boson to a single-particle eigenstate with N existing bosons has a probability which is larger by a factor N + 1 than adding a boson to an empty state. This chummy behavior between bosons is also the principle behind lasers.³⁹ If we think of an atom in an excited state, the photon it emits during its decay will prefer to join the laser beam rather than go off into one of its other available modes. In this factor N + 1, the N represents stimulated emission, where the existing electromagnetic field pulls out the energy from the excited atom, and the 1 represents spontaneous emission, which occurs even in the absence of existing photons.

Imagine a single atom in a state with excitation energy E and decay rate Γ, in a cubical box of volume V with periodic boundary conditions for the photons. By the energy-time uncertainty principle, ΔE Δt ≥ ℏ/2, the energy of the atom will be uncertain by an amount ΔE ≈ ℏΓ. Assume for simplicity that, in a cubical box without pre-existing photons, the atom would decay at an equal rate into any mode in the range E − ℏΓ/2 < ℏω < E + ℏΓ/2.

(b) Assuming a large box and a small decay rate Γ, find a formula for the number of modes M per unit volume V competing for the photon emitted from our atom. Evaluate your formula for a laser with wavelength λ = 619 nm and line-width Γ = 10 kHz. (Hint: use the density of states, equation 7.64.)

Assume the laser is already in operation, so there are N photons in the volume V of the lasing material, all in one plane-wave state (a single-mode laser).

(c) Using your result from part (a), give a formula for the number of photons per unit volume N/V there must be in the lasing mode for the atom to have 50% likelihood of emitting into that mode.

The main task in setting up a laser is providing a population of excited atoms! Amplification can occur if there is a population inversion, where the number of excited atoms is larger than the number of atoms in the lower energy state (clearly a non-equilibrium condition). This is made possible by pumping atoms into the excited state by using one or two other single-particle eigenstates.

(7.8) Einstein's A and B. (Quantum, Mathematics)

Einstein deduced some basic facts about the interaction of light with matter very early in the development of quantum mechanics, by using statistical mechanics! In particular, he established that stimulated emission was demanded for statistical mechanical consistency, and found formulas determining the relative rates of absorption, spontaneous emission, and stimulated emission. (See [86, I.42-5].)

Consider a system consisting of non-interacting atoms weakly coupled to photons (electromagnetic radiation), in equilibrium at temperature k_B T = 1/β. The atoms have two energy eigenstates E₁ and E₂ with average populations N₁ and N₂: the relative population is given as usual by the Boltzmann distribution

    ⟨N₂/N₁⟩ = e^{−β(E₂ − E₁)}.        (7.82)

³⁸ (continued) (1/V) Σ_k e^{ik·r} ∝ Σ_k a_k† |0⟩. So, we gently add a particle at the origin by restricting this sum to low-energy states. This is how quantum tunneling into condensed states (say, from Josephson junctions or scanning tunneling microscopes) is usually modeled.

³⁹ Laser is an acronym for Light Amplification by the Stimulated Emission of Radiation.
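The N + 1 enhancement of exercise 7.7(a) can be verified by explicit symmetrization for a small system. A sketch for N = 2 condensed bosons and M = 3 target modes (these values, and the truncation to d = 4 single-particle modes, are illustrative choices, not from the text):

```python
import math
from itertools import permutations, product

N, M, d = 2, 3, 4                       # N condensed bosons, M target modes

psi0 = [1.0, 0.0, 0.0, 0.0]             # condensate mode
phi = [1 / math.sqrt(M) if m < M else 0.0 for m in range(d)]  # eq. 7.81

# Product state psi0 x psi0 x phi, symmetrized over the N + 1 = 3 particles.
sym = {}
for idx in product(range(d), repeat=N + 1):
    amp = sum(psi0[idx[p[0]]] * psi0[idx[p[1]]] * phi[idx[p[2]]]
              for p in permutations(range(N + 1)))
    if amp:
        sym[idx] = amp
norm = math.sqrt(sum(a * a for a in sym.values()))

# Probability of each occupation pattern (labeled by the sorted index tuple).
prob = {}
for idx, a in sym.items():
    key = tuple(sorted(idx))
    prob[key] = prob.get(key, 0.0) + (a / norm) ** 2

p_ground = prob[(0, 0, 0)]              # new boson joined the condensate
p_empty = prob[(0, 0, 1)]               # new boson entered the empty mode 1
print(p_ground / p_empty)
```

The printed ratio is 3 = N + 1: the condensate mode is favored over each empty mode by exactly the stimulated-emission factor discussed in the text.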


The energy density in the electromagnetic field is given by the Planck distribution (equation 7.66):

    u(ω) = (ℏ/π²c³) ω³/(e^{βℏω} − 1).        (7.83)

An atom in the ground state will absorb electromagnetic energy from the photons at a rate that is proportional to the energy density u(ω) at the excitation energy ℏω = E₂ − E₁. Let us define this absorption rate per atom to be 2πB u(ω).⁴⁰

An atom in the excited state E₂, with no electromagnetic stimulation, will decay into the ground state with a rate A, emitting a photon. Einstein noted that neither A nor B should depend upon temperature.

Einstein argued that just these two rates would lead to an inconsistency.

(a) Compute the long-time average ratio N₂/N₁ assuming only absorption and spontaneous emission. Even in the limit of weak coupling (small A and B), show that this equation is incompatible with the statistical distributions 7.82 and 7.83. (Hint: Write a formula for dN₁/dt, and set it equal to zero. Is B/A temperature independent?)

Einstein fixed this by introducing stimulated emission. Roughly speaking, an atom experiencing an oscillating electromagnetic field is more likely to emit photons into that mode. Einstein found that the stimulated emission rate had to be a constant 2πB′ times the energy density u(ω).

(b) Write the equation for dN₁/dt, including absorption (a negative term) and spontaneous and stimulated emission from the population N₂. Assuming equilibrium, use this equation and equations 7.82 and 7.83 to solve for B and B′ in terms of A. These are generally termed the Einstein A and B coefficients.

Let's express the stimulated emission rate in terms of the number of excited photons per mode (see exercise 7.7b for an alternative derivation).

(c) Show that the rate of decay of excited atoms A + 2πB′ u(ω) is enhanced by a factor of ⟨n⟩ + 1 over the zero temperature rate, where ⟨n⟩ is the expected number of photons in a mode at frequency ℏω = E₂ − E₁.

(7.9) Phonons and Photons are Bosons. (Quantum)

Phonons and photons are the elementary, harmonic excitations of the elastic and electromagnetic fields. We've seen in section 7.3 that phonons are decoupled harmonic oscillators, with a distribution of frequencies ω. A similar analysis shows that the Hamiltonian of the electromagnetic field can be decomposed into harmonic normal modes called photons.

This problem will explain why we think of phonons and photons as particles, instead of excitations of harmonic modes.

(a) Show that the canonical partition function for a quantum harmonic oscillator of frequency ω is the same as the grand canonical partition function for bosons multiply filling a single state with energy ℏω, with μ = 0, up to a shift in the arbitrary zero of the total energy of the system.

The Boltzmann filling of a harmonic oscillator is therefore the same as the Bose-Einstein filling of bosons into a single quantum state, except for an extra shift in the energy of ℏω/2. This extra shift is called the zero-point energy. The excitations within the harmonic oscillator are thus often considered particles with Bose statistics: the nth excitation is n bosons occupying the oscillator's quantum state.

This particle analogy becomes even more compelling for systems like phonons and photons where there are many harmonic oscillator states labeled by a wave-vector k (see problem 7.3). Real, massive Bose particles like He⁴ in free space have single-particle quantum eigenstates with a dispersion relation ε_k = ℏ²k²/2m. Phonons and photons have one harmonic oscillator for every k, with an excitation energy ε_k = ℏω_k. If we treat them, as in part (a), as bosons filling these single-particle states, we find that they are completely analogous to ordinary massive particles. The only difference is that the relation between energy and wave-vector (called the dispersion relation) is different: for photons, ε_k = ℏω_k = ℏc|k|.⁴¹

(b) Do phonons or photons Bose condense at low temperatures? Can you see why not? Can you think of a non-equilibrium Bose condensation of photons, where a macroscopic occupation of a single frequency and momentum state occurs?

⁴⁰ ... in cycles u_cycles(f), where f = ω/2π is in cycles per second, and has no factor of 2π. Since u_cycles(f) df = u(ω) dω, the absorption rate B u_cycles(f) = B u(ω) dω/df = 2πB u(ω).

⁴¹ If massive particles are moving fast, their energies are ε = √(m²c⁴ + p²c²). This formula reduces to p²/2m + mc² = ℏ²k²/2m + mc² if the kinetic energy is small compared to the rest mass mc². For massless particles, ε = |p|c = ℏ|k|c, precisely the relation we find for photons (and for phonons at low frequencies). So actually even the dispersion relation is the same: photons and phonons are massless bosons.
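Part (b) of exercise 7.8 can be checked numerically. Setting ℏ = c = k_B = 1 (an illustrative choice of units, not from the text) and taking Einstein's answers B′ = B and A = 2πB × (ℏω³/π²c³), the net rate of change of the ground-state population vanishes at every temperature:

```python
import math

def u(w, beta):
    """Planck energy density per unit angular frequency (eq. 7.83), hbar = c = 1."""
    return (w**3 / math.pi**2) / math.expm1(beta * w)

B = 1.0                      # absorption coefficient (arbitrary scale)
Bp = B                       # stimulated emission: B' = B
w = 1.7                      # transition frequency (E2 - E1)/hbar, arbitrary
A = 2 * B * w**3 / math.pi   # spontaneous rate: A = 2 pi B * (hbar w^3 / pi^2 c^3)

for beta in (0.3, 1.0, 4.0):
    n2_over_n1 = math.exp(-beta * w)              # Boltzmann ratio, eq. 7.82
    rate = (-2 * math.pi * B * u(w, beta)         # absorption out of level 1
            + (A + 2 * math.pi * Bp * u(w, beta)) * n2_over_n1)  # emission back
    print(beta, rate)                             # vanishes at every temperature
```

Setting Bp = 0 instead leaves a nonzero (negative) net rate at every finite temperature, which is the inconsistency of part (a).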


Be careful not to get confused when we put real, massive bosons into a harmonic oscillator potential (problem 7.11). There it is best to think of each harmonic oscillator as being many separate eigenstates being filled by the atoms.

(7.10) Bose Condensation in a Band. (Basic, Quantum)

The density of states g(E) of a system of non-interacting bosons forms a band: the single-particle eigenstates are confined to an energy range E_min < E < E_max, so g(E) is non-zero in this range and zero otherwise. The system is filled with a finite density of bosons. Which of the following is necessary for the system to undergo Bose condensation at low temperatures?

(a) g(E)/(e^{β(E − E_min)} + 1) is finite as E → E_min.
(b) g(E)/(e^{β(E − E_min)} − 1) is finite as E → E_min.
(c) E_min ≥ 0.
(d) ∫_{E_min}^E g(E′)/(E′ − E_min) dE′ is a convergent integral at the lower limit E_min.
(e) Bose condensation cannot occur in a system whose states are confined to an energy band.

(7.11) Bose Condensation in a Parabolic Potential. (Quantum)

Wieman and Cornell in 1995 were able to get a dilute gas of rubidium-87 atoms to Bose condense [4].⁴²

(a) Is rubidium-87 (37 protons and electrons, 50 neutrons) a boson or a fermion?

(b) At their quoted maximum number density of 2.5 × 10¹²/cm³, at what temperature T_c^predict do you expect the onset of Bose condensation in free space? They claim that they found Bose condensation starting at a temperature of T_c^measured = 170 nK. Is that above or below your estimate? (Useful constants: h = 6.6262 × 10⁻²⁷ erg s, m_n ≈ m_p = 1.6726 × 10⁻²⁴ gm, k_B = 1.3807 × 10⁻¹⁶ erg/K.)

The trap had an effective potential energy that was harmonic in the three directions, but anisotropic with cylindrical symmetry. The frequency along the cylindrical axis was f₀ = 120 Hz, so ω₀ ≈ 750 Hz, and the two other frequencies were smaller by a factor of √8: ω₁ ≈ 265 Hz. The Bose condensation was observed by abruptly removing the trap potential,⁴³ and letting the gas atoms spread out: the spreading cloud was imaged 60 ms later by shining a laser on them and using a CCD to image the shadow.

For your convenience, the ground state of a particle of mass m in a one-dimensional harmonic oscillator with frequency ω is ψ₀(x) = (mω/πℏ)^{1/4} e^{−mωx²/2ℏ}, and the momentum-space wave function is ψ̃₀(k) = (ℏ/πmω)^{1/4} e^{−ℏk²/2mω}.

(c) What is the ground-state wave-function for one rubidium-87 atom in this potential? What is the wave-function in momentum space? The probability distribution of the momentum? What is the ratio of the velocity widths along the axis and perpendicular to the axis for the ground state? For the classical thermal distribution of velocities? If the potential is abruptly removed, what will the shape of the distribution of positions look like 60 ms later (ignoring the small width of the initial distribution in space)? Compare your predicted anisotropy to the false-color images above. If the x axis goes mostly right and a bit up, and the y axis goes mostly up and a bit left, which axis corresponds to the axial frequency and which corresponds to one of the two lower frequencies?

Their Bose condensation isn't in free space: the atoms are in a harmonic oscillator potential. In the calculation in free space, we approximated the quantum states as a continuum density of states g(E). That's only sensible if k_B T is large compared to the level spacing near the ground state.

(d) Compare ℏω to k_B T at the Bose condensation point T_c^measured in their experiment.

For bosons in a one-dimensional harmonic oscillator of frequency ω₀, it's clear that g(E) = 1/(ℏω₀): the number of states in a small range ΔE is the number of ℏω₀'s it contains.

(e) Compute the density of states

    g(E) = ∫₀^∞ dε₁ dε₂ dε₃ g₁(ε₁) g₂(ε₂) g₃(ε₃) δ(E − (ε₁ + ε₂ + ε₃))        (7.84)

for a three-dimensional harmonic oscillator, with one frequency ω₀ and two of frequency ω₁. Show that it's equal to 1/δ times the number of states in ε⃗ space between energies E and E + δ. Why is this triangular slab not of thickness δ?

Their experiment has N = 2 × 10⁴ atoms in the trap as it condenses.

⁴² M.H. Anderson, J.R. Ensher, M.R. Matthews, C.E. Wieman, and E.A. Cornell, Science 269, 198 (1995). http://jilawww.colorado.edu/bec/.

⁴³ Actually, they first slowly reduced it by a factor of 75 and then abruptly reduced it from there; I'm not sure why, but let's ignore that complication.
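For parts (b) and (f) of exercise 7.11, rough numbers follow from nλ³ = ζ(3/2) in free space and from N = ζ(3)(k_B T_c/ℏ)³/(ω₀ω₁²) in the trap. A sketch using the constants quoted in the exercise (the trap formula is the standard harmonic-trap result, stated here as an assumption; check it against your own part (f) derivation):

```python
import math

h = 6.6262e-27             # erg s (from the exercise)
hbar = h / (2 * math.pi)
kB = 1.3807e-16            # erg/K
m = 87 * 1.6726e-24        # Rb-87 mass in g, treating nucleons as equal mass
zeta32, zeta3 = 2.612, 1.202

# (b) Free space: n * lambda^3 = zeta(3/2), lambda = h / sqrt(2 pi m kB T)
n = 2.5e12                 # atoms/cm^3
Tc_free = (2 * math.pi * hbar**2 / (m * kB)) * (n / zeta32) ** (2 / 3)

# (f) Harmonic trap: N = zeta(3) (kB Tc / hbar)^3 / (w0 * w1^2)
w0, w1 = 750.0, 265.0      # s^-1, from the exercise
N = 2e4
Tc_trap = (hbar / kB) * (N * w0 * w1**2 / zeta3) ** (1 / 3)

print(Tc_free * 1e9, "nK (free-space estimate)")
print(Tc_trap * 1e9, "nK (trap estimate; measured value was 170 nK)")
```

Both estimates come out at a few tens of nanokelvin; comparing them with the measured 170 nK is the point of parts (b) and (f).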


Fig. 7.14 ... reference [4]. The pictures are spatial distributions 60 ms after the potential is removed; the field of view of each image is 200 μm × 270 μm. The left picture is roughly spherically symmetric, and is taken before Bose condensation; the middle has an elliptical Bose condensate superimposed on the spherical thermal background; the right picture is nearly pure condensate. I believe this may not be the same experiment as described in their original paper.

(f) By working in analogy with the calculation in free space, find the maximum number of atoms that can occupy the three-dimensional harmonic oscillator potential in part (e) without Bose condensation at temperature T. (You'll want to know ∫₀^∞ z²/(e^z − 1) dz = 2ζ(3) = 2.40411...) According to your calculation, at what temperature T_c^HO should the real experimental trap have Bose condensed?

(7.12) Light Emission and Absorption. (Quantum, Basic)

The experiment that Planck was studying did not directly measure the energy density per unit frequency, equation 7.66, inside a box. It measured the energy radiating out of a small hole, of area A. Let us assume the hole is on the upper face of the cavity, perpendicular to the z axis.

What is the photon distribution just inside the boundary of the hole? Clearly there are few photons coming into the hole from the outside, so the distribution is depleted for those photons with v_z < 0. However, the photons with v_z > 0 to an excellent approximation should be unaffected by the hole, since they were emitted from far distant walls of the cavity, where the existence of the hole is a negligible perturbation. So, presuming the relevant photons just inside the hole are distributed in the same way as in the box as a whole (equation 7.66), how many leave in a time dt?

Fig. 7.15 The photons leaving a cavity in a time dt are those within v_z dt of the hole. [Diagram: a slab of thickness c dt just inside the hole.]

As one can see geometrically (figure 7.15), those photons within v_z dt of the boundary will escape in time dt. The vertical velocity v_z = c cos(θ), where θ is the photon velocity angle with respect to the vertical. The Planck distribution is isotropic, so the probability that a photon will be moving at an angle θ is the perimeter of the θ circle on the sphere divided by the area of the sphere, ρ(θ) dθ = (2π sin(θ) dθ)/4π = ½ sin(θ) dθ.

⁴⁴ We're being sloppy again, using the same name ρ for the probability densities ρ(v_z) and ρ(θ).
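The geometric factors in exercise 7.12(a)-(b) can be checked by Monte Carlo: for isotropically sampled directions, v_z = c cos θ should be uniform on (−c, c), and the mean outward flux ⟨v_z Θ(v_z)⟩ should be c/4. A sketch (the sample size and seed are arbitrary choices):

```python
import math, random

random.seed(1)
c, n = 1.0, 400_000

def isotropic_vz():
    """z-component of c times a uniformly random unit vector (Gaussian trick)."""
    x, y, z = (random.gauss(0, 1) for _ in range(3))
    return c * z / math.sqrt(x * x + y * y + z * z)

samples = [isotropic_vz() for _ in range(n)]
mean_flux = sum(v for v in samples if v > 0) / n   # <v_z Theta(v_z)>
mean_sq = sum(v * v for v in samples) / n          # uniform on (-c, c) gives c^2/3
print(mean_flux, "vs c/4 =", c / 4)
print(mean_sq, "vs c^2/3 =", c**2 / 3)
```

The second moment c²/3 is what a flat ρ(v_z) = 1/2c predicts, and the flux c/4 is exactly the geometric factor that appears in equation 7.85.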


R

(a) Show that the probability density44 (vz ) for a par- (c) Using the fact that 0 x3 /(ex 1) dx = 4 /15, show

ticular photon to have velocity vz is independent of vz in that

1

the range (c, c), and thus is 2c . (Hint: (vz ) vz = Z

() .) Qtot (T ) = Pblack (, T ) d = T 4 . (7.87)

Clearly, an upper bound on the energy emitted from a 0

and give a formula for the StefanBoltzmann constant .

whole (eq. 7.66) times the fraction AcV dt of the volume

The value is = 5.67 105 erg cm2 K4 s1 . (Hint:

within c dt of the hole.

use this to check your answer.)

(b) Show that the actual energy emitted is 1/4 of

this

R c upper bound. (Hint: Youll need to integrate

(vz )vz dvz .) (7.13) Fermions in Semiconductors. (Quantum)

0

Hence the power per unit area emitted from the small Lets consider a simple model of a doped semiconduc-

hole in equilibrium is

P_black(ω, T) = ℏω³ / (4π²c² (e^{ℏω/k_B T} − 1)).   (7.85)

Why is this called blackbody radiation? Certainly a small hole in a large (cold) cavity looks black: any light entering the hole bounces around inside until it is absorbed by the walls. Suppose we placed a black object (a material that absorbed radiation at all frequencies and angles) capping the hole. This object would absorb radiation from the cavity, rising in temperature until it came to equilibrium with the cavity, emitting just as much radiation as it absorbs. Thus the overall power per unit area emitted by our black object in equilibrium at a given temperature must equal that of the hole. This must also be true if we place a selective filter between the hole and our black body, passing through only particular types of photons. Thus the emission and absorption of our black body must agree with the hole for every photon mode individually, an example of the principle of detailed balance we will discuss in more detail in section 8.2.

How much power per unit area P_colored(ω, T) is emitted in equilibrium at temperature T by a red or maroon body? A white body? A mirror? These objects are different in the fraction of incident light they absorb at different frequencies and angles a(ω, θ). We can again use the principle of detailed balance, by placing our colored object next to a black body and matching the power emitted and absorbed for each angle and frequency:

P_colored(ω, T, θ) = P_black(ω, T) a(ω, θ).   (7.86)

Finally, we should calculate Q_tot(T), the total power per unit area emitted from a black body at temperature T, by integrating 7.85 over frequency.

tor [8, ch. 28]. Consider a crystal of phosphorous-doped silicon, with N − M atoms of silicon and M atoms of phosphorous. Each silicon atom contributes one electron to the system, and has two states at energies ±Δ/2, where Δ = 1.16 eV is the energy gap. Each phosphorous atom contributes two electrons and two states, one at −Δ/2 and the other at Δ/2 − ε, where ε = 0.044 eV is much smaller than the gap.⁴⁵ (Our model ignores the quantum-mechanical hopping between atoms that broadens the levels at ±Δ/2 into the conduction band and the valence band. It also ignores spin and chemistry: each silicon really contributes four electrons and four levels, and each phosphorous five electrons and four levels.) To summarize, our system has N + M spinless electrons (maximum of one electron per state), N valence band states at energy −Δ/2, M impurity band states at energy Δ/2 − ε, and N − M conduction band states at energy Δ/2.

(a) Derive a formula for the number of electrons as a function of temperature T and chemical potential μ for the energy levels of our system.

(b) What is the limiting occupation probability for the states as T → ∞, where entropy is maximized and all states are equally likely? Using this, find a formula for μ(T) valid at large T, not involving Δ or ε.

(c) Draw an energy level diagram showing the filled and empty states at T = 0. Find a formula for μ(T) in the low temperature limit T → 0, not involving the variable T. (Hint: Balance the number of holes in the impurity band with the number of electrons in the conduction band. Why can you ignore the valence band?)

(d) In a one centimeter cubed sample, there are M = 10¹⁶ phosphorous atoms; silicon has about N = 5×10²² atoms per cubic centimeter. Find μ at room temperature (1/40 eV) from the formula you derived in part (a). (Probably

⁴⁵ The phosphorous atom is neutral when both of its states are filled; the upper energy shift −ε represents the Coulomb attraction of the electron to the phosphorous ion. It's small because the dielectric constant is large (see A&M above).
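Integrating 7.85 over frequency reduces, after the substitution x = ℏω/k_B T, to the dimensionless integral ∫₀^∞ x³/(eˣ − 1) dx = π⁴/15, giving the Stefan-Boltzmann form Q_tot = σT⁴ with σ = π²k_B⁴/(60ℏ³c²). A quick numerical check of that dimensionless integral (a sketch, not part of the text's derivation; the cutoff and step size are arbitrary choices):

```python
import math

# Evaluate I = integral of x^3/(e^x - 1) from 0 to infinity by the
# midpoint rule; it should come out pi^4/15 ~ 6.4939. math.expm1
# avoids roundoff in e^x - 1 for small x, and the tail past x = 60
# contributes only ~ 60^3 e^{-60}, far below our tolerance.
def bose_integral(xmax=60.0, dx=1e-3):
    total = 0.0
    x = dx / 2.0                      # midpoints of each slice
    while x < xmax:
        total += x**3 / math.expm1(x) * dx
        x += dx
    return total

I = bose_integral()
print(I, math.pi**4 / 15)             # both are ~ 6.4939
```
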

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

7.7 Metals and the Fermi Gas 135

trying various μ is easiest: set up a program on your calculator or computer.) At this temperature, what fraction of the phosphorous atoms are ionized (have their upper energy state empty)? What is the density of holes (empty states at energy −Δ/2)?

Phosphorous is an electron donor, and our sample is doped n-type, since the dominant carriers are electrons; p-type semiconductors are doped with holes.

(7.14) White Dwarves, Neutron Stars, and Black Holes. (Astrophysics, Quantum)

As the energy sources in large stars are consumed, and the temperature approaches zero, the final state is determined by the competition between gravity and the chemical or nuclear energy needed to compress the material.

A simple model of ordinary stellar matter is a Fermi sea of non-interacting electrons, with enough nuclei to balance the charge. Let's model a white dwarf (or black dwarf, since we assume zero temperature) as a uniform density of He⁴ nuclei and a compensating uniform density of electrons. Assume Newtonian gravity. Assume the chemical energy is given solely by the energy of a gas of non-interacting electrons (filling the levels to the Fermi energy).

(a) Assuming non-relativistic electrons, calculate the energy of a sphere with N zero-temperature non-interacting electrons and radius R.⁴⁶ Calculate the Newtonian gravitational energy of a sphere of He⁴ nuclei of equal and opposite charge density. At what radius is the total energy minimized?

A more detailed version of this model was studied by Chandrasekhar and others as a model for white dwarf stars. Useful numbers: m_p = 1.6726×10⁻²⁴ gm, m_n = 1.6749×10⁻²⁴ gm, m_e = 9.1095×10⁻²⁸ gm, ℏ = 1.05459×10⁻²⁷ erg sec, G = 6.672×10⁻⁸ cm³/(gm s²), 1 eV = 1.60219×10⁻¹² erg, k_B = 1.3807×10⁻¹⁶ erg/K, and c = 3×10¹⁰ cm/s.

(b) Using the non-relativistic model in part (a), calculate the Fermi energy of the electrons in a white dwarf star of the mass of the Sun, 2×10³³ gm, assuming that it is composed of helium. (i) Compare it to a typical chemical binding energy of an atom. Are we justified in ignoring the electron-electron and electron-nuclear interactions (i.e., chemistry)? (ii) Compare it to the temperature inside the star, say 10⁷ K. Are we justified in assuming that the electron gas is degenerate (roughly zero temperature)? (iii) Compare it to the mass of the electron. Are we roughly justified in using a non-relativistic theory? (iv) Compare it to the mass difference between a proton and a neutron.

The electrons in large white dwarf stars are relativistic. This leads to an energy which grows more slowly with radius, and eventually to an upper bound on their mass.

(c) Assuming extremely relativistic electrons with ε = pc, calculate the energy of a sphere of non-interacting electrons. Notice that this energy cannot balance against the gravitational energy of the nuclei except for a special value of the mass, M₀. Calculate M₀. How does your M₀ compare with the mass of the Sun, above?

A star with mass larger than M₀ continues to shrink as it cools. The electrons (note (b.iv) above) combine with the protons, staying at a constant density as the star shrinks into a ball of almost pure neutrons (a neutron star, often forming a pulsar because of trapped magnetic flux). Recent speculations [82] suggest that the neutronium will further transform into a kind of quark soup with many strange quarks, forming a transparent insulating material.

For an even higher mass, the Fermi repulsion between quarks can't survive the gravitational pressure (the quarks become relativistic), and the star collapses into a black hole. At these masses, general relativity is important, going beyond the purview of this course. But the basic competition, between degeneracy pressure and gravity, is the same.

⁴⁶ You may assume that the single-particle eigenstates have the same energies; like fixed versus periodic boundary conditions, the boundary doesn't matter to bulk properties.
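For part (d) of the doped-silicon exercise above, the suggested computer search might look like the following sketch. Occupations are Fermi functions; we solve N_electrons(μ) = N + M by bisection (the bracket and iteration count are arbitrary choices; energies are measured in eV from the middle of the gap, as in the exercise):

```python
import math

# Level scheme from the exercise: N valence states at -Delta/2,
# M impurity states at Delta/2 - eps, N - M conduction states at Delta/2.
Delta, eps, kT = 1.16, 0.044, 0.025   # eV; kT = 1/40 eV (room temperature)
N, M = 5e22, 1e16

def fermi(E, mu):
    return 1.0 / (math.exp((E - mu) / kT) + 1.0)

def electrons(mu):
    return (N * fermi(-Delta / 2, mu)           # valence band
            + M * fermi(Delta / 2 - eps, mu)    # impurity band
            + (N - M) * fermi(Delta / 2, mu))   # conduction band

# electrons(mu) increases monotonically with mu, so bisect on
# electrons(mu) = N + M inside the gap.
lo, hi = -Delta / 2, Delta / 2
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if electrons(mid) < N + M:
        lo = mid
    else:
        hi = mid
mu = 0.5 * (lo + hi)
print(mu)
```

For these numbers μ comes out a bit below 0.2 eV above midgap, well under the impurity level at Δ/2 − ε ≈ 0.536 eV, so nearly all the phosphorous atoms are ionized.
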

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

136 Quantum Statistical Mechanics

8 Computational Stat Mech: Ising and Markov

Lattice models are a big industry within statistical mechanics. Placing some degrees of freedom on each site of a regular grid, and forming a Hamiltonian or a dynamical evolution law to equilibrate or evolve the resulting system, forms a centerpiece of computational statistical mechanics (as well as the focus of much of the theoretical work). Critical phenomena and phase transitions [13], lattice QCD and quantum field theories, quantum magnetism and models for high temperature superconductors, phase diagrams for alloys (section 8.1.2), the behavior of systems with dirt or disorder, and nonequilibrium systems exhibiting avalanches and crackling noise [13] all make important use of lattice models.

Fig. 8.1 The 2D square-lattice Ising model. It is traditional to denote the values si = ±1 as up and down, or as two different colors.

In this chapter, we will introduce the most studied of these models, the Ising model.¹

¹ Ising's name is pronounced "Eesing", but the model is usually pronounced "Eyesing" with a long I sound.

8.1 The Ising Model

The Ising model has a degree of freedom si on each site that may take values ±1. This degree of freedom is normally called a "spin".² We will be primarily interested in the Ising model in two dimensions on a square lattice; see figure 8.1. The Hamiltonian for the Ising model is

H = −J Σ_{⟨ij⟩} si sj − H Σ_i si.   (8.1)

² Unlike a true quantum spin-1/2 particle, there are no terms in the Ising Hamiltonian that lead to superpositions of states with different spins.

Here the sum ⟨ij⟩ is over all pairs of spins on nearest-neighbor sites, and J is the coupling between these neighboring spins. (There are four neighbors per spin on the 2D square lattice.) Usually one refers to H as the external field, and the sum M = Σ_i si as the magnetization, in reference to the Ising model's original application to magnetic systems.³ We'll usually assume the model has N spins forming a square, with periodic boundary conditions.

³ We shall use boldface M to denote the total magnetization, and (especially in the problems) will also refer to M = M/N, the average magnetization per spin.

8.1.1 Magnetism

As a model for magnetism, our spin si = 2σᵢᶻ, the z-component of the net spin of a spin-1/2 atom in a crystal. The interactions between spins


are modeled by the coupling J,⁴ and the energy of a spin in the external field is microscopically −H si = −g H σᵢᶻ, where g is the gyromagnetic ratio for the spin (close to two for the electron).

⁴ The interaction between spins is usually better approximated by the dot product Sᵢ · Sⱼ; some materials have anisotropic crystal structures which make the Ising model at least approximately valid.

The energy of two spins −J si sj is −J if the spins are parallel, and +J if they are antiparallel. Thus if J > 0 the model favors parallel spins: we say that the interaction is ferromagnetic, because like iron the spins will tend to all align in one direction at low temperatures, into a ferromagnetic phase, where the magnetization per spin will approach M = ±1 as the temperature approaches zero. If J < 0 we call the interaction antiferromagnetic; for our square lattice the spins will tend to align in a checkerboard fashion at low temperatures (an antiferromagnetic phase). At high temperatures, we expect entropy to dominate; the spins will fluctuate wildly, and the magnetization per spin M for a large system will be near zero (called the paramagnetic phase).

8.1.2 Binary Alloys

The Ising model is quite a convincing model for binary alloys.⁵ Imagine a square lattice of atoms, which can be either of type A or B (figure 8.2).⁶ We set the spin values A = +1 and B = −1. Let the number of the two kinds of atoms be N_A and N_B, with N_A + N_B = N. Let the interaction energy between two neighboring atoms be −E_AA, −E_BB, and −E_AB; these can be thought of as the bond strengths, the energy needed to break the bond. Let the total number of A-A nearest-neighbor bonds be N_AA, and similarly for N_BB and N_AB. Then the Hamiltonian for our binary alloy is

H_binary = −E_AA N_AA − E_BB N_BB − E_AB N_AB.   (8.2)

Fig. 8.2 The Ising model as a binary alloy. Atoms in crystals naturally sit on a regular grid: alloys have more than one type of element which can sit on the lattice sites (here, types A and B).

⁵ Indeed, any classical system on a lattice with local interactions can be mapped onto an Ising-like model.

⁶ A realistic alloy might mix roughly half copper and half zinc to make β-brass. At low temperatures, the copper and zinc atoms each sit on a cubic lattice, with the zincs in the middle of the copper cubes, together forming a body-centered cubic (bcc) lattice. At high temperatures, the zincs and coppers freely interchange on the two lattices. The transition temperature is about 733 C.

How is this the Ising model? Let's start by adding a constant −CN to the Ising model, and plugging in our new variables:

H_ising = −J Σ_{⟨ij⟩} si sj − H Σ_i si − CN
        = −J (N_AA + N_BB − N_AB) − H (N_A − N_B) − CN,   (8.3)

since N_A − N_B = M, the sum of the spins, N_AA + N_BB is the number of parallel neighbors, and N_AB is the number of antiparallel neighbors. There are two bonds per spin, so N_AA + N_BB + N_AB = 2N; we substitute N = 1/2 (N_AA + N_BB + N_AB). For every A atom there must be four bonds ending with an A, and similarly for every B atom there must be four bonds ending with a B. Each A-A bond gives half an A atom worth of bond ends, and each A-B bond gives a quarter, so

N_A = 1/2 N_AA + 1/4 N_AB   and   N_B = 1/2 N_BB + 1/4 N_AB.   (8.4)
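The identification between the alloy Hamiltonian 8.2 and the Ising form can be spot-checked numerically. Below is a sketch with arbitrary illustrative bond energies on a small periodic square lattice, verifying that −J Σ si sj − H Σ si − CN reproduces the bond-counting energy, with J = 1/4 (E_AA + E_BB − 2E_AB), H = E_AA − E_BB, and C = 1/2 (E_AA + E_BB + 2E_AB):

```python
import random

random.seed(1)
L = 4                                  # 4x4 periodic square lattice
EAA, EBB, EAB = 1.3, 0.7, 0.4          # arbitrary illustrative bond energies
J = 0.25 * (EAA + EBB - 2 * EAB)
H = EAA - EBB
C = 0.5 * (EAA + EBB + 2 * EAB)

# Random configuration: +1 is an A atom, -1 is a B atom.
s = [[random.choice([1, -1]) for _ in range(L)] for _ in range(L)]

NAA = NBB = NAB = 0
bond_sum = 0                           # sum of s_i s_j over bonds
for x in range(L):
    for y in range(L):
        # count each bond once: right and down neighbors only
        for (nx, ny) in [((x + 1) % L, y), (x, (y + 1) % L)]:
            a, b = s[x][y], s[nx][ny]
            bond_sum += a * b
            if a == 1 and b == 1:
                NAA += 1
            elif a == -1 and b == -1:
                NBB += 1
            else:
                NAB += 1

E_binary = -(EAA * NAA + EBB * NBB + EAB * NAB)
E_ising = -J * bond_sum - H * sum(map(sum, s)) - C * L * L
print(E_binary, E_ising)               # agree up to float rounding
```
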



Substituting these into 8.3,

H_ising = −J (N_AA + N_BB − N_AB) − H · 1/2 (N_AA − N_BB) − C · 1/2 (N_AA + N_BB + N_AB)
        = −(J + 1/2 H + 1/2 C) N_AA − (J − 1/2 H + 1/2 C) N_BB + (J − 1/2 C) N_AB.   (8.5)

This is just of the form of the binary alloy Hamiltonian 8.2, with J = 1/4 (E_AA + E_BB − 2E_AB), H = E_AA − E_BB, and C = 1/2 (E_AA + E_BB + 2E_AB).

Now, our model just contains atoms on their lattice sites. Surely if one kind of atom is larger than the other, it'll push neighboring atoms off their sites? We simply include these reshufflings into the energies in our Hamiltonian 8.2.

What about the vibrations of the atoms about their equilibrium positions? We can imagine doing a partial trace, as we discussed in section 5.4. Just as in problem 5.2, one can incorporate the entropy due to the local atomic motions S{si} about their lattice sites into an effective free energy for each atomic configuration⁷

F_binary{si} = −k_B T log ∫_{ri on site si} dP dQ e^{−H(P,Q)/k_B T}
             = H_binary{si} − T S{si}.   (8.6)

⁷ This nasty-looking integral over configurations where the atom hasn't shifted too far past its lattice site would normally be approximated by a Gaussian integral over phonon vibrations, similar to that described in problem 5.2(b), figure 5.5.

Again, as in section 5.4, we're doing a partial trace over states. If we ignore the configurations where the atoms are not near lattice sites, we can recover the total partition function by summing over spin configurations:

Z = Σ_{si} e^{−F_binary{si}/k_B T}   (8.7)
  = Σ_{si} ∫_{ri on site si} dP dQ e^{−H(P,Q)/k_B T} ≈ ∫ dP dQ e^{−H(P,Q)/k_B T}.   (8.8)

Insofar as the entropy in the free energy F_binary{si} can be approximated as a sum of pair energies,⁸ we again get an Ising model, but now with temperature dependent parameters.

⁸ Or we can incorporate second-neighbor and three-site interactions, as we probably needed to do to get an accurate energy in the first place.

More elaborate Ising models (with three-site and longer-range interactions, for example) are commonly used to compute realistic phase diagrams for alloys (reference [121]). Sometimes, though, the interactions introduced by relaxations and thermal fluctuations off lattice sites have important long-range pieces, which can lead to qualitative changes in the behavior, for example turning the transition from continuous to abrupt.

8.1.3 Lattice Gases

The Ising model is also used as a model for the liquid-gas transition. In this lattice gas interpretation, up-spins (si = +1) count as atoms and


down-spins count as a site without an atom. The gas is the phase with mostly down spins (negative magnetization), with only a few up-spin atoms in the vapor. The liquid phase is mostly atoms (up-spins), with a few vacancies.

On the whole, the gas phase seems fairly realistic, especially compared to the liquid phase. The liquid in particular seems much more like a crystal, with atoms sitting on a regular lattice. Why do we suggest that this model is a good way of studying transitions between the liquid and gas phase?

Unlike the binary alloy problem, the Ising model is not a good way to get quantitative phase diagrams for fluids. What it is good for is to understand the properties near the critical point. As shown in figure 8.3, one can go continuously between the liquid and gas phases: the phase boundary separating them ends at a critical point Tc, Pc, above which the two phases blur together seamlessly, with no jump in the density separating them.

Fig. 8.3 A schematic phase diagram for a typical material (pressure versus temperature, showing the solid, liquid, and gas regions, the triple point, and the critical point). There is a solid phase at high pressures and low temperatures, a gas phase at low pressures and high temperatures, and a liquid phase in a region in between. The solid-liquid phase boundary corresponds to a change in symmetry, and cannot end. The liquid-gas phase boundary typically does end: one can go continuously from the liquid phase to the gas phase by increasing the pressure above Pc, then the temperature above Tc, and then lowering the pressure again.

The Ising model, interpreted as a lattice gas, also has a line H = 0 along which the density (magnetization) jumps, and a temperature Tc above which the properties are smooth as a function of H (the paramagnetic phase). The phase diagram 8.4 looks only topologically like the real liquid-gas coexistence line 8.3, but the behavior near the critical point in the two systems is remarkably similar. Indeed, we will find in chapter 13 that in many ways the behavior at the liquid-gas critical point is described exactly by the three-dimensional Ising model.

Fig. 8.4 The phase diagram for the Ising model (external field H versus temperature T, with up-spin and down-spin phases meeting at the critical point). Below the critical temperature Tc, the H = 0 line separates two phases, an up-spin and a down-spin phase. Above Tc the behavior is smooth as a function of H; below Tc there is a jump in the magnetization as one crosses H = 0.

8.1.4 How to Solve the Ising Model.

How do we solve for the properties of the Ising model?

(1) Solve the one-dimensional Ising model, as Ising did.⁹

(2) Have an enormous brain. Onsager solved the two-dimensional Ising model in a bewilderingly complicated way. Since Onsager, many great minds have found simpler, elegant solutions, but all would take at least a chapter of rather technical and unilluminating manipulations to duplicate. Nobody has solved the three-dimensional Ising model.

(3) Do Monte Carlo on the computer.¹⁰

⁹ This is a typical homework problem in a course like ours: with a few hints, you can do it too.

¹⁰ Or, do high temperature expansions, low temperature expansions, transfer matrix methods, exact diagonalization of small systems, 1/N expansions in the number of states per site, 4 − ε expansions in the dimension of space, . . .

¹¹ Monte Carlo is a gambling center in Monaco. Lots of random numbers are generated there.

The Monte Carlo¹¹ method involves doing a kind of random walk through the space of lattice configurations. We'll study these methods in great generality in section 8.2. For now, let's just outline the Heat Bath Monte Carlo method.



Heat Bath Monte Carlo for the Ising model:

- Pick a site i = (x, y) at random.
- Check how many neighbor spins are pointing up:

  mi = Σ_{j: ⟨ij⟩} sj =
     4   (4 neighbors up)
     2   (3 neighbors up)
     0   (2 neighbors up)
    −2   (1 neighbor up)
    −4   (0 neighbors up)   (8.9)

- Calculate E+ and E−, the energy for spin i to be +1 or −1.
- Set spin i up with probability e^{−E+/k_B T}/(e^{−E+/k_B T} + e^{−E−/k_B T}) and down with probability e^{−E−/k_B T}/(e^{−E+/k_B T} + e^{−E−/k_B T}).
- Repeat.

The heat-bath algorithm just sets each spin up or down with probability given by the thermal distribution, given that its neighbors are fixed. Using it, and fast modern computers, you can simulate the Ising model fast enough to explore its behavior rather thoroughly, as we will in a variety of exercises.
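A minimal implementation of the steps above might look like the following sketch (J = k_B = 1, zero external field, and a 20×20 lattice are arbitrary choices; starting from the all-up configuration well below Tc, the magnetization should stay near +1):

```python
import math
import random

# Heat-bath Monte Carlo on a periodic square-lattice Ising model,
# J = kB = 1, field H = 0, temperature T below Tc ~ 2.27.
random.seed(0)
L, T, H = 20, 1.0, 0.0
s = [[1] * L for _ in range(L)]        # start with all spins up

def sweep():
    for _ in range(L * L):
        x, y = random.randrange(L), random.randrange(L)
        # sum of the four neighbor spins (this is m_i of eq. 8.9)
        mi = (s[(x + 1) % L][y] + s[(x - 1) % L][y]
              + s[x][(y + 1) % L] + s[x][(y - 1) % L])
        # E_pm = -(J*mi + H)*s_i, so the Boltzmann factors are:
        b_up = math.exp((mi + H) / T)      # proportional to e^{-E+/T}
        b_dn = math.exp(-(mi + H) / T)     # proportional to e^{-E-/T}
        s[x][y] = 1 if random.random() < b_up / (b_up + b_dn) else -1

for _ in range(100):
    sweep()
m = sum(map(sum, s)) / (L * L)         # magnetization per spin
print(m)                               # stays close to +1 at T = 1
```
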

8.2 Markov Chains

Let's consider a rather general algorithm for equilibrating a lattice model. Our system has a set of states S = {si}; for the Ising model there are 2^N such states. The algorithm has a transition rule, which at each step shifts the current state S to a state S′ with probability P_{S′⇐S}.¹² For the heat-bath algorithm, P_{S′⇐S} is equal to zero unless S and S′ are the same except for at most one spin flip. Under what circumstances will an algorithm, defined by our matrix P, take our system into thermal equilibrium?

¹² We put the subscripts in this order because we will use P as a matrix, which will take a probability vector from one step to the next.

There are many problems outside of mainstream statistical mechanics that can be formulated in this general way. Exercise 8.3 discusses a model with 1001 states (different numbers of red bacteria), and transition rates P_{n+1⇐n}, P_{n−1⇐n}, and P_{n⇐n}; we want to understand the long-time behavior of the probability of finding the different states.

These systems are examples of Markov chains. A Markov chain has a finite set of states {α}, through which the system evolves in a discrete series of steps n.¹³ The probabilities of moving to different new states in a Markov chain depend only on the current state.¹⁴ That is, the system has no memory of the past evolution.

¹³ There are continuous analogues of Markov chains.

¹⁴ More generally, systems which lack memory are called Markovian.

Let the probabilities of being in various states α at step n be arranged in a vector ρ(n). Then it is easy to see for a general Markov chain that the probabilities¹⁵ P_{βα} for moving from α to β satisfy

¹⁵ We hereafter leave out the left arrow: P_{βα} ≡ P_{β⇐α}.


ρ_β(n + 1) = Σ_α P_{βα} ρ_α(n),   i.e.,   ρ(n + 1) = P · ρ(n).   (8.10)

The matrix elements are probabilities, so

0 ≤ P_{βα} ≤ 1.   (8.11)

The state α must go somewhere, so

Σ_β P_{βα} = 1.   (8.12)

Not symmetric! Typically P_{βα} ≠ P_{αβ}.

This last point isn't a big surprise: high-energy states are more likely to be left than entered into. However, this means that much of our mathematical intuition and many of our tools, carefully developed for symmetric and Hermitian matrices, won't apply to our transition matrix P. In particular, we cannot assume in general that we can diagonalize our matrix.

It is true in great generality that our matrix P will have eigenvalues λ. Also, it is true that for each distinct eigenvalue there will be at least one right eigenvector¹⁶

P · ρ_λ = λ ρ_λ   (8.13)

and one left eigenvector σ_λ:

σ_λᵀ · P = λ σ_λᵀ.   (8.14)

However, for degenerate eigenvalues there need not be a complete set of eigenvectors, and the left and right eigenvectors usually will not be equal to one another.¹⁷

¹⁶ For example, the matrix (0 1; 0 0) has a double eigenvalue of zero, but only one left and one right eigenvector with eigenvalue zero.

¹⁷ A general matrix M can be put into Jordan canonical form by a suitable change of basis S: M = S J S⁻¹. The matrix J is block diagonal, with one eigenvalue λ associated with each block (but perhaps multiple blocks per λ). A given (say, 3×3) block will be of the form (λ 1 0; 0 λ 1; 0 0 λ), with λ along the diagonal and 1 in the elements immediately above the diagonal. The first column of the block is associated with the right eigenvector for λ; the last row is associated with the left eigenvector. The word canonical here means "simplest form", and doesn't indicate a connection with the canonical ensemble.

For the particular case of our transition matrix P, we can go further. If our Markov chain reaches an equilibrium state ρ* at long times, that state must be unchanged under the time evolution P. That is, P ρ* = ρ*, and thus the equilibrium probability density is a right eigenvector with eigenvalue one. We can show that our Markov chain transition matrix P has such a right eigenvector.

Theorem 8.1. P has at least one right eigenvector ρ* with eigenvalue one.

Sneaky Proof: P has a left eigenvector σ* with eigenvalue one: the vector all of whose components are one, σ*ᵀ = (1, 1, 1, . . . , 1):

(σ*ᵀ · P)_α = Σ_β σ*_β P_{βα} = Σ_β P_{βα} = 1 = σ*_α.   (8.15)

Hence P must have an eigenvalue equal to one, and hence it must also have a right eigenvector with eigenvalue one.
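Theorem 8.1 is easy to watch in action for a small transition matrix. The three-state column-stochastic matrix below is an arbitrary illustrative example (P[β][α] is the probability of moving from α to β, so the columns sum to one, eq. 8.12); iterating P on any starting distribution settles onto a right eigenvector with eigenvalue one:

```python
# An arbitrary ergodic three-state transition matrix; columns sum to 1.
P = [[0.5, 0.2, 0.1],
     [0.3, 0.6, 0.2],
     [0.2, 0.2, 0.7]]

def apply(P, rho):
    # one Markov step: rho(n+1)_b = sum_a P[b][a] rho(n)_a  (eq. 8.10)
    return [sum(P[b][a] * rho[a] for a in range(3)) for b in range(3)]

# Conservation of probability, eq. 8.12: each column sums to one,
# i.e. (1,1,1) is a left eigenvector with eigenvalue one.
assert all(abs(sum(P[b][a] for b in range(3)) - 1.0) < 1e-12
           for a in range(3))

rho = [1.0, 0.0, 0.0]                  # start entirely in state 0
for _ in range(200):
    rho = apply(P, rho)

# rho is now (numerically) stationary: P rho = rho, and it sums to one.
residual = max(abs(x - y) for x, y in zip(apply(P, rho), rho))
print(rho, residual)
```
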



We can also show that all the other eigenvalues have right eigenvectors that sum to zero, since P conserves probability:¹⁸

Theorem 8.2. Any right eigenvector ρ_λ with eigenvalue λ different from one must have components that sum to zero.

Proof: ρ_λ is a right eigenvector: P · ρ_λ = λ ρ_λ. Hence

λ Σ_β (ρ_λ)_β = Σ_β (λ ρ_λ)_β = Σ_β Σ_α P_{βα} (ρ_λ)_α = Σ_α (Σ_β P_{βα}) (ρ_λ)_α = Σ_α (ρ_λ)_α.   (8.16)

This implies that either λ = 1 or Σ_α (ρ_λ)_α = 0.

Markov chains can have more than one stationary probability distribution.¹⁹ They can have transient states, which the system eventually leaves never to return.²⁰ They can also have cycles, which are probability distributions which, like a clock 1 → 2 → 3 → · · · → 12 → 1, shift through a finite number of distinct classes of states before returning to the original one. All of these are obstacles in our quest for finding the equilibrium states in statistical mechanics. We can bypass all of them by studying ergodic Markov chains.²¹ A finite-state Markov chain is ergodic if its transition matrix to some power n has all positive (non-zero) matrix elements: (Pⁿ)_{βα} > 0 for all states α and β.²²

¹⁹ A continuum example of this is given by the KAM theorem of problem 4.2. There is a probability density smeared over each KAM torus which is time-independent.

²⁰ Transient states are important in dissipative dynamical systems, where they are all the states not on the attractors.

²¹ We're compromising here between the standard Markov-chain usage in physics and in mathematics. Physicists usually ignore cycles, and call algorithms which can reach every state ergodic (what mathematicians call irreducible). Mathematicians use the term ergodic to exclude cycles and exclude probability running to infinity (not important here, where we have a finite number of states). They also allow ergodic chains to have transient states: only the attractor need be connected. Chains with Pⁿ everywhere positive, that we're calling ergodic, are called by the mathematicians regular Markov chains. In the problems, to prove a system is ergodic just show that it can reach everywhere (irreducible) and doesn't have cycles.

²² That is, after n steps every state has nonzero probability to reach every other state.

We use a famous theorem, without proving it here:

Theorem 8.3 (Perron-Frobenius Theorem). Let A be a matrix with all non-negative matrix elements such that Aⁿ has all positive elements. Then A has a positive eigenvalue λ₀, of multiplicity one, whose corresponding right and left eigenvectors have all positive components. Furthermore any other eigenvalue λ of A must be smaller, |λ| < λ₀.

For an ergodic Markov chain, we can use theorem 8.2 to see that the Perron-Frobenius eigenvector with all positive components must have eigenvalue λ₀ = 1. We can rescale this eigenvector to sum to one, proving that an ergodic Markov chain has a unique time-independent probability distribution ρ*.

What's the difference between our definition of ergodic Markov chains, and the definition of ergodic we used in section 4.2 in reference to trajectories in phase space? Clearly the two concepts are related: ergodic in phase space meant that we eventually come close to all states on the energy surface, and for finite Markov chains it is the stronger condition that we have non-zero probability of getting between all states in the chain after precisely n steps. Indeed, one can show for finite-state Markov chains that if one can get from every state to every other state by a sequence of moves (that is, the chain is irreducible), and if the

¹⁸ One can also view this theorem as saying that all the right eigenvectors except ρ* are orthogonal to the left eigenvector σ* = (1, 1, 1, . . . , 1)ᵀ.



chain is not cyclic, then it is ergodic (proof not given here). Any algorithm that has a finite probability for each state to remain unchanged (P_{αα} > 0 for all states) is automatically free of cycles (clocks which lose time will get out of synchrony).

It is possible to show that an ergodic Markov chain will take any initial probability distribution ρ(0) and converge to equilibrium, but the proof in general is rather involved. We can simplify it by specializing one more time, to Markov chains that satisfy detailed balance.

A Markov chain satisfies detailed balance if there is some probability distribution ρ* such that²³

P_{αβ} ρ*_β = P_{βα} ρ*_α   (8.17)

for each state α and β. In words, the probability flux from state β to α (the rate times the probability of being in β) balances the probability flux back, in detail (i.e., for every pair of states).

²³ There is an elegant equivalent definition of detailed balance directly in terms of P and not involving the equilibrium probability distribution ρ*: see problem 8.4.

If a physical system is time-reversal invariant (no dissipation, no magnetic fields), and its states are also invariant under time reversal (no states with specified velocities or momenta), then its dynamics automatically satisfy detailed balance. This is easy to see: the equilibrium state is also the equilibrium state under time reversal, so the probability flow from α to β must equal the time-reversed flow from β to α. Quantum systems undergoing transitions between energy eigenstates in perturbation theory usually satisfy detailed balance, since the eigenstates are time-reversal invariant. Most classical models (like the binary alloy in 8.1.2) have states involving only configurational degrees of freedom, which again satisfy detailed balance.

Detailed balance allows us to find a complete set of eigenvectors and right eigenvalues for our transition matrix P. One can see this with a simple transformation. If we divide both sides of equation 8.17 by √(ρ*_α ρ*_β), we create a symmetric matrix Q:

Q_{αβ} = P_{αβ} √(ρ*_β / ρ*_α)
       = P_{αβ} ρ*_β / √(ρ*_α ρ*_β) = P_{βα} ρ*_α / √(ρ*_α ρ*_β)
       = P_{βα} √(ρ*_α / ρ*_β) = Q_{βα}.   (8.18)

This symmetric matrix Q has a complete set of orthonormal eigenvectors τ_λ, which can be turned into right eigenvectors of P when rescaled²⁴ by √ρ*:

(ρ_λ)_α = √(ρ*_α) (τ_λ)_α:   (8.19)

(P · ρ_λ)_α = Σ_β P_{αβ} √(ρ*_β) (τ_λ)_β = Σ_β √(ρ*_α) Q_{αβ} (τ_λ)_β = √(ρ*_α) λ (τ_λ)_α = λ (ρ_λ)_α.   (8.20)

²⁴ This works in reverse to get the eigenvectors of P from those of Q. One multiplies τ_λ by √ρ* to get the right eigenvector ρ_λ, and divides to get the left eigenvector σ_λ, so if detailed balance holds, (σ_λ)_α = (ρ_λ)_α / ρ*_α. In particular, σ₁ = ρ*/ρ* = (1, 1, 1, . . . )ᵀ, as we saw in theorem 8.1.
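Here is a numerical sketch of the symmetrization trick for a three-state chain constructed to satisfy detailed balance. The chain uses a Metropolis-style acceptance rule (a standard construction satisfying 8.17, not a method from this section), with arbitrary illustrative energies and k_B T = 1:

```python
import math

# Boltzmann equilibrium distribution rho* for three states.
E = [0.0, 1.0, 2.0]                    # arbitrary illustrative energies
w = [math.exp(-e) for e in E]
Z = sum(w)
rho_star = [x / Z for x in w]

# Build P[b][a], the rate a -> b: propose each other state with
# probability 1/2, accept with min(1, rho*_b / rho*_a). This makes
# P[b][a] rho*_a = (1/2) min(rho*_a, rho*_b), symmetric in a and b,
# which is exactly detailed balance (eq. 8.17).
n = 3
P = [[0.0] * n for _ in range(n)]
for a in range(n):
    for b in range(n):
        if a != b:
            P[b][a] = 0.5 * min(1.0, rho_star[b] / rho_star[a])
for a in range(n):
    P[a][a] = 1.0 - sum(P[b][a] for b in range(n) if b != a)

# Check detailed balance: P_{ab} rho*_b = P_{ba} rho*_a ...
for a in range(n):
    for b in range(n):
        assert abs(P[a][b] * rho_star[b] - P[b][a] * rho_star[a]) < 1e-12

# ... so Q_{ab} = P_{ab} sqrt(rho*_b / rho*_a) is symmetric (eq. 8.18):
Q = [[P[a][b] * math.sqrt(rho_star[b] / rho_star[a]) for b in range(n)]
     for a in range(n)]
for a in range(n):
    for b in range(n):
        assert abs(Q[a][b] - Q[b][a]) < 1e-12

# ... and rho* is stationary: P rho* = rho*.
new = [sum(P[a][b] * rho_star[b] for b in range(n)) for a in range(n)]
assert all(abs(x - y) < 1e-12 for x, y in zip(new, rho_star))
print("detailed balance, symmetric Q, and stationary rho* all check out")
```
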



Now for the main theorem underlying the algorithms for equilibrating lattice models in statistical mechanics.

Theorem 8.4 (Main Theorem). A system with a finite number of states can be guaranteed to converge to an equilibrium distribution ρ* if the computer algorithm

- is Markovian (has no memory),
- is ergodic (can reach everywhere and is acyclic), and
- satisfies detailed balance.

Proof: Let P be the transition matrix for our algorithm. Since the algorithm satisfies detailed balance, P has a complete set of eigenvectors ρ_λ. Since our algorithm is ergodic there is only one right eigenvector ρ₁ with eigenvalue one, which we can choose to be the stationary distribution ρ*; all the other eigenvalues λ have |λ| < 1. Decompose the initial condition ρ(0) = a₁ ρ* + Σ_{|λ|<1} a_λ ρ_λ. Then²⁵

ρ(n) = P · ρ(n − 1) = Pⁿ · ρ(0) = a₁ ρ* + Σ_{|λ|<1} a_λ λⁿ ρ_λ.   (8.21)

Since the (finite) sum in this equation decays to zero, the density converges to a₁ ρ*. This implies both that a₁ = 1 and that our system converges to ρ* as n → ∞.

²⁵ The eigenvectors with |λ| closest to one will be the slowest to decay. You can get the slowest characteristic time τ for a Markov chain by finding the largest |λ_max| < 1 and setting λ_maxⁿ = e^{−n/τ}.

Thus, to develop a new equilibration algorithm, one must ensure that it is Markov, ergodic, and satisfies detailed balance.
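The geometric convergence of equation 8.21, and footnote 25's estimate of the slowest relaxation rate, can be seen directly. The three-state column-stochastic matrix below is an arbitrary example; its eigenvalues work out to 1, 0.5, and 0.3 (trace 1.8, determinant 0.15), so the late-time decay factor of the distance to ρ* should approach 0.5:

```python
# An arbitrary ergodic three-state transition matrix (columns sum to 1).
P = [[0.5, 0.2, 0.1],
     [0.3, 0.6, 0.2],
     [0.2, 0.2, 0.7]]

def step(rho):
    return [sum(P[b][a] * rho[a] for a in range(3)) for b in range(3)]

# Find the stationary distribution rho* by iterating to convergence.
rho_star = [1 / 3] * 3
for _ in range(500):
    rho_star = step(rho_star)

def dist(rho):
    # total-variation-style distance to the stationary distribution
    return sum(abs(x - y) for x, y in zip(rho, rho_star)) / 2

rho = [1.0, 0.0, 0.0]                  # arbitrary initial condition
d = []
for _ in range(35):
    rho = step(rho)
    d.append(dist(rho))

ratio = d[-1] / d[-2]                  # late-time decay factor per step
print(ratio)                           # approaches |lambda_max| = 0.5
```

Footnote 25's characteristic time for this chain is τ = −1/log(0.5) ≈ 1.44 steps.
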

Exercises

(8.1) The Ising Model. (Computational) for a binary alloy, . . . ). As a lattice gas, M gives the

Youll need the program ising, available on the Web [102]. net concentration, and H corresponds to a chemical po-

The Ising Hamiltonian is tential. Our simulation doesnt conserve the number of

X X spins up, so its not a natural simulation for a bulk lattice

H = J Si Sj H Si , 3.7.1 gas. You can think of it as a grand canonical ensemble,

ij
i or as a model for a lattice gas on a surface exchanging

wherePSi = 1 are spins on a square lattice, and the atoms with the vapor above.

sum ij
is over the four nearest-neighbor bonds (each Play with it. At high temperatures, the spins should not

pair summed once). Its conventional to set the coupling be strongly correlated. At low temperatures the spins

strength J = 1 and Boltzmanns constant kB = 1, which should align all parallel, giving a large magnetization.

amounts to measuring energies and temperatures in units Can you roughly locate the phase transition? Can you

of J. P The constant H is called the external eld, and see growing clumps of aligned spins as T Tc + (i.e., T

M = i Si is called the magnetization. approaching Tc from above)?

As noted in class, the Ising model can be viewed as an (a) Phase diagram. Draw a rough phase diagram in

anisotropic magnet with Si being 2z for the spin at site i, the (H, T ) plane, showing (i) the spin up phase where

or it can represent the occupancy of a lattice site (atom M > 0, (ii) the spin down phase with M < 0, (iii)

or no atom for a lattice gas simulation, copper or gold the paramagnetic phase line where M = 0, (iv) the fer-

James

c P. Sethna, January 4, 2005 Entropy, Order Parameters, and Complexity

146 Computational Stat Mech: Ising and Markov

romagnetic phase line where |M| > 0 for large systems even though H = 0, and (v) the critical point, where at H = 0 the system develops a non-zero magnetization.

Correlations and Susceptibilities: Analytical. The partition function for the Ising model is Z = Σn exp(-βEn), where the states n run over all 2^N possible configurations of the spins, and the free energy F = -kB T log Z.

(b) Show that the average of the magnetization M equals -(∂F/∂H)|T. Derive the formula writing the susceptibility χ0 = (∂M/∂H)|T in terms of ⟨(M - ⟨M⟩)²⟩ = ⟨M²⟩ - ⟨M⟩². (Hint: remember our derivation of formula 5.18, ⟨(E - ⟨E⟩)²⟩ = kB T² C?)

Notice that the program outputs, at each temperature and field, averages of several quantities: ⟨|M|⟩, ⟨(M - ⟨M⟩)²⟩, ⟨E⟩, ⟨(E - ⟨E⟩)²⟩. Unfortunately, E and M in these formulas are measured per spin, while the formulas in the class and the problem set are measured for the system as a whole. You'll need to multiply the squared quantities by the number of spins to make a comparison. To make that easier, change the system size to 100×100, using configure. While you're doing that, increase speed to ten or twenty to draw the spin configuration fewer times. To get good values for these averages, equilibrate for a given field and temperature, reset, and then start averaging.

(c) Correlations and Susceptibilities: Numerical. Check the formulas for C and χ from part (b) at H = 0 and T = 3, by measuring the fluctuations and the averages, and then changing by ∆H = 0.02 or ∆T = 0.1 and measuring the averages again. Check them also for T = 2, where ⟨M⟩ ≠ 0.²⁶

There are systematic series expansions for the Ising model at high and low temperatures, using Feynman diagrams (see section 10.2). The first terms of these expansions are famous, and easy to understand.

Low Temperature Expansion for the Magnetization. At low temperatures we can assume all spins flip alone, ignoring clusters.

(d) What is the energy for flipping a spin antiparallel to its neighbors? Equilibrate at low temperature T = 1.0, and measure the magnetization. Notice that the primary excitations are single spin flips. In the low temperature approximation that the flipped spins are dilute (so we may ignore the possibility that two flipped spins touch or overlap), write a formula for the magnetization. (Remember, each flipped spin changes the magnetization by 2.) Check your prediction against the simulation. (Hint: see equation 10.14.)

The magnetization (and the specific heat) are exponentially small at low temperatures because there is an energy gap to spin excitations in the Ising model,²⁷ just as there is a gap to charge excitations in a semiconductor or an insulator.

High Temperature Expansion for the Susceptibility. At high temperatures, we can ignore the coupling to the neighboring spins.

(e) Calculate a formula for the susceptibility of a free spin coupled to an external field. Compare it to the susceptibility you measure at high temperature T = 100 for the Ising model (say, ∆M/∆H with ∆H = 1. Why is H = 1 a small field in this case?)

Your formula for the high-temperature susceptibility is known more generally as Curie's law.

(8.2) Coin Flips and Markov Chains. (Mathematics, Basic)
A physicist, testing the laws of chance, flips a coin repeatedly until it lands tails.

(a) Treating the two states of the physicist ("still flipping" and "done") as states in a Markov process, the current probability vector then is ρ = (ρflipping, ρdone). Write the transition matrix P, giving the time evolution P ρn = ρn+1, assuming that the coin is fair.

(b) Find the eigenvalues and right eigenvectors of P. Which eigenvector is the steady state ρ∗? Call the other eigenvector ρ̃. For convenience, normalize ρ̃ so that its first component equals one.

(c) Assume an arbitrary initial state is written ρ0 = A ρ∗ + B ρ̃. What are the conditions on A and B needed to make ρ0 a valid probability distribution? Write ρn as a function of A and B, ρ∗ and ρ̃.

(8.3) Red and Green Bacteria (Mathematics) (From Princeton. [115])
A growth medium at time t = 0 has 500 red bacteria and 500 green bacteria. Each hour, each bacterium divides

²⁶ Be sure to wait until the state is equilibrated before you start! Below Tc this means the state should not have red and black domains, but be all in one ground state. You may need to apply a weak external field for a while to remove stripes at low temperatures.
²⁷ Not all real magnets have a gap: if there is a spin rotation symmetry, one can

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

8.2 Markov Chains 147

in two. A color-blind predator eats exactly 1000 bacteria per hour.²⁸

(a) After a very long time, what is the probability distribution for the number of red bacteria in the growth medium?

(b) Roughly how long will it take to reach this final state?²⁹

(c) Assume that the predator has a 1% preference for green bacteria (implemented as you choose). Roughly how much will this change the final distribution?

(8.4) Detailed Balance. (Basic)
In an equilibrium system, for any two states α and β with equilibrium probabilities ρ∗α and ρ∗β, detailed balance states (equation 8.17) that

    Pβα ρ∗α = Pαβ ρ∗β,    (8.22)

that is, the equilibrium flux of probability from α to β is the same as the flux backward from β to α. It's both possible and elegant to reformulate the condition for detailed balance so that it doesn't involve the equilibrium probabilities. Consider three states of the system, α, β, and γ.

(a) Assume that each of the three types of transitions among the three states satisfies detailed balance. Eliminate the equilibrium probability densities to write the unknown rate Pαβ in terms of the five other rates. (Hint: see the equation below for the answer.)

If we view the three states α, β, and γ to be around a circle, you've derived a relationship between the rates going clockwise and the rates going counter-clockwise around the circle,

    Pαβ Pβγ Pγα = Pαγ Pγβ Pβα.    (8.23)

It is possible to show conversely that if every triple of states in a Markov chain satisfies the condition you derived then it satisfies detailed balance (i.e., that there is at least one probability density ρ∗ which makes the probability fluxes between all pairs of states equal). The only complication arises because some of the rates can be zero.

(b) Suppose P is the transition matrix for some Markov process satisfying the condition 8.23 for every triple of states α, β, and γ. Assume for simplicity that there is a state α with non-zero transition rates from all other states β. Construct a probability density ρ∗ that demonstrates that P satisfies detailed balance (equation 8.22). (Hint: If you assume a value for ρ∗α, what must ρ∗β be to ensure detailed balance for the pair? Show that this candidate distribution satisfies detailed balance for any two states.)

(8.5) Heat Bath, Metropolis, and Wolff. (Mathematics, Computation)
There are a number of different methods for equilibrating lattice simulations like the Ising model. They give the model different dynamics, but keep the equilibrium properties unchanged. This is guaranteed by the theorem we asserted in class on Markov processes: if they are ergodic and obey detailed balance, they converge to the equilibrium distribution. We'll first look at the two most common algorithms. We'll then consider the most sophisticated, sneaky use of the theorem I know of.

The simulation ising in problem 8.1 uses the heat-bath algorithm, which thermalizes one spin at a time:

Heat Bath
(a) Pick a spin at random,
(b) Calculate the energies E↑ and E↓ for the spin being up or down given its current environment.
(c) Thermalize it: place it up with probability exp(-E↑/T)/(exp(-E↑/T) + exp(-E↓/T)), down with probability exp(-E↓/T)/(exp(-E↑/T) + exp(-E↓/T)).

Another popular choice is the Metropolis algorithm, which also flips a single spin at a time:

Metropolis
(a) Pick a spin at random,
(b) Calculate the energy ∆E for flipping the spin.
(c) If ∆E < 0 flip it; if ∆E > 0, flip it with probability exp(-∆E/T).

(a) Show that Heat Bath and Metropolis satisfy detailed balance. Note that they are ergodic and Markovian (no memory), and hence argue that they will lead to thermal equilibrium. Is Metropolis more efficient (fewer random numbers needed to get to equilibrium)? Why?

²⁸ This question is purposely open-ended, and rough answers to parts (b) and (c) within a factor of two are perfectly acceptable. Numerical and analytical methods are both feasible.
²⁹ Within the accuracy of this question, you may assume either that one bacterium reproduces and then one is eaten 1000 times per hour, or that at the end of each hour all the bacteria reproduce and then 1000 are consumed. The former method is more convenient for analytical work finding eigenvectors; the latter can be used to motivate approaches using the diffusion of probability with an N-dependent diffusion constant.
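The two single-spin-flip algorithms of exercise 8.5 fit in a few lines of code. What follows is an illustrative sketch, not the book's ising program: the function name, the lattice representation, and the random-number generator are my own choices (kB = 1, so the Boltzmann factor is exp(-∆E/T)):

```python
import numpy as np

def metropolis_sweep(S, T, J=1.0, H=0.0, rng=None):
    """One Metropolis sweep (L*L attempted single-spin flips) on an
    L x L periodic lattice of +/-1 spins: flip if dE <= 0, otherwise
    flip with probability exp(-dE/T)."""
    if rng is None:
        rng = np.random.default_rng()
    L = S.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        nbr = (S[(i + 1) % L, j] + S[(i - 1) % L, j] +
               S[i, (j + 1) % L] + S[i, (j - 1) % L])
        dE = 2.0 * S[i, j] * (J * nbr + H)   # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            S[i, j] = -S[i, j]
    return S

# Well below Tc an aligned lattice stays aligned (dE = 8J, acceptance ~ e^-80):
S = np.ones((8, 8), dtype=int)
metropolis_sweep(S, T=0.1)
```

The heat-bath rule differs only in the last step: instead of the accept/reject test, set the spin up with probability 1/(1 + exp(-(E↓ - E↑)/T)) regardless of its current value.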


Near the critical point Tc where the system develops a magnetization, any single-spin-flip dynamics becomes very slow. Wolff (Phys. Rev. Lett. 62, 361 (1989)), improving on ideas of Swendsen and Wang (Phys. Rev. Lett. 58, 86 (1987)), came up with a clever method to flip whole clusters of spins.

[Fig. 8.5 Cluster Flip. The region inside the dotted line is flipped in one Wolff move. Let this configuration be A.]

[Fig. 8.6 Cluster Flip. Let this configuration be B. Let the cluster flipped be C. Notice that the boundary of C has n↑ = 2, n↓ = 6.]

Wolff Cluster Flips
(a) Pick a spin at random, remember its direction D = ±1, and flip it.
(b) For each of the four neighboring spins, if it is in the direction D, flip it with probability p.
(c) For each of the new flipped spins, recursively flip their neighbors as in (b).

Because with finite probability you can flip any spin, the Wolff algorithm is ergodic. It's obviously Markovian when viewed as a move which flips a cluster. Let's see that it satisfies detailed balance, when we pick the right value of p for the given temperature.

(b) Show for the two configurations shown above that EB - EA = 2(n↑ - n↓)J. Argue that this will be true for flipping any cluster of up spins to down.

The cluster flip can start at any site α in the cluster C. The ratio of rates ΓA⇒B/ΓB⇒A depends upon the number of times the cluster chose not to grow on the boundary. Let PCα be the probability that the cluster grows internally from site α to the cluster C (ignoring the moves which try to grow outside the boundary). Then

    ΓA⇒B = Σα PCα (1 - p)^n↑,    (8.24)
    ΓB⇒A = Σα PCα (1 - p)^n↓,    (8.25)

since the cluster must refuse to grow n↑ times when starting from the up-state A, and n↓ times when starting from B.

(c) What value of p lets the Wolff algorithm satisfy detailed balance at temperature T?

Find a Windows machine. Download the Wolff simulation [103]. Using the parameter reset (top left) reset the temperature to 2.3, the algorithm to Heat Bath, and the height and width to 512. Watch the slow growth of the characteristic cluster sizes. Now change to Wolff, and see how much faster the code is. Also notice that each sweep almost completely rearranges the pattern: the correlation time is much smaller for the Wolff algorithm. (See [75, secs. 4.2 and 4.3] for more details on the Wolff algorithm.)

(8.6) Stochastic Cells. (Biology, Computation) (With Myers. [72])
Living cells are amazingly complex mixtures of a variety of complex molecules (RNA, DNA, proteins, lipids ...) that are constantly undergoing reactions with one another. This complex of reactions has been compared to computation: the cell gets input from external and internal sensors, and through an intricate series of reactions produces an appropriate response. Thus, for example, receptor cells in the retina "listen" for light and respond by triggering a nerve impulse.

The kinetics of chemical reactions are usually described using differential equations for the concentrations of the various chemicals, and rarely are statistical fluctuations considered important. In a cell, the numbers of molecules of a given type can be rather small: indeed, there is (often) only one copy of the relevant part of DNA for a given reaction. It's an important question whether and when we may describe the dynamics inside the cell using continuous concentration variables, even though the actual numbers of molecules are always integers.
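Returning to exercise 8.5 for a moment: the Wolff cluster move can be sketched as below. This is an illustration under my own conventions (an iterative stack instead of recursion), and p is set to 1 - exp(-2J/T), which is my candidate answer to part (c); treat that value as a claim to verify, not a given:

```python
import numpy as np

def wolff_move(S, T, J=1.0, rng=None):
    """One Wolff cluster flip on an L x L periodic lattice of +/-1 spins.
    Returns the number of spins flipped."""
    if rng is None:
        rng = np.random.default_rng()
    p = 1.0 - np.exp(-2.0 * J / T)      # bond-adding probability (candidate for part (c))
    L = S.shape[0]
    i, j = rng.integers(0, L, size=2)
    D = S[i, j]                          # direction of the seed spin
    S[i, j] = -D                         # flip the seed immediately
    stack, flipped = [(i, j)], 1
    while stack:
        x, y = stack.pop()
        for nx, ny in (((x + 1) % L, y), ((x - 1) % L, y),
                       (x, (y + 1) % L), (x, (y - 1) % L)):
            if S[nx, ny] == D and rng.random() < p:
                S[nx, ny] = -D           # grow the cluster, flipping as we go
                stack.append((nx, ny))
                flipped += 1
    return flipped

# As T -> 0, p -> 1 and a single move flips the whole aligned lattice:
S = np.ones((8, 8), dtype=int)
n = wolff_move(S, T=0.01)
```

Flipping spins as the cluster grows (rather than at the end) conveniently prevents a site from being added twice.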


[Fig. 8.7 Dimerization reaction. A Petri net diagram for a dimerization reaction, with dimerization rate kb and dimer dissociation rate ku.]

Consider a simple dimerization reaction: a molecule M (called the monomer) joins up with another monomer and becomes a dimer D: 2M ⇌ D. Proteins in cells often form dimers: sometimes (as here) both proteins are the same (homodimers) and sometimes they are different proteins (heterodimers). Suppose the forward reaction rate is kb and the backward reaction rate is ku. Figure 8.7 shows this as a Petri net [37] with each reaction shown as a box with incoming arrows showing species that are consumed by the reaction, and outgoing arrows showing species that are produced by the reaction: the number consumed or produced (the stoichiometry) is given by a label on each arrow.³⁰ There are thus two reactions: the backward unbinding reaction rate per unit volume is ku[D] (each dimer disassociates with rate ku), and the forward binding reaction rate per unit volume is kb[M]² (since each monomer must wait for a collision with another monomer before binding, the rate is proportional to the monomer concentration squared).³¹

The brackets [] denote concentrations. We assume, as does reference [28], that the volume per cell is such that one molecule per cell is 1 nM (10⁻⁹ moles per liter). For convenience, we shall pick nanomoles as our unit of concentration, so [M] is also the number of monomers in the cell; at t = 0 all N monomers are unbound.

(a) Continuum dimerization. Write the differential equation for dM/dt treating M and D as continuous variables. (Hint: remember that two M molecules are consumed in each reaction.) What are the equilibrium concentrations for [M] and [D] for N = 2 molecules in the cell, assuming these continuous equations and the values above for kb and ku? For N = 90 and N = 10100 molecules? Numerically solve your differential equation for M(t) for N = 2 and N = 90, and verify that your solution settles down to the equilibrium values you found.

For large numbers of molecules in the cell, we expect that the continuum equations may work well, but for just a few molecules there surely will be relatively large fluctuations. These fluctuations are called shot noise, named in early studies of electrical noise at low currents due to individual electrons in a resistor. We can implement a simple Monte Carlo algorithm to simulate this shot noise.³² Suppose the reactions have rates Γi, with total rate Γtot = Σi Γi. The idea is that the expected time to the next reaction is 1/Γtot, and the probability that the next reaction will be j is Γj/Γtot. To simulate until a final time tf, the algorithm runs as follows:

(a) Calculate a list of the rates of all reactions in the system.
(b) Find the total rate Γtot.
(c) Pick a random time twait with probability distribution ρ(t) = Γtot exp(-Γtot t).
(d) If the current time t plus twait is bigger than tf, no further reactions will take place: return.
(e) Otherwise,
    - Increment t by twait,
    - Pick a random number r uniformly distributed in the range [0, Γtot),
    - Pick the reaction j for which Σi<j Γi ≤ r < Σi<j+1 Γi (that is, r lands in the j-th interval of the sum forming Γtot),
    - Execute that reaction, by incrementing each chemical involved by its stoichiometry.
(f) Repeat.

³⁰ An enzyme that is necessary but not consumed is shown with an incoming and outgoing arrow.
³¹ In the discrete case, the rate will be proportional to M(M - 1), since a monomer
³² In the context of chemical simulations, this algorithm is named after Gillespie [33]; the same basic approach was used just a bit earlier in the Ising model by Bortz, Kalos and Lebowitz [13], and is called continuous-time Monte Carlo in that context.
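The six steps above translate directly into code. Here is a minimal sketch for the dimerization reaction (the function and variable names are mine; the discrete binding rate kb·M·(M - 1) anticipates the correction discussed in footnote 31):

```python
import numpy as np

def gillespie_dimerization(N, kb, ku, tf, rng=None):
    """Gillespie simulation of 2M <-> D, starting from M = N, D = 0.
    Returns lists of event times and monomer numbers."""
    if rng is None:
        rng = np.random.default_rng()
    t, M, D = 0.0, N, 0
    times, monomers = [0.0], [N]
    while True:
        rates = (kb * M * (M - 1), ku * D)    # (binding, unbinding)
        total = rates[0] + rates[1]
        if total == 0.0:
            break
        twait = rng.exponential(1.0 / total)  # rho(t) = total * exp(-total * t)
        if t + twait > tf:
            break                             # no further reactions before tf
        t += twait
        r = rng.uniform(0.0, total)           # choose which reaction fires
        if r < rates[0]:
            M, D = M - 2, D + 1               # binding consumes two monomers
        else:
            M, D = M + 2, D - 1
        times.append(t)
        monomers.append(M)
    return times, monomers

times, monomers = gillespie_dimerization(N=10, kb=1.0, ku=2.0, tf=1.0)
```

Note that the discrete rate kb·M·(M - 1) vanishes at M = 1, so the monomer count can never go negative.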


As mentioned earlier, the binding reaction rate for M total monomers binding is no longer kb M² for discrete molecules: it's kb M(M - 1) (where again [M] ≈ M for a one nanoliter cell, when using concentrations in nanomolar).³³

(b) Stochastic dimerization. Implement this algorithm for the dimerization reaction of part (a). Simulate for N = 2, N = 90, and N = 10100 and compare a few stochastic realizations with the continuum solution. How large a value of N do you need for the individual reactions to be well described by the continuum equations (say, fluctuations less than 20% at late times)?

Measuring the concentrations in a single cell is often a challenge. Experiments often average over many cells. Such experiments will measure a smooth time evolution even though the individual cells are noisy. Let's investigate whether this ensemble average is well described by the continuum equations.

(c) Average Stochastic dimerization. Find the average of many realizations of your stochastic dimerization in part (b), for N = 2 and N = 90, and compare with your deterministic solution. How much is the long-term average shifted by the stochastic noise? How large a value of N do you need for the ensemble average of M(t) to be well described by the continuum equations (say, shifted by less than 5% at late times)?

(8.7) The Repressilator. (Biology, Computation) (With Myers. [72])
Reading: Reference [28], Michael B. Elowitz and Stanislaw Leibler, "A synthetic oscillator network of transcriptional regulators", Nature 403, 335-338 (2000).

The central dogma of molecular biology is that the flow of information is from DNA to RNA to proteins: DNA is transcribed into RNA, which then is translated into protein.

Now that the genome is sequenced, it is thought that we have the parts list for the cell. All that remains is to figure out how they work together. The proteins, RNA, and DNA form a complex network of interacting chemical reactions, which governs metabolism, responses to external stimuli, reproduction (proliferation), differentiation into different cell types, and (when the system perceives itself to be breaking down in dangerous ways) programmed cell death, or apoptosis.

Our understanding of the structure of these interacting networks is growing rapidly, but our understanding of the dynamics is still rather primitive. Part of the difficulty is that the cellular networks are not neatly separated into different modules: a given protein may participate in what would seem to be several separate regulatory pathways. In this exercise, we will study a simple model system, the Repressilator. This experimental system involves three proteins each of which inhibits the formation of the next. They were added to the bacterium E. coli, with hopefully minimal interactions with the rest of the biological machinery of the cell. We will implement the stochastic model that the authors used to describe their experimental system [28], in order to
- Implement in a tangible system an example both of the central dogma and of transcriptional regulation: the control by proteins of DNA expression into RNA,
- Introduce sophisticated Monte Carlo techniques for simulations of stochastic reactions,
- Introduce methods for automatically generating continuum descriptions from reaction rates, and
- Illustrate the shot noise fluctuations due to small numbers of molecules and the telegraph noise fluctuations due to finite rates of binding and unbinding of the regulating proteins.

Figure 8.8 shows the biologist's view of the repressilator network. Three proteins (TetR, λ CI, and LacI) each repress the formation of the next. We shall see that, under appropriate circumstances, this can lead to spontaneous oscillations: each protein peaks in turn, suppressing the suppressor of its suppressor, leading to its own later decrease.

[Fig. 8.8 The biologist's view of the Repressilator network. The T-shapes are blunt arrows, signifying that the protein at the tail (bottom of the T) suppresses the production of the protein at the head. Thus LacI (pronounced lack-eye) suppresses TetR (tet-are), which suppresses λ CI (lambda-see-one). This simple description summarizes a complex series of interactions (see figure 8.9).]

³³ Without this change, if you start with an odd number of molecules your concentrations can go negative!
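For comparison with the stochastic runs of parts (b) and (c), part (a)'s continuum equation can be integrated with a simple Euler loop. A sketch of one possible answer to part (a), dM/dt = -2 kb M² + 2 ku D with D = (N - M)/2 (check this derivation yourself; the kb and ku values below are placeholders for the ones given in the exercise):

```python
def dMdt(M, N, kb, ku):
    """Continuum rate equation for 2M <-> D: each binding event consumes
    two monomers (event rate kb*M^2), each unbinding releases two (rate
    ku*D), with D = (N - M)/2 fixed by conservation of M + 2D = N."""
    return -2.0 * kb * M * M + 2.0 * ku * (N - M) / 2.0

def integrate_M(N, kb, ku, tf=5.0, dt=1e-4):
    """Forward-Euler integration of M(t) from M(0) = N out to t = tf."""
    M = float(N)
    for _ in range(int(tf / dt)):
        M += dt * dMdt(M, N, kb, ku)
    return M

# With the placeholder values kb = 1, ku = 2, the N = 90 equilibrium is
# the root of kb*M^2 = ku*(N - M)/2, which is M = 9.
M_eq = integrate_M(90, 1.0, 2.0)
```

The long-time value is a useful check on both the stochastic simulation and any fancier ODE integrator you substitute later (exercise 8.9).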


The biologist's notation summarizes a much more complex picture. The LacI protein, for example, can bind to one or both of the transcriptional regulation or operator sites ahead of the gene that codes for the tetR mRNA.³⁴ When bound, it largely blocks the translation of DNA into tetR.³⁵ The level of tetR will gradually decrease as it degrades; hence less TetR protein will be translated from the tetR mRNA. The resulting network of ten reactions is depicted in figure 8.9, showing one third of the total repressilator network. The biologists' shorthand (figure 8.8) does not specify the details of how one protein represses the production of the next. The larger diagram, for example, includes two operator sites for the repressor molecule to bind to, leading to three states (P0, P1, and P2) of the promotor region depending upon how many LacI proteins are bound.

[Fig. 8.9 The Petri net version [37] of one-third of the Repressilator network (the LacI repression of TetR). (Thanks to Myers [72].) The solid lighter vertical rectangles represent binding reactions A + B → C, with rate kb[A][B]; the open vertical rectangles represent unbinding C → A + B, with rate ku[C]. The horizontal rectangles represent catalyzed synthesis reactions C → C + P, with a rate proportional to [C]; the darker ones represent transcription (formation of mRNA), and the lighter ones represent translation (formation of protein). The black vertical rectangles represent degradation reactions, A → nothing, with rate kd[A]. (The stoichiometry of all the arrows is one.) The LacI protein (top) can bind to the DNA in two promoter sites ahead of the gene coding for tetR: when bound, it largely blocks the transcription (formation) of tetR mRNA. P0 represents the promotor without any LacI bound; P1 represents the promotor with one site blocked, and P2 represents the doubly-bound promotor. LacI can bind to one or both of the promotor sites, changing Pi to Pi+1, or correspondingly unbind: the unbinding rate for the protein is modeled in reference [28] to be faster when only one site is occupied. The unbound P0 state transcribes tetR mRNA quickly, and the bound states transcribe it slowly (leaky repression). The tetR mRNA then catalyzes the formation of the TetR protein.³⁶]

If you are not provided with it, you may retrieve a simulation package for the Repressilator from the book Web site [105].

(a) Run the simulation for at least 6000 seconds and plot the protein, RNA, and promotor states as a function of time. Notice that
- The protein levels do oscillate, as in figure 1(c) in reference [28],
- There are significant noisy-looking fluctuations,
- There are many more proteins than RNA.

We will study this noise in parts (c) and (d); it will be due to the low numbers of RNA molecules in the cell, and to the discrete fluctuations between the three states of the promotor sites. Before we do this, we should (a) increase the efficiency of the simulation, and (b) compare it to the continuum simulation that would be obtained if there were no fluctuations.

To see how important the fluctuations are, we should compare the stochastic simulation to the solution of the continuum reaction rate equations (as we did in exercise 8.6). In reference [28], the authors write a set of

³⁴ Messenger RNA (mRNA) codes for proteins. Other forms of RNA can serve as
³⁵ RNA polymerase, the molecular motor responsible for transcribing DNA into RNA, needs to attach to the DNA at a promotor site. By binding to the adjacent operator sites, our repressor protein inhibits this attachment and hence partly blocks transcription. The residual transcription is called "leakiness".
³⁶ Proteins by convention have the same names as their mRNA, but start with


six differential equations giving a continuum version of the stochastic simulation. These equations are simplified: they both integrate out or coarse-grain away the promotor states from the system, deriving a Hill equation (see exercise 5.9) for the mRNA production, and they also rescale their variables in various ways. Rather than typing in their equations and sorting out these rescalings, it is convenient and illuminating to write a simple routine to generate the continuum differential equations directly from our reaction rates.

(b) Write a DeterministicRepressilator, derived from Repressilator just as StochasticRepressilator was. Write a routine dcdt(c, t), that
- Sets the chemical amounts in the reaction network to the values in the array c,
- Sets a vector dcdt (of length the number of chemicals) to zero,
- For each reaction, computes its rate and, for each chemical whose stoichiometry is changed by the reaction, adds the stoichiometry change times the rate to the corresponding entry of dcdt.
Call a routine to integrate the resulting differential equation (as described in the last part of exercise 8.9, for example), and compare your results to those of the stochastic simulation.

The stochastic simulation has significant fluctuations away from the continuum equation. Part of these fluctuations are due to the fact that the numbers of proteins and mRNAs are small: in particular, the mRNA numbers are significantly smaller than the protein numbers.

(c) Write a routine that creates a stochastic repressilator network that multiplies the mRNA concentrations by RNAFactor without otherwise affecting the continuum equations. (That is, multiply the initial concentrations and the transcription rates by RNAFactor, and divide the translation rate by RNAFactor.) Try boosting the RNAFactor by ten and one hundred. Do the RNA and protein fluctuations become significantly smaller? This noise, due to the discrete, integer values of chemicals in the cell, is analogous to the shot noise seen in electrical circuits due to the discrete quantum of electric charge. It scales, as do most fluctuations, as the square root of the number of molecules.

A continuum description of the binding of the proteins to the operator sites on the DNA seems particularly dubious: a variable that must be zero or one is replaced by a continuous evolution between these extremes. (Such noise in other contexts is called telegraph noise, in analogy to the telegraph, which is either silent or sending as the operator taps the key.) The continuum description is accurate in the limit where the binding and unbinding rates are fast compared to all of the other changes in the system: the protein and mRNA variations then see the average, local equilibrium concentration. On the other hand, if the rates are slow compared to the response of the mRNA and protein, the latter can have a switching appearance.

(d) Incorporate a telegraphFactor into your stochastic repressilator routine, that multiplies the binding and unbinding rates. Run for 1000 seconds with RNAFactor = 10 (to suppress the shot noise) and telegraphFactor = 0.001. Do you observe features in the mRNA curves that appear to switch as the relevant proteins unbind and bind?

Advanced Algorithms: The simulation you will be given implements the Gillespie algorithm discussed in exercise 8.6. At each step, the rates of all possible reactions are calculated, in order to randomly choose when and which the next reaction will be. For a large, loosely connected system of reactions there is no need to recalculate each rate, only the rates which have changed due to the previous reaction. Keeping track of the dependency network (which chemical amounts affect which reactions change the amounts of which chemicals) is relatively simple [71].

(e) Alter the reaction network to store the current reaction rates. Add a function UpdateRates(reac) to the reaction network, which for each chem whose stoichiometry is changed by reac, updates the rates for each reaction affected by the amount of chem. Alter the Step method of the stochastic repressilator simulation to use the stored current reaction rates (rather than recomputing them) and to call UpdateRates with the chosen reaction before returning. Time your new routine, and compare to the speed of the old one. A network of thirty reactions for fifteen chemical components is rather small on biological scales. The dependency network algorithm should be significantly faster for large systems.

(8.8) Entropy Increases! Markov chains. (Math)


Entropy is Concave (Convex downward) doesnt increase in Hamiltonian systems. Let us show

f( a+(1-)b ) > f(a) + (1-)f(b)

0.4 that it does increase for Markov chains.37

The Markov chain is implicitly exchanging energy with

f( a+(1-)b ) a heat bath at the temperature T . Thus to show that

0.3 the entropy for the world as a whole increases, we must

f(a) show that S E/T increases, where S is the en-

-x log(x)

the heat bath. Hence, showing that entropy increases for

f(b) our Markov process is equivalent to showing that the free

0.1 f(a) + (1-) f(b) energy E T S decreases.

Let P be the transition matrix for a Markov process,

0 satisfying detailed balance with energy E at tempera-

0 0.2 0.4 0.6 0.8 1 ture T . The current probability of being in state is .

x

The free energy

Fig. 8.10 For x 0, f (x) = x log x is strictly convex

downward (concave) as a function of the probabilities: for X X

0 < < 1, the linear interpolation lies below the curve. F = E TS = E + kB T log . (8.27)

(c) Show that the free energy decreases for a Markov pro-

Convexity arguments are a basic tool in formal statistical mechanics. The function f(x) = −x log x is strictly concave (convex downward) for x ≥ 0 (figure 8.10): this is easily shown by noting that its second derivative is negative in this region.

(a) Convexity for sums of many terms. If Σ_α μ_α = 1, and if for all α both μ_α ≥ 0 and x_α ≥ 0, show by induction on the number of states M that if g(x) is concave for x ≥ 0,

    g(Σ_{α=1}^{M} μ_α x_α) ≥ Σ_{α=1}^{M} μ_α g(x_α).    (8.26)

(Hint: In the definition of concave, f(λa + (1−λ)b) ≥ λf(a) + (1−λ)f(b), take (1−λ) = μ_{M+1} and b = x_{M+1}. Then a is a sum of M terms, rescaled from their original values. Do the coefficients of x_α in a sum to one? Can we apply induction?)

Microcanonical Entropy is Maximum. In problem set 2, you showed that the microcanonical ensemble was an extremum of the entropy, using Lagrange multipliers. We can use the convexity of −x log x to show that it's actually a global maximum.

(b) Using equation 8.26 for g(x) = −x log x and μ_α = 1/M, show that the entropy for a system of M states −k_B Σ_α ρ_α log ρ_α ≤ k_B log M, the entropy of the (uniform) microcanonical ensemble.

Markov Chains: Entropy Increases! In problem set 2 you also noticed that, formally speaking, entropy does not increase in Hamiltonian systems. We can show, however, that the free energy continuously decreases for a Markov process.37 In particular, using equation 8.26, show that the free energy for ρ_α^{(n+1)} = Σ_β P_{αβ} ρ_β^{(n)} is less than or equal to the free energy for ρ^{(n)}. You may use the properties of the Markov transition matrix P (0 ≤ P_{αβ} ≤ 1 and Σ_α P_{αβ} = 1), and detailed balance (P_{αβ} ρ*_β = P_{βα} ρ*_α, where ρ*_α = exp(−E_α/k_B T)/Z). (Hint: you'll want to use μ_β = P_{αβ} in equation 8.26, but the entropy will involve P_{βα}, which is not the same. Use detailed balance to convert from one to the other.)

(8.9) Solving ODEs: The Pendulum. (Computational) (With Myers. [72])
Reading: Numerical Recipes [81], chapter 16.

Physical systems usually evolve continuously in time: their laws of motion are differential equations. Computer simulations must approximate these differential equations using discrete time steps. In this exercise, we will introduce some common methods for simulating differential equations using the simple example of the pendulum:

    d²θ/dt² = θ̈ = −(g/L) sin(θ).    (8.28)

This equation gives the motion of a pendulum with a point mass at the tip of a massless rod38 of length L: rederive it using a free-body diagram.

Go to our Web site [105] and download the pendulum files for the language you'll be using. The animation

37 We know that the Markov chain eventually evolves to the equilibrium state, and we argued that the latter minimizes the free energy. What we're showing here is that the free energy goes continuously downhill for a Markov chain.
38 We'll depict our pendulum emphasizing the rod rather than the mass: the equa-
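The free-energy statement in the Markov-chain part can be checked numerically before attempting the proof. The sketch below is not part of the exercise; the three-state energies and temperature are made up for illustration, and the Metropolis-style rates are just one choice obeying detailed balance. It iterates ρ^(n+1)_α = Σ_β P_{αβ} ρ^(n)_β and records F = Σ_α ρ_α E_α + k_B T Σ_α ρ_α log ρ_α at each step.

```python
import math

def heat_bath_matrix(E, kT):
    """Column-stochastic transition matrix P[a][b] (rate b -> a) satisfying
    detailed balance with rho*_a proportional to exp(-E_a/kT)."""
    n = len(E)
    rho_star = [math.exp(-Ea / kT) for Ea in E]
    P = [[0.0] * n for _ in range(n)]
    for b in range(n):
        for a in range(n):
            if a != b:
                # Metropolis-style acceptance with a uniform proposal
                P[a][b] = min(1.0, rho_star[a] / rho_star[b]) / n
        P[b][b] = 1.0 - sum(P[a][b] for a in range(n) if a != b)
    return P

def free_energy(rho, E, kT):
    """F = E - TS = sum_a rho_a E_a + kT sum_a rho_a log rho_a."""
    return (sum(r * Ea for r, Ea in zip(rho, E))
            + kT * sum(r * math.log(r) for r in rho if r > 0))

E, kT = [0.0, 1.0, 2.0], 0.7        # illustrative values, not from the text
P = heat_bath_matrix(E, kT)
rho = [0.1, 0.1, 0.8]               # arbitrary starting distribution
F = [free_energy(rho, E, kT)]
for _ in range(50):
    rho = [sum(P[a][b] * rho[b] for b in range(3)) for a in range(3)]
    F.append(free_energy(rho, E, kT))
# F[0] >= F[1] >= ... : the free energy goes continuously downhill
```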

© James P. Sethna, January 4, 2005. Entropy, Order Parameters, and Complexity

154 Computational Stat Mech: Ising and Markov

should show a pendulum oscillating from an initial condition θ₀ = 2π/3, θ̇₀ = 0; the equations being solved have g = 9.8 m/s² and L = 1 m.

There are three independent criteria for picking a good algorithm for solving differential equations: fidelity, accuracy, and stability.

Fidelity. Notice that in our time-step algorithm, we did not make the straightforward choice of using the current (θ(t), θ̇(t)) to produce (θ(t+δ), θ̇(t+δ)). Rather, we used θ(t) to calculate the acceleration and update θ̇, and then used θ̇(t+δ) to calculate θ(t+δ):

    θ̇(t+δ) = θ̇(t) + θ̈(t) δ    (8.29)
    θ(t+δ) = θ(t) + θ̇(t+δ) δ

Wouldn't it be simpler and make more sense to update θ and θ̇ simultaneously from their current values, so θ(t+δ) = θ(t) + θ̇(t) δ? (This simplest of all time-stepping schemes is called the Euler method, and should not be used for ordinary differential equations, although it is sometimes used in partial differential equations.)

(a) Try it. First, see why reversing the order of the updates to θ and θ̇,

    θ(t+δ) = θ(t) + θ̇(t) δ    (8.30)
    θ̇(t+δ) = θ̇(t) + θ̈(t) δ

in our loop would give us a simultaneous update. Swap these two lines in the code, and watch the pendulum swing for several turns, until it starts looping the loop. Is the new algorithm as good as the old one? (Make sure you switch the two lines back afterwards.)

The simultaneous update scheme is just as accurate as the one we chose, but it is not as faithful to the physics of the problem: its fidelity is not as good. For subtle reasons we won't explain here, updating θ̇ first and then θ allows our algorithm to exactly conserve an approximation to the energy: it's called a symplectic algorithm.39 Improved versions of this algorithm, like the Verlet algorithms below, are often used to simulate systems that conserve energy (like molecular dynamics) because they exactly40 simulate the dynamics for an approximation to the Hamiltonian, preserving important physical features not kept by just approximately solving the dynamics.

Accuracy. Most computational methods for solving differential equations (and many other continuum problems, like integrating functions) involve a step size δ, and become more accurate as δ gets smaller. The easy thing to calculate is the error in each time step, but the more important quantity is the accuracy of the answer after a fixed time T, which is the accumulated error after T/δ time steps. If this accumulated error varies as δⁿ, we say that the algorithm has nth order cumulative accuracy. Our algorithm is not very high order!

(b) Plot the pendulum trajectory θ(t) for time steps δ = 0.1, 0.01, and 0.001. Zoom in on the curve at one of the coarse points (say, t = 1) and compare the values from the three time steps. Does it appear that this time is converging41 as δ → 0? From your measurement, what order accuracy is our method?

We can write higher-order symplectic algorithms. The simple approximation to the second derivative

    θ̈ ≈ (θ(t+δ) − 2θ(t) + θ(t−δ)) / δ²    (8.31)

(which you can verify with a Taylor expansion is correct to O(δ⁴)) motivates the Verlet algorithm

    θ(t+δ) = 2θ(t) − θ(t−δ) + θ̈ δ².    (8.32)

This algorithm is a bit awkward to start up, since you need to initialize42 θ(t−δ); it's also often convenient to know the velocities as well as the positions. The Velocity Verlet algorithm fixes both of these problems; it is motivated by the constant-acceleration formula x(t) = x₀ + v₀t + ½at²:

    θ(t+δ) = θ(t) + θ̇(t) δ + ½ θ̈(t) δ²    (8.33)
    θ̇(t+δ/2) = θ̇(t) + ½ θ̈(t) δ
    θ̇(t+δ) = θ̇(t+δ/2) + ½ θ̈(t+δ) δ.

The trick that makes this algorithm so good is to cleverly split the velocity increment into two pieces, half for the acceleration at the old position and half for the new position.43 (You'll want to initialize θ̈ once before starting the loop.)

39 That is, the algorithm exactly simulates a dynamics with exact energy conservation, but with an approximation to the true energy.
40 Up to rounding errors.
41 You may note that it's easy to extrapolate to the correct answer. This is called Richardson extrapolation.
42 Since we start with θ̇ = 0, the simulation is symmetric under reversing the sign of time, and you can get away with using θ(t−δ) = θ(t) + ½ θ̈(t) δ² + O(δ⁴).
43 You may check that both Verlet algorithms give exactly the same values for θ(t₀ + nδ).
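In code, the fidelity discussion comes down to the ordering of two lines. Below is a minimal sketch, with our own function names and parameters rather than the course's downloadable pendulum files, of the symplectic update of equation 8.29 and the Velocity Verlet update of equation 8.33:

```python
import math

g, L = 9.8, 1.0   # values used in the exercise

def accel(theta):
    return -(g / L) * math.sin(theta)   # equation 8.28

def symplectic_euler_step(theta, thetadot, delta):
    """Equation 8.29: update thetadot first, then use the NEW thetadot."""
    thetadot += accel(theta) * delta
    theta += thetadot * delta   # swapping these two lines gives the plain Euler method
    return theta, thetadot

def velocity_verlet_step(theta, thetadot, delta):
    """Equation 8.33: half the velocity kick at the old position, half at the new."""
    a_old = accel(theta)
    theta += thetadot * delta + 0.5 * a_old * delta**2
    thetadot += 0.5 * (a_old + accel(theta)) * delta
    return theta, thetadot

# Small-amplitude swing: theta(t) should track theta0*cos(sqrt(g/L)*t) closely.
delta, steps = 0.001, 1000
theta, thetadot = 0.01, 0.0
for _ in range(steps):
    theta, thetadot = velocity_verlet_step(theta, thetadot, delta)
```

Trying both steppers on the large-amplitude initial condition θ₀ = 2π/3 reproduces the comparison in parts (a) to (c).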

To be pub. Oxford UP, Fall05 www.physics.cornell.edu/sethna/StatMech/

8.2 Markov Chains 155

(c) Pick one of the Verlet algorithms, implement it, and plot the trajectory for time steps δ = 0.1, 0.01, and 0.001. You should see a dramatic improvement in convergence. What cumulative order accuracy does Verlet have?44

Stability. In many cases high accuracy is not crucial. What prevents us from taking enormous time steps? In a given problem, there is usually a typical fastest time scale: a vibration or oscillation period (as in our problem) or a growth or decay rate. When our time step becomes a substantial fraction of this fastest time scale, algorithms like ours usually become unstable: the first few time steps may be fairly accurate, but small errors build up until the errors become unacceptable (indeed, often one's first warning of problems are machine overflows).

(d) Plot the pendulum trajectory θ(t) for time steps δ = 0.1, 0.2, . . . , 0.8, using a small-amplitude oscillation θ₀ = 0.01, θ̇₀ = 0.0, up to tmax = 10. At about what δc does it go unstable? Looking at the first few points of the trajectory, does it seem like sampling the curve at steps much larger than δc would miss the oscillations? At δc/2, how accurate is the amplitude of the oscillation? (You'll need to observe several periods in order to estimate the maximum amplitude of the solution.)

In solving the properties of large, nonlinear systems (e.g., partial differential equations (PDEs) and molecular dynamics) stability tends to be the key difficulty. The maximum stepsize depends on the local configuration, so highly nonlinear regions can send the system unstable before one might expect. The maximum safe stable stepsize often has accuracy far higher than needed; indeed, some algorithms become less stable if the stepsize is decreased!45

ODE packages: higher order, variable stepsize, stiff systems . . . The Verlet algorithms are fairly simple to code, and we use higher-order symplectic algorithms in Hamiltonian systems mostly in unusual applications (planetary motion) where high accuracy is demanded, because they are typically significantly less stable. In systems of differential equations where there is no conserved energy or Hamiltonian, or even in Hamiltonian systems (like high-energy collisions) where accuracy at short times is more crucial than fidelity at long times, we use general-purpose methods.

The general-purpose solvers come in a variety of basic algorithms (Runge-Kutta, predictor-corrector, . . . ), and methods for maintaining and enhancing accuracy (variable step size, Richardson extrapolation). There are also implicit methods for stiff systems. A system is stiff if there is a large separation between the slowest and fastest relevant time scales: implicit methods often allow one to take time steps much larger than the fastest time scale (unlike the explicit Verlet methods you studied in part (d), which go unstable). Large, sophisticated packages have been developed over many years for solving differential equations, switching between algorithms and varying the time steps to most efficiently maintain a given level of accuracy. They solve dy/dt = dydt(y, t), where for us y = [θ, θ̇] and dydt = [θ̇, θ̈]. They typically come in the form of subroutines or functions, which need as arguments:

- Initial conditions y0,
- The right-hand side dydt, a function of the vector y and time t, which returns a vector giving the current rate of change of y, and
- The initial and final times, and perhaps intermediate times, at which the trajectory y(t) is desired.

They often have options that

- Ask for desired accuracy goals, typically a relative (fractional) accuracy and an absolute accuracy, sometimes set separately for each component of y,
- Ask for and return derivative and time-step information from the end of the last step (to allow efficient restarts after intermediate points),
- Ask for a routine that computes the derivatives of dydt with respect to the current components of y (for use by the stiff integrator), and
- Return information about the methods, time steps, and performance of the algorithm.

You will be supplied with one of these general-purpose packages, and instructions on how to use it.

(e) Write the function dydt, and use the general-purpose solver to solve for the motion of the pendulum as in parts (a)-(c), and informally check that the trajectory is accurate.

44 The error in each time step of the Verlet algorithm is of order δ⁴. It's usually said that the Verlet algorithms have third-order accuracy, naively assuming that running for a time T should have errors bounded by the number of time steps T/δ times the error per time step δ⁴. However, one can check that the errors in successive time steps build up quadratically at short times (i.e., the velocity errors build up linearly with time), so after T/δ time steps the accumulated error is δ⁴ (T/δ)² ∼ δ². We'll use cumulative order of the algorithm to distinguish it from the naive order.
45 For some partial differential equations, decreasing the spacing Δx between points
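The dydt interface described above can be sketched in Python. Since we cannot bundle the course's supplied package here, the driver below is a plain fixed-step fourth-order Runge-Kutta loop standing in for it; real general-purpose solvers (for example scipy.integrate.solve_ivp) add adaptive step sizes, error control, and stiff methods on top of the same calling pattern.

```python
import math

g, L = 9.8, 1.0

def dydt(y, t):
    """Right-hand side: y = [theta, thetadot], returns [thetadot, thetadotdot]."""
    theta, thetadot = y
    return [thetadot, -(g / L) * math.sin(theta)]

def rk4(dydt, y0, t0, t1, n):
    """Fixed-step 4th-order Runge-Kutta: a stand-in for a general-purpose solver."""
    y, t, h = list(y0), t0, (t1 - t0) / n
    for _ in range(n):
        k1 = dydt(y, t)
        k2 = dydt([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], t + 0.5 * h)
        k3 = dydt([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], t + 0.5 * h)
        k4 = dydt([yi + h * ki for yi, ki in zip(y, k3)], t + h)
        y = [yi + (h / 6) * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

# Small-amplitude check: should stay close to 0.01*cos(sqrt(g/L)*t).
theta, thetadot = rk4(dydt, [0.01, 0.0], 0.0, 1.0, 100)
```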


(8.10) Small World Networks. (Complexity, Computation) (With Myers. [72])
Many interesting problems arise from studying properties of randomly generated networks. A network is a collection of nodes and edges, with each edge connected to two nodes, but with each node potentially connected to any number of edges. A random network is constructed probabilistically according to some definite rules; studying such a random network usually is done by studying the entire ensemble of networks, each weighted by the probability that it was constructed. Thus these problems naturally fall within the broad purview of statistical mechanics.

Fig. 8.11 A network is a collection of nodes (circles) and edges (lines between the circles).

One of the more popular topics in random network theory is the study of how connected they are. "Six degrees of separation" is the phrase commonly used to describe the interconnected nature of human acquaintances: various somewhat uncontrolled studies have shown that any random pair of people in the world can be connected to one another by a short chain of people (typically around six), each of whom knows the next fairly well. If we represent people as nodes and acquaintanceships as neighbors, we reduce the problem to the study of the relationship network.

In this problem, we will generate some random networks, and calculate the distribution of distances between pairs of points. We'll study small world networks [117, 73], a simple theoretical model that suggests how a small number of shortcuts (unusual international and intercultural friendships, . . . ) can dramatically shorten the typical chain lengths. Finally, we'll study how a simple, universal scaling behavior emerges for large networks with few shortcuts.

On the Web site for this book [105], you'll find some hint files and graphic routines to facilitate working this problem, for a variety of languages and systems (currently Python under Unix and Windows).

Constructing a small world network. The L nodes in a small world network are arranged around a circle. There are two kinds of edges. Each node has Z short edges connecting it to its nearest neighbors around the circle (up to a distance Z/2). In addition, there are p L Z/2 shortcuts added to the network, which connect nodes at random (see figure 8.12). (This is a simpler version [73] of the original model [117], which rewired a fraction p of the LZ/2 edges.)

(a) Define a network object on the computer. For this problem, the nodes will be represented by integers. Implement a network class, with five functions:
(1) HasNode(node), which checks to see if a node is already in the network,
(2) AddNode(node), which adds a new node to the system (if it's not already there),
(3) AddEdge(node1, node2), which adds a new edge to the system,
(4) GetNodes(), which returns a list of existing nodes, and
(5) GetNeighbors(node), which returns the neighbors of an existing node.

Write a routine to construct a small-world network, which (given L, Z, and p) adds the nodes and the short edges, and then randomly adds the shortcuts. Use the software provided to draw this small world graph, and check that you've implemented the periodic boundary conditions correctly (each node i should be connected to nodes (i − Z/2) mod L, . . . , (i + Z/2) mod L).
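A minimal version of the class in part (a) and the construction routine might look like the sketch below. The method names follow the exercise; the internal representation (a dictionary of neighbor sets) and the class name are our own choices, not the hint files from the book's Web site.

```python
import random

class UndirectedGraph:
    def __init__(self):
        self.connections = {}            # node -> set of neighboring nodes

    def HasNode(self, node):
        return node in self.connections

    def AddNode(self, node):
        if not self.HasNode(node):
            self.connections[node] = set()

    def AddEdge(self, node1, node2):
        self.AddNode(node1)
        self.AddNode(node2)
        self.connections[node1].add(node2)
        self.connections[node2].add(node1)

    def GetNodes(self):
        return list(self.connections)

    def GetNeighbors(self, node):
        return list(self.connections[node])

def MakeSmallWorldNetwork(L, Z, p):
    """L nodes on a circle, Z short edges per node, plus ~p*L*Z/2 shortcuts."""
    g = UndirectedGraph()
    for i in range(L):
        for j in range(1, Z // 2 + 1):   # neighbors up to distance Z/2
            g.AddEdge(i, (i + j) % L)
    for _ in range(int(p * L * Z / 2)):  # shortcuts between random node pairs
        n1, n2 = random.randrange(L), random.randrange(L)
        if n1 != n2:
            g.AddEdge(n1, n2)
    return g
```

With p = 0 each node i ends up connected to (i − Z/2) mod L, . . . , (i + Z/2) mod L, which is exactly the periodic-boundary check the exercise asks for.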


Fig. 8.12 Small world network, with L = 20, Z = 4, and p = 0.2.46

Measuring the minimum distances between nodes. The most studied property of small world graphs is the distribution of shortest paths between nodes. Without the long edges, the shortest path between i and j will be given by hopping in steps of length Z/2 along the shorter of the two arcs around the circle: there will be no paths of length longer than L/Z (halfway around the circle), and the distribution ρ(ℓ) of path lengths ℓ will be constant for 0 < ℓ < L/Z. When we add shortcuts, we expect that the distribution will be shifted to shorter path lengths.

(b) Write three functions to find and analyze the path-length distribution:

(1) FindPathLengthsFromNode(graph, node), which returns for each node2 in the graph the shortest distance from node to node2. An efficient algorithm is a breadth-first traversal of the graph, working outward from node in shells. There will be a currentShell of nodes whose distance will be set to ℓ unless they have already been visited, and a nextShell which will be considered after the current one is finished (looking sideways before forward, breadth-first):
  - Initialize ℓ = 0, the distance from node to itself to zero, and currentShell = [node]
  - While there are nodes in the new currentShell:
      - Start a new empty nextShell
      - For each neighbor of each node in the current shell, if the distance to neighbor has not been set, add the node to nextShell and set the distance to ℓ + 1
      - Add one to ℓ, and set the current shell to nextShell
  - Return the distances
This will sweep outward from node, measuring the shortest distance to every other node in the network. (Hint: Check your code with a network with small N and small p, comparing a few paths to hand calculations from the graph image, generated as in part (a).)

(2) FindPathLengthHistogram(graph), which computes the probability ρ(ℓ) that a shortest path will have length ℓ, by using FindPathLengthsFromNode repeatedly to find the mean over all pairs of nodes. Check your function by testing that the histogram of path lengths at p = 0 is constant for 0 < ℓ < L/Z, as advertised. Generate graphs at L = 1000 and Z = 2 for p = 0.02 and p = 0.2: display the circle graphs and plot the histogram of path lengths. Zoom in on the histogram: how much does it change with p? What value of p would you need to get "six degrees of separation"?

(3) FindAveragePathLength(graph), which similarly computes the mean ⟨ℓ⟩ over all pairs of nodes. Compute ⟨ℓ⟩ for Z = 2, L = 100, and p = 0.1 a few times: your answer should be around ⟨ℓ⟩ = 10. Notice that there are substantial statistical fluctuations in the value from sample to sample. Roughly how many long bonds are there in this system? Would you expect fluctuations in the distances?

(c) Plot the average path length between nodes ⟨ℓ(p)⟩ divided by ⟨ℓ(p = 0)⟩ for Z = 2, L = 50, with p on a semi-log plot from p = 0.001 to p = 1. Compare with figure 2 of Watts and Strogatz [117]. You should find roughly the same curve, with the values of p shifted by a factor of 100. (They do L = 1000 and Z = 10.)

Large N and the emergence of a continuum limit. We can understand the shift in p of part (c) as a continuum limit of the problem. In the limit where the number of nodes N becomes large and the number of shortcuts pLZ/2 stays fixed, this network problem has a nice limit where distance is measured in radians around the circle. Dividing ℓ by ⟨ℓ(p = 0)⟩ ≈ L/(2Z) essentially does this, since θ = πZℓ/L.

(d) Create and display a circle graph of your geometry from part (c) [Z = 2, L = 50] at p = 0.1; create and display circle graphs of Watts and Strogatz geometry [Z = 10, L = 1000] at p = 0.1 and p = 0.001. Which of their systems looks statistically more similar to yours? Plot (perhaps using the scaling collapse routine provided) the rescaled average path length πZ⟨ℓ⟩/L versus the total number of shortcuts pLZ/2, for a range 0.001 < p < 1, for L = 100 and 200 and Z = 2 and 4.

In this limit, the average bond length should be a function only of M. Since reference [117] ran at a value of ZL a factor of 100 larger than ours, our values of p are a factor of 100 larger to get the same value of M = pLZ/2. Newman and Watts [76] derive this continuum limit with a renormalization-group analysis (chapter 13).

46 There are seven new shortcuts, where pLZ/2 = 8; one of the added edges overlapped an existing edge.


(e) Real Networks. From the book Web site [105], or through your own researches, find a real network47 and find the mean distance and histogram of distances between nodes.

Fig. 8.13 Small world network with L = 500, K = 2 and p = 0.1, with node and edge sizes scaled by the square root of their betweenness.

The shortest paths between nodes are presumably important to efficient transfer through the system (transfer of information in a computer network, transfer of disease in a population model, . . . ). It is often useful to measure how crucial a given node or edge is to these shortest paths. We say a node or edge is between two other nodes if it is along a shortest path between them. We measure the betweenness of a node or edge as the total number of such shortest paths passing through, with (by convention) the initial and final nodes included in the between nodes; see figure 8.13. (If there are K multiple shortest paths of equal length between two nodes, each path adds 1/K to its intermediates.) The efficient algorithm to measure betweenness is a depth-first traversal quite analogous to the shortest-path-length algorithm discussed above.

(f) Betweenness. (Advanced) Read references [74] and [35], discussing the algorithms for finding the betweenness. Implement them on the small world network, and perhaps the real-world network you analyzed in part (e). Visualize your answers by using the graphics software provided on the book Web site [105].

(8.11) Building a Percolation Network. (Complexity, Computation) (With Myers. [72])
Figure 8.14 shows what a large sheet of paper, held at the edges, would look like if small holes were successively punched out at random locations. Here the ensemble averages over the different choices of random locations for the holes; this figure shows the sheet just before it fell apart. Of course, certain choices of hole positions would cut the sheet in two far earlier (a straight line across the center) or somewhat later (checkerboard patterns), but for the vast majority of members of our ensemble the paper will have the same kinds of hole patterns seen here. Again, it is easier to analyze all the possible patterns of punches than to predict a particular pattern.

Percolation theory is the study of the qualitative change in connectivity of a large system as its components are randomly removed. Outside physics, it has become a prototype of criticality at continuous transitions, presumably because the problem is simple to state and the analysis does not demand a background in equilibrium statistical mechanics.48 In this exercise, we'll study bond percolation (figure 8.14) and site percolation (figure 8.15) in two dimensions.

Fig. 8.14 Bond Percolation network. Each bond on a 10 × 10 square lattice is present with probability p = 0.4. This is below the percolation threshold pc = 0.5 for the infinite lattice, and indeed the network breaks up into individual clusters (each shaded separately). Note the periodic boundary conditions. Note there are many small clusters, and only a few large ones; here twelve clusters of size S = 1, three of size S = 2, and one cluster of size S = 29 (black). For a large lattice near the percolation threshold the probability distribution of cluster sizes ρ(S) forms a power law (exercise 13.9).
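For part (f), the fast method in references [74, 35] runs one breadth-first pass per source node, counting shortest paths as it goes, then accumulates each node's share in reverse distance order; multiple shortest paths split credit through the path-count ratios, implementing the 1/K rule above. The compact sketch below is our own rendering of that idea for node betweenness, with the graph as a plain dict from node to neighbor list; note that, unlike the text's convention, it counts ordered pairs and gives the path endpoints no credit.

```python
from collections import deque

def betweenness(graph):
    """Node betweenness for an unweighted graph {node: [neighbors]}."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, queue = [], deque([s])
        pred = {v: [] for v in graph}                 # shortest-path predecessors
        sigma = {v: 0 for v in graph}; sigma[s] = 1   # number of shortest s-v paths
        dist = {v: -1 for v in graph}; dist[s] = 0
        while queue:                                  # breadth-first sweep from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:            # w is reached through v
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                                  # accumulate, farthest first
            w = stack.pop()
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the chain 0-1-2-3, for example, node 1 lies on the shortest paths for the four ordered pairs (0,2), (2,0), (0,3), (3,0), so its betweenness here is 4.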


On the Web site for this book [105], you'll find some hint files and graphic routines to facilitate working this problem, for a variety of languages and systems (currently Python under Unix and Windows).

Bond percolation on a square lattice.
(a) Define a 2D bond percolation network with periodic boundary conditions on the computer, for size L × L and bond probability p. For this problem, the nodes will be represented by pairs of integers (i, j). You'll need the method GetNeighbors(node), which returns the neighbors of an existing node. Use the bond-drawing software provided to draw your bond percolation network for various p and L, and use it to check that you've implemented the periodic boundary conditions correctly. (There are two basic approaches. You can start with an empty network and use AddNode and AddEdge in loops to generate the nodes, vertical bonds, and horizontal bonds (see exercise 8.10). Alternatively, and more traditionally, you can set up a 2D array of vertical and horizontal bonds, and implement GetNeighbors(node) by constructing the list of neighbors from the bond networks when the site is visited.)

The percolation threshold and duality. In most continuous phase transitions, one of the challenges is to find the location of the transition. We chose bond percolation on the square lattice because there is a simple argument that shows, in the limit of large systems, that the percolation threshold pc = 1/2. The argument makes use of the dual lattice.

The nodes of the dual lattice are the centers of the squares between nodes in the original lattice. The edges of the dual lattice are those which do not cross an edge of the original lattice. Since every potential dual edge crosses exactly one edge of the original lattice, the probability of having bonds on the dual lattice is 1 − p, where p is the probability of bonds for the original lattice. If we can show that the dual lattice percolates if and only if the original lattice does not, then pc = 1/2. This is easiest to see graphically:

(b) Generate and print a small lattice with p = 0.4, picking one where the largest cluster does not span across either the vertical or the horizontal direction (or print figure 8.14). Draw a path on the dual lattice spanning the system from top to bottom and from left to right. (You'll be emulating a rat running through a simple maze.) Is it clear for large systems that the dual lattice will percolate if and only if the original lattice does not?

Finding the clusters.
(c) Write two functions that together find the clusters in the percolation network:

(1) FindClusterFromNode(graph, node, visited), which returns the cluster in graph containing node, and marks the sites in the cluster as having been visited. The cluster is of course the union of node, the neighbors, the neighbors of the neighbors, etc. The trick is to use the set of visited sites to avoid going around in circles. The efficient algorithm is a breadth-first traversal of the graph, working outward from node in shells. There will be a currentShell of nodes whose neighbors have not yet been checked, and a nextShell which will be considered after the current one is finished (breadth-first):
  - Initialize visited[node] = True, cluster = [node], and currentShell = graph.GetNeighbors(node).
  - While there are nodes in the new currentShell:
      - Start a new empty nextShell
      - For each node in the current shell, if the node has not been visited, add the node to the cluster, mark the node as visited, and add the neighbors of the node to the nextShell
      - Set the current shell to nextShell
  - Return the cluster

(2) FindAllClusters(graph), which sets up the visited set to be False for all nodes, and calls FindClusterFromNode(graph, node, visited) on all nodes that haven't been visited, collecting the resulting clusters. Optionally, you may want to order the clusters from largest to smallest, for convenience in graphics (and in finding the largest cluster).

Check your code by running it for small L and using the graphics software provided. Are the clusters, drawn in different colors, correct?

Site percolation on a triangular lattice. Universality states that the statistical behavior of the percolation clusters at long length scales should be independent of the microscopic detail. That is, removing bonds from a square lattice should leave the same fractal patterns of holes, near pc, as punching out circular holes in a sheet just before it falls apart. Nothing about your algorithms from part (c) depended on there being four neighbors of a node, or there even being nodes at all sites. Let's implement site percolation on a triangular lattice (figure 8.15): nodes are occupied with probability p, with each node connected to any of its six neighbor sites that are also filled (punching out hexagons from a sheet of paper). The triangular site lattice also has a duality transformation, so again pc = 0.5.


Fig. 8.15 Site percolation network. Each site on a 10 × 10 triangular lattice is present with probability p = 0.5, the percolation threshold for the infinite lattice. Note the periodic boundary conditions at the sides, and the shifted periodic boundaries at the top and bottom.

The sites can be laid out on a triangular lattice by [i, j], where x = i + j/2 and y = (√3/2) j. If we again use periodic boundary conditions with 0 ≤ i < L and 0 ≤ j < L, we cover a region in the shape of a 60° rhombus.49 Each site [i, j] has six neighbors, at [i, j] + e with e = [1, 0], [0, 1], [−1, 1] upward and to the right, and minus the same three downward and to the left.

(d) Generate a site percolation network on a triangular lattice. You can treat the sites one at a time, adding each occupied site and bonding it to all existing neighbors. Alternatively, you can start by generating a whole matrix of random numbers in one sweep to determine which sites are occupied by nodes, add those nodes, and then fill in the bonds. Check your resulting network by running it for small L and using the graphics software provided. (Notice the shifted periodic boundary conditions at the top and bottom; see figure 8.15.) Use your routine from part (c) to generate the clusters, and check these (particularly at the periodic boundaries) using the graphics software.

(e) Generate a small square-lattice bond percolation cluster, perhaps 30 × 30, and compare with a small triangular-lattice site percolation cluster. They should look rather different in many ways. Now generate a large50 cluster of each, perhaps 1000 × 1000 (or see figure 13.9). Stepping back and blurring your eyes, do the two look substantially similar?

Fig. 8.16 Barkhausen noise experiment.

Fig. 8.17 Hysteresis loop with subloops. (Axes: applied magnetic field (H/J) versus magnetization (M).)

Fig. 8.18 Tiny jumps: Barkhausen noise.

49 The graphics software uses the periodic boundary conditions to shift this rhombus back into a rectangle.
50 Your code, if written properly, should run in a time of order N, the number of nodes. If it seems to slow down more than a factor of 4 when you increase the length of the side by a factor of two, check for inefficiencies.
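Part (d)'s one-sweep approach, using the six neighbor vectors and the periodic boundaries described above, might be sketched as follows. For brevity this returns a plain dict from each occupied site to its occupied neighbors rather than building the exercise 8.10 network class; the function name and the seed argument are our own additions.

```python
import random

def MakeTriangularSitePercolation(L, p, seed=None):
    """Site percolation on an L x L rhombus of the triangular lattice,
    periodic in both i and j; each site [i, j] is occupied with probability p."""
    rng = random.Random(seed)
    # one sweep of random numbers decides which sites are occupied
    occupied = {(i, j) for i in range(L) for j in range(L) if rng.random() < p}
    # six neighbor vectors: three "up and to the right", and their negatives
    evecs = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]
    graph = {}
    for (i, j) in occupied:
        graph[(i, j)] = [((i + di) % L, (j + dj) % L)
                         for di, dj in evecs
                         if ((i + di) % L, (j + dj) % L) in occupied]
    return graph
```

At p = 1 every site is present and every site has exactly six neighbors, a quick sanity check on the neighbor vectors and the periodic wrap before running near pc = 0.5.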
