
Reflections on Relativity

Preface
1. First Principles
1.1 From Experience to Spacetime
1.2 Systems of Reference
1.3 Inertia and Relativity
1.4 The Relativity of Light
1.5 Corresponding States
1.6 A More Practical Arrangement
1.7 Staircase Wit
1.8 Another Symmetry
1.9 Null Coordinates

2. A Complex of Phenomena
2.1 The Spacetime Interval
2.2 Force Laws and Maxwell's Equations
2.3 The Inertia of Energy
2.4 Doppler Shift for Sound and Light
2.5 Stellar Aberration
2.6 Mobius Transformations of the Night Sky
2.7 The Sagnac Effect
2.8 Refraction Between Moving Media
2.9 Accelerated Travels
2.10 The Starry Messenger
2.11 Thomas Precession

3. Several Valuable Suggestions
3.1 Postulates and Principles
3.2 Natural and Violent Motions
3.3 De Mora Luminis
3.4 Stationary Paths
3.5 A Quintessence of So Subtle a Nature
3.6 The End of My Latin
3.7 Zeno and the Paradox of Motion
3.8 A Very Beautiful Day
3.9 Constructing the Principles

4. Weighty Arguments
4.1 Immovable Spacetime
4.2 Inertial and Gravitational Separations
4.3 Free-Fall Equations
4.4 Force, Curvature, and Uncertainty
4.5 Conventional Wisdom
4.6 The Field of All Fields
4.7 The Inertia of Twins
4.8 The Breakdown of Simultaneity

5. Extending the Principle
5.1 Vis Inertiae
5.2 Tensors, Contravariant and Covariant
5.3 Curvature, Intrinsic and Extrinsic
5.4 Relatively Straight
5.5 Schwarzschild Metric from Kepler's 3rd Law
5.6 The Equivalence Principle
5.7 Riemannian Geometry
5.8 The Field Equations

6. Ist Das Wirklich So?
6.1 An Exact Solution
6.2 Anomalous Precession
6.3 Bending Light
6.4 Radial Paths in a Spherically Symmetrical Field
6.5 Intersecting Orbits
6.6 Ideal Clocks in Arbitrary Motion
6.7 Acceleration in Schwarzschild Coordinates
6.8 Sources in Motion

7. Cosmology
7.1 Is the Universe Closed?
7.2 The Formation and Growth of Black Holes
7.3 Falling Into and Hovering Near A Black Hole
7.4 Curled-Up Dimensions
7.5 Packing Universes In Spacetime
7.6 Cosmological Coherence
7.7 Boundaries and Symmetries
7.8 Global Interpretations of Local Experience

8. The Secret Confidence of Nature
8.1 Kepler, Napier, and the Third Law
8.2 Newton's Cosmological Queries
8.3 The Helen of Geometers
8.4 Refractions On Relativity
8.5 Scholium
8.6 On Gauss' Mountains
8.7 Strange Meeting
8.8 Who Invented Relativity?
8.9 Paths Not Taken

9. The Relativistic Topology
9.1 In The Neighborhood
9.2 Up To Diffeomorphism
9.3 Higher-Order Metrics
9.4 Spin and Polarization
9.5 Entangled Events
9.6 Von Neumann's Postulate and Bell's Freedom
9.7 The Gestalt of Determinism
9.8 Quaedam Tertia Natura Abscondita
9.9 Locality and Temporal Asymmetry
9.10 Spacetime Mediation of Quantum Interactions

Conclusion

Appendix: Mathematical Miscellany

Bibliography

1.1 From Experience to Spacetime


I might revel in the world of intelligibility which still remains to me, but
although I have an idea of this world, yet I have not the least knowledge of
it, nor can I ever attain to such knowledge with all the efforts of my
natural faculty of reason. It is only a something that remains when I have
eliminated everything belonging to the senses... but this something I know
no further... There must here be a total absence of motive - unless this
idea of an intelligible world is itself the motive... but to make this
intelligible is precisely the problem that we cannot solve.
Immanuel Kant
We ordinarily take for granted the existence through time of objects moving according to
fixed laws in three-dimensional space, but this is a highly abstract model of the objective
world, far removed from the raw sense impressions that comprise our actual experience.
This model may be consistent with our sense impressions, but it certainly is not uniquely
determined by them. For example, Ptolemy and Copernicus constructed two very
different conceptual models of the heavens based on essentially the same set of raw sense
impressions. Likewise Weber and Maxwell synthesized two very different conceptual
models of electromagnetism to account for a single set of observed phenomena. The fact
that our raw sense impressions and experiences are (at least nominally) compatible with
widely differing concepts of the world has led some philosophers to suggest that we
should dispense with the idea of an "objective world" altogether, and base our physical
theories on nothing but direct sense impressions, all else being merely the products of our
imaginations. Berkeley expressed the positivist identification of sense impressions with
objective existence by the famous phrase "esse est percipi" (to be is to be perceived).
However, all attempts to base physical theories on nothing but raw sense impressions,
avoiding arbitrary conceptual elements, invariably founder at the very start, because we
have no sure means of distinguishing sense impressions from our thoughts and ideas. In
fact, even the decision to make such a distinction represents a significant conceptual
choice, one that is not strictly necessary on the basis of experience.
The process by which we, as individuals, learn to recognize sense impressions induced by
an external world, and to distinguish them from our own internal thoughts and ideas, is
highly complicated, and perhaps ultimately inexplicable. As Einstein put it (paraphrasing
Kant), "the eternal mystery of the world is its comprehensibility". Nevertheless, in order
to examine the epistemological foundations of any physical theory, we must give some
consideration to how the elements of the theory are actually derived from our raw sense
impressions, without automatically interpreting them in conventional terms. On the other
hand, if we suppress every pre-conceived notion, including ordinary rules of reasoning,
we can hardly hope to make any progress. We must choose a level of abstraction deep
enough to give a meaningful perspective, but not so deep that it can never be connected
to conventional ideas.
As an example of a moderately abstract model of experience, we might represent an
idealized observer as a linearly ordered sequence of states, each of which is a function of
the preceding states and of a set of raw sense impressions from external sources. This
already entails two profound choices. First, it is a purely passive model, in the sense that
it does not invoke volition or free will. As a result, all conditional statements in this
model must be interpreted only as correlations (as discussed more fully in section 3.2),
because without freedom it is meaningless to talk about the different consequences of
alternate hypothetical actions. Second, by stipulating that the states are functions of the
preceding but not the subsequent states we introduce an inherent directional asymmetry
to experience, even though the justification for this is far from clear.
Still another choice must be made as to whether the sequence of states and experiences is
continuous or discrete. In either case we can parameterize the sequence by a variable s,
and for the sake of definiteness we might represent each state S(s) and the corresponding
sense impressions E(s) by strings of binary bits. Now, because of the mysterious
comprehensibility of the world, it may happen that some functions of S are correlated
with some functions of E. (Since this is a passive model by assumption, we cannot assert
anything more than statistical correlations, because we do not have the freedom to
arbitrarily vary S and determine the resulting E, but in principle we could still passively
encounter enough variety of states and experiences to infer the most prominent
correlations.) These most primitive correlations are presumably hard-wired into higher-level
categories of senses and concepts (i.e., state variables), rather than being sorted out
cognitively. In terms of these higher-level variables we might find that over some range
of s the sense impressions E(s) are strictly correlated with three functions α, β, γ of the
state S(s), which change only incrementally from one state to the next. Also, we may find
that E is only incrementally different for incremental differences in α, β, γ (independent
of the prior values of those functions), and that this is the smallest and simplest set of
functions with this property. Finally, suppose the sense impressions corresponding to a
given set of values of the state functions are identical if the values of those functions are
increased or decreased by some constant.
This describes roughly how an abstract observer might infer an orientation space along
with the associated modes of interaction. In conventional terms, the observer infers the
existence of external objects which induce a particular set of sense impressions
depending on the observer's orientation. (Of course, this interpretation is necessarily
conjectural; there may be other, perhaps more complex, interpretations that correspond as
well or better with the observer's actual sequence of experiences.) At some point the
observer may begin to perceive deviations from the simple three-variable orientation
model, and find it necessary to adopt a more complicated conceptual model in order to
accommodate the sequence of sense impressions. It remains true that the simple
orientation model applies over sufficiently small ranges of states, but the sense
impressions corresponding to each orientation may vary as a function of three additional
state variables, which in conventional terms represent the spatial position of the observer.
Like the orientation variables, these translation variables, which we might label x, y, and
z, change only incrementally from one state to the next, but unlike the orientation
variables there is no apparent periodicity.
Note that the success of this process of induction relies on a stratification of experiences,
allowing the orientation effects to be discerned first, more or less independent of the
translation effects. Then, once the orientation model has been established, the relatively
small deviations from it (over small ranges of the state variable) could be interpreted as
the effects of translatory motion. If not for this stratification (either in magnitude or in
some other attribute), it might never be possible to infer the distinct sources of variation
in our sense impressions. (On a more subtle level, the detailed metrical aspects of these
translation variables will also be found to differ from those of the orientation variables,
but only after quantitative units of measure and coordinates have been established.)
Another stage in the development of our hypothetical observer might be prompted by the
detection of still more complicated variations in the experiential attributes of successive
states. The observer may notice that while most of the orientation space is consistent with
a fixed position, some particular features of their sense impressions do not maintain their
expected relations to the other features, and no combination of the observer's translation
and orientation variables can restore consistency. The inferred external objects of
perception can no longer be modeled based on the premise that their relations with
respect to each other are unchanging. Significantly, the observer may notice that some
features vary as would be expected if the observer's own positional state had changed in
one way, whereas other features vary as would be expected if the observer's position had
changed in a different way. From this recognition the observer concludes that, just as he
himself can translate through the space, so also can individual external objects, and the
relations are reciprocal. Thus, to each object we now assign an independent set of
translation coordinates for each state of the observer.
In so doing we have made another important conceptual choice, namely, to regard
"external objects" as having individual identities that persist from one state to the next.
Other interpretations are possible. For example, we could account for the apparent motion
of objects by supposing that one external entity simply ceases to exist, and another
similar entity in a slightly different position comes into existence. According to this view,
there would be no such thing as motion, but simply a sequence of arrangements of objects
with some similarities. This may seem obtuse, but according to quantum mechanics it
actually is not possible to unambiguously map the identities of individual elementary
particles (such as electrons) from one event to another (because their wave functions
overlap). Thus the seemingly innocuous assumption of continuous and persistent
identities for material objects through time is actually, on some level, demonstrably false.
However, on the macroscopic level, physical objects do seem to maintain individual
identities, or at least it is possible to successfully model our sense impressions based on
the assumption of persistent identities (because the overlaps between wave functions are
negligible), and this success is the justification for introducing the concept of motion for
the objects of experience.

The conceptual model of our hypothetical observer now involves something that we may
call distance, related to the translational state variables, but it's worth noting that we have
no direct perception of distances between ourselves and the assumed external objects, and
even less between one external object and another. We have only our immediate sense
impressions, which are understood to be purely local interactions, involving signals of
some kind impinging on our senses. We infer from these signals a conceptual model of
space and time within which external objects reside and move. This model actually
entails two distinct kinds of extent, which we may call distance and length. An object,
consisting of a locus of sense impressions that maintains a degree of coherence over time,
has a spatial length, as do the paths that objects may follow in their motions, but the
conceptual model of space also allows us to conceive of a distance between two objects,
defined as the length of the shortest possible path between them.
The task of quantifying these distances, and of relating the orientation variables with the
translation variables, then involves further assumptions. Since this is a passive model, all
changes are strictly known only as a function of the single state variable, but we imagine
other pseudo-independent variables based on the observed correlations. We have two
means of quantifying spatial distances. One is by observing the near coincidence of one
or more stable entities (measuring rods) with the interval to be quantified, and the other is
to observe the change in the internal state variable as an object of stable speed moves
from one end of the interval to the other. Thus we can quantify a spatial interval in terms
of some reference spatial interval, or in terms of the associated temporal interval based on
some reference state of motion. We identify these references purely by induction based
on experience.
Combining the rotational symmetries and the apparent translational distances that we
infer from our primary sense impressions, we conventionally arrive at a conception of the
external world that is, in some sense, the dual of our subjective experience. In other
words, we interpret our subjective experience as a one-dimensional temporally-ordered
sequence of events, whereas we conceive of "the objective world now" corresponding to
a single perceived event as a three-dimensional expanse of space as illustrated below:

In this way we intuitively conceive of time and space as inherently perpendicular
dimensions, but complications arise if we posit that each event along our subjective path
resides in, and is an element of, an objective world. If the events along any path are
discrete, then we might imagine a simple sequence of discrete "instantaneous worlds":

One difficulty with this arrangement is that it isn't clear how (or whether) these worlds
interact with each other. If we regard each "instant" as a complete copy of the spatial
universe, separate from every other instant, then there seems to be no definite way to
identify an object in one world with "the same" object in another, particularly considering
qualitatively identical objects such as electrons. If we have two electrons assigned the
labels A and B in one instant of time, and if we find two electrons in the next instant of
time, we have no certain way of deciding which of them was the "A" electron from the
previous instant. (In fact, we cannot even map the spatial locations of one instant to "the
same" locations in any other instant.) This illustrates how the classical concept of motion
is necessarily based on the assumption of persistent identities of objects from one instant
to another. Since it does seem possible (at least in the classical realm) to organize our
experiences in terms of individual objects with persistent and unambiguous identities
over time, we may be led to suspect that the sequence of existence of an individual or
object in any one instant must be, in some sense, connected to or contiguous with its
existence in neighboring instants. If these objects are the constituents of "the world", this
suggests that space itself at any "instant" is continuous with the spaces of neighboring
instants. This is important because it implies a definite connectivity between neighboring
world-spaces, and this, as we'll see, places a crucial constraint on the relativity of motion.
Another complication concerns the relative orderings of world-instants along different
paths. Our schematic above implied that the "instantaneous worlds" are well-ordered in
the sense that they are encountered in the same order along every individual's path, but of
course this need not be the case. For example, we could equally well imagine an
arrangement in which the "instantaneous worlds" are skewed, so that different individuals
encounter them in different orders, as illustrated below.

The concept of motion assumes the world can be analyzed in two different ways, first as
the union of a set of mutually exclusive "events", and second as a set of "objects" each of
which participates in an ordered sequence of events. In addition to this ordering of events
encountered by each individual object, we must also assume both a co-lateral ordering of
the events associated with different objects, and a transverse ordering of events from one
object to another. These three kinds of orderings are illustrated schematically below.

This diagram suggests that the idea of motion is actually quite complex, even in this
simple abstract model. Intuitively we regard motion as something like the derivative of
the spatial "position" with respect to "time", but we can't even unambiguously define the
distance between two worldlines, because it depends on how we correlate the temporal
ordering along one line to the temporal ordering along the other. Essentially our concept
of motion is overly ambitious, because we want it to express the spatial distance from the
observer to the object for each event along the observer's worldline, but the intervals from
one worldline to another are not confined to the worldlines themselves, so we have no
definite way of assigning those intervals to events along our worldline. The best we can
do is correlate all the intervals from a particular point on the observer's worldline to the
object's worldline.
When we considered everything in terms of the sense impressions of just a single
observer this was not an issue, since only one parameterization was needed to map the
experiences of that observer, interpreted solipsistically. Any convenient parameterization
was suitable. When we go on to consider multiple observers and objects we can still
allow each observer to map his experiences and internal states using the most convenient
terms of reference (which will presumably include his own state-index as the temporal
coordinate), but now the question arises as to how all these private coordinate systems are
related to each other. To answer this question we need to formalize our parameterizations
into abstract systems of coordinates, and then consider how the coordinates of any given
event with respect to one system are related to the coordinates of the same event with
respect to another system. This is discussed in the next section.
Considering how far removed from our raw sense impressions is our conceptual model of
the external world, and how many unjustified assumptions and interpolations are
involved in its construction, it's easy to see why some philosophers have advocated the
rejection of all conceptual models. However, the fact remains that the imperative to
reconcile our experience with some model of an objective external world has been one of
the most important factors guiding the development of physical theories. Even in
quantum mechanics, arguably the field of physics most resistant to complete realistic
reconciliation, we still rely on the "correspondence principle", according to which the
observables of the theory must conform to the observables of classical realistic models in
the appropriate limits. Naturally our interpretations of experience are always provisional,
being necessarily based on incomplete induction, but conceptual models of an objective
world have proven (so far) to be indispensable.
1.2 Systems of Reference
Any one who will try to imagine the state of a mind conscious of knowing
the absolute position of a point will ever after be content with our relative
knowledge.
James Clerk Maxwell, 1877
There are many theories of relativity, each of which can be associated with some
arbitrariness in our descriptions of events. For example, suppose we describe the spatial
relations between stationary particles on a line by assigning a real-valued coordinate to
each particle, such that the distance between any two particles equals the difference
between their coordinates. There is a degree of arbitrariness in this description due to the
fact that all the coordinates could be increased by some arbitrary constant without
affecting any of the relations between the particles. Symbolically this translational
relativity can be expressed by saying that if x is a suitable system of coordinates for
describing the relations between the particles, then so is x + k for any constant k.
Likewise if we describe the spatial relations between stationary particles on a plane by
assigning an ordered pair of real-valued coordinates to each particle, such that the
squared distance between any two particles equals the sum of the squares of the
differences between their respective coordinates, then there is a degree of arbitrariness in
the description (in addition to the translational relativity of each individual coordinate)
due to the fact that we could rotate the coordinates of every particle by an arbitrary
constant angle without affecting any of the relations between the particles. This relativity
of orientation is expressed symbolically by saying that if (x,y) is a suitable system of
coordinates for describing the positions of particles on a plane, then so is (ax − by, bx + ay),
where a² + b² = 1.
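To see this formal relativity in action, here is a minimal numerical sketch in Python (the particle coordinates, translation constants, and rotation angle are arbitrary illustrative values, not anything drawn from the text); it verifies that all pairwise distances are unchanged when every coordinate pair is translated by a constant and then rotated via (ax − by, bx + ay) with a² + b² = 1:

    import math
    from itertools import combinations

    # Hypothetical particle coordinates on a plane (illustrative values only).
    particles = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.5)]

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def transform(p, kx=5.0, ky=-2.0, theta=0.7):
        # Translate by the constants (kx, ky), then rotate through the
        # constant angle theta: (x, y) -> (a*x - b*y, b*x + a*y), where
        # a = cos(theta) and b = sin(theta), so a**2 + b**2 = 1.
        x, y = p[0] + kx, p[1] + ky
        a, b = math.cos(theta), math.sin(theta)
        return (a*x - b*y, b*x + a*y)

    moved = [transform(p) for p in particles]
    for i, j in combinations(range(len(particles)), 2):
        assert math.isclose(dist(particles[i], particles[j]),
                            dist(moved[i], moved[j]))
    print("pairwise distances unchanged by translation and rotation")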
These relativities are purely formal, in the sense that they are tautological consequences
of the premises, regardless of whether they have any physical applicability. Our first
premise was that it's possible to assign a single real-valued coordinate to each particle on
a line such that the distance between any two particles equals the difference between their
coordinates. If this premise is satisfied, the invariance of relations under coordinate
transformations from x to x + k follows trivially, but if the pairwise distances between
three given particles were, say, 5, 3, and 12 units, then no three numbers could be
assigned to the particles such that the pairwise differences equal the distances. This
shows that the n(n−1)/2 pairwise distances between n particles cannot be independent of
each other if those distances can be encoded unambiguously by just n coordinates in one
dimension or, more generally, by kn coordinates in k dimensions. A suitable system of
coordinates in one dimension exists only if the distances between particles satisfy a very
restrictive condition. Letting d(A,B) denote the signed distance from A to B, the
condition that must be satisfied is that for every three particles A,B,C we have d(A,B) +
d(B,C) + d(C,A) = 0. Of course, this is essentially the definition of co-linearity, but we
have no a priori reason to expect this definition to have any applicability in the world of
physical objects. The fact that it has wide applicability is a non-trivial aspect of our
experience, albeit one that we ordinarily take for granted.
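As a sketch of how restrictive this co-linearity condition is, the following Python fragment (with hypothetical distance values) searches over sign choices for the given distance magnitudes to find signed distances satisfying d(A,B) + d(B,C) + d(C,A) = 0; a triple such as 5, 3, 8 admits one-dimensional coordinates, whereas the 5, 3, 12 of the example above does not:

    from itertools import product

    def collinear_coordinates(dAB, dBC, dCA):
        # Signed distances must satisfy d(A,B) + d(B,C) + d(C,A) = 0,
        # so try every choice of signs for the given magnitudes.
        for sa, sb, sc in product((1, -1), repeat=3):
            if sa*dAB + sb*dBC + sc*dCA == 0:
                # Place A at 0; B and C then follow from the signed distances.
                return (0, sa*dAB, sa*dAB + sb*dBC)
        return None

    print(collinear_coordinates(5, 3, 8))   # e.g. (0, 5, 8): coordinates exist
    print(collinear_coordinates(5, 3, 12))  # None: no 1-D assignment exists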
Likewise for particles in a region of three dimensional space the premise that we can
assign three numbers to each particle such that the squared distance between any two
particles equals the sum of the squares of the differences between their respective
coordinates is true only under a very restrictive condition, because there are only 3n
degrees of freedom in the n(n−1)/2 pairwise distances between n particles.
Just as we found relativity of orientation for the pair of spatial coordinates x and y, we
also find the same relativity for each of the pairs x,z and y,z in three dimensional space.
Thus we have translational relativity for each of the four coordinates x,y,z,t, and we have
rotational relativity for each pair of spatial coordinates (x,y), (x,z), and (y,z). This leaves
the pairs of coordinates (x,t), (y,t) and (z,t). Not surprisingly we find that there is an
analogous arbitrariness in these coordinate pairs, which can be expressed (for the x,t pair)
by saying that the relations between the instances of particles on a line as a function of
time are unaffected if we replace the x and t coordinates with ax − bt and −bx + at
respectively, where a² − b² = 1. These transformations (rotations in the x,t plane through
an imaginary angle), which characterize the theory of special relativity, are based on the
premise that it is possible to assign pairs of values, x and t, to each instance of each
particle on the x axis such that the squared spacetime distance equals the difference
between the squares of the differences between the respective coordinates.
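A quick numerical check of this hyperbolic analogue may be helpful. One convenient way to satisfy a² − b² = 1 is to take a = cosh(φ) and b = sinh(φ) for an arbitrary parameter φ; the Python sketch below (with arbitrary illustrative event coordinates) confirms that t² − x² is unchanged by the transformation:

    import math

    def boost(x, t, phi):
        # a = cosh(phi) and b = sinh(phi) satisfy a**2 - b**2 = 1 identically.
        a, b = math.cosh(phi), math.sinh(phi)
        return (a*x - b*t, -b*x + a*t)

    x, t = 3.0, 5.0            # hypothetical event coordinates
    xp, tp = boost(x, t, 0.4)  # hypothetical parameter phi = 0.4
    # The squared spacetime interval t**2 - x**2 is preserved.
    assert math.isclose(t**2 - x**2, tp**2 - xp**2)
    print(t**2 - x**2, "=", tp**2 - xp**2)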
Each of the above examples represents an invariance of physically measurable relations
under certain classes of linear transformations. Extending this idea, Einstein's general
theory of relativity shows how the laws of physics, suitably formulated, are invariant
under an even larger class of transformations of space and time coordinates, including
non-linear transformations, and how these transformations subsume the phenomena of
gravity. In general relativity the metrical properties of space and time are not constant, so
the simple premises on which we based the primitive relativities described above turn out
not to be satisfied globally. However, it remains true that those simple premises are
satisfied locally, i.e., over sufficiently small regions of space and time, so they continue to
be of fundamental importance.
As mentioned previously, the relativities described above are purely formal and
tautological, but it turns out that each of them is closely related to a non-trivial physical
symmetry. There exists a large class of identifiable objects whose lengths maintain a
fixed proportion to each other under the very same set of transformations that
characterize the relativities of the coordinates. In other words, just as we can translate the
coordinates on the x axis without affecting the length of any object, we also find a large
class of objects that can be individually translated along the x axis without affecting their
lengths. The same applies to rotations and boosts. Such changes are physically distinct
from purely formal shifts of the entire coordinate system, because when we move
individual objects we are actually changing the relations between objects, since we are
moving only a subset of all the coordinated objects. (Also, moving an object from one
stationary position to another requires acceleration.) Thus for each formal arbitrariness in
the system of coordinates there exists a physical symmetry, i.e., a large class of entities
whose extents remain in constant proportions to each other when subjected individually
to the same transformations.
We refer to these relations as physical symmetries rather than physical invariances,
because (for example) we have no basis for asserting that the length of a solid object or
the duration of a physical process is invariant under changes in position, orientation or
state of motion. We have no way of assessing the truth of such a statement, because our
measures of length and duration are all comparative. We can say only that the spatial and
temporal extents of all the stable physical entities and processes are affected (if at all)
in exactly the same proportion by changes in position, orientation, and state of motion. Of
course, given this empirical fact, it is often convenient to speak as if the spatial and
temporal extents are invariant, but we shouldn't forget that, from an epistemological
standpoint, we can assert only symmetry, not invariance.
In his original presentation of special relativity in 1905 Einstein took measuring rods and
clocks as primitive elements, even though he realized the weakness of this approach. He
later wrote of the special theory
It is striking that the theory introduces two kinds of physical things, i.e., (1)
measuring rods and clocks, and (2) all other things, e.g., the electromagnetic field,
the material point, etc. This, in a certain sense, is inconsistent; strictly speaking,
measuring rods and clocks should emerge as solutions of the basic equations
(objects consisting of moving atomic configurations), not, as it were, as
theoretically self-sufficient entities. The procedure was justified, however,
because it was clear from the very beginning that the postulates of the theory are
not strong enough to deduce from them equations for physical events sufficiently
complete and sufficiently free from arbitrariness to form the basis of a theory of
measuring rods and clocks.
This is quite similar to the view he expressed many years earlier
...the solid body and the clock do not in the conceptual edifice of physics play the
part of irreducible elements, but that of composite structures, which may not play
any independent part in theoretical physics. But it is my conviction that in the
present stage of development of theoretical physics these ideas must still be
employed as independent ideas; for we are still far from possessing such certain
knowledge of theoretical principles as to be able to give exact theoretical
constructions of solid bodies and clocks.
The first quote is from his Autobiographical Notes in 1949, whereas the second is from
his essay "Geometry and Experience", published in 1921. It's interesting how little his
views had changed during the intervening 28 years, despite the fact that those years saw
the advent of quantum mechanics, which many would say provided the very theoretical
principles underlying the construction of solid bodies and clocks that Einstein felt had
been lacking. Whether or not the principles of quantum mechanics are adequate to justify
our conceptions of reference lengths and time intervals, the characteristic spatial and
temporal extents of quantum phenomena are used today as the basis for all such
references.
Considering the arbitrariness of absolute coordinates, one might think our spatiotemporal descriptions could be better expressed in purely relational terms, such as by
specifying only the mutual distances (minimum path lengths) between objects.
Nevertheless, the most common method of description is to assign absolute coordinates
(three spatial and one temporal) to each object, with reference to an established system of
coordinates, while recognizing that the choice of coordinate systems is to some extent
arbitrary. The relations between objects are then inferred from these absolute (though
somewhat arbitrary) coordinates. This may seem to be a round-about process, but there
are several reasons for using absolute coordinate systems to encode the relations between
objects, rather than explicitly specifying the relations themselves.
One reason is that this approach enables us to take advantage of the efficiency made
possible by the finite dimensionality of space. As discussed in Section 1.1, if there were
no limit to the dimensionality of space, then we would expect a set of n particles to have
n(n1)/2 independent pairwise spatial relations, so to explicitly specify all the distances
between particles would require n1 numbers for each particle, representing the distances
to each of the other particles. For a large number of particles (to say nothing of a
potentially infinite number) this would be impractical. Fortunately the spatial relations
between the objects of our experience are not mutually independent. The nth particle
essentially adds only three (rather than n−1) degrees of freedom to the relational
configuration. In physical terms this restriction can be clearly seen from the fact that the
maximum number of mutually equidistant particles in D-dimensional space is D+1.
Experience teaches us that in our physical space we can arrange four, but not five or
more, particles such that they are all mutually equidistant, so we conclude that our space
has three dimensions.
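A minimal sketch of this equidistance criterion (in Python, with an arbitrary but standard choice of points): the four vertices of a regular tetrahedron are mutually equidistant in three-dimensional space, consistent with the stated maximum of D + 1 = 4 such points for D = 3.

    import math
    from itertools import combinations

    # Four mutually equidistant points in 3D: vertices of a regular tetrahedron.
    verts = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

    def dist(p, q):
        return math.sqrt(sum((a - b)**2 for a, b in zip(p, q)))

    distances = {round(dist(p, q), 12) for p, q in combinations(verts, 2)}
    print(distances)  # a single value, 2*sqrt(2): all six pairwise distances equal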
Historically the use of absolute coordinates rather than explicit relations may also have
been partly due to the fact that analytic geometry and Cartesian coordinates were
invented (by Fermat, Descartes and others) at almost the same time that the new science
of mechanics needed them, just as tensor analysis was invented, three hundred years later,
at the very moment when it was needed to facilitate the development of general relativity.
(Of course, such coincidences are not accidental; contrivances requiring new materials
tend to be invented soon after the material becomes available.) The coordinate systems of
Descartes were not merely efficient, they were also consistent with the ancient
Aristotelian belief (also held by Descartes) that there is no such thing as empty space or
vacuum, and that continuous substance permeates the universe. In this context we cannot
even contemplate explicitly specifying each individual distance between substantial
points, because space is regarded as a continuum of substance. For Aristotle and
Descartes, every spatial extent is a measure of the length of some substance, not a pure
distance between particles as contemplated by atomists. In this sense we can say that the
continuous absolute coordinate systems inherited by modern science from Aristotle and
Descartes are a remnant of the Cartesian natural philosophy.
Another, perhaps more compelling, reason for the adoption of abstract coordinate systems
in the descriptions of physical phenomena was the need to account for acceleration. As
Newton explained with the example of a spinning pail, the mutual relations between a
set of material particles in an instant are not adequate to fully characterize a physical
situation - at least not if we are considering only a small subset of all the particles in the
universe. (Whether the mutual relations would be adequate if all the matter in the
universe was taken into account is an open question.) In retrospect, there were other
possible alternatives, such as characterizing not just the relations between particles at a
specific instant, but over some temporal span of existence, but this would have required
the unification of spatial and temporal measures, which did not occur until much later.
Originally the motions of objects were represented simply by allowing the spatial
coordinates of each persistent object to be continuous single-valued functions of one real
variable, the time coordinate.
Incidentally, one consequence of the use of absolute coordinates is that it automatically
entails a breaking of the alleged translational symmetry. We said previously that the
coordinate system x could be replaced by x + k for any real number k, implying that
every real value of k is in some sense equally suitable. However, from a strictly
mathematical point of view there does not exist a uniform distribution over the real
numbers, so this form of representation does not exactly entail the perfect symmetry of
position in an infinite space, even if the space is completely empty.
The set of all combinations of values for the three spatial coordinates and one time
coordinate is assumed to give a complete coordination not only of the spatial positions of
each entity at each time, but of all possible spatial positions at all possible times. Any
definite set of space and time coordinates constitutes a system of reference. There are
infinitely many distinct ways in which such coordinates can be assigned, but they are not
entirely arbitrary, because we limit the range of possibilities by requiring contiguous
physical entities to be assigned contiguous coordinates. This imposes a definite structure
on the system, so it is more than merely a set of labels; it represents the most primitive
laws of physics.
One way of specifying an entire model of a world consisting of n (classical) particles
would be to explicitly give the 3n functions xj(t), yj(t), zj(t) for j = 1 to n. In this form, the
un-occupied points of space would be irrelevant, since only the actual paths of actual
physical entities have any meaning. In fact, it could be argued that only the intersections
of these particles have physical significance, so the paths followed by the particles in
between their mutual intersections could be regarded as merely hypothetical. Following
this approach we might end up with a purely combinatorial specification of discrete
interactions, with no need for the notion of a continuous physical space within which
entities reside and move. However, the hypothesis that physical objects have continuous
positions as functions of time with respect to a specified system of reference has proven
to be extremely useful, especially for purposes of describing simple laws by which the
observable interactions can be efficiently described and predicted.
An important class of physical laws that make use of the full spatio-temporal framework
consists of laws that are expressed in terms of fields. A field is regarded as existing at
each point within the system of coordinates, even those points that are not occupied by a
material particle. Therefore, each continuous field existing throughout time has,
potentially, far more degrees of freedom than does a discrete particle, or even infinitely
many discrete particles. Arguably, we never actually observe fields; we merely observe
effects attributed to fields. It's ironic that we can simplify the descriptions of particles by
introducing hypothetical entities (fields) with far more degrees of freedom, but the laws
governing the behavior of these fields (e.g., Maxwell's equations for the electromagnetic
field) along with symmetries and simple boundary conditions suffice to constrain the
fields so that they actually do provide a simplification. (Fields also provide a way of
maintaining conservation laws for interactions at a distance.) Whether the usefulness of
the concepts of continuous space, time, and fields suggests that they possess some
ontological status is debatable, but the concepts are undeniably useful.
These systems of reference are more than simple labeling. The numerical values of the
coordinates are intended to connote physical properties of order and measure. In fact, we
might even suppose that the sequences of states of all particles are uniformly
parameterized by the time coordinate of our system of reference, but therein lies an
ambiguity, because it isn't clear how the temporal states of one particle are to be placed in
correspondence with the temporal states of another. Here we must make an important
decision about how our model of the world is to be constructed. We might choose to
regard the totality of all entities as comprising a single element in a succession of
universal temporal states, in which case the temporal correspondence between entities is
unambiguous. In such a universe the temporal coordinate induces a total ordering of
events, which is to say, if we let the symbol ≼ denote temporal precedence or equality,
then for every three events a, b, c we have

(i) a ≼ a
(ii) if a ≼ b and b ≼ a, then a = b
(iii) if a ≼ b and b ≼ c, then a ≼ c
(iv) either a ≼ b or b ≼ a

However, this is not the only possible choice. We might choose instead to regard the
temporal state of each individual particle as an independent quantity, bearing in mind that
orderings of the elements of a set are not necessarily total. For example, consider the
subsets of a flat plane, and the ordering induced by the inclusion relation ⊆. Obviously
the first three axioms of a total ordering are satisfied, because for any three subsets a, b, c
of the plane we have (i) a ⊆ a, (ii) if a ⊆ b and b ⊆ a, then a = b, and (iii) if a ⊆ b and
b ⊆ c, then a ⊆ c. However, the fourth axiom is not satisfied, because it's entirely possible
to have two sets neither of which is included in the other. An ordering of this type is
called a partial ordering, and we should allow for the possibility that the temporal
relations between events induce a partial rather than a total ordering. In fact, we have no
a priori reason to expect that temporal relations induce even a partial ordering. It is safest
to assume that each entity possesses its own temporal state, and let our observations teach
us how those states are mutually related, if at all. (Similar caution should be applied when
modeling the relations between the spatial states of particles.)
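A small concrete illustration of the difference (a Python sketch using arbitrarily chosen finite sets of points to stand in for subsets of the plane): inclusion satisfies the first three axioms but fails the fourth.

    # Finite sets of points standing in for subsets of the plane; Python's
    # <= operator on sets is the inclusion relation.
    a = frozenset({(0, 0)})
    b = frozenset({(0, 0), (1, 0)})
    c = frozenset({(0, 0), (1, 0), (0, 1)})
    d = frozenset({(2, 2)})

    assert a <= a                                # (i) reflexivity
    assert not (a <= b and b <= a) or a == b     # (ii) antisymmetry
    assert not (a <= b and b <= c) or a <= c     # (iii) transitivity
    assert not (a <= d) and not (d <= a)         # (iv) fails: a, d incomparable
    print("inclusion is a partial ordering, but it is not total")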
Given any system of space and time coordinates we can define infinitely many others
such that speeds are preserved. This represents an equivalence relation, and we can then
define a reference frame as an equivalence class of coordinate systems such that the
speed of each object has the same value in terms of each coordinate system in that class.
Thus within a reference frame we can speak of the speed of an object, without needing to
specify any particular coordinate system. Of course, just as our coordinate systems are
generally valid only locally, so too are the reference frames.
Purely kinematic relativity contains enough degrees of freedom that we can simply define
our systems of reference (i.e., coordinate systems) to satisfy the additivity of velocity. In
other words, we can adopt velocity additivity as a principle, and this is essentially what
scientists had tacitly done since ancient times. The great insight of Galileo and his
successors was that this principle is inadequate to single out the physically meaningful
reference systems. A new principle was necessary, namely, the principle of inertia, to be
discussed in the next section.
1.3 Inertia and Relativity
These or none must serve for reasons, and it is my great happiness that
examples prove not rules, for to confirm this opinion, the world yields not
one example.
John Donne
In his treatise "On the Revolution of Heavenly Spheres" Copernicus argued for the
conceivability of a moving Earth by noting that
...every apparent change in place occurs on account of the movement either of the
thing seen or of the spectator, or on account of the necessarily unequal movement
of both. No movement is perceptible relatively to things moved equally in the
same direction - I mean relatively to the thing seen and the spectator.
This is a purely kinematical conception of relativity, like that of Aristarchus, based on the
idea that we judge the positions (and changes in position) of objects only in relation to the
positions of other objects. Many of Copernicus's contemporaries rejected the idea of a
moving Earth, because we do not directly sense any such motion. To answer this
objection, Galileo developed the concept of inertia, which he illustrated by a thought
experiment involving the behavior of objects inside a ship which is moving at some
constant speed in a straight line. He pointed out that


... among things which all share equally in any motion, [that motion] does not act,
and is as if it did not exist... in throwing something to your friend, you need throw
it no more strongly in one direction than in another, the distances being equal...
jumping with your feet together, you pass equal spaces in every direction...
Thus Galileo's approach was based on a dynamical rather than a merely kinematic
analysis, because he refers to forces acting on bodies, asserting that the dynamic behavior
of bodies is homogeneous and isotropic in terms of (suitably defined) measures in any
uniform state of motion. This soon led to the modern principle of inertial relativity,
although Galileo himself seems never to have fully grasped the distinction between
accelerated and unaccelerated motion. He believed, for example, that circular motion was
a natural state that would persist unless acted upon by some external agent. This shows
that the resolution of dynamical behavior into inertial and non-inertial components - which we generally take for granted today - is more subtle than it may appear. As Newton
wrote:
...the whole burden of philosophy seems to consist in this: from the phenomena of
motions to infer the forces of nature, and then from these forces to deduce other
phenomena...
Newton's doctrine implicitly assumes that forces can be inferred from the motions of
objects, but establishing the correspondence between forces and motions is not trivial,
because the doctrine is, in a sense, circular. We infer the forces of nature from observed
motions, and then we account for observed motions in terms of those forces. This
assumes we can distinguish between forced and unforced motion, but there is no a priori
way of making such a distinction. For example, the roughly circular motion of the Moon
around the Earth might suggest the existence of a force (universal gravitation) acting
between these two bodies, but it could also be taken as an indication that circular motion
is a natural form of unforced motion, as Galileo believed. Different definitions of
unforced motion lead to different sets of implied forces of nature. The task is to choose
a definition of unforced motion that leads to the identification of a set of physical forces
that gives the most intelligible decomposition of phenomena. By indirect reasoning, the
natural philosophers of the seventeenth century eventually arrived at the idea that, in the
complete absence of external forces, an object would move uniformly in a straight line,
and that, therefore, whenever we observe an object whose speed or direction of motion is
changing, we can infer that an external force proportional to the rate of change of
motion is acting upon that object. This is the principle of inertia, the most successful
principle ever proposed for organizing our knowledge of the natural world. Notice that it
refers to how a free object would move, because no object is completely free from all
external forces. Thus the conditions of this fundamental principle, as stated, are never
actually met, which highlights the subtlety of Newton's doctrine, and the aptness of his
assertion that it comprises the whole burden of philosophy. Also, notice that the
principle of inertia does not discriminate between different states of uniform motion in
straight lines, so it automatically entails a principle of relativity of dynamics, and in fact
the two are essentially synonymous.

The first explicit statement of the modern principle of inertial relativity was apparently
made by Pierre Gassendi, who is most often remembered today for reviving the ancient
Greek doctrine of atomism. In the 1630's Gassendi repeated many of Galileo's
experiments with motion, and interpreted them from a more abstract point of view,
consciously separating out gravity as an external influence, and recognizing that the
remaining "natural states of motions" were characterized not only by uniform speeds (as
Galileo had said) but also by rectilinear paths. In order to conceive of inertial motion, it is
necessary to review the whole range of observable motions of material objects and
imagine those motions if the effects of all known external influences were removed.
From this resulting set of ideal states of motion, it is necessary to identify the largest
possible "equivalence class" of relatively uniform and rectilinear motions. These motions
and configurations then constitute the basis for inertial measurements of space and time,
i.e., inertial coordinate systems. Naturally inertial motions will then necessarily be
uniform and rectilinear with respect to these coordinate systems, by definition.
Shortly thereafter (1644), Descartes presented the concept of inertial motion in his
"Principles of Philosophy":
Each thing...continues always in the same state, and that which is once moved
always continues to move...and never changes unless caused by an external
agent... all motion is of itself in a straight line...every part of a body, left to itself,
continues to move, never in a curved line, but only along a straight line.
Similarly, in Huygens' "The Motion of Colliding Bodies" (composed in the mid 1650's
but not published until 1703), the first hypothesis was that
Any body already in motion will continue to move perpetually with the same
speed in a straight line unless it is impeded.
Ultimately Newton incorporated this principle into his masterpiece, "Philosophiae
Naturalis Principia Mathematica" (The Mathematical Principles of Natural Philosophy),
as the first of his three laws of motion:
1) Every body continues in its state of rest, or of uniform motion in a right line,
unless it is compelled to change that state by the forces impressed upon it.
2) The change of motion is proportional to the motive force impressed, and is
made in the direction of the right line in which that force is impressed.
3) To every action there is always opposed an equal and opposite reaction; or, the
mutual actions of two bodies upon each other are always equal, and directed to
contrary parts.
These laws express the classical mechanical principle of relativity, asserting
equivalence between the conditions of "rest" and "uniform motion in a right line". Since
no distinction is made between the various possible directions of uniform motion, the
principle also implies the equivalence of uniform motion in all directions in space. Thus,
if everything in the universe is a "body" in the sense of this law, and if we stipulate rules
of force (such as Newton's second and third laws) that likewise do not distinguish
between bodies at rest and bodies in uniform motion, then we arrive at a complete system
of dynamics in which, as Newton said, "absolute rest cannot be determined from the
positions of bodies in our regions". Corollary 5 of Newton's Principia states
The motions of bodies included in a given space are the same among themselves,
whether that space is at rest or moves uniformly forwards in a straight line
without circular motion.
Of course, this presupposes that the words "uniformly" and "straight" have unambiguous
meanings. Our concepts of uniform speed and straight paths are ultimately derived from
observations of inertial motions, so the laws of motion are to some extent circular.
These laws were historically expressed in terms of inertial coordinate systems, which are
defined as the coordinate systems in terms of which these laws are valid. In other words,
we define an inertial coordinate system as a system of space and time coordinates in
terms of which inertia is homogeneous and isotropic, and then we announce the laws of
motion, which consist of the assertion that inertia is homogeneous and isotropic with
respect to inertial coordinate systems. Thus the laws of motion are true by definition.
Their significance lies not in their truth, which is trivial, but in their applicability. The
empirical fact that there exist systems of inertial coordinates is what makes the concept
significant. We have no a priori reason to expect that such coordinate systems exist, i.e.,
that the forces of nature would resolve themselves so coherently on this (or any other
finite) basis, but they evidently do. In fact, it appears that not just one such coordinate
system exists (which would be remarkable enough), but that infinitely many of them
exist, in all possible states of relative motion. To be precise, the principle of relativity
asserts that for any material particle in any state of motion there exists an inertial
coordinate system in terms of which the particle is (at least momentarily) at rest.
It's important to recognize that Newton's first law, by itself, is not sufficient to identify
the systems of coordinates in terms of which all three laws of motion are satisfied. The
first law serves to determine the shape of the coordinate axes and inertial paths, but it
does not fully define a system of inertial coordinates, because the first law is satisfied in
infinitely many systems of coordinates that are not inertial. The system of oblique xt
coordinates illustrated below is an example of such a system.

The two dashed lines indicate the paths of two identical objects, both initially at rest with
respect to these coordinates and propelled outward from the origin by impulsive forces of
equal magnitude (acting against each other). Every object not subject to external forces
moves with uniform speed in a straight line with respect to this coordinate system, so
Newton's First Law of motion is satisfied, but the second law clearly is not, because the
speeds imparted to these identical objects by equal forces are not equal. In other words,
inertia is not isotropic with respect to these coordinates. In order for Newton's Second
Law to be satisfied, we not only need the coordinate axes to be straight and uniformly
graduated relative to freely moving objects, we need the space axes to be aligned in time
such that mechanical inertia is the same in all spatial directions (so that, for example, the
objects whose paths are represented by the two dashed lines in the above figure have the
same speeds). This effectively establishes the planes of simultaneity of inertial coordinate
systems. In an operational sense, Newton's Third Law is also involved in establishing the
planes of simultaneity for an inertial coordinate system, because it is only by means of
the Third Law that we can actually define "equal forces" as the forces necessary to impart
equal "quantities of motion" (to use Newton's phrase). Of course, this doesn't imply that
inertial coordinate systems are the "true" systems of reference. They are simply the most
intuitive, convenient, and readily accessible systems, based on the inertial behavior of
material objects.
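The effect of such obliquely aligned time axes can be sketched numerically. Assume, purely for illustration, a skewed system that keeps the spatial coordinate but tilts the surfaces of simultaneity, t′ = t − kx for some constant k. Every free object still moves uniformly in a straight line (so the first law holds), but the two recoiling objects, which have speeds +u and −u in the inertial system, have unequal speeds in the skewed system:

    # Skew the simultaneity surfaces of an inertial system: x' = x, t' = t - k*x.
    # An object moving at speed u (x = u*t) has t' = (1 - k*u)*t and x' = u*t,
    # so its speed with respect to the skewed coordinates is u / (1 - k*u).
    k = 0.3   # hypothetical skew parameter
    u = 1.0   # hypothetical recoil speed imparted by the equal impulses

    speed_right = u / (1 - k*u)     # about +1.43
    speed_left = -u / (1 + k*u)     # about -0.77
    print(speed_right, speed_left)  # unequal magnitudes: straight uniform
    # motion is preserved, but inertia is not isotropic in these coordinates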
In addition to contributing to the definition of an inertial coordinate system, the third law
also serves to establish a fundamental aspect of the relationships between relatively
moving inertial coordinate systems. Specifically, the third law implies (requires) that if
the spatial origin of one inertial coordinate system is moving at velocity v with respect to
a second inertial coordinate system, then the spatial origin of the second system is
moving at velocity −v with respect to the first. This property is sometimes called
reciprocity, and is important for the various derivations of the Lorentz transformation to
be presented in subsequent sections.
Based on the definition of an inertial coordinate system, and the isotropy of inertia with
respect to such coordinates, it follows that two identical objects, initially at rest with
respect to those coordinates and exerting a mutual force on each other, recoil by equal
distances in equal times (in accord with Newton's third law). Assuming the lengths of
stable material objects are independent of their spatial positions and orientations (spatial
homogeneity and isotropy), it follows that we can synchronize distant clocks with
identical particles ejected with equal forces from the mid-point between the clocks. Of
course, this operational definition of simultaneity is not new. It is precisely what Galileo
described in his illustration of inertial motion onboard a moving ship. When he wrote that
an object thrown with equal force will reach equal distances [in the same time], he was
implicitly defining simultaneity at separate locations on the basis of inertial isotropy. This
is crucial to understanding the significance of inertial coordinate systems. The
requirement for a particular object to be at rest with respect to the system suffices only to
determine the direction of the "time axis", i.e., the loci of constant spatial position.
Galileo and his successors realized (although they did not always explicitly state) that it is
also necessary to specify the loci of constant temporal position, and this is achieved by
choosing coordinates in such a way that mechanical inertia is isotropic. (This means the
inertia of an object does not depend on any absolute reference direction in space,
although it may depend on the velocity of the object. It is sufficient to say the resistance
to acceleration of a resting object is the same in all spatial directions.)
Conceptually, to establish a complete system of space and time coordinates based on
inertial isotropy, imagine that at each point in space there is an identically constructed
cannon, and all these cannons are at rest with respect to each other. At one particular
point, which we designate as the origin of our coordinates, is a clock and numerous
identical cannons, each pointed at one of the other cannons out in space. The cannons are
fired from the origin, and when a cannonball passes one of the external cannons it
triggers that external cannon to fire a reply back to the origin. Each cannonball has
identifying marks so we can correlate each reply with the shot that triggered it, and with
the identity of the replying cannon. The ith reply event is assigned the time coordinate
t_i = [t_return(i) + t_send(i)]/2 seconds, and it is assigned space coordinates x_i, y_i, z_i based on the
angular direction of the sending cannon and the radial distance r_i = [t_return(i) − t_send(i)]/2 cannon-seconds.
This procedure would have been perfectly intelligible to Newton, and he would have
agreed that it yields an inertial coordinate system, suitable for the application of his three
laws of motion.
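As a small illustration (not part of the original thought experiment), the bookkeeping of this procedure can be written in a few lines of Python; the function and variable names here are our own:

    # Sketch: assigning inertial coordinates via the cannonball "radar"
    # procedure described above. Times are in seconds; distances are in
    # cannon-seconds (the distance a ball travels in one second).
    def reply_event_coords(t_send, t_return, direction):
        """Coordinates of a reply event, given the firing and reply times
        recorded at the origin and the unit vector toward the replying cannon."""
        t_i = (t_return + t_send) / 2.0    # time coordinate of the reply event
        r_i = (t_return - t_send) / 2.0    # radial distance to the replying cannon
        x_i, y_i, z_i = (r_i * d for d in direction)
        return t_i, (x_i, y_i, z_i)

    # Example: a ball fired at t = 2 s toward a cannon on the x axis triggers
    # a reply received at t = 10 s; the reply event is at t = 6 s, x = 4.
    print(reply_event_coords(2.0, 10.0, (1.0, 0.0, 0.0)))
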
Naturally, given one such system of coordinates, we can construct infinitely many others
by simple spatial re-orientation of the space axes and/or translation of the spatial or
temporal axes. All such transformations leave the speed of every object unchanged. An
equivalence class of all such inertial coordinate systems is called an inertial reference
frame. For characterizing the mutual dynamical states of two material bodies, the
associated inertial rest frames of the bodies are more meaningful than the mere distance
between the bodies, because any inertial coordinate system possesses a fixed spatial
orientation with respect to any other inertial coordinate system, enabling us to take
account of tangential motion between bodies whose mutual distance is not changing. For
this reason, the physically meaningful "relative velocity of two material bodies" is best
defined as their reciprocal states of motion with respect to each other's associated inertial
rest frame coordinates.
The principle of relativity does not tell us how two relatively moving systems of inertial
coordinates are related to each other, but it does imply that this relationship can be
determined empirically. We need only construct two relatively moving systems of inertial
coordinates and compare them. Based on observations of coordinate systems with
relatively low mutual speeds, and with the limited precision available at the time, Galileo
and Newton surmised that if (x,t) is an inertial coordinate system then so is (x′,t′), where

x′ = x − vt,    t′ = t

and v is the mutual speed between the origins of the two systems. This implies that
relative speeds are simply additive. In other words, if a material object B is moving at the
speed v in terms of inertial rest frame coordinates of A, and if an object C is moving in
the same direction at the speed u in terms of inertial rest frame coordinates of B, then C is
moving at the speed v + u in terms of inertial rest frame coordinates of A. This
conclusion may seem plausible, but it's important to realize that we are not free to
arbitrarily adopt this or any other transformation and speed composition rule for the set of
inertial coordinate systems, because those systems are already fully defined (up to
insignificant scale factors) by the requirements for inertia to be homogeneous and
isotropic and for momentum to be conserved. These properties suffice to determine the
set of inertial coordinate systems and (therefore) the relationships between them. Given
these conditions, the relationship between relatively moving inertial coordinate systems,
whatever it may be, is a matter of empirical fact.
Of course, inertial isotropy is not the only possible basis for constructing spacetime
coordinate systems. We could impose a different constraint to determine the loci of
constant temporal position, such as a total temporal ordering of events. However, if we
do this, we will find that mechanical inertia is generally not isotropic in terms of the
resulting coordinate systems, so the usual symmetrical laws of mechanics will not be
valid in terms of those coordinate systems (at least not if restricted to ponderable
matter). Indeed this was the case for the ether theories developed in the late 19th
century, as discussed in subsequent sections. Such coordinate systems, while extremely
awkward, would not be logically inconsistent. The choices we make to specify a
coordinate system and to resolve spacetime intervals into separate spatial and temporal
components are to some extent conventional, provided we are willing to disregard the
manifest symmetry of physical phenomena. But since physics consists of identifying and
understanding the symmetries of nature, the option of disregarding those symmetries does
not appeal to most physicists.
By the end of the nineteenth century a new class of phenomena involving electric and
magnetic fields had been incorporated into physics, and the concept of inertia was found
to be applicable to these phenomena as well. For example, Maxwell's equations imply
that a pulse of light conveys momentum. Hence the principle of inertia ought to apply to
electromagnetism as well as to the motions of material bodies. In his 1905 paper On the
Electrodynamics of Moving Bodies Einstein adopted this more comprehensive
interpretation of inertia, basing the special theory of relativity on the proposition that
The laws by which the states of physical systems undergo changes are not
affected, whether these changes of state be referred to the one or the other of two
systems of [inertial] coordinates in uniform translatory motion.
This is nearly identical to Newton's Corollary 5. It's unfortunate that the word "inertial"
was omitted, because, as noted above, uniform translatory motion is not sufficient to
ensure that a system of coordinates is actually an inertial coordinate system. However,
Einstein made it clear that he was indeed talking about inertial coordinate systems when
he previously characterized them as coordinate systems in which the equations of
Newtonian mechanics hold good. Admittedly this is a somewhat awkward assertion in
the context of Einstein's paper, because one of the main conclusions of the paper is that
the equations of Newtonian mechanics do not precisely hold good with respect to
inertial coordinate systems. Recognizing this inconsistency, Sommerfeld added a footnote
in subsequent published editions of Einstein's paper, qualifying the statement about
Newtonian mechanics holding good to the first approximation, but this footnote does
not really clarify the situation. Fundamentally, the class of coordinate systems that
Einstein was trying to identify (the inertial coordinate systems) are those in terms of
which inertia is homogeneous and isotropic, so that free objects move at constant speed
in straight lines, and the force required to accelerate an object from rest to a given speed
is the same in all directions. As discussed above, these conditions are just sufficient to
determine a coordinate system in terms of which the symmetrical equations of mechanics
hold good, but without pre-supposing the exact form of those equations.
Since light (i.e., an electromagnetic wave) carries momentum, and the procedure for
constructing an inertial coordinate system described previously was based on the isotropy
of momentum, it is reasonable to expect that pulses of light could be used in place of
cannonballs, and we should arrive at essentially the same class of coordinate systems. In
his 1905 paper this is how Einstein described the construction of inertial coordinate
systems, implicitly asserting that the propagation of light is isotropic with respect to the
same class of coordinate systems in terms of which mechanical inertia is isotropic. In this
respect it might seem as if he was treating light as a stream of inertial particles, and
indeed his paper on special relativity was written just after the paper in which he
introduced the concept of photons. However, we know that light is not exactly like a
stream of material particles, especially because we cannot conceive of light being at rest
with respect to any system of inertial coordinates. The way in which light fits into the
framework of inertial coordinate systems is considered in the next section. We will find
that although the principle of relativity continues to apply, and the definition of inertial
coordinate systems remains unchanged, the relationship between relatively moving
systems of inertial coordinate systems must be different than what Galileo and Newton
surmised.
1.4 The Relativity of Light
According to the theory of emission, the transmission of energy [of light]
is effected by the actual transference of light-corpuscles According to
the theory of undulation, there is a material medium which fills the space
between two bodies, and it is by the action of contiguous parts of this
medium that the energy is passed on
James Clerk Maxwell
Light is arguably the phenomenon of nature with which we have the most conscious
experience, by means of our sense of vision, and yet throughout most of human history
very little seems to have been known about how vision works. Interestingly, from the
very beginning there were at least two distinct concepts of light, existing side by side, as
can be seen in some of the earliest known writings. For example, the description of
creation in the biblical book of Genesis says light was created on the first day, and yet the
sun, moon, and stars were not created until the fourth day to give light upon the earth.
Evidently the word light is being used to signify two different things on the first and
fourth days. For another example, Plato argued in Timaeus that there are two kinds of
fire involved in our sense of vision, one coming from inside ourselves, emanating as
visual rays from our eyes to make contact with distant objects, and another, which he
called daylight, that (when present) surrounds the visual rays from our eyes and
facilitates the conveyance of the visual images. These two kinds of fire correspond
roughly with the later scholastic concepts of lux and lumen. The word lux was used to
signify our visual sensations, whereas the word lumen referred to an external agent (such
as light from the sun) that somehow participates in our sense of vision.
There was also, in ancient times, a competing theory of vision, according to which all
objects naturally emit whole images (eidola) of themselves in small packets, and these
enter our souls by way of our eyes. To account for our inability to see at night, it was
thought that light from the sun or moon struck the objects and caused them to emit their
images. This model of vision still entailed two distinct kinds of light: the facilitating
illumination from the sun or moon, and the eidola emitted by ordinary objects. This
somewhat awkward conception of vision was improved by Ibn al-Haitham and later by
Kepler, who argued that it is not necessary to assume whole objects emit multiple copies
of themselves; we can simply consider each tiny part of an object as the source of rays
emanating in all directions, and a sub-set of these rays intersecting in the eye can be reassembled into an image of the object.
Until the end of the 17th century there was no evidence to indicate that rays of light
propagated at a finite speed, and they were often assumed to be instantaneous. Only in
1676 with Roemer's observations of the moons of Jupiter, and even more convincingly in
1728 with Bradley's discovery of stellar aberration, did it become clear that the rays of
lumen propagate through space with a characteristic finite speed. This suggested that
light, and the energy it conveys, must have some mode of existence during the interval of
time between its emission and its absorption. Hence light became an entity or process in
itself, rather than just a relation between entities, but again there were two competing
notions as to the mode of existence. Two different analogies were conceived, based on
the behavior of ordinary material substances. Some thought light could be regarded as a
stream of material corpuscles moving through empty space, whereas others believed light
consists of undulations or waves in a pervasive material medium. Each of these analogies
was consistent with some of the attributes of light, but neither could be reconciled fully
with all the attributes. For example, if light consists of material corpuscles, then
according to Galilean relativity there should be an inertial reference frame with respect to
which light is at rest in a vacuum, whereas in fact we never observe light in a vacuum to
be at rest, nor even noticeably slow, with respect to any inertial reference frame. On the
other hand, if light is a wave propagating through a material medium, then the constituent
parts of that medium should, according to Galilean relativity, behave inertially, and in
particular should have a definite rest frame, whereas we find that light propagates best
through regions (vacuum) in which there is no detectable material with a definite rest
frame, and again we cannot conceive of light at rest in any inertial frame. Thus the
behavior of light defies realistic representation in terms of the behavior of material
substances within the framework of Galilean space and time, even if we consider just the
classical attributes, let alone quantum phenomena.
By the end of the 19th century the inadequacy of both of the materialistic analogies for
explaining the behavior of light had become acute, because there was strong evidence
that light exhibits two seemingly mutually exclusive properties. First, Maxwell showed
how light can be regarded as a propagating electromagnetic wave, and as such the speed
of propagation is obviously independent of the speed of the source. Second, numerous
experiments showed that light propagates at the same speed in all directions relative to
the source, just as we would expect for streams of inertial corpuscles. Hence some of the
attributes of light seemed to unequivocally support an emission theory, while others
seemed just as unequivocally to support a wave theory. In retrospect it's clear that there
was an underlying confusion regarding the terms of description, i.e., the systems of
inertial coordinates, but this was far from clear at the time.
One of the first clues to unraveling the mystery was found in 1887, when Woldemar
Voigt made a remarkable discovery concerning the ordinary wave equation. Recall that
the wave equation for a time-dependent scalar field φ(x,t) in one dimension is

∂²φ/∂x² = (1/u²) ∂²φ/∂t²
where u is the propagation speed of the wave. This equation was first studied by Jean
d'Alembert in the 18th century, and it applies to a wide range of physical phenomena. In
fact it seems to represent a fundamental aspect of the relationship between space, time,
and motion, transcending any particular application. Traditionally it was considered to be
valid only for a coordinate system x,t with respect to which the wave medium (presumed
to be an inertial substance) is at rest and has isotropic properties, because if we apply a
Galilean transformation to these coordinates, the wave equation is not satisfied with
respect to the transformed coordinates. However, Galilean transformations are not the
most general possible linear transformations. Voigt considered the question of whether
there is any linear transformation that leaves the wave equation unchanged.
The general linear transformation between (X,T) and (x,t) is of the form
x = AX + BT,    t = CX + DT
for constants A,B,C,D. If we choose units of space and time so that the acoustic speed u
equals 1, the wave equation in terms of (X,T) is simply ∂²φ/∂X² = ∂²φ/∂T². To express
this equation in terms of the transformed (x,t) coordinates, recall that the total differential
of φ can be written in the form

dφ = (∂φ/∂x) dx + (∂φ/∂t) dt
Also, at any constant T, the value of φ is purely a function of X, so we can divide through
the above equation by dX to give

∂φ/∂X = (∂φ/∂x)(∂x/∂X) + (∂φ/∂t)(∂t/∂X) = A(∂φ/∂x) + C(∂φ/∂t)
Taking the partial derivative of this with respect to X then gives
∂²φ/∂X² = A ∂(∂φ/∂x)/∂X + C ∂(∂φ/∂t)/∂X
Since partial differentiation is commutative, this can be written as
∂²φ/∂X² = A ∂(∂φ/∂X)/∂x + C ∂(∂φ/∂X)/∂t
Substituting the prior expression for ∂φ/∂X and carrying out the partial differentiations
gives an expression for ∂²φ/∂X² in terms of partials of φ with respect to x and t. Likewise
we can derive an expression for ∂²φ/∂T². Substituting into the wave equation gives

A²(∂²φ/∂x²) + 2AC(∂²φ/∂x∂t) + C²(∂²φ/∂t²) = B²(∂²φ/∂x²) + 2BD(∂²φ/∂x∂t) + D²(∂²φ/∂t²)
This is equivalent to the condition that φ(X,T) is a solution of the wave equation with
respect to the X,T coordinates. Since the mixed partial generally varies along a path of
constant second partial with respect to x or t, it follows that a necessary and sufficient
condition for φ(x,t) to also be a solution of the wave equation in terms of the x,t
coordinates is that the constants A,B,C,D of our linear transformation satisfy the relations

A² − B² = D² − C²,    AC = BD
Furthermore, the differential of the space transformation is dx = AdX + BdT, so an
increment with dx = 0 satisfies dX/dT = -B/A. This represents the velocity at which the
spatial origin of the x,t coordinates is moving relative to the X,T coordinates. We will
refer to this velocity as v. We also have the inverse transformation from (X,T) to (x,t):
X = (Dx − Bt)/(AD − BC),    T = (At − Cx)/(AD − BC)
Proceeding as before, the differential of this space transformation gives dx/dt = B/D for
the velocity of the spatial origin of the X,T coordinates with respect to the x,t coordinates,
and this must equal −v. Therefore we have B = −Av = −Dv, and so A = D. It follows from
the condition imposed by the wave equation that B = C, so both of these equal −Av. Our
transformation can then be written in the form

x = A(X − vT),    t = A(T − vX)
The same analysis shows that the perpendicular coordinates y and z of the transformed
system must be given by
y = A(1 − v²)^{1/2} Y,    z = A(1 − v²)^{1/2} Z
In order to make the transformation formula for x agree with the Galilean transformation,
Voigt chose A = 1, so he did not actually arrive at the Lorentz transformation, but
nevertheless he had shown roughly how the wave equation could actually be relativistic,
just like the dynamic behavior of inertial particles, provided we are willing to consider a
transformation of the space and time coordinates that differs from the Galilean
transformation. Had he considered the inverse transformation
X = (x + vt)/[A(1 − v²)],    T = (t + vx)/[A(1 − v²)]
he might have noticed that the determinant is A²(1 − v²), so to make this equal to 1 we
must have A = 1/(1 − v²)^{1/2}, which not only implies y = Y and z = Z, but also makes the
transformation formally identical to its inverse. In other words, he would have arrived at
a completely relativistic framework for the wave equation. However, this was not Voigt's
objective, and he evidently regarded the transformed coordinates x, y, z and t as merely a
convenient parameterization for purposes of calculation, without attaching any greater
significance to them.
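The algebra above is easy to check symbolically. The following sketch (our own, using Python's sympy library) verifies that with A = 1/(1 − v²)^{1/2} the transformation preserves the wave equation exactly and is formally identical to its inverse:

    # Sketch: verify that x = A(X - vT), t = A(T - vX), with A = 1/sqrt(1 - v^2),
    # preserves the wave equation and has an inverse of the same form (v -> -v).
    import sympy as sp

    X, T, v = sp.symbols('X T v', real=True)
    A = 1 / sp.sqrt(1 - v**2)
    x_expr = A*(X - v*T)
    t_expr = A*(T - v*X)

    # A general solution of the wave equation in the transformed coordinates:
    f, g = sp.Function('f'), sp.Function('g')
    phi = f(x_expr - t_expr) + g(x_expr + t_expr)

    # The same function also satisfies the wave equation in X and T:
    print(sp.simplify(sp.diff(phi, X, 2) - sp.diff(phi, T, 2)))   # 0

    # The inverse transformation has the same form with v replaced by -v:
    xs, ts = sp.symbols('x t', real=True)
    sol = sp.solve([sp.Eq(xs, x_expr), sp.Eq(ts, t_expr)], [X, T])
    print(sp.simplify(sol[X] - A*(xs + v*ts)))   # 0
    print(sp.simplify(sol[T] - A*(ts + v*xs)))   # 0
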
Voigts transformation was the first hint of how a wavelike phenomenon could be
compatible with the principle of relativity, which (as summarized in the preceding
section) is that there exist inertial coordinate systems in terms of which free motions are
linear, inertia is isotropic, and every material object is instantaneously at rest with respect
to one of these systems. None of this conflicts with the observed behavior of light,
because the motion of light is observed to be both linear and isotropic with respect to
inertial coordinate systems. The fact that light is not at rest with respect to any system of
inertial coordinates does not conflict with the principle of relativity if we agree that light
is not a material object.
The incompatibility of light with the Galilean framework arises not from any conflict
with the principle of relativity, but from the tacitly adopted empirical conclusion that two
relatively moving systems of inertial coordinates are related to each other by Galilean
transformations, so that the composition of co-linear speeds is simply additive. As

discussed in the previous section, we aren't free to impose this assumption on the class of
inertial coordinate systems, because they are fully determined by the requirement for
inertia to be homogeneous and isotropic. There are no more adjustable parameters (aside
from insignificant scale factors), so the composition of velocities with respect to
relatively moving inertial coordinate systems is a matter to be determined empirically.
Recall from the previous section that, on the basis of slowly moving reference frames,
Galileo and Newton had inferred that the composition of speeds was simply additive. In
other words, if a material object B is moving at the speed v in terms of inertial rest frame
coordinates of a material object A, and if an object C is moving in the same direction at
the speed u in terms of inertial rest frame coordinates of B, then Newton found that object
C has the speed v + u in terms of the inertial rest frame coordinates of A. Toward the end
of the nineteenth century, more precise observations revealed that this is not quite correct. It
was found that the speed of object C in terms of inertial rest frame coordinates of A is not
v + u, but rather (v + u)/(1 + uv/c²), where c is the speed of light in a vacuum.
Obviously these conclusions would be identical if the speed of light was infinitely great,
which was still considered a real possibility in Galileo's day. Many people, including
Descartes, regarded rays of light as instantaneous. Even Newton's Opticks, published in
1704, made allowances for the possibility that "light be propagated in an instant"
(although Newton himself was persuaded by Roemer's observations that light has a finite
speed). Hence it can be argued that the principles of Galileo and Einstein are essentially
identical in both form and content. The only difference is that Galileo assessed the
propagation of light to be "if not instantaneous then extraordinarily fast", and thus could
neglect the term uv/c², especially since he restricted his considerations to the movements
of material objects, whereas subsequently it became clear that the speed of light has a
finite value, and it was necessary to take account of the uv/c² term when attempting to
incorporate the motions of light and high-speed particles into the framework of
mechanics.
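As a simple numerical illustration of why the correction term escaped notice for so long, consider the following sketch (our own, in Python, with speeds expressed as fractions of c so that c = 1):

    # Sketch: Galilean vs. relativistic composition of colinear speeds (c = 1).
    def compose_galilean(v, u):
        return v + u

    def compose_relativistic(v, u):
        return (v + u) / (1 + u*v)      # (v + u)/(1 + uv/c^2) with c = 1

    # At planetary speeds the term uv/c^2 is utterly negligible...
    v = u = 30e3 / 3e8                  # roughly Earth's orbital speed
    print(compose_galilean(v, u) - compose_relativistic(v, u))   # ~2e-12

    # ...but at high speeds the difference is large, and the composed
    # speed never exceeds the speed of light.
    print(compose_galilean(0.9, 0.9))      # 1.8
    print(compose_relativistic(0.9, 0.9))  # 0.9945...
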
The empirical correspondence between inertial isotropy and lightspeed isotropy can be
illustrated by a simple experiment. Three objects, A, B, and C, at rest with respect to
each other can be arranged so that one of them is at the midpoint between the other two
(the midpoint having been determined using standard measuring rods at rest with respect
to those objects). The two outer objects, A and C, are equipped with identical clocks, and
the central object, B, is equipped with two identical cannons. Let the two cannons in the
center be fired simultaneously in opposite directions toward the two outer objects, and
then at a subsequent time let object B emit a flash of light. If the arrivals of the
cannonball and light coincide at A, then they also coincide at C, signifying that the
propagation of light is isotropic with respect to the same system of coordinates in terms
of which mechanical inertia is isotropic, as illustrated in the figure below.

The fact that light emitted from object B propagates isotropically with respect to B's
inertial rest frame might seem to suggest that light can be treated as an inertial object
within the Galilean framework, just like cannon-balls. However, we also find that if the
light is emitted at the same time and place from an object D that is moving with respect to
B (as shown in the figure above), the light's speed is still isotropic with respect to B's
inertial rest frame. Now, this might seem to suggest that light is a disturbance in a
material medium in which the objects A,B,C just happen to be at rest, but this is ruled out
by the fact that it applies regardless of the state of (uniform) motion of those objects.
Naturally this implies that the flash of light propagates isotropically with respect to the
inertial rest coordinates of object D as well. To demonstrate this, we could arrange for
two other bodies, denoted by E and F, to be moving at the same speed as D, and located
an equal distance from D in opposite directions. Then we could fire two identically
constructed cannons (at rest with respect to D) in opposite directions, toward E and F.
The results are illustrated below.

The cannons are fired from D when it crosses the x axis, and the cannon-balls strike E
and F at the events marked a and b, coincident with the arrival of the light pulse from D.
Obviously the time axis for the inertial rest frame coordinates of object D is the worldline
of D itself (rather than the original "t" axis shown on the figure). In addition, since
inertial coordinates are defined such that mechanical inertia is isotropic, it follows that
the cannon-balls fired from identical cannons at rest with D are moving with equal and
opposite speeds with respect to D's inertial rest coordinates, and since E and F are at
equal distances from D, it also follows that the events a and b are simultaneous with
respect to the inertial rest coordinates of D. Hence, not only is the time axis of D's rest
frame slanted with respect to B's time axis, the spatial axis of D's rest frame is equally
slanted with respect to B's spatial axis.
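For readers who prefer numbers to figures, this equal slanting can be checked with the transformation that will emerge later in this section. The following sketch (our own construction, with c = 1) shows that two events on the line t = vx in B's coordinates are indeed simultaneous in D's rest frame:

    # Sketch: events on the line t = v*x in B's inertial coordinates are
    # simultaneous in D's rest frame, i.e., D's space axis is slanted by
    # the same amount as D's time axis (units with c = 1).
    import math

    def to_rest_frame_of_D(x, t, v):
        g = 1 / math.sqrt(1 - v*v)
        return g*(x - v*t), g*(t - v*x)       # (x', t')

    v = 0.5
    a = (-1.0, -0.5)      # event a: t = v*x with x = -1
    b = ( 1.0,  0.5)      # event b: t = v*x with x = +1
    for x, t in (a, b):
        print(to_rest_frame_of_D(x, t, v))    # t' = 0 for both; x' = -/+0.866
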
Several other important conclusions can be deduced from this figure. For example, with
respect to the original x,t coordinate system, the speeds of the cannon-balls from D are
not given by simply adding (or subtracting) the speed of the cannon-balls with respect to
D's rest frame to (or from) the speed of D with respect to the x,t coordinates. Since
momentum is explicitly conserved, this implies that the inertia of a body increases with
its velocity (i.e., kinetic energy), as is discussed in more detail in Section 2.3. We should
also note that although the speed of light is isotropic with respect to any inertial
spacetime coordinates, independent of the motion of the source, it is not correct to say
that the light itself is isotropic. The relationship between the frequency (and energy) of
the light with respect to the rest frame of the emitting body and the frequency (and
energy) of the light with respect to the rest frame of the receiving body does depend on
the relative velocity between those two massive bodies (as discussed in Section 2.4).
Incidentally, notice that we can rule out the possibility of object B and D dragging the
light medium along with them, because they are moving through the same region of
space at the same time, and they can't both be dragging the same medium in opposite
directions. This is in contrast to the case of (for example) acoustic pressure waves in a
material substance, because in that case a recognizable material substance determines the
unique isotropic frame, whereas in the case of light we're unable to identify any definite
material medium, so the medium has no definite rest frame.
The first person to discern the true relationship between relatively moving systems of
inertial coordinate systems was Hendrik Antoon Lorentz. Not surprisingly, he arrived at
this conception in a rather indirect and laborious way, and didn't immediately recognize
that the class of coordinate systems he had discovered (and which he called "local
coordinate" systems) were none other than Galileo's inertial coordinate systems.
Incidentally, although Lorentz and Voigt knew and corresponded with each other, Lorentz
apparently was not aware of Voigt's earlier work on coordinate transformations that leave
the wave equation invariant, and so that work had no influence on Lorentz's search for
coordinate systems in terms of which Maxwell's equations are invariant. Unlike Voigt,
Lorentz derived the transformation in two separate stages. He first developed the "local
time" coordinate, and only years later came to the conclusion (after, but independently of,
Fitzgerald) that a "contraction" of spatial length was also necessary in order to account

for the absence of second-order effects in Michelson's experiment.
Lorentz began with the absolute ether frame coordinates t and x, in terms of which every
event can be assigned a unique space-time position (t,x), and then he considered a system
moving with the velocity v in the positive x direction. He applied the traditional Galilean
transformation to assign a new set of coordinates to every event. Thus an event with
ether-frame coordinates t,x is assigned the new coordinates x″ = x − vt and t″ = t. Then
he tentatively proposed an additional transformation that must be applied to x",t" in order
to give coordinates in terms of which Maxwell's equations apply in their standard form.
Lorentz was not entirely clear about the physical significance of these local
coordinates, but it turns out that all physical phenomena conform to the same isotropic
laws of physics when described in terms of these coordinates. (Lorentz's notation made
use of the parameter β = 1/(1 − v²)^{1/2} and another constant l which he later determines
to be 1.) Taking units such that c = 1, his equations for the local coordinates x′ and t′ in
terms of the Galilean coordinates which we are calling x″ and t″ are

x′ = βl x″,    t′ = (l/β) t″ − βlv x″
Recall that the traditional Galilean transformation is x″ = x − vt and t″ = t, so we can make
these substitutions to give the complete transformation from the original ether rest frame
coordinates x,t to the local coordinates moving with speed v

x′ = βl (x − vt),    t′ = βl (t − vx)
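A quick symbolic check of this two-stage composition (a sketch using sympy, with β and l as defined above):

    # Sketch: composing Lorentz's Galilean step with his "local coordinates"
    # step reproduces the complete transformation stated above (c = 1).
    import sympy as sp

    x, t, v, l = sp.symbols('x t v l', real=True)
    beta = 1 / sp.sqrt(1 - v**2)

    x2, t2 = x - v*t, t                  # stage 1: Galilean transformation
    x1 = beta * l * x2                   # stage 2: Lorentz's local coordinates
    t1 = (l / beta) * t2 - beta * l * v * x2

    print(sp.simplify(x1 - beta*l*(x - v*t)))   # 0
    print(sp.simplify(t1 - beta*l*(t - v*x)))   # 0
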
These effective coordinates enabled Lorentz to explain how two relatively moving
observers, each using his own local system of coordinates, both seem to remain at the
center of expanding spherical light waves originating at their point of intersection, as
illustrated below

The x and x' axes represent the respective spatial coordinates (say, in the east/west
direction), and the t and t' axes represent the respective time coordinates. One observer is
moving through time along the t axis, and the other has some relative westward velocity
as he moves through time along the t' axis. The two observers intersect at the event
labeled O, where they each emit a pulse of light. Those light pulses emanate away from O
along the dotted lines. Subsequently the observer moving along the t axis finds himself at
C, and according to his measures of space and time the outward going light waves are at
E and W at that same instant, which places him at the midpoint between them. On the
other hand, the observer moving along t' axis finds himself at point c, and according to
his measures of space and time the outward going light waves are at e and w at this
instant, which implies that he is at the midpoint between them.
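A numerical sketch of this picture (our own, taking c = 1 and l = 1 in the transformation given above) confirms that each observer is midway between the pulses by his own measures:

    # Sketch: each observer judges himself to be midway between the two
    # light pulses emitted at O, using his own inertial coordinates (c = 1).
    import math

    v = 0.6                              # speed of the second observer
    g = 1 / math.sqrt(1 - v*v)

    def local(x, t):
        """Coordinates of an event in the moving observer's system."""
        return g*(x - v*t), g*(t - v*x)

    # First observer at time t = 1: he is at x = 0, the pulses are at
    # x = -1 and x = +1, so he is midway between them by his own measures.

    # Moving observer: pick the events on the pulses (x = +/- t) that occur
    # at his local time t' = 1. They lie at x' = +1 and x' = -1, with the
    # observer himself at x' = 0, again midway between the pulses.
    t_e = 1 / (g*(1 - v))                # east-going pulse: x = +t
    t_w = 1 / (g*(1 + v))                # west-going pulse: x = -t
    print(local( t_e, t_e))              # (1.0, 1.0)
    print(local(-t_w, t_w))              # (-1.0, 1.0)
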
Thus Lorentz discovered that by means of the "fictitious" coordinates x',t' it was possible
to conceive of a class of relatively moving coordinate systems with respect to which the
speed of light is invariant. He went beyond Voigt in the realization that the existence of
this class of coordinate systems ensures the appearance of relativity, at least for optical
phenomena, and yet, like Voigt, he still tended to regard the "local coordinates" as
artificial. Having been derived specifically for electromagnetism, it was not clear that the
same transformations should apply to all physical phenomena, including inertia, gravity,
and whatever forces are responsible for the stability of matter, at least not without
simply hypothesizing this to be the case. However, Lorentz was dissatisfied with the
proliferation of hypotheses that he had made in order to arrive at this theory. The same
criticism was made in a contemporary review of Lorentz's work by Poincare, who chided
him with the remark "hypotheses are what we lack least". The most glaring of these was
the hypothesis of contraction, which seemed distinctly "ad hoc" to most people, including
Lorentz himself originally, but gradually he came to realize that the contraction
hypothesis was not as unnatural as it might seem.
Surprising as this hypothesis may appear at first sight, yet we shall have to admit
that it is by no means far-fetched, as soon as we assume that molecular forces are
also transmitted through the ether, like the electric and magnetic forces
He set about trying to show (admittedly after the fact) that the Fitzgerald contraction was
to be expected based on what he called the Molecular Force Hypothesis and his theorem
of Corresponding States, as discussed in the next section.
1.5 Corresponding States
It would be more satisfactory if it were possible to show by means of
certain fundamental assumptions - and without neglecting terms of any
order - that many electromagnetic actions are entirely independent of the
motion of the system. Some years ago I already sought to frame a theory
of this kind. I believe it is now possible to treat the subject with a better
result.
H. A. Lorentz

In 1889 Oliver Heaviside deduced from Maxwell's equations that the electric and
magnetic fields on a spherical surface of radius r surrounding a uniformly moving electric
charge e are radial and circumferential respectively, with magnitudes
E = e(1 − v²)/[r²(1 − v² sin²θ)^{3/2}],    B = vE sinθ
where θ is the angle relative to the direction of motion with respect to the stationary
frame of reference. (We have set c = 1 for clarity.) The left hand equation implies that, in
comparison with a stationary charge, the electric field strength at a distance r from a
moving charge is less by a factor of 1 − v² in the direction of motion, and greater by a
factor of 1/(1 − v²)^{1/2} in the perpendicular directions. Thus the strength of the electric field
of a moving charge is anisotropic. These equations imply that
ψ = e(1 − v²)/[r(1 − v² sin²θ)^{1/2}]
which Heaviside recognized as the convection potential, i.e., the scalar field whose
gradient is the total electromagnetic force on a co-moving charge at that relative position.
This scalar is invariant under Lorentz transformations, and it follows from the above
formula that the cross-sections of the surfaces of constant potential are described by

x² + (1 − v²)y² = constant
This is the equation of an ellipse, so Heaviside's formulas imply that the surfaces of
constant potential are ellipsoids, shortened in the direction of motion by the factor
(1 − v²)^{1/2}. From the modern perspective the contraction of characteristic lengths in the
direction of motion is an immediate corollary of the fact that Maxwell's equations are
Lorentz covariant, but at the time the idea of anisotropic changes in length due to motion
was regarded as a distinct and somewhat unexpected attribute of electromagnetic fields. It
wasn't until 1896 that Searle explicitly pointed out that Heaviside's formulas imply the
contraction of surfaces of constant potential into ellipsoids, but already in 1889 it seems
that Heaviside's findings had prompted an interesting speculation as to the deformation of
stable material objects in uniform motion.
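A numerical sketch of Searle's observation (our own, with charge e = 1 and c = 1): evaluating the convection potential on the ellipse x² + (1 − v²)y² = constant shows that it is indeed constant there.

    # Sketch: the convection potential is constant on ellipsoids shortened
    # in the direction of motion by the factor (1 - v^2)^(1/2). Units with
    # e = 1 and c = 1; theta is measured from the direction of motion.
    import math

    def convection_potential(x, y, v):
        r = math.hypot(x, y)
        sin_theta = y / r
        return (1 - v*v) / (r * math.sqrt(1 - (v*sin_theta)**2))

    v, s = 0.8, 2.0
    a = s * math.sqrt(1 - v*v)           # semi-axis along the motion
    b = s                                # transverse semi-axis
    for phi in (0.1, 0.7, 1.3):          # sample points on the ellipse
        x, y = a*math.cos(phi), b*math.sin(phi)
        print(convection_potential(x, y, v))   # 0.3 at every point
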
George Fitzgerald corresponded with Heaviside, and learned of the anisotropic variations
in field strengths for a moving charge, and this was at the very time when he was
struggling to understand the null result of the latest Michelson and Morley ether drift
experiment (performed in 1887). It occurred to Fitzgerald that the null result would be
explained if the material comprising Michelson's apparatus contracts in the direction of

motion by the factor (1 − v²)^{1/2}, and moreover that this contraction was not entirely
implausible, because, as he wrote in a brief letter to the American journal Science in 1889
We know that electric forces are affected by the motion of the electrified bodies relative
to the ether and it seems a not improbable supposition that the molecular forces are
affected by the motion and that the size of the body alters consequently.
A few years later (1892) Lorentz independently came to the same conclusion, and
proceeded to explain in detail how the variations in the electromagnetic field implied by
Maxwell's equations actually result in a proportional contraction of matter, at least if we
assume the forces responsible for the stability of matter are affected by motion in the
same way as the forces of electromagnetism. This latter assumption Lorentz called the
molecular force hypothesis, admitting that he had no real justification for it (other than
the fact that it accounted for Michelson's null result). On the basis of this hypothesis,
Lorentz showed that the description of the equilibrium configuration of a uniformly
moving material object in terms of its local coordinates is identical to the description of
the same object at absolute rest in terms of the ether rest frame coordinates. He called this
the theorem of corresponding states.
To illustrate, consider a small bound spherical configuration of matter at rest in the ether.
We assume the forces responsible for maintaining the spherical structure of this particle
are affected by uniform motion through the ether in exactly the same way as are
electromagnetic forces, which is to say, they are covariant with respect to Lorentz
transformations. These forces may propagate at any speed (at or below the speed of
light), but it is most convenient for descriptive purposes to consider forces that propagate
at precisely the speed of light (in terms of the fixed rest frame coordinates of the ether),
because this automatically ensures Lorentz covariance. A wave emanating from the
geometric center of the particle at the speed c would expand spherically until reaching the
radius of the configuration, where we can imagine that it is reflected and then contracts
spherically back to a point (like a spatial filter) and re-expands on the next cycle. This is
illustrated by the left-hand cycle below.

Only two spatial dimensions are shown in this figure. (In four-dimensional spacetime
each shell is actually a sphere.) Now, if we consider an intrinsically identical

configuration of matter in uniform motion relative to the putative rest frame of the ether,
and if the equilibrium shape is maintained by forces that are Lorentz covariant, just as is
the propagation of electromagnetic waves, then it must still be the case that an
electromagnetic wave can expand from the center of the configuration to the perimeter,
and be reflected back to the center in a coherent pattern, just as for the stationary
configuration. This implies that the absolute shape of the configuration must change from
a sphere to an ellipsoid, as illustrated by the right-hand figure above. The spatial size of
the particle in terms of the ether rest frame coordinates is just the intersection of a
horizontal time slice with the region swept out by the perimeter of the configuration. For
any given characteristic particle, since there is no motion relative to the ether in the
transverse direction, the size in the transverse direction must be unaffected by the motion.
Thus the widths of the configurations in the "y" direction in the above figures are equal.
The figure below shows more detailed side and top views of one cycle of a stationary and
a moving particle (with motions referenced to the rest frame of the putative ether).

It's understood that these represent corresponding states, i.e., intrinsically identical
equilibrium configurations of matter, whose spatial shapes are maintained by Lorentz
covariant forces. In each case the geometric center of the configuration progresses from
point A to point B in the respective figure. The right-hand configuration is moving with a
speed v in the positive x direction. It can be shown that the transverse sizes of the
configurations are equal if the projected areas of the cross-sectional side views (the lower
figures) are equal. Thus, light emanating from point A of the moving particle extends a
distance 1/σ to the left and a distance σ to the right, where σ is a constant function of v.
Specifically, we must have

σ = [(1 + v)/(1 − v)]^{1/2}
where we have set c = 1 for clarity. The leading edge of the shaft swept out by the
moving shell crosses the x axis at a distance σ(1 − v) from the center point A, which
implies that the object's instantaneous spatial extent from the center to the leading edge is
only

σ(1 − v) = (1 − v²)^{1/2}
Likewise it's easy to see that the elapsed time (according to the putative ether rest frame
coordinates) for one cycle of the moving particle, i.e., from point A to point B, is simply
σ + 1/σ = 2/(1 − v²)^{1/2}
compared with an elapsed time of 2 for the same particle at rest. Hence we unavoidably
arrive at Fitzgerald's length contraction and Lorentz's local time dilation for objects in
motion with respect to the x,y,t coordinates, provided only that all characteristic spatial
and temporal intervals associated with physical entities are maintained by forces that are
Lorentz covariant.
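The arithmetic of the preceding paragraphs can be summarized in a few lines. The following sketch (our own, with c = 1 and unit rest radius) computes the forward and backward light-transit times for the moving configuration:

    # Sketch: light-transit times across a configuration of unit rest radius
    # moving at speed v, assuming the radius contracts to (1 - v^2)^(1/2).
    import math

    def light_cycle(v):
        r = math.sqrt(1 - v*v)          # contracted radius
        t_fwd  = r / (1 - v)            # center to leading edge (light chases it)
        t_back = r / (1 + v)            # leading edge back to the center
        return t_fwd, t_back

    v = 0.6
    sigma = math.sqrt((1 + v)/(1 - v))
    t_fwd, t_back = light_cycle(v)
    print(t_fwd,  sigma)                 # 2.0 = sigma
    print(t_back, 1/sigma)               # 0.5 = 1/sigma
    print(t_fwd + t_back)                # 2.5 = 2/(1 - v^2)**0.5, vs. 2 at rest
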
The above discussion did not invoke Maxwell's equations at all, except to the extent that
those equations suggested the idea that all the fundamental forces are Lorentz covariant.
Furthermore, we have so far omitted consideration of one very important force, namely,
the force of inertia. We assumed the equilibrium configurations of matter were
maintained by certain forces, but if we consider oscillating configurations, we see that the
periodic shapes of such configurations depend not only on the binding force(s) but also
on the inertia of the particles. Therefore, in order to arrive at a fully coherent theorem of
corresponding states, we must assume that inertia itself is Lorentz covariant. As Lorentz
wrote in his 1904 paper
the proper relation between the forces and the accelerations will exist if we
suppose that the masses of all particles are influenced by a translation to the same
degree as the electromagnetic masses of the electrons.
In other words, we must assume the inertial mass (resistance to acceleration) of every
particle is Lorentz covariant, which implies that the mass has transverse and longitudinal
components that vary in a specific way when the particle is in motion. Now, it was known
that some portion of a charged object's resistance to acceleration is due to self-induction,
because a moving charge constitutes an electric current, which produces a magnetic field,
which resists changes in the current. Not surprisingly, this resistance to acceleration is
Lorentz covariant, because it is a purely electromagnetic effect. At one time it was
thought that perhaps all mass (even of electrically neutral particles) might be
electromagnetic in origin, and some even hoped that gravity and the unknown forces
governing the stability of matter would also someday be shown to be electromagnetic,
leading to a totally electromagnetic world view. (Ironically, at this same time, others were
trying to maintain the mechanical world view, by seeking to explain the phenomena of
electromagnetism in terms of mechanical models.) If in fact all physical effects are
ultimately electromagnetic, one could plausibly argue that Lorentz had succeeded in
developing a constructive account of relativity, based on the known properties of
electromagnetism. Essentially this would have resolved the apparent conflict between the
Galilean relativity of mechanics and Lorentzian relativity of electromagnetism, by
asserting that there is no such thing as mechanics, there is only electromagnetism. Then,
since electromagnetism is Lorentz covariant, it would follow that everything is Lorentz
covariant.
However, it was already known (though perhaps not well known) when Lorentz wrote his
paper in 1904 that the electromagnetic world view is not tenable. Poincare pointed this
out in his 1905 Palermo paper, in which he showed that the assumption of a purely
electromagnetic electron was self-consistent only with the degenerate solution of no
charge density at all. Essentially, the linearity of Maxwell's equations implies that they
cannot possibly yield stable bound configurations of charge. Poincare wrote
We must then admit that, in addition to electromagnetic forces, there are also
non-electromagnetic forces or bonds. Therefore, we need to identify the conditions
that these forces or bonds must satisfy for electron equilibrium to be undisturbed
by the [Lorentz] transformation.
In the remainder of this remarkable paper, Poincare derives general conditions that
Lorentz covariant forces must satisfy, and considers in particular the force of gravity. The
most significant point is that Poincare had recognized that Lorentz had reached the limit
of his constructive approach, and instead he (Poincare) was proceeding not to deduce the
necessity of relativity from the phenomena of electromagnetism or gravity, but rather to
deduce the necessary attributes of electromagnetism and gravity from the principle of
relativity. In this sense it is fair to say that Poincare originated a theory of relativity in
1905 (simultaneously with Einstein). On the other hand, both Poincare and Lorentz
continued to espouse the view that relativity was only an apparent fact, resulting from the
circumstance that our measuring instruments are necessarily affected by absolute motion
in the same way as are the things being measured. Thus they believed that the speed of
light was actually isotropic only with respect to one single inertial frame of reference, and
it merely appeared to be isotropic with respect to all the others. Of course, Poincare
realized full well (and indeed was the first to point out) that the Lorentz transformations
form a group, and the symmetry of this group makes it impossible, even in principle, to
single out one particular frame of reference as the true absolute frame (in which light
actually does propagate isotropically). Nevertheless, he and Lorentz both argued that
there was value in maintaining the belief in a true absolute rest frame, and this point of
view has continued to find adherents down to the present day.
As a historical aside, Oliver Lodge claimed that Fitzgerald originally suggested the
deformation of bodies as an explanation of Michelsons null result
while sitting in my study at Liverpool and discussing the matter with me. The
suggestion bore the impress of truth from the first.

Interestingly, Lodge interpreted Fitzgerald as saying not that objects contract in the
direction of motion but that they expand in the transverse direction. We saw in the
previous section how Voigt's derivation of the Lorentz transformation left the scale factor
undetermined, and the evaluation of this factor occupied a surprisingly large place in the
later writings of Lorentz, Poincare, and Einstein. In his book The Ether of Space (1909)
Lodge provided an explanation for why he believed the effect of motion should be a
transverse expansion rather than a longitudinal contraction. He wrote
When a block of matter is moving through the ether of space its cohesive forces
across the line of motion are diminished, and consequently in that direction it
expands
Lodge's reliability is suspect, since he presents this as an explanation not only of
Fitzgerald's suggestion but also of Lorentz's theory, which it definitely is not. But more
importantly, Lodge's misunderstanding highlights one of the drawbacks of conceiving of
the deformation effect as arising from variations in electromagnetic forces. In order to
give a coherent account of phenomena, the lengths of objects must vary in exactly the
same proportion as the distances between objects. It would be quite strange to suppose
that the transverse distances between (neutral and widely separated) objects would
increase by virtue of being set in motion along parallel lines. In fact, it is not clear what
this would even mean. If three or more objects were set in parallel motion, in which
direction would they be deflected? And what could be the cause of such a deflection?
Neutral objects at rest exert a small attractive force on each other (due to gravity), but
diminishing this net force of cohesion would obviously not cause the objects to repel each
other.
Oddly enough, if Lodge had focused on the temporal instead of the spatial effects of
motion, his reasoning would have approximated a valid justification for time dilation.
This justification is often illustrated in terms of two mirrors in parallel motion, with a pulse
of light bouncing between them. In this case the motion of the mirrors actually does
diminish the frequency of bounces, relative to the stationary ether frame, because the
light must travel further between each reflection. Thus the time intervals expand (i.e.,
dilate). Given this time dilation of the local moving coordinates, it's fairly obvious that
there must be a corresponding change in the effective space coordinate (since spatial
lengths are directly related to time intervals by dx = vdt). In other words, if an observer
moves at speed v relative to the ground, and passes over an object of length L at rest on
the ground, the length of the object as assessed by the moving observer is affected by his
measure of time. Since he is moving at speed v, the length of the object is vdt, where dt is
the time it takes him to traverse the length of the object but which "dt" will he use?
Naturally if he bases his length estimate on the measure of the time interval recorded on a
ground clock, he will have dt = L/v, so he will judge the object to be v(L/v) = L units in
length. However, if he uses his own effective time as indicated on his own co-moving
transverse light clock, he will have dt′ = dt(1 − v²)^{1/2}, so the effective length is
v[(L/v)(1 − v²)^{1/2}] = L(1 − v²)^{1/2}. Thus, effective length contraction (and no transverse
expansion) is logically unavoidable given the effective time dilation.

It might be argued that we glossed over an ambiguity in the above argument by
considering only light clocks with pulses moving transversely to the motion of the
mirrors, giving the relation dt′ = dt(1 − v²)^{1/2}. If, instead, we align the axis between the
mirrors with the direction of travel, we get dt′ = dt(1 − v²), so it might seem we have an
ambiguous measure of local time, and therefore an ambiguous prediction of length
contraction since, by the reasoning given above, we would conclude that an object of
rest-length L has the effective length L(1 − v²). However, this fails to account for the
contraction of the longitudinal distance between the mirrors (when they are arranged
along the axis of motion). Since by construction the speed of light is c in terms of the
local coordinates for the clock, the very same analysis that implies length contraction for
objects moving relative to the ether rest frame coordinates also implies the same
contraction for objects moving relative to the new local coordinates. Thus the clock is
contracted in the longitudinal direction relative to the ground's coordinates by the same
factor that objects on the ground are contracted in terms of the moving coordinates.
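These relations are easy to verify numerically. Here is a sketch (our own, with c = 1 and mirrors separated by unit rest length) comparing the round-trip times of transverse and longitudinal light clocks:

    # Sketch: round-trip times for light clocks of unit rest length moving
    # at speed v through the ether frame (c = 1).
    import math

    def transverse_period(v):
        # Each one-way trip satisfies t^2 = 1 + (v*t)^2, i.e. t = 1/sqrt(1 - v^2).
        return 2 / math.sqrt(1 - v*v)

    def longitudinal_period(v, contracted=True):
        L = math.sqrt(1 - v*v) if contracted else 1.0
        return L/(1 - v) + L/(1 + v)    # chase the far mirror, then return

    v = 0.6
    print(transverse_period(v))             # 2.5
    print(longitudinal_period(v))           # 2.5: the clock rate is isotropic
    print(longitudinal_period(v, False))    # 3.125: anisotropic if uncontracted
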
The amount of spatial contraction depends on the amount of time dilation, which depends
on the amount of spatial contraction, so it might seem as if the situation is indeterminate.
However, all but one of the possible combinations are logically inconsistent. For
example, if we decided that the clock was shortened by the full longitudinal factor of
(1 − v²), then there would be no time dilation at all, but with no time dilation there would be
no length contraction, so this is self-contradictory. The only self-consistent arrangement
that reconciles each reference frame's local measures of longitudinal time and length is
with the factor (1 − v²)^{1/2} applied to both. This also agrees with the transverse time dilation,
so we have isotropic clocks with respect to the local (i.e., inertial) coordinates of any
uniformly moving frame, and by construction the speed of light is c with respect to each
of these systems of coordinates. This is illustrated by the figures below, showing how the
spacetime pattern of reflecting light rays imposes a skew in both the time and the space
axes of relatively moving systems of coordinates.

A slightly different approach is to notice that, according to a "transverse" light clock, we
have the partial derivative ∂t/∂T = 1/(1 − v²)^{1/2} along the absolute time axis, i.e., the line
X = 0. Integrating gives t = (T − f(X))/(1 − v²)^{1/2} where f(X) is an arbitrary function of X.
The question is: Does there exist a function f(X) that will yield physical relativity? If such
a function exists, then obviously the resulting coordinates are the ones that will be adopted
as the rest frame by any observer at rest with respect to them. Such a function does
indeed exist, namely, f(X) = vX, which gives t = (T − vX)/(1 − v²)^{1/2}. To show reciprocity,
note that X = vT along the t axis, so we have t = T(1 − v²)/(1 − v²)^{1/2}, which gives
T = t/(1 − v²)^{1/2} and so ∂T/∂t = 1/(1 − v²)^{1/2}. As we've seen, this same transformation
yields relativity
in the longitudinal direction as well, so there does indeed exist, for any object in any state
of motion, a coordinate system with respect to which all optical phenomena are isotropic,
and as a matter of empirical fact this is precisely the same class of systems invoked by
Galileo's principle of mechanical relativity, the inertial systems, i.e., coordinate systems
with respect to which mechanical inertia is isotropic.
Lorentz noted that the complete reciprocity and symmetry between the "true" rest frame
coordinates and each of the local effective coordinate systems may seem surprising at
first. As he said in his Leiden lectures in 1910
The behavior of measuring rods and clocks in translational motion, when viewed
superficially, gives rise to a remarkable paradox, which on closer examination,
however, vanishes.
The seeming paradox arises because the Lorentz transformation between two relatively
moving systems of inertial coordinates (x,t) and (X,T) implies ∂t/∂T = ∂T/∂t, and there is
a temptation to think this implies (dt)² = (dT)². Of course, this paradox is based on a
confusion between total and partial derivatives. The parameter t is a function of both X
and T, and the expression ∂t/∂T represents the partial derivative of t with respect to T at
constant X. Likewise T is a function of both x and t, and the expression ∂T/∂t represents
the partial derivative of T with respect to t at constant x. Needless to say, there is nothing
logically inconsistent about a transformation between (x,t) and (X,T) such that (∂t/∂T)_X
equals (∂T/∂t)_x, so the paradox (as Lorentz says) vanishes.
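The resolution of the paradox can be exhibited explicitly with a short symbolic computation (a sketch using sympy, with c = 1):

    # Sketch: the two partial derivatives are equal, each taken with a
    # different variable held constant, so there is no contradiction (c = 1).
    import sympy as sp

    X, T, x, t, v = sp.symbols('X T x t v', real=True)
    g = 1 / sp.sqrt(1 - v**2)

    t_of_XT = g*(T - v*X)       # t as a function of (X, T)
    T_of_xt = g*(t + v*x)       # T as a function of (x, t), from the inverse

    print(sp.diff(t_of_XT, T))  # dt/dT at constant X: 1/sqrt(1 - v^2)
    print(sp.diff(T_of_xt, t))  # dT/dt at constant x: the same value

    # Along the t axis (X = v*T) the total relation is instead
    print(sp.simplify(t_of_XT.subs(X, v*T)))   # T*sqrt(1 - v^2)
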
The writings of Lorentz and Poincare by 1905 can be assembled into a theory of relativity
that is operationally equivalent to the modern theory of special relativity, although
lacking the conceptual clarity and coherence of the modern theory. Lorentz was
justifiably proud of his success in developing a theory of electrodynamics that accounted
for all the known phenomena, explaining the apparent relativity of these phenomena, but
he was also honest enough to acknowledge that the success of his program relied on
unjustified hypotheses, the most significant of which was the hypothesis that inertial
mass is Lorentz covariant. To place Lorentz's achievement in context, recall that toward
the end of the 19th century it appeared electromagnetism was not relativistic, because the
property of being relativistic was equated with being invariant under Galilean
transformations, and it was known that Maxwell's equations (unlike Newton's laws of
mechanics) do not possess this invariance. Lorentz, prompted by experimental results,
discovered that Maxwell's equations actually are relativistic, in the sense of his theorem
of corresponding states, meaning that there are relatively moving coordinate systems in
terms of which Maxwell's equations are still valid. But these systems are not related by
Galilean transformations, so it still appeared that mechanics (presumed to be Galilean
covariant) and electrodynamics were not mutually relativistic, which meant it ought to be
possible to discern second-order effects of absolute motion by exploiting the difference

between the Galilean covariance of mechanics and the Lorentz covariance of
electromagnetism.
However, all experiments refuted this expectation. In other words, it was found
empirically that electromagnetism and mechanics are mutually relativistic (at least to
second order). Hence the only possible conclusion is that either the known laws of
electromagnetism or the known laws of mechanics must be subtly wrong. Either the
correct laws of electromagnetism must really be Galilean covariant, or else the correct
laws of inertial mechanics must really be Lorentz covariant. At this point, in order to
save the phenomena, Lorentz simply assumed that inertial mass is Lorentz covariant.
Of course, he had before him the example of self-induction of charged objects, leading to
the concept of electromagnetic mass, which is manifestly Lorentz covariant, but, as
Poincare observed, it is not possible (and doesn't even make sense) for the intrinsic mass
of elementary particles to be electromagnetic in origin. Hence the hypothesis of Lorentz
covariance for inertia (and therefore inertial mechanics) is not a constructive deduction;
it is not even implied by the molecular force hypothesis (because there is no reason to
suppose that anything analogous to self-induction of the unknown molecular forces is
ultimately responsible for inertia); it is simply a hypothesis, motivated by empirical facts.
This does not diminish Lorentz's achievement, but it does undercut his comment that Einstein "simply postulates what we have deduced from the fundamental equations of the electromagnetic field". In saying this, Lorentz overlooked the fact that the Lorentz covariance of mechanical inertia cannot be deduced from the equations of electromagnetism. He simply postulated it, no less than Einstein did.
Much of the confusion over whether Lorentz deduced or postulated his results is due to a conflation of two aspects of the problem. First, it was necessary to determine
that Maxwell's equations are Lorentz covariant. This was in fact deduced by Lorentz
from the laws themselves, consistent with his claim. But in order to arrive at a complete
theory of relativity (and in particular to account for the second-order null results) it is also
necessary to determine that mechanical inertia (and molecular forces, and gravity) are all
Lorentz covariant. This proposition was not deduced by Lorentz (or anyone else) from
the laws of electromagnetism, nor could it be, because it does not follow from those laws.
It is merely postulated, just as we postulate the conservation of energy, as an organizing
principle, justified by its logical cogency and empirical success. As Poincare clearly
explained in his Palermo paper, the principle of relativity itself emerges as the only
reliable guide, and this is as true for Lorentz's approach as it is for Einstein's, the main
difference being that Einstein recognized this principle was not only necessary, but also
that it obviated the detailed assumptions as to the structure of matter. Hence, even with
regard to electromagnetism (let alone mechanics) Lorentz could write in the 1915 edition
of his Theory of Electrons that
If I had to write the last chapter now, I should certainly have given a more
prominent place to Einstein's theory of relativity, by which the theory of
electromagnetic phenomena in moving systems gains a simplicity that I had not
been able to attain.

Nevertheless, as mentioned previously, Lorentz and Poincare both continued to espouse the merits of the absolute interpretation of relativity, although Poincare seemed to regard the distinction as merely conventional. For example, in a 1912 lecture he said
The new conception according to which space and time are no longer two
separate entities, but two parts of the same whole, which are so intimately bound
together that they cannot be easily separated is a new convention [that some physicists have adopted]... Not that they are constrained to do so; they feel that this new convention is more comfortable, that's all; and those who do not share
their opinion may legitimately retain the old one, to avoid disturbing their ancient
habits. Between ourselves, let me say that I feel they will continue to do so for a
long time still.
Sadly, Poincare died just two months later, but his prediction has held true, because to
this day the ancient habits regarding absolute space and time persist. There are today
scientists and philosophers who argue in favor of what they see as Lorentz's constructive
approach, especially as a way of explaining the appearance of relativity, rather than
merely accepting relativity in the same way we accept (for example) the principle of
energy conservation. However, as noted above, the constructiveness of Lorentz's
approach begins and ends with electromagnetism, the rest being conjecture and
hypothesis, so this argument in favor of the Lorentzian view is misguided. But setting this
aside, is there any merit in the idea that the absolutist approach effectively explains the
appearance of relativity?
To answer this question, we must first clearly understand what precisely is to be
explained when one seeks to explain relativity. As discussed in section 1.2, we are
presented with many relativities in nature, such as the relativity of spatial orientation. It's
important to bear in mind that this relativity does not assert that the equilibrium lengths
of solid objects are unaffected by orientation; it merely asserts that all such lengths are
affected by orientation in exactly the same proportion. It's conceivable that all solid
objects are actually twice as long when oriented toward (say) the Andromeda galaxy than
when oriented perpendicular to that direction, but we have no way of knowing this.
Hence if we begin with the supposition that all objects are twice as long when pointed
toward Andromeda, we could deduce that all lengths will appear to be independent of
orientation, because they are all affected equally. But have we thereby explained the
apparent isotropy of spatial lengths? Not at all, because the thing to be explained is the
symmetry, i.e., why the lengths of all solid configurations, whether consisting of gold or
wood, maintain exactly the same proportions, independent of their spatial orientations.
The Andromeda axis theory does not explain this physical symmetry. Instead, it explains
something different, namely, why the Andromeda axis theory appears to be false even
though it is (by supposition) true. This is certainly a useful (indeed, essential) explanation
for anyone who accepts, a priori, the truth of the Andromeda axis theory, but otherwise it
is of very limited value.
Likewise if we accept absolute Galilean space and time as true concepts, a priori, then it
is useful to understand why nature may appear to be Minkowskian, even though it is really (by supposition) Galilean. But what is the basis for the belief in the Galilean
concept of space and time, as distinct from the Minkowskian concept, especially
considering that the world appears to be Minkowskian? Most physicists have concluded
that there is no good answer to this question, and that it's preferable to study the world as
it appears to be, rather than trying to rationalize ancient habits. This does not imply a
lack of interest in a deeper explanation for the effective symmetries of nature, but it does
suggest that such explanations are most likely to come from studying those effective
symmetries themselves, rather than from rationalizing why certain pre-conceived
universal asymmetries would be undetectable.

1.6 A More Practical Arrangement

It is known that Maxwell's electrodynamics, as usually understood at the present time, when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena.
A. Einstein, 1905
It's often overlooked that Einstein began his 1905 paper "On the Electrodynamics of
Moving Bodies" by describing a system of coordinates based on a single absolute
measure of time. He pointed out that we could assign time coordinates to each event
...by using an observer located at the origin of the coordinate system, equipped
with a clock, who coordinates the arrival of the light signal originating from the
event to be timed and traveling to his position through empty space.
This is equivalent to Lorentz's conception of "true" time, provided the origin of the
coordinate system is at "true" rest. However, for every frame of reference except the one
at rest with the origin, these coordinates would not constitute an inertial coordinate
system, because inertia would not be isotropic in terms of these coordinates, so Newton's
laws of motion would not even be quasi-statically valid. Furthermore, the selection of the
origin is operationally arbitrary, and, even if the origin were agreed upon, there would be
significant logistical difficulties in actually carrying out a coordination based on such a
network of signals. Einstein says "We arrive at a much more practical arrangement by
means of the following considerations".
In his original presentation of special relativity Einstein proposed two basic principles,
derived from experience. The first is nothing other than Galileo's classical principle of
inertial relativity, which asserts that for any material object in any state of motion there
exists a system of space and time coordinates, called inertial coordinates, with respect to
which the object is instantaneously at rest and inertia is homogeneous and isotropic (the
latter being necessary for Newton's laws of motion to hold at least quasi-statically).
However, as discussed in previous sections, this principle alone is not sufficient to give a
useful basis for evaluating physical phenomena. We must also have knowledge of how the description of events with respect to one system of inertial coordinates is related to
the description of those same events with respect to another, relatively moving, system of
coordinates. Rather than simply assuming a relationship based on some prior
metaphysical conception of space and time, Einstein realized that the correct relationship
between relatively moving systems of inertial coordinates could only be determined
empirically. He noted "the unsuccessful attempts to discover any motion of the earth relatively to the 'light medium'", and since we define motion in terms of inertial
coordinates, these experiments imply that the propagation of light is isotropic in terms of
the very same class of coordinate systems for which mechanical inertia is isotropic. On
the other hand, all the experimental results that are consolidated into Maxwell's equations
imply that the propagation speed of light (with respect to any inertial coordinate system)
is independent of the state of motion of the emitting source. Einstein's achievement was
to explain clearly how these seemingly contradictory facts of experience may be
reconciled.
As an aside, notice that isotropy with respect to inertial coordinates is what we would
expect if light was a stream of inertial corpuscles (as suggested by Newton), whereas the
independence of the speed of light from the motion of its source is what we would expect
if light was a wave phenomenon. This is the same dichotomy that we encounter in
quantum mechanics, and it's not coincidental that Einstein wrote his seminal paper on
light quanta almost simultaneously with his paper on the electrodynamics of moving
bodies. He might actually have chosen to combine the two into a single paper discussing
general heuristic considerations arising from the observed properties of light, and the
reconciliation of the apparent dichotomy in the nature of light as it is usually understood.
From the empirical facts that (a) light propagates isotropically with respect to every
system of inertial coordinates (which is essentially just an extension of Galileo's principle
of relativity), and that (b) the speed of propagation of light with respect to any system of
inertial coordinates is independent of the motion of the emitting source, it follows that the
speed of light is invariant with respect to every system of inertial coordinates. From
these facts we can deduce the correct relationship between relatively moving systems of
inertial coordinates.
To establish the form of the relationships between this "more practical" class of
coordinate systems (i.e., the class of inertial coordinate systems), Einstein notes that if
x,y,z,t is a system of inertial coordinates, and a pulse of light is emitted from location x0
along the x axis at time t0 toward a distant location x1, where it arrives and is reflected at
time t1, and if this reflected pulse is received back at location x2 (the same as x0) at time t2
then t1 = (t0 + t2)/2. In other words, since light is isotropic with respect to the same class of coordinate systems in which mechanical inertia is isotropic, the light pulse takes the same amount of time, (t2 − t0)/2, to travel each way when expressed in terms of any system of inertial coordinates. By the same reasoning the spatial distance between the emission and reflection events is x1 − x0 = c(t2 − t0)/2.
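As a small numeric illustration of this radar scheme (with hypothetical sample values, in units where c = 1):

    # Assign a time and distance to a remote reflection event by radar,
    # in units where c = 1 (the sample values are hypothetical).
    c = 1.0
    t0 = 2.0                   # pulse emitted from x0
    t2 = 10.0                  # reflected pulse received back at x0

    t1 = (t0 + t2)/2           # time coordinate assigned to the reflection event
    distance = c*(t2 - t0)/2   # one-way distance assigned to the reflecting point

    print(t1, distance)        # 6.0 4.0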
Naturally the invariance of light speed with respect to inertial coordinates is implicit in
the principles on which special relativity is based, but we must not make the mistake of thinking that this invariance is therefore tautological, or merely an arbitrary definition. Inertial coordinates are not arbitrary, and they are definable without explicit reference to
the phenomenon of light. The real content of Einstein's principles is that light is an
inertial phenomenon (despite its wavelike attributes). The stationary ether posited by Lorentz did not interact mechanically with ordinary matter at all, and yet we know that light conveys momentum to material objects. The coupling between the supposed ether and ordinary matter was always problematic for ether theories, and indeed for any classical wavelike theory of light. Einstein's paper on the photo-electric effect was a crucial step in recognizing the localized ballistic aspects of electromagnetic radiation, and this theme persists, just under the surface, in his paper on electrodynamics. Oddly
enough, the clearest statement of this insight came only as an afterthought, appearing in
Einstein's second paper on relativity in 1905, in which he explicitly concluded that
"radiation carries inertia between emitting and absorbing bodies". The point is that light
conveys not only momentum, but inertia. For example, after a body has absorbed an
elementary pulse of light, it has not only received a kick from the momentum of the
light, but the internal inertia (i.e., the inertial mass) of the body has actually increased.
Once it is posited that light is inertial, Galileo's principle of relativity automatically
implies that light propagates isotropically from the source, regardless of the source's state
of uniform motion. Consequently, if we elect to use space and time coordinates in terms
of which light speed is not isotropic (which we are certainly free to do), we will
necessarily find that no inertial processes are isotropic. For example, we will find that
two identical marbles expelled from a tube in opposite directions by an explosive charge
located between them will not fly away at equal speeds, i.e., momentum will not be
conserved. Conversely, if we use ordinary mechanical inertial processes together with the
conservation of momentum (and if we decline to assign any momentum or reaction to
unobservable and/or immovable entities), we will necessarily arrive at clock
synchronizations that are identical with those given by Einstein's light rays. Thus,
Einstein's "more practical arrangement" is based on (and ensures) isotropy not just for
light propagation, but for all inertial phenomena.
If a uniformly moving observer uses pairs of identical material objects thrown with equal
force in opposite directions to establish spaces of simultaneity, he will find that his
synchronization agrees with that produced by Einstein's assumed isotropic light rays.
The special attribute of light in this regard is due to the fact that, although light is inertial,
it has no mass of its own, and therefore no rest frame. It can be regarded entirely as
nothing but an interaction along a null interval between two massive bodies, the emitter
and absorber. From this follows the indefinite metric of spacetime, and light's seemingly
paradoxical combination of wavelike and inertial properties. (This is discussed more fully
in Section 9.10.)
It's also worth noting that when Einstein invoked the operational definitions of time and
distance based on light propagation, he commented that "we assume this definition of
synchronization is free from contradictions, and possible for any number of points". This
is crucial for understanding why a set of definitions based on the propagation of light is
tenable, in contrast with a similar set of definitions based on non-inertial signals, such as acoustical waves or postal messages. A set of definitions based on any non-inertial signal
can't possibly preserve inertial isotropy. Of course, a signal requiring an ordinary material
medium for its propagation would obviously not be suitable for a universal definition of
time, because it would be inapplicable across regions devoid of that substance. Moreover,
even if we posited an omni-present substance, a signal consisting of (or carried by) any
material substance would be unsuitable because such objects do not exhibit any particular
fixed characteristic of motion, as shown by the fact that they can be brought to rest with
respect to some inertial system of reference. Furthermore, if there exist any signals faster
than those on which we base our definitions of temporal synchronization, those
definitions will be easily falsified. The fact that Einstein's principles are empirically
viable at all, far from being vacuous or tautological, is actually somewhat miraculous.
In fact, if we were to describe the kind of physical phenomenon that would be required in
order for us to have a consistent capability of defining a coherent basis of temporal
synchronization for spatially separate events, clearly it could be neither a material object,
nor a disturbance in a material medium, and yet it must exhibit some fixed characteristic
quality of motion that exceeds the motion of any other object or signal. We hardly have
any right to expect, a priori, that such a phenomenon exists. On the other hand, it could be
argued that Einstein's second principle is just as classical as his first, because sight has
always been the de facto arbiter of simultaneity (as well as of straightness, as in "uniform
motion in a straight line"). Even in Galileo's day it was widely presumed that vision was
instantaneous, so it automatically was taken to define simultaneity. (We review the
historical progress of understanding the speed of light in Section 3.3.) The difference
between this and the modern view is not so much the treatment of light as the means of
defining simultaneity, but simply the realization that light propagates at a finite speed,
and therefore the spacetime manifold is only partially ordered.
The derivation of the Lorentz transformation presented in Einstein's 1905 paper is
formally based on two empirically-based propositions, which he expressed as follows:
1. The laws by which the conditions of physical systems change are independent
of which of two coordinate systems in homogeneous translational movement
relative to each other these changes in status are referred.
2. Each ray of light moves in "the resting" coordinate system with the definite
speed c, independently of whether this ray of light is emitted from a resting or
moving body. Here speed = (optical path) / (length of time), where "length of time" is to be understood in the sense of the definition in §1.
In the first of these propositions we are to understand that the coordinate systems are all such that Newton's laws of motion hold good (in a suitable limiting sense), as alluded to at the beginning of the paper's §1. This is crucial, because without this stipulation, the
proposition is false. For example, coordinate systems related by Galilean transformations
are in homogeneous translational movement relative to each other, and yet the laws by
which physical systems change (e.g., Maxwell's equations) are manifestly not
independent of the choice of such coordinate systems. So the restriction to coordinate systems in terms of which the laws of mechanics hold good is crucial. However, once we
have imposed this restriction, the proposition becomes tautological, at least for the laws
of mechanics. The real content of Einstein's first principle is therefore the assertion that the other laws of physics (e.g., the laws of electrodynamics) hold good in precisely the same set of coordinate systems in terms of which the laws of mechanics hold good. (This is also the empirical content of the failure of the attempts to detect the Earth's absolute motion through the electromagnetic ether.) Thus Einstein's first principle simply reasserts Galileo's claim that all effects of uniform rectilinear motion can be transformed away by a suitable choice of coordinate systems.
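This state of affairs is easy to exhibit symbolically. The sketch below (using sympy, in units with c = 1) takes a simple solution of the one-dimensional wave equation, which is implicit in Maxwell's equations, and re-expresses it in terms of relatively moving coordinates; the wave equation fails under a Galilean substitution but survives a Lorentz substitution:

    import sympy as sp

    v = sp.symbols('v', positive=True)
    xp, tp = sp.symbols('xp tp')            # the moving coordinates
    g = 1/sp.sqrt(1 - v**2)

    f = lambda x, t: sp.sin(x - t)          # a right-moving wave, so f_xx - f_tt = 0

    # Galilean substitution: x = xp + v*tp, t = tp
    f_gal = f(xp + v*tp, tp)
    print(sp.simplify(sp.diff(f_gal, xp, 2) - sp.diff(f_gal, tp, 2)))  # not identically zero

    # Lorentz substitution: x = g*(xp + v*tp), t = g*(tp + v*xp)
    f_lor = f(g*(xp + v*tp), g*(tp + v*xp))
    print(sp.simplify(sp.diff(f_lor, xp, 2) - sp.diff(f_lor, tp, 2)))  # 0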
It might seem that Einstein's second principle is implied by the first, at least if Maxwell's equations are regarded as laws governing the changes of physical systems, because Maxwell's equations prescribe the speed of light propagation independent of the source's motion. (Indeed, Einstein alluded to this very point at the beginning of his 1905 paper on the inertia of energy.) However, it's not clear a priori whether Maxwell's equations are valid in terms of relatively moving systems of coordinates, nor whether the permittivity of the vacuum is independent of the frame of reference in terms of which it
is evaluated. Moreover, as discussed above, by 1905 Einstein already doubted the
absolute validity of Maxwell's equations, having recently completed his paper on the
photo-electric effect which introduced the idea of photons, i.e., light propagating as
discrete packets of energy, a concept which cannot be represented as a solution of
Maxwell's linear equations. Einstein also realized that a purely electromagnetic theory of
matter based on Maxwell's equations was impossible, because those equations by
themselves could never explain the equilibrium of electric charge that constitutes a
charged particle. "Only different, nonlinear field equations could possibly accomplish
such a thing." This observation shows how unjustified was the "molecular force
hypothesis" of Lorentz, according to which all the forces of nature were assumed to
transform exactly as do electromagnetic forces as described by Maxwell's linear
equations. Knowing that the molecular forces responsible for the equilibrium of charged
particles must necessarily be of a fundamentally different character than the forces of
electromagnetism, and certainly knowing that the stability of matter may not even have a
description in the form of a continuous field theory at all, it's clear that Lorentz's
hypothesis has no constructive basis, and is simply tantamount to the adoption of
Einstein's two principles.
Thus, Einstein's contribution was to recognize that "the bearing of the Lorentz
transformation transcended its connection with Maxwell's equations and was concerned
with the nature of space and time in general". Instead of basing special relativity on an
assumption of the absolute validity of Maxwell's equations, Einstein based it on the
particular characteristic exhibited by those equations, namely Lorentz invariance, that he
intuited was the more fundamental principle, one that could serve as an organizing
principle analogous to the conservation of energy in thermodynamics, and one that could
encompass all physical laws, even if they turned out to be completely dissimilar to
Maxwell's equations. Remarkably, this has turned out to be the case. Lorentz invariance is
a key aspect of the modern theory of quantum electrodynamics, which replaced
Maxwell's equations.

Of course, just as Einstein's first principle relies on the restriction to coordinate systems in which the laws of mechanics hold good, his second principle relies crucially on the requirement that time intervals are to be understood in the sense of the definition given in §1. And, again, once this condition is recognized, the principle itself becomes tautological, although in this case the tautology is complete. The second principle states that light always propagates at the speed c, assuming we define the time intervals in accord with §1, which defines time intervals as whatever they must be in order for the speed of light to be c. This unfortunately has led some critics to assert that special relativity is purely tautological, merely a different choice of conventions. Einstein's presentation somewhat obscures the real physical content of the theory, which is that
mechanical inertia and the propagation speed of light are isotropic and invariant with
respect to precisely the same set of coordinate systems. This is a non-trivial fact. It then
remains to determine how these distinguished coordinate systems are related to each
other.
Although Einstein explicitly highlighted just two principles as the basis of special
relativity in his 1905 paper (consciously patterned after the two principles of
thermodynamics), his derivation of the Lorentz transformation also invoked the
properties of homogeneity that we attribute to space and time to establish the linearity of
the transformations. In addition, he tacitly assumed spatial isotropy, i.e., that there is no
preferred direction in space, so the intrinsic properties of ideal rods and clocks do not
depend on their spatial orientations. Lastly, he assumed memorylessness, i.e., that the
extrinsic properties of rods and clocks may be functions of their current positions and
states of motion, but not of their previous positions or states of motion. This last
assumption is needed to exclude the possibility that every elementary particle may
somehow "remember" its entire history of accelerations, and thereby "know" its present
absolute velocity relative to a common fixed reference. (Einstein explicitly listed these
extra assumptions in an exposition written in 1920. He may have gained an appreciation
of the importance of the independence of measuring rods and clocks from their past
history after considering Weyl's unified field theory, which Einstein rejected precisely
because it violated this premise.)
The actual detailed derivation of the Lorentz transformation presented in Einstein's 1905 paper is somewhat obscure and circuitous, but it's worthwhile to follow his reasoning,
partly for historical interest, and partly to contrast it with the more direct and compelling
derivations that will be presented in subsequent sections.
Following Einstein's original derivation, we begin with an inertial (and Cartesian) coordinate system called K, with the coordinates x, y, z, t, and we posit another system of inertial coordinates denoted as k, with the coordinates ξ, η, ζ, τ. The spatial axes of these two systems are aligned, and the spatial origin of k is moving in the positive x direction with speed v in terms of K. We then consider a particle at rest in the k system, and note that for such a particle the x and t coordinates (i.e., the coordinates in terms of the K system) are related by x′ = x − vt for some constant x′. We also know the y and z coordinates of such a particle are constant. Hence each stationary spatial position in the k system corresponds to a set of three constants (x′,y,z), and we can also assign the time coordinate t to each event.
Interestingly, the system of variables x′,y,z,t constitutes a complete coordinate system, related to the original system K by the Galilean transformation x′ = x − vt (with y, z, and t unchanged). Thus, just as Lorentz did in 1892, Einstein began by essentially applying a Galilean transformation to the original rest frame coordinates to give an intermediate system of coordinates, although Einstein's paper makes it clear that this is not an inertial coordinate system.
Now we consider the values of the τ coordinate of the k system as a function of x′,y,z,t for any stationary point in the k system. Suppose a pulse of light is emitted from the origin of the k system in the positive x direction at time τ0; it reaches the point corresponding to x′,y,z at time τ1, where it is reflected, arriving back at the origin of the k system at time τ2.

Recall that the ξ, η, ζ, τ coordinates are defined as inertial coordinates, meaning that inertia is homogeneous and isotropic in terms of these coordinates. Also, all experimental evidence (such as all "the unsuccessful attempts to discover any motion of the earth relatively to the 'light medium'") indicates that the speed of light is isotropic in terms of any inertial coordinate system. Therefore, we have τ1 = (τ0 + τ2)/2, so the τ coordinate as a function of x′,y,z,t satisfies the relation

(1/2)[τ(0,0,0,t) + τ(0,0,0, t + x′/(c−v) + x′/(c+v))] = τ(x′,0,0, t + x′/(c−v))
Differentiating both sides with respect to the parameter x′, we get (using the chain rule)

(1/2)[1/(c−v) + 1/(c+v)] ∂τ/∂t = ∂τ/∂x′ + [1/(c−v)] ∂τ/∂t
Now, it should be noted here that the partial derivatives are being evaluated at different
points, so we would not, in general, be justified in treating them interchangeably.
However, Einstein has stipulated that the transformation equations are linear (due to
homogeneity of space and time), so the partial derivatives are all constants and unique
(for any given v). Simplifying the above equation gives

At this point, Einstein alludes to analogous reasoning for the y and z directions, but doesn't give the details. Presumably we are to consider a pulse of light emanating from the origin and reflecting at a point x′ = 0, y, z = 0, and returning to the origin. In this case the isotropy of light propagation in terms of inertial coordinates implies

(1/2)[τ(0,0,0,t) + τ(0,0,0, t + 2y/(c2 − v2)1/2)] = τ(0,y,0, t + y/(c2 − v2)1/2)

In this equation we have made use of the fact that the y component of the speed of the light pulse (in terms of the K system) as it travels in either direction between these points, which are stationary in the k system, is (c2 − v2)1/2. Differentiating both sides with respect to y, we get

[1/(c2 − v2)1/2] ∂τ/∂t = ∂τ/∂y + [1/(c2 − v2)1/2] ∂τ/∂t

and therefore ∂τ/∂y = 0. The same reasoning shows that ∂τ/∂z = 0. Now the total differential of τ(x′,y,z,t) is, by definition

dτ = (∂τ/∂x′)dx′ + (∂τ/∂y)dy + (∂τ/∂z)dz + (∂τ/∂t)dt
and we know the partial derivatives with respect to y and z are zero, and the partial derivatives with respect to x′ and t are in a known ratio, so for any given v we can write

dτ = a(v)[dt − v/(c2 − v2) dx′]

where a(v) is as yet an undetermined function. Incidentally, Einstein didn't write this expression in terms of differentials, but he did state that he was letting x′ be infinitesimally small, so he was essentially dealing with differentials. On the other hand, the distinction between differentials and finite quantities matters little in this context, because the relations are linear, and hence the partial derivatives are constants, so the differentials can be trivially integrated. Thus we have

τ = a(v)[t − v x′/(c2 − v2)]
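As a check on this step, the following sympy sketch (writing xp for x′) verifies that this expression for τ satisfies both the partial differential equation and the original out-and-back relation:

    import sympy as sp

    a, v, c, xp, t = sp.symbols('a v c xp t', positive=True)

    def tau(X, T):                          # candidate solution; y and z do not appear
        return a*(T - v*X/(c**2 - v**2))

    # The partial differential equation derived above:
    pde = sp.diff(tau(xp, t), xp) + v/(c**2 - v**2)*sp.diff(tau(xp, t), t)
    print(sp.simplify(pde))                 # 0

    # The out-and-back (radar) relation for a point at rest in k:
    lhs = (tau(0, t) + tau(0, t + xp/(c - v) + xp/(c + v)))/2
    rhs = tau(xp, t + xp/(c - v))
    print(sp.simplify(lhs - rhs))           # 0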
Einstein then used this result to determine the transformation equations for the spatial coordinates. The ξ coordinate of a pulse of light emitted from the origin in the positive x direction is related to the τ coordinate by ξ = cτ (since experience has shown that light propagates with the speed c in all directions when expressed in terms of any system of inertial coordinates). Substituting for τ from the preceding formula gives, for the ξ coordinate of this light pulse, the expression

ξ = a(v)c[t − v x′/(c2 − v2)]

We also know that, for this light pulse, the parameters t and x′ are related by t = x′/(c−v), so we can substitute for t in the above expression and simplify to give the relation between ξ and x′ (both of which, we remember, are constants for any point at rest in k)

ξ = a(v)[c2/(c2 − v2)] x′
We can choose x′ to be anything we like, so this represents the general relation between these two parameters. Similarly the η coordinate of a pulse of light emanating from the origin in the η direction is

η = cτ

but in this case we have x′ = 0 and, as noted previously, t = y/(c2 − v2)1/2, so we have

η = a(v)[c/(c2 − v2)1/2] y

and by the same token

ζ = a(v)[c/(c2 − v2)1/2] z

If we define the function

φ(v) = a(v)/(1 − (v/c)2)1/2

and substitute x − vt for x′, the preceding results can be summarized as

τ = φ(v)β(t − vx/c2)      ξ = φ(v)β(x − vt)      η = φ(v)y      ζ = φ(v)z

where β = 1/(1 − (v/c)2)1/2.
At this point Einstein observes that a sphere of light expanding with the speed c in terms of the unprimed coordinates transforms to a sphere of light expanding with speed c in terms of the ξ,η,ζ,τ coordinates. In other words, x2 + y2 + z2 = c2t2 implies

ξ2 + η2 + ζ2 = c2τ2
As Einstein says, this shows that our two fundamental principles are compatible, i.e., it
is possible for light to propagate isotropically with respect to two relatively moving
systems of inertial coordinates, provided we allow the possibility that the transformation
from one inertial coordinate system to another is not exactly as Galileo and Newton
surmised.
To complete the derivation of the Lorentz transformation, it remains to determine the function φ(v). To do this, Einstein considers a two-fold application of the transformation, once with the speed v in the positive x direction, and then again with the speed v in the negative x direction. The result should be the identity transformation, i.e., we should get back to the original coordinate system. (Strictly speaking, this assumes the property of memorylessness.) It's easy to show that if we apply the above transformation twice, once with parameter +v and once with parameter −v, each coordinate is φ(v)φ(−v) times the original coordinate, so we must have

φ(v)φ(−v) = 1
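This is easy to confirm symbolically; in the sketch below (sympy, with φ left as an undetermined function) a boost by +v followed by a boost by −v multiplies each coordinate by φ(v)φ(−v):

    import sympy as sp

    v, c, x, t, y = sp.symbols('v c x t y', positive=True)
    phi = sp.Function('phi')                # the undetermined scale function

    def boost(u, X, T, Y):
        b = 1/sp.sqrt(1 - u**2/c**2)
        return (phi(u)*b*(X - u*T), phi(u)*b*(T - u*X/c**2), phi(u)*Y)

    x1, t1, y1 = boost(v, x, t, y)          # transformation with parameter +v
    x2, t2, y2 = boost(-v, x1, t1, y1)      # followed by parameter -v

    print(sp.simplify(x2/x))                # phi(v)*phi(-v)
    print(sp.simplify(t2/t))                # phi(v)*phi(-v)
    print(sp.simplify(y2/y))                # phi(v)*phi(-v)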
Finally, Einstein concludes by inquiring into the signification of φ(v). He notes that a segment of the η axis moving with speed v perpendicular to its length (i.e., in the positive x direction) has the length y = l/φ(v) in terms of the K system coordinates, and by reasons of symmetry (i.e., spatial isotropy) this must equal l/φ(−v), because it doesn't matter whether this segment of the y axis is moving in the positive or the negative x direction. Consequently we have φ(v) = φ(−v), and therefore φ(v) = 1, so he arrives at the Lorentz transformation

τ = β(t − vx/c2)      ξ = β(x − vt)      η = y      ζ = z
This somewhat laborious and awkward derivation is interesting in several respects. For
one thing, one gets the impression that Einstein must have been experimenting with
various methods of presentation, and changed his nomenclature during the drafting of the
paper. For example, at one point he says a is a function φ(v) at present unknown, but subsequently a(v) and φ(v) are defined as different functions. At another point he defines x′ as a Galilean transform of x (without explicitly identifying it as such), but subsequently uses the symbol x′ as part of the inertial coordinate system resulting from the two-fold application of the Lorentz transformation. In addition, he somewhat tacitly makes use of the invariance of the light-like relation x2 + y2 = c2t2 in his derivation of the transformation equations for the y coordinate, but doesn't seem to realize that he could
just as well have invoked the invariance of x2 + y2 + z2 = c2t2 to make short work of the
entire derivation. Instead, he presents this invariance as a consequence of the
transformation equations despite the fact that he has tacitly used the invariance as the
basis of the derivation (which of course he was entitled to do, since that invariance
simply expresses his light principle).
Perhaps not surprisingly, some readers have been confused as to the significance of the
functions a(v) and φ(v). For example, in a review of Einstein's paper, A. I. Miller writes
Then, without prior warning Einstein replaced a(v) with φ(v)(1 − (v/c)2)1/2... But why did Einstein make this replacement? It seems as if he knew beforehand the correct form of the set of relativistic transformations... How did Einstein know that he had to make [this substitution] in order to arrive at those space and time transformations in agreement with the postulates of relativity?
This suggests a misunderstanding, because the substitution in question is purely formal,
and has no effect on the content of the equations. The transformations that Einstein had
derived by that point, prior to replacing a(v), were already consistent with the postulates
of relativity (as can be verified by substituting them into the Minkowski invariant). It is
simply more convenient to express the equations in terms of (v), which is the entire
coefficient of the transformations for y and z. One naturally expects this coefficient to
equal unity.
Even aside from the inadvertent changes in nomenclature, Einstein's derivation is undeniably clumsy, especially in first applying what amounts to a Galilean transformation, and then deriving the further transformation needed to arrive at a system of inertial coordinates. It's clear that he was influenced by Lorentz's writings, even to the point of using the same symbol β for the quantity 1/(1 − (v/c)2)1/2, which Lorentz used in his 1904 paper. (Oddly enough, many years later Einstein wrote to Carl Seelig that in 1905 he had known only of Lorentz's 1895 paper, but not his subsequent papers, and none of Poincare's papers on the subject.)
In a review article published in 1907 Einstein had already adopted a more economical
derivation, dispensing with the intermediate Galilean system of coordinates, and making
direct use of the lightlike invariant expression, similar to the standard derivation
presented in most introductory texts today. To review this now standard derivation,

consider (again) Einstein's two systems of inertial coordinates K and k, with coordinates denoted by (x,y,z,t) and (ξ,η,ζ,τ) respectively, and oriented so that the x and ξ axes coincide, and the xy plane coincides with the ξη plane. Also, as before, the system k is moving in the positive x direction with fixed speed v relative to the system K, and the origins of the two systems momentarily coincide at time t = τ = 0.
According to the principle of homogeneity, the relationship between the two sets of coordinates must be linear, so there must be constants A1 and A2 (for a given v) such that ξ = A1x + A2t. Furthermore, if an object is stationary relative to k, and if it passes through the point (x,t) = (0,0), then its position in general satisfies x = vt, from the definition of velocity, and the ξ coordinate of that point with respect to the k system is 0. Therefore we have ξ = A1(vt) + A2t = 0. Since this must be true for non-zero t, we must have A1v + A2 = 0, and so A2 = −A1v. Consequently, there is a single constant A (for any given v) such that ξ = A(x − vt). Similarly there must be constants B and C such that η = By and ζ = Cz. Also, invoking isotropy and homogeneity, we know that τ is independent of y and z, so it must be of the form τ = Dx + Et for some constants D and E (for a given v). It only remains to determine the values of the constants A, B, C, D, and E in these expressions.
Suppose at the instant when the spatial origins of K and k coincide a spherical wave of light is emitted from their common origin. At a subsequent time t in the first frame of reference the sphere of light must be the locus of points satisfying the equation

x2 + y2 + z2 = c2t2        (1)

and likewise, according to our principles, in the second frame of reference the spherical wave at time τ must be the locus of points described by

ξ2 + η2 + ζ2 = c2τ2        (2)

Substituting from the previous expressions for the k coordinates into this equation, we get

A2(x − vt)2 + B2y2 + C2z2 = c2(Dx + Et)2

Expanding these terms and rearranging gives

(A2 − c2D2)x2 + B2y2 + C2z2 − 2(A2v + c2DE)xt = (c2E2 − A2v2)t2        (3)
The assumption that light propagates at the same speed in both frames of reference
implies that a simultaneous spherical shell of light in one frame is also a simultaneous
spherical shell of light in the other frame, so the coefficients of equation (3) must be
proportional to the coefficients of equation (1). Strictly speaking, the constant of

proportionality is arbitrary, representing a simple re-scaling, so we are free to impose an additional condition, namely, that the transformation with parameter +v followed by the transformation with parameter −v yields the original coordinates, and by the isotropy of
space these two transformations, which differ only in direction, must have the same
constant of proportionality. Thus the corresponding coefficients of equations (1) and (3)
must not only be proportional, they must be equal, so we have

A2 − c2D2 = 1      B2 = 1      C2 = 1      2(A2v + c2DE) = 0      c2E2 − A2v2 = c2

Clearly we can take B = C = 1 (rather than −1, since we choose not to reflect the y and z directions). Dividing the 4th of these equations by 2, we're left with the three equations in the three unknowns A, D, and E:

A2 − c2D2 = 1      A2v + c2DE = 0      c2E2 − A2v2 = c2
Solving the first equation for A2 and substituting this into the 2nd and 3rd equations gives

(1 + c2D2)v + c2DE = 0      c2E2 − (1 + c2D2)v2 = c2

Solving the first for E and substituting into the 2nd gives a single quadratic equation in D, with the roots

D = ±(v/c2) / (1 − (v/c)2)1/2

Substituting this into either of the previous equations and solving the resulting quadratic for E gives

E = ∓1 / (1 − (v/c)2)1/2

Note that the equations require opposite signs for D and E. Now, for small values of v/c we expect to find E approaching +1 (as in Galilean relativity), so we choose the positive root for E and the negative root for D. Finally, from the relation A2 − c2D2 = 1 we get

A = ±1 / (1 − (v/c)2)1/2

and again we select the positive root. Consequently we have the Lorentz transformation

ξ = (x − vt)/(1 − (v/c)2)1/2      η = y      ζ = z      τ = (t − vx/c2)/(1 − (v/c)2)1/2
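The same three simultaneous equations can also be handed directly to a computer algebra system; the following sketch (sympy) returns the four sign combinations, from which the physical branch is selected as described above:

    import sympy as sp

    v, c = sp.symbols('v c', positive=True)
    A, D, E = sp.symbols('A D E', real=True)

    eqs = [A**2 - c**2*D**2 - 1,            # coefficient of x^2
           A**2*v + c**2*D*E,               # coefficient of the xt cross term
           c**2*E**2 - A**2*v**2 - c**2]    # coefficient of t^2

    for sol in sp.solve(eqs, [A, D, E], dict=True):
        print(sol[A], sol[D], sol[E])

In every returned solution D and E carry opposite signs, in agreement with the argument above.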
Naturally with this transformation we can easily verify that

x2 + y2 + z2 − c2t2 = ξ2 + η2 + ζ2 − c2τ2

so this quantity is the squared "absolute distance" from the origin to the point with K coordinates (x,y,z,t) and the corresponding k coordinates (ξ,η,ζ,τ), which confirms that the absolute spacetime interval between two points is the same in both frames. Notice that equations (1) and (2) already implied this relation for null intervals. In other words, the original premise was that if x2 + y2 + z2 − c2t2 equals zero, then ξ2 + η2 + ζ2 − c2τ2 also equals zero. The above reasoning shows that a consequence of this premise is that, for any arbitrary real number s2, if x2 + y2 + z2 − c2t2 equals s2, then ξ2 + η2 + ζ2 − c2τ2 also equals s2. Therefore, this quadratic form represents an absolute invariant quantity associated with the interval from the origin to the event (x,y,z,t).
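The invariance asserted here is a one-line symbolic computation (sympy):

    import sympy as sp

    v, c, x, y, z, t = sp.symbols('v c x y z t', positive=True)
    b = 1/sp.sqrt(1 - v**2/c**2)

    xi, eta, zeta, tau = b*(x - v*t), y, z, b*(t - v*x/c**2)

    print(sp.simplify((xi**2 + eta**2 + zeta**2 - c**2*tau**2)
                      - (x**2 + y**2 + z**2 - c**2*t**2)))    # 0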
1.7 Staircase Wit
Henceforth space by itself, and time by itself, are doomed to fade away
into mere shadows, and only a kind of union of the two will preserve an
independent reality.
H. Minkowski, 1908
In retrospect, it's easy to see that the Galilean notion of space and time was not free of
conceptual difficulties. In 1908 Minkowski delivered a famous lecture in which he argued
that the relativistic phenomena described by Lorentz and clarified by Einstein might have
been inferred from first principles long before, if only more careful thought had been
given to the foundations of classical geometry and mechanics. He pointed out that special
relativity arises naturally from the reconciliation of two physical symmetries that we
individually take for granted. One is spatial isotropy, which asserts the equivalence of all
physical phenomena under linear transformations such as x′ = ax − by, y′ = bx + ay, z′ = z, t′ = t, where a2 + b2 = 1. It's easy to verify that transformations of this type leave all quantities of the form x2 + y2 + z2 invariant. The other is Galilean relativity, which asserts the equivalence of all physical phenomena under transformations such as x′ = x − vt, y′ = y, z′ = z, t′ = t, where v is a constant. However, these transformations obviously do not

leave the quantity x2 + y2 + z2 invariant, because they involve the time coordinate as well
as the space coordinates. In addition, we notice that the rotational transformations
maintain the orthogonality of the coordinate axes, whereas the lack of an invariant
measure for the Galilean transformations prevents us from even assigning a definite
meaning to orthogonality between the time and space coordinates. Since the velocity
transformations leave the laws of physics unchanged, Minkowski reasoned, they ought to
correspond to some invariant physical quantity, and their determinants ought to be unity.
Clearly the invariant must involve the time coordinate, and hence the units of space and
time must be in some fixed non-singular relation to each other, with a conversion factor
that we can normalize to unity. Also, since we cannot go backwards in time, the space
axis must not be rotated in the same direction as the time axis by a velocity
transformation, so the velocity transformations ought to be of the form x′ = ax − bt, y′ = y, z′ = z, t′ = −bx + at, where a2 − b2 = 1. Combining this with the requirement b/a = v, we arrive at the transformation

x′ = (x − vt)/(1 − v2)1/2      y′ = y      z′ = z      t′ = (t − vx)/(1 − v2)1/2

which leaves invariant the quantity x2 + y2 + z2 − t2. The rotational transformations also
leave this same quantity invariant, so this appears to be the most natural (and almost the
only) way of reconciling the observed symmetries of physical phenomena. Hence from
simple requirements of rational consistency we could have arrived at the Lorentz
transformation. As Minkowski said
Such a premonition would have been an extraordinary triumph for pure
mathematics. Well, mathematics, though it now can display only staircase wit,
has the satisfaction of being wise after the event... to grasp the far-reaching
consequences of such a metamorphosis of our concept of nature.
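The algebra behind this sketch can be checked by parameterizing the velocity transformations hyperbolically, so that the condition a2 − b2 = 1 holds automatically (a sympy sketch, in units with c = 1):

    import sympy as sp

    x, t, theta = sp.symbols('x t theta', real=True)
    a, b = sp.cosh(theta), sp.sinh(theta)   # a**2 - b**2 = 1 identically

    xp = a*x - b*t                          # velocity transformation, v = b/a = tanh(theta)
    tp = -b*x + a*t

    print(sp.simplify((xp**2 - tp**2) - (x**2 - t**2)))   # 0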
Needless to say, the above discussion is just a rough sketch, intended to show only the
outline of an argument. It seems likely that Minkowski was influenced by Klein's
Erlanger program, which sought to interpret various kinds of geometry in terms of the
invariants under a specific group of transformations. It is certainly true that we are led
toward the Lorentz transformations as soon as we consider the group of velocity
transformations and attempt to identify a physically meaningful invariant corresponding
to these transformations. However, the preceding discussion glossed over several
important considerations, and contains several unstated assumptions. In the following, we
will examine Minkowski's argument in more detail, paying special attention to the
physical significance of each assertion along the way, and elaborating more fully the
rational basis for concluding that there must be a definite relationship between the
measures of space and time.
For any system of mutually orthogonal spatial coordinates x,y,z (assumed linear and homogeneous), let the positions of the two ends of a given spatially extended physical entity be denoted by x1,y1,z1 and x2,y2,z2, and let s2 denote the sum of the squares of the component differences. In other words

s2 = (x2 − x1)2 + (y2 − y1)2 + (z2 − z1)2        (1)
Experience teaches us that, for a large class of physical entities (solids), we can shift
and/or re-orient the entity (relative to the system of coordinates), changing the individual
components, but the sum of the squares of the component differences remains unchanged.
The invariance of this quantity under re-orientations is called spatial isotropy. It's worth
emphasizing that the invariance of s2 under these operations applies only if the x, y, and z
coordinates are mutually orthogonal.
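For instance, in the following numeric sketch (numpy, with arbitrary sample endpoints) a randomly chosen rotation, represented by an orthogonal matrix, leaves the sum of the squares of the component differences unchanged:

    import numpy as np

    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a random orthogonal matrix

    p1 = np.array([1.0, 2.0, 3.0])     # sample endpoint coordinates (arbitrary)
    p2 = np.array([4.0, 6.0, 3.0])

    s2_before = np.sum((p2 - p1)**2)
    s2_after  = np.sum((Q @ p2 - Q @ p1)**2)
    print(s2_before, s2_after)         # both 25.0, up to rounding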
The spatial isotropy of physical entities implies a non-trivial unification of orthogonal
measures. Strictly speaking, each of the three terms on the right side of (1) should be
multiplied by a coefficient whose units are the squared units of s divided by the squared
units of x, y, or z respectively. In writing the equation without coefficients, we have
tacitly chosen units of measure for x, y, and z such that the respective coefficients are 1.
In addition, we tacitly assumed the spatial coordinates of the two ends of the physical
entity had constant values (for a given position and orientation), but of course this
assumption is valid only if the entities are stationary. If an object is in motion (relative to
the system of coordinates), then the coordinates of its endpoints are variable functions of
time, so instead of the constant x1 we have a function x1(t), and likewise for the other
coordinates. It's natural to ask whether the symmetry of equation (1) is still applicable to
objects in motion. Clearly if we allow the individual coordinate functions to be evaluated
at unequal times then the symmetry does not apply. However, if all the coordinate
functions are evaluated for the same time, experience teaches us that equation (1) does
apply to objects in motion. This is the second of our two commonplace symmetries, the
apparent fact that the sum of the squares of the orthogonal components of the spatial
interval between the two ends of a solid entity is invariant for all states of uniform
motion, with the understanding that the coordinates are all evaluated at the same time.
To express this symmetry more precisely, let x1,y1,z1 denote the spatial coordinates of one
end of a solid physical entity at time t1, and let x2,y2,z2 denote the spatial coordinates of
the other end at time t2. Then the quantity expressed by equation (1) is invariant for any
position, orientation, and state of uniform motion provided t1 = t2. However, just as the
spatial part of the symmetry is not valid for arbitrary spatial coordinate systems, the
temporal part is not valid for arbitrary time coordinates. Recall that the spatial isotropy of
the quantity expressed by equation (1) is valid only if the space coordinates x,y,z are
mutually orthogonal. Likewise, the combined symmetry covering states of uniform
motion is valid only if the time component t is mutually orthogonal to each of the space
coordinates.
The question then arises as to how we determine whether coordinate axes are mutually
orthogonal. We didn't pause to consider this question when we were dealing only with
the three spatial coordinates, but even for the three space axes the question is not as
trivial as it might seem. The answer relies on the concept of distance defined by the
quantity s in equation (1). According to Euclid, two lines intersecting at the point P are

perpendicular if and only if each point of one line is equidistant from the two points on
the other line that are equidistant from P. Unfortunately, this reasoning involves a circular
argument, because in order to determine whether two lines are orthogonal, we must
evaluate distances between points on those lines using an equation that is valid only if our
coordinate axes are orthogonal. By this reasoning, we could conjecture that any two
obliquely intersecting lines are orthogonal, and then use equation (1) with coordinates
based on those lines to confirm that they are indeed orthogonal according to Euclids
definition. But of course the physical objects of our experience would not exhibit spatial
isotropy in terms of these coordinates. This illustrates that we can only establish the
physical orthogonality of coordinate axes based on physical phenomena. In other words,
we construct orthogonal coordinate axes operationally, based on the properties of
physical entities. For example, we define an orthogonal system of coordinates in such a
way that a certain spatially extended physical entity is isotropic. Then, by definition, this
physical entity is isotropic with respect to these coordinates, so again the reasoning is
circular. However, the physical significance of these coordinates and the associated
spatial isotropy lies in the empirical fact that all other physical entities (in the class of
solids) exhibit spatial isotropy in terms of this same system of coordinates.
Next we need to determine a time axis that is orthogonal to each of the space axes. In
common words, this amounts to synchronizing the times at spatially separate locations.
Just as in the case of the spatial axes, we can establish physically meaningful
orthogonality for the time axis only operationally, based on some reference physical
phenomena. As we've seen, orthogonality between two lines is determined by the
distances between points on those lines, so in order to determine a time axis orthogonal to
a space axis we need to evaluate distances between points that are separated in time as
well as in space. Unfortunately, equation (1) defines distances only between points at the
same time. Evidently to establish orthogonality between space and time axes we need a
physically meaningful measure of space-time distance, rather than merely spatial
distance.
Another physical symmetry that we observe in nature is the symmetry of temporal
translation. This refers to the fact that for a certain class of physical processes the
duration of the process is independent of the absolute starting time. In other words, letting
t1 and t2 denote the times of the two ends of the process, the quantity

(Δt)2 = (t2 − t1)2        (2)

is invariant under translation of the starting time t1. This is exactly analogous to the
symmetry of a class of physical objects under spatial translations. However, we have seen
that the spatial symmetries are valid only if the time coordinates t1 and t2 are the same, so
we should recognize the possibility that the physical symmetry expressed by the
invariance of (2) is valid only when the spatial coordinates of events 1 and 2 are the
same. Of course, this can only be determined empirically. Somewhat surprisingly,
common experience suggests that the values of (Δt)2 for a certain class of physical processes actually are invariant even if the spatial positions of events 1 and 2 are different, at least
to within the accuracy of common observation and for differences in positions that are

not too great. Likewise we find that, for just about any time axis we choose, such that
some material object is at rest in terms of the coordinate system, the spatial symmetries
indicated by equation (1) apply, at least within the accuracy of common observation and
for objects that are not moving too rapidly. This all implies that the ratio of spatial to
temporal units of distance is extremely great, if not infinite.
If the ratio is infinite, then every time axis is orthogonal to every space axis, whereas if it
is finite, any change of the direction of the time axis requires a corresponding change of
the spatial axes in order for them to remain mutually perpendicular. The same is true of
the relation between the space axes themselves, i.e., if the scale factor between (say) the x
and the y coordinates was infinite, then those axes would always be perpendicular, but
since it is finite, any rotation of the x axis (about the z axis) requires a corresponding
rotation of the y axis in order for them to remain orthogonal. It is perhaps conceivable
that the scale factor between space and time could be infinite, but it would be very
incongruous, considering that the time axis can have spatial components. Also, taking
equations (1) and (2) separately, we have no means of quantifying the absolute separation
between two non-simultaneous events. The spatial separation between non-simultaneous
events separated by a time increment Δt is totally undefined, because there exist perfectly valid reference frames in which two non-simultaneous events are at precisely the same spatial location, and other frames in which they are arbitrarily far apart. Still, in all of those frames (according to Galilean relativity), the time interval remains Δt. Thus, there is
no definite combined spatial and temporal separation despite the fact that we clearly
intuit a definite physical difference between our distance from "the office tomorrow" and
our distance from "the Andromeda galaxy tomorrow". Admittedly we could postulate a
universal preferred reference frame for the purpose of assessing the complete separations
between events, but such a postulate is entirely foreign to the logical structure of Galilean
space and time, and has no operational significance.
So, we are led to suspect that there is a finite (though perhaps very large) scale factor c
between the units of space and time, and that the physical symmetries we've been
discussing are parts of a larger symmetry, comprehending the spatial symmetries
expressed by (1) and the temporal symmetries expressed by (2). On the other hand, we do
not expect spacelike intervals and timelike intervals to be directly conformable, because
we cannot turn around in time as we can in space. The most natural supposition is that the
squared spacelike intervals and the squared timelike intervals have opposite signs, so that
they are mutually imaginary (in the numerical sense). Hence our proposed invariant
quantity for a suitable class of repeatable physical processes extending uniformly from event 1 to event 2 is

s2 = (x2 − x1)2 + (y2 − y1)2 + (z2 − z1)2 − c2(t2 − t1)2        (3)

(This is the conventional form for spacelike intervals, whereas the negative of this quantity, denoted by τ2, is used to signify timelike intervals.) This quantity is invariant
under any combination of spatial rotations and changes in the state of uniform motion, as
well as simple translations of the origin in space and/or time. The algebraic group of all
transformations (not counting reflections) that leave this quantity invariant is called the

Poincare group, in recognition of the fact that it was first described in Poincare's famous
Palermo paper, dated July 1905. Equation (3) is not positive-definite, which means that
even though it is a squared quantity it may have a negative value, and of course it
vanishes along the path of a light pulse. Noting that squared times and squared distances
have opposite signs, Minkowski remarked that
Thus the essence of this postulate may be clothed mathematically in a very pregnant manner in the mystic formula

3·105 km = √(−1) secs

On this basis equation (3) can be re-written in a way that is formally symmetrical in the space and time coordinates, but of course the invariant quantity remains non-positive-definite. The significance of this mystic formula continues to be debated, but it does provide an interesting connection to quantum mechanics, to be discussed in Section 9.9.
As an aside, note that measurements of physical objects in various orientations are not
sufficient to determine the true lengths in any metaphysical absolute sense. If all
physical objects were, say, twice as long when oriented in one particular absolute
direction as in the perpendicular directions, and if this anisotropy affected all physical
phenomena equally, we could never detect it, because our rulers would be affected as
well. Thus, when we refer to a physical symmetry (such as the isotropy of space), we are
referring to the fact that all physical phenomena are affected by some variable (such as
spatial orientation) in exactly the same way, not that the phenomena bear any particular
relationship with some metaphysical standard. From this perspective we can see that the
Lorentzian approach to explaining the (apparent) symmetries of space-time does
nothing to actually explain those symmetries; it is simply a rationalization of the
discrepancy between those empirical symmetries and an a priori metaphysical standard
that does not possess those symmetries.
In any case, we've seen how a slight (for most purposes) modification of the relationship
between inertial coordinate systems leads to the invariant quantity

    (dτ)² = (dt)² − [(dx)² + (dy)² + (dz)²]/c²
For any fixed value of the constant c, we will denote by Gc the group of transformations
that leave this quantity unchanged. If we let c go to infinity, the temporal increment dt
must be invariant, leaving just the original Euclidean group for the spatial increments.
Thus the space and time components are de-coupled, in accord with Galilean relativity.
Minkowski called this limiting case G∞, and remarked that

    Since Gc is mathematically much more intelligible than G∞, it looks as though the
    thought might have struck some mathematician, fancy-free, that after all, as a
    matter of fact, natural phenomena do not possess invariance with the group G∞,
    but rather with the group Gc, with c being finite and determinate, but in ordinary
    units of measure extremely great.
Minkowski is here clearly suggesting that Lorentz invariance might have been deduced
from a priori considerations, appealing to mathematical "intelligibility" as a criterion for
the laws of nature. Einstein himself eschewed the temptation to retroactively deduce
Lorentz invariance from first principles, choosing instead to base his original presentation
of special relativity on two empirically-founded principles, the first being none other than
the classical principle of relativity, and the second being the proposition that the speed of
light is the same with respect to any system of inertial coordinates, independent of the
motion of the source. This second principle often strikes people as arbitrary and
unwarranted (rather like Euclid's "fifth postulate", as discussed in Section 3.1), and there
have been numerous attempts to deduce it from some more fundamental principle. For
example, it's been argued that the light speed postulate is actually redundant to the
relativity principle itself, since if we regard Maxwell's equations as fundamental laws of
physics, and we regard the permeability μ0 and permittivity ε0 of the vacuum as invariant
constants of those laws in any uniformly moving frame of reference, then it follows that
the speed of light in a vacuum is c = 1/√(μ0ε0) with respect to every uniformly moving
system of coordinates. The problem with this line of reasoning is that Maxwell's
equations are not valid when expressed in terms of an arbitrary uniformly moving system
of coordinates. In particular, they are not invariant under a Galilean transformation despite the fact that systems of coordinates related by such a transformation are
uniformly moving with respect to each other. (Maxwell himself recognized that the
equations of electromagnetism, unlike Newton's equations of mechanics, were not
invariant under Galilean "boosts"; in fact he proposed various experiments to exploit this
lack of invariance in order to measure the "absolute velocity" of the Earth relative to the
luminiferous ether. See Section 3.3 for one example.)
Furthermore, we cannot assume, a priori, that μ0 and ε0 are invariant with respect to
changes in reference frame. Actually μ0 is an assigned value, but ε0 must be measured,
and the usual means of empirically determining ε0 involve observations of the force
between charged plates. Maxwell clearly believed these measurements must be made
with the apparatus "at rest" with respect to the ether in order to yield the true and
isotropic value of ε0. In sections 768 and 769 of Maxwell's Treatise he discussed the ratio
of electrostatic to electromagnetic units, and predicted that two parallel sheets of electric
charge, both moving in their own planes in the same direction with velocity c (supposing
this to be possible) would exert no net force on each other. If Maxwell imagined himself
moving along with these charged plates and observing no force between them, he
obviously did not expect the laws of electrostatics to be applicable. (This is analogous to
Einstein's famous thought experiment in which he imagined moving alongside a
relatively stationary pulse of light.) According to Maxwell's conception, if
measurements of ε0 are performed with an apparatus traveling at some significant fraction
of the speed of light, the results would not only differ from the result at rest, they would
also vary depending on the orientation of the plates relative to the direction of the
absolute velocity of the apparatus.
Of course, the efforts of Maxwell and others to devise empirical methods for measuring
the absolute rest frame (either by measuring anisotropies in the speed of light or by
detecting variations in the electromagnetic properties of the vacuum) were doomed to
failure, because even though it's true that the equations of electromagnetism are not
invariant under Galilean transformations, it is also true that those equations are invariant
with respect to every system of inertial coordinates. Maxwell (along with everyone else
before Einstein) would have regarded those two propositions as logically contradictory,
because he assumed inertial coordinate systems are related by Galilean transformations.
Einstein was the first to recognize that this is not so, i.e., that relatively moving inertial
coordinate systems are actually related by Lorentz transformations.
Maxwell's equations are suggestive of the invariance of c only because of the added
circumstance that we are unable to physically identify any particular frame of reference
for the application of those equations. (Needless to say, the same is not true of, for
example, the Navier-Stokes equation for a material fluid medium.) The most readily
observed instance of this inability to single out a unique reference frame for Maxwell's
equations is the empirical invariance of light speed with respect to every inertial system
of coordinates, from which we can infer the invariance of ε0. Hence attempts to deduce
the invariance of light speed from Maxwell's equations are fundamentally misguided.
Furthermore, as discussed in Section 1.6, we know (as did Einstein) that Maxwell's
equations are not fundamental, since they don't encompass quantum photo-electric effects
(for example), whereas the Minkowski structure of spacetime (representing the
invariance of the local characteristic speed of light) evidently is fundamental, even in the
context of quantum electrodynamics. This strongly supports Einstein's decision to base
his kinematics on the light speed principle itself. (As in the case of Euclid's decision to
specify a "fifth postulate" for his theory of geometry, we can only marvel in retrospect at
the underlying insight and maturity that this decision reveals.)
Another argument that is sometimes advanced in support of the second postulate is based
on the notion of causality. If the future is to be determined by (and only by) the past, then
(the argument goes) no object or information can move infinitely fast, and from this
restriction people have tried to infer the existence of a finite upper bound on speeds,
which would then lead to the Lorentz transformations. One problem with this line of
reasoning is that it's based on a principle (causality) that is not unambiguously self-evident. Indeed, if certain objects could move infinitely fast, we might expect to find the
universe populated with large sets of indistinguishable particles, all of which are really
instances of a small number of prototypes moving infinitely fast from place to place, so
that they each occupy numerous locations at all times. This may sound implausible until
we recall that the universe actually is populated by apparently indistinguishable electrons
and protons, and in fact according to quantum mechanics the individual identities of those
particles are ambiguous in many circumstances. John Wheeler once seriously toyed with
the idea that there is only a single electron in the universe, weaving its way back and
forth through time. Admittedly there are problems with such theories, but the point is
that causality and the directionality of time are far from being straightforward principles.
Moreover, even if we agree to exclude infinite speeds, i.e., that the composition of any
two finite speeds must yield a finite speed, we haven't really accomplished anything,
because the Galilean composition law has this same property. Every real number is
finite, but it does not follow that there must be some finite upper bound on the real
numbers. More fundamentally, it's important to recognize that the Minkowski structure
of spacetime doesn't, by itself, automatically rule out speeds above the characteristic
speed c (nor does it imply temporal asymmetry). Strictly speaking, a separate assumption
is required to rule out "tachyons". Thus, we can't really say that Minkowskian spacetime
is prima facie any more consistent with causality than is Galilean spacetime.
A more persuasive argument for a finite upper bound on speeds can be based on the idea
of locality, as mentioned in our review of the shortcomings of the Galilean transformation
rule. If the spatial ordering of events is to have any absolute significance, in spite of the
fact that distance can be transformed away by motion, it seems that there must be some
definite limit on speeds. Also, the continuity and identity of objects from one instant to
the next (ignoring the lessons of quantum mechanics) is most intelligible in the context of
a unified spacetime manifold with a definite non-singular connection, which implies a
finite upper bound on speeds. This is in the spirit of Minkowski's 1908 lecture in which
he urged the greater "mathematical intelligibility" of the Lorentzian group as opposed to
the Galilean group of transformations.
For a typical derivation of the Lorentz transformation in this axiomatic spirit, we may
begin with the basic Galilean program of seeking to identify coordinate systems with
respect to which physical phenomena are optimally simple. We have the fundamental
principle that for any material object in any state of motion there exists a system of space
and time coordinates with respect to which the object is instantaneously at rest and
Newton's laws of inertial motion hold good (at least quasi-statically). Such a system is
called an inertial rest frame coordinate system of the object. Let x,t denote inertial rest
frame coordinates of one object, and let x',t' denote inertial rest frame coordinates of
another object moving with a speed v in the positive x direction relative to the x,t
coordinates. How are these two coordinate systems related? We can arrange for the
origins of the coordinate systems to coincide. Also, since these coordinate systems are
defined such that an object in uniform motion with respect to one such system must be in
uniform motion with respect to all such systems, and such that inertia is isotropic, it follows
that they must be linearly related by the general form x' = Ax + Bt and t' = Cx + Dt,
where A,B,C,D are constants for a given value of v. The differential form of these
equations is dx' = Adx + Bdt and dt' = Cdx + Ddt.
Now, since the second object is stationary at the origin of the x',t' coordinates, its position
is always x' = 0, so the first transformation equation gives 0 = Adx + Bdt, which implies
dx/dt = −B/A = v and hence B = −Av. Also, if we solve the two transformation equations
for x and t we get (AD−BC)x = Dx' − Bt', (AD−BC)t = −Cx' + At'. Since the first object is
moving with velocity −v relative to the x',t' coordinates we have −v = dx'/dt' = B/D, which
implies B = −Dv and hence A = D. Furthermore, reciprocity demands that the
determinant AD − BC = A² + vAC of the transformation must equal unity, so we have C
= (1 − A²)/(vA). Combining all these facts, a linear, reciprocal, unitary transformation
from one system of inertial coordinates to another must be of the form

    x' = A(x − vt),        t' = A(t − [(A² − 1)/(A²v²)] vx)
It only remains to determine the value of A (as a function of v), which we can do by
fixing the quantity in the square brackets. Letting k denote this quantity for a given v, the
transformation can be written in the form

    x' = (x − vt)/√(1 − kv²),        t' = (t − kvx)/√(1 − kv²)
Any two inertial coordinate systems must be related by a transformation of this form,
where v is the mutual speed between them. Also, note that

    A = 1/√(1 − kv²)
Given three systems of inertial coordinates with the mutual speed v between the first two
and u between the second two, the transformation from the first to the third is the
composition of transformations with parameters kv and ku. Letting x,t denote the third
system of coordinates, we have by direct substitution

    x″ = [(1 + kv uv)x − (u + v)t] / √[(1 − ku u²)(1 − kv v²)]
    t″ = [(1 + ku uv)t − (ku u + kv v)x] / √[(1 − ku u²)(1 − kv v²)]

For this to have the same form as the individual transformations, the coefficient of x in
the x″ expression must equal the coefficient of t in the t″ expression, so we have ku = kv,
and therefore k is a constant for all v, with units of an inverse squared speed. Also, the
coefficient of t in the x″ numerator, divided by the coefficient of x, must be the negative
of the mutual speed between the first and third coordinate systems. Thus, letting w
denote this speed, we have

    w = (u + v)/(1 + kuv)
It's easy to show that this is the necessary and sufficient condition for the composite
transformation to have the required form.
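For readers who like to check such algebra numerically, here is a minimal Python sketch (the function name and sample values are ours, purely for illustration) confirming that composing two transformations of this form with a common k yields another transformation of the same form, with the mutual speed w = (u+v)/(1+kuv):

    import math

    def k_transform(x, t, v, k):
        # General transformation derived above: x' = A(x - v t), t' = A(t - k v x),
        # with A = 1/sqrt(1 - k v^2).
        A = 1.0 / math.sqrt(1.0 - k * v * v)
        return A * (x - v * t), A * (t - k * v * x)

    k, v, u = 1.0, 0.3, 0.5
    x, t = 2.0, 5.0
    x1, t1 = k_transform(x, t, v, k)       # transform by v
    x2, t2 = k_transform(x1, t1, u, k)     # then transform by u
    w = (u + v) / (1.0 + k * u * v)        # predicted composite speed
    x3, t3 = k_transform(x, t, w, k)       # single transformation by w
    print(x2 - x3, t2 - t3)                # both differences ~0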
Now, if the value of the constant k is non-zero, we can normalize its magnitude by a
suitable choice of space and time units, so that the only three fundamentally distinct
possibilities to consider are k = -1, 0, and +1. Setting k = 0 gives the familiar Galilean
transformation x' = x − vt, t' = t. This is highly asymmetrical between the time and space
parameters, in the sense that it makes the transformed space parameter a function of both
the space coordinate and the time coordinate of the original system, whereas the
transformed time coordinate is dependent only on the time coordinate of the original
system.
Alternatively, for the case k = −1 we have the transformation

    x' = (x − vt)/√(1 + v²),        t' = (t + vx)/√(1 + v²)
Letting θ denote the angle that the line from the origin to the point (x,t) makes with the t
axis, then tan(θ) = v = dx/dt, and we have the trigonometric identities cos(θ) = 1/√(1+v²)
and sin(θ) = v/√(1+v²). Therefore, this transformation can be written in the form

    x' = x cos(θ) − t sin(θ),        t' = x sin(θ) + t cos(θ)

which is just a Euclidean rotation in the xt plane. Under this transformation the quantity
(dx)² + (dt)² = (dx')² + (dt')² is invariant. This transformation is clearly too symmetrical
between x and t, because we know from experience that we cannot turn around in time as
easily as we can turn around in space.
The only remaining alternative is to set k = 1, which gives the transformation

    x' = (x − vt)/√(1 − v²),        t' = (t − vx)/√(1 − v²)
Although perfectly symmetrical, this maintains the absolute distinction between spatial
and temporal intervals. This can be parameterized as a hyperbolic rotation

    x' = x cosh(q) − t sinh(q),        t' = t cosh(q) − x sinh(q),        tanh(q) = v

and we have the invariant quantity (dx)² − (dt)² = (dx')² − (dt')² for any given interval. It's
hardly surprising that this transformation, rather than either the Galilean transformation
or the Euclidean transformation, gives the actual relationship between space and time
coordinate systems with respect to which inertia is directionally symmetrical and inertial
motion is linear. From purely formal considerations we can see that the Galilean
transformation, given by setting k = 0, is incomplete and has no spacetime invariant,
whereas the Euclidean transformation, given by setting k = -1, makes no distinction at all
between space and time. Only the Lorentzian transformation, given by setting k = 1, has
completely satisfactory properties from an abstract point of view, which is presumably
why Minkowski referred to it as "more intelligible".
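All three cases can be exhibited side by side in a small numerical sketch (Python; our own construction, not anything canonical), which checks that the quantity t² − kx² is preserved in every case, reducing to the Euclidean, Galilean, and Lorentzian invariants respectively:

    import math

    def boost(x, t, v, k):
        # x' = A(x - v t), t' = A(t - k v x), with A = 1/sqrt(1 - k v^2)
        A = 1.0 / math.sqrt(1.0 - k * v * v)
        return A * (x - v * t), A * (t - k * v * x)

    x, t, v = 2.0, 5.0, 0.6
    for k in (-1.0, 0.0, 1.0):
        xp, tp = boost(x, t, v, k)
        # t^2 - k x^2 is invariant: x^2 + t^2 for k = -1, t^2 alone for k = 0,
        # and t^2 - x^2 for k = +1.
        print(k, t * t - k * x * x, tp * tp - k * xp * xp)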
As plausible as such arguments may be, they don't amount to a logical deduction, and one
is left with the impression that we have not succeeded in identifying any fundamental
principle or symmetry that uniquely selects Lorentzian spacetime rather than Galilean
space and time. Accordingly, most writers on the subject have concluded (reluctantly)
that Einstein's light speed postulate, or something like it, is indispensable for deriving

special relativity, and that we can be persuaded to adopt such a postulate only by
empirical facts. Indeed, later in the same paper where Minkowski exercised his staircase
wit, he admitted that "the impulse and true motivation for assuming the group Gc came
from the fact that the differential equation for the propagation of light [i.e., the wave
equation] in empty space possesses the group Gc", and he referred back to Voigt's 1887
paper (see Section 1.4).
Nevertheless, it's still interesting to explore the various rational "intelligibility" arguments
that can be put forward as to why space and time must be Minkowskian. A typical
approach is to begin with three speeds u,v,w representing the pairwise speeds between
three co-linear particles, and to seek a composition law of the form Q(u,v,w) = 0 relating
these speeds. It's easy to make the case that it should be possible to uniquely solve this
function explicitly for any of the speeds in terms of the other two, which implies that Q
must be linear in all three of its arguments. The most general linear function of three
variables is
Q(u,v,w) = Auvw + Buv + Cuw + Dvw + Eu + Fv + Gw + H
where A,B,...H are constants. Treating the speeds symmetrically requires B = C = D and
E = F = G. Also, if any two of the speeds are 0 we require the third speed to be 0
(transitivity), so we have H = 0. Also, if any one of the speeds, say u, is 0, then we
require v = −w (reciprocity), but with u = 0 and v = −w the formula reduces to −Dv² + Fv
− Gv = 0, and since F = G (= E) this is just −Dv² = 0, so it follows that B = C = D = 0.
Hence the most general function that satisfies our requirements of linearity, 3-way
symmetry, transitivity, and reciprocity is Q(u,v,w) = Auvw + E(u+v+w) = 0. It's clear
that E must be non-zero (since otherwise general reciprocity would not be imposed when
any one of the variables vanished), so we can divide this function by E, and let k denote
A/E, to give

    u + v + w + kuvw = 0
We see that this k is the same as the one discussed previously. As before, the only three
distinct cases are k = -1, 0, and +1. If k = 0 we have the Galilean composition law, and if
k = 1 we have the Einsteinian composition law. How are we to decide? In the next
section we consider the problem from a slightly different perspective, and focus on a
unique symmetry that arises only with k = 1.
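Before moving on, a two-line numerical check (Python, a sketch of ours) shows that both surviving cases satisfy this relation when the third speed is computed from the corresponding composition law:

    for k in (0.0, 1.0):
        u, v = 0.5, 0.3                      # u = v12, v = v23 (units with c = 1)
        w = -(u + v) / (1.0 + k * u * v)     # w = v31 from the composition law
        print(k, u + v + w + k * u * v * w)  # ~0 for both the Galilean (k = 0)
                                             # and Einsteinian (k = 1) laws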

1.8 Another Symmetry


    I cannot quite imagine it possible that any physical meaning be afforded
    to substitutions of reciprocal radii... It does seem to me that you are very
    much over-estimating the value of purely formal approaches...
                                        Albert Einstein to Felix Klein, 1916

We saw in previous sections that Maxwell's equations are invariant under Lorentz
transformations, as well as translations and spatial rotations. Together these
transformations comprise the Poincaré group. Of course, Maxwell's equations are also
invariant under spatial and temporal reflections, but it is often overlooked that in addition
to all these linear transformations, Maxwell's equations possess still another symmetry,
namely, the symmetry of spacetime inversion. In a sense, an inversion is a kind of
reflection about a surface in spacetime, analogous to inversions about circles in projective
geometry, the only difference being that the Minkowski interval is used instead of the
Euclidean line element.
Consider two events E1 and E2 that are null-separated from each other, meaning that the
absolute Minkowski interval between them is zero in terms of an inertial coordinate
system x,y,z,t. Let s1 and s2 denote the absolute intervals from the origin to these two
events (respectively). Under an inversion of the coordinate system about the surface at an
absolute interval R from the origin (which may be chosen arbitrarily), each event located
on a given ray through the origin is moved to another point on that ray such that its
absolute interval from the origin is changed from s to R²/s. Thus the hyperbolic surfaces
outside of R are mapped to surfaces inside R, and vice versa.
To prove that two events originally separated by a null Minkowski interval are still
null-separated after the coordinates have been inverted, note that the ray from the origin
to the event Ej can be characterized by constants αj, βj, γj defined by

    αj = xj/tj,        βj = yj/tj,        γj = zj/tj
In terms of these parameters the magnitude of the interval from the origin to Ej can be
written as

    sj = tj √(1 − αj² − βj² − γj²)
The squared interval between E1 and E2 can then be expressed as

    (Δs)² = s1² + s2² − 2 s1 s2 K12

where

    K12 = (1 − α1α2 − β1β2 − γ1γ2) / √[(1 − α1² − β1² − γ1²)(1 − α2² − β2² − γ2²)]
Since inversion leaves each event on its respective ray, the value of K12 for the inverted
coordinates is the same as for the original coordinates, so the only effect on the
Minkowski interval between E1 and E2 is to replace s1 and s2 with R²/s1 and R²/s2
respectively. Therefore, the squared Minkowski interval between the two events in terms
of the inverted coordinates is

    (Δs')² = (R²/s1)² + (R²/s2)² − 2(R²/s1)(R²/s2)K12 = [R⁴/(s1s2)²](s2² + s1² − 2 s1 s2 K12)
The quantity in parentheses on the right side is just the original squared interval, so if the
interval was zero in terms of the original coordinates, it is zero in terms of the inverted
coordinates. Thus inversion of a system of inertial coordinates yields a system of
coordinates in which all the null intervals are preserved. It was shown in 1910 by
Bateman and (independently) Cunningham that this is the necessary and sufficient
condition for Maxwell's equations to be invariant. Incidentally, Einstein was dismissive
of this invariance when Felix Klein asked him about it. He wrote
I am convinced that the covariance of Maxwell's formulas under transformation
according to reciprocal radii can have no deeper significance; although this
transformation retains the form of the equations, it does not uphold the correlation
between coordinates and the measurement results from measuring rods and
clocks.
Einstein was similarly dismissive of Minkowski's formal approach to spacetime at first,
but later came to appreciate the profound significance of it. In any case, it's interesting to
note that straight lines in inertial coordinate systems map to straight or hyperbolic paths
under inversion. This partly accounts for the fact that, according to the Lorentz-Dirac
equations of classical electrodynamics, perfect hyperbolic motion is inertial motion, in
the sense that there are free-body solutions describing particles in hyperbolic motion, and
a charged particle in hyperbolic motion does not radiate.
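The null-preserving property proved above is also easy to verify numerically. The following Python sketch (our own construction; the sample events are arbitrary) inverts two timelike events about the unit hyperbolic surface and confirms that their null separation survives:

    def minkowski_sq(e):
        t, x, y, z = e
        return t * t - x * x - y * y - z * z

    def invert(e, R=1.0):
        # Move the event along its ray through the origin so that its absolute
        # interval s from the origin becomes R^2/s; this rescales the whole
        # 4-vector by R^2/s^2 (s^2 is assumed nonzero here).
        f = R * R / minkowski_sq(e)
        return tuple(f * c for c in e)

    E1 = (2.0, 0.0, 0.0, 0.0)
    E2 = (3.0, 1.0, 0.0, 0.0)                    # E2 - E1 is a null interval
    d  = tuple(a - b for a, b in zip(E1, E2))
    dI = tuple(a - b for a, b in zip(invert(E1), invert(E2)))
    print(minkowski_sq(d), minkowski_sq(dI))     # both 0.0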
It's also interesting that the relativistic formula for composition of two speeds is invariant
under inversion of the arguments about the speed c, i.e., replacing each speed v with c²/v.
Letting f(u,v) denote the composition of the (co-linear) speeds u and v, and choosing
units so that c = 1, we can impose the three requirements

    f(u,0) = u,        f(u,v) = f(v,u),        f(1/u, 1/v) = f(u,v)
The first two requirements are satisfied by both the Galilean and the Lorentzian
composition formulas, but the third requirement is not satisfied by the Galilean formula,
because that gives

    f(1/u, 1/v) = 1/u + 1/v = (u + v)/(uv) ≠ u + v
However, somewhat surprisingly, the relativistic composition function gives

    f(1/u, 1/v) = (1/u + 1/v)/(1 + 1/(uv)) = (u + v)/(uv + 1) = f(u,v)
so it does comply with all three requirements. This singles out the composition law with k
= 1 from the previous chapter. As indicated by Einstein's reply to Klein, the physical
significance of such inversion symmetries is obscure, and we should also note that the
spacetime inversion is not equivalent to the speed inversion, although they are formally
very similar. To clarify how this symmetry arises in the relativistic context, recall that we
had derived at the end of the previous chapter the relation

    u + v + w + kuvw = 0                (1)
where u = v12, v = v23, and w = v31. The symbol vij signifies the speed of the ith particle in
terms of the inertial rest frame coordinates of the jth particle. With k = 0 this corresponds
to the Galilean speed composition formula, which clearly is not invariant under inversion
of any or all of the speeds. For any non-zero value of k, equation (1) can be re-written in
the form

    (1 + √k u)(1 + √k v)(1 + √k w) = (1 − √k u)(1 − √k v)(1 − √k w)                (2)
Squaring both sides of this equation gives the equality

    (1 + √k u)²(1 + √k v)²(1 + √k w)² = (1 − √k u)²(1 − √k v)²(1 − √k w)²
If we replace each speed with its inversion in this formula, and then multiply through by
(uvw)²/k³ we get

    (1 + u/√k)²(1 + v/√k)²(1 + w/√k)² = (1 − u/√k)²(1 − v/√k)²(1 − w/√k)²
which is equivalent to the preceding formula if and only if

    √k = 1/√k,    i.e.,    k² = 1
Hence the speed composition formula is invariant under inversion if k = 1. The case k =
−1 is equivalent to the case k = +1 if each speed is taken to be imaginary (corresponding
to the use of an imaginary time axis), so without loss of generality we can choose k = +1
with real speeds. There remains, however, the ambiguity introduced by squaring both
sides of equation (2), suppressing the signs of the factors. Equation (2) itself, without
squaring, is invariant under inversion of any two of the speeds, but the inversion of all
three speeds changes the sign of the right side. Thus by squaring both sides of (2) we
make it consistent with either of the two complementary relations

    (1 + √k u)(1 + √k v)(1 + √k w) = (1 − √k u)(1 − √k v)(1 − √k w)
    (1 + √k u)(1 + √k v)(1 + √k w) = −(1 − √k u)(1 − √k v)(1 − √k w)
The left hand relation is invariant under inversion of any two of the speeds, whereas the
right hand relation is invariant under inversion of one or all three of the speeds. The
question, then, is why the first formula applies rather than the second. To answer this, we
should first point out that, despite the formal symmetry of the quantities u,v,w in these
equations, they are not conceptually symmetrical. Two of the quantities are implicitly
defined in terms of one inertial coordinate system, and the third quantity is defined in
terms of a different inertial coordinate system.
In general, there are nine conceptually distinct speeds for three co-linear particles in
terms of the three rest frame coordinate systems, namely

    v11    v12    v13
    v21    v22    v23
    v31    v32    v33
where vij is the speed of the ith particle in terms of the inertial rest frame coordinates of
the jth particle. By definition we have vii = 0 and by reciprocity we have vij = −vji, so the
speeds comprise an anti-symmetric array. Thus, although the three speeds v12, v23, v31 are
nominally defined in terms of three different systems of coordinates, any two of them can
be expressed in terms of a single coordinate system by invoking the reciprocity relation.
For example, the three quantities v12, v23, v31 can be expressed in the form v12, −v32, v31,
which signifies that the first two speeds are both defined in terms of the rest frame
coordinates of frame 2. However, the remaining speed does not have a direct expression
in terms of that frame, so a composition formula is needed to relate all three quantities.
We've seen that the relativistic composition formula yields the same value for the third
speed (e.g., the speed defined in terms of frame 1) regardless of whether we use the two
other speeds (e.g., the speeds defined in terms of frame 2) or their reciprocals.
To more clearly exhibit the peculiar 2+1 symmetry of this velocity composition law, note
that it can be expressed in multiplicative form as

    [(1 + v12)(1 + v23)(1 + v31)] / [(1 − v12)(1 − v23)(1 − v31)] = 1
where vij denotes the speed of object j with respect to object i. Clearly if we replace any
two of the speeds with their reciprocals, the relation remains unchanged. On the other
hand, if we replace just one or all three of the speeds with their reciprocals, their product
is still of unit magnitude, but the sign is negated. Thus, one way of expressing the full symmetry of
this relation would be to square both sides, giving the result

    [(1 + v12)(1 + v23)(1 + v31)]² / [(1 − v12)(1 − v23)(1 − v31)]² = 1
which is completely invariant under any replacement of one or more speeds with their
respective reciprocals. Naturally we can extend the product of factors of the form
(1 + vij)/(1 − vij) to any cyclical sequence of relative speeds between any number of co-linear
points.
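A short Python sketch (ours, for illustration only) makes this 2+1 symmetry vivid: the cyclic product equals 1, inverting two speeds leaves it at 1, and inverting one or three speeds flips it to −1:

    def F(v):
        return (1.0 + v) / (1.0 - v)

    v12, v23 = 0.5, 0.3
    v31 = -(v12 + v23) / (1.0 + v12 * v23)    # reciprocity plus composition
    print(F(v12) * F(v23) * F(v31))           #  1.0
    print(F(1/v12) * F(1/v23) * F(v31))       #  1.0  (two speeds inverted)
    print(F(1/v12) * F(v23) * F(v31))         # -1.0  (one speed inverted)
    print(F(1/v12) * F(1/v23) * F(1/v31))     # -1.0  (all three inverted)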
It's interesting to note the progression of relations between the speeds involving one, two,
and three particles. The relativity of position is expressed by the identity

    vii = 0
for any one particle, and the relativity of velocity can be expressed by the skew symmetry

    vij + vji = 0

for any two particles. (This was referred to earlier as the reciprocity condition vij = −vji.)
The next step is to consider the cyclic sum involving three particles and their
respective inertial rest frame coordinate systems. This is the key relation, because all
higher-order relations can be reduced to this. If acceleration were relative (like position
and velocity), we would expect the cyclic symmetry vij + vjk + vki = 0, which is a linear
function of all three components. Indeed, this is the Galilean composition formula.
However, since acceleration is absolute, it's to be expected that the actual relation is nonlinear in each of the three components. So, instead of vanishing, we need the right side of
this sum to be a symmetric function of the terms. The only other odd elementary
symmetric function of three quantities is the product of all three, so we're led (again) to
the relation

    vij + vjk + vki = −(vij vjk vki)
which can be regarded as the law of inertia. Since there is only one odd elementary
symmetric function of one variable, and likewise for two variables, the case of three
variables is the first for which there exists a non-tautological expression of this form.
We may also note a formal correspondence with De Morgan's law for logical statements.
Letting sums denote logical ORs (unions), products denote logical ANDs (intersections),
and overbars denote logical negation, De Morgan's law states that

    X̄ + Ȳ + Z̄ = (XYZ)‾
for any three logical variables X,Y,Z. Now, using the skew symmetry property, we can
"negate" each velocity on the right hand side of the previous expression to give

From this standpoint the right hand side is analogous to the "logical negation" of the left
hand side, which makes the relation analogous to setting the quantity equal to zero. The
justification for regarding this relation as the source of inertia becomes more clear in
Section 2.3, which describes how the relativistic composition law for velocities accounts
for the increasing inertia of an accelerating object. This leads to the view that inertia itself
is, in some sense, a consequence of the non-linearity of velocity compositions.
Given the composition law u' = (u+v)/(1+uv) for co-linear speeds, what can we say about
the transformation of the coordinates x and t themselves under the action of the velocity
v? The composition law can be written in the form uvu' + u' − u = v, which has a natural
factorization if we multiply through by v and subtract 1 from both sides, giving

    (1 − u'v)(1 + uv) = 1 − v²                (3)
If u and u' are taken to be the spatio-temporal ratios x/t and x'/t', the above relation can be
written in the form

    [(t' − vx')/t'] [(t + vx)/t] = 1 − v²
On the other hand, remembering that we can insert the reciprocals of any two of the
quantities u, u', v without disturbing the equality, we can take u and u' to be the
temporal-spatial ratios t/x and t'/x' in (3) to give

    [(x' − vt')/x'] [(x + vt)/x] = 1 − v²
These last two equations immediately give

    [(t' − vx')(x' − vt')/(t'x')] [(t + vx)(x + vt)/(tx)] = (1 − v²)²                (4)
Treating the primed and unprimed frames equivalently, and recalling that v' = −v, we see
that (4) has a perfectly symmetrical factorization, so we exploit this factorization to give
the transformation equations

    x' = (x + vt)/√(1 − v²),        t' = (t + vx)/√(1 − v²)
These are the Lorentz transformations for velocity v in the x direction. The y and z
coordinates are unaffected, so we have y' = y and z' = z. From this it follows that the
quantity t² − x² − y² − z² is invariant under a general Lorentz transformation, so we have
arrived at the full Minkowski spacetime metric.
Now, to determine the full velocity composition law for two systems of aligned
coordinates k and K, the latter moving in the positive x direction with velocity v relative
to the former, we can without loss of generality make the origins of the two systems both
coincide with a point P0 on the subject worldline, and let P1 denote a subsequent point on
that worldline with k system coordinates dt,dx,dy,dz. By definition the velocity
components of that worldline with respect to k are ux = dx/dt, uy = dy/dt, and uz = dz/dt.
The coordinates of P1 with respect to the K system are given by the Lorentz
transformation for a simple boost v in the x direction:

    dT = γ(dt − v dx),    dX = γ(dx − v dt),    dY = dy,    dZ = dz

where γ = 1/√(1 − v²). Therefore, the velocity components of the worldline with respect to
the K system are

    Ux = dX/dT = (ux − v)/(1 − ux v),    Uy = dY/dT = uy/[γ(1 − ux v)],    Uz = dZ/dT = uz/[γ(1 − ux v)]
1.9 Null Coordinates


Slight not what's near through aiming at what's far.
Euripides, 455 BC
Initially the special theory of relativity was regarded as just a particularly simple and
elegant interpretation of Lorentz's ether theory, but it soon became clear that there is a
profound heuristic difference between the two theories, most evident when we consider
the singularity implicit in the Lorentz transformation x' = γ(x − vt), t' = γ(t − vx), where γ =
1/√(1 − v²). As v approaches arbitrarily close to 1, the factor γ goes to infinity. If these
relations are strictly valid (locally), as all our observations and experiments suggest,
then according to Lorentz's view all configurations of objects moving through the
absolute ether must be capable of infinite spatial "contractions" and temporal "dilations",
without the slightest distortion. This is clearly unrealistic. Hence the only plausible
justification for the Lorentzian view is a belief that the Lorentz transformation equations
are not strictly valid, i.e., that they must break down at some point. Indeed, this was
Lorentz's ultimate justification, as he held to the possibility that absolute speed might,
after all, make some difference to the intrinsic relations between physical entities.
However, one hundred years after Lorentz's time, there still is no evidence to support his
suspicion. To the contrary, all the tremendous advances of the last century in testing the
Lorentz transformation "to the nth degree" have consistently confirmed it's exact
validity. At some point a reasonable person must ask himself "What if the Lorentz
transformation really is exactly correct?" This is a possibility that a neo-etherist cannot
permit himself to contemplate - because the absolute physical singularity along light-like
intervals implied by the Lorentz transformation is plainly incompatible with any realistic
ether - but it is precisely what special relativity requires us to consider, and this ultimately
leads to a completely new and more powerful view of causality.
The singularity of the Lorentz transformation is most clearly expressed in terms of the
underlying Minkowski pseudo-metric. Recall that the invariant spacetime interval dτ
between the events (t,x) and (t+dt, x+dx) is given by

    (dτ)² = (dt)² − (dx)²
where t and x are any set of inertial coordinates. This is called a pseudo-metric rather
than a metric because, unlike a true metric, it doesn't satisfy the triangle inequality, and
the interval between distinct points can be zero. This occurs for any interval such that dt
= ±dx, in which case the invariant interval dτ is literally zero. Arguably, it is only in the
context of Minkowski spacetime, with its null connections between distinct events, that
phenomena involving quantum entanglement can be rationalized.
Pictorially, the locus of points whose squared distance from the origin is ±1 consists of
the two hyperbolas labeled +1 and −1 in the figure below.

[Figure: the unit hyperbolas (dτ)² = ±1, with the diagonal null lines through the origin]
The diagonal axes denoted by ξ and η represent the paths of light through the origin,
and the magnitude of the squared spacetime interval along these axes is 0, i.e., the metric
is degenerate along those lines. This is all expressed in terms of conventional space and
time coordinates, but it's also possible to define the spacetime separations between events
in terms of null coordinates along the light-line axes. Conceptually, we rotate the above
figure by 45 degrees, and regard the ξ and η lines as our coordinate axes, as shown
below:

[Figure: the same hyperbolas referred to the null axes ξ and η]
In terms of a linear parameterization (ξ,η) of these "null coordinates" the locus of points
at a squared "distance" (dτ)² from the origin is an orthogonal hyperbola satisfying the
equation

    (dτ)² = (dξ)(dη)
Since the light-lines ξ and η are degenerate, in the sense that the absolute spacetime
intervals along those lines vanish, the absolute velocity of a worldline, given by the
"slope" dξ/dη = 0/0, is strictly undefined. This indeterminacy, arising from the singular
null intervals in spacetime, is at the heart of special relativity, allowing for infinitely
many different scalings of the light-line coordinates. In particular, it is natural to define
the rest frame coordinates of any worldline in such a way that dξ/dη = 1. This
expresses the principle of relativity, and also entails Einstein's second principle, i.e., that
the (local) velocity of light with respect to the natural measures of space and time for any
worldline is unity. The relationship between the natural null coordinates of any two
worldlines is then expressed by the requirement that, for any given interval d the
components d,d with respect to one frame are related to the components d',d' with
respect to another frame according to the equation (d)(d) = (d')(d'). It follows that
the scale factors of any two frames Si and Sj are related according to

where vij is the usual velocity parameter (in units such that c = 1) of the origin of Sj with
respect to Si. Notice there is no absolute constraint on the scaling of the ξ and η axes,
there is only a relative constraint, so the "gauge" of the light-lines really is indeterminate.
Also, the scale factors are simply the relativistic Doppler shifts for approaching and
receding sources. This accords with the view of the coordinate "grid lines" as the
network of light-lines emitted by a strobed source moving along the reference world-line.
To illustrate how we can operate with these null coordinate scale relations, let us derive
the addition rule for velocities. Given three co-linear unaccelerated particles with the
pairwise relative velocity parameters v12, v23, and v13, we can solve the "ξ scale" relation
for v13 to give

    v13 = [(dξ1/dξ3)² − 1] / [(dξ1/dξ3)² + 1]                (1)
We also have

    dξ1/dξ2 = √[(1 + v12)/(1 − v12)],        dξ2/dξ3 = √[(1 + v23)/(1 − v23)]
Multiplying these together gives an expression for dξ1/dξ3, which can be substituted into
(1) to give the expected result

    v13 = (v12 + v23)/(1 + v12 v23)
Interestingly, although neither the velocity parameter v nor the quantity (1+v)/(1−v) is
additive, it's easy to see that the parameter ln[(1+v)/(1−v)] is additive. In fact, this
parameter corresponds to the arc length of the "τ = constant" hyperbola connecting the
two world lines at unit distances from their intersection, as shown by integrating the
differential distance along that curve

    s = ∫ √[(dx)² − (dt)²]
Since the equation of the hyperbola for τ = 1 is 1 = t² − x² we have

    t = √(1 + x²),        dt = x dx/√(1 + x²)

Substituting this into the previous expression and performing the integration gives

    s = ∫ dx/√(1 + x²) = ln[x + √(1 + x²)] = ln(x + t)
Recalling that τ² = t² − x², we have t + x = τ²/(t − x), so at the point where the τ = 1
hyperbola meets a worldline of velocity v through the origin, the quantity x + t can
be written as

    x + t = √[(1 + v)/(1 − v)]
Hence the absolute arc length along the τ = 1 surface between two world lines that
intersect at the origin with a mutual velocity v is

    s = ln(x + t) = (1/2) ln[(1 + v)/(1 − v)]
Naturally the additivity of this logarithmic form implies that the argument is a
multiplicative measure of mutual speeds. The absolute interval between the intersection
points of the two worldlines with the τ = 1 hyperbola is

    (Δs)² = 2(γ − 1),        γ = 1/√(1 − v²)
One strength of the conventional pseudo-metrical formalism is that (t,x) coordinates
easily generalize to (t,x,y,z) coordinates, and the invariant interval generalizes to

    (dτ)² = (dt)² − (dx)² − (dy)² − (dz)²
The generalization of the null (lightlike) coordinates and corresponding invariant is not as
algebraically straightforward, but it conveys some interesting aspects of the spacetime
structure. Intuitively, an observer can conceive of the absolute interval between himself
and some distant future event P by first establishing a scale of radial measure outward on
his forward light cone in all directions, and then for each direction evaluate the
parameterized null measure along the light cone to the point of intersection with the
backward null cone of P. This will assign, to each direction in space, a parameterized
distance from the observer to the backward light cone of P, and there will be (in flat
spacetime) two distinguished directions, along which the null measure is maximum or
minimum. These are the principal directions for the interval from the observer to P, and
the product of the null measures in these directions is invariant. In other words, if a
second observer, momentarily coincident with the first but with some relative velocity,
determines the null measures along the principal directions to the backward light cone of
P, with respect to his own natural parameterization, the product will be the same as found
by the first observer.
It's often convenient to take the interval to the point P as the time axis of inertial
coordinates t,x,y,z, so the eigenvectors of the null cone intersections become singular, and
we can simply define the null coordinates u = t + r, v = t − r, where r = √(x² + y² + z²).
From this we have t = (u + v)/2 and r = (u − v)/2, along with the corresponding differentials
dt = (du + dv)/2 and dr = (du − dv)/2. Making these substitutions into the usual Minkowski
metric in terms of polar coordinates

    (dτ)² = (dt)² − (dr)² − r²[(dθ)² + sin(θ)²(dφ)²]

we have the Minkowski line element in terms of angles and null coordinates

    (dτ)² = du dv − [(u − v)²/4][(dθ)² + sin(θ)²(dφ)²]
These coordinates are often useful, but we can establish a more generic system of null
coordinates in 3+1 dimensional spacetime by arbitrarily choosing four non-parallel
directions in space from an observer at O, and then the coordinates of any timelike
separated event P are expressed as the four null measures radially in those directions along
the forward null cone of O to the backward null cone of P. This provides enough
information to fully specify the interval OP.
In terms of the usual orthogonal spacetime coordinates, we specify the coordinates
(T,X,Y,Z) of event P relative to the observer O at the origin in terms of the coordinates of
four events I1, I2, I3, I4 on the intersection of the forward null cone of O and the backward
null cone of P. If ti, xi, yi, zi denote the conventional coordinates of Ii, then we have

    ti² = xi² + yi² + zi²,        (T − ti)² = (X − xi)² + (Y − yi)² + (Z − zi)²

for i = 1, 2, 3, 4. Expanding the right hand equations and canceling based on the left
hand equalities, we have the system of equations

    T² − X² − Y² − Z² = 2(ti T − xi X − yi Y − zi Z),        i = 1, 2, 3, 4
The left hand side of all four of these equations is the invariant squared proper time
interval τ² from O to P, and we wish to express this in terms of just the four null measures
in the four chosen directions. For a specified set of directions in space, this information
can be conveyed by the four values t1, t2, t3, and t4, since the magnitudes of the spatial
components are determined by the directions of the axes and the magnitude of the
corresponding t. In general we can define the direction coefficients aij such that

    xi = ai1 ti,        yi = ai2 ti,        zi = ai3 ti
with the condition ai1² + ai2² + ai3² = 1. Making these substitutions, the system of
equations can be written in matrix form as

    | 1  −a11  −a12  −a13 | | T |   | τ²/(2t1) |
    | 1  −a21  −a22  −a23 | | X | = | τ²/(2t2) |
    | 1  −a31  −a32  −a33 | | Y |   | τ²/(2t3) |
    | 1  −a41  −a42  −a43 | | Z |   | τ²/(2t4) |
We can use any four directions for which the determinant of the coefficient matrix does
not vanish. One natural choice is to use the vertices of a tetrahedron inscribed in a unit
sphere, so that the four directions are perfectly symmetrical. We can take as the
coordinates of the vertices

    (a11,a12,a13) = (1,1,1)/√3        (a21,a22,a23) = (1,−1,−1)/√3
    (a31,a32,a33) = (−1,1,−1)/√3      (a41,a42,a43) = (−1,−1,1)/√3
Inserting these values for the direction coefficients aij, we can solve the matrix equation
for T, X, Y, and Z to give

    T = (τ²/8)(1/t1 + 1/t2 + 1/t3 + 1/t4)
    X = −(√3 τ²/8)(1/t1 + 1/t2 − 1/t3 − 1/t4)
    Y = −(√3 τ²/8)(1/t1 − 1/t2 + 1/t3 − 1/t4)
    Z = −(√3 τ²/8)(1/t1 − 1/t2 − 1/t3 + 1/t4)
Substituting into the relation τ² = T² − X² − Y² − Z² and solving for τ² gives

    τ² = 16 / [ (1/t1 + 1/t2 + 1/t3 + 1/t4)² − 3(1/t1² + 1/t2² + 1/t3² + 1/t4²) ]
Naturally if t1 = t2 = t3 = t4 = t, then this gives τ = 2t. Also, notice that, as expected, this
expression is perfectly symmetrical in the four lightlike coordinates. It's interesting that
if the right hand term was absent, then τ would be simply the harmonic mean of the ti.
More generally, in a spacetime of 1 + (D−1) dimensions, the invariant interval in terms of
D perfectly symmetrical null measures t1, t2, ..., tD satisfies the equation

    τ² = 4D / [ (1/t1 + 1/t2 + ... + 1/tD)² − (D − 1)(1/t1² + 1/t2² + ... + 1/tD²) ]

It can be verified that with D = 2 this expression reduces to τ² = 4t1t2, which agrees with
our earlier hyperbolic formulation τ² = ξη with ξ = 2t1 and η = 2t2.
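Before specializing further, here is a numerical sanity check (Python; the event P is an arbitrary choice of ours) that the tetrahedral formula reproduces τ² for a general timelike event:

    import math

    s3 = math.sqrt(3.0)
    a = [(1/s3, 1/s3, 1/s3), (1/s3, -1/s3, -1/s3),
         (-1/s3, 1/s3, -1/s3), (-1/s3, -1/s3, 1/s3)]   # tetrahedron directions

    T, X, Y, Z = 5.0, 1.0, -0.5, 2.0                   # an arbitrary timelike event P
    tau2 = T*T - X*X - Y*Y - Z*Z

    # Null measure in direction i: the system of equations above gives
    # t_i = tau^2 / (2 (T - a_i . (X,Y,Z))).
    t = [tau2 / (2.0 * (T - (ax*X + ay*Y + az*Z))) for ax, ay, az in a]

    q1 = sum(1.0 / ti for ti in t)
    q2 = sum(1.0 / ti**2 for ti in t)
    print(tau2, 16.0 / (q1 * q1 - 3.0 * q2))           # the two values agree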
In the particular case D = 4, if we define U = 2/τ and uj = 1/(2tj), this equation can be
written in the form

    (U/2)² = ū² − 3σ²

where ū denotes the average of the uj, and σ² is the average squared difference of the
individual u terms from the average, i.e.,

    σ² = [(u1 − ū)² + (u2 − ū)² + (u3 − ū)² + (u4 − ū)²]/4
This is the statistical variance of the uj values. Incidentally, we've seen that the usual
representation s² = x² − t² of the invariant spacetime interval is a generalization of the
familiar Pythagorean "sum-of-squares" equation of a circle, whereas the interval can also
be expressed in the hyperbolic form s² = ξη. This reminds us of other fundamental
relations of physics that have found expression as hyperbolic relations, such as the
uncertainty relations

    Δx Δp ≥ h/(4π),        ΔE Δt ≥ h/(4π)
in quantum mechanics, where h is Planck's constant. In general if the operators A, B
corresponding to two observables do not commute (i.e., if AB − BA ≠ 0), then an
uncertainty relation applies to those two observables, and they are said to be
incompatible. Spatial position and momentum are maximally incompatible, as are energy
and time. Such pairs of variables are called conjugates. This naturally raises the question
of whether the variables parameterizing two oppositely directed null rays in spacetime
can, in some sense, be regarded as conjugates, accounting for the invariance of their
product. Indeed the special theory of relativity can be interpreted in terms of a
fundamental limitation on our ability to make measurements, just as can the theory of
quantum mechanics. In quantum mechanics we say that it's not possible to
simultaneously measure the values of two conjugate variables such that the product of the
uncertainties of those two measurements is less than h/4π. Likewise in special relativity
we could say that it's not possible to measure the time difference dt between two events
separated by the spatial distance dx such that the ratio dt/dx of the variables is less than 1/c. In
quantum mechanics we may imagine that the particle possesses a precise position and
momentum, even though we are unable to determine it due to practical limitations of our
measurement techniques. If only we had infinitely weak signals, i.e., if only h were 0, we
could measure things with infinite precision. Likewise in special relativity we may
imagine that there is an absolute and precise relationship between the times of two distant
events, but we are prevented from determining it due to the practical limitations. If only
we had an infinitely fast signal, i.e., if only 1/c were zero, we could measure things with
infinite precision. In other words, nature possesses structure and information that is
inaccessible to us (hidden variables), due to the limitations of our measuring capabilities.
However, it's also possible to regard the limitations imposed by quantum mechanics (h ≠
0) and special relativity (1/c ≠ 0) not as limitations of measurement, but as expressions
of an actual ambiguity and "incompatibility" in the independent meanings of those
variables. Einstein's central contribution to modern relativity was the idea that there is no
one "true" simultaneity between spatially separate events, but rather spacetime events are
only partially ordered, and the decomposition of space and time into separate variables
contains an inherent ambiguity on the scale of 1/c. In other words, he rejected Lorentz's
"hidden variable" approach, and insisted on treating the ambiguity in the spacetime
decomposition as fundamental. This is interesting in part because, when it came to
quantum mechanics, Einstein's instinct was to continue trying to find ways of measuring
the "hidden variables", and he was never comfortable with the idea that the Heisenberg
uncertainty relations express a fundamental ambiguity in the decomposition of conjugate
variables on the scale of h. (Late in life, as Einstein continued arguing against Bohr's
notion of complementarity in quantum mechanics, one of his younger colleagues said "But
Professor Einstein, you yourself originated this kind of positivist reasoning about
conjugate variables in the theory of space and time", to which Einstein replied "Well,
perhaps I did, but it's nonsense all the same".)
Another model suggested by the relativistic interpretation of spacetime is to conceive of
space and time as two superimposed waves, combining constructively in the directions of
the space and time axes, but destructively (i.e., cancelling out) along light lines. For any
given inertial coordinate system x,t, we can associate with each event an angle θ defined
by tan(θ) = t/x. Thus the interval from the origin to the point x,t makes an angle θ with
the positive x axis, and we have t = x tan(θ), so we can express the squared magnitude of
a spacelike interval as

    s² = x² − t² = x²[1 − tan(θ)²]
Multiplying through by cos(θ)² gives

    s² cos(θ)² = x²[cos(θ)² − sin(θ)²] = x² cos(2θ)
Substituting t²/tan(θ)² for x² gives the analogous expression

    s² sin(θ)² = t² cos(2θ)
Adding these two expressions gives the result

    s² = (x² + t²) cos(2θ)
Consequently the "circular" locus of events satisfying x² + t² = r² for any fixed r can be
represented in polar coordinates (s,θ) by the equation

    s² = r² cos(2θ)
which is the equation of two lemniscates, as illustrated below.

[Figure: the two lemniscates s² = r² cos(2θ)]
The lemniscate was first discussed by Jakob Bernoulli in 1694, as the locus of points
satisfying the equation

    (x² + y²)² = a²(x² − y²)
which is, in Bernoulli's words, "a lying eight-like figure, folded in a knot of a bundle, or
of a lemniscus, a knot of a French ribbon". (The study of this curve led Fagnano, Euler,
Legendre, Gauss, and others to the discovery of addition theorems for integrals, of which
the relativistic velocity composition law is an example.) Notice that the lemniscate is the
inverse (in the sense of inversive geometry) of the hyperbola relative to the circle of
radius k. In other words, if we draw a line emanating from the origin and it strikes the
lemniscate at the radius s, then it strikes the hyperbola at the radius R where sR = k2.
This follows from the fact that the equation for a hyperbola in polar coordinates is R² =
k²/[E² cos(θ)² − 1] where E is the eccentricity, and for an orthogonal hyperbola we have E
= √2. Hence the denominator is 2cos(θ)² − 1 = cos(2θ), and the equation of the
hyperbola is R² = k²/cos(2θ). Since the polar equation for the lemniscate is s² = k²cos(2θ),
we have sR = k².
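A trivial Python check (ours) of this inversive relationship between the two curves:

    import math

    k = 1.0
    for theta in (0.1, 0.3, 0.6):                  # angles with cos(2*theta) > 0
        s = k * math.sqrt(math.cos(2 * theta))     # lemniscate: s^2 = k^2 cos(2*theta)
        R = k / math.sqrt(math.cos(2 * theta))     # hyperbola:  R^2 = k^2/cos(2*theta)
        print(s * R)                               # always k**2 = 1.0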
2.1 The Spacetime Interval
and then it was
There interposed a fly,
With blue, uncertain, stumbling buzz,
Between the light and me,

And then the windows failed, and then
I could not see to see.
Emily Dickinson, 1879
The advance of the quantum wave function of any physical system as it passes uniformly
from the event (t,x,y,z) to the event (t+dt, x+dx, y+dy, z+dz) is proportional to the value
of dτ given by

    (dτ)² = (dt)² − [(dx)² + (dy)² + (dz)²]/c²

where t,x,y,z are any system of inertial coordinates and c is a constant (the speed of light,
equal to 300 meters per microsecond). The quantity dτ is called the elapsed proper time
of the interval, and it is invariant with respect to any system of inertial coordinates. To
illustrate, consider a muon particle, which has a radioactive mean life of roughly 2 μsec
with respect to its inertial rest frame coordinates. In other words, between the appearance
of a typical muon (arising from, say, the decay of a pion) and its decay there is an interval
of about 2 μsec in terms of the time coordinate of the muon's inertial rest frame, so the
components of this interval are {2,0,0,0}, and the quantum phase of the particle advances
by an amount proportional to dτ, where

    dτ = √[(2)² − 0 − 0 − 0] = 2 μsec
Now suppose we assess this same physical phenomenon with respect to a relatively
moving system of inertial coordinates, e.g., a system with respect to which the muon
moved from the spatial origin [0,0,0] all the way to the spatial position [980m, -750m,
1270m] before it decayed. With respect to these coordinates, the muon traveled a spatial
distance of 1771 meters. Since the advance of the quantum wave function (i.e., the
proper time) of a system or particle over any interval of its worldline is invariant, the
corresponding time component of this physical interval with respect to these relatively
moving inertial coordinates must be much greater than 2 μsec. If we let (dT,dX,dY,dZ)
denote the components of this interval with respect to the relatively moving system of
inertial coordinates, we must have

    (dτ)² = (dT)² − [(dX)² + (dY)² + (dZ)²]/c²

Solving for dT and substituting for the spatial components noted above, we have

    dT = √[(2)² + (1771/300)²] ≈ 6.23 μsec
This represents the time component of the muon decay interval with respect to the
moving system of inertial coordinates. Since the muon has moved a spatial distance of
1771 meters in 6.23 μsec, we see that its velocity with respect to these coordinates is 284
m/μsec, which is 0.947c.
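The arithmetic of this example is summarized in the following sketch (Python, our own restatement of the numbers above):

    import math

    c = 300.0                            # meters per microsecond
    tau = 2.0                            # proper time of the interval, microseconds
    dX, dY, dZ = 980.0, -750.0, 1270.0   # spatial components, meters

    dist = math.sqrt(dX*dX + dY*dY + dZ*dZ)      # ~1771 m
    dT = math.sqrt(tau**2 + (dist / c)**2)       # ~6.23 microseconds
    print(dist, dT, (dist / dT) / c)             # speed ~0.947 c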
The identification of the spacetime interval with quantum phase applies to null intervals
as well, consistent with the fact that the quantum phase of a photon does not advance at
all between its emission and absorption. (For a further discussion of this, see Section
9.10.) Hence the physical significance of a null spacetime interval is that the quantum
state of any system is constant along that interval. In other words, the interval represents
a single quantum state of the system. It follows that the emission and absorption of a
photon must be regarded as, in some sense, a single quantum event.
Note, however, that the quantum phase is path dependent. In other words, two particles
at opposite ends of a lightlike (null) interval do not share the same quantum state unless
the second particle reached that event by passing along that null interval. Hence the
concept of the spacetime interval as a measure of the phase of the quantum wave function
does not conflict with the exclusion principle for fermions such as electrons, because
even though two electrons can be null-separated, they cannot have separated along that
null path, because they have non-zero rest mass. Of course, it is possible for two photons
at opposite ends of a null interval to have reached that condition by progressing along
that interval, in which case they represent the same quantum phase (and in some sense
may be regarded as "the same photon"), but photons are bosons, and hence not excluded
from occupying the same state. In fact, the presence of one photon in a particular
quantum state actually enhances the probability of another photon entering that state.
(This is responsible for the phenomenon of stimulated emission, which is the basis of
operation of lasers.)
In this regard it's interesting to consider neutrinos, which (like electrons) are fermions,
meaning that they have anti-symmetric eigenfunctions, and hence are subject to the Pauli
exclusion principle. On the other hand, neutrinos were traditionally regarded as massless,
meaning they propagate along null intervals. This raises the prospect of two instances of
a neutrino at opposite ends of a null interval, with the second occupying the same
quantum state as the first, in violation of the exclusion principle for fermions. It might be
argued that these two instances are really the same neutrino, and a particle obviously can't
exclude itself from occupying its own state. However, this is somewhat problematic due
to the indistinguishability and the lack of definite identities for individual particles. A
different approach would be to argue that all fermions, including neutrinos, must have
mass, and thus be excluded from traveling along null intervals. The idea that neutrinos
actually do have mass seems to be supported by recent experimental observations, but the
question remains open.
Based on the general identification of the invariant magnitude (proper time) of a timelike
interval with quantum phase along that interval, it follows that all physical processes and
characteristic sequences of events will evolve in proportion to this quantity. The name
"proper time" is appropriate because this quantity represents the most meaningful known
measure of elapsed time along that interval, based on the fact that the quantum state is the

most complete possible description of physical reality. Since not all spacetime intervals
are timelike, we conclude that the temporal relations between events induce only a partial
ordering, rather than a total ordering (as discussed in Section 1.2), because a set of events
can be totally ordered only if they are each inside the future or past null cone of each of
the others. This doesn't hold if any of the pairwise intervals is spacelike. As a
consequence of this partial ordering, between two fixed timelike separated events there
exist timelike paths with different lapses of proper time.
Admittedly a partial ordering of events has been considered unacceptable by some
people, basically because they regard total temporal ordering in a classical Cartesian
setting as an inviolable first principle. Rather than accept partial ordering they prefer to
(more or less arbitrarily) select one particular inertial reference system and declare it to
be the "true" configuration, as in Lorentz's original theory, in an attempt to restore an
unambiguous total temporal ordering to events. They then account for the apparent
differences in elapsed time (as in muon observations) by regarding them as effects of
absolute velocity relative to the "true" frame of reference, again following Lorentz.
However, unlike Lorentz, we now have a theory of quantum mechanics, and the quantum
state of a system gives (arguably) the most complete possible objective description of the
system. Therefore, modern advocates of total temporal ordering face the daunting task of
finding some mechanism underlying quantum mechanics (i.e., hidden variables) to
provide a physical significance for their preferred total ordering. Unfortunately, the only
prospects for a viable hidden-variable theory seem to be things like the explicitly nonlocal contrivances described by David Bohm, which must surely be anathema to those
who seek a physics based on classical Cartesian mechanisms. So, although the theories
of relativity and quantum mechanics are in some respects incongruent, it is nevertheless
true that the (putative) validity and completeness of quantum mechanics constitutes one
of the strongest arguments in favor of the relativistic interpretation of Lorentz invariance.
We should also mention that a tacit assumption has been made above, namely, the
assumption of physical equivalence between instantaneously co-moving frames,
regardless of acceleration. For example, we assume that two co-moving clocks will keep
time at the same instantaneous rate, even if one is accelerating and the other is not. This
is just a hypothesis - we have no a priori reason to rule out physical effects of the 2nd,
3rd, 4th,... time derivatives. It just so happens that when we construct a theory on this
basis, it works pretty well. (Similarly we have no a priori reason to think the field
equations necessarily depend only on the metric and its 1st and 2nd derivatives; but it
works.)
Another way of expressing this "clock hypothesis" is to say that an ideal clock is
unaffected by acceleration, and to regard this as the definition of an "ideal clock", i.e.,
one that compensates for any effects of 2nd or higher derivatives. Of course the physical
significance of this definition arises from the hypothesized fact that acceleration is
absolute, and therefore perfectly detectable (in principle). In contrast, we hypothesize
that velocity is perfectly undetectable, which explains why we cannot define our "ideal
clock" to compensate for velocity (or, for that matter, position). The point is that these
are both assumptions invoked by relativity: (1) the zeroth and first derivatives of position
are perfectly relative and undetectable, and (2) the second and higher derivatives of
position are perfectly absolute and detectable. Most treatments of relativity emphasize
the first assumption, but the second is no less important.
The notion of an ideal clock takes on even more physical significance from the fact that
there exist physical entities (such as vibrating atoms, etc.) in which the intrinsic forces far
exceed any accelerating forces we can apply, so that we have in fact (not just in principle)
the ability to observe virtually ideal clocks. For example, in the Rebka and Pound
experiments it was found that nuclear clocks were slowed by precisely the factor γ(v),
even though subject to accelerations up to 10¹⁶ g (which is huge in normal terms, but of
course still small relative to nuclear forces).
It was emphasized in Section 1 that a pulse of light has no inertial rest frame, but this
may seem puzzling at first. The pulse has a well-defined spatial position versus time with
respect to some inertial coordinate system, representing a fixed velocity c relative to that
system, and we know that any system of orthogonal coordinates in uniform non-rotating
motion relative to an inertial coordinate system is also inertial, so why can we not simply
apply the velocity c to the base frame to arrive at the rest frame of the light pulse? How
can an entity have a well-defined velocity and yet have no well-defined rest frame? The
only answer can be that the transformation is singular, i.e., the coordinate system moving
with a uniform speed c relative to an inertial frame is not well defined. The singular
behavior of the transformation corresponds to the fact that the absolute magnitude of the
spacetime intervals along lightlike paths is null. The transformation through a velocity v
from the xt to the x't' coordinates is t' = (t − vx)/γ and x' = (x − vt)/γ, where γ = (1 − v²)^1/2, so
it's clear that for v = 1 the individual t' and x' components are undefined, but the ratio
of dt' over dx' remains well-defined, with magnitude 1 and the opposite sign from v. The
singularity of the Lorentz transformation for the speed c suggests that the conception of
light as an entity in itself may be somewhat misleading, and it is often useful to regard
light as simply an interaction between two massive bodies along a null spacetime
interval.
Discussions of special relativity often refer to the use of clocks and reflected light signals
for the evaluation of spacetime intervals. For example, suppose two identical clocks are
moving uniformly with speeds +v and -v along the x axis of a given inertial coordinate
system, and these clocks are set to zero at the intersection of their worldlines. When the
leftward clock indicates the proper time τ1, it emits a pulse of light, which bounces off the
rightward clock when that clock indicates τ2, and arrives back at the leftward clock when
that clock reads τ3. This is illustrated in the drawing below.

[Figure: worldlines of the two clocks and the reflected light pulse, with proper times τ1, τ2, τ3]
By similar triangles we immediately have τ2/τ1 = τ3/τ2, and thus τ2² = τ1τ3. Of course, this
same relation holds good in Galilean spacetime as well (not to mention Euclidean plane
geometry, using distances instead of time intervals), and the reflected signal need not be a
light pulse. Any object moving at the same speed (angle) in both directions with respect
to this coordinate system would serve just as well, and would lead to the same result that
τ2 is the geometric mean of τ1 and τ3. Naturally if we apply any Minkowskian, Galilean,
or Euclidean transformation (respectively), the pictorial angles of the lines will differ, but
the three absolute intervals will remain unchanged.
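To make the relation concrete with a simple numerical example of our own (not in the original text): if the leftward clock reads τ1 = 1 when the pulse is emitted and τ3 = 9 when it returns, then the reflection event must occur when the rightward clock reads τ2 = (1·9)^1/2 = 3, and this holds equally in the Minkowskian, Galilean, and Euclidean cases.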
It is, of course, possible to distinguish between the Galilean and Minkowskian cases
based just on the values of the elapsed times, provided we know the relative speeds of the
clocks and the signal. In Galilean spacetime each proper time τj equals the coordinate
time tj, whereas in Minkowski spacetime it equals (tj² − xj²)^1/2 where xj = v tj. Hence the
proper time τj in Minkowski spacetime is tj (1 − v²)^1/2. This might seem to imply that the
ratios of proper times are the same in the Galilean and Minkowskian cases, but in fact we
have not made a valid comparison for equal relative speeds between the clocks. In this
example each clock is moving with speed v away from the midpoint, which implies that
the relative speed is 2v in the Galilean case, but only 2v/(1 + v²) in the Minkowskian
case.
To give a valid comparison for equal relative speeds between the clocks, let's transform
the events to a system of coordinates such that the left-hand clock is stationary and the
right-hand clock is moving at the speed v. Now this v represents the magnitude of the actual
relative speed between the two clocks. We now stipulate that the original signal is
moving with speed u relative to the left-hand clock, and the reflected signal is moving
with speed -u relative to the right-hand clock. The situation is illustrated in the figure
below.

[Figure: the left-hand clock stationary, the right-hand clock moving at v, with signal speeds u and −u relative to the respective clocks]
The speed, with respect to these coordinates, of the reflected signal is what distinguishes
the Galilean from the Minkowskian case. Letting x2 and t2 denote the coordinates of the
reflection event, and noting that τ1 = t1 and τ3 = t3, we have v = x2/t2 and u = x2/(t2 − τ1).
We also have, for the coordinate speed of the reflected signal,

    u − v = x2/(t3 − t2)   (Galilean)        (u − v)/(1 − uv) = x2/(t3 − t2)   (Minkowskian)

Dividing the numerator and denominator of the expression for u by t2, and replacing x2/t2
with v, gives u = v/[1 − (τ1/t2)]. Likewise the above expressions can be written as

    u = v/(1 − t2/t3)   (Galilean)        (u − v)/(1 − uv) = v/[(t3/t2) − 1]   (Minkowskian)

Solving these equations for the time ratios, we have

    t2/t1 = u/(u − v)        t3/t2 = u/(u − v)   (Galilean)        t3/t2 = u(1 − v²)/(u − v)   (Minkowskian)
Consequently, depending on whether the metric is Galilean or Minkowskian, the ratio of
t3 over t1 is given by

    t3/t1 = u²/(u − v)²        or        t3/t1 = u²(1 − v²)/(u − v)²
respectively. If u happens to be unity (meaning that the signals propagate at the speed of
light), these expressions reduce to the squares of the Galilean and relativistic Doppler
shift factors, i.e., 1/(1 − v)² and (1 + v)/(1 − v), discussed more fully in Section 2.4.
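These ratios are easy to check numerically. The following short Python sketch (our illustration, not part of the original text; the speeds u = 0.8 and v = 0.3 are arbitrary) constructs the reflection and return events directly and compares the resulting t3/t1 with the two formulas above:

    # Check of the t3/t1 ratios (units with c = 1)
    v, u = 0.3, 0.8                    # clock speed and signal speed (u > v)
    t1 = 1.0                           # emission time on the stationary left-hand clock
    t2 = u*t1/(u - v)                  # reflection event: u*(t2 - t1) = v*t2
    x2 = v*t2
    # The return signal moves at speed -u relative to the right-hand clock.
    w_gal = u - v                      # leftward coordinate speed, Galilean subtraction
    w_min = (u - v)/(1 - u*v)          # leftward coordinate speed, relativistic composition
    print((t2 + x2/w_gal)/t1, u**2/(u - v)**2)              # both 2.56
    print((t2 + x2/w_min)/t1, u**2*(1 - v**2)/(u - v)**2)   # both 2.3296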

Another distinguishing factor between the two metrics is that with the Minkowski metric
the speed of light is invariant with respect to any system of inertial coordinates, so
(arguably) we can even say that it represents the same "u" relative to a spacelike interval
as it does relative to a timelike interval, in order to adhere to our stipulation that the
reflected signal has the speed u relative to "the rest frame of the right-hand clock". Of
course, a spacelike interval cannot actually be the worldline of a clock (or any other
material object), but the invariance of the speed of light under Minkowskian
transformations enables us to rationally apply the same "geometric mean" formula to
determine the magnitudes of spacelike intervals, provided we use light-like signals, as
illustrated below.

[Figure: determining the magnitude of a spacelike interval using lightlike signals]
In this case we have τ1 = −τ3, so τ2² = −τ3², meaning that squared spacelike intervals are
negative.
2.2 Force Laws and Maxwell's Equations
While speaking of this state, I must immediately call your attention to the
curious fact that, although we never lose sight of it, we need by no means
go far in attempting to form an image of it and, in fact, we cannot say
much about it.
Hendrik Lorentz, 1909
Perhaps the most rudimentary scientific observation is that material objects exhibit a
natural tendency to move in certain circumstances. For example, objects near the surface
of the Earth tend to move in the local "downward" direction, i.e., toward the Earth's
center. The Newtonian approach to describing such tendencies was to imagine a "force
field" representing a vectorial force per unit charge that is applied to any particle at any
given point, and then to postulate that the acceleration vector of each particle equals the
applied force divided by the particle's inertial mass. Thus the "charge" of a particle
determines how strongly that particle couples with a particular kind of force field,
whereas the inertial mass determines how susceptible the particle's velocity is to arbitrary
applied forces. In the case of gravity, the coupling charge happens to be the same as the

inertial mass, denoted by m, but for electric and magnetic forces the coupling charge q
differs from m.
Since the coupling charge and the response coefficient for gravity are identical, it follows
that gravity can only operate in a single directional sense, because changing the sign of m
for a particle would reverse the sense of both the coupling and the response, leaving the
particle's overall behavior unchanged. In other words, if we considered gravitation to
apply a repulsive force to a certain particle by setting the particle's coupling charge to -m,
we would also set its inertial coefficient to -m, so the particle would still accelerate into
the applied force. Of course, the identity of the gravitational coupling and response
coefficients not only implies a unique directional sense, it implies a unique quantitative
response for all material particles, regardless of m. In contrast, the electric and magnetic
coupling charge q is separately specifiable from the inertial coefficient m, so by changing
the sign of q while leaving m constant we can represent either negative or positive
response, and by changing the ratio of q/m we can scale the quantitative response.
According to this classical picture, a small test particle with mass m and electric charge q
at a given location in space is subject to a vectorial force f given by

    f = m g + q E + q (v × B)            (1a)
where g is the gravitational field vector, E is the electric field vector, and B is the
magnetic field vector at the given location, and v is the velocity vector of the test
particle. (See Part 1 of the Appendix for a review of vector products such as the cross
product denoted by v × B.) As noted above, the acceleration vector a of the particle is
simply f/m, so we have the equation of motion

    a = g + (q/m)(E + v × B)
Given the mass, charge, and initial position of a test particle, and the vectors g,E,B for
every point in vicinity of the particle, this equation enables us to compute the particle's
subsequent motion. Notice that acceleration of a test particle due to gravity is
independent of the particle's properties and state of motion (to the first approximation),
whereas the accelerations due to the electric and magnetic fields are both proportional to
the particle's charge divided by its inertial mass. In addition, the contribution of the
magnetic field is a function of the particle's velocity. This dependence on the state of
motion has important consequences, and leads naturally to the unification of the electric
and magnetic fields, but before describing these effects it's worthwhile to briefly review
the effect of the classical gravitational field on the motion of a particle.
The gravitational acceleration field g at a point p due to a distant particle of mass m was
specified classically by Newton's law

    g = −(m/r³) r            (2)
where r is the displacement vector (of magnitude r) from the mass particle to the point p.
Noting that r² = x² + y² + z² and r = ix + jy + kz, it's straightforward to verify that the
divergence of the gravitational field g vanishes at any point p away from the mass, i.e.,
we have

    ∇·g = 0            (3)
(See Part 3 of the Appendix for a review of the differential operator notation.) The
field due to multiple mass particles is just the sum of the individual fields, so the
divergence of g due to any configuration of matter vanishes at every point in empty
space. Of course, the field is singular (infinite) at any point containing a finite amount of
mass, so we can't express the field due to a mass point precisely at the point. However, if
we postulate a continuous distribution of gravitational charge (i.e., mass), with a density
ρg specified at every point in a region, then it can be shown that the gravitational
acceleration field at every point satisfies the equation

    ∇·g = −4πρg            (4)
Incidentally, if we define the gravitational potential φ (a scalar field) due to any particle of
mass m as φ = −m/r, where r is the distance from the source particle (and noting that the
potential due to multiple particles is simply additive), it's easy to show that

    g = −∇φ
so equations (3) and (4) can be expressed equivalently in terms of the potential, in which
case they are called Laplace's equation and Poisson's equation, respectively. The
equation of motion for a test particle in the absence of any electromagnetic effects is
simply a = g, so equation (2) gives the three components

    d²x/dt² = −mx/r³        d²y/dt² = −my/r³        d²z/dt² = −mz/r³
To illustrate the use of these equations of motion, consider a circular path for our test
particle, given by

    x(t) = r sin(ωt)        y(t) = r cos(ωt)        z(t) = 0
In this case we see that r is constant and the second derivatives of x and y are −rω² sin(ωt)
and −rω² cos(ωt) respectively. The equation of motion for z is identically satisfied and the
equations for x and y both reduce to r³ω² = m, which is Kepler's third law for circular
orbits.
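As a quick sanity check (ours, not the book's), the circular path can be verified against the equations of motion by finite differences:

    import numpy as np

    # Verify d2x/dt2 = -m*x/r**3 for x = r*sin(w*t) with w chosen by Kepler's third law
    m, r = 1.0, 2.0
    w = np.sqrt(m/r**3)                # r^3 w^2 = m
    t = np.linspace(0.0, 10.0, 100001)
    dt = t[1] - t[0]
    x = r*np.sin(w*t)
    x_ddot = np.gradient(np.gradient(x, dt), dt)
    print(np.max(np.abs(x_ddot[2:-2] + m*x[2:-2]/r**3)))   # ~0, up to truncation error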
Newton's analysis of gravity into a vectorial force field and a response was spectacularly
successful in quantifying the effects of gravity, and by the beginning of the 20th century
this approach was able to account for nearly all astronomical phenomena in the solar
system within the limits of observational accuracy (the only notable exception being a
slightly anomalous precession in the orbit of the planet Mercury, as discussed in Section
6.2). Based on this success, it was natural that the other forces of nature would be
formalized in a similar way.
The next two most obvious forces that apply to material bodies are the electric and
magnetic forces, represented by the last two terms in equation (1a). If we imagine that all
of space is filled with a mist of tiny electrical charges qi with velocities vi, then we can
define the classical charge density ρe and current density j as follows

    ρe = (Σi qi)/ΔV        j = (Σi qi vi)/ΔV
where ΔV is an incremental volume of space. For the remainder of this section we will
omit the subscript "e", with the understanding that ρ signifies the electric charge density.
If we let x,y,z denote the position of the incremental quantity of charge, we can write out
the individual components of the current density as

    jx = ρ dx/dt        jy = ρ dy/dt        jz = ρ dz/dt
Maxwell's equations for the electro-magnetic fields are

    ∇·E = ρ            (5a)                ∇·B = 0            (5b)

    ∇×E = −∂B/∂t       (5c)                ∇×B = ∂E/∂t + j    (5d)
where E is the electric field and B is the magnetic field. Equations (5a) and (5b) suggest that
the electric and magnetic fields are similar to the gravitational field g, since the
divergences at each point equal the respective charge densities, with the difference being
that the electric charge density may be positive or negative, and there does not exist (as
far as we know) an isolated magnetic charge, i.e., no magnetic monopoles. Equations
(5a) and (5b) are both static equations, in the sense that they do not involve the time
parameter. By themselves they could be taken to indicate that the electric and magnetic
fields are each individually similar to Newton's conception of the gravitational field, i.e.,
instantaneous "force-at-a-distance". (On this static basis we would presumably never
have identified the magnetic field at all, assuming magnetic monopoles don't exist, and
that the universe is not subject to any boundary conditions that caused B to be non-zero.)
However, equations (5c) and (5d) reveal a completely different aspect of the E and B
fields, namely, that they are dynamically linked together, so the fields are not only
functions of each other, but their definitions explicitly involve changes in time. Recall
that the Newtonian gravitational field g was defined totally by the instantaneous spatial
condition expressed by ∇·g = −4πρg, so at any given instant the Newtonian gravitational
field is totally determined by the spatial distribution of mass in that instant, consistent
with the notion that simultaneity is absolute. In contrast, Maxwell's equations indicate
that the fields E and B depend not only on the distribution of charge at a given putative
"instant", but also on the movement of charge (i.e., the current density) and on the rates of
change of the fields themselves at that "instant".
Since these equations contain a mixture of partial derivatives of the fields E and B with
respect to the temporal as well as the spatial coordinates, dimensional consistency
requires that the effective units of space and time must have a fixed relation to each other,
assuming the units of E and B have a fixed relation. Specifically, the ratio of space units
to time units must equal the ratio of electrostatic and electromagnetic units (all with
respect to any frame of reference in which the above equations are applicable). This is
the reason we were able to write the above equations without constant coefficients,
because the fixed absolute ratio between the effective units of measure of time and space
enables us to specify all the variables x,y,z,t in the same units.
Furthermore, this fixed ratio of space to time units has an extremely important physical
significance for electromagnetic fields in empty space, where ρ and j are both zero. To
see this, take the curl of both sides of (5c), which gives

    ∇×(∇×E) = −∇×(∂B/∂t)
Now, for any arbitrary vector S it's easy to verify the identity

    ∇×(∇×S) = ∇(∇·S) − ∇²S
Therefore, we can apply this to the left hand side of the preceding equation, and noting
that ∇·E = 0 in empty space, we are left with

    ∇²E = ∇×(∂B/∂t)
Also, recall that the order of partial differentiation with respect to two parameters doesn't
matter, so we can re-write the right-hand side of the above expression as

    ∇²E = ∂(∇×B)/∂t
Finally, since (5d) gives ∇×B = ∂E/∂t in empty space, the above equation becomes

    ∂²E/∂t² = ∇²E            (6a)
Similarly we can show that

    ∂²B/∂t² = ∇²B            (6b)
Equations (6a) and (6b) are just the classical wave equation, which implies that
electromagnetic changes propagate through empty space at a speed of 1 when using
consistent units of space and time. In terms of conventional units this must equal the
ratio of the electrostatic and electromagnetic units, which gives the speed

    c = 1/(μ0 ε0)^1/2            (7)
where μ0 and ε0 are the permeability and permittivity of the vacuum. To some extent our
choice of units is arbitrary, and in fact we conventionally define our units so that the
permeability constant has the value

    μ0 = 4π × 10⁻⁷ (kilogram·meter)/(ampere²·second²)
Since force has units of kg·m/sec² and charge has units of amp·sec, these conventions
determine our units of force and charge, as well as distance, so we can then (theoretically)
use Coulomb's law F = q1q2/(4πε0r²) to determine the permittivity constant by measuring
the static force that exists between known electric charges at a certain distance. The best
experimental value is

    ε0 = 8.854187818 × 10⁻¹² (ampere²·second⁴)/(kilogram·meter³)
Substituting these values into equation (7) gives

    c = 2.997924579935 × 10⁸ meter/second
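The arithmetic can be reproduced in a few lines of Python (a sketch of ours, using the conventional constants quoted above):

    from math import pi, sqrt

    mu0 = 4*pi*1e-7               # permeability, kg·m/(amp²·sec²)
    eps0 = 8.854187818e-12        # permittivity, amp²·sec⁴/(kg·m³)
    print(1/sqrt(mu0*eps0))       # equation (7): about 2.9979 × 10⁸ m/sec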

This constant of proportionality between the units of space and time is based entirely on
electrostatic and electromagnetic measurements, and it follows from Maxwell's equations
that electromagnetic waves propagate at the speed c in a vacuum. In Section 3.3 we
review the history of attempts to measure the speed of light (which of course for most of
human history was not known to be an electromagnetic phenomenon), but suffice it to say
here that the best measured value for the speed of light is 299792457.4 m/sec, which
agrees with Maxwell's predicted propagation speed for electromagnetic waves to nine
significant digits.
This was Maxwell's greatest triumph, showing that electromagnetic waves propagate at
the speed of light, from which we infer that light itself consists of electromagnetic waves,
thereby unifying optics and electromagnetism. However, this magnificent result also
presented Maxwell, and other physicists of the late 19th century, with a puzzle that would
baffle them for decades. Equation (7) implies that, assuming the permittivity and
permeability of the vacuum are the same when evaluated at rest with respect to any
inertial frame of reference, in accord with the classical principle of relativity, and
assuming Maxwell's equations are strictly valid in all inertial frames of reference, then it
follows that the speed of light must be independent of the frame of reference. This agrees
with the Galilean principle of relativity, but flatly violates the Galilean transformation
rules, because it does not yield simply additive composition of speeds.
This was the conflict that vexed the young Einstein (age 16) when he was attending "prep
school" in Aarau, Switzerland in 1895, preparing to re-take the entrance examination at
the Zurich Polytechnic. Although he was deficient in the cultural subjects, he already
knew enough mathematics and physics to realize that Maxwell's equations don't support
the existence of a free wave at any speed other than c, which should be a fixed constant
of nature according to the classical principle of relativity. But to admit an invariant speed
seemed impossible to reconcile with the classical transformation rules.
Writing out equations (5d) and (5a) explicitly, we have four partial differential equations

    ∂Bz/∂y − ∂By/∂z − ∂Ex/∂t = jx
    ∂Bx/∂z − ∂Bz/∂x − ∂Ey/∂t = jy
    ∂By/∂x − ∂Bx/∂y − ∂Ez/∂t = jz
    ∂Ex/∂x + ∂Ey/∂y + ∂Ez/∂z = ρ
The above equations strongly suggest that the three components of the current density j
and the charge density ρ ought to be combined into a single four-vector, such that each
component is the incremental charge per volume multiplied by the respective component
of the four-velocity of the charge, as shown below

    jx = ρ dx/dτ        jy = ρ dy/dτ        jz = ρ dz/dτ        jt = ρ dt/dτ

where the parameter τ is the proper time of the charge's rest frame. If the charge is
stationary with respect to these x,y,z,t coordinates, then obviously the current density
components vanish, and jt is simply our original charge density ρ. On the other hand, if
the charge is moving with respect to the x,y,z,t coordinates, we acquire a non-vanishing
current density, and we find that the charge density is modified by the ratio dt/dτ.
However, it's worth noting that the incremental volume elements with respect to a
moving frame of reference are also modified by the same Lorentz transformation, which
ensures that the electrical charge on a physical object is invariant for all frames of
reference.
We can also see from the four differential equations above that if the arguments of the
partial derivatives on the left-hand side are arranged according to their denominators,
they constitute a perfect anti-symmetric matrix

    P = [  0    Bz  −By  −Ex ]
        [ −Bz   0    Bx  −Ey ]
        [  By  −Bx   0   −Ez ]
        [  Ex   Ey   Ez   0  ]
If we let x1,x2,x3,x4 denote the coordinates x,y,z,t respectively, then equations (5a) and
(5d) can be combined and expressed in the form

    Σν ∂Pμν/∂xν = jμ            (8a)

where (j1, j2, j3, j4) = (jx, jy, jz, ρ).
In exactly the same way we can combine equations (5b) and (5c) and express them in the
form

    Σν ∂Qμν/∂xν = 0            (8b)
where the matrix Q is an anti-symmetric matrix defined by

    Q = [  0   −Ez   Ey  −Bx ]
        [  Ez   0   −Ex  −By ]
        [ −Ey   Ex   0   −Bz ]
        [  Bx   By   Bz   0  ]
Returning again to equation (1a), we see that in the absence of a gravitational field the
force on a particle with q = m = 1 and velocity v at a point in space where the electric
and magnetic field vectors are E and B is given by

    f = E + v × B
In component form this can be written as

    fx = Ex + vy Bz − vz By        fy = Ey + vz Bx − vx Bz        fz = Ez + vx By − vy Bx
Consequently the components of the acceleration are

    d²x/dt² = Ex + vy Bz − vz By        d²y/dt² = Ey + vz Bx − vx Bz        d²z/dt² = Ez + vx By − vy Bx
To simplify the expressions, suppose the velocity of the particle with respect to the
original x,y,z,t coordinates is purely in the positive x direction, i.e., we have vy = vz = 0
and vx = v. Then the force on the particle has the components

    fx = Ex        fy = Ey − v Bz        fz = Ez + v By
Now consider the same physical situation, but with respect to a system of inertial
coordinates x',y',z',t' in terms of which the particle's velocity is zero. To the first
approximation we expect that the components of force are the same when evaluated with
respect to the primed coordinate system, and in fact by symmetry it's clear that fx' = fx.
However, for the components perpendicular to the velocity, the symmetry of the situation
allows us to say only that (for any fixed speed v) fy' = k fy and fz' = k fz, where k is a constant
that approaches 1 for small v. Hence the components of the electric field with respect to
the primed and unprimed coordinate systems are related according to

    Ey' = k (Ey − v Bz)        Ez' = k (Ez + v By)
By symmetry we can also write down the reciprocal transformation, replacing v with -v,
which gives

    Ey = k (Ey' + v Bz')        Ez = k (Ez' − v By')
Notice that we've used the same factor k for both transformations, because to the first
order we know k(v) is simply 1, suggesting that the dependence of k on v is of the second
order, which makes it likely that k(v) is an even function, i.e., we assume k(v) = k(-v).
Substituting the expression for Ey' into the expression for Ey and solving the resulting
equation for Bz' gives

    Bz' = k ( Bz − [(k² − 1)/(k²v²)] v Ey )
By the same token, substituting the expression for Ez' into the expression for Ez and
solving for By' gives

    By' = k ( By + [(k² − 1)/(k²v²)] v Ez )
These last two expressions should look familiar, because they are formally identical to
the expression for the transformed time coordinate developed in Section 1.7. Letting
κ(v) denote the quantity in square brackets for any given v, the general transformation
equations for the electric and magnetic field components perpendicular to the velocity are

    Ey' = k (Ey − v Bz)        Ez' = k (Ez + v By)
    By' = k (By + κ v Ez)      Bz' = k (Bz − κ v Ey)
Comparing these equations with equation (1) in Section 1.7, it should come as no surprise
that the actual transformations for the components of the electric and magnetic field are
given by setting κ(v) = 1, which implies k = 1/(1 − v²)^1/2. Consequently we have the invariants

    Ey'² − Bz'² = Ey² − Bz²        Ez'² − By'² = Ez² − By²
Naturally we expect the field components parallel to the velocity to exhibit the
corresponding invariance, i.e., we expect that

    Ex'² − Bx'² = Ex² − Bx²
from which we infer the final transformation equation Bx' = Bx. So, the complete set of
transformation equations for the electric and magnetic field components from one system
of inertial coordinates to another (with a relative velocity v in the positive x direction) is

    Ex' = Ex                              Bx' = Bx
    Ey' = (Ey − v Bz)/(1 − v²)^1/2        By' = (By + v Ez)/(1 − v²)^1/2
    Ez' = (Ez + v By)/(1 − v²)^1/2        Bz' = (Bz − v Ey)/(1 − v²)^1/2
Just as the Lorentz transformation for space and time intervals shows that those intervals
are the components of a unified space-time interval, these transformation equations show
that the electric and magnetic fields are components of a unified electro-magnetic field.

The decomposition of the electromagnetic field into electric and magnetic components
depends on the frame of reference. From the invariants noted above we see that, letting
E² and B² denote the squared magnitudes of the electric and magnetic field vectors at a given
point, the quantity E² − B² is invariant (as is the dot product E·B), analogous to the
invariant X² − T² for spacetime intervals. The combined electromagnetic field can be
represented by the matrix P defined previously, which transforms as a tensor of rank 2
under Lorentz transformations. So too does the matrix Q, and since Maxwell's equations
can be expressed in terms of P and Q (as shown by equations (8a) and (8b)), we see that
Maxwell's equations are invariant under Lorentz transformations.
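The claimed invariance is easy to confirm numerically. This Python sketch (ours, with randomly chosen field components and an arbitrary boost speed) applies the transformation equations above:

    import numpy as np

    rng = np.random.default_rng(0)
    E = rng.normal(size=3)                    # Ex, Ey, Ez in the unprimed frame
    B = rng.normal(size=3)                    # Bx, By, Bz in the unprimed frame
    v = 0.6                                   # boost speed along x (units with c = 1)
    k = 1/np.sqrt(1 - v**2)
    Ep = np.array([E[0], k*(E[1] - v*B[2]), k*(E[2] + v*B[1])])
    Bp = np.array([B[0], k*(B[1] + v*E[2]), k*(B[2] - v*E[1])])
    print(E @ E - B @ B, Ep @ Ep - Bp @ Bp)   # E² − B² unchanged
    print(E @ B, Ep @ Bp)                     # E·B unchanged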
2.3 The Inertia of Energy
Please reveal who you are of such fearsome form... I wish to clearly know
you, the primeval being, because I cannot fathom your intention. Lord
Krsna said: I am terrible Time, destroyer of all beings in all worlds, here
to destroy this world. Of those heroic soldiers now arrayed in the
opposing army, even without you, none will be spared.
Bhagavad Gita
One of the first and most famous examples of the heuristic power of Einstein's relativistic
interpretation of space and time was the suggestion that energy and inertial mass are, in a
fundamental sense, equivalent. The word "suggestion" is used advisedly, because mass-energy equivalence is not a logically necessary consequence of special relativity (as
explained below). In fact, when combined with the gravitational equivalence principle, it
turns out that mass-energy equivalence is technically incompatible with special relativity.
Indeed this was one of Einstein's main motivations for developing the general theory.
Nevertheless, by showing that the kinematics of phenomena can best be described in
terms of a unified four-dimensional continuum, with time as a fourth coordinate, distinct
from the path parameter, the special theory did clearly suggest that energy be regarded as
the fourth (viz., time-like) component of momentum, and hence that all energy has inertia
and all inertia represents energy.
It should also be mentioned that some kind of equivalence between mass and energy had
long been recognized by physicists, even prior to 1905. Indeed Maxwell's equations
already imply that the energy of an electromagnetic wave carries momentum, and
Poincare had noted that if Galilean relativity was applied to electrodynamics, the
equivalence of mass and energy follows. Lorentz had attempted to describe the mass of
an electron as a manifestation of electromagnetic energy. (It's interesting that while some
people were trying to "explain" electromagnetism as a disturbance in a material medium,
others were trying to explain material substances as manifestations of electromagnetism!)
However, the fact that mass-energy equivalence emerges so naturally from Einstein's
kinematics, applicable to all kinds of mass and energy (not just electrons and
electromagnetism), was mainly responsible for the recognition of this equivalence as a
general and fundamental aspect of nature. We'll first give a brief verbal explanation of
how this equivalence emerges from Einstein's kinematics, and then follow with a
quantitative description.
The basic principle of special relativity is that inertial measures of spatial and temporal
intervals are such that the velocity of light with respect to those measures is invariant. It
follows that relative velocities are not transitively additive from one reference frame to
another, and, as a result, the acceleration of an object with respect to one inertial frame
must differ from its acceleration with respect to another inertial frame. However, by
symmetry, an impact force exerted by two objects (in one spatial dimension) upon one
another is equal and opposite, regardless of their relative velocity. These simple
considerations lead directly to the idea that inertia (as quantified by mass) is an attribute
of energy.
Given an object O of mass m, initially at rest, we apply a force F to the object, giving it
an acceleration of F/m. After a while the object has achieved some velocity v, and we
continue to apply the constant force F. But now imagine another inertial observer, this
one momentarily co-moving with the object at this instant with a velocity v. This other
observer sees a stationary object O of mass m subject to a force F, so, on the assumption
that the laws of physics are the same in all inertial frames, we know that he will see the
object respond with an acceleration of F/m (just as we did). However, due to nonadditivity of velocities, the acceleration with respect to our measures of time and space
must now be different. Thus, even though we're still applying a force F to the object, its
acceleration (relative to our frame) is no longer equal to F/m. In fact, it must be less, and
this acceleration must go to zero as v approaches the speed of light. Hence the effective
inertia of the object in the direction of its motion increases. During this experiment we
can also integrate the force we exerted over the distance traveled by the object, and
determine the amount of work (energy) that we imparted to the object in bringing it to the
velocity v. With a little algebra we can show that the ratio of the amount of energy we put
into the object to the amount by which the object's inertia (units of mass) increased is
exactly c².
To show this quantitatively, suppose the origin of a system of inertial coordinates K0 is
moving with speed u0 relative to another system of inertial coordinates K. If a particle P is
moving with speed u (in the same direction as u0) with respect to the K0 coordinates, then
the speed of the particle relative to the K coordinates is given by the velocity composition
law

    v = (u + u0)/(1 + u u0)
Differentiating with respect to u gives

    dv/du = (1 − u0²)/(1 + u u0)²
Hence, at the instant when P is momentarily co-moving with the K0 coordinates, we have

    dv/du = 1 − u0² = 1 − v²
If we let τ and t denote the time coordinates of K0 and K respectively, then from the
metric (dτ)² = (dt)² − (dx)² and the fact that v² = (dx/dt)² it follows that the incremental
lapse of proper time dτ along the worldline of P as it advances from t to t + dt is
dτ = (1 − v²)^1/2 dt, so we can divide the above expression by this quantity to give

    dv/dt = (1 − v²)^3/2 du/dτ
The quantity a = dv/dt is the acceleration of P with respect to the K coordinates, whereas
a0 = du/dτ is the rest acceleration of P with respect to the K0 coordinates (relative to
which it is momentarily at rest). Now, by symmetry, a force F exerted (along the axis of
motion) between a particle at rest in K on the particle P at rest in K0 must be of equal and
opposite magnitude with respect to both frames of reference. Also, by definition, a force
of magnitude F applied to a particle of rest mass m0 will result in an acceleration a0 =
F/m0 with respect to the reference frame in which the particle is momentarily at rest.
Therefore, using the preceding relation between the accelerations with respect to the K0
and K coordinates, we have

    F = m0 a0 = [m0/(1 − v²)^3/2] a            (1)
The coefficient of a in this expression has sometimes been called the longitudinal
mass, because it represents the effective proportionality between force and acceleration
along the direction of action. Now let us define two quantities, p(v) and e(v), which we
will call the momentum and kinetic energy of a particle of mass m0 at any relative speed
v. These quantities are defined respectively by the integrals of Fdt and Fds over an
interval in which the particle is accelerated by a force F from rest to velocity v. The
results of these integrations are independent of the pattern of acceleration, so we can
assume constant acceleration a throughout the interval. Hence the integral of Fdt is
evaluated from t = 0 to t = v/a, and since s = (1/2)at², the integral of Fds is evaluated from
s = 0 to s = v²/(2a). In addition, we will define the inertial mass m of the particle as the
ratio p/v. Therefore, the inertial mass and the kinetic energy of the particle at any speed v
are given by

    m(v) = p/v = m0/(1 − v²)^1/2        e(v) = m0 [1/(1 − v²)^1/2 − 1]
If the force F were equal to m0a (as in Newtonian mechanics) these two quantities would
equal m0 and (1/2)m0v² respectively. However, we've seen that consistency with
relativistic kinematics requires the force to be given by equation (1). As a result, the
inertial mass is given by m = m0/(1 − v²)^1/2, so it exceeds the rest mass whenever the
particle has non-zero velocity. This increase in inertial mass is exactly proportional to the
kinetic energy of the particle, as shown by

    e(v) = [m(v) − m0] c²
The exact proportionality between the extra inertia and the extra energy of a moving
particle naturally suggests that it is the energy itself which has contributed the inertia, and
this in turn suggests that all of the particles inertia (including its rest inertia m0)
corresponds to some form of energy. This leads us to hypothesize a very general and
important relation, E = mc², which signifies a fundamental equivalence between energy
and inertial mass. From this we might imagine that all inertia is potentially convertible to
energy, although it's worth noting that this does not follow rigorously from the principles
of special relativity. It is just a hypothesis suggested by special relativity (as it is also
suggested by Maxwell's equations). In 1905 the only experimental test that Einstein could
imagine was to see if a lump of "radium salt" loses weight as it gives off radiation, but of
course that would never be a complete test, because the radium doesn't decay down to
nothing. The same is true with a nuclear bomb, i.e., it's really only the binding energy of
the nucleus that is being converted, so it doesn't demonstrate an entire proton (for
example) being converted into energy. However, today we can observe electrons and
positrons annihilating each other completely, and yielding amounts of energy precisely in
accord with the predictions of special relativity.
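The two integrals defined above can also be evaluated numerically. This Python sketch (ours; the final speed 0.9 is arbitrary) accumulates the integrals of Fdt and Fds for the force law of equation (1) and confirms that the gained inertia equals the gained energy in units with c = 1:

    import numpy as np

    m0, a = 1.0, 1e-3                     # rest mass and constant coordinate acceleration
    v_final = 0.9
    t = np.linspace(0.0, v_final/a, 200001)
    dt = t[1] - t[0]
    v = a*t
    F = m0*a/(1 - v**2)**1.5              # equation (1)
    p = np.sum(0.5*(F[1:] + F[:-1]))*dt               # integral of F dt
    e = np.sum(0.5*(F[1:]*v[1:] + F[:-1]*v[:-1]))*dt  # integral of F ds = F v dt
    g = 1/np.sqrt(1 - v_final**2)
    print(p, m0*v_final*g)                # momentum m0 v/(1 − v²)^1/2
    print(e, m0*(g - 1))                  # kinetic energy m0 [1/(1 − v²)^1/2 − 1]
    print((p/v_final - m0)/e)             # mass gained per unit energy: 1, i.e. 1/c²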
Incidentally, the above derivation followed Newton in adopting the Third Law (at least
for impulse interactions along the line of motion) as a fundamental postulate, on the basis
of symmetry. From this the conservation of momentum can be deduced. However, most
modern treatments of relativity proceed in the opposite direction, postulating the
conservation of momentum and then deducing something like the Third Law. (There are
complications when applying the Third Law to extended interactions, and to interactions
in which the forces are not parallel to the direction of motion, due to aberration effects
and the ambiguity of simultaneity relations, but the preceding derivation was based solely
on interactions that can be modeled as mutual contact events at single points, with the
forces parallel to the direction of motion, in which case the Third Law is unproblematic.)
The typical modern approach to relativistic mechanics is to begin by defining momentum
as the product of rest mass and velocity. One formal motivation for this definition is that
the resulting 3-vector is well-behaved under Lorentz transformations, in the sense that if
this quantity is conserved with respect to one inertial frame, it is automatically conserved
with respect to all inertial frames (which would not be true if we defined momentum in
terms of, say, longitudinal mass). On a more fundamental level, this definition is
motivated by the fact that it agrees with non-relativistic momentum in the limit of low
velocities. The heuristic technique of deducing the appropriate observable parameters of a
theory from the requirement that they match classical observables in the classical limit
was used extensively in early development of relativity, but apparently no one dignified
the technique with a name until Bohr (characteristically) elevated it to the status of a
"principle" in quantum mechanics, where it is known as the "Correspondence Principle".
Based on this definition, the modern approach then simply postulates that momentum is
conserved. Then we define relativistic force as the rate of change of momentum. This is
Newton's Second Law, and it's motivated largely by the fact that this "force", together
with conservation of momentum, implies Newton's Third Law (at least in the case of
contact forces).
However, from a purely relativistic standpoint, the definition of momentum as a 3-vector
seems incomplete. Its three components are proportional to the derivatives of the three
spatial coordinates x,y,z of the object with respect to the proper time of the object, but
what about the coordinate time t? If we let xj, j = 0, 1, 2, 3 denote the coordinates t,x,y,z,
then it seems natural to consider the 4-vector

    pj = m (dxj/dτ)        j = 0, 1, 2, 3
where m is the rest mass. Then define the relativistic force 4-vector as the proper rate of
change of momentum, i.e.,

    Fj = dpj/dτ
Our correspondence principle easily enables us to identify the three components p1, p2, p3
as just our original momentum 3-vector, but now we have an additional component, p0,
equal to m(dt/dτ). Let's call this component the "energy" E of the object. In full four-dimensional spacetime the coordinate time t is related to the object's proper time according
to

    dt/dτ = 1/(1 − [(dx/dt)² + (dy/dt)² + (dz/dt)²]/c²)^1/2
In geometric units (c = 1) the quantity in the square brackets is just v². Substituting back
into our energy definition, we have

    E = m (dt/dτ) = m/(1 − v²)^1/2 = m + (1/2) m v² + (3/8) m v⁴ + …            (2)
The first term is simply m (or mc² in normal units), so we interpret this as the rest energy
of the mass. This is sometimes presented as a derivation of mass-energy equivalence, but
at best it's really just a suggestive heuristic device. The key step in this "derivation" was
when we blithely decided to call p0 the "energy" of the object. Strictly speaking, we
violated our "correspondence principle" by making this definition, because by
correspondence with the low-velocity limit, the energy E of a particle should be
something like (1/2)mv², and clearly p0 does not reduce to this in the low-speed limit.
Nevertheless, we defined p0 as the "energy" E, and since that component equals m when v
= 0, we essentially just defined our result E = m (or E = mc2 in ordinary units) for a mass
at rest. From this reasoning it isn't clear that this is anything more than a bookkeeping
convention, one that could just as well be applied in classical mechanics using some
arbitrary squared velocity to convert from units of mass to units of energy. The assertion
of physical equivalence between inertial mass and energy has significance only if it is
actually possible for the entire mass of an object, including its rest mass, to manifestly
exhibit the qualities of energy. Lacking this, the only equivalence between inertial mass
and energy that special relativity strictly entails is the "extra" inertia that bodies exhibit
when they acquire kinetic energy.
As mentioned above, even the fact that nuclear reactors give off huge amounts of energy
does not really substantiate the complete equivalence of energy and inertial mass, because
the energy given off in such reactions represents just the binding energy holding the
nucleons (protons and neutrons) together. The binding energy is the amount of energy
required to pull a nucleus apart. (The terminology is slightly inapt, because a configuration
with high binding energy is actually a low energy configuration, and vice versa.) Of
course, protons are all positively charged, so they repel each other by the Coulomb force,
but at very small distances the strong nuclear force binds them together. Since each
nucleon is attracted to every other nucleon, we might expect the total binding energy of a
nucleus comprised of N nucleons to be proportional to N(N-1)/2, which would imply that
the binding energy per nucleon would increase linearly with N. However, saturation
effects cause the binding energy per nucleon to reach a maximum for nuclei with N ≈ 60
(e.g., iron), then to decrease slightly as N increases further. As a result, if an atom with
(say) N = 230 is split into two atoms, each with N=115, the total binding energy per
nucleon is increased, which means the resulting configuration is in a lower energy state
than the original configuration. In such circumstances, the two small atoms have slightly
less total rest mass than the original large atom, but at the instant of the split the overall
"mass-like" quality is conserved, because those two smaller atoms have enormous

velocities, precisely such that the total relativistic mass is conserved. (This physical
conservation is the main reason the old concept of relativistic mass has never been
completely discarded.) If we then slow down those two smaller atoms by absorbing their
energy, we end up with two atoms at rest, at which point a little bit of apparent rest mass
has disappeared from the universe. On the other hand, it is also possible to fuse two light
nuclei (e.g., N = 2) together to give a larger atom with more binding energy, in which
case the rest mass of the resulting atom is less than the combined rest masses of the two
original atoms. In either case (fission or fusion), a net reduction in rest mass occurs,
accompanied by the appearance of an equivalent amount of kinetic energy and radiation.
(The actual detailed mechanism by which binding energy, originally a "rest property"
with isotropic inertia, becomes a kinetic property representing what we may call
relativistic mass with anisotropic inertia, is not well understood.)
Another derivation of mass-energy equivalence is based on consideration of a bound
"swarm" of particles, buzzing around with some average velocity. If the swarm is heated
(i.e., energy E is added) the particles move faster and thereby gain both longitudinal and
transverse mass, so the inertia of the individual particles is anisotropic, but since they are
all buzzing around in random directions, the net effect on the stationary swarm (bound
together by some unspecified means) is that its resistance to acceleration is isotropic, and
its "rest mass" has effectively been increased by E/c2. Of course, such a composite object
still consists of elementary particles with some irreducible rest mass, so even this picture
doesn't imply complete mass-energy equivalence.
To get complete equivalence we need to imagine something like photons bound together
in a swarm. Now, it may appear that equation (2) fails to account for the energy of light,
because it gives E proportional to the rest mass m, which is zero for a photon. However,
the denominator of (2) is also zero for a photon (because v = 1), so we need to evaluate
the expression in the limit as m goes to zero and v goes to 1. We know from the study of
electro-magnetic radiation that although a photon has no rest mass, it does (according to
Maxwell's equations) have momentum, equal to |p| = E (or E/c in conventional units).
This suggests that we try to isolate the momentum component from the rest mass
component of the energy. To do this, we square equation (2) and expand the simple
geometric series as follows

    E² = m²/(1 − v²) = m² (1 + v² + v⁴ + v⁶ + …)
Excluding the first term, which is purely rest mass, all the remaining terms are divisible
by (mv)², so we can write this as

    E² = m² + (mv)² (1 + v² + v⁴ + …) = m² + (mv)²/(1 − v²)
The right-most term is simply the squared magnitude of the momentum, so we have the
apparently fundamental relation

    E² = m² + |p|²            (3)
consistent with our premise that E (or E/c in conventional units) equals the magnitude
of the momentum |p| for a photon. Of course, electromagnetic waves are classically
regarded as linear, meaning that photons don't ordinarily interfere with each other
(directly). As Dirac said, "each photon interferes only with itself... interference between
two different photons never occurs". However, the non-linear field equations of general
relativity enable photons to interact gravitationally with each other. Wheeler coined the
word "geon" to denote a swarm of massless particles bound together by the gravitational
field associated with their energy, although he noted that such a configuration would be
inherently unstable, viz., it would very rapidly either dissipate or shrink into complete
gravitational collapse. Also, it's not clear that any physically realistic situation would lead
to such a configuration in the first place, since it would require concentrating an amount
of electromagnetic energy equivalent to the mass m within a radius of about r = Gm/c².
For example, to make a geon from the energy equivalent of one electron, it would be
necessary to concentrate that energy within a radius of about 6.7 × 10⁻⁵⁸ meters.
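For the record, the quoted radius follows directly from the constants (our arithmetic, with rounded values):

    G = 6.674e-11        # gravitational constant, m³/(kg·s²)
    m = 9.109e-31        # electron mass, kg
    c = 2.998e8          # speed of light, m/s
    print(G*m/c**2)      # about 6.8e-58 meters, matching the figure quoted above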
An interesting alternative approach to deducing (3) is based directly on the Minkowski
metric

    (dτ)² = (dt)² − (dx)² − (dy)² − (dz)²
This is applicable both to massive timelike particles and to light. In the case of light we
know that the proper time dτ and the rest mass m are both zero, but we may postulate that
the ratio m/dτ remains meaningful even when m and dτ individually vanish. Multiplying
both sides of the Minkowski line element by the square of this ratio gives immediately

    m² = (m dt/dτ)² − (m dx/dτ)² − (m dy/dτ)² − (m dz/dτ)²
The first term on the right side is E² and the remaining three terms are px², py², and pz², so
this equation can be written as

    m² = E² − px² − py² − pz²
Hence this expression is nothing but the Minkowski spacetime metric multiplied through
by (m/dτ)², as illustrated in the figure below.

[Figure: the energy-momentum vector along a worldline, with components E, px, py, pz and invariant norm m]
The kinetic energy of the particle with rest mass m along the indicated worldline is
represented in this figure by the portion of the total energy E in excess of the rest energy.
Returning to the question of how mass and energy can be regarded as different
expressions of the same thing, recall that the energy of a particle with rest mass m0 and
speed V is m0/(1 − V²)^1/2. We can also determine the energy of a particle whose motion is
defined as the composition of two orthogonal speeds. Let t,x,y,z denote the inertial
coordinates of system S, and let T,X,Y,Z denote the (aligned) inertial coordinates of
system S'. In S the particle is moving with speed vy in the positive y direction so its
coordinates are

    x = 0        y = vy t        z = 0
The Lorentz transformation for a coordinate system S' whose spatial origin is moving
with the speed vx in the positive x (and X) direction with respect to system S is

    T = (t − vx x)/(1 − vx²)^1/2        X = (x − vx t)/(1 − vx²)^1/2        Y = y        Z = z
so the coordinates of the particle with respect to the S' system are

    T = t/(1 − vx²)^1/2        X = −vx t/(1 − vx²)^1/2        Y = vy t
The first of these equations implies t = T(1 − vx²)^1/2, so we can substitute for t in the
expressions for X and Y to give

    X = −vx T        Y = vy (1 − vx²)^1/2 T
The total squared speed V² with respect to these coordinates is given by

    V² = (X/T)² + (Y/T)² = vx² + vy² (1 − vx²)
Subtracting 1 from both sides and factoring the right hand side, this relativistic
composition rule for orthogonal speeds vx and vy can be written in the form

    1 − V² = (1 − vx²)(1 − vy²)
It follows that the total energy (neglecting stress and other forms of potential energy) of a
ring of matter with a rest mass m0 spinning with an intrinsic circumferential speed u and
translating with a speed v in the axial direction is

    E = m0/[(1 − u²)^1/2 (1 − v²)^1/2]
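The factorization can be checked numerically; this sketch of ours composes an arbitrary pair of orthogonal speeds:

    import numpy as np

    vx, vy = 0.5, 0.7                      # arbitrary orthogonal speeds (c = 1)
    Vx = vx                                # from X = −vx T (the sign is irrelevant here)
    Vy = vy*np.sqrt(1 - vx**2)             # from Y = vy (1 − vx²)^1/2 T
    V2 = Vx**2 + Vy**2
    print(1 - V2, (1 - vx**2)*(1 - vy**2))    # equal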
A similar argument applies to translatory motions of the ring in any direction, not just the
axial direction. For example, consider motions in the plane of the ring, and focus on the
contributions of two diametrically opposed particles (each of rest mass m0/2) on the ring,
as illustrated below.

[Figure: a translating ring, with two diametrically opposed particles whose circumferential motion is perpendicular (left) or parallel (right) to the translation]
If the circumferential motion of the two particles happens to be perpendicular to the
translatory motion of the ring, as shown in the left-hand figure, then the preceding
formula for E is applicable, and represents the total energy of the two particles. If, on the
other hand, the circumferential motion of the two particles is parallel to the motion of the
ring's center, as shown in the right-hand figure, then the two particles have the speeds
(v+u)/(1+vu) and (v−u)/(1−vu) respectively, so the combined total energy (i.e., the
relativistic mass) of the two particles is given by the sum

    E = (m0/2)/[1 − ((v+u)/(1+vu))²]^1/2 + (m0/2)/[1 − ((v−u)/(1−vu))²]^1/2 = m0/[(1 − u²)^1/2 (1 − v²)^1/2]
Thus each pair of diametrically opposed particles with equal and opposite intrinsic
motions parallel to the extrinsic translatory motion contribute the same total amount of
energy as if their intrinsic motions were both perpendicular to the extrinsic motion. Every
bound system of particles can be decomposed into pairs of particles with equal and
opposite intrinsic motions, and these motions are either parallel or perpendicular or some
combination relative to the extrinsic motion of the system, so the preceding analysis
shows that the relativistic mass of the bound system of particles is isotropic, and the
system behaves just like an object whose rest mass equals the sum of the intrinsic
relativistic masses of the constituent particles. (Note again that we are not considering
internal stresses and other kinds of potential energy.)
This nicely illustrates how, if the spinning ring was mounted inside a box, we would
simply regard the angular kinetic energy of the ring as part of the rest mass M0 of the box
with speed v, i.e.,

    E = M0/(1 − v²)^1/2        where M0 = m0/(1 − u²)^1/2
where the "rest mass" of the box is now explicitly dependent on its energy content. This
naturally leads to the idea that each original particle might also be regarded as a "box"
whose contents are in an excited energy state via some kinetic mode (possibly rotational),
and so the "rest mass" m0 of the particle is actually just the relativistic mass of a lesser
amount of "true" rest mass, leading to an infinite regress, and the idea that perhaps all
matter is really some form of energy.
But does it really make sense to imagine that all the mass (i.e., inertial resistance) is
really just energy, and that there is no irreducible rest mass at all? If there is no original
kernel of irreducible matter, then what ultimately possesses the energy? To picture how
an aggregate of massless energy can have non-zero rest mass, first consider two identical
massive particles connected by a massless spring, as illustrated below.

[Figure: two identical masses joined by a massless spring, oscillating about their common center of mass]
Suppose these particles are oscillating in a simple harmonic motion about their common
center of mass, alternately expanding and compressing the spring. The total energy of the
system is conserved, but part of the energy oscillates between kinetic energy of the
moving particles and potential (stress) energy of the spring. At the point in the cycle
when the spring has no tension, the speed of the particles (relative to their common center
of mass) is a maximum. At this point the particles have equal and opposite speeds +u and
-u, and we've seen that the combined rest mass of this configuration (corresponding to the
amount of energy required to accelerate it to a given speed v) is m0/(1 − u²)^1/2. At other
points in the cycle, the particles are at rest with respect to their common center of mass,
but the total amount of energy in the system with respect to any given inertial frame is
constant, so the effective rest mass of the configuration is constant over the entire cycle.
Since the combined rest mass of the two particles themselves (at this point in the cycle) is
just m0, the additional rest mass to bring the total configuration up to m0/(1 − u²)^1/2 must be
contributed by the stress energy stored in the "massless" spring. This is one example of a
massless entity acquiring rest mass by virtue of its stored energy.
Recall that the energy-momentum vector of a particle is defined as [E, px, py, pz] where E
is the total energy and px, py, pz are the components of the momentum, all with respect to
some fixed system of inertial coordinates t,x,y,z. The rest mass m0 of the particle is then
defined as the Minkowskian "norm" of the energy-momentum vector, i.e.,

    m0² = E² − px² − py² − pz²
If the particle has rest mass m0, then the components of its energy-momentum vector are

    [m0 dt/dτ,  m0 dx/dτ,  m0 dy/dτ,  m0 dz/dτ]
If the object is moving with speed u, then dt/dτ = γ = 1/(1 − u²)^1/2, so the energy component
is equal to the transverse relativistic mass. The rest mass of a configuration of arbitrarily
moving particles is simply the norm of the sum of their individual energy-momentum
vectors. The energy-momentum vectors of two particles with individual rest masses m0
moving with speeds dx/dt = u and dx/dt = −u are [γm0, γm0u, 0, 0] and [γm0, −γm0u, 0,
0], so the sum is [2γm0, 0, 0, 0], which has the norm 2γm0. This is consistent with the
previous result, i.e., the rest mass of two particles in equal and opposite motion about the
center of the configuration is simply the sum of their (transverse) relativistic masses, i.e.,
the sum of their energies.
A photon has no rest mass, which implies that the Minkowskian norm of its energy-momentum vector is zero. However, it does not follow that the components of its energy-momentum vector are all zero, because the Minkowskian norm is not positive-definite.
For a photon we have E² − px² − py² − pz² = 0 (where E = hν), so the energy-momentum
vectors of two photons, one moving in the positive x direction and the other moving in
the negative x direction, are of the form [E, E, 0, 0] and [E, −E, 0, 0] respectively. The
Minkowski norms of each of these vectors individually are zero, but the sum of these two
vectors is [2E, 0, 0, 0], which has a Minkowski norm of 2E. This shows that the rest mass
of two identical photons moving in opposite directions is m0 = 2E = 2hν, even though the
individual photons have no rest mass.
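The bookkeeping in this example is trivial to mechanize; the following sketch of ours computes the Minkowski norms directly:

    import numpy as np

    def mink_norm(p):
        # Minkowskian "norm" (E² − px² − py² − pz²)^1/2 of an energy-momentum vector
        E, px, py, pz = p
        return np.sqrt(E**2 - px**2 - py**2 - pz**2)

    E = 2.5                                      # arbitrary photon energy
    right = np.array([E,  E, 0.0, 0.0])          # photon moving in the +x direction
    left  = np.array([E, -E, 0.0, 0.0])          # photon moving in the −x direction
    print(mink_norm(right), mink_norm(left))     # 0.0 and 0.0 (null vectors)
    print(mink_norm(right + left))               # 2E = 5.0: the pair has rest mass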
If we could imagine a means of binding the two photons together, like the two particles
attached to the massless spring, then we could conceive of a bound system with positive
rest mass whose constituents have no rest mass. As mentioned previously, in normal
circumstances photons do not interact with each other (i.e., they can be superimposed
without affecting each other), but we can, in principle, imagine photons bound together
by the gravitational field of their energy (geons). The ability of electrons and
anti-electrons (positrons) to completely annihilate each other in a release of energy suggests
that these actual massive particles are also, in some sense, bound states of pure energy,
but the mechanisms or processes that hold an electron together, and that determine its
characteristic mass, charge, etc., are not known.
It's worth noting that the definition of "rest mass" is somewhat context-dependent when
applied to complex accelerating configurations of entities, because the momentum of
such entities depends on the space and time scales on which they are evaluated. For
example, we may ask whether the rest mass of a spinning disk should include the kinetic
energy associated with its spin. For another example, if the Earth is considered over just a
small portion of its orbit around the Sun, we can say that it has linear momentum (with
respect to the Sun's inertial rest frame), so the energy of its circumferential motion is
excluded from the definition of its rest mass. However, if the Earth is considered as a
bound particle during many complete orbits around the Sun, it has no net momentum
with respect to the Sun's frame, and in this context the Earth's orbital kinetic energy is
included in its "rest mass".
Similarly the atoms comprising a "stationary" block of lead are not microscopically
stationary, but in the aggregate, averaged over the characteristic time scale of the mean
free oscillation time of the atoms, the block is stationary, and is treated as such. The
temperature of the lead actually represents changes in the states of motion of the
constituent particles, but over a suitable length of time the particles are still stationary.
We can continue to smaller scales, down to sub-atomic particles comprising individual
atoms, and we find that the position and momentum of a particle cannot even be precisely
stipulated simultaneously. In each case we must choose a context in order to apply the
definition of rest mass.
Physical entities possess multiple modes of excitation (kinetic energy), and some of these
modes we may choose (or be forced) to absorb into the definition of the object's "rest
mass", because they do not vanish with respect to any inertial reference frame, whereas
other modes we may choose (and be able) to exclude from the "rest mass". In order to
assess the momentum of complex physical entities in various states of excitation, we
must first decide how finely to decompose the entities, and the time intervals over which
to make the assessment. The "rest mass" of an entity invariably includes some of what
would be called energy or "relativistic mass" if we were working on a lower level of
detail.
2.4 Doppler Shift for Sound and Light
I was much further out than you thought
And not waving but drowning.
Stevie Smith, 1957
For historical reasons, some older textbooks present two different versions of the
Doppler shift equations, one for acoustic phenomena based on traditional Newtonian
kinematics, and another for optical and electromagnetic phenomena based on relativistic
kinematics. This sometimes gives the impression that relativity requires us to apply a
different set of kinematical rules to the propagation of sound than to the propagation of
light, but of course that is not the case. The kinematics of relativity apply uniformly to
the propagation of all kinds of signals, provided we give the exact formulae. The
traditional acoustic formulas are inexact, tacitly based on Newtonian approximations, but
when they are expressed exactly we find that they are perfectly consistent with the
relativistic formulas.
Consider a frame of reference in which the medium of signal propagation is assumed to
be at rest, and suppose an emitter and absorber are located on the x axis, with the emitter
moving to the left at a speed of ve and the absorber moving to the right, directly away
from the emitter, at a speed of va. Let cs denote the speed at which the signal propagates
with respect to the medium. Then, according to the classical (non-relativistic) treatment,
the Doppler frequency shift is

νa/νe = (cs - va)/(cs + ve)

(It's assumed here that va and ve are less than cs, because otherwise there may be shock
waves and/or lack of communication between transmitter and receiver, in which case the
Doppler effect does not apply.) The above formula is often quoted as the Doppler effect
for sound, and then another formula is given for light, suggesting that relativity arbitrarily
treats sound and light signals differently. In truth, relativity has just a single formula for
the Doppler shift, which applies equally to both sound and light. This formula can
basically be read directly off the spacetime diagram shown below

If an emitter on worldline OA turns a signal ON at event O and OFF at event A, the
proper duration of the signal is the magnitude of OA, and if the signal propagates with
the speed of the worldline AB, then the proper duration of the pulse for a receiver on OB
will equal the magnitude of OB. Thus we have

|OA|^2 = tA^2 - xA^2        |OB|^2 = tB^2 - xB^2

and

cs = (xB - xA)/(tB - tA)

Substituting xA = -ve tA and xB = va tB into the equation for cs and re-arranging terms gives

tB (cs - va) = tA (cs + ve)

from which we get

tB/tA = (cs + ve)/(cs - va)

Substituting this into the ratio |OA| / |OB| gives the ratio of proper times for the signal,
which is the inverse of the ratio of frequencies:

νa/νe = |OA|/|OB| = [(cs - va)/(cs + ve)] [(1 - ve^2)/(1 - va^2)]^{1/2}
Now, if va and ve are both small compared to c, it's clear that the relativistic correction
factor (the square root quantity) will be indistinguishable from unity, and we can simply
use the leading factor, which is the classical Doppler formula for both sound and light.
However, if va and/or ve are fairly large (i.e., on the same order as c) we can't neglect the
relativistic correction.
It may seem surprising that the formula for sound waves in a fixed medium with absolute
speeds for the emitter and absorber is also applicable to light, but notice that as the signal
propagation speed cs goes to c, the above Doppler formula smoothly evolves into

νa/νe = [ (1 - va)(1 - ve) / ((1 + va)(1 + ve)) ]^{1/2}
which is very nice, because we immediately recognize the quantity inside the square root
as the multiplicative form of the relativistic composition law for velocities (discussed in
section 1.8). In other words, letting u denote the composition of the speeds va and ve
given by the formula

u = (va + ve)/(1 + va ve)

it follows that

(1 - u)/(1 + u) = [(1 - va)(1 - ve)] / [(1 + va)(1 + ve)]
Consequently, as cs increases to c, the absolute speeds ve and va of the emitter and
absorber relative to the fixed medium merge into a single relative speed u between the
emitter and absorber, independent of any reference to a fixed medium, and we arrive at
the relativistic Doppler formula for waves propagating at c for an emitter and absorber
with a relative velocity of u:

νa/νe = [(1 - u)/(1 + u)]^{1/2}
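As a numerical illustration (a sketch added here, not part of the original derivation; speeds are normalized so c = 1), the following Python fragment evaluates the exact acoustic formula and confirms that in the limit cs → c it coincides with the relativistic formula for the composed relative speed u:

    import math

    def doppler_exact(ve, va, cs):
        # exact frequency ratio nu_a/nu_e for an emitter receding at ve and an
        # absorber receding at va relative to the medium (signal speed cs, c = 1):
        # the classical leading factor times the relativistic correction
        return ((cs - va) / (cs + ve)) * math.sqrt((1 - ve**2) / (1 - va**2))

    ve, va = 0.3, 0.2
    u = (ve + va) / (1 + ve*va)          # relativistic composition of the speeds
    print(doppler_exact(ve, va, 1.0))    # exact formula with cs = c
    print(math.sqrt((1 - u)/(1 + u)))    # relativistic Doppler formula, same value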
To clarify the relation between the classical and relativistic Doppler shift equations, recall
that for a classical treatment of a wave with characteristic speed cs in a material medium
the Doppler frequency shift depends on whether the emitter or the absorber is moving
relative to the fixed medium. If the absorber is stationary and the emitter is receding at a
speed of v (normalized so cs = 1), then the frequency shift is given by

νa/νe = 1/(1 + v)

whereas if the emitter is stationary and the absorber is receding the frequency shift is

νa/νe = 1 - v
To the first order these are the same, but they obviously differ significantly if v is close to
1. In contrast, the relativistic Doppler shift for light, with cs = c, does not distinguish
between emitter and absorber motion, but simply predicts a frequency shift equal to the
geometric mean of the two classical formulas, i.e.,

νa/νe = [(1 - v)/(1 + v)]^{1/2}
Naturally to first order this is the same as the classical Doppler formulas, but it differs
from both of them in the second order, so we should be able to check for this difference,
provided we can arrange for emitters and/or absorbers to be moving with significant
speeds. The Doppler effect has in fact been tested at speeds high enough to distinguish
between these two formulas. The possibility of such a test, based on observing the
Doppler shift for canal rays emitted from high-speed ions, had been considered by
Stark in 1906, and Einstein published a short paper in 1907 deriving the relativistic
prediction for such an experiment. However, it wasn't until 1938 that the experiment was
actually performed with enough precision to discern the second order effect. In that year,
Ives and Stilwell shot hydrogen atoms down a tube, with velocities (relative to the lab)
ranging from about 0.8 to 1.3 times 10^6 m/sec. As the hydrogen atoms were in flight they
emitted light in all directions. Looking into the end of the tube (with the atoms coming
toward them), Ives and Stilwell measured a prominent characteristic spectral line in the
light coming forward from the hydrogen. This characteristic frequency was Doppler
shifted toward the blue by some amount δ_approach because the source was approaching
them. They also placed a mirror at the opposite end of the tube, behind the hydrogen
atoms, so they could look at the same light from behind, i.e., as the source was effectively
moving away from them, red-shifted by some amount δ_recede. The following is a table of
results from the original 1938 experiment for four different velocities of the hydrogen
atom:

Ironically, although the results of their experiment brilliantly confirmed Einstein's
prediction based on the special theory of relativity, Ives and Stilwell were not advocates
of relativity, and in fact gave a completely different theoretical model to account for their
of relativity, and in fact gave a completely different theoretical model to account for their
experimental results and the deviation from the classical prediction. This illustrates the
fact that the results of an experiment can never uniquely identify the explanation. They
can only split the range of available models into two groups, those that are consistent
with the results and those that aren't. In this case it's clear that any model yielding the
classical prediction is ruled out, while the Lorentz/Einstein model is found to be
consistent with the observed results.
All the above was based on the assumption that the emitter and absorber are moving
relative to each other directly along their "line of sight". More generally, we can give the
Doppler shift for the case when the (inertial) motions of the emitter and absorber are at
any specified angles relative to the "line of sight". Without loss of generality we can
assume the absorber is stationary at the origin of inertial coordinates and the emitter is
moving at a speed v and at an angle θ relative to the direct line of sight, as illustrated
below.

For two pulses of light emitted at coordinate times differing by Δte, the arrival times at the
receiver will differ by Δta = (1 - vr) Δte where vr = v cos(θ) is the radial component of the
emitter's velocity. Also, the proper time interval along the emitter's worldline between
the two emissions is Δτe = Δte (1 - v^2)^{1/2}. Therefore, since the frequency of the
transmissions with respect to the emitter's rest frame is proportional to 1/Δτe, and the
frequency of receptions with respect to the absorber's rest frame is proportional to 1/Δta,
the full frequency shift is

νa/νe = Δτe/Δta = (1 - v^2)^{1/2} / (1 - v cos(θ))
This differs in appearance from the Doppler shift equation given in Einstein's 1905 paper,
but only because, in Einstein's equation, the angle is evaluated with respect to the
emitter's rest frame, whereas in our equation the angle is evaluated with respect to the
absorber's rest frame. These two angles differ because of the effect of aberration. If we
let θ' denote the angle with respect to the emitter's rest frame, then θ' is related to θ by
the aberration equation

cos(θ') = (cos(θ) - v) / (1 - v cos(θ))

(See Section 2.5 for a derivation of this expression.) Substituting for cos(θ) into the
previous equation gives Einstein's equation for the Doppler shift, i.e.,

νa/νe = (1 + v cos(θ')) / (1 - v^2)^{1/2}

Naturally for the "linear" cases, when θ = θ' = 0 or θ = θ' = π we have

νa/νe = [(1 + v)/(1 - v)]^{1/2}        νa/νe = [(1 - v)/(1 + v)]^{1/2}
respectively. This highlights the symmetry between emitter and absorber that is so
characteristic of relativistic physics.
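That symmetry can be checked directly. The short Python sketch below (an illustration assuming the conventions above, with c = 1) evaluates the frequency shift using the absorber-frame angle θ, converts θ to the emitter-frame angle θ' via the aberration equation, and evaluates Einstein's form; the two results agree:

    import math

    v, theta = 0.6, 1.1   # arbitrary speed and absorber-frame angle

    # frequency ratio with theta measured in the absorber's rest frame
    shift_absorber = math.sqrt(1 - v**2) / (1 - v*math.cos(theta))

    # aberration: the same ray's angle with respect to the emitter's rest frame
    theta_p = math.acos((math.cos(theta) - v) / (1 - v*math.cos(theta)))

    # Einstein's form, with the angle measured in the emitter's rest frame
    shift_emitter = (1 + v*math.cos(theta_p)) / math.sqrt(1 - v**2)

    print(shift_absorber, shift_emitter)  # identical values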
Even more generally, consider an emitter moving with constant velocity u, an absorber
moving with constant velocity v, and a signal propagating with velocity C in terms of an
inertial coordinate system in which the signal's speed |C| is independent of direction. This
would apply to a system of coordinates at rest with respect to the medium of the signal,
and it would apply to any inertial coordinate system if the signal is light in a vacuum. It
would also apply to the case of a signal emitted at a fixed speed relative to the emitter,
but only if we take u = 0, because in this case the speed of the signal is independent of
direction only in terms of the rest frame of the emitter. We immediately have the relation

|ra - re|^2 = |C|^2 (ta - te)^2

where re and ra are the position vectors of the emission and absorption events at the
times te and ta respectively. Differentiating both sides with respect to ta and dividing
through by 2(ta - te), and noting that (ra - re)/(ta - te) = C, we get

C · (v - u dte/dta) = |C|^2 (1 - dte/dta)

where u and v are the velocity vectors of the emitter and absorber respectively. Solving
for the ratio dte/dta, we arrive at the relation

dte/dta = (|C|^2 - C·v) / (|C|^2 - C·u)

Making use of the dot product identity r·s = |r||s| cos(θ_{r,s}) where θ_{r,s} is the angle between
the r and s vectors, these can be re-written as

dte/dta = (|C| - |v| cos(θ_{C,v})) / (|C| - |u| cos(θ_{C,u}))
The frequency of any process is inversely proportional to the duration of the period, so
the frequency at the absorber relative to the emitter, projected by means of the signal, is
given by νa/νe = dte/dta. Therefore, the above expressions represent the classical Doppler
effect for arbitrarily moving emitter and receiver. However, the elapsed proper time along
a worldline moving with speed v in terms of any given inertial coordinate system differs
from the elapsed coordinate time by the factor

(1 - (v/c)^2)^{1/2}

where c is the speed of light in vacuum. Consequently, the actual ratio of proper times
and therefore proper frequencies for the emitter and absorber is

νa/νe = [(|C|^2 - C·v) / (|C|^2 - C·u)] [(1 - (|u|/c)^2)/(1 - (|v|/c)^2)]^{1/2}

The leading ratio is the classical Doppler effect, and the square root factor is the
relativistic correction.
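A minimal Python sketch of this general formula (added for illustration; all velocities are expressed as components in units of c, with assumed values) is:

    import math

    def dot(r, s):
        return sum(a*b for a, b in zip(r, s))

    def doppler_general(u, v, C):
        # classical ratio dte/dta followed by the relativistic correction (c = 1)
        C2 = dot(C, C)
        classical = (C2 - dot(C, v)) / (C2 - dot(C, u))
        correction = math.sqrt((1 - dot(u, u)) / (1 - dot(v, v)))
        return classical * correction

    u = (0.10, 0.02, 0.0)    # emitter velocity
    v = (-0.05, 0.04, 0.0)   # absorber velocity
    C = (0.0, -0.9, 0.0)     # signal velocity, e.g. sound with |C| = 0.9
    print(doppler_general(u, v, C))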

2.5 Stellar Aberration


It was chiefly therefore Curiosity that tempted me (being then at Kew,
where the Instrument was fixed) to prepare for observing the Star on
December 17th, when having adjusted the Instrument as usual, I perceived
that it passed a little more Southerly this Day than when it was observed
before.
James Bradley, 1727
The aberration of starlight was discovered in 1727 by the astronomer James Bradley
while he was searching for evidence of stellar parallax, which in principle ought to be
observable if the Copernican theory of the solar system is correct. He succeeded in
detecting an annual variation in the apparent positions of stars, but the variation was not
consistent with parallax. The observed displacement was greatest for stars in the direction
perpendicular to the orbital plane of the Earth, and most puzzling was the fact that the
displacement was exactly three months (i.e., 90 degrees) out of phase with the effect that
would result from parallax due to the annual change in the Earths position in orbit
around the Sun. It was as if he was expecting a sine function, but found instead a cosine
function. Now, the cosine is the derivative of the sine, so this suggests that the effect he
was seeing was not due to changes in the earths position, but to changes in the Earths
(directional) velocity. Indeed Bradley was able to interpret the observed shift in the
incident angle of starlight relative to the Earths frame of reference as being due to the
transverse velocity of the Earth relative to the incoming corpuscles of light, assuming the
latter to be moving with a finite speed c. The velocity of the corpuscles relative to the
Earth equals their velocity vector c with respect to the Suns frame of reference plus the

negative of the orbital velocity vector v of the Earth, as shown below.

In this figure, θ1 is the apparent elevation of a star above the Earth's orbital plane when
the Earth's velocity is most directly toward the star (say, in January), and θ2 is the
apparent elevation six months later when the Earth's velocity is in the opposite direction.
The law of sines gives

sin(θ - θ1)/v = sin(θ1)/c        sin(θ2 - θ)/v = sin(θ2)/c

where θ is the true elevation of the star.
Since the aberration angles are quite small, we can closely approximate sin(Δθ) with just
Δθ. Therefore, the apparent position of a star that is roughly above the ecliptic ought to
describe a small circle (or ellipse) around its true position, and the radius of this path
should be sin(θ)(v/c) where v is the Earth's orbital speed and c is the speed of light.
When Bradley made his discovery he was examining the star γ Draconis, which has a
declination of about 51.5 degrees above the Earth's equatorial plane, and about 75
degrees above the ecliptic plane. Incidentally, most historical accounts say Bradley chose
this star simply because it passes directly overhead in Greenwich England, the site of his
observatory, which happens to be at about 51.5 degrees latitude. Vertical observations
minimize the effects of atmospheric refraction, but surely this is an incomplete
explanation for choosing γ Draconis, because stars with this same declination range from
28 to 75 degrees above the ecliptic, due to the Earth's tilt of 23.5 degrees. Was it just a
lucky coincidence that he chose (as Hooke had previously) γ Draconis, a star with the
maximum possible elevation above the ecliptic among stars that pass directly over
Greenwich? Accidental or not, he focused on nearly the ideal star for detecting
aberration. The orbital speed of the Earth is roughly v = 2.98 x 10^4 m/sec, and the speed of
light is c = 3.0 x 10^8 m/sec, so the magnitude of the aberration for γ Draconis is (v/c) sin(75
deg) = 9.59 x 10^{-5} radians = 19.8 seconds of arc. Bradley subsequently confirmed the
expected aberration for stars at other declinations.
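The arithmetic quoted above is easily reproduced; the following Python lines (an illustrative check using the rounded figures from the text) give the aberration radius for γ Draconis:

    import math

    v = 2.98e4               # Earth's orbital speed, m/sec
    c = 3.0e8                # speed of light, m/sec
    elev = math.radians(75)  # elevation of gamma Draconis above the ecliptic

    alpha = (v/c) * math.sin(elev)      # aberration radius in radians
    print(alpha)                        # ~9.59e-5 radians
    print(math.degrees(alpha) * 3600)   # ~19.8 seconds of arc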
Ironically, although it was not the effect Bradley had been seeking, the existence of stellar
aberration was, after all, conclusive observational proof of the Earth's motion, and hence
of the Copernican theory, which had been his underlying objective. Furthermore, the
discovery of stellar aberration not only provided the first empirical proof of the
Copernican theory, it also furnished a new and independent proof of the finite speed of
light, and even enabled that speed to be estimated from knowledge of the orbital speed of
the Earth. The result was consistent with the earlier estimate of the speed of light by
Roemer based on observations of Jupiter's moons (see Section 3.3).
Bradley's interpretation, based on the Newtonian corpuscular concept of light, accounted
quite well for the basic phenomenon of stellar aberration. However, if light consists of
ballistic corpuscles their speeds ought to depend on the relative motion between the
source and observer, and these differences in speed ought to be detectable, whereas no
such differences were found. For example, early in the 19th century Arago compared the
focal length of light from a particular star at six-month intervals, when the Earth's motion
should alternately add and subtract a velocity component equal to the Earth's orbital
speed to the speed of light. According to the corpuscle theory, this should result in a
slightly different focal length through the system of lenses, but Arago observed no
difference at all. In another experiment he viewed the aberration of starlight through a
normal lens and through a thick prism with a very different index of refraction, which
ought to give a slightly different aberration angle according to the Newtonian corpuscular
model, but he found no difference. Both these experiments suggest that the speed of light
is independent of the motion of the source, so they tended to support the wave theory of
light, rather than the corpuscular theory.
Unfortunately, the phenomenon of stellar aberration is somewhat problematic for theories
that regard electromagnetic radiation as waves propagating in a luminiferous ether. It's
worthwhile to examine the situation in some detail, because it is a nice illustration of the
clash between mechanical and electromagnetic phenomena within the context of Galilean
relativity. If we conceive of the light emanating from a distant star reaching the Earth's
location as a set of essentially parallel streams of particles normal to the Earth's orbit (as
Bradley did), then we have the situation shown in the left-hand figure below, and if we
apply the Galilean transformation to a system of coordinates moving with the Earth (in
the positive x direction) we get the situation shown in the right-hand figure.

According to this model the aberration arises because each corpuscle has equations of
motion of the form y = -ct and x = x0, so the Galilean transformation x = x' + vt', y = y', t =
t' leads to y' = -ct' and x' + vt' = x0, which gives (after eliminating t') the path x' - v(y'/c) =
x0. Thus we have dx'/dy' = v/c = tan(θ). In contrast, if we conceive of the light as
essentially a plane wave, the sequence of wave crests is as shown below.

In this case each wavecrest has the equation y = -ct, with no x specification, because the
wave is uniform over the entire wavefront. Applying the same Galilean transformation as
before, we get simply y' = -ct', so the plane wave looks the same in terms of both systems
of coordinates. We might try to argue that the flow of energy follows definite streamlines,
and if these streamlines are vertical with respect to the unprimed coordinates they would
transform into slanted streamlines in the primed coordinates, but this would imply that
the direction of propagation of the wave energy is not exactly normal to the wave fronts,
in conflict with Maxwell's equations. This highlights the incompatibility between
Maxwell's equations and Galilean relativity, because if we regard the primed coordinates
as stationary and the distant star as moving transversely with speed v, then the waves
reaching the Earth at this moment should have the same form as if they were emitted
from the star when it was to the right of its current position, and therefore the wave fronts
ought to be slanted by an angle of v/c. Of course, we do actually observe aberration of
this amount, so the wave fronts really must be tilted with respect to the primed
coordinates, and we can fairly easily explain this in terms of the wave model, but the
explanation leads to a new complication.
According to the early 19th century wave model with a stationary ether, an observation of
a distant star consists of focusing a set of parallel rays from that star down to a point, and
this necessarily involves some propagation of light in the transverse direction (in order to
bring the incoming rays together). Taking the focal point to be midway between two rays,
and assuming the light propagates transversely at the same speed in both directions, we
will align our optical device normal to the plane wave fronts. However, suppose the
effective speed of light is slightly different in the two transverse directions. If that were
the case, we would need to tilt our optical device, and this would introduce a time skew
in our evaluation of the wave front, because our optical image would associate rays from
different points on the wave front at slightly different times. As a result, what we regard
as the wave front would actually be slanted. The proponents of the wave model argued
that the speed of light is indeed different in the two transverse directions relative to a

telescope on the Earth pointed up at a star, because the Earth is moving sideways
(through the ether) with respect to the incoming rays. Assuming light always propagates
at the fixed speed c relative to the ether, and assuming the Earth is moving at a speed v
relative to the ether, we could argue that the transverse speed of light inside our telescope
is c+v in one direction and c-v in the other. To assess the effect of this asymmetry,
consider for simplicity just two mirror elements of a reflecting telescope, focusing
incoming rays as illustrated below.

The two incoming rays shown in this figure are from the same wavecrest, but they are not
brought into focus at the midpoint of the telescope, due to the (putative) fact that the
telescope is moving sideways through the ether with a speed v. Both pulses strike the
mirrors at the same time, but the left hand pulse goes a distance proportional to c+v in the
time it takes the right hand pulse to go a distance proportional to c-v. In order to bring the
wave crest into focus, we need to increase the path length of the left hand ray by a
distance proportional to v, and decrease the right hand path length by the same distance.
This is done by tilting the telescope through a small angle whose tangent is roughly v/c,
as shown below.

Thus the apparent optical wavefront is tilted by an angle given by tan(θ) = v/c, which is
the same as the aberration angle for the rays, and also in agreement with the corpuscle
model. However, this simple explanation assumes a total vacuum, and it raises questions
about what would happen if the telescope was filled with some material medium such as
air or water. It was already accepted in Fresnel's day, for both the wave and the corpuscle
models of light, that light propagates more slowly in a dense medium than in vacuum.
Specifically, the speed of light in a medium with index of refraction n is c/n. Hence if we
fill our reflecting telescope with such a medium, then the speed of light in the two
transverse directions would be c/n + v and c/n - v, and the above analysis would lead us
to expect an aberration angle given by tan(θ) = nv/c. The index of refraction of air is just
1.0003, so this doesn't significantly affect the observed aberration angle for telescopes in
air. However, the index of refraction of water is 1.33, so if we fill a telescope with water,

we ought to observe (according to this theory) significantly more stellar aberration. Such
experiments have actually been carried out, but no effect on the aberration angle is
observed.
In 1818 Fresnel suggested a way around this problem. His hypothesis, which he admitted
appeared extraordinary at first sight, was that although the luminiferous ether through
which light propagates is nearly immobile, it is dragged along slightly by material
objects, and the higher the refractive index of the object, the more it drags the ether along
with its motion. If an object with refractive index n moves with speed v relative to the
nominal rest frame of the ether, Fresnel hypothesized that the ether inside the object is
dragged forward at a speed (1 - 1/n^2)v. Thus for objects with n = 1 there is no dragging at
all, but for n greater than 1 the ether is pulled along slightly. Fresnel gave a plausibility
argument based on the relation between density and refractivity, making his hypothesis
seem at least slightly less contrived, although it was soon pointed out that since the index
of refraction of a given medium varies with frequency, Fresnel's model evidently
requires a different ether for each frequency. Neglecting this second-order effect of
chromatic dispersion, Fresnel was able on the basis of his partial dragging hypothesis to
account for the absence of any change in stellar aberration for different media. He pointed
out that, in the above analysis, the speed of light in the two directions has the values

c/n + v - (1 - 1/n^2)v = c/n + v/n^2        c/n - v + (1 - 1/n^2)v = c/n - v/n^2

For the vacuum we have n = 1, and these expressions are the same as before. In the
presence of a material medium with n greater than 1, the optical device must now be
tilted through an angle whose tangent is approximately

tan(θ) = (v/n^2)/(c/n) = v/(nc)
It might seem as if Fresnel's hypothesis has simply resulted in exchanging one problem
for another, but recall that our telescope is aligned normal to the apparent wave front,
whereas it is at an angle of v/c to the normal of the actual wave front, so the wave will be
refracted slightly (assuming n is not equal to 1). According to Snell's law (which for
small angles is n1 θ1 = n2 θ2), the refracted angle will be less than the incident angle by the
factor 1/n. Hence we must orient our telescope at an angle of v/c in order for the rays
within the medium to be at the required angle.
This is how, on the basis of somewhat adventuresome hypotheses and assumptions,
physicists of the 19th century were able to account for stellar aberration on the basis of
the wave model of light. (Accommodating the lack of effect of differing indices of
refraction proved to be even more challenging for the corpuscular model.) Fresnel's
remarkable hypothesis was directly confirmed (many years later) by Fizeau, and it is now
recognized as a first-order approximation of the relativistic velocity addition law,
composing the speed of light in a medium with the speed of the medium

(c/n + v) / (1 + v/(nc)) ≈ c/n + (1 - 1/n^2)v
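The first-order agreement is easy to check numerically; in this Python sketch (illustrative values, units with c = 1) the exact relativistic composition and Fresnel's partial-drag expression differ only at order v^2:

    n = 1.33    # index of refraction (water)
    v = 1e-4    # speed of the medium, in units of c

    exact   = (1/n + v) / (1 + v/n)      # relativistic composition of c/n and v
    fresnel = 1/n + (1 - 1/n**2)*v       # Fresnel's partially dragged ether
    print(exact, fresnel, exact - fresnel)  # difference is of order v**2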
It's worth noting that all the speeds discussed here are phase speeds, corresponding to
the time parameter for a given wave. Lorentz later showed that Fresnel's formula could
also be interpreted in the context of a perfectly immobile ether along with the assumption
of phase shifts in the incoming wave fronts so that the effective time parameter
transformation was not the Galilean t' = t but rather t' = t - vx/c^2.
Despite the success of Fresnel's hypothesis in matching all optical observations to the
first order in v/c, many physicists considered his partially dragged ether model to be ad
hoc and unphysical (especially the apparent need for a different ether for each frequency
of light), so they sought other explanations for stellar aberration that would be consistent
with a more mechanistically realistic wave model. As an alternative to Fresnel's
hypothesis, Lorentz evaluated a proposal of Stokes, who in 1846 had suggested that the
ether is totally dragged along by material bodies (so the ether is co-moving with the body
at the body's surface), and is irrotational, incompressible, and inviscid, so that it supports
a velocity potential. Under these assumptions it can be shown that the normal of a light
wave incident on the Earth undergoes a total deflection during its approach such that (to
first order) the apparent shift in the star's position agrees with observation. Unfortunately,
as Lorentz pointed out, the assumptions of Stokes' theory are mutually contradictory,
because the potential flow field around a sphere does not give zero velocity on the
sphere's surface. Instead, the velocity of the ether wind on the Earth's surface would vary
with position, and so too would the aberration of starlight. Planck suggested a way
around this objection by supposing the luminiferous ether was compressible, and
accumulated with greatly increased density around large objects. Lorentz admitted that
this was conceivable, but only if we also assume the speed of light propagating through
the ether is unaffected by the changes in density of the ether, an assumption that plainly
contradicts the behavior of wave propagation in ordinary substances. He concluded
In this branch of physics, in which we can make no progress without some
hypothesis that looks somewhat startling at first sight, we must be careful not to
rashly reject a new idea ... yet I dare say that this assumption of an enormously
condensed ether, combined, as it must be, with the hypothesis that the velocity of
light is not in the least altered by it, is not very satisfactory.
With the failure of Stokes' theory, the only known way of reconciling stellar aberration
with a wave theory of light was Fresnel's extraordinary hypothesis of partial dragging,
or Lorentz's equivalent interpretation in terms of the effective phase time parameter t'.
However, the Fresnel-Lorentz theory predicted a non-null result for the Michelson-Morley
experiment, which was the first experiment accurate to the second order in v/c.
To remedy this, Lorentz ultimately incorporated Fitzgerald's length contraction into his
theory, which amounts to replacing the Galilean transformation x' = x - vt with the
relation x' = (x - vt)/(1 - (v/c)^2)^{1/2}, and then for consistency applying this same
second-order correction to the time transformation, giving t' = (t - vx/c^2)/(1 - (v/c)^2)^{1/2},
thereby arriving at the full Lorentz transformation. By this point the posited luminiferous
ether had lost all of its mechanistic properties.
Meanwhile, Einstein's 1905 paper on the electrodynamics of moving bodies included a
greatly simplified derivation of the full Lorentz transformation, dispensing with the ether
altogether, and analyzing a variety of phenomena, including stellar aberration, from a
purely kinematical point of view. If a photon is emitted from object A at the origin of the
xyt coordinates and an angle θ relative to the x axis, then at time t1 it will have reached
the point

x1 = t1 cos(θ)        y1 = t1 sin(θ)

(Notice that the units have been scaled to make c = 1, so the Minkowski metric for a null
interval gives x1^2 + y1^2 = t1^2.) Now consider an object B moving in the positive x direction
with velocity v, and being struck by the photon at time t1 as shown below.
with velocity v, and being struck by the photon at time t1 as shown below.

Naturally an observer riding along with B will not see the light ray arriving at an angle θ
from the x axis, because according to the system of coordinates co-moving with B the
source object A has moved in the x direction (but not in the y direction) between the
times of transmission and reception of the photon. Since the angle is just the arctangent of
the ratio of y to x of the photon's path, and since the value of x is different with respect to
B's co-moving inertial coordinates whereas y is the same, it's clear that the angle of the
photon's path is different with respect to B's co-moving coordinates than with respect to
A's co-moving coordinates. In general the transformation of the angles of the paths of
moving objects from one system of inertial coordinates to another is called aberration.
To determine the angle of the incoming ray with respect to the co-moving inertial
coordinates of B, let x'y't' be an orthogonal coordinate system aligned with the xyt
coordinates but moving in the positive x direction with velocity v, so that B is at rest in
the primed coordinate system. Without loss of generality we can co-locate the origins of
the primed and unprimed coordinates systems, so in both systems the photon is emitted at
(0,0,0). The endpoint of the photon's path in the primed coordinates can be computed
from the unprimed coordinates using the standard Lorentz transformation for a boost in
the positive x direction:

x1' = (x1 - v t1)/(1 - v^2)^{1/2}        y1' = y1        t1' = (t1 - v x1)/(1 - v^2)^{1/2}

Just as we have cos(θ) = x1/t1, we also have cos(θ') = x1'/t1', and so

cos(θ') = (cos(θ) - v) / (1 - v cos(θ))                (1)

which is the general relativistic aberration formula relating the angles of light rays with
respect to relatively moving coordinate systems. Likewise we have sin(θ') = y1'/t1', from
which we get

sin(θ') = sin(θ) (1 - v^2)^{1/2} / (1 - v cos(θ))                (2)

Using these expressions for the sine and cosine of θ' it follows that

sin(θ')/(1 + cos(θ')) = [(1 + v)/(1 - v)]^{1/2} sin(θ)/(1 + cos(θ))

Recalling the trigonometric identity tan(z) = sin(2z)/[1+cos(2z)] this gives

tan(θ'/2) = [(1 + v)/(1 - v)]^{1/2} tan(θ/2)                (3)
which immediately shows that aberration can be represented by stereographic projection
from a sphere to the tangent plane. (This is discussed more fully in Section 2.6.)
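Equations (1), (2), and (3) can be verified numerically; the following Python sketch (added as an illustration, with an arbitrary speed and angle) confirms that the transformed sine and cosine are consistent and that the half-angle relation (3) holds:

    import math

    v, theta = 0.8, 0.7
    ct = (math.cos(theta) - v) / (1 - v*math.cos(theta))                # eq. (1)
    st = math.sin(theta)*math.sqrt(1 - v**2) / (1 - v*math.cos(theta))  # eq. (2)
    print(ct**2 + st**2)   # 1.0, so (ct, st) defines a valid angle theta'

    theta_p = math.atan2(st, ct)
    print(math.tan(theta_p/2))                               # eq. (3), left side
    print(math.sqrt((1 + v)/(1 - v)) * math.tan(theta/2))    # eq. (3), right side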
To see the effect of equation (3), suppose that, with respect to the inertial rest frame of a
given particle, the rays of starlight incident on the particle are uniformly distributed in all
directions. Then suppose the particle is given some speed v in the positive x direction
relative to this original isotropic frame, and we evaluate the angles of incidence of those
same rays of starlight with respect to the particle's new rest frame. The results, for speeds
ranging from 0 to 0.999, are shown in the figure below. (Note that the angles in equation
(3) are evaluated between the positive x or x' axis and the positive direction of the light
ray.)

The preceding derivation applies to the case when the light is emitted from the unprimed
coordinate system at a certain angle and evaluated with respect to the primed coordinate
system, which is moving relative to the unprimed system. If instead the light was emitted
from B and received at A, we can repeat the above derivation, except that the direction of
the light ray is reversed, going now from B to A. The spatial coordinates are all the same
but the emission event now occurs at -t1, because it is in the past of event (0,0,0). The
result is simply to replace each occurrence of v in the above expressions with -v. Of
course, we could reach the same result simply by transposing the primed and unprimed
angles in the above expressions.
Incidentally, the aberration formula used by astronomers to evaluate the shift in the
apparent positions of stars resulting from the Earth's orbital motion is often expressed in
terms of angles with respect to the y axis (instead of the x axis), as shown below

This configuration corresponds to a distant star at A sending starlight to the Earth at B,
which is moving nearly perpendicular to the incoming ray. This gives the greatest
aberration effect, which explains why the stars furthest from the ecliptic plane experience
the greatest aberration. The formula can be found simply by making the substitution θ =
π/2 - φ in equation (1), where φ denotes the angle measured from the y axis, and noting
the trigonometric identity cos(π/2 - φ) = sin(φ). This gives the equivalent form

sin(φ') = (sin(φ) - v) / (1 - v sin(φ))
Another interesting aspect of aberration is illustrated by considering two separate light
sources S1 and S2, and two momentarily coincident observers A and B as shown below

If observer A is stationary with respect to the sources of light, he will see the incoming
rays of light striking him from the negative x direction. Thus, the light will impart a small
amount of momentum to observer A in the positive x direction. On the other hand,
suppose observer B is moving to the right (away from the sources of light) at nearly the
speed of light. According to our aberration formula, if B is traveling with a sufficiently
great speed, he will see the light from S1 and S2 approaching from the positive x direction,
which means that the photons are imparting momentum to B in the negative x direction,
even though the light sources are "behind" B. This may seem paradoxical, but the
explanation becomes clear when we realize that the x component of the velocities of the
incoming light rays is less than c (because (vx)^2 = c^2 - (vy)^2), which means that it's
possible for observer B to be moving to the right faster than the incoming photons are
moving to the right.
Of course, this effect relies only on the relative motion of the observer and the source, so
it works just as well if we regard B as motionless and the light sources S1,S2 moving to
the left at near the speed of light. Thus, it might seem that we could use light rays to
"pull" an object from behind, and in a sense this is true. However, since the light rays are
moving to the right more slowly than the object, they clearly cannot catch up with the
object from behind, so they must have been emitted when the object was still to the left of
the sources. This illustrates how careful one must be to correctly account for the effective
aberration of non-uniformly moving objects, because the simple aberration formulas are
based on the assumption that the light source has been in uniform motion for an indefinite
period of time. To correctly describe the aberration of non-uniformly moving light
sources it is necessary to return to the basic metrical relations.
For example, consider a binary star system in which one large central star is roughly
stationary (relative to our Sun), and a smaller companion star is orbiting around the
central star with a large angular velocity in a plane normal to the direction to our Sun, as
illustrated below.

It might seem that the periodic variations in the velocity of the smaller star relative to our
Sun would result in significantly different amounts of aberration as viewed from the
Earth, causing the two components of the binary star system to appear in separate
locations in the sky - which of course is not what is observed. Fortunately, it's easy to
show that the correct application of the principles of special relativity, accounting for the
non-uniform variations in the orbiting star's velocity, leads to predictions that agree
perfectly with observations of binary star systems.
At any moment of observation on Earth we can consider ourselves to be at rest at the
point P0 in the momentarily co-moving inertial frame, with respect to which our
coordinates are

x0 = 0        y0 = 0        z0 = 0

Suppose the large central star of a binary pair is at point P1 at a distance L from the Earth
with the coordinates

x1 = -vt        y1 = 0        z1 = L

(the star system moving with speed v in the negative x direction in this frame). The
fundamental assertion of special relativity is that light travels along null paths, so if a
pulse of light is emitted from the star at time t = T and arrives at Earth at time t = 0, we
have

(vT)^2 + L^2 = T^2

and so

T = -L/(1 - v^2)^{1/2}
from which it follows that x1/z1 at time T is

x1/z1 = -vT/L = v/(1 - v^2)^{1/2}

Thus, for the central star we have the aberration angle

tan(θ) = v/(1 - v^2)^{1/2}
Now, what about the aberration of the other star in the binary pair, the one that is assumed
to be much smaller and revolving at a radius R and angular speed ω around the larger star
in a plane perpendicular to the Earth? The coordinates of that revolving star at point P2
are

x2 = -vt + R cos(α)        y2 = R sin(α)        z2 = L

where α = ωt is the angular position of the smaller star in its orbit. Again, since light
travels along null paths, a pulse of light arriving on Earth at time t = 0 was emitted at time
t = T satisfying the relation

(-vT + R cos(ωT))^2 + (R sin(ωT))^2 + L^2 = T^2

Solving this quadratic for T (and noting that the phase depends entirely on the arbitrary
initial conditions of the orbit) gives

T = -vR cos(α)/(1 - v^2) - [L/(1 - v^2)^{1/2}] [1 + (R/L)^2 + v^2 (R/L)^2 cos(α)^2/(1 - v^2)]^{1/2}
If the radius R of the binary star's orbit is extremely small in comparison with the
distance L from those stars to the Earth, and assuming v is not very close to the speed of
light, then the quantity inside the square root is essentially equal to 1. Therefore, the
tangents of the angles of incidence in the x and y directions are

tan(θx) = x2/z2 = (R/L) cos(α) + v/(1 - v^2)^{1/2} + (v^2/(1 - v^2)) (R/L) cos(α)

tan(θy) = y2/z2 = (R/L) sin(α)

These expressions make it clear why Einstein emphasized in his 1905 treatment of
aberration that the light source was at infinite distance, i.e., L goes to infinity, so all but
the middle term of the x tangent vanish. Of course, the leading terms in these tangents are
obviously just the inherent "static" angular separation between the two stars viewed from
the Earth, and the last term in the x tangent is completely negligible assuming R/L and/or
v are sufficiently small compared with 1, so the aberration angle is essentially

tan(θ) = v/(1 - v^2)^{1/2}
which of course is the same as the aberration of the central star. Indeed, binary stars have
been carefully studied for over a century, and the aberrations of the components are
consistent with the relativistic predictions for reasonable Keplerian orbits. (Incidentally,
recall that Bradley's original formula for aberration was tan(θ) = v, whereas the
corresponding relativistic equation is sin(θ) = v. The actual aberration angles for stars
seen from Earth are small enough that the sine and tangent are virtually
indistinguishable.)
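The claim that the companion's aberration essentially equals the central star's can be checked numerically. The Python sketch below (illustrative toy values, units with c = 1) solves the null condition for the emission time T by simple iteration and compares the aberration terms:

    import math

    v, L, R, w = 1e-4, 1e6, 1.0, 0.01   # speed, distance, orbit radius, angular rate

    # solve (-v*T + R*cos(w*T))**2 + (R*sin(w*T))**2 + L**2 = T**2 for T < 0
    T = -L / math.sqrt(1 - v**2)        # initial guess: the central star's value
    for _ in range(50):
        x = -v*T + R*math.cos(w*T)
        y = R*math.sin(w*T)
        T = -math.sqrt(x**2 + y**2 + L**2)

    tan_central = v / math.sqrt(1 - v**2)
    tan_companion = (-v*T + R*math.cos(w*T)) / L     # full x tangent
    static_term = R*math.cos(w*T) / L                # inherent angular separation
    print(tan_central, tan_companion - static_term)  # nearly identical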
The experimental results of Michelson and Morley, based on beams of light pointed in
various directions with respect to the Earth's motion around the Sun, can also be treated
as aberration effects. Let the arm of Michelson's interferometer be of length L, and let it
make an angle θ with the direction of motion in the rest frame of the arm. We can
establish inertial coordinates t,x,y in this frame, in terms of which the light pulse is
emitted at t1 = 0, x1 = 0, y1 = 0, reflected at t2 = L, x2 = Lcos(θ), y2 = Lsin(θ), and arrives
back at the origin at t3 = 2L, x3 = 0, y3 = 0. The Lorentz transformation to a system x',y',t'
moving with velocity v in the x direction is x' = (x - vt)/σ, y' = y, t' = (t - vx)/σ where σ^2 = (1
- v^2), so the coordinates of the three events are x1' = 0, y1' = 0, t1' = 0, and x2' = L(cos(θ) - v)/
σ, y2' = Lsin(θ), t2' = L[1 - vcos(θ)]/σ, and x3' = -2vL/σ, y3' = 0, t3' = 2L/σ. Hence the total
elapsed time in the primed coordinates is 2L/σ. Also, the total spatial distance traveled is
the sum of the outward distance

L(1 - v cos(θ))/σ

and the return distance

L(1 + v cos(θ))/σ

so the total distance is 2L/σ, giving a light speed of 1 regardless of the values of v and θ.
Of course, the angle of the interferometer arm cannot be θ with respect to the primed
coordinates. The tangent of the angle equals the arm's y extent divided by its x extent,
which gives tan(θ) = Lsin(θ)/[Lcos(θ)] in the arm's rest coordinates. In the primed
coordinates the y' extent of the arm is the same as the y extent, Lsin(θ), but the x' extent
is σLcos(θ), so the tangent of the arm's angle is tan(θ') = tan(θ)/σ. However, this should
not be confused with the angle (in the primed coordinates) of the light pulse as it travels
along the arm, because the arm is in motion with respect to the primed coordinates. The
outward direction of motion of the light pulse is given by evaluating the primed
coordinates of the emission and absorption events at x1,y1 and x2,y2 respectively. Likewise
the inward direction of the light pulse is based on the interval from x2,y2 to x3,y3. These
give the tangents of the outward and inward angles

tan(θ'_out) = σ sin(θ)/(cos(θ) - v)        tan(θ'_in) = σ sin(θ)/(cos(θ) + v)

Naturally these are consistent with the result of taking the ratio of equations (1) and (2).
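The invariance of the round-trip light speed claimed above is easily confirmed; this Python fragment (a check using the primed-frame coordinates just derived, with arbitrary v and θ) reproduces the total distance and elapsed time 2L/σ:

    import math

    L, v, theta = 1.0, 0.6, 0.8
    s = math.sqrt(1 - v**2)   # sigma

    x2 = L*(math.cos(theta) - v)/s
    y2 = L*math.sin(theta)
    x3, y3, t3 = -2*v*L/s, 0.0, 2*L/s

    out_dist  = math.hypot(x2, y2)            # = L(1 - v cos(theta))/sigma
    back_dist = math.hypot(x3 - x2, y3 - y2)  # = L(1 + v cos(theta))/sigma
    print(out_dist + back_dist, t3)           # both equal 2L/sigma, so speed = 1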
2.6 Mobius Transformations and The Night Sky
What we are beginning to see here is the first step of a powerful
correspondence between the spacetime geometry of relativity and the
holomorphic geometry of complex spaces.
Roger Penrose, 1977

Any proper orthochronous Lorentz transformation (including ordinary rotations and
relativistic boosts) can be represented by

X' = Q X Q*                (1)

where

X = [ t+z    x+iy ]        Q = [ a   b ]
    [ x-iy   t-z  ]            [ c   d ]

and Q* is the transposed conjugate of Q. The coefficients a,b,c,d of Q are allowed to be
complex numbers, normalized so that ad - bc = 1. Just to be explicit, this implies that if
we define

then the Lorentz transformation (1) is

Two observers at the same point in spacetime but with different orientations and
velocities will "see" incoming light rays arriving from different relative directions with
respect to their own frames of reference, due partly to ordinary rotation, and partly to the
aberration effect described in the previous section. This leads to the remarkable fact that
the combined effect of any proper orthochronous (and homogeneous) Lorentz
transformation on the incidence angles of light rays at a point corresponds precisely to the
effect of a particular linear fractional transformation on the Riemann sphere via ordinary
stereographic projection from the extended complex plane. The latter is illustrated
below:

The complex number p in the extended complex plane is identified with the point p' on
the unit sphere that is struck by a line from the "North Pole" through p. In this way we
can identify each complex number uniquely with a point on the sphere, and vice versa.
(The North Pole is identified with the "point at infinity" of the extended complex plane,
for completeness.)
Relative to an observer located at the center of the Riemann sphere, each point of the
sphere lies in a certain direction, and these directions can be identified with the directions
of incoming light rays at a point in spacetime. If we apply a Lorentz transformation of
the form (1) to this observer, specified by the four complex coefficients a,b,c,d, the
resulting change in the directions of the incoming rays of light is given exactly by
applying the linear fractional transformation (also known as a Mobius transformation)

w → (aw + b)/(cw + d)

to the points of the extended complex plane. Of course, our normalization ad - bc = 1
implies the two conditions

Re(ad - bc) = 1        Im(ad - bc) = 0
so of the eight coefficients needed to specify the four complex numbers a,b,c,d, these two
constraints reduce the degrees of freedom to six, which is precisely the number of
degrees of freedom of Lorentz transformations (namely, three velocity components
vx,vy,vz, and three angular specifications for the longitude and latitude of our line of sight
and orientation about that line).
To illustrate this correspondence, first consider the "identity" Mobius transformation
w → w. In this case we have

a = d = 1        b = c = 0

so our Lorentz transformation reduces to t' = t, x' = x, y' = y, z' = z as expected. None of
the points move on the complex plane, so none move on the Riemann sphere under
stereographic projection, and nothing changes in the sky's appearance. Now let's consider
the Mobius transformation w → -1/w. In this case we have

a = 0        b = -1        c = 1        d = 0

and so the corresponding Lorentz transformation is

t' = t,  x' = -x,  y' = y,  z' = -z .
Thus the x and z coordinates have been reflected. This is certainly a proper
orthochronous Lorentz transformation, because the determinant is +1 and the coefficient
of t is positive. But does reflecting the x and z coordinates agree with the stereographic
effect on the Riemann sphere of the transformation w → -1/w? Note that the point w =
r + 0i maps to -1/r + 0i. There's a nice little geometric demonstration that the
stereographic projections of these points have coordinates (x,0,z) and (-x,0,-z)
respectively, noting that the two projection lines have negative inverse slopes and so are
perpendicular in the xz plane, which implies that they must strike the sphere on a
common diameter (by Pythagoras' theorem). A similar analysis shows that points off the
real axis with projected coordinates (x,y,z) in general map to points with projections
(-x,y,-z).
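This can also be confirmed by direct computation. The small Python sketch below (an illustration of the stereographic projection described above) projects w and -1/w onto the unit sphere and shows the x and z coordinates reversing sign:

    def project(w):
        # stereographic projection of the complex number w onto the unit sphere
        u, v = w.real, w.imag
        d = 1 + abs(w)**2
        return (2*u/d, 2*v/d, (abs(w)**2 - 1)/d)

    w = 0.7 + 0.4j
    print(project(w))       # (x, y, z)
    print(project(-1/w))    # (-x, y, -z): x and z reflected, y unchanged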
The two examples just covered were both trivial in the sense that they left t unchanged.
For a more interesting example, consider the Mobius transformation w → w + p, which
corresponds to the Lorentz transformation

If we denote our spacetime coordinates by the column vector X with components x0 = t,
x1 = x, x2 = y, x3 = z, then the transformation can be written as

X' = L X
where

To analyze this transformation it's worthwhile to note that we can decompose any Lorentz
transformation into the product of a simple boost and a simple rotation. For a given
relative velocity with magnitude |v| and components v1, v2, v3, let γ denote the "boost
factor"

γ = 1/(1 - |v|^2)^{1/2}
It's clear that

L00 = γ        L10 = γv1        L20 = γv2        L30 = γv3
Thus, these four components of L are fixed purely by the boost. The remaining
components depend on the rotational part of the transformation. If we define a "pure
boost" as a Lorentz transformation such that the two frames see each other moving with
velocities (v1,v2,v3) and (v1,v2,v3) respectively, then there is a unique pure boost for any
given relative velocity vector v1,v2,v3. This boost has the components

B00 = γ        B0j = Bj0 = γvj        Bjk = δjk + Q vj vk

where Q = (γ - 1)/|v|^2. From our expression for L we can identify the components to give
the boost velocity in terms of the Mobius parameter p

and

From these we write the pure boost part of L as follows

We know that our Lorentz transformation L can be written as the product of this pure
boost B times a pure rotation R, i.e., L = BR, so we can determine the rotation

R = B^{-1} L
which in this case gives

In terms of Euler angles, this represents a rotation about the y axis through an angle of

The correspondence between the coefficients of the Mobius transformation and the
Lorentz transformation described above assumes stereographic projection from the North
pole to the equatorial plane. More generally, if we're projecting from the North Pole of
the Riemann sphere to a complex plane parallel to (but not necessarily on) the equator,
and if the North Pole is at a height h above the plane, then every point in the plane is a
factor of h further away from the origin than in the case of equatorial projection (h=1), so
the Mobius transformation corresponding to the above Lorentz transformation is w →
(Aw+B)/(Cw+D) where

A = a        B = hb        C = c/h        D = d
It's also worth noting that the instantaneous aberration observed by an accelerating
observer does not differ from that observed by a momentarily co-moving inertial
observer. We're referring here to the null (light-like) rays incident on a point of zero
extent, so this is not like a finite spinning body whose outer edges have significant
velocities relative to their centers. We're just referring to different coordinate systems
whose origins coincide at a given point in spacetime, and describing how the light rays
pass through that point in terms of the different coordinate systems at that instant. In this
context the acceleration (or spinning) of the systems makes no difference to the answer. In
other words, as long as our inertial coordinate system has the same velocity and
orientation as the (ideal point-like) observer at the moment of the observation, it doesn't
matter if the observer is in the process of changing his orientation or velocity. (This is a
corollary of the "clock hypothesis" of special relativity, which asserts that a traveler's
time dilation at a given instant depends only on his velocity and not his acceleration at
that instant.)
In general we can classify Mobius transformations (and the corresponding Lorentz
transformations) according to their "squared trace", i.e., the quantity

σ = (a + d)^2 / (ad - bc)

This is also the "conjugacy parameter", i.e., two linear fractional transformations are
conjugate if and only if they have the same value of σ. The different kinds of
transformations are listed below:

0 ≤ σ < 4            elliptic
σ = 4                parabolic
σ > 4                hyperbolic
σ < 0 or not real    loxodromic

For example, the class of pure rotations is a special case of elliptic transformations,
having the form

f(z) = (az + b) / (-b̄z + ā)

with

aā + bb̄ = 1

where an overbar denotes complex conjugation. Also, it's not hard to show that the
compositions of an arbitrary linear fractional transformation f(z) are cyclical with a
period m if and only if σ = 4cos(πk/m)^2 for some integer k.
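The cyclical condition is easy to test numerically. In the Python sketch below (illustrative, using a transformation with fixed points 0 and 1 that is conjugate to z → Kz), σ equals 4cos(π/m)^2 and five compositions return every point to itself:

    import math

    m, k = 5, 1
    K = complex(math.cos(2*math.pi*k/m), math.sin(2*math.pi*k/m))
    a, b, c, d = K, 0.0, K - 1, 1.0       # f(z) = Kz/((K-1)z + 1)

    sigma = (a + d)**2 / (a*d - b*c)
    print(sigma, 4*math.cos(math.pi*k/m)**2)  # equal (up to rounding)

    z = 0.3 + 0.2j
    for _ in range(m):
        z = (a*z + b) / (c*z + d)
    print(z)   # returns to 0.3 + 0.2j after m compositions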
We've seen that the general finite transformation of the incoming null rays can be
expressed naturally in the form of a finite Mobius transformation of the complex plane
(under stereographic projection). This is a very simple algebraic operation, given by the
function

f(z) = (az + b)/(cz + d)                (3)
for complex constants a,b,c,d. This generates the discrete sequence f1(z) = f(z), f2(z) =
f(f(z)), f3(z) = f(f(f(z))), and so on for all fn(z) where n is a positive integer. It's also
possible to parameterize a Mobius transformation to give the corresponding infinitesimal
generator, which can be applied to give "fractional iterations" such as f1/2(z), or more
generally the continuously parameterized transformation fp(z) for any real (or even
complex) value of p. To accomplish this we must (in general) first map the discrete
generator f(z) to a domain in which it has some convenient exponential form, then apply
the pth-order transformation, and then map back to the original domain. There are
several cases to consider, depending on the character of the discrete generator.
In the degenerate case when ad = bc with c ≠ 0, the pth iterate of f(z) is simply the
constant fp(z) = a/c. On the other hand, if c = 0 and a = d ≠ 0, then fp(z) = z + (b/d)p.
The third case is with c = 0 and a ≠ d. The pth iterate of f(z) in this case is

fp(z) = (a/d)^p z + [b/(d - a)] [1 - (a/d)^p]
Notice that the second and third cases are really linear transformations, since c = 0. The
fourth case is with c ≠ 0 and (a+d)^2/(ad - bc) = 4, which leads to the following closed form
expression for the pth iterate

fp(z) = α + (z - α)/(1 + pβ(z - α))        where α = (a - d)/(2c),  β = 2c/(a + d)

This corresponds to the case when the two fixed points of the Mobius transformation are
co-incident. In this "parabolic" case, if a+d = 0 then the Mobius transformation reduces
to the first case with ad - bc = 0.
Finally, in the most general case we have c ≠ 0 and (a+d)^2/(ad - bc) ≠ 4, and the pth
iterate of f(z) is given by

where

This is the general case with two distinct fixed points. (If a+d = 0 then σ = 0 and K = -1.)
The parameters A and B are the coefficients of the linear transformation that maps the real
line to the locus of points with real part equal to 1/2. Notice that the pth composition of
f satisfies the relation

so we have

where

Thus h(f(z)) = K h(z), which shows that f(z) is conjugate to the simple function Kz.
Since A+B is the complex conjugate of B, we see that h(z) can be expressed as

where

This enables us to express the pth composition of any linear fractional transformation
with two fixed points, and therefore any corresponding Lorentz transformation, in the
form

This shows that there is a particular oriented frame of reference, represented by h(z), with
respect to which the relation between the oriented frames z and f(z) is purely
exponential. (We must refer to oriented frames rather than merely frames because the
Mobius transformation represents the effects of general orientation as well as velocity
boost.)
To show explicitly how the action of fp(z) on the complex plane varies with p, consider
the relatively simple linear fractional transformation f(z) with fixed points at 0 and 1 on
the real axis, which implies A = 1 and B = 0. In parameterized form the pth composition
of this transformation is of the form

fp(z) = K^p z / [(K^p - 1)z + 1]                (4)

for some complex constant K, and the similarity parameter for this transformation is
σ = (1+K)^2/K. For any given K and complex initial value z = x + iy, let

Then the real and imaginary components of fp(z) are given by

This makes explicit how the action of fp(z) on the complex plane is entirely determined
by the magnitude and phase angle of the constant K, which, as we saw previously, is
given by

If a,b,c,d are all real, then σ is real, in which case either K is real (σ > 4 or σ < 0) or K is
complex (0 < σ < 4) with a magnitude (norm) of 1. However, if a,b,c,d are allowed to be
complex, then K can be complex with a magnitude other than 1. Of course, if K is real,
we can set R = K and θ = 0, so θp = 0 for all p, and the above equations reduce to
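Whatever the particular reduced form, the parameterized iterate (4) can be checked numerically; the following Python sketch (illustrative, for the fixed-points-0-and-1 example with an arbitrary complex K) confirms that two half-iterations compose to the original transformation:

    K = 2.0 + 0.5j

    def f_p(z, p):
        # eq. (4): pth iterate of f for fixed points 0 and 1 (A = 1, B = 0)
        Kp = K**p
        return Kp*z / ((Kp - 1)*z + 1)

    z = 0.3 - 0.8j
    print(f_p(f_p(z, 0.5), 0.5))   # two half-iterates composed...
    print(f_p(z, 1.0))             # ...equal one whole iterate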
Clearly the computational complexity of the continuous parameterized transformation (4)
exceeds that of the discrete transformation (3). This raises an interesting question, at
least from a neo-Platonic perspective. Is it possible that nature prefers the simplicity of
the discrete form over the continuous? In other words, are all physically realizable
Lorentz transformations actually discrete? If so, what determines the "size" of the
minimum transformation? What is the "size" of a Mobius transformation?
We know that every Mobius transformation is conjugate to a pure exponential, the effect of
which is a rotation and a re-scaling. In addition, the conjugation itself may impose some kind of
size. It's interesting that the elements now are not frames but differences between frames,
including rotations. Thus, rather than the ontological objects of consideration being
events of the form x,y,z,t, or even coordinate systems, the objects are the transformations
with complex coefficients a,b,c,d. This again introduces the octonion space, though
restricted by the fact that there are only three (complex) degrees of freedom.
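As a concrete illustration of the conjugation h(f(z)) = K h(z) described above, the following
short Python sketch (not part of the original text) computes fractional iterates numerically
by mapping the two fixed points to 0 and infinity, applying K^p, and mapping back. The
coefficients and the test point are arbitrary choices, and the transformation is assumed to
be non-degenerate with c ≠ 0 and distinct fixed points.

    import cmath

    def mobius_power(a, b, c, d, p, z):
        # fixed points of f(z) = (az+b)/(cz+d): roots of c*z^2 + (d-a)*z - b = 0
        disc = cmath.sqrt((d - a)**2 + 4*b*c)
        z1 = (a - d + disc)/(2*c)
        z2 = (a - d - disc)/(2*c)
        K = (a*d - b*c)/(c*z1 + d)**2       # multiplier: h(f(z)) = K*h(z)
        w = (z - z1)/(z - z2)               # h maps the fixed points to 0 and infinity
        w = K**p * w                        # the p-th iterate acts as w -> K^p * w
        return (z1 - z2*w)/(1 - w)          # map back through the inverse of h

    a, b, c, d = 3+1j, 1, 2, 1-1j           # arbitrary coefficients with c != 0
    z0 = 0.5 + 0.25j
    print(mobius_power(a, b, c, d, 1, z0), (a*z0 + b)/(c*z0 + d))   # p = 1 gives f
    half = mobius_power(a, b, c, d, 0.5, z0)
    print(mobius_power(a, b, c, d, 0.5, half))   # the half-iterate applied twice also gives f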

2.7 The Sagnac Effect


Blind unbelief is sure to err,
And scan his work in vain;
God is his own interpreter,
And he will make it plain.
William Cowper, 1780
If two pulses of light are sent in opposite directions around a stationary circular loop of
radius R, they will travel the same inertial distance at the same speed, so they will
arrive at the end point simultaneously. This is illustrated in the left-hand figure below.

The figure on the right indicates what happens if the loop itself is rotating during this
procedure. The symbol θ denotes the angular displacement of the loop during the time
required for the pulses to travel once around the loop. For any positive value of θ, the
pulse traveling in the same direction as the rotation of the loop must travel a slightly
greater distance than the pulse traveling in the opposite direction. As a result, the
counter-rotating pulse arrives at the "end" point slightly earlier than the co-rotating pulse.
Quantitatively, if we let ω denote the angular speed of the loop, then the circumferential
tangent speed of the end point is v = ωR, and the sum of the speeds of the wave front and
the receiver at the "end" point is c − v in the co-rotating direction and c + v in the
counter-rotating direction. Both pulses begin with an initial separation of 2πR from the end
point, so the difference between the travel times is

    Δt = 2πR/(c − v) − 2πR/(c + v) = 4πR²ω/(c² − v²) = 4Aω/(c² − v²)

where A = πR² is the area enclosed by the loop. This analysis is perfectly valid in both the
classical and the relativistic contexts. Of course, the result represents the time difference
with respect to the axis-centered inertial frame. A clock attached to the perimeter of the
ring would, according to special relativity, record a lesser time, by the factor √(1 − (v/c)²),
so the Sagnac delay with respect to such a clock would be [4Aω/c²]/√(1 − (v/c)²).
However, the characteristic frequency of a given light source co-moving with this clock
would be greater, compared to its reduced value in terms of the axis-centered frame, by
precisely the same factor, so the actual phase difference of the beams arriving at the
receiver is invariant. (It's also worth noting that there is no Doppler shift involved in a
Sagnac device, because each successive wave crest in a given direction travels the same
distance from transmitter to receiver, and clocks at those points show the same lapse of
proper time, both classically and in the context of special relativity.)
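The arithmetic of the circular case is easily checked directly. This minimal Python sketch
(with arbitrary illustrative values of R and ω, neither taken from the text) compares the
exact time difference with the closed-form and first-order expressions derived above.

    import math
    c = 299792458.0                         # speed of light, m/s
    R, w = 1.0, 100.0                       # arbitrary loop radius (m) and rotation rate (rad/s)
    v = w*R                                 # tangential speed of the end point
    A = math.pi*R**2                        # enclosed area
    t_co      = 2*math.pi*R/(c - v)         # co-rotating pulse closes the gap at c - v
    t_counter = 2*math.pi*R/(c + v)         # counter-rotating pulse closes it at c + v
    print(t_co - t_counter)                 # exact difference in arrival times
    print(4*A*w/(c**2 - v**2))              # closed form derived above
    print(4*A*w/c**2)                       # first-order approximation for v << c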
This phenomenon applies to any closed loop, not necessarily circular. For example,
suppose a beam of light is split by a half-silvered mirror into two beams, and those beams
are directed in a square path around a set of mirrors in opposite directions as shown
below.

Just as in the case of the circular loop, if the apparatus is unaccelerated, the two beams
will travel equal distances around the loop, and arrive at the detector simultaneously and
in phase. However, if the entire device (including source and detector) is rotating, the
beam traveling around the loop in the direction of rotation will have farther to go than the
beam traveling counter to the direction of rotation, because during the period of travel the
mirrors and detector will all move (slightly) toward the counter-rotating beam and away
from the co-rotating beam. Consequently the beams will reach the detector at slightly
different times, and slightly out of phase, producing optical interference "fringes" that can
be observed and measured.
Michelson had proposed constructing such a device in 1904, but did not pursue it at the
time, since he realized it would show only the absolute rotation of the device. The effect
was first demonstrated in 1911 by Harress (unwittingly) and in 1913 by Georges Sagnac,
who published two brief notes in the Comptes Rendus describing his apparatus and
summarizing the results. He wrote
The result of measurements shows that, in ambient space, the light is propagated
with a speed V0, independent of the overall movement of the source of light O and
optical system.
This rules out the ballistic theory of light propagation (as advocated by Ritz in 1909),
according to which the speed of light is the vector sum of the velocity of the source plus a
vector of magnitude c. Ironically, the original Michelson-Morley experiment was
consistent with the ballistic theory, but inconsistent with the naive ether theory, whereas
the Sagnac effect is consistent with the naive ether theory but inconsistent with the
ballistic theory. Of course, both results are consistent with the fully relativistic theories of
Lorentz and Einstein, since according to both theories light is propagated at a speed
independent of the state of motion of the source.
Because of the incredible precision of interferometric techniques, devices like this are
capable of detecting and measuring extremely small amounts of absolute rotation. One of
the first applications of this phenomenon was an experiment performed by Michelson and
Gale in 1925 to measure the absolute rotation rate of the Earth by means of a rectangular
optical loop 2/5 mile long and 1/5 mile wide. (See below for Michelson's comments on
this experiment.) More recently, the invention of lasers around 1963 has led to practical
small-scale devices for measuring rotation by exploiting the Sagnac effect. There are two
classes of such devices, namely, ring interferometers and ring lasers. A ring interferometer
typically consists of many windings of fiber optic lines, conducting light (of a fixed
frequency) in opposite directions around a loop, and then recombining them to measure
the phase difference, just as in the original Sagnac apparatus, but with greater efficiency
and sensitivity. A ring laser, on the other hand, consists of a laser cavity in the shape of a
ring, which allows light to circulate in both directions, producing two standing waves
with the same number of nodes in each direction. Since the optical path lengths in the two
directions are different, the resonant frequencies of the two standing waves are also
different. (In practice it is typically necessary to dither the ring to prevent phase locking
of the two modes.) The beat between the two frequencies is measured, giving a result
proportional to the rotation rate of the device. Incidentally, it isn't necessary for the actual
laser cavity to circumscribe the entire loop; longitudinal pumping can be used, driven by
feedback carried in opposite directions around the loop in ordinary optical fibers.
(Needless to say, the difference in resonant frequency of the two standing waves in a ring
laser due to the different optical path lengths is not to be confused with a Doppler shift.)
Today such devices are routinely used in guidance and navigation systems for
commercial airliners, nautical ships, spacecraft, and in many other applications, and are
capable of detecting rotation rates as slight as 0.00001 degree per hour.
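For a rough sense of scale, the sketch below evaluates the standard ring-laser scale-factor
relation Δf = 4AΩ/(λP); this formula and the device dimensions are supplied here from
general knowledge rather than from the text, so treat the numbers as illustrative only.

    import math
    Omega = 7.292e-5                        # Earth's rotation rate, rad/s
    side, lam = 0.10, 633e-9                # assumed 10 cm square HeNe ring laser
    A, P = side**2, 4*side                  # enclosed area and perimeter
    print(4*A*Omega/(lam*P))                # beat frequency, roughly 10 Hz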
We saw previously that the time delay (and therefore the difference in the optical path
lengths) for a circular loop is proportional to the area enclosed by the loop. This
interesting fact actually applies to arbitrary closed loops. To prove this, we will derive the
difference in arrival times of the two pulses of light for an arbitrary polygonal loop
inscribed in a circle. Let the (inertial) coordinates of two consecutive mirrors separated
by a subtended angle θ be

    x₁(t) = R cos(ωt)        x₂(t) = R cos(ωt + θ)
    y₁(t) = R sin(ωt)        y₂(t) = R sin(ωt + θ)

where ω is the angular velocity of the device. Since light rays travel along null intervals,
we have c²(dt)² = (dx)² + (dy)², so the coordinate time ΔT required for a light pulse to
travel from one mirror to the next in the forward and reverse directions satisfies the
relation

    c²(ΔT)² = 2R²[1 − cos(θ + ωΔT)]

where ΔT is taken positive for the forward (co-rotating) and negative for the reverse
(counter-rotating) direction.
Typically ωΔT is extremely small, i.e., the polygon doesn't rotate through a very large
angle in the time it takes light to go from one mirror to the next, so we can expand this
equation in ΔT (up to second order) and collect powers of ΔT to give the quadratic

    (c² − v²cos(θ))(ΔT)² − 2R²ω sin(θ)ΔT − 2R²(1 − cos(θ)) = 0

The two roots of this polynomial are the values of ΔT, one positive and one negative, for
the co-rotating and counter-rotating solutions, so the difference in the absolute times is
the sum of these roots. Hence we have

    δt = 2R²ω sin(θ)/(c² − v²cos(θ))
This is the net contribution of this edge to the total time increment. Recalling that the area
of a regular n-sided polygon of radius R is nR²sin(2π/n)/2, the area of the triangle formed
by the hub and the two mirrors is R²sin(θ)/2. It follows that each edge of an arbitrary
polygonal loop inscribed in a circle contributes 4Aᵢω/(c² − v²cos(θ)) to the total time
discrepancy, where Aᵢ is the area of the ith triangular slice of the loop and v = ωR is the
tangential speed of the mirrors. Therefore, the total discrepancy in travel times for the
co-rotating and counter-rotating beams around the entire loop is simply

    Δt = Σᵢ 4Aᵢω/(c² − v²cos(θᵢ)) = 4Aω/(c² − v²cos(θ))

where A is the total area enclosed in the loop. This applies to polygons with any number
of sides, including the limiting case of circular fiber-optic loops with virtually infinitely
many edges (where the "mirrors" are simply the inner reflective lining of the fiber-optic
cable), in which case θ goes to zero and the denominator of the phase difference is simply
c² − v². For realistic values of v (i.e., very small compared with c), the phase difference
reduces to the well-known result 4Aω/c². It's worth noting that nothing in this derivation
is unique to special relativity, because the Sagnac effect is a purely "classical" effect. The
apparatus is set up as a differential device, so the relativistic effects apply equally in both
directions, and hence the higher-order corrections of special relativity cancel out of the
phase difference.
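The polygonal result can also be verified numerically. The sketch below (not from the
text) solves the exact mirror-to-mirror crossing condition for each direction separately by
fixed-point iteration and compares the summed discrepancy with the closed-form
expression; the values of ω and n are arbitrary small/illustrative choices, in units with c = 1.

    import math
    c, R, w, n = 1.0, 1.0, 1e-3, 6
    theta = 2*math.pi/n
    v = w*R

    def crossing_time(sign):
        T = R*theta/c                       # initial guess: the non-rotating transit time
        for _ in range(100):                # fixed-point iteration converges quickly here
            T = math.sqrt(2*R*R*(1 - math.cos(theta + sign*w*T)))/c
        return T

    dT = crossing_time(+1) - crossing_time(-1)   # per-edge co- minus counter-rotating time
    A = 0.5*n*R*R*math.sin(theta)                # area of the regular n-gon
    print(n*dT)                                  # exact total discrepancy
    print(4*A*w/(c**2 - v**2*math.cos(theta)))   # closed-form expression above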
Despite the ease and clarity with which special relativity accounts for the Sagnac effect,
one occasionally sees claims that this effect entails a conflict with the principles of
special relativity. The usual claim is that the Sagnac effect somehow falsifies the
invariance of light speed with respect to all inertial coordinate systems. Of course, it does
no such thing, as is obvious from the fact that the simple description of an arbitrary
Sagnac device given above is based on isotropic light speed with respect to one particular
system of inertial coordinates, and all other inertial coordinate systems are related to this
one by Lorentz transformations, which are defined as the transformations that preserve
light speed. Hence no description of a Sagnac device in terms of any system of inertial
coordinates can possibly entail non-isotropic light speed, nor can any such description
yield physically observable results different from those derived above (which are known
to agree with experiment).
Nevertheless, it remains a seminal tenet of anti-relativityism (for lack of a better term)
that the trivial Sagnac effect somehow "disproves relativity". Those who espouse this
view sometimes claim that the expressions "c+v" and "c−v" appearing in the derivation of
the phase shift are prima facie proof that the speed of light is not c with respect to some
inertial coordinate system. When it is pointed out that those quantities do not refer to the
speed of light, but rather to the sum and difference of the speed of light and the speed of
some other object, both with respect to a single inertial coordinate system, which can be
as great as 2c according to special relativity, the anti-relativityists are undaunted, and
merely proceed to construct progressively more convoluted and specious "objections".
For example, they sometimes argue that each point on the perimeter of a rotating circular
Sagnac device is always instantaneously at rest in some inertial coordinate system, and
according to special relativity the speed of light is precisely c in all directions with
respect to any inertial system of coordinates, so (they argue) the speed of light must be
isotropic at every point around the entire circumference of the loop, and hence the light
pulses must take an equal amount of time to traverse the loop in either direction. Needless
to say, this "reasoning" is invalid, because the pulses of light are never (let alone always)
at the same point in the loop at the same time during their respective trips around the loop
in opposite directions. At any given instant the point of the loop where one pulse is
located is necessarily accelerating with respect to the instantaneous inertial rest frame of
the point on the loop where the other pulse is located (and vice versa). As noted above,
it's self-evident that since the speed of light is isotropic with respect to at least one
particular frame of reference, and since every other frame is related to that frame by a
transformation that explicitly preserves light speed, no inconsistency with the invariance
of the speed of light can arise.
Having accepted that the observable effects predicted by special relativity for a Sagnac
device are correct and entail no logical inconsistency, the dedicated opponents of special
relativity sometimes resort to claims that there is nevertheless an inconsistency in the
relativistic interpretation of what's really happening locally around the device in certain
extreme circumstances. The fundamental fallacy underlying such claims is the idea that
the beams of light are traveling the same, or at least congruent, inertial paths through
space and time as they proceed from the source to the detector. If this were true, their
inertial speeds would indeed need to differ in order for their arrival times at the detector
to differ. However, the two pulses do not traverse congruent paths from emission to
detector (assuming the device is absolutely rotating). The co-rotating beam is traveling
slightly farther than the counter-rotating beam in the inertial sense, because the detector
is moving away from the former and toward the latter while they are in transit. Naturally
the ratio of optical path lengths is the same with respect to any fixed system of inertial
coordinates.
It's also obvious that the absolute difference in optical path lengths cannot be
"transformed away", e.g., by analyzing the process with respect to coordinates rigidly
attached to and rotating along with the device. We can, of course, define a system of
coordinates in terms of which the position of a point fixed on the disk is independent of
the time coordinate, but such coordinates are necessarily rotating (accelerating), and
special relativity does not entail invariant or isotropic light speed with respect to
non-inertial coordinates. (In fact, one need only consider the distant stars circumnavigating
the entire galaxy every 24 hours with respect to the Earth's rotating system of reference to
realize that the limiting speed of travel is generally not invariant and isotropic in terms of
accelerating coordinates.) A detailed analysis of a Sagnac device in terms of non-inertial
(i.e., rotating) coordinates is presented in Section 4.8, and discussed from a different
point of view in Section 5.1. For the present, let's confine our attention to inertial
coordinates, and demonstrate how a Sagnac device is described in terms of
instantaneously co-moving inertial frames of an arbitrary point on the perimeter.
Suppose we've sent a sequence of momentary pulses around the loop, at one-second
intervals, in both directions, and we have photo-detectors on each mirror to detect when
they are struck by a co-rotating or counter-rotating pulse. Clearly the pulses will strike
each mirror at one-second intervals from both directions (though not necessarily
synchronized) because if they were arriving more frequently from one direction than
from the other, the secular lag between corresponding pulses would be constantly
increasing, which we know is not the case. So each mirror is receiving one pulse per
second from both directions. Furthermore, a local measurement of light speed performed
(over a sufficiently short period of time) by an observer riding along at a point on the
perimeter will necessarily show the speed of light to be c in all directions with respect to
his instantaneously co-moving inertial coordinates. However, this system of coordinates
is co-moving with only one particular point on the rim. At other points on the rim these
coordinates are not co-moving, and so the speed of light is not c at other points on the rim
with respect to these coordinates.
To describe this in detail, let's first analyze the Sagnac device from the hub-centered
inertial frame. Throughout this discussion we assume an n-sided polygonal loop where n
is very large, so the segment between any two adjacent mirrors subtends only a very
small angle. With respect to the hub-centered frame each segment is moving with a
velocity v parallel to the direction of travel of the light beams, so the situation on each
segment is as plotted below in terms of hub-frame coordinates:

In this drawing, tf is the time required for light to cross this segment in the co-rotating
direction, and tr is the time required for light to cross in the counter-rotating direction.
The difference between these two times, denoted by dt, is the incremental Sagnac effect
for a segment of length dp on the perimeter.
Now, the ratio of dt/dp as a function of the rim velocity v can easily be read off this
diagram, and we find that

    dt/dp = 2v/(c² − v²)
This can be taken as a measure of the anisotropy over an incremental segment with
respect to the hub frame. (Notice that this anisotropy with respect to the conventional
relativistic spacetime decomposition for any inertial frame is actually in the distance
traveled, not the speed of travel.) All the segments are symmetrical in this frame, so they
all have this same anisotropy. Therefore, we can determine the total difference in travel
times for co-rotating and counter-rotating beams of light making a complete trip around
the loop by integrating dt around the perimeter. Thus we have

    ΔT = ∮ (dt/dp) dp = [2v/(c² − v²)](2πr)

Substituting ωr in place of v in the numerator, and noting that the enclosed area is A =
πr², we again arrive at the result ΔT = 4Aω/(c² − v²).
Now let's analyze the loop with respect to one of our tangential frames of reference, i.e.,
an inertial frame that is momentarily co-moving with one of the segments on the rim. If
we examine the situation on that particular segment in terms of its own co-moving
inertial frame we find, not surprisingly, the situation shown below:

This shows that dt/dp = 0, meaning no anisotropy at all. Nevertheless, if the light beams
are allowed to go all the way around the loop, their total travel times will differ by T as
computed above, so how does that difference arise with respect to this tangential frame?
Notice that although dt/dp equals zero at this tangent point with respect to the tangent
frame, segments 90 degrees away from this point have the same anisotropy as we found
for all the segments relative to the hub frame, namely, dt/dp = 2v/(c² − v²), because the
velocity of those two segments relative to our tangential frame is exactly v along the
direction of the light rays, just as it was with respect to the hub frame. Furthermore, the
segment 180 degrees away from our tangent segment has twice the anisotropy as it has
with respect to the original hub-frame inertial coordinates, because that segment has a
velocity of 2v with respect to our tangential frame.
In general, the anisotropy dt/dp can be computed for any segment on the loop simply by
determining the projection of that segment's velocity (with respect our tangential frame)
onto the axis of the light rays. This gives the results illustrated below, showing the ratio
of the tangential frame anisotropy to the hub frame anisotropy:

It's easy to show that

    dt/dp = [2v/(c² − v²)][1 − cos(θ)]

where θ is the angle relative to the tangent point. To assess the total difference in arrival
times for light rays going around the loop in opposite directions, we need to integrate dt
by dp around the perimeter. Noting that θ equals p/r, we have

    ΔT = ∫₀^2πr [2v/(c² − v²)][1 − cos(p/r)] dp = 4πrv/(c² − v²)

which again equals 4Aω/(c² − v²), in agreement with the hub frame analysis. Thus,
although the anisotropy is zero at each point on the rim's surface when evaluated with
respect to that point's co-moving inertial frame, we always arrive at the same overall nonzero anisotropy for the entire loop. This was to be expected, because the absolute
physical situation and intervals are the same for all inertial frames. We're simply
decomposing those absolute intervals into space and time components in different ways.
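A direct numerical integration of the tangential-frame anisotropy profile confirms this
agreement; the rim speed below is an arbitrary choice, in units with c = 1.

    import math
    c, r, w = 1.0, 1.0, 0.3                 # arbitrary radius and rotation rate
    v = w*r
    N = 100000
    total = 0.0
    for k in range(N):                      # integrate (dt/dp) dp with theta = p/r
        theta = 2*math.pi*(k + 0.5)/N
        total += (2*v/(c**2 - v**2))*(1 - math.cos(theta))*(2*math.pi*r/N)
    print(total)                            # tangential-frame total anisotropy
    print(4*math.pi*r*r*w/(c**2 - v**2))    # hub-frame result 4*A*w/(c^2 - v^2)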
The union of all the "present" time slices of the sequence of instantaneous co-moving
inertial coordinate systems for a point fixed on the rim of a rotating disk, with each time
slice assigned a time coordinate equal to the proper time of the fixed point, constitutes a
coherent and unambiguous coordinate system over a region of spacetime that includes the
entire perimeter of the disk. The general relation for mapping the proper time of one
worldline into another by means of the co-moving planes of simultaneity of the former is
derived at the end of Section 2.9, where it is shown that the derivative of the mapped time
from a point fixed on the rim to a point at the same radius fixed in the hub frame is
positive provided the rim speed is less than c. Of course, for locations further from the
center of rotation the planes of simultaneity of a revolving point fixed on the rim will
become "retrograde", i.e., will backtrack, making the coordinate system ambiguous. This
occurs for locations at a distance greater than 1/a from the hub, where a is the
acceleration of the point fixed on the rim.
It's also worth noting that the amount of angular travel of the device during the time it
takes for one pair of light pulses to circumnavigate a circular loop is directly proportional
to the net "anisotropy" in the travel times. To prove this, note that in a circular Sagnac
device of radius R the beam of light in the direction of rotation travels a distance of
(2π + ωt₁)R and the other beam goes a distance of (2π − ωt₂)R, where t₁ and t₂ are the
travel times of the two beams, and ω is the angular velocity of the loop. The travel times
of the beams are just these distances divided by c, so we have

    ct₁ = (2π + ωt₁)R        ct₂ = (2π − ωt₂)R

Solving for the times gives

    t₁ = 2πR/(c − ωR)        t₂ = 2πR/(c + ωR)

so the difference in times is

    t₁ − t₂ = 4πR²ω/(c² − v²) = 4Aω/(c² − v²)

where A = πR² and v = ωR. The "anisotropic ratio" is the ratio of the travel times,
which is

    t₁/t₂ = (c + v)/(c − v)
Solving this for R gives

    R = (c/ω)(t₁/t₂ − 1)/(t₁/t₂ + 1)

Letting δ denote the angular travel of the loop during the transit of the pair of light
pulses (which is complete only when the slower, co-rotating pulse arrives, so δ = ωt₁),
we have

    δ = 2πRω/(c − ωR)

Substituting for R, this reduces to

    t₁/t₂ − 1 = δ/π
Therefore, the amount by which the ratio of travel times differs from 1 is exactly
proportional to the angle through which the loop rotates during the transit of light, and
this is true independent of R. (Of course, increasing the radius has the effect of increasing
the difference between the travel times, but it doesn't alter the ratio.)
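Assuming the reconstructed expressions above (with δ = ωt₁), this quick check shows
the ratio of travel times exceeding 1 by exactly δ/π, independent of R:

    import math
    c = 1.0
    for R, w in ((1.0, 0.01), (5.0, 0.02), (100.0, 0.001)):
        v = w*R
        t1 = 2*math.pi*R/(c - v)            # co-rotating travel time
        t2 = 2*math.pi*R/(c + v)            # counter-rotating travel time
        delta = w*t1                        # loop's angular travel during the transit
        print(t1/t2 - 1, delta/math.pi)     # equal for every R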
It's worth emphasizing that the Sagnac effect is purely a classical, not a relativistic
phenomenon, because it's a "differential device", i.e., by running the light rays around the
loop in opposite directions and measuring the time difference, it effectively cancels out
the "transverse" effects characteristic of truly relativistic phenomena. For example, the
length of each incremental segment around the perimeter is shorter by a factor of
√(1 − (v/c)²) in the hub-based frame than in its co-moving tangential frame, but this factor
applies in both directions around the loop, so it doesn't affect the differential time.
Likewise a clock on the perimeter moving at the speed v runs slow, in accord with special
relativity, but the frequency of the light source is correspondingly slow, and this applies
equally in both directions, so this does not affect the phase difference at the receiver.
Thus, a pure Sagnac apparatus does not discriminate between relativistic and
pre-relativistic theories (although it does rule out ballistic theories, à la Ritz). Ironically, this is
the main reason it comes up so often in discussions of relativity, because the effect can
easily be computed on a non-relativistic basis (as we did above for a circular loop, taking
the sums c + v and c − v to determine the transit times in the two directions). Of course, if
the light traveling around the loop passes through moving media with indices of
refraction differing significantly from unity, then the Fizeau effect must also be taken into
account, and in this case the results, while again perfectly consistent with special
relativity, are quite problematic for any non-relativistic ether-based interpretation.
As mentioned above, as early as 1904 Michelson had proposed using such a device to
measure the rotation of the earth, but he hadn't pursued the idea, since measurements of
absolute rotation are fairly commonplace (e.g., Foucault's pendulum). Nevertheless, he
(along with Gale) agreed to perform the experiment in 1925 (at considerable cost) at the
urging of "relativists", who wished him to verify the shift of 236/1000 of a fringe
predicted by special relativity. This was intended mainly to refute the ballistic theory of
light propagation, which predicts zero phase shift (for a circular device). Michelson was
not enthusiastic, since classical optics on the assumption of a stationary ether predicted
exactly the same shift as does special relativity (as explained above). He said
We will undertake this, although my conviction is strong that we shall prove only
that the earth rotates on its axis, a conclusion which I think we may be said to be
sure of already.
As Harvey Lemon wrote in his biographical sketch of Michelson, "The experiment,
performed on the prairies west of Chicago, showed a displacement of 230/1000, in very
close agreement with the prediction. The rotation of the Earth received another
independent proof, the theory of relativity another verification. But neither fact had much
significance." Michelson himself wrote that "this result may be considered as an
additional evidence in favor of relativity - or equally as evidence of a stationary ether".
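The quoted prediction can be reproduced from the area formula derived earlier. In the
sketch below the effective wavelength (roughly 570 nm) and the latitude of the Clearing,
Illinois site are assumed values, not taken from the text, so the agreement should be read
as approximate.

    import math
    c = 299792458.0
    mile = 1609.34
    A = (2*mile/5)*(mile/5)                 # enclosed area of the rectangle, m^2
    Omega = 7.292e-5                        # Earth's sidereal rotation rate, rad/s
    lat = math.radians(41.77)               # assumed latitude of the Clearing site
    dt = 4*A*Omega*math.sin(lat)/c**2       # only the vertical component of Omega counts
    lam = 570e-9                            # assumed effective wavelength
    print(dt*c/lam)                         # fringe shift, about 0.236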
The only significance of the Sagnac effect for special relativity (aside from providing
another refutation of ballistic theories) is that although the effect itself is of the first order
in v/c, the qualitative description of the local conditions on the disk in terms of inertial
coordinates depends on second-order effects. These effects have been confirmed
empirically by, for example, the Michelson-Morley experiment. Considering the Earth as
a particle on a large Sagnac device as it orbits around the Sun, the ether drift experiments
demonstrate these second-order effects, confirming that the speed of light is indeed
invariant in terms of relatively moving systems of inertial coordinates.
2.8 Refraction At A Plane Boundary Between Moving Media
Mathematicians usually consider the Rays of Light to be Lines reaching
from the luminous Body to the Body illuminated, and the refraction of
those Rays to be the bending or breaking of those lines in their passing out
of one Medium into another. And thus may Rays and Refractions be
considered, if Light be propagated in an instant. But by an Argument
taken from the Equations of the times of the Eclipses of Jupiter's Satellites,
it seems that Light is propagated in time, spending in its passage from the
Sun to us about seven Minutes of time: And therefore I have chosen to
define Rays and Refractions in such general terms as may agree to Light
in both cases.
Isaac Newton, Opticks, 1704
The ray angles θ₁ and θ₂ for incident and refracted optical rays at a plane boundary
between regions of constant indices of refraction n₁ and n₂ are related according to Snell's
law

    n₁ sin(θ₁) = n₂ sin(θ₂)
However, this formula applies only if the media (which are assumed to have isotropic
index of refraction with respect to their rest frames) are at rest relative to each other. If
the media are in relative transverse motion, it is necessary to account for the effect of
aberration on the ray angles relative to the rest frames of the respective media. The result
is that the effective refraction is a function of the relative transverse velocity of the
media. Thus, measurements of the optical refraction could (in principle) be used to
determine the velocity of a moving volume of fluid. Unlike Doppler shift measurement
techniques, this approach does not rely on the presence of discrete particles in the fluid,
and involves only measurements of direct, rather than reflected, light signals.
Since the amount of refraction at a boundary depends on the angle of incidence with
respect to the rest frames of the media, it follows that if the media have different rest
frames the simple form of Snell's law does not apply directly, because it will be necessary
to account for aberration. To derive the law of refraction for transversely moving media,
consider the arrangement shown in Figure 1, drawn with respect to a system of
coordinates (x,y,t) relative to which the medium with refractive index n1 is at rest.

In these coordinates the medium with index n2 is moving transversely with a speed v. By
both Fermats principle of least time and the principles of quantum electrodynamics,
we know that the path of light from point P0 to point P2 is such that the travel time is
stationary (which, in this case, means minimized), so if we express the total travel time as
a function of the x coordinate of the corner point P1, we can differentiate to find the
position that minimizes the time, and from this we can infer the angles of incidence and
refraction.
With respect to the xyt coordinates in which the n₁ medium is at rest, the squared spatial
distance from P0 to P1 is x₁² + y₁², so the time required for light to traverse that distance
is

    t₁ − t₀ = n₁√(x₁² + y₁²)
On the other hand, for the trip from point P1 to point P2 we need to know the distance
traveled with respect to the coordinates x'y't' in which the n₂ medium is at rest. If we
define

    Δx = x₂ − x₁        Δy = y₂ − y₁        Δt = t₂ − t₁

then the Lorentz transformation gives us the corresponding increments in the primed
coordinates

    Δx' = (Δx − vΔt)/√(1 − v²)        Δy' = Δy        Δt' = (Δt − vΔx)/√(1 − v²)
Therefore, the squared spatial and temporal distances from P1 to P2 in the n₂ rest
coordinates are given by (Δx')² + (Δy')² and (Δt')² respectively. Since the ratio of these
increments equals the square of the speed of light in the n₂ medium, which is 1/n₂², we
have

    (Δx')² + (Δy')² = (Δt')²/n₂²
Solving this quadratic for Δt, which equals the travel time from P1 to P2, gives

Differentiating with respect to x₁, and noting that d(Δx)/dx₁ = −1, we can minimize the
total travel time t₂ − t₀ by adding the derivatives of Δt and t₁ − t₀ with respect to x₁, and
setting the result to zero. This leads to the condition

Making the substitutions

we arrive at the equation for refraction at the plane boundary between transversely
moving media

As expected, this reduces to Snell's law for stationary media if we set v = 0. Also, if the
moving medium has a refractive index of n₂ = 1, this equation again reduces to Snell's
law, regardless of the velocity, because the concept of speed doesn't apply to the vacuum.
If we define the parameter

then the refraction equation can be written more compactly as

This can be solved explicitly for sin(θ₂) to give the result

with the appropriate sign for the square root. Taking n₁ = 1.2 and n₂ = 1.5, the figure
below shows the angle of refraction θ₂ as a function of the transverse speed v of the
medium with various angles of incidence θ₁ ranging from −3π/8 to +3π/8 radians.
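The construction described above also lends itself to a direct numerical treatment. The
following sketch (an independent check, not the text's closed-form solution) minimizes
the total travel time over the corner point P1. The geometry is an assumed illustrative
choice: the boundary lies along the x axis, medium 1 fills y > 0, medium 2 slides in the
+x direction at speed v, and c = 1.

    import math

    def travel_time(x, n1, n2, v, P0=(0.0, 1.0), P2=(1.0, -1.0)):
        # leg P0 -> P1 = (x, 0) through the resting medium (light speed 1/n1)
        t_a = n1*math.hypot(x - P0[0], P0[1])
        dx, dy = P2[0] - x, P2[1]           # coordinate increments for P1 -> P2
        g2 = 1.0/(1.0 - v*v)                # gamma squared for the boost along x
        # quadratic a*dt^2 + b*dt + cc = 0 from (dx'^2 + dy'^2) = dt'^2/n2^2
        a  = g2*(v*v - 1.0/n2**2)
        b  = 2.0*g2*v*dx*(1.0/n2**2 - 1.0)
        cc = g2*dx*dx*(1.0 - v*v/n2**2) + dy*dy
        disc = math.sqrt(b*b - 4.0*a*cc)
        dt = min(t for t in ((-b + disc)/(2*a), (-b - disc)/(2*a)) if t > 0)
        return t_a + dt

    def corner(n1, n2, v):                  # brute-force Fermat minimization over x
        xs = [i/5000.0 for i in range(-15000, 20000)]
        return min(xs, key=lambda x: travel_time(x, n1, n2, v))

    n1, n2 = 1.2, 1.5
    x0 = corner(n1, n2, 0.0)                # stationary case recovers Snell's law
    print(n1*x0/math.hypot(x0, 1.0), n2*(1.0 - x0)/math.hypot(1.0 - x0, 1.0))
    x1 = corner(n1, n2, 0.5)                # transversely moving medium, v = 0.5
    print(math.degrees(math.atan2(1.0 - x1, 1.0)))   # the refracted angle shifts with v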

Incidentally, when plotting these lines it is necessary to take the positive root when v is
above the zero-crossing speed, and the negative root when v is below. The zero-crossing
speed (i.e., the speed v when the refracted angle is zero) is

The figure shows that at high relative speeds and high angle of incidence we can achieve
total internal reflection, even though the downstream medium is more dense than the
upstream medium. The critical conditions occur when the squared quantity in parentheses
in the preceding equation reaches 1, which implies

Solving these two quadratics for v (remembering that θ₂ is a function of v), we have the
four distinguished speeds

The two speeds given by ±1/n₂ (which are just the speeds of light in the moving
medium) generally correspond to removable singularities, because both the numerator
and denominator of the expression for sin(θ₂) vanish. At these speeds the values of θ₂ can
be assigned continuously as

It isn't clear what, if any, optical effects would appear at these two removable
singularities. The other two distinguished speeds represent the onset of total internal
reflection if their values fall in the range from -1 to +1. For example, the figure above
shows that total internal reflection for an incident angle of θ₁ = 3π/8 with n₁ = 1.2 and n₂ =
1.5 begins when the speed v exceeds

Notice that for an incidence angle of zero, this speed is simply n2, which is ordinarily
greater than 1, and thus outside the range of achievable speeds (since we assume the
medium itself is moving through a vacuum). However, for non-zero angles of incidence it
is possible for one of these two critical speeds to lie in the achievable range. In fact, for
certain values of n₁, n₂, and θ₁, it is possible for all four of the critical speeds to lie within
the achievable range, leading to some interesting phenomena. For example, with n₁ = n₂ =
2.5 and with θ₁ = 45 degrees, the refracted angle as a function of medium speed is as

shown below.

In this case the distinguished speeds are -0.4, +0.203, +0.4, and +0.783. This suggests
that as the transverse speed of the medium increases from 0, the refracted ray becomes
steeper until reaching 90 degrees at v = +0.203, at which point there is total internal
reflection. This remains the case until achieving a speed of +0.783, at which point some
refraction is re-introduced, and the refracted angle sweeps back from +90 to about +80
degrees (relative to the stationary frame), and then back to +90 degrees as speed
continues to increase to 1. This can be explained in terms of the variations in the effective
critical angle and the aberration angle. As speed increases, the effective critical angle for
total internal reflection initially increases faster than the aberration angle, pushing the ray
into total internal reflection. However, eventually (at close to the speed of light) the
aberration effect brings the incident ray back into the refractive range.
For an alternative derivation that leads to a different, but equivalent, relation, suppose the
index of refraction of the stationary region is n1 = 1, which implies this region is a
vacuum. If we let d1 denote the spatial distance from P0 to P1 with respect to the rest
frame, then we have

These are the components of the interval P0 to P1 with respect to the rest frame of n1, and
they can be converted to the frame of n2 (denoted by upper case letters) using the Lorentz
transformation

Letting Θ₁ denote the angle θ₁ with respect to the moving n₂ coordinate system, we can
express the tangent of this angle as

Taking the sine of the inverse tangent of both sides gives the familiar aberration formula

Since we are assuming the n₁ medium is a vacuum, we are free to treat the entire
configuration as being at rest in the n₂ coordinates, with the angle of incidence as defined
above. Therefore, Snell's law for stationary media can be applied to give the refracted
angle relative to these coordinates

Now, if D2 is the spatial distance from P1 to P2 with respect to the moving coordinates, we
have

Also, the Lorentz transformation gives the coordinates of points P1 and P2 in the rest
frame in terms of the coordinates in the moving frame as follows:

From these we can construct the tangent of θ₂ with respect to the rest coordinates

Substituting for the coordinate differences gives

We saw previously that

so we can explicitly compute θ₂ from θ₁. It can be shown that this solution is identical to
the solution (with n₁ = 1) derived previously on the basis of Fermat's principle.
Furthermore, we can solve these equations for sin(θ₁) as a function of θ₂ and then by
equating this sin(θ₁) with n₃ sin(θ₃) for a stationary medium neighboring the vacuum
region, we again have the general solution for two refractive media in relative transverse
motion. A plot of θ₂ versus θ₁ for various values of v is shown below:

2.9 Accelerated Travels


This yields the following peculiar consequence: If there are two
synchronous clocks, and one of them is moved along a closed curve with
constant [speed] until it has returned, then this clock will lag on its arrival
behind the clock that has not been moved.
Albert Einstein, 1905
Suppose a particle accelerates in such a way that it is subjected to a constant proper
acceleration a₀ for some period of time. The proper acceleration of a particle is defined
as the acceleration with respect to the particle's momentarily co-moving inertial
coordinates at any given instant. The particle's velocity is v = 0 at the time t = 0, when it
is located at x = 0, and at some infinitesimal time δt later its velocity is a₀δt and its
location is (1/2)a₀(δt)². The slope of its line of simultaneity is the inverse of the slope 1/v
of its worldline, so its locus of simultaneity at t = δt is the line given by

    t = δt + (a₀δt)[x − (a₀/2)(δt)²]

This line intersects the particle's original locus of simultaneity at the point (x,0) where

    x = −1/a₀ + (a₀/2)(δt)²
At each instant the particle is accelerating relative to its current instantaneous frame of
reference, so in the limit as δt goes to zero we see that its locus of simultaneity
constantly passes through the point (−1/a₀, 0), and it maintains a constant absolute
spacelike distance of 1/a₀ from that point, as illustrated in the figure below.

This can be compared to a particle moving with a speed v tangentially to a center of
attraction toward which it is drawn with a constant acceleration a₀. The path of such a
particle is a circle in space of radius v²/a₀. Likewise in spacetime a particle moving with a
speed c tangentially to a center of "repulsion" with a constant acceleration a₀ traces out a
hyperbola with a "radius" of c²/a₀. (In this discussion we are using units with c = 1, so the
"radius" shown in the above figure is written as 1/a₀.)
Since the worldline of a particle with constant proper acceleration is a branch of a
hyperbola with "radius" 1/a₀, we can shift the x axis by 1/a₀ to place the origin at the
center of the hyperbola, and then write the equation of the worldline as

    x² − t² = 1/a₀²

Differentiating both sides with respect to t gives

    x(dx/dt) − t = 0

which shows that the velocity of the worldline at any point (x,t) is given by v = t/x.
Consequently the line from the origin through any point on the hyperbolic path represents
the space axis for the co-moving inertial coordinates of the accelerating worldline at that
point. The same applies to any other hyperbolic path asymptotic to the same lightlines, so
a line from the origin intersects any two such hyperbolas at points that are mutually
simultaneous and separated by a constant proper distance (since they are both a fixed
proper distance from the origin along their mutual space axis). It follows that in order for
a slender "rigid" rod accelerating along its axis to maintain a constant proper length (with
respect to its co-moving inertial frames), the parts of the rod must accelerate along a
family of hyperbolas asymptotic to the same lightlines, as illustrated below.

The x',t' axes represent the mutual co-moving inertial frame of the hyperbolic worldlines
where they intersect with the x' axis. All the worldlines have constant proper distances
from each other along this axis, and all have the same speed. The latter implies that they
have each been accelerated by the same total amount at any instant of their mutual
co-moving inertial frame, but the accelerations have been distributed differently. The
"inner-most" worldline (i.e., the trailing end of the rod) has been subjected to a higher level of
instantaneous acceleration but for a shorter time, whereas the "outer-most" worldline (i.e.,
the leading end of the rod) has been accelerated more mildly, but for a longer time. It's
worth noting that this form of "coherent" acceleration would not occur if the rod were
accelerated simply by pushing on one end. It would require the precisely coordinated
application of distinct force profiles to each individual particle of the rod. Any deviation
from these profiles would result in internal stresses of one part of the rod on another, and
hence the rest length would not remain fixed. Furthermore, even if the coherent
acceleration profiles are perfectly applied, there is still a sense in which the rod has not
remained in complete physical equilibrium, because the elapsed proper times along the
different hyperbolic worldlines as the rod is accelerated from a rest state in x,t to a rest
state in some x',t' differ, and hence the quantum phases of the two ends of the rod are
shifted with respect to each other. Thus we must assume memorylessness (as mentioned
in Section 1.6) in order to assert the equivalence of the equilibrium states for two
different frames of reference.
We can then determine the lapse of proper time along any given hyperbolic worldline
using the relation (dτ)² = (dt)² − (dx)², which leads (for the hyperbola of unit "radius") to

    dτ = dt/√(1 + t²)

Integrating this relation gives

    τ = ln[t + √(1 + t²)]

Solving this for t and substituting into the equation of the hyperbola to give x, we have
the parametric equation of the hyperbola as a function of the proper time along the
worldline. If we subtract 1/a₀ from x to return to our original x coordinate (such that x =
0 at t = 0) these equations are

    x(τ) = [cosh(a₀τ) − 1]/a₀        t(τ) = sinh(a₀τ)/a₀
Differentiating the above expressions gives

    dx/dτ = sinh(a₀τ)        dt/dτ = cosh(a₀τ)

so the particle's velocity relative to the original inertial coordinates is

    v = (dx/dτ)/(dt/dτ) = tanh(a₀τ)
We're using "time units" throughout this section, which means that all times and distances
are expressed in units of time. For example, if the proper acceleration of the particle is 1g
(the acceleration of gravity at the Earth's surface), then

    g = (3.27)10⁻⁸ sec⁻¹ = 1.031 years⁻¹

and all distances are in units of light-seconds.


To show the implications of these formulas, suppose a space traveler moves away from
the Earth with a constant proper acceleration of 1g for a period of T years as measured on
Earth. He then reverses his acceleration, coming to rest after another T years has passed
on Earth, and then continues his constant Earthward acceleration for another T
Earth-years, at which point he reverses his acceleration again and comes to rest back at the
Earth in another T Earth-years. The total journey is completed in 4T Earth-years, and it
consists of 4 similar hyperbolic segments as illustrated below.

There are several questions we might ask about this journey. First, how far away from
Earth does the traveler reach at his furthest point? This occurs at point C, which is at 2T
according to Earth time, when the traveler's acceleration brings him momentarily to rest
with respect to the Earth. To answer this question, recall that x can be expressed as a
function of t by

    x(t) = [√(1 + (gt)²) − 1]/g

Now, the maximum distance from Earth is twice the distance at point B, when t = T, so
we have

    x_max = 2[√(1 + (gT)²) − 1]/g
The maximum speed of the traveler in terms of the Earth's inertial coordinates occurs at
point B, where t = T (and again at point D, where t = 3T), and so is given by

    v_max = gT/√(1 + (gT)²)
The total elapsed proper time for the traveler during the entire journey out and back,
which takes 4T years according to Earth time, is 4 times the lapse of proper time to point
B at t = T, so it is given by

    τ = (4/g) ln[gT + √(1 + (gT)²)]
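Assuming the reconstructed formulas above, a few worked values for journeys of various
lengths (with g = 1.031 per year in time units):

    import math
    g = 1.031                               # 1g in units of 1/year
    for T in (1, 5, 10, 20):
        x_max = 2*(math.sqrt(1 + (g*T)**2) - 1)/g   # farthest distance, light-years
        v_max = g*T/math.sqrt(1 + (g*T)**2)         # peak speed, fraction of c
        tau   = 4*math.asinh(g*T)/g                 # total proper time, traveler-years
        print(T, round(x_max, 2), round(v_max, 4), round(tau, 2))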

So far we have focused mainly on a description of events in terms of the Earth's inertial
coordinates x and t, but we can also describe the same events in terms of coordinate
systems associated with the accelerating traveler. At any given instant the traveler is
momentarily at rest with respect to a system of inertial coordinates, so we can define
"proper" time and space measurements in terms of these coordinates. However, when we
differentiate these time and space intervals as the traveler progresses along his worldline,
we will find that new effects appear, due to the fact that the coordinate system itself is
changing. As the traveler accelerates he continuously progresses from one system of
momentarily co-moving inertial coordinates to another, and the effect of this change in
the coordinates will show up in any derivatives that we take with respect to the time and
space components.
For example, suppose we ask how fast the Earth is moving relative to the traveler. This
question can be interpreted in different ways. With respect to the traveler's momentarily
co-moving inertial coordinates, the Earth's velocity is equal and opposite to the traveler's
velocity with respect to the Earth's inertial coordinates. However, this quantity does not
equal the derivative of the proper distance with respect to the proper time. The proper
distance s from the Earth in terms of the traveler's momentarily co-moving inertial
coordinates at the proper time τ is

    s(τ) = (1/g)[1 − 1/cosh(gτ)]

which shows that the proper distance approaches a constant 1/g (about 1 light-year) as τ
increases. This shouldn't be surprising, because we've already seen that the traveler's
proper distance from a fixed point on the other side of the Earth actually is constant and
equal to 1/g throughout the period of constant proper acceleration.
The derivative of the proper distance of the Earth with respect to the proper time is

    ds/dτ = tanh(gτ)/cosh(gτ)

This can be regarded as a kind of velocity, since it represents the proper rate of change of
the proper distance from the Earth as the traveler accelerates away. A plot of this function
as τ varies from 0 to 6 years is shown below.

Initially the proper distance from the Earth increases as the traveler accelerates away, but
eventually (if the constant proper acceleration is maintained for a sufficiently long time)
the "length contraction" effect of his increasing velocity becomes great enough to cause
the derivative to drop off to zero as the proper distance approaches a constant 1/g. To find
the point of maximum ds/dτ we differentiate again with respect to τ to give

    d²s/dτ² = [g/cosh(gτ)][1/cosh²(gτ) − tanh²(gτ)]

Setting this to zero, we see that the maximum occurs at gτ = tanh⁻¹(1/√2) ≈ 0.881, and
substituting this into the expression for ds/dτ gives the maximum value of 1/2. Thus the
derivative of proper distance from Earth with respect to proper time during a constant 1g
acceleration away from the Earth reaches a maximum of half the speed of light at a
proper time of about 0.856 years, after which it drops to zero.
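A quick numerical check of this maximum (again assuming the reconstructed expression
for ds/dτ):

    import math
    g = 1.031
    best = max((math.tanh(g*t)/math.cosh(g*t), t)
               for t in (k/10000.0 for k in range(1, 60000)))
    print(best)                             # about (0.5, 0.855): half of c near 0.86 years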

Similarly, the traveler's proper distance S from the turnaround point is given by

The derivative of this with respect to the traveler's proper time is

A plot of this "velocity" is shown below for the first quartile leg of a journey as described
above with T = 20 years.

The magnitude of this "velocity" increases rapidly at the start of the acceleration, due to
the combined effects of the traveler's motion and the onset of "length contraction", but if
allowed to continue long enough the "velocity" drops off and approaches 2 (i.e., twice the
speed of light) at the point where the traveler reverses his acceleration. Of course, the fact
that this derivative exceeds c does not conflict with the fact that c is an upper limit on
velocities with respect to inertial coordinate systems, because S and τ do not constitute
inertial coordinates.
To find the extreme point on this curve we differentiate again with respect to τ, which
gives

Consequently we see that the extreme value occurs (assuming the journey is long enough
and the acceleration is great enough) at the proper time

where the value of dS/dτ is

By symmetry, these same two characteristics apply to all four of the "quadrants" of the
traveler's journey, with the appropriate changes of sign and direction. The figure below
shows the proper distances s(t) and S(t) (i.e., the distances from the origin and the
destination respectively) during the first two quadrants of a journey with T = 6.

By symmetry we see that the portions of these curves to the right of the mid-point can be
generated from the relation s(τ) = S(τ_C − τ). Also, it's obvious that

If we consider journeys with non-constant proper accelerations, it's possible to construct
some slightly peculiar-sounding scenarios. For example, suppose the traveler accelerates
in such a way that his velocity is v(t) = 1 − exp(−kt) for some constant k. It follows that the
distance in the Earth's frame at time t is [kt + exp(−kt) − 1]/k, so the distance in the
traveler's frame is

    s'(t) = (1/k)[kt + exp(−kt) − 1]·√(1 − (1 − exp(−kt))²)

This function initially increases, then reaches a maximum, and then asymptotically
approaches zero. With k = 1 year⁻¹ the maximum occurs at roughly 3 years and a distance
of about 0.65 light-years (relative to the traveler's frame). Thus we have the seemingly
paradoxical situation that the Earth "becomes closer" to the traveler as he moves further
away.
This is not as strange as it may sound at first. Suppose we leave home and drive for 1
hour at a constant speed of 20 mph. We could then say that we are "1 hour from home".
Now suppose we suddenly accelerate to 40 mph. How far (in time) are we away from
home? If we extrapolate our current worldline back in time, we are only 1/2 hour from
home. If we speed up some more, our "distance" (in terms of time) from home becomes
less and less. Of course, we have to speed up at a rate that more than compensates for the
increasing road distance, but that's not hard to do (in theory). The only difference
between this scenario and the relativistic one is that when we accelerate to relativistic
speeds both our time and our space axes are affected, so when we extrapolate our current
frame of reference back to Earth we find that both the time and the distance are
shortened.
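Locating the maximum numerically (using the reconstructed expression for the
traveler's-frame distance) reproduces the rough figures quoted above:

    import math
    k = 1.0                                 # 1/year
    def s_traveler(t):
        v = 1.0 - math.exp(-k*t)
        return (k*t + math.exp(-k*t) - 1.0)/k*math.sqrt(1.0 - v*v)
    best = max((s_traveler(t), t) for t in (i/1000.0 for i in range(1, 20001)))
    print(best)                             # roughly (0.64, 2.9): max distance near 3 years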
Another interesting acceleration profile is the one that results from a constant nozzle
velocity u and constant exhaust mass flow rate w = −dm₀/dτ, where τ is the proper time of
the rocket, so the effective force is uw throughout the acceleration. This does not result in
constant proper acceleration, because the rest mass of the rocket is being reduced while
the applied proper force remains constant. In this case we have

where t is the time of the initial coordinates and v is the velocity of the rocket with
respect to those coordinates. Also, we have m₀(τ) = m₀(0) − wτ, so we can integrate to get
the speed

Letting r(τ) denote the ratio [m₀(0) − wτ]/m₀(0), i.e., the ratio of the rest mass at proper
time τ to the rest mass at the start of the acceleration, the result is

    tanh⁻¹(v) = u ln(1/r)

so we have

    v = [1 − r^(2u)]/[1 + r^(2u)]
Also, since dt = dτ/√(1 − v²), we can integrate this to get the coordinate time t as a
function of the rocket's proper time

In the limit as the nozzle velocity u approaches 1, this expression reduces to

It's interesting that for photonic propulsion (u = 1) the mass ratio r is identical to the
Doppler frequency shift of the exhaust photons relative to the original rest frame, i.e., we
have

    r = √[(1 − v)/(1 + v)]
Thus if the rocket continues to convert its own mass to energy and eject it as photons of a
fixed frequency, the energy of each photon as seen from the fixed point of origin is
exactly proportional to the rest mass of the rocket at the moment when the photon was
ejected. Also, since r(t) is the current rest mass m₀(t) divided by the original rest mass
m₀(0), and since the inertial mass m(t) is related to the rest mass m₀(t) by the equation
m(t) = m₀(t)/√(1 − v²), we find that the inertial mass m(t) of the rocket is given as a
function of the rocket's velocity v by the equation

    m(t) = m₀(0)/(1 + v)
Thus we find that as the rocket's velocity goes to 1 at the moment when it is converting
the last of its rest mass into energy, so its rest mass is going to zero, its inertial mass goes
to m₀(0)/2, i.e., exactly half of the rocket's original rest mass. This is to be expected,
because momentum must be conserved, and all the photons except the very last have
been ejected in the rearward direction at the speed of light, leaving only the last
remaining photon (which has nothing to react against) moving in the forward direction,
so it must have momentum equal to all the rearward momentum of the ejected photons.
The momentum of a photon is p = hν/c = E/c, so in units with c = 1 we have p = E. The
original energy content of the rocket was its rest mass, m₀(0), which has been entirely
converted to energy, half in the forward direction (in the last remaining super-energetic
photon) and half in the rearward direction (the progressively more redshifted stream of
exhaust photons).
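The photon-rocket bookkeeping can be checked numerically; the sketch below assumes
the reconstructed relations for v, r, and m given above.

    import math
    m0_initial = 1.0
    for r in (0.8, 0.5, 0.1, 0.001):        # fraction of rest mass remaining
        v = (1 - r**2)/(1 + r**2)           # speed from the rocket equation with u = 1
        m = m0_initial*r/math.sqrt(1 - v*v) # inertial mass = m0(t)/sqrt(1 - v^2)
        print(r, round(v, 6), m, m0_initial/(1 + v))   # last two columns agree

As r approaches 0 the inertial mass tends to exactly half the original rest mass, as argued
above.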
The preceding discussion focused on purely linear motion, but we can just as well
consider arbitrary accelerated paths. It's trivial to determine the lapse of proper time along
any given timelike path as a function of an inertial time coordinate simply by integrating
d over the path, but it's a bit more challenging to express the lapse of proper time along
one arbitrary worldline with respect to the lapse of proper time along another, because the
appropriate correspondence is ambiguous. Perhaps the most natural correspondence is
given by mapping the proper time along the reference worldline to the proper time along
the subject worldline by means of the instantaneously co-moving planes of inertial
simultaneity of the reference worldline. In other words, to each point along the reference
worldline we can assign a locus of simultaneous points based on co-moving inertial
coordinates at that point, and we can then find the intersections of these loci with the
subject worldline.
Quantitatively, suppose the reference worldline W1 is given parametrically by the
functions x₁(t), y₁(t), z₁(t) where x,y,z,t are inertial coordinates. From this we can
determine the derivatives dx₁/dt, dy₁/dt, and dz₁/dt. These also represent the components
of the gradient of the space of simultaneity of the instantaneously co-moving inertial
frame of the object. In other words, the spaces of simultaneity for W1 have the partial
derivatives

    ∂t/∂x = dx₁/dt        ∂t/∂y = dy₁/dt        ∂t/∂z = dz₁/dt

These enable us to express the total differential time as a function of the differentials of
the spatial coordinates

    dt = (dx₁/dt)dx + (dy₁/dt)dy + (dz₁/dt)dz
If the subject worldline W2 is expressed parametrically by the functions x₂(t), y₂(t), z₂(t),
and if the inertial plane of simultaneity of the event at coordinate time t₁ on W1 is
intersected by W2 at the coordinate time t₂, then the difference in coordinate times
between these two events can be expressed in terms of the differences in their spatial
coordinates by substituting into the above total differential the quantities dt = t₂ − t₁,
dx = x₂(t₂) − x₁(t₁), and so on. The result is

    t₂ − t₁ = (dx₁/dt)[x₂(t₂) − x₁(t₁)] + (dy₁/dt)[y₂(t₂) − y₁(t₁)] + (dz₁/dt)[z₂(t₂) − z₁(t₁)]
where the derivatives of x₁, y₁, and z₁ are evaluated at t₁. Rearranging terms and omitting
the indications of functional dependence for the W1 coordinates, this can be written in the
form

This is an implicit formula for the value of t₂ on W2 corresponding to t₁ on W1 based on
the instantaneous inertial simultaneity of W1. Every quantity in this equation is an explicit
function of either t₁ or t₂, so we can solve for t₂ to give a function F1 such that t₂ = F1(t₁).
We can also integrate the absolute intervals along the two worldlines to give the functions
f1 and f2 which relate the proper times along W1 and W2 to the coordinate time, i.e., we
have τ₁ = f1(t) and τ₂ = f2(t). With these substitutions we arrive at the general form of the
expression for τ₂ with respect to τ₁:

    τ₂ = f2(F1(f1⁻¹(τ₁)))
To illustrate, suppose W1 is the worldline of a particle moving along some arbitrary path
and W2 is just the worldline of the spatial origin of the inertial coordinates. In this case
we have x₂ = y₂ = z₂ = 0 and τ₂ = t₂, so the above formula reduces to

    τ₂ = t₁ − r·v

where r and v are the position and velocity vectors of W1 with respect to the inertial rest
coordinates of W2. Differentiating with respect to t₁, and multiplying through by
dt₁/dτ₁ = 1/√(1 − v²), we get

    dτ₂/dτ₁ = [1 − v² − ra cos(θ)]/√(1 − v²)
where a is the acceleration vector and θ is the angle between the r and a vectors. Thus if
the acceleration of W1 is zero, we have dτ₂/dτ₁ = √(1 − v²). On the other hand, if W1 is
moving around W2 in a circle at constant speed, we have a = −v²/r with the acceleration
vector directed opposite to the position vector, giving the result dτ₂/dτ₁ = 1/√(1 − v²). This is
consistent with the fact that, if the object is moving tangentially, the plane of simultaneity
for its instantaneously co-moving inertial coordinate system intersects with the constant-t
plane along the line from the object to the origin, and hence the time difference is entirely
due to the transverse dilation (i.e., the square root of 1 − v² factor).
If the speed v of W1 is constant, then τ1 = t1 (1 − v²)^(1/2), and we have the explicit equation

τ2 = τ1/(1 − v²)^(1/2) − r·v

where r and v are evaluated at t1 = τ1/(1 − v²)^(1/2).
To illustrate, suppose the object whose worldline is W1 begins at the origin at t = 0 and
thereafter moves counter-clockwise in a circle tangent to the origin in the xy plane with a
constant angular velocity ω = v/R, as illustrated below.

In this case the object's spatial coordinates and their derivatives as a function of
coordinate time are

x1(t) = R sin(ωt)        x1′(t) = v cos(ωt)
y1(t) = R [1 − cos(ωt)]        y1′(t) = v sin(ωt)

so that r·v = Rv sin(ωt1). Substituting into the equation for τ2 and replacing each
appearance of t1 with τ1/(1 − v²)^(1/2) gives the result

τ2 = τ1/(1 − v²)^(1/2) − Rv sin( vτ1 / (R(1 − v²)^(1/2)) )
This is the proper time of the spatial origin according to the instantaneous time slices of
the moving object. This function is plotted below with R = 1 and v = 0.8. Also shown is
the stable (secular) component τ1/(1 − v²)^(1/2).

Naturally if the circle radius R goes to infinity the value of the sine function approaches
the argument, and so the above expression reduces to

τ2 = τ1 (1 − v²)^(1/2)
This confirms the reciprocity between the two worldlines when both are inertial. We can
also differentiate the full expression for τ2 as a function of τ1 to give the relation between
the differentials

dτ2/dτ1 = [ 1 − v² cos( vτ1 / (R(1 − v²)^(1/2)) ) ] / (1 − v²)^(1/2)
This relation is plotted in the figure below, again for R = 1 and v = 0.8.

It's also clear from this expression that as R goes to infinity the cosine approaches 1, and
we again have dτ2/dτ1 = (1 − v²)^(1/2).
Incidentally, the above equation shows that the ratio of time rates equals 1 when the
moving object is a circumferential distance of

s = R arccos[ (1 − (1 − v²)^(1/2)) / v² ]

from the point of tangency. Hence, for small velocities v the configuration of "equal time
rates" occurs when the moving object is at π/3 radians from the point of tangency. On
the other hand, as v approaches 1, the configuration of equal time rates occurs when the
moving object approaches the point of tangency. This may seem surprising at first,
because we might expect the proper time of the origin to be dilated with respect to the
proper time of the tangentially moving object. However, the planes of simultaneity of the
moving object are tilting very rapidly in this condition, and this offsets the usual time
dilation factor. As v approaches 1, these two effects approach equal magnitude, and
cancel out for a location approaching the point of tangency.
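The condition for equal time rates can be evaluated directly. The following sketch
(assuming the expression for the circumferential angle reconstructed above) confirms the
π/3 limit for small v and the approach to the point of tangency as v approaches 1.

import numpy as np

def equal_rate_angle(v):
    # Angle from the point of tangency at which d(tau2)/d(tau1) = 1, i.e.
    # where 1 - v^2 cos(theta) = sqrt(1 - v^2).
    return np.arccos((1.0 - np.sqrt(1.0 - v*v)) / v**2)

for v in (0.01, 0.5, 0.9, 0.999):
    print(v, equal_rate_angle(v))
# v -> 0 gives pi/3 = 1.047...; v -> 1 gives 0 (the point of tangency).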

2.10 The Starry Messenger


Let God look and judge!
Cardinal Humbert, 1054 AD
Maxwell's equations are very successful at describing the propagation of light based on
the model of electromagnetic waves, not only in material media but also in a vacuum,
which is considered to be a region free of material substances. According to this model,
light propagates in vacuum at a speed c = 1/(μ0 ε0)^(1/2), where μ0 is the permeability
constant and ε0 is the permittivity of the vacuum, defined in terms of Coulomb's law for
the electrostatic force

F = q1 q2 / (4π ε0 r²)
The SI system of units is defined so that the permeability constant takes on the value
μ0 = 4π×10⁻⁷ tesla meter per ampere, and we can measure the value of the permittivity
(typically by measuring the capacitance C between parallel plates of area A separated by
a distance d, using the relation ε0 = Cd/A) to have the value ε0 = 8.854187818×10⁻¹²
coulomb² per newton meter². This leads to the familiar value c = 2.998×10⁸ meters per
second for the speed of light in a vacuum. Of course, if we place some substance between
our capacitor plates when determining ε0 we will generally get a different value, so the
speed of light is different in various media. This leads to the index of refraction of various
transparent media, defined as n = c_vacuum / c_medium. Thus Maxwell's theory of
electromagnetism seems to clearly imply that the speed of propagation of such electromagnetic
waves depends only on the medium, and is independent of the speed of the source.
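As a sketch of this arithmetic (using the SI values just quoted):

import math

mu0 = 4 * math.pi * 1e-7           # permeability, tesla meter per ampere
eps0 = 8.854187818e-12             # permittivity, coulomb^2 per newton meter^2
c = 1.0 / math.sqrt(mu0 * eps0)
print(c)                           # ~2.998e8 meters per second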
On the other hand, it also suggests that the speed of light depends on the motion of the
medium, which is easy to imagine in the case of a material medium like glass, but not so
easy if the "medium" is the vacuum of empty space. How can we even assign a state of
motion to the vacuum? In struggling to answer this question, people tried to imagine that
even the vacuum is permeated with some material-like substance, the ether, to which a
definite state of motion could be assigned. On this basis it was natural to suppose that
Maxwell's equations were strictly applicable (and the speed of light was exactly c) only
with respect to the absolute rest frame of the ether. With respect to other frames of
reference they expected to find that the speed of light differed, depending on the direction
of travel. Likewise we would expect to find corresponding differences and anisotropies in
the capacitance of the vacuum when measured with plates moving at high speed relative
to the ether.
However, when extremely precise interferometer measurements were carried out to find a
directional variation in the speed of light on the Earth's surface (presumably moving
through the ether at fairly high speed due to the Earth's rotation and its orbital motion
around the Sun), essentially no directional variation in light speed was found that could
be attributed to the motion of the apparatus through the ether. Of course, it had occurred
to people that the ether might be "dragged along" by the Earth, so that objects on the
Earth's surface are essentially at rest in the local ether. However, these "convection"
hypotheses are inconsistent with other observed phenomena, notably the aberration of
starlight, which can only be explained in an ether theory if it is assumed that an observer
on the Earth's surface is not at rest with respect to the local ether. Also, careful terrestrial

measurements of the paths of light near rapidly moving massive objects showed no sign
of any "convection". Considering all this, the situation was considered to be quite
puzzling.
There is a completely different approach that could be taken to modeling the phenomena
of light, provided we're willing to reject Maxwell's theory of electromagnetic waves, and
adopt instead a model similar to the one that Newton often seemed to have in mind,
namely, an "emission theory". One advocate of such a theory early in the early 1900's
was Walter Ritz, who rejected Maxwell's equations on the grounds that the advanced
potentials allowed by those equations were unrealistic. Ritz debated this point with Albert
Einstein, who argued that the observed asymmetry between advanced and retarded waves
is essentially statistical in origin, due to the improbability of conditions needed to
produce coherent advanced waves. Neither man persuaded the other. (Ironically, Einstein
himself had already posited that Maxwell's equations were inadequate to fully represent
the behavior of light, and suggested a model that contains certain attributes of an
emission theory to account for the photo-electric effect, but this challenge to Maxwell's
equations was on a more subtle and profound level than Ritz's objection to advanced
potentials.)
In place of Maxwell's equations and the electromagnetic wave model of light, the
advocates of "emission theories" generally assume a Galilean or Newtonian spacetime,
and postulate that light is emitted and propagates away from the source (perhaps like
Newtonian corpuscles) at a speed of c relative to the source. Thus, according to
emission theories, if the source is moving directly toward or away from us with a speed v,
then the light from that source is approaching us with a speed c+v or cv respectively.
Naturally this class of theories is compatible with experiments such as the one performed
by Michelson and Morley, since the source of the light is moving along with the rest of
the apparatus, so we wouldn't expect to find any directional variation in the speed of light
in such experiments. Also, an emission theory of light is compatible with stellar
aberration, at least up to the limits of observational resolution. In fact, James Bradley (the
discoverer of aberration) originally explained it on this very basis.
Of course, even an emission theory must account for the variations in light speed in
different media, which means it can't simply say that the speed of light depends only on
the speed of the source. It must also be dependent on the medium through which it is
traveling, and presumably it must have a "terminal velocity" in each medium, i.e., a
certain characteristic speed that it can maintain indefinitely as it propagates through the
medium. (Obviously we never see light come to rest, nor even do we observe noticeable
"slowing" of light in a given medium, so it must always exhibit a characteristic speed.)
Furthermore, based on the principles of an emission theory, the medium-dependent speed
must be defined relative to the rest frame of the medium.
For example, if the characteristic speed of light in water is cw, and a body of water is
moving relative to us with a speed v, then (according to an emission theory) the light
must move with a speed cw + v relative to us when it travels for some significant distance
through that water, so that it has reached its "steady-state" speed in the water. In optics

this distance is called the "extinction distance", and it is known to be proportional to 1/(
), where is the density of the medium and is the wavelength of light. The extinction
distance for most common media for optical light is extremely small, so essentially the
light reaches its steady-state speed as soon as it enters the medium.
An experiment performed by Fizeau in 1851 to test for optical "convection" also sheds
light on the viability of emission theories. Fizeau sent beams of light in both directions
through a pipe of rapidly moving water to determine if the light was "dragged along" by
the water. Since the refractive index of water is about n = c/cw = 1.33 where cw is the
speed of light in water, we know that cw equals c/1.33, which is about 75% of the speed
of light in a vacuum. The question is, if the water is in motion relative to us, what is the
speed (relative to us) of the light in the water?
If light propagates in an absolutely fixed background ether, and isn't dragged along by the
water at all, we would expect the light speed to still be cw relative to the fixed ether,
regardless of how the water moves. This is admittedly a rather odd hypothesis (i.e., that
light has a characteristic speed in water, but that this speed is relative to a fixed
background ether, independent of the speed of the water), but it is one possibility that
can't be ruled out a priori. In this case the difference in travel times for the two directions
would be proportional to

1/cw − 1/cw = 0
which implies no phase shift in the interferometer. On the other hand, if emission theories
are right, the speed of the light in the water (which is moving at the speed v) should be
cw + v in the direction of the water's motion, and cw − v in the opposite direction. On this
basis the difference in travel times would be proportional to

1/(cw − v) − 1/(cw + v) = 2v / (cw² − v²)
This is a very small amount (remembering that cw is about 75% of the speed of light in a
vacuum), but it is large enough that it would be measurable with delicate interferometry
techniques.
The results of Fizeau's experiment turned out to be consistent with neither of the above
predictions. Instead, he found that the time difference (proportional to the phase shift)
was a bit less than 43.5% of the prediction for an emission theory (i.e., 43.5% of the
prediction based on the assumption of complete convection). By varying the density of
the fluid we can vary the refractive index and therefore cw, and we find that the measured
phase shift always indicates a time difference of (1 − cw²) times the prediction of the
emission theory. For water we have cw = 0.7518, so the time lag is (1 − cw²) = 0.4346 of the
emission theory prediction.

This implies that if we let S(cw,v) and S(cw,−v) denote the speeds of light in the two
directions, we have

1/S(cw,−v) − 1/S(cw,v) = (1 − cw²) [ 1/(cw − v) − 1/(cw + v) ]

By partial fraction decomposition this can be written in the form

1/S(cw,−v) − 1/S(cw,v) = A/(cw − v) − B/(cw + v)

where

(A − B) cw + (A + B) v = 2v (1 − cw²)

Also, in view of the symmetry S(u,v) = S(v,u), we can swap cw with v to give

(A − B) v + (A + B) cw = 2cw (1 − v²)

Solving these last two equations for A and B gives A = 1 − vcw and B = 1 + vcw, so the
function S is

S(cw,v) = (cw + v) / (1 + vcw)
which of course is the relativistic formula for the composition of velocities. So, even if
we rejected Maxwell's equations, it still appears that emission theories cannot be
reconciled with Fizeau's experimental results.
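As a sketch of this conclusion, the following fragment checks that the relativistic
composition formula reproduces Fizeau's fractional drag; the flow speed below is a
hypothetical value of a few meters per second, and to first order it cancels out of the ratio.

import numpy as np

cw = 1/1.33                          # speed of light in water (units of c)
v = 2.3e-8                           # ~7 m/s water flow speed, in units of c

def S(u, w):                         # relativistic composition of velocities
    return (u + w) / (1 + u*w)

emission = 1/(cw - v) - 1/(cw + v)   # emission-theory time difference per unit length
relativ  = 1/S(cw, -v) - 1/S(cw, v)  # the same with relativistically composed speeds
print(relativ / emission, 1 - cw**2) # both ~0.4347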
More evidence ruling out simple emission theories comes from observations of a
supernova made by Chinese astronomers in the year 1054 AD. When a star explodes as a
supernova, the initial shock wave moves outward through the star's interior in just
seconds, and elevates the temperature of the material to such a high level that fusion is
initiated, and much of the lighter elements are fused into heavier elements, including
some even heavier than iron. (This process yields most of the interesting elements that we
find in the world around us.) Material is flung out at high speeds in all directions, and this
material emits enormous amounts of radiation over a wide range of frequencies,
including x-rays and gamma rays. Based on the broad range of spectral shifts (resulting
from the Doppler effect), it's clear that the sources of this radiation have a range of speeds
relative to the Earth of over 10000 km/sec. This is because we are receiving light emitted
by some material that was flung out from the supernova in the direction away from the
Earth, and by other material that was flung out in the direction toward the Earth.
If the supernova was located a distance D from us, then the time for the "light" (i.e., EM
radiation of all frequencies) to reach us should be roughly D/c, where c is the speed of

light. However, if we postulate that the actual speed of the light as it travels through
interstellar space is affected by the speed of the source, and if the source was moving
with a speed v relative to the Earth at the time of emission, then we would conclude that
the light traveled at a speed of c+v on its journey to the Earth. Therefore, if the sources
of light have velocities ranging from -v to +v, the first light from the initial explosion to
reach the Earth would arrive at the time D/(c+v), whereas the last light from the initial
explosion to reach the Earth would arrive at D/(c-v). This is illustrated in the figure
below.

Hence the arrival times for light from the initial explosion event would be spread out over
an interval of length D/(c−v) − D/(c+v), which equals (D/c)(2v/c) / (1 − (v/c)²). The
denominator is virtually 1, so we can say the interval of arrival times for the light from
the explosion event of a supernova at a distance D is about (D/c)(2v/c), where v is the
maximum speed at which radiating material is flung out from the supernova.
However, in actual observations of supernovae we do not see this "spreading out" of the
event. For example, the Crab supernova was about 6000 light years away, so we had D/c
= 6000 years, and with a range of source speeds of 10000 km/sec (meaning v = 5000 km/sec)
we would expect a range of arrival times of 200 years, whereas in fact the Crab was only
bright for less than a year, according to the observations recorded by Chinese
astronomers in July of 1054 AD. For a few weeks the "guest star", as they called it, in the
constellation Taurus was the brightest star in the sky, and was even visible in the daytime
for twenty-six days. Within two years it had disappeared completely to the naked eye. (It
was not visible in Europe or the Islamic countries, since Taurus is below the horizon of
the night sky in July for northern latitudes.) In the time since the star went supernova the
debris has expanded to its present dimensions of about 3 light years, which implies that
this material was moving at only (!) about 1/300 the speed of light. Still, even with this
value of v, the bright explosion event should have been visible on Earth for about 40
years (if the light really moved through space at c ± v). Hence we can conclude that the
light actually propagated through space at a speed essentially independent of the speed of
the sources.
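The arithmetic of these two estimates is simple enough to spell out (a sketch, using the
round numbers quoted above):

# Spread of arrival times under an emission theory: (D/c)*(2v/c).
c = 299792.458                     # km/sec
D_over_c = 6000.0                  # years (light-travel time from the Crab)

for v in (5000.0, c/300.0):        # km/sec: Doppler range, then nebular expansion
    print(D_over_c * (2*v/c))      # spreads of ~200 and ~40 years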
However, although this source independence of light speed is obviously consistent with
Maxwell's equations and special relativity, we should be careful not to read too much into
it. In particular, this isn't direct proof that the speed of light in a vacuum is independent of
the speed of the source, because for visible light (which is all that was noted on Earth in

July of 1054 AD) the extinction distance in the gas and dust of interstellar space is much
less than the 6000 light year distance of the Crab nebula. In other words, for visible light,
interstellar space is not a vacuum, at least not over distances of many light years. Hence
it's possible to argue that even if the initial speed of light in a vacuum was c+v, it would
have slowed to c for most of its journey to Earth. Admittedly, the details of such a
counter-factual argument are lacking (because we don't really know the laws of
propagation of light in a universe where the speed of light is dependent on the speed of
the source, nor how the frequency and wavelength would be altered by interaction with a
medium, so we don't know if the extinction distance is even relevant), but it's not totally
implausible that the static interstellar dust might affect the propagation of light in such a
way as to obscure the source dependence, and the extinction distance seems a reasonable
way of quantifying this potential effect.
A better test of the source-independence of light speed based on astronomical
observations is to use light from the high-energy end of the spectrum. As noted above,
the extinction distance is proportional to 1/(ρλ). For some frequencies of x-rays and
gamma rays the extinction distance in interstellar space is about 60000 light years, much
greater than the distances to many supernova events, as well as binary stars and other
configurations with identifiable properties. By observing these events and objects it has
been found that the arrival times of light are essentially independent of frequency, e.g.,
the x-rays associated with a particular identifiable event arrive at the same time as the
visible light for that event, even though the distance to the event is much less than the
extinction distance for x-rays. This gives strong evidence that the speed of light in a
vacuum is actually invariant and independent of the motion of the source.
With the aid of modern spectroscopy we can now examine supernovae events in detail,
and it has been found that they exhibit several characteristic emission lines, particularly
the signature of atomic hydrogen at 6563 angstroms. Using this as a marker we can
determine the Doppler shift of the radiation, from which we can infer the speed of the
source. The energy emitted by a star going supernova is comparable to all the energy that
it emitted during millions or even billions of years of stable evolution. Three main
categories of supernovae have been identified, depending on the mass of the original star
and how much of its "nuclear fuel" remains. In all cases the maximum luminosity occurs
within just the first few days, and drops by 2 or 3 magnitudes within a month, and by 5 or
6 magnitudes within a year. Hence we can conclude that the light actually propagated
through empty space at a speed essentially independent of the speed of the sources.
Another interesting observation involving the propagation of light was first proposed in
1913 by DeSitter. He wondered whether, if we assume the speed of light in a vacuum is
always c with respect to the source, and if we assume a Galilean spacetime, we would
notice anything different in the appearances of things. He considered the appearance of
binary star systems, i.e., two stars that orbit around each other. More than half of all the
visible stars in the night sky are actually double stars, i.e., two stars orbiting each other,
and the elements of their orbits may be inferred from spectroscopic measurements of
their radial speeds as seen from the Earth. DeSitter's basic idea was that if two stars are
orbiting each other and we are observing them from the plane of their mutual orbit, the

stars will be sometimes moving toward the Earth rapidly, and sometimes away.
According to an emission theory this orbital component of velocity should be added to or
subtracted from the speed of light. As a result, over the hundreds or thousands of years
that it takes the light to reach the Earth, the arrival times of the light from approaching
and receding sources would be very different.
Now, before we go any further, we should point out a potential difficulty for this kind of
observation. The problem (again) is that the "vacuum" of empty space is not really a
perfect vacuum, but contains small and sparse particles of dust and gas. Consequently it
acts as a material and, as noted above, light will reach its steady-state velocity with
respect to that interstellar dust after having traveled beyond the extinction distance. Since
the extinction distance for visible light in interstellar space is quite short, the light will be
moving at essentially c for almost its entire travel time, regardless of the original speed.
For this reason, it's questionable whether visual observations of celestial objects can
provide good tests of emission theory predictions. However, once again we can make use
of the high-frequency end of the spectrum to strengthen the tests. If we focus on light in
the frequency range of, say, x-rays and gamma rays, the extinction distance is much
larger than the distances to many binary star systems, so we can carry out DeSitter's
proposed observation (in principle) if we use x-rays, and this has actually been done by
Brecher in 1977.
With the proviso that we will be focusing on light whose extinction distance is much
greater than the distance from the binary star system to Earth (making the speed of the
light simply c plus the speed of the star at the time of emission), how should we expect a
binary star system to appear? Let's consider one of the stars in the binary system, and
write its coordinates and their derivatives as

x(t) = D + R cos(wt)        x′(t) = −Rw sin(wt)
y(t) = R sin(wt)        y′(t) = Rw cos(wt)

where D is the distance from the Earth to the center of the binary star system, R is the
radius of the star's orbit about the system's center, and w is the angular speed of the star.
We also have the components cx and cy of the emissive light speed (the speed of the light
relative to its source), which satisfy

c² = cx² + cy²

In these terms we can write the components of the absolute speed of the light emitted
from the star at time t as cx + x′(t) and cy + y′(t).
Now, in order to reach the Earth at time T the light emitted at time t must travel in the x
direction from x(t) to 0 at a speed of −x(t)/Δt for a time Δt = T − t, and similarly for the y
direction. Hence we have

cx + x′(t) = −x(t)/Δt        cy + y′(t) = −y(t)/Δt

Substituting for x, y, and the light speed derivatives x′, y′, we have

cx = −[D + R cos(wt)]/Δt + Rw sin(wt)        cy = −R sin(wt)/Δt − Rw cos(wt)
Squaring both sides of both equations, and adding the resulting equations together, gives

c² = [D² + 2DR cos(wt) + R²]/Δt² − 2DRw sin(wt)/Δt + R²w²
Re-arranging terms gives the quadratic in Δt

(c² − R²w²) Δt² + 2DRw sin(wt) Δt − [D² + 2DR cos(wt) + R²] = 0
If we define the normalized parameters

d = D/c        r = R/c        v = Rw/c

then the quadratic in Δt becomes

(1 − v²) Δt² + 2dv sin(wt) Δt − [d² + 2dr cos(wt) + r²] = 0
Solving this quadratic for Δt = T − t and then adding t to both sides gives the arrival time
T on Earth as a function of the emission time t on the star

T = t + { −dv sin(wt) + [ d²v² sin²(wt) + (1 − v²)(d² + 2dr cos(wt) + r²) ]^(1/2) } / (1 − v²)

If the star's speed v is much less than the speed of light, this can be expressed very nearly
as

T ≈ t + d + r cos(wt) − dv sin(wt)
The derivative of T with respect to t is

dT/dt ≈ 1 − v sin(wt) − dvw cos(wt)

and this takes its minimum value when t = 0, where we have

dT/dt = 1 − dvw = 1 − dv²/r
Consequently we find the DeSitter effect, i.e., dT/dt goes negative if d > r/v². Now, we
know from Kepler's third law (which also applies in relativistic gravity with the
appropriate choice of coordinates) that m = r³w² = rv², so we can substitute m/r for v²
in our inequality to give the condition d > r²/m. Thus if the distance of the binary star
system from Earth exceeds the square of the system's orbital radius divided by the
system's mass (in geometric units) we would expect DeSitter's apparitions - assuming the
speed of light is c ± v.
As an example, for a binary star system a distance of d = 20000 light-years away, with an
orbital radius of r = 0.00001 light-years, and an orbital speed of v = 0.00005, the arrival
time of the light as a function of the emission time is as shown below:

This corresponds to a star system with only about 1/6 solar mass, and an orbital radius of
about 1.5 million kilometers. At any given reception time on Earth we can typically "see"
at least three separate emission events from the same star at different points in its orbit.
These ghostly apparitions are the effect that DeSitter tried to find in photographs of many
binary star systems, but none exhibited this effect. He wrote
The observed velocities of spectroscopic doubles are as a matter of fact
satisfactorily represented by a Keplerian motion. Moreover in many cases the
orbit derived from the radial velocities is confirmed by visual observations (as for
Equuli, Herculis, etc.) or by eclipse observations (as in Algol variables). We
can thus not avoid the conclusion [that] the velocity of light is independent of the
motion of the source. Ritz's theory would force us to assume that the motion of
the double stars is governed not by Newton's law, but by a much more
complicated law, depending on the star's distance from the earth, which is
evidently absurd.
Of course, he was looking in the frequency range of visible light, which we've noted is
subject to extinction. However, in the x-ray range we can (in principle) perform the same
basic test, and yet we still find no traces of these ghostly apparitions in binary stars, nor
do we ever see the stellar components going in "reverse time" as we would according to
the above profile. (Needless to say, for star systems at great distances it is not possible to
distinguish the changes in transverse positions but, as noted above, by examining the
Doppler shift of the radial components of their motions we can infer the motions of the
individual bodies.) Hence these observations support the proposition that the speed of
light in empty space is essentially independent of the speed of the source.
In comparison, if we take the relativistic approach with constant light speed c,
independent of the speed of the source, an analysis similar to the above gives the
approximate result

T ≈ t + d + r cos(wt)

whose derivative is

dT/dt ≈ 1 − v sin(wt)
which is always positive for any v less than 1. This means we can't possibly have images
arriving in reverse time, nor can we have any multiple appearances of the components of
the binary star system.
Regarding this subject, Robert Shankland recalled Einstein telling him (in 1950) that he
had himself considered an emission theory of light, similar to Ritz's theory, during the
years before 1905, but he abandoned it because
he could think of no form of differential equation which could have solutions
representing waves whose velocity depended on the motion of the source. In this
case the emission theory would lead to phase relations such that the propagated
light would be all badly "mixed up" and might even "back up on itself". He asked
me, "Do you understand that?" I said no, and he carefully repeated it all. When he
came to the "mixed up" part, he waved his hands before his face and laughed, an
open hearty laugh at the idea!
2.11 Thomas Precession

At the first turning of the second stair


I turned and saw below
The same shape twisted on the banister
Under the vapour in the fetid air
Struggling with the devil of the stairs who wears
The deceitful face of hope and of despair.
T. S. Eliot, 1930

Consider a slanted rod AB in the xy plane moving at speed u in the positive y direction as
indicated in the left-hand figure below. The A end of the rod crosses the x axis at time t =
0, whereas the B end does not cross until time t = 1. Hence we conclude that the rod is
oriented at some non-zero angle with respect to the xyt coordinate system. However,
suppose we view the same situation with respect to a system of inertial coordinates x'y't'
(with x' parallel to x) moving in the positive x direction with speed v. In accord with
special relativity, the x' and t' axes are skewed with respect to the x and t axes as shown in
the right-hand figure below.

As a result of this skew, the B end of the rod crosses the x' axis at the same instant (i.e.,
the same t') as does the A end of the rod, which implies that the rod is parallel to the x'
axis - and therefore to the x axis - based on the simultaneity of the x'y't' inertial frame.
This implies that if a rod was parallel to the x axis and moving in the positive x direction
with speed v, it would be perfectly aligned with the rod AB as the latter passed through
the x' axis. Thus if a rod is initially aligned with the x axis and moving with speed v in
the positive x direction relative to a given fixed inertial frame, and then at some instant
with respect to the rod's inertial rest frame it instantaneously changes course and begins
to move purely in the positive y direction, without ever changing its orientation, we find
that its orientation does change with respect to the original fixed frame of reference. This
is because the changes in the states of motion of the individual parts of the rod do not
occur simultaneously with respect to the original rest frame.
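A minimal numerical version of this thought experiment may help; the crossing events
below are hypothetical values, chosen so that the primed frame's speed makes the two
crossings simultaneous:

import numpy as np

# End A of the rod crosses the x axis at event (t, x) = (0, 0); end B crosses
# at (1, 2) (hypothetical values, c = 1).  In a frame moving along x with
# v = 1/2, both crossings have the same t', so the rod is parallel to x'.
v = 0.5
g = 1.0 / np.sqrt(1.0 - v*v)
tA, xA = 0.0, 0.0
tB, xB = 1.0, 2.0
print(g*(tA - v*xA), g*(tB - v*xB))   # both 0.0: simultaneous in the primed frame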
In general, whenever we transport a vector, always spatially parallel to itself in its own
instantaneous rest frame, over an accelerated path, we find that its orientation changes
relative to any given fixed inertial frame. This is the basic idea behind Thomas
precession, named after Llewellyn Thomas, who first wrote about it in 1927. For a simple

application of this phenomenon, consider a particle moving around a circular path. The
particle undergoes continuous acceleration, but at each instant it is at rest with respect to
the momentarily co-moving inertial frame. If we consider the "parallel transport" of a
vector around the continuous cycle of momentary inertial rest frames of the particle, we
find that the vector does not remain fixed. Instead, it "precesses" as we follow it around
the cycle. This relativistic precession (which has no counter-part in non-relativistic
physics) actually has observable consequences in the behavior of sub-atomic particles
(see below).
To understand how the Thomas precession for simple circular motion can be deduced
from the basic principles of special relativity, we can begin by supposing the circular path
of a particle is approximated by an n-sided polygon, and consider the transition from one
of these sides to the next, as illustrated below.

Let v denote the circumferential speed of the particle in the counter-clockwise direction,
and note that the turning angle at each vertex is θ = 2π/n for an arbitrary n-sided regular
polygon. (In the drawing above we
have set n = 8). The dashed lines represent the loci of positions of the spatial origins of
two inertial frames K' and K" that are co-moving with the particle on consecutive edges.
Now suppose the vector ab at rest in K' makes an angle α1 with respect to the x axis (in
terms of frame K), and suppose the vector AB at rest in K" makes an angle of α2 with
respect to the x axis. The figure below shows the positions of these two vectors at several
consecutive instants of the frame K.

Clearly if α1 is not equal to α2, the two vectors will not coincide at the instant when their
origins coincide. However, this assumes we use the definition of simultaneity associated
with the inertial coordinate system K (i.e., the rest system of the polygon). The system K'
is moving in the positive x direction at the speed v, so its time-slices are skewed relative
to those of the polygon's frame of reference. Because of this skew, it is possible for the
vectors ab and AB to be parallel with respect to K' even though they are not parallel with
respect to K.
The equations of the moving vectors ab and AB are easily seen to be

y = (x − vt) tan(α1)        y = [x − vt cos(θ)] tan(α2) + vt sin(θ)

This confirms that at t = 0 (or at any fixed t) these lines are not parallel unless α1 = α2.
However, if we substitute from the Lorentz transformation between the frames K and K'

x = γ(x′ + vt′)        y = y′        t = γ(t′ + vx′)

where γ = (1 − v²)^(−1/2), the equations of the moving vectors become relations among the
primed coordinates. At t′ = 0 these equations reduce to

y = (x′/γ) tan(α1)        y = γx′ [ (1 − v² cos(θ)) tan(α2) + v² sin(θ) ]
In the limit as the number n of sides of the polygon increases and the angle θ approaches
zero, the value of cos(θ) approaches 1 (to the second order), and the value of sin(θ)
approaches θ. Hence the equations of the two moving vectors approach

y = (x′/γ) tan(α1)        y = (x′/γ) tan(α2) + γv²θ x′

Setting these equal to each other, multiplying through by γ/x′, and re-arranging, we get
the condition

tan(α2) − tan(α1) = −γ²v²θ
Recalling the trigonometric identity

tan(α2 − α1) = [tan(α2) − tan(α1)] / [1 + tan(α1) tan(α2)]

and noting that α1 approaches α2 in the limit as θ goes to zero, the right-hand factor on the
right side can be taken as

1 / [1 + tan²(α)] = cos²(α)

where α is the limiting value of both α1 and α2 as θ goes to zero. Making use of these
substitutions, and also noting that tan(α2 − α1) approaches α2 − α1, the condition for the two
families of lines to be parallel with respect to frame K' (in the limit as θ goes to zero) is

α2 − α1 = −γ²v² cos²(α) θ
This is the amount by which the two vectors are skewed with respect to the K frame due
to the transition around a single vertex of the polygon, given that the transported vector
makes an angle α with the edge leading into the vertex. The total precession resulting
from one complete revolution around the n-sided polygon is n times the mean value of
α2 − α1 over the n vertices of the polygon. Since n = 2π/θ, we can express the total
precession as

Δψ = −2π v² ⟨cos²(α)⟩ / (1 − v²)

where ⟨cos²(α)⟩ denotes the mean value of cos²(α) over the vertices.
If the circumferential speed v is small compared with 1, the denominator of this
expression is closely approximated by 1, and the transported vector changes its absolute
orientation only very slightly on one revolution. In this case it follows that α varies
essentially uniformly from 0 to 2π as the vector is transported around the circle, so
⟨cos²(α)⟩ = 1/2. Hence for small v the total precession for one revolution is given closely by

Δψ ≈ −πv²
On the other hand, if v is not small, we can consider the general situation illustrated
below:

The variable θ signifies the absolute angular position of the transported vector as it is
carried around the circle, and ψ signifies the vector's orientation relative to the positive
y axis. As before, α denotes the angle of the vector relative to the local tangent "edge".
We have the relation

ψ = θ + α

We also have the following identification involving the differentials of the parameters
θ and ψ (the continuous-limit form of the per-vertex result derived above):

dψ = −γ²v² cos²(α) dθ
Substituting dθ + dα for dψ and re-arranging, we get

dθ = −dα / [1 + γ²v² cos²(α)]

This can be integrated explicitly to give θ as a function of α. Since ψ equals θ + α, we
can also give ψ as a function of α, leading to the parametric equations

θ(α) = −(1/γ) arctan[ tan(α)/γ ]        ψ(α) = α − (1/γ) arctan[ tan(α)/γ ]

where γ = (1 − v²)^(−1/2). One complete "branch" is given by allowing α to range from −π/2 to
π/2, giving the angle θ from π/(2γ) to −π/(2γ), and the angle ψ from −(π/2)(1 − 1/γ) to (π/2)(1 − 1/γ).
This is shown in the figure below.

Consequently, a full revolution (a 2π change in θ) corresponds to 2γ times the above range,
and so the average change in ψ per revolution (i.e., per 2π increase in θ) is

Δψ = −2π(γ − 1)

with the minus sign indicating that the precession is retrograde, i.e., opposite to the
direction of the orbital motion. This function is plotted in the figure below, along with the
"small v" approximation −πv².

For all v less than 1 we can expand the general expression into a series

Δψ = −πv² [ 1 + (3/4)v² + (5/8)v⁴ + ... ]
These expressions represent the average change per revolution, because the cycles of α
do not in general coincide with the cycles of θ. Resonance occurs when the ratio of the
change in ψ to the change in θ is rational. This is true if and only if there exist integers
M,N such that

γ − 1 = M/N

Adding 1 to both sides, we can set 1 + (M/N) equal to m/n for integers m and n, and we
can then square both sides and re-arrange to find that the "resonant" values of v are
given by

v = (m² − n²)^(1/2) / m

where m,n are integers with |n| less than |m|.
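The total precession can also be checked without any polygon limit by numerically
integrating the Fermi-Walker transport equation, which is the standard continuous form of
the "parallel in its own instantaneous rest frame" rule used above. The sketch below
assumes v = 0.6, for which 2π(γ − 1) is exactly π/2:

import numpy as np

# Fermi-Walker transport of a spin vector around one circular orbit should
# rotate it retrograde by 2*pi*(gamma - 1).  Units c = 1, signature (+,-,-,-).
eta = np.diag([1.0, -1.0, -1.0, -1.0])
v, R = 0.6, 1.0
w = v / R                                      # orbital angular velocity
gam = 1.0 / np.sqrt(1.0 - v*v)

def u_of(t):   # 4-velocity of the circling particle at coordinate time t
    return gam * np.array([1.0, -v*np.sin(w*t), v*np.cos(w*t), 0.0])

def a_of(t):   # 4-acceleration, a = du/dtau = gamma * du/dt
    return -gam**2 * v * w * np.array([0.0, np.cos(w*t), np.sin(w*t), 0.0])

def dS_dt(t, S):
    # Fermi-Walker: dS/dtau = a (u.S) - u (a.S);  dS/dt = (1/gamma) dS/dtau
    u, a = u_of(t), a_of(t)
    return (a * (u @ eta @ S) - u * (a @ eta @ S)) / gam

S = np.array([0.0, 1.0, 0.0, 0.0])             # spin along +x, orthogonal to u(0)
T, n = 2*np.pi/w, 50000
h = T / n
for k in range(n):                             # classic 4th-order Runge-Kutta
    t = k*h
    k1 = dS_dt(t, S);                k2 = dS_dt(t + h/2, S + h/2*k1)
    k3 = dS_dt(t + h/2, S + h/2*k2); k4 = dS_dt(t + h, S + h*k3)
    S = S + (h/6)*(k1 + 2*k2 + 2*k3 + k4)

# Angle of the spatial part in the momentarily comoving frame (boost along +y):
Sy_co = gam * (S[2] - v*S[0])
print(np.arctan2(Sy_co, S[1]), -2*np.pi*(gam - 1.0))   # both ~ -pi/2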


We previously derived the low-speed approximation of the amount of Thomas precession
for a vector subjected to "parallel transport" around a circle with a constant
circumferential speed v in the form πv² radians per revolution. Dividing this by 2π gives
the average precession rate of v²/2 in units of radians per radian (of travel around the
circle). We can also determine the average rate of Thomas precession in units of
radians per second. Letting ωo denote the orbital angular velocity (i.e., the angular
velocity with which the vector is transported around the circle of radius r), we have
v = ωo r and a = v²/r, where a is the centripetal acceleration. Hence we have ωo = v/r = a/v, so
multiplying v²/2 by ωo gives the average Thomas precession rate ωT = va/2 in units of
rad/sec, which represents a frequency of νT = (v²/2)νo = va/(4π) cycles/sec.
Since the magnitude πv² of the Thomas precession is of the second order in v, we might
be tempted to think it is insignificant for ordinary terrestrial phenomena, but the
expression νT = (v²/2)νo shows that the precession frequency can be quite large in
absolute terms, even if v is small, provided νo is sufficiently large. This occurs when the
orbital radius r is very small, giving a very large acceleration for any given orbital
velocity. Consider, for example, the orbit of an electron around the nucleus of an atom.
An electron has intrinsic quantum "spin" which tends to maintain it's absolute orientation
much as does a spinning gyroscope, so it can be regarded as a vector undergoing parallel
transport. Now, according to the original (naive) Bohr model, the classical orbit of an
electron around the nucleus is given by equating the Coulomb and centripetal forces

Ne² / (4πε0 r²) = mv² / r

where e is the charge of an electron, m is the mass, ε0 is the permittivity of the vacuum,
and N is the atomic number of the nucleus, so the linear and angular speeds of the
electron are

v = [ Ne² / (4πε0 m r) ]^(1/2)        ω = v/r

Bohr hypothesized that the angular momentum L = mvr can only be an integer multiple
of h/(2π), so we have for some positive integer n

mvr = nh / (2π)

Therefore, the linear velocity and orbital frequency of an electron (in this simplistic
model) are

v = Nα / n        νo = ω/(2π) = N²α² m / (n³ h)

where α = e²/(2hε0) is the dimensionless "fine structure constant", whose value is
approximately 1/137. (Remember that we are using units such that c = 1, so all distances
are expressed in units of seconds.) For the lowest energy state of a hydrogen atom we
have n = N = 1, so the linear speed of the electron is about 1/137. Consequently the
precession frequency is −(v²/2) = −0.00002664 times the orbital frequency, which is a very
small fraction, but it is still a very large frequency in absolute terms (about 1.755×10¹¹
cycles/sec) because the orbital frequency is so large. (Note that these are not the
frequencies of photons emitted from the atom, because those correspond to quanta of
light given off due to transitions from one energy level to another, whereas these are the
theoretical orbital frequencies of the electron itself in Bohr's simple model.)
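In round numbers, a sketch of the figures quoted above (using the standard values of the
fine structure constant and the Bohr radius):

import math

alpha = 1/137.036                  # fine structure constant
c = 2.998e8                        # m/s
r1 = 5.29e-11                      # m, Bohr radius (n = N = 1 hydrogen)

v = alpha * c                      # electron speed, ~c/137
nu_orbit = v / (2*math.pi*r1)      # orbital frequency, ~6.6e15 cycles/sec
nu_thomas = (alpha**2/2) * nu_orbit
print(nu_orbit, nu_thomas)         # ~6.6e15 and ~1.75e11 cycles/sec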
Incidentally, there is a magnetic interaction between the electron and nucleus of some
atoms that is predicted to cause the electron's spin axis to precess by +v² radians per
orbital radian, but the actual observed precession rate of the spin axes of electrons in such
atoms is only +(v²/2). For a while after its discovery, there was no known explanation for
this discrepancy. Only in 1927 did Thomas point out that special relativity implies the
purely kinematic relativistic effect that now bears his name, which (as we've seen) yields
a precession of −(v²/2) radians per orbital radian. The sum of this purely kinematic effect
due to special relativity with the predicted effect due to the magnetic interaction yields
the total observed +(v²/2) precession rate.
It's often said that the relativistic effect supplies a "factor of 2" (i.e., divides by 2) to the
electron's precession rate. For example, Uhlenbeck wrote that
...when I first heard about [the Thomas precession], it seemed unbelievable that a relativistic effect
could give a factor of 2 instead of something of order v/c... Even the cognoscenti of relativity
theory (Einstein included!) were quite surprised.

(Uhlenbeck also told Pais that he didn't understand a word of Thomas's work when it first
came out.) However, this description is somewhat misleading, because (as we've seen)
the relativistic effect is actually additive, not multiplicative. It just so happens that a
particular magnetic interaction yields a precession with twice the frequency, and the
opposite sign, of the Thomas precession, so the sum of the two effects is half the size of
the magnetic effect alone. Both of the effects are second-order in the linear speed v/c.
3.1 Postulates and Principles
Complex ideas may perhaps be well known by definition, which is nothing but an
enumeration of those parts or simple ideas that comprise them. But when we have pushed
up definitions to the most simple ideas, and find still some ambiguity and obscurity, what
resources are we then possessed of?
David Hume, 1748

As discussed in Section 1, even after stipulating the existence of coordinate systems with
respect to which inertia is homogeneous and isotropic, there remains a fundamental
ambiguity as to the character of the relationship between relatively moving inertial
coordinate systems, corresponding to three classes of possible metrical structures, with
the k values −1, 0, and +1. There is a remarkably close historical analogy for this

situation, dating back to one of the first formal systems of thought ever proposed. In
Book I of The Elements, Euclid consolidated and systematized plane geometry as it was
known circa 300 BC into a formal deductive system. As it has come down to us, it is
based on five postulates together with several definitions and common notions. (It's
worth noting, however, that the classification of these premises was revised many times
in various translations.) The first four of these postulates are stated very succinctly:
1. A straight line may be drawn from any point to any other point.
2. A straight line segment can be uniquely and indefinitely extended.
3. We may draw a circle of any radius about any point.
4. All right angles are equal to one another.

Each of these assertions actually entails a fairly complicated set of premises and
ambiguities, but they were accepted as unobjectionable for two thousand years. However,
Euclid's final postulate was regarded with suspicion from earliest times. It has a very
different appearance from the others - a difference that neither Euclid nor his subsequent
editors and translators attempted to disguise. The fifth postulate is expressed as follows:
5. If a straight line falling on two straight lines makes the [sum of the] interior angles on the same
side less than two right angles, then the two straight lines, if produced indefinitely, meet on that
side on which the angles are less than two right angles.

This postulate is equivalent to the statement that there's exactly one line through a given
point P parallel to a given line L, as illustrated below

Although this proposition is fairly plausible (albeit somewhat awkward to state), many
people suspected that it might be logically deducible from the other postulates, axioms,
and common notions. There were also attempts to substitute for Euclid's fifth postulate a
simpler or more self-evident proposition. However, we now understand that Euclid's fifth
postulate is logically independent of the rest of Euclid's logical structure. In fact, it's
possible to develop logically consistent geometries in which Euclid's fifth postulate is
false. For example, we can assume that there are infinitely many lines through P that are
parallel to (i.e., never intersect) the line L. It might seem (at first) that it would be
impossible to reason with such an assumption, that it would either lead to contradictions
or else cause the system to degenerate into a logical triviality about which nothing
interesting could be said, but, remarkably, this turns out not to be the case.
Suppose that although there are infinitely many lines through P that never intersect L,
there are also infinitely many that do intersect L. This, combined with the other axioms
and postulates of plane geometry, implies that there are two lines through P defining the
boundary between lines that do intersect L and lines that don't, as shown below:

This leads to the original non-Euclidean geometry of Lobachevski, Bolyai, and Gauss,
i.e., the hyperbolic plane. The analogy to Minkowski spacetime is obvious. The behavior
of straight lines in a surface of negative curvature (although positive-definite) is nicely
suggestive of how the light-lines in spacetime serve as the dividing lines between those
lines through P that intersect with the future "L" and those that don't (distinguishing
between spacelike and timelike intervals). This is also a nice illustration of the fact that
even though Minkowski spacetime is "flat" in the Riemannian sense, it is nevertheless
distinctly non-Euclidean. Of course, the possibility that spacetime might be curved as
well as locally Minkowskian led to general relativity, but arguably the conceptual leap
required to go from a positive-definite to a non-positive-definite metric is greater than
that required to go from a flat to a curved metric. The former implies that the local
geometrical structure of the effective spatio-temporal manifold of events is profoundly
different than had been assumed for thousands of years, and this realization led naturally
to a new set of principles with which to organize and interpret our experience.
It became clear in the 19th century that there are actually three classes of geometries
consistent with Euclid's basic premises, depending on what we adopt as the fifth
postulate. The three types of geometry correspond to spaces of negative, positive, or
zero curvature. The analogy to the three possible classes of spacetimes (Euclidean,
Galilean, and Minkowskian) is obvious, and in both cases it came to be recognized that,
insofar as these mathematical structures were supposed to represent physical properties,
the choice between the alternatives was a matter for empirical investigation.
Nevertheless, the superficially axiomatic way in which Einstein presented the special
theory in his 1905 paper tended to encourage the idea that special relativity represented a
closed formal system, like Euclids geometry interpreted in the purely mathematical
sense. For example, in 1907 Paul Ehrenfest wrote that
In the formulation in which Mr Einstein published it, Lorentzian relativistic electrodynamics is
rather generally viewed as a complete system. Accordingly it must be able to provide an answer
purely deductively to the question [involving the shape of the moving electron]

However, Einstein himself was quick to disavow this idea, answering


The principle of relativity, or, more exactly, the principle of relativity together with the principle of
the constancy of the velocity of light, is not to be conceived as a complete system, in fact, not as
a system at all, but merely as a heuristic principle which, when considered by itself, contains only
statements about rigid bodies, clocks, and light signals. It is only by requiring relations between
otherwise seemingly unrelated laws that the theory of relativity provides additional statements.

Just as the basic premises of Euclid's geometry were classified in many different ways

(e.g., postulates, axioms, common notions, definitions), the premises on which Einstein
based special relativity can be classified in many different ways. Indeed, in his 1905
paper, Einstein introduced the first of these premises as follows:
... the same laws of electrodynamics and optics will be valid for all coordinate systems in which
the equations of mechanics hold good. We will raise this conjecture (hereafter called the "principle
of relativity") to the status of a postulate...

Here, in a single sentence, we find a proposition referred to as a conjecture, a principle,
and a postulate. The meanings of these three terms are quite distinct, but they are each
arguably applicable. The assertion of the co-relativity of optics and mechanics was, and
will always be, conjectural, because it can be empirically corroborated only up to a
limited precision. Einstein formally adopted this conjecture as a postulate, but on a more
fundamental level it serves as a principle, since it entails the decision to organize our
knowledge in terms of coordinate systems with respect to which the equations of
mechanics hold good, i.e., inertial coordinate systems. Einstein goes on to introduce a
second proposition that he formally adopts as a postulate, namely,
... that light always propagates in empty space with a definite velocity c that is
independent of the state of motion of the emitting body. These two postulates suffice for the
attainment of a simple and consistent electrodynamics of moving bodies based on Maxwell's
theory for bodies at rest.

Interestingly, in the paper "Does the Inertia of a Body Depend on Its Energy Content?"
published later in the same year, Einstein commented that
... the principle of the constancy of the velocity of light... is of course contained in Maxwell's
equations.

In view of this, some have wondered why he did not simply dispense with his "second
postulate and assert that the "laws of electrodynamics and optics" in the statement of the
first principle are none other than Maxwell's equations. In other words, why didnt he
simply base his theory on the single proposition that Maxwell's equations are valid for
every system of coordinates in which the laws of mechanics hold good? Part of the
answer is that he realized important parts of physics, such as the physics of elementary
particles, cannot possibly be explained in terms of Maxwellian electrodynamics. In a note
published in 1907 he wrote
It should be noted that the laws that govern [the structure of the electron] cannot be derived from
electrodynamics alone. After all, this structure necessarily results from the introduction of forces
which balance the electrodynamic ones.

More fundamentally, by 1905 he was already aware of the fact that, although Maxwell's
equations are empirically satisfactory in many respects, they cannot be regarded as
fundamentally correct or valid. In his paper "On a Heuristic Point of View Concerning
the Production and Transformation of Light" he wrote
... despite the complete confirmation of [Maxwell's theory] by experiment, the theory of light,
operating with continuous spatial functions, leads to contradictions when applied to the

phenomena of emission and transformation of light.

Thus it isn't surprising that he chose not to base the theory of relativity on Maxwell's
equations. He needed to distill from electromagnetic phenomena the key feature whose
significance "transcended its connection with Maxwell's equations", and which would
serve as a viable principle for organizing our knowledge of all phenomena, including
both optics and mechanics. The principle he selected was the existence of an invariant
speed with respect to any local system of inertial coordinates, and then for definiteness he
could identify this speed with the speed of propagation of electromagnetic energy.
After reviewing the operational definition of inertial coordinates in section 1 (which he
does by optical rather than mechanical means, thereby missing an opportunity to clarify
the significance of inertial coordinates in establishing the connection between mechanical
and optical phenomena), he gives more formal statements of his two principles
The following reflections are based on the principle of relativity and the principle of the constancy
of the velocity of light. These two principles we define as follows:
1. The laws by which the states of physical systems undergo change are not affected, whether
these changes of state be referred to the one or the other of two systems of co-ordinates in uniform
translatory motion.
2. Any ray of light moves in the "stationary" system of co-ordinates with the determined velocity
c, whether the ray is emitted by a stationary or by a moving body. Hence velocity equals [length
of] light path divided by time interval [of light path], where time interval [and length are] to be
taken in the sense of the definition in §1.

The first of these is nothing but the principle of inertial relativity, which had been
accepted as a fundamental principle of physics since the time of Galileo (see section 1.3).
Strictly speaking, Einstein's statement of the principle here is incorrect, because he
assumes the coordinate systems in which the equations of mechanics hold good are fully
characterized by being in uniform translatory motion, whereas in fact it is also necessary
to specify an inertially isotropic simultaneity. Einstein chose to address this aspect of
inertial coordinate systems by means of a separate and seemingly arbitrary definition of
simultaneity based on optical phenomena, which unfortunately has invited much
misguided philosophical debate about what should be considered true simultaneity. All
this could have been avoided if, from the start, Einstein had merely stated that an inertial
coordinate system is one in which mechanical inertia is homogeneous and isotropic (just
as Galileo said), and then noting that this automatically entails the conventional choice of
simultaneity. The content of his first principle (i.e., the relativity principle) is simply that
the inertial simultaneity of mechanics and the optical simultaneity of electrodynamics are
identical.
Despite the shortcomings of its statement, the principle of relativity was very familiar to
the physicists of 1905, whether they wholeheartedly accepted it or not. Einstein's second
principle, by itself, was also not regarded as particularly novel, because it conveys the
usual understanding of how a wave propagates at a fixed speed through a medium,
independent of the speed of the source. It was the combination of these two principles
that was new, since they had previously been thought to be irreconcilable. In a sense, the

first principle arose from the ballistic particles in a vacuum view of physics, and the
second arose from the wave in a material medium view of physics. Both of these views
can trace their origins back to ancient times, and both seem to capture some fundamental
truth about the world, and yet they had always been regarded as mutually exclusive.
Einstein's achievement was to explain how they could be reconciled.
Of course, Einstein's second principle isn't a self-contained statement, because its entire
meaning and significance depend on "the sense of" time intervals and (implicitly) spatial
lengths given in §1, where we find that time intervals and spatial lengths are defined to be
such that their ratio equals the fixed constant c for light paths. This has tempted some
readers to conclude that "Einstein's second principle" was merely a tautology, with no
substantial content. The source of this confusion is the fact that the essential axiomatic
foundations underlying special relativity are contained not in the two famous propositions
at the beginning of §2 of Einstein's paper (as quoted above), but rather in the sequence of
assumptions and definitions explicitly spelled out in §1. Among these are the very first
statement
Let us take a system of co-ordinates in which the equations of Newtonian mechanics hold good.

In subsequent re-prints of this paper Sommerfeld added a footnote to this statement, to
say "i.e., to the first approximation", meaning for motion with speeds small in
comparison with the speed of light. (This illustrates the difficulty of writing a paper that
results in a modification of the equations of Newtonian mechanics!) Of course, Einstein
was aware of the epistemological shortcomings of the above statement, because while it
tells us to begin with an inertial system of coordinates, it doesn't tell us how to identify
such a system. This has always been a potential source of ambiguity for mechanics based
on the principle of inertia. Strictly speaking, Newton's laws are epistemologically
circular, so in practice we must apply them both inductively and deductively. First we use
them inductively with our primitive observations to identify inertial coordinate systems
by observing how things behave. Then at some point when we've gained confidence in
the inertialness of our coordinates, we begin to apply the laws deductively, i.e., we begin
to deduce how things will behave with respect to our inertial coordinates. Ultimately this
is how all physical theories are applied, first inductively as an organizing principle for
our observations, and then deductively as "laws" to make predictions. Neither Galilean
nor special relativity is able to justify the privileged role given to a particular class of
coordinate systems, nor to provide a non-circular means of identifying those systems. In
practice we identify inertial systems by means of an incomplete induction. Although
Einstein was aware of the deficiency of this approach (which he subsequently labored to
eliminate from the general theory), in 1905 he judged it to be the only pragmatic way
forward.
The next fundamental assertion in §1 of Einstein's paper is that lengths and time intervals
can be measured by (and expressed in terms of) a set of primitive elements called
"measuring rods" and "clocks". As discussed in Section 1.2, Einstein was fully aware of
the weakness in this approach, noting that strictly speaking, measuring rods and clocks
should emerge as solutions of the basic equations, not as primitive conceptions.
Nevertheless

it was better to admit such inconsistency - with the obligation, however, of eliminating it at a later
stage of the theory...

Thus the introduction of clocks and rulers as primitive entities was another pragmatic
concession, and one that Einstein realized was not strictly justifiable on any other
grounds than provisional expediency.
Next Einstein acknowledges that we could content ourselves to time events by using an
observer located at the origin of the coordinate system, which corresponds to the absolute
time of Lorentz, as discussed in Section 1.6. Following this he describes the "much more
practical arrangement" based on the reciprocal operational definition of simultaneity. He
says
We assume this definition of synchronization to be free of any possible contradictions, applicable
to arbitrarily many points, and that the following relations are universally valid:
1. If the clock at B synchronizes with the clock at A, the clock at A synchronizes with the clock at
B.
2. If the clock at A synchronizes with the clock at B and also with the clock at C, the clocks at B
and C also synchronize with each other.

These are important and non-trivial assumptions about the viability of the proposed
operational procedure for synchronizing clocks, but they are only indirectly invoked by
the reference to "the sense of time intervals" in the statement of Einstein's second
principle. Furthermore, as mentioned in Section 1.6, Einstein himself subsequently
identified at least three more assumptions (homogeneity, spatial isotropy,
memorylessness) that are tacitly invoked in the formal development of special relativity.
The list of unstated assumptions would actually be even longer if we were to construct a
theory beginning from nothing but an individual's primitive sense perceptions. The
justification for leaving them out of a scientific paper is that these can mostly be
classified as what Euclid called "common notions", i.e., axioms that are common to all
fields of thought.
In many respects Einstein modeled his presentation of special relativity not on Euclid's
Elements (as Newton had done in the Principia), but on the formal theory of
thermodynamics, which is founded on the principle of the conservation of energy. There
are different kinds of energy, with formally different units, e.g., mechanical and
gravitational potential energy are typically measured in terms of joules (a force times a
distance, or equivalently a mass times a squared velocity), whereas heat energy is
measured in calories (the amount of heat required to raise the temperature of 1 gram of
water by one degree C). It's far from obvious that these two things can be treated as
different aspects of the same thing, i.e., energy. However, through careful experiments
and observations we find that whenever mechanical energy is dissipated by friction (or
any other dissipative process), the amount of heat produced is proportional to the amount
of mechanical energy dissipated. Conversely, whenever heat is involved in a process that
yields mechanical work, the heat content is reduced in proportion to the amount of work
produced. In both cases the constant of proportionality is found to be 4.1833 joules per
calorie.
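
To make this bookkeeping concrete, here is a minimal Python sketch; the constant is the one quoted above, and the function name is ours, purely for illustration:

    # Sketch of the operational bookkeeping described above: the joule/calorie
    # constant is *chosen* so that dissipated work and generated heat count as
    # equal amounts of the same quantity, "energy".
    J_PER_CAL = 4.1833  # joules per calorie, the proportionality quoted above

    def heat_from_dissipated_work(work_joules):
        # Heat (in calories) produced when mechanical work is dissipated.
        return work_joules / J_PER_CAL

    # Dissipating 100 J of mechanical energy yields about 23.9 cal of heat;
    # by construction, the total "energy" is unchanged.
    print(heat_from_dissipated_work(100.0))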

Now, the First Law of thermodynamics asserts that the total energy of any physical
process is always conserved, provided we "correctly" account for everything. Of course,
in order for this assertion to even make sense we need to define the proportionality
constants between different kinds of energy, and those constants are naturally defined so
as to make the First Law true. In other words, we determine the proportionality between
heat and mechanical work by observing these quantities and assuming that those two
changes represent equal quantities of something called "energy". But this assumption is
essentially equivalent to the First Law, so if we apply these operational definitions and
constants of proportionality, the conservation of energy can be regarded as a tautology or
a convention.
This shows clearly that, just as in the case of Newton's laws, these propositions are
actually principles rather than postulates, meaning that they first serve as organizing
principles for our measurements and observations, and only subsequently do they serve
as "laws" from which we may deduce further consequences. This is the sense in which
fundamental physical principles always operate. Wien's letter of 1912 nominating
Einstein and Lorentz for the Nobel prize commented on this same point, saying that "the
confirmation of [special relativity] by experiment... resembles the experimental
confirmation of the conservation of energy".
Einstein himself acknowledged that he consciously modeled the formal structure of
special relativity on thermodynamics. He wrote in his autobiographical notes
Gradually I despaired of the possibility of discovering the true laws by means of constructive
efforts based on known facts. The longer and the more desperately I tried, the more I came to the
conviction that only the discovery of a universal formal principle could lead us to assured results.
The example I saw before me was thermodynamics. The general principle was there given in the
proposition: The laws of nature are such that it is impossible to construct a perpetuum mobile (of
the first and second kinds).

This principle is a meta-law, i.e., it does not express a particular law of nature, but rather
a general principle to which all the laws of nature conform. In 1907 Ehrenfest suggested
that special relativity constituted a closed axiomatic system, but Einstein quickly replied
that this was not the case. He explained that the relativity principle combined with the
principle of invariant light speed is not a closed system at all, but rather it provides a
coherent framework within which to conduct physical investigations. As he put it, the
principles of special relativity "permit certain laws to be traced back to one another (like
the second law of thermodynamics)."
Not only is there a close formal similarity between the axiomatic structures of
thermodynamics and special relativity, each based on two fundamental principles, but these
two theories are also substantively extensions of each other. The first law of
thermodynamics can be placed in correspondence with the basic principle of relativity,
which suggests the famous relation E = mc^2, thereby enlarging the realm of applicability
of the first law. The second law of thermodynamics, like Einstein's second principle of
invariant light speed, is more sophisticated and more subtle. A physical process whose
net effect is to remove heat from a body and produce an equivalent amount of work is
called perpetual motion of the second kind. It isn't obvious from the first law that such a
process is impossible, and indeed there were many attempts to find such a process - just
as there were attempts to identify the rest frame of the electromagnetic ether - but all such
attempts failed. Moreover, they failed in such a way as to make it clear that the failures
were not accidental, but that a fundamental principle was involved.
In the case of thermodynamics this was ultimately formulated as the second law, one
statement of which (as alluded to by Einstein in the quote above) is simply that perpetual
motion of the second kind is impossible - provided the various kinds of energy are
defined and measured in the prescribed way. (This theory was Einstein's bread and butter,
not only because most of his scientific work prior to 1905 had been in the field of
thermodynamics, but also because a patent examiner inevitably is called upon to apply
the first and second laws to the analysis of hopeful patent applications.) Compare this
with Einstein's second principle, which essentially asserts that it's impossible to measure
a speed in excess of the constant c - provided the space and time intervals are defined and
measured in the prescribed way. The strength of both principles is due ultimately to the
consistency and coherence of the ways in which they propose to analyze the processes of
nature.
Needless to say, our physical principles are not arbitrarily selected assumptions; they are
hard-won distillations of a wide range of empirical facts. Regarding the justification for
the principles on which Einstein based special relativity, many popular accounts give a
prominent place to the famous experiments of Michelson and Morley, especially the
crucial version performed in 1887, often presenting this as the "brute fact" that
precipitated relativity. Why, then, does Einstein's 1905 paper fail to cite this famous
experiment? It does mention at one point the various unsuccessful attempts to measure
the Earth's motion with respect to the ether, but never refers to Michelson's results
specifically. The conspicuous absence of any reference to this important experimental
result has puzzled biographers and historians of science. Clearly Einstein's intent was to
present the most persuasive possible case for the relativity of space and time, and
Michelson's results would (it seems) have been a very strong piece of evidence in his
favor. Could he simply have been unaware of the experiment at the time of writing the
paper?
Einstein's own recollections on this point were not entirely consistent. He sometimes said
he couldn't remember if he had been aware in 1905 of Michelson's experiments, but at
other times he acknowledged that he had known of them from having read the works of
Lorentz. Indeed, considering Einstein's obvious familiarity with Lorentz's works, and
given all the attention that Lorentz paid to Michelson's ether drift experiments over the
years, it's difficult to imagine that Einstein never absorbed any reference to those
experiments. Assuming he was aware of Michelson's results prior to 1905, why did he
choose not to cite them in support of his second principle? Of course, his paper includes no
formal references at all (which in itself seems peculiar, especially to modern readers
accustomed to extensive citations in scholarly works), but it does refer to some other
experiments and theories by name, so an explicit reference to Michelson's result would
not have been out of place.


One possible explanation for Einstein's reluctance to cite Michelson, both in 1905 and
subsequently, is that he was sophisticated enough to know that his theory was
technically just a re-interpretation of Lorentz's theory - making identical predictions - so
it could not be preferred on the basis of agreement with experiment. To Einstein the most
important quality of his interpretation was not its consistency with experiment, but its
inherent philosophical soundness. In other words, conflict with experiment was bad, but
agreement with experiment by means of ad hoc assumptions was hardly any better. His
critique of Lorentz's theory (or what he knew of it at the time) was not so much that it
was empirically "wrong" (which it wasn't), but that the length contraction and time
dilation effects had been inserted ad hoc to match Michelson's null results. (It's
debatable whether this critique was justified, in view of the discussion in Section 1.5.)
Therefore, Einstein would naturally have been concerned to avoid giving the impression
that his relativistic theory had been contrived specifically to conform with Michelson's
results. He may well have realized that any appeal to the Michelson-Morley experiment
in order to justify his theory would diminish rather than enhance its persuasiveness.
This is not to suggest that Einstein was being disingenuous, because it's clear that the
principles of special relativity actually do emerge very naturally from just the first-order
effects of magnetic induction (for example), and even from more basic considerations of
the mathematical intelligibility of Galilean versus Lorentzian transformations (as stressed
by Minkowski in his famous 1908 lecture). It seems clear that Einstein's explanations for
how he arrived at special relativity were sincere expressions of his beliefs about the
origins of special relativity in his own mind. He was focused on the phenomenon of
magnetic induction and the unphysical asymmetry of the pre-relativistic explanations.
This was combined with a strong instinctive belief in the complete relativity of physics.
He told Shankland in 1950 that the experimental results which had influenced him the
most were stellar aberration and Fizeau's measurements on the speed of light in moving
water. "They were enough," he said.

3.2 Natural and Violent Motions


Mr Spenser in the course of his remarks regretted that so many members
of the Section were in the habit of employing the word Force in a sense
too limited and definite to be of any use in a complete theory. He had
himself always been careful to preserve that largeness of meaning which
was too often lost sight of in elementary works. This was best done by
using the word sometimes in one sense and sometimes in another, and in
this way he trusted that he had made the word occupy a sufficiently large
field of thought.
James Clerk Maxwell

The concept of force is one of the most peculiar in all of physics. It is, in one sense, the
most viscerally immediate concept in classical mechanics, and seems to serve as the
essential "agent of causality" in all interactions, and yet the ontological status of force has
always been highly suspect. We sometimes regard force as the cause of changes in
motion, and imagine that those changes would not occur in the absence of the forces, but
this causative aspect of force is an independent assumption that does not follow from any
quantifiable definition, since we could equally well regard force as being caused by
changes in motion, or even as merely a descriptive parameter with no independent
ontological standing at all.
In addition, there is an inherent ambiguity in the idea of changes in motion, because it
isn't obvious what constitutes unchanging (i.e., unforced) motion. Aristotle believed it
was necessary to distinguish between two fundamentally distinct kinds of motion, which
he called natural motions and violent motions. The natural motions included the apparent
movements of celestial objects, the falling of leaves to the ground, the upward movement
of flames and hot gases in the atmosphere, or of air bubbles in water, and so on.
According to Aristotle, the cause of such motions is that all objects and substances have a
natural place or level (such as air above, water below), and they proceed in the most
direct way, along straight vertical paths, to their natural places. The motion of the
celestial bodies is circular because this is the most perfect kind of unchanging eternal
motion, whereas the necessarily transitory motions of sublunary objects are rectilinear. It
may not be too misleading to characterize Aristotle's concept of sublunary motion as a
theory of buoyancy, since the natural place of light elements is above, and the natural
place of heavy elements is below. If an object is out of place, it naturally moves up or
down as appropriate to reach its proper place.
Aristotle has often been criticized for saying (or seeming to say) that the speed at which
an object falls (through the air) is proportional to its weight. To the modern reader this
seems absurd, as it is contradicted by the simplest observations of falling objects.
However, it's conceivable that we misinterpret Aristotle's meaning, partly because we're
so accustomed to regarding the concept of force as the cause of motion, rather than as an
effect or concomitant attribute of motion. If we consider the downward force (which
Aristotle would call the weight) of an object to be the force that would be required to
keep it at its current height, then the "weight" of an object really is substantially greater
the faster it falls. More strength is required to catch a falling object than to hold the same
object at rest. Some Aristotelian scholars have speculated that this was Aristotle's actual
meaning, although his writings on the subject are so sketchy that we can't know for
certain. In any case, it illustrates that the concept and significance of force in a physical
theory is often murky, and it also shows how thoroughly our understanding of physical
phenomena is shaped by the distinction between forces (such as gravity) that we consider
to be causes of motion, and those (such as impact forces) that we consider to be caused
by motion.
Aristotle also held that the speed of motion was not only proportional to the "weight"
(whatever that means) but inversely proportional to the resistance of the medium. Thus
his proposed law of motion could be expressed roughly as V = W/R, and he used this to
argue against the possibility of empty space, i.e., regions in which R = 0, because the
velocity of any object in such a region would be infinite. This doesn't seem like a very
compelling argument, since we could easily counter that the putative vacuum would not
be the natural place of any object, so it would have no "weight" in that direction either.
Nevertheless, perhaps to avoid wrestling with the mysterious fraction 0/0, Aristotle
surrounded the four sublunary elements of Earth, Water, Air, and Fire with a fifth element
(quintessence), the lightest of all, called aether. This aether filled the super-lunary region,
ensuring that we would never need to divide by zero.
In addition to natural motions, Aristotle also considered violent motions, which were any
motions resulting from acts of volition of living beings. Although his writings are
somewhat obscure and inconsistent in this area, it seems that he believed such beings
were capable of self-motion, i.e., of initiating motion in the first instance, without having
been compelled to motion by some external agent. Such self-movers are capable of
inducing composite motions in other objects, such as when we skip a stone on the surface
of a pond. The stone's motion is compounded of a violent component imparted by our
hand, and the natural component of motion compelling it toward its natural place (below
the air and water). However, as always, we must be careful not to assume that this motion
is to be interpreted as the causative result of the composition of two different kinds of
forces. It was, for Aristotle, simply the kinematic composition of two different kinds of
motion.
The bifurcation of motion into two fundamentally different types - one for natural motions
of non-living objects and another for acts of human volition - and the attention that
Aristotle gave to the question of unmoved movers, etc., is obviously related to the issue
of free will, and demonstrates the strong tendency of scientists in all ages to exempt
human behavior from the natural laws of physics, and to regard motions resulting from
human actions as original, in the sense that they need not be attributed to other motions.
We'll see in Section 9 that Aristotle's distinction between natural and violent motions
plays a key role in the analysis of certain puzzling aspects of quantum theory.
We can also see that the ontological status of "force" in Aristotle's physics is ambiguous.
In some circumstances it seems to be more an attribute of motion rather than a cause of
motion. Even if we consider the quantitative physics of Galileo, Newton, and beyond, it
remains true that "force", while playing a central role in the formulation, serves mainly as
an intermediate quantity in the calculations. In fact, the concept of 'force' could almost be
eliminated entirely from classical mechanics. (See section 4 for further discussion of
this.) Newton wrestled with the question of whether force should be regarded as an
observable or simply a relation between observables. Interestingly, Ernst Mach regarded
the third law as Newton's most important contribution to mechanics, even though others
have criticized it as being more a definition than a law.
Newton's struggle to find the "right" axiomatization of mechanics can be seen by reading
the preliminary works he wrote leading up to The Principia, such as "De motu corporum
in gyrum" (On the motion of bodies in an orbit). At one point he conceived of a system
with five Laws of Motion, but what finally appeared in Principia were eight Definitions
followed by three Laws. He defined the "quantity of matter" as the measure arising
conjointly from the density and the volume. In his critical review of Newtonian
mechanics, Mach remarked that this definition is patently circular, noting that "density" is
nothing but the quantity of matter per volume. However, all definitions ultimately rely on
undefined (irreducible) terms, so perhaps Newton was entitled to take density and volume
as two such elements of his axiomatization. Furthermore, by basing the quantity of matter
on explicitly finite density and volume, Newton deftly precluded point-like objects with
finite quantities of matter, which would imply the existence of infinite forces and infinite
potential energy according to his proposed inverse-square law of gravity.
The next basic definition in Principia is of the "quantity of motion", defined as the
measure arising conjointly from the velocity and the quantity of matter. Here we see that
"velocity" is taken as another irreducible element, like density and volume. Thus,
Newton's ontology consists of one irreducible entity, called matter, possessing three
primitive attributes, called density, volume, and velocity, and in these terms he defines
two secondary attributes, the "quantity of matter" (which we call "mass") as the product
of density and volume, and the "quantity of motion" (which we call "momentum") as the
product of velocity and mass, meaning it is the product of velocity, density, and volume.
Although the term "quantity of motion" suggests a scalar, we know that velocity is a
vector (i.e., it has a magnitude and a direction), so it's clear that momentum as Newton
defined it is also a vector. After going on to define various kinds of forces and the
attributes of those forces, Newton then, as we saw in Section 1.3, took the law of inertia
and relativity as his First Law of Motion, just as Descartes and Huygens had done.
Following this we have the "force law", i.e., Newton's Second Law of Motion:
The change of motion is proportional to the motive force impressed; and is made
in the direction of the right line in which the force is impressed.
Notice that this statement doesn't agree precisely with either of the two forms in which
the Second Law is commonly given today, namely, as F = dp/dt or F = ma. The former is
perhaps closer to Newton's actual statement, since he expressed the law in terms of
momentum rather than acceleration, but he didn't refer to the rate of change of
momentum. No time parameter appears in the statement at all. This is symptomatic of a
lack of clarity (as in Aristotle's writings) over the distinction between "impulse force"
and "continuous force". Recall that our speculative interpretation of Aristotle's downward
"weight" was based on the idea that he actually had in mind something like the impulse
force that would be exerted by the object if it were abruptly brought to a halt. Newton's
Second Law, as expressed in the Principia, seems to refer to such an impulse, and this is
how Newton used it in the first few Propositions, but he soon began to invoke the Second
Law with respect to continuous forces of finite magnitude applied over a finite length of
time, more in keeping with a continuous force of gravity, for example. This shows that
even in the final version of the axioms and definitions laid down by Newton, he did not
completely succeed in clearly delineating the concept of force that he had in mind. Of
course, in each of his applications of the Second Law, Newton made the necessary
dimensional adjustments to appropriately account for the temporal aspect that was
missing from the statement of the Law itself, but this was done ad hoc, with no clear
explanation. (His ability to reliably incorporate these factors in each context testifies to
his solid grasp of the new dynamics, despite the imperfections of his formal articulation
of it.) Subsequent physicists clarified the quantitative meaning of Newton's second law,
explicitly recognizing the significance of time, by expressing the law either in the form F
= d(mv)/dt or else in what they thought was the equivalent form F = m(dv/dt). Of course,
in the context of special relativity these two are not equivalent, and only the former leads
to a coherent formulation of mechanics. (It's also worth noting that, in the context of
special relativity, the concept of force is largely an anachronism, and it is introduced
mainly for the purpose of relating relativistic descriptions to their classical counterparts.)
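
To see concretely how the two forms come apart, here is a minimal Python sketch, assuming the standard relativistic momentum p = γmv (which the text above only alludes to); for force parallel to the motion, F = d(mv)/dt works out to mγ^3(dv/dt), differing from m(dv/dt) by the factor γ^3:

    import math

    c = 299792458.0  # speed of light, m/s

    def gamma(v):
        return 1.0 / math.sqrt(1.0 - (v / c)**2)

    # A 1 kg body with dv/dt = 1 m/s^2, pushed along its line of motion:
    m, dvdt = 1.0, 1.0
    for v in (0.0, 0.5 * c, 0.9 * c):
        F_ma = m * dvdt                    # F = m(dv/dt)
        F_dpdt = m * gamma(v)**3 * dvdt    # F = d(mv)/dt with p = gamma*m*v
        print(v / c, F_ma, F_dpdt)
    # The two forms agree at v = 0 but differ by gamma^3 (about 12 at 0.9c).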
The third Law of Motion in the Principia is regarded by many people as one of Newton's
greatest and most original contributions to physics. This law states that
To every action there is always opposed an equal reaction: or, the mutual actions
of two bodies upon each other are always equal, and directed to contrary parts.
Unfortunately the word "action" is not found among the previously defined terms, but in
the subsequent text Newton clarifies the intended meaning. He says "If a body impinge
upon another, and by its force change the motion of the other, that body also... will
undergo an equal change in its own motion towards the contrary part." In other words, the
net change in the "quantity of motion" (i.e., the sum of the momentum vectors) is zero, so
momentum is conserved. More subtly, Newton observes that "If a horse draws a stone
tied to a rope, the horse will be equally drawn back towards the stone". This is true even
if neither the horse nor the stone is moving (which of course implies that they are each
subject to other forces as well, tending to hold them in place). This illustrates how the
concept of force enables us to conceptually decompose a null net force into non-null
components, each representing the contributions of different physical interactions.
In retrospect we can see that Newton's three "laws of motion" actually represent the
definition of an inertial coordinate system. For example, the first law imposes the
requirement that the spatial coordinates of any material object free of external forces are
linear functions of the time coordinate, which is to say, free objects move with a uniform
speed in a straight line with respect to an inertial coordinate system. Rather than seeing
this as a law governing the motions of free objects with respect to a given system of
coordinates, it is more correct to regard it as defining a class of coordinate systems in
terms of which a recognizable class of motions have particularly simple descriptions. It is
then an empirical question as to whether the phenomena of nature possess the attributes
necessary for such coordinate systems to exist.
The significance of force was already obscure in Newton's three laws of mechanics,
but it became even more obscure when he proposed the law of universal gravitation,
according to which every particle of matter exerts a force of attraction on every other
particle of matter, with a strength proportional to its mass and inversely proportional to
the square of the distance. The rival Cartesians expected all forces to be the result of local
contact between bodies, as when two objects press directly against each other, but
Newton's conception of instantaneous gravity between distant objects seems to defy
representation in those terms. In an effort to reconcile universal gravitation with
semi-Cartesian ideas of force, Newton's young friend Nicolas Fatio hypothesized an
omnidirectional flux of small ultra-mundane particles, and argued that the mutual shadowing
effect could explain why massive bodies are forced together. The same idea was later
taken up by Lesage, but many inconsistencies were pointed out, making it clear that no
such theory could accurately account for the phenomena. The simple notion of force at a
distance was so successful that it became the model for all mutual forces between objects,
and the early theories of electricity and magnetism were expressed in those terms.
However, reservations about the intelligibility of instantaneous action at a distance
remained. Eventually Faraday and Maxwell introduced the concept of disembodied lines
of force, which later came to be regarded as fields of force, almost as if force was an
entity in its own right, capable of flowing from place to place. In this way the
Maxwellians (perhaps inadvertently) restored the Cartesian ideas that all space must be
occupied and that all forces must be due to direct local contact. They accomplished this
by positing a new class of entity, namely the field. Admittedly our knowledge of the
electromagnetic field is only inferred from the behavior of matter, but it was argued that
explanations in terms of fields are more intelligible than explanations in terms of
instantaneous forces at a distance, mainly because fields were considered necessary for
strict conservation of energy and momentum once it was recognized that electromagnetic
effects propagate at a finite speed.
However, the explanation of phenomena in terms of fields, characterized by partial
differential equations, was incomplete, because it was not possible to represent stable
configurations of matter in these terms. Maxwell's field equations are linear, so there was
no hope of them possessing solutions corresponding to discrete electrical charges or
particles of matter. Hence it was still necessary to retain the laws of mechanics of discrete
entities, characterized by total differential equations. The conceptual dichotomy between
Newton's physics of particles and Maxwell's physics of fields is clearly shown by the
contrast between total and partial differential equations, and this contrast was seen (by
some people at least) as evidence of a fundamental flaw. In a 1936 retrospective essay
Einstein wrote
This is the basis on which H. A. Lorentz obtained his synthesis of Newton's
mechanics and Maxwell's field theory. The weakness of this theory lies in the fact
that it tried to determine the phenomena by a combination of partial differential
equations (Maxwell's field equations for empty space) and total differential
equations (equations of motion of points), which procedure was obviously
unnatural.
The difference between total and partial differential equations is actually more profound
than it may appear at first glance, because (as alluded to in section 1.1) it entails different
assumptions about the existence of free will and acts of volition. If we consider a
point-like particle whose spatial position x(t) is strictly a function of time, and we likewise
consider the forces F(t) to which this particle is subjected as strictly a function of time,
then the behavior of this particle can be expressed in the form of total differential
equations, because there is just a single independent variable, namely the time coordinate.

Every physically meaningful variable exists as one of a countable number of explicit
functions of time, and each of the values is realized at its respective time. Thus the total
derivatives are evaluated over actualized values of the variables. In contrast, the partial
derivatives over immaterial fields are inherently hypothetical, because they represent the
variations in some variable of a particle not as a function of time along the particle's
actual path, but transversely to the particle's path. For example, rather than asking how
the force experienced by a particle changes over time, we ask how the force would
change if at this instant of time the particle was in a slightly different position. Such
hypotheticals have meaning only assuming an element of contingency in events, i.e., only
if we assume the paths of material objects could be different than they are.
Of course, if we were to postulate a substantial continuous field, we could have
non-hypothetical partial derivatives, which would simply express the facts implicit in the total
derivatives for each substantial part of the field. However, the intelligibility of a truly
continuous extended substance is questionable, and we know of no examples of such a
thing in nature. Given that the elementary force fields envisaged by the Maxwellians
were eventually conceded to be immaterial, and their properties could only be inferred
from the state variables of material entities, it's clear that the partial derivatives over the
field variables are not only hypothetical, but entail the assumption of freedom of action.
In the absence of freedom, any hypothetical transverse variations in a field (i.e.,
transverse to the actual paths of material entities) would be meaningless. Only actual
variations in the state variables of material entities would have meaning. Thus the
contrast between total and partial differential equations reflects two fundamentally
different conceptual frameworks, the former based on determinism and the latter based on
the possibility of free acts. This is closely analogous to Aristotle's dichotomy between
natural and violent motions.
As noted above, Einstein regarded this dualism as unnatural, and his intuition led him to
expect that the field concept, governed by partial differential equations, would ultimately
prove to be sufficient for a complete description of phenomena. In the same essay
mentioned above he wrote
What appears certain to me, however, is that, in the foundations of any consistent
field theory, there should not be, in addition to the concept of the field, any
concept concerning particles. The whole theory must be based solely on partial
differential equations and their singularity-free solutions.
It may seem ironic that he took this view, considering that Einstein was such a staunch
defender of strict causality and determinism, but by this time he was wholly committed to
the concept of a continuous field as the ultimate ontological entity, more fundamental
even than matter, and possessing a kind of relativistic substantiality, subject to
deterministic laws. In a sense, he seems to have come to believe that the field was not a
hypothetical entity inferred from the observed behavior of material bodies, but rather that
material bodies were hypothetical entities inferred from the observed behavior of fields.
An important first step in this program was to eliminate the concept of forces acting
between bodies, and to replace this with a field-theoretic model. He (arguably)
accomplished this for gravitation with the general theory of relativity, which completely
dispenses with the concept of a "force of gravity", and instead interprets objects under the
influence of gravity as simply proceeding, unforced, along the most natural (geodesic)
paths. Thus the concept of force, and particularly gravitational force, which was so
central to Newton's synthesis, was simply discarded as having no absolute significance.
However, the concept of force is still very important in physics, partly because we
continue to employ the classical formulation of mechanics in the limit of low speeds and
weak gravity, but more importantly because it has not proven possible (despite the best
efforts of Einstein and others) to do for the other forces of nature what general relativity
did for gravity, i.e., to express the apparently forced (violent) motions as natural paths
through a modified geometry of space and time.

3.3 De Mora Luminis


I see my light come shining,
From the west unto the east.
Any day now, any day now,
I shall be released.
Bob Dylan, 1967
We are usually not aware of any delay between the occurrence of an event and its visual
appearance in the eye of a distant observer. In fact, a single visual "snapshot" is probably
the basis for most people's intuitive notion of an "instant". However, the causal direction
of an instantaneous interaction is inherently ambiguous, so it's perhaps not surprising that
ancient scholars considered two competing models of vision, one based on the idea that
every object is the source of images of itself, emanating outwards to the eye of the
observer, and the other claiming that the observer's eye is the source of visual rays
emanating outwards to "feel" distant objects. An interesting synthesis of these two
concepts is the idea, adopted by Descartes, of light as a kind of pressure in an ideal
incompressible medium that conveys forces and pressures instantaneously from one
location to another. However, even with Descartes we find the medium described as
"incompressible, or nearly incompressible", revealing the difficulty of reconciling
instantaneous force at a distance with our intuitive idea of causality. Fermat raised this
very objection when he noted (in a letter on Descartes' Dioptrics) that if we assume
instantaneous transmission of light we are hardly justified in analyzing such
transmissions by means of analogies with motion through time.
Perhaps urged by the sense that any causal action imposed from one location on another
must involve a progression in time, many people throughout history have speculated that
light may propagate at a finite speed, but all efforts to discern a delay in the passage of
light (mora luminis) failed. One of the earliest such attempts of which we have a written
account is the experiment proposed by Galileo, who suggested (in his Dialogue
Concerning Two New Sciences) relaying a signal back and forth with lamps and shutters
located on separate hilltops. Based on the negative results from this type of crude
experiment, Galileo could only confirm what everyone already knew, namely, that the
propagation of light is "if not instantaneous, then extraordinarily fast". He went on to
suggest that it might be possible to discern, in distant clouds, some propagation time for
the light emitted by a lightning flash.
We see the beginning of this light - I might say its head and source - located at a
particular place among the clouds; but it immediately spreads to the surrounding
ones, which seems to be an argument that at least some time is required for
propagation. For if the illumination were instantaneous and not gradual, we
should not be able to distinguish its origin - its center, so to speak - from its
outlying portions.
The idea of using the clouds in the night sky as a giant bubble chamber was characteristic
of Galileo's talent for identifying opportunities in natural phenomena for testing ideas, as
well as his attentiveness to subtle qualitative impressions, such as the sense of being able
to distinguish the center of the illumination given off by a flash of lightning, even
though we can't quantify the delay time. It also shows that Galileo was inclined to think
light propagated at a finite speed, but of course he rightly qualified this lightning-cloud
argument by admitting that "really these matters lie far beyond our grasp". Today we
would say the perceived spreading out of a lightning strike through the clouds is due to
propagation of the electrical discharge process. Even for clouds located ten miles apart,
the time for light itself to propagate from one cloud to the other is only one 18,600th of a
second, presumably much too short to give any impression of delay to human senses.
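
A quick check of that figure (a two-line Python sketch, taking ten miles as 16,093 meters):

    c = 299792458.0           # speed of light, m/s
    t = 16093.0 / c           # light-travel time across ten miles
    print(t, 1.0 / t)         # ~5.4e-5 sec, i.e. roughly 1/18,600 of a second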
Interestingly, Galileo also contributed (posthumously) to the first successful attempt to
actually observe a delay attributable to the propagation of light at a finite speed. In 1610,
soon after the invention of the telescope, he discovered the four largest moons of Jupiter,
illustrated below:

[Figure: Jupiter and its four largest moons]

In hopes of gaining the patronage of the Grand Duke Cosimo II, Galileo named Jupiter's
four largest moons the "Medicean Stars", but today they're more commonly called the
Galilean satellites. At their brightest these moons would be just bright enough (with
magnitudes between 5 and 6) to be visible from Earth with the naked eye - except that
they are normally obscured by the brightness of Jupiter itself. (Interestingly, there is some
controversial evidence suggesting that an ancient Chinese astronomer may actually have
glimpsed one of these moons 2000 years before Galileo.) Of course, from our vantage
point on Earth, we must view the Jupiter system edgewise, so the moons appear as small
stars that oscillate from side to side along the equatorial plane of Jupiter. If they were all
perpendicular to the Earth's line of sight, and all on the same side of Jupiter,
simultaneously, they would look like this:

[Figure: the four Galilean satellites aligned on one side of Jupiter]

By the 1660's, detailed tables of the movements of these moons had been developed by
Borelli (1665) and Cassini (1668). Naturally these tables were based mainly on
observations taken around the time when Jupiter is nearly "in opposition", which is to
say, when the Earth passes directly between Jupiter and the Sun, because this is when
Jupiter appears high in the night sky. The mean orbital periods of Jupiter's four largest
moons were found to be 1.769 days, 3.551 days, 7.155 days, and 16.689 days, and these
are very constant and predictable (especially for the two inner moons), like a giant
clockwork. (In fact, there were serious attempts in the 18th century to develop a system
of tables and optical instruments so that the "Jupiter clock" could be used by sailors at sea
to determine Greenwich Meridian time, from which they could infer their longitude.)
Based on these figures it was possible to predict within minutes the times of eclipses and
passages (i.e., the passings behind and in front of Jupiter) that would occur during the
viewing opportunities in future "oppositions". In particular, the innermost satellite, Io
(which is just slightly larger than our own Moon), completes one revolution around
Jupiter every 42.456 hours. Therefore, when viewed from the Earth, we expect to see Io
pass behind Jupiter once every 42 hours, 27 minutes, and 21 seconds - assuming the light
from each such eclipse takes the same amount of time to reach the Earth.
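
The conversion is simple arithmetic; a short Python sketch for the curious reader:

    # Io's period, converted to the expected interval between observed eclipses
    # (assuming equal light-travel time for each eclipse, as stated above).
    hours = 1.769 * 24                 # 42.456 hours
    h = int(hours)
    m = int((hours - h) * 60)
    s = ((hours - h) * 60 - m) * 60
    print(h, m, round(s, 1))           # 42 h 27 m 21.6 s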
By the 1670's people began to make observations of Jupiter's moons from the opposite
side of the Earth's orbit, i.e., when the Earth was on the opposite side of the Sun from
Jupiter, and they observed a puzzling phenomenon. Obviously it's more difficult to make
measurements at these times, because the Jovian system is nearly in conjunction with the
Sun, but at dawn and dusk it is possible to observe Jupiter even when it is fairly close to
conjunction. These observations, taken about 6 months away from the optimum viewing
times, reveal that the eclipses and passages of Jupiter's innermost moon, Io, which could
be predicted so precisely when Jupiter is in opposition, are consistently late by about 17
minutes relative to their predicted times of occurrence. (Actually the first such estimate,
made by the Danish astronomer Ole Roemer in 1675, was 22 minutes.) This is not to say
that the time intervals between successive eclipses are increased by 17 minutes, but that
the absolute time of occurrence is 17 minutes later than was predicted six months earlier
based on the observed orbital period at that time. Since Io has a period of 1.769 days, it
completes about 103 orbits in six months, and it appears to lose a total of 17 minutes
during those 103 orbits, which is an average of about 9.9 seconds per orbit.
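
Again the arithmetic is easily verified (a sketch, taking six months as half of 365.25 days):

    # The cumulative 17-minute delay spread over Io's orbits in six months.
    period_days = 1.769
    orbits = (365.25 / 2) / period_days     # about 103 orbits
    print(orbits, 17 * 60 / orbits)         # ~103.2 orbits, ~9.9 sec per orbit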
Nevertheless, at the subsequent "opposition" viewing six months later, Io is found to be
back on schedule! It's as if a clock runs slow in the mornings and fast in the afternoons,
so that on average it never loses any time from day to day. While mulling over this data
in 1675, it occurred to Roemer that he could account for the observations perfectly if it is
assumed that light propagates at a finite speed. At last someone had observed the mora
luminis. Light travels at a finite speed, which implies that when we see things we are
really seeing how they were at some time in the past. The further away we are from an
object, the greater the time delay in our view of that object. Applying this hypothesis to
the observations of Jupiter's moons, Roemer considered the case when Jupiter was in
opposition on, say, January 1, so the light from the Jovian eclipses was traveling from the
orbit of Jupiter to the orbit of the Earth, as shown in the figure below.

[Figure: light from the eclipses of Io crossing from Jupiter's orbit to the Earth's orbit]

The intervals between successive eclipses around this time will be very uniform near the
opposition point, because the eclipses themselves are uniform and the distance from
Jupiter to the Earth is fairly constant during this time. However, after about six and a half
months (denoted by July 18 in the figure), Jupiter is in conjunction, which means the
Earth is on the opposite side of its orbit from Jupiter. The light from the "July 18" eclipse
will still cross the Earth's orbit (on the near side) at the expected time, but it must then
travel an additional distance, equal to the diameter of the Earth's orbit, in order to reach
the Earth. Hence we should expect it to be "late" by the amount of time required for light
to travel the Earth's orbital diameter. Combining this with a rough estimate of the distance
from the Earth to the Sun, Huygens reckoned that light must travel at about 209,000
km/sec. A subsequent estimate by Newton gave a value around 241,000 km/sec. In fact,
the Scholium to Proposition 96 of Newton's Principia includes the statement
For it is now certain from the phenomenon of Jupiter's satellites, confirmed by the
observations of different astronomers, that light is propagated in succession, and
requires about seven or eight minutes to travel from the sun to the earth.
The early quantitative estimates of the speed of light were obviously impaired by the lack
of precise knowledge of the Earth-Sun distance. Using modern techniques, the Earth's
orbital diameter is estimated to be about 2.98 x 10^11 meters, and the observed time delay
in the eclipses and passages of Jupiter's moons when viewed from the Earth with Jupiter
in conjunction is about 16.55 minutes = 993 seconds, so we can deduce from these
observations that the speed of light is about 2.98 x 10^11 / 993 ≈ 3 x 10^8 meters/sec.
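
In effect Roemer's inference is a single division; a sketch using the modern figures just quoted:

    diameter = 2.98e11        # Earth's orbital diameter, meters
    delay = 16.55 * 60        # observed delay, seconds (993 sec)
    print(diameter / delay)   # ~3.0e8 m/s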
Of course, Roemer's hypothesis implies a specific time delay for each point of the orbit,
so it can be corroborated by making observations throughout the year. We find that most
of the discrepancy occurs during the times when the distance between Jupiter and the
Earth is changing most rapidly, which is when the Earth-Sun axis is nearly perpendicular
to the Jupiter-Sun axis. At one of these positions the Earth is moving almost directly
toward Jupiter, and at the other it is moving almost directly away from Jupiter, as shown
in the figure below.

[Figure: points of the Earth's orbit at which it moves directly toward or away from Jupiter]

The Earth's speed relative to Jupiter at these points is essentially just its orbital speed,
which is the circumference of its orbit divided by one year. Thus we have

    v = 2π(1.496 x 10^11 m) / (3.156 x 10^7 sec)

which is equivalent to about 3 x 10^4 meters/sec. If we choose units so that c = 1, then we
have v = 0.0001. From this point of view the situation can be seen as a simple application
of the Doppler effect, and the frequency of the eclipses as viewed on Earth can be related
to the actual frequency (which is what we observe at conjunction and opposition)
according to the formulas

    f' = f/(1 - v/c)    (Earth approaching Jupiter)
    f' = f/(1 + v/c)    (Earth receding from Jupiter)

The frequencies are inversely proportional to the time intervals between eclipses. These
formulas imply that, for the moon Io, whose orbital period is 1.769 days = 2547.3600
minutes, the time interval between consecutive observed eclipses when the Earth is

moving directly toward Jupiter (indicated as "Jan" in the above figure) is 2547.1052
minutes, and the time intervals between successive observed eclipses six months later is
2547.6147 minutes. Thus the interval between observed eclipses is 15.2 seconds shorter
than nominal in the former case, and it is 15.3 seconds longer than nominal in the latter
case, making a total difference of 30.5 seconds between the inter-arrival times at the two
extremes, separated by six months. It would have been difficult to keep time this
accurately in Roemer's day, but differences of this size are easily measured with modern
clocks. By the way, the other moons of Jupiter do not conform so nicely to Roemer's
hypothesis, but this is because their orbital motions are inherently more irregular due to
their mutual gravitational interactions.
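
A short Python sketch reproducing these interval figures from the first-order formulas above (with v/c = 0.0001 as stated):

    T = 1.769 * 24 * 60          # Io's period in minutes (2547.3600)
    beta = 1e-4                  # Earth's orbital speed as a fraction of c
    toward = T * (1 - beta)      # interval while moving toward Jupiter
    away = T * (1 + beta)        # interval while moving away from Jupiter
    print(toward, away, (away - toward) * 60)
    # ~2547.105 min, ~2547.615 min, about 30.5 seconds difference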
Despite the force of Roemer's analysis, and the early support of both Huygens and
Newton, most scientists remained skeptical of the idea of a finite speed of light. It was
not until 50 years later, when the speed of light was evaluated in a completely different
way, arriving at nearly the same value, that the idea became widely accepted. This
occurred in 1729, when the velocity of light was estimated by James Bradley based on
observations of the aberration of starlight, which he argued must depend on the ratio of
the speed of light to the orbital speed of the Earth. Based on the best measurements of the
limiting starlight aberration (20.4" ± 0.1") by Otto Struve, and taking the speed of the
Earth to be about 30.56 km/sec from Encke's solar parallax estimate of 8.57" ± 0.04",
this implied a light speed of about 308,000 km/sec.
Unfortunately, Encke's parallax estimates had serious problems, and he greatly underestimated his error band. The determination of the Earth-Sun distance was a major
challenge for scientists in the 18th century. Interestingly, the primary mission of Captain
James Cook when he embarked in the ship Endeavour on his famous voyage in 1768 was
to observe the transit of the planet Venus across the disk of the Sun from the vantage
point of Tahiti in the South seas, with the aim of determining the distance from the Sun to
the Earth by parallax. Roughly once each century two such conjunctions are visible from
the Earth, occurring eight years apart. Edmund Halley had urged that when the next
opportunities arose on June 6, 1761, and June 3, 1769, the transit be observed from as many
vantage points as possible on the Earth's surface to make the best possible determination.
The project was undertaken by people from many countries. Le Gentil traveled to India
for the 1761 transit, but since England and France were antagonists at the time, he had to
dodge the English war ships, causing him to reach India just after June 6, missing the first
transit of Venus. Determined not to miss the second one, he remained in India for the next
eight years (!) "doing various useful work" (according to Pannekoek) until June 3, 1769.
Alas, when the day arrived, it was too cloudy to see anything.
Cook's observations fared somewhat better. The French government actually issued
orders to its war ships to leave Cook alone, since he was "on a mission for the benefit of
all mankind". The Endeavour arrived in Tahiti on April 13, 1769, and the scientists were
able to make observations in the clear on June 3. Unfortunately, the results were
disappointing, not only in Tahiti, but all over the world. It turned out to be extremely
difficult to judge precisely (to within, say, 10 seconds) when one edge of Venus passed
the border of the Sun. The black disk of the planet appeared to "remain connected like a

droplet" to the border of the Sun, until suddenly the connection was broken and the planet
was seen to be well past the border. Observers standing right next to each other recorded
times differing by tens of seconds. Consequently the observations failed to yield an
improved estimate of the Earth-Sun distance.
The first successful quantification of c based solely on terrestrial measurements was
probably Fizeau's in 1849, using a toothed wheel, and then Foucault's experiment in 1862
using rotating mirrors. The toothed-wheel didn't work very well, and it was hard to say
how accurate it was, but the rotating mirrors led to a value of about 298,000 ± 500
km/sec, significantly below the earlier estimates. Foucault was confident the discrepancy
with earlier results couldn't be explained by an error in the aberration angle, so he
inferred (correctly) that Encke's Solar parallax estimate (and therefore the orbital velocity
of the Earth) was in error, and proposed a value of 8.8", which was subsequently
confirmed and refined by new observations, as well as a re-analysis of Encke's 1769 data
using better longitudes and yielding an estimate of 8.83".
To increase the speed of switching the light signal in later refinements of these terrestrial
measurements, Kerr cells were used. These rely on
the fact that the refractivity of certain substances can be made to vary with an applied
electric voltage. Further refinements led to large-baseline devices, called geodimeters,
originally intended for use in geodesic surveying. Here is a summary of the major
published determinations of the speed of optical light based on one or another of these
techniques:

[Table: published determinations of the speed of optical light, with dates and quoted tolerances]

Measurements of the speed of electromagnetic waves at radio frequencies have also
been made, with the results summarized below:

[Table: measurements of the speed of radio-frequency electromagnetic waves]

In addition, the speed of light can be determined indirectly by measuring the ratio of
electric to magnetic units, which amounts to measuring the permittivity of the vacuum.
Some results given by this method are summarized below:

[Table: indirect determinations of the light speed from the ratio of electric to magnetic units]

(Several of the above values include corrections for various group-velocity indices.) A
plot of the common logarithm of the tolerance versus the year for the 19 optical light
speed measurements is shown below:

[Plot: common logarithm of the quoted tolerance versus year, for the 19 optical light speed measurements]

Interestingly, comparing each of the measured values with Evenson's 1973 value, we find
that more than half of them were in error by more than their published tolerances. This is
not so surprising when we note that most of the tolerances were quoted as "one sigma"
error bands rather than as absolute limits. Indeed, if we consider the two-sigma band,
there were only four cases of over-optimism, and of those, all but Foucault's 1862 result
are within three sigma, and even Foucault is within four sigma. This is roughly in
agreement with what one would expect, especially for delicate and/or indirect
measurements. Also, the aggressive error estimates in this field have had the beneficial
effect of spurring controversies between different researchers, forcing them to repeat
experiments and refine their techniques in order to resolve the disagreements. In this way,
knowledge of the speed of light progressed in less than 400 years from Galileo's
assessment, "extraordinarily fast", to the best modern value, 299,792.4574 ± 0.0012
km/sec. Today the unit of length is actually defined in terms of the distance traveled
by light in a specified fraction of a second, so in effect we now define the meter such
that the speed of light is exactly 299,792.458 km/sec.
Incidentally, Maxwell once suggested (in his article on Ether for the ninth edition of the
Encyclopedia Britannica) that Roemer's method could be used to test for the isotropy of
light speed, i.e., to test whether the speed of light is the same in all directions. After
noting that any purely terrestrial measurement would yield an effect only of the second
order in v/c, which he regarded as quite insensible (a remark that spurred Albert
Michelson to successfully measure just such a quantity only two years later), he wrote
The only practicable method of determining directly the relative velocity of the
aether with respect to the solar system is to compare the values of the velocity of
light deduced from the observation of the eclipses of Jupiter's satellites when
Jupiter is seen from the earth at nearly opposite points of the ecliptic.
Notice that, for this type of observation, the relevant speed is not the speed of the earth in
its orbit around the sun, but rather the speed of the entire solar system. Roemer's method
can be regarded as a means of measuring the speed of light in the direction from Jupiter
to the Earth, and since Jupiter has an orbital period of about 12 years, we can use this
method to evaluate the speed of light several times over a 12 year period, and thus
evaluate the speed in all possible directions (in the plane of the ecliptic). If the sun were
stationary, we would not expect to find any differences, but it was already suspected in
Maxwell's time that the sun itself is in motion. The best modern estimate is that our solar
system is moving with a speed of about 3.7 x 10^5 meters per second with respect to the
cosmic microwave background radiation (i.e., the frame in which the radiation is roughly
isotropic). If we assume a pre-relativistic model in which light propagates at a fixed
speed with respect to the background radiation, and in which frames are related by
Galilean transformations, we could in principle determine the "absolute speed" of the
solar system. The magnitude of the effect is given by computing how much difference
would be expected in the time for light to traverse one orbital diameter of the Earth at an
effective speed of c+V and c-V, where V is the presumed absolute speed of the Earth.
This gives a maximum difference of about 2.45 seconds between two measurements
taken six years apart. (These two measurements each occur over a 6 month time span as
explained above.)
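As a quick numerical check of the figure quoted above, here is a minimal sketch (the orbital diameter is taken as 2 AU, and the constants are standard values; the variable names are mine):

    AU = 1.495978707e11                  # meters
    c = 2.99792458e8                     # speed of light in vacuum, m/s
    V = 3.7e5                            # presumed absolute speed of the solar system, m/s

    D = 2 * AU                           # one orbital diameter of the Earth
    delta_t = D / (c - V) - D / (c + V)  # traversal time at c-V minus traversal time at c+V
    print(delta_t)                       # roughly 2.46 seconds, in line with the value above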
In practice it would be necessary to account for many other uncontrolled variables, such
as the variations in the orbits of the Earth and Jupiter over the six year interval. These
would need to be known to much better than 1 part in 400 to give adequate resolution. To
the best of my knowledge, this experiment has never been performed, because by the
time sufficiently accurate clocks were available the issue of light's invariance with respect
to inertial coordinate systems had already been established by more accurate terrestrial
measurements, together with an improved understanding of the meaning of inertial
coordinates. Today we are more likely to establish a system of coordinates optically, and then test to verify the isotropy of mechanical inertia with respect to those coordinates.
3.4 Stationary Paths
Then with no throbbing fiery pain,
No cold gradations of decay,
Death broke at once the vital chain,
And freed his soul the nearest way.
Samuel Johnson, 1783
The apparent bending of visual images of objects partially submersed in water was noted
in antiquity, but it wasn't until Kepler's Dioptrice, published in 1611, that anyone
attempted to actually quantify the effect. Kepler discovered that, at least for rays nearly
perpendicular to the surface, the ratio of the angles of incidence and refraction is (nearly)
proportional to the ratio of what we now call the indices of refraction of the media.
(Originally these indices were just empirically determined constants for each substance,
but Newton later showed that for most transparent media the refractive index could be
taken as unity plus a term proportional to the medium's density.) Incidentally, Kepler also
noticed that with suitable materials and angles of incidence, the refracted angle can be
made to exceed 90 degrees, resulting in total internal reflection, which is the basic
principle of modern fiber optics.
In 1621, Willebrord Snell performed a series of careful measurements and found that
when a ray of light passes through a surface at which the index of refraction changes
abruptly, the angles made by the incident and transmitted rays with the respective
outward normals to the surface are related according to the simple formula (now called
Snell's Law)

$$n_1 \sin(\theta_1) \,=\, n_2 \sin(\theta_2)$$

where $n_1$ and $n_2$ are the indices of refraction (still regarded simply as empirical constants for any given medium) on the incident and transmitted sides of the boundary, and $\theta_1$ and $\theta_2$ are the angles that the incident ray and the transmitted ray make with the normal to the boundary as shown below.

Soon thereafter, Descartes published his La Dioptrique (1637), in which he presented a rationalization of Snell's law based on the idea that light is a kind of pressure transmitted instantaneously (or nearly so) through an elastic medium. Descartes' theory led to a
fascinating scientific dispute over the correct interpretation of light. According to
Descartes' mechanistic description, a dense medium must transmit light more effectively,
i.e., with more "force", than a less dense medium. (He sometimes described light rays in
terms of a velocity vector rather than a force vector, but in either case he reasoned that the
magnitude of the vector, which he called the light's determination, increased in
proportion to the density of the medium.) Also, Descartes argued that the tangential
component of the ray vector remains constant as the ray passes through a boundary. On
the basis of these two (erroneous) premises, the parallelogram of forces for a ray of light
passing from a less dense to a more dense medium is as shown below.

The magnitude of the incident force is $f$, and the magnitude of the refracted force is $F$, each of which is decomposed into components normal and tangential to the surface. Since Descartes assumes $f_t = F_t$, it follows immediately that $f \sin(\theta_1) = F \sin(\theta_2)$. If, as Descartes often did, we regard the force (determination) of the light as analogous to the speed of light, then this corresponds to the relation $v_1 \sin(\theta_1) = v_2 \sin(\theta_2)$ where $v_1$ and $v_2$ are the speeds of light in the two media.
Fermat criticized Descartes' derivation, partly on mathematical grounds, but also because
he disagreed with the basic physical assumptions. In particular, Fermat believed that
light must not only travel at a finite speed, it must travel slower (not faster) in a denser
medium. Thus he argued that the derivation of Snell's law presented by Descartes was
invalid, and he suspected the law itself might even be wrong. In his attempts to derive
the "true" law of refraction, Fermat recalled the derivation of the law of reflection given
by Hero of Alexandria in ancient times. (Actually, Fermat got this idea by way of his friend Marin Cureau de la Chambre, who had repeated Hero's derivation in a treatise on optics in 1657.)
Hero asserted that light moves in a straight line in empty space, and reflects at equal
angles when striking a mirror, for the simple reason that light prefers always to move
along the shortest possible path. As Archimedes had pointed out, the shortest path
between two given points in space is a straight line, and this (according to Hero) explains
why light rays are straight. More impressively, Hero showed that when travelling from
some point A to the surface of a plane mirror and then back out to some point B, the
shortest path is the one for which the angles of incidence and reflection are equal. These
are ingenious observations, but unfortunately the same approach doesn't explain
refraction, because in that case the shortest path between a point above water and a point
below water (for example) would always be simply a straight line, and there would be no refraction at all.
At this point Fermat's intuition that light propagates with a characteristic finite speed, and
that it moves slower in denser media, came to his aid, and he saw that both the laws of
reflection and refraction (as well as rectilinear motion in free space) could be derived
from the same principle if, instead of light traveling along a path that minimizes the
spatial distance, we suppose it travels along the path that minimizes the temporal
distance, i.e., light follows the path to its destination that will take the least possible time.
This conceptual step is fascinating for several reasons. For one thing, we don't know on
what basis Fermat "intuited" that the speed of light is not only finite (which had never yet
been demonstrated), but that it possesses a fixed characteristic speed (which it must if a
law of least time is to have any meaning), and that the speed is lower in more dense
media (precisely opposite the view of Descartes and subsequently Newton and
Maupertuis). Furthermore, applying the principle of least time rather than least distance
to the law of propagation of light clearly casts the propagation of light into the arena of
four-dimensional spacetime, and it essentially amounts to an assertion that the laws of
motion should be geodesic paths with a suitable spacetime metric. Thus, Fermat's optical
principle can be seen as a remarkable premonition of important elements of both special
and general relativity.
To derive the law of refraction for a ray of light traveling through the boundary between
two homogeneous media, Fermat argued that a ray traveling from point 1 to point 2 in the
figure below would follow the path that minimized the total time of the journey.

Letting $v_1$ denote the speed of light in medium 1, and $v_2$ denote the speed of light in medium 2, the total time of the journey is $d_1/v_1 + d_2/v_2$. Taking the two points to be at perpendicular distances $a$ and $b$ from the boundary, with transverse separation $L$, and letting $x$ denote the transverse position at which the ray crosses the boundary, the total time can be written in terms of the unknown $x$ as

$$T(x) \,=\, \frac{\sqrt{a^2 + x^2}}{v_1} \,+\, \frac{\sqrt{b^2 + (L-x)^2}}{v_2}$$

Differentiating with respect to x gives

$$\frac{dT}{dx} \,=\, \frac{x}{v_1 \sqrt{a^2 + x^2}} \,-\, \frac{L-x}{v_2 \sqrt{b^2 + (L-x)^2}}$$

Setting this to zero gives the relation

$$\frac{\sin(\theta_1)}{v_1} \,=\, \frac{\sin(\theta_2)}{v_2}$$
which is equivalent to Snell's law $n_1 \sin(\theta_1) = n_2 \sin(\theta_2)$ provided we assume the refractive index of a medium is proportional to the inverse of the velocity of light in that
medium. Of course, since calculus hadn't been invented yet, Fermat's solution of the
problem involved considerably more labor (and ingenuity) than shown above, but
eventually he arrived at this result, which surprisingly was experimentally
indistinguishable from the formula arising from Descartes' derivation, despite the fact
that it was based on an opposite set of assumptions, namely, that the velocity (or the
"force") of light in a given medium is directly proportional to the refractive index of that
medium!
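As a simple numerical illustration of the least-time principle, the following sketch minimizes the travel time directly and verifies that the minimizing crossing point satisfies $\sin(\theta_1)/v_1 = \sin(\theta_2)/v_2$ (the geometry labels a, b, L and the sample speeds are arbitrary choices of mine, matching the reconstruction above):

    import math

    a, b, L = 1.0, 1.0, 2.0      # perpendicular distances and transverse separation
    v1, v2 = 1.0, 0.75           # light is slower in the denser medium 2

    def travel_time(x):
        # time through medium 1 plus time through medium 2
        return math.sqrt(a**2 + x**2) / v1 + math.sqrt(b**2 + (L - x)**2) / v2

    # ternary search for the minimum of the (unimodal) travel-time function
    lo, hi = 0.0, L
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if travel_time(m1) < travel_time(m2):
            hi = m2
        else:
            lo = m1
    x = (lo + hi) / 2

    sin1 = x / math.sqrt(a**2 + x**2)              # sine of the incidence angle
    sin2 = (L - x) / math.sqrt(b**2 + (L - x)**2)  # sine of the refraction angle
    print(sin1 / v1, sin2 / v2)                    # the two values agree at the least-time path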
It may seem strange that two opposite hypotheses as to the speed of light should lead to
the same empirical result, but in fact without the ability to directly measure the speed of
light in various media we cannot tell from the refractivities of materials whether the index
is proportional to velocity or to the reciprocal of velocity. Even though both assumptions
lead to the same law of refraction, the dispute over the correct derivation of this law
continued unabated, because each side regarded the other side's interpretation as a
travesty of science. Among those who believed light travels faster in denser media were
Hooke and Newton, whereas Huygens derived the law of refraction based on his wave
theory of light (see Section 8.9) and concluded that Fermat's hypothesis was correct, i.e.,
the speed of light was less in denser media.
More than a century later (around 1747) Maupertuis applied his "principle of least action"
to give an elegant (albeit spurious) derivation of Snell's law from the hypothesis that light
travels faster in denser media. Maupertuis believed that the wisdom and economy of God
is manifest in all the operations of nature, which necessarily proceed from start to finish
in just such a way as to minimize the "quantity of action". In a sense, this is closely akin
to Fermat's principle of least time, since they are both primitive examples of what we
would now call the calculus of variations. However, Maupertuis developed an all-encompassing view of his "least action" principle, with mystical and religious
implications, and he argued that it was the universal governing principle in all areas of
physics, including mechanics, optics, thermodynamics, and all other natural processes.
Of course, the notion that the phenomena of nature must follow the "best possible" course
was not new. Plato's Phaedo quotes Socrates as saying
If then one wished to know the cause of each thing, why it comes to be or perishes
or exists, one had to find what was the best way for it to be, or to be acted upon,
or to act. On these premises then, it befitted a man to investigate only, about this
and other things, what is best... he would tell me, first, whether the earth is flat or
round, and then explain why it is so of necessity, saying which is better, and that it was better to be so... I was ready to find out in the same way about the sun and
the moon and the other heavenly bodies, about their relative speed, their turnings,
and whatever else happened to them, how it is best that each should act or be
acted upon...
The innovation of Maupertuis was to suggest a quantitative measure for the vague notion
of "what is best" for physical all processes, and to demonstrate that this kind of reasoning
can produce valid quantitative results in a wide range of applications. His proposal was
to minimize the product of mass, velocity, and displacement. (Subsequently Lagrange
clarified this by defining the action of a system as the spatial path integral of the product
of mass and velocity.) For a system whose mass does not change, Maupertuis regarded
the action as simply proportional to the product of velocity and distance traveled. To
derive the law of refraction for a ray of light traveling through the boundary between two
homogeneous media, Maupertuis argued that a ray traveling from point 1 to point 2 in the
figure above would follow the path that minimized the total "action" $v_1 d_1 + v_2 d_2$ of the journey. This is identical to the quantity that Fermat minimized, except that the speeds appear in place of their reciprocals. Since $v_1$ and $v_2$ are constants, the differentiation proceeds as before, except for the inverted speed constants, and we arrive at the relation

$$v_1 \sin(\theta_1) \,=\, v_2 \sin(\theta_2)$$

which is equivalent to Snell's law $n_1 \sin(\theta_1) = n_2 \sin(\theta_2)$ provided we assume the refractive index of a medium is proportional to the velocity of light in that medium, more or less consistent with the views of Descartes, Hooke, and Newton. Since the deviation of the
refractive index from unity is known empirically to be roughly proportional to the density
of the medium, this would imply that light travels faster in denser media, which Newton
and the others found quite plausible.
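To make the parallel explicit, the two variational recipes and their conclusions can be summarized side by side (this merely restates the computations above in compact form):

$$\text{Fermat (least time):}\qquad \frac{d}{dx}\left(\frac{d_1}{v_1}+\frac{d_2}{v_2}\right)=0 \;\;\Rightarrow\;\; \frac{\sin\theta_1}{v_1}=\frac{\sin\theta_2}{v_2}$$

$$\text{Maupertuis (least action):}\qquad \frac{d}{dx}\left(v_1 d_1 + v_2 d_2\right)=0 \;\;\Rightarrow\;\; v_1\sin\theta_1 = v_2\sin\theta_2$$

Since a refraction experiment measures only the ratio $\sin\theta_1/\sin\theta_2 = n_2/n_1$, the two recipes are empirically indistinguishable, the first implying $n \propto 1/v$ and the second $n \propto v$.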
No amount of experimenting with the relative refractions of various media would suffice
to distinguish between these two possibilities (the refractive index being proportional to
the velocity or the reciprocal velocity). Only a direct measurement of the speed of light
in two media with different refractive indices could accomplish this. Such a
measurement was not achieved until 1850, when Foucault passed rays of light through a
tube, and by using a rapidly rotating mirror was able to show conclusively that light takes
longer to traverse the tube when it is filled with water than when filled with air. So, after
200 years of theorizing and speculation, the question was finally settled in favor of
Fermat and Huygens, i.e., the index of refraction is inversely proportional to the speed of
light in the medium.
It's worth noting that although Fermat was closer to the truth, his principle of "least time"
is not strictly correct, because the modern formulation of "Fermat's Principle" states that light travels along a path for which the time is stationary (i.e., such that slight transverse changes in the path don't affect the transit time, to first order), not necessarily minimal. In fact, it may even
be maximal, as can be verified by looking at yourself in the concave surface of a shiny
spoon. The "reason" that light prefers stationary paths can be found in the theory of
quantum electrodynamics and Feynman's "sum over all paths" interpretation, which shows that if neighboring paths take different amounts of time, the neighboring rays arrive at the destination out of phase, and cancel each other out, whereas they reinforce each other if the neighboring paths take the same amount of time, or differ by some whole number of wave periods. A stark demonstration of this is given by diffraction
gratings, in which the canceling regions of a mirror are scraped away, resulting in
reflective properties that violate Hero's law of equal angles.
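The phase-cancellation argument can be made concrete with a toy computation, sketched below (the geometry and wavelength are arbitrary choices of mine). Summing unit phasors $e^{2\pi i L(x)/\lambda}$ over candidate reflection points on a mirror, the contributions from the neighborhood of the stationary (equal-angle) point dominate the total, while those far from it largely cancel:

    import cmath, math

    # toy geometry: source and receiver at equal heights above a mirror on the
    # x-axis, so the stationary (equal-angle) reflection point is midway at x = 2
    ax, ay = 0.0, 1.0
    bx, by = 4.0, 1.0
    wavelength = 0.05

    def path_length(x):
        # source -> mirror point (x, 0) -> receiver
        return math.hypot(x - ax, ay) + math.hypot(bx - x, by)

    xs = [-2.0 + 8.0 * k / 3999 for k in range(4000)]   # candidate reflection points
    phasors = [cmath.exp(2j * math.pi * path_length(x) / wavelength) for x in xs]

    total = sum(phasors)
    near = sum(p for x, p in zip(xs, phasors) if abs(x - 2.0) < 0.3)
    print(abs(total), abs(near))   # comparable magnitudes: far contributions cancel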
The modified version of Fermat's Principle (requiring stationary rather than minimal
paths) has proven to be a remarkably useful approach to the formulation of all kinds of
physical problems involving motion and change. Also, subsequent optical experiments
confirmed Fermat's intuition that the index of refraction for a given medium was
inversely proportional to the (phase) velocity v of light in the medium. The modern
definition of the refractive index is n = c/v, where the constant of proportionality c is the
speed of light in a vacuum. (The fact that Fermat and Descartes could reach identical
conclusions, even though one assumed the index of refraction was proportional to v
while the other assumed it was proportional to 1/v is less surprising when we recall that
this is precisely the crucial symmetry for relativistic velocity-composition, as described in
Section 1.8.)
In any case, it's clear that Fermat's model of optics based on his principle of least time,
when interpreted as a metrical theory, entails or suggests many of the important elements
of the modern theory of relativity, including the fundamental assumption of a
characteristic speed of light for each medium, the concept of a unified space-time as the
effective arena of motion, and the assumption that natural motions follow geodesic paths.

3.5 A Quintessence of So Subtle A Nature


For his art did express
A quintessence even from nothingness,
From dull privations and lean emptiness;
He ruined me, and I am re-begot
Of absence, darkness, death; things which are not.
John Donne, 1633
Descartes (like Aristotle before him) believed that nature abhors a vacuum, and insisted
that the entire universe, even regions that we commonly call "empty space", must be
filled with (or, more precisely, must consist of) some kind of substance. He believed this
partly for philosophical reasons, which might be crudely summarized as "empty space is
nothingness, and nothingness doesn't exist". He held that matter and space are identical
and co-extant (ironically similar to Einstein's later notion that the gravitational field is
identical with space). In particular, Descartes believed an all-pervasive substance was
necessary to account for the propagation of light from the Sun to the Earth (for example),
because he rejected any kind of "action at a distance", and he regarded direct mechanical
contact (taken as a primitive operation) as the only intelligible means by which two objects can interact. He conceived of light as a kind of pressure, transmitted instantaneously from the source to the eye through an incompressible intervening
medium. Others (notably Fermat) thought it more plausible that light propagated with a
finite velocity, which was corroborated by Roemer's 1675 observations of the moons of
Jupiter. The discovery of light's finite speed was a major event in the history of science,
because it removed any operational means of establishing absolute simultaneity. The full
significance of this took over two hundred years to be fully appreciated.
More immediately, it was clear that the conception of light as a simple pressure was
inadequate to account for the different kinds of light, i.e., the phenomenon of color. To
remedy this, Robert Hooke suggested that the (longitudinal) pressures transmitted by the
ether may be oscillatory, with a frequency corresponding to the color. This conflicted
with the views of Newton, who tended to regard light as a stream of particles in an empty
void. Huygens advanced a fairly well-developed wave theory, but could never
satisfactorily answer Newton's objections about the polarization of light through certain
crystals ("Iceland spar"). This difficulty, combined with Newton's prestige, made the
particle theory dominant during the 1700's, although many people, notably Jean Bernoulli
and Euler, held to the wave theory.
In 1800 Thomas Young reconciled polarization with the wave theory by postulating that light actually consists of transverse rather than longitudinal waves, and on this basis - along with Fresnel's explanation of diffraction in terms of waves - the wave theory gained wide acceptance. However, Young's solution of the polarization problem immediately
raised a new one, namely, how a system of transverse waves could exist in the ether,
which had usually been assumed to be akin to a tenuous gas or fluid. This prompted
generations of physicists, including Navier, Stokes, Kelvin, Malus, Arago, and Maxwell
to become actively engaged in attempts to explain optical phenomena in terms of a
material medium; in fact, this motivated much of their work in developing the equations
of state for elastic media, which have proven to be so useful for the macroscopic
treatment of fluids. However, despite the fruitfulness of this effort for the development of
fluid dynamics, no one was ever able to accurately account for all optical and electromagnetic phenomena in terms of the behavior of an ordinary fluid medium, with or
without viscosity and/or compressibility.
There were a number of reasons for this failure. First, an ordinary fluid (even a viscous
fluid) can't sustain shear stresses at rest, so it can propagate only longitudinal waves, as
opposed to the transverse wave structure of light implied by the phenomenon of
polarization. This implies either that the luminiferous ether must be a solid, or else we
must postulate some kind of persistent dynamics (such as vortices) in the fluid so that it
can sustain shear stresses. Unfortunately, both of these alternatives lead to difficulties.
The assumption of a solid ether is difficult to reconcile with the fact that the equations of
state for ordinary elastic solids always yield longitudinal waves accompanying any
transverse waves - typically with different velocities. Such longitudinal disturbances are
never observed with respect to optical phenomena. On the other hand, the assumption of
a fluid ether with persistent flow patterns to sustain the required shear stresses entails a
highly coordinated and organized system of flow cells that could persist only with the

active participation of countless tiny Maxwell demons working furiously at each point
to sustain it. Lacking this, the vortices are inherently unstable (even in an ideal perfect
inviscid fluid, in which vorticity is strictly conserved), so these flow cells could not exert
the effects on ordinary matter that they must if they are to serve as the mechanism of
electromagnetic forces. Even the latter-day concept of an ether consisting of a superfluid
(i.e., the viscosity-free quantum hydrodynamical state achieved by some substances such
as helium when cooled to near absolute zero) faces the same problem of sustaining its
specialized state while simultaneously interacting with ordinary matter in the required
ways. As Maxwell acknowledged
No theory of the constitution of the ether has yet been invented which will
account for such a system of molecular vortices being maintained for an indefinite
time without their energy being gradually dissipated into that irregular agitation of
the medium which, in ordinary media, is called heat.
Thus, ironically, the concept of transverse waves - proposed by Young and Fresnel as a
means of accounting for polarization of light in terms of a mechanical wave propagating in
some kind of material ether - immediately led to considerations that ultimately
undermined confidence in the physicality and meaningfulness of that ether.
Even aside from the difficulty of accounting for exclusively transverse waves in a
material medium, the idea of a substantial ether filling all of space had always faced
numerous difficulties. For example, Newton had shown (in his demolition of Descartes'
vortex theory) that the evidently drag-free motion of the planets and comets was flatly
inconsistent with the presence of any significant density of interstitial fluid. This problem
is especially acute when we remember that, in order to account for the high speed of
light, the density and rigidity of the putative ether must be far greater than that of steel.
Serious estimates of the density of the ether varied widely, but ran as high as 1000 tons
per cubic millimeter. It is then necessary to explain the interaction between this putative
material ether and all other known substances. Since the speed of light changes in
different material media, there is clearly a significant interaction, and yet apparently this
interaction does not involve any appreciable transfer of ordinary momentum (since
otherwise the unhindered motions of the planets are inexplicable).
One interesting suggestion was that it might be possible to account for the absence of
longitudinal waves by hypothesizing a fluid that possesses vanishingly little resistance to
compression, but extremely high rigidity with respect to transverse stresses. In other
words, the shear stresses are very large, while the normal stresses vanish. The opposite
limit is easy to model with the Navier-Stokes equation by setting the viscosity to zero,
which gives an ideal non-viscous fluid with no shear stresses and with the normal stresses
equal to the pressure. However, we can't use the ordinary Navier-Stokes equations to
represent a substance of high viscosity and zero pressure, because this would simply zero
density, and even if we postulate some extremely small (but non-zero) pressure, the
normal stresses in the Navier-Stokes equations have components that are proportional to
the viscosity, so we still wouldn't be rid of them. We'd have to postulate some kind of
adaptively non-isotropic viscosity, and then we wouldn't be dealing with anything that could reasonably be called an ordinary material substance.


As noted above, the intense efforts to understand the dynamics of a hypothetical
luminiferous ether fluid led directly to modern understanding of fluid dynamics, as
modeled by the Navier-Stokes equation for fluids of arbitrary viscosity and
compressibility. This equation can be written in vector form as

$$\frac{\partial \mathbf{V}}{\partial t} + (\mathbf{V}\cdot\nabla)\mathbf{V} \;=\; \mathbf{F} \,-\, \frac{\nabla p}{\rho} \,+\, \nu\,\nabla^2 \mathbf{V} \,+\, \frac{\nu}{3}\,\nabla(\nabla\cdot\mathbf{V})$$

where $p$ is the pressure, $\rho$ is the density, $\mathbf{F}$ the external force vector (per unit mass), $\nu$ is the kinematic viscosity, and $\mathbf{V}$ is the fluid velocity vector. If the fluid is incompressible then the divergence of the velocity is zero, so the last term vanishes. It's interesting to
consider whether anything can be inferred about the vacuum from this equation. By
definition, a vacuum has vanishing density, pressure, and viscosity - at least in the
ordinary senses of those terms. Setting these quantities to zero, and in the absence of any
external force $\mathbf{F}$, the above equation reduces to $d\mathbf{V}/dt = -\nabla p/\rho$. Since both $p$ and $\rho$ are to equal zero, this equation can only be evaluated on the basis of some functional relationship between those two variables. For example, we may assume the ideal gas law, $p = \rho R T$, where $R$ is the gas constant and $T$ is temperature. In that case we can evaluate the limit of $\nabla p/\rho$ as $p$ and $\rho$ approach zero to give

$$\frac{d\mathbf{V}}{dt} \,=\, -R\,\nabla T$$
This rather ghostly proposition apparently describes the disembodied velocity and
temperature of a medium possessing neither density nor heat capacity. In a sense it is a
medium of pure form and no substance. Of course, this is physically meaningless unless
we can establish a correspondence between the terms and some physically observable
effects. It was hoped by Stokes, Maxwell, and others that some such identification of
terms might enable a limiting case of the Navier-Stokes equation to represent
electromagnetic phenomena, but the full delineation of Maxwell's equations for
electromagnetism make it clear that they do not describe the movement of any ordinary
material substance, which of course was the basis for Navier-Stokes equation.
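The limit quoted above can be made explicit (a sketch, assuming the density remains nearly uniform as it vanishes). With $p = \rho R T$ we have

$$\frac{\nabla p}{\rho} \,=\, \frac{\nabla(\rho R T)}{\rho} \,=\, R\,\nabla T \,+\, R\,T\,\frac{\nabla \rho}{\rho}$$

so if $\nabla\rho/\rho$ goes to zero in the limit, only the $R\,\nabla T$ term survives, giving the limiting equation of motion quoted above.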
Another interesting suggestion was that the luminiferous ether might consist of a
substance whose constituent parts, instead of resisting changes in their relative distances
(translation), resist changes in orientation. A theory along these lines was proposed by
MacCullagh in 1839, and actually led to some of the same formulas as Maxwell's
electromagnetic theory. This is an intriguing fact, but it doesn't represent an application
(or even an adaptation) of the equations of motion for either an ordinary elastic
substance, whether gas, fluid, or solid. It's more properly regarded as an abstract
mathematical model with only a superficial resemblance to descriptions of the behavior
of material substances.
Some of the simplest material ether theories were ruled out simply on the basis of first-order optical phenomena, especially stellar aberration. For example, Stokes' theory of
complete convection could correctly model aberration (to first order) only with a set of
special hypotheses as to the propagation of light, hypotheses that Lorentz later showed to
be internally inconsistent. (Stokes erroneously assumed the velocity of a potential flow stream around a sphere is zero at the sphere's surface.) Fresnel's theory of partial convection was (more or less) adequate, up until it became possible to measure second-order effects, at which point it too was invalidated. But regardless of their empirical
failures, none of these theories really adhered to the laws of ordinary fluid mechanics.
William Thomson (Lord Kelvin), who was perhaps the most persistent of all in the
attempt to represent electromagnetic phenomena in terms of the mechanics of ordinary
macroscopic substances, aptly summarized the previous half-century of progress in this
line of research at a jubilee in his honor in 1896:
One word characterizes the most strenuous efforts for the advancement of science
that I have made perseveringly during fifty-five years; that word is failure. I know
no more of electric and magnetic force, or of the relation between ether,
electricity, and ponderable matter than I knew fifty years ago.
We might think this assessment was too harsh, especially considering that virtually the
entire science of classical electromagnetism - based on Maxwell's equations - was
developed during the period in question. However, in the course of this development
Maxwell and his followers had abandoned the effort to find mechanical analogies, and
Kelvin equated progress with finding a mechanical analogy. The failure to find any
satisfactory mechanical model for electromagnetism led to the abandonment of the
principle of qualitative similarity, which is to say, it led to the recognition that the ether
must be qualitatively different from ordinary substances. This belief was firmly
established once Maxwell showed that longitudinal waves cannot propagate through
transparent substances or free space. In so doing, he was finally able to show that all
electromagnetic and optical phenomena can be explained by a single system of "stresses
in the ether", which, however, he acknowledged must obey quite different laws than do
the elastic stresses in ordinary material substances. E. T. Whittaker's book Aether and Electricity, which includes a review of the work of Kelvin and others to find a mechanical model of the ether, concludes that
Towards the close of the nineteenth century it came to be generally recognized
that the aether is an immaterial medium, sui generis, not composed of identifiable
elements having definite locations in absolute space.
Thus by the time of Lorentz it had become clear that the "ether" was simply being
arbitrarily assigned whatever formal (and often non-materialistic) properties it needed in
order to make it compatible with the underlying electromagnetic laws, and therefore the
"corporeal" ether concept was no longer exerting any positive heuristic benefit, but was
simply an archaic appendage that was being formalistically superimposed on top of the
real physics for no particular reason.
Moreover, although the Navier-Stokes equation is as important today for fluid dynamics as Maxwell's equations are for electrodynamics, we've also come to understand that real
fluids and solids are not truly continuous media. They actually consist of large numbers
of (more or less) discrete particles. As it became clear that the apparently continuous
dynamics of fluids and solids were ultimately just approximations based on an aggregate
of more primitive electromagnetic interactions, the motivation for trying to explain the
latter as an instance of the former came to be seriously questioned. It is rather like saying
gold consists of an aggregate of sub-atomic particles, and then going on to say that those
sub-atomic particles are made of gold! The effort to explain electromagnetism in terms of
a material fluid such as we observe on a macroscopic level, when in fact the
electromagnetic interaction is a much more primitive phenomenon, appears today to have
been fundamentally misguided, an attempt to model a low-level phenomenon as an
instance of a higher level phenomenon.
During the last years of the 19th century a careful and detailed examination of
electrodynamic phenomena enabled Lorentz, Poincare, and others to develop a theory of
the electromagnetic ether that accounted for all known observations, but only by
concluding that "the ether is undoubtedly widely different from all ordinary matter". This
is because, in order to simultaneously account for aberration, polarization and transverse
waves, the complete absence of longitudinal waves, and the failure of the Michelson/
Morley experiment to detect any significant ether drift, Lorentz was forced to regard the
ether as strictly motionless, and yet subject to non-vanishing stresses, which is
contradictory for ordinary matter.
Even in Einstein's famous essay on "The Ether and Relativity" he points out that although
"we may assume the existence of an ether, we must give up ascribing a definite state of
motion to it, i.e. we must take from it the last mechanical characteristic...". He says this
because, like Lorentz, he understood that electromagnetic phenomena simply do not
conform to the behavior of disturbances in any ordinary material substance - solid, liquid,
or gas. Obviously if we wish to postulate some new kind of substance whose properties
are not constrained to be those of an ordinary substance, we can "back out" whatever
properties are needed to match the equations of any field theory (which is essentially
what Lorentz did), but this is just an exercise in re-stating the equations in ad hoc verbal
terms. Such a program has no heuristic or explanatory content. The question of whether
electromagnetic phenomena could be accurately modeled as disturbances in an ordinary
material medium was quite meaningful and deserved to be explored, but the answer is
unequivocally that the phenomena of electromagnetism do not conform to the principles
governing the behavior of ordinary material substances. In fact, we now understand that
the latter are governed by the former, i.e., elementary electromagnetic interactions
underlie the macroscopic behavior of ordinary material substances.
We shouldn't conclude this review of the ether without hearing Maxwell on the subject,
since he devoted his entire treatise on electromagnetism to it. Here is what he says in the
final article of that immense work:
The mathematical expressions for electrodynamic action led, in the mind of
Gauss, to the conviction that a theory of the propagation of electric action [as a function of] time would be found to be the very keystone of electrodynamics.


Now, we are unable to conceive of propagation in time, except either as the flight
of a material substance through space, or as the propagation of a condition of
motion or stress in a medium already existing in space... If something is
transmitted from one particle to another at a distance, what is its condition after it
has left the one particle and before it has reached the other? ...whenever energy is
transmitted from one body to another in time, there must be a medium or
substance in which the energy exists after it leaves one body and before it reaches
the other, for energy, as Torricelli remarked, 'is a quintessence of so subtle a
nature that it cannot be contained in any vessel except the inmost substance of
material things'. Hence all these theories lead to the conception of a medium in
which the propagation takes place, and if we admit this medium as an hypothesis,
I think it ought to occupy a prominent place in our investigations, and that we
ought to endeavour to construct a mental representation of all the details of its
action, and this has been my constant aim in this treatise.
Surely the intuitions of Gauss and Torricelli have been vindicated. Maxwell's dilemma
about how the energy of light "exists" during the interval between its emission and
absorption was resolved by the modern theory of relativity, according to which the
absolute spacetime interval between the emission and absorption of a photon is
identically zero, i.e., photons are transmitted along null intervals in spacetime. The
quantum phase of events, which we identify as the proper time of those events, does not
advance at all along null intervals, so, in a profound sense, the question of a photon's
mode of existence "after it leaves one body and before it reaches the other" is moot (as
discussed in Section 9). Of course, no one from Torricelli to Maxwell imagined that the
propagation of light might depend fundamentally on the existence of null connections
between distinct points in space and time. The Minkowskian structure of spacetime is
indeed a quintessence of a most subtle nature.
3.6 The End of My Latin
Leaving the old, both worlds at once they view
That stand upon the threshold of the new.
Edmund Waller, 1686
In his book "The Theory of Electrons" (1909) Hendrik Lorentz wrote
Einstein simply postulates what we have deduced, with some difficulty and not
altogether satisfactorily, from the fundamental equations of the electromagnetic
field.
This statement implies that Lorentz's approach was more fundamental, and therefore
contained more meaningful physics, than the explicitly axiomatic approach of Einstein.
However, a close examination of Lorentz's program reveals that he, no less than Einstein,
simply postulated relativity. To understand what Lorentz actually did - and did not - accomplish, it's useful to review the fundamental conceptual issues that he faced.
Given any set of equations describing some class of physical phenomena with reference
to a particular system of space and time coordinates, it may or may not be the case that
the same equations apply equally well if the space and time coordinates of every event
are transformed according to a certain rule. If such a transformation exists, then those
equations (and the phenomena they describe) are said to be covariant with respect to that
transformation. Furthermore, if those equations happen to be covariant with respect to a
complete class of velocity transformations, then the phenomena are said to be relativistic
with respect to those transformations. For example, Newton's laws of motion are
relativistic, because they apply not only with respect to one particular system of
coordinates x,t, but with respect to any system of coordinates x',t' related to the former
system according to a complete set of velocity transformations of the form

$$x' \,=\, x - v\,t \qquad\qquad t' \,=\, t \qquad\qquad (1)$$
From the time of Newton until the beginning of the 19th century many scientists
imagined that all of physics might be reducible to Newtonian mechanics, or at least to
phenomena that are covariant with respect to the same coordinate transformations as are
Newton's laws, and therefore the relativity of Newtonian physics was regarded as
complete, in the sense that velocity had no absolute significance, and each one of an
infinite set of relatively moving coordinate systems, related by (1), was equally suitable
for the description of all physical phenomena. This is called the principle of relativity,
and it's important to recognize that it is just a hypothesis, similar to the principle of
energy conservation. It is the result of a necessarily incomplete induction from our
observations of physical phenomena, and it serves as a tremendously useful organizing
principle, but only as long as it remains empirically viable. Admittedly we could regard
complete relativity as a direct consequence of the principle of sufficient cause - within a
conceptual framework of distinct entities moving in an empty void - but this is still a
hypothetical proposition. The key point to recognize is that although we can easily
derive the relativity of Newton's laws under the transformations (1), we cannot derive the
correctness of Newton's laws, nor can we derive the complete relativity of physics from
the presumptive relativity of the dynamics of material bodies.
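Incidentally, the covariance of Newton's laws under (1) is easy to exhibit symbolically. The following minimal sketch (using the sympy library merely to carry out the differentiation) shows that the substitution x' = x - vt, t' = t leaves the acceleration, and hence F = ma, unchanged:

    import sympy as sp

    t, v = sp.symbols('t v')
    x = sp.Function('x')(t)        # arbitrary trajectory in the unprimed frame

    xp = x - v*t                   # transformation (1): x' = x - vt, t' = t
    accel = sp.diff(x, t, 2)       # acceleration in the unprimed frame
    accel_p = sp.diff(xp, t, 2)    # since t' = t, d/dt' is just d/dt

    print(sp.simplify(accel_p - accel))   # 0: the acceleration is identical in both frames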
By the end of the 19th century the phenomena of electromagnetism had become well enough developed so that the behavior of the electromagnetic field - at least on a
macroscopic level - could be described by a set of succinct equations, analogous to
Newton's laws of motion for material objects. According to the principle of relativity (in
the context of entities in an empty void) it was natural to expect that these new laws
would be covariant with the laws of mechanics. It therefore was somewhat surprising
when it turned out that the equations which describe the electromagnetic field are not
covariant under the transformations (1). Apparently the principle of complete relativity
was violated. On the other hand, if mechanics and electromagnetism are really not co-relativistic, it ought to be possible to detect the effects of an absolute velocity, whereas all
attempts to detect such a thing failed. In other words, the principle of complete relativity
of velocity continued to survive all empirical tests involving comparisons of the effects of

velocity on electromagnetism and mechanics, despite the fact that the (supposed)
equations governing these two classes of phenomena were not covariant with respect to
the same set of velocity transformations.
At about this time, Lorentz derived the fact that although Maxwell's equations of the electromagnetic field (taking the permittivity and permeability of the vacuum to be invariants) are not covariant with respect to (1), they are covariant with respect to a complete set of velocity transformations, namely, those of the form

$$x' \,=\, \gamma\,(x - v\,t) \qquad\qquad t' \,=\, \gamma\,(t - v\,x) \qquad\qquad (2)$$

for a suitable choice of space and time units, where $\gamma = (1 - v^2)^{-1/2}$. This was a very
important realization, because if the equations of the electromagnetic field were not
covariant with respect to any complete set of velocity transformations, then the principle
of relativity could only have been salvaged by the existence of some underlying medium.
The situation would have been analogous to finding a physical process in which energy is
not conserved, leading us to seek for some previously undetected mode of energy. Of
course, even recognizing the covariance of Maxwell's equations with respect to (2), the
principle of relativity was still apparently violated because it still appeared that
mechanics and electromagnetism were incompatible.
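The essential content of covariance with respect to (2) can be checked numerically. Here is a minimal sketch (in units where the speed of light is 1; the sample event and boost velocity are arbitrary choices of mine) verifying that (2) preserves the quantity $t^2 - x^2$, and hence maps light paths to light paths:

    import math

    def lorentz(x, t, v):
        # transformation (2), in units where c = 1
        g = 1.0 / math.sqrt(1.0 - v*v)
        return g * (x - v*t), g * (t - v*x)

    x, t, v = 2.0, 5.0, 0.6            # arbitrary event and boost velocity
    xp, tp = lorentz(x, t, v)
    print(t*t - x*x, tp*tp - xp*xp)    # both 21.0: the interval is preserved

    # an event on the light cone (x = t) remains on the light cone after the boost
    xq, tq = lorentz(3.0, 3.0, v)
    print(abs(xq - tq) < 1e-12)        # True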
Recall that Lorentz took Maxwell's equations to be "the fundamental equations of the
electromagnetic field" with respect to the inertial rest frame of the luminiferous ether.
Needless to say, these equations were not logically derived from more fundamental
principles, they were developed by a rational-inductive method whereby observed
phenomena were analyzed into a small set of simple patterns, which were then formalized
into mathematical expressions. Even the introduction of the displacement current was
just a rational hypothesis. Admittedly the historical development of Maxwell's equations
was guided to some extent by mechanistic analogies, but the mechanical world-view is
itself a high-level conceptual framework based on an extensive set of abstract
assumptions regarding dimensionality, space, time, plurality, persistent identities, motion,
inertia, and various conservation laws and symmetries. Thus even if a completely
successful mechanical model for the electromagnetic field existed, it would still be highly
hypothetical.
Moreover, it was already clear by 1905 that Maxwell's equations are not fundamental,
since the simple wave model of electromagnetic radiation leads to the ultra-violet
catastrophe, and in general cannot account for the micro-structure of radiation, leading to
such things as the photo-electric effect and other quantum phenomena. (Having just
completed a paper on the photo-electric effect prior to starting his 1905 paper on special
relativity, Einstein was very much aware that Maxwell's equations were not fundamental,
and this influenced his choice of foundations on which to base his interpretation of
electrodynamics.) It's worth noting that although Lorentz derived the transformations (2) from the full set of Maxwell's equations (with the permittivity and permeability
interpreted as invariants), these transformations actually follow from just one aspect of
Maxwell's equations, namely, the invariance of the speed of light. Thus from the standpoint of logical economy, as well as to avoid any commitment to the fundamental correctness of Maxwell's equations, it is preferable to derive the Lorentz transformation
from the minimum set of premises. Of course, having done this, it is still valuable to
show that, as a matter of fact, Maxwell's equations are fully covariant with respect to
these transformations.
To summarize the progress up to this point, Lorentz derived the general transformations
(2) relating two systems of space and time coordinates such that if an electromagnetic
field satisfies Maxwell's equations with respect to one of the systems, it also satisfies
Maxwell's equations with respect to the other. Now, this in itself certainly does not
constitute a derivation of the principle of relativity. To the contrary, the fact that (2) is
different from (1) leads us to expect that the principle of relativity is violated, and that it
ought to be possible to detect effects of absolute velocity, or, alternatively, to detect some
underlying medium that accounts for the difference between (2) and (1). Lorentz knew
that all attempts to detect an absolute velocity (or underlying medium) had failed,
implying that the principle of complete relativity was intact, so something was wrong
with the formulations of the laws of electromagnetism and/or the laws of mechanics.
Faced with this situation, Lorentz developed his "theorem of corresponding states",
which asserts that all physical phenomena transform according to the transformation law
for electrodynamics. This "theorem" is equivalent to the proposition that physics is, after
all, completely relativistic. Since Lorentz presented this as a "theorem", it has sometimes
misled people (including, to an extent, Lorentz himself) into thinking that he had actually
derived relativity, and that, therefore, his approach was more fundamental or more
constructive than Einstein's. However, an examination of Lorentz's "theorem" reveals
that it was explicitly based on assumptions (in addition to the false assumption that
Maxwell's equations are the fundamental equations of the electromagnetic field) which,
taken together, are tantamount to the assumption of complete relativity. The key step
occurs in §175 of The Theory of Electrons, in which Lorentz writes
We are now prepared for a theorem concerning corresponding states of
electromagnetic vibration, similar to that of §162, but of a wider scope. To the
assumptions already introduced, I shall add two new ones, namely (1) that the
elastic forces which govern the vibratory motions of the electrons are subjected to
the relation [300], and (2) that the longitudinal and transverse masses m' and m"
of the electrons differ from the mass m0 which they have when at rest in the way
indicated by [305].
Lorentz's equation [300] is simply the transformation law for electromagnetic forces, and
his equations [305] give the relativistic expressions for the transverse and longitudinal
masses of a particle. Lorentz had previously presented these expressions as
...the assumptions required for the establishment of the theorem, that the systems
S and S0 can be the seat of molecular motions of such a kind that, in both, the
effective coordinates of the molecules are the same function of the effective time.

In other words, these are the assumptions required in order to make the theorem of
corresponding states (i.e., the principle of relativity) true. Hence Lorentz simply
postulates relativity, just as did Galileo and Einstein, and then backs out the conditions
that must be satisfied by mechanical objects in order to make relativity true. Needless to
say, if we assume these conditions, we can then easily prove the theorem, but this is
tautological, because these conditions were simply defined as those necessary to make
the theorem true. Not surprisingly, if someone just focuses on Lorentz's "proof", without
paying attention to the assumptions on which it is based, he might be misled into thinking
that Lorentz derived relativity from some more fundamental considerations. This arises
from confusion over what Lorentz was actually doing. He was primarily deriving the
velocity transformations with respect to which Maxwell's equations are covariant, after
which he proceeded to determine how the equations of mechanics would need to be
modified in order for them to be covariant with respect to these same transformations.
He did not derive the necessity for mechanics to obey these revised laws, any more than
Einstein or Newton did. He simply assumed it, and indeed he had no choice, because the
laws of mechanics do not follow from the laws of electromagnetism. Why, then, does the
myth persist (in some circles) that Lorentz somehow derived relativity?
To answer this question, we need to examine Lorentz's derivation of the theorem of
corresponding states in greater detail. First, Lorentz justified the contraction of material
objects in the direction of motion (with respect to the ether frame) on the basis of his
"molecular force hypothesis", which asserts that the forces responsible for maintaining
stable configurations of matter transform according to the electromagnetic law. This can
only be regarded as a pure assumption, rather than a conclusion from electromagnetism,
for the simple reason that the molecular forces are necessarily not electromagnetic, at
least not in the Maxwellian sense. Maxwell's equations are linear, and it is not possible to
construct bound states from any superposition of linear solutions. Hence Lorentz's
molecular force hypothesis cannot legitimately be inferred from electromagnetism. It is a
sheer hypothesis, amounting to the simple assumption that all intrinsic mechanical
aspects of material entities are covariant with electromagnetism.
Second, and even more importantly, Lorentz justifies the applicability of the "effective
coordinates" for the laws of mechanics of material objects by assuming that the inertial
masses (both transverse and longitudinal) of material objects transform in the same way
as do the "electromagnetic masses" of a charged particle arising from self-reaction.
Admittedly it was once hoped that all inertial mass could be attributed to electromagnetic
self-reaction effects, which would have provided some constructive basis for Lorentz's
assumption, but we now know that only a very small fraction of the effective mass of an
electron is due to the electromagnetic field. Again, it is simply not possible to account
for bound states of matter in terms of Maxwellian electromagnetism, so it does not
logically follow that the mechanics of material objects are covariant with respect to (2)
simply because the electromagnetic field is covariant with respect to (2). Of course, we
can hypothesize that this is the case, but this is simply the hypothesis of complete physical
relativity.
Thus Lorentz did not in any way derive the fact that the laws of mechanics are covariant with respect to the same transformations as are the laws of electromagnetism. He simply
observed that if we assume they are (and if we assume every other physical effect, even
those presently unknown to us, is likewise covariant), then we get complete physical
relativity - but this is tautological. If all the laws of physics are covariant with respect to
a single set of velocity transformations (whether they are of the form (1) or (2) or any
other), then by definition physics is completely relativistic. The doubts about relativity
that arose in the 19th century were due to the apparent fact that the laws of mechanics
and the laws of electromagnetism were not covariant with respect to the same set of
velocity transformations. Obviously if we simply assume that they are covariant with
respect to the same transformations, then the disparity is resolved, but it's important to
recognize that this represents just the assumption - not a derivation - of the principle of
relativity.
An alternative approach to preserving the principle of relativity would be to assume that
electromagnetism and mechanics are actually both covariant with respect to the velocity
transformations (1). This would necessitate modifications of Maxwell's equations, and
indeed this was the basis for Ritz's emission theory. However, the modifications that Ritz
proposed eventually led to conflict with observation, because according to the relativity
based on (1) speeds are strictly additive and there is no finite upper bound on the speed of
energy propagation.
The failure of emission theories illustrates the important fact that there are two verifiable
aspects of relativistic physics. The first is the principle of relativity itself, but this
principle does not fully determine the observable characteristics of phenomena, because
there is more than one possible relativistic pattern, and these patterns are observationally
distinguishable. This is why relativistic physics is founded on two distinct premises, one
being the principle of relativity, and the other being some empirical proposition sufficient
to identify the particular pattern of relativity (Euclidean, Galilean, Lorentzian) that
applies. Lorentz's theorem of corresponding states represents the second of these premises, whereas the first is simply assumed, consistent with the apparent relativity of all observable phenomena. Einstein's achievement in special relativity was essentially to show that Lorentz's results (and more) actually follow unavoidably from just a small
subset of his assumptions, and that these can be consistently interpreted as primitive
aspects of space and time.
The first published reference to Einstein's special theory of relativity appeared in a short
note by Walter Kaufmann reporting on his experimental results involving the deflection
of electrons in an electromagnetic field. Kaufmann's work was intended as an
experimentum crucis for distinguishing between the three leading theories of the electron,
those of Abraham, Bucherer, and Lorentz. In his note of 30 November 1905, Kaufmann
wrote
In addition there is to be mentioned a recent publication of Mr. A. Einstein on the
theory of electrodynamics which leads to results which are formally identical with
those of Lorentz's theory. I anticipate right away the general result of the
measurements to be described in the following: the results are not compatible with the Lorentz-Einstein fundamental assumptions.


Kaufmann's results were originally accepted by most physicists as favoring the Abraham
theory, but gradually people began to have doubts. Although the results disagreed with
the Lorentz-Einstein model, the agreement with Abraham's theory was not particularly
good either. This troubled Planck, so he conducted a careful analysis of Kaufmann's
experiment and his analysis of the two competing theories. It was an interesting example
of scientific "detective work" by Planck.
Kaufmann in 1905 had measured nine characteristic deflections d1, d2, ..., d9 for electrons passing through nine different field strengths. Then he had computed the nine values that
would be predicted by Abraham's theory, and the nine values that would be predicted by
Lorentz-Einstein. However, in order to derive the "predictions" from the theories for his
particular experimental setup he needed to include an attenuation factor "k" on the
electric field strength. This factor is actually quite a complicated function of the
geometry of the plates and coils used to establish the electric field. Kaufmann selected
particular value of "k" that he thought would be reasonable.
Now, both the Abraham and the Lorentz-Einstein theory predicted the electron's velocity
could never exceed c, but Planck noticed that Kaufmann's choice of k implied a velocity
greater than c for at least one of the data points, and therefore was actually inconsistent
with both theories. This caused Planck to suspect that perhaps Kaufmann's assumed
value of k was wrong. Unfortunately the complexity of the experimental setup made it
impossible to give a firm determination of the attenuation factor from first principles, but
Planck was nevertheless able to extract some useful information from Kaufmann's data.
Planck took the nine data points and "backed out" the values of k that would be necessary
to make them agree with Abraham's theory. Then he did the same for the Lorentz-Einstein theory. All these values of k were well within the range of plausibility (given
the uncertainty in the experimental setup), so nothing definite could be concluded, but
Planck noted that the nine k-values necessary to match the Lorentz-Einstein theory to the
measurements were all nearly equal, whereas the nine k-values necessary to match
Abraham showed more variation. From this, one might actually infer a slight tilt in favor
of the Lorentz-Einstein theory, simply by virtue of the greater consistency of k values.
Naturally this inconclusive state of affairs led people to try to think of an experiment that
would be more definitive. In 1908 Bucherer performed a variation of Kaufmann's
experiment, but with an experimental setup taking Planck's analysis into account, so that
uncertainty in the value of k basically "cancels out". Bucherer's results showed clear
agreement with the Lorentz-Einstein theory and disagreed with the Abraham theory.
Additional and more refined experiments were subsequently performed, and by 1916 it
was clear that the experimental evidence did in fact support what Kaufmann had called
"the Lorentz-Einstein fundamental assumptions".
Incidentally, it's fascinating to compare the reactions of Lorentz, Poincare, and Einstein to
Kaufmann's results. Lorentz was ready to abandon his entire model (and life's work) since it evidently conflicted with this one experiment. As he wrote to Poincare in 1906,
the length contraction hypothesis was crucial for the coherence of his entire theoretical
framework, and yet
Unfortunately my hypothesis of the flattening of electrons is in contradiction with
Kaufmann's results, and I must abandon it. I am, therefore, at the end of my
Latin.
Poincare agreed that, in view of Kaufmann's results "the entire theory may well be
threatened". It wasn't until the announcement of Bucherer's results that Lorentz regained
confidence in his own theoretical model. Interestingly, he later cited those results as one
of the main reasons for his eventual acquiescence with the relativity principle, noting that
if Lorentz-covariance is actually as comprehensive as these experimental results show it
to be, then the ether concept is entirely devoid of heuristic content. (On the other hand,
he did continue to maintain that there were some benefits in viewing things from the
standpoint of absolute space and time, even if we are not at present able to discern such
things.)
Einstein's reaction to Kaufmann's apparently devastating results was quite different. In a
review article on relativity theory in 1907, Einstein acknowledged that his theory was in
conflict with Kaufmann's experimental results, and he could find nothing wrong with
either Kaufmann's experiment or his analysis, which together seemed to argue in favor of
Abraham's theory over relativity. Nevertheless, the young patent examiner continued
It will be possible to decide whether the foundations of the relativity theory
correspond with the facts only if a great variety of observations is at hand... In
my opinion, both [the alternative theories of Abraham and Bucherer] have rather
slight probability, because their fundamental assumptions concerning the mass of
moving electrons are not explainable in terms of theoretical systems which
embrace a greater complex of phenomena. A theory is the more impressive the
greater the simplicity of its premises, the more different kinds of things it relates,
and the more extended is its area of applicability.
This is a remarkable defense of a scientific theory against apparent experimental
falsification. While not directly challenging the conflict between experiment and theory,
Einstein nevertheless maintained that we should regard relativity as most likely correct,
essentially on the basis of its scope and conceptual simplicity. Oddly enough, when later
confronted with similar attempts to justify other people's theories, Einstein was fond of
saying that "a theory should be as simple as the facts allow - but no simpler". Yet here
we find him serenely confident that the "facts" rather than his theory will ultimately be
overturned, which turned out to be the case. This sublime confidence in the correctness
of certain fundamental ideas was a characteristic of Einstein throughout his career. When
asked what he would have done if the eclipse observations had disagreed with the
prediction of general relativity for the bending of light, Einstein replied "Then I would
have felt sorry for the dear Lord, because the theory is correct."

3.7 Zeno and the Paradox of Motion


We may say a thing is at rest when it has not changed its position between
now and then, but there is no then in now, so there is no being at rest.
Both motion and rest, then, must necessarily occupy time.
Aristotle, 350 BC
The Eleatic school of philosophers was founded by the religious thinker and poet
Xenophanes (born c. 570 BC), whose main teaching was that the universe is singular,
eternal, and unchanging. "The all is one." According to this view, as developed by later
members of the Eleatic school, the appearances of multiplicity, change, and motion are
mere illusions. Interestingly, the colony of Elea was founded by a group of Ionian Greeks
who, in 545 BC, had been besieged in their seaport city of Phocaea by an invading
Persian army, and were ultimately forced to evacuate by sea. They sailed to the island of
Corsica, and occupied it after a terrible sea battle with the navies of Carthage and the
Etruscans. Just ten years later, in 535 BC, the Carthaginians and Etruscans regained the
island, driving the Phocaean refugees once again into the sea. This time they landed on
the southwestern coast of Italy and founded the colony of Elea, seizing the site from the
native Oenotrians. All this happened within the lifetime of Xenophanes, himself a
wandering exile from his native city of Colophon in Ionia, from which he too had been forced to flee in 545 BC. He lived in Sicily and then in Catana before finally joining the
colony at Elea. It's tempting to speculate on how these events may have psychologically
influenced the Eleatic school's belief in permanent unalterable oneness, denying the
reality of change and plurality in the universe.
The greatest of the Eleatic philosophers was Parmenides (born c. 539 BC). In addition to
developing the theme of unchanging oneness, he is also credited with originating the use
of logical argument in philosophy. His habit was to accompany each statement of belief
with some kind of logical argument for why it must be so. It's possible that this was a
conscious innovation, but it seems more likely that the habitual rationalization was
simply a peculiar aspect of his intellect. In any case, on this basis he is regarded as the
father of metaphysics, and, as such, a key contributor to the evolution of scientific
thought.
Parmenides's belief in the absolute unity and constancy of reality is quite radical and
abstract, even by modern standards. He maintained that the universe is literally singular
and unchangeable. However, his rationalism forced him to acknowledge that appearances
are to the contrary, i.e., while he flatly denied the existence of plurality and change, he
admitted the appearance of these things. Nevertheless, he insisted these were mere
perceptions and opinions, not to be confused with "what is". Not surprisingly, Parmenides
was ridiculed for his beliefs. One of Parmenides' students was Zeno, who is best
remembered for a series of arguments in which he defends the intelligibility of the Eleatic
philosophy by purporting to prove, by logical means, that change (motion) and plurality are impossible.
We can't be sure how the historical Zeno intended his arguments to be taken, since none
of his writings have survived. We know his ideas only indirectly through the writings of
Plato, Aristotle, Simplicius, and Proclus, none of whom was exactly sympathetic to Zeno's
philosophical outlook. Furthermore, we're told that Zeno's arguments were a "youthful
effort", and that they were made public without his prior knowledge or consent. Also,
even if we accept that his purpose was to defend the Eleatic philosophy against charges of
logical inconsistency, it doesn't follow that Zeno necessarily regarded his counter-charges
as convincing. It's conceivable that he intended them as satires of (what he viewed as) the
fallacious arguments that had been made against Parmenides' ideas. In any case, although
we cannot know for sure how Zeno himself viewed his "paradoxes", we can nevertheless
examine the arguments themselves, as they've come down to us, to see if they contain - or
suggest - anything of interest.
Of the 40 arguments attributed to Zeno by later writers, the four most famous are on the
subject of motion:
The Dichotomy: There is no motion, because that which is moved must arrive at
the middle before it arrives at the end, and so on ad infinitum.
The Achilles: The slower will never be overtaken by the quicker, for that which is
pursuing must first reach the point from which that which is fleeing started, so
that the slower must always be some distance ahead.
The Arrow: If everything is either at rest or moving when it occupies a space
equal to itself, while the object moved is always in the instant, a moving arrow is
unmoved.
The Stadium: Consider two rows of bodies, each composed of an equal number of
bodies of equal size. They pass each other as they travel with equal velocity in
opposite directions. Thus, half a time is equal to the whole time.
The first two arguments are usually interpreted as critiques of the idea of continuous
motion in infinitely divisible space and time. They differ only in that the first is expressed
in terms of absolute motion, whereas the second shows that the same argument applies to
relative motion. Regarding these first two arguments, there's a tradition among some high
school calculus teachers to present them as "Zeno's Paradox", and then "resolve the
paradox" by pointing out that an infinite series can have a finite sum. This may be a
useful pedagogical device for beginning calculus students, but it misses an interesting and
important philosophical point implied by Zeno's arguments. To see this, we can reformulate the essence of these two arguments in more modern terms, and show that, far
from being vitiated by the convergence of infinite series, they actually depend on the
convergence of the geometric series.
Consider a ray of light bouncing between an infinite sequence of mirrors as illustrated
below

On the assumption that matter, space, and time are continuous and infinitely divisible
(scale invariant), we can conceive of a point-like massless particle (say, a photon)
traveling at constant speed through a sequence of mirrors whose sizes and separations
decrease geometrically (e.g., by a factor of two) on each step. The envelope around these
mirrors is clearly a wedge shape that converges to a point, and the total length of the
zigzag path is obviously finite (because the geometric series 1 + 1/2 + 1/4 + ...
converges), so the particle must reach "the end" in finite time. The essence of Zeno's
position against continuity and infinite divisibility is that there is no logical way for the
photon to emerge from the sequence of mirrors. The direction in which the photon would
be traveling when it emerged would depend on the last mirror it hit, but there is no "last"
mirror. Similarly we could construct "Zeno's maze" by having a beam of light directed
around a spiral as shown below:

Again the total path is finite, but has no end, i.e., no final direction, and a ray propagating
along this path can neither continue nor escape. Of course, modern readers may feel
entitled to disregard this line of reasoning, knowing that matter consists of atoms which
are not infinitely divisible, so we could never construct an infinite sequence of
geometrically decreasing mirrors. Also, every photon has some finite scattering
wavelength and thus cannot be treated as a "point particle". Furthermore, even a massless
particle such as a photon necessarily has momentum according to the quantum and
relativistic relation p = h/λ, and the number of rebounds per unit time - and hence the outward pressure on the structure holding the mirrors in place - increases to infinity as the photon approaches the convergent point. However, these arguments merely confirm Zeno's position that the physical world is not scale-invariant or infinitely divisible (noting that Planck's constant h represents an absolute scale). Thus, we haven't debunked Zeno, we've merely conceded his point. Of course, this point is not, in itself, paradoxical. It
simply indicates that at some level the physical world must be regarded as consisting of
finite indivisible entities. We arrive at Zeno's paradox only when these arguments against
infinite divisibility are combined with the complementary set of arguments (The Arrow
and The Stadium) which show that a world consisting of finite indivisible entities is also
logically impossible, thereby presenting us with the conclusion that physical reality can
be neither continuous nor discontinuous.
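To make the arithmetic of the mirror construction explicit (a minimal sketch, assuming a reduction factor of 1/2 per stage and taking the first segment to have unit length): the total path length is

   1 + 1/2 + 1/4 + 1/8 + ... = 2

so a photon moving at constant speed traverses the entire zigzag in finite time. But the time between successive rebounds is also halved at each stage, so the number of rebounds per unit time (and with it the rate of momentum transfer, roughly 2p per bounce) grows without bound as the apex is approached. The path has finite length and yet no last segment, which is precisely the feature that Zeno's argument exploits.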
The more famous of Zeno's two arguments against discontinuity is "The Arrow", which
focuses on the instantaneous physical properties of a moving arrow. He notes that if
physical objects exist discretely at a sequence of discrete instants of time, and if no
motion occurs in an instant, then we must conclude that there is no motion in any given
instant. (As Bertrand Russell commented, this is simply "a plain statement of an
elementary fact".) But if there is literally no physical difference between a moving and a
non-moving arrow in any given discrete instant, then how does the arrow know from one
instant to the next if it is moving? In other words, how is causality transmitted forward in
time through a sequence of instants, in each of which motion does not exist?
It's been noted that Zeno's "Arrow" argument could also be made in the context of
continuous motion, where in any single slice of time there is (presumed to be) no physical
difference between a moving and a non-moving arrow. Thus, Zeno suggests that if all
time is composed of instants (continuous or discrete), and motion cannot exist in any
instant, then motion cannot exist at all. A naive response to this argument is to point out that, although the value of a function f(t) for a given t is just a single fixed position, the derivative df/dt may nevertheless be well-defined and non-zero at that same t. But, again, this explanation doesn't really address the
phenomenological issue raised by Zeno's argument. A continuous function (as
emphasized by Weierstrass) is a static completed entity, so by invoking this model we are
essentially agreeing with Parmenides that physical motion does not truly exist, and is just
an illusion, i.e., "opinions", arising from our psychological experience of a static
unchanging reality.
Of course, to accomplish this we have expanded our concept of "the existing world" to
include another dimension. If, instead, we insist on adhering to the view of the entire
physical world as a purely spatial expanse, existing in and progressing through a
sequence of instants, then we again run into the problem of how a quality that exists only
over a range of instants can be causally conveyed through any given instant in which it
has no form of existence. Before blithely dismissing this concern as non-sensical, it's
worth noting that modern physics has concluded (along with Zeno) that the classical
image of space and time was fundamentally wrong, and in fact motion would not be
possible in a universe constructed according to the classical model. We now recognize
that position and momentum are incompatible variables, in the sense that an exact
determination of either one of them leaves the other completely undetermined. According
to quantum mechanics, the eigenvalues of spatial position are incompatible with the
eigenvalues of momentum, so, just as Zeno's arguments suggest, it really is inconceivable
for an object to have a definite position and momentum (motion) simultaneously.
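In standard quantum-mechanical notation (stating it here for reference, since the text invokes it only verbally), this incompatibility is expressed by the commutation relation between the position and momentum operators,

   xp − px = iħ

from which the Heisenberg relation Δx·Δp ≥ ħ/2 follows; a state of perfectly definite position is necessarily a superposition of all momenta, and vice versa.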

The theory of special relativity answers Zeno's concern over the lack of an instantaneous
difference between a moving and a non-moving arrow by positing a fundamental restructuring of the basic way in which space and time fit together, such that there really is an
instantaneous difference between a moving and a non-moving object, insofar as it makes
sense to speak of "an instant" of a physical system with mutually moving elements.
Objects in relative motion have different planes of simultaneity, with all the familiar
relativistic consequences, so not only does a moving object look different to the world,
but the world looks different to a moving object.
This resolution of the paradox of motion presumably never occurred to Zeno, but it's no
exaggeration to say that special relativity vindicates Zeno's skepticism and physical
intuition about the nature of motion. He was correct that instantaneous velocity in the
context of absolute space and absolute time does not correspond to physical reality, and
probably doesn't even make sense. From Zeno's point of view, the classical concept of
absolute time was not logically sound, and special relativity (or something like it) is a
logical necessity, not just an empirical fact. It's even been suggested that if people had
taken Zeno's paradoxes more seriously they might have arrived at something like special
relativity centuries ago, just on logical grounds. This suggestion goes back at least to
Minkowski's famous lecture of "staircase wit" (see Section 1.7). Doubtless it's stretching
the point to say that Zeno anticipated the theory of special relativity, but it's undeniably
true that his misgivings about the logical consistency of motion in its classical form were
substantially justified. The universe does not (and arguably, could not) work the way
people thought it did.
In all four of Zeno's arguments on motion, the implicit point is that if space and time are
independent, then logical inconsistencies arise regardless of whether the physical world is
continuous or discrete. All of those inconsistencies can be traced to the implication that,
if any motion is possible, then the range of conceivable relative velocities must be
unbounded, corresponding to Minkowski's "unintelligible" G.
What is the alternative? Zeno considers the premise that the range of possible relative
velocities is bounded, i.e., there is some maximum achievable (conceivable) relative
velocity, and he associates this possibility with the idea that space and time are not
infinitely divisible. (It presumably didn't occur to him that another way of achieving this
is to assume space and time are not independent.)
This brings us to the last of Zeno's four main arguments on motion, "The Stadium",
which has always been the most controversial, partly because the literal translation of its
statement is somewhat uncertain. In this argument Zeno appears to be attacking the only
remaining alternative to the unintelligible G, namely, the possibility of a finite upper
bound on conceivable velocity. It's fascinating that he argues in much the same way that
modern students do when they're first introduced to the concept of an invariant speed in
the theory of special relativity. He says, in effect, that if someone is running towards me
from the west at the maximum possible speed, and someone else is approaching me from
the east at the maximum possible speed, then they are approaching each other at twice the
maximum possible speed...which is a contradiction.
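Of course, the modern reply is that speeds do not compose additively. For reference, under the relativistic composition law two colinear speeds u and v combine as

   w = (u + v) / (1 + uv/c²)

so if each runner moves at speed u = v = c relative to the stadium, the speed of either runner as measured by the other is w = 2c/(1 + 1) = c, not 2c, and the apparent contradiction dissolves.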

To illustrate the relevance of Zeno's arguments to a discussion of the consequences of special relativity, compare the discussion of time dilation in Section 2.13 of Rindler's
"Essential Relativity" with Heath's review of Zeno's Stade paradox in Chapter VIII of "A
History of Greek Mathematics". The resemblance is so striking that it's tempting to
imagine that either Rindler consciously patterned his discussion on some recollection of
Zeno's argument, or it's an example of Jung's collective unconscious. Here is a
reproduction of Rindler's Figure 2.4, showing three "snapshots" of two sequences of
clocks A, B, C,... and A', B', C', ... fixed at certain equal intervals along the x axes of two
frames S and S':

These three snapshots are taken at equal intervals by an observer in a third frame S",
relative to which S and S' have equal and opposite velocities. Rindler describes the values
that must appear on each clock in order to explain the seemingly paradoxical result that
each observer considers the clocks of the others to be running slow, in accord with
Einsteinian relativity. Compare this with the figure on page 277 of Heath:

where again we have three snapshots of a sequence of clocks (i.e., observers/athletes), this time showing the reference frame S" as well as the two frames S and S' that are
moving with equal and opposite velocities relative to S". As Aristotle commented, this
scenario evidently led Zeno to the paradoxical conclusion that "half the time is equal to
its double", precisely as the freshman physics student suspects when he first considers the
implications of relativity.
Surely we can forgive Zeno for not seeing that his arguments can only be satisfactorily
answered - from the standpoint of physics - by assuming Lorentzian invariance and the
relativity of space and time. According to this view, with its rejection of absolute
simultaneity, we're inevitably led from a dynamical model in which a single slice of space
progresses "evenly and equably" through time, to a purely static representation in which
the entire history of each worldline already exists as a completed entity in the plenum of
spacetime. This static representation, according to which our perceptions of change and
motion are simply the product of our advancing awareness, is strikingly harmonious with
the teachings of Parmenides, whose intelligibility Zeno's arguments were designed to
defend.
Have we now finally resolved Zeno's "youthful effort"? Given the history of "final
resolutions", from Aristotle onwards, it's probably foolhardy to think we've reached the
end. It may be that Zeno's arguments on motion, because of their simplicity and
universality, will always serve as a kind of "Rorschach image" onto which people can
project their most fundamental phenomenological concerns (if they have any).
3.8 A Very Beautiful Day
Such a solemn air of silence has descended between us that I almost feel
as if I am committing a sacrilege when I break it now with some inconsequential babble. But is this not always the fate of the exalted ones
of this world?
Einstein to Habicht, 25 May 1905
In 1894 Einstein's parents and younger sister Maja moved to Italy, where his father hoped
to start a new business. It was arranged for Albert, then 15, to remain in Munich to
complete his studies at the Gymnasium (high school), but the young lad soon either
dropped out or was invited to leave (recollections differ). He then crossed the Alps to
reunite with his family in Italy. Lacking a high school diploma, his options for further
education were limited, but his father still hoped for him to become an electrical
engineer, which required a university degree. It so happens that the Zurich Polytechnic
Institute had an unusual admissions policy which did not require a high school diploma,
provided the applicant could pass the entrance examination, so after a year off in Italy, the
16 year old Albert was dispatched to Zurich to take the exam. He failed, having made (as
he later admitted) "no attempt whatsoever to prepare myself". In fairness, it should be
noted that the usual age for taking the exam was 18, but it seems he wasn't particularly
eager to (as his father advised) "forget his philosophical nonsense and apply himself to a
sensible trade like electrical engineering".
Fortunately, the principal of the Polytechnic noted the young applicant's unusual strength
in mathematics, and helped make arrangements for Einstein to attend a cantonal school in
the picturesque town of Aarau, twenty miles west of Zurich. The headmaster of the
school was Professor Jost Winteler, an ornithologist. During his time in Aarau Einstein
stayed with the Winteler family, and always had fond memories of the time he spent
there, in contrast with what he regarded as the coercive atmosphere at the Munich
Gymnasium. He became romantically involved with Marie Winteler (Jost's daughter), but
seems to have been less serious about it than she was, and the relationship ended badly
when Einstein took up with Mileva Maric. He also formed life-long relationships with
two of the other Winteler children, Paul and Anna. Paul Winteler married Einstein's sister
Maja, and Anna Winteler married Michelangelo Besso, one of Einstein's closest friends.
Besso, six years older than Einstein, was a Swiss-Italian studying to be an electrical
engineer. Like Einstein, he played the violin, and the two of them first met at a musical
gathering in 1896.
It was just a year earlier that the 16-year-old Einstein had first wondered how the world would
appear to someone traveling at the speed of light. He realized that to such an observer a
co-moving lightwave in a vacuum would appear as a spatially fluctuating standing wave,
i.e., a stationary wave of light, but it doesn't take an expert in Maxwell's equations to be
skeptical that any such configuration is possible. Indeed, Einstein later recalled that "from
the beginning it appeared to me intuitively clear" that light must propagate in the same
way with respect to any system of inertial coordinates. However, this invariance directly
contradicts the Galilean addition rule for the composition of velocities. This problem
stayed with Einstein for the next ten years, during which time he finally gained entrance
to the Polytechnic, and, to the disappointment of his family, switched majors from
electrical engineering to physics. His friend Besso continued with his studies and became an electrical engineer in Milan. Already by this time Einstein had turned from
engineering to pure physics, and seems to have decided (or foreseen) how he would
spend his life, as he wrote in an apologetic letter to Marie's mother Pauline Winteler in
the Spring of 1897
Strenuous intellectual work and looking at God's Nature are the reconciling, fortifying, yet relentlessly strict angels that shall lead me through all of life's troubles... And yet what a peculiar way this is to weather the storms of life... In many a lucid moment I appear to myself as an ostrich who buries his head in the desert sand so as not to perceive the danger. One creates a small little world for oneself, and as lamentably insignificant as it may be in comparison with the perpetually changing size of real existence, one feels miraculously great and important...
Despite his love of physics, Einstein did not perform very impressively as an undergraduate in an academic setting, and this continued to be true in graduate school.
Hermann Minkowski referred to his one-time pupil as a "lazy dog". As the biographer
Clark wrote, "Einstein became, as far as the professorial staff of the ETH was concerned,
one of the awkward scholars who might or might not graduate but who in either case was
a great deal of trouble". Professor Pernet at one point suggested to Einstein that he switch
to medicine or law rather than physics, saying "You can do what you like, I only wish to
warn you in your own interest". Clearly Einstein "pushed along with his formal work just
as much as he had to, and found his real education elsewhere". Often he didn't even
attend the lectures, relying on Marcel Grossman's notes to cram for exams, making no
secret of the fact that he wasn't interested in what men like Weber had to teach him. His
main focus during the four years while enrolled at the ETH was independently studying
the works of Kirchhoff, Helmholtz, Hertz, Maxwell, Poincare, etc., flagrantly outside the
course of study prescribed by the ETH faculty. Some idea of where his studies were
leading him can be gathered from a letter to his fellow student and future wife Mileva
Maric written in August of 1899
I returned to the Helmholtz volume and am at present studying again in depth Hertz's propagation of electric force. The reason for it was that I didn't understand Helmholtz's treatise on the principle of least action in electrodynamics. I am more and more convinced that the electrodynamics of moving bodies, as presented today, is not correct, and that it should be possible to present it in a simpler way. The introduction of the term "ether" into the theories of electricity led to the notion of a medium of whose motion one can speak without being able, I believe, to associate a physical meaning with this statement. I think that the electric forces can be directly defined only for empty space...
Einstein later recalled that after graduating in 1900 the "coercion" of being forced to take
the final exams "had such a detrimental effect that... I found the consideration of any
scientific problem distasteful to me for an entire year". He achieved an overall mark of
4.91 out of 6, which is rather marginal. Academic positions were found for all members
of the graduating class in the physics department of the ETH with the exception of Einstein, who seems to have been written off as virtually unemployable, "a pariah,
discounted and little loved", as he later said.
From Milan in late August of 1900 Einstein wrote to his girlfriend, Mileva, and
mentioned that
I am spending many evenings here at Michele's. I like him very much because of his sharp mind and his simplicity, and also Anna and, especially, the little brat. His house is simple and cozy, even though the details show some lack of taste...
In another letter to Mileva, in October, he commented that his friend had intuited the
blossoming romance between Einstein and Mileva (who had studied physics together at
the Polytechnic)
Michele has already noticed that I like you, because, even though I didn't tell him almost anything about you, he said, when I told him that I must now go to Zurich again: "He surely wants to go to his [woman] colleague, what else would draw him to Zurich?" I replied "But unfortunately she is not there yet." I prodded him very much to become a professor, but I doubt very much that he'll do it. He simply doesn't want to let himself and his family be supported by his father. This is after all quite natural. What a waste of his truly outstanding intelligence.
The following April, in another love letter to Mileva, Einstein wrote about having just read Planck's paper on radiation with mixed feelings, because "misgivings of a fundamental nature have arisen in my mind". In the same letter he wrote
Michele arrived with wife and child from Trieste the day before yesterday. He is
an awful weakling without a spark of healthy humaneness, who cannot rouse
himself to any action in life or scientific creation, but an extraordinarily fine
mind, whose working, though disorderly, I watch with great delight. Yesterday
evening I talked shop with him with great interest for almost 4 hours. We talked
about the fundamental separation of luminiferous ether and matter, the definition
of absolute rest, molecular forces, surface phenomena, dissociation. He is very
interested in our investigations, even though he often misses the overall picture
because of petty considerations. This is inherent in the petty disposition of his
being, which constantly torments him with all kinds of nervous notions.
Toward the end of 1901 Einstein had still found no permanent position. As he wrote to
Grossman in December of that year, "I am sure I would have found a position [by now]
were it not for Weber's intrigues against me". It was only because Grossman's father
happened to be good friends with Haller, the chief of the Swiss Patent Office, that
Einstein was finally given a job, despite the fact that Haller judged him to be "lacking in
technical training". Einstein wrote gratefully to the Grossmans that he "was deeply
moved by your devotion and compassion which do not let you forget an old, unlucky
friend", and that he would spare no effort to live up to their recommendation. He had
applied for Technical Expert 2nd class, but was given the rank of 3rd class (in June 1902).
As soon as he'd been away from the coercive environment of academia long enough that
he could stand once again to think about science, he resumed his self-directed studies,
which he pursued during whatever free time a slightly lazy patent examiner can make for
himself. His circumstances were fairly unusual for someone working on a doctorate,
especially since he'd already been rejected for academic positions by both the ETH and
the University of Zurich. He was undeniably regarded by the academic community (and
others) as "an awkward, slightly lazy, and certainly intractable young man who thought
he knew more than his elders and betters".
In early 1905, while employed as a patent examiner in Bern, Einstein was striving to
complete his doctoral thesis, focusing on black-body radiation, and at the same time
writing a paper on light-quanta (later cited by the Nobel committee) and another on
Brownian motion. Each of these was a tremendous contribution to 20th century physics,
but one has the impression that Einstein was, in a sense, getting these duties out of the
way, so that he could concentrate on the "philosophical nonsense" of the velocity addition
problem, which he realized was "a puzzle not easy to solve at all". In other words, he
realized that he couldn't count on being able to produce anything useful on this question,
even though his attention was inexorably drawn to it. One imagines that he forced
himself to complete the papers on statistical physics - in which he knew he had
something to say - before allowing himself the luxury of focusing on the fascinating but
possibly insoluble philosophical problem of motion.
After completing the statistical papers on March 17, April 30, and May 10, 1905, he
allowed himself to concentrate fully on the problem of motion, which apparently had
never been far from his mind. As he later recalled, he "felt a great difficulty to resolve the
question... I had wasted time almost a year in fruitless considerations..." Then came the
great turning point, both for Einstein's own personal life and for modern physics:
"Unexpectedly, a friend of mine in Bern then helped me." The friend was Michelangelo
Besso, who had by then also taken a job at the Swiss patent office. In his Kyoto lecture of
1922 Einstein later remembered the circumstances of the unexpected help he received
from Besso:
That was a very beautiful day when I visited him and began to talk with him as
follows: "I have recently had a question which was difficult for me to
understand. So I came here today to bring with me a battle on the question."
Trying a lot of discussions with him, I could suddenly comprehend the matter.
Next day I visited him again and said to him without greeting "Thank you. I've
completely solved the problem."
It had suddenly become clear to Einstein during his discussion with Besso that the
correlation of time at different spatial locations is not absolutely defined, since it depends
fundamentally on some form of communication between those locations. Thus, the
concept of simultaneity at separate locations is relative. A mere five weeks after this
recognition, Einstein completed "On the Electrodynamics of Moving Bodies", in which he presented the special theory of relativity. This monumental paper contains not a single
reference to the literature, and only one acknowledgement:
In conclusion, I wish to say that in working at the problem here dealt with I have
had the loyal assistance of my friend and colleague M. Besso, and that I am
indebted to him for several valuable suggestions.
We don't know precisely what those suggestions were, but we have Einstein's later
statement that he "could not have found a better sounding board for his ideas in all of
Europe." It was Besso also introduced Einstein to the writings of Ernst Mach, which were
to have such a profound influence on the development of the general theory (although
subsequently Einstein emphasized the influence of Hume over Mach). Besso selfdeprecatingly described their intellectual relationship by saying "Einstein the eagle took
Besso the sparrow under his wing, and the sparrow flew a little higher". The two men
carried on a regular correspondence that lasted over half a century, through two world
wars, and Einstein's incredible rise to world fame. It's interesting that, despite how highly Einstein valued Besso's intellect, the latter invariably took a self-denigrating tone in their
correspondence (and presumably in their conversations), sometimes even seeming to be
genuinely puzzled by the significance that Einstein attached to his little comments. In a
letter of August 1918 Besso wrote
You had, by the way, overestimated the meaningfulness of my observations again:
I was not aware that they had the meaning that an energy tensor for gravitation
was dispensable. If I understand it correctly, my inadvertent statement now
implies that planetary motion would satisfy conservation laws just by chance, as it
were. What is certain is that I was not aware of this consequence of my comments
and cannot grasp the argument even now.
The friendship with Besso may have been, in some ways, the most meaningful of
Einstein's life. Michele and his wife sometimes took care of Einstein's children, tried to
reconcile Einstein with Mileva when their marriage was foundering, and so on. Another
of the few close personal ties that Einstein was able to maintain over the years was with
Max von Laue, who Einstein believed was the only one of the Berlin physicists who
behaved decently during the Nazi era. Following the war, a friend of Einstein's was
preparing to visit Germany and asked if Einstein would like him to convey any messages
to his old friends and colleagues. After a moment of thought, Einstein said "Greet Laue
for me". The friend, trying to be helpful, then asked specifically about several other
individuals among Einstein's former associates in his homeland. Einstein thought for
another moment, and said "Greet Laue for me".
The stubborn, aloof, and uncooperative aspect of Einstein's personality that he had shown
as a student continued to some extent throughout his life. For example, in 1937 he
collaborated with Nathan Rosen on a paper purporting to show, contrary to his own
prediction of 1916, that gravitational waves cannot exist - at least not without unphysical
singularities. He submitted this paper to Physical Review, and it was returned to him with
a lengthy and somewhat critical referee report asking for clarifications. Apparently Einstein was unfamiliar with the refereeing of papers, routinely practiced by American
academic journals. He wrote back to the editor
Dear Sir,
We (Mr. Rosen and I) had sent you our manuscript for publication and had not
authorized you to show it to specialists before it is printed. I see no reason to
address the - in any case erroneous - comments of your anonymous expert. On the
basis of this incident I prefer to publish the paper elsewhere.
respectfully,
P.S. Mr. Rosen, who has left for the Soviet Union, has authorized me to represent
him in this matter.
Was the postscript about Mr. Rosen's departure to the Soviet Union (in the politically
charged atmosphere of the late 1930's) an oblique jibe at American mores, or just a bland
informational statement? In any case, Einstein submitted the paper, unaltered, to another
journal (The Journal of the Franklin Institute). However, before it appeared he came to
realize that its argument was faulty, causing him to re-write the paper and its conclusions.
Interestingly, what Einstein had realized is precisely what the anonymous referee had
pointed out, namely, that by a change of coordinates the construction given by Einstein
and Rosen was simply a description of cylindrical waves, with a singularity only along
the axis (thus considered to be an acceptable singularity). The referee report still exists
among Einstein's private papers, although it isn't clear if the correction was prompted by
the Physical Review's referee report. (The correction may also have been prompted by
private comments from Howard Percy Robertson (via Infeld) who had just returned to
Princeton from sabbatical. On the other hand, these two possibilities may amount to the
same thing, since Kennefick speculates that Robertson was the anonymous referee!)
Another aspect of Einstein's personality that seems incongruous with scholarly success
was his remarkable willingness to make mistakes in public and change his mind about
things, with seemingly no concern for the effect this might have on his academic
credibility. Regarding the long succession of "unified field theories" that Einstein
produced in the 1920's and 30's, Pauli commented wryly "It is psychologically interesting
that for some time the current theory is usually considered by its author to be the
'definitive solution'". Eventually Einstein gave up on the particular approach to
unification that he had been pursuing in those theories, and cheerfully wrote to Pauli
"You were right after all, you rascal". Lest we think that this willingness to make and
admit mistakes was a characteristic only of the aged Einstein, past his prime, recall
Einstein's wry self-description in a letter to Ehrenfest in December 1915: "That fellow
Einstein suits his convenience. Every year he retracts what he wrote the year before."
In 1939 Einstein's sister, Maja Winteler, was forced by Mussolini's racial policies to leave
Florence. She went to Princeton to join her brother while Paul moved in with his sister
Anna and Michel Besso's family in Geneva. Maja and Paul never saw each other again.
In 1946, after the war, they began making plans to reunite in Geneva, but Maja suffered a
stroke, and thereafter remained bedridden until her death in 1951. To Besso in 1954,
nearly 50 years after their discussion in the patent office, Einstein wrote:

I consider it quite possible that physics cannot be based on the field principle, i.e.,
on continuous structures. In that case, nothing remains of my entire castle in the
air, gravitation theory included...
In March of the following year, Michelangelo Besso died at his home in Geneva. Einstein
wrote to the Besso family "Now he has gone a little ahead of me in departing from this
curious world". Einstein died three weeks later, on April 18, 1955.
3.9 Constructing the Principles
In mechanics as reformed in accordance with the world-postulate, the disturbing lack of
harmony between Newtonian mechanics and modern electrodynamics disappears of its
own accord.
H. Minkowski, 1907

The general public took little notice of the special theory of relativity when it first
appeared in 1905, but following the sensational reports of the eclipse observations of 1919
Einstein instantly became a world-wide celebrity, and there was suddenly intense public
interest in everything having to do with Einstein's theory. The London Times asked
him to explain his mysterious theory to its readers. He accommodated with a short essay
that is notable for its description of what he regarded as two fundamentally different
kinds of physical theories. He wrote:
We can distinguish various kinds of theories in physics. Most of them are constructive. They
attempt to build up a picture of the more complex phenomena out of the materials of a relatively
simple formal scheme from which they start out. Thus the kinetic theory of gases seeks to reduce
mechanical, thermal, and diffusional processes to movements of molecules -- i.e., to build them up
out of the hypothesis of molecular motion. When we say that we have succeeded in understanding
a group of natural processes, we invariably mean that a constructive theory has been found which
covers the processes in question.
Along with this most important class of theories there exists a second, which I will call "principle-theories." These employ the analytic, not the synthetic, method. The elements which form their
basis and starting-point are not hypothetically constructed but empirically discovered ones,
general characteristics of natural processes, principles that give rise to mathematically formulated
criteria which the separate processes or the theoretical representations of them have to satisfy.
Thus the science of thermodynamics seeks by analytical means to deduce necessary conditions,
which separate events have to satisfy, from the universally experienced fact that perpetual motion
is impossible.
The advantages of the constructive theory are completeness, adaptability, and clearness, those of
the principle theory are logical perfection and security of the foundations. The theory of relativity
belongs to the latter class.

Einstein was not the first to discuss such a distinction between physical theories. In an
essay on the history of physics included in the book The Value of Science published in
1904, Poincare had described how, following Newton's success with celestial mechanics, the concept of central forces acting between material particles was used almost exclusively as the basis for constructing physical theories (the exception being Fourier's theory of
heat). Poincare expressed an appreciation for this constructive approach to physics.
This conception was not without grandeur; it was seductive, and many among us have not finally
renounced it; they know that one will attain the ultimate elements of things only by patiently
disentangling the complicated skein that our senses give us; that it is necessary to advance step by
step, neglecting no intermediary; that our fathers were wrong in wishing to skip stations; but they
believe that when one shall have arrived at these ultimate elements, there again will be found the
majestic simplicity of celestial mechanics.

Poincare then proceeded to a section called "The Physics of Principles", where he wrote:
Nevertheless, a day arrived when the conception of central forces no longer appeared sufficient... What was done then? The attempt to penetrate into the detail of the structure of the universe, to isolate the pieces of this vast mechanism, to analyse one by one the forces which put them in motion, was abandoned, and we were content to take as guides certain general principles, the express object of which is to spare us this minute study... The principle of the conservation of energy is certainly the most important, but it is not the only one; there are others from which we can derive the same advantage. These are: Carnot's principle, or the principle of the degradation of energy. Newton's principle, or the principle of the equality of action and reaction. The principle of relativity, according to which the laws of physical phenomena must be the same for a stationary observer as for an observer carried along in a uniform motion of translation... The principle of the conservation of mass... The principle of least action. The application of these five or six general principles to the different physical phenomena is sufficient for our learning of them all that we could reasonably hope to know of them... These principles are results of experiments boldly generalized; but they seem to derive from their very generality a high degree of certainty. In fact, the more general they are, the more frequent are the opportunities to check them, and the verifications multiplying, taking the most varied, the most unexpected forms, end by no longer leaving place for doubt... Thus they came to be regarded as experimental truths; the conception of central forces became then a useless support, or rather an embarrassment, since it made the principles partake of its hypothetical character.

Einstein is known to have been an avid reader of Poincare's writings, so it seems likely
that he adopted the theoretical classification scheme from this essay.
Returning to the previous excerpt from Einstein's article, notice that he actually mentions
three sets of alternative characteristics, all treated as representing essentially the same
dichotomy. We're told that constructive theories proceed synthetically on the basis of
hypothetical premises, whereas principle theories proceed analytically on the basis of
empirical premises. Einstein cites statistical thermodynamics as an example of a
constructive theory, and classical thermodynamics as an example of a principle theory.
His view of these two different approaches to thermodynamics was undoubtedly
influenced by the debate concerning the reality of atoms, which Mach disdainfully called
the "atomistic doctrine". The idea that matter is composed of finite irreducible entities
was regarded as purely hypothetical, and the justification for this hypothesis was not
entirely clear. In fact, Einstein himself spent a great deal of time and effort trying to
establish the reality of atoms, e.g., this was his expressed motivation for his paper on
Brownian motion. Within this context, it's not surprising that he classified the premises of
statistical thermodynamics as purely hypothetical, and the development of the theory as
synthetic.

However, in another sense, it could be argued that the idea of atoms actually arises
empirically, and represents an extreme analytic approach to observed phenomena.
Literally the analytic method is to "take apart" the subject into smaller and smaller subcomponents, until arriving at the elementary constituents. We regard macroscopic objects
not as an indivisible wholes, but as composed of sub-parts, each of which is composed of
still smaller parts, and we continue this process of analysis at least until we can no longer
directly resolve the sub-parts (empirically) into smaller entities. At this point we may
resort to some indirect methods of inference in order to carry on the process of empirical
analysis. Indeed, Einstein's work on Brownian motion did exactly this, in so far as he was
attempting to analyze the smallest directly observable entities, and to infer, based on
empirical observations, an even finer level of structure. It was apparently Einstein's view
that, at this stage, a reversal of methodology is required, because direct observation no
longer provides unique answers, and thus the inferences are necessarily indirect, i.e., they
can only be based on a somewhat free hypothesis about the underlying structure, and then
synthetically working out the observable implications of this hypothesis and comparing
these with what we actually observe.
So Einstein's conception of a constructive (hypothetically based, synthetic) physical
theory was of a theory arrived at by hypothesizing or postulating some underlying
structure (consistent with all observations, of course), and then working out the logical
consequences of those postulates to see how well they account for the whole range of
observable phenomena. At this point we might expect Einstein to classify special
relativity as a constructive theory, because it's well known that the whole theory of
special relativity - with all its observable consequences - can be constructed synthetically
based on the exceedingly elementary hypothesis that the underlying structure of space
and time is Minkowskian. However, Einstein's whole point in drawing the distinction
between constructive and principle theories was to argue that relativity is not a
constructive theory, but is instead a theory of principle.
It's clear that Einstein's original conception of special relativity was based on the model
of classical thermodynamics, even to the extent that he proposed exactly two principles
on which to base the theory, consciously imitating the first and second laws of
thermodynamics. Some indication of the ambiguity in the classification scheme can be
seen in the various terms that Einstein applied to these two propositions. He variously
referred to them as postulates, principles, stipulations, assumptions, hypotheses,
definitions, etc. Now, recalling that a "constructive theory" is based on hypotheses,
whereas a "principle theory" is based on principles, we can see that the distinction
between principles and postulates (hypotheses) is significant for correctly classifying a
theory, and yet Einstein was not very careful (at least originally) to clarify the actual role
of his two foundational propositions.
Nevertheless, he consistently viewed special relativity as a theory of principle, with the
invariance of light speed playing a role analogous to the conservation of energy in
classical thermodynamics, both regarded as high-level empirical propositions rather than
low-level elementary hypotheses. Indeed, it's possible to make this more than just an
analogy, because in place of the invariance of light speed (with respect to all inertial coordinate systems) we could just as well posit conservation of total mass-energy (with the conversion E = mc²), and use this conservation, together with the original principle of
relativity (essentially carried over from Newtonian physics), as the basis for special
relativity. As late as 1949 in his autobiographical notes (which he jokingly called his
"obituary"), Einstein wrote
Gradually I despaired of the possibility of discovering the true laws by means of constructive
efforts based on known facts. The longer and more desperately I tried, the more I came to the
conviction that only the discovery of a universal formal principle could lead us to assured results.
The example I saw before me was thermodynamics. The general principle was there given in the
theorem: the laws of nature are such that it is impossible to construct a perpetuum mobile (of the
first or second kind)... The universal principle of the special theory of relativity is contained in the
postulate: The laws of physics are invariant with respect to Lorentz transformations (for the
transition from one inertial system to any other arbitrarily chosen inertial system). This is a
restricting principle for natural laws, comparable to the restricting principle of the nonexistence of
the perpetuum mobile that underlies thermodynamics.

Here Einstein refers to "constructive theories based on known facts", whereas in the 1919
article he indicated that constructive theories are based on "a relatively simple formal
scheme" such as the hypothesis of molecular motion (i.e., the atomistic doctrine that
Mach (for one) rejected as unempirical), and principle theories are based on empirical
facts. In other words, the distinguishing characteristics that Einstein attributed to the two
kinds of theories have been reversed. This illustrates one of the problematic aspects of
Einstein's classification scheme: every theory is ultimately based on some unprovable
premises, and at the same time every (nominally viable) theory is based on what might be
called known facts, i.e., it is connected to empirical results. Einstein was certainly well
aware of this, as shown by the following comment (1949) in a defense of his
methodological approach:
A basic conceptual distinction, which is a necessary prerequisite of scientific and pre-scientific
thinking, is the distinction between "sense-impressions" (and the recollection of such) on the one
hand and mere ideas on the other. There is no such thing as a conceptual definition of this
distinction (aside from circular definitions, i.e., of such as make a hidden use of the object to be
defined). Nor can it be maintained that at the base of this distinction there is a type of evidence,
such as underlies, for example, the distinction between red and blue. Yet, one needs this distinction
in order to be able to overcome solipsism.

In view of this, what ultimately is the distinction between what Einstein called
constructive theories and principle theories? It seems that the distinction can only be
based on the conceptual level of the hypotheses, so that constructive theories are based on
"low level" hypotheses, and principle theories based on "high level" hypotheses. In this
respect the original examples (classical thermodynamics and statistical thermodynamics)
cited by Einstein are probably the clearest, because they represent two distinct
approaches to essentially the same subject matter. In a sense, they can be regarded as just
two different interpretations of a single theory (much as special relativity and Lorentz's
ether theory can be seen as two different interpretations of the same theory). Now,
statistical thermodynamics was founded on hypotheses - such as the existence of atoms - that may be considered "low level", whereas the hypothesis of energy conservation in
classical thermodynamics can plausibly be described as "high level". On the other hand, the premises of statistical thermodynamics include the idea that the molecules obey
certain postulated equations of motions (e.g., Newton's laws) which are essentially just
expressions of conservation principles, so the "constructive" approach differs from the
"theory of principle" only in so far as its principles are applied to very low-level entities.
The conservation principles are explicitly assumed only for elementary molecules in
statistical thermodynamics, and then they are inferred for high-level aggregates like a
volume of gas. In contrast, the principle theory simply observes the conservation of
energy at the level of gases, and adopts it as a postulate.
In the case of special relativity, it's clear that Einstein originally developed the theory
from a "high-level" standpoint, based on the observation that light propagates at the same
speed with respect to every system of inertial coordinates. He himself felt that a
constructive model or interpretation for this fact was lacking. In January of 1908 he wrote
to Sommerfeld
A physical theory can be satisfactory only if its structures are composed of elementary
foundations. The theory of relativity is ultimately just as unsatisfactory as, for example, classical
thermodynamics was before Boltzmann interpreted entropy as probability.

However, just eight months later, Minkowski delivered his famous lecture at Cologne, in
which he showed how the theory of special relativity follows naturally from just a simple
fundamental hypothesis about the metric of space and time. There can hardly be a lower
conceptual level than this, i.e., some assumption about the metric(s) of space and time is
seemingly a pre-requisite for any description - scientific or otherwise - of the phenomena
of our experience. Kant even went further, and suggested that one particular metrical
structure (Euclidean) was a sine qua non of rational thought. We no longer subscribe to
such a restrictive view, and it may even be possible to imagine physical ideas prior to any
spatio-temporal conceptions, but nevertheless the fact remains that such conceptions are
among the most primitive that we possess. For example, the posited structure of space
and time is more primitive than the notion of atoms moving in a void, because we cannot
even conceive of "moving in a void" without some idea of the structure of space and
time. Hence, if a complete physical theory can be based entirely on nothing other than the
hypothesis of one simple form for the metric of space and time, such a theory must surely
qualify as "constructive". Minkowski's spacetime interpretation does for special relativity what Boltzmann's statistical interpretation did for thermodynamics, namely, it provides an elementary constructive foundation for the theory.
Einstein's reaction to Minkowski's work was interesting. It's well known that Einstein
was not immediately very appreciative of his former instructor's contribution, describing
it as "superfluous learnedness", and joking that "since the mathematicians have attacked
the relativity theory, I myself no longer understand it any more". He seems to have been
at least partly serious when he later said "The people in Göttingen [where both
Minkowski and Hilbert resided] sometimes strike me not as if they wanted to help one
formulate something clearly, but as if they wanted only to show us physicists how much
brighter they are than we". Of course, Einstein's appreciation subsequently increased
when he found it necessary to use Minkowski's conceptual framework in order to develop
general relativity. Still, even in his autobiographical notes, Einstein seemed to downplay the profound transformation of special relativity that Minkowski's insight represents.
Minkowski's important contribution to the theory lies in the following: Before Minkowski's
investigation it was necessary to carry out a Lorentz transformation on a law in order to test its
invariance under Lorentz transformations; but he succeeded in introducing a formalism so that the
mathematical form of the law itself guarantees its invariance under Lorentz transformations.

In other words, Minkowski's contribution was merely the introduction of a convenient mathematical formalism. Einstein then added, almost as an afterthought,
He [Minkowski] also showed that the Lorentz transformation (apart from a different algebraic sign
due to the special character of time) is nothing but a rotation of the coordinate system in the four-dimensional space.

This is a rather slight comment when we consider that, from the standpoint of Einstein's
own criteria, Minkowski's insight that Lorentz invariance is purely an expression of the
(pseudo) metric of a combined four-dimensional space-time manifold at one stroke
renders special relativity into a constructive theory, the thing for which Einstein had
sought so "desperately" for so long. As he wrote in the London Times article quoted above, "when
we say that we have succeeded in understanding a group of natural processes, we
invariably mean that a constructive theory has been found which covers the processes in
question", but he himself had given up on the search for such a theory in 1905, and had
concluded that, for the time being, the only possibility of progress was by means of a
theory of principle, analogous to classical thermodynamics. Actual understanding of the
phenomena would have to wait for a constructive theory. As it happened, this
constructive theory was provided just three years later by his former mathematics
instructor in Göttingen.
From this point of view, it seems fair to say that the modern theory of special relativity
has had three distinct forms. First was Lorentz's (and Poincare's) ether theory (1892-1904) which, although conceived as a constructive theory, actually derived its essential content from a set of high-level principles and assumptions, as discussed in Section 3.6.
Second was Einstein's explicit theory of principle (1905), in which he identified and
isolated the crucial premises underlying Lorentz's theory, and showed how they could be
consistently interpreted as primitive aspects of space and time. Third was Minkowski's
explicitly constructive spacetime theory (1908). Each stage represented a significant
advance in clarity, with Einstein's intermediate theory of principle and its interpretation
serving as the crucial bridge between the two very different constructive frameworks of
Lorentz and Minkowski.
4.1 Immovable Spacetime
My argument for the notion of space being really independent of body is
founded on the possibility of the material universe being finite and moveable. 'Tis not enough for this learned writer [Leibniz] to reply that he
thinks it would not have been wise and reasonable for God to have made
the material universe finite and moveable... Neither is it sufficient barely to repeat his assertion that the motion of a finite material universe would
be nothing, and (for want of other bodies to compare it with) would
produce no discoverable change, unless he could disprove the instance
which I gave of a very great change that would happen, viz., that the parts
would be sensibly shocked by a sudden acceleration or stopping of the
motion of the whole: to which instance, he has not attempted to give any
answer.
Samuel Clarke, 1716
Although the words "relativity" and "relational" share a common root, their meanings are
quite different. The principle of relativity asserts that for any material particle in any state
of motion there exists a system of space and time coordinates in terms of which the
particle is instantaneously at rest and inertia is homogeneous and isotropic. Thus the
natural (inertial) decomposition of spacetime intervals into temporal and spatial
components can be defined only relative to some particular frame of reference. Of course,
the absolute spacetime intervals themselves are invariant, so the "relativity" refers only to
the analytical decomposition of these intervals. (The physical significance of this
particular decomposition is that the quantum phase of any object evolves in proportion to
its "natural" temporal coordinate.) In contrast, the principle of relationism asserts that the
absolute intervals between material objects fully characterize their extrinsic positional
status, without reference to any underlying non-material system of reference which might
be called "absolute space".
The traditional debate between proponents of relational and absolute motion (such as
Leibniz and Clarke, respectively) is of questionable relevance if continuous fields are
accepted as extended physical entities, permeating all of space, because this implies there
are no unoccupied locations. In this context every point in the entire spacetime manifold
is a vertex of actual relations between physical entities, obscuring the distinction between
absolute and relational premises. Moreover, in the context of the general theory of
relativity, the metrical properties of spacetime itself constitute a field, i.e., an extended
physical entity, which not only acts upon material objects but is also acted upon by them,
so the absolute-relational distinction has no clear meaning. However, it remains possible
to regard fields as only representations of effects, and to insist on materiality for
ontological objects, in which case the absolute-relational question remains both relevant
and unresolved.
Physicists have always recognized the appeal of a purely relational theory of motion, but
every such theory has foundered on the same problem, namely, the physicality of
acceleration. For example, one of Newton's greatest challenges was to account for the
fact that the Moon is relationally stationary with respect to the Earth (i.e., the distance
between Earth and Moon is roughly unchanging), whereas it ought to be accelerating
toward the Earth due to the influence of gravity. What is holding the Moon up? Or, to put
the question differently, why is the Moon not accelerating directly toward the Earth in
accord with the gravitational force that is presumably being applied to it? Newton's
brilliant answer was that the Moon is indeed accelerating directly toward the Earth, and

with precisely the magnitude of acceleration predicted by his gravity formula, but that the Moon is also moving perpendicularly to the Earth-Moon axis, with a velocity v = ωR, where R is the Earth-Moon distance and ω is the Moon's angular velocity, i.e., roughly 2π radians per month. If it were not accelerating toward the Earth, the Moon would just wander off tangentially away from the Earth, but the force of gravity is modifying its velocity, adding GM/R² toward the Earth each second, which causes the Moon to turn continually in a roughly circular orbit around the Earth. The centripetal acceleration of an object revolving in a circle is v²/R = ω²R, and so (Newton reasoned) this must equal the gravitational acceleration GM/R². Thus we have ω²R³ = GM, which of course is Kepler's third law. This explanation depends on a strictly non-relational concept of motion. In fact, it
might be said that this was the crucial insight of Newtonian dynamics - and it applies no
less in the special theory of relativity. For the purposes of dynamical analysis, motion
must be referred to an absolute background class of rectilinear inertial coordinate
systems, rather than simply to the relations between material bodies, or even classical
fields. Thus we cannot infer everything important about an object's state of motion
simply from its distances to other objects (at least not to nearby objects). In this sense,
both Newtonian and relativistic physics find it necessary to invoke absolute space.
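As a concrete check on the relation ω²R³ = GM, here is a minimal Python sketch for the Earth-Moon system; the numerical constants below are approximate published values assumed for this example, not figures from the text:

    import math

    GM_earth = 3.986e14        # m^3/s^2, approximate
    R = 3.844e8                # mean Earth-Moon distance in meters, approximate
    T = 27.32 * 86400          # sidereal month in seconds, approximate
    w = 2 * math.pi / T        # Moon's angular velocity in rad/s

    # ratio of w^2 R^3 to GM; exact equality would give 1.0
    print(w**2 * R**3 / GM_earth)

The ratio comes out near 1.01; the small excess is to be expected, since the two-body form of the law actually involves G(M+m) and the lunar orbit is not exactly circular.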
But this concept of absolute space presents us with an ontological puzzle, because we can
empirically verify the physical equivalence of all uniform states of motion, which
suggests that position and velocity have no absolute physical significance, and yet we can
also verify that changes in velocity (i.e., accelerations) do have absolute significance,
independent of the relations between material bodies (at least in a local sense). If the
evident relativity of position and velocity leads us to discard the idea of absolute space,
then how are we to understand the apparent absoluteness of acceleration? Some have
argued that in order for the change in something to be ontologically real, it is necessary
for the thing itself to be real, but of course that's not the case. It's perfectly possible for
"the thing itself" to be an artificial conception, whereas the "change" is the ontological
entity. For example, the Newtonian concept of the physical world is a set of particles,
between which relations exist. The primary ontological entities are the particles, but it's
equally possible to imagine that the separations are the "real" entities, and particles are
merely abstract entities, i.e., a convenient bookkeeping device for organizing the facts of
a set of separations. This raises some interesting questions, such as whether an unordered
multiset of n(n−1)/2 separations suffices to uniquely determine a configuration of n points
in a space of fixed dimension. It isn't difficult to find examples of multisets of separations
that allow for multiple distinct spatial arrangements. For example, given the multiset of
ten separations

we can construct either of the two five-point configurations shown below

For another example, the following three distinct configurations of eight co-planar points
each have the same multiset of 28 point-to-point separations:

In fact, of the 12870 possible arrangements of eight points on a 4x4 grid, there are only
1120 distinct multisets of separations. Much of this reduction is due to rotations and
reflections, but not all. Intrinsically distinct configurations of points with the same
multiset of distances are not uncommon. They are sometimes called isospectral sets,
referring to the spectrum of point-to-point distances. Examples such as these may suggest
that unordered separations cannot be the basis of our experience, although we can't rule
out, a priori, the possibility that our interpretation of experience is non-unique, and that
different states of consciousness might perceive a given physical configuration
differently. Even if we reject the possibility of non-unique mapping to our conventional
domain of objects, we could still imagine a separation-based ontology by stipulating an
ordering for those separations. (One hypothetical form which laws of separation might
take is discussed in Section 4.2.) By recognizing the need to specify this ordering, our
focus shifts back to a particle-based ontology.
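These counts are easy to check by brute force. The following minimal Python sketch enumerates all C(16,8) = 12870 placements of eight points on a 4x4 grid and counts the distinct multisets of the 28 squared point-to-point separations (squared distances are used so the comparisons are exact):

    from itertools import combinations
    from math import comb

    points = [(x, y) for x in range(4) for y in range(4)]

    def spectrum(config):
        # sorted tuple of the 28 squared pairwise separations
        return tuple(sorted((a[0] - b[0])**2 + (a[1] - b[1])**2
                            for a, b in combinations(config, 2)))

    spectra = {spectrum(c) for c in combinations(points, 8)}
    print(comb(16, 8), len(spectra))   # 12870 arrangements, 1120 distinct spectra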
As noted previously, according to both Galilean and Einsteinian (special) relativity,
position and velocity are relative but acceleration is not. However, it can be argued that
the absoluteness of acceleration is incongruous with Galilean spacetime, because if
spacetime was Galilean there would be no reason for acceleration to be absolute. This
was already alluded to in the discussion of Section 1.8, where the cyclic symmetry of the
velocity relations between three Galilean reference systems was noted. In a sense, the
relationist Leibniz was correct in asserting that absolute space and time are inconsistent
with Galilean relativity, citing the principle of sufficient reason in support of this claim.
If time and space are separate and distinct (which no one had ever disputed) then there
would be no observable distinction between accelerated and un-accelerated systems of
reference, as revealed by the fact that the concept of a moveable rigid body of arbitrary
size is perfectly consistent with the kinematics of Galilean relativity. Samuel Clarke had
argued that if all the material in some finite universe was accelerated in tandem,
maintaining all the intrinsic relations between the particles, this acceleration would still
be physically real, even though no one could observe the acceleration (for lack of
anything to compare with it). Leibniz replied
Motion does not indeed depend upon being observed, but it does depend upon
being possible to be observed. There is no motion when there is no change that
can be observed. And when there is no change that can be observed, there is no
change at all. The contrary opinion is grounded upon the supposition of a real
absolute space, which I have demonstratively confuted by the principle of the
want of a sufficient reason of things.

It is quite right that, in the context of Galilean relativity, the acceleration of all the matter
of the universe in tandem would be strictly unobservable, so Leibniz has a valid point.
However, barring some Machian long-range influence which neither Clarke nor Leibniz
seems to have imagined, the same argument implies that inertia should not exist at all.
Thus Clarke was correct in pointing out that the very existence of inertia refutes Leibniz's position. There is indeed an observable distinction between uniform and accelerated motion, i.e., inertia does exist. In summary, Leibniz was correct in (effectively) claiming that the
existence of inertia is logically incompatible with the Galilean concept of space and time,
whereas Clarke was correct in pointing out that inertia does actually exist. The only way
out of this impasse would have been to discard the one premise that neither of them ever
questioned, namely, the Galilean concept of space and time. It was to be another 200
years before a viable alternative to Galilean spacetime was recognized.
As explained in Section 1, the spacetime structures of Galileo and Minkowski are
formally identical if the characteristic constant c of the latter is infinite. In that case it
follows that arbitrarily large rigid bodies are possible, so it is conceivable for all the
material in an arbitrarily large region to accelerate in tandem, maintaining all the same
intrinsic spatial relations. However, if c has some finite value, this is no longer the case.
Section 2.9 described the kinematic limitation on the size of a spatial region in which
objects can be accelerated in tandem. Hence the structure of Minkowski spacetime
intrinsically distinguishes uniform motion as the only kind of motion that could be
applied in tandem to all objects throughout space. In this context, Leibniz's principle of
sufficient reason can be used to argue that different states of uniform motion should not
be regarded as physically different, but it cannot be applied to accelerated motion,
because the very kinematics of Minkowski spacetime do not permit the tandem
acceleration of objects over arbitrarily large regions. It seems justifiable to say that the
existence of inertia implies the Minkowski character of spacetime.
This goes some way towards resolving the epistemological problems that have often been
raised against the principle of inertia. To the question "How are we to distinguish the inertial coordinate systems from all possible systems of reference?", we can answer that the inertial coordinate systems are precisely those in terms of which two objects separated by an arbitrary distance can be accelerated in tandem. This doesn't help to
identify inertial coordinate systems in Galilean spacetime, but it fully identifies them in
the context of Minkowski spacetime. So, it can be argued that (from an epistemological
standpoint) Minkowski spacetime is the only satisfactory framework for the principle of
inertia.
Still, there remain some legitimate open issues regarding any (so far) conceived
relativistic spacetime. According to both classical and special relativity, the inertial
coordinate systems are fully symmetrical, and each one is regarded as physically
equivalent (in the absence of matter). In particular, we cannot single out one particular
inertial system and claim that it is the "central" frame, because the equivalence class has
no center, and all ontological qualities are uniformly distributed over the entire class.
Unfortunately, from a purely formal standpoint, a purported uniform distribution over
inertial frames is somewhat problematic, because the inertial systems of reference along a

single line can only be linearly parameterized in terms of a variable that ranges from -∞ to +∞, such as q = log((1+v)/(1-v)), but if each value of q is to be regarded as equally probable, then we are required to imagine a perfectly uniform density distribution over the real numbers. Mathematically, no such distribution exists: there is no consistent way to select a number randomly from a uniform distribution over all the real numbers. This is
the source of many well-known mathematical conundrums, such as the "High-Low
Number" strategy game, whose answer depends on the fact that no perfectly uniform
distribution exists over the real numbers (nor even over the integers). In trying to
understand whether there was any arbitrary choice in the creation of the physical world,
it's interesting to note that the selection of our particular rest frame cannot have been
perfectly arbitrary from a set of pre-existing alternatives. It might be argued that the
impossibility of a choice between indistinguishable inertial reference frames implies that
only an absolutist framework is intelligible. However, the identity of indiscernibles led Leibniz and Mach to argue just the opposite, i.e., that the only intelligible way to imagine
the existence of objects, all in roughly the same frame of reference within a perfectly
symmetrical class of possible reference systems, is to imagine that the objects themselves
are in some way responsible for the class, which brings us back to pure relationism.
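Incidentally, the reason the parameter q = log((1+v)/(1-v)) gives a linear parameterization of the collinear inertial frames is that q is additive under relativistic velocity composition (in units with c = 1). A minimal Python check:

    import math

    def q(v):
        return math.log((1 + v) / (1 - v))

    def compose(u, v):
        # relativistic composition of collinear velocities (c = 1)
        return (u + v) / (1 + u * v)

    u, v = 0.6, 0.7
    print(q(compose(u, v)))   # 3.12089...
    print(q(u) + q(v))        # 3.12089..., the same value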
Alas, as we've seen, pure relationism has its own problematic implications. For one, there
has traditionally been a close association between relationism and the concept of absolute
simultaneity. This is because the relations were regarded as purely spatial, and it was
necessary to posit a unique instant of time in which to evaluate those spatial relations. To
implement a spatial relationist theory in the framework of Minkowski spacetime would
evidently require that whatever laws apply to the spatial relations for one particular
decomposition of spacetime must also apply to all other decompositions. (A simple
example of this is discussed in Section 4.2.) Alternatively, we might say that only
invariant quantities should be subject to the relational laws, but this amounts to the same
thing as requiring that the laws apply to all decompositions.
One common feature of all purely relational models based on Galilean space and time is
their evident non-locality, because (as noted above) there is no way, if we limit ourselves
to local observations, to identify the inertial motions of material objects purely from the
kinematical relations between them. We're forced to attribute the distinction between
inertial and non-inertial motion to some non-material (or non-local) interaction. This is
nicely illustrated by Einstein's thought experiment (based on Newton's famous "spinning
pail") involving two nominally identical fluid globes S1 and S2 floating in an empty
region of space. One of these globes is set rotating (about their common axis) while the
other remains stationary. The rotating globe assumes an oblate shape due to its rotation.

If the globes are mutually stationary and not rotating, they are both spherical and
symmetrical, and we cannot distinguish between them, but if one of the globes is
spinning about their common axis, the principle of inertia leads us to expect that the
spinning globe will bulge at the "equator" and shrink along its axis of rotation due to the
centripetal forces. The "paradox" (for the relationist) is that each globe is spinning with
respect to the other, so they must still be regarded as perfectly symmetrical, and yet their
shapes are no longer congruent. To what can we attribute the asymmetry?
If we look further afield we may notice that the deformed globe is rotating relative to all
the distant stars, whereas the spherical globe is not. A little experimentation shows that a
globe's deformation is strictly a function of its speed of rotation relative to the distant
stars, and presumably this is not a mere coincidence. Newton's explanation for this
coincidence was to argue that the local globes and the distant stars all reside in the same
absolute space, and it is this space that defines absolute (inertial) motion, and likewise the
special relativistic theory invokes an absolutely preferred class of reference frames.
Moreover, in the general theory of relativity, when viewed from a specific cosmological
perspective, there is always a preferred frame of reference, owing to the global boundary
conditions that must be imposed in order to single out a solution. This came as a shock to
Einstein himself at first, since he was originally thinking (hoping) that the field equations
of general relativity represented true relationism, but his conversion began when he
received Schwarzschild's exact solution for spherical symmetry, which of course exhibits
a preferred coordinate system such that the metric coefficients are independent of time,
i.e., the usual Schwarzschild coordinates, which are essentially unique for that particular
solution.
Likewise for any given solution there is some globally unique system of reference singled
out by symmetry or boundary conditions (even for asymptotically flat universes, as
Einstein himself showed). For example, in the Friedman "big bang" cosmologies there is
a preferred global system of coordinates corresponding to the worldlines with respect to
which the cosmic background radiation is isotropic. Of course, this is not a fresh insight.
The non-relational global aspects of general relativistic cosmologies have been
extensively studied, beginning with Einstein's 1917 paper on the subject, and continuing
with Gödel's rotating universes, and so on. Such examples make it clear that
general relativity is not a relational theory of motion. In other words, general relativity
does not correlate all physical effects with the relations between material bodies, but
rather with the relations between objects (including fields) and the absolute background
metric, which is affected by, but is not determined by, the distribution of objects (except
arguably in closed cosmological models). Thus relativity, no less than Newtonian
mechanics, relies on spacetime as an absolute entity in itself, exerting influence on fields
and material bodies. The extra information contained in the metric of spacetime is
typically introduced by means of boundary conditions or "initial values" on a spacelike
foliation, sufficient to fix a solution of the field equations.
In this way relativity very quickly disappointed its early logical-positivist supporters
when it became clear that it was not, and never had been, a relational theory of motion, in

the sense of Leibniz, Berkeley, or Mach. Initially even Einstein was disturbed by the
Schwarzschild and de Sitter solutions (see Section 7.6), which represent complete
metrical manifolds with only one material object or none at all (respectively). These
examples showed that spacetime in the theory of relativity cannot simply be regarded as
the totality of the extrinsic relations between material objects (and non-gravitational
fields), but is a primary physical entity of the theory, with its own absolute properties,
most notably the metric with its related invariants, at each point. Indeed this was
Einstein's eventual answer to Mach's critique of pre-relativity physics. Mach had
complained that it was unacceptable for our theories to contain elements (such as
spacetime) that act on (i.e., have an effect on) other things, but that are not acted upon by
other things. Mach, and the other relationalists before him, naturally expected this to be
resolved by eliminating spacetime, i.e., by denying that an entity called "spacetime" acts
in any physical way. To Mach's surprise (and unhappiness), the theory of relativity
actually did just the opposite - it satisfied Mach's criticism by instead making spacetime a
full-fledged element of theory, acted upon by other objects. By so doing, Einstein
believed he had responded to Mach's critique, but of course Mach hated it, and said so.
Early in his career, Einstein was sympathetic to the idea of relationism, and entertained
hopes of banishing absolute space from physics but, like Newton before him, he was
forced to abandon this hope in order to produce a theory that satisfactorily represents our
observations.
The absolute significance of spacetime in the theory of relativity was already obvious
from trivial considerations of the special theory. The twins paradox is a good illustration
of why relativity cannot be a relational theory, because the relation between the twins is
perfectly symmetrical, i.e., the spatial distance between them starts at zero, increases to
some maximum value, and then decreases back to zero. The distinction between the twins
cannot be expressed in terms of their mutual relations to each other, but only in terms of
how each of their individual worldlines are embedded in the absolute metrical manifold
of spacetime. This becomes even more obvious in the context of general relativity,
because we can then have multiple distinct geodesic paths between two given events,
with different lapses of proper time, so we cannot even appeal to any difference in "felt"
accelerations or local physics of any kind along the two world-paths to account for the
asymmetry. Hopes of accounting for this asymmetry by reference to the distant stars, à la
Mach, were certainly not fulfilled by general relativity, according to which the metric of
spacetime is conditioned by the presence of matter, but only to a very slight degree in
most circumstances. From an overall cosmological standpoint we are unable to attribute
the basic inertial field to the configuration of mass and energy, and we have no choice but
to simply assume a plausible absolute inertial background field, just as in Newtonian
physics, in order to actually make predictions and solve problems. This is necessarily a
separate and largely independent stipulation from our assumed distribution of matter and
energy.
To understand why Galilean relativity is actually more relational than special relativity,
note that the unified spacetime manifold with the lightcone structure of Minkowski
spacetime is more rigid than a pure Cartesian product of a three-dimensional spatial
manifold and an independent one-dimensional temporal manifold. In Galilean spacetime

at a spatial point P0 and time t0 there is no restriction at all on the set of spatial points at t0
+ dt that may "spatially coincide with P0" with respect to some valid inertial frame of
reference. In other words, an inertial worldline through P0 at time t0 can pass through any
point in the entire universe at time t0 + dt for any positive dt. In contrast, the lightcone
structure of Minkowski spacetime restricts the future of the point P0 to points inside the future null cone, i.e., to points within a spatial distance c·dt of P0, and as dt goes to zero, this range goes to zero, imposing a
well-defined unique connection from each "infinitesimal" instant to the next, which of
course is what the unification of space and time into a single continuum accomplishes.
We referred above to Newtonian spacetime without distinguishing it from what has come
to be called Galilean spacetime. This is because Newton's laws are manifestly invariant
under Galilean transformations, and in view of this it would seem that Newton should be
counted as an advocate of relativistic spacetime. However, in several famous passages of
the first Scholium of the Principia Newton seems to reject the very relativity on which his
physics is founded, and to insist on distinctly metaphysical conceptions of absolute space
and time. He wrote
I do not define the words time, space, place, and motion, since they are well
known to all. However, I note that people commonly conceive of these quantities
solely in terms of the relations between the objects of sense perception, and this is
the source of certain preconceptions, for the dispelling of which it is useful to
distinguish between absolute and relative, true and apparent, mathematical and
common.
It isn't trivial to unpack the intended significance of these statements, especially because
Newton has supplied three alternate names for each of the two types of quantities that he
wishes us to distinguish. On one hand we have absolute, true, mathematical quantities,
and on the other we have relative, apparent, common quantities. The latter are understood
to be founded on our sense perceptions, so the former presumably are not, which seems
to imply that they are metaphysical. However, Newton also says that this distinction is
useful for dispelling certain prejudices, which suggests that his motives are utilitarian
and/or pedagogical rather than to establish an ontology. He continues
Absolute, true, and mathematical time, in and of itself and of its own nature flows
uniformly (equably), without reference to anything external. By another name it is
called duration. Relative, apparent, and common time is any sensible external
measure of duration by means of motion. Such measures (for example, an hour, a
day, a month, a year) are commonly used instead of true time.
Absolute space, in its own nature, without relation to anything external, remains
always similar and immovable. Relative space is some movable measure of
absolute space, which our senses determine by the positions of bodies... Absolute
and relative space are of the same type (species) and magnitude, but are not
always numerically the same...
Place is a part of space which a body takes up, and is according to the space either absolute or relative.
Absolute motion is the translation of a body from one absolute place to another,
and relative motion is the translation from one relative place to another.
Newton's insistence on the necessity of referring all true motions to "immovable space"
has often puzzled historians of science, because his definition of absolute space and time
are plainly metaphysical, and it's easy to see that Newton's actual formulation of the laws
of physics is invariant under Galilean transformations, and the concept of absolute space
plays no role. Indeed, each mention of a "state of rest" in the definitions and laws is
accompanied by the phrase "or uniform motion in a right line", so the system built on
these axioms explicitly does not distinguish between these two concepts. What, then, did
Newton mean when he wrote that true motions must be referred to immovable space?
The introductory Scholium ends with a promise to explain how the true motions of
objects are to be determined, declaring that this was the purpose for which the Principia
was composed, so it's all the more surprising when we find that the subject is never even
mentioned in Books I or II. Only in the concluding Book III, "The System of the World",
does Newton return to this subject, and we finally learn what he means by "immovable
space". Although his motto was "I frame no hypotheses" we find, immediately following
Proposition X in Book III (in the third edition) the singular hypothesis
HYPOTHESIS I: That the centre of the system of the world is immovable.
In support of this remarkable assertion, Newton simply says "This is acknowledged by
all, although some contend that the earth, others that the sun, is fixed in that centre." In
the subsequent proposition XI we finally discover Newton's immovable space. He writes
PROPOSITION XI: That the common centre of gravity of the earth, the sun, and
all the planets, is immovable. For that centre either is at rest or moves uniformly
forwards in a right line; but if that centre moved, the center of the world would
move also, against the Hypothesis.
This makes it clear that Newton's purpose all along has been not to deny Galilean
relativity or the fundamental principle of inertia, but simply to show that a suitable
system of reference for determining true inertial motions need not be centered on some
material body. This was foreshadowed in the first Scholium when he wrote "it may be
that there is no body really at rest, to which the places and motions of others may be
referred". Furthermore, he notes that many people believed the immovable center of the
world was at the center of the Earth, whereas others followed Copernicus in thinking the
Sun was the immovable center. Newton evidently (and rightly) regarded it as one of the
most significant conclusions of his deliberations that the true inertial center of the world
was in neither of those objects, but is instead the center of gravity of the entire solar
system. We recall that Galileo found himself in trouble for claiming that the Earth moves,
whereas both he and Copernicus believed that the Sun was absolutely stationary. Newton
showed that the Sun itself moves, as he continues

PROPOSITION XII: That the sun is agitated by a continual motion, but never
recedes far from the common centre of gravity of all the planets. For since the
quantity of matter in the sun is to the quantity of matter in Jupiter as 1067 to 1, and the distance of Jupiter from the sun is to the semidiameter of the sun in a slightly greater proportion, the common center of gravity of Jupiter and the sun
will fall upon a point a little without the surface of the sun.
This was certainly a magnificent discovery, worthy of being called the purpose for which
the Principia was composed, and it is clearly what Newton had in mind when he wrote
the introductory Scholium promising to reveal how immovable space (i.e., the center of
the world) is to be found. In this context we can see that Newton was not claiming the
ability to determine absolute rest, but rather the ability to infer from phenomena a state of
absolute inertial motion, which he identified with the center of gravity of the solar
system. He very conspicuously labels as a Hypothesis (one of only three in the final
edition of the Principia) the conventional statement, "acknowledged by all", that the
center of the world is immovable. By these statements he was trying to justify calling the
solar system's inertial center the center of the world, while specifically acknowledging
that the immovability of this point is conventional, since it could just as well be regarded
as moving "uniformly forwards in a right line".
The modern confusion over Newton's first Scholium arises from trying to impose an
ontological interpretation on a 17th century attempt to isolate the concept of pure inertia,
and incidentally to locate the "center of the world". It was essential for Newton to make
sure his readers understood that "uniform motion" and "right lines" cannot generally be
judged with reference to neighboring bodies (such as the Earth's spinning surface),
because those bodies themselves are typically in non-uniform motion. Hence he needed
to convey the fact that the seat of inertia is not the Earth's center, or the Sun, or any other
material body, but is instead absolute space and time - in precisely the same sense that
spacetime is absolute in special relativity. This is distinct from asserting an absolute state
of rest, which Newton explicitly recognized as a matter of convention.
Indeed, we now know the solar system itself revolves around the center of the galaxy,
which itself moves with respect to other galaxies, so under Hypothesis I we must
conclude that Proposition XI is strictly false. Nevertheless, the deviations from true
inertial motion represented by those stellar and galactic motions are so slight that
Newton's "immovable center of the world" is still suitable as the basis of true inertial
motion for nearly all purposes. In a more profound sense, the concept of "immoveable
space" been carried over into modern relativity because, as Einstein said, spacetime in
general relativity is endowed with physical qualities that enable it to establish the local
inertial frames, but "the idea of motion may not be applied to it".
4.2 Inertial and Gravitational Separations
And I am dumb to tell a weather's wind
How time has ticked a heaven round the stars.
Dylan Thomas, 1934
The special theory of relativity is formulated as a local theory, so its natural focus is on
the worldlines of individual particles. In addition, special relativity presupposes a
preferred class of worldlines, those representing inertial motion. The idea of a worldline
is inherently absolute in the sense that it is nominally defined with reference only to a
system of space and time coordinates, not to any other objects. This is in contrast to a
truly relational theory, which would take the "dual" approach, and regard the separations
between particles as the most natural objects of study. In fact, as mentioned in Section
4.1, we could go to the relationist extreme of regarding separations as the primary
ontological entities, and considering particles to be merely abstract concepts that we use
to psychologically organize and coordinate the separations. The relationist view arguably
has the advantage of not presupposing a fixed background or even a definite
dimensionality of space, since each separation could be considered to represent an
independent degree of freedom. Of course, this freedom doesn't seem to exist in the real world, since we cannot arrange five particles all mutually equidistant from each other. Indeed it appears that the n(n−1)/2 separations between n particles can be fully encoded as just 3n real numbers, and moreover that those real numbers vary continuously as the
individual particles move. This is the justification for the idea of particles moving in a
coherent three-dimensional space.
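The impossibility of five mutually equidistant particles in three-dimensional space can be exhibited with a Cayley-Menger determinant: five points can be embedded in three dimensions only if the bordered determinant of their squared separations vanishes. A minimal sympy sketch for five mutually unit-distant points:

    import sympy as sp

    # Cayley-Menger matrix: a border of 1's, zero diagonal, and all
    # squared separations equal to 1, which is just ones(6) - eye(6)
    M = sp.ones(6, 6) - sp.eye(6)

    print(sp.det(M))   # -5: nonzero, so no such configuration exists in R^3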
Nevertheless, it's interesting to examine the spatial separations that exist between
material particles (as opposed to the space and time coordinates of individual particles),
to see if their behavior can be characterized in a simple way. From this point of view, the
idea of "motion" is secondary; we simply regard separations as abstract entities having
certain properties that may vary with time. In this context, rather than discussing inertial
motion of an individual particle, we consider the spatial separation (as a function of time)
between two inertial particles. However, since we don't presuppose a background of
absolute inertial motion, we will refer to the particles as being co-inertial, meaning
simply that the spatial separation between them behaves like the separation between two
particles in absolute inertial motion, regardless of whether the two particles are actually
in absolute inertial motion.
Is it possible to characterize in a simple way the spatial separations that exist between co-inertial particles? Consider, for example, the spatial separation s(t) as a function of time
between a stationary particle and a particle moving uniformly in a straight line through
space, as depicted in the figure below for the condition when the direction of motion of
the moving particle B is perpendicular to the displacement from the stationary particle A.

Obviously the separation between objects A and B in this configuration is stationary at this instant, i.e., we have ds/dt = 0, and yet we know from experience that this physical situation is distinct from one in which the two objects are actually stationary with respect to each other's inertial rest frames. For example, the Moon and Earth are separated by roughly a constant distance, and yet we understand that the Moon is in constant motion perpendicular to its separation from the Earth. It is this transverse motion that counteracts the effect of gravity and keeps the Moon in its orbit. This is another reason that we
ordinarily find it necessary to describe motion not in purely relational terms, but in terms
of absolutely non-rotating systems of inertial coordinates. Of course, as Mach observed,
the apparent existence of absolute rotation doesn't necessarily refute relationism as a viable basis for coordinating events. It could also mean that we must take more relations into account. (For example, the Moon's motion is always tangential to the Earth, but it is not always tangential to other bodies, so its orbital motion does show up in the totality of binary separations.) Whether or not a workable physics could be developed on a purely relational basis is unclear, but it's still interesting to examine the class of co-inertial
separations as functions of time. It turns out that co-inertial separations are characterized
by a condition that is nearly identical to the condition for linear gravitational free-fall, as
well as for certain other natural kinds of motion.
The three orthogonal components Δx, Δy, and Δz of the separation between two particles in unaccelerated motion relative to a common reference frame must be linear functions of time, i.e.,

Δx(t) = a1 t + b1        Δy(t) = a2 t + b2        Δz(t) = a3 t + b3

where the coefficients ai and bi are constants. Therefore the magnitude of any "co-inertial separation" is of the form

s(t) = √(At² + 2Bt + C)

where

A = a1² + a2² + a3²        B = a1 b1 + a2 b2 + a3 b3        C = b1² + b2² + b3²

Letting the subscript n denote the nth derivative with respect to time, the first two derivatives of s(t) are

s1 = (At + B)/s0        s2 = (AC - B²)/s0³

The right hand equation shows that s2 s0³ = AC - B² = k, a constant, and we can differentiate this again and divide the result by s0² to show that the separation s(t) between any two particles in relatively unaccelerated (i.e., co-inertial) motion in Galilean spacetime must satisfy the equation

s0 s3 + 3 s1 s2 = 0        (1)
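As a check on equation (1), the following sympy sketch confirms symbolically that s(t) = √(At² + 2Bt + C) satisfies s0 s3 + 3 s1 s2 = 0 for arbitrary constants A, B, C:

    import sympy as sp

    t, A, B, C = sp.symbols('t A B C', positive=True)
    s = sp.sqrt(A*t**2 + 2*B*t + C)
    s1, s2, s3 = (sp.diff(s, t, n) for n in (1, 2, 3))

    print(sp.simplify(s*s3 + 3*s1*s2))   # prints 0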

Now we consider the separation that characterizes an isolated non-rotating two-body system in gravitational free-fall. Assume the two bodies are identical particles, each of mass m. According to Newtonian theory the inertial and gravitational constraints are coupled together by the auxiliary quantity called "force" by the following equations

F = Gm²/s0²        F = -m s2/2

where G is a universal constant. (Note that each particle's "absolute" acceleration is half of the second derivative of their mutual separation with respect to time.) Equating these two forces gives s2 s0² = -2Gm. Differentiating this again and dividing through by s0, we can characterize non-rotating gravitational free-fall by the purely kinematic equation

s0 s3 + 2 s1 s2 = 0        (2)
The formal similarity between equations (1) and (2) is remarkable, considering that the
former describes strictly inertial separations and the latter describes gravitational
separations. We can show how the two are related by considering general free motion in a
gravitational field. The Newtonian equations of motion are

d²r/dt² - r ω² = -m/r²        r (dω/dt) + 2 ω (dr/dt) = 0

where r is the magnitude of the distance from the center of the field, ω is the angular velocity of the particle, and m is the gravitational parameter of the field (in units with G = 1). If we solve the left hand equation for ω and differentiate to give dω/dt, we can substitute these expressions into the right hand equation and re-arrange the terms to give

r0 r3 + 3 r1 r2 = -m r1/r0²

which applies (in the Newtonian limit) to arbitrary free paths of test particles in a gravitational field. Obviously if m = 0 this reduces to equation (1), representing free inertial separations, whereas for purely radial motion we have d²r/dt² = -m/r², and so (since in that case the right hand side equals r1 r2) this reduces to equation (2), representing radial gravitational separation.
Other classes of physical separations also satisfy a differential equation similar to (1) and
(2). For example, consider a particle of mass m attached to a rod in such a way that it can
slide freely along the rod. If we rotate the rod about some point P then the particle in
general will tend to slide outward along the rod away from the center of rotation in
accord with the basic equation of motion

s2 = ω² s0

where s is the distance from the center of rotation to the sliding particle, and ω is the (constant) angular velocity of the rod. Differentiating and multiplying through by s0 gives

s0 s3 = ω² s0 s1

Then since s2 = ω² s0, we see that s(t) satisfies the equation

s0 s3 - s1 s2 = 0        (3)
So, we have found that arbitrary co-inertial separations, non-rotating gravitational separations, and rotating radial separations are all characterized by a differential equation of the form

s0 s3 + N s1 s2 = 0        (4)

for some constant N, equal to 3, 2, and -1 respectively in the three cases above. (Among the other solutions of this equation with N = -1 are the elementary transcendental functions e^t, sin(t), and cos(t).) Solving for N, to isolate the arbitrary constant, we have

N = -(s0 s3)/(s1 s2)

Differentiating this gives the basic equation

s0 s1 s2 s4 + s1² s2 s3 = s0 s2² s3 + s0 s1 s3²

If none of s0, s1, s2, and s3 is zero, we can divide each term by all of these to give the interesting form

s1/s0 + s4/s3 = s2/s1 + s3/s2
This could be seen as an (admittedly very simplistic) unification of a variety of physically meaningful spatial separation functions under a single equation. The
symmetry breaking that leads to the different behavior in different physical situations
arises from the choice of N, which appears as a constant of integration.
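The value of N for each class is easy to confirm symbolically. The sympy sketch below evaluates N = -(s0 s3)/(s1 s2) for a representative member of each family: an inertial separation √(1 + t²); the marginal (zero total energy) radial fall s = t^(2/3), which satisfies s2 s0² = constant; and a separation s = cosh(t) on a rotating rod (a solution of s2 = ω² s0 with ω = 1):

    import sympy as sp

    t = sp.symbols('t', positive=True)

    def N(s):
        return sp.simplify(-s * sp.diff(s, t, 3)
                           / (sp.diff(s, t) * sp.diff(s, t, 2)))

    print(N(sp.sqrt(1 + t**2)))      # 3, co-inertial separation
    print(N(t**sp.Rational(2, 3)))   # 2, radial gravitational free-fall
    print(N(sp.cosh(t)))             # -1, rotating radial separation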
Incidentally, even though the above has been based on the Galilean spatial separations
between objects as a function of Galilean time, the same conditions can be shown to
apply to the absolute spacetime intervals between inertial particles as a function of their
proper times. Relative to any point on the worldline of one particle, the four components Δt, Δx, Δy, and Δz of the absolute interval to any other inertially moving particle are all linear functions of the proper time τ along the latter particle's worldline. Therefore, the components can be written in the form

Δt(τ) = a0 τ + b0        Δx(τ) = a1 τ + b1        Δy(τ) = a2 τ + b2        Δz(τ) = a3 τ + b3

where the coefficients ai and bi are constants. It follows that the absolute magnitude of any "co-inertial separation" is of the form

s(τ) = √(Aτ² + 2Bτ + C)

where

A = a0² - a1² - a2² - a3²        B = a0 b0 - a1 b1 - a2 b2 - a3 b3        C = b0² - b1² - b2² - b3²
Thus we have the same formal dependence as before, except now the parameter s
represents the absolute spacetime separation. This shows that the absolute separation
between any fixed point on one inertial worldline and a point advancing along any other
inertial worldline satisfies equation (1), where subscripts denote derivatives with respect
to the proper time of the advancing point. Naturally the reciprocal relation also holds, as does the corresponding relation for the absolute separation between two points, each advancing along arbitrary inertial worldlines, correlated according to their respective proper times.
4.3 Free-Fall Equations
When, therefore, I observe a stone initially at rest falling from an elevated
position and continually acquiring new increments of speed, why should I
not believe that such increases take place in a manner which is
exceedingly simple and rather obvious to everybody?
Galileo Galilei, 1638
The equation of two-body non-rotating radial free-fall in Newtonian theory is formally
identical to the one-body radial free-fall solution in Einstein's theory (as is Kepler's third
law), provided we identify Newton's radial distance with the Schwarzschild parameter r,
and Newton's time with the proper time of the falling particle. Therefore, it's worthwhile
to explicitly derive the cycloidal form of this solution. From the Newtonian point of view
we can begin with the inverse-square law of gravitation for the radial separation s(t)
between two identical non-rotating particles of mass m

s̈ = -2Gm/s²

where dots signify derivatives with respect to time. Integrating this over ds from an arbitrary initial separation s(0) to the separation s(t) at some other time t gives

∫ s̈ ds = 2Gm [1/s(t) - 1/s(0)]

Notice that the left hand integral can be rewritten

∫ s̈ ds = ∫ (dṡ/dt) ds = ∫ ṡ dṡ

Therefore, the previous equation can easily be integrated to give

[ṡ(t)² - ṡ(0)²]/2 = 2Gm [1/s(t) - 1/s(0)]

which shows that the quantity

ṡ² - 4Gm/s

is invariant for all t. Solving the equation for ṡ, we have

ṡ = ±√(ṡ(0)² + 4Gm[1/s(t) - 1/s(0)])

Rearranging, this gives

dt = ±ds/√(ṡ(0)² + 4Gm[1/s - 1/s(0)])

To simplify the expressions, we put s0 = s(0), v0 = ṡ(0), and r = s(t)/s0. In these terms, the preceding expression can be written

dt = ±√(s0³/(4Gm)) dr/√(1/r - K)        where K = 1 - v0² s0/(4Gm)
There are two cases to consider. If K is positive, then the trajectory is bounded, and there is some point on the trajectory (the apogee) at which v = 0. Choosing this point as our time origin t = 0, we have K = 1, and the standard integral gives

t = √(s0³/(4Gm)) [√(r(1-r)) + invsin(√(1-r))]        (1)

This equation describes a (scaled) cycloidal relation between t and r, which can be expressed parametrically in terms of a fictitious angle θ as follows

t = √(s0³/(16Gm)) (θ + sin θ)        r = (1 + cos θ)/2        (2)
To verify that these two equations are equivalent to the preceding equation, we can solve the second for θ = invcos(2r-1) and substitute into the first to give

t = √(s0³/(16Gm)) [sin(invcos(2r-1)) + invcos(2r-1)]

Using the trigonometric identity

sin(invcos(x)) = √(1 - x²)

we see that the first term on the right side is

√(s0³/(16Gm)) · 2√(r(1-r)) = √(s0³/(4Gm)) √(r(1-r))

Also, letting φ = invcos(2r-1), we can use the trigonometric identity

cos(φ) = 1 - 2 sin(φ/2)²

to show that this angle is

φ = 2 invsin(√(1-r))

so the second term on the right side of (2) is

√(s0³/(16Gm)) · 2 invsin(√(1-r)) = √(s0³/(4Gm)) invsin(√(1-r))

which completes the demonstration that the cycloid relation given by (2) is equivalent to the free-fall relation (1).
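This equivalence can also be checked numerically. The Python sketch below works in units chosen so that 2Gm = 1 and s0 = 1, integrates s̈ = -1/s² from rest at the apogee, and compares the elapsed time with the cycloid prediction for the fall to r = 1/2:

    import math

    # cycloid prediction for r = 1/2, i.e. theta = pi/2 (here 16Gm = 8)
    theta = math.pi / 2
    t_cycloid = math.sqrt(1.0 / 8.0) * (theta + math.sin(theta))

    # direct integration of s'' = -1/s^2 from s = 1, v = 0
    s, v, t, dt = 1.0, 0.0, 0.0, 1e-6
    while s > 0.5:
        v -= dt / s**2        # semi-implicit Euler step
        s += v * dt
        t += dt

    print(t_cycloid, t)       # both are approximately 0.909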
The second case is when K is negative. For this case we can conveniently express the equations in terms of the positive parameter k = -K = v0² s0/(4Gm) - 1. The standard integral

∫ √(u/(1+ku)) du = √(u(1+ku))/k - invsinh(√(ku))/k^(3/2)

tells us that, for any two separations s0 ra and s0 rb on the trajectory, the time interval is related to the separations according to

tb - ta = √(s0³/(4Gm)) [h(rb) - h(ra)]        where h(r) = √(r(1+kr))/k - invsinh(√(kr))/k^(3/2)

Notice that if we define S0 = s0/k and R = kr, then this becomes

tb - ta = √(S0³/(4Gm)) [√(R(1+R)) - invsinh(√R)] evaluated between Ra = k ra and Rb = k rb

Thus, if we define the normalized time parameter τ = t √(4Gm/S0³), with the origin of t chosen where R = 0, then the normalized equation of motion is

τ = √(R(1+R)) - invsinh(√R)        (3)

This represents the shape of every non-rotating separation between identical particles of mass m for which k is positive, which means that the absolute value of v0 exceeds 2√(Gm/s0). These are the unbound radial orbits for which R goes to infinity, as opposed to the case when the absolute value of v0 is less than this threshold, which gives bound radial orbits in the shape of a cycloid in accord with equation (1).
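As a consistency check on the normalized relation (3) as given here, its derivative should reproduce the normalized energy equation (dR/dτ)² = 1 + 1/R for an unbound fall. A finite-difference sketch:

    import math

    def tau(R):
        # normalized unbound free-fall relation (3)
        return math.sqrt(R * (1 + R)) - math.asinh(math.sqrt(R))

    R, h = 2.0, 1e-6
    dR_dtau = 2 * h / (tau(R + h) - tau(R - h))

    print(dR_dtau)                # ~1.224745
    print(math.sqrt(1 + 1 / R))   # ~1.224745, i.e. sqrt(3/2)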
It's interesting to note the "removable singularity" of (3) at R = 0. Physically the
parameter R is always non-negative by definition, so it abruptly reverses slope at the
origin, even though the position may vary monotonically with respect to an external
coordinate system.
4.4 Force, Curvature, and Uncertainty
The atoms, as their own weight bears them down plumb through the void,
at scarce determined times, in scarce determined places, from their course

decline a little - call it, so to speak, mere changed trend. For were it not
their wont thuswise to swerve, down would they fall, each one, like drops
of rain, through the unbottomed void; and then collisions ne'er could be,
nor blows among the primal elements; and thus Nature would never have
created aught.
Lucretius, 50 BC
The trajectory of radial non-rotating gravitational free-fall can be expressed by the simple differential equation

s² s̈ = k        (1)

where k is a constant and dots signify derivatives with respect to time. This equation is valid for both Newtonian gravity and general relativity, provided we identify Newton's time parameter with the free-falling particle's proper time, and Newton's radial distance with the radial Schwarzschild coordinate. Notice that no gravitational constant appears in this equation (k is just a constant of integration, determined by the initial conditions), so equation (1) is a purely kinematic description of gravity. Why did Newton not adopt this simple kinematic view? Historically the reasons involved considerations of rotating systems, but the basic problem with the kinematic view is present even with simple non-rotating free-fall.
The problem is that equation (1) has an unrealistic "static solution" at ṡ = s̈ = 0. This condition implies that k = 0, and the separation between the two objects has no proper "trajectory" (i.e., time drops out of the equation), so the equation cannot extrapolate the position forward or backward in time. Of course, this condition can never arise naturally from any non-static condition with k ≠ 0, but we can imagine that by the imposition of some external force we can arrange to have the two objects initially at rest and not accelerating relative to each other. Then when the objects are released from the "outside" force we expect them to immediately begin falling toward each other under the influence of their mutual gravitational attraction. This implies that k, and therefore s̈, must immediately assume some non-zero values, but equation (1) gives us no information about these values, because the entire equation identically vanishes at the static solution.
To escape from the static solution, Newtonian mechanics splits the kinematic equation of motion into two parts, coupled together by the dynamical concepts of force and mass. Two objects are said to exert (equal and opposite) force on each other proportional to the inverse of the square of the separation between them, and the second derivative of that separation is proportional (per mass) to this force. Thus, the relation between separation and time for two identical particles, each of mass m, is given not by a single kinematic equation but by two simultaneous equations

F = Gm²/s²        s̈ = -2F/m

If we combine these two equations by eliminating F, we have

s̈ = -2Gm/s²        (2)

which shows that when the two objects are released, the separation instantly acquires the second derivative -2Gm/s². Once this "initialization" has been accomplished, the subsequent free fall is entirely determined by equation (1), as can be seen by differentiating (2), in the form s² s̈ = -2Gm, to give

s² (d³s/dt³) + 2 s ṡ s̈ = 0

which, assuming the separation is not zero, can be divided by s to give

s (d³s/dt³) + 2 ṡ s̈ = 0        (3)

the derivative of (1). This shows that, for non-rotating radial free-fall, the coupling parameters F and m are entirely superfluous, except that they serve to establish the proper initial condition when the two objects are released from rest. Thus, Newton's dual concepts of force-at-a-distance and the proportionality of acceleration to force serve only (in this context) to enable us to solve for a non-vanishing s̈ as a function of s when ṡ = 0, which equation (1) obviously cannot do.
Furthermore, the constant G does not appear in (1) or (3), even though they give a complete description of gravitational free-fall except for the singularity at ṡ = s̈ = 0. Thus the gravitational constant is also needed only at this singular point, the "static solution" of equation (1), which is the only point at which the dynamical concepts of force and mass are used. Aside from this singular condition, non-rotating radial Newtonian gravity is a purely kinematical phenomenon.
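The point can be illustrated numerically. In the Python sketch below (units with 2Gm = 1), the force law is consulted exactly once, to initialize s̈ at the moment of release; thereafter the fall is propagated purely by the third-order kinematic equation (3), and it tracks a reference integration that applies the force law throughout:

    # kinematic state (s, v, a): a is set once from the force law at release,
    # then evolved by equation (3), i.e. da/dt = -2*v*a/s
    s, v, a = 1.0, 0.0, -1.0
    # reference state integrating the force law s'' = -1/s^2 at every step
    sF, vF = 1.0, 0.0

    dt = 1e-6
    for _ in range(500000):
        jerk = -2.0 * v * a / s
        v += a * dt
        s += v * dt
        a += jerk * dt
        vF += (-1.0 / sF**2) * dt
        sF += vF * dt

    print(s, sF)   # the two integrations agree closely (both ~0.869)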
There are several essentially equivalent formulations of the kinematic equation of non-rotating radial gravitational motion, but all lead to an indeterminate condition at the static solution. For example, if we set k = -2Gm in equation (1) and multiply through by 2ṡ/s², we have

2 ṡ s̈ = -4Gm ṡ/s²

Integrating this over time gives

ṡ² = 4Gm/s + E

where E is a constant of integration. Rearranging and dividing by 2 gives

-2Gm/s + ṡ²/2 = E/2

which we recognize as expressing the classical conservation of energy, with the first term representing potential energy and the second term denoting kinetic energy. Taking the derivative of this gives

ṡ (s̈ + 2Gm/s²) = 0        (4)
Notice that in each of the preceding equations the condition $\dot{s} = \ddot{s} = 0$ still represents a
solution for any s, even though it is unrealistic. At this point we may be tempted to solve
our problem by dividing through equation (4) by $\dot{s}$ to give

$$\ddot{s} = -\frac{C}{2s^2} = -\frac{2Gm}{s^2} \qquad (5)$$

which is the Newtonian inverse-square "force" law of gravity. This does indeed
determine the second derivative $\ddot{s}$ as a function of s, and thereby provides the
information needed to depart from the externally imposed static initial condition.
However, notice that the condition which concerns us is precisely when $\dot{s} = 0$, so when
we divided equation (4) by $\dot{s}$ we were essentially just eliminating the singular pole
arbitrarily by dividing by zero. Thus we can't properly say that the "force-at-a-distance"
law (5) follows from equation (1). The removal of the indeterminate singularity actually
represents an independent assumption relative to the basic kinematic equation of motion.
Of course, this assumption is perfectly compatible with the equation of motion, as can be
seen by solving equation (5) for C/s and substituting into the energy equation to give

$$\dot{s}^2 = 2\varepsilon - 2s\ddot{s}$$

and thus

$$2s\ddot{s} + \dot{s}^2 = 2\varepsilon = k$$

which is the same as equation (1). This compatibility is a necessary consequence of the
fact that the equation of motion is totally indeterminate when $\dot{s} = 0$, which is the only
condition at which the force law introduces new information not contained in the basic
equation of motion.
In view of the above relations, it is not surprising that in the general theory of relativity
we find gravity expressed without the concept of force. Einstein avoided the problem of
the static solution - without invoking an auxiliary concept such as force - simply by
recasting the phenomena in four-dimensional spacetime, within which no material object
is ever static. Every object, even one "at rest" in space, necessarily has a proper
trajectory through spacetime, because it's moving forward in time. Furthermore, if we
allow the spacetime manifold to possess intrinsic curvature, it follows that a purely
timelike trajectory can "veer off" and acquire spacelike components.
Of course, this tendency to "veer off" depends on the degree of curvature of the
spacetime, which general relativity relates to the mass-energy in the region. One of
Einstein's motivations for the general theory was the desire to eliminate arbitrary
constants, particularly the gravitational constant G, from the expressions of physical
laws, but in the general theory it is still necessary to determine the proportionality
between mass and curvature empirically, so the arbitrary gravitational constant remains.
In any case, we see that Newtonian mechanics and general relativity give formally
identical relations between separation and time for non-rotating free-fall, and the
conceptual differences between the two theories can be expressed in terms of the ways in
which they escape from or avoid the static condition.
It's interesting to note that the static solution of (1) is unstable in the direction of
collapse. Given a positive separation s, the signs of $(\dot{s}, \ddot{s})$ must be {+,−}, {+,+}, {−,+}
or {−,−} in order to satisfy (1), but considering small perturbations of these derivatives
from the state in which they are both zero, it's clear that {+,−} is unrealistic, because $\dot{s}$
would not go positive from zero while $\ddot{s}$ was going negative from zero. For similar
reasons, perturbations leading to {+,+} and {−,+} are also excluded. Only the case
{−,−} represents a realistic outcome of a small perturbation from the static solution.
This instability in the direction of collapse suggests another approach to escaping from
(or avoiding) the static solution. The exact velocity and position of the two objects
cannot be known at the quantum level, so, in a sense, the closest that two bodies can
come to a static condition must still allow the equivalent of one quantum of momentum in
their relative velocities. It's tempting to imagine that there might be some way of
deriving the gravitational constant based on the idea that the initial condition for (1) is
determined by the characteristic quantum uncertainty for the separations between massive
particles, since, as we've seen, this initial condition fully determines the trajectory of
radial gravitational free-fall. Simplistically we could note that, for a particle of mass m,
any finite limit L on allowable distances implies two irreducible quantities of energy per
unit mass, one being $(h/2L)^2/2m^2$ corresponding to the minimum "observable" momentum
mv = h/2L (where h is Planck's constant) due to the uncertainty principle, and the other
being the minimum gravitational potential energy Gm/L. Identifying these two energies
with each other, and setting L equal to the event horizon radius c/H where c is the
velocity of light and H is Hubble's expansion constant, we have the relation

$$\frac{1}{2m^2}\left(\frac{h}{2L}\right)^2 = \frac{Gm}{L} \quad\text{with } L = \frac{c}{H}, \qquad\text{so}\qquad m = \left(\frac{h^2 H}{8Gc}\right)^{1/3}$$
Inserting the values h = 6.625×10⁻³⁴ J·sec, G = 6.673×10⁻¹¹ N·m²/kg², c = 2.998×10⁸
m/sec, and H = 2.3×10⁻¹⁸ sec⁻¹ gives a value of 1.8477×10⁻²⁸ kg for the characteristic
mass m, which happens to be about one ninth the mass of a proton.
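As a quick arithmetic check of this relation (a sketch only; the constants are those quoted
in the text, including the author's circa-1990s value for Hubble's constant):

```python
h = 6.625e-34    # Planck's constant, J*s
G = 6.673e-11    # gravitational constant, N*m^2/kg^2
c = 2.998e8      # speed of light, m/s
H = 2.3e-18      # Hubble's expansion constant, 1/s

m = (h**2 * H / (8*G*c)) ** (1.0/3.0)   # m = (h^2 H / 8Gc)^(1/3)
print(m)                 # ~1.8477e-28 kg
print(1.6726e-27 / m)    # ratio of proton mass to m: ~9
```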
Rough relationships of this kind between the fundamental physical constants have been
discussed by Dirac and others, including Leopold Infeld, who wrote in 1949:
Let us take as an example Maxwell's equations and try to find their solution on a
cosmological background... In a closed universe the frequency of radiation has a
lowest value [corresponding to the maximum possible wavelength]. The
spectrum, on its red side, cannot reach frequency zero. We obtain characteristic
values for frequencies... a similar situation prevails if we consider Dirac's
equations upon a cosmological background. The solutions in a closed universe are
different, not because of the metric, but because of the topology of our universe.
Such ideas are intriguing, but they have yet to be incorporated meaningfully into any
successful physical theory.
The above represents a very simplistic sense in which the uncertainty of quantum
mechanics and the spacetime curvature of general relativity can be regarded as two
alternative conceptual strategies for establishing a consistent gravitational coupling. In a
more sophisticated sense, we can find other interesting formal parallels between these
two concepts, both of which fundamentally express non-commutativity. Given a system
of orthogonal xyz coordinates, let A,B,C denote operations which, when applied to any
unit vector emanating from the origin, rotate that vector in the positive sense about the x,
y, or z axis respectively. Each of these operations can be represented by a rotation
matrix, such that multiplying any vector by that matrix will effectively rotate the vector
accordingly. As Hamilton realized in his efforts to find a three-dimensional analog of
complex numbers (which represent rotation operators in two dimensions), the
multiplication (i.e., composition) of two rotations in space is not commutative. This is
easily seen in our example, because if we begin with a vector V emanating from the
origin in the positive z direction, and we first apply rotation A and then rotation B, we
arrive at a vector pointing in the positive y direction, whereas if we begin with V and
apply the rotation B first and then A we arrive at a vector pointing in the negative x
direction. Thus the effect of the combined operation AB is different from the effect of
the combined operation BA, and so the matrix AB − BA does not vanish. This is in
contrast with ordinary scalars and complex numbers, which always satisfy the
commutativity relation ab − ba = 0 for every two numbers a,b.
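A three-line computation makes the point concrete. The sketch below uses the standard
right-handed rotation matrices, whose sign conventions may differ from the "positive
sense" intended in the text (so the particular signs of the resulting vectors may be
flipped), but the two orderings plainly disagree:

```python
import numpy as np

def Rx(t):  # rotation by angle t about the x axis (right-hand rule)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(t):  # rotation by angle t about the y axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

A, B = Rx(np.pi/2), Ry(np.pi/2)
V = np.array([0.0, 0.0, 1.0])       # unit vector along +z

print(np.round(B @ (A @ V)))        # A first, then B
print(np.round(A @ (B @ V)))        # B first, then A: a different vector
print(np.round(A @ B - B @ A, 3))   # the commutator AB - BA != 0
```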
This non-commutativity also appears when dealing with calculus on curved manifolds,
which we will discuss in more detail in Section 5. Just to give a preliminary indication of
how non-commutative relations arise in this context, suppose we have a vector field $T^{\sigma}$
defined over a given metrical manifold, and we let $T^{\sigma}{}_{;\mu\nu}$ denote covariant differentiation
of $T^{\sigma}$ first with respect to the coordinate $x^{\mu}$ and then with respect to the coordinate $x^{\nu}$. In a
flat manifold the covariant derivative is identical to the partial derivative, which is
commutative. In other words, the result of differentiation with respect to two coordinates
in succession is independent of the order in which we apply the differentiations.
However, in a curved manifold this is not the case. We find that reversing the order of
the differentiations yields different results, just as when applying two rotations in
succession to a vector. Specifically, we will find that

$$T^{\sigma}{}_{;\mu\nu} - T^{\sigma}{}_{;\nu\mu} = R^{\sigma}{}_{\rho\mu\nu}\,T^{\rho}$$

where $R^{\sigma}{}_{\rho\mu\nu}$ is the Riemann curvature tensor, to be discussed in detail in Section 5.7. The
vanishing of this tensor is the necessary and sufficient condition for the manifold to be
metrically flat, i.e., free of intrinsic curvature, so this tensor can be regarded as a measure
of the degree of non-commutativity of covariant derivative operators in the manifold.
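For a concrete instance, the following sketch (using the sympy library, and one common
sign convention for the Riemann tensor, which varies between texts) computes a
component of $R^{\sigma}{}_{\rho\mu\nu}$ for the unit 2-sphere directly from its metric; the non-vanishing
result is precisely the failure of covariant derivatives to commute on that surface:

```python
import sympy as sp

# Curvature of the unit 2-sphere, computed from the metric alone.
th, ph = sp.symbols('theta phi')
x = [th, ph]
g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])   # metric of the unit sphere
ginv = g.inv()
n = 2

def Gamma(a, b, c):
    # Christoffel symbols of the second kind
    return sp.simplify(sum(ginv[a, d]*(sp.diff(g[d, b], x[c])
           + sp.diff(g[d, c], x[b]) - sp.diff(g[b, c], x[d]))
           for d in range(n))/2)

def Riemann(a, b, c, d):
    # R^a_{bcd} = d_c Gamma^a_{db} - d_d Gamma^a_{cb} + quadratic terms
    expr = sp.diff(Gamma(a, d, b), x[c]) - sp.diff(Gamma(a, c, b), x[d])
    expr += sum(Gamma(a, c, e)*Gamma(e, d, b)
                - Gamma(a, d, e)*Gamma(e, c, b) for e in range(n))
    return sp.simplify(expr)

print(Riemann(0, 1, 0, 1))   # -> sin(theta)**2: non-zero curvature
```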
Non-commutativity also plays a central role in quantum mechanics, where observables
such as position and momentum are represented by operators, much like the rotation
operators in our previous example, and the possible observed states are eigenvalues of
those operators. If we let X and P denote the position and momentum operators, the
application of one of these operators to the state vector of a given system results in a new
state vector with specific probabilities. This represents a measurement of the respective
observable. The effect of a position measurement followed by a momentum
measurement can be represented by the combined operator XP, and likewise the effect of
a momentum measurement followed by a position measurement can be represented by
PX. Again we find that the commutative property does not generally hold. If two
observables are compatible, such as the X position and the Y position of a particle, then
the operators commute, which means we have XY − YX = 0. However, if two operators
are not compatible, such as position and momentum, their operators do not commute.
This leads to the important relation

$$XP - PX = i\hbar$$
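In the position representation, where X acts as multiplication by x and P as
$-i\hbar\,\partial/\partial x$, this relation can be checked mechanically; the sketch below (using sympy, with
a generic test function ψ) is one way to do it:

```python
import sympy as sp

x, hbar = sp.symbols('x hbar')
psi = sp.Function('psi')(x)

X = lambda f: x*f                        # position operator
P = lambda f: -sp.I*hbar*sp.diff(f, x)   # momentum operator

print(sp.simplify(X(P(psi)) - P(X(psi))))   # -> I*hbar*psi(x)
```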
This non-commutativity in the measurement of observables implies an inherent limit on
the precision to which the values of the incompatible observables can be jointly measured.
In general it can be shown that if A and B are the operators associated with the physical
quantities a and b, and if Δa and Δb denote the expected root mean squares of the
deviations of measured values of a and b from their respective expected values, then

$$\Delta a \, \Delta b \;\geq\; \tfrac{1}{2}\left|\langle AB - BA \rangle\right|$$
This is Heisenberg's uncertainty relation. The commutator of two observable operators is
invariably a multiple of Planck's constant, so if Planck's constant were zero, all
observables would be compatible, i.e., their operators would commute, just as do all
classical operators. We might say (with some poetic license) that Planck's constant is a
measure of the "curvature" of the manifold of observation. This "curvature" applies only
to incompatible observables, although the term "incompatible" is somewhat misleading,
because it actually signifies that two observables A,B are conjugates, i.e., transformable
into each other by the conjugacy relation A = UBU⁻¹ where U is a unitary operator
(analogous to a simple rotation operator).

4.5 Conventional Wisdom


This, however, is thought to be a mere strain upon the text, for the words
are these: That all true believers break their eggs at the convenient end,
and which end is the convenient end, seems, in my humble opinion, to be
left to every man's conscience...
Jonathan Swift, 1726
It is a matter of empirical fact that the speed of light is invariant in terms of inertial
coordinates, and yet the invariance of the speed of light is often said to be a matter of
convention - as indeed it is. The empirical fact refers to the speed of light in terms of
inertial coordinates, but the decision to define speeds in terms of inertial coordinates is
conventional. It's trivial to define systems of space and time coordinates in terms of
which the speed of light is not invariant, but we ordinarily choose to describe events in
terms of inertial coordinates, partly because of the invariance of light speed based on
those coordinates. Of course, this invariance would be tautological if inertial coordinate
systems were simply defined as the systems in terms of which the speed of light is
invariant. However, as discussed in Section 1.3, the class of inertial coordinate systems is
actually defined in purely mechanical terms, without reference to the propagation of light.
They are the coordinate systems in terms of which mechanical inertia is homogeneous
and isotropic (which are the necessary and sufficient conditions for Newton's three laws
of motion to be valid, at least quasi-statically). The empirical invariance of light speed
with respect to this class of coordinate systems is a non-trivial empirical fact, but nothing
requires us to define velocity in terms of inertial coordinate systems. Such systems
cannot claim to have any a priori status as the true class of coordinates. Despite the
undeniable success of the principle of inertia as a basis for organizing our understanding
of the processes of nature, it is nevertheless a convention.
The conventionalist view can be traced back to Poincare, who wrote in "The Measure of
Time" in 1898
... we have no direct intuition about the equality of two time intervals. The
simultaneity of two events or the order of their succession, as well as the equality
of two time intervals, must be defined in such a way that the statements of the
natural laws be as simple as possible.
In the same paper, Poincare described the use of light rays, together with the convention
that the speed of light is invariant and the same in all directions, to give an operational
meaning to the concept of simultaneity. In his book "Science and Hypothesis" (1902) he
summarized his view of time by saying
There is no absolute time. When we say that two periods are equal, the statement
has no meaning, and can only acquire a meaning by a convention.
Poincare's views had a strong influence on the young Einstein, who avidly read "Science
and Hypothesis" with his friends in the self-styled "Olympia Academy". Solovine
remembered that this book "profoundly impressed us, and left us breathless for weeks on
end". Indeed we find in Einstein's 1905 paper on special relativity the statement
We have not defined a common time for A and B, for the latter cannot be defined
at all unless we establish by definition that the time required by light to travel
from A to B equals the time it requires to travel from B to A.
In a later popular exposition, Einstein tried to make the meaning of this definition more
clear by saying
That light requires the same time to traverse the path A to M (the midpoint of AB)
as for the path B to M is in reality neither a supposition nor a hypothesis about the
physical nature of light, but a stipulation which I can make of my own free will in
order to arrive at a definition of simultaneity.
Of course, this concept of simultaneity is also embodied in Einstein's second "principle",
which asserts the invariance of light speed. Throughout the writings of Poincare,
Einstein, and others, we see the invariance of the speed of light referred to as a
convention, a definition, a stipulation, a free choice, an assumption, a postulate, and a
principle... as well as an empirical fact. There is no conflict between these
characterizations, because the convention (definition, stipulation, free choice, principle)
that Poincare and Einstein were referring to is nothing other than the decision to use
inertial coordinate systems, and once this decision has been made, the invariance of light
speed is an empirical fact. As Poincare said in 1898, we naturally choose our coordinate
systems "in such a way that the statements of the natural laws are as simple as possible",
and this almost invariably means inertial coordinates. It was the great achievement of
Galileo, Descartes, Huygens, and Newton to identify the principle of inertia as the basis
for resolving and coordinating physical phenomena. Unfortunately this insight is often
disguised by the manner in which it is traditionally presented. The beginning physics
student is typically expected to accept uncritically an intuitive notion of "uniformly
moving" time and space coordinate systems, and is then told that Newton's laws of
motion happen to be true with respect to those "inertial" systems. It is more meaningful to
say that we define inertial coordinate systems as those systems in terms of which
Newton's laws of motion are valid. We naturally coordinate events and organize our
perceptions in such a way as to maximize symmetry, and for the motion of material
objects the most important symmetries are the isotropy of inertia, the conservation of
momentum, the law of equal action and re-action, and so on. Newtonian physics is
organized entirely upon the principle of inertia, and the basic underlying hypothesis is
that for any object in any state of motion there exists a system of coordinates in terms of
which the object is instantaneously at rest and inertia is homogeneous and isotropic
(implying that Newton's laws of motion are at least quasi-statically valid).
The empirical validity of this remarkable hypothesis accounts for all the tremendous
success of Newtonian physics. As discussed in Section 1.3, the specification of a
particular state of motion, combined with the requirement for inertia to be homogeneous
and isotropic, completely determines a system of coordinates (up to insignificant scale
factors, rotations, etc), and such a system is called an inertial system of coordinates.
Such coordinate systems can be established unambiguously by purely mechanical means
(neglecting the equivalence principle and associated complications in the presence of
gravity). The assumption of inertial isotropy with respect to a given state of motion
suffices to establish the loci of inertial simultaneity for that state of motion. Poincare
and Einstein rightly noted the conventionality of this simultaneity definition because they
were not pre-supposing the choice of inertial simultaneity. In other words, we are not
required to use inertial coordinates. We simply choose, of our own free will, to use
inertial coordinates - with the corresponding inertial definition of simultaneity - because
this renders the statement of physical laws and the descriptions of physical phenomena as
simple and perspicuous as possible, by taking advantage of the maximum possible
symmetry.
In this regard it's important to remember that inertial coordinates are not entirely
characterized by the quality of being unaccelerated, i.e., by the requirement that isolated
objects move uniformly in a straight line. It's also necessary to require the unique
simultaneity convention that renders mechanical inertia isotropic (the same in all spatial
directions), which amounts to the stipulation of equal one-way speeds for the propagation
of physically identical actions. These comments are fully applicable to the Newtonian
concept of space, time, and inertial reference frames. Given two objects in relative
motion we can define two systems of inertial coordinates in which the respective objects
are at rest, and we can orient these coordinates so the relative motion is purely in the x
direction. Let t,x and T,X denote these two systems of inertial coordinates. That such
coordinates exist is the main physical hypothesis underlying Galilean physics. An
auxiliary hypothesis, one that was not always clearly recognized, concerns the
relationship between two such systems of inertial coordinates, given that they exist.
Galileo assumed that if the coordinates x,t of an event are known, and if the two inertial
coordinate systems are the rest frames of objects moving with a relative speed of v, then
the coordinates of that event in terms of the other system (with suitable choice of origins)
are T = t, X = x − vt. Viewed in the abstract, this is a rather peculiar and asymmetrical
assumption, although it is admittedly borne out by experience - at least to the precision of
measurement available to Galileo. However, we now know, empirically, that the relation
between relatively moving systems of inertial coordinates has the symmetrical form
$T = (t - vx)/\sqrt{1 - v^2}$ and $X = (x - vt)/\sqrt{1 - v^2}$ when the time and space variables are
expressed in the same units such that the constant 3×10⁸ meters/second equals unity. It
follows that the one-way (not just the two-way) speed of light is invariant and isotropic
with respect to any and every system of inertial coordinates.
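To see the contrast with the Galilean rule concretely, here is a minimal sketch (in units
with c = 1, as in the text) applying both transformations to an event on a light path;
under the Galilean rule the one-way speed X/T comes out as 1 − v instead of 1:

```python
import math

def lorentz(t, x, v):
    # the symmetrical transformation quoted above, with c = 1
    r = math.sqrt(1 - v**2)
    return (t - v*x)/r, (x - v*t)/r

def galilei(t, x, v):
    return t, x - v*t

for v in (0.2, 0.5, 0.9):
    T, X = lorentz(1.0, 1.0, v)     # event on the light path x = t
    Tg, Xg = galilei(1.0, 1.0, v)
    print(X/T, Xg/Tg)               # 1.0 in every frame, vs. 1 - v
```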
The empirical content of this statement is simply that the propagation of light is isotropic
with respect to the same class of coordinate systems in terms of which mechanical inertia
is isotropic. This is consistent with the fact that light itself is an inertial phenomenon, e.g.,
it conveys momentum. In fact, the inertia of light can be seen as a common thread
running through three of the famous papers published by Einstein in 1905. In the paper
entitled "On a Heuristic Point of View Concerning the Production and Transformation of
Light" Einstein advocated a conception of light as tiny quanta of energy and momentum,
somewhat reminiscent of Newton's inertial corpuscles of light. It's clear that Einstein
already understood that the conception of light as a classical wave is incomplete. In the
paper entitled "Does the Inertia of a Body Depend on its Energy Content?" he explicitly
advanced the idea of light as an inertial phenomenon, and of course this was suggested by
the fundamental ideas of the special theory of relativity presented in the paper "On the
Electrodynamics of Moving Bodies".
The Galilean conception of inertial frames assumed that all such frames share a unique
foliation of spacetime into "instants". Thus the relation "in the present of" constituted an
equivalence relation across all frames of reference. If A is in the present of B, and B is in
the present of C, then A is in the present of C. However, special relativity makes it clear
that there are infinitely many distinct loci of inertial simultaneity through any given
event, because inertial simultaneity depends on the velocity of the worldline through the
event. The inertial coordinate systems do induce a temporal ordering on events, but only
a partial one. (See the discussion of total and partial orderings in Section 1.2.) With
respect to any given event we can still partition all the other events of spacetime into
distinct causal regions, including "past", "present" and "future", but in addition we have
the categories "future null" and "past null", and none of these constitute equivalence
classes. For example, it is possible for A to be in the present of B, and B to be in the
present of C, and yet A is not in the present of C. Being "in the present of" is not a
transitive relation.
It could be argued that a total unique temporal ordering of events is a more useful
organizing principle than the isotropy of inertia, and so we should adopt a class of
coordinate systems that provides a total ordering. We can certainly do this, as Einstein
himself described in his 1905 paper
To be sure, we could content ourselves with evaluating the time of events by
stationing an observer with a clock at the origin of the coordinates who assigns to
an event to be evaluated the corresponding position of the hands of the clock
when a light signal from that event reaches him through empty space. However,
we know from experience that such a coordination has the drawback of not being
independent of the position of the observer with the clock.
The point of this "drawback" is that there is no physically distinguished "origin" on
which to base the time coordination of all systems of reference, so from the standpoint of
assessing possible causal relations we must still consider the full range of possible
"absolute" temporal orderings. This yields the same partial ordering of events as does the
set of inertial coordinates, so the "total ordering" that we can achieve by imposing a
single temporal foliation on all frames of reference is only formal, and not physically
meaningful. Nevertheless, we could make this choice, especially if we regard the total
temporal ordering of events as a requirement of intelligibility. This seems to have been
the view of Lorentz, who wrote in 1913 about the comparative merits of the traditional
Galilean and the new Einsteinian conceptions of time
It depends to a large extent on the way one is accustomed to think whether one is
attracted to one or another interpretation. As far as this lecturer is concerned, he
finds a certain satisfaction in the older interpretations, according to which... space
and time can be sharply separated, and simultaneity without further specification
can be spoken of... one may perhaps appeal to our ability of imagining arbitrarily
large velocities. In that way one comes very close to the concept of absolute
simultaneity.
Of course, the idea of "arbitrarily large velocities" already pre-supposes a concept of
absolute simultaneity, so Lorentz's rationale is not especially persuasive, but it expresses
the point of view of someone who places great importance on a total temporal ordering,
even at the expense of inertial isotropy. Indeed one of Poincare's criticisms of Lorentz's
early theory was that it sacrificed Newton's third law of equal action and re-action. (This
can be formally salvaged by assigning the unbalanced forces and momentum to an
undetectable ether, but the physical significance of a conservation law that references
undetectable elements is questionable.) Oddly enough, even Poincare sometimes
expressed the opinion that a total temporal ordering would always be useful enough to
out-weigh other considerations, and that it would always remain a safe convention. The
approach taken by Lorentz and most others may be summarized by saying that they
sacrificed the physical principles of inertial relativity, isotropy, and homogeneity in order
to maintain the assumed Galilean composition law. This approach, although technically
serviceable, suffers from a certain inherent lack of conviction, because while asserting the
ontological reality of anisotropy in all but one (unknown) frame of reference, it
unavoidably requires us to disregard that assertion and arbitrarily assume one particular
frame as being "the" rest frame.
Poincare and Einstein recognized that in our descriptions of events in spacetime in terms
of separate space and time coordinates we're free to select our "basis" of decomposition.
This is precisely what one does when converting the description of events from one frame
to another using Galilean relativity, but, as noted above, the Galilean composition law
yields anisotropic results when applied to actual observations. So it appeared (to most
people) that we could no longer maintain isotropy and homogeneity in all inertial frames
together with the ability to transform descriptions from one frame to another by simply
applying the appropriate basis transformation. But Einstein realized this was too
pessimistic, and that the new observations were fully consistent with both isotropy in all
inertial frames and with simple basis transformations between frames, provided we adjust
our assumption about the effective metrical structure of spacetime. In other words, he
brilliantly discerned that Lorentz's anisotropic results totally vanish in the context of a
different metrical structure.
Even a metrical structure is conventional in a sense, because it relies on our ontological
premises. For example, the magnitude of the interval between two events may seem to be
one thing but actually be another, due (perhaps) to variations in our means of observation
and measurement. However, once we have agreed on the physical significance of inertial
coordinate systems, the invariance of the quantity $(dt)^2 - (dx)^2 - (dy)^2 - (dz)^2$ also becomes
physically significant. This shows the crucial importance of the very first sentence in
Section 1 of Einstein's 1905 paper:
Let us take a system of co-ordinates in which the equations of Newtonian
mechanics hold good.
Suitably qualified (as noted in Section 1.3), this immediately establishes not only the
convention of simultaneity, but also the means of operationally establishing it, and its
physical significance. Any observer in any state of inertial motion can throw two
identical particles in opposite directions with equal force (i.e., so there is no net
disturbance of the observer's state of motion), and the convention that those two particles
have the same speed suffices to fully specify an entire system of space and time
coordinates, which we call inertial coordinates. It is then an empirical fact - not a
definition, convention, assumption, stipulation, or postulate - that the speed of light is
isotropic in terms of inertial coordinates. This obviously doesn't imply that inertial
coordinates are "true" in any absolute sense, but the principle of inertia has proven to be
immensely powerful for organizing our knowledge of physical events, and for discerning
and expressing the apparent chains of causation.
If a flash of light emanates from the geometrical midpoint between two spatially separate
particles at rest in an inertial frame, the arrival times of the light wave at those two
particles are simultaneous in terms of that rest frame's inertial coordinates. Furthermore,
we find empirically that all other physical processes are isotropic with respect to those
inertial coordinates, e.g., if a sound wave emanates from the midpoint of a uniform steel
beam at rest in an inertial frame, the sound reaches the two ends simultaneously in accord
with this definition. If we adopt any other convention we introduce anisotropies in our
descriptions of physical processes, such as sound in a uniform stationary steel beam
propagating more rapidly in one direction than in the other. The isotropy of physical
phenomena - including the propagation of light - is strictly a convention, but it was not
introduced by special relativity, it is one of the fundamental principles which we use to
organize our knowledge, and it leads us to choose inertial coordinates for the description
of events. On the other hand, the isotropy of multiple distinct physical phenomena in
terms of inertial coordinates is not purely conventional, because those coordinates can be
defined in terms of just one of those phenomena. The value of this definition is due to the
fact that a wide variety of phenomena are (empirically) isotropic with respect to the same
class of coordinate systems.
Of course, it could be argued that all these phenomena are, in some sense, the same.
For example, the energy conveyed by electromagnetic waves has momentum, so it is an
inertial phenomenon, and therefore it is not surprising that the propagation of such energy
is isotropic in terms of inertial coordinates. From this point of view, the value of the
definition of inertial coordinates is that it reveals the underlying unity of superficially dissimilar phenomena, e.g., the inertia of energy. This illustrates that our conventions and
definitions are not empty, because they represent ways of organizing our knowledge, and
the efficiency and clarity of this organization depends on choosing conventions that
reflect the unity and symmetries of the phenomena. We could, if we wish, organize our
knowledge based on the assumption of a total temporal ordering of events, but then it
would be necessary to introduce a whole array of unobservable anisotropic "corrections"
to the descriptions of physical phenomena.
As we've seen, the principle of relativity constrains, but does not uniquely determine, the
form of the mapping from one system of inertial coordinates to another. In order to fix the
observable elements of a spacetime theory with respect to every member of the
equivalence class of inertial frames we require one further postulate, such as the
invariance of light speed (or the inversion symmetry discussed in Chapter 1.8). However,
we should distinguish between the strong and weak forms of the light-speed invariance
postulate. The strong form asserts that the one-way speed of light is invariant with respect
to the natural space-time basis associated with any inertial state of motion, whereas the
weak form asserts only that the round-trip speed of light is invariant. To illustrate the
different implications of these two different assumptions, consider an experiment of the
type conducted by Michelson and Morley in their efforts to detect a directional variation
in the speed of light, due to the motion of the Earth through the aether, with respect to
which the absolute speed of light was presumed to be referred. To measure the speed of
light along a particular axis they effectively measured the elapsed time at the point of
origin for a beam of light to complete a round trip out to a mirror and back. At first we
might think that it would be just as easy to measure the one-way speed of light, by simply
comparing the time of transmission of a pulse of light from one location to the time of
reception at another location, but of course this requires us to have clocks synchronized at
two spatially separate locations, whereas it is precisely this synchronization that is at
issue. Depending on how we choose to synchronize our separate clocks we can measure a
wide range of light speeds. To avoid this ambiguity, we must evaluate the time interval
for a transit of light at a single spatial location (in the coordinate system of interest),
which requires us to measure a round trip, just as Michelson and Morley did.
Incidentally, it might seem that Roemer's method of estimating the speed of light from the
variations in the period between eclipses of Jupiter's moons (see Section 3.3) constituted
a one-way measurement. Similarly people sometimes imagine that the one-way speed of
light could be discerned by (for example) observing, from the center of a circle, pulses of
light emitted uniformly by a light source moving at constant speed around the perimeter
of the circle. Such methods are indeed capable of detecting certain kinds of anisotropy,
but they cannot detect the anisotropy entailed by Lorentz's ether theory, nor any of the
other theories that are observationally indistinguishable from Lorentz's theory (which
itself is indistinguishable from special relativity). In any theory of this class, there is an
ambiguity in the definition of a circle in motion, because circles contract to ellipses in
the direction of motion. Likewise there is ambiguity in the definition of uniformlytimed pulses from a light source moving around the perimeter of a moving circle
(ellipse). The combined effect of length contraction and time dilation in a Lorentzian
theory is to render the anisotropies unobservable.
The empirical indistinguishability between the theories in this class implies that there is
no unambiguous definition of the one-way speed of light. We can measure without
ambiguity only the lapses of time for closed-loop paths, and such measurements cannot
establish the open-loop speed. The ambiguity in the one-way speed remains, because
over any closed loop, by definition, the net change in each and every direction is zero.
Hence it is possible to consistently interpret all observations based on the assumption of
non-isotropic light speed. Admittedly the resulting laws take on a somewhat convoluted
appearance, and contain unobservable parameters, but they can't be ruled out empirically.
To illustrate, consider a measurement of the round-trip speed of light, assuming light
travels at a constant speed c relative to some absolute medium with respect to which our
laboratory is moving with a speed v. Under these assumptions, we would expect a pulse
of light to travel with a speed c+v (relative to the lab) in one direction, and c−v in the
opposite direction. So, if we send a beam of light over a distance L out to a mirror in the
"c+v" direction, and it bounces back over the same distance in the "c−v" direction, the
total elapsed time to complete the round trip of length 2L is

$$\Delta t = \frac{L}{c+v} + \frac{L}{c-v} = \frac{2Lc}{c^2 - v^2}$$

Therefore, the average round-trip speed relative to the laboratory would be

$$\frac{2L}{\Delta t} = \frac{c^2 - v^2}{c} = c\left[1 - \left(\frac{v}{c}\right)^2\right]$$
This shows why a round-trip measurement of the speed of light would not be expected to
reveal any dependency on the velocity of the laboratory unless the measurement was
precise enough to resolve second-order effects in v/c. The ability to detect such small
effects was first achieved in the late 19th century with the development of precision
interferometry (exploiting the wave-like properties of light). The experiments of
Michelson and Morley showed that, despite the movement of the Earth in its orbit around
the Sun (to say nothing of the movement of the solar system, and even of the galaxy),
there was no (v/c)2 term in the round-trip speed of light. In other words, they found that
2L/t is always equal to c, at least to the accuracy they could measure, which was more
than adequate to rule out a second-order deviation. Thus we have a firm empirical basis
for asserting that the round-trip speed of light is independent of the motion of the source.
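To get a feel for the size of this second-order term, a quick computation (the arm length
here is merely illustrative, roughly the effective optical path of the 1887 apparatus):

```python
c = 2.998e8      # speed of light, m/s
v = 3.0e4        # Earth's orbital speed, m/s
L = 11.0         # illustrative arm length, m

t = L/(c + v) + L/(c - v)       # round trip along the motion
print(2*L/t / c)                # 1 - (v/c)^2, about 1 - 1e-8
print((v/c)**2)                 # the second-order deficit, ~1e-8
```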
This is the weak form of the invariant light speed postulate, but in his 1905 paper
Einstein asserted something stronger, namely, that we should adopt the convention of
regarding the one-way speed of light as invariant. This stronger postulate doesn't follow
from the results of Michelson and Morley, nor from any other conceivable experiment or
observation - but there is also no conceivable observation that could conflict with it. The
invariant round-trip speed of light fixes the observable elements of the theory, but it does
not uniquely determine the presumed ontological structure, because multiple different
interpretations can be made to fit the same set of appearances. The one-way speed of light
is necessarily an interpretative element of our experience.
To illustrate the ambiguity, notice that we can ensure a null result for the Michelson and
Morley experiment while maintaining non-constant light speed, merely by requiring that
the speeds v₁ and v₂ of light in the two opposite directions of travel (out and back) satisfy
the relation

$$\frac{1}{v_1} + \frac{1}{v_2} = \frac{2}{c}$$

In other words, a linear round-trip measurement of light speed will yield the constant c in
every direction provided only that the harmonic mean of the one-way speeds in opposite
directions always equals c. This is easily accomplished by defining the one-way velocity
v₁ as a function of direction arbitrarily for all directions in one hemisphere, and then
setting the velocities v₂ in the opposite directions as v₂ = cv₁/(2v₁ − c). However, we
also wish to cover more complicated round-trips,
rather than just back and forth on a single line. To ensure that a circuit of light around an
equilateral triangle with edges of length L yields a round-trip speed of c, the speeds v₁, v₂,
v₃ in the three equally spaced directions must satisfy

$$\frac{1}{v_1} + \frac{1}{v_2} + \frac{1}{v_3} = \frac{3}{c}$$

so again we see that the light speeds must have a harmonic mean of c. In general, to
ensure that every closed loop of light, regardless of the path, yields the average speed c,
it's necessary (and also sufficient) to have light speed $v = C(\theta)$ as a function of the angle θ
in a principal plane such that, for any positive integer n,

$$\sum_{j=1}^{n} \frac{1}{C(\theta + 2\pi j/n)} = \frac{n}{c}$$

In units with c = 1, we need the n terms on the left side to sum to n, so the velocity
function must be such that $1/C(\theta) = 1 + f(\theta)$ where the function $f(\theta)$ satisfies

$$\sum_{j=1}^{n} f(\theta + 2\pi j/n) = 0$$
for all θ. The canonical example of such a function is simply f(θ) = k cos(θ) for any
constant k. Thus if we postulate that the speed of light varies as a function of the angle θ
of travel relative to some primary axis according to the equation

$$C(\theta) = \frac{c}{1 + k\cos(\theta)} \qquad (1)$$
then we are assured that all closed-loop measurements of the speed of light will yield the
constant c, despite the fact that the one-way speed of light is distinctly non-isotropic (for
non-zero k). This equation describes an ellipse, and no measurement can disprove the
hypothesis that the one-way speed of light actually is (or is not) given by (1). It is, strictly
speaking, a matter of convention. If we choose to believe that light has the same speed in
all directions, then we assume k = 0, and in order to send a synchronizing signal to two
points we would locate ourselves midway between them (i.e., at the location where round
trips between ourselves and those two points take the same amount of time). On the other
hand, if we choose to believe light travels twice as fast in one direction as in the other,
then we would assume k = 1/3, and we would locate ourselves 2/3 of the way between
them (i.e., twice as far from one as the other, so round trip times are two to one). The
latter case is illustrated in the figure below.

[Figure: two clocks synchronized under the k = 1/3 convention, with the synchronizing
observer located two-thirds of the way between them]
Regardless of what value we assume for k (in the range from −1 to +1), we can
synchronize all clocks according to our belief, and everything will be perfectly consistent
and coherent. Of course, in any case it's necessary to account consistently for the lapse of
time for information to get from one clock to another, but the lapse of time between any
two clocks separated by a distance L can be anything we choose in the range from
virtually 0 to 2L/c. The only real constraint is that the speed be an elliptical function of
the direction angle.
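As a numerical check of the closed-loop property, the following sketch (with an arbitrary
value of k and an arbitrary closed polygon standing in for the light path) accumulates the
transit time leg by leg using the elliptical speed function (1); the round-trip average
always comes out to c, because the k·cos(θ) contributions sum to the net displacement,
which vanishes around any closed loop:

```python
import math, random

c, k = 1.0, 0.4                              # illustrative values
speed = lambda th: c/(1 + k*math.cos(th))    # equation (1)

pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]
length = time = 0.0
for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
    L = math.hypot(x1 - x0, y1 - y0)
    th = math.atan2(y1 - y0, x1 - x0)        # direction of this leg
    length += L
    time += L/speed(th)

print(length/time)                           # -> 1.0 = c, for any closed path
```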
The velocity profile given by (1) is simply the polar equation of an ellipse (or an
ellipsoid, if revolved about the major axis), with the pole at one focus, the semi-latus
rectum equal to c, and the eccentricity equal to k. This is just the projection of the ellipse
given by cutting the light cone with an oblique plane. Interestingly, there are really two
light cones that intersect on this plane, and they are the light cones of the two
(timelike-separated) events whose projections are the two foci of the ellipse. Recall that
all rays emanating from one focus of an ordinary ellipse and reflecting off the ellipse will
re-converge on the other focus, and that this kind of ray optics is time-symmetrical. In
this context our projective ellipse is the intersection of two null-cones, i.e., it is the locus
of all points in spacetime that are null-separated from both of the "foci events". This was
to be expected in view of the time-symmetry of Maxwell's equations (not to mention the
relativistic Schrödinger equation), as discussed in Section 9.
Our main reason for assuming k = 0 is our preference for symmetry, simplicity, and
consistency with inertial isotropy. Within our empirical constraints, k can be interpreted
as having any value between -1 and +1, but the principle of sufficient reason suggests that
it should not be assigned a non-zero value in the absence of any rational justification.
Nevertheless, it remains a convention (albeit a compelling one), but we should be clear
about what precisely is and what is not conventional. The invariance of lightspeed is a
convention, but the invariance of lightspeed in terms of inertial coordinates is an
empirical fact, and this empirical fact is not a formal tautology, because inertial
coordinates are determined by the mechanical inertia of material objects, independent of

the propagation of light.


Recall that Einstein's 1905 paper states that if a pulse of light is emitted from an
unaccelerated clock at time t₁, and is reflected off some distant object at time t₂, and is
received back at the original clock at time t₃, then the inertial coordinate synchronization
is given by stipulating that

$$t_2 = t_1 + \tfrac{1}{2}(t_3 - t_1)$$

Reichenbach noted that the formally viable simultaneity conventions correspond to the
assumption

$$t_2 = t_1 + \epsilon(t_3 - t_1)$$

where ε is any constant in the range from 0 to 1. This describes the same class of
elliptical speed conventions as discussed above, with ε = (k+1)/2 where k ranges from
−1 to +1. The corresponding coordinate transformation is a simple time skew, i.e., x′ = x,
y′ = y, z′ = z, t′ = t + kx/c. This describes the essence of the Lorentzian absolutist
interpretation of special relativity. Beginning with the putative absolute rest frame inertial
coordinates x,t, Lorentz associates with each state of motion v a system of coordinates
x′,t′ related to x,t by a Galilean transformation with parameter v. In other words, x′ = x −
vt and t′ = t. He then re-scales the x′,t′ coordinates to account for what he regards as the
physical contraction of the lengths of stable objects and the slowing of the durations of
stable physical processes, to arrive at the coordinates $x'' = x'/\sigma$ and $t'' = \sigma t'$ where
$\sigma = (1 - v^2/c^2)^{1/2}$. These he regards as the proper rest frame coordinates for objects
moving with speed v in terms of the absolute frame. There is nothing logically
unacceptable about these coordinate systems, but we must realize that they do not
constitute inertial coordinate systems in the full sense. Mechanical inertia and the speed
of light are not isotropic in terms of such coordinates, precisely because the time foliation
(i.e., the simultaneity convention) is skewed relative to the ε = 1/2 convention.
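The effect of such a skew is easy to exhibit numerically; in the sketch below (units with
c = 1, and an illustrative value of k) the one-way speeds of a light pulse become c/(1+k)
and c/(1−k) in the skewed coordinates, while the round-trip average remains exactly c:

```python
c, k, L = 1.0, 0.3, 1.0

skew = lambda t, x: (t + k*x/c, x)    # t' = t + kx/c, x' = x

# emission, reflection at x = L, and return, in inertial coordinates
(t0, x0), (t1, x1), (t2, x2) = [skew(*e) for e in ((0, 0), (L/c, L), (2*L/c, 0))]

print((x1 - x0)/(t1 - t0))   # outbound:   c/(1+k)
print((x2 - x1)/(t2 - t1))   # return:    -c/(1-k)
print(2*L/(t2 - t0))         # round-trip average: exactly c
```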
If we begin with the inertial rest frame coordinates for the state of motion v (which
Lorentz and Einstein agree are related to the putative absolute rest frame coordinates by a
Lorentz transformation), and then apply the time skew transformation with parameter k =
−v/c, we arrive at these Lorentzian rest frame coordinates. Needless to say, our choice of
coordinate systems does not affect the outcome of any physical measurement, except that
the outcome will be expressed in different terms. For example, by the Einsteinian
convention the speed of light is isotropic in terms of the rest frame coordinates of any
material object, whereas by the Lorentzian convention it is not. This difference is simply
due to different definitions of rest frame coordinates. If we specify inertial coordinate
systems (i.e., coordinates in terms of which inertia is isotropic and Newton's laws are
quasi-statically valid) then there is no ambiguity, and both Lorentz and Einstein agree that
the speed of light is isotropic in terms of all inertial coordinate systems.
In subsequent sections we'll see that the standard formalism of general relativity provides
a convenient means of expressing the relations between spacetime events with respect to
a larger class of coordinate systems, so it may appear that inertial references are less
significant in the general theory. In fact, Einstein once hoped that the general theory
would not rely on the principle of inertia as a primitive element. However, this hope was
not fulfilled, and the underlying physical basis of the spacetime manifold in general
relativity remains the set of primitive inertial paths (geodesics) through spacetime. Not
only do these inertial paths determine the equivalence class of allowable coordinate
systems (up to diffeomorphism), it even remains true that at each event we can construct
a (local) system of inertial coordinates with respect to which the speed of light is c in all
directions. Thus the empirical fact of lightspeed invariance and isotropy with respect to
inertial coordinates remains as a primitive component of the theory. The difference is that
in the general theory the convention of using inertial coordinates is less prevalent,
because in general there is no single global inertial coordinate system, and non-inertial
coordinate systems are often more convenient on a curved manifold.
4.6 The Field of All Fields
Classes and concepts may be conceived as real objects, existing
independently of our definitions and constructions. It seems to me that the
assumption of such objects is quite as legitimate as the assumption of
physical bodies, and there is quite as much reason to believe in their
existence.
Kurt Gödel, 1944
Where is the boundary between the special and general theories of relativity? It is
sometimes said that any invocation of "general covariance" implies general relativity, but
just about any theory can be expressed in a generally covariant form, so this doesn't even
distinguish between general relativity and Newtonian physics, let alone special relativity.
For example, it's perfectly possible to simply transform the special relativistic solution of
a rotating platform into some arbitrary accelerated coordinate system, and although the
result is ugly, it is no less (or more) valid than when it was expressed in terms of nonaccelerating coordinates, because the transformation from one stipulated set of
coordinates to another has no physical content. The key word there is "stipulated",
because the real difference between the special and general theories is in what they take
for granted.
In a sense, special relativity is analogous to "naive set theory" in mathematics. By this I
mean that special relativity is based on certain plausible-sounding premises which
actually are quite serviceable for treating a wide class of problems, but which on close
examination are susceptible to self-referential antinomies. This is most evident with
regard to the assumption of the identifiability of inertial frames. As Einstein remarked,
"in the special theory of relativity there is an inherent epistemological defect", namely,
that the preferred class of reference frames on which the theory relies is circularly
defined. Special relativity asserts that the lapse of proper time between two
(timelike-separated) events is greatest along the inertial worldline connecting those two events - a
seemingly interesting and useful assertion - but if we ask which of the infinitely many
paths connecting those two events is the "inertial" one, we can only answer that it is the
one with the greatest lapse of proper time. If we simply accept this uncritically, and are
willing to naively rely on the testimony of accelerometers as unambiguous indicators of
"inertia", we have a fairly solid basis on which to do physics, and we can certainly work
out correct answers to many questions. However, the epistemological defect was
worrisome to Einstein, and caused him (in a remarkably short time) to abandon special
relativity and global Lorentz invariance as a suitable conceptual framework for the
formulation of physics.
The naive reliance on accelerometers as unambiguous indicators of global inertia in the
context of special relativity is immediately undermined by the equivalence principle,
because we're then required to predicate any application of special relativity on the
absence (or at least the negligibility) of irreducible gravitational fields, and this condition
is simply not verifiable within special relativity itself, because of the circularity in the
principle of inertia. This circularity genuinely troubled Einstein, and was one of the major
motivations (along with the problem of reconciling mass-energy equivalence with the
Equivalence Principle) that led him to abandon special relativity.
Given the recognized limitations of special relativity, and considering how successfully it
was generalized and extended in 1916, we may wonder why it's even necessary to
continue carrying along the special theory as a conceptually distinct entity. Will this
duality persist indefinitely, or will we eventually just say there is a single theory of
relativity (the theory traditionally called general relativity), which subsumes and extends
the earlier theory called special relativity? The reluctance to discard the special theory as
a separate theory may be due largely to the fact that it represents a simple and
widely-applicable special case of the general theory, and it's convenient to have a name for this
limiting case. (There are, however, many cases in which the holistic approach of the
general theory is actually much simpler than the traditional
special-theory-plus-general-corrections approach.) Another reason that's sometimes mentioned is the (remote)
possibility that Einstein's general relativity is not the "right" generalization/extension of
the special theory. For example, if observation were ever to conclusively rule out the
existence of gravitational waves (which is admittedly hard to imagine in view of the
available binary star data), it might be necessary to seek another framework within which
to place the special theory. In this sense, we might regard special relativity as roughly
analogous to set theory without the axiom of choice, i.e., a restricted and less ambitious
theory that avoids making use of potentially suspect concepts or premises.
However, it's hard to say exactly which of the fundamental principles of general relativity
is considered to be suspect. We've seen that "general covariance" is a property of almost
any theory, so that can't be a problem. We might doubt the equivalence principle in one or
more of its various flavors, but it happens to be one of the most thoroughly tested
principles in physics. It seems most likely that if general relativity fails, it would be
because one or more of its "simplicities" is inappropriate. For example, the restriction to
2nd order, or the assumption of Riemannian metrics rather than, say, Finsler metrics, or
the naive assumption of $R^4$ topology, or maybe even the basic assumption of a continuum.
Still, each of these would also have conceptual implications for the special theory, so
these aren't valid reasons for continuing to regard special relativity as a separate theory.
Suppose we naively superimpose special relativity on Newtonian physics, and adopt a
naive definition of "inertial worldline", such as a worldline with no locally sensible
acceleration. On that basis we find that there can be multiple distinct "inertial" worldlines
connecting two given events (e.g., intersecting elliptical orbits of different eccentricities),
which conflicts with the special relativistic principle of a unique inertial interval between
any pair of timelike separated events. To press the antinomy analogy further, we could
arrange to have special relativity conclude that each of these worldlines has a lesser lapse
of proper time than each of the others. (If the barber shaves everyone who doesn't shave
himself, who shaves the barber?) Of course, with special relativity (as with set theory) we
can easily block such specific conundrums - once they are pointed out - by imposing one
or more restrictions on the definition of "inertial" (or the definition of a "set"), and in so
doing we make the theory somewhat less naive, but the experience raises legitimate
questions about whether we can be sure we have blocked all possible escapes.
We shouldn't push the analogy too far, since there are obvious differences between a
purely mathematical theory and a physical theory, the latter being exposed to potential
conflict with a much wider class of "external" constraints (such as the requirement to
possess a consistent mapping to a representation of experience). However, when
considering naive set theory's assumption of the existence of sets, and its assertions about
how to manipulate and reason with sets, all in the absence of a comprehensive criterion of
how to identify what can legitimately be called a set, there is an interesting parallel with
special relativity's assumption of the existence of inertial frames and how to reason with
them and in them, all in the absence of a comprehensive framework for deciding what
does and what does not constitute an inertial frame.
It might be argued that relativity is a purely formalistic theory, which simply assumes an
inertial frame is specified, without telling how to identify one. Certainly we can
completely insulate special relativity from any and all conflict by simply adopting this
strategy, i.e., asserting that special relativity avers no mapping at all between its elements
and the objects of our experience. However, although this strategy effectively blocks
conflict, it also renders the theory quite unfalsifiable and phenomenologically otiose.
Even recognizing the distinction between logical inconsistency and empirical
falsification, we must also remember that the rules of logic and reason are ultimately
grounded in "observations", albeit of a very abstract nature, and mathematical theories no
less than physical theories are attempts to formalize "observations". As such, they are
comparably subject to upset when they're found to conflict with other observations (e.g.,
barbers, gravity, etc.).
It might be argued that we cannot really attribute any antinomies to special relativity,
because the cases noted above (multiply intersecting elliptical orbits, etc) arise only from
attempting to apply special relativistic reasoning to a class of entities for which it is not
suited. However, the same is true of naive set theory, i.e., it works perfectly well when
applied to a wide class of sets, but leads to logically impossible conclusions if we attempt
to apply it to a class of sets that "act on themselves"... just as gravity is found to act on
itself in the general theory. In a real sense, gravity in general relativity is a self-referential
phenomenon, as revealed by the non-linearity of the field equations. Notice that our
antinomies in the special theory arise only when trying to reason with "self-referential
inertial frames", i.e., in the presence of irreducible gravitational fields.
The basic point is that although special relativity serves as the local limiting case of the
general theory, it is not able to stand alone, because it cannot identify the applicability of
its premises, which renders it incapable of yielding definite macroscopic conclusions
about the physical world. By placing all the necessary indefinite qualifiers on the scope
of applicability, we effectively remove special relativity from the set of physical theories.
This just re-affirms the point that any application of special relativity is, strictly speaking,
legitimized only within the context of the general theory, which provides the framework
for assessing the validity of the application. One can, of course, still practice the special
theory from a naive standpoint, and be quite successful at it, just as one can practice naive
set theory without running into trouble very often. Naturally none of this implies that
special relativity, by itself, is unfalsifiable. Indeed it is falsifiable, but only when
superimposed on some other framework (such as Newtonian physics) and combined with
some auxiliary assumptions about how to identify inertial frames. In fact, the special
theory of relativity is not only falsifiable, it is falsified, and was superseded in 1916 by a
superior and more comprehensive theory. Nevertheless, strict epistemological scruples
don't have a great deal of relevance to the actual day-to-day practice of science.
From a more formal standpoint, it's interesting to consider the correspondence between
the foundations of set theory and the theories of relativity. The archetypal example of a
problematic concept in naive set theory was the notion of the "set of all sets". It soon
became apparent to Cantor, Russell, and other mathematicians that this plausible-sounding notion could not consistently be treated as a set in the usual sense. The problem
was recognized to be the self-referential nature of the concept. We can compare this to
the general theory of relativity, which is compelled by the equivalence principle to
represent the metric of spacetime as (so to speak) "the field of all fields". To make this
more precise, recall that Newtonian gravity can be represented by a scalar field defined
over a pre-existing metrical space, whose metric we may denote as g. The vacuum field
equation is L_g(φ) = 0, where φ is this scalar field and L_g signifies the Laplacian operator
over the space with the fixed metric g. In general relativity the Laplacian is replaced by a
more complicated operator R_g which, like the Laplacian, is effectively a differential
operator whose components are evaluated on the spacetime with the metric g. However,
in general relativity the field on which R_g operates is nothing but the spacetime metric g
itself. In other words, the vacuum field equations are R_g(g) = 0. The entity R_g(g) is called
the Ricci tensor in differential geometry, usually denoted in covariant form as R_μν.
This highlights the essentially self-referential nature of the Einstein field equations, as
opposed to the Newtonian field equations where the operator and the field being operated
on are completely independent entities. It's interesting to compare this situation to
schematic representations of Goedel's formalization of arithmetic, leading to his proof of
the Incompleteness Theorem. Given a well-defined mapping between single-variable
propositional statements and the natural numbers (which Goedel showed is possible,
though far from trivial), let P_n(w) denote the nth statement applied to the variable w.
Since every possible proposition maps to some natural number, there is a natural number
k such that P_k(w) represents the proposition that P_w(w) has no proof. But then what
happens if we set the variable w equal to k? We see that P_k(k) represents the proposition
that there is no proof of P_k(k), from which it follows that if there is no proof of P_k(k) then
P_k(k) is true, whereas if there is a proof of P_k(k) then P_k(k) is false. Hence, assuming our
system of arithmetic is self-consistent, so that it doesn't contain proofs of false
propositions, we must conclude that P_k(k) is true but unprovable. Obviously the negation
of P_k(k) must also be unprovable, assuming our arithmetic is consistent, so the
proposition is strictly undecidable within the formal system encoded by our numbering
scheme.
The analogy between Goedel propositions P_k(k) and the field equations of general
relativity R_g(g) = 0 should not be pressed too far, but it does hint at the real and profound
subtleties that can arise when we allow self-referential statements. It's interesting that
Einstein seems to have been mindful very early of the eventual necessity of such
statements, although he deferred it for quite some time. Prior to 1905 many physicists
were attempting to construct a purely electromagnetic theory of matter based on
Maxwell's equations, according to which "the particle would be merely a domain
containing an especially high density of field energy". However, in presenting the special
theory of relativity Einstein carefully avoided proposing any particular theory as to the
ultimate structure of matter, and showed that a purely kinematical interpretation could
account for the relation between energy and inertia. He took this approach not because he
was uninterested in the nature of matter, but because he recognized immediately that
Maxwell's equations did not permit the derivation of the equilibrium of the
electricity that constitutes a particle. Only different, nonlinear field equations
could possibly accomplish such a thing. But no method existed for discovering
such field equations without deteriorating into adventurous arbitrariness.
So in 1905 Einstein took the more conservative route and merely(!) redefined the
traditional concepts of time and space. A few years later he himself embarked on an
adventure leading ultimately in 1915 to the non-linear field equations of general
relativity, but even in this he managed to make important progress by sidestepping again
the question of the ultimate constituency of matter and light. As he recalled in his
Autobiographical Notes:
It seemed hopeless to me at that time to venture the attempt of representing the
total field [as opposed to the pure gravitational field] and to ascertain field laws
for it. I preferred, therefore, to set up a preliminary formal frame for the
representation of the entire physical reality; this was necessary in order to be able
to investigate, at least preliminarily, the effectiveness of the basic idea of general
relativity.

In his later years it seems Einstein had decided he had made all the progress that could be
made on this preliminary basis, and set about the attempt to represent the total field. He
wrote the above comments in 1949, after a quarter-century of fruitless efforts to discover
the non-linear equations for the "total field", including electromagnetism and matter, so
he knew only too well the risks of deteriorating into adventurous arbitrariness.

4.7 The Inertia of Twins


We have no direct intuition of simultaneity, nor of the equality of two
durations. People who believe they possess this intuition are dupes of an
illusion... The simultaneity of two events, the order of their succession,
and the equality of two durations, are to be so defined that the enunciation
of the natural laws may be as simple as possible.
Poincaré, The Value of Science, 1905
The most commonly discussed "paradox" associated with the theory of relativity
concerns the differing lapses of proper time along two different paths between two fixed
events. This is often expressed in terms of a pair of twins, one moving inertially from
event A to event B, and the other moving inertially from event A to an intermediate event
M, where he changes his state of motion, and then moves inertially from M to B, where it
is found that the total elapsed time of the first twin exceeds that of the second. Much of
the popular confusion over this sequence of events is simply due to specious reasoning.
For example, if x,t and x',t' denote inertial rest frame coordinates respectively of the first
and second twin (on either the outbound or inbound leg of his journey), some people are
confused by the elementary fact that if those two coordinate systems are related
according to the Lorentz transformation, then the partials (∂t′/∂t)_x and (∂t/∂t′)_x′ both have
the same value. (For example, the unfortunate Herbert Dingle spent his retirement years
on a pitiful crusade to convince the scientific community that those two partial
derivatives must be the reciprocals of each other, and that therefore special relativity is
logically inconsistent.) Other people struggle with the equally elementary algebraic fact
that the proper time along any given path between two events is invariant under arbitrary
Lorentz transformations. The inability to grasp this has actually led some eccentrics to
waste years in a futile effort to prove special relativity inconsistent by finding a Lorentz
transformation that does not leave the proper time along some path invariant.
Despite the obvious fallacies underlying these popular confusions, and despite the
manifest logical consistency of special relativity, it is nevertheless true that the so-called
twins paradox, interpreted in a more profound sense, does highlight a fundamental
epistemological shortcoming of the principle of inertia, on which both Newtonian
mechanics and special relativity are based. Naturally if we simply stipulate that one of the
twins is in inertial motion the entire time and the other is not, then the resolution of the
"paradox" is trivial, but the stipulation of "inertial motion" for one of the twins begs the
very question that motivates the paradox (in its more profound form), namely, how are
inertial worldlines distinguished from the set of all possible worldlines? In a sense, the
only answer special relativity can give is that the inertial worldline between two events is
the one with the greatest lapse of proper time, which is clearly of no help in resolving
which of the twins' worldlines is "inertial", because we don't know a priori which twin
has the greater lapse of proper time - that's what we're trying to determine!
This circularity in the definition of inertia and the inability to justify the privileged
position held by inertial worldlines in special relativity were among the problems that led
Einstein in the years following 1905 to seek a broader and more coherent context for the
laws of physics. The same kind of circular reasoning arises whenever we critically
examine the concept of inertia. For example, when trying to decide if our region of
spacetime is really flat, so that "straight lines" exist, we face the same difficulty. As
Einstein said:
The weakness of the principle of inertia lies in this, that it involves an argument in
a circle: a mass moves without acceleration if it is sufficiently far from other
bodies; we know that it is sufficiently far from other bodies only by the fact that it
moves without acceleration.
We could equally well substitute [has the greatest lapse of proper time] for [is sufficiently
far from other bodies]. In either case the point is the same: special relativity postulates the
existence of inertial frames and assigns to them a preferred role, but it gives no a priori
way of establishing the correct mapping between this concept and anything in reality.
This is what Einstein was referring to when he said "In classical mechanics, and no less
in the special theory of relativity, there is an inherent epistemological defect...". He
illustrates this with a famous thought experiment involving two relatively spinning
globes, discussed in Section 4.1. (The term "thought experiment" might be regarded as
an oxymoron, since the epistemological significance of an experiment is its empirical
quality, which a thought experiment obviously doesn't possess. Nevertheless, it's
undeniable that scientists have made good use of this technique - along with occasionally
making bad use of it.) The puzzling asymmetry of the spinning globes is essentially just
another form of the twins paradox, where the twins separate and re-converge (one
accelerates away and back while the other remains stationary), and they end up with
asymmetric lapses of proper time. How can the asymmetry be explained? In 1916
Einstein thought that
The only satisfactory answer must be that the physical system consisting of S_1 and
S_2 reveals within itself no imaginable cause to which the differing behavior of S_1
and S_2 can be referred. The cause must therefore lie outside the system. We have
to take it that the general laws of motion...must be such that the mechanical
behavior of S1 and S2 is partly conditioned, in quite essential respects, by distant
masses which we have not included in the system under consideration.
It should be noted that the strongly Machian attitude conveyed by this passage was
subsequently tempered for Einstein when he realized that in the general theory of

relativity it may be necessary to attribute the "essential conditioning" to boundary
conditions rather than distant masses. Nevertheless, this quotation serves to demonstrate
how seriously Einstein took the question, which, of course, is as applicable to the twins
paradox as it is to the two-globe paradox.
The above "weighty argument from the theory of knowledge" was the first reason cited
by Einstein (in 1916) for the need to go beyond special relativity in order to arrive at a
suitable conceptual framework. The second reason was the apparent impossibility of
doing justice, within the context of special relativity, to the equivalence principle relating
gravitation and acceleration. The first of these reasons bears most directly on the twins
paradox, although the problem of reconciling acceleration with gravity inevitably enters
the picture as well, since we can't avoid the issue of gravitation as soon as we
contemplate acceleration, assuming we accept the equivalence principle. From these
considerations it's clear that special relativity could never have been more than a
transitional theory, since it was not comprehensive enough to justify its own conclusions.
The question of whether general relativity is required to resolve the twins paradox has
long been a subject of spirited debate. On one hand, Einstein wrote a paper in 1918 to
explain how the general theory accounts for the asymmetric aging of the twins by means
of the gravitational fields that appear with respect to accelerated coordinates attached to
the traveling twin, and Max Born recounted this analysis in a popular book, concluding
that "the clock paradox is due to a false application of the special theory of relativity,
namely, to a case in which the methods of the general theory should be applied". On the
other hand, many people object vigorously to any suggestion that special relativity is
inadequate to satisfactorily resolve the twins paradox. Ultimately the answer depends on
what sort of satisfaction is being sought, viz., on whether the paradox is being presented
as a challenge to the consistency of special relativity (as is Dingle's fallacy) or to the
completeness of special relativity. If we're willing to accept uncritically the existence and
identifiability of inertial frames, and their preferred status, and if we are willing to
exclude any consideration of gravity or the equivalence principle, then we can reduce the
twins paradox to a trivial exercise in special relativity. However, if it is the completeness
(rather than the consistency) of special relativity that is at issue, then the naive acceptance
of inertial frames is precisely what is being challenged. In this context, we can hardly
justify the exclusion of gravitation, considering that the very same metrical field which
determines the inertial worldlines also represents the gravitational field.
Notice that the typical statement of the twins paradox does not stipulate how the galaxies
in the universe along with the cosmological boundary conditions that determine the
metrical field are dynamically configured relative to the twins. If every galaxy in the
universe were moving in tandem with the "traveling twin", which (if either) of the
twins' reference frames would be considered inertial? Obviously special relativity is
silent on this point, and even general relativity does not give an unequivocal answer.
Weinberg asserts that "inertial frames are determined by the mean cosmic gravitational
field, which is in turn determined by the mean mass density of the stars", but the second
clause is not necessarily true, because the field equations generally require some
additional information (such as boundary conditions) in order to yield definite results.

The existence of cosmological models in which the average matter of the universe rotates
(a fact proven by Kurt Goedel) shows that even general relativity is incomplete, in the
sense that it is subject to global conditions with considerable freedom. General relativity
may not even give a unique field for a given (non-spherically symmetric) set of boundary
conditions and mass distribution, which is not surprising in view of the possibility of
gravitational waves. Thus even if we sharpen the statement of the twins paradox to
specify how the twins are moving relative to the rest of the matter in the universe, the
theory of relativity still doesn't enable us to say for sure which twin is inertial.
Furthermore, once we recognize that the inertial and gravitational field are one and the
same, the twins paradox becomes even more acute, because we must then acknowledge
that within the theory of relativity it's possible to contrive a situation in which two
identical clocks in identical local circumstances (i.e., without comparing their positions to
any external reference) can nevertheless exhibit different lapses in proper time between
two given events. The simplest example is to place the twins in intersecting orbits, one
circular and the other highly elliptical. Each twin is in freefall continuously between their
periodic meetings, and yet they experience different lapses of proper time. Thus the
difference between the twins is not a consequence of local effects; it is a global effect. At
any point along those two geodesic paths the local physics is identical, but the paths are
embedded differently within the global manifold, and it is the different embedding within
the manifold that accounts for the difference in proper time. (The same point can be
made by referring to a flat cylindrical spacetime.) This more general form of the twins
paradox compels us to abandon the view that physical phenomena are governed solely by
locally sensible influences. (Notice, however, that we are forced to this conclusion not by
logical contradiction, but only by our philosophical devotion to the principle of sufficient
cause, which requires us to assign like physical causes to like physical effects.) Likewise
the identification of gravity with local spacetime curvature is untenable, as shown by the
fact that a suitable arrangement of gravitating masses can produce an extended region of
flat spacetime in which the metrical field is nevertheless accelerating in the global sense,
and we surely would not regard such a region as free of gravitation.
It is fundamentally misguided to exercise such epistemological concerns within the
framework of special relativity, because special relativity was always a provisional theory
with recognized epistemological short-comings. As mentioned above, one of Einstein's
two main reasons for abandoning special relativity as a suitable framework for
physics was the fact that, no less than Newtonian mechanics, special relativity is based on
the unjustified and epistemologically problematical assumption of a preferred class of
reference frames, precisely the issue raised by the twins paradox. Today the "special
theory" exists only (aside from its historical importance) as a convenient set of widely
applicable formulas for important limiting cases of the general theory, but the
phenomenological justification for those formulas can only be found in the general
theory.
This is true even if we posit the absence of gravitational effects, because the question at
issue is essentially the origin of inertia, i.e., why one worldline is inertial while another is
not, and the answer unavoidably involves the origin and significance of the background
metric, even in the absence of curvature. The special theory never claimed, and was never
intended, to address such questions. The general theory attempts to provide a coherent
framework within which to answer such questions, but it's not clear whether the attempt
is successful. The only context in which general relativity can give (at least arguably) a
complete explanation of inertia is a closed, finite, unbounded cosmology, but the
observational evidence doesn't (at present) clearly support this hypothesis, and any
alternative cosmology requires some principle(s) outside of general relativity to
determine the metrical configuration of the universe.
Thus the twins paradox is ultimately about the origin and significance of inertia, and the
existence of a definite metrical structure with a preferred class of worldlines (geodesics).
In the general theory of relativity, spacetime is not simply the totality of all the relations
between material objects. The spacetime metric field is endowed with its own ontological
existence, as is clear from the fact that gravity itself is a source of gravity. In a sense, the
non-linearity of general relativity is an expression of the ontological existence of
spacetime itself. In this context it's not possible to draw the classical distinction between
relational and absolute entities, because spatio-temporal relations themselves are active
elements of the theory.
We should also mention another common objection to the relativistic treatment of the
twins, based not on any empirical disagreement, but on linguistic and metaphysical
preferences. It is pointed out that we can, without logical contradiction, posit the
existence of a unique, absolute, and true metaphysical time at every location, and we can
account for the differences between the elapsed times on clocks that have followed
different paths simply by stipulating that the rate of a clock depends on its absolute state
of motion (defined relative to, for instance, the local frame in which the presumably
global cosmic background radiation is maximally isotropic). Indeed this was essentially
the view advocated by Lorentz. However, as discussed at the end of Section 1.5,
postulating a metaphysical truth along with whatever physical laws are necessary to
account for why the observed facts differ from the postulated truth is not generally
useful, except as a way of artificially reconciling our experience with any particular
metaphysical truth that we might select. The relativistic point of view is based on purely
local concepts, such as that of an ideal clock corrected for all locally sensible
conditions, recommended to us by the empirical fact that all observable aspects of local
physical phenomena including the rates of temporal progression exhibit the same
dependence on their state of inertial motion (which is not a locally sensible condition).
This is the physical symmetry presented to us, and we are certainly justified in exploiting
this symmetry to simplify and clarify the enunciation of physical laws.
4.8 The Breakdown of Simultaneity
I have yielded: Instruct my daughter how she shall
persever, that time and place with this deceit so
lawful may prove coherent.
William Shakespeare, 1603

We've seen how the operational time convention enables us to define surfaces of
simultaneity with respect to any given inertial frame. However, if we try to apply this
procedure to a set of accelerating bodies the concept breaks down. The problem is
illustrated in the spacetime diagram shown below.

This drawing shows a family of worldlines, each having the identical history of velocity
as a function of time relative to the inertial coordinates. By sending light beams back and
forth to its neighboring worldlines, an observer following path B can determine that he is
equidistant from A and C. Likewise an observer on C is equidistant between B and D, and
an observer on D is equidistant from C and E. However, due to the change in velocity of
these worldlines, an observer on C cannot conclude that he is equidistant from A and E.
This breakdown of the well-defined locus of simultaneity is unavoidable in accelerating
systems, because the operational procedure defining simultaneity involves a non-zero
lapse of time for spatially separate objects, so the simultaneity relations change during the
performance of the procedure. Of course, the greater the distance between objects, the
greater the change in velocity (and simultaneity relations) during the performance of a
synchronization procedure.
Another illustration of this problem is shown below, where the instantaneous loci of
simultaneity of an abruptly accelerated worldline are seen to intersect each other (on the
left), so that a given distant event is assigned multiple times of occurrence. Furthermore,
events in the region "R" on the right do not properly correspond to any time according to
the accelerating worldline's instantaneous inertial time, because at the instant of
acceleration his locus of simultaneity jumps abruptly.

Obviously any amount of relative "skew" between the planes of simultaneity for a given
worldline will result in interference at some distance, producing non-unique time
coordinates. However, if the velocity of our worldline varies continuously (instead of
abruptly), then for some limited region the planes of simultaneity will be advancing
forward in time faster than they are "tilting" backwards, so over this limited region we
can, if we choose, make use of these planes of simultaneity for the time labels of events.
This situation is illustrated below.

We can easily determine the approximate limit for unique time labels with this kind of
coordinate system by noting that if the velocity changes by amount dv/c during a time
interval dt, then the relative slope of the new plane of simultaneity is c/dv, so it intersects
with the original plane of simultaneity at a distance dx = (c dt)(c/dv) = c²/(dv/dt). Since a
= dv/dt is the acceleration, we can estimate that this accelerating system of coordinates is
coherent out to distances on the order of c²/a.
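As a rough numerical illustration (a sketch with values chosen here for concreteness, not taken from the text), the coherence scale c²/a for an acceleration of one g comes out to roughly one light-year:

    # Coherence scale c^2/a of an accelerating system of coordinates.
    # Illustrative values; any consistent units will do.
    c = 2.998e8            # speed of light, m/s
    a = 9.81               # acceleration, m/s^2 (about 1 g)
    d = c**2 / a           # coherence distance, m
    print(d / 9.461e15)    # expressed in light-years: ~0.97

So a system of coordinates accelerating at one g remains coherent, in the above sense, only out to distances of roughly one light-year.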
As an example of the use of accelerating coordinate systems and the breakdown of
inertial simultaneity, consider a circular Sagnac device as described in Section 2.7. As
we've seen, each point on the rim of the rotating disk can be associated with an
instantaneously co-moving inertial coordinate system, each with its own surfaces of
simultaneity. However, since each point of the disk is accelerating with respect to each
other point, there is no coherent simultaneity (in the inertial sense) shared by any two
points. If we analytically continue the local simultaneity from one point to the next
around the perimeter, the result is an open helical surface as indicated below:

The worldline of a particular point on the rim is shown by the helical curve AB, and the
shallower helix represents the analytically continued surface of inertial simultaneity. (It's
interesting to compare this construction with Riemann surfaces in complex function
analysis.)
Of course, we can dispense with the use of local inertial simultaneity to define our
constant-t coordinate surfaces, and simply define an arbitrary system of space and time
coordinates in terms of which a rotating disk is stationary (for example), but we then
must be careful to correctly account for non-inertial aspects of these accelerating
coordinates, particularly with regard to the meanings of spatial lengths. The usual
intuitive definition of the spatial length of an object (such as the perimeter of the rim) is
the absolute length of a locus of inertially simultaneous points of that object, so it
depends on the establishment of a slice of "inertial simultaneity" over the entire rim. If
we use inertial coordinates this is easy, but if we use non-inertial coordinates (such as
those in which the rotating disk is stationary), then no surface of inertial simultaneity
coincides with our surfaces of constant time parameter. In fact, this is essentially the
definition of non-inertial coordinates. So, we will obviously be unable to define a
coherent locus of inertial simultaneity over the whole disk as a surface of constant time
parameter when working with non-inertial coordinates.
One consequence of this is the fact that the spatial length of a path becomes dependent on
the speed of the path. We are accustomed to this for temporal lengths, i.e., the length of
time around the rim might be 30 seconds or 2 hours or 1 nanosecond, etc., depending on
how fast we are going relative to the disk, how fast the rim is spinning, in which direction
it is spinning, and so on. Likewise the spatial length of a path around the rim (in terms of
some particular coordinates) depends on the speed of the path. This shouldn't be
surprising, because the decomposition of spacetime into separate spatial and temporal
components is not unique, i.e., there are multiple equally self-consistent decompositions.
Since this is often a source of confusion, it's worthwhile to describe how this works in
detail. Let's first establish inertial cylindrical coordinates in 2+1 spacetime, using polar
coordinates (r,θ) for the space (where θ is the angular coordinate), and t for time. The
metric in terms of these inertial coordinates is

(dτ)² = (dt)² − (dr)² − r² (dθ)²
and for any fixed time t the purely spatial metric is

(ds)² = (dr)² + r² (dθ)²
So, to find the "length" of any spacelike curve, such as the perimeter of a spinning disk of
radius rd centered at the origin, we simply integrate ds over this curve at the fixed value
of t. For a circular disk, r = rd is constant, so dr = 0, and the spatial metric is simply ds =
rd d, which we integrate from = 0 to 2 to give the length 2 rd.
Now let's look at this situation in terms of a system of coordinates in which the spinning
disk is stationary, i.e., such that a fixed point anywhere on the disk maintains constant
spatial coordinates for all values of the temporal coordinate. Taking the most naive and
simplistic approach, let's define the new coordinates T, R, Φ by the relations

T = t        R = r        Φ = θ − ωt

where ω is a constant, denoting the angular speed of these coordinates with respect to the
inertial t, r, θ coordinates. We also have the differentials

dt = dT        dr = dR        dθ = dΦ + ω dT
Substituting these expressions into the metric equation gives

(dτ)² = (1 − ω²R²)(dT)² − 2ωR² dΦ dT − (dR)² − R²(dΦ)²
According to these coordinates, a spatial length S must be given by integrating the
absolute spacelike differential using the metric along some constant-T surface, i.e., with
dT = 0, where the metric is

(dS)² = (dR)² + R²(dΦ)²
Again for the perimeter of the disk we get 2π R_d = 2π r_d. Notice that our constant-T
surfaces are also constant-t surfaces, so this perimeter length agrees with our previous
result, and of course it doesn't matter which direction we integrate around the perimeter.

Incidentally, letting v = ωR_d denote the velocity of the rim with respect to the original
inertial coordinates, the full spacetime metric for the rim (R = R_d) in terms of the rotating
coordinates is

(dτ)² = (1 − v²)(dT)² − 2vR_d dΦ dT − R_d²(dΦ)²

For a point fixed on the rim we have dΦ = 0, and so

(dτ)² = (1 − v²)(dT)²
which confirms that the lapse of proper time for a point fixed on the rim of the rotating
disk is

√(1 − v²)
times the lapse of T (and therefore of t).
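For readers who wish to check the algebra, here is a minimal symbolic sketch (using the sympy library; the variable names are our own) of the substitution of the rotating coordinates into the inertial metric, and of the rim proper-time factor:

    import sympy as sp

    R, w, v = sp.symbols('R w v', positive=True)
    dT, dR, dPhi = sp.symbols('dT dR dPhi', real=True)

    # inertial differentials expressed in the rotating coordinates
    dt, dr, dtheta = dT, dR, dPhi + w*dT

    dtau2 = sp.expand(dt**2 - dr**2 - R**2*dtheta**2)
    # (1 - w^2 R^2) dT^2 - 2 w R^2 dPhi dT - dR^2 - R^2 dPhi^2, up to term ordering
    print(dtau2)

    # a point fixed on the rim: dR = 0, dPhi = 0, and rim speed v = w*R
    rim = dtau2.subs({dR: 0, dPhi: 0, w: v/R})
    print(sp.expand(rim))   # dT**2 - v**2*dT**2, i.e. (1 - v^2) dT^2, so dtau = sqrt(1 - v^2) dT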

Now let's send light beams around the perimeter in opposite directions. For lightlike
paths we have dτ = 0, so the path of light must satisfy

0 = (1 − v²)(dT)² − 2vR_d dΦ dT − R_d²(dΦ)²
The purely spatial component is dS = R_d dΦ, so we can make this substitution and divide
both sides by (dT)² to give

(dS/dT)² + 2v(dS/dT) − (1 − v²) = 0
The quantity dS/dT is the "speed of light" in terms of these rotating non-inertial
coordinates. Also, from the definitions we have

dS/dT = R_d (dΦ/dT) = R_d (dθ/dt) − v
where dθ/dt is the angular velocity of the light at radius R_d with respect to the inertial
coordinates, so it equals ±1/R_d (noting that c = 1 in our units), with the sign depending
on whether the light is clockwise or counter-clockwise. Substituting into the previous
expression gives

dS/dT = ±1 − v
Letting C = dS/dT denote the speed of light with respect to these rotating non-inertial
coordinates, we therefore have C = 1 ∓ v, where again the sign depends on the direction
of the light relative to the direction of rotation of the disk.
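The quadratic above is easy to check numerically; a small sketch (with an arbitrarily assumed rim speed) confirms that its roots are 1 − v and −(1 + v):

    import numpy as np

    v = 0.3   # assumed rim speed, in units with c = 1
    # roots of (dS/dT)^2 + 2 v (dS/dT) - (1 - v^2) = 0
    print(sorted(np.roots([1.0, 2.0*v, -(1.0 - v**2)])))   # [-1.3, 0.7]
    print((1.0 + v) / (1.0 - v))                           # speed ratio, ~1.857

The two roots are the coordinate speeds of the counter-rotating and co-rotating beams respectively.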
Does this analysis lead to some kind of paradox? It indicates that the non-inertial "speed
of light" with respect to these rotating coordinates is not equal to 1, and in fact the ratio of
the speeds in the two directions is (1+v)/(1−v), but of course this doesn't conflict with
special relativity, because these are not inertial coordinates (due to their rotation).
However, suppose we increase R_d and decrease ω in proportion so that the rim speed v
remains constant. The above formulas still apply for arbitrarily large R_d and small angular
speed ω, and yet the speed ratio remains the same, (1+v)/(1−v). Does this conflict with
special relativity in the limit as the radius goes to infinity and the angular speed of the rim
goes to zero? Clearly not, since we saw in Section 2.7 that if t_1 and t_2 denote the travel
times for light pulses circling the disk in opposite directions, as measured by a clock at a
fixed point on the rim, so that t_2/t_1 = (1+v)/(1−v), then we have t_2/t_1 − 1 = φ/π, where φ is
the angular travel of the disk during the transit of light. In other words, the observed ratio
of travel times around the rim always differs from 1 by an amount proportional to the
angular travel of the disk during the transit of light. Thus the net acceleration (change of
velocity) of the rim observer during the measurement remains in constant proportion to
the measured anisotropy of the transit times.
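A short numerical sketch (with c = 1 and an illustrative rim speed; we take φ here as the angular travel of the disk during the co-rotating transit) confirms the travel-time ratio and its proportionality to the angular travel:

    import numpy as np

    v, R = 0.01, 1.0             # rim speed and radius, in units with c = 1
    w = v / R
    t1 = 2*np.pi*R / (1 + v)     # counter-rotating beam meets the rim point sooner
    t2 = 2*np.pi*R / (1 - v)     # co-rotating beam must chase the rim point
    phi = w * t2                 # angular travel of the disk during the transit
    print(t2/t1 - 1, phi/np.pi)  # both equal 2v/(1 - v)

Note that the common factor √(1 − v²), relating the rim clock's proper time to t, drops out of the ratio.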
However, even without waiting for the light rays to circle the disk and report their
anisotropy, don't the above formulas imply that the speeds of light in the two directions
are in the ratio of (1+v)/(1−v) instantaneously with respect to our rotating coordinates, and
don't the rotating coordinates approach being inertial coordinates as R_d increases while
holding v constant? Yes and no. Both sets of coordinates use the same time t = T, but they
use different space coordinates, s and S. For the perimeter of the disk we have

dS/ds = dΦ/dθ = 1 − ω/W

where W = dθ/dt. Thus the ratio dS/ds of spatial distances along a given "path" depends
on the angular speed W of the path. Recall that for a signal travelling at c = 1 (with
respect to the inertial coordinates) around the perimeter we have W = ±1/r_d, and so

dS/ds = 1 ∓ v

This is consistent with the velocity ratio

C/c = (dS/dT)/(ds/dt) = 1 ∓ v
This shows that the "spatial distances" around the perimeter are different in the two
directions. But we saw earlier that "the spatial distance" was independent of the direction
in which we integrated around the perimeter, even in the rotating coordinate system, so
does this indicate an inconsistency? No, because, as noted above, the ratio dS/ds along a
given path depends on the speed of the path. We have dS/ds = 1 − ω/W, and for the
perimeter of the disk with rim speed v and for a path with speed V, this gives

dS/ds = 1 − v/V
If the path is lightlike, we have V = ±1 and so dS/ds = 1 ∓ v, whereas when we considered
the purely spatial distance around the perimeter we took the "instantaneous" distance, i.e.,
we took a spacelike path with V = ∞, in which case dS/ds = 1 in both directions. This
explains quantitatively what we mean when we say that we are measuring different
things, depending on what spacetime path is having its "spatial length" evaluated. Just as
the temporal length of a path around the rim depends on the speed of the path, so too does
the spatial length.
By the way, notice that if we integrate the spatial component of a path whose velocity V
(relative to the original inertial coordinates) is the same as the rim speed itself, so that v =
V, then obviously we will never move with respect to the disk in one direction, so dS = 0
and therefore dS/ds = 0, whereas in the other direction we have dS/ds = 2. Similarly if V
= 0 we will never move relative to the original coordinates, i.e., ds = 0 and therefore
dS/ds is infinite along such a path.
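The dependence of the spatial-length ratio on the path speed can be tabulated directly; a minimal sketch with an assumed rim speed:

    v = 0.2                              # assumed rim speed
    for V in (1.0, -1.0, v, -v, 1e12):   # lightlike both ways, rim speed both ways, "instantaneous"
        print(V, 1 - v/V)                # dS/ds = 1 - v/V
    # V = +/-1 gives 1 -/+ v; V = v gives 0; V = -v gives 2; V -> infinity gives 1

(The V = 0 case, for which ds = 0 and the ratio diverges, is omitted to avoid a division by zero.)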
5.1 Vis Inertiae
It is indeed a matter of great difficulty to discover, and effectively to
distinguish, the true motions of particular bodies from the apparent,
because the parts of that immovable space in which those motions are
performed do by no means come under the observation of our senses. Yet
the thing is not altogether desperate...
Isaac Newton, 1687
According to Newtonian mechanics a particle moves without acceleration unless acted
upon by a force, in which case the particle undergoes acceleration proportional to the
applied force. The acceleration is defined as a vector whose components are the second
derivatives of the particle's space coordinates with respect to the time coordinate, which
would seem to imply that the acceleration of a particle (and hence the force to which the
particle is subjected) depends on our choice of coordinate systems. Of course, Newton's
law is understood to be applicable only with respect to a special class of coordinate
systems, called the inertial coordinate systems, which all give the same acceleration, and
hence the same applied force, for any given particle. Thus the restriction to inertial
coordinate systems enables us to regard accelerations and the corresponding forces as
absolute.

However, even in the context of Newtonian mechanics it is sometimes convenient to set
aside the restriction to inertial coordinate systems, and as a result the distinction between
physical forces and coordinate-based accelerations becomes ambiguous. For example,
consider a particle whose position in space is expressed by the vector

r = x(t) i + y(t) j + z(t) k
where i, j, k are orthogonal unit vectors for a coordinate system with fixed origin, and
x(t), y(t), z(t) are scalar functions of the time coordinate t. Obviously if these basis
vectors are unchanging, the derivatives of r are simply given by

dr/dt = (dx/dt) i + (dy/dt) j + (dz/dt) k
but if the basis vectors may be changing with time (due to rotation of the coordinate axes)
the first derivative of r by the chain rule is

dr/dt = [(dx/dt) i + (dy/dt) j + (dz/dt) k] + [x (di/dt) + y (dj/dt) + z (dk/dt)]
The quantity in the first parentheses is the partial derivative of r with respect to t at
constant basis vectors i, j, k, so we denote it as ∂r/∂t. The quantity in the second
parentheses is the partial derivative of r with respect to t at constant x, y, z, which means
it represents the differential change in r due just to the rotation of the axes. This change is
perpendicular to both r and the angular velocity vector ω, and its magnitude is |ω| |r| times
the sine of the angle between ω and r, as indicated in the figure below.

Therefore, the total derivative of r with respect to t can be written as

dr/dt = ∂r/∂t + ω × r
Notice that this applies to any vector (compare with equation 4b in Appendix 4, noting
that the angular velocity serves here as the Christoffel symbol), so we can immediately
differentiate again with respect to t, giving the total acceleration

d²r/dt² = ∂²r/∂t² + ∂(ω × r)/∂t + ω × (∂r/∂t + ω × r)
Noting that the cross product is distributive, and that the chain rule applies to derivatives
of cross products, this can be written as

d²r/dt² = ∂²r/∂t² + (dω/dt) × r + 2ω × (∂r/∂t) + ω × (ω × r)
This was based on the premise that the origin of the x,y,z coordinates was stationary, but
if we stipulate that the origin is at position R(t) with respect to some fully inertial
coordinate system, then the particle's position in terms of these inertial coordinates is
R + r, and the total acceleration of the particle includes the second derivative of R. Thus
Newton's second law, which equates the net applied force F to the mass times the
acceleration (defined in terms of an inertial coordinate system), is

F = m [ d²R/dt² + ∂²r/∂t² + (dω/dt) × r + 2ω × (∂r/∂t) + ω × (ω × r) ]
If our original xyz coordinate system was inertial, then all the terms on the right hand
side except for the second would vanish, and we would have the more familiar-looking
expression

F = m ∂²r/∂t²
Now, if we are determined to organize our experience based on this simple formulation,
for any arbitrary choice of coordinate systems, we can do so, but only by introducing new
forces. We need only bring the other four terms from the right hand side of the previous
equation over to the left side, and call them forces. Thus we define the net force on the
particle to be

F_net = F − m d²R/dt² − m (dω/dt) × r − 2mω × (∂r/∂t) − mω × (ω × r)
The first term on the right side is the net of the physical forces, whereas the remaining
terms are what we might call "inertial" forces. They are also often called "fictitious"
forces. The second term is the linear acceleration force, such as we may imagine is
pulling us downward when standing in an elevator that is accelerating upward. The fourth
term is called the Coriolis force, and the fifth term is sometimes called the centrifugal
force. (The third term apparently doesn't have a common name, perhaps because the
angular velocity in many practical circumstances is constant.) On this basis the
Newtonian equation of motion in terms of an arbitrary Cartesian coordinate system has
the simple form

F_net = m ∂²r/∂t²
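A symbolic check of the five-term decomposition may be reassuring here. The following sketch (a sympy computation of our own devising, with constant angular velocity about the z axis so that the dω/dt term vanishes, R(t) = 0, and an arbitrary sample trajectory) verifies that the rotating-frame terms reproduce the true inertial acceleration:

    import sympy as sp

    t, w = sp.symbols('t w', real=True)
    rho = sp.Matrix([sp.cos(3*t), sp.sin(2*t), t])   # components of r in the rotating frame
    W = sp.Matrix([0, 0, w])                         # constant angular velocity vector

    # rotation taking rotating-frame components to inertial components
    Rz = sp.Matrix([[sp.cos(w*t), -sp.sin(w*t), 0],
                    [sp.sin(w*t),  sp.cos(w*t), 0],
                    [0,            0,           1]])

    a_inertial = (Rz * rho).diff(t, 2)               # true second time derivative
    # d2r/dt2|rot + 2 w x (dr/dt)|rot + w x (w x r), re-expressed in the inertial basis
    formula = Rz * (rho.diff(t, 2) + 2*W.cross(rho.diff(t)) + W.cross(W.cross(rho)))
    print(sp.simplify(a_inertial - formula))         # the zero vector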
It's interesting to consider why we usually don't adopt this point of view. It certainly
gives a simpler general equation of motion, but at the expense of introducing several new
forces, beyond whatever physical forces we had already identified in F. Our preference
for the usual (more complicated or more restrictive) formulation of Newton's law is due
to our desire to associate physical forces with some proximate substantial entity. For
example, the force of gravity is attributed to the pull of some massive body. The force of
the wind is attributed to the impulse of air molecules. And so on. The inertial forces
can't be so easily attributed to any proximate entity, so unless we want to pursue the
Machian idea of associating them with the changing relations to distant objects in the
universe, we are left with a force that has no causative substance, so we tend to regard
such forces as "fictitious". Nevertheless, it's worth remembering that the distinction
between physical and fictitious forces is to some extent a matter of choice, as is our
preference for inertial coordinate systems to measure time and space.
To illustrate some of the consequences of these ideas, recall that the Sagnac effect was
described in Section 2.7 from the standpoint of various systems of inertial coordinates,
and in Section 4.8 in terms of certain non-inertial coordinate systems, but in all these
cases the analysis was based on the premise that the true measures of time and space
were based on inertial coordinate systems. We can now examine some aspects of a
Sagnac device from a more general standpoint of arbitrary curvilinear coordinates,
leading to the idea that the physical effects of acceleration can be absorbed into the
metrical structure of spacetime itself.
In a square or triangular Sagnac device the light ray going from one mirror to the next in
the direction of rotation passes through the interior of the polygon when viewed from a
non-rotating frame of reference. This implies that the light ray, traveling in a straight line,
diverges from the rim of the Sagnac device and then converges back to the next vertex.
On the other hand, if we consider the same passage of light from the standpoint of an
observer riding along on the rotating device, the beam of light goes from one end of the
straight edge to the other, but since the light beam diverges from the edge and passes
through the interior of the polygon, it follows that from the standpoint of the rotating
observer the ray of light is emitted from one vertex and curves in toward the center of
rotation and then curves back to reach the next mirror. Likewise, the counter-rotating ray
travels outside the polygon, so when viewed from the rotating frame it appears to curve
outward (away from the center) and then back.
So, on a typical segment between two mirrors M_1 and M_2, when viewed from the rotating
frame of reference, the two light rays follow curved paths as shown in the drawing below:

The amount of this "warping" of the light rays depends mainly on the shape of the path
and the speed of the rim, so if we have significant warping of light rays with small r, the
warping won't be reduced by increasing the radius while holding the mirror speed
constant. Any bending of light rays would reveal to an observer that the segment M1 to
M2 is not inertial, so if we want to construct a scenario in which an observer sitting on a
mirror is "inertial for all practical purposes", we need to make each segment subtend a
very small arc of the disk and/or limit the rim speed, as well as restricting our attention to
a short enough span of time so that the rotating observer doesn't rotate through an
appreciable angle.
One thing that sometimes misleads people when assessing how things look from the
perspective of a rim observer is the belief that it is only necessary to consider the
centripetal acceleration, v²/R, of each point on the rim, but clearly if our objective is to
assess the speed of light with respect to a coordinate system in which an observer at a
particular point on the rim is stationary, we must determine the full accelerations of the
points on the rim relative to that system of coordinates. This includes the full five-term
expression for the acceleration of a moving point relative to an arbitrarily moving
coordinate system. On that basis we find that the light rays are subjected to an
"acceleration" field whose dominant term has a magnitude in the direction of travel of

(vc/R) sin(θ)

where θ is the angular distance from the observer. (Note that this acceleration is defined
on the basis of "slow-measuring-rod-transport" lengths around the loop, combined with
time intervals corresponding to the rim observer's co-moving inertial frame. Also, note
that "vc/R" is characteristic of the Coriolis term, as opposed to v²/R for the centripetal
term.) Integrating these accelerations in both directions gives the pseudo-speeds (i.e., the
speeds relative to the accelerating coordinates) of the two light beams as a function of
position in the acceleration field

C(θ) = c ∓ v (1 − cos θ)
The average pseudo-speeds of the co- and counter-rotating beams around the loop are
therefore c − v and c + v respectively, which gives a constant "anisotropic ratio". However,
these speeds differ from c at any particular point only in proportion to the pseudo-gravitational
potential relative to the observer's location. The amplitude of the
acceleration field, on the order of cv/R, does indeed go to zero as the radius R increases while
holding the rim speed v constant, but the integral of (cv/R) sin(θ) over the entire loop
still always gives the speed distribution around the rim noted above, with the maximum
anisotropy occurring at the opposite point on the circumference (where the pseudo-gravitational
potential difference relative to the observer is greatest), and this gives the
constant "anisotropic ratio". All of this is in perfect accord with the principles of
relativity.
Of course, if the problem is treated in terms of inertial coordinates, then acceleration isn't
an issue, and the solution is purely kinematical. However, our purpose here is to examine
the consequences of re-casting the Sagnac effect into a system of non-inertial coordinates
in which an observer sitting on the rim is stationary, which means we need to absorb into
the coordinates not only his circular translatory motion but also his rotation. This
introduces fictitious forces and acceleration/gravitational fields which must be taken into
account. Needless to say, there's no need to go to all this trouble, since the treatment in an
inertial frame is completely satisfactory. The only reason for re-casting this in non-inertial
coordinates is to illustrate how the general relativistic theory accommodates the
use of arbitrary coordinates.
Now, it's certainly true that there is no single coherent set of coordinates with respect to
which all the points on the disk are fully stationary, where the term "coherent" signifies a
single unique locus of inertial simultaneity. We can, however, construct a coherent set of
coordinates with respect to which one particular point on the rim is fully stationary, and
then use slow-transport methods for assigning spatial distances between any two mirrors,
and combine this with the observer's proper time as the basis for defining velocities,
accelerations, etc, with respect to the rim observer's accelerating coordinates.
To understand the nature of the pseudo-gravitational fields that exist with respect to these
accelerating coordinates, carry out the transformation to the observer's system in two
steps. First, construct a non-rotating system of coordinates in which the observer is
constantly at the origin. Thus we have absorbed his circular motion but not his rotation
into these coordinates. The result is illustrated below, where the disk is regarded as
rotating about the "stationary" observer riding on the rim, and the circles represent the
disk position at different "times" (relative to these coordinates).

So, at this stage, each point on the disk is twirling around the observer at an angular
speed of ω (the same as the speed of the disk in the hub-centered coordinates). If we draw
the spiro-graph traced out by a point moving around the circle at speed c while the circle
rotates slightly about the observer with angular speed ω = v/R, we see that the co- and
counter-rotating directions have different path lengths, precisely accounting for the
difference in travel times. Thus, even with respect to these accelerating coordinates (in
which the observer has a fixed position), the Sagnac effect is still due strictly to the
difference in path length, which demonstrates how directly the Sagnac effect is due not
just to acceleration in general but specifically to rotation.
Next, we absorb the rotation of the disk into our coordinates, so the disk is no longer
twirling around the observer. However, by absorbing the twirl of the disk into the
coordinates, we introduce an anisotropic pseudo-gravitational field (relative to the
"stationary" observer), for particles or light rays moving around the loop. The fact that
the "speed of light" in these coordinates can differ from c is exactly analogous to how the
distant stars have enormous speeds with respect to the Earth's rotating coordinates, and
that speed is attributed to the enormous pseudo-gravitational potential which exists at
those distances with respect to the Earth's coordinates. Similarly, relative to our rim
observer, the maximum gravitational potential difference is at the furthest point on the
circle, i.e., the point diametrically opposite on the disk, which is also where the greatest
anisotropy in the "speed of light" (with respect to these particular non-inertial
coordinates) occurs.
Thus, to first order with relatively small mirror speeds, the light rays are subjected to an
"acceleration" field whose magnitude in the directions of travel is (vc/R) sin(θ), where θ
is the angular distance from the observer. Now, it might seem that we are unable to
account for the anisotropic effect of acceleration, on the assumption that all the points on
the rim are subject to the same acceleration, so there can be no differential effect for light
rays moving in opposite directions around the loop. However, that's not the case, for two
reasons. First, the acceleration (with respect to these accelerating coordinates) is not
constant, and second it is the Coriolis (not the centripetal) acceleration that produces the
dominant effect. The Coriolis acceleration is the cross product of the rotation (pseudo)
vector ω with the velocity vector of the object in question, and this has an opposite sense
depending on whether the object (or light ray) is moving in the co-rotating or counter-rotating direction.
Of course, both directions eventually encounter the same amount of positive and negative
acceleration, but in the opposite order. Thus, they both start out at c, and one experiences
an increase in velocity of +v followed by a decrease of -v, whereas the other drops down
by -v first and then increases by +v. Thus their accelerations and velocities as functions of
angular position are as shown below:

The average speeds of the co- and counter-rotating beams around the loop are therefore
c − v and c + v respectively, which gives the constant "anisotropic ratio". Notice that the
speeds differ from c only where there is significant pseudo-gravitational potential relative
to the observer's location (just as with the distant stars, and of course the relation is
reciprocal). The intensity of the acceleration field is on the order of cv/R, which does
indeed go to zero as the radius R increases while holding the rim speed v constant, but the
integral of (cv/R) sin(θ) over the entire loop still always gives the speed distribution
around the rim noted above, with the maximum anisotropy occurring at the opposite point
on the circumference (where the pseudo-gravitational potential difference relative to the
observer is greatest), and this gives the constant "anisotropic ratio".
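A numerical sketch makes this concrete. Assuming the first-order speed profiles C(θ) = c ∓ v(1 − cos θ) obtained by integrating the (cv/R) sin θ field above (the parameter values here are our own), averaging around the loop recovers c − v and c + v:

    import numpy as np

    c, v = 1.0, 0.01
    theta = np.linspace(0.0, 2.0*np.pi, 100001)
    C_co      = c - v*(1.0 - np.cos(theta))   # co-rotating beam
    C_counter = c + v*(1.0 - np.cos(theta))   # counter-rotating beam
    print(np.trapz(C_co, theta) / (2*np.pi))        # ~ c - v
    print(np.trapz(C_counter, theta) / (2*np.pi))   # ~ c + v

The deviation from c is greatest at θ = π, the point diametrically opposite the observer, in accord with the remarks above.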
It's also worth noting that the anisotropic ratio of speeds given by this pseudo-gravitational
potential corresponds precisely to the anisotropic distances when the Sagnac
device is analyzed with respect to the instantaneously co-moving inertial frame of the rim
observer.
5.2 Tensors, Contravariant and Covariant
Ten masts at each make not the altitude
which thou hast perpendicularly fell.
Thy life's a miracle. Speak yet again.

Shakespeare
One of the most important relations involving continuous functions of multiple
continuous variables (such as coordinates) is the formula for the total differential. In
general if we are given a smooth continuous function y = f(x^1, x^2, ..., x^n) of n variables, the
incremental change dy in the variable y resulting from incremental changes dx^1, dx^2, ...,
dx^n in the variables x^1, x^2, ..., x^n is given by

dy = (∂y/∂x^1) dx^1 + (∂y/∂x^2) dx^2 + ... + (∂y/∂x^n) dx^n        (1)

where ∂y/∂x^i is the partial derivative of y with respect to x^i. (The superscripts on x are
just indices, not exponents.) The scalar quantity dy is called the total differential of y.
This formula just expresses the fact that the total incremental change in y equals the sum
of the "sensitivities" of y to the independent variables multiplied by the respective
incremental changes in those variables. (See the Appendix for a slightly more rigorous
definition.)
If we define the vectors

g = [∂y/∂x^1, ∂y/∂x^2, ..., ∂y/∂x^n]        d = [dx^1, dx^2, ..., dx^n]

then dy equals the scalar (dot) product of these two vectors, i.e., we have dy = g ⋅ d.
Regarding the variables x^1, x^2, ..., x^n as coordinates on a manifold, the function y =
f(x^1, x^2, ..., x^n) defines a scalar field on that manifold, g is the gradient of y (often denoted
as ∇y), and d is the differential position of x (often denoted as dx), all evaluated about
some nominal point [x^1, x^2, ..., x^n] on the manifold.
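A minimal numerical sketch of this relation (the scalar field and the nominal point are arbitrary choices of ours), comparing the dot product g ⋅ d with a direct finite-difference evaluation:

    import numpy as np

    f = lambda x: x[0]**2 * np.sin(x[1]) + x[2]   # y = f(x^1, x^2, x^3)
    x = np.array([1.0, 0.5, 2.0])                 # nominal point
    d = np.array([1e-6, -2e-6, 3e-6])             # incremental changes dx^i

    g = np.array([2*x[0]*np.sin(x[1]),            # partial derivatives dy/dx^i
                  x[0]**2 * np.cos(x[1]),
                  1.0])
    print(g @ d)             # total differential dy
    print(f(x + d) - f(x))   # finite-difference check, agrees to first order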
The gradient g = ∇y is an example of a covariant tensor, and the differential position d =
dx is an example of a contravariant tensor. The difference between these two kinds of
tensors is how they transform under a continuous change of coordinates. Suppose we
have another system of smooth continuous coordinates X^1, X^2, ..., X^n defined on the same
manifold. Each of these new coordinates can be expressed (in the region around any
particular point) as a function of the original coordinates, X^i = F^i(x^1, x^2, ..., x^n), so the total
differentials of the new coordinates can be written as

dX^i = (∂X^i/∂x^1) dx^1 + (∂X^i/∂x^2) dx^2 + ... + (∂X^i/∂x^n) dx^n

Thus, letting D denote the vector [dX^1, dX^2, ..., dX^n] we see that the components of D are
related to the components of d by the equation

D^i = (∂X^i/∂x^1) dx^1 + (∂X^i/∂x^2) dx^2 + ... + (∂X^i/∂x^n) dx^n        (2)
This is the prototypical transformation rule for a contravariant tensor of the first order. On
the other hand, the gradient vector g = y is a covariant tensor, so it doesn't transform in
accord with this rule. To find the correct transformation rule for the gradient (and for
covariant tensors in general), note that if the system of functions F^i is invertible (which it
is if and only if the determinant of the Jacobian is non-zero), then the original coordinates
can be expressed as some functions of these new coordinates, x^i = f^i(X^1, X^2, ..., X^n) for i =
1, 2, ..., n. This enables us to write the total differentials of the original coordinates as

dx^i = (∂x^i/∂X^1) dX^1 + (∂x^i/∂X^2) dX^2 + ... + (∂x^i/∂X^n) dX^n
If we now substitute these expressions for the total coordinate differentials into equation
(1) and collect by differentials of the new coordinates, we get

dy = Σ_j ( Σ_i (∂y/∂x^i)(∂x^i/∂X^j) ) dX^j

Thus, the components of the gradient g of y with respect to the X^i coordinates are
given by the quantities in parentheses. If we let G denote the gradient of y with respect to
these new coordinates, we have

G_j = Σ_i (∂x^i/∂X^j) g_i        (3)
This is the prototypical transformation rule for covariant tensors of the first order.
Comparing this with the contravariant rule given by (2), we see that they both define the
transformed components as linear combinations of the original components, but in the
contravariant case the coefficients are the partials of the new coordinates with respect to
the old, whereas in the covariant case the coefficients are the partials of the old
coordinates with respect to the new.
The key attribute of a tensor is that its representations in different coordinate systems
depend only on the relative orientations and scales of the coordinate axes at that point,
not on the absolute values of the coordinates. This is why the absolute position vector
pointing from the origin to a particular object in space is not a tensor, because the
components of its representation depend on the absolute values of the coordinates. In
contrast, the coordinate differentials transform based solely on local information.
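Both rules are easy to exercise symbolically. In the sympy sketch below (an illustrative
addition; the coordinate functions and sample scalar field are arbitrary choices), the
differential transforms with the partials of the new coordinates with respect to the old,
the gradient with the partials of the old with respect to the new, and their dot product,
the total differential dy, is invariant:

import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
X1 = sp.sqrt(x1**2 + x2**2)               # new coordinates as functions of the old
X2 = sp.atan2(x2, x1)
J = sp.Matrix(2, 2, lambda i, j: sp.diff([X1, X2][i], [x1, x2][j]))  # dX/dx

y = x1**2 + 3*x2                          # a sample scalar field
g = sp.Matrix([[sp.diff(y, x1), sp.diff(y, x2)]])    # gradient (covariant)
d = sp.Matrix([sp.Symbol('dx1'), sp.Symbol('dx2')])  # differential (contravariant)

D = J * d                  # contravariant rule: coefficients dX/dx
G = g * J.inv()            # covariant rule: coefficients dx/dX
print(sp.simplify((G * D)[0] - (g * d)[0]))          # 0: dy is invariant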
So far we have discussed only first-order tensors, but we can define tensors of any order.
One of the most important examples of a second-order tensor is the metric tensor. Recall
that the generalized Pythagorean theorem enables us to express the squared differential
distance (ds)2 along a path on the spacetime manifold in terms of the corresponding differential
components dt, dx, dy, dz as a general quadratic function of those differentials, as follows

    (ds)2 = g_00 (dt)2  + g_01 (dt)(dx) + g_02 (dt)(dy) + g_03 (dt)(dz)
          + g_10 (dx)(dt) + g_11 (dx)2  + g_12 (dx)(dy) + g_13 (dx)(dz)
          + g_20 (dy)(dt) + g_21 (dy)(dx) + g_22 (dy)2  + g_23 (dy)(dz)
          + g_30 (dz)(dt) + g_31 (dz)(dx) + g_32 (dz)(dy) + g_33 (dz)2        (4)
Naturally if we set g_00 = 1, g_11 = g_22 = g_33 = −1, and all the other g_ij coefficients
to zero, this reduces to the Minkowski metric. However, a different choice of coordinate systems (or a
different intrinsic geometry, which will be discussed in subsequent sections) requires the
use of the full formula. To simplify the notation, it's customary to use the indexed
variables x^0, x^1, x^2, x^3 in place of t, x, y, z respectively. This allows us to express the
above metrical relation in abbreviated form as

    (ds)2 = Σ_μ Σ_ν g_μν dx^μ dx^ν        (μ, ν = 0, 1, 2, 3)
To abbreviate the notation even more, we adopt Einstein's convention of omitting the
summation symbols altogether, and simply stipulating that summation from 0 to 3 is
implied over any index that appears more than once in a given product. With this
convention the above expression is written as

    (ds)2 = g_μν dx^μ dx^ν        (5)
Notice that this formula expresses something about the intrinsic metrical relations of the
space, but it does so in terms of a specific coordinate system. If we considered the
metrical relations at the same point in terms of a different system of coordinates (such as
changing from Cartesian to polar coordinates), the coefficients g_μν would be different.
Fortunately there is a simple way of converting the g_μν from one system of coordinates to
another, based on the fact that they describe a purely local relation among differential
quantities. Suppose we are given the metrical coefficients g_μν for the coordinates x^μ, and
we are also given another system of coordinates y^μ that are defined in terms of the x^μ by
some arbitrary continuous functions

    y^μ = y^μ(x^0, x^1, x^2, x^3),    μ = 0, 1, 2, 3
Assuming the Jacobian of this transformation isn't zero, we know that it's invertible, and
so we can just as well express the original coordinates as continuous functions (at this
point) of the new coordinates

    x^μ = x^μ(y^0, y^1, y^2, y^3),    μ = 0, 1, 2, 3

Now we can evaluate the total derivatives of the original coordinates in terms of the new
coordinates. For example, dx^0 can be written as

    dx^0 = (∂x^0/∂y^0) dy^0 + (∂x^0/∂y^1) dy^1 + (∂x^0/∂y^2) dy^2 + (∂x^0/∂y^3) dy^3
and similarly for dx^1, dx^2, and dx^3. The product of any two of these differentials, dx^μ
and dx^ν, is of the form

    dx^μ dx^ν = (∂x^μ/∂y^α)(∂x^ν/∂y^β) dy^α dy^β

(remembering the summation convention). Substituting these expressions for the products
of x differentials in the metric formula (5) gives

    (ds)2 = [ g_μν (∂x^μ/∂y^α)(∂x^ν/∂y^β) ] dy^α dy^β
The first three factors on the right hand side obviously represent the coefficient of dy^α dy^β
in the metric formula with respect to the y coordinates, so we've shown that the array of
metric coefficients transforms from the x to the y coordinate system according to the
equation

    g'_αβ = (∂x^μ/∂y^α)(∂x^ν/∂y^β) g_μν        (6)
Notice that each component of the new metric array is a linear combination of the old
metric components, and the coefficients are the partials of the old coordinates with
respect to the new. Arrays that transform in this way are called covariant tensors.
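As a sanity check on this rule, the following sympy sketch (an added illustration using
the standard Cartesian-to-polar example, not taken from the text) applies equation (6) to
the Euclidean metric and recovers the familiar polar metric diag(1, r2):

import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = r * sp.cos(th)        # old coordinates expressed in terms of the new
y = r * sp.sin(th)
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]])   # partials dx^mu/dy^alpha
g_old = sp.eye(2)                                  # Euclidean metric in x, y
g_new = sp.simplify(J.T * g_old * J)               # covariant rule (6)
print(g_new)                                       # Matrix([[1, 0], [0, r**2]])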
On the other hand, if we define an array A with the components A^μν = (dx^μ/ds)(dx^ν/ds), where
s denotes a path parameter along some particular curve in space, then equation (2) tells
us that this array transforms according to the rule

    A'^αβ = (∂y^α/∂x^μ)(∂y^β/∂x^ν) A^μν        (8)

This is very similar to the previous formula, except that the partial derivatives are of the
new coordinates with respect to the old. Arrays whose components transform according
to this rule are called contravariant tensors.
When we speak of an array being transformed from one system of coordinates to another,
it's clear that the array must have a definite meaning independent of the system of
coordinates. We could, for example, have an array of scalar quantities, whose values are
the same at a given point, regardless of the coordinate system. However, the components
of the array might still be required to change for different systems. For example, suppose
the temperature at the point (x,y,z) in a rectangular tank of water is given by the scalar
field T(x,y,z), where x,y,z are Cartesian coordinates with origin at the geometric center of
the tank. If we change our system of coordinates by moving the origin, say, to one of the
corners of the tank, the function T(x,y,z) must change to T(xx0, yy0, zz0). But at a
given physical point the value of T is unchanged.
Notice that g20 is the coefficient of (dy)(dt), and g02 is the coefficient of (dt)(dy), so
without loss of generality we could combine them into the single term (g20 + g02)(dt)(dy).
Thus the individual values of g20 and g02 are arbitrary for a given metrical equation, since
all that matters is the sum (g20 + g02). For this reason we're free to specify each of those
coefficients as half the sum, which results in g20 = g02. The same obviously applies to all
the other diagonally symmetric pairs, so for the sake of definiteness and simplicity we
can set gab = gba. It's important to note, however, that this symmetry property doesn't apply
to all tensors. In general we have no a priori knowledge of the symmetries (if any) of an
arbitrary tensor.
Incidentally, when we refer to a vector (or, more generally, a tensor) as being either
contravariant or covariant we're abusing the language slightly, because those terms really
just signify two different conventions for interpreting the components of the object with
respect to a given coordinate system, whereas the essential attributes of a vector or tensor
are independent of the particular coordinate system in which we choose to express it. In
general, any given vector or tensor can be expressed in both contravariant and covariant
form with respect to any given coordinate system. For example, consider the vector P
shown below.

[Figure 1: the vector P with its contravariant and covariant components relative to oblique axes X1 and X2]
We should note that when dealing with a vector (or tensor) field on a manifold each
element of the field exists entirely at a single point of the manifold, with a direction and a
magnitude, rather than imagining each vector to actually extend from one point in the
manifold to another. (For example, we might have a vector field describing the direction
and speed of the wind at each point in a given volume of air.) However, for the purpose
of illustrating the relation between contravariant and covariant components, we are
focusing on simple displacement vectors in a flat metrical space, which can be considered
to extend from one point to another.
Figure 1 shows an arbitrary coordinate system with the axes X1 and X2, and the
contravariant and covariant components of the position vector P with respect to these
coordinates. As can be seen, the jth contravariant component consists of the projection of
P onto the jth axis parallel to the other axis, whereas the jth covariant component consists
of the perpendicular projection of P onto the jth axis. This is the essential
distinction (up to scale factors) between the contravariant and covariant ways of
expressing a vector or, more generally, a tensor. (It may seem that the naming convention
is backwards, because the "contra" components go with the axes, whereas the "co"
components go against the axes, but historically these names were given on the basis of
the transformation laws that apply to these two different interpretations.)
If the coordinate system is "orthogonal" (meaning that the coordinate axes are mutually
perpendicular) then the contravariant and covariant interpretations are identical (up to
scale factors). This can be seen by imagining that we make the coordinate axes in Figure
1 perpendicular to each other. Thus when we use orthogonal coordinates we are
essentially using both contravariant and covariant coordinates, because in such a context
the only difference between them (at any given point) is scale factors. It's worth noting
that "orthogonal" doesn't necessarily imply "rectilinear". For example, polar coordinates
are not rectilinear, i.e., the axes are not straight lines, but they are orthogonal, because as
we vary the angle we are always moving perpendicular to the local radial axis. Thus the
metric of a polar coordinate system is diagonal, just as is the metric of a Cartesian
coordinate system, and so the contravariant and covariant forms at any given point differ
only by scale factors (although these scale factors may vary as a function of position).
Only when we consider systems of coordinates that are not mutually perpendicular do the
contravariant and covariant forms differ (at a given point) by more than just scale factors.
To understand in detail how the representations of vectors in different coordinate systems
are related to each other, consider the displacement vector P in a flat 2-dimensional space
shown below.

[Figure 2: the vector P with components relative to the oblique axes X1, X2 and their dual axes Ξ1, Ξ2]

In terms of the X coordinate system the contravariant components of P are (x^1, x^2) and the
covariant components are (x_1, x_2). We've also shown another set of coordinate axes,
denoted by Ξ, defined such that Ξ1 is perpendicular to X2, and Ξ2 is perpendicular to X1.
In terms of these alternate coordinates the contravariant components of P are (ξ^1, ξ^2) and
the covariant components are (ξ_1, ξ_2). The symbol θ signifies the angle between the two
positive axes X1, X2, and the symbol ω denotes the angle between the axes Ξ1 and Ξ2.
These angles satisfy the relations θ + ω = π and α = (ω − θ)/2, where α is the angle
between each axis and its dual. We also have

    x_1 = cos(α) ξ^1        x_2 = cos(α) ξ^2

which shows that the covariant components with respect to the X coordinates are the
same, up to a scale factor of cos(α), as the contravariant components with respect to the
Ξ coordinates, and vice versa. For this reason the two coordinate systems are called "duals"
of each other. Making use of the additional relations

the squared length of P can be expressed in terms of any of these sets of components as
follows:

    s2 = x^1 x_1 + x^2 x_2 = ξ^1 ξ_1 + ξ^2 ξ_2
In general the squared length of an arbitrary vector on a (flat) 2-dimensional surface can
be given in terms of the contravariant components by an expression of the form

    s2 = g_11 (x^1)2 + g_12 x^1 x^2 + g_21 x^2 x^1 + g_22 (x^2)2

where the coefficients g_uv are the components of the covariant metric tensor. This tensor
is always symmetrical, meaning that g_uv = g_vu, so there are really only three independent
elements for a two-dimensional manifold. With Einstein's summation convention we can
express the preceding equation more succinctly as

    s2 = g_uv x^u x^v
From the preceding formulas we can see that the covariant metric tensor for the X
coordinate system in Figure 2 is

    g(X) =  | 1       cos(θ) |
            | cos(θ)  1      |

whereas for the dual coordinate system the covariant metric tensor is

    g(Ξ) =  | 1       cos(ω) |
            | cos(ω)  1      |

noting that cos(ω) = −cos(θ). The determinant g of each of these matrices is sin(θ)2, so
we can express the relationship between the dual systems of coordinates as

We will find that the inverse of the metric tensor is also very useful, so let's use the
superscripted symbol g^uv to denote the inverse of a given g_uv. The inverse metric tensors
for the X and Ξ coordinate systems are

    g^uv(X) = (1/sin(θ)2) | 1        −cos(θ) |        g^uv(Ξ) = (1/sin(θ)2) | 1       cos(θ) |
                          | −cos(θ)  1       |                              | cos(θ)  1      |

Comparing the left-hand matrix with the previous expression for s2 in terms of the
covariant components, we see that

    s2 = g^uv x_u x_v

so the inverse of the covariant metric tensor is indeed the contravariant metric tensor.
Now let's consider a vector x whose contravariant components relative to the X axes of
Figure 2 are x^1, x^2, and let's multiply this by the covariant metric tensor as follows:

    x_v = g_uv x^u

Remember that summation is implied over the repeated index u, whereas the index v
appears only once (in any given product) so this expression applies for any value of v.
Thus the expression represents the two equations

    x_1 = g_11 x^1 + g_21 x^2        x_2 = g_12 x^1 + g_22 x^2

If we carry out this multiplication we find

    x_1 = x^1 + cos(θ) x^2        x_2 = cos(θ) x^1 + x^2

which agrees with the previously stated relations between the covariant and contravariant
components, noting that sin(θ) = cos(α). If we perform the inverse operation,
multiplying these covariant components by the contravariant metric tensor, we recover
the original contravariant components, i.e., we have

    x^u = g^uv x_v
Hence we can convert from the contravariant to the covariant versions of a given vector
simply by multiplying by the covariant metric tensor, and we can convert back simply by
multiplying by the inverse of the metric tensor. These operations are called "raising and
lowering of indices", because they convert x from a superscripted to a subscripted
variable, or vice versa. In this way we can also create mixed tensors, i.e., tensors that are
contravariant in some of their indices and covariant in others.
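These operations are easy to exercise numerically. A minimal numpy sketch for the
oblique metric discussed above (an added illustration; the angle and the components are
arbitrary choices):

import numpy as np

theta = np.deg2rad(60)                       # angle between the oblique axes
g = np.array([[1.0, np.cos(theta)],
              [np.cos(theta), 1.0]])         # covariant metric tensor
g_inv = np.linalg.inv(g)                     # contravariant metric tensor

x_up = np.array([2.0, 1.0])                  # contravariant components x^u
x_dn = g @ x_up                              # lowering the index: x_v = g_uv x^u
print(np.allclose(g_inv @ x_dn, x_up))       # raising recovers the original
print(x_up @ g @ x_up, x_up @ x_dn)          # squared length: g_uv x^u x^v = x^u x_u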
It's worth noting that, since x_u = g_uv x^v, we have

    s2 = g_uv x^u x^v = x^u x_u

Many other useful relations can be expressed in this way. For example, the angle θ
between two vectors a and b is given by

    cos(θ) = a^u b_u / [ (a^v a_v)(b^w b_w) ]^(1/2)
These techniques immediately generalize to any number of dimensions, and to tensors
with any number of indices, including "mixed tensors" that have some contravariant and
some covariant indices. In addition, we need not restrict ourselves to flat spaces or
coordinate systems whose metrics are constant (as in the above examples). Of course, if
the metric is variable then we can no longer express finite interval lengths in terms of
finite component differences. However, the above distance formulas still apply, provided
we express them in differential form, i.e., the incremental distance ds along a path is
related to the incremental components dx^j according to

    (ds)2 = g_ij dx^i dx^j
so we need to integrate this over a given path to determine the length of the path. These
are exactly the formulas used in 4-dimensional spacetime to determine the spatial and
temporal "distances" between events in general relativity.
For any given index we could generalize the idea of contravariance and covariance to
include mixtures of these two qualities in a single index. This is not ordinarily done, but it
is possible. Recall that the contravariant components are measured parallel to the
coordinate axes, and the covariant components are measured normal to all the other axes.
These are the two extreme cases, but we could define components with respect to
directions that make a fixed angle relative to the coordinate axes and normals. The
transformation rule for such representations is more complicated than either (6) or (8),
but each component can be resolved into sub-components that are either purely
contravariant or purely covariant, so these two extreme cases suffice to express all
transformation characteristics of tensors.
5.3 Curvature, Intrinsic and Extrinsic
Thus we are led to a remarkable theorem (Theorem Egregium): If a curved surface is
developed upon any other surface whatever, the measure of curvature in each point
remains unchanged.
C. F. Gauss, 1827

The extrinsic curvature of a plane curve at a given point on the curve is defined as the
derivative of the curve's tangent angle with respect to position on the curve at that point.
In other words, if θ(s) denotes the angle which the curve makes with some fixed
reference axis as a function of the path length s along the curve, then κ = dθ/ds. In terms
of orthogonal and naturally scaled coordinates X,Y we have tan(θ) = dY/dX. If the X
axis is tangent to the curve at the point in question, then tan(θ) approaches θ and dX
approaches ds, so in terms of such tangent normal coordinates the curvature can
equivalently be defined as simply the second derivative, κ = d2Y/dX2.


One way of specifying a plane curve is by giving a function Y = f(X) where X and Y are
naturally scaled orthogonal coordinates. Natural scaling means (ds)2 = (dX)2 + (dY)2, so
we have ds/dX = [1 + (dY/dX)2]1/2. The curvature can easily be determined by directly
evaluating the derivative dθ/ds as follows

    κ = dθ/ds = (d2Y/dX2) / [1 + (dY/dX)2]^(3/2)

Likewise if the curve is specified parametrically by the functions X(t) and Y(t) for some
arbitrary path parameter t, we have ds/dt = (Xt2 + Yt2)1/2 where subscripts denote
derivatives, and the curvature is

    κ = (Xt Ytt − Yt Xtt) / (Xt2 + Yt2)^(3/2)
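A quick numerical check of the parametric formula (an added sketch; the radius and
parameter value are arbitrary choices):

import numpy as np

def curvature(xt, yt, xtt, ytt):
    # signed curvature of a parametric plane curve, per the formula above
    return (xt * ytt - yt * xtt) / (xt**2 + yt**2)**1.5

R, t = 3.0, 0.7                            # a circle of radius R, parameter t
xt, yt = -R*np.sin(t), R*np.cos(t)         # derivatives of (R cos t, R sin t)
xtt, ytt = -R*np.cos(t), -R*np.sin(t)
print(curvature(xt, yt, xtt, ytt))         # 0.3333... = +1/R, as expected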
Although these derivations are quite simple and satisfactory for the case of plane curves,
it's worthwhile to examine both of them more closely to clarify the application to higher
dimensional cases, where it is more convenient to use the definition of curvature based on
the second derivative with respect to tangent coordinates. First, let's return to the case
where the plane curve was specified by an explicit function Y = f(X) for naturally scaled
orthogonal coordinates X,Y. Expanding this function into a power series (up to second
order) about the point of interest, we have constants A,B,C such that Y = AX2 + BX + C.
The constant C is just a simple displacement, so it's irrelevant to the shape of the curve.
Thus we need only consider the curve Y = AX2 + BX. If B is non-zero this curve is not
tangent to the X axis at the origin. To remedy this we can consider the curve with respect
to a rotated system of coordinates x,y, related to the original coordinates by the
transformation equations

    X = x cos(θ) + y sin(θ)        Y = −x sin(θ) + y cos(θ)

Substituting these expressions for X and Y into the equation Y = AX2 + BX and rearranging terms gives

    y = [ A cos(θ)2 x2 + (B cos(θ) + sin(θ)) x ] / [ cos(θ) − B sin(θ) − 2A sin(θ)cos(θ) x − A sin(θ)2 y ]

If we select an angle θ such that the coefficient of the linear term in the numerator
vanishes, i.e., if we set Bcos(θ) + sin(θ) = 0 by putting θ = invtan(−B), then the
numerator is purely second order. If we then expand the denominator into a power series
in x and y, the product of this series with the numerator yields just a constant times the
numerator plus terms of third and higher order in x and y. Hence the non-constant terms
in the denominator are insignificant up to second order, so the denominator is effectively
just equal to the constant term. Inserting the value of θ into the above equation gives

    y = [ A / (1 + B2)^(3/2) ] x2  +  (terms of third and higher order)

The curvature at the origin is just the second derivative, so we have

    κ = yxx = 2A / (1 + B2)^(3/2) = fXX / (1 + fX2)^(3/2)
where subscripts denote derivatives, and we have used the facts that, for the original
function f(X) at the origin we have fX = B and fXX = 2A. This shows how we can arrive
(somewhat laboriously) at our previous result by using the "second derivative" definition
of curvature and an explicitly defined curve Y = f(X).
A plane curve can be expressed parametrically as a function of the path length s by the
functions x(s), y(s). Since (ds)2 = (dx)2 + (dy)2, it follows that xs2 + ys2 = 1 (where again
subscripts denote derivatives). The vector (xs,ys) is tangent to the curve, so the
perpendicular vector (-ys,xs) is normal to the curve. The vector (xss,yss) represents the rate
of change of the tangent direction of the curve with respect to s. Recall that the curvature
of a line in the plane is defined as the rate of change of the angle of the curve as a
function of distance along the curve, but since tan(θ) approaches θ to the second order as
θ goes to zero, we can just as well define curvature as the rate of change of the tangent.
Noting that ys = (1 − xs2)1/2 we have yss = −xs xss/(1 − xs2)1/2 and hence ys yss = −xs xss. Thus we
have yss/xss = −xs/ys, which means the vector (xss,yss) is perpendicular to the curve. The
magnitude of this vector is |κ| = (xss2 + yss2)1/2, and we can define the signed magnitude as
the dot product of (xss,yss) with the vector (−ys,xs), normalized to the length of this
vector, which happens to be (xs2 + ys2)1/2 = 1. This gives the signed curvature

    κ = xs yss − ys xss

The center of curvature of the curve at the point (x,y) is at the point (x − ys/κ, y + xs/κ).
To illustrate, a circle of radius R centered at the origin can be expressed by the parametric
equations x(s) = Rcos(s/R) and y(s) = Rsin(s/R), and the first derivatives are xs =
−sin(s/R) and ys = cos(s/R). The second derivatives are xss = −(1/R)cos(s/R) and yss =
−(1/R)sin(s/R). From this we have the magnitude of the curvature |κ| = 1/R and the signed
curvature +1/R. The sign is based on the path direction being positive in the counter-clockwise direction. The center of curvature for every point on this curve is the origin
(0,0).
The preceding parametric derivation was based on the path length parameter s, but we
can also define a curve in terms of an arbitrary parameter t, not necessarily the path
length. In this case we have the functions x(t), y(t), and s(t). Again we have (ds)2 = (dx)2
+ (dy)2, so the derivatives of these three functions are related by xt2 + yt2 = st2. We also
have xs = xt/st and ys = yt/st, and the second derivatives are

    xss = (xtt st − xt stt) / st3

and the similar expression for yss. Substituting into the previous formula for the signed
curvature we get

    κ = (xt ytt − yt xtt) / (xt2 + yt2)^(3/2)
The techniques described above for determining the curvature of plane curves can be
used to determine the sectional curvatures of a two-dimensional surface embedded in
three-dimensional space. Notice that the general power series expansion of a curve
defined by the function f(x) is f(x) = c0 + c1 x + c2 x2 + c3 x3 + ..., but by choosing
coordinates so that the curve passes through the origin tangent to the x axis at the point in
question we can arrange to make c0 = c1 = 0, so the expansion of the curve about this
point can be put into the form f(x) = c2 x2 + c3 x3 + ... Also, since the 2nd derivative is
f''(x) = 2c2 + 6c3 x + ..., evaluating this at x = 0 gives simply f''(0) = 2c2, so it's clear
that only the 2nd-order term is significant in determining the curvature with respect to
tangent normal coordinates, i.e., it is sufficient to represent the curve as f(x) = ax2.
Similarly if we consider the extrinsic curvature of a cross-section of a two-dimensional
surface in space, we see that at any given point on the surface we can construct an
orthogonal "xyz" coordinate system such that the xy plane is tangent to the surface and
the z axis is perpendicular to the surface at that point. In general the equation of our
surface can be expanded about this point into a polynomial giving the "height" z as a
function of x and y. As in the one-dimensional case, the constant and 1st-order terms of
this polynomial will be zero (because we defined our coordinates tangent to the surface
with the origin at the point in question), and the 3rd and higher order terms don't affect
the second derivative at the origin, so we can represent our surface by just the 2nd-order
terms of the expansion, i.e.,

    z = ax2 + bxy + cy2
The second (partial) derivatives of this function with respect to x and y are 2a and 2c
respectively, so these numbers give us some information about the curvature of the
surface. However, we'd really like to know the curvature of the surface evaluated in any
direction, not just in the x and y directions. (Note that the tangency condition uniquely
determines the direction of the z axis, but the x and y axes can be rotated anywhere in the
tangent plane.)
In general we can evaluate the curvature of the surface in the direction of the line y = qx
for any constant q. The equation of the surface in this direction is simply

    z = (a + bq + cq2) x2

but of course we want to evaluate the derivatives with respect to changes along this
direction, rather than changes in the pure x direction. Parametrically the distance along
the tangent plane in the y = qx direction is s2 = x2 + y2 = (1 + q2) x2, so we can substitute
for x2 in the preceding equation to give the value of z as a function of the distance s

    z = [ (a + bq + cq2) / (1 + q2) ] s2

The second derivative of this function gives the extrinsic curvature κ(q) of the surface in
the "q" direction:

    κ(q) = 2 (a + bq + cq2) / (1 + q2)
Now we might ask what directions give the extreme (min and max) curvatures. Setting
the derivative of κ(q) to zero gives the result

    q2 − 2mq − 1 = 0

where m = (c − a)/b. Since the constant term of this quadratic is −1 it follows that the
product of the two roots of this equation is −1, which means that each of them is the
negative reciprocal of the other, so the lines of min and max curvature are of the form y =
qx and y = −(1/q)x, which shows that the two directions are perpendicular.
Substituting the two "extreme" values of q into the equation for κ(q) gives (see the
Appendix for details) the two "principal curvatures" of the surface

    κ1, κ2 = (a + c) ± [ (a − c)2 + b2 ]^(1/2)

The product of these two is called the "Gaussian curvature" of the surface at that point,
and is given by

    K = κ1 κ2 = 4ac − b2
which of course is just the (negative) discriminant of the quadratic form ax2 + bxy + cy2.

For the surface of a sphere of radius R this quantity equals 1/R2 (as derived in the
Appendix).
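These relations are easy to verify numerically. The sketch below (an added illustration
with arbitrarily chosen coefficients) finds the extremal directions from the quadratic
above and checks the stated properties:

import numpy as np

a, b, c = 1.2, 0.7, -0.4                    # coefficients of z = ax^2 + bxy + cy^2
kappa = lambda q: 2*(a + b*q + c*q**2) / (1 + q**2)   # sectional curvature
m = (c - a) / b
q1, q2 = np.roots([1.0, -2*m, -1.0])        # roots of q^2 - 2mq - 1 = 0
k1, k2 = kappa(q1), kappa(q2)               # the two principal curvatures
print(np.isclose(q1 * q2, -1.0))            # principal directions are perpendicular
print(np.isclose(k1 * k2, 4*a*c - b**2))    # their product is the Gaussian curvature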
Another measure of the curvature of a surface is called the "mean curvature", which, as
the name suggests, is the mean value of the curvature over all possible directions. Since
we want to give all the directions equal weight, we insert tan(θ) for q in the equation for
κ(q) and then integrate over θ, giving the mean curvature

    κ_mean = (1/π) ∫0..π κ(tan(θ)) dθ = a + c

(Of course, we could also infer this mean value directly as the average of κ1 and κ2, since
κ is symmetrically distributed.) Notice that the mean curvature occurs along two
perpendicular directions, and these are oriented at 45 degrees relative to the "principal"
directions. This can be verified by setting the derivative of the product κ(q) κ(−1/q) to
zero and noting that the resulting quartic in q factors into two quadratics, one giving the
two principal directions, and the other giving the directions of the mean curvature. (The
product κ(q) κ(−1/q) is obviously a maximum when both terms have the mean value,
and a minimum when the terms have their extreme values.)
Examples of surfaces with constant Gaussian curvature are the sphere, the plane, and the
pseudosphere, which have positive, zero, and negative curvature respectively. (Negative
Gaussian curvature signifies that the two principal curvatures have opposite signs,
meaning the surface has a "saddle" shape.) Surfaces with vanishing mean curvature are
called "minimal surfaces", and represent the kinds of surfaces that are formed by a "soap
films". For many years the only complete and non-self-intersecting minimal surfaces
known were the plane, the catenoid, and the helicoid, but recently an infinite family of
such minimal surfaces was discovered.
The above discussion was based on extrinsic properties of surfaces, i.e., measuring the
rate of deviation between one surface and another. However, we can also look at
curvature from an intrinsic standpoint, in terms of the relations between points within the
surface itself. For example, if we were confined to the surface of a sphere of radius R, we
would find that the ratio Q of the circumference to the "radius" of a circle as measured
on the surface of the sphere would not be constant but would depend on the circle's radius
r according to the relation Q = 2π(R/r) sin(r/R). Evaluating the second derivative of Q
with respect to r in the limit as r goes to zero we have

    lim(r→0) d2Q/dr2 = −2π/(3R2)
Thus we can infer the radius of our sphere entirely from local measurements over a small
region of the surface. The results of such local measurements of intrinsic distances on a
surface can be encapsulated in the form of a "metric tensor" relative to any chosen system
of coordinates on the surface.
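The limit is easily confirmed symbolically (an added one-off check, written with the 2π
factor shown above):

import sympy as sp

r, R = sp.symbols('r R', positive=True)
Q = 2*sp.pi*(R/r)*sp.sin(r/R)               # circumference/radius on a sphere
print(sp.limit(sp.diff(Q, r, 2), r, 0))     # -2*pi/(3*R**2), so R is recoverable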

In general, any two-dimensional surface embedded in three-dimensional space can be
represented over a sufficiently small region by an expression of the form Z = f(X,Y)
where X,Y,Z are orthogonal coordinates. The expansion of this function up to second
order is

    Z = AX2 + BXY + CY2 + DX + EY        (1)

where A, B, ..., E are constants. If the coefficients D and E are zero, the surface is tangent to
the XY plane, and we can immediately compute the Gaussian curvature and the metric
tensor as discussed previously. However, if D and E are not zero, we need to rotate our
coordinates so that the XY plane is tangent to the surface. To accomplish this we can
apply the usual Euler rotation matrix for a rotation through an angle θ about the Z axis
followed by a rotation through an angle φ about the (new) X axis. Thus we have a new
system of orthogonal coordinates x,y,z related to the original coordinates by

Making these substitutions for X, Y, and Z in (1) gives the equation of the surface in terms
of the rotated coordinates. The coefficients of the linear terms in x and y in this
transformed equation are

    D cos(θ) − E sin(θ)        and        D sin(θ)cos(φ) + E cos(θ)cos(φ) + sin(φ)

respectively. To make these coefficients vanish we must set

    tan(θ) = D/E        tan(φ) = −(D2 + E2)^(1/2)
Substituting these angles into the full expression gives

The cross-product terms involving xz, yz, and z2 have been omitted, because if we bring
these over to the left side and factor out z, we can then divide both sides by the factor (k1
+ k2x + k3y + k4z), and the power series expansion of this, multiplied by the second-order
terms in x and y, gives just a constant times those terms, plus terms of third and higher
order in x,y, and z, which do not affect the curvature at the origin. Therefore, the
second-order terms involving z drop out, and we're left with the above quadratic for z. This
describes a surface tangent to the xy plane at the origin, i.e., a surface of the form z = ax2
+ bxy + cy2, and the curvature of such a surface equals 4ac − b2 at the origin, so the
curvature of the above surface at the origin is

    K = (4AC − B2) / (1 + D2 + E2)2

Remember that we began with a surface defined by the function Z = f(X,Y), and from
equation (1) we see that the partial derivatives of the function f with respect to X and Y
at the origin are

    fX = D        fY = E        fXX = 2A        fXY = B        fYY = 2C

Consequently, the equation for the curvature of the surface can be written as

    K = (fXX fYY − fXY2) / (1 + fX2 + fY2)2
In addition, if we take the differentials of both sides of (1) we have (at the origin)

    dZ = D dX + E dY

Inserting this for dZ into the metrical expression (ds)2 = (dX)2 + (dY)2 + (dZ)2 gives the
metric at the origin on the surface with respect to the XY coordinates projected onto the
surface:

    (ds)2 = gXX (dX)2 + 2 gXY (dX)(dY) + gYY (dY)2

where

    gXX = 1 + D2        gXY = DE        gYY = 1 + E2

Thus the curvature of the surface can also be written in the form

    K = (fXX fYY − fXY2) / g2

where g = gXX gYY − gXY2. The quantities in the numerator of the right hand expression are
the coefficients of the "second groundform" of the surface, and the metric line element is
called the first groundform. Hence the curvature is simply the ratio of the determinants
of the two groundforms.

The preceding was based on treating the 2D surface embedded in 3D space defined by
giving Z explicitly as a function of X and Y. This is analogous to our treatment of curves
in the plane based on giving Y as an explicit function of X. However, we found that a
more general and symmetrical expression for the curvature of a plane curve was found by
considering the curve defined parametrically, i.e., giving x(u) and y(u) as functions of an
arbitrary path parameter u. Similarly we can define a 2D surface in 3D space by giving
x(u,v), y(u,v) and z(u,v) as functions of two arbitrary coordinates on the surface. From
the Euclidean metric of the embedding 3D space we have

    (ds)2 = (dx)2 + (dy)2 + (dz)2

where, in what follows, subscripts denote partial derivatives. We also have the total differentials

    dx = xu du + xv dv        dy = yu du + yv dv        dz = zu du + zv dv

which can be substituted into the basic 3D Euclidean metric (ds)2 = (dx)2 + (dy)2 + (dz)2
to give the 2D metric of the surface with respect to the arbitrary surface coordinates u,v

    (ds)2 = guu (du)2 + 2 guv (du)(dv) + gvv (dv)2

where

    guu = xu2 + yu2 + zu2        guv = xu xv + yu yv + zu zv        gvv = xv2 + yv2 + zv2
The space-vectors [xu,yu,zu] and [xv,yv,zv] are tangent to the surface and point along the
u and v directions, respectively, so the cross-product of these two vectors is a vector
normal to the surface

    N = [ yu zv − zu yv,  zu xv − xu zv,  xu yv − yu xv ]

whose magnitude is

    |N| = [ guu gvv − guv2 ]^(1/2) = g^(1/2)
The space-vectors [xuu,yuu,zuu] and [xvv,yvv,zvv] represent the rates of change of the
tangent vectors to the surface along the u and v directions, and the vector [xuv,yuv,zuv]
represents the rate of change of the u tangent with respect to v, and vice versa. Thus if
we take the dot products of each of these vectors with the unit vector normal to the
surface, we will get the signed coefficients of an expression for the surface of the pure
quadratic form h(u,v) = au2 + buv + cv2, where h can be regarded as the height above
the tangent plane at the origin, and the three scaled triple products correspond to huu = 2a,
huv = b, and hvv = 2c.
If u and v were projections of orthogonal coordinates (as were x and y in our prior
discussion), the determinant of the surface metric at the origin would be 1, and the
curvature would simply be 4ac − b2. However, in general we allow u and v to be any
surface coordinates, not necessarily orthogonal, and not necessarily scaled to equal the
path length along constant coordinate lines. Given orthogonal metrically scaled tangent
coordinates X,Y, there exist coefficients A,B,C such that the height h above the tangent
plane is h(X,Y) = AX2 + BXY + CY2, and the curvature K at the origin is simply 4AC −
B2. Also, for points sufficiently near the origin we have

    X = Xu u + Xv v        Y = Yu u + Yv v

Substituting these expressions into h(X,Y) gives h(u,v) = au2 + buv + cv2 where

    a = A Xu2 + B Xu Yu + C Yu2
    b = 2A Xu Xv + B (Xu Yv + Xv Yu) + 2C Yu Yv
    c = A Xv2 + B Xv Yv + C Yv2

With these coefficients we find

    4ac − b2 = (4AC − B2)(Xu Yv − Xv Yu)2

In addition, we know that the surface is asymptotic to the tangent plane at the origin, so
the metric in terms of X,Y is simply (ds)2 = (dX)2 + (dY)2. Substituting the expressions
for dX and dY in terms of du and dv, the metric at the origin in terms of the u,v
coordinates is

    (ds)2 = (Xu2 + Yu2)(du)2 + 2(Xu Xv + Yu Yv)(du)(dv) + (Xv2 + Yv2)(dv)2

From this we have the determinant of the metric

    g = (Xu2 + Yu2)(Xv2 + Yv2) − (Xu Xv + Yu Yv)2 = (Xu Yv − Xv Yu)2

This shows that the intrinsic curvature K is related to the quantity 4ac − b2 by the equation

    K = 4AC − B2 = (4ac − b2)/g
We saw previously that the coefficients 2a, b, 2c are given by triple vector products
divided by the normalizing factor g^(1/2). Writing out the triple products in determinant
form, we have

    2a = (1/g^(1/2)) | xuu yuu zuu |      b = (1/g^(1/2)) | xuv yuv zuv |      2c = (1/g^(1/2)) | xvv yvv zvv |
                     | xu  yu  zu  |                      | xu  yu  zu  |                       | xu  yu  zu  |
                     | xv  yv  zv  |                      | xv  yv  zv  |                       | xv  yv  zv  |

Therefore the Gaussian curvature is given by

    K = (4ac − b2)/g = (1/g2) [ D(uu) D(vv) − D(uv)2 ]

where D(uu), D(uv), and D(vv) denote the three determinants shown above.
Recalling that the determinant of the transpose of a matrix is the same as of the matrix
itself, we can transpose the second factor in each determinant product to give the
equivalent expression

The determinant of a product of matrices is the same as the product of the determinants of
those matrices, so we can carry out the matrix multiplications inside the determinant
symbols. The first product of determinants can therefore be written as the single
determinant

Notice that several of the entries in this matrix can be expressed purely in terms of the
components guu, guv, and gvv of the metric tensor and their partial derivatives, so we can
write this determinant as

In a similar manner we can expand the second product of determinants into a single
determinant and express most of the resulting components in terms of the metric to give

The curvature is just 1/g2 times the difference between these two determinants. In both
cases we have been able to express all the matrix components in terms of the metric, with
the exception of the upper-left entries. However, notice that the cofactors of these two
entries in their respective matrices are identical (namely g), so when we take the
difference of these determinants the upper-left entries both appear simply as multiples of
g. Thus we need only consider the difference of these two entries, which can indeed be
written purely in terms of the metric coefficients and their derivatives as follows

    −(1/2) ∂2guu/∂v2 + ∂2guv/∂u∂v − (1/2) ∂2gvv/∂u2

Consequently, we can express the Gaussian curvature K entirely in terms of the intrinsic
metric with respect to arbitrary two-dimensional coordinates on the surface, as follows

    K = (1/g2) {  | −(1/2)guu,vv + guv,uv − (1/2)gvv,uu    (1/2)guu,u    guv,u − (1/2)guu,v |
                  | guv,v − (1/2)gvv,u                      guu           guv               |
                  | (1/2)gvv,v                              guv           gvv               |

               −  | 0             (1/2)guu,v    (1/2)gvv,u |
                  | (1/2)guu,v    guu           guv        |
                  | (1/2)gvv,u    guv           gvv        |  }

where the vertical bars denote determinants and commas in the subscripts denote partial
derivatives (e.g., guu,v = ∂guu/∂v).
This formula was first presented by Gauss in his famous paper "Disquisitiones generales
circa superficies curvas" (General Investigations of Curved Surfaces), published in 1827.
Gauss regarded this result as quite remarkable (egregium in Latin), so it is commonly
known as the Theorema Egregium. The reason for Gauss' enthusiasm is that this formula
proves the Gaussian curvature of a surface is indeed intrinsic, i.e., it is not dependent on
the embedding of the surface in higher dimensional space. Operating entirely within the
surface we can lay out arbitrary curvilinear coordinates u,v, and then determine the metric
coefficients (and their derivatives) with respect to those coordinates, and from this
information alone we can compute the intrinsic curvature of the surface. The Gaussian
curvature K is defined as the product of the two principal extrinsic sectional curvatures κ1
and κ2, neither of which is an intrinsic metrical property of the surface, but the product of
these two numbers is an intrinsic metrical property.
In Section 5.7 the full Riemann curvature tensor Rabcd for manifolds of any number of
dimensions is defined, and we show that Gauss' surface curvature K is equal to Ruvuv/g,

which completely characterizes the curvature of a two-dimensional surface. To highlight
the correspondence between Gauss' formula and the full curvature tensor, we can re-write
the above formula as

where we have used the facts that guu/g = g^vv, gvv/g = g^uu, and guv/g = −g^uv. Notice that if
we define the symbol

for any three indices a,b,c, then Gauss' formula for the curvature of a surface can be
written more succinctly as

No summations are implied here, but to abbreviate the notation even further, we could
designate the symbols α and β as "wild card" indices, with implied summation of every
term in which they appear over all possible indices (i.e., over u and v). On this basis the
formula is

As discussed in Section 5.7, this is precisely the formula for the component Ruvuv of the
full Riemann curvature tensor in n dimensions, which makes it clear how directly Gauss'
result for two-dimensional surfaces generalizes to n dimensions. Naturally this formula
for K reduces to κ1κ2 = 4ac − b2, where κ1 and κ2 are the two principal extrinsic curvatures
relative to a flat plane tangent to the surface at the point of interest. The reason this
formula is so complicated is that it applies to any system of coordinates (rather than just
projected tangent normal coordinates), and is based entirely on the intrinsic properties of
the surface.
To illustrate this approach, consider the two-dimensional surface defined as the locus of
points at a height h above the xy plane, where h is given by the equation

    h(x,y) = ax2 + bxy + cy2

with arbitrary constants a, b, and c. For example, with a=c=0 and b=1 this gives the
simple surface h = xy shown below:

[Figure: the saddle-shaped surface h = xy]
For other values of a,b,c this surface can have various shapes, such as paraboloids. The
function h(x,y) is single-valued over the entire xy plane, so it's convenient to simply
project the xy grid onto the surface and use this as our coordinates on the surface. (Any
other system of curvilinear coordinates would serve just as well.)
Over a sufficiently small interval on this surface the distance ds along a path is related to
the incremental changes dx, dy, and dz according to the usual Pythagorean relation

    (ds)2 = (dx)2 + (dy)2 + (dz)2

Also the equation of the surface allows us to express the increment dz in terms of dx and
dy as follows

    dz = (2ax + by) dx + (bx + 2cy) dy

Therefore we have

    (dz)2 = (2ax + by)2 (dx)2 + 2(2ax + by)(bx + 2cy)(dx)(dy) + (bx + 2cy)2 (dy)2

Substituting this into the equation for the line element (ds)2 gives the basic metrical
equation of the surface

    (ds)2 = gxx (dx)2 + 2 gxy (dx)(dy) + gyy (dy)2

where the components of the "metric tensor" are

    gxx = 1 + (2ax + by)2        gxy = (2ax + by)(bx + 2cy)        gyy = 1 + (bx + 2cy)2
We can, in principle, directly measure the incremental distance ds for any given
increments dx and dy without ever leaving the surface, so the metric components are
purely intrinsic properties of the surface. In general the metric tensor is a symmetric
covariant tensor of second order, and is usually written in the form of a matrix. Thus, for
our simple example we can write the metric as

    g =  | 1 + (2ax + by)2           (2ax + by)(bx + 2cy) |
         | (2ax + by)(bx + 2cy)      1 + (bx + 2cy)2      |

The determinant of this matrix at the point (x,y) is

    g = 1 + (2ax + by)2 + (bx + 2cy)2

The inverse of the metric tensor is denoted by g^uv, where the superscripts are still indices,
not exponents. In our example the inverse metric tensor is

    g^uv = (1/g) | 1 + (bx + 2cy)2           −(2ax + by)(bx + 2cy) |
                 | −(2ax + by)(bx + 2cy)     1 + (2ax + by)2       |
Substituting these metric components into the general formula for the Gaussian curvature
K gives

    K = (4ac − b2) / [ 1 + (2ax + by)2 + (bx + 2cy)2 ]2 = (4ac − b2)/g2

in agreement with our earlier result for surfaces specified explicitly in the form z =
f(x,y). At the origin, where x = y = 0, this gives K = 4ac − b2, i.e., the product of the two
principal extrinsic curvatures. In addition, the formula gives the Gaussian curvature for
any point on the surface, so we don't have to go to the trouble of laboriously constructing
a tangent plane at each point and finding the quadratic expansion of the surface about that
point.
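This result can also be verified symbolically. The sketch below (an addition to the text)
computes K for the graph z = ax2 + bxy + cy2 from the standard graph-curvature formula
and confirms that it equals (4ac − b2)/g2 at every point:

import sympy as sp

x, y, a, b, c = sp.symbols('x y a b c')
h = a*x**2 + b*x*y + c*y**2
hx, hy = sp.diff(h, x), sp.diff(h, y)
g = 1 + hx**2 + hy**2                        # determinant of the surface metric

# Gaussian curvature of a graph z = h(x,y):
#   K = (h_xx h_yy - h_xy^2) / (1 + h_x^2 + h_y^2)^2
K = (sp.diff(h, x, 2)*sp.diff(h, y, 2) - sp.diff(h, x, y)**2) / g**2
print(sp.simplify(K - (4*a*c - b**2)/g**2))  # 0: the two expressions agree
print(K.subs([(x, 0), (y, 0)]))              # 4*a*c - b**2 at the origin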
We can see from this formula that the curvature at every point of this simple two-dimensional surface always has the same sign as the discriminant 4ac − b2. Also, the
shape of the constant-curvature lines on this surface can be determined by re-arranging
the terms of the above equation, from which we find that the curvature equals K on the
locus of points satisfying the equation

    (2ax + by)2 + (bx + 2cy)2 = [ (4ac − b2)/K ]^(1/2) − 1

This is the equation of a conic with discriminant +4(4ac − b2)2. The case of zero
curvature occurs only when the discriminant vanishes, which implies that b = ±2(ac)^(1/2), and
so the equation of the surface factors as

    h = ( a^(1/2) x ± c^(1/2) y )2

The quantity inside the parentheses is a planar function, so the surface is a parabolic
"valley", which has no intrinsic curvature (like the walls of a cylinder).
It follows from the preceding conic equation that the lines of constant curvature (if there
is any curvature) must be ellipses centered on the origin. However, this is not the most
general form of curvature possible on a two-dimensional surface, it's just the most
general form for a surface embedded in three-dimensional Euclidean space. Suppose we
embed our two-dimensional surface in four dimensional Euclidean space. We can still, at
any given point on the surface, construct a two-dimensional tangent plane with
orthogonal xy coordinates, and expand the equation of the surface up to second degree
about that point, but now instead of just a single perpendicular height h(x,y) we allow
two mutually perpendicular heights, which we may call h1(x,y) and h2(x,y). Our surface
can now be defined (in the neighborhood of the origin at the point of tangency) by the
equations

    h1(x,y) = a1x2 + b1xy + c1y2        h2(x,y) = a2x2 + b2xy + c2y2

Following the same procedure as before, determining the components of the metric tensor
for this surface and plugging them into Gauss's formula, we find that the intrinsic
curvature of this surface is

where

The lines of constant curvature on this surface can be much more diverse than for a
surface embedded in just three dimensional space. As an example, if we define the
surface with the equations

then the lines of constant curvature are as indicated in the figure below.

We have focused on two-dimensional surfaces in this section, but the basic idea of
intrinsic curvature remains essentially the same in any number of dimensions. We'll see
in subsequent sections that Riemann generalized Gauss's notion of intrinsic curvature by
noting that any two (distinct) directional rays emanating from a given point P, if
continued geodesically and with parallel transport (both of which we will discuss in
detail), single out a two-dimensional surface within the manifold, and we can determine
the "sectional" curvature of that surface in the same way as described in this section. Of
course, in a manifold of three or more dimensions there are infinitely many two-dimensional surfaces passing through any given point, but Riemann showed how to
encode enough information about the manifold at each point so that we can compute the
sectional curvature on any surface.
For spaces of n>2 dimensions, we can proceed in essentially the same way, by imagining
a flat n-dimensional Euclidean space tangent to the space at the point of interest, with a
Cartesian coordinate system, and then evaluating how the curved space deviates from the
flat space into another set of n(n−1)/2 orthogonal dimensions, one for each pair of
dimensions in the flat tangent space. This is obviously just a generalization of our
approach for n = 2 dimensions, when we considered a flat 2D space with Cartesian
coordinates x,y tangent to the surface, and described the curved surface in the region
around the tangent point in terms of the "height" h(x,y) perpendicular to the surface.
Since we have chosen a flat baseline space tangent to the curved surface, it follows that
the constant and first-order terms of h(x,y) are zero. Also, since we are not interested in
any derivatives higher than the second, we can neglect all terms of h(x,y) above
second order. Consequently we can express h(x,y) as a homogeneous second-order
expression, i.e.,

    h(x,y) = ax2 + bxy + cy2
We saw that embedding a curved 2D surface in four dimensions allows even more
freedom for the shape of the surface, but in the limit as the region becomes smaller and
smaller, the surface approaches a single height. Similarly for a space of three dimensions
we can imagine a flat three-dimensional space with x,y,z Cartesian coordinates tangent to
the curved surface, and consider three perpendicular "heights" h1(x,y), h2(x,z), and
h3(y,z).
There are obvious similarities between intrinsic curvature and ordinary spatial rotations,
neither of which are possible in a space of just one dimension, and both of which are - in
a sense - inherently two-dimensional phenomena, even when they exist in a space of
more than two dimensions. Another similarity is the non-commutativity exhibited by
rotations as well as by translations on a curved surface. In fact, we could define
curvature as the degree to which translations along two given directions do not commute.
The reason for this behavior is closely connected to the fact that rotations in space are
non-commutative, as can be seen most clearly by imagining a curved surface embedded
in a higher dimensional space, and noting that the translations on the surface actually
involve rotations, i.e., angular displacements in the embedding space. Hence it's
inevitable that such displacements don't commute.

5.4 Relatively Straight


There's some end at last for the man who follows a path; mere rambling is interminable.
Seneca, 60 AD

The principle of relativity, as expressed in Newton's first law of motion (and carried over
essentially unchanged into Einstein's special theory of relativity) is based on the idea of
uniform motion in a straight line. However, the terms "uniform motion" and "straight
line" are not as easy to define as one might think. Historically, it was usually just assumed
that such things exist, and that we know them when we see them. Admittedly there were
attempts to describe these concepts, but mainly in somewhat vague and often circular
ways. For example, Euclid tells us that "a line is breadthless length", and "a straight line
is a line which lies evenly with the points on itself". The precise literal interpretation of
these statements can be debated, but they seem to have been modeled on an earlier
definition given by Plato, who said a straight line is "that of which the middle covers the
ends". This in turn may have been based on Parmenides' saying that "straight is whatever
has its middle directly between the ends".
Each of these definitions relies on some pre-existing idea of straightness to give meaning
to such terms as "lying evenly" or "directly between", so they are immediately self-referential. Other early attempts to define straightness invoked visual alignment, on the
presumption that light travels in a straight line. Of course, we could simply define
straightness to be congruence with a path of light, but such an empirical definition would
obviously preclude asking whether, in fact, light necessarily travels in straight lines as
defined in some more abstract sense. Not surprisingly, thinkers like Plato and Euclid, who
wished to keep geometry and mechanics strictly separate, preferred a purely abstract a
priori definition of straightness, without appealing (explicitly) to any physical
phenomena. Unfortunately, their attempts to provide meaningful conceptual definition
were not particularly successful.
Aristotle noted that among all possible lines connecting two given points, the straight line
is the one with the shortest length, and Archimedes suggested that this property could be
taken as the definition of a straight line. This at least has the merit of relating two
potentially distinct concepts, straightness and length, and even gives us a way of
quantifying which of two lines (i.e., curves) connecting two points is "straighter", simply
by comparing their lengths, without explicitly invoking the straightness of anything else.
Furthermore, this definition can be applied in a more general context, such as on the
surface of the Earth, where the straightest (shortest) path between two points is an arc of
a great circle, which is typically not congruent to a visual line of sight. We saw in Chapter
3.5 that Hero based his explanation of optical reflection on the hypothesis that light
travels along the shortest possible path. This is a nice example of how an a priori
conceptual definition of straightness led to a non-trivial physical theory about the
behavior of light, which obviously would have been precluded if there had been no
conception of straightness other than that it corresponds to the paths of light.
We've also seen how Fermat refined this principle of straightness to involve the variable
of time, related to spatial distances by what he intuited was an invariant characteristic
speed of light. Similarly the principle of least action, popularized by Maupertuis and
Euler, represented the application of stationary paths in various phase spaces (i.e., the
abstract space whose coordinates are the free variables describing the state of a system),
but for actual geometrical space (and time) the old Euclidean concept of extrinsic
straightness continued to predominate, both in mathematics and in physics. Even in the
special theory of relativity Einstein relied on the intuitive Euclidean concept of
straightness, although he was dissatisfied with this approach, and believed that the true
principle of relativity should be based on the more profound Archimedean concept of
straight lines as paths with extremal lengths. In a sense, this could be regarded as
relativizing the concept of straightness, i.e., rather than seeking absolute extrinsic
straightness, we focus instead on relative straightness of neighboring paths, and declare
the extremum of the available paths to be "straight", or rather "as straight as possible".
In addition, Einstein was motivated by the classical idea of Copernicus that we should not
regard our own particular frame of reference (or any other frame of reference) as special
or preferred for the laws of physics. It ought to be possible to express the laws of physics
in such a way that they apply to any system of coordinates, regardless of their state of
motion. The special theory succeeds in this for all uniformly moving systems of
coordinates (although with the epistemological shortcoming noted above), but Einstein
sought a more general theory of relativity encompassing coordinate systems in any state
of motion and avoiding the circular definition of straightness.
We've noted that Archimedes suggested defining a straight line as the shortest path
between two points, but how can we determine which of the infinitely many paths from
any given point to another is the shortest? Let us imagine any arbitrary path through
three-dimensional space from the point P1 at (x1,y1,z1) to the point P2 at (x2,y2,z2). We can
completely describe this path by assigning a smooth monotonic parameter λ to the points
of the path, such that λ=0 at P1 and λ=1 at P2, and then specifying the values of x(λ), y(λ),
and z(λ) as functions of λ. The total length S of the path can be found from the
functions x(λ), y(λ), and z(λ) by integrating the differential distances all along the path as
follows

    S = ∫0..1 [ (dx/dλ)2 + (dy/dλ)2 + (dz/dλ)2 ]^(1/2) dλ
Now suppose we let δx(λ), δy(λ), and δz(λ) denote three arbitrary functions of λ,
representing some deviation from the nominal path, and consider the resulting "disturbed
path" described by the functions

    X(λ) = x(λ) + ε δx(λ)        Y(λ) = y(λ) + ε δy(λ)        Z(λ) = z(λ) + ε δz(λ)

where ε is a parameter that we can vary to apply different fractions of the disturbance.
For any fixed value of the parameter ε the distance along the path from P1 to P2 is given
by

    S(ε) = ∫0..1 [ (dX/dλ)2 + (dY/dλ)2 + (dZ/dλ)2 ]^(1/2) dλ
Our objective is to find functions x(λ), y(λ), z(λ) such that for any arbitrary disturbance
vector (δx, δy, δz), the value of S(ε) is minimized at ε = 0. Those functions will then describe the
straightest path from P1 to P2.
To find the minimal value of S(ε) we differentiate with respect to ε. It's legitimate to
perform this differentiation inside the integral, so (omitting the indications of functional
dependencies) we can write

    dS/dε = ∫0..1 [ (dX/dλ)(d/dε)(dX/dλ) + (dY/dλ)(d/dε)(dY/dλ) + (dZ/dλ)(d/dε)(dZ/dλ) ]
                  / [ (dX/dλ)2 + (dY/dλ)2 + (dZ/dλ)2 ]^(1/2)  dλ

We can evaluate the derivatives with respect to λ based on the definitions of X,Y,Z as
follows

    dX/dλ = dx/dλ + ε d(δx)/dλ        dY/dλ = dy/dλ + ε d(δy)/dλ        dZ/dλ = dz/dλ + ε d(δz)/dλ

Therefore, the derivatives of these with respect to ε are simply

    (d/dε)(dX/dλ) = d(δx)/dλ        (d/dε)(dY/dλ) = d(δy)/dλ        (d/dε)(dZ/dλ) = d(δz)/dλ

Substituting these expressions into the previous equation gives

    dS/dε = ∫0..1 [ (dX/dλ) δẋ + (dY/dλ) δẏ + (dZ/dλ) δż ] / [ (dX/dλ)2 + (dY/dλ)2 + (dZ/dλ)2 ]^(1/2) dλ
We want this quantity to equal zero when ε equals 0. Of course, in that case we have
X=x, Y=y, and Z=z, so we make these substitutions and then require that the above
integral vanish. Thus, letting dots denote differentiation with respect to λ, we have

    ∫0..1 [ ẋ δẋ + ẏ δẏ + ż δż ] / [ ẋ2 + ẏ2 + ż2 ]^(1/2) dλ = 0

Using "integration by parts" we can evaluate this integral, term by term. For example,
considering just the x component in the numerator, we can use the "parts" variables

    u = ẋ / [ ẋ2 + ẏ2 + ż2 ]^(1/2)        dv = δẋ dλ

and then the usual formula for integration by parts gives

    ∫0..1 u dv = [ u δx ]0..1 − ∫0..1 δx (du/dλ) dλ
The first term on the right-hand side automatically vanishes, because by definition the
disturbance components δx, δy, δz are all zero at the end-points of the path. Applying the
same technique to the other components, we arrive at the following expression for the
overall integral which we wish to set to zero

    −∫0..1 { δx (d/dλ)[ ẋ / (ẋ2 + ẏ2 + ż2)^(1/2) ] + δy (d/dλ)[ ẏ / (ẋ2 + ẏ2 + ż2)^(1/2) ]
             + δz (d/dλ)[ ż / (ẋ2 + ẏ2 + ż2)^(1/2) ] } dλ = 0

The coefficients of the three terms in the integrand are the disturbance functions δx, δy, δz,
which are allowed to take on any arbitrary values in between λ = 0 and λ = 1. Regardless
of the values of these three disturbance components, we require the integral to vanish.
This is a very strong requirement, and can only be met by setting each of the three
derivatives in parentheses to zero, i.e., it requires

    (d/dλ)[ ẋ / (ẋ2 + ẏ2 + ż2)^(1/2) ] = (d/dλ)[ ẏ / (ẋ2 + ẏ2 + ż2)^(1/2) ] = (d/dλ)[ ż / (ẋ2 + ẏ2 + ż2)^(1/2) ] = 0

This implies that the arguments of these three derivatives do not change as a function of
the path parameter, so they have constant values all along the path. Thus we have

    ẋ / (ẋ2 + ẏ2 + ż2)^(1/2) = Cx        ẏ / (ẋ2 + ẏ2 + ż2)^(1/2) = Cy        ż / (ẋ2 + ẏ2 + ż2)^(1/2) = Cz
The numerators of these expressions can be regarded as the x, y, and z components,
respectively, of the "rate" of motion (per λ) along the path, whereas the denominators
represent the total magnitude of the motion. Thus, these conditions tell us that the
components of motion along the path are in a constant ratio to each other, which means
that the direction of motion is constant, i.e., a straight line. So, to reach from P1 to P2, the
constants must be given by Cx = (x2 − x1)/D, Cy = (y2 − y1)/D, and Cz = (z2 − z1)/D, where D
is the total distance given by D2 = (x2 − x1)2 + (y2 − y1)2 + (z2 − z1)2. Given an initial
trajectory, the entire path is determined by the assumption that it proceeds from point to
point always by the shortest possible route.
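The same conclusion can be reached numerically by discretizing a path and minimizing
its total length directly (a minimal added sketch; the endpoints and point count are
arbitrary choices):

import numpy as np
from scipy.optimize import minimize

P1, P2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 3.0])
n = 20                                         # number of interior points

def length(interior):
    # total polygonal length of the path P1 -> interior points -> P2
    pts = np.vstack([P1, interior.reshape(n, 3), P2])
    return np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

res = minimize(length, np.random.rand(n * 3), method='L-BFGS-B')
pts = res.x.reshape(n, 3)
t = (pts - P1) @ (P2 - P1) / np.dot(P2 - P1, P2 - P1)
print(np.allclose(pts, P1 + np.outer(t, P2 - P1), atol=1e-3))  # points lie on the chord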
So far we have focused on finding the geodesic paths in ordinary Euclidean three-dimensional space, and found that they correspond to our usual notion of straight lines.
However, in a space with a different metric, the shapes of geodesic paths can be more
complicated. To determine the general equations for geodesic paths, let us first formalize
the preceding "variational" technique. In general, suppose we wish to determine a
function x(λ) from λ1 to λ2 such that the integral of some function F(λ, x, ẋ) along that
path is stationary. (As before, dots signify derivatives with respect to λ.) We again define
an arbitrary disturbance δx(λ) and the disturbed function X(λ,ε) = x(λ) + ε δx(λ), where ε
is a parameter that determines how much of the disturbance is to be applied. We wish to
make stationary the integral

    S(ε) = ∫λ1..λ2 F(λ, X, Ẋ) dλ
This is done by differentiating S with respect to the parameter ε as follows

    dS/dε = ∫λ1..λ2 [ (∂F/∂X)(∂X/∂ε) + (∂F/∂Ẋ)(∂Ẋ/∂ε) ] dλ

Substituting for ∂X/∂ε and ∂Ẋ/∂ε gives

    dS/dε = ∫λ1..λ2 [ (∂F/∂X) δx + (∂F/∂Ẋ) δẋ ] dλ

We want to set this quantity to zero when ε = 0, which implies X = x, so we require

    ∫λ1..λ2 [ (∂F/∂x) δx + (∂F/∂ẋ) δẋ ] dλ = 0

The integral of the second term in parentheses can be evaluated (integration by parts) as

    ∫λ1..λ2 (∂F/∂ẋ) δẋ dλ = [ (∂F/∂ẋ) δx ]λ1..λ2 − ∫λ1..λ2 δx (d/dλ)(∂F/∂ẋ) dλ

The first term on the right-hand side is identically zero (since the disturbance is defined
to be zero at the end points), so we can substitute the second term back into the preceding
equation and factor out the disturbance δx(λ) to give

    ∫λ1..λ2 δx [ ∂F/∂x − (d/dλ)(∂F/∂ẋ) ] dλ = 0

Again, since this equation must be satisfied for every possible (smooth) disturbance
function δx(λ), it requires that the quantity in parentheses vanish identically, so we arrive
at the Euler equation

    (d/dλ)(∂F/∂ẋ) − ∂F/∂x = 0
which is the basis for solving a wide variety of problems in the calculus of variations.
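For experimentation, sympy packages this machinery directly. The sketch below (an
added illustration relying on sympy's euler_equations helper) applies the Euler equation
to the arc-length integrand and reproduces the straight-line conditions derived above:

import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.Symbol('t')
x, y = sp.Function('x')(t), sp.Function('y')(t)
F = sp.sqrt(x.diff(t)**2 + y.diff(t)**2)       # arc-length integrand
for eq in euler_equations(F, [x, y], t):
    # prints equations equivalent to x''y' - y''x' = 0, i.e., a straight line
    print(sp.simplify(eq))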
The application of Euler's equation that most interests us is in finding the general
equation of the straightest possible path in an arbitrary smooth manifold with a defined
metric. In this case the function whose integral we wish to make stationary is the absolute
spacetime interval, defined by the metric equation

    (ds)2 = gμν dx^μ dx^ν

where, as usual, summation is implied over repeated indices. Multiplying the right side
by (dλ/dλ)2 and taking the square root of both sides gives the differential "distance" ds
along a path parameterized by λ. Integrating along the path from λ1 to λ2 gives the
distance to be made stationary

    s = ∫λ1..λ2 [ gμν (dx^μ/dλ)(dx^ν/dλ) ]^(1/2) dλ
For each individual coordinate x^α this can be treated as a variational problem with the
function

    F = [ gμν ẋ^μ ẋ^ν ]^(1/2)
where again dots signify differentiation with respect to λ. (Incidentally, the metric need
not be positive-definite, since we can always choose our sign convention so that the
squared intervals in question are positive, provided we never integrate along a path for
which the squared interval changes sign, which would represent changing from timelike
to spacelike, or vice versa, in relativity.) Therefore, we can apply Euler's equation to
immediately give the equations of geodesic paths on the surface with the specified metric

$$\frac{d}{d\lambda}\left(\frac{\partial F}{\partial \dot x^\alpha}\right) - \frac{\partial F}{\partial x^\alpha} \;=\; 0$$

For an n-dimensional space this represents n equations, one for each of the coordinates
x1, x2, ..., xn. Letting w = (ds/dλ)² = F² = g_μν ẋ^μ ẋ^ν, this can be written as

$$\frac{d}{d\lambda}\left(\frac{\partial w}{\partial \dot x^\alpha}\right) - \frac{\partial w}{\partial x^\alpha} \;=\; \frac{1}{2w}\,\frac{dw}{d\lambda}\,\frac{\partial w}{\partial \dot x^\alpha}$$
To simplify these equations, let us put the parameter λ equal to the integrated path length
s, so that we have w = 1 and dw/dλ = 0. The right-most term drops out, and we're left
with

$$\frac{d}{ds}\left(\frac{\partial w}{\partial \dot x^\alpha}\right) - \frac{\partial w}{\partial x^\alpha} \;=\; 0$$
Notice that even though w equals a constant 1 in these circumstances and the total
derivative vanishes, the partial derivatives do not necessarily vanish. Indeed, if we
substitute w = g_μν ẋ^μ ẋ^ν into this equation we get

$$\frac{d}{ds}\left(2\,g_{\alpha\nu}\,\dot x^\nu\right) - \frac{\partial g_{\mu\nu}}{\partial x^\alpha}\,\dot x^\mu \dot x^\nu \;=\; 0$$

Evaluating the derivative in the left-hand term and dividing through by 2, this gives

$$g_{\alpha\nu}\,\ddot x^\nu + \frac{\partial g_{\alpha\nu}}{\partial x^\mu}\,\dot x^\mu \dot x^\nu - \frac{1}{2}\,\frac{\partial g_{\mu\nu}}{\partial x^\alpha}\,\dot x^\mu \dot x^\nu \;=\; 0$$
At this point it's conventional to make use of the identity

$$\frac{\partial g_{\alpha\nu}}{\partial x^\mu}\,\dot x^\mu \dot x^\nu \;=\; \frac{\partial g_{\alpha\mu}}{\partial x^\nu}\,\dot x^\mu \dot x^\nu$$

(where we have simply swapped the μ and ν indices) to represent the middle term of the
preceding equation as half the sum of these two expressions. This enables us to write the
geodesic equations in the form

$$g_{\alpha\nu}\,\ddot x^\nu + [\mu\nu, \alpha]\,\dot x^\mu \dot x^\nu \;=\; 0$$

where the symbol [μν, α] is defined as

$$[\mu\nu, \alpha] \;=\; \frac{1}{2}\left(\frac{\partial g_{\alpha\mu}}{\partial x^\nu} + \frac{\partial g_{\alpha\nu}}{\partial x^\mu} - \frac{\partial g_{\mu\nu}}{\partial x^\alpha}\right)$$
These are called connection coefficients, also known as Christoffel symbols of the first
kind. Finally, if we multiply through by the contravariant metric g^βα, we have

$$\ddot x^\beta + \Gamma^\beta_{\mu\nu}\,\dot x^\mu \dot x^\nu \;=\; 0$$

where

$$\Gamma^\beta_{\mu\nu} \;=\; g^{\beta\alpha}\,[\mu\nu, \alpha]$$
are known as Christoffel symbols of the second kind.
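As a concrete illustration (my own addition, not part of the original text), the Christoffel symbols of both kinds can be computed mechanically from these definitions with a few lines of sympy; the function name and the polar-coordinate example are simply illustrative choices.

```python
# A sketch of the preceding definitions in the sympy computer algebra system.
import sympy as sp

def christoffel(g, coords):
    """Christoffel symbols of the first kind [ab,c] and second kind Gamma^d_ab
    for a metric matrix g in the given coordinates."""
    n = len(coords)
    ginv = g.inv()
    # [ab,c] = (g_ca,b + g_cb,a - g_ab,c)/2
    first = [[[sp.simplify((sp.diff(g[c, a], coords[b])
                            + sp.diff(g[c, b], coords[a])
                            - sp.diff(g[a, b], coords[c])) / 2)
               for c in range(n)] for b in range(n)] for a in range(n)]
    # Gamma^d_ab = g^dc [ab,c]
    second = [[[sp.simplify(sum(ginv[d, c] * first[a][b][c] for c in range(n)))
                for b in range(n)] for a in range(n)] for d in range(n)]
    return first, second

# Example: the Euclidean plane in polar coordinates, (ds)^2 = (dr)^2 + r^2 (dtheta)^2
r, th = sp.symbols('r theta', positive=True)
_, Gamma = christoffel(sp.Matrix([[1, 0], [0, r**2]]), (r, th))
print(Gamma[0][1][1], Gamma[1][0][1])   # prints -r and 1/r
```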


As an example, consider the simple two-dimensional surface h = ax² + bxy + cy²
discussed in Chapter 5.3. Using the metric tensor, its inverse, and partial derivatives we
can now directly compute the Christoffel symbols, from which we can give explicit
parametric equations for the geodesic paths on our surface:

$$\frac{d^2x}{ds^2} + \frac{(2ax+by)\left[2a\left(\frac{dx}{ds}\right)^2 + 2b\,\frac{dx}{ds}\frac{dy}{ds} + 2c\left(\frac{dy}{ds}\right)^2\right]}{1+(2ax+by)^2+(bx+2cy)^2} \;=\; 0$$

$$\frac{d^2y}{ds^2} + \frac{(bx+2cy)\left[2a\left(\frac{dx}{ds}\right)^2 + 2b\,\frac{dx}{ds}\frac{dy}{ds} + 2c\left(\frac{dy}{ds}\right)^2\right]}{1+(2ax+by)^2+(bx+2cy)^2} \;=\; 0$$
If we scale and rotate the coordinates so that the surface height has the form h = xy/R, the
geodesic equations reduce to

$$\frac{d^2x}{ds^2} + \frac{2y/R^2}{1+(x^2+y^2)/R^2}\,\frac{dx}{ds}\frac{dy}{ds} \;=\; 0 \qquad\qquad \frac{d^2y}{ds^2} + \frac{2x/R^2}{1+(x^2+y^2)/R^2}\,\frac{dx}{ds}\frac{dy}{ds} \;=\; 0$$
These equations show that if either dx/ds or dy/ds equals zero, the second derivatives of x
and y with respect to s must be zero, so lines of constant x and lines of constant y are
geodesics (as expected, since these are straight lines in space). Of course, given an initial
trajectory that is not parallel to either the x or y axis the resulting geodesic path on this
surface will be curved, and can be explicitly computed from the above formulas.
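The stated reduction can be spot-checked symbolically. The following sketch (my own; it assumes the standard induced metric of a surface embedded as a height function, g_xx = 1 + h_x², g_xy = h_x h_y, g_yy = 1 + h_y²) recovers the connection coefficients appearing in the equations above.

```python
# Checking the geodesic coefficients quoted above for the surface h = xy/R.
import sympy as sp

x, y, R = sp.symbols('x y R', positive=True)
hx, hy = y/R, x/R                      # partial derivatives of h = xy/R
g = sp.Matrix([[1 + hx**2, hx*hy],     # induced metric of the embedded surface
               [hx*hy, 1 + hy**2]])
ginv = g.inv()
coords = (x, y); n = 2
Gamma = [[[sp.simplify(sum(ginv[d, c]*(sp.diff(g[c, a], coords[b])
           + sp.diff(g[c, b], coords[a]) - sp.diff(g[a, b], coords[c]))/2
           for c in range(n))) for b in range(n)] for a in range(n)] for d in range(n)]
W = 1 + (x**2 + y**2)/R**2
print(sp.simplify(Gamma[0][0][1] - y/(R**2*W)))   # 0, matching the x equation
print(sp.simplify(Gamma[1][0][1] - x/(R**2*W)))   # 0, matching the y equation
print(Gamma[0][0][0], Gamma[1][0][0], Gamma[0][1][1], Gamma[1][1][1])
# all 0, so lines of constant x and lines of constant y are geodesics
```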

5.8 The Field Equations

You told us how an almost churchlike atmosphere is pervading your desolate house now.
And justifiably so, for unusual divine powers are at work in there.
Besso to Einstein, 30 Oct 1915

The basis of Einstein's general theory of relativity is the audacious idea that not only do
the metrical relations of spacetime deviate from perfect Euclidean flatness, but that the
metric itself is a dynamical object. In every other field theory the equations describe the
behavior of a physical field, such as the electric or magnetic field, within a constant and
immutable arena of space and time, but the field equations of general relativity describe
the behavior of space and time themselves. The spacetime metric is the field. This fact is
so familiar that we may be inclined to simply accept it without reflecting on how
ambitious it is, and how miraculous it is that such a theory is even possible, not to
mention (somewhat) comprehensible. Spacetime plays a dual role in this theory, because
it constitutes both the dynamical object and the context within which the dynamics are
defined. This self-referential aspect gives general relativity certain characteristics
different from any other field theory. For example, in other theories we formulate a
Cauchy initial value problem by specifying the condition of the field everywhere at a
given instant, and then use the field equations to determine the future evolution of the
field. In contrast, because of the inherent self-referential quality of the metrical field, we
are not free to specify arbitrary initial conditions, but only conditions that already satisfy
certain self-consistency requirements (a system of differential relations called the Bianchi
identities) imposed by the field equations themselves.
The self-referential quality of the metric field equations also manifests itself in their non-linearity. Under the laws of general relativity, every form of stress-energy gravitates,
including gravitation itself. This is really unavoidable for a theory in which the metrical
relations between entities determine the "positions" of those entities, and those positions
in turn influence the metric. This non-linearity raises both practical and theoretical
issues. From a practical standpoint, it ensures that exact analytical solutions will be very
difficult to determine. More importantly, from a conceptual standpoint, non-linearity
ensures that the field cannot in general be uniquely defined by the distribution of material
objects, because variations in the field itself can serve as "objects".
Furthermore, after eschewing the comfortable but naive principle of inertia as a suitable
foundation for physics, Einstein concluded that "in the general theory of relativity, space
and time cannot be defined in such a way that differences of the spatial coordinates can
be directly measured by the unit measuring rod, or differences in the time coordinate by a
standard clock...this requirement ... takes away from space and time the last remnant of
physical objectivity". It seems that we're completely at sea, unable to even begin to
formulate a definite solution, and lacking any definite system of reference for defining
even the most rudimentary quantities. It's not obvious how a viable physical theory could
emerge from such an austere level of abstraction.
These difficulties no doubt explain why Einstein's route to the field equations in the years
1907 to 1915 was so convoluted, with so much confusion and backtracking. One of the
principles that heuristically guided his search was what he called the principle of general
covariance. This was understood to mean that the laws of physics ought to be expressible
in the form of tensor equations, because such equations automatically hold with respect to
any system of curvilinear coordinates (within a given diffeomorphism class, as discussed
in Section 9.2). He abandoned this principle at one stage, believing that he and
Grossmann had proven it could not be made consistent with the Poisson equation of
Newtonian gravitation, but subsequently realized the invalidity of their arguments, and
re-embraced general covariance as a fundamental principle.
It strikes many people as ironic that Einstein found the principle of general covariance to
be so compelling, because, strictly speaking, it's possible to express almost any physical
law, including Newton's laws, in generally covariant form (i.e., as tensor equations). This
was not clear when Einstein first developed general relativity, but it was pointed out in
one of the very first published critiques of Einstein's 1916 paper, and immediately
acknowledged by Einstein. It's worth remembering that the generally covariant
formalism had been developed only in 1901 by Ricci and Levi-Civita, and the first real
use of it in physics was Einstein's formulation of general relativity. This historical
accident made it natural for people (including Einstein, at first) to imagine that general
relativity is distinguished from other theories by its general covariance, whereas in fact
general covariance was only a new mathematical formalism, and does not connote a
distinguishing physical attribute. For this reason, some people have been tempted to
conclude that the requirement of general covariance is actually vacuous. However, in
reply to this criticism, Einstein clarified the real meaning (for him) of this principle,
pointing out that its heuristic value arises when combined with the idea that the laws of
physics should not only be expressible as tensor equations, but should be expressible as
simple tensor equations. In 1918 he wrote "Of two theoretical systems which agree with
experience, that one is to be preferred which from the point of view of the absolute
differential calculus is the simplest and most transparent". This is still a bit vague, but it
seems that the quality which Einstein had in mind was closely related to the Machian idea
that the expression of the dynamical laws of a theory should be symmetrical up to
arbitrary continuous transformations of the spacetime coordinates. Of course, the
presence of any particle of matter with a definite state of motion automatically breaks the
symmetry, but a particle of matter is a dynamical object of the theory. The general
principle that Einstein had in mind was that only dynamical objects could be allowed to
introduce asymmetries. This leads naturally to the conclusion that the coefficients of the
spacetime metric itself must be dynamical elements of the theory, i.e., must be acted
upon. With this Einstein believed he had addressed what he regarded as the strongest of
Mach's criticisms of Newtonian spacetime, namely, the fact that Newton's space acted on
objects but was never acted upon by objects.
Let's follow Einstein's original presentation in his famous paper "The Foundation of the
General Theory of Relativity", which was published early in 1916. He notes that for
empty space, far from any gravitating object, we expect to have flat (i.e., Minkowskian)
spacetime, which amounts to requiring that Riemann's curvature tensor R_abcd vanishes.
However, in regions of space near gravitating matter we must clearly have non-zero
intrinsic curvature, because the gravitational field of an object cannot simply be
"transformed away" (to the second order) by a change of coordinates. Thus there is no
system of coordinates with respect to which the manifold is flat to the second order,
which is precisely the condition indicated by a non-vanishing Riemann curvature tensor.
Nevertheless, even at points where the full curvature tensor R_abcd is non-zero, the
contracted tensor of the second rank, R_bc = g^ad R_abcd = R^d_bcd, may vanish. Of course, a tensor
of rank four can be contracted in six different ways (the number of ways of choosing two
of the four indices), and in general this gives six distinct tensors of rank two. We are able
to single out a more or less unique contraction of the curvature tensor only because of
that tensor's symmetries (described in Section 5.7), which imply that of the six
contractions of R_abcd, two are zero and the other four are identical up to sign change.
Specifically we have

$$g^{ab}R_{abcd} \;=\; g^{cd}R_{abcd} \;=\; 0 \qquad\quad g^{ad}R_{abcd} = R_{bc}\,,\quad g^{bc}R_{abcd} = R_{ad}\,,\quad g^{ac}R_{abcd} = -R_{bd}\,,\quad g^{bd}R_{abcd} = -R_{ac}$$
By convention we define the Ricci tensor R_bc as the contraction g^ad R_abcd. In seeking
suitable conditions for the metric field in empty space, Einstein observes that
there is only a minimum arbitrariness in the choice... for besides R_μν there is no tensor of the
second rank which is formed from the g_μν and its derivatives, contains no derivative higher than the
second, and is linear in these derivatives... This prompts us to require for the matter-free
gravitational field that the symmetrical tensor R_μν ... shall vanish.

Thus, guided by the belief that the laws of physics should be the simplest possible tensor
equations (to ensure general covariance), he proposes that the field equations for the
gravitational field in empty space should be

$$R_{\mu\nu} \;=\; 0 \qquad\qquad (1)$$
Noting that R_μν takes on a particularly simple form on the condition that we choose
coordinates such that √(−g) = 1, Einstein originally expressed this in terms of the
Christoffel symbols as

$$\frac{\partial \Gamma^\alpha_{\mu\nu}}{\partial x^\alpha} - \Gamma^\alpha_{\nu\beta}\,\Gamma^\beta_{\mu\alpha} \;=\; 0 \qquad\qquad (1')$$

(except that in his 1916 paper Einstein had a different sign because he defined the symbol
Γ^α_μν as the negative of the Christoffel symbol of the second kind.) He then concludes the
section with words that obviously gave him great satisfaction, since he repeated
essentially the same comments at the conclusion of the paper:
These equations, which proceed, by the method of pure mathematics, from the requirement of the
general theory of relativity, give us, in combination with the [geodesic] equations of motion, to a
first approximation Newton's law of attraction, and to a second approximation the explanation of
the motion of the perihelion of the planet Mercury discovered by Leverrier. These facts must, in
my opinion, be taken as a convincing proof of the correctness of the theory.

To his friend Paul Ehrenfest in January 1916 he wrote that "for a few days I was beside
myself with joyous excitement", and to Fokker he said that seeing the anomaly in
Mercury's orbit emerge naturally from his purely geometrical field equations "had given
him palpitations of the heart". (These recollections are remarkably similar to the
presumably apocryphal story of Newton's trembling hand when he learned, in 1675, of
Picard's revised estimates of the Earth's size, and was thereby able to reconcile his
previous calculations of the Moon's orbit based on the assumption of an inverse-square
law of gravitation.)
The expression R_μν = 0 represents ten distinct equations in the ten unknown metric
components g_μν at each point in empty spacetime (where the term "empty" signifies the
absence of matter or electromagnetic energy, but obviously not the absence of the
metric/gravitational field.) Since these equations are generally covariant, it follows that
given any single solution we can construct infinitely many others simply by applying
arbitrary (continuous) coordinate transformations. Thus, each individual physical
solution has four full degrees of freedom which allow it to be expressed in different
ways. In order to uniquely determine a particular solution we must impose four
coordinate conditions on the g_μν, but this gives us a total of fourteen equations in just ten
unknowns, which could not be expected to possess any non-trivial solutions at all if the
fourteen equations were fully independent and arbitrary. Our only hope is if the ten
formal conditions represented by our basic field equations automatically satisfy four
identities for any values of the metric components, so that they really only impose six
independent conditions, which then would uniquely determine a solution when
augmented by a set of four arbitrary coordinate conditions.
It isn't hard to guess that the four "automatic" conditions to be satisfied by our field
equations must be the vanishing of the covariant derivatives, since this will guarantee
local conservation of any energy-momentum source term that we may place on the right
side of the equation, analogous to the mass density on the right side of Poisson's equation

$$\nabla^2 \phi \;=\; 4\pi G \rho$$
In tensor calculus the divergence generalizes to the covariant derivative, so we expect
that the covariant derivatives of the metrical field equations must identically vanish. The
Ricci tensor R_μν itself does not satisfy this requirement, but we can create a tensor that
does satisfy the requirement with just a slight modification of the Ricci tensor, and
without disturbing the relation R_μν = 0 for empty space. Subtracting half the metric
tensor times the invariant R = g^μν R_μν gives what is now called the Einstein tensor

$$G_{\mu\nu} \;=\; R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}R$$
Obviously the condition R_μν = 0 implies G_μν = 0. Conversely, if G_μν = 0 we can see from
the mixed form

$$G^\mu{}_\nu \;=\; R^\mu{}_\nu - \frac{1}{2}\,\delta^\mu_\nu\,R$$

that R must be zero, because otherwise R^μ_ν would need to be diagonal, with the
components R/2, which doesn't contract to the scalar R (except in two dimensions).
Consequently, the condition G_μν = 0 is equivalent to R_μν = 0 for empty space, but for
coupling with a non-zero source term we must use G_μν to represent the metrical field.
To represent the "source term" we will use the covariant energy-momentum tensor T_μν,
and regard it as the "cause" of the metric curvature (although one might also conceive of
the metric curvature as, in some temporally symmetrical sense, "causing" the energy-momentum). Einstein acknowledged that the introduction of this tensor is not justified by
the relativity principle alone, but it has the virtues of being closely related by analogy
with the Poisson equation from Newton's theory, it gives local conservation of energy and
momentum, and finally that it implies gravitational energy gravitates just as does every
other form of energy. On this basis we surmise that the field equations coupled to the
source term can be written in the form G_μν = κT_μν, where κ is a constant which must equal
8πG (where G is Newton's gravitational constant, in units with c = 1) in order for the field equations to
reduce to Newton's law in the weak field limit. Thus we have the complete expression of
Einstein's metrical law of general relativity

$$R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}R \;=\; 8\pi G\, T_{\mu\nu} \qquad\qquad (2)$$
It's worth noting that although the left side of the field equations is quite pure and almost
uniquely determined by mathematical requirements, the right side is a hodge-podge of
miscellaneous "stuff". As Einstein wrote,
The energy tensor can be regarded only as a provisional means of representing matter. In reality,
matter consists of electrically charged particles... It is only the circumstance that we have no
sufficient knowledge of the electromagnetic field of concentrated charges that compels us,
provisionally, to leave undetermined in presenting the theory, the true form of this tensor... The
right hand side [of (2)] is a formal condensation of all things whose comprehension in the sense of
a field theory is still problematic. Not for a moment... did I doubt that this formulation was merely
a makeshift in order to give the general principle of relativity a preliminary closed-form
expression. For it was essentially no more than a theory of the gravitational field, which was
isolated somewhat artificially from a total field of as yet unknown structure.

Alas, neither Einstein nor anyone since has been able to make further progress in
determining the true form of the right hand side of (2), although it is at the heart of
current efforts to reconcile quantum mechanics with general relativity. At present we
must be content to let T represent, in a vague sort of way, the energy density of the
electromagnetic field and matter.
A different (but equivalent) form of the field equations can be found by contracting (2)
with g^μν to give R − 2R = −R = 8πGT (where T = g^μν T_μν), and then substituting for R in (2) to give

$$R_{\mu\nu} \;=\; 8\pi G\left(T_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}T\right) \qquad\qquad (3)$$
which again makes clear that the field equations for empty space are simply R = 0.
Incidentally, the tensor G was named for Einstein because of his inspired use of it, not
because he discovered it. Indeed the vanishing of the covariant derivative of this tensor
had been discovered by Aurel Voss in 1880, by Ricci in 1889, and again by Luigi Bianchi
in 1902, all apparently independently. Bianchi had once been a student of Felix Klein, so
it's not surprising that Klein was able in 1918 to point out regarding the conservation laws
in Einstein's theory of gravitation that we need only "make use of the most elementary
formulae in the calculus of variations". Recall from Section 5.7 that the Riemann
curvature tensor in terms of arbitrary coordinates is

$$R_{abcd} \;=\; \frac{1}{2}\left(\frac{\partial^2 g_{ad}}{\partial x^b\,\partial x^c} + \frac{\partial^2 g_{bc}}{\partial x^a\,\partial x^d} - \frac{\partial^2 g_{ac}}{\partial x^b\,\partial x^d} - \frac{\partial^2 g_{bd}}{\partial x^a\,\partial x^c}\right) + \left[g_{ef}\left(\Gamma^e_{ad}\,\Gamma^f_{bc} - \Gamma^e_{ac}\,\Gamma^f_{bd}\right)\right]$$
At the origin of Riemann normal coordinates this reduces to g_ad,cb − g_ac,bd, because in such
coordinates the Christoffel symbols are all zero and we have the special symmetry gab,cd =
gcd,ab. Now, if we consider partial derivatives (which in these special coordinates are the
same as covariant derivatives) of this tensor, we see that the derivative of the quantity in
square brackets still vanishes, because the product rule implies that each term is a
Christoffel symbol times the derivative of a Christoffel symbol. We might also be
tempted to take advantage of the special symmetry gab,cd = gcd,ab , but this is not
permissible because although the two quantities are equal (at the origin of Riemann
normal coordinates), their derivatives are not generally equal. Hence when evaluating the
derivatives of the Riemann tensor, even at the origin of Riemann normal coordinates, we
must consider all four of the metric tensor derivatives in the above expression. Denoting
covariant differentiation with respect to a coordinate x^m by the subscript ;m, we have

$$R_{abcd;m} \;=\; \tfrac{1}{2}\left(g_{ad,bcm} + g_{bc,adm} - g_{ac,bdm} - g_{bd,acm}\right)$$
$$R_{abdm;c} \;=\; \tfrac{1}{2}\left(g_{am,bdc} + g_{bd,amc} - g_{ad,bmc} - g_{bm,adc}\right)$$
$$R_{abmc;d} \;=\; \tfrac{1}{2}\left(g_{ac,bmd} + g_{bm,acd} - g_{am,bcd} - g_{bc,amd}\right)$$

Noting that partial differentiation is commutative, and the metric tensor is symmetrical,
we see that the sum of these three tensors vanishes at the origin of Riemann normal
coordinates, and therefore with respect to all coordinates. Thus we have the Bianchi
identities

$$R_{abcd;m} + R_{abdm;c} + R_{abmc;d} \;=\; 0$$
Multiplying through by g^ad g^bc, making use of the symmetries of the Riemann tensor, and
the fact that the covariant derivative of the metric tensor vanishes identically, we have

$$R_{;m} - R^c{}_{m;c} - R^d{}_{m;d} \;=\; 0$$

which reduces to

$$R_{;m} - 2\,R^c{}_{m;c} \;=\; 0$$

Thus we have

$$\left(R^c{}_m - \frac{1}{2}\,\delta^c_m\,R\right)_{;c} \;=\; 0$$
showing that the "divergence" of the tensor inside the parentheses (the Einstein tensor)
vanishes identically.
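Because the index gymnastics above are easy to get wrong, a symbolic spot-check is reassuring. The sketch below (my own, not part of the text; the test metric (dτ)² = (dt)² − a(t)²(dx² + dy² + dz²) is just a convenient example with non-trivial curvature) confirms that the covariant divergence of the mixed Einstein tensor vanishes identically, whatever the function a(t).

```python
# Verifying the contracted Bianchi identity on a sample curved metric.
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
a = sp.Function('a', positive=True)(t)
coords = (t, x, y, z); n = 4
g = sp.diag(1, -a**2, -a**2, -a**2)
ginv = g.inv()
Gam = [[[sp.simplify(sum(ginv[d, c]*(sp.diff(g[c, p], coords[q])
        + sp.diff(g[c, q], coords[p]) - sp.diff(g[p, q], coords[c]))/2
        for c in range(n))) for q in range(n)] for p in range(n)] for d in range(n)]
Ric = sp.Matrix(n, n, lambda p, q: sp.simplify(
    sum(sp.diff(Gam[c][p][q], coords[c]) - sp.diff(Gam[c][p][c], coords[q])
        + sum(Gam[c][c][d]*Gam[d][p][q] - Gam[c][q][d]*Gam[d][p][c]
              for d in range(n)) for c in range(n))))
R = sum(ginv[p, q]*Ric[p, q] for p in range(n) for q in range(n))
Gmix = sp.simplify(ginv*Ric - sp.eye(n)*R/2)   # mixed Einstein tensor G^p_q
div = [sp.simplify(sum(sp.diff(Gmix[m, nu], coords[m])
       + sum(Gam[m][m][l]*Gmix[l, nu] - Gam[l][m][nu]*Gmix[m, l] for l in range(n))
       for m in range(n))) for nu in range(n)]
print(div)   # [0, 0, 0, 0]
```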
As an example of how the theory of relativity has influenced mathematics (in appropriate
reaction to the obvious influence of mathematics on relativity), in the same year that
Einstein, Hilbert, Klein, and others were struggling to understand the conservation laws
of the relativistic field equations, Emmy Noether published her famous work on the
relation between symmetries and conservation laws, and Klein didn't miss the
opportunity to show how Einstein's theory embodied aspects of his Erlangen program.
A slight (but significant) extension of the field equations was proposed by Einstein in
1917 based on cosmological considerations, as a means of ensuring stability of a static
closed universe. To accomplish this, he introduced a linear term with the cosmological
constant λ as follows

$$R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}R + \lambda\,g_{\mu\nu} \;=\; 8\pi G\,T_{\mu\nu}$$
When Hubble and other astronomers began to find evidence that in fact the large-scale
universe is expanding, and Einstein realized his ingenious introduction of the
cosmological constant had led him away from making such a fantastic prediction, he
called it "the biggest blunder of my life".
It's worth noting that Einsteinian gravity is possible only in four dimensions, because in
any fewer dimensions the vanishing of the Ricci tensor R_μν implies the vanishing of the
full Riemann tensor, which means no curvature and therefore no gravity in empty space.
Of course, the actual field equations for the vacuum assert that the Einstein tensor (not
the Ricci tensor) vanishes, so we should consider the possibility of G_μν being zero while R_μν
is non-zero. We saw above that G_μν = 0 implies R_μν = 0, but that was based on the
assumption of a four-dimensional manifold. In general for an n-dimensional manifold we
have R − (n/2)R = G (where G denotes the trace g^μν G_μν), so if n is not equal to 2, and if G_μν vanishes, we have G = 0 and it
follows that R = 0, and therefore R_μν must vanish. However, if n = 2 it is possible for G_μν
to equal zero even though R_μν is non-zero. Thus, in two dimensions, the vanishing of G_μν
does not imply the vanishing of R_μν. In this case we have

$$R_{\mu\nu} \;=\; \lambda\,g_{\mu\nu}$$

where λ can be any constant. Multiplying through by g^μν gives

$$R \;=\; 2\lambda$$
This is the vacuum solution of Einstein's field equations in two dimensions. Oddly
enough, this is also the vacuum solution for the field equations in four dimensions if λ is
identified as the non-zero cosmological constant. Any space of constant curvature is of
this form, although a space of this form need not be of constant curvature.
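This two-dimensional degeneracy can be verified directly. The sketch below (my own, not in the original text) uses the fact that any two-dimensional metric can locally be written in the conformally flat form e^{2φ}(dx² + dy²), and confirms that G_μν vanishes identically regardless of the function φ.

```python
# In two dimensions the Einstein tensor vanishes identically.
import sympy as sp

x, y = sp.symbols('x y')
phi = sp.Function('phi')(x, y)
coords = (x, y); n = 2
g = sp.exp(2*phi)*sp.eye(2)          # general 2D metric in conformal form
ginv = g.inv()
Gam = [[[sp.simplify(sum(ginv[d, c]*(sp.diff(g[c, a], coords[b])
        + sp.diff(g[c, b], coords[a]) - sp.diff(g[a, b], coords[c]))/2
        for c in range(n))) for b in range(n)] for a in range(n)] for d in range(n)]
Ric = sp.Matrix(n, n, lambda a, b: sp.simplify(
    sum(sp.diff(Gam[c][a][b], coords[c]) - sp.diff(Gam[c][a][c], coords[b])
        + sum(Gam[c][c][d]*Gam[d][a][b] - Gam[c][b][d]*Gam[d][a][c]
              for d in range(n)) for c in range(n))))
R = sp.simplify(sum(ginv[a, b]*Ric[a, b] for a in range(n) for b in range(n)))
print(sp.simplify(Ric - g*R/2))      # the zero matrix, for any phi(x, y)
```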
Once the field equations have been solved and the metric coefficients have been
determined, we then compute the paths of objects by means of the equations of motion.
It was originally taken as an axiom that the equations of motion are the geodesic
equations of the manifold, but in a series of papers from 1927 to 1949 Einstein and others
showed that if particles are treated as singularities in the field, then they must propagate
along geodesic paths. Therefore, it is not necessary to make an independent assumption
about the equations of motion. This is one of the most remarkable features of Einstein's
field equations, and is only possible because of the non-linear nature of the equations. Of
course, the hypothesis that particles can be treated as field singularities may seem no
more intuitively obvious than the geodesic hypothesis itself. Indeed Einstein himself was
usually very opposed to admitting any singularities, so it is somewhat ironic that he took
this approach to deriving the equations of motion. On the other hand, in 1939 Fock
showed that the field equations imply geodesic paths for any sufficiently small bodies
with negligible self-gravity, not treating them as singularities in the field. This approach
also suggests that more massive bodies would deviate from geodesics, and it relies on
representing matter by the stress-energy tensor, which Einstein always viewed with
suspicion.
To appreciate the physical significance of the Ricci tensor it's important to be aware of a
relation between the contracted Christoffel symbol and the scale factor of the
fundamental volume element of the manifold. This relation is based on the fact that if the
square matrix A is the inverse of the square matrix B, then the components of A can be
expressed in terms of the components of B by the equation A_ij = (∂B/∂B_ij)/B, where B is
the determinant of B. Accordingly, since the covariant metric tensor g_μν and the
contravariant metric tensor g^μν are matrix inverses of each other, we have

$$g^{\mu\nu} \;=\; \frac{1}{g}\,\frac{\partial g}{\partial g_{\mu\nu}}$$

If we multiply both sides by the partial of g_μν with respect to the coordinate x^σ we have

$$g^{\mu\nu}\,\frac{\partial g_{\mu\nu}}{\partial x^\sigma} \;=\; \frac{1}{g}\,\frac{\partial g}{\partial x^\sigma} \qquad\qquad (4)$$
Notice that the left hand side looks like part of a Christoffel symbol. Recall the general
form of these symbols

$$\Gamma^a_{bc} \;=\; \frac{1}{2}\,g^{a\sigma}\left(\frac{\partial g_{\sigma b}}{\partial x^c} + \frac{\partial g_{\sigma c}}{\partial x^b} - \frac{\partial g_{bc}}{\partial x^\sigma}\right)$$

If we set one of the lower indices of the Christoffel symbol, say c, equal to a, then we
have the contracted symbol

$$\Gamma^a_{ba} \;=\; \frac{1}{2}\,g^{a\sigma}\left(\frac{\partial g_{\sigma b}}{\partial x^a} + \frac{\partial g_{\sigma a}}{\partial x^b} - \frac{\partial g_{ba}}{\partial x^\sigma}\right)$$

Since the indices a and σ are both dummies (meaning they each take on all possible
values in the implied summation), and since g^aσ = g^σa, we can swap a and σ in any of the
terms without affecting the result. Swapping a and σ in the last term inside the
parentheses we see it cancels with the first term, and we're left with

$$\Gamma^a_{ba} \;=\; \frac{1}{2}\,g^{a\sigma}\,\frac{\partial g_{\sigma a}}{\partial x^b}$$

Comparing this with our previous result (4), we find that the contracted Christoffel
symbol can be written in the form

$$\Gamma^a_{ba} \;=\; \frac{1}{2g}\,\frac{\partial g}{\partial x^b}$$
Furthermore, recalling the elementary fact that the derivative of ln(y) equals 1/y times the
derivative of y, and the fact that k ln(y) = ln(y^k), this result can also be written in the
form

$$\Gamma^a_{ba} \;=\; \frac{\partial \ln\!\sqrt{|g|}}{\partial x^b}$$
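As a quick check of this identity, the following sketch (my own, not part of the text) evaluates both sides for Euclidean 3-space in spherical polar coordinates, for which √g = r² sin(θ):

```python
# Checking Gamma^a_ba = d(ln sqrt|g|)/dx^b for spherical polar coordinates.
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
coords = (r, th, ph); n = 3
g = sp.diag(1, r**2, r**2*sp.sin(th)**2)
ginv = g.inv()
Gam = [[[sum(ginv[d, c]*(sp.diff(g[c, a], coords[b]) + sp.diff(g[c, b], coords[a])
             - sp.diff(g[a, b], coords[c]))/2 for c in range(n))
         for b in range(n)] for a in range(n)] for d in range(n)]
for b in range(n):
    contracted = sp.simplify(sum(Gam[a][b][a] for a in range(n)))
    logderiv = sp.simplify(sp.diff(sp.log(sp.sqrt(g.det())), coords[b]))
    print(sp.simplify(contracted - logderiv))   # prints 0 for each coordinate
```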
Since our metrics all have negative determinants, we can replace |g| with -g in these
expressions. We're now in a position to evaluate the geometrical and physical
significance of the Ricci tensor, the vanishing of which constitutes Einstein's vacuum
field equations. The general form of the Ricci tensor is

$$R_{\mu\nu} \;=\; \frac{\partial \Gamma^a_{\mu\nu}}{\partial x^a} - \frac{\partial \Gamma^a_{\mu a}}{\partial x^\nu} + \Gamma^a_{ab}\,\Gamma^b_{\mu\nu} - \Gamma^a_{\nu b}\,\Gamma^b_{\mu a}$$
which of course is a contraction of the full Riemann curvature tensor. Making use of the
preceding identity, this can be written as

$$R_{\mu\nu} \;=\; -\frac{\partial^2 \ln\!\sqrt{-g}}{\partial x^\mu\,\partial x^\nu} + \Gamma^b_{\mu\nu}\,\frac{\partial \ln\!\sqrt{-g}}{\partial x^b} + \frac{\partial \Gamma^a_{\mu\nu}}{\partial x^a} - \Gamma^a_{\nu b}\,\Gamma^b_{\mu a} \qquad\qquad (5)$$
In his original 1916 paper on the general theory Einstein initially selected coordinates
such that the metric determinant g was a constant −1, in which case the partial derivatives
of ln√(−g) all vanish and the Ricci tensor is simply

$$R_{\mu\nu} \;=\; \frac{\partial \Gamma^a_{\mu\nu}}{\partial x^a} - \Gamma^a_{\nu b}\,\Gamma^b_{\mu a}$$
The vanishing of this tensor constitutes Einstein's vacuum field equations (1'), provided
the coordinates are such that g is constant. Even if g is not constant in terms of the
natural coordinates, it is often possible to transform the coordinates so as to make g
constant. For example, Schwarzschild replaced the usual r and θ coordinates with x =
r³/3 and y = −cos(θ), together with the assumption that g_tt = 1/g_rr, and thereby expressed
the spherically symmetrical line element in a form with g = −1. It is especially natural to
impose the condition of constant g in static systems of coordinates and spatially uniform
fields. Indeed, since we spend most of our time suspended quasi-statically in a nearly
uniform gravitational field, we are most intuitively familiar with gravity in this form.
From this point of view we identify the effects of gravity with the geodesic accelerations
relative to our static coordinates, as represented by the Christoffel symbols. Indeed
Einstein admitted that he conceptually identified the gravitational field with the
Christoffel symbols, despite the fact that it's possible to have non-vanishing Christoffel
symbols in flat spacetime, as discussed in Section 5.6.
However, we can also take the opposite view. Rather than focusing on "static" coordinate
systems with constant metric determinants which make the first two terms of (5) vanish,
we can focus on "free-falling" inertial coordinates (also known as Riemann normal
coordinates) in terms of which the Christoffel symbols, and therefore the second and
fourth terms of (5), vanish at the origin. In other words, we "abstract away" the original
sense of gravity as the extrinsic acceleration relative to some physically distinguished
system of static coordinates (such as the Schwarzschild coordinates), and focus instead
on the intrinsic tidal accelerations (i.e., local geodesic deviations) that correspond to the
intrinsic curvature of the manifold. At the origin of Riemann normal coordinates the
Ricci tensor (5) reduces to

$$R_{\mu\nu} \;=\; -\left(\ln\!\sqrt{-g}\right)_{,\mu\nu} + \Gamma^a_{\mu\nu,a}$$
where subscripts following commas signify partial derivatives with respect to the
designated coordinate. Making use of the skew symmetry on the lower three indices of
the Christoffel symbol partial derivatives in these coordinates (as described in Section
5.7), the second term on the right hand side can be replaced with the negative of its two
complementary terms given by rotating the lower indices, so we have

$$R_{\mu\nu} \;=\; -\left(\ln\!\sqrt{-g}\right)_{,\mu\nu} - \Gamma^a_{\nu a,\mu} - \Gamma^a_{a\mu,\nu}$$
Noting that each of the three terms on the right side is now a partial derivative of a
contracted Christoffel symbol, we have

$$R_{\mu\nu} \;=\; -3\left(\ln\!\sqrt{-g}\right)_{,\mu\nu}$$
At the origin of Riemann normal coordinates the first partial derivatives of g, and
therefore of √(−g), all vanish, so the chain rule allows us to bring those factors outside
the differentiations, and noting the commutativity of partial differentiation we arrive at
the expression for the components of the Ricci tensor at the origin of Riemann normal
coordinates

$$R_{ab} \;=\; -\frac{3}{\sqrt{-g}}\,\frac{\partial^2 \sqrt{-g}}{\partial x^a\,\partial x^b}$$
Thus the vacuum field equations R_ab = 0 reduce to

$$\frac{\partial^2 \sqrt{-g}}{\partial x^a\,\partial x^b} \;=\; 0$$

The quantity √(−g) is essentially a scale factor for the incremental volume element δV. In
fact, for any scalar field φ we have the invariant integral

$$\int \phi\,\sqrt{-g}\;\,dx^1\,dx^2\,dx^3\,dx^4$$

and taking φ = 1 gives the simple volume. Therefore, at the origin of Riemann normal
(free-falling inertial) coordinates we find that the components of the Ricci tensor Rab are

simply the second derivatives of the proper volume of an incremental volume element,
divided by that volume itself. Hence the vacuum field equations Rab = 0 simply express
the vanishing of these second derivatives with respect to any two coordinates (not
necessarily distinct). Likewise the "complete" field equations in the form of (3) signify
that three times the second derivatives of the volume, divided by the volume, equal the
corresponding components of the "divergence-free" energy-momentum tensor expressed
by the right hand side of (3).
In physical terms this implies that a small cloud of free-falling dust particles initially at
rest with respect to each other does not change its volume during an incremental advance
of proper time. Of course, this doesn't give a complete description of the effects of
gravity in a typical gravitational field, because although the volume of the cloud isn't
changing at this instant, its shape may be changing due to tidal acceleration. In a
spherically symmetrical field the cloud will become lengthened in the radial direction and
shortened in the normal directions. This variation in the shape is characterized by the
Weyl tensor, which in general may be non-zero even when the Ricci tensor vanishes.
It may seem that conceiving of gravity purely as tidal effect ignores what is usually the
most physically obvious manifestation of gravity, namely, the tendency of objects to "fall
down", i.e., the acceleration of the geodesics relative to our usual static coordinates near a
gravitating body. However, in most cases this too can be viewed as tidal accelerations,
provided we take a wider view of events. For example, the fall of a single apple to the
ground at one location on Earth can be transformed away (locally) by a suitable system of
accelerating coordinates, but the fall of apples all over the Earth cannot. In effect these
apples can be seen as a spherical cloud of dust particles, each following a geodesic path,
and those paths are converging and the cloud's volume is shrinking at an accelerating rate
as the shell collapses toward the Earth. The rate of acceleration (i.e., the second
derivative with respect to time) is proportional to the mass of the Earth, in accord with
the field equations.
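A rough numerical illustration (my own arithmetic, using standard values for the Earth) shows the size of this effect for the shell of falling apples:

```python
# Back-of-envelope arithmetic for the collapsing "shell of apples": each apple
# falls with acceleration GM/r^2, so a shell of radius r has V = (4/3) pi r^3
# and, starting from rest, (d^2 V/dt^2)/V = 3 rddot / r = -3GM/r^3.
G = 6.674e-11        # m^3 kg^-1 s^-2
M = 5.972e24         # kg, mass of the Earth
r = 6.371e6          # m, radius of the Earth
rddot = -G*M/r**2    # about -9.8 m/s^2, ordinary surface gravity
print(3*rddot/r)     # (d^2 V/dt^2)/V ~ -4.6e-6 s^-2, proportional to M
```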
5.5 The Schwarzschild Metric From Kepler's 3rd Law
In that same year [1665] I began to think of gravity extending to the orb of the Moon &
from Keplers rule of the periodical times of the Planets being in sesquialterate
proportion of their distances from the centers of their Orbs, I deduced that the forces
which keep the Planets in their Orbs must be reciprocally as the squares of their
distances from the centers about which they revolve: and thereby compared the force
requisite to keep the Moon in her Orb with the force of gravity at the surface of the
earth, and found them answer pretty nearly.
Isaac Newton

The first and still the most important rigorous solution of the Einstein field equations was
found by Schwarzschild in 1916. Although it's quite difficult to find exact analytical
solutions of the complete field equations for general situations, the task is immensely
simplified if we restrict our attention to highly symmetrical physical configurations. For
example, it's obvious that the flat Minkowski metric trivially satisfies the field equations.
The simplest non-trivial configuration in which gravity plays a role is a static mass point,

for which we can assume the metric has perfect spherical symmetry and is independent of
time. Let r denote the radial spatial coordinate, so that every point on a surface of
constant r has the same intrinsic geometry and the same relation to the mass point, which
we fix at r = 0. Also, let t denote our temporal coordinate. Any surface of constant r and t
must possess the two-dimensional intrinsic geometry of a 2-sphere, and we can scale the
radial parameter r such that the area of this surface is 4πr². (Notice that since the space
may not be Euclidean, we don't claim that r is "the radial distance" from the mass point.
Rather, at this stage r is simply an arbitrary radial coordinate scaled to give the familiar
Euclidean surface area.) With this scaling, we can parameterize the two-dimensional
surface at any given r (and t) by means of the ordinary "longitude and latitude" spherical
metric

$$(dS)^2 \;=\; r^2\left[(d\theta)^2 + \sin^2(\theta)\,(d\phi)^2\right]$$

where dS is the incremental distance on the surface of an ordinary sphere of radius r
corresponding to the incremental coordinate displacements dθ and dφ. The coordinate θ
represents "latitude", with θ = 0 at the north pole and θ = π/2 at the equator. The
coordinate φ represents the longitude relative to some arbitrary meridian.
On this basis, we can say that the complete spacetime metric near a spherically
symmetrical mass m must be of the form

$$(d\tau)^2 \;=\; g_{tt}\,(dt)^2 - g_{rr}\,(dr)^2 - g_{\theta\theta}\,(d\theta)^2 - g_{\phi\phi}\,(d\phi)^2$$

where g_θθ = r², g_φφ = r² sin²(θ), and g_tt and g_rr are (as yet) unknown functions of r and
the central mass m. Of course, if we set m = 0 the functions g_tt and g_rr must both equal 1
in order to give the flat Minkowski metric (in polar form), and we also expect that as r
increases to infinity these functions both approach 1, regardless of m, since we expect the
metric to approach flatness sufficiently far from the gravitating mass.
This metric is diagonal, so the non-zero components of the contravariant metric tensor are
simply the reciprocals of the corresponding covariant components. In addition, the
diagonality of the metric allows us to simplify the definition of the Christoffel symbols to

$$\Gamma^a_{bc} \;=\; \frac{1}{2\,g_{aa}}\left(\frac{\partial g_{ab}}{\partial x^c} + \frac{\partial g_{ac}}{\partial x^b} - \frac{\partial g_{bc}}{\partial x^a}\right) \qquad \text{(no summation over }a\text{)}$$
Now, the only non-zero partial derivatives of the metric coefficients are

$$\frac{\partial g_{\theta\theta}}{\partial r} \;=\; 2r \qquad\quad \frac{\partial g_{\phi\phi}}{\partial r} \;=\; 2r\sin^2(\theta) \qquad\quad \frac{\partial g_{\phi\phi}}{\partial \theta} \;=\; 2r^2\sin(\theta)\cos(\theta)$$

along with ∂g_tt/∂r and ∂g_rr/∂r, which are yet to be determined. Inserting these values into
the preceding equation, we find that the only non-zero Christoffel symbols are

$$\Gamma^t_{tr} = \frac{1}{2g_{tt}}\frac{\partial g_{tt}}{\partial r} \qquad \Gamma^r_{tt} = \frac{1}{2g_{rr}}\frac{\partial g_{tt}}{\partial r} \qquad \Gamma^r_{rr} = \frac{1}{2g_{rr}}\frac{\partial g_{rr}}{\partial r} \qquad \Gamma^r_{\theta\theta} = -\frac{r}{g_{rr}} \qquad \Gamma^r_{\phi\phi} = -\frac{r\sin^2(\theta)}{g_{rr}}$$

$$\Gamma^\theta_{r\theta} = \frac{1}{r} \qquad \Gamma^\theta_{\phi\phi} = -\sin(\theta)\cos(\theta) \qquad \Gamma^\phi_{r\phi} = \frac{1}{r} \qquad \Gamma^\phi_{\theta\phi} = \cot(\theta)$$
These are the coefficients of the four geodesic equations near a spherically symmetrical
mass. We assume that, in the absence of non-gravitational forces, all natural motions
(including light rays and massive particles) follow geodesic paths, so these equations
provide a complete description of inertial/gravitational motions of test particles in a
spherically symmetrical field. All that remains is to determine the metric coefficients g_tt
and g_rr.
We expect that one possible solution should be circular Keplerian orbits, i.e., if we regard
r as corresponding (at least approximately) to the Newtonian radial distance from the
center of the mass, then there should be a circular geodesic path at constant r that
revolves around the central mass m with an angular velocity of ω, and these quantities
must be related (at least approximately) in accord with Kepler's third law

$$\omega^2 r^3 \;=\; m$$
(The original deductions of an inverse-square law of gravitation by Hooke, Wren,
Newton, and others were all based on this same empirical law. See Section 8.1 for a
discussion of the origin of Kepler's law.) If we consider purely circular motion on the
equatorial plane (θ = π/2) at constant r, the metric reduces to

$$(d\tau)^2 \;=\; g_{tt}\,(dt)^2 - r^2\,(d\phi)^2$$

and since dr/dτ = 0 the geodesic equations are simply

$$\frac{d^2 t}{d\tau^2} \;=\; 0 \qquad\qquad \Gamma^r_{tt}\left(\frac{dt}{d\tau}\right)^2 + \Gamma^r_{\phi\phi}\left(\frac{d\phi}{d\tau}\right)^2 \;=\; 0$$

Multiplying through by (dτ/dt)² and identifying the angular speed ω with the derivative of
φ with respect to the coordinate time t, the right hand equation becomes

$$\frac{1}{2}\,\frac{\partial g_{tt}}{\partial r} \;=\; r\,\omega^2$$
For consistency with Kepler's Third Law we must have ω² equal (or very nearly equal) to
m/r³, so we make this substitution to give

$$\frac{\partial g_{tt}}{\partial r} \;=\; \frac{2m}{r^2}$$

Integrating this equation, we find that the metric coefficient g_tt must be of the form k −
(2m/r) where k is a constant of integration. Since g_tt must equal 1 when m = 0 and/or as r
approaches infinity, it's clear that k = 1, so we have

$$g_{tt} \;=\; 1 - \frac{2m}{r}$$
Also, for a photon moving away from the gravitating mass in the purely radial direction
we have dθ = dφ = 0, and so our basic metric for a purely radial ray of light (dτ = 0) gives

$$\frac{g_{tt}}{g_{rr}} \;=\; \left(\frac{dr}{dt}\right)^2$$

Invoking the symmetry v → 1/v, we select the factorization g_tt = dr/dt and g_rr = dt/dr,
which implies g_rr = 1/g_tt. This gives the complete Schwarzschild metric

$$(d\tau)^2 \;=\; \left(1-\frac{2m}{r}\right)(dt)^2 - \frac{(dr)^2}{1-2m/r} - r^2\,(d\theta)^2 - r^2\sin^2(\theta)\,(d\phi)^2$$
from which nearly all of the experimentally accessible consequences of general relativity
follow.
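As an independent check (my own addition, not in the original text), we can verify symbolically that this metric has an identically vanishing Ricci tensor, i.e., that it satisfies the vacuum field equations discussed in Section 5.8.

```python
# Verifying that the Schwarzschild metric satisfies R_mu,nu = 0.
import sympy as sp

t, r, th, ph, m = sp.symbols('t r theta phi m', positive=True)
coords = (t, r, th, ph); n = 4
f = 1 - 2*m/r
g = sp.diag(f, -1/f, -r**2, -r**2*sp.sin(th)**2)
ginv = g.inv()
Gam = [[[sp.simplify(sum(ginv[d, c]*(sp.diff(g[c, a], coords[b])
        + sp.diff(g[c, b], coords[a]) - sp.diff(g[a, b], coords[c]))/2
        for c in range(n))) for b in range(n)] for a in range(n)] for d in range(n)]

def ricci(a, b):
    # R_ab = Gam^c_ab,c - Gam^c_ac,b + Gam^c_cd Gam^d_ab - Gam^c_bd Gam^d_ac
    return sp.simplify(sum(sp.diff(Gam[c][a][b], coords[c])
        - sp.diff(Gam[c][a][c], coords[b])
        + sum(Gam[c][c][d]*Gam[d][a][b] - Gam[c][b][d]*Gam[d][a][c]
              for d in range(n)) for c in range(n)))

print([ricci(a, b) for a in range(n) for b in range(n)])   # sixteen zeros
```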
In matrix form the Schwarzschild metric is written as

$$g_{\mu\nu} \;=\; \begin{bmatrix} 1-\frac{2m}{r} & 0 & 0 & 0 \\ 0 & -\frac{1}{1-2m/r} & 0 & 0 \\ 0 & 0 & -r^2 & 0 \\ 0 & 0 & 0 & -r^2\sin^2(\theta) \end{bmatrix}$$
Now that we've determined g_tt and g_rr, we have the partials

$$\frac{\partial g_{tt}}{\partial r} \;=\; \frac{2m}{r^2} \qquad\qquad \frac{\partial g_{rr}}{\partial r} \;=\; \frac{-2m/r^2}{(1-2m/r)^2}$$

so the Christoffel symbols that we previously left undetermined are

$$\Gamma^t_{tr} \;=\; \frac{m/r^2}{1-2m/r} \qquad\quad \Gamma^r_{tt} \;=\; \frac{m}{r^2}\left(1-\frac{2m}{r}\right) \qquad\quad \Gamma^r_{rr} \;=\; \frac{-m/r^2}{1-2m/r}$$

Therefore, the complete set of geodesic equations for the Schwarzschild metric are

$$\frac{d^2 t}{d\lambda^2} + \frac{2m/r^2}{1-2m/r}\,\frac{dr}{d\lambda}\frac{dt}{d\lambda} \;=\; 0$$

$$\frac{d^2 r}{d\lambda^2} + \frac{m}{r^2}\left(1-\frac{2m}{r}\right)\left(\frac{dt}{d\lambda}\right)^2 - \frac{m/r^2}{1-2m/r}\left(\frac{dr}{d\lambda}\right)^2 - r\left(1-\frac{2m}{r}\right)\left[\left(\frac{d\theta}{d\lambda}\right)^2 + \sin^2(\theta)\left(\frac{d\phi}{d\lambda}\right)^2\right] \;=\; 0$$

$$\frac{d^2 \theta}{d\lambda^2} + \frac{2}{r}\,\frac{dr}{d\lambda}\frac{d\theta}{d\lambda} - \sin(\theta)\cos(\theta)\left(\frac{d\phi}{d\lambda}\right)^2 \;=\; 0$$

$$\frac{d^2 \phi}{d\lambda^2} + \frac{2}{r}\,\frac{dr}{d\lambda}\frac{d\phi}{d\lambda} + 2\cot(\theta)\,\frac{d\theta}{d\lambda}\frac{d\phi}{d\lambda} \;=\; 0$$
These are all parametric equations, where λ denotes a parameter that monotonically
varies along the path. When dealing with massive particles, which travel at sub-light
speeds, we must choose λ proportional to τ, the integrated lapse of proper time along the
path. On the other hand, the lapse of proper time along the path of a massless particle
(such as a photon) is zero by definition, so this raises an interesting question: How is it
possible to extremize the length of a path whose length is identically zero? Even
though the path of a photon has singular proper time, the path is not singular in all
respects, so we can still parameterize the path by simply assigning monotonic values of λ
to the points on the path. (Notice that, since geodesics are directionally symmetrical, it
doesn't matter whether λ is increasing or decreasing in the direction of travel.) An
alternative approach to solving for light-like geodesics, based on Fermat's principle of
least time, will be discussed in Section 8.4.
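To see these equations in action, here is a small numerical sketch (my own construction, not from the text) that integrates the equatorial geodesic equations (θ = π/2) and confirms that a circular path satisfying Kepler's law ω² = m/r³ in coordinate time is self-consistently a geodesic.

```python
# Integrating the equatorial Schwarzschild geodesic equations with scipy.
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0
def geodesic(tau, y):
    t, r, phi, tp, rp, phip = y
    f = 1 - 2*m/r
    return [tp, rp, phip,
            -(2*m/(r**2*f))*rp*tp,
            -(m/r**2)*f*tp**2 + (m/(r**2*f))*rp**2 + r*f*phip**2,
            -(2/r)*rp*phip]

r0 = 20.0
omega = np.sqrt(m/r0**3)                   # Kepler's third law (coordinate time)
u = np.sqrt(1 - 2*m/r0 - r0**2*omega**2)   # d(tau)/dt for the circular orbit
y0 = [0, r0, 0, 1/u, 0, omega/u]
sol = solve_ivp(geodesic, [0, 2000], y0, rtol=1e-10, atol=1e-12)
print(sol.y[1].min(), sol.y[1].max())      # r stays at 20 to within tolerance
```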
We applied Kepler's Third Law as a heuristic guide to these equations of motion, but
there is a certain ambiguity in the derivation, due to the distinction between coordinate
time t and the orbiting object's proper time τ. Recall that we defined the angular speed ω
of the orbit as dφ/dt rather than dφ/dτ. This illustrates the unavoidable ambiguity in
carrying over Newtonian laws of mechanics to the relativistic framework. Newtonian
physics didn't distinguish between the proper time along a particular path and coordinate
time - not surprisingly - since the two are practically indistinguishable for objects moving
at much less than the speed of light. Nevertheless, the slight deviation between these two
time parameters has observable consequences, and provides important tests for
distinguishing between the spacetime geodesic approach and the Newtonian force-at-a-distance
approach to gravitation. We've assumed that Kepler's Third Law is exactly
satisfied with respect to coordinate time t, but only approximately with respect to the
orbiting object's proper time τ. It's interesting that the Newtonian free-fall formulas for
purely radial paths are also applicable exactly in relativity, but only if time is interpreted
as the proper time of the falling particle. Thus we can claim an exact correspondence
between Newtonian and relativistic laws in each of these two fundamental cases by a
suitable correspondence of the time coordinates, but no single correspondence works for
both of them.
To show that the equations of motion derived above (taking τ as the parameter λ) are
fully equivalent to those of Newtonian gravity in the weak slow limit, we need only note
that the scale factor between the r and t coordinates (the speed of light) is so great that
we can neglect any terms that have a factor of dr/dt unless that term is also divided by r,
in which case the scale factor cancels out. Also we can assume that dt/dτ is essentially
equal to 1, and it's easy to see that if the motion of a test particle is initially in the plane
θ = π/2 then it remains always in that plane, and by spherical symmetry this applies to all
planes. So we can assume θ = π/2 and with the stated approximations the equations of
motion reduce to the familiar Newtonian equations

$$\frac{d^2 r}{dt^2} \;=\; -\frac{m}{r^2} + r\,\omega^2 \qquad\qquad \frac{d^2 \phi}{dt^2} \;=\; -\frac{2}{r}\,\frac{dr}{dt}\,\omega$$

where ω = dφ/dt is the angular velocity.


5.6 The Equivalence Principle
The important thing is this: to be able at any moment to sacrifice what
we are for what we could become.
Charles du Bois

At the end of a review article on special relativity in 1907, in which he surveyed the
stunning range and power of his unique relativistic interpretation, Einstein included a
section discussing the possibility of extending the idea still further.
So far we have applied the principle of relativity, i.e., the assumption that physical laws are
independent of the state of motion of the reference system, only to unaccelerated reference
systems. Is it conceivable that the principle of relativity also applies to systems that are accelerated
relative to each other?

This might have been regarded as merely a kinematic question, with no new physical
content, since we can obviously re-formulate physical laws to make them applicable in
terms of alternative systems of coordinates. However, as Einstein later recalled, the
thought occurred to him while writing this paper that a person in gravitational free-fall
doesn't feel their own weight. It's as if the gravitational field does not exist. This is
remarkably similar to Galileo's realization (three centuries earlier) that, for a person in
uniform motion, it is as if the motion does not exist. Interestingly, Galileo is also closely
associated with the fact that a (homogeneous) gravitational field can be transformed
away by a state of motion, because he was among the first to explicitly recognize the
equality of inertial and gravitational mass. As a consequence of this equality, the free-fall
path of a small test particle in a gravitational field is independent of the particle's
composition. If we consider two coordinate systems S1 and S2, the first accelerating (in
empty space) at a rate α in the x direction, and the second at rest in a homogeneous
gravitational field that imparts to all objects an acceleration of α in the x direction, then
Einstein observed that
as far as we know, the physical laws with respect to the S1 system do not differ from those with
respect to the S2 system we shall therefore assume the complete physical equivalence of a
gravitational field and a corresponding acceleration of the reference system.

This was the beginning of Einstein's search for an extension of the principle of relativity
to arbitrary coordinate systems, and for a satisfactory relativistic theory of gravity, a
search which ultimately led him to reject special relativity as a suitable framework in
which to formulate the most fundamental physical laws.
Despite the importance that Einstein attached to the equivalence principle (even stating
that the general theory of relativity rests exclusively on this principle), many
subsequent authors have challenged its significance, and even its validity. For example,
Ohanian and Ruffini (1994) emphatically assert that gravitational effects are not
equivalent to the effects arising from an observer's acceleration...", even limited to
sufficiently small regions. In support of this assertion they describe how accelerometers
of arbitrarily small size can detect tidal variations in a non-homogeneous gravitational
field based on local measurements. Unfortunately they overlook the significance of
their own comment regarding gradiometers that the sensitivity attained depends on the
integration time with a typical integration time of 10 seconds the sensitivity
demonstrated in a recent test was about the same as that of the Eötvös balance.
Needless to say, the locality restriction refers to sufficiently small regions of spacetime,
not just to small regions of space. The gradiometer may be only a fraction of a meter in
spatial extent, but 10 seconds of temporal extent corresponds to three billion meters,
which somewhat undermines the claim that the detection can be performed with such
accuracy in an arbitrarily small region of spacetime.
The same kind of conceptual error appears in every example that purports to show the
invalidity of the equivalence principle. For example, one well-known modern author
points out that an arbitrarily small droplet of liquid falling freely in the gravitational field
of a spherical body (neglecting surface tension and wind resistance, etc) will not be
perfectly spherical, but will be slightly ellipsoidal, due to the tidal effects of the
inhomogeneous field and the shape does not approach sphericity as the radius of the
droplet approaches zero. Furthermore, this applies to an arbitrarily brief snapshot of the
falling droplet. He takes this to be proof of the falsity of the equivalence principle,
whereas in fact it is just the opposite. If we began with a perfectly spherical droplet, it
would take a significant amount of time traversing an inhomogeneous field for the shape
to acquire its final ellipsoidal form, and as the length of time goes to zero, the deviation
from sphericity also goes to zero. Likewise, once the droplet has acquired its ellipsoidal
shape, that becomes its initial configuration upon entering any brief snapshot, and of
course it departs from that snapshot with the same shape, in perfect agreement with the
equivalence principle, which tells us to expect all the parts of the droplet to maintain their
initial mutual relations when in free fall.
Other authors have challenged the validity of the equivalence principle by considering the
effects of rotation. Of course, a "sufficiently small" region of spacetime for transforming
away the translatory motion of an object to some degree of approximation may not be
sufficiently small for transforming away the rotational motion to the same degree of
accuracy, but this does not conflict with the equivalence principle; it simply means that
for an infinitesimal particle in a rotating body the "sufficiently small" region of spacetime
is generally much smaller than for a particle in a non-rotating body, because it must be
limited to a small arc of angular travel. In general, all such arguments against the validity
of the (local) equivalence principle are misguided, based on a failure to correctly limit the
extent of the subject region of space and time.
Others have argued that, although the equivalence principle is valid for infinitesimal
regions of spacetime, this limitation renders it more or less meaningless. But this was
answered by Einstein himself several times. For example, when the validity of the
equivalence principle was challenged on the grounds that an arbitrary (inhomogeneous)
gravitational field over some finite region cannot be transformed away by any
particular state of motion, Einstein replied
To achieve the essential equivalence of inertia and gravitation it is not necessary that the
mechanical behavior of two or more masses must be explainable by the mere effect of inertia by
the same choice of coordinates. After all, nobody denies, for example, that the theory of special
relativity does justice to the nature of uniform motion, even though it cannot transform all
acceleration-free bodies together to a state of rest by one and the same choice of coordinates.

This observation should have settled the matter, but unfortunately the same specious
objection to the equivalence principle has been raised by successive generations of
critics. This is ironic, considering that a purely geometrical interpretation of gravity would
clearly be impossible if gravitational and inertial acceleration were not intrinsically
identical. The meaning of the equivalence principle (which Einstein called "the happiest
thought of my life") is that gravitation is not something that exists within spacetime, but
is rather an attribute of spacetime. Inertial motion is just a special case of free-fall in a
gravitational field. There is no additional entity or coupling present to produce the effects
of gravity on a test body. Gravity is geometry. This may be expressed somewhat
informally by saying that if we take sufficiently small pieces of curved and flat spacetime
we can't tell one from the other, because they are the same stuff. The perfect equivalence
between gravitational and inertial mass noted by Galileo implies that kinematic
acceleration and the acceleration of gravity are intrinsically identical, and this makes
possible a purely geometrical interpretation of gravity.
At the beginning of his 1916 paper on the foundations of the general theory of relativity,
Einstein discussed the need for an extension of the postulate of
relativity, and by considering the description of a physical object in terms of a rotating
system of coordinates he explained why Euclidean geometry does not apply. This is the
most common way of justifying the abandonment of Euclidean geometry, but in a paper
written in 1914 Einstein gave a more elementary and (arguably) more profound reason
for turning from Euclidean to Riemannian geometry. He pointed out that, prior to Faraday
and Maxwell, the fundamental laws of physics contained finite distances, such as the
distance r in Coulomb's inverse-square law for the electric force F = q1q2/r². Euclidean
geometry is the appropriate framework in which to represent such laws, because it is an
axiomatic structure based on finite distances, as can be seen from propositions such as the
Pythagorean theorem r1² = r2² + r3², where r1, r2, r3 are the finite lengths of the edges of a
right triangle. However, Einstein wrote
Since Maxwell, and by his work, physics has undergone a fundamental revision insofar as the
demand gradually prevailed that distances of points at a finite range should not occur in the
elementary laws, i.e., theories of action at a distance are now replaced by theories of local
action. One forgot in this process that the Euclidean geometry too as it is used in physics
consists of physical theorems that, from a physical aspect, are on an equal footing with the integral
laws of Newtonian mechanics of points. In my opinion this is an inconsistent attitude of which we
should free ourselves.

In other words, when action at a distance theories were replaced by local action
theories, such as Maxwell's differential equations for the electromagnetic field, in which
only differentials of distance and time appear, we should have, for consistency, replaced
the finite distances of Euclidean geometry with the differentials of Riemannian geometry.
Thus the only valid form of the Pythagorean theorem is the differential form ds² = dx² +
dy². Einstein then commented that it is rather unnatural, having taken this step, to insist
that the coefficients of the squared differentials must be constant, i.e., that the Riemann-Christoffel curvature tensor must vanish. Hence we should regard Riemannian geometry
rather than Euclidean geometry as the natural framework in which to formulate the
elementary laws of physics.
From these considerations it follows rather directly that the influence of both inertia and
gravitation on a particle should be expressed by the geodesic equations of motion

$$\frac{d^2 x^\mu}{ds^2} + \Gamma^\mu_{\alpha\beta}\,\frac{dx^\alpha}{ds}\frac{dx^\beta}{ds} \;=\; 0 \qquad\qquad (1)$$
Einstein often spoke of the first term as representing the inertial part, and the second
term, with the Christoffel symbols Γ, as representing the gravitational field, and he was
criticized for this, because the Christoffel symbols are not tensors, and they can be non-zero in perfectly flat spacetime simply by virtue of curvilinear coordinates. To illustrate,
consider a flat plane with either Cartesian coordinates x, y or polar coordinates r, θ as
shown below
[figure: a flat plane overlaid with both a Cartesian (x, y) grid and a polar (r, θ) grid]
With respect to the Cartesian coordinates we have the familiar Pythagorean line element
(ds)² = (dx)² + (dy)². Also, we know the polar coordinates are related to the Cartesian
coordinates by the equations x = r cos(θ) and y = r sin(θ), so we can evaluate the
differentials

$$dx \;=\; \cos(\theta)\,dr - r\sin(\theta)\,d\theta \qquad\qquad dy \;=\; \sin(\theta)\,dr + r\cos(\theta)\,d\theta$$

which of course are the transformation equations for the covariant metric tensor.
Substituting these differentials into the Pythagorean metric equation, we have the metric
for polar coordinates (ds)² = (dr)² + r² (dθ)². Therefore, the covariant and contravariant
metric tensors for these polar coordinates are

$$g_{\mu\nu} \;=\; \begin{bmatrix} 1 & 0 \\ 0 & r^2 \end{bmatrix} \qquad\qquad g^{\mu\nu} \;=\; \begin{bmatrix} 1 & 0 \\ 0 & 1/r^2 \end{bmatrix}$$
and we have the determinant g = r². The only non-zero partial derivative of the covariant
metric components is ∂g_θθ/∂r = 2r, so the only non-zero
Christoffel symbols are Γ^r_θθ = −r and Γ^θ_rθ = Γ^θ_θr = 1/r. Inserting these values into (1)
gives the geodesic equations for this surface

$$\frac{d^2 r}{ds^2} - r\left(\frac{d\theta}{ds}\right)^2 \;=\; 0 \qquad\qquad \frac{d^2 \theta}{ds^2} + \frac{2}{r}\,\frac{dr}{ds}\frac{d\theta}{ds} \;=\; 0$$
Since we know this surface is a flat plane, the geodesic curves must be simply straight
lines, and indeed it's clear from these equations that any purely radial path (for which
dθ/ds = 0) is a geodesic. However, paths going "straight" in the θ direction (at constant r)
are not geodesics, and these equations describe how the coordinates must vary along any
given trajectory in order to maintain a geodesic path on the plane. Of course, if we insert

these polar metric components into Gauss's curvature formula we get K = 0, consistent
with the fact that the surface is flat. The reason the geodesics on this surface are not
simple linear functions of the coordinates is not because the geodesics are curved, but
because the coordinates are curved. Hence it cannot be strictly correct to identify the
second term (or the Christoffel symbols) as the components of a gravitational field.
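A quick numerical experiment (my own, not part of the text) makes the point vivid: integrating the polar geodesic equations above from an initial trajectory aimed "straight" in the θ direction, and converting back to Cartesian coordinates, yields an ordinary straight line.

```python
# Integrating the polar-coordinate geodesic equations on the flat plane.
import numpy as np
from scipy.integrate import solve_ivp

def geo(s, y):
    r, th, rp, thp = y
    return [rp, thp, r*thp**2, -2*rp*thp/r]

# start at r = 1, theta = 0, moving "straight" in the theta direction
sol = solve_ivp(geo, [0, 3], [1, 0, 0, 1], rtol=1e-10, dense_output=True)
s = np.linspace(0, 3, 7)
r, th = sol.sol(s)[0], sol.sol(s)[1]
x, y = r*np.cos(th), r*np.sin(th)
print(np.allclose(x, 1), y)   # x stays 1 while y grows linearly: the line x = 1
```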
As early as 1916 Einstein was criticized for referring to the Christoffel symbols as the
components of the gravitational field. In response to a paper by Friedrich Kottler, Einstein
wrote
Kottler censures that I interpret the second term in the equations of motion as an expression of the
influence of the gravitational field upon the mass point, and the first term more or less as the
expression of the Galilean inertia. Allegedly this would introduce real forces of the gravitational
field and this would not comply with the spirit of the equivalence principle. My answer to this is
that this equation as a whole is generally covariant, and therefore is quite in compliance with the
hypothesis of covariance. The naming of the parts, which I have introduced, is in principle
meaningless and only meant to appeal to our physical habit of thinking that is why I introduced
these quantities even though they do not have tensorial character. The principle of equivalence,
however, is always satisfied when equations are covariant.

To some extent, Einstein side-stepped the criticism, because he actually did regard the
Christoffel symbols as, in some sense, representing true gravity, even in flat spacetime.
The "correct" classroom view today is that gravity is present only when intrinsic
curvature is present, but it is actually not so easy to characterize the presence or absence
of gravity in general relativity, especially because the flat metric of spacetime can be
regarded as a special case of a gravitational field, rather than the absence of a
gravitational field. This is the point of view that Einstein maintained throughout his life, to
the consternation of some school teachers.
Consider again the flat two-dimensional space discussed above, and imagine some
creatures living on a small region of this plane, and suppose they are under the
impression that the constant-r and constant-θ loci are straight. They would have to
conclude that the geodesic paths were curved, and that objects which naturally follow
those paths are being influenced by some "force field". This is exactly analogous to
someone in an upwardly accelerating elevator in empty space (i.e., far from any
gravitating body). In terms of a coordinate system co-moving with the elevator, the
natural paths of things are different than they would normally be, as if those objects were
being influenced by an additional force field. This is exactly analogous to the perceptions
of the creatures on our flat plane, except that it is their θ axis which is non-linear, whereas
our elevator's t axis is non-linear. Inside the accelerating elevator the additional tendency
for geodesic paths to "veer off" is not really due to any extra non-linearity of the
geodesics, it's due to the non-linearity of the elevator's coordinate system. Hence most
people today would say that non-zero Christoffel symbols, by themselves, should not be
regarded as indicative of the presence of "true" gravity. If the intrinsic curvature is zero,
then non-vanishing Christoffel symbols simply represent the necessary compensation for
non-linear coordinates, so, at most (the argument goes) they represent "pseudo-gravity"
rather than true gravity in such circumstances.

But the distinction between pseudo-gravity and true gravity is precisely what
Einstein denied. The equivalence principle asserts that these are intrinsically identical.
Einstein's point hasn't been fully appreciated by some subsequent writers of relativity
textbooks. In a letter to his friend Max von Laue in 1950 he tried to explain:
...what characterizes the existence of a gravitational field from the empirical standpoint is the
non-vanishing of the Γlik, not the non-vanishing of the [curvature]. If one does not think intuitively in
such a way, one cannot grasp why something like a curvature should have anything at all to do
with gravitation. In any case, no reasonable person would have hit upon such a thing. The key for
the understanding of the equality of inertial and gravitational mass is missing.

The point of the equivalence principle is that curving coordinates are gravitation, and
there is no intrinsic ontological difference between true gravity and pseudo-gravity.
On a purely local (infinitesimal) basis, the phenomena of gravity and acceleration were,
in Einstein's view, quite analogous to the electric and magnetic fields in the context of
special relativity, i.e., they are two ways of looking at (or interpreting) the same thing, in
terms of different coordinate systems. Now, it can be argued that there are clear physical
differences between electricity and magnetism (e.g., no magnetic monopoles) and how
they are "produced" by elementary particle "sources", but one of the keys to the success
of special relativity was that it unified the electric and magnetic fields in free space
without getting bogged down (as Lorentz did) in trying to fathom the ultimate
constituency of elementary charged particles, etc. Likewise, general relativity unifies
gravity and non-linear coordinates - including acceleration and polar coordinates - in free
space, without getting bogged down in the "source" side of the equation, i.e., the
fundamental nature of how gravity is ultimately "produced", why the elementary massive
particles have the masses they have, and so on.
What Einstein was describing to von Laue was the conceptual necessity of identifying the
purely geometrical effects of non-inertial coordinates with the physical phenomenon of
gravitation. In contrast, the importance and conceptual significance of the curvature (as
opposed to the connection) is mainly due to the fact that it defines the mode of coupling
of the coordinates with the "source" side of the equation. Of course, since the effects of
gravitation are reciprocal, all test particles are also sources of gravitation, and it can be
argued that the equivalence principle is incomplete because it considers only the
passive response of inertial mass points to a gravitational field, whereas a complete
account must include the active participation of each mass point in the mutual production
of the field. In view of this, it might seem to be a daunting task to attempt to found a
viable theory of gravitation on the equivalence principle, just as it had seemed
impossible to most 19th-century physicists that classical electrodynamics could proceed
without determining the structure and self-action of the electron. But in both cases,
almost miraculously, it turned out to be possible. On the other hand, as Einstein himself
pointed out, the resulting theories were necessarily incomplete, precisely because they
side-stepped the source aspect of the interactions.
Maxwell's theory of the electric field remained a torso, because it was unable to set up laws for the
behaviour of electric density, without which there can, of course, be no such thing as an electromagnetic field. Analogously the general theory of relativity furnished a field theory of gravitation,
but no theory of the field-creating masses.

5.7 Riemannian Geometry


Investigations like the one just made, which begin from general concepts, can serve only
to ensure that this work is not hindered by too restricted concepts, and that progress in
comprehending the connection of things is not obstructed by traditional prejudices.
Riemann, 1854

An N-dimensional Riemannian manifold is characterized by a second-order metric tensor
gμν(x) which defines the differential metrical distance along any smooth curve in terms of
the differential coordinate components according to

(ds)2 = gμν dxμ dxν

where, as usual, summation is implied over repeated indices in any product. We've
written the metric components as gμν(x) to emphasize that they are not constant, but are
allowed to be continuous differentiable functions of position. The fact that the metric
components are defined as continuous implies that over a sufficiently small region around
any point they may be regarded as constant to the first order. Given any such region in
which the metric components are constant we can apply a linear transformation to the
coordinates so as to diagonalize the metric, and rescale the coordinates so that the
diagonal elements of the metric are all 1 (or -1 in the case of a pseudo-Riemannian
metric). Therefore, the metrical relations on the manifold over any sufficiently small
region approach arbitrarily close to flatness to the first order in the coordinate
differentials. In general, however, the metric components need not be constant to the
second order of changes in position. If there exists a coordinate system at a point on the
manifold such that the metric components are constant in the first and second order, then
the manifold is said to be totally flat at that point (not just asymptotically flat).
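
The diagonalize-and-rescale step described above is easy to illustrate numerically. The
following sketch (Python with numpy assumed; the constant metric chosen here is purely
illustrative) reduces a constant metric with one negative eigenvalue to the diagonal form
with entries ±1:

```python
import numpy as np

# an arbitrary constant symmetric "metric" with mixed-sign eigenvalues
g = np.array([[2.0, 1.0],
              [1.0, -1.0]])
lam, P = np.linalg.eigh(g)        # orthogonal P with P.T @ g @ P = diag(lam)
S = P / np.sqrt(np.abs(lam))      # rescale each eigen-axis by 1/sqrt|lambda|
print(np.round(S.T @ g @ S, 12))  # diag(-1, 1): a pseudo-Riemannian signature
```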
Since the metric components are continuous and differentiable, we can expand each
component into a Taylor series about any given point p as follows

gμν(x) = gμν + gμν,α xα + (1/2) gμν,αβ xα xβ + ...

where gμν is evaluated at the point p, and in general the symbol gμν,αβ... denotes the partial
derivatives of gμν with respect to xα, xβ, ... evaluated at the point p. Thus we have

gμν,α = ∂gμν/∂xα        gμν,αβ = ∂2gμν/∂xα∂xβ

and so on. These matrices (which are not necessarily tensors) are obviously symmetric
under transpositions of μ and ν, as well as under any permutations of α, β, γ, ... (because
partial differentiation is commutative). In terms of these symbols we can write the basic
line element near the point p as

(ds)2 = [gμν + gμν,α xα + (1/2) gμν,αβ xα xβ + ...] dxμ dxν

where the matrices gμν, gμν,α, gμν,αβ, etc., are constants. For incremental paths
sufficiently close to the origin, all the terms involving xα become vanishingly small, and
we're left with the familiar formula for the differential line element (ds)2 = gμν dxμ dxν. If
all the components of gμν,α and gμν,αβ are zero at the point p, then the manifold is totally
flat at that point (by definition). However, the converse doesn't follow, because it's
possible to define a coordinate system on a flat manifold such that the derivatives of the
metric are non-zero at points where the manifold is totally flat. (For example, polar
coordinates on a flat plane have this characteristic.)
We seek a criterion for determining whether a given metric at a point p can be
transformed into one for which the first and second order coefficients gμν,α and gμν,αβ all
vanish at that point. By the definition of a Riemannian manifold there exists a coordinate
system with respect to which the first partial derivatives of the metric components vanish
(local flatness). This can be visualized by imagining an N-dimensional Euclidean space
with a Cartesian coordinate system tangent to the manifold at the given point, and
projecting the coordinate system (with the origin at the point of tangency) from this
Euclidean space onto the manifold in the region near the origin O. With respect to such
coordinates the first-order metric components gμν,α vanish, so the lowest-order non-constant
terms of the metric are of the second order, and the line element is given by

(ds)2 = [gμν + (1/2) gμν,αβ xα xβ] dxμ dxν

In terms of such coordinates the matrix gμν,αβ contains all the information about the
intrinsic curvature (if any) of the manifold at the origin of these coordinates. Naturally
the gμν,αβ coefficients are symmetric in the first two indices because of the symmetry of
the metric, and they are also symmetric in the last two indices because partial
differentiation is commutative.
Furthermore, we can always transform and rescale the coordinates in such a way that the
ratios of the coordinates of any given point P are equal to the ratios of the differential
components of the geodesic OP at the origin, and the sum of the squares of the
coordinates equals the square of the geodesic distance from the origin. These are called
Riemann normal coordinates, since they were introduced by Riemann in his 1854
lecture. (Note that these coordinates are well-defined only out to some finite distance
from the origin, beyond which it's possible for geodesics emanating from the origin to
intersect with each other, resulting in non-unique coordinates, closely analogous to the
accelerating coordinate systems discussed in Section 4.5.) The advantage of these
coordinates is that, in addition to ensuring all gμν,α = 0, they impose two more symmetries
on the gab,cd, namely, symmetry between the two pairs of indices, and cyclic skew
symmetry on the last three indices. In other words, at the origin of Riemann normal
coordinates we have

gab,cd = gcd,ab        gab,cd + gac,db + gad,bc = 0        (3)
To understand why these symmetries occur, first consider the simple two-dimensional
case with x,y coordinates on the surface, and recall that Riemann normal coordinates are
defined such that the squared geodesic distance to any point x,y near the origin is given
by s2 = x2 + y2. It follows that if we move from the point x,y to the point x+dx, y+dy,
and if the increments dx,dy are in the same proportion to each other as x is to y, then the
new position is along the same geodesic, and so the squared incremental distance (ds)2
equals the sum (dx)2 + (dy)2. Now, if the surface is flat, this simple expression for (ds)2
will hold regardless of the ratio of dx/dy, but for a curved surface it will hold when and
only when dx/dy = x/y. In other words, the line element at a point near the origin of
Riemann normal coordinates on a curved surface reduces to the Pythagorean line element
if and only if the quantity x dy − y dx equals zero. Furthermore, we know that the first-order
terms of the metric vanish in Riemann coordinates, so even when x dy − y dx is non-zero,
the line element differs from the Pythagorean form only by second-order (and
higher) terms in the metric. Therefore, the deviation of the line element from the simple
Pythagorean sum of squares must consist of terms of the form xα xβ dxμ dxν, and it must
identically vanish if and only if x dy − y dx equals zero. The only expression satisfying
these requirements is k(x dy − y dx)2 for some constant k, so the line element on a
two-dimensional surface with Riemann normal coordinates is of the form

(ds)2 = (dx)2 + (dy)2 + k(x dy − y dx)2        (4)
The same reasoning can be applied in N dimensions. If we are given a point (x1,x2,...,xN)
in an N-dimensional manifold near the origin of Riemann coordinates, then the distance
(ds)2 from that point to the point (x1+dx1, x2+dx2, ..., xN+dxN) is given by the sum of
squares of the components if the differentials are in the same proportions to each other as
the x coordinates, which implies that every expression of the form (xμ dxν − xν dxμ)
vanishes. If one or more of these N(N−1)/2 expressions does not vanish, then the line
element of a curved manifold will contain metric terms of the second order. The most
general combination of second-order terms that vanishes if all the differentials are in
proportion to the coordinates is a linear combination of the products of two of those
terms. In other words, the general line element (up to second order) near the origin of
Riemann normal coordinates on a curved surface must be of the form

(ds)2 = (dx1)2 + ... + (dxN)2 + Kμναβ (xμ dxν − xν dxμ)(xα dxβ − xβ dxα)        (5)
where the Kμναβ are constants at the given point of the manifold. These coefficients
represent the deviation from flatness of the manifold, and they vanish if and only if the
curvature is zero (i.e., the manifold is flat). Notice that if all but two of the xμ and dxμ are
zero, this reduces to the preceding two-dimensional formula involving just the square of
(x1 dx2 − x2 dx1) and a single curvature coefficient. Also note that in a flat manifold, the
quantity xμ dxν − xν dxμ is equal to twice the area of the incremental triangle formed by
the origin and the nearby points (xμ, xν) and (dxμ, dxν) on the subsurface containing those
three points, so it is invariant under coordinate transformations that do not change the
scale.
Each individual term in the expansions of the right-hand product in (5) involves four
indices (not necessarily distinct). We can expand each product as shown below

(xa dxb − xb dxa)(xc dxd − xd dxc) = xa xc dxb dxd − xa xd dxb dxc − xb xc dxa dxd + xb xd dxa dxc

Obviously we have the symmetries and anti-symmetries

Kabcd = −Kbacd = −Kabdc = Kcdab

Furthermore, we see that the value of K for each of the 24 permutations of indices
contributes to four of the coefficients in the expanded sum of products, so each of those
coefficients is a sum (with appropriate signs) of four K values. Thus the coefficient of
xa xc dxb dxd is

Kabcd − Kabdc − Kbacd + Kbadc        (6)
Both of the identities (3) immediately follow, making use of the symmetries of the K
array. It's also useful to notice that each of the K index permutations is a simple
transposition of the indices of the metric coefficient in this expression, so the relationship
is invertible up to a constant factor. Using equation (6) we can sum four derivatives of g
(with appropriate signs) to give

Kabcd = (1/24)(gac,bd + gbd,ac − gad,bc − gbc,ad)

provided we impose the same skew symmetry on the K values as applies to the g
derivatives, i.e.,

Kabcd + Kacdb + Kadbc = 0
Hence at any point in a differentiable manifold we can define a system of Riemann
normal coordinates, and in terms of those coordinates the curvature of the manifold is
completely characterized by an array Rabcd = −12 Kabcd. (The factor of −12 is
conventional.) We can verify that this is a covariant tensor of rank 4. It is called the
Riemann-Christoffel curvature tensor. At the origin of coordinates such that the first
derivatives of the metric coefficients vanish, the components of the Riemann tensor are

Rabcd = (1/2)(gad,bc + gbc,ad − gac,bd − gbd,ac)        (8)

If we further specialize to a point at the origin of Riemann normal coordinates, we can
take advantage of the special symmetry gab,cd = gcd,ab , allowing us to express the curvature
tensor in the very simple form

Rabcd = gad,bc − gac,bd
Since the gab,cd are symmetrical under transpositions of [ab] and of [cd], it's apparent
from (8) that if we transpose the first two indices of Rabcd we simply reverse the sign of
the quantity, and likewise for the last two indices. Also, if we swap the first and last pairs
of indices we leave the quantity unchanged. Of course, we also have the same skew
symmetry on three indices as we have with the K array, i.e., if we hold one index fixed
and cyclically permute the other three, the sum of those three quantities vanishes.
Symbolically these algebraic symmetries can be summarized as

Rabcd = −Rbacd = −Rabdc = Rcdab        Rabcd + Racdb + Radbc = 0
These symmetries imply that there are only 20 algebraically independent components of
the curvature tensor in four dimensions. (See Part 7 of the Appendix for a proof.) It
should be emphasized that (8) gives the components of the covariant curvature tensor only
at the origin of a tangent coordinate system (in which the first derivatives of the metric are
zero). The unique fully-covariant tensor that reduces to (8) when transformed to tangent
coordinates is

Rabcd = (1/2)(gad,bc + gbc,ad − gac,bd − gbd,ac) + gef [Γade Γbcf − Γace Γbdf]

where gef denotes the matrix inverse of the zeroth-order metric array, and Γabc is the
Christoffel symbol (of the first kind) [ab,c] as defined in Section 5.4. By inspection of
the quantity in brackets we verify that all the symmetry properties of Rabcd continue to
apply in this general form, applicable to any curvilinear coordinates.
We can illustrate Riemann's approach to curvature with some simple examples in
two-dimensional manifolds. First, it's clear that if the geodesic lines emanating from a point
on a flat plane are drawn out, and symmetrical x,y coordinates are assigned to every point
in accord with the prescription for Riemannian coordinates, we will find that all the
components of Rabcd equal zero, and the line element is simply (ds)2 = (dx)2 + (dy)2. Now
consider a two-dimensional surface whose height above the xy plane is h = bxy for some
constant b. This is a special case of the family of two-dimensional surfaces discussed in
Section 5.3. The line element in terms of projected x and y coordinates is

(ds)2 = (1 + b2y2)(dx)2 + 2b2xy dx dy + (1 + b2x2)(dy)2
Using the equations of the geodesic paths on this surface given at the end of Section 5.4,
we can plot the geodesic paths emanating from the origin, and superimpose the Riemann
normal coordinate (X,Y) grid, as shown below.

From the shape of the loci of constant X and constant Y, we infer that the transformation
between the original (x,y) coordinates and the Riemann normal coordinates (X,Y) is
approximately of the form

Substituting these expressions into the line element and discarding all terms higher than
second order (because we are interested only in the region arbitrarily close to the origin)
we get

In order for X,Y to be Riemann normal coordinates we must have

and so we must set the coefficient equal to b2/3. This allows us to write the line element in the form

(ds)2 = (dX)2 + (dY)2 + (b2/3)(X dY − Y dX)2
The last term formally represents four components of the curvature, but the symmetries
make them all equal up to sign, i.e., we have

K1212 = −K1221 = −K2112 = K2121

Therefore, we have −b2 = −12K1212 = R1212 , which implies that the curvature of this surface
at the origin is R1212 = −b2, in agreement with what we found in Section 5.3. In general, the
Gaussian curvature K, i.e., the product of the two principal curvatures on a two-dimensional
surface, is related to the Riemann tensor by K = R1212 / g where g is the
determinant of the metric tensor, which is unity at the origin of Riemann normal
coordinates. We also have K = −3k for a surface with the line element (4).
For another example, consider a two-dimensional surface whose height above the tangent
plane at the origin is h = Ax2 + Bxy + Cy2. We can rotate the coordinates to bring the
height into diagonal form, so we need only consider the form h = Mx2 + Ny2 for constants
M,N, and by re-scaling x and y if necessary we can set N equal to M, so we have a
symmetrical paraboloid with height h = M(x2 + y2). For x and y coordinates projected
onto this surface the metric is

(ds)2 = (dx)2 + (dy)2 + (dh)2

and we have dh = 2M(x dx + y dy). Making this substitution, we find the metric tensor is

g = [ 1 + 4M2x2      4M2xy     ]
    [ 4M2xy          1 + 4M2y2 ]
At the origin, the first derivatives of the metric all vanish and g = 1, consistent with the
fact that x,y is a tangent coordinate system. Also we have the symmetry gab,cd = gcd,ab.
Therefore, since gxy,xy = 4M2 and gxx,yy = 0, we can compute all the components of the
Riemann tensor at the origin, such as

Rxyxy = gxy,xy − gxx,yy = 4M2
which equals the curvature at that point. However, as an alternative, we could make use
of the Fibonacci identity

(x dx + y dy)2 + (x dy − y dx)2 = (x2 + y2)((dx)2 + (dy)2)

to substitute for (dh)2 into the expression for the squared line element. This gives

(ds)2 = (dx)2 + (dy)2 + 4M2(x2 + y2)((dx)2 + (dy)2) − 4M2(x dy − y dx)2

Rearranging terms, this can be written in the form

(ds)2 = [1 + 4M2(x2 + y2)]((dx)2 + (dy)2) − 4M2(x dy − y dx)2
This is not in the form of (4), because the Euclidean part of the metric has a variable
coefficient. However, it's interesting to observe that the ratio of the coefficients of the
Riemannian part to the square of the coefficient of the Euclidean part is precisely the
Gaussian curvature on the surface

K = (hxx hyy − hxy2) / (1 + hx2 + hy2)2
where subscripts on h denote partial derivatives. The numerator and denominator are both
determinants of 2x2 matrices, representing different "ground forms" of the surface. This
shows that the curvature of a two-dimensional space (or sub-space) at the origin of
tangent coordinates at a point is proportional to the coefficient of (x dy − y dx)2 in the line
element of the surface at that point when decomposed according to the Fibonacci
identity.
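
These results for the paraboloid can be verified mechanically. The following sketch
(Python with sympy assumed; an illustrative check) constructs the induced metric from
h = M(x2 + y2), evaluates Rxyxy = gxy,xy − gxx,yy at the origin, and compares it with the
Gaussian curvature formula just given; both yield 4M2:

```python
import sympy as sp

x, y, M = sp.symbols('x y M', positive=True)
X = [x, y]
h = M*(x**2 + y**2)

# induced metric g_ij = delta_ij + h_i h_j from (ds)^2 = dx^2 + dy^2 + dh^2
g = sp.Matrix(2, 2, lambda i, j: sp.eye(2)[i, j]
              + sp.diff(h, X[i])*sp.diff(h, X[j]))

# at the origin the first metric derivatives vanish, so we may use
# R_xyxy = g_xy,xy - g_xx,yy
Rxyxy = (sp.diff(g[0, 1], x, y) - sp.diff(g[0, 0], y, y)).subs({x: 0, y: 0})
print(Rxyxy)   # 4*M**2

# Gaussian curvature K = (h_xx h_yy - h_xy^2)/(1 + h_x^2 + h_y^2)^2 at origin
K = ((sp.diff(h, x, 2)*sp.diff(h, y, 2) - sp.diff(h, x, y)**2)
     / (1 + sp.diff(h, x)**2 + sp.diff(h, y)**2)**2)
print(K.subs({x: 0, y: 0}))   # 4*M**2
```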
Returning to general N-dimensional manifolds, for any point p of the manifold we can
express the partial derivatives of the metric to first order in terms of these quantities as

gμν,σ(x) = gμν,σ + gμν,σβ xβ + ...
The connection of this manifold is customarily expressed in the form of Christoffel
symbols. To the first order near the origin of our coordinate system the Christoffel
symbols of the first kind are

Γμνσ = (1/2)(gμσ,νβ + gνσ,μβ − gμν,σβ) xβ
Obviously the Christoffel symbols vanish at the origin of Riemann coordinates, where the
first derivatives of the metric coefficients vanish (by definition). We often make use of
the first partial derivatives of these symbols with respect to the position coordinates.
These can be expressed to the lowest order as

Γμνσ,β = (1/2)(gμσ,νβ + gνσ,μβ − gμν,σβ)

It follows from the symmetries of the partial derivatives of the metric at the origin of
Riemann normal coordinates that the first partials of the Christoffel symbols possess the
same cyclic skew symmetry, i.e.,

Γμνσ,β + Γνβσ,μ + Γβμσ,ν = 0

Consequently we have the useful relation (at the origin of Riemann normal coordinates)

Other useful formulas can be derived based on the fact that we frequently need to deal
with expressions involving the components of the inverse (i.e., contravariant) metric
tensor, gμν(x), which tend to be extremely elaborate expressions except in the case of
diagonal matrices. For this reason it's often very advantageous to work with diagonal
metrics, noting that every static spacetime metric can be diagonalized. Given a diagonal
metric, all the components of the curvature tensor can be inferred from the expressions

by applying the symmetries of the Riemann tensor. If we further specialize to Riemann
coordinates, in terms of which all the first derivatives of the metric vanish, the
components of the Riemann curvature tensor for a diagonal metric are summarized by

It is easily verified that this is consistent with the expression for the curvature tensor in
Riemann coordinates given in equation (8), together with the symmetries of this tensor, if
we set all the non-diagonal metric components to zero.
To find the equations for geodesic paths on a Riemannian manifold, we can take a
slightly different approach than we took in Section 5.4. For clarity, we will describe this
in terms of a two-dimensional manifold, but it immediately generalizes to any number of
dimensions. Since by definition a Riemannian manifold is essentially flat on a
sufficiently small scale (a fact which corresponds to the equivalence principle for the
spacetime manifold), there necessarily exist coordinates x,y at any given point such that
the geodesic paths through that point are simply straight lines. Thus if we let functions
x(s) and y(s) denote the parametric equations of the path, where s is the path length, these
functions satisfy the differential equations

d2x/ds2 = 0        d2y/ds2 = 0

Any other (possibly curvilinear) system of coordinates X,Y will be related to the x,y
coordinates by a transformation of the form

x = x(X,Y)        y = y(X,Y)
Focusing on just the x expression, we can divide through by ds to give

dx/ds = (∂x/∂X)(dX/ds) + (∂x/∂Y)(dY/ds)

Substituting this into the equation of motion for the x coordinate gives

d/ds [ (∂x/∂X)(dX/ds) + (∂x/∂Y)(dY/ds) ] = 0

Expanding the differentiation, we have

(∂x/∂X) d2X/ds2 + (∂x/∂Y) d2Y/ds2 + [d(∂x/∂X)/ds](dX/ds) + [d(∂x/∂Y)/ds](dY/ds) = 0

Noting the differential identities

d(∂x/∂X) = (∂2x/∂X2) dX + (∂2x/∂X∂Y) dY        d(∂x/∂Y) = (∂2x/∂Y∂X) dX + (∂2x/∂Y2) dY

we can divide through by ds and then substitute into the preceding equation to give

(∂x/∂X) d2X/ds2 + (∂x/∂Y) d2Y/ds2 + (∂2x/∂X2)(dX/ds)2 + 2(∂2x/∂X∂Y)(dX/ds)(dY/ds) + (∂2x/∂Y2)(dY/ds)2 = 0
A similar equation results from the original geodesic equation for y. To abbreviate these
expressions we can use superscripts to denote different coordinates, i.e., let

X1 = X        X2 = Y        x1 = x        x2 = y

Then with the usual summation convention we can express both the above equation and
the corresponding equation for y in the form

(∂xi/∂Xa) d2Xa/ds2 + (∂2xi/∂Xa∂Xb)(dXa/ds)(dXb/ds) = 0        (10)
In order to isolate the second derivative of the new coordinates X with respect to s, we
can multiply through these equations by ∂Xc/∂xi to give

(∂Xc/∂xi)(∂xi/∂Xa) d2Xa/ds2 + (∂Xc/∂xi)(∂2xi/∂Xa∂Xb)(dXa/ds)(dXb/ds) = 0

The partial derivatives represented by ∂xi/∂Xa are just the components of the
transformation from x to X coordinates, whereas the partials represented by ∂Xc/∂xi are
the components of the inverse transformation from X to x. Therefore the product of
these two is simply the identity transformation, i.e.,

(∂Xc/∂xi)(∂xi/∂Xa) = δca

where δca signifies the Kronecker delta, defined as 1 if c = a and 0 otherwise. Hence the
first term of (10) is simply d2Xc/ds2, and so equation (10) can be re-written as

d2Xc/ds2 + ( (∂Xc/∂xi)(∂2xi/∂Xa∂Xb) )(dXa/ds)(dXb/ds) = 0
This is the equation for a geodesic with respect to the arbitrary system of curvilinear
coordinates X. The expression inside the parentheses is the Christoffel symbol Γcab,
which makes it clear that this symbol describes the relationship between the arbitrary
coordinates X and the special coordinates x with respect to which the geodesics of the
surface are unaccelerated. We saw in Section 5.4 how this can be expressed purely in
terms of the metric coefficients and their first derivatives with respect to any given set of
coordinates. That's obviously a more useful way of expressing them, because we seldom
are given special "geodesically aligned" coordinates. In fact, the geodesic paths are
essentially what we are trying to determine, given only an arbitrary system of coordinates
and the metric coefficients with respect to those coordinates. The formula in Section 5.4
enables us to do this, but it's conceptually useful to understand that

Γcab = (∂Xc/∂xi)(∂2xi/∂Xa∂Xb)

where x essentially represents Cartesian coordinates tangent to the manifold, with
respect to which geodesics of the surface (or space) are simple straight lines, and X
represents the arbitrary coordinates in terms of which we are trying to express the
conditions for geodesic paths. In a sense we can say that the Christoffel symbols describe
how our chosen coordinates are curved relative to the geodesic paths at a point. This is
why it's possible for the Christoffel symbols to be non-zero even on a flat surface, if we
are using curved coordinates (such as polar coordinates) as discussed in Section 5.6.
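
The role of the Christoffel symbols as a description of curved coordinates (rather than
curved paths) can be seen vividly by integrating the geodesic equations numerically. The
sketch below (Python, assuming numpy and scipy are available; purely illustrative)
integrates the polar-coordinate geodesic equations on the flat plane; r(s) and θ(s) are
individually non-linear functions, but the resulting path is the straight Cartesian line x = 1:

```python
import numpy as np
from scipy.integrate import solve_ivp

# geodesic equations on the flat plane in polar coordinates:
#   r'' - r (theta')^2 = 0          (from Gamma^r_{theta theta} = -r)
#   theta'' + (2/r) r' theta' = 0   (from Gamma^theta_{r theta} = 1/r)
def geodesic(s, y):
    r, th, dr, dth = y
    return [dr, dth, r*dth**2, -2.0*dr*dth/r]

# launch from (r, theta) = (1, 0) with a purely tangential unit velocity
sol = solve_ivp(geodesic, [0, 5], [1.0, 0.0, 0.0, 1.0],
                dense_output=True, rtol=1e-10, atol=1e-12)
s = np.linspace(0, 5, 6)
rr, th = sol.sol(s)[0], sol.sol(s)[1]
print(np.round(rr*np.cos(th), 6))   # all 1.0: the Cartesian line x = 1
print(np.round(rr*np.sin(th), 6))   # equal to s: uniform speed along the line
```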
5.8 The Field Equations
You told us how an almost churchlike atmosphere is pervading your
desolate house now. And justifiably so, for unusual divine powers are at
work in there.
Besso to
Einstein, 30 Oct 1915
The basis of Einstein's general theory of relativity is the audacious idea that not only do
the metrical relations of spacetime deviate from perfect Euclidean flatness, but that the
metric itself is a dynamical object. In every other field theory the equations describe the
behavior of a physical field, such as the electric or magnetic field, within a constant and
immutable arena of space and time, but the field equations of general relativity describe
the behavior of space and time themselves. The spacetime metric is the field. This fact is
so familiar that we may be inclined to simply accept it without reflecting on how
ambitious it is, and how miraculous it is that such a theory is even possible, not to
mention (somewhat) comprehensible. Spacetime plays a dual role in this theory, because

it constitutes both the dynamical object and the context within which the dynamics are
defined. This self-referential aspect gives general relativity certain characteristics
different from any other field theory. For example, in other theories we formulate a
Cauchy initial value problem by specifying the condition of the field everywhere at a
given instant, and then use the field equations to determine the future evolution of the
field. In contrast, because of the inherent self-referential quality of the metrical field, we
are not free to specify arbitrary initial conditions, but only conditions that already satisfy
certain self-consistency requirements (a system of differential relations called the Bianchi
identities) imposed by the field equations themselves.
The self-referential quality of the metric field equations also manifests itself in their
non-linearity. Under the laws of general relativity, every form of stress-energy gravitates,
including gravitation itself. This is really unavoidable for a theory in which the metrical
relations between entities determine the "positions" of those entities, and those positions
in turn influence the metric. This non-linearity raises both practical and theoretical
issues. From a practical standpoint, it ensures that exact analytical solutions will be very
difficult to determine. More importantly, from a conceptual standpoint, non-linearity
ensures that the field cannot in general be uniquely defined by the distribution of material
objects, because variations in the field itself can serve as "objects".
Furthermore, after eschewing the comfortable but naive principle of inertia as a suitable
foundation for physics, Einstein concluded that "in the general theory of relativity, space
and time cannot be defined in such a way that differences of the spatial coordinates can
be directly measured by the unit measuring rod, or differences in the time coordinate by a
standard clock...this requirement ... takes away from space and time the last remnant of
physical objectivity". It seems that we're completely at sea, unable to even begin to
formulate a definite solution, and lacking any definite system of reference for defining
even the most rudimentary quantities. It's not obvious how a viable physical theory could
emerge from such an austere level of abstraction.
These difficulties no doubt explain why Einstein's route to the field equations in the years
1907 to 1915 was so convoluted, with so much confusion and backtracking. One of the
principles that heuristically guided his search was what he called the principle of general
covariance. This was understood to mean that the laws of physics ought to be expressible
in the form of tensor equations, because such equations automatically hold with respect to
any system of curvilinear coordinates (within a given diffeomorphism class, as discussed
in Section 9.2). He abandoned this principle at one stage, believing that he and
Grossmann had proven it could not be made consistent with the Poisson equation of
Newtonian gravitation, but subsequently realized the invalidity of their arguments, and
re-embraced general covariance as a fundamental principle.
It strikes many people as ironic that Einstein found the principle of general covariance to
be so compelling, because, strictly speaking, it's possible to express almost any physical
law, including Newton's laws, in generally covariant form (i.e., as tensor equations). This
was not clear when Einstein first developed general relativity, but it was pointed out in
one of the very first published critiques of Einstein's 1916 paper, and immediately

acknowledged by Einstein. It's worth remembering that the generally covariant
formalism had been developed only in 1901 by Ricci and Levi-Civita, and the first real
use of it in physics was Einstein's formulation of general relativity. This historical
accident made it natural for people (including Einstein, at first) to imagine that general
relativity is distinguished from other theories by its general covariance, whereas in fact
general covariance was only a new mathematical formalism, and does not connote a
distinguishing physical attribute. For this reason, some people have been tempted to
conclude that the requirement of general covariance is actually vacuous. However, in
reply to this criticism, Einstein clarified the real meaning (for him) of this principle,
pointing out that its heuristic value arises when combined with the idea that the laws of
physics should not only be expressible as tensor equations, but should be expressible as
simple tensor equations. In 1918 he wrote "Of two theoretical systems which agree with
experience, that one is to be preferred which from the point of view of the absolute
differential calculus is the simplest and most transparent". This is still a bit vague, but it
seems that the quality which Einstein had in mind was closely related to the Machian idea
that the expression of the dynamical laws of a theory should be symmetrical up to
arbitrary continuous transformations of the spacetime coordinates. Of course, the
presence of any particle of matter with a definite state of motion automatically breaks the
symmetry, but a particle of matter is a dynamical object of the theory. The general
principle that Einstein had in mind was that only dynamical objects could be allowed to
introduce asymmetries. This leads naturally to the conclusion that the coefficients of the
spacetime metric itself must be dynamical elements of the theory, i.e., must be acted
upon. With this Einstein believed he had addressed what he regarded as the strongest of
Mach's criticisms of Newtonian spacetime, namely, the fact that Newton's space acted on
objects but was never acted upon by objects.
Let's follow Einstein's original presentation in his famous paper "The Foundation of the
General Theory of Relativity", which was published early in 1916. He notes that for
empty space, far from any gravitating object, we expect to have flat (i.e., Minkowskian)
spacetime, which amounts to requiring that Riemann's curvature tensor Rabcd vanishes.
However, in regions of space near gravitating matter we must clearly have non-zero
intrinsic curvature, because the gravitational field of an object cannot simply be
"transformed away" (to the second order) by a change of coordinates. Thus there is no
system of coordinates with respect to which the manifold is flat to the second order,
which is precisely the condition indicated by a non-vanishing Riemann curvature tensor.
Nevertheless, even at points where the full curvature tensor Rabcd is non-zero, the
contracted tensor of the second rank, Rbc = gad Rabcd = Rdbcd , may vanish. Of course, a tensor
of rank four can be contracted in six different ways (the number of ways of choosing two
of the four indices), and in general this gives six distinct tensors of rank two. We are able
to single out a more or less unique contraction of the curvature tensor only because of
that tensor's symmetries (described in Section 5.7), which imply that of the six
contractions of Rabcd, two are zero and the other four are identical up to sign change.
Specifically we have

gab Rabcd = gcd Rabcd = 0        gad Rabcd = Rbc        gbc Rabcd = Rad        gac Rabcd = −Rbd        gbd Rabcd = −Rac
By convention we define the Ricci tensor Rbc as the contraction gadRabcd. In seeking
suitable conditions for the metric field in empty space, Einstein observes that
there is only a minimum arbitrariness in the choice... for besides Rμν there is no
tensor of the second rank which is formed from the gμν and its derivatives, contains
no derivative higher than the second, and is linear in these derivatives... This
prompts us to require for the matter-free gravitational field that the symmetrical
tensor Rμν ... shall vanish.
Thus, guided by the belief that the laws of physics should be the simplest possible tensor
equations (to ensure general covariance), he proposes that the field equations for the
gravitational field in empty space should be

Rμν = 0        (1)
Noting that Rμν takes on a particularly simple form on the condition that we choose
coordinates such that √(−g) = 1, Einstein originally expressed this in terms of the
Christoffel symbols as

Γaμb Γbνa − ∂Γaμν/∂xa = 0        (1')

(except that in his 1916 paper Einstein had a different sign because he defined the symbol
Γaμν as the negative of the Christoffel symbol of the second kind.) He then concludes the
section with words that obviously gave him great satisfaction, since he repeated
essentially the same comments at the conclusion of the paper:
These equations, which proceed, by the method of pure mathematics, from the
requirement of the general theory of relativity, give us, in combination with the
[geodesic] equations of motion, to a first approximation Newton's law of
attraction, and to a second approximation the explanation of the motion of the
perihelion of the planet Mercury discovered by Leverrier. These facts must, in
my opinion, be taken as a convincing proof of the correctness of the theory.
To his friend Paul Ehrenfest in January 1916 he wrote that "for a few days I was beside
myself with joyous excitement", and to Fokker he said that seeing the anomaly in
Mercury's orbit emerge naturally from his purely geometrical field equations "had given
him palpitations of the heart". (These recollections are remarkably similar to the
presumably apocryphal story of Newton's trembling hand when he learned, in 1675, of
Picard's revised estimates of the Earth's size, and was thereby able to reconcile his
previous calculations of the Moon's orbit based on the assumption of an inverse-square
law of gravitation.)
The expression Rμν = 0 represents ten distinct equations in the ten unknown metric
components gμν at each point in empty spacetime (where the term "empty" signifies the
absence of matter or electromagnetic energy, but obviously not the absence of the
metric/gravitational field.) Since these equations are generally covariant, it follows that
given any single solution we can construct infinitely many others simply by applying
arbitrary (continuous) coordinate transformations. Thus, each individual physical
solution has four full degrees of freedom which allow it to be expressed in different
ways. In order to uniquely determine a particular solution we must impose four
coordinate conditions on the gμν, but this gives us a total of fourteen equations in just ten
unknowns, which could not be expected to possess any non-trivial solutions at all if the
fourteen equations were fully independent and arbitrary. Our only hope is if the ten
formal conditions represented by our basic field equations automatically satisfy four
identities for any values of the metric components, so that they really only impose six
independent conditions, which then would uniquely determine a solution when
augmented by a set of four arbitrary coordinate conditions.
It isn't hard to guess that the four "automatic" conditions to be satisfied by our field
equations must be the vanishing of the covariant derivatives, since this will guarantee
local conservation of any energy-momentum source term that we may place on the right
side of the equation, analogous to the mass density on the right side of Poisson's equation

∇2φ = 4πGρ
In tensor calculus the divergence generalizes to the covariant derivative, so we expect
that the covariant derivatives of the metrical field equations must identically vanish. The
Ricci tensor Rμν itself does not satisfy this requirement, but we can create a tensor that
does satisfy the requirement with just a slight modification of the Ricci tensor, and
without disturbing the relation Rμν = 0 for empty space. Subtracting half the metric
tensor times the invariant R = gμνRμν gives what is now called the Einstein tensor

Gμν = Rμν − (1/2) gμν R
Obviously the condition Rμν = 0 implies Gμν = 0. Conversely, if Gμν = 0 we can see from
the mixed form

Gμν = Rμν − (1/2) δμν R

that R must be zero, because otherwise Rμν would need to be diagonal, with the
components R/2, which doesn't contract to the scalar R (except in two dimensions).
Consequently, the condition Gμν = 0 is equivalent to Rμν = 0 for empty space, but for
coupling with a non-zero source term we must use Gμν to represent the metrical field.
To represent the "source term" we will use the covariant energy-momentum tensor T,

and regard it as the "cause" of the metric curvature (although one might also conceive of
the metric curvature as, in some temporally symmetrical sense, "causing" the energymomentum). Einstein acknowledged that the introduction of this tensor is not justified by
the relativity principle alone, but it has the virtues of being closely related by analogy
with the Poisson equation from Newton's theory, it gives local conservation of energy and
momentum, and finally that it implies gravitational energy gravitates just as does every
other form of energy. On this basis we surmise that the field equations coupled to the
source term can be written in the form Gμν = kTμν where k is a constant which must equal
8πG (where G is Newton's gravitational constant) in order for the field equations to
reduce to Newton's law in the weak field limit. Thus we have the complete expression of
Einstein's metrical law of general relativity

Rμν − (1/2) gμν R = 8πG Tμν        (2)
It's worth noting that although the left side of the field equations is quite pure and almost
uniquely determined by mathematical requirements, the right side is a hodge-podge of
miscellaneous "stuff". As Einstein wrote,
The energy tensor can be regarded only as a provisional means of representing
matter. In reality, matter consists of electrically charged particles... It is only the
circumstance that we have no sufficient knowledge of the electromagnetic field of
concentrated charges that compels us, provisionally, to leave undetermined in
presenting the theory, the true form of this tensor... The right hand side [of (2)] is
a formal condensation of all things whose comprehension in the sense of a field
theory is still problematic. Not for a moment... did I doubt that this formulation
was merely a makeshift in order to give the general principle of relativity a
preliminary closed-form expression. For it was essentially no more than a theory
of the gravitational field, which was isolated somewhat artificially from a total
field of as yet unknown structure.
Alas, neither Einstein nor anyone since has been able to make further progress in
determining the true form of the right hand side of (2), although it is at the heart of
current efforts to reconcile quantum mechanics with general relativity. At present we
must be content to let T represent, in a vague sort of way, the energy density of the
electromagnetic field and matter.
A different (but equivalent) form of the field equations can be found by contracting (2)
with gμν to give R − 2R = −R = 8πGT (where T = gμνTμν), and then substituting for R in
(2) to give

Rμν = 8πG [ Tμν − (1/2) gμν T ]        (3)

which again makes clear that the field equations for empty space are simply Rμν = 0.
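
Although exact solutions are the subject of the next chapter, the vacuum equations can
already be checked mechanically against the spherically symmetrical metric of Section 5.5.
The sketch below (Python with sympy assumed; m denotes the mass parameter in units with
G = c = 1, and standard textbook sign conventions are used, so only the vanishing of the
result matters) computes every component of the Ricci tensor for the Schwarzschild line
element and confirms that each one vanishes:

```python
import sympy as sp

t = sp.symbols('t')
r, th, ph, m = sp.symbols('r theta phi m', positive=True)
X = [t, r, th, ph]
g = sp.diag(1 - 2*m/r, -1/(1 - 2*m/r), -r**2, -r**2*sp.sin(th)**2)
ginv = g.inv()
n = 4

# Christoffel symbols of the second kind
Gam = [[[sum(ginv[a, e]*(sp.diff(g[e, b], X[c]) + sp.diff(g[e, c], X[b])
             - sp.diff(g[b, c], X[e]))/2 for e in range(n))
         for c in range(n)] for b in range(n)] for a in range(n)]

# Ricci tensor (standard convention)
def ricci(b, c):
    return sp.simplify(sum(sp.diff(Gam[a][b][c], X[a]) - sp.diff(Gam[a][b][a], X[c])
               + sum(Gam[a][a][e]*Gam[e][b][c] - Gam[a][c][e]*Gam[e][b][a]
                     for e in range(n)) for a in range(n)))

print(all(ricci(b, c) == 0 for b in range(n) for c in range(n)))   # True
```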
Incidentally, the tensor Gμν was named for Einstein because of his inspired use of it, not
because he discovered it. Indeed the vanishing of the covariant derivative of this tensor
had been discovered by Aurel Voss in 1880, by Ricci in 1889, and again by Luigi Bianchi
in 1902, all apparently independently. Bianchi had once been a student of Felix Klein, so
it's not surprising that Klein was able in 1918 to point out regarding the conservation laws
in Einstein's theory of gravitation that we need only "make use of the most elementary
formulae in the calculus of variations". Recall from Section 5.7 that the Riemann
curvature tensor in terms of arbitrary coordinates is

Rabcd = (1/2)(gad,bc + gbc,ad − gac,bd − gbd,ac) + gef [Γade Γbcf − Γace Γbdf]

At the origin of Riemann normal coordinates this reduces to gad,cb − gac,bd , because in such
coordinates the Christoffel symbols are all zero and we have the special symmetry gab,cd =
gcd,ab. Now, if we consider partial derivatives (which in these special coordinates are the
same as covariant derivatives) of this tensor, we see that the derivative of the quantity in
square brackets still vanishes, because the product rule implies that each term is a
Christoffel symbol times the derivative of a Christoffel symbol. We might also be
tempted to take advantage of the special symmetry gab,cd = gcd,ab , but this is not
permissible because although the two quantities are equal (at the origin of Riemann
normal coordinates), their derivatives are not generally equal. Hence when evaluating the
derivatives of the Riemann tensor, even at the origin of Riemann normal coordinates, we
must consider all four of the metric tensor derivatives in the above expression. Denoting
covariant differentiation with respect to a coordinate xm by the subscript ;m, we have

Rabcd;m = (1/2)(gad,bcm + gbc,adm − gac,bdm − gbd,acm)
Rabdm;c = (1/2)(gam,bdc + gbd,amc − gad,bmc − gbm,adc)
Rabmc;d = (1/2)(gac,bmd + gbm,acd − gam,bcd − gbc,amd)
Noting that partial differentiation is commutative, and the metric tensor is symmetrical,
we see that the sum of these three tensors vanishes at the origin of Riemann normal
coordinates, and therefore with respect to all coordinates. Thus we have the Bianchi
identities

Rabcd;m + Rabdm;c + Rabmc;d = 0
Multiplying through by gadgbc , making use of the symmetries of the Riemann tensor, and
the fact that the covariant derivative of the metric tensor vanishes identically, we have

R;m − Rcm;c − Rdm;d = 0

which reduces to

R;m − 2 Rcm;c = 0

Thus we have

(Rcm − (1/2) δcm R);c = 0
showing that the "divergence" of the tensor inside the parentheses (the Einstein tensor)
vanishes identically.
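
The identity can also be exercised symbolically on a non-vacuum example. The following
sketch (Python with sympy assumed) takes a spatially flat Robertson-Walker metric with an
arbitrary scale factor a(t), a purely illustrative choice, computes the mixed Einstein tensor,
and confirms that its covariant divergence vanishes identically:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = [t, x, y, z]
a = sp.Function('a')(t)
g = sp.diag(1, -a**2, -a**2, -a**2)
ginv = g.inv()
n = 4

Gam = [[[sum(ginv[i, e]*(sp.diff(g[e, j], X[k]) + sp.diff(g[e, k], X[j])
             - sp.diff(g[j, k], X[e]))/2 for e in range(n))
         for k in range(n)] for j in range(n)] for i in range(n)]

def ricci(b, c):
    return sum(sp.diff(Gam[i][b][c], X[i]) - sp.diff(Gam[i][b][i], X[c])
               + sum(Gam[i][i][e]*Gam[e][b][c] - Gam[i][c][e]*Gam[e][b][i]
                     for e in range(n)) for i in range(n))

Ric = sp.Matrix(n, n, ricci)
R = sum(ginv[i, j]*Ric[i, j] for i in range(n) for j in range(n))
# mixed Einstein tensor G^i_j = R^i_j - (1/2) delta^i_j R
G = sp.Matrix(n, n, lambda i, j:
              sum(ginv[i, k]*Ric[k, j] for k in range(n))
              - sp.Rational(1, 2)*sp.eye(n)[i, j]*R)

# covariant divergence: G^i_j;i = dG^i_j/dx^i + Gam^i_ie G^e_j - Gam^e_ij G^i_e
for j in range(n):
    div = sum(sp.diff(G[i, j], X[i])
              + sum(Gam[i][i][e]*G[e, j] - Gam[e][i][j]*G[i, e]
                    for e in range(n)) for i in range(n))
    print(sp.simplify(div))   # 0 for each j
```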
As an example of how the theory of relativity has influenced mathematics (in appropriate
reaction to the obvious influence of mathematics on relativity), in the same year that
Einstein, Hilbert, Klein, and others were struggling to understand the conservation laws
of the relativistic field equations, Emmy Noether published her famous work on the
relation between symmetries and conservation laws, and Klein didn't miss the
opportunity to show how Einstein's theory embodied aspects of his Erlangen program.
A slight (but significant) extension of the field equations was proposed by Einstein in
1917 based on cosmological considerations, as a means of ensuring stability of a static
closed universe. To accomplish this, he introduced a linear term with the cosmological
constant λ as follows

Rμν − (1/2) gμν R + λ gμν = 8πG Tμν
When Hubble and other astronomers began to find evidence that in fact the large-scale
universe is expanding, and Einstein realized his ingenious introduction of the
cosmological constant had led him away from making such a fantastic prediction, he
called it "the biggest blunder of my life.
It's worth noting that Einsteinian gravity is possible only in four dimensions, because in
any fewer dimensions the vanishing of the Ricci tensor Rμν implies the vanishing of the
full Riemann tensor, which means no curvature and therefore no gravity in empty space.
Of course, the actual field equations for the vacuum assert that the Einstein tensor (not
the Ricci tensor) vanishes, so we should consider the possibility of Guv being zero while Ruv
is non-zero. We saw above that Guv = 0 implies Ruv = 0, but that was based on the
assumption of a four-dimensional manifold. In general for an n-dimensional manifold we
have R − (n/2)R = G, so if n is not equal to 2, and if Guv vanishes, we have G = 0 and it
follows that R = 0, and therefore Ruv must vanish. However, if n = 2 it is possible for Guv
to equal zero even though Ruv is non-zero. Thus, in two dimensions, the vanishing of Guv
does not imply the vanishing of Ruv. In this case we have

Ruv = λ guv

where λ can be any constant. Multiplying through by guv gives

R = 2λ

This is the vacuum solution of Einstein's field equations in two dimensions. Oddly
enough, this is also the vacuum solution for the field equations in four dimensions if λ is
identified as the non-zero cosmological constant. Any space of constant curvature is of
this form, although a space of this form need not be of constant curvature.
Once the field equations have been solved and the metric coefficients have been
determined, we then compute the paths of objects by means of the equations of motion.
It was originally taken as an axiom that the equations of motion are the geodesic
equations of the manifold, but in a series of papers from 1927 to 1949 Einstein and others
showed that if particles are treated as singularities in the field, then they must propagate
along geodesic paths. Therefore, it is not necessary to make an independent assumption
about the equations of motion. This is one of the most remarkable features of Einstein's
field equations, and is only possible because of the non-linear nature of the equations. Of
course, the hypothesis that particles can be treated as field singularities may seem no
more intuitively obvious than the geodesic hypothesis itself. Indeed Einstein himself was
usually very opposed to admitting any singularities, so it is somewhat ironic that he took
this approach to deriving the equations of motion. On the other hand, in 1939 Fock
showed that the field equations imply geodesic paths for any sufficiently small bodies
with negligible self-gravity, not treating them as singularities in the field. This approach
also suggests that more massive bodies would deviate from geodesics, and it relies on
representing matter by the stress-energy tensor, which Einstein always viewed with
suspicion.
To appreciate the physical significance of the Ricci tensor it's important to be aware of a
relation between the contracted Christoffel symbol and the scale factor of the
fundamental volume element of the manifold. This relation is based on the fact that if the
square matrix A is the inverse of the square matrix B, then the components of A can be
expressed in terms of the components of B by the equation Aij = (∂B/∂Bji)/B where B is
the determinant of B. Accordingly, since the covariant metric tensor guv and the
contravariant metric tensor guv are matrix inverses of each other, we have

guv = (1/g) ∂g/∂guv

If we multiply both sides by the partial of guv with respect to the coordinate xσ we have

guv (∂guv/∂xσ) = (1/g)(∂g/∂xσ)        (4)
Notice that the left hand side looks like part of a Christoffel symbol. Recall the general
form of these symbols

Γabc = (1/2) gad (gdb,c + gdc,b − gbc,d)

If we set one of the lower indices of the Christoffel symbol, say c, equal to a, then we
have the contracted symbol

Γaba = (1/2) gad (gdb,a + gda,b − gba,d)

Since the indices a and d are both dummies (meaning they each take on all possible
values in the implied summation), and since gad = gda, we can swap a and d in any of the
terms without affecting the result. Swapping a and d in the last term inside the
parentheses we see it cancels with the first term, and we're left with

Γaba = (1/2) gad (∂gda/∂xb)

Comparing this with our previous result (4), we find that the contracted Christoffel
symbol can be written in the form

Γaba = (1/(2g))(∂g/∂xb)

Furthermore, recalling the elementary fact that the derivative of ln(y) equals 1/y times the
derivative of y, and the fact that k ln(y) = ln(yk), this result can also be written in the
form

Γaba = ∂ln(√|g|)/∂xb
Since our metrics all have negative determinants, we can replace |g| with -g in these
expressions. We're now in a position to evaluate the geometrical and physical
significance of the Ricci tensor, the vanishing of which constitutes Einstein's vacuum
field equations. The general form of the Ricci tensor is

Rab = Γcca,b − Γcab,c + Γcad Γdcb − Γccd Γdab

which of course is a contraction of the full Riemann curvature tensor. Making use of the
preceding identity, this can be written as

Rab = ∂2ln√(−g)/∂xa∂xb − Γdab ∂ln√(−g)/∂xd − ∂Γcab/∂xc + Γcad Γdcb        (5)
In his original 1916 paper on the general theory Einstein initially selected coordinates
such that the metric determinant g was a constant −1, in which case the partial derivatives
of √(−g) all vanish and the Ricci tensor is simply

Rab = Γcad Γdcb − ∂Γcab/∂xc
The vanishing of this tensor constitutes Einstein's vacuum field equations (1'), provided
the coordinates are such that g is constant. Even if g is not constant in terms of the
natural coordinates, it is often possible to transform the coordinates so as to make g
constant. For example, Schwarzschild replaced the usual r and θ coordinates with x =
r3/3 and y = cos(θ), together with the assumption that gtt = 1/grr, and thereby expressed
the spherically symmetrical line element in a form with g = −1. It is especially natural to
impose the condition of constant g in static systems of coordinates and spatially uniform
fields. Indeed, since we spend most of our time suspended quasi-statically in a nearly
uniform gravitational field, we are most intuitively familiar with gravity in this form.
From this point of view we identify the effects of gravity with the geodesic accelerations
relative to our static coordinates, as represented by the Christoffel symbols. Indeed
Einstein admitted that he conceptually identified the gravitational field with the
Christoffel symbols, despite the fact that it's possible to have non-vanishing Christoffel
symbols in flat spacetime, as discussed in Section 5.6.
However, we can also take the opposite view. Rather than focusing on "static" coordinate
systems with constant metric determinants which make the first two terms of (5) vanish,
we can focus on "free-falling" inertial coordinates (also known as Riemann normal
coordinates) in terms of which the Christoffel symbols, and therefore the second and
fourth terms of (5), vanish at the origin. In other words, we "abstract away" the original
sense of gravity as the extrinsic acceleration relative to some physically distinguished
system of static coordinates (such as the Schwarzschild coordinates), and focus instead
on the intrinsic tidal accelerations (i.e., local geodesic deviations) that correspond to the
intrinsic curvature of the manifold. At the origin of Riemann normal coordinates the
Ricci tensor

Rab = Γcca,b − Γcab,c + Γcad Γdcb − Γccd Γdab

reduces to

Rab = Γcac,b − Γcab,c
where subscripts following commas signify partial derivatives with respect to the
designated coordinate. Making use of the skew symmetry on the lower three indices of
the Christoffel symbol partial derivatives in these coordinates (as described in Section
5.7), the second term on the right hand side can be replaced with the negative of its two
complementary terms given by rotating the lower indices, so we have

Rab = Γcac,b + Γcbc,a + Γcca,b
Noting that each of the three terms on the right side is now a partial derivative of a
contracted Christoffel symbol, we have

Rab = 3 ∂2ln√(−g)/∂xa∂xb
At the origin of Riemann normal coordinates the first partial derivatives of g, and
therefore of √(−g), all vanish, so the chain rule allows us to bring those factors outside
the differentiations, and noting the commutativity of partial differentiation we arrive at
the expression for the components of the Ricci tensor at the origin of Riemann normal
coordinates

Rab = (3/√(−g)) ∂2√(−g)/∂xa∂xb
Thus the vacuum field equations Rab = 0 reduce to

∂2√(−g)/∂xa∂xb = 0
The quantity √(−g) is essentially a scale factor for the incremental volume element V. In
fact, for any scalar field φ we have the invariant integral

∫ φ √(−g) dx0 dx1 dx2 dx3

and taking φ = 1 gives the simple volume. Therefore, at the origin of Riemann normal
(free-falling inertial) coordinates we find that the components of the Ricci tensor Rab are
simply the second derivatives of the proper volume of an incremental volume element,
divided by that volume itself. Hence the vacuum field equations Rab = 0 simply express
the vanishing of these second derivatives with respect to any two coordinates (not
necessarily distinct). Likewise the "complete" field equations in the form of (3) signify
that three times the second derivatives of the volume, divided by the volume, equal the
corresponding components of the "divergence-free" energy-momentum tensor expressed
by the right hand side of (3).
In physical terms this implies that a small cloud of free-falling dust particles initially at
rest with respect to each other does not change its volume during an incremental advance
of proper time. Of course, this doesn't give a complete description of the effects of
gravity in a typical gravitational field, because although the volume of the cloud isn't
changing at this instant, its shape may be changing due to tidal acceleration. In a
spherically symmetrical field the cloud will become lengthened in the radial direction and
shortened in the normal directions. This variation in the shape is characterized by the
Weyl tensor, which in general may be non-zero even when the Ricci tensor vanishes.
It may seem that conceiving of gravity purely as tidal effect ignores what is usually the
most physically obvious manifestation of gravity, namely, the tendency of objects to "fall
down", i.e., the acceleration of the geodesics relative to our usual static coordinates near a
gravitating body. However, in most cases this too can be viewed as tidal accelerations,
provided we take a wider view of events. For example, the fall of a single apple to the
ground at one location on Earth can be transformed away (locally) by a suitable system of
accelerating coordinates, but the fall of apples all over the Earth cannot. In effect these
apples can be seen as a spherical cloud of dust particles, each following a geodesic path,
and those paths are converging and the cloud's volume is shrinking at an accelerating rate
as the shell collapses toward the Earth. The rate of acceleration (i.e., the second
derivative with respect to time) is proportional to the mass of the Earth, in accord with
the field equations.
6.1 An Exact Solution
Einstein had been so preoccupied with other studies that he had not
realized such confirmation of his early theories had become an everyday
affair in the physical laboratory. He grinned like a small boy, and kept
saying over and over Ist das wirklich so?
A.
E. Condon
The special theory of relativity assumes the existence of a unique class of global
coordinate systems - called inertial coordinates - with respect to which the speed of light
in vacuum is everywhere equal to the constant c. It was natural, then, to express physical
laws in terms of this preferred class of coordinate systems, characterized by the global
invariance of the speed of light. In addition, the special theory also strongly implied the
fundamental equivalence of mass and energy, according to which light (and every other
form of energy) must be regarded as possessing inertia. However, it soon became clear
that the global invariance of light speed together with the idea that energy has inertia (as
expressed in the famous relation E2 = m2 + |p|2) were incompatible with one of the most

firmly established empirical results of physics, namely, the exact proportionality of


inertial and gravitational mass, which Einstein elevated to the status of a Principle. This
incompatibility led Einstein, as early as 1907, to the belief that the global invariance of
light speed, in the sense of the special theory, could not be maintained. Indeed, he
concluded that we cannot assume, as do both Newtonian theory and special relativity, the
existence of any global inertial systems of coordinates (although we can carry over the
existence of a local system of inertial coordinates in a vanishingly small region of
spacetime around any event).
Since no preferred class of global coordinate systems is assumed, the general theory
essentially places all (smoothly related) systems of coordinates on an equal footing, and
expresses physical laws in a way that is applicable to any of these systems. As a result,
the laws of physics will hold good even with respect to coordinate systems in which the
speed of light takes on values other than c. For example, the laws of general relativity are
applicable to a system of coordinates that is fixed rigidly to the rotating Earth. According
to these coordinates the distant galaxies are "circumnavigating" nearly the entire universe
in just 24 hours, so their speed is obviously far greater than the constant c. The huge
implied velocities of the celestial spheres were always problematical for the ancient
conception of an immovable Earth, but they are beautifully accommodated within general
relativity by the effect which the implied centrifugal acceleration field - whose strength
increases in direct proportion to the distance from the Earth - has on the values of the
metric components g_μν for this rotating system of coordinates at those locations. It's true
that, when expressed in this rotating system of coordinates, those stars are moving with
dx/dt values that far exceed the usual numerical value of c, but they are not moving faster
than light, because the speed of light at those locations, expressed in terms of those
coordinates, is correspondingly greater.
In general, the velocity of light can always be inferred from the components of the metric
tensor, and typically looks something like dx/dt = √(−g_tt/g_xx). To understand why this is so,
recall that in special relativity we have

(dτ)² = (dt)² − (dx)² − (dy)² − (dz)²      (1)

and the trajectory of a light ray follows a null path, i.e., a path with dτ = 0. Thus,
dividing by (dt)², we see that the path of light through spacetime satisfies the equation

(dx/dt)² + (dy/dt)² + (dz/dt)² = 1

and so the velocity of light is unambiguous in the context of special relativity, which is
restricted to inertial coordinate systems with respect to which equation (1) is invariant.
However, in the general theory we are no longer guaranteed the existence of a global
coordinate system of the simple form (1). It is true that over a sufficiently small spatial
and temporal region surrounding any given point in spacetime there exists a coordinate
system of that simple Minkowskian form, but in the presence of a non-vanishing
gravitational field ("curvature") equation (1) applies only with respect to "free-falling"
reference frames, which are necessarily transient and don't extend globally.
So, for example, instead of writing the metric in the xt plane as (dτ)² = (dt)² − (dx)², we
must consider the more general form

(dτ)² = g_tt (dt)² + 2 g_xt dt dx + g_xx (dx)²

As always, the path of a light ray is null, so we have dτ = 0, and the differentials dx and
dt must satisfy the equation

g_tt + 2 g_xt (dx/dt) + g_xx (dx/dt)² = 0

Solving this gives

dx/dt = [−g_xt ± √(g_xt² − g_tt g_xx)] / g_xx

If we diagonalize our metric we get g_xt = 0, in which case the "velocity" of a null path in
the xt plane with respect to this coordinate system is simply dx/dt = ±√(−g_tt/g_xx).
This quantity can (and does) take on any value, depending on our choice of coordinate
systems.
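To make this concrete, here is a minimal numerical sketch (my own illustration, not part of the original text; the function name and sample values are mine) that solves the null-path quadratic above for the coordinate speed of light:

import math

def null_speeds(g_tt, g_xt, g_xx):
    # The two roots dx/dt of g_tt + 2*g_xt*(dx/dt) + g_xx*(dx/dt)**2 = 0,
    # i.e., the coordinate velocities of light in the +x and -x directions.
    disc = math.sqrt(g_xt**2 - g_tt*g_xx)
    return ((-g_xt + disc)/g_xx, (-g_xt - disc)/g_xx)

# Minkowski metric, (dtau)^2 = (dt)^2 - (dx)^2: light moves at c = 1
print(null_speeds(1.0, 0.0, -1.0))                    # (-1.0, 1.0)

# radial direction of the Schwarzschild metric at r = 4m (taking m = 1)
r, m = 4.0, 1.0
print(null_speeds(1 - 2*m/r, 0.0, -1/(1 - 2*m/r)))    # (-0.5, 0.5)

The second example anticipates the radial result dr/dt = ±(1 − 2m/r) derived later in this section.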
Around 1911 Einstein proposed to incorporate gravitation into a modified version of
special relativity by allowing the speed of light to vary as a scalar from place to place as a
function of the gravitational potential. This "scalar c field" is remarkably similar to a
simple refractive medium, in which the speed of light varies as a function of the density.
Fermat's principle of least time can then be applied to define the paths of light rays as
geodesics in the spacetime manifold (as discussed in Section 8.4). Specifically, Einstein
wrote in 1911 that the speed of light at a place with the gravitational potential φ would be
c₀(1 + φ/c₀²), where c₀ is the nominal speed of light in the absence of gravity. In
geometrical units we define c₀ = 1, so Einstein's 1911 formula can be written simply as c
= 1 + φ. However, this formula for the speed of light (not to mention this whole
approach to gravity) turned out to be incorrect, as Einstein realized during the years
leading up to 1915 and the completion of the general theory. In fact, the general theory
of relativity doesn't give any equation for the speed of light at a particular location,
because the effect of gravity cannot be represented by a simple scalar field of c values.
Instead, the "speed of light" at a each point depends on the direction of the light ray
through that point, as well as on the choice of coordinate systems, so we can't generally
talk about the value of c at a given point in a non-vanishing gravitational field. However,
if we consider just radial light rays near a spherically symmetrical (and non- rotating)
mass, and if we agree to use a specific set of coordinates, namely those in which the
metric coefficients are independent of t, then we can read a formula analogous to

Einstein's 1911 formula directly from the Schwarzschild metric. But how does the
Schwarzschild metric follow from the field equations of general relativity?
To deduce the implications of the field equations for observable phenomena Einstein
originally made use of approximate methods, since no exact solutions were known.
These approximate methods were adequate to demonstrate that the field equations lead in
the first approximation to Newton's laws, and in the second approximation to a natural
explanation for the anomalous precession of Mercury (see Section 6.2). However, these
results can now be directly computed from the exact solution for a spherically symmetric
field, found by Karl Schwarzschild in 1916. As Schwarzschild wrote, it's always
pleasant to find exact solutions, and the simple spherically symmetrical line element "lets
Mr. Einstein's result shine with increased clarity". To this day, most of the empirically
observable predictions of general relativity are consequences of this simple solution.
We will discuss Schwarzschild's original derivation in Section 8.7, but for our present
purposes we will take a slightly different approach. Recall from Section 5.5 that the most
general form of the metrical spacetime line element for a spherically symmetrical static
field (although it is not strictly necessary to assume the field is static) can be written in
polar coordinates as

(dτ)² = g_tt (dt)² + g_rr (dr)² + g_θθ (dθ)² + g_φφ (dφ)²      (2)

where g_θθ = −r², g_φφ = −r² sin(θ)², and g_tt and g_rr are functions of r and the gravitating mass
m. We expect that if m = 0, and/or as r increases to infinity, we will have g_tt = 1 and g_rr =
−1 in order to give the flat Minkowski metric in the absence of gravity. We've seen that in
this highly symmetrical context there is a very natural way to derive the metric
coefficients gtt and grr simply from the requirement to satisfy Kepler's third law and the
principle of symmetry between space and time. However, we now wish to know what
values for these metric coefficients are implied by Einstein's field equations.
In any region that is free of (non-gravitational) mass-energy the vacuum field equations
must apply, which means the Ricci tensor

R_bc = ∂Γ^a_bc/∂x^a − ∂Γ^a_ba/∂x^c + Γ^d_bc Γ^a_da − Γ^d_ba Γ^a_dc

must vanish, i.e., all the components are zero. Since our metric is in diagonal form, it's
easy to see that the Christoffel symbols for any three distinct indices a, b, c reduce to

Γ^a_aa = (∂g_aa/∂x^a)/(2g_aa)      Γ^a_ab = (∂g_aa/∂x^b)/(2g_aa)      Γ^a_bb = −(∂g_bb/∂x^a)/(2g_aa)      Γ^a_bc = 0

with no summations implied. In two of the non-vanishing cases the Christoffel symbols
are of the form q_a/(2q), where q is a particular metric component and subscripts denote
partial differentiation with respect to x^a. By an elementary identity these can also be
written as ∂ln(√q)/∂x^a. Hence if we define the new variable Q = ln(√q) we can write the
Christoffel symbol in the form Q_a with q = e^{2Q}. Accordingly if we define the variables
(functions of r)

u = ln(√g_tt)      v = ln(√g_rr)

then we have

g_tt = e^{2u}      g_rr = e^{2v}

and the non-vanishing Christoffel symbols (as given in Section 5.5) can be written as

Γ^t_tr = u_r      Γ^r_rr = v_r      Γ^r_tt = −u_r e^{2(u−v)}      Γ^r_θθ = r e^{−2v}      Γ^r_φφ = r sin(θ)² e^{−2v}
Γ^θ_rθ = Γ^φ_rφ = 1/r      Γ^θ_φφ = −sin(θ)cos(θ)      Γ^φ_θφ = cos(θ)/sin(θ)
We can now write down the components of the Ricci tensor, each of which must vanish
in order for the field equations to be satisfied. Writing them out explicitly and expanding
all the implied summations for our line element, we find that all the non-diagonal
components are identically zero (which we might have expected from symmetry
arguments), so the only components of interest in our case are the diagonal elements.
Inserting the expressions for the Christoffel symbols gives the equations for the four
diagonal components of the Ricci tensor as functions of u and v:

R_tt = −e^{2(u−v)} (u_rr + u_r² − u_r v_r + 2u_r/r)
R_rr = −u_rr − u_r² + u_r v_r + 2v_r/r
R_θθ = 1 + e^{−2v} (1 + r u_r − r v_r)
R_φφ = sin(θ)² R_θθ

The necessary and sufficient condition for the field equations to be satisfied by a line
element of the form (2) is that these four quantities each vanish. Combining the
expressions for R_tt and R_rr we immediately have u_r = −v_r, which implies u = −v + k for
some arbitrary constant k. Making these substitutions into the equation for R_θθ and
setting the constant of integration to k = iπ/2 gives the condition

e^{2u} (2r u_r + 1) = 1

Remembering that e^{2u} = g_tt, and that the derivative of e^{2u} is 2u_r e^{2u}, this condition expresses
the requirement

r (dg_tt/dr) + g_tt = 1

The left side is just the chain rule for the derivative of the product r g_tt, and since this
derivative equals 1 we immediately have r g_tt = r + λ for some constant λ. Also, since g_rr
= e^{2v} where v = −u + iπ/2, it follows that g_rr = −1/g_tt, and so we have the results

g_tt = 1 + λ/r      g_rr = −1/(1 + λ/r)

To match the Newtonian limit we set λ = −2m, where m is classically identified with the
mass of the gravitating body. These metric coefficients were derived by combining the
expressions for Rtt and Rrr, but it's easy to verify that they also satisfy each of those
equations separately, so this is indeed the unique spherically symmetrical static solution
of Einstein's field equations.
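As a quick symbolic spot-check (my own, using the sympy library; it is not part of the original presentation), we can confirm that g_tt = 1 − 2m/r satisfies the condition derived above:

import sympy as sp

r, m = sp.symbols('r m', positive=True)
g_tt = 1 - 2*m/r
# since g_tt = e^(2u), the condition e^(2u)*(2*r*u_r + 1) = 1 is the same as
# r*d(g_tt)/dr + g_tt = 1, i.e., d(r*g_tt)/dr = 1
print(sp.simplify(r*sp.diff(g_tt, r) + g_tt))   # prints 1, as required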
Now that we have derived the Schwarzschild metric, we can easily correct the "speed of
light" formula that Einstein gave in 1911. A ray of light always travels along a null
trajectory, i.e., with d = 0, and for a radial ray we have d and d both equal to zero, so
the equation for the light ray trajectory through spacetime, in Schwarzschild coordinates
(which are the only spherically symmetrical ones in which the metric is independent of t)
is simply

from which we get

where the sign just indicates that the light can be going radially inward or outward.
(Note that we're using geometric units, so c = 1.) In the Newtonian limit the classical
gravitational potential at a distance r from mass m is = m/r, so if we let cr = dr/dt
denote the radial speed of light in Schwarzschild coordinates, we have
cr = 1 + 2
which corresponds to Einstein's 1911 equation, except that we have a factor of 2 instead
of 1 on the potential term. Thus, as becomes increasingly negative (i.e., as the
magnitude of the potential increases), the radial "speed of light" cr defined in terms of the
Schwarzschild parameters t and r is reduced to less than the nominal value of c.
On the other hand, if we define the tangential speed of light at a distance r from a
gravitating mass center in the equatorial plane (θ = π/2) in terms of the Schwarzschild
coordinates as c_t = r(dφ/dt), then the metric divided by (dt)² immediately gives

c_t = √(1 − 2m/r)

Thus, we again find that the "velocity of light" is reduced in a region with a strong
gravitational field, but this tangential speed is the square root of the radial speed at the same point,
and to the first order in m/r this is the same as Einstein's 1911 formula, although it is
understood now to signify just the tangential speed. This illustrates the fact that the
general theory doesn't lead to a simple scalar field of c values. The effects of gravitation
can only be accurately represented by a tensor field.
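The following small script (my own illustration, not from the text; the function names are arbitrary) tabulates both coordinate speeds:

def radial_speed(r, m):
    return 1 - 2*m/r              # c_r = dr/dt for a radial null ray

def tangential_speed(r, m):
    return (1 - 2*m/r)**0.5       # c_t = r*dphi/dt for a tangential null ray

m = 1.0
for r in (3.0, 10.0, 100.0, 1e6):
    print(f"r = {r:>9}: c_r = {radial_speed(r, m):.6f}, c_t = {tangential_speed(r, m):.6f}")
# both approach 1 far from the mass, both go to zero at r = 2m, and c_t is
# always the square root of c_r, so no single scalar "c field" fits both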
One of the observable implications of general relativity (as well as any other theory that
respects the equivalence principle) is that the rate of proper time at a fixed radial position
in a gravitational field relative to the coordinate time (which corresponds to proper time
sufficiently far from the gravitating mass) is given by

dτ/dt = √(g_tt(r))

It follows that the characteristic frequency ν₁ of light emitted by some known physical
process at a radial location r₁ will represent a different frequency ν₂ with respect to the
proper time at some other radial location r₂ according to the formula

ν₂/ν₁ = √(g_tt(r₁)/g_tt(r₂))

From the Schwarzschild metric we have g_tt(r_j) = 1 + 2φ_j, where φ_j = −m/r_j is the
gravitational potential at r_j, so

ν₂/ν₁ = √((1 + 2φ₁)/(1 + 2φ₂))

Neglecting the higher-order terms and rearranging, this can also be written as

Δν/ν = φ₁ − φ₂ = (m/(r₁r₂))(r₁ − r₂)
Observations of the light emitted from the surface of the Sun, and from other stars, are
consistent with this predicted amount of gravitational redshift (up to first order), although
measurements of this slight effect are difficult. A terrestrial experiment performed by
Pound and Rebka in 1960 exploited the Mössbauer effect to precisely determine the
redshift between the top and bottom of a tower. The results were in good agreement with
the above formula, and subsequent experiments of the same kind have improved the
accuracy to within about 1 percent. (Note that if r₁ and r₂ are nearly equal, as, for
example, at two heights near the Earth's surface, then the leading factor of the right-most
expression is essentially just the acceleration of gravity a = m/r², and the factor in
parentheses is the difference in heights Δh, so we have Δν/ν = a Δh.)
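For a rough sense of scale, here is the arithmetic for a tower experiment (my own back-of-envelope computation; the 22.5-meter height is that of the Harvard tower used by Pound and Rebka):

g = 9.81           # acceleration of gravity at the Earth's surface, m/s^2
h = 22.5           # height of the tower, m
c = 2.998e8        # speed of light, m/s

# restoring the factors of c, the fractional shift a*h becomes g*h/c^2
print(g*h/c**2)    # about 2.5e-15

A shift of a few parts in 10^15 is why the extremely narrow resonance lines of the Mössbauer effect were essential to the measurement.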
However, it's worth noting that this amount of gravitational redshift is a feature of just
about any viable theory of gravity that includes the equivalence principle, so these
experimental results, although useful for validating that principle, are not very robust for
distinguishing between competing theories of gravity. For this we need to consider other
observations, such as the paths of light near a gravitating body, and the precise orbits of
planets. These phenomena are discussed in the subsequent sections.
6.2 Anomalous Precessions
In these last months I had great success in my work. Generally covariant gravitation
equations. Perihelion motions explained quantitatively. You will be astonished.
Einstein to Besso, 17 Nov 1915

The Earth's equatorial plane maintains a nearly constant absolute orientation in space
throughout the year due to the gyroscopic effect of spinning about its axis. Similarly the
plane of the Earth's orbit around the Sun remains essentially constant. These two planes
are tilted by 23.5 degrees with respect to each other, so they intersect along a single line
whose direction remains constant, assuming the planes themselves maintain fixed
attitudes. At the Spring and Autumn equinoxes the Sun is located precisely on this fixed
line in opposite directions from the Earth. Since this line is a highly stable directional
reference, it has been used by astronomers since ancient times to specify the locations of
celestial objects. (Of course, when we refer to "the location of the Sun" we are speaking
somewhat loosely. With the increased precision of observations made possible by the
invention of the telescope, it is strictly necessary to account for the Sun's motion about
the center of mass of the solar system. It is this center of mass of the Sun and planets,
rather than just of the Sun, that is taken as the central inertial reference point for the most
precise astronomical measurements and calculations.) By convention, the longitude of
celestial objects is referenced from the direction of this line pointing to the Spring
equinox, and this is called the "right ascension" of the object. In addition, the
"declination" specifies the latitude, i.e., the angular position North or South of the Earth's
equatorial plane.
This system of specifying positions is quite stable, but not perfect. Around 150 BC the
Greek astronomer Hipparchus carefully compared his own observations of certain stars
with observations of the same stars recorded by Timocharis 169 years earlier (and with
some even earlier measurements from the Babylonians), and noted a slight but systematic
difference in the longitudes. Of course, these were all referenced to the supposedly fixed
direction of the line of intersection between the Earth's rotational and orbital planes, but
Hipparchus was led to the conclusion that this direction is not perfectly stationary, i.e.,
that the direction of the Sun at the equinoxes is not constant with respect to the fixed
stars, but precesses by about 0.0127 degrees each year. This is a remarkably good
estimate, considering the limited quality of the observations that were available to
Hipparchus. The accepted modern value for the precession of the equinoxes is 0.01396
degrees per year, which implies that the line of the equinoxes actually rotates completely
around 360 degrees over a period of about 26,000 years. Interpreting this as a gradual
change in the orientation of the Earth's axis of rotation, the precession of the equinoxes is
the third of what Copernicus called the "threefold movement of the Earth", the first two
being a rotation about its axis once per day, and a revolution about the Sun once per year.
Awareness of this third motion is arguably a distinguishing feature of human culture,
since it can only be discerned on the basis of information spanning multiple generations.
The reason for mentioning this, aside from expressing admiration for human ingenuity, is
that when we observe the axis of the elliptical orbit of a planet such as Mercury (for
example) over a long period of time, referenced to our equinox line, we must expect to
find an apparent precession of about 0.01396 degrees per year, which equals 5025 arc
seconds per century, assuming Mercury's orbital axis is actually stationary. However,
astronomers have actually observed a precession rate of 5600 arc seconds per century for
the axis of Mercury's orbit, so evidently the axis is not truly stationary. This might seem
like a problem for Newtonian gravity, until we remember that Newton predicted stable
elliptical orbits only for the idealized two-body case. When analyzing the actual orbit of
Mercury we must also take into account the gravitational pull of the other planets,
especially Venus and Earth (because of their proximity) and Jupiter (because of its size).
It isn't simple to work out these effects, and unfortunately there is no simple analytical
solution to the n-body problem in Newtonian mechanics, but using the calculational
techniques developed by Lagrange, Laplace, and others, it is possible to determine that
the effects of all the other planets should contribute an additional 532 arc seconds per
century to the precession of Mercury's orbit. Combined with the precession of our
equinox reference line, this accounts for 5557 arc seconds per century, which is close to
the observed value of 5600, but still short by 43 arc seconds per century. The
astronomers assure us that their observations can't be off by more than a fraction of an arc
second, so there seems to be a definite problem here.
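The bookkeeping in this paragraph can be summarized in a few lines (my own tally, using the figures quoted above, all in arc seconds per century):

equinox_precession = 5025        # apparent precession due to the moving equinox line
planetary_perturbations = 532    # Newtonian pull of the other planets
observed = 5600                  # observed precession of Mercury's orbital axis

accounted = equinox_precession + planetary_perturbations
print(accounted)                 # 5557
print(observed - accounted)      # the anomalous 43 arc seconds per century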
A similar problem had appeared in the 1840's when the newly discovered planet Uranus
began to deviate noticeably from the precise course that Newtonian theory prescribed.
On that occasion, the astronomer Le Verrier and the mathematician Adams had
(independently) inferred the existence of a previously unknown planet beyond the orbit of
Uranus, and even gave instructions where it could be found. Sure enough, when that
indicated region of the sky was searched by Johann Galle at the Berlin Observatory, the
planet that came to be called Neptune was discovered in 1846, astonishingly close to the
predicted location. This was a tremendous triumph for Le Verrier, and surely gave him
confidence that all apparent anomalies in the planetary orbits could be explained on the
basis of Newtonian theory, and could be used as an aid to the discovery of new celestial
objects. He soon turned his attention to the anomalous precession of Mercury's orbit
(which he estimated at 38 arc seconds per century, somewhat less than the modern value),
and suggested that it must be due to some previously unknown mass near the Sun,
possibly a large number of small objects, or perhaps even another planet, inside the orbit
of Mercury.
At one point there were reports that a small planet orbiting very near the Sun had actually
been sighted, and it was named Vulcan, after the Roman God of fire. Le Verrier became
convinced that the new planet existed, but subsequent attempts to observe the
hypothetical planet failed to find any sign of it. Even the original sightings were cast into
doubt, since they had been made by an amateur, and other astronomers reported that they
had been observing the Sun at the very same time and had seen nothing. Another popular
theory to explain Mercury's anomalous precession, championed by the astronomer Simon
Newcomb, was that the small particles of matter that cause the "zodiacal light" might
account for Mercury's anomalous precession, but Newcomb soon realized that if there
were enough matter to affect Mercury's perihelion so significantly there would also be
enough to cause other effects on the orbits of the inner planets - effects which are not
observed. Similar inconsistencies undermined the Vulcan hypothesis.
As a result of the failures to arrive at a realistic Newtonian explanation for the anomalous
precession, some researchers, notably Asaph Hall and Newcomb, began to think that
perhaps Newtonian theory was at fault, and that perhaps gravity isn't exactly an inverse
square law. Hall noted that he could account for Mercury's precession if the law of
gravity, instead of falling off as 1/r², actually falls off as 1/r^n where the exponent n is
2.00000016. However, most people didn't (and still don't) find that idea to be very
appealing, since it conflicts with basic conservation laws, e.g., Gauss's Law, unless we
also postulate a correspondingly modified metric for space (ironically enough).

More recently, efforts have been made to explain some or all of Mercury's precession by
oblateness in the shape of the sun. In 1966 Dicke and Goldenberg reported that the sun's
polar axis is shorter than its equatorial axes by about 50 parts per million. If true that
would account for 3.4" per century, so the unexplained part would be only 39.6",
significantly different from GR's prediction of 43". The Brans-Dicke theory of gravity
can account for 39.6" precisely by adjusting a free parameter of the theory. However,
Dicke's and Goldenberg's solar oblateness data was contradicted by a number of other
heliometric measurements, all of which showed that the solar axes differ by no more than
about 4 parts per million. In addition, the sun doesn't appear to rotate nearly fast enough
to be as oblate as Dicke and Goldenberg thought, so their results could only be right if the
interior of the sun is spinning about 25 times faster than the visible exterior, which is
highly implausible. The current consensus is that the Sun is not nearly oblate enough to
upset the agreement between Mercury's observed precession and the predictions of GR.
This is all the more impressive considering that, in contrast to the Brans-Dicke and other
alternative theories, GR has almost no "freedom" to adjust its predictions. It is highly
constrained by its own logic, so it's remarkable that it continues to survive experimental
challenges.
It should be noted that Mercury isn't the only object in the solar system that exhibits
anomalous precession. The effect is most noticeable for objects near the Sun with highly
elliptical orbits, but it can be seen even in the nearly circular orbits of Venus and Earth,
although the discrepancy isn't nearly so large as for Mercury. In addition, the asteroid
Icarus is ideal for studying this effect, because it has an extremely elliptical orbit and
periodically passes very close to the Sun. Here's a table showing the anomalous
precession of four objects in the inner solar system, based on direct observations:
[table: observed anomalous precession rates (arc seconds per century) for Mercury, Venus, Earth, and the asteroid Icarus]
The large tolerances for Venus and Earth are mainly due to the fact that their orbits are so
nearly circular, making it difficult to precisely determine the axes of their elliptical orbits.
Incidentally, Icarus periodically crosses the Earth's path, and has actually passed within a
million kilometers of us - less than 3 times the distance to the Moon. It's about 1 mile in
diameter, and may eventually collide with the Earth - reason enough to keep an eye on its
precession.
One hope that Einstein had throughout the time he was working on the general theory
was that it would explain the anomalous precession of Mercury. Of course, as we've
seen, "explanations" of this phenomenon were never in short supply, but none of them

were very compelling, all seeming to be ad hoc. In contrast, Einstein found that the extra
precession arises unavoidably from the fundamental principles of general relativity. To
determine the relativistic prediction for the advance of an elliptical orbit, let's work in the
single plane θ = π/2, so of course dθ/dt and all higher derivatives also vanish, and we
have sin(θ) = 1. Thus the term involving dθ in the Schwarzschild metric drops out, leaving
just

(dτ)² = (1 − 2m/r)(dt)² − (dr)²/(1 − 2m/r) − r²(dφ)²
The Christoffel symbols and the equations of geodesic motion for this metric were
already given in Section 5.5. Taking the path parameter equal to the proper time τ, those
equations are

d²t/dτ² + [(2m/r²)/(1 − 2m/r)](dt/dτ)(dr/dτ) = 0      (2)

d²r/dτ² + (m/r²)(1 − 2m/r)(dt/dτ)² − [(m/r²)/(1 − 2m/r)](dr/dτ)² − r(1 − 2m/r)(dφ/dτ)² = 0      (3)

d²φ/dτ² + (2/r)(dr/dτ)(dφ/dτ) = 0      (4)

We can immediately integrate equations (2) and (4) to give

(1 − 2m/r)(dt/dτ) = k      r²(dφ/dτ) = h

where k and h are constants of integration, determined by the initial conditions of the
orbit. We can now substitute for these derivatives into the basic Schwarzschild metric
divided by (dτ)² to give

1 = k²/(1 − 2m/r) − (dr/dτ)²/(1 − 2m/r) − h²/r²

Solving for (dr/dτ)², we have

(dr/dτ)² = k² − (1 − 2m/r)(1 + h²/r²)      (5)

Differentiating this with respect to τ and dividing by 2(dr/dτ) gives

d²r/dτ² = −m/r² + (h²/r³)(1 − 3m/r)

(We arrive at this same equation if we insert the squared derivatives of the coordinates
into equation (3), because one of the geodesic equations is always redundant to the line
element.) Letting ω = dφ/dτ denote the proper angular speed, we have h = ωr², and the
above equation can be written as

d²r/dτ² = −m/r² + ω²(r − 3m)
Obviously if ω = 0 this gives the "proper" analog of Newton's inverse-square law for
radial gravitational acceleration. With non-zero ω the term ω²r corresponds to the
Newtonian centripetal acceleration which, if we defined the tangential velocity v = ωr,
would equal the classical v²/r. This term serves to offset the inward pull of gravity, but in
the relativistic version we find not ω²r but ω²(r − 3m). (To avoid confusion, it's worth
noting that the quantity ω²(1 − 3m/r) would be simply ω² if ω were defined as the
derivative of φ with respect to the Schwarzschild coordinate time t instead of the proper
time τ. Hence, as we saw in Section 5.5, the relativistic version of Kepler's third law for
circular orbits is formally identical to the Newtonian version, but only if we identify the
Newtonian coordinates with the Schwarzschild coordinates.) For values of r much greater
than 3m this difference can be neglected, but clearly if r approaches 3m we can expect to
see non-classical effects, and of course if r ever becomes less than 3m we would expect
completely un-classical behavior. In fact, this corresponds to the cases when an orbiting
particle spirals into the center, which never happens in classical theory (see below).
Since the above equations involve powers of (1/r) it's convenient to work with the
parameter u = 1/r. Differentiating u with respect to φ gives du/dφ = −(1/r²) dr/dφ. Also,
since r² = h/(dφ/dτ), we have dr/dτ = −h (du/dφ). Substituting for dr/dτ and 1/r into
equation (5) gives the following differential equation relating u to φ

h² (du/dφ)² = k² − (1 − 2mu)(1 + h²u²)

Differentiating again with respect to φ and dividing by 2h² (du/dφ), we arrive at

3mu² − u + m/h² − u″ = 0      (6)

where u″ denotes d²u/dφ². Solving this quadratic for u gives

u = [1 − √(1 − 12m(m/h² − u″))] / (6m)

The quantity in the parentheses under the square root is typically quite small compared
with 1, so we can approximate the square root by the first few terms of its expansion

√(1 − x) ≈ 1 − x/2 − x²/8

Expanding the right hand side and re-arranging terms gives

(1 + 6m²/h²) u″ = (m/h²)(1 + 3m²/h²) − u + 3m(u″)²

The value of u″ in typical astronomical problems is numerically quite small (many orders
of magnitude less than 1), so the quantity 3m(u″)² on the right hand side will be negligible
for planetary motions. Therefore, we're left with a simple harmonic oscillator of the form

u″ = M(F − u)

where M and F are constants. For some choice of initial φ the general solution of this
equation can be expressed as

u(φ) = F[1 + k cos(√M φ)]

where k is a constant of integration. Therefore, reverting back to the parameter r = 1/u, the
relation between r and φ is

r = (1/F) / [1 + k cos(Ωφ)]

where Ω = √M. If the "frequency" Ω was equal to unity, this would be the polar equation of an ellipse
with the pole at one focus, and the constant k would signify the eccentricity. Also, the
leading factor 1/F would be the radial distance from the focus to the ellipse at an angle of π/2
from the major axis, i.e., it would represent the semilatus rectum. However, the value of
Ω is actually slightly less than 1, which implies that φ must go slightly beyond 2π in
order to complete one cycle of the radial distance. Consequently, for small values of m/h
the path is approximately a Keplerian ellipse, but the axis of the ellipse precesses slightly,
as illustrated below.
[figure: a relativistic orbit whose perihelion precesses substantially on each revolution]
This illustration depicts a much more severe case than could exist for any planet in our
solar system, because the perihelion of the orbit is only 200m where m is the gravitational
radius (in geometrical units) of the central object, which means it is only 100 times the
corresponding "black hole radius". Our Sun's mass is not nearly concentrated enough to
permit this kind of orbit, since the Sun's gravitational radius is only m = 1.475 kilometers,
whereas its matter fills a sphere of radius 696,000 kilometers.
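Equation (6) is easy to integrate numerically. The following sketch (my own illustration, not the author's code; the parameters L = 300m and e = 0.5 are assumptions chosen so the perihelion sits at r = 200m, roughly matching the illustration) steps the equation with a fourth-order Runge-Kutta integrator and measures the angle between successive perihelia:

import math

m = 1.0                    # central mass, geometric units
L = 300.0                  # semilatus rectum, so the perihelion is at r = 200m
e = 0.5                    # eccentricity
h = math.sqrt(m*L)         # angular momentum, using L ~ h^2/m

def accel(u):              # u'' as a function of u, from equation (6)
    return m/h**2 + 3*m*u**2 - u

u, up = (1 + e)/L, 0.0     # start at perihelion: u maximal, du/dphi = 0
phi, dphi, prev_up = 0.0, 1e-4, 0.0
while True:
    k1u, k1v = up, accel(u)
    k2u, k2v = up + 0.5*dphi*k1v, accel(u + 0.5*dphi*k1u)
    k3u, k3v = up + 0.5*dphi*k2v, accel(u + 0.5*dphi*k2u)
    k4u, k4v = up + dphi*k3v, accel(u + dphi*k3u)
    u  += dphi*(k1u + 2*k2u + 2*k3u + k4u)/6
    up += dphi*(k1v + 2*k2v + 2*k3v + k4v)/6
    phi += dphi
    if prev_up > 0 >= up:              # du/dphi turns negative: next perihelion
        phi -= dphi*up/(up - prev_up)  # linear interpolation of the crossing
        break
    prev_up = up

print(f"measured advance per orbit: {phi - 2*math.pi:.5f} rad")
print(f"first-order estimate 6*pi*m/L: {6*math.pi*m/L:.5f} rad")

The measured advance comes out close to the first-order estimate of about 0.063 radians per revolution, which is derived next.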
To determine the relativistic prediction for the orbital precession of the planetary orbits,
we can expand the expression for Ω as follows

Ω = (1 + 6(m/h)²)^(−1/2) = 1 − 3(m/h)² + …

Since m/h is so small, we can take just the first-order term, and noting that one cycle of
the radial function will be completed when Ωφ = 2π, we see that φ must increase by 2π/Ω
for each radial cycle, so the precession per revolution is

2π/Ω − 2π ≈ 6π(m/h)²

We saw above that the semilatus rectum L is approximately h²/m, so the amount of
precession per revolution (for slow moving objects in weak gravitational fields, such as
the planets in our solar system) can be written as simply 6πm/L, where m is the
gravitational radius of the central body. As noted above, the gravitational radius of our
Sun is 1.475 kilometers, so based on the elements of the planetary orbits we can construct
the following table of relativistic precession.
[table: predicted relativistic precession, 6πm/L per revolution, for the planetary orbits]
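As a cross-check on the headline entry (my own computation, not from the text; Mercury's orbital elements are standard published values):

import math

m_sun = 1475.0                       # gravitational radius of the Sun, meters
a, e = 57.91e9, 0.2056               # Mercury's semimajor axis and eccentricity
L = a*(1 - e**2)                     # semilatus rectum, meters
orbits_per_century = 36525/87.969    # Mercury's period is about 88 days

advance = 6*math.pi*m_sun/L          # radians per revolution
arcsec_per_century = advance*(180/math.pi)*3600*orbits_per_century
print(round(arcsec_per_century, 1))  # about 43.0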
The observed precession of 43.1 ± 0.5 arc seconds per century for the planet Mercury is
in close agreement with the theory. We noted in section 5.8 how Einstein proudly
concluded his presentation of the vacuum field equations in his 1916 paper on general
relativity by pointing out that they explained the anomalous precession of Mercury. He
returned to this subject at the end of the paper, giving the precession formula and closing
his masterpiece with the words
Calculation gives for the planet Mercury a rotation of the orbit of 43" per century, corresponding
exactly to the astronomical observation (Leverrier); for the astronomers have discovered in the
motion of the perihelion of this planet, after allowing for disturbances by the other planets, an
inexplicable remainder of this magnitude.

We mentioned previously that the small eccentricities of Venus and Earth make it difficult
to determine their lines of apsides with precision, but modern measurement techniques
(including the use of interplanetary space probes and radar ranging) and computerized
analysis of the data have enabled the fitting of the entire solar system to a parameterized
post-Newtonian (PPN) model that encompasses a fairly wide range of theories (including
general relativity). Once the parameters of this model have been fit to all the available
data for the Sun and planets, the model can then be used to compute the "best
observational fit" for the precessions of the individual planets based on the PPN
formalism. This gives precessions (in excess of the Newtonian predictions) of 43.1, 8.65,
3.85, and 1.36 arcseconds per century for the four inner planets respectively, in
remarkable agreement with the predictions of general relativity.
If we imagine an extremely dense central object, whose mass is concentrated inside its
gravitational radius, we can achieve much greater deviations from conventional
Newtonian orbits. For example, if the precession rate is roughly equal to the orbital rate,
we have an orbit as shown below:

For an orbit with slightly less energy the path looks like this:

where the dotted circle signifies the "light orbit" radius r = 3m. With sufficient angular
momentum it's possible to arrange for persistent timelike orbits periodically descending
down to any radius greater than 3m, which is the smallest possible radius of a circular
orbit (but note that a circular orbit with radius less than 6m is unstable). If a timelike
geodesic ever passes inside that radius it must then spiral in to the central mass, as
illustrated below.

Here the outer dotted circle is at 3m, and the inner circle is at the event horizon, 2m.
Once a worldline has fallen within 2m, whether geodesic or not, its radial coordinate
must (according to the Schwarzschild solution) thereafter decrease monotonically to
zero.
Regarding these spiral solutions there is an ironic historical precedent. A few years
before writing the Principia Newton once described in a letter to Robert Hooke the
descent of an object along a spiral path to the center of a gravitating body. Several years
later, after the Principia had established Newton's reputation, the two men became
engaged in a bitter priority dispute over the discovery of universal gravitation, and Hooke
used this letter as evidence that Newton hadn't understood gravity at that time, because
the classical inverse-square law of gravity permits no such spiral solutions. Newton
replied that it had simply been a "negligent stroke with his pen". Interestingly, although
people sometimes credit Newton with originating the idea of photons based on his
erroneous corpuscular theory of light, it's never been suggested that his "negligent spiral"
was a premonition of the Schwarzschild solution of Einstein's field equations.
Incidentally, the relativistic contribution to a planet's orbital precession rate is often
derived as a "resonance" effect. Recall that the general solution of an ordinary linear
differential equation contains a term proportional to e^{λx} for each root λ of the
characteristic polynomial, and a resonance occurs when the characteristic polynomial has
a repeated root, in which case the solution has a term proportional to x e^{λx}. If there is
another repetition of the root it is represented by a term proportional to x² e^{λx}, and so on.
As a means of approximating the solution of the non-linear equation (6), many authors
introduce a trial solution of the form c₀ + c₁cos(φ) + c₂φ sin(φ), suggesting that the last
term is to be regarded as a resonance, whose effect grows cumulatively over time because
the factor φ is not periodic, and therefore eventually has observable effects over a large
number of orbits, such as the 414 revolutions of Mercury per century. Now, provided
φ(c₂/c₁) is many orders of magnitude smaller than 1, we can use the small-angle
approximations sin(x) ≈ x and cos(x) ≈ 1 to write the solution as

c₀ + c₁cos((1 − c₂/c₁)φ)

where we've used the trigonometric identity cos(x)cos(y) + sin(x)sin(y) = cos(x−y). This
yields the correct result, but the interpretation of it as a resonance effect is misleading,
because the predominant cumulative effect of a resonant term proportional to φ sin(φ) in
the long run is not a precession of the ellipse, but rather an increase in the magnitude of
the radial excursions of a component that is at an angle of π/2 relative to the original
major axis. It just so happens that on the initial cycles this effect causes the overall
perihelion to precess slightly, simply because the phase of the sine component is
beginning to assert itself over the phase of the cosine component. In other words, the
apparent "precession" resulting from the φ sin(φ) term on the initial cycles is really just a
one-time phase shift corresponding to a secular increase in the radial amplitude, and does
not actually represent a change in the frequency of the solution. It can be shown that a
term involving φ sin(φ) appears in the second-order power series expansion of the
solution to equation (6), which explains why it is a useful curve-fitting function for small
φ, but it does not represent a true resonance effect, as shown by the fact that the ultimate
cumulative effect of this term is discarded when we apply the small-angle approximation
to estimate the frequency shift.
6.3 Bending Light
When Lil's husband got demobbed, I said,
I didn't mince my words, I said to her myself,
HURRY UP PLEASE ITS TIME
T. S. Eliot, 1922
At the conclusion of his treatise on Opticks in 1704, the 62 year old Newton lamented
that he could "not now think of taking these things into farther consideration", and
contented himself with proposing a number of queries "in order to a farther search to be
made by others". The very first of these was
Do not Bodies act upon Light at a distance, and by their action bend its Rays, and
is not this action strongest at the least distance?
Superficially this may not seem like a very radical suggestion, because on the basis of the
corpuscular theory of light, and Newton's laws of mechanics and gravitation, it's easy to
conjecture that a beam of light might be deflected slightly as it passes near a large
massive body, assuming particles of light respond to gravitational acceleration similarly
to particles of matter. For any conical orbit of a small test particle in a Newtonian
gravitational field around a central mass m, the eccentricity is given by

ε = √(1 + 2Eh²/m²)

where E = v²/2 − m/r is the total energy (kinetic plus potential), h = r v_t is the angular
momentum, v is the total speed, v_t is the tangential component of the speed, and r is the
radial distance from the center of the mass. Since a beam of light travels at such a high
speed, it will be in a shallow hyperbolic orbit around an ordinary massive object like the
Sun. Letting r₀ denote the closest approach (the perihelion) of the beam to the gravitating
body, at which v = v_t, we have

ε = √(1 + (v² − 2m/r₀) r₀² v² / m²)

Now we set v = 1 (the speed of light in geometric units) at the perihelion, and from the
geometry of the hyperbola we know that the asymptotes make an angle of ψ with the axis
of symmetry, where cos(ψ) = 1/ε.
[figure: hyperbolic path of a light ray past a gravitating mass]

With a hyperbola as shown in the figure above, this implies that the total angular
deflection of the beam of light is δ = 2(π/2 − ψ), which for small angles and for m (in
geometric units) much less than r₀ is given in Newtonian mechanics by

δ = 2m/r₀
The best natural opportunity to observe this deflection would be to look at the stars near
the perimeter of the Sun during a solar eclipse. The mass of the Sun in gravitational units
is about m = 1475 meters, and a beam of light just skimming past the Sun would have a
closest distance equal to the Sun's radius, r₀ = 6.95×10⁸ meters. Therefore, the Newtonian
prediction would be 0.000004245 radians, which equals 0.875 seconds of arc. (There are
2π radians per 360 degrees, each degree representing 60 minutes of arc, and each
minute represents 60 seconds of arc.)
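This arithmetic is a one-liner to verify (my own check, not from the text); the relativistic value of 4m/r₀ quoted here is derived later in this section:

import math

m, r0 = 1475.0, 6.95e8               # Sun's gravitational radius and radius, meters
to_arcsec = (180/math.pi)*3600

print(2*m/r0*to_arcsec)              # Newtonian prediction: about 0.875 arcsec
print(4*m/r0*to_arcsec)              # general relativity: about 1.75 arcsec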
However, there is a problematical aspect to this "Newtonian" prediction, because it's
based on the assumption that particles of light can be accelerated and decelerated just like
ordinary matter, and yet if this were the case, it would be difficult to explain why (in
non-relativistic absolute space and time) all the light that we observe is traveling at a single
characteristic speed. Admittedly if we posit that the rest mass of a particle of light is
extremely small, it might be impossible to interact with such a particle without imparting
to it a very high velocity, but this doesn't explain why all light seems to have precisely the
same velocity, as if this particular speed is somehow a characteristic property of light. As
a result of these concerns, especially as the wave conception of light began to supersede
the corpuscular theory, the idea that gravity might bend light rays was largely discounted
in Newtonian physics. (The same fate befell the idea of black holes, originally proposed
by Michell based on the Newtonian escape velocity for light. Laplace also mentioned the
idea in his Celestial Mechanics, but deleted it in the third edition, possibly because of the
conceptual difficulties discussed here.)
The idea of bending light was revived in Einstein's 1911 paper "On the Influence of
Gravitation on the Propagation of Light". Oddly enough, the quantitative prediction given
in this paper for the amount of deflection of light passing near a large mass was identical

to the old Newtonian prediction, δ = 2m/r₀. There were several attempts to measure the
deflection of starlight passing close by the Sun during solar eclipses to test Einstein's
prediction in the years between 1911 and 1915, but all these attempts were thwarted by
cloudy skies, logistical problems, the First World War, etc. Einstein became very
exasperated over the repeated failures of the experimentalists to gather any useful data,
because he was eager to see his prediction corroborated, which he was certain it would
be. Ironically, if any of those early experimental efforts had succeeded in collecting
useful data, they would have proven Einstein wrong! It wasn't until late in 1915, as he
completed the general theory, that Einstein realized his earlier prediction was incorrect,
and the angular deflection should actually be twice the size he predicted in 1911. Had the
World War not intervened, it's likely that Einstein would never have been able to claim
the bending of light (at twice the Newtonian value) as a prediction of general relativity.
At best he would have been forced to explain, after the fact, why the observed deflection
was actually consistent with the completed general theory. (This would have made it
somewhat similar to the cosmological expansion, which would have been one of the most
magnificent theoretical predictions in the history of science, but the experimentalist
Hubble got there first.) Luckily for Einstein, he corrected the light-bending prediction
before any expeditions succeeded in making useful observations. In 1919, after the war
had ended, scientific expeditions were sent to Sobral in South America and Principe in
West Africa to make observations of the solar eclipse. The reported results were angular
deflections of 1.98 ± 0.16 and 1.61 ± 0.40 seconds of arc, respectively, which was taken
as clear confirmation of general relativity's prediction of 1.75 seconds of arc. This
success, combined with the esoteric appeal of bending light, and the romantic adventure
of the eclipse expeditions themselves, contributed enormously to making Einstein a world
celebrity.
One other intriguing aspect of the story, in retrospect, is the fact that there is serious
doubt about whether the measurement techniques used by the 1919 expeditions were
robust enough to have legitimately detected the deflections which were reported.
Experimentalists must always be wary of the "Ouija board" effect, with their hands on the
instruments, knowing what results they want or expect. This makes it especially
interesting to speculate on what values would have been recorded if they had managed to
take readings in 1914, when the expected deflection was still just 0.875 seconds of arc. (It
should be mentioned that many subsequent observations, summarized below, have
independently confirmed the angular deflection predicted by general relativity, i.e., twice
the "Newtonian" value.)
To determine the relativistic prediction for the bending of light past the Sun, the
conventional approach is to simply evaluate the solution of the four geodesic equations
presented in Chapter 5.2, but this involves a three-dimensional manifold, with a large
number of Christoffel symbols, etc. It's possible to treat the problem more efficiently by
considering it from a two-dimensional standpoint. Recall the Schwarzschild metric in the
usual polar coordinates

(dτ)² = (1 − 2m/r)(dt)² − (dr)²/(1 − 2m/r) − r²(dθ)² − r² sin(θ)²(dφ)²

We'll restrict our attention to a single plane through the center of mass by setting φ = 0,
and since light travels along null paths, we set dτ = 0, which allows us to write the
remaining terms in the form

(dt)² = (dr)²/(1 − 2m/r)² + r²(dθ)²/(1 − 2m/r)      (1)

This can be regarded as the (positive-definite) line element of a two-dimensional surface
(r, θ), with the parameter t serving as the metrical distance. The null paths satisfying the
complete spacetime metric with dτ = 0 are stationary if and only if they are stationary
with respect to (1). This implies Fermat's Principle of least time, i.e., light follows
paths that minimize the integrated time of flight, or, more generally, paths for which the
elapsed Schwarzschild coordinate time is stationary, as discussed in Chapter 3.5.
(Equivalently, we have an angular analog of Fermat's Principle, i.e., light follows paths
that make the angular displacement Δθ stationary, because the coefficients of (1) are
independent of both t and θ.) Therefore, we need only determine the geodesic paths on
this surface. The covariant and contravariant metric tensors are simply

g_rr = 1/(1 − 2m/r)²      g_θθ = r²/(1 − 2m/r)      g^rr = (1 − 2m/r)²      g^θθ = (1 − 2m/r)/r²

and the only non-zero partial derivatives of the components of g are

∂g_rr/∂r = −(4m/r²)/(1 − 2m/r)³      ∂g_θθ/∂r = 2r/(1 − 2m/r) − (2m)/(1 − 2m/r)²

so the non-zero Christoffel symbols are

Γ^r_rr = −(2m/r²)/(1 − 2m/r)      Γ^r_θθ = −(r − 3m)      Γ^θ_rθ = (r − 3m)/(r(r − 2m))

Taking the coordinate time t as the path parameter (since it plays the role of the metrical
distance in this geometry), the two equations for geodesic paths on the (r, θ) surface are

d²r/dt² − [(2m/r²)/(1 − 2m/r)](dr/dt)² − (r − 3m)(dθ/dt)² = 0

d²θ/dt² + [(2/r)(r − 3m)/(r − 2m)](dr/dt)(dθ/dt) = 0      (2)

These equations of motion describe the paths of light rays in a spherically symmetrical
gravitational field. The figure below shows the paths of a set of parallel incoming rays.

The dotted circles indicate radii of m, 2m, ..., 6m from the mass center. Needless to say, a
typical star's physical radius is much greater than its gravitational radius m, so we will
not find such severe deflection of light rays, even for rays grazing the surface of the star.
However, for a "black hole" we can theoretically have rays of light passing at values of r
on the same order of magnitude as m, resulting in the paths shown in this figure.
Interestingly, a significant fraction of the oblique incoming rays are "scattered" back out,
with a loop at r = 3m, which is the "light radius". As a consequence, if we shine a broad
light on a black hole, we would expect to see a "halo" of back-scattered light outlining a
circle with a radius of 3m.
To quantitatively assess the angular deflection of a ray of light passing near a large
gravitating body, note that in terms of the variable u = dθ/dt the second geodesic equation
(2) has the form (1/u)du = −[(2/r)(r − 3m)/(r − 2m)]dr, which can be integrated immediately
to give ln(u) = ln(r − 2m) − 3ln(r) + C, so we have

dθ/dt = K(r − 2m)/r³

To determine the value of K, we divide the metric equation (1) by (dt)² and evaluate it at
the perihelion r = r₀, where dr/dt = 0. This gives

dθ/dt = √(1 − 2m/r₀)/r₀

Substituting into the previous equation we find K² = r₀³/(r₀ − 2m), so we have

(dθ/dt)² = [r₀³/(r₀ − 2m)] (r − 2m)²/r⁶

Now we can substitute this into the metric equation divided by (dt)² and solve for dr/dt to
give

dr/dt = (1 − 2m/r) √(1 − [r₀³/(r₀ − 2m)](r − 2m)/r³)

Dividing dθ/dt by dr/dt then gives

dθ/dr = (1/r²)√(r₀³/(r₀ − 2m)) / √(1 − [r₀³/(r₀ − 2m)](r − 2m)/r³)

Integrating this from r = r₀ to ∞ gives the mass-centered angle swept out by a photon as it
moves from the perihelion out to an infinite distance. If we define ρ = r₀/r the above
equation can be written in the form

dθ = dρ / √((1 − 2m/r₀) − ρ²(1 − 2mρ/r₀))
The magnitude of the second term in the right-hand square root is always less than 1
provided r₀ is greater than 3m (which is the radius of light-like circular orbits, as
discussed further in Section 6.5), so we can expand the square root into a power series in
that quantity. The result is

dθ = [1/√(1 − ρ²)] [1 + (m/r₀)(1 − ρ³)/(1 − ρ²) + (3/2)(m/r₀)²(1 − ρ³)²/(1 − ρ²)² + …] dρ

This can be analytically integrated term by term. The integral of the first term is just π/2,
as we would expect, since with a mass of m = 0 the photon would travel in a straight line,
sweeping out a right angle as it moves from the perihelion to infinity. The remaining
terms supply the excess angle, which represents the angular deflection of the light ray.
If m/r₀ is small, only the first-order term is significant. Of course, the path of light is
symmetrical about the perihelion, so the total angular deflection between the asymptotes
of the incoming and outgoing rays is twice the excess of the above integral beyond π/2.
Focusing on just the first order term, the deflection is therefore

δ = (2m/r₀) ∫₀¹ (1 − ρ³)/(1 − ρ²)^{3/2} dρ

Evaluating the integral from ρ = 0 to 1 gives the constant factor 2, so the first-order
deflection is δ = 4m/r₀. This
gives the relativistic value of 1.75 seconds of arc, which is twice the Newtonian value. To
higher orders in m/r₀ we have

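As a numerical cross-check (my own, not from the text), we can integrate the exact expression for dθ from ρ = 0 to 1, double the excess over π/2, and compare with 4m/r₀ for a ray grazing the Sun; the substitution ρ = sin(α) is mine, used to remove the integrable endpoint singularity:

import math

m, r0 = 1475.0, 6.95e8          # Sun's gravitational radius and radius, meters

def integrand(alpha):
    rho = math.sin(alpha)
    radicand = (1 - 2*m/r0) - rho**2*(1 - 2*m*rho/r0)
    return math.cos(alpha)/math.sqrt(radicand)

n = 200_000
h = (math.pi/2)/n
sweep = sum(integrand((i + 0.5)*h) for i in range(n))*h   # midpoint rule

deflection = 2*sweep - math.pi                            # total bend, radians
to_arcsec = (180/math.pi)*3600
print(f"exact integral: {deflection*to_arcsec:.4f} arcsec")
print(f"4m/r0:          {4*m/r0*to_arcsec:.4f} arcsec")   # both about 1.75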
The difficulty of performing precise measurement of optical starlight deflection during an
eclipse can be gathered from the following list of results:

[table: reported eclipse measurements of the deflection at the solar limb]
Fortunately, much more accurate measurements can now be made in the radio
wavelengths, especially of quasars, since such measurements can be made from
observatories with the best equipment and careful preparation (rather than hurriedly in a
remote location during a total eclipse). In particular, the use of Very Long Baseline
Interferometry (VLBI), combining signals from widely separate observatories, gives a
tremendous improvement in resolving power. With these techniques it's now possible to
precisely measure the deflection (due to the Sun's gravitational field) of electromagnetic
waves from stars at great angular distances from the Sun. According to Will, an analysis
in 2004 of over 2 million VLBI observations has shown that the ratio of the actual
observed deflections to the deflections predicted by general relativity is 0.99992 ±
0.00023. Thus the dramatic announcement of 1919 has been retroactively justified.
The first news of the results of Eddington's expedition reached Einstein by way of
Lorentz, who on September 22 sent the telegram quoted at the beginning of this chapter.
On the 7th of October Lorentz followed with a letter, providing details of Eddington's
presentation to the British Association at Bournemouth. Oddly enough, at this meeting
Eddington reported that one can say with certainty that the effect (at the solar limb) lies
between 0.87″ and 1.74″, although he qualified this by saying the plates had been
measured only preliminarily, and the final value was still to be determined. In any case,
Lorentz's letter also included a rough analysis of the amount of deflection that would be
expected due to ordinary refraction in the gas surrounding the Sun. His calculations
indicated that a suitably chosen gas density at the Sun's surface could indeed produce a
deflection on the order of 1″, but for any realistic density profile the effect would drop off
very rapidly for rays passing just slightly further from the Sun. Thus the effect of
refraction, if there was any, would be easily distinguishable from the relativistic effect.
He concluded
We may surely believe (in view of the magnitude of the detected deflection) that,
in reality, refraction is not involved at all, and your effect alone has been
observed. This is certainly one of the finest results that science has ever
accomplished, and we may be very pleased about it.

6.4 Radial Paths in a Spherically Symmetrical Field


It is no longer clear which way is up even if one wants to rise.
David Riesman, 1950
In this section we consider the simple spacetime trajectory of an object moving radially
with respect to a spherical mass. As we've seen, according to general relativity the metric
of spacetime in the region surrounding an isolated spherical mass m is given by

(dτ)² = (1 − 2m/r)(dt)² − (dr)²/(1 − 2m/r) − r²(dθ)² − r² sin(θ)²(dφ)²      (1)

where t is the time coordinate, r is the radial coordinate, and the angles θ and φ are the
usual angles for polar coordinates. Since we're interested in purely radial motions the
differentials of the angles dθ and dφ are zero, and we're left with a 2-dimensional surface
with the coordinates t and r, with the metric

(dτ)² = (1 − 2m/r)(dt)² − (dr)²/(1 − 2m/r)

This formula tells us how to compute the absolute lapse of proper time dτ along a given
path corresponding to the coordinate increments dt and dr. The metric tensor on this
2-dimensional space is given by the diagonal matrix with components

g_tt = 1 − 2m/r      g_rr = −1/(1 − 2m/r)

which has determinant g = −1. The inverse of the covariant tensor g_μν is the contravariant
tensor with components

g^tt = 1/(1 − 2m/r)      g^rr = −(1 − 2m/r)
In order to make use of index notation, we define x¹ = t and x² = r. Then the equations for
the geodesic paths on any surface can be expressed as

d²xⁱ/dλ² + Γⁱ_jk (dxʲ/dλ)(dxᵏ/dλ) = 0      (2)

where summation is implied over any indices that are repeated in a given product, and Γⁱ_jk
denotes the Christoffel symbols. Note that the index i can be either 1 or 2, so the above
expression actually represents two differential equations involving the 1st and 2nd
derivatives of our coordinates x¹ and x² (which, remember, are just t and r) with respect to
the "affine parameter" λ. This parameter just represents the normalized "distance" along
the path, so it's proportional to the proper time for timelike paths.
The Christoffel symbol is defined in terms of the partial derivatives of the components of
the metric tensor as follows

Γⁱ_jk = (1/2) gⁱᵃ (∂g_aj/∂xᵏ + ∂g_ak/∂xʲ − ∂g_jk/∂xᵃ)
Taking the partials of the components of our g_μν with respect to t and r we find that they
are all zero, with the exception of

∂g_tt/∂r = 2m/r²      ∂g_rr/∂r = (2m/r²)/(1 − 2m/r)²

Combining this with the fact that the only non-zero components of the inverse metric
tensor g^μν are g¹¹ and g²², we find that the only non-zero Christoffel symbols are

Γ¹₁₂ = Γ¹₂₁ = (m/r²)/(1 − 2m/r)      Γ²₁₁ = (m/r²)(1 − 2m/r)      Γ²₂₂ = −(m/r²)/(1 − 2m/r)

So, substituting these expressions into the geodesic formula (2), and reverting back to the
symbols t and r for our coordinates, we have the two ordinary differential equations for
the geodesic paths on the surface

d²t/dλ² + [(2m/r²)/(1 − 2m/r)](dt/dλ)(dr/dλ) = 0

d²r/dλ² + (m/r²)(1 − 2m/r)(dt/dλ)² − [(m/r²)/(1 − 2m/r)](dr/dλ)² = 0      (3)
These equations can be integrated in closed form, although the result is somewhat messy.

They can also be directly integrated numerically using small incremental steps of "dλ" for
any initial position and trajectory. This allows us to easily generate geodesic paths in
terms of r as a function of t. If we do this, we will notice that the paths invariably go to
infinite t as r approaches 2m. Is our 2-dimensional surface actually singular at r = 2m, or
are the coordinates simply ill-behaved (like longitude at the North pole)?
As we saw above, the surface has an invariant Gaussian curvature at each point. Let's
determine the curvature to see if anything strange occurs at r = 2m. The curvature can be
computed in terms of the components of the metric tensor and their first and second
partial derivatives. The non-zero first derivatives for our surface (and the determinant g =
−1) were noted above. The only non-zero second derivatives are

∂²g_tt/∂r² = −4m/r³      ∂²g_rr/∂r² = −(4m/r³)/(1 − 2m/r)³

So we can compute the intrinsic curvature of our surface using Gauss's formula for the
curvature invariant K of a two-dimensional surface given in the section on Curvature.
Inserting the metric components and derivatives for our surface into that equation gives
the intrinsic curvature

K = 2m/r³

Therefore, at r = 2m the curvature of this surface is 1/(4m²), which is certainly finite
(and in fact can be made arbitrarily small for sufficiently large m). The only singularity in
the intrinsic curvature of the surface occurs at r = 0.
In order to plot r as a function of the proper time τ we would like to eliminate t from the
two equations. To do this, notice that if we define T = dt/dλ the first equation can be
written in the form

dT/dλ + [(2m/r²)/(1 − 2m/r)](dr/dλ) T = 0      (4)

which is just an ordinary first-order differential equation in T with variable coefficients.
Recall that the solution of any equation of the form

dT/dλ + f(r)(dr/dλ) T = 0

is given by

T = k e^{−w}

where k is a constant of integration and w = ∫f(r)dr. Thus the solution of (4) is

T = k exp(−∫ [(2m/r²)/(1 − 2m/r)] dr)

The integral in the exponential is just ln(r − 2m) − ln(r), so the result is

dt/dλ = T = k/(1 − 2m/r)
Let's suppose our test particle is initially stationary at r = R and then allowed to fall
freely. Thus the point r = R is the "apogee" of the radial orbit. Our affine parameter λ is
proportional to the proper time along a path, and the value we assign to "k" determines
the scale factor between λ and τ. From the original metric equation (1) we know that at
the apogee (where dr/dτ = 0) we have

(dτ/dt)² = 1 − 2m/R

Multiplying this by the square of the previous derivative dt/dλ at r = R gives

(dτ/dλ)² = k²/(1 − 2m/R)

Thus in order to scale our affine parameter λ to the proper time τ for this radial orbit we
need to set k = √(1 − 2m/R), and so

dt/dλ = √(1 − 2m/R)/(1 − 2m/r)

(Notice that this implies the initial value of dt/dλ at the apogee is 1/√(1 − 2m/R), and of
course dr/dλ at that point is 0.) Substituting this into the 2nd geodesic equation (3) gives
a single equation relating the radial parameter r and the affine parameter λ, which we
have made equivalent to the proper time τ, so we have

d²r/dτ² = −[(m/r²)/(1 − 2m/r)] [(1 − 2m/R) − (dr/dτ)²]      (5)

At the apogee r = R, where dr/dτ = 0, this reduces to

d²r/dτ² = −m/R²

This is a measure of the acceleration of a static test particle at the radial parameter r.
More generally, we can use equation (5) to numerically integrate the geodesic path from
any given initial trajectory, and it confirms that the radial coordinate passes smoothly
through r = 2m as a function of the proper time τ. This may seem surprising at first,
because the denominator of the leading factor contains (r − 2m), so it might appear that
the second derivative of r with respect to proper time "blows up" at r = 2m. However,
remarkably, the square of dr/dτ is invariably forced to 1 − 2m/R precisely at r = 2m, so
the quantity in the square brackets goes to zero, canceling the zero in the denominator.
Interestingly, equation (5) has the same closed-form solution as does radial free-fall in
Newtonian mechanics (if τ is identified with Newton's absolute time). The solution can
be expressed in terms of the parameter θ by the "cycloid relations"

r = (R/2)(1 + cos θ)      τ = (R/2)√(R/(2m)) (θ + sin θ)

The coordinate time t can also be given explicitly in terms of θ by the formula

t = 2m ln[(Q + tan(θ/2))/(Q − tan(θ/2))] + 2mQ [θ + (R/(4m))(θ + sin θ)]

where Q = √(R/(2m) − 1). A typical timelike radial orbit is illustrated below.

[figure: a radial free-fall worldline plotted in Schwarzschild coordinates]
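To make the cycloid relations concrete, here is a brief sketch (my own, not from the text; the release radius R = 10m is an arbitrary choice) that samples them and checks the first integral (dr/dτ)² = 2m/r − 2m/R of equation (5):

import math

m, R = 1.0, 10.0                    # geometric units; release from rest at r = 10m
for theta in (0.5, 1.5, 2.5):
    r   = (R/2)*(1 + math.cos(theta))
    tau = (R/2)*math.sqrt(R/(2*m))*(theta + math.sin(theta))
    # dr/dtau = (dr/dtheta)/(dtau/dtheta)
    drdtau = -(R/2)*math.sin(theta) / ((R/2)*math.sqrt(R/(2*m))*(1 + math.cos(theta)))
    print(f"theta={theta}: r={r:.3f}, tau={tau:.3f}, "
          f"(dr/dtau)^2={drdtau**2:.6f}, 2m/r-2m/R={2*m/r - 2*m/R:.6f}")
# the last two columns agree, confirming the energy relation for radial fall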
6.5 Intersecting Orbits


Time is the longest distance between two places.
Tennessee Williams, 1945
The lapse of proper time for moving clocks in a gravitational field is often computed by
splitting the problem into separate components, one to account for the velocity effect in
accord with special relativity, and another to account for the gravitational effect in accord
with general relativity. However, the general theory subsumes the special theory, and it's
often easier to treat such problems holistically from a purely general relativistic
standpoint. (The persistent tendency to artificially bifurcate problems into "special" and
"general" components is partly due to the historical accident that Einstein arrived at the
final theory in two stages.) In the vicinity of an isolated non-rotating spherical body
whose Schwarzschild radius is 2m the metric has the form

$$(d\tau)^2 = \left(1 - \frac{2m}{r}\right)(dt)^2 - \left(1 - \frac{2m}{r}\right)^{-1}(dr)^2 - r^2(d\theta)^2 - r^2\sin^2\theta\,(d\phi)^2 \qquad (1)$$

where φ = longitude and θ = latitude (e.g., θ = 0 at the North Pole and θ = π/2 at the
equator). Let's say our radial position r and our latitude θ are constant for each path in
question (treating r as the "radius" in the weak field approximation). Then the coefficients
of (dt)² and (dφ)² are both constants, and the metric reduces to

$$(d\tau)^2 = \left(1 - \frac{2m}{r}\right)(dt)^2 - r^2\sin^2\theta\,(d\phi)^2$$

If we're sitting on the Earth's surface at the North Pole, we have sin(θ) = 0, so it follows
that

$$d\tau = \sqrt{1 - \frac{2m}{r}}\;dt$$

where r is the radius of the Earth. On the other hand, in an equatorial orbit of radius
r = R we have θ = π/2, sin²(θ) = 1, and so the coefficient of (dφ)² is simply R². Now,
recall Kepler's law ω²R³ = m, which also happens to hold exactly in GR (provided that R
is interpreted as the radial Schwarzschild coordinate and ω is defined with respect to
Schwarzschild coordinate time). Since ω = dφ/dt we have R²ω² = m/R, and hence
R²(dφ)² = (m/R)(dt)². Thus the path of the orbiting particle satisfies

$$(d\tau)^2 = \left(1 - \frac{2m}{R}\right)(dt)^2 - \frac{m}{R}(dt)^2 = \left(1 - \frac{3m}{R}\right)(dt)^2 \qquad (2)$$

Now for each test particle, one sitting at the North Pole and one in a circular orbit of
radius R, the path parameter s is the local proper time, so the ratio of the orbital proper
time to the North Pole's proper time is

$$\frac{d\tau_{orbit}}{d\tau_{pole}} = \sqrt{\frac{1 - 3m/R}{1 - 2m/r}}$$

To isolate the difference in the two proper times, we can expand the above function into a
power series in m/r to give

$$\frac{d\tau_{orbit}}{d\tau_{pole}} \approx 1 + \frac{m}{r} - \frac{3}{2}\frac{m}{R}$$

The mass of the earth, represented in geometrical units by half the Schwarzschild radius,
is about 0.00443 meters, and the radius of the earth is about 6.38×10⁶ meters, so this
gives

$$\frac{d\tau_{orbit}}{d\tau_{pole}} \approx 1 + \left(6.94\times 10^{-10}\right)\left(1 - \frac{3}{2}\frac{r}{R}\right)$$

which shows that the discrepancy in the orbit's lapse of proper time during a given lapse
T of proper time measured on Earth is

$$\Delta\tau \approx \left(6.94\times 10^{-10}\right)\left(1 - \frac{3}{2}\frac{r}{R}\right)T$$
Consequently, for an orbit at the radius R = 3r/2 (about 2000 miles up) there is no
difference in the lapses of proper time. Thus, if someone wants to get a null result, that
would be their best choice. For orbits lower than 3r/2 the satellite will show slightly less
lapse of proper time (i.e., the above discrepancy will be negative), whereas for higher
orbits it will show slightly more elapsed time than the corresponding interval at the North
Pole.
For example, in a low Earth orbit of, say, 360 miles, we have r/R = 0.917, so the proper
time runs about 22.5 microseconds per day slower than a clock at the North Pole. On the
other hand, for a 22,000 mile orbit we have r/R = 0.18, and so the orbit's lapse of proper
time actually exceeds the corresponding lapse of proper time at the North Pole by about
43.7 microseconds per day. Of course, as R continues to increase the orbital velocity
drops to zero and we are left with just coordinate time for the orbit, relative to which the
North Pole on Earth is "running slow" by about 60 microseconds per day, due entirely to
the gravitational potential of the earth. (This means that during a typical human life span
the Earth's gravity stretches out our lives to cover an extra 1.57 seconds of coordinate
time.)
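These figures follow directly from the first-order discrepancy formula above. A quick sketch
(plain Python; the r/R values are the ones quoted in the text):

```python
m_over_r = 0.00443/6.38e6    # Earth's mass in meters / Earth's radius in meters
day = 86400.0                # seconds per day

def orbit_minus_pole(r_over_R):
    """Daily proper-time gain of a circular orbit of radius R relative to
    a North Pole clock, to first order in m/r."""
    return m_over_r*(1 - 1.5*r_over_R)*day

for r_over_R in (0.917, 2/3, 0.18, 0.0):
    print(f"r/R = {r_over_R:5.3f}: {orbit_minus_pole(r_over_R)*1e6:+6.1f} microseconds/day")
```

The r/R = 2/3 case reproduces the null result at R = 3r/2, and the r/R approaching 0 limit
gives the 60 microseconds per day of purely gravitational slowing.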
Incidentally, equation (2) goes to zero when the orbit radius R equals 3m, consistent with
the fact that 3m is the radius of the orbit of light. This suggests that even if something
prevented a massive object from collapsing within its Schwarzschild radius 2m, it would
still be a very remarkable object if it was just within 3m, because then it could
(theoretically) support circular light orbits, although I don't believe such orbits would be
stable (even neglecting interference from infalling matter). If neutrinos are massless there
could also be neutrinos in (unstable) orbits at r = 3m near such an object.
The results of this and the previous section can be used to clarify the so-called twins
paradox. In some treatments of special relativity the difference between the elapsed
proper times along different paths between two fixed events is attributed to a difference
in the locally "felt" accelerations along those paths. In other words, the asymmetry in the
proper times is "explained" by the asymmetry in local accelerations. However, this
explanation fails in the context of general relativity and gravity, because there are
generally multiple free-fall (i.e., locally unaccelerated) paths of different proper lengths
connecting two fixed events. This occurs, for example, with any two intersecting orbits
with different eccentricities, provided they are arranged so that the clocks coincide at two
intersections.
To illustrate, consider the intersections between a circular and a purely radial orbit in
the gravitational field of a spherically symmetrical mass m. One clock follows a perfectly
circular orbit of radius r, while the other follows a purely radial (up and down) trajectory,
beginning at a height r, climbing to R, and falling back to r, as shown below.

We can arrange for the two clocks to initially coincide, and for the first clock to complete
n circular orbits in the same (coordinate) time it takes for the second clock to rise and fall.
Thus the objects coincide at two fixed events, and they are each in free-fall continuously
in between those two events. Nevertheless, we will see that the elapsed proper times for
these two objects are not the same.
Throughout this example, we will use dimensionless times and distances by dividing each
quantity by the mass m in geometric units. For a circular orbit of radius r in
Schwarzschild spacetime, Kepler's third law gives the proper time to complete n
revolutions as

$$\tau_{circ} = 2\pi n\,r\sqrt{r - 3}$$

Applying the constant ratio of proper time to coordinate time for a circular orbit, we also
have the coordinate time to complete n revolutions

$$t_{circ} = 2\pi n\,r^{3/2}$$

For the radially moving object, the usual parametric cycloid relation gives the total proper
time for the rise and fall

$$\tau_{radial} = 2q^{3/2}\left(\alpha + \sin\alpha\right)$$

where the parameter α satisfies the relation

$$r = q\left(1 + \cos\alpha\right) \qquad q = \frac{R}{2}$$

The total elapsed coordinate time for the radial object is

$$t_{radial} = 4\ln\left|\frac{Q + \tan(\alpha/2)}{Q - \tan(\alpha/2)}\right| + 4Q\left[\alpha + \frac{q}{2}\left(\alpha + \sin\alpha\right)\right]$$

where

$$Q = \sqrt{q - 1}$$

In order for the objects to coincide at the two events, the coordinate times must be equal,
i.e., we must have tcirc = tradial. Therefore, replacing r with q(1 + cos(α)) in the expression
for the coordinate time in circular orbits, we find that for any given n and q (= R/2) the
parameter α must satisfy

$$2\pi n\left[q\left(1 + \cos\alpha\right)\right]^{3/2} = 4\ln\left|\frac{Q + \tan(\alpha/2)}{Q - \tan(\alpha/2)}\right| + 4Q\left[\alpha + \frac{q}{2}\left(\alpha + \sin\alpha\right)\right] \qquad (3)$$

Once we've determined the value of α for a given q and n, we can then determine the
ratio of the elapsed proper times for the two paths from the relation

$$\frac{\tau_{circ}}{\tau_{radial}} = \frac{2\pi n\,r\sqrt{r - 3}}{2q^{3/2}\left(\alpha + \sin\alpha\right)}$$
With n = 1 and fairly small values of r, the ratio of proper times behaves as shown below.

Not surprisingly, the ratio goes to infinity as r drops to 3, because the proper time for a
circular orbit of radius 3m is zero. (Recall that the "r" in our equations signifies r/m in
normal geometrical units.) The parameters and proper time ratios for some larger values
of r with n = 1 are tabulated below.
To determine the asymptotic behavior we can substitute 1/u for the variable q in the
equation expressing the relation between q and α, and then expand into a series in u to
give

Now for any given n let α_n be defined such that

$$\alpha_n + \sin\alpha_n = \pi n\left(1 + \cos\alpha_n\right)^{3/2} \qquad (5)$$

For large values of r the values of α will be quite close to α_n, because the ratio of proper
times for the two free-falling clocks is close to 1. Thus we can put α = α_n + dα in
equation (3) and expand into a series in dα to give

To determine the asymptotic dα as a function of R and n we can put α = α_n + dα in
equation (4) and expand into a series in dα to give

where

For sufficiently large R the value of B_n is negligible, so we have

Inserting this into (6) and recalling that 2/R is essentially equal to [1 + cos(α_n)]/r, since
α is nearly equal to α_n, we arrive at the result

where

So, for any given n, we can solve (5) for α_n and substitute into the above equation to
give k_n, and then the ratio of proper times for two free-falling clocks, one moving
radially from r to R and back to r while the other completes n circular orbits at radius r,
is given (for any value of r much greater than the mass m of the gravitating body) by
equation (7). The values of α_n, k_n, and R/r for several values of n are listed below.

As an example, consider a clock in a circular orbit at 360 miles above the Earth's surface.
In this case the radius of the orbit is about (6.957)×10⁶ meters. Since the mass of the
Earth in geometrical units is 0.00443 meters, we have the normalized radius
r = (1.57053)×10⁹, and the total time of one orbit is approximately 5775 seconds (i.e.,
about 1.604 hours). In order for a radial trajectory to begin and end at this altitude and
have the same elapsed coordinate time as one circular orbit at this altitude, the radial
trajectory must extend up to R = (1.55)×10⁷ meters, which is about 5698 miles above the
Earth's surface. Taking the value of k_1 from the table, we have

and so the difference in elapsed proper times is given by

This is the amount by which the elapsed time on the radial (up-down) path would exceed
the elapsed time on the circular path.
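The 5775-second orbital period quoted above can be checked with a few lines of Python,
using Kepler's third law in geometric units (the Earth's mass expressed in seconds):

```python
import math

m = 1.4766e-11            # Earth's mass in geometric units of seconds
r = 6.957e6/2.998e8       # orbit radius of (6.957)x10^6 meters, in seconds

# Kepler's third law, exact in Schwarzschild coordinates: omega^2 r^3 = m
omega = math.sqrt(m/r**3)            # radians per second of coordinate time
print(2*math.pi/omega)               # ~5.78e3 s, matching the ~5775 s above
```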
6.6 Ideal Clocks in Arbitrary Motion
What is a clock? By a clock we understand any thing characterized by a
phenomenon passing periodically through identical phases so that we must
assume, by the principle of sufficient reason, that all that happens in a
given period is identical with all that happens in any arbitrary period.
Albert Einstein, 1910
In his 1905 paper on the electrodynamics of moving bodies, Einstein noted that the
Lorentz transformation has a peculiar consequence, namely, the elapsed time on an
ideal clock as it proceeds from one given event to another depends on the path followed
by that clock between those two events. The maximum elapsed time between two given
events (in flat spacetime) applies to a clock that proceeds inertially between those events,
whereas clocks that have followed any other path will undergo a lesser elapsed time. He
expressed this as follows:

If at the points A and B there are stationary clocks which, viewed in the resting
system, are synchronous; and if the clock at A is moved with the velocity v along
the line AB to B, then on its arrival at B the two clocks no longer synchronize, but
the clock moved from A to B lags behind the other which has remained at B... It
is at once apparent that this result still holds good if the clock moves from A to B
in any polygonal line, and also when the points A and B coincide. If we assume
that the result proved for a polygonal line is also valid for a continuously curved
line, we obtain the theorem: If one of two synchronous clocks at A is moved in a
closed curve with constant velocity until it returns to A... then the clock that
moved runs slower than the one that remained at rest. Thus we conclude that a
balance-clock at the equator must go more slowly than a precisely similar clock
situated at one of the poles under otherwise identical conditions.

The qualifying words "under otherwise identical conditions", as well as the context, make
it clear that the clocks are to be situated at the same gravitational potential, which of
course will not be the case if they are both located at sea level (because the Earth's
rotation causes it to bulge at the equator by just the amount necessary to cause clocks at
sea level to run at the same rate). This complication has sometimes caused people to
claim that Einstein's assertion about polar and equatorial clocks was in error, but at worst
it just unnecessarily introduced an extraneous factor.
A more serious point of criticism of the above passage was partially addressed by a
footnote added by Sommerfeld to the 1913 re-printing of Einstein's paper. This pertains
to the term "balance clock", about which Sommerfeld said "Not a pendulum clock, which
is physically a system to which the earth belongs. This case had to be excluded." This
reinforces the point that we are to exclude any differential effects of the earth's
gravitation, but it leaves unanswered the deeper question of what precisely constitutes a
suitable clock for purposes of quantifying the elapsed proper time along any path.
Some critics have claimed that Einstein's assertion about time dilation involves circular
reasoning, arguing that if any particular clock (or physical process) should fail to conform
to the assertion, it would simply be deemed an unsuitable clock. Of course, ultimately all
physical assertions involve this kind of circularity of definition, but the value of an
assertion and definition depends not on its truth but on its applicability. If no physical
phenomena were found to conform to the definition of proper time, then the assertion
would indeed be worthless, but experience shows that the advance of the quantum wave
function of any physical system moving from the event with coordinates x,y,z,t (in terms
of an inertial coordinate system) to the event x+dx, y+dy, z+dz, t+dt is invariably in
proportion to dτ where

$$(d\tau)^2 = (dt)^2 - (dx)^2 - (dy)^2 - (dz)^2 \qquad (1)$$
Nevertheless, it can be argued that Einstein was not in a position to know this in 1905,
because observations of the decay rates of sub-atomic particles (for example) under
conditions of extreme acceleration had not yet been made. Miller has commented that
Einstein's extension to the case where [the clock's] trajectory was a continuous curve
"was unwarranted in 1905", but perhaps he considered that this case could always be
treated as the limiting case of a many-sided polygon. It should be noted, though, that
Einstein carefully prefaced this extension with the words "if we assume", so he can
hardly be accused of smuggling. Also, as many others have pointed out, this
assumption (the so-called "clock hypothesis") can simply be taken as the definition of
an ideal clock, and we are quite justified in expecting any real system with a periodic
process to conform to this definition provided the restoring forces involved in the process
are much greater than the inertial forces due to the acceleration of the overall system.
Whether this kind of mechanistic assessment can be applied to the decay rates of
sub-atomic particles is less clear. If for some extreme acceleration the decay rates of
sub-atomic particles were found to differ from the dτ given by (1), would we conclude
that the Minkowski structure of spacetime was falsified, or that we had reached a level of
acceleration that affects the decay process? Presumably if (1) broke down at the same
point for a wide variety of processes, we would interpret this as the failure of Lorentz
covariance, but if various processes begin to violate (1) at different levels of acceleration,
we would be more likely to interpret those violations as being characteristics of the
respective processes.

From the rationalist point of view, proper time can be conceived as independent of
acceleration precisely because we can sense acceleration and correct for its effect, just as
we can sense and correct for temperature, pressure, humidity, and so on. In contrast, we
cannot sense velocity in any intrinsic way, so a purely local intrinsic clock cannot be
corrected for velocity. Our notion of "true" time seems to be based on the idea of a
characteristic periodic process under standard reference conditions, and then any
intrinsically sensible changes in conditions are abstracted away. But even this notion
involves idealizations, because (for example) there do not appear to be any perfectly
periodic isolated processes. An ordinary clock is not in exactly the same state after each
cycle of the escapement mechanism, because the driving spring has slightly relaxed. We
regard the clock as essentially periodic because of the demonstrated insensitivity of the
periodic components to the secular changes in the non-periodic components.
It's possible to conceive of "paradoxical" clocks, such as a container of cooled gas,
whose gradual increase in temperature (up to the ambient temperature) is used to indicate
the passage of time. If we have two such containers, initially cooled to the same
temperature, and then send one on a high speed journey in a spaceship with the same
ambient temperature, we expect to find that the traveling container will be cooler than the
stationary container when they are re-united. Furthermore, if the gas consisted of
radioactive particles, we expect less decay in the gas in the traveling container. However,
this applies only because we accelerated the gas molecules coherently. Another way of
increasing the velocities of the molecules in a container is by applying heat from a
separate source. Obviously this has the effect of speeding up the time as indicated by the
temperature rise, but it slows down the radioactive decay of those molecules. This is just
a simple illustration of how the rate of progression of a macroscopic system toward
thermodynamic equilibrium may be affected in the opposite sense from the rate of quantum
decay of the elementary particles comprising that system. The key to maintaining a
consistent proper time for macroscopic as well as microscopic processes seems to be
coherent acceleration (work) as opposed to incoherent acceleration (heat).
In the preceding sections we've looked at circular and radial free-falling paths in a
spherically symmetrical gravitational field, but many circumstances involve more
complicated paths, including acceleration. For example, suppose we place highly
accurate cesium clocks in an airplane and fly it around in a circle with a 100 mile radius
above an airport on the equator. Assume that for the duration of the experiment the Earth
has uniform translational velocity and rotates once per 24 hours. In terms of an inertial
coordinate system whose origin is at the center of the Earth, the coordinates of the plane
are

where R is the radius of the Earth plus the height of the airplane above the Earth's
surface, W is the Earth's rotational speed, r is the radius of the circular flight path, and w
is the airplane's angular speed. Differentiating these inertial coordinates with respect to
the coordinate time t gives expressions for dx/dt, dy/dt, and dz/dt. Now, the proper time τ
of the clock is given by the integral of dτ over its worldline. Neglecting (for the moment)
the effect of the Earth's gravitational field, we have

$$(d\tau)^2 = (dt)^2 - (dx)^2 - (dy)^2 - (dz)^2$$

so we can divide through by (dt)² and take the square root to give

$$\frac{d\tau}{dt} = \sqrt{1 - \left(\frac{dx}{dt}\right)^2 - \left(\frac{dy}{dt}\right)^2 - \left(\frac{dz}{dt}\right)^2}$$

Therefore, if we let V and v denote the speeds RW and rw respectively, the elapsed
proper time for the clock corresponding to T of inertial coordinate time is given exactly
by the integral
Since all the dimensionless parameters V, v, and rW are extremely small compared to 1,
we can approximate the square root very closely using √(1 − u) ≈ 1 − u/2, which gives an
easily integrable expression.

Subtracting the result from T gives the amount of dilation for the path in question. The
result is

Only the first term on the right is multiplied by T, so it represents the secular
contributions to the time dilation, i.e., the parts that grow in proportion to the total
elapsed time, whereas the other two terms are cyclical and don't accumulate as T
increases. Not surprisingly, if we set v = r = 0 the amount of dilation is simply V²T/2,
which is the dilation for the fixed point at the airplane's height above the equator, due
entirely to the Earth's rotation. On the other hand, if we take the following values

we find that the clock fixed at a point on the equator runs slow by 100.69 nsec per 24
hours relative to our Earth-centered inertial coordinate system, whereas a clock going
around in a circle of radius 100 miles at 600 mph would lose 134.99 nsec per 24 hours
(neglecting the cyclical components).
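For a rough check on the secular parts of these figures, note that the dominant contributions
are just V²/2 and v²/2 (the cross terms are cyclical and average out). A small sketch (plain
Python; the value of V is the one used in the text, and the speed-of-light and mph conversion
constants are approximate):

```python
c, day = 2.998e8, 86400.0       # m/s, seconds per day
V = 1.527e-6                    # equatorial rotation speed, fraction of c
v = 600*0.44704/c               # 600 mph flight speed, fraction of c

# secular dilation per day: (V^2/2 + v^2/2)*T
print(0.5*V**2*day*1e9)             # ~100.7 ns: fixed equatorial point
print(0.5*(V**2 + v**2)*day*1e9)    # ~135 ns: airplane circling at 600 mph
```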
Another experiment that could be performed is to fly clocks completely around the
Earth's equator in opposite directions, so the eastbound clock's flight speed (relative to
the ground) would be added to the circumferential speed of the Earth's surface due to the
Earth's rotation, whereas the westbound clock's flight speed would be subtracted. In this
situation the spatial coordinates of the clocks in the equatorial plane would be given by

where we take the + sign for the eastbound plane and the − sign for the westbound plane.
This gives the derivatives

Substituting into equation (1) and simplifying gives

$$\frac{d\tau}{dt} = \sqrt{1 - (V \pm v)^2}$$

Multiplying through by dt and integrating from t = 0 to some arbitrary coordinate time t,
we find that the corresponding lapse of proper time for the plane is

$$\tau = t\left[1 - \frac{(V \pm v)^2}{2}\right] \qquad (2)$$

It follows that the lapse of time on the westbound clock by any coordinate time t will
exceed the lapse of time on the eastbound clock by 2tVv.
To this point we have neglected the gravitational field of the Earth by assuming that the
metric of spacetime was the flat Minkowski metric. To account for the effects of gravity
we should really use the Schwarzschild metric (assuming a spherical Earth). We saw in
Section 6.4 that the metric in the equatorial plane of a spherical gravitating body of mass
m at a constant Schwarzschild radial parameter r from the center of that body is

$$(d\tau)^2 = \left(1 - \frac{2m}{r}\right)(dt)^2 - r^2(d\phi)^2$$

where τ is the proper time along the path, t is the coordinate time, and φ is the longitude.
Dividing through by (dt)² and taking the square root of both sides gives

$$\frac{d\tau}{dt} = \sqrt{1 - \frac{2m}{r} - r^2\left(\frac{d\phi}{dt}\right)^2}$$

Let R denote the "radius" of the Earth, and let r = R + h denote the radius of the airplane's
flight path at the constant altitude h. If we again let V denote the tangential speed of the
Earth's rotation at the airplane's radial position at the equator, and let v denote the
tangential speed of the airplane (either eastward or westward), we have dφ/dt = (V ± v)/r,
so the above equation leads to the following integral for the elapsed proper time along a
path in the equatorial plane at radial parameter r = R + h from the Earth's center, moving
with tangential speed V ± v:

$$\tau = \int_0^t \sqrt{1 - \frac{2m}{r} - (V \pm v)^2}\;dt$$

Again making use of the approximation √(1 − u) ≈ 1 − u/2 for small u, we can integrate
this over some interval t of coordinate time to give the corresponding lapse of proper
time along the path

$$\tau = t\left[1 - \frac{m}{r} - \frac{(V \pm v)^2}{2}\right]$$
Naturally this is the same as equation (2) except for the extra term −mt/r (the −2m/r
under the radical), which represents the effect of the gravitational field. The mass of the
Earth in gravitational units is about m = 0.0044 meters = (1.4766)×10⁻¹¹ sec, and if the
airplanes are flying at an altitude of h = 6 miles above the Earth's surface we have
r = 3986 miles = 0.021031 sec. Also, assume the speed of the airplanes (relative to the
ground) is v = 500 mph, which is v = (0.747)×10⁻⁶ in dimensionless units, compared with
the tangential speed of the Earth's surface at the equator V = (1.527)×10⁻⁶. In these
conditions the above formula gives the relation between coordinate time and elapsed
proper time for a clock sitting stationary at the equator on the Earth's surface as
whereas for clocks flying at an altitude of 6 miles and 500 mph eastward and westward
the relations are

This shows that the difference in radial location between the clock on the Earth's surface
and the clock up at flight altitude results in a slowing of the Earthbound clock's proper
time relative to the airplane clocks of about (2.073)×10⁻¹² seconds per second of
coordinate time. On the other hand, the eastbound clock has a relative slowing (compared
to the Earthbound clock) in the amount of (1.419)×10⁻¹² seconds per second due to its
greater speed, so the net effect is that the eastbound clock's proper time runs ahead of the
Earthbound clock by about (0.654)×10⁻¹² seconds per second of coordinate time. In
contrast, the westbound clock is actually moving slower than the Earthbound clock
(because its flight speed counteracts the rotation of the Earth), so it gains an additional
(0.862)×10⁻¹² seconds per second. The net effect is that the westbound clock's proper
time runs ahead of the Earthbound clock by a total of (2.935)×10⁻¹² seconds per second
of coordinate time.
These effects are extremely small, but if an experiment is performed for an extended
period of time the differences in elapsed time on highly accurate cesium clocks are large
enough to be detectable. Since there are 86400 seconds in a day, we would expect to see
the eastbound and westbound flying clocks in advance of the Earthbound clock by 57
nanoseconds and 254 nanoseconds respectively. Experiments of this type have actually
been performed, and the results have agreed with the predictions of relativity. Notice that
the "moving" clocks actually show greater lapses of proper time than the "stationary"
clock, seeming to contradict special relativity, but the explanation (as we've seen) is that
the gravitational effects of general relativity override the velocity effects in these
particular circumstances.
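The bookkeeping in the last two paragraphs is easy to reproduce. In the sketch below (plain
Python), the gravitational rate difference between flight altitude and the ground is taken as
the (2.073)×10⁻¹² s/s quoted above, while the velocity terms are recomputed from V and v; the
daily totals come out near the 57 ns and 254 ns figures.

```python
day = 86400.0
V = 1.527e-6       # equatorial surface speed, fraction of c
v = 0.747e-6       # 500 mph flight speed, fraction of c
grav = 2.073e-12   # altitude rate gain (m/R - m/r), taken from the text

# velocity rate change relative to the ground clock: V^2/2 - (V +/- v)^2/2
east = grav - (V*v + v**2/2)      # flies with the Earth's rotation
west = grav + (V*v - v**2/2)      # flies against the Earth's rotation
print(east*day*1e9, west*day*1e9) # ~57 ns and ~254 ns per day
```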
Suppose we return to our original problem, which involved airplanes flying in a small
circle around a fixed point on the Earth's equator, but now we want to include the effects
of the Earth's gravity. The principles are the same as in the circumnavigating case, i.e., we
need only integrate the proper time along the path, making use of the Schwarzschild
metric to give the correct line element. However, the path of the airplane in this case is
not so easy to express in terms of the usual Schwarzschild polar coordinates. One way of
approaching a problem such as this is to work with the Schwarzschild metric expressed in
terms of "orthogonal" quasi-Minkowskian coordinates. If we split up the coefficient of
(dr)² into the form 1 + 2m/(r − 2m), then the usual Schwarzschild metric can be written as

$$(d\tau)^2 = \left(1 - \frac{2m}{r}\right)(dt)^2 - \frac{2m}{r - 2m}(dr)^2 - \left[(dr)^2 + r^2(d\theta)^2 + r^2\sin^2\theta\,(d\phi)^2\right]$$

Now if we define the quasi-Euclidean parameters

$$x = r\sin\theta\cos\phi \qquad y = r\sin\theta\sin\phi \qquad z = r\cos\theta$$

we recognize the last three terms of the preceding equation as just the expression of
(dx)² + (dy)² + (dz)² in polar coordinates. Also, since r = √(x² + y² + z²) we have
dr = (x dx + y dy + z dz)/r, so the Schwarzschild metric can be written in the
quasi-Minkowskian form

$$(d\tau)^2 = \left(1 - \frac{2m}{r}\right)(dt)^2 - (dx)^2 - (dy)^2 - (dz)^2 - \frac{2m}{r^3\left(1 - 2m/r\right)}\left(x\,dx + y\,dy + z\,dz\right)^2$$

This form is similar to Riemann normal coordinates if we expand this metric about any
radius r. Also, for sufficiently large r the quantity 2m in the denominator of the final term
becomes negligible, and the coefficient approaches 2m/r³, so it isn't surprising that this is
one of the characteristic magnitudes of the sectional curvature of Schwarzschild
spacetime at radius r. Expanding the above expression, we find that the Schwarzschild
metric can be expressed as a sum of the Minkowski metric plus some small quantities.
Thus in matrix notation the Schwarzschild metric tensor for these coordinates is

$$g_{\mu\nu} = \begin{bmatrix} 1 - \frac{2m}{r} & 0 & 0 & 0 \\ 0 & -1 - \frac{2m}{r}\kappa x^2 & -\frac{2m}{r}\kappa xy & -\frac{2m}{r}\kappa xz \\ 0 & -\frac{2m}{r}\kappa xy & -1 - \frac{2m}{r}\kappa y^2 & -\frac{2m}{r}\kappa yz \\ 0 & -\frac{2m}{r}\kappa xz & -\frac{2m}{r}\kappa yz & -1 - \frac{2m}{r}\kappa z^2 \end{bmatrix}$$

where κ = 1/[r²(1 − 2m/r)]. The determinant of this metric is −1. Dividing the preceding
expression by (dt)² and taking the square root of both sides, we arrive at a relation
between dτ and dt into which we can substitute the expressions for x, y, z, r, dx/dt, dy/dt,
and dz/dt, and then integrate to give the proper time τ along the path as a function of
coordinate time t. Hence if we know x, y, and z as explicit functions of t along a
particular path, we can immediately write down the explicit integral for the lapse of
proper time along that path.
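The claim that this metric has determinant −1 can be verified symbolically. A short sketch
(assuming Python with the sympy library):

```python
import sympy as sp

x, y, z, m = sp.symbols('x y z m', positive=True)
r = sp.sqrt(x**2 + y**2 + z**2)
k = 1/(r**2*(1 - 2*m/r))          # the kappa defined above

X = sp.Matrix([x, y, z])
g = sp.zeros(4, 4)
g[0, 0] = 1 - 2*m/r
g[1:, 1:] = -sp.eye(3) - (2*m/r)*k*(X*X.T)   # spatial block

print(sp.simplify(g.det()))       # -> -1
```

The determinant works out to −1 because the (1 − 2m/r) factor in the time-time component
exactly cancels the corresponding factor contributed by the spatial block.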


6.7 Gravitational Acceleration in Schwarzschild Coordinates
If bodies, moved in any manner among themselves, are urged in the
direction of parallel lines by equal accelerative forces, they will all
continue to move among themselves, after the same manner as if they had
not been urged by those forces.
Isaac Newton, 1687
According to Newton's theory the acceleration of gravity of a test particle at a given
radial distance from a large mass is independent of the particle's state of motion.
Consequently it would be impossible to tell, from the relative motions of a group of
free-falling test particles in a small region of space, that those particles were subject to
any force. Maxwell emphasized the same point when he wrote (in the posthumously
published Matter and Motion) that acceleration is relative, because only the differences
between the accelerations of bodies can be detected.
Our whole progress up to this point may be described as a gradual development of
the doctrine of relativity of all physical phenomena... There are no landmarks in
space; one portion of space is exactly like every other portion, so that we cannot
tell where we are. We are, as it were, on an unruffled sea, without stars, compass,
soundings, wind, or tide, and we cannot tell in what direction we are going. We
have no log which we can cast out to take a dead reckoning by; we may compute
our rate of motion with respect to the neighbouring bodies, but we do not know
how these bodies may be moving in space. We cannot even tell what force may be
acting on us; we can only tell the difference between the force acting on one thing
and that acting on another.
Of course, he was here referring to forces (such as gravity) that are proportional to
inertial mass, so that they impart equal accelerations to every body. As an example of a
localized set of bodies subjected to equal acceleration, he considered ordinary
objects on the earth's surface, all of which are subjected (along with the earth itself) to
the sun's gravitational force and the corresponding acceleration. He noted that if this were
not the case, i.e., if the sun's gravity attracted only the earth but not ordinary small
objects on the earth's surface, this would be easily detectable by (for instance) changes in
the position of a plumb line between sunrise and sunset.
Naturally these facts are closely related to the equivalence principle, but there are some
subtle differences when we consider the accelerations of bodies due to gravity in the
context of general relativity. We saw in Section 6.4 that the second derivative of r with
respect to the proper time τ of the radially moving particle in general relativity is simply

$$\frac{d^2 r}{d\tau^2} = -\frac{m}{r^2}$$

and thus independent of the particle's state of motion, just as with Newtonian gravity.
However, the proper times of two (momentarily) coincident particles may differ
depending on their states of motion, so when we consider the motions of such particles in
terms of a common system of coordinates the result will not be so simple. The second
derivative of the radial coordinate r with respect to the time coordinate t in terms of the
usual Schwarzschild coordinates depends not only on the spacetime location of the
particle (i.e., r and t) but also on the trajectory of the particle through that point. This is
true even for particles with purely radial motion. To derive d²r/dt² for purely radial
motion, we can divide through equation (1) of Section 6.4 by (dt)² to give

$$\left(\frac{d\tau}{dt}\right)^2 = \left(1 - \frac{2m}{r}\right) - \left(1 - \frac{2m}{r}\right)^{-1}\left(\frac{dr}{dt}\right)^2 \qquad (1)$$

Solving for dr/dt gives

$$\frac{dr}{dt} = \pm\sqrt{1 - \frac{2m}{r}}\;\sqrt{\left(1 - \frac{2m}{r}\right) - \left(\frac{d\tau}{dt}\right)^2} \qquad (2)$$

where τ is the proper time of the radially moving particle. We also have from Section 6.4
the relation

$$\frac{dt}{ds} = \frac{1}{\lambda\left(1 - 2m/r\right)}$$

where λ is a constant parameter of the given trajectory, and s is the path length parameter
of the geodesic equations. We identify s with the proper time τ by setting dτ/ds = 1, so
we can write

$$\frac{d\tau}{dt} = \lambda\left(1 - \frac{2m}{r}\right) \qquad (3)$$

Substituting into (2), we have

$$\frac{dr}{dt} = \pm\left(1 - \frac{2m}{r}\right)\sqrt{1 - \lambda^2\left(1 - \frac{2m}{r}\right)}$$

and therefore the second derivative of r with respect to t is

$$\frac{d^2 r}{dt^2} = -\frac{m}{r^2}\left(1 - \frac{2m}{r}\right)\left[3\lambda^2\left(1 - \frac{2m}{r}\right) - 2\right] \qquad (4)$$

In order to relate the parameter λ to a particular trajectory, we can substitute (3) into
equation (1), giving

$$\lambda^2 = \frac{1}{1 - 2m/r}\left[1 - \frac{(dr/dt)^2}{\left(1 - 2m/r\right)^2}\right] \qquad (5)$$

There are two cases to consider. First, if there is a radius r = R at which the test particle is
stationary, meaning dr/dt = 0, then

$$\lambda = \frac{1}{\sqrt{1 - 2m/R}}$$

In this case the magnitude of λ is always greater than 1. Inserting this into (4) gives

$$\frac{d^2 r}{dt^2} = -\frac{m}{r^2}\left(1 - \frac{2m}{r}\right)\frac{1 + 4m/R - 6m/r}{1 - 2m/R}$$

At the apogee of the trajectory, when r = R, this reduces to

$$\frac{d^2 r}{dt^2} = -\frac{m}{R^2}\left(1 - \frac{2m}{R}\right)$$

as expected. If R is infinite, the coordinate acceleration reduces to

$$\frac{d^2 r}{dt^2} = -\frac{m}{r^2}\left(1 - \frac{2m}{r}\right)\left(1 - \frac{6m}{r}\right)$$
A plot of d²r/dt² divided by −m/r² for various values of R is shown below.

Notice that the value of (d²r/dt²)/(−m/r²) is negative in the range from r = 2m to
r = 6m/(1 + 4m/R), where it changes from negative to positive. This signifies that the
acceleration (in terms of the r and t coordinates) is actually outward in this range.
In the second case there is no radius at which the trajectory is stationary, so the trajectory
escapes to infinity, and the speed dr/dt asymptotically approaches a fixed value V in the
limit as r goes to infinity. In this case equation (5) gives

$$\lambda = \sqrt{1 - V^2}$$

so the magnitude of λ is less than 1. Inserting this into equation (4) gives

$$\frac{d^2 r}{dt^2} = -\frac{m}{r^2}\left(1 - \frac{2m}{r}\right)\left[3\left(1 - V^2\right)\left(1 - \frac{2m}{r}\right) - 2\right]$$

The case V = 0 corresponds to the case of R approaching infinity for the bound
trajectories, and indeed we see that inserting V = 0 into this expression gives the same
result as with R going to infinity in the acceleration equation for bound trajectories. At
the other extreme, with V = 1, this equation reduces to

$$\frac{d^2 r}{dt^2} = \frac{2m}{r^2}\left(1 - \frac{2m}{r}\right)$$

which is consistent with what we get for null (light-like) paths by setting dτ = 0 in the
radial metric and then solving for dr/dt = ±(1 − 2m/r). A normalized plot of this
acceleration for various values of V is shown below.
This shows that the acceleration d2r/dt2 in terms of the Schwarzschild coordinates r and t
for a particle moving radially with ultimate speed V (either toward or away from the
gravitating mass) is outward at all radii greater than 2m for all ultimate speeds greater
than 0.577 times the speed of light. For light-like paths (V = 1), the magnitude of the
acceleration approaches twice the magnitude of the Newtonian acceleration and is
outward instead of inward. The reason for this outward acceleration with respect to
Schwarzschild coordinates is that the speed of light (in terms of these coordinates) is
greater at greater radial distances from the mass.
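A minimal sketch of these acceleration formulas (plain Python, geometric units with m = 1)
confirms the crossover radius 6m/(1 + 4m/R), the critical ultimate speed 1/√3 ≈ 0.577, and
the doubled, outward-pointing value for light-like paths:

```python
import math
m = 1.0

def accel_bound(r, R):
    # d^2r/dt^2 for a trajectory momentarily at rest at r = R
    return -(m/r**2)*(1 - 2*m/r)*(1 + 4*m/R - 6*m/r)/(1 - 2*m/R)

def accel_unbound(r, V):
    # d^2r/dt^2 for a trajectory with ultimate speed V at infinity
    return -(m/r**2)*(1 - 2*m/r)*(3*(1 - V**2)*(1 - 2*m/r) - 2)

print(accel_bound(6.0, 1e9))               # ~0: sign change at r = 6m as R -> inf
print(accel_unbound(1e7, 1/math.sqrt(3)))  # ~0: critical ultimate speed 0.577
print(accel_unbound(1e7, 1.0)*1e14/m)      # ~ +2: twice Newtonian, and outward
```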
Notice that the two expressions for d²r/dt² derived above, applicable to the cases when the
kinetic energy of the test particle is or is not sufficient to escape to infinity, are the same
if we stipulate that R and V are related according to

$$V^2 = \frac{2m}{2m - R}$$

If R is greater than 2m, then V² is negative so V is imaginary. Hence in this case we find
it most convenient to use R. On the other hand, if R is negative, from 0 to negative
infinity, the value of V² is real in the range from 0 to 1, so in this case it is convenient to
work with V. The remaining possibility (which has no counterpart in Newtonian gravity)
is if R is between 0 and 2m, in which case V² is not only positive, it is greater than 1.
Thus the impossibility of having a speed greater than 1 corresponds to the impossibility
of being motionless at a radius less than 2m.

Incidentally, for a bound particle we can give an alternative derivation of the r,t
acceleration from the well-known cycloidal parametric relations between r and τ:

$$r = \frac{R}{2}\left(1 + \cos\theta\right) \qquad \tau = \frac{R}{2}\sqrt{\frac{R}{2m}}\left(\theta + \sin\theta\right)$$

where R is the "top" of the orbit and θ is an angular parameter that ranges from 0 at the
top of the orbit (r = R) to π at the bottom (r = 0). A plot of r versus τ can be drawn by
tracing the motion of a point on the rim of a wheel as it rolls along a flat surface. (This
same relation applies in Newtonian gravity if we replace τ with t.) Now, differentiating
these parametric equations with respect to θ gives

$$\frac{dr}{d\theta} = -\frac{R}{2}\sin\theta \qquad \frac{d\tau}{d\theta} = \frac{R}{2}\sqrt{\frac{R}{2m}}\left(1 + \cos\theta\right)$$

Therefore we have

$$\frac{dr}{d\tau} = -\sqrt{\frac{2m}{R}}\;\frac{\sin\theta}{1 + \cos\theta}$$

From the parametric equation for r we have

$$\cos\theta = \frac{2r}{R} - 1$$

Denoting this quantity by "u", this implies that

$$\tan^2\left(\frac{\theta}{2}\right) = \frac{1 - u}{1 + u} = \frac{R}{r} - 1$$

Solving this for tan(θ/2) gives

$$\tan\left(\frac{\theta}{2}\right) = \pm\sqrt{\frac{R}{r} - 1}$$

We want θ = 0 at r = R so we choose the first root and substitute into the preceding
equation for dr/dτ to give

$$\frac{dr}{d\tau} = -\sqrt{\frac{2m}{r} - \frac{2m}{R}}$$

In addition, we have the derivative of coordinate time with respect to proper time of the
particle

$$\frac{dt}{d\tau} = \frac{\sqrt{1 - 2m/R}}{1 - 2m/r}$$

(See Section 6.4 for a derivation of this relation from the basic geodesic equations.)
Dividing dr/dτ by dt/dτ gives

$$\frac{dr}{dt} = -\frac{\left(1 - 2m/r\right)\sqrt{2m/r - 2m/R}}{\sqrt{1 - 2m/R}}$$

Just as we did previously, we can now compute d²r/dt² = [d(dr/dt)/dr][dr/dt], and we
arrive at the same result as before.
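That final step can be verified symbolically. A sketch (assuming Python with the sympy
library) that carries out the indicated division and differentiation:

```python
import sympy as sp

r, R, m = sp.symbols('r R m', positive=True)

drdtau = -sp.sqrt(2*m/r - 2*m/R)               # from the cycloid relations
dtdtau = sp.sqrt(1 - 2*m/R)/(1 - 2*m/r)        # from Section 6.4
drdt = drdtau/dtdtau

# d^2r/dt^2 = [d(dr/dt)/dr]*(dr/dt)
d2rdt2 = sp.simplify(sp.diff(drdt, r)*drdt)
print(sp.factor(d2rdt2))
# agrees with -(m/r^2)(1 - 2m/r)(1 + 4m/R - 6m/r)/(1 - 2m/R)
```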
6.8 Sources in Motion
This means that the velocity of propagation [of gravity] is equal to that of
light. It seems at first that this hypothesis ought to be rejected outright.
Laplace showed in effect that the propagation is either instantaneous or
much faster than that of light. However, Laplace examined the hypothesis
of finite propagation velocity ceteris non mutatis; here, on the contrary,
this hypothesis is conjoined with many others, and it may be that between
them a more or less perfect compensation takes place. The application of
the Lorentz transformation has already provided us with numerous
examples of this.
Poincaré, 1905
The preceding sections focused on the spherically symmetrical solution of Einstein's field
equations represented by the Schwarzschild solution, combined with the geodesic
hypothesis. Most of the directly observable effects of general relativity can be modeled
and evaluated on this basis, i.e., in terms of the solution of the "one-body problem", a
single gravitating body that can be regarded as stationary. Having solved the field
equations for this single body, we then determine the paths of test particles in its vicinity,
based on the assumption that those particles do not significantly affect the field, and that
they follow geodesics in the field of the gravitating body. This is obviously a very
simplified and idealized case, but it happens to be fairly representative of a small planet
(e.g., Mercury) orbiting the Sun, or a light pulse grazing the Sun. From one point of view,
the geodesic assumption seems quite natural and unobjectionable. After all, it merely
asserts Newton's first law of motion in each small region of spacetime. Any sufficiently
small region is essentially flat, and if we assume that free objects move at constant speed
in straight lines in flat spacetime, then overall they follow geodesics.

However, there are two reasons for possibly being dissatisfied with the geodesic
assumption. First, just as with Newton's law of inertia, the geodesic assumption can be
regarded as giving a special privileged status to certain paths without a clear justification.
Of course, in practice the principle of inertia has proven itself to be extremely robust, but
in theory there has always been some epistemological uneasiness about the circularity in
the definition of inertial paths. As Einstein commented, we say an object moves inertially
if it is free of outside influences, but we infer that it is free of outside influences only by
observing that it moves inertially. This concern can be answered, at least in part, by
noting that inertia serves as an organizing principle, and its significance resides in the
large number of disparate entities that can be coordinated simultaneously on the basis of
this principle. The concept of (local) inertial coordinates would indeed be purely circular
if it successfully reduced the motions of only a single body to a simple set of patterns
(e.g., Newton's laws), but when the same system of coordinates is found to reduce the
motions of multiple (and seemingly independent) objects, we are justified in claiming
that it has non-trivial physical significance. Nevertheless, one of Einstein's objectives in
developing the general theory was to eliminate the reliance on the principle of inertia,
which is the principle of geodesic motion in curved spacetime.
The second reason for dissatisfaction with the geodesic assumption is that the entities
whose motions are of interest are not just passive inhabitants of the spacetime manifold;
they are sources of gravitation in their own right (since all forms of mass and energy
gravitate). This immediately raises the problem (also encountered in electrodynamics)
of how to deal with the field produced by the moving entity itself. Moreover, unlike
Maxwell's equations of the electrodynamic field, the field equations of general relativity
are non-linear, so we are not even justified in subtracting out the "self-field" of the
moving object, because the result will not generally be a solution of the field equations.
One possible way of addressing this problem would be to treat the moving objects as
contributors to the stress-energy tensor Tμν in the field equations, in which case the
vanishing of the covariant derivative (imposed by the field equations) implies that the
objects follow geodesics. However, it isn't clear, a priori, that this is a legitimate
representation of matter. Einstein, for one, rejected this approach, saying that Tμν is
merely "a formal condensation of all things whose comprehension in the sense of a field
theory is still problematic". Another approach is to treat particles of matter as isolated
point-like "pole" singularities in the field; indeed this was the basis for a paper written by
Einstein, Infeld, and Hoffman (EIH) in 1938, in which they argued that (at least when the
field equations are integrated to some finite order of approximation, and assuming a weak
field and low accelerations) such singularities can exist only if they propagate along
geodesics in spacetime.
At first sight this is a somewhat puzzling proposition, because geodesics are defined only
on smooth manifolds, so it isn't obvious how a singularity of a manifold can be said to
propagate along a geodesic of that manifold. However, against the background of nearly
Minkowskian spacetime, it's possible to define a workable notion of the position of an
isolated singularity (though not without some ambiguity). Even if we accept all these
caveats, it's odd that Einstein would pursue this approach, considering that he is usually
identified with a disdain for singularities, declaring that they render a field theory invalid,
much like an inconsistency in a formal system. In fact, one of his favorite ideas was
that we might achieve a complete physically viable field theory precisely by requiring the
absence of singularities. Indeed the EIH paper shows that geodesic motion is an example
of a physical effect that can be deduced on this basis.
Einstein, et al, discovered that when the field equations are integrated in the presence of
two specified point-like singularities in the field, a one-dimensional locus of singularity
extending from one of the original points to the other ordinarily appears in the solution.
There is, however, a special set of conditions on the motions of the two original point-like
singularities such that no intervening singular locus appears, and these are precisely the
conditions of geodesic motion. Thus EIH concluded that the field equations of general
relativity, by themselves, without any separate geodesic assumption, actually do require
mass point singularities to follow geodesic paths. (Just as remarkably, it turns out that
even the classical equations of motion are due entirely to the non-linearity of the field
equations.) So, this is actually an example of how meaningful physics can come out of
Einstein's principle of "no singularities". Of course, the solution retains the two
point-like singularities, so one might question whether Einstein was being hypocritical in
banning singularities in the rest of the manifold. In reply he wrote:
This objection would be justified if the equations of gravitation were to be
considered as equations of the total field. But since this is not the case, one will
have to say that the field of a material particle will differ the more from a pure
gravitational field the closer one comes to the location of the particle. If one had
the field equations of the total field, one would be compelled to demand that the
particles themselves could be represented as solutions of the complete field
equations that are free of irregularities everywhere. Only then would the general
theory of relativity be a complete theory.
This is clearly related to Einstein's dissatisfaction with the dualistic nature of physics,
being partly described by partial differential equations of the field, and partly by total
differential equations of particles. His hope was that particle-like solutions would emerge
from some suitable field theory, and one of the conditions he felt must be satisfied by any
such complete field theory must be the complete absence of singularities. It's easy to
understand why Einstein felt the need for a unified field theory to encompass both
gravity and electromagnetism, because in their present separate forms they are extremely
incongruous. In the case of electrodynamics, the field equations are linear, and possess
only a single gauge freedom, so the equations of motion must be introduced as an
independent assumption. In contrast, general relativity suggests that the equations of
motion of a field theory ought to be implied by the field equations themselves, which
must therefore be non-linear.
One of the limitations of Einstein's work on the equations of motion was that it neglected
the effect of radiation. This is usually considered to be legitimate provided the
accelerations involved are not too great. Still, strictly speaking, accelerating masses ought
to produce radiation. Indeed, this is necessary, even for slowly accelerated motions, in
order to maintain strict momentum conservation along with the nearly complete absence
of aberration in the apparent direction of the "force of gravity" in the two-body problem
(as noted by Laplace). But radiation reaction also causes acceleration, so it can be argued
that any meaningful treatment of the problem of motion cannot neglect the effects of
gravitational waves. Of course, the full field equations of general relativity possess
solutions in which metrical disturbances propagate as waves, but such waves have not yet
been directly observed. Hence they don't, at present, constitute part of the experimentally
validated body of general relativity, but there is indirect empirical confirmation of
gravitational waves in the apparent energy loss of certain binary star systems, most
notably the Hulse-Taylor system, which consists of a neutron star and a pulsar orbiting
each other every 8 hours. Careful observations indicate that the two stars are spiraling
toward each other at a rate of 2.7 parts per billion each year, precisely consistent with the
prediction of general relativity for the rate at which the system should be radiating energy
in the form of gravitational waves. The agreement is very impressive, and subsequent
observations of other binary star systems have provided similar indirect support for the
existence of gravitational waves, although in some cases it is necessary to postulate other
(unseen) bodies in the system in order to yield results consistent with general relativity.
The experimental picture may change as a result of the LIGO project, which is an attempt
to use extremely sensitive interferometry techniques to directly detect gravitational
waves. Two separate facilities are being prepared in the states of Louisiana and
Washington, and their readings will be combined to achieve a very large baseline. The
facility in Washington state is over a mile long. If this effort is successful in detecting
gravitational waves, it will be a stupendous event, possibly opening up a new "channel"
for observing the universe. Of course, it's also possible that efforts to detect gravitational
waves may yield inconclusive results, i.e., no waves may be definitely detected, but it
may be unclear whether the test has been adequate to detect them even if they were
present.
If, on the other hand, the experimental efforts were to surprise us with an unambiguously
null result (like the Michelson-Morley experiments), ruling out the presence of
gravitational waves in a range where theory says they ought to be detectable, it could
have serious implications for the field equations and/or the quadrupole solution. Oddly
enough, Einstein became convinced for a short time in 1937 that gravity waves were
impossible, but soon changed his mind again. As recently as 1980 there were disputes in
scholarly publications as to the validity of the quadrupole solution. Part of the reason that
people such as Einstein have occasionally doubted the reality of the wave solutions is that
all gravitational waves imply a singularity (as does the Schwarzschild solution), albeit
"merely" a coordinate singularity. Also, the phenomena of gravitational waves must be
inherently non-linear, because it consists of gravity "acting on itself", and we know that
gravity itself doesn't show up in the source terms of the field equations, but only in the
non-linearity of the left-hand side of the field equations. The inherent non-linearity of
gravitational waves makes them difficult to treat mathematically, because the classical
wave solutions are based on linearized models, so it isn't easy to be sure the resulting
"solutions" actually represent realistic solutions of the full non-linear field equations.
Furthermore, there are no known physical situations that would produce any of the simple
linearized plane wave situations that are usually discussed. For example, it is known that

there are no plane wave solutions to the non-linear field equations. There are cylindrical
solutions, but unfortunately no plausible sources for infinite cylindrical solutions are
known, so the physical significance of these solutions is unclear.
It might seem as though there ought to be spherically symmetrical "pulsating" solutions
that radiate gravitational waves, but this is not the case, as is clear from Birkhoff's proof
that the Schwarzschild solution is the unique (up to transformation of coordinates)
spherically symmetrical solution of the field equations, even without the "static"
assumption. This is because, unlike the case of electromagnetism, the gravitational field
is also the metric by which the field is measured, so coordinate transformations inherently
represent more degrees of freedom than in Maxwell's equations, which have just a single
"gage". As a result, there is no physically meaningful "dipole" source for gravitational
waves in general relativity. The lowest-order solutions are necessarily given by
quadrupole configurations.
Needless to say, another major complication in the consideration of gravitational waves is
the idea of "gravitons" arising from attempts to quantize the gravitational field by analogy
with the quantization of the electromagnetic field. This moves us into a realm where the
classical notions of a continuous spacetime manifold may not be sustainable. A great deal
of effort has been put into understanding how the relativistic theory of gravity can be
reconciled with quantum theory, but no satisfactory synthesis has emerged. Regardless of
future developments, it seems safe to say that the results associated with the large-scale
Schwarzschild metric and geodesic hypothesis would not be threatened by quantization
of the field equations. Nevertheless, this shows how important the subject of gravitational
waves is for any attempt to integrate the results of general relativity into quantum
mechanics (or vice versa, as Einstein might have hoped). This is one reason the
experimental results are awaited with such interest.
Closely related to the subject of gravitational waves is the question of how rapidly the
"ordinary" effects of gravity "propagate". It's not too surprising that early investigations
of the gravitational field led to the notion of instantaneous action at a distance, because it
is an empirical fact that the gravitational acceleration of a small body orbiting at a
distance r from a gravitating source points, at each instant, very precisely toward the
position of the source at that instant, not (as we might naively expect) toward the location
of the source at a time r/c earlier. (When we refer to "instants" in this section, we mean
with respect to the inertial rest coordinates of the center of mass of the orbital system.) To
gain a clear understanding of the reason for the absence of gravitational "aberration" in
these circumstances, it's useful to recall some fundamentals of the phase relations
between dynamically coupled variables. One of the simplest representations of dynamic
coupling between two variables x and y is the "lead-lag" transfer function, which is based
on the ordinary first-order differential equation

$$a_1 \frac{dy}{dt} + a_0\,y = b_1 \frac{dx}{dt} + b_0\,x$$

where a₀, a₁, b₀, and b₁ are constants. This coupling is symmetrical, so there is no implicit
directionality, i.e., we aren't required to regard either x or y as the independent variable
and the other as the dependent variable. However, in most applications we are given one
of these variables as a function of time, and we use the relation to infer the response of
the other variable. To assess the "frequency response" of this transfer function we
suppose that the x variable is given by a pure sinusoidal function x(t) = A sin(ωt) for some
constants A and ω. Eventually the y variable will fall into an oscillating response, which
we presume is also sinusoidal of the same frequency, although the amplitude and phase
may be different. Thus we seek a solution of the form

$$y(t) = B\sin(\omega t - \phi)$$

for some constants B and φ. If we define the "time lag" t_L of the transfer function as the
phase lag φ divided by the angular frequency ω, it follows that the time lag is given by

$$t_L = \frac{1}{\omega}\left[\mathrm{invtan}\left(\frac{a_1 \omega}{a_0}\right) - \mathrm{invtan}\left(\frac{b_1 \omega}{b_0}\right)\right]$$

For sufficiently small angular frequencies the input function and the output response both
approach simple linear "ramps", and since invtan(z) goes to z as z approaches zero, we
see that the time lag goes to

$$t_L \rightarrow \frac{a_1}{a_0} - \frac{b_1}{b_0}$$
The ratios a₁/a₀ and b₁/b₀ are often called, respectively, the lag and lead time constants of
the transfer function, so the "time lag" of the response to a steady ramp input equals the
lag time constant minus the lead time constant. Notice that it is perfectly possible for the
lead time constant to be greater than the lag time constant, in which case the "time lag" of
the transfer function is negative. In general, for any frequency input (not just linear
ramps), the phase lag is negative if b₁/b₀ exceeds a₁/a₀. Despite the appearance, this does
not imply that the transfer function somehow reads the future, nor that the input signal is
traveling backwards in time (or is instantaneous in the case of a symmetrical coupling).
The reason the output appears to anticipate the input is simply that the forcing function
(the right hand side of the original transfer function) contains not only the input signal
x(t) but also its derivative dx/dt (assuming b₁ is non-zero), whose phase is π/2 ahead.
(Recall that the derivative of the sine is the cosine.) Hence a linear combination of x and
its derivative yields a net forcing function with an advanced phase.
Thus the effective forcing function at any given instant does not reflect the future of x, it
represents the current x and the current dx/dt. It just so happens that if the sinusoidal
wave pattern continues unchanged, the value of x will subsequently progress through the
phase that was "predicted" by the combination of the previous x and dx/dt signals,
making it appear as though the output predicted the input. However, if the x signal
abruptly changes the pattern at some instant, the change will not be foreseen by the
output. Any such change will only reach the output after it has appeared at the input and

worked its way through the transfer function. One way of thinking about this is to
remember that the basic transfer function is directionally symmetrical, and the "output
signal" y(t) could just as well be regarded as the input signal, driving the "response" of
x(t) and its derivative.
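To make the negative time lag concrete, here is a minimal simulation sketch (assuming
Python with numpy and scipy) of a lead-lag system whose lead time constant b₁/b₀ exceeds
its lag time constant a₁/a₀, driven by a slow sinusoid; the computed steady-state lag is
negative, meaning the output leads the input.

```python
import numpy as np
from scipy.signal import TransferFunction, lsim

# lead-lag (b1 s + b0)/(a1 s + a0) with lead constant 2.0 > lag constant 0.5
b1, b0, a1, a0 = 2.0, 1.0, 0.5, 1.0
sys = TransferFunction([b1, b0], [a1, a0])

w = 0.05                          # slow driving frequency (quasi-ramp regime)
t = np.linspace(0, 400, 20001)
x = np.sin(w*t)
_, y, _ = lsim(sys, U=x, T=t)     # y settles into B*sin(w*t - phi)

# steady-state time lag: [invtan(a1*w/a0) - invtan(b1*w/b0)]/w
lag = (np.arctan(a1*w/a0) - np.arctan(b1*w/b0))/w
print(lag)                        # ~ -1.49: output leads by ~1.5 time units
```

Of course, the output "anticipates" only the continuation of an established pattern; any
abrupt change in x(t) still reaches y(t) only after working its way through the system.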
We sometimes refer to "numerator dynamics" as the cause of negative time lags, because
the b1 coefficient appears in the numerator of the basic dynamic relationship when
represented as a transfer function with x(t) as an independent "input" signal. The ability
of symmetrical dynamic relations to extrapolate periodic input oscillations so that the
output has the same phase as (or may even lead) the input accounts for many interesting
effects in physics. For example, in electrodynamics the electrostatic force exerted on a
uniformly moving test particle by a "stationary" charge always points directly toward the
source, because the field is spherically symmetrical about the source. However, since the
test particle is moving uniformly we can also regard it as "stationary", in which case the
source charge is moving uniformly. Nevertheless, the force exerted on the test particle
always points directly toward the source at the present instant. This may seem surprising
at first, because we know changes in the field propagate at the speed of light, rather than
instantaneously. How does the test particle "know" where the source is at the present
instant, if it can only be influenced by the source at some finite time in the past, allowing
for the finite speed of propagation of the field? The answer, again, is numerator
dynamics. The electromagnetic force function depends not only on the source's relative
position, but also on the derivative of the position (i.e., the velocity). The net effect is to
cancel out any phase shift, but of course this applies only as long as the source and the
test particle continue to move uniformly. If either of them is accelerated, the "knowledge"
of this propagates from one to the other at the speed of light.
An even more impressive example of the phase-lag cancellation effects of numerator
dynamics involves the "force of gravity" on a massive test particle orbiting a much more
massive source of gravity, such as the Earth orbiting the Sun. In the case of Einstein's
gravitational field equations the "numerator dynamics" cancel out not only the first-order
phase effects (like the uniform velocity effect in electromagnetism) but also the second-order
phase effects, so that the "force of gravity" on an orbiting body points directly at the
gravitating source at the present instant, even though the source (e.g., the Sun) is actually
undergoing non-uniform motion. In the two-body problem, both objects actually orbit
around the common center of mass, so the Sun (for example) actually proceeds in a
circle, but the "force of gravity" exerted on the Earth effectively anticipates this motion.
The reason the phase cancellation extends one order higher for gravity than for
electromagnetism is the same reason that Maxwell's equations predict dipole waves,
whereas Einstein's equations only support quadrupole (or higher) waves. Waves will
necessarily appear in the same order at which phase cancellation no longer applies. For
electrically charged particles we can generate waves by any kind of acceleration, but this
is because electromagnetism exists within the spacetime metric provided by the field
equations. In contrast, we can't produce gravitational waves by the simplest kind of
"acceleration" of a mass, because there is no background reference to unambiguously
define dipole acceleration. The Einstein field equations have an extra degree of freedom
(so to speak) that prevents simple dipole acceleration from having any "traction". It is
necessary to apply quadrupole acceleration, so that the two dipoles can act on each other
to yield a propagating effect.
In view of this, we expect that a two-body system such as the Sun and the Earth, which
essentially produces no gravitational radiation (according to general relativity) should
have numerator dynamic effects in the gravitational field that give nearly perfect
phase-lag cancellation, and therefore the Earth's gravitational acceleration should always point
directly toward the Sun's position at the present instant, rather than (say) the Sun's
position eight minutes ago. Of course, if something outside this two-body system (such as
a passing star) were to upset the Sun's pattern of motion, the effect of such a disturbance
would propagate at the speed of light. The important point to realize is that the fact that
the Earth's gravitational acceleration always points directly at the Sun's present position
does not imply that the "force of gravity" is transmitted instantaneously. It merely implies
that there are velocity and acceleration terms in the transfer function (i.e., numerator
dynamics) that effectively cancel out the phase lag in a simple periodic pattern of motion.
7.1 Is the Universe Closed?
The unboundedness of space has a greater empirical certainty than any
experience of the external world, but its infinitude does not in any way
follow from this; quite the contrary. Space would necessarily be finite if
one assumed independence of bodies from position, and thus ascribed to it
a constant curvature, as long as this curvature had ever so small a
positive value.
B. Riemann, 1854
Very soon after arriving at the final form of the field equations, Einstein began to
consider their implications with regard to the overall structure of the universe. His 1917
paper presented a simple model of a closed spherical universe which "from the standpoint
of the general theory of relativity lies nearest at hand". In order to arrive at a quasi-static
distribution of matter he found it necessary to introduce the "cosmological term" to the
field equations (as discussed in Section 5.8), so he based his analysis on the equations

    Rμν − (1/2)gμν R + λgμν = 8πG Tμν                  (1)

where λ is the cosmological constant. Before invoking the field equations we can
consider the general form of a metric that is suitable for representing the large-scale
structure of the universe. First, we ordinarily assume that the universe would appear to
be more or less the same when viewed from the rest frame of any galaxy, anywhere in the
universe (at the present epoch). This is sometimes called the Cosmological Principle.
Then, since the universe on a large scale appears (to us) highly homogeneous and
isotropic, we infer that these symmetries apply to every region of space. This greatly
restricts the class of possible metrics. In addition, we can choose, for each region of
space, to make the time coordinate coincide with the proper time of the typical galaxy in
that region. Also, according to the Cosmological Principle, the coefficients of the spatial
terms of the (diagonalized) metric should be independent of location, and any
dependence on the time coordinate must apply symmetrically to all the space
coordinates. From this we can infer a metric of the form

    dτ² = dt² − S(t)² dσ²                  (2)

where S(t) is some (still to be determined) function with units of distance, and dσ is the
total space differential. Recall that for a perfectly flat Euclidean space the differential
line element is

    dσ² = dx² + dy² + dz² = dr² + r²(dθ² + sin²θ dφ²)

where r² = x² + y² + z². If we want to allow our space (at a given coordinate time t) to
have curvature, the Cosmological Principle suggests that the (large scale) curvature
should be the same everywhere and in every direction. In other words, the Gaussian
curvature of every two-dimensional tangent subspace has the same value at every point.
Now suppose we embed a Euclidean three-dimensional space (x,y,z) in a four-dimensional
space (w,x,y,z) whose metric is

    dσ² = k dw² + dx² + dy² + dz²
where k is a fixed constant equal to either +1 or −1. If k = +1 the four-dimensional space
is Euclidean, whereas if k = −1 it is pseudo-Euclidean (like the Minkowski metric). In
either case the four-dimensional space is "flat", i.e., has zero Riemannian curvature. Now
suppose we consider a three-dimensional subspace comprising a sphere (or pseudo-sphere),
i.e., the locus of points satisfying the condition

    k w² + x² + y² + z² = 1

From this we have w² = (1 − r²)/k = k − kr², and therefore

    dw² = k r² dr² / (1 − r²)

Substituting this into the four-dimensional line element above gives the metric for the
three-dimensional sphere (or pseudo-sphere)

    dσ² = dr²/(1 − kr²) + r²(dθ² + sin²θ dφ²)
Taking this as the spatial part of our overall spacetime metric (2) that satisfies the
Cosmological Principle, we arrive at

    dτ² = dt² − R(t)²[ dr²/(1 − kr²) + r²dθ² + r²sin²θ dφ² ]                  (3)

This metric, with k = +1 and R(t) = constant, was the basis of Einstein's 1917 paper, and
it was subsequently studied by Alexander Friedmann in 1922 with both possible signs of
k and with variable R(t). The general form was re-discovered by Robertson and Walker
(independently) in 1935, so it is now often referred to as the Robertson-Walker metric.
Notice that with k = +1 this metric essentially corresponds to polar coordinates on the
"surface" of a sphere projected onto the "equatorial plane", so each value of r corresponds
to two points, one in the Northern and one in the Southern hemisphere. We could remedy
this by making the change of variable r → r/(1 + kr²/4), which (in the case k = +1)
amounts to stereographic projection from the North pole to a tangent plane at the South
pole. In terms of this transformed radial variable the Robertson-Walker metric has the
form

    dτ² = dt² − [R(t)²/(1 + kr²/4)²] (dr² + r²dθ² + r²sin²θ dφ²)
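The substitution can be verified symbolically, e.g. with the following sympy sketch (not
part of the original text):

    # Check: with rho = r/(1 + k r^2/4), the radial part d(rho)^2/(1 - k rho^2)
    # becomes dr^2/(1 + k r^2/4)^2, and rho^2 = r^2/(1 + k r^2/4)^2.
    import sympy as sp
    r, k = sp.symbols('r k')
    rho = r/(1 + k*r**2/4)
    drho = sp.diff(rho, r)
    print(sp.simplify(drho**2/(1 - k*rho**2) - 1/(1 + k*r**2/4)**2))  # -> 0
    print(sp.simplify(rho**2 - r**2/(1 + k*r**2/4)**2))               # -> 0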
As noted above, Einstein originally assumed R(t) = constant, i.e., he envisioned a static
un-changing universe. He also assumed the matter in the universe was roughly
"stationary" at each point with respect to these cosmological coordinates, so the only
non-zero component of the stress-energy tensor in these coordinates is Ttt = ρ, where ρ is the
density of matter (assumed to be uniform, in accord with the Cosmological Principle).
On this basis, the field equations imply

    λ = 4πGρ = 1/R²

Here the symbol R denotes the assumed constant value of R(t) (not to be confused with
the Ricci curvature scalar). This explains why Einstein was originally led to introduce a
non-zero cosmological constant λ, because if we assume a static universe and the
Cosmological Principle, the field equations of general relativity can only be satisfied if
the density is proportional to the cosmological constant. However, it was soon pointed
out that this static model is unstable, so it is a priori unlikely to correspond to the physical
universe. Moreover, astronomical observations subsequently indicated that the universe
(on the largest observable scale) is actually expanding, so we shouldn't restrict ourselves
to models with R(t) = constant. If we allow R(t) to be variable, then the original field
equations, without the cosmological term (i.e., with λ = 0), do have solutions. In view of
this, Einstein decided the cosmological term was unnecessary and should be excluded.
Interestingly, George Gamow was working with Friedmann in Russia in the early 1920's,
and he later recalled that "Friedmann noticed that Einstein had made a mistake in his
alleged proof that the universe must necessarily be stable". Specifically, Einstein had
divided through an equation by a certain quantity, even though that quantity was zero
under a certain set of conditions. As Gamow notes, "it is well known to students of high
school algebra" that division by zero is not valid. Friedmann realized that this error
invalidated Einstein's argument against the possibility of a dynamic universe, and indeed
under the condition that the quantity in question vanishes, it is possible to satisfy the field
equations with a dynamic model, i.e., with a model of the form given by the
Robertson-Walker metric with R(t) variable. It's worth noting that Einstein's 1917 paper did not
actually contain any alleged proof that the universe must be static, but it did suggest that
a non-zero cosmological constant required a non-zero density of matter. Shortly after
Einstein's paper appeared, de Sitter gave a counter-example (see Section 7.6), i.e., he
described a model universe that had a non-zero λ but zero matter density. However,
unlike Einstein's model, it was not static. Einstein objected strenuously to de Sitter's
model, because it showed that the field equations allowed inertia to exist in an empty
universe, which Einstein viewed as "inertia relative to space", and he still harbored hopes
that general relativity would fulfill Mach's idea that inertia should only be possible in
relation to other masses. It was during the course of this debate that (presumably)
Einstein advanced his "alleged proof" of the impossibility of dynamic models (with the
errant division by zero?). However, before long Einstein withdrew his objection,
realizing that his argument was flawed. Years later he recalled the sequence of events in
a discussion with Gamow, and made the famous remark that it had been the biggest
blunder of his life. This is usually interpreted to mean that he regretted ever considering
a cosmological term (which seems to have been the case), but it could also be referring to
his erroneous argument against de Sitter's idea of a dynamic universe, and his unfortunate
"division by zero".
In any case, the Friedmann universes (with and without cosmological constant) became
the "standard model" for cosmologies. If k = +1 the manifold represented by the
Robertson-Walker metric is a finite spherical space, so it is called "closed". If k = 0 or −1
the metric is typically interpreted as representing an infinite space, so it is called "open".
However, it's worth noting that this need not be the case, because the metric gives only
local attributes of the manifold; it does not tell us the overall global topology. For
example, we discuss in Section 7.4 a manifold that is everywhere locally flat, but that is
closed cylindrically. This shows that when we identify "open" (infinite) and "closed"
(finite) universes with the cases k = -1 and k = +1 respectively, we are actually assuming
the "maximal topology" for the given metric in each case.
Based on the Robertson-Walker metric (3), we can compute the components of the Ricci
tensor and scalar and substitute these along with the simple uniform stress-energy tensor
into the field equations (1) to give the conditions on the scale function R = R(t):

    (Ṙ/R)² + k/R² − λ/3 = (8π/3)Gρ

    2RR̈ + Ṙ² + k − λR² = 0

where dots signify derivatives with respect to t. As expected, if R(t) is constant, these
equations reduce to the ones that appeared in Einstein's original 1917 paper, whereas with
variable R(t) we have a much wider range of possible solutions.
It may not be obvious that these two equations have a simultaneous solution, but notice
that if we multiply the first condition through by R(t)³ and differentiate with respect to t,
we get

    Ṙ³ + 2RṘR̈ + kṘ − λR²Ṙ = (d/dt)[(8π/3)Gρ R(t)³]

The left-hand side is equal to Ṙ(t) times the left-hand side of the second condition,
which equals zero, so the right hand side must also vanish, i.e., the derivative of
(8π/3)Gρ R(t)³ must equal zero. This implies that there is a constant C such that

    (8π/3)Gρ R(t)³ = C

With this stipulation, the two conditions are redundant, i.e., a solution of one is
guaranteed to be a solution of the other. Substituting for (8π/3)Gρ in the first condition
and multiplying through by R(t)³, we arrive at the basic differential equation for the scale
parameter of a Friedmann universe

    RṘ² + kR − (λ/3)R³ = C,   or equivalently   Ṙ² + k − (λ/3)R² = C/R                  (4)

Incidentally, if we multiply the latter form through by R(t), differentiate with respect to
t, divide through by Ṙ(t), and differentiate again, the constants k and C drop out, and we
arrive at

    R(d³R/dt³) + 2ṘR̈ = λRṘ

With λ = 0 this is identical to the gravitational separation equation (2) in Section 4.2,
showing that the cosmological scale parameter R(t) is yet another example of a naturally
occurring spatial separation that satisfies this differential equation. It follows that the
admissible functions R(t) (with λ = 0) are formally identical to the gravitational free-fall
solutions described in Section 4.3. Solving equation (4) (with λ = 0) for Ṙ and
switching to normalized coordinates T = t/C and X = R/C, we get

    dX/dT = √(1/X − k)

Accordingly as k equals −1, 0, or +1, integration of this equation gives

    k = −1:   X = (cosh θ − 1)/2,   T = (sinh θ − θ)/2

    k =  0:   X = (3T/2)^(2/3)

    k = +1:   X = (1 − cos θ)/2,    T = (θ − sin θ)/2
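As a spot-check (a sketch, not part of the original text), one can verify symbolically that
the k = +1 cycloid satisfies the normalized equation:

    # Verify (dX/dT)^2 = 1/X - 1 for the k = +1 parametric solution.
    import sympy as sp
    th = sp.symbols('theta', positive=True)
    X = (1 - sp.cos(th))/2
    T = (th - sp.sin(th))/2
    dXdT = sp.diff(X, th) / sp.diff(T, th)     # chain rule: (dX/dtheta)/(dT/dtheta)
    print(sp.simplify(dXdT**2 - (1/X - 1)))    # -> 0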
A plot of these three solutions is shown below.

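The curves can also be generated numerically; here is a minimal sketch (not from the
original text) integrating dX/dT = √(1/X − k) from near X = 0:

    # Normalized Friedmann models for k = -1, 0, +1, starting just after X = 0.
    import numpy as np
    from scipy.integrate import solve_ivp

    for k in (-1, 0, 1):
        rhs = lambda T, X, k=k: [np.sqrt(max(1.0/X[0] - k, 0.0))]
        sol = solve_ivp(rhs, [1e-6, 3.0], [1e-4], max_step=0.01)
        print(k, sol.y[0][-1])   # k = -1 grows fastest; k = +1 levels off near X = 1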
In all three cases with λ = 0, the expansion of the universe is slowing down, albeit only
slightly for the case k = −1. However, if we allow a non-zero cosmological constant λ,
there is a much greater variety of possible solutions to Friedmann's equation (4),
including solutions in which the expansion of the universe is actually accelerating
exponentially. Based on the cosmic scale parameter R and its derivatives, the three
observable parameters traditionally used to characterize a particular solution are

    H = Ṙ/R,    q = −R̈R/Ṙ²,    and the mean density ρ

In terms of these parameters, the constants appearing in the Friedmann equation (4) can
be expressed as

    λ = 4πGρ − 3qH²,    k/R² = 4πGρ − (q + 1)H²,    C = (8π/3)Gρ R³

In principle if astronomers could determine the values of H, q, and ρ with enough
precision, we could decide on empirical grounds the sign of k, and whether or not λ is
zero. Thus, assuming the maximal topologies (and the large-scale validity of general
relativity), we could determine whether the universe is open or closed, and whether it will
expand forever or eventually re-contract. Unfortunately, none of the parameters is known
with enough precision to distinguish between these possibilities.
One source of uncertainty is in our estimates of the mass density of the universe. Given
the best current models of star masses, and the best optical counts of stars in galaxies, and
the apparent density of galaxies, we estimate an overall mass density that is only a small
fraction of what would be required to make k = 0. However, there are reasons to believe
that much (perhaps most) of the matter in the universe is not luminous. (For example, the
observed rotation of individual galaxies indicates that they ought to fly apart unless there
is substantially more mass in them than is visible to us.) This has led physicists and
astronomers to search for the "missing mass" in various forms.
Another source of uncertainty is in the values of R and its derivatives. For example, in its
relatively brief history, Hubble's constant has undergone revisions of an order of
magnitude, both upwards and downwards. In recent years the Hubble space telescope
and several modern observatories on Earth seem to have found strong evidence that the
expansion of the universe is actually accelerating. If so, then it could be accounted for in
the context of general relativity only by a non-zero cosmological constant (on a related
question, see Section 7.6), with the implication that the universe is infinite and will
expand forever (at an accelerating rate).
Nevertheless, the idea of a closed finite universe is still of interest, partly because of the
historical role it played in Einstein's thought, but also because it remains (arguably) the
model most compatible with the spirit of general relativity. In an address to the Berlin
Academy of Sciences in 1921, Einstein said
I must not fail to mention that a theoretical argument can be adduced in favor of
the hypothesis of a finite universe. The general theory of relativity teaches that
the inertia of a given body is greater as there are more ponderable masses in

proximity to it; thus it seems very natural to reduce the total effect of inertia of a
body to action and reaction between it and the other bodies in the universe...
From the general theory of relativity it can be deduced that this total reduction of
inertia to reciprocal action between masses - as required by E. Mach, for example
- is possible only if the universe is spatially finite. On many physicists and
astronomers this argument makes no impression...
This is consistent with the approach taken in Einstein's 1917 paper. Shortly thereafter he
presented (in "The Meaning of Relativity", 1922) the following three arguments against
the conception of infinite space, and for the conception of a bounded, or closed,
universe:
(1) From the standpoint of the theory of relativity, to postulate a closed universe is
very much simpler than to postulate the corresponding boundary condition at
infinity of the quasi-Euclidean structure of the universe.
(2) The idea that Mach expressed, that inertia depends on the mutual attraction of
bodies, is contained, to a first approximation, in the equations of the theory of
relativity; it follows from these equations that inertia depends, at least in part,
upon mutual actions between masses. Thereby Mach's idea gains in
probability, as it is an unsatisfactory assumption to make that inertia depends in
part upon mutual actions, and in part upon an independent property of space.
But this idea of Mach's corresponds only to a finite universe, bounded in space,
and not to a quasi-Euclidean, infinite universe. From the standpoint of
epistemology it is more satisfying to have the mechanical properties of space
completely determined by matter, and this is the case only in a closed universe.
(3) An infinite universe is possible only if the mean density of matter in the
universe vanishes. Although such an assumption is logically possible, it is less
probable than the assumption of a finite mean density of matter in the universe.
Along these same lines, Misner, Thorne, and Wheeler ("Gravitation") comment that
general relativity "demands closure of the geometry in space as a boundary condition on
the initial-value equations if they are to yield a well-determined and unique 4-geometry."
Interestingly, when they quote Einstein's reasons in favor of a closed universe they omit
the third without comment, although it reappears (with a caveat) in the subsequent
"Inertia and Gravitation" of Ciufolini and Wheeler. As we've seen, Einstein was initially
under the mistaken impression that the only cosmological solutions of the field equations
are those with

    κρ = 2/R²                  (5)

where R is the radius of the universe, ρ is the mean density of matter, and κ is the
gravitational constant. This much is consistent with modern treatments, which agree that
at any given epoch in a Friedmann universe with constant non-negative curvature the
radius is inversely proportional to the square root of the mean density. On the basis of (5)
Einstein continued
If the universe is quasi-Euclidean, and its radius of curvature therefore infinite,
then ρ would vanish. But it is improbable that the mean density of matter in the
universe is actually zero; this is our third argument against the assumption that the
universe is quasi-Euclidean.
However, in the 2nd edition of "The Meaning of Relativity" (1945), he added an
appendix, "essentially nothing but an exposition of Friedmann's idea", i.e., the idea that
"one can reconcile an everywhere finite density of matter with the original form of the
equations of gravity [without the cosmological term] if one admits the time variability of
the metric distances...". In this appendix he acknowledged that in a dynamic model, as
described above, it is perfectly possible to have an infinite universe with positive density
of matter, provided that k = -1. It's clear that Einstein originally had not seriously
considered the possibility of a universe with positive mass density but overall negative
curvature. In the first edition, whenever he mentioned the possibility of an infinite
universe he referred to the space as "quasi-Euclidean", which I take to mean "essentially
flat". He regarded this open infinite space as just a limiting case of a closed spherical
universe with infinite radius. He simply did not entertain the possibility of a hyperbolic
(k = -1) universe. (It's interesting that Riemann, too, excluded spaces of negative
curvature from his 1854 lecture, without justification.) His basic objection was evidently
that a spacetime with negative curvature possesses an inherent structure independent of the
matter it contains, and he was unable to conceive of any physical source of negative
curvature. Yet such a structure, typically entailing "ad hoc" boundary conditions at
infinity, is precisely what's required in an open universe, which Einstein regarded as
contrary to the spirit of
relativity.
At the end of the appendix in the 2nd edition, Einstein conceded that it comes down to an
empirical question. If (8π/3)Gρ is greater than H², then the universe is closed and
spherical; otherwise it is open and flat or pseudospherical (hyperbolic). He also makes
the interesting remark that although we might possibly prove the universe is spherical, "it
is hardly imaginable that one could prove it to be pseudospherical". His reasoning is that
in order to prove the universe is spherical, we need only identify enough matter so that
(8π/3)Gρ exceeds H², whereas if our current estimate of ρ is less than this threshold, it will
always be possible that there is still more "missing matter" that we have not yet
identified. Of course, at this stage Einstein was assuming a zero cosmological constant,
so it may not have occurred to him that it might someday be possible to determine
empirically that the expansion of the universe is accelerating, thereby automatically
proving that the universe is open.
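To give a sense of the threshold involved, here is a back-of-envelope sketch (not from the
original text; the round value H ≈ 70 km/s/Mpc is merely illustrative). The condition
(8π/3)Gρ = H² corresponds to the critical density ρ = 3H²/(8πG):

    # Critical density separating the spherical (closed) and hyperbolic (open)
    # cases, assuming lambda = 0 and H ~ 70 km/s per megaparsec.
    import math
    G = 6.674e-11                    # m^3 kg^-1 s^-2
    H = 70e3 / 3.086e22              # 70 km/s/Mpc expressed in 1/s
    print(3*H**2 / (8*math.pi*G))    # ~9e-27 kg/m^3, a few hydrogen atoms per m^3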
Ultimately, was there any merit in Einstein's skepticism toward the idea of an "open"
universe? Even setting aside his third argument, the first two still carry some weight with
some people, especially those who are sympathetic to Mach's ideas regarding the
relational origin of inertia. In an open universe we must accept the fact that there are
multiple, physically distinct, solutions compatible with a given distribution of matter and
energy. In such a universe the "background" inertial field can in no way be associated
with the matter and energy content of the universe. From this standpoint, general
relativity can never give an unambiguous answer to the twins paradox (for example),
because the proper time integral over a given path from A to B depends on the inertial
field, and in an open universe this field cannot be inferred from the distribution of
mass-energy. It is determined primarily by whatever absolute boundary conditions we choose
to impose, independent of the distribution of mass-energy. Einstein believed that such
boundary conditions were inherently non-relativistic, because they require us to single
out a specific frame of reference - essentially Newton's absolute space. (In later years a
great deal of work has been done in attempting to develop boundary conditions "at
infinity" that do not single out a particular frame. This is discussed further in Section
7.7.)
The only alternative (in an open universe) that Einstein could see in 1917 was for the
metric to degenerate far from matter in such a way that inertia vanishes, i.e., we would
require that the metric at infinity go to something like

Such a boundary condition would be the same with respect to any frame of reference, so
it wouldn't single out any specific frame as the absolute inertial frame of the universe.
Einstein pursued this approach for a long time, but finally abandoned it because it
evidently implies that the outermost shell of stars must exist in a metric very different
from ours, and as a consequence we should observe their spectral signatures to be
significantly shifted. (At the time there was no evidence of any "cosmological shift" in
the spectra of the most distant stars. We can only speculate how Einstein would have
reacted to the discovery of quasars, the most distant objects known, which are in fact
characterized by extreme redshifts and apparently extraordinary energies.)
The remaining option that Einstein considered for an open asymptotically flat universe is
to require that, for a suitable choice of the system of reference, the metric must go to

    dτ² = dt² − dx² − dy² − dz²

at infinity. However, this explicitly singles out one particular frame of reference as the
absolute inertial frame of the universe, which, as Einstein said, "is contrary to the spirit of
the relativity principle". This was the basis of his early view that general relativity is
most compatible with a closed unbounded universe. The recent astronomical findings
that seem to indicate an accelerating expansion have caused most scientists to abandon
closed models, but there seems to be some lack of appreciation for the damage an open
universe does to the epistemological strength of general relativity. As Einstein wrote in
1945, "the introduction of [the cosmological constant] constitutes a complication of the
theory, which seriously reduces its logical simplicity".
Of course, in both an open and a closed universe there must be boundary and/or initial
conditions, but the question is whether the distribution of mass-energy by itself is
adequate to define the field, or whether independent boundary conditions are necessary to
pin down the field. In a closed universe the "boundary conditions" can be more directly
identified with the distribution of mass-energy, whereas in an open universe they are
necessarily quite independent. Thus a closed universe can claim to satisfy Mach's
principle at least to some degree, whereas an open universe definitely can't. The
seriousness of this depends on how seriously we take Mach's principle. Since we can just
as well regard a field as a palpable constituent of the universe, and since the metric of
spacetime itself is a field in general relativity, it can be argued that Mach's dualistic view
is no longer relevant. However, the second issue is whether even the specification of the
distribution of mass-energy plus boundary conditions at infinity yields a unique solution.
For Maxwell's equations (which are linear) it does, but for Einstein's equations (which are
non-linear) it doesn't. This is perhaps what Misner, et al, are referring to when they
comment that "Einstein's theory...demands closure of the geometry in space ... as a
boundary condition on the initial value equations if they are to yield a well-determined
(and, we now know, a unique) 4-geometry".
In view of this, we might propose the somewhat outlandish argument that the (apparent)
uniqueness of the metrical field supports the idea of a closed universe - at least within the
context of general relativity. To put it more explicitly, if we believe the structure of the
universe is governed by general relativity, and that the structure is determinate, then the
universe must be closed. If the universe is not closed, then general relativity must be
incomplete in the sense that there must be something other than general relativity
determining which of the possible structures actually exists. Admittedly, completeness in
this sense is a very ambitious goal for any theory, but it's interesting to recall the famous
"EPR" paper in which Einstein criticized quantum mechanics on the grounds that it could
not be a complete description of nature. He may well have had this on his mind when he
pointed out how seriously the introduction of a cosmological constant undermines the
logical simplicity of general relativity, which was always his criterion for evaluating the
merit of any scientific theory.
We can see him wrestling with this issue, even in his 1917 paper, where he notes that
some people (such as de Sitter) have argued that we have no need to consider boundary
conditions at infinity, because we can simply specify the metric at the spatial limit of the
domain under consideration, just as we arbitrarily (or empirically) specify the inertial
frames when working in Newtonian mechanics. But this clearly reduces general
relativity to a rather weak theory that must be augmented by other principles and/or
considerable amounts of arbitrary information in order to yield determinate results. Not
surprisingly, Einstein was unenthusiastic about this alternative. As he said, "such a
complete resignation in this fundamental question is for me a difficult thing. I should not
make up my mind to it until every effort to make headway toward a satisfactory view had
proved to be in vain".
7.2 The Formation and Growth of Black Holes
It is a light thing for the shadow to go down ten degrees:
nay, but let the shadow return backward ten degrees.
2 Kings 20
One of the most common questions about black holes is how they can even exist if it
takes infinitely long (from the perspective of an outside observer) for anything to reach
the event horizon. The usual response to this question is to explain that although the
Schwarzschild coordinates are ill-behaved at the event horizon, the intrinsic structure of
spacetime itself is well-behaved in that region, and an infalling object passes through the
event horizon in a finite amount of its own proper time. This is certainly an accurate description
of the Schwarzschild structure, but it doesn't fully address the question, which can be
summarized in terms of the following two seemingly contradictory facts:
(1) An event horizon can grow in finite coordinate time only if the mass
contained inside the horizon increases in finite coordinate time.
(2) According to the Schwarzschild metric, nothing crosses the event
horizon in finite coordinate time.
Item (1) is a consequence of the fact that, as in Newtonian gravity, the field contributed
by a (static) spherical shell on its interior is zero, so an event horizon can't be expanded
by accumulating mass on its exterior. Nevertheless, if mass accumulates near the exterior
of a black hole's event horizon the gravitational radius of the combined system must
eventually (in finite coordinate time) increase far enough to encompass the accumulated
mass, leading unavoidably to the conclusion that matter from the outside must reach the
interior in finite coordinate time, which seems to directly conflict with Item 2 (and
certainly seems inconsistent with the "frozen star" interpretation). To resolve this
apparent paradox requires a careful examination of the definition of a black hole, and this
leads directly to several interesting results, such as the fact that if two black holes merge,
then their event horizons are contiguous, and have been so since they were formed.
The matter content of a black hole is increased when it combines with another black hole,
but in such a case we obviously aren't dealing with a simple "one-body problem", so the
spherically symmetrical Schwarzschild solution is not applicable. Lacking an exact
solution of the field equations for the two-body problem, we can at least get a qualitative
idea of the process by examining the "trousers" topology shown below:

As we progress through the sequence of external time slices the first event horizon
appears at A, then another appears at B, then at C, and then A and B merge together. The
"surfaces" of the trousers represent future null infinity (I+) of the external region,
consistent with the definition of black holes as regions of spacetime that are not in the
causal past of future null infinity. (If the universe is closed, the "ceiling" from which
these "stalactites" descend is at some finite height, and our future boundary is really just a
single surface. In such a universe these protrusions of future infinity are not true "event
horizons", making it difficult to give a precise definition of a black hole. In this
discussion we assume an infinite open universe.) The "interior" regions enclosed by these
surfaces are, in a sense, beyond the infinite future of our region of spacetime. If we
regard a small test object as a point particle with zero radius then it's actually a black hole
too, and the process of "falling in" to a "macro" black hole would simply be the trousers
operation of merging the two I+ surfaces together, just like the merging of two macro
black holes.
On this basis the same interpretation would apply to the original formation of a macro
black hole, by the coalescing of the I+ surfaces represented by the individual particles of
the original collapsing star. Thus, we can completely avoid the "paradox" of black hole
formation by considering all particles of matter to already be black holes. According to
this view, it makes no sense to talk about the "interior" of a black hole, any more than it
makes sense to talk about what's "outside" the universe, because the surface of a black
hole is a boundary (future null infinity) of the universe.
Unfortunately, it isn't at all clear that small particles of matter can be regarded as black
holes surrounded by their own microscopic event horizons, so the "trousers" approach
may not be directly applicable to the accumulation of small particles of "naked matter"
(i.e., matter not surrounded by an event horizon). We'd like an explanation for the
absorption of matter into a black hole that doesn't rely on this somewhat peculiar model
of matter. To reconcile the Schwarzschild solution with the apparent paradox presented
by items (1) and (2) above, it's worthwhile to recall from Section 6.4 what a radial
free-fall path really looks like in simple Schwarzschild geometry. The test particle starts at
radius r = 10m and t = 0. The purple curve represents the radius vs. the particle's proper
time, showing a simple well-behaved cycloidal path right down to r = 0, whereas the
green curve represents the particle's radius vs. Schwarzschild coordinate time. The latter
shows that the infalling object traverses through infinite coordinate time in order to reach
the event horizon, and then traverses back through coordinate time until reaching r = 0 (in
the interior) in a net coordinate time that is not too different from the elapsed proper time.
In other words, the object goes infinitely far into the "future" (of coordinate time), and
then infinitely far back to the "present" (also in coordinate time), and since these two
segments must always occur together, we can "re-normalize" the round trip and just deal
with the net change in coordinate time (for any radius other than precisely r = 2m).
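Both aspects can be exhibited numerically: the proper time to reach r = 2m + ε stays finite
as ε shrinks, while the coordinate time grows like −2m ln(ε). A minimal sketch (not from
the original text), using the standard relations dr/dτ = −√(2m/r − 2m/r0) and
dt/dτ = √(1 − 2m/r0)/(1 − 2m/r) for free fall from rest at r0 = 10m:

    # Elapsed proper time vs. Schwarzschild coordinate time down to r = 2m + eps.
    import numpy as np
    from scipy.integrate import quad

    m, r0 = 1.0, 10.0
    E = np.sqrt(1 - 2*m/r0)                    # conserved energy per unit mass
    v = lambda r: np.sqrt(2*m/r - 2*m/r0)      # |dr/dtau| for fall from rest at r0
    for eps in (1.0, 0.1, 0.01, 0.001):
        tau = quad(lambda r: 1/v(r), 2*m + eps, r0)[0]
        t = quad(lambda r: E/((1 - 2*m/r)*v(r)), 2*m + eps, r0)[0]
        print(eps, round(tau, 2), round(t, 2))   # tau converges, t diverges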
It shouldn't be surprising that the infalling object is in two places (both inside and outside
the event horizon) at the same coordinate time, because worldlines need not be
single-valued in terms of arbitrary curvilinear coordinates. Still, it might seem that this "dual
presence" opens the door to time-travel paradoxes. For example, we can observe the
increase in the gravitational radius at some finite coordinate time, when the particle that
caused the increase has still not yet crossed the event horizon (using the terms "when"
and "not yet" in the sense of coordinate time), so it might seem that we have the
opportunity to retrieve the particle before it crosses the horizon, thus preventing the
increase that triggered our retrieval! However, if we carefully examine the path of the
particle, both outside and inside the event horizon, we find that by the time it has gotten
"back" close to our present coordinate time on the interior branch, the exterior branch is
past the point of last communication. Even a photon could not catch up with it prior to
crossing the horizon. The "backward" portion of the particle's trajectory through
coordinate time inside the horizon ends just short of enabling any causality paradoxes.
(It's apparent from these considerations that classical relativity must be a strictly
deterministic theory - in which each worldline can be treated as already existing in its
entirety - because we could construct genuine paradoxes in a non-deterministic theory.)
At this point it's worth noticing that our two strategies for explaining the formation and
growth of black holes are essentially the same! In both cases the event horizon "reaches
back" to us all the way from future null infinity. In a sense, that's why the infalling
geodesics in Schwarzschild space go to infinity at the event horizon. To show the
correspondence more clearly, we can turn the figure in Section 6.4 on end (so the
coordinate time axis is vertical) and then redraw the constant-t lines as curves so as to
accurately represent the absolute spacetime intervals. The result is shown below for a
small infalling test particle:

Notice that the infalling worldline passes through all the Schwarzschild time slices t as it
crosses the event horizon. Now suppose we take a longer view of this, beginning all the
way back at the point of formation of the black hole, and suppose the infalling mass is
significant relative to the original mass m. The result looks like this:

This shows how the stalactite reaches down from future infinity, and how the infalling
mass passes through this infinity - but in finite proper time - to enter the interior of the
black hole, and the event horizon expands accordingly. This figure is based on the actual
spacetime intervals, and shows how the lines of constant Schwarzschild time t wrap
around the exterior of the event horizon down to the point of formation, where they enter
the interior of the black hole and "expand" back close to the region where they originated
on the outside.
One thing that sometimes concerns people when they look at a radial free-fall plot in
Schwarzschild coordinates is related to the left hand side of the ballistic trajectory. Does
the symmetry of the figure imply that we could launch a particle from r = 0, have it climb
up to 5m, and then drop back down? No, because the light cones have tipped over at 2m,
so the timelike and spacelike axes are reversed. Inside the event horizon the effective
time axis points parallel to "r". As a result, although the left hand trajectory in the region
above 2m is possible, the portion for r less than 2m is not; it's really just the
time-reversed version of the right hand side. (We could also imagine a topology in which all
inward and outward trajectories are realized (Kruskal space), but there is no known
mechanism that would generate such a structure.)
Still, it's valid to ask "how did we decide which way was forward in time inside the event
horizon?" The only formal requirement seems to be that our choice be consistent for any
given event horizon, always increasing r or always decreasing r. If we make one choice of
sign convention we have a "white hole" spewing objects outward into our universe,
whereas if we make the opposite choice we have a black hole, drawing things inward.

The question of whether we should expect to find as many white holes as black holes in
the universe is still a subject of lively debate.
In the foregoing, reference was made to mass accumulating "near" the horizon, but we need
to be careful about the concept of nearness. The intended meaning in the above context
was that the mass is (1) exterior to the event horizon, and (2) within a small increment Δr
of the horizon, where r is the radial Schwarzschild coordinate. I've also assumed spherical
symmetry so that the Schwarzschild solution and Birkhoff's uniqueness proof apply
(meaning that the spacetime in the interior of an empty spherically symmetrical shell is
necessarily flat).
Of course, in terms of the spacelike surfaces of simultaneity of an external particle, the
event horizon is always infinitely far away, or, more accurately, the horizon doesn't
intersect with any external spacelike surface, with the exception of the single degenerate
time&space-like surface precisely at 2m, where the external time and space surfaces
close on each other like scissors (and then swap roles in the interior). So in terms of these
coordinates the particle is infinitely far from the horizon right up to the instant it crosses
the horizon! And this is the same "instant" that every other infalling object crosses the
horizon, although separated by great "distances". (This isn't really so strange. Midnight
tonight is infinitely far from us in this same sense, because it is no finite spatial distance
away, and it will remain so until the instant we reach it. Likewise the event horizon is
ahead of us in time, not in space.)
Incidentally, I should probably qualify my dismissal of the "frozen star" interpretation,
because there's a sense in which it's valid, or at least defensible. Remember that
historically the two most common conceptual models for general relativity have been the
"geometric interpretation" (as exemplified by Misner/Thorne/Wheeler's "Gravitation")
and the "field interpretation" (as in Weinberg's "Gravitation and Cosmology"). These two
views are operationally equivalent outside event horizons, but they tend to lead to
different conceptions of the limit of gravitational collapse. According to the field
interpretation, a clock runs increasingly slowly as it approaches the event horizon (due to
the strength of the field), and the natural "limit" of this process is that the clock just
asymptotically approaches "full stop" (i.e., running at a rate of zero) as it approaches the
horizon. It continues to exist for the rest of time, but it's "frozen" due to the strength of
the gravitational field. Within this conceptual framework there's nothing more to be said
about the clock's existence. This leads to the "frozen star" conception of gravitational
collapse. In contrast, according to the geometric interpretation, all clocks run at the same
rate, measuring out real distances along worldlines in spacetime. This leads us to think
that, rather than slowing down as it approaches the event horizon, the clock is following a
shorter and shorter path to the future. In fact, the path gets shorter at such a rate that it
actually reaches (our) future infinity in finite proper time. Now what? If we believe the
clock is still running just like every other clock (and there's no local pathology of the
spacetime) then it seems natural to extrapolate the clock's existence right past our future
infinity and into another region of spacetime. Obviously this implies that the universe has
a "transfinite topology", which some people find troubling, but there's nothing logically
contradictory about it (assuming the notion of an infinite continuous universe is not itself
logically contradictory).
In both of these interpretations we find that an object goes to future infinity (of
coordinate time) as it approaches an event horizon, and its rate of proper time as a
function of coordinate time goes to zero. The difference is that the field interpretation is
content to truncate its description at the event horizon, while the geometric interpretation
carries on with its description right through the event horizon and down to r = 0 (where it
too finally gives up). What, if anything, is gained by extrapolating the worldlines of
infalling objects through the event horizon? One obvious gain is that it offers a prediction
of what would be experienced by an infalling observer. Since this represents a worldline
that we could, in principle, follow, and since the formulas of relativity continue to make
coherent predictions along those worldlines, there doesn't seem to be any compelling
reason to truncate our considerations at the horizon. After all, if we limit our view of the
universe to just the worldlines we have followed, or that we intend to follow, we end up
with a very oddly shaped universe.
On the other hand, the "frozen star" interpretation does have the advantage of simplifying
the topology, i.e., it allows us to exclude event horizons separating transfinite regions of
spacetime. More importantly, by declining to consider the fate of infalling worldlines
through the event horizon, we avoid dealing with the rather awkward issue of a genuine
spacetime singularity at r = 0. Therefore, if the "frozen star" interpretation gave
equivalent predictions for all externally observable phenomena, and was logically
consistent, it would probably be the preferred view. The question is, does the concept of a
"frozen star" satisfy those two conditions? We saw above that the idea of a frozen star as
an empty region around which matter "bunches up" outside an event horizon isn't viable,
because if nothing ever passes from the exterior to the interior of an event horizon (in
finite coordinate time) we cannot accommodate infalling matter. Either the event horizon
expands or it doesn't, and in either case we arrive at a contradiction unless the value of m
inside the horizon increases, and does so in finite coordinate time.
The "trousers topology" described previously is, in some ways, the best of both worlds,
but it relies on a somewhat dubious model of material particles as micro singularities in
spacetime. We've also seen how the analytical continuation of the external free-fall
geodesics into the interior leads to an apparently self-consistent picture of black hole
growth in finite coordinate time, and this picture turns out to be fairly isomorphic to the
trousers model. (Whether it's isomorphic to the truth is another question.) It may be
worthwhile to explicitly describe the situation. Consider a black hole of mass m. The
event horizon has radius r = 2m in Schwarzschild coordinates. Now suppose a large
concentric spherical dust cloud of total mass m surrounding the black hole is slowly pulled
to within a shell of radius, say, 2.1m. The mass of the combined system is 2m, giving it a
gravitational radius of r = 4m, and all the matter is now within r = 4m, so there must be,
according to the unique spherically symmetrical solution of the field equations, an event
horizon at r = 4m. Evidently the dust has somehow gotten inside the event horizon. We
might think that although the event horizon has expanded to 4m, maybe the dust is being
held "frozen" just outside the horizon at, say, 4.1m. But that can't be true because then
there would be only 1m of mass inside the 4m radius, and the horizon would collapse.
Also, this would imply that any dust originally inside 4m must have been pushed
outward, and there is no known mechanism for that to happen.
One possible way around this would be for the density of matter to be limited (by some
mechanism we don't understand) to just sub-critical. In other words, each spherical region
of radius r would be limited to just less than r/2 mass. It might be interesting to figure out
the mass density profile necessary to be just shy of having an event horizon at every
radius r (possibly inverse square?), but the problem with this idea is that there just isn't
any known force that would hold the matter in this configuration. By all the laws we
know it would immediately collapse. Of course, it's easy to posit some kind of Pauli-like
gravitational "exclusion principle" which would simply prohibit two particles of matter
from occupying the same "gravitational state". After all, it's the electron and nucleon
exclusion principles that yield the white dwarf and neutron star configurations,
respectively. The only reason we end up with black holes is because the universe seems
to be one exclusion principle short. Thus, barring any "new physics", there is nothing to
prevent an event horizon from forming and expanding, and this implies that the value of
m inside the horizon increases in finite coordinate time, which conflicts with the "frozen
star" interpretation.
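Incidentally, the "inverse square" guess above is easily confirmed: if every sphere of
radius r is to contain just m(r) = r/2 of mass (in geometric units), the density must satisfy

    ρ(r) = (dm/dr)/(4πr²) = 1/(8πr²)

which is indeed an inverse-square profile.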
The preceding discussion makes clear the fact that general relativity is not a relational
theory. Schwarzschild spacetime represents a cosmology with a definite preferred frame
of reference, the one associated with the time-independent metric components. (Einstein
at first was quite disappointed when he learned that the field equations have such an
explicitly non-Machian solution, i.e., a single mass in an otherwise empty infinite
universe). Of course, we introduced the preferred frame ourselves by imposing spherical
symmetry in the first place, but it's always necessary to impose some boundary or initial
value conditions, and these conditions (in an open infinite universe) unavoidably single
out a particular frame of reference (as discussed further in Section 7.7). That troubled
Einstein greatly, and was his main reason for arguing that the universe must be closed,
because only in that context can we claim that the entire metric is in some sense fully
determined by the distribution of mass-energy. However, there is no precise definition of
a black hole in a closed universe, so for the purposes of this discussion we're committed
to a cosmology with an arbitrarily preferred frame.
To visualize how this preferred frame effectively governs the physics in Schwarzschild
space, consider the following schematic of a black hole:

The star collapsed at point "a", and formed an event horizon of radius 2m in
Schwarzschild coordinates. How far is the observer at "O" from the event horizon? If we
trace along the spacelike surface "t = now" we find that the black hole doesn't exist at
time t = now, which is to say, it is nowhere on the t = now timeslice. The event horizon is
in the future of every external timeslice, all the way to future infinity. In fact, the event
horizon is part of future null infinity. Nevertheless, the black hole clearly affects the
physics on the timeslice t = now. For example, if the "observer" at O looks toward the
"nearby star", his view will be obstructed, i.e., the star will be eclipsed, because the
observer is effectively in the shadow of the infinite future. The size of this shadow will
increase as the size of the event horizon increases.
Thus we can derive knowledge of a black hole from the shadow it casts (like an eclipse),
noting that the outline of a shadow isn't subject to speed-of-light restrictions, so there's
nothing contradictory about being able to detect the presence and growth of a black hole
region in finite coordinate time. Moreover, if the observer is allowed to fall freely, he will
go mostly leftward (and slightly up) toward r = 0, quickly carrying him through all future
timeslices (which are infinitely compressed around the event horizon) and into the
interior. In doing so, he causes the event horizon to expand slightly.
7.3 Falling Into and Hovering Near A Black Hole
Unless the giddy heaven fall,
And earth some new convulsion tear,
And, us to join, the world should all
Be cramped into a planisphere.
As lines so loves oblique may well
Themselves in every angle greet;
But ours, so truly parallel,
Though infinite, can never meet.
Therefore the love which us doth bind,
But Fate so enviously debars,
Is the conjunction of the mind,
And opposition of the stars.
Andrew Marvell (1621-1678)
The empirical evidence for the existence of black holes or at least something very much
like them has become impressive, although it is arguably still largely circumstantial.
Indeed, most relativity experts, while expressing high confidence (bordering on certainty)
in the existence of black holes, nevertheless concede that since any electromagnetic
signal reaching us must necessarily have originated outside any putative black holes, it
may always be possible to imagine that they were produced by some mechanism just
short of a black hole. Hence we may never acquire, by electromagnetic signals, definitive
proof of the existence of black holes, other than by falling into one. (It's conceivable
that gravitational waves might provide some conclusive external evidence, but no such
waves have yet been detected.)
Of course, there are undoubtedly bodies in the universe whose densities and gravitational
intensities are extremely great, but it isn't self-evident that general relativity remains
valid in these extreme conditions. Ironically, considering that black holes have become
one of the signature predictions of general relativity, the theory's creator published
arguments purporting to show that gravitational collapse of an object to within its
Schwarzschild radius could not occur in nature. In a paper published in 1939, Einstein
argued that if we consider progressively smaller and smaller stationary systems of
particles revolving around each other under their mutual gravitational attraction, the
particles would need to be moving at the speed of light before reaching the critical
density. Similarly Karl Schwarzschild had computed the behavior of a hypothetical
stationary star of uniform density, and found that the pressure must go to infinity as the
star shrank toward the critical radius. In both cases the obvious conclusion is that there
cannot be any stationary configurations of matter above the critical density. Some
scholars have misinterpreted Einstein's point, claiming that he was arguing against the
existence of black holes within the context of general relativity. These scholars
underestimate both Einstein's intelligence and his radicalism. He could not have failed to
understand that sub-light particles (or finite pressure in Schwarzschild's star) meant
unstable collapse to a singular point of infinite density, at least if general relativity holds
good. Indeed this was his point: general relativity must fail. Thus we are not surprised to
find him writing in The Meaning of Relativity
For large densities of field and matter, the field equations and even the field
variables which enter into them have no real significance. One may not therefore
assume the validity of the equations for very high density of field and matter...
The present relativistic theory of gravitation is based on a separation of the
concepts of gravitational field and of matter. It may be plausible that the
theory is for this reason inadequate for very high density of matter...
These reservations were not considered to be warranted by other scientists at the time,
and even less so today, but perhaps they can serve to remind us not to be too dogmatic
about the validity of our theories of physics, especially when extrapolated to very
extreme conditions that have never been (and may never be) closely examined.
Furthermore, we should acknowledge that, even within the context of general relativity,
the formal definition of a black hole may be impossible to satisfy. This is because, as
discussed previously, a black hole is strictly defined as a region of spacetime that is not in
the causal past of any point in the infinite future. Notice that this refers to the infinite
future, because anything short of that could theoretically be circumvented by regions that
are clearly not black holes. However, in some fairly plausible cosmological models the
universe has no infinite future, because it re-collapses to a singularity in finite coordinate
time. In such a universe (which, for all we know, could be our own), the boundary of any
gravitationally collapsed region of spacetime would be contiguous with the boundary of
the ultimate collapse, so it wouldn't really be a separate black hole in the strict sense. As
Wald says, "there appears to be no natural notion of a black hole in a closed
Robertson-Walker universe which re-collapses to a final singularity", and further, "there seems to be
no way to define a black hole in a closed universe, because it requires going to infinity,
but there is no infinity in a closed universe."
It's interesting that this is essentially the same objection that is often raised by people
when they first hear about black holes, i.e., they reason that if it takes infinite coordinate
time for any object to cross an event horizon, and if the universe is going to collapse in a
finite coordinate time, then it's clear that nothing can possess the properties of a true
black hole in such a universe. Thus, in some fairly plausible cosmological models it's not
strictly possible for a true black hole to exist. On the other hand, it is possible to have an
approximate notion of a black hole in some isolated region of a closed universe, but of
course many of the interesting transfinite issues raised by true (perhaps a better name
would be "ideal") black holes are not strictly applicable to an "approximate" black hole.
Having said this, there is nothing to prevent us from considering an infinite open universe
containing full-fledged black holes in all their transfinite glory. I use the word "transfinite" because ideal black holes involve singular boundaries at which the usual
Schwarzschild coordinates for the external field of a gravitating body go to infinity - and
back - as discussed in the previous section. There are actually two distinct kinds of
"spacetime singularities" involved in an ideal black hole, one of which occurs at the
center, r = 0, where the spacetime manifold actually does become unequivocally singular
and the field equations are simply inapplicable (as if trying to divide a number by 0). It's
unclear (to say the least) what this singularity actually means from a physical standpoint,
but oddly enough the "other" kind of singularity involved in a black hole seems to shield
us from having to face the breakdown of the field equations. This is because it seems
(although it has not been proved) to be a characteristic of all realistic spacetime
singularities in general relativity that they are invariably enclosed within an event horizon, which is a peculiar kind of singularity that constitutes a one-way boundary between the interior and exterior of a black hole. This is certainly the case with the
standard black hole geometries based on the Schwarzschild and Kerr solutions. The
proposition that it is true for all singularities is sometimes called the Cosmic Censorship
Conjecture. Whether or not this conjecture is true, it's a remarkable fact that at least some (if not all) of the singular solutions of Einstein's field equations automatically enclose the singularity inside an event horizon, an amazing natural contrivance that effectively shields the universe from direct two-way exposure to any regions in which the metric of spacetime breaks down.
Perhaps because we don't really know what to make of the true singularity at r = 0, we
tend to focus our attention on the behavior of physics near the event horizon, which, for a
non-rotating black hole, resides at the radial location r = 2m, where the Schwarzschild
coordinates become singular. Of course, a singularity in a coordinate system doesn't
necessarily represent a pathology of the manifold. (Consider traveling due East at the
North Pole). Nevertheless, the fact that no true black hole can exist in a finite universe
shows that the coordinate singularity at r = 2m is not entirely inconsequential, because it
does (or at least can) represent a unique boundary between fundamentally separate
regions of spacetime, depending on the cosmology. To understand the nature of this
boundary, it's useful to consider hovering near the event horizon of a black hole. The components of the curvature tensor at r = 2m are on the order of 1/m², so the spacetime can theoretically be made arbitrarily "flat" (Lorentzian) at that radius by making m large enough. Thus, for an observer "hovering" at a value of r that exceeds 2m by some small fixed amount Δr, the downward acceleration required to resist the inward pull can be arbitrarily small for sufficiently large m. However, in order for the observer to be hovering close to 2m his frame must be tremendously "boosted" in the radial direction relative to an in-falling particle. This is best seen in terms of a spacetime diagram such as the one below, which shows the future light cones of two events located on either side of a black hole's event horizon.

[Figure: future light cones of two events on either side of the event horizon, in (r, t') coordinates]

In this drawing r is the radial Schwarzschild coordinate and t' is an Eddington-Finkelstein mapping of the Schwarzschild time coordinate, i.e.,

t' = t + 2m ln(r/2m − 1)
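As a concrete check on this picture, the slopes of the light-cone edges can be computed directly from the mapping above: the ingoing ray has dt'/dr = −1 everywhere, while the outgoing ray has dt'/dr = (r + 2m)/(r − 2m). The following minimal Python sketch (my own illustration, in geometric units G = c = 1) tabulates both:

    import numpy as np

    def cone_edges(r, m):
        """Slopes dt'/dr of the null-ray edges at radius r (valid for r > 2m)."""
        ingoing = -1.0                    # ingoing rays run at slope -1 everywhere
        outgoing = (r + 2*m)/(r - 2*m)    # blows up (edge goes vertical) at r = 2m
        return ingoing, outgoing

    m = 1.0
    for r in [2.001, 2.1, 3.0, 10.0, 100.0]:
        tin, tout = cone_edges(r, m)
        print(f"r/2m = {r/(2*m):6.3f}   ingoing {tin:+.2f}   outgoing {tout:+10.3f}")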
The right-hand ray of the cone for the event located just inside the event horizon is tilted
just slightly to the left of vertical, whereas the cone for the event just outside 2m is tilted
just slightly to the right of vertical. The rate at which this "tilt" changes with r is what
determines the curvature and acceleration, and for a sufficiently large black hole this rate
can be made negligibly small. However, by making this rate small, we also make the
outward ray more nearly "vertical" at a given r above 2m, which implies that the
hovering observer's frame needs to be even more "boosted" relative to the local frame of
an observer falling freely from infinity. The gravitational potential, which need not be
changing very steeply at r = 2m, has nevertheless changed by a huge amount relative to
infinity. We must be very deep in a potential hole in order for the light cones to be tilted
that far, even though the rate at which the tilt has been increasing can be arbitrarily slow.
This just means that for a super-massive black hole they started tilting at a great distance.
As can be seen in the diagram, relative to the frame of a particle falling in from infinity, a
hovering observer must be moving outward at near light velocity. Consequently his axial
distances are tremendously contracted, to the extent that, if the value of r is normalized
to his frame of reference, he is actually a great distance (perhaps even light-years) from
the r = 2m boundary, even though he is just 1 inch above r = 2m in terms of the
Schwarzschild coordinate r. Also, the closer he tries to hover, the more radial boost he
needs to hold that value of r, and the more contracted his radial distances become. Thus
he is living in a thinner and thinner shell of r, but from his own perspective there's a
world of room. Assuming he brought enough rocket fuel to accelerate himself up to this "hovering frame" at that radius 2m + Δr (or actually to slow himself down to a hovering frame), he would thereafter just need to resist the local acceleration of gravity to maintain that frame of reference.
Quantitatively, for an observer hovering at a small Schwarzschild distance Δr above the horizon of a black hole, the radial distance Δr' to the event horizon with respect to the observer's local coordinates would be

Δr' = ∫ dr/√(1 − 2m/r)  (integrated from r = 2m to r = 2m + Δr)  =  √(Δr(2m + Δr)) + 2m ln[(√(2m + Δr) + √Δr)/√(2m)]

which approaches 2√(2mΔr) as Δr goes to zero. This shows that as the observer hovers closer to the horizon in terms of Schwarzschild coordinates, his "proper distance" remains relatively large until he is nearly at the horizon. Also, the derivative of Δr' with respect to Δr in this range is √((2m + Δr)/Δr), which goes to infinity as Δr goes to zero. (These relations pertain to a truly static observer, so they don't apply when the observer is moving from one radial position to another, unless he moves sufficiently slowly.)
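The integral above is easy to evaluate numerically as a sanity check. This short Python sketch (an illustration, not part of the original derivation; geometric units) substitutes r = 2m + s² to remove the integrable endpoint singularity and compares the result with the closed form and with the limiting value 2√(2mΔr):

    import numpy as np
    from scipy.integrate import quad

    def proper_distance(m, dr):
        # substitute r = 2m + s^2, so the integrand becomes 2*sqrt(2m + s^2)
        val, _ = quad(lambda s: 2.0*np.sqrt(2*m + s**2), 0.0, np.sqrt(dr))
        return val

    m = 1.0
    for dr in [1e-1, 1e-3, 1e-6]:
        closed = np.sqrt(dr*(2*m + dr)) + 2*m*np.log((np.sqrt(2*m + dr) + np.sqrt(dr))/np.sqrt(2*m))
        limit = 2*np.sqrt(2*m*dr)
        print(f"dr = {dr:6.0e}  integral {proper_distance(m, dr):.6f}  closed {closed:.6f}  limit {limit:.6f}")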


Incidentally, it's amusing to note that if a hovering observer's radial distance contraction factor at r was 1 − 2m/r instead of the square root of that quantity, his scaled distance to the event horizon at a Schwarzschild distance of Δr would be Δr' = 2m + Δr. Thus when he is precisely at the event horizon his scaled distance from it would be 2m, and he wouldn't achieve zero scaled distance from the event horizon until arriving at the origin r = 0 of the Schwarzschild coordinates. This may seem rather silly, but it's actually quite similar to one of Einstein's proposals for avoiding what he regarded as the unpleasant features of the Schwarzschild solution at r = 2m. He suggested replacing the radial coordinate r with ρ = √(r − 2m), and noted that the Schwarzschild solution expressed in terms of this coordinate behaves regularly for all values of ρ. Whether or not there is any merit in this approach, it clearly shows how easily we can eliminate "poles" and "singularities" simply by applying coordinates that have canceling zeros (much as one does in the design of control systems) or otherwise restricting the domain of the variables. However, we shouldn't assume that every arbitrary system of coordinates has physical significance.
What "acceleration of gravity" would a hovering observer feel locally near the event
horizon of a black hole? In terms of the Schwarzschild coordinate r and the proper time
of the particle, the path of a radially free-falling particle can be expressed parametrically
in terms of the parameter by the equations

where R is the apogee of the path (i.e., the highest point, where the outward radial
velocity is zero). These equations describe a cycloid, with = 0 at the top, and they are
valid for any radius r down to 0. We can evaluate the second derivative of r with respect
to as follows

At = 0 the path is tangent to the hovering worldline at radius R, and so the local
gravitational acceleration in the neighborhood of a stationary observer at that radius
equals m/R2, which implies that if R is approximately 2m the acceleration of gravity is
about 1/(4m). Thus the acceleration of gravity in terms of the coordinates r and is finite
at the event horizon, and can be made arbitrarily small by increasing m.
However, this acceleration is expressed in terms of the Schwarzschild radial parameter r, whereas the hovering observer's radial distance r' must be scaled by the gravitational boost factor, i.e., we have dr' = dr/√(1 − 2m/r). Substituting this expression for dr into the above formula gives the proper local acceleration of a stationary observer

a = m / (r² √(1 − 2m/r))

This value of acceleration corresponds to the amount of rocket thrust an observer would need in order to hold position, and we see that it goes to infinity as r goes to 2m. Nevertheless, for any ratio r/(2m) greater than 1 we can still make this acceleration arbitrarily small by choosing a sufficiently large m. On the other hand, an enormous amount of effort would be required to accelerate the rocket into this hovering condition for values of r/(2m) very close to 1. This amount of boost effort cannot be made arbitrarily small, because it essentially amounts to accelerating the rocket (outwardly) to nearly the speed of light relative to the frame of a free-falling particle from infinity.
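Both tendencies, divergence of the required thrust as r approaches 2m for fixed m, and the 1/m falloff for a fixed ratio r/(2m), are visible in a short numerical sketch (illustrative values, geometric units):

    import numpy as np

    def hover_acceleration(m, r):
        """Proper acceleration m/(r^2 sqrt(1 - 2m/r)) of a stationary observer."""
        return m/(r**2*np.sqrt(1.0 - 2*m/r))

    for m in [1.0, 1e3, 1e6]:            # increasing mass, fixed ratio r/(2m) = 1.001
        r = 2*m*1.001
        print(f"m = {m:9.0f}   a = {hover_acceleration(m, r):.3e}")
    for ratio in [1.1, 1.01, 1.001]:     # fixed mass m = 1, approaching the horizon
        print(f"r/2m = {ratio:6.3f}   a = {hover_acceleration(1.0, 2*ratio):.3e}")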
Interestingly, as the preceding figure suggests, an outward-going photon can hover precisely at the event horizon, since at that location the outward edge of the light cone is vertical. This may seem surprising at first, considering that the proper acceleration of gravity at that location is infinite. However, the proper acceleration of a photon is indeed infinite, since the edge of a light cone can be regarded as hyperbolic motion with acceleration a in the limit as a goes to infinity, as illustrated in the figure below.

[Figure: hyperbolic worldlines of increasing proper acceleration approaching the edge of the light cone]
Also, it remains true that for any fixed Δr above the horizon we can make the proper acceleration arbitrarily small by increasing m. To see this, note that if r = 2m + Δr for a sufficiently small increment Δr we have m/r ≈ 1/2, and we can bring the other factor of r into the square root to give

a ≈ 1 / (2√(r(r − 2m))) = 1 / (2√(Δr(2m + Δr)))
Still, these formulas contain a slight "mixing of metaphors", because they refer to two different radial parameters (Δr' and Δr) with different scale factors. To remedy this, we can define the locally scaled radial increment Δr' = √(Δr(2m + Δr)) as the hovering observer's proper distance from the event horizon. Then, since Δr = r − 2m, we have Δr'² = Δr² + 2mΔr, and so Δr = √(m² + Δr'²) − m. Substituting this into the formula for the proper local acceleration gives the proper acceleration of a stationary observer at a "proper distance" Δr' above the event horizon of a (non-rotating) object of mass m:

a = m / [Δr' (m + √(m² + Δr'²))]
Notice that as Δr'/m becomes small the acceleration approaches 1/(2Δr'), which is the asymptotic proper acceleration at a small "proper distance" Δr' from the event horizon of a large black hole. Thus, for a given proper distance Δr' the proper acceleration can't be made arbitrarily small by increasing m. Conversely, for a given proper acceleration g our hovering observer can't be closer than 1/(2g) of proper distance, even as m goes to infinity. For example, the closest an observer can get to the event horizon of a super-massive black hole while experiencing no more than 1g of proper acceleration is about half a light-year of proper distance. At the other extreme, if Δr'/m is very large, as it is in normal circumstances between gravitating bodies, then this acceleration approaches m/(Δr')², which is just Newton's inverse-square law of gravity in geometrical units.
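The half-light-year figure is just a unit conversion, as the following sketch confirms (with 1g taken as 9.81 m/s²):

    c = 2.998e8                     # speed of light, m/s
    g = 9.81/c**2                   # 1g expressed in geometric units, 1/m
    dr_min = 1.0/(2*g)              # closest "proper distance" ~ 1/(2g), in meters
    print(dr_min/9.461e15, "light-years")   # -> roughly 0.5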
We've seen that the amount of local acceleration that must be overcome to hover at a
radial distance r increases to infinity at r = 2m, but this doesn't imply that the
gravitational curvature of spacetime at that location becomes infinite. The components of
the curvature tensor depend to some extent on the choice of coordinate systems, so we
can't simply examine the components of R_μνστ to ascertain whether the intrinsic curvature is actually singular at the event horizon. For example, with respect to the Schwarzschild coordinates the non-zero components of the covariant curvature tensor are

R_trtr = −2m/r³        R_tθtθ = (m/r)(1 − 2m/r)        R_tφtφ = (m/r)(1 − 2m/r) sin²θ
R_rθrθ = −(m/r)/(1 − 2m/r)        R_rφrφ = −(m/r) sin²θ/(1 − 2m/r)        R_θφθφ = 2mr sin²θ
along with the components related to these by symmetry. The two components relating
the radial coordinate to the spherical surface coordinates are singular at r = 2m, but this is
again related to the fact that the Schwarzschild coordinates are not well-behaved on this
manifold near the event horizon. A more suitable system of coordinates in this region (as noted by Misner, et al) is constructed from the basis vectors

e_t = (1/λ) ∂/∂t        e_r = λ ∂/∂r        e_θ = (1/r) ∂/∂θ        e_φ = (1/(r sin θ)) ∂/∂φ

where λ = √(1 − 2m/r). With respect to this "hovering" orthonormal system of coordinates the non-zero components of the curvature tensor (up to symmetry) are

R_trtr = −2m/r³        R_tθtθ = R_tφtφ = m/r³        R_rθrθ = R_rφrφ = −m/r³        R_θφθφ = 2m/r³
Interestingly, if we transform to the orthonormal coordinates of a free-falling particle, the curvature components remain unchanged. Plugging in r = 2m, we see that these components are all proportional to 1/m² at the event horizon, so the intrinsic spacetime curvature at r = 2m is finite. Indeed, for a sufficiently large mass m the curvature can be made arbitrarily mild at the event horizon. If we imagine the light cone at a radial
made arbitrarily mild at the event horizon. If we imagine the light cone at a radial
coordinate r extremely close to the horizon (i.e., such that r/(2m) is just slightly greater
than 1), with its outermost ray pointing just slightly in the positive r direction, we could
theoretically boost ourselves at that point so as to maintain a constant radial distance r,
and thereafter maintain that position with very little additional acceleration (for
sufficiently large m). But, as noted above, the work that must be expended to achieve this
hovering condition from infinity cannot be made arbitrarily small, since it requires us to
accelerate to nearly the speed of light.
Having discussed the prospects for hovering near a black hole, let's review the process by
which an object may actually fall through an event horizon. If we program a space probe
to fall freely until reaching some randomly selected point outside the horizon and then
accelerate back out along a symmetrical outward path, there is no finite limit on how far
into the future the probe might return. This sometimes strikes people as paradoxical,
because it implies that the in-falling probe must, in some sense, pass through all of
external time before crossing the horizon, and in fact it does, if by "time" we mean the
extrapolated surfaces of simultaneity for an external observer. However, those surfaces
are not well-behaved in the vicinity of a black hole. It's helpful to look at a drawing like
this:

[Figure: analytically continued external time slices outside the event horizon, with an in-falling worldline crossing them]

This illustrates schematically how the analytically continued surfaces of simultaneity for external observers are arranged outside the event horizon of a black hole, and how the in-falling object's worldline crosses (intersects with) every timeslice of the outside world prior to entering a region beyond the last outside timeslice. The dotted timeslices can be modeled crudely as simple "right" hyperbolic branches of the form t_j − T = 1/R; we just repeat this same −y = 1/x shape, shifted vertically, up to infinity. Notice that all of these infinitely many time slices curve down and approach the same asymptote on the left. To get to the "last timeslice" an object must go infinitely far in the vertical direction, but only finitely far in the horizontal (leftward) direction.
The key point is that if an object goes to the left, it crosses every single one of the analytically continued timeslices of the outside observers, all the way to their future
infinity. Hence those distant observers can always regard the object as not quite reaching
the event horizon (the vertical boundary on the left side of this schematic). At any one of
those slices the object could, in principle, reverse course and climb back out to the
outside observers, which it would reach some time between now and future infinity.
However, this doesn't mean that the object can never cross the event horizon (assuming it
doesn't bail out). It simply means that its worldline is present in every one of the outside
timeslices. In the direction it is traveling, those time slices are compressed infinitely close
together, so the in-falling object can get through them all in finite proper time (i.e., its
own local time along the worldline falling to the left in the above schematic).
Notice that the temporal interval between two definite events can range from zero to
infinity, depending on whose time slices we are counting. One observer's time is another
observer's space, and vice versa. It might seem as if this degenerates into chaos, with no
absolute measure for things, but fortunately there is an absolute measure. It's the absolute
invariant spacetime interval "ds" between any two neighboring events, and the absolute
distance along any specified path in spacetime is just found by summing up all the "ds"
increments along that path. For any given observer, a local absolute increment ds can be
projected onto his proper time axis and local surface of simultaneity, and these
projections can be called dt, dx, dy, and dz. For a sufficiently small region around the
observer these components are related to the absolute increment ds by the Minkowski or
some other flat metric, but in the presence of curvature we cannot unambiguously project
the components of extended intervals. The only unambiguous way of characterizing
extended intervals (paths) is by summing the incremental absolute intervals along a given
path.
An observer obviously has a great deal of freedom in deciding how to classify the
locations of putative events relative to himself. One way (the conventional way) is in
terms of his own time-slices and spatial distances as measured on those time slices, which
works fairly well in regions where spacetime is flat, although even in flat spacetime it's
possible for two observers to disagree on the lengths of objects and the spatial and
temporal distances between events, because their reference frames may be different.
However, they will always agree on the ds between two events. The same is true of the
integrated absolute interval along any path in curved spacetime. The dt, dx, dy, dz components can do all sorts of strange things, but observers will always agree on ds.
This suggests that rather than trying to map the universe with a "grid" composed of time
slices and spatial distances on those slices, an observer might be better off using a sort of
"polar" coordinate system, with himself at the center, and with outgoing geodesic rays in
all directions and at all speeds. Then for each of those rays he measures the total ds
between himself and whatever is "out there". This way of "locating" things could be
parameterized in terms of the coordinate system [θ, φ, β, s], where θ and φ are just ordinary latitude and longitude angles to determine a direction in space, β is the velocity of the outgoing ray (divided by c), and s is the integrated ds distance along that ray as it emanates out from the origin to the specified point along a geodesic path. (Incidentally,
these are essentially the coordinates Riemann used in his 1854 thesis on differential
geometry.) For any event in spacetime the observer can now assign it a location based on
this system of coordinates. If the universe is open, he will find that there are things which
are only a finite absolute distance from him, and yet are not on any of his analytically
continued time slices! This is because there are regions of spacetime where his time slices
never go, specifically, inside the event horizon of a black hole. This just illustrates that an
external observer's time slices aren't a very suitable set of surfaces with which to map
events near a black hole, let alone inside a black hole.
For this reason it's best to measure things in terms of absolute invariant distances rather
than time slices, because time slices can do all sorts of strange things and don't
necessarily cover the entire universe, assuming an open universe. Why did I specify an
open universe? The schematic above depicted an open universe, with infinitely many
external time slices, but if the universe is closed and finite, there are only finitely many
external time slices, and they eventually tip over and converge on a common singularity, as shown below.

[Figure: time slices of a closed universe tipping over and converging on the final singularity]
In this context the sequence of tj slices eventually does include the vertical slices. Thus, in a closed universe an external observer's time slices do cover the entire universe, which is why there really is no true event horizon in a closed universe. An observer could use his analytically continued time slices to map all events if he wished, although they would still make an extremely ill-conditioned system of coordinates near an approximate black hole.
One common question is whether a man falling (feet first) through an event horizon of a black hole would see his feet pass through the event horizon below him. As should be apparent from the schematics above, this kind of question is based on a misunderstanding. Everything that falls into a black hole falls in at the same local time, although spatially separated, just as everything in your city is going to enter tomorrow at the same time. We generally have no trouble seeing our feet as we pass through midnight tonight, although one minute before midnight it is difficult to look ahead and see your feet one minute after midnight. Of course, for a small black hole you will have to
contend with tidal forces that may induce more spatial separation between your head and
feet than you'd like, but for a sufficiently large black hole you should be able to maintain
reasonable point-to-point co-moving distances between the various parts of your body as
you cross the horizon.
On the other hand, we should be careful not to understate the physical significance of the
event horizon, which some authors have a tendency to do, perhaps in reaction to earlier
over-estimates of its significance. Section 6.4 includes a description of a sense in which
spacetime actually is singular at r = 2m, even in terms of the proper time of an in-falling
particle, but it turns out to be what mathematicians call a "removable singularity", much
like the point x = 0 on the function sin(x)/x. Strictly speaking this "curve" is undefined at
that point, but by analytic continuation we can "put the point back in", essentially by just
defining sin(x)/x to be 1 at x = 0. Whether nature necessarily adheres to analytic
continuation in such cases is an open question.
Finally, we might ask what an observer would find if he followed a path that leads across
an event horizon and into a black hole. In truth, no one really knows how seriously to
take the theoretical solutions of Einstein's field equations for the interior of a black hole,
even assuming an open infinite universe. For example, the "complete" Schwarzschild
solution actually consists of two separate universes joined together at the black hole, but
it isn't clear that this topology would spontaneously arise from the collapse of a star, or
from any other known process, so many people doubt that this complete solution is
actually realized. It's just one of many strange topologies that the field equations of
general relativity would allow, but we aren't required to believe something exists just
because it's a solution of the field equations. On the other hand, from a purely logical
point of view, we can't rule them out, because there aren't any outright logical
contradictions, just some interesting transfinite topologies.

7.4 Curled-Up Dimensions


I do not mind confessing that I personally have often found relief from the
dreary infinities of homaloidal space in the consoling hope that, after all,
this other may be the true state of things.

William Kingdon Clifford, 1873

The simplest cylindrical space can be represented by the perimeter of a circle. This one-dimensional space with the coordinate X has the natural embedding in two-dimensional space with orthogonal coordinates (x1, x2) given by the circle formulas

x1 = R cos(X/R)        x2 = R sin(X/R)

From the derivatives dx1/dX = −sin(X/R) and dx2/dX = cos(X/R) we have the Pythagorean identity (dx1)² + (dx2)² = (dX)². The length of this cylindrical space is 2πR.
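This identity is trivial to confirm symbolically, e.g., with sympy:

    import sympy as sp

    X, R = sp.symbols('X R', positive=True)
    x1, x2 = R*sp.cos(X/R), R*sp.sin(X/R)
    # -> 1, i.e. (dx1)^2 + (dx2)^2 = (dX)^2
    print(sp.simplify(sp.diff(x1, X)**2 + sp.diff(x2, X)**2))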
We can form the Cartesian product of n such cylindrical spaces, with radii R1, R2, ..., Rn respectively, to give an n-dimensional space that is cylindrical in all directions, with a total "volume" of

(2π)ⁿ R1R2...Rn
For example, a three-dimensional space that is everywhere locally Euclidean and yet cylindrical in all directions can be constructed by embedding the three spatial dimensions in a six-dimensional space according to the parameterization

x1 = R1 cos(X/R1)    x2 = R1 sin(X/R1)    x3 = R2 cos(Y/R2)    x4 = R2 sin(Y/R2)    x5 = R3 cos(Z/R3)    x6 = R3 sin(Z/R3)

so the spatial Euclidean line element is

(dx1)² + (dx2)² + ... + (dx6)² = (dX)² + (dY)² + (dZ)²

giving a Euclidean spatial metric in a closed three-space with total volume (2π)³R1R2R3. Subtracting from this an ordinary temporal component gives an everywhere-locally-Lorentzian spacetime that is cylindrical in the three spatial directions, i.e.,

(ds)² = (dX)² + (dY)² + (dZ)² − (dT)²        (1)
However, this last step seems half-hearted. We can imagine a universe cylindrical in all directions, temporal as well as spatial, by embedding the entire four-dimensional spacetime in a manifold of eight dimensions, two of which are purely imaginary, as follows:

x1 = R1 cos(X/R1)    x2 = R1 sin(X/R1)    ...    x7 = iR4 cos(T/R4)    x8 = iR4 sin(T/R4)
This leads again to the locally Lorentzian four-dimensional metric (1), but now all four of
the dimensions X,Y,Z,T are periodic. So here we have an everywhere-locally-Lorentzian
manifold that is closed and unbounded in every spatial and temporal direction. Obviously
this manifold contains closed time-like worldlines, although they circumnavigate the
entire universe. Whether such a universe would appear (locally) to possess a directional
causal structure is unclear.
We might imagine that a flat, closed, unbounded universe of this type would tend to
collapse if it contained any matter, unless a non-zero cosmological constant is assumed.
However, it's not clear what "collapse" would mean in this context. For example, it might mean that the Rn parameters would shrink, but they are not strictly dynamical parameters of the model. The four-dimensional field equations of general relativity operate only on X,Y,Z,T, so we have no context within which the Rn parameters could "evolve". Any "change" in Rn would imply some meta-time parameter λ, so that all the Rn coefficients in the embedding formulas would actually be functions Rn(λ).
Interestingly, the local flatness of the cylindrical four-dimensional spacetime is independent of the value of Rn(λ), so if our "internal" field equations are satisfied for one set of Rn values they would be satisfied for any other values. The meta-time and associated meta-dynamics would be independent of the internal time T for a given observer unless we imagine some "meta field equations" relating λ to the internal parameters X,Y,Z,T. We might even speculate that these meta-equations would allow (require?) the values of Rn to be "increasing" versus λ, and therefore indirectly versus our internal time T = f(λ), in order to ensure stability. (One interesting question raised by these considerations, i.e., by locally flat n-dimensional spaces embedded in flat 2n-dimensional spaces, is whether every orthogonal basis in the n-space maps to an orthogonal basis in the 2n-space according to a set of formulas formally the same as those shown above, and, if not, whether there is a more general mapping that applies to all bases.)
The above totally-cylindrical spacetime has a natural expression in terms of "octonion space", i.e., the Cayley algebra whose elements are ordered pairs of quaternions. Thus each point (X,Y,Z,T) in four-dimensional spacetime represents two quaternions, whose eight components are the embedding coordinates x1, x2, ..., x8 given above. To determine the absolute distances in this eight-dimensional manifold we again consider the eight coordinate differentials, exemplified by

dx1 = −sin(X/R1) dX

(using the rule for total differentials) so the squared differentials are exemplified by

(dx1)² = sin²(X/R1) (dX)²

Adding up the eight squared differentials to give the square of the absolute differential interval leads again to the locally Lorentzian four-dimensional metric (1), i.e., (ds)² = (dX)² + (dY)² + (dZ)² − (dT)².
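The same bookkeeping can be delegated to a computer algebra system. The following sympy sketch uses the embedding as reconstructed above (the assignment of the imaginary pair to the T coordinate is an assumption consistent with the text) and confirms that the eight squared differentials sum to the Lorentzian line element (1):

    import sympy as sp

    X, Y, Z, T = sp.symbols('X Y Z T', real=True)
    R1, R2, R3, R4 = sp.symbols('R1 R2 R3 R4', positive=True)
    dX, dY, dZ, dT = sp.symbols('dX dY dZ dT')

    embed = [R1*sp.cos(X/R1), R1*sp.sin(X/R1),
             R2*sp.cos(Y/R2), R2*sp.sin(Y/R2),
             R3*sp.cos(Z/R3), R3*sp.sin(Z/R3),
             sp.I*R4*sp.cos(T/R4), sp.I*R4*sp.sin(T/R4)]
    coords, diffs = [X, Y, Z, T], [dX, dY, dZ, dT]
    # total differential of each embedding coordinate
    total = [sum(sp.diff(x, c)*d for c, d in zip(coords, diffs)) for x in embed]
    # -> dX**2 + dY**2 + dZ**2 - dT**2
    print(sp.simplify(sp.expand(sum(w**2 for w in total))))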
Naturally it isn't necessary to imagine an embedding of our hypothesized closed


dimensions in a higher-dimensional space, but it can be helpful for visualizing the
structure. One of the first suggestions for closed cylindrical dimensions was made by
Theodor Kaluza in 1919, in a paper communicated to the Prussian Academy by Einstein
in 1921. The idea proposed by Kaluza was to generalize relativity from four to five
dimensions. The introduction of the fifth dimension increases the number of components
of the Riemann metric tensor, and it was hoped that some of this additional structure
would represent the electromagnetic field on an equal footing with the gravitational field on the "left side" of Einstein's field equations, instead of being lumped into the stress-energy tensor T_μν. Kaluza showed that, at least in the weak field limit for low velocities, we can arrange for a five-dimensional manifold with one cylindrical dimension such that geodesic paths correspond to the paths of charged particles under the combined influence of gravitational and electromagnetic fields. In 1926 Oskar Klein proved that the result was valid even without the restriction to weak fields and low velocities.
The fifth dimension seems to have been mainly a mathematical device for Kaluza, with
little physical significance, but subsequent researchers have sought to treat it as a real
physical dimension, and more recent "grand unification theories" have postulated field
theories in various numbers of dimensions greater than four (though none with fewer than
four, so far as I know). In addition to increasing the amount of mathematical structure,
which might enable the incorporation of the electromagnetic and other fields, many
researchers (including Einstein and Bergmann in the 1930's) hoped the indeterminacy of
quantum phenomena might be simply the result of describing a five-dimensional world in
terms of four-dimensional laws. Perhaps by re-writing the laws in the full five dimensions
quantum mechanics could, after all, be explained by a field theory. Alas, as Bergmann
later noted, "it appears these high hopes were unjustified".

Nevertheless, theorists ever since have freely availed themselves of whatever number of
dimensions seemed convenient in their efforts to devise a fundamental "theory of
everything". In nearly all cases the extra dimensions are spatial and assumed to be closed
with extremely small radii in terms of macroscopic scales, thus explaining why it appears
that macroscopic objects exist in just three spatial dimensions. Oddly enough, it is seldom
mentioned that we do, in fact, have six extrinsic relational degrees of freedom, consisting
of the three open translational dimensions and the three closed orientational dimensions, which
can be parameterized (for example) by the Euler angles of a frame. Of course, these three
dimensions are not individually cylindrical, nor do they commute, but at each point in
three-dimensional space they constitute a closed three-dimensional manifold isomorphic
to the group of rotations. It's also worth noting that while translational velocity in the
open dimensions is purely relativistic, angular velocity in the closed dimensions is
absolute, and there is no physical difficulty in discerning a state of absolute non-rotation.
This is interesting because, even though a closed cylindrical space may be locally
Lorentzian, it is globally absolute, in the sense that there is a globally distinguished state
of motion with respect to which an inertial observer's natural surfaces of simultaneity are
globally coherent. In any other state of motion the surfaces of simultaneity are helical in
time, similar to the analytically continued systems of reference of observers at rest on the
perimeter of a rotating disk.
To illustrate, consider two possible worldlines of a single particle P in a one-dimensional
cylindrical space as shown in the spacetime diagrams below.

[Figure: two spacetime diagrams of a particle P in a cylindrical one-dimensional space, with worldline AB identified with worldline CD]
The cylindrical topology of the space is represented by identifying the worldline AB with
the worldline CD. Now, in the left-hand figure the particle P is stationary, and it emits
pulses of light in both directions at event a. The rightward-going pulse passes through
event c, which is the same as event b, and then it proceeds from b to d. Likewise the
leftward-going pulse goes from a to b and then from c to d. Thus both pulses arrive back
at the particle P simultaneously. However, if the particle P is in absolute motion as shown
in the right-hand figure, the rightward light pulse goes from a to c and then from c to d2,
whereas the leftward pulse goes from a to b and then from b to d1, so in this case the

pulses do not arrive back at particle P simultaneously. The absolutely stationary worldlines in this cylindrical space are those for which the diverging-converging light cones remain coherent. (In the one-dimensional case there are discrete absolute speeds greater than zero for which the leftward and rightward pulses periodically re-converge on the particle P.)
Of course, for a different mapping between the events on the line AB and the events on
the line CD we would get a different state of rest. The worldlines of identifiable inertial
entities establish the correct mapping. If we relinquish the identifiability of persistent
entities through time, and under completed loops around the cylindrical dimension, then
the mapping becomes ambiguous. For example, we assume particle P associates the
pulses absorbed at event d with the pulses emitted at event a, although this association is
not logically necessary.
7.5 Packing Universes In Spacetime
All experience is an arch wherethrough
Gleams that untraveled world whose margin fades
Forever and forever when I move.
Tennyson, 1842
One of the interesting aspects of the Minkowski metric is that every lightcone (in
principle) contains infinitely many nearly-complete lightcones. Consider just a single
spatial dimension in which an infinite number of point particles are moving away from
each other with mutual velocities as shown below:

[Figure: an infinite row of particles, each receding from its nearest neighbors at nearly the speed of light]
Each particle finds itself mid-way between its two nearest neighbors, which are receding at nearly the speed of light, so that each particle can be regarded as the origin of a nearly-complete lightcone. On the other hand, all of these particles emanate from a single point, and the entire infinite set of points (and nearly-complete lightcones) resides within the future lightcone of that single point.
More formally, a complete lightcone in a flat Lorentzian xt plane comprises the boundary
of all points reachable from a given point P along world lines with speeds less than 1
relative to any and every inertial worldline through P. Also, relative to any specific inertial frame W we can define an "ε-complete lightcone" as the region reachable from P along world lines with speeds less than (1 − ε) relative to W, for some arbitrarily small ε > 0. A complete lightcone contains infinitely many ε-complete lightcones, as illustrated above by the infinite linear sequence of particles in space, each receding with a speed of (1 − ε) relative to its closest neighbors. Since we can never observe something infinitely red-shifted, it follows that our observable universe can fit inside an ε-complete lightcone just as well as in a truly complete lightcone. Thus a single lightcone in infinite flat Lorentzian spacetime encompasses infinitely many mutually exclusive ε-universes.


If we arbitrarily select one of the particles as the "rest" particle P0, and number the other particles sequentially, we can evaluate the velocities of the other particles with respect to the inertial coordinates of P0, whose velocity is v0 = 0. If each particle has a mutual velocity u relative to each of its nearest neighbors, then obviously P1 has a speed v1 = u. The speed of P2 is u relative to P1, and its speed relative to P0 is given by the relativistic speed composition formula v2 = (v1 + u)/(uv1 + 1). In general, the speed of Pk can be computed recursively based on the speed of Pk−1 using the formula

vk = (vk−1 + u) / (1 + u·vk−1)

This is just a linear fractional function, so we can use the method described in Section 2.6 to derive the explicit formula

vk = [(1 + u)^k − (1 − u)^k] / [(1 + u)^k + (1 − u)^k]
Similarly, in full 3+1 dimensional spacetime we can consider packing ε-complete lightspheres inside a complete lightsphere. A flash of light at point P in flat Lorentzian spacetime emanates outward in a spherical shell as viewed from any inertial worldline through P. We arbitrarily select one such worldline W0 as our frame of reference, and let the slices of simultaneity relative to this frame define a time parameter t. The points of the worldline W0 can be regarded as the stationary center of a 3D expanding sphere at each instant t. On any given time-slice t we can set up orthogonal space coordinates x,y,z relative to W0 and normalize the units so that the radius of the expanding lightsphere at time t equals 1. In these terms the boundary of the lightsphere is just the sphere

x² + y² + z² = 1
Now let W1 denote another inertial worldline through the point P with a velocity v = v1 relative to W0, and consider the region R1 surrounding W1 consisting of the points reachable from P with speeds not exceeding u = u1 relative to W1. The region R1 is spherical and centered on W1 relative to the frame of W1, but on any time-slice t (relative to W0) the region R1 has an ellipsoidal shape. If v is in the z direction then the cross-sectional boundary of R1 on the xz plane is given parametrically by

x = u cos(θ) √((1 − v²)/(1 − u²v²))        z = [v(1 − u²) + u(1 − v²) sin(θ)] / (1 − u²v²)

as θ ranges from 0 to 2π. The entire boundary is just the surface of rotation of this ellipse about the z axis. If v1 has a magnitude of (1 − ε) for some arbitrarily small ε > 0, and if we set u1 = |v1|, then as ε goes to zero the boundary of the region R1 approaches the limiting ellipsoid

x² + y² + 2(z − 1/2)² = 1/2

Similarly if W2 is an inertial worldline with speed |v2| = |v1| in the negative z direction relative to W0, then the boundary of the region R2 consisting of the points reachable from P with speeds not exceeding u2 = |v2| approaches the limiting ellipsoid

x² + y² + 2(z + 1/2)² = 1/2

The regions R1 and R2 are mutually exclusive, meeting only at the point of contact [0,0,0]. Each of these regions can be called an "ε-complete" lightsphere.
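As a check on the parametric boundary just given (which is reconstructed here from the stated conditions), every point on it should lie at relative speed exactly u from the worldline W1. A small numerical sketch bears this out:

    import numpy as np

    def rel_speed(w, v):
        """Magnitude of the relativistic velocity of w relative to v (c = 1)."""
        dv, cross = w - v, np.cross(w, v)
        return np.sqrt(dv @ dv - cross @ cross)/(1 - w @ v)

    v, u = 0.9, 0.8                      # speed of W1, and boundary speed u
    w1 = np.array([0.0, 0.0, v])
    for th in np.linspace(0.0, 2*np.pi, 7):
        x = u*np.cos(th)*np.sqrt((1 - v*v)/(1 - u*u*v*v))
        z = (v*(1 - u*u) + u*(1 - v*v)*np.sin(th))/(1 - u*u*v*v)
        print(round(th, 2), round(rel_speed(np.array([x, 0.0, z]), w1), 12))  # -> 0.8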
Interestingly, beginning with R1 and R2 we can construct a perfect tetrahedral packing of eight ε-complete lightspheres by placing six more spheres in a hexagonal ring about the z axis with centers in the xy plane, such that each sphere just touches R1 and R2 and its two adjacent neighbors in the ring. Each of these six spheres represents a region reachable from P with speeds less than u1 relative to one of six worldlines whose speeds are (1 − 4ε) relative to W0. The normalized boundaries of these six ellipsoids on a time-slice t are given by

for k = 0,1,...,5. In the limit as ε goes to zero the hexagonal cluster of ε-spheres touching any given ε-sphere becomes vanishingly small with respect to the given sphere's frame of reference, so we approach the condition that this hexagonal pattern tessellates the entire surface of each ε-sphere in a perfectly symmetrical tetrahedral packing of identical ε-complete lightspheres. A cross-sectional side-view and top-view of this configuration are shown below.

[Figure: cross-sectional side view and top view of the tetrahedral packing of lightspheres]

These considerations show that we can regard a single light cone as a cosmological
model, taking advantage of the complete symmetry in Minkowski spacetime. Milne was
the first to discuss this model in detail. He postulated a cloud of particles expanding in
flat spacetime from a single event O, with a distribution of velocities such that the mutual
velocities between neighboring particles was the same for every particle, just as in the
one-dimensional case described at the beginning of this section. With respect to any
particular system of inertial coordinates t,x,y,z whose origin is at the event O, the cloud
of particles is spherically symmetrical with radially outward speed v = r/t. The density of
the particles is also spherically symmetrical, but it is not isotropic. To determine the density with respect to the inertial coordinates t,x,y,z, we first consider the density in the radial direction at a point on the x axis at time t. If we let u denote the mutual speed between neighboring particles, then the speed vn of the nth particle away from the center is

vn = xn/t = tanh(n artanh(u))

where xn is the radial distance of the nth particle along the x axis. Solving for n gives

n = artanh(x/t) / artanh(u)

Differentiating with respect to x gives the density of particles in the x direction

dn/dx = 1 / [artanh(u) t (1 − (x/t)²)]

This confirms that the one-dimensional density at the spatial origin drops in proportion to 1/t. Also, by symmetry, the densities in the transverse directions y and z at any point are given by this same expression as a function of the proper time τ = t√(1 − (r/t)²) at that point. This shows that the densities in the transverse directions are less than in the radial direction by a factor of √(1 − (r/t)²). Neglecting the anisotropy, the number of particles in a volume element dxdydz at a radial distance r from the spatial origin at time t is proportional to

dxdydz / [t³ (1 − (r/t)²)²]
This distribution applies to every inertial system of coordinates with origin at O, so this cosmology looks the same, and is spherically symmetrical, with respect to the rest frame of each individual particle.
The above analysis was based on a foliation of spacetime into slices of constant-t for
some particular system of inertial coordinates, but this is not the only possible foliation,
nor even the most natural. From a cosmological standpoint we might adopt as our time coordinate at each point the proper time of a uniform worldline extending from O to that point. This would give hyperboloid spacelike surfaces consisting of the locus of all the points with a fixed proper age from the origin event O. One of these spacelike slices is illustrated by the "τ = k" line in the figure below.

[Figure: hyperboloid surface of constant proper age τ = k, with the present event p receiving light from the decoupling surface]
Rindler points out that if τ = k is the epoch at which the density of the expanding cloud drops low enough so that matter and thermal radiation decouple, we should expect at the
present event "p" to be receiving an isotropic and highly red-shifted "background
radiation" along the dotted lightlike line from that de-coupling surface as shown in the
figure. As our present event p advances into the future we expect to see a progressively
more red-shifted (i.e., lower temperature) background radiation. This simplistic model
gives a surprisingly good representation of the 3K microwave radiation that is actually
observed.
It's also worth noting that if we adopt the hyperboloid foliation the universe of this
expanding cloud is spatially infinite. We saw in Section 1.7 that the absolute radial distance along this surface from the spatial center to a point at r is

s = τ ln[ r/τ + √(1 + (r/τ)²) ]

where r² = x² + y² + z² in terms of the inertial coordinates of the central spatial point.

Furthermore, we can represent this hyperboloid spatial surface as existing over the flat Euclidean xy plane with the elevation h = i√(τ² + x² + y²). By making the elevation imaginary, we capture the indefinite character of the surface. In the limit near the origin we can expand h to give

h ≈ iτ + (i/(2τ))x² + (i/(2τ))y²

So, according to the terminology of Section 5.3, we have a surface tangent to the xy plane at the origin with elevation given by h = ax² + bxy + cy² where a = c = i/(2τ) and b = 0. Consequently the Gaussian curvature of this spatial surface is K = 4ac − b² = −1/τ². By symmetry the same analysis is applicable at every point on the surface, so this surface has constant negative curvature. This applies to any two-dimensional spatial tangent plane in the three-dimensional space at each point for constant τ.
We can also evaluate the metric on this two-dimensional spacelike slice, by writing the total differential of h

dh = i(x dx + y dy)/√(τ² + x² + y²)

Squaring this and adding the result to (dx)² + (dy)² gives the line element for this surface in terms of the tangent xy plane coordinates projected onto the surface

(ds)² = (dx)² + (dy)² − (x dx + y dy)²/(τ² + x² + y²)
7.6 Cosmological Coherence


Our main difference in creed is that you have a specific belief and I am
a skeptic.
Willem de Sitter, 1917
Almost immediately after Einstein arrived at the final field equations of general relativity, the very foundation of his belief in those equations was shaken, first by the appearance of Schwarzschild's exact solution of the one-body problem. This was disturbing to Einstein because at the time he held the Machian belief that inertia must be attributable to the effects of distant matter, so he thought the only rigorous global solutions of the field equations would require some suitable distribution of distant matter. Schwarzschild's solution represents a well-defined spacetime extending to infinity, with ordinary inertial behavior for infinitesimal test particles, even though the only significant matter in this universe is the single central gravitating body. That body influences the spacetime in its vicinity, but the metric throughout spacetime is primarily determined by the spherical symmetry, leading to asymptotically flat spacetime at great distances from the central body. This seems rather difficult to reconcile with Mach's Principle, but there was worse to come, and it was Einstein himself who opened the door.
In an effort to conceive of a static cosmology with uniformly distributed matter he found
it necessary to introduce another term to the field equations, with a coefficient called the
cosmological constant. (See Section 5.8.) Shortly thereafter, Einstein received a letter

from the astronomer Willem de Sitter, who pointed out a global solution of the modified
field equations (i.e., with non-zero cosmological constant) that is entirely free of matter,
and yet that possesses non-trivial metrical structure. This thoroughly un-Machian universe was a fore-runner of Gödel's subsequent cosmological models containing closed timelike curves. After a lively and interesting correspondence about the shape of the universe, carried on between a Dutch astronomer and a German physicist at the height of the first world war, de Sitter published a paper on his solution, and Einstein published a rebuttal, claiming (incorrectly) that the de Sitter system "does not look at all like a world free of matter, but rather like a world whose matter is concentrated entirely on the [boundary]". The discussion was joined by several other prominent scientists, including
Weyl, Klein, and Eddington, who all tried to clarify the distinction between singularities
of the coordinates and actual singularities of the manifold/field. Ultimately all agreed that
de Sitter was right, and his solution does indeed represent a matter-free universe
consistent with the modified field equations.
We've seen that the Schwarzschild metric represents the unique spherically symmetrical solution of the original field equations of general relativity, assuming the cosmological constant, denoted by Λ in Section 5.8, is zero. If we allow a non-zero value of Λ, the Schwarzschild solution generalizes to

(dτ)² = (1 − 2m/r − Λr²/3)(dt)² − (1 − 2m/r − Λr²/3)⁻¹(dr)² − r²[(dθ)² + sin²θ (dφ)²]

To avoid upsetting the empirical successes of general relativity, such as the agreement with Mercury's excess precession, the value of Λ must be extremely small, certainly less than 10⁻⁴⁰ m⁻², but not necessarily zero. If Λ is precisely zero, then the Schwarzschild metric goes over to the Minkowski metric when the gravitating mass m equals zero, but if Λ is not precisely zero the Schwarzschild metric with zero mass is

(dτ)² = (1 − r²/L²)(dt)² − (1 − r²/L²)⁻¹(dr)² − r²[(dθ)² + sin²θ (dφ)²]        (1)

where L is a characteristic length related to the cosmological constant by L² = 3/Λ. This is


one way of writing the metric of de Sitter spacetime. Just as Minkowski spacetime is a solution of the original vacuum field equations R_μν = 0, so the de Sitter metric is a solution of the modified field equations R_μν = Λg_μν. Since there is no central mass in this case, it may seem un-relativistic to use polar coordinates centered on one particular point, but it can be shown that, just as with the Minkowski metric in polar coordinates, the metric takes the same form when centered on any point.
The metric (1) can be written in a slightly different form in terms of the radial coordinate ρ defined by r = L sin(ρ/L). Noting that dr = cos(ρ/L) dρ, the de Sitter metric is

(dτ)² = cos²(ρ/L)(dt)² − (dρ)² − L² sin²(ρ/L)[(dθ)² + sin²θ (dφ)²]        (2)
Interestingly, with a suitable change of coordinates, this is actually the metric of the
surface of a four-dimensional pseudo-sphere in five-dimensional Minkowski space.
Returning to equation (1), let x,y,z denote the usual three orthogonal spatial coordinates such that x² + y² + z² = r², and suppose there is another orthogonal spatial coordinate W and a time coordinate T defined by

W = √(L² − r²) cosh(t/L)        T = √(L² − r²) sinh(t/L)

For any values of x,y,z,t we have

x² + y² + z² + W² − T² = L²

so this locus of events comprises the surface of a hyperboloid, i.e., a pseudo-sphere of radius L. In other words, the spatial universe for any given time T is the three-dimensional surface of the four-dimensional sphere of squared radius L² + T². Hence the space shrinks to a minimum radius L at time T = 0 and then expands again as T increases, as illustrated below (showing only two of the spatial dimensions).

[Figure: the de Sitter hyperboloid; spatial sections contract to minimum radius L at T = 0 and re-expand]
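The stated identity can be verified in a few lines of computer algebra (using x² + y² + z² = r²):

    import sympy as sp

    r, t, L = sp.symbols('r t L', positive=True)
    W = sp.sqrt(L**2 - r**2)*sp.cosh(t/L)
    T = sp.sqrt(L**2 - r**2)*sp.sinh(t/L)
    print(sp.simplify(r**2 + W**2 - T**2))   # -> L**2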
Assuming the five-dimensional spacetime x,y,z,W,T has the Minkowski metric

(dτ)² = (dT)² − (dx)² − (dy)² − (dz)² − (dW)²

we can determine the metric on the hyperboloid surface by substituting the squared differentials (dT)² and (dW)² of the expressions for T and W above into the five-dimensional metric, which gives equation (1). The accelerating expansion of the
the space for a positive cosmological constant can be regarded as a consequence of a
universal repulsive force. The radius of the spatial sphere follows a hyperbolic trajectory
similar to the worldlines of constant proper acceleration discussed in Section 2.9. To
show that the expansion of the de Sitter spacetime can be seen as exponential, we can put
the metric into the Robertson-Walker form (see Section 7.1) by defining a new system of barred coordinates t̄, x̄, ȳ, z̄. Substituting these coordinates into the metric (1) gives the exponential form

(dτ)² = (dt̄)² − e^(2t̄/L) [(dx̄)² + (dȳ)² + (dz̄)²]        (3)

Thus the characteristic length R(t) for this metric is the simple exponential function. (This form of the metric covers only part of the manifold.) Equations (1), (2), and (3) are the most common ways of expressing de Sitter's metric, but in the first letter that de Sitter wrote to Einstein on this subject he didn't give the line element in any of these familiar forms. We can derive his original formulation beginning with (1) if we define new coordinates

related to the r,t coordinates of (1) by

Incidentally, the t coordinate is the relativistic difference between the advanced and
retarded combinations of the barred coordinates, i.e.,

The differentials in (1) can be expressed in terms of the barred coordinates as

where the partials are

and

Making these substitutions and simplifying, we get the "Cartesian" form of the metric that de Sitter presented in his first letter to Einstein

where dΩ denotes the angular components, which are unchanged from (1). These expressions have some purely mathematical features of interest. For example, the line element is formally similar to the expressions for curvature discussed in Section 5.3. Also, the denominators of the partials of t are, according to Heron's formula, equal to 16A², where A is the area of a triangle with edge lengths

If the cosmological constant was zero (meaning that L was infinite) all the dynamic
solutions of the field equations with matter predict a slowing rate of expansion, but in
1998 two independent groups of astronomers reported evidence that the expansion of the
universe is actually accelerating. If these findings are correct, then some sort of repulsive
force is needed in models based on general relativity. This has led to renewed interest in
the cosmological constant and de Sitter spacetime, which is sometimes denoted as dS4. If
the cosmological constant is negative the resulting spacetime manifold is called anti-de
Sitter spacetime, denoted by AdS4. In the latter case, we still get a hyperboloid, but the
time coordinate advances circumferentially around the surface. To avoid closed time-like
curves, we can simply imagine wrapping sheets around the hyperboloid.
As discussed in Section 7.1, the characteristic length R(t) of a manifold (i.e., the time-dependent coefficient of the spatial part of the manifold) satisfying the modified Einstein field equations (with non-zero cosmological constant) varies as a function of time in accord with the Friedmann equation

(R_t)² = C/R + (Λ/3)R² − k

where subscripts signify derivatives with respect to a suitable time coordinate, C is a constant, and k is the curvature index, equal to either −1, 0, or +1. The terms on the right hand side are akin to potentials, and it's interesting to note that the first two terms correspond to the two hypothetical forms of gravitation highlighted by Newton in the Principia. (See Section 8.2 for more on this.) As explained in Section 7.1, the Friedmann equation implies that R satisfies the equation

R_tt = −C/(2R²) + (Λ/3)R

which shows that, if Λ = 0, the characteristic cosmological length R is a solution of the separation equation for non-rotating gravitationally governed distances, as given by
equation (2) of Section 4.2. Comparing the more general gravitational separation from Section 4.2 with the general cosmological separation, we have

s_tt = −m/s² + K/s³        R_tt = −C/(2R²) + (Λ/3)R

which again highlights the inverse square and the direct proportionalities that caught Newton's attention. It's interesting that with m = 0 the left-hand expression reduces to the purely inertial separation equation, whereas with Λ = 0 the right-hand expression reduces to the (non-rotating) gravitational separation equation. We saw that the homogeneous forms of these equations are just special cases of the more general relation

s_tt = −m/s² + K/s³ + (Λ/3)s

where subscripts denote derivatives with respect to a suitable time coordinate. Among the
solutions of this equation, in addition to the general co-inertial separations, non-rotating
gravitational separations, and rotating-sliding separations, are sinusoidal functions and
exponential functions. Historically this led to the suspicion, long before the recent
astronomical observations, that there might be a class of exponential cosmological distances in addition to the cycloidal and parabolic distances. In other words, there could be different classes of observable distances: some very small and oscillatory, some larger and slowing, and some, the largest of all, increasing at an accelerating rate. This is illustrated in the figure below.

[Figure: classes of cosmological distances, from small oscillatory through decelerating to accelerating]
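To see how several classes of distances can coexist as solutions of one equation, we can integrate the Λ-modified separation equation s_tt = −m/s² + (Λ/3)s numerically (with the rotational term set to zero). The constants and initial conditions below are purely illustrative; they produce a recollapsing, an escaping, and an accelerating separation respectively:

    import numpy as np
    from scipy.integrate import solve_ivp

    m, lam = 1.0, 0.02              # illustrative constants, geometric units

    def rhs(t, y):
        s, sdot = y
        return [sdot, -m/s**2 + (lam/3.0)*s]

    def hit_zero(t, y):             # stop a recollapsing separation near s = 0
        return y[0] - 0.05
    hit_zero.terminal = True

    cases = {"recollapsing": (1.0, 1.0),           # below escape speed
             "escaping":     (1.0, np.sqrt(2*m)),  # parabolic escape speed
             "accelerating": (8.0, 0.0)}           # beyond the equilibrium radius
    for name, y0 in cases.items():
        sol = solve_ivp(rhs, (0.0, 30.0), y0, events=hit_zero, max_step=0.05)
        print(f"{name:13s} s(final) = {sol.y[0, -1]:8.3f} at t = {sol.t[-1]:6.2f}")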
Of course, according to all conventional metrical theories, including general relativity, the
spatial relations between material objects (on any chosen temporal foliation) conform to a
single three-dimensional manifold. Assuming homogeneity and isotropy, it follows that
all the cosmological distances between objects are subject to the ordinary metrical
relations such as the triangle inequality. This greatly restricts the observable distances. On
the other hand, our assumption that the degrees of freedom are limited in this way is
based on our experience with much smaller distances. We have no direct evidence that
cosmological distances are subject to the same dependencies. As an example of how
concepts based on limited experience can be misleading, recall how special relativity
revealed that the metric of our local spacetime fails to satisfy the axioms of a metric,
including the triangle inequality. The non-additivity of relative speeds was not anticipated
based on human experience with low speeds. Likewise for three co-linear objects A,B,C, it's conceivable that the distance AC is not the simple sum of the distances AB and BC. The feasibility of regarding separations (rather than particles) as the elementary objects of nature was discussed in Section 4.1.

One possible observational consequence of having distances of several different classes would be astronomical objects that are highly red-shifted and yet much closer to us than
the standard Hubble model would imply based on their redshifts. (Of course, even if this
view was correct, it might be the case that all the exponential separations have already
passed out of view.) Another possible consequence would be that some observable
distances would be increasing at an accelerating rate, whereas others of the same
magnitude might be decelerating.
The above discussion shows that the idea of at least some cosmological separations
increasing at an accelerating rate can (and did) arise from completely a priori
considerations. Of course, as long as a single coherent expansion model is adequate to
explain our observations, the standard GR models of a smooth manifold will remain
viable. Less conventional notions such as those discussed above would only be called for
only if we begin to see conflicting evidence, e.g., if some observations strongly indicate
accelerating expansion while others strongly indicate decelerating expansion.
The cosmological constant is hardly ever discussed without mentioning that (according to Gamow) Einstein called it his "biggest blunder", but the reasons for regarding this constant as a blunder are seldom discussed. Some have suggested that Einstein was annoyed at having missed the opportunity to predict the Hubble expansion, but in his own writings Einstein argued that "the introduction of [the cosmological constant] constitutes a complication of the theory, which seriously reduces its logical simplicity". He also wrote "If there is no quasi-static world, then away with the cosmological term", adding that it is "theoretically unsatisfactory anyway". In modern usage the cosmological term is usually taken to characterize some feature of the vacuum state, and so it is a fore-runner of the extremely complicated vacua that are contemplated in the string theory research program. If Einstein considered the complication and loss of logical simplicity associated with a single constant to be theoretically unsatisfactory, he would presumably have been even more dissatisfied with the nearly infinite number of possible vacua contemplated in current string research. Oddly enough, the de Sitter and anti-de Sitter spacetimes play a prominent role in this research, especially in relation to the so-called AdS/CFT conjecture involving conformal field theory.
7.7 Boundaries and Symmetries
Whether Heaven move or Earth,
Imports not, if thou reckon right.
John Milton, 1667
Each point on the surface of an ordinary sphere is perfectly symmetrical with every other
point, but there is no difficulty imagining the arbitrary (random) selection of a single
point on the surface, because we can define a uniform probability density on this surface.
However, if we begin with an infinite flat plane, where again each point is perfectly
symmetrical with every other point, we face an inherent difficulty, because there does not

exist a perfectly uniform probability density distribution over an infinite surface. Hence,
if we select one particular point on this infinite flat plane, we can't claim, even in
principle, to have chosen from a perfectly uniform distribution. Therefore, the original
empty infinite flat plane was not perfectly symmetrical after all, at least not with respect
to our selection of individual points. This shows that the very idea of selecting a point
from a pre-existing perfectly symmetrical infinite manifold is, in a sense, self-contradictory. Similarly, the symmetry of infinite Minkowski spacetime admits no
distinguished position or frame of reference, but the introduction of an inertial particle
not only destroys the symmetry, it also contradicts the premise that the points of the
original manifold were perfectly symmetrical, because the non-existence of a uniform
probability density distribution over the possible positions and velocities implies that the
placement of the particle could not have been completely impartial.
Even if we postulate a Milne cosmology (described in Section 7.5), with dust particles
emanating from a single point at uniformly distributed velocities throughout the future
null cone (note that this uniform distribution isn't normalized as a probability density, so
it can't be used to make a selection), we still arrive at a distinguished velocity frame at each
point. We could retain perfect Minkowskian symmetry in the presence of matter only by
postulating a "super-Milne" cosmology in which every point on some past spacelike slice
is an equivalent source of infinitesimal dust particles emanating at all velocities
distributed uniformly throughout the respective future null cones of every point. In such a
cosmology this same condition would apply on every time-slice, but the density would be
infinite, because each point is on the surface of infinitely many null cones, and we would
have an infinitely dense flow of particles in all directions at every point. Whether this could
correspond to any intelligible arrangement of physical entities is unclear.
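In 1+1 dimensions the non-normalizability noted above can be made explicit (a sketch; the rapidity parameterization is merely illustrative): writing each dust particle's speed as v = tanh(φ), the boost-invariant "uniform" distribution over velocities is the uniform measure dφ on the rapidity φ, since boosts simply translate φ; but the integral of dφ over (−∞, ∞) diverges, so no normalized probability density of this kind exists.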
The asymmetry due to the presence of an infinitesimal inertial particle in flat Minkowski
spacetime is purely circumstantial, because the spacetime is considered to be unaffected
by the presence of this particle. However, according to general relativity, the presence of
any inertial entity disturbs the symmetry of the manifold even more profoundly, because
it implies an intrinsic curvature of the spacetime manifold, i.e., the manifold takes on an
intrinsic shape that distinguishes the location and rest frame of the particle. For a single
non-rotating uncharged particle the resulting shape is Schwarzschild spacetime, which
obviously exhibits a distinguished center and rest frame (the frame of the central mass).
Indeed, this spacetime exhibits a preferred system of coordinates, namely those for which
the metric coefficients are independent of the time coordinate.
Still, since the field variables of general relativity are the metric coefficients themselves,
we are naturally encouraged to think that there is no a priori distinguished system of
reference in the physical spacetime described by general relativity, and that it is only the
contingent circumstance of a particular distribution of inertial entities that may
distinguish any particular frame or state of motion. In other words, it's tempting to think
that the spacetime manifold is determined solely by its "contents", i.e., that the left side of
Guv = 8πTuv is determined by the right side. However, this is not actually the case (as
Einstein and others realized early on), and to understand why, it's useful to review what is
involved in actually solving the field equations of general relativity as an initial-value

problem.
The ten algebraically independent field equations represented by Guv = 8πTuv involve the
values of the ten independent metric coefficients and their first and second derivatives
with respect to four spacetime coordinates. If we're given the values of the metric
coefficients throughout a 3D spacelike "slice" of spacetime at some particular value of
the time coordinate, we can directly evaluate the first and second derivatives of these
components with respect to the space coordinates in this "slice". This leaves only the first
and second derivatives of the ten metric coefficients with respect to the time coordinate as unknown
quantities in the ten field equations. It might seem that we could arbitrarily specify the
first derivatives, and then solve the field equations for the second derivatives, enabling us
to "integrate" forward in time to the next timeslice, and then repeat this process to predict
the subsequent evolution of the metric field. However, the structure of the field equations
does not permit this, because four of the ten field equations (namely, G0v = 8πT0v with v =
0,1,2,3) contain only the first derivatives with respect to the time coordinate x0, so we
can't arbitrarily specify the guv and their first derivatives with respect to x0 on a surface of
constant x0. These ten first derivatives, alone, must satisfy the four G0v conditions on any
such surface, so before we can even pose the initial value problem, we must first solve
this subset of the field equations for a viable set of initial values. Although these four
conditions constrain the initial values, they obviously don't fully determine them, even
for a given distribution of Tuv.
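Schematically, the split of the field equations described above is

\[ G_{0\nu} = 8\pi T_{0\nu} \quad (\nu = 0,1,2,3) \qquad \text{four constraints, involving only first time-derivatives} \]
\[ G_{ij} = 8\pi T_{ij} \quad\; (i,j = 1,2,3) \qquad \text{six evolution equations, involving second time-derivatives} \]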
Once we've specified values of the guv and their first derivatives with respect to x0 on
some surface of constant x0 in such a way that the four conditions for G0v are satisfied, the
four contracted Bianchi identities ensure that these conditions remain satisfied outside the
initial surface, provided only that the remaining six equations are satisfied everywhere.
However, this leaves only six independent equations to govern the evolution of the ten
field variables in the x0 direction. As a result, the second derivatives of the guv with
respect to x0 appear to be underdetermined. In other words, given suitable initial
conditions, we're left with a four-fold ambiguity. We must arbitrarily impose four more
conditions on the system in order to uniquely determine a solution. This was to be
expected, because the metric coefficients depend not only on the absolute shape of the
manifold, but also on our choice of coordinate systems, which represents four degrees of
freedom. Thus, the field equations actually determine an equivalence class of solutions,
corresponding to all the ways in which a given absolute metrical manifold can be
expressed in various coordinate systems. In order to actually generate a solution of the
initial value problem, we need to impose four "coordinate conditions" along with the six
"dynamical" field equations. The conditions arise from any proposed system of
coordinates by expressing the metric coefficients g0v in terms of these coordinates (which
can always be done for any postulated system of coordinates), and then differentiating
these four coefficients twice with respect to x0 to give four equations in the second
derivatives of these coefficients.
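As a concrete example of such coordinate conditions (a standard choice, not one singled out by the text), the four harmonic, or de Donder, conditions impose

\[ \partial_\nu \left( \sqrt{-g}\; g^{\mu\nu} \right) = 0, \qquad \mu = 0,1,2,3 \]

which removes exactly the four-fold coordinate freedom and renders the evolution determinate.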
Notwithstanding the four-fold ambiguity of the dynamical field equations, which is just a
descriptive rather than a substantive ambiguity, it's clear that the manifold is a definite
absolute entity, and its overall characteristics and evolution are determined not only by

the postulated Tuv and the field equations, but also by the conditions specified on the
initial timeslice. As noted above, these conditions are constrained by the field equations,
but are by no means fully determined. We are still required to impose largely arbitrary
conditions in order to fix the absolute background spacetime. This state of affairs was
disappointing to Einstein, because he recognized that the selection of a set of initial
conditions is tantamount to stipulating a preferred class of reference systems, precisely as
in Newtonian theory, which is "contrary to the spirit of the relativity principle" (referring
presumably to the relational ideas of Mach). As an example, there are multiple distinct
vacuum solutions of the field equations, some with gravitational waves and even geons
(temporarily) zipping around, and some not. Even more ambiguity arises when we
introduce mass, as Gödel showed with his cosmological solutions in which the average
mass of the universe is rotating with respect to the spacetime background. These
examples just highlight the fact that general relativity can no more dispense with the
arbitrary stipulation of a preferred class of reference systems (the inertial systems) than
could Newtonian mechanics or special relativity.
This is clearly illustrated by Schwarzschild spacetime, which (according to Birkhoff's
theorem) is the essentially unique spherically symmetrical vacuum solution of the field equations.
Clearly this cosmological model, based on a single spherically symmetrical mass in an
otherwise empty universe, is "contrary to the spirit of the relativity principle", because as
noted earlier there is an essentially unique time coordinate for which the metric
coefficients are independent of time. A translation that leaves the metric
formally unchanged is called an isometry, and a vector field whose flow consists of isometries is
called a Killing vector field. Thus the Schwarzschild time translation ∂/∂t constitutes a
Killing vector field over the entire manifold, making t a highly distinguished time
coordinate, no less than Newton's absolute time. In both special relativity and Newtonian
physics there is an infinite class of operationally equivalent systems of reference at any
point, but in Schwarzschild spacetime there is an essentially unique global coordinate
system with respect to which the metric coefficients are independent of time, and this
system is related in a definite way to the inertial class of reference systems at each point.
Thus, in the context of this particular spacetime, we actually have a much stronger case
for a meaningful notion of absolute rest than we do in Newtonian spacetime or special
relativity, both of which rest naively on the principle of inertia, and neither of which
acknowledges the possibility of variations in the properties of spacetime from place to
place (let alone under velocity transformations).
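In components, a Killing vector field ξ is one satisfying Killing's equation

\[ \nabla_\mu \xi_\nu + \nabla_\nu \xi_\mu = 0 \]

and in Schwarzschild coordinates the field ξ = ∂/∂t satisfies this equation precisely because none of the metric coefficients depends on t.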
The unique physical significance of the Schwarzschild time coordinate is also shown by
the fact that Fermat's principle of least time applies uniquely to this time coordinate. To
see this, consider the path of a light pulse traveling through the solar system, regarded as
a Schwarzschild geometry centered around the Sun. Naturally there are many different
parameterizations and time coordinates that we could apply to this geometry, and in
general a timelike geodesic extremizes dτ (not dt for whatever arbitrary time coordinate t
we might be using), and of course a spacelike geodesic extremizes ds (again, not dt).
However, for light-like paths we have dτ = ds = 0 by definition, so the path is confined to
null surfaces, but this is not sufficient to pick out which null path will be followed. So,
starting with a line element of the form

\[ d\tau^2 = \left(1 - \frac{2m}{r}\right) dt^2 \;-\; \frac{dr^2}{1 - 2m/r} \;-\; r^2 d\theta^2 \;-\; r^2 \sin^2\theta \, d\phi^2 \]

where t, r, θ, and φ represent the usual Schwarzschild coordinates, we then set dτ = 0 for
light-like paths, which reduces the equation to

\[ dt^2 = \frac{dr^2}{(1 - 2m/r)^2} \;+\; \frac{r^2 \, d\theta^2 + r^2 \sin^2\theta \, d\phi^2}{1 - 2m/r} \]
This is a perfectly good metrical (not pseudo-metrical) space, with a line element given
by dt, and in fact by extremizing (dt)² we get the paths of light. Note that this only works
because gtt, grr, gθθ, gφφ all happen to be independent of this time coordinate, t, and also
because gtr = gtθ = gtφ = 0. If and only if all these conditions apply, we reduce to a simple
line element of dt on the null surfaces, and Fermat's Principle applies to the parameter t.
Thus, in a Schwarzschild universe, this works only when using the essentially unique
Schwarzschild coordinates, in which the metric coefficients are independent of the time
coordinate.
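Incidentally, the reduced line element above also gives the coordinate speeds of light in Schwarzschild coordinates: for purely radial null paths dr/dt = ±(1 − 2m/r), and for purely tangential null paths r dφ/dt = ±(1 − 2m/r)^{1/2}, which is one way of expressing the effective "refractive index" of the gravitational field in terms of this distinguished time coordinate.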
Admittedly the Schwarzschild geometry is a highly simplistic and symmetrical
cosmology, but it illustrates how the notion of an absolute rest frame can be more
physically meaningful in a relativistic spacetime than in Newtonian spacetime. The
spatial configuration of Newton's absolute space is invariant and the Newtonian metric is
independent of time, regardless of which member of the inertial class of reference
systems we choose, whereas Schwarzschild spacetime is spherically symmetrical and its
metric coefficients are independent of time only with respect to the essentially unique
Schwarzschild system of coordinates. In other words, Newtonian spacetime is
operationally symmetrical under translations and uniform velocities, whereas the
spacetime of general relativity is not. The curves and dimples in relativistic spacetime
automatically destroy symmetry under translation, let alone velocity. Even the spacetime
of special relativity is (marginally) less relational (in the Machian sense) than Newtonian
spacetime, because it combines space and time into a single manifold that is only
partially ordered, whereas Newtonian spacetime is totally ordered into a continuous
sequence of spatial instants. Noting that Newtonian spacetime is explicitly less relational
than Galilean spacetime, it can be argued that the actual evolution of spacetime theories
historically has been from the purely kinematically relational spacetime of Copernicus, to
the inertial relativity of Galileo and special relativity, to the purely absolute spacetime of
general relativity. At each stage the meaning of relativity has been refined and qualified.
We might suspect that the distinguished "Killing-time" coordinate in the Schwarzschild
cosmology is exceptional - in the sense that the manifold was designed to satisfy a very
restrictive symmetry condition - and that perhaps more general spacetime manifolds do
not exhibit any preferred directions or time coordinates. However, for any specific
manifold we must apply some symmetry or boundary conditions sufficient to fix the
metrical relations of the manifold, which unavoidably distinguishes one particular system
of reference at any given point. For example, in the standard Friedmann models of the

universe there is, at each point in the manifold, a frame of reference with respect to which
the rest of the matter and energy in the universe has maximal spherical symmetry, which
is certainly a distinguished system of reference. Still, we might imagine that these are just
more exceptional cases, and that underneath all these specific examples of relativistic
cosmologies that just happen to have strongly distinguished systems of reference there
lies a purely relational theory. However, this is not the case. General relativity is not a
relational theory of motion. The spacetime manifold in general relativity is an absolute
entity, and it's clear that any solution of the field equations can only be based on the
stipulation of sufficient constraints to uniquely determine the manifold, up to inertial
equivalence, which is precisely the situation with regard to the Newtonian spacetime
manifold.
But isn't it possible for us to invoke general relativity with very generic boundary
conditions that do not commit us to any distinguished frame of reference? What if we
simply stipulate asymptotic flatness at infinity? This is typically the approach taken when
modeling the solar system or some other actual configuration, i.e., we require that, with a
suitable choice of coordinates, the metric tensor approaches the Minkowski metric at
spatial infinity. However, as Einstein put it, the specifications of "these boundary
conditions presuppose a definite choice of the system of reference". In other words, we
must specify a suitable choice of coordinates in terms of which the metric tensor
approaches the Minkowski metric, but this specification is tantamount to specifying the
absolute spacetime (up to inertial equivalence, as always), just as in Newtonian physics.
The well-known techniques for imposing asymptotic flatness at "conformal infinity",
such as discussed by Wald, are not exceptions, because they place only very mild
constraints on the field solution in the finite region of the manifold. Indeed, the explicit
purpose of such constructions is to establish asymptotic flatness at infinity while
otherwise constraining the solution as little as possible, to facilitate the study of
gravitational waves and other phenomena in the finite region of the manifold. These
phenomena must still be "driven" by the imposition of conditions that inevitably
distinguish a particular frame of reference at one or more points. Furthermore, to the
extent that flatness at conformal infinity succeeds in imposing an absolute reference for
gravitational "potential" and the total energy of an isolated system, it still represents an
absolute background that has been artificially imposed.
Since the condition of flatness at infinity is not sufficient to determine a solution, we
must typically impose other conditions. Obviously there are many physically distinct
ways in which the metric could approach flatness as a function of radial spatial distance
from a given region of interest, and one of the most natural-seeming and common
approaches, consistent with local observation, is to assume a spherically symmetrical
approach to spatial infinity. This tends to seem like a suitably frame-independent
assumption, since spatial spherical symmetry is frame-independent in Newtonian physics.
The problem, of course, is that in relativity the concept of spherical symmetry
automatically distinguishes a particular frame of reference - not just a class of frames, but
one particular frame. For example, if we choose a system of reference that is moving
toward Sirius at 0.999999c, the entire distribution of stars and galaxies in the universe is

drastically shrunk (spatially) along that direction, and if we define a spherically
symmetrical asymptotic approach to flatness at spatial infinity in these coordinates we
will get a physically different result (e.g., for solar system calculations) than if we define
a spherically symmetrical asymptotic approach to flatness with respect to a system of
coordinates in which the Sun is at rest. It's true that the choice of coordinate systems is
arbitrary, but only until we impose physically meaningful conditions on the manifold in
terms of those coordinates. Once we do that, our choice of coordinate systems acquires
physical significance, because the physical meaning of the conditions we impose is
determined largely by the coordinates in terms of which they are expressed, and these
conditions physically influence the solution. Of course, we can in principle define any
boundary conditions in conjunction with any set of coordinates, i.e., we could take the
rest frame of a near-light-speed cosmic particle to work out the orbital mechanics of our
Solar system by (for example) specifying an asymptotic approach to flatness at spatial
infinity in a highly elliptical pattern, but the fact remains that this approach gives a
uniquely spherical pattern only with respect to the Sun's rest frame.
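To quantify the example: a frame moving at v = 0.999999c has

\[ \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} \approx 707 \]

so in those coordinates all cosmological distances along the direction of motion are contracted roughly 707-fold.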
Whenever we pose a Cauchy initial-value problem, the very act of specifying timeslices
(a spacelike foliation) and defining a set of physically recognizable conditions on one of
these surfaces establishes a distinguished reference system at each point. These individual
local frames need not be coherent, nor extendible, nor do we necessarily require them to
possess specific isometries, but the fact remains that the general process of actually
applying the field equations to an initial-value problem involves the stipulation of a
preferred space-time decomposition at each point, since the tangent plane of the timeslice
at each point singles out a local frame of reference, and we are assigning physically
meaningful conditions to every point on this surface in terms that unavoidably distinguish
this frame.
More generally, whenever we apply the field equations in any particular situation,
whether in the form of an initial-value problem or in some other form, we must always
specify sufficient boundary conditions, initial conditions, and/or symmetries to uniquely
determine the manifold, and in so doing we are positing an absolute spacetime just as
surely (and just as arbitrarily) as Newton did. It's true that the field equations themselves
would be compatible with a wide range of different absolute spacetimes, but this
ambiguity, from a predictive standpoint, is a weakness rather than a strength of the theory,
since, after all, we live in one definite universe, not infinitely many arbitrary ones.
Indeed, when taken as a meta-theory in this sense, general relativity does not even give
unique predictions for things like the twins paradox, etc., unless the statement of the
question includes the specification of the entire cosmological boundary conditions, in
which case we're back to a specific absolute spacetime. It was this very realization that
led Einstein at one point to the conviction that the universe must be regarded as spatially
closed, to salvage at least a semblance of uniqueness for the cosmological solution as a
function of the mass energy distribution. (See Section 7.1.) However, the closed
Friedmann models are not currently in favor among astronomers, and in any case the
relational uniqueness that can be recovered in such a universe is more semantic than
substantial.

Moreover, the strategy of trying to obviate arbitrary boundary conditions by selecting a
topology without boundaries generally results in a topologically distinguished system of
reference at any point. For example, in a cylindrical spacetime (assuming the space is
everywhere locally Lorentzian) there is only one frame in which the surfaces of
simultaneity of an inertial observer are coherent. In all other frames, if we follow a surface of
simultaneity all the way around the closed dimension we find that it doesn't meet up with
itself. Instead, we get a helical pattern (if we picture just a single cylindrical spatial
dimension versus time).
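A minimal sketch of this mismatch, in 1+1 dimensions with coordinates (t, x) of the preferred frame and a closed spatial circumference L: an observer moving at speed v has simultaneity surfaces t = t₀ + vx/c², so following such a surface once around the loop (x → x + L) shifts t by vL/c², and the surface closes on itself only when v = 0.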
It may seem that we can disregard peculiar boundary conditions involving waves and so
on, but if we begin to rule out valid solutions of the field equations by fiat, then we're
obviously not being guided by the theory, but by our prejudices and preferences.
Similarly, in order to exclude "unrealistic" cosmological solutions of the field equations
we must impose energy conditions, i.e., we find that it's necessary to restrict the class of
allowable Tuv tensor fields, but this again is not justified by the field equations
themselves, but merely by our wish to force them to give us "realistic" solutions. It would
be an exaggeration to say that we get out of the field equations only what we put into
them, but there's no denying that a considerable amount of "external" information must
be imposed on them in order to give realistic solutions.
7.8 Global Interpretations of Local Experience
How are our customary ideas of space and time related to the character of our
experiences? ... It seems to me that Poincaré clearly recognized the truth in the account
he gave in his book "La Science et l'Hypothèse".
Albert Einstein, 1921

The standard interpretation of general relativity entails a conceptual framework
consisting of primary entities - such as particles and non-gravitational fields - embedded
in an extensive differentiable manifold of space and time. The theory is presented in the
form of differential equations, interpreted as giving a description of the local metrical
properties of the manifold around any specific point, but most of the observable
predictions of the theory derive not from local results, per se, but from the inferred global
structure generated by analytically continuing the solution over an extended region. From
these extended solutions we infer configurations and motions of distant objects (fields
and particles), from which we derive predictions about observable interactions. Does the
totality of the observable interactions compel us to adopt this standard interpretation, or
might the same pattern of experiences be explainable within some other, possibly quite
different, conceptual framework?
In one sense the answer to this question is obvious. We can always accommodate any
sequence of perceptions within an arbitrary ontology merely by positing a suitable theory
of appearances separate from our presumed ontology. This approach goes back to ancient
philosophers such as Parmenides, who taught that motion, change, and even plurality are
merely appearances, while the reality is an unchanging unity. Although this strikes many
people as outlandish, we're all familiar with the appearances of motion, change, and

plurality in our own personal dreams while we are "really" motionless and alone. We can
even achieve a similar separation of perception and reality in computer-generated "virtual
reality simulations", in which various sense impressions of sight and sound are generated
to create an appearance that is starkly different from the underlying physical situation.
Due to technical limitations, such simulations may not be very realistic (at the moment),
but in principle they could be made arbitrarily realistic, and clearly there need be no
direct correspondence between the topology of the virtual world of appearances and the
actual world of external physical objects.
When confronted with examples like this, people who believe there is only one true
interpretation of the corporeal operations compatible with our experiences tend to be
dismissive, as if such examples are frivolous and unworthy of consideration. It's true that
a purely solipsistic approach to the interpretation of experiences is somewhat repugnant,
and need not be taken too seriously, but it nevertheless serves to remind us (if we needed
reminding) that the link between our sense perceptions and the underlying external
structure is always ambiguous, and any claim that our experiences do (or can) uniquely
single out an ontology is patently false. There is always a degree of freedom in the
selection of our model of the presumed external objective reality.
In more serious models we usually assume that the processes of perception are "of the
same kind" as the external processes that we perceive, but we still bifurcate our models
into two parts, consisting of (1) an individual's sense impressions and interior
experiences, such as thoughts and dreams, and (2) a class of objective exterior entities
and events, of which only a small subset correspond to any individual's direct
perceptions. Even within this limited class of models, the task of inferring (2) from (1) is
not trivial, and there is certainly no a priori requirement that a given set of local
experiences uniquely determines a particular global embedding. For the purposes of this
discussion we will focus on the ambiguity class for external models that are consistent
with the predictions of general relativity, reduced to the actual sense impressions.
These considerations are complicated by the fact that the field equations of general
relativity, by themselves, permit a very wide range of global solutions if no restrictions
are placed on the type of boundary conditions, initial values, and energy conditions that
are allowed, but most of these solutions are (presumably) unphysical. As Einstein said,
"A field theory is not yet completely determined by the system of field equations". In
order to extract realistic solutions (i.e., solutions consistent with our experiences) from
the field equations we must impose some constraints on the boundary and energy
conditions. In this sense the field equations do not represent a complete theory, because
these restrictions cannot be inferred from the field equations, but are auxiliary
assumptions that must simply be imposed on the basis of external considerations.
This incompleteness is a characteristic of any physical law that is expressed as a set of
differential equations, because such equations generally possess a vast range of possible
formal solutions, and require one or more external principles or constraints to yield definite
results. The more formal flexibility that our theory possesses, the more inclined we are to
ask whether the actual physical content of the theory is contained in the rational "laws" or

the circumstantial conditions that we impose. For example, consider a theory consisting
of the assertion that certain aspects of our experience can be modeled by means of a
suitable Turing machine with suitable initial data. This is a very flexible theoretical
framework, since by definition anything that is computable can be computed from some
initial data using a suitable Turing machine. Such a theory undeniably yields all
applicable and computable results, but of course it also (without further specification)
encompasses infinitely many inapplicable results. An ideal theoretical framework would
be capable of representing all physical phenomena, but no unphysical phenomena. This is
just an expression of the physicist's desire to remove all arbitrariness from the theory.
However, as the general theory of relativity stands at present, it does not yield unique
predictions about the overall global shape of the manifold. Instead, it simply imposes
certain conditions on the allowable shapes. In this sense we can regard general relativity
as a meta-theory, rather than a specific theory.
So, when considering the possibility of alternative interpretations (or representations) of
general relativity, we need to decide whether we are trying to find a viable representation
of all possible theories that reside within the meta-theory of general relativity, or whether
we are trying to find a viable representation of just a single theory that satisfies the
requirements of general relativity. The physicist might answer that we need only seek
representations that conform with those aspects of general relativity that have been
observationally verified, whereas a mathematician might be more interested in whether
there are viable alternative representations of the entire meta-theory.
First we should ask whether there are any viable interpretations of general relativity as a
meta-theory. This is a serious question, because the usual criterion for viability is that the
candidate interpretation permits us to analytically continue all worldlines without leading
to any singularities or physical infinities. In other words, an interpretation is considered
to be not viable if the representation "breaks down" at some point due to an inability to
diffeomorphically continue the solution within that representation. The difficulty here is
that even the standard interpretation of general relativity in terms of curved spacetime
leads, in some circumstances, to inextendible worldlines and singularities in the field.
Thus if we take the position that such attributes are disqualifying, then it follows that
even the standard interpretation of general relativity in terms of an extended spacetime
manifold is not viable.
One approach to salvaging the geometrical interpretation is to adopt, as an additional
feature of the theory, the principle that the manifold must be free of singularities and
infinities. Indeed this was the approach that Einstein often suggested. He wrote
It is my opinion that singularities must be excluded. It does not seem reasonable to me to
introduce into a continuum theory points (or lines, etc.) for which the field equations do not hold...
Without such a postulate the theory is much too vague.

He even hoped that the exclusion of singularities might (somehow) lead to an
understanding of atomistic and quantum phenomena within the context of a continuum
theory, although he acknowledged that he couldn't say how this might come about. He
believed that the difficulty of determining exact singularity-free global solutions of non-
linear field equations prevents us from assessing the full content of a non-linear field
theory such as general relativity. (He recognized that this was contrary to the prevailing
view that a field theory can only be quantized by first being transformed into a statistical
theory of field probabilities, but he regarded this as "only an attempt to describe
relationships of an essentially nonlinear character by linear methods".)
Another approach, more in the mainstream of current thought, is to simply accept the
existence of singularities, and decline to consider them as a disqualifying feature of an
interpretation. According to theorems of Penrose, Hawking, and others, it is known that
the existence of a trapped surface (such as the event horizon of a black hole) implies the
existence of inextendible worldlines, provided certain energy conditions are satisfied and
we exclude closed timelike curves. Therefore, a great deal of classical general relativity
and its treatment of black holes, etc., is based on the acceptance of singularities in the
manifold, although this is often accompanied with a caveat to the effect that in the
vicinity of a singularity the classical field equations may give way to quantum effects.
In any case, since the field equations by themselves undeniably permit solutions
containing singularities, we must either impose some external constraint on the class of
realistic solutions to exclude those containing singularities, or else accept the existence of
singularities. Each of these choices has implications for the potential viability of
alternative interpretations. In the first case we are permitted to restrict the range of
solutions to be represented, which means we really only need to seek representations of
specific theories, rather than of the entire meta-theory represented by the bare field
equations. In the second case we need not rule out interpretations based on the existence
of singularities, inextendible worldlines, or other forms of "bad behavior".
To illustrate how these considerations affect the viability of alternative interpretations,
suppose we attempt to interpret general relativity in terms of a flat spacetime combined
with a universal force field that distorts rulers and clocks in just such a way as to match
the metrical relations of a curved manifold in accord with the field equations. It might be
argued that such a flat-spacetime formulation of general relativity must fail at some
point(s) to diffeomorphically map to the corresponding curved manifold if the latter
possesses a non-trivial global topology. For example, the complete surface of a sphere
cannot be mapped diffeomorphically to the plane. By means of stereographic projection
from the North Pole of a sphere to a plane tangent to the South Pole we can establish a
diffeomorphic mapping to the plane of every point on the sphere except the North Pole
itself, which maps to a "point at infinity". This illustrates the fact that when mapping
between two topologically distinct manifolds such as the plane and the surface of a
sphere, there must be at least one point where the mapping is not well-behaved.
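Explicitly (for the projection point and tangent plane just described): for the unit sphere X² + Y² + Z² = 1, projecting from the North Pole (0,0,1) onto the plane Z = −1 gives

\[ (x, y) = \left( \frac{2X}{1 - Z}, \; \frac{2Y}{1 - Z} \right) \]

which is smooth with a smooth inverse everywhere except Z = 1, where the image recedes to infinity.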
However, this kind of objection fails to rule out physically viable alternatives to the
curved spacetime interpretation (assuming any viable interpretation exists), and for
several reasons. First, we may question whether the mapping between the curved
spacetime and the alternative manifold needs to be everywhere diffeomorphic. Second,
even if we accede to this requirement, it's important to remember that the global topology
of a manifold is sensitive to pointwise excisions. For example, although it is not possible

to diffeomorphically map the complete sphere to the plane, it is possible to map the
punctured sphere, i.e., the sphere minus one point (such as the North Pole in the
stereographic projection scheme). We can analytically continue the mapping to include
this point by simply adding a "point at infinity" to the plane - without giving the extended
plane intrinsic curvature.
Of course, this interpretation does entail a singularity at one point, where the universal
field must be regarded as infinitely strong, but if we regard the potential for physical
singularities as disqualifying, then as noted above we have no choice but to allow the
imposition of some external principles to restrict the class of solutions to global
manifolds that are everywhere "well-behaved". If we also disallow this, then as discussed
above there does not exist any viable interpretation of general relativity. Once we have
allowed this, we can obviously posit a principle to the effect that only global manifolds
which can be diffeomorphically mapped to a flat spacetime are physically permissible.
Such a principle is no more in conflict with the field equations than are any of the well-known "energy conditions", the exclusion of closed timelike loops, and so on.
Believers in one uniquely determined interpretation may also point to individual black
holes, whose metrical structure of trapped surfaces cannot possibly be mapped to flat
spacetime without introducing physical singularities. This is certainly true, but according
to theorems of Penrose and Hawking it is precisely the circumstance of a trapped surface
that commits the curved-spacetime formulation itself to a physical singularity. In view of
this, we are hardly justified in disqualifying alternative formulations that entail physical
singularities in exactly the same circumstances.
Another common objection to flat interpretations is that even for a topologically flat
manifold like the surface of a torus it is impossible to achieve the double periodicity of
the closed toroidal surface, but this objection can also be countered, simply by positing a
periodic flat universe. Admittedly this commits us to distant correlations, but such things
cannot be ruled out a priori (and in fact distant correlations do seem to be a characteristic
of the universe from the standpoint of quantum mechanics, as discussed in Section 9).
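For definiteness, such a doubly periodic flat universe can be modeled (as a sketch) by the Euclidean plane with metric ds² = dx² + dy² together with the identifications (x, y) ~ (x + L₁, y) ~ (x, y + L₂); the resulting manifold has the topology of a torus and yet is everywhere flat.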
More generally, as Poincaré famously summarized it, we can never observe our geometry
G in a theory-free sense. Every observation we make relies on some prior conception of
physical laws P which specify how physical objects behave with respect to G. Thus the
universe we observe is not G, but rather U = G + P, and for any given G we can vary P to
give the observed U. Needless to say, this is just a simplified schematic of the full
argument, but the basic idea is that it's simply not within the power of our observations to
force one particular geometry upon us (nor even one particular topology), as the only
possible way in which we could organize our thoughts and perceptions of the world. We
recall Poincaré's famous conventionalist dictum "No geometry is more correct than any
other - only more convenient". Those who claim to "prove" that only one particular
model can be used to represent our experience would do well to remember John Bell's
famous remark that the only thing "proved" by such proofs is lack of imagination.
The interpretation of general relativity as a field theory in a flat background spacetime
has a long history. This approach was explored by Feynman, Deser, Weinberg, and others
at various times, partly to see if it would be possible to quantize the gravitational field in
terms of a spin-2 particle, following the same general approach that was successful in
quantizing other field theories. Indeed, Weinberg's excellent "Gravitation and
Cosmology" (1972) contained a provocative paragraph entitled "The Geometric
Analogy", in which he said
Riemann introduced the curvature tensor Rλμνκ to generalize the [geometrical] concept of curvature
to three or more dimensions. It is therefore not surprising that Einstein and his successors have
regarded the effects of a gravitational field as producing a change in the geometry of space and
time. At one time it was even hoped that the rest of physics could be brought into a geometric
formulation, but this hope has met with disappointment, and the geometric interpretation of the
theory of gravitation has dwindled to a mere analogy, which lingers in our language in terms like
"metric", "affine connection", and "curvature", but is not otherwise very useful. The important
thing is to be able to make predictions about the images on the astronomer's photographic plates,
frequencies of spectral lines, and so on, and it simply doesn't matter whether we ascribe these
predictions to the physical effect of a gravitational field on the motion of planets and photons or to
a curvature of space and time.

The most questionable phrase here is the claim that, aside from providing some useful
vocabulary, the geometric analogy "is not otherwise very useful". Most people who have
studied general relativity have found the geometric analogy to be quite useful as an aid to
understanding the theory, and Weinberg can hardly have failed to recognize this. I suspect
that what he meant (in context) is that the geometric framework has not proven to be very
useful in efforts to unify gravity with the rest of physics. The idea of "bringing the rest of
physics into a geometric formulation" refers to attempts to account for the other forces of
nature (electromagnetism, strong, and weak) in purely geometrical terms as attributes of
the spacetime manifold, as Einstein did for gravity. In other words, eliminate the concept
of "force" entirely, and show that all motion is geodesic in some suitably defined
spacetime manifold. This is what is traditionally called a "unified field theory", and led to
Weyl's efforts in the 1920s, and the Kaluza-Klein theories, and Einstein's anti-symmetric
theories, and so on. As Weinberg said, those hopes have (so far) met with
disappointment.
Of course, in another sense, one could say that all of physics has been subsumed by the
geometric point of view. We can obviously describe baseball, music, thermodynamics,
etc., in geometrical terms, but that isn't the kind of geometrizing that is being discussed
here. Weinberg was referring to attempts to make the space-time manifold itself account
for all the "forces" of nature, as Einstein had made it account for gravity. Quantum field
theory works on a background of space-time, but posits other ingredients on top of that to
represent the fields. Obviously we're free to construct a geometrical picture in our minds
of any gauge theory, just as we can form a geometrical picture in any arbitrary kind of
"space", e.g., the phase space of a system, but this is nothing like what Einstein, Weyl,
Kaluza, Weinberg, etc. were talking about. The original (and perhaps naive) hope was to
eliminate all other fields besides the metric field of the spacetime manifold itself, to
reduce physics to this one primitive entity (and its metric). It's clear that (1) physics has
not been geometrized in the sense that Weinberg was talking about, viz., with the
spacetime metric being the only ontological entity, and (2) in point of fact, some
significant progress toward the unification of the other "forces" of nature has indeed been

made by people (such as Weinberg himself) who did so without invoking the geometric
analogy.
Many scholars have expressed similar views to those of Poincaré and Weinberg regarding
the essential conventionality of geometry. In considering the question "Is Spacetime
Curved?" Ian Roxburgh described the curved and flat interpretations of general relativity,
and concluded that "the answer is yes or no depending on the whim of the answerer. It is
therefore a question without empirical content, and has no place in physical inquiry."
Thus he agreed with Poincaré that our choice of geometry is ultimately a matter of
convenience. Even if we believe that general relativity is perfectly valid in all regimes
(which most people doubt), it's still possible to place a non-geometric interpretation on
the "photographic plates and spectral lines" if we choose. The degree of "inconvenience"
is not very great in the weak-field limit, but becomes more extreme if we're thinking of
crossing event horizons or circumnavigating the universe. Still, we can always put a non-geometrical interpretation onto things if we're determined to do so. (Ironically, the most
famous proponent of the belief that the geometrical view is absolutely essential, indeed a
sine qua non of rational thought, was Kant, because the geometry he espoused so
confidently was non-curved Euclidean space.)
Even Kip Thorne, who along with Misner and Wheeler wrote the classic text Gravitation
espousing the geometric viewpoint, admits that he was once guilty of curvature
chauvinism. In his popular book "Black Holes and Time Warps" he writes
Is spacetime really curved? Isn't it conceivable that spacetime is actually flat, but the clocks and
rulers with which we measure it... are actually rubbery? Wouldn't... distortions of our clocks and
rulers make truly flat spacetime appear to be curved? Yes.

Thorne goes on to tell how, in the early 1970's, some people proposed a membrane
paradigm for conceptualizing black holes. He says
When I, as an old hand at relativity theory, heard this story, I thought it ludicrous. General
relativity insists that, if one falls into a black hole, one will encounter nothing at the horizon
except spacetime curvature. One will see no membrane and no charged particles... the membrane
theory can have no basis in reality. It is pure fiction. The cause of the field lines bending, I was
sure, is spacetime curvature, and nothing else... I was wrong.

He goes on to say that the laws of black hole physics, written in accord with the
membrane interpretation, are completely equivalent to the laws of the curved spacetime
interpretation (provided we restrict ourselves to the exterior of black holes), but they are
each heuristically useful in different circumstances. In fact, after he got past thinking it
was ludicrous, Thorne spent much of the 1980's exploring the membrane paradigm. He
does, however, maintain that the curvature view is better suited to deal with interior
solutions of black holes, but it isn't clear how strong a recommendation this really is,
considering that we don't really know (and aren't likely to learn) whether those interior
solutions actually correspond to facts.
Feynman's lectures on gravitation, written in the early 1960s, present a field-theoretic
approach to gravity, while also recognizing the viability of Einstein's geometric

interpretation. Feynman described the thought process by which someone might arrive at
a theory of gravity mediated by a spin-two particle in flat spacetime, analogous to the
quantum field theories of the other forces of nature, and then noted that the resulting
theory possesses a geometrical interpretation.
It is one of the peculiar aspects of the theory of gravitation that it has both a field interpretation and
a geometrical interpretation... the fact is that a spin-two field has this geometrical representation;
this is not something readily explainable - it is just marvelous. The geometric interpretation is not
really necessary or essential to physics. It might be that the whole coincidence might be
understood as representing some kind of gauge invariance. It might be that the relationships
between these two points of view about gravity might be transparent after we discuss a third point
of view, which has to do with the general properties of field theories under transformations...

He goes on to discuss the general notion of gauge invariance, and concludes that gravity
is that field which corresponds to a gauge invariance with respect to displacement
transformations.
One potential source of confusion when discussing this issue is the fact that the local null
structure of Minkowski spacetime makes it locally impossible to smoothly mimic the
effects of curved spacetime by means of a universal force. The problem is that
Minkowski spacetime is already committed to the geometrical interpretation, because it
identifies the paths of light with null geodesics of the manifold. Putting this together with
some form of the equivalence principle obviously tends to suggest the curvature
interpretation. However, this does not rule out other interpretations, because there are
other possible interpretations of special relativity - notably Lorentz's theory - that don't
identify the paths of light with null geodesics. It's worth remembering that special
relativity itself was originally regarded as simply an alternate interpretation of Lorentz's
theory, which was based on a Galilean spacetime, with distortions in both rulers and
clocks due to motion. These two theories are experimentally indistinguishable - at least
up to the implied singularity of the null intervals. In the context of Galilean spacetime we
could postulate gravitational fields affecting the paths of photons, the rates of physical
clocks, and so on. Of course, in this way we arrive at a theory that looks exactly like
curved spacetime, but we interpret the elements of our experience differently. Since (in
this interpretation) we believe light rays don't follow null geodesic paths (and in fact we
don't even recognize the existence of null geodesics) in the "true" manifold under the
influence of gravity, we aren't committed to the idea that the paths of light delineate the
structure of the manifold. Thus we'll agree with the conventional interpretation about the
structure of light cones, but not about why light cones have that structure.
Of course, at some point any flat manifold interpretation will encounter difficulties in
continuing its worldlines in the presence of certain postulated structures, such as black
holes. However, as discussed above, the curvature interpretation is not free of difficulties
in these circumstances either, because if there exists a trapped surface then there also
exist non-extendable timelike or null geodesics for the curvature interpretation. So, the
(arguably) problematical conditions for a "flat space" interpretation are identical to the
problematical conditions for the curvature interpretation. In other words, if we posit the
existence of trapped surfaces, then it's disingenuous for us to impugn the robustness of
flat space interpretations in view of the fact that these same circumstances commit the

curvature interpretation to equally disquieting singularities.


It may or may not be the case that the curvature interpretation has a longer reach, in the
sense that it's formally extendable inside the Schwarzschild radius, but, as noted above,
the physicality of those interior solutions is not (and probably never will be) subject to
verification, and they are theoretically controversial even within the curvature tradition
itself. Also, the simplistic arguments proposed in introductory texts are easily seen to be
merely arguments for the viability of the curvature interpretation, even though they are
often mis-labeled as arguments for the necessity of it.
There's no doubt that the evident universality of local Lorentz covariance, combined with
the equivalence principle, makes the curvature interpretation eminently viable, and it's
probably the "strongest" interpretation of general relativity in the sense of being exposed
most widely to falsification in principle, just as special relativity is stronger than
Lorentz's ether theory. The curvature interpretation has certainly been a tremendous
heuristic aid (maybe even indispensable) to the development of the theory, but the fact
remains that it isn't the only possible interpretation. In fact, many (perhaps most)
theoretical physicists today consider it likely that general relativity is really just an
approximate consequence of some underlying structure, similar to how continuum fluid
mechanics emerges from the behavior of huge numbers of elementary particles. As was
rightly noted earlier, much of the development of particle physics and more recently
string theory has been carried out in the context of rather naive-looking flat backgrounds.
Maybe Kant will be vindicated after all, and it will be shown that humans really aren't
capable of conceiving of the fundamental world on anything other than a flat geometrical
background. If so, it may tell us more about ourselves than about the world.
Another potential source of confusion is the tacit assumption on the part of some people
that the topology of our experiences is unambiguous, and this in turn imposes definite
constraints on the geometry via the Gauss-Bonnet theorem. Recall that for any two-dimensional manifold M the Euler characteristic is a topological invariant defined as

\[ \chi(M) = V - E + F \]

where V, E, and F denote the number of vertices, edges, and faces respectively of any
arbitrary triangulation of the entire surface. Extending the work that Gauss had done on
the triangular excess of curved surfaces, Bonnet proved in 1858 the beautiful theorem that
the integral of the Gaussian curvature K over the entire area of the manifold is
proportional to the Euler characteristic, i.e.,

\[ \iint_M K \, dA \;=\; 2\pi \, \chi(M) \]
More generally, for any manifold M of dimension n the invariant Euler characteristic is

\[ \chi(M) = \sum_{k=0}^{n} (-1)^k \, \sigma_k \]

where σk is the number of k-simplexes of an arbitrary "triangulation" of the manifold.


Also, we can let Kn denote the analog of the Gaussian curvature K for an n-dimensional
manifold, noting that for hypersurfaces this is just the product of the n principal extrinsic
curvatures, although like K it has a purely intrinsic significance for arbitrary embeddings.
The generalized Gauss-Bonnet theorem is then

\[ \int_M K_n \, dV \;=\; \tfrac{1}{2} \, V(S^n) \, \chi(M) \]
where V(Sn) is the "volume" of a unit n-sphere. Thus if we can establish that the topology
of the overall spacetime manifold has a non-zero Euler characteristic, it will follow that
the manifold must have non-zero metrical curvature at some point. Of course, the
converse is not true, i.e., the existence of non-zero metrical curvature at one or more
points of the manifold does not imply non-zero Euler characteristic. The two-dimensional
surface of a torus with the usual embedding in R3 not only has intrinsic curvature but is
topologically distinct from R2, and yet (as discussed in Section 7.5) it can be mapped
diffeomorphically and globally to an everywhere-flat manifold embedded in R4. This
illustrates the obvious fact that while topological invariants impose restrictions on the
geometry, they don't uniquely determine the geometry.
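As a concrete check of these formulas (anticipating the dodecahedron example below): a dodecahedron has V = 20, E = 30, F = 12, so χ = 20 − 30 + 12 = 2, and the curvature concentrated at each vertex is the angular defect 2π − 3(3π/5) = π/5, giving a total of

\[ 20 \times \frac{\pi}{5} = 4\pi = 2\pi\chi \]

in agreement with the Gauss-Bonnet theorem.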
Nevertheless, if a non-zero Euler characteristic is stipulated, it is true that any
diffeomorphic mapping of this manifold must have non-zero curvature at some point.
However, there are two problems with this argument. First, we need not be limited to
diffeomorphic mappings from the curved spacetime model, especially since even the
curvature interpretation contains singularities and physical infinities in some
circumstances. Second, the topology is not stipulated. The topology of the universe is a
global property which (like the geometry) can only be indirectly inferred from local
experiences, and the inference is unavoidably ambiguous. Thus the topology itself is
subject to re-interpretation, and this has always been recognized as part-and-parcel of any
major shift in geometrical interpretation. The examples that Poincaré and others talked
about often involved radical re-interpretations of both the geometry and the topology,
such as saying that instead of a cylindrical dimension we may imagine an unbounded but
periodic dimension, i.e., identical copies placed side by side. Examples like this aren't
intended to be realistic (necessarily), but to convey just how much of what we commonly
regard as raw empirical fact is really interpretative.
We can always save the appearances of any particular apparent topology with a
completely different topology, depending on how we choose to identify or distinguish the
points along various paths. The usual example of this is a cylindrical universe mapped to
an infinite periodic universe. Therefore, we cannot use topological arguments to prove
anything about the geometry. Indeed these considerations merely extend the degrees of
freedom in Poincare's conventionalist formula, from U = G + P to U = (G + T) + P, where
T represents topology. Obviously the metrical and topological models impose consistency
conditions on each other, but the two of them combined do not constrain U any more than
G alone, as long as the physical laws P remain free.

Of course, there may be valid reasons for preferring not to avail ourselves of any of the
physical assumptions (such as a "universal force", let alone multiple copies of regions,
etc.) that might be necessary to map general relativity to a flat manifold in various
(extreme) circumstances, such as in the presence of trapped surfaces or other
"pathological" topologies, but these are questions of convenience and utility, not of
feasibility. Moreover, as noted previously, the curvature interpretation itself entails
inextendable worldlines as soon as we posit a trapped surface, so topological anomalies
hardly give an unambiguous recommendation to the curvature interpretation.
The point is that we can always postulate a set of physical laws that will make our
observations consistent with just about any geometry we choose (even a single monadal
point!), because we never observe geometry directly. We only observe physical processes
and interactions. Geometry is inherently an interpretative aspect of our understanding. It
may be that one particular kind of geometrical structure is unambiguously the best (most
economical, most heuristically robust, most intuitively appealing, etc), and any
alternative geometry may require very labored and seemingly ad hoc "laws of physics" to
make it compatible with our observations, but this simply confirms Poincare's dictum that
no geometry is more true than any other - only more convenient.
It may seem as if the conventionality of geometry is just an academic fact with no real
applicability or significance, because all the examples of alternative interpretations that
we've cited have been highly trivial. For a more interesting example, consider a mapping
(by radial projection) from an ordinary 2-sphere to a circumscribed polyhedron, say a
dodecahedron. With the exception of the 20 vertices, where all the "curvature" is
discretely concentrated, the surface of the dodecahedron is perfectly flat, even along the
edges, as shown by the fact that we can "flatten out" two adjacent pentagonal faces on a
plane surface without twisting or stretching the surfaces at all. We can also flatten out a
third pentagonal face that joins the other two at a given vertex, but of course (in the usual
interpretation) we can't fit in a fourth pentagon at that vertex, nor do three quite "fill up"
the angular range around a vertex in the plane. At this stage we would conventionally pull
the edges of the three pentagons together so that the faces are no longer coplanar, but we
could also go on adjoining pentagonal surfaces around this vertex, edge to edge, just like
a multi-valued "Riemann surface" winding around a pole in the complex plane. As we
march around the vertex, it's as if we are walking up a spiral staircase, except that all the
surfaces are lying perfectly flat. This same "spiral staircase" is repeated at each vertex of
the solid.
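The angular deficit involved here is easily quantified. Each interior angle of a regular
pentagon is 108 degrees, so the three pentagons meeting at a vertex account for only 324
of the 360 degrees surrounding a point in the plane, leaving a deficit of 36° = π/5 per
vertex. Summed over the 20 vertices of the dodecahedron this gives

20 (π/5) = 4π = 2π χ,   with χ = 2

which is Descartes' angular-deficit theorem, the discrete counterpart of the Gauss-Bonnet
integral quoted above.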
Naturally we can replace the dodecahedron with a polyhedron having many more
vertices, but still consisting of nothing but flat surfaces, with all the "curvature"
distributed discretely at a huge number of vertices, each of which is a "pole" of an infinite
spiral staircase of flat surfaces. This structure is somewhat analogous to a "no-collapse"
interpretation of quantum mechanics, and might be called a "no-curvature" interpretation
of general relativity. At each vertex (cf. measurement) we "branch" into on-going flatness
across the edge, never actually "collapsing" the faces meeting at a vertex into a curved
structure. In essence the manifold has zero Euler characteristic, but it exhibits a
non-vanishing Euler characteristic modulo the faces of the polyhedron. Interestingly, the
term "branch" is used in multi-valued Riemann surfaces just as it's used in some descriptions
of the "no-collapse" interpretation of quantum mechanics. Also, notice that the non-linear
aspects of both theories are (arguably) excised by this maneuver, leaving us "only" to
explain how the non-linear appearances emerge from this aggregate, i.e., how the
different moduli are inter-related. To keep track of a particle we would need its entire
history of "winding numbers" for each vertex of the entire global manifold, in the order
that it has encountered them (because the windings do not commute), as well as its nominal
location modulo the faces of the polyhedron.
In this model the full true topology of the universe is very different from the apparent
topology modulo the polyhedral structure, and curvature is non-existent on the individual
branches, because every time we circle a non-flat point we simply branch to another level
(just as in some of the no-collapse interpretations of quantum mechanics the state sprouts
a new branch, rather than collapsing, each time an observation is made). Each time a
particle crosses an edge between two vertices its set of winding numbers is updated, and
we end up with a combinatorial approach, based on a finite number of discrete poles
surrounded by infinitely proliferating (and everywhere-flat) surfaces. We can also arrange
for the spiral staircases to close back on themselves after a suitable number of windings,
while maintaining a vanishing Euler characteristic.
For a less outlandish example of a non-trivial alternate interpretation of general relativity,
consider the "null surface" interpretation. According to this approach we consider only
the null surfaces of the traditional spacetime manifold. In other words, the only intervals
under consideration are those such that g_{μν} dx^μ dx^ν = 0. Traditional timelike paths are
represented in this interpretation by zigzag sequences of lightlike paths, which can be
made to approach arbitrarily closely to the classical timelike paths. The null condition
implies that there are really only three degrees of freedom for motion from any given
point, because given any three of the increments dx^0, dx^1, dx^2, and dx^3, the corresponding
increment of the fourth automatically follows (up to sign). The relation between this
interpretation and the conventional one is quite similar to the relation between special
relativity and Lorentz's ether theory. In both cases we can use essentially the same
equations, but whereas the conventional interpretation attributes ontological status to the
absolute intervals dτ, the null interpretation asserts that those absolute intervals are
ultimately superfluous conventionalizations (like Lorentz's ether), and encourages us to
dispense with those elements and focus on the topology of the null surfaces themselves.
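In flat Minkowski coordinates (taking c = 1) this reduction of the degrees of freedom, and
the zigzag approximation of timelike motion, are easy to exhibit; the following sketch (an
illustration only, with an arbitrarily chosen zigzag amplitude) approximates a body at rest
by a sequence of null segments:

import math

def null_dt(dx, dy, dz):
    # time increment forced by the null condition g_uv dx^u dx^v = 0
    # in Minkowski coordinates; the other root is the negative
    return math.sqrt(dx*dx + dy*dy + dz*dz)

# approximate a body at rest for 10 units of coordinate time by null
# zigzags of spatial amplitude eps (+eps then -eps in the x direction)
eps = 0.01
legs = int(10 / (2 * null_dt(eps, 0, 0)))
elapsed = sum(null_dt(s * eps, 0, 0) for _ in range(legs) for s in (+1, -1))
print(elapsed)    # 10.0, with zero net spatial displacement

The zigzag path approaches the timelike worldline in position as eps shrinks, even though
every segment of it is lightlike.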
8.1 Kepler, Napier, and the Third Law
There is special providence in the fall of a sparrow.
Shakespeare
By the year 1605 Johannes Kepler, working with the relativistic/inertial view of the solar
system suggested by Copernicus, had already discerned two important mathematical
regularities in the orbital motions of the planets:

I. Planets move in ellipses with the Sun at one focus.


II. The radius vector describes equal areas in equal times.
This shows the crucial role that interpretations and models sometimes play in the
progress of science, because it's obvious that these profoundly important observations
could never even have been formulated in terms of the Ptolemaic earth-centered model.
Oddly enough, Kepler arrived at these conclusions in reverse order, i.e., he first
determined that the radius vector of a planet's "oval shaped" path sweeps out equal areas
in equal times, and only subsequently determined that the "ovals" were actually ellipses.
It's often been remarked that Kepler's ability to identify this precise shape from its
analytic properties was partly due to the careful study of conic sections by the ancient
Greeks, particularly Apollonius of Perga, even though this study was conducted before
there was even any concept of planetary orbits. Kepler's first law is often cited as an
example of how purely mathematical ideas (e.g., the geometrical properties of conic
sections) can sometimes find significant applications in the descriptions of physical
phenomena.
After painstakingly extracting the above two "laws" of planetary motion (first published
in 1609) from the observational data of Tycho Brahe, there followed a period of more
than twelve years during which Kepler exercised his ample imagination searching for any
further patterns or regularities in the data. He seems to have been motivated by the idea
that the orbits of the planets must satisfy a common set of simple mathematical relations,
analogous to the mathematical relations which the Pythagoreans had discovered between
harmonious musical tones. However, despite all his ingenious efforts during these years,
he was unable to discern any significant new pattern beyond the two empirical laws
which he had found in 1605. Then, as Kepler later recalled, on the 8th of March in the
year 1618, something marvelous "appeared in my head". He suddenly realized that
III. The proportion between the periodic times of any two
planets is precisely one and a half times the proportion
of the mean distances.
In the form of a diagram, his insight looks like this:

[Figure: log-log plot of the orbital periods versus the mean distances of the planets,
falling on a straight line of slope 3/2]
At first it may seem surprising that it took a mathematically insightful man like Kepler
over twelve years of intensive study to notice this simple linear relationship between the
logarithms of the orbital periods and radii. In modern data analysis the log-log plot is a
standard format for analyzing physical data. However, we should remember that
logarithmic scales had not yet been invented in 1605. A more interesting question is why,
after twelve years of struggle, this way of viewing the data suddenly "appeared in his
head" early in 1618. (By the way, Kepler made some errors in the calculations in March,
and decided the data didn't fit, but two months later, on May 15 the idea "came into his
head" again, and this time he got the computations right.)
Is it just coincidental that John Napier's "Mirifici Logarithmorum Canonis Descriptio"
(published in 1614) was first seen by Kepler towards the end of the year 1616? We know
that Kepler was immediately enthusiastic about logarithms, which is not surprising,
considering the masses of computation involved in preparing the Rudolphine Tables.
Indeed, he even wrote a book of his own on the subject in 1621. It's also interesting that
Kepler initially described his "Third Law" in terms of a 1.5 ratio of proportions, exactly
as it would appear in a log-log plot, rather than in the more familiar terms of squared
periods and cubed distances. It seems as if a purely mathematical invention, namely
logarithms, whose intent was simply to ease the burden of manual arithmetical
computations, may have led directly to the discovery/formulation of an important
physical law, i.e., Kepler's third law of planetary motion. (Ironically, Kepler's academic
mentor, Michael Maestlin, chided him - perhaps in jest? - for even taking an interest in
logarithms, remarking that "it is not seemly for a professor of mathematics to be
childishly pleased about any shortening of the calculations".) By the 18th of May, 1618,
Kepler had fully grasped the logarithmic pattern in the planetary orbits:
Now, because 18 months ago the first dawn, three months ago the broad daylight,
but a very few days ago the full Sun of a most highly remarkable spectacle has
risen, nothing holds me back.

It's interesting to compare this with Einstein's famous comment about "...years of anxious
searching in the dark, with their intense longing, the final emergence into the light--only
those who have experienced it can understand it".
Kepler announced his Third Law in Harmonices Mundi, published in 1619, and also
included it in his "Ephemerides" of 1620. The latter was actually dedicated to Napier,
who had died in 1617. The cover illustration showed one of Galileo's telescopes, the
figure of an elliptical orbit, and an allegorical female (Nature?) crowned with a wreath
consisting of the Naperian logarithm of half the radius of a circle. It has usually been
supposed that this work was dedicated to Napier in gratitude for the "shortening of the
calculations", but Kepler obviously recognized that it went deeper than this, i.e., that the
Third Law is purely a logarithmic harmony. In a sense, logarithms played a role in
Kepler's formulation of the Third Law analogous to the role of Apollonius' conics in his
discovery of the First Law, and with the role that tensor analysis and Riemannian
geometry played in Einstein's development of the field equations of general relativity. In
each of these cases we could ask whether the mathematical structure provided the tool
with which the scientist was able to describe some particular phenomenon, or whether the
mathematical structure effectively selected an aspect of the phenomena for the scientist to
discern.
Just as we can trace Kepler's Third Law of planetary motion back to Napier's invention of
logarithms, we can also trace Napier's invention back to even earlier insights. It's no
accident that logarithms have applications in the description of Nature. Indeed in his
introduction to the tables, Napier wrote
A logarithmic table is a small table by the use of which we can obtain a
knowledge of all geometrical dimensions and motions in space...
The reference to motions in space is very appropriate, because Napier originally
conceived of his "artificial numbers" (later renamed logarithms, meaning number of the
ratio) in purely kinematical terms. In fact, his idea can be expressed in a form that Zeno
of Elea would have immediately recognized. Suppose two runners leave the starting
gate, travelling at the same speed, and one of them maintains that speed, whereas the
speed of the other drops in proportion to his distance from the finish line. The closer the
second runner gets to the finish line, the slower he runs. Thus, although he is always
moving forward, the second runner never reaches the finish line. As discussed in Section
3.7, this is exactly the kind of scenario that Zeno exploited to illustrate paradoxes of
motion. Here, 2000 years later, we find Napier making very different use of it, creating a
continuous mapping from the real numbers to his "artificial numbers". With an
appropriate choice of units we can express the position x of the first runner as a function
of time by x(t) = t, and the position X of the second runner is defined by the differential
equation dX/dt = 1 − X, where the position "1" represents the finish line. The solution of
this equation is X(t) = 1 − e^{−t}, where e^x is the function that equals its own derivative.
Then Napier defined x(t) as the "logarithm" of 1 − X(t), which is to say, he defined t as the
"logarithm" of e^{−t}. Of course, the definition of logarithm was subsequently revised so
that we now define t as the logarithm of e^t, the latter being the function that equals its
own derivative.
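The correspondence is easy to check numerically; the sketch below (with an illustrative
step size and time span) integrates the second runner's equation of motion and compares
the remaining distance 1 − X with e^{−t}:

import math

dt, t, X = 1e-5, 0.0, 0.0
while t < 2.0:                  # Euler integration of dX/dt = 1 - X
    X += (1.0 - X) * dt
    t += dt

# remaining distance vs. the closed form exp(-t): both ~0.1353
print(1.0 - X, math.exp(-t))

Napier's "logarithm" of the remaining distance 0.1353 is the elapsed time 2, whereas in
the modern convention the natural logarithm of 0.1353 is −2.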
The logarithm was one of many examples throughout history of ideas that were "in the
air" at a certain time. It had been known since antiquity that the exponents of numbers in
a geometric sequence are additive when terms are multiplied together, i.e., we have
a^n a^m = a^{m+n}. In fact, there are ancient Babylonian tablets containing sequences of
powers and
problems involving the determination of the exponents of given numbers. In the 1540's
Stifel's "Arithmetica integra" included tables of the successive powers of numbers, which
was very suggestive for Napier and others searching for ways to reduce the labor
involved in precise manual computations.
In the 1580's Viete derived several trigonometric formulas such as

cos(x) cos(y) = [ cos(x+y) + cos(x−y) ] / 2

If we have a table of cosine values this formula enables us to perform multiplication
simply by means of addition. For example, to find the product of 0.7831 and 0.9348 we
can set cos(x) = 0.7831 and cos(y) = 0.9348 and then look up the angles x,y with these
cosines in the table. We find x = 0.67116 and y = 0.36310, from which we have the sum
x+y = 1.03426 and the difference x−y = 0.30806. The cosines of the sum and difference
can then be looked up in the table, giving cos(x+y) = 0.51116 and cos(x-y) = 0.95292.
Half the sum of these two numbers equals the product 0.73204 of the original two
numbers. This technique was called prosthaphaeresis (the Greek word for addition and
subtraction), and was quickly adopted by scientists such as the Dane Tycho Brahe for
performing astronomical calculations. Of course, today we recognize that the above
formula is just a disguised version of the simple exponent addition rule, noting that
cos(x) = (e^{ix} + e^{−ix})/2.
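Expressed as a computation (with math.acos standing in for the printed tables, purely for
illustration), the whole prosthaphaeresis procedure amounts to this:

import math

def prosthaphaeresis_product(p, q):
    # multiply two numbers in (0, 1] using only cosine "lookups" and addition
    x, y = math.acos(p), math.acos(q)                 # look up the angles
    return (math.cos(x + y) + math.cos(x - y)) / 2.0  # half the sum

print(prosthaphaeresis_product(0.7831, 0.9348))   # 0.73204..., as above
print(0.7831 * 0.9348)                            # direct product, for comparison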
At about this same time (1594), John Napier was inventing his logarithms, whose
purpose was also to reduce multiplication and division to simple addition and subtraction
by means of a suitable transformation. However, Napier might never have set aside his
anti-Catholic polemics to work on producing his table of logarithms had it not been for an
off-hand comment made by Dr. John Craig, who was the physician to James VI of
Scotland (later James I of England and Ireland). In 1590 Craig accompanied James and
his entourage bound for Norway to meet his prospective bride Anne, who was supposed
to have journeyed from Denmark to Scotland the previous year, but had been diverted by
a terrible storm and ended up in Norway. (The storm was so severe that several supposed
witches were held responsible and were burned.) James' party, too, encountered severe
weather, but eventually he met Anne in Oslo and the two were married. On the journey
home the royal party visited Tycho Brahe's observatory on the island of Hven, and were
entertained by the famous astronomer, well known as the discoverer of the "new star" in
the constellation Cassiopeia. During this stay at Brahe's lavish Uraniborg ("castle in
the sky") Dr. Craig observed the technique of prosthaphaeresis that Brahe and his
assistants used to ease the burden of calculation. When he returned to Scotland, Craig
mentioned this to his friend the Baron of Murchiston (aka John Napier), and this seems to
have motivated Napier to devote himself to the development of his logarithms and the
generation of his tables, on which he spent the remaining 25 years of his life. During this
time Napier occasionally sent preliminary results to Brahe for comment.
Several other people had similar ideas about exploiting the exponential mapping for
purposes of computation. Indeed, Kepler's friend and assistant Jost Burgi evidently
devised a set of "progress tables" (basically anti-logarithm tables) around 1600, based on
the indices of geometric progressions, and made some use of these in his calculations.
However, he didn't fully perceive the potential of this correspondence, and didn't develop
it very far.
Incidentally, if the story of a group of storm-tossed nobles finding themselves on a
mysterious island ruled over by a magician sounds familiar, it may be because of
Shakespeare's "The Tempest", written in 1610. This was Shakespeare's last complete
play and, along with Love's Labor's Lost, his only original plot, i.e., these are the only
two of his plays whose plots are not known to have been based on pre-existing works. It
is commonly believed that the plot of "The Tempest" was inspired by reports of a group
of colonists bound for Virginia who were shipwrecked in Bermuda in 1609. However, it's
also possible that Shakespeare had in mind the story of James VI (who by 1610 was
James I, King of England) and his marriage expedition, arriving after a series of violent
storms on the island of the Danish astronomer and astrologer Tycho Brahe and his castle
in the sky (which, we may recall, included a menagerie of exotic animals). We know
"The Tempest" was produced at the royal court in 1611 and again in 1612 as part of the
festivities preceding the marriage of the King's daughter, and it certainly seems likely that
James and Anne would associate any story involving a tempest with their memories of
the great storms of 1589 and 1590 that delayed Anne's voyage to Scotland and prompted
James' journey to meet her. The providential aspects of Shakespeare's "The Tempest" and
its parallels with their own experiences could hardly have been lost on them.
Shakespeare's choice of the peculiar names Rosencrantz and Guildenstern for two minor
characters in "Hamlet, Prince of Denmark" gives further support to the idea that he was
familiar with Tycho, since those were the names of two of Tycho's ancestors appearing on
his coat of arms. There is also evidence that Shakespeare was personally close to the
Digges family (e.g., Leonard Digges contributed a sonnet to the first Folio), and Thomas
Digges was an English astronomer and mathematician who, along with John Dee, was
well acquainted with Tycho. Digges was an early supporter and interpreter of
Copernicus' relativistic ideas, and was apparently the first to suggest that our Sun was just
an ordinary star in an infinite universe of stars.
Considering all this, it is surely not too farfetched to suggest that Tycho may have been
the model for Prospero, whose name, being composed of Providence and sparrow, is an
example of Shakespeare's remarkable ability to weave a variety of ideas, influences, and
connotations into the fabric of his plays, just as we can see in Kepler's three laws the
synthesis of the heliocentric model of Copernicus, Apollonius' conics, and the logarithms
of Napier.

8.2 Newton's Cosmological Queries


Isack
received your letter and I perceived you
letter from mee with your cloth but
none to you your sisters present thai
love to you with my motherly lov
and prayers to god for you I
your loving mother
hanah
wollstrup may the 6. 1665
Newton famously declared that it is not the business of science to make hypotheses.
However, it's well to remember that this position was formulated in the midst of a bitter
dispute with Robert Hooke, who had criticized Newton's writings on optics when they
were first communicated to the Royal Society in the early 1670's. The essence of
Newton's thesis was that white light is composed of a mixture of light of different
elementary colors, ranging across the visible spectrum, which he had demonstrated by
decomposing white light into its separate colors and then reassembling those components
to produce white light again. However, in his description of the phenomena of color
Newton originally included some remarks about his corpuscular conception of light
(perhaps akin to the cogs and flywheels in terms of which James Maxwell was later to
conceive of the phenomena of electromagnetism). Hooke interpreted the whole of
Newton's optical work as an attempt to legitimize this corpuscular hypothesis, and
countered with various objections.
Newton quickly realized his mistake in attaching his theory of colors to any particular
hypothesis on the fundamental nature of light, and immediately back-tracked, arguing
that his intent had been only to describe the observable phenomena, without regard to any
hypotheses as to the cause of the phenomena. Hooke (and others) continued to criticize
Newton's theory of colors by arguing against the corpuscular hypothesis, causing Newton
to respond more and more angrily that he was making no hypothesis, he was describing
the way things are, and not claiming to explain why they are. This was a bitter lesson for
Newton and, in addition to initiating a life-long feud with Hooke, went a long way
toward shaping Newton's rhetoric about what science should be.
I use the term "rhetoric" because it is to some extent a matter of semantics as to whether a
descriptive theory entails a causative hypothesis. For example, when accused of invoking
an occult phenomenon in gravity, Newton replied that the phenomena of gravity are not
occult, although the causes may be. (See below.) Clearly the dispute with Hooke had
caused Newton to paint himself into the "hypotheses non fingo" corner, and this
somewhat accidentally became part of his legacy to science, which has ever after been
much more descriptive and less explanatory than, say, Descartes would have wished. This
is particularly ironic in view of the fact that Newton personally entertained a great many
bold hypotheses, including a number of semi-mystical hermetic explanations for all
manner of things, not to mention his painstaking interpretations of biblical prophecies.
Most of these he kept to himself, but when he finally got around to publishing his optical
papers (after Hooke had died) he couldn't resist including a list of 31 "Queries"
concerning the big cosmic issues that he had been too reticent to address publicly before.
The true nature of these "queries" can immediately be gathered from the fact that every
one of them is phrased in the form of a negative question, as in "Are not the Rays of
Light very small bodies emitted from shining substances?" Each one is plainly a
hypothesis phrased as a question.
The first edition of The Opticks (1704) contained only 16 queries, but when the Latin
edition was published in 1706 Newton was emboldened to add seven more, which
ultimately became Queries 25 through 31 when, in the second English edition, he added
Queries 17 through 24. Of all these, one of the most intriguing is Query 28, which begins
with the rhetorical question "Are not all Hypotheses erroneous in which Light is
supposed to consist of Pression or Motion propagated through a fluid medium?" In this
query Newton rejects the Cartesian idea of a material substance filling in and comprising
the space between particles. Newton preferred an atomistic view, believing that all
substances were comprised of hard impenetrable particles moving and interacting via
innate forces in an empty space (as described further in Query 31). After listing several
facts that make an aetheral medium inconsistent with observations, the discussion of
Query 28 continues
And for rejecting such a medium, we have the authority of those the oldest and
most celebrated philosophers of ancient Greece and Phoenicia, who made a
vacuum and atoms and the gravity of atoms the first principles of their
philosophy, tacitly attributing gravity to some other cause than dense matter. Later
philosophers banish the consideration of such a cause... feigning [instead]
hypotheses for explaining all things mechanically [But] the main business of
natural philosophy is to argue from phenomena without feigning hypotheses, and
to deduce causes from effects, till we come to the very first cause, which certainly
is not mechanical.
And not only to unfold the mechanism of the world, but chiefly to resolve such
questions as What is there in places empty of matter? and Whence is it that the
sun and planets gravitate toward one another without dense matter between them?
Whence is it that Nature doth nothing in vain? and Whence arises all that order
and beauty which we see in the world? To what end are comets? and Whence is it
that planets move all one and the same way in orbs concentrick, while comets
move all manner of ways in orbs very excentrick? and What hinders the fixed
stars from falling upon one another?
It's interesting to compare these comments of Newton with those of Socrates as recorded
in Plato's Phaedo

If then one wished to know the cause of each thing, why it comes to be or perishes
or exists, one had to find what was the best way for it to be, or to be acted upon,
or to act. I was ready to find out ... about the sun and the moon and the other
heavenly bodies, about their relative speed, their turnings, and whatever else
happened to them, how it is best that each should act or be acted upon. I never
thought [we would need to] bring in any other cause for them than that it was best
for them to be as they are.
This wonderful hope was dashed as I went on reading, and saw that [men]
mention as causes air and ether and water and many other strange things... It is
what the majority appear to do, like people groping in the dark; they call it a
cause, thus giving it a name which does not belong to it. That is why one man
surrounds the earth with a vortex to make the heavens keep it in place, another
makes the air support it like a wide lid. As for their capacity of being in the best
place they could possibly be put, this they do not look for, nor do they believe it to
have any divine force, but they believe that they will some time discover a
stronger and more immortal Atlas to hold everything together...
Both men are suggesting that a hierarchy of mechanical causes cannot ultimately prove
satisfactory, and that the first cause of things cannot be mechanistic in nature. Both
suggest that the macroscopic mechanisms of the world are just manifestations of an
underlying and irreducible principle of "order and beauty", indeed of a "divine force".
But Newton wasn't content to leave it at this. After lengthy deliberations, and discussions
with David Gregory, he decided to add the comment
Is not Infinite Space the Sensorium of a Being incorporeal, living and intelligent,
who sees the things themselves intimately, and thoroughly perceives them, and
comprehends them wholly by their immediate presence to himself?
Samuel Johnson once recommended a proof-reading technique to a young writer, telling
him that you should read over your work carefully, and whenever you come across a
phrase or passage that seems particularly fine, strike it out. Newton's literal identification
of Infinite Space with the Sensorium of God may have been a candidate for that
treatment, but it went to press anyway. However, as soon as the edition was released,
Newton suddenly got cold feet, and realized that he'd exposed himself to ridicule. He
desperately tried to recall the book and, failing that, he personally rounded up all the
copies he could find, cut out the offending passage with scissors, and pasted in a new
version. Hence the official versions contain the gentler statement (reverting once again to
the negative question!):
And these things being rightly dispatch'd, does it not appear from phaenomena
that there is a Being incorporeal, living, intelligent, omnipresent, who in infinite
space, as it were in his Sensory, sees the things themselves intimately, and
thoroughly perceives them, and comprehends them wholly by their immediate
presence to himself: Of which things the images only carried through the organs
of sense into our little sensoriums are there seen and beheld by that which in us
perceives and thinks. And though every true step made in this philosophy brings
us not immediately to the knowledge of the first cause, yet it brings us nearer to
it...
Incidentally, despite Newton's efforts to prevent it, one of the un-repaired copies had
already made its way out of the country, and was on its way to Leibniz, who predictably
cited the original "Sensorium of God" comment as evidence that Newton "has little
success with metaphysics".
Newton's 29th Query (not a hypothesis, mind you) was: "Are not the rays of light very
small bodies emitted from shining substances?" Considering that his mooting of this idea
over thirty years earlier had precipitated a controversy that nearly led him to a nervous
breakdown, one has to say that Newton was nothing if not tenacious. This query also
demonstrates how little his basic ideas about the nature of light had changed over the
course of his life. After listing numerous reasons for suspecting that the answer to this
question was Yes, Newton proceeded in Query 30 to ask the pregnant question "Are not
gross bodies and light convertible into one another?" Following Newton's rhetorical
device, should not this be interpreted as a suggestion of equivalence between mass and
energy?
The final pages of The Opticks are devoted to Query 31, which begins
Have not the small particles of bodies certain powers, virtues, or forces, by which
they act at a distance, not only upon the rays of light for reflecting, refracting, and
inflecting them, but also upon one another for producing a great part of the
Phenomena of nature?
Newton goes on to speculate that the force of electricity operates on very small scales to
hold the parts of chemicals together and govern their interactions, anticipating the
modern theory of chemistry. Most of this Query is devoted to an extensive (20 pages!)
enumeration of chemical phenomena that Newton wished to cite in support of this view.
He then returns to the behavior of macroscopic objects, asserting that
Nature will be very conformable to herself, and very simple, performing all the
great motions of the heavenly bodies by the attraction of gravity which intercedes
those bodies, and almost all the small ones of their particles by some other
attractive and repelling powers which intercede the particles.
This is a very clear expression of Newton's belief that forces act between separate
particles, i.e., at a distance. He continues
The Vis inertiae is a passive Principle by which Bodies persist in their Motion or
Rest, receive Motion in proportion to the Force impressing it, and resist as much
as they are resisted. By this Principle alone there never could have been any
Motion in the World. Some other Principle was necessary for putting Bodies into
Motion; and now they are in Motion, some other Principle is necessary for

conserving the motion.


In other words, Newton is arguing that the principle of inertia, by itself, cannot account
for the motion we observe in the world, because inertia only tends to preserve existing
states of motion, and only uniform motion in a straight line. Thus we must account for the
initial states of motion (the initial conditions), the persistence of non-inertial motions, and
for the on-going variations in the amount of motion that are observed. For this purpose
Newton distinguishes between "passive" attributes of bodies, such as inertia, and "active"
attributes of bodies, such as gravity, and he points out that, were it not for gravity, the
planets would not remain in their orbits, etc, so it is necessary for bodies to possess active
as well as passive attributes, because otherwise everything would soon be diffuse and
cold. Thus he is not saying that the planets would simply come to a halt in the absence of
active attributes, but rather that the constituents of any physical universe resembling ours
(containing persistent non-inertial motion) must necessarily possess active as well as
passive properties.
Next, Newton argues that the "amount of motion" in the world is not constant, in two
different respects. The first is rather interesting, because it makes very clear the fact that
he regarded ontological motion as absolute. He considers two identical globes in empty
space attached by a slender rod and revolving with angular speed ω about their combined
center of mass, and he says the center of mass is moving with some velocity v (in the
plane of revolution). If the radius from the center of mass to each globe is r, then the
globes have a speed of ωr relative to the center. When the connecting rod is periodically
oriented perpendicular to the velocity of the center, one of the globes has a speed equal to
v + ωr and the other a speed equal to v − ωr, so the total "amount of motion" (i.e., the sum
of the magnitudes of the momentums) is simply 2mv. However, when the rod is
periodically aligned parallel to the velocity of the center, the globes each have a total
speed of √(v² + ω²r²), so the total "amount of motion" is

2m √(v² + ω²r²)

Thus, Newton argues, the total quantity of motion of the two globes fluctuates
periodically between this value and 2mv. Obviously he is expressing the belief that the
"amount of motion" has absolute significance. (He doesn't remark on the fact that the
kinetic energy in this situation is conserved).
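A short computation (with illustrative values m = 1, v = 1, and ωr = 0.5) confirms both
the fluctuation and the parenthetical remark about kinetic energy:

import math

m, v, wr = 1.0, 1.0, 0.5    # mass, center-of-mass speed, tangential speed w*r

def globe_speeds(phi):
    # rod at phase angle phi from the direction of motion; each globe's
    # velocity is the center velocity (v, 0) plus or minus the tangential
    # velocity of magnitude wr, directed perpendicular to the rod
    tx, ty = -wr * math.sin(phi), wr * math.cos(phi)
    return math.hypot(v + tx, ty), math.hypot(v - tx, ty)

for phi in (math.pi / 2, 0.0):
    s1, s2 = globe_speeds(phi)
    motion = m * (s1 + s2)                 # sum of momentum magnitudes
    energy = 0.5 * m * (s1**2 + s2**2)     # kinetic energy
    print(phi, motion, energy)

# rod perpendicular (phi = pi/2): motion = 2*m*v = 2.0
# rod parallel (phi = 0): motion = 2*m*sqrt(v^2 + (wr)^2) = 2.236...
# the kinetic energy is 1.25 at every phase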
The other way in which, Newton argues, the amount of motion is not conserved is in
inelastic collisions, such as when two masses of clay collide and the bodies stick together.
Of course, even in this case the momentum vector is conserved, but again the sum of the
magnitudes of the individual momentums is reduced. Also, in this case, the kinetic energy
is dissipated as heat. Interestingly, Newton observes that, aside from the periodic
fluctuations such as with the revolving globes, the net secular change in total "amount of
motion" is always negative.

By reason of the tenacity of fluids, the attrition of their parts... motion is much
more apt to be lost than got, and is always upon the Decay.
This can easily be seen as an early statement of statistical thermodynamics and the law of
entropy. In any case, from this tendency for motion to decay, Newton concludes that
eventually the Universe must "run down", and "all things would grow cold and freeze,
and become inactive masses".
Newton also mentions one further sense in which (he believed) passive attributes alone
were insufficient to account for the persistence of well-ordered motion that we observe.
...blind fate could never make all the planets move one and the same way in orbs
concentrick, some inconsiderable irregularities excepted, which may have risen
from the action of comets and planets upon one another, and which will be apt to
increase, till this system wants a reformation.
In addition to whatever sense of design and/or purpose we may discern in the initial
conditions of the solar system, Newton also seems to be hinting at the idea that, in the
long run, any initial irregularities, however "inconsiderable" they may be, will increase
until the system wants reformation. In recent years we've gained a better appreciation of
the fact that Newton's laws, though strictly deterministic, are nevertheless potentially
chaotic, so that the overall long-term course of events can quickly come to depend on
arbitrarily slight variations in initial conditions, rendering the results unpredictable on the
basis of any fixed level of precision.
So, for all these reasons, Newton argues that passive principles such as inertia cannot
suffice to account for what we observe. We also require active principles, among which
he includes gravity, electricity, and magnetism. Beyond this, Newton suggests that the
ultimate "active principle" underlying all the order and beauty we find in the world, is
God, who not only set things in motion, but from time to time must actively intervene to
restore their motion. This was an important point for Newton, because he was genuinely
concerned about the moral implications of a scientific theory that explained everything as
the inevitable consequence of mechanical principles. This is why he labored so hard to
reconcile his clockwork universe with an on-going active role for God. He seems to have
found this role in the task of resisting an inevitable inclination of our mechanisms to
descend into dissipation and veer into chaos.
In this final Query Newton also took the opportunity to explicitly defend his abstract
principles such as inertia and gravity, which some critics charged were occult.
These principles I consider not as occult qualities...but as general laws of nature,
by which the things themselves are formed, their truth appearing to us by
phenomena, though their causes be not yet discovered. For these are manifest
qualities, and their causes only are occult. The Aristotelians gave the name of
occult qualities not to manifest qualities, but to such qualities only as they
supposed to lie hid in Bodies, and to be the unknown causes of manifest effects,

such as would be the causes of gravity... if we should suppose that these forces or
actions arose from qualities unknown to us, and uncapable of being made known
and manifest. Such occult qualities put a stop to the improvement of natural
philosophy, and therefore of late years have been rejected. To tell us that every
species of things is endowed with an occult specific quality by which it acts and
produces manifest effects is to tell us nothing...
The last set of Queries to be added, now numbered 17 through 24, appeared in the second
English edition in 1717, when Newton was 75. These are remarkable in that they argue
for an aether permeating all of space - despite the fact that Queries 25 through 31 argue at
length against the necessity for an aether, and those were hardly altered at all when
Newton added the new Queries which advocate an aether. (It may be worth noting,
however, that the reference to "empty space" in the original version of Query 28 was
changed at some point to "nearly empty space".) It seems to be the general opinion
among Newtonian scholars that these "Aether Queries" inserted by Newton in his old age
were simply attempts "to placate critics by seeming retreats to more conventional
positions". The word "seeming" is well chosen, because we find in Query 21 the
comments
And so if any one should suppose that aether (like our air) may contain particles
which endeavour to recede from one another (for I do not know what this aether
is), and that its particles are exceedingly smaller than those of air, or even than
those of light, the exceeding smallness of its particles may contribute to the
greatness of the force by which those particles may recede from one another, and
thereby make that medium exceedingly more rare and elastick than air, and by
consequence exceedingly less able to resist the motions of projectiles, and
exceedingly more able to press upon gross bodies, by endeavoring to expand
itself.
Thus Newton not only continues to view light as consisting of particles, but imagines that
the putative aether may also be composed of particles, between which primitive forces
operate to govern their movements. It seems that the aether of these queries was a
distinctly Newtonian one, and its purpose was as much to serve as a possible mechanism
for gravity as for the refraction and reflection of light. It's disconcerting that Newton
continued to be misled by his erroneous belief that refracted paths proceed from more
dense to less dense regions, which required him to posit an aether surrounding the Sun
with a density that increases with distance, so that the motion of the planets may be seen
as a tendency to veer toward less dense parts of the aether.
There's a striking parallel between this set of "pro-Aether Queries" of Newton and the
famous essay "Ether and the Theory of Relativity", in which Einstein tried to reconcile
his view of physics with something that could be termed an ether. Of course, it turned out
to be a distinctly Einsteinian ether, immaterial, and incapable of being assigned any place
or state of motion.
Since I've credited Newton with suggesting the second law of thermodynamics and
mass-energy equivalence, I may as well mention that he could also be regarded as the
originator of the notorious "cosmological constant", which has had such a checkered
history in the theory of relativity. Recall that the weak/slow limit of Einstein's field
equations without the cosmological term corresponds to a gravitational relation of the
familiar form

F = −GMm/r²     (1)

but if a non-zero cosmological constant Λ is assumed the weak/slow limit is

F = −GMm/r² + (Λc²/3) m r     (2)

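In modern terms both limits follow from the weak-field potential; a standard
reconstruction (not Newton's notation, with Λ denoting the cosmological constant) is

Φ(r) = −GM/r − (Λc²/6) r²,     F = −m dΦ/dr = −GMm/r² + (Λc²/3) m r

so the cosmological term appears in the weak/slow limit as a force directly proportional
to the distance.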
As it happens, Newton explored the consequences of a wide range of central force laws in
the Principia, and determined that the only two forms for which spherically symmetrical
masses can be treated as if all the mass was located at the central point are F = k/r² and
F = kr. (See Propositions LXXVII and LXXVIII in Book I.) In addition to this distinctive
spherical symmetry property (analogous to Birkhoff's theorem for general relativity),
these are also the only two central force laws for which the shape of orbits in a two-body
system are perfect conic sections (see Proposition X), although in the case of a force
directly proportional to the distance the center of force is at the center of the conic, rather
than at a focus. In the Scholium following the discussion of spherically symmetrical
bodies Newton wrote
I have now explained the two principal cases of attractions; to wit, when the
centripetal forces decrease as the square of the ratio of the distances, or increase
in a simple ratio of the distances, causing the bodies in both cases to revolve in
conic sections, and composing spherical bodies whose centripetal forces observe
the same law of increase or decrease in the recess from the center as the forces
from the particles themselves do; which is very remarkable.
Considering that Newton referred to these two special cases as the two principal cases of
"attraction", it's not too much of a stretch to say that the full general law of attraction (or
gravitation) developed in the Principia was actually (2) rather than (1), and it was only in
Book III (The System of the World), in which the laws are fit to actual observed
phenomena, that he concludes there is no (discernable) evidence for the direct term. The
situation is essentially the same today, i.e., on a purely formal mathematical basis the
cosmological term seems to "fit", at least up to a point, but the empirical justification for
it remains unclear. If Λ is non-zero, it must be quite small, at least in the current epoch. So
I think it can be said with some justification that Newton actually originated the
cosmological term in theoretical investigations of gravity.
As an example of how seriously Newton took these "non-physical" possibilities, he noted
that with an inverse-square law the introduction of a third body generally destroys perfect
ellipticity of the orbits, causing the ellipses to precess, whereas in Proposition LXIV he
shows that with a pure direct force law F = kr this is not the case. In other words, the

orbits remain perfectly elliptical even with three or more gravitating bodies, although the
presence of more bodies increases the velocities and decreases the periods of the orbits.
These serious considerations show that Newton wasn't simply trying to fit data to a
model. He was interested in the same aspect of science that Einstein said interested him
the most, namely, "whether God had any choice in how he created the world". This may
be a somewhat melodramatic way of expressing it, but the basic idea is clear. It isn't
enough to discern that objects appear to obey an inverse square law of attraction; Newton
wanted to understand what was special about the inverse square, and why nature chose
that form rather than some other. Socrates alluded to this same wish in Phaedo:
If then one wished to know the cause of each thing, why it comes to be or perishes
or exists, one had to find out what was the best way for it to be, or to be acted
upon, or to act.
Although this attitude may strike us as silly, it seems undeniable that it's been an
animating factor in the minds of some of the greatest scientists - the urge to comprehend
not just what is, but why it must be so.

8.3 The Helen of Geometers


I first have to learn to watch very respectfully as the masters of creativity
perform their intellectual climbing feats, while I stay bowleggedly below
in the valley mist. I already have a premonition that up there the sun is
always shining!
Hedwig Born
to Einstein, 1919
The curve traced out by a point on the rim of a rolling circle is called a cycloid, and we've
seen that this curve describes gravitational free-fall, both in Newtonian mechanics and in
general relativity (in terms of the free-falling proper time). Remarkably, this curve has
been a significant object of study for almost every major scientist mentioned in this book,
and has been called "the Helen of geometers" because of all the disputes it has provoked
between mathematicians. It was first discussed by Charles Bouvelles in 1501 as a
mechanical means of squaring the circle. Subsequently Galileo and his student Viviani
studied the curve, finding a method of constructing tangents, and Galileo suggested that it
might be a suitable shape for an arch bridge.
Mersenne publicized the cycloid among his group of correspondents, including the young
Roberval, who, by the 1630's had determined many of the major properties of the cycloid,
such as the interesting fact that the area under a complete cycloidal arch is exactly three
times the area of the rolling circle. Roberval used his problem-solving techniques in 1634
to win the Mathematics chair at the College Royal, which was determined every three
years by an open competition. Unfortunately, the contest did not require full disclosure of
the solution methods, so the incumbent (who selected the contest problems) had a strong
incentive to keep his best methods a secret, lest they be used to unseat him at the next
contest. In retrospect, this was not a very wise arrangement for a teaching position.
Roberval held the chair for 40 years, but by keeping his solution methods secret he lost
priority for several important discoveries, and became involved in numerous quarrels.
One of the men accused by Roberval of plagiarism was Torricelli, who in 1644 was the
first to publish an explanation of the area and the tangents of the cycloid. It's now
believed that Torricelli arrived at his results independently. (Torricelli served as Galileo's
assistant for a brief time, and probably learned of the cycloid from him.)
In 1658, four years after renouncing mathematics as a vainglorious pursuit, Pascal found
himself one day suffering from a painful toothache, and in desperation began to think
about the cycloid to take his mind off the pain. Quickly the pain abated, and Pascal
interpreted this as a sign from the Almighty that he should proceed to study the cycloid,
which he did intensively for the next eight days. During this period he rediscovered most
of what had already been learned about the cycloid, and several results that were new.
Pascal decided to propose a set of challenge problems, with the promise of a first and
second prize to be awarded for the best solutions. Roberval was named as one of the
judges. Only two sets of solutions were received, from Antoine de Lalouvere and John
Wallis, but Pascal and Roberval decided that neither of the entries merited a prize, so no
prizes were awarded. Instead, Pascal published his own solutions, along with an essay on
the "History of the Cycloid", in which he essentially took Roberval's side in the priority
dispute with Torricelli.
The conduct of Pascal's cycloid contest displeased many people, but it had at least one
useful side effect. In 1658 Christiaan Huygens was thinking about how to improve the
design of clocks, and of course he realized that the period of oscillation of a simple
pendulum (i.e., a massive object constrained to move along a circular arc under the
vertical force of gravity) is not perfectly independent of the amplitude. Prompted by
Pascal's contest, Huygens decided to consider how an object would oscillate if
constrained to follow an upside-down cycloidal path, and found to his delight that the
frequency of such a system actually is perfectly independent of the amplitude. Thus he
had discovered that the cycloid is the tautochrone, i.e., the curve for which the time taken
by a particle sliding from any point on the curve to the lowest point on the curve is the
same, independent of the starting point. He presented this result in his great treatise
"Horologium Oscillatorium" (not published until 1673), in which he clearly described the
modern principle of inertia (the foundation of relativity), the law of centripetal force, the
conservation of kinetic energy, and many other important concepts of dynamics - ten
years before Newton's "Principia".
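Huygens' property is easy to verify numerically. The following sketch (with illustrative
parameters a = 1 and g = 9.81; the quadrature converges despite the vanishing speed at
release) integrates ds/v along the cycloid x = a(θ − sin θ), y = a(1 − cos θ) from two
different release points down to the bottom of the arch:

import math

def descent_time(t0, a=1.0, g=9.81, steps=200000):
    # time to slide from release angle t0 to the bottom (theta = pi),
    # integrating arc length over speed by the midpoint rule
    total, dt = 0.0, (math.pi - t0) / steps
    y0 = a * (1 - math.cos(t0))
    for i in range(steps):
        t = t0 + (i + 0.5) * dt
        ds = 2 * a * math.sin(t / 2) * dt            # arc length element
        v = math.sqrt(2 * g * (a * (1 - math.cos(t)) - y0))
        total += ds / v
    return total

print(descent_time(0.5))                  # ~1.003 seconds
print(descent_time(2.0))                  # ~1.003 seconds again
print(math.pi * math.sqrt(1.0 / 9.81))    # the exact value, pi*sqrt(a/g)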
The cycloid went on attracting the attention of the world's best mathematicians, and
revealing new and remarkable properties. For example, in June of 1696, John Bernoulli
issued the following challenge to the other mathematicians of Europe:
If two points A and B are given in a vertical plane, to assign to a mobile particle
M the path AMB along which, descending under its own weight, it passes from

the point A to the point B in the briefest time.


Pictorially the problem is as shown below:

In accord with its defining property, the requested curve is called the brachistochrone.
The solution was first found by Jean and/or Jacques Bernoulli, depending on whom you
believe. (Each of the brothers worked on the problem, and they later accused each other
of plagiarism.) Jean, who was never accused of understating the significance of his
discoveries, revealed his solution in January of 1697 by first reminding his readers of
Huygens' tautochrone, and then saying "you will be petrified with astonishment when I
say that precisely this same cycloid... is our required brachistochrone".
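The cycloid's superiority over the straight chord is also easy to check. With a = 1 and
g = 9.81 (illustrative values, starting from rest), the descent time along the cycloid
x = a(θ − sin θ), y = a(1 − cos θ) from the cusp to the point at parameter θ_B works out
to exactly √(a/g) θ_B, while the time down the straight chord follows from uniform
acceleration along the slope:

import math

a, g, tB = 1.0, 9.81, math.pi / 2
xB, yB = a * (tB - math.sin(tB)), a * (1 - math.cos(tB))

T_cycloid = math.sqrt(a / g) * tB          # exact for the cycloid

L = math.hypot(xB, yB)                     # chord length
T_chord = L * math.sqrt(2 / (g * yB))      # slide down a frictionless incline

print(T_cycloid, T_chord)    # ~0.501 s versus ~0.520 s: the cycloid wins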
Incidentally, the Bernoullis were partisans on the side of Leibniz in the famous priority
dispute between Leibniz and Newton over the invention of calculus. Before revealing his
solution to the brachistochrone challenge problem, Jean Bernoulli along with Leibniz sent
a copy of the challenge directly to Newton in England, and included in the public
announcement of the challenge the words
...there are fewer who are likely to solve our excellent problems, aye, fewer even
among the very mathematicians who boast that [they]... have wonderfully
extended its bounds by means of the golden theorems which (they thought) were
known to no one, but which in fact had long previously been published by others.
It seems clear the intent was to humiliate the aging Newton (who by then had left
Cambridge and was Warden of the Mint), by demonstrating that he was unable to solve a
problem that Leibniz and the Bernoullis had solved. The story as recounted by Newton's
biographer Conduitt is that Sir Isaac "in the midst of the hurry of the great recoinage did
not come home till four from the Tower very much tired, but did not sleep till he had
solved it, which was by 4 in the morning." In all, Bernoulli received only three solutions
to his challenge problem, one from Leibniz, one from l'Hospital, and one anonymous
solution from England. Bernoulli supposedly said he knew who the anonymous author
must be, "as the lion is recognized by his print". Newton was obviously proud of his
solution, although he commented later that "I do not love to be dunned & teezed by
forreigners about Mathematical things..."
It's interesting that Jean Bernoulli apparently arrived at his result from his studies of the
path of a light ray through a non-uniform medium. He showed how this problem is
related in general to the mechanical problem of an object moving with varying speeds
due to any cause. For example, he compared the mechanical problem with refraction in a
medium whose density is inversely proportional to the speed that a heavy body acquires
in gravitational freefall. "In this way", he wrote, "I have solved two important problems -
an optical and a mechanical one...". Then he specialized this to Galileo's law of falling
bodies, according to which the speeds of two falling bodies are to each other as the
square roots of the altitudes traveled. He concluded
Before I end I must voice once more the admiration I feel for the unexpected
identity of Huygens' tautochrone and my brachistochrone. I consider it especially
remarkable that this coincidence can take place only under the hypothesis of
Galileo, so that we even obtain from this a proof of its correctness. Nature always
tends to act in the simplest way, and so it here lets one curve serve two different
functions, while under any other hypothesis we should need two curves...
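The derivation behind this "unexpected identity" can be summarized in a few lines.
Applying Snell's law continuously gives sin(θ)/v = constant along the ray, where θ is the
inclination from the vertical. Inserting Galileo's v = √(2gy) together with
sin(θ) = 1/√(1 + y′²) yields

y (1 + y′²) = 2a = constant

which is satisfied by the cycloid x = a(θ − sin θ), y = a(1 − cos θ), as may be verified
by direct substitution.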
Presumably his enthusiasm would have been even greater had he known that the same
curve describes radial gravitational freefall versus proper time in general relativity. We
see from Bernoulli's work that the variational techniques developed to solve problems
like the brachistochrone also found physical application in what came to be called the
principle of least action, a principle usually attributed to Maupertuis, or perhaps Leibniz
(if one accepts the contention that "the best of all possible worlds" represents an
expression of this principle). One particularly striking application of this variational
approach was Fermat's principle of least time for light rays, as discussed in Section 3.4.
Essentially the same technique is used to determine the equations of a geodesic path in
the curved spacetime of general relativity.
In the twentieth century, Planck was the most prominent enthusiast for the variational
approach, asserting that the principle of least action is "perhaps that which, as regards
form and content, may claim to come nearest to that ideal final aim of theoretical
research". Indeed he even (at times) argued that the principle manifests a deep
teleological aspect of nature, since it can be interpreted as a global imperative, i.e.,
systems evolve locally in a way that extremizes (or makes stationary) certain global
measures in a temporally symmetrical way, as if the final state were already determined.
He wrote
In fact, the least-action principle introduces an entirely new idea into the concept
of causality: The causa efficiens, which operates from the present into the future
and makes future situations appear as determined by earlier ones, is joined by the
causa finalis for which, inversely, the future (namely, a definite goal) serves as
the premise from which there can be deduced the development of the processes
which lead to this goal.
It's surprising to see this called an entirely new idea, considering that causa finalis was
among the four fundamental kinds of causation enunciated by Aristotle. In any case,
throughout his life the normally austere and conservative Planck continued to have an
almost mystical reverence for the principle of least action, arguing that it is not only the
most comprehensive of all physical laws, but that it actually represents the purest
expression of the thoughts of God.


Interestingly, Fermat himself was much less philosophically committed to the principle
that he himself originated (somewhat like Einstein's ambivalence toward the quantum
theory). After being challenged on the fundamental truth of the "least time" principle as a
law of nature by the Cartesian Clerselier, Fermat replied in exasperation
I do not pretend and I have never pretended to be in the secret confidence of
nature. She moves by paths obscure and hidden...
Fermat was content to regard the principle of least time as a purely abstract mathematical
theorem, describing, though not necessarily explaining, the behavior of light.
8.4 Refractions on Relativity
For now we see through a glass, darkly; but then face to face. Now I know in part, but
then shall I know even as also I am known.
I Corinthians 13:12

We saw in Section 3.4 that Fermat's Principle of least time predicts the paths of light rays
passing through a plane boundary between regions of constant refractive index, but to
more fully appreciate this principle it's useful to develop the equations of motion for light
rays in a medium with arbitrarily varying refractive index. First, notice that Snell's law
enables us to determine the paths of optical rays passing through a discrete boundary
between regions of constant refractive index, but doesn't explicitly tell us the path of light
in a medium of continuously varying refractivity. To determine this, we can refer to
Fresnel's equations, which give the intensities of the reflected and transmitted rays at
such a boundary. For light normally incident on the boundary between regions with
indices n1 and n2, the reflected fraction of the incident energy is

R = [(n2 − n1)/(n2 + n1)]²

Consequently, the fraction of incident energy that is transmitted is 1 − R. However, this


formula assumes the thickness of the boundaries between regions of constant refractive
index is small in comparison with the wavelength of the light, whereas in many real
circumstances the density of the medium does not change abruptly at well-defined
boundaries, but varies continuously as a function of position. Therefore, we would like a
means of tracing rays of light as they pass through a medium with a continuously varying
index of refraction.
Notice that if we approximate a continuously changing index of refraction by a sequence
of thin uniform plates, as we add more plates the ratio of n2/n1 from one region to the next
approaches 1, and so according to Snell's Law the value of θ2 approaches the value of θ1.
From Fresnel's equations we see that in this case the fraction of incident energy that is
reflected goes to zero, and we find that a light ray with a given trajectory proceeds in just
one direction through the continuous medium (provided the gradient of the scalar field
n(x,y) is never too great relative to the wavelength of the light). So, it should be possible

to predict the unique path of transmission of a light ray in a medium with continuously
varying index of refraction.
Perhaps the most direct approach is via the usual calculus of variations. (For
convenience we'll just work in 2 dimensions, but all the formulas can immediately be
generalized to three dimensions.) We know that the index of refraction n at a point (x,y)
equals c/v, where v is the velocity of light at that point. Thus, if we parameterize the path
by the equations x = x(u) and y = y(u), the "optical path length" from point A to point B
(i.e., the time taken by a light beam to traverse the path) is given by the integral

T = (1/c) ∫ n(x,y) √(ẋ² + ẏ²) du

where dots signify derivatives with respect to the parameter u. To make this integral an
extremum, let f denote the integrand function

f = n(x,y) √(ẋ² + ẏ²)

Then the Euler equations (introduced in Section 5.4) are

d/du(∂f/∂ẋ) − ∂f/∂x = 0        d/du(∂f/∂ẏ) − ∂f/∂y = 0

which gives

d/du[n ẋ/√(ẋ² + ẏ²)] = (∂n/∂x)√(ẋ² + ẏ²)        d/du[n ẏ/√(ẋ² + ẏ²)] = (∂n/∂y)√(ẋ² + ẏ²)

Now, if we define our parameter u as the spatial path length s, then we have ẋ² + ẏ² = 1,
and so the above equations reduce to

d/ds(n dx/ds) = ∂n/∂x        (1a)

d/ds(n dy/ds) = ∂n/∂y        (1b)
These are the "equations of motion" for a photon in a heterogeneous medium, as they are
usually formulated, in terms of the spatial path parameter s. However, another approach
to this problem is to define a temporal metric on the space, i.e., a metric that represents
the time taken by a light beam to travel from one point to another. This temporal
approach has remarkable formal similarities to Einstein's metrical theory of gravity.
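To make equations (1a) and (1b) concrete, here is a minimal numerical sketch (my own illustration, not part of the original text) that traces a ray through a smoothly varying medium using a hand-rolled fourth-order Runge-Kutta integrator; the linear profile n(x,y) = Ax + B anticipates the example treated below, and all names and parameter values are merely illustrative.

    # Sketch: integrate d/ds(n dx/ds) = dn/dx, d/ds(n dy/ds) = dn/dy,
    # where s is arc length and (vx, vy) is the unit tangent vector.
    import math

    A, B = 5.0, 0.2                       # illustrative linear profile n = A*x + B
    def n(x, y):  return A*x + B
    def nx(x, y): return A                # partial dn/dx
    def ny(x, y): return 0.0              # partial dn/dy

    def deriv(state):
        x, y, vx, vy = state
        N = n(x, y)
        return (vx, vy,
                (nx(x, y)*(1 - vx*vx) - ny(x, y)*vx*vy)/N,
                (ny(x, y)*(1 - vy*vy) - nx(x, y)*vx*vy)/N)

    def rk4_step(state, h):
        k1 = deriv(state)
        k2 = deriv(tuple(s + 0.5*h*k for s, k in zip(state, k1)))
        k3 = deriv(tuple(s + 0.5*h*k for s, k in zip(state, k2)))
        k4 = deriv(tuple(s + h*k for s, k in zip(state, k3)))
        return tuple(s + (h/6.0)*(a + 2*b + 2*c + d)
                     for s, a, b, c, d in zip(state, k1, k2, k3, k4))

    # Launch a ray at 45 degrees and check Snell's invariant n*sin(theta),
    # which must stay constant when n varies only with x (see below).
    state = (1.0, 0.0, math.cos(math.pi/4), math.sin(math.pi/4))
    before = n(state[0], state[1])*state[3]
    for _ in range(2000):
        state = rk4_step(state, 0.001)
    after = n(state[0], state[1])*state[3]
    print(before, after)                  # nearly equal

The final check anticipates the Snell's-law verification of the rays in Figure 1 discussed below.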

According to Fermat's Principle, the path taken by a ray of light from one point to another
is such that the time is minimal (for slight perturbations of the path). Therefore, if we
define a metric in the x,y space such that the metrical "distance" between any two
infinitesimally close points is proportional to the time required by a photon to travel from
one point to the other, then the paths of photons in this space will correspond to the
geodesics.
Since the refractive index n is a smooth continuous function of x and y, it can be regarded
as constant in a sufficiently small region surrounding any particular point (x,y). The
incremental spatial distance from this point to the nearby point (x+dx, y+dy) is given by
ds² = dx² + dy², and the incremental time dτ for a photon to travel the incremental
distance ds is simply ds/v where v = c/n. Therefore, we have dτ = (n/c)ds, and so our
metrical line element for this space is

(dτ)² = (n/c)² [(dx)² + (dy)²]        (2)

If, instead of x and y, we name our two spatial coordinates x¹ and x² (where these
superscripts denote indices, not exponents) we can express equation (2) in tensor form as

(dτ)² = guv dxᵘ dxᵛ        (3)

where guv is the covariant metric tensor

g₁₁ = g₂₂ = (n/c)²,   g₁₂ = g₂₁ = 0        (4)
Note that in equation (3) we have invoked the usual summation convention. The
contravariant form of the metric tensor, denoted by guv, is the matrix inverse of (4).
According to Fermat's Principle, the path of a light ray must be a geodesic path based on
this metric. As discussed in Section 5.4, the equations of a geodesic path are

d²xᵃ/dτ² + Γᵃ_bc (dxᵇ/dτ)(dxᶜ/dτ) = 0        (5)

Based on the metric of our 2D optical space we have the eight Christoffel symbols

Γ¹_11 = nx/n    Γ¹_12 = Γ¹_21 = ny/n    Γ¹_22 = −nx/n
Γ²_11 = −ny/n    Γ²_12 = Γ²_21 = nx/n    Γ²_22 = ny/n

Inserting these into (5) gives the equations for geodesic paths, which define the paths of
light rays in this region. Reverting back to our original notation of x,y for our spatial
coordinates, the differential equations for ray paths in this medium of continuously
varying refractive index are

d²x/dτ² = (nx/n)[(dy/dτ)² − (dx/dτ)²] − (2ny/n)(dx/dτ)(dy/dτ)        (6a)

d²y/dτ² = (ny/n)[(dx/dτ)² − (dy/dτ)²] − (2nx/n)(dx/dτ)(dy/dτ)        (6b)

where nx and ny denote partial derivatives of n with respect to x and y respectively.


These are the equations of motion for light based on the temporal metric approach.
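As a cross-check on the eight Christoffel symbols quoted above, the following short symbolic sketch (again mine, not from the text) computes them directly from the metric guv = (n/c)² I for an arbitrary function n(x,y):

    # Sketch: Gamma^a_bc = (1/2) g^{ae} (g_eb,c + g_ec,b - g_bc,e)
    import sympy as sp

    x, y, c = sp.symbols('x y c', positive=True)
    n = sp.Function('n')(x, y)
    X = [x, y]
    g = sp.diag((n/c)**2, (n/c)**2)       # the "temporal" metric tensor
    ginv = g.inv()

    def christoffel(a, b, d):
        return sp.simplify(sum(sp.Rational(1, 2)*ginv[a, e]*(
            sp.diff(g[e, b], X[d]) + sp.diff(g[e, d], X[b]) - sp.diff(g[b, d], X[e]))
            for e in range(2)))

    for a in range(2):
        for b in range(2):
            for d in range(2):
                print(a, b, d, christoffel(a, b, d))   # e.g. (0,0,0) -> n_x/n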
To show that these equations, based on the temporal path parameter τ, are equivalent to
equations (1a) and (1b) based on the spatial path parameter s, notice that s and τ are
linked by the relation ds/dτ = c/n where c is the velocity of light. Multiplying both inside
and outside the left hand side expression of (1a) by the unity of (n/c)(ds/dτ) we get

(n/c) d/dτ [(n²/c)(dx/dτ)] = nx

Expanding the derivative gives

(n²/c²) [n d²x/dτ² + 2(dn/dτ)(dx/dτ)] = nx

Since n is a function of x and y, we can express the derivative dn/dτ using the total
derivative

dn/dτ = nx (dx/dτ) + ny (dy/dτ)

Substituting this into the previous equation and factoring gives

(n²/c²) [n d²x/dτ² + 2nx(dx/dτ)² + 2ny(dx/dτ)(dy/dτ)] = nx

Recalling that c/n = ds/dτ, we can multiply both sides of this equation by (ds/dτ)² to give

n d²x/dτ² + 2nx(dx/dτ)² + 2ny(dx/dτ)(dy/dτ) = nx (ds/dτ)²

Since s is the spatial path length, we have (ds)² = (dx)² + (dy)², so we can substitute for
ds on the right hand side and rearrange terms to give the result

d²x/dτ² = (nx/n)[(dy/dτ)² − (dx/dτ)²] − (2ny/n)(dx/dτ)(dy/dτ)
which is the same as the geodesic equation (6a). A similar derivation shows that (1b) is
equivalent to the geodesic equation (6b), so the two sets of equations of motion for light
rays are identical.
With these equations we can compute the locus of rays emanating from any given point
in a medium with arbitrarily varying index of refraction. Of course, if the index of
refraction is constant then the right hand sides of equations (6) vanish and the equations
for light rays reduce to

d²x/dτ² = 0        d²y/dτ² = 0
which are simply the equations of straight lines. For a less trivial case, suppose the index
of refraction in this region is a linear function of the x parameter, i.e., we have n(x) = Ax
+ B for some constants A and B. In this case the equations of motion reduce to

d²x/dτ² = [A/(Ax+B)][(dy/dτ)² − (dx/dτ)²]        d²y/dτ² = −[2A/(Ax+B)](dx/dτ)(dy/dτ)
With A=5 and B=1/5 the locus of rays emanating from a point is as shown in Figure 1.

Figure 1
The correctness of the rays in Figure 1 is easily verified by noting that in a medium with
n varying only in the horizontal direction it follows immediately from Snell's law that the
product n sin(θ) must be constant, where θ is the angle which the ray makes with the
horizontal axis. We can verify numerically that the rays shown in Figure 1, generated by
the geodesic equations, satisfy Snell's Law throughout.
We've placed the origin of these rays at the location where n = 5. The left-most point on
this family of curves emanating from that point is at the x location where n = 0. Of
course, in reality we could not construct a medium with n = 0, since that represents an
infinite speed of light. It is, however, possible for the index of refraction of a medium to
be less than 1 for certain frequencies, such as x-rays in glass. This implies that the
velocity of light exceeds c, which may seem to conflict with relativity. However, the
"velocity of light" that appears in the denominator of the refractive index is actually the
phase velocity, rather than the group velocity, and the latter is typically the speed of
energy transfer and signal propagation. (The phenomenon of "anomalous dispersion" can
actually result in a group velocity greater than c, but in all cases the signal velocity is less
than or equal to c.)
Incidentally, these ray lines, in a medium with linearly varying index of refraction, are
called catenary curves, which is the shape made by a heavy cable slung between two
attachment points in uniform gravity. To prove this, let's first rotate the medium so that
the refractive index varies vertically instead of horizontally, and let's slide the vertical
axis so that n = Ay for some constant A. The general form of a catenary curve (with
vertical axis of symmetry) is

y = m cosh(x/m)

for some constant m. It follows that dy/dx = sinh(x/m). Also, the incremental distance
along the path is given by (ds)² = (dx)² + (dy)², so we can substitute for dy to give

(ds)² = [1 + sinh(x/m)²](dx)² = cosh(x/m)²(dx)²

Therefore, we have ds = cosh(x/m) dx, which can be integrated to give s = m sinh(x/m).


Interestingly, this implies that dy/dx = s/m, so the slope of a catenary (with vertical axis)
is proportional to the distance along the curve from the minimum point. Also, from the
relation x = m sinh⁻¹(s/m) we have dx/ds = m/√(s² + m²), so we can multiply this by
dy/dx = s/m to give dy/ds = s/√(s² + m²). Integrating this gives y as a function of s, so we
have the parametric equations

x(s) = m sinh⁻¹(s/m)        y(s) = √(s² + m²)
Letting n0 denote the index of refraction at the minimum point of the catenary (where the
curve is parallel to the lines of constant refractive index), and letting A denote dn/dy, we
have m = n0/A. For other values of y we have n = Ay = n0 √(1 + (s/m)²). We can verify that
the catenary represents the path of a light ray in a medium whose index of refraction
varies linearly as a function of y by inserting these expressions for x, y, and n (and their
derivatives) into equations of motion (1).
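That verification can also be carried out symbolically; the brief sympy sketch below (my own check, under the stated assumption n = Ay) confirms that the arc-length parameterization of the catenary satisfies equations (1):

    # Sketch: with x = m*asinh(s/m), y = sqrt(s^2 + m^2) and n = A*y, the
    # ray equations require d/ds(n dx/ds) = dn/dx = 0 and d/ds(n dy/ds) = dn/dy = A.
    import sympy as sp

    s, m, A = sp.symbols('s m A', positive=True)
    xs = m*sp.asinh(s/m)
    ys = sp.sqrt(s**2 + m**2)
    n = A*ys

    print(sp.simplify(sp.diff(n*sp.diff(xs, s), s)))   # 0, as required
    print(sp.simplify(sp.diff(n*sp.diff(ys, s), s)))   # A, as required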
The surface of revolution of one of these catenary curves about the vertical axis through
the vertex of the envelope is called a catenoid. Each point inside the envelope of this
family of curves is contained in exactly two curves, and the catenoid given by the shorter
of these two curves is a minimal surface. It's also interesting to note that the "envelope"
of rays emanating from a given point approaches a parabola whose focus is the given
point. This parabola and focus are shown as a dotted line in Figure 1.
For a less trivial example, the figure below shows the rays in a medium where the index
of refraction is spherically symmetrical and drops off linearly with distance from some
central point, which gives ray paths that are hypocycloidal loops.

Figure 2
It's also possible to arrange for the light rays to be loxodromic spirals, as shown below.

Figure 3
Finally, Figure 4 shows that the rays can circulate from one point to a central point in
accord with "circles of Apollonius", much like the iterations of Mobius transformations
in the complex plane.

Figure 4

This occurs with n varying inversely as the square of the distance from the central point.
Theoretically, the light from any point, with an initial trajectory in any direction, will
eventually turn around and head toward the singularity of infinite density at the center,
which the ray approaches asymptotically slowly. Thus, it might be called a "black
sphere" lens that refracts all incident light toward its center. Of course, there are obvious
practical difficulties with actually constructing an object like this, not least of which is the
infinite density at the center, as well as the problems of reflection and dispersion.
As an aside, it's interesting to compare the light deflection predicted by the
Schwarzschild solution with the deflection that would be given by a simple "refractive
medium" with a scalar index of refraction defined at each point. We've seen that the
"least time" metric in a plane is

where we have set c=1, and n(x,y) is the index of refraction at the point (x,y). If we write
this in polar coordinates r,, and if we assume that both n and d/dt depend only on r, this
can be written as

for some function n(r). In order to match the Schwarzschild radial speed of light dr/dt we
must have n(r) = r/(r2m), which completely determines the "refractive model" metric for
light rays on the plane. The corresponding geodesic equations are

These are similar, but not identical, to the geodesic equations based on the Schwarzschild
metric, as can be seen by comparing them with equations (2) in Section 6.2. The weak
field deflection is almost indistinguishable. To see this, we proceed as we did with the
Schwarzschild metric, integrating the second geodesic equation and determining the
constant of integration from the perihelion condition at r = r0 to give

Substituting this into the metric divided by (dt)2 and solving for dr/dt gives

Dividing d/dt by dr/dt gives d/dr. Then, making the substitution = r0/r as before
we arrive at the integral for the angular travel from the perihelion to infinity

Doubling this gives the total angular travel between the incoming and outgoing
asymptotes, and subtracting p from this travel gives the deflection . Expanding the
integral in powers of m/r0, we have the result

Thus the first-order deflection for this simple refraction model is the same as for the
Schwarzschild solution. The solutions differ in the second order, but this difference is
much too small to be measured in the weak gravitational fields found in our solar
system. However, the difference would be significant near a "black hole", because the
radius for lightlike circular orbits in this refractive model is 4m, as opposed to 3m for the
Schwarzschild metric.
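As a rough numerical illustration (mine, with purely illustrative parameter values), one can trace a ray through the refractive model n(r) = r/(r − 2m) starting tangentially at the perihelion r0 and compare the accumulated bending with the first-order prediction 4m/r0:

    # Sketch: ray tracing in the medium n(r) = r/(r - 2m); the outgoing
    # asymptote angle, doubled, approximates the total deflection.
    import math

    m = 1.0e-3
    def n_r(r):   return r/(r - 2*m)
    def dn_dr(r): return -2*m/(r - 2*m)**2

    def deriv(state):
        x, y, vx, vy = state
        r = math.hypot(x, y)
        Nx, Ny = dn_dr(r)*x/r, dn_dr(r)*y/r
        N = n_r(r)
        return (vx, vy,
                (Nx*(1 - vx*vx) - Ny*vx*vy)/N,
                (Ny*(1 - vy*vy) - Nx*vx*vy)/N)

    def rk4_step(state, h):
        k1 = deriv(state)
        k2 = deriv(tuple(s + 0.5*h*k for s, k in zip(state, k1)))
        k3 = deriv(tuple(s + 0.5*h*k for s, k in zip(state, k2)))
        k4 = deriv(tuple(s + h*k for s, k in zip(state, k3)))
        return tuple(s + (h/6.0)*(a + 2*b + 2*c + d)
                     for s, a, b, c, d in zip(state, k1, k2, k3, k4))

    r0 = 1.0
    state = (r0, 0.0, 0.0, 1.0)           # at perihelion, moving tangentially
    while math.hypot(state[0], state[1]) < 1.0e3:
        state = rk4_step(state, 0.01)
    deflection = 2*(math.atan2(state[3], state[2]) - math.pi/2)
    print(deflection, 4*m/r0)             # agree to first order in m/r0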
On the other hand, it's important to keep in mind that the physical significance of the
usual Schwarzschild coordinates can't be taken for granted when translated into a putative
model based on simple refraction. The angular coordinates are fairly unambiguous, but
we have various reasonable choices for the radial parameter. One common choice gives
the so-called isotropic coordinates. For the radial coordinate we use ρ, defined with
respect to the Schwarzschild coordinate r by the relation

r = ρ (1 + m/(2ρ))²

Note that the perimeter of a circular orbit of radius r is 2πr, consistent with Euclidean
geometry, whereas the perimeter of a circle of radius ρ is roughly 2πρ(1 + m/ρ). In terms
of this radial parameter, the Schwarzschild metric takes the form

(dτ)² = [(1 − m/(2ρ))/(1 + m/(2ρ))]² (dt)² − (1 + m/(2ρ))⁴ [(dρ)² + ρ²(dθ)² + ρ² sin(θ)²(dφ)²]
This leads to the positive-definite metric for light paths

(dt)² = n(ρ)² [(dρ)² + ρ²(dθ)² + ρ² sin(θ)²(dφ)²]

Hence if we postulate a Euclidean space with the coordinates ρ,θ,φ centered on the mass
m, and a refractive index varying with ρ according to the formula

n(ρ) = (1 + m/(2ρ))³ / (1 − m/(2ρ))
then the equations of motion for light are formally identical to those predicted by general
relativity. However, when we postulate a Euclidean space with the radial parameter r we
are neglecting the fact that the perimeter of a circle of radius r in this space does not have
the value 2πr, so this is not an entirely self-consistent interpretation, as opposed to the
usual "curvature" interpretation of general relativity. In addition, physical refraction is
ordinarily dependent on the frequency of the light, whereas gravitational deflection is not,
so in order to achieve the formal match between the two we must make the physically
implausible assumption of a refractive index that is independent of frequency. Furthermore,
it isn't self-evident that a refractive model can correctly account for the motions of timelike objects, whereas the curved-spacetime interpretation handles all these motions in a
unified and self-consistent manner.
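As a quick consistency check on this isotropic refractive index (a sketch of my own), a series expansion for large ρ recovers the familiar weak-field effective index 1 + 2m/ρ, which is just what yields the first-order deflection 4m/r0:

    # Sketch: expand n(rho) = (1 + m/(2 rho))^3 / (1 - m/(2 rho)) at infinity.
    import sympy as sp

    m, rho = sp.symbols('m rho', positive=True)
    n = (1 + m/(2*rho))**3 / (1 - m/(2*rho))
    print(sp.series(n, rho, sp.oo, 2))     # 1 + 2*m/rho + O(rho**-2)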
8.5 Scholium
I earnestly ask that all this be appraised honestly, and that defects in
matters so very difficult be not so much reprehended as investigated and
kindly supplemented by new endeavors of my readers.
Isaac Newton,
1687
Considering that the first Scholium of Newton's Principia begins with the famous
assertion "absolute, true, and mathematical time...flows equably, without relation to

anything external", it's ironic that Newton's theory of universal gravitation can be
interpreted as a theory of variations in the flow of time. Suppose in Newton's absolute
space we establish the Cartesian coordinates x,y,z, and then assign a fourth coordinate, t,
to every point. We will call this the coordinate time parameter, but we don't necessarily
identify this with the "true time" of events. Instead we postulate that the true lapse of time
along an incremental timelike path is dτ, given by

(dτ)² = g₀₀ (dt)² + k [(dx)² + (dy)² + (dz)²]        (1)
From the Galilean standpoint, we assume that a single set of assignments of the time
coordinate t to events corresponds to the lapses of proper time dτ along any and all paths,
which implies that g00 = 1 and k = 0. However, this can only be known to within some
observational tolerance. Strictly speaking we can say only that g00 is extremely close to 1,
and the constant k is very close to zero (in conventional units of measure).
Using indices with x⁰ = t, x¹ = x, x² = y, and x³ = z, we can re-write (1) as the summation

(dτ)² = gab dxᵃ dxᵇ

where

g₀₀ = g₀₀,   g₁₁ = g₂₂ = g₃₃ = k,   and all other components vanish

Now let's define a four-dimensional array of numbers representing the second partial
derivatives of the gab as a function of every pair of coordinates xᶜ, xᵈ

Rabcd = ∂²gab / (∂xᶜ ∂xᵈ)

Also, we define the "contraction" of this array (using the summation convention for
repeated indices) as

Rab = Rabcc

Since the only non-zero components of Rabcd are R00cd, it follows that the only non-zero
component of Rab is

R₀₀ = ∂²g₀₀/∂t² + ∂²g₀₀/∂x² + ∂²g₀₀/∂y² + ∂²g₀₀/∂z²
If we assume g00 is independent of the coordinate t (meaning that the metrical


configuration is static), the first term vanishes and we find that R00 is just the Laplacian of
g00. Hence if we take our vacuum field equations to be Rab = 0, this is equivalent to
requiring that the Laplacian of g00 vanish, i.e.,

∂²g₀₀/∂x² + ∂²g₀₀/∂y² + ∂²g₀₀/∂z² = 0
For convenience let us define the scalar φ = g₀₀/2. If we consider just spherically
symmetrical fields about the origin, we have φ = φ(r) and so

∂φ/∂x = (dφ/dr)(x/r)

and similarly for the partials with respect to y and z. Since

∂r/∂x = x/r

we have

∂²φ/∂x² = (d²φ/dr²)(x²/r²) + (dφ/dr)(1/r − x²/r³)

and similarly for the y and z partials. Making these substitutions back into the Laplace
equation gives

d²φ/dr² + (2/r)(dφ/dr) = 0
This simple linear differential equation has the unique solution dφ/dr = J/r² where J is a
constant of integration, and so we have φ = −J/r + K for some constants J and K.
Incidentally, it's worth noting that this applies only in three dimensions. If we were
working in just two dimensions, the constant "2" in the above equation would be "1", and
the unique solution would be dφ/dr = J/r, giving φ = J ln(r) + K. This shows that
Newtonian gravity "works" only with three space dimensions, just as general relativity
works only with four spacetime dimensions.
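Both claims are easy to verify symbolically; the following small sketch (mine) confirms that φ = −J/r + K satisfies the three-dimensional radial Laplace equation while φ = J ln(r) + K satisfies the two-dimensional one:

    # Sketch: radial Laplace equations in 3 and 2 spatial dimensions.
    import sympy as sp

    r, J, K = sp.symbols('r J K', positive=True)
    phi3 = -J/r + K
    phi2 = J*sp.log(r) + K
    print(sp.simplify(sp.diff(phi3, r, 2) + (2/r)*sp.diff(phi3, r)))   # 0
    print(sp.simplify(sp.diff(phi2, r, 2) + (1/r)*sp.diff(phi2, r)))   # 0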
Now that we've solved for the g00 field we need the equations of motion. We assume that
objects in gravitational free-fall follow geodesics through the spacetime, so the equations
of motion are just the geodesic equations

d²xᵃ/dτ² + Γᵃ_bc (dxᵇ/dτ)(dxᶜ/dτ) = 0

where the xᵃ denote the quasi-Euclidean coordinates t,x,y,z defined above. Since we have
assumed that the scale factor k between spatial and temporal coordinates is virtually zero,
and that g00 is nearly equal to unity, it's clear that all the speed components dx/dτ, dy/dτ,
dz/dτ are extremely small, whereas the derivative dt/dτ is virtually equal to 1. Neglecting
all terms containing one or more of the speed components, we're left with the zeroth-order
approximation for the spatial accelerations

d²x/dτ² = −Γˣ_tt        d²y/dτ² = −Γʸ_tt        d²z/dτ² = −Γᶻ_tt
From the definition of the Christoffel symbols we have

Γˣ_tt = (1/2) gˣᵃ (2 ∂g_at/∂t − ∂gtt/∂xᵃ)

and similarly for the Christoffel symbols in the y and z equations. Since the metric
components are independent of time, the partials with respect to t are all zero. Also, the
metric tensor gab and its inverse gᵃᵇ are both diagonal and the non-zero components of the
latter are virtually equal to 1, 1/k, 1/k, 1/k. All the mixed components of g vanish, so we
are left with

Γˣ_tt = −(1/(2k)) ∂gtt/∂x

and similarly for Γʸ_tt and Γᶻ_tt. As a result, the equations of motion in the weak slow limit
are closely approximated by

d²x/dτ² = (1/(2k)) ∂gtt/∂x        d²y/dτ² = (1/(2k)) ∂gtt/∂y        d²z/dτ² = (1/(2k)) ∂gtt/∂z
We've seen that the Laplace equation requires gtt to be of the form 2K − 2J/r for some
constants K and J in a spherically symmetrical field, and since we expect dt/dτ to
approach 1 as r increases, we can set 2K = 1. With gtt = 1 − 2J/r we have

∂gtt/∂x = 2Jx/r³

and similarly for the partials with respect to y and z. Therefore the approximate equations
of motion in the weak slow limit are

d²x/dτ² = (J/k)(x/r³)        d²y/dτ² = (J/k)(y/r³)        d²z/dτ² = (J/k)(z/r³)
If we set J/k = -m, i.e., to the negative of the mass of the gravitating source, these are
exactly the equations of motion for Newton's inverse-square attraction. Interestingly, this
implies that precisely one of J,k is negative. If we choose to make J negative, then the

gravitational "potential" has the form gtt = 1 + 2|J|/r, which signifies that the potential
would increase as we approach the source, as would the rate of proper time along a
stationary worldline with respect to coordinate time. In such a universe the value of k
would need to be positive in order for gravity to be attractive, i.e., in order for geodesics
to converge on the gravitating source. On the other hand, if we choose to make J positive,
so that the potential and the rate of proper time decrease as we approach the source, then
the constant k must be negative. Referring back to the original line element, this implies
an indefinite metric. Naturally we can scale our units so that |k| = 1, but the sign of k is
significant. Thus from the observation that "things fall down" we can nearly infer the
Minkowski metrical structure of spacetime.
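To see these weak-field equations of motion in action, here is a minimal numerical sketch (my own, with illustrative values in units where the central mass is 1): integrating d²x/dτ² = (J/k)x/r³ with J/k = −m by a symplectic leapfrog scheme produces a bound Keplerian orbit with conserved energy, exactly as Newton's inverse-square law requires.

    # Sketch: leapfrog (kick-drift-kick) integration of the weak-field
    # equations d2x/dtau2 = -m*x/r^3, d2y/dtau2 = -m*y/r^3.
    import math

    m = 1.0
    def accel(x, y):
        r3 = math.hypot(x, y)**3
        return -m*x/r3, -m*y/r3

    x, y, vx, vy = 1.0, 0.0, 0.0, 0.9      # sub-circular tangential launch
    h = 1.0e-4
    e0 = 0.5*(vx*vx + vy*vy) - m/math.hypot(x, y)
    for _ in range(200000):
        ax, ay = accel(x, y)
        vx += 0.5*h*ax; vy += 0.5*h*ay     # half kick
        x += h*vx; y += h*vy               # drift
        ax, ay = accel(x, y)
        vx += 0.5*h*ax; vy += 0.5*h*ay     # half kick
    e1 = 0.5*(vx*vx + vy*vy) - m/math.hypot(x, y)
    print(e0, e1)                          # energy is conserved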
The fact that we can derive the correct trajectories of free-falling objects based on either
of two diametrically opposed assumptions is not without precedent. This is very closely
related to how Descartes and Newton were able to deduce the correct law of refraction
based on the assumption that light travels more rapidly in denser media, while Fermat
deduced the same law from the opposite assumption.
In any case, taking k = 1 and J = m, we see that Newton's law of gravitation in the
vacuum is Rab = 0, closely paralleling the vacuum field equations of general relativity,
which represents the vanishing of the Laplacian of g00/2. At a point with non-zero mass
density ρ we simply set this equal to 4πρ to give Poisson's equation. Hence if we define
the energy-momentum array

T₀₀ = ρ,   Tab = 0 otherwise

we can express Newton's geometrical spacetime law of gravitation as

Rab = 8π Tab

This can be compared with Einstein's field equations

Rab − (1/2) gab R = 8π Tab
Of course the "R" and "T" arrays in Newton's law are based on simple partial derivatives,
rather than covariant differentiation, so they are not precisely identical to the Ricci tensor
and the energy-momentum tensor of general relativity. However, the definitions are close
enough that the tensors of general relativity can rightly be viewed as the natural
generalizations of the simple Newtonian arrays. The above equations show that the
acceleration of gravity is proportional to the rate of change of gtt as a function of r. At any
given r we have dτ/dt = √(gtt), so gtt corresponds to the squared "rate of proper time"
(with respect to coordinate time) at the given r. It follows that our feet are younger than
our heads, because time advances more slowly as we get closer to the center of the field.
So, despite Newton's conception of the perfectly equable flow of time, his theory of
gravitation can well be interpreted as a description of the effects of the inequable flow of
time. In essence, the effect of Newtonian gravity can be explained in terms of the flow of
time being slower near massive objects, and just as a refracted ray of light veers toward
the medium in which light goes more slowly (and as a tank veers in the direction of the
slower tread-track), objects progressing in time veer in the direction of slower proper
time, causing them to accelerate toward massive objects.

8.6 On Gauss's Mountains


Grossmann is getting his doctorate on a topic that is connected with
fiddling around and non-Euclidean geometry. I don't know exactly what it
is.
Einstein to Mileva
Maric, 1902
One of the most famous stories about Gauss depicts him measuring the angles of the
great triangle formed by the mountain peaks of Hohenhagen, Inselberg, and Brocken for
evidence that the geometry of space is non-Euclidean. It's certainly true that Gauss
acquired geodetic survey data during his ten-year involvement in mapping the Kingdom
of Hanover during the years from 1818 to 1832, and this data included some large "test
triangles", notably the one connecting the those three mountain peaks, which could be
used to check for accumulated errors in the smaller triangles. It's also true that Gauss
understood how the intrinsic curvature of the Earth's surface would theoretically result in
slight discrepancies when fitting the smaller triangles inside the larger triangles, although
in practice this effect is negligible, because the Earth's curvature is so slight relative to
even the largest triangles that can be visually measured on the surface. Still, Gauss
computed the magnitude of this effect for the large test triangles because, as he wrote to
Olbers, "the honor of science demands that one understand the nature of this inequality
clearly". (The government officials who commissioned Gauss to perform the survey
might have recalled Napoleon's remark that Laplace as head of the Department of the
Interior had "brought the theory of the infinitely small to administration".) It is
sometimes said that the "inequality" which Gauss had in mind was the possible curvature
of space itself, but taken in context it seems he was referring to the curvature of the
Earth's surface.
On the other hand, if the curvature of space was actually great enough to be observed in
optical triangles of this size, then presumably Gauss would have noticed it, so we may
still credit him with having performed an empirical observation of geometry, but in this
same sense every person who ever lived has made such observations. It might be more
meaningful to name people who have explicitly argued against the empirical status of
geometry, i.e., who have claimed that the character of spatial relations could be known

without empirical observation. In his "Critique of Pure Reason", Kant famously declared
that Euclidean geometry is the only possible way in which the mind can organize
information about extrinsic spatial relations. One could also cite Plato and other idealists
and a priorists. On the other hand, Poincare advocated a conventionalist view of
geometry, arguing that we can always, if we wish, cast our physics within a Euclidean
spatial framework - provided we are prepared to make whatever adjustments in our
physical laws are necessary to preserve this convention. In any case, it seems reasonable
to agree with Buhler, who concludes in his biography of Gauss that "the oft-told story
according to which Gauss wanted to decide the question [of whether space is perfectly
Euclidean] by measuring a particularly large triangle is, as far as we know, a myth."
The first person to publicly propose an actual test of the geometry of space was
apparently Lobachevski, who suggested that one might "investigate a stellar triangle for
an experimental resolution of the question." The "stellar triangle" he proposed was the
star Sirius and two different positions of the Earth at 6-month intervals. This was used by
Lobachevski as an example to show how we could place limits on the deviation from
flatness of actual space, based on the fact that, in a hyperbolic space of constant
curvature, there is a limit to how small a star's parallax can be, even for the most distant star.
Gauss had already (in private correspondence with Taurinus in 1824) defined the
"characteristic length" of a hyperbolic space, which he called "k", and had derived several
formulas for the properties of such a space in terms of this parameter. For example, the
circumference of a circle of radius r in a hyperbolic space whose "characteristic length" is
k is given by

C = 2πk sinh(r/k)

Since sinh(x) = x + x³/3! + ..., it follows that C approaches 2πr as k increases to infinity.


From the fact that the maximum parallax of Sirius (as seen from the Earth at various
times) is 1.24 seconds of arc, Lobachevski deduced that the value of k for our space must be at least
166,000 times the radius of the Earth's orbit. Naturally the same analysis for more distant
stars gives an even larger lower bound on k.
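Both of Lobachevski's steps are easy to reproduce; the sketch below (my own reconstruction, treating the bound as simply k ≳ a/tan(p) for a baseline a of one Earth-orbit radius) recovers the stated figure of roughly 166,000:

    # Sketch: (i) C = 2*pi*k*sinh(r/k) -> 2*pi*r as k -> infinity;
    # (ii) lower bound on k from Sirius' parallax of 1.24 arcseconds.
    import math
    import sympy as sp

    r, k = sp.symbols('r k', positive=True)
    print(sp.limit(2*sp.pi*k*sp.sinh(r/k), k, sp.oo))   # 2*pi*r

    p = 1.24 * math.pi/(180*3600)          # parallax in radians
    print(1/math.tan(p))                   # about 1.66e5 Earth-orbit radii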
The first definite measurement of parallax for a fixed star was performed by Friedrich
Bessel (a close friend of Gauss) in 1838, on the star 61 Cygni. Shortly thereafter he
measured Sirius (and discovered its binary nature). Lobachevski's first paper on "the new
geometry" was presented as a lecture at Kasan in 1826, followed by publications in 1829,
1835, 1840, and 1855 (a year before his death). He presented his lower bound on "k" in
the later editions based on the still fairly recent experimental results of stellar parallax
measurements. In 1855 Lobachevski was completely blind, so he dictated his exposition.
The other person credited with discovering non-Euclidean geometry, Janos Bolyai, was
the son of Wolfgang Bolyai, who was a friend (almost the only friend) of Gauss during
their school days at Gottingen in the late 1790's. The elder Bolyai had also been
interested in the foundations of geometry, and spent many years trying to prove that
Euclid's parallel postulate is a consequence of the other postulates. Eventually he

concluded that it had been a waste of time, and he became worried when his son Janos
became interested in the same subject. The alarmed father wrote to his son
For God's sake, I beseech you, give it up. Fear it no less than sensual passions
because it, too, may take all your time, and deprive you of your health, peace of
mind, and happiness in life.
Undeterred, Janos continued to devote himself to the study of the parallel postulate, and
in 1829 he succeeded in proving just the opposite of what his father (and so many others)
had tried in vain to prove. Janos found (as had Gauss, Taurinnus, and Lobachevesky just
a few years earlier) that Euclid's parallel postulate is not a consequence of the other
postulates, but is rather an independent assumption, and that alternative but equally
consistent geometries based on different assumptions may be constructed. He called this
the "Absolute Science of Space", and wrote to his father that "I have created a new
universe from nothing". The father then, forgetting his earlier warnings, urged Janos to
publish his findings as soon as possible, noting that
...ideas pass easily from one to another, and secondly... many things have an
epoch, in which they are found at the same time in several places, just as violets
appear on every side in spring.
Naturally the elder Bolyai sent a copy of his son's spectacular discovery to Gauss, in June
of 1831, but it was apparently lost in the mail. Another copy was sent in January of
1832, and then seven weeks later Gauss sent a reply to his old friend:
If I commenced by saying that I am unable to praise this work, you would
certainly be surprised for a moment. But I cannot say otherwise. To praise it
would be to praise myself. Indeed the whole contents of the work, the path taken
by your son, the results to which he is led, coincide almost entirely with my
meditations, which have occupied my mind partly for the last thirty or thirty-five
years. So I remained quite stupefied. So far as my own work is concerned, of
which up till now I have put little on paper, my intention was not to let it be
published during my lifetime. ... I have found very few people who could regard
with any special interest what I communicated to them on this subject. ...it was
my idea to write down all this later so that at least it should not perish with me. It
is therefore a pleasant surprise for me that I am spared this trouble, and I am very
glad that it is just the son of my old friend, who takes the precedence of me in
such a remarkable manner.
In his later years Gauss' response to many communications of new mathematical results
was similar to the above. For example, he once remarked that a paper of Abel's saved
him the trouble of having to publish about a third of his results concerning elliptic
integrals. Likewise he confided to friends that Jacobi and Eisenstein had "spared him the
trouble" of publishing important results that he (Gauss) had possessed since he was a
teenager, but had never bothered to publish. Dedekind even reports that Gauss made a
similar comment about Riemann's dissertation. It's true that Gauss' personal letters and

notebooks substantiate to some extent his private claims of priority for nearly every
major mathematical advance of the 19th century, but the full extent of his early and
unpublished accomplishments did not become known until after his death, and in any
case it wouldn't have softened the blow to his contemporaries. Janos Bolyai was so
embittered by Gauss's backhanded response to his non-Euclidean geometry that he never
published again.
As another example of what Wolfgang Bolyai called "violets appearing on every side",
Maxwell's great 1865 triumph of showing that electromagnetic waves propagate at the
speed of light was, to some degree, anticipated by others. In 1848 Kirchhoff had noted
that the ratio of electromagnetic and electrostatic units was equal to the speed of light,
although he gave no explanation for this coincidence. In 1858 Riemann presented a
theory based on the hypothesis that electromagnetic effects propagate at a fixed speed,
and then deduced that this speed must equal the ratio of electromagnetic and electrostatic
units, i.e.,

c = 1/√(ε₀μ₀)
Even in this field we find that Gauss can plausibly claim priority for some interesting
developments. Recall that, in addition to being the foremost mathematician of his day,
Gauss was also prominent in studying the phenomena of electricity and magnetism (in
fact the unit of magnetism is called a Gauss), and even dabbled in electrodynamics. As
mentioned in Section 3.5, he reached the conclusion that the keystone of electrodynamics
would turn out to depend on an understanding of how electric effects propagate in time.
In 1835 he wrote (in an unpublished paper, discovered after his death) that
Two elements of electricity in a state of relative motion attract or repel one
another, but not in the same way as if they are in a state of relative rest.
He even suggested the following mathematical form for the complete electromagnetic
force F between two particles with charges q1 and q2 in arbitrary states of motion

F = (q1q2/r²) [1 + (u·u − (3/2)ṙ²)/c²]

where r is the scalar distance, r is the vector distance, u is the relative velocity between
the particles, and dots signify derivatives with respect to time. This formula actually
gives the correct results for particles in uniform (inertial) motion, in which case the
second derivative of the vector r is zero. However, the dot product in Gauss's formula
violates conservation of energy for general motions. A few years later (in 1845), Gauss's
friend Wilhelm Weber proposed a force law identical to Gauss's, except he excluded the
dot product, i.e., he proposed the formula

F = (q1q2/r²) [1 − ṙ²/(2c²) + r r̈/c²]        (1)
Weber pointed out that, unlike Gauss's original formula, this force law satisfies
conservation of energy, as shown by the fact that it can be derived from the potential
function

V = (q1q2/r) [1 − ṙ²/(2c²)]

In terms of this potential, the force given by F = dV/dr is precisely Weber's force law.
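This can be confirmed symbolically; in the sketch below (mine, with q standing for the product q1q2 and the total derivative dV/dr expanded via d(ṙ)/dr = r̈/ṙ), the result reproduces Weber's force law up to an overall sign convention:

    # Sketch: total derivative of Weber's potential vs. Weber's force law.
    import sympy as sp

    r, rdot, rddot, q, c = sp.symbols('r rdot rddot q c', positive=True)
    V = (q/r)*(1 - rdot**2/(2*c**2))
    dVdr = sp.diff(V, r) + sp.diff(V, rdot)*(rddot/rdot)
    weber = (q/r**2)*(1 - rdot**2/(2*c**2) + r*rddot/c**2)
    print(sp.simplify(dVdr + weber))       # 0, so |dV/dr| is Weber's force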
Equation (1) was used by Weber as the basis of his theory of electrodynamics published
in 1846. Indeed this formula served as the basis for most theoretical studies of
electromagnetism until it was finally superseded by Maxwell's theory beginning in the
1870s. It's interesting that in order for energy to be conserved it was necessary to
eliminate the vectors from Gauss's formula, making the result entirely in terms of the
scalar distance and its derivatives. Compare this with the separation equations discussed
in Sections 4.2 and 4.4. Note that according to (1) the condition for the force between two
charged particles to vanish is that the quantity in parentheses equals zero, i.e.,

1 − ṙ²/(2c²) + r r̈/c² = 0

Differentiating both sides and dividing by r gives the condition d³r/dt³ = 0, which is the
same as equation (4) of Section 4.2 if we set N = 0. (The vanishing of the third derivative
is also the condition for zero radiation reaction according to the Lorentz-Dirac equations
of classical electrodynamics.) Interestingly, Karl Schwarzschild published a paper in
1903 describing in detail how the Gauss-Weber approach could actually have been
developed into a viable theory. In any case, if the two charged particles are separating
(without rotation) at a uniform speed ṙ, Gauss' formula relates the electrostatic force
F₀ = q1q2/r² to the dynamic force as

F = F₀ [1 − ṙ²/(2c²)]
So, to press the point, one could argue that Gauss' offhand suggestion for the formula
expressing electrodynamic force already represents the seeds of Lorentz's molecular force
hypothesis, from which follows the length contraction and time dilation of the Lorentz
transformations and special relativity. In fact, pursuing this line of thought, Riemann (one
of Gauss's successors at Gottingen) proposed in 1858 that the electric potential V should
satisfy the equation

∂²V/∂x² + ∂²V/∂y² + ∂²V/∂z² − (1/c²)∂²V/∂t² = −4πρ

where ρ is the charge density. This equation does indeed give the retarded electrostatic

potential, which, combined with the similar equation for the vector potential, serves as
the basis for the whole classical theory of electromagnetism. Assuming conservation of
charge, the invariance of the Minkowski spacetime metric clearly emerges from this
equation, as does the invariance of the speed of light in terms of any suitable (i.e.,
inertially homogeneous and isotropic) system coordinates.
8.7 Strange Meeting
It seemed that out of battle I escaped
Down some profound dull tunnel...
Wilfred Owen (1893-1918)
In the summer of 1913 Einstein accepted an offer of a professorship at the University of
Berlin and membership in the Prussian Academy of Sciences. He left Zurich in the
Spring of 1914, and his inaugural address before the Prussian Academy took place on July
2, 1914. A month later, Germany was at war with Belgium, Russia, France, and Britain.
Surprisingly, the world war did not prevent Einstein from continuing his intensive efforts
to generalize the theory of relativity so as to make it consistent with gravitation - but his
marriage almost did. By April of 1915 he was separated from his wife Mileva and their
two young sons, who had once again taken up residence in Zurich. The marriage was not
a happy one, and he later wrote to his friend Besso that if he had not kept her at a
distance, he would have been worn out, physically and emotionally. Besso and Fritz
Haber (Einstein's close friend and colleague) both made efforts to reconcile Albert and
Mileva, but without success.
It was also during this period that Haber was working for the German government to
develop poison gas for use in the war. On April 22, 1915 Haber directed the release of
chlorine gas on the Western Front at Ypres in France. On May 23rd Italy declared war on
Austria-Hungary, and subsequently against Germany itself. Meanwhile an Allied army
was engaged in a disastrous campaign to take the Gallipoli Peninsula from Germany's ally,
the Turks. Germany shifted the weight of its armies to the Eastern Front during this
period, hoping to knock Russia out of the war while fighting a holding action against the
French and British in the West. In a series of huge battles from May to September the
Austro-German armies drove the Russians back 300 miles, taking Poland and Lithuania
and eliminating the threat to East Prussia. Despite these defeats, the Russians managed to
re-form their lines and stay in the war (at least for another two years). The astronomer
Karl Schwarzschild was stationed with the German Army in the East, but still kept close
watch on Einstein's progress, which was chronicled like a serialized Dickens novel in
almost weekly publications of the Berlin Academy.
Toward the end of 1915, having failed to drive Russia out of the war, the main German
armies were shifted back to the Western Front. Falkenhayn (the chief of the German
general staff) was now convinced that a traditional offensive breakthrough was not
feasible, and that Germany's only hope of ultimately ending the war on favorable terms
was to engage the French in a war of attrition. His plan was to launch a methodical and

sustained assault on a position that the French would feel honor-bound to defend to the
last man. The ancient fortress of Verdun ("they shall not pass") was selected, and the
plan was set in motion early in 1916. Falkenhayn had calculated that only one German
soldier would be killed in the operation for every three French soldiers, so they would
"bleed the French white" and break up the Anglo-French alliance. However, the actual
casualty ratio turned out to be four Germans for every five French. By the end of 1916 a
million men had been killed at Verdun, with no decisive change in the strategic position
of either side, and the offensive was called off.
At about the same time that Falkenhayn was formulating his plans for Verdun, on Nov
25, 1915, Einstein arrived at the final form of the field equations for general relativity.
After a long and arduous series of steps (and mis-steps), he was able to announce that
"finally the general theory of relativity is closed as a logical structure". Given the
subtlety and complexity of the equations, one might have expected that rigorous closed-form solutions for non-trivial conditions would be difficult, if not impossible, to find.
Indeed, Einstein's computations of the bending of light, the precession of Mercury's orbit,
and the gravitational redshift were all based on approximate solutions in the weak field
limit. However, just two months later, Schwarzschild had the exact solution for the static
isotropic field of a mass point, which Einstein presented on his behalf to the Prussian
Academy on January 16, 1916. Sadly, Schwarzschild lived only another four months.
He became ill at the front and died on May 11 at the age of 42.
It's been said that Einstein was scandalized by Schwarzschild's solution, for two reasons.
First, he still imagined that the general theory might be the realization of Mach's dream of
a purely relational theory of motion, and Einstein realized that the fixed spherically
symmetrical spacetime of a single mass point in an otherwise empty universe is highly
non-Machian. That such a situation could correspond to a rigorous solution of his field
equations came as something of a shock, and probably contributed to his eventual
rejection of Mach's ideas and positivism in general. Second, the solution found by
Schwarzschild, which was soon shown by Birkhoff to be the unique spherically
symmetric solution to the field equations (barring a non-zero cosmological constant), contained what looked like an unphysical singularity. Of course, since the source term
was assumed to be an infinitesimal mass point, a singularity at r = 0 is perhaps not too
surprising (noting that Newton's inverse square law is also singular at r = 0). However,
the Schwarzschild solution was also (apparently) singular at r = 2m, where m is the mass
of the gravitating object in geometric units.
Einstein and others argued that it wasn't physically realistic for a configuration of
particles of total mass M to reside within their joint Schwarzschild radius r = 2m, and so
this "singularity" cannot exist in reality. However, subsequent analyses have shown that
(barring some presently unknown phenomenon) there is nothing to prevent a sufficiently
massive object from collapsing to within its Schwarzschild radius, so it's worthwhile to
examine the formal singularity at r = 2m to understand its physical significance. We find
that the spacetime manifold at this boundary need not be considered as singular, because
it can be shown that the singularity is removable, in the sense that all the invariant
measures of the field smoothly approach fixed finite values as r approaches 2m from

either direction. Thus we can analytically continue the solution through the singularity.
Now, admittedly, describing the Schwarzschild boundary as an "analytically removable
singularity" is somewhat unorthodox. It's customary to assert that the Schwarzschild
solution is unequivocally non-singular at r = 2m, and that the intrinsic curvature and
proper time of a free-falling object are finite and well-behaved at that radius. Indeed we
derived these facts in Section 6.4. However, it's worth remembering that even with
respect to the proper frame of an infalling test particle, we found that there remains a
formal singularity at r = 2m. (See the discussion following equation 5 of Section 6.4.)
The free-falling coordinate system does not remove the singularity, but it makes the
singularity analytically removable. Similarly our derivation in Section 6.4 of the intrinsic
curvature K of the Schwarzschild solution at r = 2m tacitly glossed over the intermediate
result

Strictly speaking, the middle term on the right side is 0/0 (i.e., undefined) at r = 2m. Of
course, we can divide the numerator and denominator by (r − 2m), but this step is
unambiguously valid only if (r − 2m) is not equal to zero. If (r − 2m) does equal zero, this
cancelation is still possible, but it amounts to the analytic removal of a singularity. In
addition, once we have removed this singularity, the resulting term is infinite, formally
equal to the third term, which is also infinite, but with opposite sign. We then proceed to
subtract the infinite third term from the infinite second term to arrive at the innocuous-looking finite result K = −2m/r³ at r = 2m. Granted, the form of the metric coefficients
and their derivatives depends on the choice of coordinates, and in a sense we can attribute
the troublesome behavior of the metric components at r = 2m to the unsuitability of the
traditional Schwarzschild coordinates r,t at this location. From this we might be tempted
to conclude that the Schwarzschild radius has no physical significance. This is true
locally, but globally the Schwarzschild radius is physically significant, as the event
horizon between two regions of the manifold. Hence it isn't surprising that, in terms of
the r,t coordinates, we encounter singularities and infinities, because these coordinates are
globally unique, viz., the Schwarzschild coordinate t is the essentially unique time
coordinate for which the manifold is globally static.
Interestingly, the solution in Schwarzschild's 1916 paper was not presented in terms of
what we today call Schwarzschild coordinates. Those were introduced a year later by
Droste. Schwarzschild presented a line element that is formally identical to the one for
which he is known, viz,

(dτ)² = (1 − α/R)(dt)² − (1 − α/R)⁻¹(dR)² − R²(dθ)² − R² sin(θ)²(dφ)²        (1)

In this formula the coordinates t, θ, and φ have their usual meanings, and the parameter α
is to be identified with 2m as usual. However, he did not regard "R" as the physically

significant radial distance from the center of the field. He begins by declaring a set of
rectangular space coordinates x,y,z, and then defines the radial parameter r such that
r² = x² + y² + z²

Accordingly he relates these parameters to the angular coordinates θ and φ by the usual
polar definitions

x = r sin(θ) cos(φ)        y = r sin(θ) sin(φ)        z = r cos(θ)
He wishes to make use of the truncated field equations

∂Γᵃ_bc/∂xᵃ + Γᵃ_bd Γᵈ_ca = 0
which (as discussed in Section 5.8) requires that the determinant of the metric be
constant. Remember that this was written in 1915 (formally conveyed by Einstein to the
Prussian academy on 13 January 1916), and apparently Schwarzschild was operating
under the influence of Einstein's conception of the condition g=-1 as a physical principle,
rather than just a convenience enabling the use of the truncated field equations. In any
case, this is the form that Schwarzschild set out to solve, and he realized that the metric
components of the most general spherically symmetrical static polar line element

(dτ)² = f(r)(dt)² − h(r)(dr)² − r²(dθ)² − r² sin(θ)²(dφ)²

where f and h are arbitrary functions of r has the determinant g = −f(r) h(r) r⁴ sin(θ)².
(Schwarzschild actually included an arbitrary function of r on the angular terms of the
line element, but that was superfluous.) To simplify the determinant condition he
introduces the transformation

x₁ = r³/3        x₂ = −cos(θ)        x₃ = φ

from which we get the differentials

dx₁ = r² dr        dx₂ = sin(θ) dθ        dx₃ = dφ

Substituting these into the general line element gives the transformed line element

(dτ)² = f(dt)² − (h/r⁴)(dx₁)² − r²(dx₂)²/(1 − x₂²) − r²(1 − x₂²)(dx₃)²

which has the determinant g = −f(r)h(r). Schwarzschild then requires this to equal −1, so
his derivation essentially assumes a priori that h(r) = 1/f(r). Interestingly, with this

assumption it's easy to see that there is really only one function f(r) that can yield
Kepler's laws of motion, as discussed in Section 5.5. Hence it could be argued that the
field equations were superfluous to the determination of the spherically symmetrical
static spacetime metric. On the other hand, the point of the exercise was to verify that
this one physically viable metric is actually a solution of the field equations, thereby
supporting their general applicability.
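The determinant bookkeeping is easily checked; the following symbolic sketch (my own) writes the transformed metric in the coordinates (t, x1, x2, x3) and confirms that its determinant collapses to −f(r)h(r):

    # Sketch: determinant of the line element after Schwarzschild's
    # substitution x1 = r^3/3, x2 = -cos(theta), x3 = phi; here the
    # factor 1 - x2^2 is written as sin(theta)^2.
    import sympy as sp

    r, theta = sp.symbols('r theta', positive=True)
    f = sp.Function('f')(r)
    h = sp.Function('h')(r)
    g = sp.diag(f, -h/r**4, -r**2/sp.sin(theta)**2, -r**2*sp.sin(theta)**2)
    print(sp.simplify(g.det()))            # -f(r)*h(r), so g = -1 forces h = 1/f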
In any case, noting that r = (3x₁)^(1/3) and sin(θ)² = 1 − (x₂)², and with the stipulation that
h(r) = 1/f(r), and that the metric go over to the Minkowski metric as r goes to infinity,
Schwarzschild essentially showed that Einstein's field equations are satisfied by the
above line element if f(r) = 1 − α/r where α is a constant of integration that "depends on
the value of the mass at the origin". Naturally we take α = 2m for agreement with
observation in the Newtonian limit. However, in the process of integrating the conditions
on f(r) there appears another constant of integration, which Schwarzschild calls ρ. So the
general solution is actually

f = 1 − α/(r³ + ρ)^(1/3)

We ordinarily take α = 2m and ρ = 0 to give the usual result f(r) = 1 − α/r, but
Schwarzschild was concerned to impose an additional constraint on the solution (beyond
spherical symmetry, staticality, asymptotic flatness, and the field equations), which he
expressed as "continuity of the [metric coefficients], except at r = 0". The metric
coefficient h(r) = 1/f(r) is obviously discontinuous when f(r) vanishes, which is to say
when r³ + ρ = α³. With the usual choice ρ = 0 this implies that the metric is
discontinuous when r = α = 2m, which of course it is. This is the infamous
Schwarzschild radius, where the usual Schwarzschild time coordinate becomes singular,
representing the event horizon of a black hole. In retrospect, Schwarzschild's
requirement for "continuity of the metric coefficients" is obviously questionable, since a
discontinuity or singularity of a coordinate system is not generally indicative of a
singularity in the manifold - the classical example being the singularity of polar
coordinates at the North pole. Probably Schwarzschild meant to impose continuity on the
manifold itself, rather than on the coordinates, but as Einstein remarked, "it is not so easy
to free one's self from the idea that coordinates must have a direct metric significance".
It's also somewhat questionable to impose continuity and absence of singularities except
at the origin, because if this is a matter of principle, why should there be an exception,
and why at the "origin" of the spherically symmetrical coordinate system?
Nevertheless, following along with Schwarzschild's thought, he obviously needs to
require that the equality r³ + ρ = α³ be satisfied only when r = 0, which implies ρ = α³.
Consequently he argues that the expression (r³ + ρ)^(1/3) should not be reduced to r. Instead,
he defines the parameter R = (r³ + α³)^(1/3), in terms of which the metric has the familiar form
(1). Of course, if we put ρ = 0 then R = r and equation (1) reduces to the usual form of
the Schwarzschild/Droste solution. However, with ρ = α³ we appear to have a physically
distinct result, free of any coordinate singularity except at r = 0, which corresponds to the
location R = α. The question then arises as to whether this is actually a physically


distinct solution from the usual one. From the definitions of the quasi-orthogonal
coordinates x,y,z we see that x = y = z = 0 when r = 0, but of course the x,y,z coordinates
also take on negative values at various points of the manifold, and nothing prevents us
from extending the solution to negative values of the parameter r, at least not until we
arrive at the condition R = 0, which corresponds to r = −α. At this location it can be
shown that we have a genuine singularity in the manifold, because the curvature scalar
becomes infinite.
In terms of these coordinates the entire surface of the Schwarzschild horizon has the same
spatial coordinates x = y = z = 0, but nothing prevents us from passing through this point
into negative values of r. It may seem that by passing into negative values of x,y,z we are
simply increasing r again, but this overlooks the duality of solutions to

r² = x² + y² + z²
The distinction between the regions of positive and negative r is clearly shown in terms
of polar coordinates, because the point in the equatorial plane with polar coordinates (r,θ)
need not be identified with the point (−r, θ+π). Essentially polar coordinates cover two
separate planes, one with positive r and the other with negative r, and the only smooth
path between them is through the boundary point r = 0. According to Schwarzschild's
original conception of the coordinates, this boundary point is the event horizon, whereas
the physical singularity in the manifold occurs at the surface of a sphere whose radius is
r = −2m. In other words, the singularity at the "center" of the Schwarzschild solution
occurs just on the other side of the boundary point r = 0 of these polar coordinates. We
can shift this boundary point arbitrarily by simply shifting the "zero point" of the
complete r scale, which actually extends from −∞ to +∞. However, none of this changes
any of the proper intervals along any physical paths, because those are invariant under
arbitrary (diffeomorphic) transformations. So Schwarzschild's version of the solution is
not physically distinct from the usual interpretation introduced by Droste in 1917.
It's interesting that as late as 1935 (nearly two decades after Schwarzschild's death) Einstein
proposed to eliminate the coordinate singularity in the (by then) conventional
interpretation of the Schwarzschild solution by defining a radial coordinate in terms of
the Droste coordinate r by the relation 2 = r 2m. In terms of this coordinate the line
element is

Einstein notes that as ρ ranges from −∞ to +∞ the corresponding values of r range from
+∞ down to 2m and then back to +∞, so he conceives of the complete solution as two
identical sheets of physical space connected by the "bridge" at the boundary ρ = 0, where
r = 2m and the determinant of the metric vanishes. This is called the Einstein-Rosen
bridge. For values of r less than 2m he argues that "there are no corresponding real
values of ρ". On this basis he asserts that the region r < 2m has been excluded from the
solution. However, this is really just another re-expression of the original Schwarzschild
solution, describing the "exterior" portions of the solution, but neglecting the interior
portion, where ρ is imaginary. But just as we can allow Schwarzschild's r to take
on negative values, we can allow Einstein's ρ to take on imaginary values. The maximal
analytic extension of the Schwarzschild solution necessarily includes the interior region,
and it can't be eliminated simply by a change of variables. Ironically, the reason the
manifold seems to be well-behaved across Einstein's "bridge" between the two exterior
regions while jumping over the interior region is precisely that the ρ coordinate is locally
ill-behaved at ρ = 0. Birkhoff proved that the Schwarzschild solution is the unique
spherically symmetrical solution of the field equations, and it has been shown that the
maximal analytic extension of this solution (called the Kruskal extension) consists of two
exterior regions connected by the internal region, and contains a genuine manifold
singularity.
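The coordinate substitution discussed above is easy to check symbolically. The following sketch (Python with sympy; the coordinate name rho and this verification are illustrative additions, not from the text) reproduces the bridge metric coefficients and shows that the time-time coefficient, and with it the metric determinant, vanishes at ρ = 0:

```python
# A minimal symbolic check of the Einstein-Rosen substitution rho^2 = r - 2m
# applied to the Schwarzschild metric (illustrative, not from the text).
import sympy as sp

rho, m = sp.symbols('rho m', positive=True)

r = rho**2 + 2*m                 # the substitution rho^2 = r - 2m
f = 1 - 2*m/r                    # Schwarzschild coefficient: ds^2 = f dt^2 - dr^2/f - ...

g_tt = sp.simplify(f)                        # rho**2/(2*m + rho**2)
g_rho = sp.simplify((1/f) * (2*rho)**2)      # dr = 2 rho drho, so dr^2/f = 4(rho^2 + 2m) drho^2
g_sphere = sp.expand(r**2)                   # coefficient of the angular part

print(g_tt, g_rho, g_sphere)
print(g_tt.subs(rho, 0))         # 0: the metric determinant vanishes at the "bridge" rho = 0
```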
On the other hand, just because the maximally extended Schwarzschild solution satisfies
the field equations, it doesn't necessarily follow that such a thing exists. In fact, there is
no known physical process that would produce this configuration, since it requires two
asymptotically flat regions of spacetime that happen to become connected at a singularity,
and there is no reason to believe that such a thing would ever happen. In contrast, it's
fairly plausible that some part of the complete Schwarzschild solution could be produced,
such as by the collapse of a sufficiently massive star. The implausibility of the
maximally extended solutions doesn't preclude the existence of black holes - although it
does remind us to be cautious about assuming the actual existence of things just because
they are solutions of the field equations.
Despite the implausibility of an Einstein-Rosen bridge connecting two distinct sheets of
spacetime, this idea has recently gained widespread attention, the term "bridge" having
been replaced with "wormhole". It's been speculated that under certain conditions it
might be possible to actually traverse a wormhole, passing from one region of spacetime
to another. As discussed above this is definitely not possible for the Schwarzschild
solution, because of the unavoidable singularity, but people have recently explored the
possibilities of traversable wormholes. Naturally if such direct conveyance between
widely separate regions of spacetime were possible, and if those regions were also
connected by (much longer) ordinary timelike paths, this raises the prospect of various
kinds of "time travel", assuming a wormhole connected to the past was somehow
established and maintained. However, these rather far-fetched scenarios all rely on the
premise of negative energy density, which of course violates the so-called "null energy
condition", not to mention the weak, strong, and dominant energy conditions of classical
relativity. In other words, on the basis of classical relativity and the traditional energy
conditions we could rule out traversable wormholes altogether. It is only the fact that
some quantum phenomena do apparently violate these energy conditions (albeit very
slightly) that leaves open the remote possibility of such things.

8.8 Who Invented Relativity?


All beginnings are obscure.
H. Weyl
There have been many theories of relativity throughout history, from the astronomical
speculations of Heraclides to the geometry of Euclid to the classical theory of space,
time, and dynamics developed by Galileo, Newton, and others. Each of these was based
on one or more principles of relativity. However, when we refer to the theory of
relativity today, we usually mean one particular theory of relativity, namely, the body of
ideas developed near the beginning of the 20th century and closely identified with the
work of Albert Einstein. These ideas are distinguished from previous theories not by
relativity itself, but by the way in which relativistically equivalent coordinate systems are
related to each other.
One of the interesting historical aspects of the modern relativity theory is that, although
often regarded as the highly original and even revolutionary contribution of a single
individual, almost every idea and formula of the theory had been anticipated by others.
For example, Lorentz covariance and the inertia of energy were both (arguably) implicit
in Maxwell's equations. Also, Voigt formally derived the Lorentz transformations in 1887
based on general considerations of the wave equation. In the context of electrodynamics,
Fitzgerald, Larmor, and Lorentz had all, by the 1890s, arrived at the Lorentz
transformations, including all the peculiar "time dilation" and "length contraction" effects
(with respect to the transformed coordinates) associated with Einstein's special relativity.
By 1905, Poincare had clearly articulated the principle of relativity and many of its
consequences, had pointed out the lack of empirical basis for absolute simultaneity, had
challenged the ontological significance of the ether, and had even demonstrated that the
Lorentz transformations constitute a group in the same sense as do Galilean
transformations. In addition, the crucial formal synthesis of space and time into spacetime
was arguably the contribution of Minkowski in 1907, and the dynamics of special
relativity were first given in modern form by Lewis and Tolman in 1909. Likewise, the
Riemann curvature and Ricci tensors for n-dimensional manifolds, the tensor formalism
itself, and even the crucial Bianchi identities, were all known prior to Einstein's
development of general relativity in 1915. In view of this, is it correct to regard Einstein
as the sole originator of modern relativity?
The question is complicated by the fact that relativity is traditionally split into two
separate theories, the special and general theories, corresponding to the two phases of
Einstein's historical development, and the interplay between the ideas of Einstein and
those of his predecessors and contemporaries is different in the two cases. In addition,
the title of Einstein's 1905 paper ("On the Electrodynamics of Moving Bodies")
encouraged the idea that it was just an interpretation of Lorentz's theory of
electrodynamics. Indeed, Wilhelm Wien proposed that the Nobel prize of 1912 be
awarded jointly to Lorentz and Einstein, saying

The principle of relativity has eliminated the difficulties which existed in
electrodynamics and has made it possible to predict for a moving system all
electrodynamic phenomena which are known for a system at rest... From a purely
logical point of view the relativity principle must be considered as one of the most
significant accomplishments ever achieved in theoretical physics... While Lorentz
must be considered as the first to have found the mathematical content of
relativity, Einstein succeeded in reducing it to a simple principle. One should
therefore assess the merits of both investigators as being comparable.
As it happens, the physics prize for 1912 was awarded to Nils Gustaf Dalén (for the
"invention of automatic regulators for lighting coastal beacons and light buoys during
darkness or other periods of reduced visibility"), and neither Einstein, Lorentz, nor
anyone else was ever awarded a Nobel prize for either the special or general theories of
relativity. This is sometimes considered to have been an injustice to Einstein, although in
retrospect it's conceivable that a joint prize for Lorentz and Einstein in 1912, as Wien
proposed, assessing "the merits of both investigators as being comparable", might
actually have diminished Einstein's subsequent popular image as the sole originator of
both special and general relativity.
On the other hand, despite the somewhat misleading title of Einstein's paper, the second
part of the paper ("The Electrodynamic Part") was really just an application of the
general theoretical framework developed in the first part of the paper ("The Kinematic
Part"). It was in the first part that special relativity was founded, with consequences
extending far beyond Lorentz's electrodynamics. As Einstein later recalled,
The new feature was the realization that the bearing of the Lorentz transformation
transcended its connection with Maxwell's equations and was concerned with the
nature of space and time in general.
To give just one example, we may note that prior to the advent of special relativity the
experimental results of Kaufmann and others involving the variation of an electron's
mass with velocity were thought to imply that all of the electron's mass must be
electromagnetic in origin, whereas Einstein's kinematics revealed that all mass,
regardless of its origin, would necessarily be affected by velocity in the same way. Thus
an entire research program, based on the belief that the high-speed behavior of objects
represented dynamical phenomena, was decisively undermined when Einstein showed
that the phenomena in question could be interpreted much more naturally on a purely
kinematic basis. Now, if this interpretation applied only to electrodynamics, its
significance might be debatable, but already by 1905 it was clear that, as Einstein put it,
the Lorentz transformation transcended its connection with Maxwell's equations, and
must apply to all physical phenomena in order to account for the complete inability to
detect absolute motion. Once this is recognized, it is clear that we are dealing not just
with properties of electricity and magnetism, or any other specific entities, but with the
nature of space and time themselves. This is the aspect of Einstein's 1905 theory that
prompted Witkowski, after reading vol. 17 of Annalen der Physik, to exclaim: "A new
Copernicus is born! Read Einstein's paper!" The comparison is apt, because the
contribution of Copernicus was, after all, essentially nothing but an interpretation of
Ptolemy's astronomy, just as Einstein's theory was an interpretation of Lorentz's
electrodynamics. Only subsequently did men like Kepler, Galileo, and Newton, taking the
electrodynamics. Only subsequently did men like Kepler, Galileo, and Newton, taking the
Copernican insight even more seriously than Copernicus himself had done, develop a
substantially new physical theory. It's clear that Copernicus was only one of several
people who jointly created the "Copernican revolution" in science, and we can argue
similarly that Einstein was only one of several individuals (including Maxwell, Lorentz,
Poincare, Planck, and Minkowski) responsible for the "relativity revolution".
The historical parallel between special relativity and the Copernican model of the solar
system is not merely superficial, because in both cases the starting point was a preexisting theoretical structure based on the naive use of a particular system of coordinates
lacking any inherent physical justification. On the basis of these traditional but eccentric
coordinate systems it was natural to imagine certain consequences, such as that both the
Sun and the planet Venus revolve around a stationary Earth in separate orbits. However,
with the newly-invented telescope, Galileo was able to observe the phases of Venus,
clearly showing that Venus moves in (roughly) a circle around the Sun. In this way the
intrinsic patterns of the celestial bodies became better understood, but it was still possible
(and still is possible) to regard the Earth as stationary in an absolute extrinsic sense. In
fact, for many purposes we continue to do just that, but from an astronomical standpoint
we now almost invariably regard the Sun as the "center" of the solar system. Why? The
Sun too is moving among the stars in the galaxy, and the galaxy itself is moving relative
to other galaxies, so on what basis do we decide to regard the Sun as the "center" of the
solar system?
The answer is that the Sun is the inertial center. In other words, the Copernican
revolution (as carried to its conclusion by the successors of Copernicus) can be
summarized as the adoption of inertia as the prime organizing principle for the
understanding and description of nature. The concept of physical inertia was clearly
identified, and the realization of its significance evolved and matured through the works
of Kepler, Galileo, Newton, and others. Nature is most easily and most perspicuously
described in terms of inertial coordinates. Of course, it remains possible to adopt some
non-inertial system of coordinates with respect to which the Earth can be regarded as the
stationary center, but there is no longer any imperative to do this, especially since we
cannot thereby change the fact that Venus circles the Sun, i.e., we cannot change the
intrinsic relations between objects, and those intrinsic relations are most readily
expressed in terms of inertial coordinates.
Likewise the pre-existing theoretical structure in 1905 described events in terms of
coordinate systems that were not clearly understood and were lacking in physical
justification. It was natural within this framework to imagine certain consequences, such
as anisotropy in the speed of light, i.e., directional dependence of light speed resulting
from the Earth's motion through the (assumed stationary) ether. This was largely
motivated by the idea that light consists of a wave in the ether, and therefore is not an
inertial phenomenon. However, experimental physicists in the late 1800's began to
discover facts analogous to the phases of Venus, e.g., the symmetry of electromagnetic

induction, the "partial convection" of light in moving media, the isotropy of light speed
with respect to relatively moving frames of reference, and so on. Einstein accounted for
all these results by showing that they were perfectly natural if things are described in
terms of inertial coordinates - provided we apply a more profound understanding of the
definition and physical significance of such coordinate systems and the relationships
between them.
As a result of the first inertial revolution (initiated by Copernicus), physicists had long
been aware of the existence of a preferred class of coordinate systems - the inertial
systems - with respect to which inertial phenomena are isotropic. These systems are
equivalent up to orientation and uniform motion in a straight line, and it had always been
tacitly assumed that the transformation from one system in this class to another was given
by a Galilean transformation. The fundamental observations in conflict with this
assumption were those involving electric and magnetic fields that collectively implied
Maxwell's equations of electromagnetism. These equations are not invariant under
Galilean transformations, but they are invariant under Lorentz transformations. The
discovery of Lorentz invariance was similar to the discovery of the phases of Venus, in
the sense that it irrevocably altered our awareness of the intrinsic relations between
events. We can still go on using coordinate systems related by Galilean transformations,
but we now realize that only one of those systems (at most) is a truly inertial system of
coordinates.
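This contrast is easy to verify symbolically. The sketch below (Python with sympy, units with c = 1) uses the one-dimensional wave equation as a stand-in for the full Maxwell equations; the function names and this particular check are illustrative, not from the text:

```python
# Check that the wave equation keeps its form under a Lorentz transformation
# but not under a Galilean one (units with c = 1; illustrative sketch).
import sympy as sp

x, t, v = sp.symbols('x t v', real=True)
gamma = 1 / sp.sqrt(1 - v**2)          # Lorentz factor
f = sp.Function('f')

def wave_op(expr):
    """The wave operator in the original coordinates: f_tt - f_xx."""
    return sp.diff(expr, t, 2) - sp.diff(expr, x, 2)

# Galilean change of variables: phi(x,t) = f(x - v t, t).
# Extra v-dependent cross terms survive, so the equation changes form.
print(sp.simplify(wave_op(f(x - v*t, t))))

# Lorentz change of variables: phi(x,t) = f(gamma (x - v t), gamma (t - v x)).
# The v-dependent coefficients cancel, leaving the wave operator in the new variables.
print(sp.simplify(wave_op(f(gamma*(x - v*t), gamma*(t - v*x)))))
```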
Incidentally, the electrodynamic theory of Lorentz was in some sense analogous to Tycho
Brahe's model of the solar system, in which the planets revolve around the Sun but the
Sun revolves around a stationary Earth. Tycho's model was kinematically equivalent to
Copernicus' Sun-centered model, but expressed awkwardly in terms of a coordinate
system with respect to which the Earth is stationary, i.e., a non-inertial coordinate system.
It's worth noting that we define inertial coordinates just as Galileo did, i.e., systems of
coordinates with respect to which inertial phenomena are isotropic, so our definition
hasn't changed. All that has changed is our understanding of the relations between inertial
coordinate systems. Einstein's famous "synchronization procedure" (which was actually
first proposed by Poincare) was expressed in terms of light rays, but the physical
significance of this procedure is due to the empirical fact that it yields exactly the same
synchronization as does Galileo's synchronization procedure based on mechanical inertia.
To establish simultaneity between spatially separate events while floating freely in empty
space, throw two identical objects in opposite directions with equal force, so that the
thrower remains stationary in his original frame of reference. These objects then pass
equal distances in equal times, i.e., they serve to assign inertially simultaneous times to
separate events as they move away from each other. In this way we can theoretically
establish complete slices of inertial simultaneity in spacetime, based solely on the inertial
behavior of material objects. Someone moving uniformly relative to us can carry out this
same procedure with respect to his own inertial frame of reference and establish his own
slices of inertial simultaneity throughout spacetime. The unavoidable intrinsic relations
that were discovered at the end of the 19th century show that these two sets of
simultaneity slices are not identical. The two main approaches to the interpretation of

these facts were discussed in Sections 1.5 and 1.6. The approach advocated by Einstein
was to adhere to the principle of inertia as the basis for organizing our understanding and
descriptions of physical phenomena - which was certainly not a novel idea.
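To make the disagreement between the two sets of simultaneity slices concrete, here is a small numerical sketch (Python; the relative velocity and event coordinates are arbitrary illustrative values, in units with c = 1):

```python
# Two events simultaneous in the thrower's inertial frame are not simultaneous
# for an observer in uniform relative motion. Values are illustrative.
import math

v = 0.6                              # relative velocity of the second observer
gamma = 1 / math.sqrt(1 - v**2)      # = 1.25

# The two thrown objects reach x = -1 and x = +1 at the same time t = 1
events = [(-1.0, 1.0), (1.0, 1.0)]   # (x, t) pairs, simultaneous in this frame

# Lorentz-transform each event into the relatively moving frame
for x, t in events:
    t_prime = gamma * (t - v * x)
    print(t_prime)                   # 2.0 and 0.5: no longer simultaneous
```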
In his later years Einstein observed "there is no doubt that the Special Theory of
Relativity, if we regard its development in retrospect, was ripe for discovery in 1905".
The person (along with Lorentz) who most nearly anticipated Einstein's special relativity
was undoubtedly Poincare, who had already in 1900 proposed an explicitly operational
definition of clock synchronization and in 1904 suggested that the ether was in principle
undetectable to all orders of v/c. Those two propositions and their consequences
essentially embody the whole of special relativity. Nevertheless, as late as 1909 Poincare
was not prepared to say that the equivalence of all inertial frames combined with the
invariance of (two-way) light speed was sufficient to infer Einstein's model. He
maintained that one must also stipulate a particular contraction of physical objects in their
direction of motion. This is sometimes cited as evidence that Poincare still failed to
understand the situation, but there's a sense in which he was actually correct. The two
famous principles of Einstein's 1905 paper are not sufficient to uniquely identify special
relativity, as Einstein himself later acknowledged. One must also stipulate, at the very
least, homogeneity, memorylessness, and isotropy. Of these, the first two are rather
innocuous, and one could be forgiven for failing to explicitly mention them, but not so
the assumption of isotropy, which serves precisely to single out Einstein's simultaneity
convention from all the other - equally viable - interpretations. (See Section 4.5). This is
also precisely the aspect that is fixed by Poincare's postulate of contraction as a function
of velocity.
In a sense, the failure of Poincare to found the modern theory of relativity was not due to
a lack of discernment on his part (he clearly recognized the Lorentz group of space and
time transformations), but rather to an excess of discernment and philosophical
sophistication, preventing him from subscribing to the young patent examiner's inspired
but perhaps slightly naive enthusiasm for the symmetrical interpretation, which is, after
all, only one of infinitely many possibilities. Poincare recognized too well the extent to
which our physical models are both conventional and provisional. In retrospect,
Poincare's scruples have the appearance of someone arguing that we could just as well
regard the Earth rather than the Sun as the center of the solar system, i.e., his reservations
were (and are) technically valid, but in some sense misguided. Also, as Max Born
remarked, to the end of Poincare's life his expositions of relativity definitely give you
the impression that he is recording Lorentz's work, and yet Lorentz never claimed to be
the author of the principle of relativity, but invariably attributed it to Einstein. Indeed
Lorentz himself often expressed reservations about the relativistic interpretation.
Regarding Born's impression that Poincare was just recording Lorentz's work, it should
be noted that Poincare habitually wrote in a self-effacing manner. He named many of his
discoveries after other people, and expounded many important and original ideas in
writings that were ostensibly just reviewing the works of others, with minor
amplifications and corrections. So we shouldn't be misled by Born's impression.
Poincare always gave the impression that he was just recording someone else's work, in

contrast with Einstein, whose style of writing, as Born said, gives you the impression of
"quite a new venture". Of course, Born went on to say, when recalling his first reading of
Einstein's paper in 1907, "Although I was quite familiar with the relativistic idea and the
Lorentz transformations, Einstein's reasoning was a revelation to me which had a
stronger influence on my thinking than any other scientific experience."
Lorentz's reluctance to fully embrace the relativity principle (that he himself did so much
to uncover) is partly explained by his belief that "Einstein simply postulates what we
have deduced... from the equations of the electromagnetic field". If this were true, it
would be a valid reason for preferring Lorentz's approach. However, if we closely
examine Lorentz's electron theory we find that full agreement with experiment required
not only the invocation of Fitzgerald's contraction hypothesis, but also the assumption
that mechanical inertia is Lorentz covariant. It's true that, after Poincare complained
about the proliferation of hypotheses, Lorentz realized that the contraction could be
deduced from more fundamental principles (as discussed in Section 1.5), but this was
based on yet another hypothesis, the so-called molecular force hypothesis, which simply
asserts that all physical forces and configurations (including the unknown forces that
maintain the shape of the electron) transform according to the same laws as do
electromagnetic forces. Needless to say, it obviously cannot follow deductively "from the
equations of the electromagnetic field" that the necessarily non-electromagnetic forces
which hold the electron together must transform according to the same laws. (Both
Poincare and Einstein had already realized by 1905 that the mass of the electron cannot
be entirely electromagnetic in origin.) Even less can the Lorentz covariance of
mechanical inertia be deduced from electromagnetic theory. We still do not know to this
day the origin of inertia, so there is no sense in which Lorentz or anyone else can claim to
have deduced Lorentz covariance in any constructive sense, let alone from the laws of
electromagnetism.
Hence Lorentz's molecular force hypothesis and his hypothesis of covariant mechanical
inertia together are simply a disguised and piece-meal way of postulating universal
Lorentz invariance - which is precisely what Lorentz claims to have deduced rather than
postulated. The whole task was to reconcile the Lorentzian covariance of
electromagnetism with the Galilean covariance of mechanical dynamics, and Lorentz
simply recognized that one way of doing this is to assume that mechanical dynamics (i.e.,
inertia) is actually Lorentz covariant. This is presented as an explicit postulate (not a
deduction) in the final edition of his book on the Electron Theory. In essence, Lorentz's
program consisted of performing a great deal of deductive labor, at the end of which it
was still necessary, in order to arrive at results that agreed with experiment, to simply
postulate the same principle that forms the basis of special relativity. (To his credit,
Lorentz candidly acknowledged that his deductions were "not altogether satisfactory", but
this is actually an understatement, because in the end he simply postulated what he
claimed to have deduced.)
In contrast, Einstein recognized the necessity of invoking the principle of relativity and
Lorentz invariance at the start, and then demonstrated that all the other "constructive"
labor involved in Lorentz's approach was superfluous, because once we have adopted

these premises, all the experimental results arise naturally from the simple kinematics of
the situation, with no need for molecular force hypotheses or any other exotic and
dubious conjectures regarding the ultimate constitution of matter. On some level Lorentz
grasped the superiority of the purely relativistic approach, as is evident from the words he
included in the second edition of his "Theory of Electrons" in 1916:
If I had to write the last chapter now, I should certainly have given a more
prominent place to Einstein's theory of relativity by which the theory of
electromagnetic phenomena in moving systems gains a simplicity that I had not
been able to attain. The chief cause of my failure was my clinging to the idea that
the variable t only can be considered as the true time, and that my local time t'
must be regarded as no more than an auxiliary mathematical quantity.
Still, it's clear that neither Lorentz nor Poincare ever whole-heartedly embraced special
relativity, for reasons that may best be summed up by Lorentz when he wrote
Yet, I think, something may also be claimed in favor of the form in which I have
presented the theory. I cannot but regard the aether, which can be the seat of an
electromagnetic field with its energy and its vibrations, as endowed with a certain
degree of substantiality, however different it may be from all ordinary matter. In
this line of thought it seems natural not to assume at starting that it can never
make any difference whether a body moves through the aether or not, and to
measure distances and lengths of time by means of rods and clocks having a fixed
position relatively to the aether.
This passage implies that Lorentz's rationale for retaining a substantial aether and
attempting to refer all measurements to the rest frame of this aether (without, of course,
specifying how that is to be done) was the belief that it might, after all, make some
difference whether a body moves through the aether or not. In other words, we should
continue to look for physical effects that violate Lorentz invariance (by which we now
mean local Lorentz invariance), both in new physical forces and at higher orders of v/c
for the known forces. A century later, our present knowledge of the weak and strong
nuclear forces and the precise behavior of particles at 0.99999c has vindicated Einstein's
judgment that Lorentz invariance is a fundamental principle whose significance and
applicability extends far beyond Maxwell's equations, and apparently expresses a general
attribute of space and time, rather than a specific attribute of particular physical entities.
In addition to the formulas expressing the Lorentz transformations, we can also find
precedents for other results commonly associated with special relativity, such as the
equivalence of mass and energy. In fact, the general idea of associating mass with energy
in some way had been around for about 25 years prior to Einstein's 1905 papers. Indeed,
as Thomson and even Einstein himself noted, this association is already implicit in
Maxwell's theory. With electric and magnetic fields e and b, the energy density is (e² +
b²)/(8π) and the momentum density is (e × b)/(4πc), so in the case of radiation (when e
and b are equal and orthogonal) the energy density is E = e²/(4π) and the momentum
density is p = e²/(4πc). Taking momentum p as the product of the radiation's "mass" m
times its velocity c, we have

\[ mc \;=\; \frac{e^2}{4\pi c} \qquad\Longrightarrow\qquad m \;=\; \frac{e^2}{4\pi c^2} \]

and so E = mc². Indeed, in the 1905 paper containing his original deduction of mass-energy equivalence, Einstein acknowledges that it was explicitly based on "Maxwell's
expression for the electromagnetic energy of space". We can also mention the pre-1905
work of Poincare and others on the electron mass arising from its energy, and the work of
Hasenöhrl on how the mass of a cavity increases when it is filled with radiation.
However, these suggestions were all very restricted in their applicability, and didn't
amount to the assertion of a fundamental equivalence such as emerges so clearly from
Einstein's relativistic interpretation. Hardly any of the formulas in Einstein's two 1905
papers on relativity were new, but what Einstein provided was a single conceptual
framework within which all those formulas flow quite naturally from a simple set of
general principles.
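The ratio used in the derivation above is worth making explicit; a one-line symbolic check (Python with sympy; Gaussian units, radiation case with e = b and the fields orthogonal, as in the text):

```python
# Energy density / momentum density = c for radiation, so E = (mc)c = m c^2.
import sympy as sp

e, c = sp.symbols('e c', positive=True)
energy_density = e**2 / (4 * sp.pi)            # (e^2 + b^2)/(8 pi) with b = e
momentum_density = e**2 / (4 * sp.pi * c)      # |e x b|/(4 pi c) with e, b orthogonal
print(sp.simplify(energy_density / momentum_density))   # prints: c
```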
Occasionally one hears of other individuals who are said to have discovered one or more
aspects of relativity prior to Einstein. For example, in November of 1999 there appeared in
newspapers around the world a story claiming that "The mathematical equation that
ushered in the atomic age was discovered by an unknown Italian dilettante two years
before Albert Einstein used it in developing the theory of relativity...". The "dilettante" in
question was named Olinto De Pretto, and the implication of the story was that Einstein
got the idea for mass-energy equivalence from "De Pretto's insight". There are some
obvious difficulties with this account, only some of which can be blamed on the
imprecision of popular journalism. First, the story claimed that Einstein used the idea of
mass-energy equivalence to develop special relativity, whereas in fact the suggestion that
energy has inertia appeared in a very brief note that Einstein submitted for publication
toward the end of 1905, after the original paper on special relativity.
The report went on to say that "De Pretto had stumbled on the equation, but not the
theory of relativity... It was republished in 1904 by Veneto's Royal Science Institute... A
Swiss Italian named Michele Besso alerted Einstein to the research and in 1905 Einstein
published his own work..." Now, it's certainly true that Besso was Italian, and worked
with Einstein at the Bern Patent Office during the years leading up to 1905, and it's true
that they discussed physics, and Besso provided Einstein with suggestions for reading
(for example, it was Besso who introduced him to the works of Ernst Mach). However,
the idea that Einstein's second relativity paper in 1905 (let alone the first) was in any way
prompted by De Pretto's obscure and unfounded comments is bizarre.
In essence, De Pretto's "insight" was the (hardly novel) idea that matter consists of tiny
particles (of what, he does not say), agitated by their exposure to the ultra-mundane ether
particles of Georges Le Sage's "shadow theory" of gravity. Since the particles in every
aggregate of matter are in motion, every quantity of mass contains an amount of energy
equal to Leibniz's "vis viva", the living force, which Leibniz defined as mv². Oddly
enough, De Pretto seems to have been under the impression that mv² was the kinetic

energy of macroscopic bodies moving at the speed v. On this (erroneous) basis, and
despite the fact that De Pretto did not regard the speed of light as a physically limiting
speed, he noted that Le Sage's ether particles were thought to move at approximately the
speed of light, and so (he reasoned) the particles comprising a stationary aggregate of
matter may also be vibrating internally at the speed of light. In that case, the vis viva of
each quantity of mass m would be mc², which, he alertly noted, is a lot of energy.
Needless to say, this bears no resemblance at all to the path that Einstein actually
followed to mass-energy equivalence.
Moreover, there were far more accessible and authoritative sources available to him for
the idea of mass-energy equivalence, including Thomson, Lorentz, Poincare, etc. (not to
mention Isaac Newton, who famously asked "Are not gross bodies and light convertible
into one another...?"). After all, the idea that the electron's mass was electromagnetic in
origin was one of the leading hypotheses of research at that time. It would be like saying
that some theoretical physicist today had never heard of string theory! Also, the story
requires us to believe that Einstein got this information after submitting the paper on
Electrodynamics of Moving Bodies in the summer of 1905 (which contained the
complete outline of special relativity but no mention of E = mc²) but prior to submitting
the follow-up note just a few months later. Readers can judge for themselves from a note
that Einstein wrote to his close friend Conrad Habicht as he was preparing the mass-energy
paper whether this idea was prompted by the inane musings of an obscure Italian
dilettante on Leibnizian vis viva:
One more consequence of the paper on electrodynamics has also occurred to me.
The principle of relativity, in conjunction with Maxwell's equations, requires that
mass be a direct measure of the energy contained in a body; light carries mass
with it. A noticeable decrease of mass should occur in the case of radium [as it
emits radiation]. The argument [which he intends to present in the paper] is
amusing and seductive, but for all I know the Lord might be laughing over it and
leading me around by the nose.
These are clearly the words of someone who is genuinely working out the consequences
of his own recent paper, and wondering about their validity, not someone who has gotten
an idea from seeing a formula in someone else's paper. Of course, the most obvious proof
that special relativity did not arise from any Leibnizian or Le Sagean ideas is simply the
wonderfully lucid thought process presented by Einstein in his 1905 paper, beginning
from first principles and a careful examination of the physical significance of time and
space, and leading to the kinematics of special relativity, from which the inertia of energy
follows naturally.
Nevertheless, we shouldn't underestimate the real contributions to the development of
special relativity made by Einstein's predecessors, most notably Lorentz and Poincare. In
addition, although Einstein was remarkably thorough in his 1905 paper, there were
nevertheless important contributions to the foundations of special relativity made by
others in the years that followed. For example, in 1907 Max Planck greatly clarified
relativistic mechanics, basing it on the conservation of momentum with his "more

advantageous" definition of force, as did Tolman and Lewis. Planck also critiqued
Einstein's original deduction of mass-energy equivalence, and gave a more general and
comprehensive argument. (This led Johannes Stark in 1907 to cite Planck as the
originator of mass-energy equivalence, prompting an angry letter from Einstein saying
that he "was rather disturbed that you do not acknowledge my priority with regard to the
connection between mass and energy". In later years Stark became an outspoken critic of
Einstein's work.)
Another crucially important contribution was made by Hermann Minkowski (one of
Einstein's former professors), who recognized that what Einstein had described was
simply ordinary kinematics in a four-dimensional spacetime manifold with the pseudo-metric

\[ (d\tau)^2 \;=\; (dt)^2 - (dx)^2 - (dy)^2 - (dz)^2 \]
Poincare had also recognized this as early as 1905. This was vital for the generalization
of relativity which Einstein, with the help of his old friend Marcel Grossmann,
developed on the basis of the theory of curved manifolds developed in the 19th century
by Gauss and Riemann.
The tensor calculus and generally covariant formalism employed by Einstein in his
general theory had been developed by Gregorio Ricci-Curbastro and Tullio Levi-Civita
around 1900 at the University of Padua, building on the earlier work of Gauss, Riemann,
Beltrami, and Christoffel. In fact, the main technical challenge that occupied Einstein in
his efforts to find a suitable field law for gravity, namely, to construct from the metric
tensor another tensor whose covariant derivative automatically vanishes, had already
been solved in the form of the Bianchi identities, which lead directly to the Einstein
tensor as discussed in Section 5.8.
Several other individuals are often cited as having anticipated some aspect of general
relativity, although not in any sense of contributing seriously to the formulation of the
theory. John Michell wrote in 1783 about the possibility of "dark stars" that were so
massive that light could not escape from them, and Laplace contemplated the same possibility
in 1796. Around 1801 Johann von Soldner predicted that light rays passing near the Sun
would be deflected by the Sun's gravity, just like a small corpuscle of matter moving at
the speed of light. (Ironically, although Newton's theory implies a deflection of just half
the relativistic value, Soldner erroneously omitted a factor of 1/2 from his calculation, so
he arrived at the relativistic value, albeit by a computational error.) William Clifford
wrote about a possible connection between matter and curved space in 1873.
Interestingly, the work of Soldner had been virtually forgotten until being rediscovered
and publicized by Philipp Lenard in 1921, along with the claim that Hasenöhrl should be
credited with the mass-energy equivalence relation. Similarly in 1917 Ernst Gehrcke
arranged for the re-publication of an 1898 paper by a secondary school teacher named Paul
Gerber which contained a formula for the precession of elliptical orbits identical to the
one Einstein had derived from the field equations of general relativity. Gerber's approach

was based on the premise that the gravitational potential propagates at the speed of light,
and that the effect of the potential on the motion of a body depends on the body's velocity
through the potential field. His potential was similar in form to those of the Gauss-Weber theories.
However, Gerber's "theory" was (and still is) regarded as unsatisfactory, mainly because
his conclusions don't follow from his premises, but also because the combination of
Gerber's proposed gravitational potential with the rest of (nonrelativistic) physics results
in predictions (such as 3/2 the relativistic prediction for the deflection of light rays near
the Sun) which are inconsistent with observation. In addition, Gerber's free mixing of
propagating effects with some elements of action-at-a-distance tended to undermine the
theoretical coherence of his proposal.
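For reference, the formula in question, in both Gerber's paper and general relativity, gives a precession per orbit of Δφ = 24π³a²/(T²c²(1−ε²)), where a, T, and ε are the semi-major axis, period, and eccentricity of the orbit. A quick numerical check (Python; Mercury's orbital elements are standard values, inserted here for illustration) recovers the famous excess precession:

```python
# Precession per orbit: delta_phi = 24 pi^3 a^2 / (T^2 c^2 (1 - eps^2)).
# Mercury's orbital elements below are standard values used for illustration.
import math

a   = 5.79e10      # semi-major axis of Mercury's orbit, meters
T   = 7.60e6       # orbital period, seconds (about 88 days)
eps = 0.2056       # orbital eccentricity
c   = 2.998e8      # speed of light, m/s

dphi = 24 * math.pi**3 * a**2 / (T**2 * c**2 * (1 - eps**2))   # radians per orbit
orbits_per_century = 100 * 365.25 * 86400 / T
arcsec = dphi * orbits_per_century * (180 / math.pi) * 3600
print(arcsec)      # about 43 arcseconds per century
```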
The writings of Michell, Soldner, Gerber, and others were, at most, anticipations of some
of the phenomenology later associated with general relativity, but had nothing to do with
the actual theory of general relativity, i.e., a theory that conceives of gravity as a
manifestation of the curvature of spacetime. A closer precursor can be found in the
notional writings of William Kingdon Clifford, but like Gauss and Riemann he lacked the
crucial idea of including time as one of the dimensions of the manifold. As noted above,
the formal means of treating space and time as a single unified spacetime manifold was
conceived by Poincare and Minkowski, and the tensor calculus was developed by Ricci
and Levi-Civita, with whom Einstein corresponded during the development of general
relativity. It's also worth mentioning that Einstein and Grossmann, working in
collaboration, came very close to discovering the correct field equations in 1913, but
were diverted by an erroneous argument that led them to believe no fully covariant
equations could be consistent with experience. In retrospect, this accident may have been
all that prevented Grossmann from being perceived as a co-creator of general relativity.
On the other hand, Grossmann had specifically distanced himself from the physical
aspects of the 1913 paper, and Einstein wrote to Sommerfeld in July 1915 (i.e., prior to
arriving at the final form of the field equations) that
Grossmann will never lay claim to being co-discoverer. He only helped in guiding
me through the mathematical literature but contributed nothing of substance to the
results.
In the summer of 1915 Einstein gave a series of lectures at Göttingen on the general
theory, and apparently succeeded in convincing both Hilbert and Klein that he was close
to an important discovery, despite the fact that he had not yet arrived at the final form of
the field equations. Hilbert took up the problem from an axiomatic standpoint, and
carried on an extensive correspondence with Einstein until the 19th of November. On the
20th, Hilbert submitted a paper to the Gesellschaft der Wissenschaften in Göttingen with
a derivation of the field equations. Five days later, on 25 November, Einstein submitted a
paper with the correct form of the field equations to the Prussian Academy in Berlin. The
exact sequence of events leading up to the submittal of these two papers and how much
Hilbert and Einstein learned from each other is somewhat murky, especially since
Hilbert's paper was not actually published until March of 1916, and seems to have
undergone some revisions from what was originally submitted. However, the question of
who first wrote down the fully covariant field equations (including the trace term) is less

significant than one might think, because, as Einstein wrote to Hilbert on 18 November
after seeing a draft of Hilbert's paper,

The difficulty was not in finding generally covariant equations for the gμν's; for
this is easily achieved with the aid of Riemann's tensor. Rather, it was hard to
recognize that these equations are a generalization, that is, a simple and natural
generalization, of Newton's law.
It might be argued that Einstein was underestimating the mathematical difficulty, since he
hadn't yet included the trace term in his published papers, but in fact he repeated the
same comment in a letter to Sommerfeld on 28 November, this time explicitly referring to
the full field equations, with the trace term. He wrote
It is naturally easy to set these generally covariant equations down; however, it is
difficult to recognize that they are generalizations of Poisson's equation, and not
easy to recognize that they fulfill the conservation laws. I had considered these
equations with Grossmann already 3 years ago, with the exception of the [trace
term], but at that time we had come to the conclusion that it did not fulfill
Newton's approximation, which was erroneous.
Thus he regards the purely mathematical task of determining the most general fully
covariant expression involving the gμν and their first and second derivatives as
comparatively trivial and straightforward, as indeed it is for a competent mathematician.
The Bianchi identities were already known, so there was no new mathematics involved.
The difficulty, as Einstein stressed, was not in writing down the solution of this
mathematical problem, but in conceiving of the problem in the first place, and then
showing that it represents a viable law of gravitation. In this, Einstein was undeniably the
originator, not only in showing that the field equations reduce to Newton's law in the first
approximation, but also in showing that they yield Mercury's excess precession in the
second approximation. Hilbert was suitably impressed when Einstein showed this in his
paper of 18 November, and it's important to note that this was how Einstein was spending
his time around the 18th of November, establishing the physical implications of the fully
covariant field equations, while Hilbert was busying himself with elaborating the
mathematical aspects of the problem that Einstein had outlined the previous summer.
Whatever the true sequence of events, it seems that Einstein initially had some feelings of
resentment toward Hilbert, perhaps thinking that Hilbert had acted ungraciously and
stolen some of his glory. Already on November 20 he had written to a friend
The theory is incomparably beautiful, but only one colleague understands it, and
that one works skillfully at "nostrification". I have learned the deplorableness of
humans more in connection with this theory than in any other personal
experience. But it doesn't bother me.
(Literally the word nostrification refers to the process by which a country accepts
foreign academic degrees as if they had been granted by one of its own universities, but

the word has often been used to suggest the appropriation and re-packaging of someone
else's ideas and making them one's own.) However, by December 20 he was able to write
a conciliatory note to Hilbert, saying
There has been between us a certain unpleasantness, whose cause I do not wish to
analyze. I have struggled against feelings of bitterness with complete success. I
think of you again with untroubled friendliness, and ask you to do the same with
me. It would be a shame if two fellows like us, who have worked themselves out
from this shabby world somewhat, cannot enjoy each other.
Thereafter they remained on friendly terms, and Hilbert never publicly claimed any
priority in the discovery of general relativity, and always referred to it as Einstein's
theory.
As it turned out, Einstein can hardly have been dissatisfied with the amount of popular
credit he received for the theories of relativity, both special and general. Nevertheless,
one senses a bit of annoyance when Max Born mentioned to Einstein in 1953 (two years
before Einstein's death) that the second volume of Edmund Whittaker's book A History
of the Theories of Aether and Electricity had just appeared, in which special relativity is
attributed to Lorentz and Poincare, with barely a mention of Einstein except to say that
"in the autumn of [1905] Einstein published a paper which set forth the relativity theory
of Poincare and Lorentz with some amplifications, and which attracted much attention".
In the same book Whittaker attributes some of the fundamental insights of general
relativity to Planck and a mathematician named Harry Bateman (a former student of
Whittaker's). Einstein replied to his old friend Born
Everybody does what he considers right... If he manages to convince others, that
is their own affair. I myself have certainly found satisfaction in my efforts, but I
would not consider it sensible to defend the results of my work as being my own
'property', as some old miser might defend the few coppers he had laboriously
scraped together. I do not hold anything against him [Whittaker], nor of course,
against you. After all, I do not need to read the thing.
On the other hand, in the same year (1953), Einstein wrote to the organizers of a
celebration honoring the upcoming fiftieth anniversary of his paper on the
electrodynamics of moving bodies, saying
I hope that one will also take care on that occasion to suitably honor the merits of
Lorentz and Poincare.
8.9 Paths Not Taken
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood

And looked down one as far as I could
To where it bent in the undergrowth
Robert Frost, 1916
The Archimedean definition of a straight line as the shortest path between two points was
an early expression of a variational principle, leading to the modern idea of a geodesic
path. In the same spirit, Hero explained the paths of reflected rays of light based on a
principle of least distance, which Fermat reinterpreted as a principle of least time,
enabling him to account for refraction as well. Subsequently, Maupertuis and others
developed this approach into a general principle of least action, applicable to mechanical
as well as optical phenomena. Of course, as discussed in Chapter 3.4, a more correct
statement of these principles is that systems evolve along stationary paths, which may be
maximal, minimal, or neither (at an inflection point).
This is a tremendously useful principle, but as a realistic explanation it has always been at
least slightly suspect, because (for example) it isn't clear how a single ray of light (or a
photon) moving along a particular path can "know" that it is an extremal path in the
variational sense. To illustrate the problem, consider a photon traveling from A to B
through a transparent medium whose refractive index n increases in the direction of
travel, as indicated by the solid vertical lines in the drawing below:

Since the path AB is parallel to the gradient of the refractive index, it undergoes no
refraction. However, if the lines of constant refractive index were tilted as shown by the
dashed diagonal lines in the figure, a ray of light initially following the path AB will be
refracted and arrive at C, even though the index of refraction at each point along the path
AB is identical to what it was before, where there was no refraction. This shows that the
path of a light ray cannot be explained solely in terms of the values of the refractive index
along the path. We must also consider the transverse values of the refractive index along
neighboring paths, i.e., along paths not taken.
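The situation can also be checked numerically by integrating the standard ray equation d/ds(n dr/ds) = ∇n for the two refractive-index profiles described above. The following is only a sketch (Python; the linear index profile, gradient strength, and step sizes are assumptions made for illustration):

```python
# Trace a ray through a medium with a linear refractive-index profile
# n(p) = n0 + g*(grad_dir . p), integrating d/ds (n dr/ds) = grad n by Euler steps.
# The profile and parameters are illustrative assumptions, not from the text.
import numpy as np

def trace_ray(grad_dir, n0=1.5, g=0.2, steps=2000, ds=0.001):
    pos = np.array([0.0, 0.0])
    tangent = np.array([1.0, 0.0])           # initial direction: along the path AB
    for _ in range(steps):
        n = n0 + g * grad_dir @ pos
        v = n * tangent + g * grad_dir * ds  # Euler update of n * (unit tangent)
        tangent = v / np.linalg.norm(v)
        pos = pos + tangent * ds
    return pos

# Gradient parallel to the ray (solid lines in the figure): no deflection.
print(trace_ray(np.array([1.0, 0.0])))               # y stays ~0, the ray reaches B
# Tilted gradient (dashed lines): same index values along AB, yet the ray bends toward C.
print(trace_ray(np.array([1.0, 1.0]) / np.sqrt(2)))  # y drifts away from 0
```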
The classical wave explanation, proposed by Huygens, resolves this problem by denying
that light can propagate in the form of a single ray. According to the wave interpretation,
light propagates as a wave front possessing transverse width. A small section of a
propagating wave front is shown in the figure below, with the gradient of the refractive
index perpendicular to the initial trajectory of light:

Clearly the wave front propagates more rapidly on the side where the refractive index is
low (viz, the speed of light is high) than on the side where the refractive index is high.
As a result, the wave front naturally turns in the direction of higher refractive index (i.e.,
higher density). It's easy to see that the amount of deflection of the normal to the wave
front agrees precisely with the result of applying Fermat's principle, because the wave
front represents a locus of points that are at an equal phase distance from the point of
emission. Thus the normal to the wave front is, by definition, a stationary path in the
variational sense.
More generally, Huygens articulated the remarkable principle that every point of a wave
front can be regarded as the origin of a secondary spherical wave, and the envelope of all
these secondary waves constitutes the propagated wave front. This is illustrated in the
figure below:

Huygens also assumed the secondary wave originating at any point has the same speed
and frequency as the primary wave at that point. The main defect in Huygens' wave
theory of optics was its failure to account for the ray-like properties of light, such as the
casting of sharp shadows. Because of this failure (and also the inability of the wave
theory to explain polarization), the corpuscular theory of light favored by Newton seemed
more viable throughout the 18th century. However, early in the 19th century, Young and
Fresnel modified Huygens' principle to include the crucial element of interference. The
modified principle asserts that the amplitude of the propagated wave is determined by the
superposition of all the (unobstructed) secondary wavelets originating on the wave front
at any prior instant. (Young also proposed that light was a transverse rather than
longitudinal wave, thereby accounting for polarization - but only at the expense of
making it very difficult to conceive of a suitable material medium, as discussed in
Chapter 3.5.)
In his critique of the wave theory of light Newton (apparently) never realized that waves
actually do exhibit "rectilinear motion", and cast sharp shadows, etc., provided that the
wavelength is small on the scale of the obstructions. In retrospect, it's surprising that

Newton, the superb experimentalist, never noticed this effect, since it can be seen in
ordinary waves on the surface of a pool of water. Qualitatively, if the wavelength is large
relative to an aperture, the phases of the secondary wavelets emanating from every point
in the mouth of the aperture to any point in the region beyond will all be within a fraction
of a cycle from each other, so they will (more or less) constructively reinforce each
other. On the other hand, if the wavelength is very small in comparison with the size of
the aperture, the region of purely constructive interference on the far side of the aperture
will just be a narrow band perpendicular to the aperture.
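This qualitative argument is easy to render numerically. The sketch below (Python; the aperture width, wavelengths, and screen geometry are arbitrary illustrative choices) superposes secondary wavelets from points across an aperture, in the spirit of the Young-Fresnel modification, and shows broad spreading when the wavelength is large compared with the aperture, but a sharply confined beam, i.e., a sharp shadow, when it is small:

```python
# Superpose secondary wavelets emanating from points across an aperture and
# compare the intensity pattern on a distant screen for long and short waves.
# All dimensions are arbitrary illustrative values.
import numpy as np

def screen_intensity(wavelength, aperture=1.0, distance=5.0, n_sources=400):
    k = 2 * np.pi / wavelength
    sources = np.linspace(-aperture/2, aperture/2, n_sources)  # wavelet origins
    screen = np.linspace(-3.0, 3.0, 7)                         # sample points on the screen
    pattern = []
    for x in screen:
        r = np.hypot(distance, x - sources)     # path length of each wavelet
        wavelets = np.exp(1j * k * r) / r       # superposed complex amplitudes
        pattern.append(abs(wavelets.sum())**2)
    return np.round(np.array(pattern) / max(pattern), 3)

print(screen_intensity(wavelength=5.0))    # long waves: light spread across the screen
print(screen_intensity(wavelength=0.05))   # short waves: confined near center (sharp shadow)
```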
The wave theory of light is quite satisfactory for a wide range of optical phenomena, but
when examined on a microscopic scale we find the transfer of energy and momentum via
electromagnetic waves exhibits a granularity, suggesting that light comes in discrete
quanta (packets). Planck had originated the quantum theory in 1900 by showing that the
so-called ultra-violet catastrophe entailed by the classical theory of blackbody radiation
(which predicted infinite energy at the high end of the spectrum) could be avoided - and
the actual observed radiation could be accurately modeled - if we assume oscillators
lining the walls of the cavity can absorb and emit electromagnetic energy only in discrete
units proportional to the frequency ν. The constant of proportionality is now known as
Planck's constant, denoted by h, and has the incredibly tiny value 6.626×10⁻³⁴ joule-seconds.
Thus a physical oscillator with frequency ν emits and absorbs energy in integer
multiples of hν.
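For a sense of the scale involved, consider a single quantum of visible light (the frequency below is an assumed illustrative value for green light):

```python
# Energy of one quantum of light, E = h*nu, for an assumed visible frequency.
h = 6.626e-34        # Planck's constant, joule-seconds
nu = 5.5e14          # assumed frequency of green light, hertz
print(h * nu)        # about 3.6e-19 joules per quantum
```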
Planck's interpretation was that the oscillators were quantized, i.e., constrained to emit
and absorb energy in discrete units, but he did not (explicitly) suggest that electromagnetic
energy itself was inherently quantized. However, in a sense, this further step
was unavoidable, because ultimately light is nothing but its emissions and absorptions.
It's not possible to "see" an isolated photon. The only perceivable manifestation of
photons is their emissions and absorptions by material objects. Thus if we carry Planck's
assumption to its logical conclusion, it's natural to consider light itself as being quantized
in tiny bundles of energy hν. This was explicitly proposed by Einstein in 1905 as a
heuristic approach to understanding the photoelectric effect.
Incidentally, it was this work on the photoelectric effect, rather than anything related to
special or general relativity, that was cited by the Nobel committee in 1921 when Einstein
was finally awarded the prize. Interestingly, the divorce settlement of Albert and Mileva
Einstein, negotiated through Einstein's faithful friend Besso in 1918, included the
provision that the cash award of any future Nobel prize which Albert might receive
would go to Mileva for the care of the children, as indeed it did. We might also observe
that Einstein's work on the photoelectric effect was much more closely related to the
technological developments leading to the invention of television than his relativity
theory was to the unleashing of atomic energy. Thus, if we wish to credit or blame
Einstein for laying the scientific foundations of a baneful technology, it might be more
accurate to cite television rather than the atomic bomb.
In any case, it had been known for decades prior to 1905 that if an electromagnetic wave
shines on a metallic substance, which possesses many free valence electrons, some of

those electrons will be ejected from the metal. However, the classical wave theory of
light was unable to account for several features of this observed phenomenon. For
example, according to the wave theory the kinetic energy of the ejected electrons should
increase as the intensity of the incident light is increased (at constant frequency), but in
fact we observe that the ejected electrons invariably possess exactly the same kinetic
energy for a given frequency of light. Also, the wave theory predicts that the
photoelectric effect should be present (to some degree) at all frequencies, whereas we
actually observe a definite cutoff frequency, below which no electrons are ejected,
regardless of the intensity of the incident light. A more subtle point is that the classical
wave theory predicts a smooth continuous transfer of energy from the wave to a particle,
and this implies a certain time lag between when the light first strikes the metal and when
electrons begin to be ejected. No such time lag is observed.
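All three observations are natural consequences of the relation Einstein proposed, K = hν − W, where W is the work function of the metal: the electron energy depends on frequency but not intensity, there is a cutoff frequency ν₀ = W/h, and no accumulation time is needed. A small sketch (Python; the work function here is an assumed illustrative value):

```python
# Photoelectric relation K_max = h*nu - W, with a cutoff at nu_0 = W/h.
# The work function W is an assumed value for illustration, not a measured one.
h = 6.626e-34                     # Planck's constant, joule-seconds
W = 3.2e-19                       # assumed work function of the metal, joules

print(W / h)                      # cutoff frequency, about 4.8e14 Hz
for nu in (3.0e14, 5.0e14, 8.0e14):
    K = h * nu - W                # kinetic energy of the ejected electrons
    print(nu, K if K > 0 else "no electrons ejected")
```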
Einstein's proposal for explaining the details of the photoelectric effect was to take
Planck's quantum theory seriously, and consider the consequences of assuming that light
of frequency ν consists of tiny bundles - later given the name photons - of energy hν.
Just as Planck had said, each material "oscillator" emits and absorbs energy in integer
multiples of this quantity, which Einstein interpreted as meaning that material particles
(such as electrons) emit and absorb whole photons. This is an extraordinary hypothesis,
and might seem to restore Newton's corpuscular theory of light. However, these particles
of light were soon found to possess properties and exhibit behavior quite unlike ordinary
macroscopic particles. For example, in 1924 Bose gave a description of blackbody
radiation using the methods of statistical thermodynamics based on the idea that the
cavity is filled with a "gas" of photons, but the statistical treatment regards the individual
photons as indistinguishable and interchangeable, i.e., not possessing distinct identities.
This leads to the Bose-Einstein distribution

    n(E) = 1 / (A e^(E/kT) − 1)

which gives, for a system in equilibrium at temperature T, the expected number of
particles in a quantum state with energy E. In this equation, k is Boltzmann's constant
and A is a constant determined by the number of particles in the system. Particles that
obey Bose-Einstein statistics are called bosons. Compare this distribution with the
classical Boltzmann distribution, which applies to a collection of particles with distinct
identities (such as complex atoms and molecules)

    n(E) = 1 / (A e^(E/kT))

A third equilibrium distribution arises if we consider indistinguishable particles that obey
the Pauli exclusion principle, which precludes more than one particle from occupying any
given quantum state in a system. Such particles are called fermions, the most prominent
example being electrons. It is the exclusion principle that accounts for the variety and
complexity of atoms, and their ability to combine chemically to form molecules. The
energy distribution in an equilibrium gas of fermions is

    n(E) = 1 / (A e^(E/kT) + 1)
The reason photons obey Bose-Einstein rather than Fermi statistics is that they do not
satisfy the Pauli exclusion principle. In fact, multiple bosons actually prefer to occupy
the same quantum state, which led to Einstein's prediction of stimulated emission, the
principle of operation behind lasers, which have become so ubiquitous today in CD
players, fiber optic communications, and so on. Thus the photon interpretation has
become an indispensable aspect of our understanding of light.
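
To compare the three equilibrium distributions given above numerically, here is a minimal sketch; the normalization constant A is simply set to 1 for illustration, and energies are expressed in units of kT:

    import math

    def bose_einstein(E):
        # expected occupancy 1/(A*e^(E/kT) - 1) with A = 1, kT = 1
        return 1.0 / (math.exp(E) - 1.0)

    def boltzmann(E):
        # classical occupancy 1/(A*e^(E/kT)) for distinguishable particles
        return math.exp(-E)

    def fermi_dirac(E):
        # occupancy 1/(A*e^(E/kT) + 1); never exceeds 1 (exclusion principle)
        return 1.0 / (math.exp(E) + 1.0)

    for E in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(f"E/kT = {E:3.1f}:  BE = {bose_einstein(E):7.3f}"
              f"  MB = {boltzmann(E):6.3f}  FD = {fermi_dirac(E):6.3f}")

At high energies all three occupancies converge, while at low energies the boson occupancy diverges (multiple bosons crowding into one state) and the fermion occupancy saturates below 1.
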
However, it also raises some profound questions about our most fundamental ideas of
space, time, and motion. First, the indistinguishability and interchangeability of
fundamental particles (fermions as well as bosons) challenges the basic assumption that
distinct objects can be identified from one instant of time to the next, which (as discussed
in Chapter 1.1) underlies our intuitive concept of motion. Second, even if we consider
the emission and absorption of just a single particle of light, we again face the question of
how the path of this particle is chosen from among all possible paths between the
emission and absorption events. We've seen that Fermat's principle of least time seems to
provide the answer, but it also seems to imply that the photon somehow "knows" which
direction at any given point is the quickest way forward, even though the knowledge
must depend on the conditions at points not on the path being followed. Also, the
principle presupposes either a fixed initial trajectory or a defined destination, neither of
which is necessarily available to a photon at the instant of emission.
In a sense, the principle of least time is backwards, because it begins by positing
particular emission and absorption events, and infers the hypothetical path of a photon
connecting them, whereas we should like (classically) to begin with just the emission
event and infer the time and location of the absorption event. The principle of Fermat
can only assist us if we assume a particular definite trajectory for the photon at emission,
without reference to any absorption. Unfortunately, the assignment of a definite
trajectory to a photon is highly problematical because, as noted above, a photon really is
nothing but an emission and an associated absorption. To speak about the trajectory of a
free photon is to speak about something that cannot, even in principle, ever be observed.
Moreover, many optical phenomena are flatly inconsistent with the notion of free photons
with definite trajectories. The wavelike behavior of light, such as demonstrated in
Young's two-slit interference experiment, defies explanation in terms of free particles of
light moving along free trajectories independent of the emission and absorption events.
The figure below gives a schematic of Young's experiment, showing that the intensity of
light striking the collector screen exhibits the interference effects of the light emanating
from the two slits in the intermediate screen.

This interference pattern is easily explained in terms of interfering waves, but for light
particles we expect the intensity on the collector screen to be just the sum of the
intensities given by each slit individually. Still, if we regard the flow of light as
consisting of a large number of photons, each with their own phases, we might be able to
imagine that they somehow mingle with each other while passing from the source to the
collector, thereby producing the interference pattern. However, the problem becomes
more profound if we reduce the intensity of the light source to a sufficiently low level
that we can actually detect the arrival of individual photons, like clicks on a Geiger
counter, by an array of individual photo-detectors lining the collector screen. Each
arrival is announced by just a single detector. We can even reduce the intensity to such a
low level that no more than one photon is "in flight" at any given time. Under these
conditions there can be no "mingling" of various photons, and yet if the experiment is
carried on long enough we find that the number of arrivals at each point on the collector
screen matches the interference pattern.
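
This statistical accumulation is easy to mimic numerically. The sketch below (with made-up slit separation, wavelength, and screen geometry, and an idealized cos² intensity pattern) registers simulated photons strictly one at a time, yet the interference fringes emerge in the accumulated counts:

    import math, random

    # Idealized two-slit pattern: detection probability proportional to
    # cos^2(pi*d*x/(lam*L)) for slit separation d, wavelength lam, screen
    # distance L (all values below are illustrative).
    d, lam, L = 1e-4, 5e-7, 1.0
    random.seed(0)

    def intensity(x):
        return math.cos(math.pi * d * x / (lam * L)) ** 2

    counts = [0] * 20                      # 20 detector bins across 2 cm
    for _ in range(20000):                 # one photon "in flight" at a time
        while True:                        # rejection-sample an arrival point
            x = random.uniform(-0.01, 0.01)
            if random.random() < intensity(x):
                break
        counts[min(int((x + 0.01) / 0.001), 19)] += 1

    for i, c in enumerate(counts):         # crude histogram of arrivals
        print(f"bin {i:2d}: " + "#" * (c // 50))

Each simulated photon is registered by a single bin, and no photon "mingles" with any other; the fringes appear only in the statistics of many arrivals.
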
The modern theory of quantum electrodynamics explains this behavior by denying that
photons follow definite trajectories through space and time. Instead, an emitter has at
each instant along its worldline a particular complex amplitude for emitting a photon, and
a potential absorber has a complex amplitude for absorbing that photon. The amplitude
at the absorber is the complex sum of the emission amplitudes of the emitter at various
times in the past, corresponding to the times required to traverse each of the possible
paths from the emitter to the absorber. At each of those times the light source had a
certain complex amplitude for emitting a photon, and the phase of that amplitude
advances steadily along the timeline of the emitter, giving a frequency equal to the
frequency of the emitted light.
For example, when we look at the reflection of a light source on a mirror our eye is at one
end of a set of rays, each of slightly different length, which implies that the amplitude for
each path corresponds to the amplitude of the emitter at a slightly different time in the
past. Thus, we are actually receiving an image of the light source from a range of times
in the past. This is illustrated in the drawing below:

If the optical path lengths of the bundle of incoming rays in a particular direction are all
nearly equal (meaning that the path is "stationary" in the variational sense), their
amplitudes will all be nearly in phase, so they reinforce each other, yielding a large
complex sum. On the other hand, if the lengths of the paths arriving from a particular
direction differ significantly, the complex sum of amplitudes will be taken over several
whole cycles of the oscillating emitter amplitude, so they largely cancel out. This is why
most of the intensity of the incoming ray arrives from the direction of the stationary path,
which conforms with Hero's equi-angular reflection.
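
The phasor-summing idea sketched above is easy to demonstrate (in the spirit of Feynman's QED arrows; the geometry and wavenumber below are arbitrary illustrative choices). Each path from source to eye via a point on the mirror contributes a unit complex amplitude whose phase is proportional to the path length:

    import cmath, math

    src, eye = (-1.0, 1.0), (1.0, 1.0)   # source and eye above the mirror (y = 0)
    k = 200.0   # wavenumber, chosen so many cycles fit along non-stationary paths

    def path_length(x):
        # length of the path source -> (x, 0) -> eye
        return (math.hypot(x - src[0], src[1]) +
                math.hypot(x - eye[0], eye[1]))

    for a, b, label in [(-0.2, 0.2, "near the stationary point x = 0"),
                        (0.6, 1.0, "far from the stationary point")]:
        amps = (cmath.exp(1j * k * path_length(a + (b - a) * i / 999))
                for i in range(1000))
        print(f"{label}: |sum of amplitudes| = {abs(sum(amps)):.1f}")

The amplitudes of paths near the equal-angle point add nearly in phase, giving a large complex sum, while those from the other segment run through many whole cycles and largely cancel.
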
To test the reality of this interpretation, notice that it claims the absence of reflected light
at unequal angles is due to the canceling contributions of neighboring paths, so in theory
we ought to be able to delete the paths corresponding to all but one phase angle of the
emitter, thereby enabling us to see non-Heronian reflected light. This is actually the
principle of operation of a diffraction grating, where alternating patches of a reflecting
surface are scratched away, at intervals in proportion to the wavelength of the light.
When this is done, it is indeed possible to see light reflected at highly non-Heronian
angles, as illustrated below.

All of this suggests that the conveyance of electromagnetic energy from an emitter to an
absorber is not well-described in terms of a classical free particle following a free path
through spacetime. It also suggests that what we sometimes model as wave properties of
electromagnetic radiation are really wave properties of the emitter. This is consistent with
the fact that the wave function of a putative photon does not advance along its null
worldline. See Section 9.10, where it is argued that the concept of a "free photon" is
meaningless, because every photon is necessarily emitted and absorbed. If we compare a
photon to a clap, then a "free photon" is like clapping with no hands.

9.1 In the Neighborhood


Nothing puzzles me more than time and space; and yet nothing troubles me less,
as I never think about them.
Charles Lamb (1775-1834)
It's customary to treat the relativistic spacetime manifold as an ordinary topological space
with the same topology as a four-dimensional Euclidean manifold, denoted by R4. This is
typically justified by noting that the points of spacetime can be parameterized by a set of
four coordinates x,y,z,t, and defining the "neighborhood" of a point somewhat informally
as follows (quoted from Ohanian and Ruffini):
"...the neighborhood of a given point is the set of all points such that their
coordinates differ only a little from those of the given point."
Of course, the neighborhoods given by this definition are not Lorentz-invariant, because
the amount by which the coordinates of two points differ is highly dependent on the
frame of reference. Consider, for example, two spacetime points in the xt plane with the
coordinates {0,0} and {1,1} with respect to a particular system of inertial coordinates. If
we consider these same two points with respect to the frame of an observer moving in the
positive x direction with speed v (and such that the origin coincides with the former
coordinate origin), the differences in both the space and time coordinates are reduced by a
factor of
, which can range anywhere between 0 and . Thus there exist
valid inertial reference systems with respect to which both of the coordinates of these
points differ (simultaneously) by as little or as much as we choose. Based on the above
definition of neighborhood (i.e., points whose coordinates differ only a little), how can
we decide if these two points are in the same neighborhood?
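
A quick numerical check of this scaling (a sketch in units with c = 1, using the standard Lorentz transformation):

    import math

    def boost(t, x, v):
        # Lorentz boost: t' = gamma*(t - v*x), x' = gamma*(x - v*t), with c = 1
        g = 1.0 / math.sqrt(1.0 - v * v)
        return g * (t - v * x), g * (x - v * t)

    # Null-separated events (0,0) and (1,1): both coordinate differences
    # are scaled by sqrt((1-v)/(1+v)), which sweeps (0, infinity) as v -> +/-1.
    for v in (-0.9999, -0.9, 0.0, 0.9, 0.9999):
        t2, x2 = boost(1.0, 1.0, v)
        print(f"v = {v:8.4f}:  dt' = dx' = {t2:10.6f},"
              f"  (dt')^2 - (dx')^2 = {t2**2 - x2**2:.1e}")

Both coordinate differences shrink or grow together, while the invariant interval between the two events remains exactly zero.
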
It might be argued that the same objection could be raised against this coordinate-based
definition of neighborhoods in Euclidean space, since we're free to scale our coordinates
arbitrarily, which implies that the numerical amount by which the coordinates of two
given (distinct) points differ is arbitrary. However, in Euclidean space this objection is
unimportant, because we will arrive at the same definition of limit points, and thus the
same topology, regardless of what scale factor we choose. In fact, the same applies even
if we choose unequal scale factors in different directions, provided those scale factors are
all finite and non-zero.
From a strictly mathematical standpoint, the usual way of expressing the arbitrariness of
metrical scale factors for defining a topology on a set of points is to say that if two
systems of coordinates are related by a diffeomorphism (a differentiable mapping that
possesses a differentiable inverse), then the definition of neighborhoods in terms of
"coordinates that differ only a little" will yield the same limit points and thus the same
topology. However, from the standpoint of a physical theory it's legitimate to ask
whether the set of distinct points (i.e., labels) under our chosen coordinate system
actually corresponds one-to-one with the distinct physical entities whose connectivities
we are trying to infer. For example, we can represent formal fractions x/y for real values
of x and y as points on a Euclidean plane with coordinates (x,y), and conclude that the
topology of formal fractions is R2, but of course the value of every fraction lying along a
single line through the origin is the same, and the values of fractions have the natural
topology of R1 (because the reals are closed under division, aside from divisions by zero).
If the meanings assigned to our labels are arbitrary, then these are simply two different
manifolds with their own topologies, but for a physical theory we may wish to decide
whether the true objects of our study - the objects with ontological status in our theory -
are formal fractions or the values of fractions. When trying to infer the natural physical
topology of the points of spacetime induced by the Minkowski metric we face a similar
problem of identifying the actual physical entities whose mutual connectivities we are
trying to infer, and the problem is complicated by the fact that the "Minkowski metric" is
not really a metric at all (as explained below).
Recall that for many years after general relativity was first proposed by Einstein there
was widespread confusion and misunderstanding among leading scientists (including
Einstein himself) regarding various kinds of singularities. The main source of confusion
was the failure to clearly distinguish between singularities of coordinate systems as
opposed to actual singularities of the manifold/field. This illustrates how we can be
misled by the belief that the local topology of a physical manifold corresponds to the
local topology of any particular system of coordinates that we may assign to that physical
manifold. It's entirely possible for the manifold of coordinates to have a different
topology than the physical manifold to which those coordinates are applied. With this in
mind, it's worthwhile to consider carefully whether the most physically meaningful local
topology of spacetime is necessarily the same as the topology of the usual
four-dimensional systems of coordinates that are conventionally applied to it.
Before examining the possible topologies of Minkowski spacetime in detail, it's
worthwhile to begin with a review of the basic definitions of point set topologies and
topological spaces. Given a set S, let P(S) denote the set of all subsets of S. A topology
for the set S is a mapping T from the Cartesian product S × P(S) to the discrete set
{0,1}. In other words, given any element e of S, and any subset A of S, the mapping
T(A,e) returns either 0 or 1. In the usual language of topology, we say that e is a limit
point of A if and only if T(A,e) = 1.
As an example, we can define a topology on the set of points of 2D Euclidean space
equipped with the usual Pythagorean metric

    d(a,b) = √[(xa − xb)² + (ya − yb)²]                (1)

by saying that the point e is a limit point of any subset A of points of the plane if and only
if for every positive real number ε there is an element u (other than e) of A such that
d(e,u) < ε. Clearly this definition relies on prior knowledge of the "topology" of the real
numbers, which is denoted by R1. The topology of 2D Euclidean space is called R2, since
it is just the Cartesian product R1 × R1.
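
As a small concrete check of this definition (the set A and the test values of ε below are arbitrary illustrative choices):

    import math

    def d(a, b):
        # Pythagorean metric (1) on the Euclidean plane
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # The origin e is a limit point of A = {(1/k, 0)}: every eps-ball
    # around e contains an element of A other than e.
    A = [(1.0 / k, 0.0) for k in range(1, 100001)]
    e = (0.0, 0.0)
    for eps in (1e-1, 1e-3, 1e-4):
        print(eps, any(u != e and d(e, u) < eps for u in A))   # True each time

(Of course a finite list can only illustrate the definition; the quantifier ranges over every positive ε.)
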

The topology of a Euclidean space described above is actually a very special kind of
topology, called a topological space. The distinguishing characteristic of a topological
space S,T is that S contains a collection of subsets, called the open sets (including S itself
and the empty set) which is closed under unions and finite intersections, and such that a
point p is a limit point of a subset A of S if and only if every open set containing p also
contains a point of A distinct from p. For example, if we define the collection of open
spherical regions in Euclidean space, together with any regions that can be formed by the
union or finite intersection of such spherical regions, as our open sets, then we arrive at
the same definition of limit points as given previously. Therefore, the topology we've
described for the points of Euclidean space constitutes a topological space. However, it's
important to realize that not every topology is a topological space.
The basic sets that we used to generate the Euclidean topology were spherical regions
defined in terms of the usual Pythagorean metric, but the same topology would also be
generated by any other metric. In general, a basis for a topological space on the set S is a
collection B of subsets of S whose union comprises all of S and such that if p is in the
intersection of two elements Bi and Bj of B, then there is another element Bk of B which
contains p and which is entirely contained in the intersection of Bi and Bj, as illustrated
below for circular regions on a plane.

Given a basis B on the set S, the unions of elements of B satisfy the conditions for open
sets, and hence serve to define a topological space. (This relies on the fact that we can
represent non-circular regions, such as the intersection of two circular open sets, as the
union of an infinite number of circular regions of arbitrary sizes.)
If we were to substitute the metric

    d(a,b) = |xa − xb| + |ya − yb|
in place of the Pythagorean metric, then the basis sets, defined as loci of points whose
"distances" from a fixed point p are less than some specified real number r, would be
square-shaped diamonds instead of circles, but we would arrive at the same topology, i.e.,
the same definition of limit points for the subsets of the Euclidean plane E2. In general,
any true metric will induce this same local topology on a manifold. Recall that a metric
is defined as a distance function d(a,b) for any two points a,b in the space satisfying the
three axioms
(1) d(a,b) = 0 if and only if a = b

(2) d(a,b) = d(b,a) for each a,b

(3) d(a,c) ≤ d(a,b) + d(b,c) for all a,b,c

It follows that d(a,b) ≥ 0 for all a,b. Any distance function that satisfies the conditions of
a metric will induce the same (local) topology on a set of points, and this will be a
topological space.
However, it's possible to conceive of more general "distance functions" that do not satisfy
all the axioms of a metric. For example, we can define a distance function that is
commutative (axiom 2) and satisfies the triangle inequality (axiom 3), but that allows
d(a,b) = 0 for distinct points a,b. Thus we replace axiom (1) with the weaker requirement
d(a,a) = 0. Such a distance function is called a pseudometric. Obviously if a,b are any
two points with d(a,b) = 0 we must have d(a,c) = d(b,c) for every point c, because
otherwise the points a,b,c would violate the triangle inequality. Thus a pseudometric
partitions the points of the set into equivalence classes, and the distance relations between
these equivalence classes must be metrical. We've already seen a situation in which a
pseudometric arises naturally, if we define the distance between two points in the plane of
formal fractions as the absolute value of the difference in slopes of the lines from the
origin to those two points. The distance between any two points on a single line through
the origin is therefore zero, and these lines represent the equivalence classes induced by
the pseudometric. Of course, the distances between the slopes satisfy the requirements of
a metric. Therefore, the absolute difference of value is a pseudometric for the space of
formal fractions.
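
A sketch of this pseudometric, confirming that distinct formal fractions with the same value are at zero distance, and that zero distance forces equal distances to every third point:

    def d(p, q):
        # pseudometric on formal fractions p = (x1,y1), q = (x2,y2):
        # the absolute difference of their values
        (x1, y1), (x2, y2) = p, q
        return abs(x1 / y1 - x2 / y2)

    a, b, c = (1, 3), (2, 6), (1, 2)
    print(d(a, b))                        # 0.0, though (1,3) and (2,6) are distinct
    print(d(a, c), d(b, c))               # equal, as the triangle inequality requires
    print(d(a, c) <= d(a, b) + d(b, c))   # True
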
Now, we know that the points of a two-dimensional plane can be assigned the R2
topology, and the values of fractions can be assigned the R1 topology, but what kind of
local topology is induced on the two-dimensional space of formal fractions by the
pseudometric? We can use our pseudometric distance function to define a basis, just as
with a metrical distance function, and arrive at a topological space, but this space will not
generally possess all the separation properties that we commonly expect for distinct
points of a topological space.
It's convenient to classify the separation properties of topological spaces according to the
"trennungsaxioms", also called the Ti axioms, introduced by Alexandroff and Hopf.
These represent a sequence of progressively stronger separation axioms to be met by the
points of a topological space. A space is said to be T0 if for any two distinct points at
least one of them is in a neighborhood that does not include the other. If each point is
contained in a neighborhood that does not include the other, then the space is called T1. If
the space satisfies the even stronger condition that any two points are contained in
disjoint open sets, then the space is called T2, also known as a Hausdorff space. There are
still more stringent separation axioms that can be applied, corresponding to T3 (regular),
T4 (normal), and so on.
Many topologists will not even consider a topological space which is not at least T2 (and
some aren't interested in anything which is not at least T4), and yet it's clear that the
topology of the space of formal fractions induced by the pseudometric of absolute values

is not even T0, because two distinct fractions with the same value (such as 1/3 and 2/6)
cannot be separated into different neighborhoods by the pseudometric. Nevertheless, we
can still define the limit points of the set of formal fractions based on the pseudometric
distance function, thereby establishing a perfectly valid topology. This just illustrates that
the distinct points of a topology need not exhibit all the separation properties that we
usually associate with distinct points of a Hausdorff space (for example).
Now let's consider 1+1 dimensional Minkowski spacetime, which is physically
characterized by an invariant spacetime interval whose magnitude is

    d(a,b) = |(ta − tb)² − (xa − xb)²|                (2)

Empirically this appears to be the correct measure of absolute separation between the
points of spacetime, i.e., it corresponds to what clocks measure along timelike intervals
and what rulers measure along spacelike intervals. However, this distance function
clearly does not satisfy the definition of a metric, because it can equal zero for distinct
points. Moreover, it is not even a pseudo-metric, because the interval between points a
and b can be greater than the sum of the intervals from a to c and from c to b,
contradicting the triangle inequality. For example, it's quite possible in Minkowski
spacetime to have two sides of a "triangle" equal to zero while the remaining side is
billions of light years in length. Thus, the absolute interval of space-time does not
provide a metrical measure of distance in the strict sense. Nevertheless, in other ways the
magnitude of the interval d(a,b) is quite analogous to a metrical distance, so it's
customary to refer to it loosely as a "metric", even though it is neither a true metric nor
even a pseudometric. We emphasize this fact to remind ourselves not to prejudge the
topology induced by this distance function on the points of Minkowski spacetime, and
not to assume that distinct events possess the separation properties or connectivities of a
topological space.
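
The failure of the triangle inequality is easy to exhibit numerically with the distance function (2), in units where c = 1:

    def interval(a, b):
        # absolute interval |(ta - tb)^2 - (xa - xb)^2| of equation (2)
        (ta, xa), (tb, xb) = a, b
        return abs((ta - tb) ** 2 - (xa - xb) ** 2)

    N = 5.0e9   # billions of light years
    A, C, B = (0.0, 0.0), (N, N), (2 * N, 0.0)   # (t, x) coordinates

    print(interval(A, C))   # 0.0: a null side
    print(interval(C, B))   # 0.0: another null side
    print(interval(A, B))   # 1e+20: vastly exceeds the sum of the other two sides
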
The ε-neighborhood of a point p in the Euclidean plane based on the Pythagorean metric
(1) consists of the points q such that d(p,q) < ε. Thus the ε-neighborhoods of two points
in the plane are circular regions centered on the respective points, as shown in the
left-hand illustration below. In contrast, the ε-neighborhoods of two points in Minkowski
spacetime induced by the Lorentz-invariant distance function (2) are the regions bounded
by the hyperbolic envelope containing the light lines emanating from those points, as
shown in the right-hand illustration below.

This illustrates the important fact that the concept of "nearness" implied by the
Minkowski metric is non-transitive. In a metric (or even a pseudometric) space, the
triangle inequality ensures that if A and B are close together, and B and C are close
together, then A and C cannot be very far apart. This transitivity obviously doesn't apply
to the absolute magnitudes of the spacetime intervals between events, because it's
possible for A and B to be null-separated, and for B and C to be null separated, while A
and C are arbitrarily far apart.
Interestingly, it is often suggested that the usual Euclidean topology of spacetime might
break down on some sufficiently small scale, such as over distances on the order of the
Planck length of roughly 10⁻³⁵ meters, but the system of reference for evaluating that
scale is usually not specified. As noted previously, the spatial and temporal components
of two null-separated events can both simultaneously be regarded as arbitrarily large or
arbitrarily small (including less than 10⁻³⁵ meters), depending on which system of inertial
coordinates we choose. This null-separation condition permeates the whole of spacetime
(recall Section 1.9 on Null Coordinates), so if we take seriously the possibility of
non-Euclidean topology on the Planck scale, we can hardly avoid considering the possibility
that the effective physical topology ("connectedness") of the points of spacetime may be
non-Euclidean along null intervals in their entirety, which span all scales of spacetime.
It's certainly true that the topology induced by a direct application of the Minkowski
distance function (2) is not even a topological space, let alone Euclidean. To generate
this topology, we simply say that the point e is a limit point of any subset A of points of
Minkowski spacetime if and only if for every positive real number ε there is an element u
(other than e) of A such that d(e,u) < ε. This is a perfectly valid topology, and arguably
the one most consistent with the non-transitive absolute intervals that seem to physically
characterize spacetime, but it is not a topological space. To see this, recall that in order
for a topology to be a topological space it must be possible to express the limit point
mapping in terms of open sets such that a point e is a limit point of a subset A of S if and
only if every open set containing e also contains a point of A distinct from e. If we define

our topological neighborhoods in terms of the Minkowski absolute intervals, our open
sets would naturally include complete Minkowski neighborhoods, but these regions don't
satisfy the condition for a topological space, as illustrated below, where e is a limit point
of A, but e is also contained in Minkowski neighborhoods containing no point of A.
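
The configuration in the figure can be verified directly. Below, e is the origin and A is a set of points strung out along the light line x = t, so e is a limit point of A; yet the ε-neighborhood of the null-separated point q = (−1, 1) contains e while containing no point of A (the particular points are chosen for illustration):

    def d(a, b):
        (ta, xa), (tb, xb) = a, b
        return abs((ta - tb) ** 2 - (xa - xb) ** 2)

    e = (0.0, 0.0)
    A = [(float(k), float(k)) for k in range(1, 1000)]   # along the light line x = t

    # e is a limit point of A: d(e, u) = 0 < eps for every u in A.
    print(all(d(e, u) == 0.0 for u in A))                # True

    # Yet the eps-neighborhood of q contains e and no point of A:
    q, eps = (-1.0, 1.0), 1.0
    print(d(q, e) < eps)                                 # True: e is inside
    print(any(d(q, u) < eps for u in A))                 # False: A is excluded
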

The idea of a truly Minkowskian topology seems unsatisfactory to many people, because
they worry that it implies every two events are mutually "co-local" (i.e., their local
neighborhoods intersect), and so the entire concept of "locality" becomes meaningless.
However, the fact that a set of points possesses a non-positive-definite line element does
not imply that the set degenerates into a featureless point (which is fortunate, considering
that the spacetime we inhabit is characterized by just such a line element). It simply
implies that we need to apply a more subtle understanding of the concept of locality,
taking account of its non-transitive aspect. In fact, the overlapping of topological
neighborhoods in spacetime suggests a very plausible approach to explaining the "nonlocal" quantum correlations that seem so mysterious when viewed from the viewpoint of
Euclidean topology. We'll consider this in more detail in subsequent chapters.
It is, of course, possible to assign the Euclidean topology to Minkowski spacetime, but
only by ignoring the non-transitive null structure implied by the Lorentz-invariant
distance function. To do this, we can simply take as our basis sets all the finite
intersections of Minkowski neighborhoods. Since the contents of an ε-neighborhood of a
given point are invariant under Lorentz transformations, it follows that the contents of the
intersection of the ε-neighborhoods of two given points are also invariant. Thus we can
define each basis set by specifying a finite collection of events with a specific value of ε
for each one, and the resulting set of points is invariant under Lorentz transformations.
This is a more satisfactory approach than defining neighborhoods as the set of points
whose coordinates (with respect to some arbitrary system of coordinates) differ only a
little, but the fact remains that by adopting this approach we are still tacitly abandoning
the Lorentz-invariant sense of nearness and connectedness, because we are segregating

null-separated events into disjoint open sets. This is analogous to saying, for the plane of
formal fractions, that 4/6 is not a limit point of every set containing 2/3, which is
certainly true on the formal level, but it ignores the natural topology possessed by the
values of fractions. In formulating a physical theory of fractions we would need to
decide at some point whether the observable physical phenomena actually correspond to
pairings of numerators and denominators, or to the values of fractions, and then select the
appropriate topology. In the case of a spacetime theory, we need to consider whether the
temporal and spatial components of intervals have absolute significance, or whether it is
only the absolute intervals themselves that are significant.
It's worth reviewing why we ever developed the Euclidean notion of locality in the first
place, and why it's so deeply engrained in our thought processes, when the spacetime
which we inhabit actually possesses a Minkowskian structure. This is easily attributed to
the fact that our conscious experience is almost exclusively focused on the behavior of
macro-objects whose overall world-lines are nearly parallel relative to the characteristic
of the metric. In other words, we're used to dealing with objects whose mutual velocities
are small relative to c, and for such objects the structure of spacetime does approach very
near to being Euclidean. On the scales of space and time relevant to macro human
experience the trajectories of incoming and outgoing light rays through any given point
are virtually indistinguishable, so it isn't surprising that our intuition reflects a Euclidean
topology. (Compare this with the discussion of Postulates and Principles in Chapter 3.1.)
Another important consequence of the non-positive-definite character of Minkowski
spacetime concerns the qualitative nature of geodesic paths. In a genuine metric space
the geodesics are typically the shortest paths from place to place, but in Minkowski
spacetime the timelike geodesics are the longest paths, in terms of the absolute value of
the invariant intervals. Of course, if we allow curvature, there may be multiple distinct
"maximal" paths between two given events. For example, if we shoot a rocket straight up
(with less than escape velocity), and it passes an orbiting satellite on the way up, and
passes the same satellite again on the way back down, then each of them has followed a
geodesic path between their meetings, but they have followed very different paths.
From one perspective, it's not surprising that the longest paths in spacetime correspond to
physically interesting phenomena, because the shortest path between any two points in
Minkowski spacetime is identically zero. Hence the structure of events was bound to
involve the longest paths. However, it seems rash to conclude that the shortest paths play
no significant role in physical phenomena. The shortest absolute timelike path between
two events follows a "dog leg" path, staying as close as possible to the null cones
emanating from the two events. Every two points in spacetime are connected by a
contiguous set of lightlike intervals whose absolute magnitudes are zero.
Minkowski spacetime provides an opportunity to reconsider the famous "limit paradox"
from freshman calculus in a new context. Recall the standard paradox begins with a
two-part path in the xy plane from point A to point C by way of point B as shown below:

If the real segment AC has length 1, then the dog-leg path ABC has length √2, as does
each of the zig-zag paths ADEFC, AghiEjklC, and so on. As we continue to subdivide
the path into more and smaller zigzags the envelope of the path converges on the straight
line from A to C. The "paradox" is that the limiting zigzag path still has length √2,
whereas the line to which it converges (and from which we might suppose it is
indistinguishable) has length 1. Needless to say, this is not a true paradox, because the limit of a
set of convergents does not necessarily possess all the properties of the convergents.
However, from a physical standpoint it teaches a valuable lesson, which is that we can't
necessarily assess the length of a path by assuming it equals the length of some curve
from which it never differs by any measurable amount.
To place this in the context of Minkowski spacetime, we can simply replace the y axis
with the time axis, and replace the Euclidean metric with the Minkowski pseudo-metric.
We can still assume the length of the interval AC is 1, but now each of the diagonal
segments is a null interval, so the total path length along any of the zigzag paths is
identically zero. In the limit, with an infinite number of infinitely small zigzags, the
jagged "null path" is everywhere practically coincident with the timelike geodesic path
AC, and yet its total length remains zero. Of course, the oscillating acceleration required
to propel a massive particle on a path approaching these light-like segments would be
enormous, as would the frequency of oscillation.
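
A sketch of this computation, evaluating the absolute "length" of zigzag approximations (2n light-like segments, in units with c = 1) to the unit timelike segment from (0,0) to (1,0):

    import math

    def proper_length(path):
        # sum of sqrt(|dt^2 - dx^2|) over the segments of a piecewise path
        return sum(math.sqrt(abs((t2 - t1) ** 2 + (x1 - x2) * (x1 + x2 - 2 * 0) * 0 - (x2 - x1) ** 2))
                   for (t1, x1), (t2, x2) in zip(path, path[1:]))

    for n in (1, 10, 100, 1000):
        # zigzag between x = 0 and x = 1/(2n) along null lines (slope +/-1)
        path = [(i / (2 * n), (0.0 if i % 2 == 0 else 1 / (2 * n)))
                for i in range(2 * n + 1)]
        dev = max(x for _, x in path)
        print(f"n = {n:4d}: max deviation = {dev:.0e}, "
              f"zigzag length = {proper_length(path):.1f} (straight path: 1.0)")

The spatial deviation of the zigzag from the geodesic shrinks as 1/(2n), yet its total absolute length is identically zero for every n.
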
9.2 Up To Diffeomorphism
The mind of man is more intuitive than logical, and comprehends more
than it can coordinate.
Vauvenargues,
1746
Einstein seems to have been strongly wedded to the concept of the continuum described
by partial differential equations as the only satisfactory framework for physics. He was
certainly not the first to hold this view. For example, in 1860 Riemann wrote
certainly not the first to hold this view. For example, in 1860 Riemann wrote
As is well known, physics became a science only after the invention of
differential calculus. It was only after realizing that natural phenomena are
continuous that attempts to construct abstract models were successful... In the
first period, only certain abstract cases were treated: the mass of a body was
considered to be concentrated at its center, the planets were mathematical
points... so the passage from the infinitely near to the finite was made only in one
variable, the time [i.e., by means of total differential equations]. In general,
however, this passage has to be done in several variables... Such passages lead to
partial differential equations... In all physical theories, partial differential
equations constitute the only verifiable basis. These facts, established by
induction, must also hold a priori. True basic laws can only hold in the small and
must be formulated as partial differential equations.
Compare this with Einstein's comments (see Section 3.2) over 70 years later about the
unsatisfactory dualism inherent in Lorentz's theory, which expressed the laws of motion
of particles in the form of total differential equations while describing the
electromagnetic field by means of partial differential equations. Interestingly, Riemann
asserted that the continuous nature of physical phenomena was established by
induction, but immediately went on to say it must also hold a priori, referring somewhat
obscurely to the idea that true basic laws can only hold in the infinitely small. He may
have been trying to convey by these words his rejection of action at a distance. Einstein
attributed this insight to the special theory of relativity, but of course the Newtonian
concept of instantaneous action at a distance had always been viewed skeptically, so it
isn't surprising that Riemann in 1860, like his contemporary Maxwell, adopted the
impossibility of distant action as a fundamental principle. (It's interesting to consider
whether Einstein might have taken this, rather than the invariance of light speed, as one
of the founding principles of special relativity, since it immediately leads to the
impossibility of rigid bodies, etc.) In his autobiographical notes (1949) Einstein wrote
There is no such thing as simultaneity of distant events; consequently, there is also
no such thing as immediate action at a distance in the sense of Newtonian
mechanics. Although the introduction of actions at a distance, which propagate at
the speed of light, remains feasible according to this theory, it appears unnatural;
for in such a theory there could be no reasonable expression for the principle of
conservation of energy. It therefore appears unavoidable that physical reality
must be described in terms of continuous functions in space.
It's worth noting that while Riemann and Maxwell had expressed their objections in
terms of action at a (spatial) distance, Einstein can justly claim that special relativity
revealed that the actual concept to be rejected was instantaneous action at a distance. He
acknowledged that distant action propagating at the speed of light (which is to say,
action over null intervals) remains feasible. In fact, one could argue that such distant
action was made more feasible by special relativity, especially in the context of

Minkowski's spacetime, in which the null (light-like) intervals have zero absolute
magnitude. For any two light-like separated events there exist perfectly valid systems of
inertial coordinates in terms of which both the spatial and the temporal measures of
distance are arbitrarily small. It doesn't seem to have troubled Einstein (nor many later
scientists) that the existence of non-trivial null intervals potentially undermines the
identification of the topology of pseudo-metrical spacetime with that of a true metric
space. Thus Einstein could still write that the coordinates of general relativity express the
"neighborliness" of events whose coordinates "differ but little" from each other. As
argued in Section 9.1, the assumption that the physically most meaningful topology of a
pseudo-metric space is the same as the topology of continuous coordinates assigned to
that space, even though there are singularities in the invariant measures based on those
coordinates, is questionable. Given Einstein's aversion to singularities of any kind,
including even the coordinate singularity at the Schwarzschild radius, it's somewhat
ironic that he never seems to have worried about the coordinate singularity of every
lightlike interval and the non-transitive nature of null separation in ordinary Minkowski
spacetime.
Apparently unconcerned about the topological implications of Minkowski spacetime,
Einstein inferred from the special theory that physical reality must be described in terms
of continuous functions in space. Of course, years earlier he had already considered
some of the possible objections to this point of view. In his 1936 essay on "Physics and
Reality" he considered the already terrifying prospect of quantum field theory, i.e., the
application of the method of quantum mechanics to continuous fields with infinitely
many degrees of freedom, and he wrote
To be sure, it has been pointed out that the introduction of a space-time continuum
may be considered as contrary to nature in view of the molecular structure of
everything which happens on a small scale. It is maintained that perhaps the
success of the Heisenberg method points to a purely algebraical method of
description of nature, that is to the elimination of continuous functions from
physics. Then, however, we must also give up, on principle, the space-time
continuum. It is not unimaginable that human ingenuity will some day find
methods which will make it possible to proceed along such a path. At the present
time, however, such a program looks like an attempt to breathe in empty space.
In his later search for something beyond general relativity that would encompass
quantum phenomena, he maintained that the theory must be invariant under a group that
at least contains all continuous transformations (represented by the symmetric tensor), but
he hoped to enlarge this group.
It would be most beautiful if one were to succeed in expanding the group once
more in analogy to the step that led from special relativity to general relativity.
More specifically, I have attempted to draw upon the group of complex
transformations of the coordinates. All such endeavours were unsuccessful. I also
gave up an open or concealed increase in the number of dimensions, an endeavor
that even today has its adherents.

The reference to complex transformations is an interesting forerunner of more recent
efforts, notably Penrose's twistor program, to exploit the properties of complex functions
(cf Section 9.9). The comment about increasing the number of dimensions certainly has
relevance to current string theory research. Of course, as Einstein observed in an
appendix to his Princeton lectures, "In this case one must explain why the continuum is
apparently restricted to four dimensions." He also mentioned the possibility of field
equations of higher order, but he thought that such ideas should be pursued only if there
exist empirical reasons to do so. On this basis he concluded
We shall limit ourselves to the four-dimensional space and to the group of
continuous real transformations of the coordinates.
He went on to describe what he (then) considered to be the logically most satisfying
idea (involving a non-symmetric tensor), but added a footnote that revealed his lack of
conviction, saying he thought the theory had a fair probability of being valid if the way
to an exhaustive description of physical reality on the basis of the continuum turns out to
be at all feasible. A few years later he told Abraham Pais that he was not sure
differential geometry was to be the framework for further progress, and later still, in
1954, just a year before his death, he wrote to his old friend Besso (quoted in Section 3.8)
that he considered it quite possible that physics cannot be based on continuous structures.
The dilemma was summed up at the conclusion of his Princeton lectures, where he said
One can give good reasons why reality cannot at all be represented by a
continuous field. From the quantum phenomena it appears to follow with certainty
that a finite system of finite energy can be completely described by a finite set of
numbers but this does not seem to be in accordance with a continuum theory,
and must lead to an attempt to find a purely algebraic theory for the description of
reality. But nobody knows how to obtain the basis of such a theory.
The area of current research involving spin networks might be regarded as attempts to
obtain an algebraic basis for a theory of space and time, but so far these efforts have not
achieved much success. The current field of string theory has some algebraic aspects,
but it seems to entail much the same kind of dualism that Einstein found so objectionable
in Lorentz's theory. Of course, most modern research into fundamental physics is based
on quantum field theory, about which Einstein was never enthusiastic to put it mildly.
(Bargmann told Pais that Einstein once asked him for a private survey of quantum field
theory, beginning with second quantization. Bargmann did so for about a month.
Thereafter Einstein's interest waned.)
Of all the various directions that Einstein and others have explored, one of the most
intriguing (at least from the standpoint of relativity theory) was the idea of expanding
the group once more in analogy to the step that led from special relativity to general
relativity. However, there are many different ways in which this might conceivably be
done. Einstein referred to allowing complex transformations, or non-symmetric, or
increasing the number of dimensions, etc., but all these retain the continuum hypothesis.

He doesn't seem to have seriously considered relaxing this assumption, and allowing
completely arbitrary transformations (unless this is what he had in mind when he referred
to an algebraic theory). Ironically, in his expositions of general relativity he often
proudly explained that it gave an expression of physical laws valid for completely
arbitrary transformations of the coordinates, but of course he meant arbitrary only up to
diffeomorphism, which in the absolute sense is not very arbitrary at all.
We mentioned in the previous section that diffeomorphically equivalent sets can be
assigned the same topology, but from the standpoint of a physical theory it isn't
self-evident which diffeomorphism is the right one (assuming there is one) for a particular set
of physical entities, such as the events of spacetime. Suppose we're able to establish a
1-to-1 correspondence between certain physical events and the sets of four real-valued
numbers (x0,x1,x2,x3). (As always, the superscripts are indices, not exponents.) This is
already a very strong supposition, because the real numbers are uncountable, even over a
finite range, so we are supposing that physical events are also uncountable. However,
I've intentionally not characterized these physical events as points in a certain contiguous
region of a smooth continuous manifold, because the ability to place those events in a
one-to-one correspondence with the coordinate sets does not, by itself, imply any
particular arrangement of those events. (We use the word arrangement here to signify the
notions of order and nearness associated with a specific topology.) In particular, it
doesn't imply an arrangement similar to that of the coordinate sets interpreted as points in
the four-dimensional space denoted by R4.
To illustrate why the ability to map events with real coordinates does not, by itself, imply
a particular arrangement of those events, consider the coordinates of a single event,
normalized to the range 0-1, and expressed in the form of their decimal representations,
where xmn denotes the nth most significant digit of the mth coordinate, as shown below
    x0 = 0. x01 x02 x03 x04   x05 x06 x07 x08 ...
    x1 = 0. x11 x12 x13 x14   x15 x16 x17 x18 ...
    x2 = 0. x21 x22 x23 x24   x25 x26 x27 x28 ...
    x3 = 0. x31 x32 x33 x34   x35 x36 x37 x38 ...

We could, as an example, assign each such set of coordinates to a point in an ordinary
four-dimensional space with the coordinates (y0,y1,y2,y3) given by the diagonal sets of
digits from the corresponding x coordinates, taken in blocks of four, as shown below

    y0 = 0. x01 x12 x23 x34   x05 x16 x27 x38 ...
    y1 = 0. x02 x13 x24 x31   x06 x17 x28 x35 ...
    y2 = 0. x03 x14 x21 x32   x07 x18 x25 x36 ...
    y3 = 0. x04 x11 x22 x33   x08 x15 x26 x37 ...

We could also transpose each consecutive pair of blocks, or scramble the digits in any
number of other ways, provided only that we ensure a 1-to-1 mapping. We could even
imagine that the y space has (say) eight dimensions instead of four, and we could
construct those eight coordinates from the odd and even numbered digits of the four x
coordinates. It's easy to imagine numerous 1-to-1 mappings between a set of abstract
events and sets of coordinates such that the actual arrangement of the events (if indeed
they possess one) bears no direct resemblance to the arrangement of the coordinate sets in
their natural space.
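
For a concrete (and deliberately simplified) instance of such a digit-scrambling bijection, the sketch below interleaves the decimal digits of two coordinates into one, rather than scrambling four coordinates in blocks as above; it is one-to-one on the truncated expansions but preserves no notion of nearness:

    def interleave(x, y, digits=12):
        # map (x, y) in [0,1)^2 to a single real in [0,1) by interleaving
        # decimal digits (truncated to a fixed precision for illustration)
        sx = f"{x:.{digits}f}"[2:]
        sy = f"{y:.{digits}f}"[2:]
        return float("0." + "".join(a + b for a, b in zip(sx, sy)))

    print(interleave(0.499999, 0.1))   # 0.419090909...
    print(interleave(0.500000, 0.1))   # 0.51

    # The inputs differ by one part in a million, but the images differ in
    # their first digit: no continuous 1-to-1 map from the plane to the
    # line exists, so any such bijection must shred neighborhoods.

The two input points are nearly coincident in the plane, yet their images are far apart on the line; a topology read off from the image labels would bear no resemblance to one read off from the original coordinates.
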
So, returning to our task, we've assigned coordinates to a set of events, and we now wish
to assert some relationship between those events that remains invariant under a particular
kind of transformation of the coordinates. Specifically, we limit ourselves to coordinate
mappings that can be reached from our original x mapping by means of a smooth
transformation applied on the natural space of x. In other words, we wish to consider
transformations from x to X given by a set of four continuous functions f i with
continuous partial first derivatives. Thus we have
    X0 = f 0 (x0 , x1 , x2 , x3)
    X1 = f 1 (x0 , x1 , x2 , x3)
    X2 = f 2 (x0 , x1 , x2 , x3)
    X3 = f 3 (x0 , x1 , x2 , x3)

Further, we require this transformation to possess a differentiable inverse, i.e., there exist
differentiable functions Fi such that
    x0 = F0 (X0 , X1 , X2 , X3)
    x1 = F1 (X0 , X1 , X2 , X3)
    x2 = F2 (X0 , X1 , X2 , X3)
    x3 = F3 (X0 , X1 , X2 , X3)
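
For instance (a minimal sketch with two coordinates instead of four, and arbitrarily chosen smooth functions), the pair f, F below satisfy these requirements, each undoing the other:

    import math

    def f(x0, x1):
        # a smooth shear-like distortion of the plane
        return x0 + 0.5 * math.tanh(x1), x1

    def F(X0, X1):
        # the differentiable inverse of f
        return X0 - 0.5 * math.tanh(X1), X1

    for p in [(0.3, -1.2), (2.0, 0.7)]:
        q = f(*p)
        print(p, "->", q, "->", F(*q))   # the round trip recovers p exactly
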

A mapping of this kind is called a diffeomorphism, and two sets are said to be equivalent
up to diffeomorphism if there is such a mapping from one to the other. Any physical
theory, such as general relativity, formulated in terms of tensor fields in spacetime
automatically possesses the freedom to choose the coordinate system from among a
complete class of diffeomorphically equivalent systems. From one point of view this can
be seen as a tremendous generality and freedom from dependence on arbitrary coordinate
systems. However, as noted above, there are infinitely many systems of coordinates that
are not diffeomorphically equivalent, so the limitation to equivalent systems up to
diffeomorphism can also be seen as quite restrictive.
For example, no such functions can possibly reproduce the digit-scrambling
transformations discussed previously, such as the mapping from x to y, because those
mappings are everywhere discontinuous. Thus we cannot get from x coordinates to y
coordinates (or vice versa) by means of continuous transformations. By restricting
ourselves to differentiable transformations we're implicitly focusing our attention on one
particular equivalence class of coordinate systems, with no a priori guarantee that this
class of systems includes the most natural parameterization of physical events. In fact,
we don't even know if physical events possess a natural parameterization, or if they do,
whether it is unique.

Recall that the special theory of relativity assumes the existence and identifiability of a
preferred equivalence class of coordinate systems called the inertial systems. The laws of
physics, according to special relativity, should be the same when expressed with respect
to any inertial system of coordinates, but not necessarily with respect to non-inertial
systems of reference. It was dissatisfaction with having given a preferred role to a
particular class of coordinate systems that led Einstein to generalize the "gauge freedom"
of general relativity, by formulating physical laws in pure tensor form (general
covariance) so that they apply to any system of coordinates from a much larger
equivalence class, namely, those that are equivalent to an inertial coordinate system up to
diffeomorphism. This includes accelerated coordinate systems (over suitably restricted
regions) that are outside the class of inertial systems. Impressive though this
achievement is, we should not forget that general relativity is still restricted to a preferred
class of coordinate systems, which comprise only an infinitesimal fraction of all
conceivable mappings of physical events, because it still excludes non-diffeomorphic
transformations.
It's interesting to consider how we arrive at (and agree upon) our preferred equivalence
class of coordinate systems. Even from the standpoint of special relativity the
identification of an inertial coordinate system is far from trivial (even though it's often
taken for granted). When we proceed to the general theory we have a great deal more
freedom, but we're still confined to a single topology, a single pattern of coherence. How
is this coherence apprehended by our senses? Is it conceivable that a different set of
senses might have led us to apprehend a different coherent structure in the physical
world? More to the point, would it be possible to formulate physical laws in such a way
that they remain applicable under completely arbitrary transformations?
9.3 Higher-Order Metrics
A similar path to the same goal could also be taken in those manifolds in
which the line element is expressed in a less simple way, e.g., by a fourth
root of a differential expression of the fourth degree
Riemann, 1854
Given three points A,B,C, let dx1 denote the distance between A and B, and let dx2 denote
the distance between B and C. Can we express the distance ds between A and C in terms
of dx1 and dx2? Since dx1, dx2, and ds all represent distances with commensurate units, it's
clear that any formula relating them must be homogeneous in these quantities, i.e., they
must appear to the same power. One possibility is to assume that ds is a linear
combination of dx1 and dx2 as follows

    ds = g1 dx1 + g2 dx2                (1)
where g1 and g2 are constants. In a simple one-dimensional manifold this would indeed be
the correct formula for ds, with |g1| = |g2| = 1, except for the fact that it might give a

negative sign for ds, contrary to the idea of an interval as a positive magnitude. To ensure
the correct sign for ds, we might take the absolute value of the right hand side, which
suggests that the fundamental equality actually involves the squares of the two sides of
the above equation, i.e., the quantities ds, dx1, dx2 satisfy the relation

    (ds)² = g11 (dx1)² + 2g12 dx1 dx2 + g22 (dx2)²                (2)

where we have put gij = gi gj. Thus we have g11g22 − (g12)² = 0, which is the condition for
factorability of the expanded form as the square of a linear expression. This will be the
case in a one-dimensional manifold, but in more general circumstances we find that the
values of the gij in the expanded form of (2) are such that the expression is not factorable
into linear terms with real coefficients. In this way we arrive at the second-order metric
form, which is the basis of Riemannian geometry.
form, which is the basis of Riemannian geometry.
Of course, by allowing the second-order coefficients gij to be arbitrary, we make it
possible for (ds)² to be negative, analogous to the fact that ds in equation (1) could be
negative, which is what prompted us to square both sides of (1), leading to equation (2).
Now that (ds)² can be negative, we're naturally led to consider the possibility that the
fundamental relation is actually the equality of the squares of both sides of (2). This
gives

    (ds)⁴ = Σ gijkl dxi dxj dxk dxl

where the sum is evaluated for each of the indices i,j,k,l ranging from 1 to n, where n is the
dimension of the manifold. Once again, having arrived at this form, we immediately
dispense with the assumption of factorability, and allow general fourth-order metrics.
These are non-Riemannian metrics, although Riemann actually alluded to the possibility
of fourth and higher order metrics in his famous inaugural dissertation. He noted that
The line element in this more general case would not be reducible to the square
root of a quadratic sum of differential expressions, and therefore in the expression
for the square of the line element the deviation from flatness would be an
infinitely small quantity of degree two, whereas for the former manifolds [i.e.,
those whose squared line elements are sums of squares] it was an infinitely small
quantity of degree four. This peculiarity [i.e., this quantity of the second degree] in
the latter manifolds therefore might well be called the planeness in the smallest
parts
It's clear even from his brief comments that he had given this possibility considerable
thought, but he never published any extensive work on it. Finsler wrote a dissertation on
this subject in 1918, so such metrics are now often called Finsler metrics.

To visualize the effect of higher order metrics, recall that for a second-order metric the
locus of points at a fixed distance ds from the origin must be a conic, i.e., an ellipse,
hyperbola, or parabola. In contrast, a fourth-order metric allows more complicated loci of
equi-distant points. When applied in the context of Minkowskian metrics, these
higher-order forms raise some intriguing possibilities. For example, instead of a spacetime
structure with a single light-like characteristic c, we could imagine a structure with two
null characteristics, c1 and c2. Letting x and t denote the spacelike and timelike
coordinates respectively, this means (ds/dt)⁴ vanishes for two values (up to sign) of dx/dt.
Thus there are four roots given by ±c1 and ±c2, and we have

    (ds)⁴ = (dx − c1 dt)(dx + c1 dt)(dx − c2 dt)(dx + c2 dt)

The resulting metric is

    (ds)⁴ = [(dx)² − c1²(dt)²][(dx)² − c2²(dt)²]                (3)
The physical significance of this "metric" naturally depends on the physical meaning of
the coordinates x and t. In Minkowski spacetime these represent what physical rulers and
clocks measure, and we can translate these coordinates from one inertial system to
another according to the Lorentz transformations while always preserving the form of the
Minkowski metric with a fixed numerical value of c. The coordinates x and t are defined
in such a way that c remains invariant, and this definition happily coincides with the
physical measures of rulers and clocks. However, with two distinct light-like
"eigenvalues", it's no longer possible for a single family of spacetime decompositions to
preserve the values of both c1 and c2. Consequently, the metric will take the form of (3)
only with respect to one particular system of xt coordinates. In any other frame of
reference at least one of c1 and c2 must be different.
Suppose that with respect to a particular inertial system of coordinates x,t the spacetime
metric is given by (3) with c1 = 1 and c2 = 2. We might also suppose that c1 corresponds to
the null surfaces of electromagnetic wave propagation, just as in Minkowski spacetime.
Now, with respect to any other system of coordinates x',t' moving with speed v relative to
the x,t coordinates, we can decompose the absolute intervals into space and time
components such that c1 = 1, but then the values of the other lightlines (corresponding to
c2') must be (v + c2)/(1 + v c2) and (v - c2)/(1 - v c2). Consequently, for states of motion
far from the one in which the metric takes the special form (3), the metric will become
progressively more asymmetrical. This is illustrated in the figure below, which shows
contours of constant magnitude of the squared interval.

Clearly this metric does not correspond to the observed spacetime structure, even in the
symmetrical case with v = 0, because it is not Lorentz-invariant. As an alternative to this
structure containing "super-light" null surfaces we might consider metrics with some
finite number of "sub-light" null surfaces, but the failure to exhibit even approximate
Lorentz-invariance would remain.
However, it is possible to construct infinite-order metrics with infinitely many super-light
and/or sub-light null surfaces, and in so doing recover a structure that in many respects is
virtually identical to Minkowski spacetime, except for a set (of spacetime trajectories) of
measure zero. This can be done by generalizing (3) to include infinitely many discrete
factors
    (ds)^2 = lim n→∞ [ ((dx)^2 - c1^2 (dt)^2) ((dx)^2 - c2^2 (dt)^2) ... ((dx)^2 - cn^2 (dt)^2) ]^(1/n)        (4)
where the values of ci represent an infinite family of sub-light parameters given by

A plot showing how this spacetime structure develops as n increases is shown below.

This illustrates how, as the number of sub-light cones goes to infinity, the structure of the
manifold goes over to the usual Minkowski pseudometric, except for the discrete null
sub-light surfaces which are distributed throughout the interior of the future and past light
cones, and which accumulate on the light cones. The sub-light null surfaces become so
thin that they no longer show up on these contour plots for large n, but they remain
present to all orders. In the limit as n approaches infinity they become discrete null
trajectories embedded in what amounts to ordinary Minkowski spacetime. To see this,
notice that if none of the factors on the right hand side of (4) is exactly zero we can take
the natural log of both sides to give
    ln (ds)^2 = lim n→∞ (1/n) Σ i=1..n ln[ (dx)^2 - ci^2 (dt)^2 ]
Thus the natural log of (ds)^2 is the asymptotic average of the natural logs of the quantities
(dx)^2 - ci^2 (dt)^2. Since the values of ci accumulate on 1, it's clear that this converges on the
usual Minkowski metric (provided we are not precisely on any of the discrete sub-light
null surfaces).
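To make the convergence concrete, here is a minimal numerical sketch in Python. The particular family ci = tanh(i/2) is purely an illustrative assumption (any family of sub-light speeds accumulating on 1 behaves the same way); the function computes the geometric mean of the n factors through their logarithms, exactly as in the averaging argument above.

    import math

    def ds2(dx, dt, n):
        # geometric mean of the n factors (dx^2 - ci^2 dt^2),
        # computed via the logs of their absolute values
        log_sum = 0.0
        for i in range(1, n + 1):
            ci = math.tanh(i / 2)   # illustrative sub-light family, accumulating on 1
            log_sum += math.log(abs(dx**2 - ci**2 * dt**2))
        return math.exp(log_sum / n)

    dx, dt = 0.3, 1.0               # a timelike interval (|dx/dt| < 1)
    for n in (10, 100, 1000, 10000):
        print(n, ds2(dx, dt, n))
    print("Minkowski value |dx^2 - dt^2| =", abs(dx**2 - dt**2))

As n increases the printed values approach 0.91, the ordinary Minkowski magnitude of this interval, illustrating how the sub-light null surfaces shrink to a set of measure zero.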
The preceding metric was based purely on sub-light null surfaces. We could also include
n super-light null surfaces along with the n sub-light null surfaces, yielding an asymptotic
family of metrics which, again, goes over to the usual Minkowski metric as n goes to
infinity (except for the discrete null surface structure). This metric is given by the formula

where the values of ci are generated as before. The results for various values of n are
illustrated in the figure below.

Notice that the quasi Lorentz-invariance of this metric has a subtle periodicity, because
any one of the sub-light null surfaces can be aligned with the time axis by a suitable
choice of velocity, or the time axis can be placed "in between" two null surfaces. In a 1+1
dimensional spacetime the structure is perfectly symmetrical modulo this cycle from one
null surface to the next. In other words, the set of exactly equivalent reference systems
corresponds to a cycle whose period is the increment between each ci and ci+1.
However, with more spatial dimensions the sub-light null structure is subtly less
symmetrical, because each null surface represents a discrete cone, which associates two
of the trajectories in the xt plane as the sides of a single cone. Thus there must be an
absolutely innermost cone, in the topological sense, even though that cone may be far off
center, i.e., far from the selected time axis. Similarly for the super-light cones (or
spheres), there would be a single state of motion with respect to which all of those null
surfaces would be spherically symmetrical. Only the accumulation shell, i.e., the actual
light-cone itself, would be spherically symmetrical with respect to all states of motion.

9.4 Spin and Polarization


Every ray of light has therefore two opposite sides And since the crystal
by this disposition or virtue does not act upon the rays except when one of
their sides of unusual refraction looks toward that coast, this argues a
virtue or disposition in those sides of the rays which answers to and
sympathizes with that virtue or disposition of the crystal, as the poles of
two magnets answer to one another
Newton, 1717
The spin of a particle is quantized, so when we make a measurement at any specific angle
we get only one of the two results UP or DOWN. This was shown by the famous
Stern/Gerlach experiment, in which a beam of particles (atoms of silver) was passed
through an oriented magnetic field, and it was found that the beam split into two beams,
one deflected UP (relative to the direction of the magnetic field) and the other deflected
DOWN, with about half of the particles in each.

This behavior implies that the state-vector for spin has just two components, vUP and
vDOWN, for any given direction v. These components are weighted and the sum of the
squares of the weights equals 1. (The overall state-vector for the particle can be
decomposed into the product of a non-spin vector times the spin vector.) The observable
"spin" then corresponds to three operators that are proportional to the Pauli spin matrices:
    Sx = (ħ/2) [[0, 1], [1, 0]]      Sy = (ħ/2) [[0, -i], [i, 0]]      Sz = (ħ/2) [[1, 0], [0, -1]]
These operators satisfy the commutation relations
    [Sx, Sy] = iħ Sz        [Sy, Sz] = iħ Sx        [Sz, Sx] = iħ Sy
as we would expect by the correspondence principle from ordinary (classical) spin. Not
surprisingly, this non-commutation is closely related to the non-commutation of ordinary
spatial rotations of a classical particle, in the sense that they're both related to the cross-product of orthogonal vectors. Given an orthogonal coordinate system [x,y,z] the angular
momentum of a classical particle with momentum [px, py, pz] is (in component form)
    Lx = y pz - z py        Ly = z px - x pz        Lz = x py - y px
Guided by the correspondence principle, we replace the classical components px, py, pz
with their quantum mechanical equivalents, the differential operators -iħ ∂/∂x, -iħ ∂/∂y,
-iħ ∂/∂z, leading to the S operators noted above.
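As a quick numerical check, the following minimal sketch verifies the commutators, in units with ħ = 1 so that Sx = σx/2, and so on:

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    Sx, Sy, Sz = sx / 2, sy / 2, sz / 2      # spin operators with hbar = 1

    def comm(a, b):
        return a @ b - b @ a

    # each commutator should equal i times the remaining operator
    print(np.allclose(comm(Sx, Sy), 1j * Sz))   # True
    print(np.allclose(comm(Sy, Sz), 1j * Sx))   # True
    print(np.allclose(comm(Sz, Sx), 1j * Sy))   # True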
Photons too have quantum spin (they are spin-1 particles), but since photons travel at the
speed c, the "spin axis" of a photon is always parallel to its direction of motion, pointing
either forward or backward. These two states correspond to left-handed and right-handed
photons. Whenever a photon is absorbed by an object, an angular momentum of either
+h/2π or -h/2π is imparted to the object. Each photon "in transit" may be considered to
possess, in addition to its phase, a certain propensity to exhibit each of the two possible
states of spin when it interacts with an object, and a beam of light can be characterized by
the spin propensities (polarization) and phase relations of its constituent photons.
Polarization behaves in a way that is formally very similar to the spin of massive
particles. In a sense, the Schrödinger wave of a photon corresponds to the electromagnetic wave of light, and this wave is governed by Maxwell's equations, which tell us
that the electric and magnetic fields oscillate transversely in the plane normal to the
direction of motion (and perpendicular to each other). Thus a photon coming directly
toward us "looks" something like this:

where E signifies the oscillating electric field and B the magnetic field. (This orientation
is not necessarily fixed - it's possible for it to rotate like a windmill - but it's simplest to
concentrate on "plane-polarized" photons.) The photon is said to be polarized in the
direction of E.
A typical beam of ordinary light has photons with all different polarizations mixed
together, but certain substances (such as calcite crystals or a sheet of Polaroid) allow
photons to pass through only if their electric field is oscillating in one particular
direction. Therefore, when we pass a beam of light through a polarizing material, the
light that passes through is "polarized", because all the photons have their electric fields
aligned.
Since only photons with one particular alignment are allowed to pass, and since the
incident beam has photons whose polarizations are distributed uniformly in all directions,
one might expect to find that only a very small fraction of the photons would pass
through a perfect polarizing substance. (In fact, the fraction of photons from a uniform
distribution with polarizations exactly aligned with the polarizing axis of the substance
should be vanishingly small.) However, we actually find that a sheet of Polaroid cuts the
intensity of an ordinary light beam about in half. Just as in the Stern/Gerlach experiment
with massive particles, the Polaroid sheet acts as a measurement for each photon, and
gives one of two answers, as if the incoming photons were all polarized in one of just two
directions, exactly parallel to the polarizing axis of the substance, or exactly
perpendicular to it. This is analogous to the binary UP/DOWN results for spin-1/2
particles such as electrons.
If we place a second sheet of Polaroid behind the first, and orient its axis in the same
direction, then we find that all the light which passes through the first sheet also passes
through the second. If we rotate the second sheet it will start to cut down on the photons
allowed through. When we get the second sheet axis at 90 degrees to the first, it will
essentially block all the photons. In general, if the two sheets (i.e., measurements) are
oriented at an angle of θ relative to each other, then the intensity of the light passing all
the way through is I cos(θ)^2, where I is the intensity of the light emerging from the first sheet.
The thickness of the polarizing substance isn't crucial (assuming the polarization axis is
perfectly uniform throughout the substance), because the first surface effectively "selects"
the suitably aligned photons, which then pass freely through the rest of the substance. The
light emerging from the other side is plane-polarized with half the intensity of the
incident light. On the other hand, to convert circularly polarized incident light into plane-polarized light of the same intensity, the traditional method is to use a "quarter-wave
plate" thickness of a crystal substance such as mica. In this case we're not masking out
the non-aligned components, but rather introducing a relative phase shift between them
so as to force them into alignment. Of course, a particular thickness of plate only "works"
this way for a particular frequency.
Incidentally, most people have personal "hands on" knowledge of polarized
electromagnetic waves without even realizing it. The waves broadcast by a radio or
television tower are naturally polarized, and if you've ever adjusted the orientation of
"rabbit ears" and found that your reception is better at some orientations than at others
(for a particular station) you've demonstrated the effects of electromagnetic wave
polarization.
It may be worth noting that light polarization and photon spin, although intimately
related, are not precisely synonymous. The photon's spin axis is always parallel to the
direction of travel, whereas the polarization axis of a wave of light is perpendicular to the
direction of travel. It happens that the polarization affects the behavior of photons in a
formally similar way to the effect of spin on the behavior of massive particles.
Polarization itself is often not regarded as a quantum phenomenon, and it takes on
quantum behavior only because light is quantized into photons.
Regarding the parallel between Schrödinger's equations and Maxwell's equations, it's
interesting to draw the further parallel between the real/imaginary complexity of the
Schrödinger wave and the electric/magnetic complexity of light waves.

9.5 Entangled Events


Anyone who is not shocked by quantum theory has not understood it.
Niels Bohr, 1927
A paper written by Einstein, Podolsky, and Rosen (EPR) in 1935 described a thought
experiment which, the authors believed, demonstrated that quantum mechanics does not
provide a complete description of physical reality, at least not if we accept certain
common notions of locality and realism. Subsequently the EPR experiment was refined
by David Bohm (so it is now called the EPRB experiment) and analyzed in detail by John
Bell, who highlighted a fascinating subtlety that Einstein, et al, may have missed. Bell
showed that the outcomes of the EPRB experiment predicted by quantum mechanics are
inherently incompatible with conventional notions of locality and realism combined with
a certain set of assumptions about causality. The precise nature of these causality
assumptions is rather subtle, and Bell found it necessary to revise and clarify his premises
from one paper to the next. In Section 9.6 we discuss Bell's assumptions in detail, but for
the moment we'll focus on the EPRB experiment itself, and the outcomes predicted by
quantum mechanics.
Most actual EPRB experiments are conducted with photons, but in principle the
experiment could be performed with massive particles. The essential features of the
experiment are independent of the kind of particle we use. For simplicity we'll describe a
hypothetical experiment using electrons (although in practice it may not be feasible to
actually perform the necessary measurements on individual electrons). Consider the
decay of a spin-0 particle resulting in two spin-1/2 particles, an electron and a positron,
ejected in opposite directions. If spin measurements are then performed on the two
individual particles, the correlation between the two results is found to depend on the
difference between the two measurement angles. This situation is illustrated below, with
α and β signifying the respective measurement angles at detectors 1 and 2.

Needless to say, the mere existence of a correlation between the measurements on these
two particles is not at all surprising. In fact, this would be expected in most classical
models, as would a variation in the correlation as a function of the absolute difference θ =
|α - β| between the two measurement angles. The essential strangeness of the quantum
mechanical prediction is not the mere existence of a correlation that varies with θ, it is the
non-linearity of the predicted variation.


If the correlation varied linearly as θ ranged from 0 to π, it would be easy to explain in
classical terms. We could simply imagine that the decay of the original spin-0 particle
produced a pair of particles with spin vectors pointing oppositely along some randomly
chosen axis. Then we could imagine that a measurement taken at any particular angle
gives the result UP if the angle is within π/2 of the positive spin axis, and gives the result
DOWN otherwise. This situation is illustrated below:

Since the spin axis is random, each measurement will have an equal probability of being
UP or DOWN. In addition, if the measurements on the two particles are taken in exactly
the same direction, they will always give opposite results (UP/DOWN or DOWN/UP),
and if they are taken in the exact opposite directions they will always give equal results
(UP/UP or DOWN/DOWN). Also, if they are taken at right angles to each other the
results will be completely uncorrelated, meaning they are equally likely to agree or
disagree. In general, if θ denotes the absolute value of the angle between the two spin
measurements, the above model implies that the correlation between these two
measurements would be C(θ) = (2θ/π) - 1, as plotted below.

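To check this, here is a minimal Monte Carlo sketch of the classical hidden-axis model just described; it reproduces the linear correlation C(θ) = (2θ/π) - 1:

    import math, random

    def classical_correlation(theta, trials=100000):
        agree = 0
        for _ in range(trials):
            axis = random.uniform(0, 2 * math.pi)         # random spin axis of particle 1
            # a detector reads UP when its angle is within pi/2 of the spin axis;
            # particle 2 carries the opposite axis
            up1 = math.cos(0 - axis) > 0                  # detector 1 at angle 0
            up2 = math.cos(theta - (axis + math.pi)) > 0  # detector 2 at angle theta
            agree += (up1 == up2)
        p_agree = agree / trials
        return 2 * p_agree - 1        # correlation = P(agree) - P(disagree)

    for deg in (0, 45, 90, 135, 180):
        theta = math.radians(deg)
        print(deg, round(classical_correlation(theta), 2), round(2 * theta / math.pi - 1, 2))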
This linear correlation function is consistent with quantum mechanics (and confirmed by
experiment) if the two measurement angles differ by θ = 0, π/2, or π, giving the
correlations -1, 0, and +1 respectively.
However, for intermediate angles, quantum theory predicts (and experiments confirm)
that the actual correlation function for spin-1/2 particles is not the linear function shown
above, but the non-linear function given by C(θ) = -cos(θ), as shown below

On this basis, the probabilities of the four possible joint outcomes of spin measurements
performed at angles differing by θ are as shown in the table below. (The same table
would apply to spin-1 particles such as photons if we replace θ with 2θ.)

        outcome          probability
        UP / UP          (1/2) sin^2(θ/2)
        DOWN / DOWN      (1/2) sin^2(θ/2)
        UP / DOWN        (1/2) cos^2(θ/2)
        DOWN / UP        (1/2) cos^2(θ/2)
To understand why the shape of this correlation function defies explanation within the
classical framework of local realism, suppose we confine ourselves to spin measurements
along one of just three axes, at 0, 120, and 240 degrees. For convenience we will denote
these axes by the symbols A, B, and C respectively. Several pairs of particles are
produced and sent off to two distant locations in opposite directions. In both locations a
spin measurement along one of the three allowable axes is performed, and the results are
recorded. Our choices of measurements (A, B, or C) may be arbitrary, e.g., by flipping
coins, or by any other means. In each location it is found that, regardless of which
measurement is made, there is an equal probability of spin UP or spin DOWN, which we
will denote by "1" and "0" respectively. This is all that the experimenters at either site can
determine separately.
However, when all the results are brought together and compared in matched pairs, we
find the following joint correlations

                 A       B       C
          A      0      3/4     3/4
          B     3/4      0      3/4
          C     3/4     3/4      0

The numbers in this matrix indicate the fraction of times that the results agreed (both 0 or
both 1) when the indicated measurements were made on the two members of a matched
pair of objects. Notice that if the two distant experimenters happened to have chosen to
make the same measurement for a given pair of particles, the results never agreed, i.e.,
they were always the opposite (1 and 0, or 0 and 1). Also notice that, if both
measurements are selected at random, the overall probability of agreement is 1/2.
The remarkable fact is that there is no way (within the traditional view of physical
processes) to prepare the pairs of particles in advance of the measurements such that they
will give the joint probabilities listed above. To see why, notice that each particle must be
ready to respond to any one of the three measurements, and if it happens to be the same
measurement as is selected on its matched partner, then it must give the opposite answer.
Hence if the particle at one location will answer "0" for measurement A, then the particle
at the other location must be prepared to give the answer "1" for measurement A. There
are similar constraints on the preparations for measurements B and C, so there are really
only eight ways of preparing a pair of particles

              particle 1      particle 2
        a:    A=0 B=0 C=0     A=1 B=1 C=1
        b:    A=0 B=0 C=1     A=1 B=1 C=0
        c:    A=0 B=1 C=0     A=1 B=0 C=1
        d:    A=0 B=1 C=1     A=1 B=0 C=0
        e:    A=1 B=0 C=0     A=0 B=1 C=1
        f:    A=1 B=0 C=1     A=0 B=1 C=0
        g:    A=1 B=1 C=0     A=0 B=0 C=1
        h:    A=1 B=1 C=1     A=0 B=0 C=0

These preparations - and only these - will yield the required anti-correlation when the
same measurement is applied to both objects. Therefore, assuming the particles are pre-programmed (at the moment when they separate from each other) to give the appropriate
result for any one of the nine possible joint measurements that might be performed on
them, it follows that each pair of particles must be pre-programmed in one of the eight
ways shown above. It only remains now to determine the probabilities of these eight
preparations.
The simplest state of affairs would be for each of the eight possible preparations to be
equally probable, but this yields the measurement correlations shown below

                 A       B       C
          A      0      1/2     1/2
          B     1/2      0      1/2
          C     1/2     1/2      0

Not only do the individual joint probabilities differ from the quantum mechanical
predictions, this distribution gives an overall probability of agreement of 1/3, rather than
1/2 (as quantum mechanics says it must be), so clearly the eight possible preparations
cannot be equally likely. Now, we might think some other weighting of these eight
preparation states will give the right overall results, but in fact no such weighting is
possible. The overall preparation process must yield some linear convex combination of
the eight mutually exclusive cases, i.e., each of the eight possible preparations must have
some fixed long-term probability, which we will denote by a, b, ..., h, respectively. These
probabilities are all positive values in the range 0 to 1, and the sum of these eight values
is identically 1. It follows that the sum of the six probabilities b through g must be less
than or equal to 1. This is a simple form of "Bell's inequality", which must be satisfied by
any local realistic model of the sort that Bell had in mind. However, the joint probabilities
in the correlation table predicted by quantum mechanics imply

        c + d + e + f = 3/4
        b + d + e + g = 3/4
        b + c + f + g = 3/4

Adding these three expressions together gives 2(b + c + d + e + f + g) = 9/4, so the sum
of the probabilities b through g is 9/8, which exceeds 1. Hence the results of the EPRB
experiment predicted by quantum mechanics (and empirically confirmed) violate Bell's
inequality. This shows that there does not exist a linear combination of those eight
preparations that can yield the joint probabilities predicted by quantum mechanics, so
there is no way of accounting for the actual experimental results by means of any realistic
local physical model of the sort that Bell had in mind. The observed violations of Bell's
inequality in EPRB experiments imply that Bell's conception of local realism is
inadequate to represent the actual processes of nature.
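The whole argument can be cast as a linear-programming feasibility check. The sketch below (assuming numpy and scipy are available) enumerates the eight preparations in the order a through h used above and asks for a convex weighting that reproduces the quantum agreement probability of 3/4 for each pair of distinct measurements; the solver reports that no such weighting exists.

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    preps = list(itertools.product([0, 1], repeat=3))   # particle 1's answers to (A, B, C)
    pairs = [(0, 1), (0, 2), (1, 2)]                    # the distinct measurement pairs

    # the two results agree exactly when particle 1's answers to the two
    # measurements differ (particle 2 always answers oppositely)
    A_eq = np.array([[1.0 if p[x] != p[y] else 0.0 for p in preps] for (x, y) in pairs]
                    + [[1.0] * 8])                      # final row: weights sum to 1
    b_eq = np.array([0.75, 0.75, 0.75, 1.0])

    res = linprog(c=np.zeros(8), A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
    print(res.success)    # False: no weighting of the eight preparations works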
The causality assumptions underlying Bell's analysis are inherently problematic (see
Section 9.7), but the analysis is still important, because it highlights the fundamental
inconsistency between the predictions of quantum mechanics and certain conventional
ideas about causality and local realism. In order to maintain those conventional ideas, we
would be forced to conclude that information about the choice of measurement basis at
one detector is somehow conveyed to the other detector, influencing the outcome at that
detector, even though the measurement events are space-like separated. For this reason,
some people have been tempted to think that violations of Bell's inequality imply
superluminal communication, contradicting the principles of special relativity. However,
there is actually no effective transfer of information from one measurement to the other in
an EPRB experiment, so the principles of special relativity are safe. One of the most
intriguing aspects of Bell's analysis is that it shows how the workings of quantum
mechanics (and, evidently, nature) involve correlations between space-like separated
events that seemingly could only be explained by the presence of information from
distant locations, even though the separate events themselves give no way of inferring
that information. In the abstract, this is similar to "zero-knowledge proofs" in
mathematics.
To illustrate, consider a "twins paradox" involving a pair of twin brothers who are
separated and sent off to distant locations in opposite directions. When twin #1 reaches
his destination he asks a stranger there to choose a number x1 from 1 to 10, and the twin
writes this number down on a slip of paper along with another number y1 of his own
choosing. Likewise twin #2 asks someone at his destination to choose a number x2, and
he writes this number down along with a number y2 of his own choosing. When the twins
are re-united, we compare their slips of paper and find that |y2 - y1| = (x2 - x1)^2. This is
really astonishing. Of course, if the correlation was some linear relationship of the form
y2 - y1 = A(x2 - x1) + B for any pre-established constants A and B, the result would be
quite easy to explain. We would simply surmise that the twins had agreed in advance that
twin #1 would write down y1 = Ax1 - B/2, and twin #2 would write down y2 = Ax2 + B/2.
However, no such explanation is possible for the observed non-linear relationship,
because there do not exist functions f1 and f2 such that f2(x2) - f1(x1) = (x2 - x1)^2. Thus if
we assume the numbers x1 and x2 are independently and freely selected, and there is no
communication between the twins after they are separated, then there is no "locally
realistic" way of accounting for this non-linear correlation. It seems as though one or both
of the twins must have had knowledge of his brother's numbers when writing down his
own number, despite the fact that it is not possible to infer anything about the individual
values of x2 and y2 from the values of x1 and y1 or vice versa.
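To see why no such functions can exist, note that setting x1 = x2 = x would require f2(x) - f1(x) = 0, so f1 and f2 would have to be the same function, say f. But then f(x2) - f(x1) = (x2 - x1)^2 and f(x1) - f(x2) = (x1 - x2)^2, and adding these two equations gives 0 = 2(x2 - x1)^2, which is false whenever x1 ≠ x2.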
In the same way, the results of EPRB experiments imply a greater degree of interdependence between separate events than can be accounted for by traditional models of
causality. One possible idea for adjusting our conceptual models to accommodate this
aspect of quantum phenomena would be to deny the existence of any correlations until
they become observable. According to the most radical form of this proposal, the
universe is naturally partitioned into causally compact cells, and only when these cells
interact do their respective measurement bases become reconciled, in such a way as to
yield the quantum mechanical correlations. This is an appealing idea in many ways, but
it's far from clear how it could be turned into a realistic model. Another possibility is that
the preparation of the two particles at the emitter and the choices of measurement bases at
the detectors may be mutually influenced by some common antecedent event(s). This can
never be ruled out, as discussed in Section 9.6. Lastly, we mention the possibility that the
preparation of the two particles may be conditioned by the measurements to which they
are subjected. This is discussed in Section 9.10.

9.6 Von Neumann's Postulate and Bell's Freedom


If I have freedom in my love,
And in my soul am free,
Angels alone, that soar above,
Enjoy such liberty.
Richard Lovelace, 1649
In quantum mechanics the condition of a physical system is represented by a state vector,
which encodes the probabilities of each possible result of whatever measurements we
may perform on the system. Since the probabilities are usually neither 0 nor 1, it follows
that for a given system with a specific state vector, the results of measurements generally
are not uniquely determined. Instead, there is a set (or range) of possible results, each
with a specific probability. Furthermore, according to the conventional interpretation of
quantum mechanics (the "Copenhagen Interpretation" advocated by Niels Bohr, et al), the
state vector is the most complete possible description of the system, which implies that
nature is fundamentally probabilistic (i.e., non-deterministic). However, it's natural to
question whether this interpretation is correct, or whether there might be some more
complete description of a system, such that a fully specified system would respond
deterministically to any measurement we might perform. Such proposals are called
'hidden variable' theories.
In his assessment of hidden variable theories in 1932, John von Neumann pointed out a
set of five assumptions which, if we accept them, imply that no hidden variable theory
can possibly give deterministic results for all measurements. The first four of these
assumptions are fairly unobjectionable, but the fifth seems much more arbitrary, and has
been the subject of much discussion. (The parallel with Euclid's postulates, including the
controversial fifth postulate discussed in Section 3.1, is striking.)
To understand von Neumann's fifth postulate, notice that although the conventional
interpretation does not uniquely determine the outcome of a particular measurement for a
given state, it does predict a unique 'expected value' for that measurement. Let's say a
measurement of X on a system with a state vector ψ has an expected value denoted by
<X;ψ>, computed by simply adding up all the possible results multiplied by their
respective probabilities. Not surprisingly, the expected values of observables are additive,
in the sense that
    <X+Y; ψ> = <X; ψ> + <Y; ψ>                (1)
In practice we can't generally perform a measurement of X+Y without disturbing the
measurements of X and Y, so we can't measure all three observables on the same system.
However, if we prepare a set of systems, all with the same initial state vector ψ, and
perform measurements of X+Y on some of them, and measurements of X or Y on the
others, then the averages of the measured values of X, Y, and X+Y (over sufficiently
many systems) will be related in accord with (1).

Remember that according to the conventional interpretation the state vector is the most
complete possible description of the system. On the other hand, in a hidden variable
theory the premise is that there are additional variables, and if we specify both the state
vector AND the "hidden vector" H, the result of measuring X on the system is uniquely
determined. In other words, if we let <X;ψ,H> denote the expected value of a
measurement of X on a system in the state (ψ,H), then the claim of the hidden variable
theorist is that the variance of individual measured values around this expected value is
zero.
Now we come to von Neumann's controversial fifth postulate. He assumed that, for any
hidden variable theory, just as in the conventional interpretation, the averages of X+Y, X
and Y evaluated over a set of identical systems are additive. (Compare this with Galileo's
assumption of simple additivity for the composition of incommensurate speeds.)
Symbolically, this is expressed as
    <X+Y; ψ, H> = <X; ψ, H> + <Y; ψ, H>                (2)
for any two observables X and Y. On this basis he proved that the variance ("dispersion")
of at least one observable's measurements must be greater than zero. (Technically, he
showed that there must be an observable X such that <X^2> is not equal to <X>^2.) Thus,
no hidden variable theory can uniquely determine the results of all possible
measurements, and we are compelled to accept that nature is fundamentally non-deterministic.
However, this is all based on (2), the assumption of additivity for the expectations of
identically prepared systems, so it's important to understand exactly what this assumption
means. Clearly the words "identically prepared" mean something different under the
conventional interpretation than they do in the context of a hidden variable theory.
Conventionally, two systems are said to be identically prepared if they have the same
state vector (ψ), but in a hidden variable theory two states with the same state vector are
not necessarily "identical", because they may have different hidden vectors (H).
Of course, a successful hidden variable theory must satisfy (1) (which has been
experimentally verified), but must it necessarily satisfy (2)? Relation (1) only requires that
the averages of <X;ψ,H>, etc., evaluated over all applicable hidden vectors H, be additive,
but does it necessarily follow that (2) is satisfied for every (or even for ANY) specific
value of H? To give a simple illustration, consider the following trivial set of data:

        System    ψ    H    X    Y    X+Y
          1       3    1    2    5     5
          2       3    2    4    3     9
          3       3    1    2    5     5
          4       3    2    4    3     9

The averages over these four "conventionally indistinguishable" systems are <X;3> = 3,
<Y;3> = 4, and <X+Y;3> = 7, so relation (1) holds. However, if we examine the
"identically prepared" systems taking into account the hidden components of the state, we
really have two different states (those with H=1 and those with H=2), and we find that the
results are not additive (but they are deterministic) in these fully-defined states. Thus,
equation (1) clearly doesn't imply equation (2). (If it did, von Neumann could have said
so, rather than taking it as an axiom.)
Of course, if our hidden variable theory is always going to satisfy (1), we must have some
constraints on the values of H that arise among "conventionally indistinguishable"
systems. For example, in the above table if we happened to get a sequence of systems all
in the same condition as System #1 we would always get the results X=2, Y=5, X+Y=5,
which would violate (1). So, if (2) doesn't hold, then at the very least we need our theory
to ensure a distribution of the hidden variables H that will make the average results over a
set of "conventionally indistinguishable" systems satisfy relation (1). (In the simple
illustration above, we would just need to ensure that the hidden variables are
equally distributed between H=1 and H=2.)
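For completeness, here is a tiny sketch verifying the arithmetic of this illustration, using the values tabulated above: averaged over the four systems, relation (1) holds, while additivity fails within each fully specified state.

    systems = [   # (H, X, Y, X+Y) for the four conventionally indistinguishable systems
        (1, 2, 5, 5),
        (2, 4, 3, 9),
        (1, 2, 5, 5),
        (2, 4, 3, 9),
    ]
    avg = lambda vals: sum(vals) / len(vals)
    print(avg([s[1] for s in systems]),    # <X;psi>   = 3
          avg([s[2] for s in systems]),    # <Y;psi>   = 4
          avg([s[3] for s in systems]))    # <X+Y;psi> = 7 = 3 + 4, so (1) holds
    for H in (1, 2):
        x, y, xy = next((s[1], s[2], s[3]) for s in systems if s[0] == H)
        print(H, xy == x + y)              # False for both: (2) fails in each state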
In Bohm's 1952 theory the hidden variables consist of precise initial positions for the
particles in the system - more precise than the uncertainty relations would typically allow
us to determine - and the distribution of those variables within the uncertainty limits is
governed as a function of the conventional state vector, ψ. It's also worth noting that, in
order to make the theory work, it was necessary for ψ to be related to the values of H for
separate particles instantaneously in an explicitly non-local way. Thus, Bohm's theory is a
counter-example to von Neumann's theorem, but not to Bell's (see below).
Incidentally, it may be worth noting that if a hidden variable theory is valid, and the
variance of all measurements around their expectations is zero, then the terms of (2) are
not only the expectations, they are the unique results of measurements for a given ψ and
H. This implies that they are eigenvalues of the respective operators, whereas the
expectations for those operators are generally not equal to any of the eigenvalues. Thus,
as Bell remarked, "[von Neumann's] 'very general and plausible postulate' is absurd".
Still, Gleason showed that we can carry through von Neumann's proof even on the
weaker assumption that (2) applies to commuting variables. This weakened assumption
has the advantage of not being self-evidently false. However, careful examination of
Gleason's proof reveals that the non-zero variances again arise only because of the
existence of non-commuting observables, but this time in a "contextual" sense that may
not be obvious at first glance.
To illustrate, consider three observables X,Y,Z. If X and Y commute and X and Z
commute, it doesn't follow that Y and Z commute. We may be able to measure X and Y
using one setup, and X and Z using another, but measuring the value of X and Y
simultaneously will disturb the value of Z. Gleason's proof leads to non-zero variances
precisely for measurements in such non-commuting contexts. It's not hard to understand
this, because in a sense the entire non-classical content of quantum mechanics is the fact
that some observables do not commute. Thus it's inevitable that any "proof" of the
inherent non-classicality of quantum mechanics must at some point invoke non-commuting measurements, but it's precisely at that point where linear additivity can only
be empirically verified on an average basis, not a specific basis. This, in turn, leaves the
door open for hidden variables to govern the individual results.
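A concrete sketch of such a non-commuting context, using two-qubit Pauli operators (this particular triple is an illustrative choice, not one drawn from Gleason's paper): X commutes with Y, and X commutes with Z, yet Y and Z fail to commute.

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    I2 = np.eye(2, dtype=complex)

    X = np.kron(sz, sz)    # measurable together with either Y or Z
    Y = np.kron(sz, I2)
    Z = np.kron(sx, sx)

    comm = lambda a, b: a @ b - b @ a
    print(np.allclose(comm(X, Y), 0))   # True
    print(np.allclose(comm(X, Z), 0))   # True
    print(np.allclose(comm(Y, Z), 0))   # False: Y and Z define different contexts for X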
Notice that in a "contextual" theory the result of an experiment is understood to depend
not only on the deterministic state of the "test particles" but also on the state of the
experimental apparatus used to make the measurements, and these two can influence each
other. Thus, Bohm's 1952 theory escaped the "no hidden variable" theorems essentially by
allowing the measurements to have an instantaneous effect on the hidden variables,
which, of course, made the theory essentially non-local as well as non-relativistic
(although Bohm and others later worked to relativize his theory).
Ironically, the importance of considering the entire experimental setup (rather than just
the arbitrarily identified "test particles") was emphasized by Niels Bohr himself, and it's a
fundamental feature of quantum mechanics (i.e., objects are influenced by measurements
no less than measurements are influenced by objects). As Bell said, even Gleason's
relatively robust line of reasoning overlooks this basic insight. Of course, it can be argued
that contextual theories are somewhat contrived and not entirely compatible with the
spirit of hidden variable explanations, but, if nothing else, they serve to illustrate how
difficult it is to categorically rule out "all possible" hidden variable theories based simply
on the structure of the quantum mechanical state space.
In 1964 John Bell sought to clarify matters, noting that all previous attempts to prove the
impossibility of hidden variable interpretations of quantum mechanics had been found
wanting. His idea was to establish rigorous limits on the kinds of statistical correlations
that could possibly exist between spatially separate events under the assumption of
determinism and what might be called local realism, which he took to be the premises
of Einstein, et al. At first Bell thought he had succeeded, but it was soon pointed out that
his derivation implicitly assumed one other crucial ingredient, namely, the possibility of
free choice. To see why this is necessary, notice that any two spatially separate events
share a common causal past, consisting of the intersection of their past light cones. This
implies that we can never categorically rule out some kind of "pre-arranged" correlation
between spacelike-separated events - at least not unless we can introduce information that
is guaranteed to be causally independent of prior events. The appearance of such "new
events" whose information content is at least partially independent of their causal past,
constitutes a free choice. If no free choice is ever possible, then (as Bell acknowledged)
the Bell inequalities do not apply.
In summary, Bell showed that quantum mechanics is incompatible with a quite peculiar
pair of assumptions, the first being that the future behavior of some particles (i.e., the
"entangled" pairs) involved in the experiment is mutually conditioned and coordinated in
advance, and the second being that such advance coordination is in principle impossible
for other particles involved in the experiment (e.g., the measuring apparatus). These are

not quite each other's logical negations, but close to it. One is tempted to suggest that the
mention of quantum mechanics is almost superfluous, because Bell's result essentially
amounts to a proof that the assumption of a strictly deterministic universe is incompatible
with the assumption of a strictly non-deterministic universe. He proved, assuming the
predictions of quantum mechanics are valid (which the experimental evidence strongly
supports), that not all events can be strictly consequences of their causal pasts, and in
order to carry out this proof he found it necessary to introduce the assumption that not all
events are strictly consequences of their causal pasts!
In the paper "Atomic-Cascade Photons, Quantum Mechanical Non-Locality", Bell listed
three possible positions that he thought could be taken with respect to the Aspect
experiments. (Actually he lists four, the fourth being "Just ignore it".) These alternatives
are

Regarding the third possibility, Bell wrote:


...if our measurements are not independently variable as we supposed...even if
chosen by apparently free-willed physicists... then Einstein local causality can
survive. But apparently separate parts of the world become deeply entangled, and
our apparent free will is entangled with them.
The third possibility clearly shows that Bell understood the necessity of assuming free
acausal events for his derivation, but since this amounts to assuming precisely that which
he was trying to prove, we must acknowledge that the significance of Bell's inequalities is
less clear than many people originally believed. In effect, after clarifying the lack of
significance of von Neumann's "no hidden variables proof" due to its assumption of what
it meant to prove, Bell proceeded to repeat the mistake, albeit in a more subtle way.
Perhaps Bell's most perspicacious remark was (in reference to von Neumann's proof) that
the only thing proved by impossibility proofs is the author's lack of imagination.
This all just illustrates that it's extremely difficult to think clearly about causation, and the
reasons for this can be traced back to the Aristotelian distinction between natural and
violent motion. Natural motion consisted of the motions of non-living objects, such as the
motions of celestial objects, the natural flows of water and wind, etc. These are the kinds
of motion that people (like Bell) apparently have in mind when they think of
determinism. Following the ancients, people tend to instinctively exempt "violent
motions", i.e., motions resulting from acts of living volition, when considering
determinism. It's psychologically very difficult for us to avoid bifurcating the world into
inanimate objects that obey strict laws of causality, and animate objects (like ourselves)
that do not. This dichotomy was historically appealing, but it always left the nagging
question of how or why we (and our constituent atoms) manage to evade the iron hand of

determinism that governs everything else. This view affects our conception of science by
suggesting to us that the experimenter is not himself part of nature, and is exempt from
whatever determinism is postulated for the system being studied. Thus we imagine that
we can "test" whether the universe is behaving deterministically by turning some dials
and seeing how the universe responds, overlooking the fact that we and the dials are also
part of the universe.
This immediately introduces "the measurement problem", i.e., where do we draw the
boundaries between separate phenomena? What is an observation? How do we
distinguish "nature" from "violence", and is this distinction even warranted? It's worth
noting that when people say they're talking about a deterministic world, they're almost
always not. What they're usually talking about is a deterministic sub-set of the world that
can be subjected to freely chosen inputs from a non-deterministic "exterior". But just as
with the measurement problem in quantum mechanics, when we think we've figured out
the constraints on how a deterministic test apparatus can behave in response to arbitrary
inputs, someone says "but isn't the whole lab a deterministic system?", and then the
whole building, and so on. At what point does "the collapse of determinism" occur, so
that we can introduce free inputs to test the system? Just as the infinite regress of the
measurement problem in quantum mechanics leads us into bewilderment, so too does the
infinite regress of determinism.
The other loop-hole that can never be closed is what Bell called "correlation by post-arrangement" or "backwards causality". I'd prefer to say that the system may violate the
assumption of strong temporal asymmetry, but the point is the same. Clearly the causal
pasts of the spacelike separated arms of an EPR experiment overlap, so all the objects
involved share a common causal past. Therefore, without something to "block off" this
region of common past from the emission and absorption events in the EPR experiment,
we're not justified in asserting causal independence, which is required for Bell's
derivation. The usual and, as far as I know, only way of blocking off the causal past is by
injecting some "other" influence, i.e., an influence other than the deterministic effects
propagating from the causal past. This "other" may be true randomness, free will, or
some other concept of "free occurrence". In any case, Bell's derivation requires us to
assert that each measurement is a "free" action, independent of the causal past, which is
inconsistent with even the most limited construal of determinism.
There is a fascinating parallel between the ancient concepts of natural and violent motion
and the modern quantum mechanical concepts of the linear evolution of the wave
function and the collapse of the wave function. These modern concepts are sometimes
termed U, for unitary evolution of the quantum mechanical state vector, and R, for
reduction of the state vector onto a particular basis of measurement or observation. One
could argue that the U process corresponds closely with Aristotle's natural (inanimate)
evolution, while the R process represents Aristotle's violent evolution, triggered by some
living act. As always, we face the question of whether this is an accurate or meaningful
bifurcation of events. Today there are several "non-collapse" interpretations of quantum
mechanics, including the famous "many worlds" interpretation of Everett and DeWitt.
However, to date, none of these interpretations has succeeded in giving a completely

satisfactory account of quantum mechanical processes, so we are not yet able to dispense
with Aristotle's distinction between natural and violent motion.

9.7 The Gestalt of Determinism


Then assuredly the world was made not in time, but simultaneously with time.
St. Augustine, 400 AD

Determinism is commonly defined as the proposition that each event is the necessary and
unique consequence of prior events. This implies that events transpire in a temporally
ordered sequence, and that a wave of implication somehow flows along this sequence,
fixing or deciding each successive event based on the preceding events, in accord with
some definite rule (which may or may not be known to us). This description closely
parallels the beginning of Laplace's famous remarks on the subject:
We ought then to regard the present state of the universe as the effect of the anterior state and as
the cause of the one that is to follow

However, at this point Laplace introduces a gestalt shift (like the sudden realignment of
meaning that Donne often placed at the end of his "metaphysical" poems). After
describing the temporally ordered flow of events, he notes a profound shift in the
perception of "a sufficiently vast intelligence"
...nothing would be uncertain, and the future, as the past, would be present to its eyes.

This shows how we initially conceive of determinism as a temporally ordered chain of
implication, but when carried to its logical conclusion we are led inevitably to the view of
an atemporal "block universe" that simply exists. At some point we experience a gestalt
shift from a universe that is occurring to a universe that simply is. The concepts of time
and causality in such a universe can be (at most) psychological interpretations, lacking
any active physical significance. In order for time and causality to be genuinely active, a
degree of freedom is necessary, because without freedom we immediately regress to an
atemporal block universe, in which there can be no absolute direction of implication.
Of course, it may well be that certain directions in a deterministic block universe are
preferred based on the simplicity with which they can be described and conceptually
grasped. For example, it may be possible to completely specify the universe based on the
contents of a particular cross-sectional slice, together with a simple set of fixed rules for
recursively inferring the contents of neighboring slices in a particular sequence, whereas
other sequences may require a vastly more complicated rule. However, in a
deterministic universe this chain of implication is merely a descriptive convenience, and
cannot be regarded as the effective mechanism by which the events come into being.
The static view is fully consistent not only with the Newtonian universe that Laplace
imagined, but also with the theory of relativity, in which the worldlines of objects

(through spacetime) can be considered to be already existent in their entirety. (Indeed this
is a necessary interpretation if we are to incorporate worldlines actually crossing event
horizons.) In this sense relativity is a purely classical theory. On the other hand, quantum
mechanics is widely regarded as decidedly non-deterministic. Indeed, we saw in Section
9.6 the famous theorem of von Neumann purporting to rule out determinism (in the form
of hidden variables) in the realm of quantum mechanics. However, as Einstein observed
Whether objective facts are subject to causality is a question whose answer necessarily depends on
the theory from which we start. Therefore, it will never be possible to decide whether the world is
causal or not.

Note that the word causal is being used here as a synonym for deterministic, since
Einstein had in mind strict causality, with no free choices, as summarized in his famous
remark that God does not play dice with the universe. We've seen that von Neumann's
proof was based on a premise which is effectively equivalent to what he was trying to
prove, nicely illustrating Einstein's point that the answer depends on the theory from
which we start. In other words, an assertion about what is recursively possible can be
meaningful only if we place some constraint on the complexity of the allowable recursive
"algorithm".
For example, the nth state vector of a system may be the kn+1 through k(n+1) digits of π.
This would be a perfectly deterministic system, but the relations between successive
states would be extremely obscure. In fact, assuming the digits of the two transcendental
numbers π and e are normally distributed (as is widely believed, though not proven), any
finite string of decimal digits occurs infinitely often in their decimal expansions, and each
string occurs with the same frequency in both expansions. (It's been noted that, assuming
normality, the digits of π would make an inexhaustible source of high-quality "random"
number sequences, higher quality than anything we can get out of conventional pseudorandom number generators). Therefore, given any finite number of digits (observations),
we could never even decide whether the operative algorithm was π or e, nor whether
we had correctly identified the relevant occurrence in the expansion. Thus we can easily
imagine a perfectly deterministic universe that is also utterly unpredictable. (Interestingly,
the recent innovation that enables computation of the nth hexadecimal digit of π (with
much less work than required to compute the first n digits) implies that we could present
someone with a sequence of digits and challenge them to determine where it first occurs
in the hexadecimal expansion of π, and it may be practically impossible for them to find the
answer.)
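A toy illustration of such a deterministic-but-opaque rule, assuming the mpmath library is available for high-precision arithmetic: the "state" at step n is taken to be a block of k successive digits of π, so the sequence is fully deterministic, yet the blocks pass casual statistical inspection as random.

    from collections import Counter
    from mpmath import mp

    mp.dps = 1010                              # working precision
    digits = mp.nstr(mp.pi, 1005)[2:1002]      # the first 1000 digits after the decimal point

    k = 5
    states = [digits[k * n : k * (n + 1)] for n in range(len(digits) // k)]
    print(states[:6])                          # the first few "state vectors"
    print(Counter(digits))                     # digit frequencies look roughly uniform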
Even worse, there need be no simple rule of any kind relating the events of a
deterministic universe. This highlights the important distinction between determinism and
the concepts of predictability and complexity. There is no requirement for a deterministic
universe to be predictable, or for its complexity to be limited in any way. Thus, we can
never prove that any finite set of observations could only have arisen from a non-deterministic process. In a sense, this is trivially true, because a finite Turing machine
can always be written to generate any given finite string, although the algorithm
necessary to generate a very irregular string may be nearly as long as the string itself.
Since determinism is inherently undecidable, we may try to define a more tractable
notion, such as predictability, in terms of the exhibited complexity manifest in our
observations. This could be quantified as the length of the shortest Turing machine
required to reproduce our observations, and we might imagine that in a completely
random universe, the size of the required algorithm would grow in proportion to the
number of observations (as we are forced to include ad hoc modifications to the
algorithm to account for each new observation). On this basis it might seem that we could
eventually assert with certainty that the universe is inherently unpredictable (on some
level of experience), i.e., that the length of the shortest Turing machine required to
duplicate the results grows in proportion with the number of observations. In a sense, this
is what the "no hidden variables" theorems try to do.
However, we can never reach such a conclusion, as shown by Chaitin's proof that there
exists an integer k such that it's impossible to prove that the complexity of any specific
string of binary bits exceeds k (where "complexity" is defined as the length of the
smallest Turing program that generates the string). This is true in spite of the fact that
"almost all" strings have complexity greater than k. Therefore, even if we (sensibly)
restrict our meaningful class of Turing machines to those of complexity less than a fixed
number k (rather than allowing the complexity of our model to increase in proportion to
the number of observations), it's still impossible for any finite set of observations (even if
we continue gathering data forever) to be provably inconsistent with a Turing machine of
complexity less than k. (Naturally we must be careful not to confuse the question of
whether "there exist" sequences of complexity greater than k with the question of whether
we can prove that any particular sequence has complexity greater than k.)

9.8 Quaedam Tertia Natura Abscondita


The square root of 9 may be either +3 or -3, because a plus times a plus
or a minus times a minus yields a plus. Therefore the square root of -9 is
neither +3 nor -3, but is a thing of some obscure third nature.
Girolamo Cardano, 1545
In a certain sense the peculiar aspects of quantum spin measurements in EPR-type
experiments can be regarded as a natural extension of the principle of special relativity.
Classically a particle has an intrinsic spin about some axis with an absolute direction, and
the results of measurements depend on the difference between this absolute spin axis and
the absolute measurement axis. In contrast, quantum theory says there are no absolute
spin angles, only relative spin angles. In other words, the only angles that matter are the
differences between two measurements, whose absolute values have no physical
significance. Furthermore, the relations between measurements vary in a non-linear way,
so it's not possible to refer them to any absolute direction.
This "relativity of angular reference frames" in quantum mechanics closely parallels the
relativity of translational reference frames in special relativity. This shouldn't be too

surprising, considering that velocity boosts are actually rotations through imaginary
angles. Recall from Section 2.4 that the relationship between the frequencies of a given
signal as measured by the emitter and absorber depends on the two individual speeds ve
and va relative to the medium through which the signal propagates at the speed cs, but as
this speed approaches c (the speed of light in a vacuum), the frequency shift becomes
dependent only on a single variable, namely, the mutual speed between the emitter and
absorber relative to each other. This degeneration of dependency from two independent
absolute variables down to a single relative variable is so familiar today that we take
it for granted, and yet it is impossible to explain in classical Newtonian terms.
Schematically we can illustrate this in terms of three objects in different translational
frames of reference as shown below:

The object B is stationary (corresponding to the presumptive medium of signal
propagation), while objects A and C move relative to B in opposite directions at high
speed. Intuitively we would expect the velocity of A in terms of the rest frame of C (and
vice versa) to equal the sum of the velocities of A and C in terms of the rest frame of B. If
we allowed the directions of motion to be oblique, we would still have the triangle
inequality placing limits on how the mutual speeds are related to each other. This could
be regarded as something like a Bell inequality for translational frames of reference.
When we measure the velocity of A in terms of the rest frame of C we find that it does
not satisfy this additive property, i.e., it violates "Bell's inequality" for special relativity.
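A small numerical sketch of this violated additivity, in units with c = 1: composing two speeds of 0.8 relativistically yields far less than their classical sum.

    def compose(u, v):
        # relativistic velocity composition
        return (u + v) / (1 + u * v)

    va, vc = 0.8, 0.8
    print(va + vc)           # 1.6      : the classical (additive) expectation
    print(compose(va, vc))   # 0.9756...: the speed of A in the rest frame of C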
Compare the above with the actual Bell's inequality for entangled spin measurements in
quantum mechanics. Two measurements of the separate components of an entangled pair
may be taken at different orientations, say at the angles A and C, relative to the
presumptive common spin axis of the pair, as shown below:

We then determine the correlations between the results for various combinations of
measurement angles at the two ends of the experiment. Just as in the case of frequency
measurements taken at two different boost angles, the classical expectation is that the
correlation between the results will depend on the two measurement angles relative to
some reference direction established by the mechanism. But again we find that the

correlations actually depend only on the single difference between angles A and C, not on
their two individual values relative to some underlying reference.
The close parallel between the boost inequalities in special relativity and the Bell
inequalities for spin measurements in quantum mechanics is more than just superficial.
In both cases we find that the assumption of an absolute frame (angular or translational)
leads us to expect a linear relation between observable qualities, and in both cases it turns
out that in fact only the relations between one realized event and another, rather than
between a realized event and some absolute reference, govern the outcomes. Recall from
Section 9.5 that the correlation between the spin measurements (of entangled spin-1/2
particles) is simply -cos(θ), where θ is the relative spatial angle between the two
measurements. The usual presumption is that the measurement devices are at rest with
respect to each other, but if they have some non-zero relative velocity v, we can represent
the "boost" as a complex rotation through an angle φ = arctanh(v), where arctanh is the
inverse hyperbolic tangent (see Part 6 of the Appendix). By analogy, we might expect the
"correlation" between measurements performed with respect to two basis systems with
this relative angle would be

\[ \cos(i\varphi) \;=\; \cosh(\varphi) \;=\; \frac{1}{\sqrt{1 - v^2}} \]

which of course is the Lorentz-Fitzgerald factor that scales the transformation of space and
time intervals from one system of inertial coordinates to another, leading to the
relativistic Doppler effect, and so on. In other words, this factor represents the projection
of intervals in one frame onto the basis axes of another frame, just as the correlation
between the particle spin measurements is the projection of the spin vector onto the
respective measurement bases. Thus the "mysterious" and "spooky" correlations of
quantum mechanics can be placed in close analogy with the time dilation and length
contraction effects of special relativity, which once seemed equally counterintuitive. The
spinor representation, which uses complex numbers to naturally combine spatial rotations
and "boosts" into a single elegant formalism, was discussed in Section 2.6. In this
context we can formulate a generalized "EPR experiment" allowing the two measurement
bases to differ not only in spatial orientation but also by a boost factor, i.e., by a state of
relative motion. The resulting unified picture shows that the peculiar aspects of quantum
mechanics can, to a surprising extent, be regarded as aspects of special relativity.
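This analogy is easy to check numerically; the following is a minimal sketch (the
function names are merely illustrative, and we use units with c = 1):

    import numpy as np

    def spin_correlation(theta):
        # QM correlation for entangled spin-1/2 measurements at relative angle theta
        return -np.cos(theta)

    def boost_correlation(v):
        # Treat the boost as a rotation through the imaginary angle i*phi,
        # with phi = arctanh(v); then cos(i*phi) = cosh(phi) = 1/sqrt(1 - v^2).
        phi = np.arctanh(v)
        return np.cos(1j * phi).real

    print(spin_correlation(np.pi / 3))   # -0.5
    v = 0.6
    print(boost_correlation(v))          # 1.25
    print(1 / np.sqrt(1 - v**2))         # 1.25, the Lorentz factor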
In a sense, relativity and quantum theory could be summarized as two different strategies
for accommodating the peculiar wave-particle duality of physical phenomena. One of the
problems this duality presented to classical physics was that apparently light could either
be treated as an inertial particle emitted at a fixed speed relative to the source, à la Newton
and Ritz, or it could be treated as a wave with a speed of propagation fixed relative to the
medium and independent of the source, à la Maxwell. But how can it be both? Relativity
essentially answered this question by proposing a unified spacetime structure with an
indefinite metric (viz., a pseudo-Riemannian metric). This is sometimes described by
saying time is imaginary, so that its square contributes negatively to the line element,
which yields an invariant null-cone structure for light propagation, and hence an
invariant light speed.
But waves and particles also differ with regard to interference effects, i.e., light can be
treated as a stream of inertial particles with no interference (though perhaps with "fits and
starts"), à la Newton, or as a wave with fully wavelike interference effects, à la Huygens.
Again the question was how to account for the fact that light exhibits both of these
characteristics. Quantum mechanics essentially answered this question by proposing that
observables are actually expressible in terms of probability amplitudes, and these
amplitudes contain an imaginary component which, upon taking the norm, can contribute
negatively to the probabilities, yielding interference effects.
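Schematically (a generic two-path illustration, not tied to any particular experiment): if an
outcome can be reached by two indistinguishable routes with complex amplitudes a_1 and
a_2, the resulting probability is

\[ P \;=\; |a_1 + a_2|^2 \;=\; |a_1|^2 + |a_2|^2 + 2\,\mathrm{Re}\!\left(a_1 \overline{a_2}\right) \]

and the cross term can be negative, suppressing the probability below the classical sum of
the two individual probabilities.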
Thus we see that both of these strategies can be expressed in terms of the introduction of
imaginary (in the mathematical sense) components in the descriptions of physical
phenomena, yielding the possibility of cancellations in, respectively, the spacetime
interval and superposition probabilities (i.e., interference). They both attempt to reconcile
aspects of the wave-particle duality of physical entities. The intimate correspondence
between relativity and quantum theory was not lost on Niels Bohr, who remarked in his
Warsaw lecture in 1938
Even the formalisms, which in both theories within their scope offer adequate
means of comprehending all conceivable experience, exhibit deep-going
analogies. In fact, the astounding simplicity of the generalisation of classical
physical theories, which are obtained by the use of multidimensional [non-positive-definite]
geometry and non-commutative algebra, respectively, rests in both cases essentially
on the introduction of the conventional symbol √(-1).
The abstract character of the formalisms concerned is indeed, on closer
examination, as typical of relativity theory as it is of quantum mechanics, and it is
in this respect purely a matter of tradition if the former theory is considered as a
completion of classical physics rather than as a first fundamental step in the
thorough-going revision of our conceptual means of comparing observations,
which the modern development of physics has forced upon us.
Of course, Bernhard Riemann, who founded the mathematical theory of differential
geometry that became general relativity, also contributed profound insights to the theory
of complex functions, the Riemann sphere (Section 2.6), Riemann surfaces, and so on.
(Here too, as in the case of differential geometry, Riemann built on and extended the
ideas of Gauss, who was among the first to conceive of the complex number plane.) More
recently, Roger Penrose has argued that some "complex number magic" seems to be at
work in many of the most fundamental physical processes, and his twistor formalism is
an attempt to find a framework for physics that exploits the special properties of
complex functions at a fundamental level.
Modern scientists are so used to complex numbers that, in some sense, the mystery is
now reversed. Instead of being surprised at the physical manifestations of imaginary and
complex numbers, we should perhaps wonder at the preponderance of realness in the
world. The fact is that, although the components of the state vector in quantum mechanics
are generally complex, the measurement operators are all required by fiat to be
Hermitian, meaning that they have strictly real eigenvalues. In other words, while the
state of a physical system is allowed to be complex, the result of any measurement is
always necessarily real. So we can't claim that nature is indifferent to the distinction
between real and imaginary numbers. This suggests to some people a connection between
the measurement problem in quantum mechanics and the ontological status of
imaginary numbers.
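This requirement is easy to illustrate numerically; the sketch below (generic, using
numpy) builds a random Hermitian operator and confirms that its eigenvalues are real:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    H = (M + M.conj().T) / 2        # Hermitian part: H equals its conjugate transpose

    eigenvalues = np.linalg.eigvals(H)
    print(np.max(np.abs(eigenvalues.imag)))   # ~1e-16: real up to round-off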
The striking similarity between special relativity and quantum mechanics can be traced to
the fact that, in both cases, two concepts that were formerly regarded as distinct and
independent are found not to be so. In the case of special relativity, the two concepts are
space and time, whereas in quantum mechanics the two concepts are position and
momentum. Not surprisingly, these two pairs of concepts are closely linked, with space
corresponding to position, and time corresponding to momentum (the latter representing
the derivative of position with respect to time). Considering the Heisenberg uncertainty
relation, it's tempting to paraphrase Minkowski's famous remark, and say that henceforth
position by itself, and momentum by itself, are doomed to fade away into mere shadows,
and only a kind of union of the two will preserve an independent reality.
9.9 Locality and Temporal Asymmetry
All these fifty years of conscious brooding have brought me no nearer to
the answer to the question, 'What are light quanta?' Nowadays every Tom,
Dick and Harry thinks he knows it, but he is mistaken.
Einstein, 1954
We've seen that the concept of locality plays an important role in the EPR thesis and the
interpretation of Bell's inequalities, but what precisely is the meaning of locality,
especially in a quasi-metric spacetime in which the triangle inequality doesn't hold? The
general idea of locality in physics is based on some concept of nearness or proximity, and
the assertion that physical effects are transmitted only between suitably "nearby" events.
From a relativistic standpoint, locality is often defined as the proposition that all causal
effects of a particular event are restricted to the interior (or surface) of the future null
cone of that event, which effectively prohibits communication between spacelike-separated events (i.e., no faster-than-light communication). However, this restriction
clearly goes beyond a limitation based on proximity, because it specifies the future null
cone, thereby asserting a profound temporal asymmetry in the fundamental processes of
nature.
What is the basis of this asymmetry? It certainly is not apparent in the form of the
Minkowski metric, nor in Maxwell's equations. In fact, as far as we know, all the
fundamental processes of nature are perfectly time-symmetric, with the single exception
of certain processes involving the decay of neutral kaons. However, even in that case, the
original experimental evidence in 1964 for violation of temporal symmetry was actually a
demonstration of asymmetry under combined charge conjugation and parity (CP), from
which temporal asymmetry is indirectly inferred on the basis of the CPT Theorem. As recently as 1999
there were still active experimental efforts to demonstrate temporal asymmetry directly.
In any case, aside from the single rather subtle peculiarity in the behavior of neutral
kaons, no one has ever found any evidence at all of temporal asymmetry in any
fundamental interaction. How, then, do we justify the explicit temporal asymmetry in our
definition of locality for all physical interactions?
As an example, consider electromagnetic interactions, and recall that the only invariant
measure of proximity (nearness) in Minkowski spacetime is the absolute interval

\[ (\Delta s)^2 \;=\; c^2(\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2 \]
which is zero between the emission and absorption of a photon. Clearly, any claim that
influence can flow from the emission event to the absorption event but not vice versa
cannot be based on an absolute concept of physical nearness. Such a claim amounts to
nothing more or less than an explicit assertion of temporal asymmetry for the most
fundamental interactions, despite the complete lack of justification or evidence for such
asymmetry in photon interactions. Einstein commented on the unnaturalness of
irreversibility in fundamental interactions in a 1909 paper on electromagnetic radiation,
in which he argued that the asymmetry of the elementary process of radiation according
to the classical wave theory of light was inconsistent with what we know of other
elementary processes.
While in the kinetic theory of matter there exists an inverse process for every
process in which only a few elementary particles take part (e.g., for every
molecular collision), according to the wave theory this is not the case for
elementary radiation processes. According to the prevailing theory, an oscillating
ion produces an outwardly propagated spherical wave. The opposite process does
not exist as an elementary process. It is true that the inwardly propagated
spherical wave is mathematically possible, but its approximate realization requires
an enormous number of emitting elementary structures. Thus, the elementary
process of light radiation as such does not possess the character of reversibility.
Here, I believe, our wave theory is off the mark. Concerning this point the
Newtonian emission theory of light seems to contain more truth than does the
wave theory, since according to the former the energy imparted at emission to a
particle of light is not scattered throughout infinite space but remains available for
an elementary process of absorption.
In the same paper he wrote
For the time being the most natural interpretation seems to me to be that the
occurrence of electromagnetic fields of light is associated with singular points just
like the occurrence of electrostatic fields according to the electron theory. It is not
out of the question that in such a theory the entire energy of the electromagnetic
field might be viewed as localized in these singularities, exactly like in the old
theory of action at a distance.


This is a remarkable statement coming from Einstein, considering his deep commitment
to the ideas of locality and the continuum. The paper is also notable for containing his
premonition about the future course of physics:
Today we must regard the ether hypothesis as an obsolete standpoint. It is
undeniable that there is an extensive group of facts concerning radiation that
shows that light possesses certain fundamental properties that can be understood
far more readily from the standpoint of Newton's emission theory of light than
from the standpoint of the wave theory. It is therefore my opinion that the next
stage in the development of theoretical physics will bring us a theory of light that
can be understood as a kind of fusion of the wave and emission theories of light.
Likewise in a brief 1911 paper on the light quantum hypothesis, Einstein presented
reasons for believing that the propagation of light consists of a finite number of energy
quanta which move without dividing, and can be absorbed and generated only as a
whole. Subsequent developments (quantum electrodynamics) have incorporated these
basic insights, leading us to regard a photon (i.e., an elementary interaction) as an
indivisible whole, including the null-separated emission and absorption events on a
symmetrical footing. This view is supported by the fact that once a photon is emitted, its
quantum phase does not advance while "in flight", because quantum phase is proportional
to the absolute spacetime interval, which, as discussed in Section 2.1, is what gives the
absolute interval its physical significance. If we take seriously the spacetime interval as
the absolute measure of proximity, then the transmission of a photon is, in some sense, a
single event, coordinated mutually and symmetrically between the points of emission and
absorption.
This image of a photon as a single unified event with a coordinated emission and
absorption seems unsatisfactory to many people, partly because it doesn't allow for the
concept of a "free photon", i.e., a photon that was never emitted and is never absorbed.
However, it's worth remembering that we have no direct experience of "free photons",
nor of any "free particles", because ultimately all our experience is composed of
completed interactions. (Whether this extends to gravitational interactions is an open
question.) Another possible objection to the symmetrical view of elementary interactions
is that it doesn't allow for a photon to have wave properties, i.e., to have an evolving state
while "in flight", but this objection is based on a misconception. From the standpoint of
quantum electrodynamics, the wave properties of electromagnetic radiation are actually
wave properties of the emitter. All the potential sources of a photon have a certain
(complex) amplitude for photon emission, and this amplitude evolves in time as we
progress along the emitter's worldline. However, as noted above, once a photon is
emitted, its phase does not advance. In a sense, the ancients who conceived of sight as
something like a blind man's incompressible cane, feeling distant objects, were correct,
because our retinas actually are in "direct" contact, via null intervals, with the sources of
light. The null interval plays the role of the incompressible cane, and the wavelike
properties we "feel" are really the advancing quantum phases of the source.

One might think that the reception amplitude for an individual photon must evolve as a
function of its position, because if we had (counterfactually) encountered a particular
photon one meter further away from its source than we did, we would surely have found
it with a different phase. However, this again is based on a misconception, because the
photon we would have received one meter further away (on the same timeslice) would
necessarily have been emitted one light-meter earlier, carrying the corresponding phase of
the emitter at that point on its worldline. When we consider different spatial locations
relative to the emitter, we have to keep clearly in mind which points they correspond to
along the worldline of the emitter.
Taking another approach, it might seem that we could "look at" a single photon at
different distances from the emitter (trying to show that its phase evolves in flight) by
receding fast enough from the emitter so that the relevant emission event remains
constant, but of course the only way to do this would be to recede at the speed of light
(i.e., along a null interval), which isn't possible. This is just a variation of the young
Einstein's thought experiment about how a "standing wave" of light would appear to
someone riding alongside it. The answer, of course, is that it's not possible for a material
object to move alongside a pulse of light (in vacuum), because light exists only as
completed interactions on null intervals. If we attempted such an experiment, we would
notice that, as our speed of recession from the source gets closer to c, the difference
between the phases of the photons we receive becomes smaller (i.e., the "frequency" of
the light gets red-shifted), and approaches zero, which is just what we should expect
based on the fact that each photon is simply the lightlike null projection of the emitter's
phase at a point on the emitter's worldline. Hence, if we stay on the same projection ray
(null interval), we are necessarily looking at the same phase of the emitter, and this is true
everywhere on that null ray. This leads to the view that the concept of a "free photon" is
meaningless, and a photon is nothing but the communication of an emitter event's phase
to some null-separated absorber event, and vice versa.
More generally, since the relativistic Schrodinger wave function propagates at c, it follows that every
fundamental quantum interaction can be regarded as propagating on null surfaces. Dirac
gave an interesting general argument for this strong version of Huygens' Principle in the
context of quantum mechanics. In his "Principles of Quantum Mechanics" he noted that a
measurement of a component of the instantaneous velocity of a free electron must give
the value c, which implies that electrons (and massive particles in general) always
propagate along null intervals, i.e., on the local light cone. At first this may seem to
contradict the fact that we observe massive objects to move at speeds much less than the
speed of light, but Dirac points out that observed velocities are always average velocities
over appreciable time intervals, whereas the equations of motion of the particle show that
its velocity oscillates between +c and -c in such a way that the mean value agrees with
the observed value. He argues that this must be the case in any relativistic theory that
incorporates the uncertainty principle, because in order to measure the velocity of a
particle we must measure its position at two different times, and then divide the change in
position by the elapsed time. To approximate as closely as possible to the instantaneous
velocity, the time interval must go to zero, which implies that the position measurements
must approach infinite precision. However, according to the uncertainty principle, the
extreme precision of the position measurement implies an approach to infinite
indeterminacy in the momentum, which means that almost all values of momentum, from
zero to infinity, become equally probable. Hence the momentum is almost certainly
infinite, which corresponds to a speed of c. This is obviously a very general argument,
and applies to all massive particles (not just fermions). This oscillatory propagation on
null cones is discussed further in Section 9.10.
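Schematically, Dirac's argument compresses into a single chain (using the uncertainty
relation together with E² = m²c⁴ + p²c²):

\[ \Delta t \to 0 \;\Rightarrow\; \Delta x \to 0 \;\Rightarrow\; \Delta p \,\gtrsim\, \frac{\hbar}{\Delta x} \to \infty \;\Rightarrow\; |v| \;=\; \frac{|p|\,c^2}{\sqrt{m^2 c^4 + p^2 c^2}} \;\to\; c \]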
Another argument that seems to favor a temporally symmetric view of fundamental
interactions comes from consideration of the exchange of virtual photons. (Whether
virtual particles deserve to be called "real" particles is debatable; many people prefer to
regard them only as sometimes useful mathematical artifacts, terms in the expansion of
the quantum field, with no ontological status. On the other hand, it's possible to regard all
fundamental particles that way, so in this respect virtual particles are not unique.) The
emission and absorption points of virtual particles may be space-like separated, and we
therefore can't say unambiguously that one happened "before" the other. The temporal
order is dependent on the reference frame. Surely in these circumstances, when it's not
even possible to say absolutely which side of the interaction was the emission and which
was the absorption, those who maintain that fundamental interactions possess an inherent
temporal asymmetry have a very difficult case to make. Over limited ranges, a similar
argument applies to massive particles, since there is a non-negligible probability of a
particle traversing a spacelike interval if its absolute magnitude is less than about
h²/(2πm)², with h being Planck's constant and m the mass of the particle (in units
where c = 1). So, if virtual particle
interactions are time-symmetric, why not all fundamental particle interactions? (Needless
to say, time-symmetry of fundamental quantum interactions does not preclude asymmetry
for macroscopic processes involving huge numbers of individual quantum interactions
evolving from some, possibly very special, boundary conditions.)
Experimentally, those who argue that the emission of a photon is conditioned by its
absorption can point to the results from tests of Bell's inequalities, because the observed
violations of those inequalities are exactly what the symmetrical model of interactions
would lead us to expect. Nevertheless, the results of those experiments are rarely
interpreted as lending support to the symmetrical model, apparently because temporal
asymmetry is so deeply ingrained in people's intuitive conceptions of locality, despite the
fact that there is very little (if any) direct evidence of temporal asymmetry in any
fundamental laws or interactions.
Despite the preceding arguments in favor of symmetrical (reversible) fundamental
processes, there are clearly legitimate reasons for being suspicious of unrestricted
temporal symmetry. If it were possible for general information to be transmitted
efficiently along the past null cone of an event, this would seem to permit both causal
loops and causal interactions with spacelike-separated events, as illustrated below.

[Figure: causal loops and spacelike influences formed by chaining null paths]

On such a basis, it might seem as if the Minkowskian spacetime manifold would be
incapable of supporting any notion of locality at all. The triangle inequality fails in this
manifold, so there are null paths connecting every two points, and this applies even to
spacelike-separated points if we allow the free flow of information in either direction
along null surfaces. Indeed this seems to have been the main source of Einstein's
uneasiness with the "spooky" entanglements entailed by quantum theory. In a 1948 letter
to Max Born, Einstein tried to clearly articulate his concern with entanglement, which he
regarded as incompatible with "the confidence I have in the relativistic group as
representing a heuristic limiting principle".
It is characteristic of physical objects [in the world of ideas] that they are thought
of as arranged in a space-time continuum. An essential aspect of this arrangement
of things in physics is that they lay claim, at a certain time, to an existence
independent of one another, provided these objects are situated in different parts
of space. Unless one makes this kind of assumption about the independence of
the existence (the 'being-thus') of objects which are far apart from one another in
space the idea of the existence of (quasi) isolated systems, and thereby the
postulation of laws which can be checked empirically in the accepted sense,
would become impossible.
In essence, he is arguing that without the assumption that it is possible to localize
physical systems, consistent with the relativistic group, in such a way that they are
causally isolated, we cannot hope to analyze events in any effective way, such that one
thing can be checked against another. After describing how quantum mechanics leads
unavoidably to entanglement of potentially distant objects, and therefore dispenses with
the principle of locality (in Einstein's view), he says
When I consider the physical phenomena known to me, even those which are
being so successfully encompassed by quantum mechanics, I still cannot find any
fact anywhere which would make it appear likely that the requirement [of
localizability] will have to be abandoned.
At this point the precise sense in which quantum mechanics entails non-classical
influences (or rather, correlations) for space-like separated events had not yet been
clearly formulated, and the debate between Born and Einstein suffered (on both sides)
from this lack of clarity. Einstein seems to have intuited that quantum mechanics does
indeed entail distant correlations that are inconsistent with very fundamental classical
notions of causality and independence, but he was unable to formulate those correlations
clearly. For his part, Born outlined a simple illustration of quantum correlations occurring
in the passage of light rays through polarizing filters which is exactly the kind of
experiment that, twenty years later, provided an example of the very thing that Einstein
said he had been unable to find, i.e., a fact which makes it appear that the requirement of
localizability must be abandoned. It's unclear to what extent Born grasped the non-classical implications of those phenomena, which isn't surprising, since the Bell
inequalities had not yet been formulated. Born simply pointed out that quantum
mechanics allows for coherence, and said that this "does not go too much against the
grain with me."
Born often argued that classical mechanics was just as probabilistic as quantum
mechanics, although his focus was on chaotic behavior in classical physics, i.e.,
exponential sensitivity to initial conditions, rather than on entanglement. Born and
Einstein often seemed to be talking past each other, since Born focused on the issue of
determinism, whereas Einstein's main concern was localizability. Remarkably, Born
concluded his reply by saying
I believe that even the days of the relativistic group, in the form you gave it, are
numbered.
One might have thought that experimental confirmation of quantum entanglement would
have vindicated Born's forecast, but we now understand that the distant correlations
implied by quantum mechanics (and confirmed experimentally) are of a subtle kind that
do not violate the relativistic group. This seems to be an outcome that neither Einstein
nor Born anticipated; Born was right that the distant entanglement implicit in quantum
mechanics would be proven correct, but Einstein was right that the relativistic group
would emerge unscathed. But how is this possible? Considering that non-classical distant
correlations have now been experimentally established with high confidence, thereby
undermining the classical notion of localizability, how can we account for the continued
ability of physicists to formulate and test physical laws?
The failure of the triangle inequality (actually, the reversal of it) does not necessarily
imply that the manifold is unable to support non-trivial structure. There are absolute
distinctions between the sets of null paths connecting spacelike separated events and the
sets of null paths connecting timelike separated events, and these differences might be
exploited to yield a structure that conforms with the results of observation. There is no
reason this cannot be a "locally realistic" theory, provided we understand that locality in a
quasi-metric manifold is non-transitive. Realism is simply the premise that the results of
our measurements and observations are determined by an objective world, and it's
perfectly possible that the objective world might possess a non-transitive locality,
commensurate with the non-transitive metrical aspects of Minkowski spacetime. Indeed,
even before the advent of quantum mechanics and the tests of Bell's inequality, we should
have learned from special relativity that locality is not transitive, and this should have led
us to expect non-Euclidean connections and correlations between events, not just
metrically, but topologically as well. From this point of view, many of the seeming
paradoxes associated with quantum mechanics and locality are really just manifestations
of the non-intuitive fact that the manifold we inhabit does not obey the triangle inequality
(which is one of our most basic spatio-intuitions), and that elementary processes are
temporally reversible.
On the other hand, we should acknowledge that the Bell correlations can't be explained in
a locally realistic way simply by invoking the quasi-metric structure of Minkowski
spacetime, because if the timelike processes of nature were ontologically continuous it
would not be possible to regard them as propagating on null surfaces. We also need our
fundamental physical processes to consist of irreducible discrete interactions, as
discussed in Section 9.10.
9.10 Spacetime Mediation of Quantum Interactions
No reasonable definition of reality could be expected to permit this.
Einstein, Podolsky, and Rosen, 1935
According to general relativity the shape of spacetime determines the motions of objects
while those objects determine (or at least influence) the shape of spacetime. Similarly in
electrodynamics the fields determine the motions of charges in spacetime while the
charges determine the fields in spacetime. This dualistic structure naturally arises when
we replace action-at-a-distance with purely local influences in such a way that the
interactions between "separate" objects are mediated by an entity extending between
them. We must then determine the dynamical attributes of this mediating entity, e.g., the
electromagnetic field in electrodynamics, or spacetime itself in general relativity.
However, many common conceptions regarding the nature and extension of these
mediating entities are called into question by the apparently "non-local" correlations in
quantum mechanics, as highlighted by EPR experiments. The apparent non-locality of
these phenomena arises from the fact that although we regard spacetime as metrically
Minkowskian, we continue to regard it as topologically Euclidean. As discussed in the
preceding sections, the observed phenomena are more consistent with a completely
Minkowskian spacetime, in which physical locality is directly induced by the pseudometric of spacetime. According to this view, spacetime operates on matter via
interactions, and matter defines for spacetime the set of allowable interactions, i.e.,
consistent with conservation laws. A quantum interaction is considered to originate on (or
be "mediated" by) the locus of spacetime points that are null-separated from each of the
interacting sites. In general this locus is a quadratic surface in spacetime, and its surface
area is inversely proportional to the mass of the transferred particle.
For two timelike-separated events A and B the mediating locus is a closed surface as
illustrated below (with one of the spatial dimensions suppressed)

[Figure: spherical mediating locus, shown as a dotted circle, for timelike-separated events A and B]
The mediating surface is shown here as a dotted circle, but in 4D spacetime it's actually a
closed surface, spherical and purely spacelike relative to the frame of the interval AB.
This type of interaction corresponds to the transit of massive real particles. Of course,
relative to a frame in which A and B are in different spatial locations, the locus of
intersection has both timelike and spacelike extent, and is an ellipse (or rather an
ellipsoidal surface in 4D) as illustrated below

[Figure: ellipsoidal mediating locus relative to a frame in which A and B are spatially separated]
The surface is purely spacelike and isotropic only when evaluated relative to its rest
frame (i.e., the frame of the interval AB), whereas this surface maps to a spatial ellipsoid,
consisting of points that are no longer simultaneous, relative to any other co-moving
frame. The directionally asymmetric aspects of the surface area correspond precisely to
the "relativistic mass" components of the corresponding particles as a function of the
relative velocity of the frames.
The propagation of a free massive particle along a timelike path through spacetime can be
regarded as involving a series of surfaces, from which emanate inward-going "waves"
along the null cones in both the forward and backward direction, deducting the particle
from the past focal point and adding it to the future focal point, as shown below for
particles with different masses.

[Figure: successive mediating surfaces for free particles of different masses]
Recall that the frequency of the de Broglie matter wave of a particle of mass m is

\[ \nu \;=\; \frac{1}{h}\,\sqrt{(mc^2)^2 + (p_x^2 + p_y^2 + p_z^2)\,c^2} \]

where p_x, p_y, p_z are the components of momentum in the three directions. For a
(relatively) stationary particle the momenta vanish and the frequency is just ν =
(mc²)/h sec⁻¹. Hence the time per cycle is inversely proportional to the mass. So, since
each cycle consists of an advanced and a retarded cone, the surface of intersection is a
sphere (for a stationary mass particle) of radius r = h/mc, because this is how far along
the null cones the wave propagates during one cycle. Of course, h/mc is just the Compton
scattering wavelength of a particle of mass m, which characterizes the spatial expanse
over which a particle tends to "scatter" incident photons in a characteristic way. This can
be regarded as the effective size of a particle when "viewed" by means of gamma-rays.
We may conceive of this effect being due to a high-energy photon getting close enough to
the nominal worldline of the massive particle to interfere with the null surfaces of
propagation, upsetting the phase coherence of the null waves and thereby diverting the
particle from its original path.
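For concreteness, a small numerical sketch with rounded constants (the electron is
chosen merely as an example):

    # Rest-frame phase frequency and Compton radius for an electron
    h = 6.626e-34     # Planck's constant, J*s
    c = 2.998e8       # speed of light, m/s
    m = 9.109e-31     # electron mass, kg

    nu = m * c**2 / h     # de Broglie rest frequency, ~1.24e20 cycles per second
    r = h / (m * c)       # Compton wavelength h/mc, ~2.43e-12 meters
    print(nu, r)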
For a massless particle the quantum phase frequency is zero, and a completely free
photon (if such a thing existed) would just be represented by an entire null-cone. On the
other hand, real photons are necessarily emitted and absorbed, so they correspond to
bounded null intervals. Consistent with quantum electrodynamics, the quantum phase of a
photon does not advance while in transit between its emission and absorption (unlike
massive particles). According to this view, the oscillatory nature of macroscopic
electromagnetic waves arises from the advancing phase of the source, rather than from
any phase activity of an actual photon.
The spatial volume swept out by a mediating surface is a maximum when evaluated with
respect to its rest frame. When evaluated relative to any other frame of reference, the
spatial contraction causes the swept volume to be reduced. This is consistent with the
idea that the effective mass of a particle is inversely proportional to the swept volume of
the propagating surface, and it's also consistent with the effective range of mediating
particles being inversely proportional to their mass, since the electromagnetic force

mediated by massless photons has infinite range, whereas the strong nuclear force has a
very limited range because it is mediated by massive particles. Schematics of a stationary
and a moving particle are shown below.

[Figure: mediating surfaces for a stationary particle and for a moving particle]
This is the same illustration that appeared in the discussion of Lorentz's "corresponding
states" in Section 1.5, although in that context the shells were understood to be just
electromagnetic waves, and Lorentz simply conjectured that all physical phenomena
conform to this same structure and transform similarly. In a sense, the relativistic
Schrodinger wave equation and Dirac's general argument for light-like propagation of all
physical entities based on the combination of relativity and quantum mechanics (as
discussed in Section 9.9) provide the modern justification for Lorentz's conjecture.
Looking back even further, we see that by conceiving of a particle as a sequence of
surfaces of finite extent, it is finally possible to answer Zeno's question about how a
moving particle differs from a stationary particle in "a single instant". The difference is
that the mediating surfaces of a moving particle are skewed in spacetime relative to those
of a stationary particle, corresponding to their respective planes of simultaneity.
Some quantum interactions involve more than two particles. For example, if two coupled
particles separate at point A and interact with particles at points B and C respectively, the
interaction (viewed straight from the side) looks like this:

[Figure: intersecting mediating circles for the pairs AB and AC]
The mediating surface for the pair AB intersects with the mediating surface for AC at the
two points of intersection of the dotted circles, but in full 4D spacetime the intersection of
the two mediating spheres is a closed circle. (It's worth noting that these two surfaces
intersect if and only if B and C are spacelike separated. This circle enforces a particular
kind of consistency on any coherent waves that are generated on the two mediating
surfaces, and is responsible for "EPR"-type correlation effects.)
The locus of null-separated points for two lightlike-separated events is a degenerate
quadratic surface, namely, a straight line as represented by the segment AB below:

[Figure: degenerate mediating locus, the null segment AB]
The "surface area" of this locus (the intersection of the two cones) is necessarily zero, so
these interactions represent the transits of massless particles. For two spacelike-separated
events the mediating locus is a two-part hyperboloid surface, represented by the
hyperbola shown at the intersection of two null cones below

[Figure: two-part hyperboloidal mediating locus for spacelike-separated events]
This hyperboloid surface has infinite area, which suggests that any interaction between
spacelike separated events would correspond to the transit of an infinitely massive
particle. On this basis it seems that these interactions can be ruled out. There is, however,
a limited sense in which such interactions might be considered. Recall that a
pseudosphere can be represented as a sphere with purely imaginary radius. It's
conceivable that observed interactions involving virtual (conjugate) pairs of particles over
spacelike intervals (within the limits imposed by the uncertainty relations) may
correspond to hyperboloid mediating surfaces.
(It's also been suggested that in a closed universe the "open" hyperboloid surfaces might
need to be regarded as finite, albeit extremely huge. For example, they might be 35 orders
of magnitude larger than the mediating surfaces for timelike interactions. This is related
to vague notions that "h" is in some sense the "inverse" of the size of a finite universe. In
a much smaller closed universe (as existed immediately following the big bang) there
may have been an era in which the "hyperboloid" surfaces had areas comparable to the
ellipsoid surfaces, in which case the distinction between spacelike and time-like
interactions would have been less significant.)
An interesting feature of this interpretation is that, in addition to the usual 3+1
dimensions, spacetime requires two more "curled up" dimensions of angular orientation
to represent the possible directions in space. The need to treat these as dimensions in their
own right arises from the non-transitive topology of the pseudo-Riemannian manifold.
Each point [t,x,y,z] actually consists of a two-dimensional orientation space, which can
be parameterized (for any fixed frame) in terms of ordinary angular coordinates θ and φ.
Then each point in the six-dimensional space with coordinates [x,y,z,t,θ,φ] is a terminus
for a unique pair of spacetime rays, one forward and one backward in time. A simple
mechanistic visualization of this situation is to imagine a tiny computer at each of these
points, reading its input from the two rays and sending (matched conservative) outputs on

the two rays. This is illustrated below in the xyt space:

[Figure: the pair of null rays terminating at a point of the 6D space, shown in xyt coordinates]
The point at the origin of these two views is on the mediating surface of events A and B.
Each point in this space acts purely locally on the basis of purely local information.
Specifying a preferred polarity for the two null rays terminating at each point in the 6D
space, we automatically preclude causal loops and restrict information flow to the future
null cone, while still preserving the symmetry of wave propagation. (Note that an
essential feature of spacetime mediation is that both components of a wave-pair are
"advanced", in the sense that they originate on a spherical surface, one emanating forward
and one backward in time, but both converge inward on the particles involved in the
interaction.)
According to this view, the "unoccupied points" of spacetime are elements of the 6D
space, whereas an event or particle is an element of the 4D space (t,x,y,z). In effect, an
event is the union of all the pairs of rays terminating at each point (x,y,z). We saw in
Section 2.6 that the transformations of θ and φ under Lorentzian boosts are beautifully
handled by linear fractional functions applied to their stereographic mappings on the
complex plane.
One common objection to the idea that quantum interactions occur locally between null-separated points is based on the observation that, although every point on the mediating
surface is null-separated from each of the interacting events, they are spacelike-separated
from each other, and hence unable to communicate or coordinate the generation of two
equal and opposite outgoing quantum waves (one forward in time and one backward in
time). The answer to this objection is that no communication is required, because the
"coordination" arises naturally from the context. The points on the mediating locus are
not communicating with each other, but each of them is in receipt of identical bits of
information from the two interaction events A and B. Each point responds independently
based on its local input, but the combined effect of the entire locus responding to the
same information is a coherent pair of waves.
Another objection to the "spacetime mediation" view of quantum mechanics is that it
relies on temporally symmetric propagation of quantum waves. Of course, this objection

can't be made on strictly mathematical grounds, because both Maxwell's equations and
the (relativistic) Schrodinger equation actually are temporally symmetric. The objection
seems to be motivated by the idea that the admittance of temporally symmetric waves
automatically implies that every event is causally implicated in every other event, if not
directly by individual interactions then by a chain of interactions, resulting in a nonsensical mess. However, as we've seen, the spacetime mediation view leads naturally to
the conclusion that interactions between spacelike-separated events are either impossible
or else of a very different (virtual) character than interactions along time-like intervals.
Moreover, the stipulation of a preferred polarity for the ray pairs terminating at each point
is sufficient to preclude causal loops.
Conclusion
I have made no more progress in the general theory of relativity. The electric field still
remains unconnected. Overdeterminism does not work. Nor have I produced anything for the
electron problem. Does the reason have to do with my hardening brain mass, or is the
redeeming idea really so far away?
Einstein to Ehrenfest, 1920

Despite the spectacular success of Einstein's theory of relativity, it is sometimes said that
tests of Bell's inequalities and similar quantum phenomena have demonstrated that nature
is, on a fundamental level, incompatible with the local realism on which relativity is
based. However, as we saw in Section 9.7, Bell's inequalities apply only to strictly nondeterministic theories, so, as Bell himself noted, they do not preclude "local realism" for a
fully deterministic theory. The entire framework of classical relativity, with its unified
spacetime and partial ordering of events, is founded on a strictly deterministic basis, so
Bell's inequalities do not apply. Admittedly the phenomena of quantum mechanics are
incompatible with at least some aspect of our classical (metrical) idea of locality, but this
should not be surprising, because (as discussed in the preceding sections) our metrical
idea of locality is already inconsistent with the pseudo-Riemannian metrical structure of
spacetime itself, which forms the basis of modern relativity.
It's tempting to conclude that while modern relativity initiated a revolution in our
thinking about the (pseudo-Riemannian) metrical structure of spacetime, with its singular
null rays and non-transitive equivalencies, the concomitant revolution in our thinking
about the topology of spacetime has lagged behind. Although we long ago decided that
the physically measurable intervals between the events of spacetime cannot be accurately
represented as the distances between the points of a Euclidean metric space, we continue
to assume that the topology of the set of spacetime events is (locally) Euclidean. This
incongruous state of affairs may be due in part to the historical circumstance that
Einstein's special relativity was originally viewed as simply an elegant interpretation of
the existing Lorentz ether theory. According to Lorentz, spacetime really was a Euclidean
manifold with the metric and topology of E4, on top of which was superimposed a set of
functions representing the operational temporal and spatial components of intervals.
It was possible to conceive of this because the singularities in the mapping between the
"real" and "operational" components along null directions implied by the Minkowski line
element were not necessarily believed to be physical. The validity of Lorentz invariance
was just being established "one order at a time", and it wasn't clear that it would be valid
to all orders. The situation was somewhat akin to the view of some people today, who
believe that although the field equations of general relativity predict a genuine singularity
at the center of a black hole, we may imagine that somehow the laws break down at some
point, or some other unknown effect takes over and the singularity is averted. Around
1905 people could think similar things about the implied singularity in the full n-order
Lorentz-Fitzgerald mapping between Lorentz's "real spacetime" and his operational
electromagnetic spacetime, i.e., they could imagine that the Lorentz invariance might
break down at some point short of the singularities. On this basis, we can make sense of
continuing to use the topology of E4. The original Euclidean topology of Lorentz's
absolute spacetime still lurks just beneath the surface of modern relativity.
However, if we make the judgement that Lorentz invariance applies strictly to all orders
(as Poincare suggested and Einstein brashly asserted in 1905), and the light-like
singularities of the Lorentz-Fitzgerald mapping are genuine physical singularities, albeit
in some unfamiliar non-transitive sense, and if we thoroughly disavow Lorentz's
underlying "real spacetime" (which plays no role in the theory) and treat the "operational
spacetime" itself as the primary ontological entity, then there seems reason to question
whether the assumption of E4 topology is still suitable. This is particularly true if a
topology more in accord with Lorentz invariance would also help to clarify some of the
puzzling phenomena of quantum mechanics.
Of course, it's entirely possible that the theory of relativity is simply wrong on some
fundamental level where quantum mechanics "takes over". In fact, this is probably the
majority view among physicists today, who hope that eventually a theory uniting gravity
and quantum mechanics will be found which will explain precisely how and in what
circumstances the classical theory of relativity fails to accurately represent the operations
of nature, while at the same time explaining why it seems to work as well as it does.
However, it may be worthwhile to remember previous periods in the history of physics
when the principle of relativity was judged to be fundamentally inadequate to account for
the observed phenomena. Recall Ptolemy's arguments against a moving Earth, or the 19th
century belief that electromagnetism necessitated a luminiferous ether, or the early-20th
century view that special relativity could never be reconciled with gravity. In each case a
truly satisfactory resolution of the difficulties was eventually achieved, not by discarding
relativity, but by re-interpreting and extending it, thereby gaining a fuller understanding
of its logical content and consequences.
Appendix: Mathematical Miscellany
1. Vector Products
The dot and cross products are often introduced via trigonometric functions and/or matrix
operations, but they also arise quite naturally from simple considerations of Pythagoras'
theorem. Given two points a and b in the three-dimensional vector space with Cartesian
coordinates (a_x, a_y, a_z) and (b_x, b_y, b_z) respectively, the squared distance between these two
points is

\[ |\mathbf{a}-\mathbf{b}|^2 \;=\; (a_x-b_x)^2 + (a_y-b_y)^2 + (a_z-b_z)^2 \]
If (and only if) these two vectors are perpendicular, the distance between them is the
hypotenuse of a right triangle with edge lengths equal to the lengths of the two vectors, so
we have

\[ |\mathbf{a}-\mathbf{b}|^2 \;=\; |\mathbf{a}|^2 + |\mathbf{b}|^2 \]
if and only if a and b are perpendicular. Equating these two expressions and canceling
terms, we arrive at the necessary and sufficient condition for a and b to be perpendicular

\[ a_x b_x + a_y b_y + a_z b_z \;=\; 0 \]
This motivates the definition of the left hand quantity as the "dot product" (also called the
scalar product) of the arbitrary vectors a = (a_x, a_y, a_z) and b = (b_x, b_y, b_z) as the scalar
quantity

\[ \mathbf{a}\cdot\mathbf{b} \;=\; a_x b_x + a_y b_y + a_z b_z \]
At the other extreme, suppose we seek an indicator of whether or not the vectors a and b
are parallel. In any case we know the squared length of the vector sum of these two
vectors is

\[ S^2 \;=\; |\mathbf{a}+\mathbf{b}|^2 \;=\; (a_x+b_x)^2 + (a_y+b_y)^2 + (a_z+b_z)^2 \]
We also know that S = |a| + |b| if and only if a and b are parallel, in which case we have

\[ S^2 \;=\; |\mathbf{a}|^2 + 2\,|\mathbf{a}||\mathbf{b}| + |\mathbf{b}|^2 \]
Equating these two expressions for S², canceling terms, and squaring both sides gives the
necessary and sufficient condition for a and b to be parallel

\[ \left(a_x b_x + a_y b_y + a_z b_z\right)^2 \;=\; \left(a_x^2+a_y^2+a_z^2\right)\left(b_x^2+b_y^2+b_z^2\right) \]
Expanding these expressions and canceling terms, this becomes

\[ a_x^2 b_y^2 + a_x^2 b_z^2 + a_y^2 b_x^2 + a_y^2 b_z^2 + a_z^2 b_x^2 + a_z^2 b_y^2 \;=\; 2\left(a_x a_y b_x b_y + a_x a_z b_x b_z + a_y a_z b_y b_z\right) \]
Notice that we can gather terms and re-write this equality as

\[ \left(a_y b_z - a_z b_y\right)^2 + \left(a_z b_x - a_x b_z\right)^2 + \left(a_x b_y - a_y b_x\right)^2 \;=\; 0 \]
Obviously a sum of squares can equal zero only if each term is individually zero, which
of course was to be expected, because two vectors are parallel if and only if their
components are in the same proportions to each other, i.e.,

\[ \frac{a_x}{b_x} \;=\; \frac{a_y}{b_y} \;=\; \frac{a_z}{b_z} \]
which represents the vanishing of the three terms in the previous expression. This
motivates the definition of the cross product (also known as the vector product) of two
vectors a = (a_x, a_y, a_z) and b = (b_x, b_y, b_z) as consisting of those three components, ordered
symmetrically, so that each component is defined in terms of the other two components
of the arguments, as follows

\[ \mathbf{a}\times\mathbf{b} \;=\; \left(a_y b_z - a_z b_y,\;\; a_z b_x - a_x b_z,\;\; a_x b_y - a_y b_x\right) \]
By construction, this vector is null if and only if a and b are parallel. Furthermore, notice
that the dot products of this cross product and each of the vectors a and b are identically
zero, i.e.,

\[ \mathbf{a}\cdot(\mathbf{a}\times\mathbf{b}) \;=\; 0, \qquad \mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}) \;=\; 0 \]
As we saw previously, the dot product of two vectors is 0 if and only if the vectors are
perpendicular, so this shows that a × b is perpendicular to both a and b. There is,
however, an arbitrary choice of sign, which is conventionally resolved by the "right-hand
rule". It can be shown that if θ is the angle between a and b, then a × b is a vector with
magnitude |a||b| sin(θ) and direction perpendicular to both a and b, according to the
right-hand rule. Similarly the scalar a · b equals |a||b| cos(θ).
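These definitions are easy to check numerically; the following is a minimal sketch in
plain Python:

    def dot(a, b):
        # scalar product: zero if and only if a and b are perpendicular
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

    def cross(a, b):
        # vector product: the null vector if and only if a and b are parallel
        return (a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0])

    a, b = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
    n = cross(a, b)
    print(dot(n, a), dot(n, b))    # 0.0 0.0 -- a x b is perpendicular to both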
2. Differentials
In Chapter 5.2 we gave an intuitive description of differentials such as dx and dy as
incremental quantities, but strictly speaking the actual values of differentials are arbitrary,
because only the ratios between them are significant. Differentials for functions of
multiple variables are just a generalization of the usual definitions for functions of a
single variable. For example, if we have z = f(x) then the differentials dz and dx are
defined as arbitrary quantities whose ratio equals the derivative of f(x) with respect to x.
Consequently we have dz/dx = f'(x), where f'(x) signifies the partial derivative ∂z/∂x, so
we can express this in the form

\[ dz \;=\; \frac{\partial z}{\partial x}\, dx \]
In this case the partial derivative is identical to the total derivative, because this f is
entirely a function of the single variable x.
If, now, we consider a differentiable function z = f(x,y) with two independent variables,
we can expand this into a power series consisting of a sum of (perhaps infinitely many)
terms of the form Ax^m y^n. Since x and y are independent variables we can suppose they
are each functions of a parameter t, so we can differentiate the power series term-by-term,
with respect to t, and each term will contribute a quantity of the form

\[ \frac{d}{dt}\left(A x^m y^n\right) \;=\; A\,m\,x^{m-1} y^n \frac{dx}{dt} \;+\; A\,n\,x^m y^{n-1} \frac{dy}{dt} \]
where, again, the differentials dx,dy,dz,dt are arbitrary variables whose ratios only are
constrained by this relation. The coefficient of dy/dt is the partial derivative of Ax^m y^n
with respect to y, and the coefficient of dx/dt is the partial with respect to x, and this will
apply to every term of the series. So we can multiply through by dt to arrive at the
result

\[ dz \;=\; \frac{\partial z}{\partial x}\, dx \;+\; \frac{\partial z}{\partial y}\, dy \]
The same approach can be applied to functions of arbitrarily many independent variables.
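This can be verified symbolically for any particular function; a minimal sketch using
sympy (assuming that library is available):

    import sympy as sp

    x, y, dx, dy = sp.symbols('x y dx dy')
    f = x**2 * y + sp.sin(y)                 # an arbitrary differentiable f(x, y)

    dz = sp.diff(f, x) * dx + sp.diff(f, y) * dy
    print(dz)                                # (2*x*y)*dx + (x**2 + cos(y))*dy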
A simple application of total differentials occurs in Section 3 of Einstein's 1905 paper
"On the Electrodynamics of Moving Bodies". In the process of deriving the function
τ(x',y,z,t) as part of the Lorentz transformation, Einstein arrives at his equation 3.1

\[ \tfrac{1}{2}\left[\tau(0,0,0,t_0) + \tau\!\left(0,0,0,\; t_0 + \tfrac{x'}{c-v} + \tfrac{x'}{c+v}\right)\right] \;=\; \tau\!\left(x',0,0,\; t_0 + \tfrac{x'}{c-v}\right) \]

where I've replaced his "t" with t_0 to emphasize that this is just the arbitrary value of t at
the origin of the light pulse. At this point Einstein says "Hence, if x' be chosen
infinitesimally small," and then he writes his equation 3.2

\[ \tfrac{1}{2}\left(\frac{1}{c-v} + \frac{1}{c+v}\right)\frac{\partial \tau}{\partial t} \;=\; \frac{\partial \tau}{\partial x'} \;+\; \frac{1}{c-v}\,\frac{\partial \tau}{\partial t} \]
Various explications of this step have appeared in the literature. For example, Miller says
"Einstein took x' to be infintesimal and expanded both sides of [3.1] into a series in x'.
Neglecting terms higher than first order the result is [3.2]." To put this differently,
Einstein simply evaluated the total differentials of both sides of the equation. For any
arbitrary continuous function (x',y,z,t) we have

Since the arguments of the first function on the left hand side of 3.1 are all constants,
we have dx' = dy = dz = dt = 0, so it contributes nothing to the total differential of the left
hand side. The arguments of the second function on the left are all constants except for
the t argument, which equals

\[ t_0 \;+\; \frac{x'}{c-v} \;+\; \frac{x'}{c+v} \]

so we have

\[ dt \;=\; \left(\frac{1}{c-v} + \frac{1}{c+v}\right) dx' \]

It follows that the total differential of the second function is

\[ \frac{\partial \tau}{\partial t}\left(\frac{1}{c-v} + \frac{1}{c+v}\right) dx' \]
Likewise the total differential of the function on the right hand side of 3.1 is

\[ \frac{\partial \tau}{\partial x'}\,dx' \;+\; \frac{1}{c-v}\,\frac{\partial \tau}{\partial t}\,dx' \]

So, equating the total differentials of the two sides of 3.1 gives

\[ \tfrac{1}{2}\left(\frac{1}{c-v} + \frac{1}{c+v}\right)\frac{\partial \tau}{\partial t}\,dx' \;=\; \frac{\partial \tau}{\partial x'}\,dx' \;+\; \frac{1}{c-v}\,\frac{\partial \tau}{\partial t}\,dx' \]
and dividing out the factor of dx' gives Einstein's equation 3.2.
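As a consistency check (this verification is ours, not part of Einstein's paper): the
function that ultimately satisfies equation 3.2, namely τ = a(t - vx'/(c² - v²)) for a
constant a, can be substituted back in symbolically:

    import sympy as sp

    xp, t, v, c, a = sp.symbols('xp t v c a')    # xp stands for x'
    tau = a * (t - v * xp / (c**2 - v**2))       # the solution of equation 3.2

    lhs = sp.Rational(1, 2) * (1/(c - v) + 1/(c + v)) * sp.diff(tau, t)
    rhs = sp.diff(tau, xp) + sp.diff(tau, t) / (c - v)
    print(sp.simplify(lhs - rhs))                # 0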
3. Differential Operators

The standard differential operators are commonly expressed as formal "vector products"
involving the ("del") symbol, which is defined as

where u_x, u_y, u_z are again unit vectors in the x, y, z directions. The scalar product of ∇
with an arbitrary vector field V is called the divergence of V, and is written explicitly as

\[ \nabla\cdot\mathbf{V} \;=\; \frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z} \]
The vector product of ∇ with an arbitrary vector field V is called the curl, given
explicitly by

\[ \nabla\times\mathbf{V} \;=\; \left(\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right)\mathbf{u}_x \;+\; \left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right)\mathbf{u}_y \;+\; \left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)\mathbf{u}_z \]
Note that the curl is applied to a vector field and returns a vector, whereas the divergence
is applied to a vector field but returns a scalar. For completeness, we note that a scalar
field Q(x,y,z) can be simply multiplied by the operator ∇ to give a vector, called the
gradient, as follows

\[ \nabla Q \;=\; \frac{\partial Q}{\partial x}\,\mathbf{u}_x \;+\; \frac{\partial Q}{\partial y}\,\mathbf{u}_y \;+\; \frac{\partial Q}{\partial z}\,\mathbf{u}_z \]
Another common expression is the sum of the second derivatives of a scalar field with
respect to the three directions, since this sum appears in the Laplace and Poisson
equations. Using the "del" operator this can be expressed as the divergence of the
gradient (or the "div grad") of the scalar field, as shown below.

\[ \nabla\cdot\nabla Q \;=\; \frac{\partial^2 Q}{\partial x^2} + \frac{\partial^2 Q}{\partial y^2} + \frac{\partial^2 Q}{\partial z^2} \]

For convenience, this operation is often written as ∇², and is called the Laplacian
operator. All the above operators apply to 3-vectors, but when dealing with 4-vectors in
Minkowski spacetime the analog of the Laplacian operator is the d'Alembertian operator

\[ \Box \;=\; \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - \frac{1}{c^2}\,\frac{\partial^2}{\partial t^2} \]
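These operators are easy to experiment with symbolically; for example, the following
sketch (using sympy) verifies the familiar identity that the curl of a gradient vanishes:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    Q = x**2 * y * sp.sin(z)                       # a sample scalar field

    grad = [sp.diff(Q, s) for s in (x, y, z)]      # gradient of Q
    curl = (sp.diff(grad[2], y) - sp.diff(grad[1], z),
            sp.diff(grad[0], z) - sp.diff(grad[2], x),
            sp.diff(grad[1], x) - sp.diff(grad[0], y))
    print([sp.simplify(ci) for ci in curl])        # [0, 0, 0]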

4. Differentiation of Vectors and Tensors

The easiest way to understand the motivation for the definitions of absolute and covariant
differentiation is to begin by considering the derivative of a vector field in three-
dimensional Euclidean space. Such a vector can be expressed in either contravariant or
covariant form as a linear combination of, respectively, the basis vectors u_1, u_2, u_3 or the
dual basis vectors u^1, u^2, u^3, as follows

\[ \mathbf{A} \;=\; A^i\,\mathbf{u}_i \;=\; A_i\,\mathbf{u}^i \]

where A^i are the contravariant components and A_i are the covariant components of A, and
the two sets of basis vectors satisfy the relations

\[ \mathbf{u}_i\cdot\mathbf{u}_j \;=\; g_{ij}, \qquad \mathbf{u}^i\cdot\mathbf{u}^j \;=\; g^{ij}, \qquad \mathbf{u}_i\cdot\mathbf{u}^j \;=\; \delta_i^{\,j} \]
where g_ij and g^ij are the covariant and contravariant metric tensors. The differential of A
can be found by applying the chain rule to either of the two forms, as follows

\[ d\mathbf{A} \;=\; dA^i\,\mathbf{u}_i + A^i\,d\mathbf{u}_i \;=\; dA_i\,\mathbf{u}^i + A_i\,d\mathbf{u}^i \qquad (1) \]
If the basis vectors u_i and u^i have a constant direction relative to a fixed Cartesian frame,
then du_i = du^i = 0, so the second term on the right vanishes, and we are left with the
familiar differential of a vector as the differential of its components. However, if the
basis vectors vary from place to place, the second term on the right is non-zero, so we
must not neglect this term if we are to allow curvilinear coordinates.
As we saw in Part 2 of this Appendix, for any quantity Q = f(x) and coordinates x^i we have

$$dQ = \frac{\partial Q}{\partial x^j}\,dx^j$$
so we can substitute for the three differentials in (1) and re-arrange terms to write the
resulting expressions as

$$\left(\frac{\partial \mathbf{A}}{\partial x^j} - \frac{\partial A^i}{\partial x^j}\mathbf{u}_i - A^i\frac{\partial \mathbf{u}_i}{\partial x^j}\right)dx^j = 0 \qquad \left(\frac{\partial \mathbf{A}}{\partial x^j} - \frac{\partial A_i}{\partial x^j}\mathbf{u}^i - A_i\frac{\partial \mathbf{u}^i}{\partial x^j}\right)dx^j = 0$$
Since these relations must hold for all possible combinations of dx^j, the quantities inside
parentheses must vanish, so we have the following relations between partial derivatives

$$\frac{\partial \mathbf{A}}{\partial x^j} = \frac{\partial A^i}{\partial x^j}\mathbf{u}_i + A^i\frac{\partial \mathbf{u}_i}{\partial x^j} \quad (2a) \qquad \frac{\partial \mathbf{A}}{\partial x^j} = \frac{\partial A_i}{\partial x^j}\mathbf{u}^i + A_i\frac{\partial \mathbf{u}^i}{\partial x^j} \quad (2b)$$
If we now let A^i_j and A_ij denote the projections of the derivatives (2a) and (2b)
respectively onto the ith dual and direct basis vectors, we have

$$A^i{}_j = \frac{\partial \mathbf{A}}{\partial x^j}\cdot\mathbf{u}^i \qquad A_{ij} = \frac{\partial \mathbf{A}}{\partial x^j}\cdot\mathbf{u}_i$$
and it can be verified that these are the components of second-order tensors of the types
indicated by their indices (superscripts being contravariant indices and subscripts being
covariant indices). If we multiply through (using the dot product) each term of (2a) by u^i,
and each term of (2b) by u_i, and recall that u_i · u^j = δ_i^j, we have

$$A^i{}_j = \frac{\partial A^i}{\partial x^j} + A^k\,\frac{\partial \mathbf{u}_k}{\partial x^j}\cdot\mathbf{u}^i \qquad A_{ij} = \frac{\partial A_i}{\partial x^j} + A_k\,\frac{\partial \mathbf{u}^k}{\partial x^j}\cdot\mathbf{u}_i \qquad (3)$$
For convenience we now define the three-index symbol

$$\Gamma^i_{jk} = \frac{\partial \mathbf{u}_j}{\partial x^k}\cdot\mathbf{u}^i$$
which is called the Christoffel symbol of the second kind. Although the Christoffel
symbol is not a tensor, it is very useful for expressing results on a metrical manifold with
a given system of coordinates. We also note that since the components of u_i · u^j are
constants (either 0 or 1), it follows that ∂(u_i · u^j)/∂x^k = 0, and expanding this partial
derivative by the product rule we find that

$$\frac{\partial \mathbf{u}^j}{\partial x^k}\cdot\mathbf{u}_i = -\frac{\partial \mathbf{u}_i}{\partial x^k}\cdot\mathbf{u}^j = -\Gamma^j_{ik}$$
Therefore, equations (3) can be written in terms of the Christoffel symbol as

$$A^i{}_j = \frac{\partial A^i}{\partial x^j} + A^k\,\Gamma^i_{kj} \qquad A_{ij} = \frac{\partial A_i}{\partial x^j} - A_k\,\Gamma^k_{ij} \qquad (4)$$
These are the covariant derivatives of, respectively, the contravariant and covariant forms
of the vector A. Obviously if the basis vectors are constant (as in Cartesian or oblique
coordinate systems) the Christoffel symbols vanish, and we are left with just the first
terms on the right sides of these equations. The second terms are needed only to account
for the change in basis with position of general curvilinear coordinates.
It might seem that these definitions of covariant differentiation depend on the fact that we
worked in a fixed Euclidean space, which enabled us to assign absolute meaning to the
components of the basis vectors in terms of an underlying Cartesian coordinate system.
However, it can be shown that the Christoffel symbols we've used here are the same as
the ones defined in Section 5.4 in the derivation of the extremal (geodesic) paths on a
curved manifold, wholly in terms of the intrinsic metric coefficients gij and their partial
derivatives with respect to the general coordinates on the manifold. This should not be
surprising, considering that the definition of the Christoffel symbols given above was in
terms of the basis vectors uj and their derivatives with respect to the general coordinates,
and noting that the metric tensor is just g_ij = u_i · u_j. Thus, with a bit of algebra we can
show that

$$\Gamma^i_{jk} = \frac{1}{2}\,g^{il}\left(\frac{\partial g_{lj}}{\partial x^k} + \frac{\partial g_{lk}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^l}\right) \qquad (5)$$
in agreement with Section 5.4. We regard equations (4) as the appropriate generalization
of differentiation on an arbitrary Riemannian manifold essentially by formal analogy with
the flat manifold case, by the fact that applying this operation to a tensor yields another
tensor, and perhaps most importantly by the fact that in conjunction with the
developments of Section 5.4 we find that the extremal metrical path (i.e., the geodesic
path) between two points is given by using this definition of "parallel transport" of a
vector pointed in the direction of the path, so the geodesic paths are locally "straight".
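The agreement between the two definitions of the Christoffel symbols can be verified
directly in a simple case. The following sympy sketch (our own illustration; the variable
names are ours) computes them for plane polar coordinates both from the basis vectors and
from the metric coefficients via equation (5):

import sympy as sp

r, th = sp.symbols('r theta', positive=True)
coords = (r, th)

# Position vector in an underlying Cartesian frame; basis vectors u_i = dR/dx^i
R = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])
u = [R.diff(q) for q in coords]
g = sp.Matrix(2, 2, lambda i, j: u[i].dot(u[j]))    # g_ij = u_i . u_j
ginv = g.inv()

# Dual basis u^i, satisfying u^i . u_j = delta^i_j
udual = [ginv[i, 0]*u[0] + ginv[i, 1]*u[1] for i in range(2)]

# Definition via basis vectors: Gamma^i_jk = (du_j/dx^k) . u^i
G_basis = [[[sp.simplify(u[j].diff(coords[k]).dot(udual[i]))
             for k in range(2)] for j in range(2)] for i in range(2)]

# Definition (5) via the metric: Gamma^i_jk = (1/2) g^il (g_lj,k + g_lk,j - g_jk,l)
G_metric = [[[sp.simplify(sum(ginv[i, l]*(g[l, j].diff(coords[k])
             + g[l, k].diff(coords[j]) - g[j, k].diff(coords[l]))
             for l in range(2))/2) for k in range(2)] for j in range(2)]
            for i in range(2)]

print(G_basis == G_metric)    # True; e.g. Gamma^r_theta,theta = -r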
Of course, when we allow curved manifolds, some new phenomena arise. On a flat
manifold the metric components may vary from place to place, but we can still determine
that the manifold is flat, by means of the Riemann curvature tensor described in Section
5.7. One consequence of flatness, obvious from the above derivation, is that if a vector is
transported parallel to itself around a closed path, it assumes its original orientation when
it returns to its original location. However, if the metric coefficients vary in such a way
that the Riemann curvature tensor is non-zero, then in general a vector that has been
transported parallel to itself around a closed loop will undergo a change in orientation.
Indeed, Gauss showed that the amount of deflection experienced by a vector as a result of
being parallel-transported around a closed loop is exactly proportional to the integral of
the curvature over the enclosed region.
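This deflection can be exhibited numerically. The sketch below (our own illustration, not
from the text) parallel-transports a vector around a circle of latitude on a unit sphere by
integrating dV^a = -Gamma^a_bc V^b dx^c, and compares the resulting deflection angle with
the integral of the Gaussian curvature (K = 1) over the enclosed polar cap:

import math

theta0 = 1.0                       # polar angle of the latitude circle
V = [1.0, 0.0]                     # initial components (V^theta, V^phi)
n = 100000
dphi = 2*math.pi/n

# Christoffel symbols of the unit sphere along the curve (theta held constant):
# Gamma^theta_phi,phi = -sin(theta)cos(theta), Gamma^phi_theta,phi = cot(theta)
st, ct = math.sin(theta0), math.cos(theta0)
for _ in range(n):
    dV_theta = st*ct*V[1]*dphi     # -Gamma^theta_phi,phi V^phi dphi
    dV_phi = -(ct/st)*V[0]*dphi    # -Gamma^phi_theta,phi V^theta dphi
    V[0] += dV_theta
    V[1] += dV_phi

# Deflection angle, using orthonormal components (V^theta, sin(theta) V^phi)
deflection = math.atan2(st*V[1], V[0]) % (2*math.pi)
cap_integral = 2*math.pi*(1 - ct)  # integral of K = 1 over the enclosed cap
print(deflection, cap_integral)    # both approximately 2.888 for theta0 = 1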
The above definition of covariant differentiation immediately generalizes to tensors of
any order. In general, the covariant derivative of a mixed tensor T consists of the
ordinary partial derivative of the tensor itself with respect to the coordinate x^k, plus a
term involving a Christoffel symbol for each contravariant index of T, minus a term
involving a Christoffel symbol for each covariant index of T. For example, if r is a
contravariant index and s is a covariant index, we have

$$T^r{}_{s;k} = \frac{\partial T^r{}_s}{\partial x^k} + T^m{}_s\,\Gamma^r_{mk} - T^r{}_m\,\Gamma^m_{sk}$$
It's convenient to remember that each Christoffel symbol in this expression has the index
of x^k in one of its lower positions, and also that the relevant index from T is carried by the
corresponding Christoffel symbol at the same level (upper or lower), and the remaining
index of the Christoffel symbol is a dummy that matches the relevant index position
in T.
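As a sketch of this index pattern (our own illustration, with hypothetical helper names),
the covariant derivative of a mixed tensor T^r_s can be written out programmatically; the
plus and minus Christoffel terms below correspond exactly to the rule just stated:

import sympy as sp

def covariant_derivative(T, G, coords):
    # T[r][s]   : components of the mixed tensor T^r_s
    # G[i][j][k]: Christoffel symbols Gamma^i_jk
    n = len(coords)
    return [[[sp.diff(T[r][s], coords[k])
              + sum(T[m][s]*G[r][m][k] for m in range(n))   # plus: contravariant index r
              - sum(T[r][m]*G[m][s][k] for m in range(n))   # minus: covariant index s
              for k in range(n)] for s in range(n)] for r in range(n)]

# Usage (with the polar-coordinate coords and G_metric from the sketch above):
#   covariant_derivative(T, G_metric, coords)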
One very important result involving the covariant derivative is known as Ricci's
Theorem. The covariant derivative of the metric tensor g_ij is

$$g_{ij;k} = \frac{\partial g_{ij}}{\partial x^k} - g_{lj}\,\Gamma^l_{ik} - g_{il}\,\Gamma^l_{jk}$$
If we substitute for the Christoffel symbols from equation (5), and recall that

$$g_{lj}\,g^{lm} = \delta_j^m$$

we find that all the terms cancel out and we're left with g_ij;k = 0. Thus the covariant
derivative of the metric tensor is identically zero, which is what prompted Einstein to
identify it with the gravitational potential, whose divergence vanishes, as discussed in
Section 5.8.
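Ricci's Theorem is easy to check symbolically. The following sketch (our own
illustration, reusing the g, G_metric, and coords objects from the polar-coordinate
example above) evaluates g_ij;k component by component:

import sympy as sp

def metric_cov_deriv(g, G, coords):
    # g_ij;k = dg_ij/dx^k - g_lj Gamma^l_ik - g_il Gamma^l_jk
    n = len(coords)
    return [[[sp.simplify(g[i, j].diff(coords[k])
              - sum(g[l, j]*G[l][i][k] for l in range(n))
              - sum(g[i, l]*G[l][j][k] for l in range(n)))
              for k in range(n)] for j in range(n)] for i in range(n)]

# With the polar-coordinate g and G_metric, every component prints as 0:
#   print(metric_cov_deriv(g, G_metric, coords))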
5. Notes on Curvature Derivations
Direct substitution of the principal q values into the curvature formula of Section 5.3
gives a somewhat complicated expression, and it may not be obvious that it reduces to the
expression given in the text. Even some symbolic processors seem to be unable to
accomplish the reduction. So, to verify the result, recall that we have

$$k = \frac{2(a + bq + cq^2)}{1 + q^2} \qquad q^2 - 2mq - 1 = 0$$
where m = (c − a)/b. The roots of the quadratic in q are

$$q = m + \sqrt{1 + m^2} \qquad q' = m - \sqrt{1 + m^2}$$
and of course qq' = −1. From the 2nd equation we have q² = 1 + 2mq, so we can
substitute this into the curvature equation to give

$$k = \frac{2(a + bq + cq^2)}{2 + 2mq} = \frac{a + bq + cq^2}{1 + mq}$$
Adding and subtracting c in the numerator, this can be written as

$$k = \frac{(a+c) + (b + 2cm)q}{1 + mq}$$
Now, our assertion in the text is that this quantity equals (a+c) + b√(1+m²). If we
subtract 2c from both of these quantities and multiply through by 1 + mq, our assertion is

$$(a-c) + bq = \left[(a-c) + b\sqrt{1+m^2}\,\right](1 + mq)$$
Since q = m + √(1+m²), the right hand term in the square brackets can be written as
bq − bm, so we claim that

$$(a-c) + bq = \left[(a-c) + bq - bm\right](1 + mq)$$
Expanding the right hand side, cancelling terms, and dividing by m gives

$$0 = -b + (a-c)q + bq^2 - bmq$$
Now we multiply by the conjugate quantity q' (noting that qq' = −1 and q + q' = 2m) to give

$$0 = -bq' - (a-c) - bq + bm = -bq' - (a-c) + (bq' - 2bm) + bm$$

The quantities bq' cancel, and we are left with m = (c − a)/b, which is the definition of m.
Of course, the same derivation applies to the other principal curvature if we swap q and q'.
Section 5.3 also states that the Gaussian curvature of the surface of a sphere of radius R is
1/R². To verify this, note that the surface of a sphere of radius R is described by
x² + y² + z² = R², and we can consider the point at the South pole, where the tangent plane
is a plane of constant z. Then we have

$$z = \pm\sqrt{R^2 - x^2 - y^2}$$
Taking the negative root (for the South Pole), factoring out R, and expanding the radical
into a power series in the quantity (x² + y²)/R² gives

$$z = -R\left[1 - \frac{x^2 + y^2}{2R^2} - \frac{(x^2 + y^2)^2}{8R^4} - \cdots\right]$$
Without changing the shape of the surface, we can elevate the sphere so the South pole is
just tangent to the xy plane at the origin by adding R to all the z values. Omitting all
powers of x and y above the 2nd, this gives the quadratic equation of the surface at this
point

$$z = \frac{x^2 + y^2}{2R}$$
Thus we have z = ax² + bxy + cy² where

$$a = c = \frac{1}{2R} \qquad b = 0$$
from which we compute the curvature of the surface

$$K = 4ac - b^2 = \frac{1}{R^2}$$
as expected.
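The same computation can be delegated to sympy (a sketch of our own, not from the text),
expanding the elevated sphere about the origin and applying the curvature K = 4ac − b²
implied by the product of the two principal curvatures found above:

import sympy as sp

x, y, R = sp.symbols('x y R', positive=True)
z = R - sp.sqrt(R**2 - x**2 - y**2)   # sphere elevated so the South pole touches z = 0

# Quadratic coefficients of the Taylor expansion about (0,0)
a = z.diff(x, 2).subs({x: 0, y: 0})/2
b = z.diff(x, y).subs({x: 0, y: 0})
c = z.diff(y, 2).subs({x: 0, y: 0})/2
print(a, b, c)                 # 1/(2*R), 0, 1/(2*R)

K = sp.simplify(4*a*c - b**2)  # Gaussian curvature at the point of tangency
print(K)                       # R**(-2)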
6. Odd Compositions
It's interesting to review the purely formal constraints on a velocity composition law
(such as discussed in Section 1.9) to clarify what distinguishes the formulae that work
from those that don't. Letting v12, v23, and v13 denote the pairwise velocities (in geometric
units) between three co-linear particles P1, P2, P3, a composition formula relating these
speeds can generally be expressed in the form

$$f(v_{13}) = f(v_{12}) + f(v_{23})$$
where f is some function that transforms speeds into a domain where they are simply
additive. It's clear that f must be an "odd" function, i.e., f(−x) = −f(x), to ensure that the
same composition formula works for both positive and negative speeds. This rules out
transforms such as f(x) = x², f(x) = cos(x), and all other "even" functions.
The general "odd" function expressed as a power series is a linear combination of odd
powers, i.e.,

so we can express any such function in terms of the coefficients [c1, c3, ...]. For example,
if we take the coefficients [1,0,0,...] we have the simple transform f(x) = x, which gives
the Galilean composition formula v13 = v12 + v23. For another example, suppose we
"weight" each term in inverse proportion to the exponent by using the coefficients [1, 1/3,
1/5, 1/7, ...]. This gives the transform

$$f(x) = x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots = \tanh^{-1}(x)$$
leading to Einstein's relativistic composition formula

$$v_{13} = \frac{v_{12} + v_{23}}{1 + v_{12}v_{23}}$$
From the identity atanh(x) = ln[(1+x)/(1−x)]/2 we also have the equivalent multiplicative
form

$$\frac{1 + v_{13}}{1 - v_{13}} = \left(\frac{1 + v_{12}}{1 - v_{12}}\right)\left(\frac{1 + v_{23}}{1 - v_{23}}\right)$$
which is arguably the most natural form of the relativistic speed composition law. The
velocity parameter p = (1+v)/(1−v) also gives very natural expressions for other
observables as well, including the relativistic Doppler shift, which equals √p, and the
spacetime interval between two inertial particles each one unit of proper time past their
point of intersection, which equals p^{1/4} − p^{−1/4}. Incidentally, to give an equilateral
triangle in spacetime, this last equation shows that two particles must have a mutual speed
of √5/3 = 0.745...
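These formulas are easy to exercise numerically. The sketch below (our own illustration)
composes two sample speeds in the additive, algebraic, and multiplicative forms, and
confirms the equilateral-triangle speed:

import math

v12, v23 = 0.5, 0.6                         # sample speeds in geometric units

# Additive form: f(v13) = f(v12) + f(v23), with f = atanh
v13_additive = math.tanh(math.atanh(v12) + math.atanh(v23))

# Einstein's algebraic form
v13_einstein = (v12 + v23)/(1 + v12*v23)

# Multiplicative form: p(v13) = p(v12)*p(v23), with p(v) = (1+v)/(1-v)
p = lambda v: (1 + v)/(1 - v)
p13 = p(v12)*p(v23)
v13_mult = (p13 - 1)/(p13 + 1)

print(v13_additive, v13_einstein, v13_mult) # all 0.846153...

# Equilateral triangle: p^(1/4) - p^(-1/4) = 1 makes p^(1/4) the golden ratio,
# which corresponds to the mutual speed sqrt(5)/3
v = math.sqrt(5)/3
u = p(v)**0.25
print(u - 1/u)                              # 1.0 (up to rounding)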
7. Independent Components of the Curvature Tensor
As shown in Section 5.7, the fully covariant Riemann curvature tensor at the origin of
Riemann normal coordinates, or more generally in terms of any tangent coordinate
system with respect to which the first derivatives of the metric coefficients are zero, has
the symmetries

$$R_{abcd} = -R_{bacd} = -R_{abdc} = R_{cdab} \qquad R_{abcd} + R_{acdb} + R_{adbc} = 0$$
These symmetries imply that although the curvature tensor in four dimensions has 256
components, there are only 20 algebraic degrees of freedom. To prove this, we first note
that the anti-symmetry in the first two indices and in the last two indices implies that all
the components of the form R_{aaaa}, R_{aabb}, R_{aabc}, R_{abcc}, and all permutations of R_{aaab}, are zero,
because they equal the negation of themselves when we transpose either the first two or
the last two indices. The only remaining components with fewer than three distinct
indices are of the form R_{abab} and R_{abba}, but these are the negatives of each other by
transposition of the last two indices, so we have only six independent components of this
form (which is the number of ways of choosing two of four indices). The only non-zero
components with exactly three distinct indices are of the forms R_{abac} = -R_{baac} = -R_{abca} =
R_{baca}, so we have twelve independent components of this form (because there are four
choices for the excluded index, and then three choices for the repeated index). The
remaining components have four distinct indices, but each component with a given
permutation of indices actually determines the values of eight components because of the
three symmetries and anti-symmetries of order two. Thus, on the basis of these three
symmetries there are only 24/8 = 3 independent components of this form, which may be
represented by the three components R_{1234}, R_{1342}, and R_{1423}. However, the skew symmetry
implies that these three components sum to zero, so they represent only two degrees of
freedom. Hence we can fully specify the Riemann curvature tensor (with respect to
tangent coordinates) by giving the values of the six components of the form R_{abab}, the
twelve components of the form R_{abac}, and the values of R_{1234} and R_{1342}, which implies that
the curvature tensor (with respect to any coordinate system) has 6 + 12 + 2 = 20 algebraic
degrees of freedom.
The same reasoning can be applied in any number of dimensions. For a manifold of N
dimensions, the number of independent non-zero curvature components with just two
distinct indices is equal to the number of ways of choosing 2 out of N indices. Also, the
number of independent non-zero curvature components with 3 distinct indices is equal to
the number of ways of choosing the N-3 excluded indices out of N indices, multiplied by
3 for the number of choices of the repeated index. This leaves the components with 4
distinct indices, of which there are 4! times the number of ways of choosing 4 of N
indices, but again each of these represents 8 components because of the symmetries and
anti-symmetries. Also, these components can be arranged in sets of three that satisfy the
three-way skew symmetry, so the number of independent components of this form is
reduced by a factor of 2/3. Therefore, the total number of algebraically independent
components of the curvature tensor in N dimensions is

$$\binom{N}{2} + 3\binom{N}{3} + 2\binom{N}{4} = \frac{N^2(N^2 - 1)}{12}$$
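The tally described above can be carried out mechanically. A short sketch (our own
illustration) sums the three families of components and checks the closed form for several N:

from math import comb

def independent_components(N):
    two   = comb(N, 2)      # one R_abab per pair of indices {a,b}
    three = 3*comb(N, 3)    # choose the 3 indices used, times 3 choices of repeated index
    four  = 2*comb(N, 4)    # 4!/8 = 3 orbits per 4-index set, reduced by a factor of 2/3
    return two + three + four

for N in range(2, 7):
    assert independent_components(N) == N**2*(N**2 - 1)//12
    print(N, independent_components(N))     # N = 4 gives 6 + 12 + 2 = 20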
Bibliography

Aristotle, "The Physics", (trans by Wicksteed and Cornford), Harvard Univ. Press, 1957.
Armstrong, M. A., "Groups and Symmetry", Springer-Verlag, 1988.
Baierlein, Ralph, "Newton to Einstein, The Trail of Light", Cambridge Univ Press, 1992.
Barrow, John, "Theories of Everything, The Quest for Ultimate Explanation", Clarendon
Press, 1991.
Barut, A., "Electrodynamics and Classical Theory of Fields and Particles", Dover, 1964.
Bate, Roger R., et al, "Fundamentals of Astrodynamics", Dover, 1971.
Beck, Anna (translator), "The Collected Papers of Albert Einstein", Princeton University
Press, 1989.
Bell, J. S., "Speakable and Unspeakable in Quantum Mechanics", Cambridge Univ. Press,
1993.
Bergmann, Peter, "Introduction to the Theory of Relativity", Dover, 1976.
Bergmann, Peter, "The Riddle of Gravitation", Dover, 1968.
Bonola, Roberto, "Non-Euclidean Geometry", Dover, 1955.
Borisenko, A.I., and Tarapov, I.E., "Vector and Tensor Analysis with Applications",
Dover, 1968.
Born, Max, "Einstein's Theory of Relativity", Dover, 1962.
Boas, Mary, "Mathematical Methods in the Physical Sciences", 2nd ed., Wiley, 1983.
Boyer, Carl, "A History of Mathematics", Princeton Univ Press, 1985.
Bryant, Victor, "Metric Spaces", Cambridge Univ. Press, 1985.
Buhler, W. K., "Gauss, A Biographical Study", Springer-Verlag, 1981.
Caspar, Max, "Kepler", Dover, 1993.
Christianson, Gale E., "In the Presence of the Creator, Isaac Newton and His Times", The
Free Press, 1984.
Ciufolini and Wheeler, "Gravitation and Inertia", Princeton Univ. Press, 1995.
Clark, Ronald, "Einstein, The Life and Times", Avon Books, 1971.
Copernicus, Nicolaus, "On the Revolutions of Heavenly Spheres", Prometheus Books,
1995.
Cushing, James, "Philosophical Concepts in Physics", Cambridge Univ. Press, 1998.
Das, Anadijiban, "The Special Theory of Relativity", Springer-Verlag, 1993.
Davies, Paul, "The Accidental Universe", Cambridge Univ. Press, 1982.
Davies, Paul, "About Time", Simon & Schuster, 1996.
D'Inverno, Ray, "Introducing Einstein's Relativity", Clarendon Press, 1992.
Dirac, P. A. M., "The Principles of Quantum Mechanics", 4th ed., Oxford Science
Publications, 1957.
Doughty, Noel, "Lagrangian Interaction", Perseus Books, 1990.
Duncan, Ronald, and M. Weston-Smith (ed.), "The Encyclopedia of Ignorance", Pocket
Books, 1977.
Earman, John, "World Enough and Space-Time", MIT Press, 1989.
Ebbinghaus, H. D., et al., "Mathematical Logic", Springer-Verlag, 1994.
Einstein, Albert, "The Meaning of Relativity", Princeton Univ. Press, 1956.
Einstein, Albert, "Sidelights on Relativity", Dover, 1983.
Einstein, Albert, "Relativity, The Special and General Theory", Crown Trade, 1961.
Einstein, Albert, "The Theory of Relativity and Other Essays", Citadel Press, 1996.
Einstein, et al, "The Principle of Relativity", Dover, 1952.
Eisberg and Resnick, "Quantum Physics", John Wiley & Sons, 1985.

Euclid, "The Elements" (translated by Thomas Heath), Dover, 1956.


Feynman, Richard, "Feynman Lectures on Gravitation", Addison-Wesley Publishing,
1995.
Feynman, Richard, "QED, The Strange Theory of Light and Matter", Princeton Univ
Press, 1985.
Feynman, Richard, "The Character of Physical Law", M.I.T. Press, 1965.
Fowles, Grant, "Introduction to Modern Optics", Dover, 1975.
Friedman, Michael, "Foundations of Spacetime Theories", Princeton Univ. Press, 1983.
Frauenfelder, Hans, and Ernest, M. Henley, "Subatomic Physics", Prentice-Hall, Inc.,
1974.
Galilei, Galileo, "Sidereus Nuncius", Univ. of Chicago Press, 1989.
Galilei, Galileo, "Dialogue Concerning the Two Chief World Systems", Univ. of Cal.
Press, 2nd ed., 1967.
Gemignani, Michael, "Elementary Topology", 2nd ed., Dover, 1972.
Gibbins, Peter, "Particles and Paradoxes", Cambridge Univ. Press, 1987.
Goldsmith, Donald, "The Evolving Universe", Benjamin/Cummings Publishing, 1985.
Goodman, Lawrence E., and Warner, William H., "Dynamics", Wadsworth Publishing
Co. Inc., 1965.
Greenwood, Donald T., "Principles of Dynamics", Prentice-Hall, 1965.
Guggenheimer, Heinrich, "Differential Geometry", Dover, 1977.
Halliday and Resnick, "Physics", John Wiley & Sons, 1978.
Hawking S.W. and Ellis G.F.R., "The Large Scale Structure of Spacetime", Cambridge
Univ. Press, 1973.
Hay, G.E., "Vector and Tensor Analysis", Dover, 1953.
Heath, Thomas, "A History of Greek Mathematics", Dover, 1981.
Hecht, Eugene, "Optics", 3rd ed., Addison-Wesley, 1998.
Heisenberg, Werner, "The Physical Principles of the Quantum Theory", Dover, 1949.
Hilbert, David, "Foundations of Geometry", Open Court, 1992.
Huggett and Tod, "An Introduction to Twistor Theory", Cambridge Univ Press, 1985.
Hughes, R. I. G., "The Structure and Interpretation of Quantum Mechanics", Harvard
Univ. Press, 1989.
Joshi, A. W., "Matrices and Tensors In Physics", Halstead Press, 1975.
Jones and Singerman, "Complex Functions", Cambridge Univ. Press, 1987.
Judson, Lindsay (ed.), "Aristotle's Physics", Oxford Univ. Press, 1991.
Kennefick, D., "Controversies in the History of the Radiation Reaction Problem in
General Relativity", preprint gr-qc/9704002, Apr 1997.
Kline, Morris, "Mathematical Thought from Ancient to Modern Times", Oxford Univ.
Press, 1972.
Kramer, Edna, "The Nature and Growth of Modern Mathematics", Princeton Univ. Press,
1982.
Kuhn, Thomas S., "The Copernican Revolution", Harvard University Press, 1957.
Liepmann, H. W., and Roshko, A., "Elements of Gas Dynamics", John Wiley & Sons,
1957.
Lindley, David, "Degrees Kelvin", Joseph Henry Press, 2004.
Lindsay and Margenau, "Foundations of Physics", Ox Bow Press, 1981.
Lloyd, G. E. R., "Greek Science After Aristotle", W. W. Norton & Co., 1973.

Lorentz, H. A., "The Theory of Electrons", 2nd ed. (1915), Dover, 1952.
Lovelock and Rund, "Tensors, Differential Forms, and Variational Principles", Dover,
1989.
Lucas and Hodgson, "Spacetime & Electromagnetism", Oxford Univ Press, 1990.
McConnell, A.J., "Applications of Tensor Analysis", Dover, 1957.
Menzel, "Fundamental Formulas of Physics", Dover, 1960.
Miller, Arthur, "Albert Einstein's Special Theory of Relativity", Springer-Verlag, 1998.
Misner, Thorne, and Wheeler, "Gravitation", W.H. Freeman & Co, 1973.
Mahoney, Michael, "The Mathematical Career of Pierre de Fermat", 2nd ed, Princeton
Univ Press, 1994.
Maxwell, James Clerk, "A Treatise on Electricity and Magnetism", Dover 1954.
Nagel, Ernest and Newman, James R., "Gödel's Proof", New York Univ. Press, 1958.
Neumann, John von, "Mathematical Foundations of Quantum Mechanics", Princeton
Univ. Press, 1955.
Newton, Isaac, "Principia" (trans by Motte and Cajori), Univ of Calif Press, 1962.
Newton, Isaac, "Principia" (trans by Cohen and Whitman), Univ of Calif Press, 1999.
Newton, Isaac, "Opticks", Dover, 1979.
Ohanian and Ruffini, "Gravitation and Spacetime", 2nd ed., W.W Norton & Co., 1994.
Olson, Reuben, "Essentials of Engineering Fluid Mechanics", 3rd ed., Intext Press, 1973.
Pais, Abraham, "Subtle is the Lord", Oxford Univ Press, 1982.
Pannekoek, A. "A History of Astronomy", Dover, 1989.
Peat, F. David, "Superstrings and the Search for the Theory of Everything",
Contemporary Books, 1988.
Pedoe, Dan, "Geometry, A Comprehensive Course", Dover, 1988.
Penrose, Roger, "The Emperor's New Mind", Oxford Univ Press, 1989.
Poincare, Henri, "Science and Hypothesis", Dover, 1952.
Prakash, Nirmala, "Differential Geometry, An Integrated Approach", Tata McGraw-Hill,
1981.
Price, Huw, "Time's Arrow and Archimedes Point", Oxford Univ Press, 1996.
Ridley, B.K., "Space, Time, and Things", Penguin Books, 1976.
Rindler, Wolfgang, "Essential Relativity", Springer-Verlag, 1977.
Ray, Christopher, "Time, Space, and Philosophy", Routledge, 1992.
Reichenbach, Hans, "The Philosophy of Space and Time", Dover, 1958.
Reichenbach, Hans, "From Copernicus to Einstein", Dover, 1980.
Ronchi, "Optics, The Science of Vision", Dover, 1991.
Roseveare, N. T., "Mercury's Perihelion from Le Verrier to Einstein", Oxford Univ. Press,
1982.
Savitt, Steven F., "Time's Arrow Today", Cambridge Univ Press, 1995.
Schey, H. M., "Div, Grad, Curl, and All That", W.W.Norton & Co, 1973.
Schwartz, Melvin, "Principles of Electrodynamics", Dover, 1987.
Schwarzschild, Karl, "On the Gravitational Field of a Mass Point According to Einstein's
Theory", Proceedings of the Prussian Academy, 13 Jan 1916.
Shilov, Georgi, "Linear Algebra", Dover, 1977.
Smith, David Eugene, "A Source Book In Mathematics", Dover, 1959.
Spivak, Michael, "Differential Geometry", Publish or Perish, 1979.
Squires, Euan, "The Mystery of the Quantum World", 2nd ed., Institute of Physics, 1994.

Stachel, John (ed.), "Einstein's Miraculous Year", Princeton Univ. Press, 1998.
Steen, Lynn Arthur, "Mathematics Today", Vintage Books, 1980.
Stillwell, John, "Mathematics and Its History", Springer-Verlag, 1989.
Synge and Schild, "Tensor Calculus", Dover, 1949.
Taylor and Mann, "Advanced Calculus", Wiley, 3rd ed, 1983.
Thorne, Kip, "Black Holes and Time Warps", W.W. Norton & Co, 1994.
Torretti, Roberto, "Relativity and Geometry", Dover, 1996.
Visser, Matt, "Lorentzian Wormholes", AIP Press, 1996.
Wald, Robert, "General Relativity", Univ of Chicago Press, 1984.
Weinberg, Steven, "Gravitation and Cosmology", John Wiley & Sons, 1972.
Weinstock, Robert, "Calculus of Variations", Dover, 1974.
Westfall, Richard S., "Never At Rest, A Biography of Isaac Newton", Cambridge Univ.
Press, 1980.
Weyl, "Space, Time, Matter", Dover, 1952.
Whittaker, E. T., "A History of the Theories of Aether and Electricity", 2nd ed., Harper &
Brothers, 1951.
Wick, David, "The Infamous Boundary", Birkhauser, 1995.
Yourgrau and Mandelstam, "Variational Principles in Dynamics and Quantum Theory",
Dover, 1979.
Zahar, Elie, "Einstein's Revolution, A Study in Heuristic", Open Court, 1989.
