
THE AMERICAN MATHEMATICAL

MONTHLY
VOLUME 121, NO. 5 MAY 2014

Unknotting Unknots 379
Allison Henrich and Louis H. Kauffman
A Geometric Representation of Continued Fractions 391
Alan F. Beardon and Ian Short
Rethinking Set Theory 403
Tom Leinster
Repeatedly Appending Any Digit to Generate Composite Numbers 416
Jon Grantham, Witold Jarnicki, John Rickert, and Stan Wagon
On Wallis-type Products and Pólya’s Urn Schemes 422
Iddo Ben-Ari, Diana Hay, and Alexander Roitershtein

NOTES
The Primes that Euclid Forgot 433
Paul Pollack and Enrique Treviño
Solution of Sondow’s Problem: A Synthetic Proof of the Tangency Property of the Parbelos 438
Emmanuel Tsukerman
A Connection between Furstenberg’s and Euclid’s Proofs of the Infinitude of Primes 444
Nathan A. Carlson
An Elementary Proof of a Generalization of Banach’s Mapping Theorem 445
Ming-Chia Li
A Technique in Contour Integration 447
M. L. Glasser
Large-Deviation Bounds for Sampling without Replacement 449
Kyle Luh and Nicholas Pippenger

PROBLEMS AND SOLUTIONS 455

REVIEWS
Math on Trial: How Mathematics Is Used and Abused in the Courtroom 463
By Leila Schneps and Coralie Colmez
Daniel Ullman

EDITOR’S ENDNOTES 467

MATHBITS
415, An Elementary Application of Brouwer’s Fixed Point Theorem to Transition Matrices; 443, A Difference Equation Leading to the Irrationality of √2

An Official Publication of the Mathematical Association of America



EDITOR
Scott T. Chapman, Sam Houston State University

NOTES EDITOR
Sergei Tabachnikov, Pennsylvania State University

BOOK REVIEW EDITOR
Jeffrey Nunemacher, Ohio Wesleyan University

PROBLEM SECTION EDITORS
Douglas B. West, University of Illinois
Gerald Edgar, Ohio State University
Doug Hensley, Texas A&M University

ASSOCIATE EDITORS
William Adkins, Louisiana State University
David Aldous, University of California, Berkeley
Elizabeth Allman, University of Alaska, Fairbanks
Jonathan M. Borwein, University of Newcastle
Jason Boynton, North Dakota State University
Edward B. Burger, Southwestern University
Minerva Cordero-Epperson, University of Texas, Arlington
Allan Donsig, University of Nebraska, Lincoln
Michael Dorff, Brigham Young University
Daniela Ferrero, Texas State University
Luis David Garcia-Puente, Sam Houston State University
Sidney Graham, Central Michigan University
Tara Holm, Cornell University
Roger A. Horn, University of Utah
Lea Jenkins, Clemson University
Daniel Krashen, University of Georgia
Ulrich Krause, Universität Bremen
Jeffrey Lawson, Western Carolina University
C. Dwight Lahr, Dartmouth College
Susan Loepp, Williams College
Irina Mitrea, Temple University
Bruce P. Palka, National Science Foundation
Vadim Ponomarenko, San Diego State University
Catherine A. Roberts, College of the Holy Cross
Rachel Roberts, Washington University, St. Louis
Ivelisse M. Rubio, Universidad de Puerto Rico, Rio Piedras
Adriana Salerno, Bates College
Edward Scheinerman, Johns Hopkins University
Anne Shepler, University of North Texas
Susan G. Staples, Texas Christian University
Dennis Stowe, Idaho State University
Daniel Ullman, George Washington University
Daniel Velleman, Amherst College

EDITORIAL ASSISTANT
Bonnie K. Ponce
NOTICE TO AUTHORS
The MONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of MONTHLY readers.
The MONTHLY’s readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. MONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged.
Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue.
Submission of articles, notes, and filler pieces is required via the MONTHLY’s Editorial Manager System. Initial submissions in pdf or LaTeX form can be sent to the Editor Scott Chapman at
http://www.editorialmanager.com/monthly
The Editorial Manager System will cue the author for all required information concerning the paper. Questions concerning submission of papers can be addressed to the Editor at monthly@shsu.edu. Authors who use LaTeX can find our article/note template at http://www.shsu.edu/~bks006/Monthly.html. This template requires the style file maa-monthly.sty, which can also be downloaded from the same webpage. A formatting document for MONTHLY references can be found at http://www.shsu.edu/~bks006/FormattingReferences.pdf. Follow the link to Electronic Publications Information for authors at http://www.maa.org/pubs/monthly.html for information about figures and files, as well as general editorial guidelines.
Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the MONTHLY more lively, entertaining, and informative can be forwarded to the Editor at monthly@shsu.edu.
The online MONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-331-1622.
See the MONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles:
http://www.maa.org/

Proposed problems or solutions should be sent to:
DOUG HENSLEY, MONTHLY Problems
Department of Mathematics
Texas A&M University
3368 TAMU
College Station, TX 77843-3368.
In lieu of duplicate hardcopy, authors may submit pdfs to monthlyproblems@math.tamu.edu.
Advertising correspondence should be sent to:
MAA Advertising
1529 Eighteenth St. NW
Washington DC 20036.
Phone: (877) 622-2373,
E-mail: tmarmor@maa.org.
Further advertising information can be found online at www.maa.org.
Change of address, missing issue inquiries, and other subscription correspondence can be sent to:
MAA Service Center, maahq@maa.org.
All of these are at the address:
The Mathematical Association of America
1529 Eighteenth Street, N.W.
Washington, DC 20036.
Recent copies of the MONTHLY are available for purchase through the MAA Service Center:
maahq@maa.org, 1-800-331-1622.
Microfilm Editions are available at: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106.
The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, N.W., Washington, DC 20036 and Lancaster, PA, and copyrighted by the Mathematical Association of America (Incorporated), 2014, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright the Mathematical Association of America 2014. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA’s Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster: Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, N.W., Washington, DC, 20036-1385.
Unknotting Unknots
Allison Henrich and Louis H. Kauffman

Abstract. A knot is an embedding of a circle into three-dimensional space. We say that a knot is unknotted if there is an ambient isotopy of the embedding to a standard circle. In essence, an unknot is a knot that may be deformed to a standard circle without passing through itself. By representing knots via planar diagrams, we discuss the problem of unknotting a knot diagram when we know that it is unknotted. This problem is surprisingly difficult, since it has been shown that knot diagrams may need to be made more complicated before they can be simplified. We do not yet know, however, how much more complicated they must get. We give an introduction to the work of Dynnikov, who discovered how arc-presentations can be used to detect the unknot directly from a diagram of the knot. Using Dynnikov’s work, we show how to obtain a quadratic upper bound for the number of crossings that must be introduced into a sequence of unknotting moves. We also apply Dynnikov’s results to find an upper bound for the number of moves required in an unknotting sequence.

1. INTRODUCTION. When first delving into the theory of knots, we learn that
knots are typically studied using their diagrams. The first question that arises when
considering these knot diagrams is: How can we tell if two knot diagrams represent the
same knot? Fortunately, we have a partial answer to this question. Two knot diagrams
represent the same knot in $\mathbb{R}^3$ if and only if they can be related by the Reidemeister
moves; see Figure 1. Reidemeister proved this theorem in the 1920s [14], and it is the
underpinning of much of knot theory. For example, J. W. Alexander based the original
definition of his celebrated polynomial on the Reidemeister moves [1].

Figure 1. The three Reidemeister moves

Now, imagine that you are presented with a complicated diagram of an unknot,
and you would like to use Reidemeister moves to reduce it to the trivial diagram that
has no crossings. In considering a problem of this sort, you stumble upon a curious fact: a diagram of an unknot might have to be made more complicated before it can be simplified. We call such a diagram a hard unknot diagram [12]. A nice example of this is the Culprit, shown in Figure 2. If you look closely, you’ll find that no simplifying type I or type II Reidemeister moves and no type III moves are available. Yet this is indeed the unknot. In order to unknot it, we need to introduce new crossings with Reidemeister I and II moves. In Figure 3, we see that we can unknot the Culprit by making the diagram larger by two crossings (via a Reidemeister move of type II) and that it takes a total of ten Reidemeister moves to accomplish the unknotting. (Note that both type I and type II moves were performed between the fifth and sixth diagrams.)

Figure 2. The Culprit

Figure 3. The Culprit undone

http://dx.doi.org/10.4169/amer.math.monthly.121.05.379
MSC: Primary 57M25

May 2014] UNKNOTTING UNKNOTS 379
In Figures 4 and 5, we show more examples of hard unknot diagrams. Figure 4 shows examples with the smallest possible number of crossings, and Figure 5 shows the very first example to appear, discovered by Goeritz in 1934 [5].

Figure 4. The smallest hard unknots

At this point, we ask ourselves: How much more complicated does a diagram need to become before it can be simplified? Moreover, how many Reidemeister moves do we need to trivialize our picture? In this paper, we give a technique for finding upper bounds for these answers. In particular, we will prove the following theorem. Note that the precise definition of a knot diagram in Morse form (including the notion of a maximum of a diagram) will be given in the body of the paper. To put this result into context, however, think of b(K) as being no larger than cr(K).

Figure 5. The Goeritz unknot

380 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121

Theorem 4. Suppose that K is a diagram (in Morse form) of the unknot with crossing number cr(K) and number of maxima b(K). Let M = 2b(K) + cr(K). Then the diagram can be unknotted by a sequence of Reidemeister moves so that no intermediate diagram has more than $(M-2)^2$ crossings.

This theorem is proven using combinatorial arguments, along with the machinery
developed by Dynnikov in [4]. We will introduce the necessary background material
in Section 2 and give the proof in Section 3.
Returning to our Culprit, we have that cr(K) = 10 and b(K) = 5. Thus, M = 20 and $(M-2)^2 = 18^2 = 324$ is our upper bound on the number of crossings needed to simplify the diagram. In fact, our unknotting sequence never needed a diagram with more than 12 crossings. The theory of these bounds needs improvement, but it is remarkable that there is a theory at all for such questions. In addition to proving this theorem, we will give bounds on the number of Reidemeister moves needed for unknotting, and we will point the reader toward more results related to this question. We warn the reader that the gap between the known lower and upper bounds is still vast. The quest for a satisfying answer to these questions continues.
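The arithmetic for the Culprit can be checked with a few lines of Python. This is a minimal sketch: the function names are ours (hypothetical), and the only inputs are the two numbers cr(K) and b(K) read off a Morse diagram. For comparison, it also evaluates 2(cr(K) + 1)², the bound that Section 3 attributes to Dynnikov [4].

```python
def complexity_bound(b: int, cr: int) -> int:
    """M = 2 b(K) + cr(K) for a Morse diagram with b maxima and cr crossings."""
    return 2 * b + cr


def crossing_bound(b: int, cr: int) -> int:
    """Theorem 4: intermediate diagrams need at most (M - 2)^2 crossings."""
    m = complexity_bound(b, cr)
    return (m - 2) ** 2


def dynnikov_crossing_bound(cr: int) -> int:
    """The bound 2 (cr(K) + 1)^2 quoted from Dynnikov [4]."""
    return 2 * (cr + 1) ** 2


# The Culprit: cr(K) = 10 crossings and b(K) = 5 maxima.
print(complexity_bound(5, 10))      # 20
print(crossing_bound(5, 10))        # 324
print(dynnikov_crossing_bound(10))  # 242
```

Both values sit far above the 12 crossings actually used in Figure 3, which is the gap between bounds and practice described above.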

2. PRELIMINARIES. The method we present to find upper bounds makes use of a powerful result proven by Dynnikov in [4] regarding arc-presentations of knots. Arc-presentations are special types of rectangular diagrams, i.e., knot diagrams that are composed entirely of horizontal and vertical line segments. Here, we provide an overview of the theory of arc-presentations. In Figure 6, we give an example of a rectangular diagram that is an arc-presentation and another example of a rectangular diagram that is not an arc-presentation.

Figure 6. The picture on the left is an example of an arc-presentation of a trefoil. The picture on the right is
an example that is not an arc-presentation (since not all horizontal arcs pass under vertical arcs).

Definition 1. An arc-presentation of a knot is a knot diagram composed of horizontal and vertical line segments connected end-to-end such that, at each crossing in the diagram, the horizontal arc passes under the vertical arc. Furthermore, we require that no two edges in an arc-presentation are collinear.



Two arc-presentations are combinatorially equivalent if they are isotopic in the plane via an ambient isotopy of the form h(x, y) = (f(x), g(y)).
The complexity c(L) of an arc-presentation is equal to the number of vertical arcs
in the diagram.

Note that a rectangular diagram can be drawn naturally on a rectangular grid, with
corners and crossings contained within the squares of the grid. If we start by represent-
ing a rectangular diagram on a grid in this way, then we have what is often referred to as
a mosaic knot. Mosaic knots can be used to define a notion of quantum knot. See [11]
for more about quantum knots. For now, we refocus our attention on arc-presentations.

Proposition 1 (Dynnikov). Every knot has an arc-presentation. Any two arc-presentations of the same knot can be related to each other by a finite sequence of elementary moves, pictured in Figures 7 and 8.

Figure 7. Elementary (de)stabilization moves. Stabilization moves increase the complexity of the arc-
presentation, while destabilization moves decrease the complexity.

Figure 8. Some examples of exchange moves. Other allowed exchange moves involve switching the heights of
two horizontal arcs that lie in distinct halves of the diagram. See [4] for a general formulation of the exchange
moves.

The proof of this proposition is elementary, based on the Reidemeister moves. A sketch is provided in [4]. We will show how to convert a usual knot diagram to an arc-presentation in the next few paragraphs, making use of the concept of Morse diagrams of knots.

Definition 2. A knot diagram is in Morse form with respect to a given vector in the plane if it has
1. no horizontal lines (i.e., lines perpendicular to the given vector),
2. no inflection points,
3. at most one singularity at each height, and
4. each crossing oriented so that it makes a 45-degree angle with the given vector.
See Figure 9 for an example. We note that converting an arbitrary knot diagram into
a diagram in Morse form requires no Reidemeister moves, only ambient isotopies of
the plane. More information about Morse diagrams can be found in [10].

Figure 9. A Morse diagram of a knot and a corresponding rectangular diagram

Lemma 2. Suppose that a knot (or link) diagram K in Morse form has cr(K) crossings and b(K) maxima. Then there is an arc-presentation $L_K$ of K with complexity $c(L_K)$ at most 2b(K) + cr(K) that can be obtained by ambient isotopies of the plane (without the use of Reidemeister moves).

Figure 10. Rotating a crossing to convert a rectangular diagram into an arc-presentation

Proof. We begin with a diagram in Morse form, and convert this diagram into a piecewise linear diagram composed of lines with slope ±1. The resulting diagram has a vertex corresponding to each maximum and minimum, with additional vertices that form left and right cusps, at most one for each pair of successive extrema. Since the number of minima equals the number of maxima in the diagram, and the number of vertices that are not extrema is no larger than the sum of these two quantities, we have at most 4b(K) vertices. Thus, rotating this diagram by 45 degrees, we have a diagram composed entirely of horizontal and vertical arcs. Since horizontal and vertical arcs alternate around the diagram, exactly half of the arcs are vertical, so its complexity is at most 2b(K), half the number of possible vertices.
This diagram may fail to be an arc-presentation of K if any crossing has a horizontal overpass. If more than half of the crossings in K have horizontal overpasses, we rotate the diagram by 90 degrees. Now, at least half of the crossings are in the proper form. Any remaining crossings containing a horizontal overpass may be rotated locally by 90 degrees to form our arc-presentation $L_K$, as shown in Figure 10 (see Figure 11 for an example). For each crossing that requires this move, the complexity of the rectangular diagram increases by at most 2. Thus, the overall complexity of our diagram increases by at most $2 \cdot \frac{1}{2}\mathrm{cr}(K) = \mathrm{cr}(K)$. It follows that $c(L_K) \le 2b(K) + \mathrm{cr}(K)$.
Note that neither converting a Morse diagram into a piecewise linear diagram, nor
locally rotating a crossing, uses Reidemeister moves. These are ambient isotopies of
the plane.
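The counting in this proof can be mirrored by a small bookkeeping function. It is only a sketch under stated assumptions: it takes as hypothetical inputs the number of maxima b, the number of crossings cr, and the number h of crossings whose overpass is horizontal after the 45-degree rotation, and it returns the complexity bound the argument guarantees, not an actual arc-presentation.

```python
def lemma2_complexity_bound(b: int, cr: int, h: int) -> int:
    """Complexity bound for the arc-presentation built in the proof of Lemma 2.

    b  -- number of maxima of the Morse diagram
    cr -- number of crossings
    h  -- crossings with a horizontal overpass after the 45-degree rotation
    """
    if not 0 <= h <= cr:
        raise ValueError("h must satisfy 0 <= h <= cr")
    # At most 4b vertices; half of the arcs are vertical, so complexity <= 2b.
    complexity = 2 * b
    # A global 90-degree rotation swaps good and bad crossings, so we may
    # always assume that at most half of the crossings are bad.
    bad = min(h, cr - h)
    # Each bad crossing is rotated locally, adding at most 2 to the complexity.
    return complexity + 2 * bad


# Every admissible input respects the lemma's bound 2b + cr.
for b in range(1, 6):
    for cr in range(0, 12):
        for h in range(cr + 1):
            assert lemma2_complexity_bound(b, cr, h) <= 2 * b + cr

print(lemma2_complexity_bound(5, 10, 7))  # 16: rotate globally, then fix 3 crossings
```

Taking min(h, cr − h) is the "rotate by 90 degrees if more than half the crossings are wrong" step; it is why the crossing term never exceeds cr(K).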



Figure 11. Converting a rectangular diagram into an arc-presentation by rotating the diagram and then rotating
a crossing. Note that the resulting diagram can be reduced to a simpler arc-presentation with an exchange move
that doesn’t require any Reidemeister moves.

3. BOUNDS ON CROSSINGS NEEDED TO SIMPLIFY THE UNKNOT. Our motivation for using Dynnikov’s work to find upper bounds for an unknotting Reidemeister sequence began with the following theorem from [4].

Theorem 3 (Dynnikov). If L is an arc-presentation of the unknot, then there exists a finite sequence of exchange and destabilization moves
$$L \to L_1 \to L_2 \to \cdots \to L_m$$
such that $L_m$ is trivial.

What is particularly interesting about this result is that the unknot can be simplified without increasing the complexity of the arc-presentation, that is, without the use of stabilization moves. This gives a useful bound on how large a diagram needs to become. Furthermore, if we apply Dynnikov’s method to a knotted knot, the process will halt on a diagram that is not a planar circle. Thus, Dynnikov’s method can detect the unknot. The problem of detecting the unknot has been investigated by, for example, Birman and Hirsch [2] and Birman and Moody [3]. More recently, it has been shown that Heegaard Floer homology (a generalization of the Alexander polynomial) not only detects the unknot, but also can be used to calculate the least genus of an orientable spanning surface for any knot. This is an outstanding result, and we recommend that the reader examine the paper by Manolescu, Ozsváth, Szabó, and Thurston [13] for more information. In that work, Heegaard Floer homology is expressed via a chain complex that is associated to a rectangular diagram of just the type that Dynnikov uses.
Returning to the task at hand, we derive a quadratic upper bound on the crossing number of diagrams in an unknotting sequence. Note that, using similar methods, Dynnikov finds a bound of $2(\mathrm{cr}(K)+1)^2$ in [4].

Theorem 4. Suppose that K is a diagram (in Morse form) of the unknot with crossing number cr(K) and number of maxima b(K), and let M = 2b(K) + cr(K). Then there is a sequence of knot diagrams $K = K_0, K_1, K_2, \ldots, K_N$ such that $K_{i+1}$ is obtained from $K_i$ by a single Reidemeister move, $K_N$ is a trivial diagram of the unknot, and for every i the crossing number $\mathrm{cr}(K_i)$ is no more than $(M-2)^2$.

Proof. To begin, we notice that K can be viewed as an arc-presentation of complexity at most M by a simple ambient isotopy of the plane, as shown in Lemma 2. By Theorem 3, there is a sequence of arc-presentations beginning with K and ending with the trivial arc-presentation, each having complexity no more than M, such that a diagram and its successor are related by an exchange or a destabilization move. Each destabilization move either preserves or reduces the number of crossings in the diagram. In the case that a destabilization move reduces the number of crossings, it can be viewed as a simplifying Reidemeister I move. Otherwise, it can be viewed as a simple ambient isotopy of the plane.
When an exchange move is performed, on the other hand, its analogous Reidemeister sequence may require type II and type III Reidemeister moves (see Figure 12). At most one type II Reidemeister move is required for any given exchange move, so an exchange move factors through a Reidemeister sequence of moves that adds at most two crossings (since type III moves preserve the crossing number). However, it is important to note that a Reidemeister II move is needed if and only if the exchange move itself increases the number of crossings by two in the arc-presentation. Thus, no more crossings are added when factoring an exchange move through a Reidemeister sequence than are added in the exchange move itself.

Figure 12. Factoring an exchange move through a type II and multiple type III Reidemeister moves.

It is straightforward to show that the maximum number of crossings that may occur in an arc-presentation with complexity less than or equal to M is bounded above by $(M-2)^2$: the leftmost and rightmost vertical arcs cannot be crossed at all, and each of the remaining vertical arcs (at most M − 2 of them) crosses at most one horizontal arc in each of the at most M − 2 rows strictly between its endpoints. If we translate an arc-presentation sequence of moves in a canonical fashion into a sequence of Reidemeister moves to unknot our unknot, then many knot diagrams in the Reidemeister sequence will be arc-presentations and, as such, will have at most $(M-2)^2$ crossings. Furthermore, diagrams in this sequence that are not arc-presentations have no more crossings than their arc-presentation relatives. Thus, there exists a sequence of Reidemeister moves that unknots our original diagram K and never increases the crossing number beyond $(M-2)^2$.
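The bound $(M-2)^2$ can be sanity-checked by brute force for small complexity. The sketch below is our own illustration, not code from [4]: it assumes an arc-presentation of complexity n is encoded by listing, for each of the n columns, the pair of rows joined by its vertical arc, with every row then holding exactly two vertices (the ends of its horizontal arc). Because vertical arcs always pass over horizontal ones, a crossing is simply a column strictly inside a horizontal arc's span whose vertical arc spans that arc's row. The enumeration includes link diagrams as well as knots; the bound holds for all of them.

```python
from itertools import combinations, product


def crossings(columns):
    """Count crossings; columns[x] = (a, b) with a < b are the rows used by column x."""
    n = len(columns)
    row_cols = [[] for _ in range(n)]
    for x, (a, b) in enumerate(columns):
        row_cols[a].append(x)
        row_cols[b].append(x)
    total = 0
    for y, (left, right) in enumerate(row_cols):
        # The horizontal arc in row y runs from column `left` to column `right`.
        for x in range(left + 1, right):
            a, b = columns[x]
            if a < y < b:  # the vertical arc in column x passes over row y
                total += 1
    return total


def configurations(n):
    """Yield all n x n configurations with two vertices in every row and column."""
    for cols in product(combinations(range(n), 2), repeat=n):
        uses = [0] * n
        for a, b in cols:
            uses[a] += 1
            uses[b] += 1
        if all(u == 2 for u in uses):
            yield cols


for n in range(2, 6):
    worst = max(crossings(c) for c in configurations(n))
    print(n, worst, (n - 2) ** 2)
    assert worst <= (n - 2) ** 2
```

For n = 3 and n = 4 the maximum equals $(n-2)^2$; the extremal configuration for n = 4 is a two-component link.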

4. BOUNDS ON REIDEMEISTER MOVES NEEDED TO SIMPLIFY THE UNKNOT. To find our upper bound on the number of Reidemeister moves, we must first specify an upper bound on the number m of exchange and destabilization moves required to trivialize an arc-presentation. This bound will depend on the complexity c(L) = n of the arc-presentation L. We must also provide an upper bound on the number of Reidemeister moves required for a destabilization or exchange move.
In [4], Dynnikov provides the following bounds on the number of combinatorially
distinct arc-presentations of complexity n.



Proposition 5. If N(n) denotes the number of combinatorially distinct arc-presentations of complexity n, then the inequality $N(n) \le \frac{1}{2} n[(n-1)!]^2$ holds.

Proof. Suppose that we want to create an arc-presentation on the $n \times n$ integer lattice. Let us choose a starting point in the lattice. There are $n^2$ lattice points, but any of the $2n$ vertices of a given diagram could serve as its starting point, so this choice contributes a factor of $\frac{n^2}{2n} = \frac{n}{2}$. From this point, we create a vertical arc ending at another point in the integer lattice. There are $n - 1$ choices for this endpoint. From our new point, we want to create a horizontal arc with an endpoint in the lattice. There are $n - 1$ choices for this endpoint as well. Next, we make another vertical arc, choosing one of the $n - 2$ possible endpoints. (There are only $n - 2$ choices, since no two arcs in the diagram may be collinear.) Similarly, we have $n - 2$ choices for the endpoint of our next horizontal arc. Continuing in this fashion, we see that the number of ways to make the remaining choices is at most $[(n-1)!]^2$. Multiplying this quantity by $\frac{n}{2}$ to account for the initial choice of starting point yields $\frac{1}{2} n[(n-1)!]^2$.
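For small n the count can be compared with a brute-force enumeration. The sketch below rests on an encoding of our own: an arc-presentation of complexity n is identified with a choice, for each of the n columns of the $n \times n$ grid, of the two rows joined by its vertical arc, such that every row also contains exactly two vertices and the resulting closed curve has a single component (a knot rather than a link). Under this encoding the enumeration agrees with $\frac{1}{2} n[(n-1)!]^2$ for $2 \le n \le 5$.

```python
from itertools import combinations, product
from math import factorial


def is_single_component(cols):
    """Follow the closed rectangular curve from column 0 and count columns visited."""
    n = len(cols)
    row_cols = [[] for _ in range(n)]
    for x, (a, b) in enumerate(cols):
        row_cols[a].append(x)
        row_cols[b].append(x)
    visited, col, row = 1, 0, cols[0][0]
    while True:
        c1, c2 = row_cols[row]          # the two ends of this row's horizontal arc
        col = c2 if c1 == col else c1   # travel along the horizontal arc
        if col == 0:
            break
        visited += 1
        a, b = cols[col]
        row = b if a == row else a      # travel along the vertical arc
    return visited == n


def count_knot_arc_presentations(n):
    count = 0
    for cols in product(combinations(range(n), 2), repeat=n):
        uses = [0] * n
        for a, b in cols:
            uses[a] += 1
            uses[b] += 1
        if all(u == 2 for u in uses) and is_single_component(cols):
            count += 1
    return count


def proposition5_bound(n):
    return n * factorial(n - 1) ** 2 // 2  # exact: n[(n-1)!]^2 is even for n >= 2


for n in range(2, 6):
    print(n, count_knot_arc_presentations(n), proposition5_bound(n))
```

The agreement reflects the fact that such configurations are exactly the Hamiltonian cycles of the complete bipartite graph $K_{n,n}$ (rows versus columns), of which there are $n!(n-1)!/2 = \frac{1}{2} n[(n-1)!]^2$.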

Using this count on the number of distinct arc-presentations of a given size, we can find a bound (albeit a large one) on the number of arc-presentation moves we need. This is simply because a sequence of moves without repetitions consists of mutually distinct arc-presentations that do not exceed the complexity of the original, and there are a limited number of such diagrams.

Lemma 6. The number of terms, m, in the monotonic simplification of an arc-presentation L with c(L) = n is bounded above by $\sum_{i=2}^{n} \frac{1}{2}\, i\,[(i-1)!]^2$.

Proof. Suppose that an arc-presentation L has complexity n. Deleting any repetitions from the sequence, we may assume that each $L_k$ from Theorem 3 is combinatorially distinct from any other $L_j$ with $k \ne j$. Since every $L_k$ has complexity between 2 and n, the number m of arc-presentations in the sequence must be at most $\sum_{i=2}^{n} N(i)$, which is no greater than $\sum_{i=2}^{n} \frac{1}{2}\, i\,[(i-1)!]^2$.
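The sum in Lemma 6 is easy to tabulate exactly with Python's arbitrary-precision integers. A minimal sketch, with function names of our own choosing:

```python
from math import factorial


def proposition5_bound(i: int) -> int:
    """(1/2) i [(i-1)!]^2, the Proposition 5 bound for complexity i."""
    return i * factorial(i - 1) ** 2 // 2  # i[(i-1)!]^2 is even for every i >= 2


def lemma6_bound(n: int) -> int:
    """Upper bound on the length m of a monotonic simplification of complexity n."""
    return sum(proposition5_bound(i) for i in range(2, n + 1))


print([lemma6_bound(n) for n in range(2, 6)])  # [1, 7, 79, 1519]
```

Consecutive terms grow by a factor of roughly $n^2$, so the last term dominates and the bound behaves essentially like $\frac{1}{2} n[(n-1)!]^2$.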

We should note that, if we start with an arc-presentation of the unknot, every arc-presentation in our simplification sequence must be a diagram of the unknot. As n gets larger, a far smaller fraction of the arc-presentations of complexity n are unknots. Thus, in practice, m will be much lower than the upper bound provided here. We would be interested to know the probability that an arc-presentation of complexity n is the unknot. Using this probability, we could tighten the upper bound we found above.
We return now to our second question: How many Reidemeister moves does it take
to make an arc-presentation move?

Lemma 7. No more than n − 2 Reidemeister moves are required to perform an exchange or destabilization move on an arc-presentation L with complexity c(L) = n.

Proof. Clearly, a destabilization move requires at most one Reidemeister move, a type I move. Now, consider the first exchange move pictured in Figure 8. Let d be the number of vertical strands intersecting both of the horizontal strands to be switched. Then the move requires d type III moves and one type II move. Thus, the exchange move requires d + 1 Reidemeister moves. We note that if a is the length of the shorter horizontal arc, then d < a, that is, d + 1 ≤ a. But a cannot be greater than n − 2, so the number of Reidemeister moves required is less than or equal to n − 2. Similarly, the second exchange move in Figure 8 requires d type III moves but no type II moves. Thus, both pictured exchange moves require no more than n − 2 Reidemeister moves. We note that other versions of the exchange moves (where the horizontal arcs lie in distinct halves of the arc-presentation) require no Reidemeister moves.

For the finale, we put our two results together.

Theorem 8. Suppose that K is a diagram (in Morse form) of the unknot with crossing number cr(K) and number of maxima b(K). Let M = 2b(K) + cr(K). Then the number of Reidemeister moves required to unknot K is less than or equal to
$$\left(\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i-1)!]^2\right)(M-2).$$

Proof. Suppose that the arc-presentation $L_K$ of our knot diagram K has complexity $c(L_K) = n$. By Lemma 7, at most $m(n-2)$ Reidemeister moves are required to produce the trivial (complexity 2) arc-presentation, where m is the number of moves in the monotonic simplification of $L_K$. By Lemma 6, this quantity is bounded above by
$$\left(\sum_{i=2}^{n} \frac{1}{2}\, i\,[(i-1)!]^2\right)(n-2).$$
But we showed that $n \le 2b(K) + \mathrm{cr}(K) = M$; thus, the number of Reidemeister moves required to unknot K is less than or equal to
$$\left(\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i-1)!]^2\right)(M-2).$$
We’ve achieved our desired result.
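To get a feel for the size of this bound, it can be evaluated exactly. The sketch below is ours (the function name is hypothetical); it simply multiplies the Lemma 6 sum by the Lemma 7 cost M − 2. For the Culprit, whose unknotting in Figure 3 takes ten moves, the bound already exceeds $10^{36}$.

```python
from math import factorial


def theorem8_bound(b: int, cr: int) -> int:
    """Upper bound on Reidemeister moves from Theorem 8, with M = 2b + cr."""
    m = 2 * b + cr
    # Lemma 6: at most sum_{i=2}^{M} (1/2) i [(i-1)!]^2 arc-presentation moves.
    arc_moves = sum(i * factorial(i - 1) ** 2 // 2 for i in range(2, m + 1))
    # Lemma 7: each of those moves costs at most M - 2 Reidemeister moves.
    return arc_moves * (m - 2)


culprit = theorem8_bound(5, 10)  # the Culprit: b(K) = 5, cr(K) = 10, so M = 20
print(culprit, len(str(culprit)))
```

For the trivial diagram (b = 1, cr = 0, so M = 2) the function returns 0, as it should.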

5. A DETOUR: BOUNDS FOR UNTANGLING LINKS. We now take a short detour through the world of nontrivial knots and links, to illustrate that similar questions may be extended to families of knots and links beyond the unknot. In keeping with our theme, we make use of the work of Dynnikov. He proved two other results regarding the simplification of certain link diagrams [4]. In a fashion analogous to the previous section, we may use Dynnikov’s results to bound the number of Reidemeister moves and the number of crossings needed to simplify certain types of link diagrams. Before we state these theorems, let us define our terms clearly.

Definition 3. A link diagram L is said to be split if there is a line not intersecting L such that components of the diagram lie on both sides of the line. A link (or knot) diagram L is composite if it can be viewed as a connected sum of two nontrivial links, i.e., if there is a line intersecting the link at two points such that the tangles on either side of the line are nontrivial. In general, a link is said to be split or composite if there exists a diagram of the link that is split or composite. Figures 13 and 14 give examples illustrating these definitions.

We review the pertinent results from [4].

May 2014] UNKNOTTING UNKNOTS 387


Figure 13. A split link
Figure 14. A composite knot

Theorem 9 (Dynnikov). If L is an arc-presentation of a split link, then there exists a finite sequence of exchange and destabilization moves

$$L \to L_1 \to L_2 \to \cdots \to L_m$$

such that $L_m$ is split.

Theorem 10 (Dynnikov). If L is an arc-presentation of a non-split composite link, then there exists a finite sequence of exchange and destabilization moves

$$L \to L_1 \to L_2 \to \cdots \to L_m$$

such that $L_m$ is composite.

We note that the statements of Lemmas 2, 6, and 7 hold for arbitrary links as well
as diagrams of the unknot. Thus, the following result is an immediate consequence of
the previous theorems.

Theorem 11. Suppose that L is a diagram (in Morse form) of a split (resp., non-split composite) link with crossing number cr(L) and number of maxima b(L). Let M = 2b(L) + cr(L). Then the number of Reidemeister moves required to transform L into a split (resp., composite) diagram is less than or equal to

$$\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i-1)!]^2\,(M-2).$$

Similarly, we have the following extension of our results regarding maximum cross-
ing numbers in a simplifying Reidemeister sequence.

Theorem 12. Suppose that L is a diagram (in Morse form) of a split (resp., non-split composite) link with crossing number cr(L) and number of maxima b(L). Then for every i, the crossing number $cr(L_i)$ is no more than $(M-2)^2$, where $M = 2b(L) + cr(L)$ and $L = L_0, L_1, L_2, \ldots, L_N$ is a sequence of link diagrams such that $L_{i+1}$ is obtained from $L_i$ by a single Reidemeister move, and $L_N$ is split (resp., composite).

6. HARD UNKNOTS. We have provided upper bounds regarding the complexity of the Reidemeister sequence required to simplify an unknot, both in terms of the
number of crossings and the number of moves required in the sequence. The bound that
Dynnikov’s work helps us obtain for the number of Reidemeister moves required to

388 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121



unknot an unknot is superexponential. Using a different technique, Hass and Lagarias
were able to find a bound on the number of required moves that is exponential in the
crossing number of the diagram [6]. They use the same technique to find an exponential
bound for the number of crossings required for unknotting. For bounds of this second
sort, the one presented here is a comparatively sharper estimate.
Regarding lower bounds, it was recently shown in [7] that there are unknot di-
agrams for which the number of Reidemeister moves required for unknotting is
quadratic in the crossing number of the initial diagram. In [8], similar quadratic lower
bounds are given for links. On the other hand, little is known about how many addi-
tional crossings an unknot diagram might require in order to become unknotted. While
the upper bound on the number of crossings needed in a Reidemeister sequence is
merely quadratic in the crossing number of the initial unknot diagram, it nonetheless
seems likely that this bound is far from being tight.
Let us return to our friend, the Culprit. This famous hard unknot diagram was orig-
inally discovered by Ken Millett and introduced in [9]. Recall that hard unknots are
difficult to unknot by virtue of the fact that no simplifying type I or type II Reide-
meister moves and no type III moves are available. In Figure 15, we picture a Morse
diagram of the Culprit, its corresponding rectangular diagram, and its arc-presentation
obtained by rotating crossings where the over-strand is horizontal. Note that we need
not specify crossing information in the arc-presentation, for it is assumed that all ver-
tical lines pass over horizontal lines.

Figure 15. The Culprit with its rectangular diagram and arc-presentation

We saw in Figure 3 that the Culprit may be unknotted with ten Reidemeister moves; see also [12]. The maximum crossing number of all diagrams in the given Reidemeister sequence is 12, two more than the number of crossings in the Culprit. As noted in the introduction, however, we can compute our upper bound on the number of crossings required for unknotting as follows. Since the crossing number cr(K) = 10 and the number of maxima in the diagram is b(K) = 5, we see that M = cr(K) + 2b(K) = 20. Thus, our bound is $(M-2)^2 = 18^2 = 324$.
We can also use M to find our bound for the number of Reidemeister moves required to unknot the Culprit:

$$\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i-1)!]^2\,(M-2) = 9\sum_{i=2}^{20} i\,[(i-1)!]^2.$$

The largest term in this expression is roughly $10^{35}$, quite a bit larger than ten, unfortunately.
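The two Culprit bounds can be reproduced exactly with Python's arbitrary-precision integers. This is only an illustrative sketch: the function names are our own, and the code simply evaluates $(M-2)^2$ and the sum in Theorem 8 with $M = 2b(K) + cr(K)$.

```python
from math import factorial

def crossing_bound(b: int, cr: int) -> int:
    """(M - 2)^2 with M = 2b + cr: bound on crossings in a simplifying sequence."""
    m = 2 * b + cr
    return (m - 2) ** 2

def move_bound(b: int, cr: int) -> int:
    """Theorem 8: sum_{i=2}^{M} (1/2) i [(i-1)!]^2 (M - 2) with M = 2b + cr.

    Each i [(i-1)!]^2 is even for i >= 2, so halving the total is exact.
    """
    m = 2 * b + cr
    return (m - 2) * sum(i * factorial(i - 1) ** 2 for i in range(2, m + 1)) // 2

# The Culprit: cr(K) = 10 and b(K) = 5, so M = 20.
print(crossing_bound(5, 10))                 # 324
print(f"{20 * factorial(19) ** 2:.2e}")      # largest term i[(i-1)!]^2, at i = 20
print(f"{move_bound(5, 10):.2e}")            # the full bound of Theorem 8
```

Working with integers rather than floats keeps the bound exact, which matters for comparing it against small move counts such as ten.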



We challenge the reader to find examples where the maximum crossing number is
closer to our bound and where the number of needed Reidemeister moves is large in
comparison to the number of crossings in the original diagram.

7. CONCLUSIONS. We’ve considered the phenomenon that it may be quite hard to unknot a trivial knot, and have given upper and lower bounds on the number of
Reidemeister moves and the number of crossings needed to do the job. Known hard
unknots like the Culprit and examples from [7] illustrate that unknotting can be tricky,
but not as tricky as the upper bounds that are known would have us believe. To answer
the questions we’ve posed, there is much more to be done.

ACKNOWLEDGEMENTS. The authors would like to thank Jeffrey Lagarias and John Sullivan for their
valuable comments. We would also like to thank the referees for their editorial suggestions.

REFERENCES

1. J. W. Alexander, Topological invariants of knots and links, Trans. Amer. Math. Soc. 20 (1923) 275–306.
2. J. Birman, M. Hirsch, A new algorithm for recognizing the unknot, Geom. Top. 2 (1998) 175–220.
3. J. Birman, J. Moody, Obstructions to trivializing a knot, Israel J. Math. 142 (2004) 125–162.
4. I. A. Dynnikov, Arc–presentations of links: monotonic simplification, Fund. Math. 190 (2006) 29–76.
5. L. Goeritz, Bemerkungen zur Knotentheorie, Abh. Math. Sem. Univ. Hamburg 18 (1997) 201–210.
6. J. Hass, J. Lagarias, The number of Reidemeister moves needed for unknotting, J. Amer. Math. Soc. 14
(2001) 399–428.
7. J. Hass, T. Nowik, Unknot diagrams requiring a quadratic number of Reidemeister moves to untangle,
Discrete Comput. Geom. 44 (2010) 91–95.
8. C. Hayashi, M. Hayashi, Unknotting number and number of Reidemeister moves needed for unlinking,
available at http://arxiv.org/abs/1012.4131 (2010) 1–10.
9. L. Kauffman, Knots and Physics, Series on Knots and Everything, Vol. 1, World Scientific, Singapore,
1991.
10. L. Kauffman, Knot diagrammatics, in The Handbook of Knot Theory, Edited by W. Menasco and M.
Thistlethwaite, Elsevier, Amsterdam, 2005. 233–318.
11. L. Kauffman, S. Lomonaco, Quantum knots and mosaics, J. Quantum Info. Processing 7 (2008) 85–115.
12. L. Kauffman, S. Lambropoulou, Hard unknots and collapsing tangles, in Introductory Lectures on Knot
Theory—Selected Lectures presented at the Advanced School and Conference on Knot Theory and its
Applications to Physics and Biology ICTP, Trieste, Italy, 11–29 May 2009, Edited by L. Kauffman, S.
Lambropoulou, S. Jablan, and J. Przytycki, World Scientific, Singapore, 2011.
13. C. Manolescu, P. Ozsváth, Z. Szabó, D. Thurston, Combinatorial link Floer homology, Geom. Top. 11
(2007) 2339–2412.
14. K. Reidemeister, Knotentheorie, Julius Springer, Berlin, 1932.

ALLISON HENRICH received her B.S. in mathematics and B.A. in philosophy from the University of Wash-
ington in 2003 and her Ph.D. in mathematics from Dartmouth College in 2008. She is currently an assistant
professor at Seattle University and has research interests in virtual knot theory and games involving knots.
When she is not doing mathematics, Henrich enjoys going to concerts, cooking, and playing with her puppies.
Department of Mathematics, Seattle University, Seattle, WA 98122
henricha@seattleu.edu

LOUIS KAUFFMAN is Professor of mathematics at the University of Illinois at Chicago. His primary re-
search interest is in knot theory. He is well known for the bracket state sum model for the Jones polynomial,
for a two-variable link polynomial called the Kauffman polynomial, and for the introduction and exploration of
an extension of classical knots called virtual knot theory. Kauffman is the author of four books on knot theory,
the editor of the World Scientific book series On Knots and Everything, and the editor-in-chief and founding
editor of the Journal of Knot Theory and Its Ramifications. When not doing mathematics, he plays clarinet in
the Chicago-based ChickenFat Klezmer Orchestra.
Department of Mathematics, Statistics, and Computer Science, University of Illinois, Chicago, IL 60607
kauffman@uic.edu




A Geometric Representation of
Continued Fractions
Alan F. Beardon and Ian Short

Abstract. Inspired by work of Ford, we describe a geometric representation of real and com-
plex continued fractions by chains of horocycles and horospheres in hyperbolic space. We
explore this representation using the isometric action of the group of Möbius transformations
on hyperbolic space, and prove a classical theorem on continued fractions.

1. INTRODUCTION. In this paper, we consider infinite complex continued fractions of the form

$$K(b_n) = b_1 + \cfrac{1}{b_2 + \cfrac{1}{b_3 + \cfrac{1}{b_4 + \cdots}}},$$

where b1 , b2 , . . . are complex numbers. We say that this is an integer continued frac-
tion if all bi are integers, and a real continued fraction if all bi are real. It has long
been recognized that we can study real and complex continued fractions from a ge-
ometric point of view by using Möbius transformations; here, we show how to rep-
resent real continued fractions by chains of horocycles in the hyperbolic plane, and
complex continued fractions by chains of horospheres in hyperbolic space. The ori-
gin of this idea is in Ford’s well-known paper [5] where he used such a representa-
tion to study integer continued fractions. Ford constructs the horocycles at rational
points in an elementary way; indeed, he says, “Perhaps the author owes an apology
to the reader for asking him to lend his attention to so elementary a subject, . . . ”.
On the other hand, he also says that his original idea was motivated by Bianchi’s
study of the Picard group. This material is at a deeper level, and Ford uses this in [4]
where he discusses horospheres in three-dimensional hyperbolic space that are based
at the Gaussian integers. Here we follow a similar path from an elementary represen-
tation of real continued fractions by horocycles to a deeper study of the representa-
tion of complex continued fractions by horospheres in three-dimensional hyperbolic
space.

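2. CONTINUED FRACTIONS AND MÖBIUS MAPS. We begin by describing the relationship between continued fractions and certain sequences of matrices and Möbius maps. Let $b_1, b_2, \ldots$ be as above. Moreover, let $A_0, A_1, A_2, \ldots$ and $B_0, B_1, B_2, \ldots$, where $A_0 = 1$ and $B_0 = 0$, be given by

$$\begin{pmatrix} A_n & A_{n-1} \\ B_n & B_{n-1} \end{pmatrix} = \begin{pmatrix} b_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b_2 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} b_n & 1 \\ 1 & 0 \end{pmatrix}. \tag{2.1}$$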

http://dx.doi.org/10.4169/amer.math.monthly.121.05.391
MSC: Primary 40A15, Secondary 30F45, 30B70

May 2014] A GEOMETRIC REPRESENTATION OF CONTINUED FRACTIONS 391


Taking determinants gives |An Bn−1 − An−1 Bn | = 1. We also define tn (z) = bn + 1/z
and Tn (z) = (An z + An−1 )/(Bn z + Bn−1 ) for n = 1, 2, . . . , and then, corresponding
to (2.1), we have

Tn = t1 ◦ t2 ◦ · · · ◦ tn .

Evaluating this equation at ∞ gives

$$T_n(\infty) = \frac{A_n}{B_n} = b_1 + \cfrac{1}{b_2 + \cfrac{1}{b_3 + \cfrac{1}{\ddots + \cfrac{1}{b_n}}}}.$$

Clearly, we are working in the extended complex plane, and K(bn ) converges if the
sequence Tn (∞) converges; otherwise, it diverges.
Note that we can recapture the coefficients $b_n$ from the maps $T_n$. Indeed,

$$b_n = t_n(\infty) = T_{n-1}^{-1} T_n(\infty),$$

and since we know the matrix representations for $T_n$ and $T_{n-1}$, we can simplify the term on the right to obtain $|A_nB_{n-2} - B_nA_{n-2}| = |b_n|$. This gives

$$|T_n(\infty) - T_{n-2}(\infty)| = \frac{|b_n|}{|B_{n-2}|\,|B_n|}. \tag{2.2}$$
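Multiplying out (2.1) one factor at a time gives the recurrences $A_n = b_nA_{n-1} + A_{n-2}$ and $B_n = b_nB_{n-1} + B_{n-2}$, started from $A_{-1} = 0$ and $B_{-1} = 1$. The sketch below uses them, with hypothetical helper names and exact rational coefficients chosen only for illustration, to check that $T_n(\infty) = A_n/B_n$, that the determinant in (2.1) equals $(-1)^n$, and identity (2.2).

```python
from fractions import Fraction

def convergents(b):
    """Return A, B with A[n], B[n] (n = 0..len(b)) as defined by the product (2.1).

    Right-multiplying by [[b_n, 1], [1, 0]] gives A_n = b_n A_{n-1} + A_{n-2}
    and B_n = b_n B_{n-1} + B_{n-2}, started from A_{-1} = 0, B_{-1} = 1.
    """
    A, B = [0, 1], [1, 0]        # entries for n = -1 and n = 0
    for bn in b:
        A.append(bn * A[-1] + A[-2])
        B.append(bn * B[-1] + B[-2])
    return A[1:], B[1:]          # drop n = -1, so A[n] is A_n

def T_inf(b):
    """b_1 + 1/(b_2 + ... + 1/b_n), i.e. T_n(inf), evaluated from the inside out."""
    value = Fraction(b[-1])
    for bk in reversed(b[:-1]):
        value = bk + 1 / value
    return value

b = [Fraction(2), Fraction(-3, 4), Fraction(5, 2), Fraction(1, 3), Fraction(-7)]
A, B = convergents(b)
for n in range(1, len(b) + 1):
    assert Fraction(A[n], B[n]) == T_inf(b[:n])                 # T_n(inf) = A_n/B_n
    assert A[n] * B[n - 1] - A[n - 1] * B[n] == (-1) ** n        # determinant of (2.1)
for n in range(3, len(b) + 1):                                  # B_{n-2} != 0 needs n >= 3
    assert abs(A[n] * B[n - 2] - B[n] * A[n - 2]) == abs(b[n - 1])  # b[n-1] holds b_n
    gap = abs(Fraction(A[n], B[n]) - Fraction(A[n - 2], B[n - 2]))
    assert gap == abs(b[n - 1]) / (abs(B[n - 2]) * abs(B[n]))   # identity (2.2)
```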

3. CHAINS OF HOROCYCLES. In this section, we shall assume that the coefficients $b_1, b_2, \ldots$ are real numbers (but not necessarily integers). Let H denote the
upper half of the complex plane C. A horocycle in H refers to either a circle in C
that is tangent to the real axis and otherwise lies in H, or else a horizontal line in H
(which has constant imaginary part) with the point ∞ attached. The base point of the
horocycle is the point of tangency in the first case, and ∞ in the latter case. A simple
application of Pythagoras’ theorem gives the following lemma.

Lemma 3.1. Two horocycles with distinct real base points x and y, and Euclidean radii r and s, are tangent if and only if

$$|x-y|^2 = 4rs.$$
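The Pythagorean step behind Lemma 3.1 can be written out as follows. Both horocycles are Euclidean circles, centered at $x + ir$ and $y + is$; because the base points differ, neither circle lies inside the other, so tangency means the centers are exactly $r + s$ apart.

```latex
\begin{aligned}
|(x + ir) - (y + is)| = r + s
  &\iff (x - y)^2 + (r - s)^2 = (r + s)^2 \\
  &\iff (x - y)^2 = (r + s)^2 - (r - s)^2 = 4rs .
\end{aligned}
```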

Next, using the matrix entries $A_n$ and $B_n$ of (2.1), we establish a correspondence between continued fractions and chains of horocycles in the hyperbolic plane. Let $\Pi_0$ denote the horocycle $\{z : \mathrm{Im}[z] = 1\} \cup \{\infty\}$. Given a continued fraction $K(b_n)$, define $\Pi_n$, for each positive integer n, to be the horocycle with base point $A_n/B_n$ and Euclidean radius $1/(2B_n^2)$ provided $B_n \neq 0$, and if $B_n = 0$, then define $\Pi_n = \{z : \mathrm{Im}[z] = A_n^2\} \cup \{\infty\}$. Suppose that $B_n, B_{n-1} \neq 0$; then

$$\left(\frac{A_n}{B_n} - \frac{A_{n-1}}{B_{n-1}}\right)^2 = \frac{|A_nB_{n-1} - A_{n-1}B_n|^2}{B_n^2B_{n-1}^2} = 4\cdot\frac{1}{2B_n^2}\cdot\frac{1}{2B_{n-1}^2}, \tag{3.1}$$




and so Lemma 3.1 implies that $\Pi_n$ and $\Pi_{n-1}$ are tangent. Likewise, we can check that $\Pi_n$ and $\Pi_{n-1}$ are tangent if one of $B_n$ or $B_{n-1}$ is 0 (they cannot both be 0). We say that a sequence of horocycles $\Pi_0, \Pi_1, \ldots$ of this type (where $\Pi_n$ and $\Pi_{n-1}$ are tangent and $\Pi_0$ is given by $\mathrm{Im}[z] = 1$) is a chain of horocycles. The first five horocycles in a typical chain are shown in Figure 3.1. Notice that nonconsecutive horocycles may overlap.

Figure 3.1. The beginnings of a chain of horocycles ($\Pi_0$ through $\Pi_4$)
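The following sketch builds $\Pi_0, \Pi_1, \ldots$ from the recurrences behind (2.1) and checks that consecutive horocycles are tangent, using Lemma 3.1 for two circles and the fact that a circle of radius r tangent to the real axis touches the line $\mathrm{Im}[z] = h$ exactly when $2r = h$. The tuple encoding and helper names are our own; the coefficients are exact rationals, so tangency can be tested with exact equality, and they are chosen so that $B_3 = 0$.

```python
from fractions import Fraction

def horocycles(b):
    """Pi_0, ..., Pi_n for real coefficients b_1, ..., b_n (exact rationals).

    A horocycle is ("circle", base_point, radius) or ("line", height).
    Pi_0 is the line Im z = 1; Pi_n has base point A_n/B_n and radius
    1/(2 B_n^2), or is the line Im z = A_n^2 when B_n = 0.
    """
    A_prev, A, B_prev, B = 0, 1, 1, 0            # (A_{-1}, A_0), (B_{-1}, B_0)
    chain = [("line", Fraction(1))]
    for bn in b:
        A_prev, A = A, bn * A + A_prev
        B_prev, B = B, bn * B + B_prev
        if B == 0:
            chain.append(("line", Fraction(A) ** 2))
        else:
            chain.append(("circle", Fraction(A) / B, 1 / (2 * Fraction(B) ** 2)))
    return chain

def tangent(h1, h2):
    """Tangency of two horocycles; Lemma 3.1 covers the case of two circles."""
    if h1[0] == "line" and h2[0] == "line":
        return False                             # both based at infinity; never consecutive
    if h1[0] == "line":
        h1, h2 = h2, h1                          # put the circle first
    if h2[0] == "line":
        return 2 * h1[2] == h2[1]                # the circle's top touches the line
    return (h1[1] - h2[1]) ** 2 == 4 * h1[2] * h2[2]   # |x - y|^2 = 4rs

# Mixed signs, chosen so that B_3 = 0 and Pi_3 is a horizontal line.
b = [1, -1, 1, 2, Fraction(1, 2)]
chain = horocycles(b)
assert all(tangent(chain[n - 1], chain[n]) for n in range(1, len(chain)))
```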

We have ignored the algebraic details of the special case Bn = 0, which occurs pre-
cisely when Tn (∞) = ∞. We return to this issue afresh in Section 6 with a geometric
approach that avoids the need to distinguish the point ∞. In Section 4, we study continued fractions with positive coefficients $b_n$, in which case the denominators $B_n$ ($n \ge 1$) are also positive, and all horocycles $\Pi_n$ ($n \ge 1$) are Euclidean circles.
Each horocycle in a chain corresponding to an integer continued fraction is a Ford circle; that is, its base point is a reduced rational p/q, and its Euclidean radius is $1/(2q^2)$. Ford introduced chains of Ford circles in [5]. Ford circles never overlap. A collection of Ford circles is shown in Figure 3.2, and the first few horocycles in a chain are shaded in a darker color.

Figure 3.2. The beginnings of a chain of Ford circles
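Two of the claims above can be checked exactly in a few lines, as a sketch with hypothetical helper names: distinct Ford circles never overlap, and the convergents of an integer continued fraction are reduced fractions whose consecutive Ford circles are tangent. For two circles tangent to the real axis at distinct points, non-overlap is equivalent to $|x-y|^2 \ge 4rs$, the inequality version of Lemma 3.1.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd

def ford_circle(p, q):
    """Ford circle at the reduced rational p/q, as (base point, Euclidean radius)."""
    return Fraction(p, q), Fraction(1, 2 * q * q)

def no_overlap(c1, c2):
    """Circles tangent to the real axis at distinct points: disjoint or tangent."""
    (x, r), (y, s) = c1, c2
    return (x - y) ** 2 >= 4 * r * s

# All Ford circles based in [0, 1] with denominator at most 12.
N = 12
circles = [ford_circle(p, q) for q in range(1, N + 1)
           for p in range(q + 1) if gcd(p, q) == 1]
assert all(no_overlap(c1, c2) for c1, c2 in combinations(circles, 2))

# Convergents of the integer continued fraction with b = 1, 2, 3, 4, 5.
A_prev, A, B_prev, B = 0, 1, 1, 0          # (A_{-1}, A_0), (B_{-1}, B_0)
chain = []
for bn in [1, 2, 3, 4, 5]:
    A_prev, A = A, bn * A + A_prev
    B_prev, B = B, bn * B + B_prev
    assert gcd(A, B) == 1                  # A_n / B_n is already reduced
    chain.append(ford_circle(A, B))
for (x, r), (y, s) in zip(chain, chain[1:]):
    assert (x - y) ** 2 == 4 * r * s       # consecutive Ford circles are tangent
```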

4. A CONVERGENCE THEOREM. The following well-known theorem (originally due to Seidel and Stern) is found in many classic texts on continued fractions, such as [6, Theorem 166], [8, Theorem 10], and [9, Theorem 3.13].

Theorem 4.1. Suppose that $b_1, b_2, \ldots$ are positive numbers. If $\sum b_n$ diverges, then the continued fraction $K(b_n)$ converges.



This result shows that continued fractions with positive integer coefficients con-
verge. In this section we illustrate and prove the theorem using chains of horocycles.
Each transformation tn (z) = bn + 1/z, where now bn > 0, maps the interval [0, ∞]
within itself. It follows that Tn also maps [0, ∞] within itself, and so the point Tn (bn+1 )
lies between Tn (0) and Tn (∞). Now

Tn (bn+1 ) = Tn+1 (∞) and Tn (0) = Tn−1 (∞),

and therefore we have shown that Tn+1 (∞) lies between Tn−1 (∞) and Tn (∞). Since
trivially T1 (∞) < T2 (∞), we deduce that

0 < T1 (∞) < T3 (∞) < · · · < T2n−1 (∞) < T2n (∞) < · · · < T4 (∞) < T2 (∞). (4.1)

This implies that there are real numbers α and β, with α ≤ β, such that

T2n−1 (∞) → α and T2n (∞) → β (4.2)

as n → ∞, and K(bn ) converges if and only if α = β.
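The chain of inequalities (4.1) can be observed directly. The sketch below (helper names are hypothetical; exact rational arithmetic) computes the approximants $T_n(\infty) = A_n/B_n$ for two positive sequences, asserts the nesting in (4.1), and returns the width $T_{2n}(\infty) - T_{2n-1}(\infty)$ of the current bracket around $[\alpha, \beta]$.

```python
from fractions import Fraction

def approximants(b):
    """T_n(inf) = A_n / B_n for n = 1, ..., len(b), using the recurrences of (2.1)."""
    A_prev, A, B_prev, B = 0, 1, 1, 0      # (A_{-1}, A_0), (B_{-1}, B_0)
    values = []
    for bn in b:
        A_prev, A = A, bn * A + A_prev
        B_prev, B = B, bn * B + B_prev
        values.append(Fraction(A) / B)    # B_n > 0 because every b_n > 0
    return values

def check_nesting(b):
    """Assert the inequalities (4.1) for the first len(b) approximants."""
    T = approximants(b)
    odd, even = T[0::2], T[1::2]           # T_1, T_3, ... and T_2, T_4, ...
    assert all(x < y for x, y in zip(odd, odd[1:]))     # odd terms increase
    assert all(x > y for x, y in zip(even, even[1:]))   # even terms decrease
    assert 0 < odd[-1] < even[-1]          # so every odd term lies below every even one
    return even[-1] - odd[-1]              # current width of the bracket

harmonic = check_nesting([Fraction(1, k) for k in range(1, 21)])       # sum b_n diverges
geometric = check_nesting([Fraction(1, 2 ** k) for k in range(1, 21)])  # sum b_n converges
```

For these inputs the returned width is below 1/2 for $b_n = 1/n$ (whose sum diverges, so Theorem 4.1 applies) and above 2 for $b_n = 2^{-n}$ (whose sum converges; Section 7 shows that $K(b_n)$ then diverges).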


Let us now examine the chain of horocycles $\Pi_0, \Pi_1, \ldots$ for continued fractions with positive coefficients. By (3.1), the Euclidean radii $r_n$ and $r_{n-1}$ of the horocycles $\Pi_n$ and $\Pi_{n-1}$ satisfy

$$|T_n(\infty) - T_{n-1}(\infty)|^2 = 4r_nr_{n-1}. \tag{4.3}$$

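Since $T_n(\infty)$ lies strictly between $T_{n-2}(\infty)$ and $T_{n-1}(\infty)$, it follows that, for $n \ge 3$,

$$4r_nr_{n-1} = |T_n(\infty) - T_{n-1}(\infty)|^2 < |T_{n-1}(\infty) - T_{n-2}(\infty)|^2 = 4r_{n-1}r_{n-2},$$

and so $r_n < r_{n-2}$. We deduce that both sequences $r_1, r_3, \ldots$ and $r_2, r_4, \ldots$ are decreasing (see Figure 4.1).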

Figure 4.1. The beginnings of a chain of horocycles when all $b_n$ are positive ($\Pi_1$ through $\Pi_6$)

Suppose that $K(b_n)$ diverges; then the limits α and β from (4.2) are distinct. To prove Theorem 4.1, we must show that $\sum b_n$ converges. By (4.3),

$$4r_nr_{n-1} = |T_n(\infty) - T_{n-1}(\infty)|^2 > |\alpha - \beta|^2.$$

This implies that the decreasing sequences $r_1, r_3, \ldots$ and $r_2, r_4, \ldots$ both converge to positive constants. Now, from (2.2), we see that

$$|T_n(\infty) - T_{n-2}(\infty)| = 2b_n\sqrt{r_nr_{n-2}}.$$




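We know from (4.1) that $\sum |T_n(\infty) - T_{n-2}(\infty)|$ converges, because along the odd indices and along the even indices the partial sums telescope and are bounded by $T_2(\infty) - T_1(\infty)$. Since the sequence $r_1, r_2, \ldots$ is bounded below by a positive constant, we deduce that $\sum b_n$ also converges. This completes the proof of Theorem 4.1.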

5. MÖBIUS TRANSFORMATIONS AND HYPERBOLIC GEOMETRY. The relationship between a continued fraction and its chain of horocycles can be better understood by using the action of complex Möbius transformations on three-dimensional
hyperbolic space. We briefly review some of the theory of Möbius transformations and
hyperbolic geometry, which can be found in more detail in [1].
The set

$$\mathbb{H}^3 = \{(x, y, t) \in \mathbb{R}^3 : t > 0\}$$

is a model of three-dimensional hyperbolic space when equipped with the hyperbolic metric $\sqrt{dx^2 + dy^2 + dt^2}/t$. The corresponding distance function ϱ on $\mathbb{H}^3$ is given by

$$\varrho(u, v) = \inf_{\gamma} \int_{\gamma} \frac{\sqrt{dx^2 + dy^2 + dt^2}}{t},$$

where the infimum is taken over all smooth paths γ from u to v in H3 . Unlike the
Euclidean metric on H3 , the hyperbolic metric is complete. Let us identify the point
(x, y, 0) with the complex number z = x + i y. The complex plane C is then identified
with the Euclidean plane t = 0. The group of Möbius transformations acts on C ∪
{∞}, and we now describe how this action can be extended to H3 .
Consider a Möbius transformation f (z) = (az + b)/(cz + d), where a, b, c, and d
are complex numbers with ad − bc = 1. Define j = (0, 0, 1). Then, (x, y, t) can be
represented by the quaternion z + t j, and the action of f on H3 is given by

$$f(z + tj) = \frac{(az + b)\overline{(cz + d)} + a\bar{c}t^2 + tj}{|cz + d|^2 + |c|^2t^2}; \tag{5.1}$$

see [1, Section 4.1]. This action preserves the hyperbolic metric on H3 . In other words,

$$\varrho(f(u), f(v)) = \varrho(u, v)$$

for any pair of points u and v in H3 . In fact, every conformal isometry of H3 arises in
this way.
We end this section by examining the actions on H3 of two specific types of Möbius
transformation. First, consider a translation f (z) = z + b. Then

f (z + t j) = z + b + t j.

That is, f also acts as a translation by b on H3 . Next, suppose that f (z) = 1/z. We
must write this map as f (z) = i/(i z) in order to satisfy the condition ad − bc = 1.
Then

$$f(z + tj) = \frac{\bar{z} + tj}{|z|^2 + t^2}. \tag{5.2}$$



Geometrically, f is an inversion in the unit sphere (which preserves H3 ) followed by
reflection in the Euclidean plane y = 0 (which also preserves H3 ).
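The sketch below implements (5.1) for a point stored as a pair (z, t) and checks the two special cases just described, together with the isometry property. To measure distances it uses the standard closed form $\cosh\varrho(u, v) = 1 + |u - v|^2/(2t_ut_v)$ for the upper half-space model, which is not derived in the text; the helper names and the sample map and points are hypothetical.

```python
import math

def act(a, b, c, d, z, t):
    """Image of z + t j under f(z) = (az + b)/(cz + d), ad - bc = 1, using (5.1).

    A point of H^3 is stored as a pair (z, t) with z complex and t > 0.
    """
    den = abs(c * z + d) ** 2 + abs(c) ** 2 * t ** 2
    w = ((a * z + b) * (c * z + d).conjugate() + a * c.conjugate() * t ** 2) / den
    return w, t / den

def rho(p, q):
    """Hyperbolic distance in H^3 via cosh(rho) = 1 + |p - q|^2 / (2 t_p t_q)."""
    (z1, t1), (z2, t2) = p, q
    euclid_sq = abs(z1 - z2) ** 2 + (t1 - t2) ** 2
    return math.acosh(1 + euclid_sq / (2 * t1 * t2))

def close(p, q, tol=1e-12):
    return abs(p[0] - q[0]) < tol and abs(p[1] - q[1]) < tol

z, t = 0.6 - 0.8j, 1.5

# f(z) = z + b acts on H^3 as the translation by b.
assert close(act(1, 2 - 1j, 0, 1, z, t), (z + 2 - 1j, t))

# f(z) = 1/z, written as i/(iz) so that ad - bc = 1, agrees with (5.2).
r2 = abs(z) ** 2 + t ** 2
assert close(act(0, 1j, 1j, 0, z, t), (z.conjugate() / r2, t / r2))

# A general map with ad - bc = 1 preserves hyperbolic distance.
a, b, c = 2 + 1j, -0.5 + 0.3j, 0.7 - 1.2j
d = (1 + b * c) / a
p, q = (0.3 + 0.4j, 0.8), (-1.1 + 2j, 2.5)
assert math.isclose(rho(act(a, b, c, d, *p), act(a, b, c, d, *q)), rho(p, q), rel_tol=1e-9)
```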

6. CHAINS OF HOROSPHERES. With the geometry developed in the previous section, we are now able to take a more sophisticated view of chains of horocycles
associated to continued fractions. This time we assume that the coefficients bn of our
continued fraction are arbitrary complex numbers. A horosphere in H3 is either a Eu-
clidean sphere in R3 that is tangent to C and otherwise lies in H3 , or else a Euclidean
plane in H3 , parallel to C, with the point ∞ attached. The base point of the horosphere
is the point of tangency in the first case, and ∞ in the latter case. The image of a
horosphere under a Möbius transformation is another horosphere.
Let

$$\Sigma_0 = \{z + j : z \in \mathbb{C}\} \cup \{\infty\}.$$

We can calculate the image of $\Sigma_0$ under a Möbius transformation using the representation by quaternions described in the previous section. Let

$$\mathrm{ht}(z + tj) = t,$$

so that ht measures the ‘height’ of a point $z + tj$ in $\mathbb{H}^3$.

Lemma 6.1. Suppose that $f(z) = (az + b)/(cz + d)$, where $ad - bc = 1$. If $c \neq 0$, then $f(\Sigma_0)$ has base point $a/c$ and Euclidean radius $1/(2|c|^2)$. If $c = 0$, then $f(\Sigma_0) = \{z + |a|^2j : z \in \mathbb{C}\} \cup \{\infty\}$.

Proof. The base point of $f(\Sigma_0)$ is $f(\infty)$. Suppose first that $c \neq 0$. Then $f(\infty) = a/c$, so $f(\Sigma_0)$ is a Euclidean sphere. By (5.1), we have

$$\mathrm{ht}(f(z + j)) = \frac{1}{|cz + d|^2 + |c|^2}.$$

The maximum value of this expression is $1/|c|^2$ (when $z = -d/c$), and hence $f(\Sigma_0)$ has Euclidean radius $1/(2|c|^2)$.
Suppose now that $c = 0$. Then $ad = 1$, so $d = 1/a$. From (5.1) we obtain

$$f(z + j) = a^2z + ab + |a|^2j.$$

It follows that $f(\Sigma_0) = \{z + |a|^2j : z \in \mathbb{C}\} \cup \{\infty\}$.

Let us apply Lemma 6.1 to the map $t_n(z) = b_n + 1/z$. First we must write $t_n$ in the form $t_n(z) = (ib_nz + i)/(iz + 0)$ to satisfy the condition $ad - bc = 1$. Then Lemma 6.1 says that $t_n(\Sigma_0)$ has base point $b_n$ and Euclidean radius 1/2. Hence $t_n(\Sigma_0)$ is tangent to $\Sigma_0$ at the point $t_n(j) = b_n + j$, as illustrated in Figure 6.1.

Figure 6.1. The horospheres $\Sigma_0$ and $t_n(\Sigma_0)$ are tangent at the point $t_n(j)$.

Recall that $T_n = t_1 \circ t_2 \circ \cdots \circ t_n$. The horospheres $\Sigma_0$ and $t_n(\Sigma_0)$ are tangent at the point $t_n(j)$, which implies (applying the Möbius map $T_{n-1}$) that the horospheres $T_{n-1}(\Sigma_0)$ and $T_n(\Sigma_0) = T_{n-1}(t_n(\Sigma_0))$ are tangent at the point $T_n(j)$. A chain of horospheres is a sequence of horospheres $\Sigma_0, \Sigma_1, \Sigma_2, \ldots$, where $\Sigma_n$ is tangent to $\Sigma_{n-1}$ for $n = 1, 2, \ldots$. We have proven that $\Sigma_0, T_1(\Sigma_0), T_2(\Sigma_0), \ldots$ is a chain of horospheres (see Figure 6.2).

Figure 6.2. The beginnings of a chain of horospheres

The base points $z_0, z_1, z_2, \ldots$ of a chain of horospheres satisfy $z_0 = \infty$ and $z_n \neq z_{n-1}$ for $n = 1, 2, \ldots$. Conversely, any sequence of points of this type determines a unique chain of horospheres $\Sigma_0, \Sigma_1, \ldots$ (because $\Sigma_0$ is fixed and $\Sigma_n$ is determined inductively by the conditions that it has base point $z_n$ and is tangent to $\Sigma_{n-1}$). We use this observation to establish a correspondence between complex continued fractions and chains of horospheres.

Theorem 6.2. Given a continued fraction $K(b_n)$ with complex coefficients, the sequence $\Sigma_0, T_1(\Sigma_0), T_2(\Sigma_0), \ldots$ is a chain of horospheres. Conversely, given a chain of horospheres $\Sigma_0, \Sigma_1, \Sigma_2, \ldots$, there is a unique continued fraction $K(b_n)$ with $T_n(\Sigma_0) = \Sigma_n$ for $n = 1, 2, \ldots$.

Proof. We have only to prove the converse statement. Suppose that $\Sigma_0, \Sigma_1, \Sigma_2, \ldots$ is a chain of horospheres with base points $z_0, z_1, z_2, \ldots$. Define the sequence $b_n$ and corresponding maps $t_n(z) = b_n + 1/z$ and $T_n = t_1 \circ t_2 \circ \cdots \circ t_n$ inductively by $b_1 = z_1$ and $b_n = T_{n-1}^{-1}(z_n)$. Then $t_n(\infty) = b_n = T_{n-1}^{-1}(z_n)$, and so $T_n(\infty) = z_n$. The condition $z_n \neq z_{n-1}$ ensures that $b_n \neq \infty$, because $b_n = T_{n-1}^{-1}(z_n) \neq T_{n-1}^{-1}(z_{n-1}) = \infty$. Now, $\Sigma_0, T_1(\Sigma_0), T_2(\Sigma_0), \ldots$ is a chain of horospheres with base points $z_0, z_1, z_2, \ldots$, and it follows from the remark before this theorem that $T_n(\Sigma_0) = \Sigma_n$. The uniqueness of $K(b_n)$ is a consequence of the inductive equations $b_1 = z_1$ and $b_n = T_{n-1}^{-1}(z_n)$.
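The inductive construction in this proof is easy to run numerically. The sketch below (hypothetical helper names; it assumes every base point $z_n$ with $n \ge 1$ is finite, that is, every $B_n \neq 0$) builds the base points $z_n = A_n/B_n$ from some arbitrarily chosen complex coefficients and then recovers the coefficients via $b_1 = z_1$ and $b_n = T_{n-1}^{-1}(z_n)$.

```python
def base_points(b):
    """Base points z_n = T_n(inf) = A_n / B_n of T_n(Sigma_0) for n = 1..len(b)."""
    A_prev, A, B_prev, B = 0, 1, 1, 0          # (A_{-1}, A_0), (B_{-1}, B_0)
    z = []
    for bn in b:
        A_prev, A = A, bn * A + A_prev
        B_prev, B = B, bn * B + B_prev
        z.append(A / B)                        # assumes B_n != 0 (finite base point)
    return z

def coefficients(z):
    """Recover b_1 = z_1 and b_n = T_{n-1}^{-1}(z_n), as in the proof above."""
    A_prev, A, B_prev, B = 0, 1, 1, 0          # T_0 is the identity map
    b = []
    for zn in z:
        # T_{n-1}(w) = (A w + A_prev) / (B w + B_prev); solving T_{n-1}(w) = zn gives
        # w = (B_prev zn - A_prev) / (A - B zn), and A - B zn != 0 since zn != z_{n-1}.
        bn = (B_prev * zn - A_prev) / (A - B * zn)
        b.append(bn)
        A_prev, A = A, bn * A + A_prev
        B_prev, B = B, bn * B + B_prev
    return b

b = [1 + 2j, -1 + 1j, 2 + 0j, 3 - 1j, 0.5j]
recovered = coefficients(base_points(b))
assert all(abs(x - y) < 1e-9 for x, y in zip(recovered, b))
```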

Using Lemma 6.1, we can describe the horosphere Tn (60 ) in terms of the coeffi-
cients of Tn . We recall the sequences of complex numbers A0 , A1 , . . . and B0 , B1 , . . .
determined by the matrix equation (2.1).

Lemma 6.3. If $B_n \neq 0$, then $T_n(\Sigma_0)$ has base point $A_n/B_n$ and Euclidean radius $1/(2|B_n|^2)$. If $B_n = 0$, then $T_n(\Sigma_0) = \{z + |A_n|^2j : z \in \mathbb{C}\} \cup \{\infty\}$.

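Proof. This follows immediately from Lemma 6.1, because

$$T_n(z) = \frac{A_nz + A_{n-1}}{B_nz + B_{n-1}} \quad\text{and}\quad |A_nB_{n-1} - A_{n-1}B_n| = 1;$$

if the determinant is $-1$, multiply all four coefficients by $i$, which changes neither the map nor the quantities $a/c$, $|c|$, and $|a|$ used in Lemma 6.1.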

The horospheres $T_n(\Sigma_0)$ and $T_{n-1}(\Sigma_0)$ are tangent at the point $T_n(j)$. The base points of these two horospheres are $T_n(\infty)$ and $T_{n-1}(\infty) = T_n(0)$. The hyperbolic line between ∞ and 0 in $\mathbb{H}^3$ (a Euclidean half-line), which contains j, is mapped by $T_n$ to the hyperbolic line between $T_n(\infty)$ and $T_n(0)$ in $\mathbb{H}^3$; see Figure 6.3.

Figure 6.3. The horospheres $T_{n-1}(\Sigma_0)$ and $T_n(\Sigma_0)$ are tangent at the point $T_n(j)$.
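The tangency just described can be checked numerically. The following sketch, with arbitrarily chosen complex coefficients and hypothetical helper names, computes $T_n(j)$ from (5.1) and checks that it lies both on $T_n(\Sigma_0)$, as described by Lemma 6.3, and on $T_{n-1}(\Sigma_0)$. Because $A_nB_{n-1} - A_{n-1}B_n = (-1)^n$, the coefficients are multiplied by $i$ for odd n so that (5.1) is applied with determinant 1.

```python
import math

def act(a, b, c, d, z, t):
    """Image of z + t j under (az + b)/(cz + d) with ad - bc = 1, by formula (5.1)."""
    den = abs(c * z + d) ** 2 + abs(c) ** 2 * t ** 2
    w = ((a * z + b) * (c * z + d).conjugate() + a * c.conjugate() * t ** 2) / den
    return w, t / den

def on_sphere(point, sphere, tol=1e-9):
    """Does the point (z, t) lie on the Euclidean sphere with center zc + r j, radius r?"""
    (z, t), (zc, r) = point, sphere
    return math.isclose(math.hypot(abs(z - zc), t - r), r, abs_tol=tol)

b = [1 + 2j, -0.5 + 1j, 2 - 1j, 0.3 + 0.7j]
A_prev, A, B_prev, B = 0, 1, 1, 0              # (A_{-1}, A_0), (B_{-1}, B_0)
previous = None                                # None stands for the plane Sigma_0
for n, bn in enumerate(b, start=1):
    A_prev, A = A, bn * A + A_prev
    B_prev, B = B, bn * B + B_prev
    s = 1 if n % 2 == 0 else 1j                # A_n B_{n-1} - A_{n-1} B_n = (-1)^n
    point = act(s * A, s * A_prev, s * B, s * B_prev, 0j, 1.0)   # T_n(j)
    sphere = (A / B, 1 / (2 * abs(B) ** 2))    # Lemma 6.3 (all B_n != 0 here)
    assert on_sphere(point, sphere)            # T_n(j) lies on T_n(Sigma_0) ...
    if previous is None:
        assert math.isclose(point[1], 1.0)     # ... and on Sigma_0 when n = 1,
    else:
        assert on_sphere(point, previous)      # ... or on T_{n-1}(Sigma_0) when n > 1
    previous = sphere
```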

If the continued fraction $K(b_n)$ converges to a complex number p, then $T_n(\infty) \to p$ and $T_n(0) \to p$ as $n \to \infty$. Because $T_n(j)$ lies on the hyperbolic line between $T_n(\infty)$ and $T_n(0)$, we see that $T_n(j)$ approaches the boundary of hyperbolic space as $n \to \infty$. In particular, $\varrho(j, T_n(j)) \to \infty$ as $n \to \infty$. (In fact, $T_n(j)$ converges to p in the Euclidean metric.) If $p = \infty$, then it is still true that $\varrho(j, T_n(j)) \to \infty$ as $n \to \infty$, but to prove this rigorously, we should really use the ball model of three-dimensional hyperbolic space with the spherical metric, because in this model the point ∞ no longer has any special geometric significance.

7. THE CONVERSE TO THEOREM 4.1. Here we prove the converse to Theorem 4.1, namely that if $K(b_n)$ converges, then $\sum b_n$ diverges. In fact, we prove a stronger result, known as the Stern–Stolz theorem (see [9, Theorem 3.1]), which says that given any sequence of complex numbers $b_1, b_2, \ldots$, if $K(b_n)$ converges, then $\sum |b_n|$ diverges. Our proof is essentially the same as that in [2].
Observe that $t_n(z) = b_n + f(z)$, where $f(z) = 1/z$, and f fixes the point j; hence $t_n(j) = b_n + j$. Let Γ denote the collection of all smooth paths in $\mathbb{H}^3$ from j to $b_n + j$, and let δ denote the Euclidean line segment between j and $b_n + j$. Thus, $\delta \in \Gamma$. Let $\zeta = z + tj$; then, since $t = 1$ along δ,

$$\varrho(j, t_n(j)) = \varrho(j, b_n + j) = \inf_{\gamma \in \Gamma} \int_\gamma \frac{|d\zeta|}{t} \le \int_\delta \frac{|d\zeta|}{t} = |b_n|.$$

Now, $T_{n-1}$ is a hyperbolic isometry, so $\varrho(j, t_n(j)) = \varrho(T_{n-1}(j), T_n(j))$. Thus, using the triangle inequality, we find that

$$\begin{aligned} \varrho(j, T_n(j)) &\le \varrho(j, T_1(j)) + \varrho(T_1(j), T_2(j)) + \cdots + \varrho(T_{n-1}(j), T_n(j)) \\ &\le |b_1| + |b_2| + \cdots + |b_n|. \end{aligned}$$

But $K(b_n)$ converges, and we saw at the end of Section 6 that this implies that $\varrho(j, T_n(j)) \to \infty$ as $n \to \infty$. Therefore, $\sum |b_n|$ diverges.
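As a numerical illustration of this argument (not a proof), take $b_n = 2^{-n}$, so that $\sum |b_n| < 1$. The sketch checks the triangle-inequality bound $\varrho(j, T_n(j)) \le |b_1| + \cdots + |b_n|$ along the way and then shows that the odd and even approximants $T_n(\infty)$ remain far apart, as the Stern–Stolz theorem predicts. Distances use the standard formula $\cosh\varrho(u, v) = 1 + |u - v|^2/(2t_ut_v)$, which the text does not state; the helper names are hypothetical.

```python
import math

def act(a, b, c, d, z, t):
    """Image of z + t j under (az + b)/(cz + d) with ad - bc = 1, by formula (5.1)."""
    den = abs(c * z + d) ** 2 + abs(c) ** 2 * t ** 2
    w = ((a * z + b) * (c * z + d).conjugate() + a * c.conjugate() * t ** 2) / den
    return w, t / den

def rho(p, q):
    """Hyperbolic distance in H^3 via cosh(rho) = 1 + |p - q|^2 / (2 t_p t_q)."""
    (z1, t1), (z2, t2) = p, q
    return math.acosh(1 + (abs(z1 - z2) ** 2 + (t1 - t2) ** 2) / (2 * t1 * t2))

j = (0j, 1.0)
b = [0.5 ** k for k in range(1, 41)]       # sum |b_k| < 1, so K(b_n) should diverge
A_prev, A, B_prev, B = 0, 1, 1, 0          # (A_{-1}, A_0), (B_{-1}, B_0)
total, approximants = 0.0, []
for n, bn in enumerate(b, start=1):
    A_prev, A = A, bn * A + A_prev
    B_prev, B = B, bn * B + B_prev
    total += abs(bn)
    s = 1 if n % 2 == 0 else 1j            # the determinant is (-1)^n; scale to get +1
    Tn_j = act(s * A, s * A_prev, s * B, s * B_prev, *j)
    assert rho(j, Tn_j) <= total + 1e-9    # the bound from the triangle inequality
    approximants.append(A / B)             # T_n(inf)

# Odd and even approximants settle near different limits, so there is no convergence.
gap = approximants[-1] - approximants[-2]  # T_40(inf) - T_39(inf)
```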

8. THE VERTICAL HYPERBOLIC PLANE. We have seen how to represent a real continued fraction by a chain of horocycles in the hyperbolic plane, and a complex continued fraction by a chain of horospheres in three-dimensional hyperbolic
space. However, there is a significant difference between these two constructions. In
the real case, the horocycles are constructed in an ad hoc manner; in the complex case,
the horospheres are constructed by using the isometric action of the Möbius group
on hyperbolic space. We should ask, therefore, whether in the real case we can ob-
tain the chain of horocycles by a group action in the hyperbolic plane. Now, there
are many papers on integer continued fractions that refer to the action of the modular group (of Möbius maps $z \mapsto (az + b)/(cz + d)$, where a, b, c, and d are integers with $ad - bc = 1$) on the upper half-plane H, namely $\{x + iy : y > 0\}$, which is regarded as the hyperbolic plane with metric $|dz|/y$. However, even this is not entirely satisfactory, because when studying continued fractions in this way, we are inevitably led to consider maps of the form $z \mapsto b + 1/z$, and such maps do not leave H invariant (they interchange the upper and lower half-planes). Some papers also use the extended modular group (in which we only require the integers a, b, c, and d to satisfy $ad - bc = \pm 1$), and the same problem arises for this group.
The real problem here is that the maps $z \mapsto b + 1/z$ are strictly loxodromic, and
there is no disc or half-plane in the extended complex plane that is invariant under a
strictly loxodromic map; see [1, Theorem 4.3.4]. It follows that if we wish to describe
real continued fractions purely in terms of the group action of isometries of the hyper-
bolic plane, then we have to look for this plane elsewhere. In fact, there is such a plane
and it lies in three-dimensional hyperbolic space H3 . Let

H⊥ = {x + t j : x ∈ R, t > 0};

we call this the vertical hyperbolic plane, and it is the set of those points z + t j in
H3 with y = 0. It is well known that a Möbius map f leaves the extended real line
invariant if and only if it can be written in the form f (z) = (az + b)/(cz + d), where
the coefficients a, b, c, and d are real, and ad − bc = ±1. Equally, this is so if and only
if f leaves H⊥ invariant. It follows that when considering real continued fractions (or,



more generally, real Möbius maps) it might be advantageous to consider their action
on the invariant plane H⊥ . Indeed, H⊥ , with the hyperbolic metric induced from H3 ,
is a hyperbolic plane in H3 , and the group of Möbius maps with real coefficients acts
faithfully as a group of hyperbolic isometries on H⊥ . Let us examine, for example,
the action of g(z) = 1/z (which does not preserve the upper half-plane H) on H⊥ . By
(5.2), we have

$$g(x + tj) = \frac{x + tj}{x^2 + t^2},$$

so g acts on H⊥ as inversion across the unit circle $x^2 + t^2 = 1$ (and this action on H⊥ is anticonformal). More generally, the map f above is conformal on H⊥ if $ad - bc = 1$, and is anticonformal if $ad - bc = -1$.
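A small check, with hypothetical helper names and arbitrarily chosen maps and points: applying (5.1) to real coefficients sends points $x + tj$ of H⊥ back into H⊥, and $g(z) = 1/z$ acts there as inversion in the unit circle. Since (5.1) is unchanged when all four coefficients are multiplied by $i$, the same formula also applies to real maps with $ad - bc = -1$.

```python
import math

def act(a, b, c, d, z, t):
    """Formula (5.1) for z + t j. It is unchanged when a, b, c, d are all multiplied
    by i, so it also gives the action of real maps with ad - bc = -1."""
    den = abs(c * z + d) ** 2 + abs(c) ** 2 * t ** 2
    w = ((a * z + b) * (c * z + d).conjugate() + a * c.conjugate() * t ** 2) / den
    return w, t / den

points = [(0.3, 0.7), (-2.0, 1.5), (4.0, 0.2)]      # x + t j in H_perp (y = 0)
real_maps = [(2.0, 1.0, 3.0, 2.0),                  # ad - bc = 1
             (1.0, 2.0, 1.0, 1.0),                  # ad - bc = -1
             (0.0, 1.0, 1.0, 0.0)]                  # g(z) = 1/z, ad - bc = -1
for a, b, c, d in real_maps:
    for x, t in points:
        w, s = act(a, b, c, d, complex(x), t)
        assert abs(w.imag) < 1e-12 and s > 0        # the image stays in H_perp

# g acts on H_perp as inversion in the unit circle x^2 + t^2 = 1.
for x, t in points:
    w, s = act(0.0, 1.0, 1.0, 0.0, complex(x), t)
    r2 = x * x + t * t
    assert math.isclose(w.real, x / r2) and math.isclose(s, t / r2)
```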
This view also throws light on the use of the extended modular group in continued
fraction theory. The extended modular group is an extension of the modular group, but
in discrete group theory, it is the Picard group (the set of Möbius maps with a, b, c, and
d Gaussian integers, and ad − bc = 1), which acts on three-dimensional hyperbolic
space, that is regarded as the natural analogue of the modular group. To complete
the link between these three groups, we only have to note that the extended modular
group is simply the subgroup of the Picard group that leaves the extended real line
invariant. Thus, from a different perspective we might consider replacing the action of
the extended modular group on the complex plane by its action (as a subgroup of the
Picard group) on the vertical hyperbolic plane.
Let us now return to real continued fractions. Despite the discussion given in Sec-
tion 3, we can regard the real coefficients bn as complex coefficients, and construct
horospheres and so on, as in Section 6. If we do this, and then intersect the resulting
horospheres with the vertical hyperbolic plane H⊥ , we will obtain exactly the same
result as we achieved in Section 3, except that now the horocycles in H⊥ are given by
a group action. In this way, we see that the geometric theory of real continued frac-
tions really is a special case of the geometric theory of complex continued fractions,
for both are described by the same group action on the same space.
When the coefficients $b_n$ are real, the sequence $T_1(\infty), T_2(\infty), \ldots$ lies in the extended real line $\mathbb{R}_\infty$, and the horosphere $T_n(\Sigma_0)$ intersects H⊥ orthogonally in a circle $\Pi'_n$, which is a horocycle in the vertical hyperbolic plane (see Figure 8.1). Let $\Pi'_0$ denote the horocycle given by $t = 1$ in H⊥. Then $\Pi'_n$ and $\Pi'_{n-1}$ are tangent for $n = 1, 2, \ldots$, and so $\Pi'_0, \Pi'_1, \ldots$ is a chain of horocycles. The base point of $\Pi'_n$ is $T_n(\infty) = A_n/B_n$ and, by Lemma 6.3, the Euclidean radius of $\Pi'_n$ is $1/(2B_n^2)$.
Essentially, we have recovered the original chain of horocycles associated to a real continued fraction described in Section 3, except that now the horocycles lie in H⊥ rather than H, and, using the isometric action of Möbius transformations on hyperbolic space, we have a much deeper understanding of the geometry.

9. CONCLUDING REMARKS. The representation of continued fractions by chains of horospheres may shed light on other results in continued fraction theory. As a basic example, suppose that the horospheres $\Sigma_0, T_1(\Sigma_0), T_2(\Sigma_0), \ldots$ corresponding to a continued fraction $K(b_n)$ are all Euclidean spheres, and

$$\sum_{n=1}^{\infty} \mathrm{rad}[T_n(\Sigma_0)] < +\infty,$$

400 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121



Figure 8.1. The beginnings of a chain of horospheres when all bn are real. (Labeled in the figure: Σ₀, T₁(Σ₀), T₂(Σ₀), T₃(Σ₀); the points T₁(j), T₂(j), T₃(j); and the base points T₃(∞), T₂(∞), T₁(∞).)

where rad[Tₙ(Σ₀)] is the Euclidean radius of Tₙ(Σ₀). Then, geometrically, it is clear that K(bn) converges. Lemma 6.3 tells us that rad[Tₙ(Σ₀)] = 1/(2|Bₙ|²), and thus we recover the simple result that convergence of ∑ 1/|Bₙ|² implies convergence of K(bn).
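The phrase "geometrically, it is clear" can be unpacked in a few lines. The following is a worked sketch, not taken from the article. It assumes, as in the hypothesis, that for n ≥ 1 the horospheres Tₙ(Σ₀) are Euclidean spheres with radii rₙ and base points xₙ = Tₙ(∞), and that consecutive spheres in the chain are tangent.

```latex
% r_n = rad[T_n(\Sigma_0)], x_n = T_n(\infty); a sphere resting on the boundary
% at x_n with radius r_n has centre (x_n, r_n); consecutive spheres are tangent.
\begin{align*}
|x_n - x_{n-1}|^2 + (r_n - r_{n-1})^2 &= (r_n + r_{n-1})^2
  && \text{(distance between centres)}\\
|x_n - x_{n-1}| &= 2\sqrt{r_n r_{n-1}} \le r_n + r_{n-1}
  && \text{(AM--GM)}\\
\sum_{n \ge 2} |x_n - x_{n-1}| &\le 2 \sum_{n \ge 1} r_n < +\infty .
\end{align*}
% Hence (x_n) is Cauchy, so K(b_n) converges. With r_n = 1/(2|B_n|^2) the middle
% line reads |x_n - x_{n-1}| = 1/(|B_n| |B_{n-1}|).
```

With rₙ = 1/(2|Bₙ|²), the middle line becomes |Aₙ/Bₙ − Aₙ₋₁/Bₙ₋₁| = 1/(|Bₙ||Bₙ₋₁|). This is the familiar algebraic identity behind the same convergence test.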
Another advantage of the geometric approach (using the horospheres Tₙ(Σ₀)) over the algebraic approach (using the coefficients Bₙ) is that the geometric approach generalizes to higher dimensions. The definition of a chain of horospheres extends in a straightforward fashion to N-dimensional hyperbolic space

$$\mathbb{H}^N = \{(x_1, x_2, \dots, x_N) : x_N > 0\},$$

and this gives us a concept of a continued fraction in higher dimensions. In contrast, the recurrence relations of (2.1) only make sense in the complex plane, and treating continued fractions algebraically in higher dimensions can be cumbersome.

REFERENCES

1. A. F. Beardon, The Geometry of Discrete Groups. Springer, New York, 1983, available at http://link.
springer.com/book/10.1007\%2F978-1-4612-1146-4.
2. , Continued fractions, discrete groups and complex dynamics, Comput. Methods Funct. Theory 1
(2001) 535–594, available at http://link.springer.com/article/10.1007\%2FBF03321006#.
3. A. F. Beardon, M. Hockman, I. Short, Geodesic continued fractions, Michigan Math. J. 61 (2012) 133–
150.
4. L. R. Ford, On the closeness of approach of complex rational fractions to a complex irrational number,
Trans. Amer. Math. Soc. 27 (1925) 146–154.
5. , Fractions, Amer. Math. Monthly 45 (1938) 586–601.
6. G. H. Hardy, E. M. Wright, An Introduction to the Theory of Numbers. Sixth edition. Oxford University
Press, Oxford, 2008.
7. S. Katok, I. Ugarcovici, Symbolic dynamics for the modular surface and beyond, Bull. Amer. Math. Soc.
44 (2007) 87–132, available at http://www.ams.org/journals/bull/2007-44-01/S0273-0979-
06-01115-3/S0273-0979-06-01115-3.pdf.
8. A. Ya. Khinchin, Continued Fractions. Translated from the third (1961) Russian edition. Reprint of the
1964 translation. Dover, Mineola, NY, 1997.
9. L. Lorentzen, H. Waadeland, Continued Fractions. Vol. 1. Second edition. Atlantis Studies in Mathemat-
ics for Engineering and Science, Atlantis Press, Paris, 2008.
10. C. Series, The modular surface and continued fractions, J. London Math. Soc. 31 no. 2 (1985) 69–80,
available at http://jlms.oxfordjournals.org/content/s2-31/1/69.extract.

ALAN F. BEARDON received his Ph.D. from Imperial College, London, in 1964 and has taught at the Uni-
versity of Maryland, the University of Canterbury, and from 1968 to retirement, at the University of Cambridge.

May 2014] A GEOMETRIC REPRESENTATION OF CONTINUED FRACTIONS 401


His research interests are geometry, geometric function theory, discrete groups, dynamical systems, and mathematical economics. He is the author of seven books, and a winner of the Lester R. Ford prize for an expository paper in this MONTHLY.
Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB,
United Kingdom
a.f.beardon@dpmms.cam.ac.uk

IAN SHORT received his Ph.D. from the University of Cambridge in 2005, under the supervision of Alan
Beardon. He is now a lecturer in mathematics at the Open University. His research interests include complex
analysis, continued fractions, dynamical systems, and hyperbolic geometry.
Department of Mathematics and Statistics, The Open University, Milton Keynes MK7 6AA, United Kingdom
ian.short@open.ac.uk

Congratulations to MONTHLY Associate Editor, Edward B. Burger, who was inaugurated as the 15th President of Southwestern University on March 25, 2014.




Rethinking Set Theory
Tom Leinster

Abstract. Mathematicians manipulate sets with confidence almost every day, rarely making
mistakes. Few of us, however, could accurately quote what are often referred to as ‘the’ axioms
of set theory. This suggests that we all carry around with us, perhaps subconsciously, a reliable
body of operating principles for manipulating sets. What if we were to take some of those
principles and adopt them as our axioms instead? The message of this article is that this can
be done, in a simple, practical way (due to Lawvere). The resulting axioms are ten thoroughly
mundane statements about sets.

As mathematicians, we often read a nice new proof of a known theorem, enjoy the different approach, but continue to derive our internal understanding from the method we originally learned. This paper aims to change drastically the way mathematicians think [. . . ] and teach.
—Sheldon Axler [1, Section 10]

Mathematicians manipulate sets with confidence almost every day of their working
lives. We do so whenever we work with sets of real or complex numbers, or with vec-
tor spaces, topological spaces, groups, or any of the many other set-based structures.
These underlying set-theoretic manipulations are so automatic that we seldom give
them a thought, and it is rare that we make mistakes in what we do with sets.
However, very few mathematicians could accurately quote what are often referred to
as ‘the’ axioms of set theory, short of looking them up. We would not dream of working
with, say, Lie algebras without first learning the axioms. Yet many of us will go our
whole lives without learning ‘the’ axioms for sets, with no harm to the accuracy of our
work. This suggests that we all carry around with us, more or less subconsciously, a
reliable body of operating principles that we use when manipulating sets.
What if we were to write down some of these principles and adopt them as our ax-
ioms for sets? The message of this article is that this can be done, in a simple, practical
way. We describe an axiomatization due to F. William Lawvere [3, 4], informally sum-
marized in Figure 1. The axioms suffice for very nearly everything mathematicians
ever do with sets. So we can, if we want, abandon the classical axioms entirely and use
these instead.

Why rethink? The traditional axiomatization of sets is known as Zermelo–Fraenkel with Choice (ZFC). Great things have been achieved on this axiomatic basis. However, ZFC has one major flaw: Its use of the word ‘set’ conflicts with how most mathematicians use it.
The root of the problem is that in the framework of ZFC, the elements of a set
are always sets too. Thus, given a set X , it always makes sense in ZFC to ask what the
elements of the elements of X are. Now, a typical set in ordinary mathematics is R. But
ask a randomly-chosen mathematician, ‘what are the elements of π?’, and they will
probably assume they misheard you, or tell you that your question makes no sense. If
forced to answer, they might reply that real numbers have no elements. But this too is
http://dx.doi.org/10.4169/amer.math.monthly.121.05.403
MSC: Primary 03E99

May 2014] RETHINKING SET THEORY 403


1. Composition of functions is associative and has identities.
2. There is a set with exactly one element.
3. There is a set with no elements.
4. A function is determined by its effect on elements.
5. Given sets X and Y , one can form their cartesian product X × Y .
6. Given sets X and Y , one can form the set of functions from X to Y .
7. Given f : X −→ Y and y ∈ Y , one can form the inverse image f −1 (y).
8. The subsets of a set X correspond to the functions from X to {0, 1}.
9. The natural numbers form a set.
10. Every surjection has a right inverse.
Figure 1. Informal summary of the axioms. The primitive concepts are set, function, and composition of
functions. Other concepts mentioned (such as element) are defined in terms of the primitive concepts.

in conflict with ZFC’s usage of ‘set’: If all elements of R are sets, and they all have no
elements, then they are all the empty set, from which it follows that all real numbers
are equal.
Could we, perhaps, continue to use ZFC while quietly ignoring the requirement that
the elements of a set must be sets too? No; this would leave us unable to state the ZFC
axioms. For example, one axiom states that every nonempty set X has some element x
such that x ∩ X = ∅, which only makes sense if the elements of X are sets. When X
is an ordinary set such as R, few would recognize this axiom as meaningful: What is
π ∩ R, after all?
I will anticipate an objection to these criticisms. The traditional approach to set the-
ory involves not only ZFC, but also a collection of methods for encoding mathematical
objects of many different types (real numbers, differential operators, random variables,
the Riemann zeta function, . . . ) as sets. This is similar to the way in which computer
software encodes data of many types (text, sound, images, . . . ) as binary sequences. In
both cases, even the designers would agree that the encoding methods are somewhat
arbitrary. So, one might object, no one is claiming that questions like ‘what are the elements of π?’ have meaningful answers.
However, the criticisms made in earlier paragraphs have nothing to do with the
matter of encoding. The bare facts are that in ZFC, it is always valid to ask of a set
‘what are the elements of its elements?’, and in ordinary mathematical practice, it is
not. Perhaps it is misleading to use the same word, ‘set’, for both purposes.

Three misconceptions. The axiomatization presented below is Lawvere’s Elementary Theory of the Category of Sets, first proposed half a century ago [3, 4]. Here it is phrased in a way that requires no knowledge of category theory whatsoever.
Because of the categorical origins of this axiomatization, three misconceptions
commonly arise.
The first is that the underlying motive is to replace set theory with category theory.
It is not. The approach described here is not a rival to set theory: It is set theory.
The second is that this axiomatization demands more mathematical sophistication
than others (such as ZFC). This is false, but understandable. Almost all of the work
on Lawvere’s axioms has taken place within topos theory, a beautiful and profound
subject, but not one easily accessible to outsiders. It has always been known that the
axioms could be presented in a completely elementary way, and although some authors
have emphasized this [3, 5, 6, 10, 11], it is not as widely appreciated as it should be.
This paper aims to make it plain.




The third misconception is that because these axioms for sets come from category
theory, and because the definition of category involves a collection of objects and a
collection of arrows, and because ‘collection’ might mean something like ‘set’, there
is a circularity; in order to axiomatize sets categorically, we must already know what
a set is. But although our approach is categorically inspired, it does not depend on
having a general definition of category. Indeed, our axiomatization (Section 2) does
not contain a single instance of the word ‘category’.
Put another way, circularity is no more a problem here than in ZFC. Informally,
ZFC says ‘there are some things called sets, there is a binary relation on sets called
membership, and some axioms hold.’ We will say ‘there are some things called sets
and some things called functions, there is an operation called composition of func-
tions, and some axioms hold.’ In neither case are the ‘things’ required to form a set
(whatever that would mean). In logical terminology, both axiomatizations are simply
first-order theories.

1. PRELUDE: ELEMENTS AS FUNCTIONS. The working mathematician’s vocabulary includes terms such as set, function, element, subset, and equivalence relation. Any axiomatization of sets will choose some of these concepts as primitive and derive the others. The traditional choice is sets and elements. We use sets and functions.
The formal axiomatization is presented in Section 2. However, it will be helpful to
consider one aspect in advance: how to derive the concept of element from the concept
of function.
Suppose for now that we have found a characterization of one-element sets without
knowing what an element is. (We do so below.) Fix a one-element set 1 = {•}. For any
set X , a function 1 −→ X is essentially just an element of X , since, after all, such a
function f is uniquely determined by the value of f (•) ∈ X (Figure 2(c)). Thus:
Elements are a special case of functions.
This is such a trivial observation that one is apt to dismiss it as a mere formal trick.
On the contrary, similar correspondences occur throughout mathematics. For example
(Figure 2):
• a loop in a topological space X is a continuous map S¹ −→ X;
• a straight line in Rⁿ is a distance-preserving map R −→ Rⁿ;

Figure 2. Mapping out of a basic object (S¹, R, or 1) picks out figures of the appropriate type (loops, lines, or elements): (a) S¹ −→ X; (b) R −→ Rⁿ; (c) 1 −→ X.



• a sequence in a set X is a function N −→ X;
• a solution (x, y) of the equation x² + y² = 1 in a ring A is a homomorphism Z[X, Y]/(X² + Y² − 1) −→ A.
In each case, the word ‘is’ can be taken either as a definition or as an assertion of
a canonical, one-to-one correspondence. In the first, we map out of the circle, which
is a ‘free-standing’ loop; in the second, R is a free-standing line; in the third, the
elements 0, 1, 2, . . . of N form a free-standing sequence; in the last, the pair (X, Y) of elements of Z[X, Y]/(X² + Y² − 1) is the free-standing solution (x, y) of x² + y² = 1. Similarly, in our trivial situation, the set 1 is a free-standing element, and an element
of a set X is just a map 1 −→ X .
We could write x̄, say, for the function 1 −→ X with value x ∈ X . However, we
will write x̄ as simply x, blurring the distinction. In fact, we will later define an element
of X to be a function 1 −→ X .
This will make some readers uncomfortable. There is, you will agree, a canonical
one-to-one correspondence between elements of X and functions 1 −→ X , but per-
haps you draw the line at saying that an element of X literally is a function 1 −→ X .
If so, this is not a deal-breaker. We could adapt the axiomatization in Section 2 by
adding ‘element’ to the list of primitive concepts. Then, however, we would need to
complicate it further by adding clauses to guarantee that (among other things) there is
a one-to-one correspondence between elements of X and functions 1 −→ X , for any
set X . It can be done, but we choose the more economical route.
We have seen that elements are a special case of functions. There is another funda-
mental way in which functions and elements interact: Given a function f : X −→ Y
and an element x ∈ X , we can evaluate f at x to obtain a new element, f (x) ∈ Y .
Viewing elements as functions out of 1, this element f (x) is nothing but the compos-
ite of f with x. That is, f (x) = f ◦ x, as illustrated below.

1 −x→ X −f→ Y, with composite f ◦ x = f(x) : 1 −→ Y.

Hence:

Evaluation is a special case of composition.
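Both slogans can be made concrete in a small finite model. The Python sketch below assumes, for illustration only, that a function is a hypothetical triple (domain, codomain, table). Under that assumption, an element x ∈ X is the function 1 −→ X with value x, and evaluating f at x is literally composition.

```python
# Hypothetical finite model: a function is a triple (domain, codomain, table),
# where table is a dict sending each element of the domain into the codomain.
POINT = "•"
ONE = frozenset({POINT})  # a fixed one-element set 1 = {•}


def make_fn(dom, cod, rule):
    """Tabulate `rule` on the finite set `dom`, checking it lands in `cod`."""
    table = {x: rule(x) for x in dom}
    if not all(y in cod for y in table.values()):
        raise ValueError("rule sends some element outside the codomain")
    return (frozenset(dom), frozenset(cod), table)


def compose(g, f):
    """g ∘ f, defined only when the codomain of f is the domain of g."""
    f_dom, f_cod, f_table = f
    g_dom, g_cod, g_table = g
    if f_cod != g_dom:
        raise ValueError("functions are not composable")
    return (f_dom, g_cod, {x: g_table[f_table[x]] for x in f_dom})


def element(X, x):
    """The function 1 -> X whose value at • is x (written x̄ in the text)."""
    return make_fn(ONE, X, lambda _: x)


def value(x_bar):
    """Read off the ordinary element picked out by a function 1 -> X."""
    return x_bar[2][POINT]


X = {1, 2, 3}
Y = {"odd", "even"}
f = make_fn(X, Y, lambda n: "even" if n % 2 == 0 else "odd")
print(value(compose(f, element(X, 2))))  # even
```

Here compose(f, element(X, 2)) is itself an element of Y, namely the function 1 −→ Y with value "even".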

2. THE AXIOMS. Here we state our ten axioms on sets and functions, in entirely
elementary terms.
The formal axiomatization is in a different typeface, to distinguish it from the ac-
companying commentary. Some diagrams appear, but they are not part of the formal
statement.
First we state the data to which our axioms will apply:
• some things called sets;
• for each set X and set Y, some things called functions from X to Y, with functions f from X to Y written as f : X −→ Y or X −f→ Y;
• for each set X, set Y, and set Z, an operation assigning to each f : X −→ Y and g : Y −→ Z a function g ◦ f : X −→ Z;
• for each set X, a function 1_X : X −→ X.




This last item can be included in the list or not, according to taste. See the comments
after the first axiom, which now follows.

Associativity and identity laws.


Axiom 1. For all sets W, X, Y, and Z, and all functions W −f→ X −g→ Y −h→ Z, we have h ◦ (g ◦ f) = (h ◦ g) ◦ f. For all sets X and Y and functions f : X −→ Y, we have f ◦ 1_X = f = 1_Y ◦ f.
If we wish to omit the identity functions from the list of primitive concepts, we must replace the second half of Axiom 1 by the statement that for all sets X, there exists a function 1_X : X −→ X such that g ◦ 1_X = g for all g : X −→ Y and 1_X ◦ f = f for all f : W −→ X. These conditions characterize 1_X uniquely.

One-element set. We would like to say ‘there exists a one-element set’, but for the
moment we lack the expressive power to say ‘element’. However, any one-element
set T should have the property that for each set X , there is precisely one function
X −→ T . Moreover, only one-element sets should have this property. This motivates
the following definition and axiom.

A set T is terminal if for every set X , there is a unique function X −→ T .

Axiom 2. There exists a terminal set.


It follows quickly from the definitions that if T and T′ are terminal sets, then there is a unique isomorphism from T to T′. (A function f : A −→ B is an isomorphism if there is a function f′ : B −→ A such that f′ ◦ f = 1_A and f ◦ f′ = 1_B.) In other
words, terminal sets are unique up to unique isomorphism. It is therefore harmless to
fix a terminal set 1 once and for all. Readers concerned by this are referred to the last
few paragraphs of this section.

Given a set X, we write x ∈ X to mean x : 1 −→ X, and call x an element of X. Given x ∈ X and a function f : X −→ Y, we write f(x) for the element f ◦ x : 1 −→ Y of Y.

Empty set.
Axiom 3. There exists a set with no elements.

Functions and elements. A function from X to Y should be nothing more than a way
of turning elements of X into elements of Y .
Axiom 4. Let X and Y be sets and f, g : X −→ Y functions. Suppose that f (x) =
g(x) for all x ∈ X . Then f = g.
Axioms 1, 2, and 4 imply that a set is terminal if and only if it has exactly one
element. This justifies the usage of ‘one-element set’ as a synonym for ‘terminal set’.
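For finite sets the equivalence between "terminal" and "one-element" can be checked by brute force. The Python sketch below is only a finite check against a hypothetical list of test sets, not a proof from the axioms.

There are |T|^|X| functions X −→ T. When X = ∅ every T passes, because there is exactly one function from the empty set. The nonempty test sets then force T to have exactly one element. The empty T fails because no function runs from a nonempty set into it.

```python
from itertools import product


def all_functions(X, T):
    """Every function X -> T, as a dict; there are |T| ** |X| of them."""
    xs = sorted(X)
    return [dict(zip(xs, ys)) for ys in product(sorted(T), repeat=len(xs))]


def is_terminal_among(T, test_sets):
    """Check the terminal-set condition against a finite list of test sets only."""
    return all(len(all_functions(X, T)) == 1 for X in test_sets)


test_sets = [set(), {0}, {0, 1}, {0, 1, 2}]
for T in [set(), {"•"}, {"a", "b"}]:
    print(len(T), is_terminal_among(T, test_sets))
# 0 False
# 1 True
# 2 False
```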

Cartesian products. We want to be able to form cartesian products of sets. An element of X together with an element of Y should uniquely determine an element of X × Y. More generally, for any set I, a function f₁ : I −→ X together with a function f₂ : I −→ Y should uniquely determine a function f : I −→ X × Y, given by f(t) = (f₁(t), f₂(t)). (To see that this really is ‘more generally’, take I = 1.) We can recover f₁ from f by composing with the projection p₁ : X × Y −→ X, and similarly f₂, as in the following definition.

Let X and Y be sets. A product of X and Y is a set P together with functions X ←p₁− P −p₂→ Y, with the following property (Figure 3):

For all sets I and functions X ←f₁− I −f₂→ Y, there is a unique function (f₁, f₂) : I −→ P such that p₁ ◦ (f₁, f₂) = f₁ and p₂ ◦ (f₁, f₂) = f₂.

Figure 3. The characteristic property of products: the function (f₁, f₂) : I −→ P composes with the projections p₁ and p₂ to give f₁ : I −→ X and f₂ : I −→ Y.

Axiom 5. Every pair of sets has a product.


Strictly speaking, a product consists of not only the set P, but also the projections p₁ and p₂. Any two products of X and Y are uniquely isomorphic: Given products (P, p₁, p₂) and (P′, p′₁, p′₂), there is a unique isomorphism i : P −→ P′ such that p′₁ ◦ i = p₁ and p′₂ ◦ i = p₂. As in the case of terminal sets, this makes it harmless to choose once and for all a preferred product $(X \times Y, \mathrm{pr}_1^{X,Y}, \mathrm{pr}_2^{X,Y})$ for each pair X, Y of sets. Again, this convention is justified at the end of the section.
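The characteristic property of products can be tested directly on small finite sets. The Python sketch below encodes functions as dicts and takes the set of pairs as the product, both of which are illustrative assumptions. It builds (f₁, f₂) and checks that it composes correctly with the projections. The usage note then confirms by brute force that (f₁, f₂) is the only such function.

```python
from itertools import product


def make_product(X, Y):
    """A product of finite sets X and Y: the set of pairs with its two projections."""
    P = set(product(X, Y))
    p1 = {pair: pair[0] for pair in P}
    p2 = {pair: pair[1] for pair in P}
    return P, p1, p2


def pairing(f1, f2):
    """The function (f1, f2): I -> X × Y, t ↦ (f1(t), f2(t)); f1, f2 are dicts on I."""
    if f1.keys() != f2.keys():
        raise ValueError("f1 and f2 must have the same domain I")
    return {t: (f1[t], f2[t]) for t in f1}


def compose(g, f):
    """g ∘ f for dict-encoded functions."""
    return {t: g[f[t]] for t in f}


X, Y = {"a", "b"}, {10, 20}
P, p1, p2 = make_product(X, Y)
f1 = {0: "a", 1: "b", 2: "a"}
f2 = {0: 20, 1: 20, 2: 10}
h = pairing(f1, f2)
print(compose(p1, h) == f1, compose(p2, h) == f2)  # True True
```

Enumerating all 4³ = 64 functions I −→ P shows that h is the only one satisfying both equations, as the definition demands.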

Sets of functions. In everyday mathematics, we can form the set $Y^X$ of functions from one set X to another set Y. For any set I, the functions q : I × X −→ Y correspond one-to-one with the functions q̄ : I −→ $Y^X$, simply by changing the punctuation:

$$q(t, x) = (\bar{q}(t))(x) \qquad (1)$$

(t ∈ I, x ∈ X). For example, when I = 1, this reduces to the statement that the functions X −→ Y correspond to the elements of $Y^X$.
In (1), we are implicitly using the evaluation map

$$\varepsilon \colon Y^X \times X \to Y, \qquad (f, x) \mapsto f(x).$$

Then (1) becomes the equation q(t, x) = ε(q̄(t), x), as in the following definition.

Let X and Y be sets. A function set from X to Y is a set F together with a function ε : F × X −→ Y, with the following property (Figure 4):

For all sets I and functions q : I × X −→ Y, there is a unique function q̄ : I −→ F such that q(t, x) = ε(q̄(t), x) for all t ∈ I and x ∈ X.

Axiom 6. For all sets X and Y , there exists a function set from X to Y .




Figure 4. The characteristic property of function sets: q = ε ◦ (q̄ × 1_X) as functions I × X −→ Y.
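The "change of punctuation" between q and q̄ is what programmers call currying. The Python sketch below models F = Y^X for finite X and Y by encoding each function X −→ Y as a tuple of (x, y) pairs in a fixed order, an encoding assumed here for illustration. It implements ε and the transpose q ↦ q̄.

```python
from itertools import product


def function_set(X, Y):
    """F = Y^X, each function X -> Y encoded canonically as a tuple of (x, y) pairs."""
    xs = sorted(X)
    return {tuple(zip(xs, ys)) for ys in product(sorted(Y), repeat=len(xs))}


def evaluation(f, x):
    """ε : F × X -> Y, (f, x) ↦ f(x)."""
    return dict(f)[x]


def transpose(q, I, X):
    """q̄ : I -> F determined by q(t, x) = ε(q̄(t), x); q is a dict on I × X."""
    xs = sorted(X)
    return {t: tuple((x, q[(t, x)]) for x in xs) for t in I}


I, X, Y = {0, 1}, {"a", "b"}, {0, 1, 2}
F = function_set(X, Y)
q = {(t, x): (t + ord(x)) % 3 for t in I for x in X}
q_bar = transpose(q, I, X)
print(len(F), all(q_bar[t] in F for t in I))  # 9 True
```

Uniqueness of q̄ holds here because each element of F is determined by its values under ε. A brute-force search over all functions I −→ F finds exactly one that satisfies the defining equation.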

Inverse images. Ordinarily, given a function f : X −→ Y and an element y of Y, we can form the inverse image or fiber f⁻¹(y). The inclusion function j : f⁻¹(y) ↪ X has the property that f ◦ j has constant value y. Moreover, whenever q : I −→ X is a function such that f ◦ q has constant value y, the image of q must lie within f⁻¹(y); that is, q = j ◦ q̄ for some q̄ : I −→ f⁻¹(y) (necessarily unique).

Let f : X −→ Y be a function and y ∈ Y. An inverse image of y under f is a set A together with a function j : A −→ X, such that f(j(a)) = y for all a ∈ A and the following property holds (Figure 5):

For all sets I and functions q : I −→ X such that f(q(t)) = y for all t ∈ I, there is a unique function q̄ : I −→ A such that q = j ◦ q̄.

Figure 5. The characteristic property of inverse images: q = j ◦ q̄, where f ◦ j and f ◦ q are both constant at y, that is, both factor through y : 1 −→ Y.

Axiom 7. For every function f : X −→ Y and element y ∈ Y, there exists an inverse image of y under f.
Inverse images are essentially unique: If j : A −→ X and j′ : A′ −→ X are both inverse images of y under f, then there is a unique isomorphism i : A −→ A′ such that j′ ◦ i = j.
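In a finite model the inverse image is just the fiber together with its inclusion. The Python sketch below assumes dict-encoded functions and uses hypothetical helper names. It builds j : f⁻¹(y) −→ X and factors a suitable q through it.

```python
def inverse_image(f, y):
    """A = f⁻¹(y) with its inclusion j : A -> X; f is a dict encoding X -> Y."""
    A = {x for x, fx in f.items() if fx == y}
    j = {a: a for a in A}
    return A, j


def factor_through(q, f, y, A):
    """The unique q̄ : I -> A with q = j ∘ q̄, given f(q(t)) = y for every t in I."""
    if any(f[q[t]] != y for t in q):
        raise ValueError("f ∘ q is not constant at y")
    # The values of q already lie in A; q̄ is q with its codomain cut down to A.
    q_bar = dict(q)
    assert set(q_bar.values()) <= A
    return q_bar


f = {x: x % 3 for x in range(6)}   # f : {0, ..., 5} -> {0, 1, 2}
A, j = inverse_image(f, 1)
q = {"u": 4, "v": 1, "w": 4}       # f ∘ q is constant at 1
q_bar = factor_through(q, f, 1, A)
print(sorted(A), {t: j[q_bar[t]] for t in q_bar} == q)  # [1, 4] True
```

In this dict encoding, q and q̄ have the same table and differ only in their intended codomain. That is why j is injective and why q̄ is forced.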

Characteristic functions. Sometimes we want to define a function on a case-by-case basis. For example, we might want to define h : R −→ R by h(x) = x sin(1/x) if x ≠ 0 and h(0) = 0. A simple instance is the definition of characteristic function. Fix a two-element set 2 = {t, f} (for ‘true’ and ‘false’). The characteristic function of a subset A ⊆ X is the function χ_A : X −→ 2 defined by χ_A(x) = t if x ∈ A, and χ_A(x) = f otherwise. It is the unique function χ : X −→ 2 such that χ⁻¹(t) = A.
This is how characteristic functions work ordinarily. To ensure that they work in the same way in our set theory, we now demand that there exist a set 2 and an element t ∈ 2 with the property just described: Whenever X is a set and A ⊆ X, there is a unique function χ : X −→ 2 such that χ⁻¹(t) = A.
Since we do not yet have a definition of subset, we phrase the axiom in terms of
injections instead. (The thought here is that every subset inclusion A ,→ X is injective,
and, up to isomorphism, every injection arises in this way.)



An injection is a function j : A −→ X such that j(a) = j(a′) ⟹ a = a′ for a, a′ ∈ A.
A subset classifier is a set 2 together with an element t ∈ 2, with the following property (Figure 6):

For all sets A and X and injections j : A −→ X, there is a unique function χ : X −→ 2 such that j : A −→ X is an inverse image of t under χ.

Figure 6. The characteristic property of subset classifiers: the square formed by A −→ 1, j : A −→ X, t : 1 −→ 2, and χ : X −→ 2 commutes, so χ ◦ j is constant at t.

Axiom 8. There exists a subset classifier.


The notation 2 is merely suggestive. There is nothing in the definition saying that 2
must have two elements, but, nontrivially, our ten axioms do in fact imply this.
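In the ordinary finite model, the subset classifier is {False, True} with t = True. The Python sketch below encodes an injection as a dict, which is an illustrative assumption. It computes the classifying map χ and recovers the image of j as χ⁻¹(t). The accompanying check confirms by brute force that χ is the only map X −→ 2 with that fiber.

```python
from itertools import product

TWO = (False, True)  # the subset classifier 2 = {f, t}, with t = True


def characteristic(j, X):
    """χ : X -> 2 for an injection j : A -> X (a dict): χ(x) = t iff x is in the image of j."""
    image = set(j.values())
    if len(image) != len(j):
        raise ValueError("j is not injective")
    return {x: (x in image) for x in X}


def true_fiber(chi):
    """χ⁻¹(t), which recovers the image of j."""
    return {x for x, v in chi.items() if v}


X = set(range(6))
j = {"a": 2, "b": 3, "c": 5}       # an injection from A = {a, b, c} into X
chi = characteristic(j, X)
print(sorted(true_fiber(chi)))     # [2, 3, 5]
```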

Natural numbers. In ordinary mathematics, sequences can be defined recursively: Given a set X, an element a ∈ X, and a function r : X −→ X, there is a unique sequence $(x_n)_{n=0}^{\infty}$ in X such that

$$x_0 = a \quad\text{and}\quad x_{n+1} = r(x_n) \text{ for all } n \in \mathbb{N}.$$

A sequence in X is nothing but a function N −→ X, so the previous sentence is really a statement about the set N. It also refers to two pieces of structure on N: the element 0 and the function s : N −→ N given by s(n) = n + 1.

A natural number system is a set N together with an element 0 ∈ N and a function s : N −→ N, with the following property (Figure 7):

Whenever X is a set, a ∈ X, and r : X −→ X, there is a unique function x : N −→ X such that x(0) = a and x(s(n)) = r(x(n)) for all n ∈ N.

Figure 7. The characteristic property of natural number systems: the rows 1 −0→ N −s→ N and 1 −a→ X −r→ X are joined by the vertical maps 1_1, x, x, so that x ◦ 0 = a and x ◦ s = r ◦ x.

Axiom 9. There exists a natural number system.


Natural number systems are essentially unique, in the usual sense that between any
two of them there is a unique structure-preserving isomorphism. This justifies speaking
of the natural numbers N, as we invariably do.
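The unique function x guaranteed by a natural number system is simply "apply r repeatedly, starting from a". The Python sketch below uses Python's non-negative integers as a stand-in for N, an assumption made for illustration. It shows this recursion and, as one instance, addition defined by recursion with r the successor, in the spirit of the later remark that arithmetic is defined using Axiom 9.

```python
from itertools import islice


def recursive_sequence(a, r):
    """Yield x(0), x(1), ... for the unique x : N -> X with x(0) = a, x(s(n)) = r(x(n))."""
    value = a
    while True:
        yield value
        value = r(value)


def add(m, n):
    """m + n by recursion on n: x(0) = m and x(s(k)) = s(x(k)), with r the successor."""
    value = m
    for _ in range(n):
        value = value + 1  # the successor s
    return value


print(list(islice(recursive_sequence(1, lambda v: 2 * v), 6)))  # [1, 2, 4, 8, 16, 32]
print(add(3, 4))  # 7
```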




Choice. A function with a right inverse is certainly surjective. The axiom of choice
states the converse.
A surjection is a function s : X −→ Y such that for all y ∈ Y , there exists x ∈ X
with s(x) = y.
A right inverse of a function s : X −→ Y is a function i : Y −→ X such that
s ◦ i = 1Y .
Axiom 10. Every surjection has a right inverse.
A right inverse of a surjection s : X −→ Y is a choice, for each y ∈ Y , of an element
of the nonempty set s −1 (y).
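For finite sets no axiom is needed to produce a right inverse, because an explicit choice rule is available. The Python sketch below picks the least element of each fiber, which assumes the elements of X can be compared. Axiom 10 does real work only for infinite sets, where no such rule need exist.

```python
def right_inverse(s, Y):
    """Return i : Y -> X with s ∘ i = 1_Y for a surjection s : X -> Y given as a dict.

    For finite X no choice axiom is needed: we take the least element of each
    fiber s⁻¹(y) (assuming the elements of X can be compared).
    """
    fibers = {y: [] for y in Y}
    for x, y in s.items():
        fibers[y].append(x)
    empty = [y for y, xs in fibers.items() if not xs]
    if empty:
        raise ValueError(f"s is not surjective: empty fibers over {empty}")
    return {y: min(xs) for y, xs in fibers.items()}


s = {x: x % 3 for x in range(10)}   # a surjection {0, ..., 9} -> {0, 1, 2}
i = right_inverse(s, {0, 1, 2})
print(sorted(i.items()), all(s[i[y]] == y for y in i))  # [(0, 0), (1, 1), (2, 2)] True
```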
This concludes the axiomatization.

The meaning of ‘the’. It remains to reassure any readers concerned by the liberty
taken in Axioms 2 and 5, where we chose once and for all a terminal set and a cartesian
product for each pair of sets.
This type of liberty is very common in mathematical practice. We speak of the
trivial group, the 2-sphere, the direct sum of two vector spaces, etc., even though we
can conceive of many trivial groups or 2-spheres or direct sums, all isomorphic but
not equal. Anyone asking ‘but which trivial group?’ is likely to be met with a hard
stare, and for good reason: No meaningful statement about groups depends on what
the element of the trivial group happens to be named.
However, we should be able to state the axioms with scrupulous rigor, and we can.
One way to do so is not to single out a particular terminal set or particular products,
but instead to adopt some circumlocutions. For example, we replace the phrase ‘for all
elements x ∈ X ’ by ‘for all terminal sets T and functions x : T −→ X .’
More satisfactory, though, is to extend the list of primitive concepts. To the existing
list (sets, functions, composition and identities) we add:
• a distinguished set, 1;
• an operation assigning to each pair of sets X, Y a set X × Y and functions

$$X \xleftarrow{\ \mathrm{pr}_1^{X,Y}\ } X \times Y \xrightarrow{\ \mathrm{pr}_2^{X,Y}\ } Y. \qquad (2)$$

Axiom 2 is replaced by the statement that 1 is terminal, and Axiom 5 by the statement
that for all sets X and Y , the set X × Y together with the functions (2) is a product of
X and Y .
This approach has the virtue of reflecting ordinary mathematical usage. We usually
speak as if taking the product of two sets (or spaces, groups, etc.) were a procedure
with a definite output: the product, not a product. But since products are in any case
determined uniquely up to unique isomorphism, whether or not we nominate one as
special makes no significant difference.

3. DISCUSSION. The ten axioms are familiar in their intuitive content, but less so
as an axiomatic system. Here we discuss the implications of using them as such.

Building on the axioms. Any axiomatization of anything is followed by a period of lemma-proving. The present axioms are no exception. Here is a very brief sketch of the development.
It is convenient formally to define a subset of a set X as a function X −→ 2, but we constantly use the correspondence between functions X −→ 2 and injections into X, provided by Axiom 8. Two injections j, j′ into X correspond to the same subset of X if and only if they have the same image (that is, there exists an isomorphism i such that j′ = j ◦ i).
The main task is to build the everyday equipment used for manipulating sets. For
example, given a function f : X −→ Y , we construct the image under f of a subset
of X and the inverse image of a subset of Y . An equivalence relation ∼ on a set X is
defined to be a subset of X × X with the customary properties, and the axioms allow
us to construct the quotient set X/∼. Some constructions are tricky. For instance, the axioms imply that any two sets X and Y have a disjoint union X ⊔ Y, but no known proof is simple.
We then define the usual number systems. Addition, multiplication, and powers of
natural numbers are defined directly using Axiom 9. From N, we successively construct Z, Q, R, and C in the standard way. For example, Z = (N × N)/∼, where ∼ is the equivalence relation on N × N given by (m, n) ∼ (m′, n′) if and only if m + n′ = m′ + n. As this illustrates, past a certain point, the development is literally
identical to that for other axiomatizations of sets.
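The construction Z = (N × N)/∼ can be tried out directly. The Python sketch below uses Python's non-negative integers for N and picks, for illustration, the canonical representative of each class with one coordinate equal to 0. The pair (m, n) stands for m − n, and addition and multiplication are computed on representatives.

```python
def equivalent(p, q):
    """(m, n) ~ (m', n') iff m + n' = m' + n; the pair (m, n) stands for m - n."""
    (m, n), (m2, n2) = p, q
    return m + n2 == m2 + n


def normal_form(p):
    """Canonical representative of the class of (m, n): one coordinate is 0."""
    m, n = p
    k = min(m, n)
    return (m - k, n - k)


def add(p, q):
    return normal_form((p[0] + q[0], p[1] + q[1]))


def mul(p, q):
    # (m - n)(m' - n') = (mm' + nn') - (mn' + nm')
    (m, n), (m2, n2) = p, q
    return normal_form((m * m2 + n * n2, m * n2 + n * m2))


minus_three, three = (2, 5), (4, 1)
print(add(minus_three, three), mul(minus_three, three))  # (0, 0) (0, 9)
```

The result (0, 9) represents −9, as expected for (−3) · 3.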

How strong are the axioms? Most mathematicians will never need more properties
of sets than those guaranteed by the ten axioms. For example, McLarty [13] argues
that no more is needed anywhere in the canons of the Grothendieck school of alge-
braic geometry, the multi-volume works Éléments de Géométrie Algébrique (EGA)
and Séminaire de Géométrie Algébrique (SGA).
To get a sense of the reach of the axioms, let us consider infinite cartesian products. Let I be a (possibly infinite) set and $(X_i)_{i \in I}$ a family of sets. Can we form the product $\prod_{i \in I} X_i$? The answer depends on what is meant by ‘family’. We could define an I-indexed family to be a set X together with a function p : X −→ I, viewing the fiber p⁻¹(i) as the ith member $X_i$. In that case, $\prod_{i \in I} X_i$ can be constructed as a subset of $X^I$. Specifically, p induces a function $p^I \colon X^I \to I^I$, and $\prod_{i \in I} X_i$ is the inverse image under $p^I$ of the element of $I^I$ corresponding to $1_I$.
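For finite I and X, the construction of the product as an inverse image can be carried out literally. The Python sketch below uses dict-encoded functions, an assumption for illustration. It enumerates X^I, applies p^I : s ↦ p ∘ s, and keeps the s with p ∘ s = 1_I, which are exactly the choices of one element from each fiber.

```python
from itertools import product


def product_of_family(X, p, I):
    """∏_{i∈I} p⁻¹(i), as the inverse image of 1_I under p^I : X^I -> I^I.

    An element of X^I is a function s : I -> X; p^I sends s to p ∘ s, and we
    keep exactly those s with p ∘ s = 1_I, i.e. s(i) lies in the fiber over i.
    """
    index = sorted(I)
    sections = []
    for values in product(sorted(X), repeat=len(index)):
        s = dict(zip(index, values))            # s ∈ X^I
        if all(p[s[i]] == i for i in index):    # p^I(s) = 1_I
            sections.append(s)
    return sections


X = {"a0", "b0", "a1", "a2", "b2", "c2"}
p = {x: int(x[1]) for x in X}   # fibers over 0, 1, 2 have sizes 2, 1, 3
print(len(product_of_family(X, p, {0, 1, 2})))  # 6
```

An empty fiber makes the product empty, and an empty index set gives a product with exactly one element, the empty function.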
However, we could interpret ‘I-indexed family’ differently, as an algorithm or formula that assigns to each i ∈ I a set $X_i$. It is not obvious that we can then form the disjoint union $X = \coprod_{i \in I} X_i$, which is what would be necessary in order to obtain a family in the previous sense. In fact, writing $\mathcal{P}(S) = 2^S$ for the power set of a set S, the ten axioms do not guarantee the existence of the disjoint union

$$\mathbb{N} \sqcup \mathcal{P}(\mathbb{N}) \sqcup \mathcal{P}(\mathcal{P}(\mathbb{N})) \sqcup \cdots \qquad (3)$$

unless they are inconsistent [8, Section 9].


If we wish to change this, we can add an eleventh axiom (or properly, axiom scheme). It is called ‘replacement’, formally stated in [12, Section 8], and informally stated as follows. Suppose that we have a set I and a first-order formula so that each i ∈ I specifies a set $X_i$ up to isomorphism. Then we require that there exist a set X and a function p : X −→ I such that p⁻¹(i) is isomorphic to $X_i$ for each i ∈ I. This guarantees the existence of sets such as (3).
The relationship between our axioms and ZFC is well understood. The ten ax-
ioms are weaker than ZFC, but when the eleventh is added, the two theories have
equal strength and are bi-interpretable (the same theorems hold). This extra strength is
sometimes needed; for example, replacement is important in parts of infinitary com-
binatorics. It is also known to which fragment of ZFC the ten axioms correspond:
‘Zermelo with bounded comprehension and choice’. The details of this relationship




were mostly worked out in the early 1970s [2, 14, 15]. Good modern accounts are in
[7, Section VI.10] and [9, Chapter 22].

A broader view. Our ten axioms are a standard rephrasing of Lawvere’s Elementary
Theory of the Category of Sets (ETCS), published in 1964. It was some years before
ETCS found its natural home, and that was with the advent of topos theory.
The notion of topos was invented by Grothendieck for reasons that had nothing to
do with set theory. For Grothendieck, a topos was a generalized topological space.
Formally, a topos is a category with certain properties, and a topological space X is
associated with the topos whose objects are the sheaves of sets on X .
Lawvere and Tierney swiftly realized that, after a slight loosening of Grothendieck’s
definition, the ETCS axioms could be restated neatly in topos-theoretic terms [16, 17].
Indeed, ETCS says exactly that sets and functions form a topos of a special sort: a
‘well-pointed topos with natural numbers object and choice’. So a topos is not only a
generalized space; it is also a generalized universe of sets.
An attractive feature of ETCS is that each of the axioms is meaningful in a broader
context than set theory. For example, Axiom 1 states that sets and functions form a
category. The job of the remaining axioms is to distinguish sets from other structures
that form categories. Axioms 2 and 5 state that the category of sets has finite products.
This important property is shared by (for example) the categories of topological spaces
and smooth manifolds, which is exactly what makes it possible to define ‘topological
group’ and ‘Lie group’. But for one detail, Axioms 1, 2, 5, 6, 7 and 8 state that sets
and functions form a topos.
The axiom of choice as formulated in Axiom 10 highlights a special feature of sets.
In most other categories of sets-with-structure, it fails, and its failure is a point of
interest. For instance, not every continuous surjection between topological spaces has
a continuous right inverse, a typical example being the nonexistence of a continuous
square root defined on the complex plane.

What kind of set theory should we teach? As Figure 1 indicates, we already teach a
diluted form of the ten axioms, even in introductory courses. For example, we certainly
tell our students that an element of X × Y is an element of X together with an element
of Y , and we routinely write a function f taking values in R2 as ( f 1 , f 2 ), although we
are less likely to state explicitly that, given functions f 1 : I −→ X and f 2 : I −→ Y ,
there is a unique function f : I −→ X × Y with f 1 and f 2 as components.
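The universal property in the last sentence can be made concrete for finite sets. In this Python sketch the encoding is our own assumption: functions are dicts, $X \times Y$ is the set of Python tuples, and the helper names `pairing` and `components` are hypothetical. `pairing` builds f from $f_1$ and $f_2$. A brute-force search over all functions $I \to X \times Y$ then confirms that f is the only one whose components are $f_1$ and $f_2$.

```python
from itertools import product

def pairing(f1, f2):
    """The unique f: I -> X x Y with components f1: I -> X and f2: I -> Y (dicts)."""
    assert f1.keys() == f2.keys(), "f1 and f2 must share the domain I"
    return {i: (f1[i], f2[i]) for i in f1}

def components(f):
    """Compose f with the two projections X x Y -> X and X x Y -> Y."""
    return {i: xy[0] for i, xy in f.items()}, {i: xy[1] for i, xy in f.items()}

I, X, Y = [0, 1, 2], ["a", "b"], [10, 20]
f1 = {0: "a", 1: "b", 2: "a"}
f2 = {0: 10, 1: 10, 2: 20}
f = pairing(f1, f2)

# Existence: f has components f1 and f2.
assert components(f) == (f1, f2)

# Uniqueness: among all 4**3 functions I -> X x Y, only f has these components.
all_maps = [dict(zip(I, vals)) for vals in product(product(X, Y), repeat=len(I))]
assert [g for g in all_maps if components(g) == (f1, f2)] == [f]
```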
When it comes to teaching axiomatic set theory, the approach outlined here has advantages and disadvantages. The great advantage is that such a course is of far wider benefit than one using the traditional axioms. It directly addresses a difficulty experienced by many students: the concept of function (and worse, function space). It also introduces in an elementary setting the idea of universal property. This is probably the hardest aspect of the axioms for a learner. Since universal properties are important in so many branches of advanced mathematics, however, the benefits are potentially far-reaching.
The disadvantages are perhaps only temporary. There is at present a lack of teaching
materials (the book [5] being the main exception). For example, the axioms imply
that any two sets have a disjoint union, and most books on topos theory contain an
elegant and sophisticated proof of a generalization of this fact, but to my knowledge,
there is only one place where a purely elementary proof can be found [18]. A second
disadvantage is that any student planning a career in set theory will need to learn ZFC
anyway, since almost all research-level set theory is done with the iterated-membership
conception of set. (That is the current reality, which is not to say that set theory must
be done this way.) Such a course might usefully include a comparison of the types of
set theory available.

Reactions to an earthquake. Perhaps you will wake up tomorrow, check your email,
and find an announcement that ZFC is inconsistent. Apparently, someone has taken
the ZFC axioms, performed a long string of logical deductions, and arrived at a
contradiction. The work has been checked and re-checked. There is no longer any doubt.
How would you react? In particular, how would you feel about the implications for
your own work? All your theorems would still be true under ZFC, but so too would
their negations. Would you conclude that your life’s work had been destroyed?
An informal survey suggests that most of us would be interested but not deeply
troubled. We would go on believing that our theorems were true in a sense that their
negations were not. We are unlikely to feel threatened by the inconsistency of axioms
to which we never referred anyway.
In contrast, the ten axioms above are such core mathematical principles that an inconsistency in them would be devastating. If we cannot safely assume that composition of functions is associative, or that repeatedly applying a function $f : X \longrightarrow X$ to an element $a \in X$ produces a sequence $(f^n(a))$, we are really in trouble.
The difference in reactions is telling. Our response to an inconsistency in an axiomatization of set theory reflects our degree of belief that it describes the operating principles we actually employ in ordinary mathematical practice.
In summary, simply by writing down a few mundane, uncontroversial statements
about sets and functions, we arrive at an axiomatization that fits well with how sets are
really used in mathematics.

ACKNOWLEDGMENTS. I thank François Dorais, Colin McLarty, Todd Trimble, the patrons of the n-Category Café, and the anonymous referees. This work was partially supported by an EPSRC Advanced Research Fellowship.

REFERENCES

1. S. Axler, Down with determinants! Amer. Math. Monthly 102 (1995) 139–154.
2. J. C. Cole, Categories of sets and models of set theory, in Proceedings of the Bertrand Russell Memorial Logic Conference, Uldum 1971. Bertrand Russell Memorial Conference, Leeds, 1973. 351–399.
3. F. W. Lawvere, An elementary theory of the category of sets, Proc. Natl. Acad. Sci. USA 52 (1964) 1506–1511.
4. F. W. Lawvere, An elementary theory of the category of sets (long version) with commentary, Repr. Theory Appl. Categ. 12 (2005) 1–35, available at http://www.tac.mta.ca/tac/reprints/articles/11/tr11abs.html.
5. F. W. Lawvere, R. Rosebrugh, Sets for Mathematics. Cambridge University Press, Cambridge, 2003.
6. S. Mac Lane, Mathematics: Form and Function. Springer, New York, 1986.
7. S. Mac Lane, I. Moerdijk, Sheaves in Geometry and Logic. Springer, New York, 1994.
8. A. Mathias, The strength of Mac Lane set theory, Ann. Pure Appl. Logic 110 (2001) 107–234.
9. C. McLarty, Elementary Categories, Elementary Toposes. Oxford University Press, Oxford, 1992.
10. C. McLarty, Numbers can be just what they have to, Noûs 27 (1993) 487–498.
11. C. McLarty, Challenge axioms, final draft, email to Foundations of Mathematics mailing list, 6 February 1998, available at http://www.cs.nyu.edu/pipermail/fom.
12. C. McLarty, Exploring categorical structuralism, Philos. Math. 12 (2004) 37–53.
13. C. McLarty, A finite order arithmetic foundation for cohomology (2011), available at http://arxiv.org/abs/1102.1773.
14. W. Mitchell, Boolean topoi and the theory of sets, J. Pure Appl. Algebra 2 (1972) 261–274.
15. G. Osius, Categorical set theory: a characterization of the category of sets, J. Pure Appl. Algebra 4 (1974) 79–119.
16. M. Tierney, Sheaf theory and the continuum hypothesis, in Toposes, Algebraic Geometry and Logic. Lecture Notes in Math., Vol. 274, Springer, Heidelberg, 1972. 13–42.
17. M. Tierney, Axiomatic sheaf theory: some constructions and applications, in Proceedings of CIME Conference on Categories and Commutative Algebra, Varenna, 1971. Edizione Cremonese, Rome, 1973. 249–326.
18. T. Trimble, ETCS: building joins and coproducts (2008), available at http://ncatlab.org/nlab/show/Trimble+on+ETCS+III.

TOM LEINSTER studied at Oxford and Cambridge, and held positions at Cambridge, the Institut des Hautes Études Scientifiques, and the University of Glasgow before taking up his current job in Edinburgh. He is interested in category theory and its applications, especially some of the more unusual ones. He is a professionally qualified masseur.
School of Mathematics, University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom.
Tom.Leinster@ed.ac.uk

An Elementary Application of Brouwer’s Fixed Point Theorem to Transition Matrices
Transition matrices play an essential role in the study of Markov processes ([3], ch. 11), which have many important practical applications in business, finance, medicine, etc. ([2], ch. 8). A transition matrix T is an n × n stochastic matrix such that each entry $p_{ij}$ lies between 0 and 1, the sum of each column of T equals 1, and the entry $p_{ij}$ represents the probability of transitioning from state j to state i. Moreover, the equilibrium state of the process is a fixed point vector (i.e., $Tp = p$), where p is a probability vector, so that $\sum p_i = 1$.
What is interesting here is that the set of probability vectors forms a simply connected compact convex set: given two probability vectors p and q, the combination $\lambda p + (1-\lambda)q$ $(0 \le \lambda \le 1)$ is also a probability vector. Since the entries of T are nonnegative and its columns sum to 1, the map $p \mapsto Tp$ sends this set into itself. We may therefore apply the well-known Brouwer fixed point theorem to the transition matrix T (see [1], pp. 251–255, for an elementary discussion of this theorem and [4], pp. 42–45, for the general case). Consequently, the theorem guarantees a fixed point, and hence an eigenvalue of 1. Note that in infinite-dimensional spaces the theorem fails, since bounded sets there need not be compact; the unit ball in $\ell^2$ is an example. However, it does hold if the set is convex and compact ([4], pp. 45–46). Nevertheless, in the finite-dimensional case, this is certainly an elegant application of the theorem.
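Brouwer's theorem guarantees that an equilibrium vector exists, but it does not say how to find one. As a hedged numerical companion, the Python sketch below approximates p by iterating $p \mapsto Tp$ from the centre of the simplex. The approach and the helper names are our own. The sketch assumes that T is regular, meaning that some power of T has all entries positive; without that assumption the plain iteration may fail to converge (for example, when the chain is periodic), even though a fixed point still exists.

```python
def apply(T, p):
    """Return the vector T p, where T is given as a list of rows."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in T]

def is_column_stochastic(T, tol=1e-12):
    """Nonnegative entries and every column summing to 1."""
    n = len(T)
    return all(
        min(T[i][j] for i in range(n)) >= 0
        and abs(sum(T[i][j] for i in range(n)) - 1) <= tol
        for j in range(n)
    )

def stationary_vector(T, tol=1e-12, max_iter=100_000):
    """Approximate a probability vector p with T p = p by iterating p -> T p.

    Existence is what Brouwer guarantees; convergence of this simple
    iteration is only guaranteed when T is regular.
    """
    if not is_column_stochastic(T):
        raise ValueError("each column of T must be a probability vector")
    n = len(T)
    p = [1.0 / n] * n              # start at the centre of the simplex
    for _ in range(max_iter):
        q = apply(T, p)            # T maps probability vectors to probability vectors
        if max(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    raise RuntimeError("no convergence; T may not be regular")

T = [[0.9, 0.5],
     [0.1, 0.5]]
p = stationary_vector(T)
print(p)   # approximately [5/6, 1/6]
```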

REFERENCES

1. R. Courant, H. Robbins, What is Mathematics? Oxford University Press, New York, 1941.
2. L. J. Goldstein, D. I. Schneider, M. J. Siegel, Finite Mathematics and its Applications. Eighth
edition. Pearson, Upper Saddle River, NJ, 2004.
3. C. M. Grinstead, J. L. Snell, Introduction to Probability. Second edition. American Mathematical
Society, Providence, RI, 1997.
4. T. L. Saaty, J. Bram, Nonlinear Mathematics. Dover, New York, 1964.

Submitted by Allan J. Kroopnick, University of Maryland University College
http://dx.doi.org/10.4169/amer.math.monthly.121.05.415
MSC: Primary 60J10



Repeatedly Appending Any Digit to
Generate Composite Numbers
Jon Grantham, Witold Jarnicki, John Rickert, and Stan Wagon

Abstract. We investigate the problem of finding integers k such that appending any number
of copies of the base-ten digit d to k yields a composite number. In particular, we prove that
there exist infinitely many integers coprime to all digits such that repeatedly appending any
digit yields a composite number.

1. INTRODUCTION. Recently, L. Jones [5] asked about integers that yield only
composites when a sequence of the same base-ten digit is appended to the right. He
showed that 37 is the smallest number with this property when appending the digit
d = 1. For each digit d ∈ {3, 7, 9}, he also found numbers coprime to d that yield only
composites upon appending ds.
In this paper, we find a single integer that works for all digits simultaneously. More
precisely, we prove the following.

Theorem. There are infinitely many positive integers k with gcd(k, 2 · 3 · 5 · 7) = 1 such that, for any base-ten digit d, appending any number of ds to k yields a composite number.

Further, for each particular digit d, we investigate the smallest number that remains composite upon appending any string of ds. Jones found, for digits 3, 7,
9, respectively, the examples 4070, 606474, and 1879711. It appears that 4070 is the
smallest for d = 3; for digit 7 we found 891, which is almost certainly minimal; and
for digit 9, the likely answer 10175 was discovered by [14]. In the next section, we
explain the obstructions to proving that these three answers are the smallest.

2. SEEDS. Given a digit d, let’s use the term seed for a number coprime to d such
that appending any number of ds on the right yields a composite. The smallest positive
integer with this property will be referred to as a minimal seed. Only the cases d ∈
{1, 3, 7, 9} are nontrivial. Jones proved that 37 is the minimal seed for d = 1, and he
also found the seed 4070 for digit 3. For every k < 4070, except 817, we have found a
value of n such that appending n 3s yields a prime or, in three cases, a probable prime.
For 817, appending up to 554789 3s yielded only composites. But factorizations show
no apparent obstruction to primality, so we conjecture that 4070 is the minimal seed
for digit 3.
A key concept in this area is the notion of a covering set, introduced by P. Erdős
[3]. Such a set corresponds to a finite list of primes such that every member of a given
sequence is divisible by one of the primes. Here the sequences are the numbers, which
we call $s_n$, obtained by appending n copies of a digit d to an initial value k; typically,
the numbers are proved composite by finding a covering set. For example, when n
7s are appended to 891, the resulting number is divisible by 11, 37, 11, 3, 11, or 13
according to the mod-6 residue of n (starting at 0).
http://dx.doi.org/10.4169/amer.math.monthly.121.05.416
MSC: Primary 11A41, Secondary 11A07; 11A51




To see this, observe that $s_n$ is given by the formula

$$s_n = k\cdot 10^n + \frac{d(10^n-1)}{9}.$$

Because $10^6 \equiv 1$ modulo each of the four primes, easy modular arithmetic shows that $s_{6m+i} \equiv 0 \pmod p$ for the cases p = 11, 13, and 37, where i, depending on p, is 0, 2, 4, 5, or 1. The same is true for i = 3, the case where p = 3. Here $891 = 3^4 \cdot 11$ is divisible by 3, and $10^{6m+3} - 1$ is divisible by 27, which eliminates the denominator of 9 in these cases. This proves that 891 is a seed for digit 7.
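The formula for $s_n$ makes prime covers easy to check mechanically. In the Python sketch below, the helper names are ours. For each prime used here, $s_n$ modulo p repeats with a period dividing 6: for p = 11, 13 and 37 this follows from $10^6 \equiv 1$, and for p = 3 it follows from $s_n \equiv k + dn \pmod 3$. Checking one full cycle $n = 0, \ldots, 5$ is therefore already a proof. Checking more values of n only serves as a sanity test.

```python
def s(k, d, n, base=10):
    """The number obtained by appending n copies of the digit d to k."""
    return k * base**n + d * (base**n - 1) // (base - 1)

def is_prime_cover(k, d, cover, n_max):
    """Check that cover[n % len(cover)] divides s_n for every 0 <= n < n_max."""
    return all(s(k, d, n) % cover[n % len(cover)] == 0 for n in range(n_max))

cover = (11, 37, 11, 3, 11, 13)
print([s(891, 7, n) for n in range(4)])   # [891, 8917, 89177, 891777]
print(is_prime_cover(891, 7, cover, 60))
```

The same function confirms the cover (11, 7, 11, 37, 11, 13) for the digit-9 seed 10175, which is discussed in the text.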
When a sequence of primes $(p_0, p_1, \ldots, p_{r-1})$ satisfies $p_{n \bmod r} \mid s_n$ for every $n \ge 0$, for a digit d and seed k, we say that the primes form a prime cover for (k, d). For example, (11, 37, 11, 3, 11, 13) is a prime cover for (891, 7).
We have shown that 891 is a minimal seed for digit 7, under the assumption that
appending 11330 7s to 480 and 28895 7s to 851 yields primes. Each of these two
large numbers has passed 200 strong pseudoprime tests. For all other potential seeds
below 891, we have found primes that can be certified using elliptic curve methods
with Mathematica or Primo [9]. We used Primo on the largest cases; the largest was
9777 . . . 7 with 2904 7s, which took 45 hours.
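The “strong pseudoprime tests” mentioned above are Miller–Rabin tests. The following textbook version in Python is our own illustration, not the authors' software; they cite Mathematica and Primo, which are far faster on numbers with thousands of digits. A composite number passes one round with a random base with probability at most 1/4. Many rounds therefore make an error extremely unlikely, but they do not constitute a proof of primality. The built-in three-argument `pow` performs the modular exponentiation.

```python
import random

def strong_test(n, a):
    """One strong pseudoprime (Miller-Rabin) test of odd n > 3 to base a."""
    d, r = n - 1, 0
    while d % 2 == 0:          # write n - 1 = 2^r * d with d odd
        d //= 2
        r += 1
    x = pow(a, d, n)
    if x in (1, n - 1):
        return True
    for _ in range(r - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False               # a is a witness that n is composite

def passes_strong_tests(n, rounds=200, seed=None):
    """True if n passes `rounds` strong tests to random bases (a probable prime)."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    rng = random.Random(seed)
    return all(strong_test(n, rng.randrange(2, n - 1)) for _ in range(rounds))

print(passes_strong_tests(2**61 - 1, seed=1))   # a known Mersenne prime
print(passes_strong_tests(8917, seed=1))        # 8917 = 37 * 241
```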
The digit-9 case asks for an integer k such that $(k+1)10^n - 1$ is always composite; it is thus a variation on the classic Riesel problem [7, 11, 12, 13], which addresses the same question in base 2. For that classic case, it is known that 509202 is a seed, meaning that $509203 \cdot 2^n - 1$ is composite for $n \ge 0$. Participants in the Riesel project have also investigated the decimal case, and showed [14] that the expected minimal seed for digit 9 is 10175. To see that this is a seed, we again consider the number of appended digits modulo 6 and find a prime cover: in this case (11, 7, 11, 37, 11, 13).
Of the numbers smaller than 10175, only 4420 has not been eliminated as a seed.
The Riesel project [12, 13] has checked it through the addition of 940000 9s without
finding a prime. In this case, primality proving for a probable prime is easy using the
Lucas n + 1 test [2].
Coverings are not the only tool in these investigations, since sometimes factorizations yield all the compositeness that is sought. Consider the situation with digit 1 but working in base $b = m^2$ with m odd. The minimal seed in all such cases is 1. For n appended 1s to the seed 1, with n even, the factorization

$$\underbrace{11\ldots11}_{n+1}{}_{b} = \frac{b^{n+1}-1}{b-1} = \frac{m^{n+1}-1}{m-1}\cdot\frac{m^{n+1}+1}{m+1}$$

yields integer factors, both larger than 1 when $n \ge 2$, and so the result is composite. When n is odd, the total number of 1s is even, so the number is divisible by $11_b = b + 1$. This makes compositeness clear for $n \ge 3$, while for n = 1 the number is $b + 1 = m^2 + 1$, which is even because m is odd. Similar factorization methods show that the minimal seed for digit 1 in base 4 is 5, for digit 3 in base 4 is 8, and for digit 8 in base 9 is 3.
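The base-$m^2$ argument can be turned into an explicit witness for each n. The Python sketch below uses hypothetical helper names. For odd m and $n \ge 1$, it returns the number N formed by n + 1 ones in base $m^2$, together with a factor f satisfying 1 < f < N. The factor comes from the factorization for even n, from $11_b = b + 1$ for odd $n \ge 3$, and from parity for n = 1. The loop at the end checks this for a range of small cases; it is a sanity check, not a replacement for the argument.

```python
def repunit(base, length):
    """The number written with `length` ones in the given base."""
    return (base**length - 1) // (base - 1)

def witness_factor(m, n):
    """For odd m >= 3 and n >= 1, return (N, f): N is the seed 1 followed by n
    ones in base b = m^2 (so n + 1 ones in all), and f is a factor with 1 < f < N."""
    if m % 2 == 0 or m < 3 or n < 1:
        raise ValueError("need odd m >= 3 and n >= 1")
    b = m * m
    N = repunit(b, n + 1)
    if n % 2 == 0:
        f = repunit(m, n + 1)       # (m^{n+1} - 1)/(m - 1) from the factorization
    elif n == 1:
        f = 2                       # N = m^2 + 1 is even because m is odd
    else:
        f = b + 1                   # "11" in base b divides a repunit of even length
    return N, f

for m in (3, 5, 7):
    for n in range(1, 30):
        N, f = witness_factor(m, n)
        assert 1 < f < N and N % f == 0
```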

3. A PANDIGITAL SEED. It is not hard to find an integer that remains composite when any sequence of the form ddd . . . d is appended on the right, where d is any decimal digit. We leave it as an exercise to show that 6930 does the job. Only the case d = 1 requires a prime cover, and the one used in §2 for 891, namely (11, 37, 11, 3, 11, 13), works. Some prime searching shows that 6930 is the smallest such example (the most difficult candidate to eliminate was 6069; 1525 1s yielded a prime).
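The exercise about 6930 can be checked by computer. The case analysis in `witness` below is our own reading of why the number works, using $6930 = 2 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11$; it is not text from the authors. For d = 1 it uses the prime cover quoted above. Testing finitely many n is only a sanity check, and the divisibility reasons in the comments are what make it a proof.

```python
def s(k, d, n):
    """Append n copies of the digit d to k."""
    return k * 10**n + d * (10**n - 1) // 9

def witness(d, n):
    """A divisor of s_n for the seed 6930 (= 2 * 3^2 * 5 * 7 * 11)."""
    if d % 2 == 0:
        return 2                 # 6930 is even and so is any appended even digit
    if d == 5:
        return 5                 # s_n ends in 0 (n = 0) or in 5
    if d in (3, 9):
        return 3                 # 3 | 6930 and 3 | 33...3, 99...9
    if d == 7:
        return 7                 # 7 | 6930 and 77...7 = 7 * 11...1
    return (11, 37, 11, 3, 11, 13)[n % 6]   # d == 1: the prime cover for 891

for d in range(10):
    for n in range(60):
        value, w = s(6930, d, n), witness(d, n)
        assert value % w == 0 and value > w, (d, n)
```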

May 2014] APPENDING ANY DIGIT TO GENERATE COMPOSITE NUMBERS 417


A more natural problem in our context is to consider only the digits 1, 3, 7, 9, and
ask for an integer k that is a seed for each of these four digits (thus k is coprime to 3
and 7). We call such a positive integer k a pandigital seed.
For a prime p coprime to 10, we use the term period of p to mean the smallest positive integer r such that, for all n, $s_{n+r} \equiv s_n \pmod p$. The period of 3 is 3, while for other primes it is simply the order of 10 modulo p. If the period of a prime p is small, then p may divide a large proportion of the terms of the sequence $s_n$. In particular, if the period is r, then either every rth term of $\{s_n\}$ is divisible by p or no terms of the sequence are divisible by p.
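The period defined above is easy to compute. The following Python sketch uses our own helper names. It returns 3 for p = 3 and the multiplicative order of 10 otherwise. It then lists the periods of all primes that appear in the covers below.

```python
def period(p):
    """Period of s_n modulo the prime p (p coprime to 10), as defined in the text."""
    if p in (2, 5):
        raise ValueError("p must be coprime to 10")
    if p == 3:
        return 3                     # s_n ≡ k + d*n (mod 3)
    r, power = 1, 10 % p
    while power != 1:                # multiplicative order of 10 modulo p
        power = power * 10 % p
        r += 1
    return r

def s(k, d, n):
    """Append n copies of the digit d to k."""
    return k * 10**n + d * (10**n - 1) // 9

PRIMES = (3, 7, 11, 13, 31, 37, 41, 73, 101, 137, 211, 241, 271)
print({p: period(p) for p in PRIMES})
```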

Theorem. A pandigital seed exists. An example is 4942768284976776320.

Proof. A proof requires only checking that particular covers work, but we outline the
method by which the large seed and corresponding prime covers were found. We find,
for each digit, a prime cover so that the congruence conditions on k arising from the
four covers do not contradict each other. This method of coherent prime covers was used in [1, 4, 8] to find infinitely many values k such that both $k \cdot 2^n + 1$ and $k \cdot 2^n - 1$ are composite for all n, and to solve related problems. To find such covers, we first need to analyze the condition that a term in the sequence $\{s_n\}$ is divisible by a given prime p. If we assume that $p \notin \{2, 3, 5\}$, then $s_n \equiv 0 \pmod p$ if and only if p divides

$$9k\cdot 10^n + d(10^n - 1),$$

which is equivalent to

$$k \equiv 9^{-1}d\,(10^{-n} - 1) \pmod p. \tag{1}$$

If p = 3, then we instead have the condition

$$s_n \equiv k + d\,\frac{10^n-1}{9} \equiv 0 \pmod 3,$$

which, because $(10^n-1)/9 \equiv n \pmod 3$, reduces to $k \equiv 2dn \pmod 3$. It is useful to observe that when n is even, $10^n \equiv 1 \pmod{11}$, so that in this case $s_n$ is congruent modulo 11 to the seed itself. Therefore, the condition $k \equiv 0 \pmod{11}$ makes 11 a factor of $s_n$ whenever n is even. Hence we may focus on forcing composites for odd values of n.

Since the period of p = 37 is 3, we consider this prime next. When the number of appended digits is n = 6i + 3, equation (1) gives

$$k \equiv 9^{-1}d\left(10^{-(6i+3)} - 1\right) = 9^{-1}d\left(\left(10^{-6}\right)^{i}\cdot 10^{-3} - 1\right) \equiv 0 \pmod{37}.$$

Application of (1) to other values of n shows that 37 divides $s_n$ for $n \equiv 0, 1, 2 \pmod 3$ provided $k \equiv 0, 11d, d \pmod{37}$, respectively. If $k \equiv 0 \pmod{37}$, then 37 may be used as a prime divisor no matter which digit is appended. Therefore, we can assume $k \equiv 0 \pmod{37}$. Then $s_n$ is divisible by 11 when $n \equiv 0, 2,$ or $4 \pmod 6$ and by 37 when $n \equiv 0$ or $3 \pmod 6$. This leaves only the eight cases $n \equiv 1$ or $5 \pmod 6$ with digits 1, 3, 7, and 9 to be taken care of by other primes, as shown in Table 1.
To find divisors of $s_n$ for $n \equiv 1$ or $5 \pmod 6$, we note that the primes 7 and 13 have period 6. Solving congruence (1) leads to the conditions listed in Table 2. These show that if $k \equiv 2 \pmod 7$, then 7 divides $s_n$ in two of the eight cases: digit 1 with $n \equiv 1 \pmod 6$ and digit 9 with $n \equiv 5 \pmod 6$. Similarly, each of $k \equiv 1$, 3, or $9 \pmod{13}$ makes 13 a divisor in two of the cases. Each of these choices is then combined with a set of additional primes that contains 3, 101, 41, 271, 73, and 137, all of which have period 8 or less. Finally, a computer search found a list of primes that handles all cases.

Table 1. Divisors of $s_n$ for digit d using primes 11 and 37 with a seed that satisfies $k \equiv 0 \pmod{11 \cdot 37}$.

                n (mod 6)
digit    0     1     2     3     4     5
1        11    ?     11    37    11    ?
3        11    ?     11    37    11    ?
7        11    ?     11    37    11    ?
9        11    ?     11    37    11    ?

Table 2. Conditions on k to guarantee that 7 or 13 divides the number obtained by appending a digit string to k.

         divisor 7                          divisor 13
digit    n ≡ 1 (mod 6)    n ≡ 5 (mod 6)     n ≡ 1 (mod 6)     n ≡ 5 (mod 6)
1        k ≡ 2 (mod 7)    k ≡ 1 (mod 7)     k ≡ 9 (mod 13)    k ≡ 1 (mod 13)
3        k ≡ 6 (mod 7)    k ≡ 3 (mod 7)     k ≡ 1 (mod 13)    k ≡ 3 (mod 13)
7        k ≡ 0 (mod 7)    k ≡ 0 (mod 7)     k ≡ 11 (mod 13)   k ≡ 7 (mod 13)
9        k ≡ 4 (mod 7)    k ≡ 2 (mod 7)     k ≡ 3 (mod 13)    k ≡ 9 (mod 13)
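Congruence (1) can be evaluated directly to regenerate Table 2. In the Python sketch below, `required_residue` is a hypothetical helper. It relies on Python 3.8+ `pow`, which accepts a negative exponent together with a modulus and returns a modular inverse. Because $10^6 \equiv 1$ modulo 7 and 13, the values n = 1 and n = 5 stand for the classes $n \equiv 1$ and $n \equiv 5 \pmod 6$. The same function also gives the conditions for p = 37 when n runs over 0, 1, 2.

```python
def required_residue(p, d, n):
    """The residue of k (mod p) that makes p divide s_n, by congruence (1).

    Valid for primes p not in {2, 3, 5}.
    """
    return pow(9, -1, p) * d * (pow(10, -n, p) - 1) % p

def s(k, d, n):
    """Append n copies of the digit d to k."""
    return k * 10**n + d * (10**n - 1) // 9

# Rebuild Table 2: rows are digits, columns are (p, representative n).
columns = [(7, 1), (7, 5), (13, 1), (13, 5)]
for d in (1, 3, 7, 9):
    print(d, [required_residue(p, d, n) for p, n in columns])
```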

The smallest value of k found so far uses the primes 3, 7, 11, 13, 31, 37, 41, 73,
101, 137, 211, 241, and 271. The cover-lengths for the four digit-cases are 6, 6, 30,
and 8, respectively. The prime covers for the four digits are as follows:

d = 1 : (11, 3, 11, 37, 11, 13),
d = 3 : (11, 13, 11, 37, 11, 7),
d = 7 : (11, 3, 11, 37, 11, 271, 11, 3, 11, 37, 11, 41, 11, 3, 11, 37, 11, 31,
         11, 3, 11, 37, 11, 211, 11, 3, 11, 37, 11, 241),
d = 9 : (11, 73, 11, 101, 11, 137, 11, 101).
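As the proof says, verifying the seed amounts to checking these four covers. In each cover, every prime has a period dividing the cover's length, so one full cycle of n already gives a proof. The Python sketch below, with our own helper names, checks three cycles. It also requires that $s_n$ exceed its divisor, so that the divisibility really does show compositeness.

```python
K = 4942768284976776320

COVERS = {
    1: (11, 3, 11, 37, 11, 13),
    3: (11, 13, 11, 37, 11, 7),
    7: (11, 3, 11, 37, 11, 271, 11, 3, 11, 37, 11, 41, 11, 3, 11, 37, 11, 31,
        11, 3, 11, 37, 11, 211, 11, 3, 11, 37, 11, 241),
    9: (11, 73, 11, 101, 11, 137, 11, 101),
}

def s(k, d, n):
    """Append n copies of the digit d to k."""
    return k * 10**n + d * (10**n - 1) // 9

def check_cover(k, d, cover, cycles=3):
    """True if cover[n % len(cover)] properly divides s_n for n < cycles * len(cover)."""
    length = len(cover)
    return all(
        s(k, d, n) % cover[n % length] == 0 and s(k, d, n) > cover[n % length]
        for n in range(cycles * length)
    )

for d, cover in COVERS.items():
    print(d, check_cover(K, d, cover))
```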

Tables 3 and 4 show the correspondence between the values of n and k for each
digit. For example, when we are appending n 7s to k where n ≡ 11 (mod 30), we see
that 41 divides sn whenever k ≡ 28 (mod 41).
We apply the Chinese Remainder Theorem to all of the conditions on k in Tables 3 and 4 to find the pandigital seed k = 4942768284976776320.

Because these conditions already make k coprime to 3 and 7, we can add k ≡ 1 (mod 10) to the conditions used in the proof. This gives us

$$k \equiv 1970728582053685108721 \pmod{19657858137687083324010},$$

and every positive k in this residue class satisfies the theorem as stated in §1. This yields infinitely many such values.

Table 3. Residue classes for the seed k that guarantee the compositeness of $s_n$ when 1 or 3 is appended.

digit 1                             digit 3
classes for n   classes for k       classes for n   classes for k
0 (mod 2)       0 (mod 11)          0 (mod 2)       0 (mod 11)
1 (mod 6)       2 (mod 3)           1 (mod 6)       1 (mod 13)
3 (mod 6)       0 (mod 37)          3 (mod 6)       0 (mod 37)
5 (mod 6)       1 (mod 13)          5 (mod 6)       3 (mod 7)

Table 4. Residue classes for the seed k that guarantee the compositeness of $s_n$ when 7 or 9 is appended.

digit 7                             digit 9
classes for n   classes for k       classes for n   classes for k
0 (mod 2)       0 (mod 11)          0 (mod 2)       0 (mod 11)
1 (mod 6)       2 (mod 3)           1 (mod 8)       21 (mod 73)
3 (mod 6)       0 (mod 37)          3 (mod 4)       9 (mod 101)
5 (mod 30)      0 (mod 271)         5 (mod 8)       40 (mod 137)
11 (mod 30)     28 (mod 41)
17 (mod 30)     20 (mod 31)
23 (mod 30)     106 (mod 211)
29 (mod 30)     7 (mod 241)
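The Chinese Remainder Theorem step can be reproduced from the conditions in Tables 3 and 4. The Python sketch below contains our own incremental CRT solver, written for pairwise coprime moduli. It returns the least nonnegative solution together with the product of the moduli. Appending the extra condition k ≡ 1 (mod 10) gives the residue class quoted for the theorem of §1.

```python
from math import gcd

# Conditions on k from Tables 3 and 4, as (residue, prime modulus).
CONDITIONS = [
    (0, 11), (2, 3), (0, 37), (1, 13), (3, 7),
    (0, 271), (28, 41), (20, 31), (106, 211), (7, 241),
    (21, 73), (9, 101), (40, 137),
]

def crt(conditions):
    """Solve k ≡ a (mod m) for pairwise coprime moduli; return (k, M) with 0 <= k < M."""
    k, M = 0, 1
    for a, m in conditions:
        if gcd(M, m) != 1:
            raise ValueError("moduli must be pairwise coprime")
        t = (a - k) * pow(M, -1, m) % m     # choose t with k + M*t ≡ a (mod m)
        k, M = k + M * t, M * m
    return k, M

k, M = crt(CONDITIONS)
print(k, M)
k1, M1 = crt(CONDITIONS + [(1, 10)])       # additionally force k ≡ 1 (mod 10)
print(k1, M1)
```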

4. OPEN PROBLEMS. We conclude with some unsolved problems.


1. Find a number of 3s that can be appended to 817 to obtain a probable prime, thus
completing the proof, modulo probable primes, that 4070 is the minimal seed for
the digit 3.
2. Find a number of 9s that can be appended to 4420 to produce a prime.
3. Certify primality of 480 with 11330 7s appended and 851 with 28895 7s. Doing
so would complete the digit-7 case.
4. Data for all bases up to 10 can be found at [10]. Similar problems exist for these
bases.
5. Find a base-ten pandigital seed that is smaller than 4942768284976776320.
6. Investigate for various bases the situation where the appended digits come from
a fixed sequence, as was done by Jones and White [6] for base ten.




ACKNOWLEDGMENT. We thank the referees for many valuable comments that improved the exposition.

REFERENCES

1. D. Baczkowski, O. Fasoranti, C. Finch, Lucas-Sierpiński and Lucas-Riesel numbers, Fibonacci Quart. 49 (2011) 334–339.
2. R. Crandall, C. Pomerance, Prime Numbers: A Computational Perspective, Second edition, Springer, New York, 2005. Section 4.2.
3. P. Erdős, On integers of the form $2^k + p$ and some related problems, Summa Brasil. Math. 2 (1950) 113–123.
4. M. Filaseta, C. Finch, M. Kozek, On powers associated with Sierpiński numbers, Riesel numbers and Polignac’s conjecture, J. Number Theory 128 (2008) 1916–1940.
5. L. Jones, When does appending the same digit repeatedly on the right of a positive integer generate a sequence of composite integers? Amer. Math. Monthly 118 (2011) 153–160.
6. L. Jones, D. White, Appending digits to generate an infinite sequence of composite numbers, J. Integer Seq. 14 (2011) Article 11.5.7.
7. W. Keller, The Riesel problem: Definition and status, available at http://www.prothsearch.net/rieselprob.html.
8. F. Luca, V. J. Mejía Huguet, Fibonacci-Riesel and Fibonacci-Sierpiński numbers, Fibonacci Quart. 46/47 (2008/09) 216–219.
9. M. Martin, Primo, available at http://www.ellipsa.eu. (2010) Version 3.0.9.
10. J. Rickert, Composite sequences, available at http://www.rose-hulman.edu/~rickert/Compositeseq.
11. H. Riesel, Några stora primtal, Elementa 39 (1956) 258–260.
12. Riesel conjectures and proofs, available at http://www.noprimeleftbehind.net/crus/Riesel-conjectures.htm.
13. The Riesel problem, available at http://www.primegrid.com/forum_thread.php?id=1731&nowrap=true#2165.
14. M. Rodenkirch, Sierpinski/Riesel base 10, available at http://www.mersenneforum.org/showthread.php?t=6911, January 2007.

JON GRANTHAM has been a researcher at the Center for Computing Sciences since 1997. His previous
publications addressed the existence of various types of pseudoprimes. He lives in Maryland with his wife and
their twin three-year-olds.
Institute for Defense Analyses, Center for Computing Sciences, 17100 Science Drive, Bowie, MD 20715
grantham@super.org

WITOLD JARNICKI worked at the Jagiellonian University (Kraków, Poland) until 2007. His research concentrated
on algebraic geometry and complex analysis. Since 2007 he has been working as a software engineer
at Google.
Google Kraków, Rynek Glowny 12, 31-042 Kraków, Poland
witoldjarnicki@google.com

JOHN RICKERT teaches at the Rose-Hulman Institute of Technology and writes about number theory and
graph theory. In his spare time he researches baseball history and statistics.
Rose-Hulman Institute of Technology, 5500 Wabash Avenue, Terre Haute, IN 47803
rickert@rose-hulman.edu

STAN WAGON recently retired from Macalester College. His books include The Banach-Tarski Paradox,
Mathematica in Action, and VisualDSolve. Other interests include geometric snow sculpture, mountaineering,
and mushroom hunting. He is one of the founding editors of Ultrarunning magazine, but now finds that covering
long distances is much easier on skis than in running shoes.
Mathematics Department, Macalester College, St. Paul, MN 55105
wagon@macalester.edu



On Wallis-type Products and
Pólya’s Urn Schemes
Iddo Ben-Ari, Diana Hay, and Alexander Roitershtein

Abstract. A famous “curious identity” of Wallis gives a representation of the constant π in terms of a simply structured infinite product of fractions. Sondow and Yi [Amer. Math. Monthly 117 (2010) 912–917] identified a general scheme for evaluating Wallis-type infinite products. The main purpose of this paper is to discuss an interpretation of the scheme by means of Pólya urn models.

1. INTRODUCTION. This paper is motivated by the work of Sondow and Yi [20], where several examples of Wallis-type products (see Definition 2.6 below) have been constructed. The method used in [20] yields “cyclically structured” converging infinite products of fractions and evaluates their limit by means of the gamma function (see Section 2 below). Only in a limited range of cases is an expression of the limit in terms of powers of π and algebraic numbers known. Section 3 contains an instructive survey of generic examples. In Section 4, we discuss a relation between the Wallis-type products and the Pólya urn scheme. Our main observation is contained in the statement of Theorem 4.4. Throughout the paper, $\Gamma(\cdot)$ is the gamma function, and $\mathbb{N}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ denote, respectively, the natural, rational, real, and complex numbers.

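2. WALLIS-TYPE INFINITE PRODUCTS. Our starting point is a representation of the constant π discovered by Wallis [3, p. 68]:

$$\frac{\pi}{2} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdot\frac{8}{7}\cdot\frac{8}{9}\cdot\frac{10}{9}\cdot\frac{10}{11}\cdot\frac{12}{11}\cdot\frac{12}{13}\cdots. \tag{1}$$

A standard proof of the identity (1) relies on the evaluation of Wallis’s integral $\int_0^{\pi/2}\sin^{2n+1}t\,dt$ [2]. A relation of Wallis’s product to Euler’s and Leibniz’s formulas

$$\frac{\pi^2}{6} = \sum_{n=1}^{\infty}\frac{1}{n^2} \quad\text{and}\quad \frac{\pi}{4} = \sum_{n=0}^{\infty}\frac{(-1)^n}{2n+1}$$

is discussed, for instance, in [17, 25]. The cyclic structure of (1) is formalized as follows:

$$\frac{\pi}{2} = \prod_{n=0}^{\infty}\frac{2n+2}{2n+1}\cdot\frac{2n+2}{2n+3}. \tag{2}$$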

This is generalized in the next proposition, which is a slight variation of a part of [20, Theorem 1]. The result follows from Euler’s formula [8, Section VII.6] applied to fractions:

$$\frac{\sin(\pi x)}{\pi x} = \prod_{j=1}^{\infty}\left(1-\frac{x^2}{j^2}\right) = \prod_{n=0}^{\infty}\left(1-\frac{x^2}{(n+1)^2}\right), \quad x\in\mathbb{R}. \tag{3}$$

http://dx.doi.org/10.4169/amer.math.monthly.121.05.422
MSC: Primary 11Y60, Secondary 60C05; 33B15




Proposition 2.1. For any $k, m \in \mathbb{N}$ such that $m < k$, it follows that

$$\frac{\pi m/k}{\sin(\pi m/k)} = \prod_{n=0}^{\infty}\frac{nk+k}{nk+k-m}\cdot\frac{nk+k}{nk+k+m}. \tag{4}$$

Example 2.2.
(a) Letting k = 6 and m = 1 in (4), we obtain [25, p. 187]

$$\prod_{n=0}^{\infty}\frac{6n+6}{6n+5}\cdot\frac{6n+6}{6n+7} = \frac{\pi}{3}.$$

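(b) The identity $\cos\pi x - \cos\pi y = 2\sin\frac{\pi(y-x)}{2}\sin\frac{\pi(y+x)}{2}$ with $x = \frac16$, $y = \frac14$ and (4) yield

$$\frac{\sqrt3}{2} - \frac{1}{\sqrt2} = 2\sin\frac{\pi}{24}\sin\frac{5\pi}{24} = \frac{10\pi^2}{24^2}\prod_{n=0}^{\infty}\frac{24n+19}{24n+24}\cdot\frac{24n+23}{24n+24}\cdot\frac{24n+25}{24n+24}\cdot\frac{24n+29}{24n+24}.$$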

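We remark that (3) can be thought of as the representation of the infinite Taylor polynomial

$$\frac{\sin(\pi x)}{\pi x} = 1 - \frac{(\pi x)^2}{3!} + \frac{(\pi x)^4}{5!} - \frac{(\pi x)^6}{7!} + \cdots$$

as $\prod_{j=1}^{\infty}\left(1-\frac{x}{x_j}\right)\left(1-\frac{x}{-x_j}\right)$, where $x_j = j$, so that the numbers $\pm x_j$ are the roots of the polynomial [17, Chapter II].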
The next proposition is a consequence of the following counterpart of (3) for the cosine function:

$$\cos(\pi x/2) = \prod_{n=0}^{\infty}\left(1-\frac{x^2}{(2n+1)^2}\right), \quad x\in\mathbb{R}. \tag{5}$$

Proposition 2.3. For any $k, m \in \mathbb{N}$ such that $m < k$, it follows that

$$\frac{1}{\cos(\pi m/(2k))} = \prod_{n=0}^{\infty}\frac{2nk+k}{2nk+k-m}\cdot\frac{2nk+k}{2nk+k+m}. \tag{6}$$

Example 2.4. Letting k = 3 and m = 2 in (6), we obtain

$$\prod_{n=0}^{\infty}\frac{6n+3}{6n+1}\cdot\frac{6n+3}{6n+5} = 2.$$

Letting k = 2, m = 1 in (6) yields Catalan’s product [7]

$$\prod_{n=0}^{\infty}\frac{4n+2}{4n+1}\cdot\frac{4n+2}{4n+3} = \sqrt2.$$

May 2014] ON WALLIS-TYPE PRODUCTS AND PÓLYA’S URN SCHEMES 423


The above propositions can be generalized and extended in many ways. Consider,
for instance, the following example.

Example 2.5. Ramanujan [11, p. 50] showed that:



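(a) $$\prod_{n=1}^{\infty}\left(1+\left(\frac{2p+1}{n+p}\right)^3\right) = \frac{\{\Gamma(1+p)\}^3\cosh\left(\pi\left(p+\tfrac12\right)\sqrt3\right)}{\Gamma(2+3p)\,\pi},$$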

    2 
(b) ∞ p3 p
= 0((0(p+1)/2)
p/2) cosh(π p 3)−cosh(π p)
Q
n=1 1 + n3
· 1 + 3 n+ p+1

2x+2 π p π
.

If p ∈ Q, the above formulas yield expressions for infinite products of fractions whose
periodic structure resembles that of the Wallis product.

In fact, the Weierstrass factorization theorem implies that [24, Section 12.13]

$$\prod_{n=0}^{\infty}\prod_{j=1}^{d}\frac{n+x_j}{n+y_j} = \prod_{j=1}^{d}\frac{\Gamma(y_j)}{\Gamma(x_j)}, \quad x_j, y_j\in\mathbb{C}, \tag{7}$$

as long as $\sum_{j=1}^{d} x_j = \sum_{j=1}^{d} y_j$ and none of the $y_j$ is a negative integer or zero. We adopt the point of view proposed in [20] and regard (7) as a recipe for creating Wallis-type identities.
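The recipe (7) can be tried out numerically. The Python sketch below uses hypothetical helper names and plain floating point. It handles real, positive $x_j$ and $y_j$ only, because it relies on `math.gamma`. It compares a truncated product with the right-hand side of (7) for Wallis’s product (2) and for Catalan’s product, each rewritten with monic factors $n + x_j$ and $n + y_j$. Truncation after N terms leaves a relative error of order 1/N.

```python
import math

def truncated_product(xs, ys, terms):
    """prod_{n=0}^{terms-1} prod_j (n + x_j)/(n + y_j), summed in log space."""
    log_total = 0.0
    for n in range(terms):
        for x, y in zip(xs, ys):
            log_total += math.log((n + x) / (n + y))
    return math.exp(log_total)

def gamma_side(xs, ys):
    """Right-hand side of (7), prod_j Gamma(y_j) / Gamma(x_j), for positive reals."""
    if not math.isclose(sum(xs), sum(ys)):
        raise ValueError("(7) needs sum(xs) == sum(ys); otherwise the product is 0 or infinite")
    return math.prod(math.gamma(y) for y in ys) / math.prod(math.gamma(x) for x in xs)

# (2): (2n+2)/(2n+1) * (2n+2)/(2n+3) = (n+1)/(n+1/2) * (n+1)/(n+3/2)
wallis = ((1, 1), (0.5, 1.5))
# Catalan: (4n+2)^2 / ((4n+1)(4n+3)) = (n+1/2)^2 / ((n+1/4)(n+3/4))
catalan = ((0.5, 0.5), (0.25, 0.75))
for xs, ys in (wallis, catalan):
    print(truncated_product(xs, ys, 100_000), gamma_side(xs, ys))
```

The right-hand sides evaluate to π/2 and √2, respectively.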

Definition 2.6. For α > 0, we write α ∈ W if $\alpha = \prod_{n=0}^{\infty}\frac{P(n)}{Q(n)}$ for some polynomials P and Q with negative rational roots and common degree, that is, if

$$\alpha = \prod_{n=0}^{\infty}\prod_{j=1}^{d}\frac{n+a_j/k}{n+b_j/k} = \prod_{n=0}^{\infty}\prod_{j=1}^{d}\frac{kn+a_j}{kn+b_j}, \quad d, k, a_j, b_j\in\mathbb{N}. \tag{8}$$

The rightmost expression in (8) is said to be a Wallis-type infinite product for α.

In view of (7), the condition $\sum_{j=1}^{d} a_j = \sum_{j=1}^{d} b_j$ must hold to ensure that α is finite and nonzero. A Wallis-type product can be equivalently defined as $\prod_{n=0}^{\infty}\frac{p_n}{q_n}$, where $p_n, q_n \in \mathbb{N}$ and $p_{n+d} = p_n + k$, $q_{n+d} = q_n + k$ for some $d, k \in \mathbb{N}$ and all integers $n \ge 0$ (in particular, both $p_n$ and $q_n$ grow asymptotically linearly). Note that the products in Example 2.5 are not of the Wallis type.

3. SOME FURTHER EXAMPLES OF WALLIS-TYPE PRODUCTS. The analysis of the structure of W turns out to be challenging. We refer to [6, 10, 21, 22] for various aspects of this problem. In this section, we present a selected variety of examples where Wallis-type products can be evaluated explicitly and explore a few links between them.

Example 3.1. Start with $\cos\frac{\pi}{9}\cos\frac{2\pi}{9}\cos\frac{4\pi}{9} = \frac18$ ("Morrie's law" [4]), which is a special case of

$$\prod_{j=0}^{p-1}\cos(2^{j}x) = \frac{\sin(2^{p}x)}{2^{p}\sin x}, \qquad p \in \mathbb{N},\ x \in \mathbb{R}\setminus\pi\mathbb{Z}. \tag{9}$$

Combining this result with (6) and taking into account that $\Gamma(1/2) = \sqrt{\pi}$, we obtain

424 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121




$$\prod_{n=0}^{\infty}\prod_{m\in\{2,4,8\}}\frac{18n+9-m}{18n+9}\cdot\frac{18n+9+m}{18n+9} = \frac18. \tag{10}$$

In fact, (6) and (9) with $x = \pi/(2^{p}+1)$ imply

$$\prod_{n=0}^{\infty}\prod_{j=1}^{p}\frac{(2^{p+1}+2)n+2^{p}-2^{j}+1}{(2^{p+1}+2)n+2^{p}+1}\cdot\frac{(2^{p+1}+2)n+2^{p}+2^{j}+1}{(2^{p+1}+2)n+2^{p}+1} = \frac{1}{2^{p}}.$$
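The general identity can be checked in two ways: through its gamma-function form from (7), and by truncating the product directly (the case p = 3 is (10)). The function names below are illustrative; the period $K = 2^{p+1}+2$ and center $c = 2^{p}+1$ are read off from the displayed factors.

```python
import math


def closed_form(p):
    """Gamma-function value, via (7), of the product over j = 1..p displayed above."""
    K, c = 2**(p + 1) + 2, 2**p + 1  # period in n and center of each factor pair
    value = 1.0
    for j in range(1, p + 1):
        lo, hi = (c - 2**j) / K, (c + 2**j) / K
        value *= math.gamma(c / K)**2 / (math.gamma(lo) * math.gamma(hi))
    return value


def truncated(p, n_max=100_000):
    """The same product cut off after n_max values of n."""
    K, c = 2**(p + 1) + 2, 2**p + 1
    result = 1.0
    for n in range(n_max):
        center = K * n + c
        for j in range(1, p + 1):
            result *= (center - 2**j) * (center + 2**j) / center**2
    return result


for p in range(1, 6):
    print(p, closed_form(p), 2.0**-p)
print("(10):", truncated(3), 1 / 8)
```

Since $c/K = 1/2$ and the two shifted arguments sum to 1, each gamma quotient reduces by the reflection formula to $\cos(2^{j-1}x)$ with $x = \pi/(2^p+1)$, which is exactly how (9) enters.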

Example 3.2. Let $R_n := \{j \in \mathbb{N} : 1 \le j \le n \text{ and } j \text{ is relatively prime to } n\}$, $n \in \mathbb{N}$. The following identity (see, for instance, [22]) illustrates a result of [18] for $\prod_{j\in R_n}\Gamma(j/n)$:

$$\prod_{n=0}^{\infty}\frac{14n+1}{14n+7}\cdot\frac{14n+9}{14n+7}\cdot\frac{14n+11}{14n+7} = \frac{\{\Gamma(7/14)\}^{3}}{\Gamma(1/14)\,\Gamma(9/14)\,\Gamma(11/14)} = \frac14.$$

It is curious to note that (10) above can also be derived from the result of [18] with n = 18.

Example 3.3. Vieta [3, p. 53] (see also [13, Chapter 1]) showed that $\frac{2}{\pi} = \prod_{n=1}^{\infty} S_n$, where $S_1 = \sqrt{1/2}$ and $S_n = \sqrt{1/2 + (1/2)S_{n-1}}$ for n > 1. Osler considered in [16] the "united Vieta–Wallis-like products"

$$\frac{\sin(\pi x)}{\pi x} = \prod_{m=1}^{p}\underbrace{\sqrt{\frac12+\frac12\sqrt{\frac12+\cdots+\frac12\sqrt{\frac12+\frac12\cos(\pi x)}}}}_{m\ \text{radicals}}\ \prod_{n=1}^{\infty}\frac{2^{p}n-x}{2^{p}n}\cdot\frac{2^{p}n+x}{2^{p}n} \tag{11}$$

with $p \in \mathbb{N}\cup\{0\}$. The proof of (11) rests on (4), (9), and the identity $\cos\frac{\theta}{2} = \sqrt{\frac12+\frac12\cos\theta}$.
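Vieta's product and Osler's identity (11) can both be checked numerically. In this sketch the sample point `x = 0.3` is a hypothetical choice with 0 < x < 1 (so that each nested radical equals $\cos(\pi x/2^m)$), and the Wallis-type factor in (11) is truncated after `n_max` terms.

```python
import math


def vieta(n_terms=40):
    """prod_{n=1}^{n_terms} S_n with S_1 = sqrt(1/2) and S_n = sqrt(1/2 + S_{n-1}/2)."""
    s, prod = math.sqrt(0.5), 1.0
    for _ in range(n_terms):
        prod *= s
        s = math.sqrt(0.5 + 0.5 * s)
    return prod


def nested_radical(x, m):
    """sqrt(1/2 + 1/2*sqrt(1/2 + ... + 1/2*sqrt(1/2 + 1/2*cos(pi x)))) with m radicals."""
    value = math.cos(math.pi * x)
    for _ in range(m):
        value = math.sqrt(0.5 + 0.5 * value)
    return value


def osler_right_side(x, p, n_max=100_000):
    """Right side of (11), with the Wallis-type factor truncated after n_max terms."""
    result = math.prod(nested_radical(x, m) for m in range(1, p + 1))  # empty product for p = 0
    step = 2**p
    for n in range(1, n_max + 1):
        result *= (step * n - x) * (step * n + x) / (step * n)**2
    return result


x = 0.3  # hypothetical sample point with 0 < x < 1
print(vieta(), 2 / math.pi)
for p in range(4):
    print(p, osler_right_side(x, p), math.sin(math.pi * x) / (math.pi * x))
```

Larger p moves more of the work into the rapidly converging radicals, so the truncated Wallis-type factor contributes less error.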

Example 3.4. The set $\{\Gamma(m/k) : m, k \in \mathbb{Z};\ k \text{ divides } 24 \text{ or } 60\}$ is investigated in [21]. One motivation for this study is its relevance to the evaluation of hypergeometric functions and to the theory of elliptic integrals. In addition, as observed in [21], the topic is related to an instance of the Lang–Rohrlich conjecture [10, 22]. We have, for instance:

$$\prod_{n=0}^{\infty}\frac{30n+9}{30n+10}\cdot\frac{30n+19}{30n+18} = \frac{\Gamma(1/3)\,\Gamma(3/5)}{\Gamma(3/10)\,\Gamma(19/30)} = \frac{\sqrt{\sqrt{15}+\sqrt{5+2\sqrt{5}}}}{2^{19/30}\,3^{1/20}\,5^{1/3}},$$

$$\prod_{n=0}^{\infty}\frac{12n+5}{12n+6}\cdot\frac{12n+9}{12n+8} = \frac{\Gamma(1/2)\,\Gamma(2/3)}{\Gamma(3/4)\,\Gamma(5/12)} = \frac{\sqrt{\sqrt{3}+1}}{2^{1/4}\,3^{3/8}},$$

$$\prod_{n=0}^{\infty}\left(\frac{6n+2}{6n+4}\right)^{3}\frac{6n+5}{6n+1}\cdot\frac{6n+5}{6n+3} = \frac{\{\Gamma(2/3)\}^{3}\,\Gamma(1/6)\,\Gamma(1/2)}{\{\Gamma(1/3)\}^{3}\,\{\Gamma(5/6)\}^{2}} = 1.$$



In the same spirit, using the rather surprising result (we adopt this epithet from [6]) that

$$\frac{\Gamma(1/24)\,\Gamma(11/24)}{\Gamma(5/24)\,\Gamma(7/24)} = \sqrt{6+3\sqrt{3}},$$

we can compute, by (7),

$$\prod_{n=0}^{\infty}\frac{24n+5}{24n+1}\cdot\frac{24n+7}{24n+11} = \sqrt{6+3\sqrt{3}}.$$

Similar formulas can be derived from other results of [21].
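The closed forms in Example 3.4 (and the identity of Example 3.2) can be checked to near machine precision with `math.gamma`, since each product has already been converted to a gamma quotient via (7). The dictionary labels and the truncation helper below are illustrative choices, not part of the source.

```python
import math

G = math.gamma
sqrt = math.sqrt

# label: (gamma quotient obtained from (7), claimed closed form)
identities = {
    "Example 3.2, 14n": (G(1/2)**3 / (G(1/14) * G(9/14) * G(11/14)), 1 / 4),
    "30n": (G(1/3) * G(3/5) / (G(3/10) * G(19/30)),
            sqrt(sqrt(15) + sqrt(5 + 2 * sqrt(5))) / (2**(19/30) * 3**(1/20) * 5**(1/3))),
    "12n": (G(1/2) * G(2/3) / (G(3/4) * G(5/12)), sqrt(sqrt(3) + 1) / (2**(1/4) * 3**(3/8))),
    "6n": (G(2/3)**3 * G(1/6) * G(1/2) / (G(1/3)**3 * G(5/6)**2), 1.0),
    "24n": (G(1/24) * G(11/24) / (G(5/24) * G(7/24)), sqrt(6 + 3 * sqrt(3))),
}
for name, (gamma_quotient, closed) in identities.items():
    print(f"{name:>16}: {gamma_quotient:.15f}  {closed:.15f}")


def truncated_24(n_max=200_000):
    """prod_{n < n_max} (24n+5)(24n+7) / ((24n+1)(24n+11)), the last product above."""
    result = 1.0
    for n in range(n_max):
        result *= (24*n + 5) * (24*n + 7) / ((24*n + 1) * (24*n + 11))
    return result


print(truncated_24(), sqrt(6 + 3 * sqrt(3)))
```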

Example 3.5. It is observed in [1, Section 4.4] that the key formula (7), along with the so-called "standard equations" for the gamma function (translation, reflection, and multiplication), can be used to derive the following identity, which is valid for any integer k ≥ 0:

$$\prod_{n=0}^{\infty}\prod_{j=1}^{k}\frac{(2n+1)(2k+1)-2j}{(2n+1)(2k+1)-2j+1}\cdot\frac{(2n+1)(2k+1)+2j}{(2n+1)(2k+1)+2j-1} = \frac{1}{\sqrt{2k+1}}. \tag{12}$$

The observation that this product can be evaluated using only the standard equations is interesting in light of Rohrlich's conjecture, which, informally speaking, asserts that these are essentially the only relations among the values of the gamma function at rational points, in the sense that all other relations can be obtained as their consequences (see, for instance, [10] and [22, Section 4.1]).
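A sketch that checks (12) for small k in two ways: by truncating the product, and by rewriting it with period $P = 2(2k+1)$ in the form (7) and evaluating the gamma quotient. The helper names are illustrative.

```python
import math


def product_12(k, n_max=50_000):
    """Left side of (12) truncated after n_max values of n."""
    result = 1.0
    for n in range(n_max):
        M = (2*n + 1) * (2*k + 1)
        for j in range(1, k + 1):
            result *= (M - 2*j) * (M + 2*j) / ((M - 2*j + 1) * (M + 2*j - 1))
    return result


def gamma_side_12(k):
    """Left side of (12) evaluated with (7): (2n+1)(2k+1) + s = P*n + (2k+1+s), P = 2(2k+1)."""
    period = 2 * (2*k + 1)
    value = 1.0
    for j in range(1, k + 1):
        x1, x2 = (2*k + 1 - 2*j) / period, (2*k + 1 + 2*j) / period  # numerator shifts
        y1, y2 = (2*k + 2 - 2*j) / period, (2*k + 2*j) / period      # denominator shifts
        value *= math.gamma(y1) * math.gamma(y2) / (math.gamma(x1) * math.gamma(x2))
    return value


for k in range(1, 5):
    print(k, product_12(k), gamma_side_12(k), 1 / math.sqrt(2*k + 1))
```

For k = 1 the gamma quotient is $\Gamma(1/3)\Gamma(2/3)/(\Gamma(1/6)\Gamma(5/6)) = \sin(\pi/6)/\sin(\pi/3) = 1/\sqrt{3}$, using only the reflection formula.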

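Example 3.6. The following "kth-order Wallis product" is somewhat similar in form to (12):

$$\prod_{n=1}^{\infty}\prod_{j=1}^{k}\left(\frac{n(2k+1)+2j-1}{n(2k+1)+2j}\right)^{(-1)^{n}} = \frac{2^{2k}}{\sqrt{2k+1}}\binom{2k}{k}^{-1}.$$

Using (7) and the "standard equations", it follows that

$$\begin{aligned}
A_k &:= \prod_{n=1}^{\infty}\prod_{j=0}^{k-1}\frac{2nk}{2nk-2j-1}\cdot\frac{2nk}{2nk+2j+1} = \prod_{j=0}^{k-1}\Gamma\!\left(\frac{2j+1}{2k}\right)\Gamma\!\left(1+\frac{2j+1}{2k}\right)\\
&= \frac{(2k)^{k}}{(2k-1)!!}\prod_{j=0}^{k-1}\Gamma\!\left(1+\frac{2j+1}{2k}\right)^{2} = \frac{(2k)^{k}}{(2k-1)!!}\left(\frac{\prod_{m=1}^{2k-1}\Gamma\!\left(1+\frac{m}{2k}\right)}{\prod_{m=1}^{k-1}\Gamma\!\left(1+\frac{m}{k}\right)}\right)^{2}\\
&= \frac{\pi^{k}\,(2k-1)!!}{2k^{k}}.
\end{aligned}$$

For instance, $A_1$ is Wallis's product and

$$A_2 = \frac41\cdot\frac43\cdot\frac45\cdot\frac47\cdot\frac85\cdot\frac87\cdot\frac89\cdot\frac{8}{11}\cdot\frac{12}{9}\cdot\frac{12}{11}\cdot\frac{12}{13}\cdot\frac{12}{15}\cdot\frac{16}{13}\cdot\frac{16}{15}\cdot\frac{16}{17}\cdot\frac{16}{19}\cdots = \frac{3\pi^{2}}{8}.$$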
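The chain of equalities for $A_k$ and the kth-order Wallis product can be checked numerically. This is a sketch with illustrative function names: `A_closed` is the final closed form, `A_gamma` the first gamma-function expression, and the two truncated products cut the infinite products off at `n_max` (an even cutoff keeps the alternating factors in complete pairs).

```python
import math


def A_closed(k):
    """Closed form pi^k (2k-1)!! / (2 k^k) of A_k."""
    odd_double_factorial = math.prod(range(1, 2 * k, 2))
    return math.pi**k * odd_double_factorial / (2 * k**k)


def A_gamma(k):
    """The gamma-function form prod_j Gamma(c_j) Gamma(1 + c_j) with c_j = (2j+1)/(2k)."""
    return math.prod(math.gamma(c) * math.gamma(1 + c)
                     for c in ((2 * j + 1) / (2 * k) for j in range(k)))


def A_truncated(k, n_max=50_000):
    """The defining product of A_k cut off after n_max values of n."""
    result = 1.0
    for n in range(1, n_max + 1):
        two_nk = 2 * n * k
        for j in range(k):
            result *= two_nk**2 / ((two_nk - 2 * j - 1) * (two_nk + 2 * j + 1))
    return result


def kth_order_wallis(k, n_max=50_000):
    """prod_{n<=n_max} prod_j ((n(2k+1)+2j-1)/(n(2k+1)+2j))^((-1)^n); n_max even keeps pairs whole."""
    result = 1.0
    for n in range(1, n_max + 1):
        for j in range(1, k + 1):
            ratio = (n * (2 * k + 1) + 2 * j - 1) / (n * (2 * k + 1) + 2 * j)
            result *= ratio if n % 2 == 0 else 1 / ratio
    return result


for k in (1, 2, 3):
    print(k, A_closed(k), A_gamma(k), A_truncated(k))
print(A_closed(2), 3 * math.pi**2 / 8)
print(kth_order_wallis(2), 2**4 / math.sqrt(5) / math.comb(4, 2))
```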




4. PÓLYA'S URN SCHEMES AND WALLIS-TYPE PRODUCTS. In this section, we present a probabilistic interpretation of Wallis-type products in terms of the probabilities of realizations of Pólya's urn scheme.
We begin by recalling the classical Pólya urn scheme [14]. Throughout the discussion, k ≥ 2 and p ≥ 2 are fixed integer parameters. Consider an urn containing balls of p different colors, with colors labeled by the elements of $\mathbb{N}_p = \{1, \ldots, p\}$. At each unit of time n = 1, 2, ..., a ball is drawn from the urn, its color is inspected, and it is returned to the urn together with k extra balls of the same color. Let $a = (a_1, \ldots, a_p) \in \mathbb{N}^p$, where $a_i$ denotes the number of balls of color i initially placed in the urn. We write $|a| = \sum_{i=1}^{p} a_i$.
Let $\omega_n \in \mathbb{N}_p$ be the color of the ball sampled in the nth draw. We refer to the sequence $(\omega_n)_{n\in\mathbb{N}}$ as a realization of the Pólya urn scheme. We denote the space of realizations $\mathbb{N}_p^{\mathbb{N}}$ of the urn by $\Omega_p$ and for $\omega \in \Omega_p$ define

$$A_n(\omega) = \{\tilde\omega \in \Omega_p : \tilde\omega_j = \omega_j,\ j = 1, \ldots, n\}. \tag{13}$$

That is, $A_n(\omega)$ is the set of realizations whose first n coordinates coincide with those of ω. Note that $\bigcap_{n\in\mathbb{N}} A_n(\omega) = \{\omega\}$. Let $\mathcal{F}_n$ denote the σ-algebra generated by $\omega_1, \ldots, \omega_n$ and let $\mathcal{F}_\infty$ denote the σ-algebra generated by $\bigcup_{n\in\mathbb{N}}\mathcal{F}_n$. Let $P_a$ denote the law of the urn on $\mathcal{F}_\infty$, and let $E_a$ be the corresponding expectation. Also, let

$$T_{n,i}(\omega) = \sum_{j=1}^{n}\mathbf{1}_{\{\omega_j = i\}} \qquad\text{and}\qquad X_{n,i}(\omega) = \frac{a_i + kT_{n,i}(\omega)}{|a| + kn}$$

denote the number of times a ball of color i was drawn in the first n draws and the fraction of balls of color i in the urn after n draws, respectively. Finally, for n ∈ N, write $X_n = (X_{n,1}, \ldots, X_{n,p})$, $T_n = (T_{n,1}, \ldots, T_{n,p})$, and let $X_0 = a/|a|$ and $T_0 = (0, \ldots, 0)$. Then for all n ≥ 0, we have

$$P_a(\omega_{n+1} = i \mid \mathcal{F}_n)(\omega) = X_{n,i}(\omega) = \frac{a_i + kT_{n,i}(\omega)}{|a| + kn}.$$

Notice that the right-hand side depends only on the number of color i balls chosen in the first n draws, not on the order in which they were chosen. It then follows by induction that the sequence $(\omega_n)_{n\in\mathbb{N}}$ is exchangeable. That is, for n ∈ N and a permutation σ of {1, ..., n}, the distributions of $(\omega_1, \ldots, \omega_n)$ and $(\omega_{\sigma(1)}, \ldots, \omega_{\sigma(n)})$ coincide under $P_a$. In particular,

$$P_a(A_n(\omega)) = \frac{\prod_{i=1}^{p}\prod_{j=0}^{T_{n,i}(\omega)-1}(a_i + kj)}{\prod_{j=0}^{n-1}(|a| + kj)}. \tag{14}$$
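Formula (14) and the exchangeability it expresses can be verified exactly with rational arithmetic. This sketch uses a hypothetical urn (p = 3 colors, initial counts a = (1, 2, 3), k = 2) and compares, for every color sequence of length 5, the draw-by-draw product of conditional probabilities with the closed form (14).

```python
from fractions import Fraction
from itertools import product


def prob_sequential(a, k, omega):
    """P_a(A_n(omega)): multiply the conditional draw probabilities X_{j,i} step by step."""
    counts, total = list(a), sum(a)
    prob = Fraction(1)
    for color in omega:  # colors are labeled 1, ..., p
        prob *= Fraction(counts[color - 1], total)
        counts[color - 1] += k
        total += k
    return prob


def prob_formula(a, k, omega):
    """Closed form (14): depends on omega only through the counts T_{n,i}(omega)."""
    numerator = 1
    for i, a_i in enumerate(a, start=1):
        for j in range(omega.count(i)):
            numerator *= a_i + k * j
    denominator = 1
    for j in range(len(omega)):
        denominator *= sum(a) + k * j
    return Fraction(numerator, denominator)


a, k, n = (1, 2, 3), 2, 5  # hypothetical urn with p = 3 colors
all_words = list(product(range(1, len(a) + 1), repeat=n))
print(all(prob_sequential(a, k, w) == prob_formula(a, k, w) for w in all_words))
print(prob_sequential(a, k, (1, 2, 3, 1, 1)), prob_sequential(a, k, (3, 1, 1, 2, 1)))
```

Exact `Fraction` arithmetic is used so that the equality check is not blurred by floating-point rounding.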

The following result is well known and was first established by Eggenberger and Pólya in the case p = 2 [5, 14].

Theorem 4.1. For any $a \in \mathbb{N}^p$, $X_n = (X_{n,1}, \ldots, X_{n,p})$ converges almost surely with respect to the measure $P_a$ to a random vector $X_\infty \in \mathbb{R}^p$ with the Dirichlet density

$$f_a(x_1, \ldots, x_p) = \frac{\Gamma(|a|/k)}{\prod_{i=1}^{p}\Gamma(a_i/k)}\prod_{i=1}^{p}x_i^{a_i/k-1}, \qquad\text{where } x_i > 0 \text{ and } \sum_{i=1}^{p}x_i = 1.$$
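Theorem 4.1 can be illustrated (not proved) by simulation. The sketch below runs many independent urns with hypothetical parameters p = 2, a = (1, 2), k = 2, so that the limit of $X_{n,1}$ should be Beta(1/2, 1), and compares the sample mean and variance of $X_{300,1}$ with the Dirichlet moments. The seed, the number of runs, and the number of draws are arbitrary choices; agreement is only expected up to Monte Carlo error.

```python
import random


def draw_urn(a, k, n_draws, rng):
    """Simulate n_draws steps of the Pólya urn started from a; return X_n (color fractions)."""
    counts = list(a)
    total = sum(a)
    for _ in range(n_draws):
        u = rng.randrange(total)  # index of a uniformly chosen ball (exact integer arithmetic)
        i = 0
        while u >= counts[i]:  # locate the color of ball number u
            u -= counts[i]
            i += 1
        counts[i] += k  # the ball goes back together with k extra balls of its color
        total += k
    return [c / total for c in counts]


rng = random.Random(2014)  # fixed seed so the run is reproducible
a, k = (1, 2), 2  # hypothetical urn: X_{n,1} should approach Beta(1/2, 1)
samples = [draw_urn(a, k, 300, rng)[0] for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean)**2 for s in samples) / (len(samples) - 1)

alpha = [a_i / k for a_i in a]  # Dirichlet parameters a_i / k from Theorem 4.1
alpha0 = sum(alpha)
dirichlet_mean = alpha[0] / alpha0
dirichlet_var = alpha[0] * (alpha0 - alpha[0]) / (alpha0**2 * (alpha0 + 1))
print(f"mean {mean:.4f} vs {dirichlet_mean:.4f}")
print(f"var  {var:.4f} vs {dirichlet_var:.4f}")
```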



We will now derive a probabilistic interpretation of the Wallis-type product, which also yields expressions for the product as

$$\lim_{n\to\infty}\frac{P_b(A_n(\omega))}{P_a(A_n(\omega))}$$

for suitable choices of a, b, and ω. We begin with the intuitively clear observation that the events we consider are asymptotically vanishing.

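Proposition 4.2. Let $\omega \in \Omega_p$. Then $P_a(\{\omega\}) = 0$.

Proof of Proposition 4.2. For any $i \in \mathbb{N}_p$ and n ∈ N, since $T_{n,i} \le n$ and $a_i \le |a| - 1$ (each of the p ≥ 2 colors is initially present), we have

$$P_a(\omega_{n+1} = i \mid \mathcal{F}_n) = X_{n,i} \le \frac{a_i + kn}{|a| + kn} \le \frac{|a| - 1 + kn}{|a| + kn} = 1 - \frac{1}{|a| + kn}.$$

Therefore, using $1 - x \le e^{-x}$,

$$P_a(A_n(\omega)) \le \prod_{j=0}^{n-1}\left(1 - \frac{1}{|a| + kj}\right) \le e^{-\sum_{j=0}^{n-1}\frac{1}{|a| + kj}} \to 0, \qquad\text{as } n \to \infty,$$

since the series $\sum_j 1/(|a| + kj)$ diverges. This yields the result by virtue of the identity $\{\omega\} = \bigcap_{n\in\mathbb{N}} A_n(\omega)$.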

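We proceed by defining the set of "admissible sequences."

Definition 4.3. Let $\mathcal{A}_p \subset \Omega_p$ denote the event

$$\left\{\omega \in \Omega_p : T_\infty(\omega) = \lim_{n\to\infty}\frac{T_n(\omega)}{n} \text{ exists and is in } (0,1)^p\right\}.$$

We say that ω is an admissible sequence if $\omega \in \mathcal{A}_p$.

Note that $P_a(\mathcal{A}_p) = 1$ according to Theorem 4.1: indeed, $X_{n,i} - T_{n,i}/n \to 0$, and the Dirichlet vector $X_\infty$ lies in $(0,1)^p$ almost surely. Conversely, for every admissible ω, $X_n(\omega)$ converges to $T_\infty(\omega)$, so $X_\infty(\omega) = T_\infty(\omega)$ is well defined. The following is our main result.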

Theorem 4.4. Let $a, b \in \mathbb{N}^p$ and fix $\omega \in \mathcal{A}_p$. Then

$$\lim_{n\to\infty}\frac{P_a(A_n(\omega))}{P_b(A_n(\omega))} = \frac{f_a(X_\infty(\omega))}{f_b(X_\infty(\omega))},$$

where $X_\infty$ is introduced in the statement of Theorem 4.1.

The proof given below is a direct application of the following Stirling approximation formula for the gamma function (see, for instance, [15, Proposition 2.1]):

$$\Gamma(x) = \int_0^\infty e^{-t}t^{x-1}\,dt \sim \sqrt{2\pi}\,x^{x-\frac12}e^{-x}, \qquad\text{as } x \to \infty. \tag{15}$$

We note that the proof does not rely on Theorem 4.1 or on any of the results of the preceding sections.




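Proof of Theorem 4.4. Since $\Gamma(x+1) = x\Gamma(x)$ for x > 0, we obtain that for any $m \in \mathbb{N}$,

$$\prod_{j=0}^{m}(x + jk) = k^{m+1}\prod_{j=0}^{m}\left(\frac{x}{k} + j\right) = k^{m+1}\,\frac{\Gamma\!\left(\frac{x}{k} + m + 1\right)}{\Gamma\!\left(\frac{x}{k}\right)}.$$

Thus, rewriting (14) in terms of the gamma function (the powers of k cancel, since $\sum_i T_{n,i} = n$) and using (15) together with $T_{n,i}(\omega) \to \infty$ (recall that $\omega \in \mathcal{A}_p$), we obtain, as n → ∞,

$$\begin{aligned}
P_a(A_n(\omega)) &= \frac{\prod_{i=1}^{p}\Gamma\!\left(\frac{a_i}{k} + T_{n,i}\right)\big/\Gamma\!\left(\frac{a_i}{k}\right)}{\Gamma\!\left(\frac{|a|}{k} + n\right)\big/\Gamma\!\left(\frac{|a|}{k}\right)}
= \frac{\Gamma\!\left(\frac{|a|}{k}\right)}{\prod_{i=1}^{p}\Gamma\!\left(\frac{a_i}{k}\right)}\times\frac{\prod_{i=1}^{p}\Gamma\!\left(\frac{a_i}{k} + T_{n,i}\right)}{\Gamma\!\left(\frac{|a|}{k} + n\right)}\\
&\sim (2\pi)^{\frac{p-1}{2}}\,\frac{\Gamma\!\left(\frac{|a|}{k}\right)}{\prod_{i=1}^{p}\Gamma\!\left(\frac{a_i}{k}\right)}\times\frac{\prod_{i=1}^{p}\left(\frac{a_i}{k} + T_{n,i}\right)^{\frac{a_i}{k} + T_{n,i} - \frac12}}{\left(\frac{|a|}{k} + n\right)^{\frac{|a|}{k} + n - \frac12}}\\
&\sim (2\pi)^{\frac{p-1}{2}}\,\frac{\Gamma\!\left(\frac{|a|}{k}\right)}{\prod_{i=1}^{p}\Gamma\!\left(\frac{a_i}{k}\right)}\times\frac{\prod_{i=1}^{p}T_{n,i}^{\frac{a_i}{k} + T_{n,i} - \frac12}\,e^{\frac{a_i}{k}}}{n^{\frac{|a|}{k} + n - \frac12}\,e^{\frac{|a|}{k}}}\\
&= (2\pi)^{\frac{p-1}{2}}\,\frac{\Gamma\!\left(\frac{|a|}{k}\right)}{\prod_{i=1}^{p}\Gamma\!\left(\frac{a_i}{k}\right)}\cdot\frac{\prod_{i=1}^{p}T_{n,i}^{T_{n,i} - \frac12}}{n^{n - \frac12}}\prod_{i=1}^{p}\left(\frac{T_{n,i}}{n}\right)^{\frac{a_i}{k}}.
\end{aligned}$$

Here the exponential factors $e^{-x}$ from (15) cancel between numerator and denominator, and we used $(x+c)^{x+c-\frac12} \sim x^{x+c-\frac12}e^{c}$ as $x \to \infty$. The factor $(2\pi)^{(p-1)/2}\prod_i T_{n,i}^{T_{n,i}-1/2}/n^{n-1/2}$ does not depend on a, so it cancels in the ratio $P_a(A_n(\omega))/P_b(A_n(\omega))$. Since $\omega \in \mathcal{A}_p$, we have $\lim_{n\to\infty}T_{n,i}/n = X_{\infty,i}(\omega)$. Therefore,

$$\lim_{n\to\infty}\frac{P_a(A_n(\omega))}{P_b(A_n(\omega))} = \frac{\Gamma\!\left(\frac{|a|}{k}\right)\big/\prod_{i=1}^{p}\Gamma\!\left(\frac{a_i}{k}\right)}{\Gamma\!\left(\frac{|b|}{k}\right)\big/\prod_{i=1}^{p}\Gamma\!\left(\frac{b_i}{k}\right)}\prod_{i=1}^{p}X_{\infty,i}^{\frac{a_i}{k} - \frac{b_i}{k}} = \frac{f_a(X_\infty(\omega))}{f_b(X_\infty(\omega))}.$$

The proof of the theorem is complete.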
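Theorem 4.4 can be checked numerically on a concrete admissible sequence that is not periodic. In this sketch (all parameters hypothetical) p = 2 and draw j has color 1 exactly when $\lfloor j\theta\rfloor > \lfloor (j-1)\theta\rfloor$ with $\theta = 1/\sqrt{2}$, so that $T_{n,1} = \lfloor n\theta\rfloor$. Since (14) depends on ω only through $T_n(\omega)$, its logarithm is computed with `math.lgamma` (the powers of k cancel), which avoids overflow for large n.

```python
import math


def log_prob(a, k, T):
    """log P_a(A_n(omega)) from (14) via lgamma; depends on omega only through T = T_n(omega)."""
    n, a_tot = sum(T), sum(a)
    value = -(math.lgamma(a_tot / k + n) - math.lgamma(a_tot / k))
    for a_i, t in zip(a, T):
        value += math.lgamma(a_i / k + t) - math.lgamma(a_i / k)
    return value


def log_dirichlet(a, k, x):
    """log f_a(x) for the Dirichlet density of Theorem 4.1."""
    value = math.lgamma(sum(a) / k) - sum(math.lgamma(a_i / k) for a_i in a)
    return value + sum((a_i / k - 1) * math.log(x_i) for a_i, x_i in zip(a, x))


theta = 1 / math.sqrt(2)  # hypothetical limiting frequency of color 1
a, b, k = (1, 3), (2, 5), 2  # hypothetical initial urns (|a| != |b| is allowed)
x_limit = (theta, 1 - theta)
target = math.exp(log_dirichlet(a, k, x_limit) - log_dirichlet(b, k, x_limit))
for n in (10**3, 10**5, 10**7):
    t1 = math.floor(n * theta)
    T = (t1, n - t1)
    ratio = math.exp(log_prob(a, k, T) - log_prob(b, k, T))
    print(n, ratio, target)
```

The printed ratios should approach the target at rate roughly 1/n, in line with the Stirling-based argument above.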

We remark that the result remains true if we assume that the number of balls returned in each of the schemes is different, say k balls under $P_a$ and $k' \ne k$ balls under $P_b$. The proof being identical, we omit the details.
Notice that the Wallis-type products of Definition 2.6 correspond to the special case |a| = |b| of the following corollary to our main result.

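Corollary 4.5. Suppose that a and b are elements of $\mathbb{N}^p$. Then

$$\lim_{n\to\infty}\left(\prod_{j=0}^{n}\frac{|b| + kj}{|a| + kj}\right)\times\left(\prod_{j=0}^{\lfloor n/p\rfloor}\prod_{i=1}^{p}\frac{a_i + kj}{b_i + kj}\right) = p^{\frac{|b|-|a|}{k}}\,\frac{\Gamma\!\left(\frac{|a|}{k}\right)}{\Gamma\!\left(\frac{|b|}{k}\right)}\prod_{i=1}^{p}\frac{\Gamma\!\left(\frac{b_i}{k}\right)}{\Gamma\!\left(\frac{a_i}{k}\right)}. \tag{16}$$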

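Proof of Corollary 4.5. Let $\omega = (\omega_n)_{n\in\mathbb{N}}$ be the p-periodic sequence defined by $\omega_n \equiv n \pmod p$ and $\omega_n \in \{1, \ldots, p\}$, so that

$$\omega = 1, 2, \ldots, p, 1, 2, \ldots, p, 1, 2, \ldots.$$

Clearly, $\omega \in \mathcal{A}_p$. It then follows from (14) and Theorem 4.4 that the left-hand side of (16) is equal to

$$\lim_{n\to\infty}\frac{\prod_{i=1}^{p}\prod_{j=0}^{T_{n,i}(\omega)-1}(a_i + kj)\Big/\prod_{j=0}^{n-1}(|a| + kj)}{\prod_{i=1}^{p}\prod_{j=0}^{T_{n,i}(\omega)-1}(b_i + kj)\Big/\prod_{j=0}^{n-1}(|b| + kj)} = \frac{f_a(X_\infty(\omega))}{f_b(X_\infty(\omega))}$$

(the index ranges in (16) differ from those above by a bounded number of factors, each of which tends to 1), which establishes the claim, since $X_\infty(\omega) = (1/p, \ldots, 1/p)$.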
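Identity (16) can be checked numerically by writing each finite product with `math.lgamma`, using $\prod_{j=0}^{m}(c + kj) = k^{m+1}\Gamma(c/k + m + 1)/\Gamma(c/k)$ from the proof of Theorem 4.4 (the powers of k cancel in (16)). The parameter choices below are hypothetical; the second one, with |a| = |b|, reproduces Wallis's product for π/2.

```python
import math


def log_rising(c, k, m):
    """log of prod_{j=0}^{m} (c + k j) / k^(m+1), i.e. lgamma(c/k + m + 1) - lgamma(c/k)."""
    return math.lgamma(c / k + m + 1) - math.lgamma(c / k)


def lhs_16(a, b, k, n):
    """Left side of (16) at a finite n (before taking the limit)."""
    m = n // len(a)
    value = log_rising(sum(b), k, n) - log_rising(sum(a), k, n)
    for a_i, b_i in zip(a, b):
        value += log_rising(a_i, k, m) - log_rising(b_i, k, m)
    return math.exp(value)


def rhs_16(a, b, k):
    """Right side of (16)."""
    p = len(a)
    value = (sum(b) - sum(a)) / k * math.log(p)
    value += math.lgamma(sum(a) / k) - math.lgamma(sum(b) / k)
    value += sum(math.lgamma(b_i / k) - math.lgamma(a_i / k) for a_i, b_i in zip(a, b))
    return math.exp(value)


print(lhs_16((1, 2), (2, 3), 3, 10**7), rhs_16((1, 2), (2, 3), 3))  # |a| != |b|
print(lhs_16((2, 2), (1, 3), 2, 10**7), rhs_16((2, 2), (1, 3), 2), math.pi / 2)  # Wallis
```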

Admissible sequences are a natural extension of the cyclic ones that show up in Corollary 4.5. Note that while $P_a(\mathcal{A}_p) = 1$, there are only countably many cyclic sequences, and hence, according to Proposition 4.2, their entire collection is a set of probability zero. To get a somewhat less trivial example of a subset of $\mathcal{A}_p$ that is a $P_a$-null set, we can consider, for instance, those $\omega \in \mathcal{A}_p$ that do not satisfy the law of the iterated logarithm for Pólya's urns (an estimate on the rate of convergence of $X_n$ to $X_\infty$) proved in [12, p. 775].
We conclude this section with the observation that $P_a$ and $P_b$ are equivalent measures, namely, they share the same null events (apply Corollary 4.6 below as stated and with the roles of a and b interchanged). Thus, we cannot determine the initial distribution of the urn scheme only by observing its realization.

Corollary 4.6. For $\omega \in \Omega_p$, let

$$Z_n(\omega) = \frac{P_b(A_n(\omega))}{P_a(A_n(\omega))}.$$

Then $Z_n$ converges, both almost surely with respect to $P_a$ and in $L^1(P_a)$, to

$$Z_\infty(\omega) := \frac{f_b(X_\infty(\omega))}{f_a(X_\infty(\omega))}$$

as n → ∞. In particular, $P_b$ is absolutely continuous with respect to $P_a$.

Proof of Corollary 4.6. The almost sure convergence is the content of Theorem 4.4, since $P_a(\mathcal{A}_p) = 1$. Observe next that $E_a(Z_n) = 1$ because $Z_n$ is a Radon–Nikodym derivative of $P_b|_{\mathcal{F}_n}$ with respect to $P_a|_{\mathcal{F}_n}$ (see, for instance, Section 5.3.3 and Appendix A4 in [9] for a superb introduction to the Radon–Nikodym derivative and the Radon–Nikodym theorem within the context of probability theory). In fact, it follows from the definition of $A_n(\omega)$ in (13) that, with the sums taken over all possible values $\omega \in \mathbb{N}_p^n$ of the first n draws,

$$E_a(Z_n) = \sum_{\omega\in\mathbb{N}_p^n} Z_n(\omega)P_a(A_n(\omega)) = \sum_{\omega\in\mathbb{N}_p^n}\frac{P_b(A_n(\omega))}{P_a(A_n(\omega))}P_a(A_n(\omega)) = \sum_{\omega\in\mathbb{N}_p^n}P_b(A_n(\omega)) = 1.$$




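Furthermore, since $X_\infty$ has density $f_a$ under $P_a$, it follows that

$$E_a(Z_\infty) = \int \frac{f_b(x)}{f_a(x)}\,f_a(x)\,dx = 1.$$

Since the $Z_n$ are nonnegative, by Vitali's convergence theorem (see, for instance, [19, p. 165] or [9, Theorem 5.5.2]), the almost sure convergence, along with the convergence of the expected values, implies convergence in $L^1(P_a)$.
It remains to show that $P_b$ is absolutely continuous with respect to $P_a$. Let $A \in \bigcup_{n\in\mathbb{N}}\mathcal{F}_n$. Then $P_b(A) = E_a(\mathbf{1}_A Z_n)$ for all sufficiently large n. Since

$$\left|E_a(\mathbf{1}_A Z_n) - E_a(\mathbf{1}_A Z_\infty)\right| \le E_a(\mathbf{1}_A|Z_n - Z_\infty|) \le E_a(|Z_n - Z_\infty|) \to 0 \qquad\text{as } n \to \infty,$$

it follows that

$$P_b(A) = E_a(\mathbf{1}_A Z_\infty). \tag{17}$$

Note that the mapping $A \mapsto E_a(\mathbf{1}_A Z_\infty)$ is a probability measure on $\mathcal{F}_\infty$. Since $\bigcup_{n\in\mathbb{N}}\mathcal{F}_n$ is an algebra and the events $A \in \mathcal{F}_\infty$ satisfying (17) form a monotone class, it follows from the monotone class theorem [9, Theorem 6.1.3] that (17) holds for all $A \in \sigma\left(\bigcup_{n\in\mathbb{N}}\mathcal{F}_n\right) = \mathcal{F}_\infty$. In particular, $P_a(A) = 0$ implies $P_b(A) = 0$.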

As in the case of Theorem 4.4, the result continues to hold when the number of balls
returned to each urn is different, the proof being identical.
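The identity $E_a(Z_n) = 1$, and more generally $E_a(\mathbf{1}_A Z_n) = P_b(A)$ for events A determined by the first n draws, can be confirmed exactly by enumerating all color sequences of length n. The urns `a = (1, 2)` and `b = (3, 1)` with k = 2 are hypothetical choices, and `expected_Z` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product


def cylinder_prob(a, k, omega):
    """P_a(A_n(omega)) via (14), in exact rational arithmetic."""
    numerator, denominator = 1, 1
    for j in range(len(omega)):
        denominator *= sum(a) + k * j
    for i, a_i in enumerate(a, start=1):
        for j in range(omega.count(i)):
            numerator *= a_i + k * j
    return Fraction(numerator, denominator)


def expected_Z(a, b, k, n, event=lambda omega: True):
    """E_a(1_A Z_n) for an event A determined by the first n draws (event is a predicate)."""
    total = Fraction(0)
    for omega in product(range(1, len(a) + 1), repeat=n):
        if event(omega):
            z_n = cylinder_prob(b, k, omega) / cylinder_prob(a, k, omega)
            total += z_n * cylinder_prob(a, k, omega)
    return total


a, b, k = (1, 2), (3, 1), 2  # hypothetical urns
print(expected_Z(a, b, k, 6))                          # E_a(Z_6)
print(expected_Z(a, b, k, 6, lambda w: w[0] == 1))     # P_b(first draw has color 1)
```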

ACKNOWLEDGMENTS. We are grateful to the anonymous referees and the editor for the careful reading
of this paper and many helpful remarks and suggestions. We also wish to thank Jonathan Sondow and Huang
Yi for their valuable and encouraging comments on a preliminary version of the paper. Iddo Ben-Ari is grateful
for support from the Simons Foundation grant #208728.

REFERENCES

1. J.-P. Allouche, J. Sondow, Infinite products with strongly B-multiplicative exponents, Ann. Univ. Sci.
Budapest. Sect. Comput. 28 (2008) 35–53.
2. T. Amdeberhan, O. Espinosa, V. H. Moll, A. Straub, Wallis-Ramanujan-Schur-Feynman, Amer. Math.
Monthly 117 (2010) 618–632.
3. Pi: A Source Book. Second edition. Edited by L. Berggren, J. Borwein, and P. Borwein. Springer, New
York, 2000.
4. W. A. Beyer, J. D. Louck, D. Zeilberger, Math bite: A generalization of a curiosity that Feynman remembered all his life, Math. Mag. 69 (1996) 43–44.
5. D. Blackwell, J. B. MacQueen, Ferguson distributions via Pólya urn schemes, Ann. Stat. 1 (1973) 353–
355.
6. J. M. Borwein, I. J. Zucker, Fast evaluation of the gamma function for small rational fractions using
complete elliptic integrals of the first kind, IMA J. Numer. Anal. 12 (1992) 519–526.
7. E. Catalan, Sur la constante d’Euler et la fonction de Binet, C. R. Acad. Sci. Paris Sér. I Math. 77 (1873)
198–201.
8. J. B. Conway, Functions of One Complex Variable I. Second edition. Springer, New York, 1978.
9. R. Durrett, Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Fourth edition. Cambridge University Press, 2010.
10. S. Gun, M. Ram Murty, P. Rath, Linear independence of digamma function and a variant of a conjecture
of Rohrlich, J. Number Theory 129 (2009) 1858–1873.
11. Collected Papers of Srinivasa Ramanujan. Edited by G. H. Hardy, P. V. Seshu Aiyar, and B. M. Wilson.
AMS Chelsea Publishing, Providence, RI, 2000.
12. C. C. Heyde, On central limit and iterated logarithm supplements to the martingale convergence, J. Appl.
Probab. 14 (1977) 758–775.



13. M. Kac, Statistical Independence in Probability, Analysis and Number Theory. Carus Mathematical
Monographs, Vol. 12, The Mathematical Association of America, Washington, DC, 1959.
14. H. M. Mahmoud, Pólya Urn Models. CRC Press, Boca Raton, FL, 2009.
15. G. Nemes, New asymptotic expansion for the gamma function, Archiv der Mathematik 95 (2010) 161–
169.
16. T. J. Osler, The union of Vieta’s and Wallis’s products for π , Amer. Math. Monthly 106 (1999) 774–776.
17. G. Pólya, Mathematics and Plausible Reasoning. Vol. 1, Induction and Analogy in Mathematics, Princeton University Press, Princeton, 1990.
18. J. Sándor, L. Tóth, A remark on the gamma function, Elem. Math. 44 (1989) 73–76.
19. R. L. Schilling, Measures, Integrals and Martingales. Cambridge University Press, Cambridge, 2006.
20. J. Sondow, H. Yi, New Wallis- and Catalan-type infinite products for π, e, and $\sqrt{2+\sqrt{2}}$, Amer. Math. Monthly 117 (2010) 912–917.
21. R. Vidūnas, Expressions for values of the gamma function, Kyushu J. Math. 59 (2005) 267–283.
22. M. Waldschmidt, Recent Diophantine results on zeta values: a survey (2009), available at http://www.
math.jussieu.fr/~miw/articles/pdf/ZetaValuesRIMS2009.pdf.
23. J. Wästlund, An elementary proof of the Wallis product formula for π , Amer. Math. Monthly 114 (2007)
914–917.
24. E. T. Whittaker, G. N. Watson, A Course of Modern Analysis. Cambridge University Press, Cambridge,
1978.
25. A. M. Yaglom, I. M. Yaglom, An elementary derivation of the formulas of Wallis, Leibnitz and Euler for
the number π (in Russian), Uspekhi Mat. Nauk 57 (1953) 181–187.

IDDO BEN-ARI received a Ph.D. in mathematics from the Technion, Israel Institute of Technology, in 2005.
He is an assistant professor with the department of mathematics, University of Connecticut.
Department of Mathematics, University of Connecticut, Storrs, CT 06269-3009.
iddo.ben-ari@uconn.edu

DIANA HAY was born and raised in the idyllic Californian coastal town Santa Cruz. Diana is a banana slug,
having received her first undergraduate degree in physical anthropology from the University of California,
Santa Cruz. After her graduation, she assisted with underwater whale photography expeditions and worked
as a freelance web developer. She later returned to school to receive her second baccalaureate in mathematics
from California State University, Monterey Bay, and is now a Ph.D. student in mathematics at Iowa State
University.
Department of Mathematics, Iowa State University, Ames, IA 50011
dhay@iastate.edu

ALEXANDER ROITERSHTEIN was born in St. Petersburg, Russia. He received the Ph.D. degree in applied mathematics in 2004 from Technion IIT, Haifa, Israel. Currently, he is an Assistant Professor with the Department of Mathematics, Iowa State University. His research interests are in probability theory, stochastic processes and their applications.
Department of Mathematics, Iowa State University, Ames, IA 50011
roiterst@iastate.edu




NOTES
Edited by Sergei Tabachnikov

The Primes that Euclid Forgot


Paul Pollack and Enrique Treviño

Abstract. Let $q_1 = 2$. Supposing that we have defined $q_j$ for all $1 \le j \le k$, let $q_{k+1}$ be a prime factor of $1 + \prod_{j=1}^{k} q_j$. As was shown by Euclid over two thousand years ago, $q_1, q_2, q_3, \ldots$ is then an infinite sequence of distinct primes. The sequence $\{q_i\}$ is not unique, since there is flexibility in the choice of the prime $q_{k+1}$ dividing $1 + \prod_{j=1}^{k} q_j$. Mullin suggested studying the two sequences formed by (1) always taking $q_{k+1}$ as small as possible, and (2) always taking $q_{k+1}$ as large as possible. For each of these sequences, he asked whether every prime eventually appears. Recently, Booker showed that the second sequence omits infinitely many primes. We give a completely elementary proof of Booker's result, suitable for presentation in a first course in number theory.

1. INTRODUCTION. The following is one version of Euclid's proof that there are infinitely many primes. Start with $q_1 = 2$. Supposing that $q_j$ has been defined for $1 \le j \le k$, continue the sequence by choosing a prime $q_{k+1}$ for which

$$q_{k+1} \mid 1 + \prod_{j=1}^{k} q_j. \tag{1}$$

Then "at the end of the day", the list $q_1, q_2, q_3, \ldots$ is an infinite sequence of distinct prime numbers.
Of course, the sequence {qi } obtained in this way is not unique, since the relation
(1) is often satisfied by several choices of the prime qk+1 . Mullin [4] suggested two
natural ways of dispensing with the ambiguity. First, we could agree that at each step,
we always choose the smallest prime qk+1 satisfying (1); this leads to the sequence
(numbered A000945 in the Online Encyclopedia of Integer Sequences, or OEIS [6])

2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471, . . . . (2)

Alternatively, we might always choose the largest possible qk+1 , resulting in the se-
quence (A000946 in the OEIS)

2, 3, 7, 43, 139, 50207, 340999, 2365347734339, 4680225641471129, . . . . (3)

We call (2) and (3) the first and second Euclid–Mullin sequences, respectively. For each of (2) and (3), Mullin raised the question of whether every prime eventually appears. Shanks [5] conjectured on probabilistic grounds (bolstered by computations of Wagstaff; cf. [7]) that every prime is eventually reached by (2), but essentially nothing about the first Euclid–Mullin sequence has been rigorously established. The second Euclid–Mullin sequence was investigated by Cox and van der Poorten [2].
http://dx.doi.org/10.4169/amer.math.monthly.121.05.433
MSC: Primary 11A41, Secondary 11A15

May 2014] NOTES 433


They showed that all of 5, 11, 13, 17, 19, 23, 29, 31, 37, 41, and 47 are missing and
conjectured that in fact infinitely many primes fail to appear in (3). The Cox–van der
Poorten conjecture was very recently confirmed by Booker [1].
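The first few terms of (2) and (3) can be reproduced with a short script. This sketch factors $1 + q_1\cdots q_k$ by trial division below 1000 followed by Pollard's rho method with a Miller–Rabin primality test; these factoring choices, the fixed random seed, and all function names are illustrative assumptions, not the method used in the cited computations. Later terms quickly require factoring very large numbers, so only short prefixes are practical.

```python
import math
import random


def is_probable_prime(n):
    """Miller-Rabin with the first 12 prime bases (deterministic for n < 3 * 10**23)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for q in bases:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def pollard_rho(n, rng):
    """Nontrivial factor of a composite n without small prime factors (Floyd cycle detection)."""
    while True:
        c = rng.randrange(1, n)
        x = y = rng.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            d = math.gcd(x - y, n)
        if d != n:
            return d


def prime_factors(n, rng):
    """Set of distinct prime factors of n >= 1: trial division below 1000, then Pollard's rho."""
    factors = set()
    for q in range(2, 1000):
        while n % q == 0:
            factors.add(q)
            n //= q
    stack = [n] if n > 1 else []
    while stack:
        m = stack.pop()
        if is_probable_prime(m):
            factors.add(m)
        else:
            d = pollard_rho(m, rng)
            stack.extend((d, m // d))
    return factors


def euclid_mullin(num_terms, pick):
    """First num_terms terms, taking q_{k+1} = pick(prime factors of 1 + q_1 ... q_k)."""
    rng = random.Random(0)
    terms, product = [], 1
    for _ in range(num_terms):
        q = pick(prime_factors(product + 1, rng))
        terms.append(q)
        product *= q
    return terms


print(euclid_mullin(8, min))  # first Euclid-Mullin sequence, cf. (2)
print(euclid_mullin(7, max))  # second Euclid-Mullin sequence, cf. (3)
```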

Theorem (Booker). The second Euclid–Mullin sequence omits infinitely many primes.

There are two key ingredients in Booker’s proof. The first is quadratic reciprocity
for the Jacobi symbol, which is a staple of many first courses in number theory. In
addition to this elementary theorem, Booker also makes use of some fairly intricate
results in analytic number theory, specifically work of Burgess from the 1960s on
upper bounds for short character sums.
A simple statement calls out for a simple proof! In this note, we present a variant of Booker's proof, where all of the analytic number theory is replaced by very simple-to-prove statements about the distribution of squares and nonsquares modulo a prime. There is a cost for this, certainly; our quantitative bounds are weaker than what follows from Burgess's estimates. However, we believe that, given how simple Booker's theorem is to state, there is some value in writing out a proof that is accessible to as wide an audience as possible.

Notation. Throughout the paper, we reserve the letter p for a prime variable. We use $\left(\frac{a}{m}\right)$ for the usual Legendre–Jacobi symbol.

2. PRELIMINARIES ON THE DISTRIBUTION OF SQUARES AND NONSQUARES MODULO A PRIME. Recall that an integer a not divisible by p is called a quadratic residue modulo p if the congruence $x^2 \equiv a \pmod p$ is solvable, and a quadratic nonresidue otherwise. We let $\ell(\square, p)$ denote the length of the longest run $a+1, a+2, \ldots, a+\ell$ of consecutive quadratic residues mod p, and we let $\ell(\boxtimes, p)$ denote the length of the longest run of consecutive quadratic nonresidues. If we wish integers congruent to 0 modulo p to be allowed in the run, we will write $\ell_0$ in place of $\ell$ in both cases.
In this section, we show that all of $\ell(\square, p)$, $\ell(\boxtimes, p)$, $\ell_0(\square, p)$, and $\ell_0(\boxtimes, p)$ are smaller than $2\sqrt{p}$. As a prelude, we prove an upper bound on the smallest positive quadratic nonresidue modulo p, which we denote by $n_2(p)$.

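Lemma 1. Let p be an odd prime. Then $n_2(p) < \frac12 + \sqrt{p}$.

Proof. Let $n = n_2(p)$; note that $2 \le n < p$, so n does not divide p. Since $p < n\lceil p/n\rceil < p + n$, the least nonnegative residue of $n\lceil p/n\rceil$ modulo p lies in the open interval (0, n). By the minimality of n, every integer in (0, n) is a quadratic residue, so $n\lceil p/n\rceil$ is a quadratic residue modulo p. Since n is a quadratic nonresidue, the ratio $n\lceil p/n\rceil/n = \lceil p/n\rceil$ is also a nonresidue. So by the minimality of n, it must be that $1 + p/n > \lceil p/n\rceil \ge n$, that is, $p > n^2 - n$. Hence,

$$\left(n - \frac12\right)^2 < n^2 - n + 1 \le p, \qquad\text{and so}\qquad n < \frac12 + \sqrt{p}.$$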
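Lemma 1 is easy to test empirically. The sketch below computes $n_2(p)$ by Euler's criterion for all odd primes below an arbitrary cutoff (here 20000) and reports the largest value of $n_2(p) - \sqrt{p}$, which the lemma says stays below 1/2. The helper names are illustrative.

```python
import math


def odd_primes(limit):
    """Odd primes below limit, via a simple sieve of Eratosthenes."""
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for m in range(2, math.isqrt(limit - 1) + 1):
        if sieve[m]:
            for multiple in range(m * m, limit, m):
                sieve[multiple] = False
    return [m for m in range(3, limit) if sieve[m]]


def least_nonresidue(p):
    """n_2(p): the least n >= 2 with n^((p-1)/2) = -1 (mod p), by Euler's criterion."""
    n = 2
    while pow(n, (p - 1) // 2, p) != p - 1:
        n += 1
    return n


gap, prime = max((least_nonresidue(p) - math.sqrt(p), p) for p in odd_primes(20_000))
print(gap, prime)  # Lemma 1 predicts gap < 1/2
```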

Lemma 2. Let $1 \le n < p$ be a quadratic nonresidue modulo p. Then

$$\ell(\square, p) \le \max\{p/n,\ n - 1\}.$$

Proof. Let $\ell = \ell(\square, p)$, and choose $a \in \mathbb{Z}$ so that all of $a+1, a+2, \ldots, a+\ell$ are quadratic residues modulo p. Multiplying by n, we obtain a sequence $na+n, na+2n, \ldots, na+\ell n$ of quadratic nonresidues modulo p, each of which differs from the previous one by n. Suppose now that $\ell > p/n$, so that $\lceil p/n\rceil \le \ell$. In this case, every quadratic residue modulo p can be considered mod p as being walled inside one of the intervals $(na + jn, na + (j+1)n)$ with $1 \le j < \lceil p/n\rceil$, or inside $(na + \lceil p/n\rceil n, na + n + p)$; these intervals and their endpoints cover a full period of length p. Thus, any run of quadratic residues has length bounded by n − 1. So either $\ell \le p/n$ or $\ell \le n - 1$, exactly as claimed in the lemma.
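A brute-force check of Lemma 2 for all small primes and all nonresidues n is sketched below (the cutoff 1000 and the helper names are arbitrary). Runs counted by $\ell(\square, p)$ cannot contain a multiple of p, so it suffices to scan the residues 1, ..., p − 1; the comparison $\ell \le p/n$ is done in integers as $\ell n \le p$.

```python
import math


def odd_primes(limit):
    """Odd primes below limit, via a simple sieve of Eratosthenes."""
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for m in range(2, math.isqrt(limit - 1) + 1):
        if sieve[m]:
            for multiple in range(m * m, limit, m):
                sieve[multiple] = False
    return [m for m in range(3, limit) if sieve[m]]


def lemma2_holds(p):
    """Check l(square, p) <= max(p/n, n - 1) for every quadratic nonresidue n in [1, p)."""
    squares = {x * x % p for x in range(1, p)}
    ell = current = 0
    for r in range(1, p):  # a run of residues cannot cross a multiple of p
        current = current + 1 if r in squares else 0
        ell = max(ell, current)
    return all(ell * n <= p or ell <= n - 1
               for n in range(1, p) if n not in squares)


print(all(lemma2_holds(p) for p in odd_primes(1000)))
```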

We can now establish an upper bound on the length of any sequence of consecutive squares modulo p.

Proposition 3. If p is an odd prime, then $\ell_0(\square, p) < 2\sqrt{p}$.

Proof. We first rule out long runs of squares containing a multiple of p. Suppose first that −1 is not a square modulo p. Then any such run of squares can be viewed, modulo p, as a subset of the interval $[0, n_2(p))$, and thus has length at most $n_2(p)$. On the other hand, if −1 is a square modulo p, then m and −m are either both squares or both nonsquares, so such a run can be viewed as a subset of $(-n_2(p), n_2(p))$, and so has length at most $2n_2(p) - 1$. Consequently,

$$\ell_0(\square, p) \le \max\{2n_2(p) - 1,\ \ell(\square, p)\}.$$

By Lemma 1, we have $2n_2(p) - 1 < 2\sqrt{p}$. Thus, it suffices to show that $\ell(\square, p) < 2\sqrt{p}$. If there is any quadratic nonresidue n in the half-open interval $(\frac12\sqrt{p}, 2\sqrt{p}]$, then Lemma 2 gives $\ell(\square, p) \le \max\{p/n, n - 1\} < 2\sqrt{p}$. So let us suppose otherwise. By Lemma 1, $n_2(p) < \frac12 + \sqrt{p} < 2\sqrt{p}$, and so $n_2(p) \le \frac12\sqrt{p}$. With $n := n_2(p)$, each of the integers $k^2 n$ with $1 \le k < p$ is a quadratic nonresidue mod p. If we pick k as large as possible with

$$k^2 n \le \frac12\sqrt{p}$$

(such a k ≥ 1 exists since $n \le \frac12\sqrt{p}$), then the lack of nonresidues in $(\frac12\sqrt{p}, 2\sqrt{p}]$ implies that

$$(k+1)^2 n > 2\sqrt{p}.$$

Subtracting the first inequality from the second yields $(2k+1)n > \frac32\sqrt{p} \ge 3k^2 n$, and thus $2k + 1 > 3k^2$. But this inequality is false for each k ≥ 1. This proves that $\ell(\square, p) < 2\sqrt{p}$ and completes the proof of the proposition.

It is easier to rule out long runs of nonsquares mod p.

Proposition 4. For each odd prime p, we have $\ell_0(\boxtimes, p) < 2\sqrt{p}$.

Proof. Every nonresidue or multiple of p can be considered mod p as being walled within the interval $(j^2, (j+1)^2)$ for some $1 \le j < \lfloor\sqrt{p}\rfloor$, or within the interval $(\lfloor\sqrt{p}\rfloor^2, p+1)$. The number of integers in an interval of the first kind is $2j < 2\sqrt{p}$, while the number of integers in $(\lfloor\sqrt{p}\rfloor^2, p+1)$ is $p - \lfloor\sqrt{p}\rfloor^2 < p - (\sqrt{p} - 1)^2 < 2\sqrt{p}$.
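Propositions 3 and 4 (together with the bounds on $\ell(\square, p)$ and $\ell(\boxtimes, p)$ that they imply) can be confirmed by brute force for small primes. In this sketch the run lengths are computed cyclically over two full periods, so runs through a multiple of p are counted when 0 is allowed; the cutoff 2000 and all helper names are illustrative. The test $\ell < 2\sqrt{p}$ is done exactly as $\ell^2 < 4p$.

```python
import math


def odd_primes(limit):
    """Odd primes below limit, via a simple sieve of Eratosthenes."""
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for m in range(2, math.isqrt(limit - 1) + 1):
        if sieve[m]:
            for multiple in range(m * m, limit, m):
                sieve[multiple] = False
    return [m for m in range(3, limit) if sieve[m]]


def longest_run(flags):
    """Longest cyclic run of True in flags (indexed by residues 0..p-1); two periods are scanned."""
    p = len(flags)
    best = current = 0
    for t in range(2 * p):
        current = current + 1 if flags[t % p] else 0
        best = max(best, current)
    return min(best, p)


def run_lengths(p):
    """Return (l(square), l(nonsquare), l0(square), l0(nonsquare)) for the odd prime p."""
    squares = {x * x % p for x in range(1, p)}
    is_sq = [r in squares for r in range(p)]  # r = 0 is never in squares
    is_nsq = [r != 0 and r not in squares for r in range(p)]
    return (longest_run(is_sq), longest_run(is_nsq),
            longest_run([r == 0 or s for r, s in enumerate(is_sq)]),
            longest_run([r == 0 or s for r, s in enumerate(is_nsq)]))


violations = [p for p in odd_primes(2000) if max(run_lengths(p))**2 >= 4 * p]
print(violations)  # primes where some run length reaches 2*sqrt(p)
```

For example, for p = 5 the residues are {1, 4}, and the run 4, 0, 1 gives $\ell_0(\square, 5) = 3 < 2\sqrt{5}$.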

Remarks. Much of this section is adapted from the charming book of Gelfond and
Linnik [3]. Lemma 1 and its proof appear, with trivial changes, as that text’s Theorem
9.3.1, while the proof of Proposition 4 comes from the discussion at the bottom of
p. 179. The only novelty is our proof of Proposition 3. Gelfond and Linnik state that
result as Theorem 9.3.2, but it seems that their proof is incomplete.



3. PROOF OF THE MAIN THEOREM. Throughout this section, the second
Euclid–Mullin sequence is denoted q1 , q2 , q3 , . . . . The main theorem is contained in
the following proposition.

Proposition 5. Let $Q_1, Q_2, \ldots, Q_r$ be the smallest r primes omitted from the second Euclid–Mullin sequence, where r ≥ 0. Then there is another omitted prime smaller than

$$12^2\left(\prod_{i=1}^{r} Q_i\right)^2. \tag{4}$$

Remark. Using the results of Burgess, Booker showed that the exponent 2 in (4) can be replaced with any real number larger than $\frac{1}{4\sqrt{e}-1} = 0.178734\ldots$, provided that $12^2$ is also replaced by a possibly larger constant.
Proof. Let $X = 12^2\prod_{i=1}^{r} Q_i^2$. Let us suppose, for the sake of contradiction, that every prime p ≤ X except $Q_1, \ldots, Q_r$ appears in the second Euclid–Mullin sequence. Let p be the prime in [2, X] that is last to appear in the sequence $\{q_i\}$, and say p appears as the nth term $q_n$. Then p is the largest prime dividing $1 + q_1\cdots q_{n-1}$. Moreover, since each prime smaller than p that is not a $Q_i$ is one of $q_1, \ldots, q_{n-1}$, the only other possible prime factors of $1 + q_1\cdots q_{n-1}$ are $Q_1, \ldots, Q_r$. Thus, we must have

$$1 + q_1\cdots q_{n-1} = Q_1^{e_1}Q_2^{e_2}\cdots Q_r^{e_r}\,p^{e}$$

for some exponents $e_1, \ldots, e_r \ge 0$ and $e \ge 1$.


We claim it is possible to choose a natural number d ≤ X satisfying both of the congruences

$$d \equiv 1 \pmod 4, \qquad d \equiv -1 \pmod{Q_1\cdots Q_r}, \tag{5}$$

as well as

$$\left(\frac{d}{p}\right) = \left(\frac{-1}{p}\right). \tag{6}$$

Suppose for the moment that this has been proved. Since d ≤ X and d is coprime to $Q_1\cdots Q_r\,p$, every prime dividing d is among the primes $q_1, \ldots, q_{n-1}$. So if we write $d = d_0 d_1^2$, where $d_0$ is squarefree, then $d_0 \mid q_1\cdots q_{n-1}$. Hence,

$$\left(\frac{d}{1 + q_1\cdots q_{n-1}}\right) = \left(\frac{1 + q_1\cdots q_{n-1}}{d}\right) = \left(\frac{1 + q_1\cdots q_{n-1}}{d_0}\right)\left(\frac{1 + q_1\cdots q_{n-1}}{d_1}\right)^2 = \left(\frac{1}{d_0}\right)\cdot 1 = 1.$$

(The very first equality uses quadratic reciprocity for the Jacobi symbol, together with $d \equiv 1 \pmod 4$.) On the other hand, we have $\left(\frac{d}{Q_i}\right) = \left(\frac{-1}{Q_i}\right)$ for each $i = 1, 2, \ldots, r$ and $\left(\frac{d}{p}\right) = \left(\frac{-1}{p}\right)$, so that

$$\left(\frac{d}{1 + q_1\cdots q_{n-1}}\right) = \prod_{i=1}^{r}\left(\frac{d}{Q_i}\right)^{e_i}\cdot\left(\frac{d}{p}\right)^{e} = \prod_{i=1}^{r}\left(\frac{-1}{Q_i}\right)^{e_i}\cdot\left(\frac{-1}{p}\right)^{e} = \left(\frac{-1}{1 + q_1\cdots q_{n-1}}\right) = -1,$$

using in the last step that $1 + q_1\cdots q_{n-1} = 1 + 2\prod_{1<i<n} q_i \equiv 3 \pmod 4$. This is a contradiction.
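It remains to establish the existence of a d ≤ X satisfying (5) and (6). Since $2 = q_1$ appears in the sequence, the $Q_i$ are odd, so the conditions (5) are satisfied by every integer $d \equiv A \pmod M$, where $A := 2Q_1\cdots Q_r - 1$ and $M := 4Q_1\cdots Q_r$. To obtain (6), we look for a small nonnegative integer k with $\left(\frac{Mk+A}{p}\right) = \left(\frac{-1}{p}\right)$. The prime p is odd (2 appears first, so it is not the last prime in [2, X] to appear) and differs from every $Q_i$, so M is invertible modulo p. Equivalently, then, fixing M′ satisfying $MM' \equiv 1 \pmod p$, we seek a nonnegative integer k with

$$\left(\frac{k + AM'}{p}\right) = \left(\frac{-M'}{p}\right).$$

By the results of Section 2, no run of consecutive integers longer than $\ell_0(\square, p)$ avoids the nonresidues, and no run longer than $\ell_0(\boxtimes, p)$ avoids the nonzero residues, so we can find such a $k \le \max\{\ell_0(\square, p), \ell_0(\boxtimes, p)\} < 2\sqrt{p}$. Then the corresponding d satisfies

$$0 < d = Mk + A < 2M\sqrt{p} + M < 3M\sqrt{p} \le 3M\sqrt{X}.$$

Since $3M = 12Q_1\cdots Q_r = \sqrt{X}$, we find that d < X. This completes the proof.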
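The construction of d in the last step can be carried out explicitly. The sketch below searches k = 0, 1, 2, ... for the first $d = Mk + A$ with $\left(\frac{d}{p}\right) = \left(\frac{-1}{p}\right)$, computing Legendre symbols by Euler's criterion. The list `Q = (5, 11, 13)` and the sample primes p are hypothetical inputs used only for illustration, and the function names are not from the source.

```python
import math


def odd_primes(limit):
    """Odd primes below limit, via a simple sieve of Eratosthenes."""
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for m in range(2, math.isqrt(limit - 1) + 1):
        if sieve[m]:
            for multiple in range(m * m, limit, m):
                sieve[multiple] = False
    return [m for m in range(3, limit) if sieve[m]]


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion: 1, -1, or 0."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def choose_d(Q, p):
    """Least d = M*k + A (k >= 0) with (d/p) = (-1/p); such d also satisfies the congruences (5)."""
    q_prod = math.prod(Q)
    A, M = 2 * q_prod - 1, 4 * q_prod
    target = legendre(-1, p)
    k = 0
    while legendre(M * k + A, p) != target:
        k += 1
    return M * k + A, k


Q = (5, 11, 13)                # hypothetical list of omitted primes Q_1, ..., Q_r
X = 144 * math.prod(Q)**2      # the bound X = 12^2 (Q_1 ... Q_r)^2 from the proof
for p in (3, 7, 101, 2003):
    d, k = choose_d(Q, p)
    print(p, k, d, d < X)
```

By the argument above, the search always stops with $k < 2\sqrt{p}$, so d stays below X.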

ACKNOWLEDGMENTS. We are grateful to Carl Pomerance and the anonymous referee for their thoughtful
suggestions. In particular, the current form of Proposition 3 is due to the referee; our original result was slightly
weaker. We also thank Yuliia Glushchenko for help with the Russian original of [3].

REFERENCES

1. A. Booker, On Mullin’s second sequence of primes, Integers 12A (2012), available at http://www.
integers-ejcnt.org/vol12a.html.
2. C. D. Cox, A. J. van der Poorten, On a sequence of prime numbers, J. Austral. Math. Soc. 8 (1968) 571–
574.
3. A. O. Gel′fond, Yu. V. Linnik, Elementary Methods in the Analytic Theory of Numbers. Pergamon Press, Oxford, 1966.
4. A. A. Mullin, Recursive function theory (a modern look at a Euclidean idea), Bull. Amer. Math. Soc. 69
(1963) 737.
5. D. Shanks, Euclid’s primes, Bull. Inst. Combin. Appl. 1 (1991) 33–36.
6. N. J. A. Sloane, The On-Line Encyclopedia of Integer Sequences, published electronically at http://
oeis.org.
7. S. S. Wagstaff, Jr., Computing Euclid’s primes, Bull. Inst. Combin. Appl. 8 (1993) 23–32.

Department of Mathematics, University of Georgia, Athens, GA 30602


pollack@uga.edu

Department of Mathematics and Computer Science, Lake Forest College, Lake Forest, IL 60045
trevino@mx.lakeforest.edu



Solution of Sondow’s Problem:
A Synthetic Proof of the Tangency Property
of the Parbelos
Emmanuel Tsukerman

Abstract. In a recent paper titled The parbelos, a parabolic analog of the arbelos, Sondow asks for a synthetic proof of the tangency property of the parbelos. In this paper, we resolve this question by introducing a converse to Lambert's Theorem on the parabola. In the process, we prove some new properties of the parbelos.

1. INTRODUCTION. In a recent paper, Jonathan Sondow introduced the parbelos, a parabolic analogue of the arbelos [3]. One of the beautiful properties of the parbelos is that the tangents at the cusps of the parbelos form a rectangle, and that the diagonal of the rectangle opposite the cusp is tangent to the upper parabola. Moreover, the tangency point lies on the bisector of the angle at the cusp. Sondow asks for a synthetic proof of these two properties of the tangent rectangle of the parbelos (Figure 1.2), which he proves by analytic means. In this paper, we present such a proof. We do so by introducing a converse to the following theorem of Lambert: the circumcircle of a triangle formed by three tangent lines to a parabola passes through the focus of the parabola. In the process of proving Sondow's tangency property, we discover some new properties of the parbelos.

Figure 1.1. The arbelos and the parbelos.

2. PRELIMINARIES. The classical Simson–Wallace Theorem is a useful tool in understanding the parabola.

Theorem 1 (Simson–Wallace Theorem). Given a triangle △ABC and a point P in the plane, the orthogonal projections of P onto the (extended) sides of the triangle, also called pedal points, are collinear if and only if P is on the circumcircle of △ABC [1].

In general, the pedal curve of a curve with respect to a point is the locus of the orthogonal projections of the point onto the tangents of the curve. In a sense discussed in [5], the parabola may be viewed as a polygon with infinitely many vertices, which satisfies the following Simson-type property: it is the unique curve whose pedal curve with respect to some point is a line. The point turns out to be the focus F of the parabola, and the line is the supporting line at its vertex (the tangent at the vertex), which we will denote by Λ.
http://dx.doi.org/10.4169/amer.math.monthly.121.05.438
MSC: Primary 51M04

438 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121



Figure 1.2. Sondow’s Tangency Property: the diagonal T1T3 of the tangent rectangle C2T1T2T3 is tangent to the outer parabola. Moreover, the tangency point is the intersection of the angle bisector of cusp C2 with the outer parabola.

Lemma 2. A line l is tangent to the parabola if and only if the orthogonal projection
of the focus F onto l lies on the supporting line Λ.

Proof. For a proof of the “only if” statement, we refer the reader to [2] and [5]. For the “if” statement, let P be the orthogonal projection of F onto l and assume that P ∈ Λ. If P is the vertex of G, then clearly l = Λ and we are done. So assume otherwise. Since Λ has no points inside the parabola, there exists a tangent line l̃ to G, not equal to Λ, that passes through P. The “only if” part implies that the orthogonal projection of F onto l̃ is on Λ; since l̃ meets Λ only at P, this projection is P. Thus both l and l̃ are the line through P perpendicular to FP, and it follows that l = l̃.

Lambert’s Theorem on the parabola states that the circumcircle of a triangle formed by three tangents to the parabola always passes through the focus. Using Lemma 2, we can prove the statement quite easily. Let three tangents l1, l2, l3 to the parabola be given. Then the orthogonal projections of F onto l1, l2, l3 all lie on Λ and are therefore collinear. By the Simson–Wallace Theorem, F lies on the circumcircle of the triangle formed by l1, l2, l3. We now introduce a converse to Lambert’s Theorem.

Figure 2.1. Proof of Lemma 2.
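A quick numerical check of Lambert’s theorem, and of the pedal property used in Lemma 2, may help make the statements concrete. The Python sketch below is an illustration only and is not part of the note. It assumes the coordinate model y = x²/(4f) with focus F = (0, f), so that Λ is the x-axis. Every function name and sample value in it is hypothetical.

```python
import math

def tangent_line(t, f):
    """Tangent to the parabola y = x^2/(4f) at abscissa t, as (m, k) meaning y = m*x + k."""
    return t / (2 * f), -t * t / (4 * f)

def intersect(l1, l2):
    """Intersection point of two non-parallel lines given as (m, k)."""
    (m1, k1), (m2, k2) = l1, l2
    x = (k2 - k1) / (m1 - m2)
    return x, m1 * x + k1

def foot_of_perpendicular(point, line):
    """Orthogonal projection of `point` onto the line y = m*x + k."""
    m, k = line
    px, py = point
    x = (px + m * (py - k)) / (1 + m * m)
    return x, m * x + k

def circumcircle(a, b, c):
    """Center and radius of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def check_lambert(f, ts):
    """Return (max |y| over the pedal points of F, |dist(center, F) - radius|)."""
    focus = (0.0, f)
    l1, l2, l3 = (tangent_line(t, f) for t in ts)
    # Lemma 2: each pedal point of F should lie on Lambda, i.e. have y = 0.
    pedal_heights = [abs(foot_of_perpendicular(focus, l)[1]) for l in (l1, l2, l3)]
    # Lambert: the circumcircle of the tangent triangle should pass through F.
    center, radius = circumcircle(intersect(l1, l2), intersect(l2, l3), intersect(l3, l1))
    gap = abs(math.hypot(focus[0] - center[0], focus[1] - center[1]) - radius)
    return max(pedal_heights), gap

print(check_lambert(1.5, (-2.0, 0.5, 3.7)))  # both numbers are ~0 up to rounding
```

The circumcenter uses the standard closed-form formula for three non-collinear points. Distinct tangents of a parabola are never parallel, so the tangent triangle is nondegenerate.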



Theorem 3 (Converse to Lambert’s Theorem). Let l1 and l2 be two distinct lines
tangent to a parabola G with focus F. Let I = l1 ∩ l2 be their intersection and consider
any circle C passing through points F and I . Let Hi ∈ C ∩ li , for i = 1, 2. Then the
line H1 H2 is tangent to G.

Proof. If Hi = I for some i, then the statement clearly holds. So assume that Hi ≠ I for each i. By Lemma 2, the orthogonal projections of F onto l1 and l2 lie on Λ. Since F is on the circumcircle of △H1H2I, its pedal points on the lines l1, l2, and H1H2 are collinear by Theorem 1. As a line is uniquely determined by two points, the line containing them must be Λ. Hence the orthogonal projection of F onto H1H2 lies on Λ, and applying Lemma 2 again yields that H1H2 is tangent to G.

Figure 2.2. Illustration of Theorem 3.
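Theorem 3 can be probed numerically in the same way. The sketch below uses the same hypothetical coordinate model, y = x²/(4f) with focus F = (0, f), and its helper names are illustrative. It works in four steps:

- Choose a circle through F and I by moving its center along the perpendicular bisector of FI.
- Compute the second intersections H1 and H2 of that circle with l1 and l2.
- Write the line H1H2 as y = mx + k.
- Test the tangency condition k = −fm², which is the vanishing of the discriminant of x²/(4f) = mx + k.

```python
def tangent_line(t, f):
    """Tangent to y = x^2/(4f) at abscissa t, as (m, k) meaning y = m*x + k."""
    return t / (2 * f), -t * t / (4 * f)

def intersect(l1, l2):
    """Intersection point of two non-parallel lines given as (m, k)."""
    (m1, k1), (m2, k2) = l1, l2
    x = (k2 - k1) / (m1 - m2)
    return x, m1 * x + k1

def second_point_on_circle(p, line, center):
    """Other intersection of the line y = m*x + k through p with the circle (given center) through p."""
    m, _ = line
    dx, dy = 1.0, m                      # direction vector of the line
    s = -2 * ((p[0] - center[0]) * dx + (p[1] - center[1]) * dy) / (dx * dx + dy * dy)
    return p[0] + s * dx, p[1] + s * dy

def tangency_defect(p, q, f):
    """For the line pq written as y = m*x + k, return k + f*m^2 (zero iff it touches y = x^2/(4f))."""
    m = (q[1] - p[1]) / (q[0] - p[0])
    k = p[1] - m * p[0]
    return k + f * m * m

def check_converse(f, t1, t2, lam):
    """Build H1, H2 as in Theorem 3; the parameter lam selects the circle through F and I."""
    focus = (0.0, f)
    l1, l2 = tangent_line(t1, f), tangent_line(t2, f)
    i = intersect(l1, l2)
    # Centers of circles through F and I lie on the perpendicular bisector of FI.
    mid = ((focus[0] + i[0]) / 2, (focus[1] + i[1]) / 2)
    perp = (-(focus[1] - i[1]), focus[0] - i[0])
    center = (mid[0] + lam * perp[0], mid[1] + lam * perp[1])
    h1 = second_point_on_circle(i, l1, center)
    h2 = second_point_on_circle(i, l2, center)
    return tangency_defect(h1, h2, f)

for lam in (-1.3, 0.4, 2.0):
    print(lam, check_converse(1.0, -1.0, 3.0, lam))  # each defect is ~0
```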

3. PARBELOS. Recall that the latus rectum of a conic is the chord through the focus that is parallel to the conic’s directrix. The parbelos is constructed as follows. Given three points C1, C2, C3 on a line, construct parabolas G1, G2, G3 that open in the same direction and whose latera recta are C1C2, C1C3, and C2C3, respectively, so that G2 is the outer parabola. The parbelos is defined as the region bounded by the three latus rectum arcs.


The tangent line of a parabola at either endpoint of its latus rectum forms an angle
of π/4 with the latus rectum. Consequently, parabolas G1 and G2 share the same tangent at
C1, and similarly parabolas G2 and G3 share a tangent at C3. At cusp C2, however, we
obtain two different tangent directions. We can extend these four tangents to form a
rectangle whose vertices are the intersections of tangent lines as in Figure 1.2. We will
denote the vertices of this rectangle by C2 , T1 , T2 , T3 .
In his paper [3], Sondow asks for a synthetic proof of the following theorem, which
he proves via analytic geometry.




Theorem 4 (Sondow’s Tangency Property). In the tangent rectangle of the parbelos,
the diagonal opposite the cusp is tangent to the upper parabola. The contact point lies
on the bisector of the angle at the cusp.

Proof. Let us inscribe the tangent rectangle r = T1T2T3C2 in another rectangle R whose sides are parallel and orthogonal to C1C3. At the cusp C2, the angles formed between C1C3 and C2T1, C2T3 are π/4 and 3π/4. Using right triangles, it is easy to see that R must be a square and its center O is the same as that of r.

Figure 3.1. Rectangle R circumscribing the tangent rectangle r.

Consider the circumscribing circle of r .

Figure 3.2. The angles at the cusp C2 are equal.

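Since its center is O and C2T2 is a diameter, the circle meets C1C3 again at a point F with ∠C2FT2 = π/2; that is, F is the orthogonal projection of T2 onto C1C3. Because the tangents at C1 and C3 make angles of π/4 with C1C3, the triangle C1T2C3 is an isosceles right triangle, so F is the midpoint of C1C3, which is the focus of the outer parabola. The lines C1T2 and C3T2 are tangent to the outer parabola, meet at T2, and contain T1 and T3, respectively. Since F, C2, T1, T2, T3 lie on a circle, Theorem 3 (with I = T2, H1 = T1, and H2 = T3) implies that T1T3 is tangent to the parabola.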
As for the angle bisector at the cusp C2, the billiard angle property implies that it is orthogonal to C1C3 (see Figure 3.3). Let H be the intersection of the angle bisector with the top of the rectangle R (i.e., with the directrix of the outer parabola), and let T be its intersection with the diagonal T1T3. We would like to show that FT = HT. Since HT is perpendicular to the directrix, this says exactly that T lies on the outer parabola, so T is the point where the tangent line T1T3 touches it.
This is not too hard to see from Figure 3.3 on the right. It also follows from the diagram that F is equidistant from T1 and T3.



Figure 3.3. The angle bisector at cusp C2 is orthogonal to C1C3.

Remark 5. One way to look at the configuration in Figure 3.1 is as a 4-periodic billiard
trajectory in a square billiard table (for an exposition on billiards in polygons, see [4]).
It would be interesting to see whether there is a deeper connection between (p)arbelos
and billiards.

Let A1 be the intersection of the axis of symmetry of G1 with the directrix of G3 (i.e., the line parallel to C1C3 and passing through T3). Define A3 similarly. Notice that the vertical sides of the rectangle R are the axes of symmetry of G1 and G3, while the horizontal sides are C1C3 and the directrix of the outer parabola. As a consequence, we obtain the following new properties for the parbelos.

Figure 3.4. The circumcircle of the tangent rectangle and notable points lying on it.

Corollary 6.
1. The focus F of the outer parabola is equidistant from vertices T1 and T3 of the
tangent rectangle.
2. The intersection H of the angle bisector at cusp C2 and the directrix of the outer
parabola lies on the circumcircle of the tangent rectangle, which passes through F, C2, T1, T2, T3.
3. This point H is equidistant from vertices T1 and T3.
4. Points A1 and A3 lie on the circle through F, C2, T1, T2, T3.
5. Point A1 is equidistant from C2 and T2 and so is point A3 .
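The statements of Theorem 4 and Corollary 6 can be checked numerically for particular cusps. The sketch below is not from the note and rests on two assumptions:

- Coordinates are C1 = (0, 0), C2 = (2a, 0), C3 = (2a + 2b, 0), with all three parabolas opening downward, so the directrices lie above C1C3.
- A parabola with latus rectum [u, v] on the x-axis has focus ((u + v)/2, 0), focal length (v − u)/4, and directrix y = (v − u)/2.

The function and key names are hypothetical. Every returned quantity should vanish up to rounding.

```python
import math

def line_through(p, slope):
    """Line through point p with the given slope, as (m, k) meaning y = m*x + k."""
    return slope, p[1] - slope * p[0]

def intersect(l1, l2):
    """Intersection of two non-parallel lines given as (m, k)."""
    (m1, k1), (m2, k2) = l1, l2
    x = (k2 - k1) / (m1 - m2)
    return x, m1 * x + k1

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def parbelos_check(a, b):
    """Return quantities that Theorem 4 and Corollary 6 predict to be zero (assumed coordinates)."""
    c1, c2, c3 = (0.0, 0.0), (2.0 * a, 0.0), (2.0 * a + 2.0 * b, 0.0)
    # Tangents at a latus-rectum endpoint have slope +1 (left end) or -1 (right end).
    t1 = intersect(line_through(c1, 1.0), line_through(c2, -1.0))
    t2 = intersect(line_through(c1, 1.0), line_through(c3, -1.0))
    t3 = intersect(line_through(c2, 1.0), line_through(c3, -1.0))
    h, fl = a + b, (a + b) / 2.0               # outer parabola: focus (h, 0), focal length fl
    focus, directrix_y = (h, 0.0), 2.0 * fl
    # Diagonal T1T3 as y = m*x + k; substituting into y = fl - (x - h)^2 / (4 fl)
    # gives x^2 + qb*x + qc = 0, which has a double root iff the line is tangent.
    m, k = line_through(t1, (t3[1] - t1[1]) / (t3[0] - t1[0]))
    qb, qc = 4.0 * fl * m - 2.0 * h, h * h + 4.0 * fl * k - 4.0 * fl * fl
    touch = (-qb / 2.0, m * (-qb / 2.0) + k)   # the (double) root
    hh = (c2[0], directrix_y)                  # bisector x = 2a meets the outer directrix
    a1, a3 = (a, t3[1]), (2.0 * a + b, t1[1])  # inner axes meet the lines through T3 and T1
    center = ((c2[0] + t2[0]) / 2.0, (c2[1] + t2[1]) / 2.0)
    radius = dist(center, c2)
    return {
        "discriminant": qb * qb - 4.0 * qc,
        "contact_x_minus_2a": touch[0] - 2.0 * a,
        "FT_minus_HT": dist(focus, touch) - dist(hh, touch),
        "circle_defect": max(abs(dist(center, p) - radius) for p in (t1, t3, focus, hh, a1, a3)),
        "F_to_T1_minus_T3": dist(focus, t1) - dist(focus, t3),
        "H_to_T1_minus_T3": dist(hh, t1) - dist(hh, t3),
        "A1_to_C2_minus_T2": dist(a1, c2) - dist(a1, t2),
        "A3_to_C2_minus_T2": dist(a3, c2) - dist(a3, t2),
    }

print(parbelos_check(1.0, 2.0))
```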




REFERENCES

1. H. S. M. Coxeter, S. L. Greitzer, Geometry Revisited. MAA, New York, 1967. 40–41.


2. D. Hilbert, S. Cohn-Vossen, Geometry and the Imagination. Chelsea, New York, 1999. 26–27.
3. J. Sondow, The parbelos, a parabolic analog of the arbelos, Amer. Math. Monthly 120 (2013) 929–935.
4. S. Tabachnikov, Geometry and Billiards. Advanced Study Semesters, American Mathematical Society,
Providence, RI, 2005. 113–134.
5. E. Tsukerman, On Polygons Admitting a Simson Line as Discrete Analogs of Parabolas (2012), available
at http://arxiv.org/abs/1201.0305.

Department of Mathematics, University of California, Berkeley, CA 94720-3840


e.tsukerman@berkeley.edu


A Difference Equation Leading to the Irrationality of √2
We provide a fresh proof of a very old and well-known fact: the irrationality of √2. To our knowledge, this approach is new; at least, we have not seen it in the outstanding references [1, 2], although the flavor of the proof reminds us of [3].

Theorem. The square root of 2 is irrational.


Proof. The characteristic equation associated with the second-order linear difference equation

$$x_{n+2} = -2x_{n+1} + x_n, \qquad n = 0, 1, 2, \ldots, \tag{1}$$

is $r^2 = -2r + 1$, whose solutions are $r_1 = \sqrt{2} - 1$ and $r_2 = -(\sqrt{2} + 1)$. So the general solution of (1) is given by

$$x_n = a r_1^n + b r_2^n, \qquad a, b \in \mathbb{R},\ n = 0, 1, 2, \ldots. \tag{2}$$

Assume now that √2 is rational, or equivalently, $r_1 = p/q$ with $p, q \in \mathbb{Z}$ and $q \neq 0$. By taking a = q and b = 0 in (2), we have that $x_0 = q \in \mathbb{Z}$, $x_1 = p \in \mathbb{Z}$, and then by induction $x_n = -2x_{n-1} + x_{n-2} \in \mathbb{Z}$ for all n = 2, 3, 4, . . . .
On the other hand, since $0 < r_1 < 1$ and $q \neq 0$, it follows that $x_n = q r_1^n \neq 0$ for all n = 0, 1, 2, . . . and $\lim_{n\to+\infty} x_n = \lim_{n\to+\infty} q r_1^n = 0$. But a sequence of nonzero integers cannot converge to 0, a contradiction.
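The contradiction can also be watched numerically. For integer starting values x0 = q and x1 = p, the solution (2) has b ≠ 0 unless p/q = r1. In that case |x_n| eventually grows like |b| |r2|^n instead of tending to 0.

The hypothetical helpers below run recurrence (1) in exact integer arithmetic for a few candidate fractions p/q near √2 − 1. For each candidate, they report the first index at which |x_n| exceeds q. This is an illustration, not a substitute for the proof.

```python
def recurrence_terms(p, q, count):
    """x_0 = q, x_1 = p, x_{n+2} = -2*x_{n+1} + x_n, as in (1); every term is an integer."""
    xs = [q, p]
    while len(xs) < count:
        xs.append(-2 * xs[-1] + xs[-2])
    return xs[:count]

def escape_index(p, q, limit=10_000):
    """First n with |x_n| > |q|, or None if that does not happen before `limit`."""
    x_prev, x_curr = q, p
    for n in range(1, limit):
        if abs(x_curr) > abs(q):
            return n
        x_prev, x_curr = x_curr, -2 * x_curr + x_prev
    return None

# Hypothetical candidates p/q for r_1 = sqrt(2) - 1, including close ones.
for p, q in [(1, 3), (2, 5), (12, 29), (70, 169)]:
    print(p, q, recurrence_terms(p, q, 8), escape_index(p, q))
```

The closer p/q is to r1, the longer the terms shrink before turning around, but they never reach 0 and stay there.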

REFERENCES

1. A. Bogomolny, Square root of 2 is irrational, Interactive Mathematics Miscellany and Puzzles,


http://www.cut-the-knot.org/proofs/sq_root.shtml
2. M. Gardner, The square root of 2 = 1.41421 35623 73095 . . . , Math Horizons, April 1997, 5–8.
3. D. Kalman, R. Mena, and S. Shahriari, Variations on an irrational theme—Geometry, dynamics,
algebra, Math. Mag. 70 (1997) 93–104.

—Submitted by José Ángel Cid Araújo∗ , Departamento de Matemáticas,


Universidade de Vigo, Campus de Ourense, Spain, angelcid@uvigo.es
http://dx.doi.org/10.4169/amer.math.monthly.121.05.443
MSC: Primary 11J72
∗ The author was partially supported by Ministerio de Educación y Ciencia, Spain, and FEDER,

Project MTM2010-15314



A Connection between Furstenberg’s and
Euclid’s Proofs of the Infinitude of Primes
Nathan A. Carlson

Abstract. In 1955, Furstenberg gave a curious topological proof of the infinitude of primes.
Cass and Wildenberg, as well as Mercer, dispensed with the topological language in this proof
to uncover the essential number theory. In this note, we observe that Furstenberg’s proof has
an important and intriguing connection to Euclid’s well-known original proof.

Let P be the set of all primes. For a finite set F = {p1, . . . , pn} ⊆ P, let $z_F = p_1 \cdots p_n$. For any S ⊆ P, let

$$N(S) = \mathbb{Z} \setminus \bigcup_{p \in S} p\mathbb{Z}.$$

Note that N(P) consists of all integers that are not integer multiples of any prime. The Fundamental Theorem of Arithmetic implies that N(P) = {−1, 1}.
The key observation in Furstenberg’s proof is that if P were finite, then N(P) would be infinite, contradicting N(P) = {−1, 1}. In [1], periodic subsets of Z are used to show that N(P) is infinite. A straightforward way to see that N(P) would be infinite if P were finite is to note that for any m ∈ Z and p ∈ P, $m z_P + 1 \in \mathbb{Z} \setminus p\mathbb{Z}$. It follows that

$$m z_P + 1 \in \bigcap_{p \in P} (\mathbb{Z} \setminus p\mathbb{Z}) = N(P)$$

for every m ∈ Z, so N(P) is infinite.

A strikingly similar tactic is used in Euclid’s classic proof, known to many students of mathematics. Starting with any finite set of primes F, one shows that $z_F + 1 \in N(F)$. So, assuming that P is finite, we see that $z_P + 1 \in N(P)$ and $z_P + 1 > 1$, which contradicts N(P) = {−1, 1}.
In summary, assume that P is finite. Furstenberg’s proof reduces to the observation
that N (P) is infinite. Euclid’s proof reduces to the observation that z P + 1 ∈ N (P).
Both observations contradict that N (P) = {−1, 1}.
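For a finite set of primes, N(S) can be explored directly. The following sketch (function names are hypothetical) takes one finite F and checks that two things lie in N(F): Euclid’s witness z_F + 1, and Furstenberg’s infinite family m z_F + 1. In the proofs above, these same observations become contradictions once F is taken to be all of P.

```python
from math import prod

def in_N(m, primes):
    """Membership in N(S) = Z minus the union of pZ over p in S, for a finite set S of primes."""
    return all(m % p != 0 for p in primes)

def window_of_N(primes, lo, hi):
    """The elements of N(S) lying in the integer window [lo, hi]."""
    return [m for m in range(lo, hi + 1) if in_N(m, primes)]

F = [2, 3, 5, 7]
z_F = prod(F)                                              # z_F = 210
print(in_N(z_F + 1, F))                                    # Euclid: z_F + 1 lies in N(F)
print(all(in_N(m * z_F + 1, F) for m in range(-50, 51)))   # Furstenberg: so does every m*z_F + 1
print(window_of_N([2, 3], -10, 10))                        # integers in [-10, 10] coprime to 6
```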

ACKNOWLEDGMENT. The author thanks the referee for many helpful suggestions.

REFERENCES

1. D. Cass, G. Wildenberg, Math Bite: A Novel Proof of the Infinitude of Primes, Revisited, Mathematics
Magazine 76 (2003) 203.
2. H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955) 353.
3. I. D. Mercer, On Furstenberg’s Proof of the Infinitude of Primes, Amer. Math. Monthly 116 (2009) 355–
356.

Department of Mathematics, California Lutheran University, 60 W. Olsen Rd, MC 3750,


Thousand Oaks, CA 91360
ncarlson@callutheran.edu

http://dx.doi.org/10.4169/amer.math.monthly.121.05.444
MSC: Primary 11A41




An Elementary Proof of a Generalization of
Banach’s Mapping Theorem
Ming-Chia Li

Abstract. We give an elementary proof of a generalization of Banach’s mapping theorem, which says that for any two mappings f : A → B and g : B → A, there exists a subset A0 of A such that g(B \ f(A0)) = A \ A0.

Let A and B be two sets and let f : A → B and g : B → A be two mappings. If both f and g are injections, then Banach’s mapping theorem [1] asserts that there exist A0 ⊆ A and B0 ⊆ B such that f(A0) = B0 and g(B \ B0) = A \ A0, and the Cantor–Schroeder–Bernstein theorem [4, Theorem 4.5.5] asserts that there exists a bijection from A to B. It is well known that the former theorem implies the latter, by defining the desired bijection to be f on A0 and the inverse of g on A \ A0. In fact, Banach’s mapping theorem can be generalized by removing the injectivity hypothesis; see [4, p. 102, Exercise 4.21].
In this short note, we prove the generalization of Banach’s mapping theorem mentioned above. We do this by applying the following simple fact and setting A0 = A∗ and B0 = f(A∗). This includes the extreme case when A∗ is empty, or equivalently, when g is surjective.

Fact. For any subset S of A, define ϕ(S) = A \ g(B \ f(S)). Let Γ = {S ⊆ A : ϕ(S) ⊆ S}, and let A∗ = ⋂_{S∈Γ} S be the intersection of all members of Γ. Then ϕ(A∗) = A∗.

We may obtain this fact from the Knaster–Tarski theorem [3, 5] in lattice theory as follows. The power set of A, partially ordered by set inclusion, is a complete lattice, and ϕ is isotone (order-preserving). By the Knaster–Tarski theorem, the fixed points of ϕ form a complete lattice; in particular, ϕ has a least fixed point, namely A∗ (also refer to [2] for the related theoretical background). For completeness, we offer an elementary proof as follows.

Proof. The set Γ is nonempty, since A ∈ Γ. The definition of ϕ implies that ϕ is monotone: if S1 ⊆ S2, then ϕ(S1) ⊆ ϕ(S2). Since A∗ ⊆ S for every S ∈ Γ, we get

$$\varphi(A^*) = \varphi\Big(\bigcap_{S \in \Gamma} S\Big) \subseteq \bigcap_{S \in \Gamma} \varphi(S) \subseteq \bigcap_{S \in \Gamma} S = A^*.$$

In particular, monotonicity gives ϕ(ϕ(A∗)) ⊆ ϕ(A∗), so ϕ(A∗) ∈ Γ by the definition of Γ. The definition of A∗ then implies that A∗ ⊆ ϕ(A∗). Therefore, ϕ(A∗) = A∗.
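For finite sets, the Fact can be computed directly. The sketch below assumes finite A and B, with the maps stored as Python dicts; all names are hypothetical. It computes A∗ in two ways:

- Directly as the intersection in the Fact, by brute force over all subsets of A.
- By iterating ϕ from the empty set. This Kleene-style iteration is a different route from the note’s argument, but it reaches the same least fixed point when A is finite.

It then checks Banach’s identity g(B \ f(A0)) = A \ A0 with A0 = A∗.

```python
from itertools import chain, combinations

def phi(S, A, B, f, g):
    """phi(S) = A minus g(B minus f(S)); f and g are dicts, A and B are sets."""
    return A - {g[y] for y in B - {f[x] for x in S}}

def least_fixed_point(A, B, f, g):
    """Iterate phi from the empty set; since phi is monotone, the increasing chain
    (empty set, phi(empty set), ...) stabilizes at the least fixed point when A is finite."""
    S = frozenset()
    while True:
        nxt = frozenset(phi(S, A, B, f, g))
        if nxt == S:
            return S
        S = nxt

def a_star_by_intersection(A, B, f, g):
    """A* exactly as in the Fact: intersect every S with phi(S) contained in S (brute force)."""
    elems = sorted(A)
    result = set(A)
    for sub in chain.from_iterable(combinations(elems, r) for r in range(len(elems) + 1)):
        S = set(sub)
        if phi(S, A, B, f, g) <= S:
            result &= S
    return result

A, B = {1, 2, 3}, {"a", "b"}
f = {1: "a", 2: "a", 3: "a"}          # hypothetical maps; f is not injective
g = {"a": 1, "b": 2}
A0 = least_fixed_point(A, B, f, g)
print(sorted(A0), a_star_by_intersection(A, B, f, g))
print({g[y] for y in B - {f[x] for x in A0}} == A - A0)   # Banach's identity g(B \ f(A0)) = A \ A0
```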

ACKNOWLEDGMENT. The author would like to thank Steve Krantz and the referee for valuable comments.

http://dx.doi.org/10.4169/amer.math.monthly.121.05.445
MSC: Primary 04A05



REFERENCES

1. S. Banach, Un théorème sur les transformations biunivoques, Fund. Math. 6 (1924) 236–239.
2. G. Birkhoff, Lattice Theory. Third edition. American Mathematical Society, Providence, RI, 1979.
3. B. Knaster, Un théorème sur les fonctions d’ensembles, Ann. Soc. Polon. Math. 6 (1928) 133–134.
4. S. G. Krantz, Elements of Advanced Mathematics. Third edition. Taylor & Francis/CRC Press, Boca Raton,
FL, 2012.
5. A. Tarski, A lattice-theoretical fixpoint theorem and its applications, Pacific J. Math. 5 (1955) 285–309.

Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan
mcli@math.nctu.edu.tw

Quote from W. C. Williams, The Autobiography of


William Carlos Williams, New Directions Publishing,
New York, 1967

But mathematics was not one of my fortes. To this day I still wonder at the
astuteness of that Mr. Bickford, who gave me my final examination in Higher
Algebra . . . Mr. Bickford was older than some of the others and had the reputation
of being the best man in his department. As I stood there watching him chewing
his moustache—he wore a little black moustache—as he studied the paper I had
turned in, the last one left in the room, I tried to brace myself as best I could
for the verdict. He looked up at me after a while with no expression on his face
whatsoever. “You’ll never be a mathematician, Williams,” he told me. I agreed.
“But you show an understanding of the process.” He paused. “And I’m going to
pass you!” I couldn’t move for joy. It was the most intelligent verdict, and from
a teacher, that I have ever encountered. It is hard to realize how important such
a moment can be in a man’s life. That single piece of intelligence had more to
do in straightening my difficulties, in putting me on a correct course than any
single thing that I can remember. He saw my mind, and realized what it was not
intended to perform. And he acted accordingly. That’s what it means, at best, to
be a teacher.
—Submitted by Robert Haas




A Technique in Contour Integration
M. L. Glasser

Abstract. A rotation of an integration contour is shown to lead, in some cases, to interesting integral identities. Elementary and non-elementary examples are provided.

1. INTRODUCTION. An application of contour integration presented here, while nearly trivial in conception and implementation, and capable of extensive generalization, does not appear in standard texts and is able to produce striking and potentially useful integral identities such as the following.
Suppose that b, c > 0 and d = (b² − c)/2 > 0. Then

$$\int_0^\infty \frac{x^4 + bx^2 - d}{x^8 + cx^4 + d^2}\,dx = 0. \tag{1}$$
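Identity (1) is easy to test numerically. The sketch below is an independent check, not part of the note. It maps [0, ∞) to [0, 1) with x = t/(1 − t) and applies the composite Simpson rule. The function names, the number of subintervals, and the sample values of b and c are arbitrary choices.

```python
def integrand(x, b, c, d):
    """The integrand of (1)."""
    return (x**4 + b * x**2 - d) / (x**8 + c * x**4 + d * d)

def integral_1(b, c, n=4000):
    """Approximate the left side of (1) by composite Simpson's rule after x = t/(1 - t)."""
    d = (b * b - c) / 2
    if not (b > 0 and c > 0 and d > 0):
        raise ValueError("(1) requires b, c > 0 and d = (b^2 - c)/2 > 0")
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")

    def g(t):
        if t >= 1.0:
            return 0.0                                   # integrand ~ x**-4, so g(t) -> 0 as t -> 1
        x = t / (1.0 - t)
        return integrand(x, b, c, d) / (1.0 - t) ** 2    # dx = dt / (1 - t)**2

    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

for b, c in [(2.0, 2.0), (3.0, 1.0), (1.5, 0.5)]:
    print(b, c, integral_1(b, c))   # each value is ~0, as (1) predicts
```

For b = c = 2 (so d = 1), the three pieces of the integral are multiples of π√2/16 that cancel exactly, which makes this case a convenient sanity check.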

2. CALCULATION.

Theorem. Suppose the function F : C → C is analytic in the first quadrant, is real and positive along the positive real axis, and grows at most exponentially at infinity. Let $F(re^{i\pi/4}) = A(r) + iB(r)$ and $D(r) = A^2(r) + B^2(r)$. Then, for a > 0,

$$\int_0^\infty \sin(ar^2)\,\frac{A(r)+B(r)}{D(r)}\,dr = \int_0^\infty \cos(ar^2)\,\frac{A(r)-B(r)}{D(r)}\,dr. \tag{2}$$

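Proof. Consider the real integral

$$\int_0^\infty \frac{e^{-ax^2}}{F(x)}\,dx. \tag{3}$$

Under the stated hypotheses, by Cauchy’s theorem, the integration contour can be rotated by 45° to run along the diagonal of the first quadrant. Write $x = r(1+i)/\sqrt{2}$, so that $x^2 = ir^2$ and (3) becomes

$$\frac{1}{\sqrt{2}}\int_0^\infty e^{-iar^2}\,\frac{(A(r)+B(r)) + i\,(A(r)-B(r))}{D(r)}\,dr.$$

Since (3) is real, the imaginary part of this integral, $\frac{1}{\sqrt{2}}\int_0^\infty [\cos(ar^2)(A(r)-B(r)) - \sin(ar^2)(A(r)+B(r))]/D(r)\,dr$, must vanish, which is (2).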

Corollary. When the respective integrals converge, for n = 0, 1, 2, . . . , we have the following:

$$\int_0^\infty \frac{A(r)-B(r)}{D(r)}\,r^{4n}\,dr = \int_0^\infty \frac{A(r)+B(r)}{D(r)}\,r^{4n+2}\,dr = 0. \tag{4}$$

This results by expanding both sides of (2) in powers of a and comparing coefficients.
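The two expansions are written out below. This is a formal sketch that assumes termwise integration is legitimate.

$$
\begin{aligned}
\int_0^\infty \sin(ar^2)\,\frac{A(r)+B(r)}{D(r)}\,dr &= \sum_{n \ge 0} \frac{(-1)^n a^{2n+1}}{(2n+1)!} \int_0^\infty \frac{A(r)+B(r)}{D(r)}\,r^{4n+2}\,dr,\\
\int_0^\infty \cos(ar^2)\,\frac{A(r)-B(r)}{D(r)}\,dr &= \sum_{n \ge 0} \frac{(-1)^n a^{2n}}{(2n)!} \int_0^\infty \frac{A(r)-B(r)}{D(r)}\,r^{4n}\,dr.
\end{aligned}
$$

The first series contains only odd powers of a and the second only even powers. Equality for all a > 0 therefore forces every coefficient to vanish, which is (4).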
http://dx.doi.org/10.4169/amer.math.monthly.121.05.447
MSC: Primary 26A06, Secondary 30E20; 33B10



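Equation (1) results by taking $F(x) = x^4 + bx^2 + d$, for which $A(r) = d - r^4$, $B(r) = br^2$, and $D(r) = r^8 + cr^4 + d^2$, and setting n = 0 in (4). As a second example, let us take $F(z) = e^{\sqrt{2}z}$, so that $A(r) = e^r\cos r$, $B(r) = e^r\sin r$, and $D(r) = e^{2r}$. Then for n = 0, 1, 2, . . . , we have

$$\int_0^\infty \sin(ar^2)\,e^{-r}(\cos r + \sin r)\,dr = \int_0^\infty \cos(ar^2)\,e^{-r}(\cos r - \sin r)\,dr, \tag{5a}$$

$$\int_0^\infty r^{4n}e^{-r}(\cos r - \sin r)\,dr = 0, \qquad \int_0^\infty r^{4n+2}e^{-r}(\cos r + \sin r)\,dr = 0. \tag{5b}$$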
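The moments in (5b) have a closed form, which gives a quick check. Since e^{−r}(cos r + i sin r) = e^{−(1−i)r}, the integral ∫₀^∞ r^k e^{−(1−i)r} dr = k!/(1 − i)^{k+1} has real part ∫₀^∞ r^k e^{−r} cos r dr and imaginary part ∫₀^∞ r^k e^{−r} sin r dr. The short sketch below (names hypothetical) evaluates both parts and confirms the cancellations in (5b) up to floating-point rounding.

```python
from math import factorial

def damped_moments(k):
    """(C_k, S_k) = (int_0^inf r^k e^-r cos r dr, int_0^inf r^k e^-r sin r dr),
    read off from int_0^inf r^k e^{-(1-i) r} dr = k! / (1 - i)^(k+1)."""
    z = factorial(k) / (1 - 1j) ** (k + 1)
    return z.real, z.imag

for n in range(4):
    c0, s0 = damped_moments(4 * n)        # (5b), first identity: C - S should vanish
    c2, s2 = damped_moments(4 * n + 2)    # (5b), second identity: C + S should vanish
    print(n, c0 - s0, c2 + s2)
print(damped_moments(0))                   # (0.5, 0.5)
```

The cancellation comes from (1 − i)^4 = −4: for k = 4n the factor (1 − i)^{k+1} is a real multiple of 1 − i, and for k = 4n + 2 it is a real multiple of 1 + i.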

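As a non-elementary example, the Kelvin functions (see [1]), which occur in many engineering studies, are defined by

$$\operatorname{ker}_0(x) + i\,\operatorname{kei}_0(x) = K_0(xe^{i\pi/4}), \tag{6}$$

where $K_0(z)$ is the modified Bessel function of the second kind. Thus, for $F(x) = K_0(x)$, we obtain the Kelvin function identity

$$\int_0^\infty \frac{\operatorname{ker}_0(x)}{\operatorname{ker}_0^2(x) + \operatorname{kei}_0^2(x)}\,dx = \int_0^\infty \frac{\operatorname{kei}_0(x)}{\operatorname{ker}_0^2(x) + \operatorname{kei}_0^2(x)}\,dx, \tag{7}$$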

which seems to be a new addition to the literature concerning these functions. The consequences of the theorem may be increased many-fold by noting that sin(ar²) and cos(ar²) in (2) may be replaced, when the resulting integrals converge, by S(r²) and C(r²), the sine and cosine transforms, respectively, of a given function. Thus, by multiplying both sides of (5a) by the characteristic function of the interval 0 < a < b, integrating over a, and using $\int_0^b \sin(ar^2)\,da = (1-\cos(br^2))/r^2$ and $\int_0^b \cos(ar^2)\,da = \sin(br^2)/r^2$, we find that

$$\int_0^\infty (1-\cos(br^2))\,e^{-r}(\cos r + \sin r)\,\frac{dr}{r^2} = \int_0^\infty \sin(br^2)\,e^{-r}(\cos r - \sin r)\,\frac{dr}{r^2}. \tag{8}$$

The integral on the right-hand side of (8) is given by Mathematica as $\sqrt{\pi b/2} - (\pi/2)\operatorname{erfc}(1/\sqrt{2b})$, so we have determined the value of the left-hand integral, for which Mathematica does not return a value. Similar, although more complex, formulas are obtained if x² in the exponent in (3) is replaced by x^ν and the path of integration is rotated to the corresponding ray.

REFERENCE

1. I. S. Gradshteyn, I. M. Ryzhik, Table of Integrals, Series and Products. Academic Press, New York, 1994.

Department of Physics, Clarkson University, Potsdam, NY 13699-5820


laryg@clarkson.edu




Large-Deviation Bounds for Sampling
without Replacement
Kyle Luh and Nicholas Pippenger

Abstract. We give a simple argument, based on drawing balls from urns, showing that the ex-
ponential bound on the probability of a large deviation for sampling with replacement applies
also to sampling without replacement. This result includes as a special case the relationship
between the binomial and hypergeometric distributions.

1. INTRODUCTION. Two distributions that are encountered early in any course on probability are the binomial and the hypergeometric. For the binomial distribution, we typically imagine a possibly biased coin that comes up heads with probability p and tails with probability 1 − p. If we flip the coin independently n times, the number K of times that heads comes up has the binomial distribution: $\Pr[K = k] = \binom{n}{k} p^k (1-p)^{n-k}$. One of the great virtues of the binomial distribution is that almost everything about it can be easily calculated or estimated by direct manipulation of simple expressions. Consider, for example, estimating the probability of a large deviation; that is, the probability that K exceeds its expectation pn by at least qn, where 0 < q < 1 − p. To obtain such an estimate, we consider the probability generating function

$$g_K(u) = \operatorname{Ex}[u^K] = \sum_{0 \le k \le n} \binom{n}{k} p^k (1-p)^{n-k} u^k = (1 - p + pu)^n \tag{1}$$

for K, where u is a new variable. We then have


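$$
\begin{aligned}
\Pr[K \ge (p+q)n] &= \sum_{(p+q)n \le k \le n} \Pr[K = k] \\
&\le \frac{1}{u^{(p+q)n}} \sum_{(p+q)n \le k \le n} \Pr[K = k]\,u^k \\
&\le \frac{1}{u^{(p+q)n}} \sum_{0 \le k \le n} \Pr[K = k]\,u^k \\
&= \frac{g_K(u)}{u^{(p+q)n}}
\end{aligned}
\tag{2}
$$

for any u ≥ 1. Substituting (1) into (2) yields

$$\Pr[K \ge (p+q)n] \le \left(\frac{1 - p + pu}{u^{p+q}}\right)^n.$$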
http://dx.doi.org/10.4169/amer.math.monthly.121.05.449
MSC: Primary 60F10, Secondary 60E15



Minimizing the right-hand side of this bound by setting the derivative with respect to u equal to zero, we obtain

$$\Pr[K \ge (p+q)n] \le \left(\left(\frac{p}{p+q}\right)^{p+q} \left(\frac{1-p}{1-p-q}\right)^{1-p-q}\right)^n. \tag{3}$$

The bound (3) is due, in the form we have stated it, to Chernoff [4].
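For convenience, here is the minimization behind (3) written out. It is a routine calculation added here, not taken from the note. Let h(u) = log(1 − p + pu) − (p + q) log u. Then

$$
\begin{aligned}
h'(u) &= \frac{p}{1-p+pu} - \frac{p+q}{u} = 0 \quad\Longleftrightarrow\quad u = u^* := \frac{(p+q)(1-p)}{p(1-p-q)},\\
1 - p + pu^* &= \frac{1-p}{1-p-q}, \qquad u^* - 1 = \frac{q}{p(1-p-q)} > 0,\\
\frac{1-p+pu^*}{(u^*)^{p+q}} &= \left(\frac{p}{p+q}\right)^{p+q}\left(\frac{1-p}{1-p-q}\right)^{1-p-q}.
\end{aligned}
$$

Since h''(u*) = (p + q)(1 − p − q)/(u*)² > 0, the critical point is a minimum. Since u* > 1, it is admissible in (2).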
For the hypergeometric distribution, we typically imagine an urn initially containing a red balls and b blue balls. If we draw n ≤ a + b balls without replacement, the number H of red balls drawn has the distribution $\Pr[H = h] = \binom{a}{h}\binom{b}{n-h} \big/ \binom{a+b}{n}$. If n balls were drawn with replacement, the number of red balls drawn would have a binomial distribution with p = a/(a + b). Without replacement, each red ball drawn reduces the probability that the next ball drawn will be red. Thus, we expect the distribution of H to be more strongly concentrated about its expectation than that of K with p = a/(a + b). We therefore expect the bound (3) to apply to Pr[H ≥ (p + q)n] as well as to Pr[K ≥ (p + q)n]. But if we try to derive such a bound, we encounter the difficulty that the probability generating function $g_H(u)$ for H has no simple expression in terms of elementary functions (analogous to the expression $(1 - p + pu)^n$ for $g_K(u)$). Indeed, the hypergeometric distribution gets its name from the fact that

$$g_H(u) = \sum_{0 \le h \le n} \binom{a}{h}\binom{b}{n-h} u^h \Big/ \binom{a+b}{n}$$

is a “hypergeometric function.” It is, however, true that

$$\Pr[H \ge (p+q)n] \le \left(\left(\frac{p}{p+q}\right)^{p+q} \left(\frac{1-p}{1-p-q}\right)^{1-p-q}\right)^n. \tag{4}$$

The bound (4) was first proved by Hoeffding [8] in 1963. In 1979 Chvátal [5] derived
(4) by direct manipulation of the binomial coefficients appearing in the sum expressing
Pr[H ≥ ( p + q)n]. Hoeffding, however, obtained (4) as a special case of a much more
general result concerning “sampling from a finite population.” Suppose that the N
balls initially in the urn have not colors (red or blue) but rather real values c1 , . . . , c N .
We can draw n balls with replacement, obtaining n values Y1 , . . . , Yn , and look at the
distribution of their sum Tn = Y1 + · · · + Yn . Alternatively, we can draw n ≤ N balls
without replacement, obtaining n values X 1 , . . . , X n , and look at the distribution of
their sum Sn = X 1 + · · · + X n . The original urn is of course the special case “red = 1,
blue = 0,” with K = Tn and H = Sn .
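Bounds (3) and (4) can be compared against exact tail probabilities for a small urn. The sketch below uses only math.comb from the Python standard library. The urn sizes and thresholds are hypothetical examples. The bound is written with θ = p + q, so θn is the integer threshold.

```python
from math import comb

def binomial_tail(n, p, t):
    """Pr[K >= t] for K ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

def hypergeometric_tail(a, b, n, t):
    """Pr[H >= t], H = number of red balls among n drawn without replacement from a red and b blue."""
    return sum(comb(a, h) * comb(b, n - h) for h in range(t, n + 1)) / comb(a + b, n)

def chernoff_bound(p, theta, n):
    """Common right-hand side of (3) and (4), written with theta = p + q, where p < theta < 1."""
    return ((p / theta) ** theta * ((1 - p) / (1 - theta)) ** (1 - theta)) ** n

a, b, n = 30, 70, 20                 # hypothetical urn: p = 0.3, so pn = 6
p = a / (a + b)
for t in range(7, n):                # thresholds t = (p + q)n with 0 < q < 1 - p
    print(t, hypergeometric_tail(a, b, n, t), binomial_tail(n, p, t), chernoff_bound(p, t / n, n))
```

In this example both exact tails sit below the common bound, and the without-replacement tail is typically the smaller of the two, in line with the heuristic above.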
We can easily obtain a large-deviation bound for $T_n$ analogous to (3). Since $T_n$ may assume non-integral values, it will be convenient to use “moment generating functions” rather than probability generating functions. Let $M_Y(r) = \operatorname{Ex}[e^{rY}] = (e^{rc_1} + \cdots + e^{rc_N})/N$ be the common moment generating function for each of the $Y_i$. Then $M_Y(r)$ is also the common moment generating function for each of the $X_i$. Since draws with replacement yield independent values, $T_n$ has the moment generating function $M_{T_n}(r) = M_Y(r)^n$. If $\mu = (c_1 + \cdots + c_N)/N$ denotes the common expectation of the $Y_i$ and $\nu > 0$, then we obtain, in analogy to (2), for any $r \ge 0$,

$$\Pr[T_n \ge (\mu+\nu)n] \le \left(\frac{M_Y(r)}{e^{r(\mu+\nu)}}\right)^n. \tag{5}$$

We can again optimize this bound by differentiating with respect to r, obtaining a bound that depends on n in a simple exponential way.




If we try to obtain a bound analogous to (5) for $S_n$, we again encounter the difficulty that the moment generating function $M_{S_n}(s)$ is not the nth power of a single function, but depends on n in a complicated way. But Hoeffding [8] showed that we do have

$$\Pr[S_n \ge (\mu+\nu)n] \le \left(\frac{M_Y(s)}{e^{s(\mu+\nu)}}\right)^n. \tag{6}$$

This generalization of (4) lies beyond the reach of the combinatorial argument of
Chvátal [5]. Our main goal in this note is to give a simple and vivid argument in-
volving urns and balls yielding the bound (6) (including (4) as a special case).

2. MARTINGALE COUPLINGS. If V and W are two random variables, a coupling of V and W is a random variable (V̂, Ŵ) such that the marginal distribution of V̂ is
the same as the distribution of V and the marginal distribution of Ŵ is the same as
the distribution of W . Such a coupling defines versions of V and W on a common
probability sample space, so that we can, for example, refer to conditional distributions
of one with respect to the other. Since our ultimate interest is in the distributions, rather
than the random variables themselves, we shall drop the “hats” from (V̂ , Ŵ ) and write
simply (V, W ).
Our first step will be to construct a coupling (Sn , Tn ) between Sn and Tn . We begin
with two urns. The first urn, X , contains N balls, x1 , . . . , x N . Each of these balls is
labeled with its number; that is, ball xi is labeled i. Balls will be drawn from urn X
without replacement. The second urn, Y , contains N balls, y1 , . . . , y N . Each of these
balls is initially unlabeled, but will eventually be assigned a label. Balls will be drawn
from urn Y with replacement.
We now perform a sequence of steps as follows. In the course of these steps we
shall define a bijective map ξ : {1, . . . , N } → {1, . . . , N } and a surjective map η :
{1, 2, . . .} → {1, . . . , N }. At each step, we draw a ball from urn Y . If the ball drawn is
still unlabeled, we draw a ball from urn X , assign the label of the ball drawn from urn
X to the ball drawn from urn Y , and then replace the ball drawn from urn Y in urn Y .
If the ball drawn from urn Y has already been assigned a label, we simply replace it in
urn Y . Since, with probability one, every ball in urn Y will eventually be drawn, every
ball in urn Y will eventually be assigned a label.
We define ξ(i) to be the label of the ith ball drawn from urn X . Since every ball
from X is eventually drawn, and balls are drawn from X without replacement, ξ is a
permutation of {1, . . . , N }. We define η(i) to be the label assigned to the ball drawn
from urn Y at the ith step (either during the ith step or at some previous step). Since
each of the labels 1, . . . , N is eventually assigned to one of the balls in urn Y , η maps
{1, 2, . . .} onto {1, . . . , N }. Furthermore, for each i we have

η(i) = ξ( j) (7)

for some j ≤ i.
The process just described defines a random variable (ξ, η) for which ξ is uniformly
distributed over all permutations of {1, . . . , N } and η is a sequence η(1), η(2), . . . of
independent random variables, each uniformly distributed over {1, . . . , N }.
Let c1, . . . , cN be real numbers. We shall define the random variables X1, . . . , XN by Xi = cξ(i) for 1 ≤ i ≤ N, and the random variables Y1, Y2, . . . by Yi = cη(i) for i ≥ 1. This definition creates a coupling between the sequence X1, . . . , XN, which is distributed as a random sample without replacement from the population c1, . . . , cN,



and the sequence Y1 , Y2 , . . . , which is distributed as a sequence of independent random
samples with replacement from the same population.
Let n be an integer in the range 1 ≤ n ≤ N . We define Sn = X 1 + · · · + X n and
Tn = Y1 + · · · + Yn . This definition creates a coupling between Sn , which is distributed
as the sum of a random sample of size n without replacement from the population
c1, . . . , cN, and Tn, which is distributed as the sum of a random sample of size n with
replacement from the same population.
A pair (V, W ) of random variables is a martingale if Ex[W | V ] = V . Our next
step will be to show that (Sn , Tn ) is a martingale, that is, that

Ex[Tn | Sn ] = Sn .

If Sn = s, then cξ(1) + · · · + cξ(n) = s, and ξ(1), . . . , ξ(n) is equally likely to be any of the sequences satisfying this constraint. Since any permutation of such a sequence is again such a sequence, we have

Ex[cξ(i) | Sn = s] = s/n (8)

for 1 ≤ i ≤ n. Now

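$$
\begin{aligned}
\operatorname{Ex}[T_n \mid S_n = s] &= \operatorname{Ex}[Y_1 \mid S_n = s] + \cdots + \operatorname{Ex}[Y_n \mid S_n = s] \\
&= \operatorname{Ex}[c_{\eta(1)} \mid S_n = s] + \cdots + \operatorname{Ex}[c_{\eta(n)} \mid S_n = s].
\end{aligned}
\tag{9}
$$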

Since by (7) each η(i) for 1 ≤ i ≤ n is equal to one of the ξ(1), . . . , ξ(n), each of the
n terms in (9) is equal by (8) to s/n, and thus Ex[Tn | Sn = s] = s.
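Because the two-urn process is concrete, the martingale property can be confirmed exactly for a tiny population. The sketch below (function names hypothetical, labels 0, ..., N − 1) enumerates the first n steps with exact rational probabilities. It checks three things:

- ξ(1), ..., ξ(n) has the claimed marginal law.
- η(1), ..., η(n) has the claimed marginal law.
- Ex[T_n | S_n = s] = s for every value s.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

def coupling_outcomes(N, n):
    """Exactly enumerate the first n steps of the two-urn process (labels 0, ..., N-1).

    Yields (probability, xi, eta) with xi = (xi(1), ..., xi(n)) and eta = (eta(1), ..., eta(n)).
    Entries of xi not yet drawn after n steps are completed by uniform draws without
    replacement from what is left in urn X, as happens later in the process.
    """
    def step(i, labels, xi, eta, prob):
        if i == n:
            rest = [j for j in range(N) if j not in xi]
            tails = list(permutations(rest, n - len(xi)))
            for tail in tails:
                yield prob / len(tails), xi + tail, eta
            return
        for ball in range(N):                        # draw a ball from urn Y uniformly
            if labels[ball] is None:                 # unlabeled: draw a fresh label from urn X
                rest = [j for j in range(N) if j not in xi]
                for lab in rest:
                    new = labels[:ball] + (lab,) + labels[ball + 1:]
                    yield from step(i + 1, new, xi + (lab,), eta + (lab,), prob / (N * len(rest)))
            else:                                    # already labeled: record its label
                yield from step(i + 1, labels, xi, eta + (labels[ball],), prob / N)

    yield from step(0, (None,) * N, (), (), Fraction(1))

def marginals(N, n):
    """Exact laws of (xi(1), ..., xi(n)) and of (eta(1), ..., eta(n))."""
    xi_law, eta_law = defaultdict(Fraction), defaultdict(Fraction)
    for prob, xi, eta in coupling_outcomes(N, n):
        xi_law[xi] += prob
        eta_law[eta] += prob
    return xi_law, eta_law

def conditional_mean_of_T(c, n):
    """Map each value s of S_n to Ex[T_n | S_n = s], with c the population values."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for prob, xi, eta in coupling_outcomes(len(c), n):
        s = sum(c[j] for j in xi)
        mass[s] += prob
        weighted[s] += prob * sum(c[j] for j in eta)
    return {s: weighted[s] / mass[s] for s in mass}

cond = conditional_mean_of_T([0, 1, 5], 2)
print(all(value == s for s, value in cond.items()))   # martingale check for this population
```

Exact fractions are used so that the martingale identity is checked as an equality rather than up to rounding.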
A twice differentiable function f : R → R is convex if its second derivative is nonnegative. A convex function satisfies Jensen’s inequality, in the form $\int f(t)\,dF(t) \ge f\big(\int t\,dF(t)\big)$ (see Hardy, Littlewood, and Pólya [7]). This inequality may be interpreted as saying that an average of function values is at least as large as the function value at the corresponding average of the arguments. If (S, T) is a martingale coupling, then we have

$$
\begin{aligned}
\operatorname{Ex}[f(T)] &= \int\!\!\int f(t)\,dF_{T|S=s}(t)\,dF_S(s) \\
&\ge \int f\Big(\int t\,dF_{T|S=s}(t)\Big)\,dF_S(s) \\
&= \int f(\operatorname{Ex}[T \mid S = s])\,dF_S(s) \\
&= \int f(s)\,dF_S(s) = \operatorname{Ex}[f(S)],
\end{aligned}
$$

where $F_S(s) = \Pr[S \le s]$ is the probability distribution function for S, and $F_{T|S=s}(t) = \Pr[T \le t \mid S = s]$ is the conditional probability distribution function for T, given that S = s. This inequality will allow us to transfer the bound (5) from $T_n$ to $S_n$, because $(S_n, T_n)$ is a martingale coupling and the bound (5) is based on the moment generating function, which is an expectation of a convex function (the exponential function). That is, since $t \mapsto e^{rt}$ is convex for any real r, we have

$$M_{S_n}(r) \le M_{T_n}(r).$$



Thus we obtain

$$\Pr[S_n \ge (\mu+\nu)n] \le \frac{M_{S_n}(r)}{e^{r(\mu+\nu)n}} \le \frac{M_{T_n}(r)}{e^{r(\mu+\nu)n}} = \left(\frac{M_Y(r)}{e^{r(\mu+\nu)}}\right)^n,$$

which completes the proof of (6).
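Two consequences can be spot-checked by exhaustive enumeration for a small population: the inequality M_{S_n}(r) ≤ M_{T_n}(r), and the resulting bound (6). The values in c below are a hypothetical example. The minimization over r uses a crude grid rather than the exact optimizer, and the function names are illustrative.

```python
import math
from itertools import permutations

def mgf_with_replacement(c, n, r):
    """M_{T_n}(r) = M_Y(r)**n for n draws with replacement from the population c."""
    return (sum(math.exp(r * x) for x in c) / len(c)) ** n

def mgf_without_replacement(c, n, r):
    """M_{S_n}(r), averaged over all ordered samples of n distinct balls."""
    samples = list(permutations(c, n))
    return sum(math.exp(r * sum(s)) for s in samples) / len(samples)

def tail_without_replacement(c, n, threshold):
    """Pr[S_n >= threshold], computed exactly by enumeration."""
    samples = list(permutations(c, n))
    return sum(1 for s in samples if sum(s) >= threshold) / len(samples)

def bound_6(c, n, nu, r):
    """Right-hand side of (6) at the parameter value r >= 0."""
    mu = sum(c) / len(c)
    return (mgf_with_replacement(c, 1, r) / math.exp(r * (mu + nu))) ** n

c = [0.0, 1.0, 1.5, 2.0, 4.0, 7.0]     # hypothetical population values
n, nu = 3, 1.0
mu = sum(c) / len(c)
threshold = (mu + nu) * n
best = min(bound_6(c, n, nu, k / 20) for k in range(1, 61))   # crude grid over r
print(tail_without_replacement(c, n, threshold), best)         # exact tail vs. bound (6)
for r in (-1.0, 0.5, 1.5):
    print(r, mgf_without_replacement(c, n, r), mgf_with_replacement(c, n, r))
```

For n = 3 the bound is far from tight, as expected of an exponential bound at small sample sizes; the point is only that it holds.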


The proof in this section is taken from the first author’s Bachelor’s Thesis [9], where
other constructions of martingale couplings can be found.

3. STOCHASTIC ORDER. The key to the proofs in the preceding section was the
construction of a martingale coupling. Consider the following two conditions relating
random variables S and T .
(1) There exists a martingale coupling between S and T (that is, there is a random
variable ( Ŝ, T̂ ) such that Ŝ has the same distribution as S, T̂ has the same
distribution as T , and ( Ŝ, T̂ ) is a martingale, that is, Ex[T̂ | Ŝ] = Ŝ).
(2) For any function f : R → R, if f is convex, then Ex[ f (S)] ≤ Ex[ f (T )] (pro-
vided the expectations exist).
We have used the fact that (1) implies (2), which follows immediately from Jensen’s
inequality. But in fact the converse, (2) implies (1), also holds. This converse is usually
ascribed to Strassen [14], who gave the first statement in full generality. But precursors
to this result are due to Hardy, Littlewood and Pólya [6, 7] and Blackwell [2, 3]. A
particularly simple proof has been given by Müller and Rüschendorf [10].
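A toy instance makes the implication from (1) to (2) concrete. In this hypothetical example (not one taken from the note), $\hat S$ is $\pm 1$ with equal probability and $\hat T = \hat S + Z$, where $Z$ is independent of $\hat S$ and uniform on $\{-2, 0, 2\}$. The sketch stores the joint law exactly, confirms the martingale property $\mathrm{Ex}[\hat T \mid \hat S] = \hat S$, and then compares $\mathrm{Ex}[f(\hat S)]$ with $\mathrm{Ex}[f(\hat T)]$ for a few convex test functions. Checking finitely many functions illustrates condition (2) but of course does not establish it.

```python
from fractions import Fraction
from itertools import product
from math import exp

# Hypothetical joint law of (S_hat, T_hat): S_hat = +1 or -1 with probability 1/2 each,
# T_hat = S_hat + Z with Z independent of S_hat and uniform on {-2, 0, 2}.
joint = {}
for s, z in product((-1, 1), (-2, 0, 2)):
    joint[(s, s + z)] = joint.get((s, s + z), Fraction(0)) + Fraction(1, 6)

def conditional_mean_of_T(s_value):
    """Ex[T_hat | S_hat = s_value], computed from the joint law."""
    mass = sum(p for (s, _), p in joint.items() if s == s_value)
    return sum(p * t for (s, t), p in joint.items() if s == s_value) / mass

def expect(g, coordinate):
    """Ex[g(S_hat)] if coordinate == 0, Ex[g(T_hat)] if coordinate == 1."""
    return sum(p * g(pair[coordinate]) for pair, p in joint.items())

def square(x):
    return x * x

def hinge(x):
    return max(x - 1, 0)

convex_tests = {"x^2": square, "|x|": abs, "exp": exp, "max(x - 1, 0)": hinge}

print(*(conditional_mean_of_T(s) for s in (-1, 1)))  # prints: -1 1
for name, f in convex_tests.items():
    print(name, float(expect(f, 0)), "<=", float(expect(f, 1)))
```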
The equivalent conditions (1) and (2) can be used to define a “stochastic order”
between random variables (or their probability distributions). We say that “S is at
most as variable as T ” if one (and therefore also the other) of these conditions holds.
This partial order has been used by economists to compare assessments of risk (see
Rothschild and Stiglitz [11, 12, 13]) and inequality of incomes (see Atkinson [1]).

ACKNOWLEDGMENT. This research was partially supported by NSF Grant CCF 0917026.

REFERENCES

1. A. B. Atkinson, On the measurement of inequality, J. Econ. Theory 2 (1970) 244–263.


2. D. Blackwell, Comparison of experiments, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Edited by J. Neyman. University of California Press, Berkeley, CA, 1951, 93–102.
3. D. Blackwell, Equivalent comparisons of experiments, Ann. Math. Stat. 24 (1953) 265–272.
4. H. Chernoff, A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of obser-
vations, Ann. Math. Stat. 23 (1952) 493–507.
5. V. Chvátal, The tail of the hypergeometric distribution, Discr. Math. 25 (1979) 285–287.
6. G. H. Hardy, J. E. Littlewood, G. Pólya, Some simple inequalities satisfied by convex functions, Mess. of
Math. 58 (1929) 145–152.
7. G. H. Hardy, J. E. Littlewood, G. Pólya, Inequalities. Cambridge University Press, London, 1934.
8. W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Stat. Assoc. 58 (1963) 13–30.
9. K. Luh, Martingale Couplings and Bounds on the Tails of Probability Distributions. B.S. Thesis, Department of Mathematics, Harvey Mudd College, May 2011.

May 2014] NOTES 453


10. A. Müller, L. Rüschendorf, On the optimal stopping values induced by general independence structures,
J. Appl. Prob. 38 (2001) 672–684.
11. M. Rothschild, J. E. Stiglitz, Increasing risk: I. A definition, J. Econ. Theory 2 (1970) 225–243.
12. M. Rothschild, J. E. Stiglitz, Increasing risk: II. Its economic consequences, J. Econ. Theory 3 (1971) 66–84.
13. M. Rothschild, J. E. Stiglitz, Addendum to ‘Increasing risk: I. A definition’, J. Econ. Theory 5 (1972) 306.
14. V. Strassen, The existence of probability measures with given marginals, Ann. Math. Stat. 36 (1965)
423–439.

Department of Mathematics, Yale University, New Haven, CT 06511


kyle.luh@yale.edu

Department of Mathematics, Harvey Mudd College, Claremont, CA 91711


njp@math.hmc.edu

100 Years ago this Month in The American Mathematical Monthly


Edited by Vadim Ponomarenko
At the recent meeting of the Ohio Association of Mathematics and Science
Teachers, a number of teachers of mathematics discussed the advisability of
forming a mathematics section under the Ohio Academy of Sciences. Another
suggestion was that the group of teachers concerned petition the council of the
American Mathematical Society, of which many of them are members, for per-
mission to organize a Junior Section of the Society for the purpose of conducting
meetings at which papers of less formal character and of more direct bearing on
mathematical teaching than is usual at regular meetings of the Society could be
presented and discussed. Such meetings would be of value both to those who
find it too expensive and inconvenient to attend regular meetings of the Soci-
ety as now constituted, and to those whose interests lie chiefly in the field of
mathematical teaching. It was further suggested that notices and reports might
be published in the American Mathematical Monthly, and that this journal might
be made an official organ of such junior sections in case the plan were eventually
recognized and approved by the Society.
[The Mathematical Association of America was founded shortly thereafter. —Eds.]

Excerpted from News and Notes, Amer. Math. Monthly 21 (1914) 172.




PROBLEMS AND SOLUTIONS
Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West
with the collaboration of Itshak Borosh, Paul Bracken, Ezra A. Brown, Randall
Dougherty, Tamás Erdélyi, Zachary Franco, Christian Friesen, Ira M. Gessel, László
Lipták, Frederick W. Luttmann, Vania Mascioni, Frank B. Miles, Richard Pfiefer,
Dave Renfro, Cecil C. Rousseau, Leonard Smiley, Kenneth Stolarsky, Richard Stong,
Walter Stromquist, Daniel Ullman, Charles Vanden Eynden, Sam Vandervelde, and
Fuzhen Zhang.

Proposed problems and solutions should be sent in duplicate to the MONTHLY
problems address on the back of the title page. Proposed problems should never
be under submission concurrently to more than one journal. Submitted solutions
should arrive before September 30, 2014. Additional information, such as gen-
eralizations and references, is welcome. The problem number and the solver’s
name and address should appear on each solution. An asterisk (*) after the num-
ber of a problem or a part of a problem indicates that no solution is currently
available.

PROBLEMS
11775. Proposed by Isaac Sofair, Fredericksburg, VA. Let $A_1, \dots, A_k$ be finite sets.
For $J \subseteq \{1, \dots, k\}$, let $N_J = \left|\bigcup_{j\in J} A_j\right|$, and let $S_m = \sum_{J:\,|J|=m} N_J$.
(a) Express in terms of $S_1, \dots, S_k$ the number of elements that belong to exactly $m$ of the sets $A_1, \dots, A_k$.
(b) Same question as in (a), except that we now require the number of elements belonging to at least $m$ of the sets $A_1, \dots, A_k$.
11776. Proposed by David Beckwith, Sag Harbor, NY. Given urns U1 , . . . , Un in a line,
and plenty of identical blue and identical red balls, let an be the number of ways to put
balls into the urns subject to the conditions that
(i) each urn contains at most one ball,
(ii) any urn containing a red ball is next to exactly one urn containing a blue ball, and
(iii) no two urns containing a blue ball are adjacent.
(a) Show that

$$
\sum_{n=0}^{\infty} a_n t^n = \frac{1 + t + 2t^2}{1 - t - t^2 - 3t^3}.
$$

(b) Show that


$$
a_n = \sum_{j\ge 0}\sum_{m\ge 0} 4^j\left[\binom{n-2m}{j}\binom{m}{j} + \binom{n-2m-1}{j}\binom{m}{j} + 2\binom{n-2m}{j}\binom{m-1}{j}\right].
$$
Here, $\binom{k}{l} = 0$ if $k < l$.
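Both parts can be sanity-checked by brute force before one attempts a proof. The sketch below is our own check, not part of the problem: it encodes each urn as empty (E), blue (B), or red (R), enumerates all $3^n$ fillings for small $n$, and compares the counts with the coefficients of the series in (a), computed from the recurrence $a_n = a_{n-1} + a_{n-2} + 3a_{n-3} + c_n$ implied by the denominator (with $c_n$ the coefficient of $t^n$ in the numerator), and with the double sum in (b) under the stated convention for binomial coefficients. Agreement for small $n$ is evidence, not a proof.

```python
from itertools import product
from math import comb

def brute_force(n):
    """Count fillings of n urns ('E' empty, 'B' blue, 'R' red) obeying rules (i)-(iii)."""
    count = 0
    for urns in product("EBR", repeat=n):  # rule (i): at most one ball per urn
        valid = True
        for i, u in enumerate(urns):
            neighbors = [urns[j] for j in (i - 1, i + 1) if 0 <= j < n]
            if u == "B" and "B" in neighbors:  # rule (iii)
                valid = False
            if u == "R" and neighbors.count("B") != 1:  # rule (ii)
                valid = False
        count += valid
    return count

def series_coefficients(length):
    """First coefficients of (1 + t + 2t^2) / (1 - t - t^2 - 3t^3)."""
    numerator = [1, 1, 2]
    a = []
    for n in range(length):
        value = numerator[n] if n < len(numerator) else 0
        value += sum(c * a[n - d] for d, c in ((1, 1), (2, 1), (3, 3)) if n >= d)
        a.append(value)
    return a

def binom(k, l):
    """Binomial coefficient with the convention binom(k, l) = 0 when k < l."""
    return comb(k, l) if k >= l else 0

def formula_b(n):
    """The double sum of part (b); terms with m > n or j > n vanish."""
    total = 0
    for m in range(n + 1):
        for j in range(n + 1):
            total += 4**j * (binom(n - 2 * m, j) * binom(m, j)
                             + binom(n - 2 * m - 1, j) * binom(m, j)
                             + 2 * binom(n - 2 * m, j) * binom(m - 1, j))
    return total

print([brute_force(n) for n in range(8)])  # [1, 2, 5, 10, 21, 46, 97, 206]
print(series_coefficients(8))
print([formula_b(n) for n in range(8)])
```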

http://dx.doi.org/10.4169/amer.math.monthly.121.05.455



11777. Proposed by Marian Dincă, Bucharest, Romania. Let $x_1, \dots, x_n$ be real numbers such that $\prod_{k=1}^{n} x_k = 1$. Prove that
$$
\sum_{k=1}^{n} \frac{x_k^2}{x_k^2 - 2x_k\cos(2\pi/n) + 1} \ge 1.
$$

11778. Proposed by Li Zhou, Polk State College, Winter Haven, FL. Let $x, y, z$ be positive real numbers such that $x + y + z = \pi/2$. Let $f(x, y, z) = 1/(\tan^2 x + 4\tan^2 y + 9\tan^2 z)$. Prove that
$$
f(x, y, z) + f(y, z, x) + f(z, x, y) \le \frac{9}{14}\left(\tan^2 x + \tan^2 y + \tan^2 z\right).
$$

11779. Proposed by Michel Bataille, Rouen, France. Let $M$, $A$, $B$, $C$, and $D$ be distinct points (in any order) on a circle $\Gamma$ with center $O$. Let the medians through $M$ of triangles $MAB$ and $MCD$ cross lines $AB$ and $CD$ at $P$ and $Q$, respectively, and meet $\Gamma$ again at $E$ and $F$, respectively. Let $K$ be the intersection of $AF$ with $DE$, and let $L$ be the intersection of $BF$ with $CE$. Let $U$ and $V$ be the orthogonal projections of $C$ onto $MA$ and $D$ onto $MB$, respectively, and assume $U \ne A$ and $V \ne B$. Prove that $A$, $B$, $U$, and $V$ are concyclic if and only if $O$, $K$, and $L$ are collinear.

[Figure: the configuration, showing the circle $\Gamma$ with center $O$ and the points $A, B, C, D, E, F, K, L, M, P, Q, U, V$.]

11780. Proposed by Cezar Lupu, University of Pittsburgh, Pittsburgh, PA, and Tudorel
Lupu, Decebal High School, Constanţa, Romania. Let f be a positive-valued, concave
function on [0, 1]. Prove that
$$
\frac{3}{4}\left(\int_0^1 f(x)\,dx\right)^2 \le \frac{1}{8} + \int_0^1 f^3(x)\,dx.
$$

11781. Proposed by Roberto Tauraso, Università di Roma “Tor Vergata”, Rome, Italy.
For n ≥ 2, call a positive integer n-smooth if none of its prime factors is larger than
n. Let Sn be the set of all n-smooth positive integers. Let C be a finite, nonempty
set of nonnegative integers, and let a and d be positive integers. Let M be the set
of all positive integers of the form $m = \sum_{k=1}^{d} c_k s_k$, where $c_k \in C$ and $s_k \in S_n$ for
$k = 1, \dots, d$. Prove that there are infinitely many primes $p$ such that $p^a \notin M$.

SOLUTIONS

Integrals with Bernoulli Numbers


11644 [2012, 426]. Proposed by Albert Stadler, Herrliberg, Switzerland. Let $n$ be a
nonnegative integer, and let $B_j$ be the $j$th Bernoulli number, defined for $j \ge 0$ by




$x/(e^x - 1) = \sum_{k=0}^{\infty} B_k x^k/k!$. Let
$$
I_n = \int_0^{\infty}\left(\frac{1}{x^n(e^x - 1)} - \left(\frac{1}{x^n} + \sum_{k=0}^{n} B_k\frac{x^{k-n-1}}{k!}\right)e^{-x}\right)dx.
$$

Prove that $I_0 = \gamma - 1$, that $I_1 = 1 - (1/2)\log(2\pi)$, and that for $n \ge 1$,
$$
I_{2n} = \frac{B_{2n}}{(2n)!}\left(\log(2\pi) + \gamma\right) + (-1)^n\frac{2\zeta'(2n)}{(2\pi)^{2n}} + \frac{1}{2(2n-1)!}H_{2n-1} - \sum_{k=0}^{n-1}\frac{B_{2k}}{(2k)!}\cdot\frac{H_{2n-2k}}{(2n-2k)!},
$$
and that for $n \ge 1$,
$$
I_{2n+1} = (-1)^n\frac{\zeta(2n+1)}{2(2\pi)^{2n}} - \frac{1}{2(2n)!}H_{2n} + \sum_{k=0}^{n}\frac{B_{2k}}{(2k)!}\cdot\frac{H_{2n+1-2k}}{(2n+1-2k)!}.
$$
Here, $H_n$ denotes $\sum_{k=1}^{n} 1/k$, $\zeta$ denotes the Riemann zeta function, and $\gamma$ is Euler's constant.
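Solution by the proposer. Note that
$$
\frac{e^x}{x^n(e^x - 1)} = \frac{1}{x^n} + \sum_{k=0}^{n} B_k\frac{x^{k-n-1}}{k!} + O(1)
$$
in a neighborhood of $x = 0$. Define
$$
f_n(s) = \int_0^{\infty} x^{s-1}\left(\frac{1}{x^n(e^x - 1)} - \left(\frac{1}{x^n} + \sum_{k=0}^{n} B_k\frac{x^{k-n-1}}{k!}\right)e^{-x}\right)dx.
$$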

The integral converges absolutely for $\operatorname{Re} s > 0$, uniformly on every compact subset of the half-plane $\operatorname{Re} s > 0$. Therefore, $f_n(s)$ is analytic in $\operatorname{Re} s > 0$. Since $I_n = f_n(1)$, we compute $f_n(1)$.
If $\operatorname{Re} s > n$, then
$$
f_n(s) = \Gamma(s-n)\zeta(s-n) - \Gamma(s-n) - \sum_{k=0}^{n}\frac{B_k}{k!}\Gamma(s+k-n-1). \tag{1}
$$
k!

Note that (1) represents the analytic continuation of $f_n(s)$ as a meromorphic function in the whole complex plane. Also, since $f_n$ is analytic in $\operatorname{Re} s > 0$, the residues of this continuation at $s \in \{1, \dots, n\}$ all vanish.
We now take note of some well-known facts about the gamma and zeta functions.
If $m$ is a positive integer, then in a neighborhood of $s = 1$ we have
$$
\Gamma(s-m) = \frac{\Gamma(s)}{(s-1)(s-2)\cdots(s-m)} = \frac{(-1)^{m-1}}{(m-1)!}\left(\frac{1}{s-1} - \gamma + H_{m-1} + O(s-1)\right). \tag{2}
$$
By considering the residue of $f_n$ at 1 (for $n \ge 1$), we have
$$
0 = \frac{(-1)^{n-1}}{(n-1)!}\zeta(1-n) - \frac{(-1)^{n-1}}{(n-1)!} - \sum_{k=0}^{n}\frac{B_k}{k!}\cdot\frac{(-1)^{n-k}}{(n-k)!}.
$$



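To evaluate the last sum, we used
$$
x e^{-x} = \left(\sum_{k=0}^{\infty} B_k\frac{x^k}{k!}\right)\left(1 - e^{-x}\right) = \left(\sum_{k=0}^{\infty} B_k\frac{x^k}{k!}\right)\left(\sum_{k=1}^{\infty}\frac{(-1)^{k+1}x^k}{k!}\right)
$$
and compared coefficients of $x^n$. Therefore,
$$
\zeta(1-n) = (-1)^{n-1}\frac{B_n}{n}. \tag{3}
$$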
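Using the functional equation
$$
\zeta(1-s) = 2(2\pi)^{-s}\cos\frac{\pi s}{2}\,\Gamma(s)\zeta(s), \tag{4}
$$
we get
$$
2^{1-n}\pi^{-n}\cos\frac{\pi n}{2}\,\Gamma(n)\zeta(n) = (-1)^{n-1}\frac{B_n}{n}
$$
and
$$
\zeta(2n) = (-1)^{n-1}\frac{B_{2n}}{2(2n)!}(2\pi)^{2n}.
$$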
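Both identities lend themselves to a quick check. The sketch below is our own (the helper names are illustrative): it generates $B_0, \dots, B_m$ from the recurrence $\sum_{k=0}^{j}\binom{j+1}{k}B_k = 0$ for $j \ge 1$, which follows from $x = (e^x - 1)\sum_k B_k x^k/k!$, verifies the coefficient comparison behind (3) in exact rational arithmetic, and compares the formula for $\zeta(2n)$ with a partial sum of the defining series plus an Euler–Maclaurin tail estimate. The values $\zeta(1-n)$ themselves cannot be checked this way, since the series diverges there.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli(m):
    """B_0, ..., B_m with x/(e^x - 1) = sum B_k x^k / k!, so B_1 = -1/2."""
    B = [Fraction(1)]
    for j in range(1, m + 1):
        # sum_{k=0}^{j} C(j+1, k) B_k = 0, solved for B_j
        B.append(-sum(comb(j + 1, k) * B[k] for k in range(j)) / (j + 1))
    return B

def coefficient_identity_holds(n, B):
    """Coefficient of x^n in (sum B_k x^k/k!)(sum_{k>=1} (-1)^(k+1) x^k/k!) versus x e^{-x}."""
    lhs = sum(B[k] / factorial(k) * Fraction((-1) ** (n - k + 1), factorial(n - k)) for k in range(n))
    rhs = Fraction((-1) ** (n - 1), factorial(n - 1))
    return lhs == rhs

def zeta_from_bernoulli(n, B):
    """zeta(2n) = (-1)^(n-1) B_{2n} (2 pi)^(2n) / (2 (2n)!)."""
    return (-1) ** (n - 1) * float(B[2 * n]) * (2 * pi) ** (2 * n) / (2 * factorial(2 * n))

def zeta_direct(s, N=100_000):
    """Partial sum of sum k^(-s), plus the tail estimate N^(1-s)/(s-1) - N^(-s)/2."""
    return sum(k ** -s for k in range(1, N + 1)) + N ** (1 - s) / (s - 1) - 0.5 * N ** -s

B = bernoulli(12)
print(all(coefficient_identity_holds(n, B) for n in range(1, 13)))
for n in range(1, 7):
    print(2 * n, zeta_from_bernoulli(n, B), zeta_direct(2 * n))
```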
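Now
$$
\begin{aligned}
I_0 = f_0(1) &= \lim_{s\to 1}\left(\Gamma(s)\zeta(s) - \Gamma(s) - \Gamma(s-1)\right) = -1 + \lim_{s\to 1}\frac{\Gamma(s)(s-1)\zeta(s) - \Gamma(s)}{s-1}\\
&= -1 + \lim_{s\to 1}\frac{1}{s-1}\Big[\left(1 - \gamma(s-1) + O((s-1)^2)\right)\left(1 + \gamma(s-1) + O((s-1)^2)\right)\\
&\qquad\qquad - \left(1 - \gamma(s-1) + O((s-1)^2)\right)\Big] = \gamma - 1.
\end{aligned}
$$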


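For $n \ge 1$,
$$
\begin{aligned}
I_n &= \int_0^{\infty}\left(\frac{1}{x^n(e^x-1)} - \left(\frac{1}{x^n} + \sum_{k=0}^{n} B_k\frac{x^{k-n-1}}{k!}\right)e^{-x}\right)dx = f_n(1)\\
&= \lim_{s\to 1}\left(\Gamma(s-n)\zeta(s-n) - \Gamma(s-n) - \sum_{k=0}^{n}\frac{B_k}{k!}\Gamma(s+k-n-1)\right)\\
&= \frac{(-1)^{n-1}}{(n-1)!}(-\gamma + H_{n-1})\zeta(1-n) + \frac{(-1)^{n-1}}{(n-1)!}\zeta'(1-n)\\
&\qquad - \frac{(-1)^{n-1}}{(n-1)!}(-\gamma + H_{n-1}) - \sum_{k=0}^{n}\frac{B_k}{k!}\cdot\frac{(-1)^{n-k}}{(n-k)!}(-\gamma + H_{n-k})\\
&= \frac{(-1)^{n-1}}{(n-1)!}\left(\zeta(1-n) - 1\right)H_{n-1} + \frac{(-1)^{n-1}}{(n-1)!}\zeta'(1-n) - \sum_{k=0}^{n-1}\frac{B_k}{k!}\cdot\frac{(-1)^{n-k}}{(n-k)!}H_{n-k}\\
&= \frac{B_n}{n!}H_{n-1} - \frac{(-1)^{n-1}}{(n-1)!}H_{n-1} + \frac{(-1)^{n-1}}{(n-1)!}\zeta'(1-n) - \sum_{k=0}^{n-1}\frac{B_k}{k!}\cdot\frac{(-1)^{n-k}}{(n-k)!}H_{n-k},
\end{aligned}
$$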




where we have used (2), the fact that the residue at 1 of $f_n$ is zero (so that the terms involving $\gamma$ cancel), (3), and $H_0 = 0$. To get from here to the required formulas, we will need to relate the values of $\zeta'$ at negative integers to values of $\zeta$ and $\zeta'$ at positive integers.
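We have $\zeta(0) = -1/2$. From (4) we deduce
$$
-\frac{\zeta'(1-s)}{\zeta(1-s)} = -\log(2\pi) - \frac{\pi}{2}\tan\frac{\pi s}{2} + \frac{\Gamma'(s)}{\Gamma(s)} + \frac{\zeta'(s)}{\zeta(s)}.
$$
In a neighborhood of $s = 1$,
$$
\frac{\pi}{2}\tan\frac{\pi s}{2} = \frac{-1}{s-1} + O(s-1), \qquad \frac{\Gamma'(s)}{\Gamma(s)} = -\gamma + O(s-1),
$$
and
$$
\frac{\zeta'(s)}{\zeta(s)} = \frac{-1}{s-1} + \gamma + O(s-1),
$$
so
$$
-\frac{\zeta'(0)}{\zeta(0)} = -\log(2\pi), \qquad \zeta'(0) = -\frac{1}{2}\log(2\pi).
$$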
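We have
$$
\frac{\Gamma'(s+n)}{\Gamma(s+n)} = \frac{1}{s+n-1} + \frac{1}{s+n-2} + \cdots + \frac{1}{s+1} + \frac{\Gamma'(s+1)}{\Gamma(s+1)}.
$$
Setting $s = 0$ and using $\Gamma'(1) = -\gamma$, we get
$$
\frac{\Gamma'(n)}{\Gamma(n)} = H_{n-1} - \gamma, \qquad \Gamma'(n) = (n-1)!\,(H_{n-1} - \gamma).
$$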
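From (4), we deduce
$$
\begin{aligned}
-\zeta'(1-s) &= -2\log(2\pi)(2\pi)^{-s}\cos\frac{\pi s}{2}\,\Gamma(s)\zeta(s) - 2(2\pi)^{-s}\,\frac{\pi}{2}\sin\frac{\pi s}{2}\,\Gamma(s)\zeta(s)\\
&\quad + 2(2\pi)^{-s}\cos\frac{\pi s}{2}\,\Gamma'(s)\zeta(s) + 2(2\pi)^{-s}\cos\frac{\pi s}{2}\,\Gamma(s)\zeta'(s).
\end{aligned}
$$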
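For $n \ge 2$ (so that $\zeta(n)$ is finite), let $Z_n = \zeta'(1-n)(2\pi)^n/(2(n-1)!)$. We then have
$$
Z_n = \log(2\pi)\cos\frac{\pi n}{2}\,\zeta(n) + \frac{\pi}{2}\sin\frac{\pi n}{2}\,\zeta(n) - \cos\frac{\pi n}{2}\,(H_{n-1} - \gamma)\zeta(n) - \cos\frac{\pi n}{2}\,\zeta'(n).
$$
Thus for odd $n$, $Z_n = \frac{\pi}{2}(-1)^{(n-1)/2}\zeta(n)$, while for even $n$,
$$
\begin{aligned}
Z_n &= (-1)^{n/2}\left(\left(\log(2\pi) - H_{n-1} + \gamma\right)\zeta(n) - \zeta'(n)\right)\\
&= \frac{B_n(2\pi)^n}{2(n!)}\left(-\log(2\pi) + H_{n-1} - \gamma\right) - (-1)^{n/2}\zeta'(n).
\end{aligned}
$$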
We thus conclude:
$$
I_0 = \gamma - 1, \qquad I_1 = f_1(1) = \zeta'(0) + B_0 H_1 = 1 - \frac{1}{2}\log(2\pi),
$$
2



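and for $n \ge 1$, using the fact that $B_1 = -1/2$ while $B_{2k+1} = 0$ for $k > 0$,
$$
\begin{aligned}
I_{2n} &= \frac{1}{2(2n-1)!}H_{2n-1} + \frac{B_{2n}}{(2n)!}\left(\log(2\pi) + \gamma\right) + (-1)^n\frac{2\zeta'(2n)}{(2\pi)^{2n}} - \sum_{k=0}^{n-1}\frac{B_{2k}}{(2k)!}\cdot\frac{H_{2n-2k}}{(2n-2k)!},\\
I_{2n+1} &= -\frac{1}{2(2n)!}H_{2n} + (-1)^n\frac{\zeta(2n+1)}{2(2\pi)^{2n}} + \sum_{k=0}^{n}\frac{B_{2k}}{(2k)!}\cdot\frac{H_{2n+1-2k}}{(2n+1-2k)!}.
\end{aligned}
$$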

Also solved by B. Burdick.
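The closed forms can be compared with direct numerical quadrature. The sketch below is our own check, not the proposer's method: it evaluates the integral defining $I_n$ with a composite midpoint rule on $[0, 40]$ (after the poles cancel, the integrand is smooth at 0, and the neglected tail is of order $e^{-40}$) and compares the result with the claimed values of $I_0$, $I_1$, and $I_2$ (the case $n = 1$ of the formula for $I_{2n}$). For $I_2$ it needs $\zeta'(2)$, which it approximates by $-\sum_{k\ge 2}\log k/k^2$ with an integral estimate of the tail. The hard-coded value of $\gamma$, the cutoff, and the step size are assumptions of the sketch.

```python
from fractions import Fraction
from math import comb, exp, expm1, factorial, log, pi

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded for this sketch

def bernoulli(m):
    """B_0, ..., B_m with x/(e^x - 1) = sum B_k x^k / k!, so B_1 = -1/2."""
    B = [Fraction(1)]
    for j in range(1, m + 1):
        # sum_{k=0}^{j} C(j+1, k) B_k = 0, solved for B_j
        B.append(-sum(comb(j + 1, k) * B[k] for k in range(j)) / (j + 1))
    return B

def integrand(x, n, coeffs):
    """1/(x^n (e^x - 1)) - (1/x^n + sum_k coeffs[k] x^(k-n-1)) e^(-x), with coeffs[k] = B_k/k!."""
    subtract = x ** (-n) + sum(c * x ** (k - n - 1) for k, c in enumerate(coeffs))
    return 1.0 / (x**n * expm1(x)) - subtract * exp(-x)

def I_numeric(n, upper=40.0, h=1e-3):
    """Composite midpoint rule on [0, upper]; the midpoints never hit x = 0."""
    coeffs = [float(b) / factorial(k) for k, b in enumerate(bernoulli(n))]
    steps = round(upper / h)
    return h * sum(integrand((i + 0.5) * h, n, coeffs) for i in range(steps))

def zeta_prime(s, N=100_000):
    """zeta'(s) = -sum_{k>=2} log(k)/k^s for real s > 1, with an integral estimate of the tail."""
    head = sum(log(k) / k**s for k in range(2, N + 1))
    tail = N ** (1 - s) * (log(N) / (s - 1) + 1 / (s - 1) ** 2) - log(N) / N**s / 2
    return -(head + tail)

claimed = {
    0: EULER_GAMMA - 1,
    1: 1 - 0.5 * log(2 * pi),
    # I_{2n} with n = 1: (B_2/2!)(log 2pi + gamma) - 2 zeta'(2)/(2pi)^2 + H_1/(2 * 1!) - H_2/2!
    2: (log(2 * pi) + EULER_GAMMA) / 12 - 2 * zeta_prime(2) / (2 * pi) ** 2 + 0.5 - 0.75,
}
for n, value in claimed.items():
    print(n, I_numeric(n), value)
```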

An $\ell^p$ Inequality
11649 [2012, 522]. Proposed by Grahame Bennett, Indiana University, Bloomington, IN. Let $p$ be real with $p > 1$. Let $(x_0, x_1, \dots)$ be a sequence of nonnegative real numbers. Prove that
$$
\sum_{j=0}^{\infty}\left(\sum_{k=0}^{\infty}\frac{x_k}{j+k+1}\right)^{p} < \infty \quad\Longrightarrow\quad \sum_{j=0}^{\infty}\left(\frac{1}{j+1}\sum_{k=0}^{j}x_k\right)^{p} < \infty.
$$

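Solution by Oliver Geupel, Brühl, NRW, Germany. For every nonnegative integer $j$, since each $x_k \ge 0$ and $j + k + 1 \le 2j + 1$ for $k \le j$, we have
$$
\frac{1}{j+1}\sum_{k=0}^{j}x_k \le \frac{2j+1}{j+1}\sum_{k=0}^{j}\frac{x_k}{j+k+1} \le 2\sum_{k=0}^{\infty}\frac{x_k}{j+k+1}.
$$
If $p > 0$, then $x^p$ strictly increases with $x$ on the interval $[0, \infty)$. Thus, raising both sides of this inequality to the $p$th power and summing both sides over $j$ yields
$$
\sum_{j=0}^{\infty}\left(\frac{1}{j+1}\sum_{k=0}^{j}x_k\right)^{p} \le 2^p\sum_{j=0}^{\infty}\left(\sum_{k=0}^{\infty}\frac{x_k}{j+k+1}\right)^{p}.
$$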

The proof also shows that the restriction on p can be relaxed to p > 0.
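The termwise inequality holds for every $j$, so it survives truncation and can be illustrated on a finite example. The sketch below (our own, with an arbitrary random sequence) sets $x_k = 0$ beyond the first 200 terms, truncates the outer sums at $j < 300$, and compares the two sides of the summed inequality for several exponents, including $p = 1/2$ to reflect the remark that $p > 0$ suffices.

```python
import random

def averaged(x, j):
    """(1/(j+1)) * sum_{k<=j} x_k, with x_k = 0 beyond the end of the list."""
    return sum(x[: j + 1]) / (j + 1)

def hilbert(x, j):
    """sum_k x_k / (j + k + 1) over the finitely many nonzero terms."""
    return sum(xk / (j + k + 1) for k, xk in enumerate(x))

def both_sides(x, p, terms):
    """Truncated sum_j averaged^p and 2^p sum_j hilbert^p, over j < terms."""
    lhs = sum(averaged(x, j) ** p for j in range(terms))
    rhs = 2**p * sum(hilbert(x, j) ** p for j in range(terms))
    return lhs, rhs

random.seed(0)
x = [random.random() for _ in range(200)]
for p in (0.5, 1.0, 1.5, 3.0):
    lhs, rhs = both_sides(x, p, terms=300)
    print(p, lhs, rhs, lhs <= rhs)
```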
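Editorial comment. Kenneth F. Andersen remarked that, conversely, since $x_k/(j+k+1) \le x_k/(j+1)$ for $k \le j$ and $x_k/(j+k+1) \le x_k/(k+1)$ for $k \ge j$, and since $(a + b)^p \le 2^p(a^p + b^p)$ for $a, b \ge 0$, it follows that
$$
\sum_{j=0}^{\infty}\left(\sum_{k=0}^{\infty}\frac{x_k}{j+k+1}\right)^{p} \le 2^p\left[\sum_{j=0}^{\infty}\left(\frac{1}{j+1}\sum_{k=0}^{j}x_k\right)^{p} + \sum_{j=0}^{\infty}\left(\sum_{k=j}^{\infty}\frac{x_k}{k+1}\right)^{p}\right].
$$
The convergence of the two series on the right-hand side implies convergence of $\sum_{j=0}^{\infty}\left(\sum_{k=0}^{\infty}x_k/(j+k+1)\right)^p$. See Hardy's discussion of Hilbert's Double Series Theorem (Hardy–Littlewood–Pólya, Inequalities, Cambridge University Press, 1967, Ch. 9).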
Also solved by K. F. Andersen (Canada), R. Bagby, P. P. Dályay (Hungary), E. A. Herman, F. Holland (Ireland),
B. Karaivanov, O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers (Netherlands), M. Omarjee (France), P. Perfetti
(Italy), M. A. Prasad (India), A. Stenger, R. Stong, R. Tauraso (Italy), T. Viteam (Chile), and the proposer.




A Double Integral
11650 [2012, 522]. Proposed by Michael Becker, University of South Carolina at Sumter, Sumter, SC. Evaluate
$$
\int_{x=0}^{\infty}\int_{y=x}^{\infty} e^{-(x-y)^2}\sin^2(x^2+y^2)\,\frac{x^2-y^2}{(x^2+y^2)^2}\,dy\,dx.
$$

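Solution by Jan A. Van Casteren, University of Antwerp, Antwerp, Belgium. As preparation, we evaluate the following integral for $\sigma > 0$:
$$
\begin{aligned}
\int_0^{\infty}\frac{\sin^2\rho}{\rho}e^{-2\sigma\rho}\,d\rho &= \int_0^{\infty}\frac{\sin^2(\rho/2)}{\rho}e^{-\sigma\rho}\,d\rho = \frac{1}{2}\int_0^{\infty}\int_{\sigma}^{\infty}e^{-\tau\rho}\,d\tau\,(1-\cos\rho)\,d\rho\\
&= \frac{1}{2}\int_{\sigma}^{\infty}\int_0^{\infty}e^{-\tau\rho}(1-\cos\rho)\,d\rho\,d\tau = \frac{1}{2}\int_{\sigma}^{\infty}\left(\frac{1}{\tau} - \frac{\tau}{1+\tau^2}\right)d\tau\\
&= \frac{1}{2}\log\frac{(1+\sigma^2)^{1/2}}{\sigma}.
\end{aligned}
$$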
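The preparatory formula can be checked numerically for a few values of $\sigma$. In the sketch below (our own check), a composite midpoint rule is applied on $[0, 40/\sigma]$, beyond which the factor $e^{-2\sigma\rho}$ is below $e^{-80}$; the step size is an assumption chosen for an accuracy of roughly $10^{-7}$.

```python
from math import exp, log, sin, sqrt

def prep_integral_numeric(sigma, h=1e-3):
    """Midpoint rule for int_0^inf sin(rho)^2/rho * exp(-2 sigma rho) d rho on [0, 40/sigma]."""
    upper = 40.0 / sigma  # beyond this point the weight exp(-2 sigma rho) is below exp(-80)
    steps = round(upper / h)
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * h
        total += sin(rho) ** 2 / rho * exp(-2 * sigma * rho)
    return h * total

def prep_integral_closed_form(sigma):
    """(1/2) log((1 + sigma^2)^(1/2) / sigma)."""
    return 0.5 * log(sqrt(1 + sigma**2) / sigma)

for sigma in (0.25, 0.5, 1.0, 2.0):
    print(sigma, prep_integral_numeric(sigma), prep_integral_closed_form(sigma))
```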
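Now for the integral $J$ of the problem: passing first to polar coordinates via $x = r\cos\varphi$, $y = r\sin\varphi$, we compute
$$
\begin{aligned}
J &= \int_0^{\infty}\int_x^{\infty} e^{-(x-y)^2}\sin^2(x^2+y^2)\,\frac{x^2-y^2}{(x^2+y^2)^2}\,dy\,dx\\
&= \int_0^{\infty}\int_{\pi/4}^{\pi/2} e^{-r^2+2r^2\sin\varphi\cos\varphi}\sin^2(r^2)\,\frac{\cos^2\varphi-\sin^2\varphi}{r}\,d\varphi\,dr\\
&= \int_0^{\infty}\int_{\pi/4}^{\pi/2} e^{-r^2+r^2\sin 2\varphi}\cos 2\varphi\,d\varphi\,\frac{\sin^2(r^2)}{r}\,dr\\
&= -\frac{1}{2}\int_0^{\infty}\frac{1-e^{-r^2}}{r^2}\cdot\frac{\sin^2(r^2)}{r}\,dr \qquad (\text{substitute } \rho = r^2)\\
&= -\frac{1}{4}\int_0^{\infty}\frac{1-e^{-\rho}}{\rho}\cdot\frac{\sin^2\rho}{\rho}\,d\rho = -\frac{1}{2}\int_0^{1/2}\int_0^{\infty}\frac{\sin^2\rho}{\rho}e^{-2\sigma\rho}\,d\rho\,d\sigma \qquad \left(\text{since } \frac{1-e^{-\rho}}{\rho} = 2\int_0^{1/2}e^{-2\sigma\rho}\,d\sigma\right)\\
&= -\frac{1}{4}\int_0^{1/2}\log\frac{(1+\sigma^2)^{1/2}}{\sigma}\,d\sigma \qquad (\text{integrate by parts})\\
&= -\frac{1}{4}\left(\frac{1}{2}\log\frac{(1+(1/2)^2)^{1/2}}{1/2} + \arctan\frac{1}{2}\right) = -\frac{1}{16}\log 5 - \frac{1}{4}\arctan\frac{1}{2}.
\end{aligned}
$$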

Also solved by K. F. Andersen (Canada), D. Anderson (Ireland), R. Bagby, D. H. Bailey (U.S.) & J. M. Borwein
(Australia), M. Benito, Ó. Ciaurri, E. Fernández & L. Roncal (Spain), K. N. Boyadzhiev, M. A. Carlton, R.
Chapman (U. K.), H. Chen, B. E. Davis, S. de Luxán (Spain), E. S. Eyeson, C. Georghiou (Greece), O. Geupel
(Germany), M. L. Glasser, J. A. Grzesik, A. Guetter & I. Roussos, E. A. Herman, F. Holland (Ireland), B.
Karaivanov, O. Kouba (Syria), K. D. Lathrop, K.-W. Lau (China), O. P. Lossers (Netherlands), J. Magliano,
T. L. McCoy, M. Omarjee (France), P. Perfetti (Italy), M. A. Prasad (India), I. Rusodimos, R. Stong, R. Tauraso
(Italy), T. Trif (Romania), D. B. Tyler, E. I. Verriest, J. Vinuesa (Spain), M. Vowe (Switzerland), J. Wan
(Australia), H. Wang & J. Wojdylo, GCHQ Problem Solving Group (U. K.), NSA Problems Group, and the
proposer.
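The value of $J$ can also be confirmed numerically from the one-dimensional form $J = -\frac{1}{4}\int_0^{\infty}(1 - e^{-\rho})\sin^2\rho/\rho^2\,d\rho$ obtained in the solution above. The sketch below (our own check, not part of the published solution) integrates on $[0, L]$ with the midpoint rule and approximates the tail beyond $L$ by $1/(2L) + \sin(2L)/(4L^2)$, which comes from writing $\sin^2\rho = (1 - \cos 2\rho)/2$ and integrating by parts once; the cutoff $L$ and step $h$ are choices of the sketch.

```python
from math import atan, expm1, log, sin

def reduced_integral(L=1000.0, h=0.002):
    """K = int_0^inf (1 - e^{-rho}) sin(rho)^2 / rho^2 d rho, so that J = -K/4.

    Midpoint rule on [0, L]; the tail beyond L is approximated by
    1/(2L) + sin(2L)/(4L^2), from sin^2 = (1 - cos 2 rho)/2 and one
    integration by parts (the e^{-rho} term is negligible there).
    """
    steps = round(L / h)
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * h
        total += -expm1(-rho) * sin(rho) ** 2 / rho**2  # -expm1(-rho) = 1 - e^{-rho}
    return h * total + 1 / (2 * L) + sin(2 * L) / (4 * L**2)

closed_form = -log(5) / 16 - atan(0.5) / 4
print(-reduced_integral() / 4, closed_form)
```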



A Binomial Determinant
11652 [2012, 522–523]. Proposed by Ajai Choudhry, Foreign Service Institute, New Delhi, India. For $a, b, c, d \in \mathbb{R}$, and for nonnegative integers $i$, $j$, and $n$, let
$$
t_{i,j} = \sum_{s=0}^{i}\binom{n-i}{j-s}\binom{i}{s}a^{n-i-j+s}b^{j-s}c^{i-s}d^{s}.
$$
Let $T(a, b, c, d, n)$ be the $(n+1)$-by-$(n+1)$ matrix with $(i, j)$-entry given by $t_{i,j}$, for $i, j \in \{0, \dots, n\}$. Show that $\det T(a, b, c, d, n) = (ad - bc)^{n(n+1)/2}$.
Solution by Omran Kouba, Higher Institute for Applied Sciences and Technology, Damascus, Syria. Let $E$ denote the vector space $\mathbb{R}_n[x]$ of real polynomials with degree at most $n$, and let $B$ denote the canonical basis $\{1, x, x^2, \dots, x^n\}$ of $E$. Consider the linear transformations $V$ and $T_{\lambda,\mu}$ from $E$ to $E$ defined by $V(P(x)) = x^n P(1/x)$ and $T_{\lambda,\mu}(P(x)) = P(\lambda x + \mu)$, where $(\lambda, \mu) \in \mathbb{R}^2$.
For a linear transformation $T$ from $E$ to $E$, let $\det(T)$ denote the determinant of the matrix of $T$ with respect to $B$. Since the matrices of $V$ and $T_{\lambda,\mu}$ with respect to $B$ are
$$
\begin{pmatrix}
0 & \cdots & \cdots & 0 & 1\\
0 & \cdots & 0 & 1 & 0\\
\vdots & & & & \vdots\\
0 & 1 & 0 & \cdots & 0\\
1 & 0 & \cdots & \cdots & 0
\end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix}
1 & \mu & * & \cdots & *\\
0 & \lambda & * & \cdots & *\\
0 & 0 & \lambda^2 & \cdots & *\\
\vdots & & & \ddots & \vdots\\
0 & \cdots & \cdots & 0 & \lambda^n
\end{pmatrix},
$$
we obtain $\det(V) = (-1)^{n(n+1)/2}$ and $\det(T_{\lambda,\mu}) = \lambda^{n(n+1)/2}$.


Now consider $(a, b, c, d) \in \mathbb{R}^4$ with $b \ne 0$, and let $U$ be the linear transformation defined by $U = T_{b,a} \circ V \circ T_{c-ad/b,\,d/b}$. We have
$$
\det(U) = \det(T_{b,a})\det(V)\det(T_{c-ad/b,\,d/b}) = (ad - bc)^{n(n+1)/2}. \qquad (*)
$$


On the other hand, for $0 \le i \le n$,
$$
U(x^i) = (a + bx)^{n-i}(c + dx)^i = \sum_{j=0}^{n}\left(\sum_{s\ge 0}\binom{n-i}{j-s}\binom{i}{s}a^{n-i-j+s}b^{j-s}c^{i-s}d^{s}\right)x^j = \sum_{j=0}^{n} t_{i,j}x^j.
$$
Thus, the matrix of $U$ with respect to $B$ is the transpose of the matrix $T(a, b, c, d, n)$.
Using $(*)$, we obtain
$$
\det(T(a, b, c, d, n)) = \det(U) = (ad - bc)^{n(n+1)/2}.
$$
The case $b = 0$ follows by continuity.
Also solved by D. Beckwith, R. Chapman (U. K.), P. P. Dályay (Hungary), B. Karaivanov, P. Lima-Filho,
M. Omarjee (France), M. A. Prasad (India), J. H. Smith, J. H. Steelman, R. Stong, M. Wildon (U. K.), GCHQ
Problem Solving Group (U. K.), and the proposer.
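The identity is easy to test in exact arithmetic. The sketch below (our own; the function names are illustrative) builds $T(a, b, c, d, n)$ directly from the definition of $t_{i,j}$, computes its determinant by Gaussian elimination over the rationals, and compares the result with $(ad - bc)^{n(n+1)/2}$, including a case with $b = 0$ and a case with $ad - bc = 0$.

```python
from fractions import Fraction
from math import comb

def binom(k, l):
    """Binomial coefficient, taken to be 0 outside 0 <= l <= k."""
    return comb(k, l) if 0 <= l <= k else 0

def t_entry(a, b, c, d, n, i, j):
    """t_{i,j} = sum_{s=0}^{i} C(n-i, j-s) C(i, s) a^{n-i-j+s} b^{j-s} c^{i-s} d^s."""
    total = 0
    for s in range(i + 1):
        coeff = binom(n - i, j - s) * binom(i, s)
        if coeff:  # all exponents below are nonnegative whenever coeff != 0
            total += coeff * a ** (n - i - j + s) * b ** (j - s) * c ** (i - s) * d**s
    return total

def determinant(rows):
    """Exact determinant by Gaussian elimination with row swaps over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    size, det = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size):
                m[r][k] -= factor * m[col][k]
    return det

def det_and_claim(a, b, c, d, n):
    T = [[t_entry(a, b, c, d, n, i, j) for j in range(n + 1)] for i in range(n + 1)]
    return determinant(T), (a * d - b * c) ** (n * (n + 1) // 2)

for params in [(2, 3, 5, 7, 4), (1, 0, 2, 3, 3), (2, -1, 1, 4, 5), (1, 2, 2, 4, 3)]:
    print(params, *det_and_claim(*params))
```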




REVIEWS
Edited by Jeffrey Nunemacher
Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015

Math on Trial: How Mathematics Is Used and Abused in the Courtroom. By Leila Schneps
and Coralie Colmez, Basic Books, New York, 2013, xi+256 pp., ISBN 978-0-465-03292-1,
$26.99.

Reviewed by Daniel Ullman

The study of mathematics is often regarded as ideal undergraduate preparation for
the study of law, since both disciplines require that practitioners reason carefully, pay
precise attention to detail, and identify essential elements in complex situations. In
both mathematics and the law, argument is used to deduce new conclusions from basic
principles, or, put in a more pragmatic way, to reduce desired conclusions to accepted
principles. In both mathematics and the law, reference to prior work is necessary in
order not to have to argue every point from first principles. In mathematics, we have
theorems from the literature; in law, we have precedent. The axiom systems that we
favor in mathematics are not the only ones worthy of careful study, but we agree to ac-
cept them for most purposes as fundamental to our conception of reality. Similarly, the
Constitution that we have adopted in America is not the only way to establish law, but
it is one that we agree to accept as fundamental to our conception of a nation of laws.
And yet the cultures of mathematics and law could not be more different. Where
law is emotional and engaged, mathematics is dispassionate and detached. A legal
argument implores the listener to agree; a mathematical argument impels the listener
to agree. The modes of persuasion available to the litigator range far and wide, in-
cluding appeals to emotion, bias, authority, and suggestion. The mathematician, in
her turn, can appeal only to axioms (or to results already shown to follow from those
axioms). Lawyers attempt to alter opinions, whereas mathematicians hold opinions in
utter disdain.
So there is a curious clash of cultures when mathematicians find themselves in a
court of law. Accustomed to dealing with truth and fallacy, the mathematician acting
as expert witness may find her testimony used more for its emotional power than for its
correctness. Often, juries may find themselves more impressed with a mathematician’s
credentials than with her explanations, which they might scarcely understand. When
a mathematician overreaches and expounds on values, policy, or guilt, the jury may
ascribe more weight to such remarks than is warranted. Knowing this, lawyers are
inclined to feature mathematicians in a kind of abuse of authority.
The culture clash can lead to awkward interchanges in the courtroom. Will Kazez
of the University of Georgia tells the story of his stint as an expert witness in a case
involving a triangular parcel of land. In response to the lawyer’s directive “And tell the
court, Dr. Kazez, are you familiar with the theorem of Pythagoras?”, Kazez planned
the delicately sarcastic reply “Well, your honor, I don’t mean to brag, but yes, I am
familiar with the theorem of Pythagoras.”

http://dx.doi.org/10.4169/amer.math.monthly.121.05.463



I heard once about a mathematician brought to a courtroom as an expert witness
on the theory of infinite series. The case involved a landlord/tenant dispute in which
a subsidized rent was on a sliding scale depending on the income of the tenant. To
complicate matters, the tenant received a government subsidy depending on the size of
the rent. The landlord argued that the additional income that the tenant secured owing
to the rent justified an increase in the rent. This increase in rent would of course justify
an increase in the government subsidy, and a never-ending spiral of increases in rent
and subsidy would ensue. Along came the mathematician hero to propose a solution
to the conundrum: pass to the limit.
I received a phone message a few weeks ago from a Washington DC lawyer I did not
know. The lawyer explained that he was looking for an expert who could testify that a
certain fact pattern was “less likely than being struck by lightning”. Not being any sort
of expert on lightning, I didn’t return the call. But two thoughts did come to my mind.
The first was that the unlikely fact pattern that the lawyer had in mind was probably the
one that had actually occurred, in which case its probability (in retrospect) is equal to
unity. The second was that fancy law firms probably pay expert witnesses generously.
How certain must a juror be to vote for a criminal conviction? The phrase “proof
beyond a reasonable doubt”, which expresses the familiar standard, grates on the
mathematical ear. How much doubt is reasonable? Judges refuse to say. Besides, proof
in the mathematical sense is meant to be beyond all doubt. Proof leads to absolute
certainty (allegedly, anyway). Like uniqueness and pregnancy, proof does not accept
modifying phrases gracefully. While it must be acknowledged that absolute certainty
is never possible in the real (non-mathematical) world, we don’t solve any problem
by using a phrase like “proof beyond a reasonable doubt” while refusing to define it.
Here is what I would propose: Jurors will be instructed to regard something as proved
beyond a reasonable doubt if their personal assessment of the probability of the truth
of the assertion exceeds some predetermined value, say 98%. To measure whether this
standard is achieved, the jurors are asked if they would be willing to risk $49, to be lost if the
assertion were revealed to be false, against a $1 gain if the assertion were revealed
to be true. This remains a subjective judgement, as it must be, but at least the degree
of certainty expected is quantified. Whether 98% is the right number is a matter of
policy, not of mathematics, but leaving this number unspecified relegates the standard
of certainty to a vague “pretty sure”.
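The betting test turns a pair of stakes into a probability threshold. A juror who risks $R$ dollars if the assertion is false, in exchange for $G$ dollars if it is true, breaks even at the personal probability $p$ with $pG = (1 - p)R$, that is, $p = R/(R + G)$. A minimal sketch with the stakes from the text (the 3-to-2 stakes for a 60% standard are our own example):

```python
from fractions import Fraction

def break_even_probability(risk, gain):
    """Probability p at which p * gain - (1 - p) * risk = 0, i.e. p = risk / (risk + gain)."""
    risk, gain = Fraction(risk), Fraction(gain)
    return risk / (risk + gain)

print(break_even_probability(49, 1))                  # 49/50, the proposed 98% standard
print(break_even_probability(Fraction(101, 100), 1))  # 101/201, preponderance with epsilon = 1/100
print(break_even_probability(3, 2))                   # 3/5, a 60% standard
```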
The standard for winning in court in many civil matters is “by a preponderance of
the evidence”, which is to say “more likely than not”. In contrast to “reasonable doubt”,
this standard is clearly defined, and the test is a willingness to accept a bet at even odds
. . . or, rather, to be more precise, for some ε > 0, a bet in which one risks 1 + ε dollars
against a gain of one dollar. Still, we wonder whether some circumstances might best
be subject to, say, a 60% standard. We could call it “beyond a reasonable filibuster.”
The only place we expect to find absolute certainty is within mathematics. Yet it
must be said that proof, even in the mathematical sense, often does seem to leave some
room for uncertainty. The announcements of the proofs of the Four Color Theorem
and Fermat’s Last Theorem did not immediately extinguish all doubt. Thomas Hales’s
proof of the Kepler conjecture demonstrates that it is true, but is it undoubtedly true?
And if there remains doubt, is that doubt “reasonable”? The announcement in 1980
of the classification of finite simple groups most certainly did not settle that question
definitively. In 2004, at his AMS Invited Address at the Joint Mathematics Meetings
in Phoenix, Michael Aschbacher described his 1300-page manuscripts plugging a hole
in what had been regarded as the complete proof. At the conclusion of the talk, an
audience member asked Aschbacher whether he now felt certain that the classification
was in fact correct. As I recall, he responded, “I’d bet my car on it, but I wouldn’t bet




my house on it.” So much for certainty. Perhaps “proof beyond a reasonable doubt”
exists within mathematics after all.
Mathematicians brought into the judicial system would be wise to heed the same
warning that is given to mathematicians who work in the policy arena: Limit your
pronouncements to the mathematical facts and leave the interpretation of those facts
to others. It is appropriate, for example, to describe the rate of planetary temperature
increase and to explain how this rate is determined from data. But it is dangerous to
assert then that, based on mathematics, we should implement a cap-and-trade system
for containing greenhouse gases. In fact, it is best to avoid the word “should” in its
entirety. Mathematics does not tell us that we “should” do anything. What we should
do can be informed by mathematics, but what we should do also involves values on
which mathematics is utterly ill-equipped to comment. When mathematicians cross
the line and become advocates for or against certain policies (or defendants), they
threaten their position as trusted and dispassionate arbiters of indisputable facts.
The clash of cultures between mathematics and the law emerges clearly in Math on
Trial. The authors, Leila Schneps and her daughter Coralie Colmez, have divided the
book into ten chapters, each of which focuses on a classical mathematical error and a
particular legal case. For example, the first chapter is entitled “Math Error Number 1:
Multiplying Non-independent Probabilities” and subtitled “The Case of Sally Clark:
Motherhood Under Attack”. Here’s a quick synopsis: Sally Clark gave birth to two
children in the 1990s who died as infants. Double crib deaths happen only very rarely,
and Sally was (therefore?) accused of murdering the children. The estimations of the
probability of double crib deaths occurring simply “at random” played a pivotal role
in the trial. The jury was persuaded that this probability was minuscule and, on that
basis, . . . well, there is more to the story, but I won’t give it all away here.
In another chapter focusing on gender bias in graduate admissions at Berkeley in
the 1970s, mathematics is used to exculpate the institution. It is shown how Simpson’s
paradox can lead to a seemingly impossible circumstance in which each department
admits a percentage of female applicants into its graduate program that is greater
than the percentage of male applicants who are admitted, and yet, taken together, the
overall percentage of applicants admitted to graduate programs is lower for women
than for men. In effect, this is what was happening at Berkeley. The lower admissions
rate for women across all Berkeley departments collectively resulted principally from
the fact that women were applying in disproportionate numbers to more competitive
graduate programs.
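The mechanism can be seen with two departments and hypothetical applicant counts (invented for illustration; these are not the Berkeley figures). In the sketch below, women are admitted at a higher rate than men in each department, but most women apply to the more competitive department, so their overall admission rate is lower.

```python
from fractions import Fraction

# Hypothetical (admitted, applied) counts, invented for illustration only.
departments = {
    "less competitive": {"women": (8, 10), "men": (70, 100)},
    "more competitive": {"women": (20, 100), "men": (1, 10)},
}

def department_rate(dept, group):
    admitted, applied = departments[dept][group]
    return Fraction(admitted, applied)

def overall_rate(group):
    admitted = sum(d[group][0] for d in departments.values())
    applied = sum(d[group][1] for d in departments.values())
    return Fraction(admitted, applied)

for dept in departments:
    print(dept, float(department_rate(dept, "women")), float(department_rate(dept, "men")))
print("overall", float(overall_rate("women")), float(overall_rate("men")))
```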
The book is beautifully written. The story telling is powerful. The pages turn swiftly
and easily. While the stories involve courtroom trials, they are not delivered in the form
of tales of suspense; instead, the outcomes of these trials are strongly foreshadowed in
the narrative. As soon as Sally Clark is introduced, the reader discerns that she is in-
nocent of wrongdoing but about to be wrapped up in a travesty of a trial and convicted
unfairly. The key expert witness, not a mathematician but a pediatrician, is quickly
seen to be a charming but self-promoting charlatan with nearly hypnotic power over
the jury. Breaking the suspense in this way allows the focus to return to the theme,
which is mathematical error.
The book is also carefully researched. The authors studied the ten cases with great
care, poring over books, scholarly articles, courtroom transcripts, and newspaper
archives. In some cases, they interviewed individuals involved in the trials. They
point to a substantial literature on the topic of mathematics and statistics at trial. In
particular, they mention the work of Laurence Tribe, the well-known Harvard Law
professor, who studied mathematics as an undergraduate but who generally argues
against permitting mathematical testimony at trial.



At the top of the front cover of the book appears the sentence: “When math be-
comes a matter of life and death, you’d better check your sums.” In fact, there is no
summing at all in the book, and this glib sentence gives away that the book targets a
non-mathematical audience. Worse, the sentence perpetuates a myth among the
ignorant that mathematicians are people who “do sums”. A more apt sentence would
replace “sums” with “reasoning”. In any case, the sentence is corny and belies the
seriousness and the engaging exposition one finds when one opens the book.
Math on Trial is about mathematical errors. Unfortunately, the book makes a num-
ber of mathematical errors of its own. On page 1, we are told, “If you multiply the
probabilities of events that are not independent of each other, you will get a signifi-
cantly smaller probability than is accurate.” No, not always smaller. On page 62, we
read, “Suppose you are given a coin and told that it is of one of two types: either fair
and balanced or weighted to come up heads 70% of the time. You are allowed one toss,
and it falls on heads. Let us . . . investigate the probability that the coin is biased after
this result.” They calculate the answer, despite the fact that the question is nonsensical
without an a priori distribution on the two coins. In chapter 5, we are told that if the
probability of a single DNA sample matching a target sample is 1/n, then the probability
of finding a match among m random samples is m/n. This is false, though approximately
correct when m ≪ n; it must be very wrong when m > n, of course, since then m/n exceeds 1.
And on page 190, we read that the probability of throwing 3 sixes in 6 rolls of a die
is 20/216. Not quite. The probability is (20/216)(5/6)^3. It may be that “throwing at
least 3 sixes in 6 rolls” was intended, but 20/216 isn’t the correct value for that either.
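Each of the reviewer's corrections can be reproduced in a few lines of exact arithmetic. The sketch below (our own) shows dependent events whose joint probability is smaller than the product of their probabilities, so that multiplying overestimates; the dependence of the coin posterior on the prior probability of the biased coin, which the book's question leaves unspecified; the exact DNA-match probability $1 - (1 - 1/n)^m$, which is close to $m/n$ only for $m \ll n$; and the exact dice probabilities.

```python
from fractions import Fraction
from math import comb

DIE = range(1, 7)

def pr(event):
    """Probability of an event (a predicate) for one roll of a fair die."""
    return Fraction(sum(1 for x in DIE if event(x)), 6)

def is_even(x):
    return x % 2 == 0

def is_odd(x):
    return x % 2 == 1

def both(x):
    return is_even(x) and is_odd(x)

def posterior_biased(prior, p_heads_biased=Fraction(7, 10), p_heads_fair=Fraction(1, 2)):
    """Pr[biased | one toss shows heads] by Bayes' rule; it depends on the prior."""
    numerator = prior * p_heads_biased
    return numerator / (numerator + (1 - prior) * p_heads_fair)

def dna_match_probability(m, n):
    """Pr[at least one match among m samples], each matching independently with probability 1/n."""
    return 1 - (1 - Fraction(1, n)) ** m

def prob_sixes(k, rolls=6):
    """Probability of exactly k sixes in the given number of rolls of a fair die."""
    return comb(rolls, k) * Fraction(1, 6) ** k * Fraction(5, 6) ** (rolls - k)

exactly_three = prob_sixes(3)
at_least_three = sum(prob_sixes(k) for k in range(3, 7))

print(pr(both), pr(is_even) * pr(is_odd))  # prints: 0 1/4
print([posterior_biased(Fraction(q, 10)) for q in (1, 5, 9)])
print(float(dna_match_probability(10, 1000)), float(dna_match_probability(2000, 1000)))
print(exactly_three, at_least_three, Fraction(20, 216))
```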
The irony is that these errors won’t matter much to the non-mathematical readers,
who, like most jurors faced with mathematical testimony, won’t question any of the
assertions or identify any errors. They will merely take in the broad conclusions pro-
fessed by the authors, whom they will identify as people of substantial intellect and
with whom they will therefore be inclined to agree. In some cases, this can lead to the
conviction of innocent defendants. Indeed, the main point that the judicial enterprise
must be vigilant against the misuse of mathematics is not diminished by any mathematical
errors in Math on Trial.
In the preface, Schneps and Colmez refer to “mathematics’ disastrous record of
causing judicial error”. I take exception to this characterization. It may well be true
that there is a long history of legal argument inspired by misapplied, misinterpreted,
misused mathematics. But mathematics itself is not to blame. Whatever disasters are
on the record do not belong to mathematics. The culprit is not the use but the abuse
of mathematics. It is critically important to permit sound mathematics and science to
inform legal proceedings. But the translation from facts in the mathematical world to
conclusions in the real world is fraught, and it is best left in the hands of those who
are positioned to hold values, biases, and opinions. Mathematics remains untarnished.
Pythagoras has nothing to be ashamed of.

The George Washington University, Washington, DC 20052


dullman@gwu.edu




EDITOR’S ENDNOTES

We received several communications regarding “Yet a simpler proof of the chain rule,”
by Haryono Tandra in Vol. 120, No. 10, 2013, p. 900. David Salmon from the Univer-
sity of Oregon commented as follows:
I see no difference between this proof and that provided in “Introductory Complex
Analysis”, pp. 46–47, by Richard A. Silverman, Dover Edition, 1972.
Raymond Mortini raises issues with the correctness of Tandra’s proof. Mortini’s
objections are due to a typographical error which we failed to correct. The last centered
equation in the filler piece,

$$\frac{f(g(x_{n_k})) - f(g(c))}{x_{n_k} - c} \to 0 = f'(g(c))\,g'(c)$$

has two extraneous k-subscripts. It should read

$$\frac{f(g(x_n)) - f(g(c))}{x_n - c} \to 0 = f'(g(c))\,g'(c).$$

We thank Professor Mortini for bringing this to our attention.

Concerning “Pi day is upon us again and we still do not know if pi is normal,” by
David Bailey and Jon Borwein in Vol. 121, No. 3, 2014, pp. 191–206, David Beasley
offers us the following.
I would like to commend you for the excellent article on pi in this month’s
MONTHLY. Professors Bailey and Borwein presented an impressive array of facts
and questions about pi that should stimulate readers to learn even more about this
fascinating constant. There is a typo in the article, one that may have already been
pointed out. On page 193, the article notes that the Egyptian Rhind Papyrus suggests
that pi equals 32/18; that should be 256/81, or 3 + 1/9 + 1/27 + 1/81 as noted by
multiple Internet sources. (Ah, if only pi really were equal to (4/3)^4 . . .). Thanks again
for an article and an issue well done.
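The arithmetic in Beasley's letter can be confirmed with exact fractions. The following is a small sketch using only Python's standard library; the names `RHIND_PI`, `PRINTED`, `rhind_identities_hold`, and `relative_error` are illustrative and do not come from the letter or the article.

```python
import math
from fractions import Fraction

# Value the letter attributes to the Rhind Papyrus.
RHIND_PI = Fraction(256, 81)
# Value the letter says appeared in the article.
PRINTED = Fraction(32, 18)


def rhind_identities_hold() -> bool:
    """Check 256/81 = (4/3)^4 = 3 + 1/9 + 1/27 + 1/81 exactly."""
    as_power = Fraction(4, 3) ** 4
    as_sum = 3 + Fraction(1, 9) + Fraction(1, 27) + Fraction(1, 81)
    return RHIND_PI == as_power == as_sum


def relative_error(approx: Fraction) -> float:
    """Relative error of a rational approximation to pi."""
    return abs(float(approx) - math.pi) / math.pi


if __name__ == "__main__":
    print(rhind_identities_hold())             # True
    print(f"{relative_error(RHIND_PI):.4f}")   # 0.0060
    print(f"{relative_error(PRINTED):.4f}")    # 0.4341
```

The value 256/81 (about 3.1605) is within about 0.6 percent of pi, while 32/18 misses by more than 40 percent, consistent with the letter's reading of 32/18 as a typo.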
Also in the March 2014 issue, several readers have pointed out to us the following:
On page 265, 4 lines from the bottom of the page, “It’s First Fifty Years” should be
“Its First Fifty Years.”

On p. 166 of Vol. 121, No. 2, 2014, an author’s name is misspelled in the filler piece
on the lower part of the page. “Robert Devitt-Ryder” should be “Robin Devitt-Ryder,”
and we offer this correction with apologies.

http://dx.doi.org/10.4169/amer.math.monthly.121.05.467



The following was submitted by Branko Curgus.
There is a typo in our paper that appeared in the November 2013 issue of the
MONTHLY, pages 841–846. In the references on page 846, the publication year in
H. Nakamura, K. Oguiso, Elementary moduli space of triangles and iterative pro-
cesses, The University of Tokyo Journal of Mathematical Sciences 10 (2004) 209–224.
is wrong. It should be 2003.

On p. 175 of Vol. 121, No. 2, 2014, Roman Ger offers a solution to Problem 11641.
Unfortunately, we incorrectly listed his home institution. The correct listing should
be “Instytut Matematyki Uniwersytetu Śląskiego, Katowice, Poland.” We offer our
apologies to the author.

The following was submitted to us by R. G. Kulkani from Bangalore, India.


While going through some old articles, I noticed some errors in Vol. 3, No. 10,
1896, pp. 236–237. This relates to the note “A method of solving quadratic equations”
by Henry Heaton. Maybe someone has already pointed this out; I am not sure, as I am
pointing it out a bit late (by 117 years).
In the left-hand side of equation (4), c2 is left out, and in equation (6) the left-hand
side should be 2ax^2 instead of 2a^2x^2.

Scott T. Chapman, Editor




August 6-9, 2014

Join us in Portland, Oregon for MAA MathFest 2014


The largest annual summertime gathering of mathematicians

Invited Addresses
Earle Raymond Hedrick Lecture Series
Speaker: Bjorn Poonen, Massachusetts Institute of Technology
AMS-MAA Joint Invited Address
Speaker: Sara Billey, University of Washington
MAA Invited Address
Speaker: Ricardo Cortez, Tulane University
MAA Invited Address
Speaker: Erika Camacho, Massachusetts Institute of Technology and Arizona State University
MAA Invited Address
Speaker: Keith Devlin, Stanford University
James R. C. Leitzel Lecture
Speaker: Joseph Gallian, University of Minnesota Duluth

AWM-MAA Etta Z. Falconer Lecture


Speaker: Marie A. Vitulli, University of Oregon
Pi Mu Epsilon J. Sutherland Frame Lecture
Speaker: Keith Devlin, Stanford University
The Jean Bee Chan and Peter Stanek Lecture for Students
Speaker: Jack Graver, Syracuse University
NAM David Harold Blackwell Lecture
Speaker: Mark Lewis, Cornell University
Martin Gardner Centennial Lecture
Speaker: Persi Diaconis, Stanford University

Mathematical Association of America maa.org/meetings/mathfest


MATHEMATICAL ASSOCIATION OF AMERICA
1529 Eighteenth St., NW
Washington, DC 20036

New MAA textbooks in the MAA eBooks Store: great books at unbeatable prices!
Ordinary Differential Equations
from Calculus to Dynamical Systems

Virginia W. Noonburg

Ordinary Differential Equations is, first and foremost, a text for the
introductory course in ordinary differential equations. The driving idea
behind this text is that all science majors need to take the differential equations course.
This text works well for self-study. It is very readable and it has many examples followed
by worked-out solutions. Those two things (readability and full solutions to the
examples) make this text a likely candidate for a professor who wants to teach a “flipped”
course in differential equations.
Each section has its own set of exercises. Answers to odd-numbered exercises are in the
back of the book.

2014, 330 pages
Electronic edition ISBN: 9781614446149

ebook: $30.00

MATHEMATICAL ASSOCIATION OF AMERICA


To order, go to www.maa.org/ebooks/FCDS
