MONTHLY
VOLUME 121, NO. 5 MAY 2014
NOTES
The Primes that Euclid Forgot
Paul Pollack and Enrique Treviño
Solution of Sondow’s Problem: A Synthetic Proof of the Tangency Property of the Parbelos
Emmanuel Tsukerman
A Connection between Furstenberg’s and Euclid’s Proofs of the Infinitude of Primes
Nathan A. Carlson
An Elementary Proof of a Generalization of Banach’s Mapping Theorem
Ming-Chia Li
A Technique in Contour Integration
M. L. Glasser
Large-Deviation Bounds for Sampling without Replacement
Kyle Luh and Nicholas Pippenger
EDITOR
Scott T. Chapman
Sam Houston State University
ASSOCIATE EDITORS
William Adkins Jeffrey Lawson
Louisiana State University Western Carolina University
David Aldous C. Dwight Lahr
University of California, Berkeley Dartmouth College
Elizabeth Allman Susan Loepp
University of Alaska, Fairbanks Williams College
Jonathan M. Borwein Irina Mitrea
University of Newcastle Temple University
Jason Boynton Bruce P. Palka
North Dakota State University National Science Foundation
Edward B. Burger Vadim Ponomarenko
Southwestern University San Diego State University
Minerva Cordero-Epperson Catherine A. Roberts
University of Texas, Arlington College of the Holy Cross
Allan Donsig Rachel Roberts
University of Nebraska, Lincoln Washington University, St. Louis
Michael Dorff Ivelisse M. Rubio
Brigham Young University Universidad de Puerto Rico, Rio Piedras
Daniela Ferrero Adriana Salerno
Texas State University Bates College
Luis David Garcia-Puente Edward Scheinerman
Sam Houston State University Johns Hopkins University
Sidney Graham Anne Shepler
Central Michigan University University of North Texas
Tara Holm Susan G. Staples
Cornell University Texas Christian University
Roger A. Horn Dennis Stowe
University of Utah Idaho State University
Lea Jenkins Daniel Ullman
Clemson University George Washington University
Daniel Krashen Daniel Velleman
University of Georgia Amherst College
Ulrich Krause
Universität Bremen
EDITORIAL ASSISTANT
Bonnie K. Ponce
NOTICE TO AUTHORS
The MONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of MONTHLY readers.
The MONTHLY’s readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. MONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged.
Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue.
Submission of articles, notes, and filler pieces is required via the MONTHLY’s Editorial Manager System. Initial submissions in pdf or LaTeX form can be sent to the Editor Scott Chapman at
http://www.editorialmanager.com/monthly
The Editorial Manager System will cue the author for all required information concerning the paper. Questions concerning submission of papers can be addressed to the Editor at monthly@shsu.edu. Authors who use LaTeX can find our article/note template at http://www.shsu.edu/~bks006/Monthly.html. This template requires the style file maa-monthly.sty, which can also be downloaded from the same webpage. A formatting document for MONTHLY references can be found at http://www.shsu.edu/~bks006/FormattingReferences.pdf. Follow the link to Electronic Publications Information for authors at http://www.maa.org/pubs/monthly.html for information about figures and files, as well as general editorial guidelines.
Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the MONTHLY more lively, entertaining, and informative can be forwarded to the Editor at monthly@shsu.edu.
The online MONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-331-1622.
See the MONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles:
http://www.maa.org/
Proposed problems or solutions should be sent to:
DOUG HENSLEY, MONTHLY Problems
Department of Mathematics
Texas A&M University
3368 TAMU
College Station, TX 77843-3368.
In lieu of duplicate hardcopy, authors may submit pdfs to monthlyproblems@math.tamu.edu.
Advertising correspondence should be sent to:
MAA Advertising
1529 Eighteenth St. NW
Washington DC 20036.
Phone: (877) 622-2373,
E-mail: tmarmor@maa.org.
Further advertising information can be found online at www.maa.org.
Change of address, missing issue inquiries, and other subscription correspondence can be sent to:
MAA Service Center, maahq@maa.org.
All of these are at the address:
The Mathematical Association of America
1529 Eighteenth Street, N.W.
Washington, DC 20036.
Recent copies of the MONTHLY are available for purchase through the MAA Service Center:
maahq@maa.org, 1-800-331-1622.
Microfilm Editions are available at: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106.
The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, N.W., Washington, DC 20036 and Lancaster, PA, and copyrighted by the Mathematical Association of America (Incorporated), 2014, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright the Mathematical Association of America 2014. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA’s Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster: Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, N.W., Washington, DC, 20036-1385.
Unknotting Unknots
Allison Henrich and Louis H. Kauffman
1. INTRODUCTION. When first delving into the theory of knots, we learn that
knots are typically studied using their diagrams. The first question that arises when
considering these knot diagrams is: How can we tell if two knot diagrams represent the
same knot? Fortunately, we have a partial answer to this question. Two knot diagrams
represent the same knot in $\mathbb{R}^3$ if and only if they can be related by the Reidemeister
moves; see Figure 1. Reidemeister proved this theorem in the 1920s [14], and it is the
underpinning of much of knot theory. For example, J. W. Alexander based the original
definition of his celebrated polynomial on the Reidemeister moves [1].
Now, imagine that you are presented with a complicated diagram of an unknot,
and you would like to use Reidemeister moves to reduce it to the trivial diagram that
has no crossings. In considering a problem of this sort, you stumble upon a curious
fact. Given a diagram of an unknot to be unknotted, it might be necessary to make the
diagram more complicated before it can be simplified. We call such a diagram a hard
unknot diagram [12]. A nice example of this is the Culprit, shown in Figure 2. If you
look closely, you’ll find that no simplifying type I or type II Reidemeister moves and
no type III moves are available. Yet this is indeed the unknot. In order to unknot it, we
need to introduce new crossings with Reidemeister I and II moves. In Figure 3, we see
that we can unknot the Culprit by making the diagram larger by two crossings (via a
Reidemeister move of type II) and that it takes a total of ten Reidemeister moves to
accomplish the unknotting. (Note that both type I and type II moves were performed
between the fifth and sixth diagram.)
http://dx.doi.org/10.4169/amer.math.monthly.121.05.379
MSC: Primary 57M25
In Figures 4 and 5, we indicate more examples of hard unknot diagrams. In Figure
4, we show examples with the fewest possible crossings. In Figure 5, we show
the very first example that appeared, discovered by Goeritz in 1934 [5].
At this point, we ask ourselves: How much more complicated does a diagram need
to become before it can be simplified? Moreover, how many Reidemeister moves do
we need to trivialize our picture? In this paper, we give a technique for finding upper
bounds for these answers. In particular, we will prove the following theorem. Note
that the precise definition of a knot diagram in Morse form (including the notion of a
maximum of a diagram) will be given in the body of the paper. To put this result into
context, however, think of $b(K)$ as being no larger than $\mathrm{cr}(K)$.
Theorem 4. Suppose that $K$ is a diagram (in Morse form) of the unknot with crossing
number $\mathrm{cr}(K)$ and number of maxima $b(K)$. Let $M = 2b(K) + \mathrm{cr}(K)$. Then the
diagram can be unknotted by a sequence of Reidemeister moves so that no intermediate
diagram has more than $(M - 2)^2$ crossings.
This theorem is proven using combinatorial arguments, along with the machinery
developed by Dynnikov in [4]. We will introduce the necessary background material
in Section 2 and give the proof in Section 3.
Returning to our Culprit, we have that $\mathrm{cr}(K) = 10$ and $b(K) = 5$. Thus, $M = 20$
and $(M - 2)^2 = 18^2 = 324$ is our upper bound on the number of crossings needed
to simplify the diagram. In actuality, we only needed a diagram with 12 crossings
in our unknotting sequence. The theory of these bounds needs improvement, but it
is remarkable that there is a theory at all for such questions. In addition to proving
this theorem, we will give bounds on the number of Reidemeister moves needed for
unknotting, and we will point the reader toward more results related to this question.
We warn the reader that the difference between the lower bounds and upper bounds that
are known is still vast. The quest for a satisfying answer to these questions continues.
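The arithmetic behind the bound in Theorem 4 is simple enough to mechanize. The following is a minimal Python sketch, assuming only the formula $M = 2b(K) + \mathrm{cr}(K)$ stated above; the function name `crossing_bound` is ours and purely illustrative.

```python
def crossing_bound(cr: int, maxima: int) -> int:
    """Upper bound (M - 2)^2 from Theorem 4, where M = 2*b(K) + cr(K)."""
    if cr < 0 or maxima < 1:
        raise ValueError("expected cr >= 0 and at least one maximum")
    m = 2 * maxima + cr
    return (m - 2) ** 2


# The Culprit: cr(K) = 10 and b(K) = 5, so M = 20.
culprit_bound = crossing_bound(10, 5)
print(culprit_bound)  # 324
```

Comparing the output with the 12 crossings actually used in Figure 3 shows how far the bound is from the observed value.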
Figure 6. The picture on the left is an example of an arc-presentation of a trefoil. The picture on the right is
an example that is not an arc-presentation (since not all horizontal arcs pass under vertical arcs).
Note that a rectangular diagram can be drawn naturally on a rectangular grid, with
corners and crossings contained within the squares of the grid. If we start by represent-
ing a rectangular diagram on a grid in this way, then we have what is often referred to as
a mosaic knot. Mosaic knots can be used to define a notion of quantum knot. See [11]
for more about quantum knots. For now, we refocus our attention on arc-presentations.
Figure 7. Elementary (de)stabilization moves. Stabilization moves increase the complexity of the arc-
presentation, while destabilization moves decrease the complexity.
Figure 8. Some examples of exchange moves. Other allowed exchange moves involve switching the heights of
two horizontal arcs that lie in distinct halves of the diagram. See [4] for a general formulation of the exchange
moves.
Definition 2. A knot diagram is in Morse form with respect to a given vector in the
plane if it has
Lemma 2. Suppose that a knot (or link) diagram $K$ in Morse form has $\mathrm{cr}(K)$ crossings
and $b(K)$ maxima. Then there is an arc-presentation $L_K$ of $K$ with complexity
$c(L_K)$ at most $2b(K) + \mathrm{cr}(K)$ that can be obtained by ambient isotopies of the plane
(without the use of Reidemeister moves).
Proof. We begin with a diagram in Morse form, and convert this diagram into a piece-
wise linear diagram composed of lines with slope ±1. The resulting diagram has a
vertex corresponding to each maximum and minimum, with additional vertices that
form left and right cusps—at most one for each pair of successive extrema. Since the
number of minima equals the number of maxima in the diagram, and the number of
vertices that are not extrema is no larger than the sum of these two quantities, we have
at most 4b(K ) vertices. Thus, rotating this diagram by 45 degrees, we have a diagram
composed entirely of horizontal and vertical arcs with complexity at most 2b(K ), half
the number of possible vertices.
This diagram may fail to be an arc-presentation of K if any crossing has a horizontal
overpass. If more than half of the crossings in K have horizontal overpasses, we rotate
the diagram by 90 degrees. Now, at least half of the crossings are in the proper form.
Any remaining crossings containing a horizontal overpass may be rotated locally 90
degrees to form our arc-presentation $L_K$, as shown in Figure 10 (see Figure 11 for an
example). For each crossing that requires this move, the complexity of the rectangular
diagram increases by at most 2. Thus, the overall complexity of our diagram increases
by at most $2(\tfrac{1}{2}\mathrm{cr}(K)) = \mathrm{cr}(K)$. It follows that $c(L_K) \le 2b(K) + \mathrm{cr}(K)$.
Note that neither converting a Morse diagram into a piecewise linear diagram, nor
locally rotating a crossing, uses Reidemeister moves. These are ambient isotopies of
the plane.
$$L \to L_1 \to L_2 \to \cdots \to L_m$$
What is particularly interesting about this result is that the unknot can be simplified
without increasing the complexity of the arc-presentation, that is, without the use
of stabilization moves. This gives a useful physical bound on how large a diagram can
be. Furthermore, if we apply Dynnikov’s method to a knotted knot, the process will
halt on a diagram that is not a planar circle. Thus, Dynnikov’s method can detect the unknot.
The problem of detecting the unknot has been investigated by, for example, Birman
and Hirsch [2] and Birman and Moody [3]. More recently, it has been shown that
Heegaard Floer homology (a generalization of the Alexander polynomial) not only
detects the unknot, but also can be used to calculate the least genus of an orientable
spanning surface for any knot. This is an outstanding result, and we recommend that
the reader examine the paper by Manolescu, Ozsváth, Szabó, and Thurston [13] for
more information. In that work, Heegaard Floer homology is expressed via a chain
complex that is associated to a rectangular diagram of just the type that Dynnikov uses.
Returning to the task at hand, we derive a quadratic upper bound on the crossing
number of diagrams in an unknotting sequence. Note that, using similar methods,
Dynnikov finds a bound of $2(\mathrm{cr}(K) + 1)^2$ in [4].
Theorem 4. Suppose that $K$ is a diagram (in Morse form) of the unknot with
crossing number $\mathrm{cr}(K)$ and number of maxima $b(K)$. Then for every $i$, the crossing
number $\mathrm{cr}(K_i)$ is no more than $(M - 2)^2$, where $M = 2b(K) + \mathrm{cr}(K)$ and
$K = K_0, K_1, K_2, \ldots, K_N$ is a sequence of knot diagrams such that $K_{i+1}$ is obtained
from $K_i$ by a single Reidemeister move and $K_N$ is a trivial diagram of the unknot.
Figure 12. Factoring an exchange move through a type II and multiple type III Reidemeister moves.
It is straightforward to show that the maximum number of crossings that may occur
in an arc-presentation with complexity less than or equal to $M$ is bounded above by
$(M - 2)^2$. If we translate an arc-presentation sequence of moves in a canonical fashion
into a sequence of Reidemeister moves to unknot our unknot, then many knot diagrams
in the Reidemeister sequence will be arc-presentations and, as such, will have fewer
than $(M - 2)^2$ crossings. Furthermore, diagrams in this sequence that are not
arc-presentations have no more crossings than their arc-presentation relatives. Thus, there
exists a sequence of Reidemeister moves that unknots our original diagram $K$ that does
not increase the crossing number to more than $(M - 2)^2$.
Using this count on the number of distinct arc-presentations of a given size, we can
find a bound (albeit a large one) on the number of arc-presentation moves we need.
This is simply by virtue of the fact that any reasonable sequence of moves will contain
mutually distinct arc-presentations that don’t exceed the complexity of the original,
and there are a limited number of such diagrams.
Proof. Suppose that an arc-presentation $L$ has complexity $n$. Since each $L_k$ from
Theorem 3 is combinatorially distinct from any other $L_j$ with $k \neq j$, we know that the
number $m$ of arc-presentations in the sequence must be at most $\sum_{i=2}^{n} N(i)$, which is
no greater than $\sum_{i=2}^{n} \tfrac{1}{2}\, i\,[(i - 1)!]^2$.
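To get a feel for how quickly this bound grows, here is a small Python sketch that evaluates $\sum_{i=2}^{n} \tfrac{1}{2}\, i\,[(i-1)!]^2$ exactly. It assumes the sum is read as written in the proof above; the function name `arc_presentation_count_bound` is hypothetical.

```python
from math import factorial


def arc_presentation_count_bound(n: int) -> int:
    """Sum over i = 2..n of (1/2) * i * ((i - 1)!)^2, as an exact integer."""
    # Each summand is an integer: for i = 2 it equals 1, and for i >= 3
    # the factor ((i - 1)!)^2 is divisible by 4.
    return sum(i * factorial(i - 1) ** 2 // 2 for i in range(2, n + 1))


for n in range(2, 7):
    print(n, arc_presentation_count_bound(n))
```

Exact integer arithmetic is used on purpose, since floating-point values lose precision long before $n$ reaches the sizes relevant to real diagrams.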
We should note that, if we start with an arc-presentation of the unknot, every arc-
presentation in our simplification sequence must be a diagram of the unknot. As n
gets larger, we recognize that far fewer arc-presentations of complexity n are unknots.
Thus, in practice, m will be much lower than the upper bound provided here. We would
be interested to know what the probability is that an arc-presentation of complexity n is
the unknot. Using this probability, we could tighten the upper bound we found above.
We return now to our second question: How many Reidemeister moves does it take
to make an arc-presentation move?
Proof. Clearly, a destabilization move requires at most one Reidemeister move, a type
I move. Now, consider the first exchange move pictured in Figure 8. Let d be the
number of vertical strands intersecting both of the horizontal strands to be switched.
Then, the move requires d type III moves and one type II move. Thus, the exchange
move requires d + 1 Reidemeister moves. We note that if a is the length of the shorter
horizontal arc, then d < a. But a cannot be greater than n − 2, so the number of Reide-
meister moves required is less than or equal to n − 2. Similarly, the second exchange
move in Figure 8 requires d type III moves but no type II moves. Thus, both pictured
Theorem 8. Suppose that $K$ is a diagram (in Morse form) of the unknot with crossing
number $\mathrm{cr}(K)$ and number of maxima $b(K)$. Let $M = 2b(K) + \mathrm{cr}(K)$. Then the
number of Reidemeister moves required to unknot $K$ is less than or equal to
$$\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i - 1)!]^2 (M - 2).$$
Proof. Suppose that the arc-presentation $L_K$ of our knot diagram $K$ has complexity
$c(L_K) = n$. Then at most $m(n - 2)$ Reidemeister moves are required to produce the
trivial (complexity 2) arc-presentation, where $m$ is the number of moves in the
monotonic simplification of $L_K$. By our lemma, this quantity is bounded above by
$$\sum_{i=2}^{n} \frac{1}{2}\, i\,[(i - 1)!]^2 (n - 2).$$
Since $n \le M$ by Lemma 2, this is in turn at most
$$\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i - 1)!]^2 (M - 2).$$
$$L \to L_1 \to L_2 \to \cdots \to L_m$$
$$L \to L_1 \to L_2 \to \cdots \to L_m$$
We note that the statements of Lemmas 2, 6, and 7 hold for arbitrary links as well
as diagrams of the unknot. Thus, the following result is an immediate consequence of
the previous theorems.
Theorem 11. Suppose that $L$ is a diagram (in Morse form) of a split (resp., non-split
composite) link with crossing number $\mathrm{cr}(L)$ and number of maxima $b(L)$. Let
$M = 2b(L) + \mathrm{cr}(L)$. Then the number of Reidemeister moves required to transform
$L$ into a split (resp., composite) diagram is less than or equal to
$$\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i - 1)!]^2 (M - 2).$$
Similarly, we have the following extension of our results regarding maximum cross-
ing numbers in a simplifying Reidemeister sequence.
Theorem 12. Suppose that $L$ is a diagram (in Morse form) of a split (resp., non-split
composite) link with crossing number $\mathrm{cr}(L)$ and number of maxima $b(L)$. Then for
every $i$, the crossing number $\mathrm{cr}(L_i)$ is no more than $(M - 2)^2$, where $M = 2b(L) + \mathrm{cr}(L)$
and $L = L_0, L_1, L_2, \ldots, L_N$ is a sequence of link diagrams such that $L_{i+1}$ is
obtained from $L_i$ by a single Reidemeister move, and $L_N$ is split (resp., composite).
Figure 15. The Culprit with its rectangular diagram and arc-presentation
We saw in Figure 3 that the Culprit may be unknotted with ten Reidemeister moves
(see also [12]). The maximum crossing number of all diagrams in the given Reidemeister
sequence is 12, two more than the number of crossings in the Culprit. As noted in
the introduction, however, we can compute our upper bound on the number of crossings
required for unknotting as follows. Since the crossing number $\mathrm{cr}(K) = 10$ and the
number of maxima in the diagram is $b(K) = 5$, we see that $M = \mathrm{cr}(K) + 2b(K) = 20$.
Thus, our bound is $(M - 2)^2 = 18^2 = 324$.
We can also use $M$ to find our bound for the number of Reidemeister moves required
to unknot the Culprit:
$$\sum_{i=2}^{M} \frac{1}{2}\, i\,[(i - 1)!]^2 (M - 2) = 9 \sum_{i=2}^{20} i\,[(i - 1)!]^2.$$
The largest term in this expression is roughly $10^{35}$, quite a bit larger than ten, unfortunately.
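The size of this bound can be checked directly. The sketch below, in Python with exact integers, evaluates the expression for the Culprit under the assumption that the bound is $(M-2)\sum_{i=2}^{M} \tfrac{1}{2}\, i\,[(i-1)!]^2$ as in Theorem 8; the function and variable names are illustrative only. By this arithmetic, the $i = 20$ summand $i\,[(i-1)!]^2$ is on the order of $10^{35}$, and including the prefactor 9 moves the full term to the order of $10^{36}$.

```python
from math import factorial


def reidemeister_move_bound(m: int) -> int:
    """(M - 2) * sum_{i=2}^{M} (1/2) * i * ((i - 1)!)^2, as an exact integer."""
    total = sum(i * factorial(i - 1) ** 2 // 2 for i in range(2, m + 1))
    return (m - 2) * total


M = 20  # the Culprit: M = cr(K) + 2 b(K) = 10 + 2 * 5
largest_summand = M * factorial(M - 1) ** 2  # the i = 20 summand inside the sum
largest_term = 9 * largest_summand           # with the prefactor (M - 2)/2 = 9

print(f"{largest_summand:.3e}")
print(f"{largest_term:.3e}")
print(f"{reidemeister_move_bound(M):.3e}")
```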
ACKNOWLEDGEMENTS. The authors would like to thank Jeffrey Lagarias and John Sullivan for their
valuable comments. We would also like to thank the referees for their editorial suggestions.
REFERENCES
1. J. W. Alexander, Topological invariants of knots and links, Trans. Amer. Math. Soc. 30 (1928) 275–306.
2. J. Birman, M. Hirsch, A new algorithm for recognizing the unknot, Geom. Top. 2 (1998) 175–220.
3. J. Birman, J. Moody, Obstructions to trivializing a knot, Israel J. Math. 142 (2004) 125–162.
4. I. A. Dynnikov, Arc-presentations of links: monotonic simplification, Fund. Math. 190 (2006) 29–76.
5. L. Goeritz, Bemerkungen zur Knotentheorie, Abh. Math. Sem. Univ. Hamburg 10 (1934) 201–210.
6. J. Hass, J. Lagarias, The number of Reidemeister moves needed for unknotting, J. Amer. Math. Soc. 14
(2001) 399–428.
7. J. Hass, T. Nowik, Unknot diagrams requiring a quadratic number of Reidemeister moves to untangle,
Discrete Comput. Geom. 44 (2010) 91–95.
8. C. Hayashi, M. Hayashi, Unknotting number and number of Reidemeister moves needed for unlinking,
available at http://arxiv.org/abs/1012.4131 (2010) 1–10.
9. L. Kauffman, Knots and Physics, Series on Knots and Everything, Vol. 1, World Scientific, Singapore,
1991.
10. L. Kauffman, Knot diagrammatics, in The Handbook of Knot Theory, Edited by W. Menasco and M.
Thistlethwaite, Elsevier, Amsterdam, 2005. 233–318.
11. L. Kauffman, S. Lomonaco, Quantum knots and mosaics, J. Quantum Info. Processing 7 (2008) 85–115.
12. L. Kauffman, S. Lambropoulou, Hard unknots and collapsing tangles, in Introductory Lectures on Knot
Theory—Selected Lectures presented at the Advanced School and Conference on Knot Theory and its
Applications to Physics and Biology ICTP, Trieste, Italy, 11–29 May 2009, Edited by L. Kauffman, S.
Lambropoulou, S. Jablan, and J. Przytycki, World Scientific, Singapore, 2011.
13. C. Manolescu, P. Ozsváth, Z. Szabó, D. Thurston, Combinatorial link Floer homology, Geom. Top. 11
(2007) 2339–2412.
14. K. Reidemeister, Knotentheorie, Julius Springer, Berlin, 1932.
ALLISON HENRICH received her B.S. in mathematics and B.A. in philosophy from the University of Wash-
ington in 2003 and her Ph.D. in mathematics from Dartmouth College in 2008. She is currently an assistant
professor at Seattle University and has research interests in virtual knot theory and games involving knots.
When she is not doing mathematics, Henrich enjoys going to concerts, cooking, and playing with her puppies.
Department of Mathematics, Seattle University, Seattle, WA 98122
henricha@seattleu.edu
LOUIS KAUFFMAN is Professor of mathematics at the University of Illinois at Chicago. His primary re-
search interest is in knot theory. He is well known for the bracket state sum model for the Jones polynomial,
for a two-variable link polynomial called the Kauffman polynomial, and for the introduction and exploration of
an extension of classical knots called virtual knot theory. Kauffman is the author of four books on knot theory,
the editor of the World Scientific book series On Knots and Everything, and the editor-in-chief and founding
editor of the Journal of Knot Theory and Its Ramifications. When not doing mathematics, he plays clarinet in
the Chicago-based ChickenFat Klezmer Orchestra.
Department of Mathematics, Statistics, and Computer Science, University of Illinois, Chicago, IL 60607
kauffman@uic.edu
Abstract. Inspired by work of Ford, we describe a geometric representation of real and com-
plex continued fractions by chains of horocycles and horospheres in hyperbolic space. We
explore this representation using the isometric action of the group of Möbius transformations
on hyperbolic space, and prove a classical theorem on continued fractions.
$$K(b_n) = b_1 + \cfrac{1}{b_2 + \cfrac{1}{b_3 + \cfrac{1}{b_4 + \cdots}}},$$
where $b_1, b_2, \ldots$ are complex numbers. We say that this is an integer continued
fraction if all $b_i$ are integers, and a real continued fraction if all $b_i$ are real. It has long
been recognized that we can study real and complex continued fractions from a ge-
ometric point of view by using Möbius transformations; here, we show how to rep-
resent real continued fractions by chains of horocycles in the hyperbolic plane, and
complex continued fractions by chains of horospheres in hyperbolic space. The ori-
gin of this idea is in Ford’s well-known paper [5] where he used such a representa-
tion to study integer continued fractions. Ford constructs the horocycles at rational
points in an elementary way; indeed, he says, “Perhaps the author owes an apology
to the reader for asking him to lend his attention to so elementary a subject, . . . ”.
On the other hand, he also says that his original idea was motivated by Bianchi’s
study of the Picard group. This material is at a deeper level, and Ford uses this in [4]
where he discusses horospheres in three-dimensional hyperbolic space that are based
at the Gaussian integers. Here we follow a similar path from an elementary represen-
tation of real continued fractions by horocycles to a deeper study of the representa-
tion of complex continued fractions by horospheres in three-dimensional hyperbolic
space.
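$$\begin{pmatrix} A_n & A_{n-1} \\ B_n & B_{n-1} \end{pmatrix} = \begin{pmatrix} b_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b_2 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} b_n & 1 \\ 1 & 0 \end{pmatrix}. \tag{2.1}$$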
http://dx.doi.org/10.4169/amer.math.monthly.121.05.391
MSC: Primary 40A15, Secondary 30F45, 30B70
$$T_n = t_1 \circ t_2 \circ \cdots \circ t_n.$$
$$T_n(\infty) = \frac{A_n}{B_n} = b_1 + \cfrac{1}{b_2 + \cfrac{1}{b_3 + \cdots + \cfrac{1}{b_n}}}.$$
Clearly, we are working in the extended complex plane, and $K(b_n)$ converges if the
sequence $T_n(\infty)$ converges; otherwise, it diverges.
Note that we can recapture the coefficients $b_n$ from the maps $T_n$. Indeed,
$$b_n = t_n(\infty) = T_{n-1}^{-1} T_n(\infty),$$
and since we know the matrix representations for $T_n$ and $T_{n-1}$, we can simplify the
term on the right to obtain $|A_n B_{n-2} - B_n A_{n-2}| = |b_n|$. This gives
$$|T_n(\infty) - T_{n-2}(\infty)| = \frac{|b_n|}{|B_{n-2}||B_n|}. \tag{2.2}$$
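The relations above can be checked numerically. The following Python sketch builds the pairs $(A_n, B_n)$ from the matrix product (2.1), starting from the identity matrix (so $A_0 = 1$, $B_0 = 0$, which is our assumption about the initial values), and then verifies the identity $|A_n B_{n-2} - B_n A_{n-2}| = |b_n|$ and equation (2.2) for a sample integer continued fraction. The helper name `convergents` and the sample coefficients are illustrative.

```python
from fractions import Fraction


def convergents(coeffs):
    """Return [(A_0, B_0), (A_1, B_1), ...] from the matrix product (2.1)."""
    a_prev, b_prev = 0, 1  # second column of the identity: (A_{-1}, B_{-1})
    a_cur, b_cur = 1, 0    # first column of the identity: (A_0, B_0)
    pairs = [(a_cur, b_cur)]
    for c in coeffs:
        # Right-multiplying by [[c, 1], [1, 0]] replaces the first column by
        # c * (first column) + (second column) and shifts the old first column.
        a_cur, a_prev = c * a_cur + a_prev, a_cur
        b_cur, b_prev = c * b_cur + b_prev, b_cur
        pairs.append((a_cur, b_cur))
    return pairs


coeffs = [1, 2, 2, 2, 2]  # the start of the continued fraction of sqrt(2)
pairs = convergents(coeffs)

for n in range(2, len(coeffs) + 1):
    (a_n, b_n), (a_m, b_m) = pairs[n], pairs[n - 2]
    assert abs(a_n * b_m - b_n * a_m) == abs(coeffs[n - 1])
    if b_m != 0:  # equation (2.2) needs B_{n-2} != 0
        gap = abs(Fraction(a_n, b_n) - Fraction(a_m, b_m))
        assert gap == Fraction(abs(coeffs[n - 1]), abs(b_m) * abs(b_n))

print([Fraction(a, b) for a, b in pairs[1:]])
```

Using `Fraction` keeps every comparison exact, so the assertions test the identities themselves and not a floating-point approximation.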
Lemma 3.1. Two horocycles with distinct real base points $x$ and $y$, and Euclidean
radii $r$ and $s$, are tangent if and only if
$$|x - y|^2 = 4rs.$$
We have ignored the algebraic details of the special case $B_n = 0$, which occurs
precisely when $T_n(\infty) = \infty$. We return to this issue afresh in Section 6 with a geometric
approach that avoids the need to distinguish the point $\infty$. In Section 4, we study
continued fractions with positive coefficients $b_n$, in which case the denominators $B_n$ are
also positive, and all horocycles $\Pi_n$ are Euclidean circles.
Each horocycle in a chain corresponding to an integer continued fraction is a Ford
circle; that is, its base point is a reduced rational $p/q$, and its Euclidean radius is
$1/(2q^2)$. Ford introduced chains of Ford circles in [5]. Ford circles never overlap. A
collection of Ford circles is shown in Figure 3.2, and the first few horocycles in a chain
are shaded in a darker color.
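The tangency criterion of Lemma 3.1 and the radius $1/(2q^2)$ of a Ford circle combine into a quick exact test. The Python sketch below is illustrative only; the helper names `ford_circle` and `tangent` are ours, and `Fraction` is used so that base points are automatically in lowest terms.

```python
from fractions import Fraction


def ford_circle(x: Fraction):
    """Base point and Euclidean radius 1/(2 q^2) of the Ford circle at x = p/q."""
    return x, Fraction(1, 2 * x.denominator ** 2)


def tangent(x, r, y, s) -> bool:
    """Tangency test of Lemma 3.1 for horocycles based at distinct reals x, y."""
    return (x - y) ** 2 == 4 * r * s


half, third, quarter = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)
print(tangent(*ford_circle(half), *ford_circle(third)))     # True
print(tangent(*ford_circle(third), *ford_circle(quarter)))  # True
print(tangent(*ford_circle(half), *ford_circle(quarter)))   # False
```

For Ford circles at $p/q$ and $p'/q'$, the test reduces to $|pq' - p'q| = 1$, which is why $1/2$ and $1/4$ fail it.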
and therefore we have shown that $T_{n+1}(\infty)$ lies between $T_{n-1}(\infty)$ and $T_n(\infty)$. Since
trivially $T_1(\infty) < T_2(\infty)$, we deduce that
$$0 < T_1(\infty) < T_3(\infty) < \cdots < T_{2n-1}(\infty) < T_{2n}(\infty) < \cdots < T_4(\infty) < T_2(\infty). \tag{4.1}$$
This implies that there are real numbers $\alpha$ and $\beta$, with $\alpha \le \beta$, such that
Therefore,
$$4 r_n r_{n-1} = |T_n(\infty) - T_{n-1}(\infty)|^2 < |T_{n-1}(\infty) - T_{n-2}(\infty)|^2 = 4 r_{n-1} r_{n-2},$$
and so $r_n < r_{n-2}$. We deduce that both sequences $r_1, r_3, \ldots$ and $r_2, r_4, \ldots$ are
decreasing (see Figure 4.1).
Figure 4.1. The beginnings of a chain of horocycles when all $b_n$ are positive
$$\mathbb{H}^3 = \{(x, y, t) \in \mathbb{R}^3 : t > 0\}$$
is a model of three-dimensional hyperbolic space when equipped with the hyperbolic
metric $\sqrt{dx^2 + dy^2 + dt^2}/t$. The corresponding distance function $\rho$ on $\mathbb{H}^3$ is given by
$$\rho(u, v) = \inf_{\gamma} \int_{\gamma} \frac{\sqrt{dx^2 + dy^2 + dt^2}}{t},$$
where the infimum is taken over all smooth paths $\gamma$ from $u$ to $v$ in $\mathbb{H}^3$. Unlike the
Euclidean metric on $\mathbb{H}^3$, the hyperbolic metric is complete. Let us identify the point
$(x, y, 0)$ with the complex number $z = x + iy$. The complex plane $\mathbb{C}$ is then identified
with the Euclidean plane $t = 0$. The group of Möbius transformations acts on
$\mathbb{C} \cup \{\infty\}$, and we now describe how this action can be extended to $\mathbb{H}^3$.
Consider a Möbius transformation $f(z) = (az + b)/(cz + d)$, where $a$, $b$, $c$, and $d$
are complex numbers with $ad - bc = 1$. Define $j = (0, 0, 1)$. Then, $(x, y, t)$ can be
represented by the quaternion $z + tj$, and the action of $f$ on $\mathbb{H}^3$ is given by
see [1, Section 4.1]. This action preserves the hyperbolic metric on $\mathbb{H}^3$. In other words,
for any pair of points $u$ and $v$ in $\mathbb{H}^3$. In fact, every conformal isometry of $\mathbb{H}^3$ arises in
this way.
We end this section by examining the actions on $\mathbb{H}^3$ of two specific types of Möbius
transformation. First, consider a translation $f(z) = z + b$. Then
$$f(z + tj) = z + b + tj.$$
That is, $f$ also acts as a translation by $b$ on $\mathbb{H}^3$. Next, suppose that $f(z) = 1/z$. We
must write this map as $f(z) = i/(iz)$ in order to satisfy the condition $ad - bc = 1$.
Then
$$f(z + tj) = \frac{\bar{z} + tj}{|z|^2 + t^2}. \tag{5.2}$$
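Both special cases can be reproduced numerically. The Python sketch below assumes the standard formula for the action of a Möbius map on the upper half-space (the display for the general action is not reproduced in this text, so the formula in the docstring is our assumption), and checks that it gives a translation when $c = 0$, $a = d = 1$ and gives (5.2) for $f(z) = i/(iz)$. The function name `poincare_extension` is illustrative.

```python
def poincare_extension(a, b, c, d, z, t):
    """Apply f(z) = (a z + b)/(c z + d), with ad - bc = 1, to the point z + t j.

    Assumed (standard) formula, not quoted from the text:
        f(z + t j) = ((a z + b) conj(c z + d) + a conj(c) t^2 + t j)
                     / (|c z + d|^2 + |c|^2 t^2).
    Returns the pair (complex part, height).
    """
    a, b, c, d, z = (complex(v) for v in (a, b, c, d, z))
    denom = abs(c * z + d) ** 2 + abs(c) ** 2 * t ** 2
    w = ((a * z + b) * (c * z + d).conjugate() + a * c.conjugate() * t ** 2) / denom
    return w, t / denom


z, t = 1 + 2j, 3.0

# Translation f(z) = z + b with b = 2 - 1j: the height is unchanged.
print(poincare_extension(1, 2 - 1j, 0, 1, z, t))

# Inversion f(z) = 1/z written as i/(iz): compare with (5.2).
w, height = poincare_extension(0, 1j, 1j, 0, z, t)
scale = abs(z) ** 2 + t ** 2
print(w, z.conjugate() / scale)
print(height, t / scale)
```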
$$\Sigma_0 = \{z + j : z \in \mathbb{C}\} \cup \{\infty\}.$$
We can calculate the image of $\Sigma_0$ under a Möbius transformation using the
representation by quaternions described in the previous section. Let
$$\mathrm{ht}(z + tj) = t,$$
Proof. The base point of $f(\Sigma_0)$ is $f(\infty)$. Suppose first that $c \neq 0$. Then $f(\infty) = a/c$,
so $f(\Sigma_0)$ is a Euclidean sphere. By (5.1), we have
$$\mathrm{ht}(f(z + j)) = \frac{1}{|cz + d|^2 + |c|^2}.$$
The maximum value of this expression is $1/|c|^2$ (when $z = -d/c$), and hence $f(\Sigma_0)$
has Euclidean radius $1/(2|c|^2)$.
Suppose now that $c = 0$. Then $ad = 1$, so $d = 1/a$. From (5.1) we obtain
$$f(z + j) = a^2 z + ab + |a|^2 j.$$
Let us apply Lemma 6.1 to the map $t_n(z) = b_n + 1/z$. First we must write $t_n$ in
the form $t_n(z) = (i b_n z + i)/(iz + 0)$, to satisfy the condition $ad - bc = 1$. Then
Lemma 6.1 says that $t_n(\Sigma_0)$ has base point $b_n$ and Euclidean radius $1/2$. Hence $t_n(\Sigma_0)$
is tangent to $\Sigma_0$ at the point $t_n(j) = b_n + j$, as illustrated in Figure 6.1.
Recall that $T_n = t_1 \circ t_2 \circ \cdots \circ t_n$. The horospheres $\Sigma_0$ and $t_n(\Sigma_0)$ are tangent at the
point $t_n(j)$, which implies that the horospheres $T_{n-1}(\Sigma_0)$ and $T_n(\Sigma_0) = T_{n-1}(t_n(\Sigma_0))$
are tangent at the point $T_n(j)$. A chain of horospheres is a sequence of horospheres
Figure 6.1. The horospheres $\Sigma_0$ and $t_n(\Sigma_0)$ are tangent at the point $t_n(j)$.
Theorem 6.2. Given a continued fraction $K(b_n)$ with complex coefficients, the
sequence $\Sigma_0, T_1(\Sigma_0), T_2(\Sigma_0), \ldots$ is a chain of horospheres. Conversely, given a chain of
horospheres $\Sigma_0, \Sigma_1, \Sigma_2, \ldots$ there is a unique continued fraction $K(b_n)$ with $T_n(\Sigma_0) = \Sigma_n$
for $n = 1, 2, \ldots$.
Using Lemma 6.1, we can describe the horosphere Tn (60 ) in terms of the coeffi-
cients of Tn . We recall the sequences of complex numbers A0 , A1 , . . . and B0 , B1 , . . .
determined by the matrix equation (2.1).
Lemma 6.3. If Bn 6 = 0, then Tn (60 ) has base point An /Bn and Euclidean radius
1/(2|Bn |2 ). If Bn = 0, then Tn (6) = {z + |An |2 j : z ∈ C} ∪ {∞}.
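Lemma 6.3 lends itself to a quick numerical experiment. The following is only a sketch: it assumes the usual recurrence $A_n = b_n A_{n-1} + A_{n-2}$, $B_n = b_n B_{n-1} + B_{n-2}$ with $A_0 = 1$, $B_0 = 0$, $A_{-1} = 0$, $B_{-1} = 1$ (equation (2.1) is not reproduced here, so this convention is an assumption), and the function names and sample coefficients are hypothetical. It uses the elementary fact that two spheres resting on the plane at points p and q, with radii r and s, are tangent exactly when $|p - q|^2 = 4rs$.

```python
from typing import List, Optional, Tuple


def coefficients(b: List[complex]) -> List[Tuple[complex, complex]]:
    """Return [(A_1, B_1), ..., (A_n, B_n)] for T_n = t_1 o ... o t_n, t_k(z) = b_k + 1/z.

    Assumed convention: A_0 = 1, B_0 = 0, A_{-1} = 0, B_{-1} = 1.
    """
    A_prev, A = 0j, 1 + 0j
    B_prev, B = 1 + 0j, 0j
    out = []
    for bn in b:
        A_prev, A = A, bn * A + A_prev
        B_prev, B = B, bn * B + B_prev
        out.append((A, B))
    return out


def horosphere(A: complex, B: complex) -> Optional[Tuple[complex, float]]:
    """Base point and Euclidean radius from Lemma 6.3; None when B_n = 0."""
    if B == 0:
        return None  # the horosphere is a horizontal plane, not a sphere
    return A / B, 1.0 / (2.0 * abs(B) ** 2)


def tangent(h1, h2, tol: float = 1e-9) -> bool:
    """Spheres resting on C at p, q with radii r, s touch iff |p - q|^2 = 4rs."""
    (p, r), (q, s) = h1, h2
    return abs(abs(p - q) ** 2 - 4.0 * r * s) <= tol * max(1.0, 4.0 * r * s)


b = [1 + 1j, 2 - 1j, 0.5 + 2j, 3 + 0j]  # arbitrary sample coefficients
spheres = [horosphere(A, B) for A, B in coefficients(b)]
for n in range(len(spheres) - 1):
    print(n + 1, n + 2, tangent(spheres[n], spheres[n + 1]))
```

For these sample coefficients every $B_n$ is nonzero, so each pair of consecutive horospheres can be tested; the first sphere has base point $b_1$ and radius 1/2, as the discussion of $t_n$ predicts.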
The horospheres $T_n(\Sigma_0)$ and $T_{n-1}(\Sigma_0)$ are tangent at the point $T_n(j)$. The base points of these two horospheres are $T_n(\infty)$ and $T_{n-1}(\infty) = T_n(0)$. The hyperbolic line between $\infty$ and 0 in $\mathbb{H}^3$ (a Euclidean half-line), which contains $j$, is mapped by $T_n$ to the hyperbolic line between $T_n(\infty)$ and $T_n(0)$ in $\mathbb{H}^3$; see Figure 6.3.
Figure 6.3. The horospheres $T_{n-1}(\Sigma_0)$ and $T_n(\Sigma_0)$ are tangent at the point $T_n(j)$.
7. THE CONVERSE TO THEOREM 4.1. Here we prove the converse to Theorem 4.1, namely that if $K(b_n)$ converges, then $\sum b_n$ diverges. In fact, we prove a stronger result, known as the Stern–Stolz theorem (see [9, Theorem 3.1]), which
$$\rho(j, t_n(j)) = \rho(f(j), f t_n(j)) = \rho(j, b_n + j) = \inf_{\gamma \in \Gamma} \int_\gamma \frac{|d\zeta|}{t} \le \int_\delta \frac{|d\zeta|}{t} = |b_n|.$$
Now, $T_{n-1}$ is also a hyperbolic isometry, so $\rho(j, t_n(j)) = \rho(T_{n-1}(j), T_n(j))$. Thus,
using the triangle inequality, we find that
H⊥ = {x + t j : x ∈ R, t > 0};
we call this the vertical hyperbolic plane, and it is the set of those points z + tj in $\mathbb{H}^3$ with y = 0. It is well known that a Möbius map f leaves the extended real line
invariant if and only if it can be written in the form f (z) = (az + b)/(cz + d), where
the coefficients a, b, c, and d are real, and ad − bc = ±1. Equally, this is so if and only
if f leaves H⊥ invariant. It follows that when considering real continued fractions (or,
$$g(x + tj) = \frac{x + tj}{x^2 + t^2},$$
$$\sum_{n=1}^{\infty} \operatorname{rad}[T_n(\Sigma_0)] < +\infty,$$
Figure 8.1. The beginnings of a chain of horospheres when all $b_n$ are real
where $\operatorname{rad}[T_n(\Sigma_0)]$ is the Euclidean radius of $T_n(\Sigma_0)$. Then, geometrically, it is clear that $K(b_n)$ converges. Lemma 6.3 tells us that $\operatorname{rad}[T_n(\Sigma_0)] = 1/(2|B_n|^2)$, and thus we recover the simple result that convergence of $\sum 1/|B_n|^2$ implies convergence of $K(b_n)$.
Another advantage of the geometric approach (using the horospheres $T_n(\Sigma_0)$) rather than the algebraic approach (using the coefficients $B_n$) is that the geometric approach generalizes to higher dimensions. The definition of a chain of horospheres extends in a straightforward fashion to N-dimensional hyperbolic space
REFERENCES
1. A. F. Beardon, The Geometry of Discrete Groups. Springer, New York, 1983, available at http://link.springer.com/book/10.1007%2F978-1-4612-1146-4.
2. A. F. Beardon, Continued fractions, discrete groups and complex dynamics, Comput. Methods Funct. Theory 1 (2001) 535–594, available at http://link.springer.com/article/10.1007%2FBF03321006#.
3. A. F. Beardon, M. Hockman, I. Short, Geodesic continued fractions, Michigan Math. J. 61 (2012) 133–
150.
4. L. R. Ford, On the closeness of approach of complex rational fractions to a complex irrational number,
Trans. Amer. Math. Soc. 27 (1925) 146–154.
5. L. R. Ford, Fractions, Amer. Math. Monthly 45 (1938) 586–601.
6. G. H. Hardy, E. M. Wright, An Introduction to the Theory of Numbers. Sixth edition. Oxford University
Press, Oxford, 2008.
7. S. Katok, I. Ugarcovici, Symbolic dynamics for the modular surface and beyond, Bull. Amer. Math. Soc.
44 (2007) 87–132, available at http://www.ams.org/journals/bull/2007-44-01/S0273-0979-
06-01115-3/S0273-0979-06-01115-3.pdf.
8. A. Ya. Khinchin, Continued Fractions. Translated from the third (1961) Russian edition. Reprint of the
1964 translation. Dover, Mineola, NY, 1997.
9. L. Lorentzen, H. Waadeland, Continued Fractions. Vol. 1. Second edition. Atlantis Studies in Mathemat-
ics for Engineering and Science, Atlantis Press, Paris, 2008.
10. C. Series, The modular surface and continued fractions, J. London Math. Soc. 31 no. 2 (1985) 69–80,
available at http://jlms.oxfordjournals.org/content/s2-31/1/69.extract.
ALAN F. BEARDON received his Ph.D. from Imperial College, London, in 1964 and has taught at the Uni-
versity of Maryland, the University of Canterbury, and from 1968 to retirement, at the University of Cambridge.
IAN SHORT received his Ph.D. from the University of Cambridge in 2005, under the supervision of Alan
Beardon. He is now a lecturer in mathematics at the Open University. His research interests include complex
analysis, continued fractions, dynamical systems, and hyperbolic geometry.
Department of Mathematics and Statistics, The Open University, Milton Keynes MK7 6AA, United Kingdom
ian.short@open.ac.uk
Abstract. Mathematicians manipulate sets with confidence almost every day, rarely making
mistakes. Few of us, however, could accurately quote what are often referred to as ‘the’ axioms
of set theory. This suggests that we all carry around with us, perhaps subconsciously, a reliable
body of operating principles for manipulating sets. What if we were to take some of those
principles and adopt them as our axioms instead? The message of this article is that this can
be done, in a simple, practical way (due to Lawvere). The resulting axioms are ten thoroughly
mundane statements about sets.
Mathematicians manipulate sets with confidence almost every day of their working
lives. We do so whenever we work with sets of real or complex numbers, or with vec-
tor spaces, topological spaces, groups, or any of the many other set-based structures.
These underlying set-theoretic manipulations are so automatic that we seldom give
them a thought, and it is rare that we make mistakes in what we do with sets.
However, very few mathematicians could accurately quote what are often referred to
as ‘the’ axioms of set theory, short of looking them up. We would not dream of working
with, say, Lie algebras without first learning the axioms. Yet many of us will go our
whole lives without learning ‘the’ axioms for sets, with no harm to the accuracy of our
work. This suggests that we all carry around with us, more or less subconsciously, a
reliable body of operating principles that we use when manipulating sets.
What if we were to write down some of these principles and adopt them as our ax-
ioms for sets? The message of this article is that this can be done, in a simple, practical
way. We describe an axiomatization due to F. William Lawvere [3, 4], informally sum-
marized in Figure 1. The axioms suffice for very nearly everything mathematicians
ever do with sets. So we can, if we want, abandon the classical axioms entirely and use
these instead.
in conflict with ZFC’s usage of ‘set’: If all elements of R are sets, and they all have no
elements, then they are all the empty set, from which it follows that all real numbers
are equal.
Could we, perhaps, continue to use ZFC while quietly ignoring the requirement that
the elements of a set must be sets too? No; this would leave us unable to state the ZFC
axioms. For example, one axiom states that every nonempty set X has some element x
such that x ∩ X = ∅, which only makes sense if the elements of X are sets. When X
is an ordinary set such as R, few would recognize this axiom as meaningful: What is
π ∩ R, after all?
I will anticipate an objection to these criticisms. The traditional approach to set the-
ory involves not only ZFC, but also a collection of methods for encoding mathematical
objects of many different types (real numbers, differential operators, random variables,
the Riemann zeta function, . . . ) as sets. This is similar to the way in which computer
software encodes data of many types (text, sound, images, . . . ) as binary sequences. In
both cases, even the designers would agree that the encoding methods are somewhat
arbitrary. So, one might object, no one is claiming that questions like ‘what are the
elements of π ?’ have meaningful answers.
However, the criticisms made in earlier paragraphs have nothing to do with the
matter of encoding. The bare facts are that in ZFC, it is always valid to ask of a set
‘what are the elements of its elements?’, and in ordinary mathematical practice, it is
not. Perhaps it is misleading to use the same word, ‘set’, for both purposes.
Figure 2. Mapping out of a basic object ($S^1$, $\mathbb{R}$, or 1) picks out figures of the appropriate type (loops, lines, or elements). Panels: (a) $S^1 \to X$; (b) $\mathbb{R} \to \mathbb{R}^n$; (c) $1 \to X$.
$$1 \xrightarrow{\;x\;} X \xrightarrow{\;f\;} Y, \quad \text{with composite } f(x) : 1 \longrightarrow Y.$$
Hence:
2. THE AXIOMS. Here we state our ten axioms on sets and functions, in entirely
elementary terms.
The formal axiomatization is in a different typeface, to distinguish it from the ac-
companying commentary. Some diagrams appear, but they are not part of the formal
statement.
First we state the data to which our axioms will apply:
• some things called sets ;
• for each set X and set Y , some things called functions from X to Y , with functions f from X to Y written as $f : X \longrightarrow Y$ or $X \xrightarrow{f} Y$;
• for each set X , set Y , and set Z , an operation assigning to each f : X −→ Y
and g : Y −→ Z a function g ◦ f : X −→ Z ;
• for each set X , a function 1 X : X −→ X .
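The data just listed can be modeled concretely for finite sets. The sketch below is purely illustrative and not part of the axiomatization: the class `Fn` and the helpers `compose` and `identity` are hypothetical names, and finite Python sets stand in for arbitrary sets.

```python
from dataclasses import dataclass


@dataclass
class Fn:
    """A function between finite sets: a domain, a codomain, and a total mapping."""
    dom: frozenset
    cod: frozenset
    mapping: dict

    def __post_init__(self):
        # Every element of the domain is sent to exactly one element of the codomain.
        assert set(self.mapping) == set(self.dom)
        assert all(v in self.cod for v in self.mapping.values())

    def __call__(self, x):
        return self.mapping[x]


def compose(g: Fn, f: Fn) -> Fn:
    """The composite g o f : X -> Z of f : X -> Y and g : Y -> Z."""
    assert f.cod == g.dom
    return Fn(f.dom, g.cod, {x: g(f(x)) for x in f.dom})


def identity(X: frozenset) -> Fn:
    """The identity function 1_X : X -> X."""
    return Fn(X, X, {x: x for x in X})


X = frozenset({1, 2, 3})
Y = frozenset({"a", "b"})
Z = frozenset({"even", "odd"})
f = Fn(X, Y, {1: "a", 2: "b", 3: "a"})
g = Fn(Y, Z, {"a": "odd", "b": "even"})
print(compose(g, f).mapping)
```

In this model the composite of f and g sends 1 and 3 to "odd" and 2 to "even", and composing with an identity function changes nothing.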
One-element set. We would like to say ‘there exists a one-element set’, but for the
moment we lack the expressive power to say ‘element’. However, any one-element
set T should have the property that for each set X , there is precisely one function
X −→ T . Moreover, only one-element sets should have this property. This motivates
the following definition and axiom.
Empty set.
Axiom 3. There exists a set with no elements.
Functions and elements. A function from X to Y should be nothing more than a way
of turning elements of X into elements of Y .
Axiom 4. Let X and Y be sets and f, g : X −→ Y functions. Suppose that f (x) =
g(x) for all x ∈ X . Then f = g.
Axioms 1, 2, and 4 imply that a set is terminal if and only if it has exactly one
element. This justifies the usage of ‘one-element set’ as a synonym for ‘terminal set’.
For all sets I and functions $X \xleftarrow{f_1} I \xrightarrow{f_2} Y$, there is a unique function $(f_1, f_2) : I \longrightarrow P$ such that $p_1 \circ (f_1, f_2) = f_1$ and $p_2 \circ (f_1, f_2) = f_2$.
Figure 3. The characteristic property of products
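For small finite sets, the characteristic property of products can be checked by brute force. This is a minimal sketch under the assumption that functions are represented as Python dicts; `make_product`, `pairing`, and `all_functions` are hypothetical helper names.

```python
from itertools import product as cartesian


def make_product(X, Y):
    """Return the set P of ordered pairs together with the projections p1, p2."""
    P = set(cartesian(X, Y))
    p1 = {pair: pair[0] for pair in P}
    p2 = {pair: pair[1] for pair in P}
    return P, p1, p2


def pairing(f1: dict, f2: dict) -> dict:
    """The function (f1, f2) : I -> P built from f1 : I -> X and f2 : I -> Y."""
    assert f1.keys() == f2.keys()
    return {t: (f1[t], f2[t]) for t in f1}


def all_functions(I, P):
    """Enumerate every function I -> P as a dict (feasible only for tiny sets)."""
    I = list(I)
    for values in cartesian(list(P), repeat=len(I)):
        yield dict(zip(I, values))


X, Y, I = {0, 1}, {"a", "b", "c"}, {"s", "t"}
P, p1, p2 = make_product(X, Y)
f1 = {"s": 0, "t": 1}
f2 = {"s": "c", "t": "a"}

# Collect every h : I -> P with p1 o h = f1 and p2 o h = f2.
solutions = [
    h for h in all_functions(I, P)
    if all(p1[h[t]] == f1[t] and p2[h[t]] == f2[t] for t in I)
]
print(len(solutions), solutions[0] == pairing(f1, f2))
```

Of the 36 functions from I to P in this example, exactly one is compatible with both projections, and it is the pairing.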
Sets of functions. In everyday mathematics, we can form the set $Y^X$ of functions from one set X to another set Y . For any set I , the functions $q : I \times X \longrightarrow Y$ correspond one-to-one with the functions $\bar{q} : I \longrightarrow Y^X$, simply by changing the punctuation:
$$q(t, x) = (\bar{q}(t))(x) \tag{1}$$
($t \in I$, $x \in X$). For example, when I = 1, this reduces to the statement that the functions $X \longrightarrow Y$ correspond to the elements of $Y^X$.
In (1), we are implicitly using the evaluation map
$$\varepsilon : Y^X \times X \longrightarrow Y, \qquad (f, x) \longmapsto f(x).$$
Then (1) becomes the equation $q(t, x) = \varepsilon(\bar{q}(t), x)$, as in the following definition.
Axiom 6. For all sets X and Y , there exists a function set from X to Y .
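The correspondence between q and its transpose is what programmers call currying. As an illustrative sketch for finite sets only, an element of the function set can be stored as a tuple of (input, output) pairs; the names `function_set`, `evaluate`, and `transpose` are hypothetical.

```python
from itertools import product


def function_set(X, Y):
    """All functions X -> Y, each stored as a tuple of (x, y) pairs."""
    X, Y = list(X), list(Y)
    return [tuple(zip(X, ys)) for ys in product(Y, repeat=len(X))]


def evaluate(f, x):
    """The evaluation map: (f, x) |-> f(x)."""
    return dict(f)[x]


def transpose(q: dict, I, X) -> dict:
    """Turn q : I x X -> Y (a dict keyed by (t, x)) into q-bar : I -> Y^X."""
    X = list(X)
    return {t: tuple((x, q[(t, x)]) for x in X) for t in I}


I = [0, 1]
X = ["u", "v"]
Y = [False, True]
q = {(t, x): (t == 1 and x == "u") for t in I for x in X}
q_bar = transpose(q, I, X)

# q(t, x) agrees with evaluate(q_bar(t), x) for every t and x.
print(all(q[(t, x)] == evaluate(q_bar[t], x) for t in I for x in X))
```

Here the function set has $|Y|^{|X|} = 4$ elements, and each value of the transpose is one of them.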
For all sets I and functions q : I −→ X such that f (q(t)) = y for all t ∈ I ,
there is a unique function q̄ : I −→ A such that q = j ◦ q̄.
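[Diagram: $\bar{q} : I \to A$, $j : A \to X$, $f : X \to Y$, and $y : 1 \to Y$, with $q = j \circ \bar{q}$.]
[Diagram: square with $A \to 1$, $j : A \to X$, $t : 1 \to 2$, and $\chi : X \to 2$.]
[Diagram: $1 \xrightarrow{0} \mathbb{N} \xrightarrow{s} \mathbb{N}$ above $1 \xrightarrow{a} X \xrightarrow{r} X$, joined by $1_1$ and two copies of $x : \mathbb{N} \to X$.]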
The meaning of ‘the’. It remains to reassure any readers concerned by the liberty
taken in Axioms 2 and 5, where we chose once and for all a terminal set and a cartesian
product for each pair of sets.
This type of liberty is very common in mathematical practice. We speak of the
trivial group, the 2-sphere, the direct sum of two vector spaces, etc., even though we
can conceive of many trivial groups or 2-spheres or direct sums, all isomorphic but
not equal. Anyone asking ‘but which trivial group?’ is likely to be met with a hard
stare, and for good reason: No meaningful statement about groups depends on what
the element of the trivial group happens to be named.
However, we should be able to state the axioms with scrupulous rigor, and we can.
One way to do so is not to single out a particular terminal set or particular products,
but instead to adopt some circumlocutions. For example, we replace the phrase ‘for all
elements x ∈ X ’ by ‘for all terminal sets T and functions x : T −→ X .’
More satisfactory, though, is to extend the list of primitive concepts. To the existing
list (sets, functions, composition and identities) we add:
• a distinguished set, 1;
• an operation assigning to each pair of sets X, Y a set X × Y and functions
$$X \xleftarrow{\ \mathrm{pr}_1^{X,Y}\ } X \times Y \xrightarrow{\ \mathrm{pr}_2^{X,Y}\ } Y. \tag{2}$$
Axiom 2 is replaced by the statement that 1 is terminal, and Axiom 5 by the statement
that for all sets X and Y , the set X × Y together with the functions (2) is a product of
X and Y .
This approach has the virtue of reflecting ordinary mathematical usage. We usually
speak as if taking the product of two sets (or spaces, groups, etc.) were a procedure
with a definite output: the product, not a product. But since products are in any case
determined uniquely up to unique isomorphism, whether or not we nominate one as
special makes no significant difference.
3. DISCUSSION. The ten axioms are familiar in their intuitive content, but less so
as an axiomatic system. Here we discuss the implications of using them as such.
How strong are the axioms? Most mathematicians will never need more properties
of sets than those guaranteed by the ten axioms. For example, McLarty [13] argues
that no more is needed anywhere in the canons of the Grothendieck school of alge-
braic geometry, the multi-volume works Éléments de Géométrie Algébrique (EGA)
and Séminaire de Géométrie Algébrique (SGA).
To get a sense of the reach of the axioms, let us consider infinite cartesian products.
Let I be a (possibly infinite) set and $(X_i)_{i \in I}$ a family of sets. Can we form the product $\prod_{i \in I} X_i$? The answer depends on what is meant by ‘family’. We could define an I-indexed family to be a set X together with a function $p : X \longrightarrow I$, viewing the fiber $p^{-1}(i)$ as the ith member $X_i$. In that case, $\prod X_i$ can be constructed as a subset of $X^I$. Specifically, p induces a function $p^I : X^I \longrightarrow I^I$, and $\prod X_i$ is the inverse image under $p^I$ of the element of $I^I$ corresponding to $1_I$.
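In finite terms, the construction above says that an element of the product is a function $s : I \to X$ with $p \circ s = 1_I$, that is, a choice of one point from each fiber. The following sketch enumerates such functions for a small example; `family_product` and the sample data are hypothetical.

```python
from itertools import product


def family_product(p: dict, I):
    """All s : I -> X with p(s(i)) = i for every i, where p : X -> I is a dict."""
    I = list(I)
    # fibers[k] lists the elements of X lying over I[k].
    fibers = [[x for x in p if p[x] == i] for i in I]
    return [dict(zip(I, choice)) for choice in product(*fibers)]


# Fibers of sizes 2, 3, and 1 over the index set {0, 1, 2}.
p = {"a0": 0, "a1": 0, "b0": 1, "b1": 1, "b2": 1, "c0": 2}
sections = family_product(p, [0, 1, 2])
print(len(sections))
```

The number of sections equals the product of the fiber sizes, and it drops to zero as soon as one fiber is empty.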
However, we could interpret ‘I-indexed family’ differently, as an algorithm or formula that assigns to each $i \in I$ a set $X_i$. It is not obvious that we can then form the disjoint union $X = \coprod_{i \in I} X_i$, which is what would be necessary in order to obtain a family in the previous sense. In fact, writing $P(S) = 2^S$ for the power set of a set S,
the ten axioms do not guarantee the existence of the disjoint union
A broader view. Our ten axioms are a standard rephrasing of Lawvere’s Elementary
Theory of the Category of Sets (ETCS), published in 1964. It was some years before
ETCS found its natural home, and that was with the advent of topos theory.
The notion of topos was invented by Grothendieck for reasons that had nothing to
do with set theory. For Grothendieck, a topos was a generalized topological space.
Formally, a topos is a category with certain properties, and a topological space X is
associated with the topos whose objects are the sheaves of sets on X .
Lawvere and Tierney swiftly realized that, after a slight loosening of Grothendieck’s
definition, the ETCS axioms could be restated neatly in topos-theoretic terms [16, 17].
Indeed, ETCS says exactly that sets and functions form a topos of a special sort: a
‘well-pointed topos with natural numbers object and choice’. So a topos is not only a
generalized space; it is also a generalized universe of sets.
An attractive feature of ETCS is that each of the axioms is meaningful in a broader
context than set theory. For example, Axiom 1 states that sets and functions form a
category. The job of the remaining axioms is to distinguish sets from other structures
that form categories. Axioms 2 and 5 state that the category of sets has finite products.
This important property is shared by (for example) the categories of topological spaces
and smooth manifolds, which is exactly what makes it possible to define ‘topological
group’ and ‘Lie group’. But for one detail, Axioms 1, 2, 5, 6, 7 and 8 state that sets
and functions form a topos.
The axiom of choice as formulated in Axiom 10 highlights a special feature of sets.
In most other categories of sets-with-structure, it fails, and its failure is a point of
interest. For instance, not every continuous surjection between topological spaces has
a continuous right inverse, a typical example being the nonexistence of a continuous
square root defined on the complex plane.
What kind of set theory should we teach? As Figure 1 indicates, we already teach a
diluted form of the ten axioms, even in introductory courses. For example, we certainly
tell our students that an element of X × Y is an element of X together with an element
of Y , and we routinely write a function f taking values in R2 as ( f 1 , f 2 ), although we
are less likely to state explicitly that, given functions f 1 : I −→ X and f 2 : I −→ Y ,
there is a unique function f : I −→ X × Y with f 1 and f 2 as components.
When it comes to teaching axiomatic set theory, the approach outlined here has ad-
vantages and disadvantages. The great advantage is that such a course is of far wider
benefit than one using the traditional axioms. It directly addresses a difficulty experi-
enced by many students: the concept of function (and worse, function space). It also
introduces in an elementary setting the idea of universal property. This is probably
the hardest aspect of the axioms for a learner, but since universal properties are im-
portant in so many branches of advanced mathematics, the benefits are potentially
far-reaching.
The disadvantages are perhaps only temporary. There is at present a lack of teaching
materials (the book [5] being the main exception). For example, the axioms imply
that any two sets have a disjoint union, and most books on topos theory contain an
elegant and sophisticated proof of a generalization of this fact, but to my knowledge,
there is only one place where a purely elementary proof can be found [18]. A second
disadvantage is that any student planning a career in set theory will need to learn ZFC
anyway, since almost all research-level set theory is done with the iterated-membership
conception of set. (That is the current reality, which is not to say that set theory must
Reactions to an earthquake. Perhaps you will wake up tomorrow, check your email,
and find an announcement that ZFC is inconsistent. Apparently, someone has taken
the ZFC axioms, performed a long string of logical deductions, and arrived at a con-
tradiction. The work has been checked and re-checked. There is no longer any doubt.
How would you react? In particular, how would you feel about the implications for
your own work? All your theorems would still be true under ZFC, but so too would
their negations. Would you conclude that your life’s work had been destroyed?
An informal survey suggests that most of us would be interested but not deeply
troubled. We would go on believing that our theorems were true in a sense that their
negations were not. We are unlikely to feel threatened by the inconsistency of axioms
to which we never referred anyway.
In contrast, the ten axioms above are such core mathematical principles that an in-
consistency in them would be devastating. If we cannot safely assume that composition
of functions is associative, or that repeatedly applying a function f : X −→ X to an
element a ∈ X produces a sequence ( f n (a)), we are really in trouble.
The difference in reactions is telling. Our response to an inconsistency in an ax-
iomatization of set theory reflects our degree of belief that it describes the operating
principles we actually employ, in ordinary mathematical practice.
In summary, simply by writing down a few mundane, uncontroversial statements
about sets and functions, we arrive at an axiomatization that fits well with how sets are
really used in mathematics.
ACKNOWLEDGMENTS. I thank François Dorais, Colin McLarty, Todd Trimble, the patrons of the n-
Category Café, and the anonymous referees. This work was partially supported by an EPSRC Advanced Re-
search Fellowship.
REFERENCES
1. S. Axler, Down with determinants! Amer. Math. Monthly 102 (1995) 139–154.
2. J. C. Cole, Categories of sets and models of set theory, in Proceedings of the Bertrand Russell Memorial
Logic Conference. Uldum 1971. Bertrand Russell Memorial Conference, Leeds, 1973. 351–399.
3. F. W. Lawvere, An elementary theory of the category of sets, Proc. Natl. Acad. Sci. USA 52 (1964)
1506–1511.
4. F. W. Lawvere, An elementary theory of the category of sets (long version) with commentary, Repr. Theory Appl. Categ. 12 (2005) 1–35, available at http://www.tac.mta.ca/tac/reprints/articles/11/tr11abs.html.
5. F. W. Lawvere, R. Rosebrugh, Sets for Mathematics. Cambridge University Press, Cambridge, 2003.
6. S. Mac Lane, Mathematics: Form and Function. Springer, New York, 1986.
7. S. Mac Lane, I. Moerdijk, Sheaves in Geometry and Logic. Springer, New York, 1994.
8. A. Mathias, The strength of Mac Lane set theory, Ann. Pure Appl. Logic 110 (2001) 107–234.
9. C. McLarty, Elementary Categories, Elementary Toposes. Oxford University Press, Oxford, 1992.
10. C. McLarty, Numbers can be just what they have to, Noûs 27 (1993) 487–98.
11. C. McLarty, Challenge axioms, final draft, email to Foundations of Mathematics mailing list, 6 February 1998, available at http://www.cs.nyu.edu/pipermail/fom.
12. C. McLarty, Exploring categorical structuralism, Philos. Math. 12 (2004) 37–53.
13. C. McLarty, A finite order arithmetic foundation for cohomology (2011), available at http://arxiv.org/abs/1102.1773.
14. W. Mitchell, Boolean topoi and the theory of sets, J. Pure Appl. Algebra 2 (1972) 261–274.
15. G. Osius, Categorical set theory: a characterization of the category of sets, J. Pure Appl. Algebra 4 (1974)
79–119.
16. M. Tierney, Sheaf theory and the continuum hypothesis, in Toposes, Algebraic Geometry and Logic.
Lecture Notes in Math., Vol. 274, Springer, Heidelberg, 1972. 13–42.
TOM LEINSTER studied at Oxford and Cambridge, and held positions at Cambridge, the Institut des Hautes
Études Scientifiques, and the University of Glasgow before taking up his current job in Edinburgh. He is inter-
ested in category theory and its applications, especially some of the more unusual ones. He is a professionally
qualified masseur.
School of Mathematics, University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom.
Tom.Leinster@ed.ac.uk
REFERENCES
1. R. Courant, H. Robbins, What is Mathematics? Oxford University Press, New York, 1941.
2. L. J. Goldstein, D. I. Schneider, M. J. Siegel, Finite Mathematics and its Applications. Eighth
edition. Pearson, Upper Saddle River, NJ, 2004.
3. C. M. Grinstead, J. L. Snell, Introduction to Probability. Second edition. American Mathematical
Society, Providence, RI, 1997.
4. T. L. Saaty, J. Bram, Nonlinear Mathematics. Dover, New York, 1964.
Abstract. We investigate the problem of finding integers k such that appending any number
of copies of the base-ten digit d to k yields a composite number. In particular, we prove that
there exist infinitely many integers coprime to all digits such that repeatedly appending any
digit yields a composite number.
1. INTRODUCTION. Recently, L. Jones [5] asked about integers that yield only
composites when a sequence of the same base-ten digit is appended to the right. He
showed that 37 is the smallest number with this property when appending the digit
d = 1. For each digit d ∈ {3, 7, 9}, he also found numbers coprime to d that yield only
composites upon appending ds.
In this paper, we find a single integer that works for all digits simultaneously. More
precisely, we prove the following.
Further, we investigate the question of the smallest numbers that remain composite
upon appending strings of a digit for each particular digit. Jones found, for digits 3, 7,
9, respectively, the examples 4070, 606474, and 1879711. It appears that 4070 is the
smallest for d = 3; for digit 7 we found 891, which is almost certainly minimal; and
for digit 9, the likely answer 10175 was discovered by [14]. In the next section, we
explain the obstructions to proving that these three answers are the smallest.
2. SEEDS. Given a digit d, let’s use the term seed for a number coprime to d such
that appending any number of ds on the right yields a composite. The smallest positive
integer with this property will be referred to as a minimal seed. Only the cases d ∈
{1, 3, 7, 9} are nontrivial. Jones proved that 37 is the minimal seed for d = 1, and he
also found the seed 4070 for digit 3. For every k < 4070, except 817, we have found a
value of n such that appending n 3s yields a prime or, in three cases, a probable prime.
For 817, appending up to 554789 3s yielded only composites. But factorizations show
no apparent obstruction to primality, so we conjecture that 4070 is the minimal seed
for digit 3.
A key concept in this area is the notion of a covering set, introduced by P. Erdős
[3]. Such a set corresponds to a finite list of primes such that every member of a given
sequence is divisible by one of the primes. Here the sequences are the numbers, which
we call sn , obtained by appending n copies of a digit d to an initial value k; typically,
the numbers are proved composite by finding a covering set. For example, when n
7s are appended to 891, the resulting number is divisible by 11, 37, 11, 3, 11, or 13
according to the mod-6 residue of n (starting at 0).
http://dx.doi.org/10.4169/amer.math.monthly.121.05.416
MSC: Primary 11A41, Secondary 11A07; 11A51
$$s_n = k \cdot 10^n + \frac{d(10^n - 1)}{9}.$$
Because $10^6 \equiv 1$ modulo each of the four primes, easy modular arithmetic shows that $s_{6m+i} \equiv 0 \pmod{p}$ for the cases p = 11, 13, and 37, where i, depending on p, is 0, 2, 4, 5, or 1. The same is true for i = 3, the case where p = 3, because $10^{6m+3} - 1$ is divisible by 27, thus eliminating the denominator of 9 in these cases. This proves that 891 is a seed for digit 7.
When a sequence of primes $(p_0, p_1, \ldots, p_{r-1})$ divides the corresponding sequence of terms $s_n$ for a digit d and seed k, we say that the primes form a prime cover for (k, d). For example, (11, 37, 11, 3, 11, 13) is a prime cover for (891, 7).
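A prime cover is easy to spot-check by machine. The sketch below tests the cover for (891, 7) over a finite range of n; the function names and the `cycles` parameter are hypothetical, and a finite check of this kind is only a sanity test, since the proof for all n rests on the periodicity argument given above.

```python
def appended(k: int, d: int, n: int) -> int:
    """s_n: the integer k followed by n copies of the digit d."""
    return k * 10**n + d * (10**n - 1) // 9


def is_prime_cover(primes, k: int, d: int, cycles: int = 20) -> bool:
    """Check that primes[n mod r] properly divides s_n for n < r * cycles."""
    r = len(primes)
    return all(
        appended(k, d, n) % primes[n % r] == 0
        and appended(k, d, n) > primes[n % r]
        for n in range(r * cycles)
    )


print(appended(891, 7, 3))                               # 891777
print(is_prime_cover([11, 37, 11, 3, 11, 13], 891, 7))   # True
```

Requiring $s_n$ to exceed its divisor ensures that divisibility really does force compositeness.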
We have shown that 891 is a minimal seed for digit 7, under the assumption that
appending 11330 7s to 480, and 28895 7s to 851 yields primes. Each of these two
large numbers has passed 200 strong pseudoprime tests. For all other potential seeds
below 891, we have found primes that can be certified using elliptic curve methods
with Mathematica or Primo [9]. We used Primo on the largest cases; the largest was
9777 . . . 7 with 2904 7s, which took 45 hours.
The digit-9 case asks for an integer k such that $(k + 1)10^n - 1$ is always composite; it is thus a variation on the classic Riesel problem [7, 11, 12, 13], which addresses the same question in base 2. For that classic case, it is known that 509202 is a seed, meaning that $509203 \cdot 2^n - 1$ is composite for $n \ge 0$. Participants in the Riesel project
have also investigated the decimal case, and showed [14] that the expected minimal
seed for digit 9 is 10175. To see that this is a seed, we again consider the number of
appended digits modulo 6 and find a prime cover: in this case (11, 7, 11, 37, 11, 13).
Of the numbers smaller than 10175, only 4420 has not been eliminated as a seed.
The Riesel project [12, 13] has checked it through the addition of 940000 9s without
finding a prime. In this case, primality proving for a probable prime is easy using the
Lucas n + 1 test [2].
Coverings are not the only tool in these investigations, since sometimes factoriza-
tions yield all the compositeness that is sought. Consider the situation with digit 1 but
working in base $b = m^2$ with m odd. The minimal seed in all such cases is 1 because,
for n appended 1s to the seed 1, with n even, the factorization
yields integer factors, and so the result is composite. When n is odd, the total number
of 1s is even, so compositeness is clear. Similar factorization methods show that the
minimal seed for digit 1 in base 4 is 5, for digit 3 in base 4 is 8, and for digit 8 in base
9 is 3.
Proof. A proof requires only checking that particular covers work, but we outline the
method by which the large seed and corresponding prime covers were found. We find,
for each digit, a prime cover so that the congruence conditions on k arising from the
four covers do not contradict each other. This method of coherent prime covers was
used in [1, 4, 8] to find infinitely many values k so that both $k \cdot 2^n + 1$ and $k \cdot 2^n - 1$ are
composite for all n, and solve related problems. To find such covers, we first need to
analyze the condition that a term in the sequence {sn } is divisible by a given prime p.
If we assume that $p \notin \{2, 3, 5\}$, then $s_n \equiv 0 \pmod{p}$ if and only if p divides
which is equivalent to
$$s_n \equiv k + d\,\frac{10^n - 1}{9} \equiv 0 \pmod{3},$$
which, because $(10^n - 1)/9 \equiv n \pmod{3}$, reduces to $k \equiv 2dn \pmod{3}$. It is useful to observe that when n is even then $10^n \equiv 1 \pmod{11}$, so that in this case $s_n$ is congruent
modulo 11 to the seed itself. Therefore, the condition k ≡ 0 (mod 11) makes 11 a
factor of sn whenever n is even. Hence we may focus on forcing composites for odd
values of n.
Since the period of p = 37 is 3, we consider this prime next. When the number of
appended digits is n = 6i + 3, equation (1) gives
$$k \equiv 10^{-(6i+3)} - 1 = \left(10^{-6}\right)^i \cdot 10^{-3} - 1 \equiv 0 \pmod{37}.$$
          n (mod 6)
digit   0    1    2    3    4    5
  1    11    ?   11   37   11    ?
  3    11    ?   11   37   11    ?
  7    11    ?   11   37   11    ?
  9    11    ?   11   37   11    ?
show that if k ≡ 2 (mod 7), then two of the eight cases are divisible by 7: the digit 1
with n ≡ 1 (mod 6) and digit 9 with n ≡ 5 (mod 6) cases. Similarly, any of k ≡ 1,
3, or 9 (mod 13) provides divisibility for two of the cases. Each of these cases is then
combined with a set of additional primes that contains 3, 101, 41, 271, 73, and 137, all
of which have period 8 or less. Finally, a computer search found a list of primes that
handles all cases.
Table 2. Conditions on k to guarantee that 7 or 13 divides the number obtained by appending a digit
string to k.
The smallest value of k found so far uses the primes 3, 7, 11, 13, 31, 37, 41, 73,
101, 137, 211, 241, and 271. The cover-lengths for the four digit-cases are 6, 6, 30,
and 8, respectively. The prime covers for the four digits are as follows:
Tables 3 and 4 show the correspondence between the values of n and k for each
digit. For example, when we are appending n 7s to k where n ≡ 11 (mod 30), we see
that 41 divides sn whenever k ≡ 28 (mod 41).
We apply the Chinese Remainder Theorem to all of the conditions on k in Tables 3
and 4 to find the pandigital seed k = 4942768284976776320.
digit 1 digit 3
classes for n classes for k classes for n classes for k
Table 4. Residue classes for the seed k that guarantee the compositeness of sn when
7 or 9 is appended.
digit 7 digit 9
classes for n classes for k classes for n classes for k
a value of k that satisfies the theorem as stated in §1. This yields infinitely many such
values.
REFERENCES
JON GRANTHAM has been a researcher at the Center for Computing Sciences since 1997. His previous
publications addressed the existence of various types of pseudo primes. He lives in Maryland with his wife and
their twin three-year-olds.
Institute for Defense Analyses, Center for Computing Sciences, 17100 Science Drive, Bowie, MD 20715
grantham@super.org
WITOLD JARNICKI worked at the Jagiellonian University (Kraków, Poland) until 2007. His research con-
centrated on algebraic geometry and complex analysis. Since 2007 he has been working as a software engineer
at Google.
Google Kraków, Rynek Glowny 12, 31-042 Kraków, Poland
witoldjarnicki@google.com
JOHN RICKERT teaches at the Rose-Hulman Institute of Technology and writes about number theory and
graph theory. In his spare time he researches baseball history and statistics.
Rose-Hulman Institute of Technology, 5500 Wabash Avenue, Terre Haute, IN 47803
rickert@rose-hulman.edu
STAN WAGON is recently retired from Macalester College. His books include The Banach-Tarski Paradox,
Mathematica in Action, and VisualDSolve. Other interests include geometric snow sculpture, mountaineering,
and mushroom hunting. He is one of the founding editors of Ultrarunning magazine, but now finds that covering
long distances is much easier on skis than in running shoes.
Mathematics Department, Macalester College, St. Paul, MN 55105
wagon@macalester.edu
is discussed, for instance, in [17, 25]. The cyclic structure of (1) is formalized as
follows:
$$\frac{\pi}{2} = \prod_{n=0}^{\infty} \frac{2n+2}{2n+1}\cdot\frac{2n+2}{2n+3}. \tag{2}$$
This is generalized in the next proposition, which is a slight variation of a part of [20,
Theorem 1]. The result is the following Euler’s formula [8, Section VII.6] applied to
fractions:
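$$\frac{\sin(\pi x)}{\pi x} = \prod_{j=1}^{\infty}\left(1-\frac{x^2}{j^2}\right) = \prod_{n=0}^{\infty}\left(1-\frac{x^2}{(n+1)^2}\right), \quad \text{for } x \in \mathbb{R}. \tag{3}$$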
http://dx.doi.org/10.4169/amer.math.monthly.121.05.422
MSC: Primary 11Y60, Secondary 60C05; 33B15
Example 2.2.
(a) Letting k = 6 and m = 1 in (4), we obtain [25, p. 187]
$$\prod_{n=0}^{\infty} \frac{6n+6}{6n+5}\cdot\frac{6n+6}{6n+7} = \frac{\pi}{3}.$$
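As a numerical sanity check, the following sketch evaluates partial products of the Wallis product (2) and of the product in Example 2.2(a). The helper name `partial_product` and the cutoff of 100,000 factors are illustrative choices, not taken from the paper; convergence is slow (the error decays roughly like 1/N), so only a few digits should be expected to agree.

```python
import math

def partial_product(a: int, terms: int) -> float:
    """Partial product of (a*n + a)^2 / ((a*n + a - 1) * (a*n + a + 1)), n = 0..terms-1.

    a = 2 gives the Wallis product (2); a = 6 gives the product of Example 2.2(a).
    Logarithms are summed to avoid accumulating rounding error in a long product.
    """
    log_sum = 0.0
    for n in range(terms):
        c = a * n + a
        log_sum += 2 * math.log(c) - math.log(c - 1) - math.log(c + 1)
    return math.exp(log_sum)

wallis = partial_product(2, 100_000)    # should be close to pi / 2
example = partial_product(6, 100_000)   # should be close to pi / 3

print(wallis, math.pi / 2)
print(example, math.pi / 3)
```

Both limits are instances of (3): with x = 1/a, the reciprocal of the infinite product in (3) equals (π/a)/sin(π/a).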
We remark that (3) can be thought of as the representation of the infinite Taylor
polynomial
If p ∈ Q, the above formulas yield expressions for infinite products of fractions whose
periodic structure resembles that of the Wallis product.
In fact, the Weierstrass factorization theorem implies that [24, Section 12.13]:
$$\prod_{n=0}^{\infty}\prod_{j=1}^{d}\frac{n+x_j}{n+y_j} = \prod_{j=1}^{d}\frac{\Gamma(y_j)}{\Gamma(x_j)}, \quad \text{for } x_j, y_j \in \mathbb{C}, \tag{7}$$
In view of (7), the condition $\sum_{j=1}^{d} a_j = \sum_{j=1}^{d} b_j$ must hold to ensure that $\alpha$ is finite
In fact, (6) and (9) with $x = \frac{\pi}{2^p+1}$ imply
$$\prod_{n=0}^{\infty}\prod_{j=1}^{p} \frac{(2^{p+1}+2)n + 2^p - 2^j + 1}{(2^{p+1}+2)n + 2^p + 1}\cdot\frac{(2^{p+1}+2)n + 2^p + 2^j + 1}{(2^{p+1}+2)n + 2^p + 1} = \frac{1}{2^p}.$$
It is curious to note that (10) above can be also derived from the result of [18] with
n = 18.
Example 3.3. Vieta [3, p. 53] (see also [13, Chapter 1]) showed that $\frac{2}{\pi} = \prod_{n=1}^{\infty} S_n$,
where $S_1 = \sqrt{1/2}$ and $S_n = \sqrt{1/2 + (1/2)\,S_{n-1}}$ for $n > 1$. Osler considered in [16] the
“united Vieta-Wallis-like products”
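$$\frac{\sin(\pi x)}{\pi x} = \prod_{m=1}^{p} \underbrace{\sqrt{\frac12 + \frac12\sqrt{\frac12 + \cdots + \frac12\sqrt{\frac12 + \frac12\cos(\pi x)}}}}_{m\ \text{radicals}} \; \prod_{n=1}^{\infty} \frac{2^p n - x}{2^p n}\cdot\frac{2^p n + x}{2^p n} \tag{11}$$
with $p \in \mathbb{N} \cup \{0\}$. The proof of (11) rests on (4), (9), and the identity $\cos\frac{\theta}{2} = \sqrt{1/2 + (1/2)\cos\theta}$.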
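$$\prod_{n=0}^{\infty} \frac{30n+9}{30n+10}\cdot\frac{30n+19}{30n+18} = \frac{\Gamma(1/3)\,\Gamma(3/5)}{\Gamma(3/10)\,\Gamma(19/30)} = \frac{\sqrt{\sqrt{15} + \sqrt{5 + 2\sqrt{5}}}}{2^{19/30}\, 3^{1/20}\, 5^{1/3}},$$
$$\prod_{n=0}^{\infty} \frac{12n+5}{12n+6}\cdot\frac{12n+9}{12n+8} = \frac{\Gamma(1/2)\,\Gamma(2/3)}{\Gamma(3/4)\,\Gamma(5/12)} = \frac{\sqrt{\sqrt{3}+1}}{2^{1/4}\, 3^{3/8}},$$
$$\prod_{n=0}^{\infty} \left(\frac{6n+2}{6n+4}\right)^{3} \frac{6n+5}{6n+1}\cdot\frac{6n+5}{6n+3} = \frac{\{\Gamma(1/3)\}^{3}\,\Gamma(5/6)\,\Gamma(5/6)}{\{\Gamma(2/3)\}^{3}\,\Gamma(1/6)\,\Gamma(3/6)} = 1.$$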
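The gamma-function side of (7) can be compared numerically with truncated products. The sketch below does this for the three products above; the function names and the truncation length are illustrative assumptions. When the sums of the x_j and of the y_j agree, each factor is 1 + O(1/n²), so the truncation error is of order 1/N.

```python
import math

def gamma_side(xs, ys):
    """Right-hand side of (7): product over j of Gamma(y_j) / Gamma(x_j)."""
    value = 1.0
    for x, y in zip(xs, ys):
        value *= math.gamma(y) / math.gamma(x)
    return value

def product_side(xs, ys, terms=200_000):
    """Left-hand side of (7), truncated after `terms` values of n."""
    log_sum = 0.0
    for n in range(terms):
        for x, y in zip(xs, ys):
            log_sum += math.log(n + x) - math.log(n + y)
    return math.exp(log_sum)

# (x_j) are the numerator offsets and (y_j) the denominator offsets,
# after dividing every linear factor by its leading coefficient.
examples = [
    ([9 / 30, 19 / 30], [10 / 30, 18 / 30]),
    ([5 / 12, 9 / 12], [6 / 12, 8 / 12]),
    ([2 / 6] * 3 + [5 / 6, 5 / 6], [4 / 6] * 3 + [1 / 6, 3 / 6]),
]

for xs, ys in examples:
    print(product_side(xs, ys), gamma_side(xs, ys))
```

For the third product the gamma ratio equals 1, so the ratio printed by this sketch and its reciprocal (the form displayed above) coincide.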
$$\frac{\Gamma(1/24)\,\Gamma(11/24)}{\Gamma(5/24)\,\Gamma(7/24)} = \sqrt{6 + 3\sqrt{3}},$$
we can compute
$$\prod_{n=0}^{\infty} \frac{24n+5}{24n+1}\cdot\frac{24n+7}{24n+11}.$$
Example 3.5. It is observed in [1, Section 4.4] that the key formula (7), along with
the so-called “standard equations” for the gamma function (translation, reflection, and
multiplication), can be used to derive the following identity, which is valid for any
integer k ≥ 0:
$$\prod_{n=0}^{\infty}\prod_{j=1}^{k} \frac{(2n+1)(2k+1)-2j}{(2n+1)(2k+1)-2j+1}\cdot\frac{(2n+1)(2k+1)+2j}{(2n+1)(2k+1)+2j-1} = \frac{1}{\sqrt{2k+1}}. \tag{12}$$
The observation that this product can be evaluated using only the standard equations
is interesting in the light of Rohrlich’s conjecture, which, informally speaking, asserts
that these are the only relations available for the values of the gamma function at rational
points, while all others can be obtained as their consequences (see, for instance, [10]
and [22, Section 4.1]).
Example 3.6. The following “kth order Wallis’s product” is somewhat similar to (12)
written as
n(2k + 1) + 2 j − 1 (−1)
∞ Y k n −1
Y 22k 2k
=√ .
n=1 j=1
n(2k + 1) + 2 j 2k + 1 k
k−1
∞ Y k−1
Y 2nk 2nk Y 2j + 1 2j + 1
Ak := = 0 ·0 1+
n=1 j=0
2nk − 2 j − 1 2nk + 2 j + 1 j=0
2k 2k
( Q2k−1 )2
k−1
2j + 1 2 (2k)k 0 1 m
Y 2k +
= · 0 1+ = m=1 2k
(2k − 1)!! 0
Qk−1 m
j=0
2j + 1 2k m=1 1 + k
π k (2k − 1)!!
= .
2k k
For instance, A1 is Wallis’s product and
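$$A_2 = \frac{4}{1}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{4}{7}\cdot\frac{8}{5}\cdot\frac{8}{7}\cdot\frac{8}{9}\cdot\frac{8}{11}\cdot\frac{12}{9}\cdot\frac{12}{11}\cdot\frac{12}{13}\cdot\frac{12}{15}\cdot\frac{16}{13}\cdot\frac{16}{15}\cdot\frac{16}{17}\cdot\frac{16}{19}\cdots = \frac{3\pi^2}{8}.$$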
That is, $A_n(\omega)$ is the set of realizations whose first $n$ coordinates coincide with
those of $\omega$. Note that $\cap_{n\in\mathbb{N}} A_n(\omega) = \{\omega\}$. Let $\mathcal{F}_n$ denote the σ-algebra generated by
$\omega_1, \dots, \omega_n$ and let $\mathcal{F}_\infty$ denote the σ-algebra generated by $\cup_{n\in\mathbb{N}} \mathcal{F}_n$. Let $P_a$ denote the
law of the urn on $\mathcal{F}_\infty$, and let $E_a$ be the corresponding expectation. Also, let
$$T_{n,i}(\omega) = \sum_{j=1}^{n} \mathbf{1}_{\{\omega_j = i\}} \quad\text{and}\quad X_{n,i}(\omega) = \frac{a_i + kT_{n,i}(\omega)}{|a| + kn}$$
denote the number of times a color $i$ ball was drawn in the first $n$ iterations and the
fraction of color $i$ balls after $n$ iterations, respectively. Finally, for $n \in \mathbb{N}$, write $X_n = (X_{n,1}, \dots, X_{n,p})$, $T_n = (T_{n,1}, \dots, T_{n,p})$, and let $X_0 = \frac{a}{|a|}$ and $T_0 = (0, \dots, 0)$. Then
for all n ≥ 0, we have
$$P_a(\omega_{n+1} = i \mid \mathcal{F}_n)(\omega) = X_{n,i}(\omega) = \frac{a_i + kT_{n,i}(\omega)}{|a| + kn}.$$
Notice that the right-hand side depends only on the number of color i balls chosen in
the first n iterations, but not the order in which they were chosen. It then follows by
induction that the sequence (ωn )n∈N is exchangeable. That is, for n ∈ N and a permuta-
tion σ of {1, . . . , n}, the distributions of (ω1 , . . . , ωn ) and (ωσ (1) , . . . , ωσ (n) ) coincide
under Pa . In particular,
$$P_a(A_n(\omega)) = \frac{\prod_{i=1}^{p}\prod_{j=0}^{T_{n,i}(\omega)-1} (a_i + kj)}{\prod_{j=0}^{n-1} (|a| + kj)}. \tag{14}$$
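Formula (14) and the exchangeability claim can be checked exactly on small cases. The sketch below multiplies the conditional probabilities along a path and compares the result with (14) for every reordering of the draws; the function names and the sample parameters (a = (2, 1, 3), k = 2) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def path_probability(a, k, draws):
    """P_a(A_n(omega)) obtained by multiplying P_a(omega_{n+1} = i | F_n) along the path."""
    counts = [0] * len(a)
    total = sum(a)
    prob = Fraction(1)
    for n, color in enumerate(draws):
        i = color - 1  # colors are numbered 1..p
        prob *= Fraction(a[i] + k * counts[i], total + k * n)
        counts[i] += 1
    return prob

def formula_14(a, k, draws):
    """Closed form (14): depends only on how often each color was drawn."""
    numerator = 1
    for color, a_i in enumerate(a, start=1):
        for j in range(draws.count(color)):
            numerator *= a_i + k * j
    denominator = 1
    for j in range(len(draws)):
        denominator *= sum(a) + k * j
    return Fraction(numerator, denominator)

a, k = (2, 1, 3), 2
draws = (1, 3, 3, 2, 1)
reference = formula_14(a, k, draws)
same_for_all_orders = all(
    path_probability(a, k, order) == reference for order in permutations(draws)
)
print(reference, same_for_all_orders)
```

Exact rational arithmetic is used so that equality, rather than approximate agreement, can be asserted.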
The following result is well known, and was first established by Eggenberger and Pólya
in the case p = 2 [5, 14].
Theorem 4.1. For any $a \in \mathbb{N}^p$, $X_n = (X_{n,1}, \dots, X_{n,p})$ converges almost surely with
respect to the measure $P_a$ to a random vector $X_\infty \in \mathbb{R}^p$ with the Dirichlet density
$$f_a(x_1, \dots, x_p) = \frac{\Gamma(|a|/k)}{\prod_{i=1}^{p} \Gamma(a_i/k)} \prod_{i=1}^{p} x_i^{\frac{a_i}{k}-1}, \quad\text{where } x_i > 0 \text{ and } \sum_{i=1}^{p} x_i = 1.$$
$$\lim_{n\to\infty} \frac{P_b(A_n(\omega))}{P_a(A_n(\omega))}$$
for suitable choices of a, b, and ω. We begin with the intuitively clear observation that
the events we consider are asymptotically vanishing.
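$$P(\omega_{n+1} = i \mid \mathcal{F}_n) = X_{n,i} \le \frac{a_i + kn}{|a| + kn} \le \frac{|a| - 1 + kn}{|a| + kn} = 1 - \frac{1}{|a| + kn}.$$
Therefore,
$$P(A_n(\omega)) \le \prod_{j=0}^{n}\left(1 - \frac{1}{|a| + kj}\right) \le e^{-\sum_{j=0}^{n} \frac{1}{|a|+kj}} \to 0, \quad\text{as } n \to \infty.$$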
This yields the result by virtue of the identity {ω} = ∩n∈N An (ω).
Tn (ω)
ω ∈ p : T∞ (ω) = lim exists and is in (0, 1) p .
n→∞ n
Note that Pa (A p ) = 1 according to Theorem 4.1. The following is our main result.
The proof given below is a direct application of the following Stirling’s approxima-
tion formula for the gamma function (see, for instance, [15, Proposition 2.1]):
$$\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1}\, dt \sim \sqrt{2\pi}\, x^{x-\frac12} e^{-x}, \quad\text{as } x \to \infty. \tag{15}$$
We note that the proof does not rely on either Theorem 4.1 or any of the results of the
preceding sections.
Thus, rewriting (14) in terms of the gamma function and using (15), we obtain, as
n → ∞,
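$$\lim_{n\to\infty} \frac{P_a(A_n(\omega))}{P_b(A_n(\omega))} = \frac{\Gamma\!\left(\frac{|a|}{k}\right) \Big/ \prod_{i=1}^{p} \Gamma\!\left(\frac{a_i}{k}\right)}{\Gamma\!\left(\frac{|b|}{k}\right) \Big/ \prod_{i=1}^{p} \Gamma\!\left(\frac{b_i}{k}\right)} \prod_{i=1}^{p} X_{\infty,i}^{\frac{a_i}{k} - \frac{b_i}{k}} = \frac{f_a(X_\infty(\omega))}{f_b(X_\infty(\omega))}.$$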
We remark that the result remains true if we assume that the number of balls returned
in each of the schemes is different, say $k$ balls under $P_a$ and $k' \ne k$ balls under
$P_b$. The proof being identical, we omit details.
Notice that the Wallis-type products of Definition 2.6 correspond to the special case
|a| = |b| of the following corollary to our main result.
Proof of Corollary 4.5. Let ω = (ωn )n∈N be the p-periodic sequence defined by ωn ≡
n mod p and ωn ∈ [1, p], so that
ω = 1, 2, . . . , p, 1, 2, . . . , p, 1, 2, . . . .
$$= \frac{f_a(X_\infty(\omega))}{f_b(X_\infty(\omega))},$$
Admissible sequences are a natural extension of the cyclic ones that show up in
Corollary 4.5. Note that while Pa (A p ) = 1, there are only countably many cyclic se-
quences, and hence according to Proposition 4.2, their entire collection is a set of
probability zero. To get a somewhat less trivial example of a subset of A p , which is a
null-set of measure Pa , we can consider, for instance, ω’s in A p that do not satisfy the
law of the iterated logarithm for Pólya’s urns (an estimate on the rate of convergence of
X n to X ∞ ) proved in [12, p. 775].
We conclude this section with an observation that Pa and Pb are equivalent mea-
sures, namely they share the same null-events. Thus, we cannot determine the initial
distribution of the urn scheme only by observing its realization.
$$Z_n(\omega) = \frac{P_b(A_n(\omega))}{P_a(A_n(\omega))}.$$
$$\frac{f_b(X_\infty(\omega))}{f_a(X_\infty(\omega))},$$
Proof of Corollary 4.6. The almost sure convergence is the content of Theorem 4.4.
Observe next that E a (Z n ) = 1 because Z n is a Radon–Nikodym derivative of Pb |Fn
with respect to Pa |Fn (see, for instance, Section 5.3.3 and Appendix A4 in [9] for a
superb introduction to the Radon–Nikodym derivative and Radon–Nikodym theorem
within the context of probability theory). In fact, it follows from the definition of An (ω)
in (13) that
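$$\begin{aligned}
E_a(Z_n) &= \sum_{\omega \in \mathbb{N}_p^n} Z_n(\omega)\, P_a(A_n(\omega)) \\
&= \sum_{\omega \in \mathbb{N}_p^n} \frac{P_b(A_n(\omega))}{P_a(A_n(\omega))}\, P_a(A_n(\omega)) \\
&= \sum_{\omega \in \mathbb{N}_p^n} P_b(A_n(\omega)) = 1.
\end{aligned}$$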
$$E_a(Z_\infty) = \int \frac{f_b(x)}{f_a(x)}\, f_a(x)\, dx = 1.$$
By Vitali’s convergence theorem (see, for instance, [19, p. 165] or [9, Theorem 5.5.2]),
the almost sure convergence, along with the convergence of the expected values, implies
convergence in $L^1(P_a)$.
It remains to show that $P_b$ is absolutely continuous with respect to $P_a$. Let $A \in \cup_{n\in\mathbb{N}} \mathcal{F}_n$. Then $P_b(A) = E_a(\mathbf{1}_A Z_n)$ for all $n$ sufficiently large. Since
$$E_a(\mathbf{1}_A |Z_n - Z_\infty|) \le E_a(|Z_n - Z_\infty|) \to 0 \quad\text{as } n \to \infty,$$
it follows that
$$P_b(A) = E_a(\mathbf{1}_A Z_\infty). \tag{17}$$
As in the case of Theorem 4.4, the result continues to hold when the number of balls
returned to each urn is different, the proof being identical.
ACKNOWLEDGMENTS. We are grateful to the anonymous referees and the editor for the careful reading
of this paper and many helpful remarks and suggestions. We also wish to thank Jonathan Sondow and Huang
Yi for their valuable and encouraging comments on a preliminary version of the paper. Iddo Ben-Ari is grateful
for support from the Simons Foundation grant #208728.
REFERENCES
1. J.-P. Allouche, J. Sondow, Infinite products with strongly B-multiplicative exponents, Ann. Univ. Sci.
Budapest. Sect. Comput. 28 (2008) 35–53.
2. T. Amdeberhan, O. Espinosa, V. H. Moll, A. Straub, Wallis-Ramanujan-Schur-Feynman, Amer. Math.
Monthly 117 (2010) 618–632.
3. Pi: A Source Book. Second edition. Edited by L. Berggren, J. Borwein, and P. Borwein. Springer, New
York, 2000.
4. W. A. Beyer, J. D. Louck, D. Zeilberger, Math bite: A generalization of a curiosity that Feynman remem-
bered all his life, Math. Mag. 69 (1996) 43–44.
5. D. Blackwell, J. B. MacQueen, Ferguson distributions via Pólya urn schemes, Ann. Stat. 1 (1973) 353–
355.
6. J. M. Borwein, I. J. Zucker, Fast evaluation of the gamma function for small rational fractions using
complete elliptic integrals of the first kind, IMA J. Numer. Anal. 12 (1992) 519–526.
7. E. Catalan, Sur la constante d’Euler et la fonction de Binet, C. R. Acad. Sci. Paris Sér. I Math. 77 (1873)
198–201.
8. J. B. Conway, Functions of One Complex Variable I. Second edition. Springer, New York, 1978.
9. R. Durrett, Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics.
Fourth edition. Cambridge University Press, 2010.
10. S. Gun, M. Ram Murty, P. Rath, Linear independence of digamma function and a variant of a conjecture
of Rohrlich, J. Number Theory 129 (2009) 1858–1873.
11. Collected Papers of Srinivasa Ramanujan. Edited by G. H. Hardy, P. V. Seshu Aiyar, and B. M. Wilson.
AMS Chelsea Publishing, Providence, RI, 2000.
12. C. C. Heyde, On central limit and iterated logarithm supplements to the martingale convergence, J. Appl.
Probab. 14 (1977) 758–775.
IDDO BEN-ARI received a Ph.D. in mathematics from the Technion, Israel Institute of Technology, in 2005.
He is an assistant professor with the department of mathematics, University of Connecticut.
Department of Mathematics, University of Connecticut, Storrs, CT 06269-3009.
iddo.ben-ari@uconn.edu
DIANA HAY was born and raised in the idyllic Californian coastal town Santa Cruz. Diana is a banana slug,
having received her first undergraduate degree in physical anthropology from the University of California,
Santa Cruz. After her graduation, she assisted with underwater whale photography expeditions and worked
as a freelance web developer. She later returned to school to receive her second baccalaureate in mathematics
from California State University, Monterey Bay, and is now a Ph.D. student in mathematics at Iowa State
University.
Department of Mathematics, Iowa State University, Ames, IA 50011
dhay@iastate.edu
ALEXANDER ROITERSHTEIN was born in St. Petersburg, Russia. He received the Ph.D. degree in ap-
plied mathematics in 2004 from Technion IIT, Haifa, Israel. Currently, he is an Assistant Professor with the
Department of Mathematics, Iowa State University. His research interests are in probability theory, stochastic
processes and their applications.
Department of Mathematics, Iowa State University, Ames, IA 50011
roiterst@iastate.edu
Abstract. Let $q_1 = 2$. Supposing that we have defined $q_j$ for all $1 \le j \le k$, let $q_{k+1}$ be a prime
factor of $1 + \prod_{j=1}^{k} q_j$. As was shown by Euclid over two thousand years ago, $q_1, q_2, q_3, \dots$
is then an infinite sequence of distinct primes. The sequence $\{q_i\}$ is not unique, since there is
flexibility in the choice of the prime $q_{k+1}$ dividing $1 + \prod_{j=1}^{k} q_j$. Mullin suggested studying
the two sequences formed by (1) always taking qk+1 as small as possible, and (2) always
taking qk+1 as large as possible. For each of these sequences, he asked whether every prime
eventually appears. Recently, Booker showed that the second sequence omits infinitely many
primes. We give a completely elementary proof of Booker’s result, suitable for presentation in
a first course in number theory.
1. INTRODUCTION. The following is one version of Euclid’s proof that there are
infinitely many primes. Start with q1 = 2. Supposing that q j has been defined for
1 ≤ j ≤ k, continue the sequence by choosing a prime qk+1 , for which
$$q_{k+1} \;\Big|\; 1 + \prod_{j=1}^{k} q_j. \tag{1}$$
Then ‘at the end of the day’, the list q1 , q2 , q3 , . . . is an infinite sequence of distinct
prime numbers.
Of course, the sequence {qi } obtained in this way is not unique, since the relation
(1) is often satisfied by several choices of the prime qk+1 . Mullin [4] suggested two
natural ways of dispensing with the ambiguity. First, we could agree that at each step,
we always choose the smallest prime qk+1 satisfying (1); this leads to the sequence
(numbered A000945 in the Online Encyclopedia of Integer Sequences, or OEIS [6])
2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471, . . . . (2)
Alternatively, we might always choose the largest possible qk+1 , resulting in the se-
quence (A000946 in the OEIS)
We call (2) and (3) the first and second Euclid–Mullin sequences, respectively. For
each of (2) and (3), Mullin raised the question of whether every prime eventually ap-
pears. Shanks [5] conjectured on probabilistic grounds (bolstered by computations
of Wagstaff; cf. [7]) that every prime is eventually reached by (2), but essentially
nothing about the first Euclid–Mullin sequence has been rigorously established. The
second Euclid–Mullin sequence was investigated by Cox and van der Poorten [2].
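The two selection rules are easy to experiment with. The sketch below builds the opening terms of both Euclid–Mullin sequences using naive trial division; the helper names are illustrative, and trial division is only practical for the first few terms because the numbers to be factored grow very quickly.

```python
def prime_factors(n: int) -> list[int]:
    """Prime factors of n, with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def euclid_mullin(terms: int, choose) -> list[int]:
    """Start with q_1 = 2; each new term is choose(prime factors of 1 + product so far)."""
    sequence = [2]
    product = 2
    while len(sequence) < terms:
        q = choose(prime_factors(product + 1))
        sequence.append(q)
        product *= q
    return sequence

first = euclid_mullin(8, min)    # always take the smallest prime factor
second = euclid_mullin(6, max)   # always take the largest prime factor
print(first)
print(second)
```

Passing `min` or `max` as the `choose` argument switches between the first and the second sequence without duplicating the recursion.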
http://dx.doi.org/10.4169/amer.math.monthly.121.05.433
MSC: Primary 11A41, Secondary 11A15
Theorem (Booker). The second Euclid–Mullin sequence omits infinitely many primes.
There are two key ingredients in Booker’s proof. The first is quadratic reciprocity
for the Jacobi symbol, which is a staple of many first courses in number theory. In
addition to this elementary theorem, Booker also makes use of some fairly intricate
results in analytic number theory, specifically work of Burgess from the 1960s on
upper bounds for short character sums.
A simple statement calls out for a simple proof! In this note, we present a variant
of Booker’s proof, where all of the analytic number theory is replaced by very simple-
to-prove statements about the distribution of squares and nonsquares modulo a prime.
There is a cost for this, certainly; our quantitative bounds are weaker than what fol-
lows from Burgess’s estimates. However, we believe that given how simple Booker’s
theorem is to state, there is some value in writing out a proof that is accessible to as
wide an audience as possible.
Notation. Throughout the paper, we reserve the letter $p$ for a prime variable. We use
$\left(\frac{a}{m}\right)$ for the usual Legendre–Jacobi symbol.
Proof. Let $n = n_2(p)$. Since $p < n\lceil p/n \rceil < p + n$, the least nonnegative residue of
$n\lceil p/n \rceil$ modulo $p$ lies in the open interval $(0, n)$. So $n\lceil p/n \rceil$ is a quadratic residue
modulo $p$. Since $n$ is a quadratic nonresidue, the ratio $\frac{n\lceil p/n \rceil}{n} = \lceil p/n \rceil$ is also a nonresidue.
So by the minimality of $n$, it must be that $1 + p/n > \lceil p/n \rceil \ge n$. Hence,
$$\left(n - \frac{1}{2}\right)^2 < n^2 - n + 1 \le p, \quad\text{and so}\quad n < \frac{1}{2} + \sqrt{p}.$$
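The bound just proved can be checked by brute force for small primes. The sketch below assumes, as the proof suggests, that $n_2(p)$ denotes the least positive quadratic nonresidue modulo $p$; it uses Euler's criterion to detect nonresidues. The function names and the search limit of 5000 are illustrative choices.

```python
import math

def is_prime(n: int) -> bool:
    """Primality test by trial division (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def least_nonresidue(p: int) -> int:
    """Least positive quadratic nonresidue modulo an odd prime p.

    By Euler's criterion, n is a nonresidue exactly when n^((p-1)/2) = -1 (mod p).
    """
    n = 2
    while pow(n, (p - 1) // 2, p) != p - 1:
        n += 1
    return n

odd_primes = [p for p in range(3, 5000) if is_prime(p)]
bound_holds = all(least_nonresidue(p) < 0.5 + math.sqrt(p) for p in odd_primes)
print(bound_holds)
```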
We can now establish an upper bound on the length of any sequence of consecutive
squares modulo p.
Proposition 3. If $p$ is an odd prime, then $\ell_0(\ , p) < 2\sqrt{p}$.
Proof. We first rule out long runs of squares containing a multiple of p. Suppose
first that −1 is not a square modulo p. Then any such run of squares can be viewed,
modulo p, as a subset of the interval [0, n 2 ( p)), and thus has length at most n 2 ( p). On
the other hand, if −1 is a square modulo p, then such a run can be viewed as a subset
of (−n 2 ( p), n 2 ( p)), and so has length at most 2n 2 ( p) − 1. Consequently,
Remarks. Much of this section is adapted from the charming book of Gelfond and
Linnik [3]. Lemma 1 and its proof appear, with trivial changes, as that text’s Theorem
9.3.1, while the proof of Proposition 4 comes from the discussion at the bottom of
p. 179. The only novelty is our proof of Proposition 3. Gelfond and Linnik state that
result as Theorem 9.3.2, but it seems that their proof is incomplete.
Remark. Using the results of Burgess, Booker showed that the exponent 2 in (4) can
be replaced with any real number larger than $\frac{1}{4\sqrt{e}-1} = 0.178734\ldots$, provided that $12^2$
is also replaced by a possibly larger constant.
Proof. Let $X = 12^2 \prod_{i=1}^{r} Q_i^2$. Let us suppose for the sake of contradiction that every
prime p ≤ X except Q 1 , . . . , Q r appears in the second Euclid–Mullin sequence. Let
p be the prime in [2, X ] that is last to appear in the sequence {qi }, and say p appears
as the nth term qn . Then p is the largest prime dividing 1 + q1 · · · qn−1 . Moreover,
since each prime smaller than p that is not a Q i is one of q1 , . . . , qn−1 , the only other
possible prime factors of 1 + q1 · · · qn−1 are Q 1 , . . . , Q r . Thus, we must have
$$1 + q_1 \cdots q_{n-1} = Q_1^{e_1} Q_2^{e_2} \cdots Q_r^{e_r}\, p^{e}$$
as well as
$$\left(\frac{d}{p}\right) = \left(\frac{-1}{p}\right). \tag{6}$$
Suppose for the moment that this has been proved. Since d ≤ X , and d is coprime to
Q 1 · · · Q r p, every prime dividing d is among the primes q1 , . . . , qn−1 . So if we write
d = d0 d12 , where d0 is squarefree, then d0 | q1 · · · qn−1 . Hence,
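$$\begin{aligned}
\left(\frac{d}{1 + q_1 \cdots q_{n-1}}\right) &= \left(\frac{1 + q_1 \cdots q_{n-1}}{d}\right) \\
&= \left(\frac{1 + q_1 \cdots q_{n-1}}{d_0}\right)\left(\frac{1 + q_1 \cdots q_{n-1}}{d_1^2}\right) \\
&= \left(\frac{1}{d_0}\right) \cdot \left(\frac{1 + q_1 \cdots q_{n-1}}{d_1}\right)^{2} = 1 \cdot 1 = 1.
\end{aligned}$$
using in the last step that $1 + q_1 \cdots q_{n-1} = 1 + 2\prod_{1<i<n} q_i \equiv 3 \pmod 4$. This is a
contradiction.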
It remains to establish the existence of a $d \le X$ satisfying (5) and (6). The conditions
(5) are satisfied by every integer $d \equiv A \pmod{M}$, where $A := 2Q_1 \cdots Q_r - 1$
and $M := 4Q_1 \cdots Q_r$. To obtain (6), we look for a small nonnegative integer $k$ with
$\left(\frac{Mk+A}{p}\right) = \left(\frac{-1}{p}\right)$. Equivalently, fixing $M'$ satisfying $MM' \equiv 1 \pmod{p}$, we seek a nonnegative
integer $k$ with
$$\left(\frac{k + AM'}{p}\right) = \left(\frac{-M'}{p}\right).$$
By the results of section 2, we can find such a $k \le \max\{\ell_0(\ , p), \ell_0(\ , p)\} < 2\sqrt{p}$.
Then the corresponding $d$ satisfies
$$0 < d = Mk + A < 2M\sqrt{p} + M < 3M\sqrt{p} \le 3M\sqrt{X}.$$
Since $3M = 12Q_1 \cdots Q_r = \sqrt{X}$, we find that $d < X$. This completes the proof.
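The search for $k$ in the last step can be illustrated with a small experiment. The sketch below takes a single hypothetical omitted prime $Q_1 = 5$ (so $M = 20$ and $A = 9$; these values are chosen only for illustration), and for each odd prime $p$ not dividing $M$ it finds the least $k \ge 0$ with $\left(\frac{Mk+A}{p}\right) = \left(\frac{-1}{p}\right)$, comparing it with the bound $2\sqrt{p}$ quoted above. The function names are assumptions of this sketch.

```python
import math

def is_prime(n: int) -> bool:
    """Primality test by trial division (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def find_k(M: int, A: int, p: int) -> int:
    """Least k >= 0 with (Mk + A / p) = (-1 / p); assumes p is an odd prime not dividing M."""
    target = legendre(-1, p)
    k = 0
    while legendre(M * k + A, p) != target:
        k += 1
    return k

Q = [5]                       # hypothetical list of omitted primes
M = 4 * math.prod(Q)          # M = 4 Q_1 ... Q_r
A = 2 * math.prod(Q) - 1      # A = 2 Q_1 ... Q_r - 1
primes = [p for p in range(3, 1000) if is_prime(p) and M % p != 0]
within_bound = all(find_k(M, A, p) < 2 * math.sqrt(p) for p in primes)
print(find_k(M, A, 7), within_bound)
```

For p = 7 the candidates d = 9, 29, 49 are congruent to 2, 1, 0 modulo 7, none of which is a nonresidue, while d = 69 is congruent to 6, which is; hence k = 3.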
ACKNOWLEDGMENTS. We are grateful to Carl Pomerance and the anonymous referee for their thoughtful
suggestions. In particular, the current form of Proposition 3 is due to the referee; our original result was slightly
weaker. We also thank Yuliia Glushchenko for help with the Russian original of [3].
REFERENCES
1. A. Booker, On Mullin’s second sequence of primes, Integers 12A (2012), available at http://www.integers-ejcnt.org/vol12a.html.
2. C. D. Cox, A. J. van der Poorten, On a sequence of prime numbers, J. Austral. Math. Soc. 8 (1968) 571–
574.
3. A. O. Gel′fond, Yu. V. Linnik, Elementary Methods in the Analytic Theory of Numbers. Pergamon Press,
Oxford, 1966.
4. A. A. Mullin, Recursive function theory (a modern look at a Euclidean idea), Bull. Amer. Math. Soc. 69
(1963) 737.
5. D. Shanks, Euclid’s primes, Bull. Inst. Combin. Appl. 1 (1991) 33–36.
6. N. J. A. Sloane, The On-Line Encyclopedia of Integer Sequences, published electronically at http://oeis.org.
7. S. S. Wagstaff, Jr., Computing Euclid’s primes, Bull. Inst. Combin. Appl. 8 (1993) 23–32.
Department of Mathematics and Computer Science, Lake Forest College, Lake Forest, IL 60045
trevino@mx.lakeforest.edu
Abstract. In a recent paper titled The parbelos, a parabolic analog of the arbelos, Sondow
asks for a synthetic proof to the tangency property of the parbelos. In this paper, we resolve
this question by introducing a converse to Lambert’s Theorem on the parabola. In the process,
we prove some new properties of the parbelos.
Figure 1.2. Sondow’s Tangency Property: the diagonal T1 T3 of the tangent rectangle C2 T1 T2 T3 is tangent to
the outer parabola. Moreover, the tangency point is the intersection of the angle bisector of cusp C2 with the
outer parabola.
Lemma 2. A line $l$ is tangent to the parabola if and only if the orthogonal projection
of the focus $F$ into $l$ lies on the supporting line $\Lambda$.
Proof. For a proof of the “only if” statement, we refer the reader to [2] and [5]. For the
“if” statement, let $P$ be the orthogonal projection of $F$ into $l$ and assume that $P \in \Lambda$. If
$P$ is the vertex of $G$, then clearly $l = \Lambda$ and we are done. So assume otherwise. Since
$\Lambda$ has no points inside of the parabola, there exists a tangent line $\tilde{l}$ to $G$ not equal to $\Lambda$
that passes through $P$. The “only if” part implies that the orthogonal projection of $F$
into $\tilde{l}$ is on $\Lambda$, and is therefore $P$. It follows that $l = \tilde{l}$.
Lambert’s Theorem on the parabola states that the circumcircle of a triangle formed
by three tangents to the parabola always passes through the focus. Using Lemma 2,
we can prove the statement quite easily. Let three tangents $l_1, l_2, l_3$ to the parabola be
given. Then the orthogonal projections of $F$ into $l_1, l_2, l_3$ all lie on $\Lambda$, and are therefore
collinear. By the Simson–Wallace Theorem, $F$ lies on the circumcircle of the triangle
formed from $l_1, l_2, l_3$. We now introduce a converse to Lambert’s Theorem.
Proof. If $H_i = I$ for some $i$, then the statement clearly holds. So assume that $H_i \ne I$
for each $i$. By Lemma 2, the orthogonal projections of $F$ into $l_1$ and $l_2$ lie on $\Lambda$. Since
$F$ is on the circumcircle of $\triangle H_1 H_2 I$, its pedal is a line (by Theorem 1). As a line
is uniquely determined by two points, this line must be $\Lambda$. Applying Lemma 2 again
yields that $H_1 H_2$ is tangent to $G$.
3. PARBELOS. Recall that the latus rectum of a conic is the chord through the focus
that is parallel to the conic’s directrix. The parbelos is constructed as follows. Given
three points C1 , C2 , C3 on a line, construct parabolas G 1 , G 2 , G 3 that open in the same
direction and whose latera recta are C1 C2 , C2 C3 , and C1 C3 , respectively. The parbelos
is defined as the region bounded by the three latus rectum arcs.
The tangent line of a parabola at either endpoint of its latus rectum forms an angle
of $\pi/4$ with the latus rectum. As such, parabolas $G_1$ and $G_3$ share the same tangent at
$C_1$, and similarly parabolas $G_2$ and $G_3$ share a tangent at $C_3$. At cusp $C_2$, however, we
obtain two different tangent directions. We can extend these four tangents to form a
rectangle whose vertices are the intersections of tangent lines as in Figure 1.2. We will
denote the vertices of this rectangle by $C_2, T_1, T_2, T_3$.
In his paper [3], Sondow asks for a synthetic proof of the following theorem, which
he proves via analytic geometry.
Remark 5. One way to look at the configuration in Figure 3.1 is as a 4-periodic billiard
trajectory in a square billiard table (for an exposition on billiards in polygons, see [4]).
It would be interesting to see whether there is a deeper connection between (p)arbelos
and billiards.
Figure 3.4. The circumcircle of the tangent rectangle and notable points lying on it.
Corollary 6.
1. The focus F of the outer parabola is equidistant from vertices T1 and T3 of the
tangent rectangle.
2. The intersection H of the angle bisector at cusp C2 and the directrix of the outer
parabola lies on the circumcircle F, C2 , T1 , T2 , T3 of the tangent rectangle.
3. This point H is equidistant from vertices T1 and T3 .
4. Points A1 and A3 lie on circle F, C2 , T1 , T2 , T3 .
5. Point A1 is equidistant from C2 and T2 and so is point A3 .
A Difference Equation Leading to the Irrationality of $\sqrt{2}$
We provide a fresh proof of a very old and well-known fact: the irrationality of
$\sqrt{2}$. To our knowledge, this approach is new; at least, we have not seen it in the
outstanding references [1, 2], although the flavor of the proof reminds us of [3].
REFERENCES
Project MTM2010-15314
Abstract. In 1955, Furstenberg gave a curious topological proof of the infinitude of primes.
Cass and Wildenberg, as well as Mercer, dispensed with the topological language in this proof
to uncover the essential number theory. In this note, we observe that Furstenberg’s proof has
an important and intriguing connection to Euclid’s well-known original proof.
Note that N (P) consists of all integers that are not integer multiples of any prime. The
Fundamental Theorem of Arithmetic implies that N (P) = {−1, 1}.
The key observation in Furstenberg’s proof is that if $P$ were finite, then $N(P)$ would
be infinite, contradicting that $N(P) = \{-1, 1\}$. In [1], periodic subsets of $\mathbb{Z}$ are used
to show $N(P)$ is infinite. A straightforward way to see that $N(P)$ would be infinite if $P$ were
finite is to note that for any $m \in \mathbb{Z}$ and $p \in P$, $m(z_P) + 1 \in \mathbb{Z}\setminus(p\mathbb{Z})$. It follows that
$$m(z_P) + 1 \in \bigcap_{p \in P} \mathbb{Z}\setminus(p\mathbb{Z}) = N(P).$$
A strikingly similar tactic is used in Euclid’s classic proof, known to many students
of mathematics. We start with any finite set of primes $F$ and it is shown that $z_F + 1 \in N(F)$.
So assuming that $P$ is finite, we see that $z_P + 1 \in N(P)$ and $z_P + 1 > 1$, which contradicts that
$N(P) = \{-1, 1\}$.
In summary, assume that P is finite. Furstenberg’s proof reduces to the observation
that N (P) is infinite. Euclid’s proof reduces to the observation that z P + 1 ∈ N (P).
Both observations contradict that N (P) = {−1, 1}.
ACKNOWLEDGMENT. The author thanks the referee for many helpful suggestions.
REFERENCES
1. D. Cass, G. Wildenberg, Math Bite: A Novel Proof of the Infinitude of Primes, Revisited, Mathematics
Magazine 76 (2003) 203.
2. H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955) 353.
3. I. D. Mercer, On Furstenberg’s Proof of the Infinitude of Primes, Amer. Math. Monthly 116 (2009) 355–
356.
http://dx.doi.org/10.4169/amer.math.monthly.121.05.444
MSC: Primary 11A41
Let A and B be two sets and let f : A → B and g : B → A be two mappings. If both
f and g are injections, then Banach’s mapping theorem [1] asserts that there exist
A0 ⊆ A and B0 ⊆ B such that f (A0 ) = B0 and g(B\B0 ) = A\A0 , and the Cantor-
Schroeder-Bernstein theorem [4, Theorem 4.5.5] asserts that there exists a bijection
from A to B. It is well known that the former theorem implies the latter one, by defin-
ing a desired bijection to be f if restricted to A0 , and the inverse of g otherwise. In
fact, Banach’s mapping theorem can be generalized by removing the condition of in-
jections; see [4, p.102, Exercise 4.21].
In this short note, we prove the generalization of Banach’s mapping theorem men-
tioned above. We do this by applying the following simple fact. In this fact, we set
A0 = A∗ and B0 = f (A∗ ). This includes the extreme case when A∗ is empty, or equiv-
alently, when g is surjective.
We may obtain this fact using the Knaster–Tarski theorem [3, 5] from lattice theory
as follows. The power set of A, partially ordered by set inclusion, is a complete lattice,
and so can be restricted to the set of all fixed points of the isotonic ϕ. Hence, a least
fixed point exists as A∗ (also refer to [2] for the related theoretical background). For
completeness, we offer an elementary proof as follows.
On the other hand, the definition of ϕ implies that if S1 ⊆ S2 , then ϕ(S1 ) ⊆ ϕ(S2 ); in
particular, we have ϕ(ϕ(A∗ )) ⊆ ϕ(A∗ ). By the definition of , we obtain that ϕ(A∗ ) ∈
. The definition of A∗ implies that A∗ ⊆ ϕ(A∗ ). Therefore, ϕ(A∗ ) = A∗ .
ACKNOWLEDGMENT. The author would like to thank Steve Krantz and the referee for valuable comments.
http://dx.doi.org/10.4169/amer.math.monthly.121.05.445
MSC: Primary 04A05
1. S. Banach, Un théorème sur les transformations biunivoques, Fund. Math. 6 (1924) 236–239.
2. G. Birkhoff, Lattice Theory. Third edition. American Mathematical Society, Providence, RI, 1979.
3. B. Knaster, Un théorème sur les fonctions d’ensembles, Ann. Soc. Polon. Math. 6 (1928) 133–134.
4. S. G. Krantz, Elements of Advanced Mathematics. Third edition. Taylor & Francis/CRC Press, Boca Raton,
FL, 2012.
5. A. Tarski, A lattice-theoretical fixpoint theorem and its applications, Pacific J. Math. 5 (1955) 285–309.
Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan
mcli@math.nctu.edu.tw
But mathematics was not one of my fortes. To this day I still wonder at the
astuteness of that Mr. Bickford, who gave me my final examination in Higher
Algebra . . . Mr. Bickford was older than some of the others and had the reputation
of being the best man in his department. As I stood there watching him chewing
his moustache—he wore a little black moustache—as he studied the paper I had
turned in, the last one left in the room, I tried to brace myself as best I could
for the verdict. He looked up at me after a while with no expression on his face
whatsoever. “You’ll never be a mathematician, Williams,” he told me. I agreed.
“But you show an understanding of the process.” He paused. “And I’m going to
pass you!” I couldn’t move for joy. It was the most intelligent verdict, and from
a teacher, that I have ever encountered. It is hard to realize how important such
a moment can be in a man’s life. That single piece of intelligence had more to
do in straightening my difficulties, in putting me on a correct course than any
single thing that I can remember. He saw my mind, and realized what it was not
intended to perform. And he acted accordingly. That’s what it means, at best, to
be a teacher.
—Submitted by Robert Haas
$$\int_0^{\infty} \frac{x^4 + bx^2 - d}{x^8 + cx^4 + d^2}\, dx = 0. \tag{1}$$
2. CALCULATION.
$$\int_0^{\infty} \frac{e^{-ax^2}}{F(x)}\, dx. \tag{3}$$
Under the stated hypotheses, by Cauchy’s theorem, the integration contour can be rotated
by 45° to run along the diagonal of the first quadrant. Write $x = r(1 + i)/\sqrt{2}$, so
$x^2 = ir^2$. The result follows by noting that the imaginary part of the resulting integral
must vanish.
This results by expanding both sides of (2) in powers of a and comparing coefficients.
http://dx.doi.org/10.4169/amer.math.monthly.121.05.447
MSC: Primary 26A06, Secondary 30E20; 33B10
$$\int_0^{\infty} \sin(ar^2)\, e^{-r} (\cos r + \sin r)\, dr = \int_0^{\infty} \cos(ar^2)\, e^{-r} (\cos r - \sin r)\, dr, \tag{5a}$$
$$\int_0^{\infty} r^{4n} e^{-r} (\cos r - \sin r)\, dr = 0, \qquad \int_0^{\infty} r^{4n+2} e^{-r} (\cos r + \sin r)\, dr = 0. \tag{5b}$$
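The moment identities (5b) can be confirmed exactly, independently of the contour argument. The sketch below relies on the standard evaluation $\int_0^\infty r^m e^{-(1-i)r}\,dr = m!/(1-i)^{m+1} = m!\,(1+i)^{m+1}/2^{m+1}$ together with $\cos r - \sin r = \operatorname{Re}[(1+i)e^{ir}]$ and $\cos r + \sin r = \operatorname{Re}[(1-i)e^{ir}]$; this route and the function names are choices made for this sketch, not taken from the note. Gaussian integers are stored as pairs of Python integers so that the results are exact.

```python
from fractions import Fraction
from math import factorial

def gauss_mul(z, w):
    """Product of Gaussian integers z = (re, im) and w = (re, im)."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def gauss_pow(z, exponent):
    """z raised to a nonnegative integer power, computed exactly."""
    result = (1, 0)
    for _ in range(exponent):
        result = gauss_mul(result, z)
    return result

def moment_minus(m: int) -> Fraction:
    """Integral of r^m e^{-r} (cos r - sin r) over (0, inf) = m! Re[(1+i)^(m+2)] / 2^(m+1)."""
    real_part = gauss_pow((1, 1), m + 2)[0]
    return Fraction(factorial(m) * real_part, 2 ** (m + 1))

def moment_plus(m: int) -> Fraction:
    """Integral of r^m e^{-r} (cos r + sin r) over (0, inf) = m! Re[(1-i)(1+i)^(m+1)] / 2^(m+1)."""
    real_part = gauss_mul((1, -1), gauss_pow((1, 1), m + 1))[0]
    return Fraction(factorial(m) * real_part, 2 ** (m + 1))

print([moment_minus(4 * n) for n in range(5)])
print([moment_plus(4 * n + 2) for n in range(5)])
```

The vanishing is visible in the formulas: $(1+i)^{4n+2}$ is purely imaginary, which is what both real parts reduce to.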
As a non-elementary example, the Kelvin functions (see [1]), which occur in many
engineering studies, are defined by
where K 0 (z) is a particular Bessel function. Thus, for F(x) = K 0 (x), we obtain the
Kelvin function identity
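$$\int_0^{\infty} \frac{\operatorname{ker}_0(x)}{\operatorname{ker}_0^2(x) + \operatorname{kei}_0^2(x)}\, dx = \int_0^{\infty} \frac{\operatorname{kei}_0(x)}{\operatorname{ker}_0^2(x) + \operatorname{kei}_0^2(x)}\, dx, \tag{7}$$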
which seems to be a new addition to the literature concerning these functions. The con-
sequences of the theorem may be increased many-fold by noting that sin(ar 2 ), cos(ar 2 )
in (2), when the resulting integrals converge, may be replaced by S(r 2 ), C(r 2 ), the sine
and cosine transforms, respectively, of a given function. Thus, by multiplying both
sides of (5a) by the characteristic function of the interval 0 < a < b and integrating
over a, we find that
$$\int_0^{\infty} (1 - \cos(br^2))\, e^{-r} (\cos r + \sin r)\, \frac{dr}{r^2} = \int_0^{\infty} \sin(br^2)\, e^{-r} (\cos r - \sin r)\, \frac{dr}{r^2}. \tag{8}$$
The integral on the right-hand side of (8) is given by Mathematica as $\sqrt{\pi b/2} - (\pi/2)\operatorname{erfc}(1/\sqrt{2b})$, so we have determined the value of the left-hand integral, for which it does not return a value. Similar, although more complex, formulas are obtained if $x^2$ in the exponent in (3) is replaced by $x^\nu$ and the path of integration is rotated to the corresponding ray.
REFERENCE
1. I. S. Gradshteyn, I. M. Ryzhik, Table of Integrals, Series and Products. Academic Press, New York, 1994.
Abstract. We give a simple argument, based on drawing balls from urns, showing that the ex-
ponential bound on the probability of a large deviation for sampling with replacement applies
also to sampling without replacement. This result includes as a special case the relationship
between the binomial and hypergeometric distributions.
\begin{align*}
g_K(u) &= \operatorname{Ex}[u^K] \\
&= \sum_{0 \le k \le n} \binom{n}{k} p^k (1-p)^{n-k} u^k \\
&= (1 - p + pu)^n \tag{1}
\end{align*}
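\begin{align*}
\Pr[K \ge (p+q)n] &\le \frac{1}{u^{(p+q)n}} \sum_{(p+q)n \le k \le n} \Pr[K = k]\,u^k \\
&\le \frac{1}{u^{(p+q)n}} \sum_{0 \le k \le n} \Pr[K = k]\,u^k \\
&= \frac{g_K(u)}{u^{(p+q)n}} \tag{2}
\end{align*}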
for any $u \ge 1$. Substituting (1) into (2) yields
\[ \Pr[K \ge (p+q)n] \le \left( \frac{1 - p + pu}{u^{p+q}} \right)^{n}. \]
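To see how the choice of $u$ affects this bound, the following sketch compares the exact binomial tail with $((1 - p + pu)/u^{p+q})^n$ for a few values of $u \ge 1$. The parameter values and the function names are illustrative assumptions; the threshold is passed as an integer count `k0` $= (p+q)n$ to avoid rounding questions.

```python
from math import comb

def binomial_tail(n, p, k0):
    """Exact Pr[K >= k0] for K ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k0, n + 1))

def mgf_bound(n, p, k0, u):
    """The bound ((1 - p + p*u) / u**(p+q))**n, written with k0 = (p+q)*n."""
    return (1 - p + p * u) ** n / u**k0

n, p, k0 = 40, 0.3, 20          # illustrative values: p + q = 0.5
tail = binomial_tail(n, p, k0)
# Setting the derivative in u to zero gives u = t(1-p) / (p(1-t)) with t = k0/n,
# which is 7/3 for these values.
for u in (1.0, 1.5, 2.0, 7 / 3, 3.0):
    print(f"u = {u:.4f}  bound = {mgf_bound(n, p, k0, u):.6f}  exact tail = {tail:.6f}")
```

Every choice of $u \ge 1$ gives a valid bound; the stationary point in $u$ gives the smallest one.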
http://dx.doi.org/10.4169/amer.math.monthly.121.05.449
MSC: Primary 60F10, Secondary 60E15
The bound (3) is due, in the form we have stated it, to Chernoff [4].
For the hypergeometric distribution, we typically imagine an urn initially containing $a$ red balls and $b$ blue balls. If we draw $n \le a + b$ balls without replacement, the number $H$ of red balls drawn has the distribution
\[ \Pr[H = h] = \binom{a}{h}\binom{b}{n-h} \bigg/ \binom{a+b}{n}. \]
If n balls were drawn with replacement, the number of red balls drawn would have a
binomial distribution with p = a/(a + b). Without replacement, each red ball drawn
reduces the probability that the next ball drawn will be red. Thus, we expect the dis-
tribution of H to be more strongly concentrated about its expectation than that of
K with p = a/(a + b). We therefore expect the bound of (3) to apply to Pr[H ≥
( p + q)n] as well as to Pr[K ≥ ( p + q)n]. But if we try to derive such a bound,
we encounter the difficulty that the probability generating function $g_H(u)$ for $H$ has no simple expression in terms of elementary functions (analogous to the expression $(1 - p + pu)^n$ for $g_K(u)$). Indeed, the hypergeometric distribution gets its name from the fact that $g_H(u) = \sum_{0 \le h \le n} \binom{a}{h}\binom{b}{n-h} u^h \big/ \binom{a+b}{n}$ is a “hypergeometric function.” It is, however, true that
\[ \Pr[H \ge (p+q)n] \le \left( \left( \frac{p}{p+q} \right)^{p+q} \left( \frac{1-p}{1-p-q} \right)^{1-p-q} \right)^{n}. \tag{4} \]
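As a quick numerical illustration of (4), the sketch below computes the exact hypergeometric tail from the binomial-coefficient formula and compares it with the right-hand side of (4). The urn sizes and function names are illustrative assumptions; `t` stands for $p + q$ and must satisfy $p < t < 1$.

```python
from math import comb

def hypergeometric_tail(a, b, n, k0):
    """Exact Pr[H >= k0] when n balls are drawn without replacement
    from an urn with a red and b blue balls."""
    total = comb(a + b, n)
    favorable = sum(comb(a, h) * comb(b, n - h) for h in range(k0, min(a, n) + 1))
    return favorable / total

def bound_4(p, t, n):
    """Right-hand side of (4), with t = p + q and p < t < 1."""
    return ((p / t) ** t * ((1 - p) / (1 - t)) ** (1 - t)) ** n

a, b, n = 30, 70, 20            # illustrative urn: p = a / (a + b) = 0.3
p = a / (a + b)
for k0 in (8, 10, 12, 15):
    t = k0 / n
    print(k0, hypergeometric_tail(a, b, n, k0), bound_4(p, t, n))
```

In each row the exact tail should not exceed the bound.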
The bound (4) was first proved by Hoeffding [8] in 1963. In 1979 Chvátal [5] derived
(4) by direct manipulation of the binomial coefficients appearing in the sum expressing
Pr[H ≥ ( p + q)n]. Hoeffding, however, obtained (4) as a special case of a much more
general result concerning “sampling from a finite population.” Suppose that the $N$ balls initially in the urn have not colors (red or blue) but rather real values $c_1, \dots, c_N$. We can draw $n$ balls with replacement, obtaining $n$ values $Y_1, \dots, Y_n$, and look at the distribution of their sum $T_n = Y_1 + \cdots + Y_n$. Alternatively, we can draw $n \le N$ balls without replacement, obtaining $n$ values $X_1, \dots, X_n$, and look at the distribution of their sum $S_n = X_1 + \cdots + X_n$. The original urn is of course the special case “red = 1, blue = 0,” with $K = T_n$ and $H = S_n$.
We can easily obtain a large-deviation bound for $T_n$ analogous to (3). Since $T_n$ may assume non-integral values, it will be convenient to use “moment generating functions,” rather than probability generating functions. Let $M_Y(r) = \operatorname{Ex}[e^{rY}] = (e^{rc_1} + \cdots + e^{rc_N})/N$ be the common moment generating function for each of the $Y_i$. Then $M_Y(r)$ is also the common moment generating function for each of the $X_i$. Since draws with replacement yield independent values, $T_n$ has the moment generating function $M_{T_n}(r) = M_Y(r)^n$. If $\mu = (c_1 + \cdots + c_N)/N$ denotes the common expectation of the $Y_i$ and $\nu > 0$, then we obtain, in analogy to (2),
\[ \Pr[T_n \ge (\mu + \nu)n] \le \left( \frac{M_Y(r)}{e^{r(\mu+\nu)}} \right)^{n}. \tag{5} \]
\[ \Pr[S_n \ge (\mu + \nu)n] \le \left( \frac{M_Y(s)}{e^{s(\mu+\nu)}} \right)^{n}. \tag{6} \]
This generalization of (4) lies beyond the reach of the combinatorial argument of
Chvátal [5]. Our main goal in this note is to give a simple and vivid argument in-
volving urns and balls yielding the bound (6) (including (4) as a special case).
η(i) = ξ( j) (7)
for some j ≤ i.
The process just described defines a random variable (ξ, η) for which ξ is uniformly
distributed over all permutations of {1, . . . , N } and η is a sequence η(1), η(2), . . . of
independent random variables, each uniformly distributed over {1, . . . , N }.
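The description of the process is not reproduced above, so the following is only a sketch of one process with the stated properties; it is an assumption and may differ from the authors' construction. Each step draws a ball uniformly with replacement to define $\eta(i)$; if that ball has not yet appeared in $\xi$, it becomes $\xi(i)$, and otherwise $\xi(i)$ is chosen uniformly among the balls not yet used. The function name `coupled_draws` is hypothetical.

```python
import random

def coupled_draws(N, steps, rng):
    """Return (xi, eta): xi is a permutation of 1..N, eta a list of `steps` draws.

    Hypothetical construction: eta(i) is uniform on 1..N; xi(i) equals eta(i)
    when that ball is new, and is otherwise a uniform pick among unused balls.
    """
    unseen = list(range(1, N + 1))
    xi, eta = [], []
    for _ in range(steps):
        ball = rng.randint(1, N)          # draw with replacement
        eta.append(ball)
        if unseen:
            pick = ball if ball in unseen else rng.choice(unseen)
            unseen.remove(pick)
            xi.append(pick)
    rng.shuffle(unseen)                   # complete the permutation if steps < N
    xi.extend(unseen)
    return xi, eta

xi, eta = coupled_draws(6, 10, random.Random(2014))
print("xi  =", xi)
print("eta =", eta)
```

By construction every $\eta(i)$ equals some $\xi(j)$ with $j \le i$, as in (7), and each new $\xi(i)$ is uniform over the balls not yet used, so $\xi$ is a uniformly random permutation.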
Let $c_1, \dots, c_N$ be real numbers. We shall define the random variables $X_1, \dots, X_N$ by $X_i = c_{\xi(i)}$ for $1 \le i \le N$, and the random variables $Y_1, Y_2, \dots$ by $Y_i = c_{\eta(i)}$ for $i \ge 1$. This definition creates a coupling between the sequence $X_1, \dots, X_N$, which is distributed as a random sample without replacement from the population $c_1, \dots, c_N$,
Ex[Tn | Sn ] = Sn .
for 1 ≤ i ≤ n. Now
Since by (7) each η(i) for 1 ≤ i ≤ n is equal to one of the ξ(1), . . . , ξ(n), each of the
n terms in (9) is equal by (8) to s/n, and thus Ex[Tn | Sn = s] = s.
A twice differentiable function $f : \mathbb{R} \to \mathbb{R}$ is convex if its second derivative is non-negative. A convex function satisfies Jensen's inequality, in the form $\int f(t)\,dF(t) \ge f\bigl(\int t\,dF(t)\bigr)$ (see Hardy, Littlewood, and Pólya [7]). This inequality may be interpreted as saying that an average of function values is at least as large as the function value at the corresponding average of the arguments. If $(S, T)$ is a martingale coupling, then we have
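\begin{align*}
\operatorname{Ex}[f(T)] &= \int\!\!\int f(t)\,dF_{T|S=s}(t)\,dF_S(s) \\
&\ge \int f\!\left( \int t\,dF_{T|S=s}(t) \right) dF_S(s) \\
&= \int f(\operatorname{Ex}[T \mid S = s])\,dF_S(s) \\
&= \int f(s)\,dF_S(s) = \operatorname{Ex}[f(S)],
\end{align*}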
where $F_S(s) = \Pr[S \le s]$ is the probability distribution function for $S$, and $F_{T|S=s}(t) = \Pr[T \le t \mid S = s]$ is the conditional probability distribution function for $T$, given that $S = s$. This inequality will allow us to transfer the bound (5) from $T_n$ to $S_n$, because $(S_n, T_n)$ is a martingale coupling and the bound (5) is based on the moment generating function, which is an expectation of a convex function (the exponential function). That is, since $t \mapsto e^{rt}$ is convex for any real $r$, we have
\[ M_{S_n}(r) \le M_{T_n}(r). \]
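For a small population the inequality $M_{S_n}(r) \le M_{T_n}(r)$ can be checked by brute force. The sketch below enumerates all ordered samples without replacement and with replacement and averages $e^{r \cdot \text{sum}}$ over each; the population values and function names are illustrative assumptions.

```python
from itertools import permutations, product
from math import exp

def mgf_without_replacement(c, n, r):
    """Ex[exp(r * S_n)], averaging over all ordered samples without replacement."""
    samples = list(permutations(c, n))
    return sum(exp(r * sum(s)) for s in samples) / len(samples)

def mgf_with_replacement(c, n, r):
    """Ex[exp(r * T_n)], averaging over all ordered samples with replacement."""
    samples = list(product(c, repeat=n))
    return sum(exp(r * sum(s)) for s in samples) / len(samples)

population = [0, 0, 1, 1, 1]    # illustrative urn: blue = 0, red = 1
for r in (-1.0, 0.5, 2.0):
    m_s = mgf_without_replacement(population, 3, r)
    m_t = mgf_with_replacement(population, 3, r)
    print(f"r = {r:+.1f}  M_S = {m_s:.6f}  M_T = {m_t:.6f}")
```

Because the exponential is convex for every real $r$, the comparison holds for negative $r$ as well as positive $r$.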
3. STOCHASTIC ORDER. The key to the proofs in the preceding section was the
construction of a martingale coupling. Consider the following two conditions relating
random variables S and T .
(1) There exists a martingale coupling between $S$ and $T$ (that is, there is a random variable $(\hat{S}, \hat{T})$ such that $\hat{S}$ has the same distribution as $S$, $\hat{T}$ has the same distribution as $T$, and $(\hat{S}, \hat{T})$ is a martingale, that is, $\operatorname{Ex}[\hat{T} \mid \hat{S}] = \hat{S}$).
(2) For any function $f : \mathbb{R} \to \mathbb{R}$, if $f$ is convex, then $\operatorname{Ex}[f(S)] \le \operatorname{Ex}[f(T)]$ (provided the expectations exist).
We have used the fact that (1) implies (2), which follows immediately from Jensen’s
inequality. But in fact the converse, (2) implies (1), also holds. This converse is usually
ascribed to Strassen [14], who gave the first statement in full generality. But precursors
to this result are due to Hardy, Littlewood and Pólya [6, 7] and Blackwell [2, 3]. A
particularly simple proof has been given by Müller and Rüschendorf [10].
The equivalent conditions (1) and (2) can be used to define a “stochastic order”
between random variables (or their probability distributions). We say that “S is at
most as variable as T ” if one (and therefore also the other) of these conditions holds.
This partial order has been used by economists to compare assessments of risk (see
Rothschild and Stiglitz [11, 12, 13]) and inequality of incomes (see Atkinson [1]).
ACKNOWLEDGMENT. This research was partially supported by NSF Grant CCF 0917026.
REFERENCES
Excerpted from News and Notes, Amer. Math. Monthly 21 (1914) 172.
PROBLEMS
11775. Proposed by Isaac Sofair, Fredericksburg, VA. Let $A_1, \dots, A_k$ be finite sets. For $J \subseteq \{1, \dots, k\}$, let $N_J = \bigl|\bigcup_{j \in J} A_j\bigr|$, and let $S_m = \sum_{J : |J| = m} N_J$.
Here, $\binom{k}{l} = 0$ if $k < l$.
http://dx.doi.org/10.4169/amer.math.monthly.121.05.455
11778. Proposed by Li Zhou, Polk State College, Winter Haven, FL. Let $x, y, z$ be positive real numbers such that $x + y + z = \pi/2$. Let $f(x, y, z) = 1/(\tan^2 x + 4\tan^2 y + 9\tan^2 z)$. Prove that
\[ f(x, y, z) + f(y, z, x) + f(z, x, y) \le \frac{9}{14}\left( \tan^2 x + \tan^2 y + \tan^2 z \right). \]
11780. Proposed by Cezar Lupu, University of Pittsburgh, Pittsburgh, PA, and Tudorel Lupu, Decebal High School, Constanţa, Romania. Let $f$ be a positive-valued, concave function on $[0, 1]$. Prove that
\[ \frac{3}{4}\left( \int_0^1 f(x)\,dx \right)^{2} \le \frac{1}{8} + \int_0^1 f^3(x)\,dx. \]
11781. Proposed by Roberto Tauraso, Università di Roma “Tor Vergata”, Rome, Italy. For $n \ge 2$, call a positive integer $n$-smooth if none of its prime factors is larger than $n$. Let $S_n$ be the set of all $n$-smooth positive integers. Let $C$ be a finite, nonempty set of nonnegative integers, and let $a$ and $d$ be positive integers. Let $M$ be the set of all positive integers of the form $m = \sum_{k=1}^{d} c_k s_k$, where $c_k \in C$ and $s_k \in S_n$ for $k = 1, \dots, d$. Prove that there are infinitely many primes $p$ such that $p^a \notin M$.
SOLUTIONS
in a neighborhood of $x = 0$. Define
\[ f_n(s) = \int_0^\infty x^{s-1} \left( \frac{1}{x^n(e^x - 1)} - \left( \frac{1}{x^n} + \sum_{k=0}^{n} \frac{B_k}{k!}\,x^{k-n-1} \right) e^{-x} \right) dx. \]
The integral converges absolutely for Re s > 0 and uniformly in every compact subset
contained in Re s ≥ ε > 0. Therefore, f n (s) is analytic in Re s > 0. Thus In = f n (1),
and we compute f n (1).
If $\operatorname{Re} s > n$, then
\[ f_n(s) = \Gamma(s-n)\zeta(s-n) - \Gamma(s-n) - \sum_{k=0}^{n} \frac{B_k}{k!}\,\Gamma(s+k-n-1). \tag{1} \]
Note that (1) represents the analytic continuation of f n (s) as a meromorphic function
in the whole complex plane. Also, the residues of f n at s ∈ {1, . . . , n} all vanish.
We now take note of some well-known facts about the gamma and zeta functions.
If $m$ is a positive integer, then in a neighborhood of $s = 1$ we have
\begin{align*}
\Gamma(s-m) &= \frac{\Gamma(s)}{(s-1)(s-2)\cdots(s-m)} \\
&= \frac{(-1)^{m-1}}{(m-1)!}\left( \frac{1}{s-1} - \gamma + H_{m-1} + O(s-1) \right). \tag{2}
\end{align*}
By considering the residue of $f_n$ at 1, we have
\[ 0 = \frac{(-1)^{n-1}}{(n-1)!}\,\zeta(1-n) - \frac{(-1)^{n-1}}{(n-1)!} - \sum_{k=0}^{n} \frac{B_k}{k!} \cdot \frac{(-1)^{n-k}}{(n-k)!}. \]
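For $n \ge 1$,
\begin{align*}
I_n &= \int_0^\infty \left( \frac{1}{x^n(e^x - 1)} - \left( \frac{1}{x^n} + \sum_{k=0}^{n} \frac{B_k}{k!}\,x^{k-n-1} \right) e^{-x} \right) dx = f_n(1) \\
&= \lim_{s \to 1} \left( \Gamma(s-n)\zeta(s-n) - \Gamma(s-n) - \sum_{k=0}^{n} \frac{B_k}{k!}\,\Gamma(s+k-n-1) \right) \\
&= \frac{(-1)^{n-1}}{(n-1)!}(-\gamma + H_{n-1})\,\zeta(1-n) + \frac{(-1)^{n-1}}{(n-1)!}\,\zeta'(1-n) \\
&\qquad - \frac{(-1)^{n-1}}{(n-1)!}(-\gamma + H_{n-1}) - \sum_{k=0}^{n} \frac{B_k}{k!} \cdot \frac{(-1)^{n-k}}{(n-k)!}(-\gamma + H_{n-k}) \\
&= \frac{(-1)^{n-1}}{(n-1)!}\,H_{n-1}\bigl(\zeta(1-n) - 1\bigr) + \frac{(-1)^{n-1}}{(n-1)!}\,\zeta'(1-n) - \sum_{k=0}^{n-1} \frac{B_k}{k!} \cdot \frac{(-1)^{n-k}}{(n-k)!}\,H_{n-k} \\
&= \frac{B_n}{n!}\,H_{n-1} - \frac{(-1)^{n-1}}{(n-1)!}\,H_{n-1} + \frac{(-1)^{n-1}}{(n-1)!}\,\zeta'(1-n) - \sum_{k=0}^{n-1} \frac{B_k}{k!} \cdot \frac{(-1)^{n-k}}{(n-k)!}\,H_{n-k},
\end{align*}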
Bn (2π )n
= (− log(2π) + Hn−1 − γ ) − (−1)n/2 ζ 0 (n).
2(n!)
We thus conclude:
\[ I_0 = \gamma - 1, \qquad I_1 = f_1(1) = \zeta'(0) + B_0 H_1 = 1 - \frac{1}{2}\log(2\pi), \]
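The two closed forms above can be sanity-checked by direct quadrature. The sketch below is a rough check under stated assumptions: it takes $B_0 = 1$ and $B_1 = -1/2$ (the convention $x/(e^x - 1) = \sum_k B_k x^k / k!$), uses the same integrand for $n = 0$ as for $n \ge 1$, truncates the integral at 40, and uses a plain midpoint rule. The function names and the step count are illustrative.

```python
from math import exp, expm1, log, pi

EULER_GAMMA = 0.5772156649015329    # Euler's constant, hard-coded

def midpoint(g, upper, steps):
    """Midpoint rule for the integral of g over [0, upper]; never evaluates g(0)."""
    h = upper / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))

def integrand_0(x):
    # n = 0: 1/(e^x - 1) - (1 + B_0/x) e^{-x}
    return 1.0 / expm1(x) - (1.0 + 1.0 / x) * exp(-x)

def integrand_1(x):
    # n = 1: 1/(x (e^x - 1)) - (1/x + B_0/x^2 + B_1/x) e^{-x}, with B_1 = -1/2
    return 1.0 / (x * expm1(x)) - (1.0 / x + 1.0 / x**2 - 0.5 / x) * exp(-x)

i0 = midpoint(integrand_0, 40.0, 40000)
i1 = midpoint(integrand_1, 40.0, 40000)
print("I_0 numeric:", i0, " closed form:", EULER_GAMMA - 1.0)
print("I_1 numeric:", i1, " closed form:", 1.0 - 0.5 * log(2.0 * pi))
```

Both integrands are bounded near $x = 0$ because the singular terms cancel, which is why a simple rule on a truncated interval is adequate for a check of this kind.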
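An $\ell^p$ Inequality
11649 [2012, 522]. Proposed by Grahame Bennett, Indiana University, Bloomington, IN. Let $p$ be real with $p > 1$. Let $(x_0, x_1, \dots)$ be a sequence of nonnegative real numbers. Prove that
\[ \sum_{j=0}^{\infty} \left( \sum_{k=0}^{\infty} \frac{x_k}{j+k+1} \right)^{p} < \infty \quad \Rightarrow \quad \sum_{j=0}^{\infty} \left( \frac{1}{j+1} \sum_{k=0}^{j} x_k \right)^{p} < \infty. \]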
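Solution by Oliver Geupel, Brühl, NRW, Germany. For every nonnegative integer $j$, since each $x_k \ge 0$ and $j + k + 1 \le 2j + 1$ for $0 \le k \le j$, we have
\[ \frac{1}{j+1} \sum_{k=0}^{j} x_k \le \frac{2j+1}{j+1} \sum_{k=0}^{j} \frac{x_k}{j+k+1} \le 2 \sum_{k=0}^{\infty} \frac{x_k}{j+k+1}. \]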
If $p > 0$, then $x^p$ strictly increases with $x$ on the interval $[0, \infty)$. Thus, raising both sides of this inequality to the $p$th power and summing both sides over $j$ yields
\[ \sum_{j=0}^{\infty} \left( \frac{1}{j+1} \sum_{k=0}^{j} x_k \right)^{p} \le 2^p \sum_{j=0}^{\infty} \left( \sum_{k=0}^{\infty} \frac{x_k}{j+k+1} \right)^{p}. \]
The proof also shows that the restriction on p can be relaxed to p > 0.
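The termwise inequality used in the solution is easy to test on a finite sequence, treated as zero beyond its last entry. The sketch below uses exact rational arithmetic; the function name and the sample sequence are illustrative assumptions.

```python
from fractions import Fraction

def averages_vs_hilbert(x):
    """For each j, return (left, middle, right) from the chain
    (1/(j+1)) sum_{k<=j} x_k <= ((2j+1)/(j+1)) sum_{k<=j} x_k/(j+k+1)
                             <= 2 sum_k x_k/(j+k+1),
    where x is a finite list of nonnegative integers."""
    rows = []
    for j in range(len(x)):
        left = Fraction(sum(x[: j + 1]), j + 1)
        partial = sum(Fraction(x[k], j + k + 1) for k in range(j + 1))
        middle = Fraction(2 * j + 1, j + 1) * partial
        right = 2 * sum(Fraction(xk, j + k + 1) for k, xk in enumerate(x))
        rows.append((left, middle, right))
    return rows

for j, (left, middle, right) in enumerate(averages_vs_hilbert([3, 0, 5, 1, 0, 2, 7])):
    print(j, float(left), float(middle), float(right))
```

Raising each row to the $p$th power and summing over $j$ reproduces the displayed inequality for the truncated sequence.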
Editorial comment. Kenneth F. Andersen remarked that, conversely, since $(a + b)^p \le 2^p(a^p + b^p)$ for $a, b \ge 0$, it follows that
\[ \sum_{j=0}^{\infty} \left( \sum_{k=0}^{\infty} \frac{x_k}{j+k+1} \right)^{p} \le 2^p \left( \sum_{j=0}^{\infty} \left( \frac{1}{j+1} \sum_{k=0}^{j} x_k \right)^{p} + \sum_{j=0}^{\infty} \left( \sum_{k=j}^{\infty} \frac{x_k}{k+1} \right)^{p} \right). \]
The convergence of the two series on the right-hand side implies convergence of $\sum_{j=0}^{\infty} \bigl( \sum_{k=0}^{\infty} x_k/(j+k+1) \bigr)^p$. See Hardy's discussion of Hilbert's Double Series Theorem (Hardy–Littlewood–Pólya, Inequalities, Cambridge University Press, 1967, Ch. 9).
Also solved by K. F. Andersen (Canada), R. Bagby, P. P. Dályay (Hungary), E. A. Herman, F. Holland (Ireland),
B. Karaivanov, O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers (Netherlands), M. Omarjee (France), P. Perfetti
(Italy), M. A. Prasad (India), A. Stenger, R. Stong, R. Tauraso (Italy), T. Viteam (Chile), and the proposer.
\[ \int_{x=0}^{\infty} \int_{y=x}^{\infty} e^{-(x-y)^2} \sin^2(x^2 + y^2)\,\frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx. \]
Also solved by K. F. Andersen (Canada), D. Anderson (Ireland), R. Bagby, D. H. Bailey (U.S.) & J. M. Borwein
(Australia), M. Benito, Ó. Ciaurri, E. Fernández & L. Roncal (Spain), K. N. Boyadzhiev, M. A. Carlton, R.
Chapman (U. K.), H. Chen, B. E. Davis, S. de Luxán (Spain), E. S. Eyeson, C. Georghiou (Greece), O. Geupel
(Germany), M. L. Glasser, J. A. Grzesik, A. Guetter & I. Roussos, E. A. Herman, F. Holland (Ireland), B.
Karaivanov, O. Kouba (Syria), K. D. Lathrop, K.-W. Lau (China), O. P. Lossers (Netherlands), J. Magliano,
T. L. McCoy, M. Omarjee (France), P. Perfetti (Italy), M. A. Prasad (India), I. Rusodimos, R. Stong, R. Tauraso
(Italy), T. Trif (Romania), D. B. Tyler, E. I. Verriest, J. Vinuesa (Spain), M. Vowe (Switzerland), J. Wan
(Australia), H. Wang & J. Wojdylo, GCHQ Problem Solving Group (U. K.), NSA Problems Group, and the
proposer.
Let $T(a, b, c, d, n)$ be the $(n + 1)$-by-$(n + 1)$ matrix with $(i, j)$-entry given by $t_{i,j}$, for $i, j \in \{0, \dots, n\}$. Show that $\det T(a, b, c, d, n) = (ad - bc)^{n(n+1)/2}$.
Solution by Omran Kouba, Higher Institute for Applied Sciences and Technology, Damascus, Syria. Let $E$ denote the vector space $\mathbb{R}_n[x]$ of real polynomials with degree at most $n$, and let $B$ denote the canonical basis $\{1, x, x^2, \dots, x^n\}$ of $E$. Consider the linear transformations $V$ and $T_{\lambda,\mu}$ from $E$ to $E$ defined by $V(P(x)) = x^n P(1/x)$ and $T_{\lambda,\mu}(P(x)) = P(\lambda x + \mu)$, where $(\lambda, \mu) \in \mathbb{R}^2$.
For a linear transformation $T$ from $E$ to $E$, let $\det(T)$ denote the determinant of the matrix of $T$ with respect to $B$. Since the matrices of $V$ and $T_{\lambda,\mu}$ with respect to $B$ are
\[
\begin{pmatrix}
0 & \cdots & \cdots & 0 & 1 \\
0 & \cdots & 0 & 1 & 0 \\
\vdots & & & & \vdots \\
0 & 1 & 0 & \cdots & 0 \\
1 & 0 & \cdots & \cdots & 0
\end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix}
1 & \mu & * & \cdots & * \\
0 & \lambda & * & \cdots & * \\
0 & 0 & \lambda^2 & \cdots & * \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & \cdots & 0 & \lambda^n
\end{pmatrix},
\]
Thus, the matrix of $U$ with respect to $B$ is the transpose of the matrix $T(a, b, c, d, n)$. Using (∗), we obtain
\[ \det(T(a, b, c, d, n)) = \det(U) = (ad - bc)^{n(n+1)/2}. \]
The case b = 0 follows by continuity.
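The two matrices displayed in the solution can be built and checked directly. The sketch below constructs the matrices of $V$ and $T_{\lambda,\mu}$ in the basis $1, x, \dots, x^n$ from their definitions and computes determinants exactly; it assumes nothing about the entries $t_{i,j}$ or the identity (∗), neither of which is reproduced above, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def matrix_T(lam, mu, n):
    """Matrix of P(x) -> P(lam*x + mu): column j holds the coefficients
    of (lam*x + mu)^j, namely C(j, i) * lam^i * mu^(j-i) in row i."""
    return [[comb(j, i) * lam**i * mu ** (j - i) if i <= j else Fraction(0)
             for j in range(n + 1)] for i in range(n + 1)]

def matrix_V(n):
    """Matrix of P(x) -> x^n P(1/x): x^j is sent to x^(n-j)."""
    return [[Fraction(1 if i + j == n else 0) for j in range(n + 1)]
            for i in range(n + 1)]

def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in m]
    size, result = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result

n, lam, mu = 4, Fraction(3, 2), Fraction(-2)
print(det(matrix_T(lam, mu, n)), lam ** (n * (n + 1) // 2))
print(det(matrix_V(n)))
```

Since the matrix of $T_{\lambda,\mu}$ is upper triangular with diagonal $1, \lambda, \dots, \lambda^n$, its determinant is $\lambda^{n(n+1)/2}$, the same exponent that appears in the final formula.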
Also solved by D. Beckwith, R. Chapman (U. K.), P. P. Dályay (Hungary), B. Karaivanov, P. Lima-Filho,
M. Omarjee (France), M. A. Prasad (India), J. H. Smith, J. H. Steelman, R. Stong, M. Wildon (U. K.), GCHQ
Problem Solving Group (U. K.), and the proposer.
Math on Trial: How Mathematics Is Used and Abused in the Courtroom. By Leila Schneps
and Coralie Colmez, Basic Books, New York, 2013, xi+256 pp., ISBN 978-0-465-03292-1,
$26.99.
http://dx.doi.org/10.4169/amer.math.monthly.121.05.463
We received several communications regarding “Yet a simpler proof of the chain rule,”
by Haryono Tandra in Vol. 120, No. 10, 2013, p. 900. David Salmon from the Univer-
sity of Oregon commented as follows:
I see no difference between this proof and that provided in “Introductory Complex
Analysis”, p. 46–47, by Richard A. Silverman, Dover Edition, 1972.
Raymond Mortini raises issues with the correctness of Tandra’s proof. Mortini’s
objections are due to a typographical error which we failed to correct. The last centered
equation in the filler piece,
\[ \frac{f(g(x_{n_k})) - f(g(c))}{x_{n_k} - c} \to 0 = f'(g(c))\,g'(c) \]
\[ \frac{f(g(x_n)) - f(g(c))}{x_n - c} \to 0 = f'(g(c))\,g'(c). \]
Concerning “Pi day is upon us again and we still do not know if pi is normal,” by
David Bailey and Jon Borwein in Vol 121, No. 3, 2014, pp. 191–206, David Beasley
offers us the following.
I would like to commend you for the excellent article on pi in this month’s
MONTHLY. Professors Bailey and Borwein presented an impressive array of facts
and questions about pi that should stimulate readers to learn even more about this
fascinating constant. There is a typo in the article, one that may have already been
pointed out. On page 193, the article notes that the Egyptian Rhind Papyrus suggests
that pi equals 32/18; that should be 256/81, or 3 + 1/9 + 1/27 + 1/81 as noted by
multiple Internet sources. (Ah, if only pi really were equal to $(4/3)^4$ . . .). Thanks again
for an article and an issue well done.
Also in the March 2014 issue, several readers have pointed out to us the following:
On page 265, 4 lines from the bottom of the page, “It’s First Fifty Years” should be
“Its First Fifty Years.”
On p. 166 of Vol. 121, No. 2, 2014, an author’s name is misspelled in the filler piece
on the lower part of the page. “Robert Devitt-Ryder” should be “Robin Devitt-Ryder,”
and we offer this correction with apologies.
http://dx.doi.org/10.4169/amer.math.monthly.121.05.467
On p. 175 of Vol. 121, No. 2, 2014, Roman Ger offers a solution to Problem 11641.
Unfortunately, we incorrectly listed his home institution. The correct listing should
be “Instytut Matematyki Uniwersytetu Śla̧skiego, Katowice, Poland.” We offer our
apologies to the author.